#apache-nifi

Why data quality matters when working with data at scale

Data quality should be prioritized from the start to prevent costly issues later in data engineering projects.

Business intelligence

from ZDNET

4 days ago

I asked 5 data leaders about how they use AI to automate - and end integration nightmares

Strong processes and AI integration are essential for businesses to effectively utilize data.

DevOps

from DevOps.com

6 days ago

Apica Extends Scope and Reach of Platform for Managing Telemetry Data

Apica's Ascent platform update enhances telemetry data management for DevOps teams, improving observability and cost control.

Artificial intelligence

from The Register

2 weeks ago

Snowflake's ongoing pitch: bring AI to data, not vice versa

Snowflake is enhancing its platform for AI integration through strategic partnerships and acquisitions, focusing on customer ROI and data management efficiency.

Django

from Medium

1 week ago

Snowflake Supports Directory Imports

Easier package imports into Snowflake functions and procedures from stage directories and SnowGit directories streamline development and deployment.

from InfoWorld

1 week ago

How Apache Kafka flexed to support queues

Apache Kafka has cemented itself as the de facto platform for event streaming, often referred to as the 'universal data substrate' due to its extensive ecosystem that enables connectivity and processing capabilities.

Scala

from Medium

2 weeks ago

Data Extraction and Classification Using Structural Pattern Matching in Scala

Scala pattern matching enhances code readability and extensibility in real-world data engineering use cases.

Information security

from Techzine Global

2 weeks ago

Databricks launches Lakewatch: agentic SIEM on the Lakehouse

Lakewatch is an open SIEM platform that consolidates security, IT, and business data, enabling rapid threat detection and response using AI agents.

from InfoWorld

4 weeks ago

Migrating from Apache Airflow v2 to v3

Airflow 3 represents a clear architectural direction for the project: API-driven execution, better isolation, data-aware scheduling and a platform designed for modern scale. While Airflow 2.x is still widely used, it is clearly moving toward long-term maintenance (end-of-life April 2026) with most innovation and architectural investment happening in the 3.x line.

Software development

Data science

from Medium

3 weeks ago

Building Consistent Data Foundations at Scale

Building consistent data foundations through intentional architecture, engineering, and governance is essential to prevent fragmentation, support AI adoption, ensure regulatory compliance, and enable reliable organizational decisions at scale.

Business intelligence

from InfoWorld

3 weeks ago

Snowflake's new 'autonomous' AI layer aims to do the work, not just answer questions

Project SnowWork is Snowflake's autonomous AI layer that automates data analysis tasks like forecasting, churn analysis, and report generation without requiring data team intervention.

#ai-automation

Artificial intelligence

from Techzine Global

3 weeks ago

Snowflake's Project SnowWork targets autonomous enterprise AI

Snowflake launches Project SnowWork, an autonomous AI interface that performs enterprise tasks like forecasts and reports without data team involvement, expanding from backend infrastructure to front-office productivity tool.

from InfoWorld

1 month ago

Artificial intelligence

Databricks launches Genie Code to automate data science and engineering tasks

Update your databases now to avoid data debt

Multiple major open source databases reach end-of-life in 2026, requiring teams to plan upgrades and migrations to avoid security risks and higher costs.

Data science

from Medium

1 month ago

Migrating to the Lakehouse Without the Big Bang: An Incremental Approach

Query federation enables safe, incremental lakehouse migration by allowing simultaneous queries across legacy warehouses and new lakehouse systems without risky big bang cutover approaches.

Software development

from Medium

1 month ago

Unified Databricks Repository for Scala and Python Data Pipelines

Databricks repositories require structured setup with Gradle for multi-language support, dependency management, and version control to scale beyond manual notebook maintenance.

Data science

from Medium

1 month ago

100 Scala Interview Questions and Answers for Data Engineers

Structured Scala and Apache Spark interview preparation requires understanding distributed systems, performance trade-offs, and pipeline design beyond theoretical knowledge.

Startup companies

from InfoQ

2 months ago

Etleap Launches Iceberg Pipeline Platform to Simplify Enterprise Adoption of Apache Iceberg

Managed Iceberg pipeline platform unifies ingestion, transformation, orchestration, and table operations inside customers' VPCs, enabling enterprise Iceberg adoption without building custom stacks.

from Techzine Global

2 months ago

Sumo Logic launches data pipeline apps for Snowflake and Databricks

Snowflake offers a fully managed data platform, but Sumo Logic users often lack insight into performance, login activity, and operational health. The Sumo Logic Snowflake Logs App analyzes login and access activity to identify anomalies or suspicious behavior. It also optimizes data pipelines with insights into long-running or failing queries. Teams can centralize log data to facilitate correlation across applications, cloud services, and data platforms.

Information security

Data science

from DevOps.com

2 months ago

Why Data Contracts Need Apache Kafka and Apache Flink

Data contracts formalize schemas, types, and quality constraints through early producer-consumer collaboration to prevent pipeline failures and reduce operational downtime.

Software development

from Medium

2 months ago

Agentic Workflows in Scala (Without the Buzzwords)

Durable, decision-driven systems require explicit state, clear decision points, and explicit workflow orchestration rather than opaque autonomous agent loops.

Business intelligence

from New Relic

2 months ago

Optimize Databricks: Full Visibility with New Relic

New Relic Databricks Integration provides unified telemetry, speeding troubleshooting, improving performance and resource utilization, and linking Databricks performance directly to cost.

from Medium

2 months ago

Agentic Workflows in Scala (Without the Buzzwords)

High-level view of the travel search workflow, highlighting parallel searches, explicit decision points, and iterative refinement. In Scala, we define this workflow using Workflows4s, encoding both state and transitions explicitly in the type system. Instead of opaque state blobs or untyped contexts, the state of the process is represented using algebraic data types - types like Started, Found, Sent, and Booked - each corresponding to a distinct point in the workflow's lifecycle.

Scala

from InfoWorld

2 months ago

Snowflake updates developer tools, adds observability features

Snowflake adds observability capabilities via Trail: The company also added new observability features in the form of Snowflake Trail, which provides visibility into data quality, pipelines, and applications, enabling developers to monitor, troubleshoot, and optimize their workflows. It is built with OpenTelemetry standards so developers can integrate with popular observability and alert platforms including Datadog, Grafana, Metaplane, PagerDuty, and Slack, among others.

DevOps

Artificial intelligence

from InfoWorld

2 months ago

Teradata unveils enterprise AgentStack to push AI agents into production

Teradata positions Enterprise AgentStack as a vendor-agnostic execution layer across hybrid environments, contrasting platform-tied AI approaches from Snowflake and Databricks.

Business intelligence

from Techzine Global

2 months ago

ClickHouse, the open-source challenger to Snowflake and Databricks

ClickHouse is a high-performance columnar OLAP database rapidly adopted by AI and enterprise users, now valued at $15B and acquiring Langfuse.

Artificial intelligence

from Medium

2 months ago

Extracting AI-Ready Data From Organizational Documents

Poor document extraction corrupts retrieval; preserving document structure at ingestion produces reliable embeddings and trustworthy RAG outputs.

from InfoWorld

2 months ago

AI-augmented data quality engineering

SHAP for feature attribution: SHAP quantifies each feature's contribution to a model prediction. LIME for local interpretability: LIME builds simple local models around a prediction to show how small changes influence outcomes. It answers questions like "Would correcting age change the anomaly score?" and "Would adjusting the ZIP code affect classification?" Explainability makes AI-based data remediation acceptable in regulated industries.

Artificial intelligence

Data science

from InfoWorld

2 months ago

Snowflake debuts Cortex Code, an AI agent that understands enterprise data context

Cortex Code enables developers to use natural language to build, optimize, and deploy governed, production-ready data pipelines, analytics, ML workloads, and AI agents.

Artificial intelligence

from InfoQ

2 months ago

Autonomous Big Data Optimization: Multi-Agent Reinforcement Learning to Achieve Self-Tuning Apache Spark

A Q-learning agent autonomously learns and generalizes optimal Spark configurations by discretizing dataset features and combining with Adaptive Query Execution for superior performance.
