Data science
Why 'curate first, annotate smarter' is reshaping computer vision development

Strategic data selection and curation reduce annotation costs and enhance development productivity in computer vision teams.

In-Silico Perturbation Meets Single-Cell Foundation Models: From Zero-Shot Potential to Fine-Tuned...

In-silico perturbation simulates cellular state changes, but biological trustworthiness remains a challenge despite advancements in single-cell foundation models.

from Computerworld

AI project 'failure' has little to do with AI

The reliability of genAI is compromised by various factors, necessitating independent verification of its outputs.

Datadog launches Experiments for A/B testing in observability

Datadog Experiments integrates A/B testing and product analytics into a single platform, addressing fragmentation in product development tools.

from Fortune

Meet China's AI-powered recycling robot that sorts 220 pounds of clothes in 2 to 3 minutes

AI technology in textile recycling significantly improves efficiency and reduces waste impact.

#ai

from eLearning Industry

What It Actually Means To Build A Learning System Today

Organizations now build AI-driven platforms to control data retrieval and evaluation, making internal knowledge the core differentiator in learning intelligence.

from TNW | Opinion

AI amplifies whatever you feed it, including confusion

Organizations struggle with AI due to confusion over relevant data, leading to overwhelmed teams and a disconnect between ambition and execution.

A data trust scoring framework for reliable and responsible AI systems

A rigorous trust scoring framework is essential to prevent AI from perpetuating inequality through biased data.

from TNW | Corporates-Innovation

Google's TurboQuant compresses AI memory by 6x, rattles chip stocks

Google's TurboQuant algorithm significantly reduces memory usage for AI models, impacting memory stock prices due to lower physical memory needs.

TurboQuant is a big deal, but it won't end the memory crunch

TurboQuant is an AI data compression technology that reduces memory usage for KV caches but may not significantly alleviate memory shortages.

2 days ago

How to halve Claude output costs with a markdown tweak

A markdown file can reduce Claude's token output by over 50%, aiding enterprises in managing AI costs during production.

IT lesson from the Iran war: AI makes your data problems so much worse

AI can exacerbate existing data problems in enterprises, as illustrated by a US military bombing that relied on outdated intelligence.

A GitHub tinkerer teaches Claude to talk less, and that may matter more than it seems

A markdown file can significantly reduce AI output token usage, enhancing efficiency without code changes.

The hidden costs of 'helpful' AI

Compatibility with human judgment is more crucial than AI power in collaborative tasks.

DeepSeek and Grok Cloud Dancing Data Color Schemes

The 2026 Pantone Color of the Year, Cloud Dancer, presents unique challenges and opportunities for data visualization color schemes.

Dancing in the clouds with Copilot and Claude

Cloud Dancer, Pantone's 2026 Color of the Year, is a neutral, billowy white that poses challenges for data visualization color schemes and calls for deliberate use as a background, palette hue, or diverging midpoint.

from TechCrunch

4 days ago

Mantis Biotech is making 'digital twins' of humans to help solve medicine's data availability problem

Large language models can enhance genomics and clinical practices, but struggle with rare diseases due to data scarcity.

from ComputerWeekly.com

4 days ago

Interview: Thierry Martin, head of enterprise data and analytics, Toyota Motor Europe

Thierry Martin emphasizes the connection between art and effective enterprise data platforms, highlighting aesthetics, simplicity, and the essence of goals.

from Psychology Today

6 days ago

A New Digital Twin for Brain Activity Aims to Speed Research

A new AI model can predict human brain activity from various stimuli, accelerating neuroscience research and understanding of the brain.

from Streetsblog USA

Talking Headways Podcast: Congestion Pricing Data Collection

The MTA's congestion pricing report highlights data collection partnerships and air quality monitoring for future pricing schemes.

from Anythingconverter

AnythingCounter - Real-Time Digital World Statistics with Sources

Approximately 500 tonnes of gold, worth about $15 billion, are lost in e-waste every year, underscoring the economic cost of electronic waste.

from MarTech

How NotebookLM turns marketing docs into usable insights

AI tools like NotebookLM transform static marketing documents into interactive knowledge bases for better insights and exploration.

from London Business News | Londonlovesbusiness.com

Transform your data into compelling stories that drive results

Data storytelling transforms data into engaging narratives that inspire action and drive change.

from MarTech

Data built modern marketing, but AI is rewriting the rules

Data has evolved from being seen as a liability to a core asset for businesses, driving marketing strategies and decision-making.

As AI hits scaling limits, Google smashes the context barrier

TurboQuant significantly reduces KV cache size, enhancing AI model performance and expanding context windows for complex workloads.

from Fast Company

A top AI researcher explains the limitations of current models

Francois Chollet's ARC-AGI-3 benchmark reveals AI's limitations in navigating novel situations compared to human intelligence.

ODSC AI East 2026: Ten Sessions AI Engineers Should Not Miss

The ODSC AI East 2026 schedule emphasizes production readiness and operational reliability in applied AI.

Oracle adds pre-built agents to Private Agent Factory in AI Database 26ai

Structured Data Analysis Agent enhances data processing capabilities for enterprises using tools like Python's pandas library.

SAP and ODI are working on the IDEA AI-ready data infrastructure

The IDEA program aims to help organizations make their data infrastructure AI-ready, addressing the challenge of data primarily designed for human use, which is not suitable for AI applications.

A guide to the Nature Index

The Nature Index provides absolute and fractional counts of article publication at the institutional and national level and, as such, is an indicator of global high-quality research output and collaboration.

AI KPIs That Matter: Moving Beyond Model Accuracy in 2026

Measuring AI success requires connecting model performance to business outcomes, not just focusing on accuracy metrics.

AI Expo Halls: A Low-Commitment Way to Keep Up With Applied AI in 2026

AI Expo Halls provide hands-on exposure and direct Q&A, offering real market intelligence in applied AI.

from Geeky Gadgets

7 Hidden Agent Skills in Google's NotebookLM You Need to Try

Combining NotebookLM and Claude's skill system creates specialized AI agents for specific tasks like B2B sales and SEO content generation.

Data Mesh in Action: A Journey From Ideation to Implementation

Data mesh is essential for organizations to develop independent data analytics capabilities after separation from larger parent companies.

How I squeeze fresh science from public data

Utilizing existing data can lead to significant discoveries and collaborations in research.

CERN eggheads burn AI into silicon to stem data deluge

CERN uses custom AI to optimize real-time data collection from the Large Hadron Collider, processing hundreds of terabytes per second.

Built a Music Genre Classifier That Predicts Song Genres from Lyrics

Lyrics can be used to classify music genres with approximately 78% accuracy using Natural Language Processing and Logistic Regression.

The 'toggle-away' efficiencies: Cutting AI costs inside the training loop

Simple optimizations can significantly reduce AI training costs and carbon emissions without needing the latest GPUs.

Building Consistent Data Foundations at Scale

Building consistent data foundations through intentional architecture, engineering, and governance is essential to prevent fragmentation, support AI adoption, ensure regulatory compliance, and enable reliable organizational decisions at scale.

How to create AI agents with Neo4j Aura Agent

Neo4j Aura Agent is an end-to-end platform for creating agents, connecting them to knowledge graphs, and deploying to production in minutes. In this post, we'll explore the features of Neo4j Aura Agent that make this all possible, along with links to coded examples to get hands-on with the platform.

from Fortune

Pokemon Go players built a 30-billion-photo map that's now training robots to deliver your pizza

Pokémon Go players' 30 billion crowdsourced images created a photorealistic street-level world model enabling autonomous delivery robots to navigate cities globally.

from Inside Higher Ed | Higher Education News, Events and Jobs

Group Formed to Recommend Changes to Battered NCES

A task force formed by the Institute for Higher Education Policy will develop recommendations to modernize the National Center for Education Statistics after significant staffing cuts.

QCon London 2026: Blurring the Lines: Engineering & Data Teams in the Age of AI

AI has blurred engineering and data team boundaries, requiring data contracts, full-stack observability, and production-data testing to ensure data quality and real-time system reliability.

AI Engineer vs Data Scientist Salary in 2026: Why Production Skills Pay More

AI Engineer has replaced Data Scientist as the highest-paid tech role, commanding 15-25% higher salaries due to focus on production-ready systems rather than insights.

Migrating to the Lakehouse Without the Big Bang: An Incremental Approach

Query federation enables safe, incremental lakehouse migration by allowing simultaneous queries across legacy warehouses and new lakehouse systems without risky big bang cutover approaches.

from Scalac - Software Development Company - Akka, Kafka, Spark, ZIO

Scalendar April 2026

April 2026 features major tech conferences including Applied Machine Learning Conference, Google Cloud Next, ACM CHI, and JPoint, covering ML systems, data platforms, human-AI interaction, and JVM performance.

from Computerworld

6 ways Gemini supercharges Google Sheets

Google's Gemini AI assistant in Google Sheets analyzes data, generates visualizations, creates formulas, and automates spreadsheet tasks through a sidebar interface or cell formulas.

from Search Engine Roundtable

Live Sports Scores In Google AI Mode

AI Mode displays live sports data including real-time game scores and statistics when searching for sports teams during active games.

QCon London 2026: Reliable Retrieval for Production AI Systems

Production RAG system failures primarily stem from indexing and retrieval challenges rather than language model limitations, requiring careful document parsing, chunking strategies, and enhanced retrieval methods.

Why the crisis in official statistics matters - and how it can be fixed

Governments must address declining survey response rates, inadequate funding, and political interference threatening the reliability of official statistics essential for effective policymaking.

from TNW | Deep-Tech

Universal Robots and Scale AI launch the UR AI Trainer

Our customers, ranging from large enterprises to AI research labs, are no longer just asking for AI features. They need a way to collect high-fidelity, synchronized robot and vision data to train AI models on the same robots they intend to deploy. Our AI Trainer is the industry's first direct lab-to-factory solution for AI model training.

AlphaFold hits 'next level': the AI tool now includes protein pairing

Since its release in 2021, this repository has become a bedrock in discovery and a first port of call for research projects that try to understand life at the molecular level. But previous iterations of the database lacked predictions of how proteins form complexes, which can be indispensable for their function.

from TechCrunch

Nvidia's DLSS 5 uses generative AI to boost photo-realism in video games, with ambitions beyond gaming

Nvidia introduced DLSS 5, combining 3D graphics with generative AI to create realistic video game visuals using less computational power, with applications extending beyond gaming into enterprise computing.

#brain-initiative

BRAIN Initiative: Data Archives for the BRAIN Initiative

The BRAIN Initiative data ecosystem provides domain-specific archives for long-term storage, curation, and community access to neuroscience research data, with continued funding essential for maintaining reproducible pipelines and accommodating exponential data growth.

Brain Cell Atlas: From Data to Knowledge Base

from Hackernoon

The World Model Problem: Why Sora-Style Video Still Breaks

World models require consistency across three dimensions: temporal coherence, cross-modal alignment, and physical plausibility to achieve general artificial intelligence.

Google Researchers Propose Bayesian Teaching Method for Large Language Models

Google researchers developed a training method enabling large language models to approximate Bayesian reasoning by learning from optimal Bayesian system predictions, improving belief updates during multi-step interactions.

from www.scientificamerican.com

OpenAI and Ginkgo Bioworks show how AI can accelerate scientific discovery

OpenAI's GPT successfully designed and iterated on biology experiments autonomously, demonstrating AI capability in scientific hypothesis generation, experimental design, and result interpretation beyond summarization tasks.

from ComputerWeekly.com

Met Office 'supercomputing as a service' one year old

The Met Office's cloud-based supercomputing system from Microsoft achieved 100% availability for critical workloads over one year, delivering 60 quadrillion calculations per second with comparable latency to on-site infrastructure while offering greater flexibility and cost efficiency.

from Engadget

Google built a flash-flood prediction tool using Gemini and old news reports

Google tasked Gemini with sorting through 5 million news articles from around the world and isolating flood reports. It transformed this data into a geo-tagged series of chronological events. Next, researchers trained a model to ingest current weather forecasts and leverage the Groundsource data to determine the likelihood of a flash flood in a given area.

AI can 'same-ify' human expression - can some brains resist its pull?

Large language models are homogenizing human writing styles, reasoning methods, and perspectives, potentially creating widespread sameness in discourse even among non-direct AI users.

from Harvard Business Review

Research: Using AI Can Stifle Innovation. But It Doesn't Have To.

AI's ease of knowledge reuse creates efficiency gains but carries hidden cognitive and organizational costs that research reveals.

100 Scala Interview Questions and Answers for Data Engineers

Structured Scala and Apache Spark interview preparation requires understanding distributed systems, performance trade-offs, and pipeline design beyond theoretical knowledge.

from Thedrum

Google Analytics 4: What you need to know about the future of analytics?

Google Analytics 4 replaces Universal Analytics by July 2023, requiring marketers to transition immediately to maintain year-on-year performance data and adapt to a cookieless future driven by privacy regulations and browser controls.

from FlowingData

Bird search patterns

A comprehensive analysis of Google search patterns related to birds explores what species people seek information about most frequently. The investigation spans six interconnected analyses examining bird variety, taxonomic classifications, information sharing behaviors, birder sighting correlations with search trends, regional popularity differences across states, and temporal patterns in search interest.

Unpacking the deceptively simple science of tokenomics

AI datacenter efficiency is measured by tokens generated per watt, with profitability determined by token revenue minus infrastructure costs, but optimization must balance throughput with service quality requirements.

Scaling Human Judgment: How Dropbox Uses LLMs to Improve Labeling for RAG Systems

Dropbox uses LLM-augmented human labeling to improve document retrieval quality in RAG systems, addressing the bottleneck of ranking millions of enterprise documents for relevance to user queries.

from FlowingData

Mapping what makes us happy

HappyDB contains 100,000 crowdsourced happy moments classified and visualized on a map using axes of personal agency and time horizon, with filtering by demographics.

The revenge of SQL: How a 50-year-old language reinvents itself

SQL has experienced a major comeback driven by SQLite in browsers, improved language tools, and PostgreSQL's jsonb type, making it both traditional and exciting for modern development.

from Psychology Today

From the Marketplace of Ideas to the Marketplace of Answers

AI language models shift belief formation from building understanding through critical thinking to selecting among pre-formed, persuasive answers, potentially replacing thinking itself with answer selection.

from MarTech

The era of data dominance is over, and it didn't last very long

Data alone provides limited business value; context about customers, brands, and strategy is essential for meaningful insights and decision-making.

from TechRepublic

Inside the Gas Engine Strategy Powering AI's Next Wave

Gas reciprocating engines are emerging as a critical power solution for AI data centers, with manufacturers like Caterpillar securing multi-gigawatt orders to meet demand that exceeds grid and turbine capacity.

Hey ChatGPT, write me a fictional paper: these LLMs are willing to commit academic fraud

All major LLMs can facilitate academic fraud and junk science, though Claude models show the most resistance while Grok and early GPT versions perform worst.

from London Business News | Londonlovesbusiness.com

Building a trusted AI data analyst for revenue operations

AI data analysts must enforce financial controls and governed metric definitions to produce revenue-grade insights that finance leaders will trust for decision-making.

from Psychology Today

Computation Without Consequence

ChatGPT failed to recommend emergency care in 52% of cases physicians unanimously deemed emergencies, excelling only in clear patterns while struggling with subtle clinical ambiguity where consequences matter.

Buyer's guide: Comparing the leading cloud data platforms

Five leading cloud data platforms (Databricks, Snowflake, Amazon Redshift, Google BigQuery, and Microsoft Fabric) offer distinct architectural approaches for enterprise data storage, analytics, and AI workloads.

from Realpython

The pandas DataFrame: Make Working With Data Delightful Quiz

An 11-question interactive quiz assesses proficiency in pandas DataFrame operations including creation, column manipulation, data sorting, NumPy array extraction, and missing data handling.

Pinterest's CDC-Powered Ingestion Slashes Database Latency from 24 Hours to 15 Minutes

Pinterest deployed a next-generation database ingestion framework using CDC, Kafka, Flink, Spark, and Iceberg to reduce data latency from 24+ hours to minutes while processing only changed records.

Ataccama puts agentic data observability into platform core

Ataccama ONE introduces Agentic Data Observability technology to ensure high-quality, reliable data for AI systems while preventing autonomous errors and bias in regulated enterprises.

VAST Data leverages unique market position to develop full-stack AI infrastructure

VAST Data is expanding its software-based AI operating system across any infrastructure, positioning itself as infrastructure-agnostic like VMware was for virtualization.

from Business Insider

Money managers are hungrier than ever for obscure data to give them an edge

Hedge funds and other money managers spent $2.8 billion on alternative data in 2025, according to a new report from consultancy Neudata, a 17% jump from the year before. It's more than double what asset managers spent on alternative data in 2021, which includes a wide range of non-traditional information sources. The report projects that the total spend on alternative datasets could jump to more than $23 billion in the consultancy's bull case in 2030 and just under $8 billion in the bear case.

Databricks Introduces Lakebase, a PostgreSQL Database for AI Workloads

Databricks Lakebase is a serverless PostgreSQL OLTP database that separates compute from storage and unifies transactional and analytical capabilities.

from Entrepreneur

This Common Invisible Barrier Is Sabotaging Your Data-Driven Decisions

AI was everywhere, but I wasn't focused on product launches. I was looking at how companies think about data itself: how it's shared, governed and ultimately turned into decisions. And across conversations with executives and sessions on security and compliance, a pattern emerged: the technical limitations that once justified locking data down have largely been solved. What remains difficult is human. Alignment, trust and confidence inside organizations are now the true barriers.

from Entrepreneur

Most Founders Don't Realize They're Giving Away Their Influence - Here's How to Take It Back

Every search, purchase, loyalty swipe, location ping and scroll feeds systems that now shape pricing, product decisions, hiring and marketing strategies. Most founders understand this in theory, but few grasp the practical consequence: whether they intend to or not, they and their customers are already casting votes with their data. And those votes? They're usually cast passively, on someone else's terms.

How to choose the best LLM using R and vitals

Swap models by creating a new chat solver, clone or create tasks with alternative LLMs, run evaluations, and bind the results for comparison and analysis.

Panel: Modern Data Architectures

I wrote a book for O'Reilly on scaling machine learning with Spark specifically. My second book is coming out on how to improve high-performance Spark, the second edition. Started my career in the machine learning space 15 years ago, moved into data infrastructure, batch processing, and a year and a half ago I moved into the data streaming space, which I think it's what's going to help us pave the future in the data.

from Treehouse Blog

Portfolio Projects for Entry-Level Data Roles

Most beginner data portfolios look similar. They include a few cleaned datasets, some charts or dashboards, and a notebook with code and commentary. Again, nothing here is wrong. But hiring teams don't review portfolios to check whether you can follow instructions. They review them to see whether you can think like a data analyst. When projects feel generic, reviewers are left guessing:

ServiceNow buys Pyramid Analytics

"Pyramid adds an analytics and semantic layer that can define metrics in a way that both humans and AI agents can rely on,"

2 months ago

From Graphs to Generative AI: Building Context That Pays-Part 1

Every year, poor communication and siloed data bleed companies of productivity and profit. Research shows U.S. businesses lose up to $1.2 trillion annually to ineffective communication, or about $12,506 per employee per year. This stems from breakdowns that waste an average of 7.47 hours per employee each week on miscommunications. The damage isn't only interpersonal; it's structural. Disconnected and fragmented data systems mean that employees spend around 12 hours per week just searching for information trapped in those silos.

from WIRED

A Wave of Unexplained Bot Traffic Is Sweeping the Web

For a brief moment in October, Alejandro Quintero thought he had made it big in China. The Bogotá-based data analyst owns and manages a website that publishes articles about paranormal activities, like ghosts and aliens. The content is written in "Spanglish," he says, and was never intended for an Asian audience. But last fall, Quintero's site suddenly began receiving a large volume of visits from China and Singapore.

2 months ago

Taking Back the Math: How Everyday Numbers Can Empower Us in an Algorithmic World

Learning basic mathematics empowers individuals to understand, question, and influence algorithms that shape choices, reducing opaque power imbalances in the algorithm-driven economy.

from FlowingData