#model-robustness

#ai
Artificial intelligence
fromTheregister
1 day ago

Claude is getting worse, according to Claude

Anthropic's Claude is facing significant issues with service quality and reliability, leading to customer dissatisfaction and increased complaints.
#large-language-models
Data science
fromMedium
5 days ago

The Top 10 LLM Training Datasets for 2026

Large language models require extensive training data, and practitioners can utilize ten leading public datasets for effective training and fine-tuning.
fromFuturism
2 months ago
Artificial intelligence

AI Agents Are Mathematically Incapable of Doing Functional Work, Paper Finds

Artificial intelligence
fromFuturism
2 days ago

OpenAI's Latest Thing It's Bragging About Is Actually Kind of Sad

The AI industry faces significant delays and cancellations in data center projects, impacting ambitious computing capacity goals.
Artificial intelligence
fromTheregister
2 days ago

The AI divide putting open weights models in spotlight

Open weights AI models are evolving from research projects to serious enterprise products, highlighting a growing divide between enterprise and frontier AI.
#ollama
fromZDNET
2 months ago
Artificial intelligence

I tested local AI on my M1 Mac, expecting magic - and got a reality check instead

#ai-agents
Data science
fromMedium
1 week ago

15 Datasets for Training and Evaluating AI Agents

Datasets for training and evaluating AI agents are essential for building reliable agentic systems and preventing execution failures.
fromTechCrunch
1 month ago
Artificial intelligence

Perplexity's new Computer is another bet that users need many AI models | TechCrunch

Artificial intelligence
fromZDNET
2 months ago

Is your AI agent up to the task? 3 ways to determine when to delegate

AI agents should be managed as an adjunct workforce, using management skills to decide which tasks to automate versus retain for humans.
DevOps
fromInfoWorld
3 weeks ago

An architecture for engineering AI context

AI systems must intelligently manage context to ensure accuracy and reliability in real applications.
Science
fromNature
3 weeks ago

Drowning in data sets? Here's how to cut them down to size

The Square Kilometre Array Observatory will generate massive data, but storage and retention pose significant challenges for researchers.
Software development
fromMedium
2 weeks ago

The Verifier-Compiler Loop: Turning Human Preferences into Production Agent Judgment

Production failures arise from compounded small errors in long workflows, not just isolated prompt failures.
Digital life
fromInfoWorld
3 weeks ago

AI optimization: How we cut energy costs in social media recommendation systems

Optimizing data processing in AI can significantly reduce energy consumption and operational costs.
#ai-development
fromInfoWorld
2 weeks ago
Artificial intelligence

Final training of AI models is a fraction of their total cost

Developing AI models incurs significant costs, with most expenditures on scaling and research rather than final training runs.
Data science
fromFast Company
2 weeks ago

A top AI researcher explains the limitations of current models

Francois Chollet's ARC-AGI-3 benchmark reveals AI's limitations in navigating novel situations compared to human intelligence.
Data science
fromMedium
3 weeks ago

AI KPIs That Matter: Moving Beyond Model Accuracy in 2026

Measuring AI success requires connecting model performance to business outcomes, not just focusing on accuracy metrics.
fromSearch Engine Roundtable
1 month ago

AI Mode Results Personalized to User Behavior

AI Mode can use your previous conversations, along with places you've searched for or tapped on in Search and Maps to deliver more relevant options, personalized to you. So if AI Mode infers that you have a preference for Italian food, plant-based meals, and places that have outdoor seating, you may get results suggesting options like these.
Privacy technologies
#ai-agent-evaluation
Software development
fromInfoQ
4 weeks ago

Evaluating AI Agents in Practice: Benchmarks, Frameworks, and Lessons Learned

AI agents require system-level evaluation across multiple turns measuring task success, tool reliability, and real-world behavior rather than single-turn NLP benchmarks like BLEU and ROUGE scores.
Artificial intelligence
fromInfoWorld
3 weeks ago

Why AI evals are the new necessity for building effective AI agents

User trust in AI agents depends on interaction-layer evaluation measuring reliability and predictability, not just model performance benchmarks.
Productivity
fromEntrepreneur
1 month ago

How AI Clears the Path to Faster, Better Executive Decisions

Decision slowdowns stem from disorganized inputs forcing leaders to decode information rather than decide, which AI can resolve by standardizing briefs, surfacing tradeoffs, and documenting rationale.
Online marketing
fromMiami Herald
1 month ago

A 2026 guide to AI optimization: What it is, why it matters, and how to get cited

AI search platforms are redirecting customer queries away from traditional search engines, requiring businesses to optimize content for AI citation and recommendation rather than just search rankings.
Data science
fromInfoWorld
3 weeks ago

The 'toggle-away' efficiencies: Cutting AI costs inside the training loop

Simple optimizations can significantly reduce AI training costs and carbon emissions without needing the latest GPUs.
Artificial intelligence
fromFortune
2 weeks ago

'Intelligence may be scalable, but accountability is not': A new report exposes the hidden cost of the AI agent revolution | Fortune

Smarter AI increases demands on human accountability and leadership in corporate environments.
Software development
fromInfoWorld
4 weeks ago

How to build an AI agent that actually works

Successful agents embed intelligence within structured workflows at specific decision points rather than operating autonomously, combining deterministic processes with reasoning models where judgment is needed.
Artificial intelligence
fromFortune
3 weeks ago

Your AI agent's headline-grabbing capabilities may mask a serious reliability issue | Fortune

AI agents currently face significant reliability issues, impacting their effectiveness in various tasks.
fromFast Company
1 month ago

Should you be using AI for performance reviews?

Before you can even get the opportunity to impress a human interviewer, you will first need to impress the algorithm! More recently, AI has also been used to assist current employees in doing their jobs and then to help their employers evaluate how well employees are performing in those jobs.
Miscellaneous
Artificial intelligence
fromMedium
3 weeks ago

Less Compute, More Impact: How Model Quantization Fuels the Next Wave of Agentic AI

Model quantization and architectural optimization can outperform larger models, challenging the belief that more GPUs equal greater intelligence.
Environment
fromFast Company
2 months ago

These invisible factors are limiting the future of AI

AI progress is increasingly constrained by physical realities—power, geography, regulation, and infrastructure—rather than by algorithms or data alone.
Medicine
fromHarvard Gazette
2 months ago

New AI tool predicts brain age, dementia risk, cancer survival - Harvard Gazette

BrainIAC, a brain imaging adaptive core, accurately extracts multiple disease risk signals from routine brain MRIs using self-supervised learning and limited training data.
Tech industry
fromFuturism
2 months ago

Sam Altman Says Oops, They Accidentally Made the New Version of ChatGPT Worse Than the Previous One

GPT-5.2 prioritized technical intelligence, leading to degraded human-language performance and user dissatisfaction.
Artificial intelligence
fromTheregister
4 weeks ago

AI still doesn't work very well in business, reckoning soon

Enterprise organizations lack clear AI strategies and reference architectures, requiring experimentation and feedback loops to understand AI's actual capabilities and limitations before full deployment.
Digital life
fromInc
2 months ago

Fed Up With AI Slop? These Platforms Will Let You Dial it Down

Platforms are adding settings to reduce low-quality AI-generated content, but fully eliminating such content from feeds is extremely difficult.
Artificial intelligence
fromEntrepreneur
4 weeks ago

Why AI Made Me a Faster Researcher - Not a Lazier One

AI accelerates research mechanics like data sorting and literature reviews, but human judgment remains essential for determining relevance and driving meaningful insights.
Artificial intelligence
fromTheregister
1 month ago

AI models get better at math but still get low marks

Current LLMs struggle with mathematical accuracy, with even top performers scoring C-grade equivalent on practical math benchmarks, though recent versions show modest improvements.
Artificial intelligence
fromInfoQ
2 months ago

Foundation Models for Ranking: Challenges, Successes, and Lessons Learned

Large-scale search and recommendation systems use two-stage retrieval and ranking pipelines to efficiently serve personalized results for hundreds of millions of users and items.
Artificial intelligence
fromPsychology Today
1 month ago

Debugging Overconfidence: Is AI Too Sure of Itself?

AI systems inherit human cognitive biases including overconfidence through training data, model design, and user feedback, requiring mitigation at both development and user levels.
fromInfoQ
2 months ago

Building Embedding Models for Large-Scale Real-World Applications

What happens under the hood? How is the search engine able to take that simple query, look for images in the billions, trillions of images that are available online? How is it able to find this one or similar photos from all that? Usually, there is an embedding model that is doing this work behind the hood.
Artificial intelligence
#ai-safety
Artificial intelligence
fromFast Company
1 month ago

AI's biggest problem isn't intelligence. It's implementation

AI adoption is uneven, yielding clear efficiency gains in some functions yet producing limited measurable profit impacts across most large companies.
Artificial intelligence
fromForbes
1 month ago

Beyond The Hype: The Messy Reality Of Training AI

Short-term data annotation and AI training gigs offer flexible scheduling, prompt weekly pay, variable pay rates, and growing demand for AI and big data skills.
Artificial intelligence
fromZDNET
2 months ago

AI is quietly poisoning itself and pushing models toward collapse - but there's a cure

Unverified AI-generated data causes model collapse and unreliable AI outputs unless organizations enforce data provenance, verification, and governance.
Artificial intelligence
fromZDNET
2 months ago

How Microsoft obliterated safety guardrails on popular AI models - with just one prompt

AI model safety alignment is fragile and can be undone by a single prompt or post-deployment fine-tuning, requiring ongoing safety testing.
Artificial intelligence
fromInfoQ
2 months ago

Why Most Machine Learning Projects Fail to Reach Production

Most ML projects fail to reach production because of problem choice, data/labeling issues, model-to-product gaps, offline-online mismatches, and non-technical blockers.
Artificial intelligence
fromInfoQ
2 months ago

Autonomous Big Data Optimization: Multi-Agent Reinforcement Learning to Achieve Self-Tuning Apache Spark

A Q-learning agent autonomously learns and generalizes optimal Spark configurations by discretizing dataset features and combining with Adaptive Query Execution for superior performance.
Artificial intelligence
fromHackernoon
2 months ago

This "Flash" AI Model Is Fast and Dangerous at Math - Here's What It Can Do | HackerNoon

GLM-4.7-Flash is a 30-billion-parameter mixture-of-experts model offering strong performance for lightweight deployment.
Artificial intelligence
fromEntrepreneur
2 months ago

Comparing AI Models With This Tool Can Save Your Business Time and Money

ChatPlayground AI aggregates over 25 leading AI models into one interface for instant side-by-side comparisons, streamlined workflows, and a lifetime Unlimited subscription for entrepreneurs.
Artificial intelligence
fromTechCrunch
1 month ago

Running AI models is turning into a memory game | TechCrunch

Rising DRAM prices and sophisticated prompt-caching orchestration make memory management a critical cost and performance factor for large-scale AI deployments.
Artificial intelligence
fromAxios
2 months ago

Models that improve on their own are AI's next big thing

Recursive self-improvement lets AI models keep learning after training, accelerating progress while increasing risks, reducing visibility, and complicating safety and governance.
fromComputerworld
2 months ago

OpenAI's GPT is getting better at mathematics

OpenAI's GPT-5.2 Pro does better at solving sophisticated math problems than older versions of the company's top large language model, according to a new study by Epoch AI, a non-profit research institute.
Artificial intelligence
fromUX Magazine
2 months ago

Scaled AI Requires Canonical Truth

Before enterprises can deploy AI agents that actually work, they need something most organizations don't have: a single, authoritative source of truth.
Artificial intelligence
Artificial intelligence
fromArs Technica
2 months ago

New OpenAI tool renews fears that "AI slop" will overwhelm scientific research

OpenAI's free Prism workspace streamlines LaTeX scientific writing with GPT-5.2 but risks accelerating a flood of low-quality AI-assisted papers into journals.
fromTheregister
2 months ago

OpenAI will try to guess your age before ChatGPT gets spicy

sensitive or potentially harmful content.
Artificial intelligence
Artificial intelligence
fromTheregister
2 months ago

Robotics is forcing a fundamental rethink of AI compute

Physical AI requires purpose-built infrastructure for large-scale simulation, data collection, training, and deployment because cloud limitations hinder reliable scaling.
Artificial intelligence
fromZDNET
2 months ago

AI isn't getting smarter, it's getting more power hungry - and expensive

Total computing power explains more model performance gains than proprietary algorithmic 'secret sauce' across 809 large language models.
Artificial intelligence
fromNature
2 months ago

Training large language models on narrow tasks can lead to broad misalignment - Nature

Fine-tuning capable LLMs on narrow unsafe tasks can produce broad, unexpected misalignment across unrelated contexts, increasing harmful, deceptive, and unethical outputs.
fromNature
2 months ago

Multimodal learning with next-token prediction for large multimodal models - Nature

Since AlexNet (ref. 5), deep learning has replaced heuristic hand-crafted features by unifying feature learning with deep neural networks. Later, Transformers (ref. 6) and GPT-3 (ref. 1) further advanced sequence learning at scale, unifying structured tasks such as natural language processing. However, multimodal learning, spanning modalities such as images, video and text, has remained fragmented, relying on separate diffusion-based generation or compositional vision-language pipelines with many hand-crafted designs.
Artificial intelligence
Artificial intelligence
fromLogRocket Blog
2 months ago

How poor chunking increases AI costs and weakens accuracy - LogRocket Blog

Chunking determines AI feature cost, accuracy, and scalability; deliberate chunking reduces costs, improves retrieval accuracy, and enables reliable production systems.