#zero-shot-performance

Artificial intelligence
fromZDNET
2 days ago

I put GPT-5.5 through a 10-round test: It scored 93/100, losing points only for exuberance

GPT-5.5 improves performance in writing, coding, and reasoning but can be overly eager, affecting accuracy.
Software development
fromMedium
3 days ago

The Ten Best Agent Skills to Teach Your AI Agent in 2026

Autonomous agents enhance productivity through effective skills in data science and machine learning workflows.
Education
fromeLearning Industry
4 days ago

Seeing The Whole Student: How AI Is Reshaping Skillset Recognition And Allocation In K-12 Education

AI-driven tools in education enhance skillset recognition, providing a richer understanding of student capabilities beyond traditional assessments.
fromNature
4 days ago

Evaluating large language models for accuracy incentivizes hallucinations - Nature

Next-word pretraining creates statistical pressure toward hallucination, even with idealized error-free data. Facts lacking repeated support in training data yield unavoidable errors, while recurring regularities do not.
Agile
fromPsychology Today
5 days ago

How to Move Beyond the AI Pilot

Organizations struggle to scale AI pilots due to a lack of integration and transformation infrastructure, despite initial success.
Artificial intelligence
fromMedium
2 days ago

How to Evaluate AI Tools Without Being a Data Scientist

Many organizations struggle to integrate AI effectively, with only 25% having done so despite plans for increased spending.
Data science
fromNature
1 week ago

Daily briefing: AI systems can 'teach' biases to other models

AI-generated data can transmit traits and biases to student models, influencing their behavior even when unrelated topics are addressed.
#ai-agents
Data science
fromMedium
2 weeks ago

15 Datasets for Training and Evaluating AI Agents

Datasets for training and evaluating AI agents are essential for building reliable agentic systems and preventing execution failures.
Artificial intelligence
fromZDNET
2 months ago

Is your AI agent up to the task? 3 ways to determine when to delegate

AI agents should be managed as an adjunct workforce, using management skills to decide which tasks to automate versus retain for humans.
Artificial intelligence
fromTheregister
2 months ago

AI agents can't teach themselves new tricks - people can

Providing explicit, reusable skills greatly improves AI agents' domain performance; asking agents to invent skills often fails and can worsen outcomes.
Software development
fromTechzine Global
1 week ago

OpenAI's new Agents SDK focuses on safety and scalability

OpenAI's updated Agents SDK enables developers to create autonomous AI agents for complex tasks with enhanced usability and a sandbox environment.
Business intelligence
fromZDNET
1 month ago

4 tips for building better AI agents that your business can trust

AI agents are transforming professional roles, requiring companies to adopt and integrate these technologies effectively.
#meta
Tech industry
fromFuturism
2 weeks ago

First AI Model From Zuckerberg's Wildly Expensive Superintelligence Lab Flops Compared to Virtually All Rivals

Meta's Muse Spark faces challenges in competing with established AI models despite initial investor enthusiasm.
Artificial intelligence
fromTechzine Global
2 weeks ago

Meta is developing open-source versions of its next frontier AI models

Meta plans to release open-source versions of its frontier AI models Avocado and Mango, alongside proprietary versions, emphasizing global distribution.
Education
fromFast Company
1 week ago

The future of AI in schools isn't personalized learning

Personalized learning through AI often results in device-mediated instruction, lacking the essential role of teachers in student development.
Online learning
fromeLearning Industry
2 weeks ago

AI In Workplace Learning: Are We Truly Improving Learning With AI, Or Simply Producing More Of It?

AI is accelerating content production in workplace learning, but it risks compromising learning quality and critical thinking.
#ai
Silicon Valley
fromTechCrunch
1 month ago

Startup Gimlet Labs is solving the AI inference bottleneck in a surprisingly elegant way | TechCrunch

Gimlet Labs raised $80 million to enhance AI inference efficiency across diverse hardware types.
JavaScript
fromInfoWorld
2 weeks ago

27 questions to ask when choosing an LLM

Model performance is crucial for hardware compatibility, speed, and rate limits in real-time applications.
Data science
fromMedium
2 weeks ago

The Top 10 LLM Training Datasets for 2026

Large language models require extensive training data, and practitioners can utilize ten leading public datasets for effective training and fine-tuning.
Online learning
fromeLearning Industry
2 weeks ago

The Role Of Artificial Intelligence In Improving Corporate Training Programs

AI is transforming corporate training by personalizing learning experiences and addressing individual employee needs.
Education
fromPsychology Today
2 weeks ago

When AI Provides Feedback on Student Work

Students intuitively understand the limitations of AI despite limited exposure, highlighting their natural decision-making abilities and critical thinking skills.
Python
fromPyImageSearch
3 weeks ago

Autoregressive Model Limits and Multi-Token Prediction in DeepSeek-V3 - PyImageSearch

Multi-Token Prediction (MTP) in DeepSeek-V3 allows simultaneous token forecasting, enhancing training speed and contextual understanding.
fromAxios
1 week ago

Anthropic's AI downgrade stings power users

"Claude has regressed to the point it cannot be trusted to perform complex engineering," an AMD senior director wrote in a widely shared post on GitHub.
Artificial intelligence
#artificial-intelligence
Artificial intelligence
fromFortune
3 weeks ago

For most workplace tasks, AI is good enough to pass but not good enough to impress, MIT finds | Fortune

AI technology is improving but still struggles to meet quality standards in many workplace tasks.
Artificial intelligence
fromNature
1 week ago

AI agents replicate human social dynamics in days

Moltbook, a social-media platform for AI agents, quickly attracted self-declared rulers and cryptocurrency initiatives after its launch.
Artificial intelligence
fromFuturism
2 weeks ago

OpenAI's Latest Thing It's Bragging About Is Actually Kind of Sad

The AI industry faces significant delays and cancellations in data center projects, impacting ambitious computing capacity goals.
Data science
fromFast Company
1 month ago

A top AI researcher explains the limitations of current models

Francois Chollet's ARC-AGI-3 benchmark reveals AI's limitations in navigating novel situations compared to human intelligence.
Artificial intelligence
fromTech Times
2 weeks ago

Claude vs ChatGPT: Why Users Are Switching and Which AI Is Better in 2026

Claude and ChatGPT differ significantly in context window limits, coding accuracy, and reasoning depth, influencing user preferences in AI chatbot adoption.
Data science
fromMedium
1 month ago

AI KPIs That Matter: Moving Beyond Model Accuracy in 2026

Measuring AI success requires connecting model performance to business outcomes, not just focusing on accuracy metrics.
Software development
fromMedium
1 month ago

Inside Dify AI: How RAG, Agents, and LLMOps Work Together in Production

Dify AI provides a unified platform for deploying production language model systems with built-in solutions for data freshness, observability, versioning, and safe deployment across multiple cloud environments.
fromSearch Engine Roundtable
1 month ago

AI Mode Results Personalized to User Behavior

AI Mode can use your previous conversations, along with places you've searched for or tapped on in Search and Maps to deliver more relevant options, personalized to you. So if AI Mode infers that you have a preference for Italian food, plant-based meals, and places that have outdoor seating, you may get results suggesting options like these.
Privacy technologies
#ai-agent-evaluation
Software development
fromInfoQ
1 month ago

Evaluating AI Agents in Practice: Benchmarks, Frameworks, and Lessons Learned

AI agents require system-level evaluation across multiple turns measuring task success, tool reliability, and real-world behavior rather than single-turn NLP benchmarks like BLEU and ROUGE scores.
Artificial intelligence
fromInfoWorld
1 month ago

Why AI evals are the new necessity for building effective AI agents

User trust in AI agents depends on interaction-layer evaluation measuring reliability and predictability, not just model performance benchmarks.
Artificial intelligence
fromTheregister
3 weeks ago

Microsoft shivs OpenAI with new AI models for speech, images

Microsoft launched public preview versions of machine learning models for speech recognition, speech synthesis, and image generation, competing directly with OpenAI.
Software development
fromInfoWorld
1 month ago

How to build an AI agent that actually works

Successful agents embed intelligence within structured workflows at specific decision points rather than operating autonomously, combining deterministic processes with reasoning models where judgment is needed.
Graphic design
fromZDNET
1 month ago

I tested GPT-5.4, and the answers were really good - just not always what I asked

GPT-5.4 Thinking delivers superior analytical depth and reasoning capabilities compared to earlier ChatGPT models, though formatting and image generation remain weaker areas.
Roam Research
fromThe Verge
1 month ago

NotebookLM can now summarize research in 'cinematic' video overviews

Google's NotebookLM now generates fully animated cinematic videos from user notes using AI models including Gemini 3, Nano Banana Pro, and Veo 3, advancing beyond previous narrated slideshow capabilities.
Digital life
fromInc
2 months ago

Fed Up With AI Slop? These Platforms Will Let You Dial it Down

Platforms are adding settings to reduce low-quality AI-generated content, but fully eliminating such content from feeds is extremely difficult.
fromFast Company
1 month ago

Should you be using AI for performance reviews?

Before you can even get the opportunity to impress a human interviewer, you will first need to impress the algorithm! More recently, AI has also been used to assist current employees in doing their jobs and then to help their employers evaluate how well employees are performing in those jobs.
Miscellaneous
Tech industry
fromFuturism
2 months ago

Sam Altman Says Oops, They Accidentally Made the New Version of ChatGPT Worse Than the Previous One

GPT-5.2 prioritized technical intelligence, leading to degraded human-language performance and user dissatisfaction.
#anthropic
fromZDNET
2 months ago
Artificial intelligence

Claude Sonnet 4.6 delivers frontier-level AI for free and cheap-seat users

Online learning
fromeLearning Industry
1 month ago

How Do AI-Driven Learning Platforms Enhance Workforce Performance?

AI-driven learning platforms improve employee productivity and business outcomes by automating personalized learning paths aligned with performance goals.
Artificial intelligence
fromFortune
1 month ago

Your AI agent's headline-grabbing capabilities may mask a serious reliability issue | Fortune

AI agents currently face significant reliability issues, impacting their effectiveness in various tasks.
Artificial intelligence
fromMedium
1 month ago

Less Compute, More Impact: How Model Quantization Fuels the Next Wave of Agentic AI

Model quantization and architectural optimization can outperform larger models, challenging the belief that more GPUs equal greater intelligence.
Artificial intelligence
fromFast Company
1 month ago

OpenAI's new frontier models mark a huge change in how AI will be built

OpenAI released two frontier models in early March: GPT-5.3 optimized for fast responses and GPT-5.4 optimized for deep analytical work, representing a shift toward specialized AI models.
Artificial intelligence
fromFast Company
1 month ago

The next phase of AI must start solving everyday problems

Technology's value depends on consumer education driving adoption, which then creates society-wide impact; the most successful AI systems will solve real-world problems efficiently rather than showcase advanced features.
Artificial intelligence
fromMail Online
1 month ago

Can you tell which of these was written by ChatGPT?

Widespread AI tool usage is standardizing human communication, reducing linguistic diversity and individual expression across billions of users globally.
fromInfoQ
2 months ago

Building Embedding Models for Large-Scale Real-World Applications

What happens under the hood? How is the search engine able to take that simple query, look for images in the billions, trillions of images that are available online? How is it able to find this one or similar photos from all that? Usually, there is an embedding model that is doing this work behind the hood.
Artificial intelligence
Artificial intelligence
fromInfoQ
2 months ago

Foundation Models for Ranking: Challenges, Successes, and Lessons Learned

Large-scale search and recommendation systems use two-stage retrieval and ranking pipelines to efficiently serve personalized results for hundreds of millions of users and items.
Artificial intelligence
fromTheregister
1 month ago

AI models get better at math but still get low marks

Current LLMs struggle with mathematical accuracy, with even top performers scoring C-grade equivalent on practical math benchmarks, though recent versions show modest improvements.
fromFast Company
2 months ago

Are LTMs the next LLMs? This new type of AI can do what large-language models can't

A major difference between LLMs and LTMs is the type of data they're able to synthesize and use. LLMs use unstructured data: think text, social media posts, emails, etc. LTMs, on the other hand, can extract information or insights from structured data, which could be contained in tables, for instance. Since many enterprises rely on structured data, often contained in spreadsheets, to run their operations, LTMs could have an immediate use case for many organizations.
Artificial intelligence
Artificial intelligence
fromZDNET
1 month ago

New GPT-5.4 clobbers humans on pro-level work in OpenAI's tests - by 83%

GPT-5.4 matches or outperforms human professionals 83% of the time across nine industries and 44 occupations, with 18% fewer errors and 33% fewer false claims than GPT-5.2.
Artificial intelligence
fromAxios
2 months ago

Models that improve on their own are AI's next big thing

Recursive self-improvement lets AI models keep learning after training, accelerating progress while increasing risks, reducing visibility, and complicating safety and governance.
Artificial intelligence
fromFast Company
2 months ago

AI's biggest problem isn't intelligence. It's implementation

AI adoption is uneven, yielding clear efficiency gains in some functions yet producing limited measurable profit impacts across most large companies.
Artificial intelligence
fromForbes
2 months ago

Beyond The Hype: The Messy Reality Of Training AI

Short-term data annotation and AI training gigs offer flexible scheduling, prompt weekly pay, variable pay rates, and growing demand for AI and big data skills.
fromNature
2 months ago

Multimodal learning with next-token prediction for large multimodal models - Nature

Since AlexNet (ref. 5), deep learning has replaced heuristic hand-crafted features by unifying feature learning with deep neural networks. Later, Transformers (ref. 6) and GPT-3 (ref. 1) further advanced sequence learning at scale, unifying structured tasks such as natural language processing. However, multimodal learning, spanning modalities such as images, video and text, has remained fragmented, relying on separate diffusion-based generation or compositional vision-language pipelines with many hand-crafted designs.
Artificial intelligence
fromInfoWorld
2 months ago

AI agents still need humans to teach them

AI agents need skills (specific procedural knowledge) to perform tasks well, but they can't teach themselves, new research suggests. The authors of the research have developed a new benchmark, SkillsBench, which evaluates agentic AI performance on 84 tasks across 11 domains including healthcare, manufacturing, cybersecurity and software engineering. The researchers looked at each task under three conditions:
Artificial intelligence
Artificial intelligence
fromInfoWorld
2 months ago

What is context engineering? And why it's the new AI architecture

Context engineering designs and manages the information, tools, and constraints an LLM receives, enabling scalable, high-signal inputs and improved model outcomes.
Artificial intelligence
fromBusiness Insider
1 month ago

Anthropic is seizing the moment by promoting how easy it is to switch to Claude

Anthropic simplified the process for users to import their conversation history from competing AI chatbots into Claude, enabling data transfer in under one minute.
Artificial intelligence
fromInfoQ
2 months ago

Why Most Machine Learning Projects Fail to Reach Production

Most ML projects fail to reach production because of problem choice, data/labeling issues, model-to-product gaps, offline-online mismatches, and non-technical blockers.
Artificial intelligence
fromHackernoon
2 months ago

This "Flash" AI Model Is Fast and Dangerous at Math - Here's What It Can Do | HackerNoon

GLM-4.7-Flash is a 30-billion-parameter mixture-of-experts model offering strong performance for lightweight deployment.
Artificial intelligence
fromTheregister
1 month ago

OpenAI GPT-5.3 Instant less likely to beat around the bush

GPT-5.3 Instant reduces unnecessary refusals and moralizing preambles while decreasing hallucination rates by up to 26.8 percent compared to prior models.
Artificial intelligence
fromTechzine Global
2 months ago

OpenAI seeks faster alternatives to Nvidia chips

OpenAI seeks alternative inference chips with larger on-chip SRAM to improve response speed for coding and AI-to-AI communication, aiming for about 10% of future inference capacity.
fromComputerworld
2 months ago

OpenAI's GPT is getting better at mathematics

OpenAI's GPT-5.2 Pro does better at solving sophisticated math problems than older versions of the company's top large language model, according to a new study by Epoch AI, a non-profit research institute.
Artificial intelligence
Artificial intelligence
fromInfoQ
2 months ago

Hugging Face Introduces Community Evals for Transparent Model Benchmarking

Community Evals enables benchmark datasets on the Hugging Face Hub to host leaderboards, collect reproducible evaluation results via Git-based .eval_results YAML submissions, and display scores.
fromYanko Design - Modern Industrial Design News
1 month ago

Nvidia wants robots to learn before executing tasks by watching 44,000 hours of human video - Yanko Design

The robotics industry, for now, faces the biggest challenge in teaching robots to operate in the messy real world. The unstructured environment means robots need massive amounts of data to learn. Gathering and structuring that data is the costliest thing in robotics and perhaps the biggest impediment, slowing the entire development process.
Artificial intelligence
fromComputerworld
2 months ago

Testing can't keep up with rapidly advancing AI systems: AI Safety Report

AI systems continued to advance rapidly over the past year, but the methods used to test and manage their risks did not keep pace, according to the International AI Safety Report 2026. The report, produced with inputs from more than 100 experts across over 30 countries, said that pre-deployment testing was increasingly failing to reflect how AI systems behaved once deployed in real-world environments, creating challenges for organisations that had expanded their use of AI across software development, cybersecurity, research, and business operations.
Artificial intelligence