#model-benchmarking tag

AI models are getting better at replacing cybersecurity pros on certain tasks

Frontier LLMs are rapidly improving cybersecurity task efficiency, with measurable time-horizon doubling occurring in months rather than years.

from Computerworld

4 months ago

OpenAI's GPT is getting better at mathematics

OpenAI's GPT-5.2 Pro does better at solving sophisticated math problems than older versions of the company's top large language model, according to a new study by Epoch AI, a non-profit research institute.

Artificial intelligence

from Big Think

5 months ago

Inside the meteoric rise of Mercor

Expert-labeled AI evaluations propelled Mercor's rapid growth, becoming a critical industry benchmark and revenue driver for top model developers and tech giants.

Artificial intelligence

from LogRocket Blog

6 months ago

AI dev tool power rankings & comparison [Nov 2025]

An evidence-based power ranking and 50+ feature comparison identifies top AI models and AI-powered development tools for frontend development as of November 2025.

Artificial intelligence

from The Register

7 months ago

AI chatbots carry hidden biases baked into their design

Large language models exhibit variable political biases that can skew outputs and influence real-world decisions like voting advice.

from Futurism

7 months ago

OpenAI Releases List of Work Tasks It Says ChatGPT Can Already Replace

ChatGPT maker OpenAI has released a new evaluation, dubbed GDPval, to measure how well its AIs perform on "economically valuable, real-world tasks across 44 occupations." "People often speculate about AI's broader impact on society, but the clearest way to understand its potential is by looking at what models are already capable of doing," the company wrote in an accompanying blog post. "Evaluations like GDPval help ground conversations about future AI improvements in evidence rather than guesswork, and can help us track model improvement over time," OpenAI added.

Artificial intelligence
