LM Arena, the organization behind popular AI leaderboards, lands $100M | TechCrunch
LM Arena, the crowdsourced benchmarking project that has become essential to AI labs, has raised $100 million in seed funding to further its mission of evaluating AI models.
OpenAI's o3: AI Benchmark Discrepancy Reveals Gaps in Performance Claims
Benchmark results for OpenAI's o3 model differed significantly from the company's earlier claims, underscoring the complexity and variability of AI evaluations.
AI benchmarking platform Chatbot Arena forms a new company | TechCrunch
Chatbot Arena is forming a company, Arena Intelligence Inc., to significantly expand its benchmarking capabilities while maintaining its neutrality in AI testing.
Last week, a post on X claimed that Google's Gemini model had surpassed Anthropic's Claude model at Pokémon, stirring controversy over how AI benchmarks are implemented.