#benchmark-testing

Artificial intelligence
from HackerNoon
6 days ago

Chameleon AI Shows Competitive Edge Over LLaMa-2 and Other Models | HackerNoon

Chameleon exhibits competitive performance against leading text-only language models, excelling particularly in commonsense reasoning.
The evaluations indicate that Chameleon can outperform larger models such as Llama-2 on some benchmarks.
Artificial intelligence
from TechCrunch
3 weeks ago

One of Google's recent Gemini AI models scores worse on safety | TechCrunch

Gemini 2.5 Flash scores lower on safety tests than Gemini 2.0 Flash, raising concerns about AI safety compliance.
#openai
from TechCrunch
1 month ago
Artificial intelligence

OpenAI's o3 AI model scores lower on a benchmark than the company initially implied | TechCrunch
