#code-evaluation

from HackerNoon
1 year ago

Evaluating GPT and Open-Source Models on Code Mutation Tasks | HackerNoon

Closed-source LLMs typically outperform open-source models on key metrics, which underscores the importance of training data quality and model architecture.
Scala
from HackerNoon
7 months ago

Inside the Evaluation Pipeline for Code LLMs With LuaUnit | HackerNoon

To streamline and standardize the automated evaluation procedure, we translated the native assertions in MCEVAL to LuaUnit-based assertions, improving consistency across benchmarks.
Scala