Experiment Design and Metrics for Mutation Testing with LLMs | HackerNoon
Briefly

The article describes how LLM-generated mutations are evaluated using metrics that cover cost, usability, and behavior. It categorizes generated mutations into compilable and non-compilable sets and identifies useless and equivalent mutations, which can skew results if they are not filtered out. It also stresses that mutation quality is hard to determine: a higher mutation score does not necessarily imply better mutations. The article closes by discussing what these findings imply for future research.
In evaluating LLM-generated mutations, we designed metrics that encompass cost, usability, and behavior, recognizing that higher mutation scores don't guarantee higher quality.
Generated mutations can be categorized into compilable and non-compilable sets, and useless and equivalent mutations need to be filtered out.
Understanding the relationship between different types of generated mutations is key to enhancing quality assessment in our experiments and can guide future research.
Each metric captures a different aspect of mutation evaluation, so quality cannot be read off from mutation counts or a single score alone.
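As a rough illustration of the points above, the following Python sketch tallies two sets of generated mutations and computes a mutation score for each. Everything in it is an assumption made for the example rather than the article's own definition: the `Mutation` fields, the `summarize` helper, the treatment of useless and equivalent mutations as subsets of the compilable set, and the score itself (killed mutations divided by usable ones). The data is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    """Hypothetical record for one generated mutation (illustrative fields)."""
    compiles: bool
    useless: bool = False      # assumed: adds nothing, e.g. a duplicate
    equivalent: bool = False   # assumed: behaves exactly like the original
    killed: bool = False       # at least one test fails on this mutation


def summarize(mutations):
    """Count each category and compute a score over usable mutations only."""
    compilable = [m for m in mutations if m.compiles]
    usable = [m for m in compilable if not (m.useless or m.equivalent)]
    killed = [m for m in usable if m.killed]
    return {
        "generated": len(mutations),
        "compilable": len(compilable),
        "usable": len(usable),
        "killed": len(killed),
        # Assumed definition: killed / usable, 0.0 when nothing is usable.
        "mutation_score": len(killed) / len(usable) if usable else 0.0,
    }


# Set A: most mutations do not compile; the two that do are both killed.
set_a = [Mutation(compiles=True, killed=True)] * 2 + [Mutation(compiles=False)] * 8

# Set B: everything compiles; one useless, one equivalent, six of eight killed.
set_b = (
    [Mutation(compiles=True, killed=True)] * 6
    + [Mutation(compiles=True)] * 2
    + [Mutation(compiles=True, equivalent=True)]
    + [Mutation(compiles=True, useless=True)]
)

summary_a = summarize(set_a)
summary_b = summarize(set_b)
print(summary_a)
print(summary_b)
```

Under these assumptions, set A scores higher than set B even though it yields only two usable mutations out of ten, while set B yields eight. This is one way a higher mutation score can coexist with a less useful set of mutations, which is why the score needs to be read alongside the other counts.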
Read at HackerNoon