We Designed a Study to See If AI Can Imitate Real Software Bugs | HackerNoon
Briefly

The article examines how effectively large language models (LLMs) generate software mutations, particularly for Java programs. It is structured around five research questions that evaluate LLM performance in terms of cost, usability, and behavioral similarity to real bugs. The study also investigates how different prompts and LLM architectures influence these outcomes, and it identifies root causes of errors in mutation generation. This analysis aims to deepen the understanding of LLM capabilities in software-testing contexts.
Our study investigates the capabilities of existing LLMs in mutation generation, focusing on performance evaluation, prompt engineering strategies, and root cause analysis of underperforming aspects.
We design research questions that target LLM performance in mutation generation with respect to cost, usability, and behavioral similarity to real bugs, as well as the impact of different prompts and models.
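For context, a mutation in mutation testing is a small syntactic change to a program that is meant to mimic a real fault; a test suite is judged by how many such mutants it can "kill" (detect). A minimal Java sketch, with illustrative names not taken from the study:

```java
public class MutationDemo {
    // Original method under test.
    static int add(int a, int b) {
        return a + b;
    }

    // Mutant: arithmetic operator replacement (+ becomes -),
    // a classic mutation operator mimicking a typo-style bug.
    static int addMutant(int a, int b) {
        return a - b;
    }

    public static void main(String[] args) {
        // This input "kills" the mutant: original and mutant disagree.
        System.out.println(add(2, 3) == addMutant(2, 3));
        // This input fails to kill it: both return 2, so a test suite
        // using only such inputs would miss the injected fault.
        System.out.println(add(2, 0) == addMutant(2, 0));
    }
}
```

LLM-based mutation generation, as studied in the article, asks a model to propose such edits directly, rather than applying a fixed set of syntactic operators.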
Read at Hackernoon