This article presents a comparative study of mutation generation using Large Language Models (LLMs) versus traditional tools. In the experimental evaluation, LLMs such as GPT-3.5 and CodeLlama produce more mutations than PIT and Major, but at higher cost and with longer generation times. The results point to a trade-off between mutation quantity and efficiency: LLMs may yield a more comprehensive set of mutations, yet their slower, more expensive generation raises questions about their practical utility.
The analysis shows that GPT-3.5 and CodeLlama-30bInstruct generate more mutations, while the traditional methods are significantly faster and cheaper.
The trade-off is therefore between the quantity of mutations produced by the LLM approaches and the cost-efficiency of the traditional ones.
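One way to make this trade-off concrete is to normalize each approach's cost and generation time by the number of mutations it produces. The following is a minimal sketch, not the study's own methodology: the `GenerationRun` type, the `per_mutant` helper, the approach names, and all numbers are hypothetical placeholders chosen for illustration and are not results from the evaluation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GenerationRun:
    """One mutation-generation run (hypothetical record, not from the study)."""
    name: str
    mutants: int      # number of mutations produced
    cost_usd: float   # total monetary cost of the run
    seconds: float    # total generation time


def per_mutant(run: GenerationRun) -> Tuple[float, float]:
    """Return (cost per mutation, seconds per mutation) for a run."""
    if run.mutants <= 0:
        raise ValueError("run produced no mutations")
    return run.cost_usd / run.mutants, run.seconds / run.mutants


# Placeholder inputs for illustration only; NOT measurements from the study.
runs = [
    GenerationRun("approach_a", mutants=200, cost_usd=4.0, seconds=1000.0),
    GenerationRun("approach_b", mutants=100, cost_usd=0.0, seconds=50.0),
]

for run in runs:
    cost, secs = per_mutant(run)
    print(f"{run.name}: {cost:.4f} USD/mutation, {secs:.2f} s/mutation")
```

Normalizing per mutation keeps an approach that simply generates more from looking worse on raw totals alone; whether the extra mutations are worth their per-unit cost is a separate question that these two ratios do not answer.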