
"looked professional, but had no specific, deep feedback"
"This made us realize that there was a serious problem."
"An editor cannot be an expert in everything. If they receive a very persuasive AI-written negative review, it could easily influence their decision."
Claude 2.0 was given the original, unedited manuscripts of 20 cancer-biology papers from eLife and generated peer-review reports and related documents for each. The AI-written reviews looked professional but lacked specific, deep feedback, while offering plausible citation suggestions and convincing rejection recommendations. Detection tools often failed to flag their AI origin: ZeroGPT misclassified 60% of the reviews as human-written, and GPTZero misclassified over 80%. Because editors cannot be experts in every field, a persuasive AI-written negative review could sway their decisions and lead to the rejection of meritorious papers. Stakeholders also disagree about which uses of large language models in referee reports are acceptable, creating challenges for journals.
Read at Nature