
"The core technique is indirect prompt injection. Rather than attacking the AI model directly, the researcher embedded malicious instructions in places the agents were designed to trust: PR titles, issue descriptions, and comments."
"Against Anthropic's Claude Code Security Review, Guan crafted a PR title containing a prompt injection payload. Claude executed the embedded commands and included the output, including leaked credentials, in its JSON response."
"The vulnerabilities, disclosed by researcher Aonan Guan over several months, affect AI tools that integrate with GitHub Actions: Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and GitHub's Copilot Agent."
Aonan Guan conducted prompt injection attacks on AI agents from Anthropic, Google, and Microsoft, successfully stealing API keys and tokens. The vulnerabilities affected tools like Claude Code Security Review, Gemini CLI Action, and Copilot Agent, which failed to distinguish between legitimate content and injected instructions. Guan embedded malicious commands in trusted areas such as pull request titles and issue descriptions. Despite the successful exploits, the companies paid bug bounties but did not issue public advisories or assign CVEs, leaving users unaware of the risks associated with older versions.
Read at TNW | Anthropic
Unable to calculate read time
Collection
[
|
...
]