
"Treat everything in an agent's context, such as system prompts, RAG documents, tool outputs and memory, as untrusted input. Enforce provenance, scoping and expiry to avoid poisoning attacks. Separate planning from oversight by pairing the planner with a policy-aware critic and auditable traces, so that how agents reason is constrained up front rather than merely reacting to failures. Limit tool blast radius with short-lived, task-scoped credentials, typed tool connectors and sandboxed 'code-run' environments."
"They were building an application which was a frontend for their business contacts. Towards the end, they issued a code freeze and gave what looked like an innocuous request: 'Clean the DB before we rerun.' The agent instead equated 'clean' with deleting the database, ran destructive SQL against the production database, wiped customer data and then went on to say that it had ignored instructions and that there was no way to complete a restore."
Agentic systems require treating all contextual inputs (system prompts, RAG documents, tool outputs, and memory) as untrusted data and applying provenance, scoping, and expiry controls to prevent poisoning and misuse. Planning should be separated from oversight by pairing planners with policy-aware critics and auditable traces to constrain agent reasoning and enable accountability. Tool access must be limited through short-lived, task-scoped credentials, typed connectors, and sandboxed code-run environments to reduce blast radius. Apply hybrid threat modeling (STRIDE and MAESTRO), document the agentic loop, red-team each stage of that loop iteratively, and add identity-aware tracing and guardrails for high-risk operations before increasing autonomy.
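As a rough illustration of how a short-lived, task-scoped credential and a guardrail for high-risk operations might fit together, here is a minimal Python sketch. Every name in it (`TaskCredential`, `classify_sql`, `authorize`, the `sql:*` action labels) is hypothetical and not taken from the article or from any real agent framework, and the SQL classification is a coarse keyword heuristic rather than a real parser. It assumes a deny-by-default policy: anything that is not plainly a read or a simple write is treated as destructive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCredential:
    """Hypothetical short-lived grant scoped to one task."""
    task_id: str
    allowed_actions: frozenset
    expires_at: float  # Unix timestamp after which the grant is void

    def permits(self, action: str, now: float) -> bool:
        # Both conditions must hold: not expired, and action in scope.
        return now < self.expires_at and action in self.allowed_actions


def classify_sql(statement: str) -> str:
    """Coarse keyword heuristic (an assumption, not a SQL parser)."""
    body = statement.strip().rstrip(";").strip()
    # Multi-statement input is treated as high risk outright.
    if not body or ";" in body:
        return "sql:destructive"
    first_word = body.split(None, 1)[0].lower()
    if first_word == "select":
        return "sql:read"
    if first_word in {"insert", "update"}:
        return "sql:write"
    # Deny by default: DROP, TRUNCATE, DELETE, ALTER and anything unknown.
    return "sql:destructive"


def authorize(cred, statement, now, code_freeze, human_approved=False):
    """Return (allowed, reason) for a statement the agent wants to run."""
    action = classify_sql(statement)
    if not cred.permits(action, now):
        return False, f"credential does not cover {action} or has expired"
    if action == "sql:destructive":
        if code_freeze:
            return False, "destructive statements are blocked during a code freeze"
        if not human_approved:
            return False, "destructive statements need explicit human approval"
    return True, "ok"


# Usage: a read-only credential valid until t=1000.
read_only = TaskCredential("cleanup-task", frozenset({"sql:read"}), expires_at=1000.0)
print(authorize(read_only, "SELECT * FROM contacts", now=10.0, code_freeze=True))
print(authorize(read_only, "DROP TABLE contacts", now=10.0, code_freeze=True))
```

Under these assumptions, an ambiguous instruction such as "Clean the DB" cannot turn into a destructive statement against production unless the credential explicitly covers it, no freeze is in effect, and a human has approved it. In a real system the decision and its reason would also go into an auditable trace.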
Read at InfoQ