Millions of lost orders, website errors, and 'sharp edges': Amazon cracks down on code changes
Briefly

"A 'trend of incidents' emerged since the third quarter of 2025, including 'several major' incidents in the last few weeks. Problems included what he described as 'high blast radius changes,' where software updates propagated broadly because control planes lacked suitable safeguards. In other cases, data corruption took hours to unwind. Some failures were traced back to basic mechanisms, such as a requirement to have two people authorize code changes, that were either lacking or bypassed."
"We are implementing temporary safety practices which will introduce controlled friction to changes in the most important parts of the Retail experience. In parallel, we will invest in more durable solutions including both deterministic and agentic safeguards."
Amazon has experienced multiple significant outages affecting its e-commerce operations since Q3 2025, with at least one disruption traced to its AI coding assistant Q. SVP Dave Treadwell identified a trend of incidents involving high blast radius changes, inadequate control plane safeguards, data corruption that took hours to unwind, and two-person authorization requirements that were missing or bypassed. In response, Amazon is introducing tighter controls requiring engineers to document code changes more thoroughly and obtain additional approvals. The company describes these as temporary safety practices that add controlled friction to changes in the most important parts of the Retail experience, and it plans to invest in more durable deterministic and agentic safeguards.
Read at Business Insider