Amazon reveals cause of AWS outage that took everything from banks to smart beds offline

"AWS said customers were unable to connect to DynamoDB, its database system where AWS customers store [their data], due to a latent defect within the service's automated DNS [domain name system] management system. DynamoDB maintains hundreds of thousands of DNS records. It uses automation to monitor the system to ensure records are updated frequently to ensure additional capacity is added as required, hardware failures are handled and traffic is distributed efficiently."
"The root cause of the issue, AWS said, was an empty DNS record for the Virginia-based US-East-1 datacentre region. The bug failed to automatically repair, and required manual operator intervention to correct. AWS said it had disabled the DynamoDB DNS planner and DNS enactor automation worldwide while it fixes the conditions that led to the outage and adds extra protections."
"Platforms including Signal, Snapchat, Roblox, Duolingo, as well as services such as banking sites and the Ring doorbell company were some of the 2,000 companies affected by the outage, according to Downdetector, a site that monitors internet outages, with more than 8.1m reports of problems from users across the world. While services were restored in a matter of hours, the impact of the outage was felt widely."
AWS attributed a multi-hour outage to a latent defect in DynamoDB's automated DNS management system. DynamoDB maintains hundreds of thousands of DNS records and uses automation to update them, add capacity, handle hardware failures, and distribute traffic. The root cause was an empty DNS record for the US-East-1 (Virginia) datacentre region that the automation failed to repair, so operators had to correct it manually. AWS has disabled the DynamoDB DNS planner and DNS enactor automation worldwide while it fixes the conditions that led to the outage and adds extra protections. The outage affected about 2,000 companies, including Signal, Snapchat, Roblox, Duolingo, banking sites, Ring, and Eight Sleep smart beds. Services were restored within hours, but Downdetector recorded more than 8.1 million problem reports worldwide.
Read at www.theguardian.com