Google caused outage by ignoring its quality protections
Briefly

Last week's outage of Google Cloud lasted for at least three hours, causing widespread disruptions, including to partners like Cloudflare. The issue arose from a new feature added to Google Cloud's Service Control, which underwent a flawed rollout without essential testing procedures being triggered. This oversight led to a failure when an error occurred in the binary, causing service outages. Google admitted shortcomings in their communication and pledged to enhance responsiveness to customer concerns during such events.
Google Cloud's recent outage was caused by a failed code change in its Service Control feature, resulting in three hours of downtime for various customers.
The failure stemmed from a new feature that wasn’t properly tested during rollout, as certain conditions weren’t triggered that would have accessed the faulty code.
Google acknowledged the problems with their external communication during the incident, promising to improve how information is conveyed to customers in future outages.
Despite the inherent risks in launching new features, Google noted that the malfunction originated from a lack of error handling and adequate safety measures.
Read at Theregister
[
|
]