Amazon released a summary of the S3 service disruption in the US-EAST-1 Region yesterday. The outage affected many sites, apps, and utilities that rely on the service.
An S3 team member running an established playbook to remove a small number of servers accidentally entered a command incorrectly, and a larger set of servers was removed than intended.
The servers that were inadvertently removed supported two other S3 subsystems. One of these subsystems, the index subsystem, manages the metadata and location information of all S3 objects in the region. This subsystem is necessary to serve all GET, LIST, PUT, and DELETE requests. The second subsystem, the placement subsystem, manages allocation of new storage and requires the index subsystem to be functioning properly to correctly operate. The placement subsystem is used during PUT requests to allocate storage for new objects. Removing a significant portion of the capacity caused each of these systems to require a full restart. While these were being restarted, S3 was unable to service requests. Other AWS services in the US-EAST-1 Region that rely on S3 for storage, including the S3 console, Amazon Elastic Compute Cloud (EC2) new instance launches, Amazon Elastic Block Store (EBS) volumes (when data was needed from a S3 snapshot), and AWS Lambda were also impacted while the S3 APIs were unavailable.
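To make the dependency chain in that description easier to follow, here is a minimal sketch of it. It is only an illustrative model: the dictionaries and function names are hypothetical and say nothing about how S3 is actually built. It encodes just what the summary states, namely that every request type needs the index subsystem, that PUT also needs the placement subsystem, and that placement cannot operate without index.

```python
# Illustrative model only. These names are hypothetical, not AWS internals.

# Which subsystems each request type needs, per the summary.
REQUEST_NEEDS = {
    "GET": {"index"},
    "LIST": {"index"},
    "DELETE": {"index"},
    "PUT": {"index", "placement"},
}

# Which subsystems each subsystem itself depends on.
SUBSYSTEM_NEEDS = {
    "index": set(),
    "placement": {"index"},
}


def is_operational(subsystem, running):
    """A subsystem works only if it is running and everything it depends on works."""
    if subsystem not in running:
        return False
    return all(is_operational(dep, running) for dep in SUBSYSTEM_NEEDS[subsystem])


def servable_requests(running):
    """Return the request types that can be served given the running subsystems."""
    return sorted(
        request
        for request, needs in REQUEST_NEEDS.items()
        if all(is_operational(subsystem, running) for subsystem in needs)
    )


print(servable_requests({"index", "placement"}))  # ['DELETE', 'GET', 'LIST', 'PUT']
print(servable_requests({"index"}))               # ['DELETE', 'GET', 'LIST']
print(servable_requests(set()))                   # []
```

With both subsystems down for a restart, the model serves nothing, which matches the summary's statement that S3 was unable to service requests during that time.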
Amazon goes on to say that, because of S3's massive growth over the last several years, the process of restarting these subsystems and running the necessary safety checks to validate the integrity of the metadata took longer than expected.
Finally, because the status dashboard itself relied on S3, Amazon was unable to update the status icons and had to rely on Twitter and a note at the top of the dashboard.