Major AWS Outage Takes Down 500+ Companies Including Fortnite, Snapchat, Canva, and Alexa

What Happened This Morning (The Critical First Minute)
On October 20, 2025, Amazon Web Services experienced a massive outage that cascaded across the internet, taking down over 500 companies and leaving millions of users unable to access the services they depend on daily. A DNS resolution failure in US-EAST-1, one of AWS's most critical regions, triggered a domino effect that spread globally within minutes.
The impact was immediate and widespread. Popular apps like Fortnite, Snapchat, Alexa, and design tool Canva all went offline. Major cloud services including AWS Lambda, RDS databases, IAM (identity management), and API Gateway stopped working. The AWS Support Center itself crashed, meaning customers couldn't even report their problems. Within just two hours, Downdetector recorded over 4 million outage reports worldwide, with more than 400,000 coming from the UK alone.
Here's the key takeaway: 72 AWS services went down because a single DNS problem in one region brought the entire interconnected infrastructure crashing down. AWS engineers identified the root cause in under two hours and had most services recovering within three hours, but the speed of recovery doesn't change one critical fact: a single infrastructure failure can disable hundreds of companies simultaneously.
Read on to understand exactly what went wrong, which services were affected, and what this means for the future of cloud computing.
Which Services Went Down
A total of 72 AWS services were impacted by this outage. Here are the ones that affected real users and businesses:
Consumer Apps You Know: Fortnite went completely offline, leaving millions of gamers unable to access their accounts or play. Snapchat disappeared for users relying on the messaging app. Alexa stopped responding to voice commands across millions of households. Canva, the design platform that millions use to create graphics and presentations, went down during business hours. Amazon's own gaming services and streaming platforms all became inaccessible.
Critical Business Infrastructure: AWS Lambda (serverless computing), Amazon RDS (relational databases), Amazon DynamoDB (NoSQL databases), IAM (authentication and permissions), API Gateway, Amazon CloudWatch (monitoring), Amazon CloudFront (content delivery), and Amazon SageMaker (machine learning) all stopped functioning. For thousands of companies, these services are the backbone of their operations.
Enterprise Services: AWS Systems Manager, AWS Secrets Manager, AWS Organizations, AWS Transfer Family, and many compliance and security tools went offline. Companies managing infrastructure across multiple AWS accounts were completely blind.
Global Services: AWS global services relying on US-EAST-1 endpoints also failed, including IAM updates and DynamoDB Global Tables. The AWS Support Center and Support API went down, preventing customers from opening tickets or getting help.
The 72 affected services represent a cross-section of AWS's entire product portfolio, from storage to networking to machine learning to identity management.
How Did This Happen? The Technical Story
The Root Cause: DNS Resolution Failure
AWS identified the issue at 2:01 AM PDT: a DNS resolution failure for the DynamoDB API endpoint in US-EAST-1.
DNS is essentially the internet's address book. When you type a web address or your app tries to connect to a server, DNS converts that name into an IP address. If DNS fails, the connection fails. Every request just bounces back with an error.
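To make that concrete, here's a minimal Python sketch of the lookup a client performs before it can talk to DynamoDB. The endpoint name is the real public one for US-EAST-1; everything else is illustrative, not AWS's internal tooling:

```python
import socket

# Public DynamoDB API endpoint for the affected region.
ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

try:
    # Ask DNS for the IP addresses behind the endpoint name.
    records = socket.getaddrinfo(ENDPOINT, 443, proto=socket.IPPROTO_TCP)
    for *_, sockaddr in records:
        print(f"{ENDPOINT} -> {sockaddr[0]}")
except socket.gaierror as err:
    # During the outage, lookups like this failed before a request
    # ever reached DynamoDB itself.
    print(f"DNS resolution failed for {ENDPOINT}: {err}")
```

When that lookup fails, nothing downstream matters: no IP address means no connection, no API call, no data.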
In this case, requests to DynamoDB couldn't find the correct IP address for the service. That would have been bad enough if it only affected DynamoDB, but here's the critical part: many other AWS services depend on DynamoDB or share the same underlying infrastructure and systems. When DynamoDB went down, the failure cascaded outward.
Why It Spread Globally
This is the key to understanding why a single region problem became a global catastrophe.
US-EAST-1 is AWS's oldest and most utilized region. It's where many companies first launched their services. More importantly, AWS uses this region as a backend for global services like IAM (identity and access management), which handles authentication and permissions for every AWS customer worldwide. When US-EAST-1 went down, these global services failed too.
Think of it like a major power plant going offline. It doesn't just affect the city directly connected to it; it cascades through the entire power grid because everything is interconnected. When one piece of critical infrastructure fails spectacularly, dependent systems fail too.
Companies that had distributed their workloads across multiple regions often still had a dependency on US-EAST-1 for core services. They thought they were protected by redundancy, but they weren't.
AWS's Response and Recovery
AWS ran a well-organized incident response. Here's how it unfolded, minute by minute (all times PDT):
12:11 AM: Engineers detected increased error rates and latencies. Investigation began immediately.
1:26 AM: AWS confirmed significant error rates for DynamoDB in US-EAST-1 and acknowledged that the issue was affecting other services in the region.
2:01 AM: Root cause identified as a DNS resolution issue. AWS activated multiple mitigation paths in parallel rather than waiting for any single fix to prove itself.
2:22 AM: Initial mitigations applied. Early signs of recovery visible. AWS warned that requests might continue failing as the team worked toward full resolution.
2:27 AM: Significant recovery signs appearing. Most requests starting to succeed again.
3:03 AM: Global services and features relying on US-EAST-1 confirmed recovered. Full resolution in progress.
The entire incident from first detection to recovery took about three hours. That might sound quick, but for companies paying per second for cloud resources or losing revenue with every minute of downtime, three hours is devastating.
The Hidden Backlog Problem
Here's something most people don't realize: even after services came back online, the work wasn't done. Millions of requests had queued up while the system was down. These couldn't be instantly processed. AWS warned customers about significant latency and recommended retrying failed requests. Clearing this backlog meant additional failures and delays even after the outage ended.
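In SDK terms, "retry failed requests" mostly means leaning on the retry behavior the AWS SDKs already ship with rather than hammering the API in a tight loop. A minimal boto3 sketch, assuming a hypothetical table named orders; the retry settings are real SDK options, the table and key are made up:

```python
import boto3
from botocore.config import Config

# "adaptive" retry mode layers client-side rate limiting on top of
# exponential backoff, which helps when a recovering service is still
# shedding load and clearing its own backlog.
retry_config = Config(
    region_name="us-east-1",
    retries={"max_attempts": 10, "mode": "adaptive"},
)

dynamodb = boto3.client("dynamodb", config=retry_config)

# The SDK retries this call with backoff before surfacing an error.
response = dynamodb.get_item(
    TableName="orders",                  # hypothetical table
    Key={"order_id": {"S": "12345"}},    # hypothetical key
)
print(response.get("Item"))
```

Hand-rolled retry loops work too, but without exponential backoff and jitter they just add more traffic to the very backlog that's slowing recovery down.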
What This Reveals About Cloud Infrastructure
The Single Region Risk: Many companies use only one AWS region for cost and simplicity. This outage proves that's dangerous. If you run everything in US-EAST-1, you're exposed to exactly the kind of failure that just played out. Multi-region redundancy isn't just nice to have; it's essential.
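To make "multi-region redundancy" less abstract, here's a minimal sketch of a read path that falls back to a second region, assuming a hypothetical DynamoDB Global Table named orders replicated to us-west-2. It's one pattern among many, not a complete disaster-recovery design:

```python
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

TABLE_NAME = "orders"                  # hypothetical Global Table
REGIONS = ["us-east-1", "us-west-2"]   # primary first, then fallback


def get_order(order_id: str):
    """Try each regional replica in turn until one answers."""
    last_error = None
    for region in REGIONS:
        client = boto3.client("dynamodb", region_name=region)
        try:
            resp = client.get_item(
                TableName=TABLE_NAME,
                Key={"order_id": {"S": order_id}},
            )
            return resp.get("Item")
        except (ClientError, EndpointConnectionError) as err:
            last_error = err  # note the failure and try the next region
    raise RuntimeError(f"all regions failed: {last_error}")


print(get_order("12345"))
```

Reads fail over cleanly this way; writes during a regional outage are harder, because Global Tables replicate asynchronously and conflicting writes have to be reconciled after the fact.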
Interconnected Infrastructure is Fragile: Cloud efficiency comes from shared infrastructure and interconnected systems. That's how AWS keeps costs down and services running fast. But it also means one failure point can cascade globally. There's no way around this tradeoff.
Global Services Have Hidden Dependencies: Many companies thought they were protected because they used AWS's global services and distributed their workloads. They didn't realize those global services still depend on specific regions for their backend operations. The dependency wasn't visible until it failed.
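A concrete example of this kind of hidden dependency is AWS STS, the token service that issues the temporary credentials IAM-based authentication runs on: its legacy global endpoint, sts.amazonaws.com, is served out of US-EAST-1, so a workload running entirely in another region can still quietly depend on that one region. A minimal sketch of pinning a client to a regional endpoint instead (the region here is just an example):

```python
import boto3

# The legacy global STS endpoint (sts.amazonaws.com) lives in US-EAST-1.
# Creating the client against a regional endpoint keeps authentication
# traffic inside the region the workload actually runs in.
sts = boto3.client(
    "sts",
    region_name="eu-west-1",
    endpoint_url="https://sts.eu-west-1.amazonaws.com",
)

# Same API call, now resolved and served entirely within eu-west-1.
identity = sts.get_caller_identity()
print(identity["Account"], identity["Arn"])
```

Recent SDKs also expose this as a configuration setting (AWS_STS_REGIONAL_ENDPOINTS=regional), which avoids hard-coding endpoint URLs.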
Even Giants Aren't Immune: AWS is the world's largest cloud provider with 99.99% uptime SLAs. Yet here we are. This doesn't mean AWS is unreliable; it means complete immunity to outages is impossible at this scale. What matters is how quickly problems are detected and fixed (AWS was fast) and whether the businesses built on top of the cloud are designed to survive partial failures (most aren't).
The Bottom Line
A DNS resolution failure in AWS US-EAST-1 took down 72 services affecting over 500 companies in just minutes. Fortnite went offline, Snapchat disappeared, Canva stopped working, and millions of businesses lost functionality they depend on. AWS resolved the technical issue within three hours, but the incident revealed a critical vulnerability in how the modern internet operates: everything is interconnected, and single points of failure can have massive ripple effects.
For companies using AWS, this is a wake-up call. For the cloud industry, it's a reminder that distributed infrastructure is only as strong as its most critical components.
The services are back online now. But the question many CIOs are asking themselves today is: could this happen to us, and are we prepared if it does?
Were you affected by the outage? What service did you need most when it went down? Share your experience in the comments below.