The Internet Just Had Its Worst Nightmare: Microsoft Azure AND AWS Went Down Simultaneously

When Two Giants Fall
On October 29, 2025, the internet experienced what tech experts are calling a "perfect storm" outage. Both Microsoft Azure and Amazon Web Services (AWS) went down simultaneously around 9 a.m. Pacific time (12:00 PM ET), creating a cascading failure that knocked out massive chunks of the internet at once.
According to Downdetector, more than 16,600 users reported problems with Microsoft Azure and nearly 9,000 with Microsoft 365, while AWS experienced its second major outage in just over a week. The timing couldn't have been worse: Microsoft's outage hit just hours before the company was scheduled to report quarterly earnings.
Here's what went offline: Xbox, Minecraft, Microsoft 365 (Outlook, Teams, Word, Excel), Azure Portal, Starbucks systems, Kroger, Costco, and countless other services. On the AWS side, the notorious US-EAST-1 region experienced problems again, affecting services that were just recovering from last week's massive outage.
Both companies confirmed DNS issues as the root cause. Microsoft stated it began "experiencing DNS issues resulting in availability degradation of some services" at approximately 16:00 UTC, which lines up with the 9 a.m. Pacific spike in reports. This marks the second time in nine days that DNS problems have taken down major portions of the internet's infrastructure.
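For a sense of what a DNS failure looks like from the client side, here is a minimal Python sketch (standard library only) that times name resolution for a few hostnames. The hostnames are illustrative stand-ins, not the endpoints either company actually monitors.

```python
# Client-side view of DNS health: time name resolution and report failures.
# Illustrative only; the hostnames below are stand-ins for your own dependencies.
import socket
import time

HOSTNAMES = ["portal.azure.com", "outlook.office.com", "dynamodb.us-east-1.amazonaws.com"]

for name in HOSTNAMES:
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(name, 443, proto=socket.IPPROTO_TCP)
        addresses = sorted({info[4][0] for info in infos})
        print(f"{name}: resolved in {time.monotonic() - start:.2f}s -> {addresses}")
    except socket.gaierror as exc:
        # During a DNS outage this is the branch you hit: lookups stall and fail,
        # and everything behind the name goes dark even if its servers are fine.
        print(f"{name}: lookup failed after {time.monotonic() - start:.2f}s ({exc})")
```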
The big question everyone's asking: Is this a coincidence, or is something fundamentally broken with cloud infrastructure? Read on to understand what's really happening behind the scenes.
The Double Knockout: What Services Were Affected
Microsoft's Casualties
Microsoft 365, Azure, Xbox, Microsoft Store, Minecraft, and the Azure Portal all experienced simultaneous outages. For businesses, this meant:
Productivity Ground to a Halt: Microsoft 365 is the backbone of millions of companies worldwide. Outlook email, Teams video calls, SharePoint document sharing, and the entire Office suite became inaccessible. Employees showed up for work only to find they couldn't access any of their files or communication tools.
Gaming Interrupted: Xbox Live, Minecraft, and other Microsoft gaming services went down, leaving millions of gamers unable to play or access their accounts during peak hours.
Enterprise Systems Crashed: Major Microsoft customers like Starbucks, Kroger, and Costco also experienced outage spikes, suggesting their point-of-sale or inventory systems rely on Azure infrastructure.
Government and Healthcare Impact: Azure powers numerous government contracts and healthcare systems. When Azure goes down, the ripple effects touch critical infrastructure that people depend on for essential services.
AWS's Second Strike
AWS's US-EAST-1 region experienced issues with EC2 instance launches and ECS task failures, though the company initially disputed the severity of the outage. This is particularly concerning because:
Just One Week After Disaster: Last week's AWS outage on October 20 took down Fortnite, Snapchat, Reddit, Ring doorbells, and banking services for hours. That outage was caused by two automated systems trying to update the same DNS data simultaneously, creating what engineers call a "race condition."
US-EAST-1 Strikes Again: 78% of AWS outage reports pointed to the US-EAST-1 region, the same region at the center of last week's chaos. This region is AWS's oldest and most critical, hosting infrastructure for countless global services.
Infrastructure Fragility Exposed: AWS acknowledged that many services have internal dependencies, meaning when one service fails, others cascade. The October 20 outage revealed how DynamoDB failures spread to EC2, Lambda, and dozens of other services.
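To make that "race condition" concrete, here is a deliberately simplified Python sketch of the failure mode: two automated workers read the same record, then write back conflicting updates without coordinating. The record name and update logic are invented for illustration; this is not AWS's actual DNS management code.

```python
# Toy race condition between two automated systems managing the same record.
# Invented for illustration; not AWS's real tooling.
import threading
import time

# A shared DNS-like record that both automations are allowed to rewrite.
record = {"name": "service.example.internal", "version": 1, "targets": ["10.0.0.1"]}

def automated_updater(worker_id, new_targets):
    snapshot = dict(record)          # 1. read the current state
    time.sleep(0.01)                 # 2. simulate validation and planning delay
    record["targets"] = new_targets  # 3. write back, unaware the other worker also wrote
    record["version"] = snapshot["version"] + 1
    print(f"{worker_id}: wrote version {record['version']} with targets {new_targets}")

# One worker applies a fresh plan; the other applies a stale cleanup that empties the record.
workers = [
    threading.Thread(target=automated_updater, args=("plan-applier", ["10.0.0.2"])),
    threading.Thread(target=automated_updater, args=("stale-cleanup", [])),
]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Depending on which write lands last, the record can end up empty, so the name
# stops resolving even though a correct update was written moments earlier.
print("final record:", record)
```

Common mitigations for this class of bug include locking the read-modify-write cycle or using versioned compare-and-swap updates that reject writes based on a stale snapshot.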
The Simultaneous Nature is the Scary Part
What makes October 29 particularly alarming is that both cloud giants went down at nearly the same time. Reports started spiking on Downdetector around 9 a.m. Pacific for both platforms. This isn't just bad luck; it suggests either a common vulnerability or an infrastructure problem affecting multiple providers.
Microsoft’s Response
In a brief statement via its Azure status Twitter feed, Microsoft said only that engineers were “investigating an issue” affecting portal access [1]. No timeline for a fix was immediately announced. The outage closely follows AWS’s troubles; as one user forum quipped, “First AWS and now this.” (Notably, Amazon Web Services saw its own U.S. East region disruption just nine days earlier, underscoring how even the biggest clouds can glitch.) Microsoft is generally seen as having robust multi-region redundancy, so a failure of this magnitude suggests either a systemic error (such as a bad software push or a networking fault) or an external factor; investigators will likely be poring over Azure’s logs.
IT and cloud experts caution that when identity services or regional hubs go offline, many disparate products can cascade into outages. As one analyst noted, “Microsoft’s cloud has often been resilient, but simultaneous failures in Azure core components can affect everything from Teams calls to Exchange email” (speaking anonymously to a tech publication). For customers, Microsoft recommends checking https://status.azure.com for updates and using mobile Teams apps or VPNs as temporary workarounds if regional portals are unavailable.
What's Really Wrong With Cloud Giants: The Uncomfortable Truth
The Myth of Cloud Reliability
Cloud providers market 99.99% uptime guarantees, which sounds impressive until you do the math: 99.99% uptime still allows roughly 52 minutes of downtime per year. But when that downtime happens all at once and affects thousands of companies simultaneously, the impact is catastrophic.
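To see where that figure comes from, here is the arithmetic for the common SLA tiers, assuming a 365-day year:

```python
# Downtime budget implied by common uptime SLAs, assuming a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

for sla in (99.0, 99.9, 99.99, 99.999):
    allowed_downtime = MINUTES_PER_YEAR * (1 - sla / 100)
    print(f"{sla}% uptime allows about {allowed_downtime:.1f} minutes of downtime per year")

# 99.99% works out to roughly 52.6 minutes per year; nothing in the SLA says
# those minutes have to be spread out evenly instead of landing all at once.
```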
As one expert noted, AWS's regional isolation strategy means it has never suffered a truly global outage, but because its "outage blast radius is a significant part of the global economy," its failures are felt far more widely.
The Centralization Problem
The internet was originally designed to be decentralized and resilient. No single point of failure. But the cloud computing revolution has created exactly what the internet was meant to avoid: massive concentration of infrastructure in the hands of a few providers.
The UK government estimates up to 60% of government cloud services are hosted on AWS, Microsoft, or Google platforms. When these platforms fail, entire sectors of the economy go dark.
The Interconnection Trap
AWS's own postmortem revealed that the outage meant people couldn't order food, communicate with hospital networks, access mobile banking, or connect with security systems and smart homes. Major companies including Netflix, Starbucks, and United Airlines were temporarily unable to give customers access to their online services.
This reveals a hidden vulnerability: most companies don't realize how dependent they are on these platforms until they fail. You might think you're using multiple services, but if they all run on AWS or Azure underneath, you have a single point of failure.
Automation Becoming the Problem
Both AWS and Microsoft run highly automated systems to manage their massive infrastructure. But automation itself caused AWS's October 20 outage when two automated systems conflicted. The systems designed to prevent problems created them instead.
As infrastructure becomes more complex and more automated, the potential for unexpected interactions increases exponentially. Humans can't possibly monitor everything in real-time, but automated systems can create cascading failures faster than humans can respond.
The Fragility is a Feature, Not a Bug
AWS employs some of the best engineers on the planet to think about reliability problems at a scale that few can contextualize. The fact that outages like this are newsworthy is actually a testament to how reliable they normally are.
But reliability at massive scale requires tradeoffs. The same interdependencies that make cloud infrastructure efficient also make it fragile. You can't have both perfect isolation and perfect efficiency.
Government Response
UK digital government minister Ian Murray stated that the AWS outage "affected a number of suppliers and departments, and it will take some time to fully understand the scale of the impact". The government is developing a cloud consumption dashboard to provide better visibility across the public sector.
The UK government announced plans to publish a strategy for handling future cloud outages, though that plan is still in development. The fact that governments are now treating cloud outages as national infrastructure issues shows how critical this problem has become.
The Bottom Line: What This Means for the Future
The Internet Needs a Backup Plan
One headline asked: "How many more AWS outages until the internet builds a real backup plan?" That question is becoming more urgent with each incident.
The current model concentrates too much infrastructure in too few hands. When AWS or Azure goes down, significant portions of the global economy stop functioning. This isn't sustainable.
Multi-Cloud Isn't Enough
Many companies think they're protected because they use multiple cloud providers. But when both AWS and Azure went down simultaneously, multi-cloud strategies failed. If your "redundant" systems all fail at once, you don't have redundancy.
Real redundancy requires fundamentally different infrastructure: different DNS providers, different network paths, different geographic locations, and different architectural approaches.
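As one small, concrete example of "different DNS providers," the sketch below queries the same hostname through several independent public resolvers (Cloudflare, Google, Quad9) to help tell a resolver-side problem apart from an authoritative failure. It assumes the dnspython library and a placeholder hostname; it is a diagnostic aid, not a complete redundancy strategy.

```python
# Query one hostname through several independent public resolvers.
# Requires: pip install dnspython
import dns.exception
import dns.resolver

RESOLVERS = {
    "Cloudflare": "1.1.1.1",
    "Google": "8.8.8.8",
    "Quad9": "9.9.9.9",
}

def check(hostname):
    for provider, ip in RESOLVERS.items():
        resolver = dns.resolver.Resolver(configure=False)  # ignore the OS resolver config
        resolver.nameservers = [ip]
        resolver.lifetime = 3.0  # total time budget per query, in seconds
        try:
            answer = resolver.resolve(hostname, "A")
            print(f"[{provider}] {hostname} -> {[r.address for r in answer]}")
        except dns.exception.DNSException as exc:
            print(f"[{provider}] {hostname} FAILED: {type(exc).__name__}")

# Placeholder hostname; substitute one of your own critical endpoints.
check("example.com")
```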
Automation Needs Human Oversight
AWS's October 20 outage was caused by automated systems conflicting with each other. As infrastructure becomes more automated, the potential for these conflicts increases. Companies need better human oversight of automated systems, not less.
DNS Needs a Fundamental Rethink
Two major outages in nine days, both caused by DNS issues, suggest that DNS itself is the problem. The protocol was designed decades ago for a much smaller internet. Maybe it's time to rethink how name resolution works at internet scale.
What You Can Do Right Now
If you're a business relying on cloud infrastructure:
1. Audit Your Dependencies: Map out exactly which cloud providers you depend on, including hidden dependencies. That WordPress site might run on your own servers, but if it uses Cloudflare for DNS and AWS for image hosting, you still have cloud dependencies (a rough dependency-audit sketch follows this list).
2. Test Your Failover: Don't wait for an outage to discover your backup systems don't work. Regularly test your failover procedures when systems are working.
3. Build Real Redundancy: Don't just use AWS and Azure. Consider on-premises backup systems, different DNS providers, and genuinely independent infrastructure.
4. Monitor Everything: You can't fix what you can't see. Invest in monitoring tools that alert you immediately when dependencies start failing.
5. Have a Communication Plan: When everything goes dark, how will you communicate with customers and employees? Email might be down. Phone systems might be down. Have alternative communication channels ready.
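As a starting point for items 1 and 4, here is a rough Python sketch that resolves a list of hostnames you depend on and flags any whose addresses fall inside Amazon's published IP ranges. The ip-ranges.amazonaws.com feed is a real public file; the hostname list is a placeholder to replace with your own, and CDNs or third-party APIs can still hide dependencies this check will miss.

```python
# Rough dependency audit: resolve hostnames and flag any that land in
# Amazon's published IP ranges. A starting point, not a complete inventory.
import ipaddress
import json
import socket
import urllib.request

AWS_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

# Placeholder hostnames; replace with your own critical dependencies.
DEPENDENCIES = ["api.example.com", "images.example.com", "auth.example.com"]

def load_aws_networks():
    with urllib.request.urlopen(AWS_RANGES_URL, timeout=10) as resp:
        data = json.load(resp)
    return [ipaddress.ip_network(p["ip_prefix"]) for p in data["prefixes"]]

def resolve_ipv4(hostname):
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}

def main():
    aws_networks = load_aws_networks()
    for host in DEPENDENCIES:
        try:
            addresses = resolve_ipv4(host)
        except socket.gaierror as exc:
            print(f"{host}: could not resolve ({exc})")
            continue
        on_aws = any(
            ipaddress.ip_address(addr) in net
            for addr in addresses
            for net in aws_networks
        )
        label = "in AWS IP space" if on_aws else "not in AWS IP space"
        print(f"{host}: {sorted(addresses)} ({label})")

if __name__ == "__main__":
    main()
```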
The Uncomfortable Truth
As one expert noted, "the internet is a complex web of overlapping services that are only as reliable as their weakest code". We've built a global economy on infrastructure that can be taken down by DNS failures.
The October 29 double outage of Microsoft Azure and AWS is a warning sign. Not a warning that cloud computing is bad, but a warning that we've become too dependent on too few providers with too many similar vulnerabilities.
The world is more interconnected today than it has ever been, yet that interconnection has become increasingly centralized. That centralization creates efficiency and convenience, but it also creates catastrophic single points of failure.
The question isn't if this will happen again. The question is whether we'll build better redundancy before the next outage, or whether we'll keep learning the same lesson the hard way.
Were you affected by today's outages? Which services did you lose access to? Share your experience in the comments below.