On November 18, 2025, a large portion of the Internet staggered to a halt not due to an attack, but because Cloudflare, one of the world’s most critical Internet infrastructure providers, suffered a cascading internal failure. A database permission change caused a core configuration file to double in size, triggering widespread system crashes across Cloudflare’s global network. For users, the result was simple: websites wouldn’t load, apps failed, and critical services returned 5xx errors for hours. But behind this simple failure lies a deeper lesson about centralization, dependency, and how small internal mistakes can ripple across the entire Internet.
Cloudflare sits at the heart of global connectivity. When it fails, the impact is not isolated; it spreads across thousands of websites, APIs, DNS lookups, authentication systems, payment providers, and applications. This outage wasn’t a cyberattack. It wasn’t a hardware breakdown. It was a reminder that even the Internet’s most resilient infrastructures can collapse from a single misconfiguration.
What Actually Happened Inside Cloudflare
At 11:20 UTC, Cloudflare’s network began returning high volumes of HTTP 5xx errors. The issue originated inside the Bot Management system, which relies on a frequently refreshed “feature file” that feeds the machine learning engine used to score and classify traffic.
Here’s the chain reaction:
A database permission update created unexpected duplicate data
A ClickHouse query that generates the bot feature file began returning extra metadata rows. This doubled the file size.
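To picture how a permission change can quietly double a generated file, here is a minimal Python sketch. It is not Cloudflare’s actual query or schema; the database, table, and column names below are illustrative. The point is that a metadata lookup which no longer filters by database suddenly sees every feature column twice, and the file it generates doubles:

```python
# Illustrative only: names and row counts are made up, not Cloudflare's schema.
# After the permission change, column metadata for a second database ("r0")
# becomes visible alongside the original ("default"), so an unfiltered query
# returns every feature column twice.
SYSTEM_COLUMNS = [
    ("default", "http_requests_features", f"feature_{i}") for i in range(3)
] + [
    ("r0", "http_requests_features", f"feature_{i}") for i in range(3)
]

def build_feature_file(filter_by_database: bool) -> list[str]:
    """Collect the feature column names that make up the bot feature file."""
    rows = SYSTEM_COLUMNS
    if filter_by_database:
        rows = [r for r in rows if r[0] == "default"]
    return [col for db, table, col in rows if table == "http_requests_features"]

print(len(build_feature_file(filter_by_database=True)))   # 3 features, as expected
print(len(build_feature_file(filter_by_database=False)))  # 6 features: the file doubles
```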
The oversized file was propagated globally
Every Cloudflare server loaded the new configuration file, which exceeded the memory limits designed for the bot feature module.
The system began to panic
When the feature count passed 200 (the preallocated limit), the proxy crashed, resulting in 5xx errors for traffic passing through Cloudflare.
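Here is a minimal sketch of that failure mode in Python, assuming a simplified loader and a hard capacity check rather than Cloudflare’s actual proxy code. The point is that treating an over-limit file as a fatal error, instead of keeping the previous configuration, turns one bad file into a fleet-wide crash:

```python
FEATURE_LIMIT = 200  # preallocated capacity, sized well above normal usage

class FeatureLoadError(RuntimeError):
    """Raised when a feature file exceeds the preallocated capacity."""

def load_features(feature_file: list[str]) -> list[str]:
    if len(feature_file) > FEATURE_LIMIT:
        # The safer choice is to keep the previous file and log the error;
        # failing hard here turns one bad config into failed live traffic.
        raise FeatureLoadError(
            f"{len(feature_file)} features exceeds the limit of {FEATURE_LIMIT}"
        )
    return feature_file

doubled_file = [f"feature_{i}" for i in range(260)]  # made-up size after doubling
load_features(doubled_file)  # raises FeatureLoadError: the crash point
```

A safer pattern is to reject the oversized file and keep serving with the last validated version, which is essentially what the eventual fix did.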
The situation fluctuated and misled engineers
Since only part of the database cluster was updated, some queries returned “good” files and others returned “bad” ones. This caused Cloudflare’s systems to oscillate between recovery and failure, initially resembling signs of a massive DDoS attack.
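The oscillation is easy to reproduce in a toy model. The simulation below is hypothetical, not Cloudflare’s pipeline; it simply assumes each regeneration cycle hits either an already-updated database node or one still returning the old metadata:

```python
import random

UPDATED_FRACTION = 0.5   # assumed share of nodes already carrying the change
NORMAL_COUNT, DOUBLED_COUNT, LIMIT = 130, 260, 200  # made-up feature counts

def regenerate_feature_file() -> int:
    """Return the feature count produced by whichever node served the query."""
    hit_updated_node = random.random() < UPDATED_FRACTION
    return DOUBLED_COUNT if hit_updated_node else NORMAL_COUNT

for cycle in range(8):
    count = regenerate_feature_file()
    status = "FAIL (over limit)" if count > LIMIT else "OK"
    print(f"regeneration {cycle}: {count} features -> {status}")
```

Seen from monitoring dashboards, that alternating pattern looks more like an attack ramping up and down than a deterministic internal bug.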
Downstream services broke as well
The outage impacted:
- CDN and security services
- Workers KV
- Access (authentication failures)
- Dashboard logins (Turnstile unavailable)
- Network latency, which increased across the board
Workers KV and Access were manually routed around the core proxy at 13:05, partially reducing the impact.
Resolution required halting propagation
At 14:24, Cloudflare stopped generating new files, pushed a known good version, and restarted core proxies worldwide.
By 17:06, all systems were stable.
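The recovery path highlights a pattern worth building in from the start: validate generated configuration before it propagates, and keep a last-known-good copy to fall back on. Below is a minimal Python sketch of that idea; the file names and the validation rule are assumptions for illustration, not Cloudflare’s tooling:

```python
from pathlib import Path
import json
import shutil

ACTIVE = Path("feature_file.json")                 # file the proxy actually loads
KNOWN_GOOD = Path("feature_file.known_good.json")  # last version that passed checks
FEATURE_LIMIT = 200

def is_valid(candidate: Path) -> bool:
    """Cheap sanity checks before a new file is allowed to propagate."""
    try:
        features = json.loads(candidate.read_text())
    except (OSError, json.JSONDecodeError):
        return False
    return isinstance(features, list) and 0 < len(features) <= FEATURE_LIMIT

def deploy(candidate: Path) -> None:
    """Promote a candidate file only if it passes validation; otherwise roll back."""
    if is_valid(candidate):
        shutil.copy(candidate, ACTIVE)
        shutil.copy(candidate, KNOWN_GOOD)   # promote to last known good
    else:
        # Refuse the bad file and restore the previous version instead of
        # letting every server load it and crash.
        shutil.copy(KNOWN_GOOD, ACTIVE)
```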
Why a Small Internal Change Caused a Global Incident
Although Cloudflare operates one of the most fault-tolerant networks on Earth, this outage demonstrates several uncomfortable truths:
The Internet is more centralized than people realize
Cloudflare sits between users and thousands of services. A failure there cascades instantly, even when last-mile connections are fully functional.
Configurations are more dangerous than code
A software bug is often predictable and contained; a configuration error can instantly cripple an entire distributed system.
Automated propagation multiplies risk
A bad file pushed to a global edge network means the failure spreads everywhere in minutes.
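One common mitigation is to treat configuration like code and roll it out in stages. The sketch below shows a simplified canary-style rollout in Python; the region names, thresholds, and stubbed push, rollback, and metrics functions are all placeholders rather than any provider’s real deployment tooling:

```python
import time

REGIONS = ["canary-pop", "region-a", "region-b", "region-c"]  # illustrative names
ERROR_THRESHOLD = 0.01   # abort if the 5xx rate exceeds 1% after a push

def push_config(region: str, version: str) -> None:
    print(f"pushing config {version} to {region}")    # stub for the real push

def rollback(region: str, version: str) -> None:
    print(f"rolling {region} back from {version}")    # stub for the real rollback

def error_rate(region: str) -> float:
    return 0.001   # stub; a real system would query its monitoring here

def staged_rollout(version: str, soak_seconds: int = 1) -> None:
    deployed = []
    for region in REGIONS:
        push_config(region, version)
        deployed.append(region)
        time.sleep(soak_seconds)                   # let the change soak
        if error_rate(region) > ERROR_THRESHOLD:   # a bad file shows up early
            for r in deployed:
                rollback(r, version)
            raise RuntimeError(f"rollout halted at {region}")

staged_rollout("feature-file-v2")
```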
Debugging at hyperscale is extremely difficult
Cloudflare’s own status page, which is hosted externally, went offline at the same moment, leading engineers to suspect a coordinated external attack.
Even internal safeguards have limits
The bot feature module had preallocated memory boundaries, but once those boundaries were exceeded, the failure mode was a hard crash that cut off major traffic flows.
A Reminder About Internet Fragility
Despite its size, the Internet still relies on a few critical infrastructure providers. When one of them suffers a fault, especially in DNS, CDN, or traffic routing, the effect is immediate and global.
This outage reinforces several lessons:
- No provider, no matter how advanced, is immune to internal failures.
- Redundancy does not eliminate the risk of human error.
- Over-centralization amplifies the severity of outages.
- Businesses must architect systems expecting providers to fail.
The incident also mirrors outages recently seen in Azure and AWS, all driven not by attacks but by misconfigurations.
What Organizations Should Learn
Multi-provider redundancy is not optional
Critical applications should never rely solely on one CDN, DNS, or authentication provider.
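At the application layer, the simplest expression of that redundancy is a client that can fail over between providers. The Python sketch below uses only the standard library and placeholder URLs; real deployments would more often implement this at the DNS or load-balancer layer:

```python
import urllib.error
import urllib.request

# Placeholder base URLs: the same application fronted by two different providers.
PROVIDERS = [
    "https://app-behind-provider-a.example.com",
    "https://app-behind-provider-b.example.com",
]

def fetch_with_failover(path: str = "/") -> bytes:
    """Try each provider in order; 5xx responses and timeouts fall through."""
    last_error = None
    for base in PROVIDERS:
        try:
            # urlopen raises HTTPError (a URLError subclass) on 5xx responses.
            with urllib.request.urlopen(base + path, timeout=3) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError) as exc:
            last_error = exc        # provider down or erroring; try the next one
    raise RuntimeError(f"all providers failed: {last_error}")

# Example usage (with real endpoints configured):
# body = fetch_with_failover("/")  # returns the first healthy provider's response
```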
Monitor third-party dependencies
Many businesses realized only during the outage how much they indirectly depended on Cloudflare.
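A lightweight way to make those dependencies visible is to probe critical third-party endpoints from your own monitoring rather than relying solely on their status pages, which, as this incident showed, can go dark at the same time. The endpoints in this sketch are placeholders:

```python
import urllib.error
import urllib.request

# Placeholder list of third-party endpoints your product quietly depends on.
DEPENDENCIES = {
    "cdn": "https://cdn.example.com/health",
    "auth": "https://auth.example.com/health",
    "payments": "https://payments.example.com/health",
}

def check_dependencies(timeout: float = 3.0) -> dict[str, bool]:
    """Return a simple up/down view of each external dependency."""
    status = {}
    for name, url in DEPENDENCIES.items():
        try:
            with urllib.request.urlopen(url, timeout=timeout):
                status[name] = True
        except (urllib.error.URLError, TimeoutError):
            status[name] = False
    return status

if __name__ == "__main__":
    for name, up in check_dependencies().items():
        print(f"{name}: {'UP' if up else 'DOWN'}")
```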
Understand your blast radius
If a single vendor outage breaks your core service, your architecture is too centralized.
Expect outages even from the best providers
Cloudflare itself states that these events are “unacceptable,” but they are also inevitable at global scale.
The November 18 outage was not a cyberattack. It was a reminder that in a world defined by complex, interconnected systems, resilience is never guaranteed. A small internal permission change in a database created one of Cloudflare’s most significant failures since 2019, proving that even the strongest infrastructures can fall from tiny misalignments.
For the industry, this is a wake-up call: scale brings strength, but also fragility. And the more we rely on centralized cloud and edge providers, the more critical it becomes to design architectures that can survive when, not if, those providers go down.
If this outage taught the industry anything, it’s that your real perimeter isn’t your firewall; it’s every SaaS, API, and third-party service your business depends on.
Want to understand how modern supply chain risks actually work, and how attackers exploit them long before you ever notice?
Read our deep-dive: The New Perimeter is the Supply Chain | Managing Third-Party and SaaS Risk.
Strengthen your architecture before the next outage or misconfiguration takes you down.

Frequently Asked Questions
Was the outage caused by a cyberattack?
No. The failure resulted from an internal configuration file that exceeded expected limits due to a database permission change.
Why did the outage affect so many websites and services?
Because Cloudflare sits at the core of global Internet infrastructure, powering DNS, CDN, traffic routing, and security services used by thousands of companies.
Why did engineers initially suspect an attack?
The system alternated between good and bad configuration files, temporarily recovering and failing, which initially resembled a high-scale attack.
Which Cloudflare services were affected?
Core CDN, Workers KV, Access authentication, the Dashboard login, and bot scoring functionality.
How was the outage resolved?
Cloudflare halted propagation of the corrupted file, restored a known-good version, and restarted core proxy systems.
Could the same thing happen to other providers?
Yes. Any large-scale distributed system can experience similar failures from misconfigurations, bugs, or cascading internal errors.
Sources
Cloudflare outage on November 18, 2025 – Cloudflare
Cloudflare resolves global outage that disrupted ChatGPT, X – Business Times


