ZDNet | Security

The cloud broke Thursday and it’ll happen again – how to protect your business before then

After a rocky Thursday afternoon on the internet, both Google and Cloudflare services appear to be operating normally as of Friday morning. When the trouble started, the question wasn’t which cloud service was having problems; it was which service wasn’t down.

What happened on Thursday? 

First, this was not just an American problem. Google Cloud reported that the outage was global and that multiple Google Cloud Platform (GCP) products were affected by issues with its Identity and Access Management (IAM) service.

It also didn’t appear to be an internet problem, per se. There were no reports of troubles with the Domain Name System (DNS) or Border Gateway Protocol (BGP). Internet traffic was proceeding as usual.

The incident started around 1:49 p.m. ET, according to Google. By 3:41 p.m. ET, Google engineers had identified the root cause, but the problem wasn’t fully resolved. By 4:49 p.m. ET, Google reported on its Google Cloud status page that all was well.

The company also issued a report about the cause of the outage:

“From our initial analysis, the issue occurred due to an invalid automated quota update to our API management system, which was distributed globally, causing external API requests to be rejected. To recover, we bypassed the offending quota check, which allowed recovery in most regions within two hours. However, the quota policy database in us-central1 became overloaded, resulting in much longer recovery in that region. Several products had a moderate residual impact (eg, backlogs) for up to an hour after the primary issue was mitigated, and a small number recovered after that.”

To prevent this from happening in the future, Google has made the following changes:

  • Prevent our API management platform from failing due to invalid or corrupt data.
  • Prevent metadata from propagating globally without appropriate protection, testing, and monitoring in place.
  • Improve system error handling and comprehensive testing for handling invalid data.
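
The first two items describe a general pattern: check configuration data before accepting it, and roll it out one region at a time rather than everywhere at once. The Python sketch below illustrates that pattern only. It does not represent Google’s actual system; QuotaPolicy, validate_policy, staged_rollout, and the field names are hypothetical and invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QuotaPolicy:
    """Hypothetical quota policy record; the field names are assumptions."""
    service: Optional[str]
    limit_per_minute: Optional[int]


def validate_policy(policy: QuotaPolicy) -> list[str]:
    """Return a list of problems; an empty list means the policy looks sane."""
    problems = []
    if not policy.service:
        problems.append("service name is missing")
    if policy.limit_per_minute is None:
        problems.append("limit is missing")
    elif policy.limit_per_minute <= 0:
        problems.append("limit must be positive")
    return problems


def staged_rollout(
    policy: QuotaPolicy,
    regions: list[str],
    apply: Callable[[str, QuotaPolicy], None],
    healthy: Callable[[str], bool],
) -> list[str]:
    """Apply the policy region by region, halting at the first unhealthy one.

    Returns the regions that received the update, including the one
    where the rollout stopped.
    """
    problems = validate_policy(policy)
    if problems:
        # Reject bad data up front instead of letting it reach any region.
        raise ValueError("rejected invalid policy: " + "; ".join(problems))
    updated = []
    for region in regions:
        apply(region, policy)
        updated.append(region)
        if not healthy(region):
            break  # stop before the change spreads any further
    return updated
```

With this shape, a policy with a blank field is refused before it touches any region, and a policy that passes validation but still causes trouble stops at the first region that reports unhealthy instead of spreading globally.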

While Google’s services took the brunt of the failure, Google wasn’t alone. In a statement, Cloudflare said that while many of its services suffered intermittent failures, they were back to normal by Thursday evening.

A Cloudflare spokesperson said, “This is a Google Cloud outage. A limited number of services at Cloudflare that use Google Cloud were impacted. We expect them to come back shortly. The core Cloudflare services were not impacted.”

What can you do when another cloud outage occurs?

If you’re wondering what your business can do to make life easier when, not if, another major cloud outage occurs, well, as tempting as it may be to take your services in-house, you must ask yourself: “Can I match the 99.99% uptime of the major cloud services (AWS, Azure, and Google Cloud)?” Chances are, you can’t.
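
To put that 99.99% figure in concrete terms, it helps to convert an uptime percentage into the downtime it permits. The short Python sketch below does the arithmetic; allowed_downtime_minutes is a name made up for this example, and it assumes a 365-day year with downtime counted against the whole year, which may not match how any given provider measures its own service levels.

```python
def allowed_downtime_minutes(uptime_percent: float, days: float = 365.0) -> float:
    """Minutes of downtime permitted over `days` at the given uptime percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

At 99.99%, the budget works out to roughly 52.6 minutes a year. By the timeline above, Thursday’s incident ran about three hours from start to all-clear, so even the big providers have years in which that target is hard to hit.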

What you can do is distribute your workloads across multiple cloud providers (multi-cloud) or combine public and private clouds (hybrid cloud). This reduces the risk of relying on a single provider and allows for failover if one cloud experiences an outage.
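
At the application level, failover can be as basic as trying the same operation against each provider in order of preference. The following Python sketch shows the idea under simplifying assumptions: call_with_failover, the provider names, and the two stand-in fetch functions are all hypothetical, and a real deployment would call actual provider SDKs and keep data replicated between clouds.

```python
from typing import Callable, Sequence, Tuple


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[], str]]],
) -> Tuple[str, str]:
    """Try each (name, operation) pair in order.

    Returns (name, result) from the first provider whose operation succeeds,
    or raises RuntimeError if every provider fails.
    """
    errors = []
    for name, operation in providers:
        try:
            return name, operation()
        except Exception as exc:  # any failure means "try the next provider"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-ins for real provider calls (hypothetical).
def fetch_from_primary() -> str:
    raise ConnectionError("primary cloud unavailable")


def fetch_from_secondary() -> str:
    return "order data"


used, payload = call_with_failover([
    ("primary-cloud", fetch_from_primary),
    ("secondary-cloud", fetch_from_secondary),
])
print(f"served by {used}: {payload}")
```

Here the primary call fails, so the request is served by the secondary provider. The ordering of the list is the whole policy, which keeps the sketch easy to reason about but also means every request pays for a failed attempt while the primary is down.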

Simply using a multi-cloud or hybrid cloud setup isn’t enough. You need an automated disaster recovery plan (DRP) that kicks in when trouble comes your main cloud provider’s way. This can be as simple as real-time backup of your data or as involved as full failover.
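
Automating the switch usually means deciding from repeated health checks rather than from a single failed request, so that one blip doesn’t trigger a failover. Below is a minimal Python sketch of that decision logic. FailoverMonitor, its threshold of three checks, and the "primary" and "standby" labels are assumptions for illustration; actually running the health checks and redirecting traffic is left out.

```python
class FailoverMonitor:
    """Switch between 'primary' and 'standby' after repeated health-check results.

    The active side changes only after `threshold` consecutive checks
    contradict the current choice, which avoids flapping on a single blip.
    """

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.active = "primary"
        self._streak = 0  # consecutive checks that contradict the current choice

    def record(self, primary_healthy: bool) -> str:
        """Record one health-check result and return the side that should serve traffic."""
        on_primary = self.active == "primary"
        contradicts = primary_healthy != on_primary
        self._streak = self._streak + 1 if contradicts else 0
        if self._streak >= self.threshold:
            self.active = "primary" if primary_healthy else "standby"
            self._streak = 0
        return self.active


monitor = FailoverMonitor(threshold=3)
checks = [True, False, False, True, False, False, False, True, True, True]
history = [monitor.record(result) for result in checks]
print(history)
```

In this run, two failed checks followed by a success leave traffic on the primary. Three failures in a row move it to the standby, and three healthy checks in a row move it back.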

If you don’t have the technical expertise in your business to build a DRP, there are companies that can help you set one up and manage it. These include Commvault, Druva, Flexential, and TierPoint. If your enterprise relies on cloud services to get business done, I highly recommend talking with one or more of these companies to make sure you can keep operating even when a major cloud goes down.
