According to a recent report by Information Technology Intelligence Consulting, an hour of downtime cost data center operators USD $260,000 on average in 2018. Here is a summary of the outages that made the news in the last six months, and the key takeaways for data center operators.
Google goes down twice within four months
In March, Google experienced a global outage that lasted 4.5 hours and disrupted its Gmail and Google Drive services. The day prior, a three-hour outage in the Google Cloud Platform had affected more critical enterprise applications, such as the Google App Engine, the web framework and cloud computing platform for developing and hosting web applications in data centers managed by Google. Google did not provide an explanation for either outage.
In November of last year, Google’s customer traffic was wrongly diverted to China and then Russia, which, given the current geopolitical climate, prompted speculation about a state-sponsored data harvesting effort. Customers noticed problems connecting to G Suite, Google Search and Google Analytics. The next day, MainOne, a small ISP in Nigeria that has a peering relationship with Google via IXPN in Lagos, tweeted that the root cause of the problem was a configuration error. It took 75 minutes for MainOne to be alerted to the problem and fix it, and about 45 minutes for services to be restored.
Facebook suffers its worst outage in history
Also in March, social media and advertising giant Facebook suffered one of its most sustained outages since 2008. The 14-hour outage affected millions of its users and advertisers around the world. Messenger, Instagram, WhatsApp and Facebook Workplace also went down with the mothership. Facebook is investigating the overall impact of the outage, including the possibility of refunds for advertisers. 2019 sales estimates put Facebook’s daily ad revenue at USD $250 million, making any downtime a big blow to its bottom line.
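To put those two figures side by side, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that ad revenue accrues evenly across a 24-hour day and that all of it is exposed while the service is down. Neither assumption comes from Facebook, and the function name `revenue_at_risk` is hypothetical, so treat the result as an upper-bound estimate rather than a reported loss.

```python
# Rough estimate of ad revenue exposed during an outage.
# Assumption (illustrative, not from Facebook): revenue accrues evenly
# over 24 hours and all of it is at risk while the service is down.

HOURS_PER_DAY = 24


def revenue_at_risk(daily_revenue_usd: float, outage_hours: float) -> float:
    """Return the revenue exposed during an outage under a uniform-rate assumption."""
    if daily_revenue_usd < 0 or outage_hours < 0:
        raise ValueError("inputs must be non-negative")
    return daily_revenue_usd * outage_hours / HOURS_PER_DAY


# Figures quoted in the article: USD $250 million per day, 14-hour outage.
estimate = revenue_at_risk(250_000_000, 14)
print(f"Estimated ad revenue at risk: USD ${estimate:,.0f}")
```

In practice, traffic and ad spend vary by hour and region, and some spend may simply shift to later in the day, so the real impact could be well below this figure.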
Given Facebook’s user base of 2.3 billion and counting, the prevailing theory was that the outage was caused by capacity problems, a traffic routing error, or both. In an ironic twist, Facebook went on Twitter to debunk the speculation. Facebook later attributed the problem to a server configuration change.
Wells Fargo mobile and internet banking go offline
In February, Wells Fargo customers were shut out of mobile and internet banking services due to a Wells Fargo data center outage in Minnesota. Customers also reported issues using their Wells Fargo credit and debit cards. Two days after the outage, the fourth-largest bank in the US had restored its critical systems, mobile app, website and ATMs to normal service throughout the country.
Wells Fargo representatives said that the outage was caused by an automatic power shutdown at one of the bank’s main data center facilities, triggered by smoke created during routine maintenance activities in the building.
British Airways to sue CBRE over 2017 airport outage
British Airways has appointed law firm Linklaters to launch a legal battle against property specialist CBRE over the 2017 outage that brought down a British Airways data center operated by CBRE.
The three-day outage sent Gatwick and Heathrow’s busy bank holiday schedule into chaos. The airline was forced to cancel 672 flights, leaving 75,000 passengers stranded at an estimated cost of $75 million. British Airways Chief Executive, Alex Cruz, shared in an interview with the Mail that the airline has invested in new data centers to avoid any repeat occurrences.
Takeaways from the recent outages
Outages in recent months support the findings of a survey by Uptime Institute, which identified the top three causes of downtime to be power outages, network failures, and IT or software errors. 80% of the data center managers surveyed admitted that their most recent outage could have been prevented, indicating that continued investment in process improvement is likely to show positive returns.
According to Markku Rossi, CTO at SSH Communications Security, data centers need to cultivate a mindset of being ready when something goes wrong, and infrastructure redundancy is still key. Rossi suggests that every data center should have a secondary data center that is physically isolated from it, so that the two don’t rely on the same energy source. According to Uptime, data centers with a fully redundant, mirrored system experienced one-third fewer outages.