Social media giant Facebook’s main platform, WhatsApp and Instagram didn’t just suffer an outage last night. They almost literally disappeared from the internet. The three platforms went down at around 9.30pm IST and services were restored over six hours later, after 4.30am. The companies had all acknowledged the outage but hadn’t explained why they went down... until now.
The social media giant issued an update today through its Engineering blog. According to Facebook, “configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.”
That probably doesn’t mean much to the layman, since it leans heavily on engineering jargon. A better explanation comes from web infrastructure and security provider Cloudflare.
“It was as if someone had ‘pulled the cables’ from their data centers all at once and disconnected them from the Internet,” the company wrote in a blog post. Cloudflare is one of the biggest providers of services that protect websites and platforms against attacks such as distributed denial-of-service (DDoS).
DNS and BGP
It comes down to two key technologies in the internet infrastructure — domain name system (DNS) and border gateway protocol (BGP).
The DNS is often called the address book of the Internet. It is responsible for converting human-readable website names, like facebook.com, to machine-readable IP addresses, like 192.168.1.1. Since computers can’t actually understand the English language and we humans don’t usually understand binary, the DNS is an essential technology for the operation of the internet.
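The address-book idea can be sketched in a few lines of Python. This is only a toy model, not a real DNS query: the `ADDRESS_BOOK` dictionary and the `resolve` function are hypothetical names, and the addresses come from a range reserved for documentation, not from Facebook’s actual network.

```python
from typing import Optional

# Hypothetical address book. The IPs are from the 192.0.2.0/24
# documentation range and are NOT the platforms' real addresses.
ADDRESS_BOOK = {
    "facebook.com": "192.0.2.10",
    "instagram.com": "192.0.2.20",
}


def resolve(name: str, records: dict = ADDRESS_BOOK) -> Optional[str]:
    """Return the IP address recorded for a name, or None if there is no entry."""
    # DNS names are case-insensitive and may carry a trailing dot.
    key = name.lower().rstrip(".")
    return records.get(key)


known = resolve("Facebook.com")      # found in the address book
unknown = resolve("example.org")     # not in the address book, so None
```

A real resolver asks a chain of name servers over the network instead of reading a local dictionary, but the outcome is the same: either an address comes back, or nothing does.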
As for BGP, it’s important to remember that at the end of the day, the internet is just a network of networks. The BGP is what allows one network to talk to another and announce its presence to the world of computers that is the Internet. “It's a mechanism to exchange routing information between autonomous systems (AS) on the Internet. The big routers that make the Internet work have huge, constantly updated lists of the possible routes that can be used to deliver every network packet to their final destinations. Without BGP, the Internet routers wouldn't know what to do, and the Internet wouldn't work,” explained Cloudflare.
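Those “constantly updated lists” can be pictured as a table that maps blocks of addresses to the network that announced them. The sketch below is a heavily simplified illustration, not how BGP itself is implemented: the `RoutingTable` class is hypothetical, and the prefixes and AS numbers are taken from ranges reserved for documentation.

```python
import ipaddress
from typing import Optional


class RoutingTable:
    """Toy routing table: address prefix -> AS number that announced it."""

    def __init__(self) -> None:
        self.routes = {}

    def announce(self, prefix: str, asn: int) -> None:
        # A network tells the world: "these addresses can be reached via me".
        self.routes[ipaddress.ip_network(prefix)] = asn

    def withdraw(self, prefix: str) -> None:
        # The network takes that announcement back.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address: str) -> Optional[int]:
        # Pick the most specific (longest) prefix that contains the address.
        ip = ipaddress.ip_address(address)
        matches = [net for net in self.routes if ip in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()
table.announce("198.51.100.0/24", 64500)    # hypothetical AS 64500
table.announce("198.51.100.128/25", 64501)  # a more specific route
chosen = table.lookup("198.51.100.200")     # matches both, the /25 wins
```

If every prefix that covers an address is withdrawn, `lookup` returns `None`, which is the toy equivalent of a router not knowing where to send a packet.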
Here’s what happened
We may call them platforms, but Facebook, Instagram and WhatsApp are essentially networks on the Internet. When you connect from your Airtel-enabled smartphone to Facebook, the Airtel network calls the Facebook network and exchanges data. Each individual network on the internet has an autonomous system number (ASN), which the BGP uses to identify that network when working out the route data will take from one network to another, in this case to Facebook’s.
As explained by Dane Knecht, Senior Vice President of Cloudflare, via a tweet last night, Facebook’s DNS and BGP were “withdrawn from the internet”. Beyond the configuration change it described, the social media giant hasn’t explained in detail what led to this withdrawal, but it essentially means that for every phone trying to ping the Facebook servers, the internet infrastructure was drawing a blank. The servers existed physically, but there was no way to connect to them.
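To see why a withdrawn route makes a service vanish, here is a rough sketch of the two steps a phone depends on. It rests on a loudly stated assumption: the name server that answers for facebook.com sits inside the same network whose route is withdrawn. All names, addresses and the AS number are hypothetical and drawn from documentation ranges.

```python
import ipaddress

# Hypothetical data, not Facebook's real addresses or AS number.
NAME_SERVER_IP = "192.0.2.53"          # assumed to live inside the same network
ZONE = {"facebook.com": "192.0.2.10"}  # what that name server would answer


def has_route(address: str, routes: dict) -> bool:
    """True if any announced prefix covers the address."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in routes)


def open_app(name: str, routes: dict) -> str:
    # Step 1: the name server must be reachable before it can be asked anything.
    if not has_route(NAME_SERVER_IP, routes):
        return "DNS lookup failed: name server unreachable"
    address = ZONE.get(name)
    if address is None:
        return "DNS lookup failed: no such name"
    # Step 2: the answer is only useful if there is a route to that address.
    if not has_route(address, routes):
        return "no route to " + address
    return "connected to " + address


routes = {ipaddress.ip_network("192.0.2.0/24"): 64500}
before = open_app("facebook.com", routes)  # route announced, app connects
routes.clear()                             # the announcement is withdrawn
after = open_app("facebook.com", routes)   # now even the name lookup fails
```

Nothing in `ZONE` changes between the two calls. The record is still there, but with no route to the server holding it, nobody can read it.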
The DNS is the address book; the BGP helps you navigate to the addresses in it. But can you find a name that isn’t in the address book to begin with?