Can a typo ruin a business – or the world? The answer is less obvious (and much scarier) than you think.
The news was almost completely drowned out by stories from Ukraine, California, Australia and the Far East. After all, it wasn’t life-threatening in itself. But it could have been. For at least 12 hours in late July, most of Apple’s Internet traffic was routed through Russia. Seemingly minor. 12 hours is nothing, is it? Actually, it is seriously big. Even 15 minutes would be big, because this shouldn’t be possible. The fact that it is possible brings up all kinds of interesting scenarios – but let’s take a quick look at the facts first:
- Rerouting traffic on the Internet is technically easy, as it must be. The traffic is enormous, the dependency almost total, and rapid rerouting means resilience. It actually happens all the time to get around small or large outages, planned changes, maintenance etc.
- The ‘accident’ involving Apple’s traffic didn’t disrupt any service, but slowed down traffic and reduced capacity for a while.
- It is not obvious that the ‘accident’ – caused by Russian ISP Rostelecom – was malicious or intentional. That cannot be ruled out, but an accidental typo (!) is more likely: The rerouted address block (owned by Apple) was 18.104.22.168/19, while an almost identical block, 22.214.171.124/19, is owned by Rostelecom. The difference is – in technical terms – one single bit. A Rostelecom operator may simply have mistyped the address in a configuration file.
- Apple took appropriate action less than an hour after it happened, but there is only so much you can do in a hurry when something like this happens. Thus the 12 hours to restore ‘normal’ flow of traffic.
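To make the ‘one single bit’ point from the list above concrete, here is a minimal sketch that counts how many bits separate the network addresses of two prefixes. The two prefixes are hypothetical private-range examples chosen for illustration, not the blocks involved in the incident, and the function name `differing_bits` is my own.

```python
import ipaddress


def differing_bits(prefix_a: str, prefix_b: str) -> int:
    """Count the bits that differ between the network addresses of two IPv4 prefixes."""
    net_a = ipaddress.ip_network(prefix_a)
    net_b = ipaddress.ip_network(prefix_b)
    # XOR leaves a 1 in every bit position where the two addresses disagree.
    diff = int(net_a.network_address) ^ int(net_b.network_address)
    return bin(diff).count("1")


# Hypothetical prefixes (assumption): one mistyped digit, 64 instead of 65.
intended = "10.65.0.0/19"
mistyped = "10.64.0.0/19"

print(differing_bits(intended, mistyped))
```

In this made-up case a single wrong keystroke flips a single bit, yet the result is a perfectly valid prefix that belongs to someone else, which is why a router will accept it without complaint.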
Fascinating or scary, depending on your point of view. Fascinating as in ‘can a typo really cause that much harm?’ The answer is ‘yes, but’, and let me get back to the ‘but’ in a minute. Scary as in ‘can the Internet really be brought down that easily?’ This time the answer is ‘no, but’. The Internet as a whole is extremely robust. Smaller or larger parts can be brought down – by typos, acts of terror, solar storms, acts of nature etc. – but hardly the entire net. In the case at hand, if Rostelecom had wanted to sabotage Apple, they could have sent all the traffic to ‘/dev/null’, which is tech-speak for ‘black hole’, effectively blocking users and customers from reaching the company. They may in fact not have been aware of the error until contacted by the regional Internet authority.
Can this happen to, say, the French military, the German government, the White House, the Navy, Facebook or Amazon – to mention a few? Possibly. Actually, it did happen to parts of Amazon’s network just a couple of years ago, when an ISP operator in Pakistan mistyped an address and sent lots of Asian Amazon traffic astray.
In other words, this is not a new thing. It has happened (big time) at least 4 times in the last couple of years, which demonstrates how fragile parts of the Internet are. That said, and as alluded to earlier: The same mechanism that makes this type of failure or hijacking possible is also the reason the Internet is so resilient.
Which brings us to the good news: It is possible for companies like Apple and Amazon, government bodies and big ISPs – the owners of Internet address blocks – to protect themselves from such accidents. The protection mechanism has been in operation for many years, and it is frankly embarrassing for Apple that this particular incident happened. Known in the tech community as ‘Route Origin Authorization’ or ‘ROA’, the mechanism has existed since 2012 and has been in widespread use for more than 5 years.
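How a ROA helps can be sketched in a few lines. A ROA is essentially a signed statement saying ‘this address block may be originated by this network, down to this prefix length’, and a router that checks announcements against it can classify each one as valid, invalid or not-found. The sketch below is a simplified illustration of that classification, not production validation code: it skips the cryptographic verification entirely, and the `Roa` class, the `validate_origin` function, the prefixes and the AS numbers are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    """Simplified ROA: who may originate a block, and how specific they may get."""
    prefix: ipaddress.IPv4Network
    max_length: int
    asn: int


def validate_origin(announced: str, origin_asn: int, roas: list[Roa]) -> str:
    """Classify an announcement as 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(announced)
    covered = False
    for roa in roas:
        # Only ROAs whose prefix contains the announced route are relevant.
        if route.version == roa.prefix.version and route.subnet_of(roa.prefix):
            covered = True
            if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
                return "valid"
    # Covered by some ROA but matched by none: wrong origin or too specific.
    return "invalid" if covered else "not-found"


# Hypothetical data (assumption): AS 64500 is the rightful owner of the block.
roas = [Roa(ipaddress.ip_network("10.64.0.0/16"), max_length=19, asn=64500)]

print(validate_origin("10.64.0.0/19", 64500, roas))    # announced by the owner
print(validate_origin("10.64.0.0/19", 64501, roas))    # announced by someone else
print(validate_origin("192.168.0.0/24", 64501, roas))  # no ROA covers this block
```

Note that the ROA does nothing by itself: the protection only works to the extent that other networks actually check announcements against it and drop the invalid ones.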
In other words, the Apple/Rostelecom incident is more of a reminder to all Internet operators to keep their house in order than a reason for grave security concern: It’s a threat easily dealt with.
The next, and just as important, question is ‘what’s the actual risk?’ when such incidents happen. Here’s more good news: Apart from the denial-of-service threat I already mentioned, there is little to worry about. Almost all traffic on the Internet these days is encrypted: light encryption (SSL/TLS) for regular email and browser/web traffic, ‘heavy’ encryption for most other types. That is very different from 10 years ago, and very encouraging for users and security professionals alike.
Which brings us to the final point, about operators, operations and responsibility. Ultimately, this and many other incidents were small operator errors with big consequences. Why do they happen, who’s to blame and what is being done about it? As always, when the ‘blame’ card is played, tensions rise and discussions get heated. Even with rigorous quality controls in place, configuration errors will occur, and early detection is important to catch them fast. Continuous network monitoring is the name of the game, but there is a ‘turf problem’. While network configuration and maintenance are the responsibility of the NOC (Network Operations Center), the SOC (Security Operations Center) has the best tools to detect errors. Not a problem, right? They work closely together anyway. Except in many cases they don’t. It’s time to change that. Actually, it’s critical.
Expert John Pescatore of the SANS Institute put it this way:
When you look at which groups have responsibility for availability and integrity of critical services, IT ops and Network ops really carry the bulk of the responsibility and authority, not security. But many of the tools and services that security uses will detect misconfigurations before IT/Network ops does. Integrated SOC/NOCs bring many benefits, next best thing is common or at least integrated tools being used across the groups.
Touché. The day Apple was Russian was a short day with small consequences – and a big warning.