An Introduction To BGP Traffic Shaping

Border Gateway Protocol (BGP) is the protocol of the Internet. It is the main external gateway protocol used to connect separate networks to one another. This article provides an easy-to-understand introduction to BGP and how it is used to shape traffic. It describes how BGP helps the Internet function correctly and how it gets data to the correct destination. It also details a number of ways in which networks can choose which path their data should take.

The Problem BGP Solves

When sending data over a network, all the devices need to know where to send that data. Routers and switches have multiple ports. To know which port to send a specific data packet out of, they need to have some kind of routing table. Routers build this table by communicating with each other using routing protocols. BGP is one such routing protocol; others include OSPF and EIGRP.

What makes BGP stand out from other routing protocols is that it is used for external communications. It is how different networks communicate with one another. Sending data to a different network is more difficult than sending data inside your own network. There are a number of potential problems when communicating with other networks that don't exist when communicating within your own network.

These problems include:

There are other considerations when designing and maintaining a network which communicates with the Internet, but these are some of the key problems to look out for. BGP does a good job of addressing each of these.

In essence, BGP solves the problem of how to effectively communicate data between separate networks. These networks might have different internal routing protocols, and they might be managed in completely incompatible ways. However, BGP provides a mutual language that all networks can understand and use to communicate with one another. BGP is a powerful tool for choosing which paths data takes to reach its destination.

External v. Internal Traffic

A single network is operated and maintained by a single entity. There is a set of procedures, policies, best practices and protocols which are used by this single entity. The network is run in a particular way.

Another network might be run in a totally different way. The Department of Defense is likely to have very different concerns than Facebook. These different concerns are going to show up in how each network is designed and operated. But Facebook and the DoD will still want to be able to communicate and pass traffic to one another. Because of this dilemma, there are two types of network protocols: internal and external gateway protocols.

As you may have guessed from the names, one set of protocols is used internally, within a network, while the other is used externally to communicate with outside networks. The DoD may have a custom, highly encrypted, very secure internal network that might not be compatible with the internal network of Facebook. To communicate with one another they use an external protocol, such as BGP.

An external protocol is set up on a network's edge routers. An edge router is a router that sits at the edge of a network and acts as a border between the internal network and outside networks. This edge router is likely running two (or more) different protocols: that network's internal gateway protocol (such as OSPF), and the external gateway protocol (BGP).

All routers have a routing table which is used to determine where to send data. A routing table can be made up of different routes learned from different routing protocols. So an edge router running BGP and OSPF will have a single routing table which will have routes learned from both protocols.

This routing table is then used to figure out where to send packets. This combination of routes learned through OSPF and through BGP allows the router to efficiently communicate between the external network and the internal network.
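As a rough illustration of a single table holding routes from both protocols, here is a minimal Python sketch. The prefixes, next-hop names, and the lookup function are all made up for this example; a real router stores far more information per route and applies additional rules when two protocols offer the same prefix.

```python
import ipaddress

# Hypothetical routing table on an edge router. Each entry records the
# prefix, the protocol the route was learned from, and a made-up next hop.
ROUTING_TABLE = [
    (ipaddress.ip_network("192.168.1.0/24"), "OSPF", "internal-router-a"),
    (ipaddress.ip_network("192.168.2.0/24"), "OSPF", "internal-router-b"),
    (ipaddress.ip_network("10.10.0.0/24"), "BGP", "peer-amazon"),
]


def lookup(destination):
    """Return (protocol, next_hop) for the most specific matching route."""
    address = ipaddress.ip_address(destination)
    matches = [route for route in ROUTING_TABLE if address in route[0]]
    if not matches:
        return None  # no route known for this destination
    best = max(matches, key=lambda route: route[0].prefixlen)
    return best[1], best[2]


print(lookup("192.168.2.7"))  # internal destination, learned through OSPF
print(lookup("10.10.0.25"))   # external destination, learned through BGP
```

The point of the sketch is only that one lookup serves both kinds of destination, regardless of which protocol supplied the route.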

Often a network will also run iBGP, which is an internal version of BGP, along with OSPF or another internal gateway protocol.

BGP Sessions

A BGP session is a connection between two networks using the BGP protocol. A BGP session requires at least two IP addresses, one for each side, within the same network block (e.g., a /30). Each side of every BGP session will have a unique IP address.
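To see why a /30 fits a two-sided session, the short Python sketch below uses the standard ipaddress module. The block 192.0.2.0/30 is a documentation-only range chosen for this example, not something from a real session.

```python
import ipaddress

# Example /30 taken from a range reserved for documentation.
session_block = ipaddress.ip_network("192.0.2.0/30")

# A /30 holds four addresses. The first and last are the network and
# broadcast addresses, which leaves exactly two usable host addresses.
side_a, side_b = session_block.hosts()

print(session_block.num_addresses)
print(side_a, side_b)
```

One usable address goes to each side of the session.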

Networks can connect using IPv4 or IPv6 BGP. An IPv4 session will use a set of IPv4 addresses for the session and will be used to transfer IPv4 routes. IPv6 BGP uses an IPv6 pair of addresses and will be used to exchange IPv6 routes.

Two networks can have multiple BGP sessions with each other at different locations or even at the same location using the same equipment. A router can be set up to have both IPv4 and IPv6 BGP sessions, and/or multiple IPv4 and IPv6 sessions.

Each network must also have an Autonomous System Number (ASN) identifying the network.

Announcing BGP Routes

BGP is used by routers to learn routes (destinations). A route is an IP block such as 10.10.10.0/24 (this example is not a public IP). Specifically, BGP is used as a way to communicate routes from one network to another network.

A network announces which routes it knows how to reach. If Facebook knows how to reach 10.10.10.0/24, it can announce this block to all the other networks it connects to through BGP. Announcing a network means you are claiming that you have a route to that IP block. The IP block might not be within your own network. You might have learned how to reach that IP block from another network. So an announced route does not have to come from within a network. The destination can be within another network that Facebook is connected to.

For example, say Facebook peers with (peering is another word for connecting to) Amazon, and also peers with the Department of Defense. Let's say Amazon owns IP block 10.10.0.0/24 and announces this to Facebook using BGP. Facebook can then announce this block to the Department of Defense. Even though the block is not within Facebook's network, Facebook knows how to reach this block. The DoD can then send packets destined for the 10.10.0.0/24 route to Facebook, which sends them on to Amazon.

One important thing to know is that a network chooses which routes to announce to other networks. Even if Amazon is announcing that block to Facebook, Facebook does not have to announce the block on to the DoD. If Facebook chooses not to announce that block to the DoD, then the DoD will have to find a different path to reach that IP block.

BGP allows networks to choose what they announce, to whom and at which peering locations. Facebook and Amazon might have two different peering locations, one in San Francisco and one in New York. Amazon can choose to announce 10.10.0.0/24 to Facebook only in San Francisco. That means that Facebook will only be able to send data to that IP block through their San Francisco peering location, and not in New York.
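The San Francisco and New York example can be modeled as a small lookup table. The following Python sketch is purely illustrative: the EXPORT_POLICY structure and both function names are invented here, and real routers express this kind of policy in their own configuration language.

```python
# Hypothetical export policy for Amazon: which prefixes it announces to
# which peer at which peering location.
EXPORT_POLICY = {
    ("Facebook", "San Francisco"): {"10.10.0.0/24"},
    ("Facebook", "New York"): set(),
}


def announced_routes(peer, location):
    """Prefixes announced to `peer` at `location` (empty if none)."""
    return sorted(EXPORT_POLICY.get((peer, location), set()))


def reachable_locations(peer, prefix):
    """Locations where `peer` hears `prefix` and can therefore send traffic."""
    return sorted(
        location
        for (policy_peer, location), prefixes in EXPORT_POLICY.items()
        if policy_peer == peer and prefix in prefixes
    )


print(announced_routes("Facebook", "New York"))
print(reachable_locations("Facebook", "10.10.0.0/24"))
```

Because nothing is announced in New York, Facebook only learns a path to the block at the San Francisco location.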

As you are starting to see, BGP is great at allowing networks to shape how traffic to their network travels. The fact that networks can decide what to announce to whom and where is a key difference between peering and IP transit.

Accepting BGP Routes

Not only can networks choose what they announce, they can also choose what to accept. For a route to be propagated, it not only needs to be announced, but also accepted. If Amazon is announcing 10.10.0.0/24 to Facebook, Facebook must still accept that route. If Facebook chooses not to accept that route, then it is as if Amazon were not announcing the route. Choosing to not accept a route means that Facebook's routers do not learn that route from that particular BGP session.

Most larger networks filter which routes they accept or limit the number of routes that they will accept. This is done to prevent route leaking or hijacking. A route leak happens when someone misconfigures an edge router and attempts to announce IP blocks that do not belong to their network. In 2008 there was a route leak which caused traffic meant for YouTube to be sent to Pakistan. This route was accepted by a major network and propagated throughout the Internet. This led to YouTube being inaccessible for large sections of the Internet, as traffic for that destination was sent to Pakistan rather than to YouTube's servers.

Route hijacking is similar to a route leak, only it is done on purpose. Route hijacking is usually done by spammers or other malicious actors. They claim to have permission to announce an IP block that they do not actually have permission to announce. If their announcement is accepted, they can then use these IP addresses as an origin point for their spam emails.

Because of these potential issues, BGP allows networks to filter what they accept. Some ISPs will create explicit filters where they ask a customer for all the routes the customer plans to announce. If a customer attempts to announce a route that is not within their ISP's filter, the ISP will not accept that route.

Networks can also create prefix limits, limiting the number of routes announced to them. If another network attempts to announce too many routes, the BGP session is turned off. This can prevent networks from leaking more routes than they actually mean to announce.
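The two safeguards described above, an explicit filter and a prefix limit, can be sketched in a few lines of Python. The Session class below is a toy model written for this article; its names and behavior are assumptions, and real routers differ in details such as exactly which routes count toward the limit.

```python
import ipaddress


class Session:
    """Toy model of the inbound side of one BGP session."""

    def __init__(self, allowed_prefixes, max_prefixes):
        self.allowed = {ipaddress.ip_network(p) for p in allowed_prefixes}
        self.max_prefixes = max_prefixes
        self.received = set()  # every prefix the neighbor has announced
        self.accepted = set()  # prefixes that passed the filter
        self.up = True

    def receive(self, prefix):
        """Process one announcement; return True if the route is accepted."""
        if not self.up:
            return False
        network = ipaddress.ip_network(prefix)
        self.received.add(network)
        if len(self.received) > self.max_prefixes:
            # Prefix limit exceeded: turn the session off and drop its routes.
            self.up = False
            self.accepted.clear()
            return False
        if network not in self.allowed:
            return False  # not in the explicit filter, so not accepted
        self.accepted.add(network)
        return True


session = Session(["10.10.0.0/24", "10.10.1.0/24"], max_prefixes=3)
print(session.receive("10.10.0.0/24"))     # in the filter
print(session.receive("203.0.113.0/24"))   # not in the filter
print(session.receive("10.10.1.0/24"))     # in the filter
print(session.receive("198.51.100.0/24"))  # fourth prefix, over the limit
print(session.up)
```

In this sketch the limit counts every announced prefix, including filtered ones, which matches the idea of catching a neighbor that is announcing far more than expected.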

So, BGP also allows networks to control which announcements they will accept. Again, this is great for shaping and managing traffic.

BGP Path Selection

Often a router will have multiple paths to a destination. When this happens, BGP has a few metrics to decide which is the best path to reach a destination. Adjusting these selection criteria allows a network to shape which paths traffic primarily goes through.

Let's look at a made-up example.

Let's say Amazon owns the block 10.10.0.0/22. Let's also say that Amazon buys Internet from Level 3 and from Centurylink. It has a BGP session with both networks. Amazon can announce 10.10.0.0/22 to both Level 3 and Centurylink.

Let's say Facebook wants to send data to 10.10.0.4. Where will Facebook send that traffic? To Level 3 or to Centurylink? It depends.

One of the main things BGP looks at when selecting the best route is the number of networks that the path goes through before reaching the destination. So if Facebook connects directly to Level 3, then the data goes from Facebook --> Level 3 --> Amazon. If Facebook does not have a direct relationship with Centurylink, the path will be longer: Facebook --> Some networks --> Centurylink --> Amazon. If this is the case, and all else is equal, then the data will always be sent through Level 3.

AS Prepending

But what if all else is equal and Facebook has a direct connection to both Level 3 and Centurylink? Then the paths are equally long:

Facebook --> Level 3 --> Amazon

Facebook --> Centurylink --> Amazon

In this case other metrics will be used. But let's say that for Amazon, Level 3 is cheaper than Centurylink. Amazon can influence the path the data takes by using something called AS prepending. This is a way to add extra distance in the network path. AS prepending can make the paths look like this:

Facebook --> Level 3 --> Amazon

Facebook --> Centurylink --> Amazon --> Amazon --> Amazon

Now the path through Centurylink looks extra long, because it appears that there are two 'fake' Amazon networks that the data has to go through to reach the final destination, Amazon. All else being equal, Facebook will now send all the data through Level 3, because it looks like the shortest path. AS prepending is a popular way for destination networks to manage where traffic is sent.
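The effect of prepending on path length can be sketched in Python. This is a simplified model written for this article: real AS paths are lists of AS numbers rather than network names, and the helper names below are invented for illustration.

```python
def prepend(as_path, own_as, times):
    """Return the path with `own_as` added `times` extra times at the front."""
    return [own_as] * times + list(as_path)


def pass_along(as_path, upstream):
    """An upstream adds itself to the path when it passes the route on."""
    return [upstream] + list(as_path)


def shortest_as_path(paths_by_neighbor):
    """Pick the neighbor whose advertised AS path is shortest."""
    return min(paths_by_neighbor, key=lambda n: len(paths_by_neighbor[n]))


origin = ["Amazon"]

# Amazon announces normally to Level 3, but prepends itself twice toward
# Centurylink to make that path look longer.
seen_by_facebook = {
    "Level 3": pass_along(origin, "Level 3"),
    "Centurylink": pass_along(prepend(origin, "Amazon", 2), "Centurylink"),
}

print(seen_by_facebook["Centurylink"])
print(shortest_as_path(seen_by_facebook))
```

With two extra copies of Amazon in the Centurylink path, the Level 3 path is the shorter of the two, so it is the one selected when nothing else differs.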

Local Preference

But the sending network also has a way to shape this traffic.

Let's say the above is all true, but Facebook is also paying Level 3 and Centurylink for Internet. And let's say that Facebook gets cheaper Internet from Centurylink. So even though Amazon wants traffic to be sent through Level 3, Facebook will want traffic to go through Centurylink.

Facebook can use something called local preference to decide where to send their outgoing data. Local preference is a BGP metric that decides which path is preferred. If Facebook sets a higher local preference number for Centurylink, then the traffic will go through Centurylink rather than Level 3.

Local preference trumps AS path length, so even if Amazon attempts to make the AS path look longer by using AS prepending, the traffic will still go through Centurylink.
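The ordering of these two criteria can be sketched as a sort key in Python. This covers only the two steps discussed so far; the Route class, the field names, and the preference values of 100 and 200 are assumptions made for this example, and real BGP best-path selection has more steps.

```python
from dataclasses import dataclass


@dataclass
class Route:
    via: str         # neighbor the route was learned from
    local_pref: int  # set by the receiving network; higher is preferred
    as_path: tuple   # networks the route has passed through


def best_route(routes):
    """Prefer the highest local preference, then the shortest AS path."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


# Facebook's view: Amazon has prepended on the Centurylink path, but
# Facebook has given Centurylink a higher local preference.
routes = [
    Route("Level 3", 100, ("Level 3", "Amazon")),
    Route("Centurylink", 200, ("Centurylink", "Amazon", "Amazon", "Amazon")),
]
print(best_route(routes).via)

# With equal local preference, the shorter AS path decides instead.
equal_pref = [Route(r.via, 100, r.as_path) for r in routes]
print(best_route(equal_pref).via)
```

AS path length is only consulted when local preference is tied, which is why prepending cannot override the sender's choice here.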

Most Specific Route

In the end, the originating network (in this case Amazon) is always able to shape traffic how they want. This is because a more specific route will always win. This means that a smaller IP block is going to be preferred to a large IP block. This beats all other BGP criteria.

What this means is that a /24 is always going to beat a /23 that covers the same addresses.

So what Amazon can do is take 10.10.0.0/22 and announce it in two ways to their two upstreams. Since Amazon wants to avoid receiving traffic through Centurylink, they can announce only 10.10.0.0/22 to Centurylink. This announcement is accepted as normal and passed on to the Internet, including being passed on to Facebook.

Amazon can then split 10.10.0.0/22 into 10.10.0.0/23 and 10.10.2.0/23, and announce those two more specific routes to Level 3. Level 3 then passes those routes along to the rest of the Internet as two /23s, which are more specific than a /22. All traffic destined for any IP within 10.10.0.0/22 will now go through Level 3.
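The split and the resulting route choice can be checked with Python's standard ipaddress module. This is a minimal sketch: the routes dictionary and the next_hop function are invented for illustration and stand in for Facebook's view of the announcements.

```python
import ipaddress

block = ipaddress.ip_network("10.10.0.0/22")

# Split the /22 into its two /23 halves.
more_specifics = list(block.subnets(new_prefix=23))
print(more_specifics)

# Facebook's view: the /22 arrives through Centurylink, the /23s through
# Level 3.
routes = {block: "Centurylink"}
routes.update({subnet: "Level 3" for subnet in more_specifics})


def next_hop(destination):
    """Return the upstream for the most specific route covering the address."""
    address = ipaddress.ip_address(destination)
    matching = [net for net in routes if address in net]
    if not matching:
        return None
    return routes[max(matching, key=lambda net: net.prefixlen)]


print(next_hop("10.10.0.4"))
print(next_hop("10.10.3.200"))
```

Every address in the /22 falls inside one of the two /23s, so the more specific routes through Level 3 are chosen for the whole block.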

Facebook can choose to not accept the more specific routes from Level 3, but that is generally bad practice. If the Centurylink connection were to go down, Facebook would then have no way to reach 10.10.0.0/22, because it chose not to accept the routes offered over the Level 3 connection.

BGP For Redundancy

As you see from the above, there are different ways in which BGP allows networks to shape their traffic. There is a balance between what the announcing network and the accepting network can do in shaping traffic.

You may be wondering why Amazon would have a link with Centurylink at all, if they want all of their data to go through Level 3. The most common reason is to have redundancy. Network outages happen all the time. Fiber gets cut, hardware fails, someone messes up a configuration. A company like Amazon cannot afford to have even a moment of downtime. If their connection with Level 3 were to have a problem, they want to be able to switch traffic over to Centurylink immediately. BGP allows this to happen.

BGP allows networks to have multiple connections to the Internet. The networks can prefer one of these connections, but all the connections are there and ready to be used if there are ever any problems with the preferred connection.
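The failover idea can be reduced to a very small Python sketch. The function and the link-state dictionary are hypothetical; in practice the switch happens because routes learned over a failed session are withdrawn and the next-best path takes over.

```python
def choose_upstream(link_is_up, preferred_order):
    """Return the first upstream in preference order whose link is up."""
    for name in preferred_order:
        if link_is_up.get(name):
            return name
    return None  # no working connection at all


preference = ["Level 3", "Centurylink"]
links = {"Level 3": True, "Centurylink": True}
print(choose_upstream(links, preference))  # preferred connection is used

links["Level 3"] = False                   # simulated fiber cut
print(choose_upstream(links, preference))  # traffic moves to the backup
```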