This blog post assumes that you have read / know about the topics mentioned in the previous blog post. If you have not taken a look at the previous post, you can do so here.


The internet is a network of network consisting of over 50,000 Autonomous Systems (AS) that connect to each other using IXPs and Peering.

Within each AS, they make use of intra-domain routing protocols such as OSPF and BGP to route traffic within their own network. The networks within each AS are usually small. As a result, intra-domain routing protocols usually send large information about each router to each other to route traffic between them.

Between ASes, they make use of inter-domain routing protocols such as BGP to route traffic between each other. There are a large number of routes advertised by each AS. Using intra-domain routing protocols for inter-domain routing would be inefficient and would require large amounts of memory within each router.

Challenges of Inter-Domain routing

There are 3 main factors that make inter-domain routing challenging:

  1. Scale
    1. Around the world, there are millions of routers and over 200,000 prefixes
    2. 35,000+ self operated networks and 50,000 Autonomous Systems
  2. Privacy
    1. ASes do not want to expose internal topologies or business relations with neighbors
  3. Policy
    1. There is no internet wide notion of a link cost metric
    2. Each AS controls where it sends traffic and who can send traffic through it

Types of Routing algorithms

Routing algorithms can be classified into 2 main categories:

  1. Link State Algorithms
  2. Distance Vector Algorithms

Link state algorithms send complete topology and link cost information over the network. The information is then centralized within each router and the shortest distance to each node is computed using Dijkstra’s Algorithm An example of this is the intra-domain routing protocol Open Shortest Path First (OSPF).

However, there are some limitations of Link-State routing

  1. Topology information is flooded
    1. This incurs high bandwidth and storage overhead
    2. Nodes also sends sensitive information
  2. Entire path is computed locally per node
    1. This results in a high processing overhead when the network is large
  3. Minimize notion of total cost
    1. This only works of the policy is shared and uniform among all the routers running the protocol.

Distance Vector Algorithms

Distance Vector Algorithms send known neighbors and link cost information over the network. They make use of decentralized algorithms to compute the shortest distance to each node (Bellman Ford Algorithm). An example of this is the intra-domain routing protocol Routing Information Protocol (RIP).

Some benefits of this approach include:

  1. Hides details of the network topology
  2. Only next hop is determined per node

However, there are some limitations of Distance Vector routing

  1. It minimizes the notion of total distance
  2. The time taken to converge is also very long

Path-Vector Routing

Based on the above, a possible solution for inter-domain routing is by extending the Distance Vector Algorithm approach.

By extending the distance vector routing, path-vector routing routing is born.

The key idea is to advertise the entire path taken to reach a destination instead of just the next hop. Information about the distance metric per destination and the entire path for each destination is sent between routers.

This allows for many advantages:

  1. Faster loop
    1. To detect loops, a single node can check if the next hop consists of itself
    2. If such a path is found, the advertisement is discarded
  2. Flexible routing policies
    1. Each router can decide how to route traffic based on the path advertised

Border Gateway Protocol (BGP)

Border Gateway Protocol (BGP) is an inter-domain routing protocol that is used to exchange routing information between ASes. It is the de facto standard for inter-domain routing.

It allows subnets to advertise its existence to the rest of the internet while also allowing ASes to determine good routes to other networks based on reachability and policies.

2 Types of BGP

There are 2 types of BGP:

  1. Interior BGP (iBGP)
  2. Exterior BGP (eBGP)

Interior BGP (iBGP)

iBGP propagates reachability information across the backbone of an AS. It carries the ISP’s own customer prefixes.

This is used when the neighboring router is within the same AS. It does not require the routers to be directly connected. Intra-domain routing protocols such as OSPF and RIP are used to take care of internal BGP connectivity.

iBGP peers must be full meshed (logically). In other words, they must have an intra-domain routing protocol to connect to each other (Or each router is physically connected to all other routers in the AS). iBGP then run on top of the Intra-domain routing protocol to distribute reachability information from external ASes to each of the individual routers. A similar analogy will be how UDP runs on top of Ethernet.

Exterior BGP (eBGP)

eBGP exchange reachability information from neighboring ASes and implement routing policies. This happens between BGP Speakers from different ASes which are directly connected.

When AS A advertises a prefix to AS B, AS A is promising that it will forward the packet towards that prefix. ASes can aggregate prefixes in its advertisements.


Sample Topology

To make things clearer, we will make use of the diagram above and the table below to explain their differences.

IGP (OSPF, RIP, IS-IS)Distribute internal routes to one anotherWhen Advertising Router B to Router A & Router A to Router B
iBGPDistribute external routes to internal routers (Using IGP)When Advertising Router C to Router B
eBGPDistribute routes from 1 AS to anotherWhen Advertising Router A & B to C and Router C to Router A

How does BGP work?

BGP Message Format

BGP Message Format

A BGP message consists of the following:

Marker16 bytes of all 1sFor compatibility with old BGP versions with no semantic in the current BGP version
Length2 bytesLength of the message
Type1 byteType of the message

Depending on the type, the message will have different fields.

BGP Messages

There are 4 types of messages sent by BGP

1OpenOpens a TCP connection to the peer and authenticates sender
2UpdateSends routing information to the peer (Advertise new paths / withdraw old paths)
3NotificationSends a notification to the peer and reports errors in the previous message, also used to close connections (e.g. Malformed Message)
4Keep AliveSends a keep alive message to the peer to keep the TCP connection alive (Also acts as ACK for OPEN)

One interesting thing to note is that paths have no timeout in BGP, they are actively withdrawn or replaced.

BGP Flow

BGP Flow

The diagram above shows the states of a BGP session between 2 routers.

  1. Idle
    1. This is the first state where BGP waits for a start event
    2. Sets up ConnectRetry timer
    3. Connects to remote BGP neighbor to establish a connection
    4. Listens for a connection from the remote BGP neighbor
  2. Connect
    1. BGP waits for a TCP handshake
    2. If it is successful it moves to the OpenSent State.
    3. If it fails, it goes to the active state
    4. If the ConnectRetry times out, it stays in this state
    5. Else, it goes to the Idle state
  3. Active
    1. BGP will try a second TCP Handshake
    2. If successful, it moves to the OpenSent State
    3. If ConnectRetry Timer expires, goes back to connect state
    4. Listens for incoming connection.
    5. Else, goes to Idle state
  4. OpenSent
    1. BGP waits for an open message from the remote BGP neighbor.
    2. If there are errors in the message, it responds with a BGP notification msg and goes to idle
    3. This is the section where BGP decides if they are using eBGP or iBGP.
    4. If everything is ok, BGP sends keep alive messages and the lower value of keep alive is used
    5. If the TCP session fails, BGP goes back to active state
    6. If there are any other errors, BGP sends notification msg with the error code and goes to idle.
    7. Once everything is sent, BGP goes to the OpenConfirm state
  5. OpenConfirm State
    1. BGP waits for a keep alive message
    2. When we receive the keep alive message, we move to the established state
    3. Timer is reset
    4. If a BGP notification is received, go to idle
  6. Established
    1. The BGP neighbor adjacency is complete. BGP routers will send update packets to exchange routing information.
    2. Every time a keep alive or update message is received, the timer is reset.
    3. If a BGP notification message is received, go to idle

BGP Path Attributes

BGP uses a list of path attributes which describes the path to the destination.

They fall into 4 categories

  1. Well-Known Mandatory
  2. Well-Known Discretionary
  3. Optional Transitive
  4. Optional Non-Transitive

There are also implementation rules that goes with this categories

  1. Implementations must recognize all well-known attributes
  2. Mandatory attributes must be included within UPDATE messages that contain Network Layer Reachability Information (NLRI)
  3. Once a BGP peer updates well-known attributes, they must pass them on to other peers

Common Path Attributes

OriginWell-Known MandatoryConveys the origin of the prefix. Either (i) IGP, (e) EGP, or (?) Incomplete or Unknown Source.
AS_PATHWell-Known MandatoryContains the ASes which NLRI has passed through.
NEXT_HOPWell-Known MandatoryIndicates the IP address of the router in the next hop
LOCAL_PREFWell-Known DiscretionaryThis is used to select external BGP path preferences for a route. The higher the local preference, the more preferred the path becomes
ATOMIC_AGGREGATEWell-Known DiscretionaryUsed to notify downstream BGP routers of suppressed routes
AGGREGATOROptional Discretionary
COMMUNITYOptional TransitiveCommunity tags are used to group multiple destinations together. This is very useful in applying policies within and between ASes.
MULTI_EXIT_DISC (MED)Optional TransitiveUsed to identify multiple entry points to a neighboring AS to control inbound traffic. If received over eBGP, it will not be propagated to other ASes

Entries in forwarding tables

  1. Routers are first aware of the destination prefix (Through BGP advertisements)
  2. Router then determines the output port for the IP prefix (Choose best inter-as route, use IGP to determine the best intra-as route & identify port number for best route)
  3. Router enters the prefix-port pair in the forwarding table

There might be multiple entries for the same prefix in the forwarding table. The router will make a decision on which route to use.

When a packet is received, the router looks up the forwarding table to see where the packet should be forwarded to.

Hot Potato routing

By default, BGP routers will use Hot Potato routing to route packets.

What is Hot Potato Routing?

Hot Potato routing is a routing protocol where the router will forward the packet to the closest route outside the current AS.

An analogy is passing a hot potato among a group of friends. Everyone will want to pass it on to the next person as soon as possible. Similarly, an AS will find the shortest route out of the AS into another external AS as soon as possible.

Note: This may sometimes lead to longer paths from source to destination as the closest path out of the AS might not be the lowest cost for sending the packet from source to destination.

BGP Decision making process

BGP Decision Process

As shown in the diagram above, when there is an inbound advertisement, it is first filtered by the import policies before being handed over to the Adjacent Routing Information Base (RIB) in.

The router then selects the best path to be installed within the local RIB. The paths from the local RIBs will then pass through a set of export policies to the Adjacent RIB out before the update is sent out to other peers.

BGP Policies

There are 2 sets of policies in BGP:

  1. Import policies
  2. Export policies

There are 3 different attributes that routers can make use of to tune their BGP policy

  1. Preferences: Add / Delete / Modify Attributes
  2. Filtering: inbound/outbound filtering
  3. Tagging: Adding community attributes

Import policies

Import policies serve the following roles

  1. Filter unwanted routes from neighbors (EG: Prefixes not owned bu customers)
  2. User to prioritize routes
  3. Influence path selection

Export policies

Export policies serve the following purpose

  1. Filter routes that you do not want to tell your neighbors (EG: Export only customers routes so others don’t route traffic through you)
  2. Manipulate attributes to control what others see (EG: Make routes look longer)

BGP Policies in practice

In practice, BGP policies are used by ISPs to:

  1. Fulfill bilateral agreements with other ISPs
  2. Minimize Monetary costs (Maximize Revenue)
  3. Ensure good performance for customers

We will now examine each relation and see how policies can be applied to maintain the relationship.

Customer-Provider Relation

Customers pay the Provider for access to the internet.

In terms of policies, this translates to:

  1. Export customer routes to everyone
  2. Customer exports provider routes to their customers

Peer-to-Peer Relation

Peers exchange traffic between their customers

In terms of policies, this translates to:

  1. Peers Exports only customer routes to each other
  2. Peers export their other peer’s routes to customers

Attacks on BGP

We will be discussing 2 main attacks on the BGP protocol

  1. BGP Prefix Hijacking
  2. BGP Sub prefix Hijacking

BGP Prefix Hijacking

This happens when an attacker router advertises the same route as the origin router. Traffic which is closer to the attacker router will be redirected to the attacker router.

BGP Sub prefix Hijacking

This happens when an attacker router advertises a more specific prefix. Every AS picks the bogus route for that prefix as the longest matching prefix is used.

Traffic is redirected away from the origin AS.

Follow up on hijacking

After Hijacking the IP address, the attacker can do one or more of the following.

Black holeData for the traffic is discardedThis results in the loss of the service to affected customers
SnoopingData is inspected before it is redirected to the origin serverTraffic can be captured and information might be sniffed
ImpersonationAttacker replies the messageThe attacker replies the message as per usual and fake replies are sent to the user.

Preventing Hijacking

A common practice to prevent hijacking is to filter routes announced by its customers based on the prefixes that the customer owns. This prevents them from advertising prefixes which they do not own.

However, some routes do not implement this practice as it is hard to filter routes which are very far away from the origin. As a result, BGP is still vulnerable to hijacks.

There are up and coming technology used to stop hijacking:

  1. Route Origin Authentication (ROA)
  2. Resource Public Key Infrastructure (RPKI)

We will not be going through them in this blog post. (Maybe in a future blog post?)