Jun 23, 2014

BGP Routing Issues Case Study 1 - BGP configuration without filter

I have been working with BGP since 1998 and, like many other people, I made my share of human errors before I fully understood the protocol: I would copy a sample configuration from the Cisco website, modify it, and apply it to a production BGP router. It is dangerous to implement something on a production network when you know only part of it.

This is why I want to start sharing my knowledge and experience with BGP. Maybe it can help some people avoid causing ridiculous BGP incidents on the Internet (e.g. advertising private IP space or a default route to the Internet).

Case Study 1 - BGP configuration without filter


The first BGP case study I would like to share is important for every new BGP administrator, because many people forget the inbound/outbound BGP policy when they activate their first BGP session on a production network. Most new BGP engineers are just happy that the BGP neighbor is established and check only their own routing table... but something may have gone wrong without them being aware of it.

Below is a common network topology for a BGP customer.

  • Transit ISP1 (AS100) will advertise 100.100.0.0/16~100.103.0.0/16
  • Transit ISP2 (AS200) will advertise 200.200.0.0/16~200.203.0.0/16
  • Both AS100 and AS200 are upstream transit ISPs of the BGP customer (AS300).


After the BGP configuration was done, the customer (AS300) BGP admin checked the BGP summary and the BGP table as below:

R3#sh ip bgp summary
BGP router identifier 200.200.200.2, local AS number 300
BGP table version is 10, main routing table version 10
9 network entries using 1017 bytes of memory
17 path entries using 884 bytes of memory
6/3 BGP path/bestpath attribute entries using 648 bytes of memory
4 BGP AS-PATH entries using 96 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 2645 total bytes of memory
BGP activity 9/0 prefixes, 17/0 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
100.100.100.5   4   100      44      44       10    0    0 00:36:45        8
200.200.200.1   4   200      45      44       10    0    0 00:37:53        8


R3#sh ip bgp
BGP table version is 10, local router ID is 200.200.200.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 30.30.30.0/24    0.0.0.0                  0         32768 i
*  100.100.0.0/16   200.200.200.1                          0 200 100 i
*>                  100.100.100.5            0             0 100 i
*  100.101.0.0/16   200.200.200.1                          0 200 100 i
*>                  100.100.100.5            0             0 100 i
*  100.102.0.0/16   200.200.200.1                          0 200 100 i
*>                  100.100.100.5            0             0 100 i
*  100.103.0.0/16   200.200.200.1                          0 200 100 i
*>                  100.100.100.5            0             0 100 i
*  200.200.0.0/16   100.100.100.5                          0 100 200 i
*>                  200.200.200.1            0             0 200 i
*  200.201.0.0/16   100.100.100.5                          0 100 200 i
*>                  200.200.200.1            0             0 200 i
*  200.202.0.0/16   100.100.100.5                          0 100 200 i
*>                  200.200.200.1            0             0 200 i
*  200.203.0.0/16   100.100.100.5                          0 100 200 i
*>                  200.200.200.1            0             0 200 i

It seems perfect: every BGP neighbor was established and all routes were received successfully.

BGP Issues

But after a while, the customer may notice that both Internet uplinks are congested at the same time. Why?

BGP is an interesting protocol in that you cannot inspect another network's BGP configuration; you can only rely on public looking glasses to guess or predict the BGP policy of other peering/transit networks.

Now let me describe a common practice in ISP BGP policy. In BGP, we usually use the local-preference attribute to synchronize the routing policy across all iBGP routers, so local-preference is the key component for setting route priority inside an autonomous system. A higher value wins, and local-preference is compared before AS-path length.

The transit ISP1 (AS100) and ISP2 (AS200) have a direct peering between them. In general, a direct peering relationship does not generate revenue (in most cases, if both parties consider themselves to be of similar scale with a similar customer base, they agree to set up a direct link and share the cost). So both AS100 and AS200 give BGP routes learned from the peering a higher priority than routes learned from their other transit providers, because the more peering routes they have, the more bandwidth and cost they save on transit providers.

In this case, for example, the AS100 and AS200 BGP admins set local-preference to 120 (the default is 100) on routes learned from each other. (AS300 is AS200's transit customer, so AS200 also advertises AS300's routes to AS100 to offload AS200's other transit cost where possible.)

But the AS100 BGP admin would not tune the peering local-preference alone. Why? Look at R1's table when only the peering is tuned:

R1#sh ip bgp
BGP table version is 10, local router ID is 100.100.100.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 30.30.30.0/24    100.100.100.2                 120      0 200 300 i
*                   100.100.100.6            0             0 300 i

With the peering local-preference higher than that of the direct link between AS100 and AS300, AS100 prefers the path via AS200 and would not earn revenue from the transit service it sells to AS300.

So the AS100 BGP admin sets a higher local-preference (e.g. 150) on routes learned over the transit link to AS300, as below:

R1#sh ip bgp
BGP table version is 15, local router ID is 100.100.100.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 30.30.30.0/24    100.100.100.6            0    150      0 300 i
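
The two R1 tables above can be reproduced with a small model. This is only a sketch: it keeps just two steps of the best-path decision (highest local-preference, then shortest AS path) and ignores weight, origin, MED and the remaining tie-breakers. The `Path` class and `best_path` function are names made up for this illustration, not part of any router software.

```python
from dataclasses import dataclass

DEFAULT_LOCAL_PREF = 100  # value used when no policy sets local-preference


@dataclass(frozen=True)
class Path:
    next_hop: str
    as_path: tuple                      # e.g. (200, 300)
    local_pref: int = DEFAULT_LOCAL_PREF


def best_path(paths):
    """Simplified selection: highest local-preference, then shortest AS path."""
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path)))


# R1's two paths to 30.30.30.0/24 when only the peering is tuned to 120.
via_peer = Path("100.100.100.2", (200, 300), local_pref=120)
via_customer_default = Path("100.100.100.6", (300,))
before = best_path([via_peer, via_customer_default])

# The same prefix after the customer link is raised to 150.
via_customer_tuned = Path("100.100.100.6", (300,), local_pref=150)
after = best_path([via_peer, via_customer_tuned])

print(before.next_hop)  # the longer path via the peering wins
print(after.next_hop)   # the direct customer link wins
```

Note that in the first case the longer AS path wins: local-preference is evaluated before AS-path length, which is exactly why the admin has to raise the customer link above 120.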

It seems everything is perfect now. However,
  • if the transit ISP1 and ISP2 BGP admins fully trust the customer's BGP advertisements and apply no inbound BGP filter, and
  • if the customer BGP admin applies no outbound BGP filter restricting advertisements to its own prefixes,
then the following issue occurs:

R1# sh ip bgp regex _200$
BGP table version is 15, local router ID is 100.100.100.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*  200.200.0.0/16   100.100.100.2            0    120      0 200 i
*>                  100.100.100.6                 150      0 300 200 i
*  200.201.0.0/16   100.100.100.2            0    120      0 200 i
*>                  100.100.100.6                 150      0 300 200 i
*  200.202.0.0/16   100.100.100.2            0    120      0 200 i
*>                  100.100.100.6                 150      0 300 200 i
*  200.203.0.0/16   100.100.100.2            0    120      0 200 i
*>                  100.100.100.6                 150      0 300 200 i


R2#sh ip bgp regex _100$
BGP table version is 10, local router ID is 200.200.200.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 100.100.0.0/16   200.200.200.2                 150      0 300 100 i
*                   100.100.100.1            0    120      0 100 i
*> 100.101.0.0/16   200.200.200.2                 150      0 300 100 i
*                   100.100.100.1            0    120      0 100 i
*> 100.102.0.0/16   200.200.200.2                 150      0 300 100 i
*                   100.100.100.1            0    120      0 100 i
*> 100.103.0.0/16   200.200.200.2                 150      0 300 100 i
*                   100.100.100.1            0    120      0 100 i

AS100 and AS200 now exchange their traffic via AS300. That is why both of AS300's uplink transit circuits were congested.
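
The leak can be sketched in a few lines of Python. This is a toy model under stated assumptions, not router behaviour in full: it tracks one prefix per AS, the local-preference values are the ones from this case study, and the helper names (`advertise`, `r1_choice`) are invented for the illustration.

```python
CUSTOMER_PREF, PEER_PREF = 150, 120   # values used in this case study

# Best paths held by R3 (AS300): prefix -> AS path as received.
r3_best = {
    "30.30.30.0/24": (),          # originated locally, empty AS path
    "100.100.0.0/16": (100,),     # learned from AS100
    "200.200.0.0/16": (200,),     # learned from AS200
}


def advertise(local_as, best, to_as, only_own=False):
    """Routes sent to one eBGP neighbor, with the local AS prepended."""
    out = {}
    for prefix, as_path in best.items():
        if only_own and as_path:      # outbound filter, like as-path ^$
            continue
        if to_as in as_path:          # the neighbor would drop it as a loop
            continue
        out[prefix] = (local_as,) + as_path
    return out


def r1_choice(from_customer, allowed=None):
    """AS path that R1 (AS100) selects for 200.200.0.0/16."""
    candidates = [(PEER_PREF, (200,))]            # direct peering path
    for prefix, as_path in from_customer.items():
        if allowed is not None and prefix not in allowed:
            continue                               # ISP inbound prefix filter
        if prefix == "200.200.0.0/16":
            candidates.append((CUSTOMER_PREF, as_path))
    # highest local-preference first, then shortest AS path
    return min(candidates, key=lambda c: (-c[0], len(c[1])))[1]


leak = r1_choice(advertise(300, r3_best, to_as=100))
fixed_by_customer = r1_choice(advertise(300, r3_best, to_as=100, only_own=True))
fixed_by_isp = r1_choice(advertise(300, r3_best, to_as=100),
                         allowed={"30.30.30.0/24"})
print(leak, fixed_by_customer, fixed_by_isp)
```

Either filter on its own is enough to stop the leak in this model, which is why it is worth having both: each side protects itself against a mistake made by the other.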


BGP Best Practice

As a customer BGP admin, please DO REMEMBER to apply a BGP outbound policy that permits only your own BGP prefixes, as below:

R3#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R3(config)#ip as-path access-list 300 permit ^$
R3(config)#route-map ONLY_AS300 permit 10
R3(config-route-map)#match as-path 300
R3(config-route-map)#exit
R3(config)#router bgp 300
R3(config-router)#neighbor 100.100.100.5 route-map ONLY_AS300 out
R3(config-router)#neighbor 200.200.200.1 route-map ONLY_AS300 out
R3(config-router)#^Z
01:35:58: %SYS-5-CONFIG_I: Configured from console by console

R3#clear ip bgp * soft out
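
Why does `^$` select only AS300's own routes? A locally originated route has an empty AS path at the moment the outbound policy is evaluated, and `^$` matches only the empty string. The sketch below approximates the IOS AS-path regular expressions used in this post with Python's `re` module. It is an assumption-laden illustration: `as_path_matches` is a made-up helper, and the handling of `_` is a simplified translation that is adequate for these two patterns only.

```python
import re

# In IOS AS-path regexes "_" matches a delimiter: the start or end of the
# path, a space, a comma, or a brace/parenthesis. Approximated here.
DELIMITER = r"(?:^|$|[ ,{}()])"


def as_path_matches(cisco_regex, as_path):
    """Approximate an IOS AS-path regex match against a path like '300 200'."""
    return re.search(cisco_regex.replace("_", DELIMITER), as_path) is not None


# ^$ : empty AS path, i.e. routes originated by the local AS.
print(as_path_matches("^$", ""))            # own prefix 30.30.30.0/24
print(as_path_matches("^$", "200"))         # route learned from AS200

# _200$ : paths that end in AS200, i.e. routes originated by AS200.
print(as_path_matches("_200$", "300 200"))  # the leaked path seen on R1
print(as_path_matches("_200$", "200 300"))  # originated by AS300 instead
```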

After R3 applied the BGP outbound filter to advertise only its own originated BGP routes to AS100 and AS200, let's see the result on R1 and R2:

R1#sh ip bgp
BGP table version is 31, local router ID is 100.100.100.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*  30.30.30.0/24    100.100.100.2                 120      0 200 300 i
*>                  100.100.100.6            0    150      0 300 i
*> 100.100.0.0/16   0.0.0.0                  0         32768 i
*> 100.101.0.0/16   0.0.0.0                  0         32768 i
*> 100.102.0.0/16   0.0.0.0                  0         32768 i
*> 100.103.0.0/16   0.0.0.0                  0         32768 i
*> 200.200.0.0/16   100.100.100.2            0    120      0 200 i
*> 200.201.0.0/16   100.100.100.2            0    120      0 200 i
*> 200.202.0.0/16   100.100.100.2            0    120      0 200 i
*> 200.203.0.0/16   100.100.100.2            0    120      0 200 i


R2#sh ip bgp
BGP table version is 14, local router ID is 200.200.200.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 30.30.30.0/24    200.200.200.2            0    150      0 300 i
*                   100.100.100.1                 120      0 100 300 i
*> 100.100.0.0/16   100.100.100.1            0    120      0 100 i
*> 100.101.0.0/16   100.100.100.1            0    120      0 100 i
*> 100.102.0.0/16   100.100.100.1            0    120      0 100 i
*> 100.103.0.0/16   100.100.100.1            0    120      0 100 i
*> 200.200.0.0/16   0.0.0.0                  0         32768 i
*> 200.201.0.0/16   0.0.0.0                  0         32768 i
*> 200.202.0.0/16   0.0.0.0                  0         32768 i
*> 200.203.0.0/16   0.0.0.0                  0         32768 i

These routing tables are what we want, and all traffic flows between the networks now match our expectations!

Alternative Solutions

In the past decade, most BGP network admins relied on the manual command line to tune routing one route at a time, and collected NetFlow data to analyse traffic performance and predict bandwidth usage. In most cases, BGP admins change the best route for a specific destination, re-routing traffic to a different transit or peering ISP to fix congestion, loss, or latency issues, only after receiving customer complaints. Such optimization approaches are reactive and still leave room for outages, since making mistakes is only human.

Recently, I learned of another way to help isolate root causes and eliminate issues automatically. There are several solutions on the market that focus on optimizing BGP autonomously, and the most recent one came up during my online research: a platform that claims to improve routes autonomously and enhance overall network performance by about 40%. This product, the Noction Intelligent Routing Platform (Noction IRP), continuously and proactively assesses paths and re-routes traffic to better alternative paths, avoiding congestion, outages, and other routing anomalies occurring in the Internet's middle mile.
I looked around a bit and noticed that the word of mouth about this product on several mailing lists is quite positive. Being curious, I read a case study done with one of their customers, and it seems the platform does its job quite well. In only one month (according to that case study), Noction IRP improved about 23 percent of the total traffic originating from the customer's network. During the same month, Noction IRP announced about 143,000 routing updates, which reduced packet loss by an average of 80% for as many as 18,000 unique prefixes.

The platform also reduced latency by an average of 20%, executing about 458,000 routing updates for about 30,000 unique prefixes. Compared to what a network engineer could do by hand, I think this platform can be a great asset in optimizing routes within multi-homed environments. If you wish to find out more about this product, just google for “Intelligent Routing” and you will get right to their page.

