BGP Next-Hop-Self

Intro

By default, a BGP router does not modify the next hop when it advertises eBGP-learned prefixes to its iBGP peers. This can cause problems if a router receives a prefix from an iBGP peer and doesn’t know how to reach the next hop. If the recursive lookup for the next hop fails, the prefix will not be considered for best path selection and will never make it to the routing table. I’ll show you this problem and how to fix it using BGP next-hop-self in this tutorial.

Topology

BGP Next Hop Self lab topology with IP addressing details, featuring three routers: R1 (AS1), R2 and R3 (AS23)
The last octet is the router number unless specified otherwise. Example: R1's G0/0 is 10.10.12.1/24. Don't forget to download the EVE-NG topology file for this tutorial below.

Initial Configs

Configuration Steps

1. Configure Basic BGP

First, we establish the BGP sessions (eBGP between R1 and R2, iBGP between R2 and R3) and have R1 advertise the 1.1.1.1/32 network into BGP.

R1:

router bgp 1
 neighbor 10.10.12.2 remote-as 23
 network 1.1.1.1 mask 255.255.255.255

R2:

router bgp 23
 neighbor 10.10.12.1 remote-as 1
 neighbor 10.10.23.3 remote-as 23

R3:

router bgp 23
 neighbor 10.10.23.2 remote-as 23

Using the show ip bgp summary command on R2 and R3, I can verify that the neighbors are up and see how many prefixes are being received from each of them.

R2#sh ip bgp summary
BGP router identifier 10.10.23.2, local AS number 23
BGP table version is 2, main routing table version 2
1 network entries using 144 bytes of memory
1 path entries using 84 bytes of memory
1/1 BGP path/bestpath attribute entries using 160 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 412 total bytes of memory
BGP activity 1/0 prefixes, 1/0 paths, scan interval 60 secs

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.10.12.1      4            1      20      19        2    0    0 00:15:10        1
10.10.23.3      4           23      19      20        2    0    0 00:15:04        0

R2’s peering to R1 and R3 is up. I can also see a single prefix being received from R1.
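If you want to run this check from a script instead of reading the table by eye, here is a minimal Python sketch. It assumes the neighbor table has the same ten-column layout as the output above; the function name parse_bgp_summary and the way the text gets collected from the router are my own assumptions, not part of any Cisco tooling.

```python
def parse_bgp_summary(output):
    """Map each neighbor IP to its State/PfxRcd value from 'show ip bgp summary' text."""
    neighbors = {}
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Neighbor":
            # Header row: everything after this is one neighbor per line.
            in_table = True
            continue
        if in_table and len(fields) >= 10:
            # The last column is a prefix count once the session is Established,
            # otherwise a state name such as Idle or Active.
            state = " ".join(fields[9:])
            neighbors[fields[0]] = int(state) if state.isdigit() else state
    return neighbors


# Neighbor table copied from R2's output above.
R2_SUMMARY = """\
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.10.12.1      4            1      20      19        2    0    0 00:15:10        1
10.10.23.3      4           23      19      20        2    0    0 00:15:04        0
"""

result = parse_bgp_summary(R2_SUMMARY)
for ip, state in result.items():
    if isinstance(state, int):
        print(f"{ip}: up, {state} prefixes received")
    else:
        print(f"{ip}: not established ({state})")
```

A number in the last column means the session is Established, which is why the sketch treats an integer as "up".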

R3#sh ip bgp summary
BGP router identifier 10.10.23.3, local AS number 23
BGP table version is 1, main routing table version 1
1 network entries using 144 bytes of memory
1 path entries using 84 bytes of memory
1/0 BGP path/bestpath attribute entries using 160 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 412 total bytes of memory
BGP activity 1/0 prefixes, 1/0 paths, scan interval 60 secs

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.10.23.2      4           23      20      19        1    0    0 00:15:04        1

The peering from R3’s perspective is up and a single prefix is being received from R2. I’ll check the BGP table on R2. 

R2#sh ip bgp
BGP table version is 2, local router ID is 10.10.23.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
              t secondary path, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>   1.1.1.1/32       10.10.12.1               0             0 1 i

R2 has the 1.1.1.1/32 prefix in its BGP table and the “greater than” sign all the way on the left means this path has been marked as best. This prefix should now be in R2’s routing table.

R2#show ip route | begin Gateway
Gateway of last resort is not set

      1.0.0.0/32 is subnetted, 1 subnets
B        1.1.1.1 [20/0] via 10.10.12.1, 00:23:57
      10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
C        10.10.12.0/24 is directly connected, GigabitEthernet0/0
L        10.10.12.2/32 is directly connected, GigabitEthernet0/0
C        10.10.23.0/24 is directly connected, GigabitEthernet0/1
L        10.10.23.2/32 is directly connected, GigabitEthernet0/1

R2 has no issues marking this prefix as best and placing it into the routing table. Now I’ll check R3.

R3#sh ip bgp
BGP table version is 1, local router ID is 10.10.23.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
              t secondary path, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 * i  1.1.1.1/32       10.10.12.1               0    100      0 1 i

The prefix is in R3’s BGP table with a next hop IP of 10.10.12.1, but there is no “greater than” sign, so it has not been marked as the best path. Let’s check the details of the prefix.

R3#show ip bgp 1.1.1.1
BGP routing table entry for 1.1.1.1/32, version 0
Paths: (1 available, no best path)
Flag: 0x4100
  Not advertised to any peer
  Refresh Epoch 1
  1
    10.10.12.1 (inaccessible) from 10.10.23.2 (10.10.23.2)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0

When I run the show ip bgp 1.1.1.1 command, it says that the next hop IP of 10.10.12.1 is inaccessible. If R3 doesn’t know how to reach the next hop, the prefix will never be marked as “best” and won’t make it into the routing table. A quick check of the routing table will confirm whether or not R3 knows how to reach the next-hop IP address of 10.10.12.1.

R3#sh ip route | begin Gateway
Gateway of last resort is not set

      10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C        10.10.23.0/24 is directly connected, GigabitEthernet0/1
L        10.10.23.3/32 is directly connected, GigabitEthernet0/1

There’s no route for the 10.10.12.0/24 network, which confirms that R3 has no idea how to reach the next hop used in the 1.1.1.1/32 prefix. To fix this, I need R2 to change the next-hop IP to itself when it sends eBGP-learned prefixes to its iBGP peer, R3. I’ll use the BGP next-hop-self feature for this.

If a recursive lookup to the next hop fails, the prefix won't be marked as best and will never make it into the routing table.
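To make the recursive lookup concrete, here is a simplified Python sketch of the check R3 is doing. It only models a longest-prefix match of the next hop against the routes in R3's routing table shown above; real IOS next-hop resolution has more to it, and the function name resolve_next_hop is my own.

```python
import ipaddress


def resolve_next_hop(next_hop, rib):
    """Return the longest-match RIB prefix covering next_hop, or None if there is none."""
    addr = ipaddress.ip_address(next_hop)
    matches = [ipaddress.ip_network(p) for p in rib if addr in ipaddress.ip_network(p)]
    # Longest prefix wins; default=None covers the "no matching route" case.
    return max(matches, key=lambda net: net.prefixlen, default=None)


# The two routes from R3's routing table above.
R3_RIB = ["10.10.23.0/24", "10.10.23.3/32"]

print(resolve_next_hop("10.10.12.1", R3_RIB))  # None
print(resolve_next_hop("10.10.23.2", R3_RIB))  # 10.10.23.0/24
```

The next hop 10.10.12.1 matches nothing in R3's table, which is why the path is flagged as inaccessible. An address on the 10.10.23.0/24 link, such as R2's 10.10.23.2, resolves through the connected route.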

2. Configuring BGP next-hop-self

BGP next-hop-self is configured on a per-neighbor basis.

R2:

router bgp 23
 neighbor 10.10.23.3 next-hop-self

This tells R2 that, right before it sends the eBGP-learned prefix to its iBGP peer R3, it should remove the next-hop IP of 10.10.12.1 and replace it with the IP R2 is using to peer with R3.

In other words, R2 is telling R3: “Hey, if you wanna reach 1.1.1.1/32, use me as the next hop and I’ll take care of the rest!”
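Here is a small Python sketch of that rewrite, purely as an illustration. The Path class and the advertise_to_ibgp function are hypothetical names I made up to model the behavior; this is not how IOS implements it internally.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Path:
    prefix: str
    next_hop: str
    as_path: tuple


def advertise_to_ibgp(path, local_peering_ip, next_hop_self=False):
    """Return the path as it would be sent to an iBGP peer."""
    if next_hop_self:
        # next-hop-self: swap the original next hop for our own peering address.
        return replace(path, next_hop=local_peering_ip)
    # Default behavior: the eBGP-learned next hop is passed along unchanged.
    return path


# 1.1.1.1/32 as R2 learned it from R1 over eBGP.
learned_from_r1 = Path("1.1.1.1/32", "10.10.12.1", (1,))

default_update = advertise_to_ibgp(learned_from_r1, "10.10.23.2")
rewritten_update = advertise_to_ibgp(learned_from_r1, "10.10.23.2", next_hop_self=True)

print(default_update.next_hop)    # 10.10.12.1
print(rewritten_update.next_hop)  # 10.10.23.2
```

Only the next hop changes. The prefix and the AS path are sent to R3 exactly as R2 received them.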

Now I’ll check the BGP table on R3 to see if anything changed.

R3#sh ip bgp          
BGP table version is 7, local router ID is 10.10.23.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
              t secondary path, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>i  1.1.1.1/32       10.10.23.2               0    100      0 1 i

The above show ip bgp output tells me the 1.1.1.1/32 prefix has been marked as best. The IP used in the next-hop is now 10.10.23.2 which is the IP R2 uses to peer with R3. We already know R3 can reach that address because it’s used for the iBGP peering. The BGP next-hop-self command did its job perfectly. Let’s look at the details of the prefix. 

R3#show ip bgp 1.1.1.1
BGP routing table entry for 1.1.1.1/32, version 6
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  Refresh Epoch 1
  1
    10.10.23.2 from 10.10.23.2 (10.10.23.2)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0

The details of the prefix show a next-hop IP address of 10.10.23.2, which means no more inaccessible next hops. The network is in the BGP table, but we still need to make sure it entered the routing table.

R3#show ip route | begin Gateway
Gateway of last resort is not set

      1.0.0.0/32 is subnetted, 1 subnets
B        1.1.1.1 [200/0] via 10.10.23.2, 00:01:32
      10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C        10.10.23.0/24 is directly connected, GigabitEthernet0/1
L        10.10.23.3/32 is directly connected, GigabitEthernet0/1

R3’s next hop processing issues are resolved and the 1.1.1.1/32 network is now in the routing table. I hope you found this tutorial helpful. Make sure to lab it out in EVE-NG using the topology file provided below. 

EVE-NG Lab File

Images used in lab: VIOS-ADVENTERPRISEK9-M, Version 15.9(3)M2

Here’s the EVE-NG topology file if you want to import it into your own EVE-NG lab and practice. Please note this is the topology file, not the Cisco images; we don’t provide those.

Download

Full Configs

Here are the full configs from all routers if you want to try it out yourself.

Got questions?

If you have any questions or comments, feel free to send me an email at rafael@networkengineerpro.com and I’ll get back to you when I can.