BGP Next-Hop-Self
Intro
By default, a BGP router doesn’t change the next hop when it passes eBGP-learned prefixes on to its iBGP peers. This causes problems when a router receives a prefix from an iBGP peer and doesn’t know how to reach the next hop. If the recursive lookup for the next hop fails, the prefix is not considered for best-path selection and never makes it into the routing table. I’ll show you this problem and how to fix it using BGP next-hop-self in this tutorial.
Topology
Initial Configs
conf t
host R1
no ip domain-lookup
line con 0
logg syn
!
int g0/0
no sh
ip add 10.10.12.1 255.255.255.0
!
int lo0
ip add 1.1.1.1 255.255.255.255
conf t
host R2
no ip domain-lookup
line con 0
logg syn
!
int g0/0
no sh
ip add 10.10.12.2 255.255.255.0
!
int g0/1
no sh
ip add 10.10.23.2 255.255.255.0
conf t
host R3
no ip domain-lookup
line con 0
logg syn
!
int g0/1
no sh
ip add 10.10.23.3 255.255.255.0
Configuration Steps
1. Configure Basic BGP
First, we establish the BGP sessions (eBGP between R1 and R2, iBGP between R2 and R3) and have R1 advertise the 1.1.1.1/32 network into BGP.
R1:
router bgp 1
neighbor 10.10.12.2 remote-as 23
network 1.1.1.1 mask 255.255.255.255
R2:
router bgp 23
neighbor 10.10.12.1 remote-as 1
neighbor 10.10.23.3 remote-as 23
R3:
router bgp 23
neighbor 10.10.23.2 remote-as 23
If I use the show ip bgp summary command on R2 and R3, I can verify that the neighbors are up and see how many prefixes are being received from them.
R2#sh ip bgp summary
BGP router identifier 10.10.23.2, local AS number 23
BGP table version is 2, main routing table version 2
1 network entries using 144 bytes of memory
1 path entries using 84 bytes of memory
1/1 BGP path/bestpath attribute entries using 160 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 412 total bytes of memory
BGP activity 1/0 prefixes, 1/0 paths, scan interval 60 secs
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.10.12.1 4 1 20 19 2 0 0 00:15:10 1
10.10.23.3 4 23 19 20 2 0 0 00:15:04 0
R2’s peering to R1 and R3 is up. I can also see a single prefix being received from R1.
R3#sh ip bgp summary
BGP router identifier 10.10.23.3, local AS number 23
BGP table version is 1, main routing table version 1
1 network entries using 144 bytes of memory
1 path entries using 84 bytes of memory
1/0 BGP path/bestpath attribute entries using 160 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 412 total bytes of memory
BGP activity 1/0 prefixes, 1/0 paths, scan interval 60 secs
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.10.23.2 4 23 20 19 1 0 0 00:15:04 1
The peering from R3’s perspective is up and a single prefix is being received from R2. I’ll check the BGP table on R2.
R2#sh ip bgp
BGP table version is 2, local router ID is 10.10.23.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
t secondary path,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.1/32 10.10.12.1 0 0 1 i
R2 has the 1.1.1.1/32 prefix in its BGP table and the “greater than” sign all the way on the left means this path has been marked as best. This prefix should now be in R2’s routing table.
R2#show ip route | begin Gateway
Gateway of last resort is not set
1.0.0.0/32 is subnetted, 1 subnets
B 1.1.1.1 [20/0] via 10.10.12.1, 00:23:57
10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
C 10.10.12.0/24 is directly connected, GigabitEthernet0/0
L 10.10.12.2/32 is directly connected, GigabitEthernet0/0
C 10.10.23.0/24 is directly connected, GigabitEthernet0/1
L 10.10.23.2/32 is directly connected, GigabitEthernet0/1
R2 has no issues marking this prefix as best and placing it into the routing table. Now I’ll check R3.
R3#sh ip bgp
BGP table version is 1, local router ID is 10.10.23.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
t secondary path,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
* i 1.1.1.1/32 10.10.12.1 0 100 0 1 i
The prefix is in R3’s BGP table with a next hop IP of 10.10.12.1 but there is no “greater than” sign indicating this is the best path. Let’s check the details of the prefix.
R3#show ip bgp 1.1.1.1
BGP routing table entry for 1.1.1.1/32, version 0
Paths: (1 available, no best path)
Flag: 0x4100
Not advertised to any peer
Refresh Epoch 1
1
10.10.12.1 (inaccessible) from 10.10.23.2 (10.10.23.2)
Origin IGP, metric 0, localpref 100, valid, internal
rx pathid: 0, tx pathid: 0
When I run the show ip bgp 1.1.1.1 command, it says that the next hop IP of 10.10.12.1 is inaccessible. If R3 doesn’t know how to reach the next hop, the prefix will never be marked as “best” and won’t make it into the routing table. A quick check of the routing table confirms whether or not R3 knows how to reach the next-hop IP address of 10.10.12.1.
R3#sh ip route | begin Gateway
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C 10.10.23.0/24 is directly connected, GigabitEthernet0/1
L 10.10.23.3/32 is directly connected, GigabitEthernet0/1
There’s no route for the 10.10.12.0/24 network, which confirms that R3 has no idea how to reach the next hop used in the 1.1.1.1/32 prefix. To fix this, I need R2 to change the next-hop IP to itself when it sends eBGP-learned prefixes to its iBGP peer, R3. I’ll use the BGP next-hop-self feature for this.
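Before making the change, the same conclusion can be reached by looking up the next-hop address directly instead of scanning the whole table. This is a minimal sketch of that check on R3 using standard IOS show commands; the output is left out because the exact wording varies between IOS versions.

```
! On R3: look up the BGP next hop itself in the routing table
R3#show ip route 10.10.12.1
! On R3: re-check how BGP sees the path; look for "(inaccessible)" after the next hop
R3#show ip bgp 1.1.1.1
```

As long as the first command finds no matching route, BGP cannot resolve the next hop and the path stays out of best-path selection.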
2. Configuring BGP next-hop-self
BGP next-hop-self is configured on a per-neighbor basis.
R2:
router bgp 23
neighbor 10.10.23.3 next-hop-self
What I’m telling R2 to do is this: right before it sends the eBGP-learned prefix to its iBGP peer R3, remove the next-hop IP of 10.10.12.1 and replace it with the IP R2 uses to peer with R3.
In other words, R2 is telling R3: “Hey, if you wanna reach 1.1.1.1/32, use me as the next hop and I’ll take care of the rest!”
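Depending on the IOS version, the new outbound behavior may not reach R3 until R2 re-sends its updates. Here is a hedged sketch of how I’d push and verify it from R2. Both are standard IOS commands, but whether the soft clear is actually needed on your image is an assumption to test in the lab.

```
! On R2: re-send updates to R3 without tearing down the session
R2#clear ip bgp 10.10.23.3 soft out
! On R2: list the prefixes R2 advertises to R3
R2#show ip bgp neighbors 10.10.23.3 advertised-routes
```

The Next Hop column in the second command reflects R2’s own BGP table and may not show the rewritten value, so R3’s BGP table is the more reliable place to confirm the new next hop.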
Now I’ll check the BGP table on R3 to see if anything changed.
R3#sh ip bgp
BGP table version is 7, local router ID is 10.10.23.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
t secondary path,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*>i 1.1.1.1/32 10.10.23.2 0 100 0 1 i
The above show ip bgp output tells me the 1.1.1.1/32 prefix has been marked as best. The IP used in the next-hop is now 10.10.23.2 which is the IP R2 uses to peer with R3. We already know R3 can reach that address because it’s used for the iBGP peering. The BGP next-hop-self command did its job perfectly. Let’s look at the details of the prefix.
R3#show ip bgp 1.1.1.1
BGP routing table entry for 1.1.1.1/32, version 6
Paths: (1 available, best #1, table default)
Not advertised to any peer
Refresh Epoch 1
1
10.10.23.2 from 10.10.23.2 (10.10.23.2)
Origin IGP, metric 0, localpref 100, valid, internal, best
rx pathid: 0, tx pathid: 0x0
The details of the prefix show a next-hop IP address of 10.10.23.2, which means no more inaccessible next hops. The network is in the BGP table, but we still need to make sure it entered the routing table.
R3#show ip route | begin Gateway
Gateway of last resort is not set
1.0.0.0/32 is subnetted, 1 subnets
B 1.1.1.1 [200/0] via 10.10.23.2, 00:01:32
10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C 10.10.23.0/24 is directly connected, GigabitEthernet0/1
L 10.10.23.3/32 is directly connected, GigabitEthernet0/1
R3’s next hop processing issues are resolved and the 1.1.1.1/32 network is now in the routing table. I hope you found this tutorial helpful. Make sure to lab it out in EVE-NG using the topology file provided below.
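If you do lab it out, keep in mind that a route in R3’s table is not the same as end-to-end reachability. With the configs in this tutorial, R1 has no route back to 10.10.23.0/24, so a ping from R3 to 1.1.1.1 is expected to fail on the return path. The following is an optional sketch that is not part of the tutorial configs: it assumes you are fine with R2 originating its R2-R3 link into BGP so that R1 learns a return route.

```
! On R2 (optional, not in the tutorial configs): originate the R2-R3 link into BGP
R2(config)#router bgp 23
R2(config-router)#network 10.10.23.0 mask 255.255.255.0
! On R3: test reachability to R1's loopback once R1 has learned the prefix
R3#ping 1.1.1.1
```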
EVE-NG Lab File
Here’s the EVE-NG topology file if you want to import it into your own EVE-NG lab and practice. Please note this is the topology file, not the Cisco images; we don’t provide those.
Full Configs
Here are the full configs from all routers if you want to try it out yourself.
conf t
host R1
no ip domain-lookup
line con 0
logg syn
!
int g0/0
no sh
ip add 10.10.12.1 255.255.255.0
int lo0
ip add 1.1.1.1 255.255.255.255
!
router bgp 1
neighbor 10.10.12.2 remote-as 23
network 1.1.1.1 mask 255.255.255.255
conf t
host R2
no ip domain-lookup
line con 0
logg syn
!
int g0/0
no sh
ip add 10.10.12.2 255.255.255.0
int g0/1
no sh
ip add 10.10.23.2 255.255.255.0
!
router bgp 23
neighbor 10.10.12.1 remote-as 1
neighbor 10.10.23.3 remote-as 23
neighbor 10.10.23.3 next-hop-self
conf t
host R3
no ip domain-lookup
line con 0
logg syn
!
int g0/1
no sh
ip add 10.10.23.3 255.255.255.0
!
router bgp 23
neighbor 10.10.23.2 remote-as 23
Got questions?
If you have any questions or comments, feel free to send me an email at rafael@networkengineerpro.com and I’ll get back to you when I can.