Software Defect on the S9706 Switch Causes Some Routes to Fail in an L3 VPN Scenario

Problem Description

1. Network topology:


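The core layer consists of two Huawei routers, an NE40E-X16 and an NE5000E, working in load-sharing mode. Each prefecture-level city has one NE40E-X8 that is connected to both core devices. The network side of each ME60-X3 connects to the city core router (NE40E-X8). The S9700 switch connects at Layer 3 to the city's two ME60-X3 BRAS devices for Layer 3 authentication. In the urban area, the district aggregation switches (S9300) aggregate traffic into the league/city core switch (S9700).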

2. Information about the faulty S9700:

Version: V200R006C00SPC500; patch: V200R006SPH005

3. Fault symptom:

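In the L3 VPN scenario, the S9700 fails to forward traffic for some routes. Pinging a service address on the upstream device from inside the VPN instance fails: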

[HH-S-S9303-1]ping -c 10 -vpn-instance guangdian_vod_l3vpn_1 10.132.240.2
PING 10.132.240.2: 56  data bytes, press CTRL_C to break
Request time out
Request time out
Request time out
Request time out
Request time out
Request time out
Request time out
Request time out
Request time out
Request time out
---10.132.240.2 ping statistics ---
10 packet(s) transmitted
0 packet(s) received
100.00% packet loss

 

Alarm Information

Troubleshooting Procedure

1. Check the route learned from the upstream device. The route is normal:

[HH-S-S9706-1]dis ip routing-table  vpn-instance  guangdian_vod_l3vpn_1  10.13.1.249 verbose

Route Flags: R - relay, D - download to fib
------------------------------------------------------------------------------
Routing Table : guangdian_vod_l3vpn_1
Summary Count : 1

Destination: 10.13.1.248/29
     Protocol: IBGP            Process ID: 0
   Preference: 255                    Cost: 0
      NextHop: 10.12.1.1         Neighbour: 10.11.20.1
        State: Active  Adv Relied      Age: 03h37m02s
          Tag: 0                  Priority: low
        Label: 3159                QoSInfo: 0x0
   IndirectID: 0x5
 RelayNextHop: 10.15.1.21        Interface: Vlanif1201
     TunnelID: 0xbd1f                Flags: RD
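
As a quick sanity check on the output above: the command queried 10.13.1.249, while the routing table returned the prefix 10.13.1.248/29. A minimal sketch using Python's standard `ipaddress` module confirms that the queried address falls inside that prefix. The variable names are illustrative only.

```python
import ipaddress

# Address queried in the display command and the prefix returned by the routing table.
queried = ipaddress.ip_address("10.13.1.249")
prefix = ipaddress.ip_network("10.13.1.248/29")

# True when the returned route actually covers the queried address.
covered = queried in prefix

# First and last addresses spanned by the /29 prefix.
first, last = prefix[0], prefix[-1]
print(covered, first, last)
```

The route lookup itself is therefore consistent, which is why the investigation moves on to the forwarding plane.
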
2. Traffic statistics show that the traffic has already been received on XGigabitEthernet1/0/6:

[HH-S-S9706-1]display traffic policy statistics interface XGigabitEthernet 1/0/6 inbound

Interface: XGigabitEthernet1/0/6
Traffic policy inbound: test
Rule number: 7
Current status: OK!
Statistics interval: 300
Board : 1
---------------------------------------------------------------------
Matched      |      Packets:                            10
             |      Bytes:                           1,060
             |      Rate(pps):                           0
             |      Rate(bps):                           0
---------------------------------------------------------------------
Passed       |      Packets:                            10
             |      Bytes:                           1,060
             |      Rate(pps):                           0
             |      Rate(bps):                           0
---------------------------------------------------------------------
Dropped      |      Packets:                             0
             |      Bytes:                               0
             |      Rate(pps):                           0
             |      Rate(bps):                           0
---------------------------------------------------------------------
Filter       |      Packets:                             0
             |      Bytes:                               0
---------------------------------------------------------------------
Car          |      Packets:                             0
             |      Bytes:                               0
---------------------------------------------------------------------
3. Check the chip entries on slot 1. The next-hop entry with index 79216 was programmed incorrectly:

[HH-S-S9706-1-diagnose] display  fpi table slot 1 0 l3-table  fibv4 10.13.1.248   29  5
Table: FIBv4 Table
Unit 0, Total Number of Fibv4 entries: 1310720
Used: 64850;Free: 1245870
-------------------------------------------------------------------------
Entry      VRF        DIP/MASK             RE INDEX   L3HitEn
-------------------------------------------------------------------------
1          5          10.13.1.248/29          1204        0
-------------------------------------------------------------------------
[HH-S-S9706-1-diagnose] display  fpi table   slot   1 0 l3-table re 1204 1
Table: RE Table
Unit 0, Total Number of RE entries: 1048576
Data: 13570000 00c57000 00000060 00000005
-----------------------------------------------------------------------
0  | VcLabel1   SetExp1   PhbId1     OpCode     Valid
| 0          0          0          2          1
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
1  | NhpIndex1  NstIndex1 BGP  Frr  Lbt SetTTL1  RoutePolicy  NstCnt
| 0          0          0   0    0    0       3            0
| -- -- -- -- -- -- -- OpCode=2(Single NextHop L3VPN) -- -- -- -- -- -- --
2  | VcLabel0   Rsv       SetExp0    Exp0       PhbId0     SetTtl0
| 3159       0          0          0          0          0
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
3  | NhpIndex0  Rsv       Ecc
| 79216      0         0
-----------------------------------------------------------------------

[HH-S-S9706-1-diagnose] display  fpi table   slot   1 0 l3-table  nhp 79216 1
Table: Next Hop Table

Unit 0, Total Number of NHP entries: 262144
Current Index: 79216

Data: 00000000 00000000 00000000 00000000
-----------------------------------------------------------------------
0  | Trunk      Mod       Port       FwdType    OportType TagNum     Valid
| 0          0          0          0          0          0          0
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
1  | Ovid       Ivid       Flag
| 0          0          T
| -- -- -- -- -- -- -- --FwdType=Unknown(0)-- -- -- -- -- -- -- -- -  // this entry is wrong
-----------------------------------------------------------------------
4. Check the upper-layer adaptation software table. A parameter error was reported when the NHLFE was delivered:
[HH-S-S9706-1-diagnose]display   adp-mpls nhlfe-info  brief   slot    1
SOFTID   TOKENREC    OUTLABEL OPERTYPE   NHPIDX  PDTIDX     FLAG(FAKE/RET)
---------------------------------------------------------------
48415    48415      153123   PUSH       79216   0x2000001  3/18    // 18 here indicates a parameter error.
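5. Code analysis shows that the next-hop specification checked in the MPLS lower layer is too small: only 65535. When a next-hop index exceeds this value, the L3VPN entry is programmed incorrectly. The next-hop index 79216 seen above exceeds 65535.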
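
The check in steps 4 and 5 can be automated over saved command output. The following is a minimal sketch, not a supported tool: the row layout is assumed from the single sample row in step 4, and the function name `find_suspect_rows` is hypothetical. It flags rows whose NHPIDX exceeds 65535 or whose return code is 18.

```python
import re

NHP_LIMIT = 65535      # lower-layer next-hop limit stated in step 5
RET_PARAM_ERROR = 18   # return code annotated as "parameter error" in step 4

# Assumed row layout, taken from the sample row in step 4:
# SOFTID TOKENREC OUTLABEL OPERTYPE NHPIDX PDTIDX FAKE/RET
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\w+)\s+(\d+)\s+(0x[0-9a-fA-F]+)\s+(\d+)/(\d+)"
)

def find_suspect_rows(output):
    """Return (softid, nhp_index, ret) for rows over the limit or carrying the error code."""
    suspects = []
    for line in output.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, separator, or prompt line
        softid, nhp_index, ret = int(m.group(1)), int(m.group(5)), int(m.group(8))
        if nhp_index > NHP_LIMIT or ret == RET_PARAM_ERROR:
            suspects.append((softid, nhp_index, ret))
    return suspects

sample = """SOFTID   TOKENREC    OUTLABEL OPERTYPE   NHPIDX  PDTIDX     FLAG(FAKE/RET)
---------------------------------------------------------------
48415    48415      153123   PUSH       79216   0x2000001  3/18
"""
print(find_suspect_rows(sample))
```
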

 

 

 

Root Cause

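Based on the information above, the root cause is a mismatch between the next-hop specifications of the upper and lower layers. When the next-hop index allocated by the upper layer is larger than the next-hop specification of the lower layer, L3 VPN forwarding becomes abnormal.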
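
The mismatch can be pictured with a toy model. This is only an illustration, not the device's code: the function names are hypothetical, the upper-layer table size is taken from the "Total Number of NHP entries" value in the chip table output (262144), the lower-layer limit is the 65535 found in the code analysis, and the success value 0 is an assumption.

```python
UPPER_NHP_ENTRIES = 262144  # "Total Number of NHP entries" in the chip table output
LOWER_NHP_LIMIT = 65535     # limit checked by the MPLS lower layer

RET_OK = 0             # assumed success value, for illustration only
RET_PARAM_ERROR = 18   # return code seen in the NHLFE software table

def lower_layer_program(nhp_index):
    """Model of the lower-layer check: reject any index above its own limit."""
    if nhp_index > LOWER_NHP_LIMIT:
        return RET_PARAM_ERROR
    return RET_OK

def upper_layer_accepts(nhp_index):
    """Model of the upper layer: any index inside the full NHP table is valid."""
    return 0 <= nhp_index < UPPER_NHP_ENTRIES

# 79216 is the index from this case; the other two values are arbitrary.
for index in (60000, 65535, 79216):
    print(index, upper_layer_accepts(index), lower_layer_program(index))
```

In this model, index 79216 is valid for the upper layer but rejected by the lower layer, which matches the empty next-hop entry observed on the chip.
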
 

Solution

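Upgrade the software version to resolve the problem. Because the V200R006 version running on the live network has already reached EOFS (End of Full Support), upgrading to V200R007 or later and installing the latest patch is recommended.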

Recommendations and Summary

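1. A network-wide check is recommended to confirm the usage of ARP entries, private network (VPN) routes, and LSPs. Note: on each device, collecting data from a single interface board is sufficient.
Collect the following information: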
[Dis-12708-CNBJPEK12-02-diagnose]dis fib  x statistics  all
[Dis-12708-CNBJPEK12-02-diagnose] dis arp statistics all
[Dis-12708-CNBJPEK12-02-diagnose]display  adp-mpls nhlfe-info  brief slot  x
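2. If the number of ARP entries on an affected device in the live network already exceeds 65535, temporarily reduce the number of ARP entries to below 60,000.
3. Upgrade versions that have reached EOFS to a mainstream version and install the latest patch.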
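
For the network-wide check, a small helper can triage the collected counters. This sketch assumes the per-device ARP entry counts have already been extracted, by hand or by other tooling, into a dictionary. The device names and counts are made up, the two thresholds come from item 2 above, and the intermediate "watch" band is an assumption added for illustration.

```python
HARD_LIMIT = 65535    # lower-layer next-hop limit behind the fault
SAFE_TARGET = 60000   # temporary target recommended in item 2

def triage(arp_counts):
    """Map each device to 'reduce', 'watch', or 'ok' based on its ARP entry count."""
    result = {}
    for device, count in arp_counts.items():
        if count > HARD_LIMIT:
            result[device] = "reduce"  # already past the limit
        elif count >= SAFE_TARGET:
            result[device] = "watch"   # inside the margin just below the limit
        else:
            result[device] = "ok"
    return result

# Hypothetical inventory, for illustration only.
counts = {"switch-a": 70210, "switch-b": 61000, "switch-c": 12000}
print(triage(counts))
```
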
