S9700 (V200R001C00): VRRP state keeps flapping

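Problem Description

During a routine inspection of customer XX's network devices, the log of one switch (S9700_02, V200R001C00SPC300SPH016) showed the VRRP state of several interfaces (VLANIF651, VLANIF696, VLANIF697, VLANIF699) switching repeatedly between Master and Backup.

Running display logbuffer on S9700_02 returned the following: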

-------------------------------------------------------------------------------

Feb 11 2018 18:40:15+08:00 GZ1_EP_AS_02%%01VRRP/4/STATEWARNINGEXTEND(l)[0]:Virtual Router state BACKUP changed to MASTER, because of protocol timer expired. (Interface=Vlanif697, VrId=7, InetType=IPv4)

Feb 11 2018 18:40:08+08:00 GZ1_EP_AS_02%%01VRRP/4/STATEWARNINGEXTEND(l)[1]:Virtual Router state MASTER changed to BACKUP, because of priority calculation. (Interface=Vlanif697, VrId=7, InetType=IPv4)

Feb 11 2018 18:39:49+08:00 GZ1_EP_AS_02%%01VRRP/4/STATEWARNINGEXTEND(l)[2]:Virtual Router state BACKUP changed to MASTER, because of protocol timer expired. (Interface=Vlanif699, VrId=9, InetType=IPv4)

Feb 11 2018 18:39:44+08:00 GZ1_EP_AS_02%%01VRRP/4/STATEWARNINGEXTEND(l)[3]:Virtual Router state BACKUP changed to MASTER, because of protocol timer expired. (Interface=Vlanif696, VrId=11, InetType=IPv4)

Feb 11 2018 18:39:42+08:00 GZ1_EP_AS_02%%01VRRP/4/STATEWARNINGEXTEND(l)[4]:Virtual Router state MASTER changed to BACKUP, because of priority calculation. (Interface=Vlanif699, VrId=9, InetType=IPv4)

Feb 11 2018 18:39:37+08:00 GZ1_EP_AS_02%%01VRRP/4/STATEWARNINGEXTEND(l)[5]:Virtual Router state MASTER changed to BACKUP, because of priority calculation. (Interface=Vlanif696, VrId=11, InetType=IPv4)

Feb 11 2018 18:39:26+08:00 GZ1_EP_AS_02%%01VRRP/4/STATEWARNINGEXTEND(l)[6]:Virtual Router state BACKUP changed to MASTER, because of protocol timer expired. (Interface=Vlanif697, VrId=7, InetType=IPv4)

Feb 11 2018 18:39:20+08:00 GZ1_EP_AS_02%%01VRRP/4/STATEWARNINGEXTEND(l)[7]:Virtual Router state MASTER changed to BACKUP, because of priority calculation. (Interface=Vlanif697, VrId=7, InetType=IPv4)

Feb 11 2018 18:39:13+08:00 GZ1_EP_AS_02%%01VRRP/4/STATEWARNINGEXTEND(l)[8]:Virtual Router state BACKUP changed to MASTER, because of protocol timer expired. (Interface=Vlanif696, VrId=11, InetType=IPv4)

Feb 11 2018 18:39:06+08:00 GZ1_EP_AS_02%%01VRRP/4/STATEWARNINGEXTEND(l)[9]:Virtual Router state MASTER changed to BACKUP, because of priority calculation. (Interface=Vlanif696, VrId=11, InetType=IPv4)

Feb 11 2018 18:38:46+08:00 GZ1_EP_AS_02%%01VRRP/4/STATEWARNINGEXTEND(l)[10]:Virtual Router state BACKUP changed to MASTER, because of protocol timer expired. (Interface=Vlanif651, VrId=1, InetType=IPv4)

Feb 11 2018 18:38:40+08:00 GZ1_EP_AS_02%%01VRRP/4/STATEWARNINGEXTEND(l)[11]:Virtual Router state MASTER changed to BACKUP, because of priority calculation. (Interface=Vlanif651, VrId=1, InetType=IPv4)

-------------------------------------------------------------------------------

S9700_01 and S9700_02 are connected through an Eth-Trunk interface. Running display vrrp on S9700_01 showed that the VRRP state of these VLANIF interfaces was normal and remained Master.
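The switchover pattern in the log can be tallied per interface with a short script. The following is a minimal Python sketch and not part of the original case: the regular expression is an assumption derived from the log entries shown above, and count_transitions is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed pattern for the VRRP/4/STATEWARNINGEXTEND entries shown above.
LOG_RE = re.compile(
    r"Virtual Router state (\w+) changed to (\w+), because of ([^.]+)\. "
    r"\(Interface=(\w+), VrId=(\d+)"
)

def count_transitions(log_text):
    """Return a Counter keyed by (interface, old_state, new_state)."""
    counts = Counter()
    # findall scans the whole text, so entries fused onto one line are still found.
    for old, new, _reason, ifname, _vrid in LOG_RE.findall(log_text):
        counts[(ifname, old, new)] += 1
    return counts

# Three entries copied from the log buffer output above.
sample = """
Feb 11 2018 18:40:15+08:00 GZ1_EP_AS_02%%01VRRP/4/STATEWARNINGEXTEND(l)[0]:Virtual Router state BACKUP changed to MASTER, because of protocol timer expired. (Interface=Vlanif697, VrId=7, InetType=IPv4)
Feb 11 2018 18:40:08+08:00 GZ1_EP_AS_02%%01VRRP/4/STATEWARNINGEXTEND(l)[1]:Virtual Router state MASTER changed to BACKUP, because of priority calculation. (Interface=Vlanif697, VrId=7, InetType=IPv4)
Feb 11 2018 18:39:49+08:00 GZ1_EP_AS_02%%01VRRP/4/STATEWARNINGEXTEND(l)[2]:Virtual Router state BACKUP changed to MASTER, because of protocol timer expired. (Interface=Vlanif699, VrId=9, InetType=IPv4)
"""

transitions = count_transitions(sample)
for key, n in sorted(transitions.items()):
    print(key, n)
```

An interface that appears with both MASTER to BACKUP and BACKUP to MASTER counts within a short window is flapping rather than failing over once.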

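Troubleshooting Process

1. Run display current-configuration and check the configuration of the VLANIF interfaces on S9700_02 that show VRRP master/backup switchovers. No anomalies were found.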
-------------------------------------------------------------------------------
#
interface Vlanif651
ip address 192.168.208.29 255.255.255.240
vrrp vrid 1 virtual-ip 192.168.208.30
vrrp vrid 1 timer advertise 2
vrrp vrid 1 authentication-mode md5 %$%$!l#N:3+`U2JwVaLmVHs;Qzqh%$%$
#
interface Vlanif696
ip address 192.168.193.28 255.255.255.224
vrrp vrid 11 virtual-ip 192.168.193.30
vrrp vrid 11 timer advertise 2
vrrp vrid 11 authentication-mode md5 %$%$>@Db:FJtyEO%*/X2GXQ)A@7.%$%$
#
interface Vlanif697
ip address 192.168.192.60 255.255.255.224
vrrp vrid 7 virtual-ip 192.168.192.62
vrrp vrid 7 timer advertise 2
vrrp vrid 7 authentication-mode md5 %$%$5_[",c]1<&mXXCXtsrxJY'{r%$%$
#
interface Vlanif699
ip address 192.168.192.28 255.255.255.224
vrrp vrid 9 virtual-ip 192.168.192.30
vrrp vrid 9 timer advertise 2
vrrp vrid 9 authentication-mode md5 %$%$#[va&|0}j"`OX\Q;(sr~Q7.%%$%$
-------------------------------------------------------------------------------

2. Run display cpu-defend statistics all. Two samples of the VRRP packet statistics for slot 9 of S9700_02, taken at different times, were as follows:
-------------------------------------------------------------------------------
Statistics on slot 9:
-------------------------------------------------------------------------------
Packet Type         Pass(Bytes)  Drop(Bytes)   Pass(Packets)   Drop(Packets)
-------------------------------------------------------------------------------
vrrp                 111891256k  4412221155k      1636134940     64875391147
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Statistics on slot 9:
-------------------------------------------------------------------------------
Packet Type         Pass(Bytes)  Drop(Bytes)   Pass(Packets)   Drop(Packets)
-------------------------------------------------------------------------------
vrrp                 111889123k  4412134000k      1636103589     64874109657
-------------------------------------------------------------------------------
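The results show that the number of VRRP packets dropped by the card in slot 9 was growing by nearly 10,000 per second, which caused the VRRP flapping.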
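The comparison of the two samples can be reproduced with simple arithmetic. The sketch below is illustrative only: it assumes the sample with the smaller counters was taken first, and because the interval between the two samples is not recorded in this case, per_second takes the interval as a parameter instead of guessing it.

```python
# Drop(Packets) and Pass(Packets) for vrrp on slot 9, from the two samples above.
# Assumption: the sample with the smaller counters is the earlier one.
drops = (64874109657, 64875391147)
passes = (1636103589, 1636134940)

dropped = drops[1] - drops[0]    # packets dropped between the samples
passed = passes[1] - passes[0]   # packets passed to the CPU between the samples
drop_share = dropped / (dropped + passed)

def per_second(delta, interval_s):
    """Average rate for a counter delta over interval_s seconds."""
    return delta / interval_s

print(f"dropped={dropped} passed={passed} drop share={drop_share:.1%}")
```

Between the two samples roughly 97.6% of the VRRP packets arriving at slot 9 were dropped, so only a small fraction of advertisements reached the CPU.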

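Solution

1. Step one: the first hypothesis was that the CPCAR value was set too low, causing the CPU to drop large numbers of VRRP packets. The initial value was 128, so the CPCAR value was raised to 512.
The following commands were run: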
-------------------------------------------------------------------------------
#
cpu-defend policy test
car packet-type vrrp cir 512
quit
#
cpu-defend-policy test global
-------------------------------------------------------------------------------
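After the change, the VRRP state of the VLANIF interfaces on S9700_02 was still abnormal and the CPU was dropping just as many VRRP packets, so this hypothesis was wrong.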

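2. Step two: check the physical interfaces on S9700_02 associated with the affected VLANIF interfaces. The interconnect interface G9/0/47 was receiving a large volume of multicast packets in the inbound direction, which were suspected to be abnormal VRRP advertisement (heartbeat) packets. The output was as follows: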
-------------------------------------------------------------------------------
GigabitEthernet9/0/47 current state : UP
Line protocol current state : UP
Last 300 seconds input rate 5375456 bits/sec, 7626 packets/sec
Last 300 seconds output rate 9248 bits/sec, 8 packets/sec

Input:  66851370599 packets, 5932779154719 bytes
Unicast:                  285352593,  Multicast:                 66563451343
Broadcast:                  2566663,  Jumbo:                               0
Discard:                          0,  Total Error:                         0

Output:  331319403 packets, 86134078702 bytes
Unicast:                  307025307,  Multicast:                    24207294
Broadcast:                    86802,  Jumbo:                               0
Discard:                          0,  Total Error:                         0

Input bandwidth utilization threshold : 100.00%
Output bandwidth utilization threshold: 100.00%
Input bandwidth utilization  : 0.54%
Output bandwidth utilization :    0%
-------------------------------------------------------------------------------
Wireshark was therefore used to analyze the traffic on G9/0/47.
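The interface counters already hint at why raising the CPCAR value in step one had no effect. The following rough calculation is a sketch under two assumptions that the case does not state explicitly: that the cir value is expressed in kbit/s, and that nearly all inbound traffic on G9/0/47 is VRRP traffic sent to the CPU.

```python
# Values copied from the GigabitEthernet9/0/47 output above.
input_bps = 5375456          # last 300 seconds input rate, bits/sec
input_pps = 7626             # last 300 seconds input rate, packets/sec
multicast_in = 66563451343   # inbound multicast packets
total_in = 66851370599       # inbound packets in total

cir_kbps = 512  # value set in step one; unit assumed to be kbit/s

multicast_share = multicast_in / total_in
avg_packet_bytes = input_bps / input_pps / 8
oversubscription = (input_bps / 1000) / cir_kbps

print(f"multicast share of input: {multicast_share:.2%}")
print(f"average packet size: {avg_packet_bytes:.0f} bytes")
print(f"input rate vs. CIR: {oversubscription:.1f}x")
```

Under these assumptions the inbound rate is about ten times the raised limit, so most packets would still be dropped whether the limit is 128 or 512.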

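3. The packet capture showed that nearly 8000 VRRP packets per second were being sent from 192.168.193.2. This exceeded the capacity of the S9700_02 CPU to process VRRP packets and caused the repeated VRRP master/backup switchovers.
The address 192.168.193.2 is assigned to network controller 01 (网控器_01).
S9700_01 was in fact receiving a large number of VRRP packets as well. However, because its initial VRRP state was Master, its state did not change even though its CPU dropped many VRRP packets, which is why display vrrp showed a normal state. S9700_02 started as Backup, and once it was flooded with VRRP packets its state kept switching between Master and Backup. This is consistent with the log: when the advertisements from S9700_01 are dropped along with the flood, the Backup timer expires and S9700_02 becomes Master ("protocol timer expired"); when an advertisement from S9700_01 gets through, it returns to Backup ("priority calculation").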
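The timing in the log matches the VRRP backup timer. As a hedged illustration, the sketch below computes the Master_Down_Interval as defined for VRRPv2 in RFC 3768, assuming the switches follow that definition and that the priority is the default of 100 (no priority is configured in the output shown); master_down_interval is a hypothetical helper name.

```python
def master_down_interval(advertise_s, priority):
    """VRRPv2 (RFC 3768): 3 * Advertisement_Interval + Skew_Time, in seconds."""
    skew_time = (256 - priority) / 256
    return 3 * advertise_s + skew_time

# "timer advertise 2" comes from the configuration above;
# priority 100 is an assumed default, not shown in the configuration.
interval = master_down_interval(advertise_s=2, priority=100)
print(f"Master_Down_Interval = {interval:.3f} s")
```

The result is about 6.6 seconds, which lines up with the 6 to 7 second gap in the log between each "MASTER changed to BACKUP" entry and the following "BACKUP changed to MASTER" entry on the same interface.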

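4. Shut down the interface that connects 网控器_01 to S9700_01, then analyze the traffic on G9/0/47 again.

The traffic returned to normal and the VRRP state on S9700_02 recovered as well. The problem was resolved.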
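The per-source rate check applied to the capture before and after the shutdown can be sketched as follows. This is an illustrative Python example only: it assumes the capture has been exported as (timestamp, source IP) pairs, the records are synthetic, the peer address 192.168.193.29 and the threshold of 10 packets per second are hypothetical, and both function names are made up for this sketch.

```python
from collections import Counter

def packets_per_second(records):
    """Count packets per (source IP, whole second) from (timestamp, src) pairs."""
    counts = Counter()
    for timestamp, src in records:
        counts[(src, int(timestamp))] += 1
    return counts

def noisy_sources(records, threshold_pps):
    """Return source IPs that exceed threshold_pps in any one-second bucket."""
    per_bucket = packets_per_second(records)
    return sorted({src for (src, _sec), n in per_bucket.items() if n > threshold_pps})

# Synthetic data: 8000 packets in one second from the address found in the
# capture, plus a single advertisement from a hypothetical well-behaved peer.
records = [(i / 8000, "192.168.193.2") for i in range(8000)]
records.append((0.5, "192.168.193.29"))

print(noisy_sources(records, threshold_pps=10))
```

With advertise set to 2, a healthy VRRP router sends one advertisement every two seconds per group, so any source far above that rate stands out immediately.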
