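Problem Description
Two S9706 switches form a cluster (CSS) that had been running normally. The main control board (MPU) in slot 2/8 suddenly failed to register, and removing and reinserting the board did not clear the fault.
Version: V200R010C00SPC600
MPU model: EH1D2SRUC000
CSS card model: EH1D2VS08000
Alarm Information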
#Aug 1 2018 07:41:12 CORE-CSS-S9706 CSSM/3/CSSLINKDOWN:OID 1.3.6.1.4.1.2011.5.25.183.3.3.2.1 1/8 CSS port 1 down.
#Aug 1 2018 07:22:52 CORE-CSS-S9706 CSSM/4/CSSLINKUP:OID 1.3.6.1.4.1.2011.5.25.183.3.3.2.2 1/8 CSS port 1 up.
Troubleshooting Procedure
1. Run display device. Slot 2/8 is powered on but not registered. The output also confirms that the two switches form a cluster.
===============display device===============
==================================================
Chassis 1 (Master Switch)
S9706's Device status:
Slot Sub Type Online Power Register Status Role
-------------------------------------------------------------------------------
7 - EH1D2SRUC000 Present PowerOn Registered Normal Master
1 EH1D2VS08000 Present PowerOn Registered Normal NA
8 - EH1D2SRUC000 Present PowerOn Registered Normal Slave
1 EH1D2VS08000 Present PowerOn Registered Normal NA
PWR1 - - Present PowerOn Registered Normal NA
PWR2 - - Present PowerOn Registered Normal NA
CMU1 - EH1D200CMU00 Present PowerOn Registered Normal Master
FAN1 - - Present PowerOn Registered Normal NA
FAN2 - - Present PowerOn Registered Normal NA
Chassis 2 (Standby Switch)
S9706's Device status:
Slot Sub Type Online Power Register Status Role
-------------------------------------------------------------------------------
1 - EH1D2G48TFA0 Present PowerOn Registered Normal NA
2 - EH1D2G48SX1E Present PowerOn Registered Normal NA
3 - ET1D2S24SX2S Present PowerOn Registered Normal NA
7 - EH1D2SRUC000 Present PowerOn Registered Normal Master
1 EH1D2VS08000 Present PowerOn Registered Normal NA
8 - - Present PowerOn Unregistered - Slave
PWR1 - - Present PowerOn Registered Normal NA
PWR2 - - Present PowerOn Registered Normal NA
CMU1 - EH1D200CMU00 Present PowerOn Registered Normal Master
FAN1 - - Present PowerOn Registered Normal NA
FAN2 - - Present PowerOn Registered Normal NA
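The check in step 1 can be sketched as a small script that scans saved display device output for boards whose Register column reads Unregistered. This is only an illustration: the function name find_unregistered and the shortened sample text are assumptions, and the parsing relies on the whitespace-separated layout shown above rather than on any official output format.

```python
import re

def find_unregistered(display_device_text):
    """Return (chassis, first_column) pairs for rows marked Unregistered."""
    results = []
    chassis = None
    for line in display_device_text.splitlines():
        # Track which chassis the following rows belong to.
        match = re.match(r"\s*Chassis\s+(\d+)", line)
        if match:
            chassis = int(match.group(1))
            continue
        fields = line.split()
        # For a board row the first field is the slot number; for a
        # subcard row it would be the subcard number instead.
        if chassis is not None and "Unregistered" in fields:
            results.append((chassis, fields[0]))
    return results

# Shortened sample taken from the output above.
sample = """Chassis 1 (Master Switch)
7 - EH1D2SRUC000 Present PowerOn Registered Normal Master
8 - EH1D2SRUC000 Present PowerOn Registered Normal Slave
Chassis 2 (Standby Switch)
7 - EH1D2SRUC000 Present PowerOn Registered Normal Master
8 - - Present PowerOn Unregistered - Slave
"""

print(find_unregistered(sample))  # [(2, '8')]
```

The result points at chassis 2, slot 8, which matches the 2/8 MPU reported in the problem description.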
2. Run display css channel. No cluster link status is shown for 2/8.
===============display css channel===============
=======================================================
CSS link-down-delay: 0ms
Chassis 1 || Chassis 2
================================================================================
Num [SRUC HG] [VS08 Port(Status)] || [VS08 Port(Status)] [SRUC HG]
1 1/7 0/12 -- 1/7/0/1(UP 10G) ---||--- 2/7/0/1(UP 10G) -- 2/7 0/12
13 1/8 0/14 -- 1/8/0/5(UP 10G) ---||--- 2/7/0/5(UP 10G) -- 2/7 0/14
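3. According to the VS08 CSS card cabling rules, the two port groups of 2/8 connect to group 2 of 1/7 and group 1 of 1/8.
4. According to the product documentation, a cluster built with VS08 cards requires both port groups to be connected; otherwise the MPU fails to register.
Why does an MPU fail to register after a cluster is set up?
The cluster cabling may not comply with the cabling rules.
For ES02VSTSA and EH1D2VS08000 CSS cards, the cables must be cross-connected to the CSS cards on the peer device. For the detailed cabling rules, see the Switch Cluster Installation Guide (《交换机集群安装指导》). The ES02VSTSA card requires all of its ports to be connected;
the EH1D2VS08000 card requires that the two port groups on one CSS card of the local device connect to one port group on each of the two CSS cards of the peer device, that is, a cross-connection. If the CSS card on an MPU is left unconnected, that MPU fails to register and resets repeatedly.
When cabling a cluster, make sure the cabling rules are followed; otherwise various faults can occur.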
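The cabling rule quoted above can be checked mechanically against the UP links reported by display css channel. The sketch below is a rough model, not Huawei tooling: the names group_of, parse and check_cabling are hypothetical, and it assumes that VS08 ports 1 to 4 form group 1 and ports 5 to 8 form group 2, which should be confirmed against the installation guide.

```python
from collections import defaultdict

def group_of(port):
    # Assumption: ports 1-4 are group 1, ports 5-8 are group 2.
    return 1 if port <= 4 else 2

def parse(endpoint):
    """Split 'chassis/slot/subcard/port' into ((chassis, slot), group)."""
    chassis, slot, _subcard, port = (int(x) for x in endpoint.split("/"))
    return (chassis, slot), group_of(port)

def check_cabling(up_links, cards):
    """Return {card: problem} for cards that break the two-group rule."""
    peers = defaultdict(dict)  # card -> {group: peer card}
    for end_a, end_b in up_links:
        (card_a, grp_a), (card_b, grp_b) = parse(end_a), parse(end_b)
        peers[card_a][grp_a] = card_b
        peers[card_b][grp_b] = card_a
    problems = {}
    for card in cards:
        groups = peers.get(card, {})
        missing = [g for g in (1, 2) if g not in groups]
        if missing:
            problems[card] = f"no UP link in group(s) {missing}"
        elif groups[1] == groups[2]:
            problems[card] = "both groups go to the same peer card"
    return problems

# The two UP links listed in the display css channel output of step 2.
up_links = [("1/7/0/1", "2/7/0/1"), ("1/8/0/5", "2/7/0/5")]
cards = [(1, 7), (1, 8), (2, 7), (2, 8)]

for card, problem in check_cabling(up_links, cards).items():
    print(card, problem)
```

With the links from step 2, only card 2/7 passes. Card 2/8 has no UP link in either group, and 1/7 and 1/8 are each missing the group that step 3 says should lead to 2/8.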
5. Run display transceiver verbose. The optical module in 1/8/0/1 transmits and receives normally; the optical module in 1/7/0/5 emits no light.
Slot[1/7] CssPort[5] transceiver information:
-------------------------------------------------------------
Common information:
Transceiver Type :10GBASE_LR_SFP
Connector Type :LC
Wavelength(nm) :1310
Transfer Distance(m) :10000(9um)
Digital Diagnostic Monitoring :YES
Vendor Name :HUAWEI
Vendor Part Number :34060599
Ordering Name :
-------------------------------------------------------------
Manufacture information:
Manu. Serial Number :MA17130530031
Manufacturing Date :2017-03-31
Vendor Name :HUAWEI
-------------------------------------------------------------
Alarm information:
RX power low
TX power low
TX bias low
-------------------------------------------------------------
Diagnostic information:
Temperature(°C) :20
Voltage(V) :3.35
Bias Current(mA) :0.12
Bias High Threshold(mA) :90.00
Bias Low Threshold(mA) :2.00
Current Rx Power(dBM) :-19.17
Default Rx Power High Threshold(dBM) :2.50
Default Rx Power Low Threshold(dBM) :-16.40
Current Tx Power(dBM) :-40.00
Default Tx Power High Threshold(dBM) :2.50
Default Tx Power Low Threshold(dBM) :-10.20
User Set Rx Power High Threshold(dBM) :2.50
User Set Rx Power Low Threshold(dBM) :-16.40
User Set Tx Power High Threshold(dBM) :2.50
User Set Tx Power Low Threshold(dBM) :-10.20
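To make the reading of the diagnostic block explicit, the following sketch compares each measured value with its low and high thresholds and converts the Tx power from dBm to milliwatts. The function names and the readings dictionary are illustrative assumptions; the numbers are copied from the output above.

```python
def dbm_to_mw(dbm):
    """Convert optical power from dBm to milliwatts."""
    return 10 ** (dbm / 10)

def check_thresholds(readings):
    """Return alarm strings for values outside their (low, high) range."""
    alarms = []
    for name, (value, low, high) in readings.items():
        if value < low:
            alarms.append(f"{name} low")
        elif value > high:
            alarms.append(f"{name} high")
    return alarms

# (measured value, low threshold, high threshold) from the output above.
readings = {
    "RX power": (-19.17, -16.40, 2.50),   # dBm
    "TX power": (-40.00, -10.20, 2.50),   # dBm
    "TX bias": (0.12, 2.00, 90.00),       # mA
}

print(check_thresholds(readings))
print(f"TX power in mW: {dbm_to_mw(-40.00):.6f}")
```

All three values fall below their low thresholds, which agrees with the alarms listed in the output. A Tx power of -40 dBm is about 0.0001 mW, which is consistent with the module emitting essentially no light.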
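6. After the optical module in 1/7/0/5 was replaced, it transmitted normally and the 2/8 MPU registered normally.
Root Cause
A VS08 cluster requires both port groups to be connected. The customer used a single-link cluster topology (each of the four groups had only one cluster cable), so when the optical module in 1/7/0/5 stopped transmitting, the cluster link went down and the 2/8 MPU could not register.
Solution
Replace the optical module in 1/7/0/5.
Suggestions and Summary
Connect cluster cables to all CSS ports so that a switchover or the failure of a single CSS card does not cause cluster problems.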