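Problem Description
1. Device model: S9712
2. Software version: v200r003c00spc500
3. Networking: S9712 --- CSS over service ports --- S9712

4. Symptom: Before the CSS commands were configured, the service cards registered normally. After the CSS commands were configured and the devices were restarted, the service cards failed to register, so the cluster could not be established.
Symptoms on the master switch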
<DT_Core-SW_S9712-A>
Jan 5 2015 14:30:04 DT_Core-SW_S9712-A %%01ALML/4/PUBLISH_EVENT(l)[2]:Publish event. (Slot=1/1, Event ID=BOARD_RESET).
<DT_Core-SW_S9712-A>
Jan 5 2015 14:30:04 DT_Core-SW_S9712-A %%01ALML/3/CPU_RESET(l)[3]:The canbus node of LPU board[1/1] detects that CPU was reset.
<DT_Core-SW_S9712-A>dis dev
Chassis 1 (Master Switch)
S9712's Device status:
Slot Sub Type Online Power Register Alarm Primary
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1 - - Present PowerOn Unregistered - NA
1 - EH1D2X16SFC0 Present PowerOn Registered Normal NA (not recognized)
3 - EH1D2G48TEA0 Present PowerOn Registered Normal NA
Symptoms on the standby switch
Jan 5 2015 14:25:42 DT_Core-SW_S9712-B %%01ALML/4/PUBLISH_EVENT(l)[33]:Publish event. (Slot=2/1, Event ID=BOARD_RESET).
<DT_Core-SW_S9712-B>
Jan 5 2015 14:25:42 DT_Core-SW_S9712-B %%01ALML/3/CPU_RESET(l)[34]:The canbus node of LPU board[2/1] detects that CPU was reset.
<DT_Core-SW_S9712-B>
<DT_Core-SW_S9712-B>
Chassis 2 (Master Switch)
S9712's Device status:
Slot Sub Type Online Power Register Alarm Primary
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1 - - Present PowerOn Unregistered - NA
1 - EH1D2X16SFC0 Present PowerOn Registered Normal NA
3 - EH1D2G48TEA0 Present PowerOn Registered Normal NA
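When many slots are involved, the device status table above can be checked with a script instead of by eye. The following is a minimal Python sketch, not a vendor tool: the function name find_unregistered is hypothetical, and it assumes the output is whitespace-separated with the columns in the order shown above (Slot, Sub, Type, Online, Power, Register, Alarm, Primary).

```python
from typing import Dict, List


def find_unregistered(display_device_output: str) -> List[Dict[str, object]]:
    """Return the rows whose Register column is not 'Registered'.

    Assumes whitespace-separated columns in the order:
    Slot Sub Type Online Power Register Alarm Primary.
    """
    rows = []
    for line in display_device_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric slot and have at least 8 columns;
        # this skips the title, header, and separator lines.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        slot, _sub, card_type, _online, _power, register = fields[:6]
        if register != "Registered":
            rows.append({"slot": int(slot), "type": card_type, "register": register})
    return rows


sample = """\
Chassis 1 (Master Switch)
S9712's Device status:
Slot Sub Type Online Power Register Alarm Primary
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1 - - Present PowerOn Unregistered - NA
1 - EH1D2X16SFC0 Present PowerOn Registered Normal NA
3 - EH1D2G48TEA0 Present PowerOn Registered Normal NA
"""

print(find_unregistered(sample))
```

For the sample taken from this case, only the slot 1 row with no card type is reported as Unregistered.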
The configuration script is as follows:
LPU CSS 1+0/1+1 mode
Switch-1-MASTER
<Quidway> system-view
[Quidway] sysname DT_Core-SW_S9712-A
[DT_Core-SW_S9712-A] set css priority 200
[DT_Core-SW_S9712-A] set css mode lpu
[DT_Core-SW_S9712-A] set css id 1
[DT_Core-SW_S9712-A] interface css-port 1
[DT_Core-SW_S9712-A-css-port1] port interface xgigabitethernet 1/0/12 to xgigabitethernet 1/0/15 enable
[DT_Core-SW_S9712-A-css-port1] quit
[DT_Core-SW_S9712-A] css enable
Switch-2
<Quidway> system-view
[Quidway] sysname DT_Core-SW_S9712-B
[DT_Core-SW_S9712-B] set css id 2
[DT_Core-SW_S9712-B] set css priority 100
[DT_Core-SW_S9712-B] set css mode lpu
[DT_Core-SW_S9712-B] interface css-port 1
[DT_Core-SW_S9712-B-css-port1] port interface xgigabitethernet 1/0/12 to xgigabitethernet 1/0/15 enable
[DT_Core-SW_S9712-B-css-port1] quit
[DT_Core-SW_S9712-B] css enable
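Troubleshooting Procedure
1. Checked whether the customer's device model and software version support CSS setup over service ports. They do.
2. Checked the customer's configuration. No problems were found.
3. In the diagnostic information provided by the customer, the alarm "The canbus node of LPU board[1/1] detects that CPU was reset." appeared repeatedly, which means the card kept resetting. A faulty card was suspected.
4. Deleted the CSS configuration and restarted the device. The card registered normally with no exceptions, which ruled out a card fault.
5. Re-examined the diagnostic information carefully and found that the system software file was stored in flash rather than on the CF card. This was suspected to be the cause.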
Directory of flash:/
Idx Attr Size(Byte) Date Time FileName
0 -rw- 88,709,034 Apr 14 2014 14:46:18 S9700-V200R003C00SPC500.CC
103,824 KB total (17,152 KB free)
6. Moved the system software file to cfcard and retested. The fault was resolved.
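The check in step 5 can also be automated against collected diagnostic output. The following is a minimal Python sketch under stated assumptions: the function names find_system_packages and packages_off_cfcard are hypothetical, the listing is assumed to follow the "Directory of <device>:/" layout shown above, and a file is treated as a system software package only because its name ends in .CC. The rule that the package should sit on cfcard comes from this case's finding and is not asserted here as a general requirement.

```python
import re
from typing import List, Tuple


def find_system_packages(dir_output: str) -> List[Tuple[str, str]]:
    """Return (storage device, file name) for every *.cc file in a dir listing."""
    device = None
    found = []
    for line in dir_output.splitlines():
        # A header such as "Directory of flash:/" names the storage device.
        header = re.match(r"\s*Directory of (\w+):/", line)
        if header:
            device = header.group(1)
            continue
        fields = line.split()
        # The file name is the last column of each entry row.
        if device is not None and fields and fields[-1].lower().endswith(".cc"):
            found.append((device, fields[-1]))
    return found


def packages_off_cfcard(dir_output: str) -> List[Tuple[str, str]]:
    """Return the packages that are stored anywhere other than cfcard."""
    return [(dev, name) for dev, name in find_system_packages(dir_output)
            if dev != "cfcard"]


sample = """\
Directory of flash:/
Idx Attr Size(Byte) Date Time FileName
0 -rw- 88,709,034 Apr 14 2014 14:46:18 S9700-V200R003C00SPC500.CC
103,824 KB total (17,152 KB free)
"""

print(packages_off_cfcard(sample))
```

For the listing in this case, the sketch reports S9700-V200R003C00SPC500.CC on flash, which matches the manual finding in step 5.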
Root Cause
CSS setup failed because the system software file S9700-V200R003C00SPC500.CC was stored in flash rather than on the CF card.