Site: VELIZY
EVOLIUM SAS
Originators: MFS integration team
MFS TROUBLESHOOTING GUIDE
B9 RELEASE
System: ALCATEL 900 / BSS
Sub-system: MFS
Document Category: USER GUIDE
ABSTRACT
This document constitutes the reference location for storing troubleshooting actions related to the operation of MFS B9. It is restricted to ALCATEL internal usage, notably for ALCATEL personnel providing on-site support at customer premises.
This document will be updated each time a new problem occurs.
Approvals
Name / App.: J-J BELLEGO, G. ACBARD, B. FERNIER
Name / App.: D. COTTIN
REVIEW
ED 12 RL | 07-07-06 | Reading report EVOLIUM/R&D/TD/MFS/2006-4968-PME
ED 13 RL | 27-09-06 | Reading report EVOLIUM/R&D/TD/MFS/2006-5042-PME
ED 14 RL | 27-11-06 | Reading report EVOLIUM/R&D/TD/MFS/2006-5092-PME
HISTORY
Ed. 01 Proposal 01 | Cancelled B8 chapters (FR close OUT, NRE, REL)
Ed. 01 Proposal 02 | 01-11-2004 | P.MENON
- Some clean up + synchronization with new tips from B8
Ed. 01 Proposal 03 | 08-11-2004 | P.MENON
- Suppress information redundant with the MFS Installation, Configuration, and Software replacement guide
Ed. 01 Proposal 04 | 16-11-2004 | P.MENON
- Minor corrections
Ed. 01 Proposal 05 | 16-02-05 | P.MENON
- Add Unix boot impossible (wrong default kernel)
- Add check if backup Mib is not corrupted
- Add How to get contents of unix patch BL
- Add for Trace of unix patch installation
Ed. 01 release | 11-03-05 | Release for B9 MR0
Ed. 02 release | 01-06-05 | Release for B9 MR2
P.MENON
- update Corrective action: second step (install_lsm)
Ed. 03 release | 02-06-05 | Release for B9 MR2
P.MENON
- S99trace_srv.ds is renamed to S99trace_server.ds since MFSAW10F
Ed. 04 release | 02-06-05 | Release for B9 MR2
P.MENON
- Add for Failure on Update Remote Inventory
Ed. 05 release | 09-06-05 | Release for B9 MR2
P.MENON
- Add for rmdir fails during execution of ins_swcx.sh when cygwin is installed on the PC
Ed. 06 release | 30-06-05 | Release for B9 MR2
P.MENON
- update Error at step 5/10 (Isolation) Check the full SCSI chain...
Ed. 07 release | 06-07-05 | Release for B9 MR2
P.MENON
- Add Connection by ftp from a MFS station to an external server is impossible FR 3BKA20FBR164817
- Add TRACE_SERVER does not run FR 3BKA13FBR164932
- Add GPU traces are not completed
- Add Impossible to load patch GPU B8 on GPUs FR 3BKA13FBR164932
- Add Unix patch installation from OMC stopped due to a network failure
- Add After a roll-back it is impossible to open the IMT terminal FR 3BKA20FBR162930
Ed. 08 Proposal 01 | 30-08-05 | P.MENON
- Add for Inall procedure stopped due to a station in "halt in" state FR 3BKA13FBR166921
Ed. 08 Proposal 02 | 01-09-05 | P.MENON
- Add new Installation from a not English PC fails (FR 3BKA20FBR166358)
Ed. 08 Proposal 03 | 09-09-05 | P.MENON
- Add new The trace server stops running after a while (FR 3BKA13FBR169218)
- Add new Result of dupatch in B8 or B9 RC40 with BL24
Ed. 08 Proposal 04 | 20-09-05 | P.MENON
- Add new Control station reboots in loop with reset_code 214 after installation of BL22 (FR 3BKA13FBR170335)
Ed. 08 Proposal 05 | 20-10-05 | P.MENON
- Update Control station reboots in loop with reset_code 214 after installation of BL22 (FR 3BKA13FBR170335)
- Suppress yellow paragraph
Ed. 08 Release | 28-10-05 | Release
Ed. 09 Release | 13-01-06 | Release
P.MENON
- quality corrections
- update Error at Step 2 (Creation)
- Add new MFS UNIX patch installation makes Control Station unusable (B9 MR1 ED2) (FR 3BKA23FBR174370)
- Update Error at step 5/10 (Isolation) (FR 3BKA13FBR175829)
- Add new Reinstallation of the MFS and restoration of data from OMC
- Add new How to restore the MIB without needing full reinstallation
- Add new Sanity check script to prevent any potential problem on the MFS
Ed. 10 Proposal 01 | 08-02-06 | P.MENON
- Add new GPU problem but alarm is "Failure of a JAET1 applique" (FR 3BKA13FBR177178)
- Add new no more available disk space on /usr (FR 3BKA20FBR176683)
Ed. 10 Proposal 02 | 09-02-06 | P.MENON
- update Sanity check script to prevent any potential problem on the MFS (FR 3BKA13CBR176689)
Ed. 10 Proposal 03 | 13-02-06 | P.MENON
- update Sanity check script to prevent any potential problem on the MFS (FR 3BKA13CBR176689) after remarks
- add Result of dupatch in B9 with BL22 since MR1 Edx (MFSSAW11E)
Ed. 10 Proposal 04 | 16-02-06 | P.MENON
- update Error at step 5/10 (Isolation)
- add System and Tomas (Nectar was the name in a former time) traces
Ed. 10 Proposal 05 | 22-02-06 | P.MENON
- add Wrong httpd.conf
Ed. 10 Release | 23-02-06 | Release
Ed. 11 Release | 02-03-06 | Release
P.MENON
- update Sanity check script to prevent any potential problem on the MFS (FR 3BKA13CBR176689)
- add JBETI traces
- update The trace server stops running after a while
- update TRACE_SERVER does not run
- add not enough space for Backup MIB
- add new GPU switch over no more possible (FR 3BKA20FBR149993 and 3BKA20FBR151855)
Ed. 12 Release | 30-06-06 | Release
P.MENON
- update Traces of unix patch installation
- update O&M trace SCIM (RTA)
- update GPU switch over no more possible, JBETI problems
- Add new Rebuild of mirrored partitions on RC40
- rename Sanity check script to prevent any potential problem on the MFS to AuditMFS script to prevent any potential problem on the MFS
- Add new not possible to get PM of MFS from OMC FR 3BKA13FBR183494, not possible to unlock omcxchg account from User management option of IMT FR 3BKA13FBR183497
- Update Check if backup Mib is corrupted
- Update CRAFT cannot connect to MFS floating IP: wrong httpd.conf
- Update AuditMFS script to prevent any potential problem on the MFS with new codes FR/CR 3BKA13CBR179923 3BKA13CBR180184 3BKA13CBR180203 3BKA13CBR180618
- Add new Impossible to enable MRTG Collector FR 3BKA13FBR186503
- Add new active Control Station is blocked after automatic backup MIB on RC40 FR 3BKA13CBR189473
- Add new MFS UNIX patch installation fails with a core file generated from 'install_patch_du' FR 3BKA13FBR189822
- Merge with MX Trouble Shooting descriptions
Ed. 12 Release | 05-07-06 | Release
P.MENON
- update Sleeping cells
- Add new bul file execution returns 1 error FR 3BKA13FBR186955
- Add new No RRALLI sent on GSLs, for all cells of the BSC, after activate MLU FR 3BKA20FBR186403
Ed. 13 Proposal 01 | 04-09-06 | P.MENON
- Update Error at step 3/10:
. after installation from scratch of B9 version containing the script clean_spdata, the next migration does not work (FR 3BKA13FBR188065)
. after installation from scratch a MFS which was coming from migration or software replacement, with restoration of the backup MIB, a new migration or software replacement fails (FR 3BKA13FBR194235/3BKA13FBR193877/3BKA13FBR181238)
- Add new Cell parameters modification is not allowed from IMT (BUI request)
- Add new "dataPatch.bul" error during scratch installation in B9 MR4 (FR 3BKA13FBR185034)
- update AuditMFS script to prevent any potential problem on the MFS: error codes added (118: CS are not time synchronized (CR 3BKA13CBR193667) and 406: discrepancies in version descriptor files (CR 3BKA13CBR193904))
Ed. 13 Proposal 02 | 27-09-06 | P.MENON
. update after installation from scratch a MFS which was coming from migration or software replacement, with restoration of the backup MIB, a new migration or software replacement fails (FR 3BKA13FBR194235/3BKA13CBR193877/3BKA13FBR181238)
- update AuditMFS script to prevent any potential problem on the MFS
Ed. 13 Release | 11-10-06 | P.MENON
Release approved
Ed. 14 Proposal 01 | 27-10-06, 15-11-06, 16-11-06 | P.MENON
- Add new Serial splitter and RJ45 converter for Trouble shooting ( MFS Evolution only)
- Add new How to generate/backup on a platform MFS a virgin MIB and how to import this MIB on a field MFS (same architecture / same SW level) (CR 3BKA13CBR194432)
- Add new Impossible to install MFS Sanity Check Script (AW11EP_00D) (FR 3BKA13FBR196644)
- update JBETI trace
D. COTTIN
- Add FR 3BKA20FBR199071 Hanging alarm "Card voltage out of range" for MFS JBXSSW after power failure
- Add 3BKA20FBR186125 PV_PEM alarms raised and not cleared from IMT and OMCR after MFS power off/on
- Add 3BKA13FBR199323 mfssetup or configure_switch failure after replacing a new SSW board
Ed. 14 Release | 27-11-06 | P.MENON
Release approved
- Add new How to detect JBETI is not blocked
TABLE OF CONTENTS
1 Introduction
1.1.1 Document organisation
1.1.2 Presentation
2 GPU
2.1 GPUs disappear from the IMT
2.1.1 Reference FR: None.
2.1.2 Problem description
2.1.3 Corrective action
2.2 GPU SO Impossible
2.2.1 Reference FR: 3BKA20FBR108914
2.2.2 Problem description
2.2.3 Corrective action
2.3 GPU reboots continuously
2.3.1 Reference FR: 3BKA20FBR119782
2.3.2 Problem description
2.3.3 Corrective action
2.3.4 Problem solved
2.4 GPU connection problem
2.4.1 Reference FR: none
2.4.2 Problem description
2.4.3 Corrective action
2.5 GPU problem but alarm is "Failure of a JAETI1 applique"
2.5.1 Reference FR: 3BKA13FBR177178
2.5.2 Problem description
2.5.3 Corrective action
2.6 GPU switch over no more possible, JBETI problems
2.6.1 Reference FR: 3BKA20FBR149993, 3BKA20FBR151855 and 3BKA13FBR163557
2.6.2 Problem description
2.6.3 Preventive action
2.6.4 Corrective action
2.7 GPU SW is not loaded ( MFS Evolution only )
2.7.1 Reference FR: 3BKA13FBR 175541
2.7.2 Problem description
2.7.3 Corrective action
3 INSTALLATION
3.1 Station restart
3.1.1 Reference FR: None.
3.1.2 Problem description
3.1.3 Corrective action
3.2 Impossible to rlogin/telnet to MFS as root
3.2.1 Reference FR: None.
3.2.2 Problem description
3.2.3 Corrective action
3.3 Unix boot impossible (wrong default kernel)
3.3.1 Reference FR: None.
3.3.2 Problem description
3.3.3 Corrective action
3.4 "dataPatch.bul" error during scratch installation in B9 MR4
3.4.1 Reference FR: 3BKA13FBR185034
3.4.2 Problem description
3.4.3 Corrective action
4 MFS based on RC40
4.1 Installation from a not English PC fails
4.1.1 Reference FR: FR 3BKA20FBR166358
4.1.2 Problem description
4.1.3 Corrective action
4.2 MFS installation failed
4.2.1 Reference FR: none
4.2.2 Problem description
4.2.3 Corrective action
4.3 Inall procedure stopped due to a station in "halt in" state
4.3.1 Reference FR: 3BKA13FBR166921
4.3.2 Problem description
4.3.3 Corrective action
4.4 Failure during the SWC from the OMC at step 1/10 (before file transfer)
4.4.1 Reference FR: none
4.4.2 Problem description
4.4.3 Corrective action
4.5 Unix boot impossible
4.5.1 Reference FR: None.
4.5.2 Problem description
4.5.3 Corrective action
4.6 Rebuild of mirrored partitions on RC40
4.6.1 Reference FR: None.
4.6.2 Problem description
4.6.3 Corrective action
4.7 active Control Station is blocked after automatic backup MIB on RC40
4.7.1 Reference FR: 3BKA13CBR189473
4.7.2 Problem description
4.7.3 Corrective action
4.7.4 Impacts
5 MFS based on MX
5.1 "Inall" failed during MX-MFS installation
5.1.1 Reference FR: None.
5.1.2 Problem description
5.1.3 Corrective action
5.2 Connection to OMCP using console redirection does not work
5.2.1 Reference FR: None.
5.2.2 Problem description
5.2.3 Corrective action
5.3 "inall" failed during MX-MFS installation
5.3.1 Reference FR: None.
5.3.2 Problem description
5.3.3 Corrective action
5.4 "inall" failed during MX-MFS installation
5.4.1 Reference FR: None.
5.4.2 Problem description
5.4.3 Corrective action
5.5 Error at step 2/10 (Creation) ( MFS Evolution only )
5.5.1 Reference FR
5.5.2 Problem description
5.5.3 Corrective action
5.6 Error at step 3/10 (Verify) ( MFS Evolution only )
5.6.1 Reference FR: None.
5.6.2 Problem description
5.6.3 Corrective action
5.7 Error at step 7/10 (Validation) - ( MFS Evolution only )
5.7.1 Reference FR: None.
5.7.2 Problem description
5.7.3 Corrective action
5.8 The stand-by station is not operational ( MFS Evolution only )
5.8.1 Reference FR: None.
5.8.2 Problem description
5.8.3 Corrective action
5.9 Ethernet connection problem ( MFS Evolution only )
5.9.1 Reference FR : none
5.9.2 Problem description
5.9.3 Corrective Action
5.10 Impossible to connect IMT ( MFS Evolution only )
5.10.1 Reference FR : 3BKA20FBR175917
5.10.2 Problem Description
5.10.3 Corrective Action
5.11 After Power-on of ATCA shelf, OMCP servers are powered-off ( MFS Evolution only )
5.11.1 Reference FR 3BKA20FBR172514
5.11.2 Problem description
5.11.3 Corrective action
5.12 How to update time from OMC ( MFS Evolution only )
5.12.1 Reference FR
5.12.2 Problem description
5.12.3 Corrective action
5.12.4 Problem solved
5.13 NE1oE supervision lost ( MFS Evolution only )
5.13.1 Reference FR: None
5.13.2 Problem description
5.13.3 Corrective action
5.14 Extension from 1 shelf configuration to 2 shelves configurations has failed ( MFS Evolution only )
5.14.1 Reference FR: None
5.14.2 Problem description
5.14.3 Corrective action
5.15 No RRALLI sent on GSLs, for all cells of the BSC, after activate MLU ( MFS Evolution only )
5.15.1 Reference FR: 3BKA20FBR186403
5.15.2 Problem description
5.15.3 Corrective action
5.16 Hanging alarm "Card voltage out of range" for MFS JBXSSW after power failure ( MFS Evolution only )
5.16.1 Reference 3BKA20FBR199071
5.16.2 Problem description
5.16.3 Corrective action
5.17 PV_PEM alarms raised and not cleared from IMT and OMCR after MFS power off/on ( MFS Evolution only )
5.17.1 Reference 3BKA20FBR186125
5.17.2 Problem description
5.17.3 Corrective action
5.18 mfssetup or configure_switch failure after replacing a new SSW board. ( MFS Evolution only )
5.18.1 Reference 3BKA13FBR199323
5.18.2 Problem description
5.18.3 Corrective action
6 AUTOMATIC SOFTWARE CHANGE
6.1 Error during execution of ins_swcx.sh
6.1.1 Reference FR: None.
6.1.2 Problem description
6.1.3 Corrective action
6.2 rmdir fails during execution of ins_swcx.sh when cygwin is installed on the PC
6.2.1 Reference FR: 3BKA13FBR163888
6.2.2 Problem description
6.2.3 Corrective action
6.3 Error Temporary local directory error on IMT during step 0
6.3.1 Reference FR: None.
6.3.2 Problem description
6.3.3 Corrective action
6.4 Error "File Access Error" with dlv.bck always appears when doing SW replacement
6.4.1 Reference FR: 3BKA20FBR150527
6.4.2 Problem description
6.4.3 Corrective action
6.5 Error at step 2/10 (Creation)
6.5.1 Reference FR: None.
6.5.2 Problem description
6.5.3 Corrective action
6.6 Error at step 3/10 (Verify)
6.6.1 Reference FR: 3BKA20FBR099035 = 3BKA13FBR102355
6.6.2 Problem description
6.6.3 Corrective action
6.6.4 Reference FR: 3BKA13FBR188065
6.6.5 Reference FR: 3BKA13FBR194235, 3BKA13CBR193877, 3BKA13FBR181238
6.7 Error at step 5/10 (Isolation)
6.7.1 Reference FR: 3BK - A13FBR096085 / 105356 / 112480 - A20FBR096035 / 105055 / 129810 / 139842 - A23FBR174097
6.7.2 Save traces
6.7.3 Problem description
6.7.4 Specific case for 3BKA20FBR129810 : Problem occurs while Backup Server is down.
6.7.5 Specific case for 3BKA13FBR175829: broken shared disk
6.8 Error at step 6/10 (Major version change)
6.8.1 Reference FR: 3BKA13FBR107676
6.8.2 Problem description
6.8.3 Check if disks are shared correctly
6.8.4 Corrective action
6.9 Error at step 7/10 (Validation)
6.9.1 Reference FR: None.
6.9.2 Problem description
6.9.3 Corrective action
6.10 Control station reboots in loop with reset_code 214 after installation of BL22
6.10.1 Reference FR: 3BKA13FBR170335
6.10.2 Problem description
6.10.3 Corrective action
6.11 MFS UNIX patch installation makes Control Station unusable (B9 MR1 ED2)
6.11.1 Reference FR: 3BKA23FBR174370
6.11.2 Problem description
6.11.3 Corrective action
6.12 MFS UNIX patch installation fails with a core file generated from 'install_patch_du'
6.12.1 Reference FR: 3BKA45FBR188097/3BKA25FBR188087/ 3BKA13FBR189822
6.12.2 Problem description
6.12.3 Corrective action
6.13 bul file execution returns 1 error
6.13.1 Reference FR: 3BKA13FBR186955
6.13.2 Problem description
6.13.3 Corrective action
7 MFS Running
7.1 The stand-by station is not operational
7.1.1 Reference FR: None.
7.1.2 Problem description
7.1.3 Corrective action
7.2 Station not reachable
7.2.1 Reference FR: none
7.2.2 Problem description
7.2.3 Corrective action
7.2.4 Problem solved
7.3 System console not reachable
7.3.1 Reference FR: none
7.3.2 Problem description
7.3.3 Corrective action
7.4 A process is looping
7.4.1 Reference FR: 3BKA45FBR119174
7.4.2 Problem description
7.4.3 Corrective action
7.4.4 impacts
7.5 Reboots in loop on MFS reset due to bad IP address
7.5.1 Reference FR: 3BKA20FBR079434 - 3BKA20FBR081233 (close NIP)
7.5.2 Problem description
7.5.3 Corrective action
7.6 Reboots in loop due to no more disk space
7.6.1 Reference FR: None
7.6.2 Problem description
7.6.3 Corrective action
7.7 OMC-MFS link problem at different interface cases
7.8 Ethernet connection problem
7.8.1 Reference FR: none
7.8.2 Problem description
7.8.3 Corrective Action
7.9 Sleeping cells
7.9.1 Alerter definition
7.10 DS10 servers don't come up automatically after power off/power on
7.10.1 Reference FR: 3BKA45FBR17363, 3BKA20FBR135619
7.10.2 Problem description
7.10.3 Corrective action
7.11 Failure on Update Remote Inventory
7.11.1 Reference FR: none
7.11.2 Problem description
7.11.3 Corrective Action
7.12 Connection by ftp from a MFS station to an external server is impossible
7.12.1 Reference FR: 3BKA20FBR164817
7.12.2 Problem description
7.12.3 Corrective action
7.13 The trace server stops running after a while
7.13.1 Reference FR: 3BKA13FBR169218
7.13.2 Problem description
7.13.3 Corrective action
7.14 TRACE_SERVER does not run
7.14.1 Reference FR: 3BKA13FBR164932
7.14.2 Problem description
7.14.3 Corrective action
7.15 GPU traces are not completed
7.15.1 Reference FR: none
7.15.2 Problem description
7.15.3 Corrective action
7.16 Impossible to load patch GPU B8 on GPUs
7.16.1 Reference FR: 3BKA13FBR164932
7.16.2 Problem description
7.16.3 Corrective action
7.17 Unix patch installation from OMC stopped due to a network failure
7.17.1 Reference FR: none
7.17.2 Problem description
7.17.3 Corrective action
7.18 After a roll-back it is impossible to open the IMT terminal
7.18.1 Reference FR: 3BKA20FBR162930
7.18.2 Problem description
7.18.3 Corrective action
7.19 Telnet access from Windows
7.19.1 Reference FR: none
7.19.2 Problem description
7.19.3 Corrective Action
7.20 no more available disk space on /usr
7.20.1 Reference FR: 3BKA20FBR176683
7.20.2 Problem description
7.20.3 Corrective action
7.21 CRAFT cannot connect to MFS floating IP: wrong httpd.conf
7.21.1 Reference FR: 3BKA13FBR177317
7.21.2 Problem description
7.21.3 Corrective action
7.22 not enough space for Backup MIB
7.22.1 Reference FR: none
7.22.2 Problem description
7.22.3 Corrective action
7.23 not possible to get PM of MFS from OMC, not possible to unlock omcxchg account from User management option of IMT
7.23.1 Reference FR: 3BKA13FBR183494, 3BKA13FBR183497
7.23.2 Problem description
7.23.3 Corrective action
7.24 Impossible to enable MRTG Collector
7.24.1 Reference FR: 3BKA13FBR186503
7.24.2 Problem description
7.24.3 Corrective action
7.25 Cell parameters modification is not allowed from IMT (BUI request)
7.25.1 Reference FR: none
7.25.2 Problem description
7.25.3 Corrective action
7.26 Impossible to install MFS Sanity Check Script (AW11EP_00D)
7.26.1 Reference FR: 3BKA13FBR196644
7.26.2 Problem description
7.26.3 Corrective action
8 Crash/Traces
8.1 Determine crash cause
8.2 Save traces
8.3 O&M trace
8.3.1 SCIM (RTA)
8.3.2 Q3
8.3.3 RETIX
8.3.4 UNIX
8.4 GPU trace
8.4.1 Trace level
8.4.2 Which level to activate
8.4.3 How to modify size of mfs_trace_p_XX file?
8.5 JBETI trace
8.6 Traces of unix patch installation
8.7 Problems
8.7.1 GPU traces
8.7.2 Trace Server
8.7.3 Disk quota
8.7.4 mfs_trace_p_XX traces location
8.8 System and Tomas (Nectar was the name in a former time) traces
8.8.1 system traces (if required)
8.8.2 Advfs traces
8.8.3 TOMAS traces
8.9 NE1oE Traces
9 Various information
9.1 User count creation via IMT on MFS
9.1.1 Reference FR: 3BKA45FBR144680
9.1.2 Problem description
9.1.3 Corrective action
9.2 Update disk usage information
9.2.1 Problem description
9.2.2 Action
9.3 Shared disks access
9.3.1 Problem description
9.3.2 Action
9.4 How to get MFS component versions
9.4.1 Problem description
9.4.2 Action
9.5 How to know how many IMT are open at same time ?
9.5.1 Reference FR: none
9.5.2 Problem description
9.5.3 Corrective action
9.6 How to update time from OMC
9.6.1 Reference FR: 3BKA13FBR141970
9.6.2 Problem description
9.6.3 Corrective action
9.6.4 Problem solved
9.7 MFS restoration problem
9.7.1 Problem description
9.7.2 Corrections description
9.8 MFS system restoration problem: supervision ( MFS Evolution only )
9.9 Backup ( MFS Evolution only )
9.10 Restore ( MFS Evolution only )
9.11 Check if backup Mib is corrupted
9.11.1 Reference FR
9.11.2 Problem description
9.11.3 Correction description
9.12 Reinstallation of the MFS and restoration of data from OMC
9.12.1 Reference FR: 3BKA13CBR177682
9.12.2 Problem description
9.12.3 Correction description
9.13 How to get contents of Unix patch BL
9.13.1 Problem description
9.13.2 Action
9.14 How to restore the MIB without needing full reinstallation
9.14.1 Reference FR: 3BKA13CBR177682
9.14.2 Problem description
9.14.3 Correction description
9.15 AuditMFS script to prevent any potential problem on the MFS
9.15.1 Reference FR: 3BKA13CBR176689
9.15.2 Return codes explanation
9.15.3 Corrective action
9.15.4 Example on AS800 (based on Tomas RC23)
9.15.5 Example on DS10 (based on Tomas RC23)
9.15.6 Example on DS10 (based on Tomas RC40)
9.16 Serial splitter and RJ45 converter for Trouble shooting ( MFS Evolution only)
9.16.1 Reference FR: None
9.16.2 Problem description
9.16.3 Action
9.17 How to generate/backup on a platform MFS a virgin MIB and how to import this MIB on a field MFS (same architecture / same SW level)
9.17.1 Reference FR: 3BKA13CBR194432
9.17.2 Problem description
9.17.3 Action
9.18 How to detect JBETI is not blocked
9.18.1 Reference FR: none
9.18.2 Problem description
9.18.3 Correction description
10 GLOSSARY AND ABBREVIATIONS
A HW settings of environmental variables (FW)
INTERNAL REFERENCED DOCUMENTS
Not applicable
REFERENCED DOCUMENTS
[ 1 ] MFS B9 installation user guide, reference 3BK 09679 JAAA RJZZA
[ 2 ] EVOLIUM A9135 MFS MAINTENANCE HANDBOOK, reference 3BK 20935 AAAA PCZZA
[ 3 ] B8/B9 A9135 MFS SOFTWARE MIGRATION Release B9, reference 3BK 17422 0202 RJZZA
RELATED DOCUMENTS
PMU logging messages description and principles release B6.2, 3BK 09850 FCAD PWZZA
OPEN POINTS / RESTRICTIONS
No open point and no restriction have been found.
1 Introduction
1.1.1 Document organisation
This document is organised as follows:
1) This chapter
2) Troubles coming from the GPU, most of the time with a Quality Alert attached
3) Troubles occurring at installation time
4) Troubles occurring at SW change time, depending on the SWC phase
5) Troubles occurring when the MFS is started
6) What to do in case of a crash, and which information to keep
7) How to set and get traces
8) Information: general information, such as disk usage
Plus an appendix for specific information:
A) IOLAN configuration
B) HW setting of environmental variables
1.1.2 Presentation
Each chapter is introduced by a table summarising the problems addressed, their origin and their fix.
Very few chapters can be shown to the customer. They are highlighted in green.
Commands are presented in a grey rectangle.
2 GPU
What/behavior | Trouble origin | Fix
1) GPUs disappear from IMT | 1 or more GPU with bad components | Change GPU
2) Impossible GPU switch over | JAE1 applique mistake | Change JAE1
3) GPU reboots continuously | GPU FW mistake | Change the GPU
4) GPU connection problem | Connection, Ethernet | Check Ethernet, extract and re-plug the board
5) GPU problem but alarm is "Failure of a JAETI1 applique" | Faulty GPU | Change faulty GPU
6) GPU switch over no more possible | JBETI becomes blocked | Reset the active JBETI
7) GPU SW is not loaded | No more DHCP lease available | Remove DHCP lease file
2.1 GPUs disappear from the IMT
2.1.1 Reference FR: None.
2.1.2 Problem description
One or more (up to all) GPUs in a subrack disappear from time to time from the Craft terminal (IMT), as if they had been unplugged.
GSM and GPRS remain available, but it is impossible to perform any remote action on these GPUs (download or modify the configuration, switch over, reset data, lock).
A hardware reset (=> GSM + GPRS telecom outage) solves the problem for a short time (< 1 day).
2.1.3 Corrective action
At least one GPU in the subrack may have bad hardware components.
All GPUs of the subrack must be checked.
To check one GPU, unplug it (=> GSM + GPRS telecom outage).
Then compare the references of the 5 components, as shown in the following pictures:
For these 5 components (XXX):
FB2041 is the good reference
FBL2041 is a wrong reference!
Any bad component must be changed.
2.2 GPU SO Impossible
2.2.1 Reference FR: 3BKA20FBR108914
2.2.2 Problem description
Switch-over of one GPU (from the Craft Terminal) to the spare GPU is not possible: the spare GPU begins to load its telecom configuration but, some seconds later, the board is blocked and the alarm "On free run mode" appears. The traffic is stopped.
When a switch-back of the GPU board is done, the traffic comes back and everything is normal.
To confirm the problem, switch some GPUs and some appliques.
The problem is due to a difference between 2 variants of the appliques in the technology of the redundancy bus transceivers: the AxABxx version is equipped with component FB2041BB (running with VCC = 5 V), and the AxAAxx version is equipped with FBL2041BB (running with VCC = 3.3 V).
A hardware correction is under study for 3BK08231AxABxx PCM appliques.
2.2.3 Corrective action
It has been demonstrated that the PCM applique with reference number 3BK08231AxABxx causes the problem. The PCM applique 3BK08231AxAAxx should be fully operational.
JAE1C boards (75 PCM): check the PCM applique reference:
3BK08231ABAAxx: good board
3BK08231ABABxx: faulty board: replace the board with a good JAE1C
JAE1 boards (120 PCM): check the PCM applique reference:
3BK08231AAAAxx: good board
3BK08231AAABxx: faulty board: replace the board with a good JAE1
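The reference check above can be expressed as a small helper, for instance when a list of applique references has to be reviewed. This is only a minimal sketch: it assumes that every reference follows the 3BK08231A?A?xx layout given in this section, and the function name classify_pcm_applique is hypothetical (it is not part of any MFS tool).

```python
import re

# Layout taken from section 2.2.3: "3BK08231A", board letter (A = JAE1, B = JAE1C),
# "A", variant letter (A = good, B = faulty), then two trailing characters.
_REFERENCE = re.compile(r"^3BK08231A([AB])A([AB])\w{2}$")

_BOARDS = {"A": "JAE1", "B": "JAE1C"}
_STATUS = {"A": "good", "B": "faulty"}


def classify_pcm_applique(reference):
    """Return (board type, status) for a PCM applique reference.

    Both values are "unknown" when the reference does not follow
    the layout described in this section.
    """
    match = _REFERENCE.match(reference.strip().upper())
    if match is None:
        return ("unknown", "unknown")
    return (_BOARDS[match.group(1)], _STATUS[match.group(2)])


if __name__ == "__main__":
    for ref in ("3BK08231ABAB01", "3BK08231AAAA02"):
        print(ref, classify_pcm_applique(ref))
```

Any reference reported as faulty still has to be confirmed by reading the label on the board itself.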
2.3 GPU reboots continuously
2.3.1 Reference FR: 3BKA20FBR119782
2.3.2 Problem description
After GPRS has been configured and both the GPU and GPRS have been unlocked, the GPU reboots continuously. When a switchover is performed, the same problem occurs.
In internal GPU traces (file mfs_trace_p_XX), the following traces indicate there is a failure in PMU package initialisations:
DATA_ERR : T: 200 : rrmswcomp.cpp : 160 : Cell Traffic Package init failure...
DATA_ERR : T: 200 : rrmswcomp.cpp : 172 : Bss Management Package init failure...
Then, check if the GPU reference (on the front side of the board) is GPU 3BK08064ABAC01
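When a trace file has been copied off the MFS, the search for these messages can be scripted. The sketch below is an illustration only: it assumes that the trace lines use the " : " field separator shown in the two example lines above, and the function name find_pmu_init_failures is hypothetical.

```python
def find_pmu_init_failures(lines):
    """Return the names of the PMU packages reporting an init failure.

    Assumes the trace line format shown in section 2.3.2, for example:
    DATA_ERR : T: 200 : rrmswcomp.cpp : 160 : Cell Traffic Package init failure...
    """
    marker = " Package init failure"
    packages = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("DATA_ERR") or marker not in line:
            continue
        # The message is the last " : "-separated field of the line.
        message = line.split(" : ")[-1]
        packages.append(message.split(marker)[0])
    return packages


if __name__ == "__main__":
    sample = [
        "DATA_ERR : T: 200 : rrmswcomp.cpp : 160 : Cell Traffic Package init failure...",
        "DATA_ERR : T: 200 : rrmswcomp.cpp : 172 : Bss Management Package init failure...",
    ]
    print(find_pmu_init_failures(sample))
```

To scan a real file, pass an open file object for the mfs_trace_p_XX file to the function instead of the sample list.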
2.3.3 Corrective action
If the GPU reference is GPU 3BK08064ABAC01, and if the behavior is as described above, then contact the local TAC, who has to change the GPU and send the faulty GPU to the Alcatel Repair Center, where a fix will be applied.
The problem is due to a bad detection of the remote inventory by the firmware of the GPU: the firmware checks in the remote inventory for the combination of functional variant (VF) and realization variant (VR) ABAA. This is a bug: it should check the (VF) AB field only and not care about the (VR) AC field. As ABAA is not found, the GPU board is not detected as a JBGPU2 (with 128 MB of PPC memory), but by default as a JBGPU (with 64 MB of PPC memory). This explains why some PMU packages cannot initialize their memory allocation.
2.3.4 Problem solved
Hardware correction under study.
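The detection bug described in 2.3.3 can be illustrated with a short sketch. It is not the GPU firmware code: the function names are hypothetical, and the field positions (8-character item number, then 2 letters of VF, then 2 letters of VR) are an assumption deduced from the reference 3BK08064ABAC01 and the VF/VR values quoted above.

```python
def split_variants(reference):
    """Return (VF, VR) from a board reference.

    Assumed layout: 8-character item number, 2-letter functional variant (VF),
    2-letter realization variant (VR), then a 2-character suffix.
    """
    return reference[8:10], reference[10:12]


def detect_board_buggy(reference):
    """Behaviour described in 2.3.3: VF and VR are checked together against ABAA."""
    vf, vr = split_variants(reference)
    return "JBGPU2" if vf + vr == "ABAA" else "JBGPU"


def detect_board_expected(reference):
    """Expected behaviour: only the VF field is checked, the VR field is ignored."""
    vf, _ = split_variants(reference)
    return "JBGPU2" if vf == "AB" else "JBGPU"


if __name__ == "__main__":
    ref = "3BK08064ABAC01"
    print(ref, "buggy check:", detect_board_buggy(ref))
    print(ref, "expected check:", detect_board_expected(ref))
```

With the reference 3BK08064ABAC01, the combined check falls back to JBGPU (64 MB), which matches the memory allocation failures seen in the traces, while the VF-only check identifies the board as a JBGPU2.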
2.4 GPU connection problem
2.4.1 Reference FR: none
2.4.2 Problem description
The GPU stays initial/idle (craft site view) and does not connect to the MFS. The LED can be either steady or blinking orange.
2.4.3 Corrective action
1. Check that at least one Ethernet link is plugged in for that GPU in one of the switches.
2. Launch a console on that GPU: plug a cable between the debug output of the applique and a COM port (CTRL uu to enter the GPU menu). Type help to list the available commands. ve / vi display the MAC / IP addresses.
3. If the GPU initialization is stopped at boot request (the GPU does not know its IP address) => there is no connection between the GPU and the control station. Check that the UDP packets corresponding to the boot request are actually sent through one of the interfaces (tu1 or tu2):
Set up tcpdump on the net:
cd /dev
./MAKEDEV pfilt
pfconfig +p +c tu1
tcpdump -i tu1 udp port 68
(if necessary: lan_config -i tu1 -s 10 -x 0 -a 0
# Set output to 10 Mega )
Packets sent through port 68 are bootpc (client = GPU) ones. Port 67 packets are bootps (server = control station) answers.
Check the file /etc/bootptab: it should have a line giving the board IP address according to the Ethernet address:
gpu1_lg0:tc=DS.default:ha=00809F090804:ip=1.1.1.50:bf=Loader.hex:\
ha is the Ethernet address; check with the console that the GPU gives the right address.
4. If the GPU initialisation is stopped at BNP init, the following message is printed on the GPU console:
Wait for answer from GEM since x seconds
=> there is communication between the GPU and the control station (it is not an Ethernet problem). It is a known bug (see FR:A13/90904). Workaround: extract the board and plug it in again. (This may need to be done several times.)
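The comparison in step 3 between the ha field of /etc/bootptab and the MAC address shown on the GPU console can be sketched as follows. This is a simplified illustration, not a complete bootptab parser: it only handles a single-line entry of the form shown above, and the function names are hypothetical.

```python
def parse_bootptab_entry(entry):
    """Split 'name:tag=value:tag=value:\\' into (name, {tag: value}).

    Simplified: a trailing continuation backslash is dropped and only
    tag=value fields are kept.
    """
    fields = entry.strip().rstrip("\\").split(":")
    name = fields[0]
    tags = dict(field.split("=", 1) for field in fields[1:] if "=" in field)
    return name, tags


def normalise_mac(mac):
    """Keep only the hexadecimal digits, in upper case (drops ':' or '-')."""
    return "".join(ch for ch in mac if ch.isalnum()).upper()


def bootptab_matches(entry, console_mac):
    """True when the ha field of the entry equals the MAC read on the console."""
    _, tags = parse_bootptab_entry(entry)
    return normalise_mac(tags.get("ha", "")) == normalise_mac(console_mac)


if __name__ == "__main__":
    line = "gpu1_lg0:tc=DS.default:ha=00809F090804:ip=1.1.1.50:bf=Loader.hex:\\"
    print(parse_bootptab_entry(line))
    print(bootptab_matches(line, "00:80:9f:09:08:04"))
```

A mismatch means the control station will not answer the boot request of that board with the expected IP address.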
2.5 GPU problem but alarm is "Failure of a JAETI1 applique"
2.5.1 Reference FR: 3BKA13FBR177178
2.5.2 Problem description
The origin of this issue seems to be a real HW problem (faulty GPU), but the alarm is reported on the wrong board (problem occurred in B8 MR5 Ed4).
The GPU's part number is 3BK08064ACAB06 and it is not impacted by known Quality Alerters.
2.5.3 Corrective action
Unplug the problematic GPU, then reset the JBET1 on either the left-hand or the right-hand side.
2.6 GPU switch over no more possible, JBETI problems
2.6.1 Reference FR: 3BKA20FBR149993, 3BKA20FBR151855 and 3BKA13FBR163557
2.6.2 Problem description
Sometimes the JBETI becomes blocked, so that it no longer handles any request (remote inventory, GPU reset, GPU switchover), and alarms are neither cleared nor raised, while all LEDs on the JBETI are green: a switchover is done to the spare GPU but no telecom traffic is possible.
This situation can arise for the following reasons:
- after a GPU crash:
On a GPU software crash, O&M detects the loss of supervision of this GPU board and sends a reset order to this GPU through the JBETI, but as the JBETI is blocked the GPU does not reset/reboot.
Then, after a while (about 3 minutes), as O&M does not see the GPU rebooting, it concludes that the GPU is failing, so it sends a GPU switchover order (to route the PCM signal from the applique to the spare GPU) and sends the telecom configuration to the spare GPU.
But as the JBETI is blocked, it does not handle the GPU switchover order, so no traffic is possible on the spare GPU.
- after a manual GPU switchover command sent from the IMT:
O&M sends a GPU switchover order (to route the PCM signal from the applique to the spare GPU) and sends the telecom configuration to the spare GPU.
But as the JBETI is blocked, it does not handle the GPU switchover order, so no traffic is possible on the spare GPU.
To confirm that the JBETI is blocked:
a remote inventory command from the IMT will fail on time-out
2.6.3 Preventive action
None
2.6.4 Corrective action
When the JBETI is suspected to be blocked, reset the active JBETI.
2.6.4.1 Problem solved
MFS.PATCH.B9_0.RCxx.11EP_00G (JBETI_AA patch) and MFS.PATCH.B9_0.RCxx.11EP_00H (JBETI_AB patch) solve the problem in B9 MR1 ED4 QD11 (MFSSAW11E/41E, MFSSAW11F/41F)
2.7 GPU SW is not loaded ( MFS Evolution only )
2.7.1 Reference FR: 3BKA13FBR 175541
2.7.2 Problem description
The lease related to BOOTP is infinite, so it is necessary to remove the lease file to be able to replace the boards with no constraint (32 different GP boards can be plugged).
2.7.3 Corrective action
Remove the /var/dhcp/dhcpd.leases
At any terminal accessing the active STATION, type
STATION_x> cd /var/dhcp/dhcpd.leases
STATION_x> rm dhcpd.leases
STATION_x> rm dhcpd.leases~
2. Then kill the DHCP server (it may trigger an OMCP switch-over):
STATION_x> ps -efd | grep dhcpd
root 1205 1115 0 Dec13 ? 00:00:00 /usr/nectar/bin/dhcpd_ctrl
root 1467 1205 0 Dec13 ? 00:00:02 /usr/sbin/dhcpd -cf /nfm_local/spdata/nectar/dhcpd/dhcpd.conf -f -q eth0 eth1
STATION_x> kill -9 1467
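The PID lookup in step 2 can be sketched as a small shell helper. This is only an illustration: the function name dhcpd_pids is hypothetical, and it assumes that the ps output keeps the column layout shown above (PID in the second column, command path in the eighth column).

```shell
# Hypothetical helper (not part of the MFS tools): print the PID of every
# /usr/sbin/dhcpd process found in "ps -efd"-style output read from stdin.
# Assumed column layout: UID PID PPID C STIME TTY TIME CMD...
dhcpd_pids() {
    awk '$8 == "/usr/sbin/dhcpd" { print $2 }'
}

# Possible use on the active station (not run here):
#   ps -efd | dhcpd_pids
```

Matching on the exact command path keeps the dhcpd_ctrl supervisor and the grep process itself out of the result.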
3 INSTALLATION
What/behavior | Trouble origin | Fix
1) Both stations restart | Wrong address declaration | Modify address
2) rlogin is refused | Impossible rlogin/telnet as root | Modify securettys file
3) Unix boot impossible | Wrong default kernel | Modify boot_file variable in Firmware
4) "dataPatch.bul" error during scratch installation in B9 MR4 | conf_alarm [cfgalarm101AH] object is defined twice in bul files | This error is expected, without bad effect on the MFS
3.1 Station restart
3.1.1 Reference FR: None.
3.1.2 Problem description
In some cases, when trying to restart one station, both of them restart. This may be because they are declared with a wrong address.
3.1.3 Corrective action
Check (and if necessary modify) the firmware software configuration of the control stations, which can be accessed through the system console.
1. At any terminal accessing one of the control stations (STATION_A or STATION_B, by telnet or rlogin), it is possible to access the system console of any control station by typing either:
STATION_x> telnet 1.1.1.20 10002
(for STATION_A system console)
STATION_x> telnet 1.1.1.20 10003
(for STATION_B system console)
2. Press Return a few times to get the prompt; then:
1) The UNIX login or the shell prompt is displayed: log in as root if necessary, then halt the station gently to reach the firmware by typing the following command:
STATION_x> init 0
When the firmware prompt is displayed ( >>> ), go to step 6).
2) The machine does not react and the display is frozen: force the machine to stop by typing the keystroke sequence:
rmc
3) Then, when the RMC prompt is available:
RMC>halt in
4) Then again:
rmc
5) Then when the RMC prompt is available:
RMC>halt out
Then the firmware prompt should be available.
6) Type the following command
>>>show *
(give all firmware variables values)
(Refer to Appendix B for a list of currently advised values depending on the hardware configuration)
If values are erroneous, especially pka0_host_id, pkb0_host_id, pkc0_host_id and auto_action, modify them. For example:
>>>set pkc0_host_id 6
When all checks and modifications are done, do the following:
>>>init
The machine should now reboot automatically.
Now release the system console (as it is used from time to time by NECTAR Hardware Management):
On Sun station, by typing:
]
(simultaneously control and closing square bracket) then:
telnet>quit
If the console is accessed from the other station through a PC/NT X terminal, simply close the window.
(Another method to release the system console is to restart the iolan (see other chapter) from another session).
3.2 Impossible to rlogin/telnet to MFS as root
3.2.1 Reference FR: None.
3.2.2 Problem description
When trying to rlogin to the CS root, action is refused by the control station (access denied)
3.2.3 Corrective action
The file /etc/securettys is incorrect: it should include a line ptys to allow root login from another terminal.
Login as admin on one of the control stations.
telnet 1.1.1.20 10002 /10003 to gain access to the system console (see 3.1.3 for more details)
login as root
type
echo ptys >> /etc/securettys
This adds the line ptys in securettys
perform the same action on the other control station
release the terminal or (in case of problem) reboot the iolan (telnet 1.1.1.20, return, su, iolan, reboot)
3.3 Unix boot impossible (wrong default kernel)
3.3.1 Reference FR: None.
3.3.2 Problem description
Unix can't boot because it can't open the default kernel 'vmunix.pre_capmn'.
You should have the following at the console:
ff.fe.fd.fc.fb.fa.f9.f8.f7.f6.f5.f3.f2.f1.f0.ef.df.ee.f4.
probing hose 0, PCI
probing PCI-to-EISA bridge, bus 1
probing PCI-to-PCI bridge, bus 2
bus 0, slot 5 -- pka -- QLogic ISP10x0
bus 0, slot 6 -- vga -- S3 Trio64/Trio32
bus 2, slot 0 -- ewa -- DE500-BA Network Controller
bus 2, slot 1 -- ewb -- DE500-BA Network Controller
bus 2, slot 2 -- ewc -- DE500-BA Network Controller
bus 2, slot 3 -- ewd -- DE500-BA Network Controller
bus 0, slot 12, function 0 -- pkb -- NCR 53C875
bus 0, slot 12, function 1 -- pkc -- NCR 53C875
ed.ec.*** keyboard not plugged in...
eb.....ea.e9.e8.e7.e6.e5.e4.e3.e2.e1.e0.
V5.8-24, built on Jul 11 2001 at 10:57:51
Memory Testing and Configuration Status
512 Meg of System Memory
Bank 0 = 512 Mbytes(128 MB Per DIMM) Starting at 0x00000000
Bank 1 = No Memory Detected
CPU 0 booting
waiting for pkb0.6.0.12.0 to poll...
(boot dka0.0.0.5.0 -file vmunix.pre_capmn -flags S)
block 0 of dka0.0.0.5.0 is a valid boot block
reading 16 blocks from dka0.0.0.5.0
bootstrap code read in
Building FRU table
FRU table size = 0xbed
base = 1d2000, image_start = 0, image_bytes = 2000
initializing HWRPB at 2000
initializing page table at 1ffce000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
Digital UNIX boot - Mon Nov 1 17:21:23 EST 1999
can't open vmunix.pre_capmn
Enter [option_1 ... option_n]
Hit to boot default kernel 'vmunix.pre_capmn':
This is due to a wrong value of the boot_file variable at Firmware level:
>>>show boot*_file
boot_file vmunix.pre_capmn
booted_file vmunix.pre_capmn
3.3.3 Corrective action
Modify the boot_file variable:
>>>set boot_file vmunix
Verify the boot_file variable:
>>>show boot_file
boot_file vmunix
3.4 "dataPatch.bul" error during scratch installation in B9 MR4
3.4.1 Reference FR: 3BKA13FBR185034
3.4.2 Problem description
During scratch installation in B9 MR4, with the MFS in conf mode, the following error appears after sending the "dataPatch.bul" file downloaded from IMT in BUI --> Reception view:
*Request #779 was refused => CREATE_RSP conf_alarm [cfgalarm101AH] (
/* Errors : ***************/ generic_err = ENCA_DUP_MO_INSTANCE : The specified object instance already exists) ;
This error appears because "CREATE_RSP conf_alarm [cfgalarm101AH]" has already been configured by "02_mfsconfig.bul".
3.4.3 Corrective action
This command has been added in datapatch.bul, in order to correct a FR in RC40 branch. This evolution has been added at the end of file in order to not prevent the correct loading of other patch information. Thus the error is expected without bad effect on the MFS.
4 MFS based on RC40
What/behavior | Trouble origin | Fix
1) Installation fails during ftp phase | expect does not recognize non-English words | Use a PC configured in English
2) Installation stops during the inconf phase | Incorrect additional disk configuration | Clear disk label
3) Inall procedure is stopped in inpatch step | "Halt Button is IN, BOOT NOT POSSIBLE" | Under ">>>>", type "boot", then from the PC, in the Expect session, type "inall"
4) Popup window "perl failed" at step 1 of SWC | /tmp partition is full | Free disk space
5) Vmunix file access impossible | Impossible UNIX boot | With UNIX CD
6) Rebuild of mirrored partitions | Structure of mirrored partitions is broken | With install_rc40_lsm
7) Loss of connection between OMC and MFS, active CS is blocked after automatic backup MIB on RC40 | A file set is corrupted | Reinstall from scratch if patch is missing
4.1 Installation from a not English PC fails
4.1.1 Reference FR: FR 3BKA20FBR166358
4.1.2 Problem description
During the ftp phase, the installation PC returns messages in its own language (Portuguese in the reported case). They are not recognized by the expect script, which only recognizes the standard ftp messages in English and in French.
4.1.3 Corrective action
Use a PC configured in English
4.2 MFS installation failed
Incorrect additional disk configuration: HP must provide the additional disk without any OS already installed. The internal additional disk must be: not formatted, not partitioned, not labelled.
This requirement is mandatory for the first factory installation.
4.2.1 Reference FR: none
4.2.2 Problem description
The automatic MFS RC40 installation stops during the inconf phase. This problem can be identified by reading the inconf_STATION_A.log (or inconf_STATION_B.log) log file on the PC used for the installation, in the directory /expect/bin/log.
If the problem occurs, the following sequence of lines appears in the log file:
Error: partition /dev/nfm/vol0a and overlapping partition(s) are
marked in use in the disklabel. Use "disklabel -e" to fix the
disklabel if it is improperly labeled.
start Actif FMA retcode -1 errno 0
Jun 8 21:16:20 STATION_A FM_Agent_stdalone[63148]: start_active: mount /omcxchg failed ret 256
4.2.3 Corrective action
Here is the way to fix the problem.
A) Check the second disk is formatted with UNIX BSD4.2 by using the command :
disklabel dsk1
B) You should have information like :
# /dev/rdisk/dsk1c:
type: EIDE
disk: 6E040L0
label:
flags: dynamic_geometry
bytes/sector: 512
0 ( 24, 41) FDX
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 16383
sectors/unit: 78165360
rpm: 4500
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # milliseconds
track-to-track seek: 0 # milliseconds
drivedata: 0
8 partitions:
# size offset fstype fsize bsize cpg # ~Cyl values
a: 131072 0 unused 0 0 # 0 - 130*
b: 262144 131072 unused 0 0 # 130*- 390*
c: 78165360 0 4.2BSD 1024 8192 16 # 0 - 77544
d: 0 0 unused 0 0 # 0 - 0
e: 0 0 unused 0 0 # 0 - 0
f: 0 0 unused 0 0 # 0 - 0
g: 38886072 393216 unused 0 0 # 390*- 38967*
h: 38886072 39279288 unused 0 0 # 38967*- 77544
You can see that the BSD 4.2 is present
C) Clear the disk label by using the command :
disklabel -z dsk1
D) Set the standard label name dsk1 :
disklabel -wr dsk1
E) Re-check the result :
disklabel dsk1
You should have information like :
/dev/rdisk/dsk1c:
type: EIDE
disk: 6E040L0
label:
flags: dynamic_geometry
bytes/sector: 512
0 ( 24, 41) FDX
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 16383
sectors/unit: 78165360
rpm: 4500
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # milliseconds
track-to-track seek: 0 # milliseconds
drivedata: 0
8 partitions:
# size offset fstype fsize bsize cpg # ~Cyl values
a: 131072 0 unused 0 0 # 0 - 130*
b: 262144 131072 unused 0 0 # 130*- 390*
c: 78165360 0 unused 1024 8192 16 # 0 - 77544
d: 0 0 unused 0 0 # 0 - 0
e: 0 0 unused 0 0 # 0 - 0
f: 0 0 unused 0 0 # 0 - 0
g: 38886072 393216 unused 0 0 # 390*- 38967*
h: 38886072 39279288 unused 0 0 # 38967*- 77544
Now the partition c is unused.
F) Restart the installation from the beginning by typing on the PC used for the installation the following command :
Open a DOS session and type :
cd C:\expect\bin
tclsh80 clear
tclsh80 inall
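The checks in steps A, B and E can be sketched as a small filter over the disklabel output. This is only an illustration under stated assumptions: the function name used_partitions is hypothetical, and the partition lines are assumed to have the layout shown above (partition letter, size, offset, fstype).

```shell
# Hypothetical helper (not part of the MFS tools): read "disklabel" output on
# stdin and list the partitions whose fstype column is not "unused".
# Assumed partition line layout: "c: 78165360 0 4.2BSD 1024 8192 16 ..."
used_partitions() {
    awk '$1 ~ /^[a-h]:$/ && $4 != "unused" { print $1, $4 }'
}

# Possible use on the station (not run here):
#   disklabel dsk1 | used_partitions
# No output means that every partition is marked unused.
```

Before step C the filter is expected to report partition c as 4.2BSD; after step D it should print nothing.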
4.3 Inall procedure stopped due to a station in "halt in" state
Inall procedure is stopped on station A in inpatch step.
4.3.1 Reference FR: 3BKA13FBR166921
4.3.2 Problem description
Inpatch step proceeds to a boot of the station. This boot is refused with the following message displayed on screen (also in log file inpatchSTATIONA.log)
">>>boot
Halt Button is IN, BOOT NOT POSSIBLE".
4.3.3 Corrective action
In order to continue the installation, the following has been applied successfully:
- log on station A by Iolan
- under the ">>>>" prompt, type "boot"
- when the station is at UNIX level ("login:" prompt is displayed), then from the PC, in the Expect session, type "inall"
4.4 Failure during the SWC from the OMC at step 1/10 (before file transfer)
4.4.1 Reference FR: none
4.4.2 Problem description
Sometimes, the /tmp partition is full and the migration is stopped by errors displayed in a popup window with the following message on the IMT: "perl failed"
On the OMC:
- in the log file (/alcatel/var/home/axadmin/alcatel/debug/s_Thu May 13 16:24:10 CEST 2004.out), the following message should appear:
SCGui - NewFTPSoftChange () - IOException raised: java.io.IOException: Not enough space
- in /var/adm/messages, the following message should appear:
May 13 18:03:02 omcr08 unix: WARNING: /tmp: File system full, swap space limit exceeded
4.4.3 Corrective action
On the OMC, log in as root and check the available disk space, especially in the /tmp and /alcatel partitions, using the 'df -k' command, and clean up if needed.
4.5 Unix boot impossible
4.5.1 Reference FR: None.
4.5.2 Problem description
Unix can't boot because it can't access the vmunix file
4.5.3 Corrective action
Following actions have to be performed:
insert the Unix 4.0F CDROM
(warning : do not use an MFS+UNIX INSTALL CDROM which reformats the disks automatically)
boot dka400 (AS800)
boot dqb0 (DS10)
cd /dev
./MAKEDEV rz0
cd /etc/fdmns
touch .adfslock_root_domain
mkdir root_domain
cd root_domain
ln -s /dev/rz0a .
cd /
mount root_domain#root /mnt
4.6 Rebuild of mirrored partitions on RC40
4.6.1 Reference FR: None.
4.6.2 Problem description
It can happen that the mirrored partitions need to be rebuilt
4.6.3 Corrective action
check MIB consistency according to chapter 9.11 and verify a backup MIB is available under /usr/backup_mib.
install patch MFS.PATCH.B9_0.RC40.41FP_00x (not yet available)
unplug the JBETI ethernet cables to avoid any GPU reset during operations on control stations
stop both control stations (first the standby, then the active):
# /usr/mfs/bin/mfs_stop_nectar
log on STATION_A and launch the following command:
# cd /usr/tools
# ./install_rc40_lsm
both stations will reboot after "install_rc40_lsm". Then all processes of both stations will start up.
If one station is seen failed in the nectar view of IMT, do a Clear_alarm on the failed station, this will clear the associated alarm and reboot the station
If everything is fine, reconnect the ethernet cables of the JBETIs in a non busy hour, because GPUs will reset.
Note: this procedure will erase the PM counters in /omcxchg partition. Therefore and if possible, these have to be saved before rebuild of the mirrored partitions and restored after if needed.
4.7 Active Control Station is blocked after automatic backup MIB on RC40
4.7.1 Reference FR: 3BKA13CBR189473
4.7.2 Problem description
If the connection between OMC and MFS has been lost since 02h AM (check the alarm history) and the active station is not reachable with telnet or rlogin, it is probably due to a weakness of the unix system (file set corrupted). Therefore, it is requested to perform the procedure below in order to get information for analysis.
WARNING: do not power off/on the blocked control station because you will lose the possibility to generate a crash
4.7.3 Corrective action
Access the console port of the concerned station.
1. Access the RMC chip with the sequence of keystroke:
rmc
2. Then enter halt mode:
RMC> halt in
3. Here, force the crash of the machine:
>>> crash
As the main processor execution was frozen by the RMC chip, the core which will be generated will contain all needed information related to the running software agents.
4. Access the RMC chip again with the same keystroke sequence as in step (1)
5. Exit the halt mode:
RMC> halt out
After pressing Return at least once, the system should run under the firmware of the main processor.
6. The machine does not reboot automatically:
>>> boot
Then release the console.
7. On the control station, the directory /var/adm/crash includes both files vmunix.n and vmzcore.n with the right time. Produce the text file to exploit them:
/usr/bin/crashdc vmunix.n vmzcore.n > crashdata.n.txt
8. When the prompt is available, log on and type the command:
/usr/sbin/sys_check -escalate
It will run for 30-60 minutes and produce a file named escalate.tar
4.7.4 Impacts
When the problem is encountered, there is a risk of GPRS telecom outage if the standby station is not able to switch to active.
If the T64KIT1000532-V51AB24-20060417 patch is not installed, the system unfortunately has to be reinstalled from scratch.
To check whether the T64KIT1000532-V51AB24-20060417 patch is present, use the following command:
# dupatch -track -type kit -nolog | grep "T64KIT1000532-V51AB24-20060417 OSF520"
5 MFS based on MX
What/behavior | Trouble origin | Fix
1) "inall" phase failed during MX-MFS installation | Bad BIOS settings on the OMCP board | Configure the BIOS settings in line with MX-MFS installation method
2) Connection to OMCP using console redirection does not work | Console redirection not working (2 console redirections opened at the same time) | Close the other console redirection
3) "inall" phase failed during MX-MFS installation | ShMC_1 is not active | Trigger a ShMC switch-over
4) "inall" phase failed during MX-MFS installation | NFS server not correctly configured | Configure it again, and launch again the installation
5) Error at Step 2/10 (Creation) | Full partition | Clean log files
6) Error at Step 3/10 (Verify) | | Set TOLERANT VERIFY option
7) Error at Step 7/10 (Validation) | Old version not deleted | Delete old version
8) Stand by station not operational | | Clear alarm
9) Ethernet connection problem | Check IP configuration | MFS_inet
10) Impossible to connect IMT | Rcp blocked | Kill rcp process
11) After Power-on of ATCA shelf, OMCP servers are powered-off | OMCP HW problem | Unplug and plug back OMCP board
12) How to update time from OMC | |
13) Ne1oE supervision lost | |
14) Extension from 1 shelf configuration to 2 shelves configurations has failed | |
15) No RRALLI sent on GSLs, for all cells of the BSC, after activate MLU | | "Re-initialyze GPRS" for all the cells of the BSC
16) Hanging alarm "Card voltage out of range" for MFS JBXSSW after power failure | TOMAS MD4/SP1 HW management limitation | Put ShMC active in the same plan as faulty JBXSSW board and unplug and plug back JBXSSW board
17) PV_PEM alarms raised and not cleared from IMT and OMCR after MFS power off/on | TOMAS MD4/SP1 HW management limitation | Power Off/Power On the PV_PEM board on alarm.
18) mfssetup or configure_switch failure after replacing a new SSW board. | Unexpected VLAN configuration definition already existing on SSW Board | Delete unexpected VLAN configuration SSW Board.
5.1 "Inall" failed during MX-MFS installation
5.1.1 Reference FR: None.
5.1.2 Problem description
In some cases, MX-MFS installation can fail during the "inall" phase. This can be due to wrong BIOS settings.
5.1.3 Corrective action
Check the BIOS settings of the control stations, which can be accessed through the console redirection.
At any terminal accessing one of the ShMC, it is possible to access the console of STATION_A (resp. STATION_B) by typing:
STATION_x> telnet 172.17.3.8 4503
(for STATION_A)
STATION_x> telnet 172.18.3.9 4504
(for STATION_B)
Reboot the station, and enter the BIOS by typing the following sequence: ESC Shift 2
5.2 Connection to OMCP using console redirection does not work
5.2.1 Reference FR: None.
5.2.2 Problem description
When trying to telnet to the CSs via the console redirection of the ShMC, the action is refused with the following message:
Trying 172.17.3.8...
Connected to 172.17.3.8.
Escape character is '^]'.
Linux 2.4.22-1.0.48 (172.17.3.3) (ttyp1)
ICR: error in open() of /dev/icr02, No such device
Connection closed by foreign host.
5.2.3 Corrective action
Check that the console redirection is not already used to access another board of the ATCA shelf by typing following command on the active Shelf Manager
shm3s8:~ # ps -efd | grep "telnetd -E"
If this command returns a process, for instance:
549 root 1072 S telnetd -E /usr/bin/icr /dev/icr01
kill the process.
5.3 "inall" failed during MX-MFS installation
5.3.1 Reference FR: None.
5.3.2 Problem description
Installation of MX-MFS failed during the "inall" phase due to a problem with the console redirection of the OMCP.
5.3.3 Corrective action
Step 1: Check 5.2 (console redirection)
Step 2: Check that active Shelf Manager is ShMC_1 (172.17.3.8):
On the Installation PC, telnet on 172.17.3.8 (login:root, password:root):
Then type following command
shm3s8:~ # sv_status
The ShMC is active if this command returns:
get status-> openhpid on (localhost:5566)
openhpid is active
Otherwise, in order to trigger a ShMC switch-over, type following command:
shm3s8:~ # sv_activate
activate-> openhpid on (localhost:5566)
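The status test in step 2 can be sketched as a one-line filter. This is only an illustration: the function name shmc_is_active is hypothetical, and it relies solely on the "openhpid is active" line quoted above, since the output printed by a non-active ShMC is not given in this guide.

```shell
# Hypothetical helper (not part of the ShMC tools): succeed (exit status 0)
# when the "sv_status" output read from stdin contains "openhpid is active".
shmc_is_active() {
    grep -q 'openhpid is active'
}

# Possible use on the Shelf Manager (not run here):
#   sv_status | shmc_is_active && echo "ShMC is active"
```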
5.4 "inall" failed during MX-MFS installation
5.4.1 Reference FR: None.
5.4.2 Problem description
Installation of MX-MFS failed during "inall" phase and specifically in indu or inacp phase.
In c:/TomasInstall/Log/Indu_STATION_x.log we can see :
echo mount -o mountport=$mountport,nolock $bootserver:/d /mnt
mount -o mountport=,nolock 172.17.3.4:/d /mnt
init-2.05a# mount -o mountport=$mountport,nolock $bootserver:/d /mnt
mount: 172.17.3.4:/d failed, reason given by server: Permission denied
or in c:/TomasInstall/Log/inacp_STATION-x.log you can see a similar error on a mount command
5.4.3 Corrective action
Set the correct configuration for the NFS server, and do not forget the user access.
5.5 Error at step 2/10 (Creation) ( MFS Evolution only )
5.5.1 Reference FR
None.
5.5.2 Problem description
During the software replacement the phase CREATION is stopped by errors displayed in a popup window with the following:
Generic error enca_ope_failed ensw file check error PILOT/A - The file copy from delivery fileset to target fileset failed : fsync error
That means the root file system ( / ) of the active station is probably full (100%).
5.5.3 Corrective action
First open an xterm and type "df -k". Check that the /usr and /var directories are not full (less than 85% used). If this is not the case, go to
/usr/mfs/log => remove all big trace files (*.old, and TraceGOM if it exists)
/var/local/nectar/dated => remove all Core files and old traces
Then check the space left again with the "df -k" command. Do not begin the migration if the space left is too small.
See following directories:
/usr
/usr/mfs/log
/var/local/nectar
/RESULT
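The 85% check can be sketched as a filter over the "df -k" output. This is only an illustration under stated assumptions: the function name full_filesystems is hypothetical, and each file system is assumed to be reported on one line with the use percentage in the next-to-last column and the mount point in the last column.

```shell
# Hypothetical helper (not part of the MFS tools): read "df -k" output on
# stdin and print the mount points whose usage is at or above the given
# percentage (default 85, the limit quoted in this section).
full_filesystems() {
    awk -v limit="${1:-85}" 'NR > 1 && NF >= 2 {
        pct = $(NF - 1)          # e.g. "91%"
        sub(/%/, "", pct)        # keep the number only
        if (pct + 0 >= limit + 0) print $NF, pct "%"
    }'
}

# Possible use on the active station (not run here):
#   df -k | full_filesystems
```

An empty result means that no listed file system has reached the limit; a different limit can be passed as the first argument.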
5.6 Error at step 3/10 (Verify) ( MFS Evolution only )
5.6.1 Reference FR: None.
5.6.2 Problem description
During a software change or a migration, after clicking the < Next > button in step 3, the IMT pops up an alert window with the following text: "Error occur , see log file".
This means that the software replacement is stopped due to errors found at the VERIFY phase.
5.6.3 Corrective action
In that specific case, software change can be forced with TOLERANT_VERIFY option.
To be completed.
Error at step 7/10 (Validation) - ( MFS Evolution only )
5.6.4 Reference FR: None.
5.6.5 Problem description
During a software change or a migration, after clicking the < Next > button in step 7 (ready to validate new version), the IMT terminal disconnects then re-connects, signaling that there is a software change in progress: < SW change in progress do you want to continue it ? >.
If the < Yes > button is clicked, the following information is displayed in the Software Change Window:
< Old version name > in state created.
Step 1/10
< Old version name > version will be installed.
5.6.6 Corrective action
To delete the old version and complete the software change or the migration, follow these steps:
1 Click on < Software Management / Software Change >.
2 Click on < Back > button.
3 The following message will appear: < Do you want to install < new version name > version ? >. Click on < No > button.
4 Click on < Software Management / MFS versions >. Result is:
Version 1 : < new version name >
state : validated
5 Software change or migration from < old version name > to < new version name > is fully completed.
5.7 The stand-by station is not operational ( MFS Evolution only )
5.7.1 Reference FR: None.
5.7.2 Problem description
At the terminal (login : root) on the concerned station (Hereafter " x " stands for " A " or " B "):
STATION_x> ps -ef | grep mfs
STATION_x> ps -ef | grep nectar
There are no processes running, or only SUA processes (nectar).
At the IMT, on the BUI->request window, when launching the command:
get sta[PILOT/x] (*);
The "system_state" of the station is either "initializing" or "not_installed" (the normal expected state is "stand-by").
5.7.3 Corrective action
At the IMT, in the Site view, check the STA states (aspA or aspB):
1. If the state is disabled/not installed, at the IMT window (clicking on right button), perform a clear_alarm.
2. If the state is initializing, then wait up to 40 minutes.
Check with the alias "stg" that TOMAS is started:
stg
If the answer is no, type the following commands:
st yes
mfs_start_tomas -site
5.8 Ethernet connection problem ( MFS Evolution only )
5.8.1 Reference FR : none
5.8.2 Problem description
No ping from STATION_A/B (or from PC) to STATION_B/A on network 172.17.0.0 or 172.18.0.0.
No ping from any other station/PC to STATION_A/B on the general network IP address
5.8.3 Corrective Action
Check the Ethernet configuration of the control station. On system console:
ifconfig -a
The result must contain at least the following line (given case : STATION_B):
eth0 Link encap:Ethernet HWaddr 00:80:42:17:5E:60
inet addr:172.17.3.4 Bcast:172.17.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2546730 errors:0 dropped:0 overruns:0 frame:0
TX packets:896030 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:312772681 (298.2 Mb) TX bytes:344701015 (328.7 Mb)
Base address:0x2000 Memory:fe800000-fe820000
eth0.5 Link encap:Ethernet HWaddr 00:80:42:17:5E:60
inet addr:139.54.96.205 Bcast:139.54.96.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:338995 errors:0 dropped:0 overruns:0 frame:0
TX packets:55852 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:28176315 (26.8 Mb) TX bytes:8401714 (8.0 Mb)
eth0.5:@0 Link encap:Ethernet HWaddr 00:80:42:17:5E:60
inet addr:139.54.98.210 Bcast:139.54.98.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth0:-ECC Link encap:Ethernet HWaddr 00:80:42:17:5E:60
inet addr:172.17.0.20 Bcast:172.17.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x2000 Memory:fe800000-fe820000
eth0:-V3 Link encap:Ethernet HWaddr 00:80:42:17:5E:60
inet addr:172.16.3.3 Bcast:172.16.255.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x2000 Memory:fe800000-fe820000
eth0:ALI1 Link encap:Ethernet HWaddr 00:80:42:17:5E:60
inet addr:172.17.3.200 Bcast:172.17.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x2000 Memory:fe800000-fe820000
eth0:GPNE Link encap:Ethernet HWaddr 00:80:42:17:5E:60
inet addr:172.19.33.1 Bcast:172.19.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x2000 Memory:fe800000-fe820000
eth0:MUNE Link encap:Ethernet HWaddr 00:80:42:17:5E:60
inet addr:172.17.33.1 Bcast:172.17.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x2000 Memory:fe800000-fe820000
eth0:nfs Link encap:Ethernet HWaddr 00:80:42:17:5E:60
inet addr:172.17.3.100 Bcast:172.17.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x2000 Memory:fe800000-fe820000
eth0:ntp Link encap:Ethernet HWaddr 00:80:42:17:5E:60
inet addr:172.32.0.166 Bcast:172.32.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x2000 Memory:fe800000-fe820000
eth1 Link encap:Ethernet HWaddr 00:80:42:17:5E:61
inet addr:172.18.3.4 Bcast:172.18.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2365322 errors:0 dropped:0 overruns:0 frame:0
TX packets:1010319 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:286602938 (273.3 Mb) TX bytes:770450537 (734.7 Mb)
Base address:0x2040 Memory:fe820000-fe840000
eth1.5 Link encap:Ethernet HWaddr 00:80:42:17:5E:61
inet addr:139.54.97.206 Bcast:139.54.97.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:1565 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:103230 (100.8 Kb)
eth1:-ECC Link encap:Ethernet HWaddr 00:80:42:17:5E:61
inet addr:172.18.0.20 Bcast:172.18.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x2040 Memory:fe820000-fe840000
eth1:-V4 Link encap:Ethernet HWaddr 00:80:42:17:5E:61
inet addr:172.16.4.3 Bcast:172.16.255.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x2040 Memory:fe820000-fe840000
eth1:ALI2 Link encap:Ethernet HWaddr 00:80:42:17:5E:61
inet addr:172.18.3.200 Bcast:172.18.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x2040 Memory:fe820000-fe840000
eth1:MUNE Link encap:Ethernet HWaddr 00:80:42:17:5E:61
inet addr:172.18.33.1 Bcast:172.18.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x2040 Memory:fe820000-fe840000
eth1:mir Link encap:Ethernet HWaddr 00:80:42:17:5E:61
inet addr:172.18.3.111 Bcast:172.18.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x2040 Memory:fe820000-fe840000
eth1:nfs Link encap:Ethernet HWaddr 00:80:42:17:5E:61
inet addr:172.18.3.100 Bcast:172.18.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x2040 Memory:fe820000-fe840000
eth1:ntp Link encap:Ethernet HWaddr 00:80:42:17:5E:61
inet addr:172.32.0.166 Bcast:172.32.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x2040 Memory:fe820000-fe840000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:143 errors:0 dropped:0 overruns:0 frame:0
TX packets:143 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:11432 (11.1 Kb) TX bytes:11432 (11.1 Kb)
To configure the network, use the command
/usr/mfs/bin/mfs_inet
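To compare a station with the listing above more quickly, the interface and address pairs can be extracted from the ifconfig output. This is only a sketch: the function name iface_addrs is hypothetical, and it assumes the legacy ifconfig layout shown above ("Link encap:" and "inet addr:" lines).

```shell
# Hypothetical helper (not part of the MFS tools): read "ifconfig -a" output
# on stdin and print one "interface address" pair per configured address.
iface_addrs() {
    awk '
        /Link encap:/ { iface = $1 }    # first line of an interface block
        /inet addr:/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^addr:/) { sub(/^addr:/, "", $i); print iface, $i }
        }
    '
}

# Possible use on the system console (not run here):
#   ifconfig -a | iface_addrs
```

For the example above, eth0 is expected on network 172.17.0.0 and eth1 on network 172.18.0.0.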
5.9 Impossible to connect IMT ( MFS Evolution only )
5.9.1 Reference FR : 3BKA20FBR175917
5.9.2 Problem Description
It may happen that the IMT cannot be launched any more.
5.9.3 Corrective Action
On system console:
STATION_A# ps -efd | grep rcp
Then, kill the processes linked to the following commands:
* rcp /etc/nectar/data/ncma_init_data STATION_B:/etc/nectar/data/ncma_init_data 2>/dev/null
* rcp STATION_B:/etc/group /var/tmp/craft_srvtmp/grpfile2
5.10 After Power-on of ATCA shelf, OMCP servers are powered-off ( MFS Evolution only )
5.10.1 Reference FR 3BKA20FBR172514
5.10.2 Problem description
It may be observed sporadically that after a power on of a MFS Evolution, the OMCP boards do not start: no LED is ON on these boards and, in particular, the blue LED is OFF.
Preventive action: when manually switching on a MFS Evolution, power on all sub-racks at the same time, i.e. the A1&B1 switches must be switched on at the same time as A2&B2.
5.10.3 Corrective action
In this case both OMCP boards must be unplugged and plugged back in.
5.11 How to update time from OMC ( MFS Evolution only )
5.11.1 Reference FR
3BKA13FBR141970
5.11.2 Problem description
It happens that the time is not synchronized between OMC and MFS.
5.11.3 Corrective action
On both stations, run the two following commands (the inputs are just examples):
- declaration of the OMC to MFS:
#/usr/mfs/bin/mfs_addomc
Enter OMC hostname: carlsberg
Enter IP address: 192.168.17.79
carlsberg added to /etc/hosts.
Do you want to add another omc [y]?
- synchronization of the MFS with the OMC:
#/usr/mfs/bin/mfs_ntp
Enter NTP server hostname : carlsberg
Testing carlsberg...
Original /etc/ntp.conf saved to /etc/ntp.conf.pre_mfs_ntp.1.
Original /etc/rc.config saved to /etc/rc.config.pre_mfs_ntp.1.
carlsberg is an NTP server for STATION_A.
STATION_B is an NTP peer for STATION_A.
Allow to use local clock in last resort when all other NTP sources have gone away.
Restarting NTP server
Network Time Service started
The synchronization is not done immediately, so be patient!
5.11.4 Problem solved
Not applicable
5.12 NE1oE supervision lost ( MFS Evolution only )
5.12.1 Reference FR: None
5.12.2 Problem description
The nE1oE supervision is lost on GP or MUX boards. In that case, in physical view of IMT, GP boards are red and ne1oe_operational_state is equal to "disable"
5.12.3 Corrective action
This problem can be caused by a wrong tagged VLAN configuration on the JBXSSW boards. On the active pilot station, enter the following commands:
#/usr/mfs/bin/checkVlanConfig 172.17.3.10
(for left switch of Shelf 3)
#/usr/mfs/bin/checkVlanConfig 172.18.3.20
(for right switch of Shelf 3)
#/usr/mfs/bin/checkVlanConfig 172.17.4.10
(for left switch of Shelf 4)
#/usr/mfs/bin/checkVlanConfig 172.18.4.20
(for right switch of Shelf 4)
This command must return:
Checking MXMFS vlan configuration for switch 172.17.3.10
Number of vlan configured: 4
vlanID : 1
vlanID : 3
vlanID : 5
vlanID : 3193
good configuration for egress ports on vlan 5
good configuration for forbidden ports on vlan 5
good configuration for untagged ports on vlan 5
good configuration for egress ports on vlan 3
good configuration for forbidden ports on vlan 3
good configuration for untagged ports on vlan 3
good configuration for egress ports on vlan 1
good configuration for forbidden ports on vlan 1
good configuration for untagged ports on vlan 1
(for left switches)
and
------------------------------------------------------
Checking MXMFS vlan configuration for switch 172.18.3.20
Number of vlan configured: 4
vlanID : 1
vlanID : 4
vlanID : 5
vlanID : 3193
good configuration for egress ports on vlan 5
good configuration for forbidden ports on vlan 5
good configuration for untagged ports on vlan 5
good configuration for egress ports on vlan 4
good configuration for forbidden ports on vlan 4
good configuration for untagged ports on vlan 4
good configuration for egress ports on vlan 1
good configuration for forbidden ports on vlan 1
good configuration for untagged ports on vlan 1
(for right switches)
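The comparison with the expected output can be done mechanically. The following is a minimal sketch, assuming the checkVlanConfig output has exactly the line format shown above; check_vlan_report is a hypothetical helper name, not an MFS tool. It reads the command output on standard input and reports any VLAN for which one of the three "good configuration" lines is missing.

```shell
# Sketch only: check_vlan_report is a hypothetical helper, not an MFS tool.
# It reads checkVlanConfig output on stdin and verifies that every VLAN ID
# passed in $1 has its three "good configuration" lines.
check_vlan_report() {
    awk -v want="$1" '
        /^good configuration for (egress|forbidden|untagged) ports on vlan [0-9]+/ {
            seen[$4 " " $NF] = 1        # e.g. "egress 5"
        }
        END {
            n = split(want, ids, " ")
            split("egress forbidden untagged", kinds, " ")
            bad = 0
            for (i = 1; i <= n; i++)
                for (k = 1; k <= 3; k++) {
                    key = kinds[k] " " ids[i]
                    if (!(key in seen)) {
                        print "missing: " kinds[k] " ports on vlan " ids[i]
                        bad = 1
                    }
                }
            exit bad
        }'
}

# Usage on the active pilot station (left switch of Shelf 3):
#   /usr/mfs/bin/checkVlanConfig 172.17.3.10 | check_vlan_report "1 3 5"
# Right switches report VLAN 4 instead of VLAN 3:
#   /usr/mfs/bin/checkVlanConfig 172.18.3.20 | check_vlan_report "1 4 5"
```

The helper returns a non-zero status when at least one expected line is missing, so it can be used in a loop over the four switch addresses.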
5.13 Extension from 1 shelf configuration to 2 shelves configurations has failed ( MFS Evolution only )
5.13.1 Reference FR: None
5.13.2 Problem description
During extension from a 1-shelf configuration to a 2-shelf configuration, the IMT pops up an alert window with the following text: "Error occur , see log file".
This means that the shelf extension is stopped due to errors found at the tagged VLAN configuration phase.
5.13.3 Corrective action
1. Check that the new ATCA shelf is powered-on.
2. Check that the new ATCA shelf is correctly connected to the existing ATCA shelf containing the OMCP boards; the JBXSSW boards must be connected (see A9130 MFS Evolution Commissioning method).
3. Check tagged VLAN configuration of the new ATCA shelf (see checks described in 5.8.3)
5.14 No RRALLI sent on GSLs, for all cells of the BSC, after activate MLU ( MFS Evolution only )
5.14.1 Reference FR: 3BKA20FBR186403
5.14.2 Problem description
TEST ENVIRONMENT: OMCSAW20N MFSXAW20K BSCXAW21S + patch 0029 (for MLU)
An MLU was started on the platform. After Activate MLU, the GSL traces showed that no RRALLI messages were sent by the BSC to the MFS.
5.14.3 Corrective action
The workaround found for this problem is to perform "Re-initialize GPRS" for all the cells of the BSC.
5.15 Hanging alarm "Card voltage out of range" for MFS JBXSSW after power failure ( MFS Evolution only )
5.15.1 Reference 3BKA20FBR199071
5.15.2 Problem description
After this equipment is accidentally powered off and then powered on again, false CRITICAL alarms (JBXSSWs: 'card voltage out of range') are reported to the operator. They are not cleared and remain hanging in the MFS.
5.15.3 Corrective action
No correction is available in TOMAS MD4/SP1 and TOMAS MD5.
Workaround :
The workaround to suppress a switch alarm is to extract, then re-insert, the switch concerned. This can be done only if the switch is not on the same switch plane as the active ShMC.
For an alarm on switch plane LSN1, the Shelf Manager on plane LSN1 must be active. To determine the active Shelf Manager, execute "/usr/nectar/bin/sv_status 172.17.3.8" from the active station (this IP address corresponds to the Shelf Manager of plane 1 in Shelf 3, with subnet 172.17.0.0).
If the response is "openhpid is active", ShMC 1 is active and switch plane LSN2 can be extracted. If the response is "openhpid is standby", ShMC 1 is standby: execute "/usr/nectar/bin/sv_activate 172.17.3.8" to make it active. After a check with sv_status, switch plane 2 can be extracted.
Do the opposite for an alarm on switch plane LSN2.
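The interpretation of the sv_status response can be written as a small decision helper. This is only a sketch: shmc1_next_step is a hypothetical name, and the two response strings are assumed to appear literally as quoted in the text. It prints the next step and never extracts or activates anything itself.

```shell
# Sketch only: shmc1_next_step is a hypothetical helper, not an MFS tool.
# $1 is the text returned by "/usr/nectar/bin/sv_status 172.17.3.8".
shmc1_next_step() {
    case "$1" in
        *"openhpid is active"*)
            echo "ShMC 1 is active: switch plane LSN2 can be extracted" ;;
        *"openhpid is standby"*)
            echo "ShMC 1 is standby: run sv_activate 172.17.3.8, then check again with sv_status" ;;
        *)
            # Any other answer is treated as unknown (assumption of this sketch).
            echo "unrecognized response: do not extract any switch"
            return 1 ;;
    esac
}

# Usage from the active station:
#   shmc1_next_step "$(/usr/nectar/bin/sv_status 172.17.3.8)"
```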
5.16 PV_PEM alarms raised and not cleared from IMT and OMCR after MFS power off/on ( MFS Evolution only )
5.16.1 Reference 3BKA20FBR186125
5.16.2 Problem description
After the MX MFS equipment is powered off and on according to the method documentation, "Failure of a chassis unit" alarms may remain on some PV_PEM boards.
5.16.3 Corrective action
Power off, then power on, the PV_PEM board in alarm. The corresponding alarms should disappear from the IMT. To clear the alarm from the OMCR, it is necessary to perform "Audit Alarm" from MFSUSM.
5.17 mfssetup or configure_switch failure after replacing a new SSW board. ( MFS Evolution only )
5.17.1 Reference 3BKA13FBR199323
5.17.2 Problem description
After an SSW board replacement, errors related to VLAN configuration may occur when launching the mfssetup or configure_switch tools. Unexpected VLAN definitions, especially VLAN tags 34 and 35, may already be defined on the board and block any mfssetup or configure_switch action.
5.17.3 Corrective action
Delete the unexpected VLAN configuration (VLAN 34 and 35 definitions) as follows.
Check the VLAN definitions of the SSW board:
/usr/mfs/bin/checkVlanConfig @SSW
If an unexpected VLAN is detected (especially VLAN 34 or VLAN 35), launch the corresponding command:
For the VLAN 34 definition:
/usr/mfs/bin/del_vlan_mxmfs @SSW 34
For the VLAN 35 definition:
/usr/mfs/bin/del_vlan_mxmfs @SSW 35
In all these commands, @SSW represents the IP address of the SSW board.
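The check-then-delete sequence can be prepared with a small filter. This is a sketch under assumptions: print_vlan_cleanup is a hypothetical helper name, and checkVlanConfig is assumed to list each defined VLAN on a "vlanID : N" line, as in the sample output of section 5.12.3. The helper only prints the del_vlan_mxmfs commands for VLAN 34 and 35; it does not run them.

```shell
# Sketch only: print_vlan_cleanup is a hypothetical helper, not an MFS tool.
# $1 is the IP address of the SSW board; checkVlanConfig output is read on stdin.
# For each "vlanID : 34" or "vlanID : 35" line, the matching delete command
# is printed so that it can be reviewed before being executed by hand.
print_vlan_cleanup() {
    awk -v ip="$1" '
        $1 == "vlanID" && ($NF == 34 || $NF == 35) {
            print "/usr/mfs/bin/del_vlan_mxmfs " ip " " $NF
        }'
}

# Usage (the address is only an example):
#   /usr/mfs/bin/checkVlanConfig 172.17.3.10 | print_vlan_cleanup 172.17.3.10
```

Printing the commands instead of executing them keeps the operator in control of what is deleted on the switch.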
6 AUTOMATIC SOFTWARE CHANGE
Note: Prerequisites for SWC are described in the Installation user guide, reference [1].
What/behavior | Trouble origin | Fix
1) Bad exec of ins_swcx.sh | Files created by wrong owner | Clean up and restart
2) rmdir fails during execution of ins_swcx.sh | cygwin is installed on the PC | Rename /usr/bin/rmdir.exe
3) Error Temporary local directory error at starting time | Bad login | Log as admin and restart
4) Step 2: File Access Error for the /DELIV/dlv.bck file | one /nfs partition not seen on standby CS | Relaunch standby CS with BUI command restart
5) Step 2: Creation | Root file system (/) of active CS is full | Free disk space
6) Step 3: Verify | Various origin | Check 4.3 chapter
7) Step 5: Isolation | Various origin | Check 4.4 chapter
8) Step 6: Major version change | New active CS reboots | Check Shared disk state
9) Step 7: strange IMT display | Old version not deleted | Delete old version
10) CS reboots in loop with reset-code 214 | /etc/sysconfigtab corrupted | Add missing lines
11) UNIX patch installation makes Control Station unusable (B9 MR1 ED2) | Current kernel is cleaned and new kernel is not generated | Restore the backup system
12) UNIX patch installation fails with a core file generated from 'install_patch_du' | PWD variable is not set | patches_DUNIX4.0F-22-13_SEC10 and upper for RC23; patches_DUNIX5.0A-24-4_SEC11 and upper for RC40
13) bul file execution returns 1 error | DataPatch*.bul already launched during SW migration/Replacement | No need to execute again
14) Step 3: Verify | - there are no double links for /usr/mfs/bin/clean_spdata; - version descriptor files are not correctly restored | - make the double link manually on STATION_B; - launch install_lsm
6.1 Error during execution of ins_swcx.sh
6.1.1 Reference FR: None.
6.1.2 Problem description
A previous software change was performed with the wrong userid: this created files owned by the wrong user, which prevents the automatic software change from creating its own files.
Every software change from the OMC must be performed with username = axadmin for the OMC and username = admin for the IMT; otherwise errors can occur during the SW change.
6.1.3 Corrective action
Remove (logged as root on OMC) the following files and directories if they exist:
/var/tmp/cw323mt.dll
/var/tmp/indus_ngp_del_desc_file.pl
/var/tmp/install.pl
/var/tmp/paexr.exe
/var/tmp/perl.dll
/var/tmp/perl.exe
/alcatel/var/home/axadmin/alcatel/tmp_mfs (directory => rm -rf)
/alcatel/tmp_mfs (directory => rm -rf)
Then log in as axadmin on the OMC and run the preinstallation (ins_swcx.sh) again.
Thereafter, reopen the IMT with username = admin.
6.2 rmdir fails during execution of ins_swcx.sh when cygwin is installed on the PC
6.2.1 Reference FR: 3BKA13FBR163888
6.2.2 Problem description
rmdir fails with the following error (French locale):
rmdir: option invalide -- q
Pour en savoir davantage, faites: `rmdir --help'.
In English: "rmdir: invalid option -- q. Try `rmdir --help' for more information."
6.2.3 Corrective action
Rename /usr/bin/rmdir.exe to /usr/bin/rmdir.exe.sav, then launch ins_swcx.sh again.
6.3 Error Temporary local directory error on IMT during step 0
6.3.1 Reference FR: None.
6.3.2 Problem description
When trying to start the automatic software change, the IMT reports a "Temporary local directory error" at step 0.
6.3.3 Corrective action
Log in as the admin user on the IMT and the OMC-R in order to perform the automatic software change.
6.4 Error "File Access Error" with dlv.bck always appears when doing SW replacement
6.4.1 Reference FR: 3BKA20FBR150527
6.4.2 Problem description
When performing a SW replacement, step 1/10 of the procedure completes, but at step 2/10 the IMT displays the error message "File Access Error" for the file /DELIV/dlv.bck.
The cause is that one /nfs partition is not seen on the standby station, so /DELIV cannot be seen on both stations.
6.4.3 Corrective action
Connect to the standby station and type:
df -k
You must see the following result concerning the xxx.nfs partitions:
secure_serveur.100:/var/nse/mnt/secure_serveur/RESERVED 102400 33 97736 1% /var/nse/mnt/secure_serveur/RESERVED.nfs
secure_serveur.100:/var/nse/mnt/secure_serveur/BACKUP 102400 16 95008 1% /var/nse/mnt/secure_serveur/BACKUP.nfs
secure_serveur.100:/var/nse/mnt/secure_serveur/DELIV 512000 101921 403624 21% /var/nse/mnt/secure_serveur/DELIV.nfs
secure_serveur.100:/var/nse/mnt/secure_serveur/RESULT 307200 5372 295144 2% /var/nse/mnt/secure_serveur/RESULT.nfs
secure_serveur.100:/var/nse/mnt/secure_serveur/omcxchg 102400 585 95680 1% /var/nse/mnt/secure_serveur/omcxchg.nfs
secure_serveur.100:/var/nse/mnt/secure_serveur/spdata 65536 7434 52232 13% /var/nse/mnt/secure_serveur/spdata.nfs
If these entries are not all present in the output, you must relaunch the standby station.
At the IMT, in the BUI->request window, type the following command.
If the standby station is STATION_A:
action sta [PILOT/A] (restart());
If the standby station is STATION_B:
action sta [PILOT/B] (restart());
Then check in the Nectar view that the standby station has come up.
Roll back to step 1 and try the SW replacement again.
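The check of the df -k output on the standby station can be sketched as a filter. The helper name missing_nfs_mounts is hypothetical; it assumes the mount point is the last field of each df -k line and that the six expected mount points are those listed in this section. It prints the name of every expected partition that is not mounted.

```shell
# Sketch only: missing_nfs_mounts is a hypothetical helper, not an MFS tool.
# It reads "df -k" output on stdin and prints each expected .nfs partition
# whose mount point does not appear as the last field of a line.
missing_nfs_mounts() {
    awk '
        { mounted[$NF] = 1 }
        END {
            n = split("RESERVED BACKUP DELIV RESULT omcxchg spdata", names, " ")
            for (i = 1; i <= n; i++) {
                mp = "/var/nse/mnt/secure_serveur/" names[i] ".nfs"
                if (!(mp in mounted)) print names[i]
            }
        }'
}

# Usage on the standby station:
#   df -k | missing_nfs_mounts
```

An empty result means that all six partitions are seen; any printed name indicates that the standby station must be relaunched as described above.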
6.5 Error at step 2/10 (Creation)
6.5.1 Reference FR: None.
6.5.2 Problem description
During the software replacement the phase CREATION is stopped by errors displayed in a popup window with the following:
Generic error enca_ope_failed ensw file check error PILOT/A - The file copy from delivery fileset to target fileset failed : fsync error
This means that the root file system (/) of the active station is probably full (100%).
6.5.3 Corrective action
First open an xterm and type df -k. Check that the /usr and /var directories are not full (less than 85% used). If this is not the case, go to:
/usr/mfs/log => remove all big trace files (*.old, and TraceGOM if exist)
/var/adm/crash => remove vmunix & vmcore files (\rm vm*)
/var/adm/nectar/crash => remove all Dump file & Core files (\rm core*, \rm Dump*)
Then check the remaining space again with the df -k command. Do not begin the migration if the remaining space is too small. See the following directories:
/var/adm/nectar/log
/usr/mfs/log
/var/adm/nectar/crash
/RESULT
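The 85% criterion given above can be checked with a filter on the df -k output. This is a minimal sketch: filesystems_over is a hypothetical helper name, and it assumes the df -k layout shown in section 6.4.3, with the capacity (ending in "%") in the next-to-last field and the mount point in the last field.

```shell
# Sketch only: filesystems_over is a hypothetical helper, not an MFS tool.
# $1 is the threshold in percent (85 in this section); "df -k" output is
# read on stdin. Mount points at or above the threshold are printed.
filesystems_over() {
    awk -v limit="$1" '
        NF >= 2 && $(NF-1) ~ /%$/ {
            pct = $(NF-1)
            sub(/%/, "", pct)           # "94%" -> "94"
            if (pct + 0 >= limit + 0) print $NF " " pct "%"
        }'
}

# Usage on a Control Station:
#   df -k | filesystems_over 85
```

The header line of df is skipped automatically because its next-to-last field does not end in "%".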
Also run the quotacheck command to report any discrepancies between the calculated and the recorded disk quotas:
On active Control Station:
quotacheck -v /var
quotacheck -v /usr
quotacheck -v /
quotacheck -v /DELIV
quotacheck -v /spdata
quotacheck -v /omcxchg
quotacheck -v /RESULT
On standby Control Station:
quotacheck -v /var
quotacheck -v /usr
quotacheck -v /
6.6 Error at step 3/10 (Verify)
6.6.1 Reference FR: 3BKA20FBR099035 = 3BKA13FBR102355
6.6.2 Problem description
The IMT pops up an alert window with the following text: "Error occur , see log file".
This means that the software replacement is stopped due to errors found at the VERIFY phase.
6.6.3 Corrective action
Open the BUI reception view on IMT to see more details.
Only the four most common cases are described below:
6.6.3.1 Many errors found
The installation was probably performed incorrectly. The best course of action in this case is to remove and destroy the version by clicking several times on the back button of the IMT, and then to perform the automatic software change again.
6.6.3.2 bad state error
Example:
> --- Software management error ---
> Failed on request: action version[MFSSAT05_06A](verify());
> Message for request #63 => ACTION_RSP version [MFSSAT05_06A]
(
verify(), /* Errors : ***************/
generic_err= ENCA_MAJOR_ERROR : A major error occurred during the action ...,
specific_err= ENSW_CHECKSUM_ERROR: component checksum error,
text_err= "PILOT/A - /usr/mfs/bin/mfsQ3Agt"
) ;
> _____ Abortive session for request #63 => ACTION_RSP version [MFSSAT05_06A]
(
verify(), /* Errors : ***************/
generic_err= ENCA_OPE_FAILED : the operation cannot be executed,
specific_err= ENCM_PF_VERSION_BAD_STATE: The specified version is in a bad state for this request,
text_err= "PILOT/A - /usr/mfs/bin/mfsQ3Agt"
) ;
> --- Software management error end ------
Roll back to step 2 of the software change.
Then perform the software change again.
6.6.3.3 Checksum errors on the MIB files
These files are located in the /spdata directory.
This procedure is to be used only with MIB files (i.e. files located in the /spdata directory), as the final purpose is to get rid of the MIB checksum (once platforms are installed with version MFSSAT05.05L or later).
Example:
Message for request #3 =>
ACTION_RSP version [MFSSAT05_05L]
(
verify(),
/* Errors :
***************/
generic_err = ENCA_MAJOR_ERROR : A major error occurred during the action ...,
specific_err = ENSW_CHECKSUM_ERROR : component checksum error,
text_err = "PILOT/B