Site VELIZY EVOLIUM™ SAS Originators MFS integration team MFS TROUBLESHOOTING GUIDE B9 RELEASE System : ALCATEL 900 / BSS Sub-system : MFS Document Category : USER GUIDE ABSTRACT This document constitutes the reference location for storing troubleshooting actions related to operation of MFS B9. It is restricted to ALCATEL internal usage, notably for ALCATEL personnel providing on site support at customer premises. This document will be updated each time new problem occurs. Approvals Name App. J-J BELLEGO G. ACBARD B. FERNIER Name App. D. COTTIN ED 14 Release MFS Troubleshooting guide release B9 EVOLIUM document.doc 27/11/2006 3BK 29042 JAAA PWZZA 1/201

MFS Trouble Shooting Guide B9 ed14.doc


Site

VELIZY

EVOLIUM SAS

Originators

MFS integration team

MFS TROUBLESHOOTING GUIDE

B9 RELEASE

System: ALCATEL 900 / BSS

Sub-system: MFS

Document Category: USER GUIDE

ABSTRACT

This document constitutes the reference location for storing troubleshooting actions related to operation of MFS B9. It is restricted to ALCATEL internal usage, notably for ALCATEL personnel providing on site support at customer premises.

This document will be updated each time a new problem occurs.

Approvals

Name / App.: J-J BELLEGO, G. ACBARD, B. FERNIER

Name / App.: D. COTTIN

REVIEW

ED 12 RL, 07-07-06, Reading report EVOLIUM/R&D/TD/MFS/2006-4968-PME

ED 13 RL, 27-09-06, Reading report EVOLIUM/R&D/TD/MFS/2006-5042-PME

ED 14 RL, 27-11-06, Reading report EVOLIUM/R&D/TD/MFS/2006-5092-PME

HISTORY

Ed. 01 Proposal 01
- Cancelled B8 chapters (FR close OUT, NRE, REL)

Ed. 01 Proposal 02, 01-11-2004, P.MENON
- Some clean up + synchronization with new tips from B8

Ed. 01 Proposal 03, 08-11-2004, P.MENON
- Suppress information redundant with the MFS Installation, Configuration and Software replacement guide

Ed. 01 Proposal 04, 16-11-2004, P.MENON
- Minor corrections

Ed. 01 Proposal 05, 16-02-05, P.MENON
- Add Unix boot impossible (wrong default kernel)
- Add check if backup Mib is not corrupted
- Add How to get contents of unix patch BL
- Add for Trace of unix patch installation

Ed. 01 release, 11-03-05, Release for B9 MR0

Ed. 02 release, 01-06-05, Release for B9 MR2, P.MENON
- update Corrective action: second step (install_lsm)

Ed. 03 release, 02-06-05, Release for B9 MR2, P.MENON
- S99trace_srv.ds is renamed to S99trace_server.ds since MFSAW10F

Ed. 04 release, 02-06-05, Release for B9 MR2, P.MENON
- Add for Failure on Update Remote Inventory

Ed. 05 release, 09-06-05, Release for B9 MR2, P.MENON
- Add for rmdir fails during execution of ins_swcx.sh when cygwin is installed on the PC

Ed. 06 release, 30-06-05, Release for B9 MR2, P.MENON
- update Error at step 5/10 (Isolation) Check the full SCSI chain...

Ed. 07 release, 06-07-05, Release for B9 MR2, P.MENON
- Add Connection by ftp from a MFS station to an external server is impossible FR 3BKA20FBR164817
- Add TRACE_SERVER does not run FR 3BKA13FBR164932
- Add GPU traces are not completed
- Add Impossible to load patch GPU B8 on GPUs FR 3BKA13FBR164932
- Add Unix patch installation from OMC stopped due to a network failure
- Add After a roll-back it is impossible to open the IMT terminal FR 3BKA20FBR162930

Ed. 08 Proposal 01, 30-08-05, P.MENON
- Add for Inall procedure stopped due to a station in "halt in" state FR 3BKA13FBR166921

Ed. 08 Proposal 02, 01-09-05, P.MENON
- Add new Installation from a not English PC fails (FR 3BKA20FBR166358)

Ed. 08 Proposal 03, 09-09-05, P.MENON
- Add new The trace server stops running after a while (FR 3BKA13FBR169218)
- Add new Result of dupatch in B8 or B9 RC40 with BL24

Ed. 08 Proposal 04, 20-09-05, P.MENON
- Add new Control station reboots in loop with reset_code 214 after installation of BL22 (FR 3BKA13FBR170335)

Ed. 08 Proposal 05, 20-10-05, P.MENON
- Update Control station reboots in loop with reset_code 214 after installation of BL22 (FR 3BKA13FBR170335)
- Suppress yellow paragraph

Ed. 08 Release, 28-10-05, Release

Ed. 09 Release, 13-01-06, Release, P.MENON
- quality corrections
- update Error at Step 2 (Creation)
- Add new MFS UNIX patch installation makes Control Station unusable (B9 MR1 ED2) (FR 3BKA23FBR174370)
- Update Error at step 5/10 (Isolation) (FR 3BKA13FBR175829)
- Add new Reinstallation of the MFS and restoration of data from OMC
- Add new How to restore the MIB without needing full reinstallation
- Add new Sanity check script to prevent any potential problem on the MFS

Ed. 10 Proposal 01, 08-02-06, P.MENON
- Add new GPU problem but alarm is "Failure of a JAET1 applique" (FR 3BKA13FBR177178)
- Add new no more available disk space on /usr (FR 3BKA20FBR176683)

Ed. 10 Proposal 02, 09-02-06, P.MENON
- update Sanity check script to prevent any potential problem on the MFS (FR 3BKA13CBR176689)

Ed. 10 Proposal 03, 13-02-06, P.MENON
- update Sanity check script to prevent any potential problem on the MFS (FR 3BKA13CBR176689) after remarks
- add Result of dupatch in B9 with BL22 since MR1 Edx (MFSSAW11E)

Ed. 10 Proposal 04, 16-02-06, P.MENON
- update Error at step 5/10 (Isolation)
- add System and Tomas (Nectar was the name in a former time) traces

Ed. 10 Proposal 05, 22-02-06, P.MENON
- add Wrong httpd.conf

Ed. 10 Release, 23-02-06, Release

Ed. 11 Release, 02-03-06, Release, P.MENON
- update Sanity check script to prevent any potential problem on the MFS (FR 3BKA13CBR176689)
- add JBETI traces
- update The trace server stops running after a while
- update TRACE_SERVER does not run
- add not enough space for Backup MIB
- add new GPU switch over no more possible (FR 3BKA20FBR149993 and 3BKA20FBR151855)

Ed. 12 Release, 30-06-06, Release, P.MENON
- update Traces of unix patch installation
- update O&M trace SCIM (RTA)
- update GPU switch over no more possible, JBETI problems
- Add new Rebuild of mirrored partitions on RC40
- rename Sanity check script to prevent any potential problem on the MFS to AuditMFS script to prevent any potential problem on the MFS
- Add new not possible to get PM of MFS from OMC FR 3BKA13FBR183494, not possible to unlock omcxchg account from User management option of IMT FR 3BKA13FBR183497
- Update Check if backup Mib is corrupted
- Update CRAFT cannot connect to MFS floating IP: wrong httpd.conf
- Update AuditMFS script to prevent any potential problem on the MFS with new codes FR/CR 3BKA13CBR179923 3BKA13CBR180184 3BKA13CBR180203 3BKA13CBR180618
- Add new Impossible to enable MRTG Collector FR 3BKA13FBR186503
- Add new active Control Station is blocked after automatic backup MIB on RC40 FR 3BKA13CBR189473
- Add new MFS UNIX patch installation fails with a core file generated from 'install_patch_du' FR 3BKA13FBR189822
- Merge with MX Trouble Shooting descriptions

Ed. 12 Release, 05-07-06, Release, P.MENON
- update Sleeping cells
- Add new bul file execution returns 1 error FR 3BKA13FBR186955
- Add new No RRALLI sent on GSLs, for all cells of the BSC, after activate MLU FR 3BKA20FBR186403

Ed. 13 Proposal 01, 04-09-06, P.MENON
- Update Error at step 3/10:
  . after installation from scratch of B9 version containing the script clean_spdata, the next migration does not work (FR 3BKA13FBR188065)
  . after installation from scratch a MFS which was coming from migration or software replacement, with restoration of the backup MIB, a new migration or software replacement fails (FR 3BKA13FBR194235/3BKA13FBR193877/3BKA13FBR181238)
- Add new Cell parameters modification is not allowed from IMT (BUI request)
- Add new "dataPatch.bul" error during scratch installation in B9 MR4 (FR 3BKA13FBR185034)
- update AuditMFS script to prevent any potential problem on the MFS: error codes added (118: CS are not time synchronized (CR 3BKA13CBR193667) and 406: discrepancies in version descriptor files (CR 3BKA13CBR193904))

Ed. 13 Proposal 02, 27-09-06, P.MENON
  . update after installation from scratch a MFS which was coming from migration or software replacement, with restoration of the backup MIB, a new migration or software replacement fails (FR 3BKA13FBR194235/3BKA13CBR193877/3BKA13FBR181238)
- update AuditMFS script to prevent any potential problem on the MFS

Ed. 13 Release, 11-10-06, P.MENON
- Release approved

Ed. 14 Proposal 01, 27-10-06 / 15-11-06 / 16-11-06, P.MENON
- Add new Serial splitter and RJ45 converter for Trouble shooting (MFS Evolution only)
- Add new How to generate/backup on a platform MFS a virgin MIB and how to import this MIB on a field MFS (same architecture / same SW level) (CR 3BKA13CBR194432)
- Add new Impossible to install MFS Sanity Check Script (AW11EP_00D) (FR 3BKA13FBR196644)
- update JBETI trace

D. COTTIN
- Add FR 3BKA20FBR199071 Hanging alarm "Card voltage out of range" for MFS JBXSSW after power failure
- Add 3BKA20FBR186125 PV_PEM alarms raised and not cleared from IMT and OMCR after MFS power off/on
- Add 3BKA13FBR199323 mfssetup or configure_switch failure after replacing a new SSW board

Ed. 14 Release, 27-11-06, P.MENON
- Release approved
- Add new How to detect JBETI is not blocked

TABLE OF CONTENTS

1 Introduction
1.1.1 Document organisation
1.1.2 Presentation
2 GPU
2.1 GPUs disappear from the IMT
2.1.1 Reference FR: None.
2.1.2 Problem description
2.1.3 Corrective action
2.2 GPU SO Impossible
2.2.1 Reference FR: 3BKA20FBR108914
2.2.2 Problem description
2.2.3 Corrective action
2.3 GPU reboots continuously
2.3.1 Reference FR: 3BKA20FBR119782
2.3.2 Problem description
2.3.3 Corrective action
2.3.4 Problem solved
2.4 GPU connection problem
2.4.1 Reference FR: none
2.4.2 Problem description
2.4.3 Corrective action
2.5 GPU problem but alarm is "Failure of a JAETI1 applique"
2.5.1 Reference FR: 3BKA13FBR177178
2.5.2 Problem description
2.5.3 Corrective action
2.6 GPU switch over no more possible, JBETI problems
2.6.1 Reference FR: 3BKA20FBR149993, 3BKA20FBR151855 and 3BKA13FBR163557
2.6.2 Problem description
2.6.3 Preventive action
2.6.4 Corrective action
2.7 GPU SW is not loaded (MFS Evolution only)
2.7.1 Reference FR: 3BKA13FBR 175541
2.7.2 Problem description
2.7.3 Corrective action

3 INSTALLATION
3.1 Station restart
3.1.1 Reference FR: None.
3.1.2 Problem description
3.1.3 Corrective action
3.2 Impossible to rlogin/telnet to MFS as root
3.2.1 Reference FR: None.
3.2.2 Problem description
3.2.3 Corrective action
3.3 Unix boot impossible (wrong default kernel)
3.3.1 Reference FR: None.
3.3.2 Problem description
3.3.3 Corrective action
3.4 "dataPatch.bul" error during scratch installation in B9 MR4
3.4.1 Reference FR: 3BKA13FBR185034
3.4.2 Problem description
3.4.3 Corrective action
4 MFS based on RC40
4.1 Installation from a not English PC fails
4.1.1 Reference FR: FR 3BKA20FBR166358
4.1.2 Problem description
4.1.3 Corrective action
4.2 MFS installation failed
4.2.1 Reference FR: none
4.2.2 Problem description
4.2.3 Corrective action
4.3 Inall procedure stopped due to a station in "halt in" state
4.3.1 Reference FR: 3BKA13FBR166921
4.3.2 Problem description
4.3.3 Corrective action
4.4 Failure during the SWC from the OMC at step 1/10 (before file transfer)
4.4.1 Reference FR: none
4.4.2 Problem description
4.4.3 Corrective action
4.5 Unix boot impossible
4.5.1 Reference FR: None.
4.5.2 Problem description
4.5.3 Corrective action
4.6 Rebuild of mirrored partitions on RC40
4.6.1 Reference FR: None.
4.6.2 Problem description
4.6.3 Corrective action
4.7 active Control Station is blocked after automatic backup MIB on RC40
4.7.1 Reference FR: 3BKA13CBR189473
4.7.2 Problem description
4.7.3 Corrective action
4.7.4 Impacts

5 MFS based on MX
5.1 "Inall" failed during MX-MFS installation
5.1.1 Reference FR: None.
5.1.2 Problem description
5.1.3 Corrective action
5.2 Connection to OMCP using console redirection does not work
5.2.1 Reference FR: None.
5.2.2 Problem description
5.2.3 Corrective action
5.3 "inall" failed during MX-MFS installation
5.3.1 Reference FR: None.
5.3.2 Problem description
5.3.3 Corrective action
5.4 "inall" failed during MX-MFS installation
5.4.1 Reference FR: None.
5.4.2 Problem description
5.4.3 Corrective action
5.5 Error at step 2/10 (Creation) (MFS Evolution only)
5.5.1 Reference FR
5.5.2 Problem description
5.5.3 Corrective action
5.6 Error at step 3/10 (Verify) (MFS Evolution only)
5.6.1 Reference FR: None.
5.6.2 Problem description
5.6.3 Corrective action
5.7 Error at step 7/10 (Validation) - (MFS Evolution only)
5.7.1 Reference FR: None.
5.7.2 Problem description
5.7.3 Corrective action
5.8 The stand-by station is not operational (MFS Evolution only)
5.8.1 Reference FR: None.
5.8.2 Problem description
5.8.3 Corrective action
5.9 Ethernet connection problem (MFS Evolution only)
5.9.1 Reference FR: none
5.9.2 Problem description
5.9.3 Corrective Action
5.10 Impossible to connect IMT (MFS Evolution only)
5.10.1 Reference FR: 3BKA20FBR175917
5.10.2 Problem Description
5.10.3 Corrective Action
5.11 After Power-on of ATCA shelf, OMCP servers are powered-off (MFS Evolution only)
5.11.1 Reference FR 3BKA20FBR172514
5.11.2 Problem description
5.11.3 Corrective action
5.12 How to update time from OMC (MFS Evolution only)
5.12.1 Reference FR
5.12.2 Problem description
5.12.3 Corrective action
5.12.4 Problem solved
5.13 NE1oE supervision lost (MFS Evolution only)
5.13.1 Reference FR: None
5.13.2 Problem description
5.13.3 Corrective action
5.14 Extension from 1 shelf configuration to 2 shelves configurations has failed (MFS Evolution only)
5.14.1 Reference FR: None
5.14.2 Problem description
5.14.3 Corrective action
5.15 No RRALLI sent on GSLs, for all cells of the BSC, after activate MLU (MFS Evolution only)
5.15.1 Reference FR: 3BKA20FBR186403
5.15.2 Problem description
5.15.3 Corrective action
5.16 Hanging alarm "Card voltage out of range" for MFS JBXSSW after power failure (MFS Evolution only)
5.16.1 Reference 3BKA20FBR199071
5.16.2 Problem description
5.16.3 Corrective action
5.17 PV_PEM alarms raised and not cleared from IMT and OMCR after MFS power off/on (MFS Evolution only)
5.17.1 Reference 3BKA20FBR186125
5.17.2 Problem description
5.17.3 Corrective action
5.18 mfssetup or configure_switch failure after replacing a new SSW board (MFS Evolution only)
5.18.1 Reference 3BKA13FBR199323
5.18.2 Problem description
5.18.3 Corrective action

6 AUTOMATIC SOFTWARE CHANGE
6.1 Error during execution of ins_swcx.sh
6.1.1 Reference FR: None.
6.1.2 Problem description
6.1.3 Corrective action
6.2 rmdir fails during execution of ins_swcx.sh when cygwin is installed on the PC
6.2.1 Reference FR: 3BKA13FBR163888
6.2.2 Problem description
6.2.3 Corrective action
6.3 Error Temporary local directory error on IMT during step 0
6.3.1 Reference FR: None.
6.3.2 Problem description
6.3.3 Corrective action
6.4 Error "File Access Error" with dlv.bck always appears when doing SW replacement
6.4.1 Reference FR: 3BKA20FBR150527
6.4.2 Problem description
6.4.3 Corrective action
6.5 Error at step 2/10 (Creation)
6.5.1 Reference FR: None.
6.5.2 Problem description
6.5.3 Corrective action
6.6 Error at step 3/10 (Verify)
6.6.1 Reference FR: 3BKA20FBR099035 = 3BKA13FBR102355
6.6.2 Problem description
6.6.3 Corrective action
6.6.4 Reference FR: 3BKA13FBR188065
6.6.5 Reference FR: 3BKA13FBR194235, 3BKA13CBR193877, 3BKA13FBR181238
6.7 Error at step 5/10 (Isolation)
6.7.1 Reference FR: 3BK - A13FBR096085 / 105356 / 112480 - A20FBR096035 / 105055 / 129810 / 139842 - A23FBR174097
6.7.2 Save traces
6.7.3 Problem description
6.7.4 Specific case for 3BKA20FBR129810: Problem occurs while Backup Server is down.
6.7.5 Specific case for 3BKA13FBR175829: broken shared disk
6.8 Error at step 6/10 (Major version change)
6.8.1 Reference FR: 3BKA13FBR107676
6.8.2 Problem description
6.8.3 Check if disks are shared correctly
6.8.4 Corrective action
6.9 Error at step 7/10 (Validation)
6.9.1 Reference FR: None.
6.9.2 Problem description
6.9.3 Corrective action
6.10 Control station reboots in loop with reset_code 214 after installation of BL22
6.10.1 Reference FR: 3BKA13FBR170335
6.10.2 Problem description
6.10.3 Corrective action
6.11 MFS UNIX patch installation makes Control Station unusable (B9 MR1 ED2)
6.11.1 Reference FR: 3BKA23FBR174370
6.11.2 Problem description
6.11.3 Corrective action
6.12 MFS UNIX patch installation fails with a core file generated from 'install_patch_du'
6.12.1 Reference FR: 3BKA45FBR188097/3BKA25FBR188087/ 3BKA13FBR189822
6.12.2 Problem description
6.12.3 Corrective action
6.13 bul file execution returns 1 error
6.13.1 Reference FR: 3BKA13FBR186955
6.13.2 Problem description
6.13.3 Corrective action

7 MFS Running
7.1 The stand-by station is not operational
7.1.1 Reference FR: None.
7.1.2 Problem description
7.1.3 Corrective action
7.2 Station not reachable
7.2.1 Reference FR: none
7.2.2 Problem description
7.2.3 Corrective action
7.2.4 Problem solved
7.3 System console not reachable
7.3.1 Reference FR: none
7.3.2 Problem description
7.3.3 Corrective action
7.4 A process is looping
7.4.1 Reference FR: 3BKA45FBR119174
7.4.2 Problem description
7.4.3 Corrective action
7.4.4 impacts
7.5 Reboots in loop on MFS reset due to bad IP address
7.5.1 Reference FR: 3BKA20FBR079434 - 3BKA20FBR081233 (close NIP)
7.5.2 Problem description
7.5.3 Corrective action
7.6 Reboots in loop due to no more disk space
7.6.1 Reference FR: None
7.6.2 Problem description
7.6.3 Corrective action
7.7 OMC-MFS link problem at different interface cases
7.8 Ethernet connection problem
7.8.1 Reference FR: none
7.8.2 Problem description
7.8.3 Corrective Action
7.9 Sleeping cells
7.9.1 Alerter definition
7.10 DS10 servers don't come up automatically after power off/power on
7.10.1 Reference FR: 3BKA45FBR17363, 3BKA20FBR135619
7.10.2 Problem description
7.10.3 Corrective action
7.11 Failure on Update Remote Inventory
7.11.1 Reference FR: none
7.11.2 Problem description
7.11.3 Corrective Action
7.12 Connection by ftp from a MFS station to an external server is impossible
7.12.1 Reference FR: 3BKA20FBR164817
7.12.2 Problem description
7.12.3 Corrective action
7.13 The trace server stops running after a while
7.13.1 Reference FR: 3BKA13FBR169218
7.13.2 Problem description
7.13.3 Corrective action
7.14 TRACE_SERVER does not run
7.14.1 Reference FR: 3BKA13FBR164932
7.14.2 Problem description
7.14.3 Corrective action
7.15 GPU traces are not completed
7.15.1 Reference FR: none
7.15.2 Problem description
7.15.3 Corrective action
7.16 Impossible to load patch GPU B8 on GPUs
7.16.1 Reference FR: 3BKA13FBR164932
7.16.2 Problem description
7.16.3 Corrective action
7.17 Unix patch installation from OMC stopped due to a network failure
7.17.1 Reference FR: none
7.17.2 Problem description
7.17.3 Corrective action
7.18 After a roll-back it is impossible to open the IMT terminal
7.18.1 Reference FR: 3BKA20FBR162930
7.18.2 Problem description
7.18.3 Corrective action
7.19 Telnet access from Windows
7.19.1 Reference FR: none
7.19.2 Problem description
7.19.3 Corrective Action
7.20 no more available disk space on /usr
7.20.1 Reference FR: 3BKA20FBR176683
7.20.2 Problem description
7.20.3 Corrective action
7.21 CRAFT cannot connect to MFS floating IP: wrong httpd.conf
7.21.1 Reference FR: 3BKA13FBR177317
7.21.2 Problem description
7.21.3 Corrective action
7.22 not enough space for Backup MIB
7.22.1 Reference FR: none
7.22.2 Problem description
7.22.3 Corrective action
7.23 not possible to get PM of MFS from OMC, not possible to unlock omcxchg account from User management option of IMT
7.23.1 Reference FR: 3BKA13FBR183494, 3BKA13FBR183497
7.23.2 Problem description
7.23.3 Corrective action
7.24 Impossible to enable MRTG Collector
7.24.1 Reference FR: 3BKA13FBR186503
7.24.2 Problem description
7.24.3 Corrective action
7.25 Cell parameters modification is not allowed from IMT (BUI request)
7.25.1 Reference FR: none
7.25.2 Problem description
7.25.3 Corrective action
7.26 Impossible to install MFS Sanity Check Script (AW11EP_00D)
7.26.1 Reference FR: 3BKA13FBR196644
7.26.2 Problem description
7.26.3 Corrective action

8 Crash/Traces
8.1 Determine crash cause
8.2 Save traces
8.3 O&M trace
8.3.1 SCIM (RTA)
8.3.2 Q3
8.3.3 RETIX
8.3.4 UNIX
8.4 GPU trace
8.4.1 Trace level
8.4.2 Which level to activate
8.4.3 How to modify size of mfs_trace_p_XX file?
8.5 JBETI trace
8.6 Traces of unix patch installation
8.7 Problems
8.7.1 GPU traces
8.7.2 Trace Server
8.7.3 Disk quota
8.7.4 mfs_trace_p_XX traces location
8.8 System and Tomas (Nectar was the name in a former time) traces
8.8.1 system traces (if required)
8.8.2 Advfs traces
8.8.3 TOMAS traces
8.9 NE1oE Traces

9 Various information
9.1 User count creation via IMT on MFS
9.1.1 Reference FR: 3BKA45FBR144680
9.1.2 Problem description
9.1.3 Corrective action
9.2 Update disk usage information
9.2.1 Problem description
9.2.2 Action
9.3 Shared disks access
9.3.1 Problem description
9.3.2 Action
9.4 How to get MFS component versions
9.4.1 Problem description
9.4.2 Action
9.5 How to know how many IMT are open at same time ?
9.5.1 Reference FR: none
9.5.2 Problem description
9.5.3 Corrective action
9.6 How to update time from OMC
9.6.1 Reference FR: 3BKA13FBR141970
9.6.2 Problem description
9.6.3 Corrective action
9.6.4 Problem solved
9.7 MFS restoration problem
9.7.1 Problem description
9.7.2 Corrections description
9.8 MFS system restoration problem: supervision (MFS Evolution only)
9.9 Backup (MFS Evolution only)
9.10 Restore (MFS Evolution only)
9.11 Check if backup Mib is corrupted
9.11.1 Reference FR
9.11.2 Problem description
9.11.3 Correction description
9.12 Reinstallation of the MFS and restoration of data from OMC
9.12.1 Reference FR: 3BKA13CBR177682
9.12.2 Problem description
9.12.3 Correction description
9.13 How to get contents of Unix patch BL
9.13.1 Problem description
9.13.2 Action
9.14 How to restore the MIB without needing full reinstallation
9.14.1 Reference FR: 3BKA13CBR177682
9.14.2 Problem description
9.14.3 Correction description
9.15 AuditMFS script to prevent any potential problem on the MFS
9.15.1 Reference FR: 3BKA13CBR176689
9.15.2 Return codes explanation
9.15.3 Corrective action
9.15.4 Example on AS800 (based on Tomas RC23)
9.15.5 Example on DS10 (based on Tomas RC23)
9.15.6 Example on DS10 (based on Tomas RC40)
9.16 Serial splitter and RJ45 converter for Trouble shooting (MFS Evolution only)
9.16.1 Reference FR: None
9.16.2 Problem description
9.16.3 Action
9.17 How to generate/backup on a platform MFS a virgin MIB and how to import this MIB on a field MFS (same architecture / same SW level)
9.17.1 Reference FR: 3BKA13CBR194432
9.17.2 Problem description
9.17.3 Action
9.18 How to detect JBETI is not blocked
9.18.1 Reference FR: none
9.18.2 Problem description
9.18.3 Correction description
10 GLOSSARY AND ABBREVIATIONS
A HW settings of environmental variables (FW)

INTERNAL REFERENCED DOCUMENTS

Not applicable

REFERENCED DOCUMENTS

[ 1 ] MFS B9 installation user guide, reference 3BK 09679 JAAA RJZZA

[ 2 ] EVOLIUM A9135 MFS MAINTENANCE HANDBOOK, reference 3BK 20935 AAAA PCZZA

[ 3 ] B8/B9 A9135 MFS SOFTWARE MIGRATION Release B9, reference 3BK 17422 0202 RJZZA

RELATED DOCUMENTS

PMU logging messages description and principles release B6.2, 3BK 09850 FCAD PWZZA

OPEN POINTS / RESTRICTIONS

No open point and no restriction have been found.

1 Introduction

1.1.1 Document organisation

This document is organized the following way:

1) This chapter

2) Troubles coming from GPU, with, most of the time a Quality Alert attached

3) Troubles coming at installation time

4) Troubles coming at SW change time, depending on the SWC phase

5) Troubles happening when MFS is started

6) What to do in case of crash, which information to be kept?

7) How to set and to get traces

8) Information: general information, as disk usage,

Plus an appendix for specific information

A) IOLAN configuration

B) HW setting of environmental variables

1.1.2 Presentation

Each chapter is introduced with a table summarising the addressed problems, their origin and the fix.

Very few chapters can be shown to the customer. They are highlighted in green.

Commands are presented in a grey rectangle.

2 GPU

What/behavior | Trouble origin | Fix
1) GPUs disappear from IMT | 1 or more GPU with bad components | Change GPU
2) Impossible GPU switch over | JAE1 applique mistake | Change JAE1
3) GPU reboots continuously | GPU FW mistake | Change the GPU
4) GPU connection problem | Connection, ethernet | Check Ethernet, extract and re-plug the board
5) GPU problem but alarm is "Failure of a JAETI1 applique" | Faulty GPU | Change faulty GPU
6) GPU switch over no more possible | JBETI becomes blocked | Reset the active JBETI
7) GPU SW is not loaded | No more DHCP lease available | Remove DHCP lease file

2.1 GPUs disappear from the IMT

2.1.1 Reference FR: None.

2.1.2 Problem description

One or more (up to all) GPUs in a subrack disappear from time to time from the Craft terminal (IMT), as if they had been unplugged.

GSM and GPRS remain available, but it is impossible to perform any remote action on these GPUs (download or modify the configuration, switch over, reset data, lock).

A hardware reset (=> outage telecom GSM + GPRS) solves the problem for a short time (< 1 day).

2.1.3 Corrective action

At least one GPU in the subrack can have bad hardware components.

All GPU of the subrack must be checked.

To check one GPU, unplug it (=> outage telecom GSM + GPRS).

Then compare the 5 components references like on the following pictures:

For these 5 components (XXX):

FB2041 is the good reference

FBL2041 is a wrong reference!

Bad component must be changed.

2.2 GPU SO Impossible

2.2.1 Reference FR: 3BKA20FBR108914

2.2.2 Problem description

Switch-over of one GPU (by Craft Terminal) to the spare GPU is not possible: the spare GPU begins to load its telecom configuration but, some seconds later, the board is blocked and the alarm "On free run mode" appears. The traffic is stopped.

When a switch-back of the GPU board is done, the traffic comes back, and everything is normal.

To confirm the problem, switch some GPUs and some appliques.

The problem is due to a difference between 2 variants of the appliques in the technology of the redundancy bus transceivers: the AxABxx version is equipped with component FB2041BB (running with VCC = 5 V), and the AxAAxx version is equipped with FBL2041BB (running with VCC = 3.3 V).

A hardware correction is under study for 3BK08231AxABxx pcm appliques.

2.2.3 Corrective action

It has been demonstrated that the pcm applique with the reference number 3BK08231AxABxx causes the problem. PCM applique 3BK08231AxAAxx should be fully operational.

JAE1C boards (75 PCM): check the pcm applique reference:

3BK08231ABAAxx: good board

3BK08231ABABxx: faulty board: replace the board with a good JAE1C

JAE1 boards (120 PCM): check the pcm applique reference:

3BK08231AAAAxx: good board

3BK08231AAABxx: faulty board: replace the board with a good JAE1
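The reference check of this section can be scripted when many boards have to be inventoried. The following is a minimal sketch, assuming the reference is typed exactly as printed on the applique; classify_applique is a hypothetical helper name, not an MFS tool, and the patterns only encode the AxAAxx / AxABxx rule stated above.

```shell
#!/bin/sh
# Classify a PCM applique reference according to section 2.2.3.
# classify_applique is a hypothetical helper, not delivered with the MFS.
classify_applique() {
    case "$1" in
        3BK08231A?AA??) echo "good" ;;     # AxAAxx variant
        3BK08231A?AB??) echo "faulty" ;;   # AxABxx variant: board to be changed
        *)              echo "unknown" ;;  # not a 3BK08231 applique reference
    esac
}

# Example call with an illustrative JAE1C reference
classify_applique 3BK08231ABAB01
```

Any reference reported as "unknown" should be read again on the board rather than assumed good.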

2.3 GPU reboots continuously

2.3.1 Reference FR: 3BKA20FBR119782

2.3.2 Problem description

The GPU reboots continuously once the configuration is completed and the board is unlocked: after GPRS has been configured and the GPU and GPRS unlocked, the GPU reboots continuously. When a switchover is performed, the same problem occurs.

In internal GPU traces (file mfs_trace_p_XX), the following traces indicate there is a failure in PMU package initialisations:

DATA_ERR : T: 200 : rrmswcomp.cpp : 160 : Cell Traffic Package init failure...

DATA_ERR : T: 200 : rrmswcomp.cpp : 172 : Bss Management Package init failure...

Then, check if the GPU reference (on the front side of the board) is GPU 3BK08064ABAC01
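Searching a saved trace file for the two failure lines quoted above can be done with grep. This is a minimal sketch, assuming the trace file has been copied somewhere readable; count_pmu_init_failures is a hypothetical helper name and the file path is supplied by the caller.

```shell
#!/bin/sh
# Count "Package init failure" lines logged by rrmswcomp.cpp in a GPU trace file.
# count_pmu_init_failures is a hypothetical helper; $1 is the trace file path.
count_pmu_init_failures() {
    grep -c 'DATA_ERR.*rrmswcomp\.cpp.*Package init failure' "$1"
}

# Usage (the file name is only the pattern used in this guide):
#   count_pmu_init_failures mfs_trace_p_XX
```

A non-zero count only indicates that the symptom is present; the GPU reference still has to be checked on the board itself.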

2.3.3 Corrective action

If the GPU reference is GPU 3BK08064ABAC01, and if the behavior is as described above, then contact the local TAC, who has to change the GPU and send the faulty GPU to the Alcatel Repair Center, where a fix will be applied.

The problem is due to a bad detection of the remote inventory by the firmware of the GPU: the firmware checks in the remote inventory the combination of functional variant (VF) and realization variant (VR), ABAA. This is a bug: it should check the (VF) AB field only and ignore the (VR) AC field. As ABAA is not found, the GPU board is not detected as a JBGPU2 (with 128 MB of PPC memory) but, by default, as a JBGPU (with 64 MB of PPC memory). This explains why some PMU packages cannot initialize their memory allocation.

2.3.4 Problem solved

Hardware correction under study.

2.4 GPU connection problem

2.4.1 Reference FR: none

2.4.2 Problem description

GPU stays initial/idle (craft site view) and does not connect to the MFS. The LED can be either steady or blinking orange.

2.4.3 Corrective action

1. Check that at least one Ethernet link is plugged for that GPU in one of the switches.

2. Launch a console on that GPU: plug a cable between the debug output of the applique and a COM port (CTRL uu to enter the GPU menu). Type help to list the available commands. ve / vi display the MAC / IP addresses.

3. If the GPU initialization is stopped at boot request (the GPU does not know its IP address) => there is no connection between GPU and control station. Check that UDP packets corresponding to boot request are actually sent through one of the interfaces (tu1 or tu2):

Set up tcpdump on the net:

cd /dev

./MAKEDEV pfilt

pfconfig +p +c tu1

tcpdump -i tu1 udp port 68

(if necessary: lan_config -i tu1 -s 10 -x 0 -a 0

# Set output to 10 Mega )

Packets sent through port 68 are bootpc (client = GPU) ones. Port 67 packets are the bootps (server = control station) answers.

Check the file /etc/bootptab : It should have a line giving the board IP address according to the Ethernet address :

gpu1_lg0:tc=DS.default:ha=00809F090804:ip=1.1.1.50:bf=Loader.hex:\

ha is the Ethernet address, check with the console that the GPU gives the right address.

4. If the GPU initialisation is stopped at BNP init, the following message is printed on the GPU console:

Wait for answer from GEM since x seconds

=> there is communication between the GPU and the control station (it is not an Ethernet problem). It is a known bug (see FR:A13/90904). Workaround: extract the board and plug it again. (This may have to be done several times.)
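For the /etc/bootptab check of step 3, the IP address declared for a given Ethernet address can be extracted with awk. This is a minimal sketch under two assumptions: the ha= and ip= tags of an entry sit on the same physical line (as in the example line above), and the MAC is typed with the same case as in the file. bootptab_ip_for_mac is a hypothetical helper name.

```shell
#!/bin/sh
# Print the ip= value of every bootptab line whose ha= tag equals the given MAC.
# bootptab_ip_for_mac is a hypothetical helper.
#   $1 = Ethernet address as written in the ha= tag (12 hex digits)
#   $2 = bootptab file to read
bootptab_ip_for_mac() {
    awk -F: -v mac="$1" '
        {
            ip = ""; found = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "ha=" mac) found = 1          # matching hardware address
                if ($i ~ /^ip=/)     ip = substr($i, 4) # keep text after "ip="
            }
            if (found && ip != "") print ip
        }' "$2"
}

# Usage: bootptab_ip_for_mac 00809F090804 /etc/bootptab
```

An empty result means that no entry declares this Ethernet address, which matches the "GPU does not know its IP address" case of step 3.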

2.5 GPU problem but alarm is "Failure of a JAETI1 applique"

2.5.1 Reference FR: 3BKA13FBR177178

2.5.2 Problem description

The origin of this issue seems to be a real HW problem (faulty GPU), but the alarm is reported on the wrong board (the problem occurred in B8 MR5 Ed4).

The GPU's part number is 3BK08064ACAB06 and it is not impacted by known Quality Alerters.

2.5.3 Corrective action

Unplug the problematic GPU, then reset the JBET1 on either the left or the right-hand side.

2.6 GPU switch over no more possible, JBETI problems2.6.1 Reference FR: 3BKA20FBR149993, 3BKA20FBR151855 and 3BKA13FBR1635572.6.2 Problem description Sometimes the JBETI becomes blocked, so that it won't treat any request ( Remote inventory, Gpu reset, Gpu switchover ), and alarm are not cleared neither raised, while alls led on the JBETI are green: a switchover is done on spare GPU but no telecom traffic possible.we can fall in this situation for the following reasons

- after a GPU crash:

When the GPU software crashes, O&M detects the loss of supervision of this GPU board and sends a reset order to the GPU through the JBETI, but as the JBETI is blocked the GPU does not reset/reboot.

Then after a while (about 3 minutes), as O&M does not see the GPU rebooting, it concludes that the GPU is failing, so it sends a GPU switchover order (to route the PCM signal from the applique to the spare GPU) and sends the telecom configuration to the spare GPU.

But as the JBETI is blocked, it does not process the GPU switchover order, so no traffic is possible on the spare GPU.

- after a manual GPU switchover command sent from IMT:

O&M sends a GPU switchover order (to route the PCM signal from the applique to the spare GPU) and sends the telecom configuration to the spare GPU.

But as the JBETI is blocked, it does not process the GPU switchover order, so no traffic is possible on the spare GPU.

To confirm that the JBETI is blocked:

a remote inventory command from IMT will fail with a time-out

2.6.3 Preventive action

None

2.6.4 Corrective action

When the JBETI is suspected to be blocked, reset the active JBETI.

2.6.4.1 Problem solved

MFS.PATCH.B9_0.RCxx.11EP_00G (JBETI_AA patch) and MFS.PATCH.B9_0.RCxx.11EP_00H (JBETI_AB patch) solve the problem in B9 MR1 ED4 QD11 (MFSSAW11E/41E, MFSSAW11F/41F)

2.7 GPU SW is not loaded ( MFS Evolution only )

2.7.1 Reference FR: 3BKA13FBR 175541

2.7.2 Problem description

The lease related to BOOTP is infinite, so the lease file must be removed before boards can be replaced without constraint (32 different GP boards can be plugged).

2.7.3 Corrective action

1. Remove the lease file /var/dhcp/dhcpd.leases.

At any terminal accessing the active STATION, type

STATION_x> cd /var/dhcp

STATION_x> rm dhcpd.leases

STATION_x> rm dhcpd.leases~

2. Then kill the DHCP server (it may trigger an OMCP switch-over):

STATION_x> ps -efd | grep dhcpd

root 1205 1115 0 Dec13 ? 00:00:00 /usr/nectar/bin/dhcpd_ctrl

root 1467 1205 0 Dec13 ? 00:00:02 /usr/sbin/dhcpd -cf /nfm_local/spdata/nectar/dhcpd/dhcpd.conf -f -q eth0 eth1

STATION_x> kill -9 1467
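The ps listing above contains two matching lines (the dhcpd_ctrl supervisor and the dhcpd server itself), and only the server must be killed. As a hedged sketch, the selection of the right PID can be written as a filter; the function name dhcpd_pid is hypothetical, and the sketch assumes the PID is the second column, as in the listing above.

```shell
# Hypothetical helper: read "ps -ef" style output on standard input and print
# the PID (second column) of the /usr/sbin/dhcpd server only. The dhcpd_ctrl
# supervisor and the "grep dhcpd" line are ignored because no field is exactly
# "/usr/sbin/dhcpd" on those lines.
dhcpd_pid() {
    awk '{
        for (i = 3; i <= NF; i++) {
            if ($i == "/usr/sbin/dhcpd") { print $2; next }
        }
    }'
}
```

A possible use is `ps -efd | dhcpd_pid`, checking the printed PID by eye before passing it to kill.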

3 INSTALLATION

What/behavior | Trouble origin | Fix

1) Both stations restart | Wrong address declaration | Modify address

2) rlogin is refused | Impossible rlogin/telnet as root | Modify securettys file

3) Unix boot impossible | Wrong default kernel | Modify boot_file variable in Firmware

4) "dataPatch.bul" error during scratch installation in B9 MR4 | conf_alarm [cfgalarm101AH] object is defined twice in bul files | This error is expected, without bad effect on the MFS

3.1 Station restart

3.1.1 Reference FR: None.

3.1.2 Problem description

In some cases, when trying to restart one station, both of them restart. This may be because they are declared with a wrong address.

3.1.3 Corrective action

Check (and if necessary modify) the firmware software configuration of the control stations, which can be accessed through the system console.

1. At any terminal accessing one of the control stations (STATION_A or STATION_B, by telnet or rlogin), it is possible to access the system console of any control station by typing either:

STATION_x> telnet 1.1.1.20 10002

(for STATION_A system console)

STATION_x> telnet 1.1.1.20 10003

(for STATION_B system console)

2. Press Return a few times to get the prompt; then:

1) The UNIX login or the shell prompt is displayed: log in as root if necessary, then halt the station gently to the firmware by typing the following command:

STATION_x> init 0

When the firmware prompt is displayed ( >>> ), go to step 6).

2) The machine doesn't react and the display is frozen: force the machine to stop by typing the keystroke sequence:

rmc

3) Then, when the RMC prompt is available:

RMC>halt in

4) Then again:

rmc

5) Then when the RMC prompt is available:

RMC>halt out

Then the firmware prompt should be available.

6) Type the following command

>>>show *

(give all firmware variables values)

(Refer to Appendix B for a list of currently advised values depending on the hardware configuration)

If values are erroneous, especially pka0_host_id, pkb0_host_id, pkc0_host_id and auto_action, modify them. For example:

>>>set pkc0_host_id 6

When all checks and modifications are done, do the following:

>>>init

The machine should now reboot automatically.

Now release the system console (as it is used from time to time by NECTAR Hardware Management):

On Sun station, by typing:

]

(simultaneously control and closing square bracket) then:

telnet>quit

If the console is accessed from the other station through a PC/NT X terminal, simply close the window.

(Another method to release the system console is to restart the iolan (see other chapter) from another session).

3.2 Impossible to rlogin/telnet to MFS as root

3.2.1 Reference FR: None.

3.2.2 Problem description

When trying to rlogin to the CS root, action is refused by the control station (access denied)

3.2.3 Corrective action

The file /etc/securettys is incorrect: it should include a line ptys to allow root login from another terminal.

Login as admin on one of the control stations.

telnet 1.1.1.20 10002 /10003 to gain access to the system console (see 3.1.3 for more details)

login as root

type

echo ptys >> /etc/securettys

This adds the line ptys in securettys

perform the same action on the other control station

release the terminal or (in case of problem) reboot the iolan (telnet 1.1.1.20, return, su, iolan, reboot)
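Running the echo command above twice adds the ptys line twice. A minimal sketch of an idempotent variant follows; the function name ensure_line is hypothetical, and it assumes a grep that supports the POSIX options -q, -x and -F.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not
# already present (assumes a POSIX grep with -q, -x and -F).
# Usage: ensure_line ptys /etc/securettys
ensure_line() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}
```

Calling `ensure_line ptys /etc/securettys` on each control station would leave exactly one ptys line, however many times it is run.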

3.3 Unix boot impossible (wrong default kernel)

3.3.1 Reference FR: None.

3.3.2 Problem description

Unix can't boot because it can't open the default kernel 'vmunix.pre_capmn':

You should have the following at the console:

ff.fe.fd.fc.fb.fa.f9.f8.f7.f6.f5.f3.f2.f1.f0.ef.df.ee.f4.

probing hose 0, PCI

probing PCI-to-EISA bridge, bus 1

probing PCI-to-PCI bridge, bus 2

bus 0, slot 5 -- pka -- QLogic ISP10x0

bus 0, slot 6 -- vga -- S3 Trio64/Trio32

bus 2, slot 0 -- ewa -- DE500-BA Network Controller

bus 2, slot 1 -- ewb -- DE500-BA Network Controller

bus 2, slot 2 -- ewc -- DE500-BA Network Controller

bus 2, slot 3 -- ewd -- DE500-BA Network Controller

bus 0, slot 12, function 0 -- pkb -- NCR 53C875

bus 0, slot 12, function 1 -- pkc -- NCR 53C875

ed.ec.*** keyboard not plugged in...

eb.....ea.e9.e8.e7.e6.e5.e4.e3.e2.e1.e0.

V5.8-24, built on Jul 11 2001 at 10:57:51

Memory Testing and Configuration Status

512 Meg of System Memory

Bank 0 = 512 Mbytes(128 MB Per DIMM) Starting at 0x00000000

Bank 1 = No Memory Detected

CPU 0 booting

waiting for pkb0.6.0.12.0 to poll...

(boot dka0.0.0.5.0 -file vmunix.pre_capmn -flags S)

block 0 of dka0.0.0.5.0 is a valid boot block

reading 16 blocks from dka0.0.0.5.0

bootstrap code read in

Building FRU table

FRU table size = 0xbed

base = 1d2000, image_start = 0, image_bytes = 2000

initializing HWRPB at 2000

initializing page table at 1ffce000

initializing machine state

setting affinity to the primary CPU

jumping to bootstrap code

Digital UNIX boot - Mon Nov 1 17:21:23 EST 1999

can't open vmunix.pre_capmn

Enter [option_1 ... option_n]

Hit to boot default kernel 'vmunix.pre_capmn':

This is due to a wrong value of the boot_file variable at Firmware level:

>>>show boot*_file

boot_file vmunix.pre_capmn

booted_file vmunix.pre_capmn

3.3.3 Corrective action

Modify the boot_file variable:

>>>set boot_file vmunix

Verify the boot_file variable:

>>>show boot_file

boot_file vmunix

3.4 "dataPatch.bul" error during scratch installation in B9 MR4

3.4.1 Reference FR: 3BKA13FBR185034

3.4.2 Problem description

During scratch installation in B9 MR4, with the MFS in conf mode, the following error appears after sending the "dataPatch.bul" file downloaded from IMT in BUI-->Reception view:

*Request #779 was refused => CREATE_RSP conf_alarm [cfgalarm101AH] (

/* Errors : ***************/ generic_err = ENCA_DUP_MO_INSTANCE : The specified object instance already exists) ;

This error appears because "CREATE_RSP conf_alarm [cfgalarm101AH]" has already been configured by "02_mfsconfig.bul".

3.4.3 Corrective action

This command has been added in datapatch.bul, in order to correct a FR in RC40 branch. This evolution has been added at the end of file in order to not prevent the correct loading of other patch information. Thus the error is expected without bad effect on the MFS.

4 MFS based on RC40

What/behavior | Trouble origin | Fix

1) Installation fails during ftp phase | expect does not recognize non-English words | Use a PC configured in English

2) Installation stops during the inconf phase | Incorrect additional disk configuration | Clear disk label

3) Inall procedure is stopped in inpatch step | "Halt Button is IN, BOOT NOT POSSIBLE" | Under ">>>>" type "boot", then from the PC, in the Expect session, type "inall"

4) Popup window "perl failed" at step 1 of SWC | /tmp partition is full | Free disk space

5) Vmunix file access impossible | Impossible UNIX boot | With UNIX CD

6) Rebuild of mirrored partitions | Structure of mirrored partitions is broken | With install_rc40_lsm

7) Loss of connection between OMC and MFS, active CS is blocked after automatic backup MIB on RC40 | A file set is corrupted | Reinstall from scratch if patch is missing

4.1 Installation from a not English PC fails

4.1.1 Reference FR: FR 3BKA20FBR166358

4.1.2 Problem description

During the ftp phase, the installation PC returns messages in Portuguese. They are not recognized by the expect script, which only recognizes the standard ftp words in English and in French.

4.1.3 Corrective action

Use a PC configured in English.

4.2 MFS installation failed

Incorrect additional disk configuration: HP must provide the additional disk without any OS already installed. The internal additional disk must be not formatted, not partitioned and not labelled.

This requirement is mandatory for the first factory installation.

4.2.1 Reference FR: none

4.2.2 Problem description

The automatic MFS RC40 installation stops during the inconf phase. This problem can be identified by reading the inconf_STATION_A.log (or inconf_STATION_B.log) log file on the PC used for the installation, in the directory /expect/bin/log.

If the problem occurs, the following sequence of lines appears in the log file:

Error: partition /dev/nfm/vol0a and overlapping partition(s) are

marked in use in the disklabel. Use "disklabel -e" to fix the

disklabel if it is improperly labeled.

start Actif FMA retcode -1 errno 0

Jun 8 21:16:20 STATION_A FM_Agent_stdalone[63148]: start_active: mount /omcxchg failed ret 256

4.2.3 Corrective action

Here is the way to fix the problem.

A) Check the second disk is formatted with UNIX BSD4.2 by using the command :

disklabel dsk1

B) You should have information like :

# /dev/rdisk/dsk1c:

type: EIDE

disk: 6E040L0

label:

flags: dynamic_geometry

bytes/sector: 512

0 ( 24, 41) FDX

sectors/track: 63

tracks/cylinder: 16

sectors/cylinder: 1008

cylinders: 16383

sectors/unit: 78165360

rpm: 4500

interleave: 1

trackskew: 0

cylinderskew: 0

headswitch: 0 # milliseconds

track-to-track seek: 0 # milliseconds

drivedata: 0

8 partitions:

# size offset fstype fsize bsize cpg # ~Cyl values

a: 131072 0 unused 0 0 # 0 - 130*

b: 262144 131072 unused 0 0 # 130*- 390*

c: 78165360 0 4.2BSD 1024 8192 16 # 0 - 77544

d: 0 0 unused 0 0 # 0 - 0

e: 0 0 unused 0 0 # 0 - 0

f: 0 0 unused 0 0 # 0 - 0

g: 38886072 393216 unused 0 0 # 390*- 38967*

h: 38886072 39279288 unused 0 0 # 38967*- 77544

You can see that the BSD 4.2 is present

C) Clear the disk label by using the command :

disklabel -z dsk1

D) Set the standard label name dsk1 :

disklabel -wr dsk1

E) Re-check the result :

disklabel dsk1

You should have information like :

/dev/rdisk/dsk1c:

type: EIDE

disk: 6E040L0

label:

flags: dynamic_geometry

bytes/sector: 512

0 ( 24, 41) FDX

sectors/track: 63

tracks/cylinder: 16

sectors/cylinder: 1008

cylinders: 16383

sectors/unit: 78165360

rpm: 4500

interleave: 1

trackskew: 0

cylinderskew: 0

headswitch: 0 # milliseconds

track-to-track seek: 0 # milliseconds

drivedata: 0

8 partitions:

# size offset fstype fsize bsize cpg # ~Cyl values

a: 131072 0 unused 0 0 # 0 - 130*

b: 262144 131072 unused 0 0 # 130*- 390*

c: 78165360 0 unused 1024 8192 16 # 0 - 77544

d: 0 0 unused 0 0 # 0 - 0

e: 0 0 unused 0 0 # 0 - 0

f: 0 0 unused 0 0 # 0 - 0

g: 38886072 393216 unused 0 0 # 390*- 38967*

h: 38886072 39279288 unused 0 0 # 38967*- 77544

Now the partition c is unused.

F) Restart the installation from the beginning by typing on the PC used for the installation the following command :

Open a DOS session and type :

cd C:\expect\bin

tclsh80 clear

tclsh80 inall
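The before/after comparison of the disklabel output in steps B) and E) comes down to checking the fstype column of each partition line. A minimal sketch of that check follows; the function name used_partitions is hypothetical, and it assumes the partition-table layout shown above (partition letter, size, offset, fstype).

```shell
# Hypothetical helper: read "disklabel dsk1" output on standard input and
# print every partition whose fstype (4th column) is not "unused".
# No output means that the label is clear.
used_partitions() {
    awk '$1 ~ /^[a-h]:$/ && $4 != "unused" {
        sub(":", "", $1)      # "c:" -> "c"
        print $1, $4
    }'
}
```

With the first listing above, `disklabel dsk1 | used_partitions` would print `c 4.2BSD`; after step D) it should print nothing.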

4.3 Inall procedure stopped due to a station in "halt in" state

Inall procedure is stopped on station A in inpatch step.

4.3.1 Reference FR: 3BKA13FBR166921

4.3.2 Problem description

Inpatch step proceeds to a boot of the station. This boot is refused with the following message displayed on screen (also in log file inpatchSTATIONA.log)

">>>boot

Halt Button is IN, BOOT NOT POSSIBLE".

4.3.3 Corrective action

In order to continue the installation, the following has been applied successfully:

- log on to station A through the Iolan

- under the ">>>>" prompt, type "boot"

- when the station is at UNIX level ("login:" prompt is displayed), from the PC, in the Expect session, type "inall"

4.4 Failure during the SWC from the OMC at step 1/10 (before file transfer)

4.4.1 Reference FR: none

4.4.2 Problem description

Sometimes, the /tmp partition is full and the migration is stopped by errors displayed in a popup window with the following message on the IMT: "perl failed"

On the OMC:

- in the log file (/alcatel/var/home/axadmin/alcatel/debug/s_Thu May 13 16:24:10 CEST 2004.out), the following message should appear:

SCGui - NewFTPSoftChange () - IOException raised: java.io.IOException: Not enough space

In /var/adm/messages the following message should appear:

On May 13 18:03:02 omcr08 unix: WARNING: /tmp: File system full, swap space limit exceeded

4.4.3 Corrective action

On OMC, login as root and check the available disk space, especially in /tmp and /alcatel partition by using the 'df -k' command and do a cleanup if needed.

4.5 Unix boot impossible

4.5.1 Reference FR: None.

4.5.2 Problem description

Unix can't boot because it can't access the vmunix file.

4.5.3 Corrective action

Following actions have to be performed:

insert the Unix 4.0F CDROM

(warning : do not use an MFS+UNIX INSTALL CDROM which reformats the disks automatically)

boot dka400

(AS800)

boot dqb0

(DS10)

cd /dev

./MAKEDEV rz0

cd /etc/fdmns

touch .adfslock_root_domain

mkdir root_domain

cd root_domain

ln -s /dev/rz0a .

cd /

mount root_domain#root /mnt

4.6 Rebuild of mirrored partitions on RC40

4.6.1 Reference FR: None.

4.6.2 Problem description

It can happen that the mirrored partitions need to be rebuilt.

4.6.3 Corrective action

check MIB consistency according to chapter 9.11 and verify that a backup MIB is available under /usr/backup_mib.

install patch MFS.PATCH.B9_0.RC40.41FP_00x (not yet available)

unplug JBETIS ethernet cables to avoid any GPU reset during operations on control stations

stop both control stations (first the standby, then the active):

# /usr/mfs/bin/mfs_stop_nectar

log on to STATION_A and launch the following command:

# cd /usr/tools

# ./install_rc40_lsm

Both stations will reboot after "install_rc40_lsm". Then all processes of both stations will start up.

If one station is seen failed in the nectar view of IMT, do a Clear_alarm on the failed station, this will clear the associated alarm and reboot the station

If everything is fine, reconnect the ethernet cables of the JBETIs in a non busy hour, because GPUs will reset.

Note: this procedure will erase the PM counters in /omcxchg partition. Therefore and if possible, these have to be saved before rebuild of the mirrored partitions and restored after if needed.

4.7 Active Control Station is blocked after automatic backup MIB on RC40

4.7.1 Reference FR: 3BKA13CBR189473

4.7.2 Problem description

If the connection between OMC and MFS has been lost since 02h AM (check the alarm history) and the active station is not reachable with telnet or rlogin, it is probably due to a weakness of the Unix system (file set corrupted). Therefore, it is requested to perform the procedure below in order to get information for analysis.

WARNING: do not power off/on the blocked control station, because you would lose the possibility to generate a crash.

4.7.3 Corrective action

Access the console port of the concerned station.

1. Access the RMC chip with the sequence of keystroke:

rmc

2. Then enter halt mode:

RMC> halt in

3. Here, force the crash of the machine:

>>> crash

As the main processor execution was frozen by the RMC chip, the core which will be generated will contain all needed information related to the running software agents.

4. Access the RMC chip again with the same keystroke sequence as in step 1.

5. Exit the halt mode:

RMC> halt out

After pressing Return at least once, the system should run under the firmware of the main processor.

6. The machine does not reboot automatically, so boot it manually:

>>> boot

Then release the console.

7. On the control station, the directory /var/adm/crash includes both files vmunix.n and vmzcore.n with the right time. Produce the text file to exploit them:

/usr/bin/crashdc vmunix.n vmzcore.n > crashdata.n.txt

8. When the prompt is available, log on and type the command:

/usr/sbin/sys_check -escalate

It will run for 30-60 minutes and produce a file named escalate.tar

4.7.4 Impacts

When the problem is encountered there is a risk of GPRS telecom outage if the standby station is not able to switch to active.

If the T64KIT1000532-V51AB24-20060417 patch is not installed, the system unfortunately has to be reinstalled from scratch.

To check if T64KIT1000532-V51AB24-20060417 patch is present, use the following command

# dupatch -track -type kit -nolog | grep "T64KIT1000532-V51AB24-20060417 OSF520"

5 MFS based on MX

What/behavior | Trouble origin | Fix

1) "inall" phase failed during MX-MFS installation | Bad BIOS settings on the OMCP board | Configure the BIOS settings in line with MX-MFS installation method

2) Connection to OMCP using console redirection does not work | Console redirection not working (2 console redirections opened at the same time) | Close the other console redirection

3) "inall" phase failed during MX-MFS installation | ShMC_1 is not active | Trigger a ShMC switch-over

4) "inall" phase failed during MX-MFS installation | NFS server not correctly configured | Configure it again, and launch again the installation

5) Error at Step 2/10 (Creation) | Full partition | Clean log files

6) Error at Step 3/10 (Verify) | | Set TOLERANT VERIFY option

7) Error at Step 7/10 (Validation) | Old version not deleted | Delete old version

8) Stand by station not operational | | Clear alarm

9) Ethernet connection problem | Check IP configuration | MFS_inet

10) Impossible to connect IMT | Rcp blocked | Kill rcp process

11) After Power-on of ATCA shelf, OMCP servers are powered-off | OMCP HW problem | Unplug and plug back OMCP board

12) How to update time from OMC | |

13) Ne1oE supervision lost | |

14) Extension from 1 shelf configuration to 2 shelves configurations has failed | |

15) No RRALLI sent on GSLs, for all cells of the BSC, after activate MLU | | "Re-initialyze GPRS" for all the cells of the BSC

16) Hanging alarm "Card voltage out of range" for MFS JBXSSW after power failure | TOMAS MD4/SP1 HW management limitation | Put ShMC active in the same plan as faulty JBXSSW board and unplug and plug back JBXSSW board

17) PV_PEM alarms raised and not cleared from IMT and OMCR after MFS power off/on | TOMAS MD4/SP1 HW management limitation | Power Off/Power On the PV_PEM board on alarm.

18) mfssetup or configure_switch failure after replacing a new SSW board. | Unexpected VLAN configuration definition already existing on SSW Board | Delete unexpected VLAN configuration SSW Board.

5.1 "Inall" failed during MX-MFS installation

5.1.1 Reference FR: None.

5.1.2 Problem description

In some cases, MX-MFS installation can fail during the "inall" phase. This can be due to wrong BIOS settings.

5.1.3 Corrective action

Check the BIOS settings of the control stations, which can be accessed through the console redirection.

At any terminal accessing one of the ShMC, it is possible to access the console of STATION_A (resp. STATION_B) by typing:

STATION_x> telnet 172.17.3.8 4503

(for STATION_A)

STATION_x> telnet 172.18.3.9 4504

(for STATION_B)

Reboot the station, and enter the BIOS by typing following sequence: ESC Shift 2

5.2 Connection to OMCP using console redirection does not work

5.2.1 Reference FR: None.

5.2.2 Problem description

When trying to telnet to the CSs via console redirection of the ShMC, the action is refused with the following message:

Trying 172.17.3.8...

Connected to 172.17.3.8.

Escape character is '^]'.

Linux 2.4.22-1.0.48 (172.17.3.3) (ttyp1)

ICR: error in open() of /dev/icr02, No such device

Connection closed by foreign host.

5.2.3 Corrective action

Check that the console redirection is not already used to access another board of the ATCA shelf by typing following command on the active Shelf Manager

shm3s8:~ # ps -efd | grep "telnetd -E"

If this command returns a process, for instance:

549 root 1072 S telnetd -E /usr/bin/icr /dev/icr01

kill the process.
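The grep above can also match its own command line, so the PID to kill should be picked with care. As a hedged sketch, the selection can be written as a filter; the function name icr_sessions is hypothetical, and it assumes the ps layout of the example line above (PID in the first column, redirection device in the last).

```shell
# Hypothetical helper: read the Shelf Manager ps output on standard input and
# print "PID device" for each telnetd console-redirection session.
# Assumes the PID is the first column, as in the example line above.
# The "grep telnetd -E" line is skipped because it does not end in /dev/icrNN.
icr_sessions() {
    awk '/telnetd -E/ && $NF ~ /^\/dev\/icr[0-9]+$/ { print $1, $NF }'
}
```

For the example above, `ps -efd | icr_sessions` would print `549 /dev/icr01`, and 549 is the process to kill.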

5.3 "inall" failed during MX-MFS installation

5.3.1 Reference FR: None.

5.3.2 Problem description

Installation of MX-MFS failed during the "inall" phase due to a problem with the console redirection of the OMCP.

5.3.3 Corrective action

Step 1: Check section 5.2 (console redirection)

Step 2: Check that active Shelf Manager is ShMC_1 (172.17.3.8):

On the Installation PC, telnet on 172.17.3.8 (login:root, password:root):

Then type following command

shm3s8:~ # sv_status

The ShMC is active if this command returns:

get status-> openhpid on (localhost:5566)

openhpid is active

Otherwise, in order to trigger a ShMC switch-over, type following command:

shm3s8:~ # sv_activate

activate-> openhpid on (localhost:5566)
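The check in step 2 can be sketched as a one-line test on the sv_status output. The function name shmc_is_active is hypothetical; the only text it relies on is the "openhpid is active" line quoted above.

```shell
# Hypothetical helper: succeed (exit status 0) when the sv_status output read
# on standard input contains the line "openhpid is active" quoted above.
shmc_is_active() {
    grep -q 'openhpid is active'
}
```

A possible use on the Shelf Manager is `sv_status | shmc_is_active || sv_activate`, which triggers the switch-over only when the ShMC is not already active.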

5.4 "inall" failed during MX-MFS installation

5.4.1 Reference FR: None.

5.4.2 Problem description

Installation of MX-MFS failed during "inall" phase and specifically in indu or inacp phase.

In c:/TomasInstall/Log/Indu_STATION_x.log we can see :

echo mount -o mountport=$mountport,nolock $bootserver:/d /mnt

mount -o mountport=,nolock 172.17.3.4:/d /mnt

init-2.05a# mount -o mountport=$mountport,nolock $bootserver:/d /mnt

mount: 172.17.3.4:/d failed, reason given by server: Permission denied

or in c:/TomasInstall/Log/inacp_STATION-x.log you can see a similar error on a mount command

5.4.3 Corrective action

Put the correct configuration for the NFS server and don't forget the user access

5.5 Error at step 2/10 (Creation) ( MFS Evolution only )

5.5.1 Reference FR

None.

5.5.2 Problem description

During the software replacement the phase CREATION is stopped by errors displayed in a popup window with the following:

Generic error enca_ope_failed ensw file check error PILOT/A - The file copy from delivery fileset to target fileset failed : fsync error

That means the root file system ( / ) of the active station is probably full (100%).

5.5.3 Corrective action

First open an xterm and type " df -k ". Check that the /usr and /var directories are not full (less than 85% used). If they are too full, go to:

/usr/mfs/log => remove all big trace files (*.old, and TraceGOM if it exists)

/var/local/nectar/dated => remove all Core files and old traces

Then check again the space left with " df -k " command. Do not begin the migration in case the space left is too small.

See following directories:

/usr

/usr/mfs/log

/var/local/nectar

/RESULT
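The 85% check on the " df -k " output can be sketched as a filter. The function name full_filesystems is hypothetical, and the sketch assumes the one-line-per-filesystem layout produced by the POSIX option -P (usage percentage in the 5th column, mount point in the 6th).

```shell
# Hypothetical helper: read "df -kP" output on standard input and print the
# mount point and usage of every file system at or above the given limit
# (default 85, the threshold used in this section).
full_filesystems() {
    awk -v limit="${1:-85}" 'NR > 1 {
        pct = $5
        sub("%", "", pct)                 # "90%" -> "90"
        if (pct + 0 >= limit + 0) print $6, $5
    }'
}
```

For example, `df -kP | full_filesystems 85` lists the file systems to clean before starting the migration; no output means that every file system is below the limit.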

5.6 Error at step 3/10 (Verify) ( MFS Evolution only )

5.6.1 Reference FR: None.

5.6.2 Problem description

During a software change or a migration, after the click on < Next > button in step 3, The IMT pops up an alert window with the following text: "Error occur , see log file".

This means that the software replacement is stopped due to errors found at the VERIFY phase.

5.6.3 Corrective action

In that specific case, software change can be forced with TOLERANT_VERIFY option.

To be completed.

Error at step 7/10 (Validation) - ( MFS Evolution only )

5.6.4 Reference FR: None.

5.6.5 Problem description

During a software change or a migration, after clicking on the < Next > button in step 7 (ready to validate new version), the IMT terminal disconnects then re-connects, signaling that there is a software change in progress: < SW change in progress do you want to continue it ? >.

If the < Yes > button is clicked, the following information is displayed in the Software Change Window:

< Old version name > in state created.

Step 1/10

< Old version name > version will be installed.

5.6.6 Corrective action

Delete the old version. To complete the software change or the migration, follow the next steps:

1 Click on < Software Management / Software Change >.

2 Click on < Back > button.

3 The following message will appear: < Do you want to install < new version name > version ? >. Click on < No > button.

4 Click on < Software Management / MFS versions >. Result is:

Version 1 : < new version name >

state : validated

5 Software change or migration from < old version name > to < new version name > is fully completed.

5.7 The stand-by station is not operational ( MFS Evolution only )

5.7.1 Reference FR: None.

5.7.2 Problem description

At the terminal (login : root) on the concerned station (Hereafter " x " stands for " A " or " B "):

STATION_x> ps -ef | grep mfs

STATION_x> ps -ef | grep nectar

There are no processes running, or only SUA processes (nectar).

At the IMT, on the BUI->request window, when launching the command:

get sta[PILOT/x] (*);

The "system_state" of the station is either "initializing" or "not_installed" (the normal awaited state is "stand-by").

5.7.3 Corrective action

At the IMT, in the Site view, check the STA states (aspA or aspB):

1. If the state is disabled/not installed, at the IMT window (clicking on right button), perform a clear_alarm.

2. If the state is initializing, then wait up to 40 minutes.

Check with alias "stg" that TOMAS is started with following command:

stg

If answer is no, type following command

st yes

mfs_start_tomas -site

5.8 Ethernet connection problem ( MFS Evolution only )

5.8.1 Reference FR : none

5.8.2 Problem description

No ping from STATION_A/B (or from PC) to STATION_B/A on network 172.17.0.0 or 172.18.0.0.

No ping from any other station/PC to STATION_A/B on the general IP network address.

5.8.3 Corrective Action

Check the Ethernet configuration of the control station. On system console:

ifconfig -a

The result must contain at least the following line (given case : STATION_B):

eth0 Link encap:Ethernet HWaddr 00:80:42:17:5E:60

inet addr:172.17.3.4 Bcast:172.17.255.255 Mask:255.255.0.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:2546730 errors:0 dropped:0 overruns:0 frame:0

TX packets:896030 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:100

RX bytes:312772681 (298.2 Mb) TX bytes:344701015 (328.7 Mb)

Base address:0x2000 Memory:fe800000-fe820000

eth0.5 Link encap:Ethernet HWaddr 00:80:42:17:5E:60

inet addr:139.54.96.205 Bcast:139.54.96.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:338995 errors:0 dropped:0 overruns:0 frame:0

TX packets:55852 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:0

RX bytes:28176315 (26.8 Mb) TX bytes:8401714 (8.0 Mb)

eth0.5:@0 Link encap:Ethernet HWaddr 00:80:42:17:5E:60

inet addr:139.54.98.210 Bcast:139.54.98.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

eth0:-ECC Link encap:Ethernet HWaddr 00:80:42:17:5E:60

inet addr:172.17.0.20 Bcast:172.17.255.255 Mask:255.255.0.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Base address:0x2000 Memory:fe800000-fe820000

eth0:-V3 Link encap:Ethernet HWaddr 00:80:42:17:5E:60

inet addr:172.16.3.3 Bcast:172.16.255.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Base address:0x2000 Memory:fe800000-fe820000

eth0:ALI1 Link encap:Ethernet HWaddr 00:80:42:17:5E:60

inet addr:172.17.3.200 Bcast:172.17.255.255 Mask:255.255.0.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Base address:0x2000 Memory:fe800000-fe820000

eth0:GPNE Link encap:Ethernet HWaddr 00:80:42:17:5E:60

inet addr:172.19.33.1 Bcast:172.19.255.255 Mask:255.255.0.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Base address:0x2000 Memory:fe800000-fe820000

eth0:MUNE Link encap:Ethernet HWaddr 00:80:42:17:5E:60

inet addr:172.17.33.1 Bcast:172.17.255.255 Mask:255.255.0.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Base address:0x2000 Memory:fe800000-fe820000

eth0:nfs Link encap:Ethernet HWaddr 00:80:42:17:5E:60

inet addr:172.17.3.100 Bcast:172.17.255.255 Mask:255.255.0.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Base address:0x2000 Memory:fe800000-fe820000

eth0:ntp Link encap:Ethernet HWaddr 00:80:42:17:5E:60

inet addr:172.32.0.166 Bcast:172.32.255.255 Mask:255.255.0.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Base address:0x2000 Memory:fe800000-fe820000

eth1 Link encap:Ethernet HWaddr 00:80:42:17:5E:61

inet addr:172.18.3.4 Bcast:172.18.255.255 Mask:255.255.0.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:2365322 errors:0 dropped:0 overruns:0 frame:0

TX packets:1010319 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:100

RX bytes:286602938 (273.3 Mb) TX bytes:770450537 (734.7 Mb)

Base address:0x2040 Memory:fe820000-fe840000

eth1.5 Link encap:Ethernet HWaddr 00:80:42:17:5E:61

inet addr:139.54.97.206 Bcast:139.54.97.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:1565 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:0

RX bytes:0 (0.0 b) TX bytes:103230 (100.8 Kb)

eth1:-ECC Link encap:Ethernet HWaddr 00:80:42:17:5E:61

inet addr:172.18.0.20 Bcast:172.18.255.255 Mask:255.255.0.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Base address:0x2040 Memory:fe820000-fe840000

eth1:-V4 Link encap:Ethernet HWaddr 00:80:42:17:5E:61

inet addr:172.16.4.3 Bcast:172.16.255.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Base address:0x2040 Memory:fe820000-fe840000

eth1:ALI2 Link encap:Ethernet HWaddr 00:80:42:17:5E:61

inet addr:172.18.3.200 Bcast:172.18.255.255 Mask:255.255.0.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Base address:0x2040 Memory:fe820000-fe840000

eth1:MUNE Link encap:Ethernet HWaddr 00:80:42:17:5E:61

inet addr:172.18.33.1 Bcast:172.18.255.255 Mask:255.255.0.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Base address:0x2040 Memory:fe820000-fe840000

eth1:mir Link encap:Ethernet HWaddr 00:80:42:17:5E:61

inet addr:172.18.3.111 Bcast:172.18.255.255 Mask:255.255.0.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Base address:0x2040 Memory:fe820000-fe840000

eth1:nfs Link encap:Ethernet HWaddr 00:80:42:17:5E:61

inet addr:172.18.3.100 Bcast:172.18.255.255 Mask:255.255.0.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Base address:0x2040 Memory:fe820000-fe840000

eth1:ntp Link encap:Ethernet HWaddr 00:80:42:17:5E:61

inet addr:172.32.0.166 Bcast:172.32.255.255 Mask:255.255.0.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Base address:0x2040 Memory:fe820000-fe840000

lo Link encap:Local Loopback

inet addr:127.0.0.1 Mask:255.0.0.0

UP LOOPBACK RUNNING MTU:16436 Metric:1

RX packets:143 errors:0 dropped:0 overruns:0 frame:0

TX packets:143 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:0

RX bytes:11432 (11.1 Kb) TX bytes:11432 (11.1 Kb)

To configure the network, use the command

/usr/mfs/bin/mfs_inet
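Because the ifconfig listing is long, it can help to reduce it to one line per interface before comparing addresses. A minimal sketch follows; the function name iface_addrs is hypothetical, and it assumes the "Link encap:" and "inet addr:" layout shown above.

```shell
# Hypothetical helper: read "ifconfig -a" output on standard input and print
# one "interface address" pair per configured IPv4 address.
# Assumes the "Link encap:" / "inet addr:" layout shown above.
iface_addrs() {
    awk '
        /Link encap:/ { name = $1 }                       # remember the interface name
        /inet addr:/  { split($2, a, ":"); print name, a[2] }'
}
```

For the STATION_B example above, `ifconfig -a | iface_addrs` would start with `eth0 172.17.3.4`, which makes a wrong or missing address easier to spot.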

5.9 Impossible to connect IMT ( MFS Evolution only )

5.9.1 Reference FR : 3BKA20FBR175917

5.9.2 Problem Description

It may happen that the IMT can not be launched anymore.

5.9.3 Corrective Action

On system console:

STATION_A# ps -efd | grep rcp

Then kill the processes linked to the following commands:

* rcp /etc/nectar/data/ncma_init_data STATION_B:/etc/nectar/data/ncma_init_data 2>/dev/null

* rcp STATION_B:/etc/group /var/tmp/craft_srvtmp/grpfile2

5.10 After Power-on of ATCA shelf, OMCP servers are powered-off ( MFS Evolution only )

5.10.1 Reference FR 3BKA20FBR172514

5.10.2 Problem description

It may be observed sporadically that, after a power-on of an MFS Evolution, the OMCP boards don't start: no LED is ON on these boards and in particular the blue LED is OFF.

Preventive actions: when manually switching on an MFS Evolution, power on all sub-racks at the same time, i.e. the A1&B1 switches must be switched on at the same time as A2&B2.

5.10.3 Corrective action

In this case both OMCP boards must be unplugged and plugged back in.

5.11 How to update time from OMC ( MFS Evolution only )

5.11.1 Reference FR

3BKA13FBR141970

5.11.2 Problem description

It happens that the time is not synchronized between OMC and MFS.

5.11.3 Corrective action

On both stations, run the following two commands (the inputs shown are only examples):

- declaration of the OMC to MFS:

#/usr/mfs/bin/mfs_addomc

Enter OMC hostname: carlsberg

Enter IP address: 192.168.17.79

carlsberg added to /etc/hosts.

Do you want to add another omc [y]?

- synchronization of the MFS with the OMC:

#/usr/mfs/bin/mfs_ntp

Enter NTP server hostname : carlsberg

Testing carlsberg...

Original /etc/ntp.conf saved to /etc/ntp.conf.pre_mfs_ntp.1.

Original /etc/rc.config saved to /etc/rc.config.pre_mfs_ntp.1.

carlsberg is an NTP server for STATION_A.

STATION_B is an NTP peer for STATION_A.

Allow to use local clock in last resort when all other NTP sources have gone away.

Restarting NTP server

Network Time Service started

The synchronization is not immediate, so be patient!

5.11.4 Problem solved

Not applicable

5.12 NE1oE supervision lost ( MFS Evolution only )

5.12.1 Reference FR: None

5.12.2 Problem description

The nE1oE supervision is lost on GP or MUX boards. In that case, in the physical view of the IMT, GP boards are red and ne1oe_operational_state is equal to "disable".

5.12.3 Corrective action

This problem can be caused by a wrong tagged VLAN configuration on the JBXSSW boards. On the active pilot station, enter the following commands:

#/usr/mfs/bin/checkVlanConfig 172.17.3.10

(for left switch of Shelf 3)

#/usr/mfs/bin/checkVlanConfig 172.18.3.20

(for right switch of Shelf 3)

#/usr/mfs/bin/checkVlanConfig 172.17.4.10

(for left switch of Shelf 4)

#/usr/mfs/bin/checkVlanConfig 172.18.4.20

(for right switch of Shelf 4)

This command must return:

Checking MXMFS vlan configuration for switch 172.17.3.10

Number of vlan configured: 4

vlanID : 1

vlanID : 3

vlanID : 5

vlanID : 3193

good configuration for egress ports on vlan 5

good configuration for forbidden ports on vlan 5

good configuration for untagged ports on vlan 5

good configuration for egress ports on vlan 3

good configuration for forbidden ports on vlan 3

good configuration for untagged ports on vlan 3

good configuration for egress ports on vlan 1

good configuration for forbidden ports on vlan 1

good configuration for untagged ports on vlan 1

(for left switches)

and

------------------------------------------------------

Checking MXMFS vlan configuration for switch 172.18.3.20

Number of vlan configured: 4

vlanID : 1

vlanID : 4

vlanID : 5

vlanID : 3193

good configuration for egress ports on vlan 5

good configuration for forbidden ports on vlan 5

good configuration for untagged ports on vlan 5

good configuration for egress ports on vlan 4

good configuration for forbidden ports on vlan 4

good configuration for untagged ports on vlan 4

good configuration for egress ports on vlan 1

good configuration for forbidden ports on vlan 1

good configuration for untagged ports on vlan 1

(for right switches)
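The comparison with the expected output can be sketched with two small filters. The function names vlan_ids and count_good_checks are hypothetical (they are not MFS tools), and the sketch assumes that the output of checkVlanConfig has exactly the line format shown in the samples above.

```shell
# Hypothetical helpers, not delivered with the MFS.

# Print the VLAN IDs found in checkVlanConfig output (stdin) on one line,
# assuming lines of the form "vlanID : <number>".
vlan_ids() {
    awk '$1 == "vlanID" { ids = ids (ids == "" ? "" : " ") $3 } END { print ids }'
}

# Count the "good configuration for ..." lines in checkVlanConfig output (stdin).
count_good_checks() {
    grep -c 'good configuration for'
}

# Possible usage on the active pilot station:
#   out=$(/usr/mfs/bin/checkVlanConfig 172.17.3.10)
#   printf '%s\n' "$out" | vlan_ids           # left-switch sample lists 1 3 5 3193
#   printf '%s\n' "$out" | count_good_checks  # each sample has 9 such lines
```

According to the samples above, a left switch lists VLANs 1, 3, 5 and 3193, a right switch lists VLANs 1, 4, 5 and 3193, and each output contains nine "good configuration" lines.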

5.13 Extension from 1 shelf configuration to 2 shelves configurations has failed ( MFS Evolution only )

5.13.1 Reference FR: None

5.13.2 Problem description

During extension from a 1-shelf configuration to a 2-shelf configuration, the IMT pops up an alert window with the following text: "Error occur , see log file".

This means that the shelf extension is stopped due to errors found at the tagged VLAN configuration phase.

5.13.3 Corrective action

1. Check that the new ATCA shelf is powered-on.

2. Check that the new ATCA shelf (JBXSSW boards have to be connected) is correctly connected to the existing ATCA shelf containing the OMCP boards (see A9130 MFS Evolution Commissioning method).

3. Check the tagged VLAN configuration of the new ATCA shelf (see checks described in 5.12.3).

5.14 No RRALLI sent on GSLs, for all cells of the BSC, after activate MLU ( MFS Evolution only )

5.14.1 Reference FR: 3BKA20FBR186403

5.14.2 Problem description

Test environment: OMCSAW20N MFSXAW20K BSCXAW21S + patch 0029 (for MLU)

An MLU was started on the platform. After Activate MLU, the GSL traces showed that no RRALLI messages were sent by the BSC to the MFS.

5.14.3 Corrective action

The workaround found for this problem is to perform "Re-initialize GPRS" for all the cells of the BSC.

5.15 Hanging alarm "Card voltage out of range" for MFS JBXSSW after power failure ( MFS Evolution only )

5.15.1 Reference 3BKA20FBR199071

5.15.2 Problem description

After this equipment is accidentally powered off and then powered on again, a false CRITICAL alarm (JBXSSWs: 'card voltage out of range') is reported to the operator. These alarms are not cleared and remain hanging in the MFS.

5.15.3 Corrective action

No correction available in TOMAS MD4/SP1 and TOMAS MD5.

Workaround :

The workaround to suppress a switch alarm is to extract and then reinsert the switch concerned. This can be done only if the switch is not on the same switch plane as the active ShMC.

For an alarm on switch plane LSN1, the Shelf Manager on plane LSN1 must be active. To determine the active Shelf Manager, execute "/usr/nectar/bin/sv_status 172.17.3.8" from the active station (this IP address corresponds to the Shelf Manager of plane 1 in Shelf 3, with subnet 172.17.0.0).

If the response is "openhpid is active", ShMC 1 is active and switch plane LSN2 can be extracted. If the response is "openhpid is standby", ShMC 1 is standby: execute "/usr/nectar/bin/sv_activate 172.17.3.8" to make it active. After a check with sv_status, switch plane 2 can be extracted.

Do the opposite for an alarm on switch plane LSN2.

5.16 PV_PEM alarms raised and not cleared from IMT and OMCR after MFS power off/on ( MFS Evolution only )

5.16.1 Reference 3BKA20FBR186125

5.16.2 Problem description

After powering the MX MFS equipment off and on according to the method documentation, some "Failure of a chassis unit" alarms may remain on some PV_PEM boards.

5.16.3 Corrective action

Power the PV_PEM board in alarm off and on again. The corresponding alarms should disappear from the IMT. To clear the alarm from the OMCR, perform "Audit Alarm" from MFSUSM.

5.17 mfssetup or configure_switch failure after replacing a new SSW board. ( MFS Evolution only )

5.17.1 Reference 3BKA13FBR199323

5.17.2 Problem description

Errors related to VLAN configuration may occur when launching the mfssetup or configure_switch tools after an SSW board replacement. Unexpected VLAN definitions, especially VLAN tags 34 and 35, may already be defined and block any mfssetup or configure_switch action.

5.17.3 Corrective action

Delete the unexpected VLAN configuration (VLAN 34 and 35 definitions) by launching the following commands.

Check the VLAN definitions of the SSW board:

/usr/mfs/bin/checkVlanConfig @SSW

with @SSW representing the IP address of the SSW board.

If an unexpected VLAN is detected (especially VLAN 34 or VLAN 35), launch the corresponding command.

For the VLAN 34 definition:

/usr/mfs/bin/del_vlan_mxmfs @SSW 34

with @SSW representing the IP address of the SSW board.

For the VLAN 35 definition:

/usr/mfs/bin/del_vlan_mxmfs @SSW 35

with @SSW representing the IP address of the SSW board.

6 AUTOMATIC SOFTWARE CHANGE

Note: Prerequisites for SWC are described in the Installation user guide, reference [1].

What/behavior | Trouble origin | Fix

1) Bad exec of ins_swcx.sh | Files created by wrong owner | Clean up and restart

2) rmdir fails during execution of ins_swcx.sh | cygwin is installed on the PC | Rename /usr/bin/rmdir.exe

3) Error Temporary local directory error at starting time | Bad login | Log as admin and restart

4) Step 2: File Access Error for the /DELIV/dlv.bck file | one /nfs partition not seen on standby CS | Relaunch standby CS with BUI command restart

5) Step 2: Creation | Root file system (/) of active CS is full | Free disk space

6) Step 3: Verify | Various origin | Check 4.3 chapter

7) Step 5: Isolation | Various origin | Check 4.4 chapter

8) Step 6: Major version change | New active CS reboots | Check Shared disk state

9) Step 7: strange IMT display | Old version not deleted | Delete old version

10) CS reboots in loop with reset-code 214 | /etc/sysconfigtab corrupted | Add missing lines

11) UNIX patch installation makes Control Station unusable (B9 MR1 ED2) | Current kernel is cleaned and new kernel is not generated | Restore the backup system

12) UNIX patch installation fails with a core file generated from 'install_patch_du' | PWD variable is not set | patches_DUNIX4.0F-22-13_SEC10 and later for RC23; patches_DUNIX5.0A-24-4_SEC11 and later for RC40

13) bul file execution returns 1 error | DataPatch*.bul already launched during SW migration/Replacement | No need to execute again

14) Step 3: Verify | - there are no double links for /usr/mfs/bin/clean_spdata; - version descriptor files are not correctly restored | - make the double link manually on STATION_B; - launch install_lsm

6.1 Error during execution of ins_swcx.sh

6.1.1 Reference FR: None.

6.1.2 Problem description

A previous software change was performed with the wrong user ID: this creates files owned by the wrong user and prevents the automatic software change from creating its files.

All software changes from the OMC must be performed with username = axadmin for the OMC and admin for the IMT, otherwise errors can occur during the SW change.

6.1.3 Corrective action

Remove (logged in as root on the OMC) the following files and directories if they exist:

/var/tmp/cw323mt.dll

/var/tmp/indus_ngp_del_desc_file.pl

/var/tmp/install.pl

/var/tmp/paexr.exe

/var/tmp/perl.dll

/var/tmp/perl.exe

/alcatel/var/home/axadmin/alcatel/tmp_mfs (directory => rm -rf)

/alcatel/tmp_mfs (directory => rm -rf)

Then log in as axadmin on the OMC and perform the preinstallation (ins_swcx.sh) again.

Thereafter, reopen the IMT with username = admin.

6.2 rmdir fails during execution of ins_swcx.sh when cygwin is installed on the PC

6.2.1 Reference FR: 3BKA13FBR163888

6.2.2 Problem description

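rmdir fails with the following error (shown here as displayed on a French-locale PC):

rmdir: option invalide -- q

Pour en savoir davantage, faites: `rmdir --help'.

(In English: "rmdir: invalid option -- q. For more information, run: `rmdir --help'.")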

6.2.3 Corrective action

Rename /usr/bin/rmdir.exe to /usr/bin/rmdir.exe.sav, then launch ins_swcx.sh again.

6.3 Error Temporary local directory error on IMT during step 0

6.3.1 Reference FR: None.

6.3.2 Problem description

When trying to start the automatic software change, the error "Temporary local directory error" occurs.

6.3.3 Corrective action

Log in as the admin user on the IMT and OMC-R in order to perform the automatic software change.

6.4 Error "File Access Error" with dlv.bck always appears when doing SW replacement

6.4.1 Reference FR: 3BKA20FBR150527

6.4.2 Problem description

When performing a SW Replacement, step 1/10 of the procedure completes, but at step 2/10 the IMT displays the error message "File Access Error" for the file /DELIV/dlv.bck.

The problem is that one /nfs partition is not seen on the standby station, so /DELIV cannot be seen on both stations.

6.4.3 Corrective action

Connect to the standby station and type:

df -k

You must see the following result for the xxx.nfs partitions:

secure_serveur.100:/var/nse/mnt/secure_serveur/RESERVED 102400 33 97736 1% /var/nse/mnt/secure_serveur/RESERVED.nfs

secure_serveur.100:/var/nse/mnt/secure_serveur/BACKUP 102400 16 95008 1% /var/nse/mnt/secure_serveur/BACKUP.nfs

secure_serveur.100:/var/nse/mnt/secure_serveur/DELIV 512000 101921 403624 21% /var/nse/mnt/secure_serveur/DELIV.nfs

secure_serveur.100:/var/nse/mnt/secure_serveur/RESULT 307200 5372 295144 2% /var/nse/mnt/secure_serveur/RESULT.nfs

secure_serveur.100:/var/nse/mnt/secure_serveur/omcxchg 102400 585 95680 1% /var/nse/mnt/secure_serveur/omcxchg.nfs

secure_serveur.100:/var/nse/mnt/secure_serveur/spdata 65536 7434 52232 13% /var/nse/mnt/secure_serveur/spdata.nfs

If these entries are not present in the output, you must relaunch the standby station.
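The check of the df -k output can be sketched as a small filter that names the missing partitions. The function name missing_nfs_mounts is hypothetical (it is not an MFS tool), and the list of six mount names is taken from the expected output shown above.

```shell
# Hypothetical helper, not delivered with the MFS.
# Reads "df -k" output on stdin and prints the name of each expected
# xxx.nfs partition that does not appear in it. Prints nothing when all
# six partitions listed in this section are present.
missing_nfs_mounts() {
    df_output=$(cat)
    for name in RESERVED BACKUP DELIV RESULT omcxchg spdata; do
        if ! printf '%s\n' "$df_output" | grep -q "/secure_serveur/$name\.nfs"; then
            echo "$name"
        fi
    done
}

# Possible usage on the standby station:
#   df -k | missing_nfs_mounts
```

An empty result suggests that all partitions are seen; any name printed points to a partition that is not mounted on the standby station.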

At the IMT, in the BUI->request window, type the following command.

If the standby station is STATION_A:

action sta [PILOT/A] (restart());

If the standby station is STATION_B:

action sta [PILOT/B] (restart());

Then check in the Nectar view that the standby station has come up.

Roll back to step 1 and try the SW replacement again.

6.5 Error at step 2/10 (Creation)

6.5.1 Reference FR: None.

6.5.2 Problem description

During the software replacement, the CREATION phase is stopped by errors displayed in a popup window with the following text:

Generic error enca_ope_failed ensw file check error PILOT/A - The file copy from delivery fileset to target fileset failed : fsync error

This means that the root file system ( / ) of the active station is probably full (100%).

6.5.3 Corrective action

First open an xterm and type df -k. Check that the /usr and /var directories are not full (less than 85% used). If that is not the case, go to:

/usr/mfs/log => remove all big trace files (*.old, and TraceGOM if it exists)

/var/adm/crash => remove vmunix & vmcore files (\rm vm*)

/var/adm/nectar/crash => remove all Dump files & Core files (\rm core*, \rm Dump*)

Then check the remaining space again with the df -k command. Do not begin the migration if the remaining space is too small. See the following directories:

/var/adm/nectar/log

/usr/mfs/log

/var/adm/nectar/crash

/RESULT
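The 85% check on the df -k output can be sketched as a small filter. The function name over_threshold is hypothetical (it is not an MFS tool), and the sketch assumes that each df line contains a capacity field of the form "NN%" immediately followed by the mount point, as in the df -k samples elsewhere in this guide.

```shell
# Hypothetical helper, not delivered with the MFS.
# Reads "df -k" output on stdin and prints "<mount point> <capacity>" for
# every file system whose capacity is at or above the limit (default 85).
over_threshold() {
    awk -v limit="${1:-85}" '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^[0-9]+%$/) {
                pct = $i
                sub(/%/, "", pct)
                if (pct + 0 >= limit) print $(i + 1), $i
            }
    }'
}

# Possible usage on a Control Station:
#   df -k | over_threshold        # default limit of 85%
#   df -k | over_threshold 90     # stricter report
```

File systems reported by this filter are the ones to clean up before starting the migration.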

Also run the quotacheck command to report any discrepancies between the calculated and recorded disk quotas:

On active Control Station:

quotacheck -v /var

quotacheck -v /usr

quotacheck -v /

quotacheck -v /DELIV

quotacheck -v /spdata

quotacheck -v /omcxchg

quotacheck -v /RESULT

On standby Control Station:

quotacheck -v /var

quotacheck -v /usr

quotacheck -v /

6.6 Error at step 3/10 (Verify)

6.6.1 Reference FR: 3BKA20FBR099035 = 3BKA13FBR102355

6.6.2 Problem description

The IMT pops up an alert window with the following text: "Error occur , see log file".

This means that the software replacement is stopped due to errors found at the VERIFY phase.

6.6.3 Corrective action

Open the BUI reception view on IMT to see more details.

Only the four most common cases are described below:

6.6.3.1 Many errors found

The best course of action in this case is to remove and destroy the version by clicking several times on the back button on the IMT, and then to perform the automatic software change again. The installation was probably performed incorrectly.

6.6.3.2 bad state error

Example:

> --- Software management error ---

> Failed on request: action version[MFSSAT05_06A](verify());

> Message for request #63 => ACTION_RSP version [MFSSAT05_06A]

(

verify(), /* Errors : ***************/

generic_err= ENCA_MAJOR_ERROR : A major error occurred during the action ...,

specific_err= ENSW_CHECKSUM_ERROR: component checksum error,

text_err= "PILOT/A - /usr/mfs/bin/mfsQ3Agt"

) ;

> _____ Abortive session for request #63 => ACTION_RSP version [MFSSAT05_06A]

(

verify(), /* Errors : ***************/

generic_err= ENCA_OPE_FAILED : the operation cannot be executed,

specific_err= ENCM_PF_VERSION_BAD_STATE: The specified version is in a bad state for this request,

text_err= "PILOT/A - /usr/mfs/bin/mfsQ3Agt"

) ;

> --- Software management error end ------

Roll back to step 2 of the software change.

Perform the software change again.

6.6.3.3 Checksum errors on the MIB files

These files are located in the /spdata directory.

This procedure is to be used only with MIB files (i.e. files located in the /spdata directory), as the final aim is to get rid of the MIB checksum (once platforms are installed with version MFSSAT05.05L and later).

Example:

Message for request #3 =>

ACTION_RSP version [MFSSAT05_05L]

(

verify(),

/* Errors :

***************/

generic_err = ENCA_MAJOR_ERROR : A major error occurred during the action ...,

specific_err = ENSW_CHECKSUM_ERROR : component checksum error,

text_err = "PILOT/B