WCDMA RAN and I-HSPA, Rel. RU30, Operating Documentation, Issue 03

OMS Alarms

DN70398724
Issue 04C
Approval Date 2011-06-01

Confidential

    The information in this document is subject to change without notice and describes only the product defined in the introduction of this documentation. This documentation is intended for the use of Nokia Siemens Networks customers only for the purposes of the agreement under which the document is submitted, and no part of it may be used, reproduced, modified or transmitted in any form or means without the prior written permission of Nokia Siemens Networks. The documentation has been prepared to be used by professional and properly trained personnel, and the customer assumes full responsibility when using it. Nokia Siemens Networks welcomes customer comments as part of the process of continuous development and improvement of the documentation.

    The information or statements given in this documentation concerning the suitability, capacity, or performance of the mentioned hardware or software products are given "as is" and all liability arising in connection with such hardware or software products shall be defined conclusively and finally in a separate agreement between Nokia Siemens Networks and the customer. However, Nokia Siemens Networks has made all reasonable efforts to ensure that the instructions contained in the document are adequate and free of material errors and omissions. Nokia Siemens Networks will, if deemed necessary by Nokia Siemens Networks, explain issues which may not be covered by the document.

    Nokia Siemens Networks will correct errors in this documentation as soon as possible. IN NO EVENT WILL Nokia Siemens Networks BE LIABLE FOR ERRORS IN THIS DOCUMENTATION OR FOR ANY DAMAGES, INCLUDING BUT NOT LIMITED TO SPECIAL, DIRECT, INDIRECT, INCIDENTAL OR CONSEQUENTIAL OR ANY LOSSES, SUCH AS BUT NOT LIMITED TO LOSS OF PROFIT, REVENUE, BUSINESS INTERRUPTION, BUSINESS OPPORTUNITY OR DATA, THAT MAY ARISE FROM THE USE OF THIS DOCUMENT OR THE INFORMATION IN IT.

    This documentation and the product it describes are considered protected by copyrights and other intellectual property rights according to the applicable laws.

    The wave logo is a trademark of Nokia Siemens Networks Oy. Nokia is a registered trademark of Nokia Corporation. Siemens is a registered trademark of Siemens AG.

    Other product names mentioned in this document may be trademarks of their respective owners, and they are mentioned for identification purposes only.

    Copyright Nokia Siemens Networks 2011. All rights reserved

    Important Notice on Product Safety

    This product may present safety risks due to laser, electricity, heat, and other sources of danger.

    Only trained and qualified personnel may install, operate, maintain or otherwise handle this product and only after having carefully read the safety information applicable to this product.

    The safety information is provided in the Safety Information section in the Legal, Safety and Environmental Information part of this document or documentation set.

    Table of contents

    This document has 132 pages.

    Summary of changes

    1 Common alarms
    1.1 70001 CONFIGURATION OF SNMP MEDIATOR IS OUT OF ORDER
    1.2 70002 INVALID SNMP TRAP COMMUNITY STRING
    1.3 70003 NO REPLY TO SNMP REQUEST
    1.4 70004 UNKNOWN SNMP TRAP
    1.5 70005 INCORRECT ALARM DATA
    1.6 70007 AUTHENTICATION FAILURE IN ETHERNET DEVICE
    1.7 70011 NODE NOT RESPONDING
    1.8 70025 POSSIBLE SECURITY THREAT IN NETWORK ELEMENT
    1.9 70030 DISK DATABASE IS GETTING FULL
    1.10 70064 BACKUP ERROR
    1.11 70110 CONFIGURATION OF NWI3 ADAPTER IS OUT OF ORDER
    1.12 70111 FAILED TO CREATE NETACT CONNECTION
    1.13 70156 DISK DATABASE WATCHDOG START-UP FAILED
    1.14 70157 CPU USAGE OVER LIMIT
    1.15 70158 FILE SYSTEM USAGE OVER LIMIT
    1.16 70159 MANAGED OBJECT FAILED
    1.17 70160 MEMORY USAGE OVER LIMIT
    1.18 70161 OPERATING SYSTEM MONITORING FAILURE
    1.19 70162 RAID ARRAY HAS BEEN DEGRADED
    1.20 70163 ETHERNET INTERFACE USAGE OVER LIMIT
    1.21 70164 ETHERNET LINK FAILURE
    1.22 70166 MANAGED OBJECT LOCKED
    1.23 70168 CLUSTER STARTED (RESTARTED)
    1.24 70173 BACKEND DATABASE REQUIRED BY CORBA NAMING SERVICE IS UNAVAILABLE
    1.25 70186 CLUSTER OPERATION INITIATED BY OPERATOR
    1.26 70188 MANAGED OBJECT SHUTDOWN BY OPERATOR
    1.27 70189 MANAGED OBJECT UNLOCKED BY OPERATOR
    1.28 70236 LDAP DATABASE CORRUPTED
    1.29 70237 CORRUPTED LDAP DATABASE RECOVERED
    1.30 70242 ALARM LOG FILE INACCESSIBLE
    1.31 70243 ALARM PROCESSOR CONFIGURATION IS OUT OF ORDER
    1.32 70244 CORRUPTED ALARM DATA
    1.33 70245 ILLEGAL INTERNAL USAGE OF EXTERNAL ALARM NOTIFICATION FORMAT
    1.34 70246 ALARM SYSTEM HEARTBEAT
    1.35 70247 ALARM SYSTEM HEARTBEATING SWITCHED OFF
    1.36 70256 RESOURCE ALLOCATION OR DE-ALLOCATION FAILURE
    1.37 70265 RECOVERY ACTIONS BANNED FOR MANAGED OBJECT
    1.38 70267 EXTERNAL USER ACCOUNT VALIDATION FAILED
    1.39 70268 EXTERNAL LDAP FAILURE
    1.40 70269 INVALID ACTIVE SESSIONS

    1.41 70280 UNKNOWN SPECIFIC PROBLEM
    1.42 70316 LOCAL OR REMOTE APPLICATION SERVER [PROCESS] DOWN
    1.43 70320 SCCP SIGNALING POINT INACCESSIBLE
    1.44 70322 SCCP USER OUT OF SERVICE
    1.45 71000 PM FTP CONNECTION FAILED
    1.46 71001 MEASUREMENT DATA NOT TRANSFERRED
    1.47 71002 MEASUREMENT DATA ERROR
    1.48 71003 OMS MEASUREMENT DATA PROCESSING OVERLOAD
    1.49 71005 THRESHOLD MONITORING LIMIT EXCEEDED
    1.50 71006 WCEL THRESHOLD MONITORING LIMIT EXCEEDED
    1.51 71007 MEASUREMENT THRESHOLD MONITORING LIMIT EXCEEDED
    1.52 71052 OMS FILE TRANSFER CONNECTION COULD NOT BE OPENED
    1.53 71053 O&M SUPPORT FOR INTEGRATED 3RD PARTY DEVICES
    1.54 71054 O&M MEDIATION FAILURE
    1.55 71055 NETWORK ELEMENT RESTARTED
    1.56 71057 RNW NOTIFICATION MISSING
    1.57 71058 NE O&M CONNECTION FAILURE
    1.58 71059 INCORRECT CONFIGURATION DATA IN LDAP
    1.59 71060 EXTERNAL ETHERNET SWITCH CONNECTION FAILURE
    1.60 71061 INVALID IP CONFIGURATION
    1.61 71086 MAJOR SW UPGRADE DATA IMPORT FAILURE
    1.62 71087 NTP TIME SYNCHRONISATION LEADING TO LDAP REPLICATION FAILURE
    1.63 71088 MMI CONNECTION FAILURE
    1.64 71101 OMS ALARM UPLOAD FROM NE FAILED
    1.65 71102 ALARM FROM NE CORRUPTED
    1.66 71103 ID CONFLICT IN BTS O&M CONNECTION
    1.67 71106 TROUBLESHOOTING DATA RECEIVED
    1.68 71107 INSECURE O&M CONNECTION
    1.69 71109 PERFORMANCE MEASUREMENTS DROPPED
    1.70 71110 STAGING AREA IN INCONSISTENT STATE
    1.71 71111 SW SET ACTIVATION FAILED
    1.72 71112 SW SET POSTACTIVATION SCRIPT EXECUTION ERROR

    2 IPA-RNC-specific alarms
    2.1 71104 NE CONNECTION REJECTED
    2.2 71105 BTS O&M TOTAL CONNECTION LIMIT EXCEEDED

    3 I-HSPA-specific alarms
    3.1 71099 INCOMPATIBLE SW PACKAGES
    3.2 71104 NE CONNECTION REJECTED
    3.3 71105 BTS O&M TOTAL CONNECTION LIMIT EXCEEDED

    List of tables

    Table 1 Valid and default attribute values of the NWI3 adapter configuration file

    Summary of changes

    Changes between document issues are cumulative. Therefore, the latest document issue contains all changes made to previous issues.

    Changes between issues 04B (2011-03-03, WCDMA RAN RU30) and 04C (2011-06-06, WCDMA RAN RU30)

    General changes

    I-HSPA-specific alarms have been included.

    Alarm 71102 ALARM FROM NE CORRUPTED has been moved from the Multicontroller RNC-specific alarms to the Common alarms.

    New alarms

    70316 LOCAL OR REMOTE APPLICATION SERVER [PROCESS] DOWN
    70320 SCCP SIGNALING POINT INACCESSIBLE
    70322 SCCP USER OUT OF SERVICE
    71088 MMI CONNECTION FAILURE
    71099 INCOMPATIBLE SW PACKAGES

    Modified alarms

    71104 DISK DATABASE IS GETTING FULL
    71105 DISK DATABASE WATCHDOG START-UP FAILED

    Changes between issues 04A (2010-12-11, WCDMA RAN RU30) and 04B (2011-03-03, WCDMA RAN RU30)

    Alarm 71109 PERFORMANCE MEASUREMENTS DROPPED has been moved from the IPA-RNC-specific alarms to the Common alarms.

    New alarms

    71059 INCORRECT CONFIGURATION DATA IN LDAP
    71060 EXTERNAL ETHERNET SWITCH CONNECTION FAILURE
    71061 INVALID IP CONFIGURATION
    71086 MAJOR SW UPGRADE DATA IMPORT FAILURE
    71109 PERFORMANCE MEASUREMENTS DROPPED

    Modified alarms

    70030 DISK DATABASE IS GETTING FULL
    70156 DISK DATABASE WATCHDOG START-UP FAILED
    70256 RESOURCE ALLOCATION OR DE-ALLOCATION FAILURE

    Changes between issues 04 (2010-10-08, WCDMA RAN RU30) and 04A (2010-12-11, WCDMA RAN RU30)

    New alarms

    71109 PERFORMANCE MEASUREMENTS DROPPED
    71110 STAGING AREA IN INCONSISTENT STATE
    71111 SW SET ACTIVATION FAILED
    71112 SW SET POSTACTIVATION SCRIPT EXECUTION ERROR

    Modified alarms

    71087 NTP TIME SYNCHRONISATION LEADING TO LDAP REPLICATION FAILURE

    1 Common alarms

    1.1 70001 CONFIGURATION OF SNMP MEDIATOR IS OUT OF ORDER

    Probable cause: Corrupt data

    Event type: Processing error

    Default severity: Minor

    Meaning

    The configuration of the SNMP mediator contains unacceptable values.

    The invalid part of the configuration is ignored. This causes partial loss of functionality, and SNMP traps may be lost.

    Identifying additional information fields

    Configuration entry

    The name and value of the attribute that is out of order under the fssnmpMediatorName=1, fsFragmentId=SNMP, fsClusterId=ClusterRoot branch.

    Additional information fields

    -

    Instructions

    Use the parameter management application to correct the configuration branch that is out of order. The Application Additional Information field displays the attribute or entry name that has an unacceptable value. For example, the following entry causes alarm 70001 if xxx is not a hostname that can be resolved:

    fssnmpNEId=xxx,fssnmpAttributeType=NEattrs,fssnmpMediatorName=1,fsFragmentId=SNMP,fsClusterId=ClusterRoot

    The Testing instructions section below provides instructions for creating the invalid entry.
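
    As a rough illustration of the condition described in the Instructions, the following Python sketch extracts the fssnmpNEId value from an entry name and checks whether it resolves as a hostname. The helper names are hypothetical and not part of the product; the check uses the resolver of the machine where the script runs, which is only an approximation of what the SNMP mediator sees.

```python
import socket
from typing import Callable, Optional


def ne_id_from_dn(dn: str) -> Optional[str]:
    """Return the fssnmpNEId value of an entry name, or None if absent."""
    for rdn in dn.split(","):
        name, _, value = rdn.strip().partition("=")
        if name == "fssnmpNEId":
            return value
    return None


def resolves(hostname: str) -> bool:
    """True if the local resolver can map the hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def entry_is_suspect(dn: str, resolver: Callable[[str], bool] = resolves) -> bool:
    """Flag an entry whose fssnmpNEId is missing, empty, or unresolvable.

    Assumption: an unresolvable value is the only defect checked here;
    the real mediator may reject entries for other reasons as well.
    """
    ne_id = ne_id_from_dn(dn)
    return not ne_id or not resolver(ne_id)
```

    Passing a custom resolver makes the check testable without network access; by default the system resolver is used.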

    Clearing

    The alarm is cleared automatically by the alarm system after five minutes. If the configuration is still out of order after that, the alarm is raised again.

    Testing instructions

    1. Open the parameter management application and use it in the extended mode (select Browse > Mode > Extended Mode).

    2. Add an invalid hostname to the SNMP mediator's LDAP configuration:

       a) Expand the entry tree below fsFragmentID=SNMP: In the parameter management application main window, click the arrow next to the SNMP fragment in the entry tree (fsFragmentID=SNMP).

       b) Click the arrow next to fssnmpMediatorName=1 to further expand the entry tree.

       c) Select fssnmpAttributeType=NEattrs and click the arrow next to it to display the managed NEs.

       d) Select Entry > New Child or right-click fssnmpAttributeType=NEattrs and select New Child.

       e) In the Add new entry dialog box, enter any value for attribute fssnmpMOID and value xxx for fssnmpNEId.

       f) Click OK and select Forced Activation in the Select Operation window.

    3. Restart /SNMPMediator.

    Alarm 70001 with IAAI=fssnmpNEId=xxx is raised.

    1.2 70002 INVALID SNMP TRAP COMMUNITY STRING

    Probable cause: Corrupt data

    Event type: Processing error

    Default severity: Warning

    Meaning

    The SNMP Mediator has received an SNMP trap that contains an invalid trap community string, that is, the community string in the trap does not match the community string in SNMP Mediator's configuration. The community strings are passwords that are used to authenticate the senders of SNMP traps.

    Identifying additional information fields

    -

    Additional information fields

    1. IP address of the SNMP agent that sent the trap
    2. The received trap community string
    3. Version of the used SNMP, possible values are:
       SNMPv1
       SNMPv2c
    4. Object identifier of the received trap

    Instructions

    1. Check the IP address of the SNMP agent that sent the trap. The IP address is displayed in the Identifying additional information fields field #1 of the alarm.

    2. Check the community string that was received in the trap. The community string is displayed in the Application Additional Information field #1 of the alarm.

    3. Use the parameter management tool to check the community string that the SNMP Mediator expects. Attribute fssnmpCommunityString of the following entry defines the community string:

       fssnmpTrapSource=,fssnmpAttributeType=Commstrings,fssnmpMediatorName=1,fsFragmentId=SNMP,fsClusterId=ClusterRoot

    4. Modify the community string in the LDAP directory to match the community string received in the trap, or configure the SNMP agent to use the community string that the SNMP Mediator expects. Note that if no community string has been specified for an IP address in the LDAP, the SNMP Mediator accepts all community strings from that address.
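
    The acceptance rule in step 4 can be sketched in a few lines of Python. This is only an illustration of the rule as worded above; the function name and the dictionary that stands in for the LDAP configuration are assumptions, not the SNMP Mediator's actual implementation.

```python
from typing import Dict


def community_accepted(source: str, received: str, configured: Dict[str, str]) -> bool:
    """Apply the acceptance rule described in the instructions.

    `configured` maps a trap source (address or node name) to the community
    string stored for it. A source without an entry accepts any community.
    """
    expected = configured.get(source)
    if expected is None:
        return True
    return received == expected


# Hypothetical configuration: only CLA-0 has a community string defined.
configured = {"CLA-0": "secret"}
print(community_accepted("CLA-0", "public", configured))  # string mismatch
print(community_accepted("CLA-1", "public", configured))  # no entry for source
```

    With this rule, a trap from CLA-0 carrying "public" is rejected, which corresponds to the situation in which alarm 70002 is raised, while a source without a configured string is never rejected.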

    Clearing

    Clear the alarm with the alarm management application after correcting the fault as presented in Instructions.

    Testing instructions

    1. Open the parameter management application and use it in normal mode, when SNMP Mediator is running.

    2. Define the trap community for address CLA-0 to be "secret" by adding the following entry to SNMP mediator's LDAP configuration:

       dn: fssnmpTrapSource=CLA-0,fssnmpAttributeType=Commstrings,fssnmpMediatorName=1,fsFragmentId=SNMP,fsClusterId=ClusterRoot
       fssnmpCommunityString: secret
       fssnmpTrapSource: CLA-0
       objectClass: FSSNMPTrapCommunityString
       objectClass: top
       objectClass: FSMOCBase

    3. Log into CLA-0.

    4. Send a trap to SNMP Mediator with the following command:

       # snmptrap -v 1 -c public SNMPMediator "" 0 0 ""

    Alarm 70002 INVALID SNMP TRAP COMMUNITY STRING with IAAI= and AAI="public SNMPv1 .1.3.6.1.6.3.1.1.5.1" is raised.

    1.3 70003 NO REPLY TO SNMP REQUEST

    Probable cause: Corrupt data

    Event type: Processing error

    Default severity: Warning

    Meaning

    SNMP Mediator has sent an SNMP request to an SNMP agent but it has not received a response.

    Example

    1. A filter condition has been added for the authenticationFailure (.1.3.6.1.6.3.1.1.5.5) trap. Thus the following entry can be viewed with the parameter management tool:

       fssnmpV2TrapId=.1.3.6.1.6.3.1.1.5.5,fssnmpAttributeType=V2traps,fssnmpMediatorName=1,fsFragmentId=SNMP,fsClusterId=ClusterRoot

       The filter condition is defined by the attribute fssnmpFilterCondition, which may have, for example, the value (.1.3.6.1.2.1.1.1.0=*Linux*). See RFC 2254 for more information about the filter syntax.

    2. The SNMP Mediator receives an authenticationFailure trap that does not contain the value of variable .1.3.6.1.2.1.1.1.0.

    3. The SNMP Mediator queries the value of .1.3.6.1.2.1.1.1.0 from the SNMP agent, but does not receive a response.

    The SNMP Mediator is not able to handle the trap correctly, because it is not able to query or modify variables in the SNMP agent.
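
    The sequence in the example can be sketched as follows in Python. This is a simplified model, not the SNMP Mediator's code: it handles only filters of the form (oid=pattern) with * wildcards, uses fnmatch as a stand-in for RFC 2254 substring matching, and takes the agent query as a caller-supplied function (hypothetical) that returns None when the agent does not answer.

```python
import re
from fnmatch import fnmatchcase
from typing import Callable, Dict, Optional

# Only the simple "(oid=pattern)" form is handled; RFC 2254 allows much more.
SIMPLE_FILTER = re.compile(r"^\(([0-9.]+)=(.*)\)$")


def evaluate_trap(
    filter_condition: str,
    trap_varbinds: Dict[str, str],
    query_agent: Callable[[str], Optional[str]],
) -> str:
    """Return 'match', 'no-match' or 'no-reply' for one received trap."""
    parsed = SIMPLE_FILTER.match(filter_condition)
    if parsed is None:
        raise ValueError("unsupported filter: " + filter_condition)
    oid, pattern = parsed.groups()

    value = trap_varbinds.get(oid)
    if value is None:
        # The trap lacks the variable, so it has to be fetched from the agent.
        value = query_agent(oid)
        if value is None:
            return "no-reply"  # the situation that alarm 70003 reports
    # fnmatchcase also interprets ? and [], which RFC 2254 does not define.
    return "match" if fnmatchcase(value, pattern) else "no-match"
```

    A trap that already carries the variable never triggers a query, so in this model the "no-reply" outcome can only occur when a filter condition refers to a variable that is missing from the trap.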

    Identifying additional information fields

    -

    Additional information fields

    IP address of the SNMP agent that does not answer

    Instructions

    1. Check the IP address of the SNMP agent that sent the trap. The IP address is displayed in the Application Additional Information field #1 of the alarm.

    2. The net-snmp command line tools (snmpget, snmpset and so on) provided by the operating system may be used to verify the functionality of the SNMP agent.

    3. To check the attributes defined for the SNMP agent, use the parameter management tool. The attributes are located under the following entry:

       fssnmpNEId=,fssnmpAttributeType=NEattrs,fssnmpMediatorName=1,fsFragmentId=SNMP,fsClusterId=ClusterRoot

    4. Verify that the optional attribute fssnmpUDPPort has the value of the port that the SNMP agent is listening on. The default value is 161.

    5. Verify that the optional attribute fssnmpProtocolVersion matches the version that the SNMP agent supports. The default value is V2c.

    6. Verify that the optional attributes fssnmpReadCommString and fssnmpWriteCommString are the ones that the SNMP agent expects.

    Clearing

    Clear the alarm with the alarm management application after correcting the fault as presented in Instructions.

    Testing instructions

    1. Open the parameter management tool and use it in normal mode, when SNMP Mediator is running.

    2. Add entry "fssnmpV2TrapId=.1.3.6.1.6.3.1.1.5.1" under branch "fssnmpAttributeType=V2traps,fssnmpMediatorName=1,fsFragmentId=SNMP,fsClusterId=ClusterRoot".

    3. Add attribute fssnmpFilterCondition to the entry created in step 2 and give it the value (.1.3.6.1.2.1.1.5.0=anystring). (The grammar for the filter condition is specified in http://www.ietf.org/rfc/rfc2254.txt?number=2254)

    4. Verify that there is no SNMP agent process such as snmpd running on CLA-0.

       # netstat -alp | grep snmp
       tcp 0 0 *:smux *:* LISTEN 11017/snmpd
       udp 0 0 *:snmp *:* 11017/snmpd
       # kill 11017
       root@CLA-0(GUI):~# netstat -alp | grep snmp
       #

    5. Send a trap to SNMP Mediator with the following command (use the IP address of CLA-0 as agent IP):

       # snmptrap -v 1 -c public SNMPMediator "" 192.168.128.1 0 0 ""

    Alarm 70003 NO REPLY TO SNMP REQUEST is raised with AAI=192.168.128.1, because

    SNMP Mediator receives trap ".1.3.6.1.6.3.1.1.5.1", which does not contain the variable ".1.3.6.1.2.1.1.5.0" that is part of the filter condition.

    SNMP Mediator tries to get the value of ".1.3.6.1.2.1.1.5.0" from an SNMP agent running in address 192.168.128.1.

    SNMP Mediator does not get a response from 192.168.128.1, because no SNMP agent is running in the address.

    1.4 70004 UNKNOWN SNMP TRAP

    Probable cause: Corrupt data

    Event type: Processing error

    Default severity: Warning

    Meaning

    The SNMP Mediator has received an SNMP trap that it is unaware of. The trap is unknown to the SNMP Mediator, if 1) the IP address of the SNMP agent that sends the trap is missing from the SNMP Mediator's configuration, or 2) the OID (object identifier) of the trap is unknown to the SNMP Mediator.

    1. Unknown traps may contain information that could be useful.
    2. Unnecessary traps waste network capacity.

    Identifying additional information fields

    -

    Additional information fields

    1. IP address of the SNMP agent that sent the trap
    2. Version of the used SNMP, possible values:
       SNMPv1
       SNMPv2c
    3. Object identifier of the received trap

    Instructions

    1. Using the parameter management application, check that the IP address of the SNMP agent is stored in the SNMP Mediator's configuration. An entry of the following format should be found:

       fssnmpNEId=,fssnmpAttributeType=NEattrs,fssnmpMediatorName=1,fsFragmentId=SNMP,fsClusterId=ClusterRoot

    2. If the trap is unnecessary, check whether there is a way to disable the sending of the trap in the SNMP agent or use filtering in the SNMP Mediator. The SNMP Mediator may be configured to filter out traps by adding an entry of the following format:

       fssnmpV2TrapId=,fssnmpAttributeType=V2traps,fssnmpMediatorName=1,fsFragmentId=SNMP,fsClusterId=ClusterRoot

       If the above entry without attributes exists in the configuration, the SNMP Mediator will ignore the trap and no alarm is raised. Additionally, filtering attributes fssnmpAcceptFrom or fssnmpDiscardFrom may be used to define the IP addresses from where the trap should be accepted or ignored. Attribute fssnmpFilterCondition may be used for filtering away traps based on variables within the trap itself. See RFC 2254 for information about the filter syntax ("approx", "extensible" and "escaping mechanism" are not supported).

    3. If the trap contains important information, the implementation of the SNMP Mediator should be updated. The rules that define what the SNMP Mediator does when it receives traps are part of the implementation. Fill in a problem report and send it to your local Nokia Siemens Networks representative.
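
    The two conditions given under Meaning and the filtering entry from step 2 can be combined into a small decision sketch in Python. Everything about its shape is an assumption made for illustration: the configuration is modelled as plain sets and dictionaries, fssnmpAcceptFrom and fssnmpDiscardFrom are treated as sets of IP addresses, and the order of the checks is a guess, since the text does not state which rule takes precedence.

```python
from typing import Dict, Set

TrapEntry = Dict[str, Set[str]]  # attribute name -> set of IP addresses


def classify_trap(
    agent_ip: str,
    trap_oid: str,
    known_agents: Set[str],
    trap_entries: Dict[str, TrapEntry],
) -> str:
    """Return 'unknown', 'ignored' or 'handled' for a received trap."""
    # Conditions 1) and 2) from the Meaning section: alarm 70004 territory.
    if agent_ip not in known_agents or trap_oid not in trap_entries:
        return "unknown"

    entry = trap_entries[trap_oid]
    if not entry:
        # An entry without attributes means the trap is silently ignored.
        return "ignored"
    if agent_ip in entry.get("fssnmpDiscardFrom", set()):
        return "ignored"
    accept_from = entry.get("fssnmpAcceptFrom")
    if accept_from is not None and agent_ip not in accept_from:
        return "ignored"
    return "handled"
```

    For instance, a trap sent from 127.0.0.1 is classified as "unknown" whenever that address is absent from the set of known agents, which mirrors the test case in the Testing instructions.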

    Clearing

    Clear the alarm with the alarm management application after correcting the fault as presented in Instructions.

    Testing instructions

    1. Log into the active CLA.

    2. Send coldStart trap to SNMP Mediator by using agent IP that is not in SNMPMediator's configuration (127.0.0.1):

       # snmptrap -v 1 -c public SNMPMediator "" 127.0.0.1 0 0 ""

    3. Alarm 70004 UNKNOWN SNMP TRAP with AAI=127.0.0.1 and AAI= "SNMPv1 .1.3.6.1.6.3.1.1.5.1" is raised.

    1.5 70005 INCORRECT ALARM DATA

    Probable cause: Invalid parameter

    Event type: Processing error

    Default severity: Major

    Meaning

    The alarm system has been requested to raise or clear an alarm with incorrect alarm data. One or more arguments provided with the request might have an invalid value or meaning:

       null
       empty
       too long
       out of specified range
       contain non-printable characters
       have an incorrect format

    The alarm number (Specific Problem) might also be unknown. An incorrect format in this case means, for example, that a character value was entered where a numeric value was expected. A special case of an incorrect format is if the quotes (") surrounding the value of an information field are missing from an alarm notification record in the syslog.

    The alarm which is requested to be raised or cleared with incorrect data is not processed further but the information is put as additional information in this alarm. If the alarm number is unknown, then the actual fault for which the alarm has been raised is also left unknown.

    Identifying additional information fields

    1. Erroneous data

       Identifies the alarm data that was incorrect or that was totally missing. Only the name of the first field containing invalid data is mentioned here. Possible values are:

       SP: Specific Problem given in the data is not known by the alarm system, or is not reasonable;
       MOId: Managed Object Id given in the data is not reasonable;
       PS: Perceived Severity given in the data is not reasonable;
       applId: Application Id given in the data is not reasonable;
       AAI: Additional Information given in the data is not reasonable;
       IAAI: Identifying Additional Information given in the data is not reasonable;
       alarmTime: Alarm time is presented in too long a format, or is in non-numerical format;
       length: The combined length of the string type fields (Managed Object Id, Application Id, Application Additional Information, Identifying Application Additional Information) given in the data exceeds the maximum value of 896 characters. Note that in this case, both Application Id and Managed Object Id in the given data are considered as invalid, as only the combined length is verified.

       In addition, these values are also possible for RNC alarms:

       rncLocalMOId: the Local Managed Object Id given in the data is not reasonable;
       rncApplicationId: the RNC Application Id given in the data is not reasonable;
       rncNotificationId: the RNC Notification Id given in the data is not reasonable;
       rncFlowControl: the RNC Flow Control given in the data is not reasonable.

    2. Specific Problem

    Specific problem (the alarm number) of the invalid alarm can also contain the original invalid value if this was the invalid field.
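
    A few of the checks listed above can be sketched in Python to show how a single "first erroneous field" value might be derived. This is an illustrative assumption, not the alarm system's algorithm: the check order, the rule that MOId and applId must be non-empty, and the dictionary layout are all hypothetical. Only the field names and the 896-character limit come from the text.

```python
from typing import Dict, Optional, Set

MAX_COMBINED_LENGTH = 896  # limit quoted in the text above
STRING_FIELDS = ("MOId", "applId", "AAI", "IAAI")
REQUIRED_FIELDS = {"MOId", "applId"}  # assumption: these may not be empty


def first_erroneous_field(alarm: Dict[str, str], known_sps: Set[str]) -> Optional[str]:
    """Return the name of the first field that fails a check, else None."""
    sp = alarm.get("SP")
    if not sp or not sp.isdigit() or sp not in known_sps:
        return "SP"
    for name in STRING_FIELDS:
        value = alarm.get(name, "")
        if name in REQUIRED_FIELDS and not value:
            return name
        if not value.isprintable():
            return name
    combined = sum(len(alarm.get(name, "")) for name in STRING_FIELDS)
    if combined > MAX_COMBINED_LENGTH:
        return "length"
    return None
```

    Requesting an alarm with a Specific Problem such as 700111, which is not in the set of known alarm numbers, yields "SP" in this sketch, matching the test case described under Testing instructions.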

    Additional information fields

    Managed Object Id

    Distinguished name of the managed object that was given as the Managed Object Id in the invalid alarm. If the MOId itself was the incorrect data, then the value fsManagedObjectId=invalid, fsClusterId=ClusterRoot is displayed in this field.

    Instructions

    Fill in a problem report and send it to your local Nokia Siemens Networks representative.

    Clearing

    Clear the alarm with the alarm management application after correcting the fault as presented in Instructions, in other words, after sending the report to your local Nokia Siemens Networks representative.

    Testing instructions

    Use, for example, the alarm system command line interface (CLI) command flexalarm to send a request to raise or clear an alarm with a Specific Problem that does not exist.

    For example:

    $> flexalarm -raise -mo= -ap= -sp=700111

    where the values given for -mo and -ap have the correct format.

    Since the 700111 Specific Problem does not exist, alarm 70005 is raised.

    1.6 70007 AUTHENTICATION FAILURE IN ETHERNET DEVICE

    Probable cause: Protection path failure

    Event type: Equipment

    Default severity: Minor

    Meaning

    An Authentication Failure SNMP trap signifies that the sending protocol entity is the addressee of a protocol message that is not properly authenticated. The SNMP agent generates this trap when some actor sends SNMP queries with wrong authentication methods or keys. In SNMP, this authentication key is called the community string. The most likely cause is a misconfigured SNMP manager or MIB browser, but the trap may also indicate malicious activity, that is, a malicious user trying to obtain information by sending an SNMP request. The trap is not triggered by CLI (Command Line Interface) or Web login failures.

    The SNMP request fails and no information is returned.

    Identifying additional information fields

    IP address

    The IP address of the entity that used the wrong community string and thereby caused the trap.

    Additional information fields

    -

    Instructions

    If no SNMP manager is misconfigured, there is a danger that some entity is inside the network without authorization, and this actor must be found. The entity can be identified from the authentication failure SNMP trap sent by the SNMP agent.

    If an SNMP manager is misconfigured, update its SNMP community string.

    Clearing

    Clear the alarm with the alarm management application after correcting the fault as presented in Instructions.

    Testing instructions

    1. Log into the switch. For example:

       [root@CLA-0(MIKAEL_R_FSPR4EDC_1.9) /root]# ssh switch-1
       Linux swsea 2.4.17_mvl21-swsea #1 Wed May 17 11:59:44 CDT 2006 ppc unknown
       Linux swsea 2.4.17_mvl21-swsea #1 Wed May 17 11:59:44 CDT 2006 ppc unknown

    2. Start the swc command line tool:

       root@swsea@1-1-8:~# swc
       (RadiSys SWSE-A Switch) >

    3. Display the community strings by "show snmpcommunity":

       (RadiSys SWSE-A Switch) >show snmpcommunity

       SNMP Community Name   Client IP Address   Client IP Mask   Access Mode   Status
       tstcomm               192.168.128.1       0.0.0.0          Read Only     Enable
       com                   192.168.128.1       0.0.0.0          Read Only     Enable

    4. Exit the switch:

       (RadiSys SWSE-A Switch) >quit
       The system has unsaved changes.
       Would you like to save them now? (y/n) n
       root@swsea@1-1-8:~# exit
       logout
       Connection to switch-1 closed.

    5. Perform an SNMP Get request with a valid community string:

       # snmpget -c tstcomm -v 2c switch-1 system.sysDescr.0
       SNMPv2-MIB::sysDescr.0 = STRING: RadiSys SWSE-A Switch

    6. Perform an SNMP Get request with an invalid community string:

       # snmpget -c invalid -v 2c switch-1 system.sysDescr.0
       SNMPv2-MIB::sysDescr.0 = STRING: RadiSys SWSE-A Switch

    Alarm 70007 will be raised after step 6 due to the invalid community string.


    1.7 70011 NODE NOT RESPONDING

    Probable cause: Equipment malfunction

    Event type: Equipment

    Default severity: Major

    Meaning

    A physical computing node has not restarted despite restart attempts. The node may be broken, unable to restart, or stuck.

    Any important services or functions that are provided with an active-standby recovery group may have been taken over by other operational nodes. Services may be down if standby nodes are also down.

    Identifying additional information fields

    -

    Additional information fields

    Any further information, if available.

    Instructions

    Perform the following steps to verify the state of the node:

    1. Log into the cluster as root user.

    2. Use the hwcli command to verify the state of the node. For example, the state of the node /CLA-1 can be checked as follows:

       $ hwcli CLA-1
       CLA-1: available (FlexiSvr CPI1 000157:0108 01.02)

    3. The previous hwcli output shows that the CLA-1 node is physically available. The high availability services (HAS) of the system attempt, after about 30 minutes, to restart a failed node by issuing a power-off, power-on and restart sequence. If you do not want to wait for this, you can perform the power-off, power-on and restart sequence manually. For example:

       $ hwcli --power off CLA-0
       ATTAMPTING TO POWER OFF NODE
       CLA-0
       ARE YOU SURE YOU WANT TO PROCEED? yes
       Powering off CLA-0: OK
       $ hwcli --power on CLA-0
       Powering on CLA-0: OK
       $ hwcli --reset CLA-0
       ATTAMPTING TO RESET NODE
       CLA-0
       ARE YOU SURE YOU WANT TO PROCEED? yes
       Resetting CLA-0: OK

    4. If the node does not start within a few minutes or hwcli does not show that the node is available, check whether the CPU board has any error lights on. If it does, you can try to restore the node into service by removing and re-inserting the node.


    5. Contact your Nokia Siemens Networks representative even if these operations bring the node up, because it is possible that the computing node needs to be replaced or it may, for example, need a BIOS upgrade.

    Clearing

    The system clears the alarm automatically when the fault has been corrected.

    Testing instructions

    1. Power off an operational unlocked node using hwcli. You can check the state of the node using fshascli. For example:

       $ fshascli --state /AS-1
       /AS-1
       administrative(UNLOCKED)


    3. The alarm is automatically cancelled when the node has successfully restarted. Issue a power-on for the node using hwcli and wait for the node restart to complete. For example:

       $ hwcli --power on AS-1
       Powering on AS-1: OK
       $ sleep 3m
       $ fshascli --state /AS-1
       /AS-1
       administrative(UNLOCKED)


    1.8 70025 POSSIBLE SECURITY THREAT IN NETWORK ELEMENT

    Probable cause: Threshold crossed

    Event type: Quality of Service

    Default severity: Warning

    Meaning

    There is reason to suspect that someone is trying to intrude into a network element. This condition arises if there are too many failed login attempts.

    Identifying additional information fields

    -

    Additional information fields

    -

    Instructions

    Check the security log data. Pay particular attention to login entries made just before the alarm was raised.

    Clearing

    After correcting the fault as presented in Instructions, clear the alarm with the alarm management application.

    Testing instructions

    Prerequisites for the testing: Create an internal test account (that is, an account that resides in the network element's LDAP server) by using either the parameter management application or the fsuseradd CLI command, and set its password.

    1. Log into a node with ssh, using a valid user account and password, so that a session is successfully started.

    2. Log out from the node.

    3. Log in with the same user account but with a wrong password the predefined number of times. For the number, see the file /etc/pam.d/ssh and its row "/opt/Nokia_BP/lib/security/$ISA/PamAlarm.so file=/var/log/faillog alarmThreshold= validfor=internal", in which the threshold is defined with the parameter "alarmThreshold=". By default, 5 consecutive failed logins are needed. Make sure that there are no successful logins for the user between the failed ones.

    An alarm should be raised after the predefined number of failed logins. Check the alarm list with the alarm management application.

    Tip: You can also use Element Manager instead of ssh for the test.
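The counting rule in the test above (a fixed number of consecutive failed logins, reset by any successful login) can be sketched as follows. This is only an illustration of the rule as described in this section, not the implementation of PamAlarm.so; the class name, method names and user names are hypothetical.

```python
from collections import defaultdict


class FailedLoginMonitor:
    """Counts consecutive failed logins per user (illustrative sketch only)."""

    def __init__(self, alarm_threshold=5):
        # 5 is the default number of failed logins stated in this section.
        self.alarm_threshold = alarm_threshold
        self._failures = defaultdict(int)

    def record(self, user, success):
        """Record one login attempt; return True when the alarm should be raised."""
        if success:
            # A successful login in between resets the count for that user.
            self._failures[user] = 0
            return False
        self._failures[user] += 1
        return self._failures[user] == self.alarm_threshold


monitor = FailedLoginMonitor()
results = [monitor.record("testuser", success=False) for _ in range(5)]
print(results)
```

With the default threshold, only the fifth consecutive failure reports that the alarm should be raised, which is why the test requires that no successful login occurs between the failed ones.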


    1.9 70030 DISK DATABASE IS GETTING FULL

    Probable cause: Storage capacity problem

    Event type: Processing error

    Default severity: Major

    Meaning

    The disk storage area reserved for the disk database is filling up.

    The disk database is still fully operational. If the database fills up completely, its services can no longer be used.

    Identifying additional information fields

    -

    Additional information fields

    1. Max size: the maximum size of the database in kB

    2. Fill ratio: the fill ratio of the database

    Instructions

    The actions needed to avoid a completely full database are database-specific, so contact your local Nokia Siemens Networks representative immediately and provide them with the information you obtained from the alarm notification's fields.

    Clearing

    The system clears the alarm automatically when the fault has been corrected.

    Testing instructions

    You can test the alarm either by filling the database until the allocated space exceeds the fill ratio alarm limit, or by decreasing the fill ratio alarm limit below the current fill ratio of the database. You can also combine these two approaches.

    In the first approach, you create a dummy table in the database and insert rows into it until the fill ratio exceeds the fill ratio alarm limit (see attribute fsdbFillRatioAlarmLimit in the DB fragment in LDAP, the Lightweight Directory Access Protocol).

    In the second approach, you must use a parameter management tool to change the fsdbFillRatioAlarmLimit attribute of the DB fragment to a smaller value than the current fill ratio of the database. After this, you must restart the recovery group of the database (fshascli -r /). The current fill ratio of the database can be estimated as follows:

    1. Get the maximum size of the database either by checking the innodb_data_file_path attribute from the MySQL instance configuration file (/var/mnt/local/MySQL_/my.cnf) or by connecting to the instance and entering the following command:

       SHOW GLOBAL VARIABLES LIKE 'innodb_data_file_path'\G

       The maximum size is the sum of the maximum size of each InnoDB data file listed in the value. For example, the following result means that the maximum size is 500 MB (512,000 kB):

       *************************** 1. row ***************************
       Variable_name: innodb_data_file_path
       Value: ibdata1:500M


    2. Get the free space of the database by connecting to the instance and entering the following command for any InnoDB table:

       SHOW TABLE STATUS FROM <schema> LIKE '<table>'\G

       where <schema> is the schema name of the InnoDB table and <table> is the name of the table. The comment column of the result set shows the free space. For example, the following result means that the database has 492,544 kB of free space (with the example size of step 1, this gives a fill ratio of 3.8%):

       mysql> SHOW TABLE STATUS FROM test LIKE 'mysqlwdtest'\G
       *************************** 1. row ***************************
       Name: mysqlwdtest
       ...
       Comment: InnoDB free: 492544 kB

       It does not matter which InnoDB table is used in the query.

    3. Check the schema and the name of an arbitrary InnoDB table by using the following query:

       SELECT table_schema,table_name FROM information_schema.tables WHERE engine = 'InnoDB' LIMIT 1;
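The estimate described in the steps above is simple arithmetic: fill ratio = (maximum size - free space) / maximum size. The following sketch reproduces it from the two values read in steps 1 and 2. It is an illustration only: the function names are hypothetical, sizes are assumed to carry a K, M or G suffix, and a data file declared with autoextend is assumed to count with its max value.

```python
import re

_UNITS_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def _size_kb(text):
    """Convert a size such as '500M' to kilobytes."""
    match = re.fullmatch(r"(\d+)([KMG])", text.strip().upper())
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * _UNITS_KB[match.group(2)]


def max_size_kb(innodb_data_file_path):
    """Sum the maximum sizes of all data files listed in innodb_data_file_path."""
    total = 0
    for spec in innodb_data_file_path.split(";"):
        parts = spec.strip().split(":")  # name, size[, autoextend[, max, size]]
        if "autoextend" in parts[2:]:
            if "max" not in parts:
                raise ValueError("autoextend without max has no fixed maximum size")
            total += _size_kb(parts[parts.index("max") + 1])
        else:
            total += _size_kb(parts[1])
    return total


def free_kb(comment):
    """Extract the free space from the Comment column, e.g. 'InnoDB free: 492544 kB'."""
    match = re.search(r"InnoDB free:\s*(\d+)\s*kB", comment)
    if match is None:
        raise ValueError("no 'InnoDB free' value found in comment")
    return int(match.group(1))


def fill_ratio_percent(innodb_data_file_path, comment):
    maximum = max_size_kb(innodb_data_file_path)
    return 100.0 * (maximum - free_kb(comment)) / maximum


print(round(fill_ratio_percent("ibdata1:500M", "InnoDB free: 492544 kB"), 1))  # 3.8
```

With the example values of steps 1 and 2, this gives (512000 - 492544) / 512000 = 3.8%, the figure quoted in step 2.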


    1.10 70064 BACKUP ERROR

    Probable cause: Application subsystem failure

    Event type: Processing error

    Default severity: Minor

    Meaning

    Backup has failed because of a fatal error or it has been interrupted.

    As a result, either the backup archive does not exist or it is corrupted and unusable.

    Identifying additional information fields

    1. Backup log file. Identifies the name of the backup log file without the path. The format is $BUTYPE_$BASE_$DATE, where $BUTYPE is either "FULL", "PARTIAL" or "CUSTOM", $BASE is the name of the base delivery or the hostname (if the flexiserver link is not present in the system), and $DATE is the current date in the format YYYYMMDD_HHMMSS.
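The log file name format described above can be taken apart as sketched below. The function name and the sample file name are hypothetical, and the sketch assumes that the name has no file extension, which this section does not state. Because $BASE may itself contain underscores, the pattern anchors on the known backup types at the start and on the fixed-width date at the end.

```python
import re
from datetime import datetime

# $BUTYPE_$BASE_$DATE, with $DATE in the format YYYYMMDD_HHMMSS.
_LOG_NAME = re.compile(r"(FULL|PARTIAL|CUSTOM)_(.+)_(\d{8}_\d{6})")


def parse_backup_log_name(name):
    """Split a backup log file name into (backup type, base, timestamp)."""
    match = _LOG_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a backup log file name: {name!r}")
    butype, base, stamp = match.groups()
    return butype, base, datetime.strptime(stamp, "%Y%m%d_%H%M%S")


# Hypothetical file name, used only to illustrate the format.
print(parse_backup_log_name("PARTIAL_CLA-0_20110601_120000"))
```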

    Additional information fields

    -

    Instructions

    1. Locate the backup log from /var/mnt/local/backup/SS_Backup. The name of the log file is given in the alarm.

    2. See the backup summary at the end of the log.

    3. Search the log contents for "ERROR" and "WARNING" statements to see which backup module has failed.

    4. Refer to the backup and restore troubleshooting instructions.

    5. If the backup has failed before the log file has been created, search the syslog for the latest fsbackup entries.

    6. After the failure, re-execute the backup. However, if the failure was caused by an incorrect environment and/or configuration, refer to the backup and restore troubleshooting instructions and correct the environment and/or configuration before re-executing the backup.

    Clearing

    Clear the alarm with an alarm management application after correcting the fault as presented in Instructions.

    Testing instructions

    1. Start a partial backup. For example:

       fsbackup -p -v

    2. Interrupt the process by pressing Ctrl-C.

    The backup process raises an alarm.

    Or

    1. Lock a database recovery group (for example, TimesTen and Solid).

    2. Execute a custom backup, for example:

       fsbackup -d -v

    The backup process raises an alarm.


    1.11 70110 CONFIGURATION OF NWI3 ADAPTER IS OUT OF ORDER

    Probable cause: Configuration or customizing error

    Event type: Processing error

    Default severity: Minor

    Meaning

    The configuration file of the NWI3 adapter contains invalid attribute values. Depending on the release, the configuration is stored only in files or in files and LDAP (Lightweight Directory Access Protocol).

    The system ignores the invalid parameters and uses the default values or the closest acceptable value. For example, the value 2000 is greater than the highest acceptable value (1440) for heartbeatPeriod (see the table in the Instructions) and causes this alarm. In this case, 1440 would be used as the heartbeatPeriod.
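The fallback behaviour described above can be sketched as follows, using the ranges and defaults of the numeric attributes listed in the table in the Instructions. The sketch is an illustration, not the adapter's code: the function name is hypothetical, and this section does not state which invalid values fall back to the default and which to the closest acceptable value. Here it is assumed that an out-of-range number is moved to the nearest limit and an unparsable value falls back to the default.

```python
# (lowest, highest, default) in minutes, as listed in the table in the Instructions.
ATTRIBUTES = {
    "heartbeatPeriod": (0, 1440, 15),
    "reRegistrationPeriod": (15, 1440, 60),
    "registrationRetryBasePeriod": (5, 240, 15),
    "retryRandom": (5, 240, 5),
    "rePublicationPeriod": (1, 60, 3),
    "getPublicationServiceRetryPeriod": (1, 60, 15),
}


def effective_value(name, raw):
    """Return (value used, whether the configured value was valid)."""
    low, high, default = ATTRIBUTES[name]
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default, False  # assumption: unparsable values use the default
    if low <= value <= high:
        return value, True
    # Assumption: out-of-range values use the closest acceptable value.
    return min(max(value, low), high), False


print(effective_value("heartbeatPeriod", "2000"))  # (1440, False)
```

The printed result matches the example in the text: 2000 exceeds the highest acceptable value for heartbeatPeriod, so 1440 is used and the value is flagged as invalid, which is the condition that raises this alarm.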

    Identifying additional information fields

    Attribute name: name of the attribute that has an invalid value

    Additional information fields

    File path: the path of the file that includes invalid attribute values; or LDAP branch: the LDAP branch that includes invalid attribute values

    Instructions

    1. Correct the invalid attribute value. The attribute name is displayed in the Identifying additional information field. The name of the configuration file is displayed in the Additional information field. The attributes that can cause this alarm are mainly stored in the file /var/opt/Nokia/SS_Nwi3Adapter/config/nwi3mdcorba.ini or in the LDAP branch fsFragmentId=mediator, fsFragmentId=NWI3, fsClusterId=ClusterRoot. The valid and default values of these attributes are presented in the table below. The attribute names in LDAP are prefixed with fsnwi3.

       Name                                      Type                                                                   Default
       (fsnwi3)takeIntoUseNext                   boolean: (0=false,1=true) in nwi3mdcorba.ini and (false,true) in LDAP  0
       (fsnwi3)registrationServiceIOR            string, a valid IOR to NetAct's registration service                   empty string
       (fsnwi3)heartbeatPeriod                   short: [0..1440] minutes, granularity: 1 minute                        15
       (fsnwi3)reRegistrationPeriod              short: [15..1440] minutes, granularity: 1 minute                       60
       (fsnwi3)registrationRetryBasePeriod       short: [5..240] minutes, granularity: 1 minute                         15
       (fsnwi3)retryRandom                       short: [5..240] minutes, granularity: 1 minute                         5
       (fsnwi3)rePublicationPeriod               short: [1..60] minutes, granularity: 1 minute                          3
       (fsnwi3)getPublicationServiceRetryPeriod  short: [1..60] minutes, granularity: 1 minute                          15

       Table 1 Valid and default attribute values of the NWI3 adapter configuration file

    2. This alarm can also be caused by the parameter mediatorSessionManagerIOR located in the file /var/opt/Nokia/www/SessionManager_V1.ior. Restart the NWI3 adapter to generate mediatorSessionManagerIOR into SessionManager_V1.ior. In normal conditions, the restart generates the parameter with a valid value.

    3. If the problem results from the parameter systemID in the file /var/opt/Nokia/www/systemid.txt, the probable cause is that the file systemid.txt is missing. The value in systemID should be the same as in the file /etc/cluster-id. Copy /etc/cluster-id to /var/opt/Nokia/www/systemid.txt and restart the NWI3 adapter.

    Clearing

    Clear the alarm with the alarm management application after correcting the fault as presented in Instructions.

    Testing instructions

    1. If file /var/opt/Nokia/SS_Nwi3Adapter/config/nwi3mdcorba.ini exists, set the following content to it (no value for registrationServiceIOR and takeIntoUseNext=1):

       [DN:N3CF-1]
       objectClassVersion=1
       N3CFId=1
       objectClass=N3CF
       configurationActive=0
       takeIntoUseNext=1
       registrationServiceIOR=
       registrationServiceUsername=Nemuadmin
       registrationServicePassword=nemuuser
       heartbeatPeriod=15
       reRegistrationPeriod=60
       registrationRetryBasePeriod=15
       retryRandom=5
       rePublicationPeriod=3
       getPublicationServiceRetryPeriod=15
       userLabel=

    2. If branch fsFragmentId=mediator, fsFragmentId=NWI3, fsClusterId=ClusterRoot exists in the LDAP, use the parameter management application to create a new child to the branch. Enter the following attributes in the Add New Entry dialog:

       fsnwi3N3CFId=1
       takeIntoUseNext=1

    3. Restart NWI3Adapter.

    If file nwi3mdcorba.ini was modified in step 1, alarm 70110 with IAAI=registrationServiceIOR and AAI=/var/opt/Nokia/SS_Nwi3Adapter/config/nwi3mdcorba.ini is raised. If LDAP was modified in step 2, alarm 70110 with IAAI=fsnwi3registrationServiceIOR and AAI=fsnwi3N3CFId=1,fsFragmentId=mediator,fsFragmentId=NWI3,fsClusterId=ClusterRoot is raised.


    1.12 70111 FAILED TO CREATE NETACT CONNECTION

    Probable cause: Connection establishment error

    Event type: Communications

    Default severity: Major

    Meaning

    The NWI3 adapter failed to register to Nokia NetAct.

    NetAct cannot subscribe to notifications or be used for managing the network element (NE) via NWI3.

    Identifying additional information fields

    -

    Additional information fields

    Depending on the release:

    N3CFId: the naming attribute of the active N3CF instance in file /var/opt/Nokia/SS_Nwi3Adapter/config/nwi3mdcorba.ini; or the distinguished name of the active N3CF instance in LDAP.

    Instructions

    1. If file /var/opt/Nokia/SS_Nwi3Adapter/config/nwi3mdcorba.ini exists:

    a) Make sure that the NetAct Registration Service IOR (parameter registrationServiceIOR) is filled in file /var/opt/Nokia/SS_Nwi3Adapter/config/nwi3mdcorba.ini and check the correctness of the IOR. The command printIOR can be used for viewing the IP address and port included in the IOR.

    b) Verify that there is a valid username (parameter registrationServiceUsername) and password (registrationServicePassword) to the registration service of NetAct in file /var/opt/Nokia/SS_Nwi3Adapter/config/nwi3mdcorba.ini.

    c) Check the value of the takeIntoUseNext parameter in the nwi3mdcorba.ini file. The value of the parameter in an active section should be 1, and the value of the configurationActive parameter should also be 1. The system sets the value of the configurationActive parameter automatically to 1 when a parameter set is taken into use.

    2. If file /var/opt/Nokia/SS_Nwi3Adapter/config/nwi3mdcorba.ini does not exist and the NWI3 adapter's configuration is stored under branch fsFragmentId=mediator, fsFragmentId=NWI3, fsClusterId=ClusterRoot in the LDAP:

    a) Verify that there is an LDAP entry fsnwi3N3CFId=>,fsFragmentId=mediator, fsFragmentId=NWI3 with attribute fsnwi3takeIntoUseNext=true, which defines the active attribute set.

    b) Make sure that the NetAct Registration Service IOR (attribute fsnwi3registrationServiceIOR) has been specified for the active set and check the correctness of the IOR. The command printIOR can be used for viewing the IP address and port included in the IOR.


    c) If attributes fsnwi3NEAccountUsername and fsnwi3NEAccountPassword exist under branch fsFragmentId=security, fsFragmentId=NWI3, they are used for NetAct registration. Verify that they are valid.

    d) If attributes fsnwi3NEAccountUsername and fsnwi3NEAccountPassword do not exist under branch fsFragmentId=security, fsFragmentId=NWI3, the initial username (attribute fsnwi3initialRegistrationUsername) and password (fsnwi3initialRegistrationPassword) defined in the active set are used for NetAct registration. Verify that they are valid.

    3. Verify that NetAct is up and running and check the connection between the NE and NetAct. Ping NetAct from the node where the NWI3 adapter is running: ping -I .

    4. Check that the NetAct hostname is configured in the external domain name system (DNS) in use.
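When checking the correctness of the IOR as described in the steps above, it helps to know what the stringified form contains. A stringified CORBA IOR is the text "IOR:" followed by the hexadecimal encoding of a CDR structure: a byte-order flag, the repository ID string, and a list of tagged profiles, where an IIOP profile carries the version, host and port. The sketch below extracts the host and port of the first IIOP profile. It is an independent illustration of that encoding, not the printIOR tool, and the function names are hypothetical. The sample value is the example IOR used in the Testing instructions of this alarm.

```python
import struct


def _ulong(data, pos, order):
    """Read a CDR unsigned long, which is aligned to a 4-byte boundary."""
    pos = (pos + 3) & ~3
    return struct.unpack_from(order + "I", data, pos)[0], pos + 4


def _iiop_endpoint(profile):
    """Decode host and port from the body of an IIOP profile."""
    order = "<" if profile[0] else ">"      # byte-order flag: 0 = big-endian
    length, pos = _ulong(profile, 3, order)  # skip flag and major/minor version
    host = profile[pos:pos + length - 1].decode("ascii")  # drop trailing NUL
    pos = (pos + length + 1) & ~1            # unsigned short is 2-byte aligned
    (port,) = struct.unpack_from(order + "H", profile, pos)
    return host, port


def ior_endpoint(ior):
    """Return (host, port) of the first IIOP profile in a stringified IOR."""
    if not ior.startswith("IOR:"):
        raise ValueError("not a stringified IOR")
    data = bytes.fromhex(ior[4:])
    order = "<" if data[0] else ">"
    type_length, pos = _ulong(data, 1, order)
    pos += type_length                       # skip the repository ID string
    count, pos = _ulong(data, pos, order)
    for _ in range(count):
        tag, pos = _ulong(data, pos, order)
        length, pos = _ulong(data, pos, order)
        if tag == 0:                         # TAG_INTERNET_IOP
            return _iiop_endpoint(data[pos:pos + length])
        pos += length
    raise ValueError("no IIOP profile found")


EXAMPLE_IOR = (
    "IOR:000000000000002449444c3a4e5749332f526567697374"
    "726174696f6e536572766963655f56313a312e300000000001"
    "0000000000000064000102000000000e3137322e32312e3232"
    "302e3631009c3f0000002400504d43000000040000000a2f4e"
    "65744163745253002020000000084e65744163745253000000"
    "025649530300000005000507017d000000000000000000000"
    "80000000056495300"
)
print(ior_endpoint(EXAMPLE_IOR))  # ('172.21.220.61', 39999)
```

The decoded address and port are the ones to use when checking the connection between the NE and NetAct.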

    Clearing

    The alarm system clears the alarm automatically after the fault has been corrected.

    Testing instructions

    1. If file /var/opt/Nokia/SS_Nwi3Adapter/config/nwi3mdcorba.ini exists:

    a) Set the following content to it (a valid registrationServiceIOR of a non-existent NetAct object and takeIntoUseNext=1):

       [DN:N3CF-1]
       objectClassVersion=1
       N3CFId=1
       objectClass=N3CF
       configurationActive=0
       takeIntoUseNext=1
       registrationServiceIOR=IOR:000000000000002449444c3a4e5749332f526567697374726174696f6e536572766963655f56313a312e3000000000010000000000000064000102000000000e3137322e32312e3232302e3631009c3f0000002400504d43000000040000000a2f4e65744163745253002020000000084e65744163745253000000025649530300000005000507017d00000000000000000000080000000056495300
       registrationServiceUsername=Nemuadmin
       registrationServicePassword=nemuuser
       heartbeatPeriod=15
       reRegistrationPeriod=60
       registrationRetryBasePeriod=15
       retryRandom=5
       rePublicationPeriod=3
       getPublicationServiceRetryPeriod=15
       userLabel=

    b) Verify that NetAct's registration service is not running in the IP address and port defined by registrationServiceIOR.

    c) Restart NWI3Adapter.

    Alarm 70111 with AAI=1 is raised.

    2. If branch fsFragmentId=mediator, fsFragmentId=NWI3, fsClusterId=ClusterRoot exists in the LDAP:


    a) Use the parameter management tool to create a new child to the branch. Enter the following attributes in the Add New Entry dialog:

       fsnwi3N3CFId=1
       takeIntoUseNext=1
       fsnwi3registrationServiceIOR=IOR:000000000000002449444c3a4e5749332f526567697374726174696f6e536572766963655f56313a312e3000000000010000000000000064000102000000000e3137322e32312e3232302e3631009c3f0000002400504d43000000040000000a2f4e65744163745253002020000000084e65744163745253000000025649530300000005000507017d00000000000000000000080000000056495300

    b) Restart NWI3Adapter.

    Alarm 70111 with AAI="fsnwi3N3CFId=1,fsFragmentId=mediator,fsFragmentId=NWI3,fsClusterId=ClusterRoot" is raised.


    1.13 70156 DISK DATABASE WATCHDOG START-UP FAILED

    Probable cause: Configuration or Customizing Error

    Event type: Processing error

    Default severity: Critical

    Meaning

    Start-up of the disk database watchdog has failed due to a configuration error or for other reasons.

    Because the disk database and its watchdog belong to the same Recovery Unit (RU), the disk database watchdog start-up failure means that the database is not available.

    Identifying additional information fields

    -

    Additional information fields

    1. Reason. Possible values:

       Disk database watchdog failed to read the parameters from the parameter management system.
       Invalid or missing parameter value.

    2. List of invalid or missing parameters if the reason for the alarm is 2.

    Instructions

    Check the Application Additional Information field for the reason for the configuration error:

    Reason 1: Disk database watchdog failed to read the parameters from the parameter management system.

    Reason 2: Invalid or missing parameter value.

    Continue according to the following procedure:

    1. Check that the following parameters exist in the parameter management system for each database entry in the database fragment with the DN (Distinguished Name) "fsFragmentId=DB, fsClusterId=ClusterRoot":

       fsdbRedundancyModel
       fsdbDataSourceName
       fsdbFillRatioAlarmLimit
       fsdbFillRatioCheckFreq

    2. Use the parameter management system to get the values of those parameters for the database in question. To find those parameters, use the value of the Managed Object field in the alarm management application, for example:

       fsdbName=DB_Alarm,fsFragmentId=DB,fsClusterId=ClusterRoot

    3. Send the found values and/or the parameters that do not exist (parameters for which the fields are empty) to your local Nokia Siemens Networks representative.

    Clearing

    The system clears the alarm automatically when the fault has been corrected.


    Testing instructions

    1. Use the parameter management system to change the fsdbFillRatioAlarmLimit or fsdbFillRatioCheckFreq attribute of the database to a non-numeric value.

    2. Restart the recovery group of the database.


    1.14 70157 CPU USAGE OVER LIMIT

    Probable cause: Threshold crossed

    Event type: Quality of service

    Default severity: Major

    Meaning

    A processor is being used at a very high throughput level because the execution of some processes is taking a lot of CPU time.

    There is a risk that the node is unable to fulfill the tasks allocated to it. This depends on the extent to which the processes taking most of the CPU time are blocking other processes from getting runtime on the CPU, and on whether the increase in throughput is temporary or permanent.

    If the processor is constantly used at a very high throughput level, the system might appear very slow. For example, the execution of commands takes an unusually long time to finish.

    Identifying additional information fields

    1. CPU index (optional).

    Additional information fields

    -

    Instructions

    1. Run the top Linux command on the node that reports the alarm. The command gives a repetitive update of processor activity in real time. It gives a listing of the most CPU-intensive tasks of the system.

    2. If the problem persists, contact your local Nokia Siemens Networks representative and provide the information gathered in the previous step.

    Clearing

    The alarm is cleared automatically by the operating system's fault detector once the CPU usage has dropped to a low enough level. The raising and clearing thresholds are different to prevent unnecessary thrashing.
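The effect of separate raising and clearing thresholds can be sketched as follows. This is a generic illustration of the principle, not the fault detector's implementation. The class name is hypothetical and the threshold values 90 and 80 are made up for the example, as this section does not give the actual limits.

```python
class ThresholdAlarm:
    """Alarm state with separate raising and clearing thresholds (sketch)."""

    def __init__(self, raise_at, clear_at):
        if clear_at >= raise_at:
            raise ValueError("clear_at must be below raise_at")
        self.raise_at = raise_at
        self.clear_at = clear_at
        self.active = False

    def update(self, usage):
        """Feed one usage sample in percent; return 'raise', 'clear' or None."""
        if not self.active and usage >= self.raise_at:
            self.active = True
            return "raise"
        if self.active and usage <= self.clear_at:
            self.active = False
            return "clear"
        return None


# Hypothetical thresholds: raise at 90 %, clear at 80 %.
alarm = ThresholdAlarm(raise_at=90, clear_at=80)
events = [alarm.update(sample) for sample in (85, 91, 89, 92, 79)]
print(events)  # [None, 'raise', None, None, 'clear']
```

With a single shared limit, the samples 91, 89 and 92 would raise, clear and raise the alarm again. With two thresholds, the alarm stays raised until the usage has clearly dropped.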

    Testing instructions

    Do not test this alarm, because testing it will result in reduced quality of service.


    1.15 70158 FILE SYSTEM USAGE OVER LIMIT

    Probable cause: Threshold Crossed

    Event type: Quality of service

    Default severity: Major

    Meaning

    The available disk space on a partition is smaller than the minimum requirement. The partition can be filled up, for example, by crashing programs that produce large core files, or by large log files if the rotation of logs does not function.

    There is a risk that some data cannot be written to the disk.

    Identifying additional information fields

    Mountpoint

    Additional information fields

    -

    Instructions

    1. Run the df -k Linux command on the node that reports the alarm to get a report of the usage of the file system disk space in 1 kilobyte blocks. See the mountpoint in the Identifying additional information fields of the alarm. Alternatively, run the Linux command df -h to see the information in a human-readable format.

    2. Run the Linux command du -k or du -h on the node that reports the alarm to discover the directories that consume most of the space.

    3. Check with du -h /var/tmp/. if /var/tmp is among the large directories. If it is, remove the unnecessary files.

    4. Check with du -h /var/log/. if /var/log is among the large directories. If it is, move the old files outside the Network Element (NE) using the appropriate network management tools.

    5. Check with du -h /var/crash/. if /var/crash is among the large directories. If it is, move the core files outside the NE using the appropriate network management tools.

    6. If the alarm is not cleared, contact your local Nokia Siemens Networks representative.
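The df and du checks in the steps above can also be approximated in a script, for example when the largest directories under a mountpoint need to be ranked in one go. The sketch below is an illustration with hypothetical function names. It sums file sizes rather than allocated disk blocks, so its figures can differ from those of du, and its usage percentage is computed from total and used space, which can differ slightly from the value df prints.

```python
import os
import shutil


def usage_percent(mountpoint):
    """Used share of the file system that contains mountpoint, in percent."""
    usage = shutil.disk_usage(mountpoint)
    return 100.0 * usage.used / usage.total


def _tree_size(path):
    """Total size in bytes of the files below path (unreadable entries are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path, onerror=lambda error: None):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass
    return total


def largest_directories(root, count=5):
    """Return the count largest immediate subdirectories of root as (path, bytes)."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.path, _tree_size(entry.path)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:count]
```

For example, largest_directories("/var") lists the candidates named in steps 3 to 5 (/var/tmp, /var/log, /var/crash) if they are among the largest directories.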

    Clearing

    The alarm is automatically cleared by the operating system's fault detector once the amount of available disk space increases above the specified limit. The raising and clearing thresholds are different to prevent unnecessary thrashing.


    Testing instructions

    Do not test this alarm, because testing it in a live system will reduce the quality of service.


    1.16 70159 MANAGED OBJECT FAILED

    Probable cause: Software program abnormally terminated

    Event type: Processing error

    Default severity: Major

    Meaning

    The named managed object (MO) has failed. The managed object can be a software, hardware or logical entity. The type of the managed object identifies the following:

    Node: The physical computing node, its system software, or operating system has failed, or the node has been manually restarted.

    Recovery Unit (RU): A recovery unit contains one or more processes. A recovery unit failure is usually caused by a process failure.

    Process: The process has crashed, terminated abnormally or stopped responding.

    Recovery Group (RG): A recovery group consists of one or more recovery units. A recovery group failure alarm is raised for an active-standby configuration, when both redundant components (recovery units of the recovery group) have failed. This is always a serious situation as it indicates a double failure (for example, two nodes have failed at the same time).

    The effect of the situation depends on the managed object type:

    Node: Any important services/functions that are provided with an active-standby or N+M recovery group may be taken over by other operational nodes. Services may be down if standby/spare nodes are also down.

    Recovery Unit (RU): If the recovery unit belongs to an active-standby or N+M recovery group, the service may be taken over by an operational standby/spare recovery unit.

    Process: The service or function that the process provides is not available. A process failure can cause a recovery unit level recovery action or the system may attempt to restart the failed process.

    Recovery Group (RG): The service provided by the recovery group is not available. Manual correction is required, as the automatic system repair actions have not solved the problem.

    The system's High Availability Services (HAS) will periodically attempt to solve the problem with corrective actions, such as switchovers or restarts. The alarm system also clears the obsolete alarms that may have been raised by this managed object or by its child managed objects.

    Identifying additional information fields

    -

    Additional information fields

    1. Identifies the managed object type: "Node", "Recovery unit", "Process" or "Recovery group".

    2. Explains the fault type as a string (if that information is available) or contains just the string "failure". For example:

       "Process has stopped responding to heartbeats"
       "Node connection heartbeat failure"
       "Recovery group failure"


    Instructions

    1. Log into the cluster and check that the named managed object has been successfully restarted.

    2. Verify also that the MO did not raise any new alarms that would explain the failure.

    You can check the status of an MO with the HAS user interface tool fshascli. An operational MO has the value ENABLED in the operational state attribute and an empty procedural status attribute.

    For example, the state of the process NodeDNS in the recovery unit FSNodeDNSServer of the node AS-5 can be seen as follows:

    $ fshascli --status /AS-5/FSNodeDNSServer/NodeDNS
    /AS-5/FSNodeDNSServer/NodeDNS:
    administrative(UNLOCKED) operational(ENABLED) usage(ACTIVE) procedural() availability( ) unknown(FALSE) role(ACTIVE)
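The rule stated above (operational state ENABLED and an empty procedural status) can be checked mechanically from output in the name(value) form shown in the example. The sketch below is an illustration with hypothetical function names. It assumes that the output always has this form, which is inferred only from the example above.

```python
import re

# Matches attribute(value) pairs such as operational(ENABLED) or procedural().
_ATTRIBUTE = re.compile(r"(\w+)\(([^)]*)\)")


def parse_state(output):
    """Turn fshascli status output into a dict of attribute name to value."""
    return {name: value.strip() for name, value in _ATTRIBUTE.findall(output)}


def is_operational(output):
    """True if operational is ENABLED and the procedural status is empty."""
    state = parse_state(output)
    return state.get("operational") == "ENABLED" and state.get("procedural") == ""


SAMPLE = (
    "/AS-5/FSNodeDNSServer/NodeDNS:\n"
    "administrative(UNLOCKED) operational(ENABLED) usage(ACTIVE) "
    "procedural() availability( ) unknown(FALSE) role(ACTIVE)"
)
print(is_operational(SAMPLE))  # True
```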

    If the MO is not operational, perform the following steps:

    1. With a node MO, you can wait for a node restart. The system will raise another alarm (70011 NODE NOT RESPONDING) if the node does not come up within some time.

    2. Check the system logs (/var/log/master-syslog on the active CLA node) for error(s) that have occurred by searching for the MO's name and/or by looking at events that occurred before this alarm was raised.

    3. You can also use the HAS user interface tool to initiate an immediate restart attempt of the failed MO using the -r (--restart) command line option:

    $ fshascli --restart /AS-5/FSNodeDNSServer

    The restart operation is mostly useful after a problem has been corrected. Verify the result from the syslog and by checking the status of the MO.

    4. An alarm for a recovery group implies a multiple error situation (for example, multiple node failures) or a persistent configuration or corruption problem. In this case, contact your local Nokia Siemens Networks representative.

    Clearing

    The system clears the alarm automatically when the fault has been corrected.

    Testing instructions

    Scenario 1: Alarm for a node

    1. Restart an operational unlocked node using fshascli. For example:

       $ fshascli --state /AS-1
       /AS-1
       administrative(UNLOCKED)


       unknown(FALSE)
       alarm()
       $ fshascli --restart --nowarning /AS-1
       /AS-1 is restarted successfully

    2. Wait for a few seconds for the node to turn DISABLED. The alarm is raised after this. For example:

       $ fshascli --state /AS-1
       /AS-1
       administrative(UNLOCKED)


    $ fshascli --state /TA-A/TestApplAServer/TestProcA
    /TA-A/TestApplAServer/TestProcA:
    administrative(UNLOCKED)


    $ ssh TA-A killall testProcB

    2. Verify that the alarm was raised and (very likely) also immediately cancelled. The HAS cancels the alarm immediately if the recovery unit repair cycle allowed an immediate restart.

    The alarm raising is also visible in the syslog as a message that begins as follows:

    ALARM RAISE SP=70159 . . .

    Similarly, the alarm cancellation is also visible in the syslog as a message that begins as follows:

    ALARM CANCEL SP=70159 . . .
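
    The syslog check described above can be scripted. The following Python sketch is only an illustration: it assumes nothing about the message format beyond the "ALARM RAISE SP=" and "ALARM CANCEL SP=" prefixes quoted in this document, and the function names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches only the message prefix quoted in this document.
# Whatever follows the alarm number is not described here, so it is ignored.
ALARM_EVENT = re.compile(r"ALARM (RAISE|CANCEL) SP=(\d+)")


def alarm_events(lines: Iterable[str], alarm_number: int) -> List[Tuple[int, str]]:
    """Return (line_index, event) pairs for one alarm number, in log order."""
    events = []
    for index, line in enumerate(lines):
        match = ALARM_EVENT.search(line)
        if match and int(match.group(2)) == alarm_number:
            events.append((index, match.group(1)))
    return events


def raised_then_cancelled(lines: Iterable[str], alarm_number: int) -> bool:
    """True if a RAISE for the alarm is followed later by a CANCEL."""
    seen_raise = False
    for _, event in alarm_events(lines, alarm_number):
        if event == "RAISE":
            seen_raise = True
        elif seen_raise:
            return True
    return False
```

    To use it, pass the lines of the syslog file (for example, an open file object for /var/log/master-syslog on the active CLA node) together with the alarm number, such as 70159.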

    1.17 70160 MEMORY USAGE OVER LIMIT

    Probable cause: Threshold crossed

    Event type: Quality of service

    Default severity: Major

    Meaning

    Memory consumption is too high because some processes are using too much memory.

    There is a risk that the node is unable to fulfil the tasks allocated to it because the processes cannot reserve enough memory for their use. As a result, the processes cannot perform the tasks allocated to them.

    Identifying additional information fields

    -

    Additional information fields

    -

    Instructions

    1. Run the top Linux command on the node that reports the alarm to view a snapshot of the current global memory usage. Press M to sort the processes in the node by their resident memory size and check which processes consume the most memory.

    2. If the problem persists, contact your local Nokia Siemens Networks representative and provide them with the information gathered in the previous step.

    Clearing

    The alarm is automatically cleared by the operating system's fault detector once the memory usage has dropped to a low enough level. The raising and clearing thresholds are different to prevent the alarm from being raised and cleared repeatedly (thrashing) when the usage hovers around a single limit.
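
    The effect of separate raising and clearing thresholds can be illustrated with a small Python sketch. It is not the fault detector's implementation; the class name and the threshold values (90 and 80 percent) are hypothetical, because this document does not state the real limits.

```python
class HysteresisAlarm:
    """Two-threshold alarm: raise at or above `raise_at`, clear at or below `clear_at`."""

    def __init__(self, raise_at: float, clear_at: float) -> None:
        if clear_at >= raise_at:
            raise ValueError("clear_at must be lower than raise_at")
        self.raise_at = raise_at
        self.clear_at = clear_at
        self.active = False

    def update(self, usage_percent: float) -> str:
        """Feed one sample; return 'RAISE', 'CLEAR', or '' when nothing changes."""
        if not self.active and usage_percent >= self.raise_at:
            self.active = True
            return "RAISE"
        if self.active and usage_percent <= self.clear_at:
            self.active = False
            return "CLEAR"
        return ""


# Hypothetical thresholds; the real values are not given in this document.
alarm = HysteresisAlarm(raise_at=90.0, clear_at=80.0)
transitions = [alarm.update(sample) for sample in (85, 91, 89, 92, 88, 79, 85)]
```

    With a single limit of 90, the samples 91, 89, 92, 88 would raise and clear the alarm twice. With the two thresholds the alarm is raised once (at 91) and cleared once (at 79).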

    Testing instructions

    Do not test this alarm, because testing it will result in reduced quality of service.

    1.18 70161 OPERATING SYSTEM MONITORING FAILURE

    Probable cause: System call unsuccessful

    Event type: Processing error

    Default severity: Major

    Meaning

    The fault detector in the operating system has failed to capture the statistics of the usage of a given resource.

    The state of the named device cannot be discovered, which may indicate that there are some fundamental problems with it.

    Identifying additional information fields

    1. Failed subsystem
    2. Failed resource, where the values are

    CPU: Index of the processor
    FILESYSTEM: Name of the mountpoint
    ETHERNET: Name of the interface
    MEMORY:
    RAID: Name of the device
    FC (Fibre Channel):

    Additional information fields

    -

    Instructions

    If the alarm is not cleared automatically, contact your Nokia Siemens Networks representative.

    Clearing

    Do not clear the alarm. The alarm is automatically cleared when the fault detector of the operating system is able to capture the statistics of the failed resource.

    Testing instructions

    This alarm is difficult to test, because the hardware problem cannot be simulated.

    1.19 70162 RAID ARRAY HAS BEEN DEGRADED

    Probable cause: Disk problem

    Event type: Equipment

    Default severity: 3 Major

    Meaning

    Redundancy of the RAID array is lost. A device belonging to the RAID array can be marked faulty by the system. The alarm may be caused by either errors in the fibre channel (FC) or small computer system interface (SCSI) bus or by a potentially broken disk media.

    In the case of a subsequent disk failure, data will be lost.

    Identifying additional information fields

    1. RAID array.

    Additional information fields

    2. Faulty device (optional).

    Instructions

    If the hardware is FlexiServer Blade Hardware, then follow these instructions:

    1. Use the command cat /proc/mdstat to check the status of the RAID array found in the Identifying additional information field of the alarm on the node that reports the alarm. The [UU] field printed by the command describes whether both of the disks are in the RAID array or not. If this field contains [_U] or [U_], one of the disks is not in the RAID array.

    2. The redundancy of the RAID array should be automatically restored by the system within an hour. If the problem persists and the alarm is not cleared within an hour, contact your local Nokia Siemens Networks representative.

    3. If the problem persists, try changing the faulty disk according to the hardware maintenance instructions. If that does not help, contact your local Nokia Siemens Networks representative.
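
    The [UU] check in step 1 can also be done programmatically. The following Python sketch is a minimal illustration that assumes the usual Linux /proc/mdstat layout (an "mdN :" header line followed by a line containing the member status field); the function names are hypothetical.

```python
import re
from typing import Dict, List

ARRAY_LINE = re.compile(r"^(md\d+)\s*:")
# Member status field such as [UU], [U_] or [_U]; "[2/2]" does not match.
STATUS_FIELD = re.compile(r"\[([U_]+)\]")


def raid_member_status(mdstat_text: str) -> Dict[str, str]:
    """Map each md array name to its member status string, e.g. 'UU' or 'U_'."""
    status: Dict[str, str] = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = header.group(1)
            continue
        field = STATUS_FIELD.search(line)
        if current and field:
            status[current] = field.group(1)
            current = None
    return status


def degraded_arrays(mdstat_text: str) -> List[str]:
    """Return the arrays in which at least one member is missing ('_')."""
    members = raid_member_status(mdstat_text)
    return sorted(name for name, state in members.items() if "_" in state)
```

    On the node that reports the alarm, the text passed to degraded_arrays would be the content of /proc/mdstat. An array named in the result corresponds to the [_U] or [U_] case described in step 1.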

    If the hardware is IBM BladeCenter, then follow these instructions:

    1. Check the Maintenance Module and find the faulty disk and the possible cause of the fault. Replace the faulty disk with a new disk, referring to the hardware maintenance documentation for detailed replacement instructions.

    2. The redundancy of the RAID array should be automatically restored by the system within an hour. If the problem persists and the alarm is not cleared within an hour, contact your local Nokia Siemens Networks representative.

    Clearing

    The alarm is automatically cleared by the operating system's fault detector once the redundancy of the RAID array is restored.

    Testing instructions

    Do not test this alarm in a live system. Any real disk faults during the execution of this test may lead to data corruption.

    1.20 70163 ETHERNET INTERFACE USAGE OVER LIMIT

    Probable cause: Threshold Crossed

    Event type: Quality of service

    Default severity: Minor

    Meaning

    The Ethernet interface is used at a very high level. This alarm may be raised, for example, when large files are copied over the network, causing a lot of network file system (NFS) traffic.

    Packets are not lost yet, but if the load on the interface keeps increasing, packets might eventually be lost.

    Identifying additional information fields

    1. Bonding interface

    2. Ethernet interface

    Additional information fields

    -

    Instructions

    This is an informative alarm and does not require direct actions.

    Clearing

    The alarm is automatically cleared by the operating system's fault detector once the Ethernet load has decreased to a tolerable level.

    Testing instructions

    Do not test this alarm, because testing it will create instability in the system.

    1.21 70164 ETHERNET LINK FAILURE

    Probable cause: Link failure

    Event type: Equipment

    Default severity: Minor

    Meaning

    The redundancy of Ethernet is lost because of an Ethernet link failure. The error might have been caused by a hardware failure (that is, a potentially broken Ethernet port), by an unplugged cable on the front panel of the gateway (GW) node, or by a program or user issuing a command that shuts down the Ethernet interface.

    If another link subsequently fails, Ethernet packets are lost, which means that the node cannot receive or transmit data over the network.

    Identifying additional information fields

    1. Bonding interface

    2. Ethernet interface

    Additional information fields

    -

    Instructions

    1. If the alarm is raised for an external Ethernet interface, check that the cable is properly connected in the front panel of the GW node.

    2. Take a console connection to the node with the alarming interface.

    3. Check the status of the interface with the ifconfig -a command followed by the interface name. For example:

    ifconfig -a eth0

    4. If the interface does not have the UP and RUNNING flags set, try to configure the interface UP with the ifup command followed by the interface name. For example:

    ifup eth0

    5. If the previous steps have not resolved the situation, contact your local Nokia Siemens Networks representative.
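
    The flag check in steps 3 and 4 can be sketched in Python as follows. This is an illustrative parser only: it assumes one of the two common ifconfig output layouts (the older one with "MTU:" on the flag line and the newer one with "flags=...<...>"), and the function names are hypothetical.

```python
import re
from typing import Set


def interface_flags(ifconfig_output: str) -> Set[str]:
    """Collect the flag words from ifconfig output for one interface.

    Handles the older net-tools layout ("UP BROADCAST RUNNING MULTICAST  MTU:1500")
    and the newer one ("flags=4163<UP,BROADCAST,RUNNING,MULTICAST>").
    """
    newer = re.search(r"flags=\d+<([^>]*)>", ifconfig_output)
    if newer:
        return {flag for flag in newer.group(1).split(",") if flag}
    for line in ifconfig_output.splitlines():
        if "MTU:" in line:
            # In the older layout the flags precede the MTU field on the same line.
            return set(line.split("MTU:")[0].split())
    return set()


def needs_ifup(ifconfig_output: str) -> bool:
    """True when UP or RUNNING is missing, which is the case step 4 addresses."""
    return not ({"UP", "RUNNING"} <= interface_flags(ifconfig_output))
```

    The input is the text printed by the ifconfig command in step 3. If needs_ifup returns True, the interface is a candidate for the ifup command in step 4.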

    Clearing

    The alarm is automatically cleared by the operating system's fault detector when the Ethernet link comes up.

    Testing instructions

    Do not test this alarm, because testing it will create instability in the system.

    1.22 70166 MANAGED OBJECT LOCKED

    Probable cause: Software program abnormally terminated

    Event type: Processing error

    Default severity: Warning

    Meaning

    The administrative state of the named managed object (MO), which can be a cluster, a node, or a recovery unit (RU), has changed to LOCKED as a result of a user action (graceful shutdown or lock operation).

    The named MO and its child MOs have been stopped and will not be started before a corresponding unlock operation is performed by the user. The service provided by the MO is not available, unless the MO is an RU with some operational and UNLOCKED redundant resources.

    When an MO is locked, the alarm system of the cluster clears the alarms raised by the MO and its child MOs.

    Identifying additional information fields

    -

    Additional information fields

    Identifies the MO type: a cluster, a node, or an RU.

    Instructions

    This is an informative alarm and does not require any actions.

    Clearing

    Do not clear the alarm. This is an informative alarm and will be cleared automatically by the alarm system after its time to live has expired.

    Testing instructions

    Lock the managed object using fshascli. For example:

    $ fshascli --lock --nowarning /AS-1/FSNodeDNSServer

    The alarm raising is also visible in the syslog as a message that begins as follows:

    ALARM RAISE SP=70166...

    Note that the test case for alarm 70189 MANAGED OBJECT UNLOCKED BY OPERATOR should be run after this to restore the initial situation.

    1.23 70168 CLUSTER STARTED (RESTARTED)

    Probable cause: Software environment problem

    Event type: Processing error

    Default severity: Major

    Meaning

    The whole cluster is starting or restarting.

    Starting or restarting of the whole cluster means (re)starting of all managed objects within the cluster.

    The (re)start may have been initiated by an operator or be caused by fatal errors in some critical hardware or software component. When the cluster is restarted, the alarm system clears all alarms that were raised by the cluster's managed objects before the restart.

    Identifying additional information fields

    -

    Additional information fields

    -

    Instructions

    This alarm is an informative alarm indicating that the whole cluster has been (re)started. As this operation is critical for software and hardware, carefully check the alarm status in the cluster after the restart.

    Clearing

    Clear the alarm after carefully checking the alarm status in the cluster.

    Testing instructions

    1. Restart the cluster using fshascli:

    $ fshascli --restart --nowarning /

    2. Wait for the cluster to restart.

    The alarm is visible in the alarm database (if configured) and in syslog as a message that begins as follows:

    ALARM RAISE SP=70168 ...

    3. Note that all services are unavailable during restart.

    1.24 70173 BACKEND DATABASE REQUIRED BY CORBA NAMING SERVICE IS UNAVAILABLE

    Probable cause: Underlying Resource Unavailable

    Event type: Processing error

    Default severity: Major

    Meaning

    The MySQL database instance DB_CosNaming, used by the private CORBA naming service (NaS) instance, cannot be contacted by the NaS wrapper. Note that the recovery group that owns the backend database is NamingServiceDB and CORBA NaS instances belong to recovery group PrivateCosNaming.

    The CORBA NaS is not able to store data in the database. Therefore the CORBA NaS is not functional and replies to the high availability services (HAS) heartbeats with a failure indication.

    Identifying additional information fields

    -

    Additional information fields

    -

    Instructions

    Check that the error situation still exists:

    /opt/Nokia/SS_Naming/bin/ns_listall

    The command should list the content of the private naming graphs when the NaS is working correctly. If the command throws exceptions, the NaS is not working correctly, which may result, for example, from an unavailable backend database.

    Check if the backend database DB_CosNaming (RG NamingServiceDB) is unlocked and active.

    fshascli -s /NamingServiceDB

    If the NamingServiceDB is locked, unlock it.

    fshascli -u /NamingServiceDB

    After a few seconds the database should have restarted and the NaS should have automatically re-established connections. Verify the restart and the re-established connections by issuing the ns_listall command mentioned above.

    If this does not solve the problem, there is something wrong with the database deployment or configuration. In that case, the alarm 70156 DISK DATABASE WATCHDOG START-UP FAILED should also be raised by the MySQL DB watchdog dedicated to the DB_CosNaming database instance.

    The following steps describe the error checking procedure if the NamingServiceDB RG fails (see alarm description 70156 DISK DATABASE WATCHDOG START-UP FAILED for more information).

    1. Check the master-syslog for any indication of errors.

    less /var/log/master-syslog

    2. Check that the LDAP (Lightweight Directory Access Protocol) server is up and running. Check that the RG owning the LDAP server is unlocked.

    fshascli -s /Directory

    Check that the LDAP server is really working by listing the content of the LDAP tree (CTRL-C aborts the listing).

    ldapsearch

    3. If the LDAP is working correctly, check that the DB directory mount is functional: Lock the NamingServiceDB RG (if not yet locked). Mount the database directory manually.

    a) Create the SW RAID (md device) on which the DB_CosNaming directory is stored.

    create_sw_raid /dev/md8 \
        /dev/VG_62/MySQL_DB_CosNaming \
        /dev/VG_63/MySQL_DB_CosNaming

    Note that the device paths given as arguments above may be different in your system. Check the correct device paths from:

    /opt/Nokia_BP/etc/ldapfile/ldif_in/PFSAN*.ldif

    The device paths are defined under an entry defining the FSHWSWRAID object class for the NaS:

    dn: fshwStorageResourceName=/dev/md8, fshwSANName=0,fsFragmentId=HW, fsClusterId=ClusterRoot
    fshwStorageResourceName: /dev/md8
    objectClass: FSHWStorageResource
    objectClass: FSHWSWRAID
    objectClass: extensibleObject
    fshwRAIDLevel: 1
    fshwPartitionName: /dev/VG_62/MySQL_DB_CosNaming
    fshwPartitionName: /dev/VG_63/MySQL_DB_CosNaming
    fsUserComment: MySQL DB for CORBA Naming Service

    b) Mount the directory.

    mkdir /tmp/tmp_nasDB
    mount /dev/md8 /tmp/tmp_nasDB

    Remember to unmount the directory and to stop the md device after the following checks have been performed (see the last step).

    4. Check that the database disk content is accessible and readable.

    ls -la /tmp/tmp_nasDB

    5. Check that the my.cnf and odbc.ini files exist in that directory and have read access rights. Check also that these files are identical to those under the SS_Naming home directory.

    diff /tmp/tmp_nasDB/odbc.ini /opt/Nokia/SS_Naming/etc/odbc.ini
    diff /tmp/tmp_nasDB/my.cnf /opt/Nokia/SS_Naming/etc/my.cnf

    6. Check the mysql.err file for any error indications. You can also find this file from the /tmp/tmp_nasDB directory.

    7. Remove the mount and stop the md devices.

    a) Unmount and remove the directory.

    umount /tmp/tmp_nasDB
    rmdir /tmp/tmp_nasDB

    b) Stop the md device.

    mdadm --manage -S /dev/md8

    If any of the preceding checks fail, a major software failure exists in the system. In that case, contact your Nokia Siemens Networks representative with the information gathered during the preceding steps.
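
    The device paths needed in step 3 a) can be read from the LDIF entry shown there. The following Python sketch is only an illustration: it assumes a simplified LDIF layout (entries separated by blank lines, one "name: value" pair per line, no folded or encoded values), and the function name is hypothetical.

```python
from typing import Dict, List


def sw_raid_entries(ldif_text: str) -> Dict[str, List[str]]:
    """Map each FSHWSWRAID storage resource (md device) to its partition paths.

    Simplified LDIF reading: entries are separated by blank lines and each
    attribute is a single "name: value" line (no folded or base64 values).
    """
    result: Dict[str, List[str]] = {}
    for block in ldif_text.split("\n\n"):
        attributes: Dict[str, List[str]] = {}
        for line in block.splitlines():
            if line.startswith("#") or ": " not in line:
                continue
            name, value = line.split(": ", 1)
            attributes.setdefault(name, []).append(value.strip())
        if "FSHWSWRAID" in attributes.get("objectClass", []):
            for device in attributes.get("fshwStorageResourceName", []):
                result[device] = attributes.get("fshwPartitionName", [])
    return result
```

    For the example entry in step 3 a), the result maps /dev/md8 to the two MySQL_DB_CosNaming partition paths, which are the arguments given to create_sw_raid.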

    Clearing

    HAS clears the alarm automatically when it has detected the NaS to be faulty and therefore restarted the PrivateCosNaming recovery group.

    However, if the backend database remains faulty, the alarm is raised again. This may result in a restart loop constantly raising the same alarm. Therefore, if the problem seems to be permanent, it is recommended to lock the NaS and the database recovery groups with the following commands:

    fshascli -l /NamingServiceDB
    fshascli -l /PrivateCosNaming

    and to clear the alarm manually before performing the steps for solving the error.

    Testing instructions

    1. Unlock the NamingServiceDB RG.
    2. Unlock the CosNaming and PublicCosNaming RGs.
    3. Running the command /opt/Nokia/SS_Naming/bin/ns_listall should list all the objects bound in the name service. This shows that the Naming Service is functional.
    4. Lock the NamingServiceDB RG.

    Within some tens of seconds the alarm should be raised.

    Clearing:

    1. Lock the CosNaming and PublicCosNaming RGs.
    2. Unlock the NamingServiceDB RG.
    3. Unlock the CosNaming and PublicCosNaming RGs.
    4. Check with /opt/Nokia/SS_Naming/bin/ns_listall that the naming service is functional again.

    The alarm should be cleared at this point.

    The alarm is automatically cleared by the naming service when it re-establishes connections to the database.

    1.25 70186 CLUSTER OPERATION INITIATED BY OPERATOR

    Probable cause: Congestion

    Event type: Quality of service

    Default severity: Warning

    Meaning

    This is an informative alarm which indicates that an operator has initiated a cluster operation on the specified managed object (MO). The MO can refer to the whole cluster, a node, a recovery unit (RU), a recovery group (RG), or a process. The platform high availability services (HAS) is now executing the operation. The operation can be

    switchover
    restart
    power-off.

    The operations have different effects:

    Switchover

    Applicable only to recovery groups (RG). The active RU instance of the RG is terminated and a standby instance on another node started or, in case of a hot active standby RG, activated. The service provided by the named RU is down until the switchover is complete.

    Restart

    For the cluster and nodes this means a physical restart (reboot) of node(s). For other MOs, the named MO is stopped and restarted. The services provided by the named MO are down during the restart.

    Power-off

    Applicable only to nodes. The named node is being powered off.

    Identifying additional information fields

    -

    Additional information fields

    1. Identifies the MO type (the cluster, a node, a process, or an RU).

    Instructions

    This is an informative alarm and does not require any actions.

    Clearing

    The alarm system clears the alarm automatically after its time to live has expired.

    Testing instructions

    1. Log into the cluster.
    2. Restart a managed object using fshascli. For example:

    fshascli --restart --nowarning /AS-1

    The alarm is visible in the alarm database (if configured) and in the syslog as a message that begins as follows:

    ALARM RAISE SP=70186 ...

    1.26 70188 MANAGED OBJECT SHUTDOWN BY OPERATOR

    Probable cause: Congestion

    Event type: Quality of service

    Default severity: Warning

    Meaning

    This is an informative alarm which indicates that the specified managed object (MO), which can be the whole cluster, a node, or a recovery unit (RU), is being shut down. The named MO and all its unlocked sub-resources are now terminating.

    The MO is being shut down by an operator. All services provided by the named MO are terminating. Once the operation is completed, the administrative state of the MO and all its sub-MOs will be changed to locked.

    Note that a shutdown request may take a long time if the maximum duration for the operation has not been specified. The shutdown request can be forced to completion by issuing a lock command. In that case the platform high availability services (HAS) will terminate the services ungracefully.

    Identifying additional information fields

    -

    Additional information fields

    1. Identifies the MO type (a cluster, a node, or an RU)

    Instructions

    This is an informative alarm which requires no user actions.

    Clearing

    The alarm system clears this alarm automatically after its time to live has expired.

    Testing instructions

    The target of the shutdown command can be a cluster, node, recovery group or recovery unit.

    1. Log into the cluster.
    2. Execute the shutdown command on the managed object. For example:

    fshascli --shutdown /AS-1

    The alarm is also visible in the syslog as a message that begins as follows:

    ALARM RAISE SP=70188 ...

    Note that in the example above --shutdown does not power off the node. It just gracefully shuts down all HAS-managed non-critical processes in the node.

    After the testing is finished, use the fshascli --unlock command to get the initial situation restored. For example:

    fshascli --unlock /AS-1

    1.27 70189 MANAGED OBJECT UNLOCKED BY OPERATOR

    Probable cause: Congestion

    Event type: Quality of service

    Default severity: Warning

    Meaning

    This is an informative alarm which indicates that the specified managed object (MO), which can be the whole cluster, a node, or a recovery unit (RU), has been unlocked. The named MO and its unlocked sub-resources (if there are any) can now be activated.

    Notice that the MO (or its sub-MOs) can remain locked because of a dependency on higher-level MOs. That is, the unlock operation will not have an effect on the MO in question before the higher-level MOs are unlocked. For example, an RU in a node will remain locked if the node or the cluster MO is locked.

    The MO has been set to the unlocked state. If all the higher level MOs are unlocked as well, the services provided by the MO are activated.

    Identifying additional information fields

    -

    Additional information fields

    Identifies the MO type (a cluster, a node, or an RU)

    Instructions

    This is an informative alarm and does not require any actions.

    Clearing

    The alarm system clears the alarm automatically after its time to live has expired.

    Testing instructions

    Unlock the previously locked managed object using fshascli:

    1. Log into the cluster.
    2. Unlock the managed object using fshascli. For example:

    fshascli --unlock /AS-1/FSNodeDNSServer

    The alarm is also visible in the syslog as a message that begins as follows:

    ALARM RAISE SP=70189 ...

    Note that this test should be run after the test case for alarm 70166 MANAGED OBJECT LOCKED.

    1.28 70236 LDAP DATABASE CORRUPTED

    Severity Major

    Fault reason

    A primary or secondary Lightweight Directory Access Protocol (LDAP) database is corrupted and cannot be accessed anymore. An LDAP database can get corrupted, for example, when:

    a disk becomes full while the database is being updated
    a node failure and/or ungraceful node restart happens while the database is being updated.

    The identified LDAP database is currently unavailable.

    In case of a secondary database, the only impact is that the node start-ups can take slightly longer because some platform services attempt to use the secondary database(s) by default.

    Failure of the primary database has a more significant impact. Most application processes cannot be (re)started anymore and applications that update LDAP will fail. If a secondary database is still available, nodes can still be (re)started but only basic platform services will be able to start. If the primary and all secondary databases have failed, neither the cluster nor any of its nodes can (re)start anymore. The system will next automatically try to recover the corrupted database from an operational primary or secondary database.

    Description

    Identifying additional information fields-

    Additional information fields

    1. Type of the database: Primary or Secondary
    2. Relative path of the database directory. Notice that secondary databases are usually located in a directory such as /var/mnt/local/localimg//opt/Nokia_BP/var/pmgmt/pt/Nokia_BP/var/pmgmt//fsPlatformSlave-ldbm. The primary LDAP database directory is usually of the following format: /var/mnt/local/sysimg//opt/Nokia_BP/var/pmgmt//fsPlatform-ldbm. Notice especially that the lowest level directory is fsPlatformSlave-ldbm for secondary databases and fsPlatform-ldbm for the primary database.
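
    The naming rule for the lowest level directory can be expressed as a small Python check. This is a sketch for illustration only; the function name is hypothetical and only the two directory names stated above are taken from this document.

```python
import posixpath


def ldap_database_type(directory: str) -> str:
    """Classify an LDAP database directory by its lowest-level directory name."""
    leaf = posixpath.basename(directory.rstrip("/"))
    if leaf == "fsPlatformSlave-ldbm":
        return "Secondary"
    if leaf == "fsPlatform-ldbm":
        return "Primary"
    return "Unknown"
```

    Given the relative path from the second additional information field, the function returns the database type that the first field is expected to report.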

    Instructions

    The system will automatically attempt to recover the corrupted database from a functional copy. If the automatic recovery is successful, this alarm is automatically cleared and the system raises a new "CORRUPTED LDAP DATABASE RECOVERED" warning alarm. The automatic recovery, if successful, takes less than a minute.

    If the primary and seco