OAM Best Practices


  • 8/3/2019 Oam Best Practices

    1/26

    SERVICE PROVIDER

    OAM Best Practices in Mission-Critical MPLS, IP, and Carrier Ethernet Networks

    A variety of Operations, Administration, and Management (OAM) protocols and tools have recently been developed for MPLS, IP, and Ethernet networks. These tools give operators powerful means to proactively manage their networks and customer Service-Level Agreements (SLAs). This paper reviews the OAM tools available at the various layers of MPLS, IP, and Ethernet networks and describes best practices for choosing the right OAM tool for a particular network deployment.


    SERVICE PROVIDER BEST PRACTICES GUIDE

    OAM Best Practices in Mission Critical MPLS, IP, and Carrier Ethernet Networks 2 of 26

    CONTENTS

    Overview
      OAM Layering
      OAM Tools and Network Layers

    Layer 2 OAM Tools
      Layer 2 Trace
      Port Loop Detection
      Unidirectional Link Detection
      Single-Link LACP Keep-Alive
      IEEE 802.1ag CFM
        Continuity Check Messages (CCM)
        Loopback Messages (LBM)
        Linktrace Messages (LTM)
        Brocade Implementation of 802.1ag
        Hierarchical Fault Detection using 802.1ag
        IEEE 802.1ag Configuration Example
        IEEE 802.1ag CFM versus ITU-T Y.1731 OAM
      ITU-T Y.1731 Performance Management
      IEEE 802.3ah Ethernet First Mile (EFM) Link OAM
      Layer 2 OAM Summary

    MPLS OAM Tools
      LSP Ping
      LSP Traceroute
      LSP Ping and LSP Traceroute Considerations
      BFD for RSVP-TE LSPs
      MPLS OAM Summary

    IP and VRF OAM Tools
      IP and VRF Ping
      IP and VRF Traceroute
      BFD for OSPFv2, OSPFv3, IS-IS, and BGP4
      IP and VRF OAM Summary

    Summary


    OVERVIEW

    A variety of OAM tools have been developed in recent years for MPLS, IP, and Ethernet networks. These tools give an operator powerful means to proactively manage networks and customer Service-Level Agreements (SLAs). They address fault detection, fault verification, and fault isolation, and they provide proactive detection of service degradation, service performance monitoring, and SLA verification.

    In MPLS, IP, and Ethernet networks, Operations, Administration, Management, and Provisioning (OAM&P) encompasses two planes (see Figure 1): the Management Plane, represented by Network Management Systems (NMS) and Element Management Systems (EMS), and the Network Plane, represented by Network Elements (NE) and the OAM tools that run across them.

    This white paper reviews the OAM tools available at various layers of the networking stack in MPLS, IP, and Ethernet networks and recommends best practices for choosing the right OAM tool for a particular network deployment.

    Figure 1. OAM tools

    OAM Layering

    OAM tools can be classified into three main types based on the OAM layer (Figure 2):

    Service Layer OAM. Tools applicable to services on an end-to-end basis.

    Network Layer OAM. Tools applicable to services over a particular network.

    Transport Layer OAM. Tools applicable to the transport layer of the network.

    Figure 2. OAM layers

    These OAM layers are hierarchical. For example, in Figure 3, the Service Layer OAM of Operator A can be viewed as Transport Layer OAM by the service provider, which treats the service provided by Operator A as a transport tunnel for the customer.

    (Figure 1 labels: OAM&P spans the Management Plane (NMS, EMS) and the Network Plane (Network Elements). The scope of this paper is OAM tools across Network Elements.)

    (Figure 2 labels: Service Layer OAM, Network Layer OAM, Transport Layer OAM.)


    NOTE: The terms customer, service provider, and operator are commonly used to reflect the

    business relationships that often exist among organizations and individuals. An operator provides a

    single Layer 2 or Layer 3 backbone network to a service provider. An operator can be identical to, or a

    part of the same organization as, a service provider.

    The best OAM tools to use at a particular network layer depend on the type of network. For example, in Figure 3, Operator A has an MPLS network and uses MPLS OAM tools, while Operator B has an Ethernet network and uses Ethernet OAM tools.

    Figure 3. Customer, operator, and service provider views of OAM layering

    OAM Tools and Network Layers

    Each network layer has its own best-suited OAM tools. Figure 4 lists common OAM tools applicable to Layer 2, MPLS, IP (Layer 3), and Virtual Private Networks (VPNs), including both Layer 2 and Layer 3 VPNs. Note that certain OAM tools, for example, 802.1ag CFM and Y.1731 PM, apply both to Layer 2 networks and to Layer 2 VPN services, as shown in Figure 4.

    The following sections address the OAM tools shown in Figure 4.

    Figure 4. Each network layer has its own best-suited OAM tools

    (Figure 3 labels: customer networks at Site 1 and Site 2, connected through a service provider whose service is delivered by Operator A (MPLS network) and Operator B (Ethernet network). Service OAM spans end to end, MPLS OAM runs within Operator A, Ethernet OAM runs within Operator B, and Link OAM runs on the links between networks.)

    (Figure 4 content, network layer and its OAM tools:
    Layer 2: Layer 2 Trace, Port Loop Detection, UDLD, Single-Link LACP Keep-Alive, 802.1ag CFM/Y.1731 PM, 802.3ah EFM OAM
    MPLS: LSP Ping and Traceroute, BFD for RSVP-TE LSPs
    IP: Ping and Traceroute, BFD for OSPF and IS-IS
    VPN: VRF Ping and Traceroute (L3VPN); 802.1ag CFM and Y.1731 PM for VPLS/VLL (L2VPN))


    LAYER 2 OAM TOOLS

    This section addresses the Layer 2 OAM tools listed in Figure 4. These tools function in Layer 2 networks to monitor:

    Layer 2 services and connectivity (VLANs): Layer 2 Trace, Port Loop Detection, 802.1ag CFM, and Y.1731 PM

    Layer 2 links: UDLD, Single-Link LACP Keep-Alive, and 802.3ah EFM OAM

    Layer 2 Trace

    Layer 2 Trace is a Brocade proprietary OAM tool that traces the traffic path in a VLAN. It is run on demand using a CLI command and can trace a particular IP address, MAC address, or hostname in a given VLAN. The Layer 2 Trace command (trace-l2) probes the entire Layer 2 topology and displays, for each hop in the path, the input or output ports, the round-trip time, and the hop's Layer 2 protocol (such as STP, RSTP, 802.1w, SSTP, metro ring, or route-only).

    Figure 5 shows an example of the Layer 2 Trace command (trace-l2) executed for the given network configuration. The probed Layer 2 information is discarded after 10 minutes or when a new trace-l2 command is issued.

    Layer 2 Trace can also display hops that form a forwarding loop in a VLAN. Figure 6 is an example in which

    the active topology for VLAN 2 forms a forwarding loop. In this case, Layer 2 Trace on VLAN 2 detects the

    forwarding loop and issues the indicated warning message.
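    As a rough illustration of how a trace can flag a loop, the sketch below walks an ordered list of hop records and reports the hops between the first and second visit to the same device. The `Hop` record, its fields, and the assumption that hops arrive as an ordered list are hypothetical; this is not the trace-l2 output format or Brocade's internal algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Hop:
    """Hypothetical hop record: device name plus ingress/egress ports."""
    device: str
    in_port: str
    out_port: str


def find_forwarding_loop(hops: List[Hop]) -> Optional[List[Hop]]:
    """Return the hops forming a loop, or None if each device appears once.

    A loop is assumed when the traced path revisits a device it already crossed.
    """
    first_seen = {}  # device name -> index of its first appearance
    for index, hop in enumerate(hops):
        if hop.device in first_seen:
            # The loop runs from the first visit up to (not including) the revisit.
            return hops[first_seen[hop.device]:index]
        first_seen[hop.device] = index
    return None


path = [Hop("SW1", "1/1", "1/2"), Hop("SW2", "1/1", "1/3"),
        Hop("SW3", "1/2", "1/4"), Hop("SW1", "1/5", "1/2")]
loop = find_forwarding_loop(path)
if loop:
    print("Warning: forwarding loop through", " -> ".join(h.device for h in loop))
```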

    Layer 2 Trace configuration considerations:

    The devices that will participate in the Layer 2 Trace protocol must be assigned to a VLAN, and all devices on that VLAN must be Brocade devices that support the Layer 2 Trace protocol.

    Devices that do not support the Layer 2 Trace protocol simply forward Layer 2 Trace packets without a reply and are transparent to the Layer 2 Trace protocol.

    The destination for the packet with the trace-l2 protocol must be a device that supports the Layer 2 Trace protocol. The destination cannot be a client, such as a personal computer, or a device from another vendor.

    Figure 5. Layer 2 Trace example


    Figure 6. Layer 2 trace in a loop topology

    Port Loop Detection

    Port Loop Detection is a Brocade proprietary OAM tool used to detect Layer 2 forwarding loops. Upon detecting a Layer 2 forwarding loop, the Port Loop Detection tool disables the errant port(s). The device can be configured to automatically re-enable ports after a timeout period.

    This OAM tool sends special protocol packets from the device and detects Layer 2 forwarding loops when

    these packets are received on ports on the same device.

    Layer 2 Trace can also detect forwarding loops. The difference is that Port Loop Detection does not require manual interaction: Layer 2 Trace is run on demand using a CLI command, while Port Loop Detection runs continuously to provide automatic detection and reduce downtime caused by misconfigurations.

    Port Loop Detection supports two modes of operation:

    Strict mode. Detects a Layer 2 forwarding loop in which packets loop back to the same physical port, that is, a hairpin loop.

    NetIron(config)#interface ethernet 1/1

    NetIron(config-if-e1000-1/1)#loop-detection

    Loose mode. Detects Layer 2 forwarding loops for a given VLAN or VLAN group. Loose mode floods test packets to the entire VLAN or VLAN group. See Figure 7.

    NetIron(config)#vlan 20

    NetIron(config-vlan-20)#loop-detection

    NetIron(config)#vlan-group 10

    NetIron(config-vlan-group-10)#add-vlan 1 to 100

    NetIron(config-vlan-group-10)#loop-detection
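    The detection rule described above can be sketched as follows: a device stamps each test packet with its own identity and the port it was sent from, and declares a loop when such a packet comes back to it. Strict mode counts only a packet returning on the same physical port; loose mode counts a return on any port of the VLAN. The packet fields, the `disabled` bookkeeping, and the re-enable timeout value are hypothetical; this is an illustration, not Brocade's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ProbePacket:
    """Hypothetical loop-detection probe: which device sent it, from which port."""
    origin_device: str
    origin_port: str


@dataclass
class LoopDetector:
    device_id: str
    strict: bool  # True = strict (hairpin) mode, False = loose (VLAN-wide) mode
    reenable_after_s: Optional[float] = None  # None = no automatic re-enable
    disabled: Dict[str, float] = field(default_factory=dict)  # port -> time disabled

    def on_receive(self, packet: ProbePacket, rx_port: str, now: float) -> bool:
        """Return True, and disable rx_port, if this probe reveals a loop."""
        if packet.origin_device != self.device_id:
            return False  # another device's probe: no loop through this device
        if self.strict and rx_port != packet.origin_port:
            return False  # strict mode reacts only when the probe returns on its own port
        self.disabled[rx_port] = now
        return True

    def tick(self, now: float) -> None:
        """Re-enable ports whose timeout has expired, if automatic re-enable is set."""
        if self.reenable_after_s is None:
            return
        for port, since in list(self.disabled.items()):
            if now - since >= self.reenable_after_s:
                del self.disabled[port]


# Loose mode: a probe sent from port 1/1 comes back on port 1/4 of the same VLAN.
loose = LoopDetector("R1", strict=False, reenable_after_s=300)
print(loose.on_receive(ProbePacket("R1", "1/1"), rx_port="1/4", now=0.0))
```

    Keeping the re-enable timer separate from detection mirrors the text: disabling the errant port is automatic, while re-enabling it is an optional, configurable behavior.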


    Figure 7. Port Loop Detection example (loose mode)

    Unidirectional Link Detection

    Unidirectional Link Detection (UDLD) is a Brocade proprietary OAM tool used to monitor an Ethernet link

    between two Brocade NetIron devices and to provide fast detection of link failures.

    Ports enabled for UDLD exchange proprietary health-check packets once every keep-alive interval. The

    keep-alive interval can be configured between 100 ms and 6000 ms in increments of 100 ms. The default

    keep-alive interval is 500 ms.

    If a port does not receive a health-check packet from the port at the other end of the link after a number of keep-alive retry intervals, UDLD brings the port down. As a consequence, UDLD brings down the ports on both ends of the link if the link fails in one direction. The number of keep-alive retries can be configured from 3 to 10; the default is 5.
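    Because UDLD brings a port down only after several keep-alive intervals pass without a health-check packet, the detection time is roughly the keep-alive interval multiplied by the retry count. The helper below works through that arithmetic using the ranges stated above; treating detection time as the simple product is an approximation that ignores packet processing and timer alignment, and the function name is hypothetical.

```python
def udld_detection_time_ms(keepalive_ms: int = 500, retries: int = 5) -> int:
    """Approximate time for UDLD to bring a port down after the peer goes silent.

    Validates the configurable ranges described above: keep-alive interval
    100-6000 ms in 100 ms steps, and 3-10 retries.
    """
    if not 100 <= keepalive_ms <= 6000 or keepalive_ms % 100 != 0:
        raise ValueError("keep-alive interval must be 100-6000 ms in 100 ms steps")
    if not 3 <= retries <= 10:
        raise ValueError("keep-alive retries must be between 3 and 10")
    return keepalive_ms * retries


print(udld_detection_time_ms())          # defaults: 500 ms x 5 = 2500
print(udld_detection_time_ms(100, 3))    # fastest setting: 300
print(udld_detection_time_ms(6000, 10))  # slowest setting: 60000
```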

    When UDLD is enabled on a port, the port transitions into an init state to detect if the other end supports

    UDLD. The port does not go down if the other end is not UDLD-enabled.

    Figure 8 illustrates UDLD used to monitor a link between two nodes. Figure 9 is an example of a global show UDLD command. The show command also supports showing information for a specific port (not shown in the figure).

    Configuration considerations include the following:

    UDLD is supported only on Ethernet ports.

    To configure UDLD on a LAG group, you must configure the feature on each port of the group individually. Configuring UDLD on a LAG group's primary port enables the feature on that port only.

    Dynamic LAG is not supported. If you want to configure a LAG group that contains ports on which UDLD is enabled, you must remove the UDLD configuration from the ports. After you create the LAG group, you can add the UDLD configuration back.

    Tagged UDLD is also supported:

    NetIron(config)# link-keepalive ethernet 1/18 vlan 22

    Figure 8. UDLD configuration example


    Figure 9. Displaying UDLD information

    Single-Link LACP Keep-Alive

    The Single-Link Link Aggregation Control Protocol (LACP) Keep-Alive OAM tool supports a single-port Link Aggregation Group (LAG). Single-Link LACP Keep-Alive is used to monitor an Ethernet link between two devices and to provide fast detection of link failures. It is similar to UDLD, except that it uses LACP, a standard protocol, instead of a proprietary protocol between nodes.

    When should you use Single-Link LACP Keep-Alive instead of UDLD?

    UDLD is a proprietary protocol. Single-Link LACP Keep-Alive can be used to interoperate with third-party equipment that also supports this feature.

    With Single-Link LACP Keep-Alive, LACP PDUs are exchanged between the two nodes to determine if the

    connection between the devices is still active. If no LACP PDUs are received from the other node after 3

    lacp-timeout periods, a timeout event occurs and the port is blocked.

    The LACP keep-alive PDUs can be sent every 1 second (lacp-timeout short) or every 30 seconds (lacp-timeout long). Since a timeout is declared after 3 consecutive LACP keep-alive PDUs are missed, a timeout is declared after 3 seconds or 90 seconds, depending on the selected interval.
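    The same arithmetic applies to Single-Link LACP Keep-Alive, with the threshold fixed at three missed PDUs. A minimal sketch, assuming the two intervals stated above; the dictionary keys simply mirror the `lacp-timeout short | long` keywords, and the names are hypothetical.

```python
LACP_PDU_INTERVAL_S = {"short": 1, "long": 30}  # seconds between LACP keep-alive PDUs
MISSED_PDUS_BEFORE_TIMEOUT = 3


def lacp_timeout_s(mode: str) -> int:
    """Seconds of silence before a timeout event blocks the port."""
    return LACP_PDU_INTERVAL_S[mode] * MISSED_PDUS_BEFORE_TIMEOUT


for mode in ("short", "long"):
    print(mode, lacp_timeout_s(mode))  # prints "short 3", then "long 90"
```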

    To configure single-link LACP keep-alive timeout intervals:

    NetIron(config)# lacp-timeout short | long

    Figure 10 shows an example of a single-link LACP keep-alive configuration.

    Figure 10. Single-Link LACP Keep-Alive example


    IEEE 802.1ag CFM

    The IEEE 802.1ag Connectivity Fault Management (CFM) OAM tool facilitates path discovery, fault

    detection, fault verification and isolation, fault notification, and fault recovery.

    CFM terminology (see Figure 11):

    MD (Maintenance Domain). The part of a network for which faults in Layer 2 connectivity can be managed.

    MEP (Maintenance End Point). A Maintenance Point (MP) at the edge of a domain that actively sources CFM messages. There are two types of MEPs, as shown in Figure 12:

      Up (inward) MEP. For a MEP on a given physical port, an up MEP sends 802.1ag messages into the node.

      Down (outward) MEP. A down MEP sends 802.1ag messages out of the node.

      Note that up and down MEPs can be used to include or exclude more of the internal path inside a switch, as shown in Figure 13.

    MIP (Maintenance Intermediate Point). A maintenance point internal to a domain that responds only when triggered by certain CFM messages. A MIP does not actively source CFM messages.

    MA (Maintenance Association). A set of MEPs established to verify the integrity of a single service instance, for example, a VLAN or a VPLS.

    ME (Maintenance Entity). A point-to-point relationship between two MEPs within a single MA.

    MD Level. An integer from 0 to 7 in a field of a CFM PDU that is used, along with the VLAN ID, to identify which MIPs and MEPs are interested in the contents of the CFM PDU. MD levels separate the MAs of customers, service providers, and operators. The MD levels that 802.1ag recommends for customers, service providers, and operators are shown in Figure 11.

    CFM Hierarchy. MD levels create a hierarchy in which 802.1ag messages sent by customers, service providers, and operators are processed only by the MIPs and MEPs at the respective level of each message.

    A common practice is for the service provider to set up a MIP at the customer MD level at the edge of its network, as shown in Figure 11, to allow the customer to check continuity of the Ethernet service to the edge of the network. Similarly, operators set up MIPs at the service provider level at the edges of their networks, as shown in Figure 11, to allow service providers to check continuity of the Ethernet service to the edges of the operators' networks. Inside an operator network, all MIPs are at that operator's level, also shown in Figure 11.
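    The level-based filtering that creates the CFM hierarchy can be sketched as a small decision function, together with the recommended level ranges from Figure 11 (customers 7 to 5, service providers 4 to 3, operators 2 to 0). The sketch follows the simplified rule stated above: a maintenance point processes a CFM PDU only at its own MD level and otherwise lets it pass. It adds one rule from 802.1ag that the text does not spell out, namely that a MEP discards lower-level CFM PDUs so they stay inside its domain; the full standard has more cases, and all names here are hypothetical.

```python
from enum import Enum


class Role(Enum):
    CUSTOMER = "customer"
    SERVICE_PROVIDER = "service provider"
    OPERATOR = "operator"


# Recommended MD level ranges (Figure 11).
RECOMMENDED_LEVELS = {
    Role.CUSTOMER: {5, 6, 7},
    Role.SERVICE_PROVIDER: {3, 4},
    Role.OPERATOR: {0, 1, 2},
}


def role_for_level(md_level: int) -> Role:
    """Map an MD level (0-7) to the role 802.1ag recommends for it."""
    for role, levels in RECOMMENDED_LEVELS.items():
        if md_level in levels:
            return role
    raise ValueError("MD level must be an integer from 0 to 7")


def handle_cfm_pdu(mp_kind: str, mp_level: int, pdu_level: int) -> str:
    """Simplified decision for a MEP or MIP at mp_level receiving a PDU at pdu_level."""
    if pdu_level == mp_level:
        return "process"   # the PDU belongs to this maintenance point's level
    if mp_kind == "MEP" and pdu_level < mp_level:
        return "discard"   # a MEP keeps lower-level CFM traffic inside its domain
    return "forward"       # other levels pass through transparently


# An operator MIP (level 1) ignores a customer CCM (level 5) and forwards it.
print(handle_cfm_pdu("MIP", 1, 5))
```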


    Figure 11. IEEE 802.1ag terminology

    Figure 12. Up and down MEPs

    Figure 13. Using up and down MEPs to include or exclude the path inside a switch

    IEEE 802.1ag CFM supports Continuity Check Messages (CCM), Linktrace, and Loopback Messages, which

    are described in the following sections.

    (Figure 11 labels: customer MA at MD level 5 (recommended 7, 6, or 5); service provider MA at MD level 3 (recommended 4 or 3); Operator A MA and Operator B MA at MD level 1 (recommended 2, 1, or 0); MEs between MEPs, with MIPs at network edges; one operator network is Ethernet and the other MPLS. Figures 12 and 13 labels: down MEPs and up MEPs on the ports of a switch.)


    Continuity Check Messages (CCM)

    CCMs are periodic hello messages multicast by a MEP within the maintenance domain to detect continuity

    failures. If a MEP stops receiving periodic CCMs from a peer MEP on a remote bridge, it assumes that either

    the remote bridge has failed or the continuity of the path between the two bridges has been interrupted.
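    The phrase "stops receiving periodic CCMs" needs a concrete threshold in practice. ITU-T Y.1731, for example, declares loss of continuity when no CCM arrives from a peer MEP within 3.5 times the CCM transmission period. The sketch below applies that factor to the CCM periods listed for the Brocade implementation later in this section; using the Y.1731 factor for 802.1ag here is an assumption, and the class and method names are hypothetical.

```python
# CCM periods listed for the Brocade implementation, in milliseconds.
CCM_PERIODS_MS = [3.3, 10, 100, 1_000, 60_000, 600_000]

# ITU-T Y.1731 value; applying it to 802.1ag is an assumption of this sketch.
LOSS_FACTOR = 3.5


class RemoteMepMonitor:
    """Hypothetical per-peer CCM watchdog kept by a local MEP."""

    def __init__(self, ccm_period_ms: float) -> None:
        self.ccm_period_ms = ccm_period_ms
        self.last_ccm_ms = 0.0

    def on_ccm(self, now_ms: float) -> None:
        """Record the arrival time of a CCM from the peer MEP."""
        self.last_ccm_ms = now_ms

    def continuity_lost(self, now_ms: float) -> bool:
        """True once the peer has been silent longer than LOSS_FACTOR periods."""
        return now_ms - self.last_ccm_ms > LOSS_FACTOR * self.ccm_period_ms


monitor = RemoteMepMonitor(ccm_period_ms=10)
monitor.on_ccm(now_ms=1000)
print(monitor.continuity_lost(now_ms=1030))  # 30 ms of silence, under the 35 ms threshold
print(monitor.continuity_lost(now_ms=1040))  # 40 ms of silence, over the threshold
```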

    Figure 14. 802.1ag Continuity Check Messages (CCM)

    Loopback Messages (LBM)

    LBM is a unicast message used to verify the connectivity between a MEP and a peer MEP or MIP. Loopback messages are also used for fault localization.

    To verify the connectivity between a MEP and a peer MEP or MIP, the source MEP initiates an LBM with the destination MAC address set to the MAC address of the desired peer MEP or MIP. The receiving MIP or MEP responds to the LBM with a unicast Loopback Reply (LBR) addressed to the source MEP.

    LBM helps a MEP identify the location of a continuity fault along a given MA. A MIP between the source MEP and the continuity fault responds with a loopback reply; a MIP or MEP beyond the fault does not. For loopback to work, the MEP must know the MAC address of the target MIP or MEP. These MAC addresses can be discovered using the Linktrace Message.
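    The localization procedure can be sketched as a walk along the maintenance points of an MA in path order, sending an LBM to each and noting where replies stop. The ordered list of MAC addresses is assumed to come from a prior Linktrace, and `send_lbm` is a hypothetical stand-in for whatever issues the loopback and waits for the LBR.

```python
from typing import Callable, List, Optional, Tuple


def localize_fault(
    path_macs: List[str], send_lbm: Callable[[str], bool]
) -> Optional[Tuple[Optional[str], str]]:
    """Return (last MP that replied, first MP that did not), or None if all replied.

    path_macs lists MIP/MEP MAC addresses in order, starting nearest the source MEP.
    send_lbm(mac) returns True if a Loopback Reply (LBR) came back.
    """
    last_ok: Optional[str] = None
    for mac in path_macs:
        if send_lbm(mac):
            last_ok = mac
        else:
            return (last_ok, mac)  # the fault lies between these two points
    return None


# Simulated MA: only the first two MIPs are reachable.
path = ["00:00:00:00:00:01", "00:00:00:00:00:02",
        "00:00:00:00:00:03", "00:00:00:00:00:04"]
reachable = {"00:00:00:00:00:01", "00:00:00:00:00:02"}
print(localize_fault(path, lambda mac: mac in reachable))
```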

    Figure 15. 802.1ag Loopback Message (LBM)

    Linktrace Messages (LTM)

    LTM is a multicast message used by a source MEP to trace the path to other MEPs in the same MA. All reachable MIPs and MEPs respond with a Linktrace Reply (LTR) message addressed to the source MEP. The originating MEP can then determine the MAC addresses of all MIPs and MEPs belonging to the same MA.

    Note that the source MEP sends a single LTM to the next hop along the trace path, but it can receive many LTR messages from the different MIPs along the trace path and from the different MEPs terminating the branches of the trace path.

    Linktrace can also be used when no faults are apparent, to discover the routes normally taken by data through the network.
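    Because one LTM can produce many LTRs, the originating MEP has to put the replies back in path order. In 802.1ag each forwarding point decrements the LTM's TTL and each reply carries a TTL value, so replies with a larger TTL came from points closer to the source; sorting by descending TTL recovers the hop order, and several MEPs at the same TTL indicate branches. The `LinktraceReply` record below is a simplified, hypothetical structure, not the on-the-wire LTR format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LinktraceReply:
    """Hypothetical, simplified LTR: who replied and the TTL carried in the reply."""
    responder_mac: str
    ttl: int  # larger TTL = fewer hops from the source MEP


def order_replies(replies: List[LinktraceReply]) -> Dict[int, List[str]]:
    """Group responder MACs by hop number from the source MEP (1 = nearest)."""
    hops: Dict[int, List[str]] = {}
    distinct_ttls = sorted({r.ttl for r in replies}, reverse=True)
    for hop_number, ttl in enumerate(distinct_ttls, start=1):
        hops[hop_number] = sorted(r.responder_mac for r in replies if r.ttl == ttl)
    return hops


replies = [LinktraceReply("mep-b", 61), LinktraceReply("mip-1", 63),
           LinktraceReply("mep-c", 61), LinktraceReply("mip-2", 62)]
print(order_replies(replies))
```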

    Figure 16. 802.1ag Linktrace Message (LTM)


    Brocade Implementation of 802.1ag:

    CCM periods of 3.3 ms, 10 ms, 100 ms, 1 sec, 1 min, and 10 min

    Support for minimum CCM timers (3.3 ms) using hardware offload

    Support for MIPs and up/down MEPs

    Support for all 8 MD levels (0 to 7)

    Support for the following types of end points/services: VLANs, VPLS, and VLL

    Hierarchical Fault Detection using 802.1ag

    As shown in Figure 11, 802.1ag CFM defines a domain hierarchy in which customers, service providers, and operators use different MD levels. This hierarchy is also used for fault detection.

    Figure 17 illustrates an example in which a customer has an Ethernet service between Sites 1 and 2. This Ethernet service is provided by Operators A and B. Operator B supports the service at the core with an MPLS network. Operator A supports the service at Metro Locations 1 and 2 using a Layer 2 Ethernet network.

    In Figure 17, a service continuity fault occurs inside Operator B's network. The customer can detect an end-to-end service continuity fault using CCMs, but it cannot determine where the fault lies within the operators' networks. Operator A can detect that a service continuity fault exists within Operator B's network. Operator B can detect the service continuity fault, but it cannot isolate its location using 802.1ag CFM, since it has an MPLS network. Operator B needs to use MPLS OAM tools to isolate the fault location.

    Figure 17. Example of 802.1ag hierarchical fault detection (refer to the numbered items below)


    To simplify this example, the service provider level is not shown. If it were, the service provider would be

    represented by the overall network from Operator A in Location 1 through Operator B to Operator A in

    Location 2.

    The following is an example of how this fault can be detected at the different levels of the hierarchy:

    1. The customer detects a service continuity fault using CCMs.

    2. Using Linktrace, the customer finds that the fault is beyond the MIPs at the border of Operator A.

    3. Operator A detects a service continuity fault using CCMs.

    4. Using Linktrace, Operator A determines that the fault is inside Operator B's network.

    5. Operator B detects a service continuity fault using CCMs.

    Operator B uses MPLS OAM tools to determine the location of the fault in its MPLS network; see the MPLS OAM section for details on MPLS-specific OAM tools. This underscores that each operator needs to use the OAM tools appropriate to the type of network it runs: Operator B has an MPLS network and needs MPLS OAM tools, while Operator A has a Layer 2 Ethernet network and can use 802.1ag CFM. Note that Operator B's MPLS network must support 802.1ag CFM messages over VPLS and VLL so that the customer and Operator A can use 802.1ag end to end (see footnote 1).

    Note that the customer, Operator A, and Operator B can concurrently and independently detect the continuity fault and run Linktrace to locate it. The steps above are numbered for easy reference to the respective actions depicted in Figure 17; the numbering does not imply an ordered sequence of events. That is, Operator A does not have to wait for the customer to report that the service is broken before running its own continuity check.

    Note that the CCMs shown in Figure 17 can be set up to run continuously to detect potential continuity faults, or they can be set up on demand as needed.

    IEEE 802.1ag Configuration Example

    In Figure 18, a customer has a point-to-point service (VLL) over an MPLS network. In this example, the customer runs CCMs at 10 ms intervals at MD level 7 between CE1 and CE2. The service provider runs CCMs at 10 ms intervals at MD level 4 between PE1 and PE2.

    Figure 19 and Figure 20 show example configurations for CE1, CE2, PE1, and PE2 in Figure 18.

    1 Brocade supports 802.1ag CFM over VPLS and VLL to allow Ethernet OAM to function end-to-end over an

    MPLS core network.


    Figure 18. Example of 802.1ag configuration

    Figure 19. CE1 and CE2 configurations

    (Figure 18 labels: CE1 and CE2 connected through PE1 and PE2 by a VLL across an MPLS network; customer down MEPs and customer MIPs at MD level 7; service provider up MEPs at MD level 4; customer CCMs and service provider CCMs; ports 1/1 and 2/1.)


    Figure 20. PE1 and PE2 configurations

    IEEE 802.1ag CFM versus ITU-T Y.1731 OAM

    ITU-T Y.1731 OAM is a superset of IEEE 802.1ag CFM (see footnote 2). ITU-T Y.1731's ETH-CC (Ethernet Connectivity Check), ETH-LB (Ethernet Loopback), and ETH-LT (Ethernet Linktrace) OAM functions are equivalent to 802.1ag CCM, LBM, and LTM, respectively. Devices deploying 802.1ag CCM, LBM, and LTM can interoperate with devices deploying Y.1731 ETH-CC, ETH-LB, and ETH-LT, respectively. However, Y.1731 ETH-CC supports either multicast or unicast messages, while 802.1ag CCM supports multicast messages only. Therefore, for 802.1ag CCM to interoperate with Y.1731 ETH-CC, the Y.1731 device must be set up to use multicast ETH-CC messages.

    ITU-T Y.1731 Performance Management

    ITU-T Y.1731 Performance Management (PM) supports on-demand measurement of round-trip Frame Delay (FD) and Frame Delay Variation (FDV). These measurements are made between defined MEPs (see Figure 21).

    The main benefit of Y.1731 PM is Service-Level Agreement (SLA) monitoring and verification for services provided to customers in aggregation, metro, and core networks. SLA monitoring and verification are essential for delay-sensitive applications, for example, voice, and for services with SLA guarantees.

    The Brocade implementation supports a high-precision, hardware-based time-stamping mechanism that

    provides measurements with microsecond granularity. It also supports delay measurements for Layer 2

    bridging services and for VPLS and VLL services.

    Figure 21. Y.1731 delay measurement

    2 Besides CFM and other functionality, ITU-T Y.1731 also includes Performance Management, which is

    addressed in this paper.

    (Figure 21 labels: ETH-DM exchanged between MEP 2 and MEP 3 on two Brocade MLX devices.)


    Figure 22 shows an example of the Y.1731 delay measurement between MEP 3 and MEP 2 shown in Figure 21. The command sends a selectable number (default is 10) of delay measurement PDUs (ETH-DM), which are time-stamped in hardware at the source and destination MEPs to achieve high-precision measurements independent of software delays. The command averages the individual measurements and lists the resulting minimum, average, and maximum delays.
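    The arithmetic behind a hardware-timestamped round-trip measurement can be sketched as follows. This assumes the two-way ETH-DM exchange of ITU-T Y.1731 as we understand it (a DMM sent by the initiating MEP and a DMR returned by the peer, with four timestamps per exchange); the function names and the sample timestamps are invented for illustration, and FDV is taken here as the absolute difference between consecutive delays, which is one common convention rather than necessarily the one the Brocade command uses.

```python
from statistics import mean


def two_way_delay(tx_f, rx_f, tx_b, rx_b):
    """Two-way frame delay from the four timestamps of one DMM/DMR exchange.

    tx_f: DMM sent by the initiating MEP      rx_f: DMM received by the peer MEP
    tx_b: DMR sent by the peer MEP            rx_b: DMR received by the initiator
    Subtracting the peer's turnaround time (tx_b - rx_f) means the two MEP
    clocks do not need to be synchronized.
    """
    return (rx_b - tx_f) - (tx_b - rx_f)


def summarize(samples):
    """Min/avg/max delay plus frame delay variation between consecutive samples."""
    delays = [two_way_delay(*s) for s in samples]
    fdv = [abs(later - earlier) for earlier, later in zip(delays, delays[1:])]
    return {"min": min(delays), "avg": mean(delays), "max": max(delays), "fdv": fdv}


# Invented timestamps in microseconds, one tuple per DMM/DMR exchange.
samples = [(0, 60, 70, 130), (1000, 1065, 1075, 1140), (2000, 2058, 2068, 2130)]
result = summarize(samples)
print(result["min"], result["max"], result["fdv"])  # 120 130 [10, 10]
```

    Because only differences of timestamps taken on the same clock are used, the result is independent of any offset between the two MEPs' clocks, which is why hardware timestamping at both ends is sufficient for microsecond-level precision.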

    Figure 22. Y.1731 delay measurement example

    IEEE 802.3ah Ethernet in the First Mile (EFM) Link OAM

    IEEE 802.3ah EFM link OAM monitors individual links and supports their troubleshooting. That is, 802.3ah OAM operates on a point-to-point link and does not propagate beyond a single hop. As shown in Figure 23, this IEEE standard was originally developed to monitor the link between a service provider and a customer, usually called the first mile link.

    802.3ah EFM OAM supports the following functions:

    OAM discovery: Used to discover the 802.3ah EFM OAM capabilities of the peer device.

    Remote failure indication (critical events): Used to inform the peer node that the receive path of the link is non-operational. Also includes communication of conditions such as dying gasp.

    Link monitoring: Can generate event notifications (alarms) when defined error thresholds are exceeded.

    Remote loopback testing: Puts the peer in a data loopback state.

    802.3ah supports two modes of operation:

    Active mode: Normally used by a device controlled by a service provider. The device can source OAM PDUs in order to initiate the EFM OAM discovery process.

    Passive mode: Normally used by customer devices connected to a service provider device. The device cannot source OAM PDUs, but it can respond to received OAM PDUs.
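    A practical consequence of the two modes is that at least one end of a link must be in active mode, or discovery never starts. The sketch below encodes that rule, plus the rule that only an active end may place its peer in remote loopback; the loopback rule reflects our reading of the mode capabilities in IEEE 802.3 Clause 57 and should be treated as an assumption. All names are hypothetical.

```python
from enum import Enum


class OamMode(Enum):
    ACTIVE = "active"    # typically the service provider device
    PASSIVE = "passive"  # typically the customer device


def discovery_can_start(local: OamMode, remote: OamMode) -> bool:
    """Discovery is initiated only by an active end, so at least one is needed."""
    return OamMode.ACTIVE in (local, remote)


def can_request_remote_loopback(local: OamMode) -> bool:
    """Only an active end may put its peer into data loopback."""
    return local is OamMode.ACTIVE


print(discovery_can_start(OamMode.ACTIVE, OamMode.PASSIVE))   # True
print(discovery_can_start(OamMode.PASSIVE, OamMode.PASSIVE))  # False
print(can_request_remote_loopback(OamMode.PASSIVE))           # False
```

    This is why the provider side is normally configured as active: a customer device left in its default passive mode still participates once the provider initiates discovery.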


    Figure 23. IEEE 802.3ah EFM OAM

    Figure 24 shows an example of the output of an 802.3ah EFM OAM show command. Note that the show command displays not only local link OAM information, but also remote link OAM information.

    Figure 24. Example of 802.3ah EFM OAM show command

    Layer 2 OAM Summary

    Table 1 presents a summary of the Layer 2 OAM tools described in this section.

    Table 1. Layer 2 OAM tools

    | Tool | Intended application | Supports | Generation | Standard |
    |---|---|---|---|---|
    | Layer 2 Trace | Layer 2 network troubleshooting and detection of misconfiguration | Layer 2 topology discovery, Layer 2 loop detection | Manual | No |
    | Port Loop Detection | Layer 2 network troubleshooting and detection of misconfiguration | Layer 2 loop detection | Automatic | No |
    | UDLD | Single-link keep-alive | Single-link keep-alive | Automatic | No |
    | Single-Link Keep-Alive | Single-link keep-alive | Single-link keep-alive | Automatic | Yes |
    | 802.1ag CFM | Service verification | Layer 2 connectivity check, linktrace, loopback | CC: automatic; LT, LB: manual | Yes |
    | Y.1731 PM | Performance (SLA) verification | One-way delay and delay variation | Manual | Yes |
    | 802.3ah EFM OAM | Customer access verification | Single-link OAM: fault detection, discovery, loopback | Automatic; manual (LB) | Yes |



    MPLS OAM TOOLS

    This section addresses the MPLS OAM tools listed in Figure 4:

    LSP Ping

    LSP Traceroute

    BFD for RSVP-TE LSPs

    LSP Ping

    LSP Ping provides OAM functionality for MPLS networks based on RFC 4379. LSP Ping is used to detect

    data plane failure and to check the consistency between the data plane and the control plane.

    LSP Ping verifies that packets that belong to a particular Forwarding Equivalence Class (FEC) actually end

    their MPLS path on a Label Switching Router (LSR) that is an egress for that FEC. LSP Ping sends MPLS

    echo requests following the same data path that normal MPLS packets would traverse (Figure 25).

    LDP LSP Ping and RSVP LSP Ping are supported, as shown in Figure 26 and Figure 27, respectively.
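    The check LSP Ping performs can be illustrated with a toy model: follow the label swaps hop by hop exactly as ordinary traffic would, then ask the router where the packet lands whether it is an egress for the FEC. The sketch returns RFC 4379 return code 3 ("replying router is an egress for the FEC") or 4 ("replying router has no mapping for the FEC"). The Lsr class, its swap table, and lsp_ping are hypothetical; a real echo request is a UDP packet carrying a Target FEC Stack TLV, which this sketch does not build.

```python
from dataclasses import dataclass, field

# Subset of MPLS echo return codes defined in RFC 4379.
EGRESS_FOR_FEC = 3      # replying router is an egress for the FEC
NO_MAPPING_FOR_FEC = 4  # replying router has no mapping for the FEC


@dataclass
class Lsr:
    """Hypothetical LSR model: label swap table plus the FECs it terminates."""
    name: str
    swap: dict = field(default_factory=dict)       # in-label -> (out-label, next LSR)
    egress_fecs: set = field(default_factory=set)  # FECs this LSR is egress for


def lsp_ping(routers, fec, first_label, first_hop, ttl=255):
    """Send a simulated echo request along the data plane and return
    (replying LSR, return code) as the echo reply would report them."""
    node, label = routers[first_hop], first_label
    while label in node.swap and ttl > 1:
        # Forward exactly as labeled user traffic would be forwarded.
        label, next_name = node.swap[label]
        node = routers[next_name]
        ttl -= 1
    # Control plane check at the point where the data plane delivered the packet.
    code = EGRESS_FOR_FEC if fec in node.egress_fecs else NO_MAPPING_FOR_FEC
    return node.name, code


routers = {
    "P1": Lsr("P1", swap={100: (200, "P2")}),
    "P2": Lsr("P2", swap={200: (300, "PE2")}),
    "PE2": Lsr("PE2", egress_fecs={"10.0.0.2/32"}),
}
print(lsp_ping(routers, "10.0.0.2/32", 100, "P1"))  # ('PE2', 3)
```

    A return code of 4 from the last router indicates that the data plane delivered the FEC's packets somewhere the control plane does not consider an egress, which is exactly the data plane/control plane inconsistency LSP Ping is meant to expose.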

    Figure 25. LSP Ping operation

    Figure 26. LDP LSP Ping example

    Figure 27. RSVP LSP Ping example

    [Figure 25 diagram: echo request and echo reply between the PE (LER) routers, along an LSP through a P (LSR) router in the MPLS network]


    LSP Traceroute

    LSP Traceroute provides OAM functionality for MPLS networks based on RFC 4379. LSP Traceroute is used

    to isolate a data plane failure to a particular router and to provide LSP path tracing.

    With LSP Traceroute, echo request packets are sent with successively increasing MPLS TTL values so that each transit LSR and, finally, the egress LER receives one. Each echo request follows the same data path that normal MPLS packets would traverse. A transit LSR or LER receiving an echo request checks that it is indeed a transit LSR or LER for this path and returns an echo reply (Figure 28).

    LDP LSP Traceroute and RSVP LSP Traceroute are supported, as shown in Figure 29 and Figure 30, respectively.
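    LSP Traceroute (RFC 4379) reaches each hop by sending echo requests with MPLS TTL 1, 2, 3, and so on; the LSR where the TTL expires answers. The toy simulation below shows how this localizes a failure: transit LSRs reply with return code 8 ("label switched"), the egress replies with code 3, and probes that are silently dropped past a broken hop time out. The path model (name, egress FECs, whether the LSR forwards) and the function name are hypothetical simplifications; in particular, a real LSR that cannot forward may send an error return code instead of staying silent.

```python
# Subset of RFC 4379 return codes used in this sketch.
EGRESS_FOR_FEC = 3   # replying router is an egress for the FEC
LABEL_SWITCHED = 8   # label switched at stack-depth <RSC>


def lsp_traceroute(path, fec):
    """Simulate LSP Traceroute over `path`, a hypothetical list of
    (lsr_name, egress_fecs, forwards) tuples in data-plane order.
    Returns (ttl, replying LSR or "*", return code or None) per probe."""
    trace = []
    for ttl in range(1, len(path) + 1):
        replier = None
        for depth, (name, egress_fecs, forwards) in enumerate(path, start=1):
            if depth == ttl:
                replier = (name, egress_fecs)  # TTL expires here, so this LSR replies
                break
            if not forwards:
                break                          # broken data path: probe is dropped
        if replier is None:
            trace.append((ttl, "*", None))     # no echo reply received
            continue
        name, egress_fecs = replier
        if fec in egress_fecs:
            trace.append((ttl, name, EGRESS_FOR_FEC))
            break                              # reached the egress LER
        trace.append((ttl, name, LABEL_SWITCHED))
    return trace


fec = "10.0.0.2/32"
healthy = [("P1", set(), True), ("P2", set(), True), ("PE2", {fec}, True)]
broken = [("P1", set(), True), ("P2", set(), False), ("PE2", {fec}, True)]
print(lsp_traceroute(healthy, fec))  # [(1, 'P1', 8), (2, 'P2', 8), (3, 'PE2', 3)]
print(lsp_traceroute(broken, fec))   # [(1, 'P1', 8), (2, 'P2', 8), (3, '*', None)]
```

    In the broken case the last LSR to answer is P2, so the failure is isolated to P2 or the link leaving it.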

    Figure 28. LSP Traceroute operation

    Figure 29. LDP LSP Traceroute example

    Figure 30. RSVP LSP Traceroute example

    LSP Ping and LSP Traceroute Considerations

    The following are common considerations for LSP Ping and LSP Traceroute:

    Redundant RSVP LSPs. LSP Ping or LSP Traceroute on an LSP is performed on the currently active path.

    One-to-one Fast ReRoute (FRR) LSPs. LSP Ping or LSP Traceroute on a one-to-one FRR LSP is performed on the active path. If a path switchover occurs while a Ping or Traceroute is in progress, the echo request is sent out on the old active path.

    FRR bypass LSPs. You can Ping or Traceroute the protected LSP and the bypass tunnel separately, for example, by specifying the name of the LSP.



    Transit-originated detour. You can initiate a Ping or Traceroute operation on a transit-originated detour LSP. Because the session name does not uniquely identify a session on a transit LSR, you need to specify the entire session ID (including the tunnel end-point, tunnel ID, and extended tunnel ID) of the detour LSP to which the LSP Ping or Traceroute command is applied.

    LSP re-optimization. If LSP re-optimization occurs while a Ping or Traceroute is in progress, the echo request is sent out on the current LSP instance until the new instance is created.

    BFD for RSVP-TE LSPs

    Bidirectional Forwarding Detection (BFD) for RSVP-TE LSPs defines a method for rapidly detecting failures in the data path of an LSP (Figure 31). Although LSP Ping can also be used for this purpose, BFD for RSVP-TE LSPs provides the following advantages:

    BFD for RSVP-TE LSPs can be configured to dynamically detect data plane failures of MPLS RSVP LSPs.

    BFD for RSVP-TE LSPs provides faster failure detection, since it does not require control plane verification as LSP Ping does.

    BFD for RSVP-TE LSPs can concurrently detect faults on a number of LSPs without the manual interaction that LSP Ping requires.

    BFD allows for the detection of a forwarding path failure in 300 milliseconds or less (depending on the configuration).
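    The "300 milliseconds or less" figure follows from how BFD computes its detection time. A minimal sketch, assuming asynchronous mode as described in RFC 5880: the agreed interval is the larger of the remote end's Desired Min TX Interval and the local Required Min RX Interval, multiplied by the remote Detect Mult. The 100 ms interval and multiplier of 3 used below are illustrative values, not Brocade defaults, and the function name is hypothetical.

```python
def bfd_detection_time_ms(remote_detect_mult: int,
                          remote_desired_min_tx_ms: int,
                          local_required_min_rx_ms: int) -> int:
    """Asynchronous-mode detection time as described in RFC 5880.

    The agreed interval is the slower of what the remote end wants to send
    and what the local end is willing to receive; the session is declared
    down after `remote_detect_mult` such intervals pass without a packet.
    """
    agreed_interval_ms = max(remote_desired_min_tx_ms, local_required_min_rx_ms)
    return remote_detect_mult * agreed_interval_ms


print(bfd_detection_time_ms(3, 100, 100))  # 300
print(bfd_detection_time_ms(3, 100, 250))  # 750
```

    The second call shows why both ends must be tuned: a conservative receive interval on one side stretches the detection time for the whole session.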

    Figure 31. BFD for RSVP-TE operation

    BFD for RSVP-TE LSPs should be used selectively to monitor unreliable paths, such as those through non-MPLS devices (for example, optical switches). In Figure 32, for example, the LSP traverses optical switches. The optical switches keep the links to the MPLS routers up even when a failure occurs between the optical switches. As a result, the MPLS routers would not trigger a path switchover, because, as far as they are concerned, the link between them is up. BFD for RSVP-TE LSPs detects the LSP path failure and triggers a path switchover.³

    Since a link failure will trigger FRR directly, the only benefit of using BFD for RSVP-TE LSP when there are no

    optical switches (or other transport types that would prevent MPLS routers from detecting the physical path

    as down) would be to detect control plane failures.

    ³ In configurations in which there is no alternative path, the LSP is brought down and the BFD session is deleted. The LSP then follows the normal retry procedures to come back up.



    Figure 32. BFD for RSVP-TE LSP used to monitor paths through non-MPLS devices

    BFD for RSVP-TE LSPs can be enabled or disabled on the fly at the global MPLS level⁴ (see Figure 33) or for each individual RSVP LSP (see Figure 34) without affecting the LSP operational status. In addition, BFD for RSVP-TE LSP parameters can be changed on the fly without changing the state of the BFD session.

    Figure 33. Enabling BFD for RSVP LSP globally

    Figure 34. Enabling BFD for a specific RSVP-TE LSP

    MPLS OAM Summary

    Table 2 presents a summary of the MPLS OAM tools described in this section.

    Table 2. MPLS OAM tools

    | Tool | Intended application | Supports | Generation | Standard |
    |---|---|---|---|---|
    | LSP Ping | To detect data plane failure and to check the consistency between the data plane and the control plane | Connectivity verification | Manual | Yes |
    | LSP Traceroute | To isolate a data plane failure to a particular router and to provide LSP path tracing | Connectivity troubleshooting, fault localization | Manual | Yes |
    | BFD for RSVP-TE LSPs | Fast data plane failure detection for RSVP LSPs | Fast data plane failure detection (link may be up, but data path is down) | Automatic | Yes |

    ⁴ The number of BFD sessions supported by the system must be taken into account when enabling BFD for RSVP-TE globally.



    IP AND VRF OAM TOOLS

    This section addresses the IP and L3VPN (VRF) OAM tools listed in Figure 4:

    IP and VRF Ping

    IP and VRF Traceroute

    BFD for OSPFv2, OSPFv3, IS-IS, and BGP4

    IP and VRF Ping

    IP Ping is a tool used to verify connectivity at the IP level. The IP ping command sends an Internet Control

    Message Protocol (ICMP) echo request to the IP address or selected hostname and waits for a reply (see

    Figure 35). The Ping VRF option lets you ping an address on a specific L3VPN, that is, an address

    associated with a VRF table.

    Figure 36 shows an example of IPv4 Ping, while Figure 37 shows an example of IPv6 Ping. Note that Ping VRF is supported for both IPv4 and IPv6.
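    For reference, the ICMP echo request that IP Ping sends has a very small format. The sketch below builds an ICMPv4 Echo Request (type 8, code 0) with the RFC 1071 Internet checksum; the function names are hypothetical. It covers IPv4 only (ICMPv6 echo uses type 128 and a checksum over an IPv6 pseudo-header), it does not send anything (that needs a raw socket and usually elevated privileges), and VRF selection happens on the router, so it is not modeled here.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement sum of 16-bit words, complemented."""
    if len(data) % 2:
        data += b"\x00"                    # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                     # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def icmp_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """ICMPv4 Echo Request: type 8, code 0, checksum, identifier, sequence, data."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


packet = icmp_echo_request(0x1234, 1, b"ping")
print(packet[0], internet_checksum(packet))  # 8 0
```

    Recomputing the checksum over a packet that already carries a correct checksum yields 0, which is how the receiver validates the echo request before answering with an Echo Reply.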

    Figure 35. IP Ping operation

    Figure 36. IPv4 Ping example

    Figure 37. IPv6 Ping example

    IP and VRF Traceroute

    The IP Traceroute tool identifies the path that packets take through a network on a hop-by-hop basis. It works by sending ICMP echo packets with increasing IP Time-to-Live (TTL) values to the destination (see Figure 38).

    The Traceroute VRF option lets you traceroute an address on a specific L3VPN, that is, an address

    associated with a VRF table.

    Figure 39 shows an example of IPv4 Traceroute, while Figure 40 shows an example of IPv6 Traceroute. Note that Traceroute VRF is supported for both IPv4 and IPv6.
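    The hop-by-hop discovery works because each router decrements the TTL and, when it reaches zero, discards the probe and returns an ICMP Time Exceeded message (type 11); once the TTL is large enough, the destination itself answers with an Echo Reply (type 0). A minimal simulation, assuming ICMP echo probes as described above and a hypothetical path model in which every hop answers:

```python
ECHO_REPLY = 0      # ICMP type returned by the destination
TIME_EXCEEDED = 11  # ICMP type returned by a router where the TTL expires


def traceroute(path, max_ttl=30):
    """Simulate traceroute with ICMP echo probes over `path`, a hypothetical
    list of router names that ends with the destination. Each probe is sent
    with TTL = 1, 2, 3, ... and each router decrements the TTL by one."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        if ttl < len(path):
            # TTL drops to zero at router number `ttl`, which answers.
            hops.append((ttl, path[ttl - 1], TIME_EXCEEDED))
        else:
            # TTL is large enough to reach the destination.
            hops.append((ttl, path[-1], ECHO_REPLY))
            break
    return hops


for hop in traceroute(["R1", "R2", "10.1.1.1"]):
    print(hop)
# (1, 'R1', 11)
# (2, 'R2', 11)
# (3, '10.1.1.1', 0)
```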



    Figure 38. IP Traceroute operation

    Figure 39. IPv4 Traceroute example

    Figure 40. IPv6 Traceroute example

    BFD for OSPFv2, OSPFv3, IS-IS, and BGP4

    Bidirectional Forwarding Detection (BFD) defines a method for rapidly detecting the failure of a forwarding path by checking that the next-hop router is alive. Without BFD enabled, it can take from 3 to 30 seconds to detect that a neighboring router is not operational, and packets are lost during that time.

    BFD can detect data path failures when a link is up but the data path is not, for example, failures due to misconfiguration or to paths through optical switches (see Figure 41). BFD allows for the detection of a forwarding path failure in 300 ms or less (depending on the configuration). When BFD is enabled on a routed interface, a BFD session is automatically established when a neighbor router is discovered.
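    Session establishment uses a three-way handshake in which each side advertises its current state in its control packets. The sketch below follows the state diagram in RFC 5880 for the Down, Init, and Up states; it omits discriminators, timers, and a locally initiated AdminDown, and the names next_state and TIMER are hypothetical.

```python
DOWN, INIT, UP, ADMIN_DOWN = "Down", "Init", "Up", "AdminDown"
TIMER = "Timer"  # pseudo-event: detection time expired without a packet


def next_state(local: str, event: str) -> str:
    """Next local session state, following the RFC 5880 state diagram.

    `local` is Down, Init, or Up; `event` is the state carried in a received
    BFD control packet (Down, Init, Up, AdminDown) or TIMER.
    """
    if event in (ADMIN_DOWN, TIMER):
        return DOWN
    if local == DOWN:
        return {DOWN: INIT, INIT: UP, UP: DOWN}[event]
    if local == INIT:
        return INIT if event == DOWN else UP
    return DOWN if event == DOWN else UP  # local == UP


# Three-way handshake between two neighbors that both start in Down.
a = b = DOWN
a = next_state(a, b)   # A hears Down -> Init
b = next_state(b, a)   # B hears Init -> Up
a = next_state(a, b)   # A hears Up   -> Up
print(a, b)            # Up Up
print(next_state(UP, TIMER))  # Down
```

    The TIMER transition is the one that matters operationally: once the detection time passes without a packet, the session drops to Down and the client protocol (OSPF, IS-IS, or BGP4) is notified.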

    Figure 41. BFD operation



    Figure 42 shows an example of BFD configuration. BFD can be enabled or disabled for all interfaces or per interface for use with OSPFv2 (that is, IPv4), OSPFv3 (that is, IPv6), and IS-IS, as shown in Figure 43, Figure 44, and Figure 45, respectively.

    Figure 42. BFD configuration example

    Figure 43. Enabling/disabling BFD for OSPFv2 for all interfaces (top) or per interface (bottom)

    Figure 44. Enabling/disabling BFD for OSPFv3 for all interfaces (top) and per interface (bottom)

    Figure 45. Enabling/disabling BFD for IS-IS for all interfaces (top) and per interface (bottom)


    BFD for BGP4 supports single-hop and multi-hop BFD on Ethernet, POS, and Virtual Interfaces. BFD for BGP4 can be enabled or disabled at the global BGP router level, for each individual peer, or for a peer group, as shown in Figure 46, Figure 47, and Figure 48, respectively.
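    Single-hop and multi-hop BFD differ on the wire: single-hop BFD (RFC 5881) uses UDP destination port 3784 and is sent with a TTL or hop limit of 255, while multi-hop BFD (RFC 5883) uses UDP port 4784. The sketch below picks the encapsulation for a BGP4 peer. Treating a peer on a directly connected subnet as single-hop is a hypothetical rule chosen for illustration; routers normally take this from configuration, and bfd_encapsulation is not a real API.

```python
import ipaddress

BFD_SINGLE_HOP_PORT = 3784  # RFC 5881: BFD for IPv4/IPv6 (single hop)
BFD_MULTIHOP_PORT = 4784    # RFC 5883: BFD for multihop paths


def bfd_encapsulation(peer: str, connected_subnets: list) -> dict:
    """Pick BFD encapsulation for a BGP4 peer. Treating a peer as single-hop
    when it sits on a directly connected subnet is a hypothetical rule here."""
    peer_ip = ipaddress.ip_address(peer)
    if any(peer_ip in ipaddress.ip_network(net) for net in connected_subnets):
        # Single-hop packets are sent with TTL/hop limit 255 so the receiver can
        # reject packets that arrived from more than one hop away.
        return {"mode": "single-hop", "udp_dst_port": BFD_SINGLE_HOP_PORT, "ttl": 255}
    return {"mode": "multihop", "udp_dst_port": BFD_MULTIHOP_PORT}


connected = ["192.0.2.0/30", "2001:db8::/64"]
print(bfd_encapsulation("192.0.2.2", connected)["mode"])     # single-hop
print(bfd_encapsulation("198.51.100.7", connected)["mode"])  # multihop
```

    A typical case for multi-hop BFD is an iBGP session between loopback addresses, where the peer is reachable only through other routers.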

    Figure 46. Enabling/disabling BFD globally for BGP4

    Figure 47. Enabling/disabling BFD for a specific BGP4 peer

    Figure 48. Enabling/disabling BFD for a BGP4 peer group

    IP and VRF OAM Summary

    Table 3 presents a summary of the IP and VRF OAM tools described in this section.

    Table 3. IP and VRF OAM tools

    | Tool | Intended application | Supports | Generation | Standard |
    |---|---|---|---|---|
    | IP Ping, VRF Ping | Connectivity verification at the IP level | Connectivity verification | Manual | Yes |
    | IP Traceroute, VRF Traceroute | Identification of the path that IP packets take through a network on a hop-by-hop basis | Connectivity troubleshooting, fault localization | Manual | Yes |
    | BFD for OSPFv2, OSPFv3, IS-IS, BGP4 | Fast data path failure detection | Data path failure detection (link may be up, but data path is down) | Automatic | Yes |


    SUMMARY

    This paper reviewed the OAM tools available for MPLS, IP, and Ethernet networks at various layers of the stack and described best practices for choosing the right OAM tool for a particular network deployment. These tools give operators unparalleled power to proactively manage networks and customer Service Level Agreements (SLAs). They address fault detection, fault verification, and fault isolation; enable proactive detection of service degradation; and provide service performance monitoring and SLA verification.

    © 2010 Brocade Communications Systems, Inc. All Rights Reserved. 11/10 GS-BP-356-00

    Brocade, the B-wing symbol, BigIron, DCFM, DCX, Fabric OS, FastIron, IronView, NetIron, SAN Health, ServerIron, TurboIron, and Wingspan

    are registered trademarks, and Brocade Assurance, Brocade NET Health, Brocade One, Extraordinary Networks, MyBrocade, VCS, and VDX

    are trademarks of Brocade Communications Systems, Inc., in the United States and/or in other countries. Other brands, products, or

    service names mentioned are or may be trademarks or service marks of their respective owners.

    Notice: This document is for informational purposes only and does not set forth any warranty, expressed or implied, concerning

    any equipment, equipment feature, or service offered or to be offered by Brocade. Brocade reserves the right to make changes

    to this document at any time, without notice, and assumes no responsibility for its use. This informational document describes

    features that may not be currently available. Contact a Brocade sales office for information on feature and product availability.

    Export of technical data contained in this document may require an export license from the United States government.