Carrier Class Operating System


    White Paper

    Juniper Networks, Inc.

    1194 North Mathilda Avenue

    Sunnyvale, California 94089

    USA

    408.745.2000

    1.888 JUNIPER

    www.juniper.net

    Architectural Issues in Carrier Class Operating Systems

    Jeff Doyle

    JUNOS Product Management

    Part Number: 200209-001 Dec 2006


    Copyright 2006, Juniper Networks, Inc.


    Table of Contents

    Executive Summary
    Introduction
    Router Operating System Objectives
      Objectives for Any Router OS
        Open Standards Support
        Flexibility
        Manageability
        Basic Security
        Service and Support
        Basic Reliability
      What Makes a Router OS Carrier Class?
        Stability
        Advanced Security
        Scalability
        Precision
        High Availability
        Consistency
        Predictability
        Carrier Class Reliability
    JUNOS Architecture
      Modularity
      Managing Modular Architectures
      Intelligent Modular Design
      Intelligent Modular Design: The JUNOS Routing Module
      Intelligent Modular Design: The Periodic Packet Management Daemon
      The JUNOS Kernel
    Engineering Discipline
      JUNOS Release Schedule
      JUNOS Single Train Release Model
      New Product Introduction
    Conclusions


    Executive Summary

    Juniper Networks' original market was large service providers, carriers, and other high performance networks requiring the utmost levels of dependability while providing a rich set of features. We recognized that no router operating systems existed at that time to answer these requirements. Only recently have other vendors begun offering router operating systems that are being positioned as carrier class.

    This paper examines the characteristics that define a carrier class router operating system, and the architectural and engineering practices that are required to support these characteristics.

    From its inception Juniper Networks has maintained that a modular software architecture is fundamental to any carrier class operating system. Although at least one of our competitors has long disagreed with that assertion, they have recently released a modular operating system of their own, claiming that their architecture is superior to JUNOS because it is more modular. We contend that this is a naïve understanding of the benefits of modularity, arising from inexperience in building and managing such software. Modularity is an engineering tool for creating a reliable operating system, and it is as important to understand the limitations of modularity as it is to understand its usefulness.

    Even the most well designed operating system, however, cannot continue to deliver carrier

    class quality unless it is supported by disciplined engineering practices. Complexity must be

    controlled through unwavering adherence to strict development processes and release standards;

    otherwise an operating system quickly becomes unpredictable, unreliable, and unmanageable.

    Introduction

    For almost two decades IP has been synonymous with Internet services. When you thought of an IP network you thought of web browsing, e-mail, data access and transfer, and IM. During those early years network operators gained experience and confidence in IP as a foundation communications protocol, and now we are in the beginning years of building much more demanding and critical services over IP networks. Voice, video, an array of business and entertainment services, military and emergency response communications, industrial sensors and controls, mobile and wireless services: these are just some of the capabilities that are now being deployed over IP networks.

    The driver for the move to consolidation of multiple services over IP is economics: It is far cheaper to build and operate a single infrastructure that can support many services, and it is attractive to customers to receive multiple services from a single provider over a single connection. One of the more prominent examples of this move is BT's 21CN project. Once the incumbent telephone monopoly in the United Kingdom, BT is abandoning its circuit-switched voice infrastructure and consolidating both old and new services onto a high-performance IP backbone, and in the process transforming itself from a traditional PSTN into a cutting-edge, forward-thinking communications company. While BT itself calls the move "radical," one of the expected benefits should make sense to the most conservative of executives: An annual savings, when the transition is complete, of £1 billion ($1.86 billion US).

    "Digitised voice, data and video can now be combined, changed, merged and manipulated on a single digital platform," says BT's Paul Reynolds. "And if it is the ability to merge multiple information formats on a single platform that is driving the desire for convergence at a device level, the availability of carrier class IP networks, multi-service networks and software-driven switching, are fuelling the agenda for fundamental change in our industry."


    Given its corporate history, BT certainly understands exactly what is meant by "carrier class IP networks": It is the PSTNs and telcos, with over a century of developing and operating circuit-switched networks, that have set the modern expectations for communications service quality. Best-effort packet delivery is perfectly acceptable for early Internet-oriented applications such as file transfers and e-mail, but is wholly unacceptable for quality-sensitive applications such as voice and video. Convergence of such services onto a common IP infrastructure can succeed only if the service quality meets or exceeds that of legacy networks.

    The heart of all IP networks is the packet processors (routers), and the heart of all routers is the operating system. A carrier class IP network, then, must begin with a carrier class router operating system.

    Router Operating System Objectives

    The two basic functions of any router are route processing and packet switching. These functions are accomplished, respectively, by two logical entities: the control plane and the forwarding plane. The router's operating system is the software that creates these two logical entities; the routing protocols and the various databases that the routing protocols use to build the forwarding information, for example, are components of the control plane. The OS also manages the physical components of the router, and is the means by which you access the router both directly, such as the CLI, and indirectly, such as SNMP. It also includes peripheral protocols for managing and operating the router, such as FTP or TFTP, NTP, Telnet, and SSH.

    Objectives for Any Router OS

    Before discussing the characteristics that make a router OS carrier class, there is a more basic set of features that you should expect of any router OS, from the largest high-performance core router to the smallest home routers. These features are:

    Support for open standards

    Feature flexibility

    Manageability

    Basic security

    Service and support

    Basic reliability

    Open Standards Support

    Any router uses a number of protocols (and for a high-end router it is a long list indeed) to perform its duties. These protocols can be specified by open standards bodies like the IETF, IEEE, and ITU-T, or they can be proprietary to the manufacturer of the router's OS. Open standards are important for three reasons.

    First, they give you some assurance that your router will interoperate with other routers supporting the same standards, regardless of the manufacturer. Proprietary protocols obligate you to use the same vendor for all routers between which the protocol must operate, sharply reducing your design options and your ability to negotiate pricing among multiple vendors.

    Second, your network operators are far more likely to be intimately familiar with open standards because the specifications are publicly available. This understanding is essential when your network experiences problems or failures; therefore open standards support contributes directly to network reliability.


    Third, perhaps counter-intuitively, open standards are more secure. It is certainly true that malicious parties study open protocols for security vulnerabilities; but it is equally true that open protocols are subject to a scope of peer review not possible for a single vendor. Therefore security risks are more likely to be identified and corrected in open standards before the protocols are ever implemented. A vulnerability in proprietary code is more likely to go unnoticed until it is exploited.

    Flexibility

    Most networks change. New routing protocols are introduced as the network grows, new features are enabled to support added network missions, new interfaces are installed to satisfy growing bandwidth or redundancy requirements. The router OS must present you with a rich menu of protocol and configuration options to support not only your initial design choices but the changes you are sure to make as your network grows. Additionally, the OS must have the capability of being easily upgraded to accommodate both improved code and newly-added features from the vendor.

    Manageability

    Just as the OS should support a variety of protocols to adapt to different design philosophies and network growth, the OS should provide a variety of means by which to manage the router. At a minimum, the OS should provide:

    An intuitive command line interface (CLI) with extensive error checking capabilities and help options

    A web-based configuration tool

    Simple Network Management Protocol (SNMP)

    The applicable open standards management information bases (MIBs)

    The OS should also support both direct access to the router (in conjunction with the physical

    router architecture) through a console connection and a modem connection; and remote access

    to the router both through a dedicated management network connection and in-band access via

    protocols such as Telnet and SSH.

    Further management flexibility can be achieved by offering an application programming interface (API) using an open standard such as Extensible Markup Language (XML). Such an interface allows the router to be managed and configured using third-party management platforms.

    Basic Security

    There are many aspects to security for an IP router, but there are certain security features that you should expect every router, from the smallest to the largest, to support. A feature that should be present on any router of any size is password-secured access. On any but the smallest home routers this access authentication should be supplemented by the ability to define permissions for different users (that is, the ability to specify what actions a given user or user group is authorized to perform on the router) and the ability to monitor and record what actions each user takes while accessing the router. These three security functions should be further strengthened through the capability of being supported by independent servers: for example, RADIUS or TACACS for authentication and authorization, and an independent file server for accounting.

    A remote access protocol such as Secure Shell (SSH) should be available as an alternative to less

    secure access protocols such as Telnet.

    All routing protocols should have the capability of authenticating all peers. This is highly recommended practice within your own network, and essential when peering with untrusted neighbors, meaning all routers in networks not under your direct control.


    Finally, a router should not have any potentially vulnerable protocols, such as Telnet or small servers such as Finger, enabled by default. You should be required to explicitly enable all services and protocols you desire to run on the router, and never be required to disable services you do not want. Said another way, a router powered up out of the box should do almost nothing until you tell it what to do. This gives you a reasonable assurance that no exploitable vulnerabilities will go overlooked.
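    The "off until explicitly enabled" principle can be illustrated with a minimal Python sketch. The Router class, its method names, and the list of known services are hypothetical and chosen only for illustration; they do not represent any real router's configuration interface.

```python
class Router:
    """Toy model of a router whose services are all disabled until enabled."""

    # Hypothetical set of services this toy router knows how to run.
    KNOWN_SERVICES = {"ssh", "telnet", "finger", "ftp", "snmp"}

    def __init__(self):
        # Nothing runs out of the box: the enabled set starts empty.
        self.enabled = set()

    def enable(self, service):
        """Explicitly turn on one service; unknown names are rejected."""
        if service not in self.KNOWN_SERVICES:
            raise ValueError(f"unknown service: {service}")
        self.enabled.add(service)

    def accepts(self, service):
        """A connection is accepted only for an explicitly enabled service."""
        return service in self.enabled


router = Router()
router.enable("ssh")
print(router.accepts("ssh"))     # True
print(router.accepts("telnet"))  # False
```

    Because the enabled set starts empty, an operator never has to remember to turn anything off; the exposed surface is exactly what was turned on.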

    Service and Support

    The value of strong technical support becomes most apparent when things go wrong. At such times getting your network back to normal must be done as quickly as possible, which means support staff must be both knowledgeable and responsive. At the same time, technical support must be proactive; for operating systems that means implementing engineering processes that minimize bugs and interoperability problems before customers ever see the software, and implementing ongoing programs that can identify and correct problems in production code before the problems become apparent as a wider network concern.

    Basic Reliability

    Reliability matters for even home routers. Interruption of services is at the least irritating, and can drive customers away from vendors of routers that are perceived to be undependable. Reliability increases in importance with the criticality of the services the router is transporting.

    But just what is reliability? At a basic level we understand reliability to be the ability of a system to function as expected for a given amount of time. Ideally we would like to add "without failures" to the definition; however, as the complexity of a system increases the potential for failure increases. Therefore a reliable system is one in which failures are minimized as much as reasonably possible, but also one which can recover from a failure quickly and efficiently when the unexpected does occur.

    Given this definition, all of the features discussed so far can be seen to contribute to reliability: Open standards support, flexibility, manageability, security, and strong technical support. Moving to the discussion of the characteristics of a carrier class OS, all of those defining characteristics can also be listed as the contributors to carrier class reliability.

    What Makes a Router OS Carrier Class?

    A carrier class router operating system must have all of the features described in the previous section, but the quality of those features must far exceed what we have described so far. There are also additional qualities to be found in a carrier class OS. Although you might find one or a few of these additional features in other operating systems, carrier class requires the presence of all of them. The unique features distinguishing a router OS as carrier class are:

    Stability

    Advanced security

    Scalability

    Precision

    High availability

    Consistency

    Predictability

    Carrier class reliability



    Stability

    Stability is the capability of a router to deliver invariable performance under variable network circumstances. Every network has its ups and downs: Erratic traffic loads and topological change. If a network operator's business is dependent on guaranteed service levels (as carrier class networks are), no router in the network can suffer a performance degradation while coping with variable network behavior.

    Every router performs two very fundamental functions: Packet forwarding and route processing. Packet forwarding is of course the action of reading the destination address (and possibly other information) in the header of an incoming packet, making a decision about where that packet should go, and then switching the packet to the correct outgoing interface. Route processing is the means by which the router comes to know how to make the correct packet forwarding decisions: Routers exchange information about the network infrastructure among themselves and then determine the best path to all known destinations based on some agreed-upon set of rules.²
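    The packet forwarding decision described above is, for IP, a longest-prefix match of the destination address against a forwarding table. The following is a minimal Python sketch of that lookup. The table contents and interface names are hypothetical, and a linear scan is used for readability; production routers use specialized data structures and hardware for this step.

```python
import ipaddress

# Hypothetical forwarding table: destination prefix -> outgoing interface.
FIB = {
    ipaddress.ip_network("0.0.0.0/0"): "ge-0/0/0",    # default route
    ipaddress.ip_network("10.0.0.0/8"): "ge-0/0/1",
    ipaddress.ip_network("10.1.0.0/16"): "ge-0/0/2",
}


def lookup(fib, destination):
    """Return the interface of the most specific prefix containing destination."""
    address = ipaddress.ip_address(destination)
    matches = [prefix for prefix in fib if address in prefix]
    if not matches:
        return None  # no route: the packet would be dropped
    # The longest prefix (largest prefix length) is the most specific match.
    best = max(matches, key=lambda prefix: prefix.prefixlen)
    return fib[best]


print(lookup(FIB, "10.1.2.3"))   # ge-0/0/2
print(lookup(FIB, "10.9.9.9"))   # ge-0/0/1
print(lookup(FIB, "192.0.2.1"))  # ge-0/0/0
```

    Route processing is what populates a table like FIB; forwarding only reads it, which is why the two functions can be separated.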

    A router must perform these two basic functions at the same time, and that has implications for stability. If the traffic load through the router becomes very heavy most resources might be used in performing packet forwarding, causing delays in route processing, and resulting in slow reactions to changes in the network topology. On the other hand, a significant change in the network topology might cause a flood of new information to the router; most resources might be used in performing route processing, slowing the router's packet forwarding.

    The key to the problem as described is internal resources. If most resources are consumed by one of the two basic functions, the other function suffers, and the router is destabilized. The answer to the problem is to perform these functions in separate physical entities, each with its own resources, as shown in Figure 1. In such an architecture the packet forwarding (forwarding plane) and the route processing (control plane) do not draw processing cycles away from each other.

    Figure 1

    The physical architecture depicted in Figure 1 has positive implications on the router's operating system. Because a part of the OS is the routing protocols, the OS resides in the control plane. So, the forwarding plane can perform to full capacity without affecting the ability of the OS to control the entire physical system. It also means that the OS is protected from unintentional or malicious influences of the network, as discussed in the next section.

    [Figure 1 diagram labels: Control Plane, Route Process, RIB, Management Process, Kernel, FIB, Security, Forwarding Plane, FIB, Layer 2 Processing, Interfaces]

    ² Static routes are a notable exception to this description; the route processing mainly occurs inside a human brain and the results of the best path determination are manually entered into the router. But while static routes are commonly a part of carrier class network configurations, they are never a primary source of route information in such networks.


    Advanced Security

    A carrier class router OS must go far beyond the basic security features discussed earlier in this paper. At this advanced level, features and tools must be provided which address two missions:

    Strong protection of the router itself

    Protection of the network in general

    As mentioned at the end of the previous section, the physical architecture of Figure 1 is a key contributor to protection of the router. Attacks against the router will almost always come from the network (rather than an out-of-band connection), and are directed against one of the routing protocols or the OS itself. That means that attacks must enter at the packet forwarding entity and then make their way up to the route processing entity. The link between these two entities, then, serves as a choke point at which malicious packets can be identified and stopped, as shown in Figure 2.

    Figure 2

    Powerful firewalling capabilities must be available for detailed identification and passing of only specifically permitted packets to the control plane, blocking all others. Rate limiting capabilities must also be available so that essential packets permitted through the firewall filters, such as ICMP, cannot be exploited for flooding attacks.
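    The combination of a permit-only filter and a rate limiter at the choke point can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of how any particular router implements it: the token bucket is one common rate-limiting technique swapped in for the example, and the permitted protocol set, rates, and function names are hypothetical.

```python
class TokenBucket:
    """Simple token bucket: refills at rate_pps tokens per second up to burst."""

    def __init__(self, rate_pps, burst):
        self.rate = rate_pps
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous packet, in seconds

    def allow(self, now):
        """Return True if a packet arriving at time `now` is within the rate."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical policy: only these protocols may reach the control plane.
PERMITTED = {"bgp", "ospf", "icmp"}
# Hypothetical policer: ICMP limited to 2 packets per second, burst of 2.
icmp_policer = TokenBucket(rate_pps=2, burst=2)


def to_control_plane(protocol, now):
    """Decide whether a packet is passed up from the forwarding plane."""
    if protocol not in PERMITTED:
        return False                    # anything not permitted is blocked
    if protocol == "icmp":
        return icmp_policer.allow(now)  # permitted, but rate limited
    return True


print(to_control_plane("telnet", 0.0))                        # False
print([to_control_plane("icmp", 0.0) for _ in range(3)])      # [True, True, False]
```

    The third ICMP packet in the same instant is dropped because the bucket is empty, so a flood of permitted packets cannot monopolize the control plane.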

    The tools for protecting the router (fine-grained packet filtering and rate limiting) should be extended to the protection of the network itself, by being extendable to the interfaces of the forwarding plane. In this application, however, it is important that the application of such control functions on production interfaces does not negatively impact the performance of the router.

    A carrier class router OS should also offer tools that help the network operator take action against malicious traffic entering the network. For example, if a distributed denial of service (DDoS) attack is in progress against a node in the network or is transiting the network toward its target, the OS should have capabilities that aid the operator in tracing the attack traffic to its entry points, where specific filters or rate limiters can be enabled to stop or alleviate the attack.

    Scalability

    The feature flexibility discussed earlier certainly contributes to scalability. At the carrier class level, the feature flexibility must be at performance. That is, the network operator must be able to confidently enable a multitude of features on a given router without reducing the router's basic packet processing and forwarding rates. For example, in support of a multiservice network a router might be running OSPF or IS-IS, Multiprotocol BGP, intricate routing policies, MPLS and its associated signaling and traffic engineering protocols, related layer 2 and layer 3 VPNs, IP multicast protocols, highly granular packet identification both for security and for traffic classification, and advanced queuing, all while forwarding packets at or near line rate.

    [Figure 2 diagram labels: Control Plane, Attack Packets, Forwarding Plane]


    Scalability also means that new features can be added to the OS quickly, efficiently, and safely. Finally, scalability means that the same OS can be used on multiple hardware platforms and with any interface type; upgrading hardware or adding interfaces must not require the replacement of the existing OS code with a different hardware-, interface-, or feature-specific version of the OS.

    Precision

    Route calculation errors, even transient ones, can cause inaccurate packet forwarding, forwarding loops, and black holes. In a network carrying sensitive traffic such as voice and entertainment-quality video such errors, no matter how temporary, are unacceptable. Therefore the route calculations of a carrier class OS must be correct every time. Without precision, stability, scalability, and security are impossible.

    High Availability

    With the convergence of high-quality, high-demand services such as voice and video onto IP infrastructures, network outages of any kind are no longer acceptable. Even relatively small packet losses can have a negative effect on users' perception of the service delivery; a major node, link, or interface failure can have serious effects for the provider. A carrier class router operating system must therefore itself be resilient to failure, and must provide the network operator with tools that minimize network failures whenever possible and that minimize the effects of failures that do occur.

    It must be noted that most unplanned network outages are due not to hardware or software failures, but to configuration mistakes. "The possibility of failures would be much reduced," writes Jeffrey Nudler, a Senior Analyst with Enterprise Management Associates, "if you consider that changing device configuration causes 60% of downtime due to human error."³ A carrier class router operating system must take this human factor into account and help the operator avoid making configuration mistakes.

    Planned outages (taking a node offline for routine maintenance or upgrades) are somewhat less service impacting than unplanned outages, because they are predictable. Nevertheless, modern service level guarantees and "five nines" network standards preclude the traditional practice of offline router operations. Carrier class router operating systems must enable in-service router changes and upgrades.
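    To see why "five nines" leaves no room for offline maintenance, it helps to convert an availability target into a yearly downtime budget. The short Python calculation below does this; the function name is illustrative, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability):
    """Yearly downtime budget, in minutes, for a fractional availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, availability in [
    ("three nines", 0.999),
    ("four nines", 0.9999),
    ("five nines", 0.99999),
]:
    minutes = allowed_downtime_minutes(availability)
    print(f"{label}: {minutes:.2f} minutes per year")
```

    At 99.999% availability the budget is roughly 5.26 minutes per year for planned and unplanned outages combined, which a single offline upgrade of a router on a non-redundant path would be likely to exceed.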

    Consistency

    Multiservice networks require complex configurations, which in turn can present enormous operational challenges. Considering, as the previous section emphasizes, that human error is the major cause of network outages, unnecessary operational complexity must be avoided whenever possible. If different versions of an operating system are required for different platforms, different interfaces, or different features, the difficulties of network management (and hence the chances of operational mistakes) are significantly increased.

    The ability to run the same OS software image on all routers helps control operational complexity. Such consistency requires several factors:

    No platform-specific versions of the OS

    No interface-specific versions of the OS

    No feature-specific versions of the OS

    ³ http://www.networkworld.com/news/2005/101005-iet.html


    A consistent OS contributes to network stability and availability not only from an operational aspect, but also from a software maintenance aspect. If the OS vendor is managing only a single release at a time, adding enhancements and new features is greatly simplified; because changes impact only a single code set, the changes can be more thoroughly tested. This translates directly into more reliable software for the customer. Similarly, the customer's regression testing before an upgrade to a newer OS release is much more trustworthy if there are no multiple versions or feature packages to test, reducing the chance of overlooked incompatibilities or unexpected code conflicts during implementation.

    Predictability

    Delivery of high quality services requires a predictable transport network. There are two aspects to predictability that are influenced by the router OS:

    Predictable network behavior

    Predictable software management by the OS vendor

    The factors contributing to consistency discussed in the previous section (an OS that is not platform or interface specific, and no separate feature packages) also help make the network predictable by reducing the chances for unexpected events during OS changes. These factors also help conserve predictability because the addition of features, interfaces, or platforms to the network is far less likely to entail a change of OS software.

    Network predictability is also helped by OS resilience. Although engineering practices that minimize software bugs are crucial, occasional bugs are an inescapable fact in any complex software code. Therefore an OS architecture that can isolate and limit the negative effects of bugs, preventing them from causing systemwide failures, supports network predictability.

    Another aspect of predictability is the manner in which the vendor manages the OS. Tightly controlled development milestones, well-defined engineering quality principles, and a strict adherence to a regular release schedule all enable confident planning for the network operator.

    Carrier Class Reliability

    Carrier class reliability is defined by all the qualities that go into basic reliability, as described earlier in this paper, plus all of the carrier class qualities described in this section: stability, advanced security, scalability, precision, high availability, consistency, and predictability. The reduction of any one of these qualities diminishes the overall ability of the operating system to fulfill the requirements of modern carrier class networks.



    JUNOS Architecture

    Juniper Networks' first focus market was carriers and large-scale service providers. We recognized that there were no carrier class router operating systems in existence, and we designed JUNOS to fill that void. The architectural choices we made then proved to be the right choices; while continually building upon that foundation operating system architecture, we have never found, nor do we foresee, a need to consider a new operating system. In fact it is only recently that some competitors have begun attempting to offer an operating system similar to what we first offered a decade ago.

    This section examines the key architectural features of the JUNOS software, and how these features enable JUNOS to meet the requirements of a carrier class OS.

    Modularity

    The most essential architectural characteristic of JUNOS is its modularity. Rather than a single,
    highly complex code base, the JUNOS software consists of a set of individual components, each
    running in its own protected memory space, communicating with each other through
    well-defined interfaces, and all controlled by the JUNOS kernel (Figure 3). The separate modules,
    called daemons [4], are key to both stability and scalability.

    Figure 3

    Modularity is essential to stability because of the functional separation of software components.
    A malfunction or bug in one module might cause the module to fail, while the rest of the
    system continues functioning; a monolithic operating system, on the other hand, has no such
    compartmentalization and a similar malfunction or bug is likely to cause a full system crash.
    Similarly, because each module operates in its own protected memory space and cannot
    scribble on another module's memory space, the modules cannot disrupt each other.
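
    The isolation property can be illustrated with ordinary operating system processes. The
    following is a minimal sketch, not JUNOS code: the module names and the simulated fault are
    hypothetical, and each "module" is simply a short Python program run in its own process with
    its own memory space. One module fails; the others still run to a clean exit.

```python
import subprocess
import sys

# Hypothetical stand-ins for daemons; each string is a tiny program.
MODULES = {
    "routing": "print('routing ok')",
    "snmp": "raise RuntimeError('simulated bug')",  # this module fails
    "chassis": "print('chassis ok')",
}


def run_module(code: str) -> int:
    """Run one module in a separate OS process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def supervise(modules: dict) -> dict:
    """Start every module; a failure in one does not stop the others."""
    return {name: run_module(code) for name, code in modules.items()}


statuses = supervise(MODULES)
failed = sorted(name for name, status in statuses.items() if status != 0)
print("failed modules:", failed)
```

    In a monolithic design the equivalent of the simulated bug would run in the same address
    space as everything else, so there would be no separate exit status to inspect.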

    Stability is also supported by the ability to replace an individual module. If a problem is
    identified in a given module, that module can be changed; without modularity, the entire
    operating system would have to be changed to apply a similar code patch, meaning the router
    must be taken out of service.

    [Figure 3 labels: Protocols (RPD), PPMD (Hellos), Chassis Mgmt, SNMP, Interface Mgmt, Operating System.]

    [4] Daemon is a Unix term and reflects the FreeBSD origins of the JUNOS kernel.


    The concept of modular scaling is certainly not new; one of the innovations Vint Cerf and Bob
    Kahn introduced in TCP/IP was the idea of a layered protocol stack, allowing the change of one
    layer without affecting the other layers.

    The modular JUNOS architecture supports scalability because new modules can be added as
    needed, and existing modules can be updated, without requiring a complete overhaul of the
    entire OS code. This principle has been proven over and over; through the life of JUNOS several
    dozen new modules have been added to the original OS as new features and capabilities
    have been introduced. Yet for years after the advent of JUNOS other routers continued to run
    monolithic operating systems with inherent instability and scaling limits.

    Managing Modular Architectures

    There are also engineering advantages to the JUNOS architecture that contribute to stability and
    scalability. A reasonably small team of engineers manages the software comprising each module,
    and the same team of engineers is responsible for the same module release after release.
    Therefore the code is much better understood than it would be if it were a more integral part
    of a monolithic code base or if there were separate release teams. As a result, any addition or
    change to the module code is very well understood in terms of how it will affect the
    code. Because the module communicates with other modules through a defined interface, its
    interactions with other modules are tightly controlled.

    Because dedicated engineering teams manage the modules, communication within the team and
    between teams can be carefully controlled. A strong sense of ownership is also inspired, ensuring
    fewer bugs in the code. And there are no separate bug fix teams; when bugs do arise, the team
    responsible for writing the code is responsible for correcting the code.

    So the engineering advantages of the modular JUNOS architecture result in faster code
    development, testing, and debugging. The end benefit to the customer is sound, reliable
    operating system software.

    Intelligent Modular Design

    It might seem that if modularity is good, the more modular the OS the better. But this is not the
    case. While a component module must be small enough to be beneficially managed, it must also
    be large enough to contain major interdependencies. If a module is made too small, artificial
    barriers will be created between dependent functions, and the interprocess communication
    between those functions adds complexity to the overall system.

    A fundamental advantage of grouping functions into individual modules or processes, as already
    discussed, is that the processes can be stopped, replaced, or can fail independently without
    crashing the entire system. When deciding whether a function should be a part of an existing
    module or should be in its own module, a determination must be made about what it means for
    this function to stop or fail independently: Will other functions be affected? If so, the functions
    are interdependent and should probably be grouped together in the same module. This concept
    is illustrated in Figure 4.
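
    The grouping rule can be read as a small algorithm: treat each "would be affected if the
    other stopped" relationship as a link, and place every set of linked functions in one module.
    The sketch below is only an illustration of that reading, using a union-find structure; the
    function names and dependency pairs are hypothetical, and a real design decision would weigh
    more than a yes/no dependency.

```python
def group_into_modules(functions, dependencies):
    """Group functions so that interdependent ones share a module.

    Each group is a connected component of the dependency graph.
    """
    parent = {f: f for f in functions}

    def find(f):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in dependencies:
        parent[find(a)] = find(b)  # merge the two groups

    modules = {}
    for f in functions:
        modules.setdefault(find(f), []).append(f)
    return sorted(sorted(members) for members in modules.values())


# Hypothetical functions: A, B, and C depend on each other; D and E stand alone.
FUNCTIONS = ["A", "B", "C", "D", "E"]
DEPENDENCIES = [("A", "B"), ("A", "C")]

modules = group_into_modules(FUNCTIONS, DEPENDENCIES)
print(modules)
```

    Functions with no dependency on the others end up in modules of their own, which is where
    independent stop, restart, and failure are actually meaningful.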


    Figure 4

    Another consideration involves shared functions. If there is a common function that serves
    several other functions, all of those functions should probably be grouped together in the same
    module. Otherwise, as Figure 5 shows, either heavy interprocess communication must be
    accepted in order for the separated functions to work together, or the shared function must be
    duplicated in each module.
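
    One way to make the cost of over-partitioning concrete is to count the functional
    interactions that cross a module boundary, since each of those becomes interprocess
    communication. The following sketch assumes a hypothetical set of six functions and
    interactions (not the ones drawn in the figures) and compares a coarse partition with a
    one-function-per-module partition.

```python
def cross_module_interactions(interactions, module_of):
    """Count interactions whose two functions live in different modules."""
    return sum(1 for a, b in interactions if module_of[a] != module_of[b])


# Hypothetical functions A-F and the pairs that interact.
INTERACTIONS = [
    ("A", "B"), ("B", "C"), ("A", "C"),  # A, B, C are tightly coupled
    ("D", "E"), ("E", "F"),              # D, E, F are tightly coupled
    ("C", "D"),                          # one link between the two clusters
]

# Coarse partition: each cluster is one module.
coarse = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
# Fine partition: every function is its own module.
fine = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}

coarse_ipc = cross_module_interactions(INTERACTIONS, coarse)
fine_ipc = cross_module_interactions(INTERACTIONS, fine)
print(coarse_ipc, fine_ipc)
```

    With the coarse partition only the single inter-cluster link needs interprocess
    communication; with the fine partition every interaction does.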

    Figure 5

    Intelligent Modular Design: The JUNOS Routing Module

    A particularly clear example of intelligent modular design can be found in the JUNOS routing
    module, called the Routing Protocol Daemon (RPD). The RPD contains all of the routing
    protocols, such as OSPF, BGP, IS-IS, and RIP. It has been proposed by others that this is an
    "old" architecture, and that containing each routing protocol in its own module (a BGP module,
    an OSPF module, and so on) is better. There are two arguments to be made in favor of separate
    protocol modules:

    A single protocol can fail or be stopped independently, without affecting the other protocols.

    A single protocol can be upgraded to gain new features without the necessity of upgrading the entire OS.

    [Figure 4: functions A through O grouped into Modules 1, 2, and 3. Interdependencies well contained; light interprocess communications.]

    [Figure 5: the same functions divided among Modules 1 through 6. Interdependencies poorly contained; heavy interprocess communications.]


    Both of these arguments are attractive and make sense on the surface. They are, after all, two
    of the fundamental reasons a modular OS is superior to a monolithic OS. And in fact, Juniper
    Networks has on more than one occasion considered replacing the RPD with individual protocol
    modules. In each case Juniper engineers concluded that such a change was a move in the wrong
    direction, and that more problems would be created than would be solved.

    The first argument, that protocols can fail or be stopped independently without affecting other
    routing protocols, is flawed because it assumes that each protocol is completely independent.
    Such is not the case. Even at a superficial level all of the protocols running on a router tend to
    have dependencies on each other. If OSPF fails, for example, it can affect IBGP, RSVP-TE, LDP,
    and the RPF checks used both for security and for IP multicast. If BGP is stopped, it affects not
    only inter-AS routing but possibly L2 and L3 MPLS VPNs, and IP multicast. Dig a little deeper and
    you find that all IP routing protocols share a dependence on several other common protocols
    and functions such as ICMP and ARP. Go even deeper and you find that the protocols must
    cooperate in such basic functions as choosing a best route and maintaining the routing database.
    If the routing protocols are in separate modules, heavy interprocess communication is required,
    burdening the overall system, and sharing such basic functions as ARP and routing database
    maintenance becomes a complex problem.

    By maintaining all routing protocols in a single module, the RPD, the many interdependencies
    among individual protocols are contained. The interprocess communication load stays light,
    shared functions are controlled, and the overall system is simpler, which translates into a more
    reliable routing platform.

    The second argument, that modularizing individual protocols allows the customer to upgrade
    only the desired protocol in order to acquire new features, is particularly appealing. For
    example, the BGP module could be upgraded to a version that supports a desirable new feature
    without the necessity of upgrading the entire OS. This provides the appearance of an
    In-Service Software Upgrade (ISSU), because one section of code can be replaced without taking
    the entire system out of service. Modularizing at the protocol level would seem to make sense
    when offering this approach, so that individual protocols can be updated as non-disruptively as
    possible.

    But given the interdependencies among protocols already discussed, replacing a single protocol
    such as BGP is hardly as non-disruptive to routing operations as it might appear on the surface.
    Far more important, the practice of selectively replacing protocol modules (or any OS module,
    for that matter) comes at a steep price in lost consistency and predictability. To illustrate the
    problem, take a hypothetical router OS that has five currently available releases: Release A
    through Release E. Release B is newer (and therefore has newer features) than A, C is newer than
    B, and so on. In each release, there is an OSPF module, an IS-IS module, a BGP module, and a
    RIP module. You are allowed to pick and choose among the protocol modules to attain exactly the
    features you want: Perhaps OSPF from Release B, RIP from Release A, and BGP from Release D.

    To make this menu of combinations available to you, the software vendor must maintain and
    understand the interactions of each routing protocol module from each release with all of the
    other routing protocol modules from each release. Given the four protocols across five releases,
    the total number of possible release-specific protocol combinations is approximately 4^5, or
    1,024 [5]. Whenever the vendor adds a new feature to one of the protocols, it must perform
    regression testing not just for that release, but for all of the 1,000+ possible protocol
    combinations. And if you experience problems with a newly upgraded protocol module, the
    vendor's technical support personnel must understand the interoperability implications of all
    1,000+ combinations.

    [5] The actual number of combinations is slightly less (16 less, in this example), because a given protocol from one release would never be combined with the same
    protocol from another release.


    This example considers just RIP, OSPF, IS-IS, and BGP modules. Add to that an MPLS module
    and an IP multicast module in each of the five releases. The possible protocol combinations
    now become approximately 6^5, or 7,776. And this assumes that the MPLS module is not further
    divided into separate RSVP-TE, LDP, L2 VPN, L3 VPN, and VPLS modules, or that IP multicast is
    not further divided into its constituent protocols. Take the practice beyond just routing protocol
    modules and include all of the OS modules, and the number of possible package combinations
    across several releases soars exponentially into the hundreds of thousands.
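
    The growth described here can be checked by brute-force enumeration. The sketch below is
    an illustration under one stated assumption: a bundle picks exactly one release for every
    protocol module, so the count is the number of releases raised to the number of protocols.
    That is a different counting convention from the approximate figures quoted above and yields
    different numbers, but it shows the same pattern: each added module multiplies the number of
    combinations a vendor would have to test and support.

```python
from itertools import product


def count_bundles(protocols, releases):
    """Count bundles that pick exactly one release for every protocol."""
    return sum(1 for _ in product(releases, repeat=len(protocols)))


RELEASES = ["A", "B", "C", "D", "E"]
FOUR_PROTOCOLS = ["OSPF", "IS-IS", "BGP", "RIP"]
SIX_PROTOCOLS = FOUR_PROTOCOLS + ["MPLS", "IP multicast"]

four_count = count_bundles(FOUR_PROTOCOLS, RELEASES)
six_count = count_bundles(SIX_PROTOCOLS, RELEASES)
print(four_count, six_count)
```

    Under this convention, adding two modules multiplies the test matrix by 25, and every
    further subdivision of a module multiplies it again.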

    The liabilities of this approach are clear: A vendor might gain positive short-term customer
    response by allowing mix-and-match modules from different releases, but the code will
    quickly become unmanageable. The end result is an inconsistent, unpredictable, and ultimately
    unreliable operating system.

    The JUNOS RPD therefore remains a single module containing all routing protocols. And while
    the RPD can be replaced as a module, Juniper Networks supports doing so only for installing
    code patches and bug fixes when necessary; new features are acquired by upgrading the entire
    OS. This practice is key to a well understood, closely controlled, highly reliable operating system.

    Intelligent Modular Design: The Periodic Packet Management Daemon

    Although good engineering practice dictates keeping all routing protocols in a single module,
    there is another view of modularization of the routing functions. To understand where
    modularization is beneficial in the routing process it is necessary to think about basic routing
    functions. On the one hand, a routing process is responsible for performing route calculations
    using the information presented to it. Precision and stability require that this calculation be
    allowed to run uninterrupted until it is finished. If the calculation is interrupted, there is a risk
    of incorrect or incomplete route information finding its way into the routing database, possibly
    resulting in incorrect forwarding, routing loops, or packet black holes.

    On the other hand, there are elements of a routing process that must be serviced as soon as
    possible. Hellos, adjacency maintenance messages, and route updates have timers, often tightly
    set timers, that require quick processing and response. Reacting slowly to these functions
    could cause timeouts that in turn can result in unnecessary message retransmissions at best and
    closed adjacencies at worst. Stability, precision, and predictability can all be negatively affected.

    There is a potential conflict in these two basic functions. A route calculation is a
    run-to-completion task; in computer science terms, it requires cooperative multitasking. Performing
    adjacency maintenance and update tasks is a real-time, or preemptive multitasking, function.
    When a routing protocol implementation must share a processor, should it allow interruptions
    of its run-to-completion tasks whenever a real-time task needs the processor, at the risk of
    temporarily corrupted route data? Or should it require real-time demands to wait until
    run-to-completion tasks are finished, at the risk of broken adjacencies and network instabilities?

    The answer, of course, is that neither situation is acceptable. Herein, then, is a justification for
    a separation of the software comprising the real-time and run-to-completion elements of the
    routing process. JUNOS implements the RPD, with all of its constituent routing protocols, as a
    run-to-completion module. The real-time elements of the routing protocols are separated into a
    module called the Periodic Packet Management Daemon (PPMD). The distinct processing needs
    of each module are then served, and a scheduler manages the demands of both modules on the
    shared Routing Engine processor. The result is a highly responsive, accurate, and stable routing
    platform.
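
    The timing side of this conflict can be shown with a small discrete-time simulation. This is
    a sketch under simplified assumptions, not a model of the actual RPD or PPMD: the hello
    interval, calculation length, and dead interval are hypothetical values, and only one half of
    the tradeoff is modeled (hellos waiting behind a run-to-completion calculation). It compares
    the longest silence a neighbor would observe when hellos share a task with the route
    calculation against the case where they are handled separately.

```python
def max_hello_gap(hello_interval, calc_start, calc_len, horizon, separate):
    """Return the longest gap between consecutive hello transmissions.

    When separate is False, a hello that falls due while the route
    calculation is running waits until the calculation finishes.
    """
    sent = []
    t = 0
    while t <= horizon:
        send_at = t
        busy = calc_start <= t < calc_start + calc_len
        if busy and not separate:
            send_at = calc_start + calc_len  # hello waits for the calculation
        sent.append(send_at)
        t += hello_interval
    return max(later - earlier for earlier, later in zip(sent, sent[1:]))


DEAD_INTERVAL = 4  # hypothetical: neighbor drops the adjacency after this long

shared_gap = max_hello_gap(1, 3, 5, 10, separate=False)
split_gap = max_hello_gap(1, 3, 5, 10, separate=True)
print("shared task drops adjacency:", shared_gap > DEAD_INTERVAL)
print("separate task drops adjacency:", split_gap > DEAD_INTERVAL)
```

    With these values the shared design leaves a silence longer than the dead interval, while
    the separated design keeps hellos on schedule without interrupting the calculation.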


    The JUNOS Kernel

    The heart of JUNOS, the JUNOS kernel, began as a FreeBSD kernel. FreeBSD is renowned for
    running on servers with exceptionally long uptimes, indicating both its level of reliability and
    its infrequent need for updating. Because FreeBSD is open source software, Juniper Networks
    engineers were free to retain what mattered, discard what didn't, and custom-build the parts that
    make the kernel JUNOS rather than FreeBSD.

    Recently one of Juniper Networks' competitors has begun offering a new operating system
    built on the proprietary QNX Neutrino microkernel, and that vendor has made much of the
    supposed superiority of microkernels over kernels such as that of JUNOS. To understand the issue,
    it helps to briefly describe the reasoning behind microkernels. A simplistic comparison of a
    monolithic kernel to a microkernel is illustrated in Figure 6. Only essential system services
    remain in the microkernel (hence the prefix "micro"); functions such as the host stack, device
    drivers, and file system have become external processes running in user mode, communicating
    with the microkernel via system calls. By doing this, these externalized functions can restart or
    fail independently without causing a complete kernel failure.

    Figure 6

    This argument in favor of microkernels is of course the same argument in favor of modularity
    in the overall OS architecture. But the principles for intelligent modular design discussed in this
    paper also apply here. The system is so heavily dependent on the host stack and file system
    that a failure of one of these services is likely to have a severely negative impact on the entire
    system, whether they are in the kernel or external processes. And in reality, device drivers can
    be stopped and started even within the kernel. So the reality of microkernels is that by adding
    artificial barriers between these services, interprocess communication is increased; the attempt
    to simplify the kernel adds complexity to the overall system.

    There is nothing new in the arguments currently being made in favor of microkernels; in fact
    they come from a 20-year-old academic debate. One of the more enlightening versions of this
    debate took place in 1992 between Andy Tanenbaum, proponent of the microkernel-based
    Minix operating system, and Linus Torvalds, creator of the kernel-based Linux, on the Usenet
    newsgroup comp.os.minix [6]. Tanenbaum made the same arguments then as the arguments
    now being used to promote microkernels as the latest innovation in router operating system
    architecture. "Among people who design operating systems," Tanenbaum wrote, "the debate is
    essentially over. Microkernels have won." [7]

    [Figure 6: monolithic kernel compared with microkernel. Monolithic kernel: Processes sit above the system call interface to the kernel; the Kernel contains the scheduler, paging, virtual memory, host stack, device drivers, and file system, above the kernel interface to Hardware. Microkernel: the host stack, device drivers, and file system run as external processes that reach the Microkernel (scheduler, paging, IPC) through system calls.]

    [6] A simple Google search provides the complete text of the debate.

    [7] Andy Tanenbaum, "LINUX is Obsolete," comp.os.minix, January 1992.


    Yet reality has shown otherwise. While microkernels have proven popular in embedded systems
    such as automotive computers and industrial controls (QNX is famously used in the Space
    Shuttle's robotic arm), they have found little acceptance in more complex operating systems.
    "Microkernels are mostly discredited now," writes Miles Nordin in Linux Journal, "because they
    have performance problems, and the benefits originally promised are a fantasy." [8] This view is
    supported in the widely respected textbook on operating system design, Operating System
    Concepts: "Unfortunately, microkernels can suffer from performance decreases due to increased
    system function overhead." [9]

    Juniper Networks maintains no strong position on the arguments for and against microkernels.
    Rather we chose FreeBSD as the genetic forerunner of the JUNOS kernel because of its
    openness, in keeping with our strong belief in open standards. Its open source software has
    made FreeBSD the most peer-reviewed software in the world; the reliability of JUNOS is thereby
    rooted in the reliability of FreeBSD.

    Engineering Discipline

    The consistent message of the previous few sections has been that modularity is essential to a
    carrier class OS architecture, but naive approaches to designing modules can cause as many or
    more problems than they solve. This paper has called thoughtful, experience-based modularity
    "intelligent modular design."

    There is a deeper message throughout this paper: A router operating system that can meet
    carrier class demands is only possible when it is managed by a highly experienced, highly
    disciplined engineering team following strict engineering processes. J. M. Juran, the guru of
    modern business and industrial quality practices, says that you can determine the quality of the
    product by assessing the quality of the processes used to develop it.

    Any carrier class router OS is necessarily a highly complex system. Reliability can only
    be maintained in such a system when the processes for improving the code and adding feature
    enhancements are tightly controlled.

    The principles of engineering discipline and strict processes were implemented at Juniper
    Networks from the very beginning, by the engineers joining the young company. Many had
    experienced first hand what happens when the rules governing product development are loose,
    and when the developers do not have control of the code: The software becomes unmanageable,
    and changes bring unpredictable conflicts that often become apparent only when the customer
    attempts to implement the software.

    Our quality development practices have evolved and matured with the company, but Juniper
    Networks has never deviated from the standards implemented in its first years. In fact our
    acquisition of TL9000 certification only required documenting the processes already in place, not
    implementing new processes.

    [8] Miles Nordin, "Obsolete Microkernel Dooms Mac OS X to Lag Linux in Performance," Linux Journal, May 2002.

    [9] Abraham Silberschatz, Peter Baer Galvin, and Greg Gagne, Operating System Concepts, Seventh Edition, John Wiley & Sons, 2005, page 62.


    JUNOS Release Schedule

    There are four major releases of JUNOS each year, one per quarter, always in the same months:

    February

    May

    August

    November

    There are also, typically, five working releases at any given time:

    Three maintenance releases

    One release in beta

    One release under development

    The release schedule provides a high degree of predictability for customers planning upgrades
    and new feature implementation. Because of this, the release schedule always has highest
    priority. Several dozen new features are included in each release, so it is important that the
    customers planning for these new features not be delayed by development problems with one
    feature. If a new feature project becomes delayed, the feature is moved to the next release; the
    release is never delayed while waiting for a specific feature's development to catch up.
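
    The scheduling rule (fixed release months, with late features slipping to the next train)
    can be expressed as a short function. This is a sketch only: the release months come from
    the list above, but the cutoff used here, that a feature ready at any time in a release
    month makes that release, is a simplifying assumption and not a documented Juniper policy.

```python
from datetime import date

RELEASE_MONTHS = (2, 5, 8, 11)  # February, May, August, November


def release_for(ready: date) -> tuple:
    """Return (year, month) of the first release in or after the ready month."""
    for month in RELEASE_MONTHS:
        if month >= ready.month:
            return (ready.year, month)
    # Ready after November: the feature ships in February of the next year.
    return (ready.year + 1, RELEASE_MONTHS[0])


on_time = release_for(date(2006, 5, 1))    # ready in May
slipped = release_for(date(2006, 6, 20))   # same feature, delayed to June
late_year = release_for(date(2006, 12, 1))
print(on_time, slipped, late_year)
```

    The release dates themselves never appear as an input that a feature can change; only the
    feature's own ship date moves.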

    Well-defined development milestones are essential to this process, so that expected development
    delays can be identified early on. Any rescheduling of a feature to a later release, then, normally
    occurs early enough that customers expecting the feature are given plenty of lead time to adjust
    their plans accordingly.

    Major infrastructure projects and unusually complex new features are introduced in phases
    over multiple releases. A good example of a phased project is Non-Stop Routing (NSR). Early
    components of NSR were added to JUNOS code as early as release 7.6; these first components
    were invisible to the customer, but allowed Juniper Networks' system test personnel to ensure
    correct integration before moving on to the next phase components. The first customer-visible
    NSR components (OSPF and IS-IS support) were released in JUNOS 8.1, and the NSR project
    will be fully complete at JUNOS 9.0. Releasing such projects in phases ensures reliability by
    allowing incremental regression testing of components as they are added.

    The JUNOS release schedule is also essential to helping adhere to the "single train" model.

    JUNOS Single Train Release Model

    The JUNOS single-train model means that for each JUNOS release, there is only one image; that
    one image runs on all T, M, and J Series routers. The same code that runs on the largest T Series
    router also runs on the smallest J Series router. And all features supported at a given release are
    supported in the one image. There are no separate feature packages to add when you want to
    add a feature; you only have to enable the feature you want.

    There are a number of development ethics that are adhered to in order to maintain this single
    train model:

    No feature development is performed in maintenance (working) releases. New features are added to new releases.

    No back-porting of features is allowed. That is, when a new feature is developed in a new release, the feature cannot be added to an older release.

    No customer specials. All features requested by all customers are developed and released in the mainline code.


    Stating what we will not do seems somewhat inflexible. It is. Loosening any of these rules means
    veering off the focused path delineated by our strict quality processes, and in the end our
    customers would suffer. Adhering to the rules means that at all times our developers are working
    with only a single code base at any release; the result is well-understood code, with new features
    and changes carefully tested for correct integration. For the customer, this means superior
    reliability. It also eliminates for the customer any need to cautiously select from a complex menu
    of platform-specific, interface-specific, and feature-specific packages and then perform careful
    regression testing to ensure that the selected code interoperates as expected with previously
    implemented versions of the code and all installed hardware.

    The single train model also benefits our customers in the following ways:

    The same development teams manage the same software modules release after release, ensuring that the code and any changes made to the code are intimately understood.

    The same team responsible for writing the code is responsible for finding and correcting bugs in the code. As a result, bugs are remedied far faster than would be possible if we used separate bug fix teams.

    The dedicated engineering team concept inspires a sense of ownership for the code, sharply reducing the chances of bugs in the software in the first place.

    Again, these principles translate directly into reliability for our customers.

    New Product Introduction

    In addition to engineering rules and procedures, there must be a set of phases that guide a given
    product throughout its lifetime from first inception to end-of-life. At Juniper Networks this is the
    New Product Introduction (NPI) model. The NPI model defines seven phases, and is applied to all
    engineering projects. Well-defined milestones must be met for any project to progress from one
    phase to the next. Figure 7 shows the specific NPI model for JUNOS releases and features.

    Figure 7

    Certainly every company that produces a product has some similar model for defining the
    product's lifecycle; but without strict engineering discipline, the models mean little. Juniper
    Networks' NPI model is defined to provide value to the customer by enabling us to foresee
    resource requirements well in advance of the point where they might affect timely delivery to
    our customers.

    [Figure 7: NPI model for JUNOS releases and features, Phase 0 through Phase 6. Phase labels: Initial Feature Content and SW Resource Estimate; Product Definition, Commitment and Approval; Design, Development, and Unit Test; System and Alpha Test; Beta Test; FRS to Production; End of Life.]


    Enforcement of the development milestones is accomplished by mechanized process controls.
    Communication within and between development teams is also highly mechanized, beginning
    with enhancement requests from field sales personnel, representing our customers, all the way
    to the end of product life.

    Just as module teams and the single train release model are essential for understanding the code,
    the NPI model is essential for projecting resource requirements and understanding exactly where
    the code is in its lifecycle.

    Conclusions

    JUNOS is the repository of our accumulated networking knowledge. It does not distinguish
    between core and edge, service provider and enterprise. The power, discipline, and consistency
    of our engineering practices ensure the continuing advancement of JUNOS as the single operating
    system architecture for all future Juniper Networks platforms.

    JUNOS was designed from the beginning to meet the demands of carrier class networks,
    and we have continually improved upon it while never deviating from our core engineering
    principles. Our decade of experience with the JUNOS modular architecture brings a level of mature
    understanding of managing such architectures that cannot be matched by other vendors just
    now attempting to offer similar router operating systems.

    JUNOS has always been the premier operating system in high-performance, high-demand
    networks. As more and more sensitive services are added to existing networks, the unmatched
    reliability of JUNOS becomes more important to serious service providers than ever before.

    Copyright 2006, Juniper Networks, Inc. All rights reserved. Juniper Networks and the Juniper Networks logo are registered trademarks of Juniper Networks, Inc. in
    the United States and other countries. All other trademarks, service marks, registered trademarks, or registered service marks in this document are the property of
    Juniper Networks or their respective owners. All specifications are subject to change without notice. Juniper Networks assumes no responsibility for any inaccuracies
    in this document or for any obligation to update information in this document. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this
    publication without notice.