Junos® Fabric and Switching Technologies
THIS WEEK: QFABRIC SYSTEM TRAFFIC FLOWS AND TROUBLESHOOTING
By Ankit Chadha

Knowing how traffic flows through a QFabric system is knowing how your Data Center can scale.






THIS WEEK: QFABRIC SYSTEM TRAFFIC FLOWS AND TROUBLESHOOTING

Traditional Data Center architecture follows a layered approach that uses separate switch devices for access, aggregation, and core layers. But a completely scaled QFabric system combines all the member switches and enables them to function as a single unit. So, if your Data Center deploys a QFabric system with one hundred QFX3500 Nodes, then those one hundred switches will act like a single switch.

Traffic flows differently in this super-sized Virtual Chassis that spans your entire Data Center. Knowing how traffic moves is critical to understanding and architecting Data Center operations, but it is also necessary to ensure efficient day-to-day operations and troubleshooting. This Week: QFabric System Traffic Flows and Troubleshooting is a deep dive into how the QFabric system externalizes the data plane for both user data and data plane traffic, and why that's such a massive advantage from an operations point of view.

"QFabric is a unique accomplishment, making 128 switches look and function as one. Ankit brings to this book a background of both supporting QFabric customers and, as a resident engineer, implementing complex customer migrations. This deep dive into the inner workings of the QFabric system is highly recommended for anyone looking to implement or better understand this technology." - John Merline, Network Architect, Northwestern Mutual

LEARN SOMETHING NEW ABOUT QFABRIC THIS WEEK:
- Understand the QFabric system technology in great detail.
- Compare the similarities of the QFabric architecture with MPLS-VPN technology.
- Verify the integrity of various protocols that ensure smooth functioning of the QFabric system.
- Understand the various active/backup Routing Engines within the QFabric system.
- Understand the various data plane and control plane flows for different kinds of traffic within the QFabric system.
- Operate and effectively troubleshoot issues that you might face with a QFabric deployment.

Published by Juniper Networks Books, www.juniper.net/books
ISBN 978-1-936779-87-1

This Week: QFabric System Traffic Flows and Troubleshooting
By Ankit Chadha

Chapter 1: Physical Connectivity and Discovery
Chapter 2: Accessing Individual Components
Chapter 3: Control Plane and Data Plane Flows
Chapter 4: Data Plane Forwarding

Knowing how traffic flows through your QFabric system is knowing how your Data Center can scale.

© 2014 by Juniper Networks, Inc. All rights reserved. Juniper Networks, Junos, Steel-Belted Radius, NetScreen, and ScreenOS are registered trademarks of Juniper Networks, Inc. in the United States and other countries. The Juniper Networks Logo, the Junos logo, and JunosE are trademarks of Juniper Networks, Inc. All other trademarks, service marks, registered trademarks, or registered service marks are the property of their respective owners. Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.

Published by Juniper Networks Books
Author: Ankit Chadha
Technical Reviewers: John Merline, Steve Steiner, Girish SV
Editor in Chief: Patrick Ames
Copyeditor and Proofer: Nancy Koerbel
J-Net Community Manager: Julie Wider
ISBN: 978-1-936779-87-1 (print)
ISBN: 978-1-936779-86-4 (ebook)
Printed in the USA by Vervante Corporation.
Version History: v1, April 2014

About the Author: Ankit Chadha is a Resident Engineer in the Advanced Services group of Juniper Networks. He has worked on QFabric system solutions in various capacities, including solutions testing, engineering escalation, customer deployment, and design roles. He holds several industry-recognized certifications, such as JNCIP-ENT and CCIE-R&S.

Author's Acknowledgments: I would like to thank Patrick Ames, our Editor in Chief, for his continuous encouragement and support from the conception of this idea until the delivery. There is no way that this book would have been successfully completed without Patrick's support. John Merline and Steve Steiner provided invaluable technical review; Girish SV spent a large amount of time carefully reviewing this book and made sure that it's ready for publishing. Nancy Koerbel made sure that it shipped without any embarrassing mistakes. Thanks to Steve Steiner, Mahesh Chandak, Ruchir Jain, Vaibhav Garg, and John Merline for being great mentors and friends.
Last but not least, I'd like to thank my family and my wife, Tanu, for providing all the support and love that they always do.

This book is available in a variety of formats at: http://www.juniper.net/dayone.

Welcome to This Week

This Week books are an outgrowth of the popular Day One book series published by Juniper Networks Books. Day One books focus on providing just the right amount of information that you can execute, or absorb, in a day. This Week books, on the other hand, explore networking technologies and practices that in a classroom setting might take several days to absorb or complete. Both libraries are available to readers in multiple formats:

- Download a free PDF edition at http://www.juniper.net/dayone.
- Get the ebook edition for iPhones and iPads at the iTunes Store > Books. Search for Juniper Networks Books.
- Get the ebook edition for any device that runs the Kindle app (Android, Kindle, iPad, PC, or Mac) by opening your device's Kindle app and going to the Kindle Store. Search for Juniper Networks Books.
- Purchase the paper edition at either Vervante Corporation (www.vervante.com) or Amazon (www.amazon.com) for prices between $12-$28 U.S., depending on page length.

Note that Nook, iPad, and various Android apps can also view PDF files. If your device or ebook app uses .epub files but isn't an Apple product, open iTunes and download the .epub file from the iTunes Store. You can then drag and drop the file out of iTunes onto your desktop and sync with your .epub device.

What You Need to Know Before Reading

Before reading this book, you should be familiar with the basic administrative functions of the Junos operating system, including the ability to work with operational commands and to read, understand, and change Junos configurations. There are several books in the Day One library to help you learn Junos administration, at www.juniper.net/dayone.

This book makes a few assumptions about you, the reader:

- You have a working understanding of Junos and the Junos CLI, including configuration changes using edit mode. See the Day One books at www.juniper.net/dayone for a variety of tutorials on Junos at all skill levels.
- You can make configuration changes using the CLI edit mode.
- You have an understanding of networking fundamentals like ARP, MAC addresses, etc.
- You have a thorough familiarity with BGP fundamentals.
- You have a thorough familiarity with MP-BGP and MPLS-VPN fundamentals and their terminologies. See This Week: Deploying MBGP Multicast VPNs, Second Edition at www.juniper.net/dayone for a quick review.
- Finally, this book uses outputs from actual QFabric systems and deployments; readers are strongly encouraged to have a stable lab setup on which to execute those commands.

What You Will Learn From This Book

- You'll understand the workings of the QFabric technology (in great detail).
- You'll be able to compare the similarities of the QFabric architecture with MPLS-VPN technology.
- You'll be able to verify the integrity of various protocols that ensure smooth functioning of the QFabric system.
- You'll understand the various active/backup Routing Engines within the QFabric.
- You'll understand the various data plane and control plane flows for different kinds of traffic within the QFabric system.
- You'll be able to operate and effectively troubleshoot issues that you might face with a QFabric system deployment.

Information Experience

This book is singularly focused on one aspect of networking technology.
There are other sources at Juniper Networks, from white papers to webinars to online forums such as J-Net (forums.juniper.net). Look for the following sidebars to directly access other superb informational resources:

MORE? It's highly recommended you go through the technical documentation and the minimum requirements to get a sense of QFabric hardware and deployment before you jump in. The technical documentation is located at www.juniper.net/documentation. Use the Pathfinder tool on the documentation site to explore and find the right information for your needs.

About This Book

This book focuses on the inner workings and internal traffic flows of the Juniper Networks QFabric solution and does not address deployment or configuration practices.

MORE? The complete deployment guide for the QFabric can be found here: https://www.juniper.net/techpubs/en_US/junos11.3/information-products/pathway-pages/qfx-series/qfabric-deployment.html.

QFabric vs. Legacy Data Center Architecture

Traditional Data Center architecture follows a layered approach to building a Data Center, using separate switch devices for access, aggregation, and core layers. Obviously these devices have different capacities with respect to their MAC table sizes, depending on their role or their placement in the different layers.

Since Data Centers host mission-critical applications, redundancy is of prime importance. To provide the necessary physical redundancy within a Data Center, Spanning Tree Protocol (STP) is used. STP is a popular technology and is widely deployed around the world. A Data Center like the one depicted in Figure A.1 always runs some flavor of STP to manage the physical redundancy. But there are drawbacks to using Spanning Tree Protocol for redundancy:

Figure A.1: Traditional Layered Data Center Topology

- STP works on the basis of blocking certain ports, meaning that some ports can potentially be overloaded while the blocked ports do not forward any traffic at all. This is highly undesirable, especially because the switch ports deployed in a Data Center are rather costly. This situation of some ports not forwarding any traffic can be overcome somewhat by using different flavors of the protocol, like PVST or MSTP, but STP inherently works on the principle of blocking ports. Hence, even with PVST or MSTP, complete load balancing of traffic over all the ports cannot be achieved.
- Using the PVST and MSTP versions, load balancing can be done across VLANs: one port can block for one VLAN or a group of VLANs and another port can block for the rest of the VLANs. However, there is no way to provide load balancing for different flows within the same VLAN.
- Spanning Tree relies on communication between different switches. If there is some problem with STP communication, then the topology change recalculations that follow can lead to small outages across the whole Layer 2 domain. Even small outages like these can cause significant revenue loss for applications hosted in your Data Center.

By comparison, a completely scaled QFabric system can have up to 128 member switches. This new technology works by combining all the member switches and making them function as a single unit toward other external devices. So if your Data Center deploys a QFabric with one hundred QFX3500 Nodes, then all those one hundred switches will act as a single switch. In short, that single switch (the QFabric) will have (100 x 48) 4800 ports!

Since all the different QFX3500 Nodes act as a single switch, there is no need to run any kind of loop prevention protocol like Spanning Tree.
At the same time, there is no compromise on redundancy, because all the Nodes have redundant connections to the backplane (details on the connections between different components of a QFabric system are discussed throughout this book). This is how the QFabric solution takes care of the STP problem within the Data Center.

Consider the case of a traditional (layered) Data Center design. If two hosts connected to different access switches need to communicate with each other, they need to cross multiple switches in order to do that. In other words, communication in the same or a different VLAN might need to cross multiple switch hops to be successful. Since all the Nodes within a QFabric system work together and act as one large switch, all the external devices connected to the QFabric Nodes (servers, filers, load balancers, etc.) are just one hop away from each other. This leads to a lower number of lookups and, hence, considerably reduced latency.

Different Components of a QFabric System

A QFabric system has multiple physical and logical components; let's identify them here so you have a common place you can return to when you need to review them.

Physical Components

A QFabric system has the following physical components, as shown in Figure A.2:

- Nodes: These are the top-of-rack (TOR) switches to which external devices are connected. All the server-facing ports of a QFabric system reside on the Nodes. There can be up to 128 Nodes in a QFabric-G system and up to 16 Nodes in a QFabric-M implementation. Up-to-date details on the differences between the various QFabric systems can be found here: http://www.juniper.net/us/en/products-services/switching/qfabric-system/#overview.
- Interconnects: The Interconnects act as the backplane for all the data plane traffic. All the Nodes should be connected to all the Interconnects as a best practice. There can be up to four Interconnects (QFX3008-I) in both QFabric-G and QFabric-M implementations.
- Director Group: There are two Director devices (DG0 and DG1) in both QFabric-G and QFabric-M implementations. These Director devices are the brains of the whole QFabric system and host the necessary virtual components (VMs) that are critical to the health of the system. The two Director devices operate in a master/slave relationship. Note that all the protocol/route/inventory states are always synced between the two.
- Control Plane Ethernet Switches: These are two independent EX VCs or EX switches (in the case of QFabric-G and QFabric-M, respectively) to which all the other physical components are connected. These switches provide the necessary Ethernet network over which the QFabric components can run the internal protocols that maintain the integrity of the whole system. The LAN segment created by these devices is called the Control Plane Ethernet segment, or the CPE segment.

Figure A.2: Components of a QFabric System

Virtual Components

The Director devices host the following Virtual Machines:

- Network Node Group VM: The NWNG-VMs are the routing brains of a QFabric system, where all the routing protocols like OSPF, BGP, or PIM are run. There are two NWNG-VMs in a QFabric system (one hosted on each DG) and they operate in an active/backup fashion, with the active VM always being hosted on the master Director device.
- Fabric Manager: The Fabric Manager VM is responsible for maintaining the hardware inventory of the whole system. This includes discovering new Nodes and Interconnects as they're added and keeping track of the ones that are removed. The Fabric Manager is also in charge of keeping a complete topological view of how the Nodes are connected to the Interconnects. In addition to this, the FM also needs to provide internal IP addresses to every other component to allow the internal protocols to operate properly. There is one Fabric Manager VM hosted on each Director device and these VMs operate in an active/backup configuration.
- Fabric Control: The Fabric Control VM is responsible for distributing various routes (Layer 2 or Layer 3) to the different Nodes of a QFabric system. This VM forms internal BGP adjacencies with all the Nodes and Interconnects and sends the appropriate routes over these BGP peerings. There is one Fabric Control VM hosted on each Director device and these operate in an active/active fashion.

Node Groups Within a QFabric System

Node groups are a new concept introduced by the QFabric technology; a Node group is a logical collection of one or more physical Nodes that are part of a QFabric system. Whenever multiple Nodes are configured to be part of a Node group, they act as one. Individual Nodes can be configured to be a part of these kinds of Node groups:

- Server Node Group (SNG): This is the default group and consists of one Node. Whenever a Node becomes part of a QFabric system, it comes up as an SNG. These mostly connect to servers that do not need any cross-Node redundancy. The most common examples are servers that have only one NIC.
- Redundant Server Node Group (RSNG): An RSNG consists of two physical Nodes. The Routing Engines on the Nodes operate in an active/backup fashion (think of a Virtual Chassis with two member switches). You can configure multiple pairs of RSNGs within a QFabric system. These mostly connect to dual-NIC servers.
- Network Node Group (NWNG): Each QFabric has one Network Node Group, and up to eight physical Nodes can be configured to be part of the NWNG. The Routing Engines (REs) on the Nodes are disabled, and the RE functionality is handled by the NWNG-VMs that are located on the Director devices.

MORE? Every Node device can be a part of only one Node group at a time. The details on how to configure different kinds of Node groups can be found here: http://www.juniper.net/techpubs/en_US/junos12.2/topics/task/configuration/qfabric-node-groups-configuring.html.

NOTE Chapter 3 covers these abstractions, including a discussion of packet flows.
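To make the Node group concept a little more concrete, here is a minimal configuration sketch. It assumes an RSNG with the hypothetical alias RSNG-1 built from two Node devices (aliased Node2 and Node3) and a Node device Node0 added to the Network Node Group; these names match the sample system that appears later in Chapter 2, but are otherwise arbitrary. Treat this only as an illustration and refer to the configuration guide linked above for the authoritative syntax and workflow:

root@Test-QFABRIC# set fabric resources node-group RSNG-1 node-device Node2
root@Test-QFABRIC# set fabric resources node-group RSNG-1 node-device Node3
root@Test-QFABRIC# set fabric resources node-group NW-NG-0 network-domain
root@Test-QFABRIC# set fabric resources node-group NW-NG-0 node-device Node0
root@Test-QFABRIC# commit

Once a configuration like this is committed, the groups and their member Nodes should be listed in the show fabric administration inventory output used throughout this book.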
Differences Between a QFabric System and a Virtual Chassis

Juniper EX Series switches support Virtual Chassis (VC) technology, which enables multiple physical switches to be combined. These multiple switches then act as a single switch.

MORE? For more details on the Virtual Chassis technology, refer to the following technical documentation: https://www.juniper.net/techpubs/en_US/junos13.3/topics/concept/virtual-chassis-ex4200-components.html.

One of the advantages of a QFabric system is its scale. A Virtual Chassis can host tens of switches, but a fully scaled QFabric system can have a total of 128 Nodes combined.

The QFabric system, however, is much more than a supersized Virtual Chassis. QFabric technology completely externalizes the data plane because of the Interconnects. Chapter 4 discusses the details of how user data, or data plane traffic, flows through the external data plane.

Another big advantage of the QFabric system is that the Nodes can be present at various locations within the Data Center. The Nodes are normally deployed as top-of-rack (TOR) switches, and to connect to the backplane, the Nodes connect to the Interconnects. That's why QFabric is such a massive advantage from an operations point of view: one large switch spans the entire Data Center, yet the cables from the servers still plug in to only the top-of-rack switches.

Various components of the QFabric system (including the Interconnects) are discussed throughout this book, and you can return to these pages to review these basic definitions at any time.

Chapter 1: Physical Connectivity and Discovery

Interconnections of Various Components
Why Do You Need Discovery?
System and Component Discovery
Fabric Topology Discovery (VCCPDf)
Relation Between VCCPD and VCCPDf
Test Your Knowledge

This chapter discusses what a plain-vanilla QFabric system is supposed to look like. It does not discuss any issues in the data plane or with packet forwarding; its only focus is the internal workings of QFabric and checking the protocols that are instrumental in making the QFabric system function as a single unit.

The important first step in setting up a QFabric system is to cable it correctly.

MORE? Juniper has great documentation about cabling and setting up a QFabric system, so it won't be repeated here. If you need to, review the best practices of QFabric cabling: https://www.juniper.net/techpubs/en_US/junos11.3/information-products/pathway-pages/qfx-series/qfabric-deployment.html

Make sure that the physical connections are made exactly as described in the deployment guide. That's how the test units used for this book were set up. Any variations in your lab QFabric system might cause discrepancies with the correlating output shown in this book.

Interconnections of Various Components

As discussed, the QFabric system consists of multiple physical components, and these components need to be connected to each other as well. Consider these inter-component links:

- Nodes to EX Series VC: 1GbE links
- Interconnects to EX Series VC: 1GbE links
- DG0/DG1 to EX Series VC: 1GbE links
- DG0 to DG1: 1GbE links
- Nodes to Interconnects: 40GbE links

All these physical links are Gigabit Ethernet links except for the 40GbE links between the Nodes and the Interconnects; these 40GbE links show up as FTE interfaces on the CLI. The usual Junos commands like show interfaces terse and show interfaces extensive apply and should be used for troubleshooting any issues related to finding errors on the physical interfaces. The only devices where these Junos commands cannot be run are the Director devices, because they run on Linux.
However, the usual Linux commands do work on Director devices (like ifconfig, top, free, etc.).

Let's start with one of the most common troubleshooting utilities that a network engineer needs to know about: checking the status of the interfaces and their properties on the Director devices. To check the status of the Director devices' interfaces from the Linux prompt, the regular ifconfig command can be used. However, this output uses the following names for specific interface types:

- bond0: This is the name of the aggregated interface that gets connected to the other Director device (DG). The two Director devices are called DG0 and DG1. Note that the IP address for the bond0 interface on DG0 is always 1.1.1.1, and on DG1 it is always set to 1.1.1.2. This link is used for syncing and maintaining the states of the two Director devices. These states include VMs, configurations, file transfers (cores), etc.
- bond1: This aggregated interface is used mainly for internal control plane communication between the Director devices and other QFabric components like the Nodes and the Interconnects.
- eth0: This is the management interface of the DG. This interface gets connected to the network, and you can SSH to the IP address of this interface from an externally reachable machine. Each Director device has an interface called eth0, which should be connected to the management network. At the time of installation, the QFabric system prompts the user to enter the IP address for the eth0 interface of each Director device. In addition to this, the user is required to add a third IP address called the VIP (Virtual IP Address). This VIP is used to manage the operations of the QFabric, such as SSH, telnet, etc. (A quick shell check of these interfaces is sketched after the sample output below.)

Also, the CLI command show fabric administration inventory director-group status shows the status of all the interfaces.
Here is sample output of this CLI command:

root@TEST-QFABRIC> show fabric administration inventory director-group status
Director Group Status                    Tue Feb 11 08:32:50 CST 2014

Member  Status  Role    Mgmt Address   CPU  Free Memory  VMs  Up Time
------  ------  ------  -------------  ---  -----------  ---  -------------------
dg0     online  master  172.16.16.5    1%   3429452k     4    97 days, 02:14 hrs
dg1     online  backup  172.16.16.6    0%   8253736k     3    69 days, 23:42 hrs

Member  Device Id/Alias  Status  Role
------  ---------------  ------  ------
dg0     TSTDG0           online  master

Master Services
---------------
Database Server                 online
Load Balancer Director          online
QFabric Partition Address       offline

Director Group Managed Services
-------------------------------
Shared File System              online
Network File System             online
Virtual Machine Server          online
Load Balancer/DHCP              online

Hard Drive Status
-----------------
Volume ID:0FFF04E1F7778DA3      optimal
Physical ID:0                   online
Physical ID:1                   online
Resync Progress Remaining:0     0%
Resync Progress Remaining:1     0%

Size   Used  Avail  Used%  Mounted on
-----  ----  -----  -----  ----------
423G   36G   366G   9%     /
99M    16M   79M    17%    /boot
93G    13G   81G    14%    /pbdata

Director Group Processes
------------------------
Director Group Manager          online
Partition Manager               online
Software Mirroring              online
Shared File System master       online
Secure Shell Process            online
Network File System             online
FTP Server                      online
Syslog                          online
Distributed Management          online
SNMP Trap Forwarder             online
SNMP Process                    online
Platform Management             online

Interface Link Status
---------------------
Management Interface            up
Control Plane Bridge            up
Control Plane LAG               up
  CP Link [0/2]                 down
  CP Link [0/1]                 up
  CP Link [0/0]                 up
  CP Link [1/2]                 down
  CP Link [1/1]                 up
  CP Link [1/0]                 up
Crossover LAG                   up
  CP Link [0/3]                 up
  CP Link [1/3]                 up

Member  Device Id/Alias  Status  Role
------  ---------------  ------  ------
dg1     TSTDG1           online  backup

Director Group Managed Services
-------------------------------
Shared File System              online
Network File System             online
Virtual Machine Server          online
Load Balancer/DHCP              online

Hard Drive Status
-----------------
Volume ID:0A2073D2ED90FED4      optimal
Physical ID:0                   online
Physical ID:1                   online
Resync Progress Remaining:0     0%
Resync Progress Remaining:1     0%

Size   Used  Avail  Used%  Mounted on
-----  ----  -----  -----  ----------
423G   39G   362G   10%    /
99M    16M   79M    17%    /boot
93G    13G   81G    14%    /pbdata

Director Group Processes
------------------------
Director Group Manager          online
Partition Manager               online
Software Mirroring              online
Shared File System master       online
Secure Shell Process            online
Network File System             online
FTP Server                      online
Syslog                          online
Distributed Management          online
SNMP Trap Forwarder             online
SNMP Process                    online
Platform Management             online

Interface Link Status
---------------------
Management Interface            up
Control Plane Bridge            up
Control Plane LAG               up
  CP Link [0/2]                 down
  CP Link [0/1]                 up
  CP Link [0/0]                 up
  CP Link [1/2]                 down
  CP Link [1/1]                 up
  CP Link [1/0]                 up
Crossover LAG                   up
  CP Link [0/3]                 up
  CP Link [1/3]                 up

root@TEST-QFABRIC>
--snip--

Note that this output is taken from a QFabric-M system, and hence port 0/2 is down on both Director devices.

Details on how to connect these ports on the DGs are discussed in the QFabric Installation Guide cited at the beginning of this chapter, but once the physical installation of a QFabric system is complete, you should verify the status of all the ports. You'll find that once a QFabric system is installed correctly, it is ready to forward traffic, and the plug-and-play features of the QFabric technology make it easy to install and maintain.
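In addition to this CLI view, the same interfaces can be checked directly from the Linux shell of either Director device with the standard tools mentioned earlier. A minimal sketch (the interface names follow the bond0/bond1/eth0 convention described above; addresses other than the fixed 1.1.1.1/1.1.1.2 pair depend on your installation):

[root@dg0 ~]# ifconfig bond0      # DG0-DG1 crossover LAG; expect 1.1.1.1 on dg0 and 1.1.1.2 on dg1
[root@dg0 ~]# ifconfig bond1      # internal control plane LAG toward the CPE switches
[root@dg0 ~]# ifconfig eth0       # management interface of this Director device
[root@dg0 ~]# ping -c 3 1.1.1.2   # from dg0, a quick check that the crossover link to dg1 is healthy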
A single QFabric system has multiple physical components, so let's assume you've cabled your test bed correctly in your lab and review how a QFabric system discovers its multiple components and makes sure that those different components act as a single unit.

Why Do You Need Discovery?

The front matter of this book succinctly discusses the multiple physical components that comprise a QFabric system. The control plane consists of the Director group and the EX Series VC. The data plane of a QFabric system consists of the Nodes and the Interconnects. A somewhat loose (and imperfect) analogy is that the Director group is similar to the Routing Engines of a chassis-based switch, the Nodes are similar to the line cards, and the Interconnects are similar to the backplane of a chassis-based switch. But QFabric is different from a chassis-based switch as far as system discovery is concerned.

Consider a chassis-based switch. There are only a certain number of slots in such a device and the line cards can only be inserted into the slots available. After a line card is inserted in one of the available slots, it is the responsibility of the Routing Engine to discover this card. Since there are a finite number of slots, it is much easier to detect the presence or absence of a line card in a chassis with the help of hardware-based assistance. Think of it as a hardware knob that gets activated whenever a line card is inserted into a slot. Hence, discovering the presence or absence of a line card is easy in a chassis-based device.

However, QFabric is a distributed architecture that was designed to suit the needs of a modern Data Center. A regular Data Center has many server cabinets, and the Nodes of a QFabric system can act as the TOR switches. Note that even though the Nodes can be physically located in different places within the Data Center, they still act as a single unit. One of the implications of this design is that the QFabric system can no longer use a hardware-assist mechanism to detect the different physical components of the system. For this reason, QFabric uses an internal protocol called Virtual Chassis Control Protocol Daemon (VCCPD) to make sure that all the system components can be detected.

System and Component Discovery

Virtual Chassis Control Protocol Daemon runs on the Control Plane Ethernet network and is active by default on all the components of the QFabric system. This means that VCCPD runs on all the Nodes, Interconnects, Fabric Control VMs, Fabric Manager VMs, and the Network Node Group VMs. Note that this protocol runs on the backup VMs as well.

There is a VM that runs on the DG whose function is to form these VCCPD adjacencies with all the devices. This VM is called the Fabric Manager, or FM.

The Control Plane Ethernet network is comprised of the EX Series VC and all of the physical components that have Ethernet ports connected to these EX VCs. Since there are no IP addresses on the devices when they come up, VCCPD is based on the IS-IS protocol, so no IP addresses are needed for system discovery. All the components send out and receive VCCPD Hello messages on the Control Plane Ethernet (CPE) network. With the help of these messages, the Fabric Manager VM is able to detect all the components that are connected to the EX Series VCs.

Consider a system in which you have only the DGs connected to the EX Series VC.
The DGs host the Fabric Manager VMs, which send VCCPD Hellos on the CPE network. When a new Node is connected to the CPE, the FM and the new Node form a VCCPD adjacency, and this is how the DGs detect the event of a new Node's addition to the QFabric system. This same process holds true for the Interconnect devices, too.

After the adjacency is created, the Nodes, Interconnects, and the FM send out periodic VCCPD Hellos on the CPE network. These Hellos act as heartbeat messages, and bidirectional Hellos confirm the presence or absence of the components. If the FM doesn't receive a VCCPD Hello within the hold time, then that device is considered dead and all the routes that were originated from that Node are flushed out from the other Nodes.

Like any other protocol, VCCPD adjacencies are formed by the Routing Engine of each component, so VCCPD adjacency status is available at:

- The NNG VM for the Node devices that are a part of the Network Node Group
- The master RSNG Node device for a Redundant Server Node Group
- The RE of a standalone Server Node Group device

The show virtual-chassis protocol adjacency provisioning CLI command shows the status of the VCCPD adjacencies:

qfabric-admin@NW-NG-0> show virtual-chassis protocol adjacency provisioning
Interface    System           State  Hold (secs)
vcp1.32768   P7814-C          Up     28
vcp1.32768   P7786-C          Up     28
vcp1.32768   R4982-C          Up     28
vcp1.32768   TSTS2510b        Up     29
vcp1.32768   TSTS2609b        Up     28
vcp1.32768   TSTS2608a        Up     27
vcp1.32768   TSTS2610b        Up     28
vcp1.32768   TSTS2611b        Up     28
vcp1.32768   TSTS2509b        Up     28
vcp1.32768   TSTS2511a        Up     29
vcp1.32768   TSTS2511b        Up     28
vcp1.32768   TSTS2510a        Up     28
vcp1.32768   TSTS2608b        Up     29
vcp1.32768   TSTS2610a        Up     28
vcp1.32768   TSTS2509a        Up     28
vcp1.32768   TSTS2611a        Up     28
vcp1.32768   TSTS1302b        Up     29
vcp1.32768   TSTS2508a        Up     29
vcp1.32768   TSTS2508b        Up     29
vcp1.32768   TSTNNGS1205a     Up     27
vcp1.32768   TSTS1302a        Up     28
vcp1.32768   __NW-INE-0_RE0   Up     28
vcp1.32768   TSTNNGS1204a     Up     29
vcp1.32768   G0548/RE0        Up     27
vcp1.32768   G0548/RE1        Up     28
vcp1.32768   G0530/RE1        Up     29
vcp1.32768   G0530/RE0        Up     28
vcp1.32768   __RR-INE-1_RE0   Up     29
vcp1.32768   __RR-INE-0_RE0   Up     29
vcp1.32768   __DCF-ROOT.RE0   Up     29
vcp1.32768   __DCF-ROOT.RE1   Up     28
{master}

The same output can also be viewed from the Fabric Manager VM:

root@Test-QFabric> request component login FM-0
Warning: Permanently added 'dcfnode---dcf-root,169.254.192.17' (RSA) to the list of known hosts.
--- JUNOS 12.2X50-D41.1 built 2013-03-22 21:44:05 UTC
qfabric-admin@FM-0>
qfabric-admin@FM-0> show virtual-chassis protocol adjacency provisioning
Interface    System           State  Hold (secs)
vcp1.32768   P7814-C          Up     27
vcp1.32768   P7786-C          Up     28
vcp1.32768   R4982-C          Up     29
vcp1.32768   TSTS2510b        Up     29
vcp1.32768   TSTS2609b        Up     28
vcp1.32768   TSTS2608a        Up     29
vcp1.32768   TSTS2610b        Up     28
vcp1.32768   TSTS2611b        Up     28
vcp1.32768   TSTS2509b        Up     27
vcp1.32768   TSTS2511a        Up     29
vcp1.32768   TSTS2511b        Up     29
vcp1.32768   TSTS2510a        Up     27
vcp1.32768   TSTS2608b        Up     28
vcp1.32768   TSTS2610a        Up     28
vcp1.32768   TSTS2509a        Up     28
vcp1.32768   TSTS2611a        Up     28
vcp1.32768   TSTS1302b        Up     28
vcp1.32768   TSTS2508a        Up     27
vcp1.32768   TSTS2508b        Up     29
vcp1.32768   TSTNNGS1205a     Up     28
vcp1.32768   TSTS1302a        Up     29
vcp1.32768   __NW-INE-0_RE0   Up     28
vcp1.32768   TSTNNGS1204a     Up     28
vcp1.32768   G0548/RE0        Up     28
vcp1.32768   G0548/RE1        Up     29
vcp1.32768   G0530/RE1        Up     28
vcp1.32768   G0530/RE0        Up     27
vcp1.32768   __RR-INE-1_RE0   Up     29
vcp1.32768   __NW-INE-0_RE1   Up     28
vcp1.32768   __DCF-ROOT.RE0   Up     29
vcp1.32768   __RR-INE-0_RE0   Up     28
--snip--

VCCPD Hellos are sent every three seconds, and the adjacency is lost if the peers don't see each other's Hellos for 30 seconds. After the Nodes and Interconnects form VCCPD adjacencies with the Fabric Manager VM, the QFabric system has a view of all the connected components.
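As a concrete reading of the output above: with a 30-second hold time and Hellos every three seconds, a healthy adjacency shows a Hold (secs) value that keeps being refreshed back toward 30 (the 27-29 values seen here). A value that steadily counts down toward zero for one peer is a strong hint that that component's Hellos are no longer arriving over the CPE network.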
Note that the VCCPD adjacency only provides details about how many Nodes and Interconnects are present in a QFabric system. VCCPD does not provide any information about the data plane of the QFabric system; that is, it doesn't provide information about the status of the connections between the Nodes and the Interconnects.

Fabric Topology Discovery (VCCPDf)

The Nodes can be either QFX3500s or QFX3600s (QFX5100s are supported as QFabric Nodes only from 13.2X52-D10 onwards), and both of these have four FTE links by default. Note that the term FTE link here means a link that can be connected to the Interconnects. The number of FTE links on a QFX3600 can be modified using the CLI, but this modification cannot be performed on the QFX3500. These FTE links can be connected to up to four different Interconnects, and the QFabric system uses a protocol called VCCPDf (VCCPD over fabric links), which helps the Director devices form a complete topological view of the QFabric system.

One of the biggest advantages of the QFabric technology is its flexibility and its ability to scale. To further understand this flexibility and scalability, consider a new Data Center deployment in which the initial bandwidth requirements are so low that none of the Nodes are expected to have more than 80 Gbps of incoming traffic at any given point in time. This means that this Data Center can be deployed with all the Nodes having just two out of the four FTE links connected to the Interconnects. To have the necessary redundancy, these two FTE links would be connected to two different Interconnects. In short, such a Data Center can be deployed with only two Interconnects. However, as the traffic needs of the Data Center grow, more Interconnects can be deployed and the Nodes can then be connected to the newly added Interconnects to allow for greater data plane bandwidth. This kind of flexibility allows for future-proofing an investment made in the QFabric technology.

Note that a QFabric system has the built-in intelligence to figure out how many FTE links are connected on each Node, and this information is necessary in order to know how to load-balance various kinds of traffic between different Nodes.

The QFabric technology uses VCCPDf to figure out the details of the data plane. Whenever a new FTE link is added or removed, it triggers the creation of a new VCCPDf adjacency or the deletion of an existing VCCPDf adjacency, respectively. This information is then fed back to the Director devices over the CPE links so that the QFabric system can always maintain a complete topological view of how the Nodes are connected to the Interconnects. Basically, VCCPDf is a protocol that runs on the FTE links between the Nodes and the Interconnects.
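Since the FTE ports are ordinary interfaces from the CLI's point of view, their state can be checked with the same interface commands used elsewhere in Junos. A small sketch, assuming the 40GbE uplinks show up with the fte- prefix mentioned earlier and using fte-0/1/0 purely as an illustrative name (the exact port numbering depends on the Node hardware and the component you are logged in to):

qfabric-admin@NW-NG-0> show interfaces terse | match fte
qfabric-admin@NW-NG-0> show interfaces extensive fte-0/1/0

The first command lists the 40GbE uplinks and their up/down state; the second is the usual place to look for errors on one specific uplink toward an Interconnect.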
VCCPDf runs on all the Nodes and the Interconnects, but only on the 40GbE (or FTE) ports. VCCPDf utilizes the neighbor discovery portion of IS-IS. As a result, each Node device knows how many Interconnects it is connected to, the device ID of those Interconnects, and the connected port IDs on the Interconnects. Similarly, each Interconnect knows how many Node devices it is connected to, the device ID of those Node devices, and the connected port IDs on the Node devices. This information is fed back to the Director devices. With the help of this information, the Director devices are able to formulate the complete topological picture of the QFabric system.

This topological information is necessary in order to configure the forwarding tables of the Node devices efficiently. The sequence of steps mentioned later in this chapter explains why the topological database is needed. (This topological database contains the information about how the Nodes are connected to the Interconnects.)

Relation Between VCCPD and VCCPDf

All Juniper devices that run the Junos OS run a process called chassisd (chassis daemon). The chassisd process is responsible for monitoring and managing all the hardware-based components present on the device. QFabric software also uses chassisd. Since there is a system discovery phase involved, inventory management is a little different in this distributed architecture.

Here are the steps that take place internally with respect to system discovery, VCCPD, and VCCPDf:

- The Nodes, Interconnects, and the VMs exchange VCCPD Hellos on the Control Plane Ethernet network.
- The Fabric Manager VM processes the VCCPD Hellos from the Nodes and the Interconnects.
- The Fabric Manager VM then assigns a unique PFE-ID to each Node and Interconnect. (The algorithm behind the generation of the PFE-ID is Juniper confidential and is beyond the scope of this book.)
- This PFE-ID is also used to derive the internal IP address for the components.
- After a Node or an Interconnect is detected by VCCPD, the FTE links are activated and VCCPDf starts running on the 40GbE links.
- Whenever a new 40GbE link is brought up on a Node or an Interconnect, this information is sent back to the Fabric Manager so that it can update its view of the topology. Note that any communication with the Fabric Manager is done using the CPE network.
- Whenever such a change occurs (an FTE link is added or removed), the Fabric Manager recomputes the way data should be load balanced on the data plane. Note that the load balancing does not take place per packet or per prefix. The QFabric system applies an algorithm to find out the different FTE links through which the other Nodes can be reached. Consider a Node that has only one FTE link connected to an Interconnect. At this point in time, the Node has only one way to reach the other Nodes. Now, if another FTE link is connected, the programming is altered to make sure the next hop for some Nodes is FTE-1 and FTE-2 for others.

With the help of both VCCPD and VCCPDf, the QFabric's Director devices are able to get information about:

- How many devices, and which ones (Nodes and Interconnects), are part of the QFabric system (VCCPD).
- How the Nodes are connected to the Interconnects (VCCPDf).

At this point in time, the QFabric system becomes ready to start forwarding traffic. Now let's take a look at how VCCPD and VCCPDf become relevant in a real-life QFabric solution. Consider this sequence of steps:
1. The only connections present are the DG0-DG1 connections and the connections between the Director devices and the EX Series VC.

1.1. Note that DG0 and DG1 assign IP addresses of 1.1.1.1 and 1.1.1.2, respectively, to their bond0 links. This is the link over which the Director devices sync up with each other.

Figure 1.1: Only the Control Plane Connections are Up

1.2. The Fabric Manager VM running on the DGs runs VCCPD, and the DGs send VCCPD Hellos on their links to the EX Series VC. Note that there would be no VCCPD neighbors at this point in time, as the Nodes and Interconnects are yet to be connected. Also, the Control Plane switches (EX Series VC) do not participate in the VCCPD adjacencies. Their function is only to provide a Layer 2 segment for all the components to communicate with each other.

Figure 1.2: Two Interconnects are Added to the CPE Network

2. In Figure 1.2, two Interconnects (IC-1 and IC-2) are connected to the EX Series VC.

2.1. The Interconnects start running VCCPD on the link connected to the EX Series VC. The EX Series VC acts as a Layer 2 switch and only floods the VCCPD packets.

2.2. The Fabric Manager VMs and the Interconnects see each other's VCCPD Hellos and become neighbors. At this point in time, the DGs know that IC-1 and IC-2 are a part of the QFabric system.

Figure 1.3: Two Nodes are Connected to the CPE Network

3. In Figure 1.3, two new Node devices (Node-1 and Node-2) are connected to the EX Series VC.

3.1. The Nodes start running VCCPD on the links connected to the EX Series VC. Now the Fabric Manager VMs know that there are four devices in the QFabric inventory: IC-1, IC-2, Node-1, and Node-2.

3.2. Note that none of the FTE interfaces of the Nodes are up yet. This means that there is no way for the Nodes to forward traffic (there is no data plane connectivity). Whenever such a condition occurs, Junos disables all the 10GbE interfaces on the Node devices. This is a safety measure to make sure that a user cannot connect a production server to a Node device that doesn't have any active FTE ports. This also makes troubleshooting very easy: if all the 10GbE ports of a Node device go down even when devices are connected to it, the first place to check should be the status of the FTE links. If none of the FTE links are in the up/up state, then all the 10GbE interfaces will be disabled. In addition to bringing down all the 10GbE ports, the QFabric system also raises a major system alarm. The alarms can be checked using the show system alarms CLI command.

Figure 1.4: Node-1 and Node-2 are Connected to IC-1 and IC-2, Respectively

4. In Figure 1.4, the following FTE links are connected:

4.1. Node-1 to IC-1.

4.2. Node-2 to IC-2.

4.3. The Nodes and the Interconnects run VCCPDf on the FTE links and see each other.

4.4. This VCCPDf information is fed to the Director devices. At this point in time, the Directors know that there are four devices in the QFabric system (this was established at point 3.1), that Node-1 is connected to IC-1, and that Node-2 is connected to IC-2.

5. Note that some of the data plane of the QFabric is now connected, but there is still no connectivity for hosts across Node devices. This is because Node-1 has no way to reach Node-2 via the data plane, and vice versa, as the Interconnects are never connected to each other. The only interfaces for the internal data plane of the QFabric system are the 40GbE FTE interfaces. In this particular example, Node-1 is connected to IC-1, but IC-1 is not connected to Node-2. Similarly, Node-2 is connected to IC-2, but IC-2 is not connected to Node-1. Hence, hosts connected behind Node-1 have no way of reaching hosts connected behind Node-2, and vice versa.

Figure 1.5: Node-1 is Connected to IC-2

6. In Figure 1.5, Node-1 is connected to IC-2. At this point, the Fabric Manager has the following information:

6.1. There are four devices in the QFabric system.

6.2. IC-1 is connected to Node-1.

6.3. IC-2 is connected to both Node-1 and Node-2. The Fabric Manager VM running inside the Director devices realizes that Node-1 and Node-2 now have mutual reachability via IC-2. The FM programs the internal forwarding table of Node-1: now Node-1 knows that to reach Node-2, it needs to use the next hop of IC-2. The FM also programs the internal forwarding table of Node-2: now Node-2 knows that to reach Node-1, it needs to use the next hop of IC-2.

6.4. At this point in time, hosts connected behind Node-1 should be able to communicate with hosts connected behind Node-2 (provided that the basic laws of networking, like VLANs, routing, etc., are obeyed).

Figure 1.6: Node-2 is Connected to IC-1, which Completes the Data Plane of the QFabric System

7. In Figure 1.6, Node-2 is connected to IC-1.

7.1. The Nodes and IC-1 discover each other using VCCPDf and send this information to the Fabric Manager VM running on the Directors.

7.2. Now the FM realizes that Node-1 can reach Node-2 via IC-1 as well.

7.3. After the FM finishes programming the tables of Node-1 and Node-2, both Node devices have two next hops to reach each other. These two next hops can be used for load-balancing purposes. This is where the QFabric solution provides excellent High Availability and also effective load balancing of different flows as more 40GbE uplinks are added to the Node devices.

At the end of all these steps, the internal VCCPD and VCCPDf adjacencies of the QFabric are complete, and the Fabric Manager has a complete topological view of the system.
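As a practical recap of this walkthrough, a quick health check after cabling a new system might look like the sketch below. Both commands were introduced earlier in this chapter; the component name FM-0 is the one used on the test system in this book:

root@TEST-QFABRIC> show system alarms
root@TEST-QFABRIC> request component login FM-0
qfabric-admin@FM-0> show virtual-chassis protocol adjacency provisioning

No major alarm should be pointing at missing FTE links, and every Node and Interconnect that has been cabled should show up with a VCCPD adjacency in the Up state.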
Test Your Knowledge

Q: Name the different physical components of a QFabric system.
Nodes, Interconnects, Director devices, EX Series VCs.

Q: Which of these components are connected to each other?
The EX Series VC is connected to the Nodes, Interconnects, and the Director devices. The Director devices are connected to each other. The Nodes are connected to the Interconnects.

Q: Where is the management IP address of the QFabric system configured?
During installation, the user is prompted to enter a VIP. This VIP is used for remote management of the QFabric system.

Q: Which protocol is used for QFabric system discovery? Where does it run?
VCCPD is used for system discovery and it runs on all the components and VMs. The adjacencies for VCCPD are established over the CPE network.

Q: Which protocol is used for QFabric data plane topology discovery? Where does it run?
VCCPDf is used for discovering the topology of the data plane. VCCPDf runs only on the Nodes and the Interconnects, and adjacencies for VCCPDf are established on the 40GbE FTE interfaces.

Q: Which Junos process is responsible for the hardware inventory management of a system?
Chassisd.

Chapter 2: Accessing Individual Components

Logging In to Various QFabric Components
Checking Logs at Individual Components
Enabling and Retrieving Trace Options From a Component
Extracting Core Files
Checking for Alarms
Inbuilt Scripts
Test Your Knowledge
Before this book demonstrates how to troubleshoot any problems, this chapter will educate the reader about logging in to the different components and how to check and retrieve logs at different levels (physical and logical components) of a QFabric system. Details on how to configure a QFabric system and aliases for individual Nodes are documented in the QFabric Deployment Guide at www.juniper.net/documentation.

Logging In to Various QFabric Components

As discussed previously, a QFabric solution has many components, some of which are physical (Nodes, Interconnects, Director devices, CPE, etc.) and some of which are logical VMs. One of the really handy features of QFabric is that it allows administrators (those with appropriate privileges) to log in to these individual components, an extremely convenient feature when an administrator needs to do some advanced troubleshooting that requires logging in to a specific component of the system.

The hardware inventory of any Juniper router or switch is normally checked using the show chassis hardware command. QFabric software also supports this command, but there are multiple additions made to this command expressly for the QFabric solution, and these options allow users to check the hardware details of a particular component as well. For instance:

root@Test-QFABRIC> show chassis hardware ?
Possible completions:
  <[Enter]>            Execute this command
  clei-models          Display CLEI barcode and model number for orderable FRUs
  detail               Include RAM and disk information in output
  extensive            Display ID EEPROM information
  interconnect-device  Interconnect device identifier
  models               Display serial number and model number for orderable FRUs
  node-device          Node device identifier
  |                    Pipe through a command

root@Test-QFABRIC> show chassis hardware node-device ?
Possible completions:
  <node-device>        Node device identifier
  BBAK0431             Node device
  Node0                Node device
  Node1                Node device
--snip--

Consider the following QFabric system:

root@Test-QFABRIC> show fabric administration inventory
Item                      Identifier     Connection  Configuration
Node group
  NW-NG-0                                Connected   Configured
    Node0                 P6966-C        Connected
    Node1                 BBAK0431       Connected
  RSNG-1                                 Connected   Configured
    Node2                 P4423-C        Connected
    Node3                 P1377-C        Connected
  RSNG-2                                 Connected   Configured
    Node4                 P6690-C        Connected
    Node5                 P6972-C        Connected
Interconnect device
  IC-A9122                               Connected   Configured
    A9122/RE0                            Connected
    A9122/RE1                            Connected
  IC-IC001                               Connected   Configured
    IC001/RE0                            Connected
    IC001/RE1                            Connected
Fabric manager
  FM-0                                   Connected   Configured
Fabric control
  FC-0                                   Connected   Configured
  FC-1                                   Connected   Configured
Diagnostic routing engine
  DRE-0                                  Connected   Configured

This output shows the alias and the serial number (listed under the Identifier column) of every Node that is a part of the QFabric system. It also shows whether the Node is a part of an SNG, a Redundant SNG, or the Network Node Group.

The Connection column of the output shows the state of each component. Each component of the QFabric should be in the Connected state. If a component shows up as Disconnected, then there must be an underlying problem and troubleshooting is required to find the root cause.

As shown in Figure 2.1, this particular fabric system has six Nodes and two Interconnects. Node0 and Node1 are part of the Network Node Group, Node2 and Node3 are part of a Redundant SNG named RSNG-1, and Node4 and Node5 are part of another Redundant SNG named RSNG-2.

Figure 2.1: Visual Representation of the QFabric System of This Section

There are two ways of accessing the individual components:

- From the Linux prompt of the Director devices
- From the QFabric CLI

The Director-device method is described in the next section; the CLI method is sketched briefly below.
From the CLI using the request component login command.Q: What is the IP address range that is allocated to Node groups and Node devices? Node devices: 169.254.128.x Node groups: 169.254.193.xQ: What inbuilt script can be used to obtain the IP address allocated to the different components of a QFabric system? The dns.dump script is located at /root directory on the Director devices.Chapter 3Control Plane and Data Plane FlowsControl Plane and Data Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40Routing Engines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40Route Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43Maintaining Scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44Distributing Routes to Different Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47Differences Between Control Plane Trafc and Internal Control Plane Trafc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58Test Your Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6240This Week: QFabric System Trafc Flows and TroubleshootingOne of the goals of this book is to help you efciently troubleshoot an issue on a QFabric system. To achieve this, its important to understand exactly how the internal QFabric protocols operate and the packet ow of both the data plane and control plane trafc.Control Plane and Data PlaneJunipers routing and switching platforms, like the MX Series and the EX Series, all implement the concept of separating the data plane from the control plane. Here is a quick explanation: The control plane is responsible for a devices interaction with other devices and for running various protocols. The control plane of a device resides on the CPU and is responsible for forming adjacencies and peerings, and for learning routes (Layer 2 or Layer 3). The control plane sends the information about these routes to the data plane. The data plane resides on the chip or ASIC and this is where the actual packet forwarding takes place. Once the control plane sends information about specic routes to the data plane, the forwarding tables on the ASIC are populated accordingly. The data plane takes care of functions like forwarding, QoS, ltering, packet-parsing, etc. The performance of a device is determined by the quality of its data plane (also called the Packet Forwarding Engine or PFE).Routing Engines This chapter discusses the path of packets for control plane and data plane trafc. The following are bulleted lists about the protocols run on these abstractions or Node groups.Server Node Group (SNG) As previously discussed, when a Node is connected to a QFabric system for the rst time, it comes up as an SNG. Its considered to be a Node group with only one Node. The SNG is designed to be connected to servers and devices that do not need cross-Node resiliency. The SNG doesnt run any routing protocols, needs to run only host-facing protocols like LACP, LLDP, ARP. The Routing Engine functionality is present on the local CPU. This means that MAC-addresses are learned locally for the hosts that are connected directly to the SNG. The local PFE has the data plane responsibilities. 
See Figure 3.1.Chapter 3:Control Plane and Data Plane Flows41Figure 3.1Server Node Group (SNG)Redundant Server Node Group (RSNG) Two independent SNGs can be combined (using conguration) to become an RSNG. The RSNG is designed to be connected to servers/devices that need cross-Node resiliency. Common design: At least one NIC of a server is connected to each Node of an RSNG. These ports are bundled together as a LAG (LACP or static-LAG). Doesnt run any routing-protocols, needs to run only host-facing protocols like LACP, LLDP, and ARP. The Routing Engine functionality is active/passive (only one Node has the active RE, the other stays in backup mode). This means that the MAC address-es of switches/hosts connected directly to the RSNG-Nodes are learned on the active-RE of the RSNG. The PFEs of both the Nodes are active and forward trafc at all times. See Figure 3.2.42This Week: QFabric System Trafc Flows and TroubleshootingFigure 3.2 Redundant Server Node Group (RSNG)Network Node Group (NW-NG) Up to eight Nodes can be congured to be part of the NW-NG. The NW-NG is designed provide connectivity to routers, rewalls, switches, etc. Common design: Nodes within NW-NG connect to routers, rewalls, or other important Data Center devices like load-balancers, lters, etc. Runs all the protocols available on RSNG/SNG. It can also run protocols like RIP, OSPF, BGP, xSTP, PIM, etc. The Routing Engine functionality is located on VMs that run on the DGs; these VMs are active/passive. The REs of the Nodes are disabled. This means that the MAC addresses of the devices connected directly to the NW-NG are learned on the active NW-NG-VM. Also, if the NW-NG is running any Layer 3 protocols with the connected devices (OSPF, BGP, etc.), then the routes are also learned by the active NW-NG-VM. The PFE of the Nodes within an NW-NG is active at all times. See Figure 3.3.Chapter 3:Control Plane and Data Plane Flows43Figure 3.3Network Node GroupRoute PropagationAs with any other internetworking device, the main job of QFabric is to send trafc end-to-end. To achieve this, the system needs to learn various kinds of routes (such as Layer 2 routes, Layer 3 routes, ARP, etc.). As discussed earlier, there can be multiple active REs within a single QFabric system. Each of these REs can learn routes locally, but a big part of understanding how QFabric operates is to know how these routes are exchanged between various REs within the system.One approach to exchanging these routes between different REs is to send all the routes learned on one RE to all the other active REs. While this is simple to do, such an implementation will be counter productive because all the routes eventually will need to be pushed down to the PFE so that hardware forwarding can take place. If you send all the routes to every RE, then the complete scale of the QFabric comes down to the table limits of a single PFE. It means that the scale of the complete QFabric solution is as good as the scale of a single RE. This is undesirable and the next section discusses how Junipers QFabric technology maintains scale with a distributed architecture.44This Week: QFabric System Trafc Flows and TroubleshootingMaintaining Scale One of the key advantages of the QFabric architecture is its scale. 
The scaling num-bers of MAC addresses and IP addresses obviously depend on the number of Nodes that are a part of a QFabric system because the data plane always resides on the Nodes and you need the routes to be programmed in the PFE (the data plane) to ensure end-to-end trafc forwarding. As discussed earlier, all the routes learned on an RE are not sent to every other RE. Instead, an RE receives only the routes that it needs to forward data. This poses a big question: What parameters decide if a route should be sent to a Nodes PFE or not? The answer is: it depends on the kind of route. The deciding factor for a Layer 2 route is different from the factor for a Layer 3 route. Lets examine them briey to under-stand these differences.Layer 2 RoutesA Layer 2 route is the combination of a VLAN and a MAC address (VLAN-MAC pair) and the information stored in the Ethernet switching table of any Juniper EX Series switch. Now, Layer 2 trafc can be either unicast or BUM (Broadcast, Un-known-unicast, or Multicast in which all three kinds of trafc would be ooded within the VLAN). Figure 3.4 is a representation of a QFabric system where Node-1 has active ports in VLANs 10 and 20 connected to it, Node-2 has hosts in VLANs 20 and 30 connected to it, and both Node-3 and Node-4 have hosts in VLANs 30 and 40 connected to it. Active ports means that the Nodes either have hosts directly connected to them, or that the hosts are plugged into access switches and these switches plug into the Nodes. For the sake of simplicity, lets assume that all the Nodes shown in Figure 3.4 are SNGs, meaning that for this section, the words Node and RE can be used inter-changeably. Figure 3.4A Sample of QFabrics Nodes, ICs, and connected HostsChapter 3:Control Plane and Data Plane Flows45Layer 2 Unicast TrafcConsider that Host-2 wants to send trafc to Host-3, and lets assume that the MAC address of Host-3 is already learned. This would cause trafc to be Layer 2 Unicast trafc as both the source and destination devices are in the same VLAN. When Node-1 sees this trafc coming in from Host-2, all of it should be sent over to Node-2 internally within the QFabric example in Figure 3.4. When Node-2 receives this trafc, it should be sent unicast to the port when the host is connected. This kind of communication means: Host-3s MAC address is learned on Node-2. There should be some way to send this layer-2-routes information over to Node-1. Once Node-1 has this information, it knows that everything destined to Host-3s MAC address should be sent to Node-2 over the data plane of the QFabric. This is true for any other host in VLAN-20 that is connected on any other Node. Note that if Host-5 wishes to send some trafc to Host-3, then this trafc must be routed at Layer 3, as these hosts are in different VLANs. The regular laws of networking would apply in this case and Host-1 would need to resolve the ARP for its gateway. The same concept would apply if Host-6 wishes to send some data to Host-3. Since none of the hosts behind Node-3 ever need to resolve the MAC address of Host-3 to be able to send data to it, there is no need for Node-2 to advertise Host-3s MAC address to Node-3. However, this would change if a new host in VLAN-20 is connected behind Node-3.Conclusion: if a Node learns of a MAC address in a specic VLAN, then this MAC address should be sent over to all the other Nodes that have an active port in that particular VLAN. 
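To make this rule concrete, here is a small Python sketch that models the decision (the Node names and VLAN memberships are invented to mirror Figure 3.4, and the function is purely illustrative; the real exchange happens over the internal BGP sessions described later in this chapter):

# Active VLANs per Node, mirroring Figure 3.4 (illustrative data only).
active_vlans = {
    'Node-1': {10, 20},
    'Node-2': {20, 30},
    'Node-3': {30, 40},
    'Node-4': {30, 40},
}

def l2_route_targets(learning_node, vlan, nodes=active_vlans):
    """Return the Nodes that should receive a (VLAN, MAC) route learned on learning_node.

    A Layer 2 route is sent only to Nodes with an active port in the same VLAN;
    the learning Node already holds the route locally.
    """
    return sorted(name for name, vlans in nodes.items()
                  if vlan in vlans and name != learning_node)

# Host-3's MAC, learned by Node-2 in VLAN 20, only needs to reach Node-1.
print(l2_route_targets('Node-2', 20))   # ['Node-1']
# A MAC learned by Node-2 in VLAN 30 goes to Node-3 and Node-4, but never to Node-1.
print(l2_route_targets('Node-2', 30))   # ['Node-3', 'Node-4']

If Node-3 later gets an active port in VLAN 20, it simply starts appearing in the answer for that VLAN, which matches the behavior described above.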
Note that this communication of letting other Nodes know about a certain MAC address would be a part of the internal Control Plane trafc within the QFabric system. This data will not be sent out to devices that are connected to the Nodes of the QFabric system. Hence, for Layer 2 routes, the factor that decides whether a Node gets that route or not is the VLAN. Layer 2 BUM Trafc Lets consider that Host-4 sends out Layer 2 broadcast trafc, that is, frames in which the destination MAC address is ff:ff:ff:ff:ff:ff and that all this trafc should be ooded in VLAN-30. In the QFabric system depicted in Figure 3.4, there are three Nodes that have active ports in VLAN-30: Node-2, Node-3, and Node-4. What happens? All the broadcast trafc originated by the Host-4 should be sent internally to Node-3 and Node-4 and then these Nodes should be able to ood this trafc in VLAN-30. Since Node-1 doesnt have any active ports in VLAN-30, it doesnt need to ood this trafc out of any revenue ports or server facing ports. This means that Node-2 should not send this trafc over to Node-1. However, at a later time, if Node-1 gets an active port in VLAN-30 then the broadcast trafc will be sent to Node-1 as well. These points are true for BUM trafc assuming that IGMP snooping is disabled.46This Week: QFabric System Trafc Flows and TroubleshootingIn conclusion, if a Node receives BUM trafc in a VLAN, then all that trafc should be sent over to all the other Nodes that have an active port in that VLAN and not to those Nodes that do not have any active ports in this VLAN.Layer 3 RoutesLayer 3 routes are good old unicast IPv4 routes. Note that only the NW-NG-VM has the ability to run Layer 3 protocols with externally connected devices, hence at any given time, the active NW-NG-VM has all the Layer 3 routes learned in all the routing instances that are congured on a given QFabric system. However, not all these routes are sent to the PFE of all the Nodes within an NW-NG. Lets use Figure 3.5, which represents a Network Node Group, for the discussion of Layer 3 unicast routes. All the Nodes shown are the part of NW-NG. Host-1 is connected to Node-1, Host-2 and Host-3 are connected to Node-2, and Host-4 is connected to Node-3. You can see that all the IP addresses and the subnets are shown as well. Additionally, the subnets for Host-1 and Host-2 are in routing instance RED, whereas the subnets for Host-3 and Host-4 are in routing instance BLUE. The default gateways for these hosts are the Routed VLAN Interfaces (RVIs) that are congured and shown in the diagram in Figure 3.5. Lets assume that there are hosts and devices connected to all three Nodes in the default (master) routing instance, although not shown in the diagram. The case of IPv4 routes is much simpler than Layer 2 routes. Basically, its the routing instance that decides if a route should be sent to other REs or not. Figure 3.5Network Node Group and Connected HostsChapter 3:Control Plane and Data Plane Flows47In the QFabric conguration shown in Figure 3.5, the following takes place: Node-1 and Node-2 have one device each connected in routing instance RED. The default gateway (interface vlan.100) for these devices resides on the active NW-NG-VM, meaning that the NW-NG-VM has two direct routes in this routing instance, one for the subnet 1.1.1.0/24 and the other for 2.2.2.0/24. 
Since the route propagation deciding factor for Layer 3 routes is the routing instance, the active NW-NG-VM sends the route of 1.1.1.0/24 and 2.2.2.0/24s to both Node-1 and Node-2 so that these routes can be programmed in the data plane (PFE of the Nodes). The active NW-NG-VM will not send the information about the directly connected routes in routing instance BLUE over to Node-1 at all. This is because Node-1 doesnt have any directly connected devices in the BLUE routing instance. This is true for all kinds of routes learned within a routing instance; they could either be directly connected, static, or learned via routing protocols like BGP, OSPF, or IS-IS. All of the above applies to Node-2 and Node-3 for the routing instance named BLUE. All of the above applies to SNGs, RSNGs, and the master routing instance.In conclusion, the route learning always takes place at the active NW-NG-VM and only selective routes are propagated to the individual Nodes for programming the data plane (PFE of the Nodes). The individual Nodes get the Layer 3 routes from the active NW-NG-VM only if the Node has an active port in that routing instance. This concept of sending routes to an RE/PFE only if it needs that route ensures that you do not send all the routes everywhere. Thats the high scale at which a QFabric system can operate. Now lets discuss how are those routes are sent over to different Nodes.Distributing Routes to Different NodesThe QFabric technology uses the concept of Layer 3 MPLS-VPN (RFC-2547) internally to make sure that a Node gets only the routes that it needs. RFC2547 introduced the concept of Route Distinguishers (RD) Route Targets (RT). QFabric technology also uses the same concept to make sure that a route gets only the routes that it needs. Lets again review the different kind of routes in Table 3.1.Table 3.1Layer 2 and Layer 3 Route ComparsionLayer 2Routes Layer 3 RoutesDeciding factor is the VLAN. Deciding factor is the routing instance.Internally, each VLAN is assigned a token. Internally, each routing instance is assigned a token.This token acts as the RD/RT and acts as the deciding factor about whether a route should be sent to a Node or not.This token acts as the RD/RT and acts as the deciding factor about whether a route should be sent to a Node or not.48This Week: QFabric System Trafc Flows and TroubleshootingEach active RE within the QFabric system forms a BGP peering with the VMs called FC-0 and FC-1. All the active REs send all Layer 2 and Layer 3 routes over to the FC-0 and FC-1 VMs via BGP. These VMs only send the appropriate routes over to individual REs (only the routes that the REs need).The FC-0 and FC-1 VMs act as route reectors. However, these VMs follow the rules of QFabric technology when it comes to deciding which routes to be sent to which RE (not sending the routes that an RE doesnt need).Figure 3.6 shows all the components (SNG, RSNG, and NW-NG VMs) sending all of their learned routes (Layer 2 and Layer 3) over to the Fabric Control VM.Figure 3.6Different REs Send Their Learned Routes to the Fabric Control VMHowever, the Fabric Control VM sends only those routes to a component that are relevant to it. In Figure 3.7, the different colored arrows signify that the relevant routes that the Fabric Control VMs send to each component may be different.Lets look at some show command snippets that will demonstrate how a local route gets sent to the FC, and then how the other Nodes see it. 
They will be separated into Layer 2 and Layer 3 routes, and most of the snippets have bolded notes preceded by show vlans V709---qfabric extensiveVLAN: V709---qfabric, Created at: Thu Nov 14 05:39:28 2013802.1Q Tag: 709, Internal index: 4, Admin State: Enabled, Origin: StaticProtocol: Port Mode, Mac aging time: 300 secondsNumber of interfaces: Tagged 0 (Active = 0), Untagged0 (Active = 0){master}qfabric-admin@RSNG0> show fabric vlan-domain-map vlan 4Vlan L2Domain L3-Ifl L3-Domain412 00{master}qfabric-admin@RSNG0>The Layer 2 domain shown in the output of show fabric vlan-domain-map vlan contains the same value as that of the hardware token of the VLAN and its also called the L2Domain-Id for a particular VLAN.As discussed earlier, this route is sent over to the FC-VM. This is how the route looks on the FC-VM (note that the FC-VM uses a unique table called bgp.brid-gevpn.0 :qfabric-admin@FC-0> show route fabric table bgp.bridgevpn.0--snip--65534:1:12.ac:4b:c8:f8:68:97/152 *[BGP/170] 6d 07:42:56, localpref 100AS path: I, validation-state: unverified> to 128.0.130.6 via dcfabric.0, Push 1719, Push 10, Push 25(top)[BGP/170] 6d 07:42:56, localpref 100, from 128.0.128.8AS path: I, validation-state: unverified> to 128.0.130.6 via dcfabric.0, Push 1719, Push 10, Push 25(top)Chapter 3:Control Plane and Data Plane Flows51So, the next hop for this route is being shown as 128.0.130.6. Its clear from the output snippets mentioned earlier, that this is the internal IP address for the RSNG. The bolded portion of the route shows the hardware token of the VLAN. The output snippets above showed that the token for VLAN.709 is 12.The labels that are being shown in the output of the route at the FC-VM are specic to the way the FC-VM communicates with this particular RE (the RSNG). The origination and explanation of these labels is beyond the scope of this book.As discussed earlier, a Layer 2 route should be sent across to all the Nodes that have active ports in that particular VLAN. In this specic example, here are the Nodes that have active ports in VLAN.709:root@TEST-QFABRIC# run show vlans 709Name Tag InterfacesV709 709 MLRSNG01a:xe-0/0/8.0*, NW-NG-0:ae0.0*, NW-NG-0:ae34.0*, NW-NG-0:ae36.0*, NW-NG-0:ae38.0*[edit]Since the NW-NG Nodes are active for VLAN 709, the active NW-NG-VM should have the Layer 2 route under discussion (ac:4b:c8:f8:68:97 in VLAN 709) learned via the FC-VM via the internal BGP protocol. 
Here are the corresponding show snippets from the NW-NG-VM (note that whenever the individual REs learn Layer 2 routes from the FC, they are stored in the table named default.bridge.0):root@TEST-QFABRIC# run request component login NW-NG-0Warning: Permanently added 'dcfnode-default---nw-ine-0,169.254.192.34' (RSA) to the list of known hosts.Password:--- JUNOS 13.1I20130618_0737_dc-builder built 2013-06-18 08:51:07 UTCAt least one package installed on this device has limited support.Run 'file show /etc/notices/unsupported.txt' for details.{master}qfabric-admin@NW-NG-0> show route fabric table default.bridge.0--snip--12.ac:4b:c8:f8:68:97/88 *[BGP/170] 1d 10:53:47, localpref 100, from 128.0.128.6AS path: I, validation-state: unverified> to 128.0.130.6 via dcfabric.0, Layer 2 Fabric Label 1719 PFE Id 10 Port Id 25[BGP/170] 1d 10:53:47, localpref 100, from 128.0.128.8AS path: I, validation-state: unverified> to 128.0.130.6 via dcfabric.0, Layer 2 Fabric Label 1719 PFE Id 10 Port Id 25The bolded portion of the snippet shows the token for VLAN.709 (12), the destina-tion PFE-ID and the Port-ID are data plane entities. This is the information that gets pushed down to the PFE of the member Nodes and then these details are used to forward data in hardware. In this example, whenever a member Node of the NW-NG gets trafc for this MAC address, it sends this data via the FTE links to the Node with PFE-ID of 10. The PFE-IDs of all the Nodes within a Node group can be seen by logging into the corresponding VM and correlating the outputs of show fabric multicast vccpdf-adjacency and show virtual chassis. In this example, its the RSNG that locally learns the Layer 2 route of ac:4b:c8:f8:68:97 in VLAN 709. Here are the outputs of commands that show which Node has the PFE of 10:root@TEST-QFABRIC# run request component login RSNG0Warning: Permanently added 'dcfNode-default-rsng0,169.254.193.3' (RSA) to the list of known hosts.Password:52This Week: QFabric System Trafc Flows and Troubleshooting--- JUNOS 13.1I20130618_0737_dc-builder built 2013-06-18 08:50:01 UTCAt least one package installed on this device has limited support.Run 'file show /etc/notices/unsupported.txt' for details.{master}qfabric-admin@RSNG0> show fabric multicast vccpdf-adjacencyFlags: S - StaleSrcSrc SrcDestSrc DestDev id INE Dev type Dev id Interface FlagsPortPort934TOR256n/a-1-1934TOR512n/a-1-110 259(s)TOR256fte-0/1/1.327681 310 259(s)TOR512fte-0/1/0.327680 311 34TOR256n/a-1-111 34TOR512n/a-1-112 259(s)TOR256fte-1/1/1.327681 212 259(s)TOR512fte-1/1/0.327680 2256260 F2 9n/a-1-1256260 F2 10 n/a-1-1256260 F2 11 n/a-1-1256260 F2 12 n/a-1-1512261 F2 9n/a-1-1512261 F2 10 n/a-1-1512261 F2 11 n/a-1-1512261 F2 12 n/a-1-1{master}The Src Dev ID shows the PFE-IDs of the member Nodes and the Interface column shows the FTE interface that goes to the interconnects. The highlighted output shows that the device with fpc-0 has the PFE-ID of 10 (fte-0/1/1 means that the port belongs to member Node which is fpc-0).The output of show virtual-chassis shows which device is fpc-0:qfabric-admin@RSNG0> show virtual-chassisPreprovisioned Virtual ChassisVirtual Chassis ID: 0000.0103.0000MstrMember IDStatus Model prioRole Serial No0 (FPC 0)Prsntqfx3500 128 Master*P6810-C1 (FPC 1)Prsntqfx3500 128 Backup P7122-C{master}These two snippets show that the device with fpc-0 is the Node with device ID of P6810-C. Also, the MAC address was originally learned on port xe-0/0/8 (refer to the preceeding outputs). 
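If you find yourself doing this correlation often, it is easy to script. The sketch below is a hypothetical Python helper (not a Juniper tool) that ties the two outputs together: the FPC slot is read off the fte- interface name in show fabric multicast vccpdf-adjacency (fte-0/... belongs to FPC 0) and paired with that row's Src Dev Id, while show virtual-chassis supplies the serial number for each FPC slot. The dictionaries below are transcribed from the outputs above; parsing live CLI output is left out for brevity.

# FPC slot -> PFE-ID (Src Dev Id), read off the vccpdf-adjacency output above.
fpc_to_pfe = {0: 10, 1: 12}

# FPC slot -> serial number, read off the show virtual-chassis output above.
fpc_to_serial = {0: 'P6810-C', 1: 'P7122-C'}

def node_for_pfe(pfe_id):
    """Return (fpc_slot, serial) for the member Node that owns the given PFE-ID."""
    for slot, pfe in fpc_to_pfe.items():
        if pfe == pfe_id:
            return slot, fpc_to_serial.get(slot, 'unknown')
    return None

# PFE-ID 10 resolves to FPC 0, serial P6810-C - the Node that learned the MAC locally.
print(node_for_pfe(10))   # (0, 'P6810-C')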
The last part of the data plane information on the NW-NG was the port-ID of the Node with PFE-ID = 10. The PFE-ID generation is Juniper condential information and beyond the scope of this book. However, the port-ID shown in the output of show route fabric table default.bridge.0 would always be 17 more than the actual port-number of the ingress Node in case when QFX 3500s are being used as the Nodes. In this example, the MAC address was learned on xe-0/0/8 on the RSNG Node. This means that the port-ID being shown on the NW-NG should be 8 + 17 = 25. This is exactly the information that we saw in the output of show route fabric default.bridge.0 earlier.Chapter 3:Control Plane and Data Plane Flows53Show Command Snippets for Layer 3 Routes Layer 3 routes are also propagated similar to Layer 2 routes. The only difference is that the table is named bgp.l3vpn.0. As discussed, its the routing instance that decides whether a Layer 3 route should be sent to a Node device or not. Lets look at the CLI snippets to verify the details:root@TEST-QFABRIC# run request component login NW-NG-0Warning: Permanently added 'dcfnode-default---nw-ine-0,169.254.192.34' (RSA) to the list of known hosts.Password:--- JUNOS 13.1I20130618_0737_dc-builder built 2013-06-18 08:51:07 UTCAt least one package installed on this device has limited support.Run 'file show /etc/notices/unsupported.txt' for details.{master}qfabric-admin@NW-NG-0> show route protocol directinet.0: 95 destinations, 95 routes (95 active, 0 holddown, 0 hidden)Restart Complete+ = Active Route, - = Last Active, * = Both172.17.106.128/30*[Direct/0] 6d 08:38:41> via ae4.0 show vlans 29error: vlan with tag 29 does not exist{master}qfabric-admin@RSNG0>4. At this point in time, the NW-NG-0s default.fabric.0 table does not contain only local information:qfabric-admin@NW-NG-0> show fabric summaryAutonomous System: 100INE Id : 128.0.128.4INE Type : NetworkSimulation Mode: SI{master}qfabric-admin@NW-NG-0> ...0 fabric-route-type mcast-routes l2domain-id 5default.fabric.0: 88 destinations, 92 routes (88 active, 0 holddown, 0 hidden)Restart Complete56This Week: QFabric System Trafc Flows and Troubleshooting+ = Active Route, - = Last Active, * = Both5.ff:ff:ff:ff:ff:ff:128.0.128.4:128:000006c3(L2D_PORT)/184 *[Fabric/40] 11w1d 02:59:50> to 128.0.128.4:128(NE_PORT) via ae0.0, Layer 2 Fabric Label 17315.ff:ff:ff:ff:ff:ff:128.0.128.4:162:000006d3(L2D_PORT)/184 *[Fabric/40] 6w3d 09:03:04> to 128.0.128.4:162(NE_PORT) via ae34.0, Layer 2 Fabric Label 17475.ff:ff:ff:ff:ff:ff:128.0.128.4:164:000006d1(L2D_PORT)/184 *[Fabric/40] 11w1d 02:59:50> to 128.0.128.4:164(NE_PORT) via ae36.0, Layer 2 Fabric Label 17455.ff:ff:ff:ff:ff:ff:128.0.128.4:166:000006d5(L2D_PORT)/184 *[Fabric/40] 11w1d 02:59:50> to 128.0.128.4:166(NE_PORT) via ae38.0, Layer 2 Fabric Label 1749{master}The command executed above is show route fabric table default.fabric.0 fabric-route-type mcast-routes l2domain-id 5.5. The user congures a port on the Node group named RSNG0 in vlan.29. 
After this, RSNG0 started displaying the details for vlan.29:root@TEST-QFABRIC# ...hernet-switching port-mode trunk vlan members 29[edit]root@TEST-QFABRIC# commitcommit complete[edit]root@TEST-QFABRIC# show | compare rollback 1[edit interfaces]+ P7122-C:xe-0/0/9 {+ unit 0 {+ family ethernet-switching {+ port-mode trunk;+ vlan {+ members 29;+ }+ }+ }+ }[edit]qfabric-admin@RSNG0> show vlans 29 extensiveVLAN: vlan.29---qfabric, Created at: Thu Feb 27 13:16:39 2014802.1Q Tag: 29, Internal index: 7, Admin State: Enabled, Origin: StaticProtocol: Port Mode, Mac aging time: 300 secondsNumber of interfaces: Tagged 1 (Active = 1), Untagged0 (Active = 0)xe-1/0/9.0*, tagged, trunk{master} 6. Since RSNG0 is now subscribing to vlan.29, this information should be sent over to the NW-NG-0 VM. Here is what the default.fabric.0 table of contents looks like at RSNG0:qfabric-admin@RSNG0> show fabric summaryAutonomous System: 100INE Id : 128.0.130.6INE Type : ServerSimulation Mode: SI{master}qfabric-admin@RSNG0> ...c.0 fabric-route-type mcast-routes l2domain-id 5default.fabric.0: 30 destinations, 54 routes (30 active, 0 holddown, 0 hidden)Restart Complete+ = Active Route, - = Last Active, * = Both5.ff:ff:ff:ff:ff:ff:128.0.130.6:49174:000006c1(L2D_PORT)/184 *[Fabric/40] 00:04:59Chapter 3:Control Plane and Data Plane Flows57> to 128.0.130.6:49174(NE_PORT) via xe-1/0/9.0, Layer 2 Fabric Label 1729{master}This route is then sent over to NW-NG-0 via the Fabric Control VM.7. NW-NG-0 receives the route from RSNG0 and updates its default.fabric.0 table:qfabric-admin@NW-NG-0> ...ic-route-type mcast-routes l2domain-id 5--snip5.ff:ff:ff:ff:ff:ff:128.0.130.6:49174:000006c1(L2D_PORT)/184 *[BGP/170] 00:07:34, localpref 100, from 128.0.128.6 show fabric multicast root vlan-group-pfe-mapL2 domainGroup Flag PFE map Mrouter PFE map22.255.255.255.255 61A00/30/055.255.255.255.255 61A00/30/0--snip--Check the entry corresponding to the L2Domain-ID for the corresponding VLAN. In this case, the L2Domain-ID for vlan.29 is 5.9. The NW-NG-0 VM creates a Multicast Core-Key for the PFE-map (4101 in this case):qfabric-admin@NW-NG-0> ...multicast root layer2-group-membership-entriesGroup Membership Entries:--snip--L2 domain: 5Group:Source:5.255.255.255.255Multicast key: 4101Packet Forwarding map: 1A00/3--snip--The command used here was show fabric multicast root layer2-group-member-ship-entries. This command is only available in Junos 13.1 and higher. In earlier versions of Junos the show fabric multicast root map-to-core-key command can be used to obtain the Multicast Core Key number.10. The NW-NG-0 VM sends this Multicast Core Key to all the Nodes and Interconnects via the Fabric Control VM. This information is placed in the default.fabric.0 table. Note that this table is only used to store the Core Key information and is not used to actually forward data trafc. 
Note that the next hop is 128.0.128.4, which is the IP address of the NW-NG-0 VM.qfabric-admin@RSNG0> show route fabric table default.fabric-route-type mcast-member-map-key 4101default.fabric.0: 30 destinations, 54 routes (30 active, 0 holddown, 0 hidden)Restart Complete+ = Active Route, - = Last Active, * = Both4101:7(L2MCAST_MBR_MAP)/184 *[BGP/170] 00:35:12, localpref 100, from 128.0.128.6AS path: I, validation-state: unverified> to 128.0.128.4 via dcfabric.0, PFE Id 27 Port Id 27[BGP/170] 00:35:12, localpref 100, from 128.0.128.8AS path: I, validation-state: unverified> to 128.0.128.4 via dcfabric.0, PFE Id 27 Port Id 2758This Week: QFabric System Trafc Flows and Troubleshooting11. The NW-NG-0 VM sends out a broadcast route for the corresponding VLAN to all the Nodes and the Interconnects. The next hop for this route is set to the Multicast Core Key number. This route is placed in the default.bridge.0 table and is used to forward and ood the data trafc. The Nodes and Interconnects will install this route only if they have information for the Multicast Core Key in their default.fabric.0 table. In this example, note that the next hop contains the information for the Multicast Core Key as well:qfabric-admin@RSNG0> show route fabric table default.bridge.0 l2domain-id 5--snip--5.ff:ff:ff:ff:ff:ff/88 *[BGP/170] 00:38:13, localpref 100, from 128.0.128.6AS path: I, validation-state: unverified> to 128.0.128.4:57005(NE_PORT) via dcfabric.0, MultiCast - Corekey:4101 Keylen:7[BGP/170] 00:38:13, localpref 100, from 128.0.128.8AS path: I, validation-state: unverified> to 128.0.128.4:57005(NE_PORT) via dcfabric.0, MultiCast - Corekey:4101 Keylen:7The eleven steps mentioned here are a deep dive into how the Nodes of a QFabric system subscribe to a given VLAN. The aim of this technology is to make sure that all the Nodes and the Interconnects have consistent information regarding which Nodes subscribe to a specic VLAN. This information is critical to ensuring that there is no excessive ooding within the data plane of a QFabric system. At any point in time, there may be multiple Nodes that subscribe to a VLAN, raising the question of where a QFabric system should replicate BUM trafc. QFabric systems replicate BUM trafc at the following places: Ingress Node: Replication takes place only if: There are any local ports in the VLAN where BUM trafc was received. BUM trafc is replicated and sent out on the server facing ports. There are any remote Nodes that subscribe to the VLAN in question. BUM trafc is replicated and sent out towards these specic Nodes over the 40GbE FTE ports. Interconnects: Replication takes place if there are any directly connected Nodes that subscribe to the given VLAN. Egress Node: Replication takes place only if there are any local ports that are active in the given VLAN.Differences Between Control Plane Trafc and Internal Control Plane TrafcMost of this chapter has discussed the various control plane characteristics of the QFabric system and how the routes are propagated from one RE to another. 
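That propagation model can be summed up in a few lines. The sketch below is a hypothetical illustration (the RE names and their token subscriptions are invented): every RE advertises all of its routes to the Fabric Control VMs, and the Fabric Control VMs reflect a route back to an RE only if that RE subscribes to the route's token, a VLAN token for Layer 2 routes or a routing-instance token for Layer 3 routes.

# Tokens each RE subscribes to (VLAN tokens for Layer 2, routing-instance tokens for
# Layer 3); the subscriptions are invented for illustration.
subscriptions = {
    'RSNG0':   {'vlan-709', 'vlan-10', 'vrf-RED'},
    'RSNG1':   {'vlan-10', 'vrf-BLUE'},
    'NW-NG-0': {'vlan-709', 'vlan-10', 'vrf-RED', 'vrf-BLUE'},
}

def reflect(route_token, advertising_re, subs=subscriptions):
    """Return the REs to which the Fabric Control VM reflects a route with this token."""
    return sorted(name for name, tokens in subs.items()
                  if route_token in tokens and name != advertising_re)

# A MAC learned by RSNG0 in the VLAN with token vlan-709 is reflected only to NW-NG-0 here.
print(reflect('vlan-709', 'RSNG0'))     # ['NW-NG-0']
# A route in routing instance BLUE is never reflected to RSNG0, which has no ports in BLUE.
print(reflect('vrf-BLUE', 'NW-NG-0'))   # ['RSNG1']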
With this background, note the following functions that a QFabric system has to perform to operate:
 Control plane tasks: form adjacencies with other networking devices, and learn Layer 2 and Layer 3 routes.
 Data plane tasks: forward data end-to-end.
 Internal control plane tasks: discover Nodes and Interconnects, maintain VCCPD and VCCPDf adjacencies, monitor the health of the VMs, and exchange routes within the QFabric system to enable communication between hosts connected on different Nodes.
The third bullet is what makes QFabric a special system. All the control plane traffic that is used for the internal workings of the QFabric system is referred to as internal control plane traffic. And the last pages of this chapter are dedicated to bringing out