Cisco Systems Advanced Services
Telecom Montenegro MPLS/VPN Network (Mipnet)
Version 1.1
Corporate Headquarters
Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134-1706
USA
http://www.cisco.com
Tel: 408 526-4000
800 553-NETS (6387)
Fax: 408 526-4100
THE SPECIFICATIONS AND INFORMATION REGARDING THE PRODUCTS IN THIS MANUAL ARE SUBJECT TO CHANGE WITHOUT NOTICE. ALL STATEMENTS, INFORMATION, AND RECOMMENDATIONS IN THIS MANUAL ARE BELIEVED TO BE ACCURATE BUT ARE PRESENTED WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. USERS MUST TAKE FULL RESPONSIBILITY FOR THEIR APPLICATION OF ANY PRODUCTS.
THE SOFTWARE LICENSE AND LIMITED WARRANTY FOR THE ACCOMPANYING PRODUCT ARE SET FORTH IN THE INFORMATION PACKET THAT SHIPPED WITH THE PRODUCT AND ARE INCORPORATED HEREIN BY THIS REFERENCE. IF YOU ARE UNABLE TO LOCATE THE SOFTWARE LICENSE OR LIMITED WARRANTY, CONTACT YOUR CISCO REPRESENTATIVE FOR A COPY.
The following information is for FCC compliance of Class A devices: This equipment has been tested and found to comply with the limits for a Class A digital device, pursuant to part 15 of the FCC rules. These limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a commercial environment. This equipment generates, uses, and can radiate radio-frequency energy and, if not installed and used in accordance with the instruction manual, may cause harmful interference to radio communications. Operation of this equipment in a residential area is likely to cause harmful interference, in which case users will be required to correct the interference at their own expense.
The following information is for FCC compliance of Class B devices: The equipment described in this manual generates and may radiate radio-frequency energy. If it is not installed in accordance with Cisco’s installation instructions, it may cause interference with radio and television reception. This equipment has been tested and found to comply with the limits for a Class B digital device in accordance with the specifications in part 15 of the FCC rules. These specifications are designed to provide reasonable protection against such interference in a residential installation. However, there is no guarantee that interference will not occur in a particular installation.
You can determine whether your equipment is causing interference by turning it off. If the interference stops, it was probably caused by the Cisco equipment or one of its peripheral devices. If the equipment causes interference to radio or television reception, try to correct the interference by using one or more of the following measures:
Turn the television or radio antenna until the interference stops.
Move the equipment to one side or the other of the television or radio.
Move the equipment farther away from the television or radio.
Plug the equipment into an outlet that is on a different circuit from the television or radio. (That is, make certain the equipment and the television or radio are on circuits controlled by different circuit breakers or fuses.)
Modifications to this product not authorized by Cisco Systems, Inc. could void the FCC approval and negate your authority to operate the product.
The following third-party software may be included with your product and will be subject to the software license agreement:
CiscoWorks software and documentation are based in part on HP OpenView under license from the Hewlett-Packard Company. HP OpenView is a trademark of the Hewlett-Packard Company. Copyright © 1992, 1993 Hewlett-Packard Company.
The Cisco implementation of TCP header compression is an adaptation of a program developed by the University of California, Berkeley (UCB) as part of UCB’s public domain version of the UNIX operating system. All rights reserved. Copyright © 1981, Regents of the University of California.
Network Time Protocol (NTP). Copyright © 1992, David L. Mills. The University of Delaware makes no representations about the suitability of this software for any purpose.
Point-to-Point Protocol. Copyright © 1989, Carnegie-Mellon University. All rights reserved. The name of the University may not be used to endorse or promote products derived from this software without specific prior written permission.
The Cisco implementation of TN3270 is an adaptation of the TN3270, curses, and termcap programs developed by the University of California, Berkeley (UCB) as part of the UCB’s public domain version of the UNIX operating system. All rights reserved. Copyright © 1981-1988, Regents of the University of California.
Cisco incorporates Fastmac and TrueView software and the RingRunner chip in some Token Ring products. Fastmac software is licensed to Cisco by Madge Networks Limited, and the RingRunner chip is licensed to Cisco by Madge NV. Fastmac, RingRunner, and TrueView are trademarks and in some jurisdictions registered trademarks of Madge Networks Limited. Copyright © 1995, Madge Networks Limited. All rights reserved.
Xremote is a trademark of Network Computing Devices, Inc. Copyright © 1989, Network Computing Devices, Inc., Mountain View, California. NCD makes no representations about the suitability of this software for any purpose.
The X Window System is a trademark of the X Consortium, Cambridge, Massachusetts. All rights reserved.
NOTWITHSTANDING ANY OTHER WARRANTY HEREIN, ALL DOCUMENT FILES AND SOFTWARE OF THESE SUPPLIERS ARE PROVIDED “AS IS” WITH ALL FAULTS. CISCO AND THE ABOVE-NAMED SUPPLIERS DISCLAIM ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION, THOSE OF MERCHANTABILITY, FITNESS FOR A PRACTICAL PURPOSE AND NONINFRINGEMENT OR ARISING FROM A COURSE OF DEALING, USAGE, OR TRADE PRACTICE.
IN NO EVENT SHALL CISCO OR ITS SUPPLIERS BE LIABLE FOR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, OR INCIDENTAL DAMAGES, INCLUDING, WITHOUT LIMITATION, LOST PROFITS OR LOSS OR DAMAGE TO DATA ARISING OUT OF THE USE OR INABILITY TO USE THIS MANUAL, EVEN IF CISCO OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
AccessPath, AtmDirector, Browse with Me, CCIP, CCSI, CD-PAC, CiscoLink, the Cisco Powered Network logo, Cisco Systems Networking Academy, the Cisco Systems Networking Academy logo, Cisco Unity, Fast Step, Follow Me Browsing, FormShare, FrameShare, IGX, Internet Quotient, IP/VC, iQ Breakthrough, iQ Expertise, iQ FastTrack, the iQ logo, iQ Net Readiness Scorecard, MGX, the Networkers logo, ScriptBuilder, ScriptShare, SMARTnet, TransPath, Voice LAN, Wavelength Router, and WebViewer, Aironet, ASIST, BPX, Catalyst, CCDA, CCDP, CCIE, CCNA, CCNP, Cisco, the Cisco Certified Internetwork Expert logo, Cisco IOS, the Cisco IOS logo, Cisco Press, Cisco Systems, Cisco Systems Capital, the Cisco Systems logo, Empowering the Internet Generation, Enterprise/Solver, EtherChannel, EtherSwitch, FastHub, FastSwitch, GigaStack, IOS, IP/TV, LightStream, MICA, Network Registrar, Packet, PIX, Post-Routing, Pre-Routing, RateMUX, Registrar, SlideCast, StrataView Plus, Stratm, SwitchProbe, TeleRouter, and VCO are trademarks or registered trademarks of Cisco Systems, Inc. and/or its affiliates in the U.S. and certain other countries.
All other trademarks mentioned in this document or Web site are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company. (0110R).
Please refer to http://www.cisco.com/logo/ for the latest information on Cisco logos, branding and trademarks.
INTELLECTUAL PROPERTY RIGHTS:
THIS DOCUMENT CONTAINS VALUABLE TRADE SECRETS AND CONFIDENTIAL INFORMATION OF CISCO SYSTEMS, INC. AND IT’S SUPPLIERS, AND SHALL NOT BE DISCLOSED TO ANY PERSON, ORGANIZATION, OR ENTITY UNLESS SUCH DISCLOSURE IS SUBJECT TO THE PROVISIONS OF A WRITTEN NON-DISCLOSURE AND PROPRIETARY RIGHTS AGREEMENT OR INTELLECTUAL PROPERTY LICENSE AGREEMENT APPROVED BY CISCO SYSTEMS, INC. THE DISTRIBUTION OF THIS DOCUMENT DOES NOT GRANT ANY LICENSE IN OR RIGHTS, IN WHOLE OR IN PART, TO THE CONTENT, THE PRODUCT(S), TECHNOLOGY OF INTELLECTUAL PROPERTY DESCRIBED HEREIN.
Copyright © 2003, Cisco Systems, Inc. All rights reserved. COMMERCIAL IN CONFIDENCE. A PRINTED COPY OF THIS DOCUMENT IS CONSIDERED UNCONTROLLED.
1 Contents
Contents
Tables
Figures
About This Low Level Design
History
Review
About This Design Document
Document Purpose
Intended Audience
Scope
Document Usage Guidelines
Assumptions and Caveats
Overview
Network Topology
Design Considerations
MPLS Network Architecture
MPLS/VPN
Internet Transport
QoS
Network Architecture
Naming and Addressing Specifications
BGP AS Number
Naming Conventions
Mipnet Routers (P, PE, MCE, MPE, iGW)
Customer Edge (CE) Routers
DNS Domain Name
IP Addressing Scheme
Loopbacks
Backbone Links
NOC Links and Hosts
Access CE-PE Connections
Cat3550s and 1760s Vlan99
Physical Network Design
Physical Connectivity in Mipnet
Core PoPs
Bar
Bjelo Polje
Podgorica TKC
Podgorica MTKC
Regional PoPs
Podgorica MAN
Access Layer (Customer Edge Devices)
Logical Network Design
IGP Routing – IS-IS
The role of ISIS in Mipnet
IS-IS Overview
Network Entity Title (NET) – The CLNS Address
IS-IS Areas and Summarization in Mipnet
IS-IS Authentication
Loopback Addresses
IS-IS Metrics
Default Routes
Timers and Advanced IS-IS Features
IS-IS Configuration Template
Cisco Express Forwarding – CEF
Forwarding Information Base – FIB
Adjacency Tables
CEF in the Mipnet
Multi Protocol Label Switching – MPLS
Overview
LDP Authentication
TDP/LDP & CEF interaction
MPLS Design Rules in Mipnet
MPLS Configuration Template
Dial
HW/SW Release Table
Network Services
MPLS/VPN
MPLS-VPN
How does it work?
Data Forwarding
VRF, RD and RT
VRF
RD
RT
MP-iBGP Support for MPLS/VPN
MP-iBGP and address families
iBGP Timers
Route Reflectors
Overview
RR Topology in Mipnet
Peer-groups
MP-iBGP authentication
MP-iBGP Design Rules in Mipnet
MPLS/VPN Topologies
Any-to-any VPN (Full Mesh)
Hub and Spoke VPN – No Connectivity between Spokes
Hub and Spoke VPN – Connectivity between Spokes via Hub
Inter-VPN (Extranet)
MPLS/VPN Access Layer
Addressing between VPN-PE and CE
CE-PE Connectivity Scenarios
Routing protocols between PE and CE
Static Routes
eBGP
RIPv2
OSPF
Multi-Homed Sites
Single CE in Customer Site
Multiple CEs in Customer Site
Internet Access for MPLS/VPN customers
Overview
Two CEs – Two Physical Links
Multi-VRF CE (Single Physical Link)
Network Address Translation for MPLS/VPN customers
Inter Provider (aka. Inter-AS) MPLS/VPNs
Service Model
Inter-AS Implementation Details
Configuration Template
Operations and Security
Hierarchical VPNs (CsC – Carrier-supporting-Carriers)
Service Model
Design Rules
Security and Operation
Configuration Template
Multicast in the MPLS/VPNs
Multicast VRF
Multicast Tunnels
RPF
Forwarding
Multicast VPN Basic Configuration
Step 1. Enable Global Multicast routing
Step 2. Enable VRF instance Multicast routing
Step 3. Configure mVRF multicast parameters
Step 4. Configure the Default MDT
Step 5. Configure the Data MDT
Source Specific Multicast
mVPN Extranet
Internet Transport
Operational Overview
BGP communities
Advertisement Control
Routing Details
Routing with Upstream Providers
Profile #1: Statically routed Internet customer
Profile #2: BGP routed customer with default route
Profile #3: BGP routed customer with full Internet routing table
Routing with Peering Partners
QoS
Introduction
Differentiated Services Model – Introduction
Default PHB
Class-Selector PHB
Assured Forwarding PHB
Expedited Forwarding PHB
QoS and VoIP
Interleaving mechanisms: FRF.12 or MLPPP / LFI
Delay Model
QoS in an MPLS network
Mipnet QoS design – An Overview
CE-to-PE QoS mechanisms (applied on the CE)
Classification
Marking
Policing
Class Queuing
Congestion avoidance
About Random Early Detection
WRED design objective in Mipnet
Minimum and Maximum Thresholds
Drop Probability
Exp. Weighting Const
CE-to-PE QoS mechanisms (applied on the PE)
Classification
Marking
Policing
Unmanaged CEs and Unmanaged Internet CPEs
Service without QoS
Customer configures QoS on the CE router
Dial Customers
PE-to-P QoS mechanisms (applied on the PE)
Classification
Marking
Class queuing
Congestion avoidance
PE-P, P-P and P-PE QoS mechanisms for 12000s (applied on the P)
Class Queuing (MDRR)
MDRR queuing operation
MDRR configuration guide for Mipnet
Congestion management
Exponential weighting constant
Policing of Voice class with WRED
WRED Configuration
PE to CE QoS mechanisms (applied on the PE)
Classification
Class queuing
Congestion avoidance
QoS mechanisms on Frame Relay DLCIs
Non-distributed platforms
7500 VIP-based platforms
Dial
Dial to MPLS-VPN Infrastructure
Integration into MipNet’s Topology
Physical Topology and MTU Setting
Logical Topology
Routing
L2TP Setup
PPP Multilink and LNS redundancy
Addressing
Authentication, Authorization and Accounting
Provisioning Dial Customers with ISC
Adding New Customer VRF
Adding User Dial-in Accounts to existing Customer VRF
Adding CE Dial-backup Accounts to existing Customer VRF
Advanced Features
IPsec Access to MPLS/VPNs
Overview
VRF Aware IKE/IPsec
Configuration Examples
IPSec Remote Access-to-MPLS VPN Example
Static IPSec-to-MPLS VPN Example
Network Security and Filtering
PE-CE Routing Protocols Security - Summary
BGP Community Anti-Spoofing filters
BGP damping on iGWs (RIPE-229)
What is route-flap damping?
Route-flap damping implementation in Cisco IOS
Filtering of BGP Updates
Prefix Filtering
AS_PATH Filtering
Policing of ICMP traffic on border Internet links
SMURF attacks
DSCP Spoofing
SNMP
Password Management
Console Ports
Controlling TTY’s
Controlling VTYs and Ensuring VTY Availability
Logging
Saving logging information
Recording Access List Violations
Anti-spoofing
Anti-spoofing with packet filters
Turbo ACLs
Anti-spoofing with RPF checks
Controlling Directed Broadcasts
IP Source Routing
ICMP Redirects
Switching Modes and Cisco Express Forwarding
Scheduler Configuration
Last-Resort Routing to the Null Device
TCP and UDP “Small Services”
Finger
CDP
NTP
Miscellaneous
Global Configuration
Interface Configuration
Configuration Templates
NOC – Network Operations Centre
Physical Connectivity
Logical Design
Interconnection with MPLS Core
Management VPN
NOC LAN - Outbound Routing
NOC LAN - Inbound Routing
IP Addressing
NOC VLANs
Appendix I
Appendix II
Appendix III
Appendix IV
Glossary of Terms
2 Tables
Table 1 Revision History
Table 2 Revision Review
Table 3 PoP Codes
Table 4 Mipnet IP Addressing – Loopback Interfaces
Table 5 Mipnet IP Addressing – Backbone Links
Table 6 Proposed IS-IS Metrics
Table 7 Default value for iBGP timers
Table 8 iBGP Split Horizon Rules
Table 9 BGP Community Scheme
Table 10 LOC_PREF settings
Table 11 Class-Selector PHBs
Table 12 Serialisation delay [ms] as function of link speed and packet size
Table 13 Recommended fragment size
Table 14 The components of the end-to-end delay model
Table 15 CoS Mechanisms Overview
Table 16 NB and EB settings
Table 17 WRED Settings for Business Class
Table 18 WRED Settings for Streaming Class
Table 19 WRED Settings for Standard Class
Table 20 WRED - exponential weighting constant
Table 21 MDRR weights
Table 22 Default BGP damping parameters
3 Figures
Figure 1 Mipnet Network Topology
Figure 2 Mipnet IP Addressing – An Overview
Figure 3 Core PoP – Bar
Figure 4 Core PoP – Bjelo Polje
Figure 5 Core PoP – Podgorica TKC
Figure 6 Core PoP – Podgorica MTKC
Figure 7 A typical Regional PoP
Figure 8 Podgorica MAN
Figure 9 Non-resilient 3550-attached VRFs
Figure 10 Protection against P failure in Podgorica
Figure 11 NSAP Address Format
Figure 12 Sample FIB Entry
Figure 13 Adjacency table
Figure 14 MPLS Header
Figure 15 Overview of Label Switching using MPLS
Figure 16 MPLS-VPN Network
Figure 17 Data Forwarding in an MPLS-VPN Network
Figure 18 RD encoding options
Figure 19 RT encoding options
Figure 20 Best-practice RR design in MPLS/VPN Networks
Figure 21 Initial RR Topology in Mipnet
Figure 22 Any-To-Any VPN Model
Figure 23 Hub and Spoke – No Connectivity between Spokes
Figure 24 Hub and Spoke Model – Connectivity between Spokes via Hub
Figure 25 Access-layer Topologies
Figure 26 OSPF on PE-CE link
Figure 27 Fully Redundant Access Scenario (2CEs-2PEs)
Figure 28 Internet Access from a VPN using separate CEs and two physical links
Figure 29 Multiple CEs
Figure 30 Multi-VRF CE
Figure 31 Internet Access from a VPN – Multi-VRF CE
Figure 32 NAT in CE router
Figure 33 Inter-AS Service Model
Figure 34 MP-eBGP – VPN route and label propagation
Figure 35 MP-eBGP – Packet forwarding
Figure 36 CsC Operational Model
Figure 37 CsC – Control Plane
Figure 38 mVPN Extranet
Figure 39 mDC in Mipnet
Figure 40 BGP community colouring
Figure 41 Profile #1: Statically routed Internet customer
Figure 42 Profile #2: BGP routed customer with default route
Figure 43 Profile #3: BGP routed customer with full Internet routing table
Figure 44 Various interpretations of the TOS field
Figure 45 DSCP Interpretation
Figure 46 Adaptive jitter buffer
Figure 47 Call admission control
Figure 48 LFI to reduce frame delay and jitter
Figure 49 Overview of end-to-end delay segments
Figure 50 DSCP to EXP mapping
Figure 51 DSCP / MPLS Headers
Figure 52 QoS mechanisms overview
Figure 53 In/Out-contract Marking and Policing (example for Business class)
Figure 54 WRED Algorithm
Figure 55 Dial to MPLS-VPN Architecture
Figure 56 AS5350 Connection - Physical Topology
Figure 57 AS5350 Connection - Logical Topology
Figure 58 IPSec to MPLS VPNs (Single box)
Figure 59 VRF-aware IPsec
Figure 60 Community spoofing example for BGP customers
Figure 61 Community spoofing example for transit ISPs and peering partners
Figure 62 Prefix-list filtering of customer routes
Figure 63 AS_PATH filtering on customer eBGP sessions
Figure 64 NOC – Physical Topology
Figure 65 NOC – Routing Setup
Figure 66 NOC – VLANs
4 About This Low Level Design
Author: Valentin Lisjak, CCIE #2041
Oliver Boehmer (dial)
Cisco Systems, Inc.
Change Authority: Advanced Services
Reference Number: < EDCS or other document reference number, this LLD template is EDCS-157549>
4.1 History
Table 1 Revision History
Version No.  Issue Date   Status       Reason for Change
0.1          5-Sep-2003   Draft        First draft
0.2          19-Sep-2003  2nd Draft    p.25 - Updated overall network topology drawing
                                       p.30 - Added naming convention for as5350
                                       p.30 - BGP AS number defined by TMN
                                       p.30 - PoP codes defined by TMN
                                       p.32 - Included IP addressing scheme
                                       p.37 - Core PoP and MAN figures reflecting the network topology
                                       p.37 - Included logical topology of Mipnet PoPs for easier understanding
                                       p.50 - Included IS-IS flooding optimisation
                                       p.50 - Included IS-IS “no hello padding”
                                       p.192 - Included “IPsec” section
                                       p.201 - Included Network security section
                                       p.231 - Moved IS-IS convergence tuning to Appendix I, because IS-IS timers will not be tuned in initial Mipnet deployment.
1.0                       1st release  Updates:
                                       p.181 - Updated Dial design
                                       Included the following chapters:
                                       p.104 - Inter Provider (aka. Inter-AS) MPLS/VPNs
                                       p.109 - Hierarchical MPLS/VPNs (CsC)
                                       p.115 - Multicast in the MPLS/VPNs
                                       p.226 - NOC
1.1          19-Apr-2004               Updated version after completion of NRFU tests
                                       p.37 - Included VLAN99 address block
                                       p.41 - Removed bi-connected VRFs for 3550-attached customers (not available in ISC)
                                       p.42 - Backup link between PEs in Podgorica to protect against P failure (single point of failure for statically routed customers behind MAN 3550s)
                                       p.44 - TMN changed NET coding rule
                                       p.79 - Multi-homed HUB site.
                                       p.193 - Firewall currently not installed between IPsec PE and the Internet.
                                       p.183 - MTU cannot be adjusted, which may imply fragmentation of L2TP traffic.
                                       p.227 - 3550-PE links (vlan99) are also routed in global RT via ISIS. Updated Figure 7
4.2 Review
Table 2 Revision Review
Reviewer’s Details Version No. Date
<Name>
<Organization>
<Version number> <dd-mmm-yyyy>
Change Forecast: Medium
This document will be kept under revision control.
A printed copy of this document is considered uncontrolled.
5 About This Design Document
5.1 Document Purpose
The purpose of this document is to outline the Low Level Design (LLD) recommended by Cisco Systems for the Telecom Montenegro (TMN) MPLS/VPN network and services. It details the physical and logical requirements and how these requirements will be met.
The document provides sufficient detail to derive the device configurations that will be used during the subsequent deployment and testing phases. Some configuration parameters may be determined during network deployment.
The content of this LLD is structured in the following main sections:
Network Architecture
  o Naming and addressing rules
  o Physical design
  o Logical design
Network Services
  o MPLS/VPNs
  o Internet transport
  o QoS
  o Dial
Network Security
Configuration Templates
5.2 Intended Audience
TMN Engineering and Operations
Cisco Systems Project Team
Cisco Systems TMN Account Team
5.3 Scope
The following project scope has been agreed by involved parties.
Service Description                                          Witnessed* Staging Test  Implemented in Network  NRFU Test**
Intranet MPLS VPN (including the ISP transport)              NO                       YES                     YES
Extranet MPLS VPN                                            NO                       YES                     YES
QoS Enabled MPLS VPN                                         NO                       YES                     YES
Internet access from VPNs                                    NO                       YES                     YES
Dialup access to MPLS VPN                                    NO                       YES                     YES
IPsec access to MPLS VPN                                     NO                       YES                     YES
ISP Selection (Simulation*)                                  NO**                     YES***                  YES**
Inter-AS VPN Services (Simulation*)                          NO**                     YES***                  YES**
Multicast in VPNs (Simulation*)                              NO**                     YES***                  YES**
Carrier-supporting Carrier (hierarchical VPN) (Simulation*)  YES****                  NO                      NO
* Tests can only be simulated, as there is e.g. no other AS/ISP or customer available at the moment for that service.
** We perform NRFU tests during staging. This will still be a simulation due to the lack of a real AS/ISP, but it will be planned, agreed and documented as NRFU tests. The results will be shown in the NRFU document that will be produced later in the project. During the “normal” NRFU tests these tests will then be skipped, as they have already been performed. For multicast tests TMN will provide the multicast source and destination.
*** If no ISP/AS or multicast customer is defined, this configuration cannot be implemented, because specific parameters such as the AS number, community scheme or IP addressing are missing. A “Dummy Configuration” will be implemented only if this does not cause problems (e.g. error messages in the NMS) in the network. Configuration templates will be attached to the LLD.
**** Basic functionality will be tested.
5.4 Document Usage Guidelines
The document should be used as a guideline for deriving the necessary information to ultimately create the configurations that allow the network to provide the necessary services. Consequently this LLD document covers the following areas:
Customer Requirements
Generic Content
Best Practice Guidance
TMN Specific Content.
The more theoretical sections should be used in conjunction with the practical sections in order to allow the deployment engineer to understand the service requirements behind the configurations. This will also allow the deployment engineer to take certain decisions when deploying and configuring the network.
As long as the LLD document is in a draft format, it is susceptible to modifications and additions initiated by Cisco Systems.
After acceptance of the LLD by the customer, the LLD document is still a living document that will be updated by experiences gained throughout the deployment and testing phases.
5.5 Assumptions and Caveats
Configuration templates will be included in this LLD document after the completion of staging tests.
6 Overview
6.1 Network Topology
Figure 1 Mipnet Network Topology
[Figure: MipNet, Telecom Montenegro, v 1.1, 08.08.03. The diagram shows GSR 12406, 7206 VXR, Cat 6509, Cat 3550-24, 1760, AS 5350, PIX 515 and BR 350 devices, interconnected by STM-4, STM-1, GE, E3, E1 and FE links. Sites shown: Podgorica TKC and MTKC, Bar, Bijelo Polje, Ulcinj, Pljevlja, Mojkovac, Pluzine, Kolasin, Niksic, Zabljak, Savnik, Danilovgrad, Cetinje, Budva, Kotor, Herceg Novi, Tivat, Berane, Rozaje, Plav, Andrijevica and Golubovci, plus the Podgorica MAN sites Malo Brdo, Tolosi, Vektra, Zagoric, Donja Gorica, Celebic, Masline, Tuzi, Stari Aerodrom, Konik, Zeljeznicka stanica and SPP.]
6.2 Design Considerations
This chapter summarizes the design objectives that have been followed throughout the LLD, and the design rules adopted to meet these objectives.
6.2.1 MPLS Network Architecture
Fast IGP convergence
Fast convergence and network stability are competing goals in any network design. For this reason the convergence of the routing protocols will not be tuned in the initial deployment of Mipnet.
Network Stability and Scalability
Any routing protocol scales well if the routing information is stable. For this reason we will:
Offload all customer routes from the backbone IGP into BGP.
Aggregate the subnets of dial-up customers with fixed addresses on the VPDN tunnel concentrator, and redistribute them as static routes into BGP.
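The aggregation rule above can be illustrated with a minimal Python sketch. The addresses come from the 192.0.2.0/24 documentation range and are purely hypothetical (they are not Mipnet dial-up pools), and the helper name `aggregate_dial_routes` is an assumption made for this example only.

```python
import ipaddress

def aggregate_dial_routes(host_routes):
    """Collapse per-user host routes into the smallest set of covering prefixes."""
    networks = [ipaddress.ip_network(route) for route in host_routes]
    return [str(net) for net in ipaddress.collapse_addresses(networks)]

# Hypothetical fixed addresses of 64 dial-up users (documentation range, not Mipnet data).
fixed_users = [f"192.0.2.{host}/32" for host in range(64)]

# One aggregate is advertised instead of 64 host routes.
aggregates = aggregate_dial_routes(fixed_users)
print(aggregates)
```

Advertising the single aggregate keeps the number of routes stable regardless of how many individual users are connected at a given moment, which is the stability property the rule aims for.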
Network resilience
Although Mipnet is not fully resilient, the following shall contribute to overall high availability:
The physical and logical design ensures that a primary and a backup path exist between any two core routers.
PEs are in general bi-connected to two P routers. One exception is the PE router in Bijelo Polje. The PE in Berane is also a single point of failure for PEs and customers in Rozaje, Plav and Andrijevica.
Network security
Cisco will implement best-practice security mechanisms on Mipnet routers to protect the network. Customer security and managed firewall service are not in the scope of this project.
Simplicity
The Mipnet design is clean and simple to understand. Any feature or design element that would increase network complexity, but has a limited overall benefit, has been avoided.
MPLS
LDP has been chosen for label distribution in Mipnet. LDP will be enabled on all core links (P-P, P-PE).
6.2.2 MPLS/VPN
Flexible and scalable managed IP VPN service
Achieved through MPLS technology, properly applied MPLS/VPN functionality and the ISC management system¹.
¹ ISC and NMS design is covered in a separate LLD document.
Service resilience
A (partially) resilient MPLS backbone, redundant route reflectors and the possibility of fully resilient connectivity scenarios on the access layer (2CE-2PE) are the necessary building blocks for a high-availability MPLS/VPN service.
It was TMN's decision to deploy a single PE router and a single PE-P connection in some PoPs. These represent a single point of failure for Mipnet customers.
End-to-end Quality of Service
Achieved through the use of various Diffserv mechanisms: classification, marking, policing, queuing and dropping. QoS will be implemented on the access layer and in the backbone. QoS will not be deployed in the Podgorica MAN, because sufficient bandwidth is available there.
Internet Access for MPLS/VPN customers
Internet access from the MPLS/VPN will be provided for customers with such a requirement. For security reasons we recommend implementing the Internet connection only through a dedicated CE router and a dedicated access-layer circuit. Internet connectivity in Mipnet is achieved through separate MPLS/VPNs.
Security
Assuming that the MPLS core is secure, and that access links on PE routers have been protected, the MPLS/VPN solution offers the same level of security as traditional layer-2 VPN networks.
6.2.3 Internet Transport
Resilience
Internet transit will be implemented by two PE routers connected to two iGWs of the upstream provider.
Scalability
Internet routes will not be installed in the MPLS/VPN network. Only customer routes and vrf-specific default routes will be carried in MP-iBGP. Internet transit is provided by 7206VXR series routers equipped with NPE-G1 processors.
Security
Internet traffic will be confined to the MPLS/VPN, which greatly increases the security of the TMN MPLS core.
6.2.4 QoS
BW guarantees
The following Classes of Service will be implemented in the Mipnet: Voice, Streaming, Business, Standard. Each class may have different QoS attributes and a guaranteed (configured) bandwidth that cannot be utilised by any other class during congestion periods.
Backbone links must be provisioned with sufficient capacity for each of the classes!
Voice strict priority
VoIP traffic will be carried in a priority queue (LLQ) to reduce jitter and delay.
Flexibility
The Modular QoS CLI allows traffic flows of Mipnet customers to be mapped to one of the Classes of Service. Such classification and marking is extremely flexible (different customers can map different applications to any of the classes), but requires an understanding of traffic profiles (e.g. SMTP or any other data traffic must not be mixed with delay-sensitive VoIP packets).
Scalable implementation
The customer-specific QoS configuration is implemented on CE routers. The QoS configuration template on PE and P devices will remain stable and the same for all TMN customers. ISC shall be used for accurate provisioning of QoS parameters on access (PE-CE) connections.
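The bandwidth-guarantee rule stated under "BW guarantees" can be sketched as a simple provisioning check: the sum of the per-class guarantees must fit within the link capacity. The Python below is only an illustration; the function name `spare_capacity_kbps` and all bandwidth figures are hypothetical and are not taken from the Mipnet design.

```python
# The four Classes of Service named in this section.
CLASSES = ("voice", "streaming", "business", "standard")

def spare_capacity_kbps(link_kbps, guaranteed_kbps):
    """Return the link capacity left after all class guarantees are honoured."""
    unknown = set(guaranteed_kbps) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown class(es): {sorted(unknown)}")
    total = sum(guaranteed_kbps.values())
    if total > link_kbps:
        raise ValueError(
            f"guarantees ({total} kbit/s) exceed link capacity ({link_kbps} kbit/s)"
        )
    return link_kbps - total

# Hypothetical per-class guarantees on a 2048 kbit/s link (illustrative values only).
example = {"voice": 256, "streaming": 512, "business": 768, "standard": 256}
spare = spare_capacity_kbps(2048, example)
print(spare)
```

A check of this kind would fail early if a new guarantee were added to a link that is already fully committed.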
7 Network Architecture
7.1 Naming and Addressing Specifications
7.1.1 BGP AS Number
AS number for Mipnet is 29453.
7.1.2 Naming Conventions
Mipnet Routers (P, PE, MCE, MPE, iGW)
Naming convention for backbone routers is defined as follows:
[POPcode]_[Function]_[ID][_Interface]
POPcode represents the location of router and is detailed in Table 3.
Function identifies the role of the router in the network
o p – Provider label switch router
o pe – Provider Edge router
o mpe – Management PE
o mce – Management CE
o vce – Vrf-Lite CE (1760s in Podgorica MAN)
o irr – Internet Route Reflector
o lac – L2TP Access Concentrator (as5350)
o vrr – MPLS/VPN RR.
o igw – Internet Gateway.
ID makes the router name unique if there are multiple routers of the same type in the city.
Interface (optional): the reverse DNS mapping shall be populated with interface names for more meaningful traceroute output.
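As a minimal sketch of the backbone naming convention, the Python helper below assembles a name from its fields and rejects unknown function codes. The helper name `router_name`, the lowercase spelling of the PoP code and the interface label `fe0-0` are assumptions made for illustration; this document does not specify how interface names are encoded.

```python
# Function codes listed in the naming convention above.
FUNCTIONS = {"p", "pe", "mpe", "mce", "vce", "irr", "lac", "vrr", "igw"}

def router_name(pop_code, function, router_id, interface=None):
    """Build a name of the form [POPcode]_[Function]_[ID][_Interface]."""
    if function not in FUNCTIONS:
        raise ValueError(f"unknown function code: {function!r}")
    name = f"{pop_code}_{function}_{router_id}"
    if interface:
        # Optional suffix intended for reverse DNS entries (hypothetical encoding).
        name += f"_{interface}"
    return name

print(router_name("tkc", "pe", 1))
print(router_name("tkc", "pe", 1, "fe0-0"))
```

Validating the function code at the point where names are generated keeps typing errors out of DNS and out of the management systems that consume these names.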
Table 3 PoP Codes
PoP                  PoP DNS Code
Andrijevica          an
Bar                  br
Berane               ba
Bijelo Polje         bp
Budva                bd
Celebic              pg_cel
Cetinje              Ct
Danilovgrad          Dg
Donja Gorica         pg_dgo
Golubovci            pg_gol
Herceg Novi          Hn
Kolasin              Kl
Konik                pg_kon
Kotor                Ko
Malo Brdo            pg_mbr
Masline              pg_mas
Mojkovac             Mo
Niksic               Nk
Plav                 Pl
Pljevlja             Pv
Pluzine              Pu
Podgorica MTKC       Mtkc
Podgorica TKC        Tkc
Rozaje               Ro
Savnik               Sa
SPP                  pg_spp
Stari Aerodrom       pg_sta
Tivat                Tv
Tolosi               pg_tol
Tuzi                 pg_tuz
Ulcinj               Ul
Vektra               pg_vek
Zabljak              Za
Zagoric              pg_zag
Zeljeznicka Stanica  pg_zst
Customer Edge (CE) Routers
The naming convention for CE routers shall, in addition to the backbone naming scheme, incorporate the customer name or customer ID.
[Customer]_[POPcode]_ce_[ID]
Customer is the customer abbreviated name (or any other customer identification that will be meaningful to Mipnet Operations).
POPcode represents the location of CE router and is detailed in Table 3.
ID will make the router name unique if there are multiple CE routers in the same location and for the same customer (e.g. primary and backup CE)
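The two naming templates above can be sketched as small helper functions. This is only an illustration: the function names, the lowercase normalisation (suggested by the lowercase names used in the addressing tables), and the sample customer and interface labels are assumptions, not part of the design.

```python
# Hypothetical helpers illustrating the Mipnet naming templates.
# Function codes are taken from the list of router roles in section 7.1.2.
BACKBONE_FUNCTIONS = {"p", "pe", "mpe", "mce", "vce", "irr", "lac", "vrr", "igw"}


def backbone_name(pop_code, function, router_id, interface=None):
    """Build [POPcode]_[Function]_[ID][_Interface]."""
    if function not in BACKBONE_FUNCTIONS:
        raise ValueError(f"unknown function code: {function}")
    name = f"{pop_code}_{function}_{router_id}"
    if interface:
        # Optional suffix, intended for reverse DNS entries only.
        name += f"_{interface}"
    return name.lower()  # assumption: names are kept lowercase


def ce_name(customer, pop_code, router_id):
    """Build [Customer]_[POPcode]_ce_[ID]."""
    return f"{customer}_{pop_code}_ce_{router_id}".lower()


if __name__ == "__main__":
    print(backbone_name("mtkc", "p", 1))
    print(backbone_name("br", "pe", 1, "fe0-0"))  # "fe0-0" is a made-up label
    print(ce_name("acme", "nk", 2))               # "acme" is a made-up customer
```

Rejecting unknown function codes keeps names limited to the roles listed above; whether that strictness is wanted in provisioning scripts is a design choice.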
7.1.3 DNS Domain Name
Telekomcg.net
7.1.4 IP Addressing Scheme
The following address block has been registered with RIPE for Mipnet network devices, and will be used for numbering of network links as described in Figure 2, followed by detailed IP addressing tables.
inetnum: 195.140.164.0 - 195.140.167.255
netname: Telekom-mpls
descr: mpls network in Telekom Montenegro
country: CS
admin-c: SL1324-RIPE
tech-c: zd24-ripe
status: ASSIGNED PI
mnt-by: RIPE-NCC-HM-PI-MNT
mnt-by: as8585-mnt
mnt-lower: RIPE-NCC-HM-PI-MNT
mnt-routes: as8585-mnt
notify: [email protected]
changed: [email protected] 20030904
source: RIPE
An important design decision is to number all router interfaces with IP addresses from a contiguous address block. This permits simpler packet filtering rules and easier protection of network elements.
Such segmentation of TMN Mipnet address block will allow the following number of network connections:
128 Loopbacks with mask /32
80 Backbone links with mask /30
64 Host addresses in Mipnet NOC (mask /32)
128 Access links with mask /30
TMN shall register another address block for CE-PE links if the expected number of customer connections exceeds 128.
Customers’ addresses are not part of the address block registered for Mipnet devices. TMN shall register another address block for numbering devices (hosts) and links behind CE routers. The alternative is to use private IP addresses, but this will require NAT on CE routers to implement Internet connectivity or MPLS/VPN Extranets.
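The connection counts listed above can be checked against the address ranges given in the rest of this section. The sketch below is a quick sanity check using Python's standard ipaddress module; expressing the ranges as CIDR blocks is my own reading of the "x = [...]" ranges, and the dictionary names are illustrative.

```python
import ipaddress


def count_subnets(blocks, new_prefix):
    """Count subnets of length new_prefix across a list of CIDR blocks."""
    return sum(
        len(list(ipaddress.ip_network(b).subnets(new_prefix=new_prefix)))
        for b in blocks
    )


# CIDR blocks derived from the ranges stated in this section (an interpretation):
#   loopbacks  195.140.164.x/32, lower half of 195.140.164.0/24
#   backbone   195.140.164.128 .. 252 and 195.140.165.0 .. 188, step 4
#   NOC        195.140.165.192/26
#   access     195.140.166.0/24 and 195.140.167.0/24
SEGMENTS = {
    "loopbacks /32": (["195.140.164.0/25"], 32),
    "backbone links /30": (
        ["195.140.164.128/25", "195.140.165.0/25", "195.140.165.128/26"], 30),
    "NOC hosts /32": (["195.140.165.192/26"], 32),
    "access links /30": (["195.140.166.0/24", "195.140.167.0/24"], 30),
}

COUNTS = {name: count_subnets(blocks, plen)
          for name, (blocks, plen) in SEGMENTS.items()}

if __name__ == "__main__":
    for name, n in COUNTS.items():
        print(f"{name}: {n}")
```

Note that the loopback range is stated as x = 1 to 127, so the 128 addresses of the /25 include the unused .0 address.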
Figure 2 Mipnet IP Addressing – An Overview
[Diagram labels: 195.140.164.0/24, 195.140.165.0/24, 195.140.166.0/24, 195.140.167.0/24; P, PE Loopbacks; P-P, P-PE Backbone Links; NOC; CE-PE Access Links.]
Loopbacks
195.140.164.x/32 x = [1, 2, 3 .. 127]
Table 4 Mipnet IP Addressing – Loopback Interfaces
PoP                   DNS Code   P     PE     LAC    Comments
Andrijevica           an         -     .5     -
Bar                   br         .1    .6     -
Berane                ba         -     .7     -
Bijelo Polje          bp         .2    .8     -
Budva                 bd         -     .9     -
Celebic               pg_cel     -     .10    -
Cetinje               ct         -     .11    -
Danilovgrad           dg         -     .12    -
Donja Gorica          pg_dgo     -     .13    -
Golubovci             pg_gol     -     .14    -
Herceg Novi           hn         -     .15    -
Kolasin               kl         -     .16    -
Konik                 pg_kon     -     .17    -
Kotor                 ko         -     .18    -
Malo Brdo             pg_mbr     -     .19    -
Masline               pg_mas     -     .20    -
Mojkovac              mo         -     .21    -
Niksic                nk         -     .22    -
Plav                  pl         -     .23    -
Pljevlja              pv         -     .24    -
Pluzine               pu         -     .25    -
Podgorica MTKC        mtkc       .3    .26    .40    iGW will have Loopback numbered from AS8585 block.
Podgorica TKC         tkc        .4    .27    -      iGW will have Loopback numbered from AS8585 block.
Rozaje                ro         -     .28    -
Savnik                sa         -     .29    -
SPP                   pg_spp     -     .30    -
Stari Aerodrom        pg_sta     -     .31    -
Tivat                 tv         -     .32    -
Tolosi                pg_tol     -     .33    -
Tuzi                  pg_tuz     -     .34    -
Ulcinj                ul         -     .35    -
Vektra                pg_vek     -     .36    -
Zabljak               za         -     .37    -
Zagoric               pg_zag     -     .38    -
Zeljeznicka Stanica   pg_zst     -     .39    -
Backbone Links
Subnets below will be sequentially assigned to backbone connections (P-P, P-PE, MCE-MPE, MCE-MCE)
195.140.164.x/30 x = [128, 132, 136 .. 252]
195.140.165.x/30 x = [0, 4, 8 .. 188]
Table 5 Mipnet IP Addressing – Backbone Links
Subnet Router #1 IP Address Router #2 IP Address
195.140.164.128/30 bp_p_1 195.140.164.129 mtkc_p_1 195.140.164.130
195.140.164.132/30 mtkc_p_1 195.140.164.133 tkc_p_1 195.140.164.134
195.140.164.136/30 tkc_p_1 195.140.164.137 br_p_1 195.140.164.138
195.140.164.140/30 br_p_1 195.140.164.141 mtkc_p_1 195.140.164.142
195.140.164.144/30 bp_p_1 195.140.164.145 tkc_p_1 195.140.164.146
195.140.164.148/30 mtkc_p_1 195.140.164.149 mtkc_pe_1 195.140.164.150
195.140.164.152/30 mtkc_p_1 195.140.164.153 mtkc_pe_1 195.140.164.154
195.140.164.156/30 mtkc_p_1 195.140.164.157 kl_pe_1 195.140.164.158
195.140.164.160/30 mtkc_p_1 195.140.164.161 nk_pe_1 195.140.164.162
195.140.164.164/30 mtkc_p_1 195.140.164.165 ul_pe_1 195.140.164.166
195.140.164.168/30 tkc_p_1 195.140.164.169 tkc_pe_1 195.140.164.170
195.140.164.172/30 tkc_p_1 195.140.164.173 tkc_pe_1 195.140.164.174
195.140.164.176/30 tkc_p_1 195.140.164.177 ba_pe_1 195.140.164.178
195.140.164.180/30 tkc_p_1 195.140.164.181 dg_pe_1 195.140.164.182
195.140.164.184/30 tkc_p_1 195.140.164.185 ct_pe_1 195.140.164.186
195.140.164.188/30 bp_p_1 195.140.164.189 bp_pe_1 195.140.164.190
195.140.164.192/30 bp_p_1 195.140.164.193 ba_pe_1 195.140.164.194
195.140.164.196/30 bp_p_1 195.140.164.197 pv_pe_1 195.140.164.198
195.140.164.200/30 bp_p_1 195.140.164.201 mo_pe_1 195.140.164.202
195.140.164.204/30 br_p_1 195.140.164.205 br_pe_1 195.140.164.206
195.140.164.208/30 br_p_1 195.140.164.209 ul_pe_1 195.140.164.210
195.140.164.212/30 br_p_1 195.140.164.213 bd_pe_1 195.140.164.214
195.140.164.216/30 br_p_1 195.140.164.217 ko_pe_1 195.140.164.218
195.140.164.220/30 br_p_1 195.140.164.221 hn_pe_1 195.140.164.222
195.140.164.224/30 ba_pe_1 195.140.164.225 an_pe_1 195.140.164.226
195.140.164.228/30 ba_pe_1 195.140.164.229 pl_pe_1 195.140.164.230
195.140.164.232/30 ba_pe_1 195.140.164.233 ro_pe_1 195.140.164.234
195.140.164.236/30 ro_pe_1 195.140.164.237 pl_pe_1 195.140.164.238
195.140.164.240/30 an_pe_1 195.140.164.241 pl_pe_1 195.140.164.242
195.140.164.244/30 mo_pe_1 195.140.164.245 kl_pe_1 195.140.164.246
195.140.164.248/30 pv_pe_1 195.140.164.249 za_pe_1 195.140.164.250
195.140.164.252/30 pv_pe_1 195.140.164.253 pu_pe_1 195.140.164.254
195.140.165.0/30 za_pe_1 195.140.165.1 sa_pe_1 195.140.165.2
195.140.165.4/30 sa_pe_1 195.140.165.5 nk_pe_1 195.140.165.6
195.140.165.8/30 pu_pe_1 195.140.165.9 nk_pe_1 195.140.165.10
195.140.165.12/30 nk_pe_1 195.140.165.13 dg_pe_1 195.140.165.14
195.140.165.16/30 ct_pe_1 195.140.165.17 bd_pe_1 195.140.165.18
195.140.165.20/30 bd_pe_1 195.140.165.21 tv_pe_1 195.140.165.22
195.140.165.24/30 tv_pe_1 195.140.165.25 ko_pe_1 195.140.165.26
195.140.165.28/30 ko_pe_1 195.140.165.29 hn_pe_1 195.140.165.30
195.140.165.32/30 mtkc_pe_1 195.140.165.33 mtkc_mce_1 195.140.165.34
195.140.165.36/30 tkc_pe_1 195.140.165.37 tkc_mce_1 195.140.165.38
195.140.165.40/30 mtkc_mce_1 195.140.165.41 tkc_mce_1 195.140.165.42
195.140.165.44/30 195.140.165.45 195.140.165.46
Etc.
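The rows of Table 5 follow a fixed pattern: each /30 yields exactly two usable addresses, and subnets are handed out sequentially. A minimal sketch of that pattern, assuming the first usable address goes to Router #1 and the second to Router #2 as in the table (the function names are illustrative):

```python
import ipaddress


def p2p_endpoints(subnet):
    """Return the two usable host addresses of a /30 point-to-point subnet."""
    net = ipaddress.ip_network(subnet)
    if net.prefixlen != 30:
        raise ValueError("backbone links use /30 subnets")
    first, second = net.hosts()
    return str(first), str(second)


def next_subnet(last_assigned):
    """Return the /30 that immediately follows the given /30."""
    net = ipaddress.ip_network(last_assigned)
    return str(ipaddress.ip_network(f"{net.network_address + 4}/30"))


if __name__ == "__main__":
    print(p2p_endpoints("195.140.164.128/30"))
    print(next_subnet("195.140.165.40/30"))
```

The second helper does not check the upper bound of the backbone range (195.140.165.188/30); a provisioning script would need to add that check.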
NOC Links and Hosts
The NOC prefix 195.140.165.192/26 (i.e. one quarter of a class-C-sized subnet) has been reserved for numbering links and hosts in the Mipnet NOC.
NOC prefix can be VLSM subnetted to allow resilient NOC connectivity with Mipnet, through PIX firewalls and multiple security zones (VLANs) within the NOC site.
Access CE-PE Connections
Two class-C equivalent subnets below will be assigned by ISC to access connections. This will be sufficient for 128 point-to-point links.
Connections between Multi-VRF CEs and PE will also have IP addresses assigned from this address pool.
195.140.166.x/30 x = [0, 4, 8 .. 252]
195.140.167.x/30 x = [0, 4, 8 .. 252]
Cat3550s and 1760s Vlan99
Catalyst 3550 switches and 1760 series routers in the Podgorica MAN and regional PoPs are connected to the closest PE router via a dedicated VLAN99. Each device is configured with a /30 p2p subnet that terminates in the global routing table. The passive-interface command is used to inject these subnets into the backbone IS-IS. These subnets are reachable only from the NOC site.
VLAN99 subnets are numbered from the following address block:
192.168.99.x/30 x = [0, 4, 8 .. 252]
7.2 Physical Network Design
7.2.1 Physical Connectivity in Mipnet
Network connections in Mipnet are outlined in Figure 1.
7.2.2 Core PoPs
This chapter details the architecture of four core PoPs in Mipnet.
Please note that Aironet bridges are shown in the drawings below only to give a complete view of the PoP architecture. Only four of them will be installed within the scope of this project.
Bar
Figure 3 Core PoP – Bar
[Diagram labels: PoP Bar; devices 12406, 7206 VXR, Cat-3550, BR350; MPLS/VPN Customers; links to TKC, Ulcinj, Herceg Novi, Kotor.]
Bijelo Polje
Figure 4 Core PoP – Bijelo Polje
[Diagram labels: PoP Bijelo Polje; devices 12406, 7206 VXR, Cat-3550, BR350; MPLS/VPN Customers; links to MTKC, TKC, Mojkovac, Pljevlja.]
Podgorica TKC
Figure 5 Core PoP – Podgorica TKC
[Diagram labels: PoP Podgorica [TKC]; devices 12406, 7206 VXR, 6509, AS5350, BR350; iGW2; NOC; links to Bar, Bijelo Polje, Berane, Cetinje, MTKC.]
Podgorica MTKC
Figure 6 Core PoP – Podgorica MTKC
[Diagram labels: PoP Podgorica [MTKC]; devices 12406, 7206 VXR, 6509, 6509 (TKC), BR350; iGW1; Podgorica MAN; NOC; links to TKC, Ulcinj, Kolasin, Bar, Niksic.]
7.2.3 Regional PoPs
Figure 7 A typical Regional PoP
[Diagram labels: Regional PoP; devices Cat-3550, BR350; MPLS/VPN Customers.]
7.2.4 Podgorica MAN
Figure 8 Podgorica MAN
[Diagram labels: Mipnet; two 7206VXR (PE); two 12406 (iGW); two 6509; 3550; 1760; BR350.]
Note: For resiliency, the VRF for a customer attached at MAN Cat3550s (Figure 8, logical layout) would be configured on both PEs (with HSRP). The ISC management platform does not currently support such a setup, so only one PE will be used. The logical layout installed in Mipnet is depicted in the following drawing.
Figure 9 Non-resilient 3550-attached VRFs
[Diagram labels: Mipnet; two 7206VXR (PE); two 12406 (iGW); two 6509; 3550; 1760; BR350.]
The ISC limitation described above introduces a single point of failure in the Mipnet core topology in Podgorica. If the TKC P router fails, 3550-attached customers (statically routed) whose VRFs are terminated on the adjacent TKC PE would lose connectivity. The workaround is to create a p2p VLAN between the TKC and MTKC PEs, across the 6509s. This VLAN is configured for LDP and IS-IS, and provides a backup link from the TKC PE when the TKC P router fails (and equivalent protection against MTKC P failure).
Figure 10 Protection against P failure in Podgorica
[Diagram labels: Mipnet; 7206VXR (TKC PE); 7206VXR (MTKC PE); GSR (TKC P); GSR (MTKC P); two 6509; VLAN.]
7.2.5 Access Layer (Customer Edge Devices)
The following devices (managed CE) will be initially made available to Mipnet customers:
Cisco 1721
Cisco 805
In the case of an unmanaged (i.e. customer-managed) CE device, the customer may install a different platform as long as it is compatible with the Mipnet infrastructure at the physical and IP layers.
For example, in the first release of Mipnet, ATM connectivity and EIGRP routing will not be available and hence cannot be used for interconnection between Mipnet and the CE device.
An example of unmanaged CE that can be attached to Mipnet is a web server, connected via FastEthernet as a customer MPLS/VPN site with a single IP address.
Access media will be either:
Leased line, Aironet p2p bridge BR350, DSL p2p bridge, with the following encapsulations: HDLC, PPP, FR (when multiple logical links are needed over single physical circuit).
Dial-up
Ethernet or FastEthernet for co-located customers
7.3 Logical Network Design
7.3.1 IGP Routing – IS-IS
The role of ISIS in Mipnet
The TMN MPLS network requires an underlying Interior Gateway Protocol (IGP) to perform a number of functions. These include enabling BGP next-hop reachability, routing of management traffic and routing of accounting data. IS-IS is a very good choice because it is standardised, scales well and converges quickly. Optional services that may be required later, notably MPLS Traffic Engineering, also require a link-state protocol.
IS-IS will be responsible for interior routing only. It will not be used to carry any externally BGP derived routes nor will it carry customer addresses or links. Network addresses of the following links will be carried in the IS-IS updates:
backbone P-P links
distribution layer PE-P links
loopback0 interfaces
MPE-MCE IPv4 links
IS-IS Overview
Intermediate System to Intermediate System protocol (IS-IS) is an intra-domain OSI dynamic routing protocol specified in ISO 10589. The protocol is designed to operate in OSI Connection-less Network Service (CLNS). Data is carried using the protocol specified in ISO 8473.
In order to support large routing domains, provision is made for intra-domain routing to be organized hierarchically. A large domain may be administratively divided into areas. Each system resides in exactly one area. Routing within an area is referred to as Level 1 routing. Routing between areas is referred to as Level 2 routing. Level 2 Intermediate Systems keep track of the paths to destination areas. Level 1 Intermediate Systems keep track of the routing within their own area. For a packet destined to another area, a level 1 Intermediate System sends the packet to the nearest level 2 IS in its own area, regardless of what the destination area is. Then the packet travels via level 2 routing to the destination area, where it again travels via level 1 routing to the destination. It should be noted that selecting an exit from an area based on level 1 routing to the closest level 2 IS could result in sub-optimal routing.
On broadcast media a DIS (Designated Intermediate System) is elected and conducts the flooding over the medium. The DIS is analogous to the designated router in OSPF.
Intra-Domain IS-IS Routing Protocol may be used as an interior gateway protocol (IGP) to support TCP/IP as well as OSI. This allows a single routing protocol to be used to support pure IP environments, pure OSI environments, and dual environments. Integrated ISIS is deployed extensively in an IP-only environment in the Tier-1 ISP networks. This specification for Integrated IS-IS [RFC1195] was developed by the IS-IS working group of the Internet Engineering Task Force.
Network Entity Title (NET) – The CLNS Address
Even when IS-IS is used to route IP traffic only, IS-IS is still an ISO CLNP protocol. Consequently, the packets by which IS-IS communicates with its peers are CLNS PDUs, which in turn means that even in an IP-only environment, an IS-IS router must have an ISO address. The ISO address is termed Network Entity Title (NET). The length of a NET can range from 8 to 20 octets. The NET is defined in standards document ISO 8348. The ISO designed the NET to be many things to many systems. Depending on your viewpoint the address format is either extremely flexible or it is a cumbersome muddle of variable fields. If it is possible to choose any NET for an IP-only environment it is advisable to choose the simplest format. Regardless of the format, the following rules apply:
The NET must begin with a single octet.
The NET must end with a single octet, which should be set to 0x00. This octet is commonly called “selector byte” (SEL). IS-IS will function if the selector byte is non-zero but a dual CLNP/IP router may experience problems.
On Cisco routers, the System ID part of the NET must be six octets long.
TMN does not intend to connect to external OSI networks and is therefore using a private numbering scheme based on the local NSAP address format.
Figure 11 NSAP Address Format
A network that uses the local address format is not connected to a public data network, and forms a single, isolated routing domain. The addresses used are significant only within the own domain. The authority and format identifier (AFI) part of the NET address indicates the local organization (which is not an officially sanctioned registration authority) that assigns the NSAP address. The AFI consists of two digits at the beginning of the local address format.
The following list describes the different kinds of local address format fields:
AFI – The AFI of 0x49 indicates that a local address is used (analogous to RFC1918 for IP addresses). It indicates that a local organization, which is not an officially sanctioned registration authority, assigns the NSAP address.
Subnet ID – The subnet identifier is a 4-digit hexadecimal number that identifies a particular subnetwork within an organization. In Mipnet this field will be used to encode the area.
End-System ID – The end-system ID is a hexadecimal number that identifies a particular system on a subnetwork.
NSAP Selector – The NSAP selector is a 2-digit hexadecimal number that uniquely identifies a network service user on the system. On Cisco routers this value is set to 0.
The 10-byte NET used for Mipnet consists of a 1 byte AFI, a 2-byte area number, a 6-byte system id, and 1-byte selector as shown below:
49.AAAA.SSSS.SSSS.SSSS.00
The 2-byte (4 hexadecimal digits) Subnet ID field, shown as AAAA in the example above, defines the unique IS-IS area. Initially the value is set to 0x0001. If an area topology will be implemented 0x0001 will become the backbone area. Other area numbers will then be derived by incrementing the number.
In Cisco’s implementation of IS-IS the End-System ID field has a fixed length of 6 bytes.
For operational and debugging purposes it is helpful to establish a connection between the End-System ID field and the IP address of a router's loopback interface. The 4-byte IP address can be completely mapped into the 6 bytes of the End-System ID field using “decimal” notation. Each byte of the IP address would be represented using three digits of the NET.
For example, the IP address of the loopback interface
213.149.124.1
can be rewritten as
213.149.124.001
and encoded as (note the two decimal points instead of three)
2131.4912.4001
and finally the NET will be
49.0001.2131.4912.4001.00
TMN selected another approach, in which the IP address of Loopback0 is written into the NET as a hexadecimal number, as in the following example:
195.140.164.42 = c3.8c.a4.2a (hex)
which results in NET 49.0100.0100.c38c.a42a.00
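Both encodings can be reproduced with a few lines of code. The sketch below is illustrative only: the function names are made up, and the default area value and the two-byte "0100" padding in front of the hexadecimal address are simply read off the two examples in the text, not taken from a stated rule.

```python
def net_decimal(ip, area="0001"):
    """Encode an IPv4 address into a NET using the 'decimal' notation.

    Each octet is zero-padded to three digits and the resulting
    twelve digits are regrouped into three blocks of four.
    """
    digits = "".join(f"{int(octet):03d}" for octet in ip.split("."))
    groups = [digits[i:i + 4] for i in range(0, 12, 4)]
    return f"49.{area}.{'.'.join(groups)}.00"


def net_hex(ip, area="0100", sysid_prefix="0100"):
    """Encode an IPv4 address into a NET using the hexadecimal notation.

    The 4-byte address fills the last four bytes of the 6-byte system ID;
    sysid_prefix supplies the first two bytes (an assumption taken from
    the example in the text).
    """
    hexstr = "".join(f"{int(octet):02x}" for octet in ip.split("."))
    return f"49.{area}.{sysid_prefix}.{hexstr[:4]}.{hexstr[4:]}.00"


if __name__ == "__main__":
    print(net_decimal("213.149.124.1"))
    print(net_hex("195.140.164.42"))
```

The hexadecimal form is compact but harder to read by eye; the decimal form keeps the address recognisable in show command output.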
IS-IS Areas and Summarization in Mipnet
IS-IS supports variable length IP subnetting, tagging of externally derived routing information, and packet authentication. It is based on a two-level hierarchical structure where groups of IS-IS routers can be aggregated into “areas”, with each area connected to a backbone area that is used to route between the other areas.
Route summarization at area boundaries into the backbone area helps reduce the routing information propagated within the core network and to other areas. Routers only maintain a database of routes for areas in which they reside. Therefore, by dividing the overall network into areas, the effects of route flooding due to topology changes only affect the routers in the given area and not the entire network. Areas also allow creating routing summaries. A summary is a single aggregate route advertisement of the networks that reside in a given area. This greatly reduces the size of the routing tables and allows for a logical grouping of networks.
Mipnet will be configured as a single level-2 area initially. The main reasons behind that decision are:
Network size. Although there’s no simple formula to determine when to split the IS-IS domain into areas, the number of links and nodes in TMN MPLS core does not require multi-area approach. The large number of access-layer links that tend to flap frequently and introduce lots of routing updates (stability concern) will not be carried in the Mipnet IS-IS. Backbone links are generally very stable.
Route-summarization can introduce sub-optimal routing decisions on inter-PoP links, which is a major issue for core routing in service provider networks.
IS-IS Authentication
http://www.cisco.com/univercd/cc/td/doc/product/software/ios122/122newft/122t/122t13/ftismd5.htm
http://wwwin-metrics.cisco.com/cgi-bin/ddtsdisp.cgi?id=CSCdu82470
IS-IS already has a useful “built-in” security property: IS-IS PDUs are encapsulated directly in layer-2 frames rather than in IP packets. This means that spoofed IS-IS updates cannot be delivered as IP packets from outside the Mipnet IGP domain.
The implementation
ISIS HMAC-MD5 adds HMAC-MD5 digest to each ISIS PDU packet. This allows authentication at the ISIS routing protocol level which prevents unauthorized routing messages from being injected into network routing domain. The current implementation is based on the current IETF draft "IS-IS HMAC-MD5 Authentication" <draft-ietf-isis-hmac-03.txt>. For the detail of HMAC and MD5 algorithms refer to RFC 2104 and RFC 1321.
ISIS has 5 different packet types: LSP, LAN Hello, Serial Hello, CSNP and PSNP. Cisco's old implementation of cleartext password authentication covered only the first three types of PDUs. This implementation of HMAC-MD5, along with the cleartext mechanism, can be applied to all 5 types of PDUs. Authentication can also be enabled on different ISIS levels independently. Interface-related PDUs can be authenticated per interface, with different levels and different passwords. Passwords can also be rolled over to new ones without disrupting routing messages. For network transition purposes, a router can optionally be configured to accept PDUs with missing or wrong authentication information while still sending PDUs with authentication.
If the "service password-encryption" is configured on the router, the new scheme password string will be encrypted to add further security for the network operation.
Network Operation
Network operators can have their choice of authentication in ISIS level-1 or level-2, and they can decide if this authentication is only for LSPs, or only for interface related PDUs: Hello, SNP, or for both.
They can choose to use HMAC-MD5 or cleartext for their authentication. A router that uses new-style cleartext authentication will interoperate with routers that use old-style cleartext under certain conditions.
The HMAC-MD5 mode cannot be mixed with the cleartext mode within the same authentication scope. Operators can, however, use one mode for LSPs and the other mode on some interfaces, for example. If mixed modes are intended, different keys should be used for the different modes in order not to compromise the encrypted password in the PDUs.
Password Rollover
With the new IS-IS authentication scheme, we use the key management commands for the password implementation. This scheme gives us the flexibility of defining multiple passwords, encryption on display, and password rollover (changing passwords without causing routing message disruption) features.
The basic idea of password or key rollover is the following: Assume we have an old key and we want to move onto the new key and we further assume that we want this to happen around time X. On every router within the authentication scope, two keys are defined; or more precisely, the old key is already defined and being used, but a new key is added.
Within the key chain command, two optional attributes can be specified: accept-lifetime and send-lifetime. The old key's send-lifetime must run from the current time to time X, so that the router stops sending packets using the old key when time X is reached. The old key's accept-lifetime must run from the current time to X+y, where y is the maximum hold-time of an IS-IS packet. The new key's send-lifetime should run from X to infinity; the setting of infinity can be changed later when the next key rollover comes. The new key's accept-lifetime should also run from X to infinity.
The use of NTP on routers is strongly recommended in order to synchronize all the routers on the network.
From time X onwards, all the routers will send packets authenticated with the new key, since the old key's send-lifetime expired at time X. All the routers should be able to authenticate the new packets because the new key's accept-lifetime is now valid. If packets using the old key are still in transit somewhere in the network, all the routers can still authenticate them as long as they arrive within the X+y time frame. After time X+y the old key is no longer used by the system, and the old key configuration can optionally be deleted.
This completes the key rollover.
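The lifetime windows described above can be modelled in a few lines to check which key a router would send with, and which keys it would accept, at a given moment. This is a sketch under stated assumptions: times are plain numbers (seconds), the windows are treated as half-open intervals, and the function names are made up for the illustration. It does not model IOS behaviour in detail.

```python
import math


def rollover_schedule(now, x, y):
    """Return send/accept windows for the old and new key.

    now: time at which the rollover is configured
    x:   planned switchover time
    y:   maximum hold-time of an IS-IS packet
    """
    return {
        "old": {"send": (now, x), "accept": (now, x + y)},
        "new": {"send": (x, math.inf), "accept": (x, math.inf)},
    }


def active_keys(schedule, t, direction):
    """List the keys whose 'send' or 'accept' window contains time t."""
    return [
        key for key, windows in schedule.items()
        if windows[direction][0] <= t < windows[direction][1]
    ]


if __name__ == "__main__":
    sched = rollover_schedule(now=0, x=1000, y=30)
    for t in (999, 1000, 1010, 1031):
        print(t, "send:", active_keys(sched, t, "send"),
              "accept:", active_keys(sched, t, "accept"))
```

The overlap between X and X+y, where both keys are accepted but only the new one is sent, is what allows the rollover to proceed without dropping routing messages.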
Router Configuration
In Mipnet, the HMAC-MD5 authentication mode will be used on all network links, except on AS5350-PE connections, where cleartext mode will be configured (HMAC-MD5 authentication is currently not available on the AS5350 series).
IS-IS authentication will be configured on per-interface basis, according to the following configuration template.
Config 1 IS-IS HMAC-MD5 authentication example
!
service password-encryption
!
key chain chain1
 key 1
  key-string nontrivialpwd1
!
key chain chain2
 key 1
  key-string nontrivialpwd2
!
interface xxxx
 description Link to P/PE devices (global RT)
 ip router isis 1234
 isis authentication mode md5 level-2
 isis authentication key-chain chain1 level-2
!
interface xxxx
 description Link to AS5350 (global RT)
 ip router isis 1234
 isis authentication mode text level-2
 isis authentication key-chain chain2 level-2
!
router isis 1234
 net xxxx
 is-type level-2-only
!
Loopback Addresses
Each of the backbone routers will have a loopback address configured. It is used to keep the router ID stable for the IGP, MPLS and BGP.
IS-IS Metrics
MPLS Traffic Engineering will not be deployed in the first phase, but may be required in the future. In order to support MPLS-TE we will use the new IS-IS “wide metrics”.
IS-IS wide metrics for the router interface are represented as 0 to 16,777,215 in decimal. The total path metric is represented in decimal from 0 to 4,261,412,864. The new wide metrics allow for more granularity in metric allocation.
See the table below for the recommended IS-IS Metrics for the production network. This table is based on the largest interface speed (STM-256) having a metric of 10. This will allow us to scale to even faster interface speeds in the future if required.
A common metric policy for the entire network can result in undesirable natural routing path selection. For example, an intra-POP packet in a bi-connected POP (two P/PE routers) going from one PE router to another might leave the POP, go all the way to the P router, and then come back into the POP again because the combined cost of the higher speed backbone trunks is lower than the cost inside the POP. To avoid this scenario, two separate intra-POP and inter-POP metric policies will be used.
Similar thinking applies to inter-PoP metrics: the reference metric is that of STM-256, and the metric is multiplied by 4 for each step down in link speed. This prevents traffic from being rerouted onto a low-capacity link when a major trunk fails, and therefore prevents congestion. For example, if the STM-4 link between two PoPs fails, the traffic will take a backup path of three “hops” via STM-4 links (3 x 6400 = 19200) instead of a direct backup STM-1 link (25600).
The default metric (if not specified) for all IS-IS router interfaces is 10.
Table 6 Proposed IS-IS Metrics
Line Rate     Bandwidth      Intra-PoP   Intra-PoP Backup   Inter-PoP   Inter-PoP Backup
STM-256       40 Gb          10          11                 100         101
STM-64        10 Gb          40          41                 400         401
STM-16        2.5 Gb         160         161                1600        1601
STM-4 / GE    622 Mb         640         641                6400        6401
STM-1 / FE    155 Mb         2560        2561               25600       25601
T3, E3        45/34 Mb       10240       10241              102400      102401
E1/T1/Eth     2/1.5/10 Mb    40960       40961              409600      409601
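The values in Table 6 follow a simple rule: start from 10 (intra-PoP) or 100 (inter-PoP) for STM-256, multiply by 4 for each step down in line rate, and add 1 for a backup link. The sketch below regenerates the table from that rule; the function name and the "step" index are illustrative choices of mine.

```python
# Line rates in descending order, as listed in Table 6.
LINE_RATES = ["STM-256", "STM-64", "STM-16", "STM-4 / GE",
              "STM-1 / FE", "T3, E3", "E1/T1/Eth"]


def metrics(step, intra_base=10, inter_base=100, factor=4):
    """Return (intra, intra_backup, inter, inter_backup) for a line rate.

    step is the position of the line rate in LINE_RATES (0 = STM-256).
    """
    intra = intra_base * factor ** step
    inter = inter_base * factor ** step
    return intra, intra + 1, inter, inter + 1


if __name__ == "__main__":
    for step, rate in enumerate(LINE_RATES):
        print(rate, *metrics(step))

    # The rerouting example from the text: three inter-PoP STM-4 hops
    # are still cheaper than one inter-PoP STM-1 link.
    stm4_inter = metrics(LINE_RATES.index("STM-4 / GE"))[2]
    stm1_inter = metrics(LINE_RATES.index("STM-1 / FE"))[2]
    print(3 * stm4_inter, "<", stm1_inter)
```

With a factor of 4, up to three hops at one line rate remain preferable to a single hop at the next lower rate, which is the behaviour the text describes.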
Default Routes
Default route is not present in the Mipnet IGP.
Timers and Advanced IS-IS Features
The various IS-IS timers and parameters will be left as default. If after experience with the initial production network, end-to-end network convergence times need to be tuned, then the timers can be carefully adjusted, as described in Appendix I.
Link failure detection
Another value to consider is the carrier-delay on the POS interfaces. The default is 2 seconds, so if the router waits 2 seconds after an interface goes down before informing IS-IS, the timer tuning will not do much good. The advice is to set a short carrier-delay so that link-down events are detected quickly. A value of about 20 msec is recommended ("carrier-delay msec 20"), so that the router reacts quickly to interface-down situations.
If you expect the interfaces to be physically unstable for periods shorter than it takes for the IGP to converge, you may want to put this value higher, and that will effectively "filter out" these interface transitions as far as the IGP is concerned.
With the current IS-IS hello interval of 10 seconds and a hold time of three hello intervals, it can take up to 30 seconds to notice that a neighbor went down. We recommend reducing the hello interval to 3 seconds, which results in 9 seconds to detect a failure due to missing hellos.
interface <interface>
 isis hello-interval 3
set-overload-bit on-startup <seconds>
Defines the time in seconds that the router keeps the "overload" bit set after startup, in order to prevent it from being the next hop for BGP routes (this gives the BGP table time to stabilise).
IS-IS Priority
Used to determine which router will perform the role of "DIS" (Designated Intermediate System) on a LAN segment. The router with the highest priority will take over this role (should be the router with the highest spare CPU capacity on the segment). The default value is 64, and the chosen DIS should be given a value of 127.
Flooding
Max-lsp-lifetime specifies the maximum lifetime in seconds carried in the LSP header. Routers use this timer to age out and purge old LSPs. The default is 1200 seconds (20 minutes). The recommendation is to increase this timer to the maximum of 65535 seconds (~18.2 hours). This will reduce unnecessary LSP re-flooding.
Lsp-refresh-interval specifies the time in seconds the router will wait before refreshing and retransmitting its own LSPs. The default is 900 seconds (15 minutes). The refresh ensures that LSPs are renewed before the max-lsp-lifetime expires. The recommendation is to increase this timer to 65000 seconds (~18 hours). Make sure that the lsp-refresh-interval is lower than the max-lsp-lifetime so that the LSP never ages out.
router isis
 max-lsp-lifetime 65535
 lsp-refresh-interval 65000
!
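The timer relationships discussed in this section are simple arithmetic and can be checked before a change is rolled out. The following sketch is only a sanity check; the function names are illustrative, and the hold multiplier of 3 is the value quoted in the link failure detection paragraph.

```python
MAX_LSP_LIFETIME_LIMIT = 65535  # largest value quoted in the text, in seconds


def lsp_refresh_margin(max_lifetime, refresh_interval):
    """Return the safety margin in seconds, or raise if the pair is unsafe.

    An LSP must be refreshed before it ages out, so the refresh
    interval has to be strictly lower than the maximum lifetime.
    """
    if not 0 < refresh_interval < max_lifetime <= MAX_LSP_LIFETIME_LIMIT:
        raise ValueError("refresh interval must be below max-lsp-lifetime")
    return max_lifetime - refresh_interval


def hello_detection_time(hello_interval, multiplier=3):
    """Worst-case time to detect a lost neighbor from missing hellos."""
    return hello_interval * multiplier


if __name__ == "__main__":
    print(lsp_refresh_margin(65535, 65000), "seconds of margin")
    print(round(65535 / 3600, 1), "hours maximum LSP lifetime")
    print(hello_detection_time(10), "->", hello_detection_time(3), "seconds")
```

With the recommended values the margin between refresh and expiry is 535 seconds, noticeably more than the 300 seconds of the defaults (1200 and 900).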
No hello-padding
By default all hello packets generated by the ISIS process are padded to the full MTU size. Initially this was meant to be a mechanism to detect MTU inconsistency. Unfortunately this means that on a PoS interface the ISIS hello packet will be 4470 bytes instead of 60 bytes. This is a waste of bandwidth and buffers, especially if the hello interval is tuned down. If the padding is turned off, the first packet that is sent out to form the adjacency will still be padded to the full MTU size. After that, hello packets will be only 60 bytes. This behaviour makes it possible to have different configurations on the nodes in the network (which is a great benefit during a migration). When this command is applied, all CLNS neighbourships will be re-established, which will most likely cause a traffic disturbance.
The command can be applied either on a per-interface basis ("no isis hello padding") or globally under the IS-IS router process ("no hello padding").
IS-IS Configuration Template
The generic IS-IS configuration template is shown below.
***TBD***
!
7.3.2 Cisco Express Forwarding – CEF
Cisco's Express Forwarding (CEF) technology for IP is a scalable, distributed, layer 3 switching solution designed to meet the future performance requirements of the Internet and Enterprise networks. CEF is also a key component of Cisco's Tag Switching architecture.
Express Forwarding evolved to best accommodate the changing network dynamics and traffic characteristics resulting from increasing numbers of short duration flows typically associated with Web-based applications and interactive type sessions. Existing layer 3 switching paradigms use a route-cache model to maintain a fast lookup table for destination network prefixes. The route-cache entries are traffic-driven in that the first packet to a new destination is routed via routing table information and, as part of that forwarding operation, a route-cache entry for that destination is then added. This allows subsequent packet flows to that same destination network to be switched based on an efficient route-cache match. These entries are periodically aged out to keep the route cache current and can be immediately invalidated if the network topology changes. This "demand-caching" scheme, which maintains a very fast access subset of the routing topology information, is optimized for scenarios whereby the majority of traffic flows are associated with a subset of destinations. However, given that traffic profiles at the core of the ISPs (and potentially within some large Enterprise networks) no longer resemble this model, a new switching paradigm was required that would eliminate the increasing cache maintenance resulting from growing numbers of topologically dispersed destinations and dynamic network changes.
CEF avoids the potential overhead of continuous cache churn by instead using a Forwarding Information Base (FIB) for the destination switching decision. The FIB mirrors the entire contents of the IP routing table, i.e. there is a one-to-one correspondence between FIB table entries and routing table prefixes, so there is no need to maintain a route-cache.
This offers significant benefits in terms of performance, scalability, network resilience and functionality, particularly in large complex networks with dynamic traffic patterns.
Forwarding Information Base – FIB
CEF uses a Forwarding Information Base (FIB) to make IP destination prefix-based switching decisions. The FIB is conceptually similar to a routing table or information base. It maintains a mirror image of the forwarding information contained in the IP routing table. When routing or topology changes occur in the network, the IP routing table is updated, and those changes are reflected in the FIB. The FIB maintains next-hop address information based on the information in the IP routing table. Because there is a one-to-one correlation between FIB entries and routing table entries, the FIB contains all known routes and eliminates the need for route cache maintenance that is associated with earlier switching paths such as fast switching and optimum switching.
Figure 12 Sample FIB Entry
bbwu301#sh ip cef 57.64.0.0 detail
57.64.0.0/30, version 678050, per-packet sharing
0 packets, 0 bytes
  via 57.64.0.5, Hssi0/0/0.1, 0 dependencies
    traffic share 1, current path
    next hop 57.64.0.5, Hssi0/0/0.1
    valid adjacency
  via 57.64.0.9, Hssi0/0/0.2, 0 dependencies
    traffic share 1
    next hop 57.64.0.9, Hssi0/0/0.2
    valid adjacency
  0 packets, 0 bytes switched through the prefix
bbwu301#
Adjacency Tables
Nodes in the network are said to be adjacent if they can reach each other with a single hop across a link layer. In addition to the FIB, CEF uses adjacency tables to prepend Layer 2 addressing information. The adjacency table maintains Layer 2 next-hop addresses for all FIB entries.
Figure 13 Adjacency table
[Figure 13 diagram: the BGP table (prefix 10.0.0.0/8, next hop 1.2.3.4, with AS-path, communities and other attributes) feeds the IP routing table (10.0.0.0/8 via BGP, next hop 1.2.3.4; 1.2.3.0/24 via OSPF, next hop 1.5.4.1, Ethernet 0; 1.5.4.0/24 connected, Ethernet 0). The FIB table (CEF cache) resolves 10.0.0.0/8 to next hop 1.5.4.1 and holds an adjacency pointer. The ARP cache (IP address 1.5.4.1, MAC address 0c.00.11.22.33.44) populates the adjacency table, which stores the Layer 2 (MAC) header for 1.5.4.1.]
The adjacency table is populated as adjacencies are discovered. Each time an adjacency entry is created (such as through the ARP protocol), a link-layer header for that adjacent node is precomputed and stored in the adjacency table. Once a route is determined, it points to a next hop and corresponding adjacency entry. It is subsequently used for encapsulation during CEF switching of packets.
A route might have several paths to a destination prefix, such as when a router is configured for simultaneous load balancing and redundancy. For each resolved path a pointer is added for the adjacency corresponding to the next-hop interface for that path. This mechanism is used for load balancing across several paths. For per-destination load balancing a hash is computed from the source and destination IP addresses. This hash points to exactly one of the adjacency entries in the adjacency table, ensuring that the same path is used for all packets with this source/destination address pair. If per-packet load balancing is used, the packets are distributed round robin over the available paths. In either case the information in the FIB and adjacency tables provides all the necessary forwarding information, just as for non-load-balancing operation. The additional task for load balancing is to select one of the multiple adjacency entries for each forwarded packet.
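The two load-sharing modes described above can be illustrated with a minimal Python sketch. The hash function used here (SHA-256 over the packed addresses) is an assumption chosen for illustration only and is not the algorithm used by CEF; the path names are taken from the sample FIB entry in Figure 12, and the source address is hypothetical.

```python
import hashlib
import ipaddress


def pick_path(src, dst, paths):
    """Per-destination sharing: the same source/destination pair
    always maps to the same path (illustrative hash, not CEF's own)."""
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(paths)
    return paths[index]


def round_robin(paths):
    """Per-packet sharing: cycle through the paths for every packet,
    regardless of the addresses carried in the packet."""
    i = 0
    while True:
        yield paths[i % len(paths)]
        i += 1


if __name__ == "__main__":
    paths = ["Hssi0/0/0.1", "Hssi0/0/0.2"]
    # Every packet of this flow uses the same path.
    print(pick_path("10.1.1.1", "57.64.0.1", paths))
    # Consecutive packets alternate between the paths.
    rr = round_robin(paths)
    print([next(rr) for _ in range(4)])
```

Because the per-destination choice depends only on the address pair, a few heavy flows can land on the same path, which is why destination-based sharing does not guarantee an equal load distribution.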
CEF in the Mipnet
MPLS requires CEF (Cisco Express Forwarding) to be enabled on the router. In fact, on the 12000 series routers this is the only forwarding mechanism that is available. CEF is enabled on other platforms with the IOS command
!
ip cef <distributed>
!
The distributed keyword can be included for platforms such as the GSR and 75xx series routers, where the linecards have their own processors and packet memory for FIB tables.
Where multiple equal-cost paths (routes) exist for a given destination, per-destination load sharing will be implemented. This is the default CEF operation. Destination-based load sharing will not necessarily result in equal load distribution across equal-cost paths.
Per-packet load balancing is not recommended as it may introduce out-of-order packet delivery.
7.3.3 Multi Protocol Label Switching – MPLS
Overview
In conventional Layer 3 forwarding, as a packet traverses the network, each router extracts forwarding information from the Layer 3 header. Header analysis is repeated at each router (hop) through which the packet passes. This is a CPU-intensive task and does not scale well in core Internet routers with more than 100k routes in the routing table (and FIB).
In an MPLS network, only Provider Edge routers need to maintain the external routing information (BGP) and customer routes. Packets in the core are forwarded based on labels. Each IGP-learned IP network that is reachable through an interface is assigned a unique label (see footnote 2). A mapping is established between an incoming label and an outgoing label. This is maintained in the Label Forwarding Information Base (LFIB) table. Each node examines the incoming label, does a table lookup, swaps the incoming label for the outgoing label and then forwards the packet out of the outgoing interface.
Figure 14 shows the details of the MPLS header. It is located between the Layer 2 header and the Layer 3 (IP) header. The EXP bits and the TTL field of the MPLS header can be copied from the IP header. The S (bottom of stack) bit is set on the last label in the stack, so a cleared S bit indicates that further labels follow in this packet.
Figure 14 MPLS Header
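The field widths shown in Figure 14 (20-bit label, 3-bit EXP, 1-bit S, 8-bit TTL) add up to one 32-bit word per label. The following Python sketch packs and unpacks such a word; the function names and the example values are illustrative assumptions, not part of any router software.

```python
import struct


def pack_entry(label, exp, bottom_of_stack, ttl):
    """Pack one MPLS label stack entry into 4 bytes (network byte order)."""
    if not (0 <= label < 2 ** 20 and 0 <= exp < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (int(bool(bottom_of_stack)) << 8) | ttl
    return struct.pack("!I", word)


def unpack_entry(data):
    """Split a 4-byte label stack entry back into its four fields."""
    (word,) = struct.unpack("!I", data)
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "s": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }


if __name__ == "__main__":
    entry = pack_entry(label=23, exp=0, bottom_of_stack=True, ttl=254)
    print(entry.hex())          # 000171fe
    print(unpack_entry(entry))
```

In a two-label stack, such as the one used for MPLS/VPN, only the inner entry would have the S bit set.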
2 Packets for BGP-learned routes will be tagged with the same label as the one assigned to the IP address of the iBGP next-hop.
A protocol is used between the routers in an MPLS network to assign labels to IP networks and to exchange label information with other routers. Two protocols are currently in use; they offer the same functionality with minor operational differences:
1. Tag Distribution Protocol (TDP, port number 711)
2. Label Distribution Protocol (LDP, port number 646)
Figure 15 gives an overview of label switching in an MPLS enabled IP network. TDP/LDP is used to assign labels to networks that have been learnt by the IGP. At the ingress of the MPLS Network, a MPLS header is added to the IP packet. At each hop, the packet is forwarded by looking only at the label in the MPLS header. The label is swapped before forwarding it to the next router. At the egress of the MPLS network the MPLS header is stripped and the IP packet is forwarded out of the egress interface.
Figure 15 Overview of Label Switching using MPLS
[Figure 15 diagram: an IP packet enters the MPLS enabled IP network, is carried as an MPLS packet, and leaves as an IP packet. The numbered steps in the figure are:]
1a. Existing routing protocols (e.g. OSPF, IS-IS) establish reachability to destination networks
1b. Tag/Label Distribution Protocol (TDP/LDP) establishes label to destination network mappings
2. Ingress Label Edge Router receives packet, performs Layer 3 value-added services, and “labels” packets
3. Label Switches switch labelled packets using label swapping
4. Label Edge Router at egress removes label and delivers packet
[Figure 14 diagram: Layer-2 Header | MPLS Header | Layer-3 Header | Payload. The MPLS header consists of LABEL (20 bit), EXP (3 bit), S (1 bit) and TTL (8 bit). EXP - Experimental Bits (COS); S - Bottom of Stack; TTL - Time To Live.]
To enable MPLS and use LDP on a router we need the following commands:
!
mpls ip
mpls label protocol ldp
mpls ldp router-id loopback 0
!
interface pos 6/0
 mpls ip
!
These commands enable MPLS globally on the router and choose LDP as the label distribution protocol. They also set the LDP router-id to the Loopback 0 address; this is a stability mechanism similar to the use of Loopback addresses for BGP peering.
We also need to enable MPLS on each interface where we want to perform label switching. Note that when this is activated the interface can still accept normal IP packets.
In an MPLS enabled network we can choose whether or not to propagate the TTL (time to live) through the network. We can leave the default behaviour (propagation enabled) or selectively disable TTL propagation for locally generated traffic or for pass-through traffic.
Keeping TTL propagation for locally generated traffic assists internal troubleshooting, as ping and traceroute commands issued on the routers show the full path, whereas traffic generated from outside of the network (on a CE device somewhere) does not see the core hops. This has the added benefit of preventing customers from learning the core topology.
TTL propagation is enabled by default. To selectively disable it, the following IOS command can be used:
!
no mpls ip propagate-ttl [forwarded] [local]
!
This command gives us three options: to disable TTL propagation completely, or to disable it selectively for either locally generated traffic or externally generated (forwarded) traffic.
LDP Authentication
MD5 authentication on the TCP connection between two LDP peers can be invoked with the following command:
!
mpls ldp neighbor <ip addr> password 7 <pwd-string>
!
Specifying this command causes the generation and checking of the MD5 digest on every segment sent on the TCP LDP connection. When the LSR receives a TCP segment with an MD5 digest, it validates the segment by calculating the MD5 digest (using its own record of the password) and compares the computed digest with the received digest. If the comparison fails, the segment is dropped without any response to the sender.
This method is based on the BGP MD5 authentication algorithm defined in RFC 2385.
The same password has to be configured on both LDP neighbors, otherwise the LDP session will not be established. An existing LDP session will be torn down if MD5 authentication is invoked on one LDP neighbor only, or if the configured passwords do not match.
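The digest check described above can be sketched in Python as follows. This is a simplified illustration based on our reading of RFC 2385 (digest over the TCP pseudo-header, the fixed TCP header with a zeroed checksum, the segment data and the password); the header values, payload and passwords are made up for the example, and the code is not taken from any router implementation.

```python
import hashlib
import hmac
import socket
import struct

TCP_PROTOCOL = 6


def tcp_md5_digest(src_ip, dst_ip, tcp_header, payload, password):
    """Compute the 16-byte MD5 digest for one TCP segment."""
    segment_length = len(tcp_header) + len(payload)
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, TCP_PROTOCOL, segment_length)
    )
    # Only the fixed 20-byte header is hashed, with the checksum set to zero.
    fixed_header = bytearray(tcp_header[:20])
    fixed_header[16:18] = b"\x00\x00"
    data = pseudo_header + bytes(fixed_header) + payload + password
    return hashlib.md5(data).digest()


def segment_is_valid(received_digest, src_ip, dst_ip, tcp_header, payload, password):
    """Receiver side: recompute the digest with the local password and compare."""
    expected = tcp_md5_digest(src_ip, dst_ip, tcp_header, payload, password)
    return hmac.compare_digest(expected, received_digest)


def example_header():
    """Hypothetical fixed TCP header: ports 11024 -> 646, no options."""
    return struct.pack("!HHIIBBHHH", 11024, 646, 260220405, 3092571909,
                       5 << 4, 0x18, 4044, 0, 0)


if __name__ == "__main__":
    header = example_header()
    digest = tcp_md5_digest("20.0.0.23", "20.0.0.22", header, b"ldp-pdu", b"secret")
    print(segment_is_valid(digest, "20.0.0.23", "20.0.0.22", header, b"ldp-pdu", b"secret"))
    print(segment_is_valid(digest, "20.0.0.23", "20.0.0.22", header, b"ldp-pdu", b"other"))
```

A receiver with a different password computes a different digest and silently drops the segment, which is why mismatched passwords prevent the LDP session from coming up.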
The following example shows how to verify that MD5 authentication with an LDP neighbor (e.g. 20.0.0.23) has been enabled.
First we determine the TCB of the LDP TCP session with neighbor 20.0.0.23.
fra-p1#sh tcp brie
TCB       Local Address        Foreign Address      (state)
01742DA8  20.0.0.22.11013      20.0.0.17.646        ESTAB
017421B8  20.0.0.22.11012      20.0.0.1.646         ESTAB
016A3470  20.0.0.22.646        20.0.0.23.11024      ESTAB
Then we search for “md5” flag in the parameters of that TCP session.
fra-p1#sh tcp tcb 016A3470
Connection state is ESTAB, I/O status: 1, unread input bytes: 0
Local host: 20.0.0.22, Local port: 646
Foreign host: 20.0.0.23, Foreign port: 11024

Enqueued packets for retransmit: 0, input: 0 mis-ordered: 0 (0 bytes)

TCP driver queue size 21153008, flow controlled FALSE

Event Timers (current time is 0x4C534B8C):
Timer       Starts  Wakeups  Next
Retrans     6       0        0x0
TimeWait    0       0        0x0
AckHold     6       0        0x0
SendWnd     0       0        0x0
KeepAlive   0       0        0x0
GiveUp      0       0        0x0
PmtuAger    0       0        0x0
DeadWait    0       0        0x0

iss: 3092571909  snduna: 3092572566  sndnxt: 3092572566  sndwnd: 4032
irs: 260220405  rcvnxt: 260221060  rcvwnd: 4044  delrcvwnd: 84

SRTT: 165 ms, RTTO: 1172 ms, RTV: 1007 ms, KRTT: 0 ms
minRTT: 12 ms, maxRTT: 300 ms, ACK hold: 200 ms
Flags: passive open, retransmission timeout, gen tcbs, md5, non-blocking reads, non-blocking writes

Datagrams (max data segment is 516 bytes):
Rcvd: 14 (out of order: 0), with data: 6, total data bytes: 654
Sent: 13 (retransmit: 0, fastretransmit: 0), with data: 5, total data bytes: 656
TDP/LDP & CEF interaction
MPLS leverages the information stored in CEF databases:
In the tag imposition router (PE), the packet is switched based on a CEF table lookup to find the next hop, and then the appropriate tag information (which hangs off the FIB database) is added to it.
When the packet is tag-to-tag switched, the switching is based on an MPLS table lookup, but this table is in turn derived from the information in the CEF tables. Changes to a FIB path will trigger changes in the MPLS (LFIB) tables as well. Multiple FIB paths to a destination will result in multiple outgoing tag rewrites, one per FIB path. The tag load-sharing information is also derived from the CEF load-sharing information. If CEF cannot resolve a route, then tag switching cannot forward packets to that destination either.
MPLS Design Rules in Mipnet
The following are MPLS design rules in Mipnet:
LDP is the protocol that should be used for label distribution:
o It is standardized (RFC 3036), which means TMN could integrate other vendors’ equipment into the Mipnet core.
o MD5 authentication can be used for exchange of LDP messages.
LDP router-id will be forced to use the IP address of Loopback 0
LDP will be enabled on the following links:
o P-P, P-PE
MD5 authentication will be enabled on LDP TCP sessions.
(Distributed) CEF has to be enabled on all MPLS devices.
TTL propagation will initially be left enabled (the default). This allows troubleshooting with traceroute and ping directly from the NOC, rather than having to run these commands on local routers.
MPLS Configuration Template
!
interface Loopback0
 ip address <loopback-address> 255.255.255.255
!
ip cef <distributed>
!
mpls ip
mpls label protocol ldp
mpls ldp neighbor <neighbor1-loopback> password 7 <ldp-pwd>
mpls ldp neighbor <neighbor2-loopback> password 7 <ldp-pwd>
! mpls ldp neighbor ... etc.
mpls ldp router-id loopback 0
!
interface <core-interface>
 mpls ip
!
7.3.4 Dial
Dial access (ISDN + analog) will be used within the MPLS-VPN network to provide remote access into customers’ VRFs. This access will also be used to back up PE-CE leased lines throughout the MPLS-VPN network.
The integration of those connections into the respective customer VRF (incl. the “Internet VRF”) is done via VPDN/L2TP to a so-called Virtual Home Gateway (vHGW), as MPLS-VPN PE functionality is not supported on any AS5xxx access server. A user session into the AS5350 will be forwarded via an L2TP tunnel to the vHGW/PE based on DNIS or domain name. The final authentication and authorization is done on the vHGW/PE, where the PPP session is terminated.
As only one AS5350 carrying a maximum of 120 sessions (4 E1/PRI) is used, the PPP sessions will be terminated on the regular PEs in Podgorica. Both PEs will be used to allow for redundancy.
7.4 HW/SW Release Table
***TBD***
8 Network Services
8.1 MPLS/VPN
This section describes how the VPN services will be offered by TMN using the MPLS/VPN technology.
8.1.1 MPLS-VPN
How does it work?
P and PE routers share a common routing protocol within the core (IS-IS is the IGP used for the TMN MPLS network). The routers use this routing information to build ‘label switched’ paths between PE routers and use two levels of labels to transport packets. The PE offering VPN services can be referred to as the VPN-PE, as compared to an Internet PE that may be dedicated to Internet services only.
VRFs (Virtual Routing and Forwarding instances) are defined on the VPN-PE. Each VRF instance represents the end point of a VPN (a separate routing table, a set of interfaces to which CEs are attached and CE/PE routing protocols such as RIP or eBGP).
To make VPN routes unique on a VPN-PE, the VRF needs to define a Route Distinguisher (RD) that is prepended to each VPN route to make a VPN IPv4 route. For example, in Figure 16, RED’s VRF routing table contains a route 10.0.0.0/24 and the RED VRF has defined an RD of 20804:1. This route is passed via MP-BGP to peer VPN-PEs as 20804:1:10.0.0.0/24. GREEN’s VRF routing table also contains a route 10.0.0.0/24, and VRF GREEN has defined an RD of 20804:2, so this route is passed via MP-BGP to peer VPN-PEs as 20804:2:10.0.0.0/24.
VPNs are built using an extended BGP community called Route Target (RT). RTs have 64 bits and have the format X:Y. A VPN-PE attaches RTs (export Route Targets) to the routes learnt from directly connected CEs. This RT is sent to remote VPN-PEs as an extended BGP attribute of the route using MP-BGP. A set of VRFs on different VPN-PEs constitutes a VPN when they import routes that have the same RTs. For example, all routes from VRF RED VPN1 have RT 20804:1 and hence they belong to the same VPN.
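The roles of the RD and the RT can be illustrated with a short Python sketch that uses the RED and GREEN values from the example above. The data structures and function names are assumptions made for illustration; they do not reflect how MP-BGP is implemented on a router.

```python
def vpnv4_prefix(rd, prefix):
    """Prepend the RD so that overlapping IPv4 prefixes become distinct."""
    return f"{rd}:{prefix}"


# Routes as a VPN-PE might advertise them: the same IPv4 prefix in two VPNs.
ADVERTISED = [
    {"rd": "20804:1", "prefix": "10.0.0.0/24", "rts": {"20804:1"}},  # VRF RED
    {"rd": "20804:2", "prefix": "10.0.0.0/24", "rts": {"20804:2"}},  # VRF GREEN
]


def import_routes(routes, import_rts):
    """Return the IPv4 prefixes a VRF would import, given its import RTs.
    A route is imported when it carries at least one matching RT."""
    return [r["prefix"] for r in routes if r["rts"] & import_rts]


if __name__ == "__main__":
    # The RD keeps the two advertisements apart in MP-BGP.
    print([vpnv4_prefix(r["rd"], r["prefix"]) for r in ADVERTISED])
    # The RT decides which VRF installs which route.
    print(import_routes(ADVERTISED, {"20804:1"}))
```

The RD only guarantees uniqueness of the advertised route; VPN membership is decided purely by the RTs that each VRF exports and imports.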
VPN routing tables are propagated between VPN-PEs using MP-BGP. MP-BGP carries the RD and the RTs associated with the VPN routes. The VPN-PE allocates labels to VPN routes learnt from the CE, and these are propagated along with the MP-BGP updates to the remote VPN-PEs.
The VPN-PE marks packets from CE routers with two labels - the inner label is for the destination VPN route and the second outside label selects the label switched path to the remote VPN-PE (BGP next hop) that originated the destination VPN route.
More details about MPLS VPN Architecture can be found in RFC 2547 (http://www.rfc-editor.org/rfc/rfc2547.txt).
Figure 16 MPLS-VPN Network
Data Forwarding
The VPN-PE allocates labels to VPN Routes learnt from the CE and this is propagated along with the MP-BGP updates to the remote VPN-PEs.
At the ingress of the MPLS-VPN network, the VPN-PE marks packets from CE routers with two labels - the inner label is for the destination VPN route and the second outside label selects the ‘label switched’ path to the remote VPN-PE (BGP next hop) which originated the destination VPN route.
The P routers do not see the VPN label and do the data forwarding based on the outer MPLS label. Changes or instability in the VPN network do not affect the LFIB (Label Forwarding Information Base) of the P routers. This data-forwarding paradigm makes the architecture scalable and stable.
[Figure 16 diagram: VPN-PE 1 and VPN-PE 2 are connected across the MPLS network by an iBGP (MP-BGP) peering. RED and GREEN CE sites attach to the PEs and use the prefixes 10.0.0.0/24 and 11.0.0.0/24; CE 3 RED uses 12.0.0.0/24. VRF RED has RD 20804:1 and RT 20804:1; VRF GREEN has RD 20804:2 and RT 20804:2. Separate data forwarding paths are shown for RED and GREEN.]
Figure 17 gives an overview of how VPN traffic is forwarded in an MPLS-VPN network. At the ingress, PE1 receives a packet whose destination is 11.0.0.1 (VRF RED). The VPN label for 11.0.0.0/24 is 1. The BGP next hop for 11.0.0.0 (VRF RED) is PE2, and the MPLS label for PE2 is 6. P1 and P2 forward the data packet based on the MPLS label; in the figure, P1 swaps label 6 for label 23. P2 is the penultimate hop in the data path; it pops the MPLS label (23) and forwards the packet to PE2. PE2 looks at the VPN label 1 and forwards the packet out of the VRF RED interface.
Figure 17 Data Forwarding in an MPLS-VPN Network
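The forwarding sequence for the RED packet can be traced with a small Python sketch. The label values (VPN label 1 for RED, 2 for GREEN, outer labels 6 and 23) are those shown in Figure 17; the table layout and function names are illustrative assumptions.

```python
# Per-router label table: incoming outer label -> (action, outgoing label).
LFIB = {
    "P1": {6: ("swap", 23)},
    "P2": {23: ("pop", None)},   # penultimate hop pops the outer label
}

# Egress PE: VPN label -> VRF whose interface the packet leaves through.
VPN_LABELS_PE2 = {1: "RED", 2: "GREEN"}


def ingress_pe1(dst, vpn_label, outer_label=6):
    """Impose two labels; the list is ordered outermost label first."""
    return {"dst": dst, "labels": [outer_label, vpn_label]}


def core_hop(router, packet):
    """P routers look only at the outer label and never at the VPN label."""
    action, new_label = LFIB[router][packet["labels"][0]]
    inner = packet["labels"][1:]
    labels = [new_label] + inner if action == "swap" else inner
    return {"dst": packet["dst"], "labels": labels}


def egress_pe2(packet):
    """The remaining VPN label selects the VRF on the egress PE."""
    return VPN_LABELS_PE2[packet["labels"][0]]


if __name__ == "__main__":
    pkt = ingress_pe1("11.0.0.1", vpn_label=1)
    for router in ("P1", "P2"):
        pkt = core_hop(router, pkt)
        print(router, pkt["labels"])
    print("PE2 forwards into VRF", egress_pe2(pkt))
```

The P routers hold entries for the outer labels only, which is why adding or removing VPN routes does not change their LFIB.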
8.1.2 VRF, RD and RT
VRF
A VRF is a logical end point of a VPN on a PE. Interfaces on a PE (to which the CEs are connected) are assigned to VRFs. CEs that belong to the same full-mesh VPN and that are connected to the same PE can have their interfaces in the same VRF.
If 2 CEs attached to the same PE are spokes in the same hub and spoke VPN but are not allowed to route directly to each other then their interfaces should be assigned to different VRFs despite the fact that both VRFs will export and import the same RTs. This requires additional resources on the router but is required in order to stop the CEs routing directly to each other without first passing through the hub site.
VRFs have names that are used within the router configuration to identify them. VRF names will be automatically chosen by ISC.
[Figure 17 diagram: CE 1 RED and CE 2 GREEN attach to VPN-PE 1 and VPN-PE 2 (VRF RED: RD 20804:1, RT 20804:1; VRF GREEN: RD 20804:2, RT 20804:2), with P 1 and P 2 in the MPLS network between the PEs. A packet with IP Dest = 11.0.0.1 is shown unlabelled at the CEs, with labels 1 6 (RED) or 2 6 (GREEN) after PE 1, with labels 1 23 or 2 23 between the P routers, and with only the VPN label 1 or 2 towards PE 2. Legend: 6 and 23 are the MPLS labels for PE 2, 1 is the VPN label for 11.0.0.0 (VRF RED), 2 is the VPN label for 11.0.0.0 (VRF GREEN).]
The following example shows a sample VRF configuration on a channelised E1 interface.
!
ip vrf red
 rd 29453:123
 route-target export 29453:123
 route-target import 29453:123
!
controller E1 2/0/0
 channel-group 0 timeslots 1-15
!
interface Serial2/0/0:0
 ip vrf forwarding red
 ip address 14.1.1.5 255.255.255.252
 encapsulation ppp
 ip route-cache distributed
 no cdp enable
!
RD
Route Distinguisher (RD) is used to identify a VRF on a PE. In the context of VPNs, a RD is used to make the IPv4 address unique across different VPNs. A RD is significant only on a PE and must be unique for each VRF that is defined on the PE.
The RD can, however, have some significance to MP-BGP outside the originating PE if the VPN IPv4 address, formed by combining the RD and IPv4 address, is not unique. This can become an issue when 2 PEs advertise the same IPv4 route with the same RD but with different RTs or other attributes. In this case the best route is selected and added to the routing table before Route Targets or other attributes are examined. This can cause service problems, because valid routes may not be imported into VRFs on remote PEs if the preferred route did not carry a required RT. This is particularly important when route-reflectors are used, because they only advertise their preferred route and do not examine attributes such as the RT before advertising that route.
Mipnet will use different RDs for all VRFs so that the above problem does not occur (ie. RDs will not be re-used across multiple PEs). RDs will be allocated exclusively by ISC and administered sequentially so that the first VRF created will be 29453:1, the second will be 29453:2 etc.
An RD is either ASN-relative, in which case it is composed of an autonomous system number and an arbitrary number, or it is IP-address-relative, in which case it is composed of an IP address and an arbitrary number:
Figure 18 RD encoding options
The first notation using the AS number is the recommended one. The AS number used should be the Mipnet AS number 29453.
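The two RD encodings can be sketched in Python as follows. The type codes 0 (ASN-relative) and 1 (IP-address-relative) follow the RD definition in the BGP/MPLS VPN specification as we understand it; the helper name and the IP-address-relative example value are hypothetical.

```python
import ipaddress
import struct


def encode_rd(rd):
    """Encode an RD given as 'ASN:number' or 'A.B.C.D:number' into 8 bytes."""
    admin, _, number = rd.rpartition(":")
    if "." in admin:
        # IP-address-relative: 16-bit type, 32-bit IP address, 16-bit number.
        packed_ip = ipaddress.IPv4Address(admin).packed
        return struct.pack("!H4sH", 1, packed_ip, int(number))
    # ASN-relative: 16-bit type, 16-bit AS number, 32-bit number.
    return struct.pack("!HHI", 0, int(admin), int(number))


if __name__ == "__main__":
    print(encode_rd("29453:123").hex())    # 0000730d0000007b
    print(encode_rd("10.0.0.1:5").hex())   # 00010a0000010005
```

With the ASN-relative form recommended here, the 32-bit number field leaves ample room for a sequentially allocated VRF number or an encoded customer ID.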
Furthermore, it is recommended that the RD number allocation scheme encodes a customer ID and possibly a VPN type (e.g. Intranet, central services). The following is the generic required RD configuration.
!
ip vrf red
 rd 29453:123
!
RT
Route targets are used to implement routing policies in between VRFs (therefore sites). RT numbering should not require modifications each time a new site is connected (e.g. in a central services VPN topology).
The RT allocation scheme depends on the VPN topology in question (simple-full mesh VPN, overlapping VPNs, central services VPN, hub-and-spoke VPN). The exact RT allocation scheme for the simple-full mesh and hub-and-spoke VPN topologies will be outlined in the VPN topologies section.
RTs are carried between PEs as BGP extended community attributes. RTs are 64 bits in length and have 2 possible formats:
Figure 19 RT encoding options
The first notation using the AS number is the recommended one. The AS number used should be the Mipnet AS number 29453.
[Figures 18 and 19 show the same two encoding layouts:]
TYPE (16 bit) | ASN (16 bit) | NUMBER (32 bit)
TYPE (16 bit) | IP Address (32 bit) | NUMBER (16 bit)
The following is the generic required RT configuration for a simple-full mesh VPN topology.
!
ip vrf red
 rd 29453:123
 route-target export 29453:123
 route-target import 29453:123
!
8.1.3 MP-iBGP Support for MPLS/VPN
MP-iBGP and address families
Key to the configuration of MP-iBGP is the concept of address families, which are in fact routing contexts. A VPNv4 address family, together with an address family for each VRF on that particular PE, needs to be configured. Furthermore, the MP-iBGP neighbourship needs to be activated under the VPNv4 address family. The sending of both standard and extended BGP communities is also enabled under the VPNv4 address family. Forwarding of extended BGP communities is enabled by default, since RTs are carried as extended communities. The forwarding of standard BGP communities can be useful for customers who want to convey their standard BGP communities transparently through the MPLS/VPN network. In the latter case, the routing protocol between the CE and PE would be eBGP.
In this particular example, the PE to CE routing protocol is static, so a “redistribute static” needs to be configured under the “IPv4 vrf red” address family.
!
hostname xx_pe_x
!
router bgp 29453
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor <bar_pe_1 Loopback0> remote-as 29453
 neighbor <bar_pe_1 Loopback0> update-source Loopback0
 neighbor <bij_pe_1 Loopback0> remote-as 29453
 neighbor <bij_pe_1 Loopback0> update-source Loopback0
 no auto-summary
 !
 address-family vpnv4
 neighbor <bar_pe_1 Loopback0> activate
 neighbor <bar_pe_1 Loopback0> send-community both
 neighbor <bij_pe_1 Loopback0> activate
 neighbor <bij_pe_1 Loopback0> send-community both
 no auto-summary
 exit-address-family
 !
 address-family ipv4 vrf red
 redistribute static
 no auto-summary
 no synchronization
 exit-address-family
!
The MP-iBGP Route-Refresh feature (enabled by default) will be used to enable a PE router to request a resend of all VPNv4 routes from its neighbour. This is required when a PE has previously discarded a VPNv4 routing update for which it had no import RT configured in any of its VRFs.
iBGP Timers
Several iBGP timers can be used to tune the iBGP convergence in a MPLS-VPN Network:
Advertisement-interval: To set the interval between the sending of two consecutive BGP routing updates. The default interval is 5 seconds for iBGP peers.
Keepalive: Frequency, in seconds, with which the Cisco IOS software sends keepalive messages to its peer. The default is 60 seconds.
Holdtime: Interval, in seconds, after not receiving a keepalive message that the software declares a peer dead. The default is 180 seconds.
Scan-time: Configures import processing of VPNv4 unicast routing information from BGP routers into routing tables. Valid values used for selecting the desired scanning interval are from 5 to 60 seconds. The default scan time is 60 seconds.
Table 7 Default value for iBGP timers
iBGP Timers              Default Value
Advertisement-interval   5 sec
Keepalive                60 sec
Holdtime                 180 sec
Scan-time                60 sec
As discussed in the IGP routing section, the IGP convergence could be tuned down to a low value (about 5 sec). This means that when a PE router or the link between 2 PE routers fails, BGP next-hop reachability will be lost, on average, after 5 seconds. The BGP scanning process will detect the loss of the next hop for the MP-BGP route and stop selecting it in the BGP table. Therefore the BGP scan-time, 60 seconds by default, really determines the overall PE to PE convergence for MP-BGP routes.
Reducing the BGP scan-time for VPNv4 routes is acceptable when there is not much information in the BGP table. However, it takes a long time for a router to scan the entire BGP table when it contains several hundred thousand networks. Choosing the right timer setting is influenced by the amount of routing information in the routing table and the strength of the CPU in the router.
The decision is to leave the BGP timers at their default values, in this initial deployment of MPLS/VPN services.
Route Reflectors
Overview
The AS_PATH attribute prevents routing loops of eBGP-learned routes. Since there is no similar mechanism to prevent routing information loops within an AS, an iBGP speaker must not propagate iBGP-learned routes to other iBGP peers (iBGP split horizon rule). This implies an iBGP peering between any two iBGP speakers within the AS, or a total of N*(N-1)/2 iBGP sessions. The full iBGP mesh can represent a scalability concern in large networks due to the large number of TCP sessions and the unnecessary multiplication of routing updates.
Route reflection (RFC 2796) provides a more scalable alternative to any-to-any meshing between iBGP peers. In this model the Route Reflector (RR) clients establish normal iBGP sessions with one or more RRs. Client-to-client iBGP meshing (i.e. any-to-any meshing) is no longer required, because RRs are allowed to propagate iBGP-learned routes to RR clients. A full iBGP mesh must still be maintained among RRs.
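The difference in session count can be worked out with a few lines of Python. The router counts in the example are hypothetical and chosen only to show the arithmetic; each client is assumed to peer with every RR.

```python
def full_mesh_sessions(n):
    """Number of iBGP sessions in a full mesh of n speakers: N*(N-1)/2."""
    return n * (n - 1) // 2


def rr_sessions(clients, reflectors):
    """Sessions when every client peers with every RR and the RRs
    keep a full mesh among themselves."""
    return clients * reflectors + full_mesh_sessions(reflectors)


if __name__ == "__main__":
    # Hypothetical network of 20 PE routers.
    print(full_mesh_sessions(20))   # 190
    # The same 20 routers with 2 of them acting as RRs for the other 18.
    print(rr_sessions(18, 2))       # 37
```

The full mesh grows quadratically with the number of speakers, whereas the RR design grows linearly with the number of clients.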
When an RR receives a route from an iBGP peer, it selects the best path based on its path selection rules. After the best path is selected, the route is propagated in accordance with the iBGP split horizon rules summarized in the following table.
Table 8 iBGP Split Horizon Rules
BGP Speaker              Update From:           Propagated To:
Classical iBGP speaker   eBGP peer              all peers (iBGP & eBGP)
                         iBGP peer              eBGP peers
RR                       eBGP peer              all peers (iBGP & eBGP)
                         classical iBGP peer    eBGP peers and RR clients
                         RR client              all peers except sender
RR Client                eBGP peer              all peers (iBGP & eBGP)
                         iBGP peer              eBGP peers
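The RR rows of Table 8 can be expressed as a short Python function. This is a sketch of the propagation decision only (best-path selection is left out); the peer names and the peer-kind labels are hypothetical.

```python
def reflect(peers, sender):
    """Return the peers an RR propagates a route to.

    peers maps a peer name to its kind: 'ebgp', 'client' (RR client)
    or 'nonclient' (classical iBGP peer). sender is the peer the
    route was learned from.
    """
    sender_kind = peers[sender]
    targets = []
    for name, kind in peers.items():
        if name == sender:
            continue  # never send the route back to its sender
        if sender_kind == "nonclient" and kind == "nonclient":
            continue  # iBGP split horizon still applies between non-clients
        targets.append(name)
    return targets


if __name__ == "__main__":
    peers = {
        "cust_ce": "ebgp",
        "pe_1": "client",
        "pe_2": "client",
        "rr_2": "nonclient",
        "rr_3": "nonclient",
    }
    print(reflect(peers, "cust_ce"))  # all peers
    print(reflect(peers, "rr_2"))     # eBGP peers and RR clients
    print(reflect(peers, "pe_1"))     # all peers except the sender
```

The only case in which the RR withholds a route is between two classical iBGP peers, which is exactly the split horizon rule that route reflection relaxes for clients.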
For high-availability reasons it is recommended that each RR client establishes iBGP sessions with two RRs. To prevent routing loops in resilient RR topologies, RRs add their cluster-ID to the cluster-list attribute of any reflected route. An RR will ignore any route that carries its own cluster-ID in the cluster-list.
The RR model with resilient RRs in the same cluster implies that each client in that cluster must establish an iBGP session with all RRs. This must be respected to avoid black-holing and loss of connectivity. However, this rule is often overlooked in network topologies where some RR clients connect to the rest of the network over a single physical link. For this reason it is recommended to put each RR in a separate cluster, which will be achieved automatically (see footnote 3) if the command “bgp cluster-id” is omitted.
The first RR that reflects the route also sets an additional BGP attribute called originator-ID to the BGP router-ID of its client. Any router that receives an iBGP update with the originator-ID attribute set to its own BGP router-ID will ignore that update (see footnote 4).
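The two loop-prevention checks (cluster-list and originator-ID) can be sketched in Python as follows. Routes are modelled as plain dictionaries and all router-IDs and cluster-IDs are hypothetical; this is an illustration of the rules described above, not of any BGP implementation.

```python
def reflect_route(route, rr_cluster_id, client_router_id):
    """Attributes an RR adds when it reflects a route from a client."""
    reflected = dict(route)
    # Only the first RR sets the originator-ID; later RRs leave it alone.
    reflected.setdefault("originator_id", client_router_id)
    # Every RR prepends its own cluster-ID to the cluster-list.
    reflected["cluster_list"] = [rr_cluster_id] + list(route.get("cluster_list", []))
    return reflected


def accept(route, my_router_id, my_cluster_id=None):
    """Receiver-side checks; my_cluster_id is only set on an RR."""
    if route.get("originator_id") == my_router_id:
        return False  # our own route has come back to us
    if my_cluster_id is not None and my_cluster_id in route.get("cluster_list", []):
        return False  # the route has already passed through our cluster
    return True


if __name__ == "__main__":
    route = {"prefix": "29453:1:10.0.0.0/24"}
    first = reflect_route(route, rr_cluster_id="1.1.1.1", client_router_id="3.3.3.3")
    print(first)
    print(accept(first, my_router_id="3.3.3.3"))                            # False
    print(accept(first, my_router_id="2.2.2.2", my_cluster_id="1.1.1.1"))   # False
    print(accept(first, my_router_id="2.2.2.2", my_cluster_id="2.2.2.2"))   # True
```

When each RR is in its own cluster, a second RR accepts the reflected route from the first one, so a client that has lost one of its RR sessions can still be reached through the other RR.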
RR model supports multiple levels of RRs (RR hierarchy) where first level RRs become the clients of second level RRs. Unless dictated by large number (many hundreds!) of iBGP peers, such setup is generally not recommended as it increases the end-to-end routing convergence delay.
The backbone routers in the network run MPLS and have the sole purpose of forwarding packets. Running full BGP on these routers would have an adverse impact on CPU and memory as well as defeat some of the benefits of MPLS. MPLS allows us to forward packets based on label swapping, using the IGP only for next-hop resolution, whilst running BGP on the edge. The decision was therefore made to keep BGP out of the backbone routers altogether.
The best-practice design for the RR topology in MPLS/VPN networks (see footnote 5) involves two Internet RRs and two VPNv4 RRs. This is to logically separate the MPLS/VPN routing model from public Internet routing systems. In the event of major routing accidents in the Internet, the Internet RRs might experience performance or memory problems, but the VPNv4 RRs would continue to operate normally (see footnote 6). Also, when the BGP session is established, the two BGP neighbors will first exchange the IPv4 routes, followed by the VPNv4 routes. This may affect the convergence times for MPLS/VPN customers after the failure of a PE router or RR.
3 Without special configuration, the cluster-ID will be derived from the BGP router-ID.
4 This is, for example, applicable when RR clients peer with RRs in different clusters. Another example is combined Confederation/RR scenarios, where the cluster list is not preserved on confed-eBGP sessions.
5 Assuming that the Internet is implemented in the global routing table and each PE router holds the full Internet routing table.
6 One must be aware of the importance of the RRs for MPLS/VPN services: if both RRs fail, the connectivity for all MPLS/VPN customers will be lost!
Figure 20 Best-practice RR design in MPLS/VPN Networks
[Figure 20: two VPNv4 RRs (vrr_1, vrr_2) and two IPv4 RRs (irr_1, irr_2) serve the RR clients pe_1 to pe_5. MP-iBGP sessions to the VPNv4 RRs carry the reflected VPN customer routes; iBGP sessions to the IPv4 RRs carry the reflected IPv4 Internet table.]
RR Topology in Mipnet
Initially, dedicated Route Reflectors will not be installed in Mipnet. This should have no effect on scalability, at least in the first deployment phase, because of the low number of routes in the Mipnet MP-BGP system. The RR functionality will be implemented on two PE routers, Bijelo Polje and Bar, which are both equipped with a powerful NPE-G1 processor and 256 MB of memory. Nevertheless, we recommend implementing dedicated RRs in the future, as the number of customers, routes and PE routers increases.
Internet routes of Mipnet customers will be injected in the Internet-VPN, so there’s no need for IPv4 RRs. IPv4 iBGP sessions between RRs and RR-clients will be explicitly disabled.
For resiliency purposes, each PE (an RR client) will have two MP-iBGP sessions, one with each of the two RRs.
Figure 21 Initial RR Topology in Mipnet
[Figure 21: the VPNv4 RRs bar_pe_1 and bij_pe_1 reflect VPN customer routes over MP-iBGP sessions to the RR clients and_pe_1, ber_pe_1, bud_pe_1, cet_pe_1, etc.]
Peer-groups
The three main benefits of peer-groups on BGP speakers are:
Reduction of resource requirements (CPU load and memory) when formatting and propagating the BGP UPDATE messages
Faster BGP convergence
Simpler and shorter BGP configuration
With a peer-group, the BGP table is walked only once and the updates are replicated to all other peer-group members that are in sync. Depending on the number of members, the number of prefixes in the table and the number of prefixes advertised, this can significantly reduce the load. It is thus highly recommended that peers with identical outbound announcement policies be grouped into peer-groups.
All members of a peer-group must share identical outbound announcement policies (e.g., distribute-list, filter-list, and route-map), except for the default-origination, which is handled on a per-peer basis even for peer-group members. The inbound update policy can be customized for each individual member of a peer-group.
A peer-group must be either internal (with iBGP members) or external (with eBGP members). Members of an external peer-group have different AS numbers.
To observe the efficiency of packing and replication of UPDATE messages, use the command “show bgp peer-group”, which reports how many updates have been formatted and replicated. The example below is not representative, as it comes from a lab with many BGP session restarts. Ideally, the number of replicated messages divided by the number of formatted messages would be N-1 (when the peer-group consists of N peers).
The replication efficiency of peer-groups can be degraded if some of the peers in the peer-group cannot consume the BGP updates at the same speed as other peer-group members. Then we say that peer-group members are not “in sync”.
rr2#sh bgp peer-group
BGP peer-group is CLIENTS, remote AS 1
  BGP version 4
  Default minimum time between advertisement runs is 5 seconds

 For address family: IPv6 Unicast
  BGP neighbor is CLIENTS, peer-group internal, members:
  2001:420::1 2001:420::2 2001:420::3
  Index 1, Offset 0, Mask 0x2
  Route-Reflector Client
  Community attribute sent to this neighbor
  Route refresh request: received 0, sent 0
  Update messages formatted 95, replicated 122
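As a rough worked check of the replication ratio described above, applied to the counters in this lab output (the peer-group has three members, so N = 3):

```latex
\[
\frac{\text{replicated}}{\text{formatted}} = \frac{122}{95} \approx 1.28,
\qquad
\text{ideal ratio} = N - 1 = 3 - 1 = 2
\]
```

The measured ratio falls short of the ideal value, which is consistent with the BGP session restarts in the lab mentioned above.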
The RR-CLIENT peer-group will be configured on each RR in Mipnet and integrated into the MP-BGP configuration as shown in the following example.
hostname bar_pe_1
!
router bgp 29453
 no bgp default ipv4-unicast
 neighbor RR-CLIENT peer-group
 neighbor RR-CLIENT remote-as 29453
 neighbor RR-CLIENT password 7 <md5_pwd>
 neighbor RR-CLIENT update-source Loopback0
 ! iBGP full mesh of RRs
 neighbor <bij_pe_1 Loopback0> remote-as 29453
 neighbor <bij_pe_1 Loopback0> password 7 <md5_pwd>
 neighbor <bij_pe_1 Loopback0> update-source Loopback0
 ! RR Clients
 neighbor <and_pe_1 Loopback0> peer-group RR-CLIENT
 neighbor <ber_pe_1 Loopback0> peer-group RR-CLIENT
 ! etc.
 no auto-summary
 !
 address-family vpnv4
  neighbor RR-CLIENT activate
  neighbor RR-CLIENT route-reflector-client
  neighbor RR-CLIENT send-community both
  ! iBGP full mesh of RRs
  neighbor <bij_pe_1 Loopback0> activate
  neighbor <bij_pe_1 Loopback0> send-community both
  ! RR Clients
  neighbor <and_pe_1 Loopback0> peer-group RR-CLIENT
  neighbor <ber_pe_1 Loopback0> peer-group RR-CLIENT
  ! etc.
 exit-address-family
!
MP-iBGP authentication
For security reasons, it is good design practice to authenticate routing updates in order to avoid attacks through the spoofing of routing protocols. BGP supports advanced cryptographic authentication using the MD5 hashing algorithm.
It is recommended to enable cryptographic MD5 authentication between MP-iBGP peers. The configuration required to do so is shown below. Note that the authentication needs to be enabled under the global BGP configuration part, and not under the VPNv4 address family.
!
router bgp 29453
 no bgp default ipv4-unicast
 bgp log-neighbor-changes
 ! iBGP: the peer is in the local AS
 neighbor <iBGP peer> remote-as 29453
 neighbor <iBGP peer> password 7 <password>
 neighbor <iBGP peer> update-source Loopback0
 no auto-summary
!
MP-iBGP Design Rules in Mipnet
The following design rules will be used for iBGP session between the RR and the VPN-PE (for exchanging VPNv4 routes) in the Mipnet:
1. PEs will peer with two VRRs.
2. iBGP meshing will be implemented on Loopback interfaces.
3. “no bgp default ipv4-unicast” on each router will prevent propagation of IPv4 routing information on MP-BGP sessions between a PE and VRR.
4. PEs and VRRs will hold the VPN routes only.
5. Each RR will use a separate Cluster ID (automatically set to Router_ID – Loopback0)
6. The VRRs will have an MP-iBGP peering with each other.
7. Synchronisation and Auto-Summary will be disabled in all address families.
8. Exchange of regular communities will be enabled on IPv4 BGP sessions. Exchange of regular and extended BGP communities will be enabled for the MP-BGP sessions.
9. ‘deterministic-med’ feature shall be enabled on all BGP routers.
10. BGP scan-time can be reduced to 30 seconds. Other BGP timers will be left at their default values.
11. Peer groups will be used on the RRs for more effective distribution of BGP routing updates and to reduce the configuration lines on the RR.
12. Enable logging of BGP neighbor changes.
13. Apply “ip bgp-community new-format” command to all BGP routers.
14. Enable MD5 authentication on iBGP (and eBGP) sessions
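As an illustration only, the router-wide rules above (3, 7, 9, 10, 12 and 13) could map to a configuration fragment like the following. This is a sketch assembled from the rules, not a configuration taken from a Mipnet router; command availability and defaults depend on the IOS release in use.

```
! Rule 13: display communities in AA:NN format
ip bgp-community new-format
!
router bgp 29453
 ! Rule 7: no synchronisation, no auto-summary
 no synchronization
 no auto-summary
 ! Rule 3: do not activate IPv4 unicast on new neighbors by default
 no bgp default ipv4-unicast
 ! Rule 12: log neighbor state changes
 bgp log-neighbor-changes
 ! Rule 9: deterministic MED comparison
 bgp deterministic-med
 ! Rule 10: BGP scanner interval of 30 seconds
 bgp scan-time 30
```

The per-neighbor rules (peer-groups, communities, MD5 passwords) are already shown in the RR configuration example earlier in this section.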
8.1.4 MPLS/VPN Topologies
Route Targets are used to build VPNs. RTs are used to decide which routes are visible in a VRF. By exporting routes with selective RTs and by selectively importing routes, different VPN models can be built.
Any-to-any VPN (Full Mesh)
This is the classical and most common model. All the sites of a VPN can communicate directly with each other. All sites export their routes with the same RT and import the routes that carry that RT.
The following figure shows a VPN with three sites. All the sites can communicate with each other. All the sites use the same RT (65000:1) to export the routes and the same RT (65000:1) to import the routes.
The AS number of the MPLS/VPN network in the following examples is 65000.
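A minimal sketch of the VRF definition that this model implies on a PE. The RT 65000:1 comes from the text above; the VRF name RED and the RD follow Figure 22, and each PE would use its own RD value.

```
ip vrf RED
 rd 65000:1
 ! same RT exported and imported by every site of the VPN
 route-target export 65000:1
 route-target import 65000:1
```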
Figure 22 Any-To-Any VPN Model
[Figure 22: Regional Sites 1, 2 and 3 connect through CE1, CE2 and CE3 to PE1, PE2 and PE3 of the MPLS/VPN network. The three VRF RED instances use RD 65000:1, 65000:2 and 65000:3, each with RT export 65000:1 and RT import 65000:1. Every site sees the routes 10.0.0.0/24, 11.0.0.0/24 and 12.0.0.0/24.]
Hub and Spoke VPN – No Connectivity between Spokes
This VPN model is typically used when a VPN has a central site (Hub) and regional sites (spokes). The central site provides services to the regional sites and there is no need for the regional sites to communicate with each other directly.
The following figure depicts this connectivity model. The central site is connected to PE3. The two regional sites are connected to PE1 and PE2. The central site imports the routes (with RT 65000:2) that are exported by the two regional sites. The regional sites import only the routes (with RT 65000:1) that are exported by the central site. Because no regional site imports RT 65000:2, the routes of one spoke are never installed in the VRF of another spoke. This is called the hub-and-spoke model without connectivity between spokes.
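The RT assignment described above can be sketched as follows. The RT values are those given in the text; the VRF name RED and the RD values are taken from Figure 23, and the split into two routers is shown with comments only.

```
! PE1 (regional site); PE2 is identical except for rd 65000:2
ip vrf RED
 rd 65000:1
 route-target export 65000:2
 route-target import 65000:1
!
! PE3 (central site)
ip vrf RED
 rd 65000:3
 route-target export 65000:1
 route-target import 65000:2
```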
Figure 23 Hub and Spoke – No Connectivity between Spokes
[Figure 23: Regional Site 1 (CE1, PE1) and Regional Site 2 (CE2, PE2) use VRF RED with RD 65000:1 and 65000:2, RT export 65000:2 and RT import 65000:1. The Central Site (CE3, PE3) uses VRF RED with RD 65000:3, RT export 65000:1 and RT import 65000:2. The central site sees 10.0.0.0/24, 11.0.0.0/24 and 12.0.0.0/24; one regional site sees 10.0.0.0/24 and 12.0.0.0/24, the other 11.0.0.0/24 and 12.0.0.0/24.]
Hub and Spoke VPN – Connectivity between Spokes via Hub
The double-VRF solution explained in this chapter is currently not supported by ISC. ISC product development is assessing how this topology option can be implemented more cleanly in the product⁷. However, there is a workaround that uses three VPN objects and the RT range for special tasks in ISC. This workaround will be elaborated in the ISC LLD document.
This VPN model is used for VPNs that have a Central Site (Hub) and Regional Sites (Spokes) and the Central Site controls the connectivity between the Regional Sites. For example the Central Site might have a Firewall and all the traffic between the Regional Sites must pass through the Firewall.
The following figure shows a VPN with one central site and two regional sites. Communication between the two regional sites is through the central site. The central site is connected to the MPLS-VPN Network with two different VRFs in order to implement this solution. VRF HUB-IMPORT is used to import all the routes from the Spoke sites and the second VRF HUB-EXPORT is used to export all the routes (including the routes from the Central Site) to the Spoke sites.
The bold arrows indicate the data forwarding path from Regional Site2 to Regional Site1.
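A sketch of the VRF definitions behind this model. The VRF names, RD values and RT values are those shown in Figure 24; which PE holds which VRF is indicated with comments only.

```
! PE1 (Regional Site 1); PE2 uses VRF RED-SPOKE2 with rd 65000:2
ip vrf RED-SPOKE1
 rd 65000:1
 route-target export 65000:2
 route-target import 65000:1
!
! PE3 (Central Site): collects all spoke routes
ip vrf HUB-IMPORT
 rd 65000:3
 route-target import 65000:2
!
! PE3 (Central Site): re-advertises routes towards the spokes
ip vrf HUB-EXPORT
 rd 65000:4
 route-target export 65000:1
```

Since the spokes import only RT 65000:1, every route they learn has been exported from HUB-EXPORT, so spoke-to-spoke traffic is forced through the central site.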
Figure 24 Hub and Spoke Model – Connectivity between Spokes via Hub
7 A solution using a single VRF for the hub site could be implemented, but we do not recommend it for the following reason. The single-VRF solution requires that the hub site provide the default route for the spoke sites. The hub VRF imports all the routes from the spoke VRFs and exports only the default route. However, this requires policy routing on the hub CE router (to force the traffic received from a spoke CE across the firewall), which is not a clean design approach.
[Figure 24: Regional Site 1 (CE1, PE1) uses VRF RED-SPOKE1 (RD 65000:1) and Regional Site 2 (CE2, PE2) uses VRF RED-SPOKE2 (RD 65000:2), both with RT export 65000:2 and RT import 65000:1. The Central Site connects through CE3 and CE4 to PE3, which holds VRF HUB-IMPORT (RD 65000:3, RT import 65000:2) and VRF HUB-EXPORT (RD 65000:4, RT export 65000:1).]
Note: Multi-homed Hub site. During NRFU testing the Hub site was multi-homed to another PE router. This protects against the failure of a single PE router. Looking at Figure 24, CE3 and CE4 can have another physical connection to PE4, which is configured with VRFs HUB-IMPORT-BACKUP and HUB-EXPORT-BACKUP. A primary-backup routing policy can be achieved by tuning the MED/LOCAL_PREF attributes.
In the case of eBGP being used as a CE to PE routing protocol on the hub site (see the section on CE to PE routing for more details), the “allowas-in” option needs to be used in the eBGP neighbour statement on the hub PE. This is needed because otherwise, routing updates entering the PE from the hub site would be dropped due to the presence of the MIPNET ASN in the AS_PATH list. The number of allowed occurrences of the MIPNET ASN should be set to 1.
Furthermore, if eBGP is also in use between the spoke CEs and PEs, the “as-override” option needs to be used in the eBGP neighbour statement on the hub PE. This will overwrite the spoke site ASN (which would be identical to the hub site ASN, since both sites belong to the same customer), with the MIPNET ASN, thus preventing the hub CE from dropping the BGP update.
Inter-VPN (Extranet)
Inter-VPN services can be created by exporting or importing one or more addresses in a VRF with multiple RTs.
A whole VRF can be configured to be part of more than one VPN by configuring multiple route-target import or route-target export statements. If multiple route-target export statements are configured, then all routes originated by the VRF will be learnt by all other VRFs in the MPLS network that import any of the RTs exported. If multiple route-target import statements are configured on a VRF, then the VRF will learn routes from any other VRF that exports one of the RTs configured.
If only specific hosts in a VRF should be exported to multiple VPNs then an export map can be configured in the VRF. An export map is a route-map that matches specific criteria (typically by matching addresses defined by an access list), and then sets specific attributes (typically the set extcommunity rt command is used to define the list of RTs the route should be exported with).
Import maps can be used on a VRF if it is necessary to control which addresses are imported into the VRF. This is typically used as a security mechanism to stop unwanted routes from being learned. An import map is a route-map that permits or denies routes based on how they match specific criteria (typically by matching addresses defined by an access list).
A common example of an inter-VPN service is the management VPN. Sites with managed CEs need access to their own VPNs but also need to advertise some routes to the management VPN. In this case the VRF will have two or more route-target import statements, one or more route-target export statements and an export map. The import statements define which RTs to import for the standard VPNs that the VRF is part of, with an additional import statement for the routes advertised by the management VPN. The export statements define the standard VPNs that the whole VRF should be advertised to, and the export map manipulates specific routes, such as the CE loopback addresses, that should be exported with the RT that is imported by the management VPN. The management VPN is a hub-and-spoke VPN (i.e. different RTs are imported and exported), so that there is no unwanted routing between different customers: all managed VRFs export one RT, which is imported into the management VRF, and the management VRF exports a different RT, which is imported into all managed VRFs. Import maps can be configured on the management VRF so that only permitted routes are learned by the management network.
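The management VPN arrangement described above can be sketched as follows. All names and numbers here (CUSTOMER-A, MANAGEMENT, MGMT-EXPORT, access list 10 and the RT values 65000:10, 65000:900 and 65000:901) are hypothetical and chosen only for illustration; they are not Mipnet values.

```
! Managed customer VRF
ip vrf CUSTOMER-A
 rd 65000:10
 ! the customer's own VPN
 route-target export 65000:10
 route-target import 65000:10
 ! routes advertised by the management VPN
 route-target import 65000:901
 ! tag selected routes for the management VPN
 export map MGMT-EXPORT
!
access-list 10 permit <CE loopback address>
!
route-map MGMT-EXPORT permit 10
 match ip address 10
 ! "additive" appends the management RT to the RTs already set
 set extcommunity rt 65000:900 additive
!
! Management VRF (hub of the hub-and-spoke management VPN)
ip vrf MANAGEMENT
 rd 65000:900
 route-target export 65000:901
 route-target import 65000:900
```

Only the CE loopback matched by the access list carries RT 65000:900, so the management VRF learns nothing else from the customer VRF, and two customer VRFs never import each other's routes through the management VPN.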
8.1.5 MPLS/VPN Access Layer
Important:
Any provisioning of IP addresses, QoS features and routing protocol between CE and PE routers must be handled by ISC. Manual configuration will not be synchronized with ISC inventory database.
The configuration examples in the following chapters are intended to illustrate the implementation of the various connectivity models.
Addressing between VPN-PE and CE
TMN will allocate a registered (public) address block for numbering of the PE-CE links.
Why do we need public IP addresses for numbering of PE-CE connections?
Mipnet customers may already use the private address blocks and these may overlap with access PE-CE subnet that TMN selected. This would break end-to-end connectivity within that MPLS/VPN. Imagine host1 in site1 that wants to reach host2 in site2, and the IP address of host2 is the same as the CE1 access link. Host1 would then talk to CE1 instead of host2.
CE-PE Connectivity Scenarios
The figure below depicts the possible connectivity scenarios (logical connections) between an MPLS/VPN customer site and the Mipnet.
The simplest connectivity model, Option 1, connects the customer site using a single CE router and a single CE-PE link. When, for example, high-speed media is not available, Option 2 with multiple parallel links between a single PE and a single CE may be used. Neither Option 1 nor Option 2 offers any resiliency mechanism.
Options 3-5 are more appropriate for customers’ sites with high-availability requirements. Option 5 is the most resilient connectivity model, with two CE routers and two PE routers interconnected via primary and backup link.
Figure 25 Access-layer Topologies
Routing protocols between PE and CE
It is possible to use different routing protocols between the PE and CE. The choice is made on a link-by-link basis: for example, two interfaces on the same PE can be part of the same VRF while one uses eBGP as its routing protocol and the other uses static routes. The strengths and weaknesses of each routing protocol are highlighted in the sections below.
Static Routes
Static routes are simple, stable and do not require a great deal of router resources. Default routes can often be configured so that individual static routes are not required for each routable subnet. Following is the PE configuration template for static routing between PE and CE. On CE side, the static default route will point across the CE-PE link.
!
router bgp 29453
 address-family ipv4 vrf <VRF name>
  redistribute connected
  redistribute static [Redistribution options]
  no auto-summary
  no synchronization
 exit-address-family
!
ip route vrf <VRF name> <address> <mask> <destination> [Options]
!
Static routes do, however, have the drawbacks that they are not dynamic and require additional configuration every time a route changes. This can be a considerable administrative overhead in rapidly changing networks or when default routes are not possible. If default routes are not possible on the CE then every route configured on a PE towards a CE will also need to be configured on every other CE in the VPN.
Default routes are sometimes not possible if the customer is using them elsewhere within their network, for example if a customer uses Mipnet to provide an internal VPN but uses another service provider (or Mipnet) to provide Internet access via a separate circuit or from another CE router. Default routes also have the drawback that packets destined to unreachable addresses are routed through the network all the way to the last router that has a default route before they are rejected. In a dynamic routing environment such a packet would be dropped at the first hop, thus consuming fewer network resources.
Routing Stability
When the CE-PE interface goes down, all routes associated with that interface are removed from the routing table and a BGP withdraw message is sent to all neighbors. To improve routing stability, the keyword “permanent” can be appended to the static route configuration statement. This causes the static route to remain in the routing (and BGP) table regardless of the status of the CE-PE link. Of course, this solution only makes sense for single-homed sites with a single CE-PE link.
In conclusion, static routing is particularly well suited for sites mono-attached to the service provider backbone and with a simple and stable routing scheme.
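For illustration, the “permanent” keyword discussed under Routing Stability would be appended to the static route of the PE template as sketched below. The placeholders follow that template; <next-hop> is a hypothetical placeholder for the CE address on the PE-CE link.

```
! The route stays in the VRF routing table even when the PE-CE link is down
ip route vrf <VRF name> <address> <mask> <next-hop> permanent
```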
eBGP
eBGP is the most appropriate dynamic routing protocol for PE-CE links in MPLS/VPN environments. It provides extensive routing policy features, naturally prevents routing loops and doesn’t require route redistribution on the PE. These features make eBGP the only option when sites are multi-homed.
BGP does have the limitation that the CE requires an image capable of supporting the protocol. This can mean that a more expensive IOS image, even a larger routing platform, or more memory may be required.
Some new service providers consider BGP to be overly complicated and heavy on network resources because they associate it with routing on the Internet, and therefore are apprehensive about using it on PE-CE links. These fears are not valid in this environment. On a PE-CE link there is usually only one neighbour, and in most cases the extensive route selection and manipulation features available with BGP are not used. In a stable network BGP does not require a great deal of system and network resources, and it does not require the frequent periodic flooding of routes associated with RIPv2.
Route redistribution is required between the customer’s IGP and BGP on the CE rather than on the PE as with other routing protocols. This is beneficial to the service provider because it means that fewer routing protocols are required on the PE.
BGP requires that an AS number is configured on the CE. Private AS numbers (64512 to 65535) can be used so that it is not necessary to obtain public AS numbers. The same AS number can be used for all the sites of a VPN to conserve AS numbers; this also allows for VPNs that have more than 1024 sites, which is the size of the private AS range. AS-override is used on the PEs in order to reuse the same AS number for all the sites.
In order to prevent loops for multi-homed sites, a BGP extended community, Site-Of-Origin (SOO), is used to identify each site. SOO has the same format as the RD or the RT (X:Y) and uses 64 bits. Each multi-homed site is assigned a unique SOO.
For security reasons, it is good design practice to authenticate routing updates in order to avoid attacks through the spoofing of routing protocols. It is therefore recommended to enable the cryptographic MD5 authentication between eBGP peers. Note that the authentication needs to be enabled under the appropriate IPv4 VRF address family.
Dynamic protocols do however use more system resources and open service providers to dangers caused by customer network instability or poor configuration. Two safeguards will be implemented to protect the TMN network infrastructure against ill-behaving customers.
The total number of eBGP learned routes from any CE will be limited to <100>. A syslog warning message will be generated at a threshold of 75 % of the maximum value. In the event that the CE sends more prefixes than the configured maximum, the eBGP neighbourship will be dropped.
The total number of routes in any VRF will be limited to <1000>. In the event the total number of routes in any VRF reaches a threshold of 750, a syslog warning message will be generated. When the total number of routes in the VRF reaches the maximum value, the VRF will stop accepting new routes. It is important to understand that this maximum applies to the total number of routes in the VRF, whether they have been locally accepted from CEs or remotely through MP-iBGP.
The following design rules apply to the eBGP session between the PE and CE in Mipnet:
1. Private AS numbers shall be used for the BGP session on the CE Router
2. as-override will be used to conserve the private AS numbers.
3. Each multi-homed customer site belonging to a VPN will be assigned a unique SOO attribute to detect and prevent loops.
4. All the eBGP timers will be left to their default value in this phase of the project.
5. MD5 passwords will be used to authenticate eBGP session between VPN-PE and CE. The password will be unique for each VPN.
6. Peer-Groups can be used on the VPN-PE routers for eBGP sessions with CE Routers that belong to the same VPN.
In conclusion, BGP is the recommended routing protocol on PE-CE links and is the only option for multi-homed sites. BGP should be used whenever a dynamic routing protocol is required (many routes or frequent changes in IP addressing of customer sites) and the CE is capable of supporting it.
The following IOS commands show a sample configuration of a PE router that uses BGP as the routing protocol with the CE. This configuration also uses as-override and SOO. The SOO is set to 29453:4.
!
hostname PE
!
ip vrf VRF1
 ! limit of 1000 routes; the warning threshold is given as a percentage (75% = 750 routes)
 maximum routes 1000 75
!
router bgp 29453
 address-family ipv4 vrf VRF1
  redistribute connected
  neighbor CE remote-as 65001
  neighbor CE password vrf1
  neighbor CE update-source Serial3/0:0
  neighbor CE activate
  neighbor CE send-community
  ! session is dropped above 100 prefixes; the warning threshold defaults to 75%
  neighbor CE maximum-prefix 100
  neighbor CE as-override
  neighbor CE route-map VRF1-SOO in
  no auto-summary
  no synchronization
 exit-address-family
!
route-map VRF1-SOO permit 10
 set extcommunity soo 29453:4
!
BGP load-balancing across parallel links between PE and CE router
This configuration shows the implementation of load-balancing on parallel PE-CE links when using BGP. Please note that destination-based CEF load-sharing shall be used; it is turned on by default. Private IP addresses can be used for the VRF loopback addresses where they do not conflict with the customer addressing scheme.
hostname CE
!
interface Loopback0
 description eBGP session with PE
 ip address 10.20.20.2 255.255.255.255
!
interface Ethernet 0
 description LAN interface
 ip address 10.10.10.1 255.255.255.0
!
router bgp <CustomerAS>
 network 10.10.10.0 mask 255.255.255.0
 neighbor 10.20.20.1 remote-as 29453
 ! the peers are loopbacks, so the eBGP session is not directly connected
 neighbor 10.20.20.1 ebgp-multihop 2
 neighbor 10.20.20.1 update-source Loopback0
 no synchronization
 no auto-summary
!
! 10.20.20.1 is the PE VRF loopback; one static route per parallel link
ip route 10.20.20.1 255.255.255.255 <PE-serial1>
ip route 10.20.20.1 255.255.255.255 <PE-serial2>
hostname PE
!
interface Loopback<VRF_loopback>
 ip vrf forwarding red
 ip address 10.20.20.1 255.255.255.255
!
router bgp 29453
 !
 address-family ipv4 vrf red
  redistribute static
  neighbor 10.20.20.2 remote-as <CustomerAS>
  neighbor 10.20.20.2 ebgp-multihop 2
  ! source the session from the VRF loopback defined above
  neighbor 10.20.20.2 update-source Loopback<VRF_loopback>
  neighbor 10.20.20.2 activate
  neighbor 10.20.20.2 as-override
  no auto-summary
  no synchronization
 exit-address-family
!
! 10.20.20.2 is the CE Loopback IP address
! used for eBGP Peering with PE router
!
ip route vrf red 10.20.20.2 255.255.255.255 <CE-serial1>
ip route vrf red 10.20.20.2 255.255.255.255 <CE-serial2>
RIPv2
RIP v2 will be used as the routing protocol on the CEs that do not support BGP. RIPv2 does not provide the loop avoidance capabilities of BGP so RIP v2 should not be used in multi-homed sites. RIPv2 also can’t limit the number of prefixes received from neighbours.
Passwords will be used to authenticate RIP v2 sessions between VPN-PE and CE. The password will be unique for each VPN.
For small sites (sites that do not require the complete routing table of the VPN) and stub sites (sites that have only one network behind the CE router), RIP updates from the VPN-PE to the CE will be suppressed. The PE access interfaces (CE-PE links) will be configured with passive RIPv2 interfaces (PE will not send any RIP update to the CE). Default route pointing to the VPN-PE will be added to the routing table on the CE.
RIPv2 requires an address family to be created within RIP v2 and also requires redistribution of routes between it and the corresponding BGP address family.
hostname PE
! key chain config must be the same on PE and CE
!
key chain <VRF name>
 key 1
  key-string 234
!
interface Serial2
 ip vrf forwarding <VRF name>
 ip address <PE_CE_subnet>
 ip rip authentication mode md5
 ip rip authentication key-chain <VRF name>
!
router rip
 version 2
 !
 address-family ipv4 vrf <VRF name>
  version 2
  ! PE does not send RIP updates to the CE
  passive-interface Serial2
  network <PE_CE_subnet>
  no auto-summary
 exit-address-family
!
router bgp 29453
 address-family ipv4 vrf <VRF name>
  redistribute rip [Redistribution options]
  no auto-summary
  no synchronization
 exit-address-family
!
hostname CE
! key chain config must be the same on PE and CE
!
key chain <VRF name>
 key 1
  key-string 234
!
interface Ethernet0
 ip address <CE_LAN>
!
interface Serial0
 ip address <PE_CE_subnet>
 ip rip authentication mode md5
 ip rip authentication key-chain <VRF name>
!
router rip
 version 2
 network <PE_CE_subnet>
 network <CE_LAN>
 no auto-summary
!
ip route 0.0.0.0 0.0.0.0 Serial0
In conclusion, RIP v2 is suitable for mono attached sites with frequent routing changes and for routers that do not support BGP.
OSPF
Customers often request OSPF on PE-CE links because they use it as their IGP and expect to benefit from using it across their whole network. It is, however, not suitable for an MPLS VPN environment. OSPF has the severe drawback that each instance of an OSPF address family on a PE counts as an additional routing process. Routers are able to support a theoretical maximum of 31 routing processes in total, and this includes connected routes, static routes and the BGP and IGP processes required to provide the MPLS environment. According to Cisco best practice, it is not recommended to implement more than 3 OSPF routing processes on a single router. Configuring additional routing processes requires additional system resources. OSPF therefore doesn’t offer a scalable solution for service providers, as it greatly limits the number of VRFs supported per PE.
Customers assume that because they are using OSPF on both sides of their VPN that they have a single OSPF routing environment. This is not the case because in an MPLS VPN environment you still require redistribution between OSPF and BGP on the PE and only BGP is carried across the core. OSPF does not have the loop avoidance capabilities of BGP in an MPLS VPN environment and is therefore not suitable for multi-homed sites. OSPF therefore has no benefits over BGP on the PE-CE link.
If OSPF is used as the customer’s IGP it is strongly advisable to redistribute it to BGP on the CE router. If BGP is not supported on the CE then redistribution into RIP v2 is preferable to using OSPF on the PE-CE link for all but a small number of customers because it will not limit the scalability of the service provider network.
In conclusion, although it is technically possible to support a limited number of customers with OSPF on their PE-CE links, this should not be implemented if it can be avoided.
Operation of OSPF on PE-CE link
Traditionally, an elaborate OSPF network consists of a backbone area (area 0) and a number of areas connected to this backbone by an area border router (ABR). By using an MPLS VPN backbone with OSPF on the customer's site, we introduce a third level in the hierarchy of the OSPF model. This third level is called the MPLS VPN super backbone.
The MPLS VPN super backbone enables customers to use multiple area 0 backbones on their sites. Each site can have a separate area 0 as long as it is connected to the MPLS VPN super backbone. The result is the same as a partitioned area 0 backbone. In this case, the PE routers act as ABR and ASBR routers, and the CE routers act as ABR routers. The LSA information for the VPN is transported between PEs in BGP extended communities. Between PEs and CEs, the information is carried in summary network (type 3) LSAs.
Each VPN must have its own OSPF process.
The following example is provided to illustrate the implementation of OSPF on PE-CE links (again: this is not the recommended design) in a simplified manner.
Figure 26 OSPF on PE-CE link
[Figure 26: network 10.0.10.0/24 lies in Area 1 behind CE1, and Area 2 lies behind CE2. CE1 and CE2 connect in Area 0 to PE1 and PE2 of the Telenergo MPLS/VPN network. The figure traces the route in five steps:
1. Type-2 Network LSA, LSA-ID: DR/10.0.10.0, Adv. Router: x.x.x.x
2. Type-3 Summary LSA, LSA-ID: 10.0.10.0, Adv. Router: CE1, Metric: X
3. VPNv4 MP-BGP update, 10.0.10.0/24, Next-hop: PE1, RT: 20804:2, MED: X+Y, Ext. Community: <area> <LSA_type> <OSPF_rtr_ID> <ospf_proc_ID>
4. Type-3 Summary LSA, LSA-ID: 10.0.10.0, Adv. Router: PE2, Metric: X+Y
5. Type-3 Summary LSA, LSA-ID: 10.0.10.0, Adv. Router: CE2, Metric: X+Y+Z]

hostname PE1
!
ip vrf vpn1
 rd 6855:2
 route-target export 6855:2
 route-target import 6855:2
!
! we need loopback2 to fix OSPF RTR_ID during PE booting sequence
!
interface Loopback2
 ip vrf forwarding vpn1
 ip address <loopback2>
!
interface Serial1/1
 description PE-CE link
 ip ospf cost Y
 ip vrf forwarding vpn1
 ip address <PE_CE_subnet_vpn1>
!
router ospf 2 vrf vpn1
 log-adjacency-changes
 passive-interface Loopback2
 ! the AS number must match the local BGP process below
 redistribute bgp 6855 subnets
 network <PE_CE_subnet_vpn1> area 0
!
router bgp 6855
 !
 address-family ipv4 vrf vpn1
  redistribute ospf 2
  no auto-summary
  no synchronization
 exit-address-family
!
hostname CE1!interface Serial0 description PE-CE link ip address <PE_CE_subnet_vpn1>!interface Ethernet0 description LAN segment ip ospf cost X ip address 10.0.10.1 255.255.255.0!router ospf 2 log-adjacency-changes network <PE_CE_subnet_vpn1> area 0 network 10.0.10.0 0.0.0.255 area 1!
Multi-Homed Sites
A VPN site can have multiple connections to the MPLS backbone. There are several possibilities for multi-homed sites with connections to the MPLS backbone (Figure 25).
Single CE in Customer Site
One CE-PE link will be designated as the Primary link and will carry all the traffic between CE and PE (and vice versa). The other link will be the Backup link and will forward traffic between the CE and PE (and vice versa) only when the Primary link fails.
Primary and Secondary links can be implemented using:
MED for traffic from PE to CE, if eBGP is the routing protocol between PE and CE
Local preference for traffic from CE to PE, if eBGP is the routing protocol between PE and CE
RIP v2 shall not be used as the Routing Protocol between PE and CE if the site is multi-homed.
The following IOS commands show how this can be done for BGP on the CE for traffic from PE to CE (using MED) and from CE to PE (using local preference). The configuration of routing policy is completely offloaded to the CE router, i.e. the CE signals the preferred entry point into the customer site via MED. This is desirable because it simplifies the PE configuration.
hostname CE
!
interface Ethernet 0
 description LAN interface
 ip address 10.10.10.1 255.255.255.0
!
router bgp <CustomerAS>
 network 10.10.10.0 mask 255.255.255.0
 ! Primary link
 neighbor <VPNv4-link-on-PE1> remote-as 29453
 neighbor <VPNv4-link-on-PE1> route-map med-primary out
 neighbor <VPNv4-link-on-PE1> route-map localpref-primary in
 ! Secondary Link
 neighbor <VPNv4-link-on-PE2> remote-as 29453
 neighbor <VPNv4-link-on-PE2> route-map med-secondary out
 neighbor <VPNv4-link-on-PE2> route-map localpref-secondary in
 no synchronization
 no auto-summary
!
route-map med-primary permit 5
 set metric 100
!
route-map localpref-primary permit 5
 set local-preference 100
!
route-map med-secondary permit 5
 set metric 110
!
route-map localpref-secondary permit 5
 set local-preference 90
hostname PE1
! Configuration template for PE2 is identical
!
router bgp 29453
 address-family ipv4 vrf <CustomerVRF>
 redistribute connected
 neighbor <VPNv4-link-on-CE> remote-as <CustomerAS>
 neighbor <VPNv4-link-on-CE> password <BGP_PWD>
 neighbor <VPNv4-link-on-CE> update-source <PE-CE_interface_name>
 neighbor <VPNv4-link-on-CE> activate
 neighbor <VPNv4-link-on-CE> send-community
 neighbor <VPNv4-link-on-CE> maximum-prefix 100
 neighbor <VPNv4-link-on-CE> as-override
 neighbor <VPNv4-link-on-CE> route-map <CustomerVRF>-SOO in
 no auto-summary
 no synchronization
 exit-address-family
!
route-map <CustomerVRF>-SOO permit 10
 set extcommunity soo 29453:<SOO>
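The primary/backup behaviour produced by the route-maps above can be sketched in a few lines of Python. This is a simplified illustration, not the full BGP decision process: it compares only local preference and MED, and the `Path` class, the `best_path` helper and the assumed default local preference of 100 on the provider side are introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    via: str         # link or neighbor the path was learned over
    local_pref: int  # higher value is preferred
    med: int         # lower value is preferred, compared after local preference


def best_path(paths):
    """Return the preferred path: highest local preference, then lowest MED."""
    return min(paths, key=lambda p: (-p.local_pref, p.med))


# CE view: routes received from the PEs, tagged by the inbound route-maps.
ce_view = [Path("PE1", local_pref=100, med=0), Path("PE2", local_pref=90, med=0)]

# Provider view: the customer prefix as announced by the CE over each link,
# tagged by the outbound route-maps (local preference left at an assumed 100).
pe_view = [Path("PE1", local_pref=100, med=100), Path("PE2", local_pref=100, med=110)]

print(best_path(ce_view).via)  # PE1
print(best_path(pe_view).via)  # PE1

# Primary link failure: only the secondary path remains and is selected.
print(best_path([p for p in ce_view if p.via != "PE1"]).via)  # PE2
```

In both directions the link to PE1 wins while it is available, and the link to PE2 takes over only when the primary path disappears.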
Multiple CEs in Customer Site
When more than one CE connects the customer site to Mipnet, one CE is the Primary CE and forwards all the traffic between the CE and the VPN-PE. The other CE is the Backup CE and starts forwarding traffic between the CE and the VPN-PE only when the Primary CE fails. The following picture depicts this scenario.
Figure 27 Fully Redundant Access Scenario (2CEs-2PEs)
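[Figure 27: CE1 and CE2 in the customer site connect to PE1 and PE2 in Mipnet over a primary and a backup PE-CE link. eBGP runs between CE and PE, and iBGP between CE1 and CE2. The CEs share an HSRP address (one active, one standby), and LAN hosts use "ip route 0.0.0.0/0 HSRP_address". Routing update labels in the drawing: 0.0.0.0/0 with LOC_PREF = 150 and Cust_prefix with MED=50; 0.0.0.0/0 with LOC_PREF = 50 and Cust_prefix with MED=150.]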
For traffic going from VPN-PE to CE, MED can be used on the CEs to force the traffic on to the primary CE. BGP is recommended as the routing protocol between PE and CE for this case, because customer routing policy (primary-backup) is configured on CE router (no route-maps are needed on PE side, which simplifies PE router configuration).
For traffic from CE to VPN-PE, the Primary and Backup CE roles are implemented using:
HSRP for traffic originating from the LAN interface of the CE. The Primary CE is in active state and the Secondary CE will be in standby state.
Redistributing routes learnt from the VPN-PE (via BGP) into the IGP with different metrics for traffic originating from the Customer network that is located behind the CE Router. The primary CE will redistribute with a better metric than the Secondary CE.
Internet Access for MPLS/VPN customers
Overview
There are two basic design models for combining Internet Access with MPLS / VPN services.
Internet access is offered through global routing on the PE routers. There are 2 implementation options.
o The first option is to implement a packet-leaking shortcut between a VRF and the global routing table. This option has a number of drawbacks and must be avoided.
o The second option is to use separate physical or logical interfaces for VPN and for Internet access. The physical or logical interface meant for Internet access is placed in the global routing table. Ideally, the Internet interface (also called the IPv4 link) is implemented on a separate CE router, which makes it possible to place the FW in the customer site.
Internet access is offered through yet another VPN. This is called the Internet VPN (and associated Internet VRF). This solution has the advantage that the provider’s backbone is isolated from the Internet, resulting in improved security. A drawback is that full Internet routing cannot be implemented because of scalability problems.
The Internet-in-a-VPN approach has been requested by TMN and will be implemented in Mipnet. The Internet Transport chapter explains the implementation details.
Two CEs – Two Physical Links
From the point of view of the VPN customer, the “separate CE” design model maps ideally onto the situation where the VPN customer wants centralised and firewalled access to the Internet. The customer-managed firewall can provide NAT services between the private VPN addressing and the public Internet addressing. The central customer site firewall gives the customer the ability to control security and Internet service policies. A drawback is that all the Internet traffic must flow through a central site, which could be problematic for pan-European or worldwide VPNs (RTTs).
For example, a large bank with hundreds of branches would not want to implement Internet access directly from each of the branches, as this would imply management of strict security policies at every site (difficult and expensive). The centralised FW approach with two CE routers is a more appropriate solution.
Figure 28 Internet Access from a VPN using separate CEs and two physical links
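[Figure 28: regional sites (CE1, VPN hosts) reach the MPLS network via PE1 (labelled tkc_pe_1). At the central site, CE1 connects to PE3 over the VRF_RED interface (VPNv4) and CE2 connects to PE3 over the VRF_INET interface (VPNv4), with the FW and DMZ between the two CEs. A default route is injected into the VPN, and the data forwarding path from regional sites to the Internet passes through the central site.]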
Note that default static routes are injected into the VPN and used by regional sites, but the default route cannot be used for VPN traffic at the central site. In the drawing above, CE2 is configured with a default route pointing to PE3 via the IPv4 interface. For this reason, CE1 (and CE2 and the FW) must have all the VPN routes in their routing tables.
The central site shall learn the VPN routes dynamically with BGP4 or RIPv2 between CE1 and PE3. This is the recommended approach as it allows greater flexibility and redundancy. For example, the customer may want to implement two VPN CEs in the central site to improve service availability.
If the number of regional prefixes is small, or if all regional prefixes can be summarized in a single aggregate route, a static route can be implemented from CE1 to PE3 for VPN traffic. Static routing shall be the preferred option for the FW.
Multi-VRF CE (Single Physical Link)
The cost-effective solution described in this chapter is based on a Multi-VRF CE (also called VRF-Lite). The advantage of the Multi-VRF design is that only a single physical connection and a single CE router are required to implement Internet access from the MPLS/VPN in a relatively secure manner.
Multi-VRF Overview
In essence, Multi-VRF extends the PE functionality up to the CE router, without the need for MP-iBGP or MPLS on the CE-PE link. The Multi-VRF CE architecture uses the VRF concept to support multiple (overlapping and independent) routing and forwarding tables per customer; in our case, a VPN and an Internet routing table. Multi-VRF is not a feature but an application based on the VRF implementation.
Multi-VRF CE could be used to interconnect several departments of a customer network with a single CE router, as displayed in the following two drawings. The first picture shows a classical setup, where several CE routers are needed to implement separate VPNs for the Engineering, Finance and HR departments.
Figure 29 Multiple CEs
[Figure 29: at Site 1, the Engineering, HR and Finance departments each use a separate CE router connected to the PE router of the MPLS network.]
In Figure 30, the CE router using Multi-VRF can segment its LAN traffic by placing each client or organization, with its own IP address space, either on a separate Ethernet interface (such as Client 5) or on one FastEthernet interface segmented into multiple sub-interfaces. Each sub-interface has its own IP address space to separate the different clients.
When it receives an outbound customer data packet from a directly attached interface, the CE router performs a route lookup in the VRF that is associated with that site. The specific VRF is determined by the interface or sub-interface over which the data packet is received. Support for multiple forwarding tables makes it easy for the CE router to provide per-VPN segregation of routing information before it is sent to the PE router. The use of an E1 line with multiple point-to-point sub-interfaces allows traffic from the CE router to the PE router to be segmented into each separate VRF.
In this model, the CE router associates a specific VRF with the clients connected to its interfaces and exchanges that information with the PE router. Any routing protocol that is supported by normal VRF can be used in a Multi-VRF CE implementation.
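The per-interface VRF selection described above can be illustrated with a small Python model. It is a sketch under stated assumptions: the interface names, VRF names and prefixes are hypothetical, and the longest-prefix match is done naively over a dictionary rather than a real forwarding table.

```python
import ipaddress

# Hypothetical bindings, equivalent in spirit to "ip vrf forwarding" per sub-interface.
INTERFACE_VRF = {"Fa0/0.1": "GREEN", "Fa0/0.2": "BLUE"}

# One routing table per VRF; the same prefix may exist in several VRFs.
VRF_ROUTES = {
    "GREEN": {"10.1.0.0/24": "Serial0.1", "0.0.0.0/0": "Serial0.1"},
    "BLUE": {"10.1.0.0/24": "Serial0.2"},
}


def lookup(in_interface, destination):
    """Select the VRF from the receiving interface, then do a longest-prefix match."""
    vrf = INTERFACE_VRF[in_interface]
    address = ipaddress.ip_address(destination)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in VRF_ROUTES[vrf]
        if address in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return vrf, None  # no route in this VRF: the packet is dropped
    best = max(matches, key=lambda network: network.prefixlen)
    return vrf, VRF_ROUTES[vrf][str(best)]


print(lookup("Fa0/0.1", "10.1.0.5"))  # ('GREEN', 'Serial0.1')
print(lookup("Fa0/0.2", "10.1.0.5"))  # ('BLUE', 'Serial0.2')
```

The same destination address resolves to different outgoing sub-interfaces depending only on where the packet was received, which is what allows overlapping address spaces on one CE.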
Figure 30 Multi-VRF CE
[Figure 30: Clients 1 to 4 (10.1/24, 11.1/24, 12.1/24 and 13.1/24) and Client 5 (10.1/24) attach to the CE-VRF router, which connects to the PE of the MPLS network over one E1 line with multiple point-to-point sub-interfaces.]
1. CE-VRF learns Client 1’s VPN Green routes from a sub-interface of the Fast Ethernet interface directly attached to CE-VRF. CE-VRF then installs these routes into VRF Green.
2. PE 1 learns Client 1’s VPN Green routes from the CE-VRF and installs them into VRF Green.
Local VPN Blue routes from Client 4 are not associated with VPN Green and are not imported into VRF Green.
Multi-VRF CE for Internet Access.
Figure 31 depicts the Internet access scenario with a single Multi-VRF CE router and a single physical link. Such a design might be the preferred option for small businesses, where infrastructure expenses very often need to be minimized.
The previously described 2CE-2Links-FW solution is comparable with the Multi-VRF CE setup below, because the traffic flows between the Business VPN and the Internet are forced to traverse the FW. The FW implements company security policies and NAT, when private addressing is used in the Business VPN. The fundamental security characteristic of the 2CE-2Links-FW solution is that the Business VPN domain, including the Business VPN CE router in the central site, is totally separated from the Internet domain by a FW. If, for example, an intruder in the Internet gains access to the Internet CE, the operation of the Business VPN shall not be affected (see footnote 8).
By contrast, if a Multi-VRF CE is hacked, or flooded by a DoS storm from the Internet, the implications for the serviceability and security of the Business VPN may be severe, and extremely expensive compared to the price of the 2CE-2Links design.
Figure 31 Internet Access from a VPN – Multi-VRF CE
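Footnote 8: This depends on FW implementation and security rules.
[Figure 31: regional sites (CE1, VPN hosts) reach the MPLS network via PE1 (labelled tkc_pe_1). At the central site, a single Multi-VRF CE connects to PE3 over one FR/dot1Q link carrying two logical connections, the VRF_RED interface (VPNv4) and the VRF_INET interface (VPNv4); the FW and DMZ are attached to the Multi-VRF CE. A default route is injected into the VPN, and the data forwarding path from regional sites to the Internet passes through the central site.]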
The following configuration example shows how the Multi-VRF CE and PE are configured for Internet access. The example below is only meant to illustrate Multi-VRF traffic flows; Multi-VRF in the production network will be provisioned by ISC.
Ethernet is assumed as the layer-2 medium between the central site and the PE. The layer-3 design would be the same if the layer-2 medium were FR or ATM. Static routing can be used for the Business and Internet VPNs, because all sites are single-homed and the number of VPN routes is relatively small. Public servers are attached to the DMZ interface on the FW.
Although RED_VPN is configured in an any-to-any topology, only the central site originates a default route towards the remote sites of RED_VPN (hub-and-spoke topology for Internet access). Spoke CEs are connected via a single link to the PE routers, and a default route can be used for CE->PE routing.
hostname PE3
!
ip vrf RED
 rd 1:1
 route-target export 1:1
 route-target import 1:1
!
ip vrf INET
 rd 1:2
 route-target export 1:2
 route-target import 1:2
!
interface Ethernet0/0.1
 encapsulation dot1q 1
 ip vrf forwarding RED
 ip address <pe_multivrf_red>
!
interface Ethernet0/0.2
 encapsulation dot1q 2
 ip vrf forwarding INET
 ip address <pe_multivrf_inet>
!
router bgp 29453
 !
 address-family ipv4 vrf RED
 redistribute static
 redistribute connected
 no auto-summary
 no synchronization
 network 0.0.0.0
 exit-address-family
 !
 address-family ipv4 vrf INET
 redistribute static
 no auto-summary
 no synchronization
 exit-address-family
!
! Routes injected in VPN RED by Central_PE
!
ip route vrf RED <0/0> <ce_multivrf_red>
ip route vrf RED <central_site_red_subnet> <ce_multivrf_red>
!
! DMZ injected in VPN INET by Central_PE
!
ip route vrf INET <central_site_dmz_subnet> <ce_multivrf_inet>
!
! Additionally, all VPN RED routes need to be redistributed into INET VPN if
! NAT is not configured on FW
!
!ip route vrf INET <hub_subnet> <ce_multivrf_inet>
!ip route vrf INET <spoke1_subnet> <ce_multivrf_inet>
!ip route vrf INET <spoke2_subnet> <ce_multivrf_inet>
!ip route vrf INET <spoke3_subnet> <ce_multivrf_inet>
!ip route vrf INET <spoke4_subnet> <ce_multivrf_inet>
!...
hostname MULTI-VRF-CE
!
ip vrf RED
 rd 1:1
 route-target export 1:1
 route-target import 1:1
!
ip vrf INET
 rd 1:2
 route-target export 1:2
 route-target import 1:2
!
! Ethernet 0/0 -> CE-PE links
! Ethernet 1/0 -> CE-FW links
!
interface Ethernet0/0.1
 encapsulation dot1q 1
 ip vrf forwarding RED
 ip address <ce_multivrf_red>
!
interface Ethernet0/0.2
 encapsulation dot1q 2
 ip vrf forwarding INET
 ip address <ce_multivrf_inet>
!
interface Ethernet1/0.1
 encapsulation dot1q 1
 ip vrf forwarding RED
 ip address <ce_fw_red>
!
interface Ethernet1/0.2
 encapsulation dot1q 2
 ip vrf forwarding INET
 ip address <ce_fw_inet>
!
! Packets received from PE via INET VRF must be directed towards FW
!
ip route vrf INET <central_site_dmz_subnet> <fw_fw_inet>
!
! Without NAT, routes for all SPOKE sites (packets received from PE via INET VRF
! must be directed towards FW)
!
!ip route vrf INET <spoke1_subnet> <fw_fw_inet>
!ip route vrf INET <spoke2_subnet> <fw_fw_inet>
!ip route vrf INET <spoke3_subnet> <fw_fw_inet>
!ip route vrf INET <spoke4_subnet> <fw_fw_inet>
!...
!
! Routes for all SPOKE sites (Internet packets received from FW)
!
ip route vrf RED <spoke1_subnet> <pe_multivrf_red>
ip route vrf RED <spoke2_subnet> <pe_multivrf_red>
ip route vrf RED <spoke3_subnet> <pe_multivrf_red>
ip route vrf RED <spoke4_subnet> <pe_multivrf_red>
!...
!
! Default route (Packets received from VPN RED spoke sites will be sent towards FW)
! <fw_fw_red> denotes the FW address on the RED side, named by analogy with <fw_fw_inet>
!
ip route vrf RED <0/0> <fw_fw_red>
!
! Default route (Packets received from Spokes via FW in INET VRF are forwarded to PE
! in INET VRF)
!
ip route vrf INET <0/0> <pe_multivrf_inet>
!
Network Address Translation for MPLS/VPN customers
The following configuration template can be used on the customer’s CE router when private IP addressing is used in the customer site. The example below shows two types of NAT translation:
Static one-to-one translation for servers in the customer site that must be reachable from the Internet
Dynamic NAT in overload mode (PAT) for PC clients.
Please note that NAT is only required on the Internet link.
hostname CE
!
interface Ethernet0
 description Customer site x
 ip address 10.10.10.254 255.255.255.0
 !--- This is the inside local IP address and it's a private IP address.
 ip nat inside
!
interface Serial0
 description CE-PE Internet link
 ip address 213.x.x.x 255.255.255.252
 !--- This is the inside global IP address.
 !--- This is public IP address and it is provided by TMN.
 ip nat outside
!
interface Serial1
 description CE-PE VPN link
 ip address 213.x.x.x 255.255.255.252
 !--- NAT is not performed on the VPNv4 link
!
!--- This statement makes the router perform PAT to overload the Serial0
!--- IP address for all the End Stations behind the Ethernet interface
!--- that are using private IP addresses defined in access list #1.
ip nat inside source list 1 interface Serial0 overload
!
!--- This statement performs the static address translation for the Web server.
!--- With this statement, users trying to reach 171.68.1.1 port 80 (www) will be
!--- automatically redirected to 10.10.10.5 port 80 (www), which in this case
!--- is the Web server.
ip nat inside source static tcp 10.10.10.5 80 171.68.1.1 80
!
!--- This access list defines the private network
!--- that will be network address translated using PAT overload mode.
access-list 1 deny host 10.10.10.5
access-list 1 permit 10.10.10.0 0.0.0.255
!
ip route 0.0.0.0 0.0.0.0 Serial0
!
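The two translation types in the template can be modelled with a short Python sketch. It is illustrative only: the `NatTable` class is hypothetical, `PAT_ADDRESS` is a stand-in documentation address for the elided 213.x.x.x address of Serial0, and ports are allocated sequentially for simplicity, which is not necessarily how a router chooses them.

```python
import ipaddress

INSIDE_NET = ipaddress.ip_network("10.10.10.0/24")       # access-list 1 permit
WEB_SERVER = ipaddress.ip_address("10.10.10.5")           # access-list 1 deny host
STATIC_TCP = {("10.10.10.5", 80): ("171.68.1.1", 80)}     # static one-to-one entry
PAT_ADDRESS = "203.0.113.1"  # hypothetical stand-in for the Serial0 address


class NatTable:
    def __init__(self):
        self._next_port = 1024
        self._pat = {}  # (inside ip, inside port) -> allocated outside port

    def translate_out(self, ip, port):
        """Translate an inside source (ip, port) leaving via the Internet link."""
        if (ip, port) in STATIC_TCP:
            return STATIC_TCP[(ip, port)]
        address = ipaddress.ip_address(ip)
        if address == WEB_SERVER or address not in INSIDE_NET:
            return (ip, port)  # not matched by access list 1: left untranslated
        key = (ip, port)
        if key not in self._pat:
            self._pat[key] = self._next_port
            self._next_port += 1
        return (PAT_ADDRESS, self._pat[key])

    def translate_in(self, ip, port):
        """Map an outside destination (ip, port) back to the inside host, if known."""
        for inside, outside in STATIC_TCP.items():
            if outside == (ip, port):
                return inside
        for inside, outside_port in self._pat.items():
            if ip == PAT_ADDRESS and outside_port == port:
                return inside
        return None  # no translation entry: the packet is not forwarded inside


nat = NatTable()
print(nat.translate_out("10.10.10.20", 40000))  # ('203.0.113.1', 1024)
print(nat.translate_in("171.68.1.1", 80))       # ('10.10.10.5', 80)
```

The web server is reachable from outside through its fixed mapping at any time, whereas a PC becomes reachable only on a port that was allocated for one of its own outbound flows.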
Figure 32 NAT in CE router
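[Figure 32: the CE connects to the PE over the IPv4 link (S0) and the VPNv4 link (S1); its LAN interface E0 (.254) serves the web server 10.10.10.5/24 and the PCs 10.10.10.x/24. Static NAT translation 10.10.10.5 <-> 171.68.1.1 applies to the web server and dynamic NAT in overload mode to the PCs. On the PE, the VRF holds "ip route 10.10.10.0/24 -> S1@CE1" and the global routing table holds "ip route 171.68.1.1/32 -> S0@CE1".]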
8.1.6 Inter Provider (aka. Inter-AS) MPLS/VPNs
Service Model
The inter-autonomous systems for MPLS VPNs feature allows service providers running separate autonomous systems to jointly offer MPLS VPN services to the same end customer. A VPN can begin at one customer site and traverse different VPN service provider backbones before arriving at another site of the same customer. Previously, an MPLS VPN could traverse only a single BGP autonomous system service provider backbone. The inter-autonomous system feature allows multiple autonomous systems to form a continuous (and seamless) network between customer sites of a service provider.
Figure 33 shows two ISPs that offer MPLS/VPN services. The ISPs can operate in different geographic areas or compete within the same territory. In both cases the aim is to enable MPLS/VPN service for a customer that has several sites connected to both providers.
Customer VPN_C in the figure below is, for example, a large enterprise with branches in Montenegro and Serbia. The enterprise would like to replace its WAN network, which currently spans both countries, with a more cost-efficient MPLS/VPN service that is already offered in both countries by local ISPs (e.g. TMN and Telecom Serbia).
Figure 33 Inter-AS Service Model
[Figure 33: ISP 1 (MPLS/VPN) and ISP 2 (MPLS/VPN) with attached customer sites of VPN A, VPN B and VPN C; two VPN C sites are shown.]
Inter-AS Implementation Details
Figure 34 illustrates one MPLS VPN consisting of two separate autonomous systems. Each autonomous system operates under different administrative control and runs a different IGP. Service providers exchange routing information through EBGP border edge routers (ASBR1, ASBR2).
The following describes exchange of VPN routing information:
The provider edge router (PE-1) assigns a VPN label for a route before distributing that route. The PE router uses MP-iBGP to transmit VPN label mapping information towards RR. The PE router distributes the route as a VPNv4 address.
Route reflector reflects VPNv4 internal routes within the autonomous system.
The EBGP border edge router (ASBR1) redistributes the route to the next autonomous system (ASBR2). ASBR1 specifies its own address as the value of the EBGP next hop attribute and assigns a new label. The address ensures the following:
o That the next hop router is always reachable in the service provider (P) backbone network.
o That the label assigned by the distributing router is properly interpreted. (The label associated with a route must be assigned by the corresponding next hop router.)
The EBGP border edge router (ASBR2) redistributes the route in one of the following ways, depending on its configuration:
o If the IBGP neighbors are configured with the neighbor next-hop-self command, ASBR2 changes the next hop address (and label) of updates received from the EBGP peer, then forwards it on.
Scalability note: this requires an entry in LFIB for each VPN route received from ASBR1.
o If the IBGP neighbors are not configured with the neighbor next-hop-self command, the next hop address does not get changed. ASBR2 must propagate a host route for the MP-eBGP peer through the IGP. To propagate the host route of MP-eBGP neighbor, use the redistribute connected subnets command in the backbone IGP.
This alternative will be implemented in Mipnet
Inter-AS does not require redistribution of VPN routing information into provider IGP, or the exchange of IGP routes between the providers. This is because the next hop for VPNv4 routes is rewritten by ASBRs.
MPLS (LDP) is not needed on ASBR1-ASBR2 connection.
MP-eBGP session between two ASBRs must be established on directly connected interface – multihop MP-eBGP on ASBRs’ Loopbacks is not supported.
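The next-hop and label handling described in the steps above can be traced with a small Python model. It is a sketch only: the `Vpnv4Route` class, the helper functions, the label values and the route distinguisher in the prefix string are hypothetical, and a single label counter stands in for the per-router label spaces.

```python
from dataclasses import dataclass, replace
from itertools import count


@dataclass(frozen=True)
class Vpnv4Route:
    prefix: str
    next_hop: str
    label: int


_labels = count(100)  # hypothetical label allocator, one pool for brevity


def originate(prefix, pe):
    """PE assigns a VPN label and advertises itself as next hop."""
    return Vpnv4Route(prefix, next_hop=pe, label=next(_labels))


def reflect(route):
    """Route reflector passes the route on unchanged."""
    return route


def ebgp_advertise(route, asbr):
    """Sending ASBR sets itself as next hop and assigns a new label."""
    return replace(route, next_hop=asbr, label=next(_labels))


def ibgp_advertise(route, asbr, next_hop_self):
    """Receiving ASBR rewrites next hop and label only with next-hop-self."""
    if next_hop_self:
        return replace(route, next_hop=asbr, label=next(_labels))
    return route


at_pe1 = originate("1:1:10.0.10.0/24", "PE1")
at_asbr1 = reflect(at_pe1)
at_asbr2 = ebgp_advertise(at_asbr1, "ASBR1")
without_nhs = ibgp_advertise(at_asbr2, "ASBR2", next_hop_self=False)
with_nhs = ibgp_advertise(at_asbr2, "ASBR2", next_hop_self=True)

print(without_nhs.next_hop)  # ASBR1
print(with_nhs.next_hop)     # ASBR2
```

Without next-hop-self, routers behind ASBR2 see ASBR1 as the BGP next hop, which is why the host route of the MP-eBGP peer has to be reachable through the IGP.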
Figure 34 MP-eBGP – VPN route and label propagation
Figure 35 MP-eBGP – Packet forwarding
Configuration Template
The following is the configuration template for Inter-AS (ASBR) router, which for easier understanding shows only relevant commands.
ASBRs are integrated in RR configuration in the same way as other MPLS/VPN PEs.
hostname <asbr>
!
interface Serial4/0
 description Inter-AS Link to ASx
 ip address 12.0.0.2 255.255.255.252
!
router isis
 net <isis_net>
 redistribute connected
!
router bgp 29453
 no synchronization
 no bgp default ipv4-unicast
 no bgp default route-target filter
 neighbor 12.0.0.1 remote-as x
 neighbor <RR1_loopback> remote-as 29453
 neighbor <RR1_loopback> update-source Loopback0
 neighbor <RR2_loopback> remote-as 29453
 neighbor <RR2_loopback> update-source Loopback0
 no auto-summary
 !
 address-family vpnv4
 neighbor 12.0.0.1 activate
 neighbor 12.0.0.1 send-community both
 neighbor <RR1_loopback> activate
 neighbor <RR1_loopback> send-community both
 neighbor <RR2_loopback> activate
 neighbor <RR2_loopback> send-community both
 no auto-summary
 exit-address-family
!
Operations and Security
Route Filtering
An outbound prefix list shall be configured on the Inter-AS MP-eBGP session. This is a security mechanism: it prevents leaking of MPLS/VPN routing information to Inter-AS peers for VPNs that do not span multiple autonomous systems.
The inbound prefix list from the MP-eBGP peer shall accept only the VPN routes that have been agreed by both ISPs and the affected customers. This is a scalability mechanism that protects the Mipnet MP-iBGP system from accidental announcement of a large number of VPN routes from Inter-AS peers.
Because maintenance of prefix lists could become a serious operational burden, a compromise is to use the “maximum-prefix” command on MP-eBGP peers.
Route Target Filters
A route-map with an incoming RT filter shall permit only VPN routes with agreed RT values. This will, for example, prevent an Inter-AS peer ISP from accidentally (through misconfiguration) announcing a default route into an MPLS/VPN that has all its sites connected to Mipnet.
hostname <asbr>
!
router bgp 29453
 neighbor 12.0.0.1 remote-as x
 !
 address-family vpnv4
 neighbor 12.0.0.1 activate
 neighbor 12.0.0.1 route-map INTERAS-IN-x in
!
route-map INTERAS-IN-x permit 10
 match extcommunity 101
!
ip extcommunity-list 101 permit RT:1:3
ip extcommunity-list 101 permit RT:1:4
!
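The effect of the inbound RT filter can be sketched in Python as follows. This is a simplified model: the `accept` helper is hypothetical, and it compares route-target values as exact strings, whereas the extended community list above is matched by the router itself.

```python
# Route targets agreed with the Inter-AS peer (values taken from the list above).
AGREED_RTS = {"1:3", "1:4"}


def accept(route_targets):
    """Permit a VPNv4 route if at least one of its route targets is agreed.

    Routes that match no permit entry fall through to the implicit deny
    at the end of the route-map.
    """
    return bool(AGREED_RTS & set(route_targets))


print(accept(["1:3"]))         # True
print(accept(["1:1"]))         # False
print(accept(["1:1", "1:4"]))  # True
```

A default route announced by the peer with an RT outside the agreed set, such as the hypothetical 1:1 here, is therefore never installed.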
BGP Authentication
Inter-AS MP-eBGP sessions shall be MD5 authenticated, as explained previously in this LLD document.
“no bgp default route-target filter”
Because no VRFs are configured on ASBR (PE) router, all VPN routes received from RRs will be dropped. This command changes the default behaviour so that all VPN routes are accepted on ASBR. This will utilise BGP memory on ASBR, but will not consume VRF, CEF, routing table and LFIB (assuming “redistribute connected” in IGP) memory pools.
Border interface in the global routing table
Inter-AS links have to be terminated in the global routing table on the ASBR PE. For this reason we recommend deploying dedicated ASBR PE routers for termination of Inter-AS peer ISPs, and enabling strict packet filtering to protect Mipnet devices from attacks originating in Inter-AS peer autonomous systems.
(This is the same concern as with termination of IPsec sessions on a dedicated IPsec PE router.)
Incoming packet filtering
Incoming access list on Inter-AS interface shall block any traffic towards Mipnet routers’ interfaces.
8.1.7 Hierarchical VPNs (CsC – Carrier-supporting-Carriers)
Service Model
The proposed architecture is based on a hierarchical MPLS model. Nested label switched paths (LSPs) are used to create hierarchical MPLS-VPNs. Each LSP layer carries specific information, for example a TP forwarding label on top of a SP VPN label on top of a client VPN label. Each LSP layer is independent from the LSP layers above or underneath. MPLS services offered by a service provider are transparent to the IP/MPLS transport provider network. Label binding protocols are separated. The transport provider network uses LDP on the forwarding plane, and MP-BGP to implement service provider VPNs. The service provider in turn uses MP-BGP to implement client VPNs.
The advantages of using a hierarchical MPLS-VPN model are clearly separated administrative domains and better scalability of the transport provider network.
A well-defined interface between transport and service provider domain decouples service provisioning. It allows the service provider to create MPLS-VPN services for his clients without involvement of the transport provider. This reduces a service provider’s turn-around time to client requests. It also limits the number of routes that have to be carried in the transport provider network, which improves the scalability of the network.
Figure 36 CsC Operational Model
[Figure 36: router labels CE1, SPE1, TPE1, TP1, TP2, TPE2, SPE2, SP1, SPE3 and CE2, grouped into Client, Service Provider and Transport Provider domains.]
Propagation of routing information in CsC topology is presented on Figure 37 and briefly described as follows (from left to right side of drawing) in two steps:
Service Provider MPLS/VPNs
o SPE1 redistributes static route for Client subnet behind CE1 into MP-iBGP.
o Client VPN route is propagated in MP-iBGP to any other SPE that has RED VRF attached. MP-iBGP topology can be any-to-any mesh or via RRs.
o Reachability of MP-iBGP next-hops (ie. the SPEs’ Loopbacks) is created transparently on the top of Transport provider MPLS/VPN network.
Transport Provider MPLS/VPNs
o SPE1 announces its Loopback (and the Loopbacks of any other SPE in the same site) and label to TPE1 via eBGP + send-label. The eBGP session between SPE1 and TPE1 is created on a directly connected link.
o TPE1 propagates the VPN route for SPE1’s Loopback via MP-iBGP to TPE2.
o TPE2 has an eBGP + send-label session with SPE2, created in VRF GREEN. SPE2 now has a route and label for SPE1’s Loopback.
o SPE2 will redistribute the route for SPE1 to other SPEs in the same site via IGP (OSPF, IS-IS). LDP in the Service Provider site is used to establish an end-to-end LSP between SPE3 and SPE1.
It is important to note that Client routes, i.e. the customer routes of the Service Provider, are never injected into the MP-iBGP of the Transport Provider. Only the Loopback addresses of Service Provider PEs are carried in the MPLS/VPN system of the Transport Provider.
Figure 37 CsC – Control Plane
[Figure 37: the routers CE1, SPE1, TPE1, TP1, TP2, TPE2, SPE2, SP1, SPE3 and CE2, annotated with the control-plane exchanges: Client routes via MP-iBGP; SPE Loopbacks via eBGP + send-label (shown twice); SPE Loopbacks via MP-iBGP; TPE Loopbacks via OSPF + LDP; SPE Loopbacks via IGP + LDP; BGP<->OSPF redistribution.]
Design Rules
Customer edge routers are connected via IP-based VRF interfaces to service provider edge routers.
Service provider edge routers are connected via IP-based VRF interfaces to transport provider edge routers. SPE routers are customer edge routers from the perspective of a transport provider.
In summary:
One MPLS-VPN per service provider is configured between TPE routers in the transport provider network. Forwarding in the transport provider network is based on MPLS labels.
IPv4 BGP peering with send-label is configured between TPE and SPE routers. Interfaces on the TPE routers are put into the VRF representing a specific service provider.
A service provider network typically consists of many small sites (one or two routers per site). SPE routers communicate with each other over a label switched path through the transport provider network.
All SPE routers form either a full iBGP-VPNv4 mesh or are clients of a SP route reflector.
The service provider provisions client VPNs over his MPLS-VPN network independently of transport provider.
Security and Operation
Security concerns on CsC TPE-SPE links are similar to legacy CE-PE MPLS/VPN customer connections, with the following extension:
Label spoofing
The TPE needs a security mechanism to accept or reject labels used by the SPE. The TPE must verify that labels used by the SPE are associated with IP routes present in the TPE-SPE VRF. The TPE keeps track of which label bindings have been advertised on which TPE-SPE interface.
The TPE router ensures that the data traffic of one Service Provider is not spoofed by other Service Providers. This is accomplished in the TPE router by examining the labels in the MPLS traffic that each SPE router transmits to the TPE router. TPE verifies that each packet contains a label that the TPE router previously advertised to the particular SPE router.
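The label check described above amounts to a per-interface lookup, which can be sketched in Python. The `TpeLabelGuard` class, the interface names and the label values are hypothetical and serve only to illustrate the bookkeeping.

```python
class TpeLabelGuard:
    """Tracks which labels were advertised on which TPE-SPE interface."""

    def __init__(self):
        self._advertised = {}  # interface name -> set of advertised labels

    def advertise(self, interface, label):
        """Record that a label binding was advertised to the SPE on this interface."""
        self._advertised.setdefault(interface, set()).add(label)

    def accept(self, interface, top_label):
        """Accept a labelled packet only if its top label was advertised on that interface."""
        return top_label in self._advertised.get(interface, set())


guard = TpeLabelGuard()
guard.advertise("to-SP-A", 201)
guard.advertise("to-SP-B", 305)

print(guard.accept("to-SP-A", 201))  # True
print(guard.accept("to-SP-A", 305))  # False: label belongs to another provider
```

A packet arriving from one Service Provider with a label that was advertised only to another provider is rejected, so it cannot be injected into the other provider's VPN.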
Number of routes in Transport Provider MPLS/VPN
Normally the Service Provider will redistribute its IGP routes (or at least SPE Loopbacks) into BGP, for propagation towards TPE and further to other SPEs. Nevertheless, TPE shall have “maximum-prefix” command configured on TPE-SPE eBGP session, to prevent route advertisement accidents due to misconfiguration in Service Provider network.
Bi-connected SPEs
If SPE has two uplinks to single or distinct TPEs, usual BGP mechanisms can be used to define desired routing policy (eg. primary backup). TPE shall apply SOO extended community to routes received from bi-connected SPEs, to avoid routing information loops. Setting Site-of-Origin is not required for sites with only one connection to the TP network.
Redistribution between IGP and BGP is needed in SP sites with several SPEs.
Same AS number in all Service Provider sites
The “as-override” command shall be applied on the TPE to replace the SPE’s AS number with that of the Transport Provider. Otherwise the SPE will drop routing updates from SPEs in remote sites, because its local AS is already in the AS_PATH.
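The interaction between the eBGP loop check and AS override can be shown with a few lines of Python. This is a sketch: the helper names are hypothetical, and 65001 is an assumed private AS number standing in for <SP-AS-NUMBER>.

```python
TP_AS = 29453   # Transport Provider AS, as used in the templates
SP_AS = 65001   # hypothetical Service Provider AS, the same in every SP site


def as_override(as_path, neighbor_as, local_as):
    """Replace each occurrence of the neighbor's AS with the local AS."""
    return [local_as if asn == neighbor_as else asn for asn in as_path]


def advertise_to_spe(as_path, override):
    """TPE advertises a route to an SPE: optional override, then prepend its own AS."""
    if override:
        as_path = as_override(as_path, SP_AS, TP_AS)
    return [TP_AS] + as_path


def spe_accepts(as_path):
    """Standard eBGP loop check: reject a route whose AS_PATH contains the local AS."""
    return SP_AS not in as_path


from_remote_site = [SP_AS]  # path as received by the Transport Provider

print(spe_accepts(advertise_to_spe(from_remote_site, override=False)))  # False
print(spe_accepts(advertise_to_spe(from_remote_site, override=True)))   # True
```

With the override applied, the receiving SPE sees only the Transport Provider AS in the path and installs the route.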
Configuration Template
The following two printouts outline the CsC implementation details on TPE and SPE routers. Legacy MP-iBGP configuration statements on TPE are not shown.
!
hostname TPE
!
ip vrf <SP-VPN-NAME>
 rd <SP-ROUTE-DISTINGUISHER>
 route-target export <SP-ROUTE-TARGET>
 route-target import <SP-ROUTE-TARGET>
!
interface <INTERFACE-TPE-TO-SPE>
 ip vrf forwarding <SP-VPN-NAME>
 ip address <IP-ADDRESS-TPE-TO-SPE> <MASK>
!
router bgp 29453
 !
 address-family ipv4 vrf <SP-VPN-NAME>
 neighbor <IP-ADDRESS-SPE-TO-TPE> remote-as <SP-AS-NUMBER>
 neighbor <IP-ADDRESS-SPE-TO-TPE> activate
 neighbor <IP-ADDRESS-SPE-TO-TPE> send-community both
 neighbor <IP-ADDRESS-SPE-TO-TPE> as-override
 neighbor <IP-ADDRESS-SPE-TO-TPE> send-label
 no auto-summary
 no synchronization
 exit-address-family
!

!
hostname SPE
!
ip vrf <CLIENT-VPN-NAME>
 rd <CLIENT-ROUTE-DISTINGUISHER>
 route-target export <CLIENT-ROUTE-TARGET>
 route-target import <CLIENT-ROUTE-TARGET>
!
interface Loopback0
 ip address <IP-ADDRESS-SPE-LOOP0> 255.255.255.255
!
interface <INTERFACE-SPE-TO-TPE>
 ip address <IP-ADDRESS-SPE-TO-TPE> <MASK>
!
router isis
 net <ISIS-NET>
 redistribute bgp <SP-AS-NUMBER>
 is-type level-2-only
 metric-style wide
 log-adjacency-changes
!
router bgp <SP-AS-NUMBER>
 !
 ! Network command is more elegant if a single SPE is in this site; otherwise
 ! redistribute ISIS<->BGP
 !
 network <IP-ADDRESS-SPE-LOOP0> mask 255.255.255.255
 redistribute isis level-2 route-map REDISTRIBUTE-ONLY-LOCAL-ADDRESSES
 neighbor SPE peer-group
 neighbor SPE remote-as <SP-AS-NUMBER>
 neighbor SPE update-source Loopback0
 neighbor <IP-ADDRESS-TPE-TO-SPE> remote-as 29453
 neighbor <IP-ADDRESS-TPE-TO-SPE> activate
 neighbor <IP-ADDRESS-TPE-TO-SPE> send-label
 no auto-summary
 !
 address-family ipv4 vrf <CLIENT-VPN-NAME>
 redistribute connected
 no auto-summary
 no synchronization
 exit-address-family
 !
 address-family vpnv4
 neighbor SPE activate
 neighbor SPE next-hop-self
 neighbor SPE send-community both
 ! iBGP mesh with all other SPE routers
 ! or peering with SP route reflectors
 neighbor <IP-ADDRESS-SPE-OR-SPRR-1> peer-group SPE
 neighbor <IP-ADDRESS-SPE-OR-SPRR-2> peer-group SPE
 no auto-summary
 exit-address-family
!
route-map REDISTRIBUTE-ONLY-LOCAL-ADDRESSES permit 10
 match ip address 1
!
access-list 1 permit <IP-ADDRESS-SPE1-Loop>
access-list 1 permit <IP-ADDRESS-SPE2-Loop>
access-list 1 permit <IP-ADDRESS-SPE3-Loop>
!
8.1.8 Multicast in the MPLS/VPNs
The solution consists in the support of multicast routing and forwarding in the context of a VRF and the use of multicast tunnels over the provider network for control and data connectivity.
Limitations:
The current solution allows multicast VPNs to exist only within one domain defined as Autonomous System. A solution for Inter-domain (inter-provider) or inter-AS multicast VPNs is left for future development1.
The current solution does not allow for the same multicast VPN to simultaneously offer Internet multicast in global routing table as well as VPN multicast services.
Currently the multicast VPN is not supported for Extranet MPLS/VPN topologies.
Multicast in VPN is currently not supported on Multi-VRF CE devices.
Multicast VRF
Within the proposed solution, every P and PE router is to be multicast enabled and hold a global multicast routing table for multicast routing within the provider core. Each multicast enabled VPN is augmented with a VPN multicast routing table. This is an extension of the VPN's VRF and is termed a multicast VRF (mVRF). In each instance of a VRF the PE maintains a PIM adjacency with a PIM capable CE device, reachable over an interface associated with the VRF. At no point do two or more non-directly connected CE devices peer with each other. The MPLS VPN mechanism used for associating physical or logical (CE-PE) interfaces to a VRF is not altered.
If the end customer is not running Source Specific Multicast (SSM), Rendezvous Points (RPs) need to exist within the customer network. The address of these needs to be known within the context of the customer’s VRF. Customer RP information is either configured statically in the VRF or learnt dynamically via Auto-RP. SSM is also a configurable option, which does away altogether with the need of an RP.
Globally the PE is configured to run PIM (global instance) with each of the neighbouring P routers.
Multicast Tunnels
Each PE router builds a single default multicast distribution tree (MDT) per VRF instance, using traditional PIM mechanisms with its peers. The MDT is used to distribute end customer multicast packets and control messages. Access to this tree is represented by a Multicast Tunnel interface (discussed in the following section), which is created automatically upon configuration of the MDT.
A PE having a VRF for a particular VPN will always be the root of the MDT, with all other PEs as leaves. The same PE will likewise be a leaf of MDTs rooted on remote PEs. It is important to note that, when running pure PIM-SM, this translates to (PEn, G) state information, where n is the number of PEs, i.e. per-source-PE state information on each device. The use of Bi-directional PIM would reduce the amount of state information needed.
Apart from the default MDT, additional Data-MDTs can be built automatically to distribute traffic that exceeds a certain pre-set threshold. The advantage of forwarding such traffic over the Data-MDT as opposed to the default-MDT is that PEs can individually signal an interest (join) in receiving traffic sent on a specific Data-MDT. This improves network utilisation since high-rate traffic will be sent across the Data-MDT only to PEs that have joined the Data-MDT, instead of the traffic being sent to all PEs on the default-MDT (even ones which have no active receivers).
Forwarding along the MDT is done using packet encapsulation - GRE (default) or IP-in-IP. The destination IP address of the encapsulating packet is the multicast group address for a particular MDT. This address is unique and consistent in the P network, i.e. the group address needs to be configured to belong to the MDT of a particular VPN on all participating PEs in a consistent manner. The source IP address is taken to be the IP address of the interface used for MBGP peering.
The forwarding is done purely in IP and no MPLS labels are used. The special nature of Default MDT is indicated by the Z flag in the corresponding (S, G) entry.
Multicast Tunnel Interface
Each MDT is associated with an mVRF. The interface pointing to this MDT from the mVRF is a Multicast Tunnel interface, also referred to as a Multicast Virtual Interface. For the purposes of clarity, the term “C-packet” will be used hereon to refer to multicast data or control packets belonging to the customer network, and the term “P-packets” for multicast data or control packets belonging to the provider network.
Thus, a C-packet is GRE/IP-in-IP encapsulated to create a P-packet for forwarding in the P network along the MDT via a Multicast Tunnel Interface. Although the Multicast Tunnel interface is treated as a PIM-enabled interface with PIM adjacencies to other PEs, unicast routing is NOT run over it, and there are no unicast routing adjacencies over it. Unicast reachability of PE routers (MDT roots) over the multicast tunnel interfaces is resolved by topology information provided by MBGP. Consequently, this requires the IP addresses/prefixes selected as the root of the multicast tunnels to be distributed by MBGP. This MDT root information is used when running Source Specific Multicast (SSM) in the P network, since it allows PEs to identify the [root] sources participating on a particular MDT.
The prefix is sent with a new MBGP AFI2. The format of the new extended community used is 2:<VPN_RD>, where VPN_RD is the Route Distinguisher configured for the originator’s VRF in a format as defined in RFC 2547bis, e.g. ASn:Number. The MBGP Update containing the extended community will be for the prefix of the source address (typically the PE's loopback) of the multicast tunnel. This indicates the root interface of the MDT on the remote PE. The prefix is checked in the unicast VRF table to find the upstream PIM neighbour (BGP next-hop) to whom joins/prunes can be sent.
The Multicast Tunnel is treated by that VRF's VPN-specific PIM instances as a LAN interface. The PEs which are adjacent on the Multicast Tunnel execute the PIM LAN procedures, including the generation and processing of Assert packets. This allows VPN-specific PIM routes to be extended from site to site, without appearing in the P routers.
RPF
Given the special nature of the Multicast Tunnel and MDT, a few changes to the RPF procedures have been made. Two procedures are to be noted here; i) the procedure for determining the RPF neighbour and interface (used for sending join/prune messages over the multicast tunnel); ii) the RPF checking made during packet forwarding. The forwarding RPF process uses the information computed during the RPF neighbour and interface computation done at the control plane [set-up].
The following, adapted from draft-rosen-mvpn-00.txt, specifies how the RPF interface is determined. If a VRF is in a single MDT:
- a P-packet received over a backbone interface is considered to pass the RPF check if the IGP next hop to its encapsulated C-source address, according to the associated VRF, is not one of the customer interfaces associated with that VRF.
- a C-Join/Prune message from a CE router needs to be forwarded over the Multicast Tunnel if the next hop interface to the root of the corresponding multicast tree is not one of the interfaces associated with that VRF.
As mentioned, the above specifies how to determine the RPF interface. To determine the RPF neighbour for a particular C-address, it is first necessary to determine the BGP next hop for the corresponding VPN-IP address, then verify that the BGP next hop is a PIM neighbour on the RPF interface. This is done by consulting the VRF's unicast routing table for the C-source and the PIM adjacency table.
Forwarding
For C-packets received from a CE interface, and passing the RPF check, forwarding is done using the mVRF table. The Multicast Tunnel interface must be in the olist of a shared or source entry in the mVRF. The router replicates the packet to the Multicast Tunnel interface. This forwarding action results in the necessity to re-write the IP header, which in turn also causes a lookup in the global multicast table. This lookup allows the PE to replicate the packet to each backbone interface that exists in the global olist for the MDT group.
For multicast received by a PE over a P attached interface, the router first looks in the backbone multicast routing table using the source and destination addresses in the packet header. If the packet is intended for the backbone only, the PE router follows regular multicast forwarding rules and replicates the packet to each interface in the oif list.
Otherwise, if the lookup result indicates that the packet is received from a multicast tunnel (Z flag is set for this (S, G) entry), the PE router locates the mVRF based on the destination address in the delivery header of what is now known to be a P-packet. Then, it performs RPF check using the source address in the C-packet, to verify that the packet was sent by the BGP next hop to the source in the VPN. If the RPF check succeeds, the P-packet is replicated to all interfaces in the global oif list.
For delivery to mVRF receivers, the P-header is discarded and the C-packet passed to the mVRF for routing.
Data MDT
Initially all MVPN traffic that crosses the backbone is encapsulated in the MDT-default group. Certain policies can be applied to move MVPN (S,G)'s from the MDT-default to a specific MDT-data group. Currently only traffic rate is configurable for this. If a traffic rate threshold is exceeded on a PE router that is connected to the source, a switch over from the default to the Data MDT is initiated. The PE router that makes the decision to switch advertises this "MVPN (S,G) to MDT-group" mapping to the other PE routers.
This mapping is advertised in a special message. Once a source drops below the set threshold forwarding reverts to the Default-MDT after a fixed delay interval. Given the stability focus of BGP, it was chosen not to use this protocol to convey the Data-MDT messaging. Instead a patented PIM-like control message is used. The (S, G, MDT-data) is advertised in a TLV format and sent to the ALL_PIM_ROUTER group (224.0.0.13), with a normal IP protocol type and a unique UDP port number. (The UDP port number needs to be requested). The message is cached on non-participating PEs in case of receivers for that group requesting a join.
The TLV format is shown below:
 0               16              31 bits
 ---------------- ----------------
|      Type      |     Length     |
 ---------------- ----------------
|             Source              |
 ---------------------------------
|              Group              |
 ---------------------------------
|            MDT-data             |
 ---------------------------------
The multicast group address for the Data-MDT is chosen from a range of configurable addresses that have to be valid multicast addresses in the core network. Currently a maximum of 255 Data-MDTs can be configured. Only (S, G) pairs within the VPN will be distributed on the additionally created Data-MDTs. If more (S, G) pairs are present inside the VPN than Data-MDTs are available, the pairs will be distributed equally among the available Data-MDTs. MDT data ranges can be different on different PE routers.
If BIDIR groups are used, or PIM-SM is configured with spt-threshold set to infinity, no Data-MDTs are created. (*,G) traffic usually goes to a lot of receivers (many to many), so it may be better distributed over the default MDT anyway. Since there is no relationship between (S, G) and the MDT-data group, the (S, G, MDT-data) mappings are stored in a separate table on all the receiving PE routers that participate in the MVPN. This table is used to map incoming GRE encapsulated multicast packets to the right Tunnel Interface (that leads to the VRF) and to join the correct MDT-data group in the backbone for a specific MVPN group. It is important to note that an MDT-data group will not create an additional tunnel interface. The reference to the special table is indicated by two new flags in the multicast routing table:
Y flag - Joined MDT-data group; indicating that the (S, G) entry is being received over the Data MDT.
y flag - Sending to MDT-data group; indicating that the (S, G) entry is to be sent on a Data MDT.
Multicast VPN Basic Configuration
Multicast VPNs are configured alongside or after MPLS VPN configuration. It is recommended that the correct operation of MPLS VPNs be first verified before proceeding with the multicast set-up.
Step 1. Enable Global Multicast routing
This is the only manual MCAST provisioning step – any VRF-specific multicast commands will be provisioned by ISC.
This configuration step should be no different from that used to multicast enable the provider network. In the steps below, PIM-SM is used. Enable global multicast routing on all P and PE devices with the ip multicast-routing global configuration command.
Next, enable PIM sparse mode on the device’s interfaces with the ip pim sparse-mode command in interface configuration mode, together with any other topology specific multicast configuration commands (e.g. for NBMA interfaces). Optionally, change the PIM version. Do not forget to PIM enable the interface used for MBGP peering, typically a loopback. This interface will be used to source the Multicast Tunnel interface.
Configure the provider’s desired RP address with the ip pim rp-address <ip address> global command.
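Step 1 could look as follows on a P or PE router. This is a sketch only; the interface name and RP address are placeholders, not values from this design.

```
ip multicast-routing
!
interface Loopback0
 ! Loopback used for MBGP peering; it sources the Multicast Tunnel
 ip pim sparse-mode
!
interface <CORE-INTERFACE>
 ip pim sparse-mode
!
ip pim rp-address <PROVIDER-RP-ADDRESS>
```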
Step 2. Enable VRF instance Multicast routing
On each PE, enable VRF multicast routing with the ip multicast-routing vrf <VRF_name> global command. Configure the PE-CE interfaces for multicast operation with the interface command ip pim [dense-mode | sparse-mode].
Step 3. Configure mVRF multicast parameters
On each PE, configure the PIM operation mode parameters for the particular VRF. For example, configure the customer’s RP address with the ip pim vrf <VRF_name> rp-address <ip address> global command, where the RP’s IP address falls in the address range used by the customer and is known in the local VRF.
If the customer is using SSM configure the mVRF to operate in SSM mode with the ip pim vrf <VRF_name> ssm command.
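Steps 2 and 3 might be combined as in the sketch below. The interface name and customer RP address are placeholders; the default keyword on the ssm line (which selects the standard 232.0.0.0/8 range) is an assumption, since the design does not state which SSM range customers would use.

```
ip multicast-routing vrf <VRF_name>
!
interface <PE-CE-INTERFACE>
 ip vrf forwarding <VRF_name>
 ip pim sparse-mode
!
! Customer running PIM-SM with a static RP
ip pim vrf <VRF_name> rp-address <CUSTOMER-RP-ADDRESS>
!
! Alternatively, customer running SSM
ip pim vrf <VRF_name> ssm default
```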
Step 4. Configure the Default MDT
Once again, on each PE in VRF configuration mode configure the MDT default group address using the mdt default <ip address> command. Optionally, change the mdt encapsulation mode from the GRE default - mdt mode [ip | gre].
Step 5. Configure the Data MDT
To configure a range of data MDTs, so that multicast traffic is flooded only to interested PE receivers, use the mdt data <range> <range mask> {threshold [0-255]} {list [access-list]} command in VRF configuration mode. Configuring the threshold is optional but recommended; it specifies that additional data MDTs will be created only for sources which exceed this threshold. Additionally, the sources eligible to trigger a Data-MDT may be specified by an access-list.
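Steps 4 and 5 can be sketched as below. The group addresses and the threshold value are illustrative assumptions; the default MDT group must be identical on every PE that holds this VRF, while the data MDT range may differ per PE.

```
ip vrf <VRF_name>
 ! Default MDT group, unique per VPN in the P network (example address)
 mdt default 239.1.1.1
 ! Pool of 256 Data-MDT groups (wildcard mask), used once a source
 ! exceeds the example threshold
 mdt data 239.1.2.0 0.0.0.255 threshold 10
```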
Source Specific Multicast
A recent development that goes towards fulfilling Goals 4 and 6 and reducing the overall complexity of a multicast network is Source Specific Multicast (SSM).
With SSM, receivers send a Group Join message (requires IGMPv3) directed to a specific Source. The first hop router can ‘snoop’ this join message and perform the necessary control procedures and send Joins to the source based on the local routing table. Thus, SSM removes the necessity of running an RP in the network, as well as using MSDP for exchanging information about external sources.
The use of SSM is currently recommended for new multicast deployments. It is also recommended that existing PIM-SM deployments investigate a migration to SSM where appropriate.
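A minimal sketch of enabling SSM in the global table of a first hop router is given below. The interface name is a placeholder, and the use of the default SSM range is an assumption.

```
ip multicast-routing
! Treat the default SSM range (232.0.0.0/8) as source specific
ip pim ssm default
!
interface <RECEIVER-FACING-INTERFACE>
 ip pim sparse-mode
 ! IGMPv3 is required so that receivers can name the source
 ip igmp version 3
```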
mVPN Extranet
As mentioned in the beginning of this section, mVPN Extranets are currently not supported in Cisco IOS.
Figure 38 mVPN Extranet
[Figure: PE1 and PE2 connected across the SP Backbone, serving Site 1 and Site 2. Labels: VRF A-1 (bank), VRF A-2 (bank), VRF B-1 (inet), VRF C-1 (mDC); hosts a1, a2, b1 and server c1.]
In the picture above, there are two customer VPNs, VPN-A that is a national Bank and VPN-B, which represents Internet customers in Mipnet MPLS/VPN network. A third VPN-C is the common TMN multicast data centre (mDC) that implements video streaming service to several Mipnet MPLS/VPNs.
Within each site, the network is described as A-* for VPN-A and B-* for VPN-B respectively. a1, a2, b1, and b2 are hosts. c1 is the video streaming server.
To connect source a1 and receiver a2, we configure an MDT group for VRF A-1 and the same MDT group for VRF A-2, establishing a multicast tunnel in the SP backbone. The multicast tunnel encapsulates and transports traffic from a1 to a2 from one site to the other. The same method can be used to connect sources and receivers within other MPLS/VPNs (Intranet topologies).
The problem arises when a2 wishes to receive traffic from both c1 and a1, while b1 is not allowed to access traffic sourced from a1. This type of service is typically known as extranet. A straightforward solution to the problem is to configure the same MDT group in all the VRFs and configure a multicast boundary or ACL on the interface connecting B-1 on PE2. But this solution doesn’t scale. It also has a potential security issue in that the traffic from a1 does reach VRF B-1.
The following picture represents a possible (but not necessarily feasible) workaround for Multicast extranets. The idea is to have several (logical) links between PE and mDC_CE, and terminate each logical link into a distinct VRF on PE router.
The most obvious problem here is that if receivers in VPN RED and VPN GREEN listen to the same source in mDC, there will be two streams flowing from mDC into both VPNs (across distinct MDTs). From a bandwidth perspective this is still more efficient than unicast delivery, which could require hundreds of point-to-point video streams between that source and every single receiver.
Another and more severe issue with centralised “multi-homed” mDC is the possibility of overlapping address space (eg. the same IP address for RP in two VPNs).
Figure 39 mDC in Mipnet
[Figure: mDC connected to VPN RED and VPN GREEN.]
8.2 Internet Transport
8.2.1 Operational Overview
This Internet solution is essentially an Intranet VPN between a Downstream ISP and the Internet CG Internet gateways (Upstream ISP), in which a BGP session can be established to exchange full Internet routes. The most important feature of this solution (see footnote 9) is that the VRF tables in Mipnet do not hold the full Internet routing table. The VRFs actually hold only the routes originating from the downstream ISPs and the Loopback interfaces of eBGP-multihop peers. This makes the solution very scalable and allows Mipnet to offer the service to a large number of ISPs and Internet customers in general.
9 For this solution to be scalable, the Tier 3 ISP must not be a transit network for other ISPs; otherwise there will be too many routes in the VRF. This ensures that the VRF holds only routes originating from the Tier 3 ISP and not transit routes.
In this scenario, a small downstream ISP gains access to the Internet via one or more of the Mipnet Internet gateways (AS8585) that provide wholesale Internet access. Downstream ISP may require BGP routes from the AS8585 for which the eBGP multihop session is required between dCE and iGWs. Both the downstream ISP CE (dCE) and the iGW connections are terminated in the VRFs.
It would be possible to implement the “Internet Transit” service in a single MPLS/VPN; however, the Extranet MPLS/VPN topology will allow the “ISP Selection” service, which has also been requested by TMN. I.e. only a default route from the selected upstream ISP’s VRF will be imported into a Downstream ISP’s VRF.
ISP Selection.
Customer site can be assigned to desired Upstream_ISP Extranet. This is accomplished by importing a default route from appropriate Upstream ISP VPN, and exporting customer route into the same Upstream ISP VPN. Extranet will be provisioned (import/export route-targets) via ISC.
Example below shows how the Extranets can be implemented with import/export of route targets. Customer X will receive internet access via provider AS1, whereas Customers Y and Z will be routed via default route of provider AS2.
UPSTR_AS1:
 route-target import 29453:<upstr_as1_import>
 route-target export 29453:<upstr_as1_export>
UPSTR_AS2:
 route-target import 29453:<upstr_as2_import>
 route-target export 29453:<upstr_as2_export>
CUST_X:
 ! CUST_X Intranet
 route-target import 29453:<cust_x>
 route-target export 29453:<cust_x>
 ! Extranet with AS1
 route-target import 29453:<upstr_as1_export>
 route-target export 29453:<upstr_as1_import>
CUST_Y:
 ! CUST_Y Intranet
 route-target import 29453:<cust_y>
 route-target export 29453:<cust_y>
 ! Extranet with AS2
 route-target import 29453:<upstr_as2_export>
 route-target export 29453:<upstr_as2_import>
CUST_Z:
 ! CUST_Z Intranet
 route-target import 29453:<cust_z>
 route-target export 29453:<cust_z>
 ! Extranet with AS2
 route-target import 29453:<upstr_as2_export>
 route-target export 29453:<upstr_as2_import>
Connectivity between Internet customers of Mipnet.
Optionally, a customer might want to exchange traffic with any other Internet customer of Mipnet directly, and not via the selected Upstream ISP (it really makes no sense to send a packet across the Atlantic twice between two customer sites that are both located in Podgorica). For this reason the customer site could join the extranet INET_CUST, composed of all routes of Mipnet Internet customers.
It is assumed that address space of Internet customers is not overlapping.
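A customer VRF that joins both its selected upstream extranet and INET_CUST could be sketched as follows. The <cust_x_rd> and <inet_cust> values are hypothetical placeholders; the design does not specify the route-target used for INET_CUST.

```
ip vrf CUST_X
 rd 29453:<cust_x_rd>
 ! CUST_X Intranet
 route-target import 29453:<cust_x>
 route-target export 29453:<cust_x>
 ! Extranet with AS1
 route-target import 29453:<upstr_as1_export>
 route-target export 29453:<upstr_as1_import>
 ! Extranet INET_CUST (direct exchange with other Internet customers)
 route-target import 29453:<inet_cust>
 route-target export 29453:<inet_cust>
```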
8.2.2 BGP communities
This chapter presents the BGP communities and their use in Mipnet. BGP communities will be primarily used for two reasons:
Scalable and efficient control of advertised routes towards neighboring ASs.
Implementation of routing policies with BGP routed Mipnet customers.
Advertisement Control.
The following figure depicts the idea of colouring of BGP routes with community values that are later on used for filtering of route advertisements towards neighbouring ASs. First and most important thing to remember is that no route will be advertised to neighbouring AS unless it has been previously coloured with appropriate community number. All routes are coloured on PEs, when either received from eBGP peer or originated locally (redistribute from static) on the PE.
In the example below we can see that the outbound route-map will explicitly allow propagation of 29453:25xx routes towards Internet CG, which corresponds to PI prefixes of Mipnet customers and Mipnet aggregates. Any other prefix in the BGP table, for example routes received from peering partners or routes that have not been assigned any community value (eg. CE-PE connections, private IP address blocks), will be implicitly blocked.
A note on scalability of such advertisement control: if the outbound route propagation is to be controlled on prefix and/or as-path basis, the respective prefix/as-path lists on all routers with eBGP peers would have to be updated each time a new PI route is connected, or a new BGP routed customer is connected to Mipnet, or even worse, when each of the downstream ISPs advertises a new route or customer AS. This clearly doesn’t scale from a provisioning and operational perspective.
With community based approach, routes that can be accepted from an eBGP neighbour are filtered inbound on that eBGP session and coloured with appropriate community. Each time when for example a downstream ISP announces a new route, TMN will update the prefix-list on that particular eBGP session, but the outbound route-maps on any other PEs/iGWs will remain unchanged.
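The colouring approach can be sketched as below: an inbound route-map colours accepted customer routes, and a single outbound route-map on the session towards a neighbouring AS permits only routes carrying a 29453:25xx community. The route-map and prefix-list names, the community-list number and the PoP ID 01 are assumptions for illustration.

```
ip bgp-community new-format
!
! Inbound on the customer eBGP session: filter, then colour
ip prefix-list CUST-X-IN seq 5 permit <CUSTOMER-PI-PREFIX>/<LENGTH>
!
route-map COLOUR-CUST-PI permit 10
 match ip address prefix-list CUST-X-IN
 set community 29453:2501
!
! Outbound towards a neighbouring AS: one line covers all customers
ip community-list 100 permit _29453:25[0-9][0-9]_
!
route-map TO-NEIGHBOUR-AS permit 10
 match community 100
! Routes with any other community, or none, hit the implicit deny
```

Adding a new customer then touches only the inbound prefix-list on that customer's session; the outbound route-map stays unchanged.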
Figure 40 BGP community colouring
[Figure: CPE1, CPE2 and a Peer connected to Mipnet PEs, with an iGW towards AS8585. Routes are coloured on PE when received from an eBGP peer or originated locally on the PE (set mipnetAS:2501 for customer routes, set mipnetAS:2401 for peer routes). The outbound policy matches mipnetAS:25.. so that only routes marked with the proper community are exported, using a single configuration line for all customers. Other communities (mipnetAS:2601, mipnetAS:2401, ...) are implicitly denied.]
Table 9 BGP Community Scheme
Community | Description | Usage
Communities accepted from BGP customers
29453:50 | Signal LOCAL_PREFERENCE=50, i.e. Mipnet is a global backup provider | Primary/backup
29453:90 | Signal LOCAL_PREFERENCE=90, i.e. backup connection with Mipnet | Primary/backup
Route colouring (used for scalable advertisement control)
29453:2401 | Routes from Mipnet peering partners | Can be advertised to customers (or downstream ISPs)
29453:25xx | PI route of Mipnet customers, connected in PoP with ID XX (including Mipnet aggregates) | Can be advertised to any neighbouring AS.
29453:26xx | PA route of Mipnet customer, connected in PoP with ID XX (not exported to eBGP peers) | Can be advertised to any neighbouring AS, if this is agreed by both providers. Blocked by default.
29453:2700 | Default route (originated by iGWs) | Can be advertised to customers.
The following table summarizes LOC_PREF settings of BGP routes in Mipnet AS. Please note that higher LOC_PREF means better preference, ie. route with higher LOC_PREF value will be preferred in a BGP path selection process. Routes are assigned appropriate LOC_PREF value when received from eBGP neighbor.
Table 10 LOC_PREF settings
LOC_PREF Description
50 Customer routes - global backup
60 Routes from global upstream providers (ie. a default route in case of Mipnet)
70 Routes from Mipnet peering partners
90 Customer routes - backup link with Mipnet AS
100 Customer routes - primary connection
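The customer-signalled values in Table 10 could be implemented with an inbound route-map such as the sketch below. The route-map name and community-list numbers are assumptions; routes without a signalling community receive LOC_PREF 100 (primary connection).

```
ip bgp-community new-format
!
ip community-list 1 permit 29453:50
ip community-list 2 permit 29453:90
!
route-map CUSTOMER-IN permit 10
 ! Customer signals "Mipnet is a global backup provider"
 match community 1
 set local-preference 50
!
route-map CUSTOMER-IN permit 20
 ! Customer signals "backup link with Mipnet"
 match community 2
 set local-preference 90
!
route-map CUSTOMER-IN permit 30
 ! Primary connection
 set local-preference 100
```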
8.2.3 Routing Details
This chapter explains routing setup with Upstream ISPs of Mipnet (eg. Internet CG) and the following three Internet customer profiles:
#1 – Statically routed customer without an AS number
#2 – BGP routed customer with default route (and eventually Montenegro national routes)
#3 – BGP routed customer with full Internet routing table for optimal inter-domain routing.
Routing with Upstream Providers
BGPv4 (see footnote 10) will be the routing protocol between Mipnet PEs and Upstream ISPs. The eBGP session will be established on IP addresses of the CE-PE link. Each upstream ISP will be connected in a different VPN on Mipnet PEs. This will allow creation of Internet Extranets needed for the “ISP Selection” service.
Exchange of Routing Information.
The following routes will be exchanged between PEs and border routers (iGW) of upstream ISP:
iGW => PE: default route.
PE => iGW: PI prefixes of Mipnet and Mipnet customers.
PE will also redistribute connected routes (CE-PE links) into respective VRFs. These are needed to establish eBGP multihop sessions (across Mipnet) for customer profile #3. CE-PE links will not be advertised to neighboring ASs.
In case of multiple links between Mipnet and upstream ISP, classical BGP methods shall be used to achieve desired routing policy. For instance, MEDs and/or AS_PATH prepending can be used to affect routing (eg. primary/backup) from Internet towards Mipnet, without manual intervention in neighboring AS. LOC_PREF shall be used to achieve desired routing policy for outbound traffic.
Routes received from upstream ISPs (ie. a default route in first phase) will be installed in MP-iBGP with LOC_PREF 60.
Security of Routing Protocols.
Inbound prefix list on PE router will explicitly permit default (0/0) route and deny other prefixes.
Outbound community list will only allow propagation of PI blocks of Mipnet and Mipnet’s customers.
BGP community string for routes received from upstream ISPs will be set according to Mipnet community scheme.
MD5 authentication shall be enabled on eBGP TCP session.
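The inbound measures listed above might look as follows on the PE session towards an iGW. The VRF name, neighbour address, password and the names DEFAULT-ONLY and UPSTREAM-IN are placeholders; the outbound community filter is not repeated here.

```
ip bgp-community new-format
!
ip prefix-list DEFAULT-ONLY seq 5 permit 0.0.0.0/0
!
route-map UPSTREAM-IN permit 10
 ! Only the default route passes; everything else hits the implicit deny
 match ip address prefix-list DEFAULT-ONLY
 set local-preference 60
 set community 29453:2700
!
router bgp 29453
 address-family ipv4 vrf <UPSTR-VRF-NAME>
  neighbor <IP-ADDRESS-IGW> remote-as 8585
  neighbor <IP-ADDRESS-IGW> password <MD5-SECRET>
  neighbor <IP-ADDRESS-IGW> activate
  neighbor <IP-ADDRESS-IGW> route-map UPSTREAM-IN in
 exit-address-family
```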
Profile #1: Statically routed Internet customer
Static routing is appropriate in the following scenarios:
Single-homed Internet customers that do not have an AS number
Bi-connected customer site that does not need dynamic routing protocol for detection of CE-PE link failure (eg. leased lines)
10 BGP or Static on CE-PE? Routing between Downstream CE and PE can be static or BGP. Static routing is simpler, particularly with regard to security aspects of routing protocols, and therefore recommended for Mipnet. However, BGP shall be preferred routing protocol for downstream and upstream ISPs, when the ISP has two or more logical links with Mipnet and dynamic routing protocol is needed for detection of link failures. For example, when the iGW is connected to the PE via Ethernet switch, static route from PE towards iGW may result in blackholing, because the interface failure on iGW would not be detected on PE router.
Each customer link will be connected in a separate VRF. Customer VRF will be assigned to appropriate UPSTREAM_ISP extranet.
Routing configuration.
Customer => Mipnet: CE will have a default static route configured, pointing towards remote IP address of CE-PE link.
Mipnet => Customer: PE will install a static route for customer prefix, pointing towards remote address of CE-PE link. Static route will be configured in appropriate VRF, and redistributed into MP-iBGP.
If the customer prefix has been allocated by TMN from Mipnet’s address space, it will be advertised to the global Internet within Mipnet aggregate. If that customer has registered a PI prefix with RIPE, it will be coloured with appropriate BGP community that permits export to neighboring ASs.
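A sketch of the static routing set-up for profile #1 follows. All addresses and names are placeholders, and the community value assumes a PI prefix connected in a PoP with ID 01; for a prefix allocated from Mipnet's own address space the community would follow the 29453:26xx scheme instead.

```
! CE router
ip route 0.0.0.0 0.0.0.0 <IP-ADDRESS-PE-ON-CE-PE-LINK>

! PE router
ip vrf <CUST-VRF-NAME>
 ! rd and route-targets as provisioned for the customer and its extranet
!
ip route vrf <CUST-VRF-NAME> <CUSTOMER-PREFIX> <MASK> <IP-ADDRESS-CE-ON-CE-PE-LINK>
!
route-map COLOUR-STATIC permit 10
 set community 29453:2501
!
router bgp 29453
 address-family ipv4 vrf <CUST-VRF-NAME>
  redistribute static route-map COLOUR-STATIC
  redistribute connected
 exit-address-family
```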
Figure 41 Profile #1: Statically routed Internet customer
[Figure: Upstream ISP AS1 with iGW1 and iGW2 connected over eBGP to Mipnet PE1 and PE2; Internet customers ISPx (CEx) and ISPy (CEy) attached via CE-PE links, including a backup link; IPv4 routing tables of the UPSTR_AS1 extranet and INET_CUST VRFs, with default and customer routes installed as static routes. Legend: Lx - CE-PE links; B - Full BGP Internet routing table (~120k routes); Dx - VRF-specific default route; C - Customer routes of downstream ISP.]
Profile #2: BGP routed customer with default route
If a dynamic routing protocol is required between Mipnet and an Internet customer, BGP is again the best choice because of its flexibility and security mechanisms. To run BGP with Mipnet, the customer has two options:
Official (RIPE registered) AS number
Private AS number assigned by Mipnet.
If a private AS number is used, it will be stripped from the AS_PATH when the route is advertised towards neighboring ASs of Mipnet (the remove-private-as command is required on all eBGP sessions). The customer route will appear in the global Internet as originated by the Mipnet AS.
A customer with a private AS number cannot be a transit AS for an “official” AS (see footnote 11).
Each customer link will be connected in a separate VRF. Customer VRF will be assigned to appropriate UPSTREAM_ISP extranet.
Exchange of routing information.
CE => PE: customer prefixes (registered in RIPE / allocated by TMN)
PE => CE: VRF-specific default route and (optionally) routes of all Internet customers of Mipnet (extranet INET_CUST)
If the customer prefix has been allocated by TMN from Mipnet’s address space, it will be advertised to the global Internet within Mipnet aggregate. If that customer has registered a PI prefix with RIPE, it will be coloured with appropriate BGP community that permits export to neighboring ASs.
Routing policy implementation shall be pushed to CE routers as much as possible. This means that customer-specific requirements should not be configured on PE routers. The PE router will have a standard configuration profile that can be used for any Internet customer. For example, to implement a primary/backup scenario, the customer can either signal routers in the Mipnet AS with appropriate MEDs, or colour his routes with BGP community 29453:50 or 29453:90, which is interpreted as LOC_PREF 50 or 90 respectively by a route-map on Mipnet routers. The only customer-specific configuration will be the inbound prefix-list, which should be provisioned through ISC templates (to be verified with Guy).
Security of Routing Protocols.
Inbound prefix list on PE router will explicitly permit customer routes and deny other prefixes.
Outbound community list will only allow propagation of default route and (optionally) PI blocks in extranet INET_CUST.
BGP community string for routes received from Internet customer will be set according to Mipnet community scheme.
MD5 authentication shall be enabled on eBGP TCP session.
Figure 42 Profile #2: BGP routed customer with default route
Footnote 11: If the AS path includes both private and public autonomous system numbers, IOS considers this to be a configuration error and does not remove the private autonomous system numbers.
[Figure 42 diagram. Elements: upstream ISP AS1 with iGW1 and iGW2 (UPSTR_AS1 extranet); Mipnet routers PE1 and PE2; customer routers CEx (Internet customer ISPx) and CEy (Internet customer ASy, INET_CUSTy VRF) attached via eBGP, with a backup link. Legend: Lx = CE-PE links; B = full BGP Internet routing table (~120k routes); Dx = VRF-specific default route; Cx, Cy = customer routes of downstream ISP; each IPv4 route is labelled with the protocol it was received via (static, eBGP).]
Profile #3: BGP routed customer with full Internet routing table
A multihomed customer may need the full Internet routing table from one of Mipnet’s upstream ISPs. Because the Internet routing table must not be injected into Mipnet MPLS/VPNs (scalability concern), the customer border router will establish an eBGP multihop session directly with an upstream ISP of Mipnet. In the first phase this will be one or two iGWs in AS8585.
Exchange of routing information with Mipnet and BGP security measures are the same as in profile #2. It is up to customer to accept a default route from AS29453 or not.
Each customer link will be connected in a separate VRF. Customer VRF will be assigned to appropriate UPSTREAM_ISP extranet.
For eBGP multihop session between CE router and iGW we have two possibilities:
It can be “read-only”, meaning that the CE only downloads Internet routes from AS8585 and does not announce any routes across the multihop session. This results in an “asymmetric” AS_PATH: routes that the customer router downloads from AS8585 do not contain AS29453, whereas in the global Internet the customer’s routes are announced with AS29453 in the AS_PATH.
If this asymmetry is an issue for the customer’s inter-domain routing, the eBGP multihop session can also be used to announce customer routes to the upstream ISP (“read-write” mode). The iGW will prefer this route over the customer route advertised through AS29453 on account of the shorter AS_PATH. In either case, the routing setup between the customer and Mipnet does not change.
The customer CE router shall have a static route configured for the eBGP multihop peer. This will be a host route pointing to the IP address configured on the PE side of the CE-PE link. If eBGP multihop is implemented on Loopback addresses, these need to be installed in MP-iBGP, either as redistributed static routes or announced via eBGP from the CE router to Mipnet.
Figure 43 Profile #3: BGP routed customer with full Internet routing table
[Figure 43 diagram. Same topology and legend as Figure 42 (upstream ISP AS1 with iGW1 and iGW2, Mipnet PE1 and PE2, customers CEx and CEy, CE-PE links Lx, default routes Dx, customer routes Cx and Cy), with an additional eBGP multihop session between the customer CE router and the iGW, over which the full BGP Internet routing table (B) is received and the customer routes Cy are optionally announced.]
Routing with Peering Partners
Two ISPs that exchange their customers’ traffic with each other free of charge are called Peering Partners. The two ISPs establish a direct link (a private circuit or at a NAP) instead of routing customer traffic across expensive transit ISPs (cost reduction). This “shortcut” connection is controlled by the two Peering Partners and can be upgraded as necessary to prevent packet loss during peak periods (better QoS).
Routes from each Peering partner will be injected in a separate VPN, which will, in the same manner as with Upstream ISPs, allow creation of extranets as requested by a given Internet customer or downstream ISP.
Exchange of routing information.
Peering Partner => PE: routes of peering partner customers (registered in RIPE)
PE => Peering Partner: routes of Mipnet customers in relevant extranet
Security of Routing Protocols.
A peering partner is typically an SP with a customer base that may change on a daily or weekly basis. This would require TMN Operations to update the inbound prefix-list accordingly, which could become an operational nightmare. A good compromise is to limit only the number of routes that can be accepted from a peering partner, which protects against major DoS accidents (e.g. the full Internet routing table exported from the peering partner into Mipnet). For example, if the number of routes received from a peering partner under normal network operation is between 40 and 50, the maximum number of accepted prefixes (configured inbound on the eBGP session) could be set to 80.
If a peering partner, by accident or intentionally, advertises a route that belongs to a customer of Mipnet, some level of protection is provided by the lower LOC_PREF value of the route received from the peering partner. This does not help if the peering partner advertises a more specific route, because the longest matching prefix is used for forwarding regardless of LOC_PREF; if TMN suspects this could be an issue, it is recommended to install strict inbound prefix-lists.
Outbound community list will only allow propagation of PI blocks in the relevant extranet.
BGP community string for routes received from the peering partner will be set according to Mipnet community scheme.
MD5 authentication shall be enabled on eBGP TCP session.
8.3 QoS
8.3.1 Introduction
In order to fulfil TMN requirements of having four distinct classes of service, potentially each with their specific service characteristics, QoS mechanisms will be deployed on the access layer and backbone links. The following section describes the technical implementation and features that form the basis for a set of new innovative products.
Scalability and stability are the main criteria for any extension of the network. It is absolutely necessary to aggregate IP streams with identical flow characteristic. The expression used for this solution is “service classes”. Dedicated handling of single streams is only meaningful in special cases when high bandwidths are involved, and there are no plans for this solution to be introduced in the first instance.
The number of service classes should be strictly limited from the technical point of view. This does not restrict the construction of various commercial products on top of them. Service level agreements (SLAs) form the definition interface for the service that TMN delivers to the customer. SLA parameters should describe a probability for a certain service and will be reported on a per-class basis.
For Mipnet a robust solution that aligns to base ideas of IETF's DiffServ approach would appear to be practicable at present. With respect to the intended MPLS solution, a maximum of 8 code points per path can be supported. These are distinguished using the three experimental bits of the MPLS shim header. A large part of best effort background traffic is required to produce efficient high quality service classes because DiffServ is based on relative priorities. The strength of a large IP backbone network is to be seen in the fact that high-priority and low-priority traffic is merged on a single network platform. This results in synergy that permits optimum resource utilisation. The bundling of many different traffic streams (statistical multiplexing) smoothes individual bursts.
8.3.2 Differentiated Services Model – Introduction
This section is intended as an introduction to the Differentiated Services (DiffServ) reference model.
DiffServ is a model in which intermediate systems treat traffic with relative priorities based on the Type of Service (ToS) or Differentiated Services Code Point (DSCP) field. Defined in RFCs 2474 and 2475, the DiffServ standard supersedes the original specification for defining packet priority described in RFC 791.
The DiffServ standard proposes a new way of interpreting a field that has always been part of an IP packet. In the DiffServ standard, the ToS field is renamed the Differentiated Services (DS) field, whose six most significant bits form the Differentiated Services Code Point (DSCP), and takes on a new meaning. The DiffServ standard increases the number of definable priority levels by re-allocating bits of the IP header for priority marking.
As per RFC 791, the ToS field occupies one entire byte (eight bits) of the IP header. Precedence refers to the three most significant bits of the ToS field, that is, [XXX]XXXXX. There may be some confusion because RFC 1349 defines a 4-bit ToS subfield, XXX[XXXX]X, as shown in the following picture.
Figure 44 Various interpretations of the TOS field
The three most significant bits of the RFC-791 ToS field - the precedence bits - define the IP packet priority or importance.
XXX00000 Bits 0,1,2 = Precedence, where:
111 = Network Control = Precedence 7
110 = Internetwork Control = Precedence 6
101 = CRITIC/ECP = Precedence 5
100 = Flash Override = Precedence 4
011 = Flash = Precedence 3
010 = Immediate = Precedence 2
001 = Priority = Precedence 1
000 = Routine = Precedence 0
The four bits of the RFC-1349 TOS are used in IOS configuration and have the following semantics:
000XXXX0 Bits 3, 4, 5, 6:
1000 = Minimize delay
0100 = Maximize throughput
0010 = Maximize reliability
0001 = Minimize monetary cost
0000 = Normal service
0000000X Bit 7: Reserved for future use
This one-byte ToS field has been almost completely unused since it was proposed almost 20 years ago. Only in the last few years have Cisco and other router companies begun utilising the Precedence bits for making forwarding decisions.
The DiffServ standard follows a similar scheme to RFC 791, but utilises more bits for setting priority. The new standard maintains backward compatibility with RFC 791 implementations, but allows more efficient use of bits 3, 4, and 5. (Bits 6 and 7 will still be reserved for future development.) With the additional 3 bits, six bits are available in total, giving 64 possible code points instead of the previous 8 precedence levels.
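The three interpretations of the same byte (RFC 791 precedence, RFC 1349 ToS bits, DiffServ DSCP) can be compared with a short decoding sketch. The function name and the returned dictionary layout are illustrative choices, not part of any standard; bit 0 is the most significant bit, as in the text.

```python
def decode_tos_byte(tos: int) -> dict[str, int]:
    """Decode one ToS/DS byte according to the three interpretations."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("the ToS field is a single byte")
    return {
        "precedence": tos >> 5,         # bits 0-2 (RFC 791)
        "tos_bits": (tos >> 1) & 0x0F,  # bits 3-6 (RFC 1349)
        "dscp": tos >> 2,               # bits 0-5 (RFC 2474)
    }


# 0xB8 = 10111000: DSCP 101110 (46), which older nodes read as precedence 5.
print(decode_tos_byte(0xB8))
# 0x10 = 00010000: RFC 1349 "minimize delay" (1000) with precedence 0.
print(decode_tos_byte(0x10))
```

The first example shows the backward compatibility discussed above: a precedence-only node still sees a meaningful priority in a DSCP-marked packet.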
RFC 2475 defines Per Hop Behaviour (PHB) as the externally observable forwarding behaviour applied at a DiffServ-compliant node to a DiffServ Behaviour Aggregate (BA).
With the ability of the system to mark packets according to DSCP setting, collections of packets with the same DSCP setting and sent in a particular direction can be grouped into a BA. Packets from multiple sources or applications can belong to the same BA.
In other words, a PHB refers to the packet scheduling, queuing, policing, or shaping behaviour of a node on any given packet belonging to a BA, as configured by a service level agreement (SLA) or a policy map.
The following sections describe the four available standard PHBs:
Default PHB (as defined in RFC 2474)
Class-Selector PHB (as defined in RFC 2474)
Assured Forwarding (AFxy) PHB (as defined in RFC 2597)
Expedited Forwarding (EF) PHB (as defined in RFC 2598)
Default PHB
The default PHB essentially specifies that a packet marked with a DSCP value of 000000 (recommended) receives the traditional best-effort service from a DS-compliant node (that is, a network node that complies with all of the core DiffServ requirements). Also, if a packet arrives at a DS-compliant node, and the DSCP value is not mapped to any other PHB, the packet will get mapped to the default PHB.
For more information about default PHB, refer to RFC 2474, Definition of the Differentiated Services Field in IPv4 and IPv6 Headers.
Class-Selector PHB:
To preserve backward-compatibility with any IP Precedence scheme currently in use on the network, DiffServ has defined a DSCP value in the form xxx000, where x is either 0 or 1. These DSCP values are called Class-Selector Code Points. (The DSCP value for a packet with default PHB 000000 is also called the Class-Selector Code Point.)
The PHB associated with a Class-Selector Code Point is a Class-Selector PHB. These Class-Selector PHBs retain most of the forwarding behaviour of nodes that implement IP Precedence-based classification and forwarding.
For example, packets with a DSCP value of 110000 (the equivalent of the IP Precedence-based value of 110) have preferential forwarding treatment (for scheduling, queuing, and so on), as compared to packets with a DSCP value of 100000 (the equivalent of the IP Precedence-based value of 100). These Class-Selector PHBs ensure that DS-compliant nodes can coexist with IP Precedence-based nodes.
The DiffServ standard utilises the same precedence bits (the most significant bits: 0, 1, and 2) for priority setting, but further clarifies their functions/definitions, plus offers finer priority granularity through use of the next three bits in the ToS field. DiffServ reorganises (and renames) the precedence levels (still defined by the three most significant bits of the ToS field) into the following categories:
Table 11 Class-Selector PHBs
Precedence 7 Stays the same (link layer and routing protocol keep alive)
Precedence 6 Stays the same (used for IP routing protocols)
Precedence 5 Class 5
Precedence 4 Class 4
Precedence 3 Class 3
Precedence 2 Class 2
Precedence 1 Class 1
Precedence 0 Best effort
For more information about class-selector PHB, refer to RFC 2474, Definition of the Differentiated Services Field in IPv4 and IPv6 Headers.
Assured Forwarding PHB
Assured Forwarding PHB is nearly equivalent to Controlled Load Service available in the integrated services model. AFxy PHB defines a method by which BAs can be given different forwarding assurances.
For example, network traffic can be divided into the following classes:
Gold: Traffic in this category is allocated 50 percent of the available bandwidth.
Silver: Traffic in this category is allocated 30 percent of the available bandwidth.
Bronze: Traffic in this category is allocated 20 percent of the available bandwidth.
Further, the AFxy PHB defines four AF classes: AF1, AF2, AF3, and AF4. Each class is assigned a specific amount of buffer space and interface bandwidth, according to the SLA with the service provider or policy map.
Within each AF class, you can specify three drop precedence (dP) values: 1, 2, and 3. Assured Forwarding PHB can be expressed as shown in the following example:
AFxy
In this example, x represents the AF class number (1, 2, 3, or 4) and y represents the dP value (1, 2, or 3) within the AFx class. In instances of network traffic congestion, if packets in a particular AF class (for example, AF1) need to be dropped, packets in the AF1 class will be dropped according to the following guideline:
dP(AFx1) <= dP(AFx2) <= dP(AFx3)
where dP(AFxy) is the probability that packets of the AFxy class will be dropped. In other words, y denotes the dP within an AFx class. The dP method penalises traffic flows within a particular BA that exceed the assigned bandwidth. Packets on these offending flows could be re-marked by a policer to a higher drop precedence.
Bits 3 and 4 of DiffServ field allow further priority granularity through the specification of a packet drop probability for any of the defined classes. Collectively, Classes 1-4 are referred to as Assured Forwarding (AF). The following table illustrates the DSCP coding for specifying the priority level (class) plus the drop percentage. (Bits 0, 1, and 2 define the class; bits 3 and 4 specify the drop percentage; bit 5 is always 0.)
Using this system, a device would first prioritise traffic by class, then differentiate and prioritise same-class traffic by considering the drop percentage. It is important to note that this standard has not specified a precise definition of "low," "medium," and "high" drop percentages. Additionally, not all devices will recognise the DiffServ bit 3 and 4 settings. Remember also that even when the settings are recognised, they do not necessarily trigger the same forwarding action on each type of device on the network: each device will implement its own response in relation to the packet priorities it detects. The DiffServ standard is meant to allow a finer granularity of priority setting for the applications and devices that can make use of it, but it does not specify interpretation (that is, action to be taken).
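The bit layout described above (bits 0-2 for the class, bits 3-4 for the drop precedence, bit 5 always 0) determines the decimal DSCP value of every AFxy and Class-Selector code point. A minimal sketch, with illustrative function names:

```python
def class_selector_dscp(precedence: int) -> int:
    """Class-Selector code point xxx000 for an IP precedence value 0-7."""
    if not 0 <= precedence <= 7:
        raise ValueError("precedence must be 0-7")
    return precedence << 3


def af_dscp(af_class: int, drop_precedence: int) -> int:
    """DSCP of AFxy: class in bits 0-2, drop precedence in bits 3-4, bit 5 = 0."""
    if af_class not in (1, 2, 3, 4):
        raise ValueError("AF class must be 1-4")
    if drop_precedence not in (1, 2, 3):
        raise ValueError("drop precedence must be 1-3")
    return (af_class << 3) | (drop_precedence << 1)


for x in (1, 2, 3, 4):
    print([f"AF{x}{y}={af_dscp(x, y)}" for y in (1, 2, 3)])
```

Within one class, a higher y only raises the drop probability; the scheduling class (and therefore the Class-Selector value seen by precedence-based nodes) stays the same.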
Expedited Forwarding PHB
Resource Reservation Protocol (RSVP), a component of the integrated services model, provides a Guaranteed Bandwidth Service. Applications such as Voice over IP (VoIP), video, and online trading programs require this kind of robust service. The EF PHB, a key ingredient of DiffServ, supplies this kind of robust service by providing low loss, low latency, low jitter, and assured bandwidth service.
EF PHB is ideally suited for applications such as VoIP that require low bandwidth, guaranteed bandwidth, low delay, and low jitter. The recommended DSCP value for EF PHB is 101110.
For more information about EF PHB, refer to RFC 2598, An Expedited Forwarding PHB.
Figure 45 DSCP Interpretation
Columns, by the three most significant DSCP bits: 000 = Class 0 (Prec 0); 001 = Class 1 (Prec 1); 010 = Class 2 (Prec 2); 011 = Class 3 (Prec 3); 100 = Class 4 (Prec 4); 101 = Reserved (Prec 5); 110 = Routing (Prec 6); 111 = Routing (Prec 7).
Rows, by the three least significant DSCP bits:
xxx 000, Class-Selector PHBs: 000 000 = BE PHB (DSCP 0); 001 000 = CS PHB (DSCP 8); 010 000 = CS PHB (DSCP 16); 011 000 = CS PHB (DSCP 24); 100 000 = CS PHB (DSCP 32); 101 000 = CS PHB (DSCP 40); 110 000 = CS PHB (DSCP 48); 111 000 = CS PHB (DSCP 56)
xxx 001, Unused
xxx 010, Low Drop Precedence: 001 010 = AF11 (DSCP 10); 010 010 = AF21 (DSCP 18); 011 010 = AF31 (DSCP 26); 100 010 = AF41 (DSCP 34)
xxx 011, Unused
xxx 100, Medium Drop Precedence: 001 100 = AF12 (DSCP 12); 010 100 = AF22 (DSCP 20); 011 100 = AF32 (DSCP 28); 100 100 = AF42 (DSCP 36)
xxx 101, Unused
xxx 110, High Drop Precedence: 001 110 = AF13 (DSCP 14); 010 110 = AF23 (DSCP 22); 011 110 = AF33 (DSCP 30); 100 110 = AF43 (DSCP 38); 101 110 = EF PHB (DSCP 46)
xxx 111, Unused
QoS and VoIP
Although TMN currently does not have VoIP customers, the QoS design in Mipnet shall address this requirement.
Voice quality is directly affected by two major factors:
Lost packets
Delayed packets
Packet loss causes voice clipping and skips. The industry standard codec algorithms used in Cisco Digital Signal Processor (DSP) can correct for up to 30 ms of lost voice. Cisco Voice over IP (VoIP) technology uses 20-ms samples of voice payload per VoIP packet. Therefore, for the codec correction algorithms to be effective, only a single packet can be lost during any given time.
Packet delay can cause either voice quality degradation due to the end-to-end voice latency or packet loss if the delay is variable. If the end-to-end voice latency becomes too long (250 ms, for example), the conversation begins to sound like two parties talking on a CB radio. If the delay is variable, there is a risk of jitter buffer overruns at the receiving end. Eliminating drops and delays is even more imperative when including fax and modem traffic over IP networks. If packets are lost during fax or modem transmissions, the modems are forced to "retrain" to synchronize again. By examining the causes of packet loss and delay, we can gain an understanding of why Quality of Service (QoS) is needed.
Network congestion can lead to both packet drops and variable packet delays. Voice packet drops from network congestion are usually caused by full transmit buffers on the egress interfaces somewhere in the network. As links or connections approach 100% utilization, the queues servicing those connections become full. When a queue is full, new packets attempting to enter the queue are discarded.
Because network congestion is typically sporadic, delays from congestion tend to be variable in nature. Egress interface queue wait times or large serialization delays cause variable delays of this type. Both of these factors are discussed in the next section, "Delay and Jitter".
Delay is the time it takes for a packet to reach the receiving endpoint after being transmitted from the sending endpoint. This time is termed the "end-to-end delay” and it consists of two components: fixed network delay and variable network delay. Jitter is the delta, or difference, in the total end-to-end delay values of two voice packets in the voice flow.
Fixed network delay should be examined during the initial design of the VoIP network. The International Telecommunications Union (ITU) standard G.114 states that a one-way delay budget of 150 ms is acceptable for high voice quality. Research at Cisco has shown that there is a negligible difference in voice quality scores using networks built with 200-ms delay budgets. Examples of fixed network delay include the propagation delay of signals between the sending and receiving endpoints, voice encoding delay, and the voice packetization time for various VoIP codecs. Propagation delay calculations work out to almost 0.0063 ms/km. The G.729A codec, for example, has a 25 ms encoding delay value (two 10 ms frames + 5 ms look-ahead) and an additional 20 ms of packetization delay.
Congested egress queues and serialization delays on network interfaces can cause variable packet delays. Without Priority or Low-Latency Queuing (LLQ), queuing delay times equal serialization delay times as link utilization approaches 100%. Serialization delay is a constant function of link speed and packet size. As shown in Table 12, the larger the packet and the slower the link clocking speed, the greater the serialization delay. While this is a known ratio, it can be considered variable because a larger data packet can enter the egress queue before a voice packet at any time.
If the voice packet must wait for the data packet to serialize, the delay incurred by the voice packet is its own serialization delay plus the serialization delay of the data packet in front of it. Using Link Fragmentation and Interleave (LFI) techniques, serialization delay can be configured to be a constant delay value.
Table 12 Serialisation delay [ms] as function of link speed and packet size
Link speed           64 bytes   128 bytes   256 bytes   512 bytes   1024 bytes   1500 bytes
56 kbps              9          18          36          72          144          214
64 kbps              8          16          32          64          128          187
128 kbps             4          8           16          32          64           93
256 kbps             2          4           8           16          32           46
512 kbps             1          2           4           8           16           23
2048 kbps (E1)       0.25       0.5         1           2           4            5.8
34 Mbps (E3)         0.015      0.03        0.06        0.12        0.24         0.35
155 Mbps (STM-1)     0.0033     0.006       0.013       0.026       0.052        0.077
622 Mbps (STM-4)     0.00082    0.0016      0.0033      0.0066      0.013        0.019
2.5 Gbps (STM-16)    0.0002     0.0004      0.00082     0.0016      0.0033       0.0048
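The entries of Table 12 follow from dividing the packet size in bits by the link speed. A minimal sketch that recomputes a few cells (the function name is illustrative, and nominal link speeds are assumed rather than exact line rates):

```python
def serialization_delay_ms(packet_bytes: int, link_kbps: float) -> float:
    """Time to clock a packet onto the wire: bits / (kbit/s) gives milliseconds."""
    return packet_bytes * 8 / link_kbps


packet_sizes = (64, 128, 256, 512, 1024, 1500)
for name, kbps in (("56 kbps", 56), ("2048 kbps (E1)", 2048), ("34 Mbps (E3)", 34_000)):
    row = [round(serialization_delay_ms(size, kbps), 3) for size in packet_sizes]
    print(name, row)
```

The same function also gives the extra wait a voice packet incurs behind one data packet already being transmitted, which is the effect LFI is meant to bound.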
Because network congestion can be encountered at any time within a network, buffers can fill instantaneously. This instantaneous buffer utilization can lead to a difference in delay times between packets in the same voice stream. This difference, called jitter, is the variation between when a packet is expected to arrive and when it actually is received. To compensate for these delay variations between voice packets in a conversation, VoIP endpoints use jitter buffers to turn the delay variations into a constant value so that voice can be played out smoothly.
Cisco VoIP endpoints use DSP algorithms that have an adaptive jitter buffer between 20 and 50 ms, as illustrated in the following picture. The actual size of the buffer varies between 20 and 50 ms based on the expected voice packet network delay. These algorithms examine the timestamps in the Real-time Transport Protocol (RTP) header of the voice packets, calculate the expected delay, and adjust the jitter buffer size accordingly. When this adaptive jitter buffer is configured, a 10-ms portion of "extra" buffer is configured for variable packet delays. For example, if a stream of packets is entering the jitter buffer with RTP timestamps indicating 23 ms of encountered network jitter, the receiving VoIP jitter buffer is sized at a maximum of 33 ms. If a packet's jitter is greater than 10 ms above the expected 23-ms delay variation (23 + 10 = 33 ms of dynamically allocated adaptive jitter buffer space), the packet is dropped.
Figure 46 Adaptive jitter buffer
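The sizing rule in the example above (expected jitter plus a 10-ms margin, within the 20 to 50 ms range) can be sketched as follows. This is a simplified model of the described behaviour, not the DSP algorithm itself; clamping to the 20-50 ms range is an assumption drawn from the stated buffer limits.

```python
MIN_BUFFER_MS = 20    # lower bound of the adaptive jitter buffer
MAX_BUFFER_MS = 50    # upper bound of the adaptive jitter buffer
EXTRA_MARGIN_MS = 10  # "extra" buffer for variable packet delays


def jitter_buffer_size_ms(expected_jitter_ms: float) -> float:
    """Buffer size for the jitter currently observed from RTP timestamps."""
    wanted = expected_jitter_ms + EXTRA_MARGIN_MS
    return min(MAX_BUFFER_MS, max(MIN_BUFFER_MS, wanted))


def is_dropped(packet_jitter_ms: float, expected_jitter_ms: float) -> bool:
    """A packet arriving later than the buffer can absorb is discarded."""
    return packet_jitter_ms > jitter_buffer_size_ms(expected_jitter_ms)


print(jitter_buffer_size_ms(23))  # the 23 + 10 ms example from the text
print(is_dropped(30, 23), is_dropped(35, 23))
```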
Voice quality is only as good as the quality of the weakest network link. Packet loss, delay, and delay variation all contribute to degraded voice quality. In addition, because network congestion (or more accurately, instantaneous buffer congestion) can occur at any time in any portion of the network, network quality is an end-to-end design issue.
Call admission control is another important issue that needs to be considered. Call admission control is a mechanism for ensuring that voice flows do not exceed the maximum provisioned bandwidth allocated for voice conversations. After doing the calculations to provision the network with the required bandwidth to support voice, data, and possibly video applications, it is important to ensure that voice does not oversubscribe the portion of the bandwidth allocated to it. While most QoS mechanisms are used to protect voice from data, call admission control is used to protect voice from voice. This is illustrated in the following figure, which shows an environment where the network has been provisioned to support two concurrent voice calls. If a third voice call is allowed to proceed, the quality of all three calls is degraded.
Call admission control should be external to the network (i.e. it is not available in the DiffServ context).
Figure 47 - Call admission control
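The principle of protecting voice from voice can be sketched as a simple call counter. This is a conceptual illustration only: the class and method names are hypothetical, and a real deployment would place this logic in the call-control platform rather than in the routers.

```python
class CallAdmissionControl:
    """Admit calls only while the provisioned voice capacity is not exceeded."""

    def __init__(self, max_concurrent_calls: int) -> None:
        self.max_concurrent_calls = max_concurrent_calls
        self.active_calls = 0

    def request_call(self) -> bool:
        """Return True if the call is admitted, False if it must be rejected."""
        if self.active_calls >= self.max_concurrent_calls:
            return False  # rejecting protects the quality of the existing calls
        self.active_calls += 1
        return True

    def release_call(self) -> None:
        if self.active_calls > 0:
            self.active_calls -= 1


# Network provisioned for two concurrent calls, as in the example above.
cac = CallAdmissionControl(max_concurrent_calls=2)
print([cac.request_call() for _ in range(3)])  # the third call is refused
```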
Interleaving mechanisms: FRF.12 or MLPPP / LFI
For low-speed WAN connections (in practice, those with a clocking speed of 1 Mbps or below), it is necessary to provide a mechanism for Link Fragmentation and Interleaving (LFI). A data frame can be sent to the physical wire only at the serialization rate of the interface. This serialization rate is the size of the frame divided by the clocking speed of the interface. For example, a 1500-byte frame takes 214 ms to serialize on a 56-kbps circuit. If a delay-sensitive voice packet is behind a large data packet in the egress interface queue, the end-to-end delay budget of 150-200 ms could be exceeded. In addition, even relatively small frames can adversely affect overall voice quality by simply increasing the jitter to a value greater than the size of the adaptive jitter buffer at the receiver.
LFI tools are used to fragment large data frames into regularly sized pieces and to interleave voice frames into the flow so that the end-to-end delay can be predicted accurately. This places bounds on jitter by preventing voice traffic from being delayed behind large data frames, as illustrated in the following figure. The two techniques used for this are FRF.12 for Frame Relay and Multilink Point-to-Point Protocol (MLPPP) for point-to-point serial links.
Figure 48 LFI to reduce frame delay and jitter
A 10-ms blocking delay is the recommended target to use for setting fragmentation size. To calculate the recommended fragment size, divide the recommended 10 ms of delay by the time needed to send one byte at the provisioned line clocking speed, as follows:
Fragment_Size = (Max_Allowed_Jitter * Link_Speed_in_kbps) / 8
For example:
Fragment_Size = (10 ms * 56) / 8 = 70 bytes
The following table shows the recommended fragment size for various link speeds.
Table 13 Recommended fragment size
Link speed (kbps)    Recommended fragment size (bytes)
56                   70
64                   80
128                  160
256                  320
512                  640
768                  960
Obviously, the fragmentation size should be set larger than the largest VoIP packet in order to ensure that no VoIP packets get fragmented.
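The fragment size formula and the check against the largest VoIP packet can be sketched as follows. The 60-byte VoIP packet used in the example is an assumption (G.729 with 20 ms of voice: 20 bytes of payload plus 40 bytes of IP/UDP/RTP headers, layer-2 overhead excluded); the function names are illustrative.

```python
def fragment_size_bytes(link_kbps: int, max_delay_ms: int = 10) -> int:
    """Fragment_Size = (Max_Allowed_Jitter * Link_Speed_in_kbps) / 8."""
    return link_kbps * max_delay_ms // 8


def voice_fits_unfragmented(link_kbps: int, voip_packet_bytes: int) -> bool:
    """True if a VoIP packet of this size is never fragmented on the link."""
    return fragment_size_bytes(link_kbps) > voip_packet_bytes


for speed in (56, 64, 128, 256, 512, 768):
    print(speed, "kbps ->", fragment_size_bytes(speed), "bytes")

# Assumed 60-byte G.729 packet on the slowest link in the table.
print(voice_fits_unfragmented(56, 60))
```

If the check fails for the chosen codec and link speed, either the blocking-delay target or the packetization interval has to be revisited.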
When using FRF.12 as an LFI mechanism on a Frame Relay access link, traffic shaping (either FRTS or dTS) becomes mandatory. Enabling FRF.12 will have an impact on the FRTS / dTS shaping parameters, since it adds 4 bytes of overhead to each fragment (2 bytes of FRF.12 overhead and 2 bytes of Cisco encapsulation overhead). The FRTS implementation takes this additional overhead into account (but still not the FCS and flag overhead), whereas dTS does not take the additional FRF.12 / Cisco encapsulation overhead into account. This is because FRF.12 runs in distributed mode on the VIP (dFRF.12).
Delay Model
The delay model for an IP packet consists of the sum of the individual delays of the nodes and links that are part of the end-to-end connection. The main factors that determine the overall end-to-end delay are typically:
Serialisation delay of narrow-band links
Propagation delays of long distance connections
Queuing delay in case of congestion situations
All times have to be described statistically, and must be seen as average in a certain time period.
Table 14 The components of the end-to-end delay model
Decision Delay, $T_{\text{Decision}}$
This is the time a node requires to decide which interface a packet should go out of. It can depend on node utilisation, but in general on the high-end platforms $T_{\text{Decision}}$ < 1 ms.
Queuing Delay, $T_{\text{Queuing}}$
Queuing delay depends on several variables: queue length, queuing mechanism, line utilisation, platform and CPU utilisation.
During times of non-congestion, there is no queuing delay; once congestion occurs, the extra CPU cycles required to manage the scheduling have a small impact on the delay variable in the network.
Serialisation Delay, $T_{\text{Serialisation}}$
This is the time necessary to put a packet of a certain size on a line of a certain speed (see Table 12).
Transmit Buffer Delay, $T_{\text{Transmit}}$
On the egress interface a single buffer exists, which additionally influences the transmit delay. This buffer is used to control the various queuing mechanisms (CBWFQ/MDRR) in front of the transmit queue by using a threshold. The length of this queue can be configured. A suitable set-up has to be decided upon to minimise delay and maximise efficiency.
Propagation Delay, $T_{\text{Propagation}}$
Describes the signal propagation time in a fibre, which is about 6 ms per 1000 km (2/3 c0).
Node Delay, $T_{\text{Node}}$
The node delay summarises all node dependent delays per node.
$T_{\text{Node}} = T_{\text{Decision}} + T_{\text{Queuing}} + T_{\text{Transmit}}$
Link Delay, $T_{\text{Link}}$
The link delay summarises all link dependent delays per link.
$T_{\text{Link}} = T_{\text{Serialisation}} + T_{\text{Propagation}}$
Core Delay, $T_{\text{Core}}$
The core delay summarises all core dependent delays, which are all node and link delays inside the core. This includes PE routers, P routers and the links in-between. Summarizing node and link delay for the core simplifies the delay model.
$T_{\text{Core}} = \sum_{\text{Core}} T_{\text{Node}} + \sum_{\text{Core}} T_{\text{Link}}$
Access Delay, $T_{\text{Access}}$
The access delay summarises all access dependent delays, which are all node and link delays in the access network. This includes CE routers, PE routers and the links in-between. Summarizing node and link delay for the access network simplifies the delay model.
$T_{\text{Access}(x)} = \sum_{\text{Access}(x)} T_{\text{Node}} + \sum_{\text{Access}(x)} T_{\text{Link}}$
End-to-End Delay, $T_{\text{End-to-End}}$
The end-to-end delay is defined by the following formula:
$T_{\text{End-to-End}} = T_{\text{Access(local)}} + T_{\text{Core}} + T_{\text{Access(remote)}}$
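The delay model can be evaluated numerically by summing node and link delays per segment and then adding the local access, core and remote access segments. The sketch below is illustrative: all delay values are hypothetical, and each node is assigned to exactly one segment so that it is not counted twice.

```python
from dataclasses import dataclass


@dataclass
class NodeDelay:
    decision_ms: float
    queuing_ms: float
    transmit_ms: float

    @property
    def total_ms(self) -> float:
        # Node delay = decision + queuing + transmit buffer delay
        return self.decision_ms + self.queuing_ms + self.transmit_ms


@dataclass
class LinkDelay:
    serialisation_ms: float
    propagation_ms: float

    @property
    def total_ms(self) -> float:
        # Link delay = serialisation + propagation delay
        return self.serialisation_ms + self.propagation_ms


def segment_delay_ms(nodes: list[NodeDelay], links: list[LinkDelay]) -> float:
    """Sum of all node and link delays in one segment (access or core)."""
    return sum(n.total_ms for n in nodes) + sum(l.total_ms for l in links)


# Hypothetical values: one CE and one access link per side, two P routers in the core.
access = segment_delay_ms([NodeDelay(0.5, 2.0, 1.0)], [LinkDelay(8.0, 0.1)])
core = segment_delay_ms(
    [NodeDelay(0.5, 0.0, 0.5), NodeDelay(0.5, 0.0, 0.5)],
    [LinkDelay(0.08, 1.2)],
)
end_to_end = access + core + access  # local access + core + remote access
print(round(access, 2), round(core, 2), round(end_to_end, 2))
```

In this example the access segments dominate the total, which matches the list of main factors above: serialisation on narrow-band links and queuing under congestion.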
Figure 49 Overview of end-to-end delay segments.
[Figure 49 diagram: the path CE - PE - P - P - PE - CE. Each node contributes Tnode (Tdecision, Tqueuing, Ttransmit) and each link contributes Tlink (Tserialization, Tpropagation); the segments at either end form Taccess and the segment in between forms Tcore.]
QoS in an MPLS network
MPLS is a technology allowing multi-service networking in an IP environment. In frame-based MPLS, QoS information is carried in the EXP bits of the MPLS header. The MPLS EXP field is only three bits long, while the DSCP field is six bits long. Therefore not all the information can be copied directly from the IP DSCP field into the MPLS EXP field. By default, only the class selector (the three most significant bits) is copied into the MPLS EXP bits, as demonstrated in the following figure.
Figure 50 DSCP to EXP mapping
[Figure 50 diagram: the IP L3 header with the DSCP codepoint field, whose three most significant bits form the Class Selector Codepoint and whose last two bits are unused, next to the MPLS header with the MPLS EXP field in bits 20-22; the L3 codepoint is copied from the DSCP to the MPLS EXP bits.]
Figure 50 demonstrates the DSCP/EXP location; the MPLS header is pre-pended to the front of the IP packet. It is also feasible that multiple labels are added to the front of the IP packet instead of the single one shown in the drawing (e.g. MPLS/VPN label, TE label, FRR label). In such a case, the QoS features in MPLS core devices shall only look at the EXP bits of the top-most label, as the DSCP and “inner” labels in the label stack may carry customer-defined classes of service.
Figure 51 DSCP / MPLS Headers
[Figure: in the IPv4 domain the IPv4 packet carries DSCP "abcd"; in the MPLS domain the same IPv4 packet (DSCP "abcd") is carried behind label x, whose EXP field holds "ab".]
8.3.3 Mipnet QoS design – An Overview
The following table and figure give an overview of the various QoS mechanisms that will be used in Mipnet.
The various QoS mechanisms and their configuration will be discussed in detail in the subsequent sections. Detailed configuration templates will be derived during staging procedures. It should be understood that the IP addresses, DLCI numbers, ACL numbers, etc. have been chosen for the sake of example and should be adapted to the specific requirements of Mipnet.
It is Cisco’s experience that a Quality of Service design and deployment is never a straightforward process – after an initial deployment, a performance assessment phase and subsequent tuning of the QoS deployment is a necessity. Therefore, we strongly recommend a tuning phase while beta customers are connected.
Table 15 CoS Mechanisms Overview
Marketing Class | Standard | Business | Streaming | Voice
Traffic type | Best Effort data (e.g. http) | Business data (e.g. SNA) | Multimedia (e.g. Video) | VoIP
PHB / DSCP / EXP, in-contract | BE / 0 / 0 | AF11 / 10 / 1 | AF31 / 26 / 3 | EF / 46 / 5
PHB / DSCP / EXP, out-contract | - | AF21 / 18 / 2 | AF41 / 34 / 4 | -
Max. % of link BW | 25% | 30% | 25% | 20%
Queue Length | long | medium | short | very short
Classification, CE | any non-classified packet | ACL 100 | ACL 101 | ACL 102
Classification, PE | DSCP | DSCP | DSCP | DSCP
Classification, P | EXP | EXP | EXP | EXP
Marking, CE | MQCLI | MQCLI | MQCLI | MQCLI
Marking, PE | - | - | - | -
Marking, P | - | - | - | -
Policing, CE | - | MQCLI | MQCLI | MQCLI
Policing, P/PE | - | - | - | -
Class Queuing, Access | class-default | business | streaming | voice (LLQ)
Class Queuing, Core | class-default | business | streaming | voice (LLQ)
Congestion Avoidance, CE and PE | DSCP WRED | DSCP WRED | DSCP WRED | Tail drop
Congestion Avoidance, P | EXP WRED | EXP WRED | EXP WRED | Tail drop (minTH=maxTH)
The drawing below displays an overview of QoS mechanisms used in Mipnet. The following chapters will detail the QoS design on a hop-by-hop basis, following the packet from source (left CE) to its destination (right CE).
Figure 52 QoS mechanisms overview
[Figure: packet path LAN - Managed CE - 7206VXR PE - GSR P - 7206VXR PE - Managed CE - LAN, with IP on the access segments and MPLS in the core. Queuing and congestion avoidance per hop: LLQ and WRED on the Managed CE and on the 7206VXR PE, MDRR and WRED on the GSR P. Mechanism groups shown along the path: Classification (ACL), Marking (DSCP), Policing (MQCLI), Queuing (DSCP), Cong. mgmt. (DSCP); Classification (DSCP), Marking (DSCP->EXP auto), Queuing (DSCP), Cong. mgmt. (DSCP); Classification (DSCP), Queuing (DSCP), Cong. mgmt. (DSCP); Classification (EXP), Queuing (EXP MDRR), Cong. mgmt. (EXP) [to-fabric, to-interface].]
8.3.4 CE-to-PE QoS mechanisms (applied on the CE)
Classification
On the CE, packets will be classified with extended access lists (ACLs). These ACLs can match packets on Source/Destination IP address, protocol type, and UDP/TCP port numbers.
The ACLs for Business (100), Streaming (101) and Voice (102) traffic should be agreed with the customer. Any non-classified traffic will go into the Standard traffic class, which is implemented as class-default in the MQC definition.
The following is an example ACL for Voice traffic:
!
! Voice
!
access-list 102 permit udp any any range 16384 32767
!
! Voice Signalling MGCP
!
access-list 102 permit udp any any eq 2427
access-list 102 permit tcp any any eq 2428
access-list 102 permit tcp any any eq 1720
!
! H.323 voice control traffic
!
access-list 102 permit tcp any any range 11000 11999
!
The ACL for Management (103) traffic should match SNMP, TFTP, TELNET and any other required traffic to and from the network management systems IP address range.
!
access-list 103 permit tcp any any eq bgp
access-list 103 permit udp any any eq rip
access-list 103 permit tcp any <NOC_lan> eq telnet
access-list 103 permit udp any <NOC_lan> eq snmp
access-list 103 permit udp any <NOC_lan> eq tftp
!
Voice signalling traffic will need to be classified and marked appropriately. Depending on the customer VoIP implementation, the different possibilities are:
RTCP: odd RTP port numbers
H.323 / H.245 standard connect: TCP 11xxx
H.323 / H.245 fast connect: TCP 1720
H.323 / H.225 RAS: UDP 1719
Skinny control traffic: TCP 2000-2002
ICCP: TCP 8001-8002
MGCP: UDP 2427, TCP 2428
Depending on the actual signalling method used (packet sizes), the speed of the access links and the number of concurrent voice call set-ups that need to be supported, two design options are possible with regard to the queuing method used.
Queue the voice signalling packets in the same PQ as the actual voice bearer packets. This will result in a simpler design but could delay the transmission of some of the voice bearer packets (dependent on voice signalling packet size, access link speed and number of concurrent voice call set-ups). This could then have an impact on the voice delay / jitter.
Queue the voice signalling packets in another normal class queue. This should ideally be a separate class queue from the ones that are used for regular data traffic to ensure delivery of the voice signalling packets. This will result in a more complicated design where bandwidth needs to be allocated for the voice signalling class. Also, voice signalling packets might be delayed through the network resulting in a delay in the voice call set-up process. The advantage is that the actual voice quality will not be impacted as no voice signalling packets will travel in the PQ.
Testing has indicated that, without cRTP (Compressed Real Time Protocol) enabled, the effect of mapping VoIP signalling packets together with the VoIP bearer packets in the same priority queue is negligible. The signalling packets have little effect on latency, and they do not cause any drops, thanks to the default burst size of 200ms that is built into the priority queue. Therefore, the design recommendation is to match the VoIP signalling packets with ACL 102 and queue them together with the VoIP bearer packets in the priority queue.
It should however be understood that VoIP signalling implementations differ and that some might have a negative effect on the performance of the priority queue. In that event, the VoIP signalling traffic needs to be mapped in another class queue (Business, for example).
The classified traffic will subsequently be mapped in their respective classes using the MQCLI. The Standard traffic will not match any of the classes and will be mapped in the default class (class-default). A maximum of 64 classes can be defined on a single router.
!
class-map match-all business
 match access-group 100
class-map match-all streaming
 match access-group 101
class-map match-all voice
 match access-group 102
class-map match-any management
 match access-group 103
!
Marking
After classification, packets need to be marked with their appropriate IP precedence or DSCP value. The following is the required configuration for Class Based Marking on CE router.
The following is the required configuration for LPR marking of the locally generated management traffic. As discussed before, ACL 103 matches all management traffic.
!
ip local policy route-map management
!
route-map management permit 10
 match ip address 103
 ! here we simulate the set ip dscp 48 command
 set ip precedence 6
 set ip tos 0
!
The In/Out-contract design is less restrictive than simply policing the class bandwidth to the SLA limit, because it allows the customer to exceed the subscribed class bandwidth thresholds when other classes on the CE-PE link are underutilised. This is because, in the CBWFQ queuing strategy, the bandwidth of underutilised traffic classes can be consumed by other classes in proportion to their respective configured class bandwidths.
Instead of policing in each of the traffic classes, it is possible to introduce a mechanism of in / out contract for the Business and Streaming traffic classes. The main reasons behind this recommendation are twofold:
In an MPLS / VPN environment, it should be avoided that well behaving customer sites are penalised by ill-behaving customer sites. A well behaving customer site is a site which sends traffic into the network below the Ingress Committed Rate (ICR), and this on a per traffic class basis. An ill behaving site sends traffic into the network above the ICR for a particular traffic class. The problem is that, if a well behaving site and an ill behaving site both send traffic to a third site, congestion might occur on the egress PE to that site. If there is no way of differentiating between the “well behaving” traffic and “ill behaving” traffic, traffic from the well behaving site might be dropped instead of traffic from the ill behaving site. The introduction of an in / out contract traffic marking mechanism at the ingress CE will prevent this.
The introduction of in / out contract traffic profiles will facilitate the capacity planning of the backbone network which is shared among the different MPLS / VPN customers. Indeed, the shared backbone network needs to be engineered and capacity planned only for the in-contract part of the customer traffic. When, in a second phase, QoS mechanisms are deployed in the core backbone network due to possible backbone congestion, it will be possible to differentiate the out-contract traffic from the in-contract traffic and as a result, discard the out-contract traffic earlier.
The following would be the required configuration for Police marking of the Business, Streaming and Voice traffic classes in Mipnet:
The in-contract Business traffic is marked as AF11 (DSCP 10). The out-contract Business traffic is marked as AF21 (DSCP 18).
The in-contract Streaming traffic is marked as AF31 (DSCP 26). The out-contract Streaming traffic is marked as AF41 (DSCP 34).
The Voice traffic is marked as EF (DSCP 46). The notion of out-contract traffic does not apply to jitter-sensitive Voice class (WRED is not applicable in LLQ).
!
policy-map customer_profile
 class business
  police 128000 8000 16000 conform-action set-dscp-transmit 10 exceed-action set-dscp-transmit 18
 class streaming
  police 64000 2000 2000 conform-action set-dscp-transmit 26 exceed-action set-dscp-transmit 34
 class voice
  police 64000 2000 2000 conform-action set-dscp-transmit 46 exceed-action drop
!
The following figure depicts the in/out-contract marking in the Business and Streaming traffic classes. As previously described, any packets beyond the subscribed bandwidth of the Business class are re-coloured and become subject to a more aggressive WRED dropping profile.
Figure 53 In/Out-contract Marking and Policing (example for Business class)
[Figure: an incoming IP packet (DSCP x) is marked by MQCLI against the SLA limit. Traffic within the SLA limit is coloured in-contract (DSCP 10); traffic above the SLA limit is re-coloured out-contract (DSCP 18). Congestion management (WRED) then drops out-contract traffic before any in-contract packet.]
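The in/out-contract colouring can be illustrated with a simplified two-colour token bucket. This is a sketch under stated assumptions, not the IOS policer: it uses a single bucket sized by the normal burst only and ignores the excess-burst behaviour of the real police command. The class name `TwoColourMarker` and its parameters are hypothetical.

```python
class TwoColourMarker:
    """Simplified single token bucket: conforming packets get one DSCP,
    exceeding packets get another (or are dropped when exceed_dscp is None)."""

    def __init__(self, cir_bps, burst_bytes, conform_dscp, exceed_dscp):
        self.rate_bytes_per_s = cir_bps / 8.0
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)  # bucket starts full
        self.last_s = 0.0
        self.conform_dscp = conform_dscp
        self.exceed_dscp = exceed_dscp

    def mark(self, now_s, size_bytes):
        """Return the DSCP to set on the packet, or None if it is dropped."""
        # Refill the bucket for the time elapsed since the previous packet.
        elapsed = now_s - self.last_s
        self.last_s = now_s
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_bytes_per_s)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return self.conform_dscp  # in-contract
        return self.exceed_dscp  # out-contract (re-coloured or dropped)


if __name__ == "__main__":
    # Business class from the example policy: 128 kbps, 8000-byte burst.
    business = TwoColourMarker(128000, 8000, conform_dscp=10, exceed_dscp=18)
    print([business.mark(0.0, 1500) for _ in range(6)])
```

With a burst of six back-to-back 1500-byte packets, the first five fit into the 8000-byte bucket and are marked DSCP 10; the sixth exceeds it and is re-coloured to DSCP 18. Passing `exceed_dscp=None` models the Voice class, where exceeding traffic is dropped instead of re-coloured.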
Policing
Policing in the Voice traffic class is configured to provide rudimentary call admission control, thereby limiting voice traffic levels into the core network. The policing is carried out by the exceed-action option at the end of the police command. Any traffic above the bandwidth of the expected number of voice calls will not be forwarded. If a customer attempts to exceed this limit, the quality of all simultaneous calls flowing through that specific CE-PE connection could degrade. However, this effect is much better than a single customer affecting all the other customers of Mipnet that share a specific backbone link.
If the Business and Streaming traffic classes need to be policed to subscribed SLA limits using MQCLI police commands, a few important points surrounding the policing implementation should be understood.
Policing propagates bursts to a certain extent. It does not shape the traffic flow and as such does not cause any packet delay.
Police bandwidths need to be configured in 8 Kbps multiples. This needs to be reflected in the TMN service offerings.
Compared to CAR, police bandwidths include some layer-2 overhead (please see the Class Queuing chapter for details).
The police configuration requires the setting of the <normal-burst> NB and <excess-burst> EB parameters. These are parameters used in police’s Token Bucket algorithm.
For TCP oriented classes such as Business class, the recommended settings for rate limit normal and excess burst are:
NB = max(8000, RTT x CIR/8) bytes
EB = 2 x NB
The calculation result is rounded to the nearest 1000-byte boundary. The following table identifies the recommended NB and EB values as a function of the access link speed, with an RTT of 0.05s.
Note: If the burst values are configured too low, the achieved rate may be much lower than the configured rate. For this reason the values below may not be appropriate for every traffic profile, and it is mandatory to test the NB/EB settings in a lab environment before deployment in the production network.
Table 16 NB and EB settings
CIR [kbps] NB [byte] EB [byte]
64 8000 16000
128 8000 16000
256 8000 16000
512 8000 16000
1024 8000 16000
2048 12800 25600
34368 214800 429600
100000 625000 1250000
155520 972000 1944000
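The NB and EB values in Table 16 can be reproduced with a short calculation. The sketch below uses integer arithmetic (CIR in kbps multiplied by RTT in milliseconds gives bits) and does not apply the 1000-byte rounding, since the table values match the unrounded formula. The function name `burst_sizes` is hypothetical.

```python
def burst_sizes(cir_kbps: int, rtt_ms: int = 50):
    """Return (NB, EB) in bytes for a TCP oriented class.

    NB = max(8000, RTT x CIR / 8) bytes, EB = 2 x NB.
    cir_kbps * rtt_ms is the number of bits sent in one RTT,
    because the factors 1000 (kbps) and 1/1000 (ms) cancel out.
    """
    nb = max(8000, cir_kbps * rtt_ms // 8)
    return nb, 2 * nb


if __name__ == "__main__":
    for cir in (64, 1024, 2048, 34368, 100000, 155520):
        nb, eb = burst_sizes(cir)
        print(f"{cir} kbps: NB={nb} EB={eb}")
```

For the VoIP class the text recommends fixed values instead (NB = 2000, EB = NB), so this function applies only to the TCP oriented classes.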
The recommended settings for rate limit normal and excess burst for the VoIP class:
NB = 2000
EB = NB (CBR like policer to avoid jitter)
Class Queuing
Queuing within the classes will be implemented through Low latency Queuing (LLQ). LLQ is in fact the combination of Class Based Weighted Fair Queuing (CBWFQ) and Priority Queuing (PQ). The PQ is used for delay sensitive traffic such as VoIP. LLQ will be configured through the MQCLI.
Different traffic classes – a maximum of 64 traffic classes can be defined on a single router – can be combined in a service policy, which acts as a traffic profile. Each of the classes in the service policy will be assigned a minimum bandwidth according to the service contract that has been agreed with the customer. The minimum bandwidth that can be configured is 8 Kbps. Under congestion, each of the traffic classes will have this minimum bandwidth available:
If one class is congested (and so experiences delay), the congestion is isolated from other classes, which still have a guaranteed minimum share of the link bandwidth.
If one class is under-utilised, other classes can use the available bandwidth. All flows and classes get a proportionate share of the spare bandwidth. The proportion is dictated by the configured bandwidth for classes where the higher the allocated bandwidth, the higher the proportion allocated. For flow-based weighted fair queuing, configurable in the default-queue, the proportion of available bandwidth is allocated based on the precedence of the packets where the packets with the highest precedence values get the highest proportion of bandwidth.
This enables worst-case bounds on delay and jitter to be designed independently between the classes whilst preventing any single class from being starved by over-utilisation of other classes. Also, other parameters like congestion avoidance and control parameters can be configured on a per-class basis. This will be discussed further on.
The sum of the minimum bandwidths reserved for the customer traffic classes needs to be lower than the total link bandwidth. Some bandwidth needs to be reserved for management traffic and routing traffic. Since TMN will offer a managed service, it needs to keep control over the CEs, even under congestion circumstances. Also the routing traffic – which is BGP or RIP in this case – needs to have some minimum bandwidth available (8 Kbps or 1 %, whichever is larger).
It should also be understood that the actual minimum bandwidths configured through MQCLI include the following layer 2 overhead, in contrast with CAR which only includes pure layer 3 IP bandwidth. Overhead added by the hardware (CRC, flags) is not included in the MQCLI bandwidths.
The 8 bytes of SNAP/LLC overhead and 4 bytes of the 8-byte AAL5 trailer for ATM interfaces (the remaining 4 bytes of the AAL5 trailer CRC are not taken into account). AAL5 padding is equally not taken into account. The ATM cell overhead (5 bytes per cell payload of 48 bytes) is not taken into account.
The 4-byte Frame Relay overhead for Cisco Frame Relay encapsulation (additional overhead due to possible FRF.12 headers is not taken into account). CRC and flags overhead is not taken into account.
The 2 bytes of PPP encapsulation overhead.
Also, all reports shall indicate the configured rates – so including the L2 overhead. It is worth considering for TMN to include the L2 overhead in traffic contracts with customers. This would ensure consistency in between the contracted bandwidths and the performance reports.
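To see what the layer 2 overhead means for a customer contract, the following sketch estimates the pure layer 3 rate that corresponds to a configured MQCLI rate. It is a rough per-packet model under assumptions: every packet has the same size, and only the overhead bytes that the text says are counted are included (2 bytes for PPP, 4 bytes for Cisco Frame Relay, 8 + 4 bytes for ATM AAL5 SNAP). The names are hypothetical.

```python
# Layer 2 bytes per packet that are counted in MQCLI bandwidths, per the text.
L2_OVERHEAD_BYTES = {
    "ppp": 2,
    "frame-relay": 4,
    "atm-aal5snap": 8 + 4,  # SNAP/LLC plus the counted half of the AAL5 trailer
}


def l3_rate_kbps(configured_kbps: float, ip_packet_bytes: int, encap: str) -> float:
    """Estimate the layer 3 rate carried within a configured (L2 inclusive) rate."""
    overhead = L2_OVERHEAD_BYTES[encap]
    return configured_kbps * ip_packet_bytes / (ip_packet_bytes + overhead)


if __name__ == "__main__":
    # Small voice-like packets are affected more than large data packets.
    print(l3_rate_kbps(64, 62, "ppp"))
    print(l3_rate_kbps(64, 1500, "ppp"))
```

The smaller the packets, the larger the share of the configured rate that is consumed by layer 2 overhead, which is one reason for stating contracted bandwidths including the L2 overhead.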
After defining the service policy in a policy-map, it needs to be applied on an interface (service-policy).
By default, on the non-distributed router platforms (non VIP based), the sum of the minimum bandwidths needs to be lower than 75 % of the configured access bandwidth. Since the actual required sum of minimum bandwidths will probably be larger, this default parameter setting can be changed (maximum-reserved-bandwidth) to 100 %. However, it is also a very good design practice not to push the design boundaries to the edge without allowing for any margin of error or unexpected traffic patterns. Therefore, it is still recommended to keep the sum of all minimum bandwidths below 100 %. Keeping the sum of all minimum bandwidths around 95 % will allow for unaccounted traffic such as layer 2 overhead, layer 2 keepalives, LMI (in the case of Frame Relay), etc.
The following is the sample configuration for LLQ class queuing. Class bandwidths can be configured in [kbps] or [%] of (max-res-bw – voice-bw).
!
policy-map customer_profile
 class business
  bandwidth percent 30
 class streaming
  bandwidth percent 20
 class voice
  priority 64
 class management
  bandwidth percent 5
 class class-default
  bandwidth percent 45
!
interface Serial0/1
 bandwidth 512
 encapsulation ppp
 max-reserved-bandwidth 95
 service-policy output customer_profile
 clockrate 512000
!
In the configuration template above, the Voice traffic class has been allocated 64 kbps of link capacity. The “priority” command guarantees bandwidth to the priority class and restrains the flow of packets from the priority class: when the link is not congested, the priority class traffic is allowed to exceed its allocated bandwidth (but we will police it to the contractual Voice class bandwidth). When the link is congested, the priority class traffic above the allocated bandwidth is discarded.
Business, Streaming, Management and Standard classes will share the remaining max-reserved-bandwidth as configured. For example, the Streaming traffic class will receive minimum bandwidth of ((512*95%)-64)*20% = 84 kbps in congestion periods.
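The bandwidth arithmetic above can be checked with a small calculation. The sketch follows the interpretation given in the text, namely that class percentages apply to (max-reserved-bandwidth minus the voice priority bandwidth). The function name is hypothetical.

```python
def cbwfq_min_bandwidth_kbps(link_kbps, max_reserved_pct, priority_kbps, class_pct):
    """Return the minimum bandwidth per class in kbps.

    class_pct maps a class name to its 'bandwidth percent' value.
    """
    reservable = link_kbps * max_reserved_pct / 100  # e.g. 512 * 95 %
    remaining = reservable - priority_kbps  # after the priority (voice) class
    return {name: remaining * pct / 100 for name, pct in class_pct.items()}


if __name__ == "__main__":
    shares = cbwfq_min_bandwidth_kbps(
        link_kbps=512,
        max_reserved_pct=95,
        priority_kbps=64,
        class_pct={"business": 30, "streaming": 20, "management": 5, "class-default": 45},
    )
    for name, kbps in shares.items():
        print(f"{name}: {kbps:.2f} kbps")
```

For the Streaming class this yields 84.48 kbps, which the text rounds down to 84 kbps.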
Congestion avoidance
Congestion avoidance techniques monitor network traffic loads in an effort to anticipate and avoid congestion at common network bottlenecks. Congestion avoidance is achieved through packet dropping. Among the more commonly used congestion avoidance mechanisms is Random Early Detection (RED), which is optimum for high-speed transit networks. Cisco IOS QoS includes an implementation of RED that, when configured, controls when the router drops packets. If there is no Weighted Random Early Detection (WRED) configured, the router uses the cruder default packet drop mechanism called tail drop.
WRED combines the capabilities of the RED algorithm with the IP Precedence feature. Within the section on WRED, the following related features are discussed:
Tail Drop. Tail drop is the default congestion avoidance behaviour when WRED is not configured. Tail drop treats all traffic equally and does not differentiate between classes of service within the same queue. Queues fill during periods of congestion. When the output queue is full and tail drop is in effect, packets are dropped until the congestion is eliminated and the queue is no longer full.
Weighted Random Early Detection. WRED avoids the global synchronisation problems that occur when tail drop is used as the congestion avoidance mechanism on the router. Global synchronisation occurs as waves of congestion crest only to be followed by troughs during which the transmission link is not fully utilised. Global synchronisation of TCP hosts, for example, can occur because packets are dropped all at once. Global synchronisation manifests when multiple TCP hosts reduce their transmission rates in response to packet dropping, then increase their transmission rates once again when the congestion is reduced.
About Random Early Detection
The RED mechanism was proposed by Sally Floyd and Van Jacobson in the early 1990s to address network congestion in a responsive rather than reactive manner. Underlying the RED mechanism is the premise that most traffic runs on data transport implementations that are sensitive to loss and will temporarily slow down when some of their traffic is dropped. TCP, which responds appropriately—even robustly—to traffic drop by slowing down its traffic transmission, effectively allows the traffic-drop behavior of RED to work as a congestion-avoidance signalling mechanism.
TCP constitutes the most heavily used network transport. Given the ubiquitous presence of TCP, RED offers a widespread, effective congestion-avoidance mechanism. The minimum threshold value should be set high enough to maximise the link utilisation. If the minimum threshold is too low, packets may be dropped unnecessarily, and the transmission link will not be fully used.
The difference between the maximum threshold and the minimum threshold should be large enough to avoid global synchronisation of TCP hosts (global synchronisation of TCP hosts can occur as multiple TCP hosts reduce their transmission rates). If the difference between the maximum and minimum thresholds is too small, many packets may be dropped at once, resulting in global synchronisation.
Random drops occur once the average queue length exceeds the minimum threshold. Once the average queue length equals the maximum threshold, the fraction of packets dropped equals the maximum drop probability value. When the average queue length is greater than the maximum threshold, all packets are dropped.
Weighted random early detection
WRED makes early detection of congestion possible and provides for multiple classes of traffic. It also protects against global synchronisation. For these reasons, WRED is useful on any output interface where congestion is expected to occur.
However, WRED is usually used in the core routers of a network, rather than at the edge of the network. Edge routers assign IP precedence to packets as they enter the network. WRED uses this precedence to determine how to treat different types of traffic.
WRED provides separate thresholds and weights for different IP precedence values, which makes it possible to provide different qualities of service with regard to packet dropping for different traffic types. Standard traffic may be dropped more frequently than premium traffic during periods of congestion.
DiffServ compliant WRED
DiffServ Compliant WRED extends WRED to support Differentiated Services (DiffServ) and Assured Forwarding (AF) Per Hop Behavior (PHB). This feature enables customers to implement AF PHB by coloring packets according to differentiated services code point (DSCP) values and then assigning preferential drop probabilities to those packets.
The dscp-based argument enables WRED to use the DSCP value of a packet when it calculates the drop probability for the packet. The prec-based argument enables WRED to use the IP Precedence value of a packet when it calculates the drop probability for the packet. After enabling WRED to use the DSCP value, you can then use the new random-detect dscp command to change the minimum and maximum packet thresholds for that DSCP value.
MPLS compliant WRED
The MPLS Compliant WRED feature enables WRED to use the MPLS EXP value when it calculates the drop probability for a packet. The MPLS EXP value is the 3-bit Experimental field in the label header.
MPLS based WRED is automatically enabled if the transmitted packet has an MPLS header, and it uses the same values as the precedence configuration.
WRED operation
WRED is a congestion avoidance and control mechanism whereby packets will be randomly dropped when the average class queue depth reaches a certain minimum threshold (min-threshold). As congestion increases, packets will be randomly dropped (and with a rising drop probability) until a second threshold (max-threshold) is reached, where packets will be dropped with a drop probability equal to 1/mark-probability-denominator. Above max-threshold, packets are tail-dropped.
The following picture depicts the WRED algorithm.
Figure 54 WRED Algorithm
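[Figure: drop probability (0 to 1) plotted against the average length of the class queue. The out-contract profile rises between minTHout and maxTHout; the in-contract profile rises between minTHin and maxTHin, with maxTHout = minTHin.]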
WRED will selectively instruct TCP stacks to back-off by dropping packets. Obviously, WRED has no influence on UDP based applications (besides the fact that their packets will be dropped equally).
The average queue depth is calculated using the following formula:
$new\_average = old\_average \cdot (1 - 2^{-e}) + current\_queue\_depth \cdot 2^{-e}$
The “e” is the “exponential weighting constant”. The larger this constant, the slower the WRED algorithm will react. The smaller this constant, the faster the WRED algorithm will react. The exponential weighting constant can be set on a per-class basis. The min-threshold, max-threshold and mark probability denominator can be set on a per precedence or per DSCP basis.
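The effect of the exponential weighting constant can be seen in a few lines of Python. This sketch only evaluates the averaging formula above; the function name and the queue depth values are illustrative assumptions.

```python
def update_average(old_average: float, current_queue_depth: float, e: int) -> float:
    """One step of the WRED average queue depth calculation.

    new_average = old_average * (1 - 2^-e) + current_queue_depth * 2^-e
    """
    weight = 2.0 ** -e
    return old_average * (1 - weight) + current_queue_depth * weight


if __name__ == "__main__":
    # A queue that jumps from empty to 64 packets and stays there.
    for e in (1, 4, 9):
        avg = 0.0
        for _ in range(10):
            avg = update_average(avg, 64, e)
        print(f"e={e}: average after 10 samples = {avg:.2f}")
```

With a small constant the average follows the instantaneous queue depth almost immediately, while with a large constant it moves only a small fraction of the way per sample, which is why short bursts do not trigger drops.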
The mark probability denominator should always be set to 1 (100 % drop probability at max-threshold).
WRED design objective in Mipnet
WRED will be applied on the Business, Streaming and Standard traffic classes. In the Business and Streaming traffic classes, the min-threshold and max-threshold values for the out-contract traffic will be lower than the min-threshold and max-threshold values of the in-contract traffic. This will ensure that all out-contract traffic will be dropped before in-contract traffic starts dropping (see footnote 12).
In order to reduce the packet delay and jitter in the Streaming class, smaller min-threshold and max-threshold values will be used compared to the Business and Standard classes.
In order to reduce the packet loss in the Business class, larger min-threshold and max-threshold values will be used compared to the Streaming class.
Footnote 12: This is not entirely true because with bursty traffic the average queue may not follow actual queue length very closely.
Again, it should be stressed that tuning QoS parameters is never a straightforward process and that the results depend on a large number of factors, including the offered traffic load and profile, the ratio of load to available capacity, the behaviour of end-system TCP stacks in the event of packet drops, etc. Therefore, it is strongly recommended to test these settings in a testbed environment using expected customer traffic profiles and to tune them, if required. In addition, after an initial production beta deployment, a performance assessment phase and subsequent tuning of the QoS deployment is a necessity.
Minimum and Maximum Thresholds
Each queue has a required length to serve its purpose of attempting to maintain specific maximum delay values. Depending on the service that will be using a specific queue, one may want to increase or decrease the time that packets are allowed in a queue before WRED starts dropping.
Different queue lengths have been selected for each of the defined classes. Each class serves data with distinct delay, jitter and packet loss sensitivities therefore dictating how long a queue can be before packets can be dropped.
The Business traffic class will be servicing mostly TCP data that is somewhat sensitive to delay but more so to packet loss, hence the medium sized queue. The Streaming traffic class will be serving data such as UDP based streaming video that is sensitive to delay but less so to packet loss. A short queue allows us to estimate end-to-end delay. The Standard traffic class serves best effort data without a specific maximum end-to-end delay or packet loss requirement. A long queue that starts dropping earlier than other queues, but at a lower rate because of a shallower RED curve, is therefore ideal.
The values used below are estimate values and must be adjusted once TMN has a better understanding of their traffic patterns and quality of service.
The minimum and maximum WRED threshold values are calculated on the basis of the allocated class bandwidth and not on the link bandwidth. This will yield the most realistic results. The following generic formula is used to derive WRED thresholds based on the maximum allowed delay:
The minimum and maximum queue thresholds for each of the service classes will be calculated as follows:
$maxTH = delay\,[s] \cdot \frac{classBW\,[byte/s]}{MTU\,[byte]} = delay\,[s] \cdot B\,[pkts/s]$
Business Class – Medium Queue – Max per-hop delay 100ms:
Min-threshold = 0.03 x B
Max-threshold = 0.1 x B
With B representing the class bandwidth in MTU sized packets per second. For Mipnet a MTU size of 1500 bytes is assumed. On the core trunks the management traffic will be carried in the Business class. For obvious reasons we have to protect the management traffic from customers’ traffic flows with less aggressive packet drop policy. The following are min and max thresholds for management traffic (DSCP 48) within the Business traffic class:
Min-threshold = 0.1 x B
Max-threshold = 0.2 x B
Streaming Class – Short Queue – Max per-hop delay 50ms:
Min-threshold = 0.015 x B
Max-threshold = 0.05 x B
With B representing the bandwidth in MTU sized packets per second. For Mipnet a MTU size of 1500 bytes is assumed.
Standard Class – Long Queue – Max per-hop delay 150ms:
Min-threshold = 0.045 x B
Max-threshold = 0.15 x B
With B representing the bandwidth in MTU sized packets per second. For Mipnet a MTU size of 1500 bytes is assumed.
WRED profile of Out-contract traffic will ensure that all out-contract traffic will be dropped before the dropping of in-contract packets starts. The out-contract WRED profile for Business and Streaming classes will be determined as:
Max-thresholdOUT = Min-thresholdIN
Min-thresholdOUT = max(1, 0.2 x Max-thresholdOUT)
For Voice traffic it is necessary to implement tail-drop to minimise and predict delay/jitter under congestion conditions. Therefore, no WRED will be used for the Voice traffic class (except on the GSR). WRED will also not be applied to management class.
The WRED min-threshold and max-threshold (calculated on basis of the class bandwidth) settings are as detailed in the following tables. They represent the values to be used across all platforms.
If TMN wishes to offer a class-bw, which is not included in the following tables, the min/max thresholds can be calculated as per formulas above.
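For a class bandwidth that is not listed, the thresholds can be derived with a short calculation such as the sketch below. It rests on assumptions inferred from the tables rather than stated in the text: B and the thresholds are rounded up to whole packets, and a floor of 3 (min) and 9 (max) is applied. The rounding of the out-contract min-threshold is likewise an assumption, and all names are hypothetical.

```python
MTU_BYTES = 1500  # MTU size assumed for Mipnet

# (min, max) threshold factors in thousandths of B, per traffic class.
CLASS_FACTORS = {
    "business": (30, 100),   # 0.03 x B, 0.1 x B   (100 ms)
    "streaming": (15, 50),   # 0.015 x B, 0.05 x B (50 ms)
    "standard": (45, 150),   # 0.045 x B, 0.15 x B (150 ms)
}
MIN_FLOOR, MAX_FLOOR = 3, 9  # smallest thresholds used in the tables


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up, avoiding floating point error."""
    return -(-a // b)


def packets_per_second(bw_kbps: int) -> int:
    """B: bandwidth expressed in MTU sized packets per second (rounded up)."""
    return ceil_div(bw_kbps * 1000, 8 * MTU_BYTES)


def wred_thresholds(bw_kbps: int, traffic_class: str):
    """Return (min-threshold, max-threshold) in packets for in-contract traffic."""
    b = packets_per_second(bw_kbps)
    min_f, max_f = CLASS_FACTORS[traffic_class]
    return (max(MIN_FLOOR, ceil_div(min_f * b, 1000)),
            max(MAX_FLOOR, ceil_div(max_f * b, 1000)))


def out_contract_thresholds(min_th_in: int):
    """Return (min-threshold, max-threshold) for out-contract traffic."""
    max_out = min_th_in                          # Max-thresholdOUT = Min-thresholdIN
    min_out = max(1, ceil_div(2 * max_out, 10))  # max(1, 0.2 x Max-thresholdOUT), rounded up
    return min_out, max_out
```

For example, `wred_thresholds(10000, "business")` returns `(26, 84)`, matching the link bandwidth columns for a 10000 kbps link. The class percentage columns of the tables appear to be derived from the rounded link value of B, so this sketch can differ from them by one packet.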
Table 17 WRED Settings for Business Class.
Link speed [kbps] | B [pkts/s] | Link BW: minTH maxTH | Class BW 10%: kbps minTH maxTH | Class BW 20%: kbps minTH maxTH | Class BW 25%: kbps minTH maxTH | Class BW 30%: kbps minTH maxTH
64 | 6 | 3 9 | - | - | - | -
128 | 11 | 3 9 | - | - | - | -
256 | 22 | 3 9 | - | - | - | -
512 | 43 | 3 9 | - | - | - | -
1024 | 86 | 3 9 | - | - | - | -
2048 | 171 | 6 18 | - | - | - | -
10000 | 834 | 26 84 | 1000 3 9 | 2000 6 17 | 2500 7 21 | 3000 8 26
34684 | 2891 | 87 290 | 3468 9 29 | 6937 18 58 | 8671 22 73 | 10405 27 87
100000 | 8334 | 251 834 | 10000 26 84 | 20000 51 167 | 25000 63 209 | 30000 76 251
155000 | 12917 | 388 1292 | 15500 39 130 | 31000 78 259 | 38750 97 323 | 46500 117 388
622000 | 51834 | 1556 5184 | 62200 156 519 | 124400 312 1037 | 155500 389 1296 | 186600 467 1556
2400000 | 200000 | 6000 20000 | 240000 600 2000 | 480000 1200 4000 | 600000 1500 5000 | 720000 1800 6000
For bandwidth values smaller than E1, calculated on a class percentage, the result would be less than 3 for the min-threshold and less than 9 for the max-threshold. Any smaller value would defeat the objectives of WRED, since the router would not allow for much burst and would react too aggressively in dropping packets.
These values are therefore not considered in the calculations.
Table 18 WRED Settings for Streaming Class.
Link speed [kbps] | B [pkts/s] | Link BW: minTH maxTH | Class BW 10%: kbps minTH maxTH | Class BW 20%: kbps minTH maxTH | Class BW 25%: kbps minTH maxTH | Class BW 30%: kbps minTH maxTH
64 | 6 | 3 9 | - | - | - | -
128 | 11 | 3 9 | - | - | - | -
256 | 22 | 3 9 | - | - | - | -
512 | 43 | 3 9 | - | - | - | -
1024 | 86 | 3 9 | - | - | - | -
2048 | 171 | 3 9 | - | - | - | -
10000 | 834 | 13 42 | 1000 3 9 | 2000 3 9 | 2500 4 11 | 3000 4 13
34684 | 2891 | 44 145 | 3468 5 15 | 6937 9 29 | 8671 11 37 | 10405 14 44
100000 | 8334 | 126 417 | 10000 13 42 | 20000 26 84 | 25000 32 105 | 30000 38 126
155000 | 12917 | 194 646 | 15500 20 65 | 31000 39 130 | 38750 49 162 | 46500 59 194
622000 | 51834 | 778 2592 | 62200 78 260 | 124400 156 519 | 155500 195 648 | 186600 234 778
2400000 | 200000 | 3000 10000 | 240000 300 1000 | 480000 600 2000 | 600000 750 2500 | 720000 900 3000
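Table 19 WRED Settings for Standard Class.
| Link speed (kbps) | B | Link BW minTH | Link BW maxTH | Class BW 10% (kbps) | Class BW 10% minTH | Class BW 10% maxTH | Class BW 20% (kbps) | Class BW 20% minTH | Class BW 20% maxTH | Class BW 25% (kbps) | Class BW 25% minTH | Class BW 25% maxTH | Class BW 30% (kbps) | Class BW 30% minTH | Class BW 30% maxTH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 64 | 6 | 3 | 9 | - | - | - | - | - | - | - | - | - | - | - | - |
| 128 | 11 | 3 | 9 | - | - | - | - | - | - | - | - | - | - | - | - |
| 256 | 22 | 3 | 9 | - | - | - | - | - | - | - | - | - | - | - | - |
| 512 | 43 | 3 | 9 | - | - | - | - | - | - | - | - | - | - | - | - |
| 1024 | 86 | 4 | 13 | - | - | - | - | - | - | - | - | - | - | - | - |
| 2048 | 171 | 8 | 26 | - | - | - | - | - | - | - | - | - | - | - | - |
| 10000 | 834 | 38 | 126 | 1000 | 4 | 13 | 2000 | 8 | 26 | 2500 | 10 | 32 | 3000 | 12 | 38 |
| 34684 | 2891 | 131 | 434 | 3468 | 14 | 44 | 6937 | 27 | 87 | 8671 | 33 | 109 | 10405 | 40 | 131 |
| 100000 | 8334 | 376 | 1251 | 10000 | 38 | 126 | 20000 | 76 | 251 | 25000 | 94 | 313 | 30000 | 113 | 376 |
| 155000 | 12917 | 582 | 1938 | 15500 | 59 | 194 | 31000 | 117 | 388 | 38750 | 146 | 485 | 46500 | 175 | 582 |
| 622000 | 51834 | 2333 | 7776 | 62200 | 234 | 778 | 124400 | 467 | 1556 | 155500 | 584 | 1944 | 186600 | 700 | 2333 |
| 2400000 | 200000 | 9000 | 30000 | 240000 | 900 | 3000 | 480000 | 1800 | 6000 | 600000 | 2250 | 7500 | 720000 | 2700 | 9000 |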
Drop Probability
The drop probability at max-threshold for all classes will initially be configured as mark-probability-denominator = 1. This means that when the average queue length reaches the max-threshold, all packets will be dropped until the average goes below the max-threshold.
The formula for this is:
Drop probability at max-threshold = 1 / mpd
Setting the mpd to 2, for instance, gives 1/2 according to the formula above, which means that at the max-threshold only half (50%) of all packets are dropped. The rate at which packets are dropped as the average queue length increases is then also lower than with an mpd of 1, since an mpd of 1 means that 1/1, or 100%, of packets are dropped at the max-threshold.
Why is it important to set mpd to 1 rather than to another value? The answer is predictability. When calculating the other values for WRED, we know that any packet after the max-threshold is tail dropped. Therefore, by setting the mpd to 1, we ensure a more realistic drop ratio throughout the WRED curve. If the value was set to 2, for instance, WRED would only drop enough packets to reach a 50% drop ratio by the time the average queue depth reaches the max-threshold; then, all of a sudden, one packet takes the average over the max-threshold and the packet drops go from 50% to 100%.
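The effect of the mpd on the drop curve can be illustrated with a short sketch. It assumes the usual linear RED ramp between the two thresholds, which the text implies but does not spell out; the function name is hypothetical.

```python
def wred_drop_probability(avg_queue: float, min_th: int, max_th: int, mpd: int = 1) -> float:
    """Drop probability for a given average queue depth (linear ramp assumed)."""
    if avg_queue < min_th:
        return 0.0          # below min-threshold: nothing is dropped
    if avg_queue > max_th:
        return 1.0          # above max-threshold: tail drop
    if max_th == min_th:
        return 1.0 / mpd    # degenerate profile with no ramp
    # The ramp rises linearly from 0 at min-threshold to 1/mpd at max-threshold.
    return (avg_queue - min_th) / (max_th - min_th) / mpd


# Business in-contract thresholds from the STM-1 example (117 / 388).
for mpd in (1, 2):
    at_max = wred_drop_probability(388, 117, 388, mpd)
    past_max = wred_drop_probability(389, 117, 388, mpd)
    print(f"mpd={mpd}: at max-threshold {at_max:.0%}, just above {past_max:.0%}")
```

With mpd = 1 the curve reaches 100% exactly at the max-threshold, so there is no jump when the average crosses it; with mpd = 2 the sketch shows the step from 50% to 100% described above.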
Exp. Weighting Const
WRED calculates an exponentially weighted average queue size, rather than using the current queue size, when deciding the packet drop probability. The current average queue length depends on the previous average and on the queue's current actual size. By using an average queue size, RED achieves its goal of not reacting to momentary burstiness in the network and reacting only to persistent congestion.
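As an illustration, the averaging can be sketched as below. The sketch assumes the commonly documented form average = old_average x (1 - 2^-n) + current_queue x 2^-n, where n is the configured exponential-weighting-constant; the function names and the burst scenario are hypothetical.

```python
def update_average(old_avg: float, current_queue: float, n: int) -> float:
    """One averaging step; n is the configured exponent (weight = 2^-n)."""
    weight = 2.0 ** -n
    return old_avg * (1.0 - weight) + current_queue * weight


def average_after_burst(n: int, burst_depth: int = 100, samples: int = 20) -> float:
    """Average queue size after a burst that holds the queue at burst_depth."""
    avg = 0.0
    for _ in range(samples):
        avg = update_average(avg, burst_depth, n)
    return avg


# A small exponent follows the burst quickly; a large exponent barely moves.
print(round(average_after_burst(3), 1))
print(round(average_after_burst(9), 1))
```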
With high values of exponential-weighting-constant, the average queue size closely tracks the old average queue size and more freely accommodates changes in the current queue size, resulting in the ability for RED to accommodate temporary bursts in traffic, smoothing out the peaks and troughs in the current queue size. RED is slow to start dropping packets, but it can continue dropping packets for a time after the actual queue size falls below the minimum threshold.
If exponential-weighting-constant is too high, RED does not react to congestion, as the current queue size becomes insignificant in calculating the average queue size. Packets are transmitted or dropped as if RED were not in effect.
With low values of exponential-weighting-constant, the average queue size closely tracks the current queue size, which enables the average queue size to move rapidly with the changing traffic levels. This means the RED process responds quickly to long queues. When the queue falls below the minimum threshold, the process stops dropping packets.
If exponential-weighting-constant is too low, RED overreacts to temporary traffic bursts and drops traffic unnecessarily. The formula for calculating exponential-weighting-constant (ewc), consistent with the values in Table 20, is as follows:
ewc = 1/B if Line Rate (core)/Committed Rate (edge) <= 34Mbps
ewc = 10/B if Line Rate (core)/Committed Rate (edge) > 34Mbps
where B is the rate in 1500-byte packets, i.e. B = CEILING(Rate[kbps] * 1000 / 8 / 1500).
The configured exponential-weighting-constant (x) is applied to the router configuration as a negative power of 2. The relation between ewc and the configured value is:
ewc = 2^(-x), which can be rewritten as:
1/ewc = 2^x, and the final formula for the configured value is:
x = ln(1/ewc) / ln(2)
x = ln(B) / ln(2) if Line Rate (core)/Committed Rate (edge) <= 34Mbps
x = ln(B/10) / ln(2) if Line Rate (core)/Committed Rate (edge) > 34Mbps
Note:
The exponential-weighting-constant parameter is calculated based on the Class Bandwidth value and NOT on the link rate. For the GSR12000, however, since it is not possible to configure per class, the exponential-weighting-constant is calculated based on the link rate.
The ewc for Standard class (class-default) shall be based on link rate.
If the Class Bandwidth Allocation is configured as a percentage value in MQC, this should be converted to a value in Kbps for calculating ewc.
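The calculation of the configured value x can be sketched as follows. The branch at 34 Mbps follows the values tabulated in Table 20 (B up to 34 Mbps, B/10 above it). Rounding to the nearest integer and the lower bound of 3 are assumptions inferred from that table, not rules stated in the text; the function names are hypothetical.

```python
import math


def packets_per_second(rate_kbps: float, mtu_bytes: int = 1500) -> int:
    """B: the rate expressed in MTU-sized packets, rounded up."""
    return max(1, math.ceil(rate_kbps * 1000 / 8 / mtu_bytes))


def ewc_exponent(rate_kbps: float) -> int:
    """Configured exponential-weighting-constant x for a link or class rate."""
    b = packets_per_second(rate_kbps)
    value = b if rate_kbps <= 34000 else b / 10
    # Round to nearest (assumption) and never go below 3 (assumption).
    return max(3, math.floor(math.log2(value) + 0.5))


# Link rates in kbps: E1, STM-1 and STM-4.
for rate in (2048, 155000, 622000):
    print(rate, packets_per_second(rate), ewc_exponent(rate))
```

For a class configured as a percentage in MQC, the rate passed in would be the class bandwidth converted to kbps, as the note above requires.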
The following table gives the exponential-weighting-constant as a function of the link speed (GSR) or class speed.
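Table 20 WRED - exponential weighting constant
| Link speed (kbps) | B | B or B/10 | x | Class BW 10% (kbps) | B | B or B/10 | x | Class BW 20% (kbps) | B | B or B/10 | x | Class BW 25% (kbps) | B | B or B/10 | x | Class BW 30% (kbps) | B | B or B/10 | x |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32 | 3 | 3 | 3 | 3.2 | 1 | 1 | 3 | 6.4 | 1 | 1 | 3 | 8 | 1 | 1 | 3 | 9.6 | 1 | 1 | 3 |
| 64 | 6 | 6 | 3 | 6.4 | 1 | 1 | 3 | 12.8 | 2 | 2 | 3 | 16 | 2 | 2 | 3 | 19.2 | 2 | 2 | 3 |
| 128 | 11 | 11 | 3 | 12.8 | 2 | 2 | 3 | 25.6 | 3 | 3 | 3 | 32 | 3 | 3 | 3 | 38.4 | 4 | 4 | 3 |
| 256 | 22 | 22 | 4 | 25.6 | 3 | 3 | 3 | 51.2 | 5 | 5 | 3 | 64 | 6 | 6 | 3 | 76.8 | 7 | 7 | 3 |
| 512 | 43 | 43 | 5 | 51.2 | 5 | 5 | 3 | 102.4 | 9 | 9 | 3 | 128 | 11 | 11 | 3 | 153.6 | 13 | 13 | 4 |
| 1024 | 86 | 86 | 6 | 102.4 | 9 | 9 | 3 | 204.8 | 18 | 18 | 4 | 256 | 22 | 22 | 4 | 307.2 | 26 | 26 | 5 |
| 2048 | 171 | 171 | 7 | 204.8 | 18 | 18 | 4 | 409.6 | 35 | 35 | 5 | 512 | 43 | 43 | 5 | 614.4 | 52 | 52 | 6 |
| 10000 | 834 | 834 | 10 | 1000 | 84 | 84 | 6 | 2000 | 167 | 167 | 7 | 2500 | 209 | 209 | 8 | 3000 | 250 | 250 | 8 |
| 34684 | 2891 | 289.1 | 8 | 3468.4 | 290 | 290 | 8 | 6936.8 | 579 | 579 | 9 | 8671 | 723 | 723 | 9 | 10405.2 | 868 | 868 | 10 |
| 100000 | 8334 | 833.4 | 10 | 10000 | 834 | 834 | 10 | 20000 | 1667 | 1667 | 11 | 25000 | 2084 | 2084 | 11 | 30000 | 2500 | 2500 | 11 |
| 155000 | 12917 | 1291.7 | 10 | 15500 | 1292 | 1292 | 10 | 31000 | 2584 | 2584 | 11 | 38750 | 3230 | 323 | 8 | 46500 | 3875 | 387.5 | 9 |
| 622000 | 51834 | 5183.4 | 12 | 62200 | 5184 | 518.4 | 9 | 124400 | 10367 | 1036.7 | 10 | 155500 | 12959 | 1295.9 | 10 | 186600 | 15550 | 1555 | 11 |
| 2400000 | 200000 | 20000 | 14 | 240000 | 20000 | 2000 | 11 | 480000 | 40000 | 4000 | 12 | 600000 | 50000 | 5000 | 12 | 720000 | 60000 | 6000 | 13 |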
The following is the required WRED configuration template on CE-PE link.
!
policy-map customer_profile
 class voice
 !
 class streaming
  random-detect dscp-based
  random-detect exponential-weighting-constant <x>
  random-detect dscp 26 <minTHin> <maxTHin> 1
  random-detect dscp 34 <minTHout> <maxTHout> 1
 class business
  random-detect dscp-based
  random-detect exponential-weighting-constant <x>
  random-detect dscp 10 <minTHin> <maxTHin> 1
  random-detect dscp 18 <minTHout> <maxTHout> 1
 class management
 !
 class class-default
  random-detect dscp-based
  random-detect exponential-weighting-constant <x>
  random-detect dscp 0 <minTH> <maxTH> 1
!
8.3.5 CE-to-PE QoS mechanisms (applied on the PE)
The QoS mechanisms used on the PE (7206VXR platforms) are basically a subset of the mechanisms used on the non-distributed CE platforms. The configuration on the PEs is almost identical to the one on the CE. There are some differences and these will be highlighted.
Classification
The traffic can be classified on PE routers by matching the DSCP values, because all traffic has already been properly marked on the CEs when entering the network.
Traffic classification on CE-PE connection is required only for packets received from unmanaged CEs and Internet connections as explained below.
Marking
No customer traffic packet marking would be performed on the PE, since all packets have already been marked appropriately on the ingress CEs.
The management traffic generated locally on the PE will be marked through Local Policy Routing (LPR). The configuration template is the same as on the CE router.
Policing
Traffic has been already policed on the CE router so there’s no need to police the traffic coming from managed CE routers on the PE.
Unmanaged CEs and Unmanaged Internet CPEs
Unmanaged CE means that TMN does not have control over the CE router in customer’s premises, i.e. the customer is managing the CE device.
Service without QoS
The decision is that by default no QoS will be implemented and offered to customers with unmanaged CEs. In other words, traffic received from unmanaged CE router will be treated as best effort within the Mipnet and as such assigned to Standard traffic class. This is also true for customers who will not subscribe to TMN QoS services (even if the CE is managed by TMN).
The following configuration template will classify and mark the traffic from unmanaged CE routers (see footnote 13).
!
policy-map unmanaged_CE
 class class-default
  set ip dscp 0
!
interface Serial 2/0/1:0
 bandwidth <bw>
 description Link to unmanaged CE
 service-policy input unmanaged_CE
!
The second example shows how the police command can be used to limit the bandwidth on high-speed circuits to the subscribed subrate (512 kbps in this example).
!
policy-map limit_customer_512k
 class class-default
  police 512000 12800 25600 conform-action set-dscp-transmit 0 exceed-action drop
!
interface Serial 2/0/1:0
 bandwidth 2000
 description Link to unmanaged CE with subrate of 512kb
 service-policy input limit_customer_512k
!
13 Also, traffic received from upstream transit providers, peering partners and Internet customers must be marked with DSCP 0, to prevent “precedence-spoofing” attacks.
Customer configures QoS on the CE router
TMN can in theory co-ordinate a proper CE router QoS configuration with the customer (via e-mail or phone support), but based on our experience this is in most cases an extremely painful procedure for the service provider. QoS configuration, monitoring and troubleshooting is an extremely complex task and may result in service disruption if unskilled customers adjust the QoS parameters on customer-managed CE routers. It is then not trivial to prove to such a customer that the TMN core network was operating normally when the customer experienced a service outage due to QoS misconfiguration.
The following configuration template shows how to re-enforce the marking of traffic classes for unmanaged CE routers. The policy-map would have to be replicated and tuned for each customer. The Voice class is not allowed because the customer cannot be trusted: a customer may in theory send 1500-byte packets marked with DSCP 46, which would affect the quality of VoIP calls of other Mipnet customers.
On the CE side, the QoS configuration template of managed CE can be reused for unmanaged CE routers.
!
! Customer has already classified and marked the IP packets on unmanaged CE
! The classification class-map is the same as with managed CE routers (the
! same config for all CEs)
!
class-map match-any management
 match ip dscp 48
 match access-group 103
class-map match-any business
 match ip dscp 10
 match ip dscp 18
class-map match-any streaming
 match ip dscp 26
 match ip dscp 34
!
! TMN must mark the traffic classes according to SLA
! of that customer - this is customer-specific configuration and can result
! in a very long router configuration file.
!
policy-map CUSTx_police
 class business
  police <bps> <nb> <eb> conform-action set-dscp-transmit 10 exceed-action set-dscp-transmit 18
 class streaming
  police <bps> <nb> <eb> conform-action set-dscp-transmit 26 exceed-action set-dscp-transmit 34
 class management
  police <bps> <nb> <eb> conform-action transmit exceed-action drop
 class class-default
  set ip dscp 0
!
interface Serial 2/0/1:0
 bandwidth <bandwidth>
 description Link to unmanaged CE of customer X
 service-policy input CUSTx_police
Dial Customers
IP traffic from dial-up users will be marked with DSCP 0 on LNS/PE.
8.3.6 PE-to-P QoS mechanisms (applied on the PE)
Classification
The customer traffic received from the CE routers has been marked with the DSCP. On MPLS uplinks the DSCP value will be automatically mapped in the MPLS EXP bits as shown in Figure 50.
The following configuration example depicts the EXP-based classification on PE-P uplinks. MPLS frames need to be classified in order to perform queuing and apply the proper WRED drop policy.
!
class-map match-any business_management
 match mpls experimental 1 2 6
class-map match-any streaming
 match mpls experimental 3 4
class-map match-any voice
 match mpls experimental 5
!
Marking
IP packets will be encapsulated in MPLS frames when leaving the PE router. The DSCP code point value (i.e. the precedence bits) will be automatically mapped into EXP bits of MPLS label. No further configuration is needed.
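The automatic mapping copies the three precedence bits, i.e. the three most significant bits of the DSCP, into the EXP field. A minimal sketch of that arithmetic (the function name is hypothetical) shows how the DSCP values used in Mipnet land in the EXP values matched by the class-maps above:

```python
def dscp_to_exp(dscp: int) -> int:
    """Return the MPLS EXP value obtained from the precedence bits of a DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp >> 3  # keep the three most significant bits


# DSCP values used for the Mipnet traffic classes in this chapter.
for name, dscp in [("Standard", 0), ("Business in", 10), ("Business out", 18),
                   ("Streaming in", 26), ("Streaming out", 34),
                   ("Voice", 46), ("Management", 48)]:
    print(f"{name}: DSCP {dscp} -> EXP {dscp_to_exp(dscp)}")
```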
Class queuing
The following is an example configuration for the class queuing on PE-to-P trunks.
!
policy-map PE-P
 class business_management
  bandwidth percent 30
 class streaming
  bandwidth percent 25
 class voice
  priority percent 20
 class class-default
  bandwidth percent 25
!
interface POS 2/0/1
 description PE-P core link
 bandwidth 155000
 service-policy output PE-P
!
Congestion avoidance
WRED will be used for graded packet dropping in each traffic class. The DSCP-based WRED is currently supported on MPLS uplinks.
The following configuration template will be used for congestion management on PE-to-P links. WRED thresholds and ewc are derived in the same way as for the CE-to-PE links.
!
policy-map PE-P
 class business_management
  random-detect dscp-based
  random-detect exponential-weighting-constant 9
  random-detect dscp 10 117 388 1
  random-detect dscp 18 23 117 1
  random-detect dscp 48 388 775 1
 class streaming
  random-detect dscp-based
  random-detect exponential-weighting-constant 8
  random-detect dscp 26 49 162 1
  random-detect dscp 34 9 49 1
 class class-default
  random-detect dscp-based
  random-detect exponential-weighting-constant 11
  random-detect dscp 0 146 485 1
!
8.3.7 PE-P, P-P and P-PE QoS mechanisms for 12000s (applied on the P)
Class Queuing (MDRR)
MDRR is architecturally different from LLQ where bandwidth is not reserved per class but rather weights or ”timeslots” are allocated for each class. With MDRR we have the ability to manipulate queue weights to define the quantum or time spent servicing a queue. Also, like the PQ in LLQ, MDRR has a low latency queue typically used for servicing real-time traffic such as voice.
The GSR also differs architecturally from the other platforms in that it maintains two instances of queuing with MDRR during the flow of a packet from the input interface to the output interface. The first instance is called “to fabric” and the second instance is called “from fabric”.
“To-fabric” or RX-COS MDRR
The “to-fabric” MDRR is applied exactly as the name implies: to packets exiting a line card towards the switching fabric. The consideration here is that, unlike with “from-fabric” queuing, one does not know the destination port line speed but still needs to take all possibilities into account. Consider the following:
Packets come in from a high speed STM-16 port and are destined to exit through a lower speed STM-1 card. Clearly, this will cause congestion. The ability to push back the congestion management to before the packets hit the switching fabric is clearly beneficial. Therefore, when creating a traffic management policy or “cos-group” as it is known in MDRR, one must first create one for each available interface in the chassis. In Mipnet, the “to-fabric” and “from-fabric” cos-groups are the same because we want the same behaviour at both queuing instances.
The application method is as follows:
For packets destined to a slot with an STM-1 line card, an STM-1 cos-group will be applied, regardless of what the source line card is.
For packets destined to a slot with an STM-4 line card, an STM-1 cos-group is applied if the source line card is STM-1, STM-4 cos-group if the source is an STM-4.
In doing so, one applies the cos-group depending on what the destination slot is, therefore avoiding “congestion” on the switching fabric.
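The two rules above amount to applying the cos-group of the slower of the two line cards. A small sketch of that selection follows; the names are hypothetical and the group labels simply mirror the STM1 and STM4 cos-queue-groups used in this chapter.

```python
# Line rates in Mbps for the two line card types considered here.
STM_RATE_MBPS = {"STM1": 155, "STM4": 622}


def to_fabric_cos_group(source_card: str, destination_card: str) -> str:
    """Pick the cos-group for packets going from source_card to destination_card."""
    # The slower of the two cards determines the group that is applied.
    return min(source_card, destination_card, key=lambda card: STM_RATE_MBPS[card])


for src in ("STM1", "STM4"):
    for dst in ("STM1", "STM4"):
        print(f"{src} -> {dst}: {to_fabric_cos_group(src, dst)}")
```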
“From-fabric” or TX-COS MDRR
The “from-fabric” MDRR is a lot simpler in terms of configuration. The queuing occurs at the egress to the TX-queue. At this stage, one knows the exit slot and interface speed. The cos-group is simply applied to the actual interface, just like a service policy is applied to an interface on a 7XXX platform.
MDRR queuing operation
Each DRR queue can be given a relative weight, with one of the queues in the group defined as a low latency queue. This is done via the queue command under the cos-queue-group.
queue <0-6> <1-2048>
queue low-latency [alternate-priority | strict-priority] <1-2048>
The weights give a relative bandwidth for each queue when the interface is congested. The DRR algorithm de-queues data from each queue in turn if there is data in the queue to be sent. So if all the regular DRR queues have data in them they will be serviced as the following:
0-1-2-3-4-5-6-0-1-2-3-4-5-6...
On each pass through the cycle, a queue is allowed to de-queue up to its quantum Q, which is proportional to the configured queue weight W. The packet de-queue quantum Qn, in bytes, is:
Qn = MTU + (Wn - 1)*512
A weight of 1 is equivalent to giving the queue a quantum equal to the interface MTU. For each increment above 1, the quantum of the queue increases by 512 bytes. For example, if the MTU of a particular interface is 4470 and the weight of a queue is configured to be 3, each time through the rotation 4470 + (3-1)*512 = 5494 bytes will be allowed to be de-queued. As a second example, suppose two normal DRR queues are used, with Queue0 configured with a weight of 1 and Queue1 configured with a weight of 9. If both queues were congested, each time through the rotation Queue0 would be allowed to send 4470 bytes and Queue1 would be allowed to send 4470 + (9-1)*512 = 8566 bytes. This would give traffic going through Queue0 approximately 1/3 of the bandwidth and the traffic going through Queue1 about 2/3.
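The quantum formula and the resulting bandwidth split can be checked with a few lines of code. This is a sketch only: the function names are hypothetical, and the shares hold only while all queues are congested.

```python
def mdrr_quantum(mtu_bytes: int, weight: int) -> int:
    """Bytes a queue may de-queue per rotation: MTU + (W - 1) * 512."""
    return mtu_bytes + (weight - 1) * 512


def bandwidth_shares(mtu_bytes: int, weights: list) -> list:
    """Relative bandwidth of each queue when every queue has data to send."""
    quanta = [mdrr_quantum(mtu_bytes, w) for w in weights]
    total = sum(quanta)
    return [q / total for q in quanta]


print(mdrr_quantum(4470, 3))
shares = bandwidth_shares(4470, [1, 9])
print([f"{s:.1%}" for s in shares])
```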
The low latency queue can be added to give more priority to certain traffic. The low latency queue can be given 2 different priorities within the group. It can be put in strict priority or in alternating priority. In strict priority, this queue is serviced whenever it is non-empty.
To minimize the jitter in Voice class, the LLQ will be configured in strict priority mode.
The following table gives an example for MDRR weights that can be used in Mipnet as initial queuing and class capacity definition. Weights have been calculated following the algorithm above.
MTU on POS links is 4470.
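Table 21 MDRR weights
| Service Class | % of link BW | Queue | STM-1 (155 Mbps) Class BW [Mbps] | STM-1 Weight | STM-4 (622 Mbps) Class BW [Mbps] | STM-4 Weight |
|---|---|---|---|---|---|---|
| Voice | 20 | low latency | 31 | 10 | 124 | 10 |
| Business, Mgmt | 30 | 2 | 48 | 13 | 184 | 13 |
| Streaming | 25 | 1 | 38 | 10 | 157 | 10 |
| Standard | 25 | 0 | 38 | 10 | 157 | 10 |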
MDRR configuration guide for Mipnet
From-fabric (TX COS)
Each interface has eight COS queues, which can be configured independently. The MDRR implementation offers flexible mapping between IP precedence and the eight possible queues: MDRR allows a maximum of eight queues, so each IP precedence value can be given its own queue. The number of queues needed and the precedence values mapped to those queues are user-configurable, and it is possible to map one or more precedence values into a queue.
The TMN network will have four queues:
Low-Latency Queue. The low-latency queue will be a strict priority queue for VoIP traffic; packets marked with MPLS EXP 5 will be forwarded to this queue. 20% of the available physical bandwidth will be available for voice traffic.
Queue 2 will be used for Business (in/out-contract) and Management traffic classes. Packets marked MPLS EXP 1, 2 and 6 will be forwarded to this queue. 30% of the available physical bandwidth will be available for Business and management traffic.
Queue 1 will be the Streaming data queue for delay sensitive traffic but variable packet sizes. Packets marked with MPLS EXP 3 and 4 will be forwarded to this queue. 25% of available physical bandwidth will be available for streaming traffic.
Queue 0 will be for default-classified traffic – i.e. Standard traffic class. MPLS EXP 0 will be forwarded to this queue. 25% of the available physical bandwidth will be available for best-effort traffic.
The following commands are an example configuration in Mipnet. The same MDRR TX-COS configuration could be applied to STM-1 and STM-4 links, but the WRED parameters will be different. So we have to have one cos-queue-group per link capacity. However, the same cos-queue-group can be applied on RX and TX side; this will reduce the size of router configuration file.
The precedence-based configuration acts on EXP bits in the case of MPLS packets.
!
cos-queue-group STM<1,4>   ! Duplicated for each rate, same for RX and TX side
 prec 0 queue 0            ! Map the packet with PREC/EXP=0 into queue 0
 prec 1 queue 2
 prec 2 queue 2
 prec 3 queue 1
 prec 4 queue 1
 prec 5 queue low-latency
 prec 6 queue 2
 prec 7 queue 2
 queue 0 10
 queue 1 10
 queue 2 13
 queue low-latency strict-priority 10
!
interface pos 3/1
 description This is STM-<1,4> backbone link
 tx-cos STM<1,4>
To-fabric or RX COS
In addition to the transmit COS, a receive COS will also be configured. The queues will be identical to the interface transmits queues, but instead of being applied directly to the line interface they are built as a table and applied from the receive buffer to the backbone fabric buffers.
Each line card has 8 COS queues per destination interface. All the interfaces on a destination slot have the same COS parameters.
In the example, the slot-table-cos commands (STM1-TO-FABRIC and STM4-TO-FABRIC) define the COS policy for destination line cards 0, 1 and 2, 3 based on the STM-1 and STM-4 cos-queue-groups. The rx-cos-slot command applies a slot-table-cos configuration to a particular slot (line card). As previously mentioned, the cos-groups will be applied as follows:
For packets destined to a slot with an STM-1 line card, an STM-1 cos-group will be applied, regardless of what the source line card is.
For packets destined to a slot with an STM-4 line card, an STM-1 cos-group is applied if the source line card is STM-1, STM-4 cos-group if the source is an STM-4.
!
rx-cos-slot 0 STM1-TO-FABRIC   ! We have STM-1 interfaces in this slot
rx-cos-slot 1 STM1-TO-FABRIC   ! We have STM-1 interfaces in this slot
rx-cos-slot 2 STM4-TO-FABRIC   ! We have STM-4 interfaces in this slot
rx-cos-slot 3 STM4-TO-FABRIC   ! We have STM-4 interfaces in this slot
!
slot-table-cos STM1-TO-FABRIC
 destination-slot all STM1
!
slot-table-cos STM4-TO-FABRIC
 destination-slot 0 STM1
 destination-slot 1 STM1
 destination-slot 2 STM4
 destination-slot 3 STM4
!
Congestion management
WRED parameters on GSR routers will follow the guidelines already explained for CE and PE routers. The GSR-specific configuration is depicted in this chapter.
Exponential weighting constant
On GSR the ewc cannot be configured on a per-class basis. For this reason, the link bandwidth will be used to calculate the ewc. According to Table 20 the ewc for STM-1 links will be 10, and for STM-4 links the ewc will have the value of 12.
Policing of Voice class with WRED
Because the provisioning rule for Mipnet is max. 20% Voice traffic on a link, congestion in the Voice class would therefore be highly unlikely and if at all, only under extreme cases such as multiple flows from STM-4 links to a single STM-1. Nonetheless the remote possibility of this occurring should be prevented.
In the GSR and MDRR, tail-drop or control of the LLQ is not possible. In order to achieve this, a WRED setting will be applied on the LLQ. The MIN/MAX-threshold settings will be calculated based on a maximum delay of 3ms and an average packet size of 64 bytes. The idea is to allow a small burst. The MIN-threshold will therefore be quite small and the MAX-threshold will be equal to the MIN.
Random-detect-label 5 will be used to apply WRED on the Voice traffic (valid for ENG-2 line cards as well).
Max-threshold (Voice) ~ 0.003 x B [256 for STM-1, 2048 for STM-16]
Min-threshold = Max-threshold
where B = bandwidth in MTU sized packets. For Voice an MTU of 64 bytes is assumed.
WRED Configuration
This is an example configuration template for WRED on STM-1 GSR links.
Please note that “precedence x random-detect-label y” statements apply to IP packets with precedence x and also to MPLS frames with EXP bits set to x. “y” here refers to index of WRED profile.
!
cos-queue-group STM1                  ! Duplicated for each STM rate with
 precedence 0 random-detect-label 0   ! appropriate WRED thresholds and EWC
 precedence 1 random-detect-label 1
 precedence 2 random-detect-label 2
 precedence 3 random-detect-label 3
 precedence 4 random-detect-label 4
 precedence 5 random-detect-label 5
 precedence 6 random-detect-label 6
 precedence 7 random-detect-label 6
 random-detect-label 0 146 485 1      ! Standard
 random-detect-label 1 117 388 1      ! Business in-contract
 random-detect-label 2 23 117 1       ! Business out-contract
 random-detect-label 3 49 162 1       ! Streaming in-contract
 random-detect-label 4 9 49 1         ! Streaming out-contract
 random-detect-label 5 180 181 1      ! Voice (3ms tail-drop of 64-byte packets)
 random-detect-label 6 388 775 1      ! Routing & Management
 exponential-weighting-constant 10    ! 10 is default
!
8.3.8 PE to CE QoS mechanisms (applied on the PE)
Classification
The traffic will be classified by matching the DSCP values, for scheduling onto PE-CE connection. Management traffic is carried in a dedicated Management class on PE-CE links. Classification of locally sourced traffic with LPR has already been demonstrated.
The following configuration template will classify the traffic for queuing and congestion management on PE-CE link (outbound direction).
!
class-map match-any business
 match ip dscp 10 18
class-map match-any streaming
 match ip dscp 26 34
class-map match-any voice
 match ip dscp 46
class-map match-any management
 match ip dscp 48
!
Class queuing
The following is the sample configuration for the class queuing on PE-to-CE links. Please note that class bandwidths shall match with those configured on the CE side.
!
policy-map PE-CE
 class business
  bandwidth percent 35
 class streaming
  bandwidth percent 20
 class voice
  priority 64
 class management
  bandwidth percent 5
 class class-default
  bandwidth percent 40
!
interface Serial 2/0/1:1.1
 description PE-CE access layer link
 bandwidth 512
 max-reserved-bandwidth 95
 service-policy output PE-CE
!
Congestion avoidance
WRED on the PE-CE link shall be configured with the same parameters as on the CE router. Below is a sample configuration template.
!
policy-map PE-CE
 class voice
 !
 class streaming
  random-detect dscp-based
  random-detect exponential-weighting-constant <x>
  random-detect dscp 26 <minTHin> <maxTHin> 1
  random-detect dscp 34 <minTHout> <maxTHout> 1
 class business
  random-detect dscp-based
  random-detect exponential-weighting-constant <x>
  random-detect dscp 10 <minTHin> <maxTHin> 1
  random-detect dscp 18 <minTHout> <maxTHout> 1
 class management
 !
 class class-default
  random-detect dscp-based
  random-detect exponential-weighting-constant <x>
  random-detect dscp 0 <minTH> <maxTH> 1
!
8.3.9 QoS mechanisms on Frame Relay DLCIs
Non-distributed platforms
In the event a Frame Relay access network is being used in between the CEs and PEs, Frame Relay Traffic Shaping needs to be used to shape the traffic to a traffic rate equal to the Committed Information Rate (CIR) of the Frame Relay PVC.
Unlike CAR, FRTS will actually delay the traffic by applying queuing. FRTS is required in order to create the necessary backpressure for the LLQ class queuing mechanism to kick in. In other words, if FRTS were not there, the Frame Relay PVC would be able to transmit at a rate higher than the actual CIR of the PVC. This could result in random packet drops in the Frame Relay access cloud. Obviously, packet drops in the Frame Relay network will not take into account the traffic classification of the IP traffic. With FRTS enabled, the PVC will not be allowed to transmit at a higher rate than its configured CIR. As a result, no packets will be dropped in the Frame Relay access cloud, and normal packet prioritisation among configured traffic classes will occur.
In terms of configuration, this basically means that a Frame Relay class needs to be defined. FRTS will be configured in this Frame Relay class and the required service policy will be applied on the Frame Relay class. Lastly, the Frame Relay class needs to be applied to the PVC and FRTS needs to be enabled on the main interface.
The following are the FRTS recommendations for the MINCIR / CIR, Bc (Committed Burst) and Be (Excess Burst) values:
MINCIR / CIR (bps): These values do not take into account the full frame size. In fact, 2 bytes of FCS and 2 bytes of flags (before and after the frame) are added by the transmitting hardware and are therefore not included in the MINCIR and CIR shaping values. The recommendation is to configure the MINCIR and CIR to a value slightly lower than the actual contracted CIR (with the Frame Relay access provider). The MINCIR and CIR values can be configured to 0.98 of the contracted CIR. This will compensate for 4 bytes of FCS and flags overhead on an average IP packet size of 200 bytes. In the event that the actual average packet size will be smaller than 200 bytes, the 0.98 figure needs to be reduced. Configuring this parameter correctly is especially important when there is only 1 PVC per physical link with a CIR equal to the access speed. Without this reduction of the MINCIR / CIR, FRTS would never kick in, and packets, in the case of congestion, would be dropped on the physical output interface without any consideration of the traffic class which is applied through the service policy. It should be clear that the sum of the LLQ class bandwidths cannot exceed the configured MINCIR / CIR value. Thus: MINCIR = CIR = 0.98 x contracted CIR
Bc (bits): In general, the smaller the Bc value, the smoother is the shaping process. On the low-end platforms, the minimum time between token bucket replenishment is 10 msec. Thus: Bc = 0.01 x MINCIR
Be (bits): Again, the smaller the Be value, the smoother the shaping process. The minimum value is 0. Thus: Be = 0
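The three recommendations can be combined into a small calculation. The sketch below uses integer arithmetic and rounds down, which is an assumption; it agrees with the values of the 128 kbps example in this section. The function name is hypothetical.

```python
def frts_parameters(contracted_cir_bps: int) -> dict:
    """Suggested FRTS values for a PVC, following the rules listed above."""
    cir = contracted_cir_bps * 98 // 100   # 0.98 x contracted CIR
    bc = cir // 100                        # 10 msec interval: Bc = 0.01 x MINCIR
    return {"cir": cir, "mincir": cir, "bc": bc, "be": 0}


# 128 kbps PVC, as in the configuration example of this section.
print(frts_parameters(128000))
```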
The following is the required configuration for FRTS on the CE.
!
map-class frame-relay FR_profile
 no frame-relay adaptive-shaping
 frame-relay cir 125440
 frame-relay bc 1254
 frame-relay be 0
 frame-relay mincir 125440
 service-policy output customer_profile
!
interface Serial0/0
 no ip address
 encapsulation frame-relay IETF
 no fair-queue
 frame-relay traffic-shaping
 frame-relay lmi-type ansi
 frame-relay intf-type dce
!
interface Serial0/0.1 point-to-point
 bandwidth 128
 ip address 14.1.1.6 255.255.255.252
 no cdp enable
 frame-relay interface-dlci 100
  class FR_profile
!
7500 VIP-based platforms
Although the principle of traffic shaping is identical, the implementation is different on the VIP platforms. Due to the 7500 hardware architecture, the traffic shaping implementation on the 7500 platform is distributed in nature. Therefore, it is referred to as Distributed Traffic Shaping (dTS) or as Class Based Shaping, because its configuration uses hierarchical service policies. A first service policy defines the traffic shaping Target Bit Rate (analogous to the MINCIR / CIR values as discussed in the CE FRTS section). The second “class” service policy is then applied to the first “shaping” service policy. Finally, the “shaping” service policy is applied to a frame relay class which is applied to a PVC.
The following are the dTS recommendations for the Target Bit Rate, Bits per Interval (sustained) and Bits per Interval (Excess) values:
Target Bit Rate (bps): This value does not take into account the full frame size. In fact, 2 bytes of FCS and 2 bytes of flags (before and after the frame) are added by the transmitting hardware and are therefore not included in the Target Bit Rate value. The recommendation is to configure the Target Bit Rate to a value slightly lower than the actual contracted CIR (with the Frame Relay access provider). The Target Bit Rate value can be configured to 0.98 of the contracted CIR. This will compensate for 4 bytes of FCS and flags overhead on an average IP packet size of 200 bytes. In the event that the actual average packet size is smaller than 200 bytes, the 0.98 figure needs to be reduced. It is important to note that the Target Bit Rate needs to be a multiple of 8 Kbps. It should be clear that the sum of the LLQ class bandwidths cannot exceed the configured Target Bit Rate value. Thus: Target Bit Rate = 0.98 x contracted CIR (rounded down to a multiple of 8 Kbps)
Bits per Interval - sustained (bits): In general, the smaller this value, the smoother is the shaping process. On the VIP platforms, the minimum time between token bucket replenishment is 4 msec. Thus: Bc = 0.004 x Target Bit Rate
Bits per Interval - excess (bits): Again, the smaller this value, the smoother the shaping process. The minimum value is 0. Thus: Be = 0
The following is the required configuration for dTS on the 7500 series CE.
!
policy-map Shape_profile
 class class-default
  shape average 120000 480 0
  service-policy customer_profile
!
map-class frame-relay FR_profile
 service-policy output Shape_profile
!
interface Serial2/0/0:0
 no ip address
 encapsulation frame-relay IETF
 ip route-cache distributed
 no fair-queue
 clockrate 128000
 frame-relay lmi-type ansi
!
interface Serial2/0/0:0.1 point-to-point
 bandwidth 128
 ip address 14.1.1.5 255.255.255.252
 no cdp enable
 frame-relay interface-dlci 100
  class FR_profile
!
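The "shape average 120000 480 0" values above follow from the dTS recommendations. The sketch below is illustrative only: it assumes a contracted CIR of 128000 bps and assumes that the 8 Kbps rounding goes downwards, which is what reproduces the 120000 bps used in the configuration. The function name is hypothetical.

```python
def dts_parameters(contracted_cir_bps: int, factor_percent: int = 98,
                   step_bps: int = 8000, interval_ms: int = 4) -> tuple:
    """Return (target bit rate, sustained bits per interval, excess bits)."""
    scaled = contracted_cir_bps * factor_percent // 100
    # Assumption: round down to a multiple of 8 Kbps so the rate never
    # exceeds 0.98 x the contracted CIR.
    target = scaled // step_bps * step_bps
    bc = target * interval_ms // 1000  # bits per 4 msec interval
    be = 0
    return target, bc, be


if __name__ == "__main__":
    # Corresponds to the arguments of "shape average" for a 128 kbps contract.
    print(dts_parameters(128000))
```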
8.4 Dial
8.4.1 Dial to MPLS-VPN Infrastructure
Integration of Dial access into MPLS-VPN will be done using Cisco’s “Naiad” solution. As MPLS-VPN Provider Edge (PE) functionality is not supported on Cisco’s Access Servers (AS5xxx) due to scalability reasons, the PPP sessions will be L2TP-tunneled to a vHGW that acts as an MPLS-VPN PE. The overall architecture is illustrated in Figure 55.
[Figure 55 diagram. Labels: Remote User, CE (Backup), PPP Session, LAC (AS5350), LNS/PE (two routers), MPLS/VPN network with P, P, PE and CE. Protocol stacks along the path: IP/PPP, IP/PPP/L2TP/IP, IP/Label(s), IP]
Figure 55 Dial to MPLS-VPN Architecture
A connection from a remote user, or from a CE using its ISDN interfaces, is established as follows:
1. User calls the LAC using ISDN or analog and starts the PPP negotiation with the LAC.
2. After negotiating PPP LCP and agreeing on the authentication protocol (PAP or CHAP), the LAC receives the user’s PAP request or CHAP response.
3. Based on the domain name contained in the user name of this response (i.e. “[email protected]”), the LAC forwards the PPP session to an LNS/PE using Layer 2 Tunneling Protocol (L2TP). This selection can also be based on the DNIS contained in the ISDN SETUP message.
4. The LNS continues the PPP negotiation and authenticates the user against an AAA server (Radius).
5. If the authentication succeeds, the LNS creates a virtual-access interface based on the information contained in the Radius Access-Accept packet. This information contains IP address (pool, static routes) and VRF information.
6. The virtual-access interface will be placed into the specified VRF and the user will be able to reach other sites within his VRF.
8.4.2 Integration into MipNet’s Topology
MipNet uses Cisco 7200 series routers equipped with an NPE-G1 as MPLS-VPN PE. Those routers can also serve as vHGW/LNS. As the POP Podgorica will use two PE’s (one in “Podgorica MTKC”, the other in “Podgorica TKC”), both PE’s will be used as LNS to provide redundancy in case of a PE failure. Both PE’s can either be used in an active/active (i.e. L2TP tunnels will be load-balanced across both PE’s) or active/standby configuration (L2TP tunnels will be terminated on a “primary” PE, the “secondary” PE will take over in case the primary fails). To distribute the load more evenly across the PE’s, active/active redundancy will initially be deployed.
8.4.3 Physical Topology and MTU Setting
The physical topology of the AS5350 connections is shown in Figure 56 below. One FastEthernet interface will connect the AS5350 to each of the two Catalyst 6509 switches to provide redundancy. Four E1s connect to the PSTN/ISDN network.
[Figure 56 diagram. Labels: AS 5350 (Fa0/0, Fa0/1), E1/PRI to the PSTN, two Cat 6509 switches (Podgorica), PE MTKC 7206 VXR, PE TKC 7206 VXR, MPLS Core, Trunk 1, Trunk 2, Trunk 3]
Figure 56 AS5350 Connection - Physical Topology
In every tunneled environment (L2TP, PPPoE, etc.), the problem of MTU and fragmentation arises when Ethernet links are in the transit path. This problem occurs when a 1500 byte packet arrives on the LAC and needs to be L2TP-encapsulated and sent over the Ethernet connection to the LNS (or vice versa). As L2TP adds another 40 byte header to the packet (20 byte IP + 8 byte UDP + 12 byte L2TP), the encapsulated packet can’t be transmitted over an Ethernet segment with a default MTU of 1500, so it needs to be fragmented on the LAC (or LNS). Several methods exist to avoid this performance impact (see http://www.cisco.com/warp/public/471/l2tp_mtu_tuning.html). In MipNet’s environment we recommend increasing the Ethernet MTU on the LAC and PE’s to at least 1540 and enabling jumbo frame support for those Vlans on the Cat6509.
Note: The MTU cannot be tuned on the given hardware, so fragmentation as described above may occur. It is recommended to monitor CPU utilisation during peak periods.
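The MTU arithmetic above can be checked with a short sketch. The header sizes are the ones quoted in the text (20 byte IP + 8 byte UDP + 12 byte L2TP); the function names are hypothetical and serve only as an illustration.

```python
# Per-packet L2TP encapsulation overhead in bytes, as quoted in the text.
L2TP_OVERHEAD = {"ip": 20, "udp": 8, "l2tp": 12}


def encapsulated_size(payload_bytes: int) -> int:
    """Size of the packet after L2TP encapsulation."""
    return payload_bytes + sum(L2TP_OVERHEAD.values())


def needs_fragmentation(payload_bytes: int, link_mtu: int) -> bool:
    """True if the encapsulated packet does not fit into the link MTU."""
    return encapsulated_size(payload_bytes) > link_mtu


if __name__ == "__main__":
    print(encapsulated_size(1500))          # size on the LAC-LNS segment
    print(needs_fragmentation(1500, 1500))  # default Ethernet MTU
    print(needs_fragmentation(1500, 1540))  # recommended minimum MTU
```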
8.4.4 Logical Topology
The LAC/AS5350 will need to communicate with both PE’s via FastEthernet. To fully use all redundant FastEthernet connections, two Vlans must be configured in the Catalyst’s Layer 2 infrastructure. Each Vlan connects one AS5350 FastEthernet port to both PE’s via GigabitEthernet subinterfaces on the PE. Both Vlans need to be allowed on all the 802.1q trunk connections (Trunk 1, 2 and 3 as shown in Figure 56).
This results in the following Layer 3 topology (logical topology) in Figure 57. As each Vlan needs to support three hosts, at least a /29 subnet each needs to be assigned. This also adds room for a redundant AS5350 which might be deployed in the future.
[Figure 57 diagram. Labels: AS 5350 (Fa0/0, Fa0/1), Vlan A, Vlan B, PE MTKC 7206 VXR (GigE0/0.100, GigE0/0.101), PE TKC 7206 VXR (GigE0/0.100, GigE0/0.101)]
Figure 57 AS5350 Connection - Logical Topology
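The /29 sizing for each Vlan can be verified with the Python standard library. This is a sketch under stated assumptions: the 192.0.2.0 network is a documentation prefix used purely as a placeholder, and the helper name is hypothetical.

```python
import ipaddress


def smallest_prefix_for(hosts: int) -> int:
    """Longest IPv4 prefix length that still offers `hosts` usable addresses."""
    for prefix_len in range(30, 0, -1):
        usable = 2 ** (32 - prefix_len) - 2  # minus network and broadcast
        if usable >= hosts:
            return prefix_len
    raise ValueError("too many hosts for a single IPv4 subnet")


if __name__ == "__main__":
    prefix_len = smallest_prefix_for(3)  # AS5350 plus two PE's
    vlan = ipaddress.ip_network(f"192.0.2.0/{prefix_len}")  # placeholder prefix
    print(prefix_len, len(list(vlan.hosts())), vlan.hostmask)
```

A /30 offers only two usable addresses, so a /29 is the smallest subnet that fits the three routers; its six usable addresses leave room for a second AS5350, and its host mask is the wildcard to use in an OSPF network statement.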
8.4.5 Routing
To allow for a fully redundant setup, L2TP tunnels between the LAC/AS5350 and the LNS/PE will use dedicated Loopback interfaces as source and destination addresses. This way L2TP tunnels will stay connected when communication via one of the Vlans fails.
The following communication relationships exist within this network segment:
LAC/AS5350 establishes L2TP tunnels to both LNS/PE’s
LNS/PE’s communicate via SGBP to bundle Multilink sessions (Multi-Chassis-Multilink-PPP)
LAC/AS5350 must be reachable for management
The common Vlans connecting LAC and LNS should not be used to carry PE-PE MPLS traffic; this traffic should be forwarded only via the PE-P links through the MPLS core. These requirements call for a special routing design in which IS-IS cannot be used:
If the LAC/AS5350 participated in IS-IS, both PE’s would treat these connections as an alternate path and would use them for PE-PE (tagged) traffic when the primary links fail. To avoid this situation, we introduce a separate IGP (OSPF) on those links, which carries only the Loopback addresses used for L2TP and SGBP, the LAC’s management loopback interface and the common Vlan addresses. Both PE’s originate a default route into the OSPF domain. The OSPF domain will then be redistributed into IS-IS to ensure network-wide reachability for LAC management purposes; redistribution filters will be deployed on the PEs to prevent the default route from being redistributed into the IS-IS domain.
To only carry L2TP and SGBP traffic across the OSPF domain, dedicated loopback addresses/interfaces will be used independent from the loopback interfaces used for management/LDP/BGP next-hop.
As the LAC doesn’t terminate any user connection directly, it will not redistribute any external routes into the IS-IS domain.
The layer 3 topology (see Figure 57) creates a scenario where equal-cost paths exist between the LAC and each PE. To avoid any out-of-order delivery of packets, CEF per-destination load-sharing (default setting) must be used.
The OSPF configuration on the PE will look like this:
interface Loopback1
 description used for L2TP/SGBP only
 ip address x.x.x.x 255.255.255.255
!
router ospf 1
 router-id x.x.x.x
 network x.x.x.x 0.0.0.0 area 0
 network <vlan1-address> 0.0.0.7 area 0
 network <vlan2-address> 0.0.0.7 area 0
 default-information originate
!
router isis
 redistribute ospf 1 route-map ospf2isis level-2
! Prefix list filtering 0.0.0.0/0 only while allowing the rest
ip prefix-list ospf2isis-filter deny 0.0.0.0/0
ip prefix-list ospf2isis-filter permit 0.0.0.0/0 le 32
!
route-map ospf2isis permit 10
 match ip address prefix-list ospf2isis-filter
The OSPF configuration on the LAC/AS5350 will include all configured addresses/interfaces.
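To illustrate how the ospf2isis-filter prefix list separates the default route from all other OSPF routes, the following sketch models the matching rules in Python. It is a simplified model, not the IOS implementation: it assumes first-match evaluation, an exact prefix-length match when no "le" is given, and an implicit deny at the end. The 10.255.0.1/32 and 192.0.2.0/29 routes are placeholder examples.

```python
import ipaddress

# (action, prefix, le) entries mirroring ospf2isis-filter.
PREFIX_LIST = [
    ("deny", "0.0.0.0/0", None),   # exact match: the default route only
    ("permit", "0.0.0.0/0", 32),   # any prefix with length 0..32
]


def evaluate(route: str, entries=PREFIX_LIST) -> str:
    """Return 'permit' or 'deny' for a route, first match wins."""
    net = ipaddress.ip_network(route)
    for action, prefix, le in entries:
        entry = ipaddress.ip_network(prefix)
        max_len = entry.prefixlen if le is None else le
        if net.subnet_of(entry) and entry.prefixlen <= net.prefixlen <= max_len:
            return action
    return "deny"  # implicit deny when nothing matches


if __name__ == "__main__":
    for route in ("0.0.0.0/0", "10.255.0.1/32", "192.0.2.0/29"):
        print(route, evaluate(route))
```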
8.4.6 L2TP Setup
When a dial-in connection arrives at the LAC, the LAC forwards the session to one of the two LNS/PEs. Unless an L2TP tunnel already exists between the LAC and the LNS, a new tunnel will be created. As the user sessions are being load-balanced between both LNS/PE’s, only two L2TP tunnels will exist in the setup, each carrying about half of the PPP sessions. It is possible to use a distinct tunnel for each customer, but this is usually not needed; all customers’ users can share the same tunnel between the LAC and a given PE.
When one PE fails, the L2TP tunnel times out, all sessions forwarded via this tunnel are torn down and the affected users must re-establish the ISDN/modem connections. The sessions will then be forwarded to the other PE.
L2TP Tunnel establishment will be authenticated using locally configured passwords on the LAC and LNS. This adds some protection against misuse.
L2TP tunnel sequencing will not be used as it creates more overhead and is not needed in this simple setup where we can more or less guarantee ordered delivery of datagrams since the LAC and LNS are directly attached.
The LAC will forward all sessions to the LNS/PE; it will not terminate any PPP sessions locally (those sessions would terminate in the global routing table). This setup makes it unnecessary to distinguish certain users by domain name, and we can use the DNIS contained in the ISDN setup message as a selector to forward the sessions to the LNS/PE. The E1’s should be ordered from the PSTN provider as a so-called “Hunt Group”. This way all E1/PRI can be reached via a single telephone number (the DNIS).
Based on this, the following configuration will be used on the LAC to forward the session to the LNS in a round-robin fashion:
username DIAL2MPLS password <PASSWORD>
!
vpdn enable
vpdn search-order dnis
!
vpdn-group dial-to-mpls
 request-dialin
  protocol l2tp
  dnis <DNIS>
 initiate-to ip <PE1-LOOPBACK>
 initiate-to ip <PE2-LOOPBACK>
 local name DIAL2MPLS
 l2tp tunnel password <PASSWORD>
 source-ip <LAC-LOOPBACK>
If multiple DNIS are being used, one can also reference a dialer dnis group within the vpdn-group and list all used DNIS in the group, i.e.:
vpdn-group dial-to-mpls
 request-dialin
  dnis ALL-DNIS
!
dialer dnis group ALL-DNIS
 number …
 number …
The L2TP configuration on the LNS will look like this:
username DIAL2MPLS password <PASSWORD>
!
vpdn enable
!
vpdn-group dial-to-mpls
 accept-dialin
  protocol l2tp
  virtual-template 1
 terminate-from hostname DIAL2MPLS
 local name DIAL2MPLS
 l2tp tunnel password <PASSWORD>
 lcp renegotiation on-mismatch
8.4.7 PPP Multilink and LNS redundancy
In our proposed configuration, the LAC forwards PPP sessions to both LNS/PE in a round-robin fashion. When a dial-in or dial-backup user needs more bandwidth than a single B-channel (64 kBit/s), PPP multilink can be used to add additional channels to the Multilink bundle. As the LAC uses a round-robin load-sharing approach, multiple channels for a given user/multilink bundle can end up on different LNS/PE. This setup is commonly referred to as Multi-Chassis-Multilink-PPP.
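The effect of round-robin forwarding on a multilink bundle can be illustrated with a small simulation. This is a deliberately simplified model and not the LAC’s actual selection logic; the session and LNS names are hypothetical.

```python
from itertools import cycle


def distribute(sessions, lns_list):
    """Assign each incoming PPP session to the next LNS in turn."""
    next_lns = cycle(lns_list)
    return [(session, next(next_lns)) for session in sessions]


if __name__ == "__main__":
    # Two B-channels of the same multilink user, followed by another user.
    calls = ["userA-channel1", "userA-channel2", "userB-channel1"]
    for session, lns in distribute(calls, ["LNS-PE1", "LNS-PE2"]):
        print(session, "->", lns)
```

In this model the two channels of userA land on different LNS/PE routers, which is the situation that Multi-Chassis-Multilink-PPP has to handle.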
The Cisco IOS feature used to bundle multiple channels on different chassis is called Stack Group Bidding Protocol (SGBP). All chassis terminating multilink PPP sessions are statically configured in an SGBP group. When a PPP multilink session is forwarded to an LNS, an SGBP query is sent to all other members asking whether a multilink bundle with the given name already exists. If this is the case, the member again forwards the session using another L2x (L2F or L2TP) tunnel.
The multilink bundle name is usually set to the authenticated peer name (i.e. user name), but it can also be set to the Multilink endpoint discriminator, or both (i.e. “user/endpoint-discriminator”). Even if MipNet might not plan to use non-unique user names, we recommend using the latter option by configuring “multilink bundle-name both”.
SGBP is statically configured on both LNS/PE using a dedicated Loopback address. This loopback address is routed via OSPF via the connected GigabitEthernet Vlans, so all SGBP traffic will use the direct LAN instead of being routed via the MPLS core in order to minimize delay.
The resulting Multi-Chassis Multilink configuration on MTK_PE_1 looks like this:
!
aaa authentication sgbp default local
!
hostname MTK_PE_1
!
username TKC_PE_1 password cisco
username MULTILINK password cisco
!
multilink virtual-template 1
multilink bundle-name both
!
sgbp group MULTILINK
sgbp member LNSPE2 <loopback1-address of LNSPE2>
sgbp source-ip <loopback1-address of LNSPE1>
Note: By default, SGBP authentication uses the PPP authentication default method. This can be changed by the command “aaa authentication sgbp ..” in 12.3 and later. When this command is not available and PPP uses radius, a profile for the SGBP members (“LNSPE2” in the above example) must be created on the Radius server. Another alternative would be to use a non-default method for PPP user authentication/authorization via Radius and leave PPP default authentication defined as local for SGBP authentication. This avoids querying the Radius server for SGBP authentication:
aaa authentication ppp default local
aaa authentication ppp DIALIN group radius
aaa authorization network DIALIN group radius
!
interface Virtual-Template1
 …
 ppp authentication chap pap callin DIALIN
 ppp authorization DIALIN
NOTE: on Cisco CPE routers using multilink, “ppp link reorder” should be configured on the dialer interface to activate the relaxed lost-fragment detection algorithm.
8.4.8 Addressing
Two different user types will use dial access to their respective VRF:
1. Roaming/remote users
2. CE using ISDN backup because their primary connection has failed
IP addressing for both user types must be distinguished. While remote users are able to access the VRF using a dynamic IP address assigned from a pre-configured address pool, the CE backup connection requires some static routes to be installed on the PE/LNS.
Dynamic address pools can be configured locally on the PE-LNS or downloaded via Radius. The following sample configuration creates three distinct address pools; two of the pools use overlapping IP addresses and must be grouped via the “group <name>” command option:
ip local pool VRF1-POOL 1.1.1.1 1.1.1.127
ip local pool VRF2-POOL 2.2.2.1 2.2.2.127 group VRF2
ip local pool VRF3-POOL 2.2.2.1 2.2.2.127 group VRF3
Depending on customer address allocation methods, overlapping IP address pools allow the same address range to be used in all customers’ VRFs.
To work around the route propagation delay when redistributing dynamic IP addresses assigned to user connections into MP-BGP, corresponding static Null routes should be provisioned together with the pools. Those Null-routes will be redistributed into MP-BGP and will not change over time:
ip route vrf VRF1 1.1.1.0 255.255.255.128 Null0 200
ip route vrf VRF2 2.2.2.0 255.255.255.128 Null0 200
ip route vrf VRF3 2.2.2.0 255.255.255.128 Null0 200
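Each Null0 route only serves its purpose if it covers the whole address pool it is provisioned with. The following sketch checks this with the Python standard library, using the pool boundaries and routes shown above; the function name is hypothetical.

```python
import ipaddress


def pool_covered(first: str, last: str, prefix: str, netmask: str) -> bool:
    """True if both ends of the pool fall inside the static Null0 route."""
    route = ipaddress.ip_network(f"{prefix}/{netmask}")
    return (ipaddress.ip_address(first) in route
            and ipaddress.ip_address(last) in route)


if __name__ == "__main__":
    # VRF1-POOL against "ip route vrf VRF1 1.1.1.0 255.255.255.128 Null0 200"
    print(pool_covered("1.1.1.1", "1.1.1.127", "1.1.1.0", "255.255.255.128"))
    # A pool extending beyond the /25 would not be covered by the summary.
    print(pool_covered("2.2.2.1", "2.2.2.200", "2.2.2.0", "255.255.255.128"))
```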
Static routes for ISDN backup connections should best be downloaded via Radius as part of the remote user’s AAA authorization phase. The networks will then be redistributed into MP-BGP. This redistribution will create a delay of about 15-30 seconds before the CE dialling in is reachable from all other CE sites. A sample radius profile can be found in the following paragraph.
8.4.9 Authentication, Authorization and Accounting
Radius will be used to authenticate and authorize users connecting to the LAC and LNS/PE. This allows for a central storage of all user-specific information. Unfortunately VPNSC is so far not able to automatically provision Radius user entries, requiring other means to maintain the Radius database.
Most Radius servers use a flat ASCII file containing all user data as a central repository. This notation will be used in the following examples.
A remote user joe@customer uses the password “my-pass” and gets an IP address assigned from address pool “VRF1-POOL”. The virtual-access interface which is cloned on the LNS/PE is put into the VRF “VRF1” and is made unnumbered to the vrf-loopback 1000 already configured on the PE:
[email protected] Password = "my-pass"
    Service-Type = Framed,
    Framed-Protocol = PPP,
    Cisco-avpair = "ip:addr-pool=VRF1-POOL",
    Cisco-avpair = "lcp:interface-config#1=ip vrf forwarding VRF1",
    Cisco-avpair = "lcp:interface-config#2=ip unnumbered Loopback1000"
The next profile authenticates a CE using ISDN backup. The CE’s dialer interface uses the IP address 10.1.1.1, so this address will be assigned to the user. A static route for the CE LAN (10.1.2.0/24) will be installed in the customer’s VRF table. The next-hop can be omitted; the LNS/PE will automatically insert the remote’s next-hop (10.1.1.1). The virtual-access interface configuration also includes VRF and ip unnumbered information as seen in the previous example:
[email protected] Password = "backup-password"
    Service-Type = Framed,
    Framed-Protocol = PPP,
    Framed-IP-Address = "10.1.1.1",
    Framed-Route = "10.1.2.0 255.255.255.0",
    Cisco-avpair = "lcp:interface-config#1=ip vrf forwarding VRF1",
    Cisco-avpair = "lcp:interface-config#2=ip unnumbered Loopback1000"
Cisco ACS profile examples are shown below. Here[14] we make use of ACS’ group hierarchy, where customer-specific attributes (i.e. vrf-membership) are configured within the “Customer” profiles, service-specific attributes (address pool, etc.) within the services profile, and the user profiles store only user-specific information (password, per-user static routes for backup, etc.):
(Root)
  Customer-VRF1
    Dial-In
      User-1
      User-2
      …
    Dial-Backup
      CE-1
      CE-2
      …
  Customer-VRF2
    Dial-In
      User-1
      User-2
      …
    Dial-Backup
      CE-1
      CE-2
      …
Sample customer profile assigning the VRF information to the virtual-access interface:
# ./ViewProfile -p 9900 -g Customer-VRF1
Group Profile Information
group = Customer-VRF1{
profile_id = 37
profile_cycle = 9
member = MipNet
radius=Cisco12.05 {
reply_attributes= {
9,1="lcp:interface-config#2=ip unnumbered Loopback1000"
9,1="lcp:interface-config#1=ip vrf forwarding VRF1"
6=2 # corresponds to “User-Service-Type = Framed”
7=1 # corresponds to “Framed-Protocol = PPP”
}
}
}
[14] This group hierarchy is just an example. Depending on the other Radius attributes to be stored or on MipNet’s preferences, the structure might vary. If another structure is used, the sample Radius profile mentioned above should be used as a reference as to which attributes must be included.
Service profile assigning the IP pool from which the dynamic IP address will be allocated:
# ./ViewProfile -p 9900 -g Dial-In
Group Profile Information
group = Dial-In{
profile_id = 46
profile_cycle = 2
member = Customer-VRF1
radius=Cisco12.05 {
reply_attributes= {
9,1="ip:addr-pool=VRF1-POOL"
}
}
}
Finally a user profile for a dial-in customer. As all service and vrf-specific attributes are included in the parent group(s), this profile only contains the password as a check item:
# ./ViewProfile -p 9900 -u [email protected]
Profile Information
user = [email protected]{
profile_id = 38
profile_cycle = 4
member = Dial-In
radius=Cisco12.05 {
check_items= {
2=my-pass
}
}
}
A profile for a dial-backup customer (i.e. CE router) contains IP address information and a static route for the CE LAN. Multiple static routes can be included in the profile:
# ./ViewProfile -p 9900 -u [email protected]
Profile Information
user = [email protected]{
profile_id = 45
profile_cycle = 3
member = Dial-Backup
radius=Cisco12.05 {
check_items= {
2=backup-password
}
reply_attributes= {
6=2
7=1
8=167837953 # in decimal notation, corresponds to “10.1.1.1”
22="10.1.2.0 255.255.255.0"
}
}
}
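The decimal notation used for attribute 8 (Framed-IP-Address) in the profile above is simply the 32-bit integer value of the IPv4 address. A minimal sketch for converting in both directions, with hypothetical helper names:

```python
import ipaddress


def to_decimal(address: str) -> int:
    """Dotted-quad IPv4 address to its 32-bit decimal value."""
    return int(ipaddress.IPv4Address(address))


def from_decimal(value: int) -> str:
    """32-bit decimal value back to dotted-quad notation."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    print(to_decimal("10.1.1.1"))
    print(from_decimal(167837953))
```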
8.4.10 Provisioning Dial Customers with ISC
The integration of Dial-access to MPLS-VPN into IP Solution Center (ISC) v3.0 is still very limited. ISC does not provision most of the information needed to terminate Dial customers on an LNS/PE, so some manual work must be performed when adding new customer VRFs or new users to existing VRFs. The following paragraphs show the steps to perform when adding customers.
The steps assume a Radius structure similar to the one shown in the previous section.
Adding New Customer VRF
1. Provision IP VRF and a new Loopback interface on both LNS-PE’s using an ISC Service Request. Note the loopback interface number (Loopback<n>) as well as the VRF’s name for later use.
NOTE: The same loopback interface number must be used on both PE’s (i.e. if loopback500 is provisioned for customer-VRF 1 on MTK_PE_1, loopback500 must be used on TKC_PE_1 as well). Because of this restriction, it may be preferable not to provision the loopback via ISC at all.
2. Manually allocate two address pools from the customer’s address space and manually configure the pools on both LNS-PE’s including redistribution into MP-BGP and address summarization:
ip local pool POOL-<vrf-name> x.x.x.x y.y.y.y
ip route vrf <vrf-name> <prefix> <netmask> Null0
!
router bgp <asn>
 address-family ipv4 vrf <vrf-name>
  redistribute connected
  redistribute static
  aggregate-address <prefix> <netmask> summary-only
3. Create the Radius profile hierarchy for the new customer:
Create a group <customer-name> containing the reply attributes:
 User-Service-Type = Framed
 Framed-Protocol = PPP
 Cisco-avpair = "lcp:interface-config#1=ip vrf forwarding VRF1"
 Cisco-avpair = "lcp:interface-config#2=ip unnumbered Loopback1000"
Create two groups for Dial-in and Dial-backup within the newly created group. Add the following reply attribute to the Dial-in group:
 Cisco-avpair = "ip:addr-pool=POOL-<vrf-name>"
Adding User Dial-in Accounts to existing Customer VRF
1. Add a new Radius profile within the customer’s “Dial-in” group using only a Radius check-item for the user’s password:
 Password = "<password>"
Adding CE Dial-backup Accounts to existing Customer VRF
1. Add a new Radius profile within the customer’s “Dial-backup” group using a Radius check-item for the user’s password and Radius reply attributes for static routes and the remote IP address:
 Password = "<password>"
 Framed-IP-Address = "<remote-address>"
 Framed-Route = "<network> <netmask>"
8.4.11 Advanced Features
per-vrf-AAA
vpdn-multihop vrf-aware
8.5 IPsec Access to MPLS/VPNs
Solution documentation can be found on CCO:
http://www.cisco.com/univercd/cc/td/doc/product/vpn/solution/aswan15/
The documentation below gives an overview and a deployment model; however, provisioning and implementation of IPsec access into MPLS/VPNs will be controlled through ISC as described at
http://www.cisco.com/univercd/cc/td/doc/product/rtrmgmt/isc/3_0/secmgmt/ipsec2.htm
8.5.1 Overview
The solution can be implemented with either a single device providing all functionality or with multiple devices, each providing some of the functionality. In the “one box” topology, IPSec tunnels terminate on a Cisco IOS router (IPsec PE), which is capable of mapping these tunnels into the appropriate MPLS VPNs. In the “two box” topology IPSec aggregation and other value added services are provided by one device while a second device provides the MPLS PE functionality. This segregation of functionality will provide better scaling for each function (and therefore for the architecture as a whole).
A “one box” design has been selected for Mipnet, because large-scale use of IPsec is not expected, at least in the near future. The IPsec PE shall be a dedicated 7206VXR, because of special IOS requirements.
The IPsec PE must have an Internet-facing interface in the global routing table. It is therefore recommended to implement a PIX firewall between the Internet gateway and the IPsec PE, to secure the global routing table of Mipnet routers.
Note: currently no firewall is installed between the IPsec PE and the Internet (L2 switch in AS8585). We therefore recommend configuring strict packet filtering rules on the Internet-facing link of the IPsec PE. The ACL must only allow IPsec traffic destined to the local Loopback address and must block any other packet towards the Mipnet infrastructure, i.e. towards any IP address configured in the global routing table of any router in Mipnet.
Figure 58 IPSec to MPLS VPNs (Single box)
Packet Flow into the IPSec Tunnel
A VPN packet arrives from the Service Provider MPLS backbone network to the PE and is routed through an interface facing the Internet.
The packet is matched against the Security Policy Database (SPD), and the packet is IPSec encapsulated. The SPD includes the IVRF and the access control list (ACL).
The IPSec encapsulated packet is then forwarded using the FVRF routing table.
Packet Flow from the IPSec Tunnel
An IPSec-encapsulated packet arrives at the PE router from the remote IPSec endpoint.
IPSec performs the Security Association (SA) lookup for the Security Parameter Index (SPI), destination, and protocol.
The packet is decapsulated using the SA and is associated with IVRF.
The packet is further forwarded using the IVRF routing table.
8.5.2 VRF Aware IKE/IPsec
Providing VRF awareness within IKE/IPsec is a key feature in this phase of the solution (12.2(15)T). Without VRF awareness, every VPN required a unique public IP address and hence a unique subinterface on the IPsec aggregation router to which the clients and sites could connect. If the SP intended to support 100 VPN customers, it was therefore necessary to have 100 subinterfaces to terminate sessions from each of the VPNs. This was necessary because a user/site/session was linked to a VRF based on the VRF affiliation of the incoming interface. By introducing dynamic VRF association, this restriction is removed.
VRF association is now done using ISAKMP profiles. The incoming sessions are matched based on parameters such as IP address or the group name (in case of VPN clients) during phase one IKE negotiation and placed in the corresponding VRFs. The IKE Security Associations (SA) that are created also include the VRF-ID; when a SA lookup is done, the VRF-ID is also included as part of search criteria.
Figure 59 VRF-aware IPsec
[Figure 59 diagram. Labels: IOS Router, single interface/public IP address for all the VPNs, IPSec crypto map, global routing table, VRF-1, VRF-2, MPLS interface, MPLS-wrapped clear-text packets forwarded to the MPLS VPNs. Annotations:
• Based on the IKE authentication, the IPSec tunnel is directly associated with the VRF
• AAA passes the VRF ID to the router for the tunnel
• Decrypted clear-text packets are forwarded directly to the right VRF]
Another modification from earlier releases is that IKE endpoint reachability information is no longer required in both the global and the VRF routing table. The endpoint reachability is now required only in the global routing table. Once the packets are encrypted, the route lookup for the destination address in the outer IP header is done directly in the global routing table.
8.5.3 Configuration Examples
IPSec Remote Access-to-MPLS VPN Example
The following shows an IPSec remote access-to-MPLS VPN configuration. The configuration maps IPSec tunnels to MPLS VPNs. The IPSec tunnels terminate on a single public-facing interface.
PE Router Configuration
aaa new-model
!
aaa group server radius vpn1
 server-private 10.1.1.1 auth-port 1645 acct-port 1646 timeout 5 retransmit 3 key vpn1
!
aaa group server radius vpn2
 server-private 10.1.1.1 auth-port 1645 acct-port 1646 timeout 5 retransmit 3 key vpn2
!
aaa authorization network aaa-list group radius
!
ip vrf vpn1
 rd 100:1
 route-target export 100:1
 route-target import 100:1
!
ip vrf vpn2
 rd 101:1
 route-target export 101:1
 route-target import 101:1
!
crypto isakmp profile vpn1-ra
 vrf vpn1
 match identity group vpn1-ra
 client authentication list vpn1
 isakmp authorization list aaa-list
 client configuration address initiate
 client configuration address respond
crypto isakmp profile vpn2-ra
 vrf vpn2
 match identity group vpn2-ra
 client authentication list vpn2
 isakmp authorization list aaa-list
 client configuration address initiate
 client configuration address respond
!
!
crypto ipsec transform-set vpn1 esp-3des esp-sha-hmac
crypto ipsec transform-set vpn2 esp-3des esp-md5-hmac
!
crypto dynamic-map vpn1 1
 set transform-set vpn1
 set isakmp-profile vpn1-ra
 reverse-route
!
crypto dynamic-map vpn2 1
 set transform-set vpn2
 set isakmp-profile vpn2-ra
 reverse-route
!
!
crypto map ra 1 ipsec-isakmp dynamic vpn1
crypto map ra 2 ipsec-isakmp dynamic vpn2
!
interface Ethernet1/1
 ip address 172.17.1.1 255.255.0.0
 tag-switching ip
!
interface Ethernet1/2
 ip address 172.18.1.1 255.255.255.0
 crypto map ra
!
ip local pool vpn1-ra 10.4.1.1 10.4.1.254 group vpn1-ra
ip local pool vpn2-ra 10.4.1.1 10.4.1.254 group vpn2-ra
!
Static IPSec-to-MPLS VPN Example
The following sample shows a static configuration that maps IPSec tunnels to MPLS VPNs. The configurations map IPSec tunnels to MPLS VPNs "VPN1" and "VPN2." Both of the IPSec tunnels terminate on a single public-facing interface.
IPSec PE Configuration
ip vrf vpn1
 rd 100:1
 route-target export 100:1
 route-target import 100:1
!
ip vrf vpn2
 rd 101:1
 route-target export 101:1
 route-target import 101:1
!
crypto keyring vpn1
 pre-shared-key address 172.16.1.1 key vpn1
crypto keyring vpn2
 pre-shared-key address 10.1.1.1 key vpn2
!
crypto isakmp policy 1
 encr 3des
 authentication pre-share
 group 2
!
crypto isakmp profile vpn1
 vrf vpn1
 keyring vpn1
 match identity address 172.16.1.1 255.255.255.255
!
crypto isakmp profile vpn2
 vrf vpn2
 keyring vpn2
 match identity address 10.1.1.1 255.255.255.255
!
crypto ipsec transform-set vpn1 esp-3des esp-sha-hmac
crypto ipsec transform-set vpn2 esp-3des esp-md5-hmac
!
crypto map crypmap 1 ipsec-isakmp
 set peer 172.16.1.1
 set transform-set vpn1
 set isakmp-profile vpn1
 match address 101
crypto map crypmap 3 ipsec-isakmp
 set peer 10.1.1.1
 set transform-set vpn2
 set isakmp-profile vpn2
 match address 102
!
interface Ethernet1/1
 ip address 172.17.1.1 255.255.0.0
 tag-switching ip
!
interface Ethernet1/2
 ip address 172.18.1.1 255.255.255.0
 crypto map crypmap
!
ip route 172.16.1.1 255.255.255.255 172.18.1.2
ip route 10.1.1.1 255.255.255.255 172.18.1.2
ip route vrf vpn1 10.2.0.0 255.255.0.0 172.18.1.2 global
ip route vrf vpn2 10.2.0.0 255.255.0.0 172.18.1.2 global
!
access-list 101 permit ip 10.1.0.0 0.0.255.255 10.2.0.0 0.0.255.255
access-list 102 permit ip 10.1.0.0 0.0.255.255 10.2.0.0 0.0.255.255
IPSec Customer Provided Edge (CPE) Configuration for VPN1
crypto isakmp policy 1
 encr 3des
 authentication pre-share
 group 2
crypto isakmp key vpn1 address 172.18.1.1
!
!
crypto ipsec transform-set vpn1 esp-3des esp-sha-hmac
!
crypto map vpn1 1 ipsec-isakmp
 set peer 172.18.1.1
 set transform-set vpn1
 match address 101
!
interface FastEthernet1/0
 ip address 172.16.1.1 255.255.255.0
 crypto map vpn1
!
interface FastEthernet1/1
 ip address 10.2.1.1 255.255.0.0
!
access-list 101 permit ip 10.2.0.0 0.0.255.255 10.1.0.0 0.0.255.255
!
IPSec CPE Configuration for VPN2
crypto isakmp policy 1
 encr 3des
 authentication pre-share
 group 2
!
crypto isakmp key vpn2 address 172.18.1.1
!
!
crypto ipsec transform-set vpn2 esp-3des esp-md5-hmac
!
crypto map vpn2 1 ipsec-isakmp
 set peer 172.18.1.1
 set transform-set vpn2
 match address 101
!
interface FastEthernet0
 ip address 10.1.1.1 255.255.255.0
 crypto map vpn2
!
interface FastEthernet1
 ip address 10.2.1.1 255.255.0.0
!
access-list 101 permit ip 10.2.0.0 0.0.255.255 10.1.0.0 0.0.255.255
9 Network Security and Filtering
This section recommends some important security and operational features that can be enabled in Cisco IOS to increase the security of Mipnet. A detailed definition of the TMN security policy is beyond the scope of this project.
9.1 PE-CE Routing Protocols Security - Summary
Security features of dynamic routing protocols at the access layer have already been explained. The following security measures can be turned on for dynamic routing protocols between PE and CE routers:
BGP updates will be authenticated with MD5 signatures.
The maximum number of prefixes that can be received from any BGP CE will be limited (to prevent a customer from dumping, e.g., the full Internet routing table into the PE router).
The maximum number of routes for any VRF shall be limited (for a similar reason as the BGP maximum-prefix limit; this limit also protects the PE when RIP is the CE-PE routing protocol).
RIPv2 sessions shall be authenticated.
Strict prefix and AS_PATH filtering will be applied on BGP-routed Internet customers as explained below.
Anti-spoofing filters for BGP communities that are significant for routing policy implementation in Mipnet shall be enabled on each eBGP peer.
Filtering of routing information received from business MPLS/VPN customers is not mandatory, since the routing updates cannot affect other MPLS/VPN customers (assuming that the maximum number of routes per VRF and from each BGP neighbor is limited). In addition, the dynamic routing protocol will be used for large customers with many prefixes to avoid manual configuration of static routes. The flexibility offered by dynamic routing would be lost if strict routing filters were applied, because they would have to be maintained manually in the same way as static routes.
9.2 BGP Community Anti-Spoofing filters
BGP communities that have local significance, i.e. that will be used for implementing the routing policy in Mipnet, must not be accepted from any eBGP neighbor, whether a VPN customer or an Internet customer. They should be stripped out with an inbound route-map as shown in the following examples.
Some route-maps may have multiple “permit” statements; the “set comm-list delete” command must be applied to each “permit” instance within the route-map.
Figure 60 Community spoofing example for BGP customers
!
router bgp 29453
 neighbor <neighborIP> route-map CUST-XY in
!
route-map CUST-XY permit 100
 set comm-list 123 delete
 set community 29453:26xx additive ! Eg. Mark the route as PI or PA customer
!
ip community-list 123 deny _29453:50_ ! These communities will be accepted
ip community-list 123 deny _29453:90_
ip community-list 123 permit .* ! Any other community is deleted
!
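The effect of community-list 123 when used with “set comm-list delete” can be sketched in Python. This is only an illustration under stated assumptions: each received community is assumed to be matched individually against the list, a “permit” match deletes the community, a “deny” match keeps it, and the Cisco “_” token is approximated by a delimiter expression. The function names and the sample communities are hypothetical.

```python
import re

# Approximation of the Cisco "_" token: start or end of string, or a delimiter.
CISCO_UNDERSCORE = r"(?:^|$|[ ,{}()])"

def to_python_regex(cisco_pattern):
    """Translate a Cisco-style pattern into a compiled Python regex (approximation)."""
    return re.compile(cisco_pattern.replace("_", CISCO_UNDERSCORE))

# Entries of community-list 123 from Figure 60, in configuration order.
COMMUNITY_LIST_123 = [
    ("deny", "_29453:50_"),    # kept (accepted from the customer)
    ("deny", "_29453:90_"),    # kept (accepted from the customer)
    ("permit", ".*"),          # everything else is deleted
]

def strip_communities(communities, community_list):
    """Return the communities that survive 'set comm-list <list> delete'."""
    compiled = [(action, to_python_regex(p)) for action, p in community_list]
    kept = []
    for community in communities:
        for action, pattern in compiled:
            if pattern.search(community):
                if action == "deny":
                    kept.append(community)
                break  # first matching entry decides
        else:
            kept.append(community)  # no entry matched: nothing is deleted
    return kept

# Hypothetical update from a customer trying to spoof an internal community.
received = ["29453:50", "29453:2601", "65000:1", "29453:90"]
print(strip_communities(received, COMMUNITY_LIST_123))
```

In this sketch only 29453:50 and 29453:90 survive; the spoofed 29453:2601 is removed before the route-map adds the correct 29453:26xx marking.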
The explicit “comm-list delete” command in the route-map below is not mandatory, because the community attribute received from a transit ISP or peering partner is completely overwritten with the 29453:xxxx community (there is no “additive” keyword in the “set community” command). Nevertheless, we will apply “comm-list delete” for consistency and as a protection mechanism against router misconfiguration (e.g. if the “set community” command is not entered or is removed by accident).
Figure 61 Community spoofing example for transit ISPs and peering partners
!
router bgp 29453
 neighbor <neighborIP> route-map PEER-or-TRANSIT-XY in
!
route-map PEER-or-TRANSIT-XY permit 100
 set comm-list 122 delete
 set community 29453:2700 ! Mark the route as needed
!
ip community-list 122 permit .* ! All communities will be deleted
!
9.3 BGP damping on iGWs (RIPE-229)
BGP damping is a mechanism to improve the stability of routing in the Internet and to reduce the load of CPU on Internet backbone routers. It shall be enabled on iGW routers and any other Internet PE that holds the whole routing table.
9.3.1 What is route-flap damping?
The following is a description of route-flap damping as documented in RIPE-229 (http://www.ripe.net/ripe/docs/routeflap-damping.html), which recommends the co-ordination of damping parameters among service providers.
When BGP route-flap damping is enabled in a router, the router starts to collect statistics about the announcement and withdrawal of prefixes. Route-flap damping is governed by a set of parameters with vendor-supplied default values which may be modified by the router manager. The names, semantic and syntax of these parameters differ between the various implementations; however, the behaviour of the damping mechanism is basically the same.
Each time a prefix is withdrawn, the router will increment the damping penalty by a fixed amount. When the number of withdrawals/announcements (=flap) is exceeded in a given time frame (cutoff threshold) the path is no longer used and not advertised to any BGP neighbour for a predetermined period starting from when the prefix stops flapping. Any more flaps happening after the prefix enters suppressed state will attract additional penalty. Once the prefix stops flapping, the penalty is decremented over time using a half-life parameter until the penalty is below a reuse threshold. Once below this reuse threshold the suppressed path is then re-used and re-advertised to BGP neighbours.
9.3.2 Route-flap damping implementation in Cisco IOS
It is important to understand the following terminology and concepts when configuring and tuning BGP dampening on Cisco IOS:
Routes received from iBGP neighbors are not subject to damping.
Every time a route flaps (withdrawn) it gets 1000 penalty points. For change in BGP attribute the penalty is increased by 500 points.
When the penalty exceeds “suppress limit”, the route is damped (no longer used or propagated).
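Penalty placed on a route is decayed using an exponential decay algorithm:

$$P_1 = P_0 \cdot \left(\frac{1}{2}\right)^{\frac{T_1 - T_0}{\text{half-time}}}$$

where $P_0$ is the penalty at time $T_0$ and $P_1$ is the penalty at the later time $T_1$. New penalties are calculated (with granularity of 5-second ticks) at each run of BGP Scanner process.
Damped route is re-used and re-advertised when the penalty drops below “reuse limit”.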
Flap history is forgotten when the penalty drops below half of the “reuse limit”.
Once the route stabilizes, it can not be suppressed for more than “max-suppress” time.
Maximum penalty that a path can accumulate is:

$$\text{max-penalty} = \text{reuse-limit} \cdot 2^{\frac{\text{max-suppress-time}}{\text{half-time}}}$$
An unreachable route with flap history is put in “history state” - it stays in the BGP table but only to maintain the flap history. This can be noticed by constant number of paths (“sh ip bgp sum”) even with presence of route flaps.
Penalty is applied on individual path in the BGP table, not on the IP prefix.
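The decay rule above can be checked with a few lines of Python. This is a minimal sketch that treats the decay as continuous (the 5-second tick granularity is ignored) and ignores decay between flaps; the function names are illustrative only, and the numeric inputs are the Cisco IOS defaults (half-time 15 minutes, suppress limit 2000, reuse limit 750).

```python
import math

def decayed_penalty(penalty, elapsed_minutes, half_time_minutes):
    """Penalty remaining after elapsed_minutes without further flaps."""
    return penalty * 0.5 ** (elapsed_minutes / half_time_minutes)

def minutes_until_reuse(penalty, reuse_limit, half_time_minutes):
    """Minutes of stability needed before the penalty decays to the reuse limit."""
    if penalty <= reuse_limit:
        return 0.0
    return half_time_minutes * math.log2(penalty / reuse_limit)

# Three withdrawals in quick succession, 1000 penalty points each.
penalty = 3 * 1000
print(penalty > 2000)                         # is the route suppressed?
print(decayed_penalty(penalty, 15, 15))       # penalty one half-time later
print(minutes_until_reuse(penalty, 750, 15))  # stable time needed before reuse
```

With these inputs the penalty of 3000 exceeds the suppress limit, halves to 1500 after 15 minutes, and needs two half-times (30 minutes) to fall to the reuse limit of 750.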
Default BGP damping parameters that are used when we just turn damping on with ‘bgp dampening’ command are shown in the table below.
Table 22 Default BGP damping parameters
| half-time | per-flap penalty (non-configurable) | suppress limit | reuse limit | max-suppress-time |
|---|---|---|---|---|
| 15 minutes | 1000 on down event; 500 on attribute change | 2000 | 750 | 60 minutes |
The configuration template based on recommendation in RIPE-229 is copied from http://www.golden-networks.net/cisco.txt. The same damping parameters should be applied to any router under TMN administration domain that holds the full Internet routing table.
!
router bgp 29453
 no bgp damp
 bgp damp route-map graded-flap-damping
!
! don't damp Golden Networks
!
route-map graded-flap-damping deny 10
 match ip address prefix-list golden-networks
!
! - /24 and longer prefixes: max=min outage 60 minutes
!
route-map graded-flap-damping permit 20
 match ip address prefix-list min24
 set damp 30 820 3000 60
!
! - /22 and /23 prefixes: max outage 45 minutes but potential for
! less because of shorter half life value - minimum of 30 minutes
! outage
route-map graded-flap-damping permit 30
 match ip address prefix-list max22-23
 set damp 15 750 3000 45
!
! - all else prefixes: max outage 30 minutes min outage 10 minutes
!
route-map graded-flap-damping permit 40
 set damp 10 1500 3000 30
!
!-----------------------------------------------------------------------
! DEFINE PREFIX-LISTS
!-----------------------------------------------------------------------
!
! Most recent list of Golden networks is defined at
! http://www.golden-networks.net/cisco.txt
!
no ip prefix-list golden-networks
ip prefix-list golden-networks description Golden Networks
!
no ip prefix-list min24
ip prefix-list min24 description Apply to /24 and longer prefixes
ip prefix-list min24 permit 0.0.0.0/0 ge 24
!
no ip prefix-list max22-23
ip prefix-list max22-23 description Apply to /22 and /23 prefixes
ip prefix-list max22-23 permit 0.0.0.0/0 ge 22 le 23
!
In Cisco IOS, it is possible to configure flap damping parameters which do nothing. One example might be the configuration "bgp dampening 30 750 3000 60". 60 minutes is the max-suppress-time, at which the penalty must be at 750. Half-life is 30 minutes, so working back from 60 minutes, at 30 minutes the penalty will be 1500, and at 0 minutes the penalty is 3000. Penalty has to go over 3000 before the prefix is suppressed, so a max penalty of 3000 won't result in any suppression at all. Increasing the reuse-limit from 750 to e.g. 800, or the suppress-limit from 3000 to 3200 would improve the dampening behaviour.
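The “do nothing” condition can be tested numerically before a parameter set is deployed. The sketch below is an illustration, not a Cisco tool: it applies the maximum-penalty formula from section 9.3.2 and reports whether the ceiling can ever exceed the suppress limit. The function names are hypothetical; each tuple follows the argument order of the “bgp dampening” command (half-life, reuse limit, suppress limit, max-suppress-time).

```python
def max_penalty(reuse_limit, max_suppress_minutes, half_time_minutes):
    """Highest penalty a path can accumulate with the given parameters."""
    return reuse_limit * 2 ** (max_suppress_minutes / half_time_minutes)

def can_suppress(half_time, reuse_limit, suppress_limit, max_suppress):
    """True if the penalty ceiling lies above the suppress limit."""
    return max_penalty(reuse_limit, max_suppress, half_time) > suppress_limit

# Parameter sets taken from the text and the graded-flap-damping route-map.
PARAMETER_SETS = {
    "do-nothing example": (30, 750, 3000, 60),
    "/24 and longer": (30, 820, 3000, 60),
    "/22 and /23": (15, 750, 3000, 45),
    "all other prefixes": (10, 1500, 3000, 30),
}

for name, params in PARAMETER_SETS.items():
    half_time, reuse, suppress, max_suppress = params
    ceiling = max_penalty(reuse, max_suppress, half_time)
    print(name, ceiling, can_suppress(*params))
```

Only the first set fails the check: its ceiling is exactly 3000, which never exceeds the suppress limit of 3000, whereas raising the reuse limit to 820 lifts the ceiling to 3280.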
9.4 Filtering of BGP Updates
9.4.1 Prefix Filtering
Inbound prefix-list filtering in Mipnet will be performed on BGP sessions with Internet customers (i.e. customers in Internet VPNs). Only the routes that have been assigned to that customer will be accepted. This prevents misconfigured customer routers from announcing someone else’s routes or, if the customer is multi-homed, from injecting the full Internet routing table or a default route into Mipnet.
This rule shall be respected even if the Mipnet customer is a “downstream” service provider with its own customer base. In this case, the prefix filter shall comprise the downstream provider’s routes and the routes of the downstream provider’s customers.
Prefix-list filtering is not feasible on BGP sessions with peering partners or upstream service providers because of the size and frequent changes of prefix-lists (operational issue).
Outbound prefix-lists will not be implemented, as the route export control from Mipnet AS shall be based on communities.
Figure 62 Prefix-list filtering of customer routes
!
router bgp 29453
!
 address-family ipv4 vrf VPN1
!
 neighbor <eBGP_peer1_addr> prefix-list CUST_Corporation1 in
 neighbor <eBGP_peer1_addr> capability orf prefix-list send
!
ip prefix-list CUST_Corporation1 seq 10 permit <Corporation1_prefix1>
ip prefix-list CUST_Corporation1 seq 20 permit <Corporation1_prefix2>
!
9.4.2 AS_PATH Filtering
Inbound AS_PATH filtering shall be implemented on customer BGP sessions to prevent accidental AS_PATH spoofing. AS_PATH spoofing is not a serious security issue as it would not break the connectivity of a third-party network (NLRI is already filtered by prefix-list). Nevertheless it shall not be tolerated to prevent confusion and unnecessary operational problems due to ill-behaved customers.
The following AS_PATH filter will permit the routes originated in customer AS20 and allow AS_PATH prepending. It will also permit the routes originated in AS21 and transiting AS20.
Figure 63 AS_PATH filtering on customer eBGP sessions
router bgp 29453
!
 address-family ipv4 vrf VPN1
!
 neighbor <eBGP_peer1_addr> filter-list 100 in
!
ip as-path access-list 100 permit ^(_20)+(_21)*$
!
The route will be accepted if it is permitted by prefix-list AND AS_PATH filter.
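The combined acceptance rule can be illustrated with a small Python sketch. It is an approximation under stated assumptions: the Cisco “_” token is emulated with a delimiter expression, the prefix-list is modelled as exact-match entries, and the two customer prefixes are hypothetical documentation addresses standing in for <Corporation1_prefix1> and <Corporation1_prefix2>.

```python
import ipaddress
import re

# Approximation of the Cisco "_" token: start or end of string, or a delimiter.
CISCO_UNDERSCORE = r"(?:^|$|[ ,{}()])"

# AS_PATH filter from Figure 63, translated for Python's re module.
AS_PATH_FILTER = re.compile("^(_20)+(_21)*$".replace("_", CISCO_UNDERSCORE))

# Hypothetical prefixes assigned to the customer (exact match only).
ALLOWED_PREFIXES = {
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
}

def accept_route(prefix, as_path):
    """Accept only if the prefix-list AND the AS_PATH filter both permit."""
    prefix_ok = ipaddress.ip_network(prefix) in ALLOWED_PREFIXES
    path_ok = AS_PATH_FILTER.search(as_path) is not None
    return prefix_ok and path_ok

print(accept_route("192.0.2.0/24", "20"))         # originated in AS20
print(accept_route("192.0.2.0/24", "20 20 20"))   # AS20 with prepending
print(accept_route("198.51.100.0/24", "20 21"))   # originated in AS21 via AS20
print(accept_route("192.0.2.0/24", "20 22"))      # unexpected origin AS
print(accept_route("203.0.113.0/24", "20"))       # prefix not assigned
```

The first three routes pass both checks; the fourth fails the AS_PATH filter and the fifth fails the prefix-list.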
9.5 Policing of ICMP traffic on border Internet links
Marking Internet packets with DSCP 0 (Standard traffic class) protects Mipnet against misuse of the implemented QoS mechanisms by customers that have not subscribed to one of the TMN QoS offerings. As previously explained in the QoS section, the DSCP-0 marking must be applied on the following links:
interfaces from transit ISPs,
interfaces from peering partners,
Internet customers and,
VPN customers with unmanaged CE routers.
In addition to preventing “DSCP spoofing”, it is common practice to police certain protocol streams that are more likely to be abused.
9.5.1 SMURF attacks
In 1997 a new generation of attacks was launched on the Internet: SMURFs. The “smurf” attack is a specific Denial of Service (DoS) attack, named after its exploit program. It belongs to a recent category of network-level attacks against hosts. A perpetrator sends a large amount of ICMP echo (ping) traffic to specific IP broadcast addresses. All the ICMP echo packets carry the spoofed source address of a victim. If the routing device delivering traffic to those broadcast addresses translates the IP broadcast into a layer 2 broadcast, the ICMP echo request is forwarded to all hosts on the layer 2 medium. Most hosts on that IP network will accept the ICMP echo request and answer it with an echo reply. This multiplies the traffic by the number of hosts responding. On a multi-access broadcast network, there could potentially be hundreds of machines replying to each packet.
The “smurf” attack’s cousin is “fraggle”, which uses UDP echo packets in the same fashion as the ICMP echo packets; it was a simple re-write of the “smurf” programme. Currently, the systems most commonly hit are Internet Relay Chat (IRC) servers and their providers.
Two parties are hurt by this attack:
The intermediary (broadcast) devices – called the “amplifiers”
The spoofed address target – the “victim”
The victim is the target of the large amount of traffic that the amplifiers generate.
Consider a co-location switched network with 100 hosts, and an attacker with a T1 circuit. The attacker sends, say, a 768 kb/s stream of ICMP echo (ping) packets, with the spoofed source address of the victim, to the broadcast address of the “bounce” or amplifier site. These ping packets hit the bounce site’s broadcast network of 100 hosts; each host takes the packet and responds to it, creating 100 ping replies outbound. If you multiply the bandwidth, you will see that 76.8 Mb/s is generated outbound from the “bounce site”. Because of the spoofed source address of the originating packets, these reply packets are then directed towards the victim.
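The amplification arithmetic of this example can be reproduced as follows. This is a simple sketch that assumes every host on the bounce network answers every echo request and that replies are the same size as requests; the function name is illustrative.

```python
def amplified_rate_bps(attack_rate_bps, responding_hosts):
    """Reply traffic leaving the bounce site, in bits per second."""
    return attack_rate_bps * responding_hosts

attack_rate = 768_000      # 768 kb/s stream sent by the attacker
hosts = 100                # hosts on the amplifier's broadcast network

reply_rate = amplified_rate_bps(attack_rate, hosts)
print(reply_rate / 1_000_000, "Mb/s towards the victim")
```

The same function shows why the size of the broadcast domain matters: halving the number of responding hosts halves the traffic that reaches the victim.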
The following example polices all UDP, ICMP and Multicast traffic to 1 Mbps with relatively small bursts. The DSCP and QoS-group marking must be applied at all times, whereas the policing may be turned on only when a “SMURF” attack has been detected (see footnote 15) or reported.
If the TMN operational policy is to prevent SMURF attacks (i.e. to have the smurf policing applied at all times), it is recommended to monitor the “normal” rate of ICMP/UDP packets across the given interface and adjust the rate in the police command accordingly. Otherwise, customer traffic (e.g. video streams) may be affected by this security mechanism.
!
hostname xxxPE1
!
class-map match-any smurf
 match access-group 160
!
policy-map Internet_link
 class smurf
  set qos-group 0
  ! Limit UDP, ICMP & Multicast to 1M
  police 1000000 8000 8000 conform-action set-dscp-transmit 0 exceed-action drop
 class class-default
  set qos-group 0
  set ip dscp 0
!
interface Serial 2/0/1:0
 bandwidth 10000
 description Connection with peering partner
 service-policy input Internet_link
!
access-list 160 permit udp any any
access-list 160 permit icmp any any
access-list 160 permit ip any 224.0.0.0 15.255.255.255

15 For example, when the MRTG graph shows an abnormal increase of interface load, it is recommended to turn on the SMURF policing.
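The classification performed by access-list 160 can be mimicked in Python to check which packets would fall into the policed class. This is a sketch only: it models the wildcard-mask comparison used by IOS access lists (bits set in the wildcard are ignored) and three access-list entries; the function names and sample packets are hypothetical.

```python
import ipaddress

def wildcard_match(address, base, wildcard):
    """True if address equals base on every bit NOT set in the wildcard mask."""
    addr = int(ipaddress.IPv4Address(address))
    base_int = int(ipaddress.IPv4Address(base))
    care_bits = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (addr & care_bits) == (base_int & care_bits)

def in_smurf_class(protocol, destination):
    """Emulate access-list 160: any UDP, any ICMP, or any multicast destination."""
    if protocol in ("udp", "icmp"):
        return True
    return wildcard_match(destination, "224.0.0.0", "15.255.255.255")

print(in_smurf_class("icmp", "10.1.1.1"))    # ICMP is always policed
print(in_smurf_class("tcp", "239.1.2.3"))    # multicast destination
print(in_smurf_class("tcp", "10.1.1.1"))     # ordinary TCP is not policed
```

The wildcard 15.255.255.255 leaves only the top four bits significant, so the last entry covers the whole 224.0.0.0/4 multicast range.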
9.6 DSCP Spoofing
DSCP must be set to zero on each IPv4 Internet connection: CE-PE links of Internet customers and links to downstream/upstream ISPs and peering partners.
This prevents best-effort Internet traffic from being carried as, e.g., Business-class data across the Mipnet core.
9.7 SNMP
SNMP is very widely used for router monitoring, and frequently for router configuration changes as well. Unfortunately, version 1 of the SNMP protocol, which is the most commonly used, uses a very weak authentication scheme based on a “community string”, which amounts to a fixed password transmitted over the network without encryption. If at all possible, use SNMP version 2, which supports an MD5-based digest authentication scheme, and allows for restricted access to various management data.
If you must use SNMP version 1, you should be careful to choose unobvious community strings (not, for example, “public” or “private”). If at all possible, you should avoid using the same community strings for all network devices; use a different string or strings for each device, or at least for each area of the network. Do not make a read-only string the same as a read-write string. If possible, periodic SNMP version 1 polling should be done with a read-only community string; read-write strings should be used only for actual write operations.
SNMP version 1 is not well suited to use across the public Internet for the following reasons:
It uses clear text authentication strings.
Most SNMP implementations send those strings repeatedly as part of periodic polling.
It is an easily spoofable, datagram-based transaction protocol.
You should carefully consider the implications before using it that way.
In most networks, legitimate SNMP messages will come only from certain management stations. If this is true in your network, you should probably use the access list number option on the snmp-server community command to restrict SNMP version 1 access to only the IP addresses of the management stations. Do not use the snmp-server community command for any purpose in a pure SNMP version 2 environment; this command implicitly enables SNMP version 1.
For SNMP version 2, configure digest authentication with the authentication and md5 keywords of the snmp-server party configuration command. If possible, use a different MD5 secret value for each router.
SNMP management stations often have large databases of authentication information, such as community strings. This information may provide access to many routers and other network devices. This concentration of information makes the SNMP management station a natural target for attack, and it should be secured accordingly.
9.8 Password Management
Passwords and similar secrets (such as SNMP community strings) are the primary defence against unauthorized access to your router. The best way to handle most passwords is to maintain them on a TACACS+ or RADIUS authentication server. However, almost every router will still have a locally configured password for privileged access, and may also have other password information in its configuration file.
The enable secret command is used to set the password that grants privileged administrative access to the IOS system. An enable secret password should always be set. You should use enable secret, not the older enable password, because the latter uses a weak encryption algorithm.
If no enable secret is set, and a password is configured for the console TTY line, the console password may be used to get privileged access, even from a remote VTY session. This is almost certainly not what you want, and is another reason to be certain to configure an enable secret.
The service password-encryption command directs the IOS software to encrypt the passwords, CHAP secrets, and similar data that are saved in its configuration file. This is useful for preventing casual observers from reading passwords, for example, when they happen to look at the screen over an administrator's shoulder.
However, the algorithm used by service password-encryption is a simple Vigenere cipher; any competent amateur cryptographer could easily reverse it in at most a few hours. The algorithm was not designed to protect configuration files against serious analysis by even slightly sophisticated attackers, and should not be used for this purpose. Any Cisco configuration file that contains encrypted passwords should be treated with the same care used for a clear text list of those same passwords.
This weak encryption warning does not apply to passwords set with the enable secret command, but it does apply to passwords set with enable password.
The enable secret command uses MD5 for password hashing. The algorithm has had considerable public review, and is not reversible as far as anybody at Cisco knows. It is, however, subject to dictionary attacks (a "dictionary attack" is having a computer try every word in a dictionary or other list of candidate passwords). It's therefore wise to keep your configuration file out of the hands of untrusted people, especially if you're not sure your passwords are well chosen.
9.9 Console Ports
It is important to remember that the console port of an IOS device has special privileges. In particular, if a BREAK signal is sent to the console port during the first few seconds after a reboot, the password recovery procedure can easily be used to take control of the system. This means that attackers who can interrupt power or induce a system crash, and who have access to the console port via a hardwired terminal, a modem, a terminal server, or some other network device, can take control of the system, even if they do not have physical access to it or the ability to log in to it normally.
It follows that any modem or network device that gives access to the Cisco console port must itself be secured to a standard comparable to the security used for privileged access to the router. At a bare minimum, any console modem should be of a type that can require the dialup user to supply a password for access, and the modem password should be carefully managed.
9.10 Controlling TTY’s
Local asynchronous terminals are less common than they once were, but they still exist in some installations. Unless the terminals are physically secured, and usually even if they are, the router should be configured to require users on local asynchronous terminals to log in before using the system. Most TTY ports in modern routers are either connected to external modems, or are implemented by integrated modems; securing these ports is obviously even more important than securing local terminal ports.
By default, a remote user can establish a connection to a TTY line over the network; this is known as "reverse Telnet," and allows the remote user to interact with the terminal or modem connected to the TTY line. It is possible to apply password protection for such connections. Often, it is desirable to allow users to make connections to modem lines, so that they can make outgoing calls. However, this feature may allow a remote user to connect to a local asynchronous terminal port, or even to a dial-in modem port, and simulate the router's login prompt to steal passwords, or to do other things that may trick local users or interfere with their work.
To disable this reverse Telnet feature, apply the configuration command transport input none to any asynchronous or modem line that should not be receiving connections from network users. If at all possible, do not use the same modems for both dial-in and dial-out, and do not allow reverse Telnet connections to the lines you use for dial-in.
9.11 Controlling VTYs and Ensuring VTY Availability
Any VTY should be configured to accept connections only with the protocols actually needed. This is done with the transport input command. For example, a VTY that was expected to receive only Telnet sessions would be configured with transport input telnet, while a VTY permitting both Telnet and SSH sessions would have transport input telnet ssh. If your software supports an encrypted access protocol such as SSH, it may be wise to enable only that protocol, and to disable clear text Telnet. It's also usually a good idea to use the ip access-class command to restrict the IP addresses from which the VTY will accept connections.
A Cisco IOS device has a limited number of VTY lines (usually five). No additional remote interactive connections can be established if all of the VTY’s are in use. This creates the opportunity for a denial-of-service attack; if an attacker can open remote sessions to all the VTY’s on the system, the legitimate administrator may not be able to log in. The attacker does not have to log in to do this; the sessions can simply be left at the login prompt.
One way of reducing this exposure is to configure a more restrictive ip access-class command on the last VTY in the system than on the other VTY’s. The last VTY (usually VTY 4) might be restricted to accept connections only from a single, specific administrative workstation, whereas the other VTY’s might accept connections from any address in a corporate network.
Another useful tactic is to configure VTY timeouts using the exec-timeout command. This prevents an idle session from consuming a VTY indefinitely. Although its effectiveness against deliberate attacks is relatively limited, it also provides some protection against sessions accidentally left idle. Similarly, enabling TCP keepalives on incoming connections (with service tcp-keepalives-in) can help to guard against both malicious attacks and "orphaned" sessions caused by remote system crashes.
Disabling all non-IP-based remote access protocols, and using IPSec encryption for all remote interactive connections to the router can provide complete VTY protection. IPSec is an extra-cost option, and its configuration is beyond the scope of this document.
9.12 Logging
Cisco routers can record information about a variety of events, many of which have security significance. Logs can be invaluable in characterizing and responding to security incidents. The main types of logging used by Cisco routers are:
AAA logging, which collects information about user dial-in connections, logins, logouts, HTTP accesses, privilege level changes, commands executed, and similar events. AAA log entries are sent to authentication servers using the TACACS or RADIUS protocols, and are recorded locally by those servers, typically in disk files. If you are using a TACACS or RADIUS server, you may wish to enable AAA logging of various sorts; this is done using AAA configuration commands such as aaa accounting.
SNMP trap logging, which sends notifications of significant changes in system status to SNMP management stations.
System logging, which records a large variety of events, depending on the system configuration. System logging events may be reported to a variety of destinations, including the following:
o System console port (logging console).
o Servers using the syslog protocol (logging <ip-address>, logging trap).
o Sessions on VTY’s and TTY’s (logging monitor, terminal monitor).
o Local buffer in router RAM (logging buffered).
Console logging shall be disabled during debugging of router protocols to prevent a router “freeze”.
From a security point of view, the most important events usually recorded by system logging are interface status changes, changes to the system configuration, access list matches, and events detected by the optional firewall and intrusion detection features.
Each system-logging event is tagged with an urgency level. The levels range from debugging information (at the lowest urgency), to major system emergencies. Each logging destination may be configured with threshold urgency, and will receive logging events only at or above that threshold.
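The threshold rule can be illustrated in Python. The sketch assumes the standard syslog severity numbering (0 for emergencies up to 7 for debugging, so a lower number means higher urgency); the per-destination thresholds shown are hypothetical and do not represent a recommended Mipnet setting.

```python
# Standard syslog severity levels: lower number means higher urgency.
SEVERITY = {
    "emergencies": 0, "alerts": 1, "critical": 2, "errors": 3,
    "warnings": 4, "notifications": 5, "informational": 6, "debugging": 7,
}

# Hypothetical thresholds for three logging destinations.
THRESHOLDS = {
    "console": "critical",
    "syslog-server": "informational",
    "buffer": "debugging",
}

def destinations_for(event_level, thresholds):
    """Destinations that receive an event of the given urgency level."""
    event = SEVERITY[event_level]
    return sorted(
        name for name, limit in thresholds.items()
        if event <= SEVERITY[limit]   # at or above the destination's threshold
    )

print(destinations_for("warnings", THRESHOLDS))
print(destinations_for("debugging", THRESHOLDS))
print(destinations_for("emergencies", THRESHOLDS))
```

A warning reaches the buffer and the syslog server but not the console, while debugging output stays in the local buffer only.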
9.12.1 Saving logging information
By default, system-logging information is sent only to the asynchronous console port. Since many console ports are unmonitored, or are connected to terminals without historical memory and with relatively small displays, this information may not be available when it is needed, especially when a problem is being debugged over the network.
Almost every router should save system logging information to a local RAM buffer. The logging buffer is of a fixed size, and retains only the newest information. The contents of the buffer are lost whenever the router is reloaded. Even so, even a moderately sized logging buffer is often of great value. On low-end routers, a reasonable buffer size might be 16384 or 32768 bytes; on high-end routers with lots of memory (and many logged events), even 262144 bytes might be appropriate. You can use the show memory command to make sure that your router has enough free memory to support a logging buffer. Create the buffer using the logging buffered <buffer-size> configuration command.
Larger installations will have syslog servers. You can send logging information to a server with logging <server-ip-address>, and you can control the urgency threshold for logging to the server with logging trap <urgency>. Even if you have a syslog server, you should still enable local logging.
If your router has a real-time clock or is running NTP, you will probably want to time-stamp log entries using service timestamps log datetime msec (and service timestamps debug datetime msec for debug output).
9.12.2 Recording Access List Violations
If you use access lists to filter traffic, you may want to log packets that violate your filtering criteria. Older Cisco IOS software versions support logging using the log keyword, which causes logging of the IP addresses and port numbers associated with packets matching an access list entry. Newer versions provide the log-input keyword, which adds information about the interface from which the packet was received, and the MAC address of the host that sent it.
It is not usually a good idea to configure logging for access list entries that will match very large numbers of packets. Doing so will cause log files to grow excessively large, and may cut into system performance. However, access list log messages are rate-limited, so the impact is not catastrophic.
Access list logging can also be used to characterize traffic associated with network attacks, by logging the suspect traffic.
9.13 Anti-spoofing
Many network attacks rely on an attacker falsifying, or spoofing the source addresses of IP datagrams. Some attacks rely on spoofing to work at all, and other attacks are much harder to trace if the attacker can use somebody else’s address. Therefore, it is valuable for network administrators to prevent spoofing wherever feasible.
Anti-spoofing should be done at every point in the network where it is practical, but is usually both easiest and most effective at the borders between large address blocks, or between domains of network administration. It is usually impractical to do anti-spoofing on every router in a network because of the difficulty of determining which source addresses may legitimately appear on any given interface.
For an Internet service provider, effective anti-spoofing, together with other effective security measures, can cause expensive, annoying problem subscribers to take their business to other providers. ISPs should be especially careful to apply anti-spoofing controls at dialup pools and other end-user connection points (see also RFC 2267).
Administrators of firewalls or perimeter routers sometimes install anti-spoofing measures to prevent hosts on the Internet from assuming the addresses of internal hosts, but do not take steps to prevent internal hosts from assuming the addresses of hosts on the Internet. It's a far better idea to try to prevent spoofing in both directions. There are at least three good reasons for doing anti-spoofing in both directions at an organizational firewall:
Internal users will be less tempted to try launching network attacks and less likely to succeed if they do try.
Wrongly configured internal hosts will be less likely to cause trouble for remote sites.
Outside crackers often break into networks as launching pads for further attacks. These crackers may be less interested in a network with outgoing spoofing protection.
9.13.1 Anti-spoofing with packet filters
Unfortunately, it is not practical to give a simple list of commands that will provide appropriate spoofing protection; access list configuration depends too much on the individual network. However, the basic goal is simple: to discard packets that arrive on interfaces that are not viable paths from the supposed source addresses of those packets. For example, on a two-interface router connecting a corporate network to the Internet, any datagram that arrives on the Internet interface, but whose source address field claims that it came from a machine on the corporate network, should be discarded.
Similarly, any datagram arriving on the interface connected to the corporate network, but whose source address field claims that it came from a machine outside the corporate network, should be discarded. If CPU resources allow it, anti-spoofing should be applied on any interface where it is feasible to determine what traffic may legitimately arrive.
ISPs carrying transit traffic have limited opportunities to configure anti-spoofing access lists, but can usually at least filter outside traffic that claims to originate within the ISP's own address space.
In general, anti-spoofing filters must be built with input access lists; that is, packets must be filtered at the interfaces through which they arrive at the router, not at the interfaces through which they leave the router. This is configured with the ip access-group <list> in interface configuration command. It is possible to do anti-spoofing using output access lists in some two-port configurations, but input lists are usually easier to understand even in those cases. Furthermore, an input list protects the router itself from spoofing attacks, whereas an output list protects only devices behind the router.
Please note that anti-spoofing filters can increase operational and management complexity. Some large VPNs may change or update their address allocation on a daily or weekly basis, which means that TMN operations would have to maintain and update the anti-spoofing filters accordingly. Because IP packets from a given VPN cannot escape into any other VPN, the need for anti-spoofing filters on VPN connections is largely eliminated. A misbehaving customer can only attack its own sites. An MPLS/VPN customer cannot affect any other MPLS/VPN customer, nor the Mipnet backbone routers.
However, inbound anti-spoofing filter 101 shall be implemented on Internet connections (Internet VRFs), and shall consist of the following major sections:
Block packets with invalid or prohibited source IP address from being sent towards or across Mipnet
Block packets with source IP from 195.140.164.0/23 (backbone links, NOC) from being sent towards or across Mipnet
Allow any other packet that is not destined for Mipnet devices (i.e. transit traffic).
!
! Generic anti-spoofing template
!
access-list 101 deny ip 0.0.0.0 0.0.0.0 any
access-list 101 deny ip 0.0.0.0 0.255.255.255 any
access-list 101 deny ip 10.0.0.0 0.255.255.255 any
access-list 101 deny ip 127.0.0.0 0.255.255.255 any
access-list 101 deny ip 169.254.0.0 0.0.255.255 any
access-list 101 deny ip 172.16.0.0 0.15.255.255 any
access-list 101 deny ip 192.0.2.0 0.0.0.255 any
access-list 101 deny ip 192.168.0.0 0.0.255.255 any
access-list 101 deny ip 224.0.0.0 7.255.255.255 any
access-list 101 deny ip 255.0.0.0 0.255.255.255 any
!
! Prevent spoofing of Mipnet backbone links
!
access-list 101 deny ip 195.140.164.0 0.0.1.255 any
!
! Any other traffic transiting Mipnet is allowed
!
access-list 101 permit ip any any
!
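The first-match behaviour of filter 101 can be checked offline before it is deployed. The following is a minimal Python sketch, not a Cisco tool: it transcribes the entries above and applies wildcard-mask matching to a source address. The function names (to_int, source_matches, evaluate) and the sample addresses are illustrative assumptions.

```python
import ipaddress

# Filter 101 transcribed from the template above as (action, base, wildcard).
# "any" is represented as base 0.0.0.0 with wildcard 255.255.255.255.
ACL_101 = [
    ("deny", "0.0.0.0", "0.0.0.0"),
    ("deny", "0.0.0.0", "0.255.255.255"),
    ("deny", "10.0.0.0", "0.255.255.255"),
    ("deny", "127.0.0.0", "0.255.255.255"),
    ("deny", "169.254.0.0", "0.0.255.255"),
    ("deny", "172.16.0.0", "0.15.255.255"),
    ("deny", "192.0.2.0", "0.0.0.255"),
    ("deny", "192.168.0.0", "0.0.255.255"),
    ("deny", "224.0.0.0", "7.255.255.255"),
    ("deny", "255.0.0.0", "0.255.255.255"),
    ("deny", "195.140.164.0", "0.0.1.255"),
    ("permit", "0.0.0.0", "255.255.255.255"),
]


def to_int(address):
    """Convert a dotted-quad string to a 32-bit integer."""
    return int(ipaddress.IPv4Address(address))


def source_matches(source, base, wildcard):
    """Wildcard bits set to 1 are ignored; all other bits must be equal."""
    care_bits = ~to_int(wildcard) & 0xFFFFFFFF
    return (to_int(source) & care_bits) == (to_int(base) & care_bits)


def evaluate(source, acl=ACL_101):
    """Return the action of the first matching entry (implicit deny at the end)."""
    for action, base, wildcard in acl:
        if source_matches(source, base, wildcard):
            return action
    return "deny"


if __name__ == "__main__":
    for sample in ("10.1.2.3", "195.140.165.10", "198.51.100.7"):
        print(sample, evaluate(sample))
```

Evaluated exactly as written, the entry 224.0.0.0 7.255.255.255 covers source addresses 224.0.0.0 through 231.255.255.255.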
9.13.2 Turbo ACLs
For improved packet filtering performance, the Turbo ACL feature shall be enabled on supported platforms and IOS releases (please check CCO for the latest feature support information).
The Turbo ACL feature provides a more efficient search algorithm. Traditionally, access lists are processed sequentially: the list is scanned one line at a time, so the cost of a lookup is proportional to the matching depth. The Turbo ACL feature compiles the ACLs into a set of lookup tables, while maintaining the first-match requirement. Packet headers are used to access these tables in a small, fixed number of lookups, independently of the number of ACL entries. ACL evaluation is therefore independent of matching depth and runs much faster.
The global configuration command “access-list compiled” enables Turbo ACLs. To verify successful configuration, use the command “show access-list compiled”.
9.13.3 Anti-spoofing with RPF checks
In almost all Cisco IOS software versions that support Cisco Express Forwarding (CEF), it is possible to have the router check the source address of any packet against the interface through which the packet entered the router. If the input interface is not a feasible path to the source address according to the routing table, the packet will be dropped.
This works only when routing is symmetric. If the network is designed in such a way that traffic from host A to host B may normally take a different path than traffic from host B to host A, the check will always fail and communication between the two hosts will be impossible. This sort of asymmetric routing is common in the Internet core. You should make sure that your network does not use asymmetric routing before enabling this feature.
This feature is known as a reverse path forwarding (RPF) check, and is enabled with the interface configuration command ip verify unicast reverse-path. It is available in Cisco IOS software 11.1CC, 11.1CT, 11.2GS, and all 12.0 and later versions, but requires that CEF be enabled in order to be effective.
This feature can be used in Mipnet for single-homed, statically routed sites. It is not recommended to use RPF on multi-homed sites because of potential asymmetric routing. RPF brings no benefit on sites that run a dynamic routing protocol, where the customer controls the outcome of the RPF check through the routing information it sends to the PE router.
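The RPF decision described above can be illustrated with a simplified Python model. This is a sketch under stated assumptions, not the CEF implementation: the routing table, prefixes and interface names are hypothetical, and the check simply compares the input interface with the interface of the longest-match route back to the source.

```python
import ipaddress

# Hypothetical routing table of a PE router: prefix -> outgoing interface.
ROUTES = {
    "198.51.100.0/24": "Serial1/0",  # single-homed, statically routed site
    "203.0.113.0/24": "Serial1/1",   # another customer site
}


def best_route_interface(address):
    """Longest-prefix match; returns the interface or None if no route exists."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, interface in ROUTES.items():
        network = ipaddress.ip_network(prefix)
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, interface)
    return best[1] if best else None


def rpf_accepts(source, input_interface):
    """Accept the packet only if it arrived on the interface that leads back to its source."""
    return best_route_interface(source) == input_interface
```

With this table, a packet sourced from 198.51.100.7 is accepted on Serial1/0 and dropped on Serial1/1. On a multi-homed site the route back to the source may legitimately point to a different interface, which is why the check is restricted to single-homed sites.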
9.14 Controlling Directed Broadcasts
IP directed broadcasts are used in the extremely common and popular “smurf” denial-of-service attack, and can also be used in related attacks.
An IP directed broadcast is a datagram which is sent to the broadcast address of a subnet to which the sending machine is not directly attached. The directed broadcast is routed through the network as a unicast packet until it arrives at the target subnet, where it is converted into a link-layer broadcast. Because of the nature of the IP addressing architecture, only the last router in the chain, the one that is connected directly to the target subnet, can conclusively identify a directed broadcast. Directed broadcasts are occasionally used for legitimate purposes, but such use is not common outside the financial services industry.
In a smurf attack, the attacker sends ICMP echo requests from a falsified source address to a directed broadcast address, causing all the hosts on the target subnet to send replies to the falsified source. By sending a continuous stream of such requests, the attacker can create a much larger stream of replies, which can completely inundate the host whose address is being falsified.
If a Cisco interface is configured with the no ip directed-broadcast command, directed broadcasts that would otherwise be expanded into link-layer broadcasts at that interface are dropped instead. The command no ip directed-broadcast must be configured on every interface of every router that might be connected to a target subnet; it is not sufficient to configure only firewall routers. The no ip directed-broadcast command is the default in Cisco IOS software version 12.0 and later. In earlier versions, the command should be applied to every LAN interface that is not known to forward legitimate directed broadcasts.
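The statement that only the last router can conclusively identify a directed broadcast follows from the fact that the broadcast address depends on the subnet mask, which only directly attached routers know. A minimal Python sketch, with hypothetical subnets chosen purely for illustration:

```python
import ipaddress


def is_directed_broadcast(destination, connected_subnets):
    """True if the destination is the broadcast address of a directly connected subnet."""
    dst = ipaddress.ip_address(destination)
    return any(
        dst == ipaddress.ip_network(subnet).broadcast_address
        for subnet in connected_subnets
    )


# Hypothetical last-hop router attached to the target subnets.
LAST_HOP_SUBNETS = ["192.0.2.0/26", "192.0.2.64/26"]
# Hypothetical upstream router that only sees a route towards those subnets.
UPSTREAM_SUBNETS = ["198.51.100.0/30"]
```

For the last-hop router, 192.0.2.63 is recognisable as a directed broadcast. The upstream router has no knowledge of the /26 mask, so to it the same address looks like an ordinary unicast destination.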
9.15 IP Source Routing
The IP protocol supports source routing options that allow the sender of an IP datagram to control the route that datagram will take toward its ultimate destination, and generally the route that any reply will take. These options are rarely used for legitimate purposes in real networks. Some older IP implementations do not process source-routed packets properly, and it may be possible to crash machines running these implementations by sending them datagrams with source routing options.
A Cisco router with no ip source-route set will never forward an IP packet that carries a source routing option. You should use this command unless you know that your network needs source routing.
It is strongly recommended to disable the IP source routing option in Mipnet.
9.16 ICMP Redirects
An ICMP redirect message instructs an end node to use a specific router as its path to a particular destination. In a properly functioning IP network, a router will send redirects only to hosts on its own local subnets, no end node will ever send a redirect, and no redirect will ever traverse more than one network hop. However, an attacker may violate these rules; some attacks are based on this. It is a good idea to filter out incoming ICMP redirects at the input interfaces of any router that lies at a border between administrative domains, and it is not unreasonable for any access list that is applied on the input side of a Cisco router interface to filter out all ICMP redirects. This will cause no operational impact in a correctly configured network.
Note that this filtering prevents only redirect attacks launched by remote attackers. It's still possible for attackers to cause significant trouble using redirects if their host is directly connected to the same segment as a host that's under attack.
9.17 Switching Modes and Cisco Express Forwarding
The CEF switching mode, available in Cisco IOS software versions 11.1CC, 11.1CT, 11.2GS, and 12.0, replaces the traditional Cisco routing cache with a data structure that mirrors the entire system routing table. Because there is no need to build cache entries when traffic starts arriving for new destinations, CEF behaves more predictably than other modes when presented with large volumes of traffic addressed to many destinations.
Although most flooding denial-of-service attacks send all of their traffic to one or a few targets and therefore do not tax the traditional cache maintenance algorithm, many popular SYN flooding attacks use randomised source addresses. The host under attack replies to some fraction of the SYN flood packets, creating traffic for a large number of destinations. Routers configured for CEF therefore perform better under SYN floods (directed at hosts, not at the routers themselves) than do routers using the traditional cache. CEF is recommended when available.
9.18 Scheduler Configuration
When a Cisco router is fast-switching a large number of packets, it is possible for the router to spend so much time responding to interrupts from the network interfaces that no other work gets done. Some very fast packet floods can cause this condition. Using the scheduler interval command, which instructs the router to stop handling interrupts and attend to other business at regular intervals, can reduce the effect. A typical configuration might include the command scheduler interval 500, which indicates that process-level tasks are to be handled no less frequently than every 500 milliseconds. This command very rarely has any negative effects, and should be a part of your standard router configuration unless you know of a specific reason to leave it out.
Many newer Cisco platforms use the command scheduler allocate instead of scheduler interval. The scheduler allocate command takes two parameters: a period in microseconds for the system to run with interrupts enabled, and a period in microseconds for the system to run with interrupts masked. If your system does not recognize the scheduler interval 500 command, try scheduler allocate 250000 1000.
9.19 Last-Resort Routing to the Null Device
A static default route to the null device (ip route 0.0.0.0 0.0.0.0 null 0 255) may greatly improve a router’s performance in discarding datagrams with unreachable destination addresses. This can prevent simplistic pure-flooding denial-of-service attacks from causing undue trouble, as well as improving performance for high-traffic routers in general. This command should not be introduced into a working routing configuration without a thorough understanding of its effects, since it can cause failures in some configurations involving route redistribution.
There are many other IP routing configuration settings that can affect performance under denial-of-service attacks. Most of these settings are also important for dealing with wrongly configured devices, and even normal traffic. Some of these settings have effects specific to certain versions of IOS software, and all of them are very specific to the surrounding network configuration. In general, such settings are most important for high-traffic backbone routers, especially routers with large numbers of interfaces.
9.20 TCP and UDP “Small Services”
By default, Cisco devices up through IOS version 11.3 offer the small services: echo, chargen, and discard. These services, especially their UDP versions, are infrequently used for legitimate purposes, but can be used to launch denial of service and other attacks that would otherwise be prevented by packet filtering.
For example, an attacker might send a DNS packet, falsifying the source address to be a DNS server that would otherwise be unreachable, and falsifying the source port to be the DNS service port (port 53). If such a packet were sent to the Cisco's UDP echo port, the result would be the Cisco sending a DNS packet to the server in question. No outgoing access list checks would be applied to this packet, since it would be considered to be locally generated by the router itself.
Abuses of the small services can be avoided or made less dangerous by anti-spoofing access lists. Since the services are rarely used, the best policy is usually to disable them on all routers of any description.
The small services are disabled by default in Cisco IOS 12.0 and later software. In earlier software, they may be disabled using the commands no service tcp-small-servers and no service udp-small-servers.
9.21 Finger
Cisco routers provide an implementation of the finger service, which is used to find out which users are logged into a network device. Although this information is not usually tremendously sensitive, it can sometimes be useful to an attacker. The finger service may be disabled with the command no service finger.
9.22 CDP
Cisco Discovery Protocol (CDP) is used for some network management functions, but is dangerous in that it allows any system on a directly connected segment to learn that the router is a Cisco device, and to determine the model number and the Cisco IOS software version being run. This information may in turn be used to design attacks against the router. CDP information is accessible only to directly connected systems. The CDP protocol may be disabled with the global configuration command no cdp running. CDP may be disabled on a particular interface with no cdp enable.
9.23 NTP
The Network Time Protocol (NTP) is a protocol used to time-synchronize network devices. NTP runs over UDP and is documented in RFC 1305. An NTP stratum 1 server should get its time from an authoritative time source, such as a GPS system or an atomic clock attached to a timeserver. NTP then distributes this time across the network. NTP is a very sophisticated and efficient protocol, which only needs one packet per minute to synchronize two machines to within a millisecond of one another.
NTP uses the concept of a "stratum" to describe how many NTP "hops" away a machine is from an authoritative time source. A "stratum 1" time source has a reference clock such as a GPS or atomic clock directly attached, a "stratum 2" time source receives its time from a "stratum 1" time source, and so on. This “hop” count isn’t related to the IP hops between two NTP time sources. A device running NTP automatically chooses the lowest stratum timeserver as its time source. It only talks and listens to servers for which it has a configuration entry.
To avoid synchronization problems, NTP has two methods to determine the validity of a time source. NTP will never synchronize to a device that is not synchronized itself. It will also not synchronize to a source whose time is significantly different from that of all the other time sources.
The NTP configuration is usually static. Every device has a list of IP addresses with which it will exchange NTP messages. These communication agreements are called associations. On LAN segments NTP can use IP broadcast messages as well.
On Cisco devices, two mechanisms are available to secure the communication: an access list-based restriction scheme and an encrypted authentication mechanism. A limitation of Cisco’s implementation is that it doesn’t support stratum 1 service, which means a reference clock such as a GPS or atomic clock cannot be connected directly to the Cisco device.
NTP is a very valuable tool for reporting and troubleshooting, because cause and effect of problems can be clearly correlated. Care must be taken over where the time information comes from, especially if additional time sources from the Internet are used as a reference. Confusing the time system can render system log files completely useless.
The Network Time Protocol (NTP) will be used to synchronize router clocks. NTP authentication will be used to secure NTP associations. The loopback0 address is used to form NTP associations.
ntp authentication-key 1 md5 *&^^&*_(_ 7
ntp authenticate
ntp trusted-key 1
ntp source Loopback0
ntp update-calendar
ntp server <mtkc_p_1 loopback> key 1
ntp server <tkc_p_1 loopback> key 1
The Podgorica P routers could synchronise with an external timeserver if Internet transit were implemented in the global routing table. This is not the case in Mipnet, so the two P routers in Podgorica synchronise with each other as NTP peers to increase stability (please note that the GSR clock is already very accurate, and that time synchronization between all Mipnet devices is more important than absolute accuracy).
Every other P/PE device becomes an NTP client of mtkc_p_1 and tkc_p_1.
9.24 Miscellaneous
9.24.1 Global Configuration
Consider enabling Core Dump. This facility operates like many other similar systems in UNIX, i.e. when a router crashes a copy of the core memory is kept, and before the memory is wiped on reboot the router can be set up to copy the core dump to a UNIX server.
ip ftp source-interface loopback0
ip ftp username <name>
ip ftp password <password>
exception protocol ftp
exception dump <host>
Configure a default domain name on the router, which will be appended to all host names that do not already include a domain name.
ip domain-list telekomcg.net
ip domain-list .
When using a standard TCP implementation to send keystrokes between machines, TCP tends to send one packet for each keystroke typed, which can use up bandwidth and contribute to congestion on larger networks. John Nagle's algorithm (RFC 896) helps alleviate the small-packet problem in TCP. The first character typed after connection establishment is sent in a single packet, but TCP holds any additional characters typed until the receiver acknowledges the previous packet. Then the second, larger packet is sent, and additional typed characters are saved until the acknowledgement comes back. The effect is to accumulate characters into larger chunks, and pace them out to the network at a rate matching the round-trip time of the given connection. This method is usually good for all TCP-based traffic.
TMN can implement the nagle service on all Cisco components where available as follows:
service nagle
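The coalescing effect of Nagle's algorithm described above can be sketched with a small Python simulation. This is a simplified model under stated assumptions: acknowledgements arrive exactly one round-trip time after a segment is sent, the maximum segment size is never reached, and the function name nagle_segments is illustrative only.

```python
def nagle_segments(keystrokes, rtt):
    """Simulate Nagle coalescing.

    keystrokes: list of (time, character) tuples sorted by time.
    rtt: round-trip time, in the same time unit.
    Returns a list of (send_time, data) segments.
    """
    segments = []
    buffer = ""
    ack_due = None  # arrival time of the outstanding ACK; None if nothing is in flight

    for time, char in keystrokes:
        # Process every ACK that arrived before (or at) this keystroke.
        while ack_due is not None and ack_due <= time:
            if buffer:
                # The ACK releases everything typed while the segment was in flight.
                segments.append((ack_due, buffer))
                buffer = ""
                ack_due += rtt
            else:
                ack_due = None
        if ack_due is None:
            # Nothing unacknowledged: send the character immediately.
            segments.append((time, char))
            ack_due = time + rtt
        else:
            # A segment is still in flight: hold the character.
            buffer += char

    if buffer:
        segments.append((ack_due, buffer))
    return segments
```

With six keystrokes typed 20 ms apart and a 100 ms round-trip time, the sketch produces three segments instead of six. With a round-trip time shorter than the typing interval, every keystroke is still sent on its own, so interactive response on fast links is unaffected.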
To solve problems with terminal sessions that crash and hold open VTY ports the following command should be used.
service tcp-keepalives-in
Turn off unnecessary IOS functions. These include:
no service pad
no ip bootp server
no ip http server
IP subnet-zero should be enabled
ip subnet-zero
IP classless should be enabled
ip classless
No logging console should be configured to prevent a situation in which a large volume of debugging messages ‘freezes’ the router.
no logging console
All enabled services will use the Loopback address as the source address for any packets originating from the router to the NOC.
ip telnet source-interface <loopback0>
ip tftp source-interface <loopback0>
ntp source <loopback0>
ip radius source-interface <loopback0>
snmp-server trap-source <loopback0>
logging source-interface <loopback0>
9.24.2 Interface Configuration
The interface bandwidth command should always be used in Mipnet, because the QoS mechanisms use this value in their apportionment calculations.
The description command should be used on all interfaces to improve the readability of configurations and ease troubleshooting.
The following commands should be applied to turn OFF services on all interfaces.
no ip redirects
no ip proxy-arp
SPD (Selective Packet Discard) protects routing update packets from being dropped during congestion so it will be left on.
10 Configuration Templates
11 NOC – Network Operations Centre
11.1 Physical Connectivity
The following drawing depicts the physical connectivity in Mipnet NOC.
Because dedicated Management CE routers have not been foreseen for Mipnet NOC, the MCE functionality is configured on two Cat6509 switches in Podgorica [16].
Each MPE router is interconnected with one P router via two POS STM-1 links.
Connection between MCE and MPE is implemented with a single GE cable. Encapsulation 802.1q on this back-to-back connection permits two or more logical subinterfaces, which are needed for IPv4 and Management VPN connectivity.
The dot1q trunk between the MCEs allows HSRP to be set up for outbound routing, and provides iBGP connectivity between the MCEs.
FE media is used for connectivity within the Outside and Inside security zones. The Outside LAN segment is created as back-to-back FE links. The Inside LAN segment is implemented on two pairs of Catalyst 3550 switches, interconnected via an FE 802.1q trunk. Layer 3 (IP) is not enabled on these 3550s.
Optionally, the servers/hosts that need Internet connectivity can be placed in a separate VLAN on 3550s (DMZ).
The FE connection between the PIX firewalls is used for signalling purposes (stateful failover) and will not carry application payload.
Figure 64 NOC – Physical Topology
[Figure labels: mtkc_p_1, tkc_p_1, mtkc_pe_1, tkc_pe_1, mtkc_mce_1 and tkc_mce_1 (cat6500), noc_pix_1, noc_pix_2, noc_3550_1, noc_3550_2, noc1, noc2, dmz1; GE dot1q trunks (vlan 10, 20), (vlan 20, 40) and (vlan 50, 60); FE dot1q trunk (vlan 70, 80); FE links in the Outside and Inside zones.]
[16] For this reason, the central Catalysts will have to run native IOS (and not CatOS).
11.2 Logical Design
11.2.1 Interconnection with MPLS Core
Connectivity between NOC and MPLS core is based on ISIS routing protocol. BGP would be another alternative; however BGP is not configured on P routers.
For this reason, the backbone IGP has been extended across MPE-MCE IPv4 links. The following ISIS routing information is exchanged between MCEs and MPLS core network:
P-P links
P-PE links
P, PE Loopbacks
Cat3550s, 1760s in MAN and regional PoPs (VLAN 99 is configured on all 3550s and 1760s and injected in ISIS global routing table with passive interface command)
For resiliency reasons, two IPv4 links have been created, each on a separate MCE-MPE pair. The primary/backup scenario is achieved by tuning ISIS costs. ISIS costs on MCE-MPE links must be sufficiently increased to prevent transit of P-P customer payload across MCEs.
11.2.2 Management VPN
In addition to IPv4 connection for exchange of ISIS routes, a dedicated logical interface (VPNv4 link) has been established between the MCE and MPE. This interface will be terminated in the Management VRF on MPE router, to create a communication path between NOC servers (eg. VPNSC) and MPLS/VPN service creation points.
e-BGP will be used to advertise the following routing information from the MPLS network to the NOC:
PE-CE connections of MPLS/VPN customers
Loopbacks of TMN-managed MPLS/VPN CE devices
For resiliency purposes, the VPNv4 link is created on both MCE-MPE pairs. The primary/backup scenario is achieved by tuning BGP metrics.
An i-BGP session must be established between the MCEs to prevent black-holing of outbound traffic in the case when MPE1 loses connectivity with the MPLS core while the HSRP virtual address is still owned by MCE1.
The VPNv4 link will not be used for communication with backbone devices. There’s no route redistribution between ISIS and BGP on the MCE routers.
Please note that BGP routing with the Management VPN must not be provisioned manually – configuration is controlled by VPNSC.
11.2.3 NOC LAN - Outbound Routing
The following steps describe the routing in direction NOC -> MCE.
Each NOC server has a default static route configured, which points to the IP address configured on Inside interface of noc_pix_1 (Note: noc_pix_1 is active and noc_pix_2 is in standby mode under normal network operation)
When the packet arrives at noc_pix_1, it is again forwarded via a statically configured default route, towards the HSRP address shared by the two MCEs. When all network links and MCEs are functioning properly, tkc_mce_1 is Active and therefore acts as the default gateway, and mtkc_mce_1 is the Standby HSRP router.
11.2.4 NOC LAN - Inbound Routing
IP packets from MCEs towards NOC servers are also statically routed. Under normal network operation, management traffic will be received in NOC through MCE1.
Each MCE has a static route configured for NOC subnet. This static route points towards IP address on Outside port of noc_pix_1.
When the inbound packet is received at PIX, it can be delivered to appropriate server on NOC subnet, which is a directly connected route.
Figure 65 NOC – Routing Setup
[Figure labels: mtkc_p_1, tkc_p_1, mtkc_pe_1, tkc_pe_1, mtkc_mce_1 and tkc_mce_1 (cat6500), noc_pix_1, noc_pix_2, noc1, noc2, Backbone; Active, Standby, HSRP, Stateful Failover. Legend: Static default route; Backbone ISIS - NOC announced towards backbone links; BGPv4 - NOC announced into Management VPN; Static route towards Inside subnet.]
11.3 IP Addressing
195.140.165.192/26 has been allocated for use in the Mipnet NOC. This prefix will be further subnetted as follows:
195.140.165.192/27 (default gateway 195.140.165.193) NOC servers (32 addresses)
195.140.165.224/28 (default gateway 195.140.165.225) DMZ hosts/servers (16 addresses) that require Internet access
195.140.165.240/28
o 195.140.165.240/29 Outside subnet (8 addresses) with 195.140.165.241 being the HSRP address
o 195.140.165.248/29 Not yet utilised
VPNv4 MCE-MPE links will be numbered by ISC from CE-PE address range.
IPv4 MCE-MPE links and the link between MCEs will have IP addresses assigned from the Backbone address range as defined in the Mipnet addressing scheme.
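The NOC subnetting plan can be sanity-checked with Python's standard ipaddress module. The sketch below is illustrative only: it verifies that each subnet from the plan lies inside 195.140.165.192/26, that the listed gateway and HSRP addresses fall inside their subnets, and that the subnets do not overlap. The names PLAN and check_plan are assumptions introduced for this example.

```python
import ipaddress

NOC_BLOCK = ipaddress.ip_network("195.140.165.192/26")

# Subnet name -> (prefix, gateway or HSRP address; None if not assigned).
PLAN = {
    "NOC servers": ("195.140.165.192/27", "195.140.165.193"),
    "DMZ hosts/servers": ("195.140.165.224/28", "195.140.165.225"),
    "Outside subnet": ("195.140.165.240/29", "195.140.165.241"),
    "Not yet utilised": ("195.140.165.248/29", None),
}


def check_plan(block, plan):
    """Validate the plan and return the number of addresses in each subnet."""
    subnets = {}
    for name, (prefix, gateway) in plan.items():
        subnet = ipaddress.ip_network(prefix)
        if not subnet.subnet_of(block):
            raise ValueError(f"{name}: {subnet} is outside {block}")
        if gateway is not None and ipaddress.ip_address(gateway) not in subnet:
            raise ValueError(f"{name}: {gateway} is not in {subnet}")
        subnets[name] = subnet

    names = list(subnets)
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            if subnets[first].overlaps(subnets[second]):
                raise ValueError(f"{first} overlaps {second}")

    return {name: subnet.num_addresses for name, subnet in subnets.items()}


if __name__ == "__main__":
    print(check_plan(NOC_BLOCK, PLAN))
```

The four subnets together account for all 64 addresses of the /26. Each count includes the network and broadcast addresses, so the number of usable host addresses per subnet is two lower.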
11.4 NOC VLANs
The following diagram shows VLAN setup in Mipnet NOC.
Figure 66 NOC – VLANs
[Figure labels: mtkc_p_1, tkc_p_1, mtkc_pe_1, tkc_pe_1, mtkc_mce_1 and tkc_mce_1 (cat6500), noc_pix_1, noc_pix_2, noc1, noc2, dmz1, HSRP. VLANs shown:]
VLAN 10 ipv4_prim
VLAN 20 ipv4_back
VLAN 30 grey_prim
VLAN 40 grey_back
VLAN 50 MCE Back-to-Back
VLAN 60 Outside
VLAN 70 Inside
VLAN 80 DMZ
12 Appendix I
IS-IS Convergence Tuning
IOS uses an exponential back-off algorithm to control when SPF and PRC are executed. This algorithm allows IS-IS to react very quickly to events that trigger an SPF or PRC, but to back off and wait longer before subsequent executions in case of continued instability. This avoids the network meltdowns seen when routing protocols react to constant changes, causing high CPU usage on the route processor, which in turn causes other problems such as dropped IGP keepalives.
If you have a small network (less than 100 nodes) you can use short values for all timers. As we have a small network at this time, with few nodes and routes, and very powerful equipment/links, and fast convergence is our primary concern, we will try to keep the timers as small as possible. In our situation, we shall have <100 prefixes in ISIS, and <50 nodes, which is a relatively small network from IS-IS scalability perspective.
Now, just a few preliminary remarks:
IOS time granularity is 4 msecs.
This means that even if a command allows you to configure less than 4 msecs, there is no guarantee that such an interval will be honoured.
When playing with fast convergence you want to react very fast to "bad news" (link going down) and you can afford to wait a bit longer for "good news" (link coming up). However, the current backoff algorithm doesn't take into account the type of the change.
So, let's review the timers.
lsp-gen-interval lsp-max-wait lsp-initial-wait lsp-second-wait (defaults: 5s 50ms 5000ms)
The lsp-initial-wait argument indicates the initial wait time (in milliseconds) before generating the first LSP.
The lsp-second-wait indicates the amount of time to wait (in milliseconds) between the first and second LSP generation.
Each subsequent wait interval is twice as long as the previous one until the wait interval reaches the lsp-max-wait interval specified, so this value causes the throttling or slowing down of the LSP generation after the initial and second intervals. Once this interval is reached, the wait interval continues at this interval until the network calms down.
After the network calms down and there are no triggers for 2 times the lsp-max-wait interval, fast behavior is restored (the initial wait time).
Example: spf-interval 10 100 1000
On the original trigger, an initial delay of 100 ms is incurred prior to running SPF. If a second SPF is required, a delay of at least 1000 msec must expire (incremental interval). The third SPF can only be run after another 2 sec (double the last interval of 1000 msec), then 4 sec, then 8 sec, and then 10 sec, which is the maximum interval. When the network stabilises and no triggers are received for 2 times the maximum interval (20 sec in this example), the router returns to steady state and fast behaviour (100 ms initial wait).
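The sequence of wait intervals in this example can be reproduced with a few lines of Python. This is a sketch of the back-off rule as described in the text (initial wait, second wait, then doubling up to the maximum), not of the IOS implementation; the function name backoff_waits is an assumption.

```python
def backoff_waits(max_wait_s, initial_wait_ms, second_wait_ms, runs):
    """Return the wait (in ms) applied before each of the first `runs` executions.

    max_wait_s is given in seconds and the other two timers in milliseconds,
    matching the argument order max-wait, initial-wait, second-wait.
    """
    max_wait_ms = max_wait_s * 1000
    waits = []
    for run in range(runs):
        if run == 0:
            wait = initial_wait_ms
        elif run == 1:
            wait = second_wait_ms
        else:
            wait = waits[-1] * 2  # each subsequent wait doubles the previous one
        waits.append(min(wait, max_wait_ms))  # never exceed the maximum wait
    return waits


if __name__ == "__main__":
    # Timer values of the example: spf-interval 10 100 1000
    print(backoff_waits(10, 100, 1000, 7))
```

For spf-interval 10 100 1000 the sketch yields 100, 1000, 2000, 4000, 8000, 10000 and 10000 ms, which matches the sequence described above. The same function can be applied to the LSP generation and PRC timers, since they follow the same rule.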
Of course you want each router to generate a new LSP as soon as something changes. However, don't forget that the backoff algorithm is *NOT* a dampening mechanism and there is no discrimination between "bad" and "good" news.
If a link flaps and you use a short value for both the initial and the secondary timer, the incremental wait will increase slowly, but at some stage you will start waiting anyway. If this happens when "bad news" needs to be processed, you will wait before advertising a new LSP, which means that a link-down event may not be advertised in time. This can easily result in a (transient) routing loop. Transient routing loops are part of link-state protocols, but it is still better to avoid them or at least to reduce their impact (especially when talking about fast convergence).
You don't want to take the risk of applying a longer penalty to a "bad" news event and therefore you want to react exactly the same for all your link-state changes.
So, the recommendation is to use relatively short values for the LSP generation backoff timers:
lsp-gen-interval 1 20 1000
This means that you have an initial delay of 20 msec, and that the secondary wait and the maximum wait are the same (1 second), so that the same timer applies to all events (link up/down).
spf-interval spf-max-wait spf-initial-wait spf-second-wait (defaults: 10s 5500ms 5500ms)
Here we want to use an aggressive spf timer:
spf-interval 1 40 80
Link-state routing protocols are supposed to flood first and then compute SPF. In the Cisco implementation, SPF computation and flooding are handled in the same process. Without going into too much detail, there is a main loop in the process during which packet processing (i.e. flooding) and SPF computation are interleaved.
If you set short spf timers, it _may_ happen that, when receiving new LSPs, you run SPF _before_ flooding it. Thus you will have a kind of distance-vector behaviour where your
neighbours have to wait for you to compute your routing table before getting the routing update.
Again, if the network is pretty small then the time SPF will take is not significant and the slowdown will not even be notified. If you have a very large network (more than 2000 nodes) then, you will see the difference since SPF will probably take around 200 msecs and each node _may_ run spf before flooding (so the next node will get the update 200msecs later).
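To make the cumulative effect concrete, the following Python sketch adds up the worst-case delay when every node along a path runs SPF before flooding. It is a simplified model under stated assumptions: the function name `update_arrival_ms`, the 5-hop path, and the 1 msec per-hop flooding cost are hypothetical; only the 200 msecs SPF time comes from the text.

```python
def update_arrival_ms(hops, spf_ms, flood_ms_per_hop=1, spf_before_flood=True):
    """Worst-case time (ms) for an LSP to reach a node `hops` away.

    Assumed model: each hop costs `flood_ms_per_hop` to forward the LSP,
    plus a full SPF run if the node computes SPF before flooding.
    """
    per_hop = flood_ms_per_hop + (spf_ms if spf_before_flood else 0)
    return hops * per_hop


# Hypothetical 5-hop path in a large network where SPF takes 200 msecs.
print(update_arrival_ms(5, 200, spf_before_flood=True))   # SPF first on every hop
print(update_arrival_ms(5, 200, spf_before_flood=False))  # flood first, then SPF
```

In this model the update reaches the fifth node after about one second when every node computes first, against a few milliseconds when every node floods first, which is why the initial wait should leave room for flooding.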
In this case the initial delay is 40 msecs, during which all flooding will occur. In our scenario, this should be ample time to flood all the LSPs. Then you run SPF, and for the next one you will wait at least the number of msecs it takes to run SPF (this is easy to find with the "show isis spf-log" command). As a safe value here I have taken 80 msecs.
Note that if you use shorter values (for the initial and secondary wait), there will be (almost) no delay between the very first SPF computations, which in some cases allows "good" news to be computed quickly, but at the cost of multiple SPFs. This is because when a new adjacency is discovered, both ends of the adjacency need to report it in their respective LSPs in order for SPF to successfully consider it.
So, running SPF with no delay may be too early and (for "good" news) you will need another SPF.
However, in order for SPF to detect a link failure, it is enough to compute SPF with one LSP (you don't need both ends' LSPs). So again: bad news has to be processed faster than good news, and in this case you can afford to run SPF immediately (no delay). However, as stated before, backoff is _NOT_ a dampening mechanism and we do not discriminate between "bad" and "good" news. So we have to use a common configuration.
prc-interval prc-max-wait prc-initial-wait prc-second-wait (defaults: 5s 2000ms 5000ms)
PRC is used when only some leaf routes (i.e. IP prefixes) have changed in one or more LSPs. In that case we just flag the affected LSPs for recalculation and, instead of re-running SPF, we recompute all the leaf routes of those LSPs. In general this involves a few LSPs and doesn't take much time. We can afford much more aggressive timers here than for SPF, but we still can't predict how many LSPs we will have to recalculate:
prc-interval 1 40 20
As for spf-interval, we have to give flooding a chance to happen first (even if in this case it is less dangerous, since a PRC computation really takes only a few msecs). Then we can use a smaller increment value (20 msecs), again because the CPU usage for a PRC is very limited.
lsp-interval (interface command)
This is something many people simply forget about. The IS-IS specifications require LSP pacing over interfaces. This results in a 33 msecs gap between LSPs during flooding, which means that you will never use more than ~360 Kb/sec for flooding. In today's networks this can be seen as a bit conservative, and therefore lsp-interval allows you to arbitrarily change the interval between LSPs. I think we can afford better flooding performance, and you can reduce the interval to:
lsp-interval 15
This will not change that much in terms of convergence speed since in general flooding concerns just a few LSPs and if you take the example of an adjacency going down, only one LSP is enough for SPF to drop the adjacency from the shortest path tree (remember: you want to process bad news faster than good news).
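The flooding ceiling implied by the pacing interval can be checked with a short Python calculation. This is a rough sketch: the function name `flooding_rate_kbps` is hypothetical, and the 1492-byte LSP size is an assumption (a maximum-size LSP), not a figure stated in this document.

```python
def flooding_rate_kbps(lsp_interval_ms, lsp_size_bytes=1492):
    """Upper bound on flooding bandwidth (kbit/s) per interface.

    Assumes one LSP of `lsp_size_bytes` (assumed maximum size) is sent
    every `lsp_interval_ms` milliseconds.
    """
    lsps_per_sec = 1000 / lsp_interval_ms
    return lsps_per_sec * lsp_size_bytes * 8 / 1000


print(round(flooding_rate_kbps(33)))  # default pacing, roughly 360 kbit/s
print(round(flooding_rate_kbps(15)))  # with lsp-interval 15
```

Under these assumptions, reducing the interval from 33 msecs to 15 msecs slightly more than doubles the flooding ceiling, which remains small compared with the capacity of backbone links.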
So to summarise, the initial values we would recommend for the IS-IS timers are as follows:
lsp-gen-interval 1 20 1000
spf-interval 1 40 80
prc-interval 1 40 20
lsp-interval 15
Please remember that these are initial values, and they may require further tuning as the network evolves.
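As a starting point for such tuning, the Python sketch below lists the wait schedule that each recommended timer triple produces when triggers keep arriving. It assumes the same doubling-with-cap model described for the backoff algorithm; the function name `wait_schedule` and the dictionary labels are hypothetical, and the argument order follows the max-wait, initial-wait, second-wait syntax used above.

```python
def wait_schedule(max_s, initial_ms, second_ms):
    """Return successive waits (ms) until the maximum wait is reached.

    Assumed model: first wait is `initial_ms`, then `second_ms`, then
    each wait doubles until it is capped at `max_s` seconds.
    """
    cap = max_s * 1000
    waits = [initial_ms]
    wait = second_ms
    while wait < cap:
        waits.append(wait)
        wait *= 2
    waits.append(cap)
    return waits


# Recommended initial values from this section (max-wait s, initial ms, second ms).
recommended = {
    "LSP generation": (1, 20, 1000),
    "SPF": (1, 40, 80),
    "PRC": (1, 40, 20),
}
for name, args in recommended.items():
    print(name, wait_schedule(*args))
# LSP generation [20, 1000]
# SPF [40, 80, 160, 320, 640, 1000]
# PRC [40, 20, 40, 80, 160, 320, 640, 1000]
```

The output makes the design choice visible: LSP generation jumps straight to a flat 1 second wait, so every event after the first is treated alike, while SPF and PRC ramp up gradually towards the same 1 second ceiling.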
13 Appendix II
14 Appendix III
14.1
15 Appendix IV
15.1 Glossary of Terms
Corporate Headquarters
Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134-1706
USA
www.cisco.com
Tel: 408 526-4000
800 553-NETS (6387)
Fax: 408 526-4100

European Headquarters
Cisco Systems Europe
11 Rue Camille Desmoulins
92782 Issy-Les-Moulineaux Cedex 9
France
www-europe.cisco.com
Tel: 33 1 58 04 60 00
Fax: 33 1 58 04 61 00

Americas Headquarters
Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134-1706
USA
www.cisco.com
Tel: 408 526-7660
Fax: 408 527-0883

Asia Pacific Headquarters
Cisco Systems Australia, Pty., Ltd
Level 9, 80 Pacific Highway
P.O. Box 469
North Sydney NSW 2060
Australia
www.cisco.com
Tel: +61 2 8448 7100
Fax: +61 2 9957 4350

Cisco Systems has more than 200 offices in the following countries and regions. Addresses, phone numbers, and fax numbers are listed on the Cisco Web site at www.cisco.com/go/offices.
Argentina • Australia • Austria • Belgium • Brazil • Bulgaria • Canada • Chile • China • Colombia • Costa Rica • Croatia • Czech Republic • Denmark • Dubai, UAE • Finland • France • Germany • Greece • Hong Kong SAR • Hungary • India • Indonesia • Ireland • Israel • Italy • Japan • Korea • Luxembourg • Malaysia • Mexico • The Netherlands • New Zealand • Norway • Peru • Philippines • Poland • Portugal • Puerto Rico • Romania • Russia • Saudi Arabia • Singapore • Slovakia • Slovenia • South Africa • Spain • Sweden • Switzerland • Taiwan • Thailand • Turkey • Ukraine • United Kingdom • United States • Venezuela • Vietnam • Zimbabwe