69
The Basics of BGP (Border Gateway Protocol) Routing and its Performance in Today’s Internet Presenter: Sophia Poku Slides taken from presentation by Nina Taft

The Basics of BGP (Border Gateway Protocol) Routing and its Performance in Today’s Internet Presenter: Sophia Poku Slides taken from presentation by Nina

Embed Size (px)

Citation preview

The Basics of BGP (Border Gateway Protocol) Routing and its Performance in Today’s Internet

Presenter: Sophia Poku Slides taken from presentation by Nina Taft

Outline1. Highlights

2. Addressing and CIDR

3. BGP Messages and Prefix Attributes

4. BGP Decision and Filtering Processes

5. I-BGP

6. Route Reflectors

7. Multihoming

8. Aggregation

9. Routing Instability

10. BGP Table Growth

Routing Protocols

R border router

internal router

R1

AS1

R4

R5

B

AS3

E-BGP

R2R3

A

AS2 announce B

IGP: Interior Gateway Protocol.Examples: IS-IS, OSPF

I-BGP

AS (Autonomous System) - a collection of routers under the same technical and administrative domain. EGP (External Gateway Protocol) - used between two AS’s to allow them to exchange routing information so that traffic can be forwarded across AS borders. Example: BGP

Routers used

Internal Router: directly connects networks belonging to the same area

It runs a single copy of the basic routing protocol

Border/Boundary Router: exchanges routing information with routers belonging to other AS

Purpose: to share connectivity information

R border router

internal router

BGPR2

R1

R3

A

AS1

AS2

you can reachnet A via me

traffic to A

table at R1:dest next hopA R2

BGP Sessions Primary function is to exchange network-reachability

information (includes AS #s) Uses TCP to establish connection Initially … node advertises ALL routes it wants neighbor

to know (could be >50K routes)

Ongoing … only inform neighbor of changes One router can participate in many BGP sessions.

AS1

AS2 AS3

Configuration and Policy

A BGP node has a notion of which routes to share with its neighbor. It may only advertise a portion of its routing table to a neighbor.

A BGP node does not have to accept every route that it learns from its neighbor. It can selectively accept and reject messages.

What to share with neighbors and what to accept from neighbors is determined by the routing policy, that is specified in a router’s configuration file.

Addressing Schemes Original addressing schemes (class-based):

32 bits divided into 2 parts:

Class A

0xxx or 1-126 in decimal; subnet mask:255.0.0.0

Class B

10xx or 128-192 in decimal

Subnet mask:255.255.0.0

Class C

110x or 192-223 in decimal, Subnet Mask:255.255.255.0

network host 0

80

network host 0

160

~2 million nets256 hostsnetwork host 0

240

CIDR (Classless Inter-Domain Routing)

CIDR introduced to solve 2 problems: exhaustion of IP address space size and growth rate of routing table

Problem #1: Lifetime of Address Space

Example: an organization needs 500 addresses. A single class C address not enough (256 hosts). Instead a class B address is allocated. (~64K hosts) That’s overkill -a huge waste.

CIDR allows networks to be assigned on arbitrary bit boundaries. permits arbitrary sized masks: 178.24.14.0/23 is valid requires explicit masks to be passed in routing protocols

CIDR solution for example above: organization is allocated a single /23 address (equivalent of 2 class C’s).

Problem #2: Routing Table Size

serviceprovider

232.71.0.0232.71.1.0232.71.2.0…..232.71.255.0

232.71.0.0232.71.1.0232.71.2.0…..232.71.255.0

Globalinternet

Without CIDR:

serviceprovider

232.71.0.0232.71.1.0232.71.2.0…..232.71.255.0

Globalinternet

With CIDR:

232.71.0.0/16

CIDR: Classless Inter-Domain Routing

Address format <IP address/prefix P>. The prefix denotes the upper P bits of the IP address.

E.g. in CIDR address 206.13.01.48/25, the “/25” indicates the first 25 bits are used to identify a unique network, the remaining bits are host’s

Idea - use aggregation - provide routing for a large number of customers by advertising one common prefix. This is possible because nature of addressing is

hierarchical Summarizing routing information reduces the size of

routing tables, but allows to maintain connectivity. Aggregation is critical to the scalability and survivability of

the Internet

Address Arithmetic: Address Blocks

The <address/prefix> pair defines an address block: Examples:

128.15.0.0/16 => [ 128.15.0.0 - 128.15.255.255 ] 188.24.0.0/13 => [ 188.24.0.0 - 188.31.255.255 ]

consider 2nd octet in binary:

Address block sizes a /13 address block has 232-13 addresses(=524288) (/16 has

232-16 =65536) a /13 address block is 8 times as big as a /16 address block

because 232-13 = 232-16 * 23

13th bit

188.00011000.0.0

settable

CIDR: longest prefix match Because prefixes of arbitrary length allowed,

overlapping prefixes can exist. Example:

router hears 124.39.0.0/16 from one neighborand 124.39.11.0/24 from another neighbor

Router forwards packet according to most specific forwarding information, called longest prefix match Packet with destination 124.39.11.32 will be

forwarded using /24 entry. Packet w/destination 124.39.22.45 will be forwarded

using /16 entry

Will CIDR work ? For CIDR to be successful need:

address registries must assign addresses using CIDR strategy

providers and subscribers should configure their networks, and allocate addresses to allow for a maximum amount of aggregation

BGP must be configured to do aggregation as much as possible

Factors that complicate achieving aggregation multihoming, proxy aggregation, changing

providers

Four Basic Messages

Open: Establishes BGP session (uses TCP port #179)

Notification: Report unusual conditions

Update:Inform neighbor of new routes that become activeInform neighbor of old routes that become inactive

Keepalive: Inform neighbor that connection is still viable

BGP Database

1.Neighbor table

List of BGP neighbors

2. BGP forwarding table

List of all networks learned from each neighbor

3. IP routing table

List of best path to destination networks

OPEN Message

During session establishment, two BGP speakers exchange their AS numbers BGP identifiers (usually one of the router’s IP addresses) Router ID Holdtime

Open messages are confirmed using a keep-alive message sent by a peer and must be confirmed before updates

A BGP speaker has option to refuse a session Select the value of the hold timer:

maximum time to wait to hear something from other end before assuming session is down.

authentication information (optional)

NOTIFICATION and KEEPALIVE Messages

NOTIFICATION Indicates an error terminates the TCP session gives receiver an indication of why BGP session terminated Examples: header errors, hold timer expiry, bad peer AS,

bad BGP identifier, malformed attribute list, missing required attribute, AS routing loop, etc.

KEEPALIVE protocol requires some data to be sent periodically.

If no UPDATE to send within the specified time period, then send KEEPALIVE message to assure partner that connection still alive

UPDATE Message

Updates are sent using TCP to ensure delivery

used to either advertise and/or withdraw unfeasible prefixes from routing table

path attributes: list of attributes that pertain to ALL the prefixes in the Reachability Info field

Withdrawn routes length (2 octets)

Withdrawn routes (variable length)

Total path attributes length (2 octets)

Path Attributes (variable length)

Reachability Information (variable length)

FORMAT:

Advertising a prefix

When a router advertises a prefix to one of its BGP neighbors: information is valid until first router explicitly

advertises that the information is no longer valid BGP does not require routing information to be

refreshed if node A advertises a path for a prefix to node B,

then node B can be sure node A is using that pathitself to reach the destination.

BGP Attributes

Attributes: routes learned via BGP have associated properties that are used to determine the best route to a destination when multiple paths exist to a particular destination

• Local Preference• Multi-Exit Discriminator (MED)• Origin• AS-path• Next-hop

Attribute: ORIGIN

ORIGIN: Who originated the announcement? Where was a

prefix injected into BGP? – indicates how BGP learned about a particular route

IGP: route is interior to the originating AS. This value is Value set using network router configuration command to inject router into BGP

EGP: route learned via the External Gateway Protocol Incomplete (often used for static routes): origin of

routes unknown or learned in some other way

Attributes: AS_PATH

a list of AS’s through which the announcement for a prefix has passed each AS prepends its

AS # to the AS-PATH attribute when forwarding an announcement

useful to detect and prevent loops

Attribute: NEXT HOP

IP address used to reach the advertising router For EBGP session, NEXT HOP = IP address

of neighbor that announced the route. For IBGP sessions, if route originated inside

AS, NEXT HOP = IP address of neighbor that announced the route

For routes originated outside AS, NEXT HOP of EBGP node that learned of route, is carried unaltered into IBGP.

Destination Next Hop

140.20.1.0/24 2.2.2.2

209.15.1.0/24 1.1.1.1

Destination Next Hop

140.20.1.0/24 2.2.2.2

2.2.2.0/24 3.3.3.3

3.3.3.0/24 Connected

209.15.1.0/24 1.1.1.1

1.1.1.0/24 3.3.3.3

BGP Table at Router C:

IP Routing Table at Router C:

B

A

D

C 2.2.2.2

3.3.3.3

140.20.1.0/24IBGP

EBGP

209.15.1.0/24

1.1.1.1

Next-Hop Cont’d

Router C advertises 172.16.1.0 with next hop 10.1.1.1

A propagates it within its AS

Attribute: Multi-Exit Discriminator (MED) when AS’s

interconnected via 2 or more links

AS announcing prefix sets MED

enables AS2 to indicate its preference

AS receiving prefix uses MED to select link

a way to specify how close a prefix is to the link it is announced on

Attribute: Local Preference Used to prefer an exit point

from the local AS Used to indicate preference

among multiple paths for the same prefix anywhere in the internet.

The higher the value the more preferred

Exchanged between IBGP peers only. Local to the AS.

Often used to select a specific exit point for a particular destination

Routing Process Overview

Inputpolicyengine

Decisionprocess

Routesused byrouter

Routesreceived from peers

Routessent topeersOutput

policyengine

BGP table IP routingtable

Choosebest route

accept,deny, set preferences

forward,not forwardset MEDs

Input Policy Engine Inbound filtering controls outbound traffic

filters route updates received from other peers filtering based on IP prefixes, AS_PATH, community denying a prefix means BGP does not want to reach that

prefix via the peer that sent the announcement accepting a prefix means traffic towards that prefix may be

forwarded to the peer that sent the announcement Attribute Manipulation

sets attributes on accepted routes example: specify LOCAL_PREF to set priorities among

multiple peers that can reach a given destination

BGP Decision Process

1. Choose route with highest LOCAL-PREF

2. If have more than 1 route, select route with shortest AS-PATH

3. If have more than 1 route, select according to lowest ORIGIN type where IGP < EGP < INCOMPLETE

4. If have more than 1 route, select route with lowest MED value

5. Select min cost path to NEXT HOP using IGP metrics

6. If have multiple internal paths, use BGP Router ID to break tie.

Output Policy Engine

Outbound Filtering controls inbound traffic forwarding a route means others may choose to

reach the prefix through you not forwarding a route means others must use

another router to reach the prefix may depend upon whether E-BGP or I-BGP peer example: if ORIGIN=EGP and you are a non-transit

AS and BGP peer is non-customer, then don’t forward

Attribute Manipulation sets attributes such as AS_PATH and MEDs

Transit vs. Nontransit AS

AS1

ISP1 ISP2

r1r2 r2

r3

r2

r1 r3

AS1

ISP1 ISP2

r1r2,r3 r2,r1

r3

r2

r1 r3

Transit traffic = traffic whose source and destination are outside the AS

Nontransit AS: does not carry transit traffic Transit AS: does carry transit traffic• Advertise own routes only• Do not propagate routes learned from other AS’s• case 1:

• Advertises its own routes PLUS routeslearned from other AS’s

AS1

r1r2 r2r3

r2

AS2r1 r3• case 2:

Internal BGP (I-BGP) Used to distribute routes learned via EBGP to all the

routers within an AS I-BGP and E-BGP are same protocol in that

same message types used same attributes used same state machine BUT use different rules for readvertising prefixes

Rule #1: prefixes learned from an E-BGP neighbor can be readvertised to an I-BGP neighbor, and vice versa

Rule #2: prefixes learned from an I-BGP neighbor cannot be readvertised to another I-BGP neighbor

I-BGP: Preventing Loops and Setting Attributes Why rule #2? To prevent announcements from

looping. In E-BGP can detect via AS-PATH. AS-PATH not changed in I-BGP

Implication of rule: a full mesh of I-BGP sessions between each pair of routers in an AS is required

Setting Attributes: The router that injects the route into the I-BGP mesh is responsible for setting the LOCAL-PREF attribute prepending AS # to AS-PATH

Route Reflectors Problem: requiring a full mesh of I-

BGP sessions between all pairs of routers is hard to manage for large AS’s.

Solution: group routers into clusters. Assign a leader to each cluster,

called a route reflector (RR). Members of a cluster are called

clients of the RR I-BGP Peering

clients peer only with their RR

RR’s must be fully meshed

clients

RR

AB

C clusters

RR

RR

Route Reflectors: Rule on Announcements Provides mechanisms for minimizing the # of updates

messages transmitted within an AS and reducing the amount of data propagated in each message.

If received from RR, reflect to clients If received from a client, reflect to RRs and clients If received from E-BGP, reflect to all - RRs and

clients RR’s reflect only the best route to a given prefix, not

all announcements they receive. helps size of routing table sometimes clients don’t need to carry full table

Avoiding Loops with Route Reflectors Loops cannot be detected by traditional approach using

AS-PATH because AS-PATH not modified within an AS. Announcements could leave a cluster and re-enter it. Two new attributes introduced:

ORIGINATOR_ID: router id of route’s originator in ASrule: announcement discarded if returns to originator

CLUSTER_LIST: a sequence of cluster id’s. set by RRs.rule: if an RR receives an update and the cluster list

contains its cluster id, then update is discarded.

Multihoming

Single-homed vs. Multi-homed subscribers A single-homed network has one connection to

the Internet (i.e., to networks outside its domain)

A multi-homed network has multiple connections to the Internet. Two scenarios: can be multi-homed to a single provider can be multi-homed to multiple providers

Why multi-home? Reliability Performance

Single-homed AS

Subscriber called a “stub AS” Provider-Subscriber communication for

route advertisement: static configuration. most common.

Provider’s router is configured with subscriber’s prefix.

good if customer’s routes can be represented by small set of aggregate routes

bad if customer has many noncontiguous subnets

can use BGP between provider and stub AS

R2

Subscriber

R1

Provider

Multihoming to Multiple Providers

Customer

ISP 1 ISP 2

ISP 3

Multihoming Issues Load sharing

how distribute the traffic over the multiple links? Reliability

if load sharing leads to preferencing certain links for certain subnets, is reliability reduced?

Address/Aggregation which subnet addresses should the multihomed

customer use? how will this affect its provider’s ability to

aggregate routes?

Load sharing from ISP to Customer using attributes

Goal: provider splits traffic across 2 links according to prefix

Implement this strategy using attributes customer sets MEDs provider sets

LOCAL_PREF

R2

Customer

R3

R1

ISP

140.35/16

208.22/16

208.22/16140.35/16

Load sharing from Customer to ISP using policy Goal: send traffic to

ISP’s customers on one link; send traffic to the rest of the Internet on 2nd link

Implement using policy to control announcements

R3

Customer

R1

ISP

R2

advertisecustomerroutes only

advertisedefaultroute 0/0

traffic

blue: announcementsred: traffic

Address/Aggregation Issue Where should the

customer get its address block from? 1. From ISP1 2. From ISP2 3. From both ISP1 and

ISP2 4. Independently from

address registry

ISP 3

ISP 1 ISP 2

customer(cases 1 and 2 are equivalent)

Case 1 & 2: Get address block from one ISP

ISP 3

ISP 1

140.20/16

ISP 2

200.50/16

140.20.6/24

customer

140.20.6/24140.20.6/24

140.20/16200.50/16140.20.6/24

example: customer gets address from ISP 1

ISP 1’s aggregation is not broken

customer’s prefix not aggregatable at ISP 2

longer prefix becomes a traffic magnet

How good is load sharing?If all ISP’s generate same amount of traffic for customer, then ISP2-customer link twice as loaded as ISP1-customer link

Case 3: Get address block from both ISPs

announcement policy: announce prefix only to its “parent”

advantage: both ISP’s can aggregate the prefix they receive

disadvantage: lose reliability

load balancing good? depends upon how much traffic sent to each prefix

ISP 3

ISP 1 ISP 2

customer

140.20/16 200.50/16

140.20.1/24 200.50.1/24

140.20.1/24 200.50.1/24

Case 4: obtain address block from registry no aggregation possible no traffic magnets

created good reliability how to achieve load

sharing? customer breaks address

block into 2 /25 blocks, and announce one per link (but may lose reliability)

OR use method of AS-PATH manipulation

150.55.10/24

150.55.10/24

ISP 3

ISP 1 ISP 2

customer

100.20/16 200.50/16

150.55.10/24

150.55.10/24

150.55.10/24

200.50/16100.20/16

AS-PATH manipulation Idea: prepend your AS

number in AS-PATH multiple times to discourage use of a link

makes AS-PATH seem longer than it is

recall BGP decision process uses shortest AS-PATH length as a criteria for selecting best path

Example: ISP 3 will choose path through AS 2 because its AS-PATH appears shorter

150.55.10/24 - 33

ISP 3

AS 1 AS 2

150.55.10/24 - 33 33

customer

150.55.10/24

AS 33

150.55.10/24 - 1 33 33 150.55.10/24 - 2 33

Aggregation

Address Arithmetic: When is aggregation possible? Case 1

Process of combining characteristics of several different routes in such a way that a single route is advertised.

Possible when one prefix contained in another. Example: Two AS’s having customer-provider relationship.

Provider does the aggregation. Provider has address block 18.0.0.0/8 Its customers have address blocks 18.6.0.0/15 and 18.9.0.0/15 Provider announces its address block only

Rule: Prefix p1 contains prefix p2 iff length(p2) > length(p1) ANDaddress(p2) / 232-length(p1) = address(p1) / 232-length(p1)

Address Arithmetic: When is aggregation possible? Case 2

Some pairs of consecutive prefixes Example: routes within the same AS:

AS has 2 address blocks: 1.2.2.0/24 = 0000001.00000010.00000010.00000000/241.2.3.0/24 = 0000001.00000010.00000011.00000000/24can announce instead 1.2.2.0/23

Rule: consecutive prefixes p1 and p2 are aggregatable iff

length(p1)=length(p2) ANDaddress(p1) / 232-length(p1) +1 = address(p2) / 232-length(p2) AND address(p1) % 233-length(p1) = 0

“The cardinal sin of BGP routing is advertising routes that you don't know how to get to; called "black-holing" someone”

If you announce part of IP space owned by someone else, using a more-specific prefix, all their traffic flows to you. Makes that address block disconnected from the Internet.

Example: black holes can happen inadvertently by non-careful aggregation

Black holes and cardinal sins

ISP 1100.24/16

ISP 2222.2/16

NAP

100.24.16.0/21

Net A

[100.24.16.0-100.24.23.255]

100.24.0.0/20

Net B[100.24.0.0-100.24.15.255]

Net C

100.24.56.0/21

[100.24.56.0-100.24.63.255]

100.24/16

222.2/16

wrong !!100.24.0.0/18

Limitations to Aggregation Lack of hierarchical allocation of address space

prior to CIDR (before 1995) A single AS can have noncontiguous address

blocks Customer AS and provider AS can have non-

contiguous address blocks Reluctance of customers to renumber their address

space when they switch providers Multi-homing

multi-homed prefixes require global visibility may choose not to: load sharing

Rules of Thumb for Aggregation

To avoid black holes: an ISP is not allowed to aggregate routes unless it is a supernet of the address block it wants to aggregate

In other words, an ISP has to specifically announce each IP address range of its downstream customers that are not part of its own address space.

Routing Instability

Route Stability Routing instability: rapid fluctuation of network

reachability information route flapping: when a route is withdrawn and

re-announced repeatedly in a short period of time happens via UPDATE messages

because messages propagate to global Internet, route flapping behavior can cascade and deteriorate routing performance in many places

Effects: increased packet loss, increased network latency, CPU overhead, loss of connectivity

Route Stability Studies by C. Labovitz, R. Malan & F. Jahanian

Types of Routing Updates Forwarding instability

reflects legitimate topology changes e.g., changes in Prefix, NEXT_HOP and/or ASPATH affects forwarding paths used

Policy fluctuation reflects changes in policy e.g., changes in MED, LOCAL_PREF, etc. may not necessarily affect forwarding paths used

Pathological redundant messages reflect neither topology nor policy changes

Who’s Responsible?

AS’s No single AS dominates instability statistics No correlation between the size of an AS and

its share of updates generated. Prefixes

Instability is evenly distributed across routes. Example of measurements:

75% of AADiff events come from prefixes change less than 10 times a day.

80-90% of instability comes from prefixes that are announced less than 50 times/day.

Sources of Instabilities in General

router configuration errors transient physical and data link problems software bugs problems with leased lines (electrical timing

issues that cause false alarms of disconnect) router failures network upgrades and maintenance

Instability Problem and Cause. Example 1. Problem: 3-5 million duplicate withdrawals Cause: stateless BGP implementation

time-space tradeoff: no state maintained on what advertised to peers

when receive any change, transmit withdrawal to all peers regardless of whether previously notified or not

sent updates for both explicit and implicit withdrawals By 1998, most vendors had BGP implementations with

partial state. Result: number of WWDups reduced by an order of

magnitude

Instability Problem and Cause. Example 2 Problem: duplicate announcements Cause: min-advertisement timer & stateless BGP

min-adv timer: wait 30 seconds. Combine all received updates in last 30 seconds into single outbound update message (if possible).

within 30 seconds route can be withdrawn and re-announced so that there is no net change to original announcement

Solution: Have BGP keep some state about recently sent messages to peers. Avoid sending duplicate messages

Instability Problem and Cause. Example 3

AS 1R2

R1 R3

R4

R6

R5

R border router

internal router

AS2

AS3

R8

R7

E-BGP

10

23

5 6

IGP

Net X

4

MED=15

MED=5

Example: interaction IGP/BGPpolicy: set MED using IGP metrics, such as shortest path

MED=24

Controlling route instability: Route Dampening track number of times a route has flapped

over a period of time routes that exhibit a high level of instability in

a short period of time should be suppressed (not advertised)

penalize ill behaved routes proportionally to their expected future stability

if a suppressed route stops flapping for a long enough period of time, unsuppress it (readvertise)

Route Dampening

time

pena

lty

reuse limit

suppress limit

Route Dampening Algorithm

Each time a route flaps, increase the penalty If the route has not flapped in the last ‘half-

life’ time units, then cut penalty in half. If the penalty > suppress limit, then suppress

the route If the penalty < reuse limit, then free a

suppressed route

Side Effect of Route Dampening

A legitimate update may arrive and it will be ignored because that route has been suppressed and not yet released.

The modification needed for the legitimate announcement is delayed

Aggregation can help route flapping If a more-specific route is

flapping, but provider only announces aggregated prefix, then other networks don’t see route flap.

Hence aggregation can mask route flapping.

Aggregation helps combat instability because it reduces the number of networks visible in the core Internet.

Tier 1 provider

Tier 2 provider

customer

140.40.4/24

140.40/16

flapping

not flapping