Internet Routing Instability and its Origins, by Ilia Ferdman and Lilia Tsvetinovich

1

Internet Routing Instability and its Origins

Ilia Ferdman, Lilia Tsvetinovich

2

Abstract

Problems discussed:
  Internet Routing Instability
  Origins of Internet Routing Instability

3

Internet Routing Instability

Defined as rapid fluctuation of network reachability and topology information

Also referred to as “route flap”

4

Origins of Routing Instability

Router configuration errors

Transient physical and data link problems

Software bugs

5

Primary Effects

Instability can lead to:
  Increased packet loss
  Delays in the time for network convergence
  Additional resource overhead (memory, CPU)

Imminent “death of the Internet”

6

Internet Structure

Comprised of interconnected regional and national backbones

Large public exchange points are the “core” of the Internet

7

Internet Structure (cont.)

BSP – Backbone service provider

EP – Exchange point

[Diagram: backbone service providers BSP 1 to BSP 7 and exchange points EP 1 to EP 4]

8

Internet Structure (cont.)

Backbone service providers exchange:
  Traffic
  Routing information

Backbones in the core maintain a default-free routing table

9

Internet Structure (cont.)

Autonomous systems (AS)
  Distinct routing policies
  Connect to private or public exchange points

Peer border routers in ASes exchange reachability information to prefixes

Prefixes – IP address blocks

Exchange information through BGP

10

Border Gateway Protocol

BGP vs. IGRP & OSPF

BGP
  Incremental protocol
  Uses TCP
  Limits distribution of routing information

IGRP, OSPF, etc.
  Interior protocols
  Use datagram service
  Flood network with all known routing table entries

11

BGP (cont.)

Allows configuration for policy (MED)

MED – Multi-Exit Discriminator

ASPATH - list of AS numbers

12

BGP (cont.)

Allows configuration for policy (MED)

MED – Multi-Exit Discriminator

ASPATH – list of AS numbers

BGP updates
  Announcements
  Withdrawals

13

BGP updates - Withdrawals

Explicit Withdrawals

Implicit Withdrawals

[Diagrams: routers R1, R2, R3 (explicit withdrawal) and routers R1, R2 (implicit withdrawal)]

14

BGP updates - Withdrawals

Explicit Withdrawals

[Diagram: routers R1, R2, R3]

15

BGP updates - Withdrawals

Implicit Withdrawals

[Diagram: routers R1, R2]
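The two withdrawal types can be sketched in a few lines of Python. This is only an illustration, not a BGP implementation: the class `PeerRib` and its method names are hypothetical, and a route is reduced to an AS path plus a next hop. An explicit withdrawal removes the route for a prefix; an implicit withdrawal happens when a new announcement for the same prefix replaces the route already held.

```python
class PeerRib:
    """Routes learned from one peer: prefix -> (as_path, next_hop).

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, as_path, next_hop):
        """Store an announced route.

        Returns "implicit-withdrawal" when the announcement replaces a
        route already held for the prefix, otherwise "new".
        """
        replaced = prefix in self.routes
        self.routes[prefix] = (tuple(as_path), next_hop)
        return "implicit-withdrawal" if replaced else "new"

    def withdraw(self, prefix):
        """Explicit withdrawal: True if a route was actually removed."""
        return self.routes.pop(prefix, None) is not None


rib = PeerRib()
rib.announce("192.0.2.0/24", [65001, 65002], "10.0.0.1")   # new route
rib.announce("192.0.2.0/24", [65003, 65002], "10.0.0.2")   # replaces it implicitly
rib.withdraw("192.0.2.0/24")                               # explicit withdrawal
```

The prefixes, AS numbers and next hops are documentation and private-use values chosen for the example.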

16

BGP (cont.)

Allows configuration for policy (MED)

ASPATH

BGP updates
  Announcements
  Withdrawals

Stable wide-area networks performance expectations

17

Methodology

Since January 1996, 9 months

Routing Arbiter project

Public exchange points: AADS, Mae-East, Mae-West, PacBell, Sprint

18

Methodology

19

Methodology

Mae-East backbone service providers: ANS, BBN, MCI, Sprint and UUNet

RAP – Routing Arbiter Project

Route Servers used to collect information

12 gigabytes of compressed data

20

Types of Routing Instability

BGP updates
  Instability rate

Forwarding instability

Routing Policy Fluctuations

Pathological updates

Instability – an instance of forwarding instability or policy fluctuations

21

Possible impacts

Increase in cache misses

CPU & memory problems

Route “flap storm”

Forwarding loops

22

Route Caching Architecture

Routing table cache of destination and next-hop lookups

Routing table is too big to keep in main memory

Instability causes increase in cache misses

Load on CPU
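A small Python sketch of why routing updates hurt a route cache. It is a simplified model under stated assumptions: the class `RouteCache` is hypothetical, lookups are keyed by prefix rather than by destination address with longest-prefix match, and eviction is plain LRU. Every routing change invalidates a cached entry, so the next packet for that prefix takes the slow path through the CPU.

```python
from collections import OrderedDict


class RouteCache:
    """Small LRU cache of prefix -> next hop in front of the full table."""

    def __init__(self, full_table, capacity):
        self.full_table = full_table   # complete routing table (slow path)
        self.capacity = capacity
        self.cache = OrderedDict()     # cached entries (fast path)
        self.hits = 0
        self.misses = 0

    def lookup(self, prefix):
        if prefix in self.cache:
            self.hits += 1
            self.cache.move_to_end(prefix)      # mark as recently used
            return self.cache[prefix]
        self.misses += 1                        # CPU must consult the full table
        next_hop = self.full_table.get(prefix)
        if next_hop is not None:
            self.cache[prefix] = next_hop
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        return next_hop

    def route_changed(self, prefix, next_hop):
        """A routing update changes the table and invalidates the cache entry."""
        self.full_table[prefix] = next_hop
        self.cache.pop(prefix, None)
```

With a stable table, repeated lookups for the same prefix are hits after the first one. Calling `route_changed` between lookups turns each of them back into a miss, which is the extra CPU load the slide refers to.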

23

Route Caching Architecture

Possible solution: full routing table in main memory

24

Possible impacts

Increase in cache misses

CPU & memory problems

Route “flap storm”

Forwarding loops

25

CPU & Memory Problems

Under normal conditions the router’s CPU can manage its computational needs

Instability places large demands on a router’s CPU

Keep-Alive packets delayed

26

Possible impacts

Increase in cache misses

CPU & memory problems

Route “flap storm”

Forwarding loops

27

Route “flap storm”

Overloaded router marked as unreachable

Peer routers choose alternative paths

Peers update their peers

“Down” router recovers and tries to re-initiate peering sessions

Large state dump transmissions are generated

Increased load causes more routers to fail

28

Route “flap storm” (cont.)

Possible solution: higher priority to Keep-Alive messages
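One way to picture the Keep-Alive priority idea is a two-class message queue. The sketch below is an assumption about how such prioritisation could look, not how any particular router implements it; `MessageQueue`, `KEEPALIVE` and `UPDATE` are hypothetical names. Keep-Alives are always served before pending routing updates, so a backlog of updates does not make the peering session time out.

```python
import heapq
import itertools

KEEPALIVE, UPDATE = 0, 1   # lower number is served first (assumed scheme)


class MessageQueue:
    """Priority queue that serves Keep-Alives ahead of routing updates."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # keeps arrival order within a class

    def push(self, kind, payload):
        heapq.heappush(self._heap, (kind, next(self._seq), payload))

    def pop(self):
        kind, _, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self):
        return len(self._heap)


q = MessageQueue()
q.push(UPDATE, "announce 192.0.2.0/24")
q.push(UPDATE, "withdraw 198.51.100.0/24")
q.push(KEEPALIVE, "keepalive from R2")
first = q.pop()   # the Keep-Alive, although it arrived last
```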

29

Possible impacts

Increase in cache misses

CPU & memory problems

Route “flap storm”

Forwarding loops

30

Forwarding loops

Defined as steady-state cyclic transmission of user data between a set of peers

Loop verification by checking ASPATH

Unconstrained routing policies
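The ASPATH check can be illustrated with a minimal Python sketch. The function names and AS numbers are made up for the example: a router rejects any announcement whose AS path already contains its own AS number, because accepting it would send traffic back through itself.

```python
def has_as_loop(as_path, local_as):
    """True if the local AS number already appears in the received AS path."""
    return local_as in as_path


def accept_route(as_path, local_as):
    """Accept an announcement only if it does not loop through the local AS."""
    return not has_as_loop(as_path, local_as)


# 64512 is used here as the local (private-use) AS number.
accept_route([65001, 65002, 65003], local_as=64512)   # no loop: accepted
accept_route([65001, 64512, 65003], local_as=64512)   # loop: rejected
```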

31

BGP Update Types

WA Different – WADiff

AA Different – AADiff

WA Duplicate – WADup

AA Duplicate – AADup

WW Duplicate – WWDup

32

BGP Update Types - WADiff

Explicit withdrawal

Unreachable route is replaced by alternative route

ASPATH or next-hop attribute differs

Forwarding instability

33

BGP Update Types - AADiff

Implicit withdrawal

Route is unreachable

Alternative path becomes available

Forwarding instability

34

WADiff and AADiff

WADiff
  Explicit withdrawal
  Forwarding instability

AADiff
  Implicit withdrawal
  Forwarding instability

Route is replaced by an alternative one

35

BGP Update Types - WADup

Explicit withdrawal

Route explicitly withdrawn and then re-announced as reachable

Transient topological problems (link or router)

Forwarding instability or pathological behavior

36

BGP Update Types - AADup

Implicit withdrawal

Route is implicitly withdrawn and replaced by its duplicate

Duplicate route does not differ in ASPATH or next-hop attribute information

Policy fluctuations and pathological behavior

37

WADup and AADup

WADup
  Explicit withdrawal
  Pathological behavior
  Forwarding instability

AADup
  Implicit withdrawal
  Pathological behavior
  Policy fluctuations

38

BGP Update Types - WWDup

Repeated BGP withdrawals for a prefix that is unreachable

Pathological behavior
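The five update types can be sketched as a small classifier over the sequence of updates seen for one prefix from one peer. This is an illustrative reading of the definitions on these slides, not the measurement code used in the study: the function name `classify_updates` is hypothetical, and a route is reduced to the pair (ASPATH, next hop) that the slides use to tell "different" from "duplicate".

```python
def classify_updates(events):
    """Label each update for a single (peer, prefix) pair.

    events: list of ("A", attrs) announcements or ("W", None) withdrawals,
            where attrs is a comparable (as_path, next_hop) tuple.
    Returns one label per event: "WADiff", "AADiff", "WADup", "AADup",
    "WWDup", or None for updates outside these five categories
    (a first announcement, or an ordinary withdrawal).
    """
    labels = []
    last_kind = None    # kind of the previous update
    last_attrs = None   # attributes of the most recent announcement
    for kind, attrs in events:
        label = None
        if kind == "W":
            if last_kind == "W":
                label = "WWDup"              # repeated withdrawal
        elif kind == "A":
            if last_attrs is not None:
                same = attrs == last_attrs
                if last_kind == "W":         # withdrawn, then announced again
                    label = "WADup" if same else "WADiff"
                else:                        # announcement replaces announcement
                    label = "AADup" if same else "AADiff"
            last_attrs = attrs
        else:
            raise ValueError(f"unknown update kind: {kind!r}")
        last_kind = kind
        labels.append(label)
    return labels
```

For example, announcing the same route twice yields an AADup for the second update, and two withdrawals in a row yield a WWDup for the second one.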

39

BGP Update Types - Summary

Type     Explicit     Implicit     Forwarding    Policy         Pathological
         Withdrawal   Withdrawal   instability   Fluctuations   Behavior
WADiff   V                         V
AADiff                V            V
WADup    V                         V                            V
AADup                 V                          V              V
WWDup    –            –                                         V

40

BGP Update Types

41

WW Duplicate

Transmitted by routers of an AS that never previously announced reachability for the withdrawn prefixes

42

Let’s have a break

43

Internet Routing Instability and its Origins

Ilia Ferdman, Lilia Tsvetinovich

44

Instability Origins

Hardware configuration problems

Software bugs

Multi-homing sites

BGP implementation problems

45

Instability Origins – Hardware configuration

Internet growth -> Traffic growth -> Need for new hardware

Old hardware -> Increase in number of updates:
  CPU overload
  Link failures

Small service providers use old hardware

46

Instability Origins – Hardware configuration

Cache architecture
  Not all of the prefix table is in memory

Increase in number of updates -> Increase in number of cache misses

47

Instability Origins – Software bugs

Use of old or misconfigured software is a cause of routing instability

Small Service Providers use old software

48

Instability Origins – Multi-homing sites

End sites connect to the Internet via multiple service providers (SP)

Multi-homed customer prefixes require global visibility

Routers maintain longer prefixes

[Diagram: a site connected to SP1, SP2 and SP3]

49

Instability Origins – BGP implementation

Stateless BGP
  Announcements or withdrawals are sent without checking what was previously sent to the peer
  O(N*U) additional updates
    N – number of routers
    U – number of updates

There are better implementations
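The O(N*U) cost can be made concrete with a short Python sketch. It rests on one assumption about what "without check" means: a stateless router sends every withdrawal to every peer, while a stateful one remembers what it announced to each peer and withdraws only those routes. The function `withdrawals_sent` and the peer names are hypothetical.

```python
def withdrawals_sent(peers, announced_to, withdrawn_prefixes, stateless):
    """Count the withdrawal messages a router sends.

    peers:              list of peer names (N of them)
    announced_to:       dict peer -> set of prefixes previously announced to it
    withdrawn_prefixes: prefixes being withdrawn (U of them)
    stateless:          if True, send every withdrawal to every peer
    """
    sent = 0
    for peer in peers:
        already_announced = announced_to.get(peer, set())
        for prefix in withdrawn_prefixes:
            if stateless or prefix in already_announced:
                sent += 1
    return sent


peers = ["P1", "P2", "P3"]
announced_to = {"P1": {"192.0.2.0/24"}}
withdrawn = ["192.0.2.0/24", "198.51.100.0/24"]

stateless_count = withdrawals_sent(peers, announced_to, withdrawn, stateless=True)
stateful_count = withdrawals_sent(peers, announced_to, withdrawn, stateless=False)
```

In this toy case the stateless router sends N*U = 3*2 = 6 withdrawals, while the stateful one sends a single withdrawal, to the only peer that ever heard the announcement.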

50

Instability Origins – BGP implementation

Misconfigured interaction between different gateway protocols

[Diagram: router R1 running BGP and router R2 running OSPF]

51

Possible solutions

Route Server

Route Dampening Algorithm

Aggregation

52

Possible solutions – Route Servers

R1, R2, R3, R4, R5 – routers

R.S. – route server

R.S. collects BGP information from routers

[Diagram: routers R1 to R5]

53

Possible solutions – Route Servers

R1, R2, R3, R4, R5 – routers

R.S. – route server

R.S. collects BGP information from routers

[Diagram: routers R1 to R5 and the route server R.S.]

54

Possible solutions – Route Servers

Do not forward network traffic

Peer with service providers

Provide aggregate BGP information

Unique platform for statistics collection and monitoring
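A rough count of peering sessions shows one reason a route server helps. The slides do not state these numbers; the sketch assumes the usual comparison between a full mesh, where every router peers with every other router, and a star, where each router peers only with the route server. Both function names are hypothetical.

```python
def full_mesh_sessions(n_routers):
    """Peering sessions when every router peers with every other router."""
    return n_routers * (n_routers - 1) // 2


def route_server_sessions(n_routers):
    """Peering sessions when each router peers only with the route server."""
    return n_routers


mesh = full_mesh_sessions(5)       # R1..R5 fully meshed
star = route_server_sessions(5)    # R1..R5 each peering with R.S.
```

For the five routers R1 to R5, a full mesh needs 10 sessions and the route-server arrangement needs 5; the gap grows quadratically with the number of routers.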

55

Possible solutions – Route Dampening Algorithm

“Hold-down” frequent updates

Announcements about new networks delayed

Draconian version of enforcing stability
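A minimal sketch of how a hold-down of this kind can work, assuming the common penalty-and-decay scheme: each flap adds a fixed penalty, the penalty decays exponentially over time, and a route is suppressed while the penalty stays high. The class `FlapDamper` and all numeric defaults below are illustrative assumptions, not values taken from these slides.

```python
class FlapDamper:
    """Penalty-based dampening state for one route (illustrative only)."""

    def __init__(self, penalty_per_flap=1000.0, suppress_limit=2000.0,
                 reuse_limit=750.0, half_life=900.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress_limit = suppress_limit   # suppress at or above this
        self.reuse_limit = reuse_limit         # release below this
        self.half_life = half_life             # seconds for penalty to halve
        self.penalty = 0.0
        self.last_time = 0.0
        self.suppressed = False

    def _decay(self, now):
        """Apply exponential decay; 'now' must not go backwards."""
        elapsed = now - self.last_time
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_time = now

    def flap(self, now):
        """Record one flap (withdrawal or change) at time 'now' in seconds."""
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty >= self.suppress_limit:
            self.suppressed = True

    def is_suppressed(self, now):
        """True while the route is held down."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return self.suppressed
```

With these defaults, three flaps one minute apart push the penalty over the suppress limit, and the route is usable again once the penalty has decayed below the reuse limit. This also shows the drawback named on the slide: a legitimate announcement arriving during the hold-down is delayed as well.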

56

Possible solutions – Aggregation

Aggregation is also called supernetting

Concept of aggregation: several networks are combined into one supernetwork
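Aggregation can be tried out with Python's standard `ipaddress` module. The helper `aggregate` and the example prefixes are chosen for illustration; `ipaddress.collapse_addresses` merges adjacent networks into the smallest covering set of prefixes.

```python
import ipaddress


def aggregate(prefixes):
    """Collapse a list of prefix strings into the fewest covering prefixes."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# Four contiguous /24 networks become one /22 supernetwork.
supernet = aggregate([
    "192.168.0.0/24",
    "192.168.1.0/24",
    "192.168.2.0/24",
    "192.168.3.0/24",
])
```

Here `supernet` is `["192.168.0.0/22"]`, so one globally visible prefix replaces four. Networks that are not contiguous cannot be merged and are returned unchanged.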

57

Possible solutions – Aggregation

[Diagram: four separate networks, Network 1 to Network 4]

58

Possible solutions – Aggregation

[Diagram: Network 1 to Network 4 combined into one super network]

59

Possible solutions – Aggregation (cont.)

Advantages:
  Decrease in number of globally visible addresses
  Decrease in number of updates

Problems:
  No correlation between service providers
  Multi-homing sites prevent aggregation

60

Statistics

Default-free table size: 45,000 prefixes

Number of updates per day: between 3 and 6 million

99 percent of routing information is pathological

Only 10 percent of routers send one or fewer WADiff per day

Only 20 percent of routers send one or fewer AADiff per day

61

Statistics – updates per period from 04/96 to 10/96

62

Statistics – updates per week

63

Results

Too many BGP updates exchanged

Pathological updates dominate

Daily and weekly cyclic trends

Instability happens to everybody

Forwarding instability the main contributor

64

Statistics – updates from 96 to 98

65

Things that you don’t need to know about Routing Instability

Routing Instability is rapid fluctuation of network reachability and topology information

There are three types of Instability: forwarding instability, policy fluctuation, pathological updates

66

Things that you don’t need to know about Routing Instability (cont.)

Instability can lead to many unpleasant things such as:
  Increased packet loss
  Delays in the time for network convergence
  Additional resource overhead (memory, CPU)

67

Things that you don’t need to know about Routing Instability (cont.)

The possible origins of routing instability are:
  Router configuration errors
  Transient physical and data link problems
  Software bugs

68

Things that you don’t need to know about Routing Instability (cont.)

There are several solutions:
  Route Server
  Route Dampening Algorithm
  Aggregation

69

The END