
A survey of Internet infrastructure reliability Presented by Kundan Singh PhD candidacy exam May 2, 2003


A survey of Internet infrastructure reliability

Presented by Kundan Singh, PhD candidacy exam

May 2, 2003

2

Agenda

Introduction

Routing problems: route oscillations, slow convergence, scaling, configuration

Reliability via DNS, transport, application

Effect on VoIP

http://www.cs.columbia.edu/~kns10/research/readings/

3

Overview of Internet routing

[Diagram: Internet routing hierarchy. National providers (AT&T, MCI) peer with each other and with regional providers; regional providers connect campus networks and a cable modem provider. Within an autonomous system, OSPF optimizes the path; between autonomous systems, BGP applies policy.]

4

Border gateway protocol

Runs over TCP; messages: OPEN, UPDATE, KEEPALIVE, NOTIFICATION

Hierarchical peering relationships

Export: all routes to customers; only customer and local routes to peers and providers

Path-vector protocol: choose the optimal AS path satisfying policy

[Diagram: ASes 1-7 linked by provider-customer, peer-peer, and backup relationships, with destination d at AS 0. Announced AS paths propagate hop by hop: d: 47, d: 247, d: 1247, d: 31247; e: 3125. . .]

[1] A border gateway protocol (BGP-4), RFC 1771
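The hierarchical export rule above (everything to customers; only customer and local routes to peers and providers) can be sketched as a small policy function. This is an illustrative sketch, not router configuration; the relationship labels and function name are assumptions.

```python
# Hedged sketch of the hierarchical BGP export rule described above.
# A route is exported to a customer unconditionally; peers and providers
# only receive routes learnt from customers or originated locally.

def should_export(route_learnt_from: str, neighbor: str) -> bool:
    """Both arguments are one of: 'customer', 'peer', 'provider', 'local'."""
    if neighbor == "customer":
        return True  # customers receive all routes
    # peers and providers see only customer-learnt and local routes
    return route_learnt_from in ("customer", "local")

# A route learnt from a peer is not re-exported to another peer,
# which is what keeps an AS from providing free transit.
print(should_export("peer", "peer"))          # False
print(should_export("provider", "customer"))  # True
```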

5

Route selection

Local AS preference

AS path length

Multi-exit discriminator (MED)

Prefer external BGP over internal BGP

Use internal routing metrics (e.g., OSPF)

Use router identifier as the last tie breaker

[Diagram: AS1 with border routers B1-B4 and internal routers R1, R2, connected to AS2, AS3, and AS4; clients C1, C2.]
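The tie-breaking order above can be expressed as a lexicographic sort key. This is a simplified sketch: the field names are illustrative, and real MED comparison applies only among routes from the same neighboring AS.

```python
# Hedged sketch of the BGP route-selection order from the slide:
# local preference, AS-path length, MED, eBGP over iBGP, IGP metric,
# and router ID as the final tie breaker.
from dataclasses import dataclass

@dataclass
class Route:
    local_pref: int
    as_path: list
    med: int
    is_ebgp: bool
    igp_metric: int
    router_id: int

def best_route(routes):
    # Build a key so that the "better" route compares smaller.
    return min(routes, key=lambda r: (
        -r.local_pref,           # higher local preference wins
        len(r.as_path),          # shorter AS path wins
        r.med,                   # lower MED wins (simplified)
        0 if r.is_ebgp else 1,   # prefer eBGP over iBGP
        r.igp_metric,            # lower IGP metric to the next hop wins
        r.router_id,             # lowest router ID as last tie breaker
    ))

r1 = Route(100, [2, 4], 0, True, 10, 1)
r2 = Route(100, [3, 5, 4], 0, True, 5, 2)
print(best_route([r1, r2]).router_id)  # 1: the shorter AS path wins
```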

6

Route oscillation

Each AS sets its policy independently

Oscillations can be persistent or transient; they do not occur if routing is purely distance based

Solutions: static graph analysis, policy guidelines, dynamic "flap" damping

[Diagram: three ASes 0, 1, 2 in a cycle]

[2] Persistent route oscillations in inter-domain routing

7

Static analysis

Abstract models ask: Is the policy system solvable? Is it resilient to link failure? Are there multiple solutions? Is it solvable only sometimes?

Drawbacks: the analysis is NP complete, relies on Internet routing registries, and does not work in practice

[7] An analysis of BGP convergence property

8

Policy guidelinesPolicy guidelines

MUST Prefer customer over peer/provider Have lowest preference for backup

path “avoidance level” increases as path

traverses

Works even on failure and consistent with current practice

Limits the policy usage[3] Stable internet routing without global co-ordination[4] Inherently safe backup routing with BGP

9

Convergence in intra-domain routing

IS-IS: millisecond convergence
  Detect changes (hardware, keep-alive)
  Improved incremental SPF
  Link "down" handled immediately, "up" delayed
  Propagate the update before calculating SPF
  Prioritize keep-alives over data packets
  Detect duplicate updates

OSPF stability: sub-second keep-alives, randomization, multiple failures, loss resilience

Distance vector: count to infinity

[5] Towards milli-second IGP convergence
[6] Stability issues in OSPF routing
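The SPF computation these optimizations speed up is Dijkstra's algorithm over the link-state database. A minimal sketch follows (illustrative only; incremental SPF would recompute just the affected subtree instead of the whole tree):

```python
import heapq

def spf(graph, source):
    """Minimal Dijkstra shortest-path-first over {node: {neighbor: cost}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a better path was already found
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Tiny three-router topology with symmetric link costs.
g = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 1}, "C": {"A": 4, "B": 1}}
print(spf(g, "A"))  # A reaches C via B at cost 2, not directly at cost 4
```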

10

BGP convergence

[Example: ASes 0, 1, and 2, fully meshed, each with a path to destination R. Initial tables of candidate paths: 0: (R, 1R, 2R); 1: (0R, 1R, R); 2: (0R, R, 2R)]

[7] An analysis of BGP convergence properties
[8] An experimental study of delayed internet routing convergence

11

BGP convergence

[R's route is withdrawn, so the direct paths disappear. Tables: 0: (-, 1R, 2R); 1: (0R, 1R, -); 2: (0R, -, 2R). Announcements: 0->1: 01R, 0->2: 01R; 1->0: 10R, 1->2: 10R; 2->0: 20R, 2->1: 20R]

12

BGP convergence

[Path exploration continues; AS 0 announces 01R. Tables: 0: (-, 1R, 2R); 1: (01R, 1R, -); 2: (-, -, 2R). Announcements: 1->0: 10R, 1->2: 10R, 1->0: 12R, 1->2: 12R; 2->0: 20R, 2->1: 20R, 2->0: 21R, 2->1: 21R]

13

BGP convergence

[AS 1 announces 10R while AS 0 sends withdrawals (0->1: W, 0->2: W). Tables: 0: (-, -, 2R); 1: (01R, 10R, -); 2: (-, -, 2R). Announcements: 1->0: 12R, 1->2: 12R; 2->0: 20R, 2->1: 20R, 2->0: 21R, 2->1: 21R, 2->0: 201R, 2->1: 201R]

14

BGP convergence

With MinRouteAdver applied to announcements: converges in 13 steps

With sender-side loop detection: converges in one step

[Diagram: final state, all tables empty (-, -, -), reached after 48 steps]

15

BGP convergence [2]

Latency is due to path exploration

Fail-over latency = 30 x n seconds, where n is the longest backup path length (30 s is the MinRouteAdver interval)

Convergence typically within 3 min, with some oscillations lasting up to 15 min

Loss and delay occur during convergence; "up" events converge faster than "down" events; verified experimentally

[8] An experimental study of delayed internet routing convergence
[9] The impact of internet policy and topology on delayed routing convergence
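The fail-over bound above is easy to state in code; the 30-second constant is BGP's default MinRouteAdver interval, and the path length used here is illustrative.

```python
# The slide's fail-over bound: latency = 30 * n seconds, where n is the
# longest backup AS-path length and 30 s is the default MinRouteAdver
# interval between successive announcements to a peer.
MIN_ROUTE_ADVER = 30  # seconds

def failover_latency(longest_backup_path: int) -> int:
    """Worst-case fail-over latency in seconds."""
    return MIN_ROUTE_ADVER * longest_backup_path

print(failover_latency(6))  # 180 s: path exploration takes minutes
```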

16

BGP convergence [3]

Path exploration => latency

Denser peering => more latency

Large providers see better convergence

Most erroneous paths are due to misconfiguration or software bugs

[9] The impact of internet policy and topology on delayed routing convergence

17

BGP convergence [4]

Route flap damping: to avoid excessive flaps, penalize routes that keep updating

The penalty decays exponentially; "suppression" and "reuse" thresholds control when a route is suppressed and restored

Damping worsens convergence

Selective damping: do not penalize if the path length keeps increasing (a sign of path exploration rather than flapping); attach a preference to the route

[10] Route flap damping exacerbates Internet routing convergence
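The penalty mechanics can be sketched as follows. The constants here are illustrative, not vendor defaults, and real implementations decay lazily at lookup time rather than on a clock.

```python
import math

# Hedged sketch of route flap damping: each flap adds a fixed penalty,
# the penalty decays exponentially with a configured half-life, and the
# route is suppressed above one threshold and reused below another.
PENALTY_PER_FLAP = 1000
HALF_LIFE = 15 * 60        # seconds (illustrative)
SUPPRESS_THRESHOLD = 2000
REUSE_THRESHOLD = 750

class DampedRoute:
    def __init__(self):
        self.penalty = 0.0
        self.suppressed = False

    def flap(self):
        self.penalty += PENALTY_PER_FLAP
        if self.penalty > SUPPRESS_THRESHOLD:
            self.suppressed = True  # stop advertising this route

    def decay(self, elapsed_seconds: float):
        self.penalty *= math.exp(-math.log(2) * elapsed_seconds / HALF_LIFE)
        if self.suppressed and self.penalty < REUSE_THRESHOLD:
            self.suppressed = False  # route may be advertised again

r = DampedRoute()
for _ in range(3):
    r.flap()
print(r.suppressed)      # True: penalty 3000 exceeds the threshold
r.decay(3 * HALF_LIFE)   # penalty halves three times: 3000 -> 375
print(r.suppressed)      # False: below the reuse threshold
```

This also shows why damping can hurt convergence: the path exploration of the previous slides looks exactly like a burst of flaps, so a route recovering from a single failure can be suppressed.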

18

BGP convergence [5]

Paths 12R and 235R are inconsistent (they disagree about AS 2's route); prefer the directly learnt 235R

Order of magnitude improvement in convergence

Distinguishes failures from policy changes

[Diagram: nodes 0, 1, 2, 3, 5 and destination R; node 0 learns 12R via node 1 and 235R, 2R via node 2]

[11] Improving BGP convergence through consistency assertions

19

BGP scaling

Full mesh of logical connections within an AS does not scale

Solution: add hierarchy

20

BGP scaling [2]

Route reflector: more popular; only the RR needs an upgrade

Confederations: sub-divide the AS

Both yield fewer updates and sessions

[12] A comparison of scaling techniques for BGP
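The session counts behind this comparison are simple to illustrate. The formulas below assume a single reflector with every other router as a client, which is a deliberate simplification; real deployments use redundant reflectors and cluster hierarchies.

```python
# Rough iBGP session counts motivating the scaling techniques above:
# a full mesh of n routers needs n*(n-1)/2 sessions, while a single
# route reflector needs only n-1 client sessions. (Illustrative only.)

def full_mesh_sessions(n: int) -> int:
    return n * (n - 1) // 2   # every pair of routers peers directly

def route_reflector_sessions(n: int) -> int:
    return n - 1              # one session from the reflector to each client

for n in (10, 100):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n))
# 10 routers: 45 vs 9 sessions; 100 routers: 4950 vs 99 sessions
```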

21

BGP scaling [3]

Loops are possible if the signaling path is not the forwarding path

Persistent oscillations are possible

Fix: modify I-BGP to pass multiple route alternatives within an AS

[Diagram: two route reflectors (RRs) with clients C1 and C2 choosing between exit routes P and Q; logical BGP sessions differ from the physical links, so the signaling path diverges from the forwarding path]

[13] On the correctness of IBGP configuration
[14] Route oscillations in I-BGP with route reflections

22

BGP stability

Initial experiment ('96): 99% of updates were redundant, caused by implementation or configuration bugs

After bug fixes ('97-'98): updates well distributed across ASes and prefixes

[15] Internet routing instabilities
[16] Experimental study of Internet stability and wide-area backbone failures

23

BGP stability [2]

Inter-domain experiment ('98): 9 months, 9 GB of data, 55,000 routes, 3 ISPs, 15-min filtering

25-35% of routes are 99.99% available; 10% of routes are less than 95% available

[16] Experimental study of Internet stability and wide-area backbone failures
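To put figures like "99.99% available" in perspective, availability converts directly to expected downtime per year. This is a generic conversion, not a calculation from the study's data.

```python
# Converting an availability fraction to hours of downtime per year.

def downtime_per_year(availability: float) -> float:
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * 365 * 24

print(round(downtime_per_year(0.9999), 2))  # under an hour per year
print(round(downtime_per_year(0.95), 1))    # hundreds of hours per year
```

So the 10% of routes below 95% availability are unreachable for weeks per year in aggregate, while the best quarter of routes lose under an hour.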

24

BGP stability [3]

Failure: more than 50% of routes have MTTF > 15 days, 75% fail within 30 days; most fail-over/re-route events occur within 2 days (frequency has increased since '94)

Repair: 40% of route failures repaired in < 10 min, 60% in 30 min

A small fraction of routes causes the majority of the instability

Weekly/daily frequency suggests congestion as a possible cause

[16] Experimental study of Internet stability and wide-area backbone failures
[24] End-to-end routing behavior in the Internet

25

BGP stability [4]

Backbone routers: interface MTTF of 40 days; 80% of failures resolved within 2 hr; maintenance, power, and PSTN problems are the major causes of outages (approx. 16% each); overall uptime of 99%

Popular destinations: quite robust; average outage duration under 20 s, attributable to convergence

[16] Experimental study of Internet stability and wide-area backbone failures
[17] BGP routing stability of popular destinations

26

BGP under stress

Congestion: prioritize routing control messages over data

Routing table size: driven by AS count, prefix length, multi-homing, NAT; effects: more updates, slower convergence; configuration issues, no universal filter

Real routers: "malloc" failures with cascading effects; mitigations include a prefix-limiting option and graceful restart

CodeRed/Nimda worms: BGP proved quite robust, but some features get activated only under stress, and cascading failures occurred

[18] Routing stability in congested networks: experimentation and analysis
[19] Analyzing the Internet BGP routing table
[20] An empirical study of router response to large BGP routing table load
[21] Observation and analysis of BGP behavior under stress
[22] Network resilience: exploring cascading failures within BGP

27

BGP misconfiguration

Types: failure to summarize, prefix hijacks, advertising internal prefixes, policy errors

200-1200 misconfigured prefixes each day; 3/4 of new advertisements result from misconfiguration; 4% of affected prefixes lose connectivity

Causes: initialization bugs (22%), reliance on upstream filtering (14%), leaks from IGP (32%); bad ACLs (34%), prefix-based errors (8%)

Conclusions: better user interfaces, authentication, consistency verification, and transaction semantics for commands

[23] Understanding BGP misconfiguration

28

Reactive routing

Resilient overlay network: detect failures (outages, loss) and reroute; gives applications control of the metric and expressive policy; scalability suffers

Failures are frequent and widespread: 90% last under 15 min, 70% less than 5 min, median just over 3 min

Many failures occur near the edge, inside an AS; rerouting helps most with multi-homing; failures in the core correlate more with BGP events

[26] Resilient overlay networks
[27] Measuring the effect of Internet path faults on reactive routing
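An overlay's reroute decision can be sketched as picking the best of the direct path and all one-intermediate-hop paths from recent probes. This is a hedged sketch: the probe values are made up, and composing hop losses as independent events is a simplifying assumption.

```python
# RON-style reactive path choice: compare the direct path against every
# one-intermediate-hop overlay path using measured loss rates, and pick
# whichever is currently best (latency could be used the same way).

def best_path(src, dst, nodes, loss):
    """loss[(a, b)] = measured loss rate on the overlay hop a->b."""
    candidates = {(src, dst): loss[(src, dst)]}
    for hop in nodes:
        if hop not in (src, dst):
            # combined loss of a two-hop path, assuming independent losses
            p = 1 - (1 - loss[(src, hop)]) * (1 - loss[(hop, dst)])
            candidates[(src, hop, dst)] = p
    return min(candidates, key=candidates.get)

loss = {("A", "B"): 0.30,                  # direct path during an outage
        ("A", "C"): 0.01, ("C", "B"): 0.02}
print(best_path("A", "B", ["A", "B", "C"], loss))  # route via C
```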

29

Reliable multicast

Reliable, sequenced, loosely synchronized delivery

Built on existing TCP; ACK aggregation; local recovery possible

Performance evaluated on Linux 2.0.x using the BSD packet filter, IP firewall, and raw sockets

[30] IRMA: A reliable multicast architecture for the Internet

30

Transport layer fail-over

Server fail-over otherwise needs a front-end (a bottleneck) or a forged IP address

TCP connection migration: works for static data (HTTP pages); needs application-level stream mapping; implemented in Apache 1.3; large overhead for short services

[29] Fine grained failover using connection migration

31

DNS performance

Low TTLs: lookup latency grows by two orders of magnitude; the client and its local name server may be distant

Embedded objects: 23% of lookups get no answer, 13% get a failure answer; 27% of queries sent to root servers failed

TTLs as low as 10 min suffice; sharing a DNS cache among more than 10-20 clients adds little benefit

[33] On the effectiveness of DNS-based server selection
[32] DNS performance and effectiveness of caching
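The TTL trade-off above comes down to how long a shared cache can answer locally. A minimal sketch of such a cache (class and field names are illustrative):

```python
import time

# A shared resolver cache answers repeat queries locally until the
# record's TTL expires; after that, the slow recursive lookup must be
# repeated, which is why very low TTLs inflate latency.

class DnsCache:
    def __init__(self):
        self.entries = {}  # name -> (address, expiry_time)

    def put(self, name, address, ttl, now=None):
        now = time.time() if now is None else now
        self.entries[name] = (address, now + ttl)

    def get(self, name, now=None):
        now = time.time() if now is None else now
        entry = self.entries.get(name)
        if entry and entry[1] > now:
            return entry[0]  # cache hit: answered locally
        return None          # miss: would trigger a recursive lookup

cache = DnsCache()
cache.put("example.com", "192.0.2.1", ttl=600, now=0)
print(cache.get("example.com", now=300))  # hit within the 10-min TTL
print(cache.get("example.com", now=700))  # expired: None
```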

32

DNS replication

Replicate the entire DNS database across distributed servers

[Diagram: replicated name servers (RNS) in the network core, each serving many ASes]

[31] A replicated architecture for the domain name system

33

Reliable server pooling

[34] Architecture for reliable server pooling[35] Requirements for reliable server pooling[36] Comparison of protocols for reliable server pooling

34

PSTN failures

Switch vendors aim for 99.999% availability; network availability varies (domestic US calls > 99.9%)

Study in '97: overload caused 44% of lost customer-minutes; outages were mostly short; human error caused 50% of outages, software only 14%

No convergence problem, unlike Internet routing

[37] Sources of failures in PSTN

35

VoIP

Backbone links are underutilized; Tier-1 backbones (Sprint) have good delay and loss characteristics:
  Average scattered loss of 0.19% (mostly single-packet losses; FEC helps)
  99.9% of probes see < 33 ms delay
  Most burst loss is due to routing problems
  Mean opinion score: 4.34 out of 5
  Customer sites have more problems

Internet backbone: some ISPs can provide good service, but many paths give poor performance; adaptive playout delay is needed on bad paths; problems are mostly due to reliability and router operation, not traffic load; the choice of audio codec matters

[28] Understanding traffic dynamics at a backbone POP
[39] Impact of link failures on VoIP performance
[38] Assessing the quality of voice communications over Internet backbones
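The adaptive playout delay mentioned above typically tracks network jitter and sizes the playout buffer accordingly. A hedged sketch using the RTP-style (RFC 3550) exponentially weighted jitter estimate; the sample transit times are made up for illustration.

```python
# RFC 3550-style interarrival jitter: an exponentially weighted average
# of the change in one-way transit time, smoothed with a 1/16 gain.
# An adaptive receiver then sets its playout delay to a multiple of it.

def estimate_jitter(transit_times):
    jitter = 0.0
    for prev, cur in zip(transit_times, transit_times[1:]):
        d = abs(cur - prev)            # change in transit time
        jitter += (d - jitter) / 16.0  # RFC 3550 smoothing factor
    return jitter

transits = [50, 52, 49, 80, 51, 50]    # ms; one routing-induced spike
j = estimate_jitter(transits)
playout_delay = transits[-1] + 4 * j   # simple safety-margin heuristic
print(round(j, 2))
```

The spike inflates the estimate for a while, so the buffer grows after a burst and shrinks back as the path stabilizes, trading a little extra mouth-to-ear delay for fewer late packets.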

36

VoIP [2]

Lossy paths are prevalent but not persistent; loss is very asymmetric and bursty

Outages = more than 300 ms of loss; more than 23% of losses are outages; outages are similar across different networks

Calls are aborted due to poor quality; net availability = 98%

[25] Measurement and interpretation of Internet packet loss
[41] Assessment of VoIP service availability in the current Internet

37

Future work

End-system and higher-layer protocol reliability and availability

Mechanisms to reduce the effect of outages on VoIP

Redundancy of VoIP systems during outages

Convergence and scaling of TRIP, which is similar to BGP

Combining scaling (DNS) with reliability (server pooling)