Inter-Domain Traffic Engineering
Principles, Applications and Case Studies
Who We Are
Josh Wepman Applications Engineer/Snake Oil Salesman Ixia NetOps [email protected]
Joe Abley Toolmaker/Engineer/Token Canadian MFN PAIX [email protected]
What We Are Talking About
Inter-domain Measurement, Analysis and Control
Improving Connectivity With whom? Where? At what speed?
What we are NOT talking about
MPLS
DiffServ
RSVP
CR-LDP
All sorts of other words with lots of capital letters that have become associated with “traffic engineering…”
Goals For The Afternoon
Methods and concepts on how to "improve" inter-domain connectivity
Depending on who YOU are, "improve" will have different meanings
Finding ways to reduce the impact of failure in peer or transit networks, a.k.a. "increasing reliability"
WARNING: Some operational complexity may arise! Put on your peril-sensitive glasses...
Presentation Outline
Inter-Domain TE Goals Definition
Inter-domain TE Measurement
Applying Data to Address Your Goals
Eliciting Control and the Feedback-Loop
Conceptual Examples
Who is Doing This Stuff?
Real_Live_Network Examples
No Questions? Good!
Inter-Domain TE Goals Definition
Iteration-1 – Conceptual
Define Goals, Measure, Analyze, Refine Goals, Action
What is it you need to accomplish?
Examples of Goals
Need to offload my "NSFnet" peering links outbound (congestion management)
Need to expand my inter-domain peering links cluefully (growth)
Need to find some people to provide my services to (sales)
That's right, I said it… sell stuff!!!
Adjusting Your Assumptions
Be prepared to adjust your assumptions based on measured data!
What you planned to do, and what you end up doing may change substantially.
Do not fear - this is real network data!
Clue should increase as valid network data becomes available and is consulted
Data Needs…
What data sets are required?
Flow-export data
BGP routing data
Active measurement data
SNMP
Some public tools available (cflowd, zebra, ping, scotty, etc)
Some commercial products available…
Inter-domain TE Measurement
Also Known As: Getting good, problem/goal-specific data!
Assumed Network Model
Hierarchical Network Model
Ingress/Egress Network services are separated from Transit Services
Works in other network models (as we will show), but this is what we are focusing on...
Hierarchical Network Model
[Figure: network diagram. Labels: Core1, Core2, Peer1, Peer2, Core Network Services, LocalASN, RemoteASN, AS2, AS3, AS4, AS9]
Types of Data to Measure
Routing Data: focus here is BGP
Traffic Data: Flow-export V5 is the focus here
Active Measurement Performance Data: Ping/Traceroute/One-way delay/Jitter
Routing Data
Routers generally do this well
Core competency by design (Routers route...)
Different data sets are available for measurement
IBGP (Good if you are looking at the whole system, looking outbound or using a flat network model)
Route-Reflection (Often needed for inbound analysis, can create some complexity in flat network models)
EBGP (Good for seeing your neighbor's view of you)
Choose the right one to measure based on your needs/goals
Routing Data – In/Outbound
[Figure: IBGP vs. Route-Reflection. Labels: Core1, Core2, Peer1, Peer2, Core Network Services, LocalASN, RemoteASN, AS2, AS3, AS4, AS9, Collector, Routes, Data]
Routing Data – In/Outbound
When your goal is outbound characterization, and your measurement point is the exit point for traffic, IBGP is your guy/girl/other.
Routes are always external, and thus always propagated (sans election and policy of course)
“Protocols hate being anthropomorphized”
When your goal is inbound characterization, and your measurement point is the entry point for traffic, Route-Reflection must be used.
Only way to get internal routes “cleanly”
Route Data – Full Mesh (tangent)
Value of full mesh monitoring…
Historical route tracking
Policy benchmarking
Tracking MED-selection issues
Identifying disasters the FIRST time, cluefully
Don’t just wait for it to happen again! PLEASE! For everyone’s sake!
Slightly off topic, but pretty darn important!
Route Data – Full Mesh (pic)
[Figure: full-mesh monitoring diagram. Labels: Core1, Core2 (repeated), Collector]
Traffic Accounting Data
Also Known As:
Flow-export
NetFlow
Cflow
A MAJOR pain in the AS!
The Quick Skinny on Flow
Packet and Byte counters per unique set of traffic attributes
Measured from strategic routers per input interface
Which interfaces depends on your defined goals/needs...
Has come a long way in the last few years
In some respects…
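As a rough illustration of "packet and byte counters per unique set of traffic attributes", the sketch below folds packets into flow records. The `Packet` type and the choice of key fields (addresses, protocol, ports, input interface) are assumptions made for this example; real flow-export implementations key on more fields and expire flows on timers.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical minimal packet description, for illustration only.
    src: str
    dst: str
    proto: int
    sport: int
    dport: int
    input_if: int
    length: int  # bytes


def aggregate_flows(packets):
    """Return {flow_key: [packet_count, byte_count]}."""
    flows = defaultdict(lambda: [0, 0])
    for p in packets:
        # One counter pair per unique attribute set, per input interface.
        key = (p.src, p.dst, p.proto, p.sport, p.dport, p.input_if)
        flows[key][0] += 1
        flows[key][1] += p.length
    return dict(flows)


packets = [
    Packet("192.0.2.1", "198.51.100.7", 6, 40000, 80, 3, 1500),
    Packet("192.0.2.1", "198.51.100.7", 6, 40000, 80, 3, 40),
    Packet("192.0.2.9", "198.51.100.7", 17, 5353, 53, 3, 120),
]
flows = aggregate_flows(packets)
```

The first two packets share every key field, so they collapse into one flow record; the third differs and gets its own.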
Flow Data Inbound - Easy
[Figure: network diagram. Labels: Core1, Core2, Peer1, Peer2, Core Network Services, LocalASN, RemoteASN, AS2, AS3, AS4, AS9, Collector, Routes, Data]
Flow Data Outbound - Easy
[Figure: network diagram. Labels: Core1, Core2, Peer1, Peer2, Core Network Services, LocalASN, RemoteASN, AS2, AS3, AS4, AS9, Collector, Routes, Data]
Flow Data Outbound - Harder
[Figure: non-hierarchical network diagram. Labels: Core (five routers), AS2, AS3, AS4, AS6]
Flow Data Outbound - Harder
Since flow-export data is inbound only, all potential feeder links in a non-hierarchical, mixed-services device must be accounted for in order to catch all outbound traffic
Issue: How do you know what data coming in core link4 is bound for the local external link?
Route Reflection is bad here! Can double-count!
Problem exacerbated by complex policy
18 Words or less on flow data
Micro-management of networks based on flows == BAD
Macro-management of networks based on flows == GOOD
Operational Challenges (1)
Keep this in mind!
Gilb’s Law: “Anything can be measured in a way that is superior to not measuring it at all.”
Operational Challenges (2)
ACLs vs. data-export in the great beast!
Sampled NetFlow on the GSR is usually distributed to the LCs
ACL > SNF > PIRC > IP Coloring > BGP Policy accounting > FR Traffic policing (which is not FR traffic shaping)
Apparently this changes in 12.0(18)S
Operational Challenges (3)
Some releases of JUNOS have bugs where only flow data from the highest-numbered ifIndex gets exported
Check for PR20159
Operational Challenges (4)
On high-speed interfaces, the best you can realistically do is sample at some ratio < 1:1
If you need to count bytes, this will introduce errors
If you need to compare samples, make sure the samples are normalized
This does NOT mean multiply by interval!
Lack of current research on statistical validity of flow data based on samples
Last research circa 1993
Research predates substantial HTTP traffic
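The slide does not spell out what "normalized" means. One plausible reading, assumed here, is to scale sampled counts by the sampling ratio and express them as a rate over the collection interval, so that collections with different ratios and durations become comparable. The function name and the numbers are hypothetical, and the result is a statistical estimate, not an exact byte count.

```python
def estimate_bps(sampled_bytes, sample_ratio, interval_seconds):
    """Estimate bits/sec from a 1-in-`sample_ratio` packet sample.

    Assumption: sampling is uniform, so each sampled byte stands for
    `sample_ratio` bytes on the wire.
    """
    if sample_ratio < 1 or interval_seconds <= 0:
        raise ValueError("sample_ratio must be >= 1 and interval must be > 0")
    estimated_bytes = sampled_bytes * sample_ratio
    # Divide by the interval to get a rate; do not multiply by it.
    return estimated_bytes * 8 / interval_seconds


# Two collections with different ratios and durations (made-up numbers).
a = estimate_bps(sampled_bytes=750_000, sample_ratio=100, interval_seconds=300)
b = estimate_bps(sampled_bytes=150_000, sample_ratio=1000, interval_seconds=600)
```

The raw sampled byte counts differ by a factor of five, yet both collections describe the same estimated rate once normalized.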
Operational Challenges (5)
The Gilb-Wepman Construct: “The total P.I.T.A. factor experienced through the process of network measurement is far less than the total P.I.T.A. factor experienced through planning and engineering a network without network measurements.”
P.I.T.A. = Pain In The Ass
Those without customers may be unfamiliar with this term
Performance Data
Active measurement: round-trip vs. one-way
mrtg and link utilization
Important, but not part of our examples. Short on time, sadly…
Helps in goal selection and re-selection
Bottom line – is it better or worse?
Applying Data to your Goals
What to do with all this data?
Traffic Accounting Data applied to Routing data?
Traffic load per <something> attribute or route
The focus here is on traffic stats (byte and packet rates) per AS-PATH
AS-PATH / Traffic-data tables
Traffic load per AS-PATH creates a tree of traffic relationships
(101)            X bits/sec
(101,1234)       Y bits/sec
(101,1234,9995)  Z bits/sec
101 -> 1234 -> 9995
X+Y+Z -> Y+Z -> Z
Addresses the middle-mile ASes instead of the traditional first or last ASN.
Allows "TO" (source/sink) and "THROUGH" (transit) values instead of just "TO" values.
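The TO/THROUGH split above can be sketched as follows. The function name, the input shape (AS-PATH tuple mapped to bits/sec) and the values chosen for X, Y and Z are assumptions for illustration; consecutive duplicate ASNs are collapsed so that prepending does not create false transit entries.

```python
from collections import defaultdict


def to_and_through(path_rates):
    """path_rates: {as_path_tuple: bits_per_sec for traffic to the last AS}.

    Returns {asn: {"to": bps, "through": bps}}.
    """
    totals = defaultdict(lambda: {"to": 0.0, "through": 0.0})
    for path, bps in path_rates.items():
        # Collapse consecutive duplicates (AS_PATH prepending).
        hops = [a for i, a in enumerate(path) if i == 0 or a != path[i - 1]]
        if not hops:
            continue
        totals[hops[-1]]["to"] += bps          # traffic that terminates here
        for asn in hops[:-1]:
            totals[asn]["through"] += bps      # traffic that transits here
    return dict(totals)


# Hypothetical rates: X = 10, Y = 20, Z = 30 bits/sec.
rates = {(101,): 10, (101, 1234): 20, (101, 1234, 9995): 30}
totals = to_and_through(rates)
```

Summing "to" and "through" per AS reproduces the X+Y+Z -> Y+Z -> Z chain for 101 -> 1234 -> 9995.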
Data Aggregation - Time
Aggregate data over timeframes (macro-level view)
Long term averages
Short term benchmarks
Of course, short term means “~long term”.
Micro-management of networks based on flows: BAD!
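A minimal sketch of time aggregation, assuming flow records have already been reduced to (timestamp, byte count) pairs. The function name, the record shape and the one-hour bucket are illustrative choices, not something the slides prescribe.

```python
from collections import defaultdict


def average_bps_per_bucket(records, bucket_seconds):
    """records: iterable of (unix_timestamp, byte_count).

    Returns {bucket_start: average bits/sec over that bucket}.
    """
    bytes_per_bucket = defaultdict(int)
    for ts, nbytes in records:
        bucket_start = int(ts) - int(ts) % bucket_seconds
        bytes_per_bucket[bucket_start] += nbytes
    return {
        start: total * 8 / bucket_seconds
        for start, total in sorted(bytes_per_bucket.items())
    }


# Made-up records: two in the first hour, one in the second.
records = [(0, 450_000), (1800, 450_000), (3600, 1_800_000)]
hourly = average_bps_per_bucket(records, 3600)
```

Wider buckets give the long-term averages; narrower ones give the short-term benchmarks, with the caveat above about not managing the network flow by flow.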
Data Aggregation - Interfaces
Aggregate across the set of interfaces that represent your problem statement
What interfaces am I interested in?
Can be interface specific (one)
Can be router specific (many)
Can be domain wide (all)
Can be N of M interfaces (some)
Pretty common…
What to do with all this?
What does one do once they have all this data?
Eliciting Control and The Feedback Loop
Sit down, Josh
Begone with your Snake Oil
It’s time to beat on some routers
Assumptions about your Routing Architecture
Routes to external networks are in BGP
Your IGP tells you how to find the NEXT_HOP addresses in BGP
We select exit points for traffic based on BGP path selection, not some other weird thing
If your routing policy differs significantly from this, you have more problems than measurement can solve
Fixing Outbound Traffic
Mark policy on BGP routes at the place where you learn them
General policy -- prefer peering links over expensive transit links, prefer private peering links over public peering links
Specific policy -- temporarily avoid NAP X for traffic to AS Y, prefer AS C to reach remote network D
Tweakable Knobs
LOCAL_PREF
MED
AS_PATH
Check your vendor’s BGP path selection tiebreaker list, and choose a set of knobs that gives you the kind of control your policy dictates
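To show how the three knobs interact, here is a deliberately simplified path selection sketch: highest LOCAL_PREF, then shortest AS_PATH, then lowest MED. This is an assumption-laden toy, not any vendor's algorithm; real implementations have more steps (origin, EBGP vs. IBGP, IGP cost, router ID) and by default compare MED only between routes from the same neighbor AS. The `Route` type and exit-point names are hypothetical, and the ASNs come from the documentation range.

```python
from typing import NamedTuple, Tuple


class Route(NamedTuple):
    # Hypothetical route record for illustration.
    exit_point: str
    local_pref: int
    as_path: Tuple[int, ...]
    med: int


def best_route(routes):
    """Simplified tiebreak: highest LOCAL_PREF, shortest AS_PATH, lowest MED."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.med))


routes = [
    Route("transit-A", 100, (64500, 64510), 0),
    Route("peer-B", 200, (64501, 64502, 64510), 50),
    Route("peer-C", 200, (64503, 64510), 10),
]
best = best_route(routes)
```

Because LOCAL_PREF is compared first, raising it on routes learned over peering links outranks the transit route even where the transit AS_PATH is just as short.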
Control of Outbound Traffic
Danger, Will Robinson!
Helpdesk phone may ring
Small change, pause, check, log, pause, breathe, repeat
Exit selection is a reasonably precise science
Fixing Inbound Traffic
Controlling inbound traffic flow is all about trying to influence the BGP path selection decisions which happen in networks you don’t control
Some of those networks you pay money to. Money is sometimes an appropriate weapon
It’s nice to buy people drinks at NANOG
Tweakable Knobs
Provider-specific knobs
whois -h whois.ra.net as1755
CIDR abuse
Cheap trick
Longest prefix wins
AS_PATH stuffing
AS_PATH pollution
Another cheap trick
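"Longest prefix wins" is why announcing a more-specific prefix pulls inbound traffic toward the link it is announced on. The sketch below shows the lookup rule using Python's standard `ipaddress` module; the prefixes are documentation addresses and the function name is hypothetical.

```python
import ipaddress


def longest_match(destination, prefixes):
    """Return the most specific prefix covering `destination`, or None."""
    addr = ipaddress.ip_address(destination)
    matching = [
        net
        for net in (ipaddress.ip_network(p) for p in prefixes)
        if addr in net
    ]
    if not matching:
        return None
    return max(matching, key=lambda net: net.prefixlen)


# Assumed scenario: a /24 announced everywhere, plus a more-specific /25
# announced via only one provider.
announced = ["203.0.113.0/24", "203.0.113.128/25"]
upper_half = longest_match("203.0.113.200", announced)
lower_half = longest_match("203.0.113.5", announced)
```

Traffic for the upper half of the /24 follows the /25 wherever it was announced, regardless of other attributes on the covering /24.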
Responsible Citizenship
Some tweakable knobs have an unwelcome impact on the networks of others
Have you met my friend, MED?
Your relationship with your target networks is symbiotic
It is inappropriate to make demands of someone else’s routing policy, but asking nicely is OK
Conceptual Examples (1)
Who are the top consumers of my network resources?
Top sources of traffic
Top sinks of traffic
Asymmetry
Conceptual Examples (2)
Traffic Aggregation Points and Peering Optimisation
Appropriate network expansion
Offloading the expensive peer
Mitigating settlement fees and traffic ratios
Mitigating congestion
Do it without MED selection issues
Maximize route availability (N>1 copies, not 1 or 0)
Conceptual Examples (3)
Theft-over-IP (how to know when peers are stealing from you)
Peers dumping traffic at you for routes you didn’t send them
Rather rude
Catch them in the act
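One way to catch this, sketched under assumptions: take the flow records seen inbound on a peering interface and flag destinations not covered by any prefix you advertised to that peer. The function name and input shapes are hypothetical, and a real check would pull the advertised set from the BGP data collected for that session.

```python
import ipaddress


def unadvertised_traffic(flows, advertised_prefixes):
    """flows: iterable of (dst_ip, byte_count) seen inbound from one peer.

    Returns {dst_ip: bytes} for destinations outside every advertised prefix.
    """
    nets = [ipaddress.ip_network(p) for p in advertised_prefixes]
    suspicious = {}
    for dst, nbytes in flows:
        addr = ipaddress.ip_address(dst)
        if not any(addr in net for net in nets):
            suspicious[dst] = suspicious.get(dst, 0) + nbytes
    return suspicious


# Made-up example: we advertised only 192.0.2.0/24 to this peer.
advertised = ["192.0.2.0/24"]
flows = [("192.0.2.10", 5000), ("198.51.100.9", 1200), ("198.51.100.9", 800)]
suspicious = unadvertised_traffic(flows, advertised)
```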
Who is doing this stuff?
Yahoo! - Jeffrey Papen (TUNDRA Tool)
Peering Analysis, Capacity Planning, Performance Analysis
Features:
Custom macros for AS analysis:
Source and Destination AS bandwidth details
Transit AS (hop counts) bandwidth summary data
Bandwidth forecasting; peering merit analysis
Billing formulas for cost/benefit budget analysis
Also:
Analyze internal usage for Charge Back Billing
POP-to-POP Network Performance Analysis (latency / loss)
DoS attack detection
Destination vs. Transit Traffic – UUNet (Yahoo – TUNDRA Output)
Who is doing this stuff?
MFN
Lots of people, we think
Not enough people, we think
Real Live Network Examples 1
We peer with a particular large regional ISP in several places. Due to various familiar reasons, the demands on the peering circuits approach supply
Who are the top talkers and top listeners that we reach via this peer?
Maybe we can peer with them directly
Not just sinks, but traffic aggregation points (middle mile)
Network Facts
Topology is not pure core/edge in some locations, so we might expect some complexities
All peering routers happen to be GSR12000s
Peering circuits are all OC12
Backbone links are mostly OC48
Data Collection
Relative traffic volumes
Low NetFlow sample ratio is OK
Turning on “ip route-cache flow sampled” seems like it can cause traffic belches
Turn off all inbound ACLs on peering interfaces
Turn off all outbound ACLs on peering routers
Drink from the Hose
Take off every /var
Analysis of Data
Relative byte count through and to networks reached through the peer in question
Ranked list of peering candidates
Absolute numbers don’t really matter; we have a list of people we should be talking to, in order of how useful they would be to peer with
SeeASP Output
Facets:
TimeInterval   : 12/4/01 11:03:59.55 - 12/6/01 13:40:10.02 EST
RouterIpv4Addr : 63.136.120.65
RouterAS       : 3549
RouterName     : DiamondJoe
   AS P    ppsThru      bpsThru      ppsTo        bpsTo   ppsTotal     bpsTotal
----- - ---------- ------------ ---------- ------------ ---------- ------------
 3561 P       1.05       933.57      74.34      64.633K      75.39      65.567K
  701 P       4.63       2.401K      35.21      14.653K      39.84      17.054K
  209 P       0.82       7.324K          0         1.36       0.82       7.325K
 3967 P        0.6       297.19       11.3       5.694K      11.91       5.991K
 6461 P          0          3.1      11.51       4.790K      11.51       4.793K
 8112 -          0            0       0.57       4.699K       0.57       4.699K
19262 -       0.57       4.699K          0            0       0.57       4.699K
 7018 P       8.44       4.244K          0         1.26       8.44       4.246K
    1 P        8.6       3.576K          0         1.56        8.6       3.578K
   87 -          0            0       8.16       3.396K       8.16       3.396K
  286 -       0.24       2.621K          0         0.05       0.24       2.621K
 2603 -       0.24       2.620K          0         0.24       0.24       2.620K
 1653 -          0         0.05       0.24       2.619K       0.24       2.620K
10764 -          0            0       5.36       2.230K       5.36       2.230K
  703 -          0         1.25       4.36       1.815K       4.37       1.816K
 7660 -          0            0       3.23       1.344K       3.23       1.344K
 3549 -          0            0       2.75       1.306K       2.75       1.306K
14265 -          0            0       1.05          934       1.05          934
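Rows like those in the SeeASP output above can be post-processed into a ranked candidate list. This sketch assumes the eight-column layout shown on the slide and that a trailing "K" means a factor of 1000 (consistent with bpsThru + bpsTo matching bpsTotal in the sample rows). The parsing helpers are hypothetical and not part of SeeASP.

```python
def parse_rate(text):
    """Convert '64.633K' to 64633.0 and '933.57' to 933.57 (assumes K = 1000)."""
    if text.endswith("K"):
        return float(text[:-1]) * 1000
    return float(text)


def parse_row(line):
    """Parse one 'AS P ppsThru bpsThru ppsTo bpsTo ppsTotal bpsTotal' row."""
    asn, flag, _pps_thru, bps_thru, _pps_to, bps_to, _pps_total, bps_total = line.split()
    return {
        "asn": int(asn),
        "flag": flag,
        "bps_thru": parse_rate(bps_thru),
        "bps_to": parse_rate(bps_to),
        "bps_total": parse_rate(bps_total),
    }


# Three rows copied from the output above.
lines = [
    "19262 - 0.57 4.699K 0 0 0.57 4.699K",
    "3561 P 1.05 933.57 74.34 64.633K 75.39 65.567K",
    "701 P 4.63 2.401K 35.21 14.653K 39.84 17.054K",
]
rows = [parse_row(line) for line in lines]
ranked = sorted(rows, key=lambda r: r["bps_total"], reverse=True)
# ASes that mostly carry traffic onward rather than terminate it.
through_heavy = [r["asn"] for r in rows if r["bps_thru"] > r["bps_to"]]
```

Splitting "through" from "to" is what separates a middle-mile aggregation point such as AS 19262 in this sample from a pure sink.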
Real Live Network Examples 2
AS R wants to peer
That’s fine, we’ll public peer with anybody. We’re easy.
AS R wants to private peer right away, since they say we send them 140M of traffic already
Can we confirm those numbers before we dedicate a port to them?
Network Facts
We currently reach AS R through AS T
We peer with AS T in six places
One of the peering routers is a 7500, which doesn’t do SNF
One of the peering routers is a router which is also being used to collect data to answer the previous question
More Network Facts
Topology is not edge/core everywhere
We want numbers out of this, so we need to manage the SNF ratios
K1dd13s keep attacking the routers
Ops folk attack K1dd13s with ACLs
The ACL attacks the SNF
The SNF dies!
Analysis
We only have traffic samples, but we want absolute numbers
We have interface byte and packet counters
We can take AS R traffic as a proportion of all AS T traffic, and divide up the mrtg/duck data in proportion
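The proportional split can be sketched as below: on each peering link, the share of sampled bytes destined for AS R is applied to the absolute rate from the interface counters, and the per-link estimates are summed. All names and numbers are hypothetical, and the estimate is only as good as the assumption that the sample is representative of the link.

```python
def estimate_absolute_bps(sampled_bytes_target, sampled_bytes_all, interface_bps):
    """Scale an interface counter rate by the target's share of sampled bytes."""
    if sampled_bytes_all <= 0:
        raise ValueError("no sampled traffic on this link")
    share = sampled_bytes_target / sampled_bytes_all
    return share * interface_bps


# (sampled bytes to AS R, sampled bytes to all of AS T, counter rate in bits/sec)
# Made-up figures for two of the peering links.
links = [
    (25, 100, 400_000_000),
    (50, 100, 300_000_000),
]
total_bps = sum(estimate_absolute_bps(r, t, bps) for r, t, bps in links)
```

Links where sampled export is unavailable (the 7500 mentioned earlier, for example) contribute no share of their own and would need a separately justified assumption.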
Summary
What did we talk about?
Answering specific, ad-hoc questions by attacking them with numbers
Inter-Domain Traffic Engineering is an iterative process (lather, rinse, repeat)
What didn’t we talk about?
Experience exporting from Juniper (and other non-Cisco) routers
Construction of a full-time, general-purpose measurement infrastructure
What if my vendor does not support flow-export and traffic accounting?
Questions? No? Good.