View
213
Download
0
Embed Size (px)
Citation preview
Near-Deterministic Inference of AS Relationships
Udi Weinsberg
A thesis submitted toward the degree of Master of Science in Electrical and Electronic Engineering
Under the guidance of Dr. Yuval Shavitt and Eran Shir.
Introduction
Today's Internet consists of thousands of networks administrated by various Autonomous Systems (AS).Large Provider - AS7018 – AT&TSmall Provider - AS1680 – NetVisionEducational Network – AS378 – ILAN
ASes are assigned with one or more blocks of IP prefixes and communicate routing information to each other using Border Gateway Protocol (BGP).
Type-of-Relationship
ASes use a set of local policies for selecting the best route for each reachable prefix.
These policies are based on the Type-of-Relationship (ToR) that exists between ASes.
ToRs are used to calculate paths between ASes.ToRs are regarded as proprietary information.Deducing them is an important yet difficult
problem
Type-of-Relationship (2)
Three major commercial relationships between neighboring ASes:Customer-to-Provider (C2P)Peer-to-Peer (P2P)Sibling-to-Sibling (S2S)
Customer-to-Provider (C2P)
Customer AS pays a Provider AS for traffic that is sent between the two.
Provider AS is usually larger than the customer.
Provider
Customer Customer
C2P C2P
Peer-to-Peer (P2P)
Two ASes freely exchange traffic between themselves and their customers.
Do not exchange traffic from or to their providers or other peers.
PeerPeer
P2P
Sibling-to-Sibling (S2S)
Two ASes administratively belong to the same organization.
Freely exchange traffic between their providers, customers, peers, or other siblings
SiblingSibling
S2S
Valley Free Routing
BGP paths must comply with the following Valley-Free hierarchical pattern:An uphill segment of zero or more c2p or s2s
links,Followed by zero or one p2p links,Followed by a downhill segment of zero or more
p2c or s2s links.
The Problem
Given the AS graph (ASes as vertices with interconnecting edges), find the type-of-relationship between all adjacent ASes.Inferring ToR = Classifying edges.
??
Provider
CustomerProvider
Customer
Peer
Peer
Related Works
Current relationships inference algorithms use one of two techniques:Using heuristic assumptions
• Comparing AS degree to determine the “larger” AS.
Optimizing some aspects of the ToR assignments
• Minimizing number of paths that are not valley-free• Not allowing cycles in the resulting directed AS graph
The Gap
Using heuristic assumptions throughout the relationships inference process causes the erroneous ToRs to be spread over all interconnecting ASes links.
Optimization models fail to capture the true Internet hierarchy.
Work Goal
Improve on existing methods by providing a near-deterministic inference algorithm for solving the ToR problem.
We use the Internet Core, a sub-graph that consists of the globally top-level providers of the InternetTheir interconnecting edges are already
classified.
Near-Deterministic Inference
Theoretically, given an accurate core with no relationships errors, the algorithm deterministically infers most of the remaining AS relationships using the AS-level paths relative to this coreWithout incurring additional inference errors!
In real-world scenarios, where the core and AS-level paths can contain errors, the algorithm introduces minimal inference errors.
Why Near?-
For the remaining set of relationships that cannot be inferred deterministically, a heuristic inference method is deployed.
This group is relatively small, so it is still possible to provide a strict bound on the inference error.
Algorithm - Definitions
InputS – a set of AS-level routing paths.G(VG,EG) – the set of vertices that represent all
ASes, and the interconnecting edges that need to be classified.
Core(VC,EC) – the vertices and interconnecting edges that represent the core of G, and is assumed to contain all the top-level ASes.
OutputEG – Edges of input graph with votes for ToRs.
Deterministic Algorithm
Prior to starting the relationships inference algorithm, we infer S2S relationships.
We use S2S data collected from CAIDAObtained from IRR databases (RIPE, ARIN,
APNIC).
Pre-processing
Deterministic Algorithm
Assuming that the input core consists of the global top-level ASes. Use the valley-free model of Internet
routing. All paths that pass through the core
are split into three segments: A segment of zero or more uphill C2P
edges towards the core, At most one P2P edge in the core, A downhill segment of zero or more
P2C edges from the core.
Phase 1
Code
Deterministic Algorithm
Paths that do not traverse the core, fail to provide us with a direct method for classification. There are paths that partly overlap other
paths that traverse the core. For each of the remaining paths:
Edges that precede a C2P edge must reside in an uphill segment, and be of type C2P.
Edges that follow a P2C edge must be in a downhill segment, and be of type P2C.
Phase 2
Code
C2P
C2P
Deterministic Algorithm
The data we use might be noisy and reflect transient routing effects. Especially when performing relationships inference over a
long time frame. To avoid incorrect inferences resulting from these
effects, we use voting technique: The above methods vote for the ToR of each traversed
edge. Once the algorithm is finished, we count the votes and
assign each edge with the type that received a relative votes count that passes a given threshold.
Voting
Code
Graph
Non-Deterministic Algorithm
The deterministic algorithm fails to classify several types of edges.
We use heuristic assumptions to classify these edges.
Non-Deterministic Algorithm
Edges that appear in paths that do not traverse the core, and reside between a c2p edge and a p2c edge.A c2p or p2c edges should
participate in, at least, one path that pass through the core.
The path may have a p2p relationship between its two top-level vertices
P2P
Peers
Non-Deterministic Algorithm
Edges that have a similar number of votes for two or more types of relationships:The result of changes in the commercial
relationship over the measurements period.More complex peering agreements that can
cause the same edge to behave differently as seen from different view points in the Internet.
• Internet Exchange Points.
Compare AS degrees to resolve ambiguities.
Voting Ties
Non-Deterministic Algorithm
Edges might appear in non-valley-free paths.Result of valid paths that pass a
malformed core,Or invalid paths that pass an
accurate core. These invalid paths occur in only
a small fraction of paths less than 1% on average from
the investigated paths per week.
Valleys
Core Graph Construction
We use three core construction methods, that result in cores that vary in size and density:Greedy Max CliqueKmax-CoreCAIDA Peers
Core Graph Construction (1)
Greedy Max CliqueTauro et at. proposed the Jellyfish model.The core is a clique of high-degree vertices.The first vertex in the core is the one with the
highest degree.Sorting vertices in non-increasing degree order.A vertex is added to the vertex only if it forms a
clique with the vertices already in the core. The resulting core is a clique but not necessarily
the maximal clique of the graph.
Core Graph Construction (2)
Kmax-Core (kCore)Carmi et at. proposed the Medusa model. Use a k-pruning algorithm to decompose the
Internet AS graph and extract a nucleus • The Kmax-Core, which is a very well connected globally
distributed subgraph.This algorithm extracts a core by looking at the
entire graph (global approach).The nucleus plays a critical role in BGP routing,
since its vertices lie in a large fraction of the paths that connect different ASes.
Core Graph Construction (4)
CAIDA PeersConstructed from ASes and edges that exhibit
P2P relationship under the inference method of Dimitropoulos et al.
Used the Automated AS ranking provided by CAIDA and constructed a graph that contains all the edges classified as P2P.
Selected the largest connected component that contains some of the largest tier-1 ASes.
Algorithm – Recall in brief
Construct AS-level graph and extract the Core. Classify all edges in paths relative to the core:
Uphill to the core. Downhill from the core.
Classify all edges in remaining paths, that now have some classified edges.
Count votes to decide on types. Classify remaining paths using heuristics:
Single edge between P2C and C2P is probably a P2P Break voting ties using AS degree.
Data Sources
Combined data from RouteViews and DIMESMaximize the size and density of the topology.RouteViews collects BGP advertisements using
several routers.DIMES performs ~2 million daily active
traceroute measurements from hundreds of Agents.
The raw DIMES data was filtered in order to reduce inference mistakes. Filtering
Data SourcesTopology
On a weekly average, we filtered approximately 5,100 DIMES edges that were measured only once, which is over 15% of the edges measured by DIMES. Around half of these edges appear in RouteViews.
Sensitivity AnalysisCore Construction
The smallest GMC core results in the lowest deterministic inference percentage while the largest CAIDA Peers core have the highest percentage.
Sensitivity Analysis
kCore provides an excellent overall inference percentage.Over 95% deterministically inferred and around
75% matching CAIDA). CAIDA Peers core seems to result in the best
overall performanceHowever, almost all 6,000 edges marked as P2P
in CAIDA are in connected.This is very unlikely to be the case, and causes a
bias.
Core Construction
Comparing Cores
Size Sensitivity AnalysisRobustness to Core Size
For more than 20 vertices in the core the algorithm classification success and similarity to CAIDA do not significantly change, while the number of deterministically classified edges increases.
Size Sensitivity AnalysisNon-Valley-Free paths
The increase in the number of deterministically classified edges comes with an increase in the percentage of non-valley-free paths.
Time Sensitivity AnalysisIncreasing Time Frame
Using data from a single week results in over 90% of the edges being classified for all core types.
Time Sensitivity AnalysisMatching CAIDA
At any time frame, the algorithms agree on over 92% of the edges.
Mistake Sensitivity AnalysisHeuristically Classified Edges
While the algorithm's performance decreases as we increase the randomness of the core, the overall degradation is not as high as one would expect.
Mistake Sensitivity AnalysisType of Heuristically Classified Edges
As more errors are injected, the algorithm needs to use more heuristics.
P2P AnalysisValidating the DIMES Promise
While on average the p2p relationships comprise 4-5% of the total number of edges, it goes up to around 12% of the edges that appear only in DIMES. Approximately 40% of the p2p edges inferred by our algorithm, do not appear in the RouteViews.
Conclusion (1)
The common weakness of previously proposed AS relationships inference algorithms is their lack of guarantee on inference errors introduced during the process.
This work improves on existing methods by providing a near-deterministic algorithm that, given a classified error-free input core, does not introduce additional inference errors.
Conclusion (2)
The proposed algorithm provides accurate inferences Robust under changes in the core's size and creation
technique. A core containing as little as 20 almost fully-connected
ASes is sufficient for good inference results. Heuristic methods can still play an important role in
inferring the remaining relationships. Using single week’s data, the algorithm runs for only
about 2 hours and yields over 95% deterministically inferred relationships.
Voting ThresholdValidation
On average, over 94% of the edges have votes for exactly one relationship type, and almost 99% of the edges have over 80% of the votes for a single relationship type.
Data Sources
The raw DIMES data was filtered in order to reduce inference mistakes and inclusion of false links:Included edges that were seen from at least two
agents.Trimmed all traces that exhibit known traceroute
problems.• Routing loops• Destination impersonation
Filtering
Internet Exchange Points
An Internet exchange point (IXP) is a physical infrastructure that allows different ASes to exchange traffic between themEdges connecting an IXP to
adjacent ASes can exhibit different ToR depending on the AS-level path they participate in.
Definition
Internet Exchange Points
We identify edges between ASes and IXPs and create corresponding virtual edgesA virtual edge is an edge that connects two
ASes that have indirect peering via an IXP. The algorithm then infers relationships in the
paths using these virtual edgesInstead of using the original edges between the
ASes and IXPs.
Analysis