Hybrid network traffic engineering system
(HNTES)
Zhenzhen Yan, M. Veeraraghavan, Chris Tracy
University of Virginia / ESnet
June 23, 2011
Please send feedback/comments to: [email protected], [email protected], [email protected]
This work was carried out as part of a sponsored research project funded by the US DOE ASCR program office under grant DE-SC002350
Outline
• Problem statement
• Solution approach
– HNTES 1.0 and HNTES 2.0 (ongoing)
• ESnet-UVA collaborative work
• Future work: HNTES 3.0 and integrated network
Project web site: http://www.ece.virginia.edu/mv/research/DOE09/index.html
Problem statement
• A hybrid network is one that supports both IP-routed and circuit services on:
– separate networks, as in ESnet4, or
– an integrated network
• A hybrid network traffic engineering system (HNTES) is one that moves data flows between these two services as needed
– it engineers the traffic to use the service type appropriate to the traffic type
Two reasons for using circuits
1. Offer scientists rate-guaranteed connectivity
– necessary for low-latency/low-jitter applications such as remote instrument control
– provides low-variance throughput for file transfers
2. Isolate science flows from general-purpose flows
Reason                         Circuit scope
Rate-guaranteed connections    End-to-end (inter-domain)
Science flow isolation         Per provider (intra-domain)
Role of HNTES
• HNTES is a network management system; if proven, it would be deployed in networks that offer both IP-routed and circuit services
Outline
• Problem statement
• Solution approach
– Tasks executed by HNTES
– HNTES architecture
– HNTES 1.0 vs. HNTES 2.0
– HNTES 2.0 details
• ESnet-UVA collaborative work
• Future work: HNTES 3.0 and integrated network
Three tasks executed by HNTES
1. Heavy-hitter flow identification
2. Circuit provisioning
3. Flow redirection (PBR configuration)
(in HNTES 1.0, these tasks are executed online, upon flow arrival)
HNTES architecture
1. Offline flow analysis populates the MFDB
2. RCIM reads MFDB and programs routers to port mirror packets from MFDB flows
3. Router mirrors packets to FMM
4. FMM asks IDCIM to initiate circuit setup as soon as it receives packets from the router corresponding to one of the MFDB flows
5. IDCIM communicates with the IDC, which sets up the circuit and a PBR entry for flow redirection onto the newly established circuit
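The five steps above can be sketched as an event-driven loop. This is a minimal illustrative sketch, not the HNTES implementation: the function names (`fmm_on_mirrored_packet`, `idcim_setup_circuit`), the in-memory data structures, and the IP addresses are all hypothetical stand-ins for the FMM, IDCIM, and MFDB modules.

```python
# Hypothetical sketch of the HNTES 1.0 control flow (steps 3-5 above).

MFDB = {("10.0.0.1", "10.0.0.2")}   # monitored (src, dst) flows from offline analysis

circuits = {}                        # flow -> circuit id, filled via the IDC interface

def idcim_setup_circuit(flow):
    """Step 5: IDCIM asks the IDC to set up a circuit (and PBR entry)."""
    circuits[flow] = f"circuit-{len(circuits) + 1}"
    return circuits[flow]

def fmm_on_mirrored_packet(src, dst):
    """Steps 3-4: FMM receives a mirrored packet; if it belongs to an
    MFDB flow that has no circuit yet, trigger circuit setup."""
    flow = (src, dst)
    if flow in MFDB and flow not in circuits:
        return idcim_setup_circuit(flow)
    return circuits.get(flow)

# A mirrored packet from a monitored flow triggers setup exactly once.
print(fmm_on_mirrored_packet("10.0.0.1", "10.0.0.2"))  # circuit-1
print(fmm_on_mirrored_packet("10.0.0.1", "10.0.0.2"))  # circuit-1 (reused)
print(fmm_on_mirrored_packet("10.9.9.9", "10.0.0.2"))  # None (not in MFDB)
```

Note the idempotence: repeated mirrored packets from the same flow reuse the existing circuit, which matters because the router keeps mirroring packets while the roughly one-minute IDC setup is in progress.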
HNTES 1.0
Heavy-hitter flows
• Dimensions
– size (bytes): elephant and mice
– rate: cheetah and snail
– duration: tortoise and dragonfly
– burstiness: porcupine and stingray
Kun-chan Lan and John Heidemann, A measurement study of correlations of Internet flow characteristics. ACM Comput. Netw. 50, 1 (January 2006), 46-62.
HNTES 1.0 vs. HNTES 2.0

                                   HNTES 1.0                  HNTES 2.0
                                   (tested on ANI testbed)
Dimension of heavy-hitter flow     Duration                   Size
Circuit granularity                Circuit for each flow      Circuit carries multiple flows
Heavy-hitter flow identification   Online                     Offline
Circuit provisioning               Online                     Offline
Flow redirection (PBR config.)     Online                     Offline

HNTES 1.0 logic: the IDC circuit setup delay is about 1 minute, so circuits can be used only for long-DURATION flows; hence the focus on DYNAMIC (online) circuit setup.
Rationale for HNTES 2.0
• Why the change in focus?
– Size is the dominant dimension of heavy-hitter flows in ESnet
– Large-sized (elephant) flows have a negative impact on mice flows and on jitter-sensitive real-time audio/video flows
– Individual circuits do not need to be assigned to elephant flows
– The flow monitoring module is impractical if all data packets from heavy-hitter flows are mirrored to HNTES
HNTES 2.0 solution
• Task 1: offline algorithm for elephant flow identification; adds/deletes flows from the MFDB
• Nightly analysis of the MFDB for new flows (also offline)
– Task 2: IDCIM initiates provisioning of rate-unlimited static MPLS LSPs for new flows, if needed
– Task 3: RCIM configures PBR in routers for new flows
• HNTES 2.0 does not use the FMM

MFDB: Monitored Flow Database
IDCIM: IDC Interface Module
RCIM: Router Control Interface Module
FMM: Flow Monitoring Module
HNTES 2.0: use rate-unlimited static MPLS LSPs
• With rate-limited LSPs: if the PNNL router needs to send elephant flows to 50 other ESnet routers, the 10 GigE interface has to be shared among 50 LSPs
• A low per-LSP rate would decrease elephant flow file transfer throughput
• With rate-unlimited LSPs, science flows enjoy the full interface bandwidth
• Given the low arrival rate of science flows, the probability of two elephant flows simultaneously sharing link resources, though non-zero, is small; even when this happens, theoretically, each should receive a fair share
• No micromanagement of circuits per elephant flow
• Rate-unlimited virtual circuits are feasible with MPLS technology
• Removes the need to estimate circuit rate and duration
[Figure: PNNL-located ESnet PE router connected over 10 GigE to the PNWG-cr1 ESnet core router, with LSPs 1 through 50 to site PE routers]
HNTES 2.0 Monitored flow database (MFDBv2)

Flow analysis table:
Row number | Source IP address | Destination IP address | Is the source a data door? (0 or 1) | Is the destination a data door? (0 or 1) | Day 1 | Day 2 | ... | Day 30
(each per-day column holds the total transfer size; if on a day the total transfer size between the node pair is < 1 GB, 0 is listed)

Existing circuits table:
Row number | Ingress Router ID | Egress Router ID

Identified elephant flows table:
Row number | Source IP address | Destination IP address | Ingress Router ID | Egress Router ID | Circuit number
HNTES 2.0 Task 1: Flow analysis table

• Definition of "flow": source/destination IP address pair (ports are not used)
• Add the sizes for a flow from all flow records over one period, say one day
• Add flows with total size > threshold (e.g., 1 GB) to the flow analysis table
• Enter 0 if a flow's size on any day after it first appears is < threshold
• Enter NA for all days before the flow first appears with size above the threshold
• Sliding window: number of days
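The per-day aggregation behind the flow analysis table might be sketched as below. This is an illustrative sketch, not OFAT: the record format (a tuple of day, source IP, destination IP, byte count), the function names, and the example data are all assumptions; only the flow definition (IP pair, no ports) and the 1 GB daily threshold come from the slides.

```python
# Hypothetical sketch: sum bytes per (src, dst) pair per day, then
# record the daily total if it exceeds the threshold, else 0.
from collections import defaultdict

GB = 10**9
THRESHOLD = 1 * GB   # 1 GB daily threshold, as in the slides

def daily_totals(records):
    """records: iterable of (day, src_ip, dst_ip, bytes).
    A 'flow' is a source/destination IP pair; ports are ignored."""
    totals = defaultdict(int)            # (day, src, dst) -> bytes
    for day, src, dst, nbytes in records:
        totals[(day, src, dst)] += nbytes
    return totals

def flow_analysis_entries(records):
    """Per-day entry: total size if > threshold, else 0."""
    table = defaultdict(dict)            # (src, dst) -> {day: size or 0}
    for (day, src, dst), size in daily_totals(records).items():
        table[(src, dst)][day] = size if size > THRESHOLD else 0
    return table

records = [
    (1, "a", "b", 8 * GB), (1, "a", "b", 3 * GB),   # day 1: 11 GB total
    (2, "a", "b", int(0.4 * GB)),                   # day 2: below threshold
    (1, "c", "d", int(0.2 * GB)),
]
print(flow_analysis_entries(records)[("a", "b")])   # {1: 11000000000, 2: 0}
```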
HNTES 2.0 Task 1Identified elephant flows table
• Sort flows in flow analysis table by a metric• Metric: weighted sum of
– persistency measure– size measure
• Persistency measure: Percentage of days in which size is non-zero out of the days for which data is available
• Size measure: Average per-day size measure (for days in which data is available) divided by max value (among all flows)
• Set threshold for weighted sum metric and drop flows whose metric is smaller than threshold
• Limits number of rows in identified elephant flows table
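The ranking metric above can be sketched in a few lines. The weights (0.5/0.5) and the metric threshold (0.3) are illustrative placeholders, not values from the project; the flow names and per-day sizes are made up.

```python
# Hypothetical sketch of the weighted-sum ranking metric.

def persistency(day_sizes):
    """Fraction of observed days with non-zero size (0 = below 1 GB)."""
    return sum(1 for s in day_sizes if s > 0) / len(day_sizes)

def avg_size(day_sizes):
    """Average per-day size over the days for which data is available."""
    return sum(day_sizes) / len(day_sizes)

def rank_flows(flows, w_persist=0.5, w_size=0.5, metric_threshold=0.3):
    """flows: {flow_id: [per-day sizes]}. Returns flows whose weighted
    metric clears the threshold, highest metric first."""
    max_avg = max(avg_size(s) for s in flows.values())   # normalizer
    scored = {
        f: w_persist * persistency(s) + w_size * avg_size(s) / max_avg
        for f, s in flows.items()
    }
    keep = [(f, m) for f, m in scored.items() if m >= metric_threshold]
    return sorted(keep, key=lambda fm: fm[1], reverse=True)

# "A" transfers daily; "B" is big but rare; "C" is small and rare.
flows = {"A": [10, 10, 10, 10], "B": [40, 0, 0, 0], "C": [1, 0, 0, 0]}
print(rank_flows(flows))   # [('A', 1.0), ('B', 0.625)] -- C is dropped
```

The persistent daily flow ranks first even though the rare flow moved the same total bytes, which is the behavior the persistency term is there to produce.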
Sensitivity analysis
• Size threshold, e.g., 1 GB
• Period for summation of sizes, e.g., 1 day
• Sliding window, e.g., 30 days
• Value for the weighted sum metric
Is HNTES 2.0 sufficient?
• Will depend on the persistency measure
– if many new elephant flows appear each day, a complementary online solution is needed
• Online Flow Monitoring Module (FMM)
Outline
• Problem statement
• Solution approach
– HNTES 1.0 and HNTES 2.0 (ongoing)
• ESnet-UVA collaborative work
– Netflow data analysis
– Validation of Netflow-based size estimation
– Effect of elephant flows
  • SNMP measurements
  • OWAMP data analysis
– GridFTP transfer log data analysis
• Future work: HNTES 3.0 and integrated network
Netflow data analysis
• Zhenzhen Yan coded OFAT (Offline Flow Analysis Tool) and an R program for IP address anonymization
• Chris Tracy is executing OFAT on ESnet Netflow data and running the anonymization R program
• Chris will provide UVA the flow analysis table with anonymized IP addresses
• UVA will analyze the flow analysis table with R programs and create the identified elephant flows table
• If the persistency measure is high, the offline solution is suitable; if not, HNTES 3.0 and the FMM are needed!
Findings: NERSC-mr2, April 2011 (one month data)
Persistency measure = ratio of (number of days on which the flow size > 1 GB) to (number of days since the flow first appeared)
Total number of flows = 2281
Number of flows that had > 1 GB transfers every day = 83
Data doors
• Number of flows from NERSC data doors = 84 (3.7% of flows)
• Mean persistency ratio of data door flows = 0.237
• Mean persistency ratio of non-data door flows = 0.197
• The new-flows graph is right-skewed: is offline good enough? (just one month; more months' data analysis is needed)
• The persistency measure is also right-skewed: online may be needed
Validation of size estimation from Netflow data
• Hypothesis– Flow size from concatenated Netflow
records for one flow can be multiplied by 1000 (since the ESnet Netflow sampling rate is 1 in 1000 packets) to estimate actual flow size
23
Experimental setup
• GridFTP transfers of 100 MB, 1 GB, and 10 GB files
• sunn-cr1 and chic-cr1 Netflow data used
Chris Tracy set up this experiment
Flow size estimation experiments
• Workflow inner loop (executed 30 times):
– obtain the initial value of the firewall counters at the sunn-cr1 and chic-cr1 routers
– start a GridFTP transfer of a file of known size
– from GridFTP logs, determine the data connection TCP port numbers
– read the firewall counters at the end of the transfer
– wait 300 seconds for Netflow data to be exported
• Repeat the experiment 400 times for 100 MB, 1 GB, and 10 GB file sizes
Chris Tracy ran the experiments
Create log files
• Filter out GridFTP flows from the Netflow data
• For each transfer, find the packet counts and byte counts from all the flow records and add them
• Multiply by 1000 (1-in-1000 sampling rate)
• Output the byte and packet counts from the firewall counters
• Size-accuracy ratio = size computed from Netflow data divided by size computed from firewall counters
Chris Tracy wrote scripts to create these log files and gave UVA the files for analysis
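The scaling and ratio computation described above reduces to a few lines. This is a sketch of the arithmetic only, not Chris Tracy's scripts; the byte counts in the example are invented for illustration.

```python
# Hypothetical sketch of the size-accuracy computation: sampled Netflow
# byte counts for one transfer are summed across flow records, scaled by
# the 1-in-1000 packet sampling rate, and divided by the firewall-counter
# byte count for the same transfer.

SAMPLING_RATE = 1000   # ESnet Netflow samples 1 in 1000 packets

def netflow_size_estimate(record_bytes):
    """record_bytes: byte counts from all Netflow records of one flow."""
    return SAMPLING_RATE * sum(record_bytes)

def size_accuracy_ratio(record_bytes, firewall_bytes):
    """Ratio of the Netflow-based estimate to the firewall ground truth."""
    return netflow_size_estimate(record_bytes) / firewall_bytes

# A 1 GB transfer whose sampled records saw roughly 1/1000 of the bytes:
print(netflow_size_estimate([600_000, 410_000]))            # 1010000000
print(size_accuracy_ratio([600_000, 410_000], 10**9))       # 1.01
```

A ratio near 1 (as in the table on the next slide) supports the hypothesis that the 1000x scaling recovers the actual transfer size.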
Size-accuracy ratio
             Netflow records from         Netflow records from
             Chicago ESnet router         Sunnyvale ESnet router
             Mean      Std. dev.          Mean      Std. dev.
100 MB       0.949     0.2780             1.0812    0.3073
1 GB         0.996     0.1708             1.032     0.1653
10 GB        0.990     0.0368             0.999     0.0252

• Sample mean shows a size-accuracy ratio close to 1
• Standard deviation is smaller for larger files
• Dependence on traffic load
• Sample size = 50
Zhenzhen Yan analyzed log files
Outline
• Problem statement
• Solution approach
– HNTES 1.0 and HNTES 2.0 (ongoing)
• ESnet-UVA collaborative work
– Netflow data analysis
– Validation of Netflow-based size estimation
– Effect of elephant flows
  • SNMP measurements
  • OWAMP data analysis
– GridFTP log analysis
• Future work: HNTES 3.0 and integrated network
Effect of elephant flows on link loads
• SNMP link load, averaged over 30 sec
• Five 10 GB GridFTP transfers
• Dashed lines: rest of the traffic load

[Plots: SUNN-cr1 and CHIC-cr1 interface SNMP load; scale markers at 2.5 Gb/s, 10 Gb/s, and 1 minute]
Chris Tracy
OWAMP (one-way ping)
• One-Way Active Measurement Protocol (OWAMP)
– 9 OWAMP servers across Internet2 (72 pairs)
– system clocks are synchronized
– the "latency hosts" (nms-rlat) are dedicated to OWAMP only
– 20 packets per second on average (10 for IPv4, 10 for IPv6) for each OWAMP server pair
– raw data for 2 weeks obtained for all pairs
Study of "surges" (consecutive elevated OWAMP delays on a 1-minute basis)

• Steps:
– find the 10th-percentile delay b across the 2-week data set
– find the 10th-percentile delay i for each minute
– if i > n × b, the minute is considered a surge point (n = 1.1, 1.2, 1.5)
– consecutive surge points are combined into a single surge
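The surge-detection steps above can be sketched as follows. This is an illustrative sketch, not the study's analysis code: the percentile method, the per-minute list-of-samples input format, and the example delays are all assumptions.

```python
# Hypothetical sketch of percentile-based surge detection:
# baseline b = 10th percentile over the whole data set,
# i = per-minute 10th percentile, a minute is a surge point
# when i > n*b, and consecutive surge points merge into one surge.

def pct10(values):
    """10th percentile, taken as the sorted value at floor(0.1*(len-1))."""
    s = sorted(values)
    return s[int(0.1 * (len(s) - 1))]

def find_surges(per_minute_delays, n=1.2):
    """per_minute_delays: one list of delay samples per minute.
    Returns surges as (start_minute, end_minute), inclusive."""
    baseline = pct10([d for minute in per_minute_delays for d in minute])
    surge_minutes = [m for m, minute in enumerate(per_minute_delays)
                     if pct10(minute) > n * baseline]
    surges, start = [], None
    for m in surge_minutes:
        if start is None:
            start = prev = m
        elif m == prev + 1:           # consecutive surge point: extend
            prev = m
        else:                         # gap: close the current surge
            surges.append((start, prev))
            start = prev = m
    if start is not None:
        surges.append((start, prev))
    return surges

# Minutes 2-3 have elevated delays; they merge into a single surge.
delays = [[5.0] * 10, [5.1] * 10, [9.0] * 10, [8.0] * 10, [5.0] * 10]
print(find_surges(delays))   # [(2, 3)]
```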
Study of surges cont.
• Sample absolute values of 10th-percentile delays:

                          CHIC-LOSA  CHIC-KANS  KANS-HOUS  HOUS-LOSA  LOSA-SALT
10th percentile           29 ms      5 ms       6.7 ms     16.1 ms    7.3 ms
>1.1×(10th percentile)    31 ms      5.9 ms     7.3 ms     17.5 ms    8.5 ms
>1.2×(10th percentile)    34 ms      6.3 ms     8 ms       19 ms      9.5 ms
>1.5×(10th percentile)    NA         NA         NA         23.9 ms    11.6 ms
PDF of surge duration
• One surge lasted for 200 minutes
• The median surge duration is 34 minutes
95th percentile per minute
                           CHIC-LOSA  CHIC-KANS  KANS-HOUS  HOUS-LOSA  LOSA-SALT
10th percentile of 2 weeks 29 ms      5 ms       6.7 ms     16.1 ms    7.3 ms
>1.2×(10th percentile)     33 ms      6.4 ms     8 ms       18.7 ms    9.3 ms
>1.5×(10th percentile)     50 ms      8.1 ms     18.8 ms    23.9 ms    11.5 ms
>2×(10th percentile)       58 ms      11 ms      18.8 ms    40.7 ms    NA
>3×(10th percentile)       84 ms      17 ms      NA         53.8 ms    NA
Max of 95th percentile     119.8 ms   50.5 ms    NA         86.7 ms    NA

• The 95th-percentile delay per minute reached 4.13 (CHIC-LOSA), 10.1 (CHIC-KANS), and 5.4 (HOUS-LOSA) times the one-way propagation delay
Future work: determine cause(s) of surges
• Host (OWAMP server) issues?
– in addition to OWAMP pings, the OWAMP server pushes measurements to the Measurement Archive at IU
• Interference from BWCTL at the HP LAN switch within the PoP?
– correlate BWCTL logs with OWAMP delay surges
• Router buffer buildups due to elephant flows?
– correlate Netflow data with OWAMP delay surges
• If none of the above, then the surges are due to router buffer buildups resulting from multiple simultaneous mice flows
GridFTP data analysis findings
            Size (bytes)             Duration (sec)   Throughput
Minimum     100003680                0.25             1.2 Mbps
Median      104857600                2.5              348 Mbps
Maximum     96790814720 (≈ 90 GB)    9952             4.3 Gbps

• All GridFTP transfers larger than 100 MB from NERSC GridFTP servers: one month (Sept. 2010)
• Total number of transfers: 124236
• Data from GridFTP logs
Throughput of GridFTP transfers
• Total number of transfers: 124236
• Most transfers get about 50 MB/s, or 400 Mb/s
Variability in throughput for files of the same size
Throughput (bits/s)
Minimum       7.579e+08
1st quartile  1.251e+09
Median        1.499e+09
Mean          1.625e+09
3rd quartile  1.947e+09
Maximum       3.644e+09

• There were 145 file transfers of size 34359738368 bytes (approx. 34 GB)
• The IQR (inter-quartile range) measure of variance is 695 Mbps
• Need to determine the other endpoint and consider time
Outline
• Problem statement
• Solution approach
– HNTES 1.0 and HNTES 2.0 (ongoing)
• ESnet-UVA collaborative work
• Future work: HNTES 3.0 and integrated network
HNTES 3.0
• Online flow detection
– packet-header-based schemes
– payload-based schemes
– machine-learning schemes
• For ESnet
– data door IP address based: mirror 0-length (SYN) segments to trigger PBR entries (if a full mesh of LSPs exists) and LSP setup (if not a full mesh)
– PBR can be configured only after finding the other end's IP address (the data door is one end)
– "real-time" analysis of Netflow data
• Needs validation by examining patterns within each day
HNTES in an integrated network
• Set up two queues on each ESnet physical link, each rate-limited
• Two approaches:
1. Use different DSCP taggings
– general purpose: rate-limited at 20% of capacity
– science network: rate-limited at 80% of capacity
2. IP network + MPLS network
– general purpose: same as approach 1
– science network: full mesh of MPLS LSPs mapped to the 80% queue
Ack: Inder Monga
Comparison
• In the first solution, there is no easy way to achieve load balancing of science flows
• Second solution:
– MPLS LSPs are rate-unlimited
– use SNMP measurements to measure the load on each of these LSPs
– obtain the traffic matrix
– run an optimization to load-balance science flows by rerouting LSPs to use the whole topology
– science flows will enjoy higher throughput than in the first solution because the TE system can periodically readjust the routing of LSPs
Discuss integration with IDC
• IDC-established LSPs have rate policing at the ingress router
• Not suitable for HNTES-redirected science flows
• Add a third queue for this category
Discussion with Chin Guok
Summary
• HNTES 2.0 focus
– elephant (large-sized) flows
– offline detection
– rate-unlimited static MPLS LSPs
– offline setting of policy-based routes for flow redirection
• HNTES 3.0
– online PBR configuration
– requires a flow monitoring module to receive port-mirrored packets from routers and execute online flow redirection after identifying the other end
• HNTES operation in an integrated network