1 An update on HNTES Thanks to the US DOE ASCR for grants DE-SC0002350 and DE- SC0007341 (UVA), and for DE-AC02- 05CH11231 (ESnet) Thanks to Brian Tierney, Chin Guok, Eric Pouyoul UVA Students: Zhenzhen Yan, Tian Jin, Zhengyang Liu, Hanke (Casey) Meng, Ranjana Addanki, Haoyu Chen, and Sam Elliott M. Veeraraghavan University of Virginia (UVA) [email protected] Feb. 24, 2014 Chris Tracy ESnet [email protected]


An update on HNTES

Thanks to the US DOE ASCR for grants DE-SC0002350 and DE-SC0007341 (UVA), and for DE-AC02- 05CH11231 (ESnet)

Thanks to Brian Tierney, Chin Guok, Eric Pouyoul

UVA Students: Zhenzhen Yan, Tian Jin, Zhengyang Liu, Hanke (Casey) Meng, Ranjana Addanki, Haoyu Chen, and Sam Elliott

M. Veeraraghavan University of Virginia (UVA)

[email protected]

Feb. 24, 2014

Chris Tracy ESnet

[email protected]


Outline

• Three main contributions
  – HNTES
  – AFCS
  – QoS provisioning
• Goal: Operationalize AFCS on ESnet5
• Future work: feedback?

HNTES: Hybrid Network Traffic Engineering System
AFCS: Alpha Flow Characterization System or EFCS


Contributions

• HNTES: Tested the hypothesis that if IP address prefixes extracted from offline analysis of completed alpha flows are used to redirect future alpha flows to traffic-engineered MPLS LSPs, the solution will be effective

• AFCS: Characterize alpha flows (size, duration)
• QoS provisioning: Requested support for rate-unspecified circuits: policing can throttle throughput
  – Two new classes added in new ESnet QoS document
    • Best-Effort Circuit Class (different from Best-Effort Class)
    • Assured Forwarding Class


Publications

Published
• Z. Yan, M. Veeraraghavan, C. Tracy, C. Guok, “On how to provision Quality of Service (QoS) for large dataset transfers,” CTRQ 2013, Best Paper Award
• T. Jin, C. Tracy, M. Veeraraghavan, Z. Yan, “Traffic Engineering of High-Rate Large-Sized Flows,” IEEE HPSR 2013
• Z. Liu, M. Veeraraghavan, Z. Yan, C. Tracy, J. Tie, I. Foster, J. Dennis, J. Hick, Y. Li and W. Yang, “On using virtual circuits for GridFTP transfers,” IEEE SC2012, Nov. 10-16, 2012
• Z. Yan, C. Tracy, M. Veeraraghavan, “A hybrid network traffic engineering system,” IEEE HPSR 2012, June 24-27, 2012

Submitted
• Two journal papers and one conference paper


HNTES vs. AFCS

• Goal of HNTES was to identify IP addresses of data transfer nodes that were sourcing/sinking alpha flows
  – Analyzes only single NetFlow records (one generated per minute per flow)
• Goal of AFCS: characterize the size, rate and duration of alpha flows
  – Requires concatenation of multiple NetFlow records to characterize individual flows
  – Not aggregation as done by commercial tools


AFCS

• AFCS work is newer: current focus
• Easier to operationalize than HNTES
  – HNTES requires additional step to redirect flows to AF class through firewall filter config.
  – Needs new work for ALUs; previous QoS experiments were on Junipers
• Goal: Characterize alpha flows
  – Determine size (bytes), duration, rate


AFCS Algorithm

• Find NetFlow records for all gamma flows
  – gamma flow is defined to be a flow that has at least one “Large” NetFlow record
  – Large NetFlow record: size > threshold (1 GB)
  – Maximum duration of a NetFlow record is 1 min because of “active timeout interval” value configured in ESnet routers
• Start concatenation procedure to reconstruct “flows” out of “records”
• Use size/rate thresholds to find alpha flows
  – e.g., 10 GB and 200 Mbps


Step 1: Finding NetFlow records of gamma flows

• Find all Large NetFlow records
• Extract five-tuple IDs of these Large records
  – srcIP, dstIP, srcport, dstport, protocol
• Find all Small NetFlow records corresponding to those five-tuple IDs
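A minimal sketch of this selection step, assuming records are already parsed into an in-memory list. The `Record` layout, the field names, and the `gamma_records` helper are hypothetical and only illustrate the five-tuple matching; the threshold is assumed to be expressed in the same units as the record size.

```python
from typing import NamedTuple

class Record(NamedTuple):
    # Hypothetical record layout; field names are illustrative only.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    size_bytes: int   # size as reported for this record
    first_ts: float   # first-packet timestamp (seconds)
    last_ts: float    # last-packet timestamp (seconds)

def five_tuple(r):
    """Flow ID: srcIP, dstIP, srcport, dstport, protocol."""
    return (r.src_ip, r.dst_ip, r.src_port, r.dst_port, r.protocol)

def gamma_records(records, large_threshold):
    """Return every record (Large or Small) whose five-tuple has
    at least one Large record, i.e. size > large_threshold."""
    large_ids = {five_tuple(r) for r in records if r.size_bytes > large_threshold}
    return [r for r in records if five_tuple(r) in large_ids]
```

Building the set of Large five-tuples first keeps the second pass over the Small records to a single set lookup per record.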


Step 2: Concatenation procedure

(using example)

• All records (reports) observed on same day
• Time gap between last-pkt TS of one record and first-pkt TS of next record < 1 min for grouping

[Figure: example NetFlow records annotated with the time gaps between consecutive records (889.798 sec, 180 ms, 40665 sec) and a bracket marking the records grouped as “one flow”]
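The grouping rule can be sketched as follows. This is an illustration under stated assumptions, not the AFCS code: the input is assumed to hold the records of one five-tuple from one day, each record is a hypothetical `(first_ts, last_ts, size)` tuple, and `concatenate` is an illustrative name. The three gaps in the usage example are the ones quoted on the slide; the timestamps themselves are made up.

```python
def concatenate(records, max_gap=60.0):
    """Group the records of ONE five-tuple into flows.

    records: list of (first_ts, last_ts, size) tuples, timestamps in seconds.
    A record joins the current flow when the gap between the previous
    record's last-packet timestamp and its own first-packet timestamp
    is below max_gap (1 min); otherwise it starts a new flow.
    """
    flows = []
    for rec in sorted(records):          # order by first-packet timestamp
        if flows and rec[0] - flows[-1][-1][1] < max_gap:
            flows[-1].append(rec)
        else:
            flows.append([rec])
    return flows

# Made-up timestamps reproducing gaps of 889.798 s, 180 ms and 40665 s.
example = [
    (0.0, 50.0, 1),
    (939.798, 990.0, 1),
    (990.18, 1040.0, 1),
    (41705.0, 41750.0, 1),
]
flows = concatenate(example)
```

With these inputs only the two records separated by 180 ms end up in the same flow; the other two gaps exceed one minute and start new flows.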


Step 3: find alpha flows

• Total size of each gamma flow
  – Sum the sizes of the concatenated NetFlow records and multiply by 1000
  – Packet sampling rate: 1-in-1000
• Total duration of each gamma flow
  – Last-packet timestamp of the last NetFlow record minus first-packet timestamp of the first NetFlow record in the group
• Rate: size/duration
• Alpha flows: gamma flows whose size and rate exceed preset thresholds
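As a worked sketch of these three quantities, assuming each concatenated flow is a time-ordered list of hypothetical `(first_ts, last_ts, sampled_bytes)` tuples and that 1 GB means 10^9 bytes. The default thresholds are the 10 GB and 200 Mbps example values from the algorithm overview; the function names are illustrative.

```python
SAMPLING_RATE = 1000  # 1-in-1000 packet sampling

def characterize(flow):
    """flow: time-ordered list of (first_ts, last_ts, sampled_bytes).

    Returns (size_bytes, duration_s, rate_bps) for the whole flow.
    """
    size_bytes = sum(rec[2] for rec in flow) * SAMPLING_RATE
    duration_s = flow[-1][1] - flow[0][0]   # last-pkt TS minus first-pkt TS
    rate_bps = size_bytes * 8 / duration_s if duration_s > 0 else 0.0
    return size_bytes, duration_s, rate_bps

def is_alpha(flow, size_threshold_bytes=10e9, rate_threshold_bps=200e6):
    """A gamma flow is an alpha flow when both thresholds are exceeded."""
    size_bytes, _, rate_bps = characterize(flow)
    return size_bytes > size_threshold_bytes and rate_bps > rate_threshold_bps

# Two sampled records of 8 MB and 7 MB scale to an estimated 15 GB flow.
flow = [(0.0, 50.0, 8_000_000), (50.1, 100.0, 7_000_000)]
size_bytes, duration_s, rate_bps = characterize(flow)
```

Here the estimated size is 15 GB over 100 s, which gives 1.2 Gbps, so the flow passes both example thresholds.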


Validated algorithm

• Because of NetFlow packet sampling rate, we needed to validate our size/duration computation algorithm
• Found GridFTP logs from NERSC data transfer node
• Found corresponding NetFlow records from ESnet router
• Found additional NetFlow records with same flow IDs
• Applied algorithm to find size/duration of flows from NetFlow records
• Recreated “sessions” from GridFTP transfer logs (-fast option: multiple files transferred on one TCP connection); found session size and compared with flow size determined from NetFlow records
• Accuracy close to 100% but decreases with size
• Size accuracy ratio > 100% for smaller sizes


NetFlow observation points (OP) (data obtained from ESnet4: May-Nov. 2011)

router-1, router-2: BNL and NERSC PE
router-3: sunn-cr1 (REN peerings)
router-4: eqx-sj (commercial peerings)


Characterization of flows (May-Nov. 2011 data)

| | router-1 (bnl) | router-2 (nersc) | router-3 (sunn-cr1) | router-4 (eqx-sj) |
|---|---|---|---|---|
| Router type | Provider edge (downloads) | Provider edge (downloads) | Core, REN peerings (uploads to DOE labs) | Core, commercial peerings (uploads to DOE labs) |
| # flows | 28685 | 27963 | 2516 | 212 |
| # unique flow src-dst pairs | 1479 | 1611 | 193 | 158 |
| max size (flow) | 633.3 GB | 811.6 GB | 233.6 GB | 112.8 GB |
| max rate (flow) | 5.1 Gbps | 5.7 Gbps | 0.97 Gbps | 0.78 Gbps |
| longest flow | 9 hr | 8.8 hr | 3.87 hr | 2.77 hr |


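Size (MB) of flows (May-Nov. 2011 data)

| | router-1 | router-2 | router-3 (REN peerings) | router-4 (commercial peerings) |
|---|---|---|---|---|
| Router type | Provider edge (downloads) | Provider edge (downloads) | Core (uploads to DOE labs) | Core (uploads to DOE labs) |
| Min | 1001 | 1001 | 1005 | 1010 |
| 1st Qu. | 1149 | 1540 | 4050 | 1203 |
| Median | 1275 | 2869 | 4360 | 1532 |
| Mean | 2513 | 9046 | 17540 | 3612 |
| 3rd Qu. | 1701 | 8768 | 21380 | 3772 |
| Max | 633300 | 811600 | 233600 | 112800 |
| IQR | 552 | 7227 | 17330 | 2569 |
| CV | 5.20 | 2.56 | 1.4 | 2.43 |
| skewness | 25.35 | 12.56 | 2.37 | 10.09 |

Callouts on the Max row: 811 GB (router-2) and 112 GB (router-4).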
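The row labels in these summary tables (quartiles, IQR, CV, skewness) can be reproduced with a few lines of standard-library Python. This is a sketch under assumptions: the slides do not say which quantile method or which skewness and CV definitions were used, so the inclusive quantile method, the population standard deviation, and the third standardized moment are choices made here for illustration. `summarize` is a hypothetical helper name.

```python
import statistics

def summarize(values):
    """Summary statistics in the style of the tables above.

    Assumes at least two values that are not all identical.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)                      # population std. dev.
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    return {
        "Min": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": mean,
        "3rd Qu.": q3,
        "Max": max(values),
        "IQR": q3 - q1,                                 # interquartile range
        "CV": sd / mean,                                # coefficient of variation
        "skewness": m3 / sd ** 3,                       # third standardized moment
    }

# Made-up sample with one large outlier, giving a right-skewed distribution.
summary = summarize([1, 2, 3, 4, 10])
```

A single large value pulls the mean above the median and yields positive skewness, the same pattern the size and duration tables show for the download routers.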


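Duration (s) of flows (May-Nov. 2011 data)

| | router-1 | router-2 | router-3 (REN peerings) | router-4 (commercial peerings) |
|---|---|---|---|---|
| Router type | Provider edge (downloads) | Provider edge (downloads) | Core (uploads to DOE labs) | Core (uploads to DOE labs) |
| Min | 4.212 | 8.044 | 9.55 | 12.03 |
| 1st Qu. | 41.85 | 60.94 | 190.9 | 54.97 |
| Median | 54.17 | 121.1 | 272 | 94.28 |
| Mean | 122.8 | 414.2 | 1098 | 235.6 |
| 3rd Qu. | 73.58 | 398.9 | 1169 | 227.6 |
| Max | 32460 | 31910 | 13940 | 9978 |
| IQR | 31.73 | 338.01 | 977.94 | 172.67 |
| CV | 7.392 | 2.34 | 1.50 | 3.18 |
| skewness | 23.767 | 10.33 | 2.32 | 10.99 |

Note: the mean is above the median under right (positive) skew.

Callouts on the Max row: 9 hours (router-1) and 2.8 hours (router-4).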


Rate (Mbps) of flows (May-Nov. 2011 data)

| | router-1 | router-2 | router-3 (REN peerings) | router-4 (commercial peerings) |
|---|---|---|---|---|
| Router type | Provider edge (downloads) | Provider edge (downloads) | Core (uploads to DOE labs) | Core (uploads to DOE labs) |
| Min | 11.7 | 3.6 | 34.6 | 49.2 |
| 1st Qu. | 160.9 | 147 | 117.6 | 130.9 |
| Median | 199.3 | 181.9 | 132.6 | 156.4 |
| Mean | 245.2 | 230.9 | 159 | 182.7 |
| 3rd Qu. | 258.9 | 252.1 | 159.2 | 195.8 |
| 99% | 881 | 944 | 503 | 649 |
| Max | 5154 | 5757 | 979 | 776 |
| CV | 0.71 | 0.72 | 0.56 | 0.61 |
| skewness | 7.36 | 3.95 | 3.82 | 2.86 |


Characterization of flows (May-Nov. 2011 data)

• Results: # of flows over 214 days (sensitivity to size-rate threshold)

| size | rate | Router-1 | Router-2 | Router-3 | Router-4 |
|---|---|---|---|---|---|
| 10 GB | 100 Mbps | 526 | 5460 | 726 | 3 |
| 10 GB | 150 Mbps | 399 | 4121 | 297 | 1 |
| 10 GB | 180 Mbps | 375 | 3037 | 124 | 0 |
| 10 GB | 200 Mbps | 357 | 2443 | 92 | 0 |
| 50 GB | 200 Mbps | 19 | 505 | 28 | 0 |
| 80 GB | 500 Mbps | 0 | 20 | 0 | 0 |
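A sensitivity table of this kind can be produced by re-counting the same set of characterized flows under each threshold pair. The sketch below assumes flows are already reduced to hypothetical `(size_gb, rate_mbps)` tuples; the flow values are made up for illustration and are not taken from the ESnet data, and strict "greater than" comparison is an assumption.

```python
# Threshold pairs (size in GB, rate in Mbps) matching the rows above.
THRESHOLDS = [(10, 100), (10, 150), (10, 180), (10, 200), (50, 200), (80, 500)]

def count_by_threshold(flows, thresholds=THRESHOLDS):
    """flows: iterable of (size_gb, rate_mbps) per characterized flow.

    Returns {(size_thr, rate_thr): number of flows exceeding both}.
    """
    flows = list(flows)
    return {
        (size_thr, rate_thr): sum(
            1 for size, rate in flows if size > size_thr and rate > rate_thr
        )
        for size_thr, rate_thr in thresholds
    }

# Made-up flows, for illustration only.
sample_flows = [(12, 120), (12, 190), (60, 250), (90, 600), (5, 900)]
counts = count_by_threshold(sample_flows)
```

Because each stricter threshold pair selects a subset of the flows selected by a looser one, the counts can only stay the same or fall as either threshold rises, which is the monotone pattern visible in each column of the table.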


Persistency measure (May-Nov. 2011 data)

• CDF of number of gamma and alpha flows per src/dst pair (router-1 plot close to router-2 plot and hence omitted)

alpha flows: > 5 GB and 100 Mbps


Discussion

• Largest-sized flow rate: 301 Mbps, fastest-flow size: 7.14 GB, and longest-flow size: 370 GB
• At the low end, one 1.9 GB flow lasted 4181 sec
• High skewness in size for downloads
• Larger-sized flows for downloads than uploads, and more frequent
• Max number of gamma and alpha flows per src-dst pair were (2913, 1596) for router-2 (nersc)
• The amount of data analyzed is a small subset of our total dataset, both in time and number of routers analyzed. Concatenating flows is a somewhat intensive task, so we tried to choose routers that would be representative.


Potential application

• Find src-dst pairs that are experiencing high variance in throughput to initiate diagnostics and improve user experience
  – In the 2913 gamma-flow set between same src-dst pair, 75% of the flows experienced less than 161.2 Mbps while the highest rate experienced was 1.1 Gbps (size: 3.5 GB).
  – In the 1596 alpha-flow set, 75% of the flows experienced less than 167 Mbps, while the highest rate experienced was 536 Mbps (size: 11 GB).
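One way to surface such pairs is to group flow rates by src-dst pair and compare the third quartile with the maximum. The sketch below is illustrative only: the `(src, dst, rate_mbps)` input layout and the `per_pair_rate_spread` name are assumptions, the inclusive quantile method is a choice made here, and the sample rates are made up.

```python
import statistics
from collections import defaultdict

def per_pair_rate_spread(flows):
    """flows: iterable of (src, dst, rate_mbps).

    Returns {(src, dst): (num_flows, third_quartile_rate, max_rate)}
    for pairs with at least two flows (quantiles need two data points).
    """
    rates = defaultdict(list)
    for src, dst, rate in flows:
        rates[(src, dst)].append(rate)
    spread = {}
    for pair, values in rates.items():
        if len(values) < 2:
            continue
        q3 = statistics.quantiles(values, n=4, method="inclusive")[2]
        spread[pair] = (len(values), q3, max(values))
    return spread

# Made-up rates: most flows are slow, one flow is much faster.
sample = [("a", "b", r) for r in (100, 120, 140, 160, 1100)] + [("c", "d", 300)]
spread = per_pair_rate_spread(sample)
```

A pair whose maximum rate is several times its third quartile shows that the path can carry more than most flows achieved, which is the cue for diagnostics described above.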


Other applications

• Identify suboptimal paths
  – science flows should typically enter ESnet via REN peerings, but some of the observed alpha flows at eqx-sj could have occurred because of sub-optimal BGP configurations
  – correlate AFCS findings with BGP data
• HNTES: traffic engineering alpha flows


Ongoing work

• ESnet4 upgrade to ESnet5 (2012)
  – Juniper to ALU routers
  – NetFlow v5 to NetFlow v9
  – Flow-tools to nfdump
• Rewrote AFCS code
• Running on an ESnet VM
  – CryptoPAN IP address anonymization
• Demo: D3.js GUI (preliminary)


Numbers for Oct. 1-Nov. 12, 2013 data from bnl-mr2 (24255 flows)

| | Size (MB) | Duration (sec) | Rate (Mbps) |
|---|---|---|---|
| Min | 1000 | 8 | 15 |
| 1st Qu. | 1992 | 19.2 | 227 |
| Median | 2300 | 22.25 | 651 |
| Mean | 4147 | 80.18 | 1217 |
| 3rd Qu. | 4839 | 65 | 938.8 |
| 90% | 6400 | 259 | 2626 |
| 99% | 23396 | 381 | 9249 |
| 99.9% | 40721 | 1904 | 10129 |
| Max | 313600 | 36190 | 10670 (size over estimate) |
| IQR | 2847.25 | 45.8 | 711.8 |
| CV | 1.44 | 5.15 | 1.48 |
| skewness | 17.19 | 61 | 2.87 |

Slide annotations: “Not the same flow”; “Increased relative to 2011 data” (appears twice).


Feedback?

• Goal: Integrate operational AFCS output with my.es.net
• Current plan
  – Run software every night, compute numbers for gamma flows observed that day
  – Pre-calculate last 24-hours, last 7-days, last 30-days JSON files for quick visualization
  – Per-site alpha flows (configurable thresholds)
  – Store gamma-flow information in SQL database for easier querying of other types of requests
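The pre-calculation step might look roughly like the sketch below. Everything in it is an assumption made for illustration: the flow dictionary keys, the window names, the JSON fields, and the `window_summaries` helper are hypothetical and do not describe the actual AFCS or my.es.net formats. It returns JSON strings rather than writing files.

```python
import json
from datetime import datetime, timedelta

# Hypothetical window names mapped to their lengths.
WINDOWS = {
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
}

def window_summaries(flows, now):
    """flows: list of dicts with 'end' (datetime), 'size_bytes', 'rate_mbps'.

    Returns {window_name: JSON string} summarizing flows that ended
    within each window before `now`.
    """
    out = {}
    for name, span in WINDOWS.items():
        recent = [f for f in flows if now - span <= f["end"] <= now]
        out[name] = json.dumps({
            "window": name,
            "num_flows": len(recent),
            "total_bytes": sum(f["size_bytes"] for f in recent),
            "max_rate_mbps": max((f["rate_mbps"] for f in recent), default=None),
        })
    return out

# Made-up flows ending 1 hour, 3 days, 20 days and 40 days before "now".
now = datetime(2014, 2, 24)
sample = [
    {"end": now - timedelta(hours=1), "size_bytes": 10, "rate_mbps": 300.0},
    {"end": now - timedelta(days=3), "size_bytes": 20, "rate_mbps": 500.0},
    {"end": now - timedelta(days=20), "size_bytes": 30, "rate_mbps": 100.0},
    {"end": now - timedelta(days=40), "size_bytes": 40, "rate_mbps": 900.0},
]
summaries = window_summaries(sample, now)
```

Computing the three windows once per night means the visualization only has to load a small static document instead of querying the full flow store.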
