
UltraLight: Network & Applications Research at UF


Page 1: UltraLight: Network & Applications Research at UF

UltraLight: Network & Applications Research at UF

Dimitri Bourilkov, University of Florida

CISCO - UF Collaborative Team Meeting

Gainesville, FL, September 12, 2006

Page 2: UltraLight: Network & Applications Research at UF


Overview of an NSF Project

Page 3: UltraLight: Network & Applications Research at UF


The UltraLight Team

Steering Group: H. Newman (Caltech, PI), P. Avery (U. Florida), J. Ibarra (FIU), S. McKee (U. Michigan)

Project Management: Richard Cavanaugh (Project Coordinator)

PI and Working Group Coordinators:

Network Engineering: Shawn McKee (Michigan); + S. Ravot (LHCNet), R. Summerhill (Abilene/HOPI), D. Pokorney (FLR), J. Ibarra (WHREN, AW), C. Guok (ESnet), L. Cottrell (SLAC), D. Petravick, M. Crawford (FNAL), S. Bradley, J. Bigrow (BNL), et al.

Applications Integration: Frank Van Lingen (Caltech); + I. Legrand (MonALISA), J. Bunn (GAE + TG); C. Steenberg, M. Thomas (GAE), Sanjay Ranka (Sphinx), et al.

Physics Analysis User Group: Dimitri Bourilkov (UF; CAVES, Codesh)

Network Research, WAN in Lab Liaison: Steven Low (Caltech)

Education and Outreach: Laird Kramer (FIU), + H. Alvarez, J. Ibarra, H. Newman

Page 4: UltraLight: Network & Applications Research at UF


Large Hadron Collider, CERN, Geneva: 2007 Start

27 km Tunnel in Switzerland & France

pp collisions at √s = 14 TeV, L = 10^34 cm^-2 s^-1

Experiments: CMS and ATLAS (pp, general purpose; HI), ALICE (HI), LHCb (B-physics), TOTEM

Higgs, SUSY, Extra Dimensions, CP Violation, QG Plasma, … the Unexpected

5000+ Physicists, 250+ Institutes, 60+ Countries

Challenges: Analyze petabytes of complex data cooperatively; harness global computing, data & NETWORK resources

Page 5: UltraLight: Network & Applications Research at UF


LHC Data Grid Hierarchy

CERN/Outside Ratio Smaller; Expanded Role of Tier1s & Tier2s: Greater Reliance on Networks

CERN/Outside Resource Ratio ~1:4; Tier0/(ΣTier1)/(ΣTier2) ~1:2:2

DISUN: 4 of 7 US CMS Tier2s shown, with ~8 MSi2k and 1.5 PB disk by 2007

>100 Tier2s at LHC

Link speeds in the hierarchy diagram: 10-40+ Gbps and 2.5 - 30 Gbps

Page 6: UltraLight: Network & Applications Research at UF


Tier-2s

~100 Identified – Number still growing

Page 7: UltraLight: Network & Applications Research at UF


HENP Bandwidth Roadmap for Major Links (in Gbps)

Continuing Trend: ~1000 Times Bandwidth Growth Per Decade; HEP: Co-Developer as well as Application Driver of Global Nets (see the check after the table)

Year | Production            | Experimental            | Remarks
2001 | 0.155                 | 0.622-2.5               | SONET/SDH
2002 | 0.622                 | 2.5                     | SONET/SDH DWDM; GigE Integ.
2003 | 2.5                   | 10                      | DWDM; 1 + 10 GigE Integration
2005 | 10                    | 2-4 X 10                | Switch; Provisioning
2007 | 2-4 X 10              | ~10 X 10; 40 Gbps       | 1st Gen. Grids
2009 | ~10 X 10 or 1-2 X 40  | ~5 X 40 or ~20-50 X 10  | 40 Gbps Switching
2011 | ~5 X 40 or ~20 X 10   | ~25 X 40 or ~100 X 10   | 2nd Gen. Grids; Terabit Networks
2013 | ~Terabit              | ~MultiTbps              | ~Fill One Fiber
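As a rough check of the ~1000x-per-decade trend, here is a minimal Python sketch; it simply takes the 2001 and 2013 production entries from the table above (treating "~Terabit" as ~1000 Gbps) and converts the growth into a per-decade factor:

# Rough check of the "~1000x bandwidth growth per decade" trend,
# using the 2001 and 2013 production entries of the roadmap table.
start_year, start_gbps = 2001, 0.155   # SONET/SDH era
end_year, end_gbps = 2013, 1000.0      # "~Terabit" taken as ~1000 Gbps

annual = (end_gbps / start_gbps) ** (1.0 / (end_year - start_year))
per_decade = annual ** 10

print(f"annual growth factor: {annual:.2f}x")        # ~2.1x per year
print(f"growth per decade:    {per_decade:.0f}x")    # ~1500x, i.e. order 1000x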

Page 8: UltraLight: Network & Applications Research at UF


Data Samples and Transport Scenarios

Sample (10^7 events) | Data Volume (TBytes) | Time (hrs) @ 0.9 Gbps | Time (hrs) @ 3 Gbps | Time (hrs) @ 8 Gbps
AOD                  | 0.5 - 1              | 1.2 - 2.5             | 0.37 - 0.74         | 0.14 - 0.28
RECO                 | 2.5 - 5              | 6 - 12                | 1.8 - 3.7           | 0.69 - 1.4
RAW+RECO             | 17.5 - 21            | 43 - 86               | 13 - 26             | 4.8 - 9.6
MC                   | 20                   | 98                    | 30                  | 11

10^7 events is a typical data sample for analysis or reconstruction development [Ref.: MONARC]; equivalent to just ~1 day's running

• Transporting datasets with quantifiable high performance is needed for efficient workflow, and thus efficient use of CPU and storage resources

• One can only transmit ~2 RAW+RECO or MC samples per day on a 10G path

• Movement of 10^8 event samples (e.g. after re-reconstruction) will take ~1 day (RECO) to ~1 week (RAW, MC) with a 10G link at high occupancy

• Transport of significant data samples will require one, or multiple, 10G links (see the sketch below)
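The transfer times in the table are simply volume divided by line rate. A minimal Python sketch (assuming decimal TBytes, a fully utilized link, and no protocol overhead) that reproduces, for example, the RECO row:

# Hours needed to move a data sample over a dedicated link,
# assuming 1 TByte = 1e12 bytes and full link utilization.
def transfer_hours(volume_tbytes, link_gbps):
    bits = volume_tbytes * 1e12 * 8
    return bits / (link_gbps * 1e9) / 3600.0

for volume in (2.5, 5.0):            # RECO sample for 10^7 events, in TBytes
    for rate in (0.9, 3.0, 8.0):     # link rates in Gbps
        print(f"{volume:4.1f} TB @ {rate:3.1f} Gbps: {transfer_hours(volume, rate):5.2f} h")
# -> 2.5 TB @ 0.9 Gbps ~ 6.2 h ... 5.0 TB @ 8.0 Gbps ~ 1.4 h, as in the table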

Page 9: UltraLight: Network & Applications Research at UF


UltraLight Goals

• Goal: Enable the network as an integrated managed resource

• Meta-Goal: Enable physics analysis & discoveries which otherwise could not be achieved

• Caltech, Florida, Michigan, FNAL, SLAC, CERN, BNL, Internet2/HOPI

• UERJ (Rio), USP (Sao Paulo), FIU, KNU (Korea), KEK (Japan), TIFR (India), PERN (Pakistan)

• NLR, ESnet, CENIC, FLR, MiLR, US Net, Abilene, JGN2, GLORIAD, RNP, CA*net4; UKLight, Netherlight, Taiwan

• Cisco, Neterion, Sun …

• Next generation Information System, with the network as an integrated, actively managed subsystem in a global Grid

• Hybrid network infrastructure: packet-switched + dynamic optical paths

• End-to-end monitoring; Realtime tracking and optimization

• Dynamic bandwidth provisioning; Agent-based services spanning all layers

Page 10: UltraLight: Network & Applications Research at UF


Large Scale Data Transfers

• Network aspect: Bandwidth*Delay Product (BDP); we have to use TCP windows matching it in the kernel AND the application

• On a local connection with 1 GbE and RTT of 0.19 ms, to fill the pipe we need around 2*BDP = 2 * 1 Gb/s * 0.00019 s = ~48 KBytes. Or, for a 10 Gb/s LAN: 2*BDP = ~480 KBytes

• Now on the WAN: from Florida to Caltech the RTT is 115 ms. So to fill the pipe at 1 Gb/s we need 2*BDP = 2 * 1 Gb/s * 0.115 s = ~28.8 MBytes, etc. (see the sketch below)

• User aspect: are the servers on both ends capable of matching these rates for useful disk-to-disk transfers? Tune kernels, get the highest possible disk read/write speed, etc. The tables have turned: the WAN now outperforms disk speeds!
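A minimal Python sketch of the window arithmetic above (decimal units; the factor of 2 gives TCP headroom to keep the pipe full):

# TCP window needed to fill a pipe: ~2 x Bandwidth*Delay Product (BDP).
def window_bytes(bandwidth_gbps, rtt_seconds):
    bdp_bits = bandwidth_gbps * 1e9 * rtt_seconds
    return 2 * bdp_bits / 8                       # bytes

print(window_bytes(1, 0.00019) / 1e3, "KB")   # 1 GbE LAN, 0.19 ms RTT:  ~48 KB
print(window_bytes(10, 0.00019) / 1e3, "KB")  # 10 Gb/s LAN:             ~480 KB
print(window_bytes(1, 0.115) / 1e6, "MB")     # Florida-Caltech, 115 ms: ~28.8 MB
print(window_bytes(10, 0.115) / 1e6, "MB")    # same path at 10 Gb/s:    ~288 MB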

Page 11: UltraLight: Network & Applications Research at UF


bbcp Tests

bbcp was selected as a starting tool for data transfers on the WAN:

• Supports multiple streams, highly tunable (window size etc.), peer-to-peer type

• Well supported by Andy Hanushevsky from SLAC

• Is used successfully in BaBar

• I used it in 2002 for CMS production: massive data transfers from Florida to CERN; the only limits observed at the time were disk writing speed (LAN) and the network (WAN)

• Starting point Florida → Caltech: < 0.5 MB/s on the WAN, very poor performance

Page 12: UltraLight: Network & Applications Research at UF


Evolution of Tests Leading to SC|05

• End points in Florida (uflight1) and Caltech (nw1): AMD Opterons over UL network

• Tuning of Linux kernels (2.6.x) and bbcp window sizes – a coordinated, iterative procedure (see the sketch after this list)

• Current status (for file sizes ~ 2 GB):
  • 6-6.5 Gb/s with iperf
  • up to 6 Gb/s memory to memory
  • 2.2 Gb/s ramdisk → remote disk write

• The speed was the same writing to a SCSI disk (supposedly less than 80 MB/s) or to a RAID array, so de facto the data always goes first to the memory cache (the Caltech node has 16 GB RAM)

• Used successfully with up to 8 bbcp processes in parallel from Florida to the show floor in Seattle; CPU load still OK
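To illustrate the kind of kernel tuning involved, here is a minimal Python sketch (Linux only, and only a sketch: the exact settings used on the UltraLight hosts are not listed in these slides). It reads the maximum socket-buffer sizes the kernel allows, which are among the limits commonly raised for long fat pipes, and compares them with the 2*BDP target from the Florida-Caltech numbers:

# Compare the kernel's maximum socket buffers with the 2 x BDP target
# for a 1 Gb/s path with 115 ms RTT (values live under /proc/sys on Linux).
def read_sysctl(path):
    with open(path) as f:
        return int(f.read().split()[0])

target = int(2 * 1e9 * 0.115 / 8)              # ~29 MB

for knob in ("/proc/sys/net/core/rmem_max",    # max receive buffer
             "/proc/sys/net/core/wmem_max"):   # max send buffer
    current = read_sysctl(knob)
    verdict = "OK" if current >= target else f"too small, raise to >= {target}"
    print(f"{knob}: {current} bytes ({verdict})")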

Page 13: UltraLight: Network & Applications Research at UF


bbcp Examples: Florida → Caltech

[bourilkov@uflight1 data]$ iperf -i 5 -c 192.84.86.66 -t 60
------------------------------------------------------------
Client connecting to 192.84.86.66, TCP port 5001
TCP window size: 256 MByte (default)
------------------------------------------------------------
[ 3] local 192.84.86.179 port 33221 connected with 192.84.86.66 port 5001
[ 3]  0.0- 5.0 sec  2.73 GBytes  4.68 Gbits/sec
[ 3]  5.0-10.0 sec  3.73 GBytes  6.41 Gbits/sec
[ 3] 10.0-15.0 sec  3.73 GBytes  6.40 Gbits/sec
[ 3] 15.0-20.0 sec  3.73 GBytes  6.40 Gbits/sec

bbcp -s 8 -f -V -P 10 -w 10m big2.root [email protected]:/dev/null
bbcp: uflight1.ultralight.org kernel using a send window size of 20971584 not 10485792
bbcp: Sink I/O buffers (245760K) > 25% of available free memory (231836K); copy may be slow
bbcp: Creating /dev/null/big2.root
Source cpu=5.654 mem=0K pflt=0 swap=0
File /dev/null/big2.root created; 1826311140 bytes at 432995.1 KB/s
24 buffers used with 0 reorders; peaking at 0.
Target cpu=3.768 mem=0K pflt=0 swap=0
1 file copied at effectively 260594.2 KB/s

bbcp -s 8 -f -V -P 10 -w 10m big2.root [email protected]:dimitri
bbcp: uflight1.ultralight.org kernel using a send window size of 20971584 not 10485792
bbcp: Creating ./dimitri/big2.root
Source cpu=5.455 mem=0K pflt=0 swap=0
File ./dimitri/big2.root created; 1826311140 bytes at 279678.1 KB/s
24 buffers used with 0 reorders; peaking at 0.
Target cpu=10.065 mem=0K pflt=0 swap=0
1 file copied at effectively 150063.7 KB/s

Page 14: UltraLight: Network & Applications Research at UF


bbcp Examples: Caltech → Florida

[uldemo@nw1 dimitri]$ iperf -s -w 256m -i 5 -p 5001 -l 8960
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 512 MByte (WARNING: requested 256 MByte)
------------------------------------------------------------
[ 4] local 192.84.86.66 port 5001 connected with 192.84.86.179 port 33221
[ 4]  0.0- 5.0 sec  2.72 GBytes  4.68 Gbits/sec
[ 4]  5.0-10.0 sec  3.73 GBytes  6.41 Gbits/sec
[ 4] 10.0-15.0 sec  3.73 GBytes  6.40 Gbits/sec
[ 4] 15.0-20.0 sec  3.73 GBytes  6.40 Gbits/sec
[ 4] 20.0-25.0 sec  3.73 GBytes  6.40 Gbits/sec

bbcp -s 8 -f -V -P 10 -w 10m big2.root [email protected]:/dev/null
bbcp: Sink I/O buffers (245760K) > 25% of available free memory (853312K); copy may be slow
bbcp: Source I/O buffers (245760K) > 25% of available free memory (839628K); copy may be slow
bbcp: nw1.caltech.edu kernel using a send window size of 20971584 not 10485792
bbcp: Creating /dev/null/big2.root
Source cpu=5.962 mem=0K pflt=0 swap=0
File /dev/null/big2.root created; 1826311140 bytes at 470086.2 KB/s
24 buffers used with 0 reorders; peaking at 0.
Target cpu=4.053 mem=0K pflt=0 swap=0
1 file copied at effectively 263793.4 KB/s

Page 15: UltraLight: Network & Applications Research at UF


SuperComputing 05 Bandwidth Challenge

475 TBytes Transported in < 24 h; Above 100 Gbps for Hours

Page 16: UltraLight: Network & Applications Research at UF


Outlook

• The UltraLight network already performs very well

• SC|05 was a big success

• The hard problem from the user perspective is now to match it with servers capable of sustained rates for large files (> 20 GB, when the memory caches are exhausted); fast disk writes are key (RAID arrays)

• To fill 10 Gb/s pipes we need several (3-4) pairs of servers

• Next step: disk-to-disk transfers between Florida, Caltech, Michigan, FNAL, BNL, CERN, preparations for SC|06 (next talk)

• More info: http://ultralight.caltech.edu