
Page 1: User Community Report

User Community Report

Dimitri Bourilkov
University of Florida

UltraLight Visit to NSF

Arlington, VA, January 4, 2006

Page 2: User Community Report

Physics Analysis User Group: Motivation and Mission

• Establish a community of physicists - early adopters and users:
  • first within UltraLight (expert users)
  • later outside users

• This community uses the system being developed, e.g.:
  • starts actual physics analysis efforts exploiting the test-bed
  • provides a user perspective on the problems being solved

Page 3: User Community Report

Physics Analysis User Group

• Organizes early adoption of the system

• Identifies the most valuable features of the system from the users' perspective, to be released early in production (or at a useful level of functionality)

• This is "where the rubber meets the road" and will provide rapid user feedback to the development team

Page 4: User Community Report

Physics Analysis User Group

• Evolving dialog with the applications WG on:
  • the scope (what is most valuable for physics analysis)
  • priorities for implementing features
  • composition and timing of releases, aligned with the milestones of the experiments

• Develops, in collaboration with the applications group, a suite of functional tests, which can be used for:
  • measuring the progress of the project
  • educating new users and lowering the threshold for adopting the system
  • demonstrating the UltraLight services in action in education/outreach workshops

Page 5: User Community Report

Physics Analysis User Group

• Studies in depth the software frameworks of HEP applications (e.g. ORCA/COBRA or the new software framework for CMS, ATHENA for ATLAS), the data and metadata models, and the steps to best integrate the systems

• Maintains close contact with the people in charge of software development in the experiments and responds to their requirements and needs

• Provides expert help with synchronization and integration between UltraLight and the software systems of the experiments

Page 6: User Community Report

Physics Analysis User Group

• Contributes to ATLAS/CMS physics preparation milestones (UL members are already active in LHC physics, and several analyses are officially recognized in CMS)

• In the longer term, enables physics analysis and LHC physics research

• In the shorter term, actively involved in SC|05 activities, culminating in the Bandwidth Challenge in Seattle in November 2005

• Prepared a tutorial on data analysis with ROOT for the E & O workshop in Miami, June 2005

Page 7: User Community Report

CMS Data Samples for Testing

• For initial testing, generated seven samples of Z' events with masses from 0.2 to 5 TeV: fully simulated with OSCAR and reconstructed with ORCA on local Tier2 resources at UF; in total 42k events, ~2 GB in ROOT trees (ExRootAnalysis format); used for SC|05

• Additional data for different channels: QCD background, top events, bosons + jets, SUSY points; ~ 35 GB, same format

• In addition, ~100k single- or di-muon events were generated over the summer on the FNAL LPC Tier1 resources

Page 8: User Community Report

Prototype CMS Analysis

• Developed a stand-alone C++ code to analyze the ExRootAnalysis trees (an illustrative read-back sketch follows below):
  • lightweight, no external dependencies besides ROOT; used for the iGrid2005 and SC|05 demos and by users at UF for analysis and CMS production validation
  • some parts of the information, e.g. trigger bits, are harder to access than in the CMS framework (where the ORCA libraries need to be loaded)
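
The analyzer itself is stand-alone C++; purely as an illustration of reading such trees, a minimal PyROOT sketch follows. The file name, tree name, branch layout and dictionary library are assumptions about typical ExRootAnalysis output, not the actual code:

import ROOT

# Illustrative only - NOT the stand-alone C++ analyzer described above.
ROOT.gSystem.Load("libExRootAnalysis")        # assumed dictionary library

f = ROOT.TFile.Open("zprime_1TeV.root")       # hypothetical sample file
tree = f.Get("STDHEP")                        # assumed tree name

h = ROOT.TH1F("h_mass", ";m_{#mu#mu} [GeV];events", 100, 0.0, 2000.0)

for event in tree:
    # generator-level muons (PDG ID +-13) from an assumed GenParticle branch
    muons = [p for p in event.GenParticle if abs(p.PID) == 13]
    if len(muons) < 2:
        continue
    v1, v2 = ROOT.TLorentzVector(), ROOT.TLorentzVector()
    v1.SetPxPyPzE(muons[0].Px, muons[0].Py, muons[0].Pz, muons[0].E)
    v2.SetPxPyPzE(muons[1].Px, muons[1].Py, muons[1].Pz, muons[1].E)
    h.Fill((v1 + v2).M())                     # di-muon invariant mass

h.Draw()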

Page 9: User Community Report

Visualization

Before detector simulation: PYTHIA 4-vectors – CMKINViewer (DB)

After reconstruction – COJAC (Julian Bunn)

Page 10: User Community Report

Collaborative Community Tools: CAVES / CODESH Projects

• Concentrate on the interactions between scientists collaborating over extended periods of time

• Seamlessly log, exchange and reproduce results and the corresponding methods, algorithms and programs

• Automatic and complete logging and reuse of work / analysis sessions: collect all user activities on the command line plus the code of all executed programs

• Extend the power of users working / performing analyses in their habitual way, giving them virtual logbook capabilities

• CAVES is used in normal analysis sessions with ROOT
• CODESH is a UNIX shell with virtual logbook capabilities
• Build functioning collaboration suites - close to users!
• Formed a team: CODESH: DB & Vaibhav Khandelwal; CAVES: DB & Sanket Totala

Page 11: User Community Report

Choice of Scenarios

Case 1 (simple):
• User 1: does some analysis and produces a result with tag analX_user1.
• User 2: browses all current tags in the repository and fetches the session stored with tag analX_user1.

Case 2 (complex):
• User 1: does some analysis and produces a result with tag analX_user1.
• User 2: browses all current tags in the repository and fetches the session stored with tag analX_user1.
• User 2: modifies the program obtained from the session of user 1 and stores it, along with a new result, with tag analX_user2_mod_code.
• User 1: browses the repository, finds that his program was modified, and extracts that session using the tag analX_user2_mod_code.

This scenario can be extended to an arbitrary number of steps and users in a working group, or groups in a collaboration.

Page 12: User Community Report

CAVES / CODESH Architectures: Scalable and Distributed

First prototypes use popular tools - Python, ROOT and CVS; e.g. all ROOT commands plus CAVES commands, or all UNIX shell commands plus CODESH commands, are available (a toy sketch of the tag-based exchange follows below)
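
As a toy illustration of how the tag-based exchange from the scenarios above can map onto these tools, here is a sketch in Python driving CVS; it is not the actual CAVES/CODESH code, and the session file and tag names are just the conventions used in the scenarios:

import subprocess

def store_session(session_file, tag):
    """Commit a session log to the shared repository and label it with a tag."""
    subprocess.run(["cvs", "commit", "-m", "analysis session", session_file],
                   check=True)
    subprocess.run(["cvs", "tag", tag, session_file], check=True)

def list_tags(session_file):
    """Browse existing tags ('cvs log -h' prints the symbolic names)."""
    subprocess.run(["cvs", "log", "-h", session_file], check=True)

def fetch_session(session_file, tag):
    """Retrieve the session stored under a given tag."""
    subprocess.run(["cvs", "update", "-r", tag, session_file], check=True)

# User 1 stores, user 2 browses and fetches (cf. the scenarios above):
# store_session("analX.log", "analX_user1")
# list_tags("analX.log")
# fetch_session("analX.log", "analX_user1")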

Page 13: User Community Report

Working Releases - CODESH

• Virtual log-book for “shell” sessions

• Parts can be local (private) or shared

• Tracks environment variables, aliases, invoked program code, etc., during a session

• Reproduces complete working sessions

• Complex CMS ORCA example operational

• All CMS data generations for the community group done at the LPC are stored in CODESH, and the knowledge is available (a toy sketch of the logbook idea follows below)
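
A toy sketch of the virtual-logbook idea, purely illustrative and not the CODESH implementation: snapshot the environment, record each command plus the source of any local script it invokes, and replay the session later:

import json, os, shlex, subprocess, time

class Logbook:
    def __init__(self):
        self.entries = []
        self.env = dict(os.environ)          # environment snapshot

    def run(self, cmdline):
        entry = {"time": time.time(), "cmd": cmdline}
        prog = shlex.split(cmdline)[0]
        if os.path.isfile(prog):             # keep the executed code too
            with open(prog) as src:
                entry["source"] = src.read()
        subprocess.run(cmdline, shell=True, check=False)
        self.entries.append(entry)

    def save(self, path):
        with open(path, "w") as f:
            json.dump({"env": self.env, "log": self.entries}, f, indent=2)

    def replay(self):
        for e in self.entries:               # reproduce the session
            subprocess.run(e["cmd"], shell=True, check=False)

# book = Logbook(); book.run("ls -l"); book.save("session.json")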

Page 14: User Community Report

Working Releases - CAVES

Higgs → W+W- analysis example

Page 15: User Community Report

Large Scale Data Transfers

• Network aspect: the Bandwidth*Delay Product (BDP); we have to use TCP windows matching it in both the kernel AND the application

• On a local connection with 1 GbE and RTT 0.19 ms, to fill the pipe we need a window of around 2*BDP:
  2*BDP = 2 * 1 Gb/s * 0.00019 s ≈ 48 KBytes
  or, for a 10 Gb/s LAN: 2*BDP ≈ 480 KBytes

• Now on the WAN: from Florida to Caltech the RTT is 115 ms, so to fill the pipe at 1 Gb/s we need
  2*BDP = 2 * 1 Gb/s * 0.115 s ≈ 28.8 MBytes, etc.

• User aspect: are the servers on both ends capable of matching these rates for useful disk-to-disk transfers? Tune kernels, get the highest possible disk read/write speed, etc. The tables have turned: the WAN now outperforms disk speeds! (A quick numerical check follows below.)
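
A quick numerical check of the window sizes quoted above (decimal KB/MB, as on the slide):

def double_bdp_bytes(rate_gbps, rtt_ms):
    """2 * bandwidth-delay product, in bytes (rate in Gb/s, RTT in ms)."""
    return 2 * rate_gbps * 1e9 * (rtt_ms / 1000.0) / 8

print(double_bdp_bytes(1, 0.19))   # 47500.0    -> ~48 KBytes  (1 GbE LAN)
print(double_bdp_bytes(10, 0.19))  # 475000.0   -> ~480 KBytes (10 Gb/s LAN)
print(double_bdp_bytes(1, 115))    # 28750000.0 -> ~28.8 MBytes (Florida-Caltech WAN)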

Page 16: User Community Report

bbcp Tests

bbcp was selected as the starting tool for WAN tests:
• Supports multiple streams, highly tunable (window size etc.), peer-to-peer type

• Well supported by Andy Hanushevsky from SLAC

• Is used successfully in BaBar

• I used it in 2002 for CMS production: massive data transfers from Florida to CERN; the only limits observed at the time were disk writing speed (LAN) and the network (WAN)

• Starting point Florida → Caltech: < 0.5 MB/s on the WAN - very poor performance

Page 17: User Community Report

Evolution of Tests Leading to SC|05

• End points in Florida (uflight1) and Caltech (nw1): AMD Opterons over UL network

• Tuning of kernels and bbcp window sizes – coordinated iterative procedure

• Current status (for file sizes ~2 GB):
  • 6-6.5 Gb/s with iperf
  • up to 6 Gb/s memory to memory
  • 2.2 Gb/s ramdisk → remote disk write

> The speed was the same whether writing to a SCSI disk (supposedly capable of less than 80 MB/s) or to a RAID array, so de facto the data always goes first to the memory cache (the Caltech node has 16 GB of RAM)

• Used successfully with up to 8 bbcp processes in parallel from Florida to the show floor in Seattle; CPU load still OK

Page 18: User Community Report

bbcp Examples: Florida → Caltech

[bourilkov@uflight1 data]$ iperf -i 5 -c 192.84.86.66 -t 60
------------------------------------------------------------
Client connecting to 192.84.86.66, TCP port 5001
TCP window size: 256 MByte (default)
------------------------------------------------------------
[ 3] local 192.84.86.179 port 33221 connected with 192.84.86.66 port 5001
[ 3]  0.0- 5.0 sec  2.73 GBytes  4.68 Gbits/sec
[ 3]  5.0-10.0 sec  3.73 GBytes  6.41 Gbits/sec
[ 3] 10.0-15.0 sec  3.73 GBytes  6.40 Gbits/sec
[ 3] 15.0-20.0 sec  3.73 GBytes  6.40 Gbits/sec

bbcp: uflight1.ultralight.org kernel using a send window size of 20971584 not 10485792
bbcp -s 8 -f -V -P 10 -w 10m big2.root [email protected]:/dev/null
bbcp: Sink I/O buffers (245760K) > 25% of available free memory (231836K); copy may be slow
bbcp: Creating /dev/null/big2.root
Source cpu=5.654 mem=0K pflt=0 swap=0
File /dev/null/big2.root created; 1826311140 bytes at 432995.1 KB/s
24 buffers used with 0 reorders; peaking at 0.
Target cpu=3.768 mem=0K pflt=0 swap=0
1 file copied at effectively 260594.2 KB/s

bbcp -s 8 -f -V -P 10 -w 10m big2.root [email protected]:dimitri
bbcp: uflight1.ultralight.org kernel using a send window size of 20971584 not 10485792
bbcp: Creating ./dimitri/big2.root
Source cpu=5.455 mem=0K pflt=0 swap=0
File ./dimitri/big2.root created; 1826311140 bytes at 279678.1 KB/s
24 buffers used with 0 reorders; peaking at 0.
Target cpu=10.065 mem=0K pflt=0 swap=0
1 file copied at effectively 150063.7 KB/s

Page 19: User Community Report

Data Transfers and Analysis

• CMS service challenges

• PhEDEx - the CMS system for data transfers Tier0 → Tier1 → Tier2
• Gain expertise with the system

• Provide user feedback

• Integrate storage/transfer (SRM/dCache/PhEDEx) with the network

• Analysis of data from the cosmic runs in collaboration with FNAL Tier1 (muons, calorimetry)

Page 20: User Community Report

Outlook on Data Transfers

• The UltraLight network is already very performant

• The hard problem from the user perspective now is to match it with servers capable of sustained rates for large files > 20 GB (when the memory caches are exhausted); fast disk writes are key (RAID arrays)

• To fill 10 Gb/s pipes we need several pairs (3-4) of servers

• In ramdisk tests we achieved 1.2 GB/s on read and 0.3 GB/s on write (cp, dd, bbcp); a rough sketch of such a test is shown after this list

• Next step: disk-to-disk transfers between Florida, Caltech, Michigan, FNAL, BNL, CERN
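
A rough Python sketch of such a ramdisk throughput test (the actual measurements used cp, dd and bbcp; the mount point and sizes here are assumptions):

import os, time

PATH = "/mnt/ramdisk/testfile"   # assumed ramdisk mount point
BLOCK = 64 * 1024 * 1024         # 64 MB per operation
COUNT = 32                       # 32 blocks -> 2 GB test file
buf = b"\0" * BLOCK

t0 = time.time()
with open(PATH, "wb") as f:
    for _ in range(COUNT):
        f.write(buf)
    f.flush()
    os.fsync(f.fileno())         # ensure the data really reaches the target
print("write: %.2f GB/s" % (BLOCK * COUNT / (time.time() - t0) / 1e9))

t0 = time.time()
with open(PATH, "rb") as f:
    while f.read(BLOCK):
        pass
# note: without dropping the page cache this measures cached reads
print("read:  %.2f GB/s" % (BLOCK * COUNT / (time.time() - t0) / 1e9))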

Page 21: User Community Report

UltraLight Analysis Environment

• Interact closely with the application group on integration of UltraLight services into the experiments' software environments:
  • Clarens web-services oriented framework
  • MCPS job submission
  • Grid Analysis Environment, etc.

See talk by Frank van Lingen – Application group

Page 22: User Community Report

Align with ATLAS/CMS/OSG Milestones

• ATLAS/CMS Software stacks are complex and still developing

> Integration work is challenging and constantly evolving

• Data and Service Challenges 2006
  > Exercise computing services together with LCG + centers
  > System scale: 50% of a single experiment's needs in 2007

• Computing, Software, Analysis (CSA) Challenges 2006

>Ensure readiness of software + computing systems for data

> 10M's of events through the entire system (incl. Tier2)
> Extensive needs for Tier2-to-Tier2 data exchanges; collaboration with DISUN

Page 23: User Community Report

Outlook

• Dedicated groups of people (expert users) are available for data transfer and analysis tasks

• Excellent collaboration with the networking and application groups

• A team for developing collaboration tools has been formed
• Explore commonalities and increase the participation of ATLAS at the analysis stage
• SC|05 was a great success, laying a solid foundation for the next steps
• We are actively involved in LHC physics preparations, e.g. the CMS Physics TDR
• The Physics Analysis group will play a key role in achieving successful integration of UltraLight applications in the experiments' analysis environments

Page 24: User Community Report

Backup slides

Page 25: User Community Report

Linux Kernel Tunings

• Edit sysctl.conf to add the following lines:
> net.core.rmem_default = 268435456
> net.core.wmem_default = 268435456
> net.core.rmem_max = 268435456
> net.core.wmem_max = 268435456
> net.core.optmem_max = 268435456
> net.core.netdev_max_backlog = 300000
> net.ipv4.tcp_low_latency = 1
> net.ipv4.tcp_timestamps = 0
> net.ipv4.tcp_sack = 0
> net.ipv4.tcp_rmem = 268435456 268435456 268435456
> net.ipv4.tcp_wmem = 268435456 268435456 268435456
> net.ipv4.tcp_mem = 268435456 268435456 268435456

• Apply the changes in sysctl.conf on the fly by executing: sysctl -p /etc/sysctl.conf

• Sizes ~ 256 MB worked best (bigger were not helpful)

Page 26: User Community Report

bbcp Examples: Caltech → Florida

[uldemo@nw1 dimitri]$ iperf -s -w 256m -i 5 -p 5001 -l 8960
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 512 MByte (WARNING: requested 256 MByte)
------------------------------------------------------------
[ 4] local 192.84.86.66 port 5001 connected with 192.84.86.179 port 33221
[ 4]  0.0- 5.0 sec  2.72 GBytes  4.68 Gbits/sec
[ 4]  5.0-10.0 sec  3.73 GBytes  6.41 Gbits/sec
[ 4] 10.0-15.0 sec  3.73 GBytes  6.40 Gbits/sec
[ 4] 15.0-20.0 sec  3.73 GBytes  6.40 Gbits/sec
[ 4] 20.0-25.0 sec  3.73 GBytes  6.40 Gbits/sec

bbcp -s 8 -f -V -P 10 -w 10m big2.root [email protected]:/dev/null
bbcp: Sink I/O buffers (245760K) > 25% of available free memory (853312K); copy may be slow
bbcp: Source I/O buffers (245760K) > 25% of available free memory (839628K); copy may be slow
bbcp: nw1.caltech.edu kernel using a send window size of 20971584 not 10485792
bbcp: Creating /dev/null/big2.root
Source cpu=5.962 mem=0K pflt=0 swap=0
File /dev/null/big2.root created; 1826311140 bytes at 470086.2 KB/s
24 buffers used with 0 reorders; peaking at 0.
Target cpu=4.053 mem=0K pflt=0 swap=0
1 file copied at effectively 263793.4 KB/s