
Louisiana Tech Site Report


Page 1: Louisiana Tech Site Report

DOSAR Workshop V

September 27, 2007

Michael Bryant, Louisiana Tech University

Louisiana Tech Site Report

Page 2: Louisiana Tech Site Report


Louisiana Tech University and LONI

COMPUTING IN LOUISIANA

Page 3: Louisiana Tech Site Report

• At the Center for Applied Physics Studies (CAPS):
  ▫ Small 8-node cluster with 28 processors (60 Gigaflops)
     Used by our local researchers and the Open Science Grid
     Dedicated Condor Pool of both 32-bit and 64-bit (w/ compat) machines running RHEL5
• Additional resources at LTU through the Louisiana Optical Network Initiative (LONI)
  ▫ Intel Xeon 5TF Linux cluster (not yet ready):
     128 nodes (512 CPUs), 512 GB RAM
     4.772 TF peak performance
  ▫ IBM Power5 AIX cluster
     13 nodes (104 CPUs), 224 GB RAM
     0.851 TF peak performance


Computing Locally at LTU

Page 4: Louisiana Tech Site Report

• Focused on High Energy Physics, High Availability (HA) and Grid computing, and Biomedical Data Mining
  ▫ High Energy Physics:
     Fermilab (D0), CERN (ATLAS), and ILC: Dr. Lee Sawyer, Dr. Dick Greenwood (Institutional Rep.), Dr. Markus Wobisch
       » Joe Steele is now at TRIUMF in Vancouver
     Jefferson Lab (G0, Qweak experiments): Dr. Kathleen Johnston, Dr. Neven Simicevic, Dr. Steve Wells, Dr. Klaus Grimm
  ▫ HA and Grid computing:
     Dr. Box Leangsuksun
     Vishal Rampure
     Michael Bryant (me)


Louisiana Tech Researchers

Page 5: Louisiana Tech Site Report

• 40 Gb/sec bandwidth state-wide
• Next-generation network for research
• Connected to the National LambdaRail (NLR, 10 Gb/sec) in Baton Rouge
• Spans 6 universities and 2 health centers

The Louisiana Optical Network Initiative (LONI) is a high speed computing and networking resource supporting scientific research and the development of new technologies, protocols, and applications to positively impact higher education and economic development in Louisiana.


Louisiana Optical Network Initiative

- http://loni.org

Page 6: Louisiana Tech Site Report

• 1 x Dell 50 TF Intel Linux cluster housed at the state's Information Systems Building (ISB)
  ▫ "Queen Bee," named after Governor Kathleen Blanco, who pledged $40 million over ten years for the development and support of LONI
  ▫ 680 nodes (5,440 CPUs), 5,440 GB RAM

     Two quad-core 2.33 GHz Intel Xeon 64-bit processors per node
     8 GB RAM per node
  ▫ Measured 50.7 TF peak performance
  ▫ According to the June 2007 Top500 list*, Queen Bee ranked as the 23rd fastest supercomputer in the world

• 6 x Dell 5 TF Intel Linux clusters housed at 6 LONI member institutions
  ▫ 128 nodes (512 CPUs), 512 GB RAM
     Two dual-core 2.33 GHz Xeon 64-bit processors per node
     4 GB RAM per node
  ▫ Measured 4.772 TF peak performance

• 5 x IBM Power5 575 AIX clusters housed at 5 LONI member institutions
  ▫ 13 nodes (104 CPUs), 224 GB RAM
     Eight 1.9 GHz IBM Power5 processors per node
     16 GB RAM per node
  ▫ Measured 0.851 TF peak performance


LONI Computing Resources

* http://top500.org/list/2007/06/100

Combined total of 84 Teraflops
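
A rough check from the per-cluster peak figures above: 50.7 TF + 6 x 4.772 TF + 5 x 0.851 TF ≈ 50.7 + 28.6 + 4.3 ≈ 83.6 TF, consistent with the quoted 84 Teraflops combined total.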

Page 7: Louisiana Tech Site Report

[Diagram: the National LambdaRail, the Louisiana Optical Network, LONI member sites, the IBM P5 supercomputers, the Dell 80 TF cluster, and "NEXT ???"]


LONI: The big picture…by Chris Womack

Page 8: Louisiana Tech Site Report

• Goal: enable domain scientists to focus on their primary research problem, assured that the underlying infrastructure will manage the low-level data handling issues.

• Novel approach: treat data storage resources and the tasks related to data access as first-class entities, just like computational resources and compute tasks.

• Key technologies being developed: data-aware storage systems, data-aware schedulers (e.g., Stork), and a cross-domain metadata scheme.

• Provides an additional 200 TB of disk and 400 TB of tape storage.


Page 9: Louisiana Tech Site Report

[Map: PetaShare participating institutions connected through LONI — LSU, LaTech, Tulane, UNO, and ULL — with research areas including High Energy Physics, Biomedical Data Mining, Coastal Modeling, Petroleum Engineering, Synchrotron X-ray Microtomography, Computational Fluid Dynamics, Biophysics, Molecular Biology, Computational Cardiac Electrophysiology, and Geology]

Participating institutions in the PetaShare project, connected through LONI. Sample research of the participating researchers is pictured (e.g., biomechanics by Kodiyalam & Wischusen, tangible interaction by Ullmer, coastal studies by Walker, and molecular biology by Bishop).

Page 10: Louisiana Tech Site Report


LONI and the Open Science Grid

ACCESSING RESOURCES ON THE GRID

Page 11: Louisiana Tech Site Report

• Located here at Louisiana Tech University
• OSG 0.6.0 production site
• Using our small 8-node Linux cluster
  ▫ Dedicated Condor Pool using 20 of the 28 CPUs (see the submit-file sketch below)
  ▫ 8 nodes (28 CPUs), 36 GB RAM
     2 x dual 2.2 GHz Xeon 32-bit processors, 2 GB RAM per node
     2 x dual 2.8 GHz Xeon 32-bit processors, 2 GB RAM per node
     2 x dual 2.0 GHz Opteron 64-bit processors, 2 GB RAM per node
     1 x two quad-core 2.0 GHz Xeon 64-bit processors, 16 GB RAM
     1 x two quad-core 2.0 GHz Xeon 64-bit processors, 8 GB RAM
• We would like to…
  ▫ Expand to a Windows Co-Linux Condor Pool
  ▫ Combine with the IfM and CS clusters
• Plan to move to the OSG ITB when the LONI 5TF Linux cluster at LTU becomes available


OSG Compute Element: LTU_OSG
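
The dedicated Condor pool described above mixes 32-bit and 64-bit machines, so a job has to state what it can run on. Below is a minimal submit-file sketch; the executable, arguments, and file names are hypothetical and are not taken from the actual LTU_OSG configuration.

  # sim_job.submit -- hypothetical example; executable and file names are illustrative only
  universe     = vanilla
  executable   = run_simulation.sh
  arguments    = events.cfg

  # Restrict to the 64-bit Linux machines in the pool; relax the Arch clause
  # (e.g., allow Arch == "INTEL") if the binary also runs on the 32-bit nodes.
  requirements = (Arch == "X86_64") && (OpSys == "LINUX")

  output       = sim.$(Cluster).$(Process).out
  error        = sim.$(Cluster).$(Process).err
  log          = sim.$(Cluster).log

  should_transfer_files   = YES
  when_to_transfer_output = ON_EXIT

  queue 1

Submitted with condor_submit sim_job.submit, such a job is matched by the pool against the 20 dedicated CPUs.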

Page 12: Louisiana Tech Site Report

• Located at the Center for Computation & Technology (CCT) at Louisiana State University (LSU) in Baton Rouge, La.
• OSG 0.6.0 production site
• Using the LONI 5TF Linux cluster at LSU (Eric)
  ▫ PBS opportunistic single-processor queue (see the job-script sketch below)
  ▫ Only 64 CPUs (16 nodes) available out of the 512 CPUs total
     128 nodes, 512 GB RAM
     Two dual-core 2.33 GHz Xeon 64-bit processors, 4 GB RAM per node
  ▫ The 16 nodes are shared with other PBS queues
• Played a big role in the DZero reprocessing effort
  ▫ Dedicated access to the LONI cluster during reprocessing
  ▫ 384 CPUs total were used simultaneously
• Continuing to run DZero MC production at both sites


OSG Compute Element: LTU_CCT
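
Because LTU_CCT feeds work into an opportunistic single-processor PBS queue on Eric, each grid job ultimately runs on the cluster as an ordinary one-CPU batch job. The sketch below is illustrative only: the queue name, walltime, and payload script are assumptions, and only the one-node, one-processor request reflects the setup described above.

  #!/bin/bash
  # d0_mc_job.pbs -- hypothetical single-processor job for an opportunistic
  # queue on Eric; queue name and walltime are illustrative assumptions.
  #PBS -q single
  #PBS -l nodes=1:ppn=1
  #PBS -l walltime=04:00:00
  #PBS -N d0_mc_job
  #PBS -j oe

  cd "$PBS_O_WORKDIR"
  ./run_d0_mc.sh events.cfg    # hypothetical payload script

Locally, qsub d0_mc_job.pbs places it in the queue; incoming grid jobs are submitted to PBS on the user's behalf by the compute element.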

Page 13: Louisiana Tech Site Report


Reprocessing at LTU_CCT

[Chart: reprocessing at LTU_CCT (LONI)]

Page 14: Louisiana Tech Site Report


Reprocessing at LTU_CCT (cont.)

[Chart: reprocessing at LTU_CCT (LONI), continued]

Page 15: Louisiana Tech Site Report


DZero MC Production for LTU*

[Charts: weekly DZero MC production by site and cumulative production by site]

8.5 million events total

* LTU_CCT and LTU_OSG are combined

Page 16: Louisiana Tech Site Report


LONI OSG CEs and PanDA Scalability + High Availability

CURRENT STATUS AND FUTURE PLANS

Page 17: Louisiana Tech Site Report

• Upgraded to OSG 0.6.0
• Upgraded to RHEL5
• Added two new Dell Precision Workstations (16 CPUs; two quad-core 2.0 GHz Xeon 64-bit processors each, with 16 GB and 8 GB RAM)
• Connected to the LONI 40 Gbps network in June (finally!)
  ▫ Allows us to run D0 MC again
• Running DZero MC production jobs (sent using Joel's AutoMC daemon)
• Installed standalone Athena 12.0.6 on caps10 for testing ATLAS analysis


Current Status of LTU_OSG

Page 18: Louisiana Tech Site Report

• Switched to the LONI 5TF (Eric) cluster from SuperMike/Helix
• Upgraded to OSG 0.6.0
• Running DZero MC production jobs (sent using Joel's AutoMC daemon)
• Running ATLAS production test jobs
  ▫ Problems so far (a diagnostic sketch follows below):
     Pacman following symlinks! (/panasas/osg/app -> /panasas/osg/grid/app on the headnode)
     Conflict with 32-bit Python install on 64-bit OS (https:// not supported)
     OSG_APP Python path was wrong
     Incorrect Tier2 DQ2 URL
  ▫ 3 successful tests; need a few more before running full production


Current Status of LTU_CCT
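
The symlink and Python problems above are the kind of thing that can be confirmed directly on the headnode. The commands below are a generic diagnostic sketch, not the exact steps used at LTU_CCT; only the paths come from the slide.

  # Is the OSG application area a symlink that tools such as Pacman will
  # resolve to a different physical path? (paths as reported on the slide)
  ls -ld /panasas/osg/app
  # expected: /panasas/osg/app -> /panasas/osg/grid/app

  # Which Python is first in PATH, and is it a 32-bit or 64-bit build?
  which python
  python -c "import struct; print struct.calcsize('P') * 8"   # prints 32 or 64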

Page 19: Louisiana Tech Site Report

• Create OSG CEs at each of the six LONI sites
• Possibly create a LONI state-wide grid
  ▫ Tevfik Kosar is building a campus grid at LSU
• Begin setting up PetaShare storage at each LONI site
• PanDA scalability tests on Queen Bee
  ▫ Proposing to the PanDA team and the LONI allocation committee
• Involving other non-HEP projects in DOSAR using PanDA (see talk tomorrow)
• Applying HA techniques to PanDA and the Grid (see talk tomorrow)


What’s next?

Page 20: Louisiana Tech Site Report


QUESTIONS / COMMENTS?