June 2007 The State of TeraGrid: A National Production Cyberinfrastructure Facility Charlie Catlett,...
- Slide 1
- June 2007 The State of TeraGrid: A National Production
Cyberinfrastructure Facility Charlie Catlett, Chair, TeraGrid Forum
University of Chicago and Argonne National Laboratory
cec@uchicago.edu Dane Skow, Director, TeraGrid GIG University of
Chicago and Argonne National Laboratory dds@uchicago.edu
www.teragrid.org UNIVERSITY OF CHICAGO THESE SLIDES MAY BE FREELY
USED PROVIDING THAT THE TERAGRID LOGO REMAINS ON THE SLIDES, AND
THAT THE SCIENCE GROUPS ARE ACKNOWLEDGED IN CASES WHERE SCIENTIFIC
IMAGES ARE USED. (SEE SLIDE NOTES FOR CONTACT INFORMATION)
- Slide 2
- 9 Resource Providers, One Facility. Resource Providers (RPs): SDSC, TACC,
UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR. Software Integration Partners:
Caltech, USC/ISI, UNC/RENCI, UW. Grid Infrastructure Group: UChicago.
- Slide 3
- TeraGrid Vision. TeraGrid will create integrated, persistent, and
pioneering computational resources that will significantly improve our
nation's ability and capacity to gain new insights into our most
challenging research questions and societal problems. Our vision requires
an integrated approach to the scientific workflow, including obtaining
access, application development and execution, data analysis,
collaboration, and data management.
- Slide 4
- TeraGrid Objectives. DEEP Science: Enabling Petascale Science. Make
science more productive through an integrated set of very-high-capability
resources; address key challenges prioritized by users. WIDE Impact:
Empowering Communities. Bring TeraGrid capabilities to the broad science
community; partner with science community leaders via Science Gateways.
OPEN Infrastructure, OPEN Partnership. Provide a coordinated,
general-purpose, reliable set of services and resources; partner with
campuses and facilities.
- Slide 5
- How Are We Doing? Scientific impact: "TeraGrid is enabling significant
scientific advances spanning most of the NSF Directorates. The panel
believes the TG is proceeding well to meet its main objectives." Our team:
"A careful evaluation of the governance of the TG led the panel to a
better appreciation of the innovative methods being used to enhance
collaboration and consensus building across the geographically dispersed
participants in the project." Science Gateways Program: "[The panel was]
impressed by the potential for Science Gateways to enable new communities
and to provide new integrated technologies to a broader scientific
audience." Executive Summary of NSF Site Review report, February 2007.
- Slide 6
- Where Can We Improve? Communication of Impact: better publicize the
transformative science that has been done using the TG grid services;
consider a breakdown of the science nuggets, making it clear which of the
projects use the integrated grid resources and services, and which ones
could have been done using the resources of a single RP. Education,
Outreach, Training, Inclusion: the panel feels that stronger outreach and
education are integral to the successful integration of TG in the
scientific research environment. Four recommendations: outreach to
underrepresented groups; leverage EOT of other projects; expand EOT
partnerships (with societies, etc.); harvest user-developed EOT materials.
Understand Growth and Encourage New Communities: more detailed tracking
and follow-up of the DAC allocations; document actual use of each SG,
including (user- and discipline-based) demographics; more widespread
promotion of the SGs to potential user communities. NSF Site Review
report, February 2007.
- Slide 7
- Critical Areas of Technical Work. Security and Authorization: the
development and implementation of authentication and authorization models
that enable a transparent integration of current and future resources
remains an important technical challenge that must be addressed promptly.
Scheduling: make automated metascheduling a reality.
- Slide 8
- Pressing Forward: Organization. Our review indicates that the current
organization provides us a strong platform for moving forward, and a
strong platform for exploring optimization: more efficient movement from
discussion to consensus, and translation of consensus to action; more
effectively tapping strategic expertise across the project. Key TeraGrid
organizational building blocks: persistent working groups; agile RATs;
rich communications (weekly all-hands, quarterly management, etc.); the RP
Forum as a representative body. Advisory groups need further optimization.
GIG: Executive Steering Committee (ESC). Overall TeraGrid:
Cyberinfrastructure User Advisory Committee (CUAC).
- Slide 9
- Next Steps on Organization. Governance RAT: the RP Forum as a basis for
consensus-based democracy; understand leadership roles in the RPF and GIG.
Cyberinfrastructure User Advisory Committee: a strong need for a science
advisory board that could provide strategic guidance to TeraGrid; focus
specifically on TeraGrid, with a clearly delineated mission for this panel
that distinguishes its focus from those of other advisory groups
associated with the TeraGrid. New GIG Leadership!
- Slide 10
- On a Personal Note Thank you
- Slide 11
- Where Are We Now? 2006 was a break-out year: growth by every metric; new
science successes; first gateways into production; initial, strong
adoption of new grid capabilities.
- Slide 12
- Allocations and Usage, FY05 vs. FY06 (% change):
  Allocations:
  LRAC proposals awarded: 62 (13 new) -> 88 (22 new), +42 (+69)
  MRAC proposals awarded: 70 (50 new) -> 160 (92 new), +129 (+84)
  TeraGrid DAC proposals awarded: 123 (115 new) -> 229 (209 new), +86 (+82)
  Active TeraGrid PIs: 361 -> 1,019, +182
  Usage:
  NUs Requested (LRAC/MRAC/DAC): 1.3 B -> 2.96 B, +130
  NUs Awarded: 844 M -> 1.92 B, +128
  NUs Available (max): 881 M -> 2.23 B, +153
  NUs Delivered (% util): 565 M (64%) -> 1.28 B (57%), +129 (-11)
  NUs used by TG Staff: 10.4 M -> 10.1 M
  Jobs run: 594,756 -> 1,686,686, +185
  Users (Total):
  Users with active accounts during the year: 1,712 -> 4,190, +145
  Users charging jobs during the year: 876 -> 1,731, +98
  Users with active accounts on December 31: 1,468 -> 3,126, +113
  User home institutions (users charging jobs): 151 -> 265, +76
  US states (incl. DC/PR) (users charging jobs): 37 -> 47, +27
  Users by allocation size (# charging jobs):
  LRAC users: 509 (238) -> 1,152 (496), +126 (+108)
  MRAC users: 542 (248) -> 1,087 (423), +101 (+71)
  DAC users: 661 (365) -> 1,948 (783), +195 (+116)
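The "% change" figures in the table above can be spot-checked directly from the FY05 and FY06 counts; a minimal sketch (all numbers copied from the slide):

```python
# FY05 -> FY06 counts from the table above; pct() reproduces the
# "% change" column, rounded to the nearest percent.
def pct(fy05: int, fy06: int) -> int:
    return round(100 * (fy06 - fy05) / fy05)

print(pct(62, 88))      # 42  (LRAC proposals awarded)
print(pct(70, 160))     # 129 (MRAC proposals awarded)
print(pct(123, 229))    # 86  (DAC proposals awarded)
print(pct(361, 1019))   # 182 (active TeraGrid PIs)
print(pct(1712, 4190))  # 145 (users with active accounts)
```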
- Slide 13
- TeraGrid Resource Changes. Compute additions: NCSA (Cobalt, Copper, Xeon
Linux Supercluster, Condor Cluster); SDSC (DataStar p655, DataStar p690,
BlueGene); Purdue (Condor+); TACC (LoneStar+); IU (BigRed); PSC (BigBen+).
Storage additions: GPFS-WAN (+800 TB); IU (Tape Archive); SDSC (Data
Collections, Database Service). Retirements: PSC (TCS1); IU (IA-32 &
IA-64). Upcoming: TACC (Ranger, Jan 2008); NCAR (Frost, Dec 2007?)
HPCOPS; NCSA (Abe, June 2007?); + ? 2nd Track 2 machine ($30M) (announce
Oct 2007); Track 1 machine ($200M) (announce Oct 2007).
- Slide 14
- TeraGrid Usage: 33% Annual Growth. (Chart: Normalized Units delivered,
millions, split into Specific Allocations vs. Roaming Allocations.)
TeraGrid currently delivers an average of 400,000 CPU-hours per day
-> ~20,000 CPUs DC. Dave Hart (dhart@sdsc.edu)
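The "~20,000 CPUs" figure is consistent with the stated delivery rate: 400,000 CPU-hours spread over a 24-hour day is roughly 16,700 continuously busy CPUs, which lands near 20,000 installed CPUs once utilization is under 100%. A quick check (reading "DC" as duty cycle, and the ~83% utilization figure, are my assumptions, not from the slide):

```python
# 400,000 CPU-hours delivered per day, 24 hours in a day:
busy_cpus = 400_000 / 24
print(round(busy_cpus))  # 16667 CPUs busy around the clock

# At an assumed ~83% utilization, that implies about 20,000 installed CPUs.
installed = busy_cpus / 0.83
print(round(installed, -3))  # ~20,000
```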
- Slide 15
- TeraGrid User Community Dave Hart (dhart@sdsc.edu)
- Slide 16
- TeraGrid User Community Gateways Dave Hart (dhart@sdsc.edu) Growth
Target
- Slide 17
- TeraGrid is Like an Accelerator... Deep: Dedicated Experiments, Unique
Machine. Wide: Education Center, Computing Center. Open. (Remaining
slide-image labels: Fishing, Buffalo.)
- Slide 18
- ...or more like Many Accelerators. Not even HEP has solved the challenge
of integrating these!
- Slide 19
- Sergiu Sanielevici (sergiu@psc.edu) Advanced Support for TeraGrid
Applications (ASTA)
- Slide 20
- Searching for New Crystal Structures. Deem (Rice): searching for new 3-D
zeolite crystal structures. Database of 3.4M+ structures created in 1 year
(20,000x). "We're working with a major oil company to look at the
structures in hopes of finding new catalysts for chemical and
petrochemical applications," said Deem. "This project could not have been
accomplished in a one- to three-year time frame without the TeraGrid."
http://www.physorg.com/news85255507.html
- Slide 21
- Predicting Storms. Hurricanes and tornadoes cause massive loss of life
and damage to property. The underlying physical systems involve highly
non-linear dynamics, so forecasting is computationally intense. Data comes
from multiple sources: real-time data derived from streams of sensor
observations, and archives of past storms in databases. Infrastructure
challenges: data-mine instrument radar data for storms; allocate
supercomputer resources automatically to run forecast simulations; monitor
results and retarget instruments; log provenance and metadata about
experiments for auditing. Slides courtesy Dennis Gannon and the LEAD
Collaboration.
- Slide 22
- Experience so far. First release supported WxChallenge, the new
collegiate weather forecast challenge. The goal: forecast the maximum and
minimum temperatures, precipitation, and maximum sustained wind speeds for
select U.S. cities, giving students an opportunity to compete against
their peers and faculty meteorologists at 64 institutions for honors as
the top weather forecaster in the nation. 79 users ran 1,232 forecast
workflows generating 2.6 TBytes of data. Over 160 processors were reserved
on Tungsten from 10am to 8pm EDT (EST), five days each week. National
Spring Forecast: first use of user-initiated 2 km forecasts as part of
that program; generated serious interest from the National Severe Storm
Center.
- Slide 23
- Solve any Rubik's Cube in 26 moves? Rubik's Cube is perhaps the most
famous combinatorial puzzle of its time: more than 43 quintillion states
(4.3x10^19). Gene Cooperman and Dan Kunkle of Northeastern Univ. just
proved any state can be solved in 26 moves. 7 TB of distributed storage on
TeraGrid allowed them to develop the proof. URL:
http://www.physorg.com/news99843195.html
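The "43 quintillion" figure follows from the standard counting argument for the cube's reachable states (the counting argument is background knowledge, not from the slide); a quick sketch:

```python
import math

# 8 corners can be permuted (8!) and each twisted 3 ways, but the total
# twist is constrained (divide by 3); 12 edges can be permuted (12!) with
# 2 flips each, total flip constrained (divide by 2); and corner/edge
# permutation parities must match (divide by 2 again).
states = (math.factorial(8) * 3**8 // 3) \
       * (math.factorial(12) * 2**12 // 2) // 2
print(states)         # 43252003274489856000
print(states / 1e19)  # ~4.3, matching the slide's 4.3x10^19
```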
- Slide 24
- TeraGrid is Like a CAMAC Crate: standard instrumentation infrastructure
(backplane); wide variety of components built on top of that
infrastructure; can be easily partitioned, federated, and replicated.
CTSS v1 (30+ pkgs)... CTSS v2 (slightly smaller)... CTSS v3 (add web
services, even smaller)... CTSS v4 (6/07): small core plus optional kits.
- Slide 25
- Lower Integration Barriers; Improved Scaling. Initial integration,
implementation-based: Coordinated TeraGrid Software and Services (CTSS)
provides software for heterogeneous systems, leveraging specific
implementations to achieve interoperation, with an evolving understanding
of the minimum required software set for users. Emerging architecture,
services-based: core services are the capabilities that define a TeraGrid
resource: Authentication & Authorization Capability; Information Service;
Auditing/Accounting/Usage Reporting Capability; Verification & Validation
Mechanism. This set is significantly smaller than the current set of
required components and provides a foundation for value-added services.
Each Resource Provider selects one or more added services, or kits. Core
and individual kits can evolve incrementally, in parallel. Lower User
Barriers; Increase Security.
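The core-plus-kits split above amounts to a simple validity rule: a system counts as a TeraGrid resource only if the full core is present, while kits vary per RP. A minimal sketch of that rule (the core service names follow the slide; the kit names, RPs, and data structures are hypothetical illustrations, not TeraGrid software):

```python
# Core services that define a TeraGrid resource (from the slide).
CORE_SERVICES = {
    "authn-authz",          # Authentication & Authorization Capability
    "information-service",  # Information Service
    "accounting",           # Auditing/Accounting/Usage Reporting Capability
    "verify-validate",      # Verification & Validation Mechanism
}

def is_teragrid_resource(installed: set) -> bool:
    """A resource qualifies only if every core service is present."""
    return CORE_SERVICES <= installed

# Hypothetical RPs: each runs the core plus its own choice of kits.
rp_a = CORE_SERVICES | {"remote-compute-kit", "data-movement-kit"}
rp_b = CORE_SERVICES | {"visualization-kit"}
rp_c = {"authn-authz", "information-service"}  # incomplete core

print(is_teragrid_resource(rp_a))  # True
print(is_teragrid_resource(rp_b))  # True
print(is_teragrid_resource(rp_c))  # False
```

The design point the slide makes is exactly this separation: the core can stay small and stable while kits evolve independently per RP.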
- Slide 26
- TeraGrid Usage Modes in CY2006 (use modality: community size, est.
number of projects): Batch Computing on Individual Resources, 850;
Exploratory and Application Porting, 650; Workflow, Ensemble, and
Parameter Sweep, 160; Science Gateway Access, 100; Remote Interactive
Steering and Visualization, 35; Tightly-Coupled Distributed Computation,
10. ("Grid-y" users.)
- Slide 27
- Monthly Use of Selected Grid Capabilities, January 2005 through April
2007: MyCluster CPUs; MyCluster jobs; Globus GRAM jobs; QBETS queries;
Globus GRAM users; synchronous cross-site jobs.
- Slide 28
- DAC Roaming Behavior, 2006 Data (resources used: TG DACs, total TGSUs):
  1: 143, 1,745,314
  2: 60, 919,461
  3: 46, 664,231
  4: 16, 351,340
  5: 8, 183,271
  6: 5, 153,083
  7: 1, 64,270
  8: 1, 3,878
  9: 1, 6,979
  10: 2, 25,121
  12: 1, 97,774
  Grand total: 284 DACs, 4,214,722 TGSUs.
  Analysis and chart courtesy Dave Hart, SDSC. 284 active DACs in 2006:
  ~10X growth!! 25%
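The reconstructed roaming table can be checked against the slide's own grand totals (284 DACs, 4,214,722 TGSUs), which is how the digit grouping above was verified:

```python
# Rows reconstructed from the slide: (resources used, TG DACs, total TGSUs).
rows = [
    (1, 143, 1_745_314), (2, 60, 919_461), (3, 46, 664_231),
    (4, 16, 351_340), (5, 8, 183_271), (6, 5, 153_083),
    (7, 1, 64_270), (8, 1, 3_878), (9, 1, 6_979),
    (10, 2, 25_121), (12, 1, 97_774),
]

total_dacs = sum(d for _, d, _ in rows)
total_sus = sum(s for _, _, s in rows)
print(total_dacs, total_sus)  # 284 4214722, matching the grand total

# Share of DACs that roamed across more than one resource:
roaming = sum(d for n, d, _ in rows if n > 1) / total_dacs
print(round(roaming * 100))  # 50 (percent)
```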
- Slide 29
- Information Services (MDS) Kit Registry
- Slide 30
- Drives This
- Slide 31
- And This Google Earth GIN demo
http://www.physorg.com/news82811067.html
- Slide 32
- TeraGrid is a Social Network. TeraGrid conference is going great!
LRAC/MRAC liaisons. SGW community very successful: mailing
list/phonecon/wiki; transitioning to a consulting model. CI Days: campus
outreach; OSG/Internet2/NLR/EDUCAUSE/MSI-CIEC partnership. HPC University:
OSG, Shodor, Krell, OSC, NCSI, MSI-CIEC partnership. CI-TEAM Workshop,
July 9-11, for CI-TEAM awardees and aspiring grantees; apply now!
Education and outreach engaging thousands of people.
- Slide 33
- TeraGrid Science Gateways Initiative: Community Interface to Grids.
Common web portal or application interfaces (database access, computation,
workflow, etc.). Back-end use of TeraGrid computation, information
management, visualization, or other services. 4 talks on cross-grid work
at this conference; 3 in the Science track.
- Slide 34
- HPC University Goals. Advance researchers' HPC skills: searchable
catalog of live and self-paced training; scheduled series of training
courses; gap analysis of materials to drive development. Work with
educators to enhance the curriculum: searchable catalog of HPC resources;
scheduled workshops for curricular development; leverage the good work of
others. Offer student research experiences: enrollment in HPC internship
opportunities; student competitions. Publish science and education impact:
promote via TeraGrid Science Highlights and iSGTW; publish education
resources to NSDL-CSERD.
- Slide 35
- Workshop and Training Sites in 2007. (Map legend: TeraGrid RP; Minority
Serving Institution; Research 1 Univ.; 2/4 Yr. College; Workshop;
Conference; Tutorial; TeraGrid '07.)
- Slide 36
- The Commercial/Public World is Moving FAST! Two examples of communities:
the search for Jim Gray (over 12,000 people helped search!), and a
Facebook group on a math love song (search "Finite Simple Group" by the
Klein Four on YouTube).
- Slide 37
- TeraGrid is: Operations. We have facilities and services on which users
rely; we provide infrastructure on which other providers build. AND R&D:
we're learning how to do distributed, collaborative science on a global,
federated infrastructure; we're learning how to run multi-institution
shared infrastructure.
- Slide 38
- Looking to the Future: focus on operations and transparency; add more
resources into the TeraGrid framework; documentation and training; data
movement; scheduling and info services; federation with partner grids and
campuses.
- Slide 39
- Backup Slides
- Slide 40
- A Walk Down Memory Lane. 1985: TCP/IP won the network dominance wars; a
1-MIPS analysis machine = $750,000. 1990: rise of the RISC workstation; a
65 MB hard drive = $350. 1995: web browsers (Mosaic) hit the scene (the
Internet begins); #1 machine on the Top 500 = 250 Gflops (Nov. 1996).
2000: the Y2K bug and Napster raise IT awareness; triumph of Linux farms
in HEP. 2005: search becomes king; crowdsourcing; TeraGrid begins
operation; the era of the production grid begins. 2010: ?? What will be
the effects of multicore ??
- Slide 41
- TeraGrid Objectives. DEEP Science: Enabling Petascale Science. Make
science more productive through an integrated set of very-high-capability
resources; address key challenges prioritized by users. WIDE Impact:
Empowering Communities. Bring TeraGrid capabilities to the broad science
community; partner with science community leaders via Science Gateways.
OPEN Infrastructure, OPEN Partnership. Provide a coordinated,
general-purpose, reliable set of services and resources; partner with
campuses and facilities.
- Slide 42
- Real-Time Usage Mashup (alpha version). Mashup tool by Maytal Dahan,
Texas Advanced Computing Center (maytal@tacc.utexas.edu). 309 jobs running
across 9,336 processors at 22:34 06/02/2007.
- Slide 43
- Org Chart
- Slide 44
- Summary of Publications study
- Slide 45
- Networking. (Backbone diagram: hubs at LA, DEN, and CHI; SDSC, UC/ANL,
PSC, TACC, ORNL, NCSA, NCAR, PU (IPGrid), IU, and Cornell connected at
1x10G to 3x10G each; 2x10G links between hubs; Abilene peering.)
- Slide 46
- TeraGrid User Community Growth. Begin TeraGrid production services
(October 2004); incorporate NCSA and SDSC core (PACI) systems and users
(April 2006). Decommissioning of systems typically causes slight
reductions in active users; e.g., December 2006 reflects the
decommissioning of Lemieux (PSC). FY05 -> FY06: new user accounts,
948 -> 2,692; avg. new users per quarter, 315 -> 365*; active users,
1,350 -> 3,228; all users ever, 1,799 -> 4,491. (*FY06 new users/qtr
excludes Mar/Apr 2006.)
- Slide 47
- TeraGrid Resources for Scientific Discovery. Computing: over 250 TFlops
and growing; common help desk and consulting requests; CTSS software
environment; remote visualization servers and visualization software. Data
management: over 20 Petabytes of storage; over 100 scientific data
collections. Broadening participation in TeraGrid: over 20 Science
Gateways; Advanced Support for TeraGrid Applications; education and
training events and resources. Access: common allocations mechanism (DAC,
MRAC, and LRAC). Security: Shibboleth testbed underway for campus
authentication.
- Slide 48
- TeraGrid Projects by Institution (map key: Blue, 10 or more PIs; Red,
5-9 PIs; Yellow, 2-4 PIs; Green, 1 PI). 1,000 projects, 3,200 users.
TeraGrid allocations are available to researchers at any US educational
institution by peer review. Exploratory allocations can be obtained
through a biweekly review process. See www.teragrid.org.
- Slide 49
- Popular Resources for DAC Awards
- Slide 50
- Grid Service Usage (PreWS GRAM). Daily INCA reporter
(http://tinyurl.com/23ugbm), courtesy Kate Ericson, SDSC.
- Slide 51
- Daily GT4 WS Invocation Reports Graph courtesy Tony Rimovsky, NCSA
- Slide 52
- User Portal Additions
- Slide 53
- Data as a Resource: what can we say here about progress/status?