Clemson Computing and Information Technology
Introduction to Cyberinfrastructure (aka: Grids, Clouds, etc.)
Creating an Integrated South Carolina Environmental and Human Health Grid: Using Cyberinfrastructure to Link Hazards, Exposures and Health Effects
June 25, 2009
Jim Bottum and Jill Gemmill
Clemson Computing and Information Technology
Cyberinfrastructure (CI)
Computing and Communications
Data – Storage, retrieval, archiving, visualization, mining, security
Virtual Organizations – Software environments for communities
Education and Workforce Training
Cyberinfrastructure has a four-legged strategic approach.
Cyberinfrastructure = Information technology we all use
CI enables Scaling up Science: Citation Network Analysis in Sociology
[Figure: citation network panels labeled 1975, 1980, 1985, 1990, 1995, 2000, and 2002]
Work of James Evans, University of Chicago, Department of Sociology
Scaling up the analysis
• Query and analysis of 25+ million citations
• Higher throughput and capacity enable deeper analysis and broader community access.
• Analysis began on desktop workstations
• Queries grew to month-long duration
• Moved analysis to U of Chicago compute cluster: 50 (faster) CPUs gave 100X speedup
• Many more methods and hypotheses can be tested!
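The speedup quoted above can be sanity-checked with a little arithmetic. The sketch below assumes that a "month-long" query means roughly 30 days and that the work divides evenly across CPUs; both are illustrative assumptions, not figures from the study.

```python
# Back-of-the-envelope check of the cluster speedup quoted on the slide.
# Assumptions (not from the source): a "month-long" query is ~30 days,
# and the work divides evenly across the CPUs.

CPUS = 50               # cluster size quoted on the slide
SPEEDUP = 100           # overall speedup quoted on the slide
MONTH_HOURS = 30 * 24   # assumed length of a month-long query, in hours


def per_cpu_gain(speedup: float, cpus: int) -> float:
    """Speedup left over after dividing out the CPU count."""
    return speedup / cpus


def runtime_after(hours: float, speedup: float) -> float:
    """Runtime in hours once the overall speedup is applied."""
    return hours / speedup


if __name__ == "__main__":
    gain = per_cpu_gain(SPEEDUP, CPUS)
    hours = runtime_after(MONTH_HOURS, SPEEDUP)
    print(f"each cluster CPU is roughly {gain:.0f}x faster than the desktop")
    print(f"a {MONTH_HOURS}-hour query drops to about {hours:.1f} hours")
```

Under these assumptions the 100X figure splits into 50X from parallelism and about 2X from faster CPUs, which is consistent with the slide's "(faster)" remark.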
CI is the convergence of technologies:
• Telephone / Voice
• Photography / Images
• Video / Movies / Animation
• Information / Data / Library
Clemson CI Vision
“Cyberinfrastructure is the primary backbone that ties together innovation in research, instruction, and service to elevate Clemson to the Top 20”
Dori Helms, Provost
High Performance CI is similar, but scaled up in size and complexity
Independent computations can be done in parallel
• Many computations (modeling or analysis of data) consist of:
  • Large datasets as inputs (find datasets)
  • “Transformations” which work on the input datasets (process)
  • The output datasets (store and publish)
[Figure: Montage Workflow, ~1200 jobs, 7 levels. NVO, NASA, ISI/Pegasus - Deelman et al. Legend: data transfer, compute job]
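The input, transformation, output pattern above can be sketched in a few lines of Python. This is a minimal illustration, not the Montage or Pegasus implementation: `transform` and `run_workflow` are hypothetical names, and the "datasets" are small lists of integers standing in for real inputs.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(dataset: list[int]) -> int:
    """Hypothetical transformation: reduce one input dataset to one value."""
    return sum(x * x for x in dataset)


def run_workflow(datasets: list[list[int]], workers: int = 4) -> int:
    """Two-level workflow.

    Level 1: transform every dataset independently, in parallel.
    Level 2: combine the per-dataset outputs into a single result.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so outputs line up with datasets.
        partial_results = list(pool.map(transform, datasets))
    return sum(partial_results)


if __name__ == "__main__":
    inputs = [[1, 2], [3], [4, 5, 6]]
    print(run_workflow(inputs))
```

Because no transformation depends on another's output, the level-1 jobs can run in any order or all at once. For CPU-bound work in CPython, `ProcessPoolExecutor` offers the same `map` interface and avoids contention on the interpreter lock.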
Clusters: High Performance Computing
Clemson Palmetto Cluster: #60 supercomputer, #4 among stand-alone US universities. 756 “PCs” (commodity hardware) with closely coupled, high-speed communications and large-scale, very fast storage.
• ½ petabyte of storage
• 45 teraflops
• 756 compute nodes
• High-throughput interconnects (10 Gbps)
What problems? (months to minutes)
• Estimation and Inference about Efficiency in Production Settings - Paul Wilson, Economics
• Protein Interaction with Synthetic Materials - Robert Latour, Bioengineering
• Effects of School Characteristics and Parents' Work Behaviors on Children's Performance in School: modeling the parents' choices of places of residence and their labor market behaviors (over 15,000 school districts) - Tom Mroz, Economics
High Throughput Computing
Harvesting unused student lab computer cycles using Condor software
1700 lab PCs can be used as a supercomputer
• Small amounts of storage
• Low-throughput connections
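To give a feel for how work is handed to Condor, the sketch below builds a submit description that queues many independent runs of one program. The helper name `condor_submit_description` and the file names are hypothetical; the commands it emits (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `queue`) are standard submit-description commands as far as I know, but the result should be checked against the local installation before use.

```python
def condor_submit_description(executable: str, n_jobs: int) -> str:
    """Build a submit description that queues n_jobs independent runs.

    Each run receives its own index through the $(Process) macro, so one
    description fans out into many jobs on otherwise idle machines.
    File names below are illustrative placeholders.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        "arguments  = $(Process)",
        "output     = job.$(Process).out",
        "error      = job.$(Process).err",
        "log        = jobs.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical program name; the text would be saved and passed to condor_submit.
    print(condor_submit_description("analyze.sh", 100))
```

This style suits the lab-PC setting described above: each job is small and self-contained, so limited storage and slow connections matter less than they would for a tightly coupled cluster job.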
What Problems? (months to minutes)
• Manufacturing & Scheduling Optimization - Mary Beth Kurz, Industrial Engineering
• Rendering Architectural 3D views and walk-throughs
• ArcGIS Computations over multiple inputs and very large areas
Leverage Existing Investments
*Linked 1200 Clemson lab machines to provide high throughput compute power
M. B. Kurz, Industrial Engineering
“..Grid computing is saving me”
"I had all but given up on this line of research, when I was approached with this Condor idea. Now I am doing work at a scale larger than is usually done." M.B. Kurz
1700+ student lab machines
National Trend – Grids, Clouds
Similar to power grids, computing grids enable resource sharing
National Trend - Clouds
Cloud computing enables faster, less expensive processing across geographically distributed data centers
“We demonstrated that our system is six times faster than competing technology.”
Robert Grossman, NCDM director (NSF Press Release 09-032, Feb. 25, 2009)
[Image: cloud system at the NSF-supported National Center for Data Mining at the UI - Chicago. Credit: Open Cloud Consortium]
FutureNet connects to every major R&E Network via multiple lambdas
National and Regional Research and Education Networks
South Carolina Light Rail and C-Light
Higher Bandwidth and Optional Dedicated Fiber Paths Connect Grids
R&E Connections are International
State Network Foundation
SCLR
FCC Rural Health Care
DoE – PSA
ARRA Stimulus Broadband Activities may build on this
“Last Mile” Clemson Infrastructure
Clemson 2006
• Networks: no redundancy; Mbps
• Data Center: no HPC
Clemson 2008
• C-Light used no taxpayer $$
• SCLR – I2 – NLR; Gbps
• ~30,000 sf aggregate data center
• Re-engineered SAN; petascale of storage
• Collocation, Condominium, Condor
• Approaching 100 Teraflops
• #60 on Intl Top 500
• 24X7X365 NOC
[Network diagram labels: ITC; Poole-1; Poole-2; Strode; Holmes; Riggs; 1G links; Buildings; Access Layer; Distribution Layer; Core Layer; Data Center; Internet / Internet2; Wide Area Network]
National Trend – VOs and Communities
• Virtual Organization: Group of people who share a common goal and share resources (or need resources) to achieve that goal
  • A classroom as a VO
  • A group of astronomers
  • Providers of a common application such as email
• Sharing resources among virtual organizations is what grid computing is trying to solve
• Portals / Science Gateways / Hubs make it easy to use grids – all you need is a web browser
Another Way of looking at Grids
[Diagram labels: COMPUTATION; VISUALIZATION; EDUCATION; DATA; R&E Networks; Portal / Gateway / Hub; Middleware (Grid Software)]
Intelligent River™ Watershed Monitoring
Linking Water, Land Use, Energy, and Global Climate Change
Source: WATERS Network cyberinfrastructure, NSF
Slide courtesy of Gene Eidson
Grids are enhancing and expanding traditional scientific methods
A Disruptive Technology
• Examples of some previous disruptive technologies:
  • The printing press
  • The telegraph and telephone
  • The web browser
• What changed as a result?
  • Widespread literacy
  • Real-time communications over distance
  • Anyone can use a computer
Grids are Disruptive Technologies
(1) The world is “flat”
(2) Everyone can be a producer as well as consumer of information
(3) ‘Web Services’ makes information re-useable and available anywhere
(4) Information can be customized to your interests (think Amazon.com)
What are “Web Services”?
The map and data on the left represent streamflow conditions – data is collected by the USGS and made available on a map on their web site.
Data is made available as a web service
Therefore, the Intelligent River™ (IR) can re-display this data in a different manner and combine it with other, project-specific data. The USGS data used is always current, but can be stored as well.
In the same way, IR data can be re-used by others.
Standards and MetaData make this possible
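As a rough illustration of re-using web service data, the sketch below parses a machine-readable response and merges it with project-specific measurements. The payload shape, the field names, and the `combine` helper are all invented for this example; they are not the actual USGS format or the Intelligent River implementation.

```python
import json

# Hypothetical response body, standing in for what a streamflow web
# service might return. Site names and field names are assumptions.
SAMPLE_RESPONSE = """
[
  {"site": "A", "flow_cfs": 120.5, "time": "2009-06-25T12:00:00Z"},
  {"site": "B", "flow_cfs": 48.0, "time": "2009-06-25T12:00:00Z"}
]
"""


def combine(service_json: str, project_data: dict[str, dict]) -> list[dict]:
    """Merge service readings with project-specific data, keyed by site."""
    readings = json.loads(service_json)
    combined = []
    for reading in readings:
        # Sites without project data pass through unchanged.
        extra = project_data.get(reading["site"], {})
        combined.append({**reading, **extra})
    return combined


if __name__ == "__main__":
    # Hypothetical project measurement for one site.
    local = {"A": {"turbidity_ntu": 3.2}}
    for row in combine(SAMPLE_RESPONSE, local):
        print(row)
```

The merge works only because both sides agree on a shared key and on field meanings, which is the role standards and metadata play in the slide above.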
Science is Disrupted, too
• Old way:
  • The experimental notebook
  • The filing cabinet
  • The library catalog
  • Purchase/write your own analysis software
  • Results made available to research specialists via conference presentations and journal publications
  • Results are written, presented and preserved on paper.
  • Information is organized and presented for a specific audience
Using Grids:
• The experimental notebook → Digital data in standardized format; use Metadata to record date, parameters, owner, etc.
• The filing cabinet → The Database
• The library catalog → The search engine
• Purchase/write your own analysis software → Use analytical/computational services available in the Grid
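A metadata record of the kind mentioned above (date, parameters, owner) might look like the following sketch. The `DatasetMetadata` class and its field names are hypothetical, chosen only to mirror the items on the slide; real projects would normally follow a community metadata standard.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record for one digital dataset."""
    owner: str
    collected_on: date
    parameters: dict[str, float] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize to JSON so other tools can read the record."""
        record = asdict(self)
        # Dates are not JSON-serializable; store them in ISO 8601 form.
        record["collected_on"] = self.collected_on.isoformat()
        return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # Illustrative values only.
    meta = DatasetMetadata("jdoe", date(2009, 6, 25), {"ph": 7.1})
    print(meta.to_json())
```

Storing the record in a standard, text-based format is what lets a database or search engine index the dataset without knowing anything about the experiment that produced it.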
Using Grids (cont’d):
• Results made available to research specialists via conference presentations and journal publications →
  • Results are available to anyone interested
  • Data may be made available upon collection or analysis
  • Results may be made available as simulations, animations, movies, tutorials, interactive games… or a mix of these with text.
• Results are written, presented and preserved on paper. → Results are digitally stored, archived, and available on the Internet
Using Grids (cont’d):
• Information is organized and presented for a specific audience → Information and resources can be presented in a highly customized manner
Conclusion: Why Cyberinfrastructure?
• New approaches to inquiry based on
  • Deep analysis of huge quantities of data
  • Interdisciplinary collaboration
  • Large-scale simulation and analysis
  • Smart instrumentation
• Dynamically assemble the resources to tackle a new scale of problem
• Enabled by access to resources & services without regard for location & other barriers