
Clemson Computing and Information Technology

Introduction to Cyberinfrastructure (aka: Grids, Clouds, etc.)

Creating an Integrated South Carolina Environmental and Human Health Grid: Using Cyberinfrastructure to Link Hazards, Exposures and Health Effects

June 25, 2009

Jim Bottum and Jill Gemmill

Cyberinfrastructure (CI)

Cyberinfrastructure = Information technology we all use

Cyberinfrastructure has a four-legged strategic approach:

• Computing and Communications
• Data – Storage, retrieval, archiving, visualization, mining, security
• Virtual Organizations – Software environments for communities
• Education and Workforce Training

CI enables Scaling up Science: Citation Network Analysis in Sociology

[Figure: citation network panels labeled 1975, 1980, 1985, 1990, 1995, 2000, 2002]

Work of James Evans, University of Chicago, Department of Sociology

Scaling up the analysis

• Query and analysis of 25+ million citations
• Higher throughput and capacity enables deeper analysis and broader community access.
• Analysis began on desktop workstations
• Queries grew to month-long duration
• Moved analysis to U of Chicago compute cluster: 50 (faster) CPUs gave 100X speedup
• Many more methods and hypotheses can be tested!
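The speedup figures on this slide support a quick back-of-the-envelope check. The sketch below is illustrative only: it assumes the work divides evenly across CPUs (an idealization the slide does not state) and takes "month-long" to mean roughly 30 days.

```python
def per_cpu_gain(total_speedup, cpus):
    """Share of the speedup due to faster CPUs, assuming perfect parallel scaling."""
    return total_speedup / cpus


def scaled_hours(original_hours, total_speedup):
    """Wall-clock time after applying an overall speedup factor."""
    return original_hours / total_speedup


# Figures from the slide: 50 CPUs, 100x overall speedup.
gain = per_cpu_gain(100, 50)          # each cluster CPU is about 2x a workstation CPU

# Assumption: a "month-long" query takes about 30 days on the desktop.
month_hours = 30 * 24
cluster_hours = scaled_hours(month_hours, 100)   # a few hours instead of a month
```

Under these assumptions, half of the gain comes from parallelism across 50 CPUs and the rest from each CPU being faster than the original workstation.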

CI is the convergence of technologies:

• Telephone / Voice
• Photography / Images
• Video / Movies / Animation
• Information / Data / Library

Clemson CI Vision

“Cyberinfrastructure is the primary backbone that ties together innovation in research, instruction, and service to elevate Clemson to the Top 20”

Dori Helms, Provost

High Performance CI is similar, but scaled up in size and complexity


Independent computations can be done in parallel

• Many computations (modeling or analysis of data) consist of:
  • Large datasets as inputs (find datasets)
  • “Transformations” which work on the input datasets (process)
  • The output datasets (store and publish)

[Figure: Montage Workflow: ~1200 jobs, 7 levels. NVO, NASA, ISI/Pegasus - Deelman et al. Legend: Data Transfer, Compute Job]
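The idea of a workflow organized into levels of independent jobs can be sketched in a few lines. This is a minimal illustration, not how Pegasus or Montage is implemented: the job names are hypothetical, the dependency graph is assumed to be acyclic, and a thread pool stands in for a cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def workflow_levels(deps):
    """Group jobs into levels; a job sits one level below its deepest dependency.

    `deps` maps each job name to the list of jobs it depends on (must be acyclic).
    """
    level = {}

    def depth(job):
        if job not in level:
            level[job] = 1 + max((depth(d) for d in deps[job]), default=-1)
        return level[job]

    for job in deps:
        depth(job)
    grouped = {}
    for job, lv in level.items():
        grouped.setdefault(lv, []).append(job)
    return [sorted(grouped[lv]) for lv in sorted(grouped)]


def run_workflow(deps, run_job, max_workers=4):
    """Run level by level; jobs inside one level are independent, so they run in parallel."""
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for jobs in workflow_levels(deps):
            finished.extend(pool.map(run_job, jobs))
    return finished


# Hypothetical four-job workflow (illustrative names, not Montage's actual jobs).
deps = {
    "reproject_a": [],
    "reproject_b": [],
    "difference_ab": ["reproject_a", "reproject_b"],
    "mosaic": ["difference_ab"],
}
levels = workflow_levels(deps)
results = run_workflow(deps, lambda job: job.upper())
```

Here the two reprojection jobs share a level and can run at the same time, while the later jobs wait for their inputs.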

Clusters: High Performance Computing

Clemson Palmetto Cluster: #60 supercomputer, #4 in stand alone US universities – 756 “PCs” (commodity hardware) with closely coupled, high speed communications and large scale, very fast storage.

• 756 compute nodes
• 45 TeraFlops
• ½ PetaByte storage
• High throughput interconnects (10 Gbps)
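For a sense of scale, the aggregate figures can be divided across the nodes. This is rough arithmetic under the assumption that the 45 TeraFlops is spread evenly over all 756 nodes; the slide does not say whether the figure is peak or measured performance.

```python
TERA = 10**12
GIGA = 10**9


def per_node_gflops(total_teraflops, nodes):
    """Average GFLOPS per node, assuming an even split across nodes."""
    return total_teraflops * TERA / nodes / GIGA


# Figures from the slide: 45 TeraFlops across 756 compute nodes.
palmetto_per_node = per_node_gflops(45, 756)   # a little under 60 GFLOPS per node
```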

What problems? (months to minutes)

Estimation and Inference about Efficiency in Production Settings - Paul Wilson, Economics

Protein Interaction with Synthetic Materials - Robert Latour, Bioengineering

Effects of School Characteristics and Parents' Work Behaviors on Children's Performance in School: modeling the parents' choices of places of residence and their labor market behaviors (over 15,000 school districts) - Tom Mroz, Economics

“Snapshot” of Palmetto in use


High Throughput Computing

Harvesting unused student lab computer cycles using Condor software

1700 lab PCs can be used as a supercomputer

• Small amounts of storage
• Low throughput connections
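A high throughput run of this kind is typically described to Condor as one submit description that queues many independent jobs. The sketch below generates such a description as text; the executable name and output file names are hypothetical, and the exact settings used at Clemson are not given in the slides.

```python
def condor_submit_description(executable, n_jobs, universe="vanilla"):
    """Build a Condor submit description that queues n_jobs independent jobs.

    Each job receives its own index through the $(Process) macro, so the same
    program can work on a different slice of the problem on each lab PC.
    """
    lines = [
        f"universe   = {universe}",
        f"executable = {executable}",
        "arguments  = $(Process)",
        "output     = out.$(Process).txt",
        "error      = err.$(Process).txt",
        "log        = jobs.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: one job per lab PC mentioned on the slide.
description = condor_submit_description("simulate.sh", 1700)
```

Because the jobs do not communicate with each other, the small storage and low throughput connections of lab PCs are not a bottleneck for this style of computing.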

What Problems? (months to minutes)

• Manufacturing & Scheduling Optimization - Mary Beth Kurz, Industrial Engineering
• Rendering Architectural 3D views and walk-throughs
• ArcGIS Computations over multiple inputs and very large areas

Leverage Existing Investments

*Linked 1200 Clemson lab machines to provide high throughput compute power

M. B. Kurz, Industrial Engineering

“..Grid computing is saving me”

"I had all but given up on this line of research, when I was approached with this Condor idea. Now I am doing work at a scale larger than is usually done." M.B. Kurz

1700+ student lab machines


National Trend – Grids, Clouds

Similar to power grids, computing grids enable resource sharing

National Trend - Clouds

Cloud computing enables faster, less expensive processing across geographically distributed data centers

“We demonstrated that our system is six times faster than competing technology.”

Robert Grossman, NCDM director
NSF Press Release 09-032, Feb. 25, 2009

Cloud system at the NSF supported National Center for Data Mining at the UI - Chicago
Credit: Open Cloud Consortium

FutureNet connects to every major R&E Network via multiple lambdas

National and Regional Research and Education Networks

South Carolina Light Rail and C-Light

Higher Bandwidth and Optional Dedicated Fiber Paths Connect Grids

R&E Connections are International


State Network Foundation

• SCLR
• FCC Rural Health Care
• DoE – PSA

ARRA Stimulus Broadband Activities may build on this

“Last Mile” Clemson Infrastructure

Clemson 2006

• Networks: No redundancy, Mbps
• Data Center
• No HPC

Clemson 2008

• C-Light used no taxpayer $$
• SCLR – I2 - NLR, Gbps
• ~30,000 sf aggregate data center
• Re-engineered SAN; petascale of storage
• Collocation, Condominium, Condor
• Approaching 100 Teraflops
• #60 on Intl Top 500
• 24X7X365 NOC

[Figure: campus network diagram with Access, Distribution, and Core layers; 1G links among ITC, Poole-1, Poole-2, Strode, Holmes, Riggs, and building switches; connections to the Data Center, Internet / Internet2, and the Wide Area Network]

National Trend – VOs and Communities

• Virtual Organization: Group of people who share a common goal and share resources (or need resources) to achieve that goal
  • A classroom as a VO
  • A group of astronomers
  • Providers of a common application such as email
• Sharing resources among virtual organizations is the problem grid computing is trying to solve
• Portals / Science Gateways / Hubs make it easy to use grids – all you need is a web browser

Another Way of looking at Grids

[Figure labels: Computation, Visualization, Education, Data; R&E Networks; Middleware (Grid Software); Portal / Gateway / Hub]

Example Virtual Organizations


Intelligent River™ Watershed Monitoring

Linking Water, Land Use, Energy, and Global Climate Change

Source: WATERS Network cyberinfrastructure, NSF

Slide courtesy of Gene Eidson

Open Parks Grid


Grids are enhancing and expanding traditional scientific methods


A Disruptive Technology

• Examples of some previous disruptive technologies:
  • The printing press
  • The telegraph and telephone
  • The web browser
• What changed as a result?
  • Widespread literacy
  • Real-time communications over distance
  • Anyone can use a computer

Grids are Disruptive Technologies

(1) The world is “flat”

(2) Everyone can be a producer as well as consumer of information

(3) ‘Web Services’ makes information re-useable and available anywhere

(4) Information can be customized to your interests (think Amazon.com)


What are “Web Services”?

The map and data on the left represent streamflow conditions – data is collected by the USGS and made available on a map on their web site.

Data is made available as a web service

Therefore, the Intelligent River™ (IR) can use this data to re-display in a different manner and combine with other, project-specific data. USGS data used is always current, but can be stored as well.

In the same way, IR data can be re-used by others.

Standards and MetaData make this possible
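The reuse pattern described here, taking data published by one organization's web service and combining it with project-specific data, can be sketched as follows. The payload format and field names are invented for illustration and are not the actual USGS response format; a real client would fetch the document over HTTP instead of using an embedded string.

```python
import json

# Hypothetical web-service response. Field names and values are made up for
# this sketch and do NOT reflect the real USGS data format.
SERVICE_RESPONSE = """
{"readings": [
  {"station": "A1", "streamflow_cfs": 120.5},
  {"station": "B2", "streamflow_cfs": 48.0}
]}
"""


def parse_streamflow(payload):
    """Turn the service's JSON document into a {station: streamflow} mapping."""
    return {r["station"]: r["streamflow_cfs"] for r in json.loads(payload)["readings"]}


def combine(streamflow, project_data):
    """Join service data with project-specific data on the shared station id."""
    return {
        station: {"streamflow_cfs": flow, **project_data[station]}
        for station, flow in streamflow.items()
        if station in project_data
    }


# Hypothetical project-specific sensor data keyed by the same station ids.
project_data = {"A1": {"turbidity_ntu": 3.2}}
combined = combine(parse_streamflow(SERVICE_RESPONSE), project_data)
```

The join only works because both sides agree on a station identifier, which is the role standards and metadata play in making data reusable.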

Science is Disrupted, too

• Old way:
  • The experimental notebook
  • The filing cabinet
  • The library catalog
  • Purchase/write your own analysis software
  • Results made available to research specialists via conference presentations and journal publications
  • Results are written, presented and preserved on paper.
  • Information is organized and presented for a specific audience

Using Grids:

• The experimental notebook
  • Digital data in standardized format; use Metadata to record date, parameters, owner, etc.
• The filing cabinet
  • The Database
• The library catalog
  • The search engine
• Purchase/write your own analysis software
  • Use analytical/computational services available in the Grid
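As one illustration of "digital data in standardized format" with metadata, the sketch below bundles measurements with a small metadata record and serializes both as JSON. The field names, the choice of JSON, and the sample values are assumptions made for this example; the slides do not prescribe a specific metadata standard.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetMetadata:
    """Metadata the slide mentions: owner, date, and parameters."""
    owner: str
    date: str                                  # ISO 8601 date of collection
    parameters: dict = field(default_factory=dict)


def to_record(values, metadata):
    """Bundle measurements with their metadata in one JSON document."""
    return json.dumps({"metadata": asdict(metadata), "values": values}, sort_keys=True)


# Hypothetical dataset; the owner name and readings are placeholders.
meta = DatasetMetadata(owner="example-lab", date="2009-06-25", parameters={"units": "cfs"})
record = to_record([120.5, 48.0], meta)
restored = json.loads(record)
```

Because the record is self-describing, another group can load it later without access to the original notebook.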

Using Grids (cont’d):

• Results made available to research specialists via conference presentations and journal publications
  • Results are available to anyone interested
  • Data may be made available upon collection or analysis
  • Results may be made available as simulations, animations, movies, tutorials, interactive games…or a mix of these with text.
• Results are written, presented and preserved on paper.
  • Results are digitally stored, archived, and available on the Internet

Using Grids (cont’d):

• Information is organized and presented for a specific audience
  • Information and resources can be presented in a highly customized manner

Conclusion: Why Cyberinfrastructure?

• New approaches to inquiry based on
  • Deep analysis of huge quantities of data
  • Interdisciplinary collaboration
  • Large-scale simulation and analysis
  • Smart instrumentation
• Dynamically assemble the resources to tackle a new scale of problem
• Enabled by access to resources & services without regard for location & other barriers

Discussion