User Support: Current Levels and Methods Ralph Roskies Scientific Director PSC January 9, 2009


User Support: Current Levels and Methods

Ralph Roskies, Scientific Director

PSC, January 9, 2009


User Survey Results

• 2008 user survey satisfaction ratings:
– Helpfulness of TeraGrid user support staff: 83.75%
– Promptness of ticket resolution: 82.25%
– Effectiveness of user support in solving problems: 79.5%

User comment (paraphrased; exact quote to be supplied by Kelly Gaither): "User support is the most valuable aspect of TeraGrid."


User Support Overview

Category                     FTEs (PY4)
Frontline User Support       32.3
Advanced User Support        29.2
Online User Support          5.75
Advanced Support for EOT     4.25

• For Q4 2008: 1,118 PIs and 1,413 users charging SUs.

• In PY4, TG has ~70 FTE involved with user support.

• Managed in concert by the GIG ADs for Operations, User Support Coordination, Advanced Support, User Facing Presence, EOT, and Science Gateways, with guidance from the Science Director.

Note: these figures do not include the substantial training efforts in HPC University or the work on Common User Environments.
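The "~70 FTE" figure can be checked against the category table on this slide. The following is a minimal sketch that only sums the four PY4 numbers quoted above; the variable names are illustrative and not part of the original deck.

```python
from decimal import Decimal

# PY4 FTE figures as quoted in the overview table on this slide.
fte_by_category = {
    "Frontline User Support": Decimal("32.3"),
    "Advanced User Support": Decimal("29.2"),
    "Online User Support": Decimal("5.75"),
    "Advanced Support for EOT": Decimal("4.25"),
}

# Decimal avoids binary floating-point rounding in the sum.
total_fte = sum(fte_by_category.values(), Decimal("0"))

for category, fte in fte_by_category.items():
    share = fte / total_fte * 100
    print(f"{category:<26} {fte:>6} FTE  ({share:.1f}% of total)")
print(f"Total: {total_fte} FTE")
```

The four categories sum to 71.5 FTE, which is consistent with the rounded "~70 FTE" stated on the slide.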


Frontline User Support

• Ticket Resolution and User Engagement
– Provide efficient and effective resolution of trouble tickets by TeraGrid-wide sharing of technical information and best practices.
– Refer issues that require more than 1 FTE-month to Advanced Support.
– Provide technical content for online information based on recent problems and user feedback.
– Provide ongoing personal contacts via the User Champions, Campus Champions, and Pathways programs.
– Organize the 2008 and 2009 user satisfaction surveys.

PY4: 32.3 FTE; $12K for external user survey contractor

Dear Dr. Hackworth,
Entire NCCU physics computational group is very grateful for your prompt action and in helping us to resolve for us this very significant problem. (regarding rapid set up of an account)
Branislav Vlahovic

------------------------------

To: R.F. Costa
Subject: Re: Pople job
Hey Rick, … you are awesome man, thanks for all the help. (setting up priority queue)
Abhijit Ramachandran, Depart. Of Bioengineering, U. Texas at Arlington

------------------------------

… The support staff is extremely patient and helpful.
Keshav Pingali, Cornell


Advanced User Support

• Advanced Support for TeraGrid Applications (ASTA)
– Provide targeted support, at the level of more than 1 FTE-month, for users’ application development and optimization efforts.
– Responsible for many of the TG Highlights
– Can be requested as Startup and Supplemental via POPS
– Often results in co-authorship, co-PIs in proposals, …

PY4: 15.25 FTE for ~25 ASTA collaborations

Happy New Year! I just wanted to say that Roberto has really been working above and beyond the call of duty to get this all going -- we've been getting mail from him on his way to bed and on waking in the morning with his kids... our runs are all going in this morning, and with any luck at all we'll have a nice set of results to discuss at the AAS in San Diego next week.

Thanks for your help!

Mordecai-Mark Mac Low, American Museum of Natural History (Jan 3)


Advanced User Support

• Advanced Support for Projects
– Identify, deploy, harden, optimize and benchmark tools and application packages that benefit large numbers of users in a particular domain or across multiple domains.
– Examples include molecular dynamics codes (NAMD, AMBER, GROMACS, CHARMM, LAMMPS and DESMOND) and materials codes (CPMD, VASP, SIESTA, ABINIT), all heavily used in TG.

PY4: 8.25 FTE for at least 3 cross-TG application infrastructure projects

A million thanks to you, and to all the folks at the PSC for your help with this BIG job. Without it, it could not have been done.
Jacobo Bielak, CMU

------------------------------

Our (consultant) sometimes comes up with his own suggestions before we even have a problem!
Steve Gottlieb (Indiana)


Advanced User Support - Gateways

• Subset of Advanced User Support program
– Same request process, just looking for different expertise
  • Perhaps Grid computing and workflows rather than optimization and scaling
– Some may request support in multiple expertise areas
– Targeted support a hallmark of the Gateway program early on
  • As the program was being formed, all gateway developers were guinea pigs; many received advanced support
• Today, moving toward a more sustainable production environment

PY4: 5.7 FTE for at least 10 SGW projects


Online User Support

• User Information Presentation
– Develop and maintain methods to provide users with current, accurate information from across the TeraGrid in a dynamic environment of resources, software and services.
– PY4: 3.5 FTE

• Information Production and Organization
– Maintain and update documentation content, including a knowledge base of brief answers, with follow-up references, to frequent user questions.
– PY4: 2.25 FTE for 250 new documents

PSC has exemplary organization for its user guides that simplifies migration to a new machine. All the right information is available for machine parameters, compilation and batch script development. It really reduces the barrier to starting out on a new computer.

Steve Gottlieb Indiana University


Support for EOT

• Advanced Support for Education, Outreach and Training
– Prepare and deliver advanced HPC/CI content for the HPC University, as well as for education and outreach activities
– In the first three quarters of 2008, new content included:
  • Intro to Multi-Core Programming
  • TeraGrid New User Training
  • Hybrid Programming for Shared-Memory and Clustered SMP Systems
  • Introduction to Data Transfer and File Management on the TeraGrid
  • Introduction to Parallel Programming on Ranger, Clouds and Web 2.0

PY4: 4.25 FTE


Backup Slides


Training via HPC University Program

• Support for TeraGrid Training
– Provide a broad range of live, synchronous and asynchronous training opportunities.
– Work with external organizations to identify and promote all HPC training resources and opportunities for participation.
– Over the first three quarters of 2008, TeraGrid has provided training for 5,306 people through 75 training events and through access to 22 on-line tutorials.

www.hpcuniv.org


Support for EOT

• Support for Education, Outreach and Training
– Prepare current and future generations of STEM practitioners, significantly larger and more diverse than today's, to actively contribute to advancing scientific discovery.
– Over the first three quarters of 2008, EOT has engaged 8,421 people in 190 EOT events, in addition to use of 22 on-line tutorials, over 80 tours of facilities, and TG’08.
– AUS staff have contributed significantly in support of these activities.


Current SGW Collaborations

• GIG
– GEON and Navajo Technical College
– PolarGrid
– Computational Infrastructure for Geodynamics
– Social Informatics DataGrid
– Allegheny General Hospital
– TeraDRE
– HUB gateways
– Asteroseismology

• RP
– Community Climate System Model (CCSM)
– Neutron Science Portal
– Earth System Grid

Computational Infrastructure for Geodynamics (CIG)

• Membership-governed organization
– 40 institutional members, 9 foreign affiliates

• Supports and promotes Earth science by developing and maintaining software for computational geophysics


How does CIG use the TeraGrid?

• Seismograms allow scientists to understand the ground motion
• Computationally intensive simulations run on TeraGrid using an assortment of 3D and 1D earth models produce synthetic seismograms
– Necessary input datasets provided via the portal
– Daemon (Python, Pyre) constantly polls the web site looking for work to do
  • Uses GSI-OpenSSH and MyProxy credentials to submit jobs, monitor jobs, and transfer output back to the portal
  • Sends status updates to the web site using HTTP POST
– Users can download results in ASCII and Seismic Analysis Code (SAC) format
  • Visualizations include "beachball" graphics depicting the earthquake's source mechanism, and maps showing the locations of the earthquake and the seismic stations using GMT (http://gmt.soest.hawaii.edu/)
• Researchers quickly receive results and can concentrate on the scientific aspects of the output rather than on the details of running the analysis on a supercomputer
• Future Directions
– Parameter explorations
– Custom earth models for users
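The portal daemon described on this slide follows a poll, submit, report pattern. The sketch below illustrates that pattern only; it is not CIG's daemon and does not use Pyre, GSI-OpenSSH, or MyProxy. The function `poll_for_work` and its three callables are hypothetical stand-ins for the portal's HTTP endpoints and the remote job submission, injected so the example runs without a network.

```python
import time
from typing import Callable, Optional


def poll_for_work(
    fetch_next_request: Callable[[], Optional[dict]],
    run_simulation: Callable[[dict], str],
    post_status: Callable[[str, str], None],
    poll_interval_s: float = 30.0,
    max_polls: Optional[int] = None,
) -> int:
    """Poll a portal for work, run each request, and report status.

    All three callables are hypothetical: in a real gateway they would wrap
    the portal's web requests and the remote job submission.
    Returns the number of requests processed successfully.
    """
    processed = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        request = fetch_next_request()
        if request is None:
            # Nothing queued: wait before asking the portal again.
            time.sleep(poll_interval_s)
            continue
        request_id = request["id"]
        post_status(request_id, "running")
        try:
            output_location = run_simulation(request)
        except Exception as exc:
            # Report the failure instead of letting the daemon die.
            post_status(request_id, f"failed: {exc}")
            continue
        post_status(request_id, f"done: {output_location}")
        processed += 1
    return processed


if __name__ == "__main__":
    # In-memory demonstration; the request fields and paths are made up.
    queue = [{"id": "req-1", "model": "3D"}, {"id": "req-2", "model": "1D"}]
    log = []
    count = poll_for_work(
        fetch_next_request=lambda: queue.pop(0) if queue else None,
        run_simulation=lambda req: f"/results/{req['id']}.sac",
        post_status=lambda rid, status: log.append((rid, status)),
        poll_interval_s=0.0,
        max_polls=3,
    )
    print(count, log)
```

Keeping the portal and job-submission calls behind plain callables lets the loop be exercised locally before it is pointed at a real web site or batch system.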


Social Informatics Data Grid

• Heavy use of “multimodal” data.
– Subject might be viewing a video, while a researcher collects heart rate and eye movement data.
• Events must be synchronized for analysis; large datasets result.
• Extensive analysis capabilities are not something that each researcher should have to create for themselves.

http://www.ci.uchicago.edu/research/files/sidgrid.mov


How does SIDGrid use the TeraGrid?

• Computationally intensive tasks
– Speech, gesture, facial expression, and physiological measurements
  • Media transcoding for pitch analysis of audio tracks
  • Once stored in raw form, data streams are converted to formats compatible with software for annotation, coding, integration, and analysis
– fMRI image analysis
• Workflows for massive job submissions and data transfers using Virtual Data System (VDS)
• Workflows converted to concrete execution plan via Pegasus Grid planner
– TeraGrid information service (MDS)
– Replica location service (RLS)
– DAGMAN and Condor-G/GRAM
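Converting a workflow into a concrete execution plan amounts, at its core, to ordering tasks so that each runs only after its prerequisites. The sketch below shows that idea with Python's standard `graphlib` module. It does not use the Pegasus, VDS, or DAGMan interfaces, and the task names are hypothetical, loosely modeled on the steps listed on this slide.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps to the set of tasks it depends on.
dependencies = {
    "transcode_media": {"stage_in_raw_data"},
    "pitch_analysis": {"transcode_media"},
    "convert_for_annotation": {"stage_in_raw_data"},
    "stage_out_results": {"pitch_analysis", "convert_for_annotation"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

# Each batch holds tasks whose prerequisites are complete and which could
# therefore be submitted concurrently.
batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    batches.append(ready)
    sorter.done(*ready)

for step, batch in enumerate(batches, start=1):
    print(step, batch)
```

A real planner additionally maps each task to a site, locates input replicas, and inserts data transfer jobs; none of that is modeled here.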


Purdue ASTA Activity – TG-MCA05S015

P. A. Cheeseman

TeraGrid Allocations: TG-MCA05S015, TG-MCA05T015


• Milestones
– 2006/02: Adaptation of parameter sweep to Condor began.
– 2006/05: Condor adaptation plan reviewed.
  • Reduce job times to avoid preemption.
  • Improve fault tolerance.
  • Incorporate internal, adaptable time limits.
  • Incorporate script-level steps within the program (self-checkpoint, seed iteration, etc.).
– 2006/08: Program adaptation complete and adapted code in production (see Slide 4).
– 2007/06: Presentation at TG07 (http://www.teragrid.org/events/teragrid07/archive/presentations/wednesday/TG07.PD.12)
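The adaptation plan above combines an internal time limit with self-checkpointing over seeds, so that a job exits on its own before it can be preempted and a resubmitted job picks up where the last one stopped. The sketch below illustrates that combination under stated assumptions; it is not the project's code. The function `run_with_checkpoint`, the `evaluate_seed` callable, and the JSON checkpoint layout are all hypothetical.

```python
import json
import os
import time
from typing import Callable


def run_with_checkpoint(
    evaluate_seed: Callable[[int], float],
    total_seeds: int,
    checkpoint_path: str,
    time_limit_s: float,
) -> dict:
    """Process seeds in order, stopping cleanly once the time budget is spent.

    State is written after every seed, so a preempted or time-limited job can
    be resubmitted and will resume from the next unprocessed seed.
    """
    # Resume from an earlier run if a checkpoint exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            state = json.load(fh)
    else:
        state = {"next_seed": 0, "results": []}

    started = time.monotonic()
    while state["next_seed"] < total_seeds:
        if time.monotonic() - started >= time_limit_s:
            break  # internal time limit reached; exit voluntarily
        seed = state["next_seed"]
        state["results"].append(evaluate_seed(seed))
        state["next_seed"] = seed + 1
        # Write to a temporary file and rename it, so that a kill in the
        # middle of a write cannot leave a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(state, fh)
        os.replace(tmp_path, checkpoint_path)

    state["finished"] = state["next_seed"] >= total_seeds
    return state
```

A wrapper script could resubmit the job whenever `finished` is false, which corresponds to the "seed iteration" step being handled inside the program rather than by the submission script.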


• Milestones (cont.)
– 2007/12/24: Initial computations complete.
  • ~6M jobs completed.
  • 4M hours delivered.
  • 240+ hours/hour average delivery rate.
  • Peak rates of 2000+ hours/hour.
  • 3,168,459 parameter sets processed (100 seeds per set).
– 2008/01: Refinement computations began.
  • Minor code adaptations necessary.
  • Less CPU intensive.
– 2008/11: Refinement computations complete.
  • 8M+ inputs processed.
– Results presently being reviewed by Profs. Deem and Earl.
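The rounded figures for the initial computations can be related to one another with a little arithmetic. The sketch below uses only the numbers quoted in the milestones and assumes that "hours/hour" means compute hours delivered per wall-clock hour; that reading is an interpretation, not something the slide states.

```python
# Figures quoted in the milestones (rounded in the source).
jobs_completed = 6_000_000        # "~6M jobs completed"
hours_delivered = 4_000_000       # "4M hours delivered"
avg_rate_hours_per_hour = 240     # "240+ hours/hour average delivery rate"
parameter_sets = 3_168_459
seeds_per_set = 100

# Average compute time per job, in minutes.
mean_minutes_per_job = hours_delivered / jobs_completed * 60

# Wall-clock span implied by the average rate. The rate is given as a
# lower bound ("240+"), so this is an upper bound on the span.
implied_wall_clock_days = hours_delivered / avg_rate_hours_per_hour / 24

# Individual seed evaluations across all parameter sets.
total_seed_runs = parameter_sets * seeds_per_set

print(f"Mean job length: about {mean_minutes_per_job:.0f} minutes")
print(f"Implied wall-clock span: at most about {implied_wall_clock_days:.0f} days")
print(f"Seed evaluations: {total_seed_runs:,}")
```

Under these assumptions the mean job length comes to roughly 40 minutes and the number of seed evaluations to 316,845,900.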


Unadapted Execution Times
[Chart: Jobs (axis 0 to 700) vs. Time in hours (axis 0 to 16)]

Unadapted Execution Time Breakdown
– 20 min. or less: 3%
– 20-40 min.: 6%
– 40 min. to 1 hr.: 7%
– 1-2 hr.: 24%
– 2-3 hr.: 21%
– 3 hr. or more: 39%

Adapted Execution Times, Five Jobs per Set
[Chart: Jobs (axis 0 to 14000) vs. Time in hours (axis 0 to 3)]

Adapted Execution Time Breakdown, Five Jobs per Set
– 1 hr. or less: 93%
– 1-2 hr.: 6%
– 2-3 hr.: 1%
– (unlabeled slice): 0%
