A View from the Top: End of Year 1
Al Geist
October 10-11
Houston TX
www.scidac.org/ScalableSystems
Coordinator: Al Geist
Participating Organizations
ORNL, ANL, LBNL, PNNL
PSC, SDSC, IBM
SNL, LANL, Ames, NCSA
Cray, Intel, Unlimited Scale
Main Web Site
Scalable Systems Software Center
June 13-14, Houston TX
Review of Last Meeting
Details in main project notebook
Progress Reports at June Meeting
Al Geist: working groups, notebooks, telecoms
Working Group Leaders:
• What areas their working group is addressing
• Progress report on what their group has done
• Present problems being addressed
• Next steps for the group
• Discussion items for the larger group to consider
Demonstrations of Prototype Components: one big intra-component demo
Slides can be found in Main Notebook page 22
Consensus and Voting:
Event Manager Proposal: Much discussion. The proposal was revised to say that event management is an important feature of our software suite, independent of whether it lives in a central component or inside the components, and that the proposed tuple API is the initial starting point.
Passed strawvote 13 for / 0 against / 0 abstain
Proposal to adopt HTTP POST (byte count) as standard:
Passed strawvote 10 for / 0 against / 1 abstain
Proposal to adopt the W3C standard for XML Signature Syntax and Processing: Long discussion. Decided more discussion is needed before a vote.
Bugzilla site now up and running. Link is on the ScalableSystems home page.
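The adopted HTTP POST (byte count) framing can be illustrated with a minimal sketch, assuming each message is an XML document carried as the body of an HTTP POST whose Content-Length header gives its byte count. The path `/sss`, the host, the sample XML, and the helper names `frame_message` and `read_message` are hypothetical, not part of the adopted standard.

```python
def frame_message(body: str, path: str = "/sss", host: str = "localhost") -> bytes:
    """Wrap an XML body in an HTTP POST request carrying its byte count.

    Hypothetical helper: the path and host defaults are illustrative only.
    """
    payload = body.encode("utf-8")
    # Content-Length counts encoded bytes, not characters.
    headers = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: text/xml; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    )
    return headers.encode("ascii") + payload


def read_message(raw: bytes) -> str:
    """Recover exactly Content-Length bytes of body from a framed request."""
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete header block")
    length = None
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length")
    if len(rest) < length:
        raise ValueError("body shorter than declared byte count")
    # Any bytes past the declared length belong to the next message.
    return rest[:length].decode("utf-8")


if __name__ == "__main__":
    msg = '<Request action="Query"><Object>Job</Object></Request>'  # hypothetical XML
    framed = frame_message(msg)
    assert read_message(framed) == msg
    print(framed.decode("utf-8"))
```

The point of the byte count is that a receiver never has to parse the XML to find where a message ends; it reads the header, then exactly that many bytes.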
Scalable Systems Software Center
June-October
Progress Since Last Meeting
Five Project Notebooks filling up
A main notebook for general information
And individual notebooks for each working group
• Over 200 total pages – 34 added since last meeting
• A lot of new material in Resource Management notebook (way to go)
Get to all notebooks through main web site www.scidac.org/ScalableSystems
Click on the sidebar or on “project notebooks” at the bottom of the page
Four Bi-weekly Working Group Telecoms: less talk, more work
Resource management, scheduling, and accounting
Tuesday 3:00 pm (Eastern) 1-800-664-0771 keyword “SSS mtg”
Validation and Testing
Wednesday 1:00 pm (Eastern) 1-877-540-9892 mtg code 999157
Process management, system monitoring, and checkpointing
Thursday 1:00 pm (Eastern) 1-877-252-5250 mtg code 160910
Node build, configuration, and information service
Friday 3:00 pm (Eastern) 1-888-469-1934 mtg code 58145 (changes)
Scalable Systems Integrated Component Demonstration
Components: Job Submission Client, Queue Manager, Allocation Manager, Node Monitor, Local Scheduler, Process Manager, Discovery Service
Color key (by working group): Resource Management and Accounting; Process Management and Monitoring; Node Configuration and Build Infrastructure
Message sequence:
0 Service-Lookup
1 Submit-Job
2 Query-Job
3 Query-Node
4 Create-Reservation
5 Run-Job
6 Exec-Process
7 Query-Job
8 Delete-Job
9 Withdraw-Allocation
Done June 2002
Authentication & Communication: R. Lusk
Meta Scheduler: D. Jackson
Meta Manager: S. Scott
Accounting: S. Jackson
Scheduler: D. Jackson
System/Job Monitors: M. Showerman
Package Services: J. Mugler
Information Services: JP Navaro
Allocation Management: S. Jackson
Queue Manager: B. Bode
Job Manager: B. Bode
Checkpoint/Restart: P. Hargrove
Process Manager: R. Lusk
Service Directory: N. Desai
Node Manager: T. Naughton
C-Plant XML interface: E. Debenedictis
Working groups: Resource Mgmt, Build & Configure, Process Mgmt
SSSlib: used by all components
Scalable Systems Software Center
October 10-11, 2002
This Meeting
SciDAC Booth
SciDAC Systems Poster
SciDAC Booth
SciDAC Systems Poster (2)
Agenda – October 10
8:00 Breakfast
8:30 Al Geist: Project status. Getting ready for SC 2002
9:00 External project review in February (start planning)
Working Group Reports
9:30 Scott Jackson: Resource Management
10:30 Break
11:00 Erik Debenedictis: Validation and Testing
12:00 Lunch (on own, but go somewhere as a group)
1:00 Paul Hargrove: Process Management
2:00 Narayan Desai: Node Build, Configure
3:00 Break
3:30 SC demos and hacking: big multi-component demo
5:00 Open discussion
5:30 Adjourn (working groups may wish to get together in the evening)
Agenda – October 11
8:00 Breakfast
8:30 Discussion, proposals, strawvotes
• ssslib
• meatball GUI (who?)
• Chiba City for SC demos (Nov 4?)
• cross-group issues
• test packaging?
10:30 Break
11:00 Al Geist: Summary
• SC booth, demos, theater, software, handout (Brett)
• February review: reviewers, advisor, talks
• Next meeting date: day before review
12:00 Meeting ends
THANKS to the Airport Security meeting for open access to their internet connection!
External SciDAC Review Meeting
Late February 2003 (may spill over into early March): 18-month checkup by MICS
Each SciDAC project is reviewed separately; Scalable Systems is the only thing on the agenda
Two full days of detailed presentations, so many of us will have to give presentations
External review panel (different for each ISIC):
• We can suggest names
• They can’t be from our organizations or affiliated with them
• They will have been given our proposal beforehand
External SciDAC Review Metrics
I asked Fred and McGraw about metrics:
1. How have we helped SciDAC apps? Can we show use at CCS, NERSC, and other sites?
2. Put an Advisory Panel into place, made up of apps and computer center personnel. I’ve asked Drake (Climate), Mezzacapa (Astro), Bland (CCS), Nichols (Chemistry). We need a NERSC rep and others?
3. Show short-term successes and use
External Review Panel Suggestions
External review panel (different for each ISIC). We can suggest names: who?
Barney McCabe, Russ Miller, Bart Miller, Jose M (IBM), someone from Cray, someone from Etnus (John Delsignore), someone from Unlimited Scale?, Walt Ligon, Andrew Lumsdaine, Jim Garlick, Steve Chapin
Meeting Notes
Scott Jackson: RM progress
Scope: queue manager, job manager, scheduler, allocation manager, and meta-scheduler
A meta-scheduling demo across CCS, NERSC, and Chiba would be good
• Scheduler: enhance internal scalability to 64K nodes; add support for the HTTP framing protocol; QBank security enhanced; interface to PBS, LSF, and LL for suspend/resume and requeue management
• Queue Manager: conforms to the SSSRMAP XML spec; full wire-protocol compatibility; new interface to the Event Manager
• Allocation Manager: surveyed 15 sites for requirements; implemented HTTP framing and SHA1-HMAC security, working with QBank/Maui; reframed bank objects (accounts, users, allocations) as dynamic object actions defined in a metadata cache; created a dynamic web GUI using PHP and JavaScript
• Meta Scheduler: interoperates with the Grid (Globus); fault tolerance through global job ID tracking and scheduler reconnection; improved user interface
Current issues: job state management, data staging, job signaling, job steps
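The SHA1-HMAC security noted for the Allocation Manager can be sketched with Python's standard `hmac` module. This is a minimal illustration, assuming both components hold a shared secret key and the sender transmits a base64-encoded digest alongside each message; the key, the sample message, and the names `sign` and `verify` are hypothetical and do not describe the QBank/Maui wire format.

```python
import base64
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> str:
    """Return a base64-encoded HMAC-SHA1 digest of the message."""
    digest = hmac.new(key, message, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(message: bytes, key: bytes, signature: str) -> bool:
    """Recompute the digest and compare it in constant time."""
    expected = sign(message, key)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


if __name__ == "__main__":
    shared_key = b"example-shared-secret"  # hypothetical shared secret
    request = b'<Request action="Withdraw"><Object>Allocation</Object></Request>'
    sig = sign(request, shared_key)
    print(verify(request, shared_key, sig))         # True
    print(verify(request + b" ", shared_key, sig))  # False: message was altered
```

A receiver that recomputes the digest detects any change to the message, and `compare_digest` avoids leaking timing information during the comparison.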
Meeting Notes
Scott Jackson: RM progress (cont.)
Next work: prepare for SC demos and scalability testing. The BIG thing is releasing v1.0 of the RM system: documentation, security authentication, and extending the suspend/resume schema beyond what PBS and LL do today
Discussion of the need for a scalability testbed.
Erik Debenedictis: validation progress
• Goal: create machine-independent tests for testing supercomputers
• Infrastructure: QMTest
• Tests: from all sources
• Value: an improved method to execute the “SSS Standard Test body”
• Recent activity: QMTest on the SNL SciDAC cluster; test package definition
Will McClendon: test architecture (diagram in slides)
• QMTest is a scriptable test driver written in Python
• HTTP-based interface (Zope)
• Running at SNL and PSC
• Requires an exact match on STDOUT/STDERR
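Because the test setup described here requires an exact match on STDOUT/STDERR, a test boils down to running a command and comparing both streams character for character with stored expectations. The sketch below illustrates that comparison with Python's `subprocess` module only; it is not QMTest's own API, and `run_exact_match_test` is a hypothetical name.

```python
import subprocess
import sys


def run_exact_match_test(cmd, expected_stdout: str, expected_stderr: str = "") -> bool:
    """Run cmd; pass only if stdout and stderr both equal the expected text exactly."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout == expected_stdout and result.stderr == expected_stderr


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "print('hello')"]
    print(run_exact_match_test(cmd, "hello\n"))  # True
    print(run_exact_match_test(cmd, "hello"))    # False: trailing newline missing
```

The second call fails only because of a missing trailing newline, which shows how brittle exact matching can be when output varies in harmless ways.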
Meeting Notes
Will McClendon: test architecture (cont.)
• QMTest screenshot and discussion of how tests are done
• Raw results need to be interpreted to determine pass or fail
Mike ???: goes over the “package” details
• How to add a test package to the suite: package file layout, Make-like
• Will present as a proposal tomorrow
Paul Hargrove: PM group
• Progress: prototyping and development continue; how to interface to something we can’t imagine; validating schema for the process manager; node monitor schema created
• Checkpoint Manager: types are serial checkpoints (independent but potentially multithreaded), done; parallel checkpoints (MPI); Scalable Systems XML interfaces
Meeting Notes
Rusty Lusk: process manager (see diagram in his slides)
• MPD1 (C) overview: added capabilities required by the PM working group
• MPD is one prototype for the SSS Process Manager
• MPD2 (Python): diagram in slides for the new design
• Python is about 5X slower in this untuned version
Mike Showerman: system monitoring component
• Craig Steffen full time on this project, plus a student
• Using new XML schema defined by
• Need to write a graphical display that uses this new XML interface
• Run a small cluster in the NCSA booth with the SSS software stack
Discussion: how about an animated meatball diagram?
Paul returns: data migration meatball removed
• Next steps: interfaces continue to stabilize (checkpoint, PM, monitors)
• Monitoring data... details need defining
Meeting Notes
Narayan Desai: Build and Configure update
• Components: service directory (solid, and on Chiba now); event manager completely rewritten; stable XML; SSSlib robust (bindings for C++, Java, Python, Perl; wire protocol modules: basic, challenge, http, http-rm)
• Build and Config Management (third try at the abstraction): cluster HW build system (OSCAR module for this one in the works); node state manager
• Issues: abstraction problems with the second try. Multiple implementations are important to validate the abstraction.
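The service directory's job, as in the demo's Service-Lookup step, is to let components find one another by name. A minimal in-memory sketch, assuming each component registers a name with a host and port; the `ServiceDirectory` class, its methods, and the sample addresses are hypothetical and are not the SSSlib interface.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Location:
    host: str
    port: int


class ServiceDirectory:
    """Hypothetical in-memory registry mapping component names to locations."""

    def __init__(self) -> None:
        self._entries: Dict[str, Location] = {}

    def register(self, component: str, host: str, port: int) -> None:
        # Re-registering a component replaces its previous location.
        self._entries[component] = Location(host, port)

    def unregister(self, component: str) -> None:
        self._entries.pop(component, None)

    def lookup(self, component: str) -> Optional[Location]:
        # Returns None when no component has registered under that name.
        return self._entries.get(component)


if __name__ == "__main__":
    directory = ServiceDirectory()
    directory.register("queue-manager", "node01", 5001)
    directory.register("scheduler", "node02", 5002)
    print(directory.lookup("queue-manager"))  # Location(host='node01', port=5001)
    print(directory.lookup("accounting"))     # None
```

A production directory would also have to cope with components that exit without unregistering, for example through timeouts; the sketch leaves that out.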
DEMOS
Architecture diagram components: Meta Scheduler, Meta Monitor, Meta Manager, Accounting, Scheduler, Node Configuration & Build Manager, User DB, Allocation Management, Job Queue Manager, Process Manager, Usage Reports, User Utilities, High Performance Communication & I/O, File System, Application Environment, Meta Services, Testing & Validation, System & Job Monitor, Event Manager, Service Directory, Checkpoint/Restart
Key: blue text uses ssslib; red text talks ssslib protocol
Refined Picture on Next Slide
Refined architecture diagram components: Accounting, File System, Event Manager, Service Directory, Meta Scheduler, Meta Monitor, Meta Manager, Scheduler, User DB, Allocation Management, Process Manager, Usage Reports, User Utilities, High Performance Communication & I/O, Application Environment, Meta Services, System & Job Monitor, Checkpoint/Restart, Grid Interfaces, Job Queue Manager, Node Configuration & Build Manager
Annotation: “These interface to all”