Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Introduction to Compute Canada ARC resources
Grigory Shamov, UManitoba / WestgridOct 18, 2017
Thanks to: Alex Razoumov and John Simpson
Introduction & Outline
● Who we are
● What are the ARC resources
● What resources are available from ComputeCanada
2
What is ComputeCanada
3
Regional Consortia
4
What is WestGrid
5
WestGrid Members and Partners
6
WestGrid Site at UofM.
○ARC Support staff : Ali Kerrache , Grigory Shamov ○Sysadmin team (UofM IST): Daryl Lorimer, Sean Kalynuk
●Legacy WestGrid system: Grex○https://www.westgrid.ca/support/systems/grex
7
Advanced Research Computing
What are the Advanced Research Computing Resources?
○Advanced means: something that has to scale out of a PC/workstation
○Compute, Data storage, Network○Scientific Platforms, Visualization○Support, Advocacy and Consultation
8
●High Performance Computing Clusters ○“A cluster of workstations, with interconnect and under command of a batch system”. ○Resource supply is managed by RM/Scheduler○Batch mode (Mainframe from 60s-70s).○Scheduling gives high utilization and “traffic control” for jobs○Also “fairness” and accounting of the systems usage○Single, curated HPC software stack●Cloud Computing○Infrastructure as service (but also “cloud is a business model”) ○Create and manage your own Virtual Machine instances.○Natural for servers, but can be used for compute ○Most flexible, can use any OS or software stack; VMs can be portable.
9
Computing that scales: HPC / Cloud
HPC System Example: Grex
HPC Scheduler
●Scheduler is the component that does the policing
○Resource management/allocation is done through the Scheduler○Resource Allocations set priority targets ○Dynamically changed priorities based on recent usage○Allocations are not static reservations!○Needs steady supply of jobs throughout the year to fulfill the RAC targets
○Very large or very long jobs may need special treatment○Burst load also might be harder to accommodate.
But what Scheduler?
●A BIG CHANGE: from Torque to SLURM !○All the legacy Westgrid systems were running Torque RM with Moab scheduler #PBS –l walltime=1:00:00,nodes=1:ppn=4,mem=4gb#PBS –A abc-678-aaqsub test.job○New National sites (Cedar, Graham) are using SLURM#SBATCH ???#SBATCH ??? squeue test.job
●Kamil’s (UofA) workshop for scheduling is coming this Fall●Any interest in a local workshop about SLURM?
12
ComputeCanada HPC offerings
○https://docs.computecanada.ca/wiki/Cedar○https://docs.computecanada.ca/wiki/Graham
CPUs GPUs RAM per node
Storage Note
Cedar 27.6K x E5-2683v4 146 x 4 P100 128GB – 3TB 3.7PB /scratch9.3PB /project
Fully online! CPUs to be expand in 2018
Graham 33.4K x E5-2683v4 160 x 2 P100 128GB – 3TB 3.2PB /scratch11PB /project
Fully online! Storage to be expanded in 2018
Niagara TBD, ~60K 0 TBD TBD SciNet, expected by Apr 1, 2018
GP4 TBD TBD TBD TBD
13
HPC Software stack
●Number-crunching software environment: ○Compilers, BLAS/LAPACK/PETSc, MPI, OpenMP●Dynamic languages and libraries○R, Python, Perl, Julia, ...●Domain-specific applications and packages○Engineering, Chemistry, MachineLearning, ...●Biomolecular pipelines, genomics etc.
●BigData (Hadoop, Spark, ElasticSearch)?
●Centralized CC software stack, distributed over CVMFS 14
HPC Software stack
●Distributed over CVMFS to all National HPC systems○Using “modules” (LMod); module load abinit/8.2.2○https://docs.computecanada.ca/wiki/Available_software ○Module usage is tracked.
15
●High Performance Computing Clusters ○“A cluster of workstations, with interconnect and under command of a batch system”. ○Resource supply is managed by RM/Scheduler○Batch mode (Mainframe from 60s-70s).○Scheduling gives high utilization and “traffic control” for jobs○Also “fairness” and accounting of the systems usage○Single, curated HPC software stack●Cloud Computing○Infrastructure as service (but also “cloud is a business model”) ○Create and manage your own Virtual Machine instances.○Natural for servers, but can be used for compute ○Most flexible, can use any OS or software stack; VMs can be portable.
16
Computing that scales: HPC / Cloud
OpenStack Logical Diagram
WestCloud example
ComputeCanada Cloud offering
Current default limits: ○2 VMs, 4 VCPUs, 15G RAM, 50Gb persistent storage, 1 IP.○https://docs.computecanada.ca/wiki/Cloud_resources
Nodes RAM per node Storage Note
Arbutus 240, 2 x E5-2680v4 256GB on most 500TB Ceph Due to expansion, will include big data nodes
West Cloud 32, 2 x E5-2650v2 256GB on most Now part of Arbutus
East Cloud 36, 2 x E5-2650v2 128 Gb 100 TB Ceph U.de Scherbrooke
Cloud partitions of Cedar and Graham
TBD; for Graham ~56
TBD;for Graham 256GB
TBD
19
Storage: Large, Fast and Scalable
●Parallel, globally consistent filesystems for HPC ○Lustre, (10PB + 12 PB, operational) ○IBM GPFS
●Object storage filesystems; filesystems for clouds ○Ceph, DDN WOS
●Long-term storage (Tape-and-Disk HSM)○~30PB/site , in progress
○https://docs.computecanada.ca/wiki/National_Data_Cyberinfrastructure 20
Networks and Data Transfer
●CANARIE ○100TB/s at new National hosting sites; at least 10Gb/s to every major site.
●ComputeCanada GlobusOnline○http://globus.computecanada.ca ○Fast transfers, GridFTP○Batch, fire-and-forget○Identity management○Organizational endpoints○Personal endpoints!
○Institutional offerings and issues: Firewalls, security domains 21
Other Data Services and Projects
●ownCloud: another online storage○Dropbox-like; free for Westgrid users ○Allows for easy data sharing; up to 50GB quota○http://owncloud.westgrid.ca
22
Other CC Data Services and Projects
●Research Data Management Project ○https://portagenetwork.ca/frdr-dfdr in limited production now!○CC, USask, UBC, CARL, and Globus
23
FRDR is a single platform from which research data can be ingested, curated, preserved, discovered, cited and shared.
● Federated Storage● Federated Support● Scalable Platform
Portals/Platforms and Viz
●Portals: not only batch processing○Portals, Workflows, and Databases-as services○GenAp, Galaxy; JuPyTer Hub, WebMO○User-developed portals and RPP competition
●Visualization: ○Specific software & hardware & Network requirements: GPU's, Monitors, communication libs like VNC or NX(X2go), low-latency and high-bandwidth networks, interactive nodes○generic (Paraview) and domain-specific (VMD, MaterialSudio, ANSYS). ○Interactive computing like iPython/JuPyTer
24
Accessing ComputeCanada● Step 1:
○ Faculty member registers in the Compute Canada Database (CCDB)○ http://ccdb.computecanada.ca
● Step 2: ○ Once account is approved, students / colleagues can register & be
added as group members
25
Accessing Regional Resources
26
Accessing Support
27
●ComputeCanada support contacts:○[email protected] for the general support○specific support contacts like [email protected] ○Regional consortia might have their own support systems ○Westgrid: [email protected] ○Local UofM staff can be reached through IST Services Catalogue as well○http://umanitoba.ca/ist/service_catalogue/
●Documentation and Training○ComputeCanada: https://docs.computecanada.ca ○Westgrid Website: https://www.westgrid.ca ○Westgrid Training Events calendar: https://www.westgrid.ca//events
Getting more than Default Share
28
●Default RAC (Rapid Access Service now)○A fraction of resources (compute, storage) is left for it.○Groups can start using it right away, no application needed○The fraction is not large, so you might want to take part in:
●Annual ComputeCanada Resource Competitions ○Resources for Research Groups (RRG), formerly RAC○Research Portals and Platforms (RPP) ○In detail today by Patrick Mann, Westgrid’s Director of Operations!
Compute Canada’s Resource Allocation Competition (RAC):
BEST PRACTICESPatrick Mann, Director of Operations
(University of Manitoba Version)
Introduction
1. What is the RAC?2. Manitoba Stats3. Overall and Admin Details4. Tips and Best Practices
a. Reviewersb. Basic Best Practicesc. RRG specificsd. RPP specifics
5. Questions and Discussion
WG Best Practices
This document is a WestGrid-provided best practices guide for the Compute Canada Resource Allocation Competition (RAC). It does not replace the official RAC documents and PI’s should carefully review the Compute Canada official documentation. In cases of conflict between this Best Practices Guide and the official RAC documents, the official documents take precedence.
What is the RAC?
Resource Allocation Competition for allocations on Compute Canada national services
About 20% reserved for opportunistic use.● Available to all CC users - Rapid Access Service (RAS).● Any researcher/student at a Canadian institution can become a user.
○ No application required beyond request for an account
~80% allocated through a competitive process.
Two competitions in RAC 2018:○ Research Platforms and Portals (RPP) Competition○ Resources for Research Groups (RRG) Competition
Manitoba RAC 2017 Results● 100% success rate for 2017!
○ All 10 who applied were successful, receiving resources valued at $434,662. ○ 2-RPP, 3-Engineering, 1-Chemistry, 1-Bioinformatics, 1-Earth Sciences, 1-Nano,
1-Astronomy/Subatomic Physics
● Received 7.9% of the total RAC awards within WestGrid
○ Higher than the total % of users within WestGrid (6.5%)
● 3.3/5 average science score; average CPU requests scaled 55%
○ Equal to national averages
● However Manitoba researchers did receive the lowest CPU allocation relative to their request, receiving 52% of the core years they requested, compared to the WestGrid and Compute Canada average of 58%.
Value & Total Awards
Allocations by Research Area
RAS vs RAC - What do you need?
RAS: Rapid Access Service (default resources available to any user with a Compute Canada account)
RAC - Resource Allocation Competitions (annual competitive process to allocate resources based on scientific excellence)
2018 RAC Structure
2018 RAC Key Dates
RAC 2018 Start FinishFast Track submission Oct 3, 2017 Nov 2, 2017
RRG & RPP full application submission Oct 3, 2017 Nov 16, 2017RPP Progress Report submission Jan 30, 2017 Mar 14, 2018
Award letters sent* Mid Mar, 2018 Late Mar, 2018
RAC 2018 Allocations implemented* Mid Apr, 2018 Early May, 2018
* Final dates to be confirmed.
Under Pressure!
Note: does not include Challenge 1 awards or RRG out-of-round applications.
Good news...System Current Status
Cedar(GP2, SFU)
Current: 27,696 CPU cores, 146 GPU nodes (4 x NVidia P100)● Major expansion in progress● ~doubling in size. Includes storage.
IN OPERATION (June 30, 2017)Expansion Dec.31/2017RAC 2017
Graham(GP3, Waterloo)
Current: 33,472 CPU cores, 160 GPU nodes (2 x NVidia P100)● Storage expansion
IN OPERATION (June 30, 2017)Expansion Dec.31/2017RAC 2017
Niagara(LP1, Toronto)
Large parallel system ~60,000 cores● Current: final contract negotiations with vendor
Expected online early 2018RAC 2018
GP4 New general purpose system (like cedar/graham)● Includes third “deep” storage site
Early stage RFP developmentLater in 2018Not included in RAC 2018
But: ~20 legacy clusters will be defunded -> in total slightly fewer cores, but they should be faster.
Competitive Process
● CFI mandates allocations based on excellence
● Peer-review process to determine SCIENCE SCORE
● High Score = Less Scaling
***Scaling is required due to insufficient resources, which makes the RAC a very competitive process.***
Admin Changes● Fast Track eligibility criteria have been relaxed to favour more users: 296
eligible PIs - largest number ever invited.○ Invitations have been sent out.○ If you are reasonably happy with your 2017 allocation then fast-track is
very straightforward and requires no further work on your part.○ Fast Track applications cannot be delegated: PIs must complete the
application themselves.● PIs can delegate the completion of a RPP or RRG application to a sponsored
user on CCDB.● Notice of Intent (NOI) is not a prerequisite for applying to any RAC
competition this year.● Storage types (e.g., /project, /nearline, etc) are shown only when the resource
you requested has that type of storage available for allocation (see list of allocatable resources for RAC 2018)
CCV
● Required for RRG and RPP full applications (update not required for Fast-Tracks or progress reports).● Note: Applicants with a CCV on CCDB - need to delete and resubmit. Carefully read the instructions in
the CCV Submission Guide (section 6)
Keep your CCV up to date!1. CCV is used by the review committees - quality of the PI and his/her research.
Keep your bibliography up-to-date to ensure that reviewers are aware of the latest (and greatest!) work.2. CC relies on the CCV for bibliographic analyses used both for annual reporting, and for funding proposals.
The Field Weighted Citation Index (FWCI) is a key component of the metrics presented by CC.
“For future allocation requests, please consider adding references to published work in the proposal, progress made in the previous year, and clear description of how HQP will be involved in achieving the milestones.”
“Progress over the past year is missing. Based on the CCV, most of the group's publications were enabled through Compute Canada resources, so that information should have been accessible and would have been relevant to include.”
Out-of-Round & Early-Stage
New faculty can apply for an Out-of-Round● Send request to [email protected]
Use the RAS for tests and prototypes1. Early-stage or new users get an account and learn about the systems.2. Run prototypes or test jobs, and if possible production jobs3. Acquire performance statistics4. Predict future requirements5. Use the above information to create a well-justified RAC application.
(out-of-round or regular)
Tips & Best Practices
No Appeal Process
It is worth noting right away that there is no appeal process.
If the reviewers misunderstand a proposal (which does happen) there is no way of correcting or amplifying the proposal.
● Need to get it right the first time.
The Reviewers
● Science reviewers are experts in their discipline○ For details see list of committees and chairs on CC RAC pages
● But - not necessarily experts in the specific sub-discipline or area of any particular project.
● Very wide range of RAC proposals do not allow for area experts, so you cannot assume that your proposal is going to be read by someone who works directly in the area or field of the project.
Don’t have lots of detailed jargon. That’s very hard for someone outside the specific area. Write for a discipline expert, but not for the specific area.
Use the Templates
RRG Regular Stream ● Compute between 50*-1999 Core Years● Storage between 10-999 TBs● GPUs between 10-199 GPU years● Cloud Compute between 80-499 VCPUs● Persistent Cloud between 10-99 VCPUs
RRG Large Stream Anything larger than the above
We strongly recommend that PIs use the templates.● RRG Application Template (Regular Stream) (Large Stream)● RPP Application Template
Summaries and details are in the following sections.
Justify your Request
● Address the evaluation criteria (details to follow)● Provide details on significance and impact of research.● Citation rates of recent work help justify science.● Justify why/how the resources requested will be used to accomplish/support
the science.● Poor proposals generally do not provide sufficient information, or have mixed
technical requirements into the research justification. This is very difficult to decipher for both science and technical reviewers, and results in poor scores.
“Please improve motivation of why the proposed calculations are important, and what is to be learned and/or what other science depends on the results.”
Provide Adequate Details
● Clearly explain WHAT the science is, using specific details. ● Use tables to provide resource details by project● If it is difficult to predict usage then emphasize the areas of uncertainty.
Project Team Members
Estimated Core-Years
/project Storage
Memory/ core Comments
Project 1 Student X 10,000 100TB 4 GB ...
Project 2 Students Y, Z 5,000 50TB 32 GB ...
Totals: 15,000 150TB
“The technical justification did not show a calculation of the computing and storage needs. Providing a table with this information, as suggested in the guidelines, would have made this section stronger.”
Be Concise
● Provide ALL the information asked for, but ONLY what is asked for. ○ Use the templates!
● Answer the specific questions and provide only those details requested.● Do not re-use submissions from other proposals or competitions.
○ Students and post-docs can write sections (they are the experts) but the overall proposal needs an experienced, guiding hand.
● Be clear, avoid jargon - reviewers are not experts in all sub-domains.● Take time to edit and review the application before submitting it.
“The proposal is very long. The very detailed scientific justification is more reminiscent of a NSERC proposal. For next year, I recommend to shorten the science part, and to work out more clearly the purpose of the CC usage, and the results that will be enabled by the CC Allocation.”
HQP Impact
● Ideally provide a table showing the expected HQP at each level (undergraduate, masters, doctorate, etc).
● Highlight the contribution of any excellent graduate students.● Mention any training opportunities beyond that of normal academic
teaching.
“For future allocation requests… describe in detail the involvement of HQP in past projects that utilized Compute Canada resources. The current proposal states that one PDF will be involved and reviewers
were wondering if there were plans for this PDF to mentor junior graduate or undergraduate students. If so, mention this in the proposal.”
***“This proposal was very clearly articulated with an integrated HQP training plan. The number of HQPs is
impressive.”
Updated CCV & Past Year Progress
● Keep your CCV up to date!● Reference citations in the proposal (“Progress over the Past Year” section)
which were or will continue to be supported by the resources requested.
“For future allocation requests, please consider adding references to published work in the proposal, progress made in the previous year, and clear description of how HQP will be involved in achieving the
milestones.”***
“Progress over the past year is missing. Based on the CCV, most of the group's publications were enabled through Compute Canada resources, so that information should have been accessible and
would have been relevant to include.”
Narrative
● This was emphasized by reviewers!
Science impact → solution techniques and challengesSolution techniques and challenges → the technologies and resource ask.
● Team (HQP, RPP management, ..) supports the science and the technologies/ask.
There should be a thread or narrative used to present a well-connected and justified story.
Discrepancies in the Asks
Very important: Ensure there are NO discrepancies between the resources requested in your technical justification
document and the online application.
● Seems to happen a number of times every year.● Requests on the form are not consistent with the technical justification.● Please be careful - we have missed or mistaken the resource requests in
the past.
RRG
Resources for Research Groups● Aimed at Job-based use of the big clusters.
Emphasis1. Quality of Science - including impact and technical
justification2. Quality of Applicant(s)
RRG Evaluation CriteriaQuality of Science
60% Research Problem and Justification● originality and innovation● significance and expected contributions to research● clarity and scope of objectives● impact of the research
Technical Justification● clarity and appropriateness of methodology● feasibility● discussion of relevant issues
Training and Support of HQP
Quality of the Applicant(s)
40% Research Problem and Justification● knowledge, expertise, and experience● quality of contributions to, and impact on, the proposed and other research areas● importance of contributions
Training and Support of HQP
Blue headers are from the templates
RRG TemplateSection Regular Large
Research Problem and Justification
1 section - 3 pages 2 subsections:● Research Problem (1 page)● Research Justification (2-3 pages)
Training and Support of HQP
1 page 1 page
Technical Justification 2 pagesCompute Request (3 sections)
● Code details, performance and utilization
● Memory● Compute
Storage Requests (1 section)
3 pagesCompute Request (5 subsections)
● System selection● Code details● Code performance and utilization● Memory requirements● Size of compute request
Storage Requests (3 subsections)● Details● Performance and utilization● Size
Progress over last year ½ page Length not specified. About a page is reasonable.
Technical Justification I
Provide performance estimates.
Explain the projects being worked on and the proposed research plan.
● RAS - default resources available for testing and prototyping.● Performance tables really help the reviewers, and make the proposal look good!
● Tabulate the number and size of the needed runs/jobs/virtual machines● Again tables and breakdowns really add to the proposal.
○ Sub-tasks or sub-projects, phases, team members, ..
● If it is difficult to predict usage then emphasize the areas of uncertainty.○ early research issues where details are still to be worked out○ administrative issues like onboarding graduate students or postdocs who have not yet worked
out a detailed research plan.
Technical Justification II
Provide details of the numerical and computational techniques used.
● Describe the numerical and computational techniques. Expect reviewers to be computationally literate, but not experts in your particular field.
● Explain the way these techniques will address the science and the challenges.● Explain any development of new or revised approaches.● Provide details of any software packages required. ● Show that the users are experienced and familiar with the performance
characteristics of such packages.● If the packages are new to the research team then describe the approach for
familiarization. If possible describe any tests that the team may undertake.
Resource Ask
● A critical section!● Tables showing your performance/requirements.● Tables showing a project breakdown.● Tie the previous techniques, science, HQP sections
together into the final ask (the thread!).
Carefully justify the resource ask.
Both science and technical reviewers are very aware of inflated, unjustified asks and have unilaterally decreased asks in the past.
Scaling
We really appreciate an upfront, clear presentation of the impact of scaling on the project. What would you as a researcher do if you were given some percentage of your ask?
● Impact on hiring graduate students.● Impact on hiring post-docs.● Issues with collaborative projects for the other co-PI’s or co-workers.● Critical or short-term projects that are necessary for further activities.● Identification of lower-priority projects or activities that could be postponed.● Projects which are critical to, for instance, an early-stage researcher who is
applying for tenure.
RPP/Cloud
RPP is aimed at Platforms and Portals1. Portal - reasonably straightforward web server2. Platform - larger-scale
a. Analysis platform (job creation, job submission, virtual cluster, ..)b. Data dissemination (metadata, search, object storage, ..)
3. Computea. Unique or specialized software stacksb. International agreements requiring multi-year
There have been quite enthusiastic ideas with big asks, but in practice the user community is quite specialized and uptake is dependent on the user interfaces and the services offered. Such asks are critically reviewed and the allocation decreased.
RPP Review Overview
Review Emphasis:1. Impact
a. what is the user community?b. what is being provided?
2. Development and Managementa. Is the team capable of developing and managing a major platform/portal
system in the cloud.b. Are the requested resources reasonable for the predicted user community?
The criteria and templates are more detailed compared to RRG.The current (RAC 2018) cloud resources are IaaS (“Infrastructure as a
Service”) resources. So it is completely up to the project team to design and implement a suitable architecture.
RPP Evaluation Criteria ICriteria Weight Notes
Goal and Alignment
5 ● “Project goal is clearly stated”● “Area of focus is of importance to Canada, and will generate positive
benefits.
Use and development of a Platform/Portal
5 ● “Clearly explained the added value .. of the proposed portal”● “Driven by the targeted research community”● “Details level of interaction between Canadian and international research
groups”● Sharing data sets is well detailed.● “Addresses any potential accessibility issues”
Expected Impact 5 ● “Impacts have been clearly articulated”● Means by which impacts will be measured.● Clear timeline.● Outcomes are of relevance and importance, and will benefit the partners
and research community
RPP Evaluation Criteria II
Criteria Weight Notes
Management and Operation
5 ● Management is well-defined and will provide broad access to research community
● Resource access process is well-defined.● Credible plan to maintain or increase the community.● If applicable scientific evaluation of projects accessing the portal will
result in excellent research.● Information collected will provide ability to track impacts (scientify,
HQP, social, ..) and the participants are maximizing the benefits of the resources accessed.
● The team has the right combination of skills.
Highly Qualified Personnel
5 ● Significant benefits to HQP● HQP have been clearly delineated across academic levels● Platform/portal will create unique training opportunities● Identify strong examples of cross-pollination between HQP disciplines
RPP Evaluation Criteria IIIAppropriateness of Resources Requested
5 ● good understanding of the resources required● provided a justified request for resources.● resources are acceptable for the level of activities and anticipated
deliverables.● well justified over the entire duration of the request.● distribution of resources over the entire duration of the request is well
defined and includes any periods of high demand.● If applicable – locations of the resources requested are appropriate given
the capacity of the Compute Canada systems.● If applicable – storage requested has been appropriately categorized as
long-term/archival storage or a rapid accessibility storage solution.● For applications involving international participation, the application has
indicated the domestic proportion of resources for use across Canada compared to the international proportion and distribution.
● The applicant has provided a clear indication of the impacts of scaling down the resource allocation and has tied these impacts to the expected outcomes.
RPP Template Overview
Resource Justification
Portals - 3 pagesPlatforms - 6 pages
● Resource request summary - a table● Expected Resource Usage and Needs● Support required from CC.
Strategic Plan
Portals - 3 pagesPlatforms - 15 pages
● Goal and alignment with CC strategic plan● Need for and Use of the Platform/Portal● Expected Impacts● Management and Operation of the Platform/Portal● Highly Qualified Personnel (HQP)
Resource Justification next slide
RPP Template I
Resource JustificationPortals - 3 pagesPlatforms - 6 pages
● Resource request summary - a table● Expected Resource Usage
○ Description of storage and compute needs.○ Rationale for the choice of specific computing and storage resources (if
applicable).○ Description of high demand periods (If applicable).○ Description of who is expected to use the resources (domestic proportion
across country, international proportion and distribution).● Support required from CC.
RPP Template II
Strategic PlanPortals - 3 pagesPlatforms - 15 pages
Goal and alignment with CC strategic planUse of a Platform/Portal
● need for a platform/portal and added value created for the user community● context of the demand by the research community to consolidate efforts into a platform/portal● level of interaction between Canadian and international research groups and platforms● approach to sharing data sets across or within the platform/portal
Expected Impacts● expected impacts of the research resulting from use of the platform/portal● list of the outcomes resulting from the use of Compute Canada resources and the expected schedule for
achieving these outcomes● Explain how other partners, researchers or scientific areas will benefit from the outcomes of the research
Management and Operation of the Platform/Portal● Details of the management system and how the resources will be shared/distributed across the
platform/portal● Explain how the research community will access the resources and how the platform/portal will
maintain/increase the population accessing resources● evaluation mechanisms used to provide scientific rigor in selecting projects● information collected and the regularity of the project reporting to track the progress towards the
anticipated expected outcomesHighly Qualified Personnel (HQP)
● Students and post-docs● Canadian and International collaborators
RPP Phased Plans
We encourage phased plans, with for instance performance indicators showing a well-justified and strong understanding if the issues involved with not only technical development, but also marketing and communications.
If possible define performance and success metrics. It’s always useful to have a nice table defining your Key Performance Indicators (KPIs)
Especially startups - lots of large, enthusiastic but unjustified estimates.● Start small.● Include KPI’s to measure performance.● Briefly address marketing and communications.
RPP Impact Best Practices
1. Identify the audience/community.a. Who would be interested in using the proposed platform or portal?b. Why are they interested?c. Is the community national in scope? International?
2. What is the size of the community?a. justify any such estimates.
3. What kinds of research would you expect the community to carry out?a. examples of exciting projects that would make use of the portal/platform.
4. Are there any agreements in place that would put conditions on the request?a. For instance the Atlas High-Energy physics project is part of the Large
Hadron Collider collaboration, and allocations must satisfy international agreements.
RPP Dev & Management1. Carefully describe the server architecture.
a. Feeds into the resources ask.
2. Describe any security/privacy concerns or requirementsa. steps that will be taken to satisfy such requirements.
3. Is the architecture scalable?a. Refer to the size of the community to justify any scalability issues or plans.
4. Demonstrate the expertise and capability of your staff.a. A portal or platform requires system management so identify the staff resources necessary to
both develop and operationally manage your proposed system.
5. Special requirements?a. Cloud or tape based backupsb. Particularly large memory requirementsc. Performance or response requirementsd. Redundancy and Reliability requirements. For instance multi-site architectures.e. ...
GPUs
● Both Cedar and Graham have GPUs (P100’s)○ See docs.computecanada.ca for details.
● Currently phase 2 discussions about GPU’s○ Cedar expansion (2x): should it include GPU nodes?○ GP4: what mix of GPU’s? Volta? Tesla? Maybe even gamer versions?○ How many GPU’s per node?○ Tradeoff: cheap cpu nodes vs more expensive GPU nodes
● Now is the time to make special requests to the GP4 and Cedar teams.○ Send comments to me: [email protected]
Resource Links
● Ask a question:○ About RAC ~ [email protected] ○ About Compute Canada CCV ~ [email protected]
● Find an answer:○ General Competition Information○ RPP Guide
■ Application Template○ RRG Guide
■ Application Template (Regular Stream) (Large Stream)○ Technical Glossary○ Frequently Asked Questions○ CCV Guide
● Presentations & Guides○ WestGrid’s Summary of Best Practices for RAC 2018○ The Full WestGrid Best Practices Guide○ Compute Canada RAC Q&A October 2017
Contact us anytime:[email protected]
www.westgrid.cadocs.computecanada.ca
GOOD LUCK!
Questions?