Compute Canada Technology Briefing November 12, 2015


Compute Canada Technology Briefing

November 12, 2015

Introduction

Compute Canada, in partnership with regional organizations ACENET, Calcul Québec, Compute Ontario and WestGrid, leads the acceleration of research innovation by deploying state-of-the-art advanced research computing (ARC) systems, storage and software solutions. Together we provide essential digital research services and infrastructure for Canadian researchers and their collaborators in all academic and industrial sectors. Our world-class team of more than 200 experts, employed by 35 partner universities and research institutions across the country, provides direct support to research teams and industrial partners.

Advanced research computing accelerates research and discovery and helps solve today's grand scientific challenges. Using Compute Canada resources, research teams and their international partners work with industry giants in the automotive, ICT, life sciences, aerospace and manufacturing sectors to drive innovation and new products to market. Canadian researchers leverage their access to expert support and infrastructure to participate in international initiatives. Researchers using advanced research computing rate significantly higher in citations than the average from Canada's top research universities and any international discipline average.


Key Facts

- The investment of $75 million in funding from the Canada Foundation for Innovation (CFI) and provincial partners will address urgent and pressing needs and replace aging high performance computing systems across Canada.

- Compute Canada and its regional partners have more than 18 years of experience in accelerating results from industrial partnerships in advanced research computing and Canada's major science investments.

- Compute Canada currently manages more than 20 petabytes of storage and 2 petaflops of computing resources, and supports all of Canada's major science investments and programs.

- With the implementation of this technology deployment plan, Compute Canada will manage more than 60 petabytes of storage and 134 petaflops of computing resources.

What this means for Canada's Research Community

These improvements will allow Compute Canada to continue to support the wide array of excellent Canadian research identified in the proposal. The purchase of significantly more storage, deployed as part of an enhanced national storage infrastructure, will accelerate data-intensive research in Canada. The ability to purchase a single Large Parallel machine of over 65,000 cores will provide Canada's largest compute-intensive users with a new resource which far exceeds any machine in the Compute Canada fleet today.

This investment is more than an opportunity to increase the size of storage systems and the raw number of cores. The new systems replace old technology with new technology and will be deployed with national services, coherent policies and a new operational model for the organization. This enhanced service level will allow more researchers to exploit the planned four new systems in an efficient and effective way.

Overview

Compute Canada is the national resource provider for advanced research computing and big data, delivering a full range of systems and services to researchers. Funding from the Canada Foundation for Innovation (CFI), matching funds from provincial partners, and in-kind contributions from vendors will enable the significant technology refresh program described below.

This technology briefing document is intended to be circulated to Compute Canada stakeholders and suppliers. It provides status and planning for the technology refresh program resulting from CFI's cyberinfrastructure initiative, which will be implemented from 2015 through early 2018. It also anticipates planning for future growth.

The total value of this capital program is $75M, to be spent mainly in 2016 and 2017. This reflects a $30M capital grant from CFI, a further $30M from provincial and institutional sources, and $15M of vendor in-kind contributions [1]. By the end of 2017, many legacy systems will have been replaced by new computational systems and storage totalling over 123,000 CPU cores and 60 PB of storage.

[1] See www.innovation.ca/en/OurFunds/Complementaryinformation


New Systems at Four National Sites

Through a formal competition among Compute Canada member institutions, four sites were selected to host the new systems and associated services. They are the University of Victoria, Simon Fraser University (SFU), the University of Waterloo and the University of Toronto.

New Computational Systems

Planning for the new systems at the four sites has been responsive to user demand, site affinity and experience, and shifts in timing and funding. Envisioned system characteristics follow.

University of Victoria: The "GP1" system will be an OpenStack cloud, with emphasis on hosting virtual machines and other cloud workloads. At least 3,000 CPU cores [2] are anticipated by early 2016, with a 40% expansion planned in 2017.

Simon Fraser University: The "GP2" system will mainly focus on a mix of batch-oriented parallel and serial workloads, with several different node types. It will also have a relatively small OpenStack partition that will federate with GP1 and GP3. Node types will include some large memory nodes as well as approximately 192 [3] GPU nodes. At least 18,000 CPU cores are anticipated for mid-2016, with a 40% expansion planned in 2017.

University of Waterloo: The "GP3" system will have a similar design to GP2, and it is anticipated that GP2 and GP3 together will provide features for workload portability and resiliency. Plans for GP3 include at least 19,000 CPU cores in late 2016, with approximately 64 GPU nodes. A 40% expansion is planned in 2017.

University of Toronto: The "LP" system will be deployed by approximately mid-2017 and is anticipated to have at least 66,000 CPU cores. This will be a balanced, tightly coupled high performance computing resource designed mainly for large parallel workloads.
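The per-site minimums above can be tallied in a few lines. The following is a rough sketch only: it reads the planned 2017 expansion of the three GP systems as 40 percent (an assumption, since the percent sign does not survive in the text) and assumes the LP system is not expanded. The variable and function names are illustrative and do not come from the briefing.

```python
# Minimum core counts quoted in the briefing, in Haswell-equivalent cores.
INITIAL_CORES = {"GP1": 3_000, "GP2": 18_000, "GP3": 19_000, "LP": 66_000}

# Assumption: the expansion planned for 2017 is 40 percent and applies
# to the three GP systems only; the LP system keeps its initial size.
GP_EXPANSION_PERCENT = 40
EXPANDED_SYSTEMS = {"GP1", "GP2", "GP3"}


def cores_after_expansion(initial, expanded, percent):
    """Return per-system core counts after the planned expansion."""
    return {
        name: cores * (100 + percent) // 100 if name in expanded else cores
        for name, cores in initial.items()
    }


final_cores = cores_after_expansion(
    INITIAL_CORES, EXPANDED_SYSTEMS, GP_EXPANSION_PERCENT
)
total_initial = sum(INITIAL_CORES.values())
total_final = sum(final_cores.values())

for name, cores in final_cores.items():
    print(f"{name}: {INITIAL_CORES[name]:>7,} -> {cores:>7,}")
print(f"Total: {total_initial:,} -> {total_final:,}")
```

Under these assumptions the tally comes to 106,000 cores before expansion and 122,000 after. That is slightly below the program-wide totals quoted elsewhere in this briefing, presumably because the per-site figures are stated as minimums.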

National Storage Architecture

A new national storage architecture spanning the four sites will offer important benefits to users. Compute Canada will utilize concepts of generic "storage building blocks", which will use software-defined storage techniques to deploy capacity and performance in a flexible, easily expandable, highly interoperable and cost-effective manner. In addition to providing file systems for file-based storage, there will be object storage services. Object storage services will provide ease of use and built-in features including resiliency, georeplication, enhanced metadata, and combinations of public access and data isolation.

Approximately 20 petabytes (PB) of persistent storage is planned to be deployed across the four sites in early and mid-2016, with expansion to over 60 PB by early 2018. An offline/nearline tier of over 20 PB will provide lower-cost capacity for backups and hierarchical storage management. High performance parallel filesystems for GP2, GP3 and LP will also be deployed.

[2] CPU core count equivalents are based on Intel "Haswell" computational capabilities.
[3] All future plans for nodes, CPUs and other specifications are intended as conservative estimates.

Other Compute Canada member sites will be able to benefit from the national storage architecture, including those sites operating legacy resources. For example, users may need to migrate data to the new systems, or they might have use cases that will benefit from object storage or the larger capacity and higher performance the newer systems will offer.

Delivery Timeline

The Challenge 2 Stage 1+2 technology refresh will span 2 years of staged deployment. By the end of calendar year 2017, essentially all Challenge 2 Stage 1+2 funds will have been expended. The total supply at that time is forecasted to be at least 126,500 CPU cores ("Haswell" equivalents) and 62 petabytes of usable persistent storage. Storage does not include near-line or backup storage, nor high-speed parallel scratch space.

Challenge 2 Stage 1+2 Technology Planning. Compute is in Haswell-equivalent cores. Storage is in usable petabytes. Timeframe is calendar year quarters (i.e., "Q1 2016" is January-March 2016) and is approximate. The core and storage targets are estimates only.

During the same two-year period, much of Compute Canada's existing equipment will be defunded and removed from the allocations process. Users will be moved to one of the new systems, and needed data will be migrated. Planning in 2014 for the site selection process identified 26 systems with 82,000 CPU cores from older generations (nearly 1 PF total) for retirement by early calendar year 2017. A schedule for the remaining systems will be developed in conjunction with planning for further technology expansion, with some of the remaining systems likely to be removed from the allocations process in 2018. Much of the 15 PB of allocatable storage available in 2015 will also be defunded and removed from the allocations process during the 2016-2018 period.


Organizational Cooperation and Planning

Planning for site selection and the ensuing technology refresh has included deep coordination among the four national host sites for all aspects of procuring, deploying, configuring, operating and supporting Compute Canada's suite of systems and services. The Compute Canada Technological Leadership Council (TLC) is responsible for developing specifications for the new systems and will lead the procurement evaluation. The TLC includes representatives from each national site as well as the four regional CTOs. It is led by the national CTO.

New national teams, which will draw from Compute Canada member institutions, will run the systems and services, provide user support, and engage in cross-site coordination on major themes such as monitoring, storage, cloud services and networking. The new systems and services will share practices for security. The teams for all national systems and services will provide defined coverage and response levels.

Procurement Processes

All four sites are working with the Compute Canada team to ensure an open and fair acquisition process. Resources will be purchased and owned by each site. Formation of specifications and evaluation of bids will be by national teams, with full engagement by site procurement officers.

Flexibility in Planning

Plans described here will be modified as needed, based on discussions among the four sites, Compute Canada, and the national and provincial funding agencies. Re-scaling of expectations for system size and capabilities, if needed, will be based on experience with vendor pricing and the influence of the Canadian dollar's exchange rate. There will also be assessment of anticipated user demand, including for new technologies or configurations. This will be via the SPARC process described below, as well as through discussions with funding agencies and their researchers.

By late 2016, updates will be considered for any needed revisions to planning for the expansion of the three GP systems, and for the scale, configuration and timing of the LP system. Alignment of supply and demand will be re-assessed for computation and storage.

Planning will also be responsive to any new information concerning additional funding, the selection of additional hosting sites, shifts to Canada's digital research infrastructure strategy, or other factors.

Funding and Governance

SFU is the lead organization for the CFI capital program and is executing an inter-institutional agreement with the three other hosting institutions and Compute Canada. The Compute Canada membership will be involved in many broader aspects of organizational governance and planning. CFI retains oversight for capital spending for the technology refresh, as well as operational expenses via the Major Science Initiatives (MSI) program.

Usage and Capabilities

As Canada's national platform for advanced research computing, Compute Canada serves thousands of users in essentially every scientific discipline. Compute Canada is continually engaged in renewal and expansion of its services and its audience. Beyond Canada's academic community, this includes engagement with industry and with international partners. Some of the current and expanded services within Compute Canada are described below.

Workload Portability: Users will find it easier than before to run their jobs on any of the new systems. This will be facilitated by deploying a single HPC batch system, having a common naming scheme for software modules and filesystem mount points, and incorporating mechanisms for data movement with the workload manager. For projects involving a live stream of observational data or other time-sensitive characteristics, workload portability will help to ensure the jobs run on time wherever appropriate HPC resources are available.

Cloud Computing: Building on Compute Canada's successful early deployment of cloud systems and services, the GP1, GP2 and GP3 systems will comprise a federated cloud, including single sign-on, shared data services, a common cloud scheduler, and other features for resiliency and ease of use. Additional cloud resources within Compute Canada will be able to become part of the federated cloud simply by using the same authentication and configuration parameters.

Big Data: The storage architecture and cloud services will facilitate big data workloads, including data analytics. Storage will include database capabilities, and cloud services will support virtual machines with user-selected software and features.

National Operations and Support: The national teams will work together to provide a consistent and well-supported environment for computation and data. This will include all aspects of configuration and support. Users will have a single point of contact through the national helpdesk, and will also be able to benefit from the expertise of on-campus support personnel.

Resource Allocations: Compute Canada will continue to allocate compute and storage resources through a fair and open process. Workload portability and the consistency of configuration and support will give users extra flexibility, when desired, in their choice of computing resources.


National Services

Consultations have helped inform planning for systems and services in 2015-2017. Service demands were articulated while consulting with applicants for CFI's Challenge 1 Stage 1, including these middleware services that were identified by multiple applicants:

- Identification and Authorization Service: Provide common login across systems

- Software Distribution Service: Version-controlled software distribution to multiple sites

- Data Transfer Service: To move datasets among collaborators and their repositories

- Monitoring Service: Track uptime and availability of services and platforms

- Resource Publishing Service: Current information about available resources

These services will be deployed beginning in 2016 for all new systems as part of the infrastructure investment. Additional services will be identified, developed, deployed and supported based on demand. It is Compute Canada's intention to provide a useful and effective set of middleware services accessible to any user or group. These will provide a high performance and well-supported baseline upon which users or groups may build their own custom applications. Compute Canada views these tools as needed software infrastructure and is devoting some of the Challenge 2 Stage 1+2 funds to developing that infrastructure.

Compute Canada views many of the new services identified above as essential enabling tools for Research Data Management (RDM). As data volumes grow, so does the demand for RDM. Compute Canada will provide a common set of middleware services for users with this need. RDM will continue to mature during the 2016-2018 period and will include cooperation with other digital research infrastructure providers in Canada.

Future Consultations on this Plan

In early 2016, Compute Canada will embark on a second round of SPARC consultations. SPARC2 will help to identify current and future needs, as well as to parameterize growth in user demand. As with the previous SPARC, scientists and engineers from across Canada will be invited to submit descriptions of their research goals and the advanced research computing capabilities and capacities required to achieve those goals.

Projections for Future Supply and Demand

Technology Impact of Challenge 2 Stage 1+2

By the end of calendar year 2017, Compute Canada will have delivered essentially all of the new computational and storage capacity facilitated by CFI's "Challenge 2 Stage 1+2" award. The $75M value of capital investment will replace most legacy systems and associated storage.

Modernization and capacity resulting from Challenge 2 Stage 1+2 [4]

[4] Primary disk does not include offline/nearline storage for backups or near-line storage. It does include a variety of disk- or disk-like technologies, including object storage, block storage, storage replicas, and storage for filesystems.


During this technology refresh program, CPUs will be replaced with the latest generation, along with more memory. New nodes will be augmented by GPUs and accelerators. A typical node in service in 2015 has dual 6- or 8-core CPUs and 16-32 GB of memory. A typical node to be deployed in 2016 will have dual 14- or 16-core CPUs with 128 GB of memory or greater.

Challenge 2 Stage 1+2 is an important and necessary modernization of the digital research infrastructure (DRI) provided by Compute Canada. Sustained investment is needed to accommodate the needs of current and future users of Canada's national platform for computation and storage.

Scenarios of Increasing Demand

Several factors affect planning for future demand for advanced research computing:

1. Demand by users who engage in computational modelling for additional CPU resources
   a. To increase spatial or temporal resolution
   b. To add physics or other simulation factors that were previously too slow or computationally expensive to calculate
   c. To test additional parameters or scenarios
   d. For projects and users new to Compute Canada, especially in nontraditional fields

2. Demand for additional storage resources for computational modellers
   a. Larger input and output datasets due to larger or more complex models
   b. The need to keep some datasets beyond the end of a computational campaign, to assist in future modelling or to support publications

3. Demand for portals and gateways, including from new user populations
   a. May include needs for highly resilient services and systems
   b. May include needs for high-end storage subsystems for database operations
   c. Bring a user base that may be quite large and may include the general public

4. Demand for projects emphasizing instruments and observational data gathering and analysis
   a. May have irreplaceable or highly valuable data which needs to have multiple copies at multiple locations
   b. Include Compute Canada's largest storage users, many of whom have new instruments in development
   c. Require computational resources for post-processing, analysis, portals, visualization and/or reanalysis

5. Demand by data-focused projects
   a. May require isolation of data or computation from inappropriate disclosure
   b. Includes some usage (such as personal health information) with regulatory concerns
   c. Includes emphasis on data analytic methods that are not yet generally available on Compute Canada resources

6. Demand from projects being directed by funding agencies to consider utilizing Compute Canada resources
   a. Include a range of use cases that might not be in Compute Canada's current service catalog but will be developed
   b. Some of these projects are very large and demanding
   c. Projects view Compute Canada as a partner, not just a resource provider

7. Demand from industry
   a. May require isolation of data or computation from inappropriate disclosure
   b. Interested in the expertise of Compute Canada, perhaps more than the computational resources

8. Demand from government
   a. Exists within a regulatory environment that might not be part of Compute Canada's current service catalog but will be developed
   b. Can involve a long planning timeline


For Challenge 2 Stage 1+2, planning emphasized modernization of the computational and storage resources. Planning has been sensitive to anticipated demand growth and changing patterns of utilization, informed via Challenge 1 and the SPARC consultations. The annual allocations process by Compute Canada is a major indicator of growth trends, since it aggregates hundreds of existing projects. Data for the 2016 allocation period are now available and reflect the impact of Challenge 2 Stage 1+2. In 2017, further growth is anticipated, along with retirement of legacy resources.

Through the 2014-15 SPARC process, Compute Canada has identified the expected growth in community demand for storage (15x) and compute (7x) resources through 2020. This data has been converted into a doubling time to project future demand in equivalent core years and terabytes of storage. Demand indicators support using a doubling time of 1.8 years for computational demand and 1.3 years for storage demand.
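The conversion from a growth factor to a doubling time can be sketched as follows. This is an illustration under a stated assumption, not the method Compute Canada actually used: the briefing gives the growth factors (7x compute, 15x storage) but not the length of the window, so the five-year period below (2015 through 2020) is assumed. The function names are hypothetical.

```python
import math


def doubling_time(growth_factor: float, years: float) -> float:
    """Years needed for demand to double, given total growth over a period.

    If demand grows by growth_factor over `years`, then
    2 ** (years / T) == growth_factor, so T = years * ln(2) / ln(growth_factor).
    """
    return years * math.log(2) / math.log(growth_factor)


def projected_demand(baseline: float, doubling_years: float, elapsed: float) -> float:
    """Demand after `elapsed` years of steady exponential growth."""
    return baseline * 2 ** (elapsed / doubling_years)


# Assumption: the SPARC growth factors apply to a five-year window.
WINDOW_YEARS = 5.0

compute_doubling = doubling_time(7.0, WINDOW_YEARS)
storage_doubling = doubling_time(15.0, WINDOW_YEARS)

print(f"Compute doubling time: {compute_doubling:.1f} years")
print(f"Storage doubling time: {storage_doubling:.1f} years")
```

With a five-year window, the two growth factors round to doubling times of 1.8 and 1.3 years. `projected_demand` shows how a doubling time would then be applied to a baseline; no baseline is supplied here because the briefing does not state one.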

By forecasting demand based on these doubling times, we can project a trend into the future. For this projection, we present a range where the lower bound represents no growth in the Compute Canada user base and the upper bound represents ongoing increases in the user base following historical trends from 2011-2015.

Trends project a demand for 1-3 million Haswell-equivalent cores by 2020, and more than an exabyte of persistent storage. These projections may turn out to be underestimates, since some existing disciplines making extensive use of Compute Canada resources today anticipate needing over 1 million cores or 1 exabyte of data just for their own projects by 2020.

It is hoped that Compute Canada will be stewards, along with members, regions and provincial partners, of the sustained capital investment that will be required to meet these demands.


Vision 2020

Compute Canada, as a leading provider of digital research infrastructure (DRI), is taking an integrated approach to data and computational infrastructure in order to benefit all sectors of society. As a result of the technology refresh and modernization supported by CFI's Challenge 2 Stage 1+2, excellent research will benefit from modern and capable resources for computationally-based and data-focused work.

Compute Canada is coordinating with government funding agencies and with other DRI providers to develop a shared vision for providing the world's most advanced, integrated and capable systems, services and support for research. Future researchers will have seamless access to DRI resources, integrated together for maximum efficiency and performance, without needing to be concerned with artificial boundaries based on different geographical locations or providers.

By 2020, Compute Canada will offer a comprehensive catalog of resources to support the full data research cycle, allowing researchers and their industrial and international partners to compete at a global scale. In cooperation with Canada's other DRI providers, Compute Canada's systems and services will facilitate workflows that easily span different resources, from the lab or campus to national computational resources, analytical facilities, publication archives, and collaborators. Local support and engagement will remain a hallmark of delivering excellent service to all users. The pathway to this future has begun with the modernization of Compute Canada's advanced research computing cyberinfrastructure through the CFI Challenge 2 Stage 1+2 program.

36 York Mills Road, Suite 505, Toronto, Ontario, Canada M2P 2E9

www.computecanada.ca | www.calculcanada.ca | ComputeCanada

Page 2: Compute Canada Technology Briefing · Compute Canada resources research teams and their international partners work with industry giants in the automotive, ICT, life sciences, aerospace

IntroductionCompute Canada in partnership with regional organizations ACENET Calcul Queacutebec Compute Ontario and WestGrid leads the acceleration of research innovation by deploying state-of-the-art advanced research computing (ARC) systems storage and software solutions Together we provide essential digital research services and infrastructure for Canadian researchers and their collaborators in all academic and industrial sectors Our world-class team of more than 200 experts employed by 35 partner universities and research institutions across the country provide direct support to research teams and industrial partners Advanced research computing accelerates research and discovery and helps solve todayrsquos grand scientific challenges Using Compute Canada resources research teams and their international partners work with industry giants in the automotive ICT life sciences aerospace and manufacturing sectors to drive innovation and new products to marketCanadian researchers leverage their access to expert support and infrastructure to participate in international initiativesResearchers using advanced research computing rate significantly higher in citations than the average from Canadarsquos top research universities and any international discipline average

Compute Canada Technology Briefing - November 2015 3

Key Facts ņ The investment of $75 million in funding from the Canada Foundation for Innovation (CFI)

and provincial partners will address urgent and pressing needs and replace aging high performance computing systems across Canada

ņ Compute Canada and its regional partners have more than 18 years of experience in accelerating results from industrial partnerships in advanced research computing and Canadarsquos major science investments

ņ Compute Canada currently manages more than 20 petabytes of storage and 2 petaflops of computing resources and supports all of Canadarsquos major science investments and programs

ņ With the implementation of this technology deployment plan Compute Canada will manage more than 60 petabytes of storage and 134 petaflops of computing resources

ņ What this means for Canadarsquos Research Community

ņ These improvements will allow Compute Canada to continue to support the wide array of excellent Canadian research identified in the proposal The purchase of significantly more storage deployed as part of an enhanced national storage infrastructure will accelerate data-intensive research in Canada The ability to purchase a single Large Parallel machine of over 65000 cores will provide Canadarsquos largest compute-intensive users with a new resource which far exceeds any machine in the Compute Canada fleet today

ņ This investment is more than an opportunity to increase the size of storage systems and a raw number of cores The new systems replace old technology with new technology and will be deployed with national services coherent policies and a new operational model for the organization This enhanced service level will allow more researchers to exploit the planned four new systems in an efficient and effective way

OverviewCompute Canada is the national resource provider for advanced research computing and big data delivering a full range of systems and services to researchers Funding from the Canada Foundation for Innovation (CFI) matching funds from provincial partners and from vendors in the form of in-kind contributions will enable the significant technology refresh program described below

This technology briefing document is intended to be circulated to Compute Canada stakeholders and suppliers It provides status and planning for the technology refresh program resulting from CFIrsquos cyberinfrastructure initiative and will be implemented from 2015 through early 2018 It also anticipates planning for future growth

The total value of this capital program is $75M to be spent mainly in 2016 and 2017 This reflects a $30M capital grant from CFI a further $30M from provincial and institutional sources and $15M of vendor in-kind1 By the end of 2017 many legacy systems will have been replaced by new computational systems and storage totalling over 123000 CPU cores and 60 PB of storage

1 See wwwinnovationcaenOurFundsComplementaryinformation

Compute Canada Technology Briefing - November 2015 5

New Systems at Four National SitesThrough a formal competition among Compute Canada member institutions four sites were selected to host the new systems and associated services They are the University of Victoria Simon Fraser University (SFU) the University of Waterloo and the University of Toronto

New Computational SystemsPlanning for the new systems at the four sites has been responsive to user demand site affinity and experience and shifts in timing and funding Envisioned system characteristics follow

University of Victoria The ldquoGP1rdquo system will be an OpenStack cloud with emphasis on hosting virtual machines and other cloud workloads At least 3000 CPU cores2 are anticipated by early 2016 with a 40 expansion planned in 2017

Simon Fraser University The ldquoGP2rdquo system will mainly focus on a mix of batch-oriented parallel and serial workloads with several different node types It will also have a relatively small OpenStack partition that will federate with GP1 and GP3 Node types will include some large memory nodes as well as approximately 1923 GPU nodes At least 18000 CPU cores is anticipated for mid-2016 with a 40 expansion planned in 2017

University of Waterloo The ldquoGP3rdquo system will have a similar design to GP2 and it is anticipated that GP2 and GP3 together will provide features for workload portability and resiliency Plans for GP3 include at least 19000 CPU cores in late 2016 with approximately 64 GPU nodes A 40 expansion is planned in 2017

University of Toronto The ldquoLPrdquo system will be deployed by approximately mid-2017 anticipated to have at least 66000 CPU cores This will be a balanced tightly coupled high performance computing resource designed mainly large parallel workloads

National Storage ArchitectureA new national storage architecture spanning the four sites will offer important benefits to users Compute Canada will utilize concepts of generic ldquostorage building blocksrdquo which will use software-defined storage techniques to deploy capacity and performance in a flexible easily expandable highly interoperable and cost-effective manner In addition to providing file systems for file-based storage there will be object storage services Object storage services will provide ease of use and built-in features including resiliency georeplication enhanced metadata and combinations of public access and data isolation

Approximately 20 petabytes (PB) of persistent storage is planned to be deployed across the four sites in early and mid-2016 with expansion to over 60PB by early 2018 An offlinenearline tier of over 20PB will provide lower-cost capacity for backups and hierarchical storage management High performance parallel filesystems for GP2 GP3 and LP will also be deployed

2 CPU core count equivalents are based on Intel ldquoHaswellrdquo computational capabilities3 All future plans for nodes CPUs and other specifications are intended as conservative estimates

Other Compute Canada member sites will be able to benefit from the national storage architecture, including those sites operating legacy resources. For example, users may need to migrate data to the new systems, or they might have use cases that will benefit from object storage or the larger capacity and higher performance the newer systems will offer.

Delivery Timeline

The Challenge 2 Stage 1+2 technology refresh will span two years of staged deployment. By the end of calendar year 2017, essentially all Challenge 2 Stage 1+2 funds will have been expended. The total supply at that time is forecast to be at least 126,500 CPU cores (“Haswell” equivalents) and 62 petabytes of usable persistent storage. The storage figure does not include near-line or backup storage, nor high-speed parallel scratch space.
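As a rough cross-check of the core forecast, the per-site minimums quoted in this briefing can be totalled with the planned 40% expansion applied to the three GP systems. This is only a sketch: the GP1 figure of 3,000 cores comes from the University of Victoria description elsewhere in this briefing, and applying the expansion uniformly to the stated minimums is an assumption. The result, 122,000 cores, is a floor built from "at least" figures, so it sits somewhat below the forecast total, and the briefing does not itemize the difference.

```python
# Minimum initial core counts per GP system, as quoted in this briefing
# (Haswell-equivalent cores). GP1's figure is taken from the site
# description elsewhere in the document.
INITIAL_GP_CORES = {"GP1": 3000, "GP2": 18000, "GP3": 19000}
LP_CORES = 66000          # LP system, deployed around mid-2017
EXPANSION_PERCENT = 40    # planned 2017 expansion for each GP system


def expanded(cores, percent=EXPANSION_PERCENT):
    """Return the core count after a percentage expansion (integer math)."""
    return cores * (100 + percent) // 100


# Assumption: the expansion applies uniformly to each stated minimum.
gp_total = sum(expanded(c) for c in INITIAL_GP_CORES.values())
total_floor = gp_total + LP_CORES

print(f"GP systems after expansion: {gp_total:,} cores")
print(f"Floor including LP:         {total_floor:,} cores")
```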

[Figure: Challenge 2 Stage 1+2 Technology Planning. Compute is in Haswell-equivalent cores. Storage is in usable petabytes. Timeframe is calendar year quarters (i.e., “Q1 2016” is January-March 2016) and is approximate. The core and storage targets are estimates only.]

During the same two-year period, much of Compute Canada’s existing equipment will be defunded and removed from the allocations process. Users will be moved to one of the new systems, and needed data will be migrated. Planning in 2014 for the site selection process identified 26 systems with 82,000 CPU cores from older generations (nearly 1 PF total) for retirement by early calendar year 2017. A schedule for the remaining systems will be developed in conjunction with planning for further technology expansion, with some of the remaining systems likely to be removed from the allocations process in 2018. Much of the 15 PB of allocatable storage available in 2015 will also be defunded and removed from the allocations process during the 2016-2018 period.

Organizational Cooperation and Planning

Planning for site selection and the ensuing technology refresh has included deep coordination among the four national host sites for all aspects of procuring, deploying, configuring, operating, and supporting Compute Canada’s suite of systems and services. The Compute Canada Technological Leadership Council (TLC) is responsible for developing specifications for the new systems and will lead the procurement evaluation. The TLC includes representatives from each national site and also includes the four regional CTOs. It is led by the national CTO.

New national teams, which will draw from Compute Canada member institutions, will run the systems and services, provide user support, and engage in cross-site coordination on major themes such as monitoring, storage, cloud services, and networking. The new systems and services will share practices for security. The teams for all national systems and services will provide defined coverage and response levels.

Procurement Processes

All four sites are working with the Compute Canada team to ensure an open and fair acquisition process. Resources will be purchased and owned by each site. Specifications will be developed and bids evaluated by national teams, with full engagement by site procurement officers.

Flexibility in Planning

Plans described here will be modified as needed, based on discussions among the four sites, Compute Canada, and the national and provincial funding agencies. Re-scaling of expectations for system size and capabilities, if needed, will be based on experience with vendor pricing and the influence of the Canadian dollar’s exchange rate. There will also be assessment of anticipated user demand, including for new technologies or configurations. This will be via the SPARC process described below, as well as through discussions with funding agencies and their researchers.

By late 2016, updates will be considered for any needed revisions to planning for the expansion of the three GP systems and for the scale, configuration, and timing of the LP system. Alignment of supply and demand will be re-assessed for computation and storage.

Planning will also be responsive to any new information concerning additional funding, the selection of additional hosting sites, shifts to Canada’s digital research infrastructure strategy, or other factors.

Funding and Governance

SFU is the lead organization for the CFI capital program and is executing an inter-institutional agreement with the three other hosting institutions and Compute Canada. The Compute Canada membership will be involved in many broader aspects of organizational governance and planning. CFI retains oversight for capital spending for the technology refresh, as well as operational expenses via the Major Science Initiatives (MSI) program.

Usage and Capabilities

As Canada’s national platform for advanced research computing, Compute Canada serves thousands of users in essentially every scientific discipline. Compute Canada is continually engaged in renewal and expansion of its services and its audience. Beyond Canada’s academic community, this includes engagement with industry and with international partners. Some of the current and expanded services within Compute Canada are described below.

Workload Portability: Users will find it easier than before to run their jobs on any of the new systems. This will be facilitated by deploying a single HPC batch system, having a common naming scheme for software modules and filesystem mount points, and incorporating mechanisms for data movement with the workload manager. For projects involving a live stream of observational data or other time-sensitive characteristics, workload portability will help to ensure the jobs run on time, wherever appropriate HPC resources are available.

Cloud Computing: Building on Compute Canada’s successful early deployment of cloud systems and services, the GP1, GP2, and GP3 systems will comprise a federated cloud, including single sign-on, shared data services, a common cloud scheduler, and other features for resiliency and ease of use. Additional cloud resources within Compute Canada will be able to become part of the federated cloud simply by using the same authentication and configuration parameters.

Big Data: The storage architecture and cloud services will facilitate big data workloads, including data analytics. Storage will include database capabilities, and cloud services will support virtual machines with user-selected software and features.

National Operations and Support: The national teams will work together to provide a consistent and well-supported environment for computation and data. This will include all aspects of configuration and support. Users will have a single point of contact to the national helpdesk and will also be able to benefit from the expertise of on-campus support personnel.

Resource Allocations: Compute Canada will continue to allocate compute and storage resources through a fair and open process. Workload portability and the consistency of configuration and support will give users extra flexibility, when desired, in their choice of computing resources.

National Services

Consultations have helped inform planning for systems and services in 2015-2017. Service demands were articulated while consulting with applicants for CFI’s Challenge 1 Stage 1, including these middleware services that were identified by multiple applicants:

• Identification and Authorization Service: Provide common login across systems

• Software Distribution Service: Version-controlled software distribution to multiple sites

• Data Transfer Service: To move datasets among collaborators and their repositories

• Monitoring Service: Track uptime and availability of services and platforms

• Resource Publishing Service: Current information about available resources

These services will be deployed beginning in 2016 for all new systems, as part of the infrastructure investment. Additional services will be identified, developed, deployed, and supported based on demand. It is Compute Canada’s intention to provide a useful and effective set of middleware services accessible to any user or group. These will provide a high-performance and well-supported baseline upon which users or groups may build their own custom applications. Compute Canada views these tools as needed software infrastructure and is devoting some of the Challenge 2 Stage 1+2 funds to developing that infrastructure.

Compute Canada views many of the new services identified above as essential enabling tools for Research Data Management (RDM). As data volumes grow, there is a growing demand for RDM. Compute Canada will provide a common set of middleware services for users with this need. RDM will continue to mature during the 2016-2018 period and will include cooperation with other digital research infrastructure providers in Canada.

Future Consultations on this Plan

In early 2016, Compute Canada will embark on a second round of SPARC consultations. SPARC2 will help to identify current and future needs, as well as to parameterize growth in user demand. As with the previous SPARC, scientists and engineers from across Canada will be invited to submit descriptions of their research goals and the advanced research computing capabilities and capacities required to achieve those goals.

Projections for Future Supply and Demand

Technology Impact of Challenge 2 Stage 1+2

By the end of calendar year 2017, Compute Canada will have delivered essentially all of the new computational and storage capacity facilitated by CFI’s “Challenge 2 Stage 1+2” award. The $75M value of capital investment will replace most legacy systems and associated storage.

[Table: Modernization and capacity resulting from Challenge 2 Stage 1+2 (see footnote 4)]

4 Primary disk does not include offline/nearline storage for backups or near-line storage. It does include a variety of disk or disk-like technologies, including object storage, block storage, storage replicas, and storage for filesystems.

During this technology refresh program, CPUs will be replaced with the latest generation, along with more memory. New nodes will be augmented by GPUs and accelerators. A typical node in service in 2015 has dual 6- or 8-core CPUs and 16-32 GB of memory. A typical node to be deployed in 2016 will have dual 14- or 16-core CPUs with 128 GB of memory or greater.
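The per-node figures above can be turned into cores per node and memory per core to make the generational change concrete. The sketch below is illustrative only: pairing the 8-core CPU with 32 GB for the 2015 node and taking exactly 128 GB for the 2016 node are assumptions chosen from within the quoted ranges, and `node_profile` is a hypothetical helper.

```python
def node_profile(cores_per_cpu, memory_gb, sockets=2):
    """Return (cores per node, GB of memory per core) for a multi-socket node."""
    cores = sockets * cores_per_cpu
    return cores, memory_gb / cores


# Assumed pairings drawn from the ranges quoted in the text.
typical_2015 = node_profile(cores_per_cpu=8, memory_gb=32)    # dual 8-core, 32 GB
typical_2016 = node_profile(cores_per_cpu=16, memory_gb=128)  # dual 16-core, 128 GB

for year, (cores, gb_per_core) in (("2015", typical_2015), ("2016", typical_2016)):
    print(f"{year}: {cores} cores per node, {gb_per_core:.1f} GB per core")
```

Under these assumptions a 2016 node has twice the cores and twice the memory per core of a 2015 node; other pairings within the quoted ranges give different ratios.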

Challenge 2 Stage 1+2 is an important and necessary modernization of the DRI provided by Compute Canada. Sustained investment is needed to accommodate the needs of current and future users of Canada’s national platform for computation and storage.

Scenarios of Increasing Demand

Several factors affect planning for future demand for advanced research computing:

1. Demand by users who engage in computational modelling for additional CPU resources
   a. To increase spatial or temporal resolution
   b. To add physics or other simulation factors that were previously too slow or computationally expensive to calculate
   c. To test additional parameters or scenarios
   d. For projects and users new to Compute Canada, especially in nontraditional fields

2. Demand for additional storage resources for computational modellers
   a. Larger input and output datasets, due to larger or more complex models
   b. The need to keep some datasets beyond the end of a computational campaign, to assist in future modelling or to support publications

3. Demand for portals and gateways, including from new user populations
   a. May include needs for highly resilient services and systems
   b. May include needs for high-end storage subsystems for database operations
   c. Bring a user base that may be quite large and may include the general public

4. Demand for projects emphasizing instruments and observational data gathering and analysis
   a. May have irreplaceable or highly valuable data, which needs to have multiple copies at multiple locations
   b. Include Compute Canada’s largest storage users, many of whom have new instruments in development
   c. Require computational resources for post-processing, analysis, portals, visualization, and/or reanalysis

5. Demand by data-focused projects
   a. May require isolation of data or computation from inappropriate disclosure
   b. Includes some usage (such as personal health information) with regulatory concerns
   c. Includes emphasis on data analytic methods that are not yet generally available on Compute Canada resources

6. Demand from projects being directed by funding agencies to consider utilizing Compute Canada resources
   a. Include a range of use cases that might not be in Compute Canada’s current service catalog but will be developed
   b. Some of these projects are very large and demanding
   c. Projects view Compute Canada as a partner, not just a resource provider

7. Demand from industry
   a. May require isolation of data or computation from inappropriate disclosure
   b. Interested in the expertise of Compute Canada, perhaps more than the computational resources

8. Demand from government
   a. Exist within a regulatory environment that might not be part of Compute Canada’s current service catalog but will be developed
   b. Can involve a long planning timeline

For Challenge 2 Stage 1+2, planning emphasized modernization of the computational and storage resources. Planning has been sensitive to anticipated demand growth and changing patterns of utilization, informed via Challenge 1 and the SPARC consultations. The annual allocations process by Compute Canada is a major indicator of growth trends, since it aggregates hundreds of existing projects. Data for the 2016 allocation period are now available and reflect the impact of Challenge 2 Stage 1+2. In 2017, further growth is anticipated, along with retirement of legacy resources.

Through the 2014-15 SPARC process, Compute Canada has identified the expected growth in community demand for storage (15x) and compute (7x) resources through 2020. These growth factors have been converted into doubling times to project future demand in equivalent core years and terabytes of storage. Demand indicators support using a doubling time of 1.8 years for computational demand and 1.3 years for storage demand.
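The conversion from a growth factor to a doubling time can be sketched as follows, assuming steady exponential growth. The five-year horizon (2015 through 2020) is an assumption made for illustration; the briefing gives only the growth factors and the resulting doubling times. Under that assumption the 7x and 15x factors reproduce the quoted 1.8-year and 1.3-year figures.

```python
import math

HORIZON_YEARS = 5  # assumption: growth measured from 2015 through 2020


def doubling_time(growth_factor, years=HORIZON_YEARS):
    """Years needed to double, given total growth over `years` at a steady exponential rate."""
    return years * math.log(2) / math.log(growth_factor)


def growth_after(years, doubling_years):
    """Multiplicative growth after `years` for a given doubling time."""
    return 2 ** (years / doubling_years)


compute_doubling = doubling_time(7)    # compute demand grows 7x
storage_doubling = doubling_time(15)   # storage demand grows 15x

print(f"Compute doubling time: {compute_doubling:.1f} years")
print(f"Storage doubling time: {storage_doubling:.1f} years")
```

Calling `growth_after(5, compute_doubling)` recovers the original 7x factor, which is a quick way to confirm the two functions are inverses of each other.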

By forecasting demand based on these doubling times, we can project a trend into the future. For this projection, we present a range where the lower bound represents no growth in the Compute Canada user base and the upper bound represents ongoing increases in the user base following historical trends from 2011 to 2015.

Trends project a demand for 1-3 million Haswell-equivalent cores by 2020 and more than an exabyte of persistent storage. These projections may turn out to be underestimates, since some existing disciplines making extensive use of Compute Canada resources today anticipate needing over 1 million cores or 1 exabyte of data just for their own projects by 2020.

It is hoped that Compute Canada, along with its members, regions, and provincial partners, will be stewards of the sustained capital investment that will be required to meet these demands.

Vision 2020

Compute Canada, as a leading provider of digital research infrastructure (DRI), is taking an integrated approach to data and computational infrastructure in order to benefit all sectors of society. As a result of the technology refresh and modernization supported by CFI’s Challenge 2 Stage 1+2, excellent research will benefit from modern and capable resources for computationally based and data-focused work.

Compute Canada is coordinating with government, funding agencies, and other DRI providers to develop a shared vision for providing the world’s most advanced, integrated, and capable systems, services, and support for research. Future researchers will have seamless access to DRI resources, integrated together for maximum efficiency and performance, without needing to be concerned with artificial boundaries based on different geographical locations or providers.

By 2020, Compute Canada will offer a comprehensive catalog of resources to support the full data research cycle, allowing researchers and their industrial and international partners to compete at a global scale. In cooperation with Canada’s other DRI providers, Compute Canada’s systems and services will facilitate workflows that easily span different resources, from the lab or campus to national computational resources, analytical facilities, publication archives, and collaborators. Local support and engagement will remain a hallmark of delivering excellent service to all users. The pathway to this future has begun with the modernization of Compute Canada’s advanced research computing cyberinfrastructure through the CFI Challenge 2 Stage 1+2 program.

36 York Mills Road, Suite 505, Toronto, Ontario, Canada M2P 2E9

www.computecanada.ca | www.calculcanada.ca | ComputeCanada

Simon Fraser University The ldquoGP2rdquo system will mainly focus on a mix of batch-oriented parallel and serial workloads with several different node types It will also have a relatively small OpenStack partition that will federate with GP1 and GP3 Node types will include some large memory nodes as well as approximately 1923 GPU nodes At least 18000 CPU cores is anticipated for mid-2016 with a 40 expansion planned in 2017

University of Waterloo The ldquoGP3rdquo system will have a similar design to GP2 and it is anticipated that GP2 and GP3 together will provide features for workload portability and resiliency Plans for GP3 include at least 19000 CPU cores in late 2016 with approximately 64 GPU nodes A 40 expansion is planned in 2017

University of Toronto The ldquoLPrdquo system will be deployed by approximately mid-2017 anticipated to have at least 66000 CPU cores This will be a balanced tightly coupled high performance computing resource designed mainly large parallel workloads

National Storage ArchitectureA new national storage architecture spanning the four sites will offer important benefits to users Compute Canada will utilize concepts of generic ldquostorage building blocksrdquo which will use software-defined storage techniques to deploy capacity and performance in a flexible easily expandable highly interoperable and cost-effective manner In addition to providing file systems for file-based storage there will be object storage services Object storage services will provide ease of use and built-in features including resiliency georeplication enhanced metadata and combinations of public access and data isolation

Approximately 20 petabytes (PB) of persistent storage is planned to be deployed across the four sites in early and mid-2016 with expansion to over 60PB by early 2018 An offlinenearline tier of over 20PB will provide lower-cost capacity for backups and hierarchical storage management High performance parallel filesystems for GP2 GP3 and LP will also be deployed

2 CPU core count equivalents are based on Intel ldquoHaswellrdquo computational capabilities3 All future plans for nodes CPUs and other specifications are intended as conservative estimates

Other Compute Canada member sites will be able to benefit from the national storage architecture including those sites operating legacy resources For example users may need to migrate data to the new systems or they might have use cases that will benefit from object storage or the larger capacity and higher performance the newer systems will offer

Delivery TimelineThe Challenge 2 Stage 1+2 technology refresh will span 2 years of staged deployment By the end of calendar year 2017 essentially all Challenge 2 Stage 1+2 funds will have been expended The total supply at that time is forecasted to be at least 126500 CPU cores (ldquoHaswellrdquo equivalents) and 62 petabytes of usable persistent storage Storage does not include near-line or backup storage nor high-speed parallel scratch space

Challenge 2 Stage 1+2 Technology Planning Compute is in Haswell-equivalent cores Storage is in usable petabytes Timeframe is calendar year quarters (ie ldquoQ1 2016rdquo is January-March 2016) and is approximate The core and storage targets are estimates only

During the same two-year period much of Compute Canadarsquos existing equipment will be defunded and removed from the allocations process Users will be moved to one of the new systems and needed data will be migrated Planning in 2014 for the site selection process identified 26 systems with 82000 CPU cores from older generations (nearly 1PF total) for retirement by early calendar year 2017 A schedule for the remaining systems will be developed in conjunction with planning for further technology expansion with some of the remaining systems likely to be removed from the allocations process in 2018 Much of the 15PB of allocatable storage available in 2015 will also be defunded and removed from the allocations process during the 2016-2018 period

Compute Canada Technology Briefing - November 2015 7

Organizational Cooperation and PlanningPlanning for site selection and the ensuing technology refresh has included deep coordination among the four national host sites for all aspects of procuring deploying configuring operating and supporting Compute Canadarsquos suite of systems and services The Compute Canada Technological Leadership Council (TLC) is responsible for developing specifications for the new systems and will lead the procurement evaluation TLC includes representatives from each national site as also includes the four regional CTOs It is led by the national CTO

New national teams which will draw from Compute Canada member institutions will run the systems and services provide user support and engage in cross-site coordination on major themes such as monitoring storage cloud services and networking The new systems and services will share practices for security The teams for all national systems and services will provide defined coverage and response levels

Procurement ProcessesAll four sites are working with the Compute Canada team to ensure an open and fair acquisition process Resources will be purchased and owned by each site Formation of specifications and evaluation of bids will be by national teams with full engagement by site procurement officers

Flexibility in PlanningPlans described here will be modified as needed based on discussions among the four sites Compute Canada and the national and provincial funding agencies Re-scaling of expectations for system size and capabilities if needed will be based on experience with vendor pricing and the influence of the Canadian dollarrsquos exchange rate There will also be assessment of anticipated user demand including for new technologies or configurations This will be via the SPARC process described below as well as through discussions with funding agencies and their researchers

By late 2016 updates will be considered for any needed revisions to planning for the expansion of the three GP systems and of the scale configuration and timing of the LP system Alignment of supply and demand will be re-assessed for computation and storage

Planning will also be responsive to any new information concerning additional funding the selection of additional hosting sites shifts to Canadarsquos digital research infrastructure strategy or other factors

Funding and GovernanceSFU is the lead organization for the CFI capital program and is executing an interinstitutional agreement with the three other hosting institutions and Compute Canada The Compute Canada membership will be involved in many broader aspects of organizational governance and planning CFI retains oversight for capital spending for the technology refresh as well as operational expenses via the Major Science Initiatives (MSI) program

Usage and CapabilitiesAs Canadarsquos national platform for advanced research computing Compute Canada serves thousands of users in essentially every scientific discipline Compute Canada is continually engaged in renewal and expansion of its services and its audience Beyond Canadarsquos academic community this includes engagement with industry and with international partners Some of the current and expanded services within Compute Canada are described below

Workload Portability Users will find it easier than before to run their jobs on any of the new systems This will be facilitated by deploying a single HPC batch system having a common naming scheme for software modules and filesystem mount points and incorporating mechanisms for data movement with the workload manager For projects involving a live stream of observational data or other time-sensitive characteristics workload portability will help to ensure the jobs run on time wherever appropriate HPC resources are available

Cloud Computing Building on Compute Canadarsquos successful early deployment of cloud systems and services the GP1 GP2 and GP3 systems will comprise a federated cloud including single sign-on shared data services a common cloud scheduler and other features for resiliency and ease of use Additional cloud resources within Compute Canada will be able to become part of the federated cloud simply by using the same authentication and configuration parameters

Big Data The storage architecture and cloud services will facilitate big data workloads including data analytics Storage will include database capabilities and cloud services will support virtual machines with user-selected software and features

National Operations and Support The national teams will work together to provide a consistent and well-supported environment for computation and data This will include all aspects of configuration and support Users will have a single point of contact to the national helpdesk and will also be able to benefit from the expertise of on-campus support personnel

Resource Allocations Compute Canada will continue to allocate compute and storage resources through a fair and open process Workload portability and the consistency of configuration and support will give users extra flexibility when desired in their choice of computing resources

Compute Canada Technology Briefing - November 2015 9

National ServicesConsultations have helped inform planning for systems and services in 2015-2017 Service demands were articulated while consulting with applicants for CFIrsquos Challenge 1 Stage 1 including these middleware services that were identified by multiple applicants

ņ Identification and Authorization Service Provide common login across systems

ņ Software Distribution Service Version-controlled software distribution to multiple sites

ņ Data Transfer Service To move datasets among collaborators and their repositories

ņ Monitoring Service Track uptime and availability of services and platforms

ņ Resource Publishing Service Current information about available resources

These services will be deployed beginning in 2016 for all new systems as part of the infrastructure investment Additional services will be identified and developed deployed and supported based on demand It is Compute Canadarsquos intention to provide a useful and effective set of middleware services accessible to any user or group These will provide a high performance and well-supported baseline upon which users or groups may build their own custom applications Compute Canada views these tools as needed software infrastructure and is devoting some of the Challenge 2 Stage 1+2 funds to developing that infrastructure

Compute Canada views many of the new services identified above as essential enabling tools for Research Data Management (RDM) As data volumes grow there is a growing demand for RDM Compute Canada will provide a common set of middleware services for users with this need RDM will continue to mature during the 2016-2018 period and will include cooperation with other digital research infrastructure providers in Canada

Future Consultations on this PlanIn early 2016 Compute Canada will embark on a second round of SPARC consultations SPARC2 will help to identify current and future needs as well as to parameterize growth in user demand As with the previous SPARC scientists and engineers from the across Canada will be invited to submit descriptions of their research goals and the needed advanced research computing capabilities and capacities required to achieve those goals

Projections for Future Supply and Demand

Technology Impact of Challenge 2 Stage 1+2

By the end of calendar year 2017, Compute Canada will have delivered essentially all of the new computational and storage capacity facilitated by CFI's "Challenge 2 Stage 1+2" award. The $75M capital investment will replace most legacy systems and associated storage.

Modernization and capacity resulting from Challenge 2 Stage 1+2 [4]

[4] Primary disk does not include offline/nearline storage, such as that used for backups. It does include a variety of disk or disk-like technologies, including object storage, block storage, storage replicas, and storage for filesystems.


During this technology refresh program, CPUs will be replaced with the latest generation, along with more memory. New nodes will be augmented by GPUs and accelerators. A typical node in service in 2015 has dual 6- or 8-core CPUs and 16-32GB of memory. A typical node to be deployed in 2016 will have dual 14- or 16-core CPUs with 128GB of memory or greater.

Challenge 2 Stage 1+2 is an important and necessary modernization of the DRI provided by Compute Canada. Sustained investment is needed to accommodate the needs of current and future users of Canada's national platform for computation and storage.

Scenarios of Increasing Demand

Several factors affect planning for future demand for advanced research computing:

1. Demand by users who engage in computational modelling for additional CPU resources

   a. To increase spatial or temporal resolution

   b. To add physics or other simulation factors that were previously too slow or computationally expensive to calculate

   c. To test additional parameters or scenarios

   d. For projects and users new to Compute Canada, especially in nontraditional fields

2. Demand for additional storage resources for computational modellers

   a. Larger input and output datasets due to larger or more complex models

   b. The need to keep some datasets beyond the end of a computational campaign, to assist in future modelling or to support publications

3. Demand for portals and gateways, including from new user populations

   a. May include needs for highly resilient services and systems

   b. May include needs for high-end storage subsystems for database operations

   c. Bring a user base that may be quite large and may include the general public

4. Demand for projects emphasizing instruments and observational data gathering and analysis

   a. May have irreplaceable or highly valuable data, which needs to have multiple copies at multiple locations

   b. Include Compute Canada's largest storage users, many of whom have new instruments in development

   c. Require computational resources for post-processing, analysis, portals, visualization, and/or reanalysis

5. Demand by data-focused projects

   a. May require isolation of data or computation from inappropriate disclosure

   b. Includes some usage (such as personal health information) with regulatory concerns

   c. Includes emphasis on data analytic methods that are not yet generally available on Compute Canada resources

6. Demand from projects being directed by funding agencies to consider utilizing Compute Canada resources

   a. Include a range of use cases that might not be in Compute Canada's current service catalog, but will be developed

   b. Some of these projects are very large and demanding

   c. Projects view Compute Canada as a partner, not just a resource provider

7. Demand from industry

   a. May require isolation of data or computation from inappropriate disclosure

   b. Interested in the expertise of Compute Canada, perhaps more than the computational resources

8. Demand from government

   a. Exist within a regulatory environment that might not be part of Compute Canada's current service catalog, but will be developed

   b. Can involve a long planning timeline


For Challenge 2 Stage 1+2, planning emphasized modernization of the computational and storage resources. Planning has been sensitive to anticipated demand growth and changing patterns of utilization, informed via Challenge 1 and the SPARC consultations. The annual allocations process by Compute Canada is a major indicator of growth trends, since it aggregates hundreds of existing projects. Data for the 2016 allocation period are now available and reflect the impact of Challenge 2 Stage 1+2. In 2017, further growth is anticipated, along with retirement of legacy resources.

Through the 2014-15 SPARC process, Compute Canada has identified the expected growth in community demand for storage (15x) and compute (7x) resources through 2020. This data has been converted into a doubling time to project future demand in equivalent core years and terabytes of storage. Demand indicators support using a doubling time of 1.8 years for computational demand and 1.3 years for storage demand.
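The conversion from a total growth factor to a doubling time can be sketched as follows. This is an illustration only, not the method Compute Canada used: it assumes a five-year horizon (2015 through 2020) and steady exponential growth, and the function names are hypothetical. Under those assumptions the 7x and 15x growth factors round to doubling times of about 1.8 and 1.3 years.

```python
import math


def doubling_time(growth_factor: float, years: float) -> float:
    """Years needed for demand to double, given total growth over a period.

    Assumes steady exponential growth: growth_factor = 2 ** (years / T).
    """
    return years * math.log(2) / math.log(growth_factor)


def growth_after(years: float, doubling: float) -> float:
    """Multiplicative growth after `years`, given a doubling time."""
    return 2 ** (years / doubling)


# Assumption: the SPARC growth factors apply over 2015 through 2020.
HORIZON_YEARS = 5.0

compute_doubling = doubling_time(7.0, HORIZON_YEARS)   # compute demand, 7x
storage_doubling = doubling_time(15.0, HORIZON_YEARS)  # storage demand, 15x

print(f"compute doubling time: {compute_doubling:.1f} years")
print(f"storage doubling time: {storage_doubling:.1f} years")
```

Calling `growth_after(HORIZON_YEARS, compute_doubling)` recovers the original 7x factor, which is a quick consistency check on the conversion.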

By forecasting demand based on these doubling times, we can project a trend into the future. For this projection we present a range, where the lower bound represents no growth in the Compute Canada user base and the upper bound represents ongoing increases in the user base following historical trends from 2011-2015.

Trends project a demand for 1-3 million Haswell-equivalent cores by 2020, and more than an exabyte of persistent storage. These projections may turn out to be underestimates, since some existing disciplines making extensive use of Compute Canada resources today anticipate needing over 1 million cores or 1 exabyte of data just for their own projects by 2020.

It is hoped that Compute Canada will be stewards, along with members, regions, and provincial partners, of the sustained capital investment that will be required to meet these demands.


Vision 2020

Compute Canada, as a leading provider of digital research infrastructure (DRI), is taking an integrated approach to data and computational infrastructure in order to benefit all sectors of society. As a result of the technology refresh and modernization supported by CFI's Challenge 2 Stage 1+2, excellent research will benefit from modern and capable resources for computationally based and data-focused work.

Compute Canada is coordinating with government funding agencies and with other DRI providers to develop a shared vision: to provide the world's most advanced, integrated, and capable systems, services, and support for research. Future researchers will have seamless access to DRI resources, integrated together for maximum efficiency and performance, without needing to be concerned with artificial boundaries based on different geographical locations or providers.

By 2020, Compute Canada will offer a comprehensive catalog of resources to support the full data research cycle, allowing researchers and their industrial and international partners to compete at a global scale. In cooperation with Canada's other DRI providers, Compute Canada's systems and services will facilitate workflows that easily span different resources: from the lab or campus to national computational resources, analytical facilities, publication archives, and collaborators. Local support and engagement will remain a hallmark of delivering excellent service to all users. The pathway to this future has begun with the modernization of Compute Canada's advanced research computing cyberinfrastructure through the CFI Challenge 2 Stage 1+2 program.

36 York Mills Road, Suite 505, Toronto, Ontario, Canada M2P 2E9

www.computecanada.ca | www.calculcanada.ca | ComputeCanada


Overview

Compute Canada is the national resource provider for advanced research computing and big data, delivering a full range of systems and services to researchers. Funding from the Canada Foundation for Innovation (CFI), matching funds from provincial partners, and in-kind contributions from vendors will enable the significant technology refresh program described below.

This technology briefing document is intended to be circulated to Compute Canada stakeholders and suppliers. It provides status and planning for the technology refresh program, which results from CFI's cyberinfrastructure initiative and will be implemented from 2015 through early 2018. It also anticipates planning for future growth.

The total value of this capital program is $75M, to be spent mainly in 2016 and 2017. This reflects a $30M capital grant from CFI, a further $30M from provincial and institutional sources, and $15M of vendor in-kind [1]. By the end of 2017, many legacy systems will have been replaced by new computational systems and storage totalling over 123,000 CPU cores and 60 PB of storage.

[1] See www.innovation.ca/en/OurFunds/Complementaryinformation


New Systems at Four National Sites

Through a formal competition among Compute Canada member institutions, four sites were selected to host the new systems and associated services. They are the University of Victoria, Simon Fraser University (SFU), the University of Waterloo, and the University of Toronto.

New Computational Systems

Planning for the new systems at the four sites has been responsive to user demand, site affinity and experience, and shifts in timing and funding. Envisioned system characteristics follow.

University of Victoria: The "GP1" system will be an OpenStack cloud, with emphasis on hosting virtual machines and other cloud workloads. At least 3,000 CPU cores [2] are anticipated by early 2016, with a 40% expansion planned in 2017.

Simon Fraser University: The "GP2" system will mainly focus on a mix of batch-oriented parallel and serial workloads, with several different node types. It will also have a relatively small OpenStack partition that will federate with GP1 and GP3. Node types will include some large memory nodes, as well as approximately 192 [3] GPU nodes. At least 18,000 CPU cores are anticipated for mid-2016, with a 40% expansion planned in 2017.

University of Waterloo: The "GP3" system will have a similar design to GP2, and it is anticipated that GP2 and GP3 together will provide features for workload portability and resiliency. Plans for GP3 include at least 19,000 CPU cores in late 2016, with approximately 64 GPU nodes. A 40% expansion is planned in 2017.

University of Toronto: The "LP" system will be deployed by approximately mid-2017 and is anticipated to have at least 66,000 CPU cores. This will be a balanced, tightly coupled high performance computing resource designed mainly for large parallel workloads.
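The minimum core counts above can be tallied with a short sketch. It is illustrative only: it reads the planned GP expansions as 40% of each system's initial core count, assumes no expansion for LP because none is stated, and uses a hypothetical `System` record. Because the inputs are "at least" figures, the result is a lower bound and is not expected to match the briefing's headline totals exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    site: str
    initial_cores: int   # minimum Haswell-equivalent cores at first deployment
    expansion_pct: int   # planned 2017 expansion, in percent (assumed reading)


SYSTEMS = [
    System("GP1", "University of Victoria", 3_000, 40),
    System("GP2", "Simon Fraser University", 18_000, 40),
    System("GP3", "University of Waterloo", 19_000, 40),
    System("LP", "University of Toronto", 66_000, 0),  # no expansion stated
]


def cores_after_expansion(system: System) -> int:
    """Core count once the planned expansion is applied (integer arithmetic)."""
    return system.initial_cores * (100 + system.expansion_pct) // 100


total_initial = sum(s.initial_cores for s in SYSTEMS)
total_expanded = sum(cores_after_expansion(s) for s in SYSTEMS)

for s in SYSTEMS:
    print(f"{s.name:<4}{s.initial_cores:>8,} -> {cores_after_expansion(s):>8,}")
print(f"total: {total_initial:,} initial, {total_expanded:,} after expansion")
```

The sketch gives 106,000 cores at initial deployment and 122,000 after the GP expansions, which sits just under the briefing's stated totals, as expected from minimum figures.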

National Storage Architecture

A new national storage architecture spanning the four sites will offer important benefits to users. Compute Canada will utilize concepts of generic "storage building blocks," which will use software-defined storage techniques to deploy capacity and performance in a flexible, easily expandable, highly interoperable, and cost-effective manner. In addition to providing file systems for file-based storage, there will be object storage services. Object storage services will provide ease of use and built-in features, including resiliency, georeplication, enhanced metadata, and combinations of public access and data isolation.

Approximately 20 petabytes (PB) of persistent storage is planned to be deployed across the four sites in early and mid-2016, with expansion to over 60PB by early 2018. An offline/nearline tier of over 20PB will provide lower-cost capacity for backups and hierarchical storage management. High performance parallel filesystems for GP2, GP3, and LP will also be deployed.

[2] CPU core count equivalents are based on Intel "Haswell" computational capabilities.

[3] All future plans for nodes, CPUs, and other specifications are intended as conservative estimates.

Other Compute Canada member sites will be able to benefit from the national storage architecture, including those sites operating legacy resources. For example, users may need to migrate data to the new systems, or they might have use cases that will benefit from object storage or the larger capacity and higher performance the newer systems will offer.

Delivery Timeline

The Challenge 2 Stage 1+2 technology refresh will span 2 years of staged deployment. By the end of calendar year 2017, essentially all Challenge 2 Stage 1+2 funds will have been expended. The total supply at that time is forecast to be at least 126,500 CPU cores ("Haswell" equivalents) and 62 petabytes of usable persistent storage. Storage does not include near-line or backup storage, nor high-speed parallel scratch space.

Challenge 2 Stage 1+2 Technology Planning. Compute is in Haswell-equivalent cores. Storage is in usable petabytes. Timeframe is calendar year quarters (i.e., "Q1 2016" is January-March 2016) and is approximate. The core and storage targets are estimates only.

During the same two-year period, much of Compute Canada's existing equipment will be defunded and removed from the allocations process. Users will be moved to one of the new systems, and needed data will be migrated. Planning in 2014 for the site selection process identified 26 systems with 82,000 CPU cores from older generations (nearly 1PF total) for retirement by early calendar year 2017. A schedule for the remaining systems will be developed in conjunction with planning for further technology expansion, with some of the remaining systems likely to be removed from the allocations process in 2018. Much of the 15PB of allocatable storage available in 2015 will also be defunded and removed from the allocations process during the 2016-2018 period.


Organizational Cooperation and Planning

Planning for site selection and the ensuing technology refresh has included deep coordination among the four national host sites for all aspects of procuring, deploying, configuring, operating, and supporting Compute Canada's suite of systems and services. The Compute Canada Technological Leadership Council (TLC) is responsible for developing specifications for the new systems and will lead the procurement evaluation. TLC includes representatives from each national site and also includes the four regional CTOs. It is led by the national CTO.

New national teams, which will draw from Compute Canada member institutions, will run the systems and services, provide user support, and engage in cross-site coordination on major themes such as monitoring, storage, cloud services, and networking. The new systems and services will share practices for security. The teams for all national systems and services will provide defined coverage and response levels.

Procurement Processes

All four sites are working with the Compute Canada team to ensure an open and fair acquisition process. Resources will be purchased and owned by each site. Formation of specifications and evaluation of bids will be by national teams, with full engagement by site procurement officers.

Flexibility in Planning

Plans described here will be modified as needed, based on discussions among the four sites, Compute Canada, and the national and provincial funding agencies. Re-scaling of expectations for system size and capabilities, if needed, will be based on experience with vendor pricing and the influence of the Canadian dollar's exchange rate. There will also be assessment of anticipated user demand, including for new technologies or configurations. This will be via the SPARC process, described below, as well as through discussions with funding agencies and their researchers.

By late 2016, updates will be considered for any needed revisions to planning for the expansion of the three GP systems, and for the scale, configuration, and timing of the LP system. Alignment of supply and demand will be re-assessed for computation and storage.

Planning will also be responsive to any new information concerning additional funding, the selection of additional hosting sites, shifts to Canada's digital research infrastructure strategy, or other factors.

Funding and Governance

SFU is the lead organization for the CFI capital program and is executing an interinstitutional agreement with the three other hosting institutions and Compute Canada. The Compute Canada membership will be involved in many broader aspects of organizational governance and planning. CFI retains oversight for capital spending for the technology refresh, as well as operational expenses via the Major Science Initiatives (MSI) program.

Usage and Capabilities

As Canada's national platform for advanced research computing, Compute Canada serves thousands of users in essentially every scientific discipline. Compute Canada is continually engaged in renewal and expansion of its services and its audience. Beyond Canada's academic community, this includes engagement with industry and with international partners. Some of the current and expanded services within Compute Canada are described below.

Workload Portability: Users will find it easier than before to run their jobs on any of the new systems. This will be facilitated by deploying a single HPC batch system, having a common naming scheme for software modules and filesystem mount points, and incorporating mechanisms for data movement with the workload manager. For projects involving a live stream of observational data or other time-sensitive characteristics, workload portability will help to ensure the jobs run on time, wherever appropriate HPC resources are available.

Cloud Computing: Building on Compute Canada's successful early deployment of cloud systems and services, the GP1, GP2, and GP3 systems will comprise a federated cloud, including single sign-on, shared data services, a common cloud scheduler, and other features for resiliency and ease of use. Additional cloud resources within Compute Canada will be able to become part of the federated cloud simply by using the same authentication and configuration parameters.

Big Data: The storage architecture and cloud services will facilitate big data workloads, including data analytics. Storage will include database capabilities, and cloud services will support virtual machines with user-selected software and features.

National Operations and Support: The national teams will work together to provide a consistent and well-supported environment for computation and data. This will include all aspects of configuration and support. Users will have a single point of contact to the national helpdesk and will also be able to benefit from the expertise of on-campus support personnel.

Resource Allocations: Compute Canada will continue to allocate compute and storage resources through a fair and open process. Workload portability and the consistency of configuration and support will give users extra flexibility, when desired, in their choice of computing resources.



Page 5: Compute Canada Technology Briefing · Compute Canada resources research teams and their international partners work with industry giants in the automotive, ICT, life sciences, aerospace

Compute Canada Technology Briefing - November 2015 5

New Systems at Four National SitesThrough a formal competition among Compute Canada member institutions four sites were selected to host the new systems and associated services They are the University of Victoria Simon Fraser University (SFU) the University of Waterloo and the University of Toronto

New Computational SystemsPlanning for the new systems at the four sites has been responsive to user demand site affinity and experience and shifts in timing and funding Envisioned system characteristics follow

University of Victoria The ldquoGP1rdquo system will be an OpenStack cloud with emphasis on hosting virtual machines and other cloud workloads At least 3000 CPU cores2 are anticipated by early 2016 with a 40 expansion planned in 2017

Simon Fraser University The ldquoGP2rdquo system will mainly focus on a mix of batch-oriented parallel and serial workloads with several different node types It will also have a relatively small OpenStack partition that will federate with GP1 and GP3 Node types will include some large memory nodes as well as approximately 1923 GPU nodes At least 18000 CPU cores is anticipated for mid-2016 with a 40 expansion planned in 2017

University of Waterloo The ldquoGP3rdquo system will have a similar design to GP2 and it is anticipated that GP2 and GP3 together will provide features for workload portability and resiliency Plans for GP3 include at least 19000 CPU cores in late 2016 with approximately 64 GPU nodes A 40 expansion is planned in 2017

University of Toronto The ldquoLPrdquo system will be deployed by approximately mid-2017 anticipated to have at least 66000 CPU cores This will be a balanced tightly coupled high performance computing resource designed mainly large parallel workloads

National Storage ArchitectureA new national storage architecture spanning the four sites will offer important benefits to users Compute Canada will utilize concepts of generic ldquostorage building blocksrdquo which will use software-defined storage techniques to deploy capacity and performance in a flexible easily expandable highly interoperable and cost-effective manner In addition to providing file systems for file-based storage there will be object storage services Object storage services will provide ease of use and built-in features including resiliency georeplication enhanced metadata and combinations of public access and data isolation

Approximately 20 petabytes (PB) of persistent storage is planned to be deployed across the four sites in early and mid-2016 with expansion to over 60PB by early 2018 An offlinenearline tier of over 20PB will provide lower-cost capacity for backups and hierarchical storage management High performance parallel filesystems for GP2 GP3 and LP will also be deployed

2 CPU core count equivalents are based on Intel ldquoHaswellrdquo computational capabilities3 All future plans for nodes CPUs and other specifications are intended as conservative estimates

Other Compute Canada member sites will be able to benefit from the national storage architecture including those sites operating legacy resources For example users may need to migrate data to the new systems or they might have use cases that will benefit from object storage or the larger capacity and higher performance the newer systems will offer

Delivery Timeline

The Challenge 2 Stage 1+2 technology refresh will span two years of staged deployment. By the end of calendar year 2017, essentially all Challenge 2 Stage 1+2 funds will have been expended. The total supply at that time is forecast to be at least 126,500 CPU cores ("Haswell" equivalents) and 62 petabytes of usable persistent storage. The storage figure does not include near-line or backup storage, nor high-speed parallel scratch space.

Challenge 2 Stage 1+2 Technology Planning. Compute is in Haswell-equivalent cores. Storage is in usable petabytes. Timeframe is in calendar-year quarters (i.e., "Q1 2016" is January-March 2016) and is approximate. The core and storage targets are estimates only.

During the same two-year period, much of Compute Canada's existing equipment will be defunded and removed from the allocations process. Users will be moved to one of the new systems, and needed data will be migrated. Planning in 2014 for the site selection process identified 26 systems with 82,000 CPU cores from older generations (nearly 1PF total) for retirement by early calendar year 2017. A schedule for the remaining systems will be developed in conjunction with planning for further technology expansion, with some of the remaining systems likely to be removed from the allocations process in 2018. Much of the 15PB of allocatable storage available in 2015 will also be defunded and removed from the allocations process during the 2016-2018 period.

Organizational Cooperation and Planning

Planning for site selection and the ensuing technology refresh has included deep coordination among the four national host sites for all aspects of procuring, deploying, configuring, operating and supporting Compute Canada's suite of systems and services. The Compute Canada Technological Leadership Council (TLC) is responsible for developing specifications for the new systems and will lead the procurement evaluation. The TLC includes representatives from each national site as well as the four regional CTOs. It is led by the national CTO.

New national teams, which will draw from Compute Canada member institutions, will run the systems and services, provide user support, and engage in cross-site coordination on major themes such as monitoring, storage, cloud services and networking. The new systems and services will share practices for security. The teams for all national systems and services will provide defined coverage and response levels.

Procurement Processes

All four sites are working with the Compute Canada team to ensure an open and fair acquisition process. Resources will be purchased and owned by each site. Specifications will be formed and bids evaluated by national teams, with full engagement by site procurement officers.

Flexibility in Planning

Plans described here will be modified as needed, based on discussions among the four sites, Compute Canada, and the national and provincial funding agencies. Re-scaling of expectations for system size and capabilities, if needed, will be based on experience with vendor pricing and the influence of the Canadian dollar's exchange rate. There will also be assessment of anticipated user demand, including for new technologies or configurations. This will be via the SPARC process described below, as well as through discussions with funding agencies and their researchers.

By late 2016, updates will be considered for any needed revisions to planning for the expansion of the three GP systems and for the scale, configuration and timing of the LP system. Alignment of supply and demand will be re-assessed for computation and storage.

Planning will also be responsive to any new information concerning additional funding, the selection of additional hosting sites, shifts to Canada's digital research infrastructure strategy, or other factors.

Funding and Governance

SFU is the lead organization for the CFI capital program and is executing an interinstitutional agreement with the three other hosting institutions and Compute Canada. The Compute Canada membership will be involved in many broader aspects of organizational governance and planning. CFI retains oversight for capital spending for the technology refresh, as well as operational expenses via the Major Science Initiatives (MSI) program.

Usage and Capabilities

As Canada's national platform for advanced research computing, Compute Canada serves thousands of users in essentially every scientific discipline. Compute Canada is continually engaged in renewal and expansion of its services and its audience. Beyond Canada's academic community, this includes engagement with industry and with international partners. Some of the current and expanded services within Compute Canada are described below.

Workload Portability: Users will find it easier than before to run their jobs on any of the new systems. This will be facilitated by deploying a single HPC batch system, having a common naming scheme for software modules and filesystem mount points, and incorporating mechanisms for data movement with the workload manager. For projects involving a live stream of observational data or other time-sensitive characteristics, workload portability will help to ensure the jobs run on time, wherever appropriate HPC resources are available.

Cloud Computing: Building on Compute Canada's successful early deployment of cloud systems and services, the GP1, GP2 and GP3 systems will comprise a federated cloud, including single sign-on, shared data services, a common cloud scheduler, and other features for resiliency and ease of use. Additional cloud resources within Compute Canada will be able to become part of the federated cloud simply by using the same authentication and configuration parameters.

Big Data: The storage architecture and cloud services will facilitate big data workloads, including data analytics. Storage will include database capabilities, and cloud services will support virtual machines with user-selected software and features.

National Operations and Support: The national teams will work together to provide a consistent and well-supported environment for computation and data. This will include all aspects of configuration and support. Users will have a single point of contact to the national helpdesk and will also be able to benefit from the expertise of on-campus support personnel.

Resource Allocations: Compute Canada will continue to allocate compute and storage resources through a fair and open process. Workload portability and the consistency of configuration and support will give users extra flexibility, when desired, in their choice of computing resources.

National Services

Consultations have helped inform planning for systems and services in 2015-2017. Service demands were articulated during consultation with applicants for CFI's Challenge 1 Stage 1, including the following middleware services, each of which was identified by multiple applicants:

• Identification and Authorization Service: Provide common login across systems
• Software Distribution Service: Version-controlled software distribution to multiple sites
• Data Transfer Service: To move datasets among collaborators and their repositories
• Monitoring Service: Track uptime and availability of services and platforms
• Resource Publishing Service: Current information about available resources

These services will be deployed beginning in 2016 for all new systems, as part of the infrastructure investment. Additional services will be identified, developed, deployed and supported based on demand. It is Compute Canada's intention to provide a useful and effective set of middleware services accessible to any user or group. These will provide a high performance and well-supported baseline upon which users or groups may build their own custom applications. Compute Canada views these tools as needed software infrastructure and is devoting some of the Challenge 2 Stage 1+2 funds to developing that infrastructure.

Compute Canada views many of the new services identified above as essential enabling tools for Research Data Management (RDM). As data volumes grow, there is a growing demand for RDM. Compute Canada will provide a common set of middleware services for users with this need. RDM will continue to mature during the 2016-2018 period and will include cooperation with other digital research infrastructure providers in Canada.

Future Consultations on this Plan

In early 2016, Compute Canada will embark on a second round of SPARC consultations. SPARC2 will help to identify current and future needs, as well as to parameterize growth in user demand. As with the previous SPARC, scientists and engineers from across Canada will be invited to submit descriptions of their research goals and the advanced research computing capabilities and capacities required to achieve those goals.

Projections for Future Supply and Demand

Technology Impact of Challenge 2 Stage 1+2

By the end of calendar year 2017, Compute Canada will have delivered essentially all of the new computational and storage capacity facilitated by CFI's "Challenge 2 Stage 1+2" award. The $75M value of capital investment will replace most legacy systems and associated storage.

Modernization and capacity resulting from Challenge 2 Stage 1+2 (see note 4)

4. Primary disk does not include offline or near-line storage, such as that used for backups. It does include a variety of disk or disk-like technologies, including object storage, block storage, storage replicas, and storage for filesystems.

During this technology refresh program, CPUs will be replaced with the latest generation, along with more memory. New nodes will be augmented by GPUs and accelerators. A typical node in service in 2015 has dual 6- or 8-core CPUs and 16-32GB of memory. A typical node to be deployed in 2016 will have dual 14- or 16-core CPUs with 128GB of memory or greater.
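The node figures above can be turned into cores per node and memory per core, which is often the more useful comparison for sizing jobs. The sketch below is illustrative only: the function name is hypothetical, and the pairing of a particular core count with a particular memory size is an assumption, since the briefing gives ranges rather than exact configurations.

```python
def memory_per_core_gb(sockets, cores_per_socket, memory_gb):
    """Return memory available per core, in GB, for one node."""
    return memory_gb / (sockets * cores_per_socket)


# (label, sockets, cores per socket, memory in GB).
# The specific pairings are assumptions chosen from the quoted ranges.
nodes = [
    ("2015 node, dual 6-core, 16 GB", 2, 6, 16),
    ("2015 node, dual 8-core, 32 GB", 2, 8, 32),
    ("2016 node, dual 14-core, 128 GB", 2, 14, 128),
    ("2016 node, dual 16-core, 128 GB", 2, 16, 128),
]

for label, sockets, cores, mem in nodes:
    total_cores = sockets * cores
    per_core = memory_per_core_gb(sockets, cores, mem)
    print(f"{label}: {total_cores} cores, {per_core:.2f} GB per core")
```

Under these assumed pairings, a 2016 node offers roughly twice the cores of a 2015 node and at least 4 GB of memory per core.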

Challenge 2 Stage 1+2 is an important and necessary modernization of the DRI provided by Compute Canada. Sustained investment is needed to accommodate the needs of current and future users of Canada's national platform for computation and storage.

Scenarios of Increasing Demand

There are several factors affecting planning for future demand for advanced research computing:

1. Demand by users who engage in computational modelling for additional CPU resources
   a. To increase spatial or temporal resolution
   b. To add physics or other simulation factors that were previously too slow or computationally expensive to calculate
   c. To test additional parameters or scenarios
   d. For projects and users new to Compute Canada, especially in nontraditional fields

2. Demand for additional storage resources for computational modellers
   a. Larger input and output datasets due to larger or more complex models
   b. The need to keep some datasets beyond the end of a computational campaign, to assist in future modelling or to support publications

3. Demand for portals and gateways, including from new user populations
   a. May include needs for highly resilient services and systems
   b. May include needs for high-end storage subsystems for database operations
   c. Bring a user base that may be quite large and may include the general public

4. Demand for projects emphasizing instruments and observational data gathering and analysis
   a. May have irreplaceable or highly valuable data, which needs to have multiple copies at multiple locations
   b. Include Compute Canada's largest storage users, many of whom have new instruments in development
   c. Require computational resources for post-processing, analysis, portals, visualization and/or reanalysis

5. Demand by data-focused projects
   a. May require isolation of data or computation from inappropriate disclosure
   b. Includes some usage (such as personal health information) with regulatory concerns
   c. Includes emphasis on data analytic methods that are not yet generally available on Compute Canada resources

6. Demand from projects being directed by funding agencies to consider utilizing Compute Canada resources
   a. Include a range of use cases that might not be in Compute Canada's current service catalog but will be developed
   b. Some of these projects are very large and demanding
   c. Projects view Compute Canada as a partner, not just a resource provider

7. Demand from industry
   a. May require isolation of data or computation from inappropriate disclosure
   b. Interested in the expertise of Compute Canada, perhaps more than the computational resources

8. Demand from government
   a. Exists within a regulatory environment that might not be part of Compute Canada's current service catalog but will be developed
   b. Can involve a long planning timeline

For Challenge 2 Stage 1+2, planning emphasized modernization of the computational and storage resources. Planning has been sensitive to anticipated demand growth and changing patterns of utilization, informed via Challenge 1 and the SPARC consultations. The annual allocations process by Compute Canada is a major indicator of growth trends, since it aggregates hundreds of existing projects. Data for the 2016 allocation period are now available and reflect the impact of Challenge 2 Stage 1+2. In 2017, further growth is anticipated, along with retirement of legacy resources.

Through the 2014-15 SPARC process, Compute Canada has identified the expected growth in community demand for storage (15x) and compute (7x) resources through 2020. This data has been converted into a doubling time to project future demand in equivalent core years and terabytes of storage. Demand indicators support using a doubling time of 1.8 years for computational demand and 1.3 years for storage demand.
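The conversion from a growth factor to a doubling time can be sketched as follows. This is a minimal illustration rather than the method used in the SPARC analysis: it assumes steady exponential growth and a five-year horizon (2015 through 2020), and the function and constant names are hypothetical. Under those assumptions, the 7x and 15x growth factors work out to roughly 1.8 and 1.3 years.

```python
import math


def doubling_time(growth_factor, years):
    """Years needed to double, given total growth over a period.

    Assumes steady exponential growth: growth_factor = 2 ** (years / T).
    """
    return years * math.log(2) / math.log(growth_factor)


# Assumption: growth is measured over the five years from 2015 through 2020.
HORIZON_YEARS = 5

compute_td = doubling_time(7, HORIZON_YEARS)    # 7x compute growth
storage_td = doubling_time(15, HORIZON_YEARS)   # 15x storage growth

print(f"compute doubling time: {compute_td:.1f} years")
print(f"storage doubling time: {storage_td:.1f} years")
```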

By forecasting demand based on these doubling times, we can project a trend into the future. For this projection, we present a range where the lower bound represents no growth in the Compute Canada user base, and the upper bound represents ongoing increases in the user base following historical trends from 2011 to 2015.
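One way to picture such a range is sketched below. The model is an assumption made for illustration, not the briefing's own calculation: demand for a fixed user base doubles every `doubling_time` years, and the upper bound multiplies that by compound growth in the user base. The baseline is normalized to 1.0, the doubling time is set to 1.8 years, and the 10% annual user growth rate is a placeholder, not a figure from the briefing.

```python
def project_demand(baseline, years_ahead, doubling_time, user_growth_rate=0.0):
    """Project demand under an assumed exponential model.

    Demand for a fixed user base doubles every `doubling_time` years;
    `user_growth_rate` is the assumed annual growth of the user base
    (0.0 reproduces the lower bound: no growth in users).
    """
    per_user_factor = 2 ** (years_ahead / doubling_time)
    user_base_factor = (1 + user_growth_rate) ** years_ahead
    return baseline * per_user_factor * user_base_factor


def demand_range(baseline, years_ahead, doubling_time, user_growth_rate):
    """Return (lower, upper) bounds: without and with user-base growth."""
    lower = project_demand(baseline, years_ahead, doubling_time)
    upper = project_demand(baseline, years_ahead, doubling_time, user_growth_rate)
    return lower, upper


BASELINE = 1.0                   # normalized; no real baseline is assumed
ILLUSTRATIVE_USER_GROWTH = 0.10  # placeholder rate, not from the briefing

low, high = demand_range(
    BASELINE,
    years_ahead=5,
    doubling_time=1.8,
    user_growth_rate=ILLUSTRATIVE_USER_GROWTH,
)
print(f"lower bound: {low:.2f}x baseline, upper bound: {high:.2f}x baseline")
```

Replacing the normalized baseline with a measured demand figure, and the placeholder rate with an observed user growth rate, would turn the multipliers into core or storage estimates.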

Trends project a demand for 1-3 million Haswell-equivalent cores by 2020, and more than an exabyte of persistent storage. These projections may turn out to be underestimates, since some existing disciplines making extensive use of Compute Canada resources today anticipate needing over 1 million cores or 1 exabyte of data just for their own projects by 2020.

It is hoped that Compute Canada, along with members, regions and provincial partners, will be a steward of the sustained capital investment that will be required to meet these demands.

Vision 2020

Compute Canada, as a leading provider of digital research infrastructure (DRI), is taking an integrated approach to data and computational infrastructure in order to benefit all sectors of society. As a result of the technology refresh and modernization supported by CFI's Challenge 2 Stage 1+2, excellent research will benefit from modern and capable resources for computationally-based and data-focused work.

Compute Canada is coordinating with government funding agencies and with other DRI providers to develop a shared vision for providing the world's most advanced, integrated and capable systems, services and support for research. Future researchers will have seamless access to DRI resources, integrated together for maximum efficiency and performance, without needing to be concerned with artificial boundaries based on different geographical locations or providers.

By 2020, Compute Canada will offer a comprehensive catalog of resources to support the full data research cycle, allowing researchers and their industrial and international partners to compete at a global scale. In cooperation with Canada's other DRI providers, Compute Canada's systems and services will facilitate workflows that easily span different resources: from the lab or campus to national computational resources, analytical facilities, publication archives, and collaborators. Local support and engagement will remain a hallmark of delivering excellent service to all users. The pathway to this future has begun with the modernization of Compute Canada's advanced research computing cyberinfrastructure through the CFI Challenge 2 Stage 1+2 program.

36 York Mills Road, Suite 505, Toronto, Ontario, Canada M2P 2E9

www.computecanada.ca | www.calculcanada.ca | ComputeCanada

Page 6: Compute Canada Technology Briefing · Compute Canada resources research teams and their international partners work with industry giants in the automotive, ICT, life sciences, aerospace

Other Compute Canada member sites will be able to benefit from the national storage architecture including those sites operating legacy resources For example users may need to migrate data to the new systems or they might have use cases that will benefit from object storage or the larger capacity and higher performance the newer systems will offer

Delivery TimelineThe Challenge 2 Stage 1+2 technology refresh will span 2 years of staged deployment By the end of calendar year 2017 essentially all Challenge 2 Stage 1+2 funds will have been expended The total supply at that time is forecasted to be at least 126500 CPU cores (ldquoHaswellrdquo equivalents) and 62 petabytes of usable persistent storage Storage does not include near-line or backup storage nor high-speed parallel scratch space

Challenge 2 Stage 1+2 Technology Planning Compute is in Haswell-equivalent cores Storage is in usable petabytes Timeframe is calendar year quarters (ie ldquoQ1 2016rdquo is January-March 2016) and is approximate The core and storage targets are estimates only

During the same two-year period much of Compute Canadarsquos existing equipment will be defunded and removed from the allocations process Users will be moved to one of the new systems and needed data will be migrated Planning in 2014 for the site selection process identified 26 systems with 82000 CPU cores from older generations (nearly 1PF total) for retirement by early calendar year 2017 A schedule for the remaining systems will be developed in conjunction with planning for further technology expansion with some of the remaining systems likely to be removed from the allocations process in 2018 Much of the 15PB of allocatable storage available in 2015 will also be defunded and removed from the allocations process during the 2016-2018 period

Compute Canada Technology Briefing - November 2015 7

Organizational Cooperation and PlanningPlanning for site selection and the ensuing technology refresh has included deep coordination among the four national host sites for all aspects of procuring deploying configuring operating and supporting Compute Canadarsquos suite of systems and services The Compute Canada Technological Leadership Council (TLC) is responsible for developing specifications for the new systems and will lead the procurement evaluation TLC includes representatives from each national site as also includes the four regional CTOs It is led by the national CTO

New national teams which will draw from Compute Canada member institutions will run the systems and services provide user support and engage in cross-site coordination on major themes such as monitoring storage cloud services and networking The new systems and services will share practices for security The teams for all national systems and services will provide defined coverage and response levels

Procurement ProcessesAll four sites are working with the Compute Canada team to ensure an open and fair acquisition process Resources will be purchased and owned by each site Formation of specifications and evaluation of bids will be by national teams with full engagement by site procurement officers

Flexibility in PlanningPlans described here will be modified as needed based on discussions among the four sites Compute Canada and the national and provincial funding agencies Re-scaling of expectations for system size and capabilities if needed will be based on experience with vendor pricing and the influence of the Canadian dollarrsquos exchange rate There will also be assessment of anticipated user demand including for new technologies or configurations This will be via the SPARC process described below as well as through discussions with funding agencies and their researchers

By late 2016 updates will be considered for any needed revisions to planning for the expansion of the three GP systems and of the scale configuration and timing of the LP system Alignment of supply and demand will be re-assessed for computation and storage

Planning will also be responsive to any new information concerning additional funding the selection of additional hosting sites shifts to Canadarsquos digital research infrastructure strategy or other factors

Funding and GovernanceSFU is the lead organization for the CFI capital program and is executing an interinstitutional agreement with the three other hosting institutions and Compute Canada The Compute Canada membership will be involved in many broader aspects of organizational governance and planning CFI retains oversight for capital spending for the technology refresh as well as operational expenses via the Major Science Initiatives (MSI) program

Usage and CapabilitiesAs Canadarsquos national platform for advanced research computing Compute Canada serves thousands of users in essentially every scientific discipline Compute Canada is continually engaged in renewal and expansion of its services and its audience Beyond Canadarsquos academic community this includes engagement with industry and with international partners Some of the current and expanded services within Compute Canada are described below

Workload Portability Users will find it easier than before to run their jobs on any of the new systems This will be facilitated by deploying a single HPC batch system having a common naming scheme for software modules and filesystem mount points and incorporating mechanisms for data movement with the workload manager For projects involving a live stream of observational data or other time-sensitive characteristics workload portability will help to ensure the jobs run on time wherever appropriate HPC resources are available

Cloud Computing Building on Compute Canadarsquos successful early deployment of cloud systems and services the GP1 GP2 and GP3 systems will comprise a federated cloud including single sign-on shared data services a common cloud scheduler and other features for resiliency and ease of use Additional cloud resources within Compute Canada will be able to become part of the federated cloud simply by using the same authentication and configuration parameters

Big Data The storage architecture and cloud services will facilitate big data workloads including data analytics Storage will include database capabilities and cloud services will support virtual machines with user-selected software and features

National Operations and Support The national teams will work together to provide a consistent and well-supported environment for computation and data This will include all aspects of configuration and support Users will have a single point of contact to the national helpdesk and will also be able to benefit from the expertise of on-campus support personnel

Resource Allocations Compute Canada will continue to allocate compute and storage resources through a fair and open process Workload portability and the consistency of configuration and support will give users extra flexibility when desired in their choice of computing resources

Compute Canada Technology Briefing - November 2015 9

National ServicesConsultations have helped inform planning for systems and services in 2015-2017 Service demands were articulated while consulting with applicants for CFIrsquos Challenge 1 Stage 1 including these middleware services that were identified by multiple applicants

ņ Identification and Authorization Service Provide common login across systems

ņ Software Distribution Service Version-controlled software distribution to multiple sites

ņ Data Transfer Service To move datasets among collaborators and their repositories

ņ Monitoring Service Track uptime and availability of services and platforms

ņ Resource Publishing Service Current information about available resources

These services will be deployed beginning in 2016 for all new systems as part of the infrastructure investment Additional services will be identified and developed deployed and supported based on demand It is Compute Canadarsquos intention to provide a useful and effective set of middleware services accessible to any user or group These will provide a high performance and well-supported baseline upon which users or groups may build their own custom applications Compute Canada views these tools as needed software infrastructure and is devoting some of the Challenge 2 Stage 1+2 funds to developing that infrastructure

Compute Canada views many of the new services identified above as essential enabling tools for Research Data Management (RDM) As data volumes grow there is a growing demand for RDM Compute Canada will provide a common set of middleware services for users with this need RDM will continue to mature during the 2016-2018 period and will include cooperation with other digital research infrastructure providers in Canada

Future Consultations on this PlanIn early 2016 Compute Canada will embark on a second round of SPARC consultations SPARC2 will help to identify current and future needs as well as to parameterize growth in user demand As with the previous SPARC scientists and engineers from the across Canada will be invited to submit descriptions of their research goals and the needed advanced research computing capabilities and capacities required to achieve those goals

Projections for Future Supply and Demand

Technology Impact of Challenge 2 Stage 1+2By the end of calendar year 2017 Compute Canada will have delivered essentially all of the new computational and storage capacity facilitated by CFIrsquos ldquoChallenge 2 Stage 1+2rdquo award The $75M value of capital investment will replace most legacy systems and associated storage

Modernization and capacity resulting from Challenge 2 Stage 1+24

4 Primary disk does not include offlinenearline storage for backups or near-line storage It does include a variety of disk- or disk-like technologies including object storage block storage storage replicas and storage for filesystems

Compute Canada Technology Briefing - November 2015 11

During this technology refresh program CPUs will be replaced with the latest generation along with more memory New nodes will be augmented by GPUs and accelerators A typical node in service in 2015 has dual 6- or 8-core CPUs and 16-32GB of memory A typical node to be deployed in 2016 will have dual 14- or 16-core CPUs with 128GB of memory or greater

Challenge 2 Stage 1+2 is an important and necessary modernization of the DRI provided by Compute Canada Sustained investment is needed to accommodate the needs of current and future users of Canadarsquos national platform for computation and storage

Scenarios of Increasing DemandThere are several factors impacting planning future demand for advanced research computing

1 Demand by users who engage in computational modelling for additional CPU resources

a To increase spatial or temporal resolution

b To add physics or other simulation factors that were previously too slow or computationally expensive to calculate

c To test additional parameters or scenarios

d For projects and users new to Compute Canada especially in nontraditional fields

2 Demand for additional storage resources for computational modellers

a Larger input and output datasets due to larger or more complex models

b The need to keep some datasets beyond the end of a computational campaign to assist in future modelling or to support publications

3 Demand for portals and gateways including from new user populations

a May include needs for highly resilient services and systems

b May include needs for high-end storage subsystems for database operations

c Bring a user base that may be quite large and may include the general public

4 Demand for projects emphasizing instruments and observational data gathering and analysis

a May have irreplaceable or highly valuable data which needs to have multiple copies at multiple locations

b Include Compute Canadarsquos largest storage users many of whom have new instruments in development

c Require computational resources for post-processing analysis portals visualization andor reanalysis

5 Demand by data-focused projects

a May require isolation of data or computation from inappropriate disclosure

b Includes some usage (such as personal health information) with regulatory concerns

c Includes emphasis on data analytic methods that are not yet generally available on Compute Canada resources

6 Demand from projects being directed by funding agencies to consider utilizing Compute Canada resources

a Include a range of use cases that might not be in Compute Canadarsquos current service catalog but will be developed

b Some of these projects are very large and demanding

c Projects view Compute Canada as a partner not just a resource provider

7 Demand from industry

a May require isolation of data or computation from inappropriate disclosure

b Interested in the expertise of Compute Canada perhaps more than the computational resources

8 Demand from government

a Exist within a regulatory environment that might not be in part of Compute Canadarsquos current service catalog but will be developed

b Can involve a long planning timeline

Compute Canada Technology Briefing - November 2015 13

For Challenge 2 Stage 1+2 planning emphasized modernization of the computational and storage resources Planning has been sensitive to anticipated demand growth and changing patterns of utilization informed via Challenge 1 and the SPARC consultations The annual allocations process by Compute Canada is a major indicator of growth trends since it aggregates hundreds of existing projects Data for the 2016 allocation period are now available and reflect the impact of Challenge 2 Stage 1+2 In 2017 further growth is anticipated along with retirement of legacy resources

Through the 2014-15 SPARC process Compute Canada has identified the expected growth in community demand for storage (15x) and compute (7x) resources through 2020 This data has been converted into a doubling time to project future demand in equivalent core years and terabytes of storage Demand indicators support using a doubling time of 18 years for computational demand and 13 years for storage demand

By forecasting demand based on these doubling times we can project a trend into the future For this projection we present a range where the lower bound represents no growth in the Compute Canada user base and the upper bound represents ongoing increases in the user base following historical trends 2011-2015

Trends project a demand for 1-3 million Haswell-equivalent cores by 2020 and more than an exabyte of persistent storage These projections may turn out to be underestimates since some existing disciplines making extensive use of Compute Canada resources today anticipate needing over 1 million cores or 1 exabyte of data just for their own projects by 2020

It is hoped that Compute Canada will be stewards along with members regions and provincial partners of the sustained capital investment that will be required to meet these demands

Compute Canada Technology Briefing - November 2015 15

Vision 2020

Compute Canada, as a leading provider of digital research infrastructure (DRI), is taking an integrated approach to data and computational infrastructure in order to benefit all sectors of society. As a result of the technology refresh and modernization supported by CFI's Challenge 2 Stage 1+2, excellent research will benefit from modern and capable resources for computationally-based and data-focused work.

Compute Canada is coordinating with government funding agencies and with other DRI providers to develop a shared vision for providing the world's most advanced, integrated and capable systems, services and support for research. Future researchers will have seamless access to DRI resources, integrated together for maximum efficiency and performance, without needing to be concerned with artificial boundaries based on different geographical locations or providers.

By 2020, Compute Canada will offer a comprehensive catalog of resources to support the full data research cycle, allowing researchers and their industrial and international partners to compete at a global scale. In cooperation with Canada's other DRI providers, Compute Canada's systems and services will facilitate workflows that easily span different resources, from the lab or campus to national computational resources, analytical facilities, publication archives and collaborators. Local support and engagement will remain a hallmark of delivering excellent service to all users. The pathway to this future has begun with the modernization of Compute Canada's advanced research computing cyberinfrastructure through the CFI Challenge 2 Stage 1+2 program.

36 York Mills Road, Suite 505, Toronto, Ontario, Canada M2P 2E9

www.computecanada.ca | www.calculcanada.ca | ComputeCanada


Organizational Cooperation and Planning

Planning for site selection and the ensuing technology refresh has included deep coordination among the four national host sites for all aspects of procuring, deploying, configuring, operating and supporting Compute Canada's suite of systems and services. The Compute Canada Technological Leadership Council (TLC) is responsible for developing specifications for the new systems and will lead the procurement evaluation. The TLC includes representatives from each national site as well as the four regional CTOs. It is led by the national CTO.

New national teams, which will draw from Compute Canada member institutions, will run the systems and services, provide user support and engage in cross-site coordination on major themes such as monitoring, storage, cloud services and networking. The new systems and services will share practices for security. The teams for all national systems and services will provide defined coverage and response levels.

Procurement Processes

All four sites are working with the Compute Canada team to ensure an open and fair acquisition process. Resources will be purchased and owned by each site. Formation of specifications and evaluation of bids will be by national teams, with full engagement by site procurement officers.

Flexibility in Planning

Plans described here will be modified as needed, based on discussions among the four sites, Compute Canada, and the national and provincial funding agencies. Re-scaling of expectations for system size and capabilities, if needed, will be based on experience with vendor pricing and the influence of the Canadian dollar's exchange rate. Anticipated user demand will also be assessed, including demand for new technologies or configurations. This will be done via the SPARC process described below, as well as through discussions with funding agencies and their researchers.

By late 2016, any needed revisions will be considered to planning for the expansion of the three GP systems and for the scale, configuration and timing of the LP system. Alignment of supply and demand will be re-assessed for computation and storage.

Planning will also be responsive to any new information concerning additional funding, the selection of additional hosting sites, shifts to Canada's digital research infrastructure strategy, or other factors.

Funding and Governance

SFU is the lead organization for the CFI capital program and is executing an interinstitutional agreement with the three other hosting institutions and Compute Canada. The Compute Canada membership will be involved in many broader aspects of organizational governance and planning. CFI retains oversight for capital spending for the technology refresh, as well as operational expenses via the Major Science Initiatives (MSI) program.

Usage and Capabilities

As Canada's national platform for advanced research computing, Compute Canada serves thousands of users in essentially every scientific discipline. Compute Canada is continually engaged in renewal and expansion of its services and its audience. Beyond Canada's academic community, this includes engagement with industry and with international partners. Some of the current and expanded services within Compute Canada are described below.

Workload Portability: Users will find it easier than before to run their jobs on any of the new systems. This will be facilitated by deploying a single HPC batch system, having a common naming scheme for software modules and filesystem mount points, and incorporating mechanisms for data movement with the workload manager. For projects involving a live stream of observational data or other time-sensitive characteristics, workload portability will help to ensure the jobs run on time wherever appropriate HPC resources are available.

Cloud Computing: Building on Compute Canada's successful early deployment of cloud systems and services, the GP1, GP2 and GP3 systems will comprise a federated cloud, including single sign-on, shared data services, a common cloud scheduler and other features for resiliency and ease of use. Additional cloud resources within Compute Canada will be able to become part of the federated cloud simply by using the same authentication and configuration parameters.

Big Data: The storage architecture and cloud services will facilitate big data workloads, including data analytics. Storage will include database capabilities, and cloud services will support virtual machines with user-selected software and features.

National Operations and Support: The national teams will work together to provide a consistent and well-supported environment for computation and data. This will include all aspects of configuration and support. Users will have a single point of contact to the national helpdesk, and will also be able to benefit from the expertise of on-campus support personnel.

Resource Allocations: Compute Canada will continue to allocate compute and storage resources through a fair and open process. Workload portability and the consistency of configuration and support will give users extra flexibility, when desired, in their choice of computing resources.


National Services

Consultations have helped inform planning for systems and services in 2015-2017. Service demands were articulated while consulting with applicants for CFI's Challenge 1 Stage 1, including these middleware services that were identified by multiple applicants:

• Identification and Authorization Service: Provide common login across systems

• Software Distribution Service: Version-controlled software distribution to multiple sites

• Data Transfer Service: To move datasets among collaborators and their repositories

• Monitoring Service: Track uptime and availability of services and platforms

• Resource Publishing Service: Current information about available resources

These services will be deployed beginning in 2016 for all new systems, as part of the infrastructure investment. Additional services will be identified, developed, deployed and supported based on demand. It is Compute Canada's intention to provide a useful and effective set of middleware services, accessible to any user or group. These will provide a high-performance and well-supported baseline upon which users or groups may build their own custom applications. Compute Canada views these tools as needed software infrastructure, and is devoting some of the Challenge 2 Stage 1+2 funds to developing that infrastructure.

Compute Canada views many of the new services identified above as essential enabling tools for Research Data Management (RDM). As data volumes grow, there is a growing demand for RDM. Compute Canada will provide a common set of middleware services for users with this need. RDM will continue to mature during the 2016-2018 period, and will include cooperation with other digital research infrastructure providers in Canada.

Future Consultations on this Plan

In early 2016, Compute Canada will embark on a second round of SPARC consultations. SPARC2 will help to identify current and future needs, as well as to parameterize growth in user demand. As with the previous SPARC, scientists and engineers from across Canada will be invited to submit descriptions of their research goals and the advanced research computing capabilities and capacities required to achieve those goals.

Projections for Future Supply and Demand

Technology Impact of Challenge 2 Stage 1+2

By the end of calendar year 2017, Compute Canada will have delivered essentially all of the new computational and storage capacity facilitated by CFI's "Challenge 2 Stage 1+2" award. The $75M value of capital investment will replace most legacy systems and associated storage.

Modernization and capacity resulting from Challenge 2 Stage 1+2 [4]

[4] Primary disk does not include offline/nearline storage for backups or near-line storage. It does include a variety of disk or disk-like technologies, including object storage, block storage, storage replicas, and storage for filesystems.


During this technology refresh program, CPUs will be replaced with the latest generation, along with more memory. New nodes will be augmented by GPUs and accelerators. A typical node in service in 2015 has dual 6- or 8-core CPUs and 16-32 GB of memory. A typical node to be deployed in 2016 will have dual 14- or 16-core CPUs with 128 GB of memory or greater.
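The per-node figures above imply a change in both core count and memory per core, which the following sketch works out. It is illustrative only: the helper `node_profile` is hypothetical, and pairing the smaller CPU with the smaller memory size (and the larger with the larger) for 2015 nodes is an assumption, since the briefing gives only ranges.

```python
def node_profile(cores_per_cpu: int, memory_gb: float, sockets: int = 2):
    """Return (cores per node, GB of memory per core) for a node with
    `sockets` CPUs of `cores_per_cpu` cores each."""
    cores = sockets * cores_per_cpu
    return cores, memory_gb / cores


# Assumed pairings of CPU size and memory, taken from the quoted ranges.
profiles = {
    "2015, dual 6-core, 16 GB": node_profile(6, 16),
    "2015, dual 8-core, 32 GB": node_profile(8, 32),
    "2016, dual 14-core, 128 GB": node_profile(14, 128),
    "2016, dual 16-core, 128 GB": node_profile(16, 128),
}

for label, (cores, gb_per_core) in profiles.items():
    print(f"{label}: {cores} cores, {gb_per_core:.2f} GB per core")
```

Because 128 GB is described as a minimum for 2016 nodes, the 2016 memory-per-core values are lower bounds.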

Challenge 2 Stage 1+2 is an important and necessary modernization of the DRI provided by Compute Canada. Sustained investment is needed to accommodate the needs of current and future users of Canada's national platform for computation and storage.

Scenarios of Increasing Demand

Several factors affect planning for future demand for advanced research computing:

1. Demand by users who engage in computational modelling for additional CPU resources
   a. To increase spatial or temporal resolution
   b. To add physics or other simulation factors that were previously too slow or computationally expensive to calculate
   c. To test additional parameters or scenarios
   d. For projects and users new to Compute Canada, especially in nontraditional fields

2. Demand for additional storage resources for computational modellers
   a. Larger input and output datasets due to larger or more complex models
   b. The need to keep some datasets beyond the end of a computational campaign, to assist in future modelling or to support publications

3. Demand for portals and gateways, including from new user populations
   a. May include needs for highly resilient services and systems
   b. May include needs for high-end storage subsystems for database operations
   c. Brings a user base that may be quite large and may include the general public

4. Demand for projects emphasizing instruments and observational data gathering and analysis
   a. May have irreplaceable or highly valuable data, which needs to have multiple copies at multiple locations
   b. Includes Compute Canada's largest storage users, many of whom have new instruments in development
   c. Requires computational resources for post-processing, analysis, portals, visualization and/or reanalysis

5. Demand by data-focused projects
   a. May require isolation of data or computation from inappropriate disclosure
   b. Includes some usage (such as personal health information) with regulatory concerns
   c. Includes emphasis on data analytic methods that are not yet generally available on Compute Canada resources

6. Demand from projects being directed by funding agencies to consider utilizing Compute Canada resources
   a. Includes a range of use cases that might not be in Compute Canada's current service catalog, but will be developed
   b. Some of these projects are very large and demanding
   c. Projects view Compute Canada as a partner, not just a resource provider

7. Demand from industry
   a. May require isolation of data or computation from inappropriate disclosure
   b. Interested in the expertise of Compute Canada, perhaps more than the computational resources

8. Demand from government
   a. Exists within a regulatory environment that might not be part of Compute Canada's current service catalog, but will be developed
   b. Can involve a long planning timeline



Page 9: Compute Canada Technology Briefing · Compute Canada resources research teams and their international partners work with industry giants in the automotive, ICT, life sciences, aerospace

Compute Canada Technology Briefing - November 2015 9

National ServicesConsultations have helped inform planning for systems and services in 2015-2017 Service demands were articulated while consulting with applicants for CFIrsquos Challenge 1 Stage 1 including these middleware services that were identified by multiple applicants

ņ Identification and Authorization Service Provide common login across systems

ņ Software Distribution Service Version-controlled software distribution to multiple sites

ņ Data Transfer Service To move datasets among collaborators and their repositories

ņ Monitoring Service Track uptime and availability of services and platforms

ņ Resource Publishing Service Current information about available resources

These services will be deployed beginning in 2016 for all new systems as part of the infrastructure investment Additional services will be identified and developed deployed and supported based on demand It is Compute Canadarsquos intention to provide a useful and effective set of middleware services accessible to any user or group These will provide a high performance and well-supported baseline upon which users or groups may build their own custom applications Compute Canada views these tools as needed software infrastructure and is devoting some of the Challenge 2 Stage 1+2 funds to developing that infrastructure

Compute Canada views many of the new services identified above as essential enabling tools for Research Data Management (RDM) As data volumes grow there is a growing demand for RDM Compute Canada will provide a common set of middleware services for users with this need RDM will continue to mature during the 2016-2018 period and will include cooperation with other digital research infrastructure providers in Canada

Future Consultations on this PlanIn early 2016 Compute Canada will embark on a second round of SPARC consultations SPARC2 will help to identify current and future needs as well as to parameterize growth in user demand As with the previous SPARC scientists and engineers from the across Canada will be invited to submit descriptions of their research goals and the needed advanced research computing capabilities and capacities required to achieve those goals

Projections for Future Supply and Demand

Technology Impact of Challenge 2 Stage 1+2By the end of calendar year 2017 Compute Canada will have delivered essentially all of the new computational and storage capacity facilitated by CFIrsquos ldquoChallenge 2 Stage 1+2rdquo award The $75M value of capital investment will replace most legacy systems and associated storage

Modernization and capacity resulting from Challenge 2 Stage 1+24

4 Primary disk does not include offlinenearline storage for backups or near-line storage It does include a variety of disk- or disk-like technologies including object storage block storage storage replicas and storage for filesystems

During this technology refresh program, CPUs will be replaced with the latest generation, along with more memory. New nodes will be augmented by GPUs and accelerators. A typical node in service in 2015 has dual 6- or 8-core CPUs and 16-32 GB of memory. A typical node to be deployed in 2016 will have dual 14- or 16-core CPUs with 128 GB of memory or greater.
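The per-node figures above can be turned into cores per node and memory per core with a little arithmetic. The following is only an illustrative sketch: the function name `node_profile` is hypothetical, and pairing the smaller CPU with the smaller memory size (and the larger with the larger) for 2015 is an assumption made for the example, not something the briefing states.

```python
def node_profile(cores_per_cpu: int, memory_gb: float, sockets: int = 2):
    """Return (cores per node, GB of memory per core) for a multi-socket node."""
    cores = sockets * cores_per_cpu
    return cores, memory_gb / cores


# Assumed pairings of CPU size and memory, for illustration only.
configs = {
    "2015, dual 6-core, 16 GB": (6, 16),
    "2015, dual 8-core, 32 GB": (8, 32),
    "2016, dual 14-core, 128 GB": (14, 128),
    "2016, dual 16-core, 128 GB": (16, 128),
}

for label, (cores_per_cpu, memory_gb) in configs.items():
    cores, gb_per_core = node_profile(cores_per_cpu, memory_gb)
    print(f"{label}: {cores} cores/node, {gb_per_core:.2f} GB/core")
```

Under these assumptions, a 2016 node carries roughly twice the cores of a 2015 node and at least twice the memory per core (4 GB or more, compared with about 1.3 to 2 GB).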

Challenge 2 Stage 1+2 is an important and necessary modernization of the DRI provided by Compute Canada. Sustained investment is needed to accommodate the needs of current and future users of Canada's national platform for computation and storage.

Scenarios of Increasing Demand

Several factors affect planning for future demand for advanced research computing:

1. Demand by users who engage in computational modelling for additional CPU resources
   a. To increase spatial or temporal resolution
   b. To add physics or other simulation factors that were previously too slow or computationally expensive to calculate
   c. To test additional parameters or scenarios
   d. For projects and users new to Compute Canada, especially in nontraditional fields

2. Demand for additional storage resources for computational modellers
   a. Larger input and output datasets due to larger or more complex models
   b. The need to keep some datasets beyond the end of a computational campaign, to assist in future modelling or to support publications

3. Demand for portals and gateways, including from new user populations
   a. May include needs for highly resilient services and systems
   b. May include needs for high-end storage subsystems for database operations
   c. Bring a user base that may be quite large and may include the general public

4. Demand for projects emphasizing instruments and observational data gathering and analysis
   a. May have irreplaceable or highly valuable data, which needs to have multiple copies at multiple locations
   b. Include Compute Canada's largest storage users, many of whom have new instruments in development
   c. Require computational resources for post-processing, analysis, portals, visualization and/or reanalysis

5. Demand by data-focused projects
   a. May require isolation of data or computation from inappropriate disclosure
   b. Includes some usage (such as personal health information) with regulatory concerns
   c. Includes emphasis on data analytic methods that are not yet generally available on Compute Canada resources

6. Demand from projects being directed by funding agencies to consider utilizing Compute Canada resources
   a. Include a range of use cases that might not be in Compute Canada's current service catalog but will be developed
   b. Some of these projects are very large and demanding
   c. Projects view Compute Canada as a partner, not just a resource provider

7. Demand from industry
   a. May require isolation of data or computation from inappropriate disclosure
   b. Interested in the expertise of Compute Canada, perhaps more than the computational resources

8. Demand from government
   a. Exists within a regulatory environment that might not be addressed by Compute Canada's current service catalog, though support will be developed
   b. Can involve a long planning timeline

For Challenge 2 Stage 1+2, planning emphasized modernization of the computational and storage resources. Planning has been sensitive to anticipated demand growth and changing patterns of utilization, informed via Challenge 1 and the SPARC consultations. The annual allocations process by Compute Canada is a major indicator of growth trends, since it aggregates hundreds of existing projects. Data for the 2016 allocation period are now available and reflect the impact of Challenge 2 Stage 1+2. In 2017, further growth is anticipated, along with retirement of legacy resources.

Through the 2014-15 SPARC process, Compute Canada has identified the expected growth in community demand for storage (15x) and compute (7x) resources through 2020. This data has been converted into a doubling time to project future demand in equivalent core years and terabytes of storage. Demand indicators support using a doubling time of 1.8 years for computational demand and 1.3 years for storage demand.
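The conversion from a total growth factor to a doubling time can be sketched as follows. This is a minimal illustration, not Compute Canada's actual method: it assumes smooth exponential growth over a five-year window (2015 through 2020), and the function names are hypothetical. Under those assumptions, a growth factor G over t years gives a doubling time of t × ln 2 / ln G.

```python
import math


def doubling_time(growth_factor: float, years: float) -> float:
    """Years needed for demand to double, given total growth over a period."""
    return years * math.log(2) / math.log(growth_factor)


def growth_after(years: float, doubling_years: float) -> float:
    """Total growth factor after `years`, given a doubling time."""
    return 2.0 ** (years / doubling_years)


PERIOD_YEARS = 5.0  # assumption: growth measured from 2015 through 2020

compute_t2 = doubling_time(7.0, PERIOD_YEARS)   # compute demand grows 7x
storage_t2 = doubling_time(15.0, PERIOD_YEARS)  # storage demand grows 15x

print(f"compute doubling time: {compute_t2:.1f} years")
print(f"storage doubling time: {storage_t2:.1f} years")
```

With a five-year window, the 7x and 15x growth factors work out to doubling times of about 1.8 and 1.3 years respectively. `growth_after` runs the calculation in the other direction and can be used to project relative demand for any later year.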

By forecasting demand based on these doubling times, we can project a trend into the future. For this projection, we present a range where the lower bound represents no growth in the Compute Canada user base and the upper bound represents ongoing increases in the user base following historical trends from 2011 to 2015.

Trends project a demand for 1-3 million Haswell-equivalent cores by 2020 and more than an exabyte of persistent storage. These projections may turn out to be underestimates, since some existing disciplines making extensive use of Compute Canada resources today anticipate needing over 1 million cores or 1 exabyte of data just for their own projects by 2020.

It is hoped that Compute Canada will be a steward, along with members, regions and provincial partners, of the sustained capital investment that will be required to meet these demands.

Vision 2020

Compute Canada, as a leading provider of digital research infrastructure (DRI), is taking an integrated approach to data and computational infrastructure in order to benefit all sectors of society. As a result of the technology refresh and modernization supported by CFI's Challenge 2 Stage 1+2, excellent research will benefit from modern and capable resources for computationally-based and data-focused work.

Compute Canada is coordinating with government, funding agencies, and other DRI providers to develop a shared vision for providing the world's most advanced, integrated and capable systems, services and support for research. Future researchers will have seamless access to DRI resources, integrated together for maximum efficiency and performance, without needing to be concerned with artificial boundaries based on different geographical locations or providers.

By 2020, Compute Canada will offer a comprehensive catalog of resources to support the full data research cycle, allowing researchers and their industrial and international partners to compete at a global scale. In cooperation with Canada's other DRI providers, Compute Canada's systems and services will facilitate workflows that easily span different resources, from the lab or campus to national computational resources, analytical facilities, publication archives, and collaborators. Local support and engagement will remain a hallmark of delivering excellent service to all users. The pathway to this future has begun with the modernization of Compute Canada's advanced research computing cyberinfrastructure through the CFI Challenge 2 Stage 1+2 program.

36 York Mills Road, Suite 505, Toronto, Ontario, Canada M2P 2E9

www.computecanada.ca | www.calculcanada.ca | ComputeCanada
