11
Why are the biggest supercomputers so challenging? Balancing Software Engineering 26 Katherine Riley August 14, 2013

Why are the biggest supercomputers so challenging?press3.mcs.anl.gov/computingschool/files/2014/01/HPC...TOP500 Release June 2014 Category Cores per Socket Submit Legend: 4, 6, 8,

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Why are the biggest supercomputers so challenging?press3.mcs.anl.gov/computingschool/files/2014/01/HPC...TOP500 Release June 2014 Category Cores per Socket Submit Legend: 4, 6, 8,

Why are the biggest supercomputers so challenging?

Balancing Software Engineering

26

Katherine Riley August 14, 2013

Page 2: Why are the biggest supercomputers so challenging?press3.mcs.anl.gov/computingschool/files/2014/01/HPC...TOP500 Release June 2014 Category Cores per Socket Submit Legend: 4, 6, 8,

Define a supercomputer

“A supercomputer is a computer at the frontline of contemporary processing capacity – particularly speed of calculation which can happen at speeds of nanoseconds” - Wikipedia

¤  But not all supercomputers are as ‘frontline’ as others ¤  This makes the ecosystem … challenging.

27

Page 3: Why are the biggest supercomputers so challenging?press3.mcs.anl.gov/computingschool/files/2014/01/HPC...TOP500 Release June 2014 Category Cores per Socket Submit Legend: 4, 6, 8,

Top 500 Comment

28

8/14/14, 9:36 AMEfficiency, Power, Cores... | TOP500 Supercomputer Sites

Page 1 of 2http://www.top500.org/statistics/efficiency-power-cores/

Home (/) / Statistics (/statistics/) / Efficiency, Power, Cores...

Efficiency, Power, Cores...R and R values are in GFlops. For more details about other fields, check the TOP500 description(/project/top500_description).

R values are calculated using the advertised clock rate of the CPU. For the efficiency of the systems you should take intoaccount the Turbo CPU clock rate where it applies.

Chart Type

Rmax

TOP500 Release

June 2014

Category

Cores per Socket

Submit

Legend:

4, 6, 8, 10, 12, 16,

max peak

peak

0 100 200 300 400 500

100,000

1,000,000

10,000,000

100,000,000

Rm

ax (

GFl

op/

s) 2

Robert Scott and 3,527 others like this.LikeLike

This Week in HPC: Big Deals Cascade in for Cray and IBM Commits $3 Billion in Chip Research wp.me/p3RLHQ-cwC

Retweeted by TOP500

insideHPC.com @insideHPC

Expand

The new Trinity #supercomputer begins the transition to new #exascale architectures bit.ly/1qrl6TD @cray_inc pic.twitter.com/jx94tknk4P

Retweeted by TOP500

Los Alamos HPC @LANL_HPC

Expand

Exclusive: @LANL Lead Shares @Cray_inc "Trinity" #HPC System Feeds and Speeds -

HPCwire @HPCwire

11 Jul

11 Jul

10 Jul

Tweets FollowFollow

Tweet to @top500supercomp

Log in (/accounts/login/?next=/statistics/efficiency-power-cores/)

or

Sign up (/accounts/signup/?next=/statistics/efficiency-power-cores/)

8/14/14, 9:37 AMEfficiency, Power, Cores... | TOP500 Supercomputer Sites

Page 1 of 2http://www.top500.org/statistics/efficiency-power-cores/

Home (/) / Statistics (/statistics/) / Efficiency, Power, Cores...

Efficiency, Power, Cores...R and R values are in GFlops. For more details about other fields, check the TOP500 description(/project/top500_description).

R values are calculated using the advertised clock rate of the CPU. For the efficiency of the systems you should take intoaccount the Turbo CPU clock rate where it applies.

Chart Type

Rmax

TOP500 Release

June 2014

Category

Cores per Socket

Submit

Legend:

4, 6, 8, 10, 12, 16,

max peak

peak

0 10 20 30 40 50

100,000

1,000,000

10,000,000

100,000,000

Rm

ax (

GFl

op/

s)

Show all

2

Robert Scott and 3,527 others like this.LikeLike

This Week in HPC: Big Deals Cascade in for Cray and IBM Commits $3 Billion in Chip Research wp.me/p3RLHQ-cwC

Retweeted by TOP500

insideHPC.com @insideHPC

Expand

The new Trinity #supercomputer begins the transition to new #exascale architectures bit.ly/1qrl6TD @cray_inc pic.twitter.com/jx94tknk4P

Retweeted by TOP500

Los Alamos HPC @LANL_HPC

Expand

Exclusive: @LANL Lead Shares @Cray_inc "Trinity" #HPC System Feeds and Speeds -

HPCwire @HPCwire

11 Jul

11 Jul

10 Jul

Tweets FollowFollow

Tweet to @top500supercomp

Log in (/accounts/login/?next=/statistics/efficiency-power-cores/)

or

Sign up (/accounts/signup/?next=/statistics/efficiency-power-cores/)

Page 4: Why are the biggest supercomputers so challenging?press3.mcs.anl.gov/computingschool/files/2014/01/HPC...TOP500 Release June 2014 Category Cores per Socket Submit Legend: 4, 6, 8,

Top few vs Top 500

¤  There are not many huge supercomputers ¤  The biggest supercomputers are bleeding edge

¥  Hardware is sort of hardened in install ¥  Software is flushed out over the first year or two ¥  What I consider normal, many might consider HOSTILE

¤  Why is That? ¥  Reputation ¥  Move science forward

¡  Machines are targeted to a small number of users

¥  Help steer technology

29

Page 5: Why are the biggest supercomputers so challenging?press3.mcs.anl.gov/computingschool/files/2014/01/HPC...TOP500 Release June 2014 Category Cores per Socket Submit Legend: 4, 6, 8,

How is a big DOE supercomputer purchased?

30

Science  $   Call  for  

Designs  

Choice  (  Science,  Facility,  Budget)    

Opera=ons  

Poli=cs  

Science  

Opera=ons  

$  

Earliest  Hardware  

Delivery  

Science  

Ops  Vendor  

Shake  it  out  at  scale  

Science  

Ops  Vendor  

Usability  Balance  Capability  Future  

Collabora=on  

Science  

Ops  Vendor  

Science  

Ops  Vendor  

Here  be  Early  science  

Page 6: Why are the biggest supercomputers so challenging?press3.mcs.anl.gov/computingschool/files/2014/01/HPC...TOP500 Release June 2014 Category Cores per Socket Submit Legend: 4, 6, 8,

How is a big DOE supercomputer purchased?

31

Science  $   Call  for  

Designs  

Choice  (  Science,  Facility,  Budget)    

Opera=ons  

Poli=cs  

Science  

Opera=ons  

$  

Earliest  Hardware  

Delivery  

Science  

Ops  Vendor  

Shake  it  out  at  scale  

Science  

Ops  Vendor  

Usability  Balance  Capability  Future  

Collabora=on  

Science  

Ops  Vendor  

Science  

Ops  Vendor  

•  All  over  5  years  •  We  start  the  next  one  before  the  current  one  is  fully  “on  the  floor”  

Page 7: Why are the biggest supercomputers so challenging?press3.mcs.anl.gov/computingschool/files/2014/01/HPC...TOP500 Release June 2014 Category Cores per Socket Submit Legend: 4, 6, 8,

Collaboration does not yield perfection

¤  Even with all these iterations with the vendor, the machine will not be perfect

32

Applica=ons  

Budget  

Technology  

Future  Path  

Page 8: Why are the biggest supercomputers so challenging?press3.mcs.anl.gov/computingschool/files/2014/01/HPC...TOP500 Release June 2014 Category Cores per Socket Submit Legend: 4, 6, 8,

Applica=ons  

Budget  

Technology  

Future  Path  

Collaboration does not yield perfection

¤  Even with all these iterations with the vendor, the machine will not be perfect

33

(Not  perfectly  weighted)  

Page 9: Why are the biggest supercomputers so challenging?press3.mcs.anl.gov/computingschool/files/2014/01/HPC...TOP500 Release June 2014 Category Cores per Socket Submit Legend: 4, 6, 8,

What this means for your SW ecosystem?

¤  Bleeding edge systems are their own fun ¤  The system is available for production REALLY near to it’s first

construction ¥  Early science buffers this, but not completely ¥  It takes time to move software to a new architecture at a new scale ¥  Even earliest access won’t really resolve this ¥  Vendor does not have a significant leg up

¤  As SW on a machine stabilizes, the machine reaches end of life

¤  The available software will be bare bones ¤  Complicated requirements can slow you down ¤  Facilities tries very very hard to expand that as fast as possible

34

Page 10: Why are the biggest supercomputers so challenging?press3.mcs.anl.gov/computingschool/files/2014/01/HPC...TOP500 Release June 2014 Category Cores per Socket Submit Legend: 4, 6, 8,

Why Engineering Principles are Crucial

¤  Testing of your software is crucial ¥  You might have a better process than library developers & vendors ¥  Defense against other people’s bugs

¤  Versioning ¤  Abstraction of functionality

¥  To address the speed of change ¥  Help debug problems ¥  Help with performance

¤  Document to help us help you ¤  Running on the biggest systems is a competitive advantage

¥  Own all of your code. Even libraries.

¤  Let the compiler do the work with limits ¤  Use high level languages with care

35

Page 11: Why are the biggest supercomputers so challenging?press3.mcs.anl.gov/computingschool/files/2014/01/HPC...TOP500 Release June 2014 Category Cores per Socket Submit Legend: 4, 6, 8,

Questions

36