Preparing for the Future of Computing Pieter Hijma [email protected] Friday 15 April 2016 Coffee & Data on Infrastructures


Preparing for the Future of Computing

Pieter Hijma [email protected]

Friday 15 April 2016

Coffee & Data on Infrastructures


Broaden the view

Anticipate risks

Become aware of current issues


Message

In coming years, software will have to adapt to hardware more than previously: Software Crisis


My background

• PhD on programming many-cores

• High-performance computing: front-runner

[Thesis cover: Programming Many-Cores on Multiple Levels of Abstraction, Pieter Hijma]


Look into the Past

[Figure: Moore's law. Number of transistors (log scale, 10^3 to 10^10) and clock speed in MHz (log scale, 0.1 to 10000) plotted against year, 1970 to 2015, for the Intel processors 8008, 8080, 8086, 80286, 80386, 80486, Pentium, Pentium II, Pentium III, Pentium 4, Core 2, Core i7 (Nehalem), Core i7 (Sandy), and Core i7 (Haswell). Successive builds of the slide mark the single-core era, the "lucky time", and the multi-core era.]


Processor types

• Single-core
  • Optimized for latency

• Multi-core
  • Still optimized for latency, but with more than one core

• Many-core
  • Optimized for throughput
  • High performance/Watt

[Figure: block diagrams of processor cores, each built from ALUs, a control unit, and a cache.]


Performance per Watt

• 1972 - 2007:
  • 10,000 × higher performance
  • 300 × higher efficiency

• 2005 - 2015:
  • 250 × higher performance
  • 10 × higher efficiency
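
The two periods above can be compared by working out how much total power must have grown. This is a minimal sketch that assumes "efficiency" means performance per Watt, so that power equals performance divided by efficiency. The function name is illustrative and not from the talk.

```python
def implied_power_growth(performance_factor, efficiency_factor):
    """Growth in power draw implied by a performance and an efficiency factor.

    Assumption: efficiency = performance / power, so power = performance / efficiency.
    """
    return performance_factor / efficiency_factor


# Factors taken from the slide.
growth_1972_2007 = implied_power_growth(10_000, 300)
growth_2005_2015 = implied_power_growth(250, 10)

print(f"1972 - 2007: power grew about {growth_1972_2007:.1f} x")  # about 33.3 x
print(f"2005 - 2015: power grew about {growth_2005_2015:.1f} x")  # about 25.0 x
```

Under that assumption, performance gains in both periods were paid for with a large increase in power, which is why efficiency becomes the limiting concern.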


Performance per Watt

[Figure: Performance and power efficiency of the #1 TOP500 system, 2005 to 2015. Two series, Performance (TFLOPS) and Efficiency (MFLOPS/W), on a vertical axis from 0 to 35000.]


Many-core processors

Features
• throughput oriented
• fast evolution of the architecture
• architectural features for high performance

Difficult to program, especially for high performance


Walls

• memory wall
• energy wall

Result
hardware without compromises to the interface to programmers → difficult to program →

• programming wall


Recap

To deal with energy problems, hardware will be:
• highly parallel
• throughput oriented
• architectural details for performance
• difficult to program

Result
More responsibility for software developers


Soft-/hardware

Traditional view
• software is soft, we can change it easily
• hardware is hard, we cannot change it

New view
• software is expensive, big, long-lived and therefore hard, changing it is difficult
• hardware is inexpensive, short-lived and therefore soft, changing hardware is easy


Soft-/hardware

Traditional view
• software receives less attention
• hardware receives much attention
• compiler can solve your performance problems

New view
• software should gain more recognition
• to adapt to future hardware, software has to become soft again
• compiler cannot help you as much


Examples

Parallel Ocean Program

• written in the ’90s
• different context: memory cheap, compute expensive

Linpack

• we still evaluate supercomputers with the same software
• not representative anymore


How to prepare for this?

Abstractions
• Programmer uses high-level abstractions
• Compiler optimizes and translates to lower levels for computer
• Hardware extracts (limited) parallelism

This works reasonably well with:
• a stable interface to hardware
• limited parallelism on a processor


How to prepare for this?

In the many-core era?

• Interface to hardware not stable
• Highly parallel, parallelism is explicit
• Compilers are limited and cannot adapt fast enough

Abstractions
• Also hide the architectural details necessary for performance/efficiency


Programming

A program is an algorithm mapped to hardware

[Figure: diagram relating Program, Algorithm, Mapping, and Hardware.]

Solution
Incorporate hardware descriptions in the programming model
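
One way to picture "an algorithm mapped to hardware" is to keep the algorithm fixed and make the hardware description an explicit input that decides how the work is divided. The sketch below is only an illustration under that assumption: `HardwareDescription`, `parallel_units`, `map_to_hardware`, and `saxpy` are hypothetical names, not the programming model from the talk, and the chunks run sequentially here rather than in parallel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareDescription:
    """Hypothetical, deliberately tiny description of a target device."""
    name: str
    parallel_units: int  # assumed number of chunks that could run side by side


def map_to_hardware(n_items, hw):
    """Mapping: split range(n_items) into contiguous chunks, at most one per unit."""
    if n_items == 0:
        return []
    chunk = -(-n_items // hw.parallel_units)  # ceiling division
    return [range(start, min(start + chunk, n_items))
            for start in range(0, n_items, chunk)]


def saxpy(a, x, y, hw):
    """Algorithm: out[i] = a * x[i] + y[i]; only the chunking depends on hw."""
    out = [0.0] * len(x)
    for chunk in map_to_hardware(len(x), hw):  # a real backend would run chunks in parallel
        for i in chunk:
            out[i] = a * x[i] + y[i]
    return out


multi_core = HardwareDescription("multi-core", parallel_units=4)
many_core = HardwareDescription("many-core", parallel_units=512)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0] * 5
# Same algorithm, two mappings, identical result.
assert saxpy(2.0, x, y, multi_core) == saxpy(2.0, x, y, many_core)
```

The result does not change with the hardware description; only the mapping does, which is the part a programmer can tune per device.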


Hierarchy of hardware descriptions

perfect
  mic
    xeon phi
  gpu
    nvidia
      fermi
        gtx480
      kepler
        gtx680
    amd

Lower levels: control, performance
Higher levels: portability
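
The hierarchy can be written down as a small tree to show what "portability" means at each level: code written against a level applies to every device below it. This is a sketch only. The parent/child edges are an assumption reconstructed from the slide layout and from which devices belong to which vendor and architecture, and the function names are illustrative.

```python
# Assumed edges: child -> parent. "perfect" is the root and has no parent.
PARENT = {
    "mic": "perfect",
    "gpu": "perfect",
    "xeon phi": "mic",
    "nvidia": "gpu",
    "amd": "gpu",
    "fermi": "nvidia",
    "kepler": "nvidia",
    "gtx480": "fermi",
    "gtx680": "kepler",
}


def path_from_root(level):
    """Return the chain of hardware descriptions from the root down to `level`."""
    path = [level]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def devices_covered(level):
    """Leaf descriptions that sit at or below `level`, in alphabetical order."""
    leaves = set(PARENT) - set(PARENT.values())
    return sorted(leaf for leaf in leaves if level in path_from_root(leaf))


print(path_from_root("gtx480"))   # ['perfect', 'gpu', 'nvidia', 'fermi', 'gtx480']
print(devices_covered("nvidia"))  # ['gtx480', 'gtx680']
print(devices_covered("perfect")) # ['amd', 'gtx480', 'gtx680', 'xeon phi']
```

Moving down a path narrows the set of covered devices while exposing more device-specific detail, which matches the control/performance versus portability trade-off on the slide.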


Role compiler: advisory

Compilers

• conservative
• no application knowledge

Programmers

• application knowledge
• can oversee correctness problems better


Optimization process recorded

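[Figure: the hardware descriptions perfect, mic, gpu, nvidia, amd, fermi, xeon phi, gtx480, kepler, and gtx680, annotated with the performance recorded during optimization.]

Recorded performance, in the order the annotations appear:
• 89 GFLOPS, v1: 100 GFLOPS, v2: 92 GFLOPS, v3: 205 GFLOPS
• 205 GFLOPS, v1: 494 GFLOPS
• 89 GFLOPS

Feedback
Using 1/8 blocks per smp. Reduce the amount of shared memory used by storing/loading shared memory in phases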
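
The feedback message can be read as an occupancy problem: each block uses so much shared memory that only one of eight possible blocks fits on a multiprocessor. The sketch below illustrates that reading under stated assumptions. It treats shared memory as the only limiting resource, and the defaults (48 KB of shared memory and at most 8 resident blocks per multiprocessor) are modeled on a Fermi-class GPU rather than taken from the talk. The function name is hypothetical.

```python
def resident_blocks_per_smp(smem_per_block,
                            smem_per_smp=48 * 1024,
                            max_blocks_per_smp=8):
    """Blocks that fit on one multiprocessor if only shared memory limits them.

    All sizes are in bytes. The defaults are assumptions, not values from the talk.
    """
    if smem_per_block <= 0:
        return max_blocks_per_smp
    return min(max_blocks_per_smp, smem_per_smp // smem_per_block)


# A block that uses all 48 KB leaves room for just 1 of 8 blocks.
print(resident_blocks_per_smp(48 * 1024))  # 1
# Loading shared memory in phases, so each block needs 6 KB, allows all 8.
print(resident_blocks_per_smp(6 * 1024))   # 8
```

Registers and thread counts also limit occupancy on real devices, so this is a lower bound on the constraints rather than a full occupancy calculator.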


Making software soft again

Marver
Semi-automatically transform legacy programs

• Ben van Werkhoven

[Figure: call graph of the Parallel Ocean Program, starting at root and fanning out through routines such as POP_Initialize, step, baroclinic_driver, barotropic_driver, output_driver, and exit_POP to many more initialization, forcing, mixing, diagnostics, and I/O routines.]


Overview of people involved

Henri Bal: High Performance Distributed Computing

Cees de Laat: System and Network Engineering

Ana Varbanescu: Performance Prediction

Ben van Werkhoven: Applications

Ismail El-Helw: Programming Models

Alessio Sclocco: Radio Astronomy

Me: Programming Models

Merijn Verstraaten: Graph Processing

Souley Madougou: Performance Prediction

Clemens Grelck: High-level Programming

Raphael Poss: Hard-/software Interface

Catalin Ciobanu: Reconfigurable Computing


Conclusion

In coming years, software will have to adapt to hardware more than previously: Software Crisis
