1 A+A=A Stamatis Vassiliadis Symposium
Stamatis Vassiliadis Symposium: The Future of Computing
A+A=A
Mateo Valero
Barcelona Supercomputing Center
To Stamatis, my loved friend
The way we all do research ... As seen from HPCA 1999
• Microarchitecture idea
• Applications: SPEC, PerfectClub, TPC-D, NAS, Splash …
• Compiler: production, public, custom, …
• Simulator: public, custom, …
• Results: how much we get from our idea
The Past Future ... As seen from HPCA 1999
Algorithms
Compiler
Architecture
Hardware
Applications
Absolutely obsessed with going to the limits of extracting available ILP on a single core
The Past Future Continued: Advanced ILP Techniques for Superscalar Processors
• Optimized Pipeline
• Cache
• Branch Predictors
• Instruction Collapsing
• Value Prediction
• Reuse
• Assisted/Subordinated Threads
• Trace Cache/Processor
• Control/Data Speculation
• Kilo-instruction Processors
• ………
Distant Parallelism: Non-numerical applications
• (In)Dependent threads: e.g. m88ksim
• Application speed-up: 2.65
[Figure: m88ksim thread structure. Labels: check_issue, kill_time, Real_execution, breakpoint?, PC guess, fetch_next, statistics, cmmutime, Sbus2. Phases: TIMING, EXE, FETCH]
The “immediate” future: Number of cores doubled every 18 months
“It is better for Intel to get involved in this now so when we get to the point of having 10s and 100s of cores we will have the answers. There is a lot of architecture work to do to release the potential, and we will not bring these products to market until we have good solutions to the programming problem.” Justin Rattner, Intel CTO
“Now, the grains inside these machines more and more will be multi-core type devices, and so the idea of parallelization won't just be at the individual chip level, even inside that chip we need to explore new techniques like transactional memory that will allow us to get the full benefit of all those transistors and map that into higher and higher performance.” Bill Gates, Supercomputing 05 keynote
MareNostrum
Most beautiful supercomputer (Fortune magazine, Sept. 2006)
#1 in Europe, #5 in the World
100's of TeraFlops with general purpose Linux supercluster of commodity PowerPC-based Blade Servers
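As a rough check of what "doubled every 18 months" implies over the ten-year horizon the talk looks at, the arithmetic can be sketched as follows. The starting point of 4 cores in 2007 and the name `projected_cores` are assumptions chosen for illustration; they are not figures from the slides.

```python
def projected_cores(start_cores, start_year, target_year, doubling_months=18.0):
    """Project a core count that doubles every `doubling_months` months."""
    months = (target_year - start_year) * 12
    return start_cores * 2 ** (months / doubling_months)


# Hypothetical baseline: a 4-core chip in 2007.
for year in (2007, 2010, 2013, 2017):
    print(year, round(projected_cores(4, 2007, year)))
```

Under that assumed baseline, ten years is a little under seven doublings, which lands in the "100s of cores" range mentioned in the quote above.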
Supercomputers will likely have millions of processing cores
The “far” future (e.g. 2017) and The big question!
How to solve the programming problem? a.k.a. How to program the beast?
• How to enable the power of the hundreds to millions of cores on a system?
• Computer Architects must adapt their thinking. From now on, parallel software requirements will directly drive systems design
• We need a multidisciplinary top-down approach to this, including
• Applications
• Algorithms
• Debugging
• Programming models
• Programming languages
• Compilers
• Operating Systems
• Runtime environment
… as design drivers for future Architectures
The holistic view: A + A = A
How to solve the programming problem? a.k.a. How to program the beast?
• How to enable the power of the hundreds to millions of cores on a system?
• Computer Architects must adapt their thinking. From now on, many-core software requirements will directly drive processor design
• We need a multidisciplinary top-down approach to this, including
• Applications
• Algorithms
• Debugging
• Programming models
• Programming languages
• Compilers
• Operating Systems
• Runtime environment
… as design drivers
Applirithms + Adhesive = Architecture
Far Future: Applications
• What will be the typical applications in 2017?
• Are Dwarfs and/versus RMS the right path to follow?
• Applications are ephemeral but the kernels are forever: the applications may change, the kernels stay the same.
• Will streaming applications require new architectures?
• Are we approaching the special purpose accelerators for specific applications?
M. Valero. Microsoft Workshop on Multicore, Seattle, June-2007
Far Future: Algorithms
• Bad news (for some folks): “Rethink and rewrite the algorithms”
• For manycores, the algorithms need to carefully consider:
• The right level of parallelism
• Load Balancing
• Communication-Computation overlapping
• Speculation (e.g. in message passing)
Source: Jack Dongarra, Microsoft Workshop on Multicore, Seattle, June-2007
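To make the load-balancing point concrete, here is a minimal sketch comparing a static block partition of tasks with a greedy "longest task first, least-loaded worker" assignment. The task costs and function names are hypothetical, and the greedy heuristic is only one of many possible balancing strategies; it is not taken from the talk.

```python
import heapq


def static_makespan(costs, workers):
    """Split tasks into contiguous blocks, one block per worker."""
    block = -(-len(costs) // workers)  # ceiling division
    loads = [sum(costs[i:i + block]) for i in range(0, len(costs), block)]
    return max(loads)


def greedy_makespan(costs, workers):
    """Give each task (longest first) to the currently least-loaded worker."""
    loads = [0] * workers  # a list of zeros is already a valid min-heap
    for cost in sorted(costs, reverse=True):
        least = heapq.heappop(loads)
        heapq.heappush(loads, least + cost)
    return max(loads)


# Hypothetical, deliberately uneven task costs.
costs = [8, 7, 6, 5, 1, 1, 1, 1]
print("static :", static_makespan(costs, 4))
print("greedy :", greedy_makespan(costs, 4))
```

With uneven task costs the static split leaves some workers nearly idle while one carries the heavy tasks, which is the kind of imbalance the slide asks algorithm designers to consider.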
Top-Down CMP Design, an initial programmer wishlist
• Easy-to-express parallelism
• Transactional Memory (TM): Compared to locks, TM provides an easy-to-use mechanism for ensuring mutual exclusion
• Hide all kinds of non-uniformity from the programmer (heterogeneous cores, non-uniform memory access, …)
• Continue using standard tools
• OpenMP: the industry standard for writing parallel programs on shared memory
• TM and OpenMP combine ease with familiarity for programming multi-cores
• BSC-UPC-Microsoft: IWOMP07, MEDEA07
• Stanford: PACT07
• Dataflow model ideally suited to express parallelism
• Cell Superscalar = Distant Parallelism + Data Flow + Out of Order Execution
• Supercomputers: MPI + ((OpenMP/Cell Superscalar) + TM)
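The dataflow idea in the wishlist above can be illustrated with a small sketch: each task names the data it reads and the data it writes, and a runtime derives the dependencies and runs every ready task without waiting for program order. This is a toy under stated assumptions, not the Cell Superscalar API: `run_dataflow` and the task tuples are hypothetical, only true (read-after-write) dependencies are tracked, and each output name is assumed to be written exactly once.

```python
from concurrent.futures import ThreadPoolExecutor


def run_dataflow(tasks, data):
    """Run tasks given as (name, func, inputs, output) in program order.

    A task depends on the latest earlier task that writes one of its inputs.
    Returns the list of "waves": groups of tasks that ran concurrently.
    """
    last_writer, deps = {}, {}
    for name, _func, inputs, output in tasks:
        deps[name] = {last_writer[i] for i in inputs if i in last_writer}
        last_writer[output] = name

    done, waves, pending = set(), [], list(tasks)
    with ThreadPoolExecutor() as pool:
        while pending:
            # Every task whose producers have finished is ready to run.
            ready = [t for t in pending if deps[t[0]] <= done]
            futures = [(t, pool.submit(t[1], *[data[i] for i in t[2]]))
                       for t in ready]
            for task, future in futures:
                data[task[3]] = future.result()
            done.update(t[0] for t in ready)
            waves.append([t[0] for t in ready])
            pending = [t for t in pending if t[0] not in done]
    return waves


# Hypothetical three-task program: a and b are independent, c needs both.
example_data = {"x": 3}
example_tasks = [
    ("a", lambda x: x + 1, ["x"], "a_out"),
    ("b", lambda x: x * 2, ["x"], "b_out"),
    ("c", lambda p, q: p + q, ["a_out", "b_out"], "c_out"),
]
print(run_dataflow(example_tasks, example_data), example_data["c_out"])
```

The programmer writes the tasks sequentially; the parallelism between `a` and `b` is discovered from the declared inputs and outputs rather than expressed by hand.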
Chip organization in 2017: many-core
• How many cores will the processor of 2017 have?
• Will they be homogeneous or heterogeneous? Arrays of simple in-order cores, fewer complex out-of-order cores, or a mix of the two? Consentry and Internet Security
• Is Simultaneous Multithreading just for servers?
• Should we push for further optimizing classical OoO implementations or research how to put into practical use radical new approaches such as dataflow or asynchronous architectures?
[Figure: many-core chip organization. Labels: Memory, Cache, On-die Interconnect, Off-die Interconnect]
Microsoft Workshop on Multicore, Seattle, June-2007
Chip organization in 2017: memory and interconnection network
• How will the latency and bandwidth problems be addressed?
• 3D integration-aware computer architecture: it is a great future idea. Will it always be a great future idea?
• What is the best many-core interconnect topology?
• How can we evaluate the importance of the interconnection network to applications?
• What obstacles do parallel applications face when I/O doesn't scale well?
Microsoft Workshop on Multicore, Seattle, June-2007
An overall picture of the Microsoft Many-core project
[Figure: layers labeled Applications; Programming model (Functional, Imperative); Transactional Memory (STM, HTM); Architecture]
• Programming models for future many-core architectures
• Architectural support for programming models
• OpenMP+TM
• HW acceleration for Haskell
• Many-core architecture
• Power-aware
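To show what a transaction offers the programmer compared with explicit locking, here is a toy optimistic-update sketch for a single shared variable: the update is computed from a snapshot and committed only if nobody else committed in the meantime, otherwise it is retried. `TVar` and `atomically` are hypothetical names borrowed loosely from STM vocabulary; this is not how the STM or HTM systems in the project are implemented, and the small internal lock here guards only the snapshot and commit steps.

```python
import threading


class TVar:
    """A versioned shared cell used by the toy transaction below."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()


def atomically(tvar, update):
    """Optimistically apply `update`; retry if another thread committed first."""
    while True:
        with tvar._commit_lock:
            seen_version, seen_value = tvar.version, tvar.value  # snapshot
        new_value = update(seen_value)  # user code runs outside any lock
        with tvar._commit_lock:  # validate, then commit
            if tvar.version == seen_version:
                tvar.value = new_value
                tvar.version += 1
                return new_value
        # Conflict detected: another transaction committed, so try again.


# Hypothetical workload: four threads incrementing one shared counter.
counter = TVar(0)


def worker():
    for _ in range(1000):
        atomically(counter, lambda v: v + 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)
```

The caller only states what the update is; conflict detection and retry are handled by the runtime, which is the ease-of-use argument made for TM in the talk.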
An overall picture of the IBM MareIncognito project
• Our 10-100 Petaflop research project for BSC (2010)
• Port/develop applications to reduce time-to-production once installed
• Programming models (MPI, OpenMP+TM, CellSs)
• Tools for application development and to support previous evaluations
• Evaluate node architecture (heavily multicored)
• Evaluate interconnect options
[Figure: project areas. Labels: Performance analysis and prediction tools; Processor and node; Load balancing; Interconnect; Application development and tuning; Fine-grain programming models; Model and prototype]
Supercomputing and e-Science Consolider program
• 5 Grand Challenge applications
• 22 groups
• 119 senior researchers
[Figure: application areas (Earth Sciences, Astrophysics, Engineering, Material Sciences, Life Sciences) linked to three groups: Compilers and tuning of application kernels; Programming models and performance tuning tools; Architectures and hardware technologies. Legend: Strong interaction; Interaction to be created]
Education for multi-core
[Figure captions: "I programming multicores"; "Multicore-based pacifier"]