School of Software Engineering

Under the guidance of Professor Xu Yanling

June 29, 2015

Thesis proposal defense for the professional Master's degree in Computer Science

Energy / Performance trade-offs for transactional memory applications with an adaptive thread mapping method

Justin Brottes, M.S.

School of Software Engineering

Tongji University

Agenda

● Introduction

○ Motivation

○ Scientific question

○ Research background

● Rationale for the study

● Literature Review

● Methodology

○ Hypotheses

○ Research design

Introduction

Research motivation

I am experienced in low-level application development, elementary and object-oriented languages, 3D drawing techniques (raycasting, raytracing), and Linux/Unix systems.

I am eager to deepen my knowledge of parallel computing, low-level application behaviour, and other topics relevant to the supercomputing field.

Justin Brottes

M.S. in Information Technologies at EPITECH (France)

M.S. in Software Engineering at Tongji University (China)

Parallel Computing

“Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel").”

Parallel Computations

Parallel computation can take place at multiple levels, such as instructions, branch targets, loops, execution traces, and subroutines. Instruction-level parallelism (ILP) is a measure of how many instructions can run in parallel.

ILP can be exploited in software (by the compiler) or in hardware (pipelining).
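
To make ILP as a measure more concrete, here is a minimal sketch in Python. It treats a short instruction sequence as a dependency graph and computes an idealized ILP as the number of instructions divided by the length of the longest dependency chain. The instruction names, the unit latency, and the assumption of unlimited execution units are all illustrative and do not come from the slides.

```python
def critical_path_length(deps):
    """Length (in instructions) of the longest dependency chain.

    deps maps each instruction name to the instructions whose results
    it needs. Every instruction is assumed to take one time unit.
    """
    depth = {}

    def visit(instr):
        if instr not in depth:
            depth[instr] = 1 + max((visit(d) for d in deps[instr]), default=0)
        return depth[instr]

    return max(visit(instr) for instr in deps)


def ideal_ilp(deps):
    """Idealized ILP: instruction count divided by critical path length."""
    return len(deps) / critical_path_length(deps)


# Hypothetical sequence: a = x + y; b = z * w; c = a + b; d = c * 2
# a and b are independent, c needs both, d needs c.
program = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
print(ideal_ilp(program))
```

Here four instructions sit on a chain of length three, so at best about 1.33 instructions can run per step. A fully independent sequence would reach an ILP equal to its instruction count.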

What issues arise when implementing parallel computing?

"The way the processor industry is going, is to add more and more cores, but nobody knows how to program those things. I mean, two, yeah; four, not really; eight, forget it."

- Steve Jobs, co-founder, Apple.

"Redesigning your application to run multithreaded on a multicore machine is a little like learning to swim by jumping into the deep end."

- Herb Sutter, Chair of the ISO C++ standards committee, Microsoft.

Scientific question

● Software often does not benefit from the underlying computing resources, and much of it requires immense effort and cost to rewrite and re-engineer

● Energy is increasingly becoming one of the most expensive resources and the largest cost item in running a large supercomputing solution

How can we optimize performance and reduce energy consumption for supercomputing applications?

Research background

Main methods to develop parallel computing solutions

What is a mutual exclusion (mutex)?

A mutex is a lockable object used to signal that a critical section of code needs exclusive access. It prevents other threads that use the same mutex from executing that section concurrently and accessing the same memory locations.
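
As a minimal sketch of the idea, the Python example below protects a shared counter with a mutex. Python's `threading.Lock` stands in for a mutex here; the function name `run_counter` and its parameters are hypothetical and chosen only for this illustration.

```python
import threading


def run_counter(num_threads=4, increments=10_000):
    """Increment a shared counter from several threads under one mutex."""
    counter = {"value": 0}
    mutex = threading.Lock()

    def worker():
        for _ in range(increments):
            with mutex:  # critical section: only one thread at a time
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


print(run_counter())
```

Because every read-modify-write of the counter happens while the lock is held, the final value equals `num_threads * increments`.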

Mutex drawbacks

● A mutex can only be released by its owner

● Mutexes can cause deadlocks and priority inversion if they are not handled properly

● When threads need to synchronize often, mutexes are costly
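
To illustrate what "handled properly" can demand from the programmer, here is a small sketch of one common discipline for avoiding deadlock: always acquiring locks in a fixed global order. The `Account` class, the `rank` field, and the `transfer` function are hypothetical names used only for this example.

```python
import threading


class Account:
    _next_rank = 0

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()
        self.rank = Account._next_rank  # fixed global order for locking
        Account._next_rank += 1


def transfer(src, dst, amount):
    """Move money while holding both locks, always taken in rank order."""
    if src is dst:
        return
    first, second = sorted((src, dst), key=lambda acc: acc.rank)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def demo(rounds=1000):
    """Two threads transfer in opposite directions without deadlocking."""
    a, b = Account(100), Account(100)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance


print(demo())
```

If each thread instead locked its source account first, the two threads could each hold one lock and wait forever for the other. The ordering rule removes that risk, but it has to be respected everywhere in the code base by convention.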

What is transactional memory?

Transactional memory (TM) is intended to simplify parallel programming, specifically access to shared data across multiple threads. A transaction is a sequence of memory operations that either executes completely (commit) or has no effect (abort).

An “all or nothing” sequence of operations:

● On commit, all memory operations appear to take effect as a unit (all at once)

● On abort, none of the stores appear to take effect

Transactions run in isolation:

● The effects of stores are not visible until the transaction commits

● No concurrent conflicting accesses by other transactions
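
The commit/abort behaviour can be sketched with a toy software TM in Python. This is only an illustration under simplifying assumptions: writes are buffered, reads remember a version number, and a single global lock serializes commits. The names `TVar`, `Transaction`, and `atomically` are hypothetical and do not correspond to any particular STM system.

```python
import threading

_commit_lock = threading.Lock()  # serializes commits, not transaction bodies


class TVar:
    """A transactional variable: a value plus a version number."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version seen at first read
        self.writes = {}  # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        with _commit_lock:  # consistent (value, version) pair
            value, version = tvar.value, tvar.version
        self.reads.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value  # invisible to others until commit

    def commit(self):
        with _commit_lock:
            for tvar, seen in self.reads.items():
                if tvar.version != seen:
                    return False  # conflict: abort, no store takes effect
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True


def atomically(body):
    """Run body(tx) until it commits; return how many times it aborted."""
    aborts = 0
    while True:
        tx = Transaction()
        body(tx)
        if tx.commit():
            return aborts
        aborts += 1


def demo(num_threads=4, rounds=200):
    a, b = TVar(100), TVar(0)

    def move_one(tx):
        tx.write(a, tx.read(a) - 1)
        tx.write(b, tx.read(b) + 1)

    def worker():
        for _ in range(rounds):
            atomically(move_one)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.value, b.value


print(demo())
```

Every committed transaction moves exactly one unit from `a` to `b`, so the sum stays constant however the threads interleave. Aborted attempts leave no trace because their stores only ever lived in the private write buffer.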

TM drawbacks

● Performance can degrade when applications run with a non-optimal concurrency level

● This also causes a large increase in energy consumption

Rationale for the study

Step by step

1. Understand how TM performs on multicore platforms. I first take a deeper look at the impact of software TM (STM) systems on the performance of TM applications.

2. Propose an extension of an existing approach that improves the performance of TM applications by exploiting the memory hierarchy of modern multicore platforms.

3. Extend this approach to predict and apply suitable thread mapping strategies for TM applications, as introduced by several researchers, in order to propose an evolution of their work.
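
To show what a thread mapping strategy looks like in practice, the sketch below computes two simple placements of threads onto cores. The strategy names `compact` and `scatter`, the machine shape, and the core numbering (core id = socket index times cores per socket, plus the index inside the socket) are assumptions made for illustration only. Actually pinning threads to cores is platform-specific and is left out.

```python
def compact_mapping(num_threads, sockets, cores_per_socket):
    """Fill one socket before the next, so neighbouring threads share caches."""
    total = sockets * cores_per_socket
    return {t: t % total for t in range(num_threads)}


def scatter_mapping(num_threads, sockets, cores_per_socket):
    """Alternate between sockets, so neighbouring threads do not share caches."""
    total = sockets * cores_per_socket
    mapping = {}
    for t in range(num_threads):
        slot = t % total
        socket = slot % sockets
        core_in_socket = slot // sockets
        mapping[t] = socket * cores_per_socket + core_in_socket
    return mapping


# Hypothetical machine: 2 sockets with 4 cores each, 4 threads to place.
print(compact_mapping(4, 2, 4))
print(scatter_mapping(4, 2, 4))
```

With this numbering, `compact` places the four threads on cores 0 to 3 of the first socket, while `scatter` places them on cores 0, 4, 1 and 5. Which placement is better depends on how much data the threads share, and that is the kind of application characteristic a prediction step would have to take into account.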

Literature Review

Evolutions & periods

● Beginning to 2000s

● 2000s to 2007

● 2007 to 2011

● 2012 to nowadays

Beginning to 2000s

● The transactional memory programming paradigm, established by Herlihy and Moss, starts to replace mutexes

● The transactional memory research field was largely restricted to a few pioneering researchers and supercomputing firms

● Work focused heavily on developing methods and on the virtualisation of TM

● The theoretical concepts were implemented in several projects between 1995 and 1997, followed by a long period of standards definition.

2000s to 2007: STM evolution

● STM design has come a long way since the first STM algorithm by Shavit and Touitou, which provided a non-blocking implementation of static transactions

● Many experiments were carried out to develop newer solutions and to compare them with mutexes

2007 to 2011: Late use of HTM

● Many studies conclude that TM has not yet reached the maturity needed to present a compelling value proposition that would trigger its widespread adoption

● After several years of active research, the literature still lacks mentions of large-scale applications that make use of TM

● Real HTM improvements appear, showing that hardware optimisation gives more encouraging results.

● It was only in 2007 that the first hardware implementation of transactional memory was developed by Sun Microsystems.

2012 to nowadays: Automation & Optimisation

● The research focus has shifted from methodologies to performance

● The main studies rely on automated thread mapping, using different approaches

● Adaptive software uses available information about changes in its environment to improve its behavior over time (machine learning)

● Reducing energy consumption in high-performance multiprocessor systems is also quickly becoming an urgent issue

Position of my study in the current research field

● My research takes place on the software side of TM.

● Software-level optimisation is open to a wide range of developers. It gives them an alternative to hardware optimisation, which can be more restrictive because of system costs

● Improving performance at the software level is important: in all cases it helps applications behave better

Methodology

Hypotheses

● The performance of TM applications can be improved if we match their characteristics to the underlying multicore platform

● The impact of thread mapping on TM applications has been explored many times and has proved effective. I want to confirm these intuitions and propose an approach capable of predicting suitable thread mappings

● I need to find a suitable metric for weighing energy consumption against performance, because deriving viable results is difficult
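
One candidate for such a metric, offered here only as an assumption and not as the metric chosen in this proposal, is the energy-delay product (EDP), which multiplies the energy consumed by the execution time. The sketch below compares configurations by EDP. The configuration names and every number are invented placeholders, not measurements.

```python
def energy_delay_product(energy_joules, time_seconds, weight=1):
    """E * T**weight: weight=1 gives EDP, weight=2 favours speed more (ED2P)."""
    return energy_joules * time_seconds ** weight


def best_configuration(measurements, weight=1):
    """Return the configuration name with the lowest energy-delay product."""
    return min(
        measurements,
        key=lambda name: energy_delay_product(*measurements[name], weight),
    )


# Placeholder values only: (energy in joules, execution time in seconds).
measurements = {
    "4 threads, compact": (120.0, 10.0),
    "8 threads, scatter": (200.0, 7.0),
}
print(best_configuration(measurements))            # ranks by E * T
print(best_configuration(measurements, weight=2))  # ranks by E * T^2
```

With these placeholder values the two weights pick different winners (1200 against 1400 for EDP, 12000 against 9800 for ED2P), which shows why the choice of metric has to be fixed before any trade-off result can be interpreted.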

Research design: Requirements

● The main approach for this research is to use a C/C++ compiler that includes full STM functionality

Research design: Difficulties

The increased complexity of developing parallel programs can be eased by a good understanding of how the application actually behaves in its specific hardware and software execution contexts.

There are two main approaches to achieve this goal:

● Execution analysis: runtime information about the application's behavior is collected and used to take action at runtime.

● Post-execution analysis: the collected runtime information is recorded in a detailed log (trace file) for later analysis.
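
The difference between the two approaches can be sketched as follows. The `TxProfiler` class, the JSON-lines trace format, and the policy of halving the thread count above an abort-ratio threshold are all hypothetical choices made for this illustration.

```python
import io
import json


class TxProfiler:
    """Counts transaction outcomes and optionally writes them to a trace."""

    def __init__(self, trace_sink=None):
        self.commits = 0
        self.aborts = 0
        self.trace_sink = trace_sink  # file-like object, or None

    def record(self, thread_id, outcome):
        """Record one transaction outcome: 'commit' or 'abort'."""
        if outcome == "commit":
            self.commits += 1
        else:
            self.aborts += 1
        if self.trace_sink is not None:  # post-execution analysis: log it
            event = {"thread": thread_id, "outcome": outcome}
            self.trace_sink.write(json.dumps(event) + "\n")

    def abort_ratio(self):
        total = self.commits + self.aborts
        return self.aborts / total if total else 0.0

    def suggest_threads(self, current, threshold=0.5):
        """Execution analysis: react at runtime to the observed abort ratio."""
        if self.abort_ratio() > threshold:
            return max(1, current // 2)
        return current


def analyse_trace(lines):
    """Post-execution analysis: abort ratio per thread, read from a trace."""
    per_thread = {}
    for line in lines:
        event = json.loads(line)
        stats = per_thread.setdefault(event["thread"], {"commit": 0, "abort": 0})
        stats[event["outcome"]] += 1
    return {t: s["abort"] / (s["commit"] + s["abort"]) for t, s in per_thread.items()}


trace = io.StringIO()
profiler = TxProfiler(trace)
for thread_id, outcome in [(0, "commit"), (0, "abort"), (1, "abort"), (1, "abort")]:
    profiler.record(thread_id, outcome)

print(profiler.suggest_threads(8))                 # decision taken at runtime
print(analyse_trace(trace.getvalue().splitlines()))  # decision left for later
```

Execution analysis keeps only cheap aggregate counters and acts immediately, whereas post-execution analysis keeps every event and can answer finer questions afterwards, at the cost of writing and storing the trace.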

Research design: Steps

There are three main steps in the research method:

1. Understand the behaviour of parallel applications better: I need to run many tests and benchmarks to collect data and analyse it.

2. ML-based approach: the learning phase, which is subdivided into three major steps: application profiling, data pre-processing, and the learning process.

3. Implement the system as an extension of the chosen system and run tests to confirm the efficiency of the method.
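
The three sub-steps of the learning phase could look roughly like the sketch below. It is a minimal illustration, assuming a 1-nearest-neighbour classifier as the learner and two invented features per application (abort ratio and average transaction length). The profile values and the labels attached to them are placeholders, not results.

```python
import math


def normalise(rows):
    """Pre-processing: scale every feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    scaled = [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]
    return scaled, lows, spans


class NearestNeighbour:
    """Learning process: a 1-nearest-neighbour classifier."""

    def fit(self, rows, labels):
        self.rows, self.lows, self.spans = normalise(rows)
        self.labels = list(labels)
        return self

    def predict(self, row):
        scaled = [(v - lo) / s for v, lo, s in zip(row, self.lows, self.spans)]
        distances = [math.dist(scaled, known) for known in self.rows]
        return self.labels[distances.index(min(distances))]


# Application profiling (placeholder data): (abort ratio, avg. transaction length)
profiles = [(0.05, 200), (0.10, 300), (0.60, 5000), (0.70, 4000)]
# Mapping that performed best for each profiled application (placeholder labels)
best_mapping = ["scatter", "scatter", "compact", "compact"]

model = NearestNeighbour().fit(profiles, best_mapping)
print(model.predict((0.65, 4500)))
print(model.predict((0.08, 250)))
```

A new application is profiled once, its features are scaled with the same bounds as the training data, and the mapping of the most similar known application is suggested. Any other classifier could replace the nearest-neighbour step without changing the overall pipeline.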

Thank you.

Do you have any questions?