1
SPEEDUP AND POWER SCALING MODELS FOR HETEROGENEOUS MANY-CORE SYSTEMS Parallelization has been used to maintain a reasonable balance between energy consumption and performance in computing systems, especially modern multi- and many-core platforms. This University Booth presents the expansion of speedup and power scaling models to heterogeneous many-core systems. In this study, traditional speedup and power scaling models are expanded to cover a normal form of core heterogeneity, no longer restricting to existing homogeneous and the limited ‘asymmetric’ assumption of heterogeneity. The new models are validated through extensive experimentation with two systems covering two different types of core heterogeneity: 1. ARM big.LITTLE architecture with iso-ISA core performance and power heterogeneity. Functional execution equivalence is established at the instruction level. 2. Windows laptop with CPU+GPU+GPU architecture representing ISA-level heterogeneity. Functional execution equivalence is established at the workload item level. We will demonstrate the validity of the method by comparing our experimental results with model-calculated results, and with a software tool http://async.org.uk/prime/PER/ . [A. Rafiev, M. Al-Hayanni, F. Xia, R. Shafik, A. Romanovsky and A. Yakovlev, “Speedup and Power Scaling Models for Heterogeneous Many-Core Systems,” in IEEE Trans. Multi-Scale Comp. Syst.] ARCHON –AN ARCHITECTURE-OPEN RESOURCE-DRIVEN CROSS-LAYER MODELLING FRAMEWORK The analysis for extra-functional properties such as power and performance takes a critical role in the system design workflow. Hardware-software co-simulation is one of the commonly used ways to perform this type of analysis. However, with the modern development of many-core systems the problem of scalability is becoming a bottleneck for all analysis techniques including simulation, especially when a simple extrapolation from the single core results is unacceptable. For instance, complications with memory access affect the speedup and power of many-core systems as much as core heterogeneity does. This University Booth demonstrates ArchOn, https://github.com/ashurrafiev/ArchOn, a tool based on representing systems as resource graphs, which is layer-agnostic and facilitates selective abstraction, helping to minimize the complexity of simulations. A set of Networks-on-Chip and bus-based topologies will be presented as use case examples to demonstrate the usefulness of ArchOn in analyzing speedup and power for systems with both core heterogeneity and memory architecture heterogeneity. Figure 1 ArchOn in the system design process (left) and selective abstraction using ArchOn. Figure 2 Speedup and power comparisons of different memory architectures obtained using ArchOn. [A. Rafiev, F. Xia, A. Iliasov, A. Romanovsky, A. Yakovlev. “Selective Abstraction for Estimating Extra- Functional Properties in Networks-on-Chips Using ArchOn,” In ACSD2017, Zaragoza, Spain, 2017.] http://async.org.uk/

Model SPEEDUP AND POWER SCALING ODELS FOR ...A. Rafiev, M. Al-Hayanni, F. Xia, R. Shafik, A. Romanovsky and A. Yakovlev, “Speedup and Power Scaling Models for Heterogeneous Many-Core

Embed Size (px)

Citation preview

Run-Time Power, Temperature and Reliability Estimation using Performance Counters

Matthew Walker, Professor Bashir Al-Hashimi, Dr Geoff Merrett{mw9g09, gvm, bmah}@ecs.soton.ac.uk

Electronic and Software Systems Research Group, Electronics and Computer ScienceUniversity of Southampton

Run-Time Power Estimation• Performance counter events (e.g. L2 cache miss, branch

mis-prediction) correlate well with power consumption;• Emphasis on responsiveness of estimation (Figure 1);• Uses: Guiding run-time power saving techniques,

profiling software for energy efficiency, and hardware design exploration;

• Use model to extend gem5 for providing verified power estimations that can compute run-time power much faster than McPAT;

• Research will draw comparisons with McPAT.

References

[1] R. Bertran, M. Gonzelez, X. Martorell, N. Navarro, and E. Ayguade, “A systematic methodology to generate decomposable and responsive power models for cmps,” Computers, IEEE Transactions on, vol. 62, no. 7, pp. 1289–1302, 2013

R u n -T i m e Te m p e r a t u r e a n d Reliability Estimation

• Close relationship between the power density and temperature of a SoC;

• As semiconductor technology scales down in size, lifetime reliability decreases drastically;

• Processor rel iabil ity is influenced greatly by temperature;

• Can performance counters determine thermal cycling (TC) and therefore MMTF due to TC?;

• With run-time knowledge of temperature and reliability, a SoC can adapt its operation to improve lifetime reliability.

Development PlatformsFour development boards based around ARM SoCs are currently being used for experimentation:

PandaBoard

Objectives• Establish and understand performance counters and

features instate-of-the-art development boards• Experimentally build model for estimating power,

temperature and reliability using performance counters and operating system statistics

Workload

Find Correlation

Model (e.g. Linear Regression

PMCsPower

Time

Pow

er

ActualModel AModel B

Simplified experiment method

Figure 1. Graph demonstrating the importance of the "responsiveness" of the model. Model A can be considered "more accurate", but Model B is better at directing dynamic power management techniques. Adapted from [1].

ModelPerformance Counters E.g.:Cycle countL2 Cache missBranch mis-prediction

Power

Model

Performance Counters

Average Temperature

Model

Temperature Cycling

Mean-time-to-failure MTTF

Above: Power estimation

Right: Temperature and lifetime reliability estimation

ODOIRD-XU+E BeagleBoard-xM

ODROID-U2

Accessible Performance

Counters?

Temperature Sensors?

ODROID-XU No Per-coreBeagleBoard Yes NoPandaBoard No One (2 cores)ODROID U2 Yes One (4 cores)

SPEEDUPANDPOWERSCALINGMODELSFORHETEROGENEOUSMANY-CORESYSTEMSParallelization has been used to maintain a reasonable balance between energy consumption and performance in computing systems, especially modern multi- and many-core platforms. This University Booth presents the expansion of speedup and power scaling models to heterogeneous many-core systems.

In this study, traditional speedup and power scaling models are expanded to cover a normal form of core heterogeneity, no longer restricting to existing homogeneous and the limited ‘asymmetric’ assumption of heterogeneity. The new models are validated through extensive experimentation with two systems covering two different types of core heterogeneity:

1. ARM big.LITTLE architecture with iso-ISA core performance and power heterogeneity. Functional execution equivalence is established at the instruction level.

2. Windows laptop with CPU+GPU+GPU architecture representing ISA-level heterogeneity. Functional execution equivalence is established at the workload item level.

We will demonstrate the validity of the method by comparing our experimental results with model-calculated results, and with a software tool http://async.org.uk/prime/PER/.

[A. Rafiev, M. Al-Hayanni, F. Xia, R. Shafik, A. Romanovsky and A. Yakovlev, “Speedup and Power Scaling Models for Heterogeneous Many-Core Systems,” in IEEE Trans. Multi-Scale Comp. Syst.]

ARCHON–ANARCHITECTURE-OPENRESOURCE-DRIVENCROSS-LAYERMODELLINGFRAMEWORKThe analysis for extra-functional properties such as power and performance takes a critical role in the system design workflow. Hardware-software co-simulation is one of the commonly used ways to perform this type of analysis. However, with the modern development of many-core systems the problem of scalability is becoming a bottleneck for all analysis techniques including simulation, especially when a simple extrapolation from the single core results is unacceptable. For instance, complications with memory access affect the speedup and power of many-core systems as much as core heterogeneity does.

This University Booth demonstrates ArchOn, https://github.com/ashurrafiev/ArchOn, a tool based on representing systems as resource graphs, which is layer-agnostic and facilitates selective abstraction, helping to minimize the complexity of simulations.

A set of Networks-on-Chip and bus-based topologies will be presented as use case examples to demonstrate the usefulness of ArchOn in analyzing speedup and power for systems with both core heterogeneity and memory architecture heterogeneity.

Figure1ArchOninthesystemdesignprocess(left)andselectiveabstractionusingArchOn.

Figure2SpeedupandpowercomparisonsofdifferentmemoryarchitecturesobtainedusingArchOn.

[A. Rafiev, F. Xia, A. Iliasov, A. Romanovsky, A. Yakovlev. “Selective Abstraction for Estimating Extra-Functional Properties in Networks-on-Chips Using ArchOn,” In ACSD2017, Zaragoza, Spain, 2017.]

http://async.org.uk/