27
CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft Research Jamison Collins, Hong Wang Microarchitecture Research Lab Intel Corporation David Brooks Engineering and Applied Sciences Harvard University International Symposium on Microarchitecture 11 November 2008 Benjamin C. Lee, et al 1 :: MICRO :: 11 Nov 08

CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

  • Upload
    others

  • View
    39

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

CPR: Composable Performance Regressionfor Scalable Multiprocessor Models

Benjamin C. LeeComputer Architecture Group

Microsoft Research

Jamison Collins, Hong WangMicroarchitecture Research Lab

Intel Corporation

David BrooksEngineering and Applied Sciences

Harvard University

International Symposium on Microarchitecture11 November 2008

Benjamin C. Lee, et al 1 :: MICRO :: 11 Nov 08

Page 2: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

Technology TrendsSimulation ChallengesSimulation Paradigm

Technology TrendsMoore’s Law and increasing transistor densities

Performance and power efficiency

Transition to multi-core and parallelism

Benjamin C. Lee, et al 2 :: MICRO :: 11 Nov 08

Page 3: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

Technology TrendsSimulation ChallengesSimulation Paradigm

Simulation Challenges

Cycle-Accurate SimulationAccurately identifies trends in design spaceTracks instructions’ progress through microprocessor

Simulation CostsCosts per simulation :: minutes, hours per designNumber of simulations :: scales exponentially (mp)

p,m :: parameter count, resolutionCosts per simulation :: scales superlinearity (nγ)

n cores, γ > 1

Benjamin C. Lee, et al 3 :: MICRO :: 11 Nov 08

Page 4: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

Technology TrendsSimulation ChallengesSimulation Paradigm

Statistical InferenceConstruct inferential models from samples [ASPLOS’06]

Use models as efficient surrogates for simulator

Benjamin C. Lee, et al 4 :: MICRO :: 11 Nov 08

Page 5: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

Technology TrendsSimulation ChallengesSimulation Paradigm

Multiprocessor Inference

Expensive CMP SimulationsPhysical resource contention increases host cyclesLogical resource contention increases simulated cyclesSynchronization increases cost per simulated cycle

Composable Performance RegressionLeverage core models to minimize CMP simulationsCore :: Uniprocessor performanceContention :: Shared resource contentionPenalty :: Performance penalty from contention

Benjamin C. Lee, et al 5 :: MICRO :: 11 Nov 08

Page 6: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

Regression TheoryModel EvaluationEvolutionary Design

Outline

MotivationTechnology TrendsSimulation ChallengesSimulation Paradigm

UniprocessorRegression TheoryModel EvaluationEvolutionary Design

MultiprocessorCPRModel EvaluationScalability

Benjamin C. Lee, et al 6 :: MICRO :: 11 Nov 08

Page 7: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

Regression TheoryModel EvaluationEvolutionary Design

Outline

MotivationTechnology TrendsSimulation ChallengesSimulation Paradigm

UniprocessorRegression TheoryModel EvaluationEvolutionary Design

MultiprocessorCPRModel EvaluationScalability

Benjamin C. Lee, et al 7 :: MICRO :: 11 Nov 08

Page 8: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

Regression TheoryModel EvaluationEvolutionary Design

Regression Theory

Statistical InferenceModels relationships between dataRequires initial data to train, formulate modelLeverages correlation from initial data for prediction

Regression ModelsLow training costs (sample 300 from 4.3B designs)Accurate inference (1.5% median error)Efficient computation (100’s of predictions per second)

Benjamin C. Lee, et al 8 :: MICRO :: 11 Nov 08

Page 9: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

Regression TheoryModel EvaluationEvolutionary Design

Formulationn simulated design samples, p design parameters

Response :: Y design metrics (e.g., performance)

Predictor :: X design parameters (e.g., ROB, cache)

Y =

y1...

yn

X =

x11 . . . x1p...

. . ....

xn1 . . . xnp

Coefficients :: β = [β0, . . . , βp]

T

Errors :: ε = [ε1, . . . , εn]T where εi ∼ N(0, σ2)

Y = Xβ + ε

F(Y) = G(X)βG + ε

Benjamin C. Lee, et al 9 :: MICRO :: 11 Nov 08

Page 10: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

Regression TheoryModel EvaluationEvolutionary Design

Prediction

Requirementsβ known from least squares model trainingX known for a given set of queries

Expected ResponseResponse as weighted sum of predictor valuesComputed efficiently as matrix-vector product

E[Y] = E[Xβ + ε]= E[Xβ] + E[ε]= Xβ

Benjamin C. Lee, et al 10 :: MICRO :: 11 Nov 08

Page 11: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

Regression TheoryModel EvaluationEvolutionary Design

Experimental Methodology

Intel Product SimulatorsModels consecutive generations of x86 µ-archSupports dual-, quad-core architectures.

Sampling Uniformly at Random (UAR)Parameter space includes predictors, ROB, caches15 parameters, 4.3B designsSample 300 designs for simulation

Statistical FrameworkR :: software environment for statistical computingHmisc and Design packages [Harrell]

Benjamin C. Lee, et al 11 :: MICRO :: 11 Nov 08

Page 12: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

Regression TheoryModel EvaluationEvolutionary Design

BenchmarksDigital Home1 audio audio conversion2 video video compression3 photo photoshop albumGames4 unreal Unreal Tournament5 halflife Half-Life, modified Quake engine6 homeworld Homeworld, three-dimensional movementMultimedia7 mentalray rendering, ray tracing8 painter raster graphics package9 tachyon ray tracingOffice10 outlook personal information manager11 access relational database management system12 excel spreadsheet applicationProductivity13 md2 OpenSSL cryptographic hash function14 encrypt file encryption15 flash multimedia playerServer16 specweb web server17 tpcc on-line transaction processing18 specjapp J2EE 1.3 application servers

Benjamin C. Lee, et al 12 :: MICRO :: 11 Nov 08

Page 13: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

Regression TheoryModel EvaluationEvolutionary Design

Uniprocessor Model AccuracyObtain 50 additional random samples for validation

Core :: 1.5% median error

Benjamin C. Lee, et al 13 :: MICRO :: 11 Nov 08

Page 14: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

Regression TheoryModel EvaluationEvolutionary Design

Evolutionary Design

Evolutionary ApproachOptimize ProcXDesign ProcY, enhancing ProcX with µ-arch featuresRe-construct models, accounting for µ-arch featuresOptimize ProcY

Case StudyConsecutive generations of x86 µ-archImprove FE (e.g., branch prediction)Improve MEM (e.g., prefetching)Improve OOO (e.g., memory disambiguation)

Benjamin C. Lee, et al 14 :: MICRO :: 11 Nov 08

Page 15: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

Regression TheoryModel EvaluationEvolutionary Design

Evolving CachesImprove MEM: similar performance with smaller caches

Benjamin C. Lee, et al 15 :: MICRO :: 11 Nov 08

Page 16: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

Regression TheoryModel EvaluationEvolutionary Design

Evolving ROBImprove FE: more instructions inflight, suggests larger ROB

Improve MEM: fewer cache misses, suggests smaller ROB

Benjamin C. Lee, et al 16 :: MICRO :: 11 Nov 08

Page 17: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

CPRModel EvaluationScalability

Outline

MotivationTechnology TrendsSimulation ChallengesSimulation Paradigm

UniprocessorRegression TheoryModel EvaluationEvolutionary Design

MultiprocessorCPRModel EvaluationScalability

Benjamin C. Lee, et al 17 :: MICRO :: 11 Nov 08

Page 18: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

CPRModel EvaluationScalability

Composable Performance Regression

CPR :: build separate core, contention, penalty models

Requires simulations Nuni > Ncon ≥ Npen

Suppose core sims require T1, multi-core sims require T1nγ

Benjamin C. Lee, et al 18 :: MICRO :: 11 Nov 08

Page 19: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

CPRModel EvaluationScalability

CPR: Core

Train with uniprocessor sims from full parameter space

Estimate per core delay from all design parameters

Benjamin C. Lee, et al 19 :: MICRO :: 11 Nov 08

Page 20: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

CPRModel EvaluationScalability

CPR: Contention

Train with CMP, cache-only sims from reduced subspace

Estimate cache hits/misses from shared cache parameters

Benjamin C. Lee, et al 20 :: MICRO :: 11 Nov 08

Page 21: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

CPRModel EvaluationScalability

CPR: Penalty

Train with composed predictions, few CMP sims from full space

Estimate CMP core delays from core, contention predictions

Benjamin C. Lee, et al 21 :: MICRO :: 11 Nov 08

Page 22: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

CPRModel EvaluationScalability

BenchmarksDual-Core BenchmarksSet .1 .21 painter homeworld2 access mentalray3 specjapp specweb4 homeworld tachyon5 dense flash

Quad-Core BenchmarksSet .1 .2 .3 .41 dense excel flash md22 video specjapp specweb tachyon3 excel homeworld audio unreal4 outlook encrypt halflife homeworld5 painter mentalray outlook encrypt

Benjamin C. Lee, et al 22 :: MICRO :: 11 Nov 08

Page 23: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

CPRModel EvaluationScalability

Multiprocessor Model AccuracyDual-core :: 6.6% median error

Quad-core :: 4.8% median error

Benjamin C. Lee, et al 23 :: MICRO :: 11 Nov 08

Page 24: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

MotivationUniprocessor

Multiprocessor

CPRModel EvaluationScalability

Scaling TrendsLower bound CPR costs 0.33x of naïve costs

Approach lower bound as uniprocessor models built

Benjamin C. Lee, et al 24 :: MICRO :: 11 Nov 08

Page 25: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

Conclusion

Inference in IndustryEffective inference for x86 µ-arch1.5% median errors relative to simulationEvolutionary design for new features across generations

Composable Performance RegressionLeverage core models to minimize CMP simulationsConstruct separate core, contention, penalty models4.8 to 6.6% median errors for dual-, quad-core0.33x training costs of prior approaches

Benjamin C. Lee, et al 25 :: MICRO :: 11 Nov 08

Page 26: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

Future Directions

Efficient Multiprogramming AnalysisEvaluate combinations without modeling every combination

Multi-Threaded WorkloadsExtend for homogeneous, heterogeneous threads.Account for synchronization events

Many-Core ArchitecturesConstruct models without many-core simulatorsConsider other shared resources (e.g., network)

Benjamin C. Lee, et al 26 :: MICRO :: 11 Nov 08

Page 27: CPR: Composable Performance Regression for Scalable ... · CPR: Composable Performance Regression for Scalable Multiprocessor Models Benjamin C. Lee Computer Architecture Group Microsoft

CPR: Composable Performance Regressionfor Scalable Multiprocessor Models

Benjamin C. LeeComputer Architecture Group

Microsoft Research

Jamison Collins, Hong WangMicroarchitecture Research Lab

Intel Corporation

David BrooksEngineering and Applied Sciences

Harvard University

International Symposium on Microarchitecture11 November 2008

Benjamin C. Lee, et al 27 :: MICRO :: 11 Nov 08