38
Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff Bilmes (Univ. of Washington, EE) [email protected] December 10, 2000 Workshop on Feedback-Directed Dynamic Optimization

Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Embed Size (px)

Citation preview

Page 1: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Statistical Modeling of Feedback Datain an Automatic Tuning System

Richard Vuduc, James Demmel (U.C. Berkeley, EECS){richie,demmel}@cs.berkeley.edu

Jeff Bilmes (Univ. of Washington, EE)[email protected]

December 10, 2000Workshop on Feedback-Directed Dynamic Optimization

Page 2: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Context: High Performance Libraries

Libraries can isolate performance issues– BLAS/LAPACK/ScaLAPACK (linear algebra)– VSIPL (signal and image processing)– MPI (distributed parallel communications)

Can we implement libraries …– automatically and portably– incorporating machine-dependent features– matching performance of hand-tuned

implementations– leveraging compiler technology– using domain-specific knowledge

Page 3: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Generate and Search:An Automatic Tuning Methodology

Given a library routine Write parameterized code generators

– parameters• machine (e.g., registers, cache, pipeline)• input (e.g., problem size)• problem-specific transformations

– output high-level source (e.g., C code) Search parameter spaces

– generate an implementation– compile using native compiler– measure performance (“feedback”)

Page 4: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Tuning System Examples

Linear algebra– PHiPAC (Bilmes, et al., 1997)– ATLAS (Whaley and Dongarra, 1998)– Sparsity (Im and Yelick, 1999)

Signal Processing– FFTW (Frigo and Johnson, 1998)– SPIRAL (Moura, et al., 2000)

Parallel Communcations– Automatically tuned MPI collective operations

(Vadhiyar, et al. 2000)

Related: Iterative compilation (Bodin, et al., 1998)

Page 5: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Road Map

Context The Search Problem Problem 1: Stopping searches early Problem 2: High-level run-time selection Summary

Page 6: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

The Search Problem in PHiPAC

PHiPAC (Bilmes, et al., 1997)– produces dense matrix multiply (matmul)

implementations– generator parameters include

• size and depth of fully unrolled “core” matmul• rectangular, multi-level cache tile sizes• 6 flavors of software pipelining• scaling constants, transpose options, precisions, etc.

An experiment– fix software pipelining method– vary register tile sizes– 500 to 2500 “reasonable” implementations on 6

platforms

Page 7: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

A Needle in a Haystack, Part I

Page 8: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Road Map

Context The Search Problem Problem 1: Stopping searches early Problem 2: High-level run-time selection Summary

Page 9: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Problem 1: Stopping Searches Early

Assume– dedicated resources limited– near-optimal implementation okay

Recall the search procedure– generate implementations at random– measure performance

Can we stop the search early?– how early is “early?”– guarantees on quality?

Page 10: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

An Early Stopping Criterion

Performance scaled from 0 (worst) to 1 (best) Goal: Stop after t implementations when

Prob[ Mt <= 1- ] < – Mt max observed performance at t

– proximity to best– error tolerance– example: “find within top 5% with error 10%”

• = .05, = .1

Can show probability depends only on F(x) = Prob[ performance <= x ]

Idea: Estimate F(x) using observed samples

Page 11: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Stopping time (300 MHz Pentium-II)

Page 12: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Stopping Time (Cray T3E Node)

Page 13: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Road Map

Context The Search Problem Problem 1: Stopping searches early Problem 2: High-level run-time selection Summary

Page 14: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Problem 2: Run-Time Selection

Assume– one implementation is

not best for all inputs– a few, good

implementations known– can benchmark

How do we choose the “best” implementationat run-time?

Example: matrix multiply, tuned for small (L1), medium (L2), and large workloads

CAM

K

BK

N

C = C + A*B

Page 15: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Truth Map (Sun Ultra-I/170)

Page 16: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

A Formal Framework

Given– m implementations– n sample inputs

(training set)– execution time

Find– decision function f(s)

– returns “best”implementationon input s

– f(s) cheap to evaluate

SsAaxaT

SsssS

aaaA

n

m

,:),(

},,,{

},,,{

210

21

ASf :

Page 17: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Solution Techniques (Overview)

Method 1: Cost Minimization– minimize overall execution time on samples

(boundary modeling)• pro: intuitive, f(s) cheap• con: ad hoc, geometric assumptions

Method 2: Regression (Brewer, 1995)– model run-time of each implementation

e.g., Ta(N) = b3N 3 + b2N 2 + b1N + b0

• pro: simple, standard• con: user must define model

Method 3: Support Vector Machines– statistical classification

• pro: solid theory, many successful applications• con: heavy training and prediction machinery

Page 18: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Results 1: Cost Minimization

Page 19: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Results 2: Regression

Page 20: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Results 3: Classification

Page 21: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Quantitative Comparison

Method Misclass.Average

errorBest5%

Worst20%

Worst50%

Regression 34.5% 2.6% 90.7% 1.2% 0.4%

Cost-Min 31.6% 2.2% 94.5% 2.8% 1.2%

SVM 12.0% 1.5% 99.0% 0.4% ~0.0%

Note:Cost of regression and cost-min prediction ~O(3x3 matmul)Cost of SVM prediction ~O(32x32 matmul)

Page 22: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Road Map

Context The Search Problem Problem 1: Stopping searches early Problem 2: High-level run-time selection Summary

Page 23: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Conclusions

Search beneficial Early stopping

– simple (random + a little bit)– informative criteria

High-level run-time selection– formal framework– error metrics

To do– other stopping models (cost-based)– large design space for run-time selection

Page 24: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Extra Slides

More detail (time and/or questions permitting)

Page 25: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

PHiPAC Performance (Pentium-II)

Page 26: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

PHiPAC Performance (Ultra-I/170)

Page 27: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

PHiPAC Performance (IBM RS/6000)

Page 28: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

PHiPAC Performance (MIPS R10K)

Page 29: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Needle in a Haystack, Part II

Page 30: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Performance Distribution (IBM RS/6000)

Page 31: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Performance Distribution (Pentium II)

Page 32: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Performance Distribution (Cray T3E Node)

Page 33: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Performance Distribution (Sun Ultra-I)

Page 34: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Cost Minimization

Decision function

)(maxarg)( swsfa

Aa

Aa Ss

aa saTswCam

0

1),()(),,(

Z

esw

aT

a

a

s 0,

)(

Minimize overall execution time on samples

Softmax weight (boundary) functions

Page 35: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Regression

Decision function

)(minarg)( sTsf aAa

012

23

3)( NNNsTa

Model implementation running time (e.g., square matmul of dimension N)

For general matmul with operand sizes (M, K, N), we generalize the above to include all product terms– MKN, MK, KN, MN, M, K, N

Page 36: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Support Vector Machines

Decision function

0

1,1

),()(

Ss

y

ssKybsL

i

i

iiii

Binary classifier

)(maxarg)( sLsf aAa

Page 37: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Proximity to Best (300 MHz Pentium-II)

Page 38: Statistical Modeling of Feedback Data in an Automatic Tuning System Richard Vuduc, James Demmel (U.C. Berkeley, EECS) {richie,demmel}@cs.berkeley.edu Jeff

Proximity to Best (Cray T3E Node)