Optimal Superblock Scheduling Using Enumeration


Page 1: Optimal Superblock Scheduling Using Enumeration

Optimal Superblock Scheduling Using Enumeration

Ghassan Shobaki, CS Dept.
Kent Wilken, ECE Dept.
University of California, Davis
www.ece.ucdavis.edu/aco/

Page 2: Optimal Superblock Scheduling Using Enumeration

Outline

- Background
- Existing Solutions
- Optimal Solution
- Experimental Results
- Summary and Future Work

Page 3: Optimal Superblock Scheduling Using Enumeration

Overview

- “Instruction Scheduling is the most fundamental ILP-oriented phase”. [Josh Fisher et al., “Embedded Computing”]
- Scheduler tries to find an instruction order that minimizes pipeline stalls
- Schedule must preserve program’s semantics and honor hardware constraints

Page 4: Optimal Superblock Scheduling Using Enumeration

Elements of Instruction Scheduling

- Region Formation
- Schedule Construction (the focus of our research)

Page 5: Optimal Superblock Scheduling Using Enumeration

Region Formation

- Scheduler’s scope is a sub-graph of the program’s control flow graph (CFG)
- Local scheduling: single basic block
- Global scheduling: multiple basic blocks:
  - Trace
  - Superblock and hyperblock
  - Treegion
  - General acyclic: e.g. Wavefront (2000)

Page 6: Optimal Superblock Scheduling Using Enumeration

Schedule Construction

- NP-Hard problem for realistic machines
- Heuristic Solutions: Virtually all production compilers and most research
- Optimal Approaches: Recent research
  - Local: Integer Programming and enumeration
  - Global: Integer Programming

Page 7: Optimal Superblock Scheduling Using Enumeration

The Superblock

- Single-entry multiple-exit sequence of basic blocks
- Data and control dependencies and allowed code motions are represented by a Directed Acyclic Graph (DAG)

Page 8: Optimal Superblock Scheduling Using Enumeration

Example Superblock DAG

[Figure: example superblock and its dependence DAG over instructions A to I, with latency-labeled edges and exit probabilities 0.3, 0.2, and 0.5.]

Page 9: Optimal Superblock Scheduling Using Enumeration

List Scheduling

- Most common method in practice
- Approximate greedy algorithm that runs fast in practice
- Data-ready instructions stored in a priority list
- Priorities assigned according to heuristics
- If ready list is not empty, schedule top priority instruction
- Else schedule a stall
- Advance to next issue slot
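
The loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single-issue machine, a DAG given as successor lists with edge latencies, and priorities supplied by the caller. The name `list_schedule` and the tiny DAG in the usage lines are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def list_schedule(
    succs: Dict[str, List[Tuple[str, int]]],
    priority: Dict[str, int],
) -> List[Optional[str]]:
    """Single-issue list scheduling; entry c is the instruction issued
    in cycle c, or None for a stall. Assumes the graph is acyclic."""
    nodes = list(priority)
    # Number of unscheduled predecessors per instruction.
    pending = {n: 0 for n in nodes}
    for n in nodes:
        for s, _ in succs.get(n, []):
            pending[s] += 1
    # Earliest cycle at which each instruction's inputs are available.
    ready_at = {n: 0 for n in nodes}
    schedule: List[Optional[str]] = []
    done = set()
    cycle = 0
    while len(done) < len(nodes):
        ready = [
            n for n in nodes
            if n not in done and pending[n] == 0 and ready_at[n] <= cycle
        ]
        if ready:
            # Schedule the top-priority data-ready instruction.
            best = max(ready, key=lambda n: priority[n])
            schedule.append(best)
            done.add(best)
            for s, lat in succs.get(best, []):
                pending[s] -= 1
                ready_at[s] = max(ready_at[s], cycle + lat)
        else:
            # Nothing is ready: schedule a stall.
            schedule.append(None)
        cycle += 1  # advance to the next issue slot
    return schedule


# Hypothetical two-instruction DAG: b depends on a with latency 3.
print(list_schedule({"a": [("b", 3)], "b": []}, {"a": 3, "b": 0}))
# ['a', None, None, 'b']
```

The priority map is where a heuristic such as critical path plugs in; the loop itself never revisits a decision, which is why the result is only approximate.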

Page 10: Optimal Superblock Scheduling Using Enumeration

Critical-Path Heuristic

[Figure: the example superblock DAG with each node annotated with its critical-path priority.]

Cycle  Instruction
0      A
1      B
2      G
3      C
4      D
5      H
6      E
7      F
8      I

Page 11: Optimal Superblock Scheduling Using Enumeration

Superblock Heuristics

- Critical Path
- Successive Retirement
- Dependence height and speculative yield (DHASY)
- G*
- Speculative Hedge
- Balance Scheduling

Page 12: Optimal Superblock Scheduling Using Enumeration

Optimal Scheduling

- Can improve on heuristic schedules
- Accurate heuristic methods are already complex
- In some applications, longer compile times can be tolerated
- Reference for evaluating accuracy of heuristics and studying ILP limits

Page 13: Optimal Superblock Scheduling Using Enumeration

Objective

S   : A given schedule
P_i : Probability of exit i
D_i : Delay of exit i from its lower bound L_i
E   : # of side exits

Find a schedule with minimum cost

$$\mathrm{Cost}(S) = \sum_{i=1}^{E+1} P_i D_i$$

Page 14: Optimal Superblock Scheduling Using Enumeration

Cost Function Example: CP

[Figure: the example superblock DAG with each instruction annotated with its [lower bound, upper bound] scheduling range. Ranges shown: [0,0], [6,7], [1,2], [2,3], [3,4], [3,6], [1,4], [2,5], [8,8].]

Cycle  Instruction
0      A
1      B
2      G
3      C
4      D
5      H
6      E
7      F
8      I

Cost = 0.3*1 + 0.2*1 + 0.5*0 = 0.5
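
The arithmetic above can be checked with a short sketch of the cost function, summing P_i * D_i over the side exits and the final exit. The function name `schedule_cost` is hypothetical. The exit lower bounds (C: 2, F: 6, I: 8) and the pairing of probabilities with exits (C: 0.3, F: 0.2, I: 0.5) are read from the branch-combination table and ranges elsewhere in the deck; exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction


def schedule_cost(issue_cycle, exits):
    """Weighted sum of exit delays: sum of P_i * (cycle_i - L_i)."""
    return sum(p * (issue_cycle[name] - lb) for name, (p, lb) in exits.items())


# exit -> (probability P_i, lower bound L_i)
exits = {
    "C": (Fraction(3, 10), 2),
    "F": (Fraction(2, 10), 6),
    "I": (Fraction(5, 10), 8),
}

# Critical-path schedule: list position is the issue cycle.
cp_order = ["A", "B", "G", "C", "D", "H", "E", "F", "I"]
cp_cycles = {inst: cycle for cycle, inst in enumerate(cp_order)}
print(float(schedule_cost(cp_cycles, exits)))  # 0.5

# Optimal schedule reported at the end of the worked example.
opt_order = ["A", "B", "C", "G", "D", "H", "E", "F", "I"]
opt_cycles = {inst: cycle for cycle, inst in enumerate(opt_order)}
print(float(schedule_cost(opt_cycles, exits)))  # 0.2
```

In the critical-path schedule both side exits are one cycle late; the optimal schedule issues C at its lower bound and only delays F.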

Page 15: Optimal Superblock Scheduling Using Enumeration

Optimal Algorithm

[Flowchart: Heuristic Solution -> Lower Bounds -> Cost = 0? YES: Done. NO: Fix Branches -> Enumerate -> Feasible? YES: Done. NO: back to Fix Branches.]

Page 16: Optimal Superblock Scheduling Using Enumeration

Enumeration

- List scheduling with backtracking
- Explores one target length at a time
- A subset of instructions can be fixed
- Branch-and-Bound approach with four feasibility tests (pruning techniques)
  - Node superiority
  - LB tightening
  - History-based domination
  - Relaxed Scheduling
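
A minimal sketch of list scheduling with backtracking for one target length is shown below. It assumes a single-issue machine and a DAG given as predecessor lists with latencies. The only pruning is a simple slot-count check, which stands in for the four feasibility tests named above; those are not reproduced here. The name `enumerate_schedule` and the example DAG are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def enumerate_schedule(
    preds: Dict[str, List[Tuple[str, int]]],
    target_length: int,
) -> Optional[List[Optional[str]]]:
    """Return a single-issue schedule using at most target_length cycles,
    or None if no such schedule exists. None entries are stalls."""
    nodes = list(preds)
    issue: Dict[str, int] = {}        # instruction -> issue cycle
    slots: List[Optional[str]] = []   # slots[c] = instruction or stall

    def is_ready(n: str, cycle: int) -> bool:
        return all(p in issue and issue[p] + lat <= cycle for p, lat in preds[n])

    def search(cycle: int) -> bool:
        remaining = len(nodes) - len(issue)
        if remaining == 0:
            return True
        # Prune: too few slots left for the unscheduled instructions.
        if target_length - cycle < remaining:
            return False
        candidates = [n for n in nodes if n not in issue and is_ready(n, cycle)]
        for choice in candidates + [None]:  # None means "insert a stall"
            slots.append(choice)
            if choice is not None:
                issue[choice] = cycle
            if search(cycle + 1):
                return True
            # Backtrack: undo this slot and try the next alternative.
            slots.pop()
            if choice is not None:
                del issue[choice]
        return False

    return list(slots) if search(0) else None


# Hypothetical DAG: c depends on b with latency 3; a is independent.
dag = {"a": [], "b": [], "c": [("b", 3)]}
print(enumerate_schedule(dag, 4))  # ['b', 'a', None, 'c']
print(enumerate_schedule(dag, 3))  # None
```

With target length 4 the search first tries `a` in cycle 0, finds that `c` can no longer finish in time, and backtracks to start with `b`; stronger feasibility tests would cut off such dead ends earlier.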

Page 17: Optimal Superblock Scheduling Using Enumeration

Enumeration Example

[Figure: a five-instruction DAG (I1 to I5) with latency-2 edges and its enumeration tree for target length = 4; a branch containing a stall is found infeasible and the search backtracks.]

Page 18: Optimal Superblock Scheduling Using Enumeration

Branch Combinations & Subset Sum

- Branch Combination Problem is NP-Complete!
- Can be reduced to Subset Sum
- In practice, the number of branches is small and their ranges are narrow.
- Solved efficiently using Dynamic Programming
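
For reference, the classic dynamic program for Subset Sum is sketched below. This is the textbook algorithm, not the authors' formulation of the branch combination problem, which the slides do not spell out; the name `subset_sum` is hypothetical. The table has at most target + 1 entries, so the running time is proportional to the number of values times the target, which is why small inputs are cheap.

```python
def subset_sum(values, target):
    """Return a sublist of non-negative ints summing to target, or None."""
    # reachable[s] holds the indices of one subset whose values sum to s.
    reachable = {0: []}
    for i, v in enumerate(values):
        # Iterate over a snapshot so each value is used at most once.
        for s, chosen in list(reachable.items()):
            t = s + v
            if t <= target and t not in reachable:
                reachable[t] = chosen + [i]
    if target not in reachable:
        return None
    return [values[i] for i in reachable[target]]


print(subset_sum([3, 34, 4, 12, 5, 2], 9))  # [4, 5]
print(subset_sum([3, 34, 4], 6))            # None
```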

Page 19: Optimal Superblock Scheduling Using Enumeration

Complete Example

[Figure: the example superblock DAG with scheduling ranges. Ranges shown: [0,0], [6,7], [1,2], [2,3], [3,4], [3,6], [1,4], [2,5], [8,8].]

- Start with CP heuristic: Cost = 0.5
- Only length 8 is interesting

BranchComb  C  F  Cost
(0, 0)      2  6  0.0
(0, 1)      2  7  0.2
(1, 0)      3  6  0.3
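
The branch-combination table can be reproduced with a short sketch that lists every combination of side-exit delays costing less than the heuristic schedule, cheapest first. It uses brute-force enumeration via `itertools.product` rather than dynamic programming, which is adequate for two exits with two-cycle ranges. The function name `branch_combinations` is hypothetical; the probabilities and ranges for exits C and F are taken from the example.

```python
from fractions import Fraction
from itertools import product


def branch_combinations(exits, upper_cost):
    """Return (delays, cycles, cost) for every delay combination whose
    cost is strictly below upper_cost, sorted by increasing cost."""
    names = list(exits)
    # Each exit may be delayed by 0 .. (upper bound - lower bound) cycles.
    ranges = [range(exits[n][2] - exits[n][1] + 1) for n in names]
    combos = []
    for delays in product(*ranges):
        cost = sum(exits[n][0] * d for n, d in zip(names, delays))
        if cost < upper_cost:
            cycles = tuple(exits[n][1] + d for n, d in zip(names, delays))
            combos.append((delays, cycles, cost))
    return sorted(combos, key=lambda c: c[2])


# side exit -> (probability, lower bound, upper bound)
side_exits = {"C": (Fraction(3, 10), 2, 3), "F": (Fraction(2, 10), 6, 7)}

# Only combinations cheaper than the CP heuristic's cost of 0.5 matter.
for delays, cycles, cost in branch_combinations(side_exits, Fraction(1, 2)):
    print(delays, cycles, float(cost))
# (0, 0) (2, 6) 0.0
# (0, 1) (2, 7) 0.2
# (1, 0) (3, 6) 0.3
```

Combination (1, 1) costs 0.5, no better than the heuristic schedule already in hand, so it is dropped. Trying combinations in increasing cost order means the first feasible one is optimal.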

Page 20: Optimal Superblock Scheduling Using Enumeration

Branch Combination (0,0), Cost = 0.0

[Figure: the example DAG with ranges tightened for this combination: [0,0], [6,6], [1,1], [2,2], [3,3], [3,5], [1,4], [2,5], [8,8]. The relaxed scheduling test, marked at instruction H, reports Infeasible.]

0 : A
1 : B
2 : C
3 : D
4 : G
5 : E

Page 21: Optimal Superblock Scheduling Using Enumeration

Branch Combination (0,1), Cost = 0.2

[Figure: the example DAG with ranges for this combination: [0,0], [7,7], [1,1], [2,2], [3,4], [3,6], [1,4], [2,5], [8,8], alongside the enumeration tree explored for it.]

Optimal Schedule: A, B, C, G, D, H, E, F, I with cost 0.2

Branch Combination (0,1)Branch Combination (0,1)Cost = 0.2Cost = 0.2

Page 22: Optimal Superblock Scheduling Using Enumeration

Experimental Results

- Superblocks imported from GCC using SPEC CPU2000, FP and INT
- Scheduled for 4 machine models:
  - single-issue
  - dual-issue
  - quad-issue
  - six-issue
- Time limit set to 1 second per problem

Page 23: Optimal Superblock Scheduling Using Enumeration

Superblock Statistics

                             FP2000        INT2000
                             Max    Avg    Max    Avg
DAG Size                     1236   24     454    17
Exit Count                   31     2.8    42     3.3
Final-Exit Probability (%)   99     68     99     66
Side-Exit Probability (%)    48     17     49     14

Page 24: Optimal Superblock Scheduling Using Enumeration

INT2000 Results

Issue Rate            1      2      4      6      Avg
Hard Blocks           2513   2131   1685   573    1726
% Timeouts            1.4    0.8    1.1    0.9    1.1
Avg Soln Time (ms)    5      5      9      9      7
% Improved Blocks     85     70     82     81     79
% Cycle Improvement   2.9    2.4    3.5    4.1    3

Page 25: Optimal Superblock Scheduling Using Enumeration

Summary & Future Work

- An optimal superblock scheduling technique has been developed
- About 99% of hard problems solved within 1 sec
- 80% improved
- Next Goal: explore other global regions. Trace is strongest candidate