22
Parallel Applications Parallel Hardware Parallel Software IT industry (Silicon Valley) Users Parallel Layout Automatic Generation and Optimization 01.14.2011 Leo Meyerovich Adam Jiang Rastislav Bodik

Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

  • Upload
    others

  • View
    3

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

Parallel Applications

Parallel Hardware

Parallel Software IT industry

(Silicon Valley) Users

Parallel Layout Automatic Generation and Optimization

01.14.2011

Leo Meyerovich

Adam Jiang

Rastislav Bodik

Page 2: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

Why Generate a Layout Engine

Many and Growing Layout Languages

HTML, CSS, SMIL, XUL, Ext, jQuery, YUI OpenOffice, JavaFX, Swing, Flex,

Adam & Eve, Thermo, XAML/WPF, Word, WinForms, Qt, LaTeX, music,

iPhone, Android, WAP, MathML, 3 competing CSS grid proposals

A lot of code

Firefox layout engine: 346111 lines

Page 3: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

layout engine

renderer

scene graph

parser

selector matcher

cascade

HTML CSS

tree decorated with style constraints

tree traversals

fast tree library

CSS grammar specification

generator

Compile Time

Approach

sequential layout engine

Page 4: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

HBox: Example of Specifying Layout

4

HBox with two child nodes

wInput = 50px

10px 5px

wInput = ShrinkToFit

10px 5px

w := case wInput: n px: n shrinkToFit: sum (children.w)

Page 5: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

All Absolute Coordinates

Computed

HBox Traversal Functions

5

def pass0(): child1.y = y child2.y = y def pass1(): cursor = child1.w w = if (wInput is shrink): child1.w + child2.w else: wInput h = max (child1.h, child2.h) def pass2(): child1.x = x; child2.x = x + cursor;

Root

HBox

Leaf Leaf

pass0 pass1 pass2

Page 6: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

All Absolute Coordinates

Computed

HBox Traversal Functions

6

def pass0(): child1.y = y child2.y = y def pass1(): cursor = child1.w w = if (wInput is shrink): child1.w + child2.w else: wInput h = max (child1.h, child2.h) def pass2(): child1.x = x; child2.x = x + cursor; Leaf Leaf

Root

HBox

Leaf HBox

Page 7: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

Scheduling Traversals

7

See Demo @

Poster Session

Page 8: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

Leveraging Generation: % Width

8

w := case wInput: n px: n n %: parentWidth * n% shrinkToFit: sum(children.w)

10px 90% of parentWidth

Page 9: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

9

def pass0(): … def pass1(): cursor = child1.w w = calculateWidth(wInput, child1.w, child2.w) h = max (child1.h, child2.h)

def pass2(): child1.x = x child2.x = x + cursor

def pass0(): …

def pass1():

cwpx =sumPx(children)

cwperc = sumPercs(children)

h = max (child1.h, child2.h)

def pass2():

w = calculateWidth(wInput,

cwpx, cwperc, parentWidth)

child.parentWidth = w

def pass3():

cursor = child1.w

def pass4():

child1.x = x

child2.x = x + cursor

Page 10: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

Other Advantages

10

• Correctness Wins • Finds spec inconsistencies • Can visually debug spec

• Performance Wins • Optimal scheduling • Extract parallelism

Leaf Leaf

Root

HBox

HBox HBox

Leaf Leaf

h h h h

h h

h

Page 11: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

layout engine

renderer

scene graph

parser

selector matcher

cascade

HTML CSS

tree decorated with style constraints

tree traversals

fast tree library

CSS grammar specification

ALE synthesizer

Compile Time

Fast Tree Library

Page 12: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

Overview of Tree Eval Strategies

Sequential

Multicore

core 1 core 2

SIMD (“SIMTask”)

Page 13: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

2. Optimizing memory

1

1.05

1.1

1.15

1.2

1.25

1.3

1.35

1.4

50 150 250 350

spe

ed

up

nodes per block

13

Page 14: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

2. Optimizing memory

1

1.05

1.1

1.15

1.2

1.25

1.3

1.35

1.4

50 150 250 350

spe

ed

up

nodes per block

dfs

bfs

Order within block: bfs, dfs

Traversal order

14

Page 15: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

2. Optimizing memory

1

1.05

1.1

1.15

1.2

1.25

1.3

1.35

1.4

50 150 250 350

spe

ed

up

nodes per block

dfs, rel pointers

dfs

bfs

Order within block: bfs, dfs

Pointer representation: leftChild = 0x00ffaa00,

leftchild = 1200

Pointer compression How much compression hardware

15

Page 16: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

2. Optimizing memory

1

1.05

1.1

1.15

1.2

1.25

1.3

1.35

1.4

50 150 250 350

spe

ed

up

nodes per block

bfs, rel pointers

dfs, rel pointers

dfs

bfs

Order within block: bfs, dfs

Pointer representation: leftChild = 0x00ffaa00,

leftchild = 1200

And More packing, coallocation / lazy defaults, structure splitting / phasing …

16

Page 17: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

Challenge Problem for Task Parallelism?

0.5

0.6

0.7

0.8

0.9

1

1 2 3 4

Sp

eed

up

threads

TBB tree traversal on dual-core Atom 330

base.h tbbcont.h tbbgraph.h tbb.h tbbopt.h

Different TBB algorithms

• Seconds, milliseconds instead of hours, minutes

• Sequence of traversals (locality implications) Dynamic task allocation?

Runtime queues?

Locality across traversals?

Page 18: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

Semi-Static Work Stealing

1. Before parallel traversal: approximate work stealing

schedule 2. Traversal: reuse schedule

tuned locking scheme

Locality across

passes!

Page 19: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

0

2

4

6

8

1 2 3 4 5 6 7 8

sp

eed

up

pthreads

Opteron Speedup (2 sockets x 4 cores); 1,000 nodes

0i

0s

1i

1s

SUM

0

2

4

6

8

1 2 3 4 5 6 7 8

sp

ee

du

p

pthreads

0i

0s

1i

1s

SUM

0

2

4

6

8

1 2 3 4 5 6 7 8

sp

ee

du

p

pthreads

0i

0s

1i

1s

SUM

strong scaling: small workload (1ms

each)

10,000 nodes 1,000 nodes; repeat each 10x

Page 20: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

SIMD Task Evaluation (MSR)

pointwise parallel

instructions

over similar tasks

Irregularity in the task tree

structure mining

Microbenchmarks: 2-7x speedup

see poster for challenges and opportunities

15-20

10-15

5-10

0-5

Page 21: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

Demo

Parallel layout

Page 22: Parallel Hardware Applications Parallel IT industry ...parlab.eecs.berkeley.edu/sites/all/parlab/files/browsertalkjan2011.pdfWhy Generate a Layout Engine Many and Growing Layout Languages

layout engine

scene graph

renderer

parser

multicore selector matcher

multicore cascade

HTML CSS

tree style

template

tree decorated with style constraints OpenGL Qt Renderer

tree traversals

Fast Tree Library

grammar specification

ALE synthesizer

Compile Time

Status and Future Work

MUD language

widget definition

incrementalizer

multicore parser