27
Parallel Radiation Transport Algorithms and Associated Architectural Requirements Jim Morel, Randal Baker, and James Warsa Computer and Computational Sciences Division Los Alamos National Laboratory [email protected], [email protected], [email protected] Presentation at the Conference on High Speed Computing, Gleneden Beach, OR, April 19-22, 2004 Slide 1/27

Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Parallel Radiation Transport Algorithms

and

Associated Architectural Requirements

Jim Morel, Randal Baker, and James Warsa

Computer and Computational Sciences Division

Los Alamos National Laboratory

[email protected], [email protected], [email protected]

Presentation at the Conference on High Speed Computing, Gleneden Beach, OR, April 19-22, 2004 Slide 1/27

Page 2: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Ove

rvie

w

Intr

od

uct

ion

Th

eTr

ansp

ort

Eq

uat

ion

So

urc

eIt

erat

ion

Co

nver

gen

ceo

fS

ou

rce

Iter

atio

n

Co

nver

gen

ceA

ccel

erat

ion

Kry

lov

Met

ho

ds

and

Pre

con

dit

ion

ers

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e2/

27

Page 3: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Ove

rvie

w

Rec

tan

gu

lar-

Mes

hP

aral

lelS

wee

pA

lgo

rith

ms

Rec

tan

gu

lar-

Mes

hS

wee

pL

ayo

uts

Par

alle

lSw

eep

so

nU

nst

ruct

ure

dM

esh

es

Par

alle

lSw

eep

so

nS

tru

ctu

red

AM

RM

esh

es

Do

mai

nD

eco

mp

osi

tio

nS

olu

tio

nA

lgo

rith

ms

Co

mp

ute

rA

rch

itec

ture

Req

uir

emen

ts

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e3/

27

Page 4: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Intr

od

uct

ion

•Tr

ansp

ortc

alcu

latio

nsar

eof

fund

amen

tali

mpo

rtan

ceto

AS

CI.

•Tr

ansp

ortc

alcu

latio

nsre

quire

far

mor

em

emor

yan

dC

PU

-tim

eth

anan

y

othe

rty

peof

phys

ics

calc

ulat

ion

inan

AS

CIc

ode.

•T

here

are

thre

ere

leva

ntty

pes

oftr

ansp

ortc

alcu

latio

ns:

neut

ron,

ther

mal

radi

atio

n,an

dch

arge

d-pa

rtic

le.

•T

henu

mer

ical

met

hods

for

neut

ron

tran

spor

tcal

cula

tions

are

very

mat

ure

and

gene

rally

adeq

uate

.

•T

henu

mer

ical

met

hods

for

char

ged-

part

icle

tran

spor

tand

ther

mal

radi

atio

ntr

ansp

orta

refa

rle

ssm

atur

ean

dth

eir

adeq

uacy

for

mul

tidim

ensi

onal

calc

ulat

ions

isno

twel

lest

ablis

hed.

•Fr

oma

num

eric

alm

etho

dsst

andp

oint

,the

rmal

radi

atio

ntr

ansp

orti

s

sign

ifica

ntly

mor

ede

man

ding

than

char

ged-

part

icle

tran

spor

t,an

d

char

ged-

part

icle

tran

spor

tis

sign

ifica

ntly

mor

ede

man

ding

than

neut

ron

tran

spor

t.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e4/

27

Page 5: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Intr

od

uct

ion

•F

orth

em

ostp

art,

the

sam

eba

sic

para

llels

olut

ion

tech

niqu

esca

nbe

used

for

allt

hree

type

sof

calc

ulat

ions

.

•P

aral

lels

olut

ion

tech

niqu

esar

equ

item

atur

efo

rre

ctan

gula

rsp

atia

l

mes

hes,

butt

hey

rem

ain

are

sear

chto

pic

for

stru

ctur

edA

MR

mes

hes

and

fully

unst

ruct

ured

mes

hes.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e5/

27

Page 6: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Th

eTr

ansp

ort

Eq

uat

ion

•T

hefo

llow

ing

neut

ron

tran

spor

tequ

atio

nis

repr

esen

tativ

eof

allt

hree

type

sof

tran

spor

tequ

atio

ns:

1 v

∂ψ ∂t

+−→ Ω

·−→ ∇ψ

tψ=

∫ ∞ 0

∫ 4π

σs(E

′ →E

,−→ Ω′ ·−→ Ω

)ψ(−→ Ω

′ ,E′ )

dΩ′ d

E′ +

Q.

•W

eus

eop

erat

orno

tatio

nto

expr

ess

this

equa

tion

asfo

llow

s:

=S

ψ+

Q.

•T

hepr

imar

yun

know

nis

the

angu

lar

flux,

ψ,w

hich

isse

ven-

dim

ensi

onal

:

ψ=

ψ(t

,−→ r,−→ Ω

,E) .

•t

istim

e,−→ r

isth

epa

rtic

lepo

sitio

n,−→ Ω

isth

epa

rtic

ledi

rect

ion,

and

Eis

the

part

icle

ener

gy.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e6/

27

Page 7: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Th

eTr

ansp

ort

Eq

uat

ion

•T

his

equa

tion

isju

sta

stat

emen

tofp

artic

leco

nser

vatio

nin

adi

ffere

ntia

l

phas

e-sp

ace

volu

me,

dP

=dV

dE

.

•T

hedi

ffere

ntia

lpha

sesp

ace

volu

me

cons

ists

ofth

edi

ffere

ntia

lspa

tial

volu

me

abou

t−→ r

,the

diffe

rent

ials

olid

angl

eab

out−→ Ω

,and

the

diffe

rent

iale

nerg

yin

terv

alab

outE

.

•T

heeq

uatio

nst

ates

that

the

rate

ofch

ange

ofth

enu

mbe

rof

part

icle

sin

the

diffe

rent

ialv

olum

edP

iseq

ualt

oth

epa

rtic

leso

urce

rate

sm

inus

the

part

icle

sink

rate

s.

•S

inks

fordP

incl

ude

•st

ream

ing

outo

fthe

diffe

rent

ials

patia

lvol

ume

abou

t−→ r

.

•be

ing

eith

erab

sorb

edor

scat

tere

dou

toft

hedi

ffere

ntia

lsol

idan

gle

abou

t−→ Ωan

dth

edi

ffere

ntia

lene

rgy

inte

rval

abou

tthe

ener

gyE

.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e7/

27

Page 8: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Th

eTr

ansp

ort

Eq

uat

ion

•S

ourc

esin

clud

e

•st

ream

ing

into

the

diffe

rent

ials

patia

lvol

ume

abou

t−→ r

.

•be

ing

scat

tere

din

toth

edi

ffere

ntia

lsol

idan

gle

abou

t−→ Ω

and

the

diffe

rent

iale

nerg

yin

terv

alab

outt

heen

ergy

E.

•B

eing

expl

icitl

ycr

eate

dw

ithin

dP

.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e8/

27

Page 9: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

So

urc

eIt

erat

ion

•T

hest

anda

rdte

chni

que

for

solv

ing

the

tran

spor

tequ

atio

nis

the

sour

ce

itera

tion

tech

niqu

e.

•S

ince

the

sour

cete

rmco

ntai

nsal

lthe

angl

e-en

ergy

coup

ling,

itis

itera

tivel

yla

gged

:

ψ�+

1=

L−1

�+

L−1

Q.

•O

nre

ctan

gula

rm

eshe

s,th

eop

erat

orL

repr

esen

tsa

bloc

k

low

er-t

riang

ular

mat

rixw

ithea

chbl

ock

corr

espo

ndin

gto

the

unkn

owns

in

asi

ngle

cell.

•S

uch

mat

rices

are

very

easy

toin

vert

ona

scal

arm

achi

ne.

•In

part

icul

ar,o

nese

quen

tially

solv

esfo

rth

eun

know

nsin

each

mes

hce

ll,

mov

ing

acro

ssth

em

esh

inth

edi

rect

ion

ofpa

rtic

leflo

w.

•B

ecau

seof

this

mov

emen

t,th

ein

vers

ion

proc

ess

isca

lled

asw

eep.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e9/

27

Page 10: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

So

urc

eIt

erat

ion

•T

hest

anda

rdte

chni

que

for

solv

ing

the

tran

spor

tequ

atio

nis

the

sour

ce

itera

tion

tech

niqu

e.

•S

ince

the

sour

cete

rmco

ntai

nsal

lthe

angl

e-en

ergy

coup

ling,

itis

itera

tivel

yla

gged

:

ψ�+

1=

L−1

�+

L−1

Q.

•O

nre

ctan

gula

rm

eshe

s,th

eop

erat

orL

repr

esen

tsa

bloc

k

low

er-t

riang

ular

mat

rixw

ithea

chbl

ock

corr

espo

ndin

gto

the

unkn

owns

in

asi

ngle

cell.

•S

uch

mat

rices

are

very

easy

toin

vert

ona

scal

arm

achi

ne.

•In

part

icul

ar,o

nese

quen

tially

solv

esfo

rth

eun

know

nsin

each

mes

hce

ll,

mov

ing

acro

ssth

em

esh

inth

edi

rect

ion

ofpa

rtic

leflo

w.

•B

ecau

seof

this

mov

emen

t,th

ein

vers

ion

proc

ess

isca

lled

asw

eep.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e10

/27

Page 11: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Co

nver

gen

ceo

fS

ou

rce

Iter

atio

n

•A

ssum

ing

aze

roin

itial

gues

sfo

rth

esc

atte

ring

sour

ce,e

ach

sour

ce

itera

tion

adds

the

cont

ribut

ion

toth

eso

lutio

nfr

oman

othe

rge

nera

tion

of

scat

terin

g.

•In

othe

rw

ords

,the

first

sour

ceite

ratio

nyi

elds

the

solu

tion

due

topa

rtic

les

that

have

noty

etsc

atte

red.

The

seco

ndso

urce

itera

tion

adds

the

cont

ribut

ion

toth

eso

lutio

nfr

ompa

rtic

les

that

have

scat

tere

don

ce.

The

third

sour

ceite

ratio

nad

dsth

eco

ntrib

utio

nto

the

solu

tion

from

part

icle

s

that

have

scat

tere

dtw

ice,

and

soon

.

•T

hus

the

sour

ceite

ratio

npr

oces

sco

nver

ges

rapi

dly

inpr

oble

ms

inw

hich

part

icle

son

lysc

atte

ra

few

times

onav

erag

ebe

fore

eith

erbe

ing

abso

rbed

orst

ream

ing

outo

fthe

syst

em.

•A

rbitr

arily

slow

conv

erge

nce

rate

sca

nbe

obta

ined

indi

ffusi

vepr

oble

ms,

i.e.,

prob

lem

sw

ithlit

tleab

sorp

tion

that

appe

arve

ryth

ick

toth

epa

rtic

les.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e11

/27

Page 12: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Co

nver

gen

ceA

ccel

erat

ion

•In

diffu

sive

prob

lem

s,th

eso

urce

itera

tion

proc

ess

mus

tbe

acce

lera

ted.

Thi

sis

abso

lute

lycr

itica

lin

ther

mal

radi

atio

ntr

ansp

ortc

alcu

latio

ns.

•In

gene

ral,

the

resu

lting

solu

tion

tech

niqe

sca

nbe

thou

ghto

fas

atw

o-gr

id

met

hod,

with

adi

ffusi

oneq

uatio

npl

ayin

gth

ero

leof

the

“coa

rse-

grid

tran

spor

tope

rato

r.

•T

hedi

ffusi

oneq

uatio

nm

ustb

edi

scre

tized

inac

cord

ance

with

the

disc

retiz

atio

nof

the

tran

spor

tequ

atio

nto

avoi

din

stab

ilitie

s.

•B

ecau

seth

etr

ansp

orte

quat

ion

isge

nera

llydi

scre

tized

usin

g

disc

ontin

uous

finite

-ele

men

tmet

hods

,thi

sle

ads

tono

n-st

anda

rddi

ffusi

on

disc

retiz

atio

nsth

atar

eve

rydi

fficu

ltto

solv

e.

•F

urth

erm

ore,

the

acce

lera

tion

tech

niqu

eca

nbe

com

ein

crea

sing

ly

inef

fect

ive

with

seve

redi

scon

tinui

ties

inm

ater

ialp

rope

rtie

s.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e12

/27

Page 13: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Kry

lov

Met

ho

ds

and

Pre

con

dit

ion

ers

•A

nal

tern

ativ

eto

the

two-

grid

appr

oach

isto

use

aK

rylo

vm

etho

dto

solv

e

the

tran

spor

tequ

atio

n,an

dto

reca

stth

edi

ffusi

on-b

ased

acce

lera

tion

tech

niqu

eas

adi

ffusi

on-b

ased

prec

ondi

tione

r.

•T

his

appr

oach

rela

xes

the

cons

iste

ncy

requ

irem

ents

onth

edi

ffusi

on

disc

retiz

atio

n,an

dis

easi

lyim

plem

ente

din

exis

ting

code

s.

•H

ighl

yef

fect

ive

prec

ondi

tioni

ngis

achi

eved

via

ast

anda

rddi

ffusi

on

disc

retiz

atio

nra

ther

than

adi

scon

tinuo

usdi

ffusi

ondi

scre

tizat

ion,

and

the

mat

eria

ldis

cont

inut

itypr

oble

mis

elim

inat

ed.

•T

hepr

econ

ditio

ned

Kry

lov

appr

oach

isha

ving

are

volu

tiona

ryim

pact

upon

solu

tion

tech

niqu

esfo

rth

etr

ansp

orte

quat

ion.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e13

/27

Page 14: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Kry

lov

Met

ho

ds

and

Pre

con

dit

ion

ers

•T

heeq

uatio

nw

hich

the

Kry

lov

met

hod

solv

esis

not

=S

ψ+

Q,

butr

athe

r

(I−

L−1

S)ψ

=L−1

Q.

•T

hela

tter

equa

tion

has

far

bette

rsp

ectr

alpr

oper

ties,

butr

equi

res

asw

eep

toev

alua

teits

actio

n.

•T

hus

swee

psan

ddi

ffusi

onso

lves

are

requ

ired

with

both

trad

ition

alan

d

adva

nced

tran

spor

tsol

utio

nte

chni

ques

.

•O

urpa

ralle

lmet

hods

for

solv

ing

diffu

sion

equa

tions

are

take

nfr

omth

e

appl

ied

mat

hem

atic

sco

mm

unity

.

•T

hus

we

dono

tdis

cuss

them

furt

her.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e14

/27

Page 15: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Rec

tan

gu

lar-

Mes

hP

aral

lelS

wee

pA

lgo

rith

ms

•O

na

N-c

ell1

-Dm

esh,

the

swee

pst

arts

atth

ebo

unda

ryce

llfo

rw

hich

the

part

icle

dire

ctio

nis

inci

dent

.

•S

olvi

ngfo

rth

eun

know

nsin

the

first

cell

prov

ides

the

inci

dent

solu

tion

valu

esre

quire

dto

solv

efo

rth

eun

know

nsin

the

seco

ndce

ll,an

dso

on,

asill

ustr

ated

inF

ig.1

.

Fig

ure

1:1-

DS

wee

p

12

34

•T

heso

lutio

npr

oces

sis

inhe

rent

lyse

quen

tiala

ndpa

ralle

lism

isno

t

poss

ible

.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e15

/27

Page 16: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Rec

tan

gu

lar-

Mes

hP

aral

lelS

wee

pA

lgo

rith

ms

•O

na

N2-

Dm

esh,

the

swee

pst

arts

atth

eco

rner

mes

hce

llfo

r

whi

chth

epa

rtic

ledi

rect

ion

isin

cide

nton

two

cell

face

s.

•S

olvi

ngfo

rth

eun

know

nsin

the

corn

erm

esh

cell

prov

ides

the

inci

dent

solu

tion

valu

esre

quire

dto

solv

efo

rth

eun

know

nsin

the

next

two

cells

,

and

soon

,as

illus

trat

edin

Fig

.2.

Fig

ure

2:2-

DS

wee

p

12

2

3

3

3

4

4

4

4

5

5

56

67

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e16

/27

Page 17: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Rec

tan

gu

lar-

Mes

hP

aral

lelS

wee

pA

lgo

rith

ms

•2N

−1

sequ

entia

lsol

utio

nst

eps

are

requ

ired

with

N2

cells

,so

one

can

sim

ulta

neou

sly

solv

efo

rO

(N/2)

unkn

owns

per

step

onth

eav

erag

e.

•T

hus

para

lleliz

atio

nis

poss

ible

.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e17

/27

Page 18: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Rec

tan

gu

lar-

Mes

hP

aral

lelS

wee

pA

lgo

rith

ms

•O

na

N3-

Dm

esh,

the

swee

pst

arts

atth

eco

rner

mes

hce

llfo

r

whi

chth

epa

rtic

ledi

rect

ion

isin

cide

nton

thre

ece

llfa

ces.

•S

olvi

ngfo

rth

eun

know

nsin

the

corn

erm

esh

cell

prov

ides

the

inci

dent

solu

tion

valu

esre

quire

dto

solv

efo

rth

eun

know

nsin

the

next

thre

ece

lls,

and

soon

,as

illus

trat

edin

Fig

.3.

Fig

ure

3:3-

DS

wee

p

21

22

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e18

/27

Page 19: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Rec

tan

gu

lar-

Mes

hP

aral

lelS

wee

pA

lgo

rith

ms

•3N

−2

sequ

entia

lsol

utio

nst

eps

are

requ

ired

with

N3

cells

,so

one

can

sim

ulta

neou

sly

solv

efo

rO

(N2/3

)un

know

nspe

rst

epon

the

aver

age.

•T

hus

para

lleliz

atio

nis

poss

ible

.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e19

/27

Page 20: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Rec

tan

gu

lar-

Mes

hS

wee

pL

ayo

uts

•T

heda

tala

yout

for

a2-

Dtr

ansp

ortc

alcu

latio

nis

1-D

and

the

data

layo

ut

for

a3-

Dca

lcul

atio

nis

2-D

.

•F

orin

stan

ce,c

onsi

der

a4×

4m

esh.

•T

hece

llda

tafo

ral

ldire

ctio

nsan

den

ergi

esw

ould

bem

appe

dto

four

proc

esso

rsas

follo

ws: Tabl

e1:

Layo

utfo

r4×

4m

esh.

P1

P2

P3

P4

45

67

34

56

23

45

12

34

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e20

/27

Page 21: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Rec

tan

gu

lar-

Mes

hS

wee

pL

ayo

uts

•T

his

isa

non-

stan

dard

layo

ut.

•T

here

isa

load

bala

ncin

gpr

oble

mth

atis

addr

esse

dby

star

ting

the

solu

tion

proc

ess

for

addi

tiona

ldire

ctio

nsbe

fore

any

one

dire

ctio

nfin

ishe

s.

•O

nre

ctan

gula

r3-

Dm

eshe

s,pa

ralle

leffi

cien

cies

onth

eor

der

of50

-80

perc

ent(

with

wea

ksc

alin

g)ha

vebe

enac

hiev

edw

ithth

ousa

nds

of

proc

esso

rs.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e21

/27

Page 22: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Par

alle

lSw

eep

so

nU

nst

ruct

ure

dM

esh

es

•M

atte

rsbe

com

em

uch

mor

eco

mpl

icat

edon

unst

ruct

ured

mes

hes.

•A

nor

derin

gof

the

unkn

owns

that

lead

sto

abl

ock

low

er-t

riang

ular

stru

ctur

efo

rth

esw

eep

equa

tions

alw

ays

exis

tson

“rea

sona

ble”

2-D

unst

ruct

ured

mes

hes,

butt

his

isno

tnec

essa

rily

the

case

for

3-D

mes

hes.

•A

bloc

klo

wer

-tria

ngul

aror

derin

ges

sent

ially

alw

ays

exis

tsfo

r“r

easo

nabl

e”

tetr

ahed

ralm

eshe

s,bu

tita

lmos

tnev

erex

ists

for

dist

orte

dhe

xahe

dral

mes

hes.

•E

ven

whe

na

bloc

klo

wer

-tria

ngul

aror

derin

gex

ists

,the

num

ber

ofst

eps

asso

ciat

edw

itha

mes

hof

Nce

llsva

ries

alon

gw

ithth

enu

mbe

rof

cells

in

each

step

.

•O

ptim

alpa

rtiti

onin

gis

anN

P-c

ompl

ete

prob

lem

.

•P

aral

lelu

nstr

uctu

red-

mes

hsw

eep

algo

rithm

sar

est

illa

rese

arch

topi

c.

•N

onet

hele

ss,v

ery

good

para

llele

ffici

enci

esha

vebe

enob

tain

edw

ith

thou

sand

sof

proc

esso

rson

tetr

ahed

ral-m

eshe

s.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e22

/27

Page 23: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Par

alle

lSw

eep

so

nS

tru

ctu

red

AM

RM

esh

es

•T

hedi

fficu

ltyof

defin

ing

anef

ficie

ntsw

eep

algo

rithm

for

ast

ruct

ured

AM

R

mes

hca

nbe

sim

ilar

toth

atfo

rei

ther

unst

ruct

ured

orre

ctan

gula

rm

eshe

s.

•T

hedi

fficu

ltyof

defin

ing

anef

ficie

ntsw

eep

algo

rithm

for

ast

ruct

ured

AM

R

mes

hus

ually

incr

ease

sw

ithth

efle

xibi

lity

ofth

em

esh

gene

ratio

npr

oces

s.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e23

/27

Page 24: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Do

mai

nD

eco

mp

osi

tio

nS

olu

tio

nA

lgo

rith

ms

•P

aral

lelt

rans

port

solu

tion

algo

rithm

sba

sed

upon

stan

dard

spat

iald

omai

n

deco

mpo

sitio

nar

ebe

ing

inve

stig

ated

.

•In

effe

ct,s

ourc

eite

ratio

nsar

epe

rfor

med

with

inea

chsu

b-do

mai

nan

dth

e

solu

tion

valu

eson

the

sub-

dom

ain

boun

darie

sar

eite

rate

dup

on.

•T

his

appr

oach

appe

ars

toha

veco

nsid

erab

lepr

omis

eas

long

asea

ch

spat

ials

ub-d

omai

nap

pear

sth

ick

toth

epa

rtic

les.

•W

hen

the

sub-

dom

ains

beco

me

thin

toth

epa

rtic

les,

the

algo

rithm

s

beco

me

inef

ficie

ntdu

eto

slow

conv

erge

nce

ofth

esu

b-do

mai

nbo

unda

ry

unkn

owns

.

•T

hese

appr

oach

esca

nbe

com

bine

dw

ithK

rylo

vm

etho

ds.

•W

hile

Kry

lov

met

hods

can

impr

ove

effic

ienc

yin

the

thic

kca

se,t

hey

are

inef

fect

ive

for

the

thin

case

.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e24

/27

Page 25: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Co

mp

ute

rA

rch

itec

ture

Req

uir

emen

ts

•A

light

wei

ghtO

Sth

atm

inim

izes

the

num

ber

ofin

terr

upts

gene

rate

d.

•T

hesw

eep

algo

rithm

istig

htly

coup

led

inth

ese

nse

that

the

equa

tions

for

the

dow

nwin

dce

llsca

nnot

beso

lved

until

thos

efo

rth

eup

win

dce

lls

have

been

solv

ed.

•If

ado

wnw

ind

proc

esso

rha

sno

tfini

shed

it’s

com

puta

tion-

com

mun

icat

ion

cycl

edu

eto

anO

Sin

terr

upto

fsom

eso

rt,

then

ever

ythi

ngha

ltsun

tilit

isfr

eed

up.

•A

high

perf

orm

ance

mem

ory

sub-

syst

em.

•V

ery

few

float

ing

poin

tope

ratio

nsar

epe

rfor

med

per

mem

ory

load

,so

data

quic

kly

mov

esth

roug

hth

eca

che.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e25

/27

Page 26: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Co

mp

ute

rA

rch

itec

ture

Req

uir

emen

ts

•A

low

late

ncy,

high

band

wid

thin

terc

onne

ctto

min

imiz

eco

mm

unic

atio

n

cost

s.N

ote

that

this

also

requ

ires

ahi

ghpe

rfor

man

ceM

PIl

ibra

rysi

nce

on

mod

ern

hard

war

em

osto

fthe

late

ncy

isdu

eto

softw

are,

noth

ardw

are.

•D

ata

isco

mm

unic

ated

betw

een

proc

esso

rsaf

ter

each

sequ

entia

lste

p

inth

esw

eep.

•T

here

lativ

eam

ount

ofda

taco

mm

unic

ated

isla

rge

enou

ghto

requ

ire

sign

ifica

ntba

ndw

idth

,but

smal

leno

ugh

tobe

sens

itive

tola

tenc

y.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e26

/27

Page 27: Parallel Radiation Transport Algorithms and Associated … · 2006-10-31 · Figure 1: 1-D Sweep 12 3 4 • The solution process is inherently sequential and parallelism is not possible

Co

mp

ute

rA

rch

itec

ture

Req

uir

emen

ts

•T

hese

requ

irem

ents

are

best

met

bya

syst

emco

mpo

sed

ofa

few

thou

sand

high

perf

orm

ance

proc

esso

rsw

itha

larg

eam

ount

ofm

emor

y

per

proc

esso

r.

•M

inim

izin

gth

enu

mbe

rof

proc

esso

rsw

hile

mee

ting

the

requ

irem

ents

for

spee

dan

dm

emor

ym

akes

the

desi

gnan

dim

plem

enta

tion

ofa

high

perf

orm

ance

inte

rcon

nect

easi

er.

•A

lthou

ghm

oder

nco

des

have

been

exte

nsiv

ely

para

lleliz

ed,A

mda

hl’s

Law

still

com

esin

toef

fect

asth

enu

mbe

rof

proc

esso

rsin

crea

ses.

•P

roce

ssor

relia

bilit

ybe

com

espr

oble

mat

icw

ithm

ore

than

afe

wth

ousa

nd

proc

esso

rs.

•E

xist

ing

AS

CIs

oftw

are

isde

sign

edfo

rR

ISC

-bas

edC

PU

’s.

Tran

sitio

ning

tove

ctor

-typ

eC

PU

’sco

uld

resu

ltin

sign

ifica

ntre

desi

gnco

sts.

Pre

sent

atio

nat

the

Con

fere

nce

onH

igh

Spe

edC

ompu

ting,

Gle

nede

nB

each

,OR

,Apr

il19

-22,

2004

Slid

e27

/27