QuickPIC: a highly efficient fully parallelized
PIC code for plasma-based acceleration
Chengkun Huang, V. K. Decyk, M. Zhou, W. Lu, W. B. Mori (UCLA),
J.H. Cooley, T.M. Antonsen Jr. (U. Maryland),
B. Feng, T. Katsouleas (USC)
Jorge Vieira (IST)
06/26/06 SCIDAC 2006 2
Particle Accelerators
Why Plasmas?
Conventional accelerators:
• Limited by peak power and breakdown
• 20-100 MeV/m
Plasma:
• No breakdown limit
• 10-100 GeV/m
• Plasma Wake Field Accelerator (PWFA): a high energy electron bunch
• Laser Wake Field Accelerator (LWFA): a single short pulse of photons
Dawson & Tajima 1979
PIC Model required!
Plasma/Laser Wakefield Acceleration
Accelerating force: uniform accelerating field, $E_{z,\max} \sim mc\omega_p/e \approx 0.96\,\sqrt{n\,[\mathrm{cm}^{-3}]}\ \mathrm{V/cm}$
Focusing force: linear focusing field
[Figure: schematic of the positive and negative charge distribution in the wake, with the radial focusing force $F_r$ and the longitudinal accelerating force $F_z$ marked.]
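As a quick numerical check of the field scaling quoted on this slide, $E_{z,\max} \sim mc\omega_p/e \approx 0.96\sqrt{n}$ V/cm with $n$ in cm$^{-3}$, the following Python sketch evaluates both forms. The function names and the choice of SI constants are illustrative assumptions and are not taken from QuickPIC.

```python
import math

# Physical constants in SI units (CODATA values).
ELECTRON_CHARGE = 1.602176634e-19   # C
ELECTRON_MASS = 9.1093837015e-31    # kg
SPEED_OF_LIGHT = 299792458.0        # m/s
EPSILON_0 = 8.8541878128e-12        # F/m


def plasma_frequency(n_cm3):
    """Electron plasma frequency omega_p (rad/s) for a density given in cm^-3."""
    n_m3 = n_cm3 * 1.0e6
    return math.sqrt(n_m3 * ELECTRON_CHARGE**2 / (EPSILON_0 * ELECTRON_MASS))


def max_field_exact(n_cm3):
    """E = m c omega_p / e, converted from V/m to V/cm."""
    e_v_per_m = ELECTRON_MASS * SPEED_OF_LIGHT * plasma_frequency(n_cm3) / ELECTRON_CHARGE
    return e_v_per_m / 100.0


def max_field_rule_of_thumb(n_cm3):
    """The slide's shorthand: 0.96 * sqrt(n [cm^-3]) in V/cm."""
    return 0.96 * math.sqrt(n_cm3)


if __name__ == "__main__":
    for density in (2.0e16, 1.5e18):
        exact = max_field_exact(density)
        approx = max_field_rule_of_thumb(density)
        print(f"n = {density:.1e} cm^-3: {exact:.3e} V/cm (exact), {approx:.3e} V/cm (0.96 sqrt(n))")
```

The two densities in the loop are the PWFA and LWFA plasma densities listed later in the talk; both give fields in the range of tens of GV/m.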
Plasma Accelerator Progress: "Accelerator Moore's Law"
[Figure: plasma accelerator progress chart. Labels: RAL, LBL, Osaka, UCLA, ANL, E164X, E167, ILC, Current Energy Frontier. Annotation: "How do we get here?"]
Challenge in PIC modeling
Typical PWFA simulation parameters and requirement
• Beam spot size: 7 μm, 7 μm, 45 μm
• Beam charge: ~1.8E10 e-
• Beam energy: 50 GeV
• Density ratio $n_{b,\mathrm{peak}}/n_0$: 26
• Plasma density: ~2E16 cm-3
• Collisionless skin depth $c/\omega_p$: 37 μm
Simulation codes
• Osiris. Feature: full electromagnetic PIC code. Grid size limit: $\sim 0.05\,c/\omega_p$. Timestep limit: $\Delta t < 0.05\,\omega_p^{-1}$. Total time of simulation per GeV stage: 5,234 node-hours.
• QuickPIC. Feature: quasi-static PIC code; beam and plasma evolution time scales are separated. Grid size limit: $\sim 0.05\,c/\omega_p$. Timestep limit: $\Delta t < 0.05\,\omega_\beta^{-1} = 0.05\,\sqrt{2\gamma}\,\omega_p^{-1}$. Total time of simulation per GeV stage: 67 node-hours.
For a 50 GeV energy gain, 261,712 node-hours are needed for the Osiris simulation.
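The PWFA numbers on this slide can be cross-checked with a short Python sketch. It computes the collisionless skin depth $c/\omega_p$ from the plasma density and the ratio between the quasi-static and full-PIC timestep limits. The relation $\omega_\beta = \omega_p/\sqrt{2\gamma}$ used here is the standard betatron frequency of a relativistic electron in an ion channel and is an assumption about what the slide's timestep formula encodes; the function names are hypothetical.

```python
import math

ELECTRON_CHARGE = 1.602176634e-19   # C
ELECTRON_MASS = 9.1093837015e-31    # kg
SPEED_OF_LIGHT = 299792458.0        # m/s
EPSILON_0 = 8.8541878128e-12        # F/m
ELECTRON_REST_ENERGY_MEV = 0.51099895


def skin_depth_um(n_cm3):
    """Collisionless skin depth c/omega_p in micrometres for a density in cm^-3."""
    omega_p = math.sqrt(n_cm3 * 1.0e6 * ELECTRON_CHARGE**2 / (EPSILON_0 * ELECTRON_MASS))
    return SPEED_OF_LIGHT / omega_p * 1.0e6


def timestep_ratio(beam_energy_gev):
    """sqrt(2*gamma): how much longer a timestep tied to the betatron period
    can be than one tied to the plasma period (assumed scaling)."""
    gamma = beam_energy_gev * 1.0e3 / ELECTRON_REST_ENERGY_MEV
    return math.sqrt(2.0 * gamma)


if __name__ == "__main__":
    print(f"c/omega_p at 2e16 cm^-3: {skin_depth_um(2.0e16):.1f} um")
    print(f"sqrt(2*gamma) at 50 GeV: {timestep_ratio(50.0):.0f}")
    # Per-GeV cost times 50 GeV, for comparison with the quoted 261,712 node-hours.
    print(f"Osiris estimate for 50 GeV: {5234 * 50} node-hours")
```

The timestep ratio is only one ingredient of the cost difference; the node-hour figures in the table also reflect per-step cost, so the ratio should not be read as the overall speedup.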
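Challenge in PIC modeling
Typical LWFA simulation parameters
• Laser wavelength: 0.8 μm
• Laser intensity: ~200 TW
• Laser duration: 30 fs
• Density ratio $n/n_{\mathrm{crit}}$: 0.87E-3
• Plasma density: ~1.5E18 cm-3
• Collisionless skin depth $c/\omega_p$: 3.7 μm
Simulation codes
• Osiris/Vorpal. Feature: full electromagnetic PIC code. Grid size limit: $\sim 0.05\,\lambda$. Timestep limit: $\Delta t < 0.2\,\omega_0^{-1}$. Total time of simulation per GeV stage: ~1.2 × 10^5 node-hours.
• Full PIC (Vorpal) with PGC. Feature: Ponderomotive Guiding Center PIC code. Grid size limit: $\sim 0.05\,c/\omega_p$. Timestep limit: $\Delta t < 0.05\,\omega_p^{-1}$. Total time of simulation per GeV stage: ~5,000 node-hours.
• QuickPIC. Feature: quasi-static PIC code; laser and plasma evolution time scales are separated. Grid size limit: $\sim 0.05\,c/\omega_p$. Timestep limit: $\Delta t < 0.05\,t_r$. Total time of simulation per GeV stage: ~192 node-hours.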
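The density ratio $n/n_{\mathrm{crit}}$ in the LWFA parameter list follows from the laser wavelength and the plasma density. The Python sketch below reproduces it using the textbook critical density $n_{\mathrm{crit}} = \epsilon_0 m \omega_0^2 / e^2$; this formula and the function names are assumptions added for illustration, since the slide only quotes the resulting value.

```python
import math

ELECTRON_CHARGE = 1.602176634e-19   # C
ELECTRON_MASS = 9.1093837015e-31    # kg
SPEED_OF_LIGHT = 299792458.0        # m/s
EPSILON_0 = 8.8541878128e-12        # F/m


def critical_density_cm3(wavelength_um):
    """Critical density (cm^-3) at which omega_p equals the laser frequency omega_0."""
    omega_0 = 2.0 * math.pi * SPEED_OF_LIGHT / (wavelength_um * 1.0e-6)
    n_crit_m3 = EPSILON_0 * ELECTRON_MASS * omega_0**2 / ELECTRON_CHARGE**2
    return n_crit_m3 * 1.0e-6


def density_ratio(n_cm3, wavelength_um):
    """Plasma density as a fraction of the critical density."""
    return n_cm3 / critical_density_cm3(wavelength_um)


if __name__ == "__main__":
    ratio = density_ratio(1.5e18, 0.8)
    print(f"n_crit at 0.8 um: {critical_density_cm3(0.8):.3e} cm^-3")
    print(f"n/n_crit at 1.5e18 cm^-3: {ratio:.2e}")
```

Because $\omega_0/\omega_p = \sqrt{n_{\mathrm{crit}}/n}$, a small density ratio means the laser oscillates many times per plasma period, which is why a full PIC code that resolves the laser wavelength needs a much finer grid and timestep.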
Quasi-static Model
• There are two intrinsic time scales: a fast time scale associated with the plasma motion and a slow time scale associated with the betatron motion of an ultra-relativistic electron beam.
• The quasi-static approximation eliminates the need to follow the fast plasma motion for the whole simulation.
• Ponderomotive Guiding Center approximation: the high-frequency laser oscillation can be averaged out, so the laser pulse is represented by its envelope.
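The envelope idea in the last bullet can be illustrated numerically. In the sketch below, a Gaussian pulse with a fast carrier is averaged over one optical cycle, and the cycle average of the squared field is compared with half the squared envelope. The pulse shape, the normalized units ($\omega_0 = 1$) and all names are assumptions chosen for illustration; QuickPIC's actual laser model is not described on the slide.

```python
import math


def envelope(t, a0=1.0, tau=200.0):
    """Slowly varying Gaussian envelope (tau is much longer than one optical period)."""
    return a0 * math.exp(-((t / tau) ** 2))


def field(t, omega0=1.0):
    """Full laser field: envelope times the fast carrier oscillation."""
    return envelope(t) * math.cos(omega0 * t)


def cycle_average_of_square(t_center, omega0=1.0, samples=2000):
    """Average field(t)^2 over one optical period centred on t_center (midpoint rule)."""
    period = 2.0 * math.pi / omega0
    total = 0.0
    for i in range(samples):
        t = t_center - 0.5 * period + period * (i + 0.5) / samples
        total += field(t, omega0) ** 2
    return total / samples


if __name__ == "__main__":
    for t in (0.0, 50.0, 150.0):
        averaged = cycle_average_of_square(t)
        from_envelope = 0.5 * envelope(t) ** 2
        print(f"t = {t:6.1f}: cycle average = {averaged:.5f}, envelope^2 / 2 = {from_envelope:.5f}")
```

When the envelope changes little over one cycle, the two columns agree closely, which is why a code can advance only the envelope and skip resolving each laser oscillation.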
Implementation
The driver evolution can be calculated in a 3D moving box, while the plasma response can be solved for slice by slice with the longitudinal index being a time-like variable.
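The loop structure described above can be sketched in Python as follows. The outer loop advances the driver with a large timestep, and the inner loop marches a plasma slice through the moving box from front to back, using the longitudinal index as a time-like variable. The plasma response here is a deliberately simple 1D driven oscillator standing in for the real 2D slice solve, and the beam push is a placeholder; both, along with every name, are hypothetical and not QuickPIC's implementation.

```python
def solve_plasma_response(beam_slices, d_xi, k_p=1.0):
    """March one plasma slice through the box, front (index 0) to back.

    Toy stand-in for the slice solve: a driven oscillator
    delta_n'' + k_p^2 * delta_n = k_p^2 * n_b, integrated in xi.
    """
    delta_n = 0.0      # plasma is unperturbed ahead of the driver
    d_delta_n = 0.0
    wake = []
    for n_b in beam_slices:            # xi index acts as the time-like variable
        d_delta_n += d_xi * k_p**2 * (n_b - delta_n)
        delta_n += d_xi * d_delta_n
        wake.append(delta_n)
    return wake


def push_beam(beam_slices, wake, dt):
    """Placeholder for the 3D driver update; this toy leaves the beam unchanged."""
    return list(beam_slices)


def run(beam_slices, n_steps, d_xi=0.1, dt=10.0):
    """Outer loop over slow driver timesteps, inner slice-by-slice plasma solve."""
    history = []
    for _ in range(n_steps):
        wake = solve_plasma_response(beam_slices, d_xi)
        beam_slices = push_beam(beam_slices, wake, dt)
        history.append(wake)
    return history


if __name__ == "__main__":
    beam = [0.0] * 5 + [1.0] * 10 + [0.0] * 85
    wakes = run(beam, n_steps=3)
    print(f"{len(wakes)} driver steps, {len(wakes[0])} slices per step")
```

The point of the sketch is the nesting: the expensive plasma solve is repeated once per driver step rather than once per plasma-scale timestep, and slices ahead of the driver stay unperturbed.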
Code Features
• Object-oriented design in modern Fortran, easily ported to major operating systems.
• Parallelized in both the plasma and the particle/laser beam solvers.
• Uses fast sine/cosine transforms to perform the FFT.
• Can accommodate beam drivers, laser drivers and external injection simultaneously.
• Includes ionization processes and radiation damping.
• Can model plasma channels with arbitrary profiles.
• Includes ion motion.
Benchmark with full PIC code
[Figure: four panels comparing the longitudinal wakefield (in units of $mc\omega_p/e$) versus position (in units of $c/\omega_p$) from OSIRIS and QuickPIC, with QuickPIC curves labeled l=2 and l=4. Panels: e- driver, e+ driver, e- driver with ionization, laser driver.]
100+ times CPU savings with "no" loss in accuracy
Modeling self-ionized PWFA experiment
[Figure: energy (GeV) and beam current (kA) versus z (μm) for the E164X experiment and the QuickPIC simulation. Annotations: "Emax ~ 5 - 0.5 (betatron radiation) = 4.5 GeV" and "Emax ~ 4 GeV (initial energy chirp considered)".]
The experiment conducted at SLAC has shown a 4 GeV energy gain of the electron beam in 30 cm of plasma.
The QuickPIC simulation has shown a 4.5 GeV energy gain with similar features in the energy diagnostics.
A TeV class afterburner
Simulation result of 1 TeV PWFA
500 GeV energy gain in 25m! Energy spread ~ 5%
Wakefield evolution is stable
Laser wakefield simulation
QuickPIC simulation for LWFA in the blow-out regime
Modeling LWFA in a plasma channel
An accurate and efficient tool for LWFA
[Figure: 3D OSIRIS and QuickPIC results compared in panels labeled 0.8 mm, 2.6 mm and 3.4 mm.]
Tremendous time-saving!
Exploiting more parallelism: Pipelining
• The pipelining technique exploits parallelism in a sequential operation stream and can be adopted at various levels.
• Modern CPU designs include an instruction-level pipeline to improve performance by increasing throughput.
• In scientific computation, software-level pipelines are less common because the parallelism in the algorithm is hidden.
• We are implementing a software-level pipeline in QuickPIC.
[Figure labels: Moving Window; plasma response]
Software pipeline compared with instruction pipeline
• Operand: plasma slice (software pipeline); instruction stream (instruction pipeline)
• Operation: plasma/beam update (software pipeline); IF, ID, EX, MEM, WB (instruction pipeline)
• Stages: 1 ~ (# of slices)/2 (software pipeline); 5 ~ 31 (instruction pipeline)
Pipelining: scaling QuickPIC to 10,000+ processors
Without pipelining: the beam is not advanced until the entire plasma response is determined.
[Diagram: the initial plasma slab passes through the whole beam ("solve plasma response"), then the beam is updated ("update beam").]
With pipelining: each section is updated when its input is ready; the plasma slab flows in the pipeline.
[Diagram: the beam is divided into sections 1 to 4, each with its own "solve plasma response" and "update beam" steps, fed in turn by the initial plasma slab.]
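The difference between the two modes can be made concrete with a small scheduling sketch in Python. It assumes the beam is split into `n_stages` sections of equal cost and that a section can start driver step `k` one tick after the section in front of it has finished its part of step `k`; transfer time is ignored. These assumptions, and all names, are illustrative rather than a description of QuickPIC's scheduler.

```python
def pipeline_schedule(n_stages, n_steps):
    """Map each tick to the (stage, step) pairs that run concurrently.

    Stage s can work on driver step k at tick k + s: it needs the plasma
    slab handed on by stage s - 1 for the same step, and its own beam
    section from step k - 1.
    """
    schedule = {}
    for step in range(n_steps):
        for stage in range(n_stages):
            schedule.setdefault(step + stage, []).append((stage, step))
    return schedule


def ideal_speedup(n_stages, n_steps):
    """Ticks needed when one processor group handles every section of every
    step in turn, divided by ticks needed when the stages overlap."""
    serial_ticks = n_stages * n_steps
    pipelined_ticks = n_steps + n_stages - 1
    return serial_ticks / pipelined_ticks


if __name__ == "__main__":
    schedule = pipeline_schedule(n_stages=4, n_steps=6)
    for tick in sorted(schedule):
        busy = ", ".join(f"stage {s} on step {k}" for s, k in sorted(schedule[tick]))
        print(f"tick {tick}: {busy}")
    print(f"ideal speedup, 4 stages, 1000 steps: {ideal_speedup(4, 1000):.2f}")
```

After a fill-up period of `n_stages - 1` ticks, every stage is busy on a different driver step, so for long runs the ideal speedup approaches the number of stages.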
Performance in pipeline mode
[Figure: "Speedup in pipeline mode". Speedup versus number of stages in the pipeline, with Actual and Ideal curves.]
• A preliminary benchmark shows that pipeline operation can reach performance very close to the ideal situation.
• The time to transfer a plasma slice between successive stages is small and does not depend on the number of stages.
• Speedup will saturate when the overhead (time to transfer a slice) becomes a significant part of the total time spent in each stage.
• In each stage, the number of processors is chosen according to the transverse size of the problem.
Processor counts
• Typical: 16 CPUs in each stage, 32 stages, 512 CPUs in total
• High res.: 128 CPUs in each stage, 128 stages, 16,384 CPUs in total
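The saturation argument and the processor totals above can be checked with a short Python sketch. It assumes a simple cost model in which each stage spends `work_time / n_stages` computing plus a fixed `transfer_time` handing a slice to the next stage. The model, the sample timings and the function names are assumptions for illustration and are not measurements from the benchmark.

```python
def total_cpus(cpus_per_stage, n_stages):
    """Total processor count: one processor group per pipeline stage."""
    return cpus_per_stage * n_stages


def speedup_with_overhead(n_stages, work_time, transfer_time):
    """Speedup over a single processor group under the assumed cost model.

    With zero transfer time this equals n_stages (the ideal line); with a
    fixed transfer time it levels off near work_time / transfer_time.
    """
    time_per_stage = work_time / n_stages + transfer_time
    return work_time / time_per_stage


if __name__ == "__main__":
    print(f"Typical run: {total_cpus(16, 32)} CPUs")
    print(f"High-resolution run: {total_cpus(128, 128)} CPUs")
    # Hypothetical timings: 100 time units of work per step, 0.1 per slice transfer.
    for stages in (8, 32, 128, 512, 2048):
        actual = speedup_with_overhead(stages, work_time=100.0, transfer_time=0.1)
        print(f"{stages:5d} stages: ideal {stages}, with overhead {actual:.1f}")
```

In this model the curve stays close to ideal while the per-stage compute time is much larger than the transfer time, which matches the qualitative behavior described in the bullets.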
Summary
By taking advantage of the two different time scales in PWFA/LWFA problems, QuickPIC reduces simulation time by a factor of 100-1000 for simulations of state-of-the-art experiments.
QuickPIC enables scientific discovery in plasma-based acceleration by exploring regions of parameter space that are not easily accessible with conventional PIC codes.
We are working to scale QuickPIC to petascale platforms using the software pipelining technique. Initial benchmarks show a very promising performance enhancement.