Introduction to Algorithmic Differentiation
Kshitij Kulshreshtha
Universität Paderborn
ESSIM Summer School, 13–16 August 2012
Dresden
K. Kulshreshtha 1 / 94 Intro to AD ESSIM Summer School
Schedule
1. Introduction / Motivation / Function Representation / Forward Mode
2. Reverse Mode / Higher-Order Derivatives
3. Implementations / ADOL-C
4. Hands-on Exercises (ADOL-C)
Computing Derivatives

[Diagram, built up over several slides: theory and data enter a modelling step that yields a computer program mapping input x to output y for the user; differentiation turns this into an enhanced program that additionally delivers the sensitivity dy/dx, which an optimization algorithm can then exploit. Headline keywords: simulation, sensitivity calculation, optimization.]
Computing Derivatives

Given: a description of the functional relation as
- a formula / explicit expression y = F(x)
- a computer program ⇒ ?

Task: computation of derivatives, taking into account
- requirements on exactness
- computational effort
Differentiation: What for?

- Optimization:
  unconstrained:  min f(x),  f : R^n → R
  constrained:    min f(x),  f : R^n → R
                  s.t. c(x) = 0,  c : R^n → R^m
                       h(x) ≥ 0,  h : R^n → R^l
- Solution of nonlinear equation systems
  F(x) = 0,  F : R^n → R^n
  Newton's method requires F'(x) ∈ R^{n×n}
- Simulation of complex procedures
  - system definition
  - integration of differential equations
- Sensitivity analysis
- Real-time control
Symbolic Differentiation

x²       → 2x
x^p      → p · x^(p−1)
cos(x)   → −sin(x)
exp(x)   → exp(x)
log(x)   → 1/x
g(h(x))  → g'(h(x)) · h'(x)
…
Symbolic Differentiation

- Exact derivatives:
  f(x) = exp(sin(x²))  ⇒  f'(x) = exp(sin(x²)) · cos(x²) · 2x
- min J(x, u) such that ẋ = f(x, u) + IC;
  reduced formulation J(x, u) → Ĵ(u):
  Ĵ'(u) based on the symbolic adjoint λ̇ = −f_x(x, u)ᵀ λ + TC
- Cost (common subexpressions, implementation)
- Legacy code with a large number of lines ⇒ closed-form expression not available
Excerpt from euler2d.c (the final pages of a roughly 30-page legacy 2-D Euler solver listing), shown as an example of code for which no closed-form derivative expression is available:

read_input_file(argv[1], &code_control);
code_control.timestep_type = 0;          /* calculate timestep size like TAU */
/* read in CFD mesh; remove mesh corner points arising more than once */
read_cfd_mesh(code_control.CFDmesh_name, &gridbase);
grid[0] = gridbase;
remove_double_points(&gridbase, grid);
write_pointdata(name, &(grid[0]));       /* write out mesh in tecplot format */
calc_metric(&(grid[0]), &code_control);  /* metric of finest grid level */
/* create coarse meshes for multigrid; initialize forcing functions to zero */
for (i = 1; i < code_control.nlevels; i++) {
    create_coarse_mesh(&(grid[i-1]), &(grid[i]));
    init2zero(&(grid[i]), grid[i].force);
}
/* initialize flow field on all grid levels to free-stream quantities */
for (i = 0; i < code_control.nlevels; i++)
    init_field(&(grid[i]), &code_control);
if (code_control.restart == 1)           /* if selected, read restart file */
    read_restart("restart", grid, &code_control, &first_residual, &first_step);
/* ... primitive variables, boundary states, convergence file ... */
t1 = time(&t1);
for (it = 0; it < code_control.nsteps[level]; it++) {   /* multigrid cycles */
    double residual;
    lift = 0.0; drag = 0.0;
    /* ... one multigrid cycle: residual, lift and drag accumulation ... */
    printf("IT = %d %20.10e %20.10e %20.10e %4.2f\n",
           sum_it, residual / first_residual, lift, drag, weight);
    sum_it++;
}
t2 = time(&t2);
printf("Zeit : %f\n", difftime(t2, t1)); /* time needed for the time loop */
fclose(conv);
center2point(grid);                      /* map solution from cell centers to vertices */
write_eulerdata("euler.dat", grid, &code_control);
write_surf("eulersurf.dat", grid, &code_control);
write_restart("restart", grid, &code_control, first_residual, last_step);
return 0;
Finite Differences

Idea: Taylor expansion; for smooth f : R → R,

f(x + h) = f(x) + h·f'(x) + h²·f''(x)/2 + h³·f'''(x)/6 + …
⇒ f(x + h) ≈ f(x) + h·f'(x)

Df(x) ≈ (f(x + h) − f(x)) / h

- simple derivative calculation (only function evaluations!)
- inexact derivatives
- computational cost often too high:
  F : R^n → R  ⇒  OPS(F'(x)) ≈ (n + 1) · OPS(F(x))
Example 1

[Figure-only slide sequence; the plots are not recoverable from the text extraction.]
Example 2

[Figure-only slide sequence; the plots are not recoverable from the text extraction.]
Algorithmic Differentiation (AD)

Main products:
- Quantitative dependence information (local):
  - weighted and directed partial derivatives
  - error and condition-number estimates …
  - Lipschitz constants, interval enclosures …
  - eigenvalues, Newton steps …
- Qualitative dependence information (regional):
  - sparsity structures, degrees of polynomials
  - ranks, eigenvalue multiplicities …
Algorithmic Differentiation (AD)

Main properties:
- no truncation errors
- chain rule applied to numbers
- applicability to arbitrary programs
- a priori bounded and/or adjustable costs:
  - total operations count
  - maximal memory requirement
  - total memory traffic

always relative to the original function. (Griewank, Walther 2009)
The Hello-World Example of AD

[Figure: a lighthouse whose beam rotates with angular velocity ω and sweeps along a quay of slope γ; the light spot at time t has coordinates (y_1, y_2), with y_2 = γ · y_1.]

y_1 = ν · tan(ωt) / (γ − tan(ωt))   and   y_2 = γ · ν · tan(ωt) / (γ − tan(ωt))
Automation: Computer Tools such as Maple, Mathematica, …

Maple provides: [Maple session output omitted from the text extraction]
Forward Mode of AD

[Diagram: a curve x(t) in the domain of F and its image y(t) = F(x(t)); the tangent ẋ at x is mapped by F' to the tangent ẏ at y.]

ẏ(t) = ∂/∂t F(x(t)) = F'(x(t)) · ẋ(t) ≡ Ḟ(x, ẋ)
Forward Mode (Lighthouse)

Function evaluation:            Tangent evaluation:
v_{-3} = x_1 = ν                v̇_{-3} = ẋ_1
v_{-2} = x_2 = γ                v̇_{-2} = ẋ_2
v_{-1} = x_3 = ω                v̇_{-1} = ẋ_3
v_0    = x_4 = t                v̇_0    = ẋ_4
v_1 = v_{-1} · v_0              v̇_1 = v̇_{-1} · v_0 + v_{-1} · v̇_0
v_2 = tan(v_1)                  v̇_2 = v̇_1 / cos²(v_1)
v_3 = v_{-2} − v_2              v̇_3 = v̇_{-2} − v̇_2
v_4 = v_{-3} · v_2              v̇_4 = v̇_{-3} · v_2 + v_{-3} · v̇_2
v_5 = v_4 / v_3                 v̇_5 = (v̇_4 − v̇_3 · v_5) / v_3
v_6 = v_5 · v_{-2}              v̇_6 = v̇_5 · v_{-2} + v_5 · v̇_{-2}
y_1 = v_5                       ẏ_1 = v̇_5
y_2 = v_6                       ẏ_2 = v̇_6
Complexity (Forward Mode)

tang      v = c    v = u ± w    v = u · w    v = φ(u)
MOVES     1 + 1    3 + 3        3 + 3        2 + 2
ADDS      0        1 + 1        0 + 1        0 + 0
MULTS     0        0            1 + 2        0 + 1
NLOPS     0        0            0            1 + 1

OPS(F'(x) ẋ) ≤ c · OPS(F(x))

with c ∈ [2, 5/2] platform dependent
Reverse Mode AD = Discrete Adjoints

[Diagram, built up over several slides: F maps x to y; a hyperplane ȳᵀy = c in the range corresponds to the level set ȳᵀF(x) = c in the domain, whose normal at x is the adjoint x̄.]

x̄ = ȳᵀF'(x) ≡ F̄(x, ȳ)
Reverse Mode (Lighthouse)

Forward sweep:
v_{-3} = x_1;  v_{-2} = x_2;  v_{-1} = x_3;  v_0 = x_4
v_1 = v_{-1} · v_0
v_2 = tan(v_1)
v_3 = v_{-2} − v_2
v_4 = v_{-3} · v_2
v_5 = v_4 / v_3
v_6 = v_5 · v_{-2}
y_1 = v_5;  y_2 = v_6

Reverse sweep:
v̄_5 = ȳ_1;  v̄_6 = ȳ_2
v̄_5 += v̄_6 · v_{-2};   v̄_{-2} += v̄_6 · v_5
v̄_4 += v̄_5 / v_3;      v̄_3 -= v̄_5 · v_5 / v_3
v̄_{-3} += v̄_4 · v_2;   v̄_2 += v̄_4 · v_{-3}
v̄_{-2} += v̄_3;         v̄_2 -= v̄_3
v̄_1 += v̄_2 / cos²(v_1)
v̄_{-1} += v̄_1 · v_0;   v̄_0 += v̄_1 · v_{-1}
x̄_4 = v̄_0;  x̄_3 = v̄_{-1};  x̄_2 = v̄_{-2};  x̄_1 = v̄_{-3}
Complexity (Reverse Mode)

grad      v = c    v = u ± w    v = u · w    v = φ(u)
MOVES     1 + 1    3 + 6        3 + 8        2 + 5
ADDS      0        1 + 2        0 + 2        0 + 1
MULTS     0        0            1 + 2        0 + 1
NLOPS     0        0            0            1 + 1

OPS(ȳᵀF'(x)) ≤ c · OPS(F(x))
MEM(ȳᵀF'(x)) ∼ OPS(F(x))

with c ∈ [3, 4] platform dependent

Remarks:
- cost for gradient calculation independent of n
- memory requirement may cause problems! ⇒ checkpointing
Algorithmic Differentiation (AD)

Differentiation of computer programs within machine precision

Basic forms:
forward mode:  OPS(F'(x) ẋ)  ≤ c₁ · OPS(F),  c₁ ∈ [2, 5/2]
reverse mode:  OPS(ȳᵀF'(x)) ≤ c₂ · OPS(F),  c₂ ∈ [3, 4]
               MEM(ȳᵀF'(x)) ∼ OPS(F)
combination:   OPS(ȳᵀF''(x) ẋ) ≤ c₃ · OPS(F),  c₃ ∈ [7, 10]

Tasks: derivatives (of any order), sparsity patterns, differentiation of fixed-point iterations and time-stepping procedures …

Implementations?
Intermediate Conclusions

- evaluation of derivatives with working accuracy
- efficient evaluation of derivatives of any order
- numerous tools available, based on source transformation or operator overloading
- nondifferentiability only on a meager set
- optimal Jacobian accumulation is (NP-)hard
- full Jacobians/Hessians often not needed or sparse
- see www.autodiff.org
Forward Mode of AD

[Diagram: a curve x(t) in the domain of F and its image y(t) = F(x(t)); the tangent ẋ at x is mapped by F' to the tangent ẏ at y.]

ẏ(t) = ∂/∂t F(x(t)) = F'(x(t)) · ẋ(t) ≡ Ḟ(x, ẋ)
Evaluation procedure for F(x) = exp(sin(x²)):

d/dt F(x(t)) = exp(sin(x²)) · cos(x²) · 2x · ẋ

v_0 = x              v̇_0 = ẋ
v_1 = v_0²           v̇_1 = 2 v_0 · v̇_0
v_2 = sin(v_1)       v̇_2 = cos(v_1) · v̇_1
v_3 = exp(v_2)       v̇_3 = exp(v_2) · v̇_2
y = v_3              ẏ = v̇_3 = d/dt F(x(t))
General: with u_i ≡ (v_j)_{j≺i},

v = φ_i(u_i)   ⇒   v̇ = φ_i'(u_i) · u̇_i

Elementary tangents for intrinsic functions:
v = sin(u)   ⇒   v̇ = cos(u) · u̇
v = √u       ⇒   v̇ = 0.5 · u̇ / v
v = tan(u)   ⇒   v̇ = u̇ / cos²(u)

Elementary tangents for arithmetic operations:
v = u ± w    ⇒   v̇ = u̇ ± ẇ
v = u · w    ⇒   v̇ = ẇ · u + u̇ · w
v = u / w    ⇒   v̇ = (u̇ − v · ẇ) / w
Forward Differentiation of the Lighthouse Example

v_{-3} = x_1                  v̇_{-3} = ẋ_1
v_{-2} = x_2                  v̇_{-2} = ẋ_2
v_{-1} = x_3                  v̇_{-1} = ẋ_3
v_0    = x_4 = t              v̇_0    = ẋ_4
v_1 = v_{-1} · v_0            v̇_1 = v̇_{-1} · v_0 + v_{-1} · v̇_0
v_2 = tan(v_1)                v̇_2 = v̇_1 / cos²(v_1)
v_3 = v_{-2} − v_2            v̇_3 = v̇_{-2} − v̇_2
v_4 = v_{-3} · v_2            v̇_4 = v̇_{-3} · v_2 + v_{-3} · v̇_2
v_5 = v_4 / v_3               v̇_5 = (v̇_4 − v̇_3 · v_5) / v_3
v_6 = v_5                     v̇_6 = v̇_5
v_7 = v_5 · v_{-2}            v̇_7 = v̇_5 · v_{-2} + v_5 · v̇_{-2}
y_1 = v_6                     ẏ_1 = v̇_6
y_2 = v_7                     ẏ_2 = v̇_7
Define φ̇_i(u_i, u̇_i) ≡ φ_i'(u_i) · u̇_i

General Tangent Procedure

[v_{i−n}, v̇_{i−n}] = [x_i, ẋ_i]              i = 1, …, n
[v_i, v̇_i] = [φ_i(u_i), φ̇_i(u_i, u̇_i)]      i = 1, …, l
[y_{m−i}, ẏ_{m−i}] = [v_{l−i}, v̇_{l−i}]      i = m − 1, …, 0
Complexity:

tang      v = c    v = u ± w    v = u · w    v = φ(u)
MOVES     1 + 1    3 + 3        3 + 3        2 + 2
ADDS      0        1 + 1        0 + 1        0 + 0
MULTS     0        0            1 + 2        0 + 1
NLOPS     0        0            0            1 + 1

OPS(F'(x) ẋ) ≤ c · OPS(F(x))

with c ∈ [2, 5/2] platform dependent

Remarks:
- How to arrange the evaluation of [v_i, v̇_i] optimally?
- Overwrites are no problem
Reverse Mode of AD

[Diagram, built up over several slides: the weight functional ȳ on the range, with level set ȳᵀy = c, is pulled back through F to the level set ȳᵀF(x) = c in the domain; its normal at x is the adjoint x̄.]

x̄ᵀ = (ȳᵀF'(x))ᵀ,  i.e.  x̄ = ȳᵀF'(x) ≡ F̄(x, ȳ)
Evaluation Procedure:

F(x) = exp(sin(x²)),   ȳ F'(x) = ȳ · exp(sin(x²)) · cos(x²) · 2x

Forward sweep:        Reverse sweep:
v_0 = x               v̄_3 = ȳ
v_1 = v_0²            v̄_2 = exp(v_2) · v̄_3
v_2 = sin(v_1)        v̄_1 = cos(v_1) · v̄_2
v_3 = exp(v_2)        v̄_0 = 2 v_0 · v̄_1
y = v_3               x̄ = v̄_0
Computation of ȳ F'(x)?

Rewrite the general tangent procedure

[v_{i−n}, v̇_{i−n}] = [x_i, ẋ_i]              i = 1, …, n
[v_i, v̇_i] = [φ_i(u_i), φ̇_i(u_i, u̇_i)]      i = 1, …, l
[y_{m−i}, ẏ_{m−i}] = [v_{l−i}, v̇_{l−i}]      i = m − 1, …, 0

in matrix-vector notation.
Define the elemental partials c_ij ≡ ∂φ_i/∂v_j and, for i = 1, …, l, the matrices

A_i ≡ I + e_{n+i} [∇φ_i(u_i) − e_{n+i}]ᵀ ∈ R^{(n+l)×(n+l)},

i.e. the identity matrix whose row n+i is replaced by (c_{i,1−n}, c_{i,2−n}, …, c_{i,i−1}, 0, …, 0), together with the projections

P_n ≡ [I, 0, …, 0] ∈ R^{n×(n+l)},
Q_m ≡ [0, …, 0, I] ∈ R^{m×(n+l)}.
The tangent operation v̇_i = φ_i'(u_i) u̇_i equals v̇ = A_i v̇.

The linear transformation ẏ = F'(x) ẋ equals

ẏ = Q_m A_l A_{l−1} ⋯ A_2 A_1 P_nᵀ ẋ,

with the product representation

F'(x) = Q_m A_l A_{l−1} ⋯ A_2 A_1 P_nᵀ ∈ R^{m×n}.

The linear transformation x̄ = ȳ F'(x) (reverse mode) equals

x̄ᵀ = P_n A_1ᵀ A_2ᵀ ⋯ A_{l−1}ᵀ A_lᵀ Q_mᵀ ȳᵀ.
Because of

A_iᵀ = I + [∇φ_i(u_i) − e_{n+i}] e_{n+i}ᵀ,

the product A_iᵀ v̄ with v̄ ∈ R^{n+l} amounts to the incremental operations:

1. indices j with i ≠ j and j ⊀ i: v̄_j is left unchanged
2. indices j ≺ i: v̄_j is incremented by v̄_i c_ij
3. index i: v̄_i is set to zero

Hence, for computing x̄ = ȳ F'(x) one has to
- evaluate all intermediates v_i, obtaining the partial derivatives c_ij
- increment the adjoint values v̄_j in reverse order
Corresponding Incremental Adjoint Recursion

v̄_i = 0                                     i = 1 − n, …, l
v_{i−n} = x_i                                i = 1, …, n
v_i = φ_i(v_j)_{j≺i}                         i = 1, …, l
y_{m−i} = v_{l−i}                            i = m − 1, …, 0
v̄_{l−i} = ȳ_{m−i}                           i = 0, …, m − 1
v̄_j += v̄_i · ∂φ_i(u_i)/∂v_j  for j ≺ i      i = l, …, 1
x̄_i = v̄_{i−n}                               i = n, …, 1

No overwriting ⇒ zeroing out of the v̄_i is omitted.
Adjoint Recursion of the Lighthouse Example

Forward sweep:
v_{-3} = x_1;  v_{-2} = x_2;  v_{-1} = x_3;  v_0 = x_4
v_1 = v_{-1} · v_0
v_2 = tan(v_1)
v_3 = v_{-2} − v_2
v_4 = v_{-3} · v_2
v_5 = v_4 / v_3
v_6 = v_5 · v_{-2}
y_1 = v_5;  y_2 = v_6

Reverse sweep:
v̄_5 = ȳ_1;  v̄_6 = ȳ_2
v̄_5 += v̄_6 · v_{-2};   v̄_{-2} += v̄_6 · v_5
v̄_4 += v̄_5 / v_3;      v̄_3 -= v̄_5 · v_5 / v_3
v̄_{-3} += v̄_4 · v_2;   v̄_2 += v̄_4 · v_{-3}
v̄_{-2} += v̄_3;         v̄_2 -= v̄_3
v̄_1 += v̄_2 / cos²(v_1)
v̄_{-1} += v̄_1 · v_0;   v̄_0 += v̄_1 · v_{-1}
x̄_4 = v̄_0;  x̄_3 = v̄_{-1};  x̄_2 = v̄_{-2};  x̄_1 = v̄_{-3}
Reverse Operations of Elementals

Elemental        Reverse
v = c            v̄ = 0
v = u ± w        ū += v̄;  w̄ ±= v̄;  v̄ = 0
v = u · w        ū += v̄ · w;  w̄ += v̄ · u;  v̄ = 0
v = 1/u          ū -= (v · v) · v̄;  v̄ = 0
v = √u           ū += 0.5 · v̄ / v;  v̄ = 0
v = u^c          ū += (v̄ · c) · v / u;  v̄ = 0
v = exp(u)       ū += v̄ · v;  v̄ = 0
v = log(u)       ū += v̄ / u;  v̄ = 0
v = sin(u)       ū += v̄ · cos(u);  v̄ = 0
Complexity:

grad      v = c    v = u ± w    v = u · w    v = φ(u)
MOVES     1 + 1    3 + 6        3 + 8        2 + 5
ADDS      0        1 + 2        0 + 2        0 + 1
MULTS     0        0            1 + 2        0 + 1
NLOPS     0        0            0            1 + 1

OPS(ȳ F'(x)) ≤ c · OPS(F(x))

with c ∈ [3, 5] platform dependent

Remark:
- Overwrites cause problems! See checkpointing.
Interpretation of the adjoint values v̄_i:

- Lagrange multipliers of
  min ȳ y   s.t.   φ_i(u_i) − v_i = 0,    i = 1, …, l
                   v_{l−i} − y_{m−i} = 0,  i = 1, …, m
- Error propagation factors: with perturbed intermediates ṽ_i = v_i (1 + ε_i), |ε_i| ≤ macheps,

  ỹ − y ≈ Σ_{i=1}^{l} v̄_i v_i ε_i + h.o.t.  ⇒  |ỹ − y| ≲ macheps · Σ_{i=1}^{l} |v̄_i v_i| + h.o.t.
Vector Modes

How to reduce overhead in forward and reverse mode? Propagate a bundle of vectors instead of a single vector!

Forward mode: compute

Y = F'(x) X ∈ R^{m×p}   for   X ∈ R^{n×p}

instead of

ẏ = F'(x) ẋ ∈ R^m   for   ẋ ∈ R^n.

- replace v̇_j ∈ R by V̇_j ∈ R^p in the tangent procedure
- replace ż ∈ R^{l−m} by Ż ∈ R^{(l−m)×p}
- everything else remains unchanged
Vector Tangents:

v = u ± w    ⇒   V̇ = U̇ ± Ẇ
v = u · w    ⇒   V̇ = w · U̇ + u · Ẇ
v = sin(u)   ⇒   V̇ = cos(u) · U̇

Complexity:

tang_p    v = c    v = u ± w    v = u · w    v = φ(u)
MOVES     1 + p    3 + 3p       3 + 3p       2 + 2p
ADDS      0        1 + p        0 + p        0 + 0
MULTS     0        0            1 + 2p       0 + p
NLOPS     0        0            0            2

OPS(F'(x) X) ≤ c · OPS(F(x)),  with c ∈ [1 + p, 1 + 1.5p]
Applications:

- Computation of the full Jacobian: set X = I_n, then
  OPS(F'(x)) ≤ (1 + 1.5n) · OPS(F(x)).
  Compare with the scalar mode, i.e. F'(x) e_i for i = 1, …, n:
  OPS(F'(x)) ≤ 2.5n · OPS(F(x)).
  ⇒ remarkable run-time savings can be obtained
- Computation of sparse Jacobians
Reverse mode: compute

X̄ = Ȳ F'(x) ∈ R^{q×n}   for   Ȳ ∈ R^{q×m}

instead of

x̄ = ȳ F'(x) ∈ R^n   for   ȳ ∈ R^m.

Implementation:
- replace v̄_i ∈ R by V̄_i ∈ R^q in the adjoint recursion
- everything else remains unchanged
Complexity:

grad_q    v = c    v = u ± w    v = u · w    v = φ(u)
MOVES     1 + q    3 + 6q       5 + 6q       3 + 4q
ADDS      0        1 + 2q       0 + 2q       0 + q
MULTS     0        0            1 + 2q       0 + q
NLOPS     0        0            0            2

OPS(Ȳ F'(x)) ≤ c · OPS(F(x)),  with c ∈ [1 + 2q, 1.5 + 2.5q]

Remarks:
- computation of the full Jacobian
- remarkable run-time savings can be obtained
- application: e.g. matrix compression for sparse Jacobians
General Evaluation Procedure:

v_{i−n} = x_i             for i = 1, …, n
v_i = φ_i(v_j)_{j≺i}      for i = 1, …, l
y_{m−i} = v_{l−i}         for i = m − 1, …, 0

where
- v_i, i ≤ 0, are the independents
- v_i, i ≥ l − m + 1, are the dependents
- φ_i is an elemental function, Φ = set of elemental functions
- j ≺ i is the dependence relation: v_i depends directly on v_j
            Essential                   Optional                      Vector
Smooth      u + v, u · v                u − v, u/v                    Σ_k u_k v_k
            c                           c · u, c − u                  Σ_k c_k u_k
            rec(u) ≡ 1/u                                              (1/u_k)_k
            exp(u), log(u)              u^c, arcsin(u)
            sin(u), cos(u)              tan(u), |u|^c (c > 1), …
Lipschitz   abs(u) ≡ |u|                max(u, v)                     max_k |u_k|, Σ_k |u_k|
            |(u, v)| ≡ √(u² + v²)       min(u, v)                     √(Σ_k u_k²)
General     heav(u)                     sign(u), (u > 0) ? v : w
Representation as Computational Graph

[Figure: computational graph of the lighthouse example; the input nodes v_{-3}, v_{-2}, v_{-1}, v_0 feed the intermediates v_1, v_2, v_3, v_4, v_5 and the outputs v_6, v_7.]
Representation as Extended System E(x; v) = 0

- Define
  u_i ≡ (v_j)_{j≺i}          for i = 1 − n, …, l
  φ_i(u_i) ≡ x_{i+n}         for i = 1 − n, …, 0
  c_ij ≡ ∂φ_i/∂v_j, called the elemental partials
- Assume the dependent variables to be mutually independent

Then one has that
- i < 1 or j > l − m implies c_ij ≡ 0
- the evaluation procedure is equivalent to the nonlinear system
  0 = E(x; v) ≡ (φ_i(u_i) − v_i)_{i=1−n, …, l}
Topics to keep in mind:

- Composite differentiation (overall differentiability): here it is assumed that all φ_i are differentiable at their respective arguments.
- Overwriting = shared allocation of v_i and v_j: not considered here, but there are ways to handle it.
- Aliasing of arguments and results: not considered here, but again there are ways to handle it.
Main Results:

- gradients are cheap: a small constant times the function cost
- costs grow only quadratically in the derivative degree
- nondifferentiability only on a meager set
- optimal Jacobian accumulation is (NP-)hard
- Jacobian accumulation is often unnecessary
Higher Order Derivatives

Second derivatives: use forward and reverse differentiation together!

      y = F(x)         --reverse differentiation-->   x̄ = ȳ F′(x)
      x̄ = ȳ F′(x)     --forward differentiation-->   x̄̇ = ȳ̇ F′(x) + ȳ F″(x) ẋ

For a scalar-valued function F, setting ȳ = 1, ȳ̇ = 0, and arbitrary ẋ one has

      x̄̇ = F″(x) ẋ = Hessian-vector product
Implementation

(Figure: the source code for y = F(x) is compiled to object code that maps x ∈ Rⁿ to y = F(x) ∈ Rᵐ. A forward sweep maps (x, ẋ) ∈ R²ⁿ to ẏ = F′(x)ẋ ∈ Rᵐ and writes the intermediate values to a stack (tape). A reverse sweep reads the tape and maps (x, ȳ) ∈ Rⁿ⁺ᵐ to x̄ = ȳF′(x) ∈ Rⁿ, or to x̄̇ = ȳF″(x)ẋ ∈ Rⁿ at second order.)
Higher derivatives: Think in Taylor polynomials

Let x be the path

      x(t) ≡ Σ_{j=0}^{d} x_j t^j : R → Rⁿ   with   x_j = (1/j!) ∂^j x(t)/∂t^j |_{t=0}

Hence
- x_j is the scaled derivative at t = 0
- x_1 = tangent, x_2 = curvature

Now: consider F : Rⁿ → Rᵐ, d times continuously differentiable
⇒ y(t) ≡ F(x(t)) : R → Rᵐ is smooth (chain rule)
There exist Taylor coefficients y_j ∈ Rᵐ, 0 ≤ j ≤ d, with

      y(t) = Σ_{j=0}^{d} y_j t^j + O(t^{d+1}).

Using the chain rule, one obtains:

      y_0 = F(x_0)
      y_1 = F′(x_0) x_1
      y_2 = F′(x_0) x_2 + (1/2) F″(x_0) x_1 x_1
      y_3 = F′(x_0) x_3 + F″(x_0) x_1 x_2 + (1/6) F‴(x_0) x_1 x_1 x_1

⇒ directional derivatives of high order (forward mode)

      OPS(x_0, …, x_d → y_0, …, y_d) ∼ d² OPS(x_0 → y_0)

using Taylor arithmetic
Cross Country / Preaccumulation

Consider again the Jacobian of the extended system E(x; v):

      E′(x; v) = [ −I    0    0
                    B   L−I   0
                    R    T   −I ]

with F′(x) = R + T (I − L)⁻¹ B = Schur complement

- Forward mode: row-block elimination of T by L − I
- Reverse mode: column-block elimination of B by L − I
- Cross country: apply another elimination order
Another Interpretation

Elimination in the Jacobian of the extended system = vertex elimination in the computational graph.

Example for forward and reverse mode:

(Figure: computational graph with vertices −1, 0, 1, 2, 3, 4, 5, 6 and edge labels c_{1,−1}, c_{1,0}, c_{2,1}, c_{3,2}, c_{4,3}, c_{5,1}, c_{5,4}, c_{6,0}, c_{6,4}.)
AD by Source Transformation

Code for simulation (func.src) → code for derivatives (func&der.src):

1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Static data flow analyses (symbol table, error handler)
5. AD !!!
6. Code optimization
7. Unparsing
Remarks

Pros:
- Makes use of compiler optimization !!
- Derivative code is readable

Cons:
- Requires complete compiler: activity analysis, new language features
- Less flexibility
Speelpenning's Example

Speelpenning's function f(x) = ∏_{i=1}^{n} x_i coded in Fortran:

      SUBROUTINE speelpenning(x, y)
      DOUBLE PRECISION x(10), y
      INTEGER i
      y = 1.0
      DO i = 1, 10
        y = y*x(i)
      END DO
      END
Forward Mode Differentiation

Requirements for the AD tool:
- determine active variables
- associate derivative objects
- generate differentiated statements
- generate differentiated subroutines if required
Forward Mode Differentiation

      SUBROUTINE SPEELPENNING_D(x, xd, y, yd)
      DOUBLE PRECISION x(10), xd(10), y, yd
      INTEGER i
      y = 1.0
      yd = 0.D0
      DO i = 1, 10
        yd = yd*x(i) + y*xd(i)
        y = y*x(i)
      END DO
      END
Reverse Mode Differentiation

Requirements for the AD tool:
- determine active variables
- associate derivative objects
- generate adjoint statements
- determine values to be recorded
- reverse the control flow
- generate the recording part of the code
Reverse Mode Differentiation

      SUBROUTINE SPEELPENNING_B(x, xb, y, yb)
      DOUBLE PRECISION x(10), xb(10), y, yb
      INTEGER i, adTo
      y = 1.0
      DO i = 1, 10
        CALL PUSHREAL8(y)
        y = y*x(i)
      END DO
      CALL PUSHINTEGER4(i - 1)
      CALL POPINTEGER4(adTo)
      DO i = adTo, 1, -1
        CALL POPREAL8(y)
        xb(i) = xb(i) + y*yb
        yb = x(i)*yb
      END DO
      yb = 0.D0
      END
Source Transformation Tools

Fortran:

      AMPL       Bell Labs, USA                     forward + reverse
      NAG        RWTH Aachen, Germany + NAG         forward + reverse
      TAF        FastOpt, Germany                   forward + reverse
      Tapenade   INRIA, France                      forward + reverse
      OpenAD     ANL, USA; RWTH Aachen, Germany     forward
      ...

All tools provide at most second derivatives!
Goal: get AD into the compiler

Matlab:

      ADIMAT          RWTH Aachen, Germany    forward
      MAD (in TOMS)   Univ. Shrivenham, GB    forward
      ...
AD by Operator Overloading

Implementation strategies:
- New variable type for active variables,
  i.e., for independents, dependents, intermediates (!)
- Derivative information included as
  - field in composite structures
  - generation of internal function representation

Principle: new class definition

      type ad_double = class
        VAL : double;
        ???
      end;
AD by Field in Composite Structure

Idea for forward mode:

      type ad_double = class
        VAL : double;
        DER : array [1..p] of double;
      end;

All operations are overloaded, e.g., multiplication:

      operator * (U, V : ad_double) W : ad_double;
      var i : integer;
      begin
        W.VAL := U.VAL * V.VAL;
        for i := 1 to p do
          W.DER[i] := U.VAL*V.DER[i] + U.DER[i]*V.VAL;
      end;
Remarks:
- Implements the basic rules of differentiation for
  - each arithmetic operation
  - each intrinsic function
- Easy to realize → tapeless forward mode of ADOL-C

Idea for reverse mode: graph in core
Need of backward pointers for calculating adjoints ⇒ new node type:

      VAL  IND
      DER  ADJ
W := U * V generates

(Figure: a new node with fields W.VAL, W.DER, W.IND, W.ADJ, linked back to the nodes holding U.VAL/U.IND/U.DER/U.ADJ and V.VAL/V.IND/V.DER/V.ADJ.)

Data dependency in opposite direction !!
Remarks:
- Calculation of derivatives only possible after function evaluation at the same point
- Complete computational graph is kept in core ⇒ amount of internal storage may cause problems
- Access to nodes largely at random

Conclusion:
Derivative calculation in an active class is OK for forward mode but not practicable for reverse mode on large problems !!
AD by Internal Representation

Idea:

      class adouble {
        VAL : double;
        IND : integer;
      }

W = U * V causes recording onto three tapes:

      INDEXCOUNT += 1
      W.IND = INDEXCOUNT
      W.VAL = U.VAL * V.VAL
      W.VAL                    → VALUE-TAPE
      MULT                     → OPERATION-TAPE
      (U.IND, V.IND, W.IND)    → INDEX-TAPE
Remarks for Tape-based Derivatives

- Mechanical application of the chain rule based on the tapes
- Data access on tapes is strictly sequential
- Tapes can be reused for function / derivative calculation at different points
- Several tapes for different control flows depending on active variables
General Remarks for Operator Overloading

Pros:
- Only choice for C/C++
- Great flexibility with respect to mode/order
- Simple to maintain

Cons:
- Compiler optimization may be lost
- Only object code for derivatives
Operator Overloading Tools

C/C++:

      AD01     Cranfield Uni., GB                forward + reverse
      ADOL-C   Uni. Paderborn, Germany           forward + reverse
      CppAD    Uni. Washington, USA              forward + reverse
      FADBAD   TU Denmark                        forward + reverse
      TADIFF   TU Denmark                        forward
      ...

Matlab:

      ADMAT    Cayuga Research Associates, USA   forward + reverse
      ...
Conclusions

- Evaluation of derivatives with working accuracy
- Forward mode:  OPS(F′(x)ẋ) ≤ c · OPS(F),  c ∈ [2, 5/2]
- Reverse mode:  OPS(ȳᵀF′(x)) ≤ c · OPS(F),  c ∈ [3, 4]
  ⇒ Gradients are cheap ~ function costs !!
- Combination:  OPS(ȳᵀF″(x)ẋ) ≤ c · OPS(F),  c ∈ [7, 10]
- Cost of higher derivatives grows quadratically in the degree
Conclusions

- Evaluation of derivatives with working accuracy
- Efficient evaluation of derivatives of any order
- Numerous tools available, based on source transformation or operator overloading
- Nondifferentiability only on a meager set
- Optimal Jacobian accumulation is NP-hard
- Full Jacobians/Hessians often not needed or sparse
- see www.autodiff.org
Automatic differentiation by overloading in C++

ADOL-C version 2.1

- reorganization of taping: tape-dependent information kept in separate structures
- different differentiation contexts
- documented external function facility
- documented fixpoint iteration facility
- documented checkpointing facility based on revolve
- documented parallelization of derivative calculation
- coupled with ColPack for exploitation of sparsity
- available at COIN-OR since May 2009
Source Code Modifications

- Include the needed header files, easiest way: #include "adolc.h"
- Define the region that has to be differentiated:

      trace_on(tag, keep);    ← start of
      ...                       active section
      trace_off(file);        ← and its end

- Mark independents and dependents in the active section:

      xa <<= xp;    ← mark independents
      ya >>= yp;    ← mark dependents

- Declare all active variables of type adouble
- Calculate derivative objects after trace_off(file)
Evaluation Procedure (Lighthouse)

      y1 = ν · tan(ω t) / (γ − tan(ω t))
      y2 = γ · ν · tan(ω t) / (γ − tan(ω t))

      v_{−3} = x1 = ν
      v_{−2} = x2 = γ
      v_{−1} = x3 = ω
      v_0    = x4 = t
      v_1 = v_{−1} · v_0      = φ_1(v_{−1}, v_0)
      v_2 = tan(v_1)          = φ_2(v_1)
      v_3 = v_{−2} − v_2      = φ_3(v_{−2}, v_2)
      v_4 = v_{−3} · v_2      = φ_4(v_{−3}, v_2)
      v_5 = v_4 / v_3         = φ_5(v_4, v_3)
      v_6 = v_5 · v_{−2}      = φ_6(v_5, v_{−2})
      y_1 = v_5
      y_2 = v_6
Model Generation with ADOL-C

(Figure: computational graph with inputs x1, …, x4 and intermediates v1, …, v6.)

Undifferentiated code:

      double x1, x2, x3, x4;
      double v1, v2, ...;
      double y1, y2;

      x1 = 3.7; x2 = 0.7;
      x3 = 0.5; x4 = 0.5;

      v1 = x3*x4;
      v2 = tan(v1);
      v3 = x2-v2;
      v4 = x1*v2;
      v5 = v4/v3;
      v6 = v5*x2;

      y1 = v5; y2 = v6;
Model Generation with ADOL-C

Active (taped) code:

      adouble x1, x2, x3, x4;
      adouble v1, v2, ...;
      double y1, y2;

      trace_on(tag);
      x1 <<= 3.7; x2 <<= 0.7;
      x3 <<= 0.5; x4 <<= 0.5;
      v1 = x3*x4;
      v2 = tan(v1);
      v3 = x2-v2;
      v4 = x1*v2;
      v5 = v4/v3;
      v6 = v5*x2;
      v5 >>= y1; v6 >>= y2;
      trace_off();
Easy-to-use routines

      int gradient(tag, n, x[n], g[n])
          tag  = tape number
          n    = # indeps
          x[n] = values of indeps
          g[n] = ∇f(x)

      int jacobian(tag, m, n, x[n], J[m][n])
          tag     = tape number
          m       = # deps
          n       = # indeps
          x[n]    = values of indeps
          J[m][n] = F′(x)

      int hessian(tag, n, x[n], H[n][n])
          tag     = tape number
          n       = # indeps
          x[n]    = values of indeps
          H[n][n] = ∇²f(x)
Further drivers for optimization:

- vec_jac(tag, m, n, repeat, x[n], u[m], z[n])
  computes z = uᵀ F′(x)
- jac_vec(tag, m, n, x[n], v[n], z[n])
  computes z = F′(x) v
- hess_vec(tag, n, x[n], v[n], z[n])
  computes z = ∇²f(x) v
- lagra_hess_vec(tag, n, m, x[n], v[n], u[m], h[n])
  computes h = uᵀ F″(x) v; extension to uᵀ F″(x) V available
- jac_solv(tag, n, x[n], b[n], sparse, mode)
  computes w with F′(x) w = b and stores the result in b
Drivers for sparse Derivative Matrices

Jacobian:

      sparse_jac(tag, m, n, repeat, x, &nnz, &row_ind, &col_ind,
                 &values, options);
      jac_pat(tag, m, n, x, JP, options);
      generate_seed_jac(m, n, JP, &seed, &p, option);

Hessian:

      sparse_hess(tag, n, repeat, x, &nnz, &row_ind, &col_ind,
                  &values, options);
      hess_pat(tag, n, x, HP, options);
      generate_seed_hess(n, HP, &seed, &p, option);

Graph coloring routines by ColPack (Gebremedhin, Pothen)
Advanced taping techniques

(Figure: a computation consisting of Part A, Part B, Part C, recorded either on one basic ADOL-C tape, on nested ADOL-C tapes, or with parts handled as external differentiated functions.)
Differentiation of fixpoint iterations

(Figure: a basic ADOL-C tape covers initialization, iteration, and evaluation; recording the iteration as fp_iteration() on a nested ADOL-C tape lets the reverse evaluation use repeated subtape evaluation instead of reversing every iteration.)
Summary and Outlook

- AD for C and C++ ⇒ derivatives of any order
- Easy-to-use routines for standard derivative objects: optimization, ODEs, tensors, ...
- Randomly accessed memory of original magnitude
- Sequential access of data on tape, i.e., array or file
- Wide range of applications: aerodynamics, chemical engineering, medicine, network optimization, ...

Several ideas for improvement:
- runtime activity, parallelization, front ends, ...

Contact: http://www.coin-or.org/projects/ADOL-C.xml