HDMR Methodology

GLOBAL SENSITIVITY ANALYSIS BY RANDOM SAMPLING-HIGH DIMENSIONAL MODEL REPRESENTATION (RS-HDMR)

Herschel Rabitz

Department of Chemistry, Princeton University, Princeton, New Jersey 08544

HDMR Methodology

• HDMR expresses a system output as a hierarchical correlated function expansion of inputs:

HDMR Methodology (Contd.)

• HDMR component functions are optimally defined as:

- where the weights are the unconditional and conditional probability density functions of the inputs

RS (Random Sampling) – HDMR (Contd.)

• RS-HDMR component functions are approximated by expansions of orthonormal polynomials

- Inputs can be sampled independently and/or in a correlated fashion

- Only one set of data is needed to determine all of the component functions

- Statistical analysis (F-test) is used to determine the proper truncation of the RS-HDMR expansion
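As a minimal sketch of the orthonormal-polynomial step, assuming the inputs are rescaled to [0, 1] and sampled independently and uniformly, the first-order coefficients can all be estimated as sample means of y·φ_r(x_i) from one data set. The shifted Legendre basis, the cutoff at degree 3, the toy model, and the function names are illustrative assumptions, not the original implementation; the F-test truncation step is not shown.

```python
import numpy as np

def phi(r, x):
    """Orthonormal shifted Legendre polynomial of degree r on [0, 1].

    For x ~ U(0, 1): E[phi_r] = 0 and E[phi_r * phi_s] = 1 if r == s else 0.
    """
    if r == 1:
        return np.sqrt(3.0) * (2 * x - 1)
    if r == 2:
        return np.sqrt(5.0) * (6 * x**2 - 6 * x + 1)
    if r == 3:
        return np.sqrt(7.0) * (20 * x**3 - 30 * x**2 + 12 * x - 1)
    raise ValueError("only degrees 1 to 3 are implemented in this sketch")

def first_order_coefficients(x, y, degree=3):
    """Monte Carlo estimates alpha[i, r-1] = mean(y * phi_r(x_i)) from one sample set."""
    yc = y - y.mean()  # centering does not change the expectation but lowers MC noise
    alpha = np.empty((x.shape[1], degree))
    for i in range(x.shape[1]):
        for r in range(1, degree + 1):
            alpha[i, r - 1] = np.mean(yc * phi(r, x[:, i]))
    return alpha

def component(alpha_i, xi):
    """Evaluate the first-order component f_i(x_i) = sum_r alpha_ir * phi_r(x_i)."""
    return sum(a * phi(r, xi) for r, a in enumerate(alpha_i, start=1))

# Hypothetical toy model y = x0 + x1**2 with 200,000 random samples.
rng = np.random.default_rng(0)
x = rng.random((200_000, 2))
y = x[:, 0] + x[:, 1] ** 2
alpha = first_order_coefficients(x, y)
```

For this toy model the exact component for x1 is x1² − 1/3, which lies entirely in the span of φ_1 and φ_2, so the estimated φ_3 coefficient should be close to zero.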

Global Sensitivity Analysis by RS-HDMR

• Individual RS-HDMR component functions have a direct statistical correlation interpretation, which permits the model output variance to be decomposed into its input contributions

- where σ_i², σ_ij², … are defined as the covariances of f_i(x_i), f_ij(x_i, x_j), … with f(x), respectively
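The covariance definition can be checked numerically. In the sketch below, the model f = x1 + 2·x2 + x1·x2 with independent inputs uniform on [0, 1] is a hypothetical example chosen because its component functions are known in closed form (f0 = 1.75). Each index is Cov(f_k, f)/Var(f); since the components add up to f − f0, the indices sum to one.

```python
import numpy as np

def covariance_indices(f, components):
    """Sensitivity index S_k = Cov(f_k, f) / Var(f) for each sampled component f_k."""
    fc = f - f.mean()
    var_f = np.mean(fc ** 2)
    return {name: float(np.mean((fk - fk.mean()) * fc) / var_f)
            for name, fk in components.items()}

# Hypothetical model with independent inputs x1, x2 ~ U(0, 1).
rng = np.random.default_rng(1)
x1, x2 = rng.random(100_000), rng.random(100_000)
f = x1 + 2 * x2 + x1 * x2
components = {
    "1": 1.5 * x1 - 0.75,            # exact f_1(x1)
    "2": 2.5 * x2 - 1.25,            # exact f_2(x2)
    "12": (x1 - 0.5) * (x2 - 0.5),   # exact f_12(x1, x2)
}
S = covariance_indices(f, components)
print(S)  # approximately {'1': 0.26, '2': 0.73, '12': 0.01}
```

With independent inputs Cov(f_k, f) reduces to Var(f_k); the covariance form is what keeps the decomposition summing to the total variance when inputs are correlated.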

A Propellant Ignition Model

Calculated profiles of temperature and major mole fractions for the ignition and combustion of the M10 solid propellant


• 10 independent and 44 cooperative contributions of inputs were identified as significant


• Nonlinear global sensitivity indexes efficiently identified all significant contributions of inputs

Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling

Microenvironmental/exposure/dose modeling system

Structure of TCE-PBPK model (adapted from Fisher et al., 1998)

Example: Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling

• The coupled microenvironmental/pharmacokinetic model:

- Three exposure routes (inhalation, ingestion, and dermal absorption)

- Release of TCE from water into the air within the residence

- Activities of individuals and physiological uptake processes

• Seven input variables [age (x1), tap water concentration (x2), shower stall volume (x3), drinking water consumption rate (x4), shower flow rate (x5), shower time (x6), time in bathroom after shower (x7)] are used to construct the RS-HDMR orthonormal polynomials

• Target outputs: the total internal doses from intake (inhalation and ingestion) and uptake (dermal absorption)

- The amount inhaled or ingested:

- The amount absorbed:

- C(t): exposure concentration, IR(t): inhalation or ingestion rate, Kp: permeability coefficient, SA(t): surface area exposed
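A minimal sketch of the two dose quantities, assuming the usual time-integral forms: amount inhaled or ingested = ∫ C(t)·IR(t) dt, and amount absorbed = ∫ Kp·C(t)·SA(t) dt. Both forms, the function names, and the example profile are assumptions for illustration; units only need to be consistent.

```python
import numpy as np

def trapezoid(values, t):
    """Trapezoidal rule, written out so it does not depend on the NumPy version."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(t)))

def intake(t, conc, intake_rate):
    """Amount inhaled or ingested, assumed to be the integral of C(t) * IR(t) dt."""
    return trapezoid(conc * intake_rate, t)

def dermal_uptake(t, conc, kp, area):
    """Amount absorbed, assumed to be the integral of Kp * C(t) * SA(t) dt."""
    return trapezoid(kp * conc * area, t)

# Hypothetical exposure event: concentration rising linearly, constant intake rate.
t = np.linspace(0.0, 15.0, 151)
conc = 0.01 * t
rate = np.full_like(t, 0.02)
print(intake(t, conc, rate))  # 0.0225 (the trapezoid rule is exact for a linear integrand)
```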


• Inputs (x1, x2, x3, x4) have a uniform distribution, and inputs (x5, x6, x7) have a triangular distribution; 10,000 input-output samples were generated
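The sampling step can be sketched with NumPy's random generator. The distribution parameters are not given on the slide, so the unit ranges and the triangular mode of 0.5 below are hypothetical placeholders (inputs shown already scaled to [0, 1]).

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 10_000

# Hypothetical parameters; the real input ranges are not stated here.
uniform_ranges = [(0.0, 1.0)] * 4            # x1..x4: (low, high)
triangular_params = [(0.0, 0.5, 1.0)] * 3    # x5..x7: (left, mode, right)

columns = [rng.uniform(low, high, n) for low, high in uniform_ranges]
columns += [rng.triangular(left, mode, right, n) for left, mode, right in triangular_params]
X = np.column_stack(columns)  # shape (10000, 7): one row per model run

# Each row of X would be passed to the exposure/PBPK model to produce one output value.
```

The triangular columns concentrate around the mode, which is why their histograms differ from the flat histograms of the uniform inputs.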

The data distributions for the uniformly distributed variable x1 and the triangularly distributed variable x5


• Seven independent, fifteen 2nd order and one 3rd order cooperative contributions of inputs were identified as significant

First order sensitivity indexes


• Nonlinear global sensitivity indexes (2nd order and above) efficiently identified all significant contributions of inputs

The ten largest 2nd and 3rd order sensitivity indexes

Identification of bionetwork model parameters

Characteristics of the problem:
- System nonlinearity
- Limited number and type of experiments
- Considerable biological and measurement noise

Multiple solutions exist!

Problems with traditional identification methods:
- They provide only one or a few solutions for each parameter
- They assume linear propagation of data noise into parameter uncertainties

The closed-loop identification protocol (CLIP):
- Extracts the full parameter distribution by global identification
- Iteratively searches for the most informative experiments to minimize parameter uncertainty

[Figure: CLIP flow diagram. Control module: controlled system perturbations uc(t) and property measurements Xr(t) under laboratory constraints, cost R(u,X), control objective Jctrl. Inversion module: previous knowledge, proposed mechanism, trial solutions (k0), best solution distribution (k*). Analysis module. A learning algorithm guides the experiments through Qinv. Stages: pre-lab analysis and design of the most informative experiments; iterative experiment optimization and data acquisition; global parameter identification.]

General operation of CLIP

Isoleucyl-tRNA synthetase proofreading of valyl-tRNA^Ile

Okamoto and Savageau, Biochemistry, 23:1701-1709 (1984)

[Figure: reaction scheme; * marks the rate constants to be identified]

The inversion module: identifying the rate constant distribution

The Genetic Algorithm (GA)

Mutation (single-bit flips):

1101 1111 + 1100 0010 → 1101 1101 + 1100 0110

Crossover (exchange of the last four bits):

1101 1100 + 1111 0010 → 1101 0010 + 1111 1100
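The two bit-string operations can be written as small functions; the example strings are the ones on this slide (spaces removed). How rate constants are encoded as bits and how parents are selected are not specified here, so random_mutation and its rate parameter are hypothetical additions.

```python
import random

def mutate(bits: str, positions) -> str:
    """Flip the bits at the given 0-based positions."""
    out = list(bits)
    for p in positions:
        out[p] = "1" if out[p] == "0" else "0"
    return "".join(out)

def random_mutation(bits: str, rate: float, rng: random.Random) -> str:
    """Hypothetical helper: flip each bit independently with probability `rate`."""
    return mutate(bits, [i for i in range(len(bits)) if rng.random() < rate])

def crossover(a: str, b: str, point: int):
    """Single-point crossover: the two parents exchange everything after `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]

print(mutate("11011111", [6]), mutate("11000010", [5]))  # 11011101 11000110
print(crossover("11011100", "11110010", 4))              # ('11010010', '11111100')
```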

The inversion cost function

$$J^{inv}_{p} = \frac{1}{NT}\sum_{n=1}^{N}\sum_{i=1}^{T}\frac{|X^{cal}_{p,n,t_i} - X^{lab}_{n,t_i}|}{|X^{lab}_{n,t_i}| + \varepsilon}$$

[Figure: distribution of k2 (random control); number of solutions versus k2 (s⁻¹, log scale)]

Typical rate constant distribution after random perturbation/control


Inversion quality index Q

The analysis module: estimating the most informative experiments

- Estimate the best species for monitoring system behavior
- Determine the best species for perturbing the system

Nonlinear sensitivity analysis by Random-Sampling High Dimensional Model Representation (RS-HDMR)

$$\sigma_{total}^2 = \sum_{i=1}^{n}\sigma_i^2 + \sum_{1\le i<j\le n}\sigma_{ij}^2 + \cdots$$

Optimally controlled identification: squeezing on the rate constant distribution

The control cost function

$$J^{ctrl}_i = Q^{inv}_i - \omega\, R[u^c_i(t), X^r_i(t)]$$

$$Q^{inv}_i = 1\Big/\Big[1 + \frac{1}{M}\sum_{m=1}^{M}\frac{k^{max}_{im} - k^{min}_{im}}{k^{max}_{m} - k^{min}_{m}}\Big]$$
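A minimal sketch of how an inversion quality index and a control cost of this kind could be computed from an ensemble of identified rate-constant sets. It assumes Q = 1/[1 + (1/M) Σ_m (k_max,im − k_min,im)/(k_max,m − k_min,m)], i.e. one over one plus the mean width of the identified ranges relative to the search ranges, and J_ctrl = Q − ω·R; both forms and all names are assumptions for illustration, not the published definitions.

```python
import numpy as np

def inversion_quality(solutions, k_min_range, k_max_range):
    """Assumed form: Q = 1 / (1 + mean_m[(max_m - min_m) / (k_max_m - k_min_m)]).

    solutions:   array (n_solutions, M) of rate-constant sets found by the inversion
    k_min_range: length-M lower bounds of the search range for each rate constant
    k_max_range: length-M upper bounds of the search range
    Q approaches 1 as the identified distribution collapses to a single point.
    """
    solutions = np.asarray(solutions, dtype=float)
    widths = solutions.max(axis=0) - solutions.min(axis=0)
    ranges = np.asarray(k_max_range, dtype=float) - np.asarray(k_min_range, dtype=float)
    return float(1.0 / (1.0 + np.mean(widths / ranges)))

def control_cost(q_inv, r_cost, omega):
    """Assumed form J_ctrl = Q_inv - omega * R, to be maximized over the controls."""
    return q_inv - omega * r_cost

print(inversion_quality([[0, 0], [1, 1]], [0, 0], [1, 1]))  # 0.5: solutions span the full range
print(inversion_quality([[0.5, 0.5]], [0, 0], [1, 1]))      # 1.0: a single, sharp solution
```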

Feng and Rabitz, Biophys. J., 86:1270-1281 (2004)
Feng, Rabitz, Turinici, and Le Bris, J. Phys. Chem. A, 110:7755-7762 (2006)

Network property optimization:

A. Identifying the best targeted network locations for intervention

B. Identifying the optimal network control

[Figure: learning loop linking Initial Guess/Random Control, Control Design, Biological System, Observed Response, Control Objective, and Learning Algorithm, leading to Optimal Controls and Optimal Network Performance]

A. Molecular target identification for network engineering

[Figure: genetic circuit built from the genes lacI, cI, and eyfp with promoters p(lacIq), P(lac), and P(R-O12); proteins LacI, CI, and EYFP; inducer IPTG; a question mark on one element]

Random-sampling high dimensional model representation (RS-HDMR)

$$y = f(\mathbf{k}) = f_0 + \sum_{i=1}^{N} f_i(k_i) + \sum_{1\le i<j\le N} f_{ij}(k_i, k_j) + \cdots$$

Randomly sample k

$$\sigma_{total}^2 = \sum_{i=1}^{N}\sigma_i^2 + \sum_{1\le i<j\le N}\sigma_{ij}^2 + \cdots$$

Advantages of RS-HDMR:

- Global sensitivity analysis
- Nonlinear component functions
- Physically meaningful representation
- Favorable scalability

Li, Rosenthal, and Rabitz, J. Phys. Chem. A, 105:7765-7777 (2001)


[Figure: bar charts for k6 and for k10 to k13; mutant groups p110, pR1, pR2, pR3 and p107, pM4, pM5, pM6, each measured at IPTG = 0 and IPTG = 1 mM; annotations "k10 to k13 fixed" and "k6 fixed"]

Feng, Hooshangi, Chen, Li, Weiss, and Rabitz, Biophys. J., 87:2195-2202 (2004)

Laboratory data on the mutants

Example: Biochemical multi-component formulation mapping

• Allosteric regulation of aspartate transcarbamoylase (ATcase) in vitro by all four ribonucleoside triphosphates (NTPs)

• ATcase activity (output) was measured for 300 random NTP concentration combinations (inputs) in the laboratory

• A second-order RS-HDMR was constructed as an input → output map. Its accuracy is comparable with the laboratory error (the absolute error of repeated measurements)

Biochemical multi-component formulation mapping

The comparison of the laboratory data and the 2nd order RS-HDMR approximation for “used” and “test” data

Note: the two parallel lines mark an absolute error of ±0.2
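A second-order input → output map of this kind can be sketched as a least-squares fit over a fixed basis. The sketch assumes four inputs scaled to [0, 1], degree-1 and degree-2 shifted Legendre terms per input plus their pairwise products, and a 200/100 split into "used" and "test" data; the response function is synthetic, not the ATcase measurements.

```python
import itertools
import numpy as np

def hdmr2_design(x):
    """Design matrix for a second-order HDMR on inputs scaled to [0, 1].

    Columns: a constant (f0), degree-1 and degree-2 shifted Legendre terms per input
    (first-order f_i), and products of those terms for every input pair i < j (f_ij).
    """
    n, d = x.shape
    terms = [[np.sqrt(3.0) * (2 * x[:, i] - 1),
              np.sqrt(5.0) * (6 * x[:, i] ** 2 - 6 * x[:, i] + 1)] for i in range(d)]
    cols = [np.ones(n)]
    for i in range(d):
        cols.extend(terms[i])
    for i, j in itertools.combinations(range(d), 2):
        cols.extend(a * b for a in terms[i] for b in terms[j])
    return np.column_stack(cols)

def fit_hdmr2(x, y):
    """Fit all component-function coefficients at once by linear least squares."""
    coef, *_ = np.linalg.lstsq(hdmr2_design(x), y, rcond=None)
    return coef

def rms_error(coef, x, y):
    return float(np.sqrt(np.mean((hdmr2_design(x) @ coef - y) ** 2)))

# Synthetic stand-in: 300 random 4-input combinations with small measurement noise.
rng = np.random.default_rng(7)
x = rng.random((300, 4))
y = 1 + x[:, 0] - 0.5 * x[:, 1] ** 2 + 0.8 * x[:, 0] * x[:, 2] + rng.normal(0, 0.01, 300)
used, test = slice(0, 200), slice(200, 300)
coef = fit_hdmr2(x[used], y[used])
print(rms_error(coef, x[test], y[test]))  # close to the 0.01 noise level
```

Evaluating the error on the held-out "test" rows, rather than on the rows used for fitting, is what makes the comparison with the laboratory error meaningful.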

The s-space network identification procedure (SNIP)

aTc: x1 IPTG: x2 EYFP: y(x1,x2)

Encode: x1→x1m1(s) x2→x2m2(s)

Response measurement: y→y(s)

Decode: Fourier transform
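The encode/measure/decode idea can be illustrated with the classical Fourier amplitude sensitivity test (FAST) search curve, used here as a stand-in because SNIP's own encoding is not detailed on this slide. Each input is driven at its own integer frequency ω_i along a one-dimensional parameter s, and the share of output variance at the harmonics of ω_i is attributed to input i. The frequencies, sample count, harmonic cutoff, and toy model are assumptions.

```python
import numpy as np

def fast_first_order(model, omegas, n_samples=1025, harmonics=4):
    """First-order variance fractions via frequency encoding and Fourier decoding.

    Each input follows x_i(s) = 1/2 + arcsin(sin(omega_i * s)) / pi, which is uniform
    on [0, 1] as s sweeps [0, 2*pi). n_samples is odd, so rfft has no Nyquist bin and
    every nonzero frequency bin carries the same weight in the variance.
    """
    s = 2 * np.pi * np.arange(n_samples) / n_samples
    x = 0.5 + np.arcsin(np.sin(np.outer(s, omegas))) / np.pi  # encode: (n_samples, d)
    y = model(x)                                              # response y(s)
    power = np.abs(np.fft.rfft(y)) ** 2                       # decode: power per frequency
    total = power[1:].sum()                                   # drop the mean (frequency 0)
    return [power[[p * w for p in range(1, harmonics + 1)]].sum() / total for w in omegas]

# Hypothetical additive model; the exact first-order shares are 0.2 and 0.8.
indices = fast_first_order(lambda x: x[:, 0] + 2 * x[:, 1], omegas=[11, 35])
print(indices)
```

The frequencies 11 and 35 are chosen so that their first four harmonics do not coincide; overlapping harmonics would mix the contributions of different inputs.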

[Figure: transcriptional cascade built from the genes tetR, lacI, and eyfp with promoters pL(tet), P(lac), and p(lacIq); proteins TetR, LacI, and EYFP; inducers aTc and IPTG]

Laboratory data on the transcriptional cascade

Nonlinear property prediction by SNIP

Unmeasured region correctly predicted

Nonlinear, cooperative behavior revealed

Feng, Nichols, Mitra, Hooshangi, Weiss, and Rabitz, In preparation

SNIP application to an intracellular signaling network

Sachs, et al., Science, 308:523-529 (2005)

Laboratory single cell measurement data

[Figure: three network diagrams over the nodes PKC, PKA, P38, Jnk, Plc, PIP3, PIP2, Raf, Mek, Erk, and Akt, each with a question mark on one connection]

Network connections identified by SNIP and Bayesian analysis

Reliable SNIP prediction of Akt levels

Identified network with predictive capability

Example: Ionospheric measured data

• The ionospheric critical frequencies were determined from ground-based ionosonde measurements at Huancayo, Peru, from 1957 to 1987 (8694 data points)

• Inputs: year, day, solar flux (f10.7), magnetic activity index (kp), geomagnetic field index (dst), and the previous day's value of foE

• Output: the ionospheric critical frequency foE

• The inputs are not controllable and not independent; the pdf of the inputs is not separable and was not explicitly known

Ionospheric measured data

The dependence of foE on the input “day”

Ionosonde data distribution: the dependences between the normalized input variables year and f10.7, and kp and dst, for the data at 12 UT

Ionospheric measured data

The accuracy of the 2nd order RS-HDMR expansion for the output, foE

Quantitative molecular property prediction


Standard QSAR

General strategy: molecular activity is a function of its chemical/physical/structural descriptors

Problems:
- Overfitting (choice of descriptors)
- Underlying physics

A simple solution: y = f(x1, x2), x1 = 1, 2, …, N1, x2 = 1, 2, …, N2

Descriptor-free quantitative molecular property interpolation

Descriptor-free property prediction from an arbitrary substituent order

$$y = \sum_j c_j \varphi_j$$

Property prediction from the optimal substituent order

Shenvi, Geremia, and Rabitz, J. Phys. Chem. A, 107:2066 (2003)

Complexity of the search: N1!·N2! = 14!·8! ≈ 3.5 × 10^15

Cost function:
$$J = \nabla y$$
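Because the space of substituent orders is far too large for exhaustive search, a local search is the natural illustration. The sketch assumes a discrete stand-in for the gradient cost (sum of squared differences between neighbouring grid cells along both substituent axes) and uses a simple pairwise-swap search; this is not the published search algorithm, and the grid is synthetic.

```python
import itertools
import numpy as np

def roughness(y):
    """Assumed discrete gradient cost: squared neighbour differences on both axes."""
    return float(np.sum(np.diff(y, axis=0) ** 2) + np.sum(np.diff(y, axis=1) ** 2))

def reorder(y, max_passes=50):
    """Pairwise-swap local search over row (x1) and column (x2) substituent orders.

    Returns the row order, the column order, and the roughness of the reordered grid.
    The result is a local optimum, not necessarily the global one.
    """
    rows, cols = list(range(y.shape[0])), list(range(y.shape[1]))
    best = roughness(y[np.ix_(rows, cols)])
    for _ in range(max_passes):
        improved = False
        for order in (rows, cols):  # `order` refers to rows, then to cols
            for a, b in itertools.combinations(range(len(order)), 2):
                order[a], order[b] = order[b], order[a]
                cost = roughness(y[np.ix_(rows, cols)])
                if cost < best - 1e-12:
                    best, improved = cost, True
                else:
                    order[a], order[b] = order[b], order[a]  # undo the swap
        if not improved:
            break
    return rows, cols, best

# Hypothetical smooth grid y[i, j] = i + j, scrambled in both substituent orders.
rng = np.random.default_rng(3)
smooth = np.add.outer(np.arange(6.0), np.arange(5.0))
scrambled = smooth[rng.permutation(6)][:, rng.permutation(5)]
rows, cols, best = reorder(scrambled)
print(roughness(scrambled), best)
```

Once a smooth order is found, unmeasured entries can be interpolated from their neighbours on the reordered grid.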

Application to a chromophore transition metal complex library

Panels: before reordering; after reordering

Outliers captured by the reordering algorithm

Liang, Feng, Lowry, and Rabitz, J. Phys. Chem. B, 109:5842-5854 (2005)

Cost function:
$$J = \sum_k \left(y_k^{cal} - y_k^{lab}\right)^2$$


Application to a drug compound library

Cost function:
$$J = \sum_{1\le i<i'\le N_1}\sum_{1\le j\le N_2} c_{ii'}\,[y_{ij} - y_{i'j}]^2 + \sum_{1\le j<j'\le N_2}\sum_{1\le i\le N_1} c_{jj'}\,[y_{ij} - y_{ij'}]^2$$

[Figure labels: 15% of data; Reorder; Prediction; >14,000 compounds]

THE MODERN WAY TO DO SCIENCE*

* Adaptively under high duty cycle and automated

“You should understand the physics, write down the correct equations, and let nature do the calculations.”

Peter Debye
