GLOBAL SENSITIVITY ANALYSIS BY RANDOM SAMPLING - HIGH DIMENSIONAL MODEL REPRESENTATION (RS-HDMR) Herschel Rabitz Department of Chemistry, Princeton University, Princeton, New Jersey 08544

HDMR Methodology


Page 1: HDMR Methodology

GLOBAL SENSITIVITY ANALYSIS BY RANDOM SAMPLING - HIGH DIMENSIONAL MODEL REPRESENTATION (RS-HDMR)

Herschel Rabitz

Department of Chemistry, Princeton University, Princeton, New Jersey 08544

Page 2: HDMR Methodology

HDMR Methodology

• HDMR expresses a system output as a hierarchical correlated function expansion of inputs:
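The expansion itself did not survive extraction. A sketch of its usual form, consistent with the expression that appears later in the deck for y = f(k), is given here; the symbols x = (x_1, ..., x_n) for the inputs and n for their number are assumed notation.

```latex
f(\mathbf{x}) = f_0
  + \sum_{i=1}^{n} f_i(x_i)
  + \sum_{1 \le i < j \le n} f_{ij}(x_i, x_j)
  + \cdots
  + f_{12 \ldots n}(x_1, x_2, \ldots, x_n)
```

Here f_0 is a constant, each f_i captures the independent effect of one input, and each f_ij captures the cooperative effect of a pair of inputs.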

Page 3: HDMR Methodology

HDMR Methodology (Contd.)

• HDMR component functions are optimally defined as:

- where the weight functions are the unconditional and conditional probability density functions of the inputs:
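The defining equations were lost in extraction. One form commonly quoted in the RS-HDMR literature is sketched below as an assumption, not as a recovery of the slide: w(x) denotes the unconditional joint density of the inputs, w(x^{-i} | x_i) the conditional density of the remaining inputs given x_i, and the superscript notation for "all inputs except" is hypothetical.

```latex
f_0 = \int w(\mathbf{x})\, f(\mathbf{x})\, d\mathbf{x}

f_i(x_i) = \int w(\mathbf{x}^{-i} \mid x_i)\, f(\mathbf{x})\, d\mathbf{x}^{-i} - f_0

f_{ij}(x_i, x_j) = \int w(\mathbf{x}^{-ij} \mid x_i, x_j)\, f(\mathbf{x})\, d\mathbf{x}^{-ij}
  - f_i(x_i) - f_j(x_j) - f_0
```

For independent inputs the conditional densities reduce to products of marginal densities, and the component functions become mutually orthogonal.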

Page 4: HDMR Methodology

RS (Random Sampling) – HDMR (Contd.)

• RS-HDMR component functions are approximated by expansions of orthonormal polynomials

- Inputs can be sampled independently and/or in a correlated fashion

- Only one set of data is needed to determine all of the component functions

- Statistical analysis (F-test) is used to determine the proper truncation of the RS-HDMR expansion
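The points above can be illustrated with a minimal sketch. It assumes independent inputs uniform on [0, 1], uses shifted Legendre polynomials as the orthonormal basis, and estimates the first-order expansion coefficients by Monte Carlo averages over one random sample. The test model, the function names, and the sample size are all hypothetical, and the F-test truncation step is not implemented.

```python
import numpy as np

def orthonormal_basis(x):
    """Shifted Legendre polynomials of degree 1 to 3, orthonormal for x ~ U(0, 1)."""
    p1 = np.sqrt(3.0) * (2.0 * x - 1.0)
    p2 = np.sqrt(5.0) * (6.0 * x**2 - 6.0 * x + 1.0)
    p3 = np.sqrt(7.0) * (20.0 * x**3 - 30.0 * x**2 + 12.0 * x - 1.0)
    return np.stack([p1, p2, p3], axis=-1)  # shape (samples, inputs, 3)

def fit_first_order(x, y):
    """Estimate f0 and alpha[i, r], where f_i(x_i) ~ sum_r alpha[i, r] * phi_r(x_i)."""
    f0 = y.mean()
    phi = orthonormal_basis(x)
    # Monte Carlo estimate of the projection of (y - f0) onto each basis function
    alpha = np.einsum("s,sir->ir", y - f0, phi) / len(y)
    return f0, alpha

# One set of random input-output data serves all component functions.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(20_000, 3))
y = 1.0 + 2.0 * x[:, 0] + x[:, 1] ** 2   # hypothetical model; the third input is inactive

f0, alpha = fit_first_order(x, y)
print("f0 estimate:", f0)
print("first-order coefficients:\n", np.round(alpha, 3))
```

Because the basis is orthonormal, each coefficient is a plain sample average, which is why a single data set suffices. Coefficients of the inactive input come out close to zero, and a significance test such as the F-test named on the slide would be one way to discard them.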

Page 5: HDMR Methodology

Global Sensitivity Analysis by RS-HDMR

• Individual RS-HDMR component functions have a direct statistical correlation interpretation, which permits the model output variance to be decomposed into its input contributions

- where the individual variance terms are defined as the covariances of the corresponding component functions with f(x), respectively
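A small numerical sketch of this covariance interpretation follows. The two-input model and its analytically derived component functions are hypothetical choices for illustration; each sensitivity index is computed as the covariance of a component function with f(x), divided by the total variance.

```python
import numpy as np

rng = np.random.default_rng(1)
x1, x2 = rng.uniform(0.0, 1.0, size=(2, 50_000))

# Hypothetical model with independent inputs uniform on [0, 1]
f = x1 + 2.0 * x2 + x1 * x2

# Its HDMR component functions, derived by hand (f0 = 1.75)
components = {
    "x1": 1.5 * x1 - 0.75,
    "x2": 2.5 * x2 - 1.25,
    "x1,x2": (x1 - 0.5) * (x2 - 0.5),
}

var_f = f.var(ddof=1)
# Sensitivity index of each term: Cov(component, f) / Var(f)
indices = {name: np.cov(fc, f)[0, 1] / var_f for name, fc in components.items()}

for name, value in indices.items():
    print(f"S[{name}] = {value:.4f}")
print("sum of indices:", sum(indices.values()))
```

Since the component functions add up to f(x) - f0, the covariances add up to the total variance, so the indices sum to one regardless of sampling noise.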

Page 6: HDMR Methodology

A Propellant Ignition Model

Calculated profiles of temperature and major mole fractions for the ignition and combustion of the M10 solid propellant

Page 7: HDMR Methodology

A Propellant Ignition Model

• 10 independent and 44 cooperative contributions of inputs were identified as significant

Page 8: HDMR Methodology

A Propellant Ignition Model

• Nonlinear global sensitivity indexes efficiently identified all significant contributions of inputs

Page 9: HDMR Methodology

Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling

Microenvironmental/exposure/dose modeling system

Structure of TCE-PBPK model (adapted from Fisher et al., 1998)

Page 10: HDMR Methodology

Example: Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling

• The coupled microenvironmental/pharmacokinetic model:

- Three exposure routes (inhalation, ingestion, and dermal absorption)

- Release of TCE from water into the air within the residence

- Activities of individuals and physiological uptake processes

• Seven input variables [age (x1), tap water concentration (x2), shower stall volume (x3), drinking water consumption rate (x4), shower flow rate (x5), shower time (x6), time in bathroom after shower (x7)] are used to construct the RS-HDMR orthonormal polynomials

• Target outputs: the total internal doses from intake (inhalation and ingestion) and uptake (dermal absorption)

- The amount inhaled or ingested:

- The amount absorbed:

- C(t): exposure concentration, IR(t): inhalation or ingestion rate, Kp: permeability coefficient, SA(t): surface area exposed
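The two dose expressions were lost in extraction. A sketch using the symbols defined above, in the form usually found in exposure assessment, is given here as an assumption; the integration limits t_1 and t_2 and the names D_intake and D_uptake are hypothetical.

```latex
D_{\mathrm{intake}} = \int_{t_1}^{t_2} C(t)\, IR(t)\, dt

D_{\mathrm{uptake}} = \int_{t_1}^{t_2} K_p\, C(t)\, SA(t)\, dt
```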

Page 11: HDMR Methodology

Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling

• Inputs (x1, x2, x3, x4) have a uniform distribution, and inputs (x5, x6, x7) have a triangular distribution; 10,000 input-output data points were generated

The data distributions for the uniformly distributed variable x1 and the triangularly distributed variable x5
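A sampling design of this kind can be sketched as follows. The slides do not give the ranges or modes of the seven inputs, so the bounds below are placeholders on a normalized [0, 1] scale and are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

# Hypothetical normalized ranges; the real bounds are not stated on the slides.
uniform_bounds = [(0.0, 1.0)] * 4            # x1..x4: (low, high)
triangular_params = [(0.0, 0.5, 1.0)] * 3    # x5..x7: (left, mode, right)

x_uniform = np.column_stack(
    [rng.uniform(low, high, n_samples) for low, high in uniform_bounds]
)
x_triangular = np.column_stack(
    [rng.triangular(left, mode, right, n_samples) for left, mode, right in triangular_params]
)

# Input matrix with one row per sample and one column per input x1..x7
x = np.hstack([x_uniform, x_triangular])
print(x.shape)
```

Each row of `x` would then be passed through the coupled model to produce the corresponding output value.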

Page 12: HDMR Methodology

Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling

• Seven independent, fifteen 2nd order and one 3rd order cooperative contributions of inputs were identified as significant

First order sensitivity indexes

Page 13: HDMR Methodology

Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling

• Nonlinear global sensitivity indexes (2nd order and above) efficiently identified all significant contributions of inputs

The ten largest 2nd and 3rd order sensitivity indexes

Page 14: HDMR Methodology

Identification of bionetwork model parameters

Characteristics of the problem:

- System nonlinearity

- Limited number and type of experiments

- Considerable biological and measurement noise

Multiple solutions exist!

Problems with traditional identification methods:

- Provide only one or a few solutions for each parameter

- Assume linear propagation from data noise to parameter uncertainties

The closed-loop identification protocol (CLIP):

- Extract the full parameter distribution by global identification

- Iteratively look for the most informative experiments for minimizing parameter uncertainty

Page 15: HDMR Methodology

General operation of CLIP

[Figure: flow diagram of the CLIP loop. Labeled elements: Previous Knowledge; Proposed Mechanism; Trial Solutions (k0); ANALYSIS MODULE (pre-lab analysis and design of the most informative experiments); CONTROL MODULE (iterative experiment optimization and data acquisition); Controlled System Perturbations and Property Measurements, with controls uc(t), responses Xr(t), and Laboratory Constraints R(u,X); INVERSION MODULE (global parameter identification), returning Qinv and the Best Solution Distribution (k*); Learning Algorithm Guiding the Experiments: Qinv → Jctrl → uc(t)]

Page 16: HDMR Methodology

Isoleucyl-tRNA synthetase proofreading valyl-tRNAIle

Okamoto and Savageau, Biochemistry, 23:1701-1709 (1984)

[Figure: reaction scheme; * marks the rate constants to be identified]

Page 17: HDMR Methodology

The inversion module: identifying the rate constant distribution

The Genetic Algorithm (GA)

Mutation

1101 1111 + 1100 0010 → 1101 1101 + 1100 0110

Crossover

1101 1100 + 1111 0010 → 1101 0010 + 1111 1100
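The two bit-string operations shown on the slide can be sketched as below. The function names and the choice to pass the mutation position and crossover point explicitly are assumptions; a working genetic algorithm would pick them at random and add selection on the inversion cost.

```python
def mutate(bits: str, position: int) -> str:
    """Flip the bit at the given zero-based position."""
    flipped = "1" if bits[position] == "0" else "0"
    return bits[:position] + flipped + bits[position + 1:]

def crossover(parent_a: str, parent_b: str, point: int) -> tuple[str, str]:
    """Single-point crossover: swap the tails of two parents after the given point."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

# The bit strings from the slide, written without spaces
mutated_first = mutate("11011111", 6)
mutated_second = mutate("11000010", 5)
children = crossover("11011100", "11110010", 4)

print(mutated_first, mutated_second)
print(children)
```

In the inversion module, each bit string would encode a trial set of rate constants.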

The inversion cost function

J_{inv}^{p,i} = \frac{1}{N} \frac{1}{T} \sum_{n=1}^{N} \sum_{t=t_1}^{t_T} \left| X_{n,t}^{lab,i} - X_{n,t}^{cal,p,i} \right| / \varepsilon_{n}^{i}

[Figure: histogram titled "distribution of k2 (random control)"; x-axis: k2 (s-1, in log scale); y-axis: no. of solutions]

Typical rate constant distribution after random perturbation/control

Inversion quality index Q

Page 18: HDMR Methodology

The analysis module: estimating the most informative experiments

- Estimate the best species for monitoring system behavior

- Determine the best species for perturbing the system

Nonlinear sensitivity analysis by Random-Sampling High Dimensional Model Representation (RS-HDMR)

\sigma_{total}^2 = \sum_{i=1}^{n} \sigma_i^2 + \sum_{1 \le i < j \le n} \sigma_{ij}^2 + \cdots

Page 19: HDMR Methodology

Optimally controlled identification: squeezing on the rate constant distribution

The control cost function

J_{ctrl}^{i} = Q_{inv}^{i} - \omega R[u_c^{i}(t), X_r^{i}(t)]

Inversion quality:

Q_{inv}^{i} = 1 / \left[ \frac{1}{M} \sum_{m=1}^{M} \frac{k_{m,max}^{i} - k_{m,min}^{i}}{k_{m,max}^{i} + k_{m,min}^{i}} \right]

Feng and Rabitz, Biophys. J., 86:1270-1281 (2004)

Feng, Rabitz, Turinici, and LeBris, J. Phys. Chem. A, 110:7755-7762 (2006)

Page 20: HDMR Methodology

Network property optimization:

A. Identifying the best targeted network locations for intervention

B. Identifying the optimal network control

[Figure: closed-loop learning control diagram. Labeled elements: Initial Guess/Random Control; Control Design; Biological System; Observed Response; Control Objective; Learning Algorithm; Optimal Controls; Optimal Network Performance]

Page 21: HDMR Methodology

A. Molecular target identification for network engineering

[Figure: genetic circuit diagram with promoters p(lacIq), P(lac), and P(R-O12); genes lacI, cI, and eyfp; proteins LacI, CI, and EYFP; inducer IPTG; "?" annotations on the diagram]

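Random-sampling high dimensional model representation (RS-HDMR)

y = f(\mathbf{k}) = f_0 + \sum_{i=1}^{N} f_i(k_i) + \sum_{1 \le i < j \le N} f_{ij}(k_i, k_j) + \cdots

Randomly sample k

\sigma_{total}^2 = \sum_{i=1}^{N} \sigma_i^2 + \sum_{1 \le i < j \le N} \sigma_{ij}^2 + \cdots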

Advantages of RS-HDMR:

- Global sensitivity analysis

- Nonlinear component functions

- Physically meaningful representation

- Favorable scalability

Li, Rosenthal, and Rabitz, J. Phys. Chem. A, 105:7765-7777 (2001)

Page 22: HDMR Methodology

[Figure: the genetic circuit diagram (promoters p(lacIq), P(lac), P(R-O12); genes lacI, cI, eyfp; proteins LacI, CI, EYFP; inducer IPTG) annotated with k6 and k10 to k13, together with two bar charts comparing IPTG = 0 and IPTG = 1 mM: mutants p110, pR1, pR2, pR3 (k10 to k13 fixed) and mutants p107, pM4, pM5, pM6 (k6 fixed)]

Feng, Hooshangi, Chen, Li, Weiss, and Rabitz, Biophys. J., 87:2195-2202 (2004)

Laboratory data on the mutants

Page 23: HDMR Methodology

Example: Biochemical multi-component formulation mapping

• Allosteric regulation of aspartate transcarbamoylase (ATCase) in vitro by all four ribonucleotide triphosphates (NTPs)

• ATCase activity (output) was measured for 300 random NTP concentration combinations (inputs) in the laboratory

• A second order RS-HDMR was constructed as an input -> output map. Its accuracy is comparable with the laboratory error

The absolute error of repeated measurements

Page 24: HDMR Methodology

Biochemical multi-component formulation mapping

The comparison of the laboratory data and the 2nd order RS-HDMR approximation for “used” and “test” data

Note: The two parallel lines are absolute error ±0.2

Page 25: HDMR Methodology

The s-space network identification procedure (SNIP)

Inputs and output: aTc: x1, IPTG: x2, EYFP: y(x1,x2)

Encode: x1 → x1 m1(s), x2 → x2 m2(s)

Response measurement: y → y(s)

Decode: Fourier transform
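The encode/measure/decode steps can be illustrated with a toy sketch. The slides do not specify the modulation functions m1(s) and m2(s) or the system response, so cosine modulations at two distinct frequencies and a simple nonlinear response are assumed here purely for illustration.

```python
import numpy as np

n_points = 64
s = np.arange(n_points)

# Assumed encoding: each input is modulated at its own frequency.
freq_1, freq_2 = 3, 5
m1 = np.cos(2.0 * np.pi * freq_1 * s / n_points)
m2 = np.cos(2.0 * np.pi * freq_2 * s / n_points)

def response(x1, x2):
    """Hypothetical system: a linear effect of x1 plus an x1*x2 cooperative term."""
    return 2.0 * x1 + x1 * x2

# Encode the inputs (nominal values 1.0) and record the response along s.
y_s = response(1.0 * m1, 1.0 * m2)

# Decode: one-sided amplitude spectrum of the response.
spectrum = 2.0 * np.abs(np.fft.rfft(y_s)) / n_points

for freq in (freq_1, freq_2, freq_2 - freq_1, freq_1 + freq_2):
    print(f"amplitude at frequency {freq}: {spectrum[freq]:.3f}")
```

In this toy case the linear x1 term shows up at the x1 modulation frequency, the cooperative term shows up at the sum and difference frequencies, and nothing appears at the x2 frequency alone because x2 has no independent effect in the assumed response.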

[Figure: transcriptional cascade diagram with promoters p(lacIq), pL(tet), and P(lac); genes tetR, lacI, and eyfp; proteins TetR, LacI, and EYFP; inducers aTc and IPTG]

Laboratory data on the transcriptional cascade

Page 26: HDMR Methodology

Nonlinear property prediction by SNIP

Unmeasured region correctly predicted

Nonlinear, cooperative behavior revealed

Feng, Nichols, Mitra, Hooshangi, Weiss, and Rabitz, in preparation

Page 27: HDMR Methodology

SNIP application to an intracellular signaling network

Sachs, et al., Science, 308:523-529 (2005)

Laboratory single cell measurement data

Page 28: HDMR Methodology

[Figure: three network diagrams over the nodes PKC, PKA, P38, Jnk, Plc, PIP3, PIP2, Raf, Mek, Erk, and Akt, each with a "?" annotation]

Network connections identified by SNIP and Bayesian analysis

Reliable SNIP prediction of Akt levels

Identified network with predictive capability

Page 29: HDMR Methodology

Example: Ionospheric measured data

• The ionospheric critical frequencies were determined from ground-based ionosonde measurements at Huancayo, Peru, from 1957 to 1987 (8694 points)

• Input: year, day, solar flux (f10.7), magnetic activity index (kp), geomagnetic field index (dst), previous day's value of foE

• Output: ionospheric critical frequencies foE

• The inputs are not controllable and not independent; the pdf of the inputs is not separable, and was not explicitly known

Page 30: HDMR Methodology

Ionospheric measured data

The dependence of foE on the input “day”

Ionosonde data distribution: the dependences between normalized input variables (year and f10.7, kp and dst) for the data at 12 UT

Page 31: HDMR Methodology

Ionospheric measured data

The accuracy of the 2nd order RS-HDMR expansion for the output, foE

Page 32: HDMR Methodology

Quantitative molecular property prediction

Standard QSAR

[Figure: standard QSAR illustration with descriptor axes X1 and X2]

General strategy: Molecular activity is a function of its chemical/physical/structural descriptors

Problems:

- Overfitting (choice of descriptors)

- Underlying physics

A simple solution: y = f(x1,x2), x1 = 1,2,…,N1, x2 = 1,2,…,N2

Descriptor-free quantitative molecular property interpolation

Page 33: HDMR Methodology

Descriptor-free property prediction from an arbitrary substituent order

y = \sum_j c_j \phi_j

Page 34: HDMR Methodology

Property prediction from the optimal substituent order

Shenvi, Geremia, and Rabitz, J. Phys. Chem. A, 107:2066 (2003)

Complexity of the search: N1!·N2! = 14!·8! ≈ 10^15

Cost function: J = \nabla y
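The size of the search space quoted above can be checked with a few lines of arithmetic; the variable names are illustrative only.

```python
import math

n1, n2 = 14, 8   # numbers of substituents at the two sites, from the slide

# Every ordering of the first site can be paired with every ordering of the second.
n_orderings = math.factorial(n1) * math.factorial(n2)
print(f"{n_orderings:.2e}")
```

The product lies between 10^15 and 10^16, which is why an exhaustive search over orderings is impractical and an optimization over the ordering is used instead.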

Page 35: HDMR Methodology

Application to a chromophore transition metal complex library

Before reordering After reordering

Outliers captured by the reordering algorithm

Liang, Feng, Lowry, and Rabitz, J. Phys. Chem. B, 109:5842-5854 (2005)

Cost function: J = \sum_k (y_k^{lab} - y_k^{cal})^2

Page 36: HDMR Methodology

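Application to a drug compound library

[Figure: property map of the compound library over the two substituent indices (axis ticks 10 to 90 and 20 to 140), with the workflow 15% of data → Reorder → Prediction]

Cost function: J = \sum_{1 \le i < i' \le N_1} \sum_{1 \le j \le N_2} [c_{ii'} (y_{ij} - y_{i'j})]^2 + \sum_{1 \le j < j' \le N_2} \sum_{1 \le i \le N_1} [c_{jj'} (y_{ij} - y_{ij'})]^2

>14,000 compounds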

Page 37: HDMR Methodology

THE MODERN WAY TO DO SCIENCE*

* Adaptively under high duty cycle and automated

“You should understand the physics, write down the correct equations, and let nature do the calculations.”

Peter Debye