GLOBAL SENSITIVITY ANALYSIS BY RANDOM SAMPLING - HIGH DIMENSIONAL MODEL REPRESENTATION (RS-HDMR)
Herschel Rabitz
Department of Chemistry, Princeton University, Princeton, New Jersey 08544
HDMR Methodology
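• HDMR expresses a system output as a hierarchical correlated function expansion of inputs:
$$ f(\mathbf{x}) = f_0 + \sum_{i=1}^{n} f_i(x_i) + \sum_{1\le i<j\le n} f_{ij}(x_i, x_j) + \cdots + f_{12\cdots n}(x_1, x_2, \ldots, x_n) $$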
HDMR Methodology (Contd.)
• HDMR component functions are optimally defined as:
- where the weights in these definitions are the unconditional and conditional probability density functions of the inputs
RS (Random Sampling) – HDMR (Contd.)
• RS-HDMR component functions are approximated by expansions of orthonormal polynomials
- Inputs can be sampled independently and/or in a correlated fashion
- Only one set of data is needed to determine all of the component functions
- Statistical analysis (F-test) is used to determine the proper truncation of the RS-HDMR expansion
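To make the orthonormal-polynomial idea concrete, here is a minimal NumPy sketch of the first-order part only, assuming independent inputs uniformly distributed on [0, 1] and a cubic truncation. The helper names `legendre_orthonormal` and `first_order_rs_hdmr` and the test function are hypothetical; the coefficients are estimated by plain Monte Carlo projection from one sample set, which simplifies the published RS-HDMR procedure (no F-test truncation, no higher-order terms).

```python
import numpy as np

def legendre_orthonormal(x, degree):
    """First `degree` (at most 3) orthonormal shifted Legendre polynomials on [0, 1]."""
    t = 2.0 * x - 1.0  # map [0, 1] onto [-1, 1]
    basis = [np.sqrt(3.0) * t,
             np.sqrt(5.0) * (1.5 * t**2 - 0.5),
             np.sqrt(7.0) * (2.5 * t**3 - 1.5 * t)]
    return np.stack(basis[:degree], axis=1)  # shape (N, degree)

def first_order_rs_hdmr(X, y, degree=3):
    """Monte Carlo estimate of first-order RS-HDMR terms from a single sample set.

    Assumes independent inputs, each uniform on [0, 1].
    Returns f0, alpha (n_inputs x degree) and first-order indices S_i.
    """
    f0 = y.mean()
    n_inputs = X.shape[1]
    alpha = np.empty((n_inputs, degree))
    for i in range(n_inputs):
        phi = legendre_orthonormal(X[:, i], degree)
        # alpha_ir is approximated by the sample mean of (f - f0) * phi_r(x_i)
        alpha[i] = phi.T @ (y - f0) / len(y)
    # By orthonormality, Var[f_i] = sum_r alpha_ir^2
    S = (alpha**2).sum(axis=1) / y.var()
    return f0, alpha, S

# Usage on a hypothetical test function
rng = np.random.default_rng(0)
X = rng.random((10_000, 3))
y = X[:, 0] + 2 * X[:, 1]**2 + 0.5 * X[:, 0] * X[:, 2]
f0, alpha, S = first_order_rs_hdmr(X, y)
print(np.round(S, 3))
```

For this test function the analytic first-order indices are about 0.264, 0.722, and 0.011; the remaining ≈0.004 of the variance comes from the x1·x3 interaction, which a first-order expansion cannot capture.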
Global Sensitivity Analysis by RS-HDMR
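• Individual RS-HDMR component functions have a direct statistical correlation interpretation, which permits the model output variance to be decomposed into its input contributions:
$$ \sigma_{\mathrm{total}}^2 = \sum_{i=1}^{n}\sigma_i^2 + \sum_{1\le i<j\le n}\sigma_{ij}^2 + \cdots $$
- where σ_i², σ_ij², … are defined as the covariances of f_i(x_i), f_ij(x_i, x_j), … with f(x), respectively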
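The covariance definition makes the decomposition additive: because f(x) − f0 is the sum of the component functions, Var[f] equals the sum of Cov[f_u, f] over all components. The sketch below checks this numerically with correlated Gaussian inputs and hypothetical component functions; it does not enforce the hierarchical orthogonality conditions that define the actual RS-HDMR components for correlated inputs, so the individual numbers are illustrative only.

```python
import numpy as np

# Hypothetical correlated inputs: x1, x2 jointly normal with correlation 0.6
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
x = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=20_000)
x1, x2 = x[:, 0], x[:, 1]

# Hypothetical component functions (assumed known here, not fitted)
components = {
    "f1": 2.0 * x1,
    "f2": x2**2 - 1.0,
    "f12": 0.5 * x1 * x2,
}
f = 1.0 + sum(components.values())  # f0 = 1.0

var_f = np.var(f)
# S_u = Cov(f_u, f) / Var(f); with the same normalization these sum to exactly 1
S = {name: np.cov(fu, f, bias=True)[0, 1] / var_f for name, fu in components.items()}
print(S, sum(S.values()))
```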
A Propellant Ignition Model
Calculated profiles of temperature and major mole fractions for the ignition and combustion of the M10 solid propellant
A Propellant Ignition Model
• 10 independent and 44 cooperative contributions of inputs were identified as significant
A Propellant Ignition Model
• Nonlinear global sensitivity indexes efficiently identified all significant contributions of inputs
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
Microenvironmental/exposure/dose modeling system
Structure of TCE-PBPK model (adapted from Fisher et al., 1998)
Example: Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
• The coupled microenvironmental/pharmacokinetic model:
- Three exposure routes (inhalation, ingestion, and dermal absorption)
- Release of TCE from water into the air within the residence
- Activities of individuals and physiological uptake processes
• Seven input variables [age (x1), tap water concentration (x2), shower stall volume (x3), drinking water consumption rate (x4), shower flow rate (x5), shower time (x6), time in bathroom after shower (x7)] are used to construct the RS-HDMR orthonormal polynomials
• Target outputs: the total internal doses from intake (inhalation and ingestion) and uptake (dermal absorption)
- The amount inhaled or ingested: $\int C(t)\,IR(t)\,dt$
- The amount absorbed: $\int K_p\,C(t)\,SA(t)\,dt$
- C(t): exposure concentration, IR(t): inhalation or ingestion rate, Kp: permeability coefficient, SA(t): surface area exposed
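The two dose integrals can be evaluated numerically once the time profiles are known. The sketch below uses the trapezoidal rule on hypothetical profiles for a 10-minute shower; all concentrations, rates, Kp, and SA values are made-up placeholders, not values from the TCE model.

```python
import numpy as np

def trapezoid_integral(values, t):
    """Trapezoidal rule for samples `values` on the time grid `t`."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(t)))

# Hypothetical 10-minute shower; time in hours
t = np.linspace(0.0, 10.0 / 60.0, 601)

# Intake: C(t) * IR(t) (all values hypothetical)
C_air = 0.05 * (1.0 - np.exp(-t / 0.05))     # air concentration, mg/m^3
IR = np.full_like(t, 0.9)                    # inhalation rate, m^3/h
inhaled = trapezoid_integral(C_air * IR, t)  # mg

# Uptake: Kp * C(t) * SA(t) (all values hypothetical)
Kp = 0.018                                   # permeability coefficient, cm/h
C_water = np.full_like(t, 0.02)              # water concentration, mg/L
SA = np.full_like(t, 18_000.0)               # exposed surface area, cm^2
# cm/h * mg/L * cm^2 * h = mg*cm^3/L; divide by 1000 cm^3/L to get mg
absorbed = trapezoid_integral(Kp * C_water * SA, t) / 1000.0

print(inhaled, absorbed)
```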
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
• Inputs (x1, x2, x3, x4) have uniform distributions and inputs (x5, x6, x7) have triangular distributions; 10,000 input-output samples were generated
The data distributions for the uniformly distributed variable x1 and the triangularly distributed variable x5
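Generating such an input sample takes one call per variable with NumPy's `Generator.uniform` and `Generator.triangular`. The bounds and modes below are hypothetical placeholders, since the slides do not state them.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 10_000

# Hypothetical (low, high) bounds for the uniformly distributed inputs
uniform_bounds = {"x1": (18.0, 65.0), "x2": (0.0, 0.1), "x3": (1.5, 3.0), "x4": (0.5, 3.0)}
# Hypothetical (left, mode, right) parameters for the triangular inputs
triangular_params = {"x5": (5.0, 10.0, 20.0), "x6": (5.0, 10.0, 20.0), "x7": (0.0, 5.0, 15.0)}

samples = {}
for name, (low, high) in uniform_bounds.items():
    samples[name] = rng.uniform(low, high, size=N)
for name, (left, mode, right) in triangular_params.items():
    samples[name] = rng.triangular(left, mode, right, size=N)

# Columns ordered x1..x7; each row is one input vector for a model run
X = np.column_stack([samples[f"x{i}"] for i in range(1, 8)])
print(X.shape, X[:, 4].mean())
```

Each row of X would then be passed through the exposure/PBPK model to obtain the corresponding output.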
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
• Seven independent, fifteen 2nd order and one 3rd order cooperative contributions of inputs were identified as significant
First order sensitivity indexes
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
• Nonlinear global sensitivity indexes (2nd order and above) efficiently identified all significant contributions of inputs
The ten largest 2nd and 3rd order sensitivity indexes
Identification of bionetwork model parameters
Characteristics of the problem:
- System nonlinearity
- Limited number & type of experiments
- Considerable biological and measurement noise
Multiple solutions exist!
Problems with traditional identification methods:
- Provide only one or a few solutions for each parameter
- Assume linear propagation from data noise to parameter uncertainties
The closed-loop identification protocol (CLIP):
- Extract the full parameter distribution by global identification
- Iteratively look for the most informative experiments for minimizing parameter uncertainty
[Figure: control, inversion, and analysis modules. Labels include controlled system perturbations and property measurements, laboratory constraints R(u,X), proposed mechanism, previous knowledge, trial solutions (k0), and best solution distribution (k*). Learning algorithm guiding the experiments: Qinv → Jctrl → uc(t). Stages: pre-lab analysis and design of the most informative experiments; iterative experiment optimization and data acquisition; global parameter identification.]
General operation of CLIP
Isoleucyl-tRNA synthetase proofreading valyl-tRNA^Ile
Okamoto and Savageau, Biochemistry, 23:1701-1709 (1984)
[Figure: proofreading mechanism; the rate constants to be identified are marked with asterisks]
The inversion module: identifying the rate constant distribution
The Genetic Algorithm (GA)
Mutation: 1101 1111 + 1100 0010 → 1101 1101 + 1100 0110
Crossover: 1101 1100 + 1111 0010 → 1101 0010 + 1111 1100
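The two GA operators illustrated above act on binary strings. A minimal Python sketch that reproduces the slide's examples exactly (the bit positions are read off the example strings; `random_mutation` and its rate are hypothetical additions showing the stochastic version):

```python
import random

def mutate(bits, position):
    """Flip the bit at a 0-based `position` in a binary string."""
    flipped = "1" if bits[position] == "0" else "0"
    return bits[:position] + flipped + bits[position + 1:]

def crossover(a, b, point):
    """One-point crossover: exchange the tails of two parents after `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]

def random_mutation(bits, rate, rng):
    """Flip each bit independently with probability `rate` (hypothetical setting)."""
    return "".join(("1" if c == "0" else "0") if rng.random() < rate else c for c in bits)

# Reproduce the slide's examples
print(mutate("11011111", 6), mutate("11000010", 5))   # 11011101 11000110
print(crossover("11011100", "11110010", 4))           # ('11010010', '11111100')
print(random_mutation("11011111", 0.1, random.Random(0)))
```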
The inversion cost function
$$ J_{\mathrm{inv}}^{i,p} = \frac{1}{TN}\sum_{n=1}^{N}\sum_{t=1}^{T}\frac{\left\| X^{\mathrm{lab}}_{i,t,n} - X^{\mathrm{cal},p}_{i,t,n} \right\|}{\varepsilon_{i,t,n}} $$
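A minimal sketch of this cost for one trial solution, assuming the norm is the absolute deviation and ε is a per-point noise scale (both assumptions, since only the overall form of the formula survives). The toy model X(t) = exp(−kt), the log-uniform trial range, and the acceptance threshold of 1.5 are hypothetical; keeping every trial with a low cost, rather than only the best one, is what yields a distribution of acceptable rate constants.

```python
import numpy as np

def inversion_cost(X_lab, X_cal, eps):
    """Mean noise-scaled absolute misfit over N observables and T sample times.

    X_lab, X_cal, eps: arrays of shape (N, T).
    """
    N, T = X_lab.shape
    return float(np.sum(np.abs(X_lab - X_cal) / eps) / (T * N))

# Hypothetical toy system: one observable X(t) = exp(-k t)
rng = np.random.default_rng(3)
t = np.linspace(0.0, 5.0, 11)
k_true, noise = 0.5, 0.02
X_lab = np.exp(-k_true * t)[None, :] + rng.normal(0.0, noise, size=(1, t.size))
eps = np.full_like(X_lab, noise)

# Evaluate many trial solutions and keep all acceptable ones, not just the best
k_trials = 10.0 ** rng.uniform(-1.0, 0.5, size=2000)
costs = np.array([inversion_cost(X_lab, np.exp(-k * t)[None, :], eps) for k in k_trials])
accepted = k_trials[costs <= 1.5]
print(accepted.size, accepted.min(), accepted.max())
```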
[Figure: distribution of k2 (random control); histogram of the number of solutions versus k2 (s^-1, log scale)]
Typical rate constant distribution after random perturbation/control
Inversion quality index Q
The analysis module: estimating the most informative experiments
- Estimate the best species for monitoring system behavior
- Determine the best species for perturbing the system
Nonlinear sensitivity analysis by Random-Sampling High Dimensional Model Representation (RS-HDMR)
$$ \sigma_{\mathrm{total}}^2 = \sum_{i=1}^{n}\sigma_i^2 + \sum_{1\le i<j\le n}\sigma_{ij}^2 + \cdots $$
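For independent inputs the partial variances σ_i² and σ_ij² are variances of the component functions, and their sum reproduces σ_total² exactly. The sketch below verifies this for a hypothetical two-input function tabulated on an equally weighted grid, where the component functions are obtained by averaging; this is a direct ANOVA computation, not the random-sampling estimator.

```python
import numpy as np

def anova_variances(F):
    """Exact variance decomposition of a function tabulated on a full grid.

    F[a, b] = f(x1_a, x2_b) with independent, equally weighted grid points.
    Returns (sigma1^2, sigma2^2, sigma12^2, total variance).
    """
    f0 = F.mean()
    f1 = F.mean(axis=1) - f0               # f1(x1): average over x2
    f2 = F.mean(axis=0) - f0               # f2(x2): average over x1
    f12 = F - f0 - f1[:, None] - f2[None, :]
    return f1.var(), f2.var(), f12.var(), F.var()

# Hypothetical test function on a 101 x 101 grid
x1 = np.linspace(0.0, 1.0, 101)
x2 = np.linspace(0.0, 1.0, 101)
F = np.sin(np.pi * x1)[:, None] + x2[None, :] ** 2 + 2.0 * np.outer(x1, x2)
s1, s2, s12, total = anova_variances(F)
print(s1, s2, s12, total, s1 + s2 + s12)
```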
Optimally controlled identification: squeezing on the rate constant distribution
The control cost function
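$$ J^{i}_{\mathrm{ctrl}} = Q^{i}_{\mathrm{inv}} - \omega\,R\left[u^{i}_{c}(t), X^{i}_{r}(t)\right], \qquad Q^{i}_{\mathrm{inv}} = 1 \Big/ \left[\frac{1}{M}\sum_{m=1}^{M}\frac{k^{i}_{\max,m} - k^{i}_{\min,m}}{k^{i}_{\max,m} + k^{i}_{\min,m}}\right] $$
where Q^i_inv is the inversion quality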
Feng and Rabitz, Biophys. J., 86:1270-1281 (2004)
Feng, Rabitz, Turinici, and LeBris, J. Phys. Chem. A, 110:7755-7762 (2006)
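A minimal sketch of the two quantities, assuming k_max,m and k_min,m are the largest and smallest values of rate constant m among the acceptable solutions after experiment i, and that R is a precomputed scalar penalty for the laboratory constraints. The function names and the numbers in the usage lines are hypothetical.

```python
import numpy as np

def inversion_quality(k_max, k_min):
    """Q = 1 / mean_m[(k_max - k_min) / (k_max + k_min)] over the M rate constants."""
    k_max = np.asarray(k_max, dtype=float)
    k_min = np.asarray(k_min, dtype=float)
    spread = (k_max - k_min) / (k_max + k_min)  # relative width of each distribution
    return 1.0 / spread.mean()

def control_cost(Q_inv, R, omega):
    """J_ctrl = Q_inv - omega * R: reward narrow distributions, penalize constraint R."""
    return Q_inv - omega * R

# Hypothetical distributions for M = 2 rate constants
Q_wide = inversion_quality([2.0, 1.2], [0.5, 0.8])
Q_narrow = inversion_quality([1.1, 1.05], [0.9, 0.95])
J = control_cost(Q_wide, R=3.0, omega=0.1)
print(Q_wide, Q_narrow, J)
```

Squeezing the rate constant distributions raises Q, so an optimizer maximizing J_ctrl favors experiments that narrow the identified parameter ranges while respecting the laboratory constraints.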
Network property optimization:
A. Identifying the best targeted network locations for intervention
B. Identifying the optimal network control
[Figure: closed-loop network control. Labels: control objective, control design, learning algorithm, biological system, observed response, initial guess/random control, optimal controls, optimal network performance]
A. Molecular target identification for network engineering
[Figure: synthetic genetic circuit with promoters P(lac), P(R-O12), and p(lacIq), genes lacI, cI, and eyfp, proteins LacI, CI, and EYFP, and inducer IPTG; an unknown interaction is marked "?"]
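Random-sampling high dimensional model representation (RS-HDMR)
$$ y = f(\mathbf{k}) = f_0 + \sum_{i=1}^{N} f_i(k_i) + \sum_{1\le i<j\le N} f_{ij}(k_i, k_j) + \cdots $$
Randomly sample k
$$ \sigma_{\mathrm{total}}^2 = \sum_{i=1}^{N}\sigma_i^2 + \sum_{1\le i<j\le N}\sigma_{ij}^2 + \cdots $$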
Advantages of RS-HDMR:
- Global sensitivity analysis
- Nonlinear component functions
- Physically meaningful representation
- Favorable scalability
Li, Rosenthal, and Rabitz, J. Phys. Chem. A, 105:7765-7777 (2001)
[Figure: two panels of measurements at IPTG = 1 mM and IPTG = 0; targeted rate constants k6 and k10 to k13. Left: mutants p110, pR1, pR2, pR3 (k10 to k13 fixed). Right: mutants p107, pM4, pM5, pM6 (k6 fixed).]
Feng, Hooshangi, Chen, Li, Weiss, and Rabitz, Biophys. J., 87:2195-2202 (2004)
Laboratory data on the mutants
Example: Biochemical multi-component formulation mapping
• Allosteric regulation of aspartate transcarbamoylase (ATcase) in vitro by all four ribonucleotide triphosphates (NTPs)
• ATcase activity (output) was measured for 300 random NTP concentration combinations (inputs) in the laboratory
• A second-order RS-HDMR was constructed as an input → output map; its accuracy is comparable with the laboratory error (the absolute error of repeated measurements)
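A second-order input → output map of this kind can be fitted by linear least squares on products of orthonormal polynomials. The sketch below assumes four inputs scaled to [0, 1], uses 300 random samples to mirror the experiment size, and replaces the laboratory measurements with a hypothetical toy activity function plus noise; the truncation at quadratic polynomials is fixed by hand instead of by an F-test.

```python
import itertools
import numpy as np

def phi(x):
    """Orthonormal shifted Legendre polynomials of degree 1-3 on [0, 1], shape (N, 3)."""
    t = 2.0 * x - 1.0
    return np.column_stack([np.sqrt(3.0) * t,
                            np.sqrt(5.0) * (1.5 * t**2 - 0.5),
                            np.sqrt(7.0) * (2.5 * t**3 - 1.5 * t)])

def design_matrix(X, degree=2):
    """Columns: constant, phi_r(x_i), and products phi_p(x_i) * phi_q(x_j) for i < j."""
    N, n = X.shape
    P = [phi(X[:, i])[:, :degree] for i in range(n)]
    cols = [np.ones(N)]
    for i in range(n):
        cols.extend(P[i].T)                              # first-order terms
    for i, j in itertools.combinations(range(n), 2):
        for p in range(degree):
            for q in range(degree):
                cols.append(P[i][:, p] * P[j][:, q])     # second-order terms
    return np.column_stack(cols)

def toy_activity(X):
    """Hypothetical stand-in for the measured ATcase activity."""
    return 1.0 + X[:, 0] - 0.5 * X[:, 1]**2 + 0.8 * X[:, 2] * X[:, 3]

rng = np.random.default_rng(7)
X = rng.random((300, 4))                                 # 300 random input combinations
y = toy_activity(X) + rng.normal(0.0, 0.01, size=300)

A = design_matrix(X)                                     # 1 + 4*2 + 6*4 = 33 columns
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

X_test = rng.random((200, 4))                            # independent "test" inputs
pred = design_matrix(X_test) @ coef
rmse = float(np.sqrt(np.mean((pred - toy_activity(X_test))**2)))
print(A.shape, rmse)
```

Holding out independent test inputs, as here, mirrors the slides' comparison of "used" and "test" data.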
Biochemical multi-component formulation mapping
The comparison of the laboratory data and the 2nd order RS-HDMR approximation for “used” and “test” data
Note: the two parallel lines mark an absolute error of ±0.2
The s-space network identification procedure (SNIP)
Inputs: aTc (x1), IPTG (x2); output: EYFP, y(x1, x2)
Encode: x1 → x1 m1(s), x2 → x2 m2(s)
Response measurement: y → y(s)
Decode: Fourier transform
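The slides do not specify the modulation functions m1(s) and m2(s). The sketch below assumes the classical sinusoidal choice used in Fourier amplitude sensitivity testing: each input oscillates along s at its own integer frequency, and the Fourier spectrum of the response shows first-order effects at those frequencies and cooperative (x1·x2) effects at their sum and difference. The function name `encode_decode` and the frequencies 5 and 13 are hypothetical.

```python
import numpy as np

def encode_decode(f, m1, m2, n_s=512):
    """Encode x1, x2 as oscillations in s, evaluate f, and decode with an FFT.

    Assumes sinusoidal modulation x_i(s) = 0.5 + 0.5 sin(m_i s) with integer m_i.
    Returns Fourier amplitudes at m1, m2, |m2 - m1| and m1 + m2.
    """
    s = 2.0 * np.pi * np.arange(n_s) / n_s
    x1 = 0.5 + 0.5 * np.sin(m1 * s)
    x2 = 0.5 + 0.5 * np.sin(m2 * s)
    y = f(x1, x2)                                  # response measured along s
    amp = 2.0 * np.abs(np.fft.rfft(y)) / n_s       # one-sided amplitude spectrum
    return {"x1": amp[m1], "x2": amp[m2],
            "diff": amp[abs(m2 - m1)], "sum": amp[m1 + m2]}

additive = encode_decode(lambda a, b: a + 2.0 * b, 5, 13)
cooperative = encode_decode(lambda a, b: a * b, 5, 13)
print(additive)
print(cooperative)
```

The additive response has amplitude only at the input frequencies (0.5 at 5 and 1.0 at 13), while the product x1·x2 also shows amplitude 0.125 at both 8 and 18, which is the signature of cooperative behavior.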
[Figure: transcriptional cascade with promoters pL(tet), P(lac), and p(lacIq), genes tetR, lacI, and eyfp, proteins TetR, LacI, and EYFP, and inducers aTc and IPTG]
Laboratory data on the transcriptional cascade
Nonlinear property prediction by SNIP
Unmeasured region correctly predicted
Nonlinear, cooperative behavior revealed
Feng, Nichols, Mitra, Hooshangi, Weiss, and Rabitz, In preparation
SNIP application to an intracellular signaling network
Sachs, et al., Science, 308:523-529 (2005)
Laboratory single cell measurement data
[Figure (three panels): signaling network among PKC, PKA, P38, Jnk, Plc, PIP3, PIP2, Raf, Mek, Erk, and Akt; "?" marks an uncertain connection]
Network connections identified by SNIP and Bayesian analysis
Reliable SNIP prediction of Akt levels
Identified network with predictive capability
Example: Ionospheric measured data
• The ionospheric critical frequencies determined from ground-based ionosonde measurements at Huancayo, Peru, from 1957 to 1987 (8694 points)
• Inputs: year, day, solar flux (f10.7), magnetic activity index (kp), geomagnetic field index (dst), and the previous day's value of foE
• Output: the ionospheric critical frequency foE
• The inputs are not controllable and not independent; the pdf of the inputs is not separable and was not explicitly known
Ionospheric measured data
The dependence of foE on the input “day”
Ionosonde data distribution: the dependences between normalized input variables (year and f10.7; kp and dst) for the data at 12 UT
Ionospheric measured data
The accuracy of the 2nd order RS-HDMR expansion for the output, foE
Quantitative molecular property prediction
[Figure: molecular scaffold with substituent sites X1 and X2]
Standard QSAR
General strategy: molecular activity is a function of its chemical/physical/structural descriptors
Problems: overfitting (choice of descriptors); underlying physics
A simple solution: y = f(x1, x2), x1 = 1, 2, …, N1, x2 = 1, 2, …, N2, i.e., each substituent at site X1 or X2 is labeled by an integer
Descriptor-free quantitative molecular property interpolation
Descriptor-free property prediction from an arbitrary substituent order
$$ y = \sum_{j} c_j\,\phi_j $$
Property prediction from the optimal substituent order
Shenvi, Geremia, and Rabitz, J. Phys. Chem. A, 107:2066 (2003)
Complexity of the search: N1!·N2! = 14!·8! ≈ 3.5 × 10^15
Cost function: $J = \|\nabla y\|$
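On a small table, the reordering search can be done exhaustively. The sketch below uses, as a hypothetical discrete stand-in for ‖∇y‖, the sum of squared differences between neighboring entries, and tries every row and column permutation; this brute force is only feasible for tiny N1 and N2, which is exactly why the 14!·8! case needs a smarter search.

```python
import itertools
import numpy as np

def roughness(table):
    """Hypothetical discrete stand-in for ||grad y||: squared neighbor differences."""
    return float((np.diff(table, axis=0)**2).sum() + (np.diff(table, axis=1)**2).sum())

def best_order(table):
    """Exhaustively search row and column orders that minimize the roughness."""
    n1, n2 = table.shape
    best = (np.inf, None, None)
    for p1 in itertools.permutations(range(n1)):
        for p2 in itertools.permutations(range(n2)):
            cost = roughness(table[np.ix_(p1, p2)])
            if cost < best[0]:
                best = (cost, p1, p2)
    return best

# Hypothetical property table y[x1, x2] that is smooth in its "natural" order
a = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([0.0, 10.0, 20.0])
original = a[:, None] + b[None, :]
shuffled = original[np.ix_([2, 0, 3, 1], [1, 2, 0])]   # an arbitrary substituent order

cost, rows, cols = best_order(shuffled)
print(roughness(shuffled), cost)   # 2051.0 809.0
```

The search recovers an ordering as smooth as the natural one, after which the property can be interpolated over the reordered grid.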
Application to a chromophore transition metal complex library
Panels: before reordering and after reordering
Outliers captured by the reordering algorithm
Liang, Feng, Lowry, and Rabitz, J. Phys. Chem. B, 109:5842-5854 (2005)
Cost function: $J = \sum_{k} \left(y^{\mathrm{cal}}_{k} - y^{\mathrm{lab}}_{k}\right)^2$
Application to a drug compound library
Cost function:
$$ J = \sum_{1\le i<i'\le N_1}\sum_{1\le j\le N_2}\left[c_{ii'}\,(y_{ij} - y_{i'j})\right]^2 + \sum_{1\le j<j'\le N_2}\sum_{1\le i\le N_1}\left[c_{jj'}\,(y_{ij} - y_{ij'})\right]^2 $$
Workflow: 15% of data → reorder → prediction for >14,000 compounds
THE MODERN WAY TO DO SCIENCE*
* Adaptively under high duty cycle and automated
“You should understand the physics, write down the correct equations, and let nature do the calculations.”
Peter Debye