8/22/2019 BIGDATA Workshop
Big Challenges with Big Data
in Life Sciences
Shankar Subramaniam
UC San Diego
The Digital Human
A Super-Moore's Law
Adapted from Lincoln Stein 2012
The Phenotypic Readout
Data to Networks to Biology
NETWORK RECONSTRUCTION
Data-driven network reconstruction of biological systems:
- Derive relationships between input/output data
- Represent the relationships as a network
Inverse Problem: Data-driven Network Reconstruction
Experiments/Measurements
Network Reconstructions: Reverse Engineering of Biological Networks
Reverse engineering of biological networks:
- Structural identification: to ascertain network structure or topology.
- Identification of dynamics: to determine interaction details.
Main approaches:
- Statistical methods
- Simulation methods
- Optimization methods
- Regression techniques
- Clustering
Network Reconstruction of Dynamic Biological Systems: Doubly Penalized LASSO
Behrang Asadi*, Mano R. Maurya*, Daniel Tartakovsky, Shankar Subramaniam
Department of Bioengineering, University of California, San Diego
NSF grants (STC-0939370, DBI-0641037 and DBI-0835541)
NIH grant 5 R33 HL087375-02
* Equal effort
APPLICATION
Phosphoprotein signaling and cytokine measurements in RAW 264.7 macrophage cells.
MOTIVATION FOR THE NOVEL METHOD
Various methods:
- Regression-based approaches (least squares) with statistical significance testing of coefficients
- Dimensionality reduction to handle correlation: PCR and PLS
- Optimization/shrinkage (penalty)-based approach: LASSO
- Partial correlation and probabilistic model/Bayesian-based approaches
Different methods have distinct advantages/disadvantages.
Can we benefit by combining the methods?
- Compensate for the disadvantages
A novel method: Doubly Penalized Least Absolute Shrinkage and Selection Operator (DPLASSO)
- Incorporates both statistical significance testing and shrinkage
LINEAR REGRESSION
Goal: Building a linear-relationship based model
X: input data (m samples by n inputs), zero mean, unit standard deviation
y: output data (m samples by 1 output column), zero-mean
b: model coefficients: translates into the edges in the network
e: normal random noise with zero mean
Model: y = Xb + e,  e ~ N(0, σ)
Ordinary Least Squares solution:
  b = argmin_b { (y - Xb)^T (y - Xb) } = (X^T X)^{-1} X^T y
Formulation for dynamic systems:
  dy/dt = Xb + e,  e(t) ~ N(0, σ)
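As a minimal illustration of the least-squares solution above (not the authors' code), a NumPy sketch with synthetic inputs and outputs:

```python
import numpy as np

# Illustrative OLS sketch: b = (X^T X)^{-1} X^T y on made-up data.
rng = np.random.default_rng(0)
m, n = 100, 3                       # m samples, n inputs
X = rng.standard_normal((m, n))     # zero-mean inputs
b_true = np.array([1.5, -2.0, 0.5])
y = X @ b_true + 0.01 * rng.standard_normal(m)  # small measurement noise

# Solve the normal equations (np.linalg.lstsq is the numerically safer route)
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(b_hat, 1))  # close to b_true
```

The recovered coefficients translate directly into edge weights in the reconstructed network.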
STATISTICAL SIGNIFICANCE TESTING
Most coefficients are non-zero, a mathematical artifact.
Perform statistical significance testing: compute the standard deviation of the coefficients (from cov(b)).
Ratio:
  r_{ij,k} = b_{ij,k} / σ_{b_{ij,k}}
A coefficient is significant (different from zero) if:
  |r_ij| > tinv(1 - α/2, v),  v = DOF*,  1 - α = confidence level
For least squares:
  σ_{b,LS} = diag((X^T X)^{-1})^{1/2} · RMSE_LS · (m/(m - v))^{1/2},  v = n + 1
  RMSE_LS = ( Σ_{i=1..m} (y_i - y_{p,i})^2 / m )^{1/2}
Edges in the network graph represent the coefficients.
* Krämer, Nicole, and Masashi Sugiyama. "The degrees of freedom of partial least squares regression." Journal of the American Statistical Association 106.494 (2011): 697-705.
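The t-ratio test can be sketched as follows; the critical value 2.0 below is a stand-in for tinv(0.975, v) at large v, and all data are made up for illustration:

```python
import numpy as np

# Sketch of coefficient significance testing via t-ratios.
rng = np.random.default_rng(1)
m, n = 200, 3
X = rng.standard_normal((m, n))
b_true = np.array([3.0, 0.0, -2.0])       # middle coefficient is truly zero
y = X @ b_true + rng.standard_normal(m)   # unit-variance noise

b_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b_hat
dof = m - n
sigma2 = resid @ resid / dof                              # noise variance estimate
sigma_b = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
r = b_hat / sigma_b                        # t-ratio per coefficient
significant = np.abs(r) > 2.0              # ~95% confidence, large-DOF approximation
print(significant)
```

Coefficients whose ratio fails the test are treated as absent edges.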
CORRELATED INPUTS: PLS
Partial least squares finds the direction in the X space that explains the maximum variance direction in the Y space.
PLS regression is used when the number of observations per variable is low and/or collinearity exists among X values.
Requires an iterative algorithm: NIPALS, SIMPLS, etc.
Statistical significance testing is iterative.
  X = T P^T + E
  Y = U Q^T + F
  Y = X B + B_0
* H. Wold (1975), Soft modelling by latent variables; the non-linear iterative partial least squares approach, in Perspectives in Probability and Statistics, Papers in Honour of M. S. Bartlett, J. Gani, ed., Academic Press, London.
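A one-component NIPALS step for a single output, as a hedged sketch of how PLS copes with collinear inputs (not the NIPALS/SIMPLS implementations cited above; data are made up):

```python
import numpy as np

# One-component PLS1 sketch. w: weight, t: X-score, p: X-loading, q: y-loading.
rng = np.random.default_rng(2)
m = 50
x1 = rng.standard_normal(m)
X = np.column_stack([x1, x1 + 0.01 * rng.standard_normal(m)])  # nearly collinear
y = 2.0 * x1 + 0.1 * rng.standard_normal(m)
X = X - X.mean(0)
y = y - y.mean()

w = X.T @ y
w /= np.linalg.norm(w)        # direction in X space maximizing covariance with y
t = X @ w                     # scores
p = X.T @ t / (t @ t)         # X loadings
q = (y @ t) / (t @ t)         # y loading
b_pls = w * q                 # one-component regression coefficients
y_hat = X @ b_pls
print(round(float(np.corrcoef(y, y_hat)[0, 1]), 2))
```

Further components would be extracted after deflating X by t p^T and y by t q.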
LASSO
Shrinkage version of ordinary least squares, subject to an L-1 penalty constraint (the sum of the absolute values of the coefficients must be less than a threshold).
The LASSO estimator is defined as:
  (b_0, b) = argmin Σ_{i=1..N} ( y_i - b_0 - Σ_j b_j x_ij )^2    (cost function)
  subject to  Σ_j |b_j| <= t Σ_j |b_j^0|                         (L-1 constraint)
where b^0 represents the full least-squares estimates, and 0 < t < 1 causes the shrinkage.
* Tibshirani, R.: Regression shrinkage and selection via the Lasso, J. Roy. Stat. Soc. B Met., 1996, 58, (1), pp. 267-288.
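The constrained LASSO problem is equivalent to a penalized form, min ||y - Xb||^2/(2m) + alpha*||b||_1; a minimal proximal-gradient (ISTA) sketch of that form, with made-up data:

```python
import numpy as np

def lasso_ista(X, y, alpha, n_iter=500):
    """Minimize ||y - Xb||^2 / (2m) + alpha * ||b||_1 via ISTA."""
    m, n = X.shape
    b = np.zeros(n)
    step = m / np.linalg.norm(X, 2) ** 2   # 1/L for the smooth part
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / m
        z = b - step * grad
        b = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)  # soft-threshold
    return b

rng = np.random.default_rng(3)
m = 100
X = rng.standard_normal((m, 3))
y = X @ np.array([4.0, 0.0, -3.0]) + 0.1 * rng.standard_normal(m)
b = lasso_ista(X, y, alpha=0.5)
print(np.round(b, 1))
```

Note the characteristic LASSO behavior: the truly zero coefficient is set exactly to zero, while the nonzero ones are shrunk toward zero.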
Noise and Missing Data
More systematic comparison needed with respect to:
1. Noise: level, type
2. Size (dimension)
3. Level of missing data
4. Collinearity or dependency among input channels
5. Nonlinearity between inputs/outputs and nonlinear dependency
6. Time-series inputs/outputs and dynamic structure
METHODS
Linear Matrix Inequalities (LMI)*
Converts a nonlinear optimization problem into a linear optimization problem:
  min_B ε   s.t.   (Y - XB)^T (Y - XB) <= ε I
Congruence transformation turns the constraint into the LMI:
  [ ε I_m        Y - XB ]
  [ (Y - XB)^T   I_p    ]  >= 0
Pre-existing knowledge of the system (e.g. known zero entries such as a_13 = 0) can be added in the form of LMI constraints on the corresponding entries of B.
Threshold the coefficients: set b_ij to zero when its magnitude falls below a chosen cutoff.
* [Cosentino, C., et al., IET Systems Biology, 2007. 1(3): p. 164-173]
METRICS
Metrics for comparing the methods
o Reconstruction from 80% of datasets and 20% for validation
o RMSE on the test set, and the number and identity of the significant predictors, as the basic metrics to evaluate the performance of each method
1. Fractional error in estimating the parameters:
   b_frac,j = mean( |1 - b_method,j / b_true,j| )
   (parameters smaller than 10% of the standard deviation of all parameter values were set to 0 when generating the synthetic data)
2. Sensitivity, specificity, G, accuracy:
   Accuracy = (TN + TP) / (TN + TP + FN + FP)
   Sensitivity = TP / (TP + FN)
   Specificity = TN / (TN + FP)
TP: True Positive; FP: False Positive; TN: True Negative; FN: False Negative
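A plain-Python sketch of these counts and ratios on a toy edge list (illustrative values, not from the study):

```python
# Compare a known edge set against a recovered one over all ordered pairs.
true_edges = {(1, 2), (2, 3), (3, 4)}
found_edges = {(1, 2), (2, 3), (1, 4)}
all_pairs = {(i, j) for i in range(1, 5) for j in range(1, 5) if i != j}

tp = len(true_edges & found_edges)        # correctly recovered edges
fp = len(found_edges - true_edges)        # spurious edges
fn = len(true_edges - found_edges)        # missed edges
tn = len(all_pairs - true_edges - found_edges)

accuracy = (tn + tp) / (tn + tp + fn + fp)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
g = (sensitivity * specificity) ** 0.5    # G: geometric mean
print(tp, fp, fn, tn)
```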
RESULTS: DATA SETS
Data sets for benchmarking: Two data sets
1. First set: experimental data measured on macrophage cells (phosphoprotein (PP) vs. cytokine)*
2. Second set: synthetic data generated in Matlab. We build the model using 80% of the dataset (the training set) and use the rest of the dataset to validate the model (the test set).
* [Pradervand, S., M.R. Maurya, and S. Subramaniam, Genome Biology, 2006. 7(2): p. R11].
RESULTS: PP-Cytokine Data Set
Schematic representation of phosphoprotein (PP) vs. cytokine
- Signals were transmitted through 22 recorded signaling proteins and other (unmeasured) pathways.
- Only measured pathways contributed to the analysis.
Schematic graphs from:
[Pradervand, S., M.R. Maurya, and S. Subramaniam, Genome Biology, 2006. 7(2): p. R11].
PP-CYTOKINE DATASET
Measurements of phosphoproteins in response to LPS
Courtesy: AfCS
Measurements of cytokines in response to LPS
~ 250 such datasets
RESULTS: COMPARISON
Comparison on synthetic noisy data
The methods are applied on synthetic data with 22 inputs and 1 output.
About 1/3 of the true input coefficients are set to zero, to test whether the methods identify them as insignificant.
Effect of noise level
Four outputs with 5, 10, 20 and 40% noise levels, respectively, are generated from the noise-free (true) output.
Effect of noise type
Three outputs with white, t-distributed, and uniform noise types, respectively, are generated from the noise-free (true) output.
RESULTS: COMPARISON
Variability between realizations of data with white noise
PCR, LASSO, and LMI are used to identify significant predictors for 1000 input-output pairs.
Histograms of the coefficients in the three significant predictors common to the three methods:
Predictor #                 1       10      11
True value               -3.40    5.82   -6.95
PCR    Mean              -3.81    4.73   -6.06
       Std.               0.33    0.32    0.32
       Frac. err. mean    0.12    0.19    0.13
LASSO  Mean              -2.82    4.48   -5.62
       Std.               0.34    0.32    0.33
       Frac. err. mean    0.17    0.23    0.19
LMI    Mean              -3.70    4.74   -6.34
       Std.               0.34    0.32    0.34
       Frac. err. mean    0.09    0.18    0.09
Mean and standard deviation in the histograms of the coefficients computed with PCR, LASSO, and LMI.
RESULTS: COMPARISON
Comparison of the outcome of different methods on the real data
Different methods identified unique sets of common and distinct predictors for each output.
Graphical illustration of methods PCR, LASSO, and LMI in detection of significant predictors for output IL-6 in the PP/cytokine experimental dataset:
- Only the PCR method detects the true input cAMP.
- Zone I provides validation and highlights the common output of all the methods.
RESULTS: SUMMARY
Comparison with respect to different noise types: LASSO is the most robust method for different noise types.
Missing data: LASSO shows less RMSE deviation, i.e., it is more robust.
Collinearity: PCR shows less deviation against noise level, and better accuracy and G with increasing noise level.
A COMPARISON (Asadi et al., 2012)

Criterion (score definition)                                              PCR     LASSO   LMI
Increasing noise: RMSE
  Score = (avg RMSE across noise levels for LS) /
          (avg RMSE across noise levels for the chosen method)            0.68*   0.56    0.94
Increasing noise: std. dev. and error in mean of coefficients
  Score = 1 - avg( frac. error in mean(10,12,20) +
          std(10,12,20) / |true associated coefficients| )                0.53    0.47    0.55
Increasing noise: Acc./G
  Score = avg accuracy across noise levels (white noise)                  0.70    0.87    0.91**
Increasing noise: fractional error in estimating the parameters
  Score = 1 - avg fractional error across noise levels (white noise)      0.81    0.55    0.78
Types of noise: fractional error in estimating the parameters
  Score = 1 - avg fractional error across noise levels and
          noise types (20% noise level)                                   0.80    0.56    0.79
Types of noise: accuracy and G
  Score = avg accuracy across noise levels and noise types                0.71    0.87    0.91
Dimension ratio / size: fractional error in estimating the parameters
  Score = 1 - avg fractional error across noise levels and
          ratios (m/n = 100/25, 100/50, 400/100)                          0.77    0.53    0.75
Dimension ratio / size: accuracy and G
  Score = avg accuracy across white-noise levels and ratios
          (m/n = 100/25, 100/50, 400/100)                                 0.66    0.83    0.90

* PCR degrades gradually with the level of noise.
** At high noise all methods are similar.
DPLASSO
Doubly Penalized Least Absolute Shrinkage and Selection Operator
OUR APPROACH: DPLASSO
Model: y = Xb + e,  B: {b1, b2, b3, b4, b5, b6, b7, b8, ...}
Layer 1: PLS + statistical significance testing
  -> weight vector W: {0, 1, 0, 1, 0, 1, 0, 1, ...}
Layer 2: LASSO with the weights W
  -> retained coefficients B: {b1, b3, b5, b6, b7, ...}
-> Reconstructed network
Our approach: DPLASSO includes two parameter-selection layers:
Layer 1 (supervisory layer):
- Partial Least Squares (PLS)
- Statistical significance testing
Layer 2 (lower layer):
- LASSO with extra weights on less informative model parameters derived in layer 1
- Retain significant predictors and set the remaining small coefficients to zero
DPLASSO WORK FLOW
b = argmin { (y - Xb)^T (y - Xb) }
s.t.  Σ_{i=1..p} w_ij |b_ij|  <=  t · Σ_{i=1..p} w_ij |b_ij^LS|
w_ij = 0 if b_ij is PLS-significant, 1 otherwise
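A hedged sketch of this weighted-LASSO idea, using a penalized (Lagrangian) variant rather than the constrained form; the weights and data are made up for illustration:

```python
import numpy as np

def dplasso_sketch(X, y, w, alpha, n_iter=500):
    """Weighted-LASSO sketch with penalty alpha * sum_j w_j * |b_j|.

    w_j = 0 leaves a PLS-significant coefficient unpenalized;
    w_j = 1 applies the usual LASSO shrinkage.
    """
    m, n = X.shape
    b = np.zeros(n)
    step = m / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y) / m)
        b = np.sign(z) * np.maximum(np.abs(z) - step * alpha * w, 0.0)
    return b

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 3))
y = X @ np.array([2.0, 0.0, 2.0]) + 0.1 * rng.standard_normal(200)
w = np.array([0.0, 1.0, 1.0])       # coefficient 0 deemed PLS-significant
b = dplasso_sketch(X, y, w, alpha=0.5)
print(np.round(b, 1))
```

The PLS-significant coefficient is recovered essentially unshrunk, while the penalized ones are shrunk or zeroed, which is the intended compromise between the two layers.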
DPLASSO: EXTENDED VERSION
Smooth weights:
Layer 1: continuous significance score (versus binary):
  θ_i = |r_i^PLS| - tinv(1 - α/2, v),  v = DOF,  1 - α = confidence level
Mapping function (logistic significance score):
  s(θ_i) = 1 / (1 + e^{-λ θ_i}),  λ = tuning parameter
Layer 2: continuous weight vector (versus binary weight vector):
  w_i(θ_i) = 1 - s(θ_i)
  significant coefficients (θ_i > 0): 0.5 < s(θ_i) <= 1, so 0 <= w_i < 0.5
  insignificant coefficients (θ_i <= 0): 0 <= s(θ_i) <= 0.5, so 0.5 <= w_i <= 1
[Plot: significance score s(θ) and weight function w(θ) versus θ]
APPLICATIONS
1. Synthetic (random) networks: datasets generated in Matlab
2. Biological dataset: Saccharomyces cerevisiae cell-cycle model
SYNTHETIC (RANDOM) NETWORKS
Datasets generated in Matlab using:
Linear dynamic system
Dominant poles/eigenvalues (λ) in the range [-2, 0]
Lyapunov stable
  Informal definition from Wikipedia: if all solutions of the dynamical system that start out near an equilibrium point x_e stay near x_e forever, then the system is Lyapunov stable.
Zero-input / excited-state release condition
5% measurement (white) noise.
  dy/dt = Xb + e,  e(t) ~ N(0, σ)
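One way to generate such a dataset can be sketched as follows (illustrative, not the authors' Matlab code; the eigenvalue scaling and Euler integration are assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
# Random symmetric matrix rescaled so its eigenvalues fall in [-2, 0]:
# marginally (Lyapunov) stable linear dynamics.
M = rng.standard_normal((n, n))
S = (M + M.T) / 2
eig = np.linalg.eigvalsh(S)
A = -2.0 * (S - eig.min() * np.eye(n)) / (eig.max() - eig.min())

dt, steps = 0.01, 500
x = np.zeros((steps, n))
x[0] = rng.standard_normal(n)          # excited initial state, zero input
for k in range(steps - 1):
    x[k + 1] = x[k] + dt * (A @ x[k])  # Euler integration
x_noisy = x + 0.05 * x.std() * rng.standard_normal(x.shape)  # 5% white noise

lam = np.linalg.eigvalsh(A)
print(lam.min() >= -2.0 - 1e-6, lam.max() <= 1e-6)
```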
Two metrics to evaluate the performance of DPLASSO:
1. Sensitivity, specificity, G (geometric mean of sensitivity and specificity), accuracy
2. The root-mean-squared error (RMSE) of prediction
METRICS
RMSE = ( (1/m) Σ_{i=1..m} (y_i - y_{p,i})^2 )^{1/2}
Accuracy = (TN + TP) / (TN + TP + FN + FP)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Precision = TP / (TP + FP)
TP: True Positive; FP: False Positive; TN: True Negative; FN: False Negative
TUNING
Tuning the shrinkage parameter for DPLASSO
The shrinkage parameter at the LASSO level (threshold t) is tuned via k-fold cross-validation (k = 10) on the associated dataset.
[Plot: validation error versus selection threshold t for DPLASSO on the synthetic dataset]
Rule of thumb after cross-validation:
  Example: the optimal value of the tuning parameter for a network with 65% connectivity is roughly 0.65.
PERFORMANCE COMPARISON: ACCURACY
[Figure: accuracy of LASSO, DPLASSO, and PLS for network densities 5%, 10%, 20%, and 50%; network size 20, 10 Monte Carlo runs, 5% noise]
PLS has better performance.
DPLASSO provides a good compromise between LASSO and PLS in terms of accuracy for different network densities.
PERFORMANCE COMPARISON: SENSITIVITY
[Figure: sensitivity of LASSO, DPLASSO, and PLS for network densities 5%, 10%, 20%, and 50%; network size 20, 10 Monte Carlo runs, 5% noise]
LASSO has better performance.
DPLASSO provides a good compromise between LASSO and PLS in terms of sensitivity for different network densities.
PERFORMANCE COMPARISON: SPECIFICITY
[Figure: specificity of LASSO, DPLASSO, and PLS for network densities 5%, 10%, 20%, and 50%; network size 20, 10 Monte Carlo runs, 5% noise]
DPLASSO provides a good compromise between LASSO and PLS in terms of specificity for different network densities.
PERFORMANCE COMPARISON: NETWORK-SIZE
DPLASSO provides a good compromise between LASSO and PLS in terms of accuracy for different network sizes, and in terms of sensitivity (not shown).
[Figure: accuracy of LASSO, DPLASSO, and PLS for network sizes 10 (100 potential connections), 20 (400 potential connections), and 50 (2500 potential connections)]
ROC CURVE vs. DYNAMICS AND WEIGHTINGS
DPLASSO exhibits better performance for networks with slow dynamics.
The tuning parameter λ in DPLASSO can be adjusted to improve performance for fast dynamic networks.
[Figure: ROC curves (sensitivity versus 1 - specificity) for LASSO, DPLASSO, and PLS, varying the dynamics and the weighting λ; density 20%, MC 10, size 50]
YEAST CELL DIVISION
Experimental dataset generated via a well-known nonlinear model of the cell division cycle of fission yeast*. The model is dynamic with 9 state variables.
* Novak, Bela, et al. "Mathematical model of the cell division cycle of fission yeast." Chaos: An Interdisciplinary Journal of Nonlinear Science 11.1 (2001): 277-286.
CELL DIVISION CYCLE
True Network (Cell Division Cycle)
PLS DPLASSO LASSO
Missing in DPLASSO!
RECONSTRUCTION PERFORMANCE
Case Study I: 10 Monte Carlo simulations, size 20, average over different tuning parameters, network density, and Monte Carlo sample datasets

Method     Accuracy   Sensitivity   Specificity   SD RMSE/Mean
LASSO        0.39        0.90          0.05          0.06
DPLASSO      0.52        0.90          0.34          0.07
PLS          0.59        0.80          0.20          0.07

Case Study II: Cell division cycle, average over tuning-parameter values

Method     Accuracy   Sensitivity   Specificity   SD RMSE/Mean
LASSO        0.31        0.92          0.16          0.14
DPLASSO      0.56        0.73          0.52          0.08
PLS          0.60        0.67          0.63          0.09
CONCLUSION
Novel method, Doubly Penalized Least Absolute Shrinkage and Selection Operator (DPLASSO), to reconstruct dynamic biological networks
- Based on the integration of significance testing of coefficients and optimization
- Smoothing function to trade off between PLS and LASSO
Simulation results on synthetic datasets
- DPLASSO provides a good compromise between PLS and LASSO in terms of accuracy and sensitivity for
  - different network densities
  - different network sizes
For the biological dataset
- DPLASSO is best in terms of sensitivity
- DPLASSO is a good compromise between LASSO and PLS in terms of accuracy, specificity, and lift
Information Theory Methods
Farzaneh Farangmehr
Mutual Information
Mutual information gives us a metric indicating how much information one variable provides for predicting the behavior of another variable.
The higher the mutual information, the more similar the two profiles are.
For two discrete random variables X = {x1, ..., xn} and Y = {y1, ..., ym}:
  I(X; Y) = Σ_{i=1..n} Σ_{j=1..m} p(x_i, y_j) log [ p(x_i, y_j) / (p(x_i) p(y_j)) ]
p(x_i, y_j) is the joint probability of x_i and y_j; p(x_i) and p(y_j) are the marginal probabilities of x_i and y_j.
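For discrete profiles this formula can be computed directly from empirical frequencies; a minimal sketch:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) = sum_ij p(x_i,y_j) * log( p(x_i,y_j) / (p(x_i) p(y_j)) )."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))   # empirical joint counts
    px = Counter(xs)             # empirical marginal counts
    py = Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

x = [0, 0, 1, 1]
print(round(mutual_information(x, x), 3))             # identical profiles: I = H(X) = ln 2
print(round(mutual_information(x, [0, 1, 0, 1]), 3))  # independent profiles: I = 0
```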
Information-theoretical approach: Shannon theory
Hartley's conceptual framework of information relates the information of a random variable to its probability.
Shannon defined the entropy, H, of a random variable X given a random sample {x1, ..., xn} in terms of its probability distribution:
  H(X) = Σ_{i=1..n} p(x_i) I(x_i) = -Σ_{i=1..n} p(x_i) log[ p(x_i) ]
Entropy is a good measure of randomness or uncertainty.
Shannon defines mutual information as the amount of information about a random variable X that can be obtained by observing another random variable Y:
  I(X, Y) = H(X) + H(Y) - H(X, Y) = H(Y) - H(Y|X) = H(X) - H(X|Y) = I(Y, X)
Mutual information networks
X = {x1, ..., xi},  Y = {y1, ..., yj}
The ultimate goal is to find the best model that maps X -> Y.
- The general definition: Y = f(X) + U. In linear cases: Y = [A]X + U, where [A] is a matrix that defines the linear dependency of inputs and outputs.
Information theory maps inputs to outputs (both linear and non-linear models) by using the mutual information:
  I(X; Y) = Σ_i Σ_j p(x_i, y_j) log [ p(x_i, y_j) / (p(x_i) p(y_j)) ]
Mutual information networks
The entire framework of network reconstruction using information theory has two stages:
1. Mutual information measurements
2. The selection of a proper threshold.
Mutual information networks rely on the measurement of the mutual information matrix (MIM). The MIM is a square matrix whose elements (MIM_ij = I(X_i; Y_j)) are the mutual information between X_i and Y_j.
Choosing a proper threshold is a non-trivial problem. The usual way is to permute the expression measurements many times and recalculate a distribution of the mutual information for each permutation. The distributions are then averaged, and a good choice for the threshold is the largest mutual information value in the averaged permuted distribution.
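The permutation idea can be sketched as follows, using toy discrete profiles and a minimal discrete MI estimator (`mi` here is illustrative, not the estimator used later in the deck):

```python
import math
import random
from collections import Counter

def mi(xs, ys):
    """Minimal discrete mutual-information estimator."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Permutation threshold sketch: shuffle one profile many times and take
# the largest MI observed under the null of independence.
random.seed(0)
x = [i % 4 for i in range(40)]   # toy discretized expression profile
y = x[:]                         # strongly dependent partner profile
null_mis = []
for _ in range(200):
    perm = y[:]
    random.shuffle(perm)
    null_mis.append(mi(x, perm))
threshold = max(null_mis)
print(mi(x, y) > threshold)      # true dependence exceeds the null threshold
```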
Mutual information networks: Data Processing Inequality (DPI)
The DPI for biological networks states that if genes g1 and g3 interact only through a third gene, g2, then:
  I(g1, g3) <= min[ I(g1, g2); I(g2, g3) ]
Checking against the DPI may identify those gene pairs which are not directly dependent even if p(g_i, g_j) ≠ p(g_i) p(g_j).
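A toy sketch of DPI-based pruning on a three-gene network with hypothetical MI values:

```python
from itertools import combinations

# DPI pruning sketch: in every fully connected triangle of the MI network,
# drop the edge with the smallest mutual information (the indirect interaction).
mi = {("g1", "g2"): 0.9, ("g2", "g3"): 0.8, ("g1", "g3"): 0.3}

def edge(a, b):
    return (a, b) if (a, b) in mi else (b, a)

genes = {g for e in mi for g in e}
removed = set()
for a, b, c in combinations(sorted(genes), 3):
    tri = [edge(a, b), edge(b, c), edge(a, c)]
    if all(e in mi for e in tri):
        removed.add(min(tri, key=lambda e: mi[e]))  # weakest edge violates DPI

kept = set(mi) - removed
print(sorted(kept))
```

Here the weak g1-g3 edge is interpreted as an indirect interaction through g2 and removed.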
ARACNe algorithm
ARACNE flowchart [Califano and coworkers]
ARACNE stands for Algorithm for the Reconstruction of Accurate Cellular NEtworks [25].
ARACNE identifies candidate interactions by estimating the pairwise mutual information of gene expression profiles, I(g_i, g_j), and then filters the MIs using an appropriate threshold, I_0, computed for a specific p-value, p_0. In the second step, ARACNE removes the vast majority of indirect connections using the Data Processing Inequality (DPI).
Protein-Cytokine Network in Macrophage Activation
Application to Protein-Cytokine Network Reconstruction
Release of immune-regulatory cytokines during the inflammatory response is mediated by a complex signaling network [45].
Current knowledge does not provide a complete picture of these signaling components.
22 signaling proteins responsible for cytokine release:
  cAMP, AKT, ERK1, ERK2, Ezr/Rdx, GSK3A, GSK3B, JNK lg, JNK sh, MSN, p38, p40Phox, NFkB p65, PKCd, PKCmu2, RSK, Rps6, SMAD2, STAT1a, STAT1b, STAT3, STAT5
7 released cytokines (as signal receivers):
  G-CSF, IL-1a, IL-6, IL-10, MIP-1a, RANTES, TNFa
We developed an information-theoretic model that derives the responses of seven cytokines from the activation of twenty-two signaling phosphoproteins in RAW 264.7 macrophages.
This model captured most known signaling components involved in cytokine release and was able to reasonably predict potentially important novel signaling components.
Protein-Cytokine Network Reconstruction: MI Estimation using KDE
- Given a random sample {x1, ..., xn} for a univariate random variable X with an unknown density f, a kernel density estimator (KDE) estimates the shape of this function as:
    f_h(x) = (1/n) Σ_{i=1..n} k_h(x - x_i) = (1/(nh)) Σ_{i=1..n} k((x - x_i)/h)
  Assuming Gaussian kernels:
    f_h(x) = (1/(n √(2π) h)) Σ_{i=1..n} exp( -(x - x_i)^2 / (2h^2) )
- Bivariate kernel density function of two random variables X and Y given two random samples {x1, ..., xn} and {y1, ..., yn}:
    f_h(x, y) = (1/(2π n h^2)) Σ_{i=1..n} exp( -[(x - x_i)^2 + (y - y_i)^2] / (2h^2) )
- Mutual information of X and Y using kernel density estimation:
    I(X, Y) = (1/n) Σ_{j=1..n} ln [ f(x_j, y_j) / (f(x_j) f(y_j)) ]
n = sample size; h = kernel width
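The univariate Gaussian KDE above can be sketched directly in NumPy (illustrative, with a standard-normal sample so the estimate can be compared against the known density):

```python
import numpy as np

def kde_gauss(x, sample, h):
    """Gaussian KDE: f_h(x) = (1/(n*sqrt(2*pi)*h)) * sum_i exp(-(x-x_i)^2/(2h^2))."""
    d = (x - sample[:, None]) / h
    return np.exp(-0.5 * d ** 2).sum(0) / (len(sample) * np.sqrt(2 * np.pi) * h)

rng = np.random.default_rng(6)
sample = rng.standard_normal(2000)       # draws from N(0, 1)
grid = np.linspace(-3, 3, 61)
f_hat = kde_gauss(grid, sample, h=0.3)
f_true = np.exp(-0.5 * grid ** 2) / np.sqrt(2 * np.pi)
print(round(float(np.abs(f_hat - f_true).max()), 2))  # small for a reasonable h
```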
Protein-Cytokine Network Reconstruction: Kernel bandwidth selection
There is no universal way of choosing h; however, the ranking of the MIs depends only weakly on it.
The most common criterion used to select the optimal kernel width is to minimize the expected risk function, also known as the mean integrated squared error (MISE):
  MISE(h) = E ∫ [f_h(x) - f(x)]^2 dx
Loss function (integrated squared error):
  L(h) = ∫ [f_h(x) - f(x)]^2 dx = ∫ f_h^2(x) dx - 2 ∫ f_h(x) f(x) dx + const,  where const = ∫ f^2(x) dx
The unbiased cross-validation approach selects the kernel width that minimizes the loss function by minimizing:
  UCV(h) = ∫ f_h^2(x) dx - (2/n) Σ_{i=1..n} f_{(-i),h}(x_i)
where f_{(-i),h}(x_i) is the kernel density estimate with bandwidth h at x_i obtained after removing the i-th observation.
Protein-Cytokine Network Reconstruction: Threshold Selection
Based on large-deviation theory (extended to biological networks by ARACNE), the probability that an empirical value of mutual information I is greater than I_0, provided that its true value Ī = 0, is:
  P(I > I_0 | Ī = 0) ~ e^{-cNI_0}
where the bar denotes the true MI, N is the sample size, and c is a constant. Taking the logarithm of both sides:
  ln P = a - b I_0
Therefore, ln P can be fitted as a linear function of I_0 with slope -b, where b is proportional to the sample size N.
Using these results, for any given dataset with sample size N and a desired p-value, the corresponding threshold can be obtained.
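Given a fitted pair (a, b), the threshold for a desired p-value follows by inverting ln p = a - b·I_0; a sketch with hypothetical fitted points:

```python
import numpy as np

# Hypothetical null-distribution summary: ln P generated as roughly a - b * I0
# (a = 0.5, b = 40 are made-up values standing in for a real permutation fit).
i0 = np.array([0.05, 0.10, 0.15, 0.20, 0.25])
lnp = 0.5 - 40.0 * i0 + 0.01 * np.random.default_rng(8).standard_normal(5)

slope, intercept = np.polyfit(i0, lnp, 1)     # slope ~ -b, intercept ~ a
b, a = -slope, intercept

p_desired = 1e-3
threshold = (a - np.log(p_desired)) / b       # invert ln p = a - b * I0
print(round(float(threshold), 2))
```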
Kernel density estimation of cytokines
Figure 3: The probability distribution of seven released cytokines in macrophage 264.7 using kernel density estimation (KDE).
Mutual information for all 22x7 phosphoprotein-cytokine pairs from Toll data (the upper bar) and non-Toll data (the lower bar).
Protein-Cytokine Network Reconstruction: Protein-cytokine signaling networks
The topology of signaling-protein to released-cytokine connections obtained from the non-Toll (A) and Toll (B) data.
Protein-Cytokine Network Reconstruction: Summary
This model successfully captures all known signaling components involved in cytokine release.
It predicts two potentially new signaling components involved in cytokine release: Ribosomal S6 kinase on Tumor Necrosis Factor, and Ribosomal Protein S6 on Interleukin-10.
For MIP-1a and IL-10, whose low coefficients of determination lead to less precise linear fits, the information-theoretical model shows an advantage over linear methods such as the PCR minimal model [Pradervand et al.] in capturing all known regulatory components involved in cytokine release.
Network reconstruction from time-course data: Background: Time-delayed gene networks
This comes from the consideration that the expression of a gene at a certain time could depend on the expression level of another gene at the previous time point, or at very few time points before.
The time-delayed gene regulation pattern in organisms is a common phenomenon, since:
- If the effect of gene g1 on gene g2 depends on an inducer, g3, that has to be bound first in order to be able to bind to the inhibition site on g2, there can be a significant delay between the expression of gene g1 and its observed effect, i.e., the inhibition of gene g2.
- Not all the genes that influence the expression level of a gene are necessarily observable in one microarray experiment. It is quite possible that they are not among the genes being monitored in the experiment, or that their function is currently unknown.
The Algorithm
[Equation: ICNA selection criterion over sub-network activities]
Network reconstruction from time-course data: Algorithm
Network reconstruction from time-course data: The flow diagram
The flow diagram of the information-theoretic approach for biological network reconstruction from time-course microarray data, identifying the topology of functional sub-networks:
1. Gene lists
2. Cluster into n sub-networks
3. Measure sub-network activities
4. Flag potentially dependent sub-networks by measuring ICNA
5. Measure the influence between flagged sub-networks
6. Build the influence matrix
7. Find the threshold
8. Remove connections below the threshold
9. Apply DPI for connections above the threshold
10. Build the network based on non-zero elements of the mutual information matrix
Network reconstruction from time-course data: Case study: the yeast cell-cycle
The cell cycle consists of four distinct phases:
G0 (Gap 0): A resting phase where the cell has left the cycle and has stopped dividing.
G1 (Gap 1): Cells increase in size in Gap 1. The G1 checkpoint control mechanism ensures that everything is ready for DNA synthesis.
S1 (Synthesis): DNA replication occurs during this phase.
G2 (Gap 2): During the gap between DNA synthesis and mitosis, the cell will continue to grow. The G2 checkpoint control mechanism ensures that everything is ready to enter the M (mitosis) phase and divide.
M (Mitosis): Cell growth stops at this stage and cellular energy is focused on the orderly division into two daughter cells. A checkpoint in the middle of mitosis (metaphase checkpoint) ensures that the cell is ready to complete cell division.
Network reconstruction from time-course data: Case study: the yeast cell-cycle
Data from Gene Expression Omnibus (GEO)
Culture synchronized by alpha-factor arrest; samples taken every 7 minutes as cells went through the cell cycle.
Value type: Log ratio
5,981 genes, 7728 probes and 14 time points
94 Pathways from KEGG Pathways
Network reconstruction from time-course data: Case study: the yeast cell-cycle
The reconstructed functional network of the yeast cell cycle obtained from time-course microarray data.
Mutual information networks: Advantages and Limits
A major advantage of information theory is its nonparametric nature. Entropy does not require any assumptions about the distribution of variables [43].
It does not make any assumption about the linearity of the model for the ease of computation.
It is applicable for time series data.
A high mutual information does not tell us anything about the directionof the relationship.
Time Varying Networks
Causality
Maryam Masnardi-Shirazi
Causal Inference of Time-Varying Biological Networks
Definition of Causality
Beyond Correlation: Causation
Idea: map a set of K time series to a directed graph with K nodes, where an edge is placed from a to b if the past of a has an impact on the future of b.
How do we quantitatively do this in a general-purpose manner?
Granger's Notion of Causality
It is said that process X Granger-causes process Y if future values of Y can be better predicted using the past values of X and Y than by using only past values of Y.
Granger Causality Formulation
There are many ways to formulate the notion of Granger causality, some of which are:
- Information theory and the concept of directed information
- Learning theory
- Dynamic Bayesian networks
- Vector autoregressive models (VAR)
- Hypothesis tests, e.g. t-tests and F-tests
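A toy sketch of the definition via lag-1 least-squares autoregressions (an assumed AR(1) structure on simulated data, not the presenter's VAR setup): compare the residual variance of Y predicted from its own past against the residual variance when the past of X is added.

```python
import numpy as np

# Simulate two series where x drives y with a one-step lag.
rng = np.random.default_rng(7)
T = 500
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()

def resid_var(target, regressors):
    """Residual variance of a least-squares fit of target on the regressors."""
    Z = np.column_stack(regressors)
    beta, *_ = np.linalg.lstsq(Z, target, rcond=None)
    r = target - Z @ beta
    return r @ r / len(r)

restricted = resid_var(y[1:], [y[:-1]])       # past of y only
full = resid_var(y[1:], [y[:-1], x[:-1]])     # past of y and x
print(full < restricted)                      # x Granger-causes y
```

A formal test would compare the two residual variances with an F statistic rather than a raw inequality.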
Vector Autoregressive Model (VAR)
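A standard VAR(p) model for K time series, as commonly used in Granger-causality analysis, takes the form:

```latex
x_t = A_1 x_{t-1} + A_2 x_{t-2} + \cdots + A_p x_{t-p} + \varepsilon_t,
\qquad x_t \in \mathbb{R}^K,\; A_l \in \mathbb{R}^{K \times K}
```

A nonzero (i, j) entry in any coefficient matrix A_l indicates that the past of series j helps predict series i, which places a directed edge from node j to node i in the reconstructed network.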
Least Squares Estimation
Least Squares Estimation (Cont.)
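The estimation step can be sketched by recovering the coefficient matrix of a simulated VAR(1) system with ordinary least squares; the system, its size, and the noise level are made up for illustration:

```python
import numpy as np

# Simulate a 3-node VAR(1) system x(t+1) = A x(t) + noise,
# then recover A by ordinary least squares.
rng = np.random.default_rng(2)
A_true = np.array([[0.5, 0.0, 0.0],
                   [0.4, 0.3, 0.0],
                   [0.0, 0.6, 0.2]])
T = 2000
X = np.zeros((T, 3))
for t in range(T - 1):
    X[t + 1] = A_true @ X[t] + 0.05 * rng.normal(size=3)

# Least squares: lstsq solves X[:-1] @ B = X[1:], so A = B.T
A_hat, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
A_hat = A_hat.T

print(np.round(A_hat, 2))   # close to A_true
```

With enough time points the estimate converges to the true coefficients; with too few (the rank-deficient case discussed next), the normal equations have no unique solution.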
Processing the data
Phosphoprotein two-ligand screen assay: RAW 264.7
There are 327 experiments from western blots processed with mixtures of phospho-specific antibodies. In all experiments, the effects of single-ligand and simultaneous ligand addition are measured.
Each experiment includes the fold change of phosphoprotein at time points t = 0, 1, 3, 10, 30 minutes.
Data at t = 30 minutes are omitted, and data from t = 0 to 10 are interpolated at 1-minute steps.
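The interpolation step can be sketched as follows; the fold-change values are hypothetical, and linear interpolation is assumed since the slides do not name the scheme:

```python
import numpy as np

# Fold changes measured at t = 0, 1, 3, 10 min (t = 30 dropped),
# resampled onto a 1-minute grid. Values here are made up.
t_measured = np.array([0.0, 1.0, 3.0, 10.0])
fold_change = np.array([1.0, 1.4, 2.1, 1.2])   # hypothetical phosphoprotein fold changes

t_grid = np.arange(0, 11)                      # t = 0..10 min, step 1 min
fc_interp = np.interp(t_grid, t_measured, fold_change)

print(fc_interp)   # 11 evenly spaced values
```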
Least Squares Estimation and Rank Deficiency of the Transformation Matrix
[Figure: block diagram showing the Y and X data matrices assembled by stacking Exp. 1 through Exp. 327]
Normalizing the data
Statistical Significance Test (Confidence Interval)
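One common form of such a test, sketched here on simulated data: retain a least-squares coefficient only if its approximate 95% confidence interval excludes zero. The variables and effect sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.8 * x1 + 0.0 * x2 + 0.3 * rng.normal(size=n)  # x2 has no true effect

X = np.column_stack([x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standard errors from the residual variance and (X'X)^-1
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# Keep an edge only when the ~95% CI (normal approximation) excludes zero
significant = np.abs(beta) > 1.96 * se
print(beta, significant)
```

Edges whose coefficients fail this test are pruned from the reconstructed network.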
The Reconstructed Phosphoproteins Signaling Network
The network is reconstructed by estimating causal relationships between all nodes.
All 21 phosphoproteins are present and interacting with one another.
There are 122 edges in this network.
Correlation and Causation
The conventional dictum that "correlation does not imply causation" means that correlation cannot be used to infer a causal relationship between the variables.
This does not mean that correlations cannot indicate the potential existence of causal relations. However, the causes underlying the correlation, if any, may be indirect and unknown.
Consequently, establishing a correlation between two variables is not a sufficient condition to establish a causal relationship (in either direction).
http://en.wikipedia.org/wiki/Correlation_does_not_imply_causation
Correlation and Causality comparison
Heat-map of the correlation matrix between the input (X) and output (Y)
The reconstructed network considering significant coefficients and their intersection with connections having correlations higher than 0.5
Correlation and Causality comparison (cont.)
Heat-map of the correlation matrix between the input (X) and output (Y)
The reconstructed network considering significant coefficients and their intersection with connections having correlations higher than 0.4
Validating our network
Identification of Crosstalk between Phosphoprotein Signaling Pathways in RAW 264.7 Macrophage Cells (Gupta et al., 2010)
The Reconstructed Phosphoproteins Signaling Network for t=0 to t=4 minutes
Heat-map of the correlation matrix between the input (X) and output (Y) for t=0 to t=4 minutes
Intersection of causal coefficients with connections with correlations higher than 0.4 for time t=0 to t=4 minutes
9 nodes, 15 edges
The Reconstructed Phosphoproteins Signaling Network for t=3 to t=7 minutes
Heat-map of the correlation matrix between the input (X) and output (Y) for t=3 to t=7 minutes
Intersection of causal coefficients with connections with correlations higher than 0.4 for time t=3 to t=7 minutes
19 nodes, 51 edges
The Reconstructed Phosphoproteins Signaling Network for t=6 to t=10 minutes
Heat-map of the correlation matrix between the input (X) and output (Y) for t=6 to t=10 minutes
Intersection of causal coefficients with connections with correlations higher than 0.4 for time t=6 to t=10 minutes
19 nodes, 56 edges
Time-Varying Reconstructed Network
t=0 to 4 min t=3 to 7 min t=6 to 10 min
The Reconstructed Network for t=0 to t=4 minutes without the presence of LPS as a Ligand
With LPS: 15 edges
Without LPS: 16 edges
The Reconstructed Network for t=3 to t=7 minutes without the presence of LPS as a Ligand vs. the presence of all ligands
With all ligands including LPS: 51 edges
Without LPS: 55 edges
The Reconstructed Network for t=6 to t=10 minutes without the presence of LPS as a Ligand vs. the presence of all ligands
With all ligands including LPS: 56 edges
Without LPS: 66 edges
Time-Varying Network with LPS not present as a ligand
t=0 to 4 min t=3 to 7 min t=6 to 10 min
Summary
Information theory methods can help in determining causal and time-dependent networks from time-series data.
The granularity of the time course will be a factor in determining the causal connections.
Such dynamical networks can be used to construct both linear and nonlinear models from data.