Computational methods for inferring cellular networks
Stat 877, Apr 15th 2014
Sushmita Roy


Goals for today

• Introduction
– Different types of cellular networks

• Methods for network reconstruction from expression
– Per-gene vs per-module methods
– Sparse Candidates Bayesian networks
– Regression-based methods
• GENIE3
• L1-DAG learn

• Assessing confidence in network structure

Why networks?

“A system is an entity that maintains its function through the interaction of its parts”– Kohl & Noble

To understand cells as systems: measure, model, predict, refine

Uwe Sauer, Matthias Heinemann, Nicola Zamboni, Science 2007

Different types of networks

• Physical networks
– Transcriptional regulatory networks: interactions between regulatory proteins (transcription factors) and genes
– Protein-protein: interactions among proteins
– Signaling networks: protein-protein and protein-small molecule interactions to relay signals from outside the cell to the nucleus

• Functional networks
– Metabolic: reactions through which enzymes convert substrates to products
– Genetic: interactions among genes which, when perturbed together, produce a more significant phenotype than when perturbed individually

Transcriptional regulatory networks

Figure: the regulatory network of E. coli; 153 TFs (green & light red), 1319 targets (Vargas and Santillan, 2008).

• Directed, signed, weighted graph
• Nodes: TFs and target genes
• Edges: TF A regulates gene B's expression level

Metabolic networks

Figure: reactions associated with galactose metabolism (KEGG); enzymes M, N, O act on metabolites a, b, c, d.

• Unweighted graph
• Nodes: metabolic enzymes
• Edges: enzymes M and N share a compound

Protein-protein interaction networks

Barabasi et al. 2003

Figure: yeast protein interaction network.

• Un/weighted graph
• Nodes: proteins
• Edges: protein X physically interacts with protein Y

Challenges in network biology

Figure: three challenges.

1. Network reconstruction/inference (today): identifying edges and their logic, e.g. X = f(A, B), Y = g(B); learning both structure and parameters.
2. Network structure analysis: hubs, degree distributions, network motifs, node attributes.
3. Network applications: predicting the function and activity of genes from the network.

Goals for today

• Introduction
– Different types of cellular networks

• Methods for network reconstruction from expression
– Per-gene vs per-module methods
– Sparse Candidates Bayesian networks
– Regression-based methods
• GENIE3
• L1-DAG learn

• Assessing confidence in network structure

Computational methods to infer networks

• We will focus on transcriptional regulatory networks
– These networks control what genes get activated and when
– Precise gene activation or inactivation is crucial for many biological processes
– Microarrays and RNA-seq allow us to systematically measure gene activity levels

• These networks are primarily inferred from gene expression data

What do we want a model to capture?

Structure: who are the regulators? For example, Hot1 and Sko1 regulate HSP12 (Hot1 regulates HSP12; HSP12 is a target of Hot1).

Function: how do they determine expression levels? X3 = ψ(X1, X2), where ψ may be Boolean, linear, differential equations, probabilistic, ...

Input: transcription factor levels (trans). Output: expression levels of targets (e.g., HSP12 as a function of Sko1 and Hot1).

Mathematical representations of regulatory networks

Input: expression/activity levels of the regulators (e.g., X1 and X2). Output: expression of the target gene, X3 = f(X1, X2). Models differ in the function f that maps regulator input levels to target levels:

• Boolean networks: truth tables
• Differential equations: rate equations
• Probabilistic graphical models: probability distributions

Example Boolean rule for X3 = f(X1, X2):

X1  X2 | X3
 0   0 |  0
 0   1 |  1
 1   0 |  1
 1   1 |  1
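As a minimal sketch of the Boolean representation, the truth table above can be written as an update rule (the OR rule is read directly from the table; the function name is hypothetical):

```python
# Minimal sketch: the truth table for X3 = f(X1, X2) above is logical OR.

def x3_update(x1: int, x2: int) -> int:
    """Boolean update rule read off the truth table: X3 = X1 OR X2."""
    return int(bool(x1) or bool(x2))

# Reproduce the table.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, x3_update(x1, x2))
```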

Regulatory network inference from expression

Expression-based network inference

From an expression matrix (rows: genes, columns: experiments; entry (i, j) is the expression level of gene i in experiment j), infer both the structure (e.g., X1 and X2 are regulators of X3) and the function (X3 = f(X1, X2)).

Two classes of expression-based methods

• Per-gene/direct methods (Today)

• Module based methods (Thursday)


Per-gene methods

• Key idea: find the regulators that "best explain" the expression level of a gene

• Probabilistic graphical methods
– Bayesian networks (e.g., Sparse Candidates)
– Dependency networks (e.g., GENIE3, TIGRESS)

• Information theoretic methods
– Context Likelihood of Relatedness
– ARACNE

Module-based methods

• Find regulators for an entire module
– Assume genes in the same module have the same regulators
• Module Networks (Segal et al. 2005)
• Stochastic LeMoNe (Joshi et al. 2008)

Figure: a per-module model in which regulators X1 and X2 regulate a module containing genes Y1 and Y2.

Goals for today

• Introduction
– Different types of cellular networks

• Methods for network reconstruction from expression
– Per-gene vs per-module methods
– Sparse Candidates Bayesian networks
– Regression-based methods
• GENIE3
• L1-DAG learn

• Assessing confidence in network structure

Notation

• V: a set of p network components (p genes)
• E: edge set connecting V
• G = (V, E): the graph we wish to infer
• Xv: random variable for v ∈ V
• X = {X1, ..., Xp}
• D: dataset of N measurements of X; D = {x1, ..., xN}
• Θ: set of parameters associated with the network

Spang and Markowetz, BMC Bioinformatics 2005

Bayesian networks (BN)

• Denoted by B = {G, Θ}
– G: graph, directed and acyclic (DAG)
– Pa(Xi): parents of Xi
– Θ = {θ1, ..., θp}: parameters of the p conditional probability distributions (CPDs) P(Xi | Pa(Xi))

• Vertices of G correspond to random variables X1, ..., Xp
• Edges of G encode directed influences between X1, ..., Xp

A simple Bayesian network of four variables

Adapted from "Introduction to graphical models", Kevin Murphy, 2001

Random variables: Cloudy ∈ {T, F}, Sprinkler ∈ {T, F}, Rain ∈ {T, F}, WetGrass ∈ {T, F}. Each variable has a conditional probability distribution (CPD) given its parents.
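A minimal sketch of how the CPDs of this network define a joint distribution. The CPD values below are illustrative placeholders in the spirit of the Murphy tutorial, not values taken from the slide figure:

```python
from itertools import product

# Sketch: the sprinkler network factorizes the joint distribution as
# P(C, S, R, W) = P(C) * P(S | C) * P(R | C) * P(W | S, R).
# All CPD values below are illustrative placeholders.
P_C = {True: 0.5, False: 0.5}
P_S_given_C = {True: {True: 0.1, False: 0.9},    # keyed by C, then by S
               False: {True: 0.5, False: 0.5}}
P_R_given_C = {True: {True: 0.8, False: 0.2},
               False: {True: 0.2, False: 0.8}}
P_W_given_SR = {(True, True): {True: 0.99, False: 0.01},
                (True, False): {True: 0.9, False: 0.1},
                (False, True): {True: 0.9, False: 0.1},
                (False, False): {True: 0.0, False: 1.0}}

def joint(c, s, r, w):
    """P(Cloudy=c, Sprinkler=s, Rain=r, WetGrass=w) via the factorization."""
    return P_C[c] * P_S_given_C[c][s] * P_R_given_C[c][r] * P_W_given_SR[(s, r)][w]

# Sanity check: the joint sums to 1 over all 2^4 assignments.
print(sum(joint(*a) for a in product([True, False], repeat=4)))
```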

Bayesian network representation of a regulatory network

Inside the cell, the TFs Hot1 and Sko1 regulate the target gene HSP12. As a Bayesian network: the regulators (parents) X1 (Hot1) and X2 (Sko1) point to the target (child) X3 (HSP12). Each node is a random variable; the CPDs are P(X1), P(X2), and P(X3 | X1, X2).

Bayesian networks compactly represent joint distributions

Example Bayesian network of 5 variables (figure): X1, ..., X5, with parents and children as drawn.

Assume each Xi is binary. With no independence assertions, the full joint distribution needs on the order of 2^5 measurements; with the independence assertions encoded by the graph, the largest CPD needs only on the order of 2^3.

CPD in Bayesian networks

• The CPD P(Xi | Pa(Xi)) specifies a distribution over values of Xi for each combination of values of Pa(Xi)
• The CPD P(Xi | Pa(Xi)) can be parameterized in different ways
– Xi discrete: conditional probability table or tree
– Xi continuous: the CPD can be a Gaussian or a regression tree

• As a running example, consider four binary variables X1, X2, X3, X4

Representing CPDs as tables

P(X4 | X1, X2, X3) as a table, where Pa(X4) = {X1, X2, X3}:

X1  X2  X3 | P(X4=t)  P(X4=f)
 t   t   t |   0.9      0.1
 t   t   f |   0.9      0.1
 t   f   t |   0.9      0.1
 t   f   f |   0.9      0.1
 f   t   t |   0.8      0.2
 f   t   f |   0.5      0.5
 f   f   t |   0.5      0.5
 f   f   f |   0.5      0.5

Estimating CPD table from data

• Assume we observe the following N = 7 assignments of X1, X2, X3, X4:

X1  X2  X3  X4
 T   F   T   T
 T   T   F   T
 T   T   F   T
 T   F   T   T
 T   F   T   F
 T   F   T   F
 F   F   T   F

• For each joint assignment of X1, X2, X3, estimate the probabilities of each value of X4. For example, consider X1=T, X2=F, X3=T:

P(X4=T | X1=T, X2=F, X3=T) = 2/4
P(X4=F | X1=T, X2=F, X3=T) = 2/4
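A minimal sketch of this counting step in plain Python, using the seven observations listed above:

```python
from collections import Counter, defaultdict

# The seven observed assignments of (X1, X2, X3, X4) from the slide.
data = [
    ("T", "F", "T", "T"),
    ("T", "T", "F", "T"),
    ("T", "T", "F", "T"),
    ("T", "F", "T", "T"),
    ("T", "F", "T", "F"),
    ("T", "F", "T", "F"),
    ("F", "F", "T", "F"),
]

# Count occurrences of each (parent assignment, child value) pair.
counts = defaultdict(Counter)
for x1, x2, x3, x4 in data:
    counts[(x1, x2, x3)][x4] += 1

# Estimate P(X4 | X1, X2, X3) for one parent assignment.
pa = ("T", "F", "T")
total = sum(counts[pa].values())
for value in ("T", "F"):
    print(f"P(X4={value} | X1,X2,X3={pa}) = {counts[pa][value]}/{total}")
```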

A tree representation of a CPD

P(X4 | X1, X2, X3) as a tree: split first on X1. If X1 = t, P(X4=t) = 0.9. If X1 = f, split on X2: if X2 = f, P(X4=t) = 0.5; if X2 = t, split on X3: if X3 = t, P(X4=t) = 0.8, and if X3 = f, P(X4=t) = 0.5.

Trees allow a more compact representation of CPDs by ignoring some unlikely relationships.

The learning problems in Bayesian networks

• Parameter learning on a known graph structure: given data D and G, learn Θ
• Structure learning: given data D, learn G and Θ

Structure learning using score-based search

Search over candidate graphs, evaluating each candidate B with a score, Score(B : D), a function of how well B describes the data D.

Scores for Bayesian networks

• Maximum likelihood: Score(G : D) = log P(D | G, Θ̂), where Θ̂ are the maximum likelihood parameter estimates
• Regularized maximum likelihood: the log-likelihood minus a penalty on model complexity (e.g., BIC)
• Bayesian score: the (log) marginal likelihood, log P(D | G) = log ∫ P(D | G, Θ) P(Θ | G) dΘ

Decomposability of scores

• The score of a Bayesian network B decomposes over individual variables: Score(B : D) = Σi score(Xi, Pa(Xi) : D), where each per-variable term depends on the data for Xi and the joint assignment of Pa(Xi) in the dth sample

• This enables efficient computation of the score change due to local changes: only the terms of the affected variables need to be recomputed

Search space of graphs is huge

• For N variables there are 2^(N(N-1)/2) possible graphs

• The set of possible networks grows super-exponentially:

N | Number of networks
3 | 8
4 | 64
5 | 1024
6 | 32768

• We need approximate methods to search the space of networks

Greedy Hill climbing to search Bayesian network space

• Input: D = {x1, ..., xN}; an initial network B0 = {G0, Θ0}
• Output: Bbest
• Loop until convergence:
– {Bi1, ..., Bim} = Neighbors(Bi), generated by making local changes to Bi
– Bi+1 = argmaxj Score(Bij ; D)
• Termination: Bbest = Bi

Local changes to Bi

Starting from the current network Bi (e.g., over nodes A, B, C, D), neighbors are generated by local changes: add an edge or delete an edge, checking for cycles after each addition. A sketch of the resulting greedy search is shown below.
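A minimal sketch of the greedy search described above, assuming a user-supplied score function over edge sets (all names here are hypothetical; a real implementation would use a decomposable score and only rescore affected families):

```python
import itertools

def is_acyclic(nodes, edges):
    """Check for cycles by repeatedly removing nodes with no incoming edges."""
    edges = set(edges)
    remaining = set(nodes)
    while remaining:
        sources = [n for n in remaining if not any(v == n for _, v in edges)]
        if not sources:
            return False        # every remaining node has an incoming edge -> cycle
        for n in sources:
            remaining.discard(n)
        edges = {(u, v) for (u, v) in edges if u in remaining and v in remaining}
    return True

def neighbors(nodes, edges):
    """Local changes: add one missing edge (checking for cycles) or delete one edge."""
    for u, v in itertools.permutations(nodes, 2):
        if (u, v) not in edges:
            candidate = edges | {(u, v)}
            if is_acyclic(nodes, candidate):
                yield candidate
    for e in edges:
        yield edges - {e}        # deleting an edge never creates a cycle

def greedy_hill_climb(nodes, score, initial_edges=frozenset()):
    """score(edges) -> float, higher is better (e.g., a Bayesian network score)."""
    current = set(initial_edges)
    current_score = score(current)
    while True:
        best_move, best_score = None, current_score
        for candidate in neighbors(nodes, current):
            s = score(candidate)
            if s > best_score:
                best_move, best_score = candidate, s
        if best_move is None:    # converged: no local change improves the score
            return current, current_score
        current, current_score = set(best_move), best_score
```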

Challenges with applying Bayesian networks to genome-scale data

• The number of variables, p, is in the thousands
• The number of samples, N, is in the hundreds

Extensions to Bayesian networks to handle genome-scale networks

• Sparse candidate algorithm (Friedman, Nachman, Pe'er, 1999)
• Bootstrap to identify high scoring graph features (Friedman, Linial, Nachman, Pe'er, 2000)
• Module networks (subsequent lecture; Segal, Pe'er, Regev, Koller, Friedman, 2005)
• Adding graph priors (subsequent lecture, hopefully)

The Sparse Candidate algorithm: structure learning in Bayesian networks

• Key idea: identify k "promising" candidate parents for each Xi, with k << p (p: number of random variables)
– The candidates define a "skeleton graph" H
– Restrict the graph structure to select parents from H
• Early choices in H might exclude other good parents
– Resolve this with an iterative algorithm

Sparse candidate algorithm

• Input:
– A dataset D
– An initial Bayes net B0
– A parameter k: max number of candidate parents per variable
• Output: final network B
• Loop until convergence (iteration n):
– Restrict: based on D and B^(n-1), select candidate parents Ci^(n-1) for each Xi; this defines a skeleton directed network H^n
– Maximize: find the network B^n that maximizes Score(B^n ; D) among networks whose parents are drawn from H^n (i.e., Pa(Xi) ⊆ Ci^(n-1))
• Termination: return B^n

Selecting candidate parents in the Restrict Step

• A good parent for Xi is one with a strong statistical dependence on Xi
– Mutual information I(Xi; Xj) provides a good measure of statistical dependence
– Mutual information should be used only as a first approximation; candidate parents need to be iteratively refined to avoid missing important dependences

• A good parent for Xi is one that gives the highest score improvement when added to Pa(Xi)

Mutual Information

• A measure of statistical dependence between two random variables, Xi and Xj:

I(Xi; Xj) = Σ over values xi, xj of P(xi, xj) log [ P(xi, xj) / (P(xi) P(xj)) ]
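A minimal sketch of how the Restrict step could rank candidate parents by empirical mutual information on discrete data (function and variable names are hypothetical):

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information I(X;Y) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log[ p(a,b) / (p(a) p(b)) ], with counts converted to frequencies
        mi += (c / n) * math.log((c * n) / (px[a] * py[b]))
    return mi

def top_k_candidate_parents(data, target, k):
    """Rank every other variable by I(X_target; X_j) and keep the k highest.
    data: dict mapping variable name -> list of discrete observations."""
    scores = {j: mutual_information(data[target], xj)
              for j, xj in data.items() if j != target}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```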

Mutual information can miss some parents

• Consider the true network in the figure (over A, B, C, D)
• If I(A;C) > I(A;D) > I(A;B) and we are selecting k <= 2 candidate parents, B will never be selected as a parent
• How do we get B as a candidate parent? If we used mutual information alone to select candidates, we might be stuck with C and D

Computational savings in Sparse Candidate

• Ordinary hill climbing
– O(2^n) possible parent sets
– O(n^2) initial score change calculations
– O(n) for subsequent iterations

• Complexity of learning constrained to a skeleton directed graph
– O(2^k) possible parent sets
– O(nk) initial score change calculations
– O(k) for subsequent iterations

Sparse candidate learns good networks faster than hill-climbing

Figure: score (higher is better) versus run time on two datasets (Dataset 1: 100 variables; Dataset 2: 200 variables). Greedy hill climbing takes much longer to reach a high scoring Bayesian network.

Some comments about choosing candidates

• How to select k in the sparse candidate algorithm?
• Should k be the same for all Xi?
• If the data are Gaussian, could we do something better?
• Regularized regression approaches can be used to estimate the structure of an undirected graph
• L1-DAG learn provides an alternative (Schmidt, Niculescu-Mizil, Murphy 2007)
– Estimate an undirected dependency network Gundir
– Learn a Bayesian network constrained on Gundir

Dependency network

• A type of probabilistic graphical model
• As in Bayesian networks, it has
– A graph component
– A probability component
• Unlike Bayesian networks, it can have cyclic dependencies

Dependency Networks for Inference, Collaborative Filtering and Data Visualization. Heckerman, Chickering, Meek, Rounthwaite, Kadie, 2000

Selecting candidate regulators for the ith gene using regularized linear regression

Regress Xi (an N × 1 vector of expression values) on X-i = (X1, ..., Xp-1), the N × (p-1) matrix of everything other than Xi, with a coefficient vector bi of length p-1:

minimize over bi:  ||Xi - X-i bi||^2 + λ ||bi||_1

The second term is the regularization term: the L1 norm imposes sparsity and sets many regression coefficients to 0 (also called Lasso regression). The variables with nonzero coefficients are the candidate regulators of Xi; a sketch is given below.
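A minimal sketch of this Lasso step using scikit-learn; the regularization strength alpha and the toy data are illustrative choices, not values from the lecture:

```python
import numpy as np
from sklearn.linear_model import Lasso

def candidate_regulators(X, i, alpha=0.1):
    """Regress gene i on all other genes with an L1 penalty and return the
    indices of genes with nonzero coefficients (candidate regulators).
    X: (N samples) x (p genes) expression matrix."""
    y = X[:, i]
    others = np.delete(np.arange(X.shape[1]), i)
    model = Lasso(alpha=alpha).fit(X[:, others], y)
    return others[np.abs(model.coef_) > 1e-8]

# Toy usage on synthetic data: gene 3 depends on genes 0 and 7.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
X[:, 3] = 0.8 * X[:, 0] - 0.6 * X[:, 7] + 0.1 * rng.normal(size=50)
print(candidate_regulators(X, i=3))   # expected to recover genes 0 and 7
```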

Learning dependency networks

• Learning: estimate a set of conditional probability distributions, one per variable

• P(Xj | X-j) can be estimated by solving
– A set of linear regression problems
• Meinshausen & Bühlmann, 2006
• TIGRESS (Haury et al., 2010)
– A set of non-linear regression problems
• Non-linearity captured by regression trees (Heckerman et al., 2000)
• Non-linearity captured by random forests: GENIE3 (Huynh-Thu et al., 2010); see the sketch below
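A minimal sketch of the GENIE3 idea (not the authors' implementation): for each target gene, fit a random forest on all other genes and use the feature importances as candidate edge weights. Parameter choices below are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def genie3_like_scores(X, n_trees=100, seed=0):
    """X: (N samples) x (p genes). Returns a p x p matrix W where W[j, i] is the
    importance of gene j for predicting gene i (weight of candidate edge j -> i)."""
    n, p = X.shape
    W = np.zeros((p, p))
    for i in range(p):
        others = np.delete(np.arange(p), i)
        rf = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        rf.fit(X[:, others], X[:, i])        # predict gene i from all other genes
        W[others, i] = rf.feature_importances_
    return W
```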

Where do different methods rank?

Figure from Marbach et al., 2012: ranking of inference methods, with the "Community" (ensemble) prediction and a "Random" baseline shown for reference.

Goals for today

• Introduction
– Why should we care?
– Different types of cellular networks

• Methods for network reconstruction from expression
– Per-gene methods
• Sparse Candidates Bayesian networks
• Regression-based methods: GENIE3, L1-DAG learn
– Per-module methods

• Assessing confidence in network structure

Assessing confidence in the learned network

• Typically the number of training samples is not sufficient to reliably determine the "right" network

• One can, however, estimate the confidence of specific features of the network: graph features f(G)

• Examples of f(G)
– An edge between two random variables
– Order relations: is X an ancestor of Y?

How to assess confidence in graph features?

• What we want is P(f(G) | D), which is a sum over all possible graphs: Σ over G of f(G) P(G | D)

• But it is not feasible to compute this sum

• Instead we will use a "bootstrap" procedure

Bootstrap to assess graph feature confidence

• For i = 1 to m
– Construct dataset Di by sampling, with replacement, N samples from dataset D, where N is the size of the original D
– Learn a network Bi from Di

• For each feature of interest f, calculate its confidence as the fraction of the m learned networks in which the feature appears (see the sketch below)
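A minimal sketch of this bootstrap procedure, assuming a user-supplied `learn_network(data)` function (a hypothetical name) that returns a set of directed edges:

```python
import numpy as np

def bootstrap_edge_confidence(data, learn_network, m=200, seed=0):
    """data: (N samples) x (p genes) matrix.
    learn_network: maps a data matrix to a set of (parent, child) edges.
    Returns a dict mapping each edge to the fraction of bootstrap networks containing it."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    counts = {}
    for _ in range(m):
        idx = rng.integers(0, n, size=n)      # sample N rows with replacement
        edges = learn_network(data[idx])
        for e in edges:
            counts[e] = counts.get(e, 0) + 1
    return {e: c / m for e, c in counts.items()}
```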

Does the bootstrap confidence represent real relationships?

• Compare the confidence distribution to that obtained from randomized data
• Randomize the expression matrix (rows: genes, columns: experimental conditions) by shuffling the columns of each row (gene) independently
• Repeat the bootstrap procedure on the randomized data

Application of Bayesian network to yeast expression data

• 76 experiments/microarrays
• 800 genes
• Bootstrap procedure on 200 subsampled datasets
• Sparse candidate as the Bayesian network learning algorithm

Bootstrap-based confidence differs between real and randomized data

Figure: distributions of confidence values for features learned from real versus randomized ("Random") data.

Example of a high confidence sub-network

Figure: one learned Bayesian network versus the bootstrapped-confidence Bayesian network; the latter highlights a subnetwork associated with yeast mating.

Summary

• Network inference from expression provides a promising approach to identify cellular networks

• Bayesian networks are one representation of networks with both a probabilistic and a graphical component
– Network inference naturally translates to learning problems in Bayesian networks

• Successful application of Bayesian networks to expression data requires additional considerations
– Reduce the set of potential parents, statistically or using biological knowledge
– Bootstrap-based confidence estimation

Linear regression with N inputs

• Y: output; X1, ..., XN: inputs
• Model: Y = β0 + β1 X1 + ... + βN XN, where β0 is the intercept and β1, ..., βN are the parameters/coefficients
• Given: data, a set of observed (input, output) pairs
• Estimate: the coefficients, e.g., by minimizing the sum of squared errors (see the sketch below)
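A minimal sketch of the estimation step using ordinary least squares in NumPy; the data here are randomly generated for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                           # 100 samples, N = 3 inputs
true_beta = np.array([2.0, -1.0, 0.5])
y = 1.5 + X @ true_beta + 0.1 * rng.normal(size=100)    # intercept 1.5 plus noise

# Add a column of ones so the first fitted coefficient is the intercept.
X1 = np.column_stack([np.ones(len(X)), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(beta_hat)   # approximately [1.5, 2.0, -1.0, 0.5]
```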

Information theoretic concepts

• Kullback-Leibler (KL) divergence
– A distance between two distributions

• Mutual information
– Measures statistical dependence between X and Y
– Equal to the KL divergence between P(X,Y) and P(X)P(Y)

• Conditional mutual information
– Measures the information between two variables given a third

KL Divergence

P(X) and Q(X) are two distributions over X:

DKL(P || Q) = Σ over x of P(x) log [ P(x) / Q(x) ]
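A minimal sketch computing the KL divergence for discrete distributions, and checking on a toy joint (values illustrative) that I(X;Y) equals the KL divergence between P(X,Y) and P(X)P(Y):

```python
import math

def kl_divergence(P, Q):
    """D_KL(P || Q) for discrete distributions given as dicts over the same support."""
    return sum(p * math.log(p / Q[x]) for x, p in P.items() if p > 0)

# Toy joint distribution over (X, Y) with uniform marginals.
P_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
P_x = {0: 0.5, 1: 0.5}
P_y = {0: 0.5, 1: 0.5}
P_indep = {(x, y): P_x[x] * P_y[y] for x in P_x for y in P_y}

print(kl_divergence(P_xy, P_indep))   # mutual information I(X;Y) of the toy joint
```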

Measuring relevance of Y to X

• MDisc(X, Y) = DKL( P(X,Y) || PB(X,Y) )
• MShield(X, Y) = I(X; Y | Pa(X))
• MScore(X, Y) = Score(X; Y, Pa(X), D)

Conditional Mutual Information

• Measures the mutual information between X and Y given Z: I(X; Y | Z) = Σ over x, y, z of P(x, y, z) log [ P(x, y | z) / (P(x | z) P(y | z)) ]

• If Z captures everything about X, knowing Y gives no more information about X; thus the conditional mutual information is zero.

What do the Bayesian network edges represent?

Spang and Markowetz, BMC Bioinformatics 2005

Is it just correlation? No. High correlation between two genes X and Y could arise from any of three possible regulatory mechanisms: X regulates Y, Y regulates X, or a third factor regulates both.