The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Graph Regularized Dual Lasso for Robust eQTL Mapping

Wei Cheng¹, Xiang Zhang², Zhishan Guo¹, Yu Shi³, Wei Wang⁴

¹University of North Carolina at Chapel Hill, ²Case Western Reserve University, ³University of Science and Technology of China, ⁴University of California, Los Angeles

Speaker: Wei Cheng

The 22nd Annual International Conference on Intelligent Systems for Molecular Biology (ISMB’14)



Page 2

eQTL (Expression QTL)

• Goal: Identify genomic locations where genotype significantly affects gene expression.

Page 3

Statistical Test

• Partition individuals into groups according to the genotype of a SNP.

• Perform a statistical test (t-test or ANOVA) comparing expression levels across the groups.

• Repeat for each SNP.

[Figure: SNP matrix (X) and gene expression matrix (Z), one column per individual; a plot of one gene's expression level grouped by the genotype (0 or 1) of SNP1.]
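The single-SNP procedure above can be sketched as follows. This is a minimal illustration, assuming genotypes coded 0/1 as in the figure and using Welch's t statistic as the test; the function names are hypothetical, and no p-values or multiple-testing correction are computed.

```python
import numpy as np


def welch_t(a: np.ndarray, b: np.ndarray) -> float:
    """Welch's two-sample t statistic comparing the means of a and b."""
    va = a.var(ddof=1) / len(a)
    vb = b.var(ddof=1) / len(b)
    return float((a.mean() - b.mean()) / np.sqrt(va + vb))


def single_snp_scan(X: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Test one gene's expression z against every SNP (row of X).

    X: (num_snps, num_individuals) genotypes coded 0/1.
    z: (num_individuals,) expression levels of a single gene.
    Returns one t statistic per SNP (NaN if a genotype group has < 2 members).
    """
    stats = np.full(X.shape[0], np.nan)
    for k, genotypes in enumerate(X):
        # Partition individuals by the genotype of SNP k.
        g0, g1 = z[genotypes == 0], z[genotypes == 1]
        if len(g0) >= 2 and len(g1) >= 2:
            stats[k] = welch_t(g1, g0)
    return stats
```

Repeating `single_snp_scan` for every gene (row of Z) gives the full SNP-by-gene table of test statistics.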

Page 4

Lasso-based feature selection

• X: the SNP matrix (each row is one SNP)

• Z: the gene expression matrix (each row is the expression levels of one gene)

• Objective:

$$\min_{W}\ \frac{1}{2}\|Z - WX\|_F^2 + \eta\|W\|_1$$

where η > 0 controls the sparsity of W.
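The slides do not say how the Lasso objective is solved. As one standard option (not necessarily the solver used in this work), proximal gradient descent (ISTA) alternates a gradient step on the squared loss with elementwise soft-thresholding. `lasso_ista` and its parameters are hypothetical names.

```python
import numpy as np


def soft_threshold(A: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage toward 0)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def lasso_ista(Z: np.ndarray, X: np.ndarray, eta: float, n_iter: int = 500) -> np.ndarray:
    """Minimize 0.5 * ||Z - W X||_F^2 + eta * ||W||_1 over W by ISTA.

    Z: (num_genes, num_individuals), X: (num_snps, num_individuals).
    Returns W with shape (num_genes, num_snps).
    """
    W = np.zeros((Z.shape[0], X.shape[0]))
    # The gradient's Lipschitz constant is sigma_max(X)^2; use step = 1 / that.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = (W @ X - Z) @ X.T  # gradient of the squared loss
        W = soft_threshold(W - step * grad, step * eta)
    return W
```

With W shaped genes × SNPs, each nonzero entry W[j, k] marks a candidate association between gene j and SNP k; larger η gives sparser W.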

Page 5

Incorporating prior knowledge

• SNPs (and genes) are usually not independent.

• The interplay among SNPs and the interplay among genes can be represented as networks and used as prior knowledge. Examples of such prior knowledge: genetic interaction networks, PPI networks, gene co-expression networks, etc.

• Examples of methods that use such priors: group Lasso, multi-task methods, SIOL, and MTLasso2G.

Page 6

Limitations of current methods

• A clustering step is usually needed to obtain the grouping information.

• They do not take into account the incompleteness of the prior knowledge or the noise in it. E.g., PPI networks may contain many false interactions and miss true ones.

• Other prior knowledge, such as location and gene pathway information, is not considered.

Page 7

Motivation

• Examples of prior knowledge: a genetic interaction network S among SNPs, and gene-gene interactions represented by a PPI network (or gene co-expression network) G. W is the matrix of regression coefficients to be learned.

Page 8

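GD-Lasso: Graph-regularized Dual Lasso

• Objective:

$$\min_{W,\,L,\,S\ge 0,\,G\ge 0}\ \frac{1}{2}\|Z - WX - L\|_F^2 + \eta\|W\|_1 + \lambda\|L\|_* + \alpha\,\mathrm{tr}\big(W(D_S - S)W^T\big) + \beta\,\mathrm{tr}\big(W^T(D_G - G)W\big) + \gamma\|S - S_0\|_F^2 + \rho\|G - G_0\|_F^2$$

• The first three terms form a Lasso objective that accounts for confounding factors (L); the nuclear norm ||L||* encourages L to be low-rank.

• The two trace terms are the graph regularizers on the SNP network S and the gene network G, where D_S and D_G are the diagonal degree matrices of S and G.

• The last two terms are the fitting constraints for the prior knowledge, keeping S and G close to the prior networks S0 and G0.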
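To make the terms concrete, the sketch below evaluates (rather than minimizes) the GD-Lasso objective for given matrices. It assumes W is genes × SNPs (so that Z ≈ WX) and that S and G are symmetric and nonnegative; the function names and the parameter names eta, lam, alpha, beta, gamma, rho are our own.

```python
import numpy as np


def laplacian(A: np.ndarray) -> np.ndarray:
    """Graph Laplacian D_A - A, where D_A is the diagonal degree matrix of A."""
    return np.diag(A.sum(axis=1)) - A


def gd_lasso_objective(Z, X, W, L, S, G, S0, G0,
                       eta, lam, alpha, beta, gamma, rho) -> float:
    """Value of the GD-Lasso objective.

    Shapes (assumed): Z and L are genes x individuals, X is SNPs x individuals,
    W is genes x SNPs, S and S0 are SNPs x SNPs, G and G0 are genes x genes.
    """
    loss = 0.5 * np.linalg.norm(Z - W @ X - L, "fro") ** 2
    sparsity = eta * np.abs(W).sum()             # eta * ||W||_1
    low_rank = lam * np.linalg.norm(L, "nuc")    # lam * ||L||_* (nuclear norm)
    # For symmetric S: tr(W (D_S - S) W^T) = 0.5 * sum_ij S_ij ||w_{*i} - w_{*j}||^2,
    # so SNPs linked in S are pushed toward similar coefficient columns.
    snp_graph = alpha * np.trace(W @ laplacian(S) @ W.T)
    gene_graph = beta * np.trace(W.T @ laplacian(G) @ W)  # same idea for gene rows
    prior_fit = (gamma * np.linalg.norm(S - S0, "fro") ** 2
                 + rho * np.linalg.norm(G - G0, "fro") ** 2)
    return float(loss + sparsity + low_rank + snp_graph + gene_graph + prior_fit)
```

The comment on the SNP term is the usual reason for Laplacian regularization: a large S_ij is only cheap when columns i and j of W are close.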

Page 9

GGD-Lasso: Generalized Graph-regularized Dual Lasso

• Further incorporating location and pathway information.

• Objective:

$$\min_{W,\,L,\,S\ge 0,\,G\ge 0}\ \frac{1}{2}\|Z - WX - L\|_F^2 + \eta\|W\|_1 + \lambda\|L\|_* + \alpha\sum_{i,j} D(w_{*i}, w_{*j})\,S_{i,j} + \beta\sum_{i,j} D(w_{i*}, w_{j*})\,G_{i,j}$$

D(·, ·) is a nonnegative distance measure.
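A small sketch of the generalized regularizers with a pluggable distance D. With D(a, b) = ½‖a − b‖² and symmetric S, the SNP term equals tr(W(D_S − S)Wᵀ), which shows how GGD-Lasso generalizes the graph regularizer of GD-Lasso. The helper names are hypothetical, and W is assumed to be genes × SNPs.

```python
from typing import Callable

import numpy as np

Distance = Callable[[np.ndarray, np.ndarray], float]


def half_sq_euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """D(a, b) = 0.5 * ||a - b||^2, one possible nonnegative distance."""
    return 0.5 * float(np.sum((a - b) ** 2))


def snp_term(W: np.ndarray, S: np.ndarray, dist: Distance) -> float:
    """sum_{i,j} D(w_{*i}, w_{*j}) S_ij over SNP pairs (columns of W)."""
    K = W.shape[1]
    return float(sum(dist(W[:, i], W[:, j]) * S[i, j]
                     for i in range(K) for j in range(K)))


def gene_term(W: np.ndarray, G: np.ndarray, dist: Distance) -> float:
    """sum_{i,j} D(w_{i*}, w_{j*}) G_ij over gene pairs (rows of W)."""
    N = W.shape[0]
    return float(sum(dist(W[i], W[j]) * G[i, j]
                     for i in range(N) for j in range(N)))
```

Any other nonnegative function can be passed as `dist`; which distance best encodes location or pathway information is a modeling choice not fixed by this sketch.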

Page 10

GGD-Lasso: Optimization

• Executes the following two steps iteratively until the termination condition is met:

1) update W while fixing S and G;

2) update S and G according to W, while decreasing $\sum_{i,j} D(w_{*i}, w_{*j})\,S_{i,j}$ and $\sum_{i,j} D(w_{i*}, w_{j*})\,G_{i,j}$.

• We can maintain a fixed number of edges in S and G. E.g., to update G, we can swap edge (i', j') for the non-edge (i, j) when $D(w_{i*}, w_{j*}) < D(w_{i'*}, w_{j'*})$.

• Further integrate location and pathway information.
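The edge-swap update for G can be sketched as a greedy loop. Assumptions not stated on the slide: D is the squared Euclidean distance between rows of W, G is a symmetric 0/1 matrix, and each step swaps the farthest current edge for the closest non-adjacent pair. `swap_edges` is a hypothetical name, and this is only one way to realize the rule.

```python
import numpy as np


def swap_edges(W: np.ndarray, G: np.ndarray, max_swaps: int = 100) -> np.ndarray:
    """Greedy edge swaps on a symmetric 0/1 gene network G (genes = rows of W).

    Each swap removes the current edge (i', j') with the largest distance and
    adds the non-edge (i, j) with the smallest distance, but only if
    D(w_{i*}, w_{j*}) < D(w_{i'*}, w_{j'*}); the number of edges never changes.
    D is the squared Euclidean distance here (an assumption).
    """
    G = G.copy()
    diff = W[:, None, :] - W[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)              # dist[i, j] = D(w_{i*}, w_{j*})
    rows, cols = np.triu_indices(G.shape[0], k=1)  # each unordered pair once
    pair_d = dist[rows, cols]
    for _ in range(max_swaps):
        is_edge = G[rows, cols] == 1
        if is_edge.all() or not is_edge.any():
            break
        edge_idx = np.flatnonzero(is_edge)
        free_idx = np.flatnonzero(~is_edge)
        worst = edge_idx[np.argmax(pair_d[edge_idx])]
        best = free_idx[np.argmin(pair_d[free_idx])]
        if pair_d[best] >= pair_d[worst]:
            break  # no single swap lowers sum_ij D(w_{i*}, w_{j*}) G_ij
        for k, value in ((worst, 0), (best, 1)):
            G[rows[k], cols[k]] = G[cols[k], rows[k]] = value
    return G
```

Each accepted swap strictly lowers the gene-network regularizer while keeping the edge count, matching the fixed-edge-number idea; S can be updated the same way using columns of W.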

Page 11

Experimental Study: simulation

• 10 gene expression profiles are generated by

$$z_{j*} = w_{j*}X + \mu_{j*} + E_{j*},$$

where $E_{j*} \sim N(0, \sigma^2 I)$, $\mu_{j*} \sim N(0, MM^T)$, and $M_{ij} \sim N(0, 1)$.
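The generative model can be sketched with NumPy. The shapes are assumptions: M is taken as n × n for n individuals, W and X are supplied by the caller, and sigma and the seed are illustrative. Reading μ as the confounding component is our interpretation.

```python
import numpy as np


def simulate_expression(W: np.ndarray, X: np.ndarray, sigma: float = 1.0,
                        seed: int = 0) -> np.ndarray:
    """Draw z_{j*} = w_{j*} X + mu_{j*} + E_{j*} for every gene j (row of W).

    X: (num_snps, n) genotypes for n individuals; W: (num_genes, num_snps).
    mu_{j*} ~ N(0, M M^T) with M_{ik} ~ N(0, 1) adds variation that is
    correlated across individuals; E_{j*} ~ N(0, sigma^2 I) is independent noise.
    M is taken to be n x n here, which is an assumption.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    M = rng.standard_normal((n, n))                    # M_{ik} ~ N(0, 1)
    mu = rng.multivariate_normal(np.zeros(n), M @ M.T, size=W.shape[0])
    E = sigma * rng.standard_normal((W.shape[0], n))   # E_{j*} ~ N(0, sigma^2 I)
    return W @ X + mu + E
```

Passing a sparse W makes its nonzero pattern the ground truth against which recovered associations can be scored.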

Page 12

Experimental Study: simulation

The ROC curve. The black solid line denotes what random guessing would have achieved.
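How the ROC curve is computed is not spelled out. A common approach, sketched here under the assumption that each gene-SNP pair is scored by |Ŵ| and labeled positive when the true W is nonzero, sweeps a threshold down the sorted scores and integrates with the trapezoid rule. `roc_auc` is a hypothetical helper.

```python
import numpy as np


def roc_auc(scores: np.ndarray, labels: np.ndarray):
    """Return (fpr, tpr, auc) for ranking pairs by score; labels are 1/0.

    Needs at least one positive and one negative label.
    """
    order = np.argsort(-scores, kind="stable")
    s, y = scores[order], labels[order]
    # Last index of each run of tied scores, so ties form a single ROC step.
    cut = np.r_[np.flatnonzero(np.diff(s)), len(s) - 1]
    tps = np.r_[0, np.cumsum(y)[cut]]
    fps = np.r_[0, np.cumsum(1 - y)[cut]]
    tpr, fpr = tps / tps[-1], fps / fps[-1]
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))  # trapezoid rule
    return fpr, tpr, auc
```

Usage: `roc_auc(np.abs(W_hat).ravel(), (W_true != 0).astype(int).ravel())`. Random guessing corresponds to the diagonal, i.e. an AUC of 0.5.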

Page 13

Experimental Study: simulation

AUCs of Lasso, LORS, G-Lasso and GD-Lasso. In each panel, we vary the percentage of noise in the prior networks S0 and G0.
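The slides do not state how noise is injected into S0 and G0. One hypothetical scheme, consistent with prior networks that both miss true interactions and contain false ones, rewires a fraction of the edges while keeping the edge count fixed. `perturb_network` is an illustrative name, not the authors' procedure.

```python
import numpy as np


def perturb_network(A0: np.ndarray, noise_frac: float, seed: int = 0) -> np.ndarray:
    """Hypothetical noise model for a symmetric 0/1 prior network A0.

    A fraction noise_frac of the true edges is deleted (missed interactions)
    and the same number of random non-edges is added (false interactions),
    so the total edge count is unchanged.
    """
    rng = np.random.default_rng(seed)
    A = A0.copy()
    rows, cols = np.triu_indices(A.shape[0], k=1)  # each unordered pair once
    vals = A[rows, cols]                           # a copy, not a view
    edges = np.flatnonzero(vals == 1)
    non_edges = np.flatnonzero(vals == 0)
    k = min(int(round(noise_frac * len(edges))), len(non_edges))
    vals[rng.choice(edges, size=k, replace=False)] = 0
    vals[rng.choice(non_edges, size=k, replace=False)] = 1
    A[rows, cols] = vals
    A[cols, rows] = vals                           # keep A symmetric
    return A
```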

Page 14

Experimental Study: Yeast

• Yeast eQTL dataset:

- 112 yeast segregants generated from a cross of two inbred strains, BY and RM;

- SNP markers with more than 10% missing values (NAs) are removed (the remaining missing genotypes are imputed), markers with identical genotypes are merged, and genes with missing values are dropped;

- this yields 1017 SNP markers and 4474 expression profiles.

• Prior networks: a genetic interaction network (S) and a PPI network (G).
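The filtering steps above can be sketched as follows. The imputation rule (majority genotype) and the NaN encoding are assumptions; the slides only say that incomplete SNPs are imputed. `preprocess` is a hypothetical name.

```python
import numpy as np


def preprocess(snps: np.ndarray, expr: np.ndarray, max_na_frac: float = 0.1):
    """snps: (markers, segregants) coded 0/1 with NaN for missing calls.
    expr: (genes, segregants) with NaN for missing values.
    Returns (merged_markers, complete_genes).
    """
    # 1) Drop markers whose fraction of NAs exceeds max_na_frac.
    kept = snps[np.isnan(snps).mean(axis=1) <= max_na_frac].copy()
    # 2) Impute remaining NAs with the marker's majority genotype (an assumption).
    for row in kept:  # rows are views, so edits stick
        missing = np.isnan(row)
        if missing.any():
            row[missing] = float(np.nanmean(row) >= 0.5)
    # 3) Merge markers with identical genotypes, keeping first-occurrence order.
    _, first = np.unique(kept, axis=0, return_index=True)
    merged = kept[np.sort(first)]
    # 4) Drop genes with any missing expression value.
    genes = expr[~np.isnan(expr).any(axis=1)]
    return merged, genes
```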

Page 15

Experimental Study: Yeast

• cis-enrichment analysis: (1) a one-tailed Mann-Whitney test on each SNP for the cis hypothesis; (2) a paired Wilcoxon signed-rank test on the p-values obtained in (1), comparing two models.

• trans-enrichment analysis: a similar strategy, in which genes regulated by transcription factors (TFs) are used as trans-acting signals.
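A minimal sketch of the two tests, using normal approximations without tie corrections (so p-values are approximate for small samples). Which genes count as cis for a SNP, and the association scores themselves, are assumptions left to the caller, e.g. x = scores of cis genes and y = scores of all other genes for one SNP. The function names are hypothetical.

```python
import math

import numpy as np


def average_ranks(v) -> np.ndarray:
    """1-based ranks; tied values share their average rank."""
    v = np.asarray(v, dtype=float)
    return np.array([(v < x).sum() + ((v == x).sum() + 1) / 2 for x in v])


def mann_whitney_greater(x, y) -> float:
    """One-tailed Mann-Whitney test, H1: x tends to be larger than y.
    Normal approximation without tie correction."""
    n1, n2 = len(x), len(y)
    r = average_ranks(np.concatenate([x, y]))
    u1 = r[:n1].sum() - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z)


def wilcoxon_less(a, b) -> float:
    """Paired Wilcoxon signed-rank test, H1: a tends to be smaller than b.
    Zero differences are dropped; normal approximation; needs >= 1 nonzero pair."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    d = d[d != 0]
    n = len(d)
    w_plus = average_ranks(np.abs(d))[d > 0].sum()
    z = (w_plus - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z)
```

Per SNP, `mann_whitney_greater(x, y)` gives one p-value; collecting these for two models over the same SNPs and calling `wilcoxon_less(p_model_a, p_model_b)` asks whether model A's p-values tend to be smaller.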

Page 16

Experimental Study: Yeast

Pairwise comparison of different models using cis-enrichment and trans-enrichment analysis

Page 17

Experimental Study: Yeast

Summary of the top-15 hotspots detected by GGD-Lasso. Hotspot (12) in bold cannot be detected by G-Lasso. Hotspot (6) in italic cannot be detected by SIOL. Hotspot (3) in teletype cannot be detected by LORS.
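A hotspot is a SNP associated with many genes. Assuming W is genes × SNPs, a sketch ranks SNPs by their number of nonzero coefficients; the tolerance and k are illustrative, and `top_hotspots` is a hypothetical name.

```python
import numpy as np


def top_hotspots(W: np.ndarray, k: int = 15, tol: float = 1e-8):
    """Return the k SNPs (columns of W) linked to the most genes, with counts."""
    counts = (np.abs(W) > tol).sum(axis=0)          # genes per SNP
    order = np.argsort(-counts, kind="stable")[:k]  # ties keep SNP order
    return order, counts[order]
```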

Page 18

Experimental Study: Yeast

Hotspots detected by different methods

Page 19

Conclusion

• In this paper, we propose novel and robust graph-regularized regression models that take into account the prior networks of SNPs and genes simultaneously.

• Exploiting the duality between the learned coefficients and the incomplete prior networks yields a more robust model.

• We also generalize our model to integrate other types of information, such as location and gene pathway information.

Page 20

Thank you!

Questions?

Travel funding to ISMB 2014 was generously provided by DOE.