Using Functional Genomic Units to Corroborate User Experiments with the Rosetta Compendium

Duke Bioinformatics Shared Resource, Duke University Medical Center: Simon M. Lin*, Patrick McConnell*
Department of Electronic Engineering, Duke University: Xuejun Liao*, Lawrence Carin
Department of Cardiology, Duke University Medical Center: Korkut Vita*, Pascal Goldschmidt

(* Authors contributed equally to the work)

Using Functional Genomic Units to Corroborate User ...people.ee.duke.edu/~xjliao/talk/CAMDA2001_ICA_Simon.pdf• Functional Genomic Unit #6: Six of these genes are coding for isoforms



Contributions

• Can we use biological knowledge in exploratory data analysis?
  - Context-sensitive Clustering
  - Designed a Java Application

• Can we computationally find the coordinated gene groups? Can we use them to simplify our analysis?
  - Functional Genomic Units (will be available to academic groups)
  - Utilized an ICA Implementation in MATLAB

• Can we use Rosetta data to explain our own experiments?
  - Conducted an Affymetrix measurement of RacC/A yeast strain
  - Explained results from different Labs/Instrumentation setups

Knowledge Should Not Be Ignored in the Microarray Analysis Process

[Diagram labels: Scientist, Data, Knowledge, Publication, Experiment]

Context-driven Clustering

• Clustering is unsupervised learning. No previous knowledge is necessary.

• Even with its exploratory nature, it still depends on your point of view.

• Your previous knowledge will help you with feature selection.

Why clustering should be done in a given context (a Toy Example)

Objects (rows) and Features (columns):

Object       | Calorie intake | Salt intake | Fiber intake | Blood Pressure | # of cars in the household | Auto insurance premium | # of claims: auto accident
Person1      | 2000           | 2           | 4            | 90             | 2                          | 400                    | 3
Person2      | 2900           | 3           | 4            | 80             | 1                          | 300                    | 0
…            | …              | …           | …            | …              | …                          | …                      | …
Person10000  | 2500           | 1           | 7            | 80             | 3                          | 500                    | 0
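The point of the toy example can be checked with the three rows that are shown. The sketch below is illustrative only: it assumes the values as read from the toy table, uses raw (unscaled) Euclidean distance, and the helper name `nearest` and the two feature groupings are hypothetical choices. It asks which person is most similar to Person2 in a health context and in an insurance context.

```python
import math

# Rows as read from the toy table (assumed transcription).
# Order: calorie, salt, fiber, blood pressure, cars, premium, claims.
people = {
    "Person1":     [2000, 2, 4, 90, 2, 400, 3],
    "Person2":     [2900, 3, 4, 80, 1, 300, 0],
    "Person10000": [2500, 1, 7, 80, 3, 500, 0],
}

HEALTH = [0, 1, 2, 3]   # diet and blood-pressure features
INSURANCE = [4, 5, 6]   # car and insurance features


def nearest(name, features):
    """Return the other person closest to `name` on the chosen features."""
    def pick(row):
        return [row[i] for i in features]

    target = pick(people[name])
    others = [p for p in people if p != name]
    return min(others, key=lambda p: math.dist(target, pick(people[p])))


print(nearest("Person2", HEALTH))     # neighbour in the health context
print(nearest("Person2", INSURANCE))  # neighbour in the insurance context
```

With these three rows, Person2 is closest to Person10000 on the health features but closest to Person1 on the insurance features, so the grouping depends on the context. Raw distances are dominated by the largest-valued feature (calories, premium); a real analysis would probably scale the features first.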

Same is true for genomics

Objects (rows) and Features (columns):

Object          | Gene1 | Gene2 | Gene3 | … | Gene 10000
Experiment 1    | 2000  | 2     | 4     | … | 2
Experiment 2    | 2900  | 3     | 4     | … | 1
…               | …     | …     | …     | … | …
Experiment 300  | 2500  | 1     | 7     | … | 3


“Kitchen-sink” Clustering

Genomic Knowledge Organized in a Tree Structure

Integrated with the Expression Browser for Clustering

Clustering in the lipid-metabolism Context

Independent Component Analysis (ICA) of the Gene Expression Profiles

• Statement of the ICA problem

• ICA Signal Model

$y = A x$ or $x = Q y$

y - the observed random vector of N components

x - a random vector with M independent components (IC)

A - mixing matrix

Q - separating matrix

• Objective

Find Q and A such that the components in x are as independent as possible
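To make the signal model concrete, here is a minimal sketch with two components (M = N = 2). The mixing matrix and the component values are made up for illustration, and the helper names `matvec` and `inverse_2x2` are hypothetical. It only shows the relationship between A and Q when A is known.

```python
# Toy instance of y = A x and x = Q y with two components (M = N = 2).
# A is an arbitrary invertible mixing matrix chosen for illustration.
A = [[1.0, 0.5],
     [0.5, 1.0]]


def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


x = [2.0, -1.0]        # hidden independent components (made-up values)
y = matvec(A, x)       # what is actually observed
Q = inverse_2x2(A)     # ideal separating matrix when A is known
x_back = matvec(Q, y)  # recovers x up to rounding

print(y, x_back)
```

In practice A is unknown, so Q has to be estimated from the observations alone. That is where the independence objective comes in, and the recovered components are then determined only up to order and scale.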


• ICA Solution of the Blind Source Separation Problem

--- An Illustration

• ICA Model of the Microphone Array Signals

[Figure: audio signals of two independent speakers (Speaker 1, Speaker 2) and the mixed audio signals received at the microphone array, each plotted against time indices.]

• Extraction of the two speakers’ audio signals via ICA and PCA

[Figure: original signals, ICA signals, and PCA signals, each plotted against time indices.]
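The ICA-versus-PCA comparison can be reproduced in miniature. The sketch below is an assumption-laden stand-in, not the MATLAB ICA implementation used for the talk: the "speakers" are uniform random signals, the mixing weights are made up, and ICA is done by whitening followed by a grid search for the rotation that maximises the summed squared kurtosis (a simple fourth-order cumulant contrast). All function names are hypothetical.

```python
import math
import random


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def rotate(a, b, angle):
    """Rotate the paired signals (a, b) by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return ([c * p + s * q for p, q in zip(a, b)],
            [-s * p + c * q for p, q in zip(a, b)])


def unit_variance(v):
    sd = math.sqrt(sum(a * a for a in v) / len(v))
    return [a / sd for a in v]


def pca_whiten(y1, y2):
    """PCA step: decorrelate two signals and scale them to unit variance."""
    y1, y2 = center(y1), center(y2)
    n = len(y1)
    c11 = sum(a * a for a in y1) / n
    c22 = sum(b * b for b in y2) / n
    c12 = sum(a * b for a, b in zip(y1, y2)) / n
    # Angle of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * c12, c11 - c22)
    p1, p2 = rotate(y1, y2, theta)
    return unit_variance(p1), unit_variance(p2)


def kurtosis(v):
    """Excess kurtosis of a zero-mean, unit-variance signal."""
    return sum(a ** 4 for a in v) / len(v) - 3.0


def ica_two(y1, y2, steps=180):
    """Whiten, then pick the rotation maximising the summed squared kurtosis."""
    z1, z2 = pca_whiten(y1, y2)
    angles = [0.5 * math.pi * k / steps for k in range(steps)]
    best = max(angles,
               key=lambda a: sum(kurtosis(u) ** 2 for u in rotate(z1, z2, a)))
    return rotate(z1, z2, best)


def abs_corr(a, b):
    """Absolute Pearson correlation between two signals."""
    a, b = center(a), center(b)
    num = sum(p * q for p, q in zip(a, b))
    return abs(num) / math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))


rng = random.Random(0)
s1 = [rng.uniform(-1, 1) for _ in range(2000)]  # stand-in "speaker 1"
s2 = [rng.uniform(-1, 1) for _ in range(2000)]  # stand-in "speaker 2"
m1 = [a + 0.5 * b for a, b in zip(s1, s2)]      # microphone 1
m2 = [0.5 * a + b for a, b in zip(s1, s2)]      # microphone 2

pca = pca_whiten(m1, m2)
ica = ica_two(m1, m2)
for name, est in (("PCA", pca), ("ICA", ica)):
    match = [max(abs_corr(s, e) for e in est) for s in (s1, s2)]
    print(name, [round(r, 2) for r in match])
```

Each printed pair is the best absolute correlation between a true source and any recovered signal. With independent non-Gaussian sources, the ICA outputs should each track one source closely, while each PCA output remains a blend of both, because decorrelation alone does not fix the final rotation.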

• ICA Model of the DNA Microarray (Gene Expression) Profiles

[Figure: expression versus experiment (condition) measurements of two mutually independent Functional Events (Functional Event 1, e.g., Cell Proliferation; Functional Event 2, e.g., Detoxification), and the gene expression profiles versus experiment (condition) received at the DNA microarray. Horizontal axis: experiments (conditions) 1 to 12; the microarray panels are labelled "Genes".]

• Extraction of the two mutually independent Functional Events via ICA and PCA

[Figure: original events, functional events recovered from ICA, and functional events recovered from PCA, each plotted against experiments (conditions) 1 to 12.]

• ICA Model of the DNA Microarray (Gene Expression) Profiles

Expression Profile of Genes = Memberships of Genes Belonging to Functional Units × Expression Profile of Functional Units

or

$y = A \cdot x$ (ICA Model)

- Expression Profile of Genes (genes by experiment index): represents the impacts of experiments in terms of genes.
- Memberships of Genes (genes by independent components): element (i, j) denotes the fuzzy membership of gene i belonging to functional unit j.
- Expression Profiles of Functional Units (Genomic Functional Units by experiment index): represents the impacts of experiments in terms of Genomic Functional Units.
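The factorization can be illustrated with a tiny example. Everything in the sketch below is hypothetical: the gene names, the membership values, and the unit profiles are made-up toy numbers, and `matmul` and `top_members` are illustrative helpers rather than part of any tool mentioned in the talk.

```python
# Hypothetical toy numbers: 4 genes, 2 functional units, 3 experiments.
genes = ["geneA", "geneB", "geneC", "geneD"]

# memberships[i][j]: fuzzy membership of gene i in functional unit j.
memberships = [
    [0.9, 0.0],
    [0.7, 0.1],
    [0.1, 0.8],
    [0.0, 0.6],
]

# unit_profiles[j][k]: expression of functional unit j in experiment k.
unit_profiles = [
    [1.0, 0.0, -1.0],
    [0.0, 2.0, 0.5],
]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))]
            for i in range(len(a))]


def top_members(unit, count=2):
    """Genes with the largest membership in the given functional unit."""
    ranked = sorted(zip(genes, memberships),
                    key=lambda gm: gm[1][unit], reverse=True)
    return [g for g, _ in ranked[:count]]


# Gene-level profiles implied by the model (genes x experiments).
gene_profiles = matmul(memberships, unit_profiles)
print(gene_profiles[0])
print(top_members(0), top_members(1))
```

Reading the membership matrix column by column gives the genes that make up each unit, while reading the unit profiles row by row gives a much shorter description of each experiment than the full gene-level matrix.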

• Definition of a Genomic Functional Unit

[Figure: membership function of Genomic Functional Unit #69; vertical axis from 0 to 0.12, horizontal axis from 0 to 350. Labelled genes: YFL026W, YFL053W, YML007W, YLR307W, YAL067C, DR218C, YIL037C, YLR296W, YPL121C, YPR116W.]

Fuzzy membership function of Unit #69, which is responsible for oxidative stress response.

A Genomic Functional Unit is a fuzzy set defined on the genes under consideration. It generally contains genes that work together to accomplish a certain biological function.

• Principles of the Independent Component Analysis Algorithm

1. Measure of statistical independence

• Original definition

A random vector x has independent components $x_i$ if

$$p_x(u) = \prod_{i=1}^{N} p_{x_i}(u_i)$$

where $p_x$ is the joint pdf and the $p_{x_i}$ are the marginal pdfs.

• Mutual information

Kullback-Leibler distance between the joint pdf and the product of the marginal pdfs:

$$I(p_x) = \mathrm{kld}\Big(p_x, \prod_{i=1}^{N} p_{x_i}\Big) = \int p_x(u) \ln \frac{p_x(u)}{\prod_{i=1}^{N} p_{x_i}(u_i)}\, du$$

Differential entropy of x

$$S(p_x) = -\int p_x(u) \ln p_x(u)\, du$$

Negentropy of x

$$J(p_x) = S(\phi_x) - S(p_x)$$

where $\phi_x(u)$ is the Gaussian distribution with the same covariance matrix as $p_x(u)$.

$J(p_x) \ge 0$, with equality iff $p_x(u) = \phi_x(u)$. This is so because the Gaussian distribution has the largest entropy among the pdfs having a given covariance matrix.

IMPORTANT: $J(p_x)$ is invariant under general invertible linear transforms, because

$$S(p_{Ax}) = S(p_x) + \ln|\det A|$$

holds for any density, including the Gaussian $\phi_x$, so the $\ln|\det A|$ terms cancel out in $J(p_x)$.
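A small numerical check of both properties, using a case where the entropies are known in closed form. This is an illustrative sketch, not part of the original talk: it takes a one-dimensional uniform density of a given width (entropy ln(width), variance width²/12) and compares it with the Gaussian of the same variance. The function name `uniform_negentropy` is hypothetical.

```python
import math


def uniform_negentropy(width):
    """Negentropy J = S(Gaussian with same variance) - S(uniform of given width)."""
    variance = width ** 2 / 12.0   # variance of a uniform density
    s_gauss = 0.5 * math.log(2 * math.pi * math.e * variance)
    s_uniform = math.log(width)    # differential entropy of the uniform
    return s_gauss - s_uniform


for width in (1.0, 10.0):
    print(width, round(uniform_negentropy(width), 4))
```

Both widths give about 0.1765, which equals ½ ln(πe/6): the value is positive, and rescaling the variable (a one-dimensional linear transform) leaves it unchanged.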

• Representation of mutual information using the negentropy

$$I(p_x) = J(p_x) - \sum_{i=1}^{N} J(p_{x_i}) + \frac{1}{2}\ln\frac{\prod_{i=1}^{N} V_{ii}}{\det V}$$

where V is the covariance matrix of x.

Proof.

$$
\begin{aligned}
J(p_x) - \sum_{i=1}^{N} J(p_{x_i})
&= S(\phi_x) - S(p_x) - \sum_{i=1}^{N}\left[S(\phi_{x_i}) - S(p_{x_i})\right] \\
&= \int p_x(u)\ln p_x(u)\,du - \sum_{i=1}^{N}\int p_{x_i}(u_i)\ln p_{x_i}(u_i)\,du_i + \Big[S(\phi_x) - \sum_{i=1}^{N} S(\phi_{x_i})\Big] \\
&= \int p_x(u)\ln\frac{p_x(u)}{\prod_{i=1}^{N} p_{x_i}(u_i)}\,du + \Big[S(\phi_x) - \sum_{i=1}^{N} S(\phi_{x_i})\Big] \\
&= I(p_x) - \frac{1}{2}\ln\frac{\prod_{i=1}^{N} V_{ii}}{\det V},
\end{aligned}
$$

using $S(\phi_x) = \frac{1}{2}\ln\left[(2\pi e)^{N} \det V\right]$.
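As a quick sanity check of this representation (a worked special case added for illustration, not taken from the slides), let x be a two-dimensional Gaussian with unit variances and correlation ρ. Every density involved is Gaussian, so all negentropy terms vanish and only the covariance term remains:

```latex
J(p_x) = 0, \qquad J(p_{x_1}) = J(p_{x_2}) = 0, \qquad
V = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}
\;\Longrightarrow\;
I(p_x) = \frac{1}{2}\ln\frac{V_{11}V_{22}}{\det V}
       = -\frac{1}{2}\ln\left(1-\rho^{2}\right).
```

This is the familiar mutual information of two jointly Gaussian variables, and it is zero exactly when ρ = 0.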

2. Basic Principles of the ICA algorithms

$$I(p_x) = J(p_x) - \sum_{i=1}^{N} J(p_{x_i}) + \frac{1}{2}\ln\frac{\prod_{i=1}^{N} V_{ii}}{\det V}$$

• $J(p_x)$ is invariant under general invertible linear transforms.
• $\sum_{i} J(p_{x_i})$ is the term to be maximized.
• The last term cancels out via standardization, which transforms x to $\tilde{x}$ with an identity (unit) covariance matrix.

3. Examples of practical criteria of statistical independence

3.1 Criterion based on approximation of negentropy

$$I(p_x) \approx J(p_x) - \sum_{i=1}^{N}\left(\frac{1}{12}\kappa_{iii}^{2} + \frac{1}{48}\kappa_{iiii}^{2} + \frac{7}{48}\kappa_{iii}^{4} - \frac{1}{8}\kappa_{iii}^{2}\kappa_{iiii}\right) + \frac{1}{2}\ln\frac{\prod_{i=1}^{N} V_{ii}}{\det V}$$

where $\kappa_{iiii} = \mathrm{cum}(x_i, x_i, x_i, x_i)$ and $\kappa_{iii}$ is the corresponding third-order cumulant. The sum is the term to be maximized.

3.2 Simple criteria based on cumulants

$$\psi(Q) = \sum_{i=1}^{N} K_{iiii}^{2}, \quad \text{with the } K\text{'s the cumulants of } z = Qy$$

• ICA Results of the Rosetta Compendium Data Set

Rosetta Data Set --- Expression profiles of genes in 300 experiments

ICA results I: Expression profiles of Independent Components (functional units) in 300 experiments

[Figure: heat map with experiment indices (50 to 300) on the horizontal axis and independent component indices (50 to 250) on the vertical axis.]

PCA results for comparison: Expression profiles of the Principal Components in 300 experiments

[Figure: heat map with experiment indices (50 to 300) on the horizontal axis and principal component indices (50 to 250) on the vertical axis; color scale from -40 to 10.]

ICA results II: Discovery or corroboration of genes’ functional unit

• Functional Genomic Unit #6:

  - Six of these genes code for isoforms of α-glucosidase (MAL62, MAL32, MAL12, FSP2, YIL172c, and YJL216c).

  - Four of the genes are directly associated with cell-wall synthesis and sporulation: sporulation-specific homolog of csd4 (YER096w); sporulation-specific cell wall maturation protein (YHR139c); first enzyme in the dityrosine synthesis pathway in the outer layer of the spore wall, converting L-tyrosine to N-formyl-L-tyrosine (YDR403w); and cell wall mannoprotein (YJR150c).

  - Five genes are involved in glucose metabolism: glucose repression regulatory protein, which exhibits similarity to beta subunits of G proteins (TUP1); high-affinity hexose transporter (YDL245c); high-affinity hexose transporter (YEL069c); hexose transporter (YNR072w); and hexose transporter (YJR158w).


Common 5’-UTR

CCAGAAGTTGA  1  319  1
CAAAAAGGTGT  1  647  0
CCTGAAGTTGT  3   47  1
CAAAAAGGTCA  3  362  1
CCGGAAGGGGT  3  440  0
CAGGAAGGTGA  4   81  1
CAGGAAGTTGA  4  121  1
CACAAAGGTGA  6   69  0
CCTGAAGGTCA  7  169  0
CCTGAAGGTTT  7  188  1
** ********
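One simple way to inspect such an alignment is to count the bases in each column. The sketch below is illustrative only: it uses the ten 11-mers listed on this slide and reports the positions where all ten agree. It makes no attempt to reproduce the asterisk line, whose exact criterion is not stated in the source.

```python
from collections import Counter

# The ten aligned 11-mers listed on the slide.
motifs = [
    "CCAGAAGTTGA", "CAAAAAGGTGT", "CCTGAAGTTGT", "CAAAAAGGTCA", "CCGGAAGGGGT",
    "CAGGAAGGTGA", "CAGGAAGTTGA", "CACAAAGGTGA", "CCTGAAGGTCA", "CCTGAAGGTTT",
]

# One Counter per alignment column: how often each base occurs there.
columns = [Counter(col) for col in zip(*motifs)]

# 1-based positions where every sequence carries the same base.
conserved = [i + 1 for i, counts in enumerate(columns) if len(counts) == 1]

print(conserved)
print("".join(motifs[0][i - 1] for i in conserved))
```

For these ten sequences the fully conserved positions are 1, 5, 6 and 7 (bases C, A, A, G); the remaining columns show two or more different bases.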

Concept of “Functional Genomic Unit”

• The set of genes found here is different from the “Pathways” in the traditional sense.

• Mathematical Point of View: Latent Variables constructed by Independent Component Analysis

• Biological Point of View: Coordinated genes to achieve a certain goal

Using FGU to explain user experiments

[Figure with axes labelled FGUs and Experiments.]

Putative signal transduction pathways

[Diagram labels: Ras, Raf, MKK4, JNK, ERK, MKK3/6, P38, c-fos, c-jun, AP-1 RE, Jun1/jun2, UV, Cytokine Receptors, Growth Factor receptors, Rac / CDC42, MEK, NADPH Oxidase, ROS, ATF2, C-fos promoter.]

Summary of Findings

• Incorporated biological knowledge in exploratory data analysis

• Utilized ICA to model the yeast functional genomics behavior

• Proposed Functional Genomic Units

• Demonstrated the potential of the “Compendium” approach