
Brad Windle, Ph.D. 628-1956 bwindle@hsc.vcu.edu


Unsupervised Learning and Microarrays

Web Site: http://www.people.vcu.edu/~bwindle (follow the link to Courses, then to the lecture for this class)


Gene Expression Profiling

Unsupervised Learning

Cluster Analysis and Applications

A good review of microarray data analysis: Quackenbush J. Computational analysis of microarray data. Nat Rev Genet 2001 Jun;2(6):418-427.


Reductionism versus Systems Approach

Why generate global analyses, as opposed to picking a single gene or protein and hoping you get lucky and it turns out to have great significance to the big picture or to human health?


Normalizing Data

(Figure: Northern blot.)

To normalize samples, divide the experimental values by the mean of the values thought to be constant across the samples.


Sample values are typically normalized by dividing by the mean of the reference values or by the mean of all values.
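A minimal sketch of sample normalization, assuming a sample is a plain list of expression values and that the positions of the reference (assumed-constant) genes are known. The function `normalize_sample` and the numbers are hypothetical illustrations, not taken from the lecture.

```python
from statistics import mean


def normalize_sample(values, reference_idx=None):
    """Divide every value in one sample by a scaling factor.

    If reference_idx is given, the factor is the mean of those reference
    values (genes assumed constant across samples); otherwise it is the
    mean of all values in the sample.
    """
    ref = [values[i] for i in reference_idx] if reference_idx is not None else values
    factor = mean(ref)
    return [v / factor for v in values]


# Hypothetical sample; positions 0 and 1 are assumed reference genes.
sample = [200.0, 400.0, 900.0, 30.0]
print(normalize_sample(sample, reference_idx=[0, 1]))  # roughly [0.667, 1.333, 3.0, 0.1]
print(normalize_sample(sample))  # scaled by the mean of all four values instead
```

Applying the same function to every sample puts them on a common scale, so a gene's value can be compared from one sample to the next.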


What about normalizing gene values across all the samples?


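The rationale for normalizing samples does not apply to genes: sample normalization corrects technical differences between samples, whereas different genes legitimately differ in expression level.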

One strategy is to subtract the mean (mean centering).
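Mean centering can be sketched in a few lines, assuming each gene is a list of its values across samples. The function name and the two example genes are hypothetical.

```python
from statistics import mean


def mean_center(gene_values):
    """Subtract the gene's mean across samples so its profile is centered on 0."""
    m = mean(gene_values)
    return [v - m for v in gene_values]


# Two hypothetical genes with the same pattern but a 10-fold difference in level.
high = [100.0, 120.0, 80.0]
low = [10.0, 12.0, 8.0]
print(mean_center(high))  # [0.0, 20.0, -20.0]
print(mean_center(low))   # [0.0, 2.0, -2.0]
```

Centering removes the difference in absolute level, but the spread still differs tenfold between the two genes. That scale difference is one reason expression data are often log-transformed as well.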


Log transformation

(Figure: values of .01, 1, 10 and 100 on a linear axis compared with a log axis running from -2 to 2.)
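A small sketch of log transformation, assuming positive expression values or ratios. The input values echo the axis labels in the figure (.01 to 100). Base 2 is shown as well because it is a common choice for expression ratios. The function `log_transform` is a hypothetical helper.

```python
import math


def log_transform(values, base=10):
    """Log-transform positive expression values or ratios."""
    if base == 10:
        return [math.log10(v) for v in values]
    if base == 2:
        return [math.log2(v) for v in values]
    return [math.log(v, base) for v in values]


# Values spanning four orders of magnitude become evenly spaced.
print(log_transform([0.01, 1, 10, 100]))  # approximately [-2.0, 0.0, 1.0, 2.0]
# In base 2, a 2-fold increase and a 2-fold decrease become symmetric around 0.
print(log_transform([2.0, 0.5], base=2))  # [1.0, -1.0]
```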


Gene to Gene Variability


Cluster Analysis

Goal: put items (genes) together in clusters based on the similarity of their expression across various conditions, either similarity in absolute expression level or overall similarity in pattern.



item  X    Y    Z
1     1    1.5  1
2     1.2  1.3  1.5
3     1.4  3.2  4.0
4     5.1  3.5  2.1

Euclidean distance: $d = \sqrt{(X_1 - X_2)^2 + (Y_1 - Y_2)^2 + (Z_1 - Z_2)^2}$




Euclidean distance matrix (the values correspond to the X and Y coordinates only):

     1     2     3     4
1    0     .28   1.75  4.56
2    .28   0     1.91  4.48
3    1.75  1.91  0     3.71
4    4.56  4.48  3.71  0
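The distance matrix can be reproduced with a short sketch. The formula uses X, Y and Z, but the matrix entries match a Euclidean distance computed on X and Y alone. The sketch therefore drops Z for the printed matrix and shows the full three-coordinate distance separately. `ITEMS`, `euclidean` and `distance_matrix` are names chosen here for illustration.

```python
import math

# Item coordinates (X, Y, Z) from the table.
ITEMS = {
    1: (1.0, 1.5, 1.0),
    2: (1.2, 1.3, 1.5),
    3: (1.4, 3.2, 4.0),
    4: (5.1, 3.5, 2.1),
}


def euclidean(a, b):
    """Square root of the summed squared differences over all coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(points):
    """Pairwise distances, with rows and columns ordered by item id."""
    keys = sorted(points)
    return [[euclidean(points[i], points[j]) for j in keys] for i in keys]


# Keeping only X and Y reproduces the matrix shown above.
xy = {k: v[:2] for k, v in ITEMS.items()}
for row in distance_matrix(xy):
    print(" ".join(f"{d:.2f}" for d in row))

# With Z included, items 1 and 2 are sqrt(0.33), about 0.57, apart.
print(round(euclidean(ITEMS[1], ITEMS[2]), 2))
```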

Pearson correlation:
$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$

Pearson correlation matrix (r):

     1      2      3      4
1    1.00  -0.19   0.22  -0.04
2   -0.19   1.00   0.92  -0.97
3    0.22   0.92   1.00  -0.98
4   -0.04  -0.97  -0.98   1.00

Correlation distance matrix (d = 1 - r):

     1     2     3     4
1    0.00  1.19  0.78  1.04
2    1.19  0.00  0.08  1.97
3    0.78  0.08  0.00  1.98
4    1.04  1.97  1.98  0.00

$r$ ranges from $-1$ to $+1$.

$d = 1 - r$ ranges from 0 to 2.

$d = 1 - |r|$ ranges from 0 to 1.

$d = 1 - r^2$ ranges from 0 to 1.
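A sketch of the Pearson formula and the three correlation-based distances. It treats each item's (X, Y, Z) row from the table as its expression profile and uses the same computational form, $n\sum XY - (\sum X)(\sum Y)$ over the square-rooted product. The helper names are illustrative only.

```python
import math

# Item profiles (X, Y, Z) from the table.
ITEMS = {
    1: (1.0, 1.5, 1.0),
    2: (1.2, 1.3, 1.5),
    3: (1.4, 3.2, 4.0),
    4: (5.1, 3.5, 2.1),
}


def pearson(x, y):
    """Pearson r using the computational (sums) formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def corr_distance(r, kind="1-r"):
    """Turn a correlation into a distance using one of the three conversions."""
    if kind == "1-r":      # 0 to 2; anti-correlated profiles are farthest apart
        return 1 - r
    if kind == "1-|r|":    # 0 to 1; sign ignored
        return 1 - abs(r)
    if kind == "1-r^2":    # 0 to 1; sign ignored
        return 1 - r * r
    raise ValueError(f"unknown distance kind: {kind}")


r12 = pearson(ITEMS[1], ITEMS[2])
print(round(r12, 2), round(corr_distance(r12), 2))  # -0.19 1.19
```

The choice of conversion matters. With 1 - r, items 3 and 4 (r = -0.98) are nearly as far apart as possible. With 1 - |r| or 1 - r², they count as almost identical because their patterns are mirror images.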



Hierarchical Clustering


Clustering Methods

Divisive and Agglomerative (Aggregative)


(Figure: two example dendrograms of items A, B, C and D, with merge heights between .1 and .6.)

Cluster Linkage Methods

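Nearest Neighbor, or Single Linkage: the distance between two clusters is the distance between their closest members.

Furthest Neighbor, or Complete Linkage: the distance between two clusters is the distance between their most distant members.

Average Neighbors, or Average Linkage: the distance between two clusters is the average of all pairwise distances between their members.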

A tree of N items can be drawn in $2^{N-1}$ leaf orders, since each of its N - 1 merges can be flipped.
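The three linkage rules can be compared with a naive agglomerative sketch. It repeatedly merges the two closest clusters, where "closest" is defined by the chosen linkage. The data are the X and Y coordinates of items 1 to 4 from the table above. All function names are hypothetical, and the brute-force search is for illustration, not for large microarray data sets.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Each linkage rule reduces the member-to-member distances to one number.
LINKAGE = {
    "single": min,                            # nearest neighbor
    "complete": max,                          # furthest neighbor
    "average": lambda ds: sum(ds) / len(ds),  # average of all pairs
}


def cluster_distance(points, a, b, linkage):
    """Distance between clusters a and b (sets of item ids) under a linkage rule."""
    pair_dists = [euclidean(points[i], points[j]) for i in a for j in b]
    return LINKAGE[linkage](pair_dists)


def agglomerate(points, linkage="average"):
    """Repeatedly merge the two closest clusters until one remains.

    Returns the merges in order as (members_a, members_b, merge_distance).
    """
    clusters = [frozenset([k]) for k in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(points, pair[0], pair[1], linkage))
        merges.append((sorted(a), sorted(b), cluster_distance(points, a, b, linkage)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# X and Y coordinates of items 1-4 from the table.
xy = {1: (1.0, 1.5), 2: (1.2, 1.3), 3: (1.4, 3.2), 4: (5.1, 3.5)}
for method in ("single", "complete", "average"):
    print(method, [(a, b, round(d, 2)) for a, b, d in agglomerate(xy, method)])
```

All three rules merge items 1 and 2 first, but the heights of later merges differ: single linkage joins item 4 at 3.71 and complete linkage at 4.56. Those heights are what set the branch lengths of the dendrogram.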



K-Means Clustering and Its Relative, Self-Organizing Maps (SOM)
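A minimal sketch of k-means (Lloyd's algorithm). Each point is assigned to its nearest center, then each center moves to the mean of its points, and the loop stops when the centers stop moving. The caller supplies the initial centers. The data points are hypothetical, and self-organizing maps, which add a grid of linked nodes, are not shown.

```python
import math


def kmeans(points, centers, iterations=100):
    """Lloyd's k-means; returns final centers and the points assigned to each."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its points
        # (an empty cluster keeps its old center).
        new_centers = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, clusters


# Hypothetical 2-D data with two obvious groups; k = 2.
pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centers, clusters = kmeans(pts, centers=[(1, 1), (8, 8)])
print(centers)
print(clusters)
```

Unlike hierarchical clustering, k-means needs the number of clusters up front, and the result can depend on the starting centers.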


Ranking Order Clustering


Cluster Playground 3


Applications of Gene Expression Profiling and Cluster Analysis

Tissue or Tumor Classification

Gene Classification

Drug Classification

Drug Target Identification


B-Cell Lymphoma (Nature 403, 503-511, 2000)

Indistinguishable by histology

Yet half responded well to therapy and half did not

Were there differences in gene expression that correlate with drug response?

Gene expression profiles showed that half the lymphomas were of germinal center (GC) B-cell lineage and the other half of activated B-cell lineage.

A subset of genes predicts therapeutic outcome


(Figure: expression profiles of yeast mutants M1 to M18 and drug treatments D1 to D18.)

Gene Expression Profiling of Yeast Mutants and Drugs (Cell 102, 109–126, 2000)

Mutant M4 (Erg2) and drug D17 (dyclonine) show matching profiles; Erg2 is related to the human sigma receptor.


Validation of cdc28 Kinase Target Inhibition (Science 281, 533-538, 1998)

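(Figure: expression in cdc28- and after treatment with drugs D1 and D2, with gene groups labeled Cdc28-regulated genes and phosphate metabolism genes, and Pho85 marked.)

D1 and D2: nucleotide analogs that block Cdc28p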


Cells  Drug 1  Drug 2  Drug 3  Drug 4  Drug 5
A      -2      -1      0       -1      .01
B      1       -1.5    2       0       -.5
C      .4      0       1       1       .2
D      0       .7      2       1       .9
E      1       0       -.5     .5      -.8

COMPARE: Clustering Drugs Based on Cell Line Sensitivities

Nature Genetics 24: 236-244, 2000
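A sketch in the spirit of COMPARE, assuming each drug is represented by its vector of sensitivities across the same cell lines. Drugs are ranked by the Pearson correlation between their vector and a seed drug's vector. The drug names and values are hypothetical, and this is an illustration of the idea, not the NCI implementation.

```python
import math


def pearson(x, y):
    """Pearson r from deviations around the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def rank_by_similarity(profiles, seed):
    """Rank all other drugs by correlation with the seed drug's profile."""
    target = profiles[seed]
    scores = {name: pearson(target, p) for name, p in profiles.items() if name != seed}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sensitivities of cell lines A-E to four drugs.
profiles = {
    "seed":   [-2.0, 1.0, 0.4, 0.0, 1.0],
    "drug_x": [-1.8, 0.9, 0.5, 0.1, 1.1],
    "drug_y": [2.0, -1.0, -0.4, 0.0, -1.0],
    "drug_z": [0.0, 0.0, 1.0, 1.0, 0.0],
}
for name, r in rank_by_similarity(profiles, "seed"):
    print(name, round(r, 2))
```

Drugs at the top of the ranking kill the same cell lines as the seed, which suggests they share a mechanism. A strongly negative correlation, as for `drug_y`, marks the opposite pattern of sensitivity.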


(Figure: clustered samples labeled T1, T2 and A7, in the order T1 T1 T1 T1 T1 T2 T2 A7 A7 T2 A7 A7 A7 A7 A7 A7 A7 T1 T1 T1 T1 T1.)


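(Figure: profiling data types, namely gene expression, protein expression, SNPs, methylation, metabolomics, drug structure, protein structure and miscellaneous data, grouped as genomic, structural and metabolomic, and related to cell state, disease and drug response.)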


Clustering NCI 60 Cancer Cell Lines (Nature Genetics 24: 227-238)

6165 Genes

9 Types of Tissues/Tumors

Breast, CNS, Colon, Leukemia, Lung, Melanoma, Ovarian, Prostate, Renal


Filtering Data

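Filter the data with the program Cluster using standard deviation (SD) cutoffs, for example by removing genes whose SD across samples does not exceed a chosen threshold.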
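An SD cutoff can be mimicked in plain Python. This sketch is not the Cluster program itself: it assumes genes are stored as name-to-values mappings and uses the sample standard deviation (`statistics.stdev`). Whether Cluster uses the sample or the population SD is not stated in the lecture. The cutoff and the gene values are hypothetical.

```python
from statistics import stdev


def filter_by_sd(genes, min_sd):
    """Keep genes whose sample SD across samples exceeds min_sd."""
    return {name: values for name, values in genes.items() if stdev(values) > min_sd}


# Hypothetical (log-scale) values for two genes across four samples.
genes = {
    "flat": [1.0, 1.1, 0.9, 1.0],        # barely changes; SD about 0.08
    "variable": [0.2, 2.5, -1.0, 1.8],   # changes a lot; SD about 1.6
}
print(list(filter_by_sd(genes, min_sd=0.5)))  # ['variable']
```

Dropping genes that barely vary removes profiles that carry little pattern information, which keeps them from crowding the clusters.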