Introduction to Bioinformatics - Tutorial no. 12
Expression Data Analysis: Clustering, GEO, EPClust
Application of Microarrays
We only know the function of about 20% of the 30,000 genes in the human genome
Gene exploration
Faster and better
Applications: evolution, behavior, cancer research
Microarray Analysis
Unsupervised Grouping: Clustering
Pattern discovery via grouping similarly expressed genes together
Three techniques are most often used: k-means clustering, hierarchical clustering, Kohonen self-organizing feature maps
Hierarchical Agglomerative Clustering (Michael Eisen, 1998)
Cluster (algorithm), TreeView (visualization)
Hierarchical Agglomerative Clustering
Step 1: Compute a similarity score between all pairs of genes (Pearson correlation or Euclidean distance)
Step 2: Find the two most similar genes and replace them with a node that contains their average; this builds a tree of genes
Step 3: Repeat until a single node remains
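The three steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the CLUSTER implementation: the function names (`pearson`, `euclidean`, `agglomerate`) and the four toy expression profiles are assumptions made for the example, and Pearson correlation is used as the similarity score. Profiles with zero variance would need special handling, since the correlation is undefined for them.

```python
import math

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def agglomerate(profiles):
    """Merge the most correlated pair until one node (the tree) remains.

    profiles maps a gene name to its list of expression values.
    The tree is returned as nested 2-tuples of gene names.
    """
    nodes = {name: list(values) for name, values in profiles.items()}
    while len(nodes) > 1:
        keys = list(nodes)
        # Step 1: similarity score for every pair of current nodes.
        scored = [(pearson(nodes[a], nodes[b]), a, b)
                  for i, a in enumerate(keys) for b in keys[i + 1:]]
        # Step 2: pick the most similar pair and replace it by its average.
        _, a, b = max(scored, key=lambda t: t[0])
        pa, pb = nodes.pop(a), nodes.pop(b)
        nodes[(a, b)] = [(u + v) / 2 for u, v in zip(pa, pb)]
        # Step 3: the loop repeats with one node fewer.
    return next(iter(nodes))

# Hypothetical toy data: g1/g2 rise together, g3/g4 fall together.
toy = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.5, 2.0],
}
tree = agglomerate(toy)
print(tree)
```

With Euclidean distance instead of correlation, the same loop would pick the pair with the smallest value rather than the largest.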
[Figure: Agglomerative Hierarchical Clustering of five points, labeled 1 to 5]
Distance between joined clusters
Need to define the distance between the new cluster and the other clusters.
Single Linkage: distance between closest pair.
Complete Linkage: distance between farthest pair.
Average Linkage: average distance between all pairs, or distance between cluster centers.
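The linkage rules differ only in how they summarize the pairwise distances between two clusters. The sketch below assumes Euclidean distance and represents each cluster as a list of points; the function names and the two small clusters are illustrative, not taken from any particular package.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pair_distances(cluster_a, cluster_b):
    """All distances between a point of cluster_a and a point of cluster_b."""
    return [euclidean(p, q) for p in cluster_a for q in cluster_b]

def single_linkage(cluster_a, cluster_b):
    return min(pair_distances(cluster_a, cluster_b))    # closest pair

def complete_linkage(cluster_a, cluster_b):
    return max(pair_distances(cluster_a, cluster_b))    # farthest pair

def average_linkage(cluster_a, cluster_b):
    d = pair_distances(cluster_a, cluster_b)
    return sum(d) / len(d)                              # mean over all pairs

def center_distance(cluster_a, cluster_b):
    """Distance between the cluster centers (coordinate-wise means)."""
    center_a = [sum(col) / len(cluster_a) for col in zip(*cluster_a)]
    center_b = [sum(col) / len(cluster_b) for col in zip(*cluster_b)]
    return euclidean(center_a, center_b)

# Hypothetical clusters of two points each.
A = [[0.0, 0.0], [0.0, 2.0]]
B = [[3.0, 0.0], [5.0, 0.0]]
for rule in (single_linkage, complete_linkage, average_linkage, center_distance):
    print(rule.__name__, round(rule(A, B), 3))
```

Single linkage never exceeds average linkage, which never exceeds complete linkage, so the choice of rule changes which clusters look close enough to merge next.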
Dendrogram
The dendrogram induces a linear ordering of the data points
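The linear ordering is simply the left-to-right order of the leaves. A minimal sketch, assuming the tree is stored as nested 2-tuples of gene names (a representation chosen here for illustration):

```python
def leaf_order(tree):
    """Left-to-right list of leaves of a tree given as nested 2-tuples."""
    if isinstance(tree, tuple):
        left, right = tree
        return leaf_order(left) + leaf_order(right)
    return [tree]

def flip(tree):
    """Swap the two children of the root node."""
    left, right = tree
    return (right, left)

# Hypothetical dendrogram over five genes.
tree = (("g1", "g2"), ("g3", ("g4", "g5")))
order = leaf_order(tree)
flipped_order = leaf_order(flip(tree))
print(order)
print(flipped_order)
```

The ordering is not unique: swapping the children of any internal node gives the same tree with a different leaf order, so a tree with n leaves is consistent with 2^(n-1) orderings.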
Results of Clustering Gene Expression
CLUSTER is simple and easy to use
De facto standard for microarray analysis
Limitations:
Hierarchical clustering in general is not robust
Genes may belong to more than one cluster
K-Means Clustering Algorithm
Randomly initialize k cluster means
Iterate:
  Assign each gene to the nearest cluster mean
  Recompute cluster means
Stop when clustering converges
Notes:
  Really fast
  Genes are partitioned into clusters
  How do we select k?
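The loop can be written out in plain Python as follows. This is a sketch under stated assumptions: Euclidean distance, initial means drawn from the data points with a fixed seed, and convergence defined as "no assignment changed". The function name `kmeans`, its parameters, and the six toy points are illustrative.

```python
import math
import random

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def kmeans(points, k, seed=0, max_iter=100):
    """Return (assignment, means) for a list of equal-length points."""
    rng = random.Random(seed)
    # Randomly initialize k cluster means by picking k distinct data points.
    means = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign each point to the nearest cluster mean.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, means[c])) for p in points
        ]
        # Stop when the clustering no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each mean; an empty cluster keeps its previous mean.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, means

# Hypothetical data: two well-separated groups of three points.
points = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
          [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
assignment, means = kmeans(points, k=2)
print(assignment)
```

The cluster labels themselves (0 or 1) depend on the random initialization; only the grouping is meaningful. On less cleanly separated data, different seeds can also lead to different partitions, which is one reason the choice of k and of the starting means matters.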
K-Means Algorithm
Randomly initialize clusters
Assign data points to nearest clusters
Recalculate clusters
Repeat … until convergence
EPClust Input (1)
Expression data matrix
Extra annotation for gene rows
Method of tabulation
Name for further analysis
EPClust Input (2)
Method of measuring distance between gene rows
Cluster hierarchically
Number k of means
Cluster into k means
GEO: Gene Expression Omnibus
NCBI database for gene expression data
Founded at end of 2000
Querying GEO
Browse records
Search for entries containing a gene
Search for experiments
Search with Entrez
SGD – Expression database
http://db.yeastgenome.org/cgi-bin/expression/expressionConnection.pl
Two labs are running experiments on the APO1 gene. Suggest a method that would allow them to compare their results.
Gene grouping
Relative values
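One possible reading of "relative values" is that each lab reports expression relative to its own reference sample, for example as log2 ratios, so that differences in absolute signal scale between labs cancel out. The sketch below assumes that reading; the function name `log_ratios` and all intensity values are made up for illustration.

```python
import math

def log_ratios(sample, reference):
    """log2(sample / reference) for each gene; intensities must be positive."""
    return [math.log2(s / r) for s, r in zip(sample, reference)]

# Hypothetical raw intensities for three genes in each lab.
# Lab B's scanner reads on a much lower absolute scale than Lab A's.
lab_a = log_ratios(sample=[200.0, 800.0, 100.0], reference=[100.0, 400.0, 400.0])
lab_b = log_ratios(sample=[30.0, 120.0, 15.0], reference=[15.0, 60.0, 60.0])

print(lab_a)
print(lab_b)
```

In this toy case the raw numbers differ by an order of magnitude, yet both labs see the same relative change for every gene, which is what makes the results comparable.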
Explain how microarrays can be used as a basis for diagnosis
        Sample 1  Sample 2  Sample 3  Sample 4  Sample 5
Gen1       +         -         -         +         +
Gen2       +         +         -         +         -
Gen3       -         +         +         +         -
Gen4       +         +         +         -         -
Gen5       -         -         +         -         +
Explain how microarrays can be used as a basis for diagnosis
        Sample 1  Sample 2  Sample 4  Sample 3  Sample 5
Gen1       +         -         +         -         +
Gen2       +         +         +         -         -
Gen3       -         +         +         +         -
Gen4       +         +         -         +         -
Gen5       -         -         -         +         +
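The second table lists the same calls with Sample 3 and Sample 4 swapped, which places Samples 1, 2 and 4 side by side. One way to make that grouping concrete is to count, for each pair of samples, the number of genes whose call differs. The sketch below uses the +/- calls from the first table; the Hamming-distance measure, the function names, and the new sample profile are assumptions added for illustration.

```python
# Calls from the first table: one string per gene, one character per
# sample, in the order Sample 1 .. Sample 5.
calls = {
    "Gen1": "+--++",
    "Gen2": "++-+-",
    "Gen3": "-+++-",
    "Gen4": "+++--",
    "Gen5": "--+-+",
}
genes = sorted(calls)
samples = ["Sample 1", "Sample 2", "Sample 3", "Sample 4", "Sample 5"]

# Turn the table around: one profile string per sample, one character per gene.
profiles = {
    name: "".join(calls[g][i] for g in genes) for i, name in enumerate(samples)
}

def hamming(a, b):
    """Number of genes whose +/- call differs between two profiles."""
    return sum(x != y for x, y in zip(a, b))

def nearest_sample(new_profile, known):
    """Name of the known sample whose profile differs in the fewest genes."""
    return min(known, key=lambda name: hamming(new_profile, known[name]))

for i, a in enumerate(samples):
    for b in samples[i + 1:]:
        print(a, b, hamming(profiles[a], profiles[b]))

# Hypothetical new patient sample, calls for Gen1..Gen5.
print(nearest_sample("+--+-", profiles))
```

If each known sample carried a diagnosis label (the slides do not give any), the label of the nearest profile could serve as a first suggestion for the new sample.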