Expression Profile Clustering

[Figure: signal value plotted against chips]



  • Slide 1

Expression Profile Clustering

Expression Profile = the pattern of signal values for one gene over several chips.
Expression Profile Clustering = the clustering of "similar" profiles.

Why?
– Similar expression profiles suggest:
    – regulation (by a shared factor or by a cluster member)
    – related function
– ALSO: opposite expression profiles suggest regulation (e.g. inhibition)

Software:
– EpClust
– J-Express
– Etc.

  • Slide 2

Guided Tour of EpClust

  • Slide 3

Data file format
– Data can be entered in a variety of formats. See EpClust's data upload page.
– One simple way:
    1) Download tab-delimited data from NASC
    2) Open with (or paste into) Excel
    3) Delete all but one name column and the signal value columns. Do not include any hyphens! (I try to avoid all punctuation)
    4) Save as tab-delimited text
– Annotation can be added as either:
    – A second file with the same first column of names
    – A single second column in the signal file (if specified)

Gene_ID	Chip1	Chip2	Chip3
AT1G01010404865
AT1G01030241
AT1G01040798835
AT1G01050713744671
AT1G010605680111
AT1G010707510994
AT1G010803933

  • Slide 4

To Upload Data

  • Slide 5

To select uploaded data

  • Slide 6

Select Specific Experiments within Input File

  • Slide 7

To Remove Unreliable Data

  • Slide 8

To Choose Algorithm Type

Hierarchical
– Measures the distance between each profile (i.e. gene) and every other profile
– So time increases rapidly with each added gene (the number of pairwise comparisons grows with the square of the number of genes)
– Then clusters the closest genes together, followed by increasingly distant ones
– Into a tree of clusters within clusters, which can be visualised and finally split at a chosen distance with knowledge of the tree

K-means
– Initially choose a set number (K) of clusters
    1) Chooses the K most different profiles
    2) Clusters each remaining profile with one of the K
– So, much less time than Hierarchical (if many genes)
– Cluster size can vary, so it can return too many or too few genes with no way to select. Then you must repeat with different K values.

  • Slide 9

So Which is Better?
– K-means is better able to manage large data sets.
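As a rough illustration of why K-means copes better with large data sets, the sketch below counts distance calculations only. It is not taken from EpClust: the function names are hypothetical, and it assumes that hierarchical clustering compares every profile with every other profile once, while K-means compares each profile with K cluster representatives per pass.

```python
def pairwise_comparisons(n_genes: int) -> int:
    """Distances needed when every profile is compared with every other profile."""
    return n_genes * (n_genes - 1) // 2


def kmeans_comparisons(n_genes: int, k: int, passes: int = 1) -> int:
    """Distances needed when each profile is compared with K representatives.

    Assumption: one comparison per profile per representative per pass.
    """
    return n_genes * k * passes


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, pairwise_comparisons(n), kmeans_comparisons(n, k=10))
```

With K fixed at 10, going from 1,000 to 10,000 genes multiplies the all-against-all count by roughly 100 but the K-means count by only 10. Real run times also depend on how many passes K-means needs and on how the tree is built, which this count ignores.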
– Hierarchical seems a more objective approach, in that you don't need to decide the cluster number at the start
– But which is more biologically informative? Opinion divided. Both artificial. Don't prove, just suggest.
– K-means gives more consistent results.
– …and then there's SOTA too!

  • Slide 10

Alternative Distance Measurements

  • Slide 11

The most commonly used
– How similar the SHAPES of the two profiles are
– Based on the average of the values and the standard deviation
– Rates from identical (1) to completely uncorrelated (0) to perfect opposites (-1)

Centred Test?
– If profiles have identical shape, but offset from each other by a fixed value (or magnitude)
– Centred: Identical (1)
– Uncentred: Not (