SPIN tutorial
Sorting multidimensional data using distance matrices
• Clustering aims to partition data such that points within a cluster are more “similar” to each other than to points outside the cluster.
• Sorting rearranges points into a one-dimensional ordering (a permutation) that reflects the shape of their arrangement.
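A minimal sketch of what "sorting" means for a distance matrix, assuming Euclidean distances and using NumPy (the helper names `distance_matrix` and `reorder` are illustrative, not part of SPIN). A permutation is applied to both the rows and the columns, so the distances themselves never change; only their arrangement does. Unlike clustering, no labels or groups are produced.

```python
import numpy as np

def distance_matrix(X):
    """Euclidean distance between every pair of rows of X (n points x d dims)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def reorder(D, order):
    """Apply one permutation to both the rows and the columns of D."""
    order = np.asarray(order)
    return D[np.ix_(order, order)]

# Four hypothetical points on a line, listed out of order.
X = np.array([[3.0, 0.0], [0.0, 0.0], [2.0, 0.0], [1.0, 0.0]])
D = distance_matrix(X)
order = np.argsort(X[:, 0])      # the permutation [1, 3, 2, 0]
D_sorted = reorder(D, order)
print(D_sorted)                  # entry (i, j) now equals |i - j|
```

After sorting, the distance matrix grows smoothly away from the main diagonal, which is the visual signature of points lying along a line.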
[Figure: Shape of cluster]
[Figure: Ordering according to ascension]
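A small sketch of ordering points by ascending value, assuming the data are one-dimensional and the distance is the absolute difference (the data and the helper `increases_away_from_diagonal` are hypothetical). Once the points are in ascending order, every row of the distance matrix is non-decreasing as you move outward from the diagonal, which is not the case for an arbitrary order.

```python
import numpy as np

def increases_away_from_diagonal(M):
    """True if every row of M is non-decreasing when read outward from the diagonal."""
    for i in range(len(M)):
        right = M[i, i:]               # diagonal entry, then moving right
        left = M[i, : i + 1][::-1]     # diagonal entry, then moving left
        if np.any(np.diff(right) < 0) or np.any(np.diff(left) < 0):
            return False
    return True

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)        # hypothetical 1-D data in arbitrary order
D = np.abs(x[:, None] - x[None, :])    # distance matrix in the original order

order = np.argsort(x)                  # ascending order
D_sorted = D[np.ix_(order, order)]

print(increases_away_from_diagonal(D_sorted))  # True
print(increases_away_from_diagonal(D))         # False for this shuffled order
```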
Sorting a ring of points
[Figure panels: Side to side; Neighborhood]
Which is the better approach?
• An energy function that penalizes blue points that are far from the main diagonal,
or
• An energy function that penalizes red points that are near the main diagonal.
Answer: both are right.
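The two energy functions can be sketched as weighted sums over the permuted distance matrix. This is an illustration under stated assumptions, not SPIN's own implementation: the Gaussian band in `red_near_energy` and the similarity `D.max() - D` in `blue_far_energy` are hypothetical choices, and `sigma` is only assumed to play a role similar to the Width setting. On a ring of points, walking around the ring scores well under the "red near the diagonal" energy, while projecting onto one axis (side to side) scores well under the "blue far from the diagonal" energy.

```python
import numpy as np

def red_near_energy(D, order, sigma=5.0):
    """Penalize large distances near the main diagonal.
    Hypothetical band weights W[i, j] = exp(-((i - j) / sigma) ** 2)."""
    P = D[np.ix_(order, order)]
    idx = np.arange(len(order))
    W = np.exp(-((idx[:, None] - idx[None, :]) / sigma) ** 2)
    return float((P * W).sum())

def blue_far_energy(D, order):
    """Penalize small distances far from the main diagonal.
    Similarity (D.max() - distance) is weighted by the index gap |i - j|."""
    P = D[np.ix_(order, order)]
    idx = np.arange(len(order))
    gap = np.abs(idx[:, None] - idx[None, :])
    return float(((D.max() - P) * gap).sum())

# A hypothetical ring of 40 points.
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, size=40)
X = np.column_stack([np.cos(theta), np.sin(theta)])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

random_order = rng.permutation(40)
angle_order = np.argsort(theta)   # walk around the ring (neighborhood-like)
x_order = np.argsort(X[:, 0])     # project onto one axis (side-to-side-like)

print(red_near_energy(D, angle_order) < red_near_energy(D, random_order))  # True
print(blue_far_energy(D, x_order) < blue_far_energy(D, random_order))      # True
```

Both energies are minimized by a good ordering; they simply reward different shapes, which is why the ring looks different under the two sorting modes.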
1. Sort the genes by setting Width to 10 and then pressing the Neighborhood button.
• Question 10: How can you identify potential "clusters" in the sorted distance matrix?
2. Select the last "cluster" in the distance matrix. This can be done in several ways:
• By dragging a box around it in the distance matrix.
• By dragging a box of appropriate height in the expression axes.
• By setting the values of the small text boxes below the distance matrix to 377 and 392, respectively.
• Note that the results are generally not very sensitive to the exact boundaries chosen for the "cluster".
3. The sorter now highlights the selected region in all three of the top displays. Zoom in on the "cluster" by clicking the Zoom in button.
4. Press Transpose to view the samples in the space of the selected cluster.
• Sort the samples by using Side2side.
• Export the PCA image and add the labels.
• Question 11: What is the connection between the labels and the current ordering of the samples?
5. Transpose and Zoom out. Look at the ordering of the samples (columns) in the expression matrix (top middle image).
6. Question 12: Why are they ordered in this particular fashion?
7. Repeat steps 9-11 on the "cluster" of genes between 40 and 80.
• Question 13: What is the major partition of the samples in this gene space?
• Question 14: What is the connection to previously known labels?
8. Try zooming in and out of different groups of genes to discover novel partitions of the samples.
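The workflow in the steps above (sort genes, select a block of genes, transpose, sort the samples in that gene space) can be mimicked outside the sorter with NumPy. This is a rough sketch on hypothetical data, not the sorter's algorithms: `greedy_chain_order` (nearest-neighbor chaining) stands in for the Neighborhood sort, and `side_to_side_order` (projection on the first principal component) stands in for Side2side. For simplicity the planted gene block is selected by its original row indices rather than from the sorted matrix.

```python
import numpy as np

def pairwise_distances(M):
    """Euclidean distances between the rows of M."""
    diff = M[:, None, :] - M[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def greedy_chain_order(D, start=0):
    """Stand-in for a neighborhood sort: repeatedly append the nearest
    not-yet-visited point to the end of the chain."""
    n = len(D)
    order = [start]
    visited = np.zeros(n, dtype=bool)
    visited[start] = True
    for _ in range(n - 1):
        d = np.where(visited, np.inf, D[order[-1]])
        nxt = int(np.argmin(d))
        order.append(nxt)
        visited[nxt] = True
    return np.array(order)

def side_to_side_order(M):
    """Stand-in for Side2side: order the rows of M by their projection
    on the leading principal component."""
    centered = M - M.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return np.argsort(centered @ vt[0])

# Hypothetical expression matrix: 60 genes x 20 samples, two sample groups
# that differ only in genes 40-49.
rng = np.random.default_rng(2)
labels = np.array([0] * 10 + [1] * 10)
expr = rng.normal(size=(60, 20))
expr[40:50] = np.where(labels == 0, 2.0, -2.0) + rng.normal(scale=0.3, size=(10, 20))

# Step 1: sort the genes (rows) using their distance matrix.
gene_order = greedy_chain_order(pairwise_distances(expr))

# Steps 2-4: select the gene block, transpose, and sort the samples in that gene space.
selected = expr[40:50]
sample_order = side_to_side_order(selected.T)
print(labels[sample_order])   # one label for the first 10 samples, the other for the rest
```

Because the sign of a principal component is arbitrary, the two groups may appear in either order; what matters is that each group ends up contiguous.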