SPIN tutorial


Sorting multidimensional data using distance matrices

• Clustering aims to partition the data so that points within a cluster are more “similar” to each other than to points outside the cluster.

• Sorting rearranges the points into a single one-dimensional ordering (a permutation) that reflects the shape of their arrangement.
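To make the idea of sorting a distance matrix concrete, here is a minimal sketch in Python with NumPy. It is not the SPIN code: the helper names `distance_matrix` and `reorder` and the toy data are assumptions made for illustration. It builds a pairwise distance matrix and applies one permutation to both its rows and its columns.

```python
import numpy as np

def distance_matrix(points):
    """Pairwise Euclidean distances between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def reorder(dist, order):
    """Apply the same permutation to the rows and the columns of `dist`."""
    order = np.asarray(order)
    return dist[np.ix_(order, order)]

# Hypothetical toy data: six points on a line, listed in shuffled order.
x = np.array([3.0, 0.0, 5.0, 1.0, 4.0, 2.0])
dist = distance_matrix(x[:, None])

# For one-dimensional data the natural ordering is by coordinate.
order = np.argsort(x)
sorted_dist = reorder(dist, order)
```

After reordering, small distances sit next to the main diagonal and distances grow steadily away from it, which is the pattern a sorted distance matrix is expected to show for elongated data.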


Shape of cluster

[Figure: example plots illustrating cluster shapes]


Ordering according to ascension

[Figure: plots illustrating the ordering]


Sorting a ring of points

[Figure: two panels, “Side to side” and “Neighborhood”]

Which is the better approach?

An energy function that penalizes blue points that are far from the main diagonal,

or

an energy function that penalizes red points near the main diagonal?
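The two kinds of energy function can be sketched as follows. This is an illustration under stated assumptions, not SPIN's actual definitions: it assumes that blue marks small distances and red marks large distances, and the function names, the Gaussian weight, and the linear weight are all hypothetical choices.

```python
import numpy as np

def near_diagonal_energy(dist, sigma=2.0):
    """Penalize large distances ("red") that lie close to the main diagonal.

    The Gaussian weight in |i - j| is an assumed choice for this sketch.
    """
    idx = np.arange(dist.shape[0])
    lag = np.abs(idx[:, None] - idx[None, :])
    weight = np.exp(-lag ** 2 / (2.0 * sigma ** 2))
    return float((dist * weight).sum())

def far_from_diagonal_energy(dist):
    """Penalize small distances ("blue") that lie far from the main diagonal.

    Closeness is measured as (largest distance - distance), and the weight
    grows linearly with |i - j|; both are assumed choices for this sketch.
    """
    idx = np.arange(dist.shape[0])
    lag = np.abs(idx[:, None] - idx[None, :])
    closeness = dist.max() - dist
    return float((closeness * lag).sum())

# Hypothetical test case: 50 equally spaced points on a line.
rng = np.random.default_rng(0)
coords = np.arange(50, dtype=float)
line_sorted = np.abs(coords[:, None] - coords[None, :])
perm = rng.permutation(50)
line_shuffled = line_sorted[np.ix_(perm, perm)]
```

For points on a line, both energies are lower for the sorted ordering than for the shuffled one. The two functions differ in which part of the matrix they look at, which is what matters for shapes such as the ring.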


1. Sort the genes by setting Width to 10 then pressing the Neighborhood button.

• Question 10: How can you identify potential "clusters" in the sorted distance matrix?

2. Select the last "cluster" in the distance matrix. This can be done in several ways:

• By dragging a box around it in the distance matrix.

• By dragging a box of appropriate height in the expression axes.

• By setting the two small text boxes below the distance matrix to 377 and 392, respectively.

• Note that the results are generally not very sensitive to the exact choice of the "cluster".
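Outside the graphical interface, selecting a "cluster" by its position in the sorted order amounts to slicing a block of rows. The sketch below assumes that the text boxes hold 1-based, inclusive row numbers and that the sorted expression matrix is a genes-by-samples NumPy array; the function name `select_rows` and the random matrix are hypothetical.

```python
import numpy as np

def select_rows(expression, first, last):
    """Return rows first..last (assumed 1-based, inclusive) of a
    genes-by-samples matrix whose rows are already in sorted order."""
    n_rows = expression.shape[0]
    if not (1 <= first <= last <= n_rows):
        raise ValueError("selection lies outside the matrix")
    return expression[first - 1:last, :]

# Hypothetical data: 400 sorted genes measured in 20 samples.
rng = np.random.default_rng(0)
expression = rng.normal(size=(400, 20))

cluster = select_rows(expression, 377, 392)
```

Shifting either bound by a gene or two changes only one or two rows of the block, which is consistent with the remark that the exact choice of boundaries rarely matters.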

3. The sorter now highlights the selected region in all three of the top displays. Zoom in on the "cluster" by clicking the Zoom in button.

4. Press Transpose to view the samples in the space of the selected cluster.

• Sort the samples by using Side2side.

• Export the PCA image and add the labels.

• Question 11: What is the connection between the labels and the current ordering of the samples?
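What step 4 does to the data can be approximated in a few lines. This is a sketch under assumptions: the selected block is taken to be a genes-by-samples array, PCA is computed with a plain SVD, and ordering the samples by their first principal component is only a simple stand-in for the Side2side sort, not the sorter's algorithm. The name `pca_scores` and the random data are hypothetical.

```python
import numpy as np

def pca_scores(data, n_components=2):
    """Project the rows of `data` onto their leading principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical selected block: 16 genes measured in 20 samples.
rng = np.random.default_rng(0)
cluster = rng.normal(size=(16, 20))

# Transposing turns each sample into a point in the space of the 16 genes.
samples = cluster.T
scores = pca_scores(samples)

# Stand-in ordering of the samples along the first principal component.
order = np.argsort(scores[:, 0])
```

Plotting `scores` and coloring each point by its sample label gives a picture comparable to the exported PCA image, so that the labels can be compared with the ordering.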

5. Transpose and Zoom out. Look at the ordering of the samples (columns) in the expression matrix (top middle image).

6. Question 12: Why are they ordered in this particular fashion?

7. Repeat steps 9-11 on the "cluster" of genes between 40 and 80.

• Question 13: What is the major partition of the samples in this gene space?

• Question 14: What is the connection to previously known labels?

8. You may try to zoom in and out of different groups of genes and try to discover novel partitions of the samples.