67
Dario Bruzzese Domenico Vistocco [email protected] [email protected] Dario Bruzzese, Domenico Vistocco Compstat 2010 1 / 19 Cutting the dendrogram through permutation tests Department of Preventive Medical Sciences UNIVERSITY OF NAPLES ITALY Department of Economics UNIVERSITY OF CASSINO ITALY

Cutting the dendrogram through permutation tests

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Cutting the dendrogram through permutation tests

Dario Bruzzese Domenico [email protected] [email protected]

Dario Bruzzese, Domenico Vistocco () Compstat 2010 1 / 19

Cutting the dendrogram throughpermutation tests

Department ofPreventive Medical Sciences

UNIVERSITY OF NAPLES ITALY

Department ofEconomics

UNIVERSITY OF CASSINO ITALY

Page 2: Cutting the dendrogram through permutation tests

La Carte

1 Motivation

2 The stairstep-like permutation procedureNotationThe outline

3 Some resultsReal datasetsSynthetic dataset

4 ToDo List

Dario Bruzzese, Domenico Vistocco () Compstat 2010 2 / 19

Page 3: Cutting the dendrogram through permutation tests

La Carte

1 Motivation

2 The stairstep-like permutation procedureNotationThe outline

3 Some resultsReal datasetsSynthetic dataset

4 ToDo List

Dario Bruzzese, Domenico Vistocco () Compstat 2010 3 / 19

Page 4: Cutting the dendrogram through permutation tests

Motivation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 4 / 19

Automatically determine the optimal cut-off level of a dendrogramExplore partitions different from those allowed by an horizontal cut

Page 5: Cutting the dendrogram through permutation tests

Motivation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 4 / 19

Automatically determine the optimal cut-off level of a dendrogramExplore partitions different from those allowed by an horizontal cut

The rep1HighNoise datasetYeung KY, Medvedovic M, Bumgarner KY:Clustering gene-expression data withrepeated measurements.

Genome Biology, 2003, 4:R34

n = 200p = 20

Page 6: Cutting the dendrogram through permutation tests

Motivation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 4 / 19

Automatically determine the optimal cut-off level of a dendrogramExplore partitions different from those allowed by an horizontal cut

Horizontal cutk = 3

Page 7: Cutting the dendrogram through permutation tests

Motivation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 4 / 19

Automatically determine the optimal cut-off level of a dendrogramExplore partitions different from those allowed by an horizontal cut

An alternative cutk = 3

Page 8: Cutting the dendrogram through permutation tests

La Carte

1 Motivation

2 The stairstep-like permutation procedureNotationThe outline

3 Some resultsReal datasetsSynthetic dataset

4 ToDo List

Dario Bruzzese, Domenico Vistocco () Compstat 2010 5 / 19

Page 9: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:

n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

Page 10: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

︸ ︷︷ ︸︸ ︷︷ ︸

Page 11: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

Page 12: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

C1L C1

R

Page 13: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

C2L C2

R

Page 14: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

C3L C3

R

Page 15: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

Page 16: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

C1L C1

R

h(

C1L ∪ C1

R

)

Page 17: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

C2L C2

R

h(

C2L ∪ C2

R

)

Page 18: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

C3L C3

R

h(

C3L ∪ C3

R

)

Page 19: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

Page 20: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

C1L

h(

C1L

)

Page 21: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

C1R

h(

C1R

)

Page 22: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

C2L

h(

C2L

)

Page 23: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

C2R

h(

C2R

)

Page 24: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

C3L

h(

C3L

)

Page 25: Cutting the dendrogram through permutation tests

Notation

Dario Bruzzese, Domenico Vistocco () Compstat 2010 6 / 19

Let:n the number of objects to classify;

CkL and Ck

R the two classes merged at level k(k=1,...,n-1)

h(

CkL ∪ Ck

R

)the height necessary to merge

CkL and Ck

R

h(

Ckj

)the height at which Ck

j has been obtained(j ∈ { L, R })

C3R

h(

C3R

)

Page 26: Cutting the dendrogram through permutation tests

The algorithm - Pseudo CodeInput: A dataset and its related dendrogramOutput: A partition of the dataset

initialization:aggregationLevelsToVisit← h(C1

L ∪ C1R)

permClusters← [ ]i← 1repeat

if C iL ≡ C i

R thenadd C i

L ∪ C iR to permClusters

elseadd h(C i

L) and h(C iR) to aggregationLevelsToVisit

sort aggregationLevelsToVisit in descending orderendremove the first element from aggregationLevelsToVisiti← i+1

until aggregationLevelsToVisit is empty

Dario Bruzzese, Domenico Vistocco () Compstat 2010 7 / 19

Page 27: Cutting the dendrogram through permutation tests

The algorithm - Pseudo CodeInput: A dataset and its related dendrogramOutput: A partition of the dataset

initialization:aggregationLevelsToVisit← h(C1

L ∪ C1R)

permClusters← [ ]i← 1

repeatif C i

L ≡ C iR then

add C iL ∪ C i

R to permClusterselse

add h(C iL) and h(C i

R) to aggregationLevelsToVisitsort aggregationLevelsToVisit in descending order

endremove the first element from aggregationLevelsToVisiti← i+1

until aggregationLevelsToVisit is empty

Dario Bruzzese, Domenico Vistocco () Compstat 2010 7 / 19

Page 28: Cutting the dendrogram through permutation tests

The algorithm - Pseudo CodeInput: A dataset and its related dendrogramOutput: A partition of the dataset

initialization:aggregationLevelsToVisit← h(C1

L ∪ C1R)

permClusters← [ ]i← 1repeat

if C iL ≡ C i

R thenadd C i

L ∪ C iR to permClusters

elseadd h(C i

L) and h(C iR) to aggregationLevelsToVisit

sort aggregationLevelsToVisit in descending orderend

remove the first element from aggregationLevelsToVisiti← i+1

until aggregationLevelsToVisit is empty

Dario Bruzzese, Domenico Vistocco () Compstat 2010 7 / 19

Page 29: Cutting the dendrogram through permutation tests

The algorithm - Pseudo CodeInput: A dataset and its related dendrogramOutput: A partition of the dataset

initialization:aggregationLevelsToVisit← h(C1

L ∪ C1R)

permClusters← [ ]i← 1repeat

if C iL ≡ C i

R thenadd C i

L ∪ C iR to permClusters

elseadd h(C i

L) and h(C iR) to aggregationLevelsToVisit

sort aggregationLevelsToVisit in descending orderendremove the first element from aggregationLevelsToVisiti← i+1

until aggregationLevelsToVisit is empty

Dario Bruzzese, Domenico Vistocco () Compstat 2010 7 / 19

Page 30: Cutting the dendrogram through permutation tests

The algorithm - Pseudo CodeInput: A dataset and its related dendrogramOutput: A partition of the dataset

initialization:aggregationLevelsToVisit← h(C1

L ∪ C1R)

permClusters← [ ]i← 1repeat

if C iL ≡ C i

R thenadd C i

L ∪ C iR to permClusters

elseadd h(C i

L) and h(C iR) to aggregationLevelsToVisit

sort aggregationLevelsToVisit in descending orderendremove the first element from aggregationLevelsToVisiti← i+1

until aggregationLevelsToVisit is empty

Dario Bruzzese, Domenico Vistocco () Compstat 2010 7 / 19

Page 31: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

Initializationi ← 0

aggregationLevelsToVisit

h(C1L ∪ C1

R)

permClusters

Page 32: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

Iterationi ← 1

aggregationLevelsToVisit

h(C1L ∪ C1

R)

permClusters

h(

C1L ∪ C1

R

)

Page 33: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

Iterationi ← 1

aggregationLevelsToVisit

h(C1L ∪ C1

R)

permClusters

clusters to compare

H0 : C1L ≡ C1

R 7→ reject

C1L C1

R

Page 34: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

Iterationi ← 1

permClusters

aggregationLevelsToVisit

h(C1L ∪ C1

R),h(C1R),h(C

1L)

h(

C1L

)h(

C1R

)

Page 35: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

Iterationi ← 1

permClusters

aggregationLevelsToVisit

h(C1R),h(C

1L)

Page 36: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

permClusters

Iterationi ← 2

aggregationLevelsToVisit

h(C1R),h(C

1L)

h(

C1R

)

Page 37: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

permClusters

Iterationi ← 2

aggregationLevelsToVisit

h(C1R),h(C

1L)

clusters to compare

H0 : C2L ≡ C2

R 7→ reject

C2L C2

R

Page 38: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

permClusters

Iterationi ← 2

aggregationLevelsToVisit

h(C1R),h(C

1L),h(C

2R),h(C

2L)

h(

C2L

)h(

C2R

)

Page 39: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

permClusters

Iterationi ← 2

aggregationLevelsToVisit

h(C1L),h(C

2R),h(C

2L)

Page 40: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

permClusters

Iterationi ← 3

aggregationLevelsToVisit

h(C1L),h(C

2R),h(C

2L)

h(

C1L

)

Page 41: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

permClusters

Iterationi ← 3

aggregationLevelsToVisit

h(C1L),h(C

2R),h(C

2L)

clusters to compare

H0 : C3L ≡ C3

R 7→ reject

C3L C3

R

Page 42: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

permClusters

Iterationi ← 3

h(

C3L

)h(

C3R

) aggregationLevelsToVisit

h(C3R),h(C

2R),h(C

2L),h(C

3L)

Page 43: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

permClusters

Iterationi ← 4

aggregationLevelsToVisit

h(C3R),h(C

2R),h(C

2L),h(C

3L)

h(

C3R

)

Page 44: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

permClusters

Iterationi ← 4

aggregationLevelsToVisit

h(C3R),h(C

2R),h(C

2L),h(C

3L)

C4L C4

R

clusters to compare

H0 : C4L ≡ C4

R 7→ accept

Page 45: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

Iterationi ← 4

aggregationLevelsToVisit

h(C3R),h(C

2R),h(C

2L),h(C

3L)

clusters to compare

H0 : C4L ≡ C4

R 7→ accept

permClusters

C4L ∪ C4

R ⇔ C3R

C3R

Page 46: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

Iterationi ← 9

aggregationLevelsToVisit

permClusters

C3L ,C

3R,C

2L ,C

4L ,C

4R

Page 47: Cutting the dendrogram through permutation tests

The algorithm - The outline

Dario Bruzzese, Domenico Vistocco () Compstat 2010 8 / 19

Iterationi ← 9

aggregationLevelsToVisit

permClusters

C3L ,C

3R,C

2L ,C

4L ,C

4R

Page 48: Cutting the dendrogram through permutation tests

The algorithm - The permutation Test

Dario Bruzzese, Domenico Vistocco () Compstat 2010 9 / 19

For each aggregation level k a permutation test isdesigned to test the Null Hypothesis that the twogroups Ck

L and CkR really belong to the same

cluster, i.e. :

H0 : CkL ≡ Ck

R

Under this null, mixing up (permuting) the statisticalunits of Ck

L and CkR should not alter the aggregation

process resulting in their merging in.

Page 49: Cutting the dendrogram through permutation tests

The algorithm - The permutation Test

Dario Bruzzese, Domenico Vistocco () Compstat 2010 9 / 19

For each k , the difference between maxj∈{L,R}

h(

Ckj

)and min

j∈{L,R}h(

Ckj

)can be considered as the

minimum cost necessary to merge the two classes..

min h(C3j )

max h(C3j )

Page 50: Cutting the dendrogram through permutation tests

The algorithm - The permutation Test

Dario Bruzzese, Domenico Vistocco () Compstat 2010 9 / 19

For each k , the difference between maxj∈{L,R}

h(

Ckj

)and min

j∈{L,R}h(

Ckj

)can be considered as the

minimum cost necessary to merge the two classes.

The difference between h(

CkL ∪ Ck

R

)and

maxj∈{L,R}

h(

Ckj

)can be, instead, considered as the

cost actually incurred for merging CkL and Ck

R .

h(C3L ∪ C3

R )

max h(C3j )

Page 51: Cutting the dendrogram through permutation tests

The algorithm - The permutation Test

Dario Bruzzese, Domenico Vistocco () Compstat 2010 9 / 19

For each k , the difference between maxj∈{L,R}

h(

Ckj

)and min

j∈{L,R}h(

Ckj

)can be considered as the

minimum cost necessary to merge the two classes.

The difference between h(

CkL ∪ Ck

R

)and

maxj∈{L,R}

h(

Ckj

)can be, instead, considered as the

cost actually incurred for merging CkL and Ck

R .

The ratio between these two differences:

cost(

CkL ∪ Ck

R

)=

maxj∈{L,R}

h(

Ckj

)− min

j∈{L,R}h(

Ckj

)h(Ck

L ∪ CkR

)− max

j∈{L,R}h(

Ckj

)is thus a measure that characterizes the aggregation process resulting in thenew class Ck

L ∪ CkR

Page 52: Cutting the dendrogram through permutation tests

The algorithm - The permutation Test

Dario Bruzzese, Domenico Vistocco () Compstat 2010 10 / 19

C3L C3

R

mC3LmC3

R

Page 53: Cutting the dendrogram through permutation tests

The algorithm - The permutation Test

Dario Bruzzese, Domenico Vistocco () Compstat 2010 10 / 19

C3L C3

R

mC3LmC3

R

mC3L

mC3R

h(mC3L)

h(mC3R)

Page 54: Cutting the dendrogram through permutation tests

The algorithm - The permutation Test

Dario Bruzzese, Domenico Vistocco () Compstat 2010 10 / 19

The ratio:

cost(

mCkL ∪ mCk

R

)=

maxj∈{L,R}

h(

mCkj

)− min

j∈{L,R}h(

mCkj

)h(Ck

L ∪ CkR

)− max

j∈{L,R}h(

mCkj

)is thus a measure that characterizes the aggregation process resulting in thenew (potential) class mCk

L ∪ mCkR

C3L C3

R

mC3LmC3

R

mC3L

mC3R

h(mC3L)

h(mC3R)

Page 55: Cutting the dendrogram through permutation tests

The algorithm - The permutation Test

Dario Bruzzese, Domenico Vistocco () Compstat 2010 10 / 19

Under H0 the aggregation process resulting in the new cluster CkL ∪ Ck

R should be very similar

to the one that potentially produces mCkL ∪ mCk

R ; thus the two values cost(

mCkL ∪ mCk

R

)and

cost(

CkL ∪ Ck

R

)should be close enough.

C3L C3

R

mC3LmC3

R

mC3L

mC3R

h(mC3L)

h(mC3R)

Page 56: Cutting the dendrogram through permutation tests

The algorithm - The permutation Test

Dario Bruzzese, Domenico Vistocco () Compstat 2010 10 / 19

The permutation procedure is repeated M times and each time a new couple mCkL , mCk

R isobtained. The pvalue Montecarlo is thus computed as:

p =#

{cost

(mCk

L ∪ mCkR

)≤ cost

(Ck

L ∪ CkR

)}+ 1

M + 1

C3L C3

R

mC3LmC3

R

mC3L

mC3R

h(mC3L)

h(mC3R)

Page 57: Cutting the dendrogram through permutation tests

La Carte

1 Motivation

2 The stairstep-like permutation procedureNotationThe outline

3 Some resultsReal datasetsSynthetic dataset

4 ToDo List

Dario Bruzzese, Domenico Vistocco () Compstat 2010 11 / 19

Page 58: Cutting the dendrogram through permutation tests

Some results - Real datasets

The yeast galactosedatasetIdeker T, Thorsson V, Ranish JA,Christmas R, Buhler J, Eng JK,Bumgarner RE, Goodlett DR,Aebersold R, Hood LIntegrated genomic andproteomic analyses of asystemically perturbed metabolicnetwork.

Science 2001, 292:929-934.

n = 205p = 80

Dario Bruzzese, Domenico Vistocco () Compstat 2010 12 / 19

Page 59: Cutting the dendrogram through permutation tests

Some results - Real datasets

Dario Bruzzese, Domenico Vistocco () Compstat 2010 12 / 19

% of misclassification = 1.5

Page 60: Cutting the dendrogram through permutation tests

Some results - Real datasets

The diabetes datasetBanfield JD, Raftery AEModel–based Gaussian andNon–Gaussian Clustering.

Biometrics, 1993, 49, 803-821.

n = 145p = 3

Dario Bruzzese, Domenico Vistocco () Compstat 2010 13 / 19

Page 61: Cutting the dendrogram through permutation tests

Some results - Real datasets

Dario Bruzzese, Domenico Vistocco () Compstat 2010 13 / 19

% of misclassification = 15.2

Page 62: Cutting the dendrogram through permutation tests

Some results - Synthetic dataset

QIU W.-L, JOE H. (2009). clusterGeneration: random clustergeneration (with specified degree of separation). R package version1.2.7.

different number of clusters (k = 2; 3; 4; 5; 6; 7)separation index = 0.01different number of variables (p = 5; 10; 15)100 replications for each combination of k and p

Dario Bruzzese, Domenico Vistocco () Compstat 2010 14 / 19

Page 63: Cutting the dendrogram through permutation tests

Some results - Synthetic dataset (p=5)

Dario Bruzzese, Domenico Vistocco () Compstat 2010 15 / 19

Page 64: Cutting the dendrogram through permutation tests

Some results - Synthetic dataset (p=10)

Dario Bruzzese, Domenico Vistocco () Compstat 2010 16 / 19

Page 65: Cutting the dendrogram through permutation tests

Some results - Synthetic dataset (p=15)

Dario Bruzzese, Domenico Vistocco () Compstat 2010 17 / 19

Page 66: Cutting the dendrogram through permutation tests

La Carte

1 Motivation

2 The stairstep-like permutation procedureNotationThe outline

3 Some resultsReal datasetsSynthetic dataset

4 ToDo List

Dario Bruzzese, Domenico Vistocco () Compstat 2010 18 / 19

Page 67: Cutting the dendrogram through permutation tests

ToDo List

Statistical issues

Introducing a penalty term in the permutation test stepQuality measures of the obtained partitionMultiple Testing Problem (???)

Computational issues

profiling and optimizing the R codeI use of compiled codeI deploying a package

Dario Bruzzese, Domenico Vistocco () Compstat 2010 19 / 19