29
Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon http://www.pcord.com Tables, Figures, and Equations

Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

  • View
    244

  • Download
    3

Embed Size (px)

Citation preview

Page 1: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Chapter 6Distance Measures

From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon http://www.pcord.com

Tables, Figures, and Equations

Page 2: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Table 6.1. Example data set. Abundance of two species in two sample units.

Species

Sample unit 1 2

A 1 4 B 5 2

0

1

2

3

4

5

0 1 2 3 4 5Sample Unit A

Sam

ple

Uni

t B

Sample spaceSp 1

Sp 2

0

1

2

3

4

5

0 1 2 3 4 5Species 1

Spec

ies

2

Species space

SU A

SU B

Figure 6.1. Graphical representation of the data set in Table 6.1. The left-hand graph shows species as points in sample space. The right-hand graph shows sample units as points in species space.

Page 3: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Table 6.2. Reasonable and acceptable domains of input data, x, and ranges of distance measures, d = f(x).

Name (synonyms)

Domain of x

Range of d = f(x)

Comments

Sørensen (Bray & Curtis; Czekanowski)

x 0 0 d 1

(or 0 x 100%)

proportion coefficient in city-block space; semimetric

Relative Sørensen (Kulczynski; Quantitative Symmetric)

x 0 0 d 1

(or 0 x 100%)

proportion coefficient in city-block space; same as Sørensen but data points relativized by sample unit totals; semimetric

Jaccard x 0 0 d 1

(or 0 d 100%)

proportion coefficient in city-block space; metric

Euclidean (Pythagorean) all non-negative metric

Relative Euclidean (Chord distance; standardized Euclidean)

all 0 d 2 for quarter hypersphere; 0 d 2 for full hypersphere

Euclidean distance between points on unit hypersphere; metric

Correlation distance all 0 d 1 converted from correlation to distance; proportional to arc distance between points on unit hypersphere; cosine of angle from centroid to points; metric

Chi-square x 0 d 0 Euclidean but doubly weighted by variable and sample unit totals; metric

Squared Euclidean all d 0 metric

Mahalanobis all d 0 distance between groups weighted by within-group dispersion; metric

Page 4: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

i,h

j=

p

i, j h, jED = a - a1

2 ( )

i,hj=

p

i, j h, jCB = |a - a |1

City-block distance (= Manhattan distance)

Euclidean distance

Figure 6.2. Geometric representations of basic distance measures between two sample units (A and B) in species space. In the upper two graphs the axes meet at the origin; in the lowest graph, at the centroid.

Page 5: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

i,h

j=

p

i, j h, jED = a - a1

2 ( )

i,hj=

p

i, j h, jCB = |a - a |1

City-block distance (= Manhattan distance)

Euclidean distance

Figure 6.2. Geometric representations of basic distance measures between two sample units (A and B) in species space. In the upper two graphs the axes meet at the origin; in the lowest graph, at the centroid.

Page 6: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Distance = x + y k kk

Minkowski metric in two dimensions:

k = 2 gives Euclidean distance

k = 1 gives city-block distance

Page 7: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Distance = x + y k kk

i,hj=

p

i, j h, jk

kDistance = a - a1

( )

Minkowski metric in two dimensions:

Minkowski metric in p dimensions:

k = 2 gives Euclidean distance

k = 1 gives city-block distance

Page 8: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

The correlation coefficient can be rescaled to a distance measure of range 0-1 by:

distancer = - r /( )1 2

Page 9: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Sorensen similarity = A B

A B + A B

2( )

( ) ( )

Jaccard similarity = A B

A B

Proportion coefficients

Page 10: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Figure 6.3. Overlap between two species abundances along an environmental gradient. The abundance shared between species A and B is shown by w.

Page 11: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

i,hj=

p

ij hj

j=

p

ij

j=

p

hj

D =

|a - a |

a a +1

1 1

ihj=1

p

ij hj

j=

p

ij

j=

p

h,

D = -

a , a

a a +1

2

1 1

MIN( )

Sorensen distance = BC = PD = Dih ih ih100

Page 12: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

ihj=

p

ij hj

j=

p

ij

j=

p

hj

j=

p

ij hj

JD =

|a - a |

a a |a - a | + +

21

1 1 1

Jaccard dissimilarity is the proportion of the combined abundance that is not shared, or w / (A + B - w) (Jaccard 1901):

Page 13: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Quantitative symmetric dissimilarity (also known as the Kulczynski or QSK coefficient; see Faith et al. 1987):

ih

j=

p

ij hj

j=

p

ij

j=

p

ij hj

j=

p

hj

QSK

a ,a

a

a , a

a

1

1

2

1

1

1

1

MIN MIN(( ) )

Page 14: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Relative Sørensen (also known as relativized Manhattan coefficient in Faith et al. 1987) is mathematically equivalent to the Bray-Curtis coefficient on data relativized by SU total:

ihij

ijj

p

hj

hjj

pj

p

D = - MINa

a

a

a1

1 1

1

,

ihj

pij

ijj

p

hj

hjj

pD = a

a

a

a

1

2 1

1 1

or:

Page 15: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

ihRED =

a

a

- a

aj=1

pij

ijj

p

hj

hjj

p

2

2

1

2

1

Relative Euclidean distance (RED)

Page 16: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Figure 6.4. Relative Euclidean distance is the chord distance between two points on the surface of a unit hypersphere.

Page 17: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Figure 6.4. Relative Euclidean distance is the chord distance between two points on the surface of a unit hypersphere.

θ

r = cos θ

= arccos (r)

Page 18: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Some notation...

Page 19: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

then the chi-square distance (Chardy et al. 1976) is:

ihj=

p

+ j

hj

h+

ij

i+

= a

a

a-

a

a2

1

21

If the data are prerelativized by sample unit totals (i.e., bij = aij /ai+), then the equation simplifies to:

ih

j=

phj ij

+ j

= b - b

a2

1

2

Page 20: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Figure 6.5. Illustration of the influence of within-group variance on

Mahalanobis distance.

Page 21: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Mahalanobis distance Dfh2 is used as a distance measure

between two groups (f and h).

fhi=1

p

j=1

p

ij if ih jf jhD = n - g w a - a a - a2 ( ) ( )( )

where

aif is the mean for ith variable in group f

wij is an element from the inverse of the pooled within groups covariance matrix (downweights correlated variables)

n is the number of sample units,

g is the number of groups, and

i j.

Page 22: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

0

5

10

15

20

25

0 2 4 6

Environmental Distance

Euc

lide

an D

ista

nce

0

0.2

0.4

0.6

0.8

1

0 2 4 6

Environmental Distance

Sore

nsen

Dis

tanc

e

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0 2 4 6

Environmental Distance

Chi

-squ

are

Dis

tanc

e

00.10.20.30.40.50.60.70.80.9

0 2 4 6

Environmental Distance

Cor

rela

tion

Dis

tanc

e0

100

200

300

400

500

600

700

0 2 4 6

Environmental Distance

Squa

red

Euc

lide

an

0

0.2

0.4

0.6

0.8

1

0 2 4 6

Environmental Distance

Jacc

ard

Dis

tanc

e

Figure 6.6. Relationship between distance in species space for an easy data set, using various distance measures, and environmental distance. The graphs are based on a synthetic data set with noiseless species responses to two known underlying environmental gradients. The gradients were sampled with a 5 5 grid. This is an “easy” data set because the average distance is reasonably small (Sørensen distance = 0.59; 1.3 half changes) all species are similar in abundance (CV of species totals = 37%) and sample units have similar totals (CV of SU totals = 17%).

Page 23: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

0

10

20

30

40

50

60

70

80

90

100

0 1 2 3 4 5 6 7 8 9 10 11 12 13

Environmental Distance

Euc

lide

an D

ista

nce

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 1 2 3 4 5 6 7 8 9 10 11 12 13

Environmental Distance

Sore

nsen

Dis

tanc

e

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0 1 2 3 4 5 6 7 8 9 10 11 12 13Environmental Distance

Chi

-squ

are

Dis

tanc

e

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0 1 2 3 4 5 6 7 8 9 10 11 12 13Environmental Distance

Cor

rela

tion

Dis

tanc

e

Figure 6.7. Relationship between distance in species space for a more difficult data set, using various distance measures, and environmental distance. The graphs are based on a synthetic data set with noiseless species responses to two known underlying environmental gradients. The gradients were sampled with a 10 10 grid. This is a “more difficult” data set because the average distance is rather large (Sørensen distance = 0.79; 2.3 half changes), species vary in abundance (CV of species totals = 183%), and sample units have moderately variable totals (CV of SU totals = 40%).

Page 24: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

0

0.5

1

1.5

2

2.5

3

0 1 2 3 4 5 6 7 8 9 10 11 12 13

Environmental Distance

Dis

tanc

e in

NM

S O

rdin

atio

n

Figure 6.8. Distance in a 2-D nonmetric multidimensional scaling ordination (NMS) in relation to environmental distances, using the same data set as in Figure 6.7. Note how the ordination overcame the limita- tion of the Sørensen coefficient at expressing large distances.

Page 25: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Data matrix containing abundances

of two species in four plots.

Sp1 Sp2

Plot 1 1 0

Plot 2 1 1

Plot 3 10 0

Plot 4 10 10

Box 6.1. Comparison of Euclidean distance with a proportion coefficient (Sørensen distance). Relative proportions of species 1 and 2 are the same between Plots 1 and 2 and Plots 3 and 4.

Example calculations of distance measures for Plots 3 and 4. ED = Euclidean distance, PD = Sørensen distance as percentage

ED3 42 210 10 10 0 10, ( ) ( )

PD3 4

100 10 10 10 0

10 20333, .

%

Page 26: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Sørensen Distance matrix, expressed as percentages.

Plot 1 Plot 2 Plot 3 Plot 4

Plot 1 0 Plot 2 33.3 0 Plot 3 81.8 83.3 0 Plot 4 90.5 83.3 33.3 0

Euclidean distance matrix

Plot 1 Plot 2 Plot 3 Plot 4

Plot 1 0 Plot 2 1.0 0 Plot 3 9.0 9.1 0 Plot 4 13.4 12.7 10.0 0

Page 27: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

The Sørensen distance between Plots 1 and 2 is 0.333 (33.3%), as is the Sørensen distance between Plots 3 and 4, as illustrated below. In both cases the shared abundance is one third of the total abundance. In contrast, the Euclidean distance between Plots 1 and 2 is 1, while the Euclidean distance between Plots 3 and 4 is 10. Thus the Sørensen coefficient expresses the shared abundance as a proportion of the total abundance, while Euclidean distance is unconcerned with proportions.

Page 28: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Box 6.2. Example data set comparing Euclidean and city-block distances, contrasting the effect of squaring differences versus not.

Hypothetical data: abundance of four species in three sample units (SU).

Sp

SU 1 2 3 4

A 4 2 0 1

B 5 1 1 10

C 7 5 3 4

Sample units A,B: species differences d = 1, 1, 1, 9 for each of the four species. Sample units A,C: species differences d = 3, 3, 3, 3

Page 29: Chapter 6 Distance Measures From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon :

Sample units A,B: species differences d = 1, 1, 1, 9 for each of the four species. Sample units A,C: species differences d = 3, 3, 3, 3

Distance Measure

Pair of SUs Euclidean City-block

AB 9.165 12

AC 6 12

Which distance measure matches your intuition?