Review of Previous Lecture
Range: the difference between the largest and smallest values
Interquartile range: the difference between the 25th and 75th percentiles
Variance: the sum of squared deviations from the mean divided by the population size or the sample size minus one
Standard deviation: the square root of the variance
Z-scores: the number of standard deviations an observation is away from the mean
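As a quick refresher, the measures above can be computed with Python's standard library. This is a minimal sketch: the data values are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since several conventions exist and the lecture does not name one.

```python
import statistics

# Hypothetical sample, chosen only to illustrate the calculations.
data = [2, 4, 4, 5, 7, 9, 11]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Interquartile range: 75th percentile minus 25th percentile.
# statistics.quantiles with n=4 returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Variance: sum of squared deviations divided by n - 1 (sample) or n (population).
sample_variance = statistics.variance(data)
population_variance = statistics.pvariance(data)

# Standard deviation: square root of the variance.
sample_sd = statistics.stdev(data)

# Z-scores: number of standard deviations each observation lies from the mean.
mean = statistics.mean(data)
z_scores = [(x - mean) / sample_sd for x in data]

print(data_range, iqr, sample_variance, round(sample_sd, 3))
```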
Slide 2
Outline of Session
Another Measure of Dispersion: Coefficient of Variation (CV)
Histograms
Skewness
Kurtosis
Other Descriptive Summary Measures
Slide 3
Measures of Dispersion: Coefficient of Variation
The coefficient of variation (CV) measures the spread of a set of data as a proportion of its mean.
It is the ratio of the sample standard deviation to the sample mean.
It is sometimes expressed as a percentage.
There is an equivalent definition for the coefficient of variation of a population.
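In symbols, the definition can be written as follows. The notation is an assumption, since the slide's own formula did not survive: s and x-bar denote the sample standard deviation and sample mean, and sigma and mu their population counterparts.

```latex
\mathrm{CV} = \frac{s}{\bar{x}},
\qquad
\mathrm{CV}_{\text{population}} = \frac{\sigma}{\mu},
\qquad
\mathrm{CV}_{\%} = 100 \times \frac{s}{\bar{x}}
```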
Slide 4
Measures of Dispersion: Coefficient of Variation
A standard application of the coefficient of variation (CV) is to characterize the variability of geographic variables over space or time.
CV is particularly applied to characterize the interannual variability of climate variables (e.g., temperature or precipitation) or biophysical variables (e.g., leaf area index (LAI) or biomass).
Slide 5
                                 Chapel Hill (A)    Bend (B)
Mean                             1198.10            298.07
Standard Deviation               191.80             82.08
Coefficient of Variation (CV)    0.16 (16%)         0.28 (28%)
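The CV values on this slide can be checked directly from the means and standard deviations. A minimal sketch; the function name `coefficient_of_variation` is illustrative and not from the lecture.

```python
# Means and standard deviations copied from the slide.
sites = {
    "Chapel Hill (A)": {"mean": 1198.10, "sd": 191.80},
    "Bend (B)": {"mean": 298.07, "sd": 82.08},
}


def coefficient_of_variation(sd, mean):
    """Return the CV as a proportion of the mean (multiply by 100 for a percentage)."""
    return sd / mean


cv = {name: coefficient_of_variation(v["sd"], v["mean"]) for name, v in sites.items()}

for name, value in cv.items():
    print(f"{name}: CV = {value:.2f} ({value:.0%})")
```

Although Chapel Hill has the larger standard deviation, Bend has the larger CV, because its spread is larger relative to its much smaller mean.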
Slide 6
Coefficient of Variation (CV)
It is a dimensionless number that can be used to compare the amount of variability between populations with different means.
Slide 7
Source: http://www.daymet.org
Slide 8
Slide 9
[Figure, 1990 - 2000: panels for NDVI, Temperature, and Precipitation; CV of NDVI ~ CV of temperature; CV of NDVI ~ CV of precipitation. Source: Xiao & Moody, 2004]
Slide 10
Measures of Skewness and Kurtosis
A fundamental task in many statistical analyses is to characterize the location and variability of a data set (measures of central tendency vs. measures of dispersion).
Neither kind of measure tells us anything about the shape of the distribution.
A further characterization of the data includes skewness and kurtosis.
The histogram is an effective graphical technique for showing both the skewness and kurtosis of a data set.
Slide 11
Histograms
Fig. 3. Histogram of crown width (m) measured in situ for a random sample of Quercus robur trees in Frame Wood (n = 63; mean = 9.3 m; SD = 4.64 m).
Source: Koukoulas & Blackburn, 2005. Journal of Vegetation Science, Vol. 16, No. 5, pp. 587-596.
Slide 12
Frequency & Distribution
A histogram is one way to depict a frequency distribution.
Frequency is the number of times a variable takes on a particular value.
Note that any variable has a frequency distribution, e.g., roll a pair of dice several times and record the resulting values (constrained to being between 2 and 12), count the number of times each value occurs (the frequency of that value), and take these counts together to form a frequency distribution.
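The dice example can be simulated in a few lines. This is a sketch under stated assumptions: the number of rolls and the random seed are arbitrary choices, not values from the lecture.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable; the value is arbitrary

n_rolls = 100  # hypothetical number of rolls of the pair of dice
totals = [random.randint(1, 6) + random.randint(1, 6) for _ in range(n_rolls)]

# Frequency: the number of times each total (2 to 12) occurred.
frequency = Counter(totals)

# Taken together, the counts form the frequency distribution.
for value in range(2, 13):
    print(value, frequency[value])
```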
Slide 13
Frequency & Distribution
Frequencies can be absolute (when the frequency provided is the actual count of the occurrences) or relative (when they are normalized by dividing the absolute frequency by the total number of observations, giving values in the range [0, 1]).
Relative frequencies are particularly useful if you want to compare distributions drawn from two different sources (e.g., when the numbers of observations from each source are different).
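A short sketch of the normalization, using two hypothetical samples of different sizes (the category labels and counts are made up for illustration):

```python
from collections import Counter

# Two hypothetical samples of different sizes.
source_a = ["low", "low", "high", "medium", "low"]
source_b = ["low", "high", "high", "medium", "medium",
            "medium", "low", "high", "high", "low"]


def relative_frequencies(observations):
    """Divide each absolute frequency by the total number of observations."""
    counts = Counter(observations)  # absolute frequencies
    total = len(observations)
    return {value: count / total for value, count in counts.items()}


rel_a = relative_frequencies(source_a)
rel_b = relative_frequencies(source_b)

print(rel_a)
print(rel_b)
```

Both sources have an absolute frequency of 3 for "low", yet the relative frequencies (0.6 vs. 0.3) show that "low" is twice as common in the first source.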
Slide 14
Histograms
We may summarize our data by constructing histograms, which are vertical bar graphs.
A histogram is used to graphically summarize the distribution of a data set.
A histogram divides the range of values in a data set into intervals.
Over each interval is placed a bar whose height represents the frequency of data values in the interval.
Slide 15
Building a Histogram
To construct a histogram, the data are first grouped into categories.
The histogram contains one vertical bar for each category.
The height of the bar represents the number of observations in the category (i.e., frequency).
It is common to note the midpoint of the category on the horizontal axis.
Slide 16
Building a Histogram: Example
1. Develop an ungrouped frequency table
That is, we build a table that counts the number of occurrences of each variable value from lowest to highest:
TMI Value    Ungrouped Freq.
4.16         2
4.17         4
4.18         0
13.71        1
We could attempt to construct a bar chart from this table, but it would have too many bars to really be useful.
Slide 17
2. Construct a grouped frequency table
Select an appropriate number of classes.
[Table: grouped frequency table, including a Percentage column]
Slide 18
Building a Histogram: Example
3. Plot the frequencies of each class
All that remains is to create the bar graph.
[Figure label: a proxy for soil moisture]
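The three steps of the example can be sketched as follows. The values are hypothetical stand-ins rather than the lecture's TMI data, and both the number of classes and the equal-width class rule are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical values standing in for the TMI data (not the lecture's data set).
values = [4.16, 4.16, 4.17, 5.02, 5.90, 6.35, 6.40, 7.75, 8.10, 9.80, 11.20, 13.71]

# Step 1: ungrouped frequency table, from lowest to highest value.
ungrouped = sorted(Counter(values).items())

# Step 2: grouped frequency table with a chosen number of equal-width classes.
n_classes = 5  # an arbitrary choice for this sketch
low, high = min(values), max(values)
width = (high - low) / n_classes

grouped = [0] * n_classes
for v in values:
    # The maximum value is placed in the last class rather than opening a new one.
    index = min(int((v - low) / width), n_classes - 1)
    grouped[index] += 1

# Step 3: plot the class frequencies (a text bar chart labelled by class midpoint).
for i, count in enumerate(grouped):
    midpoint = low + (i + 0.5) * width
    print(f"{midpoint:6.2f} | {'#' * count}")
```

Changing `n_classes` changes the shape of the resulting chart, which is why selecting an appropriate number of classes matters.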
Slide 19
Further Moments of the Distribution
While measures of dispersion are useful for helping us describe the width of the distribution, they tell us nothing about the shape of the distribution.
Source: Earickson, RJ, and Harlin, JM. 1994. Geographic Measurement and Quantitative Analysis. USA: Macmillan College Publishing Co., p. 91.
Slide 20
Further Moments of the Distribution
There are further statistics that describe the shape of the distribution, using formulae that are similar to those of the mean and variance:
1st moment - Mean (describes central value)
2nd moment - Variance (describes dispersion)
3rd moment - Skewness (describes asymmetry)
4th moment - Kurtosis (describes peakedness)
Slide 21
Further Moments: Skewness
Skewness measures the degree of asymmetry exhibited by the data.
If skewness equals zero, the histogram is symmetric about the mean.
Positive skewness vs. negative skewness
Further Moments: Skewness
Positive skewness: there are more observations below the mean than above it; the mean is greater than the median.
Negative skewness: there are a small number of low observations and a large number of high ones; the median is greater than the mean.
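A minimal sketch of one common moment-based skewness formula (the third central moment divided by the cube of the population standard deviation). The lecture's exact formula is not shown in this text, so treat this form as an assumption; the data sets are hypothetical.

```python
import statistics


def skewness(data):
    """Moment-based skewness: third central moment over SD cubed (population form)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data sets chosen to show each case.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 12]  # one large value pulls the mean up
left_skewed = [-x for x in right_skewed]   # mirror image of the above
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, round(skewness(data), 3),
          "mean:", statistics.mean(data), "median:", statistics.median(data))
```

In the right-skewed sample the mean (3.75) exceeds the median (3), matching the positive-skewness case; the mirrored sample reverses both the sign and the ordering.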
Slide 24
Further Moments: Kurtosis
Kurtosis measures how peaked the histogram is.
Kurtosis characterizes the relative peakedness or flatness of a distribution compared to the normal distribution.
With this convention (excess kurtosis, i.e., the fourth standardized moment minus 3), the kurtosis of a normal distribution is 0.
Slide 25
Further Moments: Kurtosis
Platykurtic: when the kurtosis < 0, the frequencies throughout the curve are closer to being equal (i.e., the curve is flatter and wider). Thus, negative kurtosis indicates a relatively flat distribution.
Leptokurtic: when the kurtosis > 0, there are high frequencies in only a small part of the curve (i.e., the curve is more peaked). Thus, positive kurtosis indicates a relatively peaked distribution.
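A minimal sketch of excess kurtosis (fourth central moment over the squared variance, minus 3, so that a normal distribution scores 0). The population form used here is an assumption, since the lecture's formula is not shown in this text, and the two data sets are hypothetical.

```python
def excess_kurtosis(data):
    """Fourth central moment over variance squared, minus 3 (population form)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


# Hypothetical data: equal frequencies everywhere vs. most values piled at the centre.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]

print("flat:", round(excess_kurtosis(flat), 3))      # negative: platykurtic
print("peaked:", round(excess_kurtosis(peaked), 3))  # positive: leptokurtic
```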
Slide 26
Further Moments: Kurtosis
Kurtosis is based on the size of a distribution's tails.
Negative kurtosis (platykurtic): distributions with short tails
Positive kurtosis (leptokurtic): distributions with relatively long tails
[Figure: leptokurtic and platykurtic curves. Source: http://www.riskglossary.com/link/kurtosis.htm]
Slide 27
Why Do We Need Kurtosis?
These two distributions have the same variance and approximately the same skew, but differ markedly in kurtosis.
Source: http://davidmlane.com/hyperstat/A53638.html
Slide 28
How to Graphically Summarize Data?
Histograms
Box plots
Slide 29
Functions of a Histogram
The function of a histogram is to graphically summarize the distribution of a data set.
The histogram graphically shows the following:
1. Center (i.e., the location) of the data
2. Spread (i.e., the scale) of the data
3. Skewness of the data
4. Kurtosis of the data
5. Presence of outliers
6. Presence of multiple modes in the data
Slide 30
Functions of a Histogram
The histogram can be used to answer the following questions:
1. What kind of population distribution do the data come from?
2. Where are the data located?
3. How spread out are the data?
4. Are the data symmetric or skewed?
5. Are there outliers in the data?
Box Plots
We can also use a box plot to graphically summarize a data set.
A box plot represents a graphical summary of what is sometimes called a five-number summary of the distribution:
Minimum
Maximum
25th percentile
75th percentile
Median
[Figure: box plot labelled min., max., 25th %-ile, 75th %-ile, median, and Interquartile Range (IQR). Rogerson, p. 8.]
Slide 33
Box Plots: Example
Consider the first 9 Commodore prices (in $,000): 6.0, 6.7, 3.8, 7.0, 5.8, 9.975, 10.5, 5.99, 20.0
Arrange these in order of magnitude: 3.8, 5.8, 5.99, 6.0, 6.7, 7.0, 9.975, 10.5, 20.0
The median is Q2 = 6.7 (there are 4 values on either side)
Q1 = 5.9 (median of the 4 smallest values)
Q3 = 10.2 (median of the 4 largest values)
IQR = Q3 - Q1 = 10.2 - 5.9 = 4.3
Slide 34
Example (ranked): 3.8, 5.8, 5.99, 6.0, 6.7, 7.0, 9.975, 10.5, 20.0
The median is Q2 = 6.7
Q1 = 5.9
Q3 = 10.2
IQR = Q3 - Q1 = 10.2 - 5.9 = 4.3
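The quartiles in this example can be reproduced with the median-of-halves rule the slide describes (Q1 and Q3 are the medians of the values below and above the overall median). This is a sketch of that rule only; other quartile conventions can give slightly different values.

```python
import statistics

# Commodore prices (in $,000) from the example.
prices = [6.0, 6.7, 3.8, 7.0, 5.8, 9.975, 10.5, 5.99, 20.0]

ranked = sorted(prices)
n = len(ranked)
half = n // 2

q2 = statistics.median(ranked)

# With an odd number of values, the median itself is left out of both halves.
lower = ranked[:half]
upper = ranked[half + 1:] if n % 2 else ranked[half:]

q1 = statistics.median(lower)
q3 = statistics.median(upper)
iqr = q3 - q1

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_number_summary = (ranked[0], q1, q2, q3, ranked[-1])
print(five_number_summary, "IQR:", round(iqr, 4))
```

Unrounded, Q1 = 5.895 and Q3 = 10.2375, which round to the 5.9 and 10.2 quoted in the example.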
Other Descriptive Summary Measures
Descriptive statistics provide an organization and summary of a dataset.
A small number of summary measures replaces the entirety of a dataset.
We'll briefly talk about other simple descriptive summary measures.
Slide 38
Other Descriptive Summary Measures
You're likely already familiar with some simple descriptive summary measures:
Ratios
Proportions
Percentages
Rates of Change
Location Quotients
Slide 39
Other Descriptive Summary Measures
Ratios: (# of observations in A) / (# of observations in B), e.g., A - 6 overcast days, B - 24 mostly cloudy days gives a ratio of 6/24 = 0.25
Proportions: relate one part or category of data to the entire set of observations, e.g., a box of marbles that contains 4 yellow, 6 red, 5 blue, and 2 green gives a yellow proportion of 4/17, or
a_yellow / (a_yellow + a_red + a_blue + a_green) = 4 / (4 + 6 + 5 + 2) = 4/17 ≈ 0.235
where color = {yellow, red, blue, green} and a = count = {4, 6, 5, 2}
Slide 40
Other Descriptive Summary Measures
Proportions - The sum of all proportions = 1. These are useful for comparing two sets of data with different sizes and category counts, e.g., a different box of marbles gives a yellow proportion of 2/23, and in order for this to be a reasonable comparison we need to know the totals for both samples.
Percentages - Calculated as proportion x 100, e.g., 2/23 x 100% = 8.696%. Use of these should be restricted to larger sample sizes, perhaps 20+ observations.
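The ratio, proportion, and percentage examples can be reproduced with plain arithmetic. A minimal sketch using the counts given in the examples; the variable names are illustrative.

```python
# Ratio of overcast days (A) to mostly cloudy days (B), from the ratio example.
overcast_to_cloudy = 6 / 24

# Marble counts from the first box.
counts = {"yellow": 4, "red": 6, "blue": 5, "green": 2}

# Proportion: each category's count divided by the total number of observations.
total = sum(counts.values())
proportions = {color: n / total for color, n in counts.items()}

# Percentage: proportion x 100.
percentages = {color: 100 * p for color, p in proportions.items()}

# Yellow percentage for the second box (2 yellow marbles out of 23).
other_yellow_percentage = 2 / 23 * 100

print(overcast_to_cloudy)
print({color: round(p, 3) for color, p in proportions.items()})
print(round(other_yellow_percentage, 3))
```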
Slide 41
Other Descriptive Summary Measures
Location Quotients - An index of relative concentration in space, a comparison of a region's share of something to the total
Example: Suppose we have a region of 1000 km² which we subdivide into three smaller areas of 200, 300, and 500 km² (labeled A, B, & C). The region has an influenza outbreak with 150 cases in A, 100 in B, and 350 in C (a total of 600 flu cases):
     Proportion of Area    Proportion of Cases    Location Quotient
A    200/1000 = 0.2        150/600 = 0.25         0.25/0.2 = 1.25
B    300/1000 = 0.3        100/600 = 0.167        0.167/0.3 = 0.56
C    500/1000 = 0.5        350/600 = 0.583        0.583/0.5 = 1.17
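The location quotients for the influenza example can be computed as each area's share of cases divided by its share of the region's area. A minimal sketch using the figures from the example; note that working from unrounded proportions gives 0.56 for B, whereas dividing the rounded 0.17 by 0.3 gives 0.57.

```python
# Areas (km^2) and influenza cases for sub-areas A, B, and C, from the example.
areas = {"A": 200, "B": 300, "C": 500}
cases = {"A": 150, "B": 100, "C": 350}

total_area = sum(areas.values())
total_cases = sum(cases.values())

location_quotients = {}
for name in areas:
    share_of_area = areas[name] / total_area
    share_of_cases = cases[name] / total_cases
    # LQ > 1: the area holds more than its share of cases; LQ < 1: less.
    location_quotients[name] = share_of_cases / share_of_area

for name, lq in location_quotients.items():
    print(f"{name}: LQ = {lq:.2f}")
```

Areas A and C have a relative concentration of cases (LQ > 1), while B has fewer cases than its share of the region's area would suggest.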
Slide 42
Assignment II
Due by Thursday (02/09/2006)
Downloadable from the course website: http://www.unc.edu/courses/2006spring/geog/090/001/www/