Descriptive Statistics
Roger L. Brown, Ph.D.
Medical Research Consulting
Middleton, WI
Online Course #1
This online course is a FREE service to all MRC clients
Purpose of this series
To assist researchers in the interpretation and application of statistical analyses
Statistics?
The science of collecting, organizing, analyzing, interpreting and presenting data
Topics we will review
• Descriptive Statistics
• Frequency Distributions and Histograms
Relative / Cumulative Frequency
• Measures of Central Tendency
Mean, Median, Mode, Midrange
Topics (continued)
• Measures of Dispersion (Variation)
Range, Standard Deviation, Variance and Coefficient of Variation
• Shape
Symmetric, Skewed, using Box-and-Whisker Plots
• Quartiles
• Statistical Relationships
Correlation, Covariance
Descriptive Statistics
A collection of quantitative measures and ways of describing data. This includes frequency distributions and histograms, measures of central tendency, and measures of dispersion.
• Collect data, e.g., surveys
• Present data, e.g., tables and graphs
• Characterize data, e.g., the mean:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
A characteristic of a population is a parameter; a characteristic of a sample is a statistic.
Collection of Data
• Surveys, questionnaires, interviews
• Direct observation
• Secondary data sources (e.g., medical charts)
Presenting Data: Graphics
The visual representation of data may be used not only to present results or findings in the data, but also to learn about the data.
Summary Measures in Descriptive Statistics
• Central Tendency: Mean, Median, Mode, Midrange
• Quartile
• Variation: Variance, Standard Deviation, Coefficient of Variation, Range
Measures of Central Tendency
Mean, Median, Mode, Midrange
The Mean (Arithmetic Average)
• The arithmetic average of the data values:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
where $\bar{X}$ is the sample mean.
• The most common measure of central tendency
• Affected by extreme values (outliers)
Figure: two dot plots, one on a 0 to 10 axis (Mean = 5) and one whose axis extends to 14 (Mean = 6).
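As a quick check of the formula, here is a minimal Python sketch. `sample_mean` is a hypothetical helper name, the data are the eight values from the standard deviation example in this course, and the value 64 is an invented outlier used only to show how an extreme value pulls the mean.

```python
def sample_mean(values):
    """Arithmetic mean: sum of the observations divided by n."""
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)

data = [10, 12, 14, 15, 17, 18, 18, 24]
print(sample_mean(data))  # 16.0

# Hypothetical outlier: replace the largest value (24) with 64.
with_outlier = data[:-1] + [64]
print(sample_mean(with_outlier))  # 21.0, pulled toward the extreme value
```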
The Median
• Important measure of central tendency
• In an ordered array, the median is the "middle" number.
• If n is odd, the median is the middle number.
• If n is even, the median is the average of the 2 middle numbers.
• Not affected by extreme values
Figure: the same two dot plots as for the mean (0 to 10 axis, and axis extending to 14); Median = 5 in both.
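The odd/even rule can be sketched in a few lines of Python. `median` is a hypothetical helper written out to mirror the rule (Python's `statistics.median` returns the same values for these inputs); the data sets come from the quartile and standard deviation examples in this course, and 240 is an invented extreme value.

```python
def median(values):
    """Middle value of the ordered data; average of the 2 middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:                               # n odd: the single middle number
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average the 2 middle numbers

print(median([11, 12, 13, 16, 16, 17, 18, 21, 22]))  # 16   (n = 9)
print(median([10, 12, 14, 15, 17, 18, 18, 24]))      # 16.0 (n = 8: (15 + 17) / 2)
print(median([10, 12, 14, 15, 17, 18, 18, 240]))     # 16.0 (unchanged by the extreme value)
```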
The Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• There may not be a mode
• There may be several modes
• Used for either numerical or categorical data
Figure: a dot plot on a 0 to 14 axis with Mode = 9, and a dot plot on a 0 to 6 axis with no mode.
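A minimal Python sketch covering the three cases above (one mode, no mode, several modes). `modes` is a hypothetical helper, and it assumes that "no mode" means no value occurs more than once; the categorical data are invented. Python's `statistics.multimode` is similar but returns every value when none repeats, which is why the no-mode case is handled explicitly here.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often; [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:  # assumption: no repeated value means "no mode"
        return []
    return [v for v, c in counts.items() if c == top]

print(modes([10, 12, 14, 15, 17, 18, 18, 24]))  # [18]
print(modes([0, 1, 2, 3, 4, 5, 6]))             # [] (no mode)
print(modes(["A", "B", "B", "C", "C"]))         # ['B', 'C'] (two modes, categorical)
```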
Midrange
• A measure of central tendency
• Average of the smallest and largest observations:
$$\text{Midrange} = \frac{X_{\text{largest}} + X_{\text{smallest}}}{2}$$
• Affected by extreme values
Figure: two dot plots on a 0 to 10 axis, both with Midrange = 5.
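The midrange formula as a short Python sketch. `midrange` is a hypothetical helper; the first data set is the standard deviation example from this course and the 64 is an invented extreme value, showing how a single observation at either end moves the midrange.

```python
def midrange(values):
    """Average of the smallest and largest observations."""
    if not values:
        raise ValueError("midrange of empty data is undefined")
    return (max(values) + min(values)) / 2

print(midrange([10, 12, 14, 15, 17, 18, 18, 24]))  # 17.0
print(midrange([10, 12, 14, 15, 17, 18, 18, 64]))  # 37.0 (one extreme value moves it a lot)
```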
Quartiles
• Not a measure of central tendency
• Split ordered data into 4 quarters: Q1, Q2 and Q3 divide the data so that 25% of the observations fall in each quarter.
• Position of the i-th quartile:
$$\text{Position of } Q_i = \frac{i(n+1)}{4}$$
Data in ordered array: 11 12 13 16 16 17 18 21 22 (n = 9)
Position of Q1 = 1(9 + 1)/4 = 2.50, so Q1 = 12.5 (halfway between the 2nd and 3rd values, 12 and 13)
Position of Q3 = 3(9 + 1)/4 = 7.50, so Q3 = 19.5 (halfway between the 7th and 8th values, 18 and 21)
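The position rule i(n + 1)/4 can be sketched in Python as follows. `quartile` is a hypothetical helper, and it assumes that a fractional position is resolved by linear interpolation between the two neighboring ordered values, which reproduces Q1 = 12.5 and Q3 = 19.5 for the data above. Other textbooks and software use different quartile conventions and can give slightly different values.

```python
import math

def quartile(values, i):
    """i-th quartile (i = 1, 2, 3) at 1-based position i(n + 1)/4 in the ordered data.

    Assumption: fractional positions are resolved by linear interpolation
    between the two neighboring ordered values.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles of empty data are undefined")
    pos = i * (n + 1) / 4
    k = math.floor(pos)
    frac = pos - k
    if k < 1:    # position falls before the first value
        return ordered[0]
    if k >= n:   # position falls at or beyond the last value
        return ordered[-1]
    return ordered[k - 1] + frac * (ordered[k] - ordered[k - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(quartile(data, 1), quartile(data, 2), quartile(data, 3))  # 12.5 16.0 19.5
```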
Measures of Dispersion (Variation)
• Variance: population variance, sample variance
• Standard Deviation: population standard deviation, sample standard deviation
• Coefficient of Variation
• Range
Understanding Variation
• The more spread out or dispersed the data, the larger the measures of variation.
• The more concentrated or homogeneous the data, the smaller the measures of variation.
• If all observations are equal, the measures of variation are zero.
• All measures of variation are nonnegative.
The Range
• Measure of variation
• Difference between the largest and smallest observations:
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
• Ignores how the data are distributed:
Figure: two differently distributed data sets on a 7 to 12 axis; both have Range = 12 - 7 = 5.
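A short Python sketch of the range and of the point that it ignores how data are distributed. `value_range` is a hypothetical helper (named to avoid shadowing Python's built-in `range`), and both data sets are invented to match the 7 to 12 scale of the figure.

```python
def value_range(values):
    """Range = largest observation - smallest observation."""
    if not values:
        raise ValueError("range of empty data is undefined")
    return max(values) - min(values)

# Hypothetical data: one set clustered at 10, one spread evenly from 7 to 12.
clustered = [7, 10, 10, 10, 10, 12]
spread = [7, 8, 9, 10, 11, 12]
print(value_range(clustered), value_range(spread))  # 5 5
```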
Variance
• Important measure of variation
• Shows variation about the mean
• For the population:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
• For the sample:
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
For the population: use N in the denominator.
For the sample: use n - 1 in the denominator.
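Both denominators in one minimal Python sketch. `variance` and its `sample` flag are hypothetical names; the data are the eight values from the standard deviation example in this course, whose squared deviations from the mean of 16 sum to 130.

```python
def variance(values, sample=True):
    """Sum of squared deviations about the mean, divided by
    n - 1 (sample variance s^2) or N (population variance sigma^2)."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough observations")
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    return ss / (n - 1) if sample else ss / n

data = [10, 12, 14, 15, 17, 18, 18, 24]
print(round(variance(data), 4))      # 18.5714 (130 / 7)
print(variance(data, sample=False))  # 16.25   (130 / 8)
```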
Standard Deviation
• Most important measure of variation
• Shows variation about the mean
• For the population:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
• For the sample:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
For the population: use N in the denominator.
For the sample: use n - 1 in the denominator.
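Sample Standard Deviation
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
For the sample: use n - 1 in the denominator.
Data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = 16
$$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$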
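Comparing Standard Deviations
Data ($X_i$): 10 12 14 15 17 18 18 24; N = 8, Mean = 16
Sample:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$
Population:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} \approx 4.0311$$
The value of the standard deviation is larger when the data are considered a sample.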
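Python's standard `statistics` module computes both versions directly, which makes a handy cross-check of the hand calculation: `stdev` uses n - 1 and `pstdev` uses N. For the eight values above it gives s ≈ 4.3095 and σ ≈ 4.0311.

```python
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
s = statistics.stdev(data)       # sample: n - 1 = 7 in the denominator
sigma = statistics.pstdev(data)  # population: N = 8 in the denominator
print(round(s, 4), round(sigma, 4))  # 4.3095 4.0311
```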
Comparing Standard Deviations
Figure: dot plots of three data sets on an 11 to 21 axis, all with Mean = 15.5: Data A (s = 3.338), Data B (s = 0.9258), Data C (s = 4.57).
The three data sets share the same mean but differ markedly in spread.
Coefficient of Variation
• Measure of relative variation
• Always expressed as a percentage
• Shows variation relative to the mean
• Used to compare 2 or more groups
• Formula (for a sample):
$$CV = \left(\frac{s}{\bar{X}}\right) \cdot 100\%$$
Comparing Coefficient of Variation
Group A: average health measure = 50, standard deviation = 5
Group B: average health measure = 100, standard deviation = 5
$$CV_A = \frac{5}{50} \cdot 100\% = 10\% \qquad CV_B = \frac{5}{100} \cdot 100\% = 5\%$$
Both groups have the same standard deviation, but relative to its mean, Group A varies twice as much as Group B.
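The comparison above as a minimal Python sketch; `coefficient_of_variation` is a hypothetical helper that takes an already computed standard deviation and mean, and the inputs are the Group A and Group B figures from the slide.

```python
def coefficient_of_variation(sd, mean):
    """CV = (s / mean) * 100%: the standard deviation relative to the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return 100.0 * sd / mean

print(coefficient_of_variation(5, 50))   # 10.0 (Group A)
print(coefficient_of_variation(5, 100))  # 5.0  (Group B)
```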
Shape
• Describes how data are distributed
• Measures of shape: symmetric or skewed
Left-skewed (negatively skewed): Mean < Median < Mode; skewness < -1
Symmetric: Mean = Median = Mode; skewness near 0 (-0.5 < skewness < 0.5)
Right-skewed (positively skewed): Mode < Median < Mean; skewness > 1
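The course quotes skewness cutoffs but no skewness formula, so this Python sketch assumes the adjusted Fisher-Pearson sample skewness (the version many spreadsheet programs report). `skewness` and `describe_shape` are hypothetical helpers, values between the quoted cutoffs are left unclassified, and the three data sets are invented for illustration.

```python
import math

def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (assumed formula)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all observations are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

def describe_shape(g):
    """Apply the cutoffs quoted on the slide."""
    if g < -1:
        return "left-skewed"
    if g > 1:
        return "right-skewed"
    if -0.5 < g < 0.5:
        return "symmetric"
    return "between the quoted cutoffs"

# Hypothetical data sets: symmetric, long right tail, long left tail.
for data in ([1, 2, 3, 4, 5], [1, 1, 1, 1, 10], [10, 10, 10, 10, 1]):
    g = skewness(data)
    print(data, round(g, 3), describe_shape(g))
```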
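Box-and-Whisker Plot
Graphical display of data using the 5-number summary: $X_{\text{smallest}}$, Q1, Median, Q3, $X_{\text{largest}}$
Figure: a box-and-whisker plot on a 4 to 12 axis; the box runs from Q1 to Q3 and is split at the median, and the whiskers extend to $X_{\text{smallest}}$ and $X_{\text{largest}}$.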
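The five numbers a box-and-whisker plot draws can be computed with a short Python sketch. `quartile` and `five_number_summary` are hypothetical helpers; the quartiles follow the i(n + 1)/4 position rule from the Quartiles slide, with the assumption that fractional positions are linearly interpolated. The data are the ordered array from that slide.

```python
import math

def quartile(ordered, i):
    """i-th quartile at position i(n + 1)/4 (linear interpolation assumed)."""
    n = len(ordered)
    pos = i * (n + 1) / 4
    k = math.floor(pos)
    if k < 1:
        return ordered[0]
    if k >= n:
        return ordered[-1]
    return ordered[k - 1] + (pos - k) * (ordered[k] - ordered[k - 1])

def five_number_summary(values):
    """(X_smallest, Q1, Median, Q3, X_largest): the values a box plot draws."""
    if not values:
        raise ValueError("empty data")
    ordered = sorted(values)
    return (ordered[0], quartile(ordered, 1), quartile(ordered, 2),
            quartile(ordered, 3), ordered[-1])

print(five_number_summary([11, 12, 13, 16, 16, 17, 18, 21, 22]))
# (11, 12.5, 16.0, 19.5, 22)
```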
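Distribution Shape and Box-and-Whisker Plots
Figure: box-and-whisker plots (Q1, Median, Q3) for left-skewed, symmetric and right-skewed distributions.
In the symmetric case the median sits in the middle of the box; in a skewed distribution the box and whisker are stretched toward the long tail (to the left for left-skewed, to the right for right-skewed).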
Summary
• Discussed measures of central tendency: mean, median, mode, midrange
• Quartiles
• Addressed measures of variation: the range, interquartile range, variance, standard deviation, coefficient of variation
• Determined the shape of distributions: symmetric, skewed, box-and-whisker plot