ahmad-shahir
8/7/2019 IMS 504-Week 4&5 new
IMS 502
Information Analysis for Decision Making
Descriptive Statistics
Content
Measure of Location
Measure of Spread
Measure of Shape
Learning Outcomes
After completing this chapter, you should be able to apply appropriate measures of location, measures of spread and measures of shape.
Concept of Descriptive Statistics
Describing Data Numerically
Measure of Location (Central Tendency)
Measure of Spread (Variability)
Measure of Shape
Concept of Descriptive Statistics
Measure of Location (Central Tendency)
Descriptive information about the single numerical value that is considered to be the most typical of the values of a quantitative variable.
A measure of central tendency is a measure which indicates where the middle of the data is. Three common measures of central tendency are:
Mean
Median
Mode
Measures of Central Tendency
Statistics to represent the centre of a distribution:
Mode (most frequent)
Median (50th percentile)
Mean (average)
Choice of measure depends on:
Type of data
Shape of distribution (esp. skewness)
Measures of Central Tendency
Measure of Central Tendency by Level of Measurement

Level of Measurement   Mode   Median   Mean
Nominal                X
Ordinal                X      X        X
Interval               X      X        X
Ratio                  X?     X        X
Measures of Central Tendency
Choosing a measure of central tendency
Use the Mode when:
1. The variable is measured at the nominal level.
2. You want a quick and easy measure for ordinal and interval-ratio variables.
3. You want to report the most common score.
Use the Median when:
1. The variable is measured at the ordinal level.
2. A variable measured at the interval-ratio level has a highly skewed distribution.
3. You want to report the central score. The median always lies at the exact center of a distribution.
Use the Mean when:
1. The variable is measured at the interval-ratio level (except when the variable is highly skewed).
2. You want to report the typical score. The mean is the fulcrum that exactly balances all of the scores.
3. You anticipate additional statistical analysis.
Measures of Central Tendency
Mean
The mean is the average of the data.
The population mean, $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$, is usually unknown, so we use the sample mean to estimate or approximate it.
The sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Measures of Central Tendency
Example:
Here is a random sample of size 10 of ages, where
$x_1 = 42$, $x_2 = 28$, $x_3 = 28$, $x_4 = 61$, $x_5 = 31$, $x_6 = 23$, $x_7 = 50$, $x_8 = 34$, $x_9 = 32$, $x_{10} = 37$.
$\bar{x} = (42 + 28 + \dots + 37) / 10 = 36.6$
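The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_mean` is a hypothetical helper introduced for illustration, not something from the course material.

```python
# Ages from the example: a random sample of size n = 10.
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]


def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


mean_age = sample_mean(ages)
print(f"n = {len(ages)}, sum = {sum(ages)}, mean = {mean_age:.1f}")
```

The sum of the ten ages is 366, so dividing by n = 10 gives the 36.6 reported above.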
Measures of Central Tendency
Properties of the Mean
Uniqueness: For a given set of data there is one and only one mean.
Simplicity: It is easy to understand and to compute.
Affected by extreme values, since all values enter into the computation.
Measures of Central Tendency
Example:
Assume the values are 115, 110, 119, 117, 121 and 126. The mean = 118.
But assume that the values are 75, 75, 80, 80 and 280. The mean = 118, a value that is not representative of the set of data as a whole.
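A short sketch of the same comparison, using Python's standard `statistics` module, may make the point more concrete. The variable names `clustered` and `with_outlier` are labels chosen here for illustration. The median is printed alongside the mean only to show how far the single value 280 pulls the mean away from the bulk of the data.

```python
from statistics import mean, median

clustered = [115, 110, 119, 117, 121, 126]   # values close together
with_outlier = [75, 75, 80, 80, 280]         # one extreme value (280)

for label, values in [("clustered", clustered), ("with outlier", with_outlier)]:
    # Both data sets share the same mean, but only the first is well described by it.
    print(f"{label}: mean = {mean(values)}, median = {median(values)}")
```

Both sets have a mean of 118, yet in the second set four of the five values are 80 or less, and the median (80) sits among them.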
Concept of Descriptive Statistics
Median
When the data are ordered, the median is the observation that divides the set of observations into two equal parts, such that half of the data are before it and the other half are after it.
The median is the center point in a set of numbers (50% above, 50% below).
Concept of Descriptive Statistics
Median
Check to see which of the following two rules applies:
Rule One
If n is odd, the median will be the middle observation.
It will be the (n+1)/2 th ordered observation.
When n = 11, then the median is the 6th observation.
Example:
Three is the median for the numbers 1, 1, 3, 4, 9.
Concept of Descriptive Statistics
Median
Check to see which of the following two rules applies:
Rule Two
If n is even, there are two middle observations. The median will be the mean of these two middle observations.
It will be the (n+1)/2 th ordered observation.
When n = 12, then the median is the 6.5th observation, which is an observation halfway between the 6th and 7th ordered observations.
Example:
23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Since n = 10, the median is the 5.5th observation, i.e. (32 + 34) / 2 = 33.
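The two rules can be sketched as one small function. This is an illustrative sketch only: `median_by_rule` is a hypothetical name, and the translation from the 1-based position (n+1)/2 to Python's 0-based list index is spelled out in the comments.

```python
def median_by_rule(values):
    """Median of a data set using Rule One (odd n) or Rule Two (even n)."""
    ordered = sorted(values)          # the rules apply to ordered observations
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Rule One: the (n + 1) / 2 th observation (1-based) is index n // 2 (0-based).
        return ordered[middle]
    # Rule Two: average the two middle observations, i.e. the n/2 th and
    # (n/2 + 1) th observations (1-based), which are indexes middle - 1 and middle.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_by_rule([1, 1, 3, 4, 9]))                           # odd n = 5
print(median_by_rule([23, 28, 28, 31, 32, 34, 37, 42, 50, 61]))  # even n = 10
```

For the odd-sized set the function returns the third ordered value, 3. For the ten ages it averages the 5th and 6th ordered values, 32 and 34, giving 33.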
Measures of Central Tendency
Properties of the Median
Uniqueness: For a given set of data there is one and only one median.
Simplicity: It is easy to calculate.
It is not affected by extreme values as the mean is.
Concept of Descriptive Statistics
Mode
The mode is simply the most frequently occurring number.
If all values are different there is no mode.
Sometimes there is more than one mode.
Measures of Central Tendency
Example:
Assume the values are 23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
The mode = 28 (repeated two times).
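The three cases described for the mode (one mode, no mode, more than one mode) can be sketched with a frequency count. The function name `modes` and its convention of returning an empty list when every value is different are assumptions made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return the most frequently occurring value(s) as a sorted list.

    Returns an empty list when the data set is empty or when every value
    occurs exactly once, matching the "no mode" case.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []                     # all values are different: no mode
    return sorted(v for v, c in counts.items() if c == highest)


ages = [23, 28, 28, 31, 32, 34, 37, 42, 50, 61]
print(modes(ages))             # a single mode
print(modes([1, 2, 3, 4]))     # no mode
print(modes([1, 1, 2, 2, 3]))  # more than one mode
```

In the ages data only 28 is repeated, so it is the single mode.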
Measures of Central Tendency
Properties of the Mode
Sometimes, it is not unique.
It may be used for describing qualitative data.
Measures of Central Tendency
The only procedure in SPSS that will produce all three commonly used measures of location is Frequencies.
To begin:
Working Example (Pg. 59)
One hundred tennis players participated in a serving competition. Gender and number of aces were recorded for each player. The data can be found in Work4.sav on the iLearn web site that accompanies this title.
Follow steps 1, 2, 3, 4, 6, 8 & 11.
Exercises
Use the Frequencies command to get all three (3) measures of location.
Write a sentence or two reporting each measure. You may choose three (3) variables to report on.
Concept of Descriptive Statistics
Measure of Spread (Variability/Dispersion)
Gives information on the spread or variability of the data values.
Measures of deviation from the central tendency.
Non-parametric/non-normal: range, percentiles, min, max
Parametric: standard deviation (SD) & properties of the normal distribution
Concept of Descriptive Statistics
Measure of Spread (Variability/Dispersion)
Four common measures of spread are:
Range
Quartiles
Variance
Standard Deviation
Measures of Spread
Measure of Spread by Level of Measurement

Level of Measurement   Range, Min/Max   Percentile   Standard Deviation (SD)
Nominal
Ordinal X
Interval X X X?
Ratio X X X
Measures of Spread
Range (R)
The range is the difference between the highest and lowest scores in a distribution.
If we examine the marks of the 100 students above, then we can see that the highest score was 85 and the lowest was 35. Therefore, the range is 50 (85 - 35 = 50).
Although the range is limited as a means of telling about the general spread of a group of data, it does set the boundaries of the scores.
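As a minimal sketch, the range can be computed from the lowest and highest scores alone. The helper name `value_range` is hypothetical. Since the 100 student marks are not listed here, the first call uses only the two extremes quoted above, and the second uses the ten ages from the earlier sample.

```python
def value_range(values):
    """Return (lowest, highest, range) for a non-empty collection of scores."""
    lowest, highest = min(values), max(values)
    return lowest, highest, highest - lowest


# Only the extremes matter for the range, so the quoted figures can be checked directly.
print(value_range([35, 85]))

ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]
print(value_range(ages))
```

Because only two observations enter the calculation, any number of scores could lie between 35 and 85 without changing the result, which is why the range says little about the general spread.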
Measures of Spread
Quartiles (Q)
The quartiles split the ordered data into four quarters:
Q1, or 25th percentile -- 25.0% of the observations lie below it
Q2, the median -- 50.0% of the observations lie below it
Q3, or 75th percentile -- 75.0% of the observations lie below it
The difference Q3 - Q1 is called the inter-quartile range, or IQR.
[Figure: ordered data divided into four 25% segments by Q1, Q2 and Q3]
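A sketch of the quartiles and the IQR for the ten ages used earlier, based on `statistics.quantiles` from the Python standard library (Python 3.8 or later). Statistical packages use several slightly different interpolation conventions for quartiles. The default "exclusive" method is assumed here, so values from other software may differ a little.

```python
from statistics import quantiles

ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

# n=4 asks for the three cut points that split the ordered data into quarters.
q1, q2, q3 = quantiles(ages, n=4)   # default method="exclusive"
iqr = q3 - q1                       # inter-quartile range

print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}, IQR = {iqr}")
```

With this convention Q1 = 28, Q2 = 33 (the median found earlier), Q3 = 44 and the IQR is 16.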
Measures of Spread
Variance
The variance is the square of the standard deviation.
The lower the variance, the more accurately the mean represents the scores of all cases in a distribution of data.
Measures of Spread
Standard Deviation
The standard deviation provides the researcher with an indicator of how scores for variables are spread around the mean.
The higher the standard deviation, the more the scores are spread out around the mean.
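The relationship between variance and standard deviation can be sketched as follows. The slides do not state which divisor is intended, so the sample form (dividing by n - 1) is an assumption here. The helper name `sample_variance` is hypothetical, and the result is compared with `statistics.variance` and `statistics.stdev`, which use the same sample form.

```python
from math import sqrt
from statistics import stdev, variance

ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed sample form)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    centre = sum(values) / n
    return sum((x - centre) ** 2 for x in values) / (n - 1)


sample_var = sample_variance(ages)
sample_sd = sqrt(sample_var)          # the SD is the square root of the variance

print(f"variance = {sample_var:.2f}, standard deviation = {sample_sd:.2f}")
print(f"library check: {variance(ages):.2f}, {stdev(ages):.2f}")
```

Squaring the printed standard deviation recovers the variance, which is the relationship stated on the Variance slide.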
Measures of Spread
Using SPSS for measures of spread.
To begin:
Working Example (Pg. 59)
One hundred tennis players participated in a serving competition. Gender and number of aces were recorded for each player. The data can be found in Work4.sav on the iLearn web site that accompanies this title.
Follow steps 1, 2, 3, 4, 5, 7, 8 & 11.
Exercises
Choose three (3) variables to work on.
Write a few sentences summarizing the measure of spread tables for each variable.
Describe the difference (if any).
Concept of Descriptive Statistics
Measure of Shape
Describes how data are distributed. Use the normal curve (a combination of mean & standard deviation) to construct precise descriptive statements. Two common measures of shape are:
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. Shapes: Normal, Negative Skewness, Positive Skewness.
Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. Shapes: Platykurtic, Leptokurtic.
Measure of Shape - Skewness
Normal (symmetric): Mean = Median = Mode; Coefficient = 0
Negative skewness: Mean < Median < Mode; Coefficient = Negative
Positive skewness: Mode < Median < Mean; Coefficient = Positive
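The three cases can be illustrated with a small sketch. The slides do not define the coefficient, so the moment coefficient of skewness (third central moment divided by the 1.5 power of the second) is assumed here. Statistical packages often apply a small-sample adjustment, so their values can differ slightly while the sign stays the same. The function name `skewness` and the three data sets are invented for illustration.

```python
from statistics import mean, median, multimode


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (an assumed definition)."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((x - centre) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - centre) ** 3 for x in values) / n   # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 3, 3, 4, 5]
right_skewed = [1, 2, 2, 2, 3, 3, 4, 9]            # long tail to the right
left_skewed = [10 - x for x in right_skewed]       # mirror image: long tail to the left

for label, data in [("symmetric", symmetric),
                    ("positive skew", right_skewed),
                    ("negative skew", left_skewed)]:
    print(f"{label}: mode = {multimode(data)}, median = {median(data)}, "
          f"mean = {mean(data)}, coefficient = {skewness(data):.3f}")
```

In the right-skewed set the mode (2) is below the median (2.5), which is below the mean (3.25), and the coefficient is positive. The mirrored set reverses both the ordering and the sign.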
Measure of Shape - Kurtosis
Data distribution with large standard deviation.
Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak.
The data will be far away from the mean.
Kurtosis coefficient < 0 (Platykurtic)
Data distribution with small standard deviation.
Data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails.
The data will cluster around or close to the mean.
Kurtosis coefficient > 0 (Leptokurtic)
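A sketch of the sign convention, assuming the coefficient is excess kurtosis: the fourth central moment divided by the squared second central moment, minus 3, so that a normal distribution scores 0. The slides do not give the formula, and statistical packages often apply a small-sample adjustment, so treat the function `excess_kurtosis` and both data sets as illustrative assumptions.

```python
def excess_kurtosis(values):
    """Excess kurtosis, m4 / m2**2 - 3 (an assumed definition; normal curve = 0)."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((x - centre) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - centre) ** 4 for x in values) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]            # evenly spread, no peak
peaked = [-10, -1, 0, 0, 0, 0, 0, 0, 1, 10]       # sharp peak at 0 with heavy tails

print(f"flat:   {excess_kurtosis(flat):.3f}")     # negative: platykurtic
print(f"peaked: {excess_kurtosis(peaked):.3f}")   # positive: leptokurtic
```

The evenly spread data give a negative coefficient, and the data with a sharp central peak and two far-out values give a positive one.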
Measure of Shape
Using SPSS for measures of shape.
To begin:
Working Example (Pg. 59)
One hundred tennis players participated in a serving competition. Gender and number of aces were recorded for each player. The data can be found in Work4.sav on the iLearn web site that accompanies this title.
Follow steps 1, 2, 3, 4, click on Skewness & Kurtosis, then steps 8, 9, 10 & 11.
Exercise
Choose three (3) variables to work on.
Write a few sentences summarizing the measure of shape tables for each variable.
Describe the difference (if any).