Edpsy 511
Exploratory Data Analysis
Homework 1: Due 9/20
Landmarks in the data
► Quartiles
    We're often interested in the 25th, 50th and 75th percentiles.
► 39, 38, 38, 36, 36, 31, 29, 29, 28, 19
Steps
► First, order the scores from least to greatest.
► Second, add 1 to the sample size.
    Why?
► Third, multiply the sample size plus 1 by the percentile to find the location.
    Q1 = (10 + 1) * .25
    Q2 = (10 + 1) * .50
    Q3 = (10 + 1) * .75
► If the value obtained is a fraction, take the average of the two adjacent X values.
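The steps above can be sketched in Python as follows. This is a minimal illustration, not a standard library routine: the function name `quartile_by_location` is made up for this example, and it applies the slide's rule of averaging the two adjacent values whenever the location is a fraction.

```python
import math

def quartile_by_location(scores, percentile):
    """Hypothetical helper: quartile via the (n + 1) * percentile location rule."""
    ordered = sorted(scores)                      # step 1: order least to greatest
    location = (len(ordered) + 1) * percentile    # steps 2-3: 1-based location
    if location.is_integer():
        return ordered[int(location) - 1]
    # Fractional location: average the two adjacent ordered values.
    lower = ordered[math.floor(location) - 1]
    upper = ordered[math.ceil(location) - 1]
    return (lower + upper) / 2

scores = [39, 38, 38, 36, 36, 31, 29, 29, 28, 19]
q1 = quartile_by_location(scores, 0.25)   # location 2.75 -> (28 + 29) / 2 = 28.5
q2 = quartile_by_location(scores, 0.50)   # location 5.5  -> (31 + 36) / 2 = 33.5
q3 = quartile_by_location(scores, 0.75)   # location 8.25 -> (38 + 38) / 2 = 38.0
print(q1, q2, q3)
```

Statistical packages often interpolate between the adjacent values instead of averaging them, so their quartiles can differ slightly from this hand method.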
Box-and-Whiskers Plots (a.k.a., Boxplots)
Shapes of Distributions
► Normal distribution
► Positive Skew
    Or right skewed
► Negative Skew
    Or left skewed
How is this variable distributed?
[Histogram of "score": x-axis 1 to 8, y-axis Frequency. Mean = 4.3, Std. Dev. = 1.494, N = 10]
How is this variable distributed?
[Histogram of "right": x-axis 0.00 to 7.00, y-axis Frequency. Mean = 2.80, Std. Dev. = 1.75119, N = 10]
How is this variable distributed?
[Histogram of "left": x-axis 2.00 to 8.00, y-axis Frequency. Mean = 5.40, Std. Dev. = 1.42984, N = 10]
Descriptive Statistics
Statistics vs. Parameters
► A parameter is a characteristic of a population.
    It is a numerical or graphic way to summarize data obtained from the population.
► A statistic is a characteristic of a sample.
    It is a numerical or graphic way to summarize data obtained from a sample.
Types of Numerical Data
► There are two fundamental types of numerical data:
    1) Categorical data: obtained by determining the frequency of occurrences in each of several categories
    2) Quantitative data: obtained by determining placement on a scale that indicates amount or degree
Techniques for Summarizing Quantitative Data
► Frequency Distributions
► Histograms
► Stem and Leaf Plots
► Distribution curves
► Averages
► Variability
Summary Measures
► Central Tendency: Arithmetic Mean, Median, Mode
► Quartile
► Variation: Variance, Standard Deviation, Range
Measures of Central Tendency
► Average (Mean)
► Median
► Mode
Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Mean (Arithmetic Mean)
► Mean (arithmetic mean) of data values
    Sample mean, where n is the sample size:
    $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
    Population mean, where N is the population size:
    $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$
Mean
► The most common measure of central tendency
► Affected by extreme values (outliers)
[Figure: two dot plots, one on a 0 to 10 number line with Mean = 5 and one on a 0 to 14 number line with Mean = 6]
Mean of Grouped Frequency
X       f       fX
10      1
9       3
8       2
7       4
6       6
5       5
Total   N = 21
$\bar{X} = \frac{\sum fX}{N}$
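As a rough sketch, the grouped-frequency mean can be computed by filling in the fX column and dividing its total by N. The dictionary name `freq_table` is just a label chosen for this example; the X and f values are the ones in the table above.

```python
# Score (X) mapped to its frequency (f), taken from the table above.
freq_table = {10: 1, 9: 3, 8: 2, 7: 4, 6: 6, 5: 5}

n = sum(freq_table.values())                          # N = 21
sum_fx = sum(x * f for x, f in freq_table.items())    # sum of the fX column = 142
grouped_mean = sum_fx / n                             # 142 / 21, about 6.76

print(n, sum_fx, round(grouped_mean, 2))
```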
Weighted Mean
A form of mean obtained from groups of data in which the different sizes of the groups are accounted for or weighted.
$\bar{x}_w = \frac{\sum f(\bar{x})}{N_{total}}$
Group   xbar    N       f(xbar)
1       30      10
2       25      15
3       40      25
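A minimal sketch of the weighted mean for the three groups in the table, assuming that f(xbar) stands for each group mean multiplied by its group size. The variable names are illustrative only.

```python
# (group mean, group size) pairs from the table above.
groups = [(30, 10), (25, 15), (40, 25)]

total_n = sum(size for _, size in groups)                 # 10 + 15 + 25 = 50
weighted_sum = sum(mean * size for mean, size in groups)  # 300 + 375 + 1000 = 1675
weighted_mean = weighted_sum / total_n                    # 1675 / 50 = 33.5

# For comparison: the plain average of the three group means ignores group size.
unweighted_mean = sum(mean for mean, _ in groups) / len(groups)

print(weighted_mean, round(unweighted_mean, 2))
```

The weighted mean (33.5) is pulled toward the largest group, while the simple average of the group means (about 31.67) treats all three groups as equally large.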
Median
► Robust measure of central tendency
► Not affected by extreme values
► In an ordered array, the median is the "middle" number
    If n or N is odd, the median is the middle number
    If n or N is even, the median is the average of the two middle numbers
[Figure: two dot plots, one on a 0 to 10 number line with Median = 5 and one on a 0 to 14 number line with Median = 5]
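To see why the median is called robust, here is a small sketch comparing it with the mean. The two data sets are assumptions made up for this illustration (the points plotted on the slides are not recoverable from the text); they were chosen so that the means come out as 5 and 6 and both medians as 5.

```python
import statistics

# Assumed illustrative data: the second set replaces 9 with an extreme value, 14.
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]

mean_a = statistics.mean(without_outlier)      # 25 / 5 = 5
mean_b = statistics.mean(with_outlier)         # 30 / 5 = 6
median_a = statistics.median(without_outlier)  # middle value = 5
median_b = statistics.median(with_outlier)     # middle value = 5

print(mean_a, mean_b, median_a, median_b)
```

The extreme value shifts the mean from 5 to 6 but leaves the median unchanged.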
Mode
► A measure of central tendency
► Value that occurs most often
► Not affected by extreme values
► Used for either numerical or categorical data
► There may be no mode
► There may be several modes
[Figure: two dot plots, one on a 0 to 14 number line with Mode = 9 and one on a 0 to 6 number line with No Mode]
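The three cases (one mode, no mode, several modes) can be sketched as below. The helper `find_modes` and all three data sets are hypothetical, and the sketch assumes the convention that a data set has no mode when every value occurs equally often.

```python
from collections import Counter

def find_modes(values):
    """Hypothetical helper: return a sorted list of modes, or [] if there is none."""
    counts = Counter(values)
    highest = max(counts.values())
    # Convention assumed here: if all distinct values tie, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

one_mode = find_modes([0, 1, 1, 3, 5, 9, 9, 9, 12, 14])  # 9 occurs most often
no_mode = find_modes([0, 1, 2, 3, 4, 5, 6])              # every value occurs once
two_modes = find_modes([1, 1, 2, 3, 3])                  # 1 and 3 tie

print(one_mode, no_mode, two_modes)
```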
The Normal Curve
Different Distributions Compared
Variability
► Refers to the extent to which the scores on a quantitative variable in a distribution are spread out.
► The range represents the difference between the highest and lowest scores in a distribution.
► A five number summary reports the lowest score, the first quartile, the median, the third quartile, and the highest score.
    Five number summaries are often portrayed graphically by the use of box plots.
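A short sketch of the range and five number summary for the ten quartile-example scores, using Python's standard `statistics.quantiles`. Note one assumption: with `method="exclusive"` the function uses (n + 1) locations but interpolates linearly between adjacent values instead of averaging them, so Q1 comes out as 28.75 here rather than the 28.5 given by the averaging rule.

```python
import statistics

scores = [39, 38, 38, 36, 36, 31, 29, 29, 28, 19]

value_range = max(scores) - min(scores)   # 39 - 19 = 20

# Quartiles by linear interpolation at the (n + 1) * p locations.
q1, median, q3 = statistics.quantiles(scores, n=4, method="exclusive")

five_number_summary = (min(scores), q1, median, q3, max(scores))
print(value_range, five_number_summary)
```

These five values are what a box plot draws: the whisker ends, the box edges, and the line inside the box.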
Variance
► The variance, $s^2$, represents the amount of variability of the data relative to their mean
► As shown below, the variance is the "average" of the squared deviations of the observations about their mean
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
► The variance, $s^2$, is the sample variance, and is used to estimate the actual population variance, $\sigma^2$
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Standard Deviation
► Considered the most useful index of variability.
► It is a single number that represents the spread of a distribution.
► If a distribution is normal, then the mean plus or minus 3 SD will encompass about 99.7% of all scores in the distribution.
Calculation of the Variance and Standard Deviation of a Distribution
Raw Score   Mean    X - Xbar    (X - Xbar)^2
85          54       31         961
80          54       26         676
70          54       16         256
60          54        6          36
55          54        1           1
50          54       -4          16
45          54       -9          81
40          54      -14         196
30          54      -24         576
25          54      -29         841
Variance: $SD^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{3640}{9} = 404.44$
Standard deviation: $SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$
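The calculation in the table can be reproduced with a few lines of Python. This is a sketch that follows the table column by column; the variable names are chosen for this example, and the last line cross-checks the result against the standard library's `statistics.variance`, which also divides by N - 1.

```python
import math
import statistics

scores = [85, 80, 70, 60, 55, 50, 45, 40, 30, 25]

mean = sum(scores) / len(scores)                    # 540 / 10 = 54
deviations = [x - mean for x in scores]             # the X - Xbar column
squared = [d ** 2 for d in deviations]              # the (X - Xbar)^2 column
sum_of_squares = sum(squared)                       # 3640

variance = sum_of_squares / (len(scores) - 1)       # 3640 / 9, about 404.44
std_dev = math.sqrt(variance)                       # about 20.11

print(round(variance, 2), round(std_dev, 2))
print(math.isclose(variance, statistics.variance(scores)))
```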
Comparing Standard Deviations
[Figure: three dot plots on a common 11 to 21 scale]
► Data A: Mean = 15.5, S = 3.338
► Data B: Mean = 15.5, S = .9258
► Data C: Mean = 15.5, S = 4.57
Facts about the Normal Distribution
► 50% of all the observations fall on each side of the mean.
► 68% of scores fall within 1 SD of the mean in a normal distribution.
► 27% of the observations fall between 1 and 2 SD from the mean.
► 99.7% of all scores fall within 3 SD of the mean.
► This is often referred to as the 68-95-99.7 rule
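These percentages can be checked numerically. The sketch below uses the standard relationship between the normal curve and the error function, P(|Z| < k) = erf(k / sqrt(2)); the helper name `area_within` is made up for this example.

```python
import math

def area_within(k):
    """Proportion of a normal distribution within k SD of the mean."""
    return math.erf(k / math.sqrt(2))

within_1 = area_within(1)               # about 0.6827
within_2 = area_within(2)               # about 0.9545
within_3 = area_within(3)               # about 0.9973
between_1_and_2 = within_2 - within_1   # about 0.2718, both tails combined

print(round(within_1, 4), round(within_2, 4), round(within_3, 4))
print(round(between_1_and_2, 4))
```

The 27% figure in the list is the 95% within 2 SD minus the 68% within 1 SD.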
Fifty Percent of All Scores in a Normal Curve Fall on Each Side of the Mean
Probabilities Under the Normal Curve
Standard Scores
► Standard scores use a common scale to indicate how an individual compares to other individuals in a group.
► The simplest form of a standard score is a Z score.
► A Z score expresses how far a raw score is from the mean in standard deviation units.
► Standard scores provide a better basis for comparing performance on different measures than do raw scores.
► A probability is a percent stated in decimal form and refers to the likelihood of an event occurring.
► T scores are z scores expressed in a different form (z score x 10 + 50).
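As an illustration, the sketch below converts raw scores to Z and T scores. It reuses the ten raw scores from the variance and standard deviation calculation earlier in these slides; the function names are hypothetical, and the sample standard deviation (N - 1 in the denominator) is assumed.

```python
import statistics

scores = [85, 80, 70, 60, 55, 50, 45, 40, 30, 25]
mean = statistics.mean(scores)    # 54
sd = statistics.stdev(scores)     # sample SD, about 20.11

def z_score(raw):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

def t_score(raw):
    """T score as defined above: z score x 10 + 50."""
    return z_score(raw) * 10 + 50

z_top = z_score(85)     # (85 - 54) / 20.11, about 1.54
t_top = t_score(85)     # about 65.4
z_mean = z_score(54)    # a score at the mean has z = 0
t_mean = t_score(54)    # and T = 50

print(round(z_top, 2), round(t_top, 1), z_mean, t_mean)
```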
Probability Areas Between the Mean and Different Z Scores
Examples of Standard Scores