Types of data and how to present them
47:269: Research Methods I
Dr. Leonard, March 31, 2010
Scientific Theory
1. Formulate theories
2. Develop testable hypotheses (operational definitions)
3. Conduct research, gather data
4. Evaluate hypotheses based on data
5. Cautiously draw conclusions
Scales of Measurement
Nominal: Categories
Ordinal: Categories that can be ranked
Interval: Scores with equidistant intervals between them
Ratio: Scores with equidistant intervals and absolute zero
           Responses      Responses       Equal        Absolute
           are distinct   can be ranked   intervals    zero
Nominal    YES            NO              NO           NO
Ordinal    YES            YES             NO           NO
Interval   YES            YES             YES          NO
Ratio      YES            YES             YES          YES
Two major approaches to using data
Descriptive statistics
  Describe or summarize data to characterize sample
  Organizes responses to show trends in data
Inferential statistics
  Draw inferences about population from sample (is population distinct from sample?)
  Significance tests
  Capture impact of random error on responses
  Margin of error
Note: Statistics describe responses from a sample; parameters describe responses from a population (e.g., a census)
Descriptive Statistics
N, total number of cases (responses) in a sample
  Our class would be N = 33
f, or frequency, is the number of participants who gave a particular response, x
  Can also be given as percentages or proportions
Can be univariate or bivariate
  How participants vary on one variable (uni-)
  How participants vary on two variables (bi-)
Descriptive statistics are a good first step for analyzing any data!
  They are the only statistics appropriate for nominal data
Frequency distribution (nominal data)
x (response) f (frequency) %
Democrat 479 47.9
Republican 411 41.1
Independent 101 10.1
Green party 9 0.9
Total n = 1,000 100%
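The percentage column follows directly from f and n: each percentage is 100 × f / n. A minimal Python sketch using the counts from the table above (the variable names are illustrative and not part of the slides):

```python
# Frequencies (f) for each response (x), copied from the table above.
frequencies = {
    "Democrat": 479,
    "Republican": 411,
    "Independent": 101,
    "Green party": 9,
}

n = sum(frequencies.values())  # total number of responses

# Convert each frequency to a percentage of the total, rounded to one decimal.
percentages = {x: round(100 * f / n, 1) for x, f in frequencies.items()}

for x, f in frequencies.items():
    print(f"{x:<12} {f:>4} {percentages[x]:>5.1f}")
print(f"{'Total':<12} {n:>4}")
```

Dividing by n without multiplying by 100 would give the same information as proportions instead of percentages.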
Frequency distribution (interval or ratio data)
When you need to present a wide range of scores, show responses grouped in intervals to make it easier to grasp “big picture” of data
Interval    f
0.9 - 1.1   1
1.2 - 1.4   3
1.5 - 1.7   3
1.8 - 2.0   5
2.1 - 2.3   6
2.4 - 2.6   7
2.7 - 2.9   10
3.0 - 3.2   14
3.3 - 3.5   12
3.6 - 3.8   3
2.7 1.9 1.0 3.3 1.3 1.8 2.6 3.7
3.1 2.2 3.0 3.4 3.1 2.2 1.9 3.1
3.4 3.0 3.5 3.0 2.4 3.0 3.4 2.4
2.4 3.2 3.3 2.7 3.5 3.2 3.1 3.3
2.1 1.5 2.7 2.4 3.4 3.3 3.0 3.8
1.4 2.6 2.9 2.1 2.6 1.5 2.8 2.3
2.3 3.1 1.6 2.8 2.3 2.8 3.2 2.8
2.8 3.8 1.4 1.9 3.3 2.9 2.0 3.2
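Grouping scores into intervals can be sketched in a few lines of Python. The example below is only an illustration: it bins just the first eight scores listed above, assumes intervals that start at 0.9 and are three tenths wide, and uses helper names (`interval_index`, `interval_label`) that are made up for this sketch.

```python
from collections import Counter

# First eight scores from the list above; extend the list to bin every score.
scores = [2.7, 1.9, 1.0, 3.3, 1.3, 1.8, 2.6, 3.7]

LOWEST_TENTHS = 9  # the first interval starts at 0.9
WIDTH_TENTHS = 3   # each interval covers three tenths, e.g. 0.9, 1.0, 1.1


def interval_index(score):
    """Return the position of the interval a score falls in.

    Works in whole tenths so that floating-point rounding cannot push a
    score across an interval edge.
    """
    tenths = round(score * 10)
    return (tenths - LOWEST_TENTHS) // WIDTH_TENTHS


def interval_label(index):
    """Build a label such as '0.9 - 1.1' for a given interval position."""
    low = LOWEST_TENTHS + index * WIDTH_TENTHS
    high = low + WIDTH_TENTHS - 1
    return f"{low / 10:.1f} - {high / 10:.1f}"


# Count how many scores land in each interval.
counts = Counter(interval_index(s) for s in scores)

for index in range(10):
    print(f"{interval_label(index):<10} {counts[index]}")
```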
Frequency distributions can be depicted graphically in…
Bar graphs
  Bars not touching because of discrete data
  Nominal and ordinal data
Histograms
  Bars touching because of continuous data
  Interval and ratio data
Frequency polygons (single line)
  Interval and ratio data
Shapes of Distributions
[Figure: three distribution curves labeled normal, positive skew, and negative skew]
Shapes of Distributions
[Figure: three distribution curves labeled normal, platykurtic, and leptokurtic]
What else can we do besides frequencies?
Measures of central tendency show the central or “typical” scores in a distribution
  Mean - the average score
  Median - the middle score
  Mode - the most frequent score
The mean, median, and mode are related to the horizontal shape (skew) of the distribution.
  In a normal distribution: Mean = Median = Mode
  In a positively skewed distribution: Mode < Median < Mean
  In a negatively skewed distribution: Mean < Median < Mode
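The ordering of the three measures under skew can be checked on small data sets. The scores below are hypothetical, made up only to have a long tail on one side; they are not from the slides.

```python
from statistics import mean, median, mode

# Hypothetical scores, invented for illustration.
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 9]  # long tail to the right
negative_skew = [1, 5, 6, 7, 7, 8, 8, 8, 9]  # long tail to the left

for name, scores in [("positive skew", positive_skew),
                     ("negative skew", negative_skew)]:
    print(f"{name}: mode={mode(scores)}, median={median(scores)}, "
          f"mean={mean(scores):.2f}")
```

In the first set the extreme score of 9 pulls the mean above the median and mode; in the second set the extreme score of 1 pulls it below them.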
Which measure of central tendency???
Different measures of central tendency are appropriate depending upon the level of measurement used:
Nominal   Ordinal   Interval/Ratio
Mode      Mode      Mode
          Median    Median
                    Mean
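The table above can also be written as a simple lookup. This is only a sketch of the table's content; the dictionary and function names are illustrative.

```python
# Measures of central tendency that suit each scale of measurement,
# following the table above.
APPROPRIATE_MEASURES = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean"],
}


def appropriate_measures(scale):
    """Look up which measures apply to a scale (case-insensitive)."""
    return APPROPRIATE_MEASURES[scale.lower()]


print(appropriate_measures("Ordinal"))
```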
The Mean
The most informative and elegant measure of central tendency.
  The average
  The fulcrum point of the distribution
[Figure: two sets of scores, 2 4 6 8 10 and 2 4 6 8 15]
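One way to see why the mean is called the fulcrum: the distances of the scores below the mean exactly balance the distances of the scores above it, so the deviations sum to zero. A short Python sketch with the two sets of scores shown above (the function name is illustrative):

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by how many there are."""
    return sum(scores) / len(scores)


first = [2, 4, 6, 8, 10]
second = [2, 4, 6, 8, 15]

for scores in (first, second):
    m = mean(scores)
    deviations = [x - m for x in scores]  # signed distance of each score from the mean
    print(f"scores={scores} mean={m} sum of deviations={sum(deviations)}")
```

Changing the top score from 10 to 15 moves the mean from 6 to 7, which is how far the balance point has to shift to keep the deviations summing to zero.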
The Median
The middlemost score in a distribution.
  The scale value below which and above which 50% of the distribution falls
  Not the fulcrum: The halfway point
[Figure: two sets of scores, 2 4 6 8 10 and 2 4 6 8 15]
The Median
If N is odd, then the median is the center score
[Figure: two sets of scores, 2 4 6 8 10 and 2 4 6 8 15]
If N is even, then the median is the average of the two centermost scores
[Figure: two sets of scores, 2 4 6 8 10 12 and 2 4 6 8 10 15]
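The odd and even rules can be written as one short function. This is a minimal sketch applied to the sets of scores shown on the slides; Python's built-in `statistics.median` follows the same rule.

```python
def median(scores):
    """Middle score if N is odd; average of the two middle scores if N is even."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([2, 4, 6, 8, 10]))      # odd N
print(median([2, 4, 6, 8, 15]))      # odd N, larger top score
print(median([2, 4, 6, 8, 10, 12]))  # even N
print(median([2, 4, 6, 8, 10, 15]))  # even N, larger top score
```

Replacing the highest score with a more extreme one leaves the median where it was, because only the position of the middle scores matters.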
The Median
If the median occurs at a value where there are tied scores, use the tied score as the median
[Figure: dot plot on a scale of 2 4 6 8 10 15, with repeated scores stacked at 8 and 10]
The Mode
The most frequent score in the distribution
[Figure: two dot plots on a scale of 2 4 6 8 10 15, each with repeated scores stacked at 8 and 10]
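Finding the mode is a matter of counting. The sketch below uses hypothetical scores, loosely modeled on the stacked dot plots (the exact scores in the figures are an assumption), and relies on `statistics.multimode`, which needs Python 3.8 or later and returns every score tied for most frequent.

```python
from statistics import multimode

# Hypothetical scores; the repeated 8s and 10s are assumed, not taken from the slides.
one_mode = [2, 4, 6, 8, 8, 10, 10, 10, 15]
two_modes = [2, 4, 6, 8, 8, 10, 10, 15]

print(multimode(one_mode))   # a single most frequent score
print(multimode(two_modes))  # two scores tie, so the distribution is bimodal
```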
One more thing…
These measures of central tendency vary in their sampling stability = match between a sample statistic (e.g., the sample mean, x̄) and the population value (e.g., the population mean, μ).
  Mode (least sampling stability)
  Median
  Mean (most sampling stability)
• Note: Roman (r, s, x̄) characters are used for sample statistics while Greek (ρ, σ, μ) characters are used for population statistics.
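Sampling stability can be illustrated with a small simulation: draw many samples from the same population, compute each measure for every sample, and see how much each measure bounces around from sample to sample. Everything in this sketch is an assumption made for illustration (a roughly normal population of whole-number scores centered on 50, samples of 25, and 2,000 repetitions); it requires Python 3.8 or later, where `statistics.mode` returns the first mode when several scores tie.

```python
import random
from statistics import mean, median, mode, pstdev

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 25
NUM_SAMPLES = 2000


def draw_sample():
    """Draw one sample from a hypothetical population of whole-number
    scores, roughly normal with mean 50 and standard deviation 10."""
    return [round(random.gauss(50, 10)) for _ in range(SAMPLE_SIZE)]


samples = [draw_sample() for _ in range(NUM_SAMPLES)]

# Spread of each statistic across samples: a smaller spread means the
# statistic changes less from sample to sample, i.e. it is more stable.
spread = {
    "mode": pstdev([mode(s) for s in samples]),
    "median": pstdev([median(s) for s in samples]),
    "mean": pstdev([mean(s) for s in samples]),
}

for name, value in spread.items():
    print(f"{name:<6} varies across samples with SD = {value:.2f}")
```

For a population like this one, the mean should vary least across samples and the mode most, matching the ordering on the slide.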
Review of central tendency
Which one is the only appropriate measure for nominal data?
  The mode
How do you find the median when there is an odd number of scores?
  Simply locate the score in the middle
…when there is an even number of scores?
  Average the two middle scores
Which measure is most sensitive to extreme scores and why?
  The mean because it takes all scores into account and can be swayed by positive or negative skew
Which measure has the most sampling stability and why?
  The mean because it is the most accurate representation of the overall sample
Application of central tendency
In 2006, the median home price in Boston was $386,300. (San Francisco was $518,400; Washington, D.C. was $258,700.)
How do you interpret these numbers?
Why are housing prices framed in terms of the median rather than the mean or the mode?
Measures of variability
Measures of central tendency…
  …indicate the typical scores in a distribution
  …are related to skew (horizontal)
Measures of variability…
  …show the dispersion of scores in a distribution
  …are related to kurtosis (vertical)
Measures of variability
Range - the difference between the highest and lowest score
Variance - the total variation (distance) from the mean of all the scores, found by averaging the squared distances from the mean
Standard deviation - the average variation (distance) from the mean of all the scores, found by taking the square root of the variance
Measures of variability
Range = Highest Score – Lowest Score
Most sensitive to extreme scores!
[Figure: two sets of scores, 2 4 6 8 10 and 2 4 6 8 15]
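A one-line function is enough to show how strongly a single extreme score moves the range. The function name is illustrative; the scores are the two sets shown on the slide.

```python
def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)


print(score_range([2, 4, 6, 8, 10]))
print(score_range([2, 4, 6, 8, 15]))
```

Only the two end scores enter the calculation, so changing the top score from 10 to 15 changes the range by the full 5 points.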
Measures of variability
Again, variance is the overall distance from the mean of all scores (requires squaring the distance of each score from the mean)
Not as useful as the standard deviation -- the average distance scores fall from the mean
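The steps from squared distances to variance to standard deviation can be worked through on the scores 2 4 6 8 10. The slides do not say whether the sum of squares is divided by N or by N − 1, so this sketch shows both as an assumption: dividing by N treats the scores as a whole population, dividing by N − 1 treats them as a sample.

```python
import math

scores = [2, 4, 6, 8, 10]
n = len(scores)
mean = sum(scores) / n

# Square each score's distance from the mean so that negative and
# positive distances do not cancel out.
squared_distances = [(x - mean) ** 2 for x in scores]
sum_of_squares = sum(squared_distances)

population_variance = sum_of_squares / n    # divide by N
sample_variance = sum_of_squares / (n - 1)  # divide by N - 1

# The standard deviation is the square root of the variance, which puts
# the result back into the original units of the scores.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(f"sum of squares = {sum_of_squares}")
print(f"variance: population = {population_variance}, sample = {sample_variance}")
print(f"SD: population = {population_sd:.2f}, sample = {sample_sd:.2f}")
```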
Measures of variability
Standard deviation, like the mean, is the most informative and elegant measure of variability.
  The average distance of scores from the mean score -- deviation is distance!
Also like the mean, standard deviation has the most sampling stability
[Figure: scores 2 4 6 8 10]
How would these standard deviations differ?
[Figure: two dot plots. One shows the scores 2 4 6 8 10 (Mean = 6, Range = 8); the other runs from 2 to 12 with repeated scores stacked at 8 and 10 (Mean = 7.9, Range = 10)]
Standard deviation and shape of distribution
Scores: 0 5 10 15 20 25 30 (Mean = 15, Std. Dev. = 10)
Scores: 14 14 14 15 15 16 16 (Mean = 15, Std. Dev. = 0.9)
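As a check on the widely spread set of scores, the standard library can do the arithmetic. The sketch assumes the standard deviation is computed by dividing by N (`statistics.pstdev`); the slides do not state which formula was used.

```python
from statistics import mean, pstdev

# The widely spread set of scores from the slide.
spread_out = [0, 5, 10, 15, 20, 25, 30]

print(f"mean = {mean(spread_out)}")
print(f"standard deviation (dividing by N) = {pstdev(spread_out)}")
```

Scores packed tightly around the same mean would give a much smaller standard deviation, which is the contrast the slide draws.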
Properties of Normal Distributions
• All normal distributions are single peaked, symmetric, and bell-shaped
• Normal distributions can have different values for mean and standard deviation but…
• All normal distributions follow the 68-95-99.7 rule
  68.3% of data within 1 standard deviation of the mean
  95.4% of data within 2 standard deviations of the mean
  99.7% of data within 3 standard deviations of the mean
[Figure: normal curve centered on the mean, with bands marking 68.3%, 95.4%, and 99.7% of the data]
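The three percentages can be reproduced from the normal curve itself. For a normal distribution, the share of data within k standard deviations of the mean equals erf(k / √2), where erf is the error function. A minimal Python sketch (the function name is illustrative):

```python
import math


def share_within(k):
    """Proportion of a normal distribution that lies within k standard
    deviations of the mean, via the standard library's error function."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * share_within(k):.1f}%")
```

The result does not depend on the particular mean or standard deviation, which is why the rule holds for every normal distribution.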