Types of data and how to present them 47:269: Research Methods I Dr. Leonard March 31, 2010


Page 2

Scientific Theory

1. Formulate theories

2. Develop testable hypotheses (operational definitions)

3. Conduct research, gather data

4. Evaluate hypotheses based on data

5. Cautiously draw conclusions

Page 3

Scales of Measurement

• Nominal: categories
• Ordinal: categories that can be ranked
• Interval: scores with equidistant intervals between them
• Ratio: scores with equidistant intervals and an absolute zero

Page 4

Scale      Responses are distinct   Responses can be ranked   Equal intervals   Absolute zero
Nominal    YES                      NO                        NO                NO
Ordinal    YES                      YES                       NO                NO
Interval   YES                      YES                       YES               NO
Ratio      YES                      YES                       YES               YES

Page 5

Two major approaches to using data

• Descriptive statistics
  - Describe or summarize data to characterize the sample
  - Organize responses to show trends in the data
• Inferential statistics
  - Draw inferences about the population from the sample (is the population distinct from the sample?)
  - Significance tests
  - Capture the impact of random error on responses
  - Margin of error
• Note: Statistics describe responses from a sample; parameters describe responses from a population (e.g., a census)

Page 6

Descriptive Statistics

• N: total number of cases (responses) in a sample
  - Our class would be N = 33
• f, or frequency: the number of participants who gave a particular response, x
  - Can also be given as percentages or proportions
• Can be univariate or bivariate
  - How participants vary on one variable (uni-)
  - How participants vary on two variables (bi-)
• Descriptive statistics are a good first step for analyzing any data!
  - They are the only statistics appropriate for nominal data

Page 7

Frequency distribution (nominal data)

x (response)   f (frequency)   %
Democrat       479             47.9
Republican     411             41.1
Independent    101             10.1
Green party    9               0.9
Total          n = 1,000       100%
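
The percentage column above can be rebuilt from the frequencies alone. This is a minimal sketch using the counts from the table; the function name `frequency_table` is made up for illustration and is not from the lecture.

```python
# Frequencies (f) copied from the party-affiliation table above.
frequencies = {
    "Democrat": 479,
    "Republican": 411,
    "Independent": 101,
    "Green party": 9,
}


def frequency_table(freqs):
    """Return the total n and (response, f, proportion, percent) rows."""
    n = sum(freqs.values())  # total number of cases
    rows = []
    for response, f in freqs.items():
        proportion = f / n
        rows.append((response, f, proportion, round(100 * proportion, 1)))
    return n, rows


n, rows = frequency_table(frequencies)
print(f"{'x (response)':<14}{'f':>6}{'%':>8}")
for response, f, _proportion, percent in rows:
    print(f"{response:<14}{f:>6}{percent:>8.1f}")
print(f"{'Total':<14}{n:>6}{100.0:>8.1f}")
```

The same rows also carry the proportion (f divided by n), the other form mentioned on the Descriptive Statistics slide.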

Page 8

Frequency distribution (interval or ratio data)

When you need to present a wide range of scores, show responses grouped in intervals to make it easier to grasp the “big picture” of the data.

Interval     f
0.9 - 1.1    1
1.2 - 1.4    3
1.5 - 1.7    3
1.8 - 2.0    5
2.1 - 2.3    6
2.4 - 2.6    7
2.7 - 2.9    10
3.0 - 3.2    14
3.3 - 3.5    12
3.6 - 3.8    3

Raw scores (N = 64):

2.7 1.9 1.0 3.3 1.3 1.8 2.6 3.7

3.1 2.2 3.0 3.4 3.1 2.2 1.9 3.1

3.4 3.0 3.5 3.0 2.4 3.0 3.4 2.4

2.4 3.2 3.3 2.7 3.5 3.2 3.1 3.3

2.1 1.5 2.7 2.4 3.4 3.3 3.0 3.8

1.4 2.6 2.9 2.1 2.6 1.5 2.8 2.3

2.3 3.1 1.6 2.8 2.3 2.8 3.2 2.8

2.8 3.8 1.4 1.9 3.3 2.9 2.0 3.2
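
Grouping raw scores into intervals can be sketched as below. The scores are the 64 values listed above; the interval width (three tenths) and starting point (0.9) are read off the table, and the helper name `grouped_frequencies` is an assumption made for illustration.

```python
from collections import Counter

# The 64 raw scores as listed above, row by row.
scores = [
    2.7, 1.9, 1.0, 3.3, 1.3, 1.8, 2.6, 3.7,
    3.1, 2.2, 3.0, 3.4, 3.1, 2.2, 1.9, 3.1,
    3.4, 3.0, 3.5, 3.0, 2.4, 3.0, 3.4, 2.4,
    2.4, 3.2, 3.3, 2.7, 3.5, 3.2, 3.1, 3.3,
    2.1, 1.5, 2.7, 2.4, 3.4, 3.3, 3.0, 3.8,
    1.4, 2.6, 2.9, 2.1, 2.6, 1.5, 2.8, 2.3,
    2.3, 3.1, 1.6, 2.8, 2.3, 2.8, 3.2, 2.8,
    2.8, 3.8, 1.4, 1.9, 3.3, 2.9, 2.0, 3.2,
]

START_TENTHS = 9   # the lowest interval starts at 0.9
WIDTH_TENTHS = 3   # each interval covers three tenths, e.g. 0.9, 1.0, 1.1


def grouped_frequencies(values):
    """Count how many scores fall in each interval of the grouped table."""
    counts = Counter()
    for value in values:
        tenths = round(value * 10)  # whole tenths avoid float edge cases
        counts[(tenths - START_TENTHS) // WIDTH_TENTHS] += 1
    table = []
    for index in range(max(counts) + 1):
        low = START_TENTHS + index * WIDTH_TENTHS
        high = low + WIDTH_TENTHS - 1
        table.append((f"{low / 10:.1f} - {high / 10:.1f}", counts[index]))
    return table


table = grouped_frequencies(scores)
for label, f in table:
    print(f"{label:<12}{f:>3}")
```

Tallied this way, the scores as listed give 7 and 11 for the 2.1 - 2.3 and 3.3 - 3.5 intervals, where the table shows 6 and 12, so one listed score may differ from the data behind the table. The other intervals and the total of 64 agree.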

Page 9

Frequency distributions can be depicted graphically in…

• Bar graphs
  - Bars not touching because of discrete data
  - Nominal and ordinal data
• Histograms
  - Bars touching because of continuous data
  - Interval and ratio data
• Frequency polygons (single line)
  - Interval and ratio data

Page 10

Shapes of Distributions

[Figure: three frequency curves plotted against X, labeled normal, positive skew, and negative skew]

Page 11

Shapes of Distributions

[Figure: three frequency curves plotted against X, labeled normal, platykurtic, and leptokurtic]

Page 12

What else can we do besides frequencies?

Measures of central tendency show the central or “typical” scores in a distribution

• Mean: the average score
• Median: the middle score
• Mode: the most frequent score

The mean, median, and mode are related to the horizontal shape (skew) of the distribution.

• In a normal distribution: Mean = Median = Mode
• In a positively skewed distribution: Mode < Median < Mean
• In a negatively skewed distribution: Mean < Median < Mode
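
The ordering of the three measures under skew can be checked on a small example. The scores below are hypothetical, invented only to produce a long right tail, and the negatively skewed set is simply their mirror image.

```python
import statistics

# Hypothetical scores (not from the lecture): most are low, a few are high,
# so the distribution has a long right tail (positive skew).
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]

# Mirroring the scores around 13 flips the tail to the left (negative skew).
negative_skew = [13 - x for x in positive_skew]


def central_tendency(scores):
    """Return the mean, median, and mode of a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
    }


pos = central_tendency(positive_skew)
neg = central_tendency(negative_skew)
print("positive skew:", pos)
print("negative skew:", neg)
```

In the positively skewed set the few high scores pull the mean above the median, which in turn sits above the mode; the mirrored set reverses the order.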

Page 13

Which measure of central tendency?

Different measures of central tendency are appropriate depending upon the level of measurement used:

• Nominal: Mode
• Ordinal: Mode, Median
• Interval/Ratio: Mode, Median, Mean
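
One way to apply this rule is a lookup from level of measurement to the measures allowed at that level. This is only a sketch: the names `APPROPRIATE_MEASURES` and `summarize` and the example data are assumptions for illustration, and using `statistics.median_low` for ordinal data (so the median is always an observed category) is a design choice, not something the slide specifies.

```python
import statistics

# Measures allowed at each level of measurement, following the slide above.
APPROPRIATE_MEASURES = {
    "nominal": ("mode",),
    "ordinal": ("mode", "median"),
    "interval": ("mode", "median", "mean"),
    "ratio": ("mode", "median", "mean"),
}


def summarize(values, level):
    """Compute only the measures that are appropriate for the given level."""
    result = {}
    for name in APPROPRIATE_MEASURES[level]:
        if name == "mode":
            result[name] = statistics.mode(values)
        elif name == "median":
            # median_low always returns an observed value, which suits ranked
            # categories; interval/ratio scores may average the middle two.
            if level == "ordinal":
                result[name] = statistics.median_low(values)
            else:
                result[name] = statistics.median(values)
        else:
            result[name] = statistics.mean(values)
    return result


# Hypothetical example data for each level.
party = ["Democrat", "Republican", "Democrat", "Independent"]  # nominal
ranks = [1, 2, 2, 3, 4, 4, 4, 5]                               # ordinal
scores = [2, 4, 6, 8, 10]                                      # interval/ratio

print(summarize(party, "nominal"))
print(summarize(ranks, "ordinal"))
print(summarize(scores, "ratio"))
```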

Page 14

The Mean

• The most informative and elegant measure of central tendency
• The average
• The fulcrum point of the distribution

[Figure: two number lines, one with scores 2, 4, 6, 8, 10 and one with scores 2, 4, 6, 8, 15]
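
The "fulcrum" idea can be checked numerically: deviations above and below the mean cancel out, so they sum to zero. The sketch below uses the two sets of scores shown on the slide; the helper name `deviations_from_mean` is an assumption for illustration.

```python
import statistics

balanced = [2, 4, 6, 8, 10]
with_extreme = [2, 4, 6, 8, 15]  # same scores, but the highest is now 15


def deviations_from_mean(scores):
    """Signed distance of each score from the mean."""
    m = statistics.mean(scores)
    return [x - m for x in scores]


for scores in (balanced, with_extreme):
    m = statistics.mean(scores)
    total = sum(deviations_from_mean(scores))
    print(f"scores={scores}  mean={m}  sum of deviations={total}")
```

Moving the top score from 10 to 15 shifts the balance point from 6 to 7, while the deviations still sum to zero in both cases.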

Page 15

The Median

• The middlemost score in a distribution
• The scale value below which and above which 50% of the distribution falls
• Not the fulcrum: the halfway point

[Figure: two number lines, one with scores 2, 4, 6, 8, 10 and one with scores 2, 4, 6, 8, 15]

Page 16

The Median

• If N is odd, then the median is the center score
• If N is even, then the median is the average of the two centermost scores

[Figure: four number lines with scores 2, 4, 6, 8, 10; 2, 4, 6, 8, 15; 2, 4, 6, 8, 10, 12; and 2, 4, 6, 8, 10, 15]
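
The odd/even rule can be written out directly. This is a minimal sketch applied to the four sets of scores from the slide; Python's built-in `statistics.median` follows the same rule.

```python
def median(scores):
    """Middle score if N is odd, average of the two middle scores if N is even."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([2, 4, 6, 8, 10]))      # N = 5 (odd)
print(median([2, 4, 6, 8, 15]))      # N = 5 (odd), extreme top score
print(median([2, 4, 6, 8, 10, 12]))  # N = 6 (even)
print(median([2, 4, 6, 8, 10, 15]))  # N = 6 (even), extreme top score
```

Replacing the top score with 15 leaves the median unchanged in both the odd and the even case, which is why the median is described as the halfway point rather than the fulcrum.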

Page 17

The Median

• If the median occurs at a value where there are tied scores, use the tied score as the median

[Figure: dot plot on a scale of 2, 4, 6, 8, 10, 15 with tied scores stacked at 8 and 10]

Page 18

The Mode

• The most frequent score in the distribution

[Figure: dot plot on a scale of 2, 4, 6, 8, 10, 15 with tied scores stacked at 8 and 10]

Page 19

One more thing…

These measures of central tendency vary in their sampling stability, the match between a sample statistic (e.g., x̄) and the corresponding population parameter (e.g., μ).

Mode (least sampling stability) < Median < Mean (most sampling stability)

• Note: Roman (r, s, x̄) characters are used for sample statistics while Greek (ρ, σ, μ) characters are used for population statistics.
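
Sampling stability can be illustrated with a small simulation: draw many samples from one population and see how much each measure jumps around from sample to sample. Everything here is an assumption made for illustration (the made-up population of roughly normal whole-number scores, the sample size of 25, the 2,000 repetitions, and the name `sampling_spread`), and the sketch assumes Python 3.8 or later, where `statistics.mode` returns the first mode when several scores tie.

```python
import random
import statistics


def sampling_spread(population, sample_size=25, repetitions=2000, seed=0):
    """Standard deviation of each measure across repeated random samples.

    A smaller spread means the measure changes less from sample to sample,
    i.e. it has more sampling stability.
    """
    rng = random.Random(seed)
    results = {"mean": [], "median": [], "mode": []}
    for _ in range(repetitions):
        sample = rng.sample(population, sample_size)
        results["mean"].append(statistics.mean(sample))
        results["median"].append(statistics.median(sample))
        results["mode"].append(statistics.mode(sample))
    return {name: statistics.pstdev(values) for name, values in results.items()}


# Hypothetical population: 10,000 whole-number scores, roughly normal
# with mean 50 and standard deviation 10.
population_rng = random.Random(1)
population = [round(population_rng.gauss(50, 10)) for _ in range(10_000)]

spread = sampling_spread(population)
for name, value in spread.items():
    print(f"{name:<7} varies across samples with SD = {value:.2f}")
```

For a roughly normal population like this one, the mean should vary least across samples and the mode most, matching the ordering on the slide.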

Page 20

Review of central tendency

• Which one is the only appropriate measure for nominal data?
  - The mode
• How do you find the median when there is an odd number of scores?
  - Simply locate the score in the middle
• …when there is an even number of scores?
  - Average the two middle scores
• Which measure is most sensitive to extreme scores and why?
  - The mean, because it takes all scores into account and can be swayed by positive or negative skew
• Which measure has the most sampling stability and why?
  - The mean, because it is the most accurate representation of the overall sample

Page 21

Application of central tendency

In 2006, the median home price in Boston was $386,300. (San Francisco was $518,400; Washington, D.C. was $258,700.)

How do you interpret these numbers?

Why are housing prices framed in terms of the median rather than the mean or the mode?

Page 22

Measures of variability

• Measures of central tendency…
  - indicate the typical scores in a distribution
  - are related to skew (horizontal)
• Measures of variability…
  - show the dispersion of scores in a distribution
  - are related to kurtosis (vertical)

Page 23

Measures of variability

• Range: the difference between the highest and lowest score
• Variance: the average squared distance (deviation) of the scores from the mean
• Standard deviation: the square root of the variance, roughly the average distance (deviation) of the scores from the mean

Page 24

Measures of variability

Range = Highest Score – Lowest Score

Most sensitive to extreme scores!

[Figure: two number lines, one with scores 2, 4, 6, 8, 10 and one with scores 2, 4, 6, 8, 15]
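
The formula and its sensitivity to a single extreme score can be shown with the two sets of scores from the slide. The function name `score_range` is an assumption for illustration.

```python
def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)


original = [2, 4, 6, 8, 10]
with_extreme = [2, 4, 6, 8, 15]  # only the highest score has changed

print(score_range(original))
print(score_range(with_extreme))
```

Changing one score moves the range from 8 to 13, because the range depends only on the two most extreme values.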

Page 25

Measures of variability

• Again, variance summarizes the overall distance of all scores from the mean (it requires squaring the distance of each score from the mean, then averaging the squares)
• Not as useful as the standard deviation, the average distance scores fall from the mean, because variance is expressed in squared units
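
The steps from deviations to variance to standard deviation can be worked through on the scores 2, 4, 6, 8, 10 used elsewhere in these slides. The slides do not say whether to divide by N or by N - 1, so this sketch shows both: dividing by N treats the scores as a whole population, dividing by N - 1 treats them as a sample.

```python
import math

scores = [2, 4, 6, 8, 10]

# Step 1: distance of each score from the mean.
mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]

# Step 2: square each distance so negatives do not cancel positives.
squared_deviations = [d ** 2 for d in deviations]
sum_of_squares = sum(squared_deviations)

# Step 3: average the squared distances to get the variance.
population_variance = sum_of_squares / len(scores)    # divide by N
sample_variance = sum_of_squares / (len(scores) - 1)  # divide by N - 1

# Step 4: take the square root to return to the original units.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(f"mean = {mean}, sum of squares = {sum_of_squares}")
print(f"variance: population = {population_variance}, sample = {sample_variance}")
print(f"standard deviation: population = {population_sd:.2f}, sample = {sample_sd:.2f}")
```

Because the variance is in squared units, the standard deviation is the one that can be read on the same scale as the scores themselves.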

Page 26

Measures of variability

• Standard deviation, like the mean, is the most informative and elegant measure of variability.
  - The average distance of scores from the mean score: deviation is distance!
• Also like the mean, standard deviation has the most sampling stability

[Figure: number line with scores 2, 4, 6, 8, 10]

Page 27

How would these standard deviations differ?

[Figure: two dot plots. The first shows scores 2, 4, 6, 8, 10 (Mean = 6, Range = 8). The second runs from 2 to 12 with tied scores stacked at 8 and 10 (Mean = 7.9, Range = 10).]

Page 28

Standard deviation and shape of distribution

[Figure: two distributions with the same mean. Scores 0, 5, 10, 15, 20, 25, 30: Mean = 15, Std. Dev. = 10. Scores 14, 14, 14, 15, 15, 16, 16: Mean = 15, Std. Dev. = 0.9.]
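
The two standard deviations on this slide can be recomputed, assuming the scores are 0, 5, 10, 15, 20, 25, 30 and 14, 14, 14, 15, 15, 16, 16 as read from the slide. That reading is an assumption, as are the labels `wide` and `narrow`.

```python
import statistics

# Scores as read from the slide (an assumption about the original figure).
wide = [0, 5, 10, 15, 20, 25, 30]
narrow = [14, 14, 14, 15, 15, 16, 16]

for label, scores in (("wide", wide), ("narrow", narrow)):
    mean = statistics.mean(scores)
    population_sd = statistics.pstdev(scores)  # divides by N
    sample_sd = statistics.stdev(scores)       # divides by N - 1
    print(f"{label:<7} mean = {mean:.1f}  "
          f"SD (N) = {population_sd:.1f}  SD (N - 1) = {sample_sd:.1f}")
```

Under this reading, the slide's value of 10 for the wide set matches the divide-by-N formula, while 0.9 for the narrow set matches the divide-by-(N - 1) formula, and the narrow set's mean is about 14.9, which the slide rounds to 15. Either way, the tightly clustered scores have a far smaller standard deviation than the spread-out ones.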

Page 29

Properties of Normal Distributions

• All normal distributions are single peaked, symmetric, and bell-shaped
• Normal distributions can have different values for mean and standard deviation but…
• All normal distributions follow the 68-95-99.7 rule
  - 68.3% of data within 1 standard deviation of the mean
  - 95.4% of data within 2 standard deviations of the mean
  - 99.7% of data within 3 standard deviations of the mean
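
These percentages can be reproduced from the normal curve itself. For a normal distribution, the share of data within k standard deviations of the mean equals erf(k / √2), which Python's standard library can evaluate; the function name `share_within` is an assumption for illustration.

```python
import math


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {100 * share_within(k):.1f}%")
```

The result does not depend on the particular mean or standard deviation, which is why the rule holds for every normal distribution.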

Page 30

[Figure: normal curve centered on the mean, with nested bands marking 68.3%, 95.4%, and 99.7% of the data]