Data Analysis
Statistics
OVERVIEW
• Getting Ready for Data Collection
• The Data Collection Process
• Getting Ready for Data Analysis
• Descriptive Statistics
GETTING READY FOR DATA COLLECTION
Four steps
• Constructing a data collection form
• Establishing a coding strategy
• Collecting the data
• Entering data onto the collection form
THE DATA COLLECTION PROCESS
• Begins with raw data
  – Raw data are unorganized data
CONSTRUCTING DATA COLLECTION FORMS
ID   Gender   Grade   Building   Reading Score   Mathematics Score
1    2        8       1          55              60
2    2        2       6          41              44
3    1        8       6          46              37
4    2        4       6          56              59
5    2        10      6          45              32
One column for each variable
One row for each subject
CODING DATA
• Use single digits when possible
• Use codes that are simple and unambiguous
• Use codes that are explicit and discrete
Variable            Range of Data Possible    Example
ID Number           001 through 200           138
Gender              1 or 2                    2
Grade               1, 2, 4, 6, 8, or 10      4
Building            1 through 6               1
Reading Score       1 through 100             78
Mathematics Score   1 through 100             69
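As a minimal sketch, the coding scheme above can be written down as a table of allowed codes and used to check each row of the data collection form before analysis. The names `CODEBOOK`, `RECORDS`, and `invalid_fields` are hypothetical; the five records are the sample rows from the data collection form.

```python
# Hypothetical codebook: each variable maps to its set of legal codes,
# taken from the "Range of Data Possible" column.
CODEBOOK = {
    "id": set(range(1, 201)),         # 001 through 200
    "gender": {1, 2},
    "grade": {1, 2, 4, 6, 8, 10},
    "building": set(range(1, 7)),     # 1 through 6
    "reading": set(range(1, 101)),    # 1 through 100
    "math": set(range(1, 101)),       # 1 through 100
}

# One dict per subject (row), one key per variable (column).
RECORDS = [
    {"id": 1, "gender": 2, "grade": 8, "building": 1, "reading": 55, "math": 60},
    {"id": 2, "gender": 2, "grade": 2, "building": 6, "reading": 41, "math": 44},
    {"id": 3, "gender": 1, "grade": 8, "building": 6, "reading": 46, "math": 37},
    {"id": 4, "gender": 2, "grade": 4, "building": 6, "reading": 56, "math": 59},
    {"id": 5, "gender": 2, "grade": 10, "building": 6, "reading": 45, "math": 32},
]


def invalid_fields(record):
    """Return the names of variables whose value is missing or not a legal code."""
    return [name for name, allowed in CODEBOOK.items() if record.get(name) not in allowed]


for rec in RECORDS:
    problems = invalid_fields(rec)
    print(rec["id"], "OK" if not problems else problems)
```

Keeping the legal codes in one place means a miscoded entry (for example, a Grade of 5) is caught at data entry rather than during analysis.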
Interpretation
• The process of making pertinent inferences and drawing conclusions concerning the meaning and implications of a research investigation
The Basics
Descriptive statistics      Inferential statistics
Sample statistics           Population parameters
Sample → Population
Sample statistics
• Variables in a sample or measures computed from sample data
Population parameters
• The variables in a population or measured characteristics of the population
Making Data Usable
…or what to do with all those numbers
Descriptive Statistics
Frequency Distributions
• Organizing a set of data by summarizing the number of times a particular value of a variable occurs
Frequency distribution of ice cream consumption
Age      Frequency (number in range)
0        25
1-5      15
6-10     8
11-15    2
TOTAL    50
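A frequency distribution is simply a tally of how many times each value occurs. A minimal Python sketch using the standard library's `collections.Counter`, applied to the Grade column of the five sample records on the data collection form (the variable names are illustrative):

```python
from collections import Counter

# Grade codes for subjects 1-5 from the data collection form.
grades = [8, 2, 8, 4, 10]

# Counter tallies how many times each value occurs.
freq = Counter(grades)

# Print the distribution in ascending order of grade, then the total.
for grade in sorted(freq):
    print(grade, freq[grade])
print("TOTAL", sum(freq.values()))
```

The TOTAL row always equals the number of observations, which is a quick check that no case was dropped.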
Percentage Distributions
• Organizing the frequency distribution into a chart or graph that summarizes the percentage values associated with particular values of a variable
Proportion
• The share of elements that meet some criterion, expressed as a percentage, fraction, or decimal
Percentage distribution of ice cream consumption by age
Age      Percent (of people who consumed ice cream in range)
0        50%
1-5      30%
6-10     16%
11-15    4%
TOTAL    100%
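Each percentage is the frequency for a range divided by the total (50) and multiplied by 100. A short sketch that derives the percentage column from the frequency column of the ice cream table (the dictionary layout is just one convenient representation):

```python
# Frequencies from the ice cream consumption table (range label -> count).
frequencies = {"0": 25, "1-5": 15, "6-10": 8, "11-15": 2}

total = sum(frequencies.values())  # 50 people in all

# Percentage = frequency / total * 100 (the proportion is frequency / total).
percentages = {label: 100 * count / total for label, count in frequencies.items()}

for label, pct in percentages.items():
    print(f"{label:>6} {pct:5.1f}%")
print(f"{'TOTAL':>6} {sum(percentages.values()):5.1f}%")
```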
Graphic Representations of Data
[Figure: Pie chart, Ice cream consumption; segments: Winter, Spring, Summer, Fall]
[Figure: Bar chart, Frequency of Seasonal Ice Cream Consumption; x-axis: Winter, Spring, Summer, Fall; y-axis: Amt (0 to 90)]
[Figure: Bar chart, Frequency of Seasonal Ice Cream Consumption Shown by Gender; series: Male, Female; x-axis: Winter, Spring, Summer, Fall; y-axis: 0 to 90]
Graphical representation of results from cross tab
Cross tabulation
Cross tabulation:
– a technique for organizing data by groups, categories or classes, thus facilitating comparisons;
– a joint frequency distribution of observations on two or more sets of variables
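A cross tabulation counts how often each combination of two variables' values occurs. As a sketch, the following tallies the joint frequency of Gender and Building for the five sample records on the data collection form; the helper `crosstab` is a hypothetical name, not a library function.

```python
from collections import Counter


def crosstab(rows, cols):
    """Return (row values, column values, joint counts) for two paired lists of codes."""
    counts = Counter(zip(rows, cols))  # joint frequency of each (row, col) pair
    return sorted(set(rows)), sorted(set(cols)), counts


# Gender and Building codes for subjects 1-5 from the data collection form.
gender = [2, 2, 1, 2, 2]
building = [1, 6, 6, 6, 6]

row_vals, col_vals, counts = crosstab(gender, building)
print("Gender x Building", *col_vals)
for r in row_vals:
    # Counter returns 0 for combinations that never occur.
    print(r, *(counts[(r, c)] for c in col_vals))
```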
Types of Cross tabs
• Contingency table: the results of a cross tabulation of two variables, such as survey questions
Cross tab of question: Do you have children under the age of six currently living with you? (2 x 2 table)
           Yes    No    Total
Males        5    15       20
Females     10    20       30
Total       15    35       50
Types of Cross tabs
• Percentage cross tab: Using percentages helps us make relative comparisons. The total number of respondents/observations, either overall or within each row or column, may be used as the base for computing the percentage in each cell; the table below uses each row's total as the base.
Percentage Cross tab: Do you have children under the age of six currently living with you?
           Yes       No        Total
Males      25%       75%       100% (20)
Females    33.33%    66.67%    100% (30)
Total      30%       70%       100% (50)
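The row percentages can be reproduced from the counts in the 2 x 2 contingency table by dividing each cell by its row total (for example, 5 of the 20 males answered Yes, which is 25%). A minimal sketch; the function name `row_percentages` is illustrative:

```python
# Cell counts from the 2 x 2 contingency table: (Yes, No) for each group.
counts = {"Males": (5, 15), "Females": (10, 20)}

# Column sums give the Total row (15 Yes, 35 No); the right side is
# evaluated before the "Total" key is added.
counts["Total"] = (sum(y for y, _ in counts.values()), sum(n for _, n in counts.values()))


def row_percentages(yes, no):
    """Percentages using the row total as the base, rounded to 2 decimals."""
    base = yes + no
    return round(100 * yes / base, 2), round(100 * no / base, 2), base


table = {group: row_percentages(*cells) for group, cells in counts.items()}
for group, (yes_pct, no_pct, base) in table.items():
    print(f"{group:<8} {yes_pct:6.2f}% {no_pct:6.2f}%  100% ({base})")
```

Using the row total as the base answers "within each gender, what share said Yes?"; dividing by column totals instead would answer "among those who said Yes, what share were male?".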
Elaboration Analysis of Cross tabs
• Analysis of the basic cross tab for each level of another variable, such as subgroups of the same sample
Cross tab (counts) by age group: Do you have children under the age of six currently living with you?

Aged 17-25
          Male    Female
Yes          0         2
No          10        20

Aged 25 and up
          Male    Female
Yes          5         8
No           0         0
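Elaboration repeats the same comparison inside each subgroup. A short sketch that computes, for each age group and gender, the percentage answering Yes from the counts in the two subgroup tables; the nested-dictionary layout and the name `pct_yes` are assumptions made for illustration.

```python
# (Yes, No) counts by age group and gender, from the subgroup tables.
subgroups = {
    "Aged 17-25": {"Male": (0, 10), "Female": (2, 20)},
    "Aged 25 and up": {"Male": (5, 0), "Female": (8, 0)},
}


def pct_yes(yes, no):
    """Percentage answering Yes, using the gender's column total as the base."""
    return round(100 * yes / (yes + no), 1)


result = {
    age: {sex: pct_yes(*cells) for sex, cells in by_sex.items()}
    for age, by_sex in subgroups.items()
}
for age, by_sex in result.items():
    print(age, by_sex)
```

Splitting by age shows that the Yes answers come almost entirely from the older group, a pattern the combined table cannot reveal.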
Calculating Rank Data
Please place in rank order the following varieties of cookies (1 = most preferred to 4 = least preferred)
__ Chocolate chip   __ Marshmallow   __ Oatmeal   __ Oreo

Respondent   Chocolate chip   Marshmallow   Oatmeal   Oreo
1            1                2             4         3
2            1                3             4         2
3            2                1             3         4
4            2                4             3         1
5            2                1             3         4
6            3                4             1         2
7            2                3             1         4
8            1                4             2         3
9            4                3             2         1
10           2                1             3         4
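Chocolate chip: (3X1) + (5X2) + (1X3) + (1X4) = 20
Marshmallow: (3X1) + (1X2) + (3X3) + (3X4) = 26
Oatmeal: (2X1) + (2X2) + (4X3) + (2X4) = 26
Oreo: (2X1) + (2X2) + (2X3) + (4X4) = 28
Each term multiplies the number of respondents who gave a particular rank by that rank. Because 1 = most preferred, the lowest total marks the most preferred variety (here, chocolate chip).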
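Summing each column of the rank table gives the same totals as counting how many respondents chose each rank and weighting by the rank. A short sketch (variable names are illustrative):

```python
# Rank table: one row per respondent, columns in the order of `varieties`.
varieties = ["Chocolate chip", "Marshmallow", "Oatmeal", "Oreo"]
ranks = [
    [1, 2, 4, 3],
    [1, 3, 4, 2],
    [2, 1, 3, 4],
    [2, 4, 3, 1],
    [2, 1, 3, 4],
    [3, 4, 1, 2],
    [2, 3, 1, 4],
    [1, 4, 2, 3],
    [4, 3, 2, 1],
    [2, 1, 3, 4],
]

# A column sum equals sum over k of (number of rank-k votes) x k.
totals = {name: sum(row[i] for row in ranks) for i, name in enumerate(varieties)}

# Lower total = more preferred overall (1 = most preferred).
for name in sorted(totals, key=totals.get):
    print(name, totals[name])
```

A useful check: with 10 respondents each assigning ranks 1 through 4, the four totals must add up to 10 x (1 + 2 + 3 + 4) = 100.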
Measures of central tendency
• Mode: the value that occurs most often
• Median: the midpoint; the value below which half the values in a distribution fall
• Mean: the arithmetic average
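Python's standard `statistics` module provides all three measures. A minimal sketch applying them to columns of the data collection form: the mode to the nominal Gender codes, and the median and mean to the Reading Scores.

```python
import statistics

gender = [2, 2, 1, 2, 2]          # nominal codes: only the mode is meaningful
reading = [55, 41, 46, 56, 45]    # reading scores for subjects 1-5

print("Mode of gender:", statistics.mode(gender))       # most frequent code
print("Median reading:", statistics.median(reading))    # middle of the sorted scores
print("Mean reading:", statistics.mean(reading))        # arithmetic average
```

Averaging the gender codes would produce a number (1.8) with no meaning, which is why the level of measurement limits which statistic is appropriate.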
Remember: what type of scale you use determines the type of statistic you may calculate
WHEN TO USE WHICH MEASURE
Measure of Central Tendency   Level of Measurement   Use When                                        Examples
Mode                          Nominal                Data are categorical                            Eye color, party affiliation
Median                        Ordinal                Data include extreme scores                     Rank in class, birth order
Mean                          Interval and ratio     You can, and the data fit (no extreme scores)   Speed of response, age in years
Measures of dispersion
How far do measurements tend to depart from the central tendency?
• Range: simplest measure of dispersion
• Deviation scores: quantitative index of dispersion
  – Variance: the sum of squared deviation scores divided by the sample size minus 1; often used. Variance is in squared units (e.g., squared dollars).
  – Standard deviation: square root of the variance
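A sketch of the three measures for the five Reading Scores on the data collection form, written out step by step as s² = Σ(x − mean)² / (n − 1) and s = √s², then checked against the standard `statistics` module, whose `variance` and `stdev` use the same n − 1 denominator:

```python
import math
import statistics

reading = [55, 41, 46, 56, 45]
n = len(reading)
mean = sum(reading) / n                                # 48.6

data_range = max(reading) - min(reading)               # 56 - 41 = 15
deviations = [x - mean for x in reading]               # deviation scores
variance = sum(d * d for d in deviations) / (n - 1)    # squared units
std_dev = math.sqrt(variance)                          # back in the original units

print("Range:", data_range)
print("Variance:", round(variance, 2))
print("Standard deviation:", round(std_dev, 2))

# The library functions use the same n - 1 (sample) formula.
assert math.isclose(variance, statistics.variance(reading))
assert math.isclose(std_dev, statistics.stdev(reading))
```

Taking the square root returns the spread to the same units as the scores, which is why the standard deviation is usually reported instead of the variance.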
MEASURES OF VARIABILITY
• Variability is the degree of spread or dispersion in a set of scores
• Range: difference between the highest and lowest score
• Standard deviation: roughly, the average distance of each score from the mean (the square root of the variance)
THE MEAN AND THE STANDARD DEVIATION
STANDARD DEVIATIONS AND % OF CASES
• The normal curve is symmetrical
• The area between the mean and one standard deviation on either side contains about 34% of the area under the curve
• About 68% of scores lie within ±1 standard deviation of the mean
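The 34% and 68% figures are rounded values from the standard normal distribution. A sketch computing them with `math.erf`, using the identity that the proportion within z standard deviations of the mean is erf(z / √2); the helper name `within` is illustrative:

```python
import math


def within(z):
    """Proportion of a normal distribution lying within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))


# By symmetry, mean to +1 SD holds half of the ±1 SD area.
one_side = within(1) / 2

print(f"Mean to 1 SD: {one_side:.2%}")      # about 34.13%
print(f"Within ±1 SD: {within(1):.2%}")     # about 68.27%
```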