Data Analysis Using SPSS, by Dr. R. RAVANAN, Associate Professor, Department of Statistics, Presidency College, Chennai – 600 005. E-mail: [email protected] Mobile: 98403 75672 / 94442 21627


Page 1: SPSS Def + Example_new_1!1!2011

Data Analysis Using SPSS

By

Dr. R. RAVANAN
Associate Professor
Department of Statistics
Presidency College
Chennai – 600 005

E-mail: [email protected]
Mobile: 98403 75672 / 94442 21627

Page 2: SPSS Def + Example_new_1!1!2011

What is SPSS?

Statistical Package for the Social Sciences

General-purpose statistical software

Consists of three components:
– Data Window: data entry and database (.sav)
– Output Window: all output from any SPSS session (.spv; .spo before version 16)
– Syntax Window: command lines (.sps)

Page 3: SPSS Def + Example_new_1!1!2011

Data Entry & Preparation

– Data entry: new or recalled (SPSS or non-SPSS)
– Data definition
– Data manipulation and variable development

Page 4: SPSS Def + Example_new_1!1!2011

Data Definition

Purpose: Give meanings to the numbers for ease of reading the output

Involves:
– Data Format
– Variable Name
– Value Labels
– Missing Values

Command: Data → Data Definition

Page 5: SPSS Def + Example_new_1!1!2011

Data Manipulation

Recoding

To give new values to old values (especially reversing negatively worded questions)

To form a nominal variable from continuous data

Variable Development

To form new variables as combinations or functions of old ones

Command: Transform → Recode / Compute
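The two operations can be illustrated outside SPSS. The following is a minimal Python sketch, not SPSS syntax; the item names (q1, q2, q3), the 5-point scale, the age cut points and the data are all hypothetical and chosen only for illustration.

```python
# Hypothetical responses to three 5-point Likert items (1 = strongly disagree,
# 5 = strongly agree). q3 is assumed to be the negatively worded item.
responses = [
    {"q1": 4, "q2": 5, "q3": 2, "age": 23},
    {"q1": 3, "q2": 3, "q3": 4, "age": 41},
    {"q1": 5, "q2": 4, "q3": 1, "age": 58},
]


def reverse_code(value, scale_max=5):
    """Recode: map 1..scale_max onto scale_max..1."""
    return scale_max + 1 - value


def age_group(age):
    """Recode a continuous variable into a nominal one (illustrative cut points)."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50 and over"


for row in responses:
    row["q3_rev"] = reverse_code(row["q3"])   # Recode: reverse the negative item
    row["age_grp"] = age_group(row["age"])    # Recode: continuous to nominal
    # Compute: a new variable as a function (here the mean) of old ones
    row["attitude"] = (row["q1"] + row["q2"] + row["q3_rev"]) / 3

for row in responses:
    print(row["q3_rev"], row["age_grp"], round(row["attitude"], 2))
```

Reverse coding is done before the composite is computed so that a high score means the same thing on every item.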

Page 6: SPSS Def + Example_new_1!1!2011

Data Analysis - Descriptive

Purpose: To describe each variable. What is the current level of the variable of interest?

Statistics: Frequency, Mean, Minimum, Maximum, Standard Deviation, Quartiles

Command: Analyze → Frequencies / Descriptives
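As a rough illustration of what these statistics report, here is a minimal Python sketch using only the standard library. It uses the nine marks of the regular students from Exercise 2 as sample data. The quartiles rely on the default "exclusive" method of statistics.quantiles, which is an assumption of this sketch; other packages may interpolate differently.

```python
import statistics
from collections import Counter

# Marks of the nine regular students from Exercise 2.
marks = [70, 78, 75, 71, 73, 59, 78, 69, 72]

freq = Counter(marks)                    # frequency of each distinct value
mean = statistics.mean(marks)
sd = statistics.stdev(marks)             # sample standard deviation (n - 1)
q1, median, q3 = statistics.quantiles(marks, n=4)

print("Frequencies:", sorted(freq.items()))
print(f"Mean = {mean:.2f}, SD = {sd:.2f}")
print(f"Min = {min(marks)}, Max = {max(marks)}")
print(f"Quartiles: Q1 = {q1}, Median = {median}, Q3 = {q3}")
```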

Page 7: SPSS Def + Example_new_1!1!2011

Data Analysis - Descriptive

Frequencies for two or more nominal variables

Command: Analyze → Summarize → Crosstabulation

Means of variables by subgroups defined by one or more nominal variables

Command: Analyze → Compare Means → Means (use of Levels)

Page 8: SPSS Def + Example_new_1!1!2011

Parametric Test of Differences

When: The dependent variable is continuous and we want to test differences across groups

Command: Analyze → Compare Means → Independent t-test / Paired t-test / One-way ANOVA

Page 9: SPSS Def + Example_new_1!1!2011

Non-Parametric Test of Differences

When: The dependent variable is ordinal or the normality assumption is not met

Command: Analyze → Non-parametric → 2 independent samples / 2 related samples / k independent samples / k related samples

Page 10: SPSS Def + Example_new_1!1!2011

Parametric Two-Way ANOVA

When: Continuous dependent variable and related groups

Command: Analyze → General Linear Model → Simple

Note: Fixed Factor Effect

Page 11: SPSS Def + Example_new_1!1!2011

Bivariate Relationship

When: Covariation between two variables

Correlation: When both are continuous or ordinal

Command: Analyze → Correlate → Bivariate (with option for Spearman if both ordinal)

Page 12: SPSS Def + Example_new_1!1!2011

Regression Analysis

When: To establish the relationship between one continuous dependent variable and a number of continuous independent variables

Command: Analyze → Regression → Linear (use Statistics, Save options)

Issues: Assumptions of regression: normality, constant variance, independence of independent variables, independence of error terms

Page 13: SPSS Def + Example_new_1!1!2011

Regression Analysis

Issues (cont.):
– Outliers and leverage values
– Choice of selection method of independent variables: Enter, Backward, Forward, Stepwise
– Dummy independent variables

Options: Residual Analysis, Influence Statistics, Collinearity Diagnostics, Normality Plots

Page 14: SPSS Def + Example_new_1!1!2011

Regression Analysis

Interpretation:
– Goodness of model: R², F-statistic, adjusted R², standard error
– Strength of influence of independent variables: beta and standardized beta

Page 15: SPSS Def + Example_new_1!1!2011

Reliability Analysis

When: Before forming a composite index for a variable from a number of items

Command: Analyze → Scale → Reliability Analysis (with option for Descriptives: item, scale, scale if item deleted)

Interpretation: An alpha value greater than 0.7 is good; more than 0.5 is acceptable; delete some items if necessary
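The alpha referred to here is Cronbach's alpha, commonly computed as alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score), where k is the number of items. The following Python sketch shows that calculation under stated assumptions: the scores are hypothetical (5 respondents, 3 items) and sample variances (n - 1 divisor) are used.

```python
import statistics

# Hypothetical data: 5 respondents (rows) answering 3 items (columns).
scores = [
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 2],
    [4, 4, 5],
]


def cronbach_alpha(rows):
    """Cronbach's alpha from a respondents-by-items table of scores."""
    k = len(rows[0])                                          # number of items
    item_vars = [statistics.variance(col) for col in zip(*rows)]
    total_var = statistics.variance([sum(r) for r in rows])   # variance of row totals
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Recomputing alpha with each item left out in turn gives the "scale if item deleted" figures mentioned in the command options.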

Page 16: SPSS Def + Example_new_1!1!2011

Measures of Reliability

Internal consistency (of items in a scale):

1. Average inter-item correlation: if the average inter-item correlation > 0.6, then standardize the items and add them together as an index.

2. Cronbach's alpha, which measures "internal consistency of items in a scale" (Garson, G.D., 1999) and is

Page 17: SPSS Def + Example_new_1!1!2011

Factor Analysis

When: To reduce the number of variables to underlying dimensions

Command: Analyze → Data Reduction → Factor (Option: rotation, save factor scores)

Issues: Assumes sufficient correlations between the variables (Bartlett test; anti-image; KMO measure of sampling adequacy)

Page 18: SPSS Def + Example_new_1!1!2011

Discriminant Analysis

When: The dependent variable is nominal and the purpose is to predict group membership on the basis of independent variables

Command: Analyze → Classify → Discriminant (Option: Classify by summary tables; Select for holdout and analysis samples)

Issues: Similar to regression

Page 19: SPSS Def + Example_new_1!1!2011

Discriminant Analysis

Interpretation:
– Goodness of analysis: hit ratio, compared to maximum chance, proportional chance and Press's Q
– Univariate results: to establish the discriminating variables

Page 20: SPSS Def + Example_new_1!1!2011

Exercise 1: t-TEST FOR SINGLE MEAN

Problem:

The satisfaction levels of 12 employees with their current jobs are given below:

Emp No              1  2  3  4  5  6  7  8  9  10  11  12

Satisfaction level  S  HS N  HS D  S  HS N  S  HS  S   HS

Test whether the level of satisfaction is above the average level at the 1% level of significance.

Page 21: SPSS Def + Example_new_1!1!2011

Solution:

1. Null Hypothesis: The level of satisfaction of employees is equal to average level.

2. Alternate Hypothesis: The level of satisfaction of employees is not equal to average level

3. Test Statistic: t test for single mean is
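The usual one-sample statistic is t = (sample mean - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom, where s is the sample standard deviation. The sketch below applies it to the exercise data. The numeric coding (HS = 5, S = 4, N = 3, D = 2) and the test value mu0 = 3 for the "average level" are assumptions; the exercise does not state them.

```python
import math
import statistics

# Assumed coding, not given in the exercise: HS = 5, S = 4, N = 3, D = 2.
codes = {"HS": 5, "S": 4, "N": 3, "D": 2}
levels = "S HS N HS D S HS N S HS S HS".split()
scores = [codes[x] for x in levels]

mu0 = 3                                   # assumed "average level"
n = len(scores)
mean = statistics.mean(scores)
sd = statistics.stdev(scores)             # sample standard deviation (n - 1)

t_stat = (mean - mu0) / (sd / math.sqrt(n))
df = n - 1
print(f"mean = {mean:.3f}, sd = {sd:.3f}, t = {t_stat:.3f}, df = {df}")
```

The computed t is then compared with the tabulated t value for 11 degrees of freedom at the chosen significance level.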

Page 22: SPSS Def + Example_new_1!1!2011

Exercise 2: t-TEST FOR DIFFERENCE OF TWO MEANS (INDEPENDENT SAMPLES)

Problem:

The marks obtained by a group of 9 regular students and another group of 11 part-time students in a test are given below:

Regular    70 78 75 71 73 59 78 69 72

Part-time  62 70 71 62 60 56 69 64 72 68 66

Examine whether the marks obtained by regular and part-time students differ significantly at the 5% level of significance.

Page 23: SPSS Def + Example_new_1!1!2011

Solution:

1. Null Hypothesis: There is no significant difference between the average marks obtained by regular and part-time students.

2. Alternate Hypothesis: There is a significant difference between the average marks obtained by regular and part-time students.

3. Test Statistic: t test for difference of two means is
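A common form of this statistic is t = (mean1 - mean2) / sqrt(sp^2 * (1/n1 + 1/n2)), where sp^2 is the pooled variance and the degrees of freedom are n1 + n2 - 2. The Python sketch below applies it to the exercise data, assuming equal population variances (the pooled-variance version of the test).

```python
import math
import statistics

regular = [70, 78, 75, 71, 73, 59, 78, 69, 72]
part_time = [62, 70, 71, 62, 60, 56, 69, 64, 72, 68, 66]

n1, n2 = len(regular), len(part_time)
mean1, mean2 = statistics.mean(regular), statistics.mean(part_time)
var1, var2 = statistics.variance(regular), statistics.variance(part_time)

# Pooled variance, assuming the two populations share a common variance.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

t_stat = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```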

Page 24: SPSS Def + Example_new_1!1!2011

Exercise 3: PAIRED ‘t’ TEST FOR DIFFERENCE OF TWO MEANS (DEPENDENT SAMPLES)

Problem: A company arranged an intensive training course for its team of salesmen. A random sample of 10 salesmen was selected, and the values (in ‘000) of their sales in the weeks immediately before and after the course are shown in the following table:

Salesmen 1 2 3 4 5 6 7 8 9 10

Sales Before 12 23 5 18 10 21 19 15 8 14

Sales After 18 22 15 21 13 22 17 19 12 16

Test whether there is evidence of an increase in mean sales.

Page 25: SPSS Def + Example_new_1!1!2011

Solution:

1. Null Hypothesis: There is no significant difference between mean sales before and after the training course.

2. Alternate Hypothesis: There is a significant difference between mean sales before and after the training course.

3. Test Statistic: Paired t test for difference of two means is
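The paired statistic is usually written t = mean(d) / (s_d / sqrt(n)), where d is the after-minus-before difference for each salesman and s_d is the sample standard deviation of those differences, with n - 1 degrees of freedom. A minimal Python sketch on the exercise data:

```python
import math
import statistics

before = [12, 23, 5, 18, 10, 21, 19, 15, 8, 14]
after = [18, 22, 15, 21, 13, 22, 17, 19, 12, 16]

# Work on the per-salesman differences (after - before).
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)

t_stat = mean_d / (sd_d / math.sqrt(n))
df = n - 1
print(f"mean difference = {mean_d}, t = {t_stat:.3f}, df = {df}")
```

Because the problem asks about an increase, a positive t is the direction of interest when reading the t table.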

Page 26: SPSS Def + Example_new_1!1!2011

Exercise 4: F-TEST FOR EQUALITY OF TWO VARIANCES

Problem: The times taken by workers to perform a job under two methods are given below:

Method I   20 16 26 27 23 22

Method II  27 33 42 35 32 34 38

Test whether there is any significant difference between the variances of the two time distributions.

Page 27: SPSS Def + Example_new_1!1!2011

Solution:

1. Null Hypothesis: There is no significant difference between the variances of Method I and Method II with regard to time distribution.

2. Alternate Hypothesis: There is a significant difference between the variances of Method I and Method II with regard to time distribution.

3. Test Statistic: F test for equality of variance is
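The variance-ratio statistic is F = s1^2 / s2^2, the ratio of the two sample variances. The sketch below follows the common textbook convention of putting the larger sample variance in the numerator; that convention is an assumption here, not something stated in the exercise.

```python
import statistics

method_1 = [20, 16, 26, 27, 23, 22]
method_2 = [27, 33, 42, 35, 32, 34, 38]

var_1 = statistics.variance(method_1)    # sample variances (n - 1 divisor)
var_2 = statistics.variance(method_2)

# Convention assumed here: larger variance in the numerator, so F >= 1.
if var_1 >= var_2:
    f_stat = var_1 / var_2
    df = (len(method_1) - 1, len(method_2) - 1)
else:
    f_stat = var_2 / var_1
    df = (len(method_2) - 1, len(method_1) - 1)

print(f"s1^2 = {var_1:.3f}, s2^2 = {var_2:.3f}, F = {f_stat:.3f}, df = {df}")
```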

Page 28: SPSS Def + Example_new_1!1!2011

Exercise 5: ANOVA (ONE WAY CLASSIFICATION)

Problem:

The following table gives the yields of 15 sample plots under three varieties of seed.

Variety A  20 20 23 16 20

Variety B  18 20 17 15 25

Variety C  25 28 22 28 32

Test whether there is a significant difference in the average yield of the three varieties of seed.

1. Null Hypothesis: There is no significant difference between the average yields of the three varieties of seed.

2. Alternate Hypothesis: There is a significant difference between the average yields of the three varieties of seed.
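One-way ANOVA splits the total variation into between-group and within-group sums of squares and forms F = (SS_between / (k - 1)) / (SS_within / (N - k)). A minimal Python sketch of that arithmetic on the three varieties, offered as an illustration and not as SPSS output:

```python
groups = {
    "A": [20, 20, 23, 16, 20],
    "B": [18, 20, 17, 15, 25],
    "C": [25, 28, 22, 28, 32],
}

all_values = [v for g in groups.values() for v in g]
n_total = len(all_values)
k = len(groups)
grand_mean = sum(all_values) / n_total

# Between-group variation: group means around the grand mean.
ss_between = sum(
    len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
)
# Within-group variation: observations around their own group mean.
ss_within = sum(
    (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_stat = ms_between / ms_within
print(f"SSB = {ss_between:.3f}, SSW = {ss_within:.3f}, F = {f_stat:.3f}, "
      f"df = ({k - 1}, {n_total - k})")
```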

Page 29: SPSS Def + Example_new_1!1!2011

Exercise 6: ANOVA (TWO WAY CLASSIFICATION)

Problem: Perform a two-way ANOVA and test for differences between varieties as well as between blocks for the following data.

Variety          Blocks
           1    2    3    4

A         52   56   48   44

B         43   41   45   38

C         39   39   41   41

1. Null Hypothesis: There is no significant difference between the mean yields of the varieties, nor between those of the blocks.

2. Alternate Hypothesis: There is a significant difference between the mean yields of the varieties, or between those of the blocks.
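With one observation per variety-block cell, the total sum of squares splits into variety, block and error parts, and each effect is tested against the error mean square on (r - 1)(c - 1) degrees of freedom. The following Python sketch shows that decomposition for the exercise data, assuming the additive model without interaction.

```python
yields = {
    "A": [52, 56, 48, 44],
    "B": [43, 41, 45, 38],
    "C": [39, 39, 41, 41],
}

rows = list(yields.values())
r, c = len(rows), len(rows[0])            # r varieties, c blocks
grand_mean = sum(sum(row) for row in rows) / (r * c)

row_means = [sum(row) / c for row in rows]          # variety means
col_means = [sum(col) / r for col in zip(*rows)]    # block means

ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
ss_variety = c * sum((m - grand_mean) ** 2 for m in row_means)
ss_block = r * sum((m - grand_mean) ** 2 for m in col_means)
ss_error = ss_total - ss_variety - ss_block

df_error = (r - 1) * (c - 1)
ms_error = ss_error / df_error
f_variety = (ss_variety / (r - 1)) / ms_error
f_block = (ss_block / (c - 1)) / ms_error
print(f"F(varieties) = {f_variety:.3f} on ({r - 1}, {df_error}) df")
print(f"F(blocks)    = {f_block:.3f} on ({c - 1}, {df_error}) df")
```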

Page 30: SPSS Def + Example_new_1!1!2011

Exercise 7: CHI SQUARE TEST FOR GOODNESS OF FIT

Problem: A company keeps records of accidents. During a recent safety review, a random sample of 60 accidents was selected and classified by the day of the week on which they occurred.

Day              Monday  Tuesday  Wednesday  Thursday  Friday

No of accidents  8       12       9          14        17

Test whether there is any evidence that accidents are more likely on some days than others.

Page 31: SPSS Def + Example_new_1!1!2011

Solution:

1. Null Hypothesis: Accidents are equally distributed over the days of the week.

2. Alternate Hypothesis: Accidents are not equally distributed over the days of the week

3. Test Statistic: Chi-square test for goodness of fit is
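The goodness-of-fit statistic is the sum over categories of (O - E)^2 / E, with (number of categories - 1) degrees of freedom. Under the null hypothesis each of the five days has the same expected count. A minimal Python sketch:

```python
observed = {
    "Monday": 8,
    "Tuesday": 12,
    "Wednesday": 9,
    "Thursday": 14,
    "Friday": 17,
}

total = sum(observed.values())
expected = total / len(observed)          # equal expected count per day under H0

chi_sq = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1
print(f"expected per day = {expected}, chi-square = {chi_sq:.3f}, df = {df}")
```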

Page 32: SPSS Def + Example_new_1!1!2011

Exercise 8: CHI SQUARE TEST FOR INDEPENDENCE OF ATTRIBUTES

Problem: The following table gives the data relating to the condition of child and condition of home. Test whether the two attributes are independent.

Condition of Child    Condition of Home
                      Clean    Dirty

Clean                 70       50

Fairly clean          80       20

Dirty                 35       45

Page 33: SPSS Def + Example_new_1!1!2011

Solution:

1. Null Hypothesis: There is no association between condition of child and condition of home.

2. Alternate Hypothesis: There is an association between condition of child and condition of home.

3. Test Statistic: Chi-square test for independence of attributes is
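For a contingency table the expected count of each cell under independence is (row total x column total) / N, the statistic is the sum of (O - E)^2 / E over all cells, and the degrees of freedom are (rows - 1)(columns - 1). A minimal Python sketch on the exercise table:

```python
# Rows: condition of child (Clean, Fairly clean, Dirty).
# Columns: condition of home (Clean, Dirty).
observed = [
    [70, 50],
    [80, 20],
    [35, 45],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for row, row_total in zip(observed, row_totals):
    for obs, col_total in zip(row, col_totals):
        exp = row_total * col_total / n   # expected count under independence
        chi_sq += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {chi_sq:.3f}, df = {df}")
```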

Page 34: SPSS Def + Example_new_1!1!2011

Exercise 9: TEST FOR SIGNIFICANCE OF CORRELATION COEFFICIENT

Problem:

Find the correlation coefficient between the income and expenditure of families for the following data. Also test whether the correlation coefficient is significant.

Income (in hundreds)       60 58 45 65 56 38 70

Expenditure (in hundreds)  55 50 40 60 62 45 63

Page 35: SPSS Def + Example_new_1!1!2011

Solution: First find the coefficient of correlation by using the formula

1. Null Hypothesis: There is no relationship between income and expenditure of the family.

2. Alternate Hypothesis: There is a relationship between income and expenditure of the family.

3. Test Statistic: t test for coefficient of correlation is
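Pearson's coefficient can be written r = Sxy / sqrt(Sxx * Syy), using sums of squares and products of deviations from the means, and the usual significance test is t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. A minimal Python sketch on the exercise data:

```python
import math

income = [60, 58, 45, 65, 56, 38, 70]
expenditure = [55, 50, 40, 60, 62, 45, 63]

n = len(income)
mean_x = sum(income) / n
mean_y = sum(expenditure) / n

# Sums of squares and cross-products of deviations from the means.
sxx = sum((x - mean_x) ** 2 for x in income)
syy = sum((y - mean_y) ** 2 for y in expenditure)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(income, expenditure))

r = sxy / math.sqrt(sxx * syy)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
df = n - 2
print(f"r = {r:.3f}, t = {t_stat:.3f}, df = {df}")
```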

Page 36: SPSS Def + Example_new_1!1!2011

Exercise 10: REGRESSION ANALYSIS

Problem: The following table gives the food expenditure, annual income and family size of 10 families. Fit a multiple regression equation of food expenditure on annual family income and family size.

Family  Annual Food Expenditure (‘000)  Annual Income (‘000)  Family Size (number in family)

1       5.2                             28                    3

2       5.1                             26                    3

3       5.6                             32                    2

4       4.6                             24                    1

5       11.3                            54                    4

6       8.1                             29                    2

7       7.8                             44                    3

8       5.8                             30                    2

9       5.1                             40                    1

10      18.0                            82                    6

Page 37: SPSS Def + Example_new_1!1!2011

The regression model is
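With Y as food expenditure, X1 as annual income and X2 as family size, the usual two-predictor linear model is Y = b0 + b1*X1 + b2*X2 + e. The Python sketch below estimates b0, b1 and b2 by least squares, solving the normal equations (X'X)b = X'y with a small hand-written Gaussian elimination. The helper name solve and the variable names are assumptions of this sketch, not part of the exercise.

```python
food = [5.2, 5.1, 5.6, 4.6, 11.3, 8.1, 7.8, 5.8, 5.1, 18.0]
income = [28, 26, 32, 24, 54, 29, 44, 30, 40, 82]
size = [3, 3, 2, 1, 4, 2, 3, 2, 1, 6]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):                     # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Design matrix with an intercept column: each row is (1, income, size).
X = [[1.0, x1, x2] for x1, x2 in zip(income, size)]
xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * y for row, y in zip(X, food)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)

fitted = [b0 + b1 * x1 + b2 * x2 for x1, x2 in zip(income, size)]
residuals = [y - f for y, f in zip(food, fitted)]
mean_y = sum(food) / len(food)
r_squared = 1 - sum(e ** 2 for e in residuals) / sum((y - mean_y) ** 2 for y in food)

print(f"Food = {b0:.4f} + {b1:.4f} * Income + {b2:.4f} * Size")
print(f"R-squared = {r_squared:.4f}")
```

The coefficients correspond to the unstandardized coefficients that a linear regression routine reports for the same data; the residuals sum to zero because the model includes an intercept.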

Page 38: SPSS Def + Example_new_1!1!2011

Non-Parametric Test

One sample test:
– Binomial test
– Chi-square test for goodness of fit
– Kolmogorov-Smirnov one sample test

Two independent samples:
– Fisher exact test
– Chi-square test for independence of attributes
– Median test
– Mann-Whitney U test
– Kolmogorov-Smirnov two sample test

Page 39: SPSS Def + Example_new_1!1!2011

Non-Parametric Test

Two dependent samples:
– McNemar test
– Sign test
– Wilcoxon matched-pairs signed rank test
– Walsh test

More than two independent samples:
– Kruskal-Wallis one-way analysis
– Chi-square test for k independent samples
– Extension of Median test

More than two dependent samples:
– Friedman two-way analysis
– Cochran Q test

Page 40: SPSS Def + Example_new_1!1!2011

Mann-Whitney U test

Mann-Whitney U test is

Where
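A standard textbook form of the Mann-Whitney statistic is given below as an assumption about what the slide displayed. Here n_1 and n_2 are the two sample sizes and R_1 is the sum of the ranks of the first sample in the combined ranking; these symbols are not defined in the slide text.

```latex
U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1
```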

Page 41: SPSS Def + Example_new_1!1!2011

Wilcoxon test

Wilcoxon test is

Where T = Sum of ranks with the less frequent sign
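For larger samples T is commonly referred to the normal distribution using the form below, shown as a standard textbook expression and not necessarily the one on the slide. The symbol n, the number of pairs with a non-zero difference, is an assumption not defined in the slide text.

```latex
z = \frac{T - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}
```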

Page 42: SPSS Def + Example_new_1!1!2011

Kruskal-Wallis one-way analysis

Kruskal-Wallis test is

Where R = Sum of ranks of each group

N = Total number of observations

n = Number of observations in each group

k = Number of groups
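Using the symbols defined above, with R_i and n_i denoting the rank sum and size of group i, the statistic is usually written in the following standard form (offered as a likely reconstruction, not a confirmed copy of the slide):

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)
```

H is referred to the chi-square distribution with k - 1 degrees of freedom.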

Page 43: SPSS Def + Example_new_1!1!2011

Friedman Two way analysis

Friedman test is

Where R = Sum of ranks of each item

N = Total number of observations

k = Number of items
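Using the symbols defined above, with R_j the rank sum of item j, the statistic is usually written in the following standard form (a likely reconstruction, not a confirmed copy of the slide). In this form N counts the rows, that is the respondents or blocks that each rank the k items.

```latex
\chi_r^2 = \frac{12}{N k (k+1)} \sum_{j=1}^{k} R_j^2 - 3N(k+1)
```

The statistic is referred to the chi-square distribution with k - 1 degrees of freedom.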