8/10/2019 Young Children Job Satisfaction
Stage One: Define the Research Problem
In this stage, the following issues are addressed:
Relationship to be analyzed
Specifying the dependent and independent variables
Method for including independent variables
Young Children and Job Satisfaction
Relationship to be analyzed
"We are interested in examining the effect of young children on the job satisfaction of men and women involved in a variety of work and family roles to see how the presence of family responsibilities affects their happiness at work. The research is comparative. It involves contrasts between men and women in different work and marital statuses at several points in time." (page 800)
Specifying the dependent and independent variables
The dependent variable is job satisfaction, measured on a four-category Likert scale: 1 = Very Satisfied, 2 = Moderately Satisfied, 3 = A Little Dissatisfied, and 4 = Very Dissatisfied. Because the data do not follow a normal distribution (see pages 803-804), the authors recoded the variable to a dichotomous variable where 1 = Very Satisfied and 0 = Moderately Satisfied to Very Dissatisfied. The purpose of the analysis, then, is to determine what factors contribute to a high level of job satisfaction versus some other level of job satisfaction. With a dichotomous dependent variable, logistic regression becomes the analytic technique of choice.
The independent variables are grouped into two categories:
1. Individual and family characteristics (age, race, education, spouse's work status, prestige of spouse's occupation, number of children, presence of young children, general happiness, and satisfaction with family)
2. Job characteristics (income, job prestige, job authority, job autonomy, convenience (number of hours worked per week), and past work experience)
The variable presence of young children is important to answering the main question of the article.
Other variables, which could have been included as independent variables, were used to divide the sample into subgroups which were compared with each other to answer the research questions. For example, Sex and Work Status were combined to form a composite variable WORK_SEX. We will use these variables with the SPSS "Select Cases" command to produce the results for different groups.
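The recode of the dependent variable described above can be sketched outside SPSS. This is a minimal illustration in Python, not the authors' procedure; the function name and the sample responses are hypothetical, and missing answers are assumed to be stored as None.

```python
# Hypothetical sketch: collapse the four-category job satisfaction item
# into the dichotomy used in the analysis (1 = Very Satisfied, 0 = all else).

def recode_satisfaction(score):
    """Return 1 for code 1 (Very Satisfied), 0 for codes 2-4, None if missing."""
    if score is None:
        return None  # keep missing values missing
    if score not in (1, 2, 3, 4):
        raise ValueError(f"unexpected satisfaction code: {score!r}")
    return 1 if score == 1 else 0

# Made-up responses for illustration only.
responses = [1, 2, 4, None, 3, 1]
recoded = [recode_satisfaction(s) for s in responses]
print(recoded)
```

Keeping missing values as None, rather than folding them into 0, matters because the missing data analysis in Stage 2 depends on being able to count them.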
Method for including independent variables
With a dichotomous dependent variable and a variety of independent variables, the statistical technique to use is logistic regression. While we could structure the analysis to do hierarchical entry of variables (individual and family characteristics and job characteristics in block 1 and the presence of young children in block 2), we will use direct entry of all variables in a single step to conform to the authors' analysis.
Stage 2: Develop the Analysis Plan: Sample Size Issues
In this stage, the following issues are addressed:
Missing data analysis
Minimum sample size requirement: 15-20 cases per independent variable
Missing data analysis
In the missing data analysis, we are looking for a pattern or process whereby the pattern of missing data could influence the results of the statistical analysis.
The data set for this problem is used for a large number of analyses in the article. Not all variables and cases are used in each analysis, so it makes sense to conduct the missing data analysis on the cases and variables to be included in the problem in this exercise.
We will compute the logistic regression model for 1976-77 married, full-time males as presented in table 2 on page 807. (Note: this analysis does not include the independent variables SPOCCUP 'Spouses Occupation' and EVWORK 'Ever Work as Long as One Year').
First, we will exclude the cases not used in this exercise and then we will examine missing data for the variables used in this exercise.
Specify the Cases to Include in this Analysis
Enter the Selection Criterion
Run the MissingDataCheck Script
Complete the 'Check for Missing Data' Dialog Box
Number of Valid and Missing Cases per Variable
Two independent variables have relatively large numbers of missing cases: JCINCOME 'Job Characteristic - Income' and AUTHORIT 'Job Characteristic - Authority'.
However, all variables have valid data for 90% or more of cases, so no variables will be excluded for an excessive number of missing cases.
Frequency of Cases that are Missing Variables
Next, we examine the number of missing variables per case. Of the possible 14 variables in the analysis (13 independent variables and 1 dependent variable), one case was missing half of the variables (7) and should be excluded from the remaining analyses.
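The two checks described so far (valid data per variable, and missing variables per case) can be sketched in plain Python. This is an illustration only, not the MissingDataCheck script: the records are made up, JOBSAT and AGE are hypothetical variable names (JCINCOME and AUTHORIT come from the slides), and the "half or more missing" rule mirrors the exclusion applied above.

```python
# Hypothetical toy records; None marks a missing value.
cases = [
    {"id": 1, "JOBSAT": 1, "AGE": 34, "JCINCOME": None, "AUTHORIT": 1},
    {"id": 2, "JOBSAT": 0, "AGE": 51, "JCINCOME": 9, "AUTHORIT": None},
    {"id": 3, "JOBSAT": None, "AGE": None, "JCINCOME": None, "AUTHORIT": 0},
    {"id": 4, "JOBSAT": 1, "AGE": 28, "JCINCOME": 12, "AUTHORIT": 1},
]
variables = ["JOBSAT", "AGE", "JCINCOME", "AUTHORIT"]

# Share of cases with valid (non-missing) data, per variable.
valid_share = {
    v: sum(c[v] is not None for c in cases) / len(cases) for v in variables
}

# Number of missing variables, per case.
missing_per_case = {
    c["id"]: sum(c[v] is None for v in variables) for c in cases
}

# Cases missing half or more of the variables are candidates for exclusion.
drop_ids = [cid for cid, n in missing_per_case.items() if n >= len(variables) / 2]

print(valid_share)
print(missing_per_case)
print(drop_ids)
```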
Correlation Matrix of Valid/Missing Dichotomous Variables
The largest correlation in the matrix of valid/missing data (not shown) is 0.363. None of the correlations for missing data values are above the weak level, so we can delete missing cases without fear that we are distorting the solution.
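A valid/missing correlation of this kind can be sketched as follows: each variable is turned into a 0/1 missingness indicator and the indicators are correlated. The data are invented and the helper name pearson is an assumption; this shows the idea rather than reproducing the SPSS output.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (None if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # an indicator with no variation has no correlation
    return sxy / sqrt(sxx * syy)

# Hypothetical values for two variables; None marks a missing value.
income = [10, None, 12, None, 9, 11, None, 8]
authority = [1, None, 0, 1, 1, None, 0, 1]

# 1 = missing, 0 = valid.
miss_income = [1 if v is None else 0 for v in income]
miss_authority = [1 if v is None else 0 for v in authority]

r = pearson(miss_income, miss_authority)
print(round(r, 3))
```

A high correlation between two indicators would suggest that the same cases tend to be missing on both variables, which is the pattern the missing data analysis is meant to detect.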
Minimum sample size requirement: 15-20 cases per independent variable
If we accept the SPSS default of listwise deletion of missing data, we will have 538 cases in the analysis. The ratio of cases to independent variables is 538/13, or about 41 to 1. We meet this requirement.
Stage 2: Develop the Analysis Plan: Measurement Issues:
In this stage, the following issues are addressed:
Incorporating nonmetric data with dummy variables
Representing Curvilinear Effects with Polynomials
Representing Interaction or Moderator Effects
Incorporating Nonmetric Data with Dummy Variables
All of the nonmetric variables have been recoded into dichotomous dummy-coded variables.
Representing Curvilinear Effects with Polynomials
We do not have any evidence of curvilinear effects at this point in the analysis.
Representing Interaction or Moderator Effects
We do not have any evidence at this point in the analysis that we should add interaction or moderator variables.
Stage 3: Evaluate Underlying Assumptions
In this stage, the following issues are addressed:
Nonmetric dependent variable with two groups
Metric or dummy-coded independent variables
Nonmetric dependent variable having two groups
The dependent variable 'Job satisfaction' was recoded into dichotomous categories.
Metric or dummy-coded independent variables
Marital status, race, spouse's work status, presence of young children, job authority, job autonomy, and ever worked as long as one year are all coded as dichotomous variables.
Age of respondent, highest year of school completed, prestige of spouse's occupation, number of children, general happiness, satisfaction with family, income, job prestige, hours worked (convenience), and year of the survey can be treated as metric variables.
Stage 4: Estimation of Logistic Regression and Assessing Overall Fit: Model Estimation
In this stage, the following issues are addressed:
Compute logistic regression model
Compute the logistic regression
The steps to obtain a logistic regression analysis are detailed on the following screens.
If the cases to be included in this analysis were not selected in the missing data analysis, the selection needs to be completed before proceeding.
Specifying the Dependent Variable
Specify the method for entering variables
Specifying Options to Include in the Output
Specifying the New Variables to Save
Complete the Logistic Regression Request
Stage 4: Estimation of Logistic Regression and Assessing Overall Fit: Assessing Model Fit
In this stage, the following issues are addressed:
Significance test of the model log likelihood (Change in -2LL)
Measures Analogous to R²: Cox and Snell R² and Nagelkerke R²
Hosmer-Lemeshow Goodness-of-fit
Classification matrices
Check for Numerical Problems
Presence of outliers
Initial statistics before independent variables are included
The Initial Log Likelihood Function (-2 Log Likelihood or -2LL) is a statistical measure like total sums of squares in regression. If our independent variables have a relationship to the dependent variable, we will improve our ability to predict the dependent variable accurately, and the -2 log likelihood value will decrease. The initial -2LL value is 742.850 on step 0, before any variables have been added to the model.
Significance test of the model log likelihood
The difference between the initial -2LL (742.850) and the -2LL of the model that includes the independent variables (685.697) is the model chi-square value (57.153 = 742.850 - 685.697) that is tested for statistical significance. This test is analogous to the F-test for R² or change in R² value in multiple regression, which tests whether or not the improvement in the model associated with the additional variables is statistically significant.
In this problem the model chi-square value of 57.153 has a reported significance of 0.000 (that is, p < 0.001), less than 0.05, so we conclude that there is a significant relationship between the dependent variable and the set of independent variables.
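The arithmetic behind this test can be checked with a short sketch. The -2LL values are those quoted above. The degrees of freedom are an assumption: 13, one per independent variable, since each predictor is either metric or a single dummy. The helper chi2_sf_odd_df is a hypothetical name; it uses the closed-form upper-tail probability that holds for odd degrees of freedom, so no statistics library is needed.

```python
from math import erfc, exp, pi, sqrt

def chi2_sf_odd_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate with odd df."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this sketch handles odd degrees of freedom only")
    total = 0.0
    double_factorial = 1.0
    for j in range(1, (df - 1) // 2 + 1):
        double_factorial *= 2 * j - 1  # running value of (2j-1)!!
        total += x ** (j - 0.5) / double_factorial
    return erfc(sqrt(x / 2)) + sqrt(2 / pi) * exp(-x / 2) * total

neg2ll_initial = 742.850  # step 0, no independent variables
neg2ll_model = 685.697    # after entering the independent variables
df_model = 13             # assumption: one df per independent variable

model_chi_square = neg2ll_initial - neg2ll_model
p_value = chi2_sf_odd_df(model_chi_square, df_model)

print(round(model_chi_square, 3))
print(p_value < 0.05)
```

Under these assumptions the p-value is far below 0.001, which is consistent with SPSS displaying the significance as 0.000.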
Measures Analogous to R²
The next SPSS outputs indicate the strength of the relationship between the dependent variable and the independent variables, analogous to the R² measures in multiple regression.
The Cox and Snell R² measure operates like R², with higher values indicating greater model fit. However, this measure is limited in that it cannot reach the maximum value of 1, so Nagelkerke proposed a modification that has a range from 0 to 1. We will rely upon Nagelkerke's measure as indicating the strength of the relationship.
Based on the interpretive criteria, we would characterize this model as weak.
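The two measures can be sketched from quantities quoted elsewhere in this walkthrough. The formulas below are the standard definitions expressed in terms of -2LL; the sample size of 538 (listwise deletion) is assumed to be the n used by SPSS, and the results are a rough check to compare against the SPSS output, not values taken from it.

```python
from math import exp

n = 538                   # assumption: cases remaining after listwise deletion
neg2ll_initial = 742.850  # -2LL with no independent variables
neg2ll_model = 685.697    # -2LL with the independent variables entered

# Cox and Snell: 1 - (L0 / L1)^(2/n), written in terms of -2LL.
cox_snell = 1 - exp(-(neg2ll_initial - neg2ll_model) / n)

# Largest value Cox and Snell can reach for this data set: 1 - L0^(2/n).
cox_snell_max = 1 - exp(-neg2ll_initial / n)

# Nagelkerke rescales Cox and Snell so that its maximum is 1.
nagelkerke = cox_snell / cox_snell_max

print(round(cox_snell, 3), round(nagelkerke, 3))
```

The rescaling step makes the limitation described above concrete: the Cox and Snell ceiling for these data is well below 1, so dividing by it always yields a larger Nagelkerke value.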
Correspondence of Actual and Predicted Values of the Dependent Variable
The final measure of model fit is the Hosmer and Lemeshow goodness-of-fit statistic, which measures the correspondence between the actual and predicted values of the dependent variable. In this case, better model fit is indicated by a smaller difference between the observed and predicted classification. A good model fit is indicated by a nonsignificant chi-square value.
The goodness-of-fit measure has a value of 5.678, which has the desirable outcome of nonsignificance.
The Classification Matrices
The classification matrices in logistic regression serve the same function as classification matrices in other classification techniques, i.e. evaluating the accuracy of the model.
To evaluate the accuracy of the model, we compute the proportional by chance accuracy rate and the maximum by chance accuracy rate, if appropriate. Since the sizes of the groups in this problem are equal to 46% and 54%, the proportional accuracy criterion is appropriate because we do not have a dominant group.
The proportional by chance accuracy rate is equal to 0.503 (0.463^2 + 0.537^2). A 25% increase over the by chance accuracy rate would equal 0.628.
Our model accuracy rate of 63.2% meets this criterion.
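The by chance accuracy calculation can be written out as a short check. The group proportions and the model accuracy are the figures quoted above; the variable names are hypothetical.

```python
# Proportions of cases in the two dependent-variable groups.
group_shares = [0.463, 0.537]

# Proportional by chance accuracy: sum of squared group proportions.
proportional_chance = sum(share ** 2 for share in group_shares)

# The criterion used here: a 25% improvement over chance.
criterion = 1.25 * proportional_chance

model_accuracy = 0.632
meets_criterion = model_accuracy >= criterion

print(round(proportional_chance, 3), round(criterion, 3), meets_criterion)
```

The margin is narrow (0.632 against roughly 0.628), which is worth keeping in mind when the validation analysis later reports lower accuracy rates.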
Check for Numerical Problems
There are several numerical problems that can occur in logistic regression that are not detected by SPSS or other statistical packages: multicollinearity among the independent variables, zero cells for a dummy-coded independent variable because all of the subjects have the same value for the variable, and "complete separation" whereby the two groups in the dependent event variable can be perfectly separated by scores on one of the independent variables.
All of these problems produce large standard errors (over 2) for the variables included in the analysis and very often produce very large B coefficients as well. If we encounter large standard errors for the predictor variables, we should examine frequency tables, one-way ANOVAs, and correlations for the variables involved to try to identify the source of the problem.
The standard errors and B coefficients are not excessively large, so there is no evidence of a numeric problem with this analysis.
Presence of outliers
There are two outputs to alert us to outliers that we might consider excluding from the analysis: listing of residuals and saving Cook's distance scores to the data set.
SPSS provides a casewise list of residuals that identifies cases whose residual is above or below a certain number of standard deviation units. As in multiple regression, there are a variety of ways to compute the residual. In logistic regression, the residual is the difference between the observed probability of the dependent variable event and the predicted probability based on the model. The standardized residual is the residual divided by an estimate of its standard deviation. The deviance is calculated by taking the square root of -2 times the log of the predicted probability for the observed group and attaching a negative sign if the event did not occur for that case. Large values for deviance indicate that the model does not fit the case well. The studentized residual for a case is the change in the model deviance if the case is excluded. Discrepancies between the deviance and the studentized residual may identify unusual cases. (See the SPSS chapter on Logistic Regression Analysis for additional details).
In the output for our problem, SPSS listed one case that may be considered an outlier, with a studentized residual greater than 2.
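The residual definitions given in this section can be sketched for a single case. The function names are hypothetical, and the standardized residual assumes the binomial estimate sqrt(p(1 - p)) for the standard deviation, which the slides do not spell out. The studentized residual is omitted because it requires refitting the model without the case.

```python
from math import log, sqrt

def raw_residual(observed, predicted_prob):
    """Observed outcome (0 or 1) minus the predicted probability of the event."""
    return observed - predicted_prob

def standardized_residual(observed, predicted_prob):
    """Raw residual divided by an estimate of its standard deviation (assumed binomial)."""
    return raw_residual(observed, predicted_prob) / sqrt(
        predicted_prob * (1 - predicted_prob)
    )

def deviance_residual(observed, predicted_prob):
    """Square root of -2 * log(predicted probability of the observed group),
    with a negative sign when the event did not occur."""
    prob_observed_group = predicted_prob if observed == 1 else 1 - predicted_prob
    magnitude = sqrt(-2 * log(prob_observed_group))
    return magnitude if observed == 1 else -magnitude

# A case that was very satisfied (1) although the model predicted only 0.10.
print(raw_residual(1, 0.10))
print(standardized_residual(1, 0.10))
print(deviance_residual(1, 0.10))
```

A poorly predicted case such as this one produces a large value on all three measures, while a well predicted case (for example, observed 0 with predicted probability 0.10) produces values near zero.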
Cook's Distance
SPSS has an option to compute Cook's distance as a measure of influential cases and add the score to the data editor. I am not aware of a precise formula for determining what cutoff value should be used, so we will rely on the more traditional method for interpreting Cook's distance, which is to identify cases that either have a score of 1.0 or higher, or cases which have a Cook's distance substantially different from the others. The prescribed method for detecting unusually large Cook's distance scores is to create a scatterplot of Cook's distance scores versus case id.
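The screening rule described here can be sketched as a simple ranking. The case ids and Cook's distance values below are invented for illustration; the 1.0 cutoff is the one named in the text, while judging which scores are "substantially different" is left to visual inspection, as it is with the scatterplot.

```python
# Hypothetical Cook's distance scores keyed by case id.
cooks_distance = {101: 0.02, 102: 0.01, 103: 0.18, 104: 0.03, 105: 1.20}

# Rule 1: any case with a score of 1.0 or higher.
at_or_above_one = sorted(
    case_id for case_id, score in cooks_distance.items() if score >= 1.0
)

# Rule 2 (judgment call): list cases from largest to smallest score so that
# scores standing apart from the rest are easy to spot.
ranked = sorted(cooks_distance.items(), key=lambda item: item[1], reverse=True)

print(at_or_above_one)
for case_id, score in ranked:
    print(case_id, score)
```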
Specifying the Variables for the Scatterplot
The Scatterplot of Cook's Distances
Horizontal gridlines were added to the scatterplot to aid interpretation. Based on the gridlines, we can identify four cases with Cook's distances above 0.175 as influential cases.
After sorting the data set by the Cook's distance variable, we identify the four cases as having id numbers: 99, 1807, 1833, and 1953. None of these cases were included on the casewise listing for large studentized residuals.
Based on these outputs, we identify five cases out of 538 that are potential outliers. Since the number of outliers represents less than 1% of the sample and none of the outliers are really extreme, I will opt to retain them in the analysis.
Stage 5: Interpret the Results
In this section, we address the following issues:
Identifying the statistically significant predictor variables
Direction of relationship and contribution to dependent variable
Identifying the statistically significant predictor variables
The table of variables in the equation identifies for us the predictor variables that have a statistically significant individual relationship to the dependent variable. Scanning the 'Sig' column, we identify four variables that have a significance level less than 0.05: GENHAPPY 'How Happy Generally', PRESTIGE 'Job Characteristic - Prestige', CONVENIE 'Job Characteristic - Convenience', and YEAR 'GSS Year for Respondent'.
Direction of relationship and contribution to dependent variable - 1
The sign of the B coefficients indicates whether the predictor variable increased or decreased the likelihood of belonging to the group of respondents who were very satisfied with their jobs.
Direction of relationship and contribution to dependent variable - 2
The coefficient signs for the variables GENHAPPY 'How Happy Generally', PRESTIGE 'Job Characteristic - Prestige', and CONVENIE 'Job Characteristic - Convenience' were all positive, indicating that a higher score on these variables enhanced the likelihood of belonging to the group that was very satisfied with their jobs. The coefficient for YEAR was negative, indicating that job satisfaction has been declining in later years of the survey.
The magnitude of change associated with each independent variable is given in the odds ratio column labeled 'Exp (B)'. This column indicates the increased or decreased odds of belonging to the group that was very satisfied with their jobs.
For each unit increment on the measure of overall happiness, the odds of a respondent being very satisfied with his or her job were multiplied by 1.76. For each unit increment in job prestige, the odds of being very satisfied were multiplied by 1.02. For each unit increment in job convenience (or hours worked), the odds were also multiplied by 1.02. Finally, for each increase in year, the odds of being very satisfied were multiplied by 0.65, i.e. a subject was less likely to be very satisfied.
Important to the research question raised by the authors is the finding that CHILDLT6 'Presence of Young Children' did not have a statistically significant impact on job satisfaction.
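The link between the 'Exp (B)' column and the B coefficients can be illustrated with the odds ratios quoted on this slide. The B values computed below are back-calculated from the rounded odds ratios, so they are approximations rather than the coefficients printed by SPSS; the dictionary names are hypothetical.

```python
from math import log

# Odds ratios (Exp(B)) quoted for the four significant predictors.
odds_ratios = {"GENHAPPY": 1.76, "PRESTIGE": 1.02, "CONVENIE": 1.02, "YEAR": 0.65}

# B is the natural log of the odds ratio, so its sign gives the direction.
b_coefficients = {name: log(value) for name, value in odds_ratios.items()}

# Percent change in the odds of being very satisfied per unit increase.
percent_change = {name: (value - 1) * 100 for name, value in odds_ratios.items()}

for name in odds_ratios:
    direction = "increases" if b_coefficients[name] > 0 else "decreases"
    print(f"{name}: B is about {b_coefficients[name]:.3f}, "
          f"odds change {percent_change[name]:+.0f}% ({direction})")
```

An odds ratio above 1 corresponds to a positive B and an odds ratio below 1 to a negative B, which is why YEAR, at 0.65, is the only predictor here that lowers the odds.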
Set the Starting Point for Random Number Generation
Specify the Cases to Include in the First Screening Sample
Specify the Value of the Selection Variable
Specify the Value of the Selection Variable for the Second Validation Analysis
Generalizability of the Logistic Regression Model
Only one predictor variable, CONVENIE 'Job Characteristic - Convenience', has a stable, statistically significant relationship to the dependent variable, Job Satisfaction.
In addition, the accuracy that we should evaluate in assessing our model is in the 56% to 59% range rather than in the 63% to 72% range. At this accuracy rate, the model does not represent a 25% increase over the proportional by chance accuracy rate.
In sum, we do find a relationship between one of the independent variables and job satisfaction. Our findings should be regarded as tentative or exploratory rather than definitive because we would not meet the classification accuracy rate required for a usable model.
                  Full Model        Split=0      Split=1
Model Chi-Square  57.153, p=.0000   54.386, p