project econ 436511

Joshua Shea

Wage Gap and Variations Between Gender and Race

Dr Natalia Zhivan

Wage inequality and variation

Everybody's main goal at work is to be as efficient as possible and to make as much money as they can. Although some people may be more efficient or “better at their job” than others, they might still be paid less because of biased opinions. People hold biased opinions about men, women, and ethnicity. Some biases are negative and some are positive, but almost all of them misinterpret a person's assets. For example, if a woman works at a collection agency and is compared to her male counterpart, she might be held at a level below him. The gap might even widen if she has children or is a different race than him. Employers might hold biased opinions about her if she has children because they think she might not be as energized as him, cannot spend as much time at work, or just isn't as good as him because she is a woman. These misogynistic opinions keep women out of competitive work positions and can cause negative variations in their salary. People of many races might also be discriminated against because companies might not think they are as good as their white or other counterparts.

The Huffington Post provides a good comparison of how women's earnings differ across ethnicities and against their male counterparts. Looking at this table, we can see that women, as a whole, have lower earnings than men of the same ethnicity. The gaps between women and same-ethnicity men are relatively close, with women earning between 78% and 89% of what those men earn. This suggests a strong possibility of misogynistic behavior or bias toward women. The variation could also be explained by women not being able to work full time because of their obligations to children or other family members. The chart also shows that people do not view race equally.

But how do women of different ethnicities hold up against their white male counterparts? The article compares white men's earnings with women's earnings in different ethnicity categories. We find that the gap between women's and men's earnings becomes much larger. The gap between women and their same-ethnicity male counterparts was 22% at the largest. When women are compared against white men, the gap increases to a maximum of 47%. This shows that there is a great amount of bias based on race, and it only becomes greater when gender is factored in. We find that Asian women are the closest to their white male counterparts at 87% of white male earnings. Hispanic women are the furthest away, making only 53% of what their white male counterparts earn.

We should also look at how other ethnicities are not paid as much as their white counterparts, setting gender aside. Rakesh Kochhar states in his article that “the median wealth of white households is 20 times that of black households and 18 times that of hispanic households”. The numbers that he provides are staggering: black households had $5,677 in assets, Hispanic households $6,325, and white households $113,149. Although these figures measure wealth rather than wages, they show how large the differences between groups are. We hope that our model can explain why this is; if it cannot, we will readjust the parameters and explanatory variables until it can.
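The "20 times" and "18 times" in the quotation can be checked against the dollar figures cited above. The short Python sketch below does only that arithmetic; the variable names are illustrative and not part of the original report.

```python
# Household wealth figures quoted above from the Pew Research report.
white = 113_149
black = 5_677
hispanic = 6_325

# Ratio of white household wealth to each other group.
white_to_black = white / black
white_to_hispanic = white / hispanic

print(f"white / black:    {white_to_black:.1f}")
print(f"white / hispanic: {white_to_hispanic:.1f}")
```

Rounded to whole numbers, the two ratios agree with the multiples in the quotation.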

Data of Our Empirical Model

In our model we run regressions on wage and on the natural log of wage to see how wage varies across different variables. We try to explain why wage varies with these variables and determine how “good” our model and parameters are. In our model we run the regression ln_wage = age female college hisp black hours part_time child6. Each of the inputs that we give to the regression in Stata stands for a variable. Age is a continuous variable: the age of the individual measured in years. Female is a dummy variable equal to 1 if the person is female and 0 if male. College is a dummy variable equal to 1 if the person graduated from college and 0 if not. Black is a dummy variable equal to 1 if the person is black and 0 otherwise. Hisp is a dummy variable equal to 1 if the person is Hispanic and 0 otherwise. Health is a categorical variable coded 1 = excellent, 2 = very good, 3 = good, 4 = fair, 5 = poor. Child6 is a count variable: the number of children under the age of 6. Wage is a continuous variable: annual earnings measured in $1 increments. Hours is a continuous variable: the number of hours worked last week, measured in 1-hour increments. Part_time is a dummy variable equal to 1 if the person works part time and 0 otherwise. July_temp is a continuous variable: the average July temperature in the county of residence, measured in degrees Fahrenheit. Pop65 is a continuous variable: the share of the population older than 65.

Count 1047

Variable     Obs     Mean        Std. dev.   Min        Max
female       1047    .4555874    .4982616    0          1
college      1047    .2941738    .4558882    0          1
age          1047    33.53391    7.855929    18         45
wage         1047    39653.52    48878.27    12         619221
hours        1047    37.19962    15.24388    0          99
july_temp    1047    80.03084    4.991196    62.23553   94.44621
pop65        1047    .252078     .0500064    .0978648   .3961477

Describing The Data

Our sample size is 1,047 individuals. The mean of female is .4556, so 45.56% of our sample is female. The mean of college is .2942, so 29.42% of our sample has graduated from college. The mean of age is 33.53, so the average age in our sample is about 33.5 years. The minimum age in our sample is 18 and the maximum is 45. The mean of wage is 39653.52, so the average individual in our sample makes $39,653.52 a year. The minimum annual wage in our sample is $12 and the maximum is $619,221. The mean of hours is 37.19962, so the average individual in our sample works about 37.2 hours a week. The minimum number of hours worked in a week by an individual in our sample is 0 and the maximum is 99. The mean of july_temp is 80.03084, so the average July temperature is about 80 degrees Fahrenheit. The minimum temperature in the sample is about 62 degrees and the maximum is about 94 degrees. The mean of pop65 is .252078, so on average 25.2% of the population is older than 65.

Histogram. [Figure: density of person's total earnings; x-axis from 0 to 600,000]

We create histograms of wage, age, and hours worked. The wage distribution is skewed to the right, so we do not have a normal distribution. If the error term were normally distributed, it would have the shape of a bell. Since the error term does not have a normal distribution, we can no longer trust our hypothesis testing. OLS is no longer the Best Linear Unbiased Estimator.

Correlation Matrix

Variables   Age      Black    College  Wage     Hours
Age         1.000
Black       0.0039   1.000
College     0.1926   0.0315   1.000
Wage        0.2532   0.0174   0.2743   1.000
Hours       0.1406   0.0083   0.0305   0.1949   1.000

In our correlation matrix a few pairs of variables stand out as positively correlated, although none of the correlations is large. Wage and age are correlated at 0.2532. This makes sense because the older someone is, the more likely they are to be skilled in their position and to be educated. College and age are correlated at 0.1926. We expect that as a person's age goes up it is more likely that they have a college degree, because most college students start at 18 and do not graduate until their mid-twenties. College and wage are correlated at 0.2743. This makes sense because a college graduate has more knowledge and a wider set of skills. Hours and wage are correlated at 0.1949. We expect that the more hours someone works, the higher their wage will be, partly because of overtime, where someone is paid time and a half after working 40 hours.

Ttest wage, by (black)

Group      Obs     Mean        Std. err.   Std. dev.   95% conf. interval
0          869     39268.2     1582.823    46659.74    36161.59    42374.8
1          178     41534.7     4394.655    58632.01    32862.03    50207.36
Combined   1047    39653.52    1510.576    48878.27    36689.42    42617.63
diff               -2266.501   4022.639                -10159.87   5626.869

Ha: diff != 0
Pr(|T| > |t|) = 0.5733

When we run the t-test of wage by black, we find that black individuals in our sample make $2,266.50 more a year on average. With a p-value of 0.5733, the difference is not statistically significant, so we cannot conclude that the true difference is different from zero.

Ttest wage, by (female)

Group      Obs     Mean        Std. err.   Std. dev.   95% conf. interval
0          570     46504.73    2164.591    51678.91    42253.17    50756.29
1          477     31466.55    2013.623    43978.19    27509.86    35423.23
Combined   1047    39653.52    1510.576    48878.27    36689.42    42617.63
diff               15038.18    2889.723                9153.981    20922.39

Ha: diff != 0
Pr(|T| > |t|) = 0.00

When we run the t-test of wage by female, we find that women make $15,038.18 less a year than men. This could reflect discrimination, less education, or women working fewer hours. The difference is statistically significant, with a p-value of 0.00, so we can trust the result.

Ttest hours, by female

Group      Obs     Mean        Std. err.   Std. dev.   95% conf. interval
0          570     41.59123    .603649     14.41158    40.4056     42.77685
1          477     31.95178    .6661574    14.5491     30.64281    33.26075
Combined   1047    37.19962    .47111      15.24388    36.27519    38.12405
diff               9.639446    .8982078                7.87695     11.40194

Ha: diff != 0
Pr(|T| > |t|) = 0.00

When we run the t-test of hours by female, we find that women work 31.95 hours a week while men work 41.59 hours a week. This supports our reading of the t-test of wage by female above: women make less money in part because they work fewer hours than their male counterparts. Women likely work less because they have children and take care of other family members, while men work more hours because they are paid at a higher rate. The difference is statistically significant, so we can use these results with confidence.
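Each of the three t-tests above reduces to the same calculation: the difference in means divided by its standard error. The Python sketch below recomputes the t statistics and approximate two-sided p-values from the diff rows of the tables. It uses the standard normal distribution in place of the t distribution, which is an assumption on my part that should be harmless with more than 1,000 degrees of freedom; the function name `t_and_p` is illustrative.

```python
import math

def t_and_p(diff, std_err):
    """Return the t statistic and an approximate two-sided p-value.

    Assumption: the p-value comes from the standard normal distribution
    rather than the t distribution, which is close when the sample is large.
    """
    t = diff / std_err
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided tail area under N(0, 1)
    return t, p

# (difference in means, standard error) copied from the diff rows above.
tests = {
    "wage by black": (-2266.501, 4022.639),
    "wage by female": (15038.18, 2889.723),
    "hours by female": (9.639446, 0.8982078),
}

results = {name: t_and_p(d, se) for name, (d, se) in tests.items()}
for name, (t, p) in results.items():
    print(f"{name:16s} t = {t:6.2f}   p = {p:.4f}")
```

Only the wage-by-black test produces a t statistic below the usual 1.96 cutoff, which matches the significance conclusions drawn from the Stata output.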

How Relevant Are Our Variables?

When we run our regression model we want to make sure that all of our variables are good fits and that they make sense in the equation. If we do not, we might violate the classical assumptions, OLS will no longer be BLUE, and our results will not be significant or trustworthy. To find out how wage varies, we need to make sure everything in the regression is theoretically sound. In our regression, wage is the dependent variable, or “Y”. By examining the variables in our model we can determine which ones should be included. We should include age because wage tends to go up as age increases. In the regression ln_wage = age female college hisp black hours part_time child6, the coefficient on age is .0345, so each additional year of age is associated with a wage that is roughly 3.45% higher. We should include female because we want to find out why there is a difference in wages earned. We should also include college, which tells us how wage goes up with education. We should include hours and part_time because they help explain why people are paid differently as hours worked vary. We should also include child6 because it helps explain why people work fewer hours. All of the variables listed above are statistically significant. When we include black in the equation it is not statistically significant. The overall fit of our model is not very good (adjusted R-squared = 0.3435). I believe that to improve the overall fit we need to add variables that might have been omitted. Some of the variables that I think might be omitted are IQ, quality of university, location, and job position. I would add location because the cost of living is higher in some cities than in others. We do not include july_temp because it has no effect in our model, and we do not include pop65 because its effect is too small to matter in our regression.

For the regression model we want to make sure that we know the expected signs, so that we can tell whether each variable is a good fit for our model. B:age. We expect age to have a positive sign because we expect wage to increase with age. B:hisp. We expect Hispanic to have a negative sign because of barriers to entry and discrimination. B:black. We expect black to have a positive sign because we think it will bring in more money every year. B:female. All else being equal, we expect female to have a negative sign, because we expect women to make less than their male counterparts. B:college. All else being equal, we expect college to have a positive sign, because college should bring in more money yearly through the skills it provides. B:hours. We expect hours to have a positive sign, because the more hours you work the more money you will make. B:part_time. All else being equal, we expect part time to have a negative sign, because part-time employees work fewer hours than full-time employees, and fewer hours worked means less money brought in yearly. B:child6. All else being equal, we expect child6 to have a positive sign.

Variable    B hat    Standard error   Critical value at 1%   Critical value at 5%   Critical value at 10%
Age         0.032    .003             2.326                  1.645                  1.282
Hisp        -0.134   .063             2.326                  1.645                  1.282
Black       -.058    .064             2.326                  1.645                  1.282
Female      -.283    .048             2.326                  1.645                  1.282
College     0.509    .053             2.326                  1.645                  1.282
Hours       0.013    .002             2.326                  1.645                  1.282
Part_time   -0.283   .075             2.326                  1.645                  1.282

ln_wage     Coef.       Std. err.   t       P>|t|   95% conf. interval
Age         .0345185    .0032632    10.58   0.000   .0281153    .0409217
Female      -.2850152   .0527574    -5.40   0.000   -.3885385   -.1814918
College     .5071715    .0564339    8.99    0.000   .3964341    .617909
Hisp        -.198478    .0676282    -2.93   0.003   -.3311816   -.0657743
Black       .0259772    .0664785    0.39    0.696   -.1044703   .1564248
Hours       .0053968    .0025044    2.15    0.031   .0004826    .0103111
Part_time   -.5076908   .0834793    -6.08   0.000   -.6714982   -.3438834
Child6      .119445     .0350736    3.41    0.001   .0506217    .1882683
_cons       8.929974    .1646518    54.24   0.000   8.606885    9.253062

Number of obs = 1047
F(8, 1038) = 69.43
R-squared = 0.3486
Adj R-squared = 0.3435
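The fit statistics under the table are tied together by standard textbook formulas. As a rough check, and not something taken from the Stata log itself, the Python sketch below rebuilds adjusted R-squared and the F statistic from the reported R-squared, the number of observations, and the number of explanatory variables. Because R-squared is only reported to four decimals, the results can differ from Stata's in the last digit.

```python
n = 1047      # number of observations
k = 8         # explanatory variables, not counting the constant
r2 = 0.3486   # R-squared as reported (rounded to four decimals)

df_resid = n - k - 1  # residual degrees of freedom, the 1038 in F(8, 1038)

# Adjusted R-squared penalizes R-squared for each added regressor.
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid

# Overall F statistic for the null that all slope coefficients are zero.
f_stat = (r2 / k) / ((1 - r2) / df_resid)

print(f"residual df        = {df_resid}")
print(f"adjusted R-squared = {adj_r2:.4f}")
print(f"F statistic        = {f_stat:.2f}")
```

Both values land within rounding distance of the reported 0.3435 and 69.43.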

The Regression Results

Our estimated regression equation is

LN_wage = 8.93 + .035AGEi - .285FEMALEi + .507COLLEGEi - .198HISPi + .026BLACKi + .005HOURSi - .508PART_TIMEi + .119CHILD6i

Because the dependent variable is the natural log of wage, each coefficient multiplied by 100 approximates the percent change in wage for a one-unit change in that variable, holding the others constant. For the larger coefficients we report the exact change, 100 x (e^b - 1). The constant, 8.93, is the predicted log wage when every explanatory variable equals zero, which corresponds to about $7,555 a year. For every additional year of age, wage goes up by about 3.5%. If the person is female, wage goes down by about 25%. If the person has graduated from college, wage goes up by about 66%. If the person is Hispanic, wage goes down by about 18%. If the person is black, wage goes up by about 2.6%, although this coefficient is not statistically significant. For every additional hour worked per week, wage goes up by about 0.5%. If the person works part time, wage goes down by about 40%. For each child under the age of six, wage goes up by about 12.7%. Our R-squared is .3486, which means the model explains only about 35% of the variation in log wage. This is a poor fit. Adjusted R-squared is slightly lower at .3435. This means that we need to make some adjustments to the model, such as correcting heteroskedasticity, increasing the sample size, dropping redundant variables, or adding more theoretically sound variables.
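Since the dependent variable is ln_wage, a coefficient b corresponds to a change of 100 x (e^b - 1) percent in wage for a one-unit change in the variable, holding the others fixed. The Python sketch below applies that standard conversion to the coefficients from the Stata output; the helper name `pct_effect` is illustrative.

```python
import math

# Coefficients from the ln_wage regression output.
coefs = {
    "age": 0.0345185,
    "female": -0.2850152,
    "college": 0.5071715,
    "hisp": -0.198478,
    "black": 0.0259772,
    "hours": 0.0053968,
    "part_time": -0.5076908,
    "child6": 0.119445,
}

def pct_effect(b):
    """Exact percent change in wage implied by a log-wage coefficient b."""
    return 100 * (math.exp(b) - 1)

effects = {name: pct_effect(b) for name, b in coefs.items()}
for name, pct in effects.items():
    # 100 * b is the common approximation; it drifts as |b| grows.
    print(f"{name:10s} approx {100 * coefs[name]:7.2f}%   exact {pct:7.2f}%")
```

For small coefficients such as age and hours the approximation and the exact figure nearly coincide, while for college and part_time they differ by several percentage points.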

i. 35(age)+1(female)+1(college)+2(child6)+1(part_time)=$62,764

ii. 45(age)+1(college)=$75,531

iii. 25(age)+1(black)+1(part_time)=$21,366

iv. 25(age)+1(part_time)=$19,777

v. 55(age)+1(black)+1(part_time)=$53,921

Our model shows that the 25-year-old black male makes $1,589 more a year than his white male counterpart (cases iii and iv). Our model also shows how much college and working full time increase wage. When we compare the 45-year-old male in case ii with the 55-year-old male in case v, there is a large gap that the 55-year-old will not be able to close unless he also graduates and starts working full time. We can also see how having children increases wage in case i.

            Wage     Age      Hisp     Black    Female   College  Hours    Part_time  Child6
Wage        1.000
Age         0.2532   1.000
Hisp        -0.0685  -.0768   1.000
Black       0.0174   0.0039   -.1278   1.000
Female      -.1533   -.0490   -.0368   0.0301   1.000
College     .2743    .1926    -.1501   .0315    .0618    1.000
Hours       .1949    .1406    .0174    .0083    -.3151   .0305    1.000
Part_time   -.2083   -.1659   -.0389   -.0030   .2911    -.0685   -.7542   1.000
Child6      .0821    -.0570   .0038    -.0456   -.0134   -.0086   -.0594   .0461      1.000

When we check the correlations among the explanatory variables we find little evidence of multicollinearity. Almost all of the pairwise correlations are below .50 in absolute value. The one exception is hours and part_time at -.7542, which is expected because part-time employees work fewer hours by definition. Apart from that pair, none of the variables explains much of the movement in the others. This supports our selection of explanatory variables as theoretically sound.
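The .50 rule of thumb can be applied mechanically to the correlation table above. The Python sketch below copies the lower triangle for the explanatory variables (wage is left out because it is the dependent variable) and flags any pair whose absolute correlation exceeds the threshold. The threshold name and data layout are illustrative choices.

```python
THRESHOLD = 0.50  # rule-of-thumb cutoff used in the text

names = ["age", "hisp", "black", "female", "college",
         "hours", "part_time", "child6"]

# Lower triangle of the correlation table: each row lists the variable's
# correlation with every variable that precedes it in `names`.
rows = {
    "hisp":      [-0.0768],
    "black":     [0.0039, -0.1278],
    "female":    [-0.0490, -0.0368, 0.0301],
    "college":   [0.1926, -0.1501, 0.0315, 0.0618],
    "hours":     [0.1406, 0.0174, 0.0083, -0.3151, 0.0305],
    "part_time": [-0.1659, -0.0389, -0.0030, 0.2911, -0.0685, -0.7542],
    "child6":    [-0.0570, 0.0038, -0.0456, -0.0134, -0.0086, -0.0594, 0.0461],
}

flagged = []
for var, values in rows.items():
    # zip stops at the shorter list, pairing each value with its column name.
    for other, r in zip(names, values):
        if abs(r) > THRESHOLD:
            flagged.append((var, other, r))

for var, other, r in flagged:
    print(f"{var} and {other}: r = {r}")
```

Comparing absolute values matters here, because a strong negative correlation signals overlap between two regressors just as a strong positive one does.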

When we test for heteroskedasticity with the Breusch-Pagan/Cook-Weisberg test, we find clear evidence of it: Prob > chi2 = 0.00. This means that the variance of the error term is not constant across observations. With heteroskedasticity this strong, the standard errors are biased and hypothesis testing is no longer reliable. We correct for heteroskedasticity by adding the robust option to the regress command in Stata. When we use robust standard errors, most of the t values go up. R-squared and the coefficients stay the same, while the standard errors change. Running the regression with robust standard errors reduces the chance that the standard errors are biased, which makes hypothesis testing more useful.
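To see why the coefficients stay the same while the standard errors change, it may help to work a small example by hand. The Python sketch below uses synthetic data, not the paper's sample, with an error spread that grows with x, and computes the slope of a one-variable regression with both the conventional standard error and White's heteroskedasticity-robust (HC0) standard error. Stata's robust option applies the same idea with an additional degrees-of-freedom adjustment.

```python
import math

# Synthetic data (an assumption for illustration only): the size of the
# error grows with x, which is what heteroskedasticity looks like.
x = [float(i) for i in range(1, 21)]
errors = [(-1) ** i * 0.1 * xi for i, xi in enumerate(x)]
y = [2.0 + 0.5 * xi + e for xi, e in zip(x, errors)]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# OLS estimates: these do not depend on which standard error we report.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Conventional standard error: assumes a single constant error variance.
sigma2 = sum(e ** 2 for e in resid) / (n - 2)
se_conventional = math.sqrt(sigma2 / sxx)

# White (HC0) robust standard error: each observation contributes its own
# squared residual, weighted by its squared deviation in x.
meat = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid))
se_robust = math.sqrt(meat) / sxx

print(f"slope           = {slope:.4f}")
print(f"conventional se = {se_conventional:.4f}")
print(f"robust se       = {se_robust:.4f}")
```

The slope is computed once and shared by both calculations; only the formula for its uncertainty differs.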

Regress wage age female black hispanic (adjusted R-squared = .0837)

Wage      Coef.       Std. err.   t       P>|t|   95% conf. interval
Age       1505.38     184.94      8.14    0.00    1142.47     1868.29
Female    -14109.34   2910.27     -4.85   0.00    -19820.01   -8398.68
Black     1814.01     3882.92     0.47    0.640   -5805.22    9433.26
Hisp      -6984.56    3913.39     -1.78   0.075   -14663.61   694.47
_cons     -3534.23    6696.82     -0.53   0.598   -16675.04   9606.57

Regress wage age female black hispanic college (adjusted R-squared = .1385)

Wage      Coef.       Std. err.   t       P>|t|   95% conf. interval
Age       1225        182.56      6.71    0.00    866.76      1583.24
Female    -15672.88   2828.39     -5.54   0.00    -21222.89   -10122.88
Black     1441.915    3765.372    0.38    0.702   -5946.67    8830.49
Hisp      -2802.351   3828.77     -.73    0.464   -10315.34   4710.63
College   26023.57    3173.647    8.20    0.00    19796.1     32251.04
_cons     -1714.822   6497.403    -.26    0.792   -14464.32   11034.68

Regress wage age female black hispanic college part_time (adjusted R-squared = .1522)

Wage        Coef.       Std. err.   t       P>|t|   95% conf. interval
Age         1111.20     183.09      6.07    0.00    751.9356    1470.47
Female      -12075.66   2931.81     -4.12   0.00    -17828.6    -6322.71
Black       1182.848    3735.7      0.32    0.752   -6147.51    8513.21
Hisp        -3657.44    3803.46     -.96    0.336   -11120.78   3805.89
College     25120.85    3155.44     7.96    0.00    18929.09    31312.6
Part_time   -13699.73   3239.19     -4.23   0.00    -20055.83   -7343.63
_cons       4959.04     6635.68     .75     0.455   -8061.82    17979.91

Comparing these models, we see that the constant is negative in the first two: it is -3534.23 before college is added and -1714.82 after, which implies negative predicted earnings when every variable equals zero. When we add part time to the equation the constant turns positive. One way to read the negative constant is that someone who is in school and not working is spending thousands of dollars without any income. Looking at all three models, we see that every time we add a variable the overall fit of the equation goes up: adjusted R-squared rises from .0837 to .1385 to .1522. Although the fit rises, it is still very poor, explaining only about 15% of the variation in wage. As we add explanatory variables the coefficients fluctuate while the standard errors mostly remain the same. The two explanatory variables that we add are theoretically sound, and both are significant with p-values of 0.00. The constant is not significant in any of these models. When we add college to the equation, wages go up as expected. Part time also decreases earnings as we expected. Our models need to be improved. We cannot rely on them because the adjusted R-squared is so low. We can improve the model by adding more explanatory variables or increasing the sample size.

Conclusion

I conclude that the model that has been given to us is a very basic model with a few problems. It is not very reliable: the adjusted R-squared shows that it explains only about 34% of the variation in log wage. OLS is no longer the Best Linear Unbiased Estimator because two of the classical assumptions are broken: the error term is not normally distributed and does not have a constant variance. The distribution of the error term is skewed to the right, and because its variance is not constant we can no longer trust hypothesis testing. We can address these problems so that OLS is BLUE again: we can use robust standard errors, run a White test, drop redundant variables, and increase the sample size.

Our model only tells us that women and Hispanics are paid less than their male and white counterparts. It does not have enough variables to tell us whether women and people of different races are discriminated against in the work environment. If we want our model to be as reliable as those of the Huffington Post or other researchers, we need to add omitted variables. We should add more races, such as Asian and Native American, to see if discrimination varies. We also need to add more variables that will give us an idea of whether people are being discriminated against, for example IQ, work ethic, cost of living, job position, female-to-male ratio, and work environment. Then our model will be more descriptive and give us more information about wage inequality. We can also run a time series to see how wage inequality has increased or decreased. Expanding our model to this extent takes a lot more research, but it also provides more descriptive and reliable results.

References

Rakesh Kochhar, Richard Fry, Paul Taylor, Wealth Gaps Rise to Record Highs Between Whites, Blacks, Hispanics, Pew Research, July 26, 2011. http://www.pewsocialtrends.org/2011/07/26/wealth-gaps-rise-to-record-highs-between-whites-blacks-hispanics/

Catherine Hill, How Does Race Affect the Gender Wage Gap?, The Huffington Post. http://www.huffingtonpost.com/catherine-hill/how-does-race-affect-the-gender-wage-gap_b_5087132.html