Outline

Test bias – definitions

The basic issue: group differences

What causes group differences?

Arguments that tests are not biased

Differential item functioning analysis

Criterion-related sources of bias

Other approaches to testing minority groups: Chitling test, BITCH test, SOMPA

Models of test bias: Regression, Constant Ratio, Cole/Darlington, Quota

Test bias – definition

A test is biased if it gives a systematically wrong result when used to predict something.

So, an intelligence test would be biased if, for example, it underestimated one group’s probability of success in a given endeavor.

Test bias – the basic issue

Various groups within society differ in their average scores on some psychological tests.

We don’t know what causes these differences.

What causes group differences?

Some candidate accounts:

Genetics

Socioeconomic factors

Caste

Culture

Stereotype threat

Arguments that tests are not biased

Major tests have been subjected to impressive scrutiny for decades

Enormous resources are devoted to this purpose

Criterion validity has been established very securely for the major intelligence tests – they do predict college and job performance


It is not appropriate to focus on individual items on a test, as some critics of testing do

Items should be drawn from a variety of domains, not all of which will be familiar to any one person


Test developers evaluate tests on the basis of overall patterns of prediction utility

They’re future-oriented, not past-oriented: “How will you do in college or in a job?”

Not “have you had the opportunity to learn?”


Do you think of test score results as “outcomes” or as “information” (predictors)?

Test developers say results are the beginning, not the end: they are information that will guide us

Opponents see test results as outcomes


Systematic studies have asked whether biased items produce group differences on tests such as the Stanford-Binet and the Wechsler tests

These studies found no evidence that group differences disappeared when allegedly biased items were removed

Argument that tests are not biased

Group differences are just as large on Raven's Progressive Matrices, considered the most culture-fair test, as on the WAIS

IQ scores have the same predictive utility regardless of race or socioeconomic status.

Differential item functioning analysis

In this approach to testing for bias, you first form groups for comparison which are equated on overall test score

Implication: groups are equivalent in overall ability

Then, you look for differences between groups on individual items

Where a difference is found, you conclude that the item is biased (since the groups do not differ in ability)
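The matching logic above can be sketched in a few lines. This is only an illustration, assuming a simple stratified difference in proportions correct; it is not a full Mantel-Haenszel procedure, and the function name `dif_index` and the data are made up for the example.

```python
from collections import defaultdict


def dif_index(records):
    """Weighted mean difference in proportion correct (group A minus group B)
    among test takers matched on total score.

    records: iterable of (group, total_score, item_correct) tuples,
    where group is "A" or "B" and item_correct is 0 or 1.
    """
    # Bucket responses to the item by total score, then by group.
    strata = defaultdict(lambda: {"A": [], "B": []})
    for group, total, correct in records:
        strata[total][group].append(correct)

    weighted_sum = 0.0
    weight_total = 0
    for by_group in strata.values():
        a, b = by_group["A"], by_group["B"]
        if not a or not b:
            continue  # no matched comparison is possible at this score level
        diff = sum(a) / len(a) - sum(b) / len(b)
        n = len(a) + len(b)
        weighted_sum += diff * n
        weight_total += n
    return weighted_sum / weight_total if weight_total else 0.0


# Hypothetical data: every test taker has a total score of 10 or 20.
records = [
    ("A", 10, 1), ("A", 10, 1), ("B", 10, 1), ("B", 10, 0),
    ("A", 20, 1), ("A", 20, 1), ("B", 20, 1), ("B", 20, 1),
]
index = dif_index(records)
print(f"DIF index for this item: {index:.2f}")
```

A value near zero suggests the item behaves the same way for matched members of both groups; a value far from zero flags the item for review.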

Differential item functioning analysis

But removing such items does not eliminate group differences

E.g., people depicted in test items may typically be White & male

But changing this has little effect (McCarty, Noble, & Huntley, 1989)

Criterion-related sources of bias

We evaluate criterion validity by looking at correlation between test scores and criterion scores

E.g., SAT scores vs. GPA after 4 years at university


If correlation is good, we use test scores (e.g., SAT) to predict criterion – and make selection decisions

What do we do if the correlation is different for different groups?

This would imply that test scores mean different things for different groups
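One way to check this is to compute the test-criterion correlation separately for each group. A minimal sketch, assuming made-up SAT and GPA values and a hand-written Pearson correlation (the names `pearson_r`, `gpa_a`, and `gpa_b` are illustrative only):

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: same SAT scores in both groups, different GPAs.
sat = [900, 1000, 1100, 1200, 1300]
gpa_a = [2.1, 2.4, 3.0, 3.2, 3.8]  # GPA tracks SAT closely
gpa_b = [2.8, 2.2, 3.4, 2.9, 3.6]  # GPA tracks SAT more loosely

r_a = pearson_r(sat, gpa_a)
r_b = pearson_r(sat, gpa_b)
print(f"Group A: r = {r_a:.2f}")
print(f"Group B: r = {r_b:.2f}")
```

In this invented example the test predicts the criterion much better for Group A than for Group B, which is the situation the question above is asking about.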


In this graph, Group B performs better than Group A but the correlation is the same for both

[Figure: Criterion plotted against Test score, with separate lines for Group B and Group A]


In this graph, the slopes of the lines are the same but the intercepts are different

Equal slopes mean the test predicts the criterion equally well for both groups (strictly, equal correlations also require a similar spread of scores in each group)

[Figure: Criterion plotted against Test score, with separate lines for Group B and Group A]


Here, the intercepts are different and the slopes are different, so predictions for Groups A and B would not be equally good

Such cases are rare

[Figure: lines for Group A and Group B, with test scores X1 and X2 marked]
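The slope and intercept comparisons in these graphs can be made concrete by fitting a separate least-squares line to each group. The sketch below uses invented data in which the slopes are equal and only the intercepts differ; `fit_line` is a hypothetical helper written for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: Group B scores 5 points higher on the criterion
# than Group A at every test score.
scores = [40, 50, 60, 70, 80]
criterion_a = [30, 35, 40, 45, 50]
criterion_b = [35, 40, 45, 50, 55]

slope_a, intercept_a = fit_line(scores, criterion_a)
slope_b, intercept_b = fit_line(scores, criterion_b)

# A single line fitted to everyone, ignoring group membership.
slope_pooled, intercept_pooled = fit_line(scores + scores, criterion_a + criterion_b)

print(f"Group A: slope {slope_a}, intercept {intercept_a}")
print(f"Group B: slope {slope_b}, intercept {intercept_b}")
print(f"Pooled:  slope {slope_pooled}, intercept {intercept_pooled}")
```

With these numbers the pooled line sits halfway between the two group lines, so it overpredicts Group A and underpredicts Group B by the same amount at every test score. That is one concrete form of a "systematically wrong result".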


Major tests, such as the SAT and WISC-R, have equal criterion validity for various ethnic groups (e.g., African-American, White, Latino/Latina)

Similar results have been found in other multi-ethnic countries, such as Israel

Other approaches to testing minority groups

The Chitling Test

The BITCH Test

SOMPA

The Chitling Test (Dove, 1968)

Developed to make a point about testing for information a group is unlikely to have acquired

Questions require a particular form of “street smarts” to answer correctly

No validity data exist for this test

If you want to predict college performance for minority students, this test won’t help

The BITCH test (Williams, 1974)

Task: define 100 words drawn from the Afro-American Slang Dictionary and Williams' personal experience

African-Americans score higher than Whites

Williams argues that this test is analogous to the standard IQ tests, which are also culture-bound

The BITCH test (Williams, 1974)

Problem: there is no reason to accept the claim that this is an intelligence test.

There is no validity evidence – no prediction of any performance

Does not test reasoning skills

May have some value for testing familiarity with African-American culture

SOMPA (Mercer, 1979)

System of Multi-cultural Pluralistic Assessment

Based on idea that what constitutes knowledge is socially-constructed

Mercer also suggested that IQ tests are a tool Whites use to keep minority groups “in their place”.


Inspired originally in part by over-representation of minority group children in EMR (educable mentally retarded) classes in US schools

Mercer: this over-representation resulted from both:

More medical problems

Unfamiliar cultural references on tests


Fundamental assumption: all cultural groups have the same potential on average

On this view, if one cultural group does more poorly than another on a test, that is a fact about the test, not the groups.


Combines 3 kinds of evaluation:

Medical: health, vision, hearing, etc.

Social: entire WISC-R

Pluralistic: compare WISC-R scores to those of same community


Estimated Learning Potentials: WISC-R scores adjusted for socio-economic background

But these ELPs don’t predict school performance as well as the original WISC-R scores

Mercer: ELPs are intended to assess who should be in EMR classes


A major problem, in my view, is that we don’t know what consequences arise for children who are removed from EMR classes on the basis of ELPs

Is what we call these children important? It is if the label has an effect, but data do not show that effect

SOMPA is used much less today than it once was

Models of test bias

Regression

Constant Ratio

Cole/Darlington

Quota

Regression

Basis – unqualified individualism: Treat each person as an individual, not as a member of a group

Select people with highest scores for job or college place

Ignores sex, race, other group characteristics

Leads to highest average performance on criterion
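A minimal sketch of this selection rule, assuming made-up applicants and raw test scores as the predictor. The function name `select_top` and the data are illustrative only.

```python
def select_top(applicants, n_places):
    """Pick the n_places highest scorers, ignoring group membership."""
    ranked = sorted(applicants, key=lambda a: a[2], reverse=True)
    return ranked[:n_places]


# Hypothetical applicants: (name, group, test score).
applicants = [
    ("P1", "majority", 90),
    ("P2", "minority", 85),
    ("P3", "majority", 80),
    ("P4", "minority", 70),
    ("P5", "majority", 75),
    ("P6", "minority", 78),
]

selected = select_top(applicants, n_places=3)
selected_names = [name for name, _, _ in selected]
print(selected_names)
```

The group label is carried along in the data but never consulted, which is the point of unqualified individualism.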

Constant Ratio

Basis – choose so that selection ratio for groups = success ratio for groups

Select the best candidate but give a boost to minority group members’ scores so that selection probability = success probability

Constant Ratio

Adjust test scores for minority groups upwards by half the mean difference between groups

Leads to somewhat lower average performance on criterion
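The adjustment described here can be sketched as follows. The applicants and scores are invented, and the helper `mean_score` is a hypothetical name used only for this example.

```python
# Hypothetical applicants: (name, group, test score).
applicants = [
    ("M1", "majority", 90),
    ("M2", "majority", 81),
    ("M3", "majority", 75),
    ("N1", "minority", 84),
    ("N2", "minority", 80),
    ("N3", "minority", 70),
]


def mean_score(group):
    """Mean test score of all applicants in the given group."""
    scores = [score for _, g, score in applicants if g == group]
    return sum(scores) / len(scores)


# Boost minority scores by half the mean difference between the groups.
boost = (mean_score("majority") - mean_score("minority")) / 2

adjusted = [
    (name, group, score + boost if group == "minority" else score)
    for name, group, score in applicants
]

# Then select the highest adjusted scores as usual.
ranked = sorted(adjusted, key=lambda a: a[2], reverse=True)
selected_names = [name for name, _, _ in ranked[:3]]
print(f"boost = {boost}, selected = {selected_names}")
```

On raw scores the top three would be M1, N1, and M2; after the adjustment N2 moves ahead of M2.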

Cole/Darlington

Basis – If there is special value in selecting minority group members, then a minority score of Y on criterion is equal to a majority score of Y + k on criterion

Separate regression equations used for different groups and adjustment made

Leads to lower average performance on criterion

Cole/Darlington

If a value is placed on selection of minority group members, and intercept is lower for that group, then we consider minority test score X1 and majority test score X2 equal

[Figure: test scores X1 and X2, with the adjustment k marked]
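A small numerical sketch of this idea, assuming invented regression coefficients and an invented value of k. None of these numbers come from Cole or Darlington; they only show how a minority score X1 and a majority score X2 can end up treated as equal.

```python
# Hypothetical group-specific regression equations (criterion = slope * score + intercept).
# The minority intercept is lower, as in the case described above.
REGRESSION = {
    "majority": {"slope": 0.5, "intercept": 15.0},
    "minority": {"slope": 0.5, "intercept": 10.0},
}

K = 3.0  # hypothetical value placed on selecting a minority group member


def adjusted_prediction(group, score):
    """Predicted criterion from the group's own equation, plus k for the minority group."""
    line = REGRESSION[group]
    predicted = line["slope"] * score + line["intercept"]
    return predicted + K if group == "minority" else predicted


x1 = 60  # minority test score
x2 = 56  # majority test score
print(adjusted_prediction("minority", x1))
print(adjusted_prediction("majority", x2))
```

Both calls give the same adjusted prediction, so under these assumed numbers the two applicants would be treated as equally qualified.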

Quota

Basis – idea that all groups should have equal outcomes

Selection based on different regression equations for each group

Produces lower average performance on criterion
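A minimal sketch of quota selection, assuming made-up population shares and applicants. For brevity it ranks applicants within each group by raw test score rather than by a group-specific regression prediction, which is a simplification of the model described above.

```python
# Hypothetical population shares and applicants: (name, group, test score).
shares = {"majority": 0.75, "minority": 0.25}
applicants = [
    ("M1", "majority", 90),
    ("M2", "majority", 88),
    ("M3", "majority", 86),
    ("M4", "majority", 84),
    ("N1", "minority", 80),
    ("N2", "minority", 70),
]


def select_by_quota(applicants, shares, n_places):
    """Give each group places in proportion to its population share,
    then take the top scorers within each group."""
    selected = []
    for group, share in shares.items():
        # Rounding is a simplification; in general the rounded counts
        # may not add up to exactly n_places.
        places = round(share * n_places)
        members = [a for a in applicants if a[1] == group]
        members.sort(key=lambda a: a[2], reverse=True)
        selected.extend(members[:places])
    return selected


selected_names = [name for name, _, _ in select_by_quota(applicants, shares, 4)]
print(selected_names)
```

Selecting purely on score would fill all four places from the majority group here; the quota replaces M4 with N1.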

Quota

If 10% of population is Asian then 10% of student body should be Asian

Another way to look at this: if 10% of population is Jewish then no more than 10% of professors should be Jewish.

This puts the quota idea in a different light.
