
Likely Impact of the GRE® Writing Assessment on Graduate Admissions Decisions

Donald E. Powers Mary E. Fowles

GRE Board Report No. 97-06R

September 2000

This report presents the findings of a research project funded by and carried

out under the auspices of the Graduate Record Examinations Board

Educational Testing Service, Princeton, NJ 08541

********************

Researchers are encouraged to express freely their professional judgment. Therefore, points of view or opinions stated in Graduate

Record Examinations Board Reports do not necessarily represent official Graduate Record Examinations Board position or policy.

********************

The Graduate Record Examinations Board and Educational Testing Service are dedicated to the principle of equal opportunity, and their programs,

services, and employment policies are guided by that principle.

EDUCATIONAL TESTING SERVICE, ETS, the ETS logo, the modernized ETS logo, GRADUATE RECORD EXAMINATIONS, and GRE are

registered trademarks of Educational Testing Service.


Copyright © 2000 by Educational Testing Service. All rights reserved.

Acknowledgments

We are grateful to Doug Baldwin, Carol Dwyer, Nancy Glazer, and Cyndi Welsh for helpful

comments and suggestions; to Leona Aiken, Henry Braun, Jose Mestre, Phil Oltman, Larry Stricker, and

Art Young for providing formal reviews of a first draft; to Marisa Farnum, Terry Santos, and Agnes

Yamada for providing advice on the kinds of essay errors to consider; to Liane Patsula for providing data

on the Graduate Management Admissions Test Writing Assessment; to Rob Durso for help in generating

our data; to Pankaja Narayanan for providing information about GRE score sending patterns; to Debra

Friedman for assistance with the development of study materials; to Laura Jerry for analyzing the data; to

Ruth Yoder for identifying relevant sources to guide the development of study materials, for devising

study forms, for overseeing the general administration of the study, and for numerous other tasks; to the

faculty members and graduate admissions staff who contributed to the study; and to the Graduate Record

Examinations Board and its Research Committee for supporting this research.

Abstract

This “judgmental policy capturing” study investigated how the new Graduate Record

Examinations (GRE®) Writing Assessment might influence graduate admissions decisions. Of interest

was the likely impact of GRE Writing Assessment scores on graduate admissions decisions, as well as

the probable effects of sending actual examinee essays to graduate institutions along with test scores and

whether the presence of construct-irrelevant flaws in these essays might negatively influence admissions

decisions. To answer these questions, 23 graduate faculty -- who represent nine graduate psychology and

14 graduate history departments and who have at least some experience with the admissions process --

reviewed simulated admissions folders and made admissions decisions for a set of fictitious applicants.

The study examined the relationship between these admissions decisions and a number of variables,

including GRE Writing Assessment scores, the availability of examinee essays in admissions folders,

GRE General Test scores, undergraduate grades, and the quality of the applicant’s recommendation and

personal statement. The study results suggest that GRE writing scores will probably have some impact on

graduate admission decisions, but that overall, the availability of examinee essays will have substantially

less, if any, influence. The study did not detect any significant tendency for graduate faculty to attend

unduly to extraneous flaws in examinees’ essays. A substantial majority of the participants felt that

receiving applicants’ GRE essays would either probably or definitely be useful.

KEYWORDS: Graduate Record Examinations (GRE), writing assessment, graduate admissions, policy capturing, test use

Table of Contents

Introduction
  Policy Capturing
Method
  Overview of the Research Design
  Study Design
  Adding Errors to the Essays
  Strength of Applicants’ Admission Credentials
  The Sample of Faculty Participants
  Instruments/Procedures
  Analyses
Results
  Admission Rates
  Reliability/Consistency
  Effect of GRE Writing Assessment Scores and Presence of Essays in Folders
  Effect of Construct-Irrelevant Flaws in Essays
  Influence of Other Pre-Admission Factors
  The Fidelity of Our Simulation
Discussion
  Limitations
  Implications
  Further Considerations
References
Appendix A
Appendix B
Appendix C
Appendix D

List of Tables

Table 1. Correlations Among Pre-Admission Variables
Table 2a. Hierarchical Regression Analysis for All History Faculty Considering GRE Writing Assessment Only
Table 2b. Hierarchical Regression Analysis for All Psychology Faculty Considering GRE Writing Assessment Only
Table 3a. Hierarchical Regression Analysis for All History Faculty Considering All Information
Table 3b. Hierarchical Regression Analysis for All Psychology Faculty Considering All Information
Table 4a. Hierarchical Regression Analysis for All History Faculty for Abbreviated Applications
Table 4b. Hierarchical Regression Analysis for All Psychology Faculty for Abbreviated Applications
Table 5a. Hierarchical Regression Analysis for History Faculty Based on GRE Writing Assessment Only (Essays Available in Folders)
Table 5b. Hierarchical Regression Analysis for Psychology Faculty Based on GRE Writing Assessment Only (Essays Available in Folders)
Table 6a. Hierarchical Regression Analysis for History Faculty Based on All Information (Essays Available in Folders)
Table 6b. Hierarchical Regression Analysis for Psychology Faculty Based on All Information (Essays Available in Folders)
Table 7a. Hierarchical Regression Analysis for History Faculty Based on GRE Writing Assessment Essays Only (Essays Available in Separate Packet)
Table 7b. Hierarchical Regression Analysis for Psychology Faculty Based on GRE Writing Assessment Essays Only (Essays Available in Separate Packet)
Table 8a. Standardized Regression Weights for Prediction of Likelihood Estimates for History Faculty Participants
Table 8b. Standardized Regression Weights for Prediction of Likelihood Estimates for Psychology Faculty Participants
Table 9a. Mean Faculty Ratings and Weightings of Importance of Admissions Factors
Table 9b. Mean Faculty Ratings of Importance of Information Available From Recommendations
Table 9c. Mean Faculty Ratings of Importance of Information Available From Personal Statements
Table D1. GRE General Test Scores Interpretive Data
Table D2. GRE Writing Assessment Interpretive Data

List of Figures

Figure 1. Summary of study procedures
Figure 2. Mean likelihood of admission on the basis of the GRE Writing Assessment only, by score level, availability of essays, and prevalence of errors (psychology faculty)
Figure 3. Mean likelihood of admission on the basis of the GRE Writing Assessment only, by score level, availability of essays, and prevalence of errors (history faculty)
Figure 4. Mean likelihood of admission on the basis of the GRE Writing Assessment only (essays evaluated separately), by score level and prevalence of errors (psychology faculty)
Figure 5. Mean likelihood of admission on the basis of the GRE Writing Assessment only (essays evaluated separately), by score level and prevalence of errors (history faculty)

Introduction

What, if any, additional supporting materials -- actual examinee essays, for example -- should

accompany Graduate Record Examinations (GRE®) Writing Assessment scores when they are sent to

institutional test-score users? This question, arising during the development of the GRE Writing

Assessment, was a central focus of the research we have reported here. Also of interest was a more

general question concerning the likely influence of the new GRE Writing Assessment on graduate

admissions. Within the context of the broader study through which we sought to answer these two

questions, we also considered, in a less rigorous fashion, the impact of traditional pre-admission

information -- such as GRE General Test scores, undergraduate grades, and the quality of the applicant’s

recommendation and personal statement -- on graduate admissions decisions.

The GRE Writing Assessment, which was introduced operationally in the 1999-2000 testing

year, requires GRE test takers to write two essays -- one that entails discussing an issue and another that

involves analyzing an argument.1 On the basis of these two writing samples, a composite score is

reported that reflects each writer’s ability to analyze and discuss complex ideas in a clear, well-focused,

coherent, and effective manner. To help test-score users understand the meaning of scores derived from

the assessment, interpretive materials and sample essays are provided with the score reports that are sent

to graduate institutions. Currently, however, because of concerns about possible misuse, examinees’

essays do not accompany their test scores. This decision was based on the concern that, without proper

training, graduate admissions staff may misinterpret writing performance, perhaps by using idiosyncratic

criteria to re-evaluate essays or by focusing on extraneous features that are not considered integral to the

construct of writing ability as defined by the GRE program. (Trained essay readers are instructed to

disregard such features when scoring GRE essays.) Moreover, because irrelevant features may be

differentially prevalent by gender, ethnicity, or cultural background, there exists in this practice a

potential for unfairness to certain groups of test takers. Of course, there are also cost considerations

involved in sending (and receiving) essays.

1 For a full description of the GRE Writing Assessment, sample questions, and scoring criteria, see the GRE website (http://www.gre.org/twotasks.html) or the 1999-2000 GRE Guide to the use of scores (Educational Testing Service, 1999).

The reluctance to release actual GRE examinee essays to graduate institutions is not felt by

everyone in the assessment community, however. In fact, sending the essays along with the score reports

was endorsed by a majority of the experts in the teaching and assessment of writing who offered advice

during the development of the new assessment. This endorsement, it seems, was predicated on the belief

that releasing essays may actually increase access to graduate education for some applicants whose

writing scores might otherwise preclude their admission. Making essays available might also, it was

believed, improve the academic community’s understanding of the writing assessment, whereas

withholding the essays could exacerbate feelings of mistrust about the new measure. These sentiments

are perhaps best reflected in the minutes of one GRE Writing Advisory Committee meeting, held June

25-26, 1996, which cited the following reasons for releasing examinees’ essays:

• Many faculty are quite capable of interpreting GRE essays appropriately.

• Some testing programs -- such as the Graduate Management Admission Test (GMAT) -- have been releasing essays for some time, and there have been few, if any, reports to suggest that this information has been abused.2

• When considering candidates with marginal writing scores, departments may find that having essays is helpful for determining the seriousness of an applicant’s writing deficiencies. In addition, departments may be willing to accept applicants whose writing exhibits a strong “voice” and well-reasoned explanations, knowing that (for international students, for example) grammatical and other second-language problems could be remediated after admission.

In light of the GRE Writing Advisory Committee’s stance, the GRE Board left open the possibility of

revisiting its current policy of not releasing essays; hence the study reported here. Anticipating our main

findings, we note here that, of the 23 graduate faculty who participated in the study we describe below, a

substantial majority felt that receiving applicants’ GRE essays would either probably (n = 3) or definitely

(n = 16) be useful.

2 As a prelude to the study, we conducted a small, informal survey of admissions personnel at 14 graduate schools of management, all of whom were addressed through their web sites. Half of those contacted offered responses to our query about “. . . whether or not you find that receiving applicants’ GMAT essays, along with their test scores, is useful in your admissions process.” A variety of views were expressed, but the most frequent theme was that, if read, the essays were consulted selectively -- for example, for international students or when a question arose about the authenticity of an alternative writing sample submitted by an applicant.

Policy Capturing

“Policy capturing” is a technique used to model human decision-making. More specifically, the

term refers to efforts to explicate the relative weights or rules for combining information of different

types or from several sources in order to arrive at summary judgments or decisions (see, for example,

Hammond, Mumpower, & Smith, 1977). As Schmidt, Johnson, and Gugel (1978) point out, policy-capturing

methodology has been used successfully to study virtually all kinds of decision-making

strategies. Applications have included:

• modeling the diagnoses of clinical psychologists (Goldberg, 1970)

• investigating the factors underlying judicial decisions (Roehling, 1993; Werner & Bolino, 1997)

• studying teachers as decision makers (Shavelson, 1973; Shavelson & Atwood, 1977) and students as evaluators of teaching effectiveness (Harrison, Ryan, & Moore, 1996)

• understanding the bases for performance appraisals (e.g., Hobson, Mendel, & Gibson, 1981) and employment interviews (Dougherty, Ebert, & Callendar, 1986)

• learning about the factors underlying sexual behavior (Finkelstein & Brannick, 1997; Wiederman, 1999) and preference for mates (Wiederman & Dubois, 1998)

• capturing the strategies that expert horse racing handicappers use to predict odds at post time (Ceci & Liker, 1986)

The studies most relevant to the effort reported here -- for example, research on employee

evaluation (Gaeth & Shanteau, 1984; Stumpf & London, 1981; and Zedeck & Kafry, 1977) -- have been

undertaken in organizational and industrial settings. Several efforts, however, have aimed to clarify the

nature of admissions decisions in academic settings. Perhaps best known is a study of admissions

decisions at the University of Oregon’s psychology department (Dawes, 1971), which found that GRE

General Test scores were more highly related to admission ratings than were undergraduate grade point

averages (GPA). Other studies include:

• Schmidt, Johnson, and Gugel’s 1978 study of applicants to psychology programs at Michigan State University, in which different subdisciplines of psychology weighted criteria very differently

• Wallace and Schwab’s 1976 study of decisions about applicants to an industrial relations program, in which GRE General Test scores and undergraduate grades were the most consistent predictors of admission decisions

• Gomey and Jaeger’s 1995 study of the likely impact of several new GRE tests on graduate admissions, in which tests received different weights by discipline

• Kline and Sulsky’s 1995 study of the acceptability of applicants to a graduate psychology program at the University of Calgary, in which professors used information relatively consistently, but sometimes applied complex rules when making judgments

Besides contributing to a better understanding of graduate admissions, each of these investigations also

provided guidance for the study reported here. Gomey and Jaeger’s (1995) advice, for example, was to

use simulated applications that are as realistic and as representative of the typical applicant pool as

possible.

Method

Overview of the Research Design

The primary methodology employed in the study reported here has usually been referred to as

“judgmental policy capturing.” This technique was designed to model statistically, or “capture,” the

judgments of human decision makers. However, in contrast to most policy capturing studies, which

primarily center on the decisions of individuals, the focus here was on decision makers in the aggregate.

It should be stressed at the outset that, like most policy capturing studies, ours relies mainly on

correlational data. Thus, with the exception of those variables that were experimentally manipulated and

were therefore uncorrelated with other variables, a large weight for a given variable does not necessarily

mean that the variable was actually used in decision making. Rather, it could signify instead only that

some highly related variable was used.
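
The caveat above can be illustrated with a small Python sketch. It is not part of the study's analyses: the cue names, sample size, and coefficients are hypothetical, and the model is an ordinary least-squares fit on standardized variables, which is one common way to "capture" a judgment policy. The simulated judge relies only on `cue_a`, yet a correlated cue (`cue_b`) receives a large weight whenever `cue_a` is left out of the model, while an independent cue (`cue_c`, standing in for an experimentally manipulated factor) does not.

```python
import numpy as np


def standardized_weights(cues, judgments):
    """Least-squares weights after z-scoring the cues and the judgments."""
    X = (cues - cues.mean(axis=0)) / cues.std(axis=0)
    y = (judgments - judgments.mean()) / judgments.std()
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(0)
n = 2000  # hypothetical number of folders; large so the pattern is stable

# Hypothetical cues (illustrative only, not the study's variables):
# cue_a drives the judgment, cue_b is merely correlated with cue_a,
# cue_c is independent of both, as a manipulated design factor would be.
cue_a = rng.normal(size=n)
cue_b = 0.8 * cue_a + 0.6 * rng.normal(size=n)
cue_c = rng.normal(size=n)
judgment = cue_a + 0.5 * rng.normal(size=n)  # the judge uses cue_a only

# Model without cue_a: cue_b picks up weight it did not earn.
w_without_a = standardized_weights(np.column_stack([cue_b, cue_c]), judgment)

# Model with cue_a: the weight for cue_b shrinks toward zero.
w_with_a = standardized_weights(np.column_stack([cue_a, cue_b, cue_c]), judgment)

print("weights without cue_a (cue_b, cue_c):", np.round(w_without_a, 2))
print("weights with cue_a (cue_a, cue_b, cue_c):", np.round(w_with_a, 2))
```

In this sketch the weight for `cue_c` stays near zero in both models because it is uncorrelated with the other cues, which is why weights for experimentally manipulated variables can be interpreted more directly.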

For this study, a small number of history (n = 14) and psychology faculty (n = 9) involved in

graduate admissions each reviewed 27 simulated “admission folders” for a set of fictitious applicants and

made judgments of the admissibility of each one. Each folder contained an application for graduate

admission, which specified the applicant’s GRE General Test scores, GRE Writing Assessment score,

undergraduate GPAs, and other personal information that typically appears on applications to graduate

school. Each application also included a simulated recommendation from an undergraduate professor,

and a manufactured summary of information gleaned from a hypothetical personal statement. (The

rationale for our use of simulations instead of actual recommendations and personal statements will, we

hope, become clear later in this report.)

A key variable of interest to us was the amount of information provided about applicants’

performance on the GRE Writing Assessment. Within each graduate department in our study, half of the

faculty participants received applicants’ GRE writing scores, while the other half received applicants’

GRE writing scores plus copies of the essays on which the scores were based. All of these essays had

been written by participants in an earlier GRE research study and had been scored by ETS-trained

readers.

A second design factor involved the number of additional errors introduced by us into applicants’

GRE essays. To control this factor experimentally, we systematically inserted extraneous errors of

various sorts into two thirds of the essays, while leaving the remaining 18 essays intact. The errors were

the kind that trained GRE essay readers are instructed to keep in perspective when evaluating examinees’

essays. We hypothesized that such errors could negatively influence graduate faculty perceptions of

applicants’ writing skills.3

Analyses were undertaken to determine how the likelihood of the admission of our hypothetical

applicants relates to three variables: (1) the applicants’ GRE Writing Assessment scores, (2) whether or

not applicants’ essays accompanied their GRE writing scores, and (3) the extent to which applicants’

essays contained extraneous flaws. Secondarily, we also estimated the influence of several traditional

graduate school admissions criteria -- GRE General Test scores, undergraduate GPAs, and the quality of

the applicant’s recommendation and personal statement -- on these admission decisions.

Study Design

The specific factors varied in the study reported here were:

• the presence or absence of actual essays in application folders

• applicants’ GRE writing scores

• the extent to which essays contained extraneous flaws (none, some, or many)

• the strength of applicants’ other admissions credentials -- such as the level of their GRE General Test scores and undergraduate GPAs

Inclusion of essays. As noted earlier, half of the participating faculty saw the scores from the

GRE Writing Assessment; the other half saw scores and actual essays. All participants received a set of

interpretive materials explaining the Writing Assessment. These materials included a description of the

assessment and the scoring guides, questions and answers about various aspects of the measure, and sample

benchmark essays representing each point on the GRE writing score scale.

3 A third feature of the essays, which was of possible interest to the GRE Research Committee, was the extent to which essays expressed controversial views. We decided, however, not to try to incorporate this feature into the current study because adequately evaluating the influence of controversial opinions would require additional information about faculty participants -- for example, what kinds of views they themselves hold. We felt that, while the issue of controversy is an important one, it was beyond the scope of the present effort.

GRE Writing Assessment scores. Examinee essays were selected from some 4,000 essays that

were written by third- and fourth-year undergraduates and by first-year graduate students at 26

geographically diverse colleges and universities during two earlier research studies of the GRE Writing

Assessment (Powers, Fowles, & Welsh, 1999; Schaeffer, Briel, & Fowles, in press). From this sample,

we identified a total of 27 participants, each of whom had written both an issue essay and an argument

essay. The GRE writing scores for this small sample spanned the range of GRE Writing Assessment

scores: exactly three examinees were identified at each half-point score interval from 2.0 to 6.0. (Because

scores below 2.0 are rare in other holistically scored graduate-level writing tests, they were not included

in the study.) Each of the resulting 54 essays (that is, 27 issue essays and 27 argument essays) had

previously received exactly the same score by each of two trained readers (in operational scoring, a

difference of one point or less is considered to be an acceptable level of agreement).

Insertion of minor flaws. To control extraneous flaws or errors, we systematically modified some

essays by introducing irrelevant flaws. The work of Freedman (1979) provided a good model for

experimentally manipulating this design factor. For a third of the 27 applicants (one at each GRE Writing

Assessment half-score point interval), both the issue essay and the argument essay remained as originally

written, with no additional irrelevant characteristics added. For another third of the applicants (again, one

at each score level), both essays were modified by introducing “some” irrelevant features. For the

remaining third of the applicants, “many” such characteristics were inserted into essays at each score

level. Caution was needed here, however, as errors can differ significantly in their quality: Some are

considerably more serious than others because they render an argument less compelling or because they

interfere with meaning. The quantity of errors was also a concern: For example, a few diction errors may

matter little in an otherwise strong essay, but a large number of word-choice errors may have a greater

impact.

With respect to the quantity of errors, we defined “some” as any two of the following error

groupings, while “many” was defined as the presence of all three of the following error groupings:

• a minor error of fact that did not affect the strength of the essay

• two errors of grammar or diction that did not seriously affect the coherence of the essay

• three spelling errors

GRE Writing Assessment specialists judged all of these newly added errors to be contextually so

trivial as to have no effect on the overall quality of any essay. To verify this judgment, two additional

trained essay readers who had no knowledge of the original scores reevaluated each of the modified

essays. This step served to verify that scores had not changed as a result of our modifications.4
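
The essay design described above can be checked with a short Python sketch. It simply enumerates the cells stated in the text (nine half-point score levels, one applicant per level in each error condition, two essays per applicant); the short labels for the error groupings and the variable names are ours, introduced only for illustration.

```python
from itertools import combinations

# Half-point score levels from 2.0 to 6.0, as stated in the text.
score_levels = [2.0 + 0.5 * i for i in range(9)]

# The three error groupings from the text; the short labels are ours.
error_groupings = (
    "one minor error of fact",
    "two errors of grammar or diction",
    "three spelling errors",
)

# "Some" = any two of the groupings; "many" = all three groupings.
some_options = list(combinations(error_groupings, 2))
many_option = error_groupings

# One applicant per score level in each error condition.
conditions = ("none", "some", "many")
design = [(score, cond) for score in score_levels for cond in conditions]

applicants = len(design)
essays = 2 * applicants  # one issue essay and one argument essay each
modified_essays = 2 * sum(1 for _, cond in design if cond != "none")
intact_essays = essays - modified_essays

print(applicants, essays, modified_essays, intact_essays)
print(len(some_options), "possible pairings for the 'some' condition")
```

The counts agree with the text: 27 applicants, 54 essays, and 18 essays left intact, with the remaining two thirds modified.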

Other admissions credentials. To determine the impact of more traditional admissions

credentials, we specified the following for each hypothetical applicant:

• GRE General Test scores (verbal, quantitative, and analytical)

• undergraduate GPAs (overall and in major field)

• the quality of the applicant’s recommendations (both a rating of the student’s overall promise, as suggested by the recommendation, and a summary of the recommendation)

• the quality of the applicant’s personal statement (that is, a rating of the information that was, hypothetically, gleaned from the statement)

We generated these data with a data simulation program so that each variable met specified means and variances

and the variables, as a set, met specified correlations among them. Because the

number of constraints was relatively large compared with the number of cases, converging exactly on the

target statistics was not possible. Manual adjustments to some data values were needed in order to attain

a sufficiently close approximation to the targets. More detail about the specification of these variables is

provided below.
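
As a rough illustration of generating data to match target statistics, the following Python sketch uses a standard whiten-then-color approach with a Cholesky factorization. The report does not describe the simulation program actually used, so this is not the authors' method, and every target value below (means, standard deviations, correlations) is a hypothetical placeholder. Without further constraints, this approach reproduces the targets exactly in the sample; the second step shows how rounding and clipping values to a reporting scale (here, the 200-800 scale in 10-point steps used for GRE General Test sections at the time, and a 0.0-4.0 GPA scale) can move the sample statistics away from the targets. Constraints of this kind are one plausible reason, though only our assumption, that an exact match might be unattainable.

```python
import numpy as np


def simulate_exact(n, means, sds, corr, seed=0):
    """Return n rows whose sample means, SDs, and correlations equal the targets."""
    rng = np.random.default_rng(seed)
    means, sds, corr = (np.asarray(a, dtype=float) for a in (means, sds, corr))
    z = rng.normal(size=(n, len(means)))
    z -= z.mean(axis=0)  # exact zero sample means
    # Whiten: remove the sample covariance the random draw happens to have.
    l_sample = np.linalg.cholesky(np.cov(z, rowvar=False))
    white = z @ np.linalg.inv(l_sample).T
    # Color: impose the target covariance.
    target_cov = corr * np.outer(sds, sds)
    l_target = np.linalg.cholesky(target_cov)
    return means + white @ l_target.T


# Hypothetical targets for (GRE verbal, GRE quantitative, overall GPA).
target_means = [550.0, 580.0, 3.4]
target_sds = [100.0, 110.0, 0.4]
target_corr = np.array([
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.3],
    [0.3, 0.3, 1.0],
])

data = simulate_exact(27, target_means, target_sds, target_corr)

# Impose realistic reporting scales; this disturbs the exact match.
rounded = data.copy()
rounded[:, :2] = np.clip(np.round(rounded[:, :2] / 10) * 10, 200, 800)
rounded[:, 2] = np.clip(np.round(rounded[:, 2], 2), 0.0, 4.0)
drift = np.abs(np.corrcoef(rounded, rowvar=False) - target_corr).max()

print("largest correlation drift after rounding and clipping:", drift)
```

After a step like the rounding above, individual values would need adjusting by hand or by further iteration to bring the statistics back toward the targets.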

Adding Errors to the Essays

Our task was to decide which (and how many) errors to include in our “doctored” essays. Our

aim was to use errors that would be potentially distracting to untrained readers, but would not affect the

overall quality of essays. Toward this end, several sources guided our decisions about the kinds of errors

that we inserted into essays. First, we reviewed selected literature on “error gravity” for clues regarding

the sorts of errors that might be most noticeable or irritating to untrained readers (see Connors &

4 The new GRE Writing Assessment scores given to the 27 applicants corresponded very closely to the scores that were assigned originally (r = .92). The mean original score was 4.0, while the mean restore was 4.06. An inspection of original and subsequent restores for individual applicants did not suggest that modified essays were any more likely than unmodified essays to receive different scores upon restoring. Thus, the insertion of irrelevant errors did not appear to affect the scores assigned by trained essay readers.


Lunsford, 1988; Hairston, 1981; Janopoulos, 1992; Rifkin & Roberts, 1995; Santos, 1988; and Vann,

Meyer, & Lorenz, 1984, for example). Next, we examined essays that were written during the fall 1997

field testing of GRE writing prompts to glean this information. This was accomplished both

retrospectively (readers revisited previously scored essays to identify errors or features that were

potentially distracting), and concurrently (readers recorded errors and distracting features during their

initial scoring of GRE essays).5 We also asked several experts6 to tell us the kinds of errors they feel are

prominent in GRE essays and that graduate faculty might find distracting. Finally, we reviewed the

manuals and materials used to train ETS readers in the holistic scoring of essays. Perusal of these

manuals was informative, both with respect to what is expected of readers and how the construct of

writing is defined.7

Strength of Applicants’ Admission Credentials

Each simulated admissions folder contained three documents -- an application for graduate study

(which contained information about our hypothetical students’ test scores and grades), a recommendation

5 Among the potentially distracting features that readers noted were the following: (a) the accumulation of spelling errors, (b) “bull,” jargon, intrusive literary allusions, or other “inflated effects,” (c) ESL and other dialect markers that do not interfere with meaning, (d) factual errors, (e) highly charged emotional responses or diatribes, (f) incompleteness, (g) highly specialized, discipline-specific subject matter, (h) unusual or idiosyncratic approaches to the topic, (i) variation in the quality of various parts of the essay, (j) nonstandard forms, such as double negatives, (k) punctuation errors, (l) misuse of diction or poor word choice, (m) the absence of paragraphing, and (n) “mangled” cliches, such as “looking at the world through rose-colored eyes.”

6 Agnes Yamada, Professor of English and Department Chair at California State University at Dominguez Hills (personal communication, April 3, 1998), Terry Santos, Associate Professor at Humboldt State University (personal communication, April 5, 1998), and ETS staff member Marissa Farnum (personal communication, April 5, 1998). ETS staff members Nancy Glazer and Cyndi Welsh were especially helpful to us in our decision-making about which errors to insert into the essays.

7 These points are emphasized with regard to holistic scoring: Score holistically -- for the overall quality of thinking and writing, as described in the Scoring Guide. Don’t focus on minor errors or a single weakness. Three examples are not necessarily better than one; what matters is how well-chosen the examples are and how well they are discussed.

Read supportively; look for and reward what is done well in relation to the scoring criteria, rather than penalize what has been done poorly or omitted.

Read the entire response; the writing sometimes improves dramatically as the writer becomes more engaged in the task.

Acknowledge the limitations of time. Depending on the task, candidates have only 45 or 30 minutes to draft and revise a response. Judge the responses as first drafts and do not expect “perfect” or highly polished papers, even at the top score levels.

Do not penalize an unfinished but developed response for lacking a conclusion.

Do not judge a response by its length; some short responses are very good and some long ones deserve low scores.

form, and a summary of the applicant’s statement of purpose. The development of application materials

was based on a small number of actual applications that GRE test takers supplied and permitted us to use as

models. The application form itself was a composite of actual forms currently used by graduate

institutions, as was the recommendation form.

For each of the two disciplines represented by our application folders -- history and psychology

-- we specified a target matrix of means, variances, and correlations among variables. For GRE General

Test scores, for example, target means and variances were based on statistics for actual examinees who

had majored in psychology and history (Educational Testing Service, 1998). Because the departments

that agreed to participate in the study were, with few exceptions, relatively selective ones, GRE General

Test score means supplied in the folders of our fictitious history and psychology applicants were actually

set somewhat higher (by about 25 points) than the means for all GRE test takers in each of the two

disciplines. Faculty comments confirmed the appropriateness of setting these higher test-score levels.

The intercorrelations reported for all GRE General Test takers (Educational Testing Service,

1998) served as the target for correlations among the three GRE General Test scores -- verbal,

quantitative, and analytical -- that we provided. The target means and variances for undergraduate GPAs,

and their correlations with GRE General Test scores, were based on data provided by Schneider and

Briel (1990) and Wilson (1986). For GRE Writing Assessment scores, target statistics were based on

research data from pre-operational tryouts of the GRE Writing Assessment (Powers, Fowles, & Boyles,

1996) and on data from the first two years of the operational GMAT Analytical Writing Assessment

(Liane Patsula, personal communication, May 27, 1999). The use of data from the GMAT program

seemed justified, as the GMAT and GRE writing assessments are very similar to one another with respect

to both their content and scoring. Finally, overall undergraduate GPAs and GPAs in the major field of

study were specified so as to relate very strongly to one another, with the latter having a slightly higher

mean.

A review of several recommendation forms suggested a number of traits that recommendation

writers are often asked to rate or comment on. In order to identify other relevant traits, we also consulted

literature on the nature and use of recommendations (for example, Aamodt, Bryan, & Whitcomb, 1993;

Range, Menyhert, Walsh, Hardin, Ellis, & Craddick, 1991). In this regard, Keith-Spiegel’s (1991)

analysis of several hundred recommendations written in support of graduate applications, and an

accompanying discussion of some two dozen applicant qualities mentioned in these letters, proved very

helpful.

Taking McCauley’s (1991) advice, rather than using predominately narrative accounts of

applicants’ qualities, we chose to quantify the information in our hypothetical applicants’ letters of

recommendation and personal statements. For both recommendations and personal statements, we

specified applicant traits or qualities that might be manifested in narrative accounts, and then provided

numerical ratings of each trait, as if the narratives had been evaluated previously for evidence of each

trait. That is, for each applicant we provided a list of traits or personal qualities that had already been

rated (for recommendations) or coded (for personal statements).

Employing this strategy, instead of providing potentially ambiguous descriptive letters, was

intended to minimize the level of inference required of application readers (Tommasi, Williams, &

Nordstrom, 1998). Although the resulting numerical evaluations may not have been entirely

characteristic of the information normally reviewed by graduate admissions committees, this strategy did

allow us to sidestep several potential complications associated with the use of actual narrative statements

and recommendations. In particular, we were able to exert some control over the numerous factors that

are known to moderate the influence of recommendations. Among these influences are length (Weins,

Jackson, Manaugh, & Matarazzo, 1969), tone, content, and structure (Loher, Hazer, Tsai, Tilton, &

James, 1997), writer (Ralston & Yoder, 1989), and the presence of specific examples versus generalities

(Knouse, 1983).

With respect to summary recommendations, our fictitious applicants were divided evenly among

the following categories: “strongly recommended,” “recommended,” and “recommended with

reservations.” None were placed in the lowest category (“not recommended”), since few graduate school

applicants are, we believe, so unfortunate as to secure entirely negative endorsements. We set the mean

rating of “overall promise” on the recommendation forms between “good” (4) and “very good” (5) on a

seven-point scale ranging from “below average” (1) to “truly exceptional” (7). Summary

recommendations and ratings of overall promise were specified so as to correlate strongly with one

another (r = .87), but only slightly with other variables. The 16 individual traits (dependability,

intellectual ability, and so on) that appeared on the recommendation form were specified to correlate

moderately with one another (r = .31 to .77), and strongly with the overall rating of promise (r = .68 to

.79). Consistent with research by Zeleznik, Hojat, & Veloski (1983), recommendations were generated so

as to relate modestly to undergraduate GPAs (r = .21, .28) and, in the absence of any other information, to correlate

only slightly with test scores and other variables (r = .03 to .34).


To devise personal statements, we followed a strategy similar to the one used for

recommendations. First, we reviewed a set of real personal statements that had been provided for

previous research on the value of personal statements as an indication of writing skill (Powers & Fowles,

1997). We also examined a number of personal statement forms that are currently used by graduate

institutions, and noted the kinds of information that applicants are asked to provide. Finally, we reviewed

relevant literature on personal statements in order to ascertain further the kinds of evidence that these

statements typically contain (Ferguson, 1991; Hatch, Hill, & Hayes, 1993; Hawkins, 1993; Keith-

Spiegel, 1991; Murphy, 1991; Paley, 1994; Wickenden, 1982; Willingham, 1985; Willingham & Breland,

1982). On these bases, we created a checklist of important qualities and traits that might be inferred from

an applicant’s personal statement. We completed this checklist for each applicant and included it in each

application folder as “A Summary of Applicant’s Statement of Purpose.”

The average rating of overall strength of personal statements was set between “good” (4) and

“very good” (5) on the same seven-point scale used for recommendations, and the correlations of this

rating with other pre-admission measures were assumed to be slight. Correlations among the 11 qualities

or traits gleaned from the personal statements (motivation, likely commitment to the field, and so on)

were moderate to strong (r = .35 to .79), as was the correlation of each trait with the rating of the overall

strength of the statement (r = .67 to .79).

Descriptive statistics for, and correlations among, all pre-admission measures are shown in Table

1. The same statistics were used for history and psychology applicants, except for a different profile of

mean GRE General Test scores (for GRE verbal, quantitative, and analytical scores, respectively, m =

556, 545, and 589 for history applicants, and m = 486, 525, and 569 for psychology applicants). These

between-discipline differences in test scores were specified on the basis of data available for all GRE

General Test takers by discipline.


Table 1. Correlations Among Pre-Admission Variables

Variable                                             M          SD      1     2     3     4     5     6     7     8     9
1. GRE verbal score                                  486/556ᵃ   86
2. GRE quantitative score                            525/545ᵃ   90     .47
3. GRE analytical score                              569/589ᵃ   85     .59   .61
4. GRE writing score                                 4.0        1.32   .42   .13   .34
5. Undergraduate GPA -- overall                      3.27       0.31   .30   .27   .23   .12
6. Undergraduate GPA -- major                        3.33       0.32   .20   .21   .29   .16   .71
7. Summary recommendationᵇ                           2.04       0.81   .25   .03   .22   .05   .28   .21
8. Rating of overall promiseᶜ                        4.26       1.46   .20   .12   .20   .07   .17   .23   .87
9. Personal statementᶜ                               4.56       1.69   .03   .09   .08   .02   .04   .17   .18   .19
10. Selectivity/reputation of undergraduate schoolᵈ  2.0        0.83   .27   .31   .30   .05   .09   .07   .34   .32   .33

Note. Correlations among variables were the same for psychology and history applicants, as were means and standard deviations for all variables except GRE General Test scores. ᵃ The first mean is for psychology applicants; the second is for history applicants. ᵇ On a three-point scale: 1 = recommend with reservations, 2 = recommend, 3 = strongly recommend. ᶜ On a seven-point scale: 1 = below average, 2 = average, 3 = somewhat above average, 4 = good, 5 = very good, 6 = outstanding, 7 = truly exceptional. ᵈ On a three-point scale: 1 = less selective/good reputation, 2 = very selective/very good reputation, 3 = extremely selective/extremely good reputation.

The Sample of Faculty Participants

History and psychology departments were selected for the study because these two disciplines

typically require relatively extensive writing of students, and thus these departments seemed to us to be

reasonably likely prospects for adopting the GRE Writing Assessment. No attempt was made to identify

a representative sample of departments within these disciplines. Rather, for each discipline, we

selected departments primarily on the basis of the high volume of GRE test scores that they currently

receive. Further, we identified clusters of departments that had, for a recent testing year, received GRE

General Test scores from at least some of the same test takers. That is, these departments appeared to

“compete” for the same or similar students, and could thus be expected to have somewhat similar

applicant pools. The aim here was to decrease the possibility that our simulated applicant pool would

deviate dramatically from the pool of applicants typically seen by the departments participating in our


study. This strategy obviated the need to tailor applications to each department, thus enabling us to use

the same set of admissions folders for every department within a single discipline.

Once departments were identified, letters of invitation were sent to department chairs. In order to

participate, each department was required to identify two faculty members who would be willing to

evaluate a package of fictionalized “admissions folders.” Each faculty participant received an honorarium

of $200. Our invitation intentionally solicited faculty who had recently been involved in graduate

admissions, and preferably, who currently served on a departmental admissions committee. A total of six

psychology departments and eight history departments agreed to participate. These departments were,

according to participants’ reports of admission rates, relatively selective. Selection ratios ranged from 7%

to 20% for psychology departments (median = 10%) and from 10% to 70% for participating history

departments (median = 33%).

From these participating departments, data were eventually received from nine psychology

faculty and 14 history faculty, all of whom had been involved in graduate admissions at some point in

their careers. The range of experience was from 3 to 25 years for psychology faculty (median = 20 years)

and from 1 to 25 years for history faculty (median = 5 years). A majority of the participants (6 of 9

psychology faculty and 13 of 14 history faculty) had been involved in graduate admissions during the

previous year.

Instruments/Procedures

Within each department, faculty members were randomly assigned to one of two conditions:

• reading folders that contained both GRE Writing Assessment scores and examinee essays

• reading folders that contained only GRE Writing Assessment scores, but no examinee essays

For each condition, participants received by mail a packet of materials, general information about the

study, and directions for completing the study tasks. Included in each packet were simulated admissions

folders for the 27 fictitious applicants, each applying to a doctoral program. Each folder contained the

following pieces of information:


• an application8 for graduate study, which included GRE General Test scores, GRE Writing Assessment scores, undergraduate GPAs (overall and in major field), undergraduate institution, and proposed program of study (a sample application form is provided in Appendix A)

• a recommendation from an undergraduate professor (a sample recommendation form is provided in Appendix B)

• a summary of information gleaned from the applicant’s personal statement/statement of purpose (a sample personal statement form is provided in Appendix C)

For the scores-plus-essays condition, each folder also contained two GRE Writing Assessment essays (one

issue essay and one argument essay) written by the applicant.

For the scores-only condition, application folders did not contain essays. However, an additional,

separate, sealed package containing applicants’ GRE essays was provided to participants who were

assigned to the scores-only condition. These participants were instructed to open this package and

evaluate the enclosed essays only after completing all other tasks. The aim here was to compare decisions

within the same group of faculty members according to whether or not they had access to GRE essays.

(For the scores-plus-essays condition, this separate packet was omitted, as applicants’ GRE essays were

already included in their application folders.) Application folders were arranged in a different random

order for each faculty participant.

Before completing study tasks, all participants were first asked to read material about the new

GRE Writing Assessment. This material contained background about the test, information about the

scoring criteria, and, for both the issue and the argument topics, examples of essays at each score level.

After becoming familiar with the Writing Assessment, participants in each condition were asked to judge

the admissibility of each of the hypothetical applicants included in the folders, indicating for each

applicant:

• the faculty member’s own recommendation for admission (deny or admit)

• the faculty member’s estimate of the likelihood (0 to 100%) that their program/department would admit the applicant

8 Applications did not reveal the gender or the racial/ethnic background of applicants, as investigating the effect of these factors was beyond the scope of the present study. Specifying these characteristics would, we believed, serve to confound the results, given the small number of applicants that were included. To conceal the gender of each applicant, gender-ambiguous first names -- such as Dana, Lee, and Robin -- were used.


We thought that these two different perspectives on the admissibility of applicants -- one referenced to

the individual faculty member and the other to an admissions committee -- might differ psychologically.

Furthermore, because of the nature of the resulting variables, either dichotomous or continuous, the

results might differ statistically also.

In making their judgments, participants were asked to take two passes through the applications,

first considering only information about the applicants’ writing skill, as reflected by GRE Writing

Assessment scores for the scores-only condition, or by GRE Writing Assessment scores and essays for

the scores-plus-essays condition.9 Then, on a second pass, they were to consider all the available

information for each applicant. After each pass, participants were asked to render the two admissions

decisions described above. Then, participants in the scores-only condition were asked to undertake a final

step -- to open the separate package of applicant essays and, on the basis of these essays, make the same

two judgments about each applicant’s admissibility again.

All participants were advised to evaluate the applications in a way that best matched their

department’s admissions procedures. For example, if their department first sorts applications into broad categories

before making individual decisions, participants were to proceed accordingly in our study. The faculty were also

reminded that our “applicant pool” might differ in size and/or overall quality from their typical pool of

applicants. To minimize the potential effect of any such differences, they were asked to apply their usual

standards, so that students “admitted” from our applicant pool would approximate the general caliber of

their typical entering class.

In addition, recognizing that our study procedures could not faithfully simulate every aspect of

the graduate admissions process, we asked participants to accept a number of simplifying assumptions

and to make their best judgments in light of incomplete information. With regard to assumptions, for

instance, all of our hypothetical applicants had obtained undergraduate degrees in the same field to which

they were applying for graduate study, and all had graduated within six months of submitting their

applications. With respect to the recommendations we provided, study participants were asked to assume,

for the sake of standardization, that the unnamed recommendation writer was someone they knew and

respected, and who was active, but not necessarily well known, in their field. Because we supplied only

9 We asked participants to focus on the GRE Writing Assessment during their first pass to ensure that they would be at least somewhat oriented to this new measure. In retrospect, the alternative order -- that is, having participants attend first to all the available information in folders -- might have been even more appropriate, as this order might have minimized the likelihood of participants focusing unduly on the Writing Assessment in making their judgments.


one recommendation per applicant, participants were asked to assume further that other, missing

recommendations were consistent with the one provided.

With respect to missing information, we did not, for example, reveal a number of possibly

relevant characteristics of our fictitious applicants -- such as their gender, race/ethnicity, and age.

Furthermore, we omitted other important information, such as full undergraduate transcripts, which

departments undoubtedly consider when making real admissions decisions. Instead, we asked participants

to assume that each applicant had completed any undergraduate courses that were considered necessary

or prerequisite to their programs, and that performance in these courses was commensurate with

applicants’ overall undergraduate GPAs, as specified on the application. In addition, for each applicant

we identified only a fictitious undergraduate school, assumed to be regionally accredited, along with an

indication of that school’s selectivity/reputation as either (3) extremely selective/extremely good

reputation, (2) very selective/very good reputation, or (1) less selective/good reputation. Five to six

names of real undergraduate schools were given as examples of each category. Institutions were assigned

to these categories on the basis of ratings provided annually by such sources as Time Magazine, U.S.

News and World Report, Peterson’s Guides, and College Board listings. Only schools with

classifications that were consistent across these various sources were cited as examples.

As a next step, each participant made another round of admissions decisions for what they were

told was a second batch of 27 applicants; information for these applicants was given only in abbreviated

form. These “new” applicants were actually the same 27 applicants evaluated in the previous steps. For

this step, however, applicants (with names deleted) were presented in a different order, and information

was presented in a different format -- on a single, one-page form instead of in individual application

folders (a sample of this abbreviated application form is provided in Appendix D). These abbreviated

applications were presented in order to obtain a crude estimate of the consistency of the decisions made

by each faculty participant. Because the abbreviated folders did not match the full folders exactly, this

procedure necessarily underestimates actual consistency.

Next, participants indicated, on a five-point scale (0 = “not considered at all” to 5 = “extremely

important”), the importance of each of several general factors (GRE General Test scores, undergraduate

grades, recommendations, and so on) in their admissions decisions. Using the same scale, they also

indicated the importance of each trait listed on our recommendation form, as well as each quality listed

on our summary of applicants’ personal statements. They were also asked to assign weights to each of

these admissions factors to indicate more precisely the relative importance they attached to each one.


This information allowed us to compare participants’ perceptions of the importance of various factors

with the weights computed from the admissions decisions they made for our study.10 It

also enabled us to assess the perceived importance of certain factors (for example, personal interviews)

that we were not able to consider in this study.

Finally, we asked participants to complete a brief background questionnaire about their

experience with graduate admissions. We also solicited their opinions as to how our admissions

procedures may have differed from the admissions procedures in their departments, and if so, how the

differences may have affected the admissions decisions they made for this study. Our objective here was

to identify any deviations that might have threatened the validity of our results.

When they had completed all of the tasks, participants returned all study materials to ETS. They

were promised that all information would be treated confidentially. Figure 1 summarizes the various

tasks undertaken by participants in each condition.

Before conducting the study proper, all instruments and procedures were pilot tested with

directors (and one assistant director) of graduate admissions at two graduate institutions in New Jersey.

Our simplifying assumptions were judged by these participants to be generally reasonable, and our

directions and procedures, understandable. The major concern they expressed was the omission of

personal statements or other nontest writing samples, which these directors thought might be useful for

obtaining a sense of applicants’ personalities.

10 We should be clear here about what this agreement (or lack thereof) may suggest. As one of our reviewers (Leona Aiken, personal communication, March 28, 2000) pointed out, even if the actual importance given to pre-admissions variables in decision making is in complete agreement with participants’ perceptions of the importance of these variables, the computed regression weights can be expected to diverge from perceptions because, unlike perceptions, regression weights take into account the correlations among the pre-admissions variables. To put it simply, when two correlated predictors are both important, the stronger predictor will be “credited” with the variance explained by both predictors. The stronger predictor will then appear, according to its regression weight, to be even stronger, and the weaker predictor even weaker.

Study Condition

Both conditions:

1. Read description of, and information about, the GRE Writing Assessment

GRE Writing Assessment scores only:

2. Reviewed admissions folders, considering only GRE Writing Assessment scores

3. Made recommendations and estimated the likelihood of admission to department on the basis of GRE Writing Assessment scores only (Tables 2a, 2b)

4. Reviewed admissions folders again, this time considering all information in folders, including GRE Writing Assessment scores

5. Made recommendations and estimated likelihood of admission based on all information in folders (Tables 3a, 3b)

6. Read applicants’ GRE Writing Assessment essays that were provided separately, and made recommendations and estimates (Tables 7a, 7b)

GRE Writing Assessment scores plus essays:

2. Reviewed admissions folders, considering only GRE Writing Assessment scores and essays

3. Made recommendations and estimated the likelihood of admission to department on the basis of GRE Writing Assessment scores and essays only (Tables 2a, 2b, 5a, 5b)

4. Reviewed admissions folders again, this time considering all information in folders, including GRE Writing Assessment scores and essays

5. Made recommendations and estimated likelihood of admission based on all information in folders (Tables 3a, 3b, 6a, 6b)

Both conditions:

7. Reviewed roster of abbreviated applications, made recommendations (Tables 4a, 4b)

8. Completed background questionnaire, provided reactions to the study procedures

9. Rated the importance of various pre-admissions factors (Tables 9a, 9b, 9c)

Figure 1. Summary of study procedures.

Analyses

To assess the degree to which participants made consistent decisions, we compared their

recommendations to admit or deny applicants based on the complete application folders with their

recommendations based on abbreviated applications for the same applicants. Finally, as is typical in most

policy capturing studies, the multiple R-square for each participant was taken as an indication of the

extent to which participants used the information that was available to them.

To determine the effects of GRE Writing Assessment scores, the presence of GRE Writing

Assessment essays in folders, and the prevalence of errors in these essays on admissions decisions, the


following analyses were conducted. For psychology faculty (and separately for history faculty),

hierarchical11 regression analyses were performed in which (1) admit/deny recommendations and (2)

likelihood of admission estimates served, in turn, as dependent variables. At each stage, a significance

test was computed to determine the contribution of each additional independent variable to the multiple

correlation. Logistic regression was used to analyze the dichotomous admit/deny recommendations;

ordinary least-squares linear regression was used for estimates of the likelihood of admission, to which a

logit transformation was first applied in order to better meet the assumptions underlying the analysis.

Independent variables were entered in the following sequence:

1. department to which the applicant applied -- that is, the institutional affiliation of faculty member (dummy codes were assigned to represent each department)

2. GRE General Test verbal scores, GRE General Test quantitative scores, GRE General Test analytical scores, overall undergraduate GPA, undergraduate GPA in major, summary faculty recommendation, overall rating of promise by recommendation writer, strength of personal statement, and selectivity of undergraduate school

3. GRE Writing Assessment scores

4. presence of GRE Writing Assessment essays in the admission folder

5. a product variable indicating the interaction between GRE Writing Assessment score and presence of GRE Writing Assessment essay in folder

The decision to enter GRE writing scores after other admissions variables reflects our interest in the

impact of the GRE Writing Assessment above and beyond the impact of traditional pre-admission

measures. It is possible, of course, that in actual admissions decisions, GRE writing scores could

supplant, not just supplement, other pre-admissions measures, thus making our analytic strategy a

conservative one with respect to estimating the impact of the GRE Writing Assessment.
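
A minimal Python sketch of the hierarchical procedure for the likelihood estimates follows. It is an illustration under stated assumptions, not the analysis code used in the study: the data are randomly generated, the department dummy codes are omitted for brevity, the clipping constant that keeps the logit of 0% and 100% finite is an arbitrary choice, and all function and variable names are hypothetical. Each block of predictors is entered cumulatively, and the F statistic for the increase in R-square is computed at each step.

```python
import numpy as np

def logit_percent(p, eps=0.5):
    """Logit of a 0-100% estimate; eps (an assumption) keeps 0 and 100 finite."""
    p = np.clip(np.asarray(p, dtype=float), eps, 100 - eps) / 100
    return np.log(p / (1 - p))

def r_squared(X, y):
    """R-square from an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    centered = y - y.mean()
    return 1 - (resid @ resid) / (centered @ centered)

def hierarchical_steps(blocks, y):
    """Enter blocks cumulatively; return (name, R2, delta R2, F, df1, df2) per step."""
    n = len(y)
    results, X, r2_prev = [], np.empty((n, 0)), 0.0
    for name, block in blocks:
        block = np.asarray(block, dtype=float).reshape(n, -1)
        X = np.column_stack([X, block])
        r2 = r_squared(X, y)
        df1, df2 = block.shape[1], n - X.shape[1] - 1
        f = ((r2 - r2_prev) / df1) / ((1 - r2) / df2)   # F test for the R2 increment
        results.append((name, r2, r2 - r2_prev, f, df1, df2))
        r2_prev = r2
    return results

# Made-up data: 27 applicants x 9 raters = 243 observations, as in the text.
rng = np.random.default_rng(1)
n = 243
prior = rng.standard_normal((n, 2))                    # stand-ins for traditional measures
writing = rng.integers(1, 7, size=n).astype(float)     # made-up writing scores
essays = rng.integers(0, 2, size=n).astype(float)      # 1 = essays present in folder
percent = np.clip(50 + 10 * prior[:, 0] + 5 * writing + rng.normal(0, 15, n), 0, 100)
y = logit_percent(percent)

blocks = [("prior measures", prior), ("writing score", writing),
          ("essays present", essays), ("score x essays", writing * essays)]
steps = hierarchical_steps(blocks, y)
for name, r2, delta, f, df1, df2 in steps:
    print(f"{name:15s} R2={r2:.3f} dR2={delta:.3f} F({df1},{df2})={f:.2f}")
```

Entering the writing score after the traditional measures means that its increment in R-square reflects only what it adds beyond them, which is the sense in which the strategy is conservative. The dichotomous admit/deny recommendations would call for logistic regression instead and are not covered by this sketch.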

Within each of the two disciplines, we treated each admissions recommendation (and each

likelihood estimate) as an independent observation (so that, for example, 27 applicants times 9

psychology faculty = 243 observations each for both admit/deny recommendations and likelihood

estimates). The same analyses were repeated for the data based on abbreviated applications (except that

variables 4 and 5 were not relevant to these).
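As an illustration only, the cumulative entry of variable sets described above can be sketched in Python with NumPy. The data below are synthetic, and the function names, the clipping constant `eps`, and the simulated effect sizes are assumptions made for this sketch; they do not come from the study. The F statistic for each increase uses the error term of the model at the current step, which is one common convention and may differ in detail from the analysis actually run.

```python
import numpy as np


def logit_percent(p_percent, eps=0.5):
    """Logit-transform likelihood estimates given on a 0-100 scale.

    eps is an assumed clipping constant that keeps 0 and 100 finite.
    """
    p = np.clip(np.asarray(p_percent, dtype=float), eps, 100.0 - eps) / 100.0
    return np.log(p / (1.0 - p))


def r_squared(X, y):
    """R-square of an ordinary least-squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def hierarchical_steps(blocks, y):
    """Enter predictor blocks cumulatively in the given order.

    Returns, per step: name, cumulative R-square, increase in R-square,
    F for the increase, and its (numerator, denominator) df.
    """
    n = len(y)
    X = np.empty((n, 0))
    prev_r2, results = 0.0, []
    for name, block in blocks:
        block = np.asarray(block, dtype=float).reshape(n, -1)
        X = np.column_stack([X, block])
        r2 = r_squared(X, y)
        k = block.shape[1]
        df2 = n - X.shape[1] - 1
        f_stat = ((r2 - prev_r2) / k) / ((1.0 - r2) / df2)
        results.append((name, r2, r2 - prev_r2, f_stat, (k, df2)))
        prev_r2 = r2
    return results


# Synthetic example: 243 observations, 6 departments (5 dummy codes),
# 9 admissions variables, and a writing score.
rng = np.random.default_rng(0)
n = 243
dept = rng.integers(0, 6, size=n)
dept_dummies = (dept[:, None] == np.arange(1, 6)).astype(float)
admissions = rng.normal(size=(n, 9))
writing = 0.5 * admissions[:, 0] + rng.normal(size=n)
likelihood = np.clip(
    50 + 10 * admissions[:, 0] + 12 * writing + rng.normal(scale=10, size=n),
    0, 100,
)
y = logit_percent(likelihood)

steps = hierarchical_steps(
    [("department", dept_dummies),
     ("admissions variables", admissions),
     ("GRE writing score", writing)],
    y,
)
for name, r2, inc, f_stat, df in steps:
    print(f"{name:22s} R2={r2:.2f} increase={inc:.2f} F={f_stat:.1f} df={df}")
```

With 243 observations and five department dummy codes, the degrees of freedom produced by this sketch for the first three steps are (5, 237), (9, 228), and (1, 227), the same values that appear in the df column of the psychology tables.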

11 By “hierarchical,” we mean that explanatory variables, or sets of them, were entered cumulatively in a prespecified order as described by Cohen & Cohen (1983).


The same kind of analysis was repeated for the subset of participants who were given application

folders that included both GRE Writing Assessment scores and essays. This time, however, in order to

assess the effect of construct-irrelevant errors, we included two dummy-coded variables to reflect the

presence of these errors (none, some, or many) in the GRE Writing Assessment essays. As in the

previously described analyses, these variables were added in a final step to the regression equations

containing other independent variables. The increase in R-square that resulted from considering the

prevalence of errors in essays suggested the extent to which participants were influenced by the presence

of these irrelevant features. This analysis was repeated for faculty whose folders did not include GRE

essays, but who reviewed GRE essays as a final, separate step.

Finally, although the 27 applicants were few in relation to the number of predictors -- or

admissions variables -- we also computed regression equations for individual faculty participants. Only

likelihood estimates, not recommendations to admit or deny, were used here. To assess the lack of

stability due to small sample size, we conducted the analysis for estimates based on both complete and

abbreviated application folders. For each faculty participant, we calculated the correlation between the

regression weights based on their reviews of complete and abbreviated applications. Finally, we

compared the results of the policy capturing analysis with the perceptions of participants regarding the

importance of the various admissions criteria.

Results

Admission Rates

The admission rates for our hypothetical applicants were quite similar regardless of the particular

study condition under which they were judged. For applicants to psychology departments, acceptance

rates ranged from 29% (when all information was considered) to 37% (when abbreviated applications

were considered). For history applicants, the rates ranged from 30% (when only writing performance was

considered) to 35% (when abbreviated applications were considered). These rates are relevant from two

perspectives. First, they show that the quality of the hypothetical applicants in our study, and the

standards applied to their admission by our study participants, were reasonably similar on average to

those reported by study participants for their departments. Second, the proportion of decisions to admit

versus those to deny is germane to the regression analyses that we report below, as this split is known to

affect regression and correlation coefficients and their errors, as well as the estimation of agreement

rates. These results suggest proportions that are not so extreme as to affect these statistics dramatically.


Reliability/Consistency

For decisions to admit or deny, the degree of agreement based on complete applications versus

the degree of agreement based on abbreviated applications ranged from 70% to 96% for individual

psychology faculty (median = 85%) and from 74% to 93% for history faculty (median = 81%). Overall,

the kappa statistics were κ = .66 for psychology faculty and κ = .57 for history faculty. R-square values

were relatively large for each participant (.66 to .90 for history and .55 to .92 for psychology), suggesting

that participants did make use of the information provided in folders. Thus, although variation was

evident, faculty participants were generally consistent in making their decisions and estimates, even

when different formats were used to present slightly different information.
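For readers who want to see how agreement statistics of this kind are obtained, the following minimal Python sketch computes percent agreement and Cohen's kappa for two sets of admit/deny decisions. It assumes that kappa here is Cohen's kappa for two binary classifications; the function name and the example decisions are hypothetical and are not data from the study.

```python
def agreement_and_kappa(first, second):
    """Percent agreement and Cohen's kappa for two lists of 0/1 decisions
    (1 = admit, 0 = deny). Kappa is undefined if chance agreement is 1."""
    first, second = list(first), list(second)
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    p_first = sum(first) / n    # admit rate in the first set
    p_second = sum(second) / n  # admit rate in the second set
    # Agreement expected by chance from the two admit rates alone.
    expected = p_first * p_second + (1 - p_first) * (1 - p_second)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical decisions for ten applicants under two conditions.
complete = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
abbreviated = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
observed, kappa = agreement_and_kappa(complete, abbreviated)
print(f"agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

Because kappa discounts the agreement expected from the admit rates alone, it is lower than raw percent agreement, which is consistent with the pattern of values reported above.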

To obtain another indication of consistency, we compared each faculty participant’s

recommendations with his or her estimates of the likelihood of admission by the department.¹² We

defined the following combinations as being inconsistent:

• admit recommendations with likelihood of department admission below 50%

• deny recommendations with likelihood of department admission above 50%

Of the 243 “admissions decisions” made by psychology faculty, only two were inconsistent according to

this definition. Similarly, of the 324 decisions made by history faculty, only 13 were inconsistent, with

five denials having a likelihood of departmental acceptance above 50% and eight admits having a

likelihood of departmental acceptance below 50%. Nearly all of these inconsistencies involved likelihood

estimates that were very close to 50%. The correlations between recommendations to admit or deny and

the corresponding departmental likelihood estimates ranged from .79 to .89 for the several kinds of

ratings made by study participants.
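The inconsistency rule defined above is simple enough to state as code. The sketch below is illustrative only: the function name and the sample data are hypothetical, and a likelihood of exactly 50% is treated as consistent with either recommendation, which follows from the "below" and "above" wording of the definition.

```python
def count_inconsistent(recommendations, likelihoods):
    """Count (admits with likelihood below 50, denials with likelihood above 50).

    recommendations: iterable of "admit" / "deny"
    likelihoods: estimated likelihood of departmental admission, in percent
    """
    pairs = list(zip(recommendations, likelihoods))
    admits_below = sum(1 for rec, p in pairs if rec == "admit" and p < 50)
    denials_above = sum(1 for rec, p in pairs if rec == "deny" and p > 50)
    return admits_below, denials_above


# Hypothetical judgments for six applicants.
recs = ["admit", "admit", "deny", "deny", "admit", "deny"]
likes = [80, 45, 20, 55, 50, 50]
admits_below, denials_above = count_inconsistent(recs, likes)
print(admits_below, denials_above)
```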

Effect of GRE Writing Assessment Scores and Presence of Essays in Folders

The most straightforward test of the effect of the presence of GRE Writing Assessment essays in

the application folders is based on data from a subset of study participants (four psychology faculty and

seven history faculty) whose admissions folders contained only GRE Writing Assessment scores, and

who were asked to review the actual essays separately in a subsequent, decision-making step. Thus, this

12 This index may be only a rough surrogate for consistency, as a given faculty member may appear to be inconsistent by this criterion when in fact he or she is only exhibiting standards that differ from those of other faculty members.


sample made two sets of admissions decisions: the first set based on only GRE Writing Assessment

scores and the second set based on those scores plus the essays from which the scores were derived.

Overall, 84% of these 297 admissions decisions (27 applicants times 11 faculty) remained the same

regardless of whether or not essays were available. A total of 5% of the 297 decisions changed from

“deny” to “admit,” and 11% changed from “admit” to “deny.” More than a fourth of the total number of

changes from “admit” to “deny” came from one history faculty member, whose changes were all in the

same direction. Individually, faculty members’ decisions remained the same 67% to 96% of the time,

regardless of whether essays were available. The median correlations between estimates of the likelihood

of admission under the two conditions were 38 for psychology faculty and .77 for history faculty.

Finally, changes in admissions decisions were examined for each applicant to ascertain whether

some applicants either suffered or benefited when their essays were available. There was no detectable

tendency for changes to be more often in one direction than the other (e.g., from deny to admit) with any

consistency for any of the 27 hypothetical applicants.

The more formal regression analyses are also revealing with respect to the effect of GRE essays

and other factors. Table 2a and Table 2b show for history faculty and for psychology faculty,

respectively, the influence of various factors on admit/deny recommendations and on likelihood

estimates when they were asked first to consider only information from the GRE Writing Assessment --

not any other pre-admissions variables -- in making their judgments. (We have included other admissions

variables in this analysis mainly to ascertain the extent to which participants did, as instructed,

disregard this information when making their first pass through the folders. Some contribution to the

multiple R would be expected from these variables, even if not considered by faculty, because of their

association with GRE writing scores.)

Our first observation from the analyses presented in Table 2a through Table 7b is that, for each

discipline, the logistic regression analyses using admit/deny recommendations as the dependent variable

(shown in the top half of each table) and the linear regression analyses using likelihood estimates as the

dependent variable (shown in the bottom half of each table) produced quite similar results. For each

analysis, the institutional affiliation of participants accounted for a small portion of variance in both

admissions recommendations and likelihood estimates (from 5% to 20% for history and from 6% to 21%

for psychology). This suggests that the departments represented in the study are reasonably, though not

completely, homogeneous with respect to their overall admissions standards.
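The notes to the tables describe R_L² as the proportional reduction in the log-likelihood when alternative models are compared. A minimal Python sketch of that quantity follows. It assumes the usual form R_L² = 1 - LL(model) / LL(intercept-only model) and a likelihood-ratio chi-square, 2 × (LL(larger) - LL(smaller)), for the increase between nested models; the report does not spell out either formula, and the fitting routine, variable names, and simulated data here are hypothetical.

```python
import numpy as np


def logistic_loglik(X, y, iters=50):
    """Maximized log-likelihood of a logistic regression of y on X
    (an intercept is added), fitted by Newton-Raphson."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        grad = A.T @ (y - p)
        hess = A.T @ (A * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-(A @ beta))), 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


# Synthetic admit/deny data with two predictors entered in sequence.
rng = np.random.default_rng(1)
n = 243
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
prob = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * x1 + 1.0 * x2)))
y = (rng.random(n) < prob).astype(float)

ll_null = logistic_loglik(np.empty((n, 0)), y)           # intercept only
ll_step1 = logistic_loglik(x1[:, None], y)                # first predictor
ll_step2 = logistic_loglik(np.column_stack([x1, x2]), y)  # both predictors

r2_step1 = 1.0 - ll_step1 / ll_null
r2_step2 = 1.0 - ll_step2 / ll_null
chi2_increase = 2.0 * (ll_step2 - ll_step1)  # likelihood-ratio chi-square, df = 1
print(f"R_L2: {r2_step1:.2f} -> {r2_step2:.2f}, chi-square for increase = {chi2_increase:.1f}")
```

The increase in R_L² at a given step is the difference between the cumulative values at that step and the one before it, which parallels the increase in R² in the linear analyses.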


From Tables 2a and 2b we see that, for both kinds of decisions, the greatest influence (that is, in

terms of the portion of variance explained) was the applicants’ GRE Writing Assessment scores -- the

factor on which faculty were asked to focus for this analysis. The estimated portion for history applicants

was 40% and for psychology applicants, 53%. The next greatest portion was explained by applicants’

standing on the other pre-admission variables (because of their correlation with GRE writing scores).

The variable of primary interest -- whether or not GRE Writing Assessment essays were included

in admissions folders -- accounted for virtually none of the variance in either individual admit/deny

recommendations or in estimates of the likelihood of admission by the department. For psychology

participants, however, the 1% of variance in the likelihood estimates that could be accounted for was

statistically significant (p < .01). To put the significance of this influence in perspective, we note that the

availability of essays in the folders of psychology applicants resulted in an average likelihood of

admission that was about 7 percentage points lower than for applicants whose essays were not available.

In contrast, each additional point on the 1-to-6 point GRE writing-score scale resulted in a nearly 25

percentage point increase in the probability of admission. The availability of GRE essays interacted

significantly (p < .05) with essay score level only for recommendations made by psychology faculty, but

accounted for only 1% of the variance.

A more realistic picture of the likely influence of the GRE Writing Assessment emerges from an

analysis of admit/deny recommendations and likelihood estimates made on the basis of all information in

the admissions folders, not just information related to the GRE Writing Assessment. Tables 3a and 3b

show that, after explaining variation due to (a) differences among institutions and (b) the influence of

traditional admissions information included in folders, GRE writing scores explained a statistically

significant, but small, proportion of variance in the decisions made. Estimates were 3% to 4% for history

and 6% to 10% for psychology.

The presence of essays in folders was a significant influence in only one of the analyses

(accounting for 2% of the variance) -- for the estimates of likelihood provided by psychology faculty.

Again, this result translated, on average, to a 7-percentage-point lower likelihood of admission when

essays were available. (The extent to which this effect may be the result of our insertion of errors into

some essays is addressed below.) When considered with all other pre-admissions variables, the influence

of GRE writing scores decreased substantially, with a one-point increase in GRE writing scores raising

the average likelihood of admission by about 6 percentage points. Again, there was some indication for


psychology of an interaction between the availability of essays and GRE writing-score level. Again,

however, the effect was small, explaining only 1% of the variance (p < .05).

Tables 4a and 4b provide the results when faculty were asked to make a second set of judgments

for a “new” set of applicants for whom only abbreviated admissions folders, but no GRE essays, were

provided. The influence of GRE writing scores above and beyond other admissions information was

again quite modest, accounting for about 1% to 3% of the variance in faculty decisions.

Table 2a. Hierarchical Regression Analysis for All History Faculty Considering GRE Writing Assessment Only

Explanatory variable(s)              Cumulative      Increase in     χ² (or F)       df
                                     R_L² (or R²)    R_L² (or R²)    for increase

Admit/deny recommendations
  Department                         .15             .15             72.6**
  Admissions variables¹              .30             .15             74.9***
  GRE writing score                  .70             .40             192.6***
  Availability of essays             .70             .00             2.1
  GRE writing score x availability   .70             .00             0.0

Likelihood of admission estimates
  Department                         .10             .10             5.1***          7, 316
  Admissions variables¹              .25             .15             6.8***          9, 307
  GRE writing score                  .65             .40             356.3***        1, 306
  Availability of essays             .65             .00             0.7             1, 305
  GRE writing score x availability   .65             .00             0.2             1, 304

Note. R_L² is a logistic regression analog to R² in ordinary least squares regression analysis. R_L² represents the proportional reduction in the value of the log-likelihood coefficient when alternative models are compared.

¹ GRE verbal score, GRE quantitative score, GRE analytical score, overall undergraduate GPA, undergraduate GPA in major, summary recommendation, overall rating of promise, strength of personal statement, and selectivity of undergraduate school.

** p < .01. *** p < .001.


Table 2b. Hierarchical Regression Analysis for All Psychology Faculty Considering GRE Writing Assessment Only

Explanatory variable(s)              Cumulative      Increase in     χ² (or F)       df
                                     R_L² (or R²)    R_L² (or R²)    for increase

Admit/deny recommendations
  Department                         .06             .06             20.2**
  Admissions variables¹              .31             .25             84.2***
  GRE writing score                  .84             .53             179.9***
  Availability of essays             .85             .00             1.6
  GRE writing score x availability   .86             .01             4.9*

Likelihood of admission estimates
  Department                         .07             .07             3.6**           5, 237
  Admissions variables¹              .25             .18             6.1***          9, 228
  GRE writing score                  .78             .53             539.5***        1, 227
  Availability of essays             .79             .01             9.7**           1, 226
  GRE writing score x availability   .79             .00             1.6             1, 225

Note. R_L² is a logistic regression analog to R² in ordinary least squares regression analysis. R_L² represents the proportional reduction in the value of the log-likelihood coefficient when alternative models are compared.

¹ GRE verbal score, GRE quantitative score, GRE analytical score, overall undergraduate GPA, undergraduate GPA in major, summary recommendation, overall rating of promise, strength of personal statement, and selectivity of undergraduate school.

* p < .05. ** p < .01. *** p < .001.


Table 3a. Hierarchical Regression Analysis for All History Faculty Considering All Information

Explanatory variable(s)              Cumulative      Increase in     χ² (or F)       df
                                     R_L² (or R²)    R_L² (or R²)    for increase

Admit/deny recommendations
  Department                         .18             .18             96.4***
  Admissions variables¹              .48             .30             154.9***
  GRE writing                        .51             .03             16.9***
  Availability of essays             .51             .00             0.0
  GRE writing x availability         .51             .00             0.4

Likelihood of admission estimates
  Department                         .13             .13             6.6***          8, 368
  Admissions variables¹                              .46             44.0***         9, 359
  GRE writing                        .62             .04             35.4***         1, 358
  Availability of essays             .62             .00             0.3             1, 357
  GRE writing x availability         .62             .00             0.0             1, 356

Note. R_L² is a logistic regression analog to R² in ordinary least squares regression analysis. R_L² represents the proportional reduction in the value of the log-likelihood coefficient when alternative models are compared.

¹ GRE verbal score, GRE quantitative score, GRE analytical score, overall undergraduate GPA, undergraduate GPA in major, summary recommendation, overall rating of promise, strength of personal statement, and selectivity of undergraduate school.

*** p < .001.


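Table 3b. Hierarchical Regression Analysis for All Psychology Faculty Considering All Information

Explanatory variable(s)              Cumulative      Increase in     χ² (or F)       df
                                     R_L² (or R²)    R_L² (or R²)    for increase

Admit/deny recommendations
  Department                         .17             .17             57.6***
  Admissions variables¹              .57             .39             132.9***
  GRE writing score                  .66             .10             33.1***
  Availability of essays             .66             .00             0.1
  GRE writing score x availability   .66             .00             0.0

Likelihood of admission estimates
  Department                         .09             .09             4.9***          5, 237
  Admissions variables¹              .53             .43             23.3***         9, 228
  GRE writing score                  .59             .06             32.2***         1, 227
  Availability of essays             .61             .02             11.5***         1, 226
  GRE writing score x availability   .61             .01             4.4*            1, 225

Note. R_L² is a logistic regression analog to R² in ordinary least squares regression analysis. R_L² represents the proportional reduction in the value of the log-likelihood coefficient when alternative models are compared.

¹ GRE verbal score, GRE quantitative score, GRE analytical score, overall undergraduate GPA, undergraduate GPA in major, summary recommendation, overall rating of promise, strength of personal statement, and selectivity of undergraduate school.

* p < .05. *** p < .001.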


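Table 4a. Hierarchical Regression Analysis for All History Faculty for Abbreviated Applications

Explanatory variable(s)              Cumulative      Increase in     χ² (or F)       df
                                     R_L² (or R²)    R_L² (or R²)    for increase

Admit/deny recommendations
  Department                         .16             .16             86.2***
  Admissions variables¹                              .37             193.6***
  GRE writing score                  .55             .01             7.0**

Likelihood of admission estimates
  Department                         .19             .19             10.8***         8, 369
  Admissions variables¹                              .36             32.2***         9, 360
  GRE writing score                                  .01             7.2**           1, 359

Note. R_L² is a logistic regression analog to R² in ordinary least squares regression analysis. R_L² represents the proportional reduction in the value of the log-likelihood coefficient when alternative models are compared.

¹ GRE verbal score, GRE quantitative score, GRE analytical score, overall undergraduate GPA, undergraduate GPA in major, summary recommendation, overall rating of promise, strength of personal statement, and selectivity of undergraduate school.

** p < .01. *** p < .001.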


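Table 4b. Hierarchical Regression Analysis for All Psychology Faculty for Abbreviated Applications

Explanatory variable(s)              Cumulative      Increase in     χ² (or F)       df
                                     R_L² (or R²)    R_L² (or R²)    for increase

Admit/deny recommendations
  Department                         .10             .10             34.5***
  Admissions variables¹                              .39             131.1***
  GRE writing score                  .53             .03             11.7***

Likelihood of admission estimates
  Department                         .21             .21             12.3***         5, 237
  Admissions variables¹                              .41             27.5***         9, 228
  GRE writing score                                  .01             3.3             1, 227

Note. R_L² is a logistic regression analog to R² in ordinary least squares regression analysis. R_L² represents the proportional reduction in the value of the log-likelihood coefficient when alternative models are compared.

¹ GRE verbal score, GRE quantitative score, GRE analytical score, overall undergraduate GPA, undergraduate GPA in major, summary recommendation, overall rating of promise, strength of personal statement, and selectivity of undergraduate school.

*** p < .001.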

Effect of Construct-Irrelevant Flaws in Essays

In an effort to understand the influence of the flaws we introduced into the essays, we conducted

several analyses of data from faculty whose folders contained the applicants’ GRE essays. The main

focus of these analyses was the influence on admissions decisions of irrelevant flaws in GRE essays, and

whether these flaws, if influential, might play a greater role for strong essays than for weak ones (or vice

versa).

Tables 5a and 5b show that when judgments were based only on the GRE Writing Assessment,

the prevalence of errors explained none of the variance in the decisions of psychology faculty, and only

about 2% of the variance in the decisions of history faculty. The latter results were significant (p < .05)

only for likelihood estimates. These tables also show that, in each analysis, the interaction of GRE

writing-score level with the prevalence of errors accounted for little if any variance. The effects for


estimates of the likelihood of departmental admission are depicted in Figure 2 and Figure 3. Also

included in these figures are the likelihood estimates that were made when essays were not available.

Similar results were found when faculty based their judgments on all information in admission

folders (see Table 6a and Table 6b). The prevalence of errors in essays accounted for a small,

statistically significant (p < .05) portion of the variation in admit/deny recommendations (about 3% for

history and 4% for psychology), but none of the variation in likelihood estimates. The effects are such

that the presence of extra errors decreases the likelihood of admission somewhat, regardless of the

quantity of added errors (some or many).

We repeated the analysis for data from faculty who initially did not see essays in their

admissions folders, but who instead evaluated the essays as a separate step, after they had already

reviewed the admissions folders. Table 7a and Table 7b show that the prevalence of errors under these

circumstances had no statistically significant impact on either admit/deny recommendations or likelihood

estimates. The lack of any large, consistent effect is illustrated in Figure 4 and Figure 5. (We note that in

the analysis of admissions recommendations made by psychology faculty, about 8% of the variance was

attributed to the interaction of GRE writing-score level and the presence of errors. Although statistically

significant, this estimate may not be very trustworthy, as the analysis did not seem to “behave” in a

reasonable manner with respect to how it converged on this estimate.)


Table 5a. Hierarchical Regression Analysis for History Faculty Based on GRE Writing Assessment Only (Essays Available in Folders)

Explanatory variable(s)              Cumulative      Increase in     χ² (or F)       df
                                     R_L² (or R²)    R_L² (or R²)    for increase

Admit/deny recommendations
  Department                         .20             .20             43.9***         5
  Admissions variables¹              .36             .16             36.1***         9
  GRE writing score                  .73             .37             83.5***         1
  Prevalence of errors               .75             .02             5.2             2
  GRE writing score x errors         .75             .00             0.0             2

Likelihood of admission estimates
  Department                         .05             .05             1.5             5, 156
  Admissions variables¹                              .17             3.5***          9, 147
  GRE writing score                                  .43             178.5***        1, 146
  Prevalence of errors               .66             .02             3.3*            2, 144
  GRE writing score x errors         .68             .01             3.1*            2, 142

Note. R_L² is a logistic regression analog to R² in ordinary least squares regression analysis. R_L² represents the proportional reduction in the value of the log-likelihood coefficient when alternative models are compared.

¹ GRE verbal score, GRE quantitative score, GRE analytical score, overall undergraduate GPA, undergraduate GPA in major, summary recommendation, overall rating of promise, strength of personal statement, and selectivity of undergraduate school.

* p < .05. *** p < .001.


Table 5b. Hierarchical Regression Analysis for Psychology Faculty Based on GRE Writing Assessment Only (Essays Available in Folders)

Explanatory variable(s)              Cumulative      Increase in     χ² (or F)       df
                                     R_L² (or R²)    R_L² (or R²)    for increase

Admit/deny recommendations
  Department                         .08             .08             14.5**
  Admissions variables¹              .33             .25             47.4***
  GRE writing score                  .95             .62             116.6***
  Prevalence of errors               .96             .00             0.1
  GRE writing score x errors         .96             .00             0.1

Likelihood of admission estimates
  Department                         .11             .11             4.2**           4, 130
  Admissions variables¹              .28             .17             3.2**           9, 121
  GRE writing score                  .82             .54             364.8***        1, 120
  Prevalence of errors               .82             .00             0.3             2, 118
  GRE writing score x errors         .82             .00             0.2             2, 116

Note. R_L² is a logistic regression analog to R² in ordinary least squares regression analysis. R_L² represents the proportional reduction in the value of the log-likelihood coefficient when alternative models are compared.

¹ GRE verbal score, GRE quantitative score, GRE analytical score, overall undergraduate GPA, undergraduate GPA in major, summary recommendation, overall rating of promise, strength of personal statement, and selectivity of undergraduate school.

* p < .05. ** p < .01. *** p < .001.


[Figure 2 image. Vertical axis: Mean Likelihood of Admission, 0 to 100; horizontal axis: GRE Writing Assessment Score, 2 to 6.]

Figure 2. Mean likelihood of admission on the basis of the GRE Writing Assessment only, by score level, availability of essays, and prevalence of errors (psychology faculty).

[Figure 3 image. Vertical axis: Mean Likelihood of Admission, 0 to 100; horizontal axis: GRE Writing Assessment Score, 2 to 6.]

Figure 3. Mean likelihood of admission on the basis of the GRE Writing Assessment only, by score level, availability of essays, and prevalence of errors (history faculty).


[Figure 4 image. Vertical axis: Mean Likelihood of Admission, 0 to 100; horizontal axis ticks: 2 to 6 in steps of 0.5.]

Figure 4. Mean likelihood of admission on the basis of the GRE Writing Assessment only (essays evaluated separately), by score level and prevalence of errors (psychology faculty).

[Figure 5 image. Vertical axis: Mean Likelihood of Admission, 0 to 100; horizontal axis ticks: 2 to 6 in steps of 0.5.]

Figure 5. Mean likelihood of admission on the basis of the GRE Writing Assessment only (essays evaluated separately), by score level and prevalence of errors (history faculty).


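Table 6a. Hierarchical Regression Analysis for History Faculty Based on All Information (Essays Available in Folders)

Explanatory variable(s)              Cumulative      Increase in     χ² (or F)       df
                                     R_L² (or R²)    R_L² (or R²)    for increase

Admit/deny recommendations
  Department                         .15             .15             39.5***
  Admissions variables¹              .40             .25             64.7***
  GRE writing score                  .44             .04             10.5**
  Prevalence of errors               .47             .03             8.9*
  GRE writing score x errors         .48             .01             2.0

Likelihood of admission estimates
  Department                         .07             .07             2.4*            6, 181
  Admissions variables¹              .51             .43             16.7***         9, 172
  GRE writing score                  .56             .06             22.2***         1, 171
  Prevalence of errors               .57             .00             0.4             2, 169
  GRE writing score x errors         .57             .01             1.6             2, 167

Note. R_L² is a logistic regression analog to R² in ordinary least squares regression analysis. R_L² represents the proportional reduction in the value of the log-likelihood coefficient when alternative models are compared.

¹ GRE verbal score, GRE quantitative score, GRE analytical score, overall undergraduate GPA, undergraduate GPA in major, summary recommendation, overall rating of promise, strength of personal statement, and selectivity of undergraduate school.

* p < .05. ** p < .01. *** p < .001.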


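Table 6b. Hierarchical Regression Analysis for Psychology Faculty Based on All Information (Essays Available in Folders)

Explanatory variable(s)              Cumulative      Increase in     χ² (or F)       df
                                     R_L² (or R²)    R_L² (or R²)    for increase

Admit/deny recommendations
  Department                         .17             .17             32.3***
  Admissions variables¹              .65             .48             89.6***
  GRE writing score                  .76             .11             20.1***
  Prevalence of errors               .79             .04             6.6*
  GRE writing score x errors         .81             .01             3.7

Likelihood of admission estimates
  Department                         .06             .06             2.2             4, 130
  Admissions variables¹              .65             .58             22.4***         9, 121
  GRE writing score                  .74             .09             42.9***         1, 120
  Prevalence of errors               .74             .00             0.9             2, 118
  GRE writing score x errors         .75             .00             1.1             2, 116

Note. R_L² is a logistic regression analog to R² in ordinary least squares regression analysis. R_L² represents the proportional reduction in the value of the log-likelihood coefficient when alternative models are compared.

¹ GRE verbal score, GRE quantitative score, GRE analytical score, overall undergraduate GPA, undergraduate GPA in major, summary recommendation, overall rating of promise, strength of personal statement, and selectivity of undergraduate school.

* p < .05. *** p < .001.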


Table 7a. Hierarchical Regression Analysis for History Faculty Based on GRE Writing Assessment Essays Only (Essays Available in Separate Packet)

Explanatory variable(s)              Cumulative      Increase in     χ² (or F)       df
                                     R_L² (or R²)    R_L² (or R²)    for increase

Admit/deny recommendations
  Admissions variables¹              .30             .30             77J3***
  GRE writing score                  .51             .22             56.4***
  Prevalence of errors               .53             .02             5.3
  GRE writing score x errors         not estimable

Likelihood of admission estimates
  Admissions variables¹              .15             .15             3.4***          9, 178
  GRE writing score                  .48             .33             114.5***        1, 177
  Prevalence of errors               .48             .00             0.1             2, 175
  GRE writing score x errors         .49             .00             0.6             2, 173

Note. R_L² is a logistic regression analog to R² in ordinary least squares regression analysis. R_L² represents the proportional reduction in the value of the log-likelihood coefficient when alternative models are compared.

¹ GRE verbal score, GRE quantitative score, GRE analytical score, overall undergraduate GPA, undergraduate GPA in major, summary recommendation, overall rating of promise, strength of personal statement, and selectivity of undergraduate school.

*** p < .001.


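Table 7b. Hierarchical Regression Analysis for Psychology Faculty Based on GRE Writing Assessment Essays Only (Essays Available in Separate Packet)

Explanatory variable(s)              Cumulative      Increase in     χ² (or F)       df
                                     R_L² (or R²)    R_L² (or R²)    for increase

Admit/deny recommendations
  Admissions variables¹                              .28             42.1***
  GRE writing score                                  .32             47.2***
  Prevalence of errors               .62             .02             3.5
  GRE writing score x errors         .70             .08             12.5**

Likelihood of admission estimates
  Admissions variables¹              .15             .15             2.0*            9, 98
  GRE writing score                  .61             .45             112.3***        1, 97
  Prevalence of errors               .62             .01             1.0             2, 95
  GRE writing score x errors         .62             .00             0.6             2, 93

Note. R_L² is a logistic regression analog to R² in ordinary least squares regression analysis. R_L² represents the proportional reduction in the value of the log-likelihood coefficient when alternative models are compared.

¹ GRE verbal score, GRE quantitative score, GRE analytical score, overall undergraduate GPA, undergraduate GPA in major, summary recommendation, overall rating of promise, strength of personal statement, and selectivity of undergraduate school.

* p < .05. ** p < .01. *** p < .001.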


Influence of Other Pre-Admission Factors

As stated at the outset, a secondary purpose of the study was to ascertain, within the limits of our

data, the role of traditional pre-admission information in the graduate admissions process. For individual

faculty members and for aggregates of all history faculty and all psychology faculty, Table 8a and Table

8b provide the standardized regression weights for admissions judgments made on the basis of both

complete and abbreviated applications. Because of the relatively small sample of applicants, the weights

for individual faculty are not highly consistent across the two sets of applications. Median correlations

(over individual participants) between weights computed for pre-admission factors based on judgments

of complete and abbreviated applications were .52 for psychology faculty and .47 for history faculty.

Thus, individually, participants were at least modestly consistent in their decision making. (Part, and

perhaps most, of the apparent inconsistency is, of course, the result of our presenting slightly different

information and presenting it in a different format in the two types of applications.)
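To make the consistency index concrete, the sketch below computes standardized regression weights for one hypothetical faculty member from two sets of 27 applicants and then correlates the two sets of weights. Everything here is synthetic: the predictor values, the noise level, and the function name are assumptions for illustration, and the result says nothing about the values reported in the study.

```python
import numpy as np


def standardized_weights(X, y):
    """Standardized regression weights: OLS on z-scored predictors and outcome.
    No intercept is needed because all variables are centered."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta


# One hypothetical faculty member judging two sets of 27 applicants
# described by 10 pre-admission variables.
rng = np.random.default_rng(2)
n_applicants, n_predictors = 27, 10
true_policy = rng.normal(size=n_predictors)

X_complete = rng.normal(size=(n_applicants, n_predictors))
y_complete = X_complete @ true_policy + rng.normal(scale=2.0, size=n_applicants)
X_abbrev = rng.normal(size=(n_applicants, n_predictors))
y_abbrev = X_abbrev @ true_policy + rng.normal(scale=2.0, size=n_applicants)

w_complete = standardized_weights(X_complete, y_complete)
w_abbrev = standardized_weights(X_abbrev, y_abbrev)

# Correlation between the two sets of weights for this participant.
consistency = np.corrcoef(w_complete, w_abbrev)[0, 1]
print(f"correlation between weight profiles: {consistency:.2f}")
```

With only 27 applicants and 10 predictors, the weights estimated from each set are noisy even when the underlying policy is identical, so a correlation well below 1 can arise from sampling error alone.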

Although the study was not designed to determine the decision-making strategies of individual

faculty, the data do provide some sense of the influence of each admissions criterion on faculty

judgments about applicants. (The instability of the regression coefficients must be borne in mind,

however.) For example, with relative consistency, the decisions of history faculty participants 2, 3, 6, 8,

12, and 13 are relatively strongly related to GRE verbal scores. The decisions of history faculty member

4 are related to the quality of personal statements, and the decisions of participant 11 relate strongly to

faculty ratings of overall promise. Such differences are equally apparent among psychology faculty.

Thus, if our results are any indication, there is apparently considerable variation among individual faculty

-- often within the same department -- with respect to the importance they attach to various kinds of

pre-admission information.

In the aggregate, the judgments made by history faculty in our study correlated most strongly

with GRE verbal scores. Those made by psychology faculty correlated relatively strongly with GRE

verbal scores, overall undergraduate GPA, and ratings of overall promise. Thus, if our study procedures

bear any resemblance to actual admissions procedures at participating departments, then GRE verbal

scores in particular would seem to be an important factor in the admissions decisions at these

departments.


Table 8a. Standardized Regression Weights for Prediction of Likelihood Estimates for History Faculty

Variable

Individual faculty participants Application

type 1 2 3 4 5 6 7 8 9 10 11

GRE verbal score

GRE quantitative score

GRE analytical score

Undergraduate GPA -- overall

Undergraduate GPA -- major

Summary recommendation

Rating of overall promise

Personal statement

Selectivity of undergraduate school

GRE writing score

Multiple R

Complete 29 63’ Incomplete 38a 75c

Complete Incomplete

Complete Incomplete

Complete Incomplete

Complete Incomplete

Complete Incomplete

Complete 51 10 Incomplete 27 36

Complete 10 00 Incomplete 16 07

Complete Incomplete

Complete Incomplete

Complete 88 95 88 Incomplete 89 76 89

-11 10 -23 08

01 08 13 -26

22 10

42a 02

-00 -01 -10 -03

-30 -01 -19 -12

28 14 12 39a 11 28a

42b 24’ 17 20 -32 06

39a 46b

07 -09

07 14

21 41a

13 -03

23 -34

00 40

03 08

-07 79= 53’ 16 73’ 24 28 18 63’ 61’ 99c 5oc

16 -21 -13 33 -02 -04 -05 -16 10 -03 -16 04

-08 -00 04 14 -16 24 -13 29 02 -19 -06 07

31 -16 01 -03 -11 00 -08 19 01 04 -03 01

-35 17 19 04 28 -08 22 -06 17 08 02 -21

02 -21 -49 14 -18 17 -34 -34 07 -11 01 05

17 35 59a 04 32 -14 45 20 10 47a 10 22

61’ 15 25a 01 -06 12 76c 17 25a 2ga 17a 11

30 24a 08 14 39c 21 -09 24 15 3oa 11 16

-01 07 33b 39b -03 53c -13 4sb -15 -24a -06 41’

85 92 91 85 90 89 86 82 91 91 96 95

05 15

2oa -11

09 36

34c 28

01 -21

3oa 35

23 -22

00 14

11 22

2gc 27

97 81

-20 08

-14 -23

26 -01

31a 36b

-00 -12

-21 -23

7gc 75c

15 3sc

19 33’

34c 04

94 94

Note. Weights greater than .30 in absolute value appear in bold.

a p < .05; b p < .01; c p < .001

Table 8b. Standardized Regression Weights for Prediction of Likelihood Estimates for Psychology Faculty Participants

Variable

Individual faculty participants Application All

type 1 2 3 4 5 6 7 8 9 faculty

GRE verbal score Complete Incomplete

-08

01

GRE quantitative score Complete 05

Incomplete -24a

GRE analytical score

Undergraduate GPA -- overall

Undergraduate GPA -- major

Summary recommendation

Rating of overall promise

Personal statement

Complete Incomplete

Complete Incomplete

Complete Incomplete

Complete Incomplete

Complete Incomplete

Complete Incomplete

Complete

Incomplete

Complete Incomplete

37= 39c

21 17

-17 12

11 11

32

37a

Selectivity of under-

graduate school

GRE writing score 39c 02

Multiple R Complete 96

02 22

04 08

41b -09

11 08

-12 17

06 07

04 20

08 18

14 14

60’

52c

93 90

21

2sc

07 06

08 -04

7oc 77c

-01 -14

15 -28

-18

4sc

05

09

13

23’

16 13

95 97

14

13

08 23

43a -05

53b 9oc

-20 -36a

-22 -07

29 23

07

25a

14

-00

03 24 08 -00

87 91 Incomplete 96

Note. Weights greater than .30 in absolute value appear in bold.

a p < .05; b p < .01; c p < .001

3Sa

62c

51b 20

-14

-01

05 24

-09 -00

-18 -12

20 08

-07 20

17 12

86 90

-07

26a

33a 12

32a 32b

23

37b

-15 03

-20 -25

54a 35

-05

21a

11

13

33b 02

91 95

41b

6gc

42b

36’

-16

-13

19 05

01 11

14 -22

-09

49a

-03 05

12

-03

34b -16

92 93

06 73’

32a 79’

-01 19

26 33

28a -53a 05 -49b

58’ 25 32 -11

-18 -00 -34a 11

-05 -55 -26 -22

21 61 42 49

13 12

48’ -06

23a 06 -01 05

34c 04 25a 06

94 81 91 87

19c 34c

16b 14a

11 -02

3oc 27’

-10 -02

-08 -12

21a 32’

06

17c

14c 09

26’ 09a

74 81

As the final step of our study, we asked participants to rate, on a 0-to-5 scale, the importance they

place on each type of pre-admission information contained in our admission folders. Table 9a shows the

participants’ ratings, as well as the relative weights that they assigned to each factor. (Table 9b and Table

9c display faculty ratings of the importance of the personal qualities that were listed in our fictitious

recommendations and personal statements, respectively.) Here, too, we see considerable variability

among participating faculty. However, these results differ somewhat from the results of our analysis of

the relationship between faculty recommendations and applicants’ standing on pre-admission variables.

That is, faculty reports of the importance of various admissions factors do not correspond precisely with

the actual weights that these factors receive in the admissions process. This difference may be the result,

in part, of both the correlations among the admissions factors included in the regression analyses, and the

incomplete correspondence between information included in admissions folders and the factors that

faculty were asked to rate. It may also suggest, however, that, to some degree, faculty perceptions of

importance do not fully reflect the actual weights that the factors receive. This, of course, is policy

capturing’s raison d’être.
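The contrast between stated importance and captured weights can be sketched in a few lines of code. The sketch below is illustrative only and is not the analysis code used in this study: the cue names, the simulated judge, and the "stated" weights are all hypothetical, and the data are synthetic. It standardizes each cue and the judge's ratings, solves the normal equations for the standardized regression weights, and reports the multiple R, which is the general form of the quantities shown in Tables 8a and 8b.

```python
import random
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores (using the population SD) for a list of numbers."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def capture_policy(cues, judgments):
    """Standardized regression weights and multiple R for one judge.

    cues: dict mapping a cue name to a list of values (one per folder)
    judgments: list of the judge's ratings (one per folder)
    """
    names = list(cues)
    z = {name: standardize(cues[name]) for name in names}
    zy = standardize(judgments)
    n = len(zy)

    def corr(u, v):
        # Mean cross-product of z-scores equals the Pearson correlation.
        return sum(p * q for p, q in zip(u, v)) / n

    r_xx = [[corr(z[a], z[b]) for b in names] for a in names]
    r_xy = [corr(z[a], zy) for a in names]
    betas = solve(r_xx, r_xy)
    multiple_r = sum(b * r for b, r in zip(betas, r_xy)) ** 0.5
    return dict(zip(names, betas)), multiple_r


# Synthetic folders and a hypothetical judge who leans mostly on GPA.
rng = random.Random(0)
n_folders = 200
cues = {name: [rng.gauss(0, 1) for _ in range(n_folders)]
        for name in ("gre_verbal", "gpa", "gre_writing")}
judgments = [0.3 * v + 0.6 * g + 0.1 * w + rng.gauss(0, 0.3)
             for v, g, w in zip(cues["gre_verbal"], cues["gpa"],
                                cues["gre_writing"])]

weights, multiple_r = capture_policy(cues, judgments)
# Hypothetical self-reported importance, invented for the comparison.
stated = {"gre_verbal": 0.4, "gpa": 0.3, "gre_writing": 0.3}
for name in cues:
    print(f"{name:12s} captured={weights[name]:.2f} stated={stated[name]:.2f}")
print(f"multiple R = {multiple_r:.2f}")
```

In this made-up case the judge reports weighting the three cues about equally, while the captured weights show a policy dominated by GPA. With correlated cues, as in real admissions folders, the captured weights would also be less stable, which is one of the reasons given above for the imperfect correspondence.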

Table 9a. Mean Faculty Ratings and Weightings of Importance of Admissions Factors

Factor Psychology History M SD M SD

GRE verbal score: Rating, Weight

GRE quantitative score: Rating, Weight

GRE analytical score: Rating, Weight

GRE Subject Test score: Rating, Weight

Undergraduate GPA (overall): Rating, Weight

Undergraduate GPA (in major): Rating, Weight

Undergraduate GPA (final two years): Rating, Weight

Undergraduate course of study: Rating, Weight

Quality of undergraduate institution: Rating, Weight

Recommendations: Rating, Weight

Personal statement: Rating, Weight

Interview: Rating, Weight

Writing samples: Rating, Weight

4.2 (0.6) 47 (22) 4.2 (0.6) 47 (22) 2.6 (1.6) 30 03) 2.6 (1.1) 30 (23)

4.1 (0.6) 48 (26)

4.0 (0.9) 48 (29)

4.2 (0.7) 59 (31)

3.3 (0.7) 32 (20)

3.3 (0.8) 40 (27)

3.9 (0.9) 49 (21)

3.1 (0.9) 31 (22)

3.1 (1.7) 29 (16)

2.7 (1.6) 29 (13)

4.4 (0.6) 61 (46)

2.5 (1.3) 25 (12)

3.5 (0.9) 40 (19)

0.4 (1.1) 10 (4)

3.3 (1.4) 51 (52)

3.8 (0.9) 53 (51)

3.8 (0.9) 52 (46)

3.9 (0.8) 52 (46)

3.7 (0.9) 55 (52)

4.1 (0.8) 53 (41)

4.6 (0.6) 76 (72)

0.3 (0.8) 13 (16)

4.0 (1.4) 78 (76)

Note. Ratings and weights were provided by 9 psychology faculty and by 13 history faculty. Ratings were provided on a 0-to-5 scale with 5 = extremely important, 4 = very important, 3 = moderately important, 1 = of little importance, and 0 = not considered at all. Respondents were asked to give a weight of 10 to the least important factor. There was no limit at the upper end of the scale.

Table 9b. Mean Faculty Ratings of Importance of Information Available From Recommendations

Information                                    Psychology    History
                                               M (SD)        M (SD)

Academic preparedness for proposed program     3.8 (0.4)     3.9 (0.9)
Intellectual ability/capacity                  4.6 (0.5)     4.9 (0.3)
Aptitude (skill or potential) for research     4.3 (0.7)     4.6 (0.6)
Teaching ability/potential                     2.8 (0.9)     3.5 (0.7)
Ability to communicate orally                  3.7 (0.7)     3.9 (1.0)
Ability to communicate in writing              4.2 (0.4)     4.8 (0.4)
Creativity/originality                         3.7 (0.8)     4.1 (0.7)
Independent thinking                           4.2 (0.4)     4.4 (0.6)
Dependability/reliability                      3.8 (0.6)     3.9 (0.7)
Initiative                                     4.0 (0.7)     3.9 (0.8)
Seriousness of purpose                         4.0 (0.7)     4.2 (0.8)
Professional expertise                         2.7 (0.9)     3.0 (0.9)
Character/integrity                            4.1 (0.7)     2.9 (1.6)
Promise of productive scholarship              3.9 (0.6)     4.2 (0.8)
Emotional maturity                             3.6 (0.5)     3.0 (1.4)
Motivation for proposed program of study       4.2 (0.4)     4.1 (1.0)

Note. Ratings were provided by 9 psychology and 14 history faculty. Ratings were provided on a 0-to-5 scale with 5 = extremely important, 4 = very important, 3 = moderately important, 1 = of little importance, and 0 = not considered at all. Respondents were asked to give a weight of 10 to the least important factor. There was no limit at the upper end of the scale.

Table 9c. Mean Faculty Ratings of Importance of Information Available From Personal Statements

Information                                                                        Psychology    History
                                                                                   M (SD)        M (SD)

Match of applicant’s interests with those of prospective advisors                  4.1 (0.6)     4.4 (0.7)
Match of applicant’s professional aspirations with program objectives              4.2 (0.6)     4.2 (0.9)
Applicant’s rationale for graduate study                                           3.6 (1.0)     4.4 (0.6)
Likely commitment to the field                                                     3.9 (0.6)     4.1 (0.9)
Evidence of motivation                                                             4.3 (0.7)     4.1 (0.7)
Evidence of maturity                                                               3.9 (0.9)     4.0 (0.8)
Evidence of other personal characteristics likely to facilitate degree completion  4.1 (0.6)     3.4 (1.0)
Degree to which applicant has overcome prior obstacles                             2.7 (1.1)     2.9 (0.7)
Evidence of relevant experiences, special interests, or achievements               3.1 (1.4)     3.5 (0.8)
Depth of knowledge                                                                 3.3 (0.7)     3.4 (0.6)
Evidence of a “voice” (personality, style, etc.)                                   2.4 (1.1)     3.7 (0.7)

Note. Ratings were provided by 9 psychology and 14 history faculty. Ratings were provided on a 0-to-5 scale with 5 = extremely important, 4 = very important, 3 = moderately important, 1 = of little importance, and 0 = not considered at all. Respondents were asked to give a weight of 10 to the least important factor. There was no limit at the upper end of the scale.

The Fidelity of Our Simulation

Because it was impossible to faithfully simulate all aspects of actual graduate admissions

procedures, we asked participants to tell us how, if at all, our study procedures (admissions folders,

for instance) differed from the admissions procedures that they normally use, and how these differences,

if any, might have affected the “admissions decisions” they made for our study, relative to the decisions

they routinely make. In their responses, participants indicated that, when compared with actual

admissions procedures, our materials and methods were “similar,” “very similar,” “pretty close,” “had no

significant differences,” or “contained most of the information needed.”

At the same time, some participants identified ways in which our procedures differed from actual

graduate admissions. For example, one participant noted that the procedures used by his department were

“more straightforward” than our procedures, and another mentioned the need to adjust her thinking for

our “applicants” because none of them possessed master’s degrees. The difference cited most often -- by

about a third of the participating faculty -- was the lack of actual transcripts or information about

performance in specific, critical undergraduate courses. About the same number of faculty observed at

least one of the following differences: the lack of written letters of recommendation, the lack of multiple

recommendations, or the lack of knowledge of the recommendation writer. Four or fewer participants

mentioned: (a) sources (such as interviews, writing samples, and actual personal statements), (b)

information (about computer skills, language proficiency, research experience, and ethnic identity, for

instance), or (c) opportunities (for example, the opportunity to contact applicants or their advisors

personally) that our admissions process did not provide.

Several faculty commented on what they regarded as inconsistencies in our admissions folders --

for example, too many recommendations that were either too qualified or too candid (in contrast to actual

recommendations, which are, invariably, glowing), and applicants from highly selective schools who had

only average GRE scores. Several faculty also stated that their departments require all applicants to

possess a master’s degree before being considered for the doctoral program; none of our applicants did.

Finally, several also reported that their department’s admission process occurs in stages; typically, not

all information is considered simultaneously.

Despite these differences, most participants felt that, even if they had followed their

departments’ procedures instead of ours, their admissions decisions and estimates would have been about

the same. In response to our question about the likely effect of differences between their admissions

procedures and those used in our study, typical comments were:

Not a great deal.

My decisions weren’t affected (except that I may have rejected some applicants whose advisors might have convinced me that my concerns were unfounded).

The deviation would only very occasionally affect my decisions.

Probably very little effect.

Made little difference.

On the other hand, some expressed more serious concerns:

I had little confidence in my decisions without interview data.

Applicants with weak credentials might have been accepted if they had had strong clinical or research experience.

My responses were thrown off by the unrealistic character of the folders (they didn’t look “normal” to me).

Discussion

The overarching aim of this study was to ascertain the likely role of the GRE Writing

Assessment in graduate admissions decisions. A more specific objective was to determine the impact that

examinees’ GRE test essays would have on these decisions if the essays were made available to test-

score users during the admissions process. A secondary goal was to simultaneously assess the influence

of more traditional admissions criteria on the graduate-school application process.

Analyses revealed that scores from the GRE Writing Assessment accounted for a statistically

significant, but small, portion of the variation in faculty decisions above and beyond that explained by

traditional pre-admissions measures, and that faculty decisions were strongly related to applicants’

standing on traditional pre-admissions measures, such as GRE General Test scores, undergraduate

grades, faculty recommendations, and personal statements. Thus, if our study participants are at all

representative of potential users of the GRE Writing Assessment, it is likely that GRE writing scores will

play some role in graduate admissions decisions. However, while there is, of course, no guarantee that

every admissions decision will be unaffected by the impression that an applicant’s essay makes, our

results also suggest that the availability of applicants’ GRE essays will probably have little additional

influence on admissions decisions beyond that wielded by the writing scores themselves.
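The phrase "above and beyond that explained by traditional pre-admissions measures" refers to an increment in explained variance. As a minimal sketch of that idea (not the analysis performed in this study), the code below assumes a single hypothetical composite of traditional measures plus a writing score, and uses the standard two-predictor formula for R-squared from the three pairwise correlations. All data are synthetic, and the variable names are illustrative.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def incremental_r2(baseline, added, outcome):
    """Return (R^2 for baseline alone, R^2 with the added predictor, the gain)."""
    r_yb = pearson(outcome, baseline)
    r_ya = pearson(outcome, added)
    r_ba = pearson(baseline, added)
    r2_base = r_yb ** 2
    # Two-predictor R^2 expressed through the pairwise correlations.
    r2_full = (r_yb ** 2 + r_ya ** 2 - 2 * r_yb * r_ya * r_ba) / (1 - r_ba ** 2)
    return r2_base, r2_full, r2_full - r2_base


# Synthetic applicants: the writing score overlaps with the traditional
# composite, and the simulated decision leans mostly on the composite.
rng = random.Random(1)
n = 200
traditional = [rng.gauss(0, 1) for _ in range(n)]
writing = [0.5 * t + rng.gauss(0, 0.87) for t in traditional]
decision = [0.8 * t + 0.2 * w + rng.gauss(0, 0.4)
            for t, w in zip(traditional, writing)]

r2_base, r2_full, gain = incremental_r2(traditional, writing, decision)
print(f"R^2 traditional only: {r2_base:.3f}")
print(f"R^2 with writing:     {r2_full:.3f}")
print(f"increment:            {gain:.3f}")
```

Because the writing score shares variance with the traditional composite, its unique contribution in this toy setup is small relative to the baseline, which mirrors the "statistically significant, but small" pattern described above without reproducing the study's actual figures.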

An additional analysis suggested that actual test essays containing incidental writing errors, if sent to

admissions departments, are not likely to impress graduate faculty differently than similarly

scored essays that do not contain such errors. While noticeable, such errors are unlikely to hinder

communication (and are therefore accorded little weight in the operational scoring of GRE essays);

accordingly, this analysis revealed only a slight (and not entirely consistent) influence of these errors in

relation to the influence of applicants’ GRE Writing Assessment scores. There was some indication

that these errors may exert a slightly greater, negative influence on strong essays than on weaker ones.

We note, however, that in order to detect this effect, we had to insert either “some” or “many” errors into

the essays, thus creating an artifice for the purposes of this study. It has been our experience to date that,

in actual practice, GRE test takers who are skillful enough to earn high writing scores rarely write essays

with so many errors. Graduate applicants who are able to discuss complex ideas in an organized, well-

developed essay tend not to make the abundance of trivial errors that we inserted into essays for this

study. Thus, by using extreme examples of the kinds of essays that admissions staff might see if essays

were released, this study has perhaps presented a worst-case scenario.

Limitations

The first limitation of our study is characteristic of many research efforts. By providing material

that described the focus of our study -- that is, the GRE Writing Assessment -- we may have called extra

attention to the measure, thereby possibly elevating its importance in the hypothetical admissions

decisions made by study participants. This limitation was unavoidable, however, as it was necessary to

acquaint participants with the GRE Writing Assessment in order to allow its consideration in the

decision-making process.

Second, as with all simulations, the judgment process that we studied was necessarily somewhat

artificial. Although faculty participants reviewed admission folders that were developed from actual

application materials, they did not make real admissions decisions. The “admissions process” that we

employed was, undoubtedly, different in a variety of ways from the complex and varied processes that

graduate admissions committees typically employ. For instance, our applicant pool was not tailored

precisely to each department, and we did not necessarily provide study participants with all of the

information (or in the same form) that they typically review when making real admissions decisions.

Instead, we asked participants to make simplifying assumptions about both the admissions process and

the “applicants” they were asked to evaluate. For example, graduate admission decisions are usually the

work of committees, not of individual decision makers, as our study procedures suggest. (Along these

lines, one of our reviewers suggested that the deliberations of admissions committees might in effect

serve as further training on the appropriate use of the GRE Writing Assessment, thereby resulting in even

fewer misreadings of applicants’ essays than we found here.) Moreover, we tacitly assumed a fully

decentralized model of graduate admissions, instead of a more centralized model in which baseline

standards and procedures are established on a university-wide basis (Council of Graduate Schools, 1993).

A further simplification, as is typical of many policy capturing studies, was the use of ordinary

linear regression to model the judgments of decision makers. It is well known (and confirmed

by study participants), that information is not always combined, and decisions not always made, in a

simple linear fashion. A rather extensive literature suggests, however, that despite the complexity of

human decision making, many kinds of decisions can be accurately represented by simple linear models

(see, for example, Dawes, 1979; Dawes & Corrigan, 1974; Dawes, Faust, & Meehl, 1989; and Wiggins,

1973, chapter 4).
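The point that simple linear models can represent non-linear judgments reasonably well can be illustrated with a small simulation. This is a sketch under stated assumptions, not a result from this study: the judge below is hypothetical and follows a conjunctive rule (each applicant is rated by the weaker of two credentials), the two cue names are invented, and the linear model uses equal (unit) weights on standardized cues in the spirit of the "improper" linear models discussed by Dawes (1979).

```python
import random
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def zscores(values):
    """Standardize a list of numbers using the population SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Synthetic, independent cues for 500 hypothetical applicants.
rng = random.Random(2)
n = 500
gre = [rng.gauss(0, 1) for _ in range(n)]
gpa = [rng.gauss(0, 1) for _ in range(n)]

# Hypothetical non-linear judge: the rating equals the weaker credential.
judgments = [min(a, b) for a, b in zip(gre, gpa)]

# Unit-weighted linear composite of the standardized cues.
unit_weighted = [a + b for a, b in zip(zscores(gre), zscores(gpa))]

fit = pearson(unit_weighted, judgments)
print(f"correlation between linear composite and judgments: {fit:.2f}")
```

Even though the simulated judge never adds the cues together, the unit-weighted sum tracks the judgments closely, because both increase monotonically in each cue. The linear model describes the judge's output, not the process that produced it.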

The validity of our results also depends in part on how representative the “extraneous errors”

used in our study were of the errors that we can expect to find in actual GRE essays. Most of the errors

that we introduced into essays were relatively minor ones -- spelling or typing errors, minor grammatical

mistakes, or careless misstatements of fact. Introducing more serious errors in grammar or syntax might

have had a greater impact on “admissions decisions.” However, it is likely that essays containing these

more serious errors would also have received lower GRE scores. Thus, any devaluation by faculty of the

writing exhibited in these essays would probably have been appropriate.

In addition, we did not investigate characteristics other than errors that might influence

perception of GRE essays. For example, we did not consider whether such factors as distinct cultural

dialects might sway (or bias) faculty perceptions. From one perspective, however, a concern about

dialects may be largely unwarranted. During our inspection of a relatively large number of GRE essays,

we found many essays that contained examples of international English (British spelling, non-American

phrasing, and so on) and references to foreign homelands, but we encountered virtually none that used

cultural or regional dialect. Most applicants to graduate school, it seems, choose to write their essays in

what might be called formal, academic English (standard American or international English). Thus, we

have some evidence that GRE examinees do not frequently employ dialects in their test essays. We do

not, however, know how the apparently rare use of such dialects in GRE essays may affect graduate

admissions decisions.

Finally, our study was restricted to only two disciplines, which surely do not adequately

represent all possible graduate fields. As noted earlier, psychology and history were selected because

these two disciplines typically require relatively extensive writing of students, and because we were

advised that departments in both of these disciplines are interested in adding GRE Writing Assessment

scores to the information that they consider for graduate admissions. Because we do not know the extent

to which our study results apply more widely, it would be appropriate, we think, to extend the study to

other disciplines, since a multi-disciplinary advisory committee approved the measure as a means of

assessing writing skills that are important in “most” disciplines.

Implications

It is clear that our study has not fully addressed all of the potentially important aspects of the

problem we have set forth. However, despite the various limitations mentioned here, we believe that our

study possessed a reasonable degree of external validity for answering the primary questions of interest.

The study’s results have, we believe, significance on several levels -- foremost for informing decisions

about the GRE Writing Assessment, but also for understanding more fully both the graduate admissions

process and the role of GRE scores in this process. Our findings may also have some relevance to

performance assessment more generally.

A current, long-standing objective of the GRE Board is to better understand how its test offerings

facilitate and influence graduate admissions decisions. Toward this end, this study has provided some

additional information about the emphases, or relative weights, that departments in two disciplines place

on GRE scores and other traditional pre-admission information. The results contribute modestly, we

believe, to the existing body of knowledge concerning the use of GRE test scores in graduate admissions

(see, for instance, Oltman & Hartnett, 1985). Further, the study provides additional information about the

feasibility and utility of a set of procedures that could be employed for further study of graduate

admissions. Possibly, these procedures could be extended in the future to other disciplines, other

decision-making contexts (such as decisions about fellowships and financial aid), or any new measures

being considered for graduate admissions.

More broadly speaking, we believe the results also have implications for the type of score

reporting that is associated with performance assessment and other forms of constructed-response testing.

The general issue here is the extent to which the products or performances generated by test takers may

contain useful information that is not captured solely in summary evaluations of these products -- that is,

the test scores. For example, a potential employer might wish to view an applicant’s videotaped

performance in a certified technical trainer program; similarly, a school administrator might wish to read

a teacher-applicant’s test essay to ascertain the writer’s position on sensitive political, social, or moral

issues. Our study provides some clues, we believe, as to whether one such work product -- the GRE

Writing Assessment essay -- may be welcomed by graduate departments, and also whether any potential

for misuse or misinterpretation may be associated with releasing this product.

There are, however, to the best of our knowledge, no formal guidelines regarding the

circumstances under which it is appropriate to release examinees’ work products. According to the

recently published Standards for Educational and Psychological Testing (AERA/APA/NCME, 1999),

“Test scores, per se, are not readily interpreted without other information . . . ” (p. 62). It is incumbent on

testing programs, therefore, to provide a variety of interpretive materials that describe, for instance, what

a test covers, what test scores mean, and how scores should (and should not) be used. The standards are

silent, however, with respect to whether examinees’ test responses should be included in these materials.

Increasingly, as test users become more and more informed about testing issues, they are calling

for more information than is contained in a single test score. For some kinds of assessments, test

responses may help to meet test users’ demand for more comprehensive information. It seems clear,

however, that if examinee responses -- essays for example -- are to be provided to test users, it would be

prudent to assess not only the potential value of this information, but also the likelihood that it will be

misused.

Our study detected little evidence of inappropriate (construct-irrelevant) judgments when

examinee essays were released to graduate faculty. Thus, for one large-scale test -- the GRE Writing

Assessment -- making applicants’ responses available to admissions staff will probably not give rise to

abuse, at least with respect to the likelihood that graduate faculty will focus unduly on inappropriate

features of applicants’ writing. From this perspective, therefore, reservations about releasing test takers’

essays -- that test score recipients may focus on trivial errors, apply inappropriate standards, or react to

the essays in a biased or unfair way -- seem less compelling. There is little evidence in this study to

support such reservations: Even without formal training in evaluating GRE essays, the faculty who

participated in our study were not influenced excessively by noticeable but relatively trivial, construct-irrelevant

errors of the kind that GRE essay readers are trained to downplay when they evaluate the

overall quality of GRE essays.

Further Considerations

This study has provided some information about how graduate admissions personnel perceive

GRE writing scores when examinees’ essays accompany those test scores. However, if the GRE program

were to consider sending essays to graduate departments, additional questions would probably need to be

answered, and other factors should likely be considered, in order to decide the conditions -- if any --

under which these essays are ultimately released. For example:

• Should the release of essays be automatic, or should it be optional -- for example, only with the test taker’s permission? What are the implications of some candidates denying permission? When might it be to the writer’s advantage to release (or withhold) his or her essays?

• Should the essays be available to graduate departments only under certain conditions? For instance, should the GRE program provide training to test users, or require some assurance that test essays will be used appropriately, before authorizing their release?

• How might the decision to release essays affect the construct being assessed? In particular, would the specification of graduate admissions faculty as a second audience (that is, in addition to trained GRE essay readers) affect test takers’ writing? Might writers put forth greater effort to include or emphasize discipline-specific content? Might they be more cautious in order to limit inaccuracies that might be noticed by faculty in their discipline?

• Would some departments be inclined to use GRE essays for alternative purposes (for example, diagnosing writing problems) without first formally establishing the validity of scores for such purposes?

• What are the implications for challenging the accuracy of GRE Writing Assessment scores? Might there be more requests for rescoring as a result of faculty review?

These are some of the additional issues that should probably be considered, along with the

information provided by the study reported here, before a decision is made about providing examinees’

GRE essays to test score users. Finally, as Art Young (personal communication, March 27, 2000) pointed

out to us, there may also be problems associated with not releasing GRE essays. We have not speculated

about these potential problems here.

References

Aamodt, M. G., Bryan, D. A., & Whitcomb, A. J. (1993). Predicting performance with letters of recommendation. Public Personnel Management, 22, 81-90.

American Educational Research Association/American Psychological Association/ National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Ceci, S. J., & Liker, J. K. (1986). A day at the races: A study of IQ, expertise, and cognitive complexity. Journal of Experimental Psychology: General, 115, 255-266.

Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Connors, R. J., & Lunsford, A. A. (1988). Frequency of formal errors in current college writing, or Ma and Pa Kettle do research. College Composition and Communication, 39, 395-409.

Council of Graduate Schools. (1993). A policy statement: An essential guide to graduate admissions. Washington, DC: Council of Graduate Schools.

Dawes, R. M. (1971). A case study of graduate admissions: Application of three principles of human decision making. American Psychologist, 26, 180-188.

Dawes, R. M. (1979). The robust beauty of improper linear models. American Psychologist, 34, 571-582.

Dawes, R. M., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin, 81, 95-106.

Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668-1674.

Dougherty, T. W., Ebert, R. J., & Callender, J. C. (1986). Policy capturing in the employment interview. Journal of Applied Psychology, 71, 9-15.

Educational Testing Service. (1998). 1998-1999 GRE guide to the use of scores. Princeton, NJ: Educational Testing Service.

Educational Testing Service. (1999). 1999-2000 GRE guide to the use of scores. Princeton, NJ: Educational Testing Service.

Ferguson, F. J. (1991). Voices of their own: Students’ biographies and the college application. College Board Review, 158, 18-21, 32.

Finkelstein, M. A., & Brannick, M. T. (1997). Making decisions about sexual intercourse: Capturing college students’ policies. Basic and Applied Social Psychology, 19, 101-120.

Freedman, S. (1979). How characteristics of student essays influence teachers’ evaluations. Journal of Educational Psychology, 71, 328-338.

Gaeth, G. J., & Shanteau, J. (1984). Reducing the influence of irrelevant information on experienced decision makers. Organizational Behavior and Human Performance, 33, 263-282.

Goldberg, L. R. (1970). Man versus model of man: A rationale, plus some evidence, for a method of improving on clinical inferences. Psychological Bulletin, 73, 422-432.

Gomey, B. E., & Jaeger, R. M. (1995). A pilot study on the use of five new GRE subtests for admissions screening by faculty in two academic disciplines: An application of judgmental policy capturing. Greensboro, NC: The University of North Carolina at Greensboro, Center for Educational Research and Evaluation.

Hairston, M. (1981). Not all errors are created equal: Nonacademic readers in the professions respond to lapses in usage. College English, 43, 794-806.

Hammond, K. R., Mumpower, J. L., & Smith, T. H. (1977). Linking environmental models with models of human judgments: A symmetrical decision aid. IEEE Transactions on Systems, Man, and Cybernetics, SMC-7, 358-367.

Harrison, P. D., Ryan, J. M., & Moore, P. S. (1996). College students’ self-insight and common implicit theories in rating of teaching effectiveness. Journal of Educational Psychology, 88, 775-782.

Hatch, J. A., Hill, C. A., & Hayes, J. R. (1993). When the messenger is the message: Readers’ impressions of writers’ personalities. Written Communication, 10, 569-98.

Hawkins, B. D. (1993). Educators push for diverse grad school admission criteria. Black Issues in Higher Education, 10, 26-30.

Hobson, C. J., Mendel, R. M., & Gibson, F. W. (1981). Clarifying performance appraisal criteria. Organizational Behavior and Human Performance, 28, 164-188.

Janopoulos, M. (1992). University faculty tolerance of NS and NNS writing errors: A comparison. Journal of Second Language Writing, 1, 109-121.

Keith-Spiegel, P. (1991). The complete guide to graduate school admission: Psychology and related fields. Hillsdale, NJ: Lawrence Erlbaum.

Kline, T. J. B., & Sulsky, L. M. (1995). A policy-capturing approach to individual decision making: A demonstration using professors’ judgements of the acceptability of psychology graduate school applicants. Canadian Journal of Behavioural Science, 27, 393-404.

Knouse, S. B. (1983). The letter of recommendation: Specificity and favorability of information. Personnel Psychology, 36, 331-341.

Loher, B. T., Hazer, J. T., Tsai, A., Tilton, K., & James, J. (1997). Letters of reference: A process approach. Journal of Business and Psychology, 11, 339-355.

McCauley, C. (1991). Selection of National Science Foundation graduate fellows: A case study of psychologists failing to apply what they know about decision making. American Psychologist, 4, 1,287-1,291.

Murphy, E. (1991). Whomp! Real voices in college admission essays. English Journal, 80, 34-37.

Oltman, P. K., & Hartnett, R. T. (1985). The role of Graduate Record Examinations in graduate admissions. Journal of Higher Education, 56, 523-527.

Paley, K. S. (1994, March). The college application essay: A rhetorical paradox. Paper presented at the meeting of the Conference on College Composition and Communication, Nashville, TN.

Powers, D. E., & Fowles, M. E. (1997). The personal statement as an indicator of writing skill: A cautionary note. Educational Assessment, 4, 75-87.

Powers, D. E., Fowles, M. E., & Boyles, K. (1996). Validating a writing test for graduate admissions (GRE Board Professional Rep. 96-26b and ETS Research Rep. No. 96-27). Princeton, NJ: Educational Testing Service.

Powers, D. E., Fowles, M. E., & Welsh, C. K. (1999). Further validation of a writing assessment for graduate admissions (GRE Board Research Rep. No. 96-13R and ETS Research Rep. No. 99-18). Princeton, NJ: Educational Testing Service.

Ralston, S. M., & Yoder, D. D. (1989). Effect of a referent’s status on the evaluation of job applications. Journal of Employment Counseling, 26, 84-89.

Range, L. M., Menyhert, A., Walsh, M. L., Hardin, K. N., Ellis, J. B., & Craddick, R. (1991). Letters of recommendation: Perspectives, recommendations, and ethics. Professional Psychology: Research and Practice, 22, 389-392.

Rifkin, B., & Roberts, F. D. (1995). Error gravity: A critical review of research design. Language Learning, 45, 511-537.

Roehling, M. V. (1993). Extracting policy from judicial opinions: The dangers of policy capturing in a field setting. Personnel Psychology, 46, 477-502.

Santos, T. (1988). Professors’ reactions to the academic writing of nonnative-speaking students. TESOL Quarterly, 22, 69-90.

Schaeffer, G., Briel, J., & Fowles, M. E. (in press). Psychometric evaluation of the new GRE Writing Assessment (GRE Rep. No. 96-11). Princeton, NJ: Educational Testing Service.

Schmidt, F. L., Johnson, R. H., & Gugel, J. F. (1978). Utility of policy capturing as an approach to graduate admissions decision making. Applied Psychological Measurement, 2, 347-359.

Schneider, L. M., & Briel, J. B. (1990). Validity of the GRE: 1988-89 summary report. Princeton, NJ: Educational Testing Service.

Shavelson, R. J. (1973). What is the basic teaching skill? Journal of Teacher Education, 14, 144-151.

Shavelson, R. J., & Atwood, N. (1977). Teachers’ estimates of student “states of mind.” British Journal of Teacher Education, 3, 131-138.

Stumpf, S. A., & London, M. (1981). Capturing rater policies in evaluating candidates for promotion. Academy of Management Journal, 24, 752-766.

Tommasi, G. W., Williams, K. B., & Nordstrom, C. R. (1998). Letters of recommendation: What information captures HR professionals’ attention? Journal of Business and Psychology, 13, 5-18.

Vann, R. J., Meyer, D. E., & Lorenz, F. O. (1984). Error gravity: A study of faculty opinion of ESL errors. TESOL Quarterly, 18, 427-440.

Wallace, M. J., & Schwab, D. P. (1976). A cross-validated comparison of five models used to predict graduate admissions committee decisions. Journal of Applied Psychology, 61, 559-563.

Weins, A. N., Jackson, R. H., Manaugh, T. S., & Matarazzo, J. D. (1969). Communication length as an index of communicator attitude: A replication. Journal of Applied Psychology, 53, 264-266.

Werner, J. M., & Bolino, M. C. (1997). Explaining U.S. Courts of Appeals decisions involving performance appraisal: Accuracy, fairness, and validation. Personnel Psychology, 50, 1-24.

Wickenden, J. W. (1982). Open letter to college students applying to competitive colleges. In H. C. Hegener (Ed.), The competitive colleges: Who are they? Where are they? What are they like? (pp. ix-xvii). Princeton, NJ: Peterson’s Guides.

Wiederman, M. W. (1999). Policy capturing methodology in sexuality research. Journal of Sex Research, 36, 91-95.

Wiederman, M. W., & Dubois, S. L. (1998). Evolution and sex differences in preferences for short-term mates: Results from a policy capturing study. Evolution and Human Behavior, 19, 153-170.

Willingham, W. W. (1985). Success in college: The role of personal qualities and academic ability. New York: College Entrance Examination Board.

Willingham, W. W., & Breland, H. M. (1982). Personal qualities and college admissions. New York: College Entrance Examination Board.

Wilson, K. M. (1986). The relationship of scores based on GRE General Test item types to undergraduate grades: An exploratory study for selected subgroups (GRE Board Rep. No. 83-19P). Princeton, NJ: Educational Testing Service.

Zedeck, S., & Kafry, D. (1977). Capturing rater policies for processing evaluation data. Organizational Behavior and Human Performance, 18, 269-294.

Zeleznik, C., Hojat, M., & Veloski, J. (1983). Levels of recommendation for students and academic performance in medical school. Psychological Reports, 52, 851-858.

APPENDIX A

APPLICATION FOR GRADUATE SCHOOL

APPLICATION FOR ADMISSION FOR GRADUATE STUDY

APPLICATION MUST BE MADE ONLY TO ONE DEPARTMENT. Return both copies of this form with supporting documents directly to the Department to which you are applying. Processing of your application will be delayed if supporting credentials are sent separately. Supporting materials will not be returned to you or forwarded to other schools or agencies. Type or print clearly.

BIOGRAPHICAL INFORMATION FULL LEGAL NAME LAST, FAMILY, OR SURNAME FIRST MIDDLE

MAILING ADDRESS NUMBER & STREET CITY STATE ZIP CODE

If you are a U.S. citizen or permanent resident, please indicate your ethnic origin. (Self-identification is entirely voluntary.)

[ ] American Indian/Alaska Native (indicate tribal affiliation)
[ ] African American/Black
[ ] Caucasian
[ ] Puerto Rican
[ ] Mexican-American (Chicano)
[ ] Multi-racial
[ ] Asian American/Pacific Islander
[ ] Other Hispanic
[ ] Other

APPLICATION INFORMATION

PROPOSED DEPARTMENT

PROPOSED DEGREE: [ ] PHD [ ] MA/PHD [ ] MS/PHD [ ] MA/DMA [ ] MASTER ONLY

MAJOR DEPARTMENT SPECIALIZATION

LIST EVERY COLLEGE AND UNIVERSITY YOU HAVE ATTENDED FOR ONE YEAR OR MORE FULL TIME:

NAME OF INSTITUTION & LOCATION (List chronologically) | DATES (Month and Year): From, To | MAJOR FIELD OF STUDY | ACTUAL NAME OF DEGREE OR DIPLOMA (do not translate) | DATE RECEIVED OR EXPECTED

GRADE POINT AVERAGE (GPA)

Cumulative:
Major:

GRADUATE RECORD EXAM (GRE) RESULTS

EXAMINATION | Score | %
Verbal | |
Quantitative | |
Analytical | |
Writing Assessment | |

ADDITIONAL INFORMATION

List all other activities related to your academic goals:

ACADEMIC HONORS, FELLOWSHIPS, OR NON-ACADEMIC DISTINCTIONS, OR RECOGNITION

LANGUAGE BACKGROUND List your first language

OTHER GRADUATE SCHOOLS TO WHICH YOU ARE APPLYING:

I HEREBY APPLY FOR ADMISSION TO GRADUATE STUDY AT THE UNIVERSITY AND CERTIFY THAT THE ABOVE INFORMATION AND ATTACHED STATEMENTS ARE CORRECT AND COMPLETE TO THE BEST OF MY KNOWLEDGE. FALSIFICATION OR OMISSION OF REQUESTED INFORMATION WILL BE GROUNDS FOR TERMINATING CONSIDERATION OF THE APPLICATION, OR, IF DISCOVERED AFTER ENROLLMENT, FOR WITHDRAWING REGISTRATION PRIVILEGES.

SIGNATURE DATE

APPENDIX B

RECOMMENDATION FORM

RECOMMENDATION FORM (GRE ADMISSIONS STUDY)

NAME OF APPLICANT LAST FIRST MIDDLE

DEPARTMENT OF DEGREE

The Family Educational Rights and Privacy Act of 1974 guarantees students access to their educational records. Students can waive their right to access records.

[ ] I DO WAIVE my right to inspect the contents of the following recommendation.
[ ] I DO NOT WAIVE my right to inspect the contents of the following recommendation.

SIGNATURE

RECOMMENDER: This recommendation will remain confidential during the admission process.

HOW LONG AND IN WHAT CAPACITY HAVE YOU KNOWN THE APPLICANT?

Please write candidly about the student’s qualifications, potential to carry on advanced study in the field specified, intellectual independence, capacity for analytical thinking, ability to organize and express ideas clearly, and potential for teaching. Descriptions of significant actions, accomplishments, and personal qualities related to scholarly achievement are particularly helpful.

Please rate this student in comparison with other individuals whom you have known at similar stages in their careers. Please check the box under the appropriate rating.

Rating scale (one box per item): Below average | Average | Somewhat above average | Good | Very good | Outstanding | Truly exceptional | Inadequate opportunity to observe

1. Academic preparedness/background for proposed program of study
2. Intellectual ability/capacity
3. Aptitude (skill or potential) for research
4. Teaching ability/potential
5. Ability to communicate orally
6. Ability to communicate in writing
7. Creativity/originality
8. Independent thinking
9. Dependability/reliability
10. Initiative
11. Seriousness of purpose
12. Professional expertise
13. Character/integrity
14. Promise of productive scholarship
15. Emotional maturity
16. Motivation for proposed program of study
17. Overall promise

ADMISSION TO GRADUATE STUDY IS:

[ ] Strongly recommended [ ] Recommended [ ] Recommended with reservations [ ] NOT recommended

DATE SIGNATURE TITLE

[Note: Assume that this recommendation is an adequate reflection of all other recommendations submitted by the applicant.]

APPENDIX C

APPLICANT’S STATEMENT OF PURPOSE

Summary of Applicant’s STATEMENT OF PURPOSE

NAME OF APPLICANT LAST FIRST MIDDLE

This applicant’s statement of purpose was rated by a faculty committee as showing evidence of the following:

Rating scale (one box per item): Below average | Average | Somewhat above average | Good | Very good | Outstanding | Truly exceptional | Inadequate opportunity to observe

1. Match of applicant’s interests with those of prospective advisees
2. Match of applicant’s professional aspirations with objectives of program
3. Applicant’s rationale for graduate study
4. Likely commitment to the field
5. Evidence of motivation
6. Evidence of maturity
7. Evidence of other personal characteristics likely to facilitate degree completion
8. Degree to which applicant has overcome prior obstacles (economic, social, etc.)
9. Evidence of relevant experiences, special interests, or achievements
10. Depth of knowledge
11. Evidence of a “voice” (personality, style, etc.)
12. Overall strength of statement

APPENDIX D

ABBREVIATED APPLICATIONS FORM

Abbreviated Applications (History) Form D

Undergrad. School Selectivity | GRE V | GRE Q | GRE A | GRE Writ | UGPA Overall | UGPA Major | FACULTY RECOMMENDATION | OVERALL PROMISE | PERSONAL STATEMENT | YOUR RECOM. (Deny/Admit) | PROB ADMIS (0 to 100%)
Very Sel. | 560 | 580 | 680 | 3.5 | 3.67 | 3.72 | 3 Strongly Recom’d | 6 Outstanding | 6 Outstanding | 0 1 | ___%
Very Sel. | 640 | 680 | 650 | 5.5 | 3.49 | 3.69 | 1 Recom. w Reserv. | 3 Smwt Above Avg | 2 Average | 0 1 | ___%
Very Sel. | 510 | 670 | 540 | 2.0 | 2.57 | 2.61 | 1 Recom. w Reserv. | 3 Smwt Above Avg | 6 Outstanding | 0 1 | ___%
Very Sel. | 470 | 560 | 520 | 4.5 | 2.75 | 3.05 | 1 Recom. w Reserv. | 3 Smwt Above Avg | 7 Truly Exceptional | 0 1 | ___%
Extrm’ly Sel. | 540 | 610 | 670 | 5.0 | 3.44 | 3.07 | 3 Strongly Recom’d | 6 Outstanding | 6 Outstanding | 0 1 | ___%
Extrm’ly Sel. | 590 | 790 | 730 | 2.5 | 3.59 | 3.25 | 3 Strongly Recom’d | 5 Very Good | 5 Very Good | 0 1 | ___%
Extrm’ly Sel. | 610 | 640 | 650 | 2.0 | 3.27 | 3.09 | 1 Recom. w Reserv. | 2 Average | 4 Good | 0 1 | ___%
Extrm’ly Sel. | 670 | 670 | 620 | 4.5 | 3.44 | 3.39 | 3 Strongly Recom’d | 7 Truly Exceptional | 6 Outstanding | 0 1 | ___%
Extrm’ly Sel. | 660 | 550 | 760 | 5.0 | 3.33 | 3.66 | 2 Recommended | 3 Smwt Above Avg | 6 Outstanding | 0 1 | ___%
Extrm’ly Sel. | 700 | 770 | 770 | 5.5 | 2.95 | 3.21 | 3 Strongly Recom’d | 6 Outstanding | 3 Smwt Above Avg | 0 1 | ___%
Extrm’ly Sel. | 770 | 680 | 700 | 6.0 | 3.55 | 3.19 | 3 Strongly Recom’d | 5 Very Good | 3 Smwt Above Avg | 0 1 | ___%
Less Sel. | 800 | 640 | 770 | 6.0 | 2.90 | 2.89 | 2 Recommended | 4 Good | 5 Very Good | 0 1 | ___%
Extrm’ly Sel. | 670 | 640 | 570 | 2.5 | 3.21 | 3.54 | 3 Strongly Recom’d | 7 Truly Exceptional | 5 Very Good | 0 1 | ___%
Less Sel. | 460 | 390 | 500 | 3.0 | 2.88 | 2.81 | 3 Strongly Recom’d | 5 Very Good | 4 Good | 0 1 | ___%
Extrm’ly Sel. | 520 | 700 | 710 | 3.0 | 3.26 | 3.64 | 2 Recommended | 4 Good | 6 Outstanding | 0 1 | ___%
Less Sel. | 530 | 550 | 580 | 3.5 | 3.33 | 3.76 | 3 Strongly Recom’d | 6 Outstanding | 5 Very Good | 0 1 | ___%
Less Sel. | 620 | 710 | 680 | 6.0 | 3.49 | 3.59 | 1 Recom. w Reserv. | 3 Smwt Above Avg | 6 Outstanding | 0 1 | ___%
Less Sel. | 520 | 550 | 640 | 2.0 | 2.72 | 2.80 | 1 Recom. w Reserv. | 3 Smwt Above Avg | 1 Below Average | 0 1 | ___%
Less Sel. | 530 | 650 | 510 | 4.0 | 3.55 | 3.14 | 1 Recom. w Reserv. | 2 Average | 3 Smwt Above Avg | 0 1 | ___%
Very Sel. | 540 | 720 | 730 | 5.0 | 2.95 | 3.34 | 2 Recommended | 4 Good | 5 Very Good | 0 1 | ___%
Very Sel. | 540 | 520 | 590 | 4.5 | 3.11 | 3.22 | 1 Recom. w Reserv. | 3 Smwt Above Avg | 3 Smwt Above Avg | 0 1 | ___%
Very Sel. | 640 | 560 | 570 | 3.0 | 3.55 | 3.56 | 2 Recommended | 3 Smwt Above Avg | 7 Truly Exceptional | 0 1 | ___%
Less Sel. | 560 | 640 | 610 | 4.0 | 3.44 | 3.49 | 2 Recommended | 4 Good | 2 Average | 0 1 | ___%
Less Sel. | 620 | 630 | 610 | 3.5 | 3.58 | 3.55 | 2 Recommended | 3 Smwt Above Avg | 3 Smwt Above Avg | 0 1 | ___%
Less Sel. | 590 | 680 | 730 | 2.5 | 3.54 | 3.55 | 2 Recommended | 5 Very Good | 3 Smwt Above Avg | 0 1 | ___%
Very Sel. | 530 | 560 | 630 | 5.5 | 3.30 | 3.48 | 2 Recommended | 5 Very Good | 4 Good | 0 1 | ___%
Very Sel. | 710 | 800 | 800 | 4.0 | 3.56 | 3.66 | 2 Recommended | 5 Very Good | 7 Truly Exceptional | 0 1 | ___%

KEY:

UNDERGRADUATE SCHOOL SELECTIVITY/REPUTATION:
Extrm’ly Sel. = extremely selective undergraduate school/extremely good reputation
Very Sel. = very selective school/very good reputation
Less Sel. = less selective school/good reputation

FACULTY RECOMMENDATION:
(3) Strongly Recom’d = strongly recommended
(2) Recommended = recommended
(1) Recom. w Reserv. = recommended with reservations

OVERALL PROMISE = overall promise rating from recommendation form
PERSONAL STATEMENT = overall strength of personal statement
(1) Below average (2) Average (3) Somewhat above average (4) Good (5) Very good (6) Outstanding (7) Truly exceptional

YOUR RECOM = your recommendation to deny or admit
PROB ADMIS = likelihood of admission to your dept. (0 to 100%)

Please return form in envelope provided. See reverse side for percentiles corresponding to GRE scores.

Table D1. GRE General Test Scores Interpretive Data

Percent of Examinees Scoring Lower than Selected Scores

Score | GRE-V | GRE-Q | GRE-A
800 | 99 | 99 | 99
700 | 97 | 80 | 86
600 | 84 | 58 | 62
500 | 59 | 35 | 34
400 | 26 | 15 | 14
300 | 5 | 3 | 3

Table D2. GRE Writing Assessment Interpretive Data

Score | Percent of Examinees Scoring Lower than Selected Scores
6.0 | 99
5.0 | 83
4.5 | 64
4.0 | 41
3.5 | 24
3.0 | 12
2.0 | 2
