Research Reports
Report 68
April 2002
Test of English as a Foreign Language™
Influence of Irrelevant Speech on Standardized Test Performance
Donald E. Powers, Wendy Albertson, Thomas Florek, Kathy Johnson,
John Malak, Bill Nemceff, Mark Porzuc,
Donna Silvester, Minhwei Wang, Richard Weston, Edward Winner,
Aleksander Zelazny
Educational Testing Service Princeton, New Jersey
RR-02-06
Educational Testing Service is an Equal Opportunity/Affirmative Action Employer.
Copyright © 2002 by Educational Testing Service. All rights reserved.
No part of this report may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Violators will be prosecuted in accordance with both U.S. and international copyright laws.
EDUCATIONAL TESTING SERVICE, ETS, the ETS logos, Graduate Record Examinations, GRE, TOEFL, and the TOEFL logo are registered trademarks of Educational Testing Service. The Test of English as a Foreign Language and The Praxis Series are trademarks of Educational Testing Service.
College Board is a registered trademark of the College Entrance Examination Board.
Graduate Management Admission Test and GMAT are registered trademarks of the Graduate Management Admission Council.
Abstract
The aims of this study were to (1) estimate how distracting the speech of examinees taking a
speaking test is likely to be for others testing in the same center, as well as its impact on test
performance, and (2) evaluate the prospects of reducing the distraction to a level that is
acceptable to test takers.
Study participants were volunteers (N = 171) who had previously taken the Graduate
Management Admission Test® (GMAT®), the Graduate Record Examinations® (GRE®) General
Test, or the Test of English as a Foreign Language™ (TOEFL®). They were invited to retake a
different form of the same test under either distracting conditions or standard, distraction-free
conditions. To reduce distraction, some participants used either headsets or headsets plus
masking noise.
Attempts to reduce distraction to an acceptable level were largely unsuccessful. The
impact on actual test performance, however, was slight in the GMAT sample and negligible in
both the GRE and TOEFL samples.
The conclusion was that intermingling examinees with others who are taking a speaking
test remains a concern, primarily because of strong negative perceptions by test takers. More
effective means need to be devised to reduce or control distraction.
Key words: distraction, standardized testing, speaking tests, test validity
The Test of English as a Foreign Language™ (TOEFL®) was developed in 1963 by the National Council on the Testing of English as a Foreign Language. The Council was formed through the cooperative effort of more than 30 public and private organizations concerned with testing the English proficiency of nonnative speakers of the language applying for admission to institutions in the United States. In 1965, Educational Testing Service® (ETS®) and the College Board® assumed joint responsibility for the program. In 1973, a cooperative arrangement for the operation of the program was entered into by ETS, the College Board, and the Graduate Record Examinations® (GRE®) Board. The membership of the College Board is composed of schools, colleges, school systems, and educational associations; GRE Board members are associated with graduate education.
ETS administers the TOEFL program under the general direction of a policy board that was established by, and is affiliated with, the sponsoring organizations. Members of the TOEFL Board (previously the Policy Council) represent the College Board, the GRE Board, and such institutions and agencies as graduate schools of business, junior and community colleges, nonprofit educational exchange agencies, and agencies of the United States government.
✥ ✥ ✥
A continuing program of research related to the TOEFL test is carried out under the direction of the TOEFL Committee of Examiners. Its 13 members include representatives of the TOEFL Board, and distinguished English as a second language specialists from the academic community. The Committee meets twice yearly to oversee the review and approval of proposals for test-related research and to set guidelines for the entire scope of the TOEFL research program. Members of the Committee of Examiners serve three-year terms at the invitation of the Board; the chair of the committee serves on the Board.
Because the studies are specific to the TOEFL test and the testing program, most of the actual research is conducted by ETS staff rather than by outside researchers. Many projects require the cooperation of other institutions, however, particularly those with programs in the teaching of English as a foreign or second language and applied linguistics. Representatives of such programs who are interested in participating in or conducting TOEFL-related research are invited to contact the TOEFL program office. All TOEFL research projects must undergo appropriate ETS review to ascertain that data confidentiality will be protected.
Current (2001-2002) members of the TOEFL Committee of Examiners are:
Lyle Bachman, University of California, Los Angeles
Deena Boraie, The American University of Cairo
Micheline Chalhoub-Deville (Chair), University of Iowa
Jodi Crandall (Ex Officio), University of Maryland, Baltimore
Catherine Elder, University of Auckland
Glenn Fulcher, University of Surrey
William Grabe, Northern Arizona University
Stan Jones, Carleton University
Keiko Koda, Carnegie Mellon University
Richard Luecht, University of North Carolina at Greensboro
Terry Santos, Humboldt State University
Merrill Swain, The University of Toronto
Richard Young, University of Wisconsin-Madison
To obtain more information about TOEFL programs and services, use one of the following:
Email: [email protected]
Web site: http://www.toefl.org
Acknowledgments
Many people contributed in various ways to the implementation of the study reported here.
Contributions included offering advice on the design of the study, recruiting participants,
collecting data, processing data, retrieving data from test score files, administering examinations,
preparing and reviewing the report, and many other tasks. We are grateful to all of the following
people for the parts they played:
Tedi Adams, Jane Burtis, Clara Bowman, Brent Bridgeman, Jim Butera,
Pat Carey, Regina Carola, Amy Chen, Rob Colson, Sandy Cool, Karen Copper,
Lisa Costas, Kevin Cureton, Carol Dwyer, Mike Ecker, Dan Eignor, Cindy Evans,
Lauren Florczak, Mark Grant, Fanmin Guo, Erin Herbert, Bob Hill, Rick Kling,
Anna Kubiak, Maria Leszczyszyn, Jack McDonald, Steven Nelson,
Denise Nevrincean, Cindy Nguyen, Roxanna Paez, Liane Patsula, Gary Payne,
Robert Peterla, Karen Potsko, Meg Powers, Diane Rein, Alan Rushforth,
Aleta Sclan, Ed Shea, Dave Vale, Jeff Wright, and Ruth Yoder
Table of Contents
Introduction
  Background
  Relevant Literature
  Purpose of the Study
  Objectives
Method
  Participants
  Testing Facility
  Tasks
  Study Instruments
  Procedure
  Design
  Analysis
Results
  Reliability of Study Measures
  Study Sample
  Effect of the Experimental Manipulation
  Effort/Motivation and Test Anxiety
  Effects on Test Performance
  Examinee Reactions to Method Used to Block Sound
  Observations and Debriefings
Discussion
  Limitations
Conclusions
References
Notes
Appendixes
List of Tables
Table 1. Reliability Estimates for Self-Report Scales
Table 2. Distribution of Study Sample by Test Sample and Testing Condition
Table 3. Degree of Distraction by Test-Taker Sample and Testing Condition
Table 4. Effect of Testing Condition on Degree of Distraction
Table 5. Means and SDs of Anxiety Scale Scores by Sample and Testing Condition
Table 6. Means and SDs of Effort Scores by Sample and Test Occasion
Table 7. Test Performance by Testing Condition
Table 8. Summary of Degree of Influence of Several Extraneous Sources of Variation in Test Performance
Introduction
Background
Currently, some examinees who take ETS-administered computer-based tests do so at the
same time that other test takers take different tests in the same testing center. In the future, some
of these tests will contain speaking components. Under these conditions, it is possible that the
speech generated by some test takers will be distracting to other test takers and also, possibly,
detrimental to the test performance of the latter. It is this concern that motivated the study
reported here.
Relevant Literature
There is substantial literature on the effects of noise on both attitudes and performance.
Much of this research, however, is only partially relevant to our interests. Nonetheless, even a
cursory review suggests that our concern is a legitimate one.
Context of Prior Research. A preponderance of the research on noise has focused on
hearing and auditory performance in occupational settings, where the effects (interference and
annoyance) are well known. The effects of noise on cognitive performance, however, remain
“controversial and far from conclusive” (Mital, McGlothlin, & Faard, 1992, p. 70). Moreover,
decades of research have produced “only fragmentary support” for the belief that distracting
noises can seriously affect performance on intellectually demanding tasks (Weinstein, 1977,
p. 104). Consistent with these views is the fact that efforts to reduce noise have been mostly in
production workplaces, where researchers have shown that, even at low levels, noise can have
negative effects on concentration, productivity, working capacity, and the likelihood of accidents
(Sailer & Hassenzahl, 2000). Perhaps less focus has been on other settings, such as school and
office workplaces, because low sound levels are seldom harmful — at least in the same sense as
they are in occupational settings.
Furthermore, not all of the research on cognitive performance is entirely relevant to our
concern, in large part because much of it has taken place in laboratory settings using noise
generated by percussion-type equipment. Some studies have also been conducted in more
naturalistic settings, however. For example, Ng (2000) studied the effects of building
construction on dormitory residents, finding that behavior and attitudes (about studying) were
most affected for residents nearest the construction. Although the impact of such naturally
occurring distractions has been reasonably well studied, the effects of the kind of distraction of
most interest to us (i.e., conversational noise, or speech) have not been as well documented
(Mital, McGlothlin, & Faard, 1992).
Tasks Studied in Previous Research. Neither has the research on noise focused on exactly
the kinds of standardized test tasks that are of most interest to us. Although this research has
employed a variety of different types of cognitive tasks, few if any of these tasks correspond
closely with the kinds of test questions that comprise the large-scale national tests administered
by ETS. Much of this research has focused on quite simple tasks because, as Weinstein (1974)
pointed out, earlier research on complex activities — like reading comprehension and
performance on intelligence tests — yielded mostly inconclusive results. Research using
“realistic noise levels and meaningful tasks” has been “uncommon” (Weinstein, 1974, p. 553).
Several investigations have focused on relatively simple tasks, such as proofreading. For
example, Weinstein (1974) studied the effects of noise on a proofreading task, finding effects on
subjects’ ability to detect grammatical, but not spelling, errors. Later, the same researcher
concluded that the skills required to detect spelling errors were “relatively immune” (Weinstein,
1977, p. 106) to disruption (a tape recording of radio news) but that the more complex processes
involved in detecting contextual errors (e.g., grammatical errors) were affected.
Proofreading has also been used in research on the effects of irrelevant speech on task
performance. In a series of studies, Jones, Miles, and Page (1990) noted negative effects, largely
when irrelevant speech was meaningful. Contrary to Weinstein’s (1974, 1977) findings, the
effects were greater when participants were asked to detect noncontextual errors (e.g., spelling)
than when they were asked to locate contextual ones (e.g., grammatical errors). In their summary
of research on the effects of irrelevant speech, Jones and Morris (1992) focused mainly on visual
tasks, rather than auditory ones. Asking what types of sound interfere with different kinds of
tasks, they concluded that interference is “quite considerable” in some circumstances, but
essentially nonexistent in others (p. 30).
With regard to other kinds of tasks, Kjellberg (1990) noted that reaction time and
vigilance tasks have been studied quite often. In contrast, research on the effects of noise on
“counting” is relatively rare, but seems to indicate that irrelevant speech has “no effect” (Martin,
Wogalter, & Forlano, 1988).
Smith (1985) studied a semantic processing task (requiring participants to indicate
whether sentences were correct or not) and a syntactic reasoning task (requiring participants to
verify statements about the order of letters presented). The results showed that continuous noise
had no effect on either task, but that intermittent, unpredictable noise did have an effect on
semantic processing. Finally, Kjellberg and Skoldstrom (1991) studied several different kinds of
tasks involving proofreading, finger dexterity, and complex reaction time and concluded that
“task differences probably explain only a small part of the widely differing noise tolerance levels
at different work places” (p. 39).
Research on Individual Differences. Besides research on different kinds of tasks, some
(but relatively little) research has focused on differences among individuals in their response to
noise. In one such effort, Weinstein (1978) used a self-report noise sensitivity scale (e.g., “At
movies, whispering and crinkling candy wrappers disturb me” and “Even music I normally like
will bother me if I’m trying to concentrate”). Because the scale predicted the reactions of noise-
sensitive and noise-insensitive students, Weinstein concluded that sensitivity to noise is a
personal trait “of sufficient power and generality to permit predictions of reactions to
environments” (p. 464).
In other related research, striking differences have been noted among individuals with
regard to the extent to which they find noise to be annoying and the degree to which it affects
their performance. However, in general this research has not been very successful in identifying
the particular characteristics of people most annoyed by noise (Jones & Davies, 1984). An
exception is that people with hearing impairments may express more annoyance in response to
noise than others. Thus, the effects of noise may depend both on the nature of the task being
performed and on the characteristics of the person performing the task.
Types of Noise Studied. In addition, as intimated earlier, the effects of noise may also
depend on various features of the noise itself. For example, Kjellberg, Landstrom, Tesarz,
Soderberg, and Akerlund (1996) concluded that it is easier to habituate to constant noise than to
variable noise, and therefore constant noise is generally less annoying than variable noise.
Kjellberg et al. also found that distraction was most closely related to the predictability of noise
and the degree to which listeners had control over it. Jones (1984) found that a major
determinant of the effect of noise on performance was the magnitude of the change in noise
level, in either direction.
Jones, Miles, and Page (1990) found that the negative effects of irrelevant speech
depended on the speech being meaningful. With studies of meaningless noise, the main interest
has been the effects of intensity. For instance, the intensity of continuous white noise has been
studied widely for its influence on environmental stress and information processing (Smith,
1985). Intensity has been shown not to be a factor in studies of irrelevant speech, however
(Jones, Miles, & Page, 1990). Also, speech may be more disruptive when it is spoken in the
listener’s native language than in a language that the listener does not understand (Martin,
Wogalter, & Forlano, 1988).
Other Factors and Issues. A number of other factors have also been shown to moderate
the effects of noise. Studying the effects of building construction on dormitory residents, Ng
(2000) cited research showing that attitudes toward the source of the noise may moderate the
effects of noise. Those affected may be less negative about some sources of noise. For example,
in his study Ng speculated that because students stood to benefit, they would be less annoyed by
the disturbance resulting from the construction of a new dorm. In other research, Kjellberg et al.
(1996) found that distraction was most closely related to the predictability of the noise and the
degree of control that listeners could exert over it.
A final issue that has been addressed in research on noise is its measurement. Sailer and
Hassenzahl (2000) pointed out that the subjective experience of annoyance depends on factors
other than the absolute level of noise. Although there are objective measurements of unwanted
sound, “the relationship between objective sound level and subjective annoyance is rather poor”
(p. 1921). Furthermore, Kjellberg et al. (1996) noted that technical measures of noise explain
only a small portion of variation in annoyance. For example, sound level has been shown to be
only weakly related to annoyance when the noise is irrelevant speech.
Purpose of the Study
The main objective of our study was to determine the influence that distractions from
fellow test takers may have on test performance. The situation of particular interest was one in
which some test takers are taking a speaking test at the same time that others are taking tests that
involve reading, listening, calculating, reasoning, and writing.
Objectives
The two main objectives of the study were the following:
(1) to determine the extent to which test takers may be distracted by other test takers
in a computer-based testing environment in which some test takers are trying
tasks that involve speaking
(2) to assess the effectiveness of two methods of reducing distraction due to other test
takers¹
Method
Participants
The target sample for the study was test takers who had previously taken either the
GMAT, the GRE General Test, or the TOEFL test. To constitute this sample, we searched the
ETS “data warehouse” files to identify test takers who had tested previously at the University of
Maryland (College Park) computer-based testing center. For GMAT and GRE test takers,
participation was restricted to examinees who had tested within the previous 12 months. For
TOEFL test takers, the restriction was to the previous 6 months, as TOEFL scores are more
likely to change (as a result of experience with the English language) than are GMAT and GRE
scores.
This search yielded approximately 1,200 GRE test takers, 500 GMAT test takers, and 300
TOEFL test takers. “First come, first served” invitations were sent to each of these test takers,
who were asked to return to the test center to retake the test they had taken previously. Each
participant was promised an honorarium of $75. Using a recruitment flyer that we developed for
their use, the CBT test center staff at the university also solicited test takers for the study. In
addition to these efforts, information about the study was posted at George Mason University in
Fairfax, VA, the University of Maryland, Baltimore County, and both the graduate school and
the graduate business management school at the University of Maryland, College Park. These
efforts resulted in a total of 171 study participants, whose operational test scores were very
similar on average to those of the three test-taking populations.
Testing Facility
The testing site was a standard ETS computer-based testing center, centrally located on
the campus of the University of Maryland in College Park. This testing center, which houses 10
computer workstations, has been in operation since May 1999 and administers computer-based
tests for the GMAT, GRE, TOEFL, and The Praxis Series™ testing programs.
The center meets all operational specifications for ETS computer-based testing centers,
and as such it is deemed to be fully compliant with all technical and administrative procedures
required for the administration of “high-stakes” computer-based tests. These procedures include
all security practices as defined in the ETS “Policies, Practices, and Procedures” manual. Staff
are fully trained and certified by ETS.
The 10 workstations are partitioned, and each is equipped with a computer, a monitor, a
mouse, headphones, earplugs, and an adjustable chair. The administrative office is equipped with
a viewing window that enables a proctor to monitor test takers at all times during testing. The
center has a 20-foot ceiling and windows around the top of the room.
A picture showing the arrangement of the testing room is included as Appendix A.
Tasks
The tests used in the study were the following:
• GMAT
• GRE General Test
• TOEFL (computer-based version)
According to test information bulletins, these tests measure (for the GMAT) basic verbal,
mathematical, and analytical writing skills; (for the GRE General Test) verbal, quantitative, and
analytical skills; and (for the TOEFL) the English language proficiency of people whose native
language is not English. Each test comprises several different question formats. The verbal
portion of the GMAT includes reading comprehension, critical reasoning, and sentence
completion formats. The GMAT quantitative section includes problem solving and data
sufficiency formats, and the analytical writing section requires test takers to write two 30-minute
essays.
The verbal portion of the GRE General Test contains analogy, antonym, sentence
completion, and reading comprehension item formats. The quantitative portion contains
quantitative comparison questions and problem solving questions involving arithmetic, algebra,
and geometry. The analytical section comprises two item types: analytical reasoning and
logical reasoning.
The TOEFL listening section requires examinees to answer questions about conversations
they have heard. The reading portion contains short passages and questions that test examinees’
understanding of the passages. The structure portion contains multiple-choice questions that
require the recognition of language that is appropriate for standard written English. The writing
section requires test takers to compose a 30-minute essay.
Study Instruments
To devise a measure of distraction during test taking, 42 statements were written on the
basis of a review of the literature on the effects of noise/distraction on performance. These
statements were then reviewed, and the 21 most “face valid” were selected for inclusion in the
study measure. For these “distraction” questions, participants were asked to indicate the extent to
which they agreed or disagreed with such statements as
“I found myself thinking more about the surroundings than the test itself.”
“It was annoying when other people talked during the test.”
Responses were on a 5-point scale (strongly agree, agree, neither agree nor disagree, disagree,
and strongly disagree). A score of 1 to 5 was assigned to each statement, and scores for each
statement were summed to get a total distraction score. Some items were reverse scored as
appropriate so that the greater the distraction, the higher the total score.
Because there was no opportunity to pretest the items, we relied on data collected during
the study proper to document the suitability of the distraction measure. The study data showed
that, when items referred to distraction during the test that students had just taken, all items
correlated at least .50 in absolute value with the total distraction scale score, and 17 of 21
correlated greater than .70. When the statements referred to the previous test that examinees had
taken, 18 of 21 items correlated greater than .50 in absolute value with the total scale score. The
correlation of the weakest item was .27 with the total scale. Thus, the items comprising the
distraction scale were reasonably homogeneous.
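As a rough illustration of the item-total correlations reported above, the following sketch computes a Pearson correlation between each item and the total scale score. It assumes that the total includes the item itself (the report does not say whether a corrected item-total correlation was used), and the small score matrix is invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def item_total_correlations(score_matrix):
    """score_matrix[r][i] is respondent r's score on item i.

    Assumption: the total is the plain sum of all items, including the
    item being correlated (an uncorrected item-total correlation).
    """
    totals = [sum(row) for row in score_matrix]
    n_items = len(score_matrix[0])
    return [
        pearson([row[i] for row in score_matrix], totals)
        for i in range(n_items)
    ]


# Hypothetical data: 3 respondents x 3 items. The third item runs opposite
# to the total, as an item that had not been reverse scored would.
scores = [[1, 2, 5], [2, 4, 4], [3, 6, 3]]
print([round(r, 2) for r in item_total_correlations(scores)])  # [1.0, 1.0, -1.0]
```

The negative value for the third item shows why the correlations are described in absolute value: an item keyed in the opposite direction can be just as strongly related to the scale as the others.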
A second scale (eight statements) was developed to allow participants to indicate (again,
agreeing or disagreeing) how anxious they were about test taking, both for the test they had just
taken and for the operational test they had taken previously. Statements were modeled after those
in a variety of test anxiety inventories (e.g., Sarason, 1984; Spielberger, 1980) and included ones
such as
“I felt tense and unsure.”
“Thinking about how I was doing interfered with my work.”
The correlations of individual anxiety items with the total scale were all greater than .50.
For a third category of statements, participants were asked to report the amount of effort
they made (or motivation they had) for each testing by agreeing or disagreeing, again on a
5-point scale, with six statements, such as
“I really tried to do my very best.”
“Getting a good score was not important to me.”
For the effort scale, item-total scale correlations were all greater than .40. The relevance of each
of these scales to our study objectives is discussed below.
Procedure
During the last three weeks in July 2001, test takers were scheduled in groups of up to 10
to retake a different form of the same operational test they had taken previously at the center.
GRE, GMAT, and TOEFL test takers were intermingled within the testing sessions, which are
described below. In preparation for the study, test takers were told that, in contrast to the test
they had taken previously, test scores would not count, nor would scores be reported to
institutions. Immediately before taking the test, participants were told only that
the purpose of the study is to compare how test takers perform under standard
operational testing conditions (when test scores count) vs. how they perform
under non-standard conditions when test scores do not count. So, we’d like you
to try your best when you take the test today.
Study participants who were assigned to the distracting conditions (described below) were also
told that
one major difference between today’s testing and the test that you took “for
real” previously is that you will hear other test takers taking a speaking test.
We have provided you with headsets to wear when you take the test. These
should help to block out the sound created by those taking the speaking test.
If they are not completely effective, please try to ignore any distractions as
best you can.
To simulate distraction from other examinees taking a speaking test, two additional
workstations were set up within the center to accommodate two compact disk (CD) players.
These CD players were used to play recordings of TOEFL candidates who had participated in the
pilot testing of new item types for a new TOEFL speaking test. The recordings included several
different speaking tasks (reading/speaking, listening/speaking, and independent speaking) so as
to generate a variety of speech as well as a relatively continuous stream of distraction during the
entire testing session. These responses were stored on four CDs created for the study, each
containing the responses of a single respondent from the earlier TOEFL pilot study. Two of the
CDs contained the responses of male test takers; two others contained the responses of female
test takers. Pauses of 5 to 12 minutes were inserted between the individual responses to simulate
the silence that would be observed for someone actually taking the new TOEFL speaking test.
The duration of each pause was determined at random to simulate the kind of intermittent speech
that might be expected during the testing. This intermittent speech was played during the entire
test session to enable a worst-case test of the effects of irrelevant speech on GMAT, GRE, and
TOEFL test taking. The volume, which was standardized for each test session, was set at a level
that project staff judged to approximate that of a typical test taker.
Design²
In total, 28 testing sessions were conducted. During approximately a third of the sessions
(randomly designated), examinees were tested under standard, distraction-free conditions. For the
remaining two thirds of the sessions (again, randomly designated), examinees were distracted by
the two simulated “examinees.” For half of these “distracting” sessions, test takers were issued
headsets as a means of blocking noise. For the other half, they were provided with headsets that
also enabled them to hear (and adjust the level of) masking noise that was delivered through the
headsets. Before the test began, study participants in the masking noise condition were given
time to try different sound levels in order to determine their preferred level during testing.
After each testing session (both the standard sessions and the “distracting” sessions),
participants completed a questionnaire, which included the items that comprised each of the three
scales — distraction, anxiety, and effort — discussed above. Each of the scales sought
perceptions regarding both the test taken for our study (“today’s test”) and the test taken
previously (“previous test”).
Participants were also asked to give their opinions of the suitability and effectiveness of
each of the alternative methods of blocking distractions. Finally, in addition to informal
observations of each testing session by test proctors, we formally observed and documented test
takers’ activities and reactions during one of the testing sessions. Upon completion of the
session, the observer also talked individually with the test takers to gather their impressions.
Analysis
The main analysis was directed toward determining whether distraction had an undue
influence on test performance. More specifically, our interest was the extent to which distraction
could explain variation in participants’ test performance, above and beyond what could be
explained by their performance on the same test taken under standard, distraction-free
conditions. To help gauge the size of any “distraction effect,” we also ran the same analysis
using as explanatory variables (instead of distraction) each of two other variables often believed
to contribute to unwanted, extraneous variation among test takers’ performances. The first was
test-taking effort/motivation; the second was test anxiety. The rationale for using these variables
was that, like distraction, both are potential sources of invalid variation among test scores. Also,
as for distraction, we assumed that each of these variables would have a differential influence on
previous, operational test performance and performance on the test taken for our study. For
example, we assumed that, because there were no consequences for performing poorly, both
anxiety and effort/motivation would, on average, be less for our research study test
administration than for the previously taken operational test. For some of our study participants,
distraction would presumably be greater for our study test than for the operational test, which
was supposedly taken under relatively distraction-free conditions. The inclusion of these
variables (i.e., effort/motivation and anxiety) thus served as a baseline, enabling a comparison of
the influence of distraction with the influence of two other common but undesirable potential
sources of test score variance.
Second, we also assessed the effect of distraction on examinee perceptions. This
determination was made by comparing the reactions of participants who were distracted in our
study with those who were not. We also compared participants’ reactions to our distracting
conditions with their reactions to conditions experienced during the previous operational testing.
The reactions of interest were participants’ agreement with such statements as the following:
“Because of noise or other interruptions, my test performance suffered.”
“Any distractions probably didn’t affect my performance much.”
A final analysis focused on the possible differential effectiveness of the two methods of
reducing or controlling distraction.
These analysis objectives were accomplished by using, in the hierarchical fashion
suggested by Cohen and Cohen (1977), ordinary least squares linear regression analysis. For
each test section, test scores earned under distracting conditions were first regressed on test
scores earned previously under standard, nondistracting conditions. Next, the degree of
distraction reported by participants was added to the regression equation to determine the degree
to which it explained variation in test performance above and beyond that explained by
performance on the previous, operational test taken under standard, distraction-free conditions.
The same analysis was repeated using, in turn, effort/motivation and anxiety (instead of
distraction) as the explanatory variables. In addition, because anxiety was expected to exert a
greater influence for the previous, operational test taken “for real” than for our research test, we
reversed the analysis, this time regressing previous test performance on (1) test performance
during our study and (2) reported anxiety on the previous test. This enabled an assessment of the
extraneous influence of test anxiety in the operational test.
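The two-step (hierarchical) procedure described above can be illustrated with a minimal sketch. It is not the analysis code used for the study: the data are invented for eight hypothetical examinees, the variable names are ours, and the small least-squares solver is included only to keep the example self-contained. Step 1 regresses the research-test score on the previous operational score; step 2 adds the self-reported distraction score, and the quantity of interest is the increase in R².

```python
# Hierarchical (sequential) OLS sketch: how much variance does a second
# predictor explain beyond a first? All data below are invented.


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, predictors):
    """R^2 from an OLS fit of y on an intercept plus the given predictor columns."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Invented scores for eight hypothetical examinees.
previous_score = [520, 610, 480, 700, 560, 650, 590, 530]
distraction = [80, 45, 75, 50, 90, 40, 60, 85]
research_score = [500, 600, 470, 690, 530, 655, 580, 505]

r2_step1 = r_squared(research_score, [previous_score])
r2_step2 = r_squared(research_score, [previous_score, distraction])
r2_increase = r2_step2 - r2_step1
print(f"Step 1 R^2 = {r2_step1:.3f}; step 2 R^2 = {r2_step2:.3f}; increase = {r2_increase:.3f}")
```

The same pattern applies when effort/motivation or anxiety replaces distraction in step 2, or when the roles of the two test occasions are reversed.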
An analysis similar to those described above was undertaken to determine the effect of
testing condition on participants’ perceptions of distraction. In this analysis, the perception of
distraction during our study was regressed on perceptions of distraction during the previous test.
Next, testing condition was added to assess its power to explain perceptions of distraction above
and beyond what was explainable from participants’ perceptions of distraction on the previous,
operational test taken under, presumably, distraction-free conditions.
Finally, the degree to which distraction was an influence — both on test performance and
on examinee perceptions — was compared for each method of blocking distractions, that is,
headsets only versus headsets and masking noise.
Results
Reliability of Study Measures
Table 1 provides estimates of the reliability (coefficient alpha) for each of the scales
developed to measure three sources of construct-irrelevant (unwanted) test variance: differential
effort/motivation, test anxiety, and distraction. The estimates have been computed both for
participants’ responses concerning the previously taken operational test, and for the test taken for
our research study. As is clear, the 21-item distraction scale was quite reliable. The 8-item
anxiety scale was somewhat less reliable, with estimates around .80. The estimated reliability of
the 6-item effort/motivation scale was reasonably good with regard to reports of
effort/motivation for the research study test, but somewhat less for the previous, operational test.
Table 1 Reliability Estimates for Self-Report Scales
                              Test Occasion
Measure          Previous Operational Test    Research Test
Distraction      .92                          .96
Test anxiety     .81                          .76
Effort           .65                          .81
Note: Table entries are Cronbach’s coefficient alpha.
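For readers unfamiliar with coefficient alpha, the following sketch shows how it is computed from item-level responses. The ratings are invented (five hypothetical respondents, three items) and do not come from the study questionnaires; the function name is ours.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha; item_scores holds one list of item ratings per respondent."""
    k = len(item_scores[0])              # number of items in the scale
    items = list(zip(*item_scores))      # transpose: one tuple of ratings per item
    sum_item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Invented ratings on a 1-5 agreement scale (not study data).
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Alpha rises as items covary more strongly relative to their individual variances, and, other things being equal, as the number of items grows.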
Study Sample
Table 2 shows the distribution of GMAT, GRE, and TOEFL test takers by study testing
condition. Because we were unable to assign equal numbers of GMAT, GRE, and TOEFL test
takers to each testing session, the samples were not equally distributed across testing conditions.
However, a chi-square test of the frequencies did not detect a statistically significant imbalance,
χ²(4) = 5.7, n.s.
Table 2 Distribution of Study Sample by Test Sample and Testing Condition
                              Testing Condition
Sample    Distraction       Distraction With Headsets    No             Total
          With Headsets     and Masking Noise            Distraction
GMAT       8                13                           12              33
GRE       25                28                           17              70
TOEFL     14                27                           27              68
Total     47                68                           56             171
χ²(4) = 5.70, n.s.
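The chi-square statistic reported with Table 2 can be recomputed from the cell counts. The sketch below assumes a standard Pearson chi-square test of independence on the 3 × 3 table of counts; the function name is ours, and the critical value 9.488 is the tabled .05 value for 4 degrees of freedom.

```python
def chi_square_independence(rows):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, obs in enumerate(r):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


# Cell counts from Table 2. Columns: headsets only, headsets plus masking
# noise, no distraction. Rows: GMAT, GRE, TOEFL.
table2 = [
    [8, 13, 12],
    [25, 28, 17],
    [14, 27, 27],
]
CRITICAL_05_DF4 = 9.488  # tabled chi-square critical value, alpha = .05, df = 4

stat, df = chi_square_independence(table2)
significant = stat > CRITICAL_05_DF4
print(f"chi-square({df}) = {stat:.2f}; significant at .05: {significant}")
```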
Effect of the Experimental Manipulation
Table 3 displays, by testing condition and test-taker sample, study participants’ scores on
the distraction scale (higher numbers indicating greater perceptions of distraction). Several
aspects of this table are noteworthy. First of all, the degree of distraction reported by study
participants who were assigned to the no-distraction condition was, on average, comparable to
that reported for the previously taken operational test. There was no statistically significant
difference between these two conditions — that is, the nondistracting condition in our study
versus conditions during the operational test — within any of the three test-taker samples. The
implication is that our standard, nondistracting condition was a good approximation of
conditions during an operational test administration.
Table 3 Degree of Distraction by Test-Taker Sample and Testing Condition
                                                          Sample
Test/Condition                                   GMAT    GRE     TOEFL   Total
Research Test
  Distraction With Headsets
    M                                            78.8    80.7    70.1    77.2
    SD                                           10.9    13.7    14.7    14.4
    n                                               8      25      14      47
  Distraction With Headsets and Masking Noise
    M                                            79.5    70.1    69.3    71.6
    SD                                           19.9    23.3    17.6    20.9
    n                                              13      28      27      68
  No Distraction
    M                                            42.5    47.7    44.9    45.2
    SD                                           12.0    18.2    14.3    15.3
    n                                              12      17      27      56
Previous Operational Test
    M                                            50.9    47.4    48.7    48.6
    SD                                           17.0    14.3    14.9    15.1
    n                                              32      69      66     167
Note: Entries are scores on the distraction scale.
Secondly, the table suggests few noteworthy differences among the three test-taking
samples with respect to the degree of distraction that they reported.
Thirdly, the reactions of participants in the two distracting conditions differed
dramatically from those in the nondistracting condition. These differences vary somewhat across
conditions and samples, but all can be considered by most standards (e.g., Cohen & Cohen,
1977) to be large, ranging from at least one to nearly three full standard deviation units. The
implication is that our treatment conditions were very successful in generating a noticeable
degree of distraction.
Finally, comparing the two distracting conditions reveals that, over all samples combined,
using masking noise in conjunction with headsets (instead of headsets alone) reduced the amount
of distraction slightly — by about a third of a standard deviation unit on average — an effect that
can be considered “small to medium” (Cohen & Cohen, 1977). This reduction can be considered
relatively small also in relation to the large difference remaining between distracted and
nondistracted study participants. Moreover, the effect does not appear to be consistent across
test-taking samples, with virtually all of it occurring within the GRE sample.
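The effect sizes discussed above can be approximated from the Total column of Table 3. The sketch below assumes a standardized mean difference computed with a pooled standard deviation; the report does not state exactly how its effect sizes were standardized, so the results are only a rough check. The Total SD for the headsets-only condition is read here as 14.4.

```python
from math import sqrt


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using a pooled standard deviation (an assumption)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)


# Summary statistics from the Total column of Table 3 (M, SD, n).
headsets = (77.2, 14.4, 47)
headsets_masking = (71.6, 20.9, 68)
no_distraction = (45.2, 15.3, 56)

# Headsets alone versus headsets plus masking noise.
d_masking = cohens_d(*headsets, *headsets_masking)
# Headsets alone versus the nondistracting condition.
d_distraction = cohens_d(*headsets, *no_distraction)
print(f"masking-noise effect: d = {d_masking:.2f}")
print(f"distraction effect:   d = {d_distraction:.2f}")
```

Under these assumptions the masking-noise difference comes out at roughly a third of a standard deviation, and the difference between the headsets-only and nondistracting conditions at about two standard deviations, in line with the magnitudes described in the text.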
Appendix B provides participants’ responses to each of the individual statements
concerning distraction. As is clear, there were noticeably higher reports of distraction on
virtually every item for the two distracted groups when compared to reports for both the
nondistracted study group and the operational test. Chi-square tests revealed statistically
significant differences among groups for each item. Perhaps most telling were the reactions to
the following statement:
“Due to noise or other interruptions, my test performance suffered.”
Only 10% of all study participants agreed or strongly agreed that this statement applied to
their operational testing experience, and only 9% of those assigned to the nondistracting study
condition agreed or strongly agreed that the statement applied to their experience in our study.
However, this statement was endorsed far more frequently by those who were distracted during
our study — by a slight majority (55%) of study participants who used headsets and by a near
majority (45%) of those who also heard masking noise.
Table 4 contains the results of regressing participants’ perceptions of distraction
(regarding the research study test administration) on their perceptions of distraction regarding the
previous, operational test administration. Of interest here is the effect of testing condition (no
distraction, distraction with headset, and distraction with headset and masking noise) over and
above the degree to which distraction can be explained as a consistent individual difference
among test takers (i.e., as indexed by their reports of distraction from the operational test taken
under relatively distraction-free conditions). The results of this analysis show that testing
condition was clearly a significant explanatory variable, above and beyond that reported for the
earlier operational test. This indicates that the treatment conditions did have a statistically
significant effect on participants’ perceptions of distraction for each of the three test-taker
samples. The regression weights indicate that, as suggested by the descriptive statistics presented
earlier, the main differences are between the two distracting conditions and the nondistracting
condition. Only in the GRE sample was there a significant difference between the two distracting
conditions, with the use of masking noise (instead of headsets alone) reducing the degree of
distraction reported.
Table 4 Effect of Testing Condition on Degree of Distraction
                                               Cumulative   Increase   F for increase
Explanatory Variable(s)                        R²           in R²      in R²            df
GMAT Sample
1. Distraction Score (Nonoperational Test)     .10          .10         3.5             1, 31
2. Distraction Score, Testing Condition        .61          .50        18.5***          3, 29
GRE Sample
1. Distraction Score (Nonoperational Test)     .18          .18        14.9***          1, 68
2. Distraction Score, Testing Condition        .44          .26        14.9***          3, 66
TOEFL Sample
1. Distraction Score (Nonoperational Test)     .15          .15        11.3**           1, 66
2. Distraction Score, Testing Condition        .51          .36        23.9***          3, 64
** p<.01, *** p<.001

The conclusion from this analysis is that our experimental simulation of distraction was
successful. The bad news, however, is that participants did react negatively to the resulting
distraction, feeling not only that they were distracted but also that their test performance was
negatively affected.
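The F ratios for the increases in R² in Table 4 can be approximately reproduced from the rounded R² values. The sketch below uses the standard F test for an R² increment and rests on two assumptions of ours: that testing condition entered the equation as two dummy-coded variables (2 numerator degrees of freedom), and that the second number in the df column is the residual degrees of freedom. Because the tabled R² values are rounded to two decimals, recomputed F values can differ somewhat from the tabled ones.

```python
def f_for_r2_increase(r2_full, r2_increase, df_added, df_residual):
    """F ratio for the gain in R^2 when df_added predictors join the equation."""
    return (r2_increase / df_added) / ((1.0 - r2_full) / df_residual)


# GMAT sample, step 2: testing condition added to the previous distraction score.
# Two dummy variables for three conditions is an assumption, not stated in Table 4.
f_gmat_step2 = f_for_r2_increase(r2_full=0.61, r2_increase=0.50, df_added=2, df_residual=29)

# GRE sample, step 1: a single predictor, so the increase equals the full R^2.
f_gre_step1 = f_for_r2_increase(r2_full=0.18, r2_increase=0.18, df_added=1, df_residual=68)

print(f"GMAT step 2: F = {f_gmat_step2:.1f}")
print(f"GRE step 1:  F = {f_gre_step1:.1f}")
```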
Effort/Motivation and Test Anxiety
As stated earlier, part of our strategy was to compare the relative influence of distraction
with two other potential sources of unwanted test score variation — test anxiety and test-taking
effort/motivation. Table 5 shows mainly that, as expected, test anxiety ran higher, according to
examinee reports, during the previous, operational test than during the experimental testing that
we conducted. Over all samples, the difference (effect size) was nearly a full standard deviation
unit, a difference that can be considered “large.” Thus, on average, study participants clearly felt
that they experienced more anxiety when they took operational tests than when they took our
research tests. Responses to each of the test-anxiety statements are given in Appendix C. For
example, whereas 67% of the total sample agreed or strongly agreed that they “worried a lot”
before taking a previous, operational test, far fewer (9-11%) agreed or strongly agreed that they
did so before taking the tests given during our research study.
Table 5 Means and SDs of Anxiety Scale Scores by Sample and Testing Condition
                                                          Sample
Testing Condition                                GMAT    GRE     TOEFL   Total
Distraction With Headsets
    M                                            23.1    19.4    25.1    20.3
    SD                                            3.2     5.2     6.9     4.9
    n                                               8      25      14      47
Distraction With Headsets and Masking Noise
    M                                            20.7    21.4    21.1    21.1
    SD                                            6.4     6.1     6.2     6.2
    n                                              13      28      27      68
No Distraction
    M                                            18.3    23.9    18.7    18.6
    SD                                            5.3     6.3     5.8     5.5
    n                                              12      17      27      56
Previous Operational Test
    M                                            26.8    25.4    24.2    25.2
    SD                                            7.4     6.3     7.1     6.9
    n                                              32      69      66     167
With respect to effort/motivation, participants said that they expended more effort (or
were more highly motivated) when they took an operational test than when they tested for our
study (Table 6). In terms of effect sizes, the difference was “large” within each of the three test-
taking samples. Appendix D shows responses to each question comprising the effort/motivation
scale. For example, whereas 90% of all study participants agreed or strongly agreed that they
were “concerned about doing well” on their previous, operational test, fewer (44-59%) of the
participants in each study condition agreed or strongly agreed with this statement. It should be
noted, however, that despite the generally lower effort/motivation for our research testing,
participants’ responses suggest that, in general, they made a reasonably good effort for our study.
For instance, a significant majority (70-87%) agreed or strongly agreed that they “tried to do
[my] best” for our study (compared with 88% for the operational test).
Table 6 Means and SDs of Effort Scores by Sample and Test Occasion
                                          Sample
Test Occasion                    GMAT    GRE     TOEFL   Total
Research Study
    M                            19.8    21.1    22.4    21.4
    SD                            6.1     4.7     5.2     5.3
    n                              33      70      68     171
Previous Operational Test
    M                            25.8    26.7    25.8    26.2
    SD                            4.1     3.2     3.5     3.5
    n                              32      69      66     167
Effects on Test Performance
Table 7 shows the mean test scores for each test-taker sample according to testing
condition for the tests taken for our study. Also shown are the scores that each group received on
the operational versions of the tests that participants took prior to our study. As is clear,
operational scores for each sample were somewhat higher on average than were the scores
obtained on the tests administered for our study. Scores obtained under distracting conditions
also were generally (but not always) lower than were those obtained under nondistracting
conditions in our study.
Table 7 Test Performance by Testing Condition
                                   Research Test
                     Distraction   Distraction With   No            Operational   Score Scale
Test Score           With          Headsets and       Distraction   Test          Range
                     Headsets      Masking Noise
GMAT Sample
GMAT V          M      21            21                 30            28          0-60
                SD      5             7                  6             8
GMAT Q          M      32            34                 37            37          0-60
                SD     14            11                 14            11
GMAT W          M       3.3           3.3                3.7           3.9        0-6
                SD      0.8           0.7                0.5           0.7
GMAT Total      M     457           473                574           548          200-800
                SD    103           114                114           114
GRE Sample
GRE V           M     468           497                460           494          200-800
                SD     88           134                124           127
GRE Q           M     560           578                602           582          200-800
                SD    181           160                160           164
GRE A           M     590           544                528           549          200-800
                SD    167           163                173           163
GRE Total       M    1617          1620               1590          1625          600-2400
                SD    371           388                369           364
TOEFL Sample
TOEFL R         M      23            21                 22            23          0-30
                SD      2             4                  5             4
TOEFL L         M      23            21                 23            22          0-30
                SD      3             4                  3             5
TOEFL W         M      23            21                 23            23          0-30
                SD      4             5                  3             3
TOEFL Essay     M       4.1           4.1                4.1           4.3        0-6
                SD      0.4           0.7                0.6           0.8
TOEFL Total     M     222           208                227           227          0-300
                SD     31            32                 31            32
Note: N for GMAT sample was 26, N for GRE sample was 65, N for TOEFL sample was 51.
Regressing test scores obtained in our study on test scores obtained earlier under
standard, operational conditions revealed that the later scores were highly explainable from
earlier ones. Almost without exception, previous test performance explained what can be
considered to be a “large” proportion of variance (defined by Cohen and Cohen as 26%). When
we assessed the influence of distraction by adding this variable to the regression equation, we
found that it explained no more than 1% of test variance for any test taken by either the GRE or
the TOEFL sample. For the smaller GMAT sample, only for the total score did distraction
account for a statistically significant portion of test-score variance (8%). A proportion of this size
has been described by Cohen and Cohen as “small” to “medium.” Detailed results of these
regression analyses are shown in Appendix E. The general overall conclusion is that the variation
due to distraction, above and beyond that explainable by performance under standard,
nondistracting conditions, was neither practically nor (with one exception) statistically
significant.
In a similar vein, testing condition was substituted for distraction in the regression
equations to assess its explanatory power above and beyond that due to previous test
performance under standard conditions. The results of this analysis (Appendix F) were consistent
with the results using distraction score as the explanatory variable. Only for GMAT verbal score
was the effect of testing condition statistically significant (p < .05). As indicated by the
regression weights (not shown), this effect was approximately 7 points on the 0 to 60 score scale
(standard error of approximately 3 points) in favor of the nondistracting condition over each of
the distracting ones. The effect on GMAT total score barely failed to reach statistical
significance (p = .07).
Analyses that were parallel to those just described were run when test anxiety and test-
taking effort/motivation were substituted for distraction as an explanatory variable. The detailed
results of these regression analyses are shown in Appendixes G and H. Test anxiety was a
significant explanatory variable for several of the GMAT scores only. Like distraction,
differential effort/motivation had little explanatory power.
A summary of the proportion of test score variance that was explainable by each
extraneous source of variance — distraction, test anxiety, and test-taker effort/motivation — is
shown in Table 8. Also shown (in the column labeled “test constructs”) is the proportion of
variance explainable from performance on the previous, operational test. This table reveals that,
as suggested above, distraction had little if any role in explaining differences among study
participants’ test scores. In relation to the influence of the two other extraneous factors we
investigated, the role of distraction was minimal. In addition, the role of distraction was always
much less than the role of the skills and abilities measured by the tests (test constructs).
Table 8 Summary of Degree of Influence of Several Extraneous Sources of Variation in Test Performance
                                  Source of Variation
Test           Distraction   Anxiety    Effort   Test Constructs
GMAT V         .11           .36***     .03      .33 (.34)
GMAT Q         .02           .02        .00      .86 (.87)
GMAT W         .09           .21**      .10      .23 (.28)
GMAT Total     .08*          .17***     .01      .68 (.69)
GRE V          .01           .00        .01      .72 (.75)
GRE Q          .01           .00        .00      .80 (.80)
GRE A          .00           .00        .01      .71 (.60)
GRE Total      .00           .00        .01      .85 (.83)
TOEFL L        .01           .00        .00      .77 (.73)
TOEFL R        .00           .01        .02      .50 (.59)
TOEFL W        .00           .01        .03      .50 (.55)
TOEFL Essay    .00           .03        .05      .21 (.25)
TOEFL Total    .01           .00        .01      .76 (.80)
* p<.05, ** p<.01, *** p<.001
Note: Table entries are proportions of variance explained by each source, above and beyond that explained by test performance on another occasion. The right-most numbers in parentheses are the proportions of test-score variation in previous, operational test scores that were explained by performance on the research study test.
Examinee Reactions to Methods Used to Block Sound
Opinion was equally mixed regarding the helpfulness of the headsets in blocking out
distraction. Although a majority of the study participants (57%) regarded the headsets as either
very or somewhat effective in blocking distractions, approximately a third (34%) rated the
headsets as either somewhat or very ineffective. When asked whether their headsets were
comfortable, 43% (of 143 participants who responded) said that the headsets were either very
comfortable or somewhat comfortable. On the other hand, a slightly greater proportion (48%)
found them to be either somewhat uncomfortable or very uncomfortable.
Masking noise was rated as slightly less effective than the headsets. Fewer than half
(44%) of the study participants said that masking noise was either very effective or somewhat
effective. Nearly a third (31%) rated masking noise as either very or somewhat ineffective. A
slight majority found it to be either very or somewhat helpful to be able to adjust the volume of
masking noise.
Observations and Debriefings
The observations of study participants and the informal debriefing of them after one of
the testing sessions provided information that was generally consistent with the results of the
questionnaire survey. The debriefings also yielded further insights or suggested additional
considerations, such as the following:
Distraction may be more noticeable and possibly more detrimental to performance on an
operational test than on our research test.
Masking noise may itself be a source of distraction for some test takers.
Some test takers may be able to accommodate to distraction after a relatively brief period
of exposure.
Discussion
The Standards for Educational and Psychological Testing (AERA/APA/NCME, 1999)
assert that
[Test] validation involves careful attention to possible distortions in [test score] meaning
arising from … aspects of measurement such as test format, administration conditions
[emphasis added], or language level that may materially limit or qualify the interpretation
of test scores (p.10).
The study reported here was designed to assess the degree of test-score distortion that
might result from one specific administration condition. The condition was one in which test
takers from various testing programs are comingled in the same computer-based testing center
with examinees taking a test containing a speaking component. In the current study, the
distraction expected from fellow test takers under these conditions was simulated by playing
recordings of test takers who had completed a number of tasks being tried out for a new TOEFL
speaking test. Of primary interest was the degree to which distraction was found to be an undue
or disproportionate influence on test performance, especially relative to the influence of other
extraneous factors. This influence was assessed by comparing the proportion of variation in test
takers’ performance that could be attributed to (1) test takers’ abilities, as determined by their
performance on tests taken earlier under standard, distraction-free testing conditions, and (2) the
degree of distraction that examinees experienced during the testing conducted for purposes of
our study. The study also provides information regarding the degree to which distraction was
perceived to have affected test performance. Finally, the study results provide some, albeit not
very encouraging, information about how unwanted distraction from fellow test takers might be
reduced.
The results revealed that distraction had, on average, a very large impact on examinees’
perceptions. Study participants reported that they were annoyed, disrupted, distracted, and so
forth by the noise that we generated. More importantly, examinees felt that our distraction had
negative effects on their test performances. The data also suggested that anxiety levels,
although reported to have been generally lower during our study testing than during previous,
operational testing, were slightly, but not consistently, elevated when test takers were distracted
during our study.
The two methods of sound blocking that were studied — the use of headsets and the use
of headsets plus masking noise — received mixed reviews from examinees with respect to their
effectiveness. Furthermore, headsets were regarded as being uncomfortable by a significant
portion of the study sample. More importantly, neither method came close to reducing the
perceived level of distraction to the much lower level that was perceived by examinees when
tested under standard, relatively distraction-free conditions.
With respect to effects on actual test performance, distraction was estimated to be a
negligible influence for each of the tests taken by two of the test-taker samples (GRE and
TOEFL), accounting for, at most, 1% of test score variation above and beyond what could be
explained by examinee performance under relatively distraction-free conditions. The influence of
distraction on GMAT test takers was also relatively minor, accounting for, at most, a
“small to medium” (2 to 11%) proportion of variance for any of the GMAT sections. Moreover,
in each sample, the estimated influence of distraction was less than the influence noted for two
other potential sources of test score invalidity — test anxiety and differential examinee
motivation/effort.
Limitations
Like most research studies, this one had its limitations. First of all, although we were
interested in the effects of distraction for all ETS computer-based tests, it was not possible within
the time frame designated for the study to include test takers from every testing program. We
did, however, include three of the major (i.e., largest volume) ETS-administered testing
programs, whose tests contain a relatively wide variety of test question types that measure
diverse cognitive skills and abilities. One of the specific skills the study did not consider was
speaking proficiency.
Secondly, because the study sample sizes were relatively small, it was not possible to
assess the extent to which effects of distraction on test performance might depend on certain
individual differences among examinees — for example, their sensitivity to noise. Other
individual differences that might be equally important in this regard were not considered either.
Moreover, because one of the study samples (GMAT) was very small, the results based on this
sample should be interpreted very cautiously.
Thirdly, although the data suggested that we provided a rigorous test of examinees’
ability to cope with distraction, we wonder if we established conditions that were unduly severe.
Even though our simulated distraction seemed both realistic and reasonable to us, it may have
resulted in distraction that could be regarded as excessive. On the other hand, an extensive
observation of one of the testing sessions by an ETS observer led us to question whether we had
created enough distraction, as the observer questioned whether the level of noise, which had been
predetermined in the pilot testing, was loud enough.
Finally, we were able to implement our sound-blocking methods only partially in some
cases, as we could merely encourage, not enforce, the use of headsets during the distracting
conditions. Our observer noted several instances when test takers removed headsets, sometimes
repeatedly.
Conclusions
Our main conclusion is that intermingling test takers who are taking a speaking test with
examinees who are taking other tests will pose a challenge. More effective methods need to be
devised to reduce distraction to a level that is acceptable to test takers.
From our analyses of test performance, we conclude that, in general, the kind of
distraction that we studied will not pose a major threat to the actual validity of test scores. We
must temper this conclusion by our finding that, in one of the test-taker samples (GMAT), we did
detect a small but statistically significant impact on test performance, mainly on the verbal
portion of the test.
We also conclude that the instruments and design developed for the study reported here
proved effective for detecting the effects of distraction and for evaluating the effectiveness of
procedures for reducing it. These resources should prove useful for studying other alternatives
for reducing or controlling distraction.
References
American Educational Research Association (AERA), American Psychological Association
(APA), National Council on Measurement in Education (NCME). (1999). Standards for
educational and psychological testing. Washington, DC: American Educational Research
Association.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (rev. ed.). New York:
Academic Press.
Cohen, J., & Cohen, P. (1977). Applied multiple regression/correlation analysis for the
behavioral sciences. Hillsdale, NJ: Erlbaum.
Jones, D. M. (1984). Performance effects. In D. Jones & A. Chapman (Eds.), Noise and society
(pp. 155-184). New York: Wiley.
Jones, D. M., & Davies, D. R. (1984). Individual and group differences in the response to noise.
In D. Jones & A. Chapman (Eds.), Noise and society (pp. 125-153). New York: Wiley.
Jones, D. M., Miles, C., & Page, J. (1990). Disruption of reading by irrelevant speech: Effects of
attention, arousal or memory? Applied Cognitive Psychology, 4, 89-108.
Jones, D. M., & Morris, N. (1992). Irrelevant speech and cognition. Handbook of human
performance, Vol. 1 (pp. 29-53). New York: Academic Press.
Kjellberg, A. (1990). Subjective, behavioral and psychophysiological effects of noise.
Scandinavian Journal of Work & Environmental Health, 16, 29-38.
Kjellberg, A., & Skoldstrom, B. (1991). Noise annoyance during the performance of different
nonauditory tasks. Perceptual and Motor Skills, 73, 39-49.
Kjellberg, A., Landstrom, U., Tesarz, M., Soderberg, L., & Akerlund, E. (1996). The effects of
nonphysical noise characteristics, ongoing task and noise sensitivity on annoyance and
distraction due to noise at work. Journal of Environmental Psychology, 16, 123-136.
Martin, R. C., Wogalter, M. S., & Forlano, J. G. (1988). Reading comprehension in the presence
of unattended speech and music. Journal of Memory and Language, 27, 382-398.
Mital, A., McGlothlin, J. D., & Faard, H. F. (1992). Noise in multiple-workstation open-plan
computer rooms: Measurements and annoyance. Journal of Human Ergonomics, 21, 69-82.
Ng, C. F. (2000). Effects of building construction noise on residents: A quasi-experiment.
Journal of Environmental Psychology, 20, 375-385.
Sailer, U., & Hassenzahl, M. (2000). Assessing noise annoyance: An improvement-oriented
approach. Ergonomics, 43, 1920-1938.
Sarason, I. G. (1984). Stress, anxiety, and cognitive interference: Reactions to tests. Journal of
Personality and Social Psychology, 46, 929-938.
Smith, A. P. (1985). The effects of different types of noise on semantic processing and syntactic
reasoning. Acta Psychologica, 58, 263-273.
Spielberger, C. D. (1980). Test anxiety inventory. Preliminary Professional Manual. Palo Alto,
CA: Consulting Psychologists Press.
Weinstein, N. D. (1974). Effects of noise on intellectual performance. Journal of Applied
Psychology, 59, 548-554.
Weinstein, N. D. (1977). Noise and intellectual performance: A confirmation and extension.
Journal of Applied Psychology, 62, 104-107.
Weinstein, N. D. (1978). Individual differences in reactions to noise: A longitudinal study in a
college dormitory. Journal of Applied Psychology, 63, 458-466.
Notes
1 Sound-control solutions for speech delivery and capture were divided into three main
categories, based on cost, technology, and logistical implementation:
• low-end solutions (headphones, headsets, earplugs, earmuffs, etc.)
• midrange solutions (masking noise, whether white, pink, or brown)
• high-end solutions (sound booths, architectural modifications, etc.)
For practical cost, timeline, and logistical considerations, the team decided to focus the research
on the low-end and midrange solutions.
2 The study design was informed by a set of small pilot studies at an on-site ETS computer-based
testing laboratory. Four sessions, each with three test takers, were conducted in order to
1. compare several models of headsets, and select the best one for the larger scale study
2. try out the logistics for the main study
3. inform the design of a questionnaire
4. resolve any operational issues, on a small scale, before moving to the larger study
Based on a $200 maximum price limit, the team compared headsets and headphones made by
a variety of manufacturers: Sony and Boss for headphones, and Plantronics, Sennheiser, and
Tandberg for headsets. Criteria used for comparison were comfort, sound quality, volume
control, durability, and cost.
The use of headsets with attached microphones was preferred over headphones with separate
microphones for the following reasons:
• The use of a headset will ensure a constant distance between the test taker and the
microphone at all times.
• The possibility of breakdown is reduced by using one device.
• Dealing with one device is more convenient operationally.
Finally, it was decided that study participants would be encouraged to use the headsets,
but would not be required to do so. It was also decided that participants would be allowed to use
earplugs instead of headsets if they so desired.
Based on the feedback from pilot study participants, the project team selected the
Tandberg headset for the main study. The Tandberg headset was perceived to be the most
comfortable; it is very robust; it has no volume-control capability (a feature preferred for
ensuring speech capture); it has a nonmovable, easy-to-adjust unidirectional microphone
(another feature preferred for ensuring speech capture); and it is relatively inexpensive (less
than $80).
Several different types of masking noise were also considered for use in the main study,
and these were tried out in the pilot study. These choices included standard “white noise,” “pink
noise,” and a newer sound algorithm called “brown noise.” Analog and pure-digital versions of
these noise algorithms were tried out. Although each of the algorithms was determined to be
adequate for the purposes of this study, we found that an analog recording of pink noise was the
most pleasing to the majority of people questioned. This version of pink noise was created from
the analog outputs of a commercially available noise generator; it had fewer frequencies above
5000 Hz than a purely digital version of pink noise.
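As a rough illustration of how the three masking-noise types differ, the following Python sketch generates simple digital approximations of each. It is not the procedure used in the study (the pink noise used there came from the analog outputs of a commercial generator). The function names, the Voss-McCartney approximation for pink noise, the leaky running sum for brown noise, and all parameter values are assumptions chosen only for illustration.

```python
import random


def white_noise(n, rng):
    """Independent Gaussian samples: roughly equal power at all frequencies."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def pink_noise(n, rng, rows=8):
    """Voss-McCartney style approximation of pink (1/f) noise.

    Row k holds a random value that is refreshed every 2**k samples;
    the output is the sum of all rows. (Illustrative assumption, not
    the study's generator.)
    """
    sources = [rng.gauss(0.0, 1.0) for _ in range(rows)]
    out = []
    for i in range(n):
        for k in range(rows):
            if i % (2 ** k) == 0:
                sources[k] = rng.gauss(0.0, 1.0)
        out.append(sum(sources))
    return out


def brown_noise(n, rng, leak=0.02):
    """Leaky running sum of white noise: energy concentrated at low frequencies."""
    out = []
    level = 0.0
    for _ in range(n):
        level = (1.0 - leak) * level + rng.gauss(0.0, 1.0)
        out.append(level)
    return out


def lag1_autocorrelation(x):
    """Correlation between each sample and the next one."""
    mean = sum(x) / len(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x, x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den
```

In this sketch, the lag-1 autocorrelation rises from white to pink to brown noise, which reflects the progressively larger share of low-frequency energy that tends to make pink and brown noise sound less harsh than white noise.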
List of Appendixes
A. Testing Center Layout
B. Study Participants’ Reports of Distraction on Research Study Test and on Previous (Operational) Test
C. Study Participants’ Reports of Test Anxiety on Research Study Test and on Previous (Operational) Test
D. Study Participants’ Reports of Effort on Research Study Test and on Previous (Operational) Test
E. Regression Analyses for Influence of Distraction
F. Regression Analyses for Influence of Testing Condition
G. Regression Analyses for Influence of Test Anxiety
H. Regression Analyses for Influence of Effort
Appendix A
Testing Center Layout
Appendix B
Study Participants’ Reports of Distraction on Research Study Test and on Previous (Operational) Test

| Statements | Research Test: Distraction With Headset | Research Test: Distraction With Headset and Masking Noise | Research Test: No Distraction | Operational Test |
| --- | --- | --- | --- | --- |
| I found myself thinking about how noisy the testing room was. | 79 | 72 | 23 | 17 |
| The testing room was relatively free from interruptions. | 21 | 39 | 64 | 74 |
| Distractions in the testing room really annoyed me. | 65 | 59 | 16 | 17 |
| Any distractions probably didn’t affect my test performance much. | 24 | 25 | 41 | 48 |
| It was annoying when other people talked during the test. | 74 | 56 | 33 | 35 |
| During the test, my “train of thought” was often disrupted. | 79 | 63 | 32 | 24 |
| I found myself thinking more about the surroundings…. | 49 | 52 | 4 | 16 |
| The testing room was so loud that I couldn’t “hear” myself think. | 51 | 36 | 4 | 4 |
| Due to noise or other interruptions, my test performance suffered. | 55 | 45 | 9 | 10 |
| The level of noise in the testing center was very noticeable. | 96 | 79 | 18 | 14 |
| Distractions in the testing center really slowed me down. | 53 | 53 | 9 | 18 |
| Distractions interfered with my thinking process. | 77 | 67 | 25 | 28 |
| I had to exert a lot of effort to concentrate. | 70 | 70 | 29 | 34 |
| I found myself listening to what others were saying. | 64 | 55 | 14 | 13 |
| I had lots of “starts and stops” or interruptions. | 51 | 48 | 4 | 11 |
| Because of distractions, I probably made careless mistakes. | 70 | 63 | 29 | 31 |
| There was a lot of commotion in the testing room. | 49 | 47 | 9 | 7 |
| Noise at the testing center was not a problem for me. | 11 | 19 | 54 | 48 |
| The level of noise was uncomfortably loud. | 65 | 55 | 16 | 6 |
| I felt “overloaded” because of distractions in the testing room. | 39 | 44 | 5 | 6 |
| I had no problem “tuning out” any noise or distraction. | 19 | 31 | 41 | 40 |

Note: Table entries are percentages of study participants who agreed or strongly agreed with each statement.
Appendix C
Study Participants’ Reports of Test Anxiety on Research Study Test and on Previous (Operational) Test

| Statements | Research Test: Distraction With Headset | Research Test: Distraction With Headset and Masking Noise | Research Test: No Distraction | Operational Test |
| --- | --- | --- | --- | --- |
| I felt tense and unsure. | 38 | 49 | 14 | 54 |
| Thinking about how I was doing interfered with my work. | 43 | 43 | 32 | 55 |
| The harder I tried, the worse I did. | 13 | 24 | 7 | 22 |
| Thoughts of doing poorly interfered with my concentration. | 28 | 36 | 34 | 53 |
| I worried a lot before taking the test. | 11 | 9 | 11 | 67 |
| I found myself thinking about how poorly I was doing. | 36 | 37 | 23 | 43 |
| After the test was over I couldn’t stop worrying. | 9 | 13 | 12 | 38 |
| I got so nervous that I forgot things I really knew. | 6 | 25 | 14 | 40 |

Note: Table entries are percentages of study participants who agreed or strongly agreed with each statement.
Appendix D
Study Participants’ Reports of Effort on Research Study Test and on Previous (Operational) Test

| Statements | Research Test: Distraction With Headset | Research Test: Distraction With Headset and Masking Noise | Research Test: No Distraction | Operational Test |
| --- | --- | --- | --- | --- |
| I really tried to do my very best. | 87 | 75 | 70 | 88 |
| I really didn’t care how well I did. | 20 | 22 | 20 | 3 |
| I was motivated to do well. | 60 | 61 | 55 | 84 |
| I was very concerned about doing well on the test. | 45 | 44 | 59 | 90 |
| I didn’t try nearly as hard as I could have. | 23 | 31 | 38 | 16 |
| Getting a good score was not important to me. | 13 | 22 | 29 | 5 |

Note: Table entries are percentages of study participants who agreed or strongly agreed with each statement.
Appendix E
Regression Analyses for Influence of Distraction

Table E.1 Influence of Distraction on GMAT Performance

| Dependent variable | Explanatory variables | Cumulative R² | Increase in R² | F for increase in R² | df |
| --- | --- | --- | --- | --- | --- |
| GMAT Verbal | 1. GMAT V, GMAT Q, GMAT W | .33 | .33 | 3.6* | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Distraction Score | .44 | .11 | 4.2 | 4, 21 |
| GMAT Quantitative | 1. GMAT V, GMAT Q, GMAT W | .86 | .86 | 45.4*** | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Distraction Score | .88 | .02 | 2.6 | 4, 21 |
| GMAT Writing | 1. GMAT V, GMAT Q, GMAT W | .23 | .23 | 2.2 | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Distraction Score | .33 | .09 | 3.2 | 4, 21 |
| GMAT Total | 1. GMAT V, GMAT Q, GMAT W | .68 | .68 | 15.5*** | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Distraction Score | .76 | .08 | 7.3* | 4, 21 |

* p<.05, *** p<.001
Note: Dependent variables are test scores from the nonoperational test that participants took for this study.
Explanatory variables are test scores from a previous, operational administration of the test.
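
The "F for increase in R²" column can be approximated from the tabled R² values with the standard F test for an increment in R² in hierarchical regression. The Python sketch below is an illustration under stated assumptions, not the authors' computation: the function name is hypothetical, and the sample size of 26 is inferred from the step 1 degrees of freedom (3, 22), on the assumption that the df column lists the model and residual degrees of freedom at each step.

```python
def f_for_r2_increase(r2_reduced, r2_full, k_reduced, k_full, n):
    """F statistic for the gain in R-squared when predictors are added.

    r2_reduced, r2_full: R-squared before and after adding predictors
    k_reduced, k_full:   number of predictors in each model
    n:                   number of cases (assumed; inferred from the df column)
    Returns (F, numerator df, denominator df).
    """
    df_num = k_full - k_reduced
    df_den = n - k_full - 1
    if df_num <= 0 or df_den <= 0:
        raise ValueError("need k_full > k_reduced and n > k_full + 1")
    f_stat = ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)
    return f_stat, df_num, df_den


# GMAT Verbal, step 2 of Table E.1: R-squared rises from .33 to .44
# when the distraction score joins the three prior test scores.
f_stat, df_num, df_den = f_for_r2_increase(0.33, 0.44, 3, 4, 26)
```

With the rounded R² values from the table, this yields an F of about 4.1 on 1 and 21 degrees of freedom, close to the tabled 4.2; the small gap is what one would expect from rounding R² to two decimals.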
Table E.2 Influence of Distraction on GRE General Test Scores

| Dependent variable | Explanatory variables | Cumulative R² | Increase in R² | F for increase in R² | df |
| --- | --- | --- | --- | --- | --- |
| GRE Verbal | 1. GRE V, GRE Q, GRE A | .72 | .72 | 52.7*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Distraction Score | .73 | .01 | 3.0 | 4, 60 |
| GRE Quantitative | 1. GRE V, GRE Q, GRE A | .80 | .80 | 80.2*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Distraction Score | .81 | .01 | 2.9 | 4, 60 |
| GRE Analytical | 1. GRE V, GRE Q, GRE A | .71 | .71 | 49.1*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Distraction Score | .71 | .00 | 0.5 | 4, 60 |
| GRE Total | 1. GRE V, GRE Q, GRE A | .85 | .85 | 113.8*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Distraction Score | .85 | .00 | 1.3 | 4, 60 |

*** p<.001
Note: Dependent variables are test scores from the nonoperational test that participants took for this study.
Explanatory variables are test scores from a previous, operational administration of the test.
Table E.3 Influence of Distraction on TOEFL Scores

| Dependent variable | Explanatory variables | Cumulative R² | Increase in R² | F for increase in R² | df |
| --- | --- | --- | --- | --- | --- |
| TOEFL Listening | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .77 | .77 | 37.8*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Distraction Score | .78 | .01 | 2.3 | 5, 45 |
| TOEFL Reading | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .50 | .50 | 11.3*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Distraction Score | .50 | .00 | 0.4 | 5, 45 |
| TOEFL Writing | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .50 | .50 | 11.6*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Distraction Score | .50 | .00 | 0.1 | 5, 45 |
| TOEFL Essay | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .21 | .21 | 3.0* | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Distraction Score | .21 | .00 | 0.0 | 5, 45 |
| TOEFL Total | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .76 | .76 | 36.2*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Distraction Score | .76 | .01 | 1.1 | 5, 45 |

* p<.05, *** p<.001
Note: Dependent variables are test scores from the nonoperational test that participants took for this study.
Explanatory variables are test scores from a previous, operational administration of the test.
Appendix F
Regression Analyses for Influence of Testing Condition

Table F.1 Influence of Testing Condition on GMAT Performance

| Dependent variable | Explanatory variables | Cumulative R² | Increase in R² | F for increase in R² | df |
| --- | --- | --- | --- | --- | --- |
| GMAT Verbal | 1. GMAT V, GMAT Q, GMAT W | .33 | .33 | 3.6* | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Testing Condition | .50 | .17 | 3.5* | 5, 20 |
| GMAT Quantitative | 1. GMAT V, GMAT Q, GMAT W | .86 | .86 | 45.4*** | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Testing Condition | .87 | .01 | 0.5 | 5, 20 |
| GMAT Writing | 1. GMAT V, GMAT Q, GMAT W | .23 | .23 | 2.2 | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Testing Condition | .29 | .06 | 0.8 | 5, 20 |
| GMAT Total | 1. GMAT V, GMAT Q, GMAT W | .68 | .68 | 15.5*** | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Testing Condition | .75 | .08 | 3.1 | 5, 20 |

* p<.05, *** p<.001
Note: Dependent variables are test scores from the nonoperational test that participants took for this study.
Explanatory variables are test scores from a previous, operational administration of the test.
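
Adding Testing Condition at step 2 raises the model degrees of freedom from 3 to 5, which is what one would expect if the three testing conditions were entered as two indicator (dummy) variables. The Python sketch below illustrates that bookkeeping. The condition labels, the choice of the no-distraction condition as the reference level, the function names, and the sample sizes (inferred from the step 1 df) are all assumptions made for illustration, not details given in the report.

```python
# Hypothetical labels for the three testing conditions in the study.
CONDITIONS = ("no_distraction", "headset", "headset_plus_masking")


def dummy_code(condition, reference="no_distraction"):
    """Return 0/1 indicators for every condition except the reference level."""
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition}")
    return [1 if condition == level else 0
            for level in CONDITIONS if level != reference]


def model_df(n, n_prior_scores, n_levels=len(CONDITIONS)):
    """Model and residual df when a categorical factor joins the prior scores.

    A factor with n_levels categories contributes n_levels - 1 predictors.
    """
    k = n_prior_scores + (n_levels - 1)
    return k, n - k - 1


# GMAT sample: n = 26 is inferred from the step 1 df of 3, 22 (assumption).
gmat_df = model_df(26, 3)
```

Under these assumptions, the sketch reproduces the step 2 degrees of freedom shown for the testing-condition models: 5 and 20 for the GMAT sample, and likewise 5 and 59 for GRE (n = 65) and 6 and 44 for TOEFL (n = 51, four prior scores).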
Table F.2 Influence of Testing Condition on GRE General Test Scores

| Dependent variable | Explanatory variables | Cumulative R² | Increase in R² | F for increase in R² | df |
| --- | --- | --- | --- | --- | --- |
| GRE Verbal | 1. GRE V, GRE Q, GRE A | .72 | .72 | 52.7*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Testing Condition | .74 | .02 | 2.1 | 5, 59 |
| GRE Quantitative | 1. GRE V, GRE Q, GRE A | .80 | .80 | 80.2*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Testing Condition | .81 | .01 | 1.6 | 5, 59 |
| GRE Analytical | 1. GRE V, GRE Q, GRE A | .71 | .71 | 49.1*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Testing Condition | .72 | .01 | 1.6 | 5, 59 |
| GRE Total | 1. GRE V, GRE Q, GRE A | .85 | .85 | 113.8*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Testing Condition | .85 | .00 | 0.5 | 5, 59 |

*** p<.001
Note: Dependent variables are test scores from the nonoperational test that participants took for this study.
Explanatory variables are test scores from a previous, operational administration of the test.
Table F.3 Influence of Testing Condition on TOEFL Scores

| Dependent variable | Explanatory variables | Cumulative R² | Increase in R² | F for increase in R² | df |
| --- | --- | --- | --- | --- | --- |
| TOEFL Listening | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .77 | .77 | 37.8*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Testing Condition | .77 | .00 | 0.3 | 6, 44 |
| TOEFL Reading | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .50 | .50 | 11.3*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Testing Condition | .50 | .01 | 0.3 | 6, 44 |
| TOEFL Writing | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .50 | .50 | 11.6*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Testing Condition | .52 | .02 | 0.7 | 6, 44 |
| TOEFL Essay | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .21 | .21 | 3.0* | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Testing Condition | .21 | .00 | 0.1 | 6, 44 |
| TOEFL Total | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .76 | .76 | 36.2*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Testing Condition | .76 | .00 | 0.2 | 6, 44 |

* p<.05, *** p<.001
Note: Dependent variables are test scores from the nonoperational test that participants took for this study.
Explanatory variables are test scores from a previous, operational administration of the test.
Appendix G
Regression Analyses for Influence of Test Anxiety

Table G.1 Influence of Test Anxiety on GMAT Performance

| Dependent variable | Explanatory variables | Cumulative R² | Increase in R² | F for increase in R² | df |
| --- | --- | --- | --- | --- | --- |
| GMAT Verbal | 1. GMAT V, GMAT Q, GMAT W | .34 | .34 | 3.8* | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Test Anxiety Score | .71 | .36 | 26.0*** | 4, 21 |
| GMAT Quantitative | 1. GMAT V, GMAT Q, GMAT W | .87 | .87 | 48.3*** | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Test Anxiety Score | .89 | .02 | 4.6* | 4, 21 |
| GMAT Writing | 1. GMAT V, GMAT Q, GMAT W | .28 | .28 | 2.9 | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Test Anxiety Score | .50 | .21 | 8.9** | 4, 21 |
| GMAT Total | 1. GMAT V, GMAT Q, GMAT W | .69 | .69 | 16.3*** | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Test Anxiety Score | .86 | .17 | 25.6*** | 4, 21 |

* p<.05, ** p<.01, *** p<.001
Table G.2 Influence of Test Anxiety on GRE General Test Performance

| Dependent variable | Explanatory variables | Cumulative R² | Increase in R² | F for increase in R² | df |
| --- | --- | --- | --- | --- | --- |
| GRE Verbal | 1. GRE V, GRE Q, GRE A | .75 | .75 | 60.2*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Test Anxiety Score | .75 | .00 | 0.4 | 4, 60 |
| GRE Quantitative | 1. GRE V, GRE Q, GRE A | .80 | .80 | 83.9*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Test Anxiety Score | .80 | .00 | 0.1 | 4, 60 |
| GRE Analytical | 1. GRE V, GRE Q, GRE A | .60 | .60 | 30.4*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Test Anxiety Score | .60 | .00 | 0.3 | 4, 60 |
| GRE Total | 1. GRE V, GRE Q, GRE A | .83 | .83 | 100.6*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Test Anxiety Score | .83 | .00 | 0.0 | 4, 60 |

*** p<.001
Table G.3 Influence of Test Anxiety on TOEFL Performance

| Dependent variable | Explanatory variables | Cumulative R² | Increase in R² | F for increase in R² | df |
| --- | --- | --- | --- | --- | --- |
| TOEFL Listening | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .73 | .73 | 31.1*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Test Anxiety Score | .73 | .00 | 0.5 | 5, 45 |
| TOEFL Reading | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .59 | .59 | 16.3*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Test Anxiety Score | .59 | .01 | 0.9 | 5, 45 |
| TOEFL Writing | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .55 | .55 | 14.3*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Test Anxiety Score | .56 | .01 | 0.5 | 5, 45 |
| TOEFL Essay | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .25 | .25 | 3.8** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Test Anxiety Score | .28 | .03 | 1.8 | 5, 45 |
| TOEFL Total | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .80 | .80 | 46.6*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Test Anxiety Score | .80 | .00 | 0.3 | 5, 45 |

** p<.01, *** p<.001
Appendix H
Regression Analyses for Influence of Effort

Table H.1 Influence of Effort on GMAT Performance

| Dependent variable | Explanatory variables | Cumulative R² | Increase in R² | F for increase in R² | df |
| --- | --- | --- | --- | --- | --- |
| GMAT Verbal | 1. GMAT V, GMAT Q, GMAT W | .33 | .33 | 3.6* | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Effort Score | .36 | .03 | 1.1 | 4, 21 |
| GMAT Quantitative | 1. GMAT V, GMAT Q, GMAT W | .86 | .86 | 45.4*** | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Effort Score | .86 | .00 | 0.2 | 4, 21 |
| GMAT Writing | 1. GMAT V, GMAT Q, GMAT W | .23 | .23 | 2.2 | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Effort Score | .33 | .10 | 3.0 | 4, 21 |
| GMAT Total | 1. GMAT V, GMAT Q, GMAT W | .68 | .68 | 15.5*** | 3, 22 |
| | 2. GMAT V, GMAT Q, GMAT W, Effort Score | .69 | .01 | 0.8 | 4, 21 |

* p<.05, *** p<.001
Table H.2 Influence of Effort on GRE General Test Performance

| Dependent variable | Explanatory variables | Cumulative R² | Increase in R² | F for increase in R² | df |
| --- | --- | --- | --- | --- | --- |
| GRE Verbal | 1. GRE V, GRE Q, GRE A | .72 | .72 | 52.7*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Effort Score | .73 | .01 | 2.1 | 4, 60 |
| GRE Quantitative | 1. GRE V, GRE Q, GRE A | .80 | .80 | 80.2*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Effort Score | .80 | .00 | 0.3 | 4, 60 |
| GRE Analytical | 1. GRE V, GRE Q, GRE A | .71 | .71 | 49.1*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Effort Score | .72 | .01 | 2.4 | 4, 60 |
| GRE Total | 1. GRE V, GRE Q, GRE A | .85 | .85 | 113.8*** | 3, 61 |
| | 2. GRE V, GRE Q, GRE A, Effort Score | .86 | .01 | 3.6 | 4, 60 |

*** p<.001
Table H.3 Influence of Effort on TOEFL Performance

| Dependent variable | Explanatory variables | Cumulative R² | Increase in R² | F for increase in R² | df |
| --- | --- | --- | --- | --- | --- |
| TOEFL Listening | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .77 | .77 | 37.8*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Effort Score | .77 | .00 | 0.5 | 5, 45 |
| TOEFL Reading | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .50 | .50 | 11.3*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Effort Score | .52 | .02 | 2.2 | 5, 45 |
| TOEFL Writing | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .50 | .50 | 11.6*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Effort Score | .54 | .03 | 3.3 | 5, 45 |
| TOEFL Essay | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .21 | .21 | 3.0* | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Effort Score | .25 | .05 | 2.7 | 5, 45 |
| TOEFL Total | 1. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay | .76 | .76 | 36.2*** | 4, 46 |
| | 2. TOEFL L, TOEFL R, TOEFL W, TOEFL Essay, Effort Score | .77 | .01 | 1.9 | 5, 45 |

* p<.05, *** p<.001
57906-005535 • Y42M.7 • Printed in U.S.A.
I.N. 993553
Test of English as a Foreign Language
P.O. Box 6155
Princeton, NJ 08541-6155
USA
To obtain more information about TOEFL
programs and services, use one of the following:
Phone: 609-771-7100
Email: [email protected]
Web site: http://www.toefl.org