

Copyright © 2006 NCS Pearson, Inc. All rights reserved.

Advanced Numerical Reasoning Appraisal™ (ANRA)

Manual

John Rust

888-298-6227 • TalentLens.com


Copyright © 2006 by NCS Pearson, Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the copyright owner. The Pearson and TalentLens logos, and Advanced Numerical Reasoning Appraisal are trademarks, in the U.S. and/or other countries, of Pearson Education, Inc. or its affiliate(s). Portions of this work were previously published. Printed in the United States of America.


Table of Contents

Acknowledgements

Chapter 1 Introduction
  Numerical Reasoning and Critical Thinking

Chapter 2 History and Development of ANRA
  Description of the Test
  Adapting RANRA
  Development of RANRA

Chapter 3 Directions for Administration
  General Information
  Preparing for Administration
  Testing Conditions
  Answering Questions
  Administering the Test
  Scoring and Reporting
  Test Security
  Concluding Test Administration
  Administering ANRA and Watson-Glaser Critical Thinking Appraisal® in a Single Testing Session
  Accommodating Examinees with Disabilities

Chapter 4 ANRA Norms Development
  Using ANRA as a Norm- or Criterion-Referenced Test
  Using Norms to Interpret Scores
  Converting Raw Scores to Percentile Ranks
  Using Standard Scores to Interpret Performance
    Converting z Scores to T Scores
  Using ANRA and Watson-Glaser Critical Thinking Appraisal Together

Chapter 5 Evidence of Reliability
  Reliability Coefficients and Standard Error of Measurement
  RANRA Reliability Studies
  ANRA Reliability Studies
    Evidence of Internal Consistency
    Evidence of Test-Retest Stability

Chapter 6 Evidence of Validity
  Face Validity
  Evidence Based on Test Content
  Evidence Based on Test-Criterion Relationships
  Correlations Between ANRA Test 1 and Test 2
  Evidence of Convergent and Discriminant Validity
    Correlations Between ANRA and Watson-Glaser Critical Thinking Appraisal—Short Form
    Correlations Between ANRA and Other Tests

Chapter 7 Using ANRA as an Employment Selection Tool
  Employment Selection
  Using ANRA in Making a Hiring Decision
  Differences in Reading Ability, Including the Use of English as a Second Language
  Using ANRA as a Guide for Training, Learning, and Education
  Fairness in Selection Testing
    Legal Considerations
    Group Differences and Adverse Impact
    Monitoring the Selection System

References

Appendices
  Appendix A Description of the Normative Sample
  Appendix B ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores by Norm Group
  Appendix C Combined Watson-Glaser and ANRA T Scores and Percentile Ranks by Norm Group

Tables
  Table 5.1 Coefficient Alpha, Odd-Even Split-Half Reliability, and Standard Error of Measurement (SEM) for RANRA (from Rust, 2002, p. 85)
  Table 5.2 ANRA Means, Standard Deviations (SD), Standard Errors of Measurement (SEM), and Internal Consistency Reliability Coefficients (Alpha)
  Table 5.3 ANRA Test-Retest Stability (N = 73)
  Table 6.1 Evidence of ANRA Criterion-Related Validity (Total Raw Score) of Job Incumbents in Various Finance-Related Occupations and Position Levels
  Table 6.2 Correlations Between Watson-Glaser Critical Thinking Appraisal—Short Form and ANRA (N = 452)
  Table 6.3 Correlations Between ANRA, the Miller Analogies Test for Professional Selection (MAT for PS), and the Differential Aptitude Tests for Personnel and Career Assessment—Numerical Ability (DAT for PCA—NA)

Figure
  Figure 4.1 The Relationship of Percentiles to T Scores


Acknowledgements

Pearson’s Talent Assessment group would like to recognize and thank Professor John Rust, Director of the Psychometrics Center at the University of Cambridge, United Kingdom, for his seminal efforts that led to his development of the Rust Advanced Numerical Reasoning Appraisal (RANRA). This manual details our adaptation of RANRA for use in the United States—the Advanced Numerical Reasoning Appraisal (ANRA).

We are indebted to numerous professionals and organizations for their assistance during several phases of our work—project design, data collection, statistical data analyses, editing, and publication.

We acknowledge the efforts of Julia Kearney, Sampling Projects Coordinator; Jane McDonald, Sampling Recruiter; Terri Garrard, Study Manager; David Quintero, Clinical Handscoring Supervisor; Hector Solis, Sampling Manager; and Victoria Locke, Director, Field Research, in driving the data collection activities. Nishidha Goel helped to collate and prepare the data.

We thank Zhiming Yang, PhD, Psychometrician, and JJ Zhu, PhD, Director of Psychometrics, Clinical Products. Dr. Yang’s technical expertise in analyzing the data and Dr. Zhu's psychometric leadership ensured the high level of psychometric integrity of the results.

Our thanks also go to Toby Mahan and Troy Beehler, Project Managers, for diligently managing the logistics of this project. Toby and Troy worked with several team members from Pearson's Technology Products Group to ensure the high quality and accuracy of the computer interface. These dedicated individuals included Paula Oles, Manager, Software Quality Assurance; Christina McCumber, Software Quality Assurance Analyst; Matt Morris, Manager, System Development; Maurya Buchanan, Technical Writer; and Alan Anderson, Director, Technology Products Group. Dawn Dunleavy, Senior Managing Editor; Konstantin Tikhonov, Project Editor; and Marion Jones, Director, Mathematics, provided editorial guidance. Mark Cooley assisted with the design of the cover.

Finally, we wish to acknowledge the leadership, guidance, support, and commitment of the following people through all the phases of this project: Jenifer Kihm, PhD, Senior Product Line Manager, Talent Assessment; John Toomey, Director, Talent Assessment; Paul McKeown, International Product Development Director; Judy Chartrand, PhD, Director, Test Development; Gene Bowles, Vice President, Publishing and Technology; Larry Weiss, PhD, Vice President, Psychological Assessment Products Group; and Aurelio Prifitera, PhD, Group President and CEO of Clinical Assessment/Worldwide.

Kingsley C. Ejiogu, PhD, Research Director

John Trent, M.S., Research Director

Mark Rose, PhD, Research Director


Chapter 1

Introduction

The Advanced Numerical Reasoning Appraisal (ANRA) measures the ability to recognize, understand, and apply mathematical and statistical reasoning. Specifically, ANRA measures numerical reasoning abilities that involve deduction, interpretation, and evaluation. Numerical reasoning, as measured by ANRA, is operationally defined as the ability to correctly perform the domain of tasks represented by two sets of items—Comparison of Quantities and Sufficiency of Information. Both require the use of analytical skills rather than straightforward computational skills. The key attribute ANRA measures is an individual’s ability to apply numerical reasoning to everyday problem solving in professional and business settings.

Starkey (1992) describes numerical reasoning as comprising “a set of abilities that are used to operate upon or mentally manipulate representations of numerosity” (p. 94). Research suggests that numerical reasoning abilities exist even in infancy, before children begin to receive explicit instruction in mathematics in school (Brannon, 2002; Feigenson, Dehaene, & Spelke, 2004; Spelke, 2005; Starkey, 1992; Wynn, Bloom, & Chiang, 2002). As Spelke (2005) observed, children harness these core abilities when they learn mathematics, and adults use the core abilities to engage in mathematical and scientific thinking.

The numerical reasoning skill is the foundation of all other numerical ability (Rust, 2002). This skill enables individuals to learn how to evaluate situations, how to select and apply problem-solving strategies, how to draw logical conclusions from numerical data, how to describe and develop solutions, and how to recognize when and how to apply those solutions. Eventually, one is able to reflect on solutions to problems and determine whether they make sense.

The nature of work is changing significantly, and there is an increased demand for a new kind of worker—the knowledge worker (Hunt, 1995). As Facione (2006) observed, though the ability to think critically and make sound decisions does not absolutely guarantee a life of happiness and economic success, having this ability equips an individual to improve his or her future and contribute to society. As the Internet has transformed home life and leisure time, people have been deluged with data of ever-increasing complexity. They must select, interpret, digest, evaluate, learn, and apply information.

Employers are typically interested in tests that measure candidates' ability to apply constructively and critically, rather than by rote, what they have learned. A person can be trained or educated to engage in numerical reasoning; as a result, tests that measure the ability to use mathematical reasoning within the context of work have an important function in career development. Such tests enable an organization to identify candidates who may need to improve their skills to enhance their work effectiveness and career success.

Numerical Reasoning and Critical Thinking

In a skills search of the O*Net OnLine database for “Mathematics” (defined by O*Net OnLine as “using mathematics to solve problems”) and “Critical Thinking” (defined as “using logic and reasoning to identify the strengths and weaknesses of alternative solutions, conclusions, or approaches to problems”), both skills were rated as “Very Important” for as many as 99 occupations (accountant, actuary, auditor, financial analyst, government service executive, management analyst, occupational health and safety specialist, etc.). Numerical reasoning and critical thinking are essential parts of the cognitive complexity that is a basic factor for understanding group differences in work performance (Nijenhuis & Flier, 2005).

Both numerical reasoning and critical thinking are higher-order thinking skills—“fundamental skills that are essential to being a responsible, decision-making member of the work-place” (Paul & Nosich, 2004, p. 5). Paul and Nosich contrasted the higher-order thinking skills with such lower-order thinking skills as rote memorization and recall, and they noted that critical thinking could be applied to any subject matter and any situation where reasoning is relevant. Such a subject matter or situation could range from accounting (Kealy, Holland, & Watson, 2005; American Institute of Certified Public Accountants, 1999), through medicine (Vandenbroucke, 1998), to truck driving (Nijenhuis & Flier, 2005). As Paul and Nosich (2004) stated, in any context where we are thinking well, we are thinking critically.

The enhancement of critical thinking in U.S. college students is a national priority (National Educational Goals Panel, 1991). In a paper commissioned by the United States Department of Education, Paul and Nosich (2004) highlighted what the National Council for Excellence in Critical Thinking Instruction regarded as a basic principle of critical thinking instruction as applied to subject-matter teaching: “to achieve knowledge in any domain, it is essential to think critically” (Paul & Nosich, p. 33). Critical thinking is the skill required to increase the probability of desirable outcomes in our lives, such as making the right career choice, using money wisely, or planning our future. Such critical thinking is reasoned, purposeful, and goal directed. At the cognitive level, it involves solving problems, formulating inferences, calculating likely outcomes, and making decisions. Once people have developed this critical thinking skill, they are able to apply it in a wide variety of circumstances. Critical thinking can involve proper language use, applied logic, and practical mathematics.

Because solving ANRA items requires higher-order numerical reasoning skills rather than rote calculation, using the Watson-Glaser Critical Thinking Appraisal® (a reliable and valid test of verbal critical thinking) in conjunction with ANRA provides a demanding, high-level measurement of numerical reasoning and verbal critical thinking skills, respectively. These two skills are important when recruiting in the competitive talent assessment market.

In response to requests from Watson-Glaser Critical Thinking Appraisal customers in the United Kingdom, The Psychological Corporation (now Pearson) in the UK developed the Rust Advanced Numerical Reasoning Appraisal (RANRA) in 2000 as a companion numerical reasoning test for the Watson-Glaser Critical Thinking Appraisal. In 2006, Pearson adapted RANRA to enhance the suitability and applicability of the test in the United States. This manual contains detailed information on the U.S. adaptation—ANRA.


Chapter 2

History and Development of ANRA

Description of the Test

ANRA consists of a set of two tests: Test 1—Comparison of Quantities and Test 2—Sufficiency of Information. The candidate must apply his or her numerical reasoning skills to decisions that reflect the wide variety of numerical estimation and analytic tasks frequently encountered in many everyday situations at work or in a learning environment.

The two ANRA tests are designed to measure different, but interdependent, aspects of numerical reasoning. The tests require the candidate to consider alternatives (either by comparing quantities or judging whether information is sufficient) in relation to given problems. The examinee's task is to study each problem and to evaluate the appropriateness or validity of the alternatives. The ANRA maximum total raw score is 32.

Because ANRA is intended as a test of numerical reasoning power rather than speed, there is no rigid time limit for taking the test. Candidates should be given as much time as they reasonably need to finish the test. An individual typically completes the test in about 45 minutes. About 90% of the 452 individuals in the normative group who were employed in professional, management, and higher-level positions completed the test within 75 minutes.

Adapting RANRA

The Rust Advanced Numerical Reasoning Appraisal (RANRA) was adapted to reflect U.S. English and U.S. measurement units. Because RANRA measures reasoning more than computation, only the measurement units were changed and the original numbers were kept, except in cases where doing so affected the realism of the situation. For example, “82 kilograms” was changed to “82 pounds,” though 82 kg = 180.4 lbs. Similarly, “5,000 British pounds sterling” was changed to “5,000 U.S. dollars,” though 5,000 British pounds sterling ≠ 5,000 U.S. dollars.

ANRA contains the original 32 RANRA items plus additional items for continuous test improvement purposes. All the items were reviewed by a group of 16 individuals—researchers in test development, financial analysts, business development professionals, industrial/organizational psychologists, and editors in test publishing. Sentence construction was modified in some items, based on input from the American reviewers.


Development of RANRA

In developing RANRA, Rust (2002) first conducted a conceptual analysis of the role of critical thinking in the use of mathematics. Through this analysis, he identified the two subdomains of comparison of quantities and sufficiency of information as the key concepts for an assessment of mathematical reasoning. Rust then constructed 80 items, had a panel of educators and psychologists evaluate and modify them, and generated the pilot version of RANRA. This pilot version was administered to 76 students and staff from diverse subject backgrounds within the University of London. The data were subjected to detailed analysis at the item level. Distractor analysis led to the modification of some items. Item-difficulty values were calculated for each item, based on the proportion of examinees passing each item. The discrimination index was also calculated, and the items shown to be measuring a common quality in numerical reasoning were identified and retained. This approach led to the development of the 32-item RANRA.


Chapter 3

Directions for Administration

General Information

ANRA is administered through the online testing platform at TalentLens.com, an Internet-based testing system designed by Pearson for the administration, scoring, and reporting of professional assessments. Instructions for administrators on how to order and access the test online, and for accessing ANRA interpretive reports, are provided at TalentLens.com. After a candidate has taken ANRA online, the test administrator can use the link Pearson provides to review the candidate’s results in an interpretive report.

Preparing for Administration

Being thoroughly prepared before administering the test results in a more efficient administration session. Test administrators should take ANRA themselves, following the directions, prior to administering it. Candidates are not allowed to use calculators or similar calculation devices while completing the test. Test administrators should provide candidates with pencils, an eraser, and a sheet of paper on which to write their calculations if needed.

Test administration must comply with the code of practice of the testing organization, applicable government regulations, and the recommendations of the test publisher. Candidates should be informed before the testing session about the nature of the assessment, why the test is being used, the conditions under which they will be tested, and the nature of any feedback they will receive. Test administrators need to assure candidates that their test results will remain confidential.

The test administrator must obtain informed consent from the candidate before testing. The informed consent is a written statement, signed by the candidate, that explains the type of test to be administered, the purpose of the test, and who will have access to the test data. It is the responsibility of the test user to ensure that candidates understand the testing procedure. The test administrator should also ensure that all relevant background information from the candidate is collected and verified (e.g., name, gender, educational level, current employment, occupational history, and so on).


Testing Conditions

The test administrator has a significant responsibility to ensure that the conditions under which the test is taken do not introduce undesirable influences on the test performance of candidates. Such influences can either inflate or reduce candidates' test scores. Poor administration of a test undermines the value of test scores and makes an accurate interpretation of results very difficult, if not impossible.

It is important to ensure that the test is administered in a quiet, well-lit room. The following conditions are necessary for accurate scores and for maintaining the cooperation of the examinee: good lighting, comfortable seating, adequate desk or table space, comfortable positioning of the computer screen, keyboard, and mouse, and freedom from noise and other distractions. Interruptions and distractions from outside should be kept to a minimum, if not eliminated.

Answering Questions

The test administrator may answer examinees' questions about the test before giving the signal to begin. To maintain standard testing conditions, answer such questions by re-reading the appropriate section of these directions. Do not volunteer new explanations or examples. The test administrator is responsible for ensuring that examinees understand the correct way to indicate their answers and what is required of them. The question period should never be rushed or omitted.

If any examinees have routine questions after the testing has started, try to answer them without disturbing the other examinees. However, questions about the test items should be handled by telling the examinee to do his or her best.

Administering the Test

After the examinee is seated at the computer and the initial instruction screen for ANRA appears, say,

The on-screen directions will take you through the entire process that begins with some demographic questions. After you have completed these questions, the test will begin. You will have as much time as you reasonably need to complete the test items. The test ends with a few additional demographic questions. Do you have any questions before starting the test?

Answer any questions and say, Please begin the test.

Page 14: Advanced Numerical Reasoning Appraisal TM (ANRA)talentlens.pearsonpsychcorp.com.au/files/ANRA_Manual(1).pdf · Numerical Reasoning Appraisal (RANRA). This manual details our adaptation

Copyright © 2006 by NCS Pearson, Inc. All rights reserved. 8

Once the examinee clicks the “Start Your Test” button, administration begins with the first page of questions. The examinee may review test items at the end of the test. Allow examinees as much time as they reasonably need to complete the test. Average completion time is about 45 minutes. About 90% of candidates finish the test within 75 minutes.

If an examinee’s computer develops technical problems during testing, the test administrator should move the examinee to another suitable computer location. If the technical problems cannot be solved by moving to another computer location, the administrator should contact Pearson’s Technical Support at 1-888-298-6227 for assistance.

Scoring and Reporting

Scoring is automatic, and the report is typically available within a minute after the test is completed. A link to the report will be available on the online testing platform at TalentLens.com. Adobe® Acrobat Reader® is required to open the report. The test administrator may view, print, or save the candidate’s report.

Test Security

ANRA scores are confidential and should be stored in a secure location accessible only to authorized individuals. It is unethical and poor test practice to allow test-score access to individuals who do not have a legitimate need for the information. Storing test scores in a locked cabinet or password-protected file that can be accessed only by designated test administrators will help ensure their security. The security of testing materials (e.g., access to online tests) and protection of copyright must also be maintained by authorized individuals. Avoid disclosure of test access information such as usernames or passwords, and administer ANRA only in proctored environments. All computer stations used in administering ANRA must be in locations that can be easily supervised and that have an adequate level of security.

Concluding Test Administration

At the end of the testing session, thank the candidates for their participation and check the computer stations to ensure that the test is closed.

ANRA can be a demanding test for some candidates. It may be constructive to clarify what part the test plays within the context of the selection or assessment procedures. It is also constructive to reassure candidates about the confidentiality of their test scores.


Administering ANRA and Watson-Glaser Critical Thinking Appraisal in a Single Testing Session

When administering ANRA and the Watson-Glaser in a single testing session, administer the Watson-Glaser first. Just as ANRA is intended as a test of numerical reasoning power rather than speed, the Watson-Glaser is intended as a test of critical thinking power rather than speed. Both tests are untimed; administration of ANRA and the Watson-Glaser Short Form in one session should take about 1 hour and 45 minutes.

Accommodating Examinees With Disabilities

The Americans with Disabilities Act (ADA) of 1990 requires an employer to reasonably

accommodate the known disability of a qualified applicant, provided such accommodation would

not cause an “undue hardship” to the operation of the employer’s business.

The test administrator should provide reasonable accommodations to enable candidates with

special needs to comfortably take the test. Reasonable accommodations may include, but are not

limited to, modifications to the test environment (e.g., high desks) and medium (e.g., having a

reader read questions to the examinee, or increasing the font size of questions) (Society for

Industrial and Organizational Psychology, 2003). In situations where an examinee’s disability is

not likely to impair his or her job performance, but may hinder the examinee’s performance on

ANRA, the organization may want to consider waiving the test or de-emphasizing the score in

favor of other application criteria. Interpretive data as to whether scores on ANRA are comparable

for examinees who are provided reasonable accommodations are not available at this time due to

the small number of examinees who have requested such accommodations.


Chapter 4

ANRA Norms Development

Norms provide a basis for evaluating an individual's score relative to the scores of other

individuals who took the same test. Norms allow for the conversion of raw scores to more useful

comparative scores, such as percentile ranks. Typically, norms are constructed from the scores of

a large sample of other individuals who took the test under similar conditions. This group of

individuals is called the norm group.

The characteristics of the sample used for preparing norms are critical in determining the

usefulness of those norms. For such purposes as selecting from among applicants to fill a

particular job, normative information derived from a specific, relevant, well-defined group might

be most useful. However, the composition of the sample of job applicants is influenced by a

variety of situational factors, including the job demands and local labor market conditions.

Because such factors can vary across jobs, locations, and over time, the limitations on the

usefulness of any set of published norms should be recognized.

When a test is used to make employment decisions, the most appropriate norm group is one that

is representative of those who will be taking the test in the local situation. It is best, whenever

possible, to prepare local norms by accumulating the test scores of applicants, trainees, or

employees. One of the factors that must be considered in establishing norms is sample size. Data

from small samples tend to be unstable and the presentation of percentile ranks for all possible

scores is imprecise. As a result, the use of in-house norms is only recommended when the sample

is sufficiently large (about 100 or more people). Until a sufficient and representative number of

cases has been collected, the test user should consider norms based on other similar groups rather

than from local data with a small sample size. In the absence of adequate local norms, the norms

provided in Appendixes A and B should be used to guide the interpretation of scores.

Using ANRA as a Norm- or Criterion-Referenced Test

ANRA may be used as a norm-referenced or as a criterion-referenced instrument. A norm-

referenced test enables a human resource professional to interpret an individual's test performance

in comparison to a particular normative group. An individual's performance on a criterion-

referenced instrument can only indicate whether or not that individual meets certain predefined

criteria. It is appropriate to use ANRA as a norm-referenced instrument in the process of


employment selection. For optimal results in such decisions, the overall total score, rather than

the subtest scores, should be used. Subtest scores represent fewer items and, therefore, are less

stable than the total score. However, as a criterion-referenced measure, it is feasible to use subtest

scores to analyze the numerical reasoning abilities of a class or larger group and to determine the

types of numerical reasoning or critical thinking training that may be most appropriate.

In norm-referenced situations, raw scores need to be converted before they can be compared.

Though raw scores may be used to rank candidates in order of performance, little can be inferred

from raw scores alone. There are two main reasons for this. First, raw scores cannot be treated as

having equal intervals. For example, it would be incorrect to assume that the difference between

raw scores of, say, 20 and 21 is of the same significance as the difference between raw scores of

30 and 31. Second, ANRA raw scores may not be normally distributed. Hence, they are not

subject to the psychometric principles of parametric statistics required for the proper evaluation

of validity.

Using Norms to Interpret Scores

The ANRA norms presented in Appendix B and Appendix C were derived from data collected

February 2006 through June 2006, from 452 adults in a variety of employment settings. The

tables in Appendix B (Tables B.1 and B.2) show the ANRA total raw scores with corresponding

percentile ranks and T scores for the identified norm groups.

When using the norms tables in Appendix B, look for a group that is similar to the individual or

group tested. For example, you would compare the test score of a person who applied for a

Manager position with norms derived from the scores of other managers. When using the norms

in Appendix B to interpret candidates’ scores, keep in mind that norms are affected by the

composition of the groups that participated in the normative study. Therefore, it is important to

examine specific position level and occupational characteristics of a norm group.

By comparing an individual’s raw score to the data in a norms table, it is possible to determine

the percentile rank corresponding to that score. The percentile rank indicates an individual’s

relative position in the norm group. Percentiles should not be confused with percentage scores

that represent the percentage of correct items. Percentiles are derived scores that are expressed in

terms of the percent of people in the norm group scoring equal to or below a given raw score.
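The percentile definition above can be sketched in code. This is a minimal illustration with a small hypothetical norm group; the scores are invented, not drawn from the ANRA norm tables.

```python
# Sketch: a percentile rank as defined above -- the percent of people in the
# norm group scoring equal to or below a given raw score.

def percentile_rank(raw_score, norm_scores):
    """Percent of norm-group scores less than or equal to raw_score."""
    at_or_below = sum(1 for s in norm_scores if s <= raw_score)
    return round(100 * at_or_below / len(norm_scores))

norm_group = [12, 15, 18, 20, 21, 22, 23, 25, 27, 30]  # hypothetical scores
print(percentile_rank(22, norm_group))  # 6 of 10 scores are <= 22, i.e. 60
```

In practice the published norm tables serve this role; the sketch only makes the definition concrete.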

Percentiles have the advantage of being readily understood and universally applicable. However,

although percentiles are useful for expressing an examinee’s performance relative to other

candidates, percentiles have limitations. For example, percentile ranks do not have equal

intervals. While percentiles indicate the relative position of each candidate in relation to the


normative sample, they do not show the amount of difference between scores. In a normal

distribution of scores, percentile ranks tend to cluster around the 50th percentile. This clustering

affects scores in the average range the most because a difference of one or two raw score points

may change the percentile rank. Extreme scores are affected less; a change in one or two raw

score points at the extremes typically does not produce a large change in percentile ranks. These

factors should be considered when interpreting percentile ranks.

Converting Raw Scores to Percentile Ranks

To find the percentile rank of a candidate’s raw score, locate the ANRA total raw score in Table

B.1 or B.2. The corresponding percentile rank is read from the selected norm group column. For

example, if a person applying for a job as a Director had a score of 25 on ANRA, it is appropriate

to use the Executives/Directors norms in Table B.1 for comparison. In this case, the percentile

rank corresponding to a raw score of 25 is 67. This percentile rank indicates that about 67% of the

people in the norm group scored lower than or equal to a score of 25 on ANRA, and about 33%

scored higher than a score of 25 on ANRA. The lowest raw score will lie at the 1st percentile; the

median raw score will fall at the 50th percentile, and the highest raw score will lie at the 99th

percentile.
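The lookup described above amounts to reading a table keyed by raw score. In the sketch below, only the 25 → 67 entry reproduces the manual’s example; the other entries are hypothetical placeholders, not values from Table B.1.

```python
# Sketch of the norms-table lookup described above (raw score -> percentile
# rank). Real values live in Tables B.1 and B.2; these are mostly invented.
NORMS_EXECUTIVES = {23: 55, 24: 61, 25: 67, 26: 72}  # 25 -> 67 is the manual's example

def lookup_percentile(raw_score, norms):
    return norms[raw_score]

print(lookup_percentile(25, NORMS_EXECUTIVES))  # 67, matching the example
```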

Each group’s size (N), raw score mean, and raw score standard deviation (SD) are shown at the

bottom of the norms tables. The group raw score mean or average is calculated by summing the

raw scores and dividing the sum by the total number of examinees. The standard deviation

indicates the amount of variation in a group of scores. In a normal distribution, approximately

two-thirds (68.26%) of the scores are within the range of 1 SD below the mean to 1 SD above the

mean. These statistics are often used in describing a sample and setting cut scores. For example, a

cut score may be set as one SD below the mean. In compliance with the Civil Rights Act of 1991,

Section 5 (a) (1), as amended, the norms provided in Appendix B and Appendix C combine data

for males and females, and for white and minority candidates.
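The mean, SD, and cut-score arithmetic described above can be sketched as follows, using invented scores. The use of `pstdev` (the population SD) is an assumption; the manual does not state which SD formula its tables use.

```python
# Sketch: group mean, standard deviation, and a cut score set one SD below
# the mean, as described above. The scores are hypothetical.
import statistics

scores = [14, 17, 19, 21, 22, 24, 26, 28]
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)   # population SD; use stdev() for the sample SD
cut_score = mean - sd            # "one SD below the mean" rule from the text
print(round(mean, 2), round(sd, 2), round(cut_score, 2))
```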

Using Standard Scores to Interpret Performance

Test results can be reported in many different formats. Examples of these formats include raw

scores, percentiles, and various forms of standard scores. Standard scores express the score of

each individual in terms of its distance from the mean. Examples of standard scores are z scores

and T scores. Standard scores do not suffer from the drawbacks associated with percentiles. The

advantage of percentiles is that they are readily understood and, therefore, immediately

meaningful. As indicated above, however, there is a risk of percentiles being confused with


percentage scores, or of percentiles being interpreted as an interval scale. Standard scores avoid

the unequal clustering of scores by adopting a scale based on standard deviation units.

The basic type of standard score is the z score, which is a raw score converted to a standard

deviation unit. Thus a raw score that is 0.53 standard deviations below the mean score for the

group receives a z score of –0.53. z scores are generally in the –3.00 to +3.00 range. However,

there are certain disadvantages in saying that a person has a score of –0.53 on a test. From the

point of view of presentation, the use of decimal points and the negative symbol is unappealing.

Hence, certain transformation algorithms have become available that enable a more user-friendly

image for standard scores.

Converting z Scores to T Scores

To convert a z score to a T score, multiply the z score by 10 and add 50. Thus, a z score of –0.53

becomes a T score of 44.7, which is then rounded, as a matter of convention, to the nearest whole

number, that is, 45. A set of T scores has a mean of 50 and at each standard deviation point there

is a score difference of 10. Thus, a T score of 30 is at two standard deviations below the mean,

while a T score of 60 is one standard deviation above the mean. The T score transformation

results in a scale that runs from 10 to 90, with each 10-point interval coinciding with a standard

deviation point. Appendix B shows ANRA T scores. Appendix C shows the sum of Watson-

Glaser and ANRA T scores and their corresponding percentiles. Because the Watson-Glaser and

ANRA do not measure identical constructs, their combined T scores must be derived by first

transforming separate Watson-Glaser and ANRA raw score pairs to their respective T scores, and

then summing the T scores. Figure 4.1 illustrates the relationship between percentiles and

T scores.
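The z-to-T conversion described above can be sketched as a one-line computation:

```python
# Sketch of the z-to-T conversion described above: multiply by 10, add 50,
# then round to the nearest whole number as a matter of convention.

def z_to_t(z):
    return round(z * 10 + 50)

print(z_to_t(-0.53))  # 44.7 rounds to 45, the manual's example
print(z_to_t(0.0))    # mean of the T scale: 50
```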


Figure 4.1 The Relationship of Percentiles to T Scores

Using ANRA and Watson-Glaser Critical Thinking Appraisal Together

The combined ANRA and Watson-Glaser score measures a broader range of critical reasoning

skills than would be obtained by the use of each test alone. Scores from ANRA and the Watson-

Glaser can be combined by first converting each total raw score to a T score and then adding the

two T scores together. The sum of the T scores can also be converted to percentile ranks.

Appendix C (Tables C.1 and C.2) shows the percentile ranks of the sum of ANRA and

Watson-Glaser Short Form T scores.
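The combination procedure above (convert each raw score to a T score, then sum) can be sketched as follows. The norm-group means and SDs here are illustrative stand-ins, not values from the appendixes.

```python
# Sketch of combining ANRA and Watson-Glaser scores as described: convert
# each raw score to a T score against its own norm-group mean and SD, then
# sum the two T scores. Means and SDs below are hypothetical.

def raw_to_t(raw, group_mean, group_sd):
    z = (raw - group_mean) / group_sd
    return round(z * 10 + 50)

anra_t = raw_to_t(25, group_mean=21.3, group_sd=6.0)  # illustrative norms
wg_t = raw_to_t(30, group_mean=28.0, group_sd=5.0)    # hypothetical norms
combined = anra_t + wg_t                              # sum of T scores
print(anra_t, wg_t, combined)
```

The combined T-score sum would then be converted to a percentile rank via Tables C.1 and C.2.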

Another potential benefit from using ANRA and the Watson-Glaser together is in the expected

difference between scores on the two tests. This expected difference depends on the type of norm

group to which the candidate belongs. Generally speaking, candidates in financial or scientific

occupations are expected to score higher on ANRA than on the Watson-Glaser. On the other

hand, managers, particularly in fields where critical thinking using language is a key skill, and

employees in occupations that do not require a great deal of numeracy, will be expected to

perform better on the Watson-Glaser than on ANRA. By examining the difference between a

candidate’s Watson-Glaser and ANRA scores, the user can make appropriate development

suggestions to the candidate.


Chapter 5

Evidence of Reliability

The reliability of a measurement instrument refers to the accuracy, consistency, and precision of

test scores across situations (Anastasi & Urbina, 1997). Test theory posits that a test score is an

estimate of an individual’s hypothetical true score, or the score an individual would receive if the

test were perfectly reliable. In actual practice, however, some measurement error is to be

expected. A reliable test has relatively small measurement error.

The methods most commonly used to estimate test reliability are test–retest (the stability of test

scores over time), alternate forms (the consistency of scores across alternate forms of a test), and

internal consistency of the test items (e.g., Cronbach’s alpha coefficient; Cronbach, 1970).

Decisions about the form of reliability to be used in comparing tests depend on a consideration of

the nature of the error that is involved in each form. Different types of error can be operating at

the same time, so it is to be expected that reliability coefficients will differ in different situations

and on different groupings and samplings of respondents. An appropriate estimate of reliability

can be obtained from a large representative sample of the respondents to whom the test is

generally administered.
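As a concrete illustration of the internal-consistency method mentioned above, the following sketch computes Cronbach’s alpha from hypothetical right/wrong item data. This is the generic textbook formula, not Pearson’s scoring code.

```python
# Sketch of Cronbach's alpha: alpha = (k / (k - 1)) * (1 - sum of item
# variances / variance of total scores). Item data are hypothetical 0/1
# (incorrect/correct) responses.
import statistics

def cronbach_alpha(item_scores):
    """item_scores: list of per-item lists, one inner list per item."""
    k = len(item_scores)
    item_vars = [statistics.pvariance(item) for item in item_scores]
    totals = [sum(person) for person in zip(*item_scores)]  # per-examinee totals
    total_var = statistics.pvariance(totals)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

items = [  # 4 items x 6 examinees, invented data
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]
print(round(cronbach_alpha(items), 2))
```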

Reliability Coefficients and Standard Error of Measurement

The reliability of a test is expressed as a correlation coefficient, which represents the consistency

of scores that would be obtained if a test could be given an infinite number of times. Reliability

coefficients are a type of estimate of the amount of error associated with test scores and can range

from .00 to 1.00. The closer the reliability coefficient is to 1.00, the more reliable the test. A

perfectly reliable test would have a reliability coefficient of 1.00 and no measurement error. A

completely unreliable test would have a reliability coefficient of .00. The U.S. Department of

Labor (1999) provides the following general guidelines for interpreting a reliability coefficient:

above .89 is considered “excellent,” .80–.89 is “good,” .70–.79 is considered “adequate,” and

below .70 “may have limited applicability.”


Repeated testing leads to some variation. Consequently, no single test event effectively measures

an examinee’s actual ability with complete accuracy. Therefore, an estimate of the possible

amount of error present in a test score, or the amount that scores would probably vary if an

examinee were tested repeatedly with the same test, is necessary. This estimate of error is known

as the standard error of measurement (SEM). The SEM decreases as the reliability of a test

increases. A large SEM denotes less reliable measurement and less reliable scores. The standard

error of measurement is calculated with the formula:

SEM = SD × √(1 − rxx)

In this formula, SEM represents the standard error of measurement, SD represents the standard

deviation of the distribution of obtained scores, and rxx represents the reliability coefficient of the

test (Cascio, 1991, formula 7-11).
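The formula above can be checked numerically. The SD and reliability used in this sketch reproduce the Executives/Directors row of Table 5.2.

```python
# Sketch of the SEM formula above: SEM = SD * sqrt(1 - r_xx).
import math

def standard_error_of_measurement(sd, reliability):
    return sd * math.sqrt(1 - reliability)

# SD = 6.0 and alpha = .85 are the Executives/Directors values in Table 5.2.
print(round(standard_error_of_measurement(6.0, 0.85), 2))  # ~2.32, as in Table 5.2
```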

The SEM is a quantity that is added to and subtracted from an examinee’s standard test score to

create a confidence interval or band of scores around the obtained standard score. The confidence

interval is a score range that, in all likelihood, includes the examinee’s hypothetical “true” score

that represents the examinee’s actual ability. A true score is a theoretical score entirely free of

error. Since the true score is a hypothetical value that can never be obtained because testing

always involves some measurement error, the score obtained by an examinee on any test will vary

somewhat from administration to administration. As a result, any obtained score is considered

only an estimate of the examinee’s “true” score. Approximately 68% of the time, the observed

standard score will lie within +1.0 and –1.0 SEM of the true score; 95% of the time, the observed

standard score will lie within +1.96 and –1.96 SEM of the true score; and 99% of the time, the

observed standard score will lie within +2.58 and –2.58 SEM of the true score.

Using the SEM means that standard scores are interpreted as bands or ranges of scores, rather

than as precise points (Nunnally, 1978). To illustrate the use of SEM with an example, assume a

director candidate obtained a total raw score of 25 on ANRA, with SEM = 2.32. From the

information in Table B.1, the standard score (T score) for this candidate is 57. We can, therefore,

infer that if this candidate were administered a large number of alternative forms of ANRA, 95%

of this candidate’s T scores would lie within the range between 57 − 1.96 × 2.32 ≈ 52 T score

points and 57 + 1.96 × 2.32 ≈ 62 T score points. We can further infer that the expected average of

this person’s T scores from a large number of alternate forms of ANRA would be 57.
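The confidence band in the example above can be reproduced directly:

```python
# Sketch of the confidence interval described above: the obtained standard
# score plus or minus 1.96 SEM for a 95% band.

def confidence_interval(score, sem, z=1.96):
    return score - z * sem, score + z * sem

low, high = confidence_interval(57, 2.32)  # T = 57, SEM = 2.32 from the example
print(round(low), round(high))  # roughly 52 and 62, as in the example
```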


Thinking in terms of score ranges serves as a check against overemphasizing small differences

between scores. The SEM may be used to determine if an individual’s score is significantly

different from a cut score, or if the scores of two individuals differ significantly. An example of

one general rule of thumb is that the difference between two scores on the same test should not be

interpreted as significant unless the difference is equal to at least twice the standard error of the

difference (SED), where SED = SEM × √2 (Gulliksen, as cited in Cascio, 1991, p. 143).
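The rule of thumb above can be sketched as:

```python
# Sketch of the rule of thumb above: two scores on the same test differ
# significantly only if their difference is at least twice the standard
# error of the difference, SED = SEM * sqrt(2).
import math

def significant_difference(score_a, score_b, sem):
    sed = sem * math.sqrt(2)
    return abs(score_a - score_b) >= 2 * sed

print(significant_difference(57, 50, sem=2.32))  # |7| >= 2 * 3.28 -> True
print(significant_difference(57, 55, sem=2.32))  # |2| <  6.56     -> False
```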

RANRA Reliability Studies

Because ANRA is a U.S. adaptation of RANRA, the information on previous studies refers to

RANRA. For the sample used in the initial development of RANRA in the United Kingdom

(N = 1546), Cronbach’s alpha coefficient and split-half reliability were .78 for the overall

RANRA score (Rust, 2002). The reliability coefficients of RANRA for both Test 1 and Test 2

and for the overall RANRA score are shown in Table 5.1.

Table 5.1 Coefficient Alpha, Odd-Even Split-Half Reliability, and Standard Error of Measurement (SEM) for RANRA (from Rust, 2002, p. 8.5)

                                      Alpha   Split-Half   SEM

Test 1: Comparison of Quantities       .63       .60       6.32
Test 2: Sufficiency of Information     .70       .71       5.39
RANRA Score                            .78       .78       4.69

The RANRA score reported in Table 5.1 is a T score transformed from the total raw score, while

the standard error of measurement reported in the table was based on the split-half reliability

(Rust, 2002).

ANRA Reliability Studies

Evidence of Internal Consistency

Cronbach’s alpha and the standard error of measurement (SEM) were calculated for the sample

used for the ANRA norm groups reported in this manual. The internal consistency reliability

estimates for ANRA total raw score and ANRA subtests are shown in Table 5.2.


Table 5.2 ANRA Means, Standard Deviations (SD), Standard Errors of Measurement (SEM), and Internal Consistency Reliability Coefficients (Alpha)

ANRA Total Raw Score
Norm Group                              N    Mean   SD    SEM   Alpha
Executives/Directors                    91   21.3   6.0   2.32   .85
Managers                                88   20.1   5.6   2.38   .82
Professionals/Individual Contributors  200   22.1   6.4   2.22   .88
Employees in Financial Occupations     198   21.9   6.4   2.22   .88

ANRA Test 1: Comparison of Quantities
Norm Group                              N    Mean   SD    SEM   Alpha
Executives/Directors                    91   10.9   3.4   1.63   .77
Managers                                88   10.3   3.4   1.70   .75
Professionals/Individual Contributors  200   11.4   3.6   1.53   .82
Employees in Financial Occupations     198   11.3   3.5   1.57   .80

ANRA Test 2: Sufficiency of Information
Norm Group                              N    Mean   SD    SEM   Alpha
Executives/Directors                    91   10.4   3.3   1.60   .75
Managers                                88    9.9   2.9   1.67   .67
Professionals/Individual Contributors  200   10.7   3.3   1.62   .76
Employees in Financial Occupations     198   10.6   3.3   1.58   .77

The values in Table 5.2 show that the ANRA total raw score possesses good internal consistency

reliability. The ANRA subtests showed lower internal consistency reliability estimates than the

ANRA total raw score. Consequently, the ANRA total score, not the subtest scores, should be

used for optimal hiring results.

Evidence of Test-Retest Stability

ANRA was administered on two separate occasions to determine the stability of performance on

the test over time. A sample of 73 job incumbents representing various occupations and

organizational levels took the test twice. The average test-retest interval was two weeks. The test-

retest stability was evaluated using Pearson’s product-moment correlation of the standardized T

scores from the first and second testing occasions. The test-retest correlation coefficient was

corrected for the variability of the sample (Allen & Yen, 1979). Furthermore, the standard

difference (i.e., effect size) was calculated using the mean score difference between the first and

second testing occasions divided by the pooled standard deviation (Cohen, 1996, Formula 10.4).

This difference (d), proposed by Cohen (1988), is useful as an index to measure the magnitude of

the actual difference between two means. The corrected test-retest stability coefficient was .85.

The difference in mean scores between the first testing and the second testing was

negligible (d = –0.03). As the data in Table 5.3 indicate, ANRA demonstrates good test-retest stability

over time.
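The two statistics described above, the test-retest correlation and the standardized mean difference, can be sketched with hypothetical paired scores. The correction for sample variability reported in the manual is omitted here.

```python
# Sketch: Pearson's product-moment correlation between two testing occasions
# and Cohen's d using the pooled SD, as described above. Data are invented.
import math
import statistics

def pearson_r(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) *
                           sum((y - my) ** 2 for y in ys))

def cohens_d(xs, ys):
    pooled_sd = math.sqrt((statistics.variance(xs) + statistics.variance(ys)) / 2)
    return (statistics.mean(ys) - statistics.mean(xs)) / pooled_sd

first  = [45, 50, 55, 60, 48, 52]   # hypothetical T scores, first testing
second = [46, 49, 57, 59, 50, 51]   # hypothetical T scores, retest
print(round(pearson_r(first, second), 2), round(cohens_d(first, second), 2))
```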


Table 5.3 ANRA Test-Retest Stability (N = 73)

                            First Testing      Second Testing
                            Mean     SD        Mean     SD      r12    Corrected r12    Standard Difference (d)
ANRA Standardized T score   50.1     9.2       49.8     10.0    .82         .85              –0.03


Chapter 6

Evidence of Validity

Validity refers to the degree to which specific data, research, or theory support the interpretation

of test scores entailed by proposed uses of tests (American Educational Research Association

[AERA], American Psychological Association [APA], & National Council on Measurement in

Education [NCME], 1999). Cronbach (1970) observed that validity is high if a test gives the

information the decision maker needs. Several sources of validity evidence are discussed next in

relation to ANRA.

Face Validity

Face validity refers to a test’s appearance and what the test seems to measure, rather than what the

test actually measures. Face validity is not validity in any technical sense and should not be

confused with content validity. Face validity refers to whether or not a test looks valid to

candidates, administrators and other observers. If test content does not seem relevant to the

candidate, the result may be lack of cooperation, regardless of the actual validity of the test. For a

test to function effectively in practical situations, it must be not only objectively valid

but also face valid.

However, a test cannot be judged solely on whether it “looks right.” Appearance and graphic

design of a test are no guarantee of quality. Face validity should not be considered a substitute for

objectively determined validity. As mentioned in the chapter on the development of ANRA,

ANRA items were reviewed by a group of individuals who provided feedback on the test. The

reviewers provided their feedback regarding issues like clarity of the items, the extent to which

items appeared to measure numerical reasoning, extent to which test content appeared relevant to

jobs that required numerical reasoning, and to what extent they thought the test would yield

useful information. From the responses by this group, it was evident that ANRA had high face

validity and participants recognized its relevance to the skills required by employees who deal

with numbers or project planning. Although the item content of ANRA could not reflect every

work situation for which the test would be appropriate, the operations and processes required in

each subtest represent abilities that are valued and readily appreciated.


Evidence Based on Test Content

Evidence based on the content of a test exists when the test includes a representative sample of

tasks, behaviors, knowledge, skills, abilities, or other characteristics necessary to perform the job.

Evidence of content validity is usually gathered through job analysis and is most appropriate for

evaluating knowledge and skills tests.

Evaluation of content-related evidence is usually a rational, judgmental process

(Cascio & Aguinis, 2005). In employment settings, the principal concern is with making

inferences about how well the test samples a job performance domain—a segment or aspect of

the job performance universe that has been identified and about which inferences are to be made

(Lawshe, 1975). Because most jobs have several performance domains, a standardized test

generally applies only to one segment of the job performance universe (e.g., a typing test

administered to a secretary applies to typing—one job performance domain in the job

performance universe of a secretary). Thus, the judgment of whether content-related evidence

exists depends on an evaluation of whether the same capabilities are required in both the job

performance domain and the test (Cascio & Aguinis, 2005).

When considering content validity, it is important to recognize that a test attempts to sample the

area of behavior being measured. It is rarely the purpose of a test to be exhaustive in assessing

every possible manifestation of a domain. While content exhaustiveness may seem feasible in

some highly specific areas of achievement, in other measurement situations it would simply not

be possible. Aptitude, ability and personality tests always aim to achieve representative sampling

of the behaviors in question, and the evaluation of content validity relates to the degree to which

this representation has been achieved.

Evidence of content validity is most easily shown with reference to achievement tests where the

relationship between the items and the expected manifestation of that ability in real-life situations

is very clear. Achievement tests are designed to measure how well an individual has mastered a

particular skill or course of study. From this perspective, it might seem that an informed

inspection of the contents of a test would be sufficient to establish its validity for such a purpose.

For example, a test of spelling should consist of spelling items. A careful analysis of the domain

will be necessary to ensure that all the important features are covered by the test items, and that

the features are appropriately represented in the test according to their significance.

The effect of speed on test scores also needs to be checked. Participants may perform differently

under the additional pressure of a timed test. There are also implications for test design and

scoring arising from the interaction of speed and accuracy and from situations where candidates


fail to finish a timed test. In any case, ANRA is not a speed test and it is unlikely that anyone

failing to complete the test within a reasonable amount of time would improve his or her score

significantly if given extra time.

In an employment setting, evidence of ANRA content-related validity should be established by

demonstrating that the jobs require the numerical reasoning skills measured by ANRA. Content-

related validity in instructional settings may be examined for the extent to which ANRA measures

a sample of the specified objectives of such instructional programs.

Evidence Based on Test-Criterion Relationships

One of the primary reasons for using tests is to be able to make an informed prediction about an

examinee’s potential for future success. For example, selection tests are used to hire or promote

individuals most likely to be productive employees. The rationale behind using selection tests is

that the better an individual performs on the test, the better this individual will perform as an

employee.

Evidence of criterion-related validity addresses the inference that individuals who score better on

tests will be successful on some criterion of interest. Criterion-related validity evidence indicates

the statistical relationship (e.g., for a given sample of job applicants or incumbents) between

scores on the test and one or more criteria, or between scores on the test and independently

obtained measures of subsequent job performance. By collecting test scores and criterion scores

(e.g., job performance results, grades in a training course, supervisor ratings), one can determine

how much confidence may be placed in using test scores to predict job success. Typically,

correlations between criterion measures and scores on the test serve as indicators of criterion-

related validity evidence. Provided the conditions for a meaningful validity study have been met

(e.g., sufficient sample size and adequate criteria), these correlation coefficients are important

indicators of the utility of the test.

The conditions for evaluating criterion-related validity evidence are often difficult to fulfill in the

ordinary employment setting. Studies of test-criterion relationships should involve a sufficiently

large number of persons hired for the same job and evaluated for success using a uniform

criterion measure. The criterion itself should be reliable and job-relevant, and should provide a

wide range of scores. In order to evaluate the quality of studies of test-criterion relationships, it is

essential to know at least the size of the sample and the nature of the criterion.

Assuming that the conditions for a meaningful evaluation of criterion-related validity evidence

had been met, Cronbach (1970) characterized validity coefficients of .30 or better as having

“definite practical value.” The U.S. Department of Labor (1999) provides the following general


guidelines for interpreting validity coefficients: above .35 are considered “very beneficial,” .21–

.35 are considered “likely to be useful,” .11–.20 “depends on the circumstances,” and below .11

“unlikely to be useful.” It is important to point out that even relatively low validities (e.g., .20)

may justify the use of a test in a selection program (Anastasi & Urbina, 1997). This is because

the practical value of a test depends not only on its validity but also on other factors,

such as the base rate for success on the job (i.e., the proportion of people who would be

successful in the absence of any selection procedure). If the base rate for success on the job is low

(i.e., few people would be successful on the job), tests with low validity can have considerable

utility or value. When the base rate is high (i.e., selected at random, most people would succeed

on the job), even highly valid tests may not contribute significantly to the selection process.
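As an illustration only, the Department of Labor bands quoted above can be expressed as a simple lookup. The function name and the treatment of boundary values in this sketch are our own assumptions, not part of the published guideline.

```python
def interpret_validity(r):
    """Map a criterion-related validity coefficient onto the U.S. Department
    of Labor (1999) interpretive bands quoted above."""
    r = abs(r)
    if r > 0.35:
        return "very beneficial"
    if r >= 0.21:
        return "likely to be useful"
    if r >= 0.11:
        return "depends on the circumstances"
    return "unlikely to be useful"

# Example: the .44 Total Performance coefficient reported later in Table 6.1
print(interpret_validity(0.44))  # very beneficial
```

Remember that, as the text notes, a coefficient in a lower band may still be worth using when the base rate for success is low.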

In addition to the practical value of validity coefficients, the statistical significance of coefficients

should be noted. Statistical significance refers to the odds that a non-zero correlation could have

occurred by chance. If the odds are 1 in 20 that a non-zero correlation could have occurred by

chance, then the correlation is considered statistically significant. Some experts prefer even more

stringent odds, such as 1 in 100, although the generally accepted odds are 1 in 20. In statistical

analyses, these odds are designated by the lower case p (probability) to signify whether a non-

zero correlation is statistically significant. When p is less than or equal to .05, the odds are

presumed to be 1 in 20 (or less) that a non-zero correlation of that size could have occurred by

chance. When p is less than or equal to .01, the odds are presumed to be 1 in 100 (or less) that a

non-zero correlation of that size occurred by chance.
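To make the significance discussion concrete, a correlation can be tested against zero with the standard transformation t = r·sqrt((n − 2)/(1 − r²)) on n − 2 degrees of freedom. The sketch below (plain Python; the function name is our own) applies it to the r = .32, N = 89 coefficient reported in Table 6.1; the .01 two-tailed critical value of roughly 2.63 at 87 degrees of freedom is quoted from standard t tables.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing whether a sample correlation r, based on
    n cases, differs from zero (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# r = .32 with N = 89 (Analysis and Problem Solving, Table 6.1)
t = correlation_t_statistic(0.32, 89)
print(round(t, 2))  # 3.15, well above the ~2.63 critical value, so p < .01
```

This is consistent with the ** (p < .01) flag attached to that coefficient in Table 6.1.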

In a study of ANRA criterion-related validity, we examined the relationship between ANRA

scores and on-the-job performance of job incumbents in various occupations (mostly finance-

related occupations) and position levels (mainly professionals, managers, and directors). Job

performance was defined as supervisory ratings on behaviors determined through research to be

important to most professional, managerial, and executive jobs. The study found that ANRA

scores correlated .32 with supervisory ratings on a dimension made up of Analysis and Problem

Solving behaviors, and .36 with supervisory ratings on a dimension made up of Judgment and

Decision Making behaviors (see Table 6.1). Furthermore, ANRA scores correlated .36 with

supervisory ratings on a dimension composed of job behaviors dealing with Quantitative/

Professional Knowledge and Expertise. Supervisory ratings from the sum of ratings on 24 job

performance behaviors (“Total Performance”), as well as ratings on a single-item measure of

“Overall Potential” were also obtained. The ANRA scores correlated .44 with Total Performance

and .31 with ratings of Overall Potential. The correlation between ANRA scores and a single-item

supervisory rating of “Overall Performance” was .38.


Table 6.1 Evidence of ANRA Criterion-Related Validity (Total Raw Score) of Job Incumbents in Various Finance-Related Occupations and Position Levels

Criterion                                            N     Mean    SD     r
Analysis and Problem Solving                         89    37.6    7.0    .32**
Judgment and Decision Making                         91    32.2    5.9    .36**
Quantitative/Professional Knowledge and Expertise    59    53.6    8.9    .36**
Total Performance (24 items)                         58   127.0   22.0    .44**
Overall Performance (single item)                    94     5.6    1.1    .38**
Overall Potential                                    94     3.4    1.1    .31**

** p < .01

In Table 6.1, the column entitled N gives the number of cases with valid supervisory ratings

on every job behavior in the specified criterion. The means and standard

deviations refer to the criteria ratings shown in the table. The validity coefficients appear in the

last column.

The criterion-related validity coefficients reported in Table 6.1 apply to the specific sample of job

incumbents mentioned in the table. These validity coefficients clearly indicate that ANRA is

likely to be very beneficial as an indicator of the criteria shown in Table 6.1. However, test users

should not automatically assume that these data constitute sole and sufficient justification for use

of ANRA. Inferring validity for one group of employees or candidates from data reported for

another group is not appropriate unless the organizations and job categories being compared are

demonstrably similar.

Careful examination of Table 6.1 can help test users make an informed judgment about the

appropriateness of ANRA for their own organization. However, the data presented here are not

intended to serve as a substitute for locally obtained validity data. Local validity studies, together

with locally derived norms, provide a sound basis for determining the most appropriate use of

ANRA. Hence, whenever technically feasible, test users should study the validity of ANRA, or

any selection test, at their own location or organization.

Sometimes it is not possible for a test user to conduct a local validation study. There may be too

few incumbents in a particular job, an unbiased and reliable measure of job performance may not

be available, or there may not be a sufficient range in the ratings of job performance to justify the

computation of validity coefficients. In such circumstances, evidence of a test’s validity reported

elsewhere may be relevant, provided that the data refer to comparable jobs.


Correlations Between ANRA Test 1 and Test 2

The correlation between Test 1 (Comparison of Quantities) and Test 2 (Sufficiency of

Information) of ANRA was 0.71 (N = 452, p < .0001). This correlation is clearly significant and

also lower than the reliability of either test shown in Table 5.2 in chapter 5. This evidence

suggests that ANRA effectively samples both of these reasoning domains within the broader

conception of numerical reasoning (Rust, 2002).

Evidence of Convergent and Discriminant Validity

Convergent evidence is provided when scores on a test relate to scores on other tests or variables

that purport to measure similar traits or constructs. Evidence of relations with other variables can

involve experimental (or quasi-experimental) as well as correlational evidence (AERA et al.,

1999). Discriminant evidence is provided when scores on a test do not relate closely to scores on

tests or variables that measure different traits or constructs.

Correlations Between ANRA and Watson-Glaser Critical Thinking Appraisal—Short Form

Correlations between ANRA and the Watson-Glaser Critical Thinking Appraisal®—Short Form

(see Table 6.2) suggest that the tests are measuring a common general ability. Evidence for the

validity of the Watson-Glaser as a measure of critical thinking and reasoning appears in the

Watson-Glaser Short Form Manual (Watson & Glaser, 2006). The data in Table 6.2 suggest that

ANRA also measures reasoning ability.

The fact that the correlations between ANRA and the Watson-Glaser Short Form tests are lower

than the inter-correlation between the two ANRA tests suggests that ANRA also measures some

distinct aspect of reasoning that is not measured by the Watson-Glaser (Rust, 2002).

Table 6.2 Correlations Between Watson-Glaser Critical Thinking Appraisal—Short Form and ANRA (N = 452)

Watson-Glaser                           ANRA Test 1:    ANRA Test 2:     ANRA Total
                                        Comparison of   Sufficiency of   Raw Score
                                        Quantities      Information
Short Form Total Raw Score                  .65             .61             .68
Test 1: Inference                           .48             .47             .52
Test 2: Recognition of Assumptions          .40             .36             .41
Test 3: Deduction                           .53             .51             .56
Test 4: Interpretation                      .60             .51             .60
Test 5: Evaluation of Arguments             .35             .36             .39

Note. For all the correlations, p < .001


Correlations Between ANRA and Other Tests

In addition to the correlations with the Watson-Glaser, we also examined the correlations between

ANRA and two other tests: Miller Analogies Test for Professional Selection (N = 67), and the

DAT for Personnel and Career Assessment–Numerical Ability (N = 80). As would be expected,

ANRA correlated more strongly with the Numerical Ability test of the DAT for PCA (r = .70, p < .001)

than with the MAT for PS (r = .57, p < .001). Details of these results, which suggest convergent

as well as discriminant validity, are shown in Table 6.3.

Table 6.3 Correlations Between ANRA, the Miller Analogies Test for Professional Selection (MAT for PS), and the Differential Aptitude Tests for Personnel and Career Assessment—Numerical Ability (DAT for PCA—NA)

ANRA                                       MAT for PS   DAT for PCA—NA
                                           (N = 67)     (N = 80)
ANRA Total Raw Score                          .57           .70
ANRA Test 1: Comparison of Quantities         .50           .69
ANRA Test 2: Sufficiency of Information       .50           .57

Note. For all the correlations, p < .001


Chapter 7

Using ANRA as an Employment Selection Tool

ANRA was developed for use in adult employment selection. It may be used to predict success in

jobs that require application of numerical reasoning skills. ANRA can also be useful in

monitoring the effectiveness of numerical reasoning instruction and training programs, and in

researching the relationship between numerical reasoning and other abilities or skills.

Employment Selection

Many organizations use testing as a component of their employment selection process.

Employment selection programs typically use cognitive ability tests, aptitude tests, personality

tests, basic skills tests, and work values tests to screen out unqualified candidates, to categorize

prospective employees according to their probability of success on the job, or to rank order a

group of candidates according to merit.

ANRA was designed to assist in the selection of employees for jobs that require numerical

reasoning. Many finance-related, project-management, and technical professions require the type

of numerical reasoning ability measured by ANRA. The test is useful to assess applicants for a

variety of jobs, such as Accountant, Account Manager, Actuary, Banking Manager, Business

Analyst, Business Development Manager, Business Unit Leader, Finance Analyst, Loan Officer,

Project Manager, Inventory Planning Analyst, Procurement or Purchasing Manager, and

leadership positions with financial responsibilities.

It should not be assumed that the type of numerical reasoning required in a particular job is

identical to that measured by ANRA. Job analysis and local validation of ANRA for selection

purposes should follow accepted human resource research procedures, and conform to existing

guidelines concerning fair employment practices. In addition, no single test score can possibly

capture all of the knowledge and skills necessary for success in a job.

Using ANRA in Making a Hiring Decision

It is ultimately the responsibility of the hiring authority to determine how it uses ANRA scores.

We recommend that if the hiring authority establishes a cut score, examinees’ scores should be

considered in the context of appropriate measurement data for the test, such as the standard error

of measurement and data regarding the predictive validity of the test. In addition, we recommend


that selection decisions be based on multiple job-relevant tools rather than on any single test

(e.g., using only ANRA scores to make employment decisions).
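One way to take the standard error of measurement into account, as recommended above, is to report a confidence band around an observed score rather than the score alone. The sketch below assumes the conventional formula SEM = SD·sqrt(1 − reliability); the reliability value and raw score in the example are hypothetical, not figures from this manual.

```python
import math

def score_band(observed, sd, reliability, z=1.96):
    """Approximate 95% confidence band around an observed raw score,
    using SEM = SD * sqrt(1 - reliability)."""
    sem = sd * math.sqrt(1 - reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical example: norm-group SD of 6.4 and an assumed reliability of .85
low, high = score_band(22, 6.4, 0.85)
```

Two candidates whose bands overlap a cut score should not be treated as clearly above or below it.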

Human resource professionals can look at the percentile rank that corresponds to the candidate’s

raw score in several ways. Candidates’ scores may be rank ordered by percentiles so that those

with the highest scores are considered further. Alternatively, a cut score (e.g., the 50th percentile)

may be established so that candidates who score below the cut score are not considered further. In

general, the higher the cut score is set, the higher the likelihood that a given candidate who scores

above that cut score will be successful. However, the need to select high scoring candidates

typically needs to be balanced with situational factors, such as the need to keep jobs filled and the

supply of talent in the local labor market.
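The percentile-rank comparison described above can be sketched as follows. The norm scores here are a hypothetical toy list, and the mid-rank treatment of ties is one common convention, not necessarily the one used for ANRA norms.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(norm_scores, raw):
    """Percentile rank of a raw score within a sorted list of norm-group
    scores: percent scoring below, counting ties as half."""
    below = bisect_left(norm_scores, raw)
    ties = bisect_right(norm_scores, raw) - below
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)

# Toy norm group (hypothetical raw scores, already sorted)
norms = [12, 15, 18, 20, 21, 22, 24, 27, 30, 33]
pr = percentile_rank(norms, 22)  # a candidate at the 55th percentile of this toy group
```

A candidate's percentile can then be compared against whatever cut score (e.g., the 50th percentile) the hiring authority has set.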

When interpreting ANRA scores, it is useful to know the specific behaviors that an applicant with

a high ANRA score may be expected to exhibit. These behaviors, as rated by supervisors, were

consistently found to be related to ANRA scores across different occupations requiring numerical

reasoning. In general, candidates who score low on ANRA may find it challenging to effectively

demonstrate these behaviors. Conversely, candidates who score high on ANRA are likely to

display a higher level of competence in the following behaviors:

• Uses quantitative reasoning to solve job-related problems.

• Learns new numerical concepts quickly.

• Applies sound logic and reasoning when making decisions.

• Demonstrates knowledge of financial indicators and their implications.

• Breaks down information into essential parts or underlying principles.

• Readily integrates new information into problem-solving and decision-making

processes.

• Recognizes differences and similarities in situations or events.

• Engages in a broad analysis of relevant information before making decisions.

• Probes deeply to understand the root causes of problems.

• Reviews financial statements, sales reports, and/or other financial data when

planning.

• Accurately assesses the financial value of things (e.g., worth of assets) or people

(e.g., credit worthiness).


Human resource professionals who use ANRA should document and examine the relationship

between applicants’ scores and their subsequent performance on the job. Using locally obtained

criterion-related validity information provides the best foundation for interpreting scores and

most effectively differentiating examinees who are likely to be successful from those who are not.

Pearson does not establish or recommend a passing score for ANRA.

Differences in Reading Ability, Including the Use of English as a Second Language

Though ANRA is a mathematical test, a level of reading proficiency in the English language is

assumed and reflected in the items. Where ANRA is being used to measure the numerical

reasoning capabilities of a group, for some of whom English is not their first language, reasonable

precautions need to be taken. If a candidate experiences difficulty with the language or the

reading level of the test, note this information and consider it when interpreting the test scores. In

some cases, it may be more appropriate to test such individuals with another assessment

procedure that fully accommodates their language of preference or familiarity.

Using ANRA as a Guide for Training, Learning, and Education

Critical thinking, numerical or otherwise, is trainable (Halpern, 1998; Paul & Nosich, 2004).

Thus, when interpreting test scores on ANRA, it is important to bear in mind the extent to which

training may have influenced the scores. The ability to think critically has long been recognized

as a desirable educational objective and studies that have been done in educational settings

demonstrate that critical thinking can be improved as a result of training directed to this end (Hill,

1959; Kosonen & Winne, 1995; Nisbett, 1993; Perkins & Grotzer, 1997).

Scores on ANRA are likely to be influenced by factors associated with training. Typically,

individuals will differ in the extent to which such training has been made available to them.

Although traditional classes in math and science in school are important, many of these classes

involve computational arithmetic and other lower-order thinking skills, such as the rote

application of rules that have been learned. Training in higher-order numerical reasoning during

the school years will often have been indirect and largely dependent on the overall quality of

education available to the individual. Consequently, this indirect training would likely depend on

the amount of time spent in education or learning. Furthermore, the extent to which numerical

reasoning skills are trainable will likely differ between individuals.


Fairness in Selection Testing

Fair employment regulations and their interpretation are continuously subject to changes in the

legal, social, and political environments. Therefore, ANRA users should consult with qualified

legal advisors and human resources professionals as appropriate.

Legal Considerations

Governmental and professional regulations cover the use of all personnel selection procedures.

Relevant source documents that the user may wish to consult include the Standards for

Educational and Psychological Testing (AERA et al., 1999); the Principles for the Validation and

Use of Personnel Selection Procedures (Society for Industrial and Organizational Psychology,

2003); and the federal Uniform Guidelines on Employee Selection Procedures (Equal

Employment Opportunity Commission, 1978). For an overview of the statutes and types of legal

proceedings that influence an organization’s equal employment opportunity obligations, the user

is referred to Cascio and Aguinis (2005) or the U.S. Department of Labor’s (1999) Testing and

Assessment: An Employer’s Guide to Good Practices.

Group Differences and Adverse Impact

Local validation is particularly important when a selection test may have adverse impact.

According to the Uniform Guidelines on Employee Selection Procedures (Equal Employment

Opportunity Commission, 1978), adverse impact is indicated when the selection rate for one

group is less than 80% (or 4 out of 5) of another group. Adverse impact is likely to occur with

cognitive ability tests such as ANRA.
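The four-fifths (80%) check described above can be computed directly from selection counts. The sketch below is illustrative; the applicant and selection numbers are hypothetical.

```python
def adverse_impact_ratio(selected_a, applicants_a, selected_b, applicants_b):
    """Ratio of the lower group's selection rate to the higher group's;
    a value below 0.80 indicates adverse impact under the Uniform Guidelines."""
    rate_a = selected_a / applicants_a
    rate_b = selected_b / applicants_b
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# Hypothetical pools: 40 of 100 selected in one group vs. 25 of 90 in another
ratio = adverse_impact_ratio(40, 100, 25, 90)
print(ratio < 0.80)  # True: adverse impact would be indicated
```

Selection rates of .40 and about .28 give a ratio of roughly .69, below the four-fifths threshold.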

Although it is within the law to use a test with adverse impact (Equal Employment Opportunity

Commission, 1978), the testing organization must be prepared to demonstrate that the selection

test is job-related and consistent with business necessity. The Civil Rights Act of 1991, as

amended, defined “business necessity” to mean that, “in the case of employment practices

involving selection …, the practice or group of practices must bear a significant relationship to

successful performance of the job” (Section 3 (o) (1) (A)). In deciding whether the standards for

business necessity have been met, the Civil Rights Act of 1991 states that “demonstrable

evidence is required.” The Act provides examples of “demonstrable evidence” as “statistical

reports, validation studies, expert testimony, prior successful experience and other evidence as

permitted by the Federal Rules of Evidence” (Section 3 (o) (1) (B)).

A local validation study, in which ANRA scores are correlated with job performance indicators,

can provide evidence to support the use of the test in a particular job context. An evaluation that


demonstrates that ANRA (or any employment assessment tool) is equally predictive for protected

subgroups, as outlined by the Equal Employment Opportunity Commission, will assist in the

demonstration of fairness of the test. For example, from the results of their review of 22 cases in

U.S. Appellate and District Courts involving cognitive ability testing in class-action suits,

Shoenfelt and Pedigo (2005, p. 6) reported that “organizations that utilize professionally

developed standardized cognitive ability tests that are validated and that set cutoff scores

supported by the validation study data are likely to fare well in court.”

Monitoring the Selection System

An organization’s abilities to evaluate selection strategies and to implement fair employment

practices depend on its awareness of the demographic characteristics of applicants and

incumbents. Monitoring these characteristics and accumulating test score data are clearly

necessary for establishing legal defensibility of a selection system, including those systems that

incorporate ANRA. The most effective use of ANRA is with a local norms database that is

regularly updated and monitored.

The hiring organization should ensure that its selection process is clearly job related and focuses

on characteristics that are important to job success. Good tests that are appropriate to the job in

question can contribute a great deal towards monitoring and minimizing the major sources of bias

in the selection procedures. ANRA is a reliable and valid instrument for the assessment of

numerical reasoning. When used for the assessment of candidates or incumbents for work that

requires this skill, ANRA can be useful in selecting the better candidates. However, where

candidates drawn from different sub-groups of the population are deficient in numerical

reasoning skills because they were not provided the necessary educational environment during

schooling, there is a risk of overlooking candidates who could develop this skill but have not

had the opportunity to do so. Employers can reasonably expect that

candidates should have achieved all the necessary basic skills before applying for the job.

However, in circumstances where adverse impact is manifest, an organization might wish to

consider ways in which it can contribute to the reduction of adverse impact. This approach might

take the form of providing training courses to employees in the deficient skill areas, or of

increasing involvement with the local community to identify ways in which the community might

assist, or of re-evaluating recruitment strategy, for example, by advertising job positions more

widely or through different media.


References

Allen, M.J., & Yen, W.M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: Author.

American Institute of Certified Public Accountants, AICPA (1999). Broad business perspective competencies. Retrieved February 27, 2006, from http://www.aicpa.org/edu/bbfin.htm

Americans With Disabilities Act of 1990, Titles I & V (Pub. L. 101-336). United States Code, Volume 42, Sections 12101–12213.

Anastasi, A. & Urbina, S. (1997). Psychological testing (7th ed.). New York: Macmillan.

Brannon, E.M. (2002). The development of ordinal numerical knowledge in infancy. Cognition, 83, 223–240.

Cascio, W.F. (1991). Applied psychology in personnel management (4th ed.). Englewood Cliffs, NJ: Prentice Hall.

Cascio, W. F., & Aguinis, H. (2005). Applied psychology in human resource management (6th ed.). Upper Saddle River, NJ: Prentice Hall.

Civil Rights Act of 1991. 102nd Congress, 1st Session, H.R.1. Retrieved August 4, 2006, from http://usinfo.state.gov/usa/infousa/laws/majorlaw/civil91.htm

Cohen, B.H. (1996). Explaining psychological statistics. Pacific Grove, CA: Brooks & Cole.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Cronbach, L.J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row.

Equal Employment Opportunity Commission. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43(166), 38295–38309.

Facione, P.A. (2006). Critical Thinking: What It Is and Why It Counts–2006 Update. Retrieved July 28, 2006 from http://www.insightassessment.com/pdf_files/what&why2006.pdf

Feigenson, L, Dehaene, S., & Spelke, E. (2004). Core systems of number. Trends in Cognitive Sciences, 8, 307–314.

Halpern, D. F. (1998) Teaching critical thinking for transfer across domains: Dispositions, skills, structure training, and metacognitive monitoring. American Psychologist, 53, 449–455.

Hill, W. H. (1959). Review of Watson-Glaser Critical Thinking Appraisal. In O.K. Buros (Ed.), The fifth mental measurements yearbook. Lincoln: University of Nebraska Press.


Hunt, E. (1995). Will we be smart enough? New York: Russell Sage Foundation.

Kealy, B.T., Holland, J., & Watson, M. (2005). Preliminary evidence on the association between critical thinking and performance in principles of accounting. Issues in Accounting Education, 20 (1), 33–47.

Kosonen, P. & Winne, P. H. (1995). Effects of teaching statistical laws on reasoning about everyday problems. Journal of Educational Psychology, 87, 33–46.

Lawshe, C.H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563–575.

National Education Goals Panel. (1991). The national education goals report. Washington, DC: U.S. Government Printing Office

Nijenhuis, J., & Flier, H. (2005). Immigrant-majority group differences on work-related measures: the case for cognitive complexity. Personality and Individual Differences, 38, 1213–1221.

Nisbett, R. E. (Ed.) (1993). Rules for reasoning. Hillsdale, NJ: Lawrence Erlbaum

Nunnally, J.C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.

O*Net OnLine (2005). Skill searches for: Mathematics, Critical Thinking. Occupational Information Network: O*Net OnLine. Retrieved July 17, 2006 via O*Net OnLine Access: http://online.onetcenter.org/skills/result?s=2.A.2.a&s=2.A.1.e&g=Go

Paul, R., & Nosich, G.M. (2004). A Model for the National Assessment of Higher Order Thinking. Retrieved July 13, 2006, from http://www.criticalthinking.org/resources/articles/a-model-nal-assessment-hot.shtml

Perkins, D. N. & Grotzer, T. A. (1997). Teaching intelligence. American Psychologist, 52, 1125–1133.

Rust, J. (2002). Rust Advanced Numerical Reasoning Appraisal Manual. London: The Psychological Corporation.

Shoenfelt, E.L., & Pedigo, L.C. (2005, April). A Review of Court Decisions on Cognitive Ability Testing, 1992-2004. Poster Presentation at the 20th Annual Conference of the Society for Industrial and Organizational Psychology, Los Angeles, CA.

Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel selection procedures (4th ed.). Bowling Green, OH: Author.

Spelke, E. S. (2005). Sex differences in intrinsic aptitude for mathematics and science? A critical review. American Psychologist, 60, 950–958.

Starkey, P. (1992). The early development of numerical reasoning. Cognition, 43, 93–126.

U.S. Department of Labor. (1999). Testing and assessment: An employer’s guide to good practices. Washington, DC: Author.

Vandenbroucke, J. P. (1998). Clinical investigation in the 20th century: The ascendancy of numerical reasoning. The Lancet, 175(352), 12–16.


Watson, G. B. & Glaser, E. M. (2006) Watson–Glaser Critical Thinking Appraisal Short Form Manual. San Antonio, TX: Pearson.

Wynn, K., Bloom, P., & Chiang, W. (2002). Enumeration of collective entities by 5-month-old infants. Cognition, 83, B55–B62.


Appendix A

Description of the Normative Sample

The normative information provided below is based on data collected from February 2006 through June 2006.

Table A.1 Description of the Normative Sample by Occupation

Employees in Various Financial Occupations
N = 198, Mean = 21.9, SD = 6.4

Occupations in the Financial Occupations norm group:
  Accountants = 6.1%
  Accounting Analysts = 1.5%
  Actuaries = 32.3%
  Auditors = 1.0%
  Banking Supervisors/Managers = 5.1%
  Billing Coordinators = 1.0%
  Bookkeepers = 2.0%
  Business Analysts = 3.5%
  Business Specialists = 0.5%
  Buyers = 2.5%
  Chief Financial Officers = 2.5%
  Claims Adjusters = 1.0%
  Collections Supervisors/Managers = 1.0%
  Comptrollers/Controllers = 2.0%
  Finance Analysts/Managers = 17.7%
  Finance or Budget Estimators = 0.5%
  Financial Planners = 3.0%
  Insurance Agents = 2.5%
  Insurance Analysts = 0.5%
  Insurance Brokers = 2.0%
  Loan Officers = 2.0%
  Procurement or Purchasing Officers/Managers = 9.6%

Table A.2 Description of the Normative Sample by Position Level

Executives/Directors
Executive- and Director-level positions within various industries.
N = 91, Mean = 21.3, SD = 6.0

Industry Characteristics:
  Financial Services/Banking/Insurance = 53.9%
  Government/Public Service/Defense = 7.7%
  Professional Business Services/Consulting = 6.6%
  Publishing/Printing = 12.1%
  Real Estate = 1.1%
  Retail/Wholesale = 2.2%
  Other (unspecified) = 16.5%

Managers
Manager-level positions within various industries.
N = 88, Mean = 20.1, SD = 5.6

Industry Characteristics:
  Financial Services/Banking/Insurance = 38.6%
  Government/Public Service/Defense = 19.3%
  Professional Business Services/Consulting = 10.2%
  Publishing/Printing = 12.5%
  Real Estate = 2.3%
  Retail/Wholesale = 1.1%
  Other (unspecified) = 14.8%

Professionals/Individual Contributors
Professional-level and individual-contributor positions within various industries.
N = 200, Mean = 22.1, SD = 6.4

Industry Characteristics:
  Financial Services/Banking/Insurance = 23.0%
  Government/Public Service/Defense = 36.5%
  Professional Business Services/Consulting = 12.5%
  Publishing/Printing = 7.5%
  Real Estate = 1.0%
  Retail/Wholesale = 1.5%
  Other (unspecified) = 16.5%
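The three position-level groups in Table A.2 can be pooled into an overall mean by weighting each group mean by its sample size. A minimal sketch of that calculation; the pooled value below is derived from the table, not a figure this manual reports:

```python
def pooled_mean(groups):
    """Sample-size-weighted mean across norm groups: sum(n_i * m_i) / sum(n_i)."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

# (N, Mean) pairs from Table A.2: Executives/Directors, Managers,
# Professionals/Individual Contributors
groups = [(91, 21.3), (88, 20.1), (200, 22.1)]
print(round(pooled_mean(groups), 1))  # 21.4
```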

Appendix B

ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores by Norm Group

Table B.1 ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores by Position Level

ANRA Total               Percentile Ranks by Position Level
Raw Score   Executives/Directors   Managers   Professionals/Individual Contributors   T Score
32          ≥99                    ≥99        ≥99                                     ≥68
31          ≥99                    ≥99        98                                      66
30          96                     98         94                                      65
29          92                     95         88                                      63
28          87                     91         81                                      62
27          81                     87         73                                      60
26          76                     82         66                                      58
25          67                     77         60                                      57
24          58                     72         53                                      55
23          54                     66         47                                      54
22          48                     61         42                                      52
21          43                     53         38                                      50
20          40                     46         34                                      49
19          34                     43         29                                      47
18          30                     37         26                                      46
17          26                     31         23                                      44
16          23                     27         19                                      43
15          18                     21         16                                      41
14          13                     15         13                                      39
13          9                      12         11                                      38
12          7                      10         9                                       36
11          6                      6          7                                       35
10          5                      3          6                                       33
9           4                      2          5                                       31
8           3                      ≤1         4                                       30
7           2                      ≤1         2                                       28
6           ≤1                     ≤1         ≤1                                      27
5           ≤1                     ≤1         ≤1                                      25
4           ≤1                     ≤1         ≤1                                      23
3           ≤1                     ≤1         ≤1                                      22
2           ≤1                     ≤1         ≤1                                      20
1           ≤1                     ≤1         ≤1                                      19
0           ≤1                     ≤1         ≤1                                      17

Executives/Directors: Raw Score Mean = 21.3, SD = 6.0, N = 91
Managers: Raw Score Mean = 20.1, SD = 5.6, N = 88
Professionals/Individual Contributors: Raw Score Mean = 22.1, SD = 6.4, N = 200
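The T scores in Table B.1 follow the standard linear T-score convention, under which the group mean maps to 50 and each standard deviation spans 10 T points. A minimal sketch of that transformation, assuming this convention (the exact derivation of the tabled T scores is defined in the body of the manual, so treat the formula here as illustrative); the mean and SD used are the Executives/Directors values from the table footer:

```python
def t_score(raw, mean, sd):
    """Linear T-score transformation: the mean maps to 50, one SD equals 10 T points."""
    return 50 + 10 * (raw - mean) / sd

# Executives/Directors norms from Table B.1: Mean = 21.3, SD = 6.0
print(round(t_score(21.3, 21.3, 6.0)))  # 50: a raw score at the mean
print(round(t_score(27.3, 21.3, 6.0)))  # 60: one SD above the mean
```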

Table B.2 ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores for Employees in Various Financial Occupations (see Table A.1 for a list of the occupations in this norm group)

ANRA Total   Percentile Ranks for Employees
Raw Score    in Financial Occupations         T Score
32           ≥99                              ≥68
31           98                               66
30           93                               65
29           86                               63
28           78                               62
27           71                               60
26           65                               58
25           60                               57
24           55                               55
23           52                               54
22           47                               52
21           42                               50
20           37                               49
19           33                               47
18           30                               46
17           26                               44
16           22                               43
15           19                               41
14           15                               39
13           11                               38
12           8                                36
11           6                                35
10           4                                33
9            3                                31
8            2                                30
7            ≤1                               28
6            ≤1                               27
5            ≤1                               25
4            ≤1                               23
3            ≤1                               22
2            ≤1                               20
1            ≤1                               19
0            ≤1                               17

Raw Score: Mean = 21.9, SD = 6.4, N = 198
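The "mid-point" percentile ranks reported in Appendix B credit half of the examinees who obtained a given raw score, i.e. PR = 100 × (cases below the score + ½ cases at the score) / N. A sketch of that convention; the frequency data below are invented for illustration and are not the ANRA normative sample:

```python
from collections import Counter

def midpoint_percentile_ranks(scores):
    """Mid-point percentile rank: percent of cases below a score plus half the cases at it."""
    n = len(scores)
    freq = Counter(scores)
    ranks, below = {}, 0
    for score in sorted(freq):
        ranks[score] = 100 * (below + freq[score] / 2) / n
        below += freq[score]
    return ranks

# Illustrative data only -- not the ANRA normative sample
sample = [10, 12, 12, 15, 18, 18, 18, 20, 22, 25]
ranks = midpoint_percentile_ranks(sample)
print(ranks[18])  # 55.0: four cases below 18 plus half of the three cases at 18
```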

Appendix C

Combined Watson-Glaser and ANRA T Scores and Percentile Ranks by Norm Group

Table C.1 Combined Watson-Glaser Short Form and ANRA T Scores and Percentile Ranks by Position Level

Combined               Percentile Ranks by Position Level
T Score    Executives/Directors   Managers   Professionals/Individual Contributors
≥135       ≥99                    ≥99        ≥99
134        ≥99                    ≥99        ≥99
133        ≥99                    ≥99        ≥99
132        ≥99                    ≥99        ≥99
131        ≥99                    ≥99        ≥99
130        98                     ≥99        ≥99
129        98                     ≥99        ≥99
128        97                     ≥99        ≥99
127        95                     98         97
126        93                     97         95
125        92                     97         93
124        91                     96         90
123        88                     94         88
122        85                     91         84
121        82                     89         81
120        82                     88         77
119        80                     87         74
118        77                     86         72
117        75                     86         70
116        73                     84         67
115        71                     81         65
114        69                     79         63
113        66                     78         60
112        63                     78         58
111        62                     77         57
110        61                     75         54
109        60                     73         51
108        58                     71         48
107        56                     70         46
106        55                     67         44
105        53                     64         42
104        51                     61         41
103        48                     60         39
102        46                     57         37
101        43                     55         36
100        42                     54         35
99         41                     52         33
98         40                     51         31
97         38                     48         30
96         37                     44         29
95         34                     42         28
94         31                     39         27
93         29                     36         26
92         27                     34         25
91         27                     32         23
90         25                     29         22
89         22                     27         22
88         21                     26         21
87         20                     25         19
86         20                     24         18
85         19                     23         17
84         18                     21         17
83         17                     19         16
82         16                     18         14
81         16                     16         13
80         16                     15         12
79         16                     14         11
78         14                     14         10
77         11                     13         9
76         10                     12         8
75         9                      12         8
74         8                      11         8
73         7                      9          8
72         6                      7          7
71         6                      5          7
70         6                      3          7
69         5                      3          7
68         4                      2          6
67         4                      2          5
66         3                      2          5
65         3                      2          5
64         3                      2          4
63         2                      ≤1         4
62         2                      ≤1         3
61         ≤1                     ≤1         3
60         ≤1                     ≤1         2
59         ≤1                     ≤1         2
58         ≤1                     ≤1         2
57         ≤1                     ≤1         2
56         ≤1                     ≤1         2
55         ≤1                     ≤1         ≤1
54         ≤1                     ≤1         ≤1
53         ≤1                     ≤1         ≤1
52         ≤1                     ≤1         ≤1
51         ≤1                     ≤1         ≤1
≤50        ≤1                     ≤1         ≤1

Executives/Directors: N = 91; Managers: N = 88; Professionals/Individual Contributors: N = 200

Table C.2 Combined Watson-Glaser Short Form and ANRA T Scores and Percentile Ranks for Employees in Various Financial Occupations (see Table A.1 for a list of the occupations in this norm group)

Combined   Percentile Ranks for Employees
T Score    in Financial Occupations
≥135       ≥99
134        ≥99
133        ≥99
132        ≥99
131        ≥99
130        ≥99
129        ≥99
128        97
127        95
126        92
125        89
124        87
123        84
122        80
121        77
120        74
119        73
118        71
117        69
116        67
115        65
114        64
113        62
112        60
111        60
110        59
109        58
108        57
107        55
106        54
105        52
104        50
103        48
102        47
101        45
100        43
99         40
98         38
97         37
96         37
95         35
94         34
93         32
92         31
91         29
90         28
89         27
88         27
87         26
86         25
85         24
84         22
83         20
82         19
81         18
80         17
79         16
78         14
77         12
76         10
75         9
74         8
73         7
72         5
71         4
70         4
69         4
68         4
67         3
66         3
65         2
64         2
63         ≤1
62         ≤1
61         ≤1
60         ≤1
59         ≤1
58         ≤1
57         ≤1
56         ≤1
55         ≤1
54         ≤1
53         ≤1
52         ≤1
51         ≤1
≤50        ≤1

N = 198
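The combined-score range in Tables C.1 and C.2 (from ≤50 up to ≥135) is consistent with simply summing an examinee's Watson-Glaser T score and ANRA T score, though the body of the manual, not this appendix, gives the official combination rule, so the sum below is an assumption. A sketch of the lookup, using a small fragment of the Financial Occupations column from Table C.2:

```python
def combined_t(watson_glaser_t, anra_t):
    # Assumed combination rule: the sum of the two tests' T scores.
    return watson_glaser_t + anra_t

# Fragment of Table C.2: combined T score -> percentile rank (Financial Occupations)
TABLE_C2 = {104: 50, 105: 52, 106: 54, 107: 55, 108: 57}

score = combined_t(54, 52)   # T scores of 54 and 52 combine to 106
print(TABLE_C2[score])       # 54: the percentile rank Table C.2 lists for 106
```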