Auditing: A Journal of Practice & Theory, American Accounting Association, Vol. 32, Supplement 1, 2013, pp. 99–129. DOI: 10.2308/ajpt-50394

Audit Sampling Research: A Synthesis and Implications for Future Research

Randal J. Elder, Abraham D. Akresh, Steven M. Glover, Julia L. Higgs, and Jonathan Liljegren
SUMMARY: While research has influenced auditing standards for audit sampling,
academic research provides limited insights into the current use of audit sampling. We
synthesize relevant research based on a sampling decision framework and suggest areas
for additional research. Important judgments include determining if sampling applies, what
type of sampling to apply (e.g., attribute or monetary sampling), whether to use statistical
or nonstatistical techniques, appropriate inputs to determine sample size, and evaluation
of results, particularly when errors are observed in the sample. Several of these
judgments may be influenced by environmental factors, such as regulation, litigation,
competition, culture, and technology, and there are a number of research opportunities
available in exploring how these environmental factors influence audit sampling decisions.
Research indicates that auditors may underestimate risks and required assurance in
order to reduce the extent of testing, although some of this research predates current risk
assessment standards, as well as recent regulatory changes. Research also indicates
auditors sometimes fail to project sample errors, and are prone to decision biases when
evaluating nonstatistical samples. More recent research finds low rates of sample errors
in many sampled populations, indicating that some sampling concerns may be mitigated
in the post-Sarbanes-Oxley (SOX) environment.

Randal J. Elder is a Professor at Syracuse University, Abraham D. Akresh is a Certified Public Accountant and Certified Government Financial Manager, Steven M. Glover is a Professor at Brigham Young University, Julia L. Higgs is an Associate Professor at Florida Atlantic University, and Jonathan Liljegren is a Manager at Freddie Mac.

The authors thank Jeff Cohen (editor) and two anonymous reviewers for their helpful comments that substantially improved the paper.

To facilitate the development of auditing and other professional standards and to inform regulators of insights from the academic auditing literature, the Auditing Section of the American Accounting Association (AAA) decided to develop a series of literature syntheses for the Public Company Accounting Oversight Board (PCAOB). This paper is authored by one of the research synthesis teams formed by the Auditing Section under this program. The views expressed in this paper are those of the authors and do not reflect an official position of the AAA or the Auditing Section. In addition, while discussions with the PCAOB staff helped us identify the issues that are most relevant to setting auditing and other professional standards, the author team was not selected or managed by the PCAOB, and the resulting paper expresses the views of the authors, which may or may not correspond to views held by the PCAOB and its staff.

Editor's note: Accepted by Jeffrey R. Cohen.

Submitted: April 2012; Accepted: January 2013; Published Online: January 2013
Keywords: audit sampling; audit evidence; post-SOX environment.
Data Availability: Please contact the authors.
INTRODUCTION
Audit sampling is a pervasive audit testing technique. The American Institute of Certified
Public Accountants (AICPA) and International Auditing and Assurance Standards Board
(IAASB) have recently updated audit standards and audit guides (e.g., AICPA [2011c]
AU-C 530 and the AICPA [2012a] Audit Sampling Audit Guide), and regulators such as the U.S.
Public Company Accounting Oversight Board (PCAOB) are currently considering various issues
related to the use of audit sampling, such as the advantages of statistical and nonstatistical sampling,
and under what conditions one approach might be preferable to the other. We provide a
synthesis of academic and practitioner research on audit sampling that will be useful for standard
setters in considering revisions to guidance and standards, and we identify areas for future research
opportunities.
We provide a framework of the audit sampling process based on existing auditing standards
and guidance. We then review relevant literature for each step in the audit sampling process. A fairly
extensive literature exists on some sampling issues, such as determination of sample size and
projection of misstatements found in the sample. An extensive, but generally dated, literature also
exists on various statistical sampling techniques. However, limited evidence exists for many
issues related to audit sampling, which raises a number of potential research questions.
Auditing standards and guidance on audit sampling have not changed significantly since SAS
No. 39 (AICPA 1981) and the first Audit Sampling Accounting and Auditing Guide (AICPA 1983).
However, a review of the literature suggests there have been major changes in sampling practices
over the last three decades. Limited evidence exists as to the reasons for these changes, and the
effect of the legal and regulatory environment in the U.S. and other countries on sampling
decisions. Research into the nature and reasons for these changes and comparisons of sampling
techniques across variations in a number of environmental factors, such as private versus public
company audits, regulatory regimes, competition, technology, cultures, and countries, would
provide insight into factors impacting auditors’ sampling decisions.
Current standards allow the use of both statistical and nonstatistical sampling methods, and
auditors’ use of statistical sampling appears to have varied over time. Limited research evidence
exists on the extent of the use of statistical and nonstatistical sampling for tests of controls and tests
of details, and how use of these methods has changed over time or across client characteristics or
other environmental factors. Little research evidence also exists as to the effectiveness of audit
sampling relative to other audit procedures, or the effectiveness of nonstatistical audit sampling
relative to statistical audit sampling in providing sufficient audit evidence in practical audit settings.
Research into the determinants of current sampling practices would help inform standard setting,
practice, future research, and audit education. Furthermore, when auditors select samples
statistically (e.g., randomly) and evaluate the results nonstatistically, research suggests they may
be prone to decision biases, particularly when they do not use a decision aid or template (Butler
1985). This may result in incorrect acceptance of populations. Additional research could examine
how auditors evaluate sample results nonstatistically.
Studies also indicate that auditors often select samples haphazardly (e.g., Hall et al. 2002).
There is some evidence that haphazard samples may not be selected in a way that would be
expected to be representative of some population characteristics (Hall et al. 2001, 2000). However,
there is little evidence on the effect of haphazard selection on the representativeness of the
selections with respect to the primary characteristic of interest in practical settings—error in the
population. Additional evidence is needed on how auditors select samples for tests of controls and
tests of details and whether the selection method leads to bias with respect to the primary
characteristics of the test.
A relatively recent development in the U.S. is the requirement for auditors to report on the
effectiveness of internal control over financial reporting. This requirement has increased the extent
and relative importance of tests of the operating effectiveness of internal controls over financial
reporting. However, we find that limited evidence exists on how auditors determine sample sizes
and evaluate sample results for attribute sampling in tests of controls. We do not know how the
inputs to sample size or other sampling decisions have changed over time or in response to
increases in the prevalence of automated controls. Although significant research has been
performed on auditor reporting on internal controls over financial reporting, we encourage research
into the underlying auditor testing of operating effectiveness of internal controls and whether the
audit sampling methods and decisions are different when testing the operating effectiveness of
controls for public and private companies, as well as in different reporting and regulatory
jurisdictions.
There is research indicating that auditors often underestimate risks in order to minimize the
extent of testing in tests of details (e.g., Kachelmeier and Messier 1990; Elder and Allen 2003),
which could potentially compromise audit effectiveness. Further research is needed on how
auditors’ risk assessment, audit strategy, and materiality judgments affect the application of audit
sampling in terms of when and how sampling is used, the level of assurance typically sought via
audit sampling, inputs to sample size, as well as selection and evaluation techniques.
Several studies (e.g., Burgstahler and Jiambalvo 1986; Elder and Allen 1998; Burgstahler et al.
2000) find that auditors may not consistently project sample misstatements as required by auditing
standards, which could lead to incorrect acceptance of accounting populations. However, more
recent research by Durney et al. (2012) suggests that when decision aids such as templates are used,
auditors do usually project misstatements observed in the sample to the population. Additional
research could examine current rates of error projection and why some auditors choose not to
project misstatements.
The next section briefly describes our research method, followed by a discussion of how
environmental factors influence audit sampling. The following section provides a summary of the
findings from the review of existing research. The final section presents our summary and
conclusions, as well as suggestions for future research.
METHOD
We first develop a model of the audit sampling process based on auditing standards and related
guidance, as well as some of the environmental factors that impact the use of audit sampling. These
environmental factors include the legal and regulatory environment, client complexity and use of
technology, and changes in audit approaches. These factors affect several parts of the audit
sampling process illustrated in Figure 1, especially the decision to use sampling, the form of
sampling used, and the sample size.
Based on the account and assertions to be tested, the nature of the population, and the
assurance needed, the auditor first determines whether sampling is necessary. Additional
considerations include whether the tests are designed to obtain evidence of control effectiveness,
substantive assurance, or dual-purpose, and whether the test is to be the primary source of evidence
about the assertion, or one of several tests of the assertion. The auditor then determines the
objectives addressed by the sampling application and specifies deviation or misstatement
conditions. If sampling applies, the auditor also decides whether to use statistical or nonstatistical
FIGURE 1
General Audit Sampling Process
sampling, and whether the sample is intended to gather evidence on binary characteristics (i.e.,
attribute sampling) or on monetary balances for tests of details.
The auditor then determines the necessary sample size based on various inputs and selects the
sample items for testing. Auditing standards (e.g., PCAOB AU 350) indicate that sample items
should be selected in a manner that is expected to be representative of the population, and suggest
two methods of obtaining representative selections: haphazard and random-based selection
methods.1 After performing the audit procedures, the auditor then determines the underlying cause
of deviations or misstatements (e.g., error or fraud). The auditor then projects the errors observed in
the sample to the population and draws conclusions. Figure 2 provides further detail on the specific
steps involved in audit sampling for attribute sampling and monetary tests of details sampling
applications.
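As a minimal illustration of the quantitative steps described above, the following sketch computes an attribute sample size from planning inputs and projects a sample misstatement to the population by ratio projection. The reliability factors, rates, and dollar amounts are our own illustrative assumptions, not values taken from the standards or the studies synthesized here.

```python
import math

# Illustrative Poisson-based reliability (confidence) factors for zero
# expected deviations; these particular values are assumptions.
RELIABILITY_FACTORS = {90: 2.31, 95: 3.00}

def attribute_sample_size(confidence_pct, tolerable_pct, expected_pct=0.0):
    """Approximate attribute sample size as n = R / (tolerable - expected),
    with deviation rates expressed in percent."""
    r = RELIABILITY_FACTORS[confidence_pct]
    return math.ceil(100.0 * r / (tolerable_pct - expected_pct))

def ratio_projection(sample_misstatement, sample_value, population_value):
    """Project misstatement found in the sample to the population in
    proportion to recorded monetary value (ratio projection)."""
    return sample_misstatement / sample_value * population_value

# Hypothetical planning inputs: 95 percent confidence, 5 percent tolerable
# deviation rate, no expected deviations.
n = attribute_sample_size(95, 5)

# Hypothetical evaluation: a $2,000 misstatement in a $100,000 sample drawn
# from a $5,000,000 population.
projected = ratio_projection(2_000, 100_000, 5_000_000)
```

The projected misstatement, plus an allowance for sampling risk, is then compared with tolerable misstatement when concluding on the population.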
ENVIRONMENTAL FACTORS IMPACTING AUDIT SAMPLING
Audit sampling is one of the most fundamental testing procedures used to gather audit
evidence, and it has undergone significant change during the history of modern auditing. Before
the start of the twentieth century, many audits included an examination of every transaction
included in the financial statements. As companies increased in size, auditors often applied audit
sampling. In 1955, the American Institute of Accountants (predecessor to the AICPA) published
A Case Study of the Extent of Audit Samples, which was one of the first publications on audit
sampling, and also recognized the relationship between the extent of testing and the effectiveness
of internal control. As the use of sampling increased, so did interest in applying statistical
sampling.
The 1970s and 1980s saw extensive use of statistical sampling, and many research studies were
published that addressed the performance of various statistical sampling approaches. However,
research did not address the effect of factors such as budgetary or competitive pressure, legal
jurisdictions, regulation, or technology on the decision to use statistical sampling, or the
judgments and techniques involved in effectively using audit sampling. In
1981, the AICPA’s Auditing Standards Board (ASB) issued SAS No. 39, Audit Sampling, and in
1983, the AICPA issued its first Audit Sampling Audit Guide.
SAS No. 58 changed the wording of the standard unqualified audit report to include
terminology that audit procedures are performed on a ‘‘test basis,’’ although this term is not defined.
Asare and Wright (2012) administered a company scenario involving an audit report to auditors,
bankers, and investors. The bankers and investors believed that ‘‘test basis’’ involved examining
larger samples than auditors actually use. Notably, the current audit report, under the AICPA
clarified auditing standards (AU 700-C) effective for periods ending on or after December 15,
2012, and IAASB auditing standards (ISA 700), no longer uses the term ‘‘test basis’’ (AICPA
2011d, IAASB 2009). The report does state ‘‘the procedures selected depend on the auditor’s
judgment, including the assessment of the risks of material misstatement of the financial statements,
whether due to error or fraud’’ (AU 700-C, A.58). We recommend further research similar to Asare
1 Random-based selection includes random selection, stratified random selection, probability proportional to size selection, and systematic selection with random start(s). A random-based selection, regardless of how the extent of testing was determined, can be evaluated formally using statistical techniques or nonstatistically based on auditor judgment. A haphazard selection, which is selection without any conscious bias (that is, without any special reason for including or omitting items from the sample), is not careless, and is selected in a manner that can be expected to be representative of the population. A haphazard sample is evaluated nonstatistically, although a statistical evaluation could be used to inform auditor judgment as long as formal statistical conclusions are not drawn.
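The random-based selection methods named in the footnote can be sketched as follows. This is our own illustration of the general techniques, not any firm's methodology, and the population of invoice numbers and amounts is hypothetical.

```python
import random

def simple_random(items, n, seed=1):
    """Unrestricted random selection: every item equally likely."""
    rng = random.Random(seed)
    return rng.sample(items, n)

def systematic_random_start(items, n, seed=1):
    """Systematic selection with a random start: every k-th item."""
    rng = random.Random(seed)
    k = len(items) // n
    start = rng.randrange(k)
    return items[start::k][:n]

def pps(items, values, n, seed=1):
    """Probability-proportional-to-size selection: an item's chance of
    selection grows with its monetary value (selection with replacement)."""
    rng = random.Random(seed)
    return rng.choices(items, weights=values, k=n)

population = list(range(1, 101))       # hypothetical invoice numbers
values = [i * 10 for i in population]  # hypothetical invoice amounts

sample_a = simple_random(population, 10)
sample_b = systematic_random_start(population, 10)
sample_c = pps(population, values, 10)
```

Stratified random selection would simply apply `simple_random` within each stratum of the population.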
FIGURE 2
Specific Steps of Audit Sampling Processes
and Wright (2012) to address how financial statement users perceive the extent of audit sampling
implied in current audit report wording.
Despite the importance of audit sampling and significant changes in how it is applied, there is
almost no research that examines how audit sampling has changed over time and the reasons for
these changes. For example, recent increases in regulation and inspection, as well as weak global
economic conditions and increased audit competition, may have influenced the use of audit
sampling and sample sizes. Further, even though a common understanding of statistical concepts
and the use of technology such as off-the-shelf audit sampling software would suggest a similar
application of sampling with similar sample sizes across clients, industries, countries, and cultures,
the application of sampling requires significant auditor judgment—particularly in determining when
to use audit sampling, as well as the inputs to sample size (e.g., level of desired assurance, tolerable
error, expected error). There are a number of research opportunities to examine how these
environmental factors have influenced the use of audit sampling, as well as the comparability of
sample sizes and evaluation approaches. For example, is audit sampling more or less common for
public versus private companies? Are sample sizes larger for public company audits than for private
company audits? How is sample size influenced by culture, regulation, technology, and
competition?
Sullivan (1992) was the first to note that the then-Big 6 firms were using nonstatistical
sampling for almost all testing. However, other than noting that nonstatistical sampling is likely less
expensive to apply and can provide sufficient evidence, he did not provide any explanation why the
largest audit firms moved from statistical to nonstatistical sampling. Elder and Allen (2003) found
decreasing risk assessments and sample sizes during the 1990s, which they attribute to increased
competition, although that period was also associated with decreases in auditor legal liability, which
may also have resulted in reduced testing. However, little research has considered how factors such
as regulation, technology, or competition have influenced the use of statistical versus nonstatistical
sampling or resulting sample sizes.
For example, Trompeter and Wright (2010) suggest that regulation and inspection may
motivate auditors to use detail testing and audit sampling more, because it is easier to document and
justify than techniques such as substantive analytical procedures. Trompeter and Wright (2010)
surveyed 34 practicing auditors to assess how the use of analytical procedures has changed as
audit approaches and use of technology have changed. The change in the wording of the standard
audit report appears to reflect a current emphasis on designing audit procedures to address
significant risks, and suggests that detail tests may involve procedures that target larger or more
risky items for testing. This typically would not involve audit sampling because audit sampling
requires the use of a selection technique that will produce a representative sample. Archival studies
similar to the approach in Elder and Allen (2003) could address whether auditors have increased or
decreased the number of sampling applications and increased or decreased risk-based targeted
selection techniques. Similarly, surveys of experienced partners could assess whether they are more
or less likely to use audit sampling (including statistical sampling) to obtain audit assurance in the
current environment compared to the pre-SOX environment, and whether the use of statistical
sampling and sample sizes has changed as a result of the clarified audit standards or PCAOB
inspection process.
As the ASB and IAASB have completed their clarity and convergence projects, audit
sampling standards and audit reporting are mostly similar for U.S. nonpublic entities and
international entities. While the global network firms use consistent sampling methodologies for
all entities across the globe, we encourage research that addresses how differences in the U.S. and
international legal, regulatory, and competitive environments impact the use and application of
audit sampling. For example, do risk assessments and sample sizes vary depending on the legal
and regulatory environment across countries? Similarly, will auditors from the same global firm,
but from different countries that are culturally distinct, reach the same decision on what
constitutes an appropriate audit sample? Has the use of sampling and sample sizes increased in
the U.S. due to PCAOB oversight? Have sample sizes decreased due to increased audit fee
competition, and have the changes been uniform across audit firms, borders, and regulatory
environments? Do firms use different sample sizes or approaches for public company and private
company audits or public company audits subject to PCAOB oversight versus other regulatory
oversight?
RELEVANT FINDINGS FROM RESEARCH
We review and summarize relevant literature for each step in the audit sampling process. There
is an extensive, but dated, literature that primarily examines refinements in statistical sampling
techniques. Much of the research prior to 1985 is summarized in Akresh et al. (1988), who
discussed many research questions related to audit sampling. Aldersley et al. (1995) summarized
the history of audit sampling at many of the firms and the collaboration of academics and
practitioners in audit sampling through 1995. This publication contains an extensive chronological
bibliography, including studies going back to 1933. Because both Akresh et al. (1988) and
Aldersley et al. (1995) contain extensive bibliographies, we include earlier studies only if they
relate to the research questions in this study.
While some aspects of the sampling process have been extensively studied, limited evidence
exists for other parts of the process. A number of studies are summarized in Table 1 and are
organized by steps in the sampling process.
Where Sampling Applies
PCAOB auditing standards (AU 350.01) define audit sampling as ‘‘the application of an audit
procedure to less than 100 percent of the items within an account balance or class of transactions for
the purpose of evaluating some characteristic of the balance or class.’’ Our first discussion question
addresses auditors’ use of sampling to obtain audit evidence:
DQ1: What factors impact auditors’ decisions to use audit sampling to obtain evidence
regarding the effectiveness of controls or the accuracy of the monetary amount of a class
of transactions or account balance? How do auditors use those factors in reaching their
decisions?
AU-C 330.A65–71 (AICPA 2011a) notes that methods of selecting items for testing include (1) selecting all
items (100 percent examination), (2) selecting specific items, and (3) audit sampling. Although any or
a combination of approaches may be appropriate in the circumstances, the first two approaches are not
sampling. In particular, the selective examination of specific items from a class of transactions or
account balance will often be an efficient means of obtaining audit evidence, but does not constitute
audit sampling, as the selection of specific items is not intended to be representative of the population.
Audit sampling is designed to enable conclusions to be drawn about an entire population on the
basis of testing a sample drawn from the population. The AICPA (2012a) Audit Sampling Audit
Guide provides several categories of audit procedures that may not involve audit sampling.
Sampling may not be appropriate when a population is small or when it is difficult to define a
homogeneous population, such as some inventory observation settings or the search for unrecorded
liabilities. Sampling is also not used when audit procedures are applied to every item in a
population, such as certain clerical accuracy and comparison tests applied to the entire population
using computer-assisted auditing techniques (CAATs).
TA
BL
E1
Sel
ecte
dS
tud
ies
Ad
dre
ssin
gA
ud
itS
am
pli
ng
Cit
ati
on
Pu
rpose
Res
earc
hM
eth
od
sF
ind
ings
Res
earc
hO
pp
ort
un
itie
s
Dec
ide
Wh
eth
erto
Use
Sta
tist
ical
or
No
nst
atis
tica
lS
amp
lin
g
Hal
let
al.
(20
02
)
Ass
ess
sam
pli
ng
met
ho
ds
use
d
by
aud
ito
rsin
pu
bli
c
acco
un
tin
g,
ind
ust
ry,
and
go
ver
nm
ent.
Su
rvey
ed6
00
aud
ito
rsan
d
rece
ived
22
3u
sab
le
resp
on
ses.
�N
on
stat
isti
cal
met
ho
ds
use
d
in8
5p
erce
nt
of
sam
pli
ng
appli
cati
ons.
�M
US
mo
stco
mm
on
stat
isti
cal
met
ho
d.
�A
ud
ito
rsev
alu
ated
man
y
sam
ple
sst
atis
tica
lly
,ev
en
tho
ug
hth
eyw
ere
sele
cted
no
nst
atis
tica
lly
.
To
wh
atex
ten
tar
eau
dit
ors
curr
entl
y
usi
ng
no
nst
atis
tica
lan
dst
atis
tica
l
met
ho
ds?
Ho
wsi
mil
arar
est
atis
tica
lan
d
no
nst
atis
tica
lap
pro
ach
esin
pra
ctic
e?
Are
ther
ed
iffe
ren
tap
pro
ach
esto
stat
isti
cal
sam
pli
ng
?
Are
ther
ed
iffe
ren
ces
inte
rnat
ion
ally
in
the
use
of
stat
isti
cal
and
no
nst
atis
tica
lsa
mp
lin
g?
Net
eran
d
Lo
ebb
eck
e
(19
75
)
Stu
dy
the
pre
cisi
on
and
reli
abil
ity
of
var
iou
s
stat
isti
cal
esti
mat
ors
.
Ev
alu
ate
fou
rac
cou
nti
ng
po
pu
lati
on
sw
ith
hig
han
d
mo
der
ate
erro
rra
tes.
MU
Sis
pre
fera
ble
for
popula
tions
wit
hlo
wer
ror
rate
s.
Ho
wex
ten
siv
ely
isM
US
use
d,
and
for
wh
atty
pes
of
acco
un
tin
g
po
pu
lati
on
s?
Det
erm
ine
Sam
ple
Siz
e
Eld
eran
dA
llen
(20
03
)
Det
erm
ine
wh
eth
ersa
mp
le
size
sar
ese
nsi
tiv
eto
risk
asse
ssm
ents
.
Co
llec
ted
dat
afr
om
wo
rk
pap
ers
fro
mth
ree
firm
s,fo
r
23
5sa
mp
lin
gap
pli
cati
on
s
inv
olv
ing
53
aud
its
in1
99
4
and
19
99
.
�S
amp
lesi
zes
wer
eo
nly
wea
kly
asso
ciat
edw
ith
risk
asse
ssm
ents
.�
Ris
kas
sess
men
tsan
d
sam
ple
size
sd
ecli
ned
ov
er
the
per
iod
.
Wh
atar
ecu
rren
tri
skas
sess
men
tle
vel
s
and
sam
ple
size
s?
Do
sam
ple
size
sv
ary
dep
end
ing
on
the
reg
ula
tory
env
iro
nm
ent?
Mes
sier
etal
.
(20
01
)
Ree
xam
ine
Kac
hel
mei
eran
d
Mes
sier
(19
90
)to
asse
ss
wh
eth
erau
dit
ors
det
erm
ine
sam
ple
size
sco
nsi
sten
tw
ith
rev
ised
AIC
PA
aud
it
sam
pli
ng
gu
ide.
Ex
per
imen
tal
stu
dy
inv
olv
ing
14
9ex
per
ien
ced
aud
ito
rs.
Rec
om
men
ded
sam
ple
size
s
are
clo
ser
tore
com
men
ded
sam
ple
size
s,b
ut
stil
l
con
tain
evid
ence
of
‘‘w
ork
ing
bac
kw
ard
.’’
Has
the
rela
tio
nsh
ipb
etw
een
risk
and
sam
ple
size
sch
ang
edfo
llo
win
g
ado
pti
on
of
risk
asse
ssm
ent
stan
dar
ds?
Do
es‘‘w
ork
ing
bac
kw
ard’’
stil
lex
ist
wh
enri
sks
are
esta
bli
shed
in
pla
nn
ing
asp
art
of
risk
asse
ssm
ent
pro
ced
ure
s?
(con
tinu
edo
nn
ext
pa
ge)
Audit Sampling Research 107
Auditing: A Journal of Practice & TheorySupplement 1, 2013
TA
BL
E1
(co
nti
nu
ed)
Cit
ati
on
Pu
rpo
seR
esea
rch
Met
ho
ds
Fin
din
gs
Res
earc
hO
pp
ort
un
itie
s
Kac
hel
mei
eran
d
Mes
sier
(19
90
)
Ass
ess
whet
her
audit
ors
det
erm
ine
sam
ple
size
s
con
sist
ent
wit
hA
ICP
A
aud
itsa
mp
lin
gg
uid
e.
Ex
per
imen
tal
stu
dy
inv
olv
ing
16
1au
dit
ors
.
Au
dit
ors
app
ear
to‘‘
wo
rk
bac
kw
ard’’
and
sele
ctri
skto
ach
iev
ea
des
ired
sam
ple
size
.
Sel
ect
Sam
ple
Item
s
Hal
let
al.
(20
01
)
Ex
amin
ew
het
her
do
ub
lin
g
hap
haz
ard
sam
ple
size
s
red
uce
sb
ias
inth
e
hap
haz
ard
sam
ple
sele
ctio
n.
Stu
den
tsu
bje
cts
wer
eas
ked
to
sele
ctin
ven
tory
item
san
d
vo
uch
ers
fro
mst
ora
ge
un
its
wit
hk
no
wn
po
pu
lati
on
char
acte
rist
ics.
Incr
easi
ng
sam
ple
size
s
som
ewh
atm
itig
ates
bia
sin
hap
haz
ard
sele
ctio
n,
bu
t
do
ub
lin
gsa
mp
lesi
zes,
as
sug
ges
ted
inth
eli
tera
ture
,
do
esn
ot
app
ear
tob
ean
effe
ctiv
eso
luti
on
for
red
uci
ng
mis
rep
rese
nta
tio
n
inth
esa
mp
le.
Wh
atm
eth
od
sar
ecu
rren
tly
use
db
y
aud
ito
rsto
sele
ctsa
mp
les
for
test
s
of
con
tro
lsan
dte
sts
of
det
ails
?
Do
aud
ito
rsu
sera
nd
om
sele
ctio
n
tech
niq
ues
wit
hn
on
stat
isti
cal
sam
pli
ng
met
ho
ds?
Hal
let
al.
(20
00
)
Det
erm
ine
ifn
ov
ice
aud
ito
rs
can
sele
cth
aph
azar
d
sam
ple
sw
ith
ou
tb
ias.
Stu
den
tsu
bje
cts
wer
eas
ked
to
sele
ctin
ven
tory
item
san
d
vo
uch
ers
fro
mst
ora
ge
un
its
wit
hk
no
wn
po
pu
lati
on
s
char
acte
rist
ics.
Des
pit
edeb
iasi
ng
inst
ruct
ions,
sam
pli
ng
un
its
sele
cted
ten
ded
tob
ela
rger
,b
rig
htl
y
colo
red
,co
nv
enie
ntl
y
loca
ted
,an
dh
adfe
wer
adja
cen
tn
eig
hb
ors
.
Do
esth
ese
lect
ion
bia
sn
ote
din
rese
arch
rela
teto
the
pri
mar
y
char
acte
rist
ico
fin
tere
st—
erro
rra
te
or
mis
stat
emen
t?
Per
form
Au
dit
Pro
ced
ure
s
Wag
go
ner
(19
90
)
Det
erm
ine
wh
eth
ern
on
-
sam
pli
ng
risk
aris
esfr
om
aud
ito
rs’
fail
ure
tod
etec
t
erro
rs.
Pro
vid
ed2
5au
dit
ors
wit
h2
5
dis
bu
rsem
ent
pac
kag
es
seed
edw
ith
erro
rs.
Au
dit
ors
fail
edto
det
ect
45
per
cen
to
fth
eer
rors
inth
e
sam
ple
.
Ho
wex
ten
siv
ear
en
on
sam
pli
ng
risk
s
rela
tiv
eto
sam
pli
ng
risk
s?
Cas
ter
etal
.
(20
08
)
Ev
alu
ate
the
exte
nt
tow
hic
h
con
firm
atio
ns
are
effe
ctiv
e
ind
etec
tin
gac
cou
nts
rece
ivab
leer
rors
.
Lit
erat
ure
rev
iew
of
con
firm
atio
nst
ud
ies
and
Acc
ou
nti
ng
and
Au
dit
ing
En
forc
emen
tR
elea
ses.
Man
ym
isst
atem
ents
are
no
t
det
ecte
db
yco
nfi
rmat
ion
pro
ced
ure
s.
Wh
atar
eth
ep
rim
ary
cau
ses
of
no
n-
sam
pli
ng
risk
(e.g
.,ti
me
pre
ssu
re,
inco
mp
eten
ce)
and
wh
atfa
cto
rs
wo
uld
hel
pm
itig
ate
bia
s?
(con
tinu
edo
nn
ext
pa
ge)
108 Elder, Akresh, Glover, Higgs, and Liljegren
Auditing: A Journal of Practice & TheorySupplement 1, 2013
TA
BL
E1
(co
nti
nu
ed)
Cit
ati
on
Pu
rpose
Res
earc
hM
eth
od
sF
ind
ings
Res
earc
hO
pp
ort
un
itie
s
Ev
alu
ate
Sam
ple
Res
ult
s
Du
rney
etal
.
(20
12
)
Ass
ess
eval
uat
ion
so
fsa
mp
les
inth
ep
ost
-SO
X
env
iro
nm
ent.
Inves
[Table 1 entry, continued from the previous page]
Purpose: Investigate evaluations of samples for 160 applications involving an international accounting firm.
Findings: Most samples did not contain errors. 97 percent of errors were projected.
Research Opportunities: What are typical error rates in accounting populations for public companies required to comply with SOX 404, and for private and small public companies?

Citation: Elder and Allen (1998)
Purpose: Investigate auditors' actual error projection decisions.
Research Methods: Reviewed workpapers for 235 sampling applications from 53 audits of medium-sized companies performed by three large audit firms.
Findings: Auditors failed to project 33 percent of sample errors. Immateriality was the most common reason for not projecting an error. Auditors often used containment to limit an error to a subpopulation.
Research Opportunities: What factors support high rates of error projection by auditors? What can audit firms do to improve auditor performance (e.g., decision aids, training)? What reasons are currently used by auditors to justify not projecting an error?

Citation: Dusenbury et al. (1994)
Purpose: Test whether auditors' decisions to project sample errors are related to the frequency of the error and the extent of information on error containment.
Research Methods: Behavioral experiment based on Burgstahler and Jiambalvo (1986), modified to address error frequency and containment.
Findings: Auditors were more likely to project errors that were more frequent, and less likely to project errors when they had been contained to a subpopulation.
Research Opportunities: To what extent do auditors believe it is appropriate to treat an error as being an anomaly?

Citation: Burgstahler and Jiambalvo (1986)
Purpose: Investigate extent to which auditors project sample errors.
Research Methods: Behavioral experiment in which practicing auditors were provided with seven hypothetical error projection decisions.
Findings: Auditors failed to project 67 percent of the sample errors.

Conclude on Acceptability of Population Based on Sample

Citation: Burgstahler et al. (2000)
Purpose: Assess whether auditors' evaluations of samples are affected by consideration of error projection and sampling risk.
Research Methods: 61 audit seniors from a Big 5 accounting firm were provided with three sets of materials with successively more complete information regarding sample results.
Findings: Auditors were more likely to require an audit adjustment when they were specifically required to consider error projection and sampling risk.
Research Opportunities: How do auditors consider sampling risk when performing nonstatistical sampling? What can audit firms do to improve auditor performance (e.g., decision aids, training, expert review)?
Audit Sampling Research 109
Auditing: A Journal of Practice & Theory, Supplement 1, 2013
TABLE 1 (continued)

Citation: Blocher and Bylinski (1985)
Purpose: Evaluate auditors' nonstatistical evaluation of error confidence intervals.
Research Methods: Experiment administered to 30 auditors from regional and national firms, based on data in Neter and Loebbecke (1975).
Findings: Subjects' subjective confidence intervals were narrower than statistical confidence intervals.
For similar reasons, paragraph 1.12 of the Sampling Guide indicates ‘‘cutoff tests often do not
involve audit sampling applications,’’ because auditors often define a small ‘‘cutoff’’ population and
then test all items in the population. However, the Sampling Guide does acknowledge that, ‘‘one
could design cutoff tests using audit sampling when the volume of transactions during the period of
interest is high’’ (AICPA 2012a, 4). For example, in some industries (e.g., financial services), there
may be thousands of transactions that occur in a very short period of time around year-end. In such
cases, the auditor may define the cutoff population by time (i.e., the last and first day of the period)
and choose to apply audit sampling.
Ham et al. (1985) studied error rates and distributions for 20 audits over five years. They found that
cutoff errors represented the most likely source of material error for inventory, accounts receivable,
and accounts payable. Elder and Allen (1998) found that auditors often did not project cutoff errors,
but do not indicate whether this is because the auditor did not consider the test to be a sampling
application, or because it was difficult to monetarily measure the population for projection.
As this discussion illustrates, determining whether an audit procedure involves or does not
involve sampling is more complicated than it initially appears. However, we are unaware of
research that addresses the decision to use audit sampling. There is a need for academic research
that examines the extent to which sampling is currently applied in tests of controls and tests of
details. How has test of controls sampling changed with increased automation of controls and the
issuance of audit opinions on internal control? To what extent has sampling for tests of details
decreased because other evidence—including risk assessment procedures, consideration of related
controls, and analytical procedures—supports the conclusion that an account does not contain a
material misstatement? Future academic research could address these issues of when sampling is
appropriate and where other tests provide sufficient evidence.
Comparison of Statistical and Nonstatistical Sampling
International, AICPA, and PCAOB auditing standards allow the auditor to apply either statistical
or nonstatistical sampling. PCAOB AU 350 notes that sampling risk is present in both nonstatistical
and statistical sampling, and all audit sampling involves judgment in planning and performing the
sampling procedure and evaluating the results of the sample. Further, paragraph 2.24 in the AICPA
(2012a, 14) Sampling Guide indicates that a properly designed nonstatistical sample that considers
the same factors considered in a properly designed statistical sample can provide results that are as
effective as those from a properly designed statistical sampling application. One advantage of
statistical sampling is that it allows the auditor to explicitly quantify the level of sampling risk.
However, when the populations tested via audit sampling contain zero or trivial misstatement, the
advantage of statistical sampling may be less important to the auditor, as auditing standards require
the sample sizes for statistical and nonstatistical sampling to be comparable (PCAOB AU 350).
Colbert (1991) argues that statistical sampling is more defensible than nonstatistical
sampling. Gray et al. (2011) conducted focus groups with financial statement users, and some
users expressed disappointment that sample sizes were not larger and selected more
scientifically. Gilbertson and Herron (2003) administered an experimental instrument to 122
jurors and students asking them to determine liability and assess damages in a liability case
involving 800 fictitious sales transactions out of a population of 12,000 transactions. The
auditors examined 100 sales invoices and found no discrepancies. Subjects were not more
likely to find the auditors guilty in the nonstatistical sampling condition compared to the
statistical sampling condition. However, damages were significantly larger in the nonstatistical
sampling condition.
Our second discussion question addresses the extent to which statistical and nonstatistical
sampling techniques are used, and their effectiveness in evaluating audit evidence:
DQ2: To what extent do auditors use statistical and nonstatistical sampling to obtain audit
evidence? Are there differences in the effectiveness of the two types of audit sampling?
If so, what are those differences?
Auditor use of statistical sampling appears to have varied over time. Statistical sampling was used more in the 1970s and early 1980s, around the time of the issuance of Statement on Auditing Standards No. 39, Audit Sampling (AICPA 1981), than in later periods.2 In 1978, the
AICPA surveyed 200 firms believed to be using statistical sampling (Akresh 1980). Thirteen of the 15
largest firms responded. Nine of these large firms indicated they were using monetary unit sampling
(MUS) or classical variables sampling or both; 12 of the 13 were using attributes sampling.
Nonstatistical audit sampling is now common in practice, although it is often used in ways to
approximate a statistical approach. PCAOB and AICPA auditing standards require statistical and
nonstatistical approaches to be similar. For example, as noted above, PCAOB audit standards
indicate that ‘‘when a nonstatistical sampling approach is applied properly, the resulting sample size
will be comparable to, or larger than, the sample size resulting from an efficient and effectively
designed statistical sample’’ (AU 350.23a).
By the early 1990s, it appears that most of the larger firms were primarily using nonstatistical
methods (Sullivan 1992; Elder and Allen 1998). There are two primary reasons why statistical
sampling may have fallen out of favor. First, increased emphasis on inherent risk (e.g., Houghton
and Fogarty 1991) suggested auditors could use knowledge and expertise to identify high-risk
transactions or balances (e.g., large unusual items, transactions near period end, areas where
material misstatements have been discovered in the past) and test these risky items 100 percent,
rather than rely on random or haphazard selection.
The second reason relates to poor linkage between the applied audit setting and traditional
statistical sampling applications. In most scientific statistical applications, a high degree of
confidence, say 95 to 99 percent, is required. However, in an audit context, the auditor may need
only a low or moderate level of confidence or assurance (e.g., 50 to 80 percent) because evidence
gathered from other audit procedures provides additional assurance. Although sampling guidance
allows for lower confidence levels, some audit firms simply moved to nonstatistical sampling with
guidance based primarily on judgment. These judgments may not have always been consistent with
standards or statistical theory, and were likely motivated in some cases by a desire to reduce testing.
Discussions by some members of the author team with large audit firms indicate that in recent
years, these firms have updated their nonstatistical sampling approaches to be more consistent with
statistical theory. For example, these firms indicate their attribute sampling applications use sample
sizes grounded in statistical theory, but the firms’ sampling policies and practice aids simplify many
of the judgments necessary to determine sample size.3 The input choices, in terms of levels of
assurance, importance of the control, and expected deviation rate, may contain only a few choices
2 Carpenter and Dirsmith (1993) analyzed the use of statistical sampling from an institutional and sociological perspective. Statistical sampling was part of a movement away from an emphasis on auditors' detection of fraud, and was favored by firms that followed more structured auditing approaches. Statistical sampling raised the stature of auditing in academia, and Smith and Krogstad (1984) reported that three statistical sampling studies were the most cited articles in Auditing: A Journal of Practice & Theory at that time. The Carpenter and Dirsmith (1993) study suggests that statistical sampling may influence audit approaches beyond its effect on sampling procedures.
3 For example, a national auditing services partner at one firm characterized their sampling for tests of controls as nonstatistical guided by statistical theory, in compliance with auditing standards that require a nonstatistical sample size to ordinarily be comparable to a statistical sample size.
(e.g., moderate or high assurance, moderate or high importance) to simplify the judgments and to
improve consistency in the application of the sampling approach. For example, a recurring manual
control may have a sample size of 25 to 40 items, and the results are deemed acceptable if no
deviations are found in the sample. These characteristics of attribute sampling lead to more
common or universal approaches across firms and relatively simple sample size tables. For an
illustration of how such simple sample size tables are developed, see Chapter 11 in the AICPA
(2012b) Audit Guide: Government Auditing Standards and Circular A-133 Audits.
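The arithmetic behind such zero-deviation sample size tables can be sketched with a generic binomial calculation (an illustration of the underlying statistics, not any firm's actual practice aid): the smallest sample size for a given confidence level and tolerable deviation rate is the smallest n satisfying (1 − tolerable rate)^n ≤ 1 − confidence.

```python
import math

def attribute_sample_size(confidence, tolerable_rate):
    """Smallest n such that, if no deviations are found in n items, the
    auditor has at least `confidence` that the true deviation rate is
    below `tolerable_rate` (a zero-expected-deviation plan)."""
    # Solve (1 - tolerable_rate)^n <= 1 - confidence for n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - tolerable_rate))

# 90% confidence with a 10% tolerable deviation rate -> 22 items
print(attribute_sample_size(0.90, 0.10))
# 95% confidence with a 10% tolerable deviation rate -> 29 items
print(attribute_sample_size(0.95, 0.10))
```

Results of this form are consistent in magnitude with the 25-to-40-item sample sizes mentioned above; tightening the tolerable rate or raising the assurance level moves the required size up the table.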
Hall et al. (2002) surveyed 600 auditors in public accounting, industry, and government in
1997 and received 223 usable responses. Respondents were asked how they determined sample
sizes, selected samples, and evaluated samples for all sampling applications they had completed
over the previous six months. Nonstatistical methods were used in 85 percent of the sampling
applications, with monetary unit sampling (MUS) being the most common statistical method used.
The auditors indicated that they selected 15 percent of their samples using statistical sampling
techniques, but evaluated 36 percent of the samples using statistical sampling techniques. The
authors interpret this as indicating improper use of statistical evaluation in 21 percent of the
sampling applications, although a haphazard sample can be evaluated statistically for information
purposes to assist the auditor in evaluating the test results, as long as the auditor does not draw
formal statistical conclusions. We discuss the evaluation of sample results later in this study.
Thus, this research may indicate a potential need for auditing standards and related guidance to
clarify the relation between the method used to select and evaluate a sample. In many
circumstances, use of statistical sampling guidance can be helpful in determining an adequate
sample size and selection of a sample that is suitable for the objectives of the test, even if the sample
is evaluated using nonstatistical techniques. Several research opportunities exist in this area,
including research into how auditors are applying statistical and nonstatistical sampling in the
current environment. Auditors may be more likely to apply statistical sampling post-SOX if sample
sizes have increased in response to regulation, or if statistical sampling is believed to be more
defensible to regulators. This suggests several research questions. Due to regulatory oversight, is
the use of audit sampling increasing, and is statistical sampling more likely to be used for public
company audits? Are sample sizes larger for public company audits? Are auditors more likely to
apply sampling and even statistical sampling for audits subject to PCAOB oversight than they are
for public company audits in other jurisdictions?
Choice of Statistical Methods for Substantive Tests of Details
The auditor can choose from several statistical sampling methods for substantive tests of
details. These methods include monetary unit sampling (MUS) and classical variables methods—
including ratio estimation, difference estimation, and mean-per-unit estimation—with MUS being
the most common for the reasons discussed below.
Table 2 summarizes some of the more significant research on statistical sampling. Panel A
identifies research that has significantly influenced current audit practice, while Panel B lists other
relevant studies that have not significantly impacted current practice, along with suggestions for
additional research in the area.
As noted in Table 2, Panel A, Neter and Loebbecke (1975) studied the precision and reliability
of several statistical estimators in sampling four accounting populations. Two of the populations
had high error rates and two had moderate error rates.4 The study concluded that MUS is preferable
4 In order to evaluate the effectiveness of ratio and difference estimation, the populations selected were required to contain errors. Thus, the study was not intended to provide evidence on representative populations of accounts, including low-error accounts, tested via audit sampling.
TABLE 2
Research on Statistical Sampling

Panel A: Studies Significantly Influencing Current Audit Practice

Research: Stringer (1963)
Area of Practice: Monetary Unit Sampling
Effect on Practice: Contains the original methodology for MUS (especially when performed manually). Also contains the foundations for the audit risk model.

Research: Leslie et al. (LTA 1979) and various papers by these authors
Area of Practice: Monetary Unit Sampling
Effect on Practice: An extensive text on MUS; describes the cell method, which is used in IDEA and ACL. Provides the rationale for MUS; also provides guidance on risk and materiality. Various papers by these authors helped resolve issues related to MUS. Auditors who use MUS are primarily using either Stringer's (1963) method or LTA's methods.

Research: Neter and Loebbecke (1975)
Area of Practice: Classical Variables Sampling
Effect on Practice: Pointed out the dangers of using ratio and difference estimation unless sample size is not small and enough differences are found; led to much greater use of MUS for low error rate populations; also led some auditors to establish minimum stratum size and minimum number of errors to use ratio and difference estimation (and regression estimation).

Research: Roberts (1978)
Area of Practice: Classical Variables Sampling
Effect on Practice: Contains formulas and guidance for classical variables sampling. These formulas are in IDEA and other software. Also discusses classical PPS methods to allow auditors to use MUS in high error rate situations.

Research: Elliott and Rogers (1972)
Area of Practice: Hypothesis Testing
Effect on Practice: Changed auditors' orientation from estimating values to testing hypotheses. Distinguishes between risk of incorrect acceptance and risk of incorrect rejection, and stresses the primary need to control risk of incorrect acceptance. This is the approach used in current guidance.

Research: Akresh and Finley (1979); Roberts (1978)
Area of Practice: Sequential Sampling
Effect on Practice: For control tests, some firms use sequential sampling, either statistically or as a basis for nonstatistical plans that approximate statistical plans. The Audit Sampling Audit Guide, Appendix B (AICPA 2012a), discusses sequential sampling.
TABLE 2 (continued)

Panel B: Other Studies that Have Not Significantly Influenced Current Audit Practice

Research: Neter et al. (1978)
Area of Practice: Monetary Unit Sampling
Summary of Findings: Proposes the multinomial bound as closest to the theoretically correct MUS bound. This bound removes the excess conservatism of Stringer (1963) and cell bounds.
Possible Reasons for Limited Influence: Auditors are not concerned with excess conservatism if they find no errors or if they can still accept the results. The multinomial bound requires extensive computer resources to compute.
Suggestions for Future Research: Does today's extensive computer power make this a better method for evaluating MUS samples?

Research: Felix et al. (1990)
Area of Practice: Monetary Unit Sampling
Summary of Findings: Indicates that Arthur Andersen had developed and used software for the moment bound based on analysis in Grimlund and Felix (1987). Also see Dworin and Grimlund (1984), Tsui et al. (1985), and Menzefricke and Smieliauskas (1984) for related research.
Possible Reasons for Limited Influence: Lack of interest in statistical sampling in the 1990s. Firms did not want to modify their sampling techniques.
Suggestions for Future Research: Why are other firms not using this bound? Why didn't software vendors use this method?
TABLE 2 (continued)

Research: Birnberg (1964); Tracy (1969); Scott (1973); Felix and Grimlund (1977); Godfrey and Neter (1984); McCray (1984); Shafer and Srivastava (1990)
Area of Practice: Bayesian statistics and other decision systems
Summary of Findings: There was some early experimentation with Bayesian methods; there were also questions about their use. See the summary in Akresh et al. (1988, 44–51).
Possible Reasons for Limited Influence: Method is complex; early software not user friendly; need to quantify auditor judgments.
Suggestions for Future Research: How do these methods compare with classical methods in terms of defensibility, sample sizes, evaluations, and ability to aggregate evidence? Can user-friendly software be developed to make it easier to understand these methods? Why didn't the profession adopt these methods? Should the profession reconsider Bayesian or other decision methods of sample size determination and evaluation?

Research: Leslie et al. (1979); Aldersley and Leslie (1984); Audit Guide (AICPA 2012a)
Area of Practice: Sampling in multi-location audits
Summary of Findings: There are different ways to plan sampling for multi-location audits.
Possible Reasons for Limited Influence: Current state of practice unclear; this area has not been significantly researched.
Suggestions for Future Research: What are appropriate ways to sample in multi-location audits? What are good ways to evaluate results?
for use with populations with low error rates. The study suggested that ratio and difference
estimation should not be used for low error rate populations, as the usual statistical procedures for
calculating confidence intervals for the ratio and difference estimators, whether with unstratified or
stratified samples, involve confidence coefficients far below the nominal coefficient when the
population error rate is low and the sample size is less than 200 (see footnote 5).
These findings are particularly germane in a post-SOX environment, as recent research
indicates that the vast majority of samples from accounting populations where sampling is used find
little or no error. In a study of sampling applications performed by a large global network firm,
Durney et al. (2012) report that about 80 percent of the sampling applications find no
misstatements, and 90 percent of sampling applications contain projected misstatement less than 0.5
percent of the unaudited book value of the sampled population. When a population has no error, it
renders ratio and difference estimation useless, as those methods provide reliable confidence
intervals only when sufficient misstatements are found.
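The mechanics behind this failure are easy to demonstrate numerically. In the sketch below (a generic textbook formulation of difference estimation, not a reproduction of Neter and Loebbecke's procedure), the confidence interval collapses to zero width when the sample contains no misstatements, because the standard deviation of the audit differences is zero:

```python
import statistics

def difference_estimate_interval(differences, population_size, sample_size, z=1.96):
    """Classical difference estimation: project the mean audit difference
    in the sample to the population and attach a normal-theory interval.
    (Illustrative sketch; ignores the finite-population correction.)"""
    mean_diff = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample standard deviation of the differences
    projected = population_size * mean_diff
    half_width = z * population_size * sd / (sample_size ** 0.5)
    return projected, half_width

# 100 sampled items from a 10,000-item population, no misstatements found:
diffs = [0.0] * 100
proj, hw = difference_estimate_interval(diffs, 10_000, 100)
print(proj, hw)  # 0.0 0.0 -- a zero-width "interval"
```

A zero-width interval falsely implies the auditor knows the population misstatement exactly, which is one reason MUS-style bounds are preferred for low error rate populations.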
Monetary Unit Sampling
MUS is a tool invented by auditors to deal with the low error rate populations often found in
financial systems. Stringer (1963) was the first in the U.S. to write about this method. Several
studies in the 1970s and 1980s by Leslie, Teitlebaum, and Anderson (LTA), together and
individually, addressed the use of MUS. Their book, Dollar-Unit Sampling (Leslie et al. 1979), is
still the best description of how to apply this method.
One perceived disadvantage of MUS using the evaluation method described by Stringer (1963)
is that it is very conservative; that is, when errors are observed, it understates the confidence
achieved or overstates the upper limit of error. Several studies have attempted to improve on MUS
evaluation methods, or find better evaluation formulas to reduce unnecessary conservatism. LTA
introduced the cell method as one way to reduce conservatism. This method is now included in
some audit software, such as ACL and IDEA.
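For readers unfamiliar with the Stringer evaluation, the sketch below shows one common formulation (a simplified illustration using Poisson upper limits, not Stringer's original exposition): the bound starts from basic precision at zero errors and adds a layered increment for each observed taint, sorted from largest to smallest.

```python
import math

def poisson_upper_limit(errors, confidence):
    """Upper limit m such that P(Poisson(m) <= errors) = 1 - confidence,
    found by bisection (e.g., 0 errors at 95% gives roughly 3.0)."""
    def cdf(m):
        return sum(math.exp(-m) * m**k / math.factorial(k) for k in range(errors + 1))
    lo, hi = 0.0, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if cdf(mid) > 1 - confidence:
            lo = mid  # m too small: too much probability at or below `errors`
        else:
            hi = mid
    return (lo + hi) / 2

def stringer_bound(book_value, sample_size, taints, confidence=0.95):
    """Stringer-style upper misstatement bound for an MUS sample.
    `taints` are observed misstatement proportions (0 to 1), any order."""
    taints = sorted(taints, reverse=True)
    bound_factor = poisson_upper_limit(0, confidence)  # basic precision
    for j, t in enumerate(taints, start=1):
        increment = poisson_upper_limit(j, confidence) - poisson_upper_limit(j - 1, confidence)
        bound_factor += increment * t
    return book_value / sample_size * bound_factor

# No errors in a sample of 60 monetary units from a $1,000,000 population:
print(round(stringer_bound(1_000_000, 60, [])))  # 49929, i.e., about $50,000
```

The conservatism the text describes arises because each taint is charged the full incremental Poisson factor; the cell and multinomial methods tighten exactly this step.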
Other examples of attempts to reduce conservatism include studies on the multinomial bound
(see Neter et al. 1978; Leitch et al. 1981, 1982) and studies comparing several newer bounds (see
Grimlund and Felix [1987], which compares four new methods of evaluating MUS samples—
Bayesian Normal, Cox and Snell, Modified Moment, Multinomial Dirichlet). Even though these
methods are less conservative, auditors have not widely adopted them, possibly because auditors are
not concerned about conservatism as most populations sampled have little or no error, and/or because
auditors do not use the formal statistical MUS evaluation methods when evaluating samples.
Determine Sample Size
Determination of sample size also varies, depending on whether the test involves tests of controls
or tests of details. For attribute sampling used in test of controls, sample size depends on the desired
confidence level, tolerable deviation rate, and expected population deviation rate. For monetary
sampling used in tests of details of balances, the required sample size depends upon the desired
confidence level, which is a function of risk and assurance from other tests, tolerable misstatement,
expected population misstatement, and population size. The PCAOB identified several issues related
to audit sampling in a report on inspections of 1,662 audits performed by the eight annually inspected
5 When the mean-per-unit estimator is employed with stratified random sampling, the large-sample confidence limits appear to be reasonably reliable. However, mean-per-unit estimation assumes the variability in the reported amounts is a good proxy for the variability of errors in the population, which does not hold for populations with little to no error. As such, in most accounting populations, mean-per-unit sampling is not an efficient sampling approach.
firms for the years 2004–2007 (PCAOB 2008). Issues included (1) sample sizes that were too small to
form a conclusion about the account balance or class of transactions tested, (2) failing to project the
errors identified to the entire population, (3) failing to select the sample in such a way that it could be
expected to be representative of the population, and (4) not appropriately testing all the items in the
sample. The PCAOB finding that sample sizes are too small suggests that audit firms may not be
adequately considering the factors impacting sample size, which is our third discussion question:
DQ3: Do auditors appropriately consider the required factors in the determination of audit
sample sizes? Do they consider other factors?
The PCAOB report noted above is based on data that are more than five years old. Data from
more recent inspections would indicate whether auditors continue to have these problems in
applying audit sampling. Data from inspections from firms other than the eight largest firms and for
private company and governmental audits would also be helpful in identifying whether auditors
appropriately consider factors impacting the determination of sample size in other settings.
Tests of Controls
Regardless of whether statistical or nonstatistical sampling is used, sample size depends upon
the degree of assurance required, the expected rate of deviation, and the tolerable deviation rate.
There appears to be limited research into how these sampling parameters are established in practice.
In addition to the sampling guidelines for controls for large populations that are applied on a
recurring basis, the AICPA (2012a, 39, Table 3.5) Sampling Guide provides minimum suggested
sample size guidelines for controls that operate on a quarterly, monthly, semimonthly, or weekly
basis where the control test may not be the primary source of reliance on the control. Jacoby and
Hitzig (2011) recomputed statistical sample sizes for infrequent controls (e.g., quarterly, monthly,
weekly) and determined that the AICPA minimum sample sizes in the 2008 Sampling Guide were
too small. The authors demonstrate that, for example, a control that operates four times a year
(i.e., quarterly) would require a sample size of four, even if the auditor only wants low assurance
(relatively high control risk), whereas the AICPA guidance indicates a sample size of two.
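The flavor of Jacoby and Hitzig's computation can be illustrated with an exact hypergeometric calculation (a sketch of the general idea under stated assumptions, not their actual analysis): for a tiny population, find the smallest zero-deviation sample that gives a chosen level of assurance that no deviation exists at all.

```python
from math import comb

def small_population_sample_size(population, confidence, deviations_present=1):
    """Smallest n such that a sample of n items containing zero deviations
    gives at least `confidence` that fewer than `deviations_present`
    deviations exist in the population (exact hypergeometric)."""
    for n in range(1, population + 1):
        # Probability of drawing zero deviations when exactly
        # `deviations_present` deviations exist in the population:
        miss_prob = comb(population - deviations_present, n) / comb(population, n)
        if miss_prob <= 1 - confidence:
            return n
    return population

# A quarterly control (population of 4): 90% assurance that not even one
# deviation exists requires testing all four occurrences.
print(small_population_sample_size(4, 0.90))  # 4
# Even at 70% assurance, three of the four occurrences are needed.
print(small_population_sample_size(4, 0.70))  # 3
```

This is why exact calculations for infrequent controls tend to produce larger sample sizes than the minimums in the Guide's Table 3.5, absent other sources of evidence.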
Despite this criticism, the AICPA task force did not revise the suggested sample sizes in Table
3.5 of the 2012 Sampling Guide. The Guide indicates the extent of testing for infrequent controls is
based on the application of ‘‘experience and judgment.’’ The Guide also indicates that the extent of
testing in the table reflects ‘‘the assumption that the test is often not a sole source of evidence
relating to the control objective in an audit of the financial statements and therefore a higher risk of
overreliance is acceptable. In less frequently operating controls, the effect of other sources of
evidence is often greater than for more frequently operating controls’’ (AICPA 2012a, 39). Thus,
high levels of assurance may not be necessary to provide reasonable assurance of the operating
effectiveness of a control when the population is very small, such as four, 12, 24 or 52 items, and
when the auditor has other evidence to address the control objective.6 The Sampling Guide
indicates that when the control test is the sole source of evidence regarding the effectiveness of
controls, and a specific high level of audit evidence is desired, sampling parameters may be used to
determine an appropriate sample size instead of Table 3.5.
Additional research examining how population size and frequency of operation of a control
affect sample size may be useful. Are auditors more likely to use sampling and select larger sample
6 Footnote 14 of paragraph 3.62 in the Sampling Guide (AICPA 2012a, 39) indicates that some examples of the other implicit sources of evidence include ‘‘inherent risk assessments, assessments of design and implementation, past experience, walkthroughs, corroborating inquiries, other control testing, knowledge about other balances, competence of personnel, systems knowledge, and so on.’’
sizes for tests of controls on public company audits? Do the large global network firms use the same
sampling approaches and similar sample sizes across political borders? How do sample sizes
compare across public accounting firms? How has recent audit fee competition influenced the use of
sampling and sample sizes? Do auditors generally test controls for audits of private companies?
Test of Balances
The Sampling Guide indicates that sample size for tests of balances depends on the dollar
amount of the population, tolerable misstatement, inherent risk, control risk, expected size and
frequency of misstatements, and the assessment of risks related to other substantive audit
procedures. PCAOB inspections for 2004–2007 found instances of sample sizes that were too small
to form an appropriate conclusion (PCAOB 2008).
Kachelmeier and Messier (1990) document the tendency of auditors to ‘‘work backward’’ and
lower assessed risks to achieve a desired sample size. Messier et al. (2001) find that the tendency to
work backward still exists using the 1999 version of the Sampling Guide.7 Using empirical data
from 1993–1994 and 1999 collected from individual offices of two Big 4 firms and one large
regional firm, Elder and Allen (2003) found that risk assessments and sample sizes declined over
the period, and sample sizes were only weakly related to assessments of inherent risk and control
risk.
These studies predate SOX and the issuance of risk assessment suites of standards by the ASB
and PCAOB. Using post-SOX data, Durney et al. (2012) report higher average sample sizes than
Elder and Allen (2003). Because risk assessments are now initially made during performance of risk
assessment procedures, it is unclear whether auditors are able to lower risk assessments during
testing to achieve lower sample sizes.
Determining sample size is generally thought of in terms of sampling from an individual
account or type of transaction. In these instances, the auditor can determine separate sample sizes
based on individual component parameters, or the auditor can determine an overall sample size
based on the overall entity (assuming sufficient homogeneity in risks, processes, nature of
transactions) and allocate the sample to components based on the proportional size.8 Appendix E of
the Sampling Guide provides some practical guidance on multi-location sampling.
Little evidence exists as to how auditors make these allocation decisions. Menzefricke and
Smieliauskas (1988) extend the model in Scott (1973) to allocate sample size across multiple
accounts to be sampled. The auditor’s objective is to minimize the combined costs of sampling and
risk of misestimating the accounts due to sampling error. Populations that are more likely to contain
error are sampled more, while those that are costly to test are sampled less.
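A greatly simplified stand-in for this kind of allocation (an illustration of the intuition, not Menzefricke and Smieliauskas's actual optimization) weights each population by anticipated error and discounts it by per-item testing cost:

```python
def allocate_sample(total_n, populations):
    """Split a total sample across populations in proportion to an
    anticipated-error weight divided by per-item testing cost.
    `populations` maps name -> (error_weight, unit_cost)."""
    scores = {name: w / c for name, (w, c) in populations.items()}
    total = sum(scores.values())
    return {name: round(total_n * s / total) for name, s in scores.items()}

# Hypothetical accounts: receivables are error-prone and cheap to test,
# inventory is error-prone but costly, payables are low risk.
print(allocate_sample(120, {
    "receivables": (0.6, 1.0),
    "inventory":   (0.6, 3.0),
    "payables":    (0.2, 1.0),
}))
```

Note that rounded allocations need not sum exactly to the total in general; a production decision aid would reconcile any remainder.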
Paragraph 4.54 in the Sampling Guide (AICPA 2012a, 64–65) indicates that when there are
numerous accounts where uncertainty exists or the results are based on numerous tests at various
locations, tolerable misstatement might be set at 50 percent or less of materiality, compared to
ranges of 50 percent to 75 percent often observed. The relationship between tolerable misstatement
7 A number of studies have found a relationship between auditors' risk assessments and planned audit hours (e.g., Johnstone and Bedard 2001; Hackenbrack and Knechel 1997; O'Keefe et al. 1994). In contrast, in empirical studies based on archival data from one firm, Mock and Wright (1999, 1993) find that auditor risk assessments are associated with the number of audit hours, but risk assessments are not significantly associated with the number of audit procedures. Because these studies do not directly address auditor sampling decisions, they are not discussed further.
8 Monetary unit sampling and some other sampling methodologies allow the auditor to determine one sample for all accounts to be sampled. There is little academic evidence on whether this approach is used in practice.
Audit Sampling Research 119
Auditing: A Journal of Practice & Theory, Supplement 1, 2013
and materiality can vary to reflect risk and efficiency characteristics. However, audit risk and
the allowance for sampling risk must still be considered for the aggregate of samples.
Additional research is needed on auditors’ current sample size decisions, including current
levels of risk assessment, overall materiality, performance materiality, and tolerable misstatement,
and their impact on sample sizes. Does establishment of risks in audit planning affect auditors’
ability to adjust risks to reduce sample sizes? How does multi-location testing impact sample sizes?
Is multi-location testing impacted by whether the locations to be tested are domestic or foreign? For
example, are auditors more likely to pool domestic locations as one population to be sampled? Do
auditors treat foreign locations as separate populations for sampling, even when the foreign location
has similar transactions and controls as domestic locations?
Select Sample Items
Our fourth discussion question addresses how auditors select sample items for testing:
DQ4: What methods do auditors use to select sample items for testing?
As discussed earlier, Hall et al. (2002) found in a survey that 85 percent of audit procedures
involved nonstatistical sampling. Of these nonstatistical sampling procedures in tests of controls
and tests of balances, 87 percent used haphazard sampling. This is not surprising, since the standards
permit haphazard sampling even though more sophisticated, yet easy-to-apply, selection methods
exist. However, it is not clear whether the survey instrument allowed responses that
would characterize other forms of sample selection, and auditors may use different selection
techniques today than in 1997, when the survey was administered.
Hall et al. (2000) demonstrate that novice auditors are unable to eliminate bias in haphazard
sample selection. If the bias in novice auditors' selections is correlated with the characteristic of audit
interest (errors in the population), then haphazard selection could yield samples that are not
expected to be representative. In an experiment using storage units set up to hold inventory items and
vouchers, auditors tended to select large, brightly colored, and conveniently located sampling units. Using a
similar methodology, Hall et al. (2001) demonstrate that a strategy suggested in the literature of
doubling the haphazard sample does not completely correct the selection bias.
As previously noted, Hall et al. (2002) found that MUS is the most common statistical
sampling method used. MUS samples are selected with probability proportionate to size, so that
larger items are automatically emphasized. There are various MUS methods for selecting the
sample; the most common are the interval and cell methods.9
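The interval method of monetary unit selection described in footnote 9 can be sketched as follows. The book values are hypothetical, and a production implementation would also need to handle zero and negative balances; this is an illustration of the selection mechanics, not any firm's actual tool.

```python
import random

def mus_interval_select(book_values, sample_size, seed=None):
    """Systematic monetary unit selection: select every interval-th dollar
    starting from a random dollar within the first interval."""
    rng = random.Random(seed)
    population = sum(book_values)
    interval = population / sample_size
    start = rng.uniform(0, interval)          # random start within first interval
    targets = [start + i * interval for i in range(sample_size)]
    selected, cumulative, t = [], 0.0, 0
    for idx, amount in enumerate(book_values):
        cumulative += amount
        # An item is hit once for each target dollar it covers; items larger
        # than the interval are therefore certain to be selected.
        while t < sample_size and targets[t] <= cumulative:
            if not selected or selected[-1] != idx:
                selected.append(idx)
            t += 1
    return selected

items = [500, 12_000, 800, 3_200, 150, 9_350]   # hypothetical book values
print(mus_interval_select(items, sample_size=5, seed=42))
```

Note how the 12,000 item, which exceeds the 5,200 selection interval, is always selected, illustrating the automatic emphasis on larger items mentioned above.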
Stratification of populations is applicable to nonstatistical and statistical variables sampling.
Auditing standards require selection of all items greater than tolerable misstatement. Anecdotal
evidence suggests most firms usually sample from the remaining items as a single stratum (this is
technically not a stratified sample). Given that accounting populations are heteroscedastic (the
dollar amount of misstatement for an item is often related to the recorded value of the item), there
may be efficiency gains to further stratification.10 However, this is not applicable in populations
9 The interval length is the population size divided by the sample size. A random start is selected from a number between one and the length of the interval, and the interval length is added to the item selected to identify the next sample dollar. The cell method is an interval selection method that selects random dollars in each interval. These methods ensure that items greater than the interval will be selected for testing, but have the potential for bias associated with systematic selection methods if patterns exist in the sample data. In contrast, using the random method, each dollar in the population has an equal probability of being selected. Although larger items will have a greater probability of being selected, their selection is not guaranteed.
10 For example, Roshwalb et al. (1987) discuss a model-based statistical sampling approach. They applied the approach to four inventory populations using difference estimation. The model used a large number of strata, but resulted in a smaller sample size.
for which no or few misstatements are expected, which may be common for many audits (Durney
et al. 2012).
Auditors may select initial samples randomly or probabilistically, but then supplement the
initial sample with additional items that may be chosen nonstatistically. Wright (1991) documents
that the monetary unit skip interval selection method for augmenting a probability proportionate to
size sample over-represents small book value units. He shows that doubling the sample size to
halve the skip interval and selecting randomly from a preselected "PPS pool" maintains the
sampling inclusion properties. The more general implication is that an augmented sample may not
maintain the sample characteristics of the original sample. Although sieve sampling is not widely
used in practice, Hoogduin et al. (2010) present a modified sieve selection procedure that maintains
sampling stratum PPS properties in multi-stage samples.
Current research evidence on auditor sample selection techniques would be useful. For
example, to what extent do auditors use haphazard or directed sample selection? Do auditors use
random selection techniques with nonstatistical sampling? To what extent do selection biases
demonstrated in research result in samples that are not representative of the primary population
characteristics of interest (i.e., error rate)? Recent research by Durney et al. (2012) suggests that
structured computerized decision templates result in improved auditor evaluation of samples. Does
such use of technology improve the consistency of application of audit sampling, including sample
planning and the selection decisions? Finally, does the use of computerized decision templates ever
lead to less effective sampling by reducing professional judgment?
Perform Audit Procedures
Perhaps the most critical part of audit sampling is the actual performance of the audit
procedures; however, few research studies directly examine this issue. The fifth discussion question
addresses auditor effectiveness in detecting sample errors:
DQ5: How effective are auditors and audit teams in detecting errors in audit samples?
Waggoner (1990) gave 25 Big 8 auditors a set of 25 disbursement packages seeded with
errors. Auditors failed to detect 45 percent of the errors in the sample. Detection was related to
task experience, suggesting that close attention should be paid to the work of inexperienced staff
members who have not previously performed the task. Participants were not selected at random
and were not familiar with the audit program used. In addition, participants’ work was not
reviewed, as it would be in a typical auditing setting. Thus, this study may overstate the extent to
which auditors fail to detect sample errors, but the results suggest that evaluations of populations
based on samples may underestimate population error rates due to failure to consider non-
sampling risk.
We are unaware of any studies that address auditor effectiveness in detecting misstatements in
sampling monetary balances, although there is an extensive, but dated, literature on the
effectiveness of confirmations in detecting misstatements (see Caster et al. [2008] for a review).
Nonsampling risk arising from auditor failure to detect misstatements in a sample will result in
underestimating the projected misstatement and the upper limit on misstatement.
Nonsampling risk may be partially addressed by quality control procedures, including the
review process, as well as obtaining more evidence from other audit procedures that have lower
nonsampling risk. Further evidence on the causes and extent of nonsampling risk would be helpful
in assessing its impact on audit effectiveness, and whether existing levels of quality control and
review procedures are sufficient to mitigate nonsampling risk.
Evaluate Errors Found
Existing academic studies on evaluating errors primarily focus on error projection in tests of
details of balances.11 Akresh and Tatum (1988) surveyed CPAs and found that auditors have
difficulty with error projection. In a survey of New York CPA firms, Hitzig (1995) found that
failure to project errors occurs more frequently with nonstatistical sampling. PCAOB inspection
reports (PCAOB 2008) also identified instances of auditors failing to project sample errors. The
sixth discussion question addresses how auditors project errors to the population:
DQ6: To what extent do auditors project sample misstatements to the population? What
factors impact the decision to project sample misstatements?
Burgstahler and Jiambalvo (1986) presented a sample of Big 8 auditors with a set of seven
hypothetical accounts receivable confirmation differences. The auditors projected only 33 percent
of the errors, indicating that auditors fail to project most misstatements, although many of the seven
cases involved unique characteristics. They suggest auditors base the decision to project on the
auditor’s belief about the representativeness of the error (even though the concept of
representativeness applies to the total sample, not individual items), and argue that it is rarely
appropriate to isolate (not project) an error, since the errors found proxy for unknown errors.
Dusenbury et al. (1994) performed a study involving 105 experienced auditors from one
international accounting firm using cases modeled after those in Burgstahler and Jiambalvo (1986).
They manipulated error representativeness (error versus fraud) and information on containment of
the error to a subpopulation of the unsampled items. Auditors were more likely to project errors that
they believed were more representative of typical misstatements and less likely to project errors
when they believed the error was isolated or containment information was present.12
Elder and Allen (1998) investigate the factors associated with auditors’ decisions not to project
errors in accounts receivable and inventory, based on 235 accounts receivable and inventory
sampling applications for 53 audits of fiscal year 1993 or 1994, involving individual offices from
three audit firms. Auditors projected 67 percent of the errors. Auditors were more likely not to
project when errors were immaterial in nature, where the population was not well-specified, or in
situations where the error was contained to a subpopulation. There were significant differences in
projection rates across different types of tests, and across firms. Hermanson (1997) found that
auditors from structured audit firms were more likely to project errors than those from less-
structured firms. In a follow-up study of fiscal year 1999 audits, Allen and Elder (2005) found that
projection rates decreased for the two Big 6 firms in their sample. One of the firms increased its
reliance on containment procedures, and another firm increased its use of immateriality as
justification of a decision not to project an error.13
PCAOB (2008) inspection reports and discussions with practitioners suggest sample
misstatements may not always be projected to the population, an area that prior research
identifies as a potential risk. However, Wheeler et al. (1997) suggest that containment is appropriate
in some circumstances. Elder and Allen (1998) indicate that error containment often involves larger
11 Error projection is not a concern for attribute samples, since the projected deviation rate is the sample deviation rate, and most samples have an expectation of no deviations, meaning any deviation should cause unacceptable results.
12 An isolated error is one that the auditor identifies as being unique, while a contained error is one the auditor identifies as occurring only in a segment of the population.
13 The use of immateriality was based on the size of the error, presumably on the basis that with knowledge of the size of the misstatement, sample, and population, the auditor could readily estimate whether the projected misstatement would exceed the minimum threshold for posting amounts to the schedule of possible misstatements.
errors; otherwise, the auditor would not expend the effort to document error containment or
isolation if the projected error were immaterial. Durney et al. (2012) examine 160 post-SOX
sampling applications from one audit firm and find that 97 percent of the samples with error
included a projection.
The PCAOB audit standards indicate that errors should be projected to the population.
However, international auditing standards indicate that rare cases of anomalous errors do not need
to be projected when the auditor is able to obtain evidence that a misstatement (or deviation) is not
representative of the population (ISA 530.13). While the related AICPA (2011c) clarified auditing
standard, AU-C 530, Audit Sampling, did not adopt the anomaly language, AU-C 450, Evaluationof Misstatements Identified during the Audit (AICPA 2011b), indicates that an observed
misstatement may not be an isolated occurrence. In addition, AU-C 450.A4 cautions the auditor
that ‘‘A misstatement may not be an isolated occurrence. Evidence that other misstatements may
exist include, for example, when the auditor identifies that a misstatement arose from a breakdown
in internal control or from inappropriate assumptions or valuation methods that have been widely
applied by the entity.’’
Previous research (e.g., Allen and Elder 2008) and PCAOB inspection reports (PCAOB 2008)
have indicated situations where auditors do not project sample errors, potentially compromising
audit effectiveness, although recent research (Durney et al. 2012) suggests significantly improved
auditor performance in projecting sample errors. Perhaps auditor isolation of errors is related to the
formal acknowledgement of anomalies in some recently revised auditing standards. Although
Hitzig (2001) indicates that there really is no such thing as an isolated error, additional guidance on
the use of error containment could be helpful.
Additional research is needed to determine audit firms’ current practices regarding error
projection. Do projection rates differ for private company and public company audits as a result of
PCAOB oversight? Elder and Allen (1998) and Durney et al. (2012) find that projection decisions
are impacted by use of computer documentation templates. Do techniques such as training and
decision aids improve auditor performance? Given the recent discussion of anomalies in auditing
standards and evidence in Elder and Allen (1998) that containment is used for larger errors, what
reasons, including anomalies, are currently used by auditors to justify not projecting errors? Further,
do auditors ever fail to project errors because of fear of potential disputes with clients?
Conclude on Acceptability of Population Based on Sample Results
Regardless of whether nonstatistical or statistical sampling is used, the auditor should project
sample errors and consider sampling risk in evaluating whether the population is acceptable based
on the sample results. For tests of controls, this involves either comparing the computed upper
deviation rate based on the sample results to the planned tolerable deviation rate established by the
auditor or comparing the sample deviation rate to the planned deviation rate (the latter comparison
assumes the sample size was sufficiently large). For samples used as substantive tests of details, the
acceptability of the sample results is determined similarly by comparing the upper limit on
misstatement to tolerable misstatement or by comparing the projected misstatement to the expected
misstatement (this comparison assumes the sample size was sufficiently large). The final discussion
question addresses auditors’ evaluation of sample results:
DQ7: What factors impact auditors’ judgment as to whether the population is acceptable based
on the sample results?
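For tests of controls, the comparison of a computed upper deviation rate to the tolerable deviation rate can be sketched statistically. The sample size, number of deviations, risk level, and tolerable rate below are hypothetical; in practice, auditors typically rely on published attribute sampling tables or audit software rather than code like this.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def upper_deviation_limit(n, deviations, risk=0.05, tol=1e-6):
    """One-sided upper confidence bound: the smallest deviation rate p such
    that observing `deviations` or fewer in n items has probability <= risk."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:          # bisection; binom_cdf is decreasing in p
        mid = (lo + hi) / 2
        if binom_cdf(deviations, n, mid) > risk:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical test of controls: 60 items, 1 deviation, 5 percent sampling risk.
upper = upper_deviation_limit(60, 1, risk=0.05)
tolerable = 0.08
print(f"upper limit {upper:.3f}; acceptable: {upper <= tolerable}")
```

Here the computed upper limit of roughly 7.7 percent falls below the assumed 8 percent tolerable rate, so the control would be judged acceptable; a second deviation would push the limit above tolerable and change the conclusion.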
Uecker and Kinney (1977) tested whether auditors' evaluations of samples were affected by
representativeness and protectiveness heuristics. Under the representativeness heuristic, auditors
may accept the results of a sample with a low sample deviation rate, even if the actual results are
unacceptable. Under the protectiveness heuristic, auditors may accept the results of a larger sample
compared to the results from a smaller sample, even if the smaller sample has a lower deviation rate.
They found that auditors employed both heuristics, indicating the potential risk of drawing incorrect
conclusions when auditors evaluate samples nonstatistically. However, this paper predates current
auditing standards.
Blocher and Bylinski (1985) provide subjects with eight trials involving the evaluation of a
sample of 100 accounts receivable selected from a population of 4,033 trade accounts receivable.
They manipulated the variance of audited sample values, mean error amounts, and error variances.
Auditors were asked to construct confidence intervals for the amount of misstatement and account
balance; the auditors’ constructed intervals were narrower than statistical confidence intervals.
Ham et al. (1985) study error rates and distributions for 20 audits over five years. They find that
error amounts are highly variable and not normally distributed. The magnitude and direction of
errors also differed across accounts. They also found that cutoff errors represented the most likely
source of material error for inventory, accounts receivable, and accounts payable. Butler (1985)
constructed a simple decision aid that caused auditors to consider the base rate of misstatement and
the predictability of the information. Auditors using the decision aid were more likely to make
correct decisions to accept or reject the population compared to a control group.
Peek et al. (1991) examined the effect of the AICPA (1983) Audit Sampling Audit Guide in
evaluating audit populations. They tested two decision rules used by some auditors to account for
sampling risk. One rule rejected the population if projected misstatement exceeded one-third of
tolerable misstatement; the other rejected the population if projected misstatement exceeded two-
thirds of tolerable misstatement. They tested four accounting populations and varied the extent of
error and tolerable error. Not surprisingly, the two decision rules differed significantly in the extent
to which they resulted in incorrect acceptance and incorrect rejection decisions.14
A more recent paper by Durney et al. (2012) examines 160 sampling applications of a large
network firm that instituted a formalized sampling worksheet to walk auditors through the steps of
sampling, and requires error projection and consideration of sampling risk. They find that sampling
risk was properly considered in the design of all samples; of the 20 percent of samples that
contained errors, 97 percent included proper error projection and consideration of sampling risk.
Elder and Allen (1998) found that auditors generally made direct linear projections of errors
and did not quantify sampling risk for individual sampling applications. PCAOB AU 350 (¶26)
indicates that "if the total projected misstatement is close to the tolerable misstatement, the auditor may
conclude that there is an unacceptably high risk that the actual misstatements in the population
exceed the tolerable misstatement.’’ Burgstahler et al. (2000) found that auditors are less likely to
require an audit adjustment when the uncertainty associated with sampling risk is not properly
considered by the auditor.
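The direct linear (ratio) projection that Elder and Allen describe, together with a rule-of-thumb comparison to tolerable misstatement of the kind tested by Peek et al. (1991), can be sketched as follows. All amounts and the one-third threshold are hypothetical illustrations, not any firm's actual decision rule.

```python
def ratio_projection(sample_misstatement, sample_book_value, population_book_value):
    """Direct linear (ratio) projection: scale the sample misstatement rate
    to the recorded amount of the population."""
    return sample_misstatement / sample_book_value * population_book_value

# Hypothetical substantive test of details.
projected = ratio_projection(
    sample_misstatement=4_500,        # net misstatement found in the sample
    sample_book_value=300_000,        # recorded amount of items examined
    population_book_value=2_000_000,  # recorded amount of the population
)
tolerable = 100_000
print(f"projected misstatement: {projected:,.0f}")  # 30,000
# Rule-of-thumb allowance for sampling risk: the closer the projection is
# to tolerable misstatement, the higher the risk that actual misstatement
# exceeds tolerable misstatement.
print("acceptable" if projected < tolerable / 3 else "consider more work")
```

As Peek et al. (1991) show, the choice of threshold (one-third versus two-thirds of tolerable misstatement) materially changes the rates of incorrect acceptance and incorrect rejection.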
In addition to failing to project errors and adequately consider sampling risk, evaluations of
samples may be affected by the presence of nonsampling error. Anderson and Kraushaar (1986)
14 Ponemon and Wendell (1995) asked 49 inexperienced auditors with an average of less than two years of experience, and 34 experienced auditors with an average of over four years of experience, to nonstatistically select 50 sample items and nonstatistically set an upper misstatement bound for the supplies inventory for a school district. The median of the statistical bounds was closer to the actual overstatement amount than the median of the auditors' nonstatistical bounds for all confidence levels, and had lower dispersion. A validation experiment indicated that auditors did not judge confidence bounds set statistically as superior to bounds set nonstatistically. However, an alternative interpretation is that the task was not consistent with how at least some auditors consider sampling risk given the guidance in the current and previous AICPA Audit Sampling guides, which involves a direct comparison of projected misstatement to tolerable misstatement to determine if there is adequate allowance for sampling risk.
find that auditor measurement error can significantly affect statistical results and result in actual
confidence levels well below the desired confidence level. The potential presence of nonsampling
risk can be mitigated by use of adequate review, training, and other quality control procedures and
by use of conservative estimators.
Little current evidence exists on the use of nonstatistical sampling to evaluate audit samples
and the decision rules used to account for sampling risk and nonsampling risk. Further research
would be helpful to assess current statistical and nonstatistical sampling practices and the potential
for judgment errors in the evaluation of sample results, especially when done using nonstatistical
methods. Are there differences across regulatory or country boundaries in the projection of errors
and the consideration of sampling risk? If so, what are the causes (e.g., training in math and
statistics, standards)? For example, international auditing standards indicate errors observed by the
auditor may not need to be projected if they are "anomalous"; however, similar wording is not
included in U.S. auditing standards. Are errors less likely to be projected for non-U.S. audits?
Future research can examine whether culture is a factor in possible differences in the projection of
errors by auditors in different countries.
SUMMARY AND CONCLUSIONS
We developed a framework of the audit sampling process, and reviewed academic and
practitioner research on the use of audit sampling. Current standards allow the use of statistical and
nonstatistical sampling methods; however, limited current research evidence exists on the methods
used by auditors and their relative effectiveness. Research indicates that auditors often
underestimate risks in order to minimize sample sizes. Establishing risk assessments during
planning may reduce the extent to which auditors subsequently lower them to justify smaller samples.
Several studies also document that auditors often failed to project errors, although recent
research suggests that post-SOX projection may be more common. Additional research could
address whether there are circumstances when it is acceptable not to project sample errors, and the
documentation necessary to support this conclusion. Auditors are prone to decision biases when
they evaluate samples nonstatistically, which may result in accepting populations that are
unacceptable based on specified levels of tolerable deviation rate or tolerable misstatement.
Many questions raised in our current study have not been addressed by existing research. We
encourage further research in the following areas:
(1) Reporting on internal control, increases in computerized controls, and use of risk
assessment procedures and substantive analytical procedures have changed the way
auditors approach testing. How have these factors affected the types of tests auditors
consider to be sampling applications?
(2) Changes in the regulatory environment may have increased sample sizes and the need for
auditors to justify sampling decisions. To what extent are auditors currently using
statistical versus nonstatistical sampling in tests of controls and monetary tests of details?
Are there situations where firms require the use of statistical sampling? Are there situations
where firms prohibit the use of statistical sampling? Do these requirements vary based on
the regulatory environment?
(3) Given the changes in the need for assurance from sampling and the ability to easily apply
statistical sampling with computer technology, are there situations where statistical
sampling should be required?
(4) Elder and Allen (2003) found a weak relationship between risk and sample sizes. To what
extent are current sample sizes sensitive to risk factors? The AICPA Audit Sampling Audit
Guide indicates that sample sizes determined nonstatistically should be similar to those
determined statistically. Are there differences in practice for sample sizes determined
under the two approaches?
(5) Sampling decisions may be impacted by regulation and the legal environment. Are the use
of sampling, type of sampling, and sample sizes sensitive to the legal and regulatory
environment? If so, what are the causes—incentives, competition, culture, inspection, or
litigation risk? Are the differences observed within the large global network firms, or across
firms?
(6) What techniques are currently used by auditors to select sample items for testing? Do
auditors primarily use statistical or nonstatistical selection methods? Where auditors select
a nonstatistical sample, what quality controls are employed to be sure the sample is
expected to be representative of the characteristic(s) of interest?
(7) Companies have grown in size and complexity, but there is little research or guidance on
sampling in such environments. How does the existence of multiple locations, sometimes
with differing controls and accounting systems, influence the way sampling is planned,
performed, and evaluated? Further, is this problem exacerbated by a client with multiple
locations in different countries?
(8) Previous research (e.g., Burgstahler and Jiambalvo 1986; Elder and Allen 1998;
Burgstahler et al. 2000) indicates that auditors often fail to project misstatements,
potentially compromising audit effectiveness; recent research by Durney et al. (2012)
suggests improved auditor performance. How effective are auditors in projecting
misstatements? What impacts auditors’ decisions not to project sample errors?
(9) Burgstahler et al. (2000) find that auditors fail to consider sampling risk, potentially
resulting in incorrect audit conclusions. What techniques do auditors use to consider
sampling risk when evaluating samples nonstatistically?
Many of the research questions we have identified could be addressed using controlled
experiments or archival data from audit firms. We encourage researchers to pursue these questions
and audit firms to provide data access and subjects to support such research.
REFERENCES
Akresh, A. 1980. Statistical sampling in public accounting. The CPA Journal 50 (7): 20–26.
Akresh, A., and D. Finley. 1979. Two-step attributes sampling in auditing. The CPA Journal 46 (12): 19–
24.
Akresh, A., J. Loebbecke, and W. Scott. 1988. Audit approaches and techniques. In Research Opportunities in Auditing: The Second Decade, edited by Abdel-Khalik, A., and I. Solomon, 32–49. Sarasota, FL:
American Accounting Association.
Akresh, A., and K. Tatum. 1988. Audit sampling—Dealing with the problems. Journal of Accountancy (December): 58–64.
Aldersley, S., W. Felix, W. Kinney, and J. Loebbecke. 1995. Audit sampling. In Auditing Practice, Research, and Education: A Productive Collaboration, edited by Bell, T., and A. Wright. New York,
NY: AICPA.
Aldersley, S., and D. Leslie. 1984. Models for multilocation audits. Symposium on Audit Research VI, 99–
128, University of Illinois at Urbana–Champaign.
Allen, R., and R. Elder. 2005. A longitudinal investigation of auditor error projection decisions. Auditing: A Journal of Practice & Theory 24 (2): 69–84.
American Institute of Certified Public Accountants (AICPA). 1981. Audit Sampling. Statement on Auditing
Standards No. 39. New York, NY: AICPA.
American Institute of Certified Public Accountants (AICPA) Audit Sampling Committee. 1983. Audit and Accounting Guide: Audit Sampling. New York, NY: AICPA.
American Institute of Certified Public Accountants (AICPA). 2011a. Performing Audit Procedures in
Response to Assessed Risks and Evaluating the Audit Evidence Obtained. AU-C Section 330. New
York, NY: AICPA.
American Institute of Certified Public Accountants (AICPA). 2011b. Evaluation of Misstatements Identified
During the Audit. AU-C Section 450. New York, NY: AICPA.
American Institute of Certified Public Accountants (AICPA). 2011c. Audit Sampling. AU-C Section 530.
New York, NY: AICPA.
American Institute of Certified Public Accountants (AICPA). 2011d. Audit Conclusions and Reporting.
AU-C Section 700. New York, NY: AICPA.
American Institute of Certified Public Accountants (AICPA) Audit Sampling Committee. 2012a. Audit
Sampling: Audit Guide. New York, NY: AICPA.
American Institute of Certified Public Accountants (AICPA). 2012b. Audit Guide: Government Auditing
Standards and Circular A-133 Audits. New York, NY: AICPA.
Anderson, J., and J. Kraushaar. 1986. Measurement error and statistical sampling in auditing: The potential
effects. The Accounting Review 61 (3): 379–399.
Asare, S., and A. Wright. 2012. Investors’, auditors’, and lenders’ understanding of the message conveyed
by the standard audit report on the financial statements. Accounting Horizons 26 (2): 193–217.
Birnberg, J. 1964. Bayesian statistics: A review. Journal of Accounting Research (Spring): 108–116.
Blocher, E., and J. Bylinski. 1985. The influence of sample characteristics in sample evaluation. Auditing: A
Journal of Practice & Theory 5 (1): 79–90.
Burgstahler, D., S. Glover, and J. Jiambalvo. 2000. Error projection and uncertainty in the evaluation of
aggregate error. Auditing: A Journal of Practice & Theory 19 (1): 79–99.
Burgstahler, D., and J. Jiambalvo. 1986. Sample error characteristics and projection of error to audit
populations. The Accounting Review 61 (2): 233–248.
Butler, S. A. 1985. Application of a decision aid in the judgmental evaluation of substantive test of details
samples. Journal of Accounting Research 23 (2): 513–526.
Carpenter, B., and M. Dirsmith. 1993. Sampling and the abstraction of knowledge in the auditing
profession: An extended institutional theory perspective. Accounting, Organizations and Society 18
(1): 41–64.
Caster, P., R. Elder, and D. Janvrin. 2008. A summary of research and enforcement release evidence on
confirmation use and effectiveness. Auditing: A Journal of Practice & Theory 27 (4): 253–279.
Colbert, J. 1991. Statistical or nonstatistical sampling: Which approach is best? The Journal of Applied
Business Research 7 (2): 117–120.
Durney, M., R. Elder, and S. Glover. 2012. Error Rates, Error Projection, and Consideration of Sampling
Risk: Audit Sampling Data from the Field. Working paper, Brigham Young University.
Dusenbury, R., J. Reimers, and S. Wheeler. 1994. The effect of containment information and error
frequency on projection of sample errors to audit populations. The Accounting Review 69 (1): 257–
264.
Dworin, L., and R. Grimlund. 1984. Dollar-unit sampling for accounts receivable and inventory. The
Accounting Review 59 (2): 218–241.
Elder, R., and R. Allen. 1998. An empirical investigation of the auditor’s decision to project errors.
Auditing: A Journal of Practice & Theory 17 (2): 71–87.
Elder, R., and R. Allen. 2003. A longitudinal field investigation of auditor risk assessments and sample size
decisions. The Accounting Review 78 (4): 983–1002.
Elliott, R., and J. Rogers. 1972. Relating statistical sampling to audit objectives. Journal of Accountancy
(July): 46–55.
Felix, W., Jr., and R. Grimlund. 1977. A sampling model for audit tests of composite accounts. Journal of
Accounting Research (Spring): 23–41.
Felix, W., Jr., R. Grimlund, F. Koster, and R. Roussey. 1990. Arthur Andersen’s new monetary unit
sampling approach. Auditing: A Journal of Practice & Theory 9 (3): 1–16.
Audit Sampling Research 127
Auditing: A Journal of Practice & Theory, Supplement 1, 2013
Gilbertson, D., and T. Herron. 2003. Audit sampling methods and juror negligence awards: An expectation
gap? The Journal of Applied Business Research 19 (3): 109–122.
Godfrey, J., and J. Neter. 1984. Bayesian bounds for monetary unit sampling in accounting and auditing.
Journal of Accounting Research 22 (2): 497–525.
Gray, G., J. Turner, P. Coram, and T. Mock. 2011. Perceptions and misperceptions regarding the
unqualified auditor’s report by financial statement preparers, users, and auditors. Accounting
Horizons 25 (4): 659–684.
Grimlund, R., and W. Felix, Jr. 1987. Simulation evidence and analysis of alternative methods of evaluating
dollar-unit samples. The Accounting Review 62 (3): 455–479.
Hackenbrack, K., and R. Knechel. 1997. Resource allocation decisions in audit engagements.
Contemporary Accounting Research 14 (3): 481–500.
Hall, T., J. Hunton, and B. Pierce. 2000. The use of and selection biases associated with nonstatistical
sampling in auditing. Behavioral Research in Accounting 16: 231–255.
Hall, T., J. Hunton, and B. Pierce. 2002. Sampling practices of auditors in public accounting, industry, and
government. Accounting Horizons 16 (2): 125–136.
Hall, T., T. Herron, B. Pierce, and T. Witt. 2001. The effectiveness of increasing sample size to mitigate the
influence of population characteristics in haphazard sampling. Auditing: A Journal of Practice &
Theory 20 (1): 169–185.
Ham, J., D. Losell, and W. Smieliauskas. 1985. An empirical study of error characteristics in accounting
populations. The Accounting Review 60 (3): 387–406.
Hermanson, H. 1997. The effects of audit structure and experience on auditors’ decisions to isolate errors.
Behavioral Research in Accounting (Supplement): 76–93.
Hitzig, N. 1995. Audit sampling: A survey of current practice. The CPA Journal (July): 54–57.
Hitzig, N. 2001. The mythical isolated error. The CPA Journal 71 (9): 50.
Hoogduin, L., T. Hall, and J. Tsay. 2010. Modified sieve sampling: A method for single- and multi-stage
probability-proportional-to-size sampling. Auditing: A Journal of Practice & Theory 29 (1): 125–
148.
Houghton, C., and J. Fogarty. 1991. Inherent risk. Auditing: A Journal of Practice & Theory 10 (1): 1–21.
International Auditing and Assurance Standards Board (IAASB). 2009. Forming an Opinion and Reporting
on Financial Statements. International Standard on Auditing 700. New York, NY: IAASB.
Jacoby, J., and N. Hitzig. 2011. Auditing internal controls in small populations. The CPA Journal 81 (12):
34–36.
Johnstone, K., and J. Bedard. 2001. Engagement planning, bid pricing and client response: The effects of
risk and market context in initial attest engagements. The Accounting Review 76 (2): 199–220.
Kachelmeier, S., and W. Messier, Jr. 1990. An investigation of the influence of a nonstatistical decision aid
on auditor sample size decisions. The Accounting Review 65 (1): 209–226.
Leitch, R., J. Neter, R. Plante, and P. Sinha. 1981. Implementation of upper multinomial bound using
clustering. Journal of the American Statistical Association 76 (375): 530–533.
Leitch, R., J. Neter, R. Plante, and P. Sinha. 1982. Modified multinomial bounds for larger numbers of
errors in audits. The Accounting Review 57 (2): 384–400.
Leslie, D., A. Teitlebaum, and R. Anderson. 1979. Dollar-Unit Sampling: A Practical Guide for Auditors.
Belle Mead, NJ: Pittman Publishing.
McCray, J. 1984. A quasi-Bayesian audit risk model for dollar unit sampling. The Accounting Review 59
(1): 35–51.
Menzefricke, U., and W. Smieliauskas. 1988. On sample size allocation in auditing. Contemporary
Accounting Research 4 (2): 314–336.
Messier, W., Jr., S. Kachelmeier, and K. Jensen. 2001. An experimental assessment of recent professional
developments in nonstatistical audit sampling guidance. Auditing: A Journal of Practice & Theory 20
(1): 81–96.
Mock, T., and A. Wright. 1993. An exploratory study of auditors’ evidential planning judgments. Auditing:
A Journal of Practice & Theory 12 (2): 39–61.
Mock, T., and A. Wright. 1999. Are audit program plans risk-adjusted? Auditing: A Journal of Practice & Theory 18 (1): 55–74.
Neter, J., R. Leitch, and S. Fienberg. 1978. Dollar unit sampling: Multinomial bounds for total
overstatement and understatement errors. The Accounting Review 53 (1): 77–93.
Neter, J., and J. K. Loebbecke. 1975. Behavior of Major Statistical Estimators in Sampling Accounting Populations: An Empirical Study. New York, NY: American Institute of Certified Public Accountants.
O’Keefe, T., D. Simunic, and M. Stein. 1994. The production of audit services: Evidence from a major
public accounting firm. Journal of Accounting Research 32 (2): 241–261.
Peek, L., J. Neter, and C. Warren. 1991. AICPA nonstatistical audit sampling guidelines: A simulation.
Auditing: A Journal of Practice & Theory 10 (2): 33–48.
Ponemon, L., and J. Wendell. 1995. Judgmental versus random sampling in auditing: An experimental
investigation. Auditing: A Journal of Practice & Theory 14 (2): 17–34.
Public Company Accounting Oversight Board (PCAOB). 2003. Interim Standards—Audit Sampling. AU
Section 350. Washington, DC: PCAOB.
Public Company Accounting Oversight Board (PCAOB). 2008. Report on the PCAOB’s 2004, 2005, 2006,
and 2007 inspections of domestic annually inspected firms. PCAOB Release No. 2008-008
(December 5).
Roberts, D. 1978. Statistical Auditing. New York, NY: American Institute of Certified Public Accountants.
Roshwalb, A., R. Wright, and J. Godfrey. 1987. A new approach for stratified sampling in inventory cost
estimation. Auditing: A Journal of Practice & Theory 7 (1): 54–70.
Scott, W. 1973. A Bayesian approach to asset valuation and audit size. Journal of Accounting Research 11
(2): 304–330.
Shafer, G., and R. Srivastava. 1990. The Bayesian and belief-function formalisms: A general perspective for
auditing. Auditing: A Journal of Practice & Theory (Supplement): 110–148.
Smith, G., and J. Krogstad. 1984. Impact of sources and authors on Auditing: A Journal of Practice & Theory. Auditing: A Journal of Practice & Theory 4 (1): 107–117.
Stringer, K. 1963. Practical Aspects of Statistical Sampling in Auditing. Proceedings of Business and
Economic Statistics Section, American Statistical Association.
Sullivan, J. 1992. Litigation risk broadly considered. In Auditing Symposium XI: Proceedings of the 1992 Deloitte and Touche/University of Kansas Symposium on Auditing Problems, 49–59. Lawrence, KS: The University of Kansas School of Business.
Tracy, J. 1969. Bayesian statistical methods in auditing. The Accounting Review 44 (1): 90–98.
Trompeter, G., and A. Wright. 2010. The world has changed—Have analytical procedure practices?
Contemporary Accounting Research 27 (2): 669–700.
Tsui, K., M. Matsumura, and K. Tsui. 1985. Multinomial-Dirichlet bounds for dollar-unit sampling in
auditing. The Accounting Review 60 (1): 76–96.
Uecker, W., and W. Kinney, Jr. 1977. Judgmental evaluation of sample results: A study of the type and
severity of errors made by practicing CPAs. Accounting, Organizations and Society 2 (3): 269–275.
Waggoner, J. B. 1990. Auditor detection rates in an internal control test. Auditing: A Journal of Practice & Theory 9 (2): 77–89.
Wheeler, S., R. Dusenbury, and J. Reimers. 1997. Projecting sample misstatements to audit populations:
Theoretical, professional and empirical considerations. Decision Sciences (Spring): 261–268.
Wright, D. 1991. Augmenting a sample selected with probabilities proportional to size. Auditing: A Journal of Practice & Theory 10 (1): 145–158.