Technology-Assisted Review Can be More Effective and More Efficient Than Exhaustive Manual Review

Gordon V. Cormack
University of Waterloo
[email protected]
(519) 888-4567 x34450

Maura R. Grossman
Wachtell, Lipton, Rosen & Katz
[email protected]
(212) 403-1391

Watson Versus Jennings and Rutter

Debunking the Myth of Manual Review

The Myth:

That "eyeballs-on" review of each and every document in a massive collection of ESI will identify essentially all responsive (or privileged) documents; and

That computers are less reliable than humans in identifying responsive (or privileged) documents.

The Facts:

Humans miss a substantial number of responsive (or privileged) documents;

Computers, aided by humans, find at least as many responsive (or privileged) documents as humans alone; and

Computers, aided by humans, make fewer errors on responsiveness (or privilege) than humans alone, and are far more efficient than humans.

Human Assessors Disagree!

Suppose two assessors, A and B, review the same set of documents.

Overlap = (# documents coded responsive by both A and B) / (# documents coded responsive by A or B, or both A and B)

Example: Primary and secondary assessors both code 2,504 documents as responsive.

One or both code 2,531 + 2,504 + 463 = 5,498 documents as responsive.

Overlap = 2,504 / 5,498 = 45.5%.

More Human Assessors Disagree Even More!

Suppose three assessors, A, B, and C, review the same set of documents.

Overlap = (# documents coded responsive by A and B and C) / (# documents coded responsive by one or more of A, B, or C)

Example: Primary, secondary, and tertiary assessors all code 1,972 documents as responsive.

One or more code 1,482 + 532 + 224 + 1,972 + 1,049 + 239 + 522 = 6,020 documents as responsive.

Overlap = 1,972 / 6,020 = 32.8%.
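Both overlap figures can be verified directly: the statistic is the Jaccard coefficient (documents coded responsive by all assessors, divided by documents coded responsive by any assessor). A quick check in Python:

```python
# Assessor overlap is the Jaccard coefficient: documents coded
# responsive by ALL assessors, divided by documents coded responsive
# by ANY assessor. The counts are taken from the two slides above.

def overlap(coded_by_all: int, coded_by_any: int) -> float:
    """Overlap = |intersection| / |union|, as a fraction."""
    return coded_by_all / coded_by_any

# Two assessors: 2,504 coded responsive by both A and B;
# 2,531 + 2,504 + 463 = 5,498 coded responsive by A, B, or both.
two_way = overlap(2504, 2531 + 2504 + 463)
print(f"Two-assessor overlap:   {two_way:.1%}")    # 45.5%

# Three assessors: 1,972 coded responsive by A, B, and C; the seven
# regions of the three-way Venn diagram sum to 6,020 coded by at least one.
three_way = overlap(1972, 1482 + 532 + 224 + 1972 + 1049 + 239 + 522)
print(f"Three-assessor overlap: {three_way:.1%}")  # 32.8%
```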

Pairwise Assessor Overlap in the TREC 4 IR Task (Voorhees 2000)

Assessor Overlap With the Original Response to a DOJ Second Request (Roitblat et al. 2010)

Assessor Overlap: IR Versus Legal Tasks

What is the "Truth"? Option #1: Deem Someone Correct

Deem the primary reviewer as the gold standard (Voorhees 2000).

What is the “Truth”? Option #2: Take the Majority Vote

Deem the majority vote as the gold standard.

What is the "Truth"? Option #3: Have all Disagreements Adjudicated by a Topic Authority

Have a senior attorney adjudicate all, and only, cases of disagreement (Roitblat et al. 2010; TREC Interactive Task 2009).

How Good are Human Eyeballs?

What do we mean by “How Good”?

Recall; Precision; and F1.

Measures of Information Retrieval

Recall = (# of responsive documents retrieved) / (total # of responsive documents in the entire document collection)

("How many of the responsive documents did I find?")

Precision = (# of responsive documents retrieved) / (total # of documents retrieved)

("How much of what I retrieved was junk?")

F1 = the harmonic mean of Recall and Precision.
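These measures are straightforward to compute from raw counts. A minimal sketch, using hypothetical counts chosen only to illustrate the arithmetic:

```python
# Recall, Precision, and F1 from the definitions above. All counts
# below are hypothetical, used only to show the arithmetic.

def recall(responsive_retrieved: int, responsive_in_collection: int) -> float:
    return responsive_retrieved / responsive_in_collection

def precision(responsive_retrieved: int, total_retrieved: int) -> float:
    return responsive_retrieved / total_retrieved

def f1(r: float, p: float) -> float:
    """Harmonic mean of Recall and Precision."""
    return 2 * r * p / (r + p)

# Hypothetical review: 10,000 documents retrieved, 7,000 of them
# responsive; the collection contains 10,000 responsive documents.
r = recall(7000, 10000)     # 0.70 -- found 70% of the responsive documents
p = precision(7000, 10000)  # 0.70 -- 30% of what was retrieved was junk
print(f"Recall = {r:.0%}, Precision = {p:.0%}, F1 = {f1(r, p):.0%}")
```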

Recall and Precision

The Recall-Precision Trade-Off

[Figure: Precision (0%-100%) plotted against Recall (0%-100%), comparing Perfection, the TREC Best Benchmark (best performance on Precision at a given Recall), a typical result in a manual responsiveness review, and Blair & Maron (1985).]

How Good is Manual Review?

Effectiveness of Manual Review

How Good is Technology-Assisted Review?

What is "Technology-Assisted Review"?

Defining “Technology-Assisted Review”

The use of machine learning technologies to categorize an entire collection of documents as responsive or non-responsive, based on human review of only a subset of the document collection. These technologies typically rank the documents from most to least likely to be responsive to a specific information request. This ranking can then be used to “cut” or partition the documents into one or more categories, such as potentially responsive or not, in need of further review or not, etc.

Think of a spam filter that reviews and classifies email into “ham,” “spam,” and “questionable.”
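The "rank, then cut" step in this definition reduces to a sort and a pair of thresholds. In this sketch the scores are invented stand-ins for a classifier's output, and the cutoff values are arbitrary:

```python
# Rank documents from most to least likely responsive, then "cut" the
# ranking into categories. Scores are invented stand-ins for a
# classifier's estimated probability of responsiveness.

docs = {
    "doc1": 0.97, "doc2": 0.91, "doc3": 0.62,
    "doc4": 0.35, "doc5": 0.08, "doc6": 0.02,
}

ranked = sorted(docs, key=docs.get, reverse=True)

# Arbitrary cutoffs, like a spam filter's ham/questionable/spam buckets.
responsive   = [d for d in ranked if docs[d] >= 0.80]
needs_review = [d for d in ranked if 0.20 <= docs[d] < 0.80]
excluded     = [d for d in ranked if docs[d] < 0.20]

print("responsive:", responsive)      # doc1, doc2
print("needs review:", needs_review)  # doc3, doc4
print("excluded:", excluded)          # doc5, doc6
```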


Types of Machine Learning

SUPERVISED LEARNING = where a human chooses the document exemplars (“seed set”) to feed to the system and requests that the system rank the remaining documents in the collection according to their similarity to, or difference from, the exemplars (i.e., “find more like this”).

ACTIVE LEARNING = where the system chooses the document exemplars to feed to the human and requests that the human make responsiveness determinations from which the system then learns and applies that learning to the remaining documents in the collection.
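The supervised "find more like this" mode can be illustrated with a toy ranker. The seed text, the mini-collection, and the bag-of-words cosine scoring below are all illustrative stand-ins for a real learning system:

```python
# Rank unreviewed documents by similarity to a human-chosen seed
# exemplar ("find more like this"). Bag-of-words cosine similarity
# stands in for a real machine-learning model.

from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norms = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norms if norms else 0.0

seed = Counter("shred the documents before the audit".split())

collection = {
    "m1": "please shred these documents tonight",
    "m2": "lunch menu for the cafeteria this week",
    "m3": "retention policy for all audit documents",
}

scores = {doc_id: cosine(seed, Counter(text.split()))
          for doc_id, text in collection.items()}

for doc_id in sorted(scores, key=scores.get, reverse=True):
    print(doc_id, round(scores[doc_id], 3))  # most similar first
```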

Machine Learning Step #1: Achieving High Precision

Document Set for Review

Machine Learning Step #2: Improving Recall

Document Set Excluded From Review

Source: Servient Inc. http://www.servient.com/

How Do We Evaluate Technology-Assisted Review?


The Text REtrieval Conference (“TREC”):Measuring the Effectiveness of Technology-Assisted Review

International, interdisciplinary research project sponsored by the National Institute of Standards and Technology (NIST), which is part of the U.S. Department of Commerce.

Designed to promote research into the science of information retrieval.

First TREC conference was held in 1992; the TREC Legal Track began in 2006.

Designed to evaluate the effectiveness of search technologies in the context of e-discovery.

Employs hypothetical complaints and requests for production drafted by members of The Sedona Conference®.

For the first three years (2006-2008), documents came from the publicly available, seven-million-document tobacco litigation Master Settlement Agreement database.

Since 2009, publicly available Enron data sets have been used.

Participating teams of information scientists from around the world and U.S. litigation support service providers have contributed computer runs attempting to identify responsive (or privileged) documents.

The TREC Interactive Task

The Interactive Task was introduced in 2008, and repeated in 2009 and 2010.

It models a document review for responsiveness.

It begins with a mock complaint and associated requests for production (“topics”).

It has a single Topic Authority (“TA”) for each topic.

Teams may interact with the Topic Authority for up to 10 hours.

Each team must submit a binary (“responsive” / “unresponsive”) decision for each and every document in the collection for their assigned topic(s).

It provides a two-step assessment and adjudication process for the gold standard: where the team and the assessor agree on coding, the coding decision is deemed correct; where they disagree, the decision is appealed to the Topic Authority, who determines which coding is correct.
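The two-step gold-standard rule reduces to a simple conditional. This sketch uses invented coding values; `ta_ruling` is a hypothetical stand-in for the Topic Authority's decision on appeal:

```python
# Two-step gold standard: team/assessor agreement stands; a
# disagreement is appealed to the Topic Authority (TA), whose
# ruling controls. Coding values here are invented for illustration.

def gold_standard(team: str, assessor: str, ta_ruling: str) -> str:
    """Final coding for one document."""
    if team == assessor:
        return team      # agreement is deemed correct; no appeal
    return ta_ruling     # disagreement: the TA decides

print(gold_standard("responsive", "responsive", ta_ruling="unresponsive"))    # responsive
print(gold_standard("responsive", "unresponsive", ta_ruling="unresponsive"))  # unresponsive
```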


Effectiveness of Technology-Assisted Review at TREC 2009


Manual Versus Technology-Assisted Review

But!

Roitblat, Voorhees, and the TREC 2009 Interactive Task all used different datasets, different topics, and different gold standards, so we cannot directly compare them.

While technology-assisted review appears to be at least as good as manual review, we need to control for these differences.


Effectiveness of Manual Versus Technology-Assisted Review


So, Technology-Assisted Review is at Least as Effective as Manual Review, But is it More Efficient?


Efficiency of Technology-Assisted Versus Exhaustive Manual Review

Exhaustive manual review involves coding 100% of the documents, while technology-assisted review involves coding of between 0.5% (Topic 203) and 5% (Topic 207) of the documents.

Therefore, on average, technology-assisted review is 50 times more efficient than exhaustive manual review.
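The arithmetic behind these figures: reviewing a fraction f of the collection instead of all of it cuts the number of documents reviewed by a factor of 1/f, so the two topics bracket the roughly 50-fold average saving:

```python
# Per-topic efficiency factors from the review fractions quoted above.
# Reviewing a fraction f of the collection is a 1/f reduction in
# documents that need human eyes.

for topic, fraction in [("Topic 203", 0.005), ("Topic 207", 0.05)]:
    print(f"{topic}: reviewed {fraction:.1%} of the collection "
          f"-> {1 / fraction:.0f}x fewer documents reviewed")
```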


Why Are Humans So Lousy at Document Review?

Topic 204 (TREC 2009)

Document Request: All documents or communications that describe, discuss, refer to, report on, or relate to any intentions, plans, efforts, or activities involving the alteration, destruction, retention, lack of retention, deletion, or shredding of documents or other evidence, whether in hard-copy or electronic form.

Topic Authority: Maura R. Grossman (Wachtell, Lipton, Rosen & Katz)

Inarguable Error for Topic 204

Interpretation Error for Topic 204

Arguable Error for Topic 204

Topic 207 (TREC 2009)

Document Request: All documents or communications that describe, discuss, refer to, report on, or relate to fantasy football, gambling on football, and related activities, including but not limited to, football teams, football players, football games, football statistics, and football performance.

Topic Authority: K. Krasnow Waterman (LawTechIntersect, LLC)

Inarguable Error for Topic 207

Interpretation Error for Topic 207

Arguable Error for Topic 207

Types of Manual Coding Errors

Take-Away Messages

Technology-assisted review finds at least as many responsive documents as exhaustive manual review (meaning that recall is at least as good).

Technology-assisted review is more accurate than exhaustive manual review (meaning that precision is much better).

Technology-assisted review is orders of magnitude more efficient than manual review (meaning that it is quicker and cheaper).


Measurement is Key

Not all technology-assisted review (and not all exhaustive manual review) is created equal.

Measurement is important in selecting and defending an e-discovery strategy.

Measurement also is critical in discovering better search methods and tools.


Additional Resources

TREC

http://trec.nist.gov/

TREC Legal Track http://trec-legal.umiacs.umd.edu/

TREC 2008 Overview http://trec.nist.gov/pubs/trec17/papers/LEGAL.OVERVIEW08.pdf

TREC 2009 Overview http://trec.nist.gov/pubs/trec18/papers/LEGAL09.OVERVIEW.pdf

TREC 2010 Overview Forthcoming (April 2011) at http://trec-legal.umiacs.umd.edu/

Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review Can be More Effective and More Efficient Than Exhaustive Manual Review, XVII:3 Richmond Journal of Law & Technology (Spring 2011) (in press).


Questions?

Thank You!