
Page 1: The Neighborhood Auditing Tool

The Neighborhood Auditing Tool

James Geller, Yehoshua Perl, C. Paul Morrey

Page 2: The Neighborhood Auditing Tool


Participating Student Developers

Dayanand Sagar Kushal Chopra Sandeep Ramachandran Anisa Vishnani Aditi Dekhane Kandarp Shah Rajesh Gupta Suraj Pal Singh Saurabh Patel

Kartik Gopal Yakup Kav Rahul Bhave Sirish Motati Pratik Shah Saurabh Singhi Sirish Motati Reddy Sandeep Pasuparthy Ramya Gokanakonda

Page 3: The Neighborhood Auditing Tool

Overview

Goals of an Auditor’s Tool for the UMLS
Principles of Auditing with Neighborhoods
The Idea of a Hybrid Display
Current State of the NAT: Serving the Auditor
Feature Presentation
Live Audit Session
Planned State of the NAT: Guiding the Auditor
Conclusions and Future Work

Page 4: The Neighborhood Auditing Tool

Auditing the UMLS

The UMLS consists of over 100 terminologies.
It is natural that inconsistencies will appear.
Over 1.5 million concepts and over 7 million terms.
Two-level structure consisting of the Semantic Network and the Metathesaurus.

Page 5: The Neighborhood Auditing Tool

How We Did It Before the NAT: Paper Form

CPT: C1081844 Antonospora locustae
SRC: NCBI
STY: T004 T009 Fungus + Invertebrate
DEF:
SYN: Antonospora locustae | Nosema locustae
PAR: Antonospora {STY: Invertebrate}
CHD:

Page 6: The Neighborhood Auditing Tool

Previous Work on Auditing

H. Gu, Y. Perl, J. Geller, M. Halper, L. Liu, and J.J. Cimino. Representing the UMLS as an Object-oriented Database: Modeling Issues and Advantages. J Am Med Inform Assoc, 7(1):66-80, 2000.

J. Geller, H. Gu, Y. Perl, and M. Halper. Semantic refinement and error correction in large terminological knowledge bases. Data & Knowledge Engineering, 45(1):1-32, 2003.

Y. Chen, Y. Perl, J. Geller, and J.J. Cimino. Analysis of a study of the users, uses, and future agenda of the UMLS. J Am Med Inform Assoc, 14(2):221-231, 2007.

H. Gu, G. Hripcsak, Y. Chen, C.P. Morrey, G. Elhanan, J.J. Cimino, J. Geller, and Y. Perl. Evaluation of a UMLS auditing process of semantic type assignments. In J.M. Teich, J. Suermondt, and G. Hripcsak, editors, Proc AMIA Symp, pages 294-298, Chicago IL, Nov. 2007.

Page 7: The Neighborhood Auditing Tool

Auditing Results Paper Form

(C1081844) Antonospora locustae
STY: Fungus + Invertebrate

No errors
Semantic Type Error: Fungus
Semantic Type Error: Invertebrate
Ambiguity
Add Semantic Type ______________________
Other error _____________________________
Comments _____________________________
______________________________________

Page 8: The Neighborhood Auditing Tool

Goals of an Auditor’s Tool for the UMLS

Display relevant information to the auditor.
Do not overwhelm the auditor with too much information.
Help the auditor focus on areas most likely to contain errors.
  Neighborhood display of reviewed concepts
  Algorithms suggest likely erroneous concepts

Page 9: The Neighborhood Auditing Tool

Principles of Auditing with Neighborhoods

Several years of experience: Auditing is to a large degree a “local” activity.

Concepts have two kinds of knowledge elements:
  Textual Knowledge Elements: Preferred term, CUI, synonyms, LUI, definition, sources, semantic types
  CONtextual Knowledge Elements: Neighbors

Page 10: The Neighborhood Auditing Tool


Neighborhoods

Focus concept: The concept presently under review

Immediate Neighborhood: The set of concepts reachable from the focus concept by stepping one relationship (up, down, lateral, etc.)

Extended neighborhood: Includes parents of parents (grandparents), children of children (grandchildren) and siblings. No lateral chains.
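
The definitions above can be sketched in a few lines of code. This is a minimal illustration, not the NAT's implementation: the dictionaries `PARENTS` and `LATERAL` and all function names are assumptions made for this example, and the edges between the concept names (borrowed from the Microsporidia example in this deck) are assumed for illustration only.

```python
from collections import defaultdict

# Hypothetical toy data: child -> set of parents (PAR/CHD hierarchy).
PARENTS = {
    "Microsporidia, Unclassified": {"Microsporidia <protozoa>"},
    "Microsporea": {"Microsporidia <protozoa>"},
    "Microsporidia <protozoa>": {"Protozoa"},
    "Dictyocoela": {"Microsporidia, Unclassified"},
    "Kabatana": {"Microsporidia, Unclassified"},
    "Kabatana takedai": {"Kabatana"},
}
# Hypothetical lateral (non-hierarchical) relationships.
LATERAL = {"Microsporidia, Unclassified": {"Pathogenicity Aspects"}}


def invert(parents):
    """Build the parent -> children index from the child -> parents map."""
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    return children


CHILDREN = invert(PARENTS)


def immediate_neighborhood(focus):
    """Concepts reachable from the focus concept in one step."""
    return {
        "parents": set(PARENTS.get(focus, ())),
        "children": set(CHILDREN.get(focus, ())),
        "lateral": set(LATERAL.get(focus, ())),
    }


def extended_neighborhood(focus):
    """Immediate neighborhood plus grandparents, grandchildren and siblings.

    Lateral relationships are not chained, as stated on the slide.
    """
    hood = immediate_neighborhood(focus)
    hood["grandparents"] = {
        g for p in hood["parents"] for g in PARENTS.get(p, ())
    }
    hood["grandchildren"] = {
        g for c in hood["children"] for g in CHILDREN.get(c, ())
    }
    hood["siblings"] = {
        s for p in hood["parents"] for s in CHILDREN.get(p, ())
    } - {focus}
    return hood
```

Calling `extended_neighborhood("Microsporidia, Unclassified")` on this toy data returns one set per panel of the display (parents, children, lateral, grandparents, grandchildren, siblings), which mirrors how the neighborhood is split into labeled regions.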

Page 11: The Neighborhood Auditing Tool

Immediate Neighborhood

[Diagram: immediate neighborhood. Concept and relationship labels recovered from the slide: Microsporidia, Unclassified; Microsporidia <protozoa>; Dictyocoela; Edhazardia; Fibrillanosema; Microsporidium; Kabatana; Oligosporidium; Cellular aspects of; Microbiological; Pathogenicity Aspects; virologic.]

Page 12: The Neighborhood Auditing Tool

Extended Neighborhood

[Diagram: extended neighborhood display. Panels: RELATIONSHIPS, SIBLINGS, GRANDCHILDREN, CHILDREN, FOCUS CONCEPT, PARENTS, GRANDPARENTS. One node is annotated "Erroneous concept". Labels recovered from the slide: Microsporidia, Unclassified; Microsporidia <protozoa>; fungus; PHYLUM MICROSPORA; Protozoa; Sporozeoa; Microsporea; Dictyocoela; Edhazardia; Fibrillanosema; Microsporidium; Kabatana; Oligosporidium; Dictyocoela berillonum; Dictyocoela cavimanum; Dictyocoela dehayesum; Dictyocoela duebenum; Dictyocoela grammarellum; Dictyocoela muelleri; Dictyocoela sp.L11; Edhazardia aedis; Fibrillanosema crangonycis; Kabatana takedai; Microsporidium 57864; Microsporidium africanum; Microsporidium ceylonensis; Microsporidium cypselurus; Microsporidium prosopium; Microsporidium seriolae; Oligosporidium occidentalis; Cellular aspects of; Microbiological; Pathogenicity Aspects; virologic.]

Page 13: The Neighborhood Auditing Tool


Up-Extended and Down-Extended Neighborhood

An up-extended neighborhood includes grandparents and the immediate neighborhood.

A down-extended neighborhood includes grandchildren and the immediate neighborhood.

Give the auditor all s/he needs, but not more.
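
The two variants can be sketched as set unions. This is a minimal illustration under stated assumptions: the concept names `GP`, `P`, `F`, `C`, `GC` and the `PARENTS` map are hypothetical, and lateral relationships are left out of the immediate neighborhood for brevity.

```python
# Hypothetical hierarchy: child -> set of parents.
PARENTS = {"F": {"P"}, "P": {"GP"}, "C": {"F"}, "GC": {"C"}}


def children_of(concept):
    """All concepts that list `concept` as a parent."""
    return {child for child, ps in PARENTS.items() if concept in ps}


def immediate(focus):
    """Parents and children of the focus concept (lateral links omitted)."""
    return set(PARENTS.get(focus, ())) | children_of(focus)


def up_extended(focus):
    """Immediate neighborhood plus grandparents."""
    grandparents = {
        g for p in PARENTS.get(focus, ()) for g in PARENTS.get(p, ())
    }
    return immediate(focus) | grandparents


def down_extended(focus):
    """Immediate neighborhood plus grandchildren."""
    grandchildren = {
        g for c in children_of(focus) for g in children_of(c)
    }
    return immediate(focus) | grandchildren
```

Keeping the two directions as separate functions lets a display request only the extra level it needs, in line with the goal of not showing more than the auditor needs.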

Page 14: The Neighborhood Auditing Tool


Semantic Type Neighborhood

If we provide the semantic types for every concept, those also form a neighborhood.

It is important to keep track of which semantic types belong to which concepts.
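
A minimal sketch of this idea, assuming a simple dictionary from concept to assigned semantic types (the names `STY_OF`, `semantic_type_neighborhood` and `types_unique_to_focus` are illustrative, not part of the NAT). The assignments are the ones shown on the paper form earlier in this deck: Antonospora locustae has Fungus + Invertebrate, and its parent Antonospora has Invertebrate.

```python
# Semantic type assignments taken from the paper form shown earlier.
STY_OF = {
    "Antonospora locustae": {"Fungus", "Invertebrate"},
    "Antonospora": {"Invertebrate"},
}


def semantic_type_neighborhood(concepts, sty_of):
    """Keep the semantic types attached to the concept they belong to."""
    return {c: sorted(sty_of.get(c, ())) for c in concepts}


def types_unique_to_focus(focus, neighbors, sty_of):
    """Semantic types of the focus concept that no listed neighbor shares."""
    neighbor_types = set()
    for n in neighbors:
        neighbor_types |= sty_of.get(n, set())
    return sorted(sty_of.get(focus, set()) - neighbor_types)
```

Flattening everything into one set of semantic types would lose the information used by `types_unique_to_focus`; with the per-concept mapping, the function reports `["Fungus"]` for the focus concept relative to its parent.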

Page 15: The Neighborhood Auditing Tool

References about Neighborhood

M.S. Tuttle, D.D. Sherertz, N.E. Olson, M.S. Erlbaum, W.D. Sperzel, L.F. Fuller, et al. Using META-1, the first version of the UMLS Metathesaurus. In Proc 14th Annu Symp Comput Appl Med Care, pages 131-135, Washington, D.C., 1990.

S.J. Nelson, M.S. Tuttle, W.G. Cole, D.D. Sherertz, W.D. Sperzel, M.S. Erlbaum, L.L. Fuller, and N.E. Olson. From meaning to term: semantic locality in the UMLS Metathesaurus. In Proc Annu Symp Comput Appl Med Care, pages 209-213, Washington, D.C., 1991.

J.J. Cimino, H. Min, and Y. Perl. Consistency across the hierarchies of the UMLS Semantic Network and Metathesaurus. J Biomed Inform, 36(6):450-461, 2003.

Page 16: The Neighborhood Auditing Tool

Desirable Information Beyond Neighborhoods

Concept definition for Focus Concept
Concept sources for Focus Concept
Assigned Semantic Types of concepts
Definitions of relevant Semantic Types
Global view of the Semantic Network
  Indented (better for wide branches)
  Graphical (better for almost everything else) – we set the standard on this.

Page 17: The Neighborhood Auditing Tool


The Idea of a Hybrid Display

Diagrams are wonderful – as long as they fit on one screen.

Indented text is wonderful – as long as there are no or very few multiple parents.

But the UMLS does not fit onto one screen and there are many cases of multiple parents.

Page 18: The Neighborhood Auditing Tool

What Makes a Diagram Wonderful?

You can follow parent/child paths with your eyes.

You can get a feeling for everything a concept is connected to with one look.

You can see multiple parents and paths with one look.

You can see global features (short and bushy versus tall and sparse, or (gasp) tall and bushy).

Page 19: The Neighborhood Auditing Tool

What Makes Indented Text Wonderful?

Indentation expresses parenthood elegantly.
There are no lines crossing.
You don’t need a layout algorithm.
There is a linear order in which to study text.

Page 20: The Neighborhood Auditing Tool


The Idea of a Hybrid Display (cont.)

Keep the best features of text and the best features of diagrams.

Maintain relative positions between the focus concept and its children, parents, etc.

Eliminate clutter of arrows.

Page 21: The Neighborhood Auditing Tool

A Hybrid Diagram/Form Display of a Neighborhood

[Diagram: hybrid display with panels labeled Children, Focus Concept, Synonyms, Relationships, and Parents.]
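
The hybrid idea (fixed relative positions, text inside panels, no arrows) can be sketched as a plain-text renderer. This is only an illustration: the function `render_hybrid`, the panel order (parents above the focus concept, children below) and the bracketed panel titles are assumptions, not the NAT's actual layout. The sample data comes from the paper form earlier in this deck.

```python
def render_hybrid(focus, parents, children, synonyms, relationships):
    """Return a text layout with one labeled panel per kind of neighbor.

    Position encodes the hierarchy (parents above, children below),
    so no arrows are drawn.
    """
    def panel(title, items):
        # An empty panel still appears, so positions never shift.
        return [f"[{title}]"] + [f"  {item}" for item in (items or ["(none)"])]

    lines = []
    lines += panel("PARENTS", parents)
    lines.append("")
    lines.append(f"[FOCUS CONCEPT] {focus}")
    lines += panel("SYNONYMS", synonyms)
    lines += panel("RELATIONSHIPS", relationships)
    lines.append("")
    lines += panel("CHILDREN", children)
    return "\n".join(lines)


layout = render_hybrid(
    focus="Antonospora locustae",
    parents=["Antonospora"],
    children=[],
    synonyms=["Antonospora locustae", "Nosema locustae"],
    relationships=[],
)
print(layout)
```

Each panel is an indented list, which keeps the linear reading order of indented text, while the vertical arrangement of panels keeps the at-a-glance orientation of a diagram.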

Page 22: The Neighborhood Auditing Tool

Important Auditing Principles

If a concept C has a combination of semantic types assigned, and very few other concepts C1…Cn (n < 6) have that same combination assigned, then C and C1…Cn are suspicious concepts. We call this “a small intersection.”

Group-based auditing: Audit sets of similar concepts.

Y. Chen, H. Gu, Y. Perl, J. Geller, and M. Halper. Structural group auditing of a UMLS semantic type’s extent. J Biomed Inform, 2007. Accepted for publication.
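
The small-intersection principle can be sketched as a grouping step followed by a size filter. This is a minimal illustration, assuming a dictionary from concept to assigned semantic types; the function name, the `max_others` parameter (set to 5 to reflect n < 6), and the concepts named "hypothetical concept ..." with "Type X" and "Type Y" are invented for the example. Only the Antonospora locustae assignment comes from this deck.

```python
from collections import defaultdict


def small_intersections(sty_of, max_others=5):
    """Return combinations of two or more semantic types with a small extent.

    A combination is kept when, for any member C, at most `max_others`
    other concepts share exactly the same combination.
    """
    extents = defaultdict(set)
    for concept, stys in sty_of.items():
        if len(stys) >= 2:  # only combinations, not single assignments
            extents[frozenset(stys)].add(concept)
    return {
        combo: members
        for combo, members in extents.items()
        if len(members) - 1 <= max_others
    }


STY_OF = {"Antonospora locustae": {"Fungus", "Invertebrate"}}
# Hypothetical larger intersection with seven members (not suspicious).
for i in range(7):
    STY_OF[f"hypothetical concept {i}"] = {"Type X", "Type Y"}
# A concept with a single semantic type is not in any intersection.
STY_OF["single-typed concept"] = {"Fungus"}

suspicious = small_intersections(STY_OF)
```

On this toy data only the Fungus + Invertebrate combination is returned, with Antonospora locustae as its single member.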

Page 23: The Neighborhood Auditing Tool


Current State of the NAT: Serving the Auditor

The Neighborhood Auditing Tool has been implemented to fully support display of neighborhoods.

Navigation to “adjacent neighborhoods” is easy.

Additional features listed before have been implemented.

Page 24: The Neighborhood Auditing Tool

Demonstration of NAT Features

Neighborhood
Relationships
Siblings
Grandparents and grandchildren
Synonyms
Focus concept definition
Focus concept sources
Semantic Type display
Semantic Type definition
Semantic Network (indented)
Semantic Network (diagram)
Display Options
Navigation
Search
Viewing History
UMLS version
offline version

Page 25: The Neighborhood Auditing Tool


Audit Example

An algorithm determined that the concept Antonospora locustae was likely assigned incorrect semantic types.

We follow an auditor’s review of this concept using the data from 2007AA.

offline version

Page 26: The Neighborhood Auditing Tool

Preliminary Evaluation Study with NAT

Compare paper-based auditing and NAT-based auditing.
Counterbalanced groups.
Recall improves with NAT use. Auditors seem willing to investigate more concepts.
Precision stays the same. Auditors’ mental process does not improve (?).

Page 27: The Neighborhood Auditing Tool

Planned State of the NAT: Guiding the Auditor by Finding (i.e. Computing) Audit Sets

As noted before, errors are likely in small intersections.
Planned new version of the NAT will compute and display small intersections.
Errors are clearly visible in small groups of supposedly similar concepts.
Planned new version of the NAT will compute small groups of supposedly similar concepts.


Page 29: The Neighborhood Auditing Tool


Finding Successively Smaller Groups of Concepts

Finding audit sets by selecting:

1. Concepts with the same semantic type.

2. Concepts satisfying 1. that also have the same root.

3. Concepts satisfying 1. and 2. that also have the same relationships.
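
The three selection steps can be sketched as successive refinement of groups. This is a minimal illustration under stated assumptions: the record fields (`stys`, `root`, `rels`), the concept names `c1` to `c5`, and the function `refine` are hypothetical, and "same relationships" is interpreted here as having an identical set of relationship names.

```python
from collections import defaultdict

# Hypothetical records; all values are invented for illustration.
RECORDS = [
    {"name": "c1", "stys": frozenset({"A", "B"}), "root": "R1", "rels": frozenset({"r1"})},
    {"name": "c2", "stys": frozenset({"A", "B"}), "root": "R1", "rels": frozenset({"r1"})},
    {"name": "c3", "stys": frozenset({"A", "B"}), "root": "R1", "rels": frozenset({"r2"})},
    {"name": "c4", "stys": frozenset({"A", "B"}), "root": "R2", "rels": frozenset({"r1"})},
    {"name": "c5", "stys": frozenset({"A"}), "root": "R1", "rels": frozenset({"r1"})},
]


def refine(groups, key):
    """Split every group into subgroups whose members share key(record)."""
    refined = []
    for group in groups:
        buckets = defaultdict(list)
        for rec in group:
            buckets[key(rec)].append(rec)
        refined.extend(buckets.values())
    return refined


def names(groups):
    """Sorted concept names per group, for easy comparison."""
    return sorted(sorted(rec["name"] for rec in g) for g in groups)


step1 = refine([RECORDS], lambda r: r["stys"])  # same semantic type(s)
step2 = refine(step1, lambda r: r["root"])      # ... and same root
step3 = refine(step2, lambda r: r["rels"])      # ... and same relationships
```

Each step can only split groups, never merge them, so the audit sets get successively smaller: here the groups go from {c1, c2, c3, c4} and {c5}, to {c1, c2, c3}, {c4}, {c5}, to {c1, c2}, {c3}, {c4}, {c5}.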

Page 30: The Neighborhood Auditing Tool

EXTENT OF A SEMANTIC TYPE

[Diagram: the extent of a semantic type partitioned into Area A, Area B, Area C, Area D and Area E. Legend: concept, PAR/CHD relationship, area, other relationship. Relationship labels: r1, r2, r3, r'3, r4.]

Page 31: The Neighborhood Auditing Tool

Audit Set Examples

Example A: A selection of concepts in the intersection of Manufactured Object + Organization under the root School (environment).

Example B: All concepts that are in a non-chemical intersection with an extent size less than five.

Page 32: The Neighborhood Auditing Tool

Possible Auditor’s Recommendations (see Pg. 7)

Mark concept as reviewed and correct.
Mark semantic types that should be removed.
Mark semantic types that should be added.
Mark other kinds of errors.
Attach notes to a reviewed concept.
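
One way to capture these recommendations is a small record per reviewed concept. The sketch below is an assumption about how such a record might look, not the NAT's data model: the class name, field names, and the consistency rule are invented for illustration, and the verdict in the example is hypothetical rather than an actual audit result.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AuditRecommendation:
    """One auditor's recommendation for one reviewed concept (hypothetical)."""
    cui: str
    reviewed_correct: bool = False
    remove_stys: List[str] = field(default_factory=list)
    add_stys: List[str] = field(default_factory=list)
    other_errors: List[str] = field(default_factory=list)
    notes: str = ""

    def is_consistent(self):
        """A concept marked correct should carry no error markings."""
        has_errors = bool(self.remove_stys or self.add_stys or self.other_errors)
        return not (self.reviewed_correct and has_errors)


# Hypothetical verdict, for illustration only.
rec = AuditRecommendation(
    cui="C1081844",
    remove_stys=["Invertebrate"],
    notes="Illustrative entry; not an actual audit result.",
)
```

The fields map one-to-one onto the five recommendations listed above, which also mirrors the checkboxes of the paper form on Pg. 7.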


Page 34: The Neighborhood Auditing Tool


Conclusions and Future Work

Preliminary study showed that people are more successful finding errors with NAT than with paper sources.

Recall improved with the NAT, precision did not.

NAT seems to nicely complement use of the UMLSKS.

Page 35: The Neighborhood Auditing Tool


Conclusions and Future Work (cont.)

This year, work with more human subjects to quantify these observations.

Integration of algorithms for finding audit sets with NAT.
  By extent size
  Using roots and relationship patterns within extents.


Page 38: The Neighborhood Auditing Tool

Preliminary Evaluation Study

Auditor   Errors          Recall          Precision       F
          with    w/o     with    w/o     with    w/o     with    w/o
          NAT     NAT     NAT     NAT     NAT     NAT     NAT     NAT
1         57      45      0.97    0.82    0.53    0.51    0.86    0.63
2         22      20      0.43    0.35    0.55    0.55    0.48    0.43
3         39      34      0.64    0.58    0.46    0.53    0.54    0.55
4         56      44      0.55    0.54    0.30    0.34    0.39    0.42
Avg.      44      36      0.65    0.57    0.46    0.48    0.57    0.51
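
The slide does not say how the F column is computed. The sketch below assumes it is the balanced F-measure, F = 2PR / (P + R), and recomputes it from the recall and precision values in the table; the names `f_measure`, `ROWS` and `computed` are illustrative.

```python
def f_measure(recall, precision):
    """Balanced F-measure: harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# auditor: ((recall with NAT, recall w/o NAT),
#           (precision with NAT, precision w/o NAT)), copied from the table.
ROWS = {
    1: ((0.97, 0.82), (0.53, 0.51)),
    2: ((0.43, 0.35), (0.55, 0.55)),
    3: ((0.64, 0.58), (0.46, 0.53)),
    4: ((0.55, 0.54), (0.30, 0.34)),
}

# auditor -> (F with NAT, F w/o NAT), rounded to two decimals.
computed = {
    auditor: (
        round(f_measure(r[0], p[0]), 2),
        round(f_measure(r[1], p[1]), 2),
    )
    for auditor, (r, p) in ROWS.items()
}
```

Under this assumption the recomputed values agree with the table for auditors 2, 3 and 4 and for auditor 1 without the NAT (0.63). For auditor 1 with the NAT the formula gives 0.69 rather than the 0.86 shown, so either the slide uses a different definition for that cell or the value was entered differently; the table is reproduced here as it appears.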

Page 39: The Neighborhood Auditing Tool


Improved Recall

The auditor finds it easy to search for more errors in the neighborhood of the suspicious concept.

With better recall and the same precision you still find more errors.

Page 40: The Neighborhood Auditing Tool


Auditing Demonstration

The concept Antonospora locustae was selected for audit by an algorithm that found it was the only concept assigned to the intersection Fungus + Invertebrate in the UMLS 2007AA.


Page 53: The Neighborhood Auditing Tool


NAT Features Demonstration

Page 54: The Neighborhood Auditing Tool


Neighborhood
