LII: Legal Information Institute
Sara Frug, Associate Director for Technology, LII
Thomas Bruce, Director, LII
Who are we?
• Open-access legal publisher based at Cornell Law School (since 1992)
• Known for enhanced editions of the US Code and CFR
• Help people find and understand the law (30 million visitors last year)
• Collaborate with information scientists at Cornell and elsewhere
What do we do?
• Provide a bottom-up (regulations-first) approach to the law
• Enrich primary text
• Extract metadata
• Connect things in the world to their legal context
• Present these connections in an understandable way
For example
What makes this hard?
• Traditional semantic web quality requirements
• Combined with:
  • Dirty source data
  • Incomplete source data
  • Performance of extraction technologies
• Formalizing and converting paper finding aids unearths flaws and mapping gaps
What have we done?
• A lot of prototypes; some production features
• XML enhancement of the US Code and CFR
• Metadata to RDF
• Feature development
• Visualization
Data Sources
• FDsys eCFR XML
• Parallel Table of Authorities, Table of Popular Names of Legislation, etc.
• FR Thesaurus of Indexing Terms
• U.S. Government Manual
• Agency guidances
• Ontologies and vocabularies
eCFR minimum feature set - and why it's hard to build
Feature | Dependency | Challenge
Legislative references | Bill structure metadata | Metadata availability
Pinpoint cross-references | Beneath-section structure | Inconsistent enumeration
Breadcrumbs, TOCs | Knowledge of title structure | XML structure is volume-based
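The "inconsistent enumeration" problem behind pinpoint cross-references can be illustrated with a minimal sketch. It assumes citations shaped like "40 CFR 261.4(b)(1)(i)". The pattern, the `Pinpoint` type, and the roman-numeral cutoff are hypothetical choices made for illustration; this is not LII's parser. The sketch splits a citation into title, part, section, and paragraph designators. It then reports every enumeration style a designator could belong to, showing why "(i)" cannot be placed in the paragraph hierarchy without context.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical citation shape: "<title> CFR <part>.<section>" followed by
# zero or more parenthesized paragraph designators, e.g. "40 CFR 261.4(b)(1)".
CITE_RE = re.compile(
    r"(?P<title>\d+)\s+CFR\s+(?P<part>\d+)\.(?P<section>\d+[a-z]?)"
    r"(?P<paras>(?:\([A-Za-z0-9]+\))*)"
)
PARA_RE = re.compile(r"\(([A-Za-z0-9]+)\)")

# Lowercase roman numerals up to ten (an arbitrary cutoff for this sketch).
ROMAN = {"i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x"}


class Pinpoint(NamedTuple):
    title: int
    part: int
    section: str
    paragraphs: Tuple[str, ...]


def parse_pinpoint(text: str) -> Optional[Pinpoint]:
    """Return the first CFR pinpoint citation found in text, or None."""
    m = CITE_RE.search(text)
    if m is None:
        return None
    paras = tuple(PARA_RE.findall(m.group("paras")))
    return Pinpoint(int(m.group("title")), int(m.group("part")), m.group("section"), paras)


def designator_styles(d: str) -> set:
    """Return every enumeration style a designator could belong to.

    More than one style means the level is ambiguous without context:
    "(i)" may be the ninth lowercase letter or roman numeral one.
    """
    styles = set()
    if d.isdigit():
        styles.add("digit")
    elif d.islower():
        if len(d) == 1:
            styles.add("lower-alpha")
        if d in ROMAN:
            styles.add("lower-roman")
    elif d.isupper() and len(d) == 1:
        styles.add("upper-alpha")
    return styles
```

For example, `parse_pinpoint("see 40 CFR 261.4(b)(1)(i)")` yields paragraphs `("b", "1", "i")`. The last designator is ambiguous between the lower-alpha and lower-roman styles. A real resolver would need to look at sibling designators (for example, whether an "(h)" precedes it) to decide.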
Definitions
• Definition extraction
• Definiendum parsing
• Scope detection and resolution
• Term-in-context disambiguation
M.Eng. Teams:
Fall 2015: Karthik Venkataramaiah, Dhwanish Pramthesh Shah, Shivananda Pujeri, Vishal Kumkar, Jigar Bhati; Supervisors: Sara Frug and Sylvia Kwakye, Ph.D.
Fall 2013: Deepthi Rajagopalan, Neha Kulkarni, and Siyu Zhan; Supervisor: Mohammad AL Asswad, Ph.D.
Fall 2012: Sarah Bouwman, Debraj Sinha; Supervisor: Núria Casellas, Ph.D.
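The first and third steps on the Definitions slide (extraction and scope detection/resolution) can be roughed out as follows. This is a minimal sketch, not LII's implementation. It assumes defined terms appear in straight or curly quotes followed by "means" or "includes". It also assumes scope is stated with phrases like "For purposes of this part". The simplified six-level hierarchy in `LEVELS` is a hypothetical stand-in for real CFR structure. The body pattern deliberately stops at the first period or semicolon, so definitions containing "e.g." will be truncated. That is exactly the kind of error the crowdsourced "Was the definition phrase truncated?" check targets.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Assumed surface pattern: quoted term, then "means"/"includes", then a body
# that runs to the first period or semicolon (so some bodies get truncated).
DEF_RE = re.compile(
    r'["\u201c](?P<term>[^"\u201d]+)["\u201d]\s+(?P<verb>means|includes)\s+(?P<body>[^.;]+)'
)
# Assumed scoping phrases, e.g. "For purposes of this part", "As used in this subpart".
SCOPE_RE = re.compile(
    r"\b(?:for (?:the )?purposes of|as used in)\s+this\s+"
    r"(?P<unit>title|chapter|subchapter|part|subpart|section)\b",
    re.IGNORECASE,
)
# Hypothetical simplified CFR hierarchy, broadest level first.
LEVELS = ("title", "chapter", "subchapter", "part", "subpart", "section")


@dataclass
class Definition:
    term: str
    verb: str
    body: str
    scope: Optional[str]  # hierarchy level named in the scoping phrase, if any


def extract_definitions(paragraph: str) -> List[Definition]:
    """Extract quoted-term definitions and attach any scope found in the paragraph."""
    scope_match = SCOPE_RE.search(paragraph)
    scope = scope_match.group("unit").lower() if scope_match else None
    return [
        Definition(m.group("term").strip(), m.group("verb"), m.group("body").strip(), scope)
        for m in DEF_RE.finditer(paragraph)
    ]


def in_scope(scope: str, def_loc: Sequence[str], use_loc: Sequence[str]) -> bool:
    """Does a definition scoped to `scope` at def_loc apply at use_loc?

    Locations are tuples ordered like LEVELS; the definition applies when
    both locations agree on every level down to and including the scope level.
    """
    depth = LEVELS.index(scope) + 1
    return tuple(def_loc[:depth]) == tuple(use_loc[:depth])
```

Resolving "simple scope" this way reduces to a prefix comparison of two locations. Complex scopes, such as "this part and part 262", or terms that do not apply when "the context otherwise requires", fall outside this sketch.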
Linked Entities
• Ontologies and vocabularies:
  • Agrovoc and Agris
  • DrugBank
  • MeSH
  • DBpedia
M.Eng. Teams:
Fall 2015 (DrugBank): Arpitha Shivakumar, Sanapureddy Ram Sai, Sheena Jain; Domain expert: Caroline Young, JD, MLIS
Spring 2015: Jai Bhatt
Fall 2014 (FIBO, DBpedia): Jai Bhatt, Trupti Bavalatti, Meghana Pavagada Chandrashekar, Nikhil Navali, Surya Sumukh SP, Joshua Freeberg
Spring 2013 (Agrovoc, Agris): Yan Huang, Timothy Savard
Spring 2012 (UNSPSC): Jie Lin, Krithi Rai
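A common starting point for linking regulatory text to external vocabularies is dictionary (gazetteer) matching. Below is a minimal sketch of that general technique. It is not necessarily what the teams above built. The vocabulary labels and `example.org` URIs are hypothetical placeholders for entries drawn from sources like MeSH or DrugBank. Labels are tried longest-first, so a multiword label such as "sodium chloride" wins over "sodium" at the same position.

```python
import re
from typing import Dict, List, Tuple


def link_entities(text: str, vocab: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, surface text, URI) for each vocabulary label found in text.

    vocab maps a label to a concept URI. Matching is case-insensitive and
    respects word boundaries. At any position the longest label wins,
    because Python's regex alternation takes the first alternative that
    matches and the alternatives are sorted by length.
    """
    if not vocab:
        return []
    by_label = {label.lower(): uri for label, uri in vocab.items()}
    labels = sorted(by_label, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, labels)) + r")\b", re.IGNORECASE)
    return [
        (m.start(), m.end(), m.group(0), by_label[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]
```

Exact matching like this only produces candidate links. Deciding whether a matched word actually denotes the vocabulary concept in context is the separate term-in-context disambiguation problem listed under Definitions.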
Bottom-up Vocabulary
• Parsed all language of the CFR
• Extracted broader, narrower, and related terms
• Extracted obligations and requirements
M.Eng. Teams:
Sharvari Marathe, Dallas Dias, Ankit Singh; Sanjna Venkataraman; Supervisor: Núria Casellas, Ph.D.
Caleb Perkins; Supervisor: Mohammad AL Asswad, Ph.D.
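One simple way to derive broader/narrower relations bottom-up is a head-noun heuristic: a multiword term ending in another extracted term is treated as narrower than it. The relations can then be serialized as SKOS N-Triples. The sketch below illustrates that idea. The heuristic is a swapped-in technique chosen for the example, not a description of LII's method, and the `example.org` concept namespace is hypothetical. The SKOS namespace is the standard W3C one. `skos:broader` and `skos:narrower` are inverses, and `skos:related` is symmetric, so both directions are emitted.

```python
import re
from typing import Iterable, List, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/cfr-vocab/"  # hypothetical concept namespace


def concept_uri(label: str) -> str:
    """Mint a bracketed IRI for a term label by slugifying it."""
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    return f"<{BASE}{slug}>"


def head_noun_broader(terms: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each multiword term with the longest shorter term it ends with.

    e.g. "acute hazardous waste" -> "hazardous waste" -> "waste"
    """
    vocab = set(terms)
    pairs = []
    for term in vocab:
        words = term.split()
        for i in range(1, len(words)):
            suffix = " ".join(words[i:])
            if suffix in vocab:
                pairs.append((term, suffix))
                break  # keep only the nearest broader concept
    return sorted(pairs)


def to_ntriples(broader_pairs, related_pairs=()) -> List[str]:
    """Serialize (narrower, broader) and related pairs as N-Triples lines."""
    lines = []
    for narrow, broad in broader_pairs:
        lines.append(f"{concept_uri(narrow)} <{SKOS}broader> {concept_uri(broad)} .")
        lines.append(f"{concept_uri(broad)} <{SKOS}narrower> {concept_uri(narrow)} .")
    for a, b in related_pairs:
        # skos:related is symmetric, so both directions are emitted
        lines.append(f"{concept_uri(a)} <{SKOS}related> {concept_uri(b)} .")
        lines.append(f"{concept_uri(b)} <{SKOS}related> {concept_uri(a)} .")
    return lines
```

Given the terms "waste", "solid waste", "hazardous waste", and "acute hazardous waste", the heuristic produces a two-level hierarchy rooted at "waste". Related pairs are supplied separately here, since a suffix rule says nothing about associative links.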
Topic Modeling
• Software: Mallet (David Mimno)
• Visualizer: fork of DFR Browser (Andrew Goldstone; fork by Josh Campbell)
M.Eng. Teams:
CFR: Eva Sharma, Shreya Roy Chowdhury, Lisha Murthy
Federal Register: Nivedhitha Sundarmurthi, Srinisha Ramaswamy, Shubhangi Kumar
Visualization: Joshua Campbell, Saicharan Shriram Mujumar, Anisha Venugopal Reddy
Visualizer: Topic List
Visualizer: Topic View
Visualizer: Document View
Visualizer: Stopword Evaluation
Visualizer: Stability over Time
Linked Data: CFR and Agency Structure
Linked Data: CFR Cross-References
CFR Entities
CFR and Guidances
Evaluation and Crowdsourcing
• We tried having domain experts annotate samples
• Domain experts were reliable for a few terms and a few paragraphs at a time; beyond that, they found it difficult not to fall back on word-processing search tools
• We decided to evaluate precision and recall separately
All users see links:
Interested users may evaluate:
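Evaluating precision and recall separately maps naturally onto the two audiences above. Every reader sees the extracted links and can vote on whether each is correct (precision). Interested users can mark every occurrence in a sample so that missed terms are counted (recall). Below is a minimal sketch of the two computations. The strict-majority aggregation rule and the character-offset span representation are assumptions made for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int]  # hypothetical: character offsets of a term occurrence


def precision(votes: Dict[str, List[bool]]) -> Optional[float]:
    """Share of displayed links that a strict majority of voters judged correct.

    votes maps a link id to the True/False judgments it received. Links no
    one has judged are left out of the denominator; ties count as incorrect.
    """
    judged = [v for v in votes.values() if v]
    if not judged:
        return None
    correct = sum(1 for v in judged if sum(v) * 2 > len(v))
    return correct / len(judged)


def recall(gold: Set[Span], predicted: Set[Span]) -> Optional[float]:
    """Share of occurrences marked by evaluators that the system also linked."""
    if not gold:
        return None
    return len(gold & predicted) / len(gold)
```

Keeping the two measures apart means casual readers only ever answer a yes/no question about a link in front of them. The more demanding recall task, finding what the system missed, is reserved for volunteers willing to read a passage closely.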
Crowdsourcing: How it Can Work
Assessment | Expertise required
Was the definition phrase truncated? | Simple reading comprehension (any layperson)
Was the defined term parsed correctly? | Reading comprehension (any layperson)
Was a non-definition extracted/linked? | Familiarity with legal text (law student, careful reader)
Was a term marked when "the context otherwise requires"? | Familiarity with legal text or substantive domain expertise (careful student, lay expert on object of regulation)
Was scoping language extracted? | Familiarity with legal text (law student, careful reader)
Does the definition apply within this scope? (simple scope) | Familiarity with legal text (law student, careful lay reader)
Does the definition apply within this scope? (complex scope) | Experience with regulatory practice, regulatory drafting, or law librarianship (lawyer)
Crowdsourcing: Looking Ahead
• Inspiration: Games With a Purpose and reCAPTCHA (Luis von Ahn)
• Make recall evaluation simple and granular enough for a knowledgeable audience to complete successfully
More information
• The Law of Where I'm Standing Right Now - https://blog.law.cornell.edu/tbruce/2013/12/24/the-law-of-where-im-standing-right-now/
• Making Metasausage (legislative data modeling) - https://blog.law.cornell.edu/metasausage
• Linked Legal Data: A SKOS Vocabulary for the Code of Federal Regulations - http://www.semantic-web-journal.net/content/linked-legal-data-skos-vocabulary-code-federal-regulations
• What can an Index Do? - http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2443386
Questions?
• Sara Frug <[email protected]>