Introduction and Overview Web of Law Research Proposal Adam Meyers NYU

Web of Law Research Proposal - New York University


Introduction and Overview

Web of Law

Research Proposal

Adam Meyers

NYU

Outline

● Goals
● Existing Systems
● The Original Data
● Our Work So Far
● Future Work, including what we will do this summer

Goals

● Open Source Library of Legal NLP Materials
  – Legal Search, Annotated Corpora, Automated Translation, etc.
  – Court Decisions, Legislation, Legal Forms, Codes and Regulations, etc.
● Short Range Goals:
  – Analysis of Legal Decisions from the Supreme Court
  – Expand to Other Appellate Decisions
● Collaboration:
  – Periodic contact with Court Listener; we hope to distribute/share with them
    ● https://www.courtlistener.com/
    ● We get our data from them
  – Possible future work with Cornell Legal Information Institute
    ● https://www.law.cornell.edu/lii/about/who_we_are
    ● They work more with legislation. For example, we may want to link legislative citations in legal decisions to the cited text
    ● They translate some legal documents (not case decisions) to Spanish

Goals for Today

● Project So Far and Future Directions
● Some Projects that have Already Started
● Intended Future Projects
● Method for Identifying Future Projects
● The beginnings of one or more annotation tasks

Existing Systems

● Proprietary Systems: Lexis/Nexis, Westlaw, Bloomberg Law, Casemaker
  – Legal Search
  – Links between citations and legal documents
    ● Including citation graph
  – Sorting by: parties of case, judges, etc.
  – Information about Lexis/Nexis
    ● https://www.youtube.com/watch?v=7jPBrndIMMs
    ● http://www.lexisnexis.com/documents/pdf/20100702053851_large.pdf
    ● https://www.lexisnexis.com/documents/pdf/20150701111022_large.pdf
● Court Listener currently has limited versions of these capabilities, but is freely available
  – We are adapting Court Listener's data

The Data

● U.S. Court Decisions
  – Supreme Court (SCOTUS)
  – Appeals Courts (District Courts, Veterans Appeals)
● Source of Data = Court Listener, from the Free Law Project (a non-profit organization)
    ● Michael Lissner and Brian Carver
● Court Listener provides:
  – Harvesting of court cases to provide users with free access
  – Legal search engine
  – Limited html links between court cases
  – Downloadable json/html markup of text

Our Modified Version of Data

● https://nlp.cs.nyu.edu/meyers/web_of_law.html
● Plain Text Version of the Data
  – xml offset annotation based on original json markup
  – corrections to make the text cohesive: fixing paragraph splits, separating footnotes, etc.
  – correction of (most) encoding errors
● Simple version of citation graph (usable without looking at the cases)
● Simple Manual rule output
  – citations to cases
  – people names, organizations, legal_roles, professions, dates, etc.
  – Equivalence and Role relations
    ● based on apposition and substring
● Supreme Court Only (64K cases)
● After further work, will expand to other appellate court decisions

Manual Rule-based Entities & Relations

● Regular expressions and local patterns only
● Recognizes Citations to Legal Decisions
  – Standard Citation:
    ● 410 U.S. 113 (1973)
  – X vs. Y:
    ● ROE ET AL. v. WADE, DISTRICT ATTORNEY OF DALLAS COUNTY
  – Other:
    ● "In re Vince"
● Recognizes Other Entities:
  – LEGAL_ROLE: appellants
  – PROFESSION: DISTRICT ATTORNEY OF DALLAS COUNTY
  – NAME: Robert C. Flowers
  – Others: ORGANIZATION, DOCKET, DATE
● Recognizes relations based on apposition and substring
  – Equivalence, party_of case, at_date, etc.
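As an illustration of the "Standard Citation" rule above, here is a minimal sketch of a regular expression for citations of the form volume, reporter, page, optional year. The reporter list, the pattern name and the function name are assumptions made for this example; they are not taken from the project's released rules.

```python
import re

# Hypothetical pattern for "volume reporter page (year)" citations such as
# "410 U.S. 113 (1973)". The reporter list is an illustrative assumption.
REPORTERS = r"(?:U\.\s?S\.|S\.\s?Ct\.|L\.\s?Ed\.\s?2d|F\.\s?Supp\.)"
STANDARD_CITE = re.compile(
    r"(?P<volume>\d+)\s+(?P<reporter>" + REPORTERS + r")\s+(?P<page>\d+)"
    r"(?:\s+\((?P<year>\d{4})\))?"
)


def find_standard_citations(text):
    """Return one dict per citation, with character offsets into text."""
    results = []
    for m in STANDARD_CITE.finditer(text):
        results.append({
            "start": m.start(),
            "end": m.end(),
            "volume": m.group("volume"),
            "reporter": m.group("reporter"),
            "page": m.group("page"),
            "year": m.group("year"),  # None when no "(year)" follows
        })
    return results
```

Each result keeps start and end offsets, so the output can be stored as offset annotation against the plain-text version of a case. Court-qualified years such as "(ND Tex. 1970)" and "X v. Y" names would need further patterns.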

Weiyi Lu's System

● Weiyi has modified ICE
  – http://nlp.cs.nyu.edu/ice/

● ICE is designed to produce an IE system (relations and entities) by bootstrapping from a small number of annotated examples.

● Weiyi's system identifies Named Entities and Legal Roles and some relations

● It can import entities (citations) from the manual rule system

Future Manual Rules and Annotation

● Other Entities
  – Citations to legislation are currently not covered by our system
    ● Rules similar to those for case citations could be used
    ● We have a copy of the “Blue Book” and copies exist online
    ● Examples: constitution, statutes, amendments, sections, regulations, …
  – Quotations: we wrote a simple quote recognizer yesterday, which could probably be updated
    ● This could be extended to cover “that clauses”
    ● Quotes that are part of the same “that clause” could be grouped together, etc.
● Other Relations
  – Extended Coreference
    ● All members of a party in a case and their representatives form a unit
    ● Citations, authors of decisions, courts the authors represent, etc. form a unit.
  – Attribution
    ● Quotes are attributed to sources (citations, people, organizations)
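A simple quote recognizer of the kind mentioned above might look like the following sketch. It is not the project's recognizer: the pattern, the function name and the restriction to single-line double-quoted spans are assumptions for illustration.

```python
import re

# Text between straight or curly double quotes, not crossing a line break.
QUOTE = re.compile(r'"([^"\n]+)"|“([^”\n]+)”')


def find_quotes(text):
    """Return (start, end, quoted_text) per quote; offsets include the quote marks."""
    spans = []
    for m in QUOTE.finditer(text):
        # Exactly one of the two groups is set, depending on the quote style.
        inner = m.group(1) if m.group(1) is not None else m.group(2)
        spans.append((m.start(), m.end(), inner))
    return spans
```

Because the spans carry offsets, quotes that fall inside the same "that clause" could later be grouped, and each span could be linked to a source for the attribution relation.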

Terminology Extraction

● We have adapted Termolator to run on Law Cases
  – http://nlp.cs.nyu.edu/termolator/
  – The .terms files are offset annotation of individual (candidate) terms; these are potential arguments for IE relations
● We have run it using one file (Roe v Wade) as a foreground
● We are adapting it to run efficiently on all 64K Supreme Court cases, using each case as foreground and the whole set as background
  – The hope is that the top N terms (N=50 or N=100) can be used as topic words for each case.
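The foreground/background idea can be sketched with a plain frequency ratio. This is not Termolator's ranking method; the smoothed ratio, the function name and its parameters are assumptions chosen only to show how one case can be ranked against the whole collection.

```python
from collections import Counter


def top_terms(foreground_terms, background_terms, n=50):
    """Rank candidate terms by how much more frequent they are in the
    foreground (one case) than in the background (the whole set)."""
    fg = Counter(foreground_terms)
    bg = Counter(background_terms)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    vocab = len(set(fg) | set(bg))

    def score(term):
        # Add-one smoothing keeps terms absent from the background finite.
        fg_rate = (fg[term] + 1) / (fg_total + vocab)
        bg_rate = (bg[term] + 1) / (bg_total + vocab)
        return fg_rate / bg_rate

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(fg, key=lambda t: (-score(t), t))[:n]
```

Here the inputs are lists of candidate term strings, for example the terms read from a case's .terms file and from the .terms files of all cases.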

Top Terms from Roe v Wade

● medical-legal history
● medical abortion practices
● common-law prosecutions
● roman catholic dogma
● good-faith belief
● definitional deficiencies
● canon-law treatment
● historical statutory development
● common-law scholar
● uniform abortion act
● clinical judgment
● anti-abortion statutes
● anti-abortion mood
● anti-abortion law
● emotional self
● final article
● trend will
● birth control law unconstitutional
● good-faith opinion
● jewish law
● abortion controversy
● canon-law crime

Machine Translation to Spanish

● John Ortega is supervising this
● Esteban Galvis has begun working on this
● Goal:
  – Translate Court Decisions automatically to Spanish
● Obstacle:
  – No parallel text in the court decision domain
● Proposed Solution: Domain Adaptation
  – Train MT system on related domains
    ● Europarl contains Spanish/English text for EU parliament proceedings
    ● Some Spanish/English documents at Cornell (legislation)
    ● Some US government forms exist in Spanish and English
  – Use previously described recognition of citations, names, terminology, etc. to identify “special phrases”
    ● These phrases may be left untranslated
    ● These phrases may be translated as units (e.g., a word inside a special phrase will not form a constituent with a word outside a special phrase)
    ● Some special phrases will be specifically found in existing bilingual dictionaries.
    ● These factors will constrain and improve the resulting MT
  – Legal Dictionary from U of Connecticut: http://jud.ct.gov/external/news/jobs/interpreter/glossary_of_Legal_Terminology_English-to-Spanish.pdf
● Possible Extension: Implement Spanish Version of Termolator (favor term/term alignments)
    ● Dictionary of in-vocabulary terms
    ● Rules for inline terms
    ● Adjustment of some encoding filters
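One way to leave "special phrases" untranslated is to mask them with placeholder tokens before translation and restore them afterwards. The sketch below assumes the MT system passes tokens such as `__SP0__` through unchanged, which would have to be checked for any real system; the function names and the token format are hypothetical.

```python
def mask_special_phrases(text, spans):
    """Replace each (start, end) span with a placeholder token.

    Returns the masked text and a dict mapping placeholder -> original phrase.
    Spans are assumed to be non-overlapping character offsets into text.
    """
    pieces, mapping, cursor = [], {}, 0
    for i, (start, end) in enumerate(sorted(spans)):
        token = f"__SP{i}__"
        mapping[token] = text[start:end]
        pieces.append(text[cursor:start])
        pieces.append(token)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping


def unmask(translated, mapping):
    """Put the original phrases back into the MT output."""
    for token, phrase in mapping.items():
        translated = translated.replace(token, phrase)
    return translated
```

The spans would come from the citation, name and terminology recognizers. Phrases that should be translated as units, or looked up in a bilingual dictionary, could use the same masking step with a translated phrase stored in the mapping instead of the original.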

Big Citation Graph

[Figure: citation graph of cases cited in Roe v Wade. Node labels:]

● Baker v. Carr
● Board of Regents v. Roth
● Association of Data Processing Service Organizations, Inc. v. Camp
● Ashwander v. TVA
● Aptheker v. Secretary of State
● Abrams v. Foshee
● Abele v. Markle
● Boyd v. United States
● Buck v. Bell
● Byrne v. Karalexis
● Byrn v. New York City Health & Hospitals Corp.
● Bolling v. Sharpe
● Boyle v. Landry
● Babbitz v. McCann
● Cantwell v. Connecticut
● Carrington v. Rash
● Carroll v. President and Commissioners of Princess Anne
● Carter v. Jury Comm'n
● Cheaney v. State, Ind.
● Commonwealth v. Bangs
● Commonwealth v. Parker
● Corkey v. Edwards
● Yick Wo v. Hopkins
● Crossen v. Attorney General
● Crossen v. Breckenridge
● Doe v. Bolton
● Doe v. Rampton
● Doe v. Scott
● Roe v Wade

Full Citation Graph is Very Large

● Previous slide includes fewer than half of the cases cited in Roe v Wade (with simulated interconnections)
● Non-case citations omitted
  – Legislative documents and Regulations
  – Briefs
  – Other materials
● Other Possible Graphs
  – Rulings and citations by specific judges
  – Rulings and citations by particular courts
  – Rulings and citations sorted by particular issues
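A citation graph that is usable without opening the cases can be held as two adjacency maps. The sketch below is one possible in-memory form, not the format of the released graph; the function name and the input shape (pairs of case identifiers) are assumptions.

```python
from collections import defaultdict


def build_citation_graph(citations):
    """citations: iterable of (citing_case, cited_case) pairs.

    Returns (cites, cited_by), two dicts of sets, so the graph can be
    walked in either direction without opening the case texts.
    """
    cites = defaultdict(set)
    cited_by = defaultdict(set)
    for citing, cited in citations:
        cites[citing].add(cited)
        cited_by[cited].add(citing)
    return cites, cited_by
```

The other graphs listed above (by judge, court or issue) could reuse the same structure by filtering the input pairs on metadata before building the maps.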

Doc-Internal Coreference for Roe v Wade

[Figure: coreference graph linking case records, their parties and their mentions. Recoverable content:]

● Case: Roe et al v Wade
  – Party1: Roe et al
  – Party2: Henry Wade
  – Dockets: 70-18
  – Years: 1973
  – Reporter: U.S.
  – Volume: 410
  – Page: 113
  – Mentions: 410 U.S. 113 (1973); 93 S.Ct. 705; 35 L.Ed.2d 147; Jane ROE, et al., Appellants, v. Henry WADE
  – Party1 mentions: ROE ET AL, appellants, Jane Roe, Roe, appellant, ...
  – Party2 mentions: Wade, appellee, Henry Wade
● Case: Doe v. Bolton
  – Party1: John and Mary Doe
  – Party2: Bolton
  – Dockets: 70-18
  – Years: 1973
  – Reporter: F. Supp
  – Volume: 319
  – Page: 1048
  – Mentions: Doe v. Bolton; 410 U.S. 179; 319 F.Supp. 1048 (N.D.Ga.1970)
  – Party1 mentions: John and Mary Doe, Doe, Mrs. Doe, The Does
  – Party2 mentions: Bolton, ...
● Other citations (Cite edges): 402 U. S. 941 (1971); 314 F. Supp. 1217, 1225 (ND Tex. 1970); Lochner v. New York; ...

In-Document Coreference and Citations

● Citations are mentions of docs & have context
  – Version information:
    ● Stages of cases (original case, 1st appeal, nth appeal, supreme court)
    ● Revisions of Laws
  – Sentiment-like properties: Candidates for Annotation + ML Research
    ● “Reason” for citation: looking for a simple detectable codification of what a reason for a citation is
      – A topic, a set of topic words or terms, support for a particular party in the case, etc.
● Straightforward common noun coreference
  – Metonymy via legal roles, professions and (maybe) family relations
    ● appellant, defendant, appellee, …
    ● assistant district attorney, officer, justice, …
    ● wife, husband, father, …
  – Uses of nouns from small classes of words
    ● this section, the statute, ...
● Parts of documents
  – sections of laws
  – opinions: majority/binding, concurring, dissenting
    ● overruling, confirming & negating other opinions
● Simplification: who is a “member” or “representative” of a particular party in a case
  – These could be treated as one entity (for purposes of attribution)
● A citation for a case, the court ruling on the case and the author of the binding opinion
  – These could be treated as a single entity (for purposes of attribution)
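Treating several mentions as one entity for attribution amounts to maintaining equivalence classes. A union-find structure is one standard way to do that; the sketch below is an assumption about how the simplification could be implemented, and the class and method names are hypothetical.

```python
class EntityGroups:
    """Union-find over mention strings; merged mentions share one representative."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative of x's group, registering x if new."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def merge(self, a, b):
        """Record that a and b belong to the same unit."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def same_unit(self, a, b):
        return self.find(a) == self.find(b)
```

Keying on the mention string is a simplification: two different people both mentioned as "Doe" would be conflated, so a real system would key on mention offsets or entity ids instead.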

Clustering of Documents

● Clustering of documents according to which other documents they cite
  – Anna Fenske: initial implementation, not yet evaluated
● Clustering of documents based on the words/terms they contain
  – Carly Abraham: planned
● Comparison of these Clusters
● Using co-training to combine methods
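Citation-based clustering needs a similarity between the sets of documents that two cases cite. The sketch below uses Jaccard similarity with a greedy single-pass grouping. It is an assumed baseline for illustration, not the initial implementation mentioned above, and the threshold value is arbitrary.

```python
def jaccard(a, b):
    """Overlap between two sets of cited documents, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_citations(cited_sets, threshold=0.5):
    """Greedy single-pass clustering.

    cited_sets maps a document id to the set of documents it cites. Each
    document joins the first cluster whose seed document is at least
    `threshold` similar, otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_id, [member_ids])
    for doc_id in sorted(cited_sets):
        for seed_id, members in clusters:
            if jaccard(cited_sets[doc_id], cited_sets[seed_id]) >= threshold:
                members.append(doc_id)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append((doc_id, [doc_id]))
    return [members for _, members in clusters]
```

The same `jaccard` function works on sets of words or terms, which would make the two kinds of clusters directly comparable.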

How to Get Further on IE

● Design IE tasks through manual annotation with some pre-processing (preliminary systems)
● Pre-processing systems
  – Python Code (to be released)
    ● Identifies entities and simple relations as above
    ● Identifies quoted phrases
  – Weiyi's JET modification
    ● Identifies entities and relations
    ● Uses different methods, with the possibility of merging output

● I will demo a preliminary attribution task now

Information Extraction Research

● Rounds of annotation, specification correction, pre-processing, until task(s) are well-defined.
  – Good Opportunity because we have several students
  – Results in a design of an annotation task
● Machine Learning System
  – Using training data from annotation
● Active Learning System
  – More efficient use of training data
● More Manual Rules
  – Short Term Project: Modify existing code to recognize legislation citations

Tasks

● Information Extraction
  – Annotation, Preprocessing, Etc.
  – Short Term: manual rules for legislation
  – Short Term: code for converting automatic annotation to Mae format
  – Extracting Entities needed by MT
  – After consistent annotation, various automatic methods
    ● Including unsupervised/semi-supervised techniques based on IR and deep learning
● Terminology Extraction
  – Work on Spanish Version of Termolator
● Machine Translation
  – Various (including interaction with above)
● Document Clustering
  – Progress further on citation-based clustering: evaluation, other methods
  – Terminology/Word based clustering
    ● Use words
    ● Use N-grams
    ● Use terms from .terms files
    ● Use top terms as per Termolator
● Other

Document Distribution

● Initial output at Web of Law Website
● Termolator (from Github)
● More Code (in next 1 or 2 days):
  – Code for material on Web of Law Website
  – Initial Mae dtd for quotation task
  – Initial specs
● Mae Program: https://github.com/amber-stubbs/mae-annotation
● Various Dictionaries
● Does everyone have a CS linux account?
● Does everyone have “proteus” access?

Some Standards

● Offset Annotation:
  – Output of systems should “point” to start and end character offsets in the original text
● Encoding
  – The text is utf8-encoded, including special characters
  – The annotation should escape problematic sgml/html characters using &xxx; style character entities.
  – I will distribute some code that does this in both directions.
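The two standards can be combined in a few lines using Python's standard `html` module. This is a sketch, not the code to be distributed; the tag layout with `start` and `end` attributes and the function names are assumptions for illustration.

```python
import html


def annotate(text, start, end, tag):
    """Build an offset annotation whose surface string is escaped for sgml/html.

    start and end are character (code point) offsets into the original text.
    """
    surface = html.escape(text[start:end], quote=True)
    return f'<{tag} start="{start}" end="{end}">{surface}</{tag}>'


def surface_from(annotation_body):
    """Reverse the escaping to recover the original characters."""
    return html.unescape(annotation_body)
```

Offsets are counted against the unescaped original text, so escaping only affects the string stored inside the annotation and never shifts the start and end values.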