57
1 REGNET: An Infrastructure for Regulatory Information Management and Compliance Assistance Kincho H. Law Prof., Civil and Env. Engr. Jim Leckie Prof., Civil and Env. Engr. Barton Thompson Prof., School of Law Gio Wiederhold Prof., Computer Science Shawn Kerrigan Bill Labiosa Gloria Lau Haoyi Wang Jie Wang Civil and Env. Engr. Pooja Trivedi Li Zhang Liang Zhou (former students) Computer Science Charles Heenan Researcher, Law Student Stanford University, Stanford, CA 94305

REGNET: An Infrastructure for Regulatory Information Management and Compliance Assistance

Embed Size (px)

DESCRIPTION

REGNET: An Infrastructure for Regulatory Information Management and Compliance Assistance. Shawn Kerrigan Bill Labiosa Gloria Lau Haoyi Wang Jie Wang Civil and Env. Engr. Pooja Trivedi Li Zhang Liang Zhou (former students) Computer Science Charles Heenan Researcher, Law Student. - PowerPoint PPT Presentation

Citation preview

1

REGNET: An Infrastructure for Regulatory Information Management and Compliance Assistance

Kincho H. LawProf., Civil and Env. Engr.

Jim Leckie

Prof., Civil and Env. Engr.

Barton Thompson

Prof., School of Law

Gio WiederholdProf., Computer Science

Shawn KerriganBill Labiosa

Gloria LauHaoyi Wang

Jie WangCivil and Env. Engr.

Pooja Trivedi Li Zhang

Liang Zhou(former students)Computer Science

Charles HeenanResearcher, Law Student

Stanford University, Stanford, CA 94305

2

The Public and Scientific Problem

• Regulations are established to protect the public• Regulations greatly constrain businesses’ actions• Many organizations participate to set and use regulations• Interpretation of regulations is costly and inconsistent

• Regulations are voluminous, often incomplete, sometimes conflicting

• Regulations are written in natural language• The objects and interests being regulated are often encoded• Many sources of supportive documents – interpretative

documents, guidelines, etc..

3

MotivationThe complexity, diversity, and volume of

federal and state regulations:• Require considerable expertise to understand• Increase the risk of companies failing to comply

with environmental regulations• Hinder public understanding of the government

How would IT help• to make “applicable” regulations easily

accessible? • to assist parties involved in regulation

compliance?

4

ObjectiveTo enhance regulation management, access and the regulatory compliance process through the use of information technology

Application FocusEnvironmental Regulations:• Federal CFR Title 40: Protection of Environment • 40 CFR 279: Standards For The Management Of Used Oil • 40 CFR 141: National Primary Drinking Water Regulations

Illinois Title 35: Environmental ProtectionNew York Title 6: Environmental Conservation Rules and Regulations

Others

REGNET Project(sponsored by Digital Government Program, National Science Foundation)

5

REGNET Research Goals

• Research questions

– What is an appropriate model for a information management system for compliance assistance?

– How to build such a system

– How to deal with the conflicting objectives?

• Research goal

– Developing information management frameworks that can facilitate public access to regulations, improve the efficiency of regulation compliance and facilitate the compliance process.

6

• Repositories: Infrastructure for online repository of regulations and translating texts into processable form and facilitate access

• Access Tools: Access of the regulation text and related information

• Ontology Development: Formalize terms and meanings to help development of logical rules about relationships in the regulations and among the different regulations

• Integrated Access: Retrieval of regulations based on the content or relationships between the regulations

• Analysis Tools: To validate and improve the quality of the ontology and to check the content of regulations within a domain or across different domains of federal, state and local regulations.

• Compliance Checking Assistance: To develop the means to interface the regulations with usage.

Research Tasks

7

8

9

10

Current Tasks

• Parsing unstructured documents into “tagged” processable format

• Investigating methodology to establish concepts and classification structures in the regulatory documents

• Developing a “logic-based” compliance assistance system

Purpose : Feedbacks and Suggestions

11

Document Repository and Access:

Examples:Drinking Water Regulations: 40 CFR Part 141 and

Background Documents

Bill Labiosa, Charles Heenan

Engineering Informatics Group

Stanford University

12

Overview: Drinking Water Background Information Search

• Documents of interest for this example: web available documents from www.epa.gov/OGWDW and online 40 CFR Part 141

• Current keyword search approach vs. concept categorization approachUSEPA Search Engine: file search for keywordsSemioTagger: Concept categorization approach

• Using two simple categorization hierarchies: an index (alphabetical list of concepts) a regulated drinking water contaminant hierarchy

13

Current Approach: Using “EPA Search”

radium removal

Search Term:“radium removal”

14

15

Full 112 page Document is returned ...

16

More specific search using EPA Search

• New search expression:

“radium removal” AND “drinking water” AND “small systems”

17

Fewer results, but still full documents

18

“radium removal” AND“drinking water” AND“small systems”

search confined to www.epa.gov

19

20

Where do I begin?

                      

21

Search problem

• Background documents, even when located, are voluminous. User is forced to do keyword search within documents, trying to find “the right part of the right document”: time consuming and frustrating.

• When you don’t know which document you want, you can end up in the familiar “information overload” situation.

22

Concept Categorization Approach: SemioTagger

• Two example hierarchies:– “Index for Drinking Water Information” for web

available materials from OGWDW– “Index to the National Primary Drinking Water

Regulations” for 40 CFR 141, using a drinking water contaminant hierarchy

• Both starting lists of concepts extracted by Semio were “cleaned” (irrelevant concepts deleted, important “compound word concepts” modified to meet expectations of drinking water experts).

23

Concept Categorization Approach

• noun phrase extraction

• noun phrase co-occurrence cycles

• hierarchy creation

• document tagging

• information retrieval interface

24

25

Document Repository and Access:

Demonstration Session I:Index for Drinking Water Information anda Contaminant Hierarchy for 40 CFR 141

Bill Labiosa

Engineering Informatics Group

Stanford University

26

Example Taxonomy: Drinking Water Contaminants

Drinking WaterContaminants

Drinking WaterContaminants

Microorganisms

InorganicChemicals

Disinfection/Disinf. By-Products

Radionuclides

OrganicChemicals

27

Regulatory Compliance Assistance

Shawn Kerrigan

Engineering Informatics Group

Stanford University

28

Background• Current state of compliance checking:

• Paper-based process• Locating and interpreting the relevant regulations is

complex, even with the help of supplementary information

• Small companies have difficulty conducting compliance checks due to lack of resources and knowledge

• Vision for future:• Up-to-date regulations and compliance-checking

assistance procedures available online• Improved regulation and compliance-requirement

transparency through clear presentation and linking

29

Research Questions

• How can we make the information and rules more accessible?

• How can we represent the information and rules in environmental regulations in a computer interpretable format?

• How can we structure this information to assist with regulation compliance checking?

30

General Approach

• Information Integration• Formalization of meaning and relationships• Regulation-centric• Tie the information to the appropriate

portion of the regulation

31

Regulation Assistance System (RAS)

• Provides a unifying web interface for the regulation documents and meta-data

• Demonstrates the usefulness of XML structured regulation documents with meta-data

• Works with a logic-based compliance-checking assistance system to demonstrate web-based regulation services

32

Demonstration Session II

• Display regulations with meta-data

• Compliance example

• Non-compliance example

33

Regulation Parsing

• Need to transform plain text/PDF regulations into XML

• Can structure the XML to represent the hierarchical structure of the regulation

34

HTML to XML Regulation Parsing

XMLStructuredDocument

35

Regulation Parsing§ 279.12 Prohibitions.

(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts

264 or 265 of this chapter.

<regElement id = “40.cfr.279.12” title = “Prohibitions”> < regElement id = “40.cfr.279.12.a” title = “Surface Impoundment prohibition”> <regText>

Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.

</regText> </regElement></regElement>

36

Adding Meta-Data to Regulations

Regulation taggedwith meta-data

Add LegalInterpretation

Reference Extraction

Add LogicalInterpretation

Add Concepts

Original XMLdocument

Document

Program

37

Parsing ReferencesPART 279—Standards For The Management Of Used Oil

Subpart B – Applicability

…§ 279.12 Prohibitions.(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter. (b) Use as a dust suppressant. The use of used oil as a dust suppressant is prohibited, except when such activity takes place in one of the states listed in § 279.82(c).(c) Burning in particular units. Off-specification used oil fuel may be burned for energy recovery in only the following devices: (1) Industrial furnaces identified in § 260.10 of this chapter; (2) Boilers, as defined in § 260.10 of this chapter, that are identified as follows: (i) Industrial boilers located on the site of a facility engaged in a manufacturing process where substances are transformed into new products, including the component parts of products, by mechanical or chemical processes; (ii) Utility boilers used to produce electric power, steam, heated or cooled air, or other gases or fluids for sale; or (iii) Used oil-fired space heaters provided that the burner meets the provisions of § 279.23. (3) Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter.

§ 262.11 Used Oil Specification.…..

38

<regText>(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.</regText>

Before:

<regText>(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.</regText><reference id = "ref.40.cfr.264" /><reference id = "ref.40.cfr.265" />

After:

Parsing References

Original XMLdocument

XML withReference List

Reference Extraction

39

What is a “Concept”?

• Examples:– emission requirement

– leaked hazardous substance

– disposal of solvents

– principal hazardous constituent

• Why are they useful?– identify similar regulations even when they do not

reference each other

– provide a “context” for the regulation provision

40

Regnet Taxonomy

41

Tagging with Concepts

<regText>Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter. </regText>

<regText>Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter. </regText>

<concept name = “waste pile” /><concept name = “surface impoundment” />

42

XML Embedded Logic

<logic_sentence> all _o (usedOil(_o) -> -(dustSuppressant(_o))).</logic_sentence>

Rule logic represents the rules specified by the regulation:

40.CFR.279.12.b – Use as a dust suppressant:

“The use of used oil as a dust suppressant is prohibited…”

Option elements define the user interface:<logic_option> <question> Is the used-oil used as a dust suppressant? </question> <logic_opt answer = "yes"> (usedOil(oil1) & dust_suppressant(oil1)). </logic_opt> <logic_opt answer = "no"> (usedOil(oil1) & (-(dust_suppressant(oil1))). </logic_opt></logic_option>

Control statements specify processing instructions for compliance-checking:<control> <goto target = “ref.40.cfr.279.65” /> <switchTo target = “ref.40.cfr.279.73” /></control>

<control> <end target = “ref.40.cfr.279.12” /></control>

43

XML-based Regulations

Additional Input Files

Interactive User Input

Regulation Compliance

Decision

Logic input file Found proof / no proof found

RASweb•Provides web interface

•Displays regulation information

RCCsession•Implements compliance checking procedure

User input Results / requested information

RAS System Structure

* Otter is an automated-deduction program developed by William McCune at Argonne National Laboratory

Otter*

•Attempts to find proof by contradiction from input file

44

Demonstration Session III

• Use of control elements

• Use of “I don’t know” to check multiple paths

45

Summary

• Can decompose regulations into a structured XML document

• Adding rich meta-data about regulations enables more sophisticated interaction with the documents

• Automated assistance with environmental compliance-checking may be possible

46

Thank You!

Questions?

47

Discussion Questions

• How can we explain things better? • How will such a system be useful?• What are examples of how you could use such a system?• What would make the system more useful?• Do you have suggestions for people/fields we should

contact that might be interested in what we are doing?• How are the problems addressed currently dealt with?• What are some existing technologies we should investigate?• What are recommendations for issues we should address?• What might be complementary tools to develop next?

48

49

Translate To Hierarchical Structure

PART 279—Standards For The Management Of Used Oil

Subpart B – Applicability

…§ 279.12 Prohibitions.(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter. (b) Use as a dust suppressant. The use of used oil as a dust suppressant is prohibited, except when such activity takes place in one of the states listed in § 279.82(c).(c) Burning in particular units. Off-specification used oil fuel may be burned for energy recovery in only the following devices: (1) Industrial furnaces identified in § 260.10 of this chapter; (2) Boilers, as defined in § 260.10 of this chapter, that are identified as follows: (i) Industrial boilers located on the site of a facility engaged in a manufacturing process where substances are transformed into new products, including the component parts of products, by mechanical or chemical processes;….§ 262.11 Used Oil Specification.…..

Subsection(a)

Subsection(b)

Subsection(c)

Subsection(d)

40 CFR 279

Subpart A Subpart B Subpart I

Section 262.10 Section 262.11 Section 262.12

…… …

contains

(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units …

Example:

(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.

50

Document Structures

• Plain text• PDF• HTML• XML

51

Plain Text

• Unstructured text• Cannot contain non-text elements• Difficult for machines to process

52

PDF

• Allow images and other non-text elements• Not an open standard• Display-enhancement, data content not

structured or tagged with meaning• Poses the same information-extraction

problem as with plain text

53

HTML

• Open standard• Allows incorporation of display formatting,

images, sounds, and video• Primarily a method for describing how data

should be displayed• Does not effectively represent structure or

meaning of data

54

XML

• XML does not improve the “viewability” of web pages

• XML puts the data in a format that allows us to do more powerful things with it• Organized structure

• Self-describing

• Searching

• Selective views

• Add meta-data

55

XML

•In XML we are not limited to a predefined set of tags

•We can now tag the data according to content, rather than display format

HTML:

<b>Section 262.20</b><b>General Requirements</b><p>A generator who transports, or offers for transportation…</p>

Example XML:

<sectionId>Section 262.20</sectionId><title>General Requirements</title><provision>A generator who transports, or offers for transportation…</provision>

56

Otter•Attempts to find proof by contradiction from input file

RCCsession – Otter Interaction

FOPCInput File

Proof Attempt Output File

RCCsession•Implements compliance checking procedure

Develop input file with appropriate logic sentences

Read proof attempt output and take appropriate action

57

Legal Interpretation

Interpretation of the provision by a legal expert familiar with the regulations

40 CFR 261.4(b)(1)

The following solid wastes are not hazardous wastes:(1) Household waste, including household waste that has been collected, transported, stored, treated, disposed, recovered…

<legal_interpretation>

This provision has been upheld, but narrowed in scope by the U.S. Supreme Court. Household waste is generally not considered a hazardous waste. The court narrowed this provision when it decided ash produced by incinerating household waste is regulated as a hazardous waste if it has hazardous characteristics. Thus, if an incineration facility burns household waste, it can be considered a generator of hazardous waste.

</legal_interpretation>