FRAMEWORK FOR INTELLIGENT VIRTUAL ORGANIZATIONS (FIVO) Natural Language based Processing of Multilingual Contracts for Virtual Organizations constitution Mikołaj Pastuszko, Bartosz Kryza, Renata Słota, Jacek Kitowski Institute of Computer Science, University of Science and Technology AGH Kraków, POLAND

CGW 2010 - NLPN

DESCRIPTION

NLPN (NLP Negotiations) System presented at Cracow Grid Workshop 2010 conference in Kraków, Poland

Page 1: CGW 2010 - NLPN

FRAMEWORK FOR INTELLIGENT VIRTUAL ORGANIZATIONS (FIVO)

Natural Language based Processing of Multilingual Contracts

for Virtual Organizations constitution

Mikołaj Pastuszko, Bartosz Kryza, Renata Słota, Jacek Kitowski

Institute of Computer Science, University of Science and Technology AGH Kraków, POLAND

Page 2: CGW 2010 - NLPN

Agenda

Background of the problem

Goals and requirements of NLPN system

Architecture of NLPN system

Main processing flow in NLPN system

Technologies and tools used in NLPN system

Example of contract text analysis in NLPN system

Future development proposals for NLPN system

Page 3: CGW 2010 - NLPN

Problem introduction

Assumption
● Organizations own resources that are expected to be shared within the Virtual Organization
● Conditions of cooperation are written down in the form of a contract document

Problem
● Contracts are written in natural language (e.g. Polish)
● Automation of Virtual Organization management (FiVO) requires a formal and semantic form of the contract (an ontology in OWL format)

Solution
● NLP-based Negotiations (NLPN) System: translating natural language based contracts to ontologies in OWL format

Page 4: CGW 2010 - NLPN

Concept of NLPN system

Page 5: CGW 2010 - NLPN

Goals and requirements

Support for multiple languages
● English and Polish as a starting point
● Easily extendable with support for other languages

Output ontology in OWL format (FiVO requirement)

Ontology structure easily adjustable

Minimization of human (supervisor) assistance

Flexible mapping between text phrases and ontology entities
● Human-readable and easily editable Contract Dictionary

Modularity
● Easy orchestration for various applications
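The idea of a Contract Dictionary that maps text phrases to ontology entities can be illustrated with a minimal sketch. It is not the NLPN dictionary format: the dictionary layout, the entity names (`NumberOfSeats`, `AverageVelocity`, `EqualTo`, `GreaterThan`) and the function `lookup_entities` are all hypothetical, and the Polish phrases are copied in their inflected forms from the example contract rather than lemmatized.

```python
# Hypothetical Contract Dictionary: surface phrases per language mapped to
# ontology entity names. Phrases and entity names are illustrative only.
CONTRACT_DICTIONARY = {
    "en": {
        "number of seats": "NumberOfSeats",
        "average velocity": "AverageVelocity",
        "greater than": "GreaterThan",
        "equal to": "EqualTo",
    },
    "pl": {
        "ilość miejsc siedzących": "NumberOfSeats",
        "prędkość średnią": "AverageVelocity",
        "ponad": "GreaterThan",
        "wynoszącą dokładnie": "EqualTo",
    },
}


def lookup_entities(text, language):
    """Return (phrase, entity) pairs for dictionary phrases found in text,
    ordered by their first position in the text."""
    lowered = text.lower()
    hits = []
    for phrase, entity in CONTRACT_DICTIONARY[language].items():
        position = lowered.find(phrase)
        if position != -1:
            hits.append((position, phrase, entity))
    return [(phrase, entity) for _, phrase, entity in sorted(hits)]
```

Because both language sections point at the same entity names, adding a language in this sketch means adding one more section to the dictionary, with no change to the lookup code.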

Page 6: CGW 2010 - NLPN

Data flow in NLPN system

Page 7: CGW 2010 - NLPN

Modular architecture of NLPN system

Page 8: CGW 2010 - NLPN

Contract text analysis

1. Tokenization

2. Sentence Splitting

3. Morphological Analysis and POS Tagging

4. Named Entities Recognition
● Gazetteer

5. Contract Statements Recognition
● Transducer + grammars
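The first stages listed above can be sketched in a few lines of plain Python. This is only an illustration of what each stage produces, not the GATE or LanguageTool components the system uses: the regular expressions, the `GAZETTEER` entries and their entity types are assumptions, the sketch splits sentences before tokenizing for simplicity, and morphological analysis is left out because it needs a language-specific dictionary.

```python
import re

# Hypothetical gazetteer: known names mapped to an assumed entity type.
GAZETTEER = {
    "Costa Rica Airlines": "Organization",
    "Mercedes-Benz H6": "Resource",
    "John Smith": "Person",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by
    whitespace and an uppercase ASCII letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [part.strip() for part in parts if part.strip()]


def tokenize(sentence):
    """Split into word-like tokens (hyphens and slashes kept inside a
    token) and single punctuation marks."""
    return re.findall(r"\w+(?:[-/]\w+)*|[^\w\s]", sentence)


def find_named_entities(sentence):
    """Return (name, type, start_offset) for each gazetteer entry found,
    ordered by position in the sentence."""
    found = []
    for name, entity_type in GAZETTEER.items():
        for match in re.finditer(re.escape(name), sentence):
            found.append((name, entity_type, match.start()))
    return sorted(found, key=lambda item: item[2])
```

A gazetteer lookup of this kind only finds names it already knows, which is why the last stage relies on grammars over the resulting annotations rather than on the name list alone.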

Page 9: CGW 2010 - NLPN

Technologies and tools

NLP tools

GATE – General Architecture for Text Engineering
● Tokenizer
● Gazetteer
● OntoGazetteer
● JAPE Transducer
● JAPE grammars – Java Annotation Patterns Engine

LanguageTool
● Sentence Splitter
● Part-of-Speech Tagger
● Disambiguator (tagger part)
● Supports 20 languages including Polish (Morfologik library)

ANNIE – A Nearly-New Information Extraction System

Page 10: CGW 2010 - NLPN

Technologies and tools

Ontologies
Jena Semantic Web Framework library
● Supports read and write in RDF/XML, N3 and N-Triples formats
● Provides API for OWL and RDF

Configuration files
● YAML format
● SnakeYAML library
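To make the output side more concrete, the sketch below writes one recognized statement as N-Triples, one of the formats named above. It builds the lines by hand instead of calling Jena, and the namespace `http://example.org/contract#`, the class `QoSStatement` and the property names are hypothetical placeholders, not the FiVO contract ontology.

```python
# Hypothetical namespace and vocabulary, chosen only for illustration.
BASE = "http://example.org/contract#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(local_name):
    """Wrap a local name in the assumed namespace as an N-Triples IRI."""
    return f"<{BASE}{local_name}>"


def qos_statement_to_ntriples(statement_id, provider, attribute, comparator, value):
    """Serialize one QoS statement as N-Triples, one triple per line."""
    subject = iri(statement_id)
    triples = [
        (subject, f"<{RDF_TYPE}>", iri("QoSStatement")),
        (subject, iri("hasProvider"), iri(provider)),
        (subject, iri("hasAttribute"), iri(attribute)),
        (subject, iri("hasComparator"), iri(comparator)),
        (subject, iri("hasValue"), f'"{value}"^^<{XSD_INTEGER}>'),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

For example, `qos_statement_to_ntriples("stmt1", "CostaRicaAirlines", "NumberOfSeats", "EqualTo", 54)` yields five triples describing a single statement individual; an RDF library could then load them and write them back out as RDF/XML.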

Page 11: CGW 2010 - NLPN

Example: Contract text analysis

Stwierdzenia QoS

Costa Rica Airlines będzie świadczyć ilość miejsc siedzących dla Mercedes-Benz H6 wynoszącą dokładnie 54 i przewidywaną prędkość średnią ponad 60 km/h.

Stwierdzenia bezpieczeństwa

Tour Manager i Klient powinni być uprawnieni do rezerwowania miejsc poprzez Usługę Costa Rica.

Klauzule kar umownych

W przypadku niedotrzymania warunków świadczenia Acela D45 trainset powinno zostać wysłane powiadomienie do Johna Smitha.

QoS Statements

Costa Rica Airlines should provide number of seats of Mercedes-Benz H6 equal to 54 and expected average velocity greater than 60 km/h.

Security Statements

Tour Manager and Client should be able to book seats on Costa Rica Service.

Penalty Clauses

In case of violation of Acela D45 trainset sharing conditions a notification should be sent to John Smith.
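As a rough illustration of what recognizing a statement form involves, the sketch below parses the English QoS statement above with regular expressions. It stands in for the transducer and grammars and is not how NLPN is implemented: the patterns, the extra comparator "less than" and the function `parse_qos_statement` are assumptions, and only the "X should provide ..." form is covered.

```python
import re

# Assumed statement form: "<provider> should provide <cond> and <cond> ..."
STATEMENT = re.compile(r"^(?P<provider>.+?)\s+should provide\s+(?P<conditions>.+)$")

# Assumed condition form: "[expected] <attribute> <comparator> <number> [unit]"
CONDITION = re.compile(
    r"^(?:expected\s+)?(?P<attribute>.+?)\s+"
    r"(?P<comparator>equal to|greater than|less than)\s+"
    r"(?P<value>\d+(?:\.\d+)?)"
    r"(?:\s+(?P<unit>\S+))?$"
)


def parse_qos_statement(sentence):
    """Return the provider and a list of condition dicts, or None when the
    sentence does not follow the assumed QoS statement form."""
    match = STATEMENT.match(sentence.strip().rstrip("."))
    if match is None:
        return None
    conditions = []
    for part in match.group("conditions").split(" and "):
        condition = CONDITION.match(part.strip())
        if condition is not None:
            conditions.append(condition.groupdict())
    return {"provider": match.group("provider"), "conditions": conditions}
```

Applied to the QoS sentence, this returns the provider "Costa Rica Airlines" and two conditions; the security statement does not match the form and gives `None`, which is where a separate pattern per statement type would be needed.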

Page 12: CGW 2010 - NLPN

Tokenization

Page 13: CGW 2010 - NLPN

Sentence Splitting

Page 14: CGW 2010 - NLPN

Morphological Analysis and POS Tagging

Page 15: CGW 2010 - NLPN

Named Entities Recognition

Page 16: CGW 2010 - NLPN

Contract Statements Recognition

Page 17: CGW 2010 - NLPN

Contract Statements Recognition

Page 18: CGW 2010 - NLPN

Summary

NLPN system:
● Translates natural language based contracts to a formal and semantic form of ontologies
● Supports English and Polish; easily extendable with other languages
● Is modular: ease of use in various applications
● Is highly configurable: Contract Dictionary (including its structure), Contract Ontology structure, Contract Statements forms, configuration files for all components
● Has broad perspectives for future development

Page 19: CGW 2010 - NLPN

Future development

Distributed Negotiations Environment
● Negotiations Console

More statement forms

Statistical approach algorithms

Noise correction (typos etc.)

Page 20: CGW 2010 - NLPN

Thank you

The End
