Page 1

Algorithm to populate Telecom domain OWL-DL ontology with A-box object properties derived from Technical Support Documents

Kouznetsov A (1), Shoebottom B (2), Baker CJO (1)

(1) Department of Computer Science and Applied Statistics, University of New Brunswick, Saint John, Canada

(2) Innovatia, Inc, Saint John, Canada

Page 2

Motivation: Why Ontology-Centric?

• Problem: To respond to information requests in a timely manner, contact center workers need to search through many types of knowledge resources

• Challenge: increasing quality of service while decreasing contact center costs

• Solution: using the ontology-centric platform
– less escalation to more experienced workers
– less time spent in resolving cases
– training time is also greatly reduced

Page 3

Motivation: Why Text Mining?

• Problem: Significant time is spent by highly educated experts in populating the ontology.

• Challenge: Reduce the workload

• Solution: Apply text mining, a semiautomatic method for extracting information, specifically named entities and their relations, from texts and populating a domain ontology.

Page 4

Focus

• We are focused on the problem of accurately extracting and populating relations between the named entities and presenting them as object properties between A-box individuals in an OWL-DL ontology.

Page 5

Populate A-box Object Property: Single Property

[Diagram]

T-Box: Domain Class: Man; Object Property: hasSister; Range Class: Woman

A-Box: Domain Instance: Samuel; Range Instance: Mary; hasSister: ?

Page 6

Populate A-box Object Property: Multi-properties

[Diagram]

T-Box: Domain Class: Man; Range Class: Woman; Object Properties: hasSister, hasMother

A-Box: Domain Instance: Samuel; Range Instance: Mary; hasSister: ?; hasMother: ?
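The question marks above can be read as a list of A-box candidates: every T-box object property is paired with every (domain instance, range instance) combination of matching classes. A minimal sketch under that reading follows; the class `ObjectProperty` and the function `abox_candidates` are hypothetical names used for illustration only, not part of the authors' system.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ObjectProperty:
    """A T-box object property with its domain and range classes (illustrative)."""
    name: str
    domain_class: str
    range_class: str


def abox_candidates(properties, individuals):
    """Enumerate (domain instance, property, range instance) candidates.

    `individuals` maps a class name to the list of its known instances.
    """
    candidates = []
    for prop in properties:
        domains = individuals.get(prop.domain_class, [])
        ranges = individuals.get(prop.range_class, [])
        for d, r in product(domains, ranges):
            candidates.append((d, prop.name, r))
    return candidates


# The example from the slides: two properties, one man, one woman.
tbox = [
    ObjectProperty("hasSister", "Man", "Woman"),
    ObjectProperty("hasMother", "Man", "Woman"),
]
abox = {"Man": ["Samuel"], "Woman": ["Mary"]}
candidates = abox_candidates(tbox, abox)
print(candidates)
```

Each candidate still has to be confirmed or rejected from textual evidence; the enumeration only defines what needs to be scored.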

Page 7

A more complicated case

[Diagram]

A-Box: Domain Instance: Samuel; Range Instance: Mary; hasSister: ?; hasMother: ?; hasSameLastName: ?

Page 8

Methodology

• Ontology-based information retrieval applies Natural Language Processing (NLP) to link text segments, named entities and relations between named entities to existing ontologies.

• The algorithm leverages customized gazetteer lists, including lists specific to object property synonyms.

• A-box property candidates are scored using functions of the distance between co-occurring terms.

• A-box property prediction and population are based on these scores (thresholds, fuzzy approach).
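The gazetteer and distance steps can be illustrated with a small sketch. It assumes whitespace tokenization and exact, case-sensitive matching of synonyms, and it measures distance as the number of tokens strictly between a domain match and a range match; the deck's GATE/JAPE pipeline may differ on all three points. The function names `find_terms` and `min_distance` are hypothetical.

```python
def find_terms(tokens, synonyms):
    """Return sorted (start, end) token spans where any synonym occurs."""
    spans = set()
    for synonym in synonyms:
        syn_tokens = synonym.split()
        n = len(syn_tokens)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == syn_tokens:
                spans.add((i, i + n - 1))
    return sorted(spans)


def min_distance(domain_spans, range_spans):
    """Smallest count of tokens strictly between a domain and a range match.

    Returns None when the two terms never co-occur without overlapping.
    """
    best = None
    for d_start, d_end in domain_spans:
        for r_start, r_end in range_spans:
            if d_end < r_start:
                gap = r_start - d_end - 1
            elif r_end < d_start:
                gap = d_start - r_end - 1
            else:
                continue  # overlapping matches are ignored in this sketch
            if best is None or gap < best:
                best = gap
    return best


# Sentence taken from the Bonus Calculation slide.
tokens = "Device X does not support Device Y".split()
domain_spans = find_terms(tokens, ["Device X"])
range_spans = find_terms(tokens, ["Device Y"])
distance = min_distance(domain_spans, range_spans)
print(distance)
```

With these assumptions the sentence yields a distance of 3, which matches the value used for the same sentence on the Bonus Calculation slide.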

Page 9

Main Implementation tools

Java

GATE/JAPE

OWLAPI

Page 10

Semi-Automatic Ontology Populating Pipeline

[Pipeline diagram; stages and elements as labelled on the slide]

• Source Documents (XML), Unpopulated Ontology (OWL), Term List (Excel)

• Preprocessing: Synonyms Lists

• Text Segments Processing: Text Segments Separation into Sentences, Tables, Other Text Segments

• Ontology Population: Named Entities, Single Relations, Multi Relations

• Populated Ontology

• Using Ontology: Reasoning, Visualizing, Visual Queries, Connecting Resources

Page 11

Populating Ontology

[Diagram; components as labelled on the slide]

• Scoring Framework: Co-occurrence Based Scores generator

• Relation Framework for A-box candidates extraction

• Decision Framework: Decision module

• Other labels: Candidate, Reasoning, Ontology, Scores, Focus, Labelled Data, Tres

Page 12

Co-occurrence Based Scores generator

[Diagram: Co-occurrence Based Scores generator (Light version); components as labelled on the slide]

• Relations Framework: Relation Object

• Components: Tokenizer, Gazetteer, Score calculator, Integrator, Fragments Processor

• Other labels: A-box Candidate, All related content, Synonyms List, Scores

Page 13

Generation of Scores

• Relation Collection: framework to process Relation objects

• Relation Object: integrates an object property with
– all types of related text fragments
– ontology objects
– score processing intermediate and final results

• A Relation Object is identified as: Domain Class : Domain Instance : Object Property : Range Class : Range Instance
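As an illustration of the five-part identifier, the sketch below models a Relation object that can be built from, and serialized back to, a colon-separated key. The class layout and the names `Relation`, `key` and `from_key` are assumptions made for this example; only the order of the five parts comes from the slide, and the sample key is the one shown on the Example Sentence Type 1 slide.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """One A-box object property candidate plus its evidence (illustrative)."""
    domain_class: str
    domain_instance: str
    object_property: str
    range_class: str
    range_instance: str
    fragments: list = field(default_factory=list)  # related text fragments
    score: float = 0.0                             # final integrated score

    @property
    def key(self):
        """Identifier in the order given on the slide."""
        return ":".join([
            self.domain_class,
            self.domain_instance,
            self.object_property,
            self.range_class,
            self.range_instance,
        ])

    @classmethod
    def from_key(cls, key):
        """Parse a five-part colon-separated identifier."""
        parts = key.split(":")
        if len(parts) != 5:
            raise ValueError(f"expected 5 colon-separated parts, got {len(parts)}")
        return cls(*parts)


sample_key = (
    "Telecommunications_Chassis:8010co_Chassis:hasChassis_Shipping_Accessories:"
    "Telecommunications_Chassis_Screws:Screws"
)
relation = Relation.from_key(sample_key)
print(relation.object_property)
```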

Page 14

Scores Generator: Details

Score Calculator:

• Calculates scores for the text fragments associated with the Relation.

• The current version is based on the distance between co-occurring entities and the number of text fragments with co-occurrence.

• Included by the Text Fragments Processor and the Integrator.

Page 15

2-terms and 3-terms scoring system

[Diagram; components as labelled on the slide]

• Synonym lists: Domain Synonyms list, Range Synonyms list, Object Property Synonyms list

• Components: Tokenizer, Score Gazetteer, Score Processor

• Data labels: Tokenized sentence, sentence score

• Legend: Legacy (2 terms) system; Modified/Added in the new (3 terms) system

Page 16

Multiple Formats Score Generation

Technical documentation contains knowledge displayed in multiple formats, each requiring different processing subroutines:

• Table Processing

• Sentence Processing

• Other segments

Page 17

Extensible Data Model

[Diagram; hierarchy as labelled on the slide]

• Corpus
– Document (Doc ID)
– Document Segment
– Table Segment: Data Cell (ID, Content), Row Header (ID, Content), Column Header (ID, Content), Table Header (ID, Content)
– Text Segment: Sentence (ID, Content)

• Options: Sections, Paragraphs, Bullet lists, Headings
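The hierarchy can be sketched as a handful of data classes. This is only one possible reading of the diagram: it assumes that a Table Segment holds a single data cell together with its row, column and table headers, and that every element is an (ID, Content) pair. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    """Any addressable piece of content: an ID plus its text."""
    id: str
    content: str


@dataclass
class TextSegment:
    sentence: Element


@dataclass
class TableSegment:
    data_cell: Element
    row_header: Element
    column_header: Element
    table_header: Element


@dataclass
class Document:
    doc_id: str
    segments: List[Union[TextSegment, TableSegment]] = field(default_factory=list)


@dataclass
class Corpus:
    documents: List[Document] = field(default_factory=list)

    def sentences(self):
        """Yield (doc_id, sentence element) for every text segment."""
        for doc in self.documents:
            for segment in doc.segments:
                if isinstance(segment, TextSegment):
                    yield doc.doc_id, segment.sentence


# Hypothetical content, for illustration only.
doc = Document("doc-1", [
    TextSegment(Element("s1", "Rotate the levers.")),
    TableSegment(
        data_cell=Element("c1", "2"),
        row_header=Element("r1", "Power Supply"),
        column_header=Element("h1", "Quantity"),
        table_header=Element("t1", "Chassis components"),
    ),
])
corpus = Corpus([doc])
found = list(corpus.sentences())
print(found)
```

New segment types such as sections, paragraphs or bullet lists could be added as further classes in the `segments` union without touching the existing ones, which is presumably what makes the model extensible.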

Page 18

A-Box Prop. Population

[Diagram; elements as labelled on the slide]

• Ontology processing: T-Box Obj. Properties (102)

• A-Box property candidates list: A-Box Obj. Properties (399)

• Text Mining (corpus, Gazetteer List):
– Properties with occurrence of domain or range individuals (256)
– Properties with co-occurrence of domain and range individuals (143)
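The split between "occurrence" and "co-occurrence" candidates can be sketched as a simple classification over text segments. The version below assumes case-insensitive substring matching of synonyms, which is cruder than a gazetteer over tokens; `contains_any` and `classify_candidate` are hypothetical names, and the sample segments are invented for illustration.

```python
def contains_any(text, synonyms):
    """True if any synonym occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(synonym.lower() in lowered for synonym in synonyms)


def classify_candidate(segments, domain_synonyms, range_synonyms):
    """Classify the evidence for one A-box candidate.

    Returns "co-occurrence" if some segment mentions both domain and range,
    "occurrence" if segments mention only one of them, and "none" otherwise.
    """
    seen_either = False
    for segment in segments:
        has_domain = contains_any(segment, domain_synonyms)
        has_range = contains_any(segment, range_synonyms)
        if has_domain and has_range:
            return "co-occurrence"
        seen_either = seen_either or has_domain or has_range
    return "occurrence" if seen_either else "none"


domain = ["8010co chassis", "8010co"]
range_ = ["Screws", "screws"]
together = classify_candidate(
    ["Loosen the screws on the 8010co chassis."], domain, range_)
apart = classify_candidate(
    ["Remove the screws.", "The 8010co chassis is heavy."], domain, range_)
print(together, apart)
```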

Page 19

A-Box scoring

Evidences for A-box Obj. Property candidates

[Diagram] The current A-box Object Property Candidate is linked to two groups of evidence:

• Evidences for Current A-box (co-occurrence of Domain and Range)

• Evidences for Current A-box (occurrence of Domain or Range)

Each group contains Text Segments (Sentence: ID, Content) and Table Segments (Data Cell, Row Header, Column Header, Table Header, each with ID and Content).

Page 20

Table Segments: Primary Scoring

[Diagram] A-Box scoring: the Domain, Property and Range of the current A-box Object Property Candidate are related to the parts of a Table Segment: Data Cell, Row Header, Column Header, Table Header (each with ID and Content).

Page 21

Table Segments: Secondary Scoring

[Diagram] A-Box scoring: the Domain, Property and Range of the current A-box Object Property Candidate are related to the parts of a Table Segment: Data Cell, Row Header, Column Header, Table Header (each with ID and Content).

Page 22

Sentence Scoring

• A-box Object Property score for a sentence: SentenceScore = 1/(distance + 1) + Bonus

• Integrated Object Property score over all related sentences: IntegratedScore = SUM(SentenceScore)

• Sum the Integrated Score with the Table Scores

• Normalized Object Property score: NormalizedScore = IntegratedScore / Norm
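The scoring chain can be written down directly from these formulas. The sketch below is a plain transcription, assuming that each piece of sentence evidence is a (distance, bonus) pair and that table scores arrive as ready-made numbers; the function names are hypothetical.

```python
def sentence_score(distance, bonus=0.0):
    """SentenceScore = 1/(distance + 1) + Bonus."""
    return 1.0 / (distance + 1.0) + bonus


def integrated_score(sentence_evidence, table_scores=()):
    """Sum of sentence scores over all related sentences, plus table scores.

    `sentence_evidence` is an iterable of (distance, bonus) pairs.
    """
    total = sum(sentence_score(d, b) for d, b in sentence_evidence)
    return total + sum(table_scores)


def normalized_score(integrated, norm):
    """NormalizedScore = IntegratedScore / Norm."""
    return integrated / norm


# Distances and bonuses taken from the worked examples in the deck.
print(round(sentence_score(4), 2))       # distance 4, no bonus
print(round(sentence_score(6, 3), 2))    # distance 6, bonus 3
print(round(sentence_score(19, 10), 2))  # distance 19, bonus 10
```

Because the distance term is at most 1 while bonuses are whole multiples of a constant, any sentence that contains a property synonym outranks sentences that only contain the domain and range terms.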

Page 23

Sentence scoring: Score = 1/(distance + 1) + Bonus

Legend: D = Domain Synonym, R = Range Synonym, P = Object Property Synonym

[Diagram: four sentence types]

Type | Pattern | Distance | Bonus | Score
1 | D and R separated by markup tags | 1000 | 0 | 1/(1000+1) + 0 = 0.000999
2 | D and R in one sentence, no P | 4 | 0 | 1/(4+1) + 0 = 0.2
3 | D, R and P in one sentence, P after R | 6 | 3 | 1/(6+1) + 3 = 3.14
4 | D, R and P in one sentence, P between D and R | 4 | 10 | 1/(4+1) + 10 = 10.2

Page 24

Example Sentence Type 1

Distance: 1000, Bonus = 0, Score = 1/(1000+1) + 0 = 0.000999

Sentence before cleaning: ["<Paragraph></Action> <Figure Numbered="Unnumbered" Position="Inline" TextSize="medium" Width="column" frame="all" id="DLM-11334063" xml:lang="en"><image border-style="none" border-width="medium" xml:lang="en" href="ERGNN46205-301Loosening_screws_on_the_SDM_FW4_8010co_chassis33b.png"/></Figure></Step><Step xml:lang="en"><Action><Paragraph xml:lang="en">Rotate the insert/extractlevers to eject the 8660 SDM from the chassis.]

Final Score=9.99000999000999E-4, Best Bonus=0.0, Final Distance=1000.0

Relation: Telecommunications_Chassis:8010co_Chassis:hasChassis_Shipping_Accessories:Telecommunications_Chassis_Screws:Screws

Property Synonyms: need, have, require, has

Domain Synonyms: 8010co chassis, 8010co Chassis, 8010 CO chassis, 8010co, 8010CO chassis

Range Synonyms: Screws, screws

Page 25

Example Sentence Type 2

Sentence after cleaning: In a chassis that includes two power supplies in a non redundant power configuration, you must start both restrictions dual power supplies power supply units within 2 seconds of each other.

Final Score=0.05, Best Bonus=0.0, Final Distance=19

Relation: Telecommunications_Chassis:Chassis:hasChassis_Components:Telecommunications_Chassis_Power_Supply:Power_Supply

Property Synonyms: have, has

Domain Synonyms: chassis, switch chassis, 8000 series, Chassis, CO chassis

Range Synonyms: Power Supply, transformer, power supply, power module, Power supply

Page 26

Example Sentence Type 4

Sentence after cleaning: In a chassis that includes two power supplies in a non redundant power configuration, you must start both restrictions dual power supplies power supply units within 2 seconds of each other.

Final Score=10.05, Best Bonus=10.0, Final Distance=19

Relation: Telecommunications_Chassis_Power_Supply:Power_Supply:isPart_of_Chassis:Telecommunications_Chassis:Chassis

Property Synonyms: used in, include

Domain Synonyms: Power Supply, transformer, power supply, power module, Power supply

Range Synonyms: chassis, switch chassis, 8000 series, Chassis, CO chassis

Page 27

Bonus Calculation

Bonus = Bonus Constant * Number of tokens in property

[Diagram: two sentence patterns containing D, R and P]

• Distance: 6, Bonus Constant = 10, Tokens in Property = 2, Score = 1/(6+1) + 2*10 = 20.14

• Distance: 6, Bonus Constant = 10, Tokens in Property = 1, Score = 1/(6+1) + 1*10 = 10.14

Sentence Example: Device X does not support Device Y

Object Property | Tokens Number | Obtained Score
Support | 1 | 1/(3+1) + 1*10 = 10.25
Not Support | 2 | 1/(3+1) + 2*10 = 20.25 (selected)
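The effect of the token-count bonus can be sketched as follows. The example assumes a Bonus Constant of 10 (the value used on this slide), counts tokens by splitting the property synonym on whitespace, and picks the highest-scoring property among competing candidates; `bonus`, `score_with_property` and the selection by `max` are illustrative choices, not the authors' code.

```python
BONUS_CONSTANT = 10.0  # value used in the slide's examples


def bonus(property_synonym, constant=BONUS_CONSTANT):
    """Bonus = Bonus Constant * number of tokens in the matched property synonym."""
    return constant * len(property_synonym.split())


def score_with_property(distance, property_synonym):
    """Score = 1/(distance + 1) + Bonus for a sentence containing D, R and P."""
    return 1.0 / (distance + 1.0) + bonus(property_synonym)


# "Device X does not support Device Y": distance 3 between domain and range.
scores = {
    synonym: score_with_property(3, synonym)
    for synonym in ["support", "not support"]
}
best = max(scores, key=scores.get)
print(scores, best)
```

Weighting by token count lets the longer, more specific match ("not support") win over the shorter match it contains ("support"), so a negated relation is not mistaken for its positive form.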

Page 28

Normalization

• Norm coefficient for an A-box object property:

Norm = Log(1.0 + (NSD + 1.0/Cd) * (NSR + 1.0/Cr))

NSD: Number of sentences in which the Domain occurred
Cd: Domain Synonyms List cardinality
NSR: Number of sentences in which the Range occurred
Cr: Range Synonyms List cardinality
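A direct transcription of the norm coefficient is given below. Two points are assumptions: the logarithm is taken as the natural logarithm (the slide does not state the base), and the operator precedence is read exactly as written, that is NSD + (1.0/Cd) rather than (NSD + 1.0)/Cd. The input values in the example are hypothetical.

```python
import math


def norm_coefficient(nsd, cd, nsr, cr):
    """Norm = log(1.0 + (NSD + 1.0/Cd) * (NSR + 1.0/Cr)).

    nsd: number of sentences in which the domain occurred
    cd:  cardinality of the domain synonyms list
    nsr: number of sentences in which the range occurred
    cr:  cardinality of the range synonyms list
    Natural logarithm assumed; the base is not given in the source.
    """
    return math.log(1.0 + (nsd + 1.0 / cd) * (nsr + 1.0 / cr))


# Hypothetical counts: domain seen in 3 sentences (5 synonyms),
# range seen in 2 sentences (2 synonyms).
norm = norm_coefficient(nsd=3, cd=5, nsr=2, cr=2)
print(norm)
```

Under this reading the coefficient grows with how often the domain and range terms appear on their own, so frequently mentioned entities need more co-occurrence evidence to reach the same normalized score.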

Page 29

Gold Standard and Evaluation Framework

[Diagram: the ontology populating pipeline extended with an evaluation loop; elements as labelled on the slide]

• Pipeline: Source Documents (XML), Unpopulated Ontology (OWL), Term List (Excel); Preprocessing (Synonyms Lists); Text Segments Processing (Text Segments Separation into Sentences, Tables, Bullet Lists); Ontology Population (Named Entities, Single Relations, Multi Relations); Populated Ontology; Using Ontology (Reasoning, Visualizing, Visual Queries, Connecting Resources)

• Evaluation: T-Box Ontology, A-Box Ontology, Labels, Evaluation Report, Populate Ontology, Prediction evaluation Framework, Evaluate predicted Properties / Update DB, Gold Standard Database, Import labels, Knowledge Engineer

Page 30

Thresholds: Decision Boundary

All scores for each A-box property candidate are summed over the eligible sources of evidence for the A-box property in question.

The threshold in use determines the trade-off between recall and precision.
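The trade-off can be made concrete with a small sketch: candidates whose summed score reaches the threshold are predicted positive, and precision and recall are computed against gold-standard labels. The scores and labels below are invented for illustration, and the function names are hypothetical.

```python
def predict(scores, threshold):
    """Return the set of candidates whose score is at or above the threshold."""
    return {candidate for candidate, score in scores.items() if score >= threshold}


def precision_recall(predicted, gold_positive):
    """Positive-class precision and recall; 0.0 is returned when undefined."""
    true_positives = len(predicted & gold_positive)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold_positive) if gold_positive else 0.0
    return precision, recall


# Hypothetical normalized scores and gold-standard positives.
scores = {"c1": 0.9, "c2": 0.6, "c3": 0.3, "c4": 0.1}
gold = {"c1", "c2", "c4"}

strict = precision_recall(predict(scores, 0.5), gold)    # high threshold
lenient = precision_recall(predict(scores, 0.05), gold)  # low threshold
print(strict, lenient)
```

In this toy data the high threshold gives perfect precision but misses one true relation, while the low threshold recovers every true relation at the cost of one false positive.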

Page 31

Results for Tables: Baseline result

Focus on Positive class Recall and Positive class Precision

Class of interest (Positive class) Recall =0.80 Precision=0.85

Page 32

Results for Tables: Continued

Focus on Positive class Precision

Class of interest (Positive class) Recall =0.25 Precision=1.0

Page 33

Results for Tables: Continued

Focus on Positive class Recall

Class of interest (Positive class) Recall = 1.0 Precision = 0.775

Page 34

Results for Sentences

Focus on Positive class Precision

Class of interest (Positive class) Recall =0.14 Precision=1.0

Page 35

Results for Sentences and Tables

Focus on Positive class Precision

Class of interest (Positive class) Recall = 0.4 Precision = 1.0

Synergetic effect of using Sentences and Tables (at Precision = 1.0):

• Recall (sentences) = 0.14
• Recall (tables) = 0.25
• Recall (sentences & tables) = 0.4

Page 36

Advantages

• Improve Quality of Knowledge Base
– Managing the argumentation process KB vs KE
– Iterative improvement of accuracy

• Tier 1 doing Tier 2 task (improve service)
– Tier 1 (high precision): KB query
– Tier 2 (high recall): knowledge integration
– Facilitate information processing without KE

• Reduce workload (saving)

Page 37

Improve Quality of Knowledge Base

• Offline task by Knowledge Engineer

• Disambiguation
– The expert can pay special attention to any significant inconsistency between human and machine outputs, such as highly scored A-box candidates labeled as negatives

• Human Expert & Machine Committee vs. single human expert

Page 38

Real Time Integration of New Evidence

• Online, by call centre worker, at knowledge use stage
– Extracting additional object properties from new documents for emergency cases
– High Positive Precision focused scenario

• Offline, by senior call centre worker, at knowledge use stage
– Extracting additional object properties from new documents for questions not answered online
– High Positive Recall focused scenario

Page 39

Reduce Workload

• Online and Offline

• Automatically Extracted Evidence

• Ranked Solutions with an indicated level of confidence

Page 40

Gold Standard Corpus and Evaluation Framework

[Diagram: the ontology populating pipeline extended with an evaluation loop; elements as labelled on the slide]

• Pipeline: Source Documents (XML), Unpopulated Ontology (OWL), Term List (Excel); Preprocessing (Synonyms Lists); Text Segments Processing (Text Segments Separation into Sentences, Tables, Bullet Lists); Ontology Population (Named Entities, Single Relations, Multi Relations); Populated Ontology; Using Ontology (Reasoning, Visualizing, Visual Queries, Connecting Resources)

• Evaluation: T-Box Ontology, A-Box Ontology, Labels, Evaluation Report, Populate Ontology, Prediction evaluation Framework, Evaluate predicted Properties / Update DB, Gold Standard Database, Import labels, Knowledge Engineer

Page 41

Future Work: Extend Literature Scheme

• Sections

• Paragraphs

• Bullet Lists

• Connect with Headings and Topics