An Ontological Approach to Financial Analysis and Monitoring

Page 1: An Ontological Approach to Financial Analysis and Monitoring
Page 2: An Ontological Approach to Financial Analysis and Monitoring

An Ontological Approach to Financial Analysis and Monitoring

Page 3: An Ontological Approach to Financial Analysis and Monitoring

Application Architecture

Page 4: An Ontological Approach to Financial Analysis and Monitoring

Research Areas

● Data Extraction

● Data Disambiguation

● Semantic Association and Ranking

● MathML-S (MathML with Semantics)

Page 5: An Ontological Approach to Financial Analysis and Monitoring

Data Extraction

● Ontology schema designed to provide common terminology for a domain.

● Ontology instances represent actual data from a domain.

● Data is extracted from multiple resources and translated into instances of a single ontology.

● Original data sources include:

  ● Databases
  ● XML files
  ● HTML pages
  ● Text documents
  ● Etc.
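As a rough illustration of translating heterogeneous sources into instances of a single ontology, the sketch below maps a database row and an XML fragment onto the same triple vocabulary. The `ex:` prefix, the `Person` class name, and the input field names are assumptions made for this example, not part of the project described in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace prefix, used only for illustration.
NS = "ex:"

def triples_from_db_row(row):
    """Translate one relational row (a dict) into (subject, predicate, object) triples."""
    subject = NS + "person/" + str(row["id"])
    triples = [(subject, "rdf:type", NS + "Person")]
    for column, value in row.items():
        # Skip the key column and missing values.
        if column != "id" and value is not None:
            triples.append((subject, NS + column, value))
    return triples

def triples_from_xml(xml_text):
    """Translate <person id="..."> elements into the same triple vocabulary."""
    triples = []
    for person in ET.fromstring(xml_text).iter("person"):
        subject = NS + "person/" + person.get("id")
        triples.append((subject, "rdf:type", NS + "Person"))
        for child in person:
            triples.append((subject, NS + child.tag, child.text))
    return triples

db_row = {"id": 1, "FirstName": "Tim", "LastName": "Robins", "MiddleName": None}
xml_doc = '<people><person id="2"><FirstName>Timothy</FirstName></person></people>'

# Both sources end up as instances of one shared schema.
instances = triples_from_db_row(db_row) + triples_from_xml(xml_doc)
```

Because both extractors emit the same predicates, later steps such as disambiguation can compare instances without knowing which source they came from.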

Page 6: An Ontological Approach to Financial Analysis and Monitoring

Data Disambiguation

Ongoing research aims to develop relationship- and attribute-based disambiguation techniques so that the ontology can be meaningfully populated.

Simple Example:

Is “Athens” the city in Georgia or the city in Greece?

Page 7: An Ontological Approach to Financial Analysis and Monitoring

Data Disambiguation

Challenges

● Merging two or more databases/ontologies/XML files with multiple references to the same logical entity

● Adding new entities to an ontology when a similar entity already exists

● Variations in database/ontology/XML schemas

● Variations in information representation

● Incomplete information

● Use of abbreviations, misspellings, various naming conventions, format changes, etc.

Page 8: An Ontological Approach to Financial Analysis and Monitoring

Data Disambiguation

Schema and conflicting instances

Person (schema)             Tim Robins      Timothy Wallace Robinson
SSN                         889889889       889889889
TelNumber                   7065434567      7062123443
FirstName                   Tim             Timothy
MiddleName                  (empty)         Wallace
LastName                    Robins          Robinson
Generation                  (empty)         (empty)
Marital Status              Single          Married
Applicant                   (empty)         (empty)
dependent of                (empty)         (empty)
spouse of                   (empty)         person12332
works for                   People Soft     Oracle
affiliated with             (empty)         (empty)
foreign influence event     event7823       event099
address                     place23         place23

● Reconciling Oracle and PeopleSoft indicates the two person entities work for the same organization

● Recognized as a time sensitive attribute

● String similarity metrics

● Nature of attribute indicates its relative importance: SSN is given a high weight in disambiguating person entities
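A minimal sketch of attribute-weighted matching for the two conflicting Person instances is given below. The weights, the alias table that reconciles PeopleSoft with Oracle, and the attribute key `works_for` are assumptions for illustration; the slides state only that SSN is given a high weight and that string similarity metrics are used.

```python
from difflib import SequenceMatcher

# Hypothetical weights: the slides say only that SSN deserves a high weight.
WEIGHTS = {"SSN": 0.6, "FirstName": 0.1, "LastName": 0.2, "works_for": 0.1}
# Hypothetical alias table used to reconcile organization names.
ORG_ALIASES = {"peoplesoft": "oracle"}

def normalize(value):
    """Lower-case and drop whitespace so 'People Soft' equals 'peoplesoft'."""
    return "".join(value.lower().split())

def canonical(attr, value):
    value = normalize(value)
    if attr == "works_for":
        return ORG_ALIASES.get(value, value)
    return value

def similarity(a, b):
    """String similarity in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()

def match_score(p, q, weights=WEIGHTS):
    """Weighted average similarity over attributes that both records fill in."""
    total = used = 0.0
    for attr, weight in weights.items():
        if p.get(attr) and q.get(attr):
            total += weight * similarity(canonical(attr, p[attr]), canonical(attr, q[attr]))
            used += weight
    return total / used if used else 0.0

tim = {"SSN": "889889889", "FirstName": "Tim", "LastName": "Robins", "works_for": "People Soft"}
timothy = {"SSN": "889889889", "FirstName": "Timothy", "LastName": "Robinson", "works_for": "Oracle"}
score = match_score(tim, timothy)
```

With these assumed weights the identical SSN dominates the score, while the name variants still contribute through string similarity. Time-sensitive attributes are deliberately left out of the weights in this sketch.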

Page 9: An Ontological Approach to Financial Analysis and Monitoring

Semantic Associations and Ranking

Semantic Associations

● Semantic associations are relationships or paths between concepts in an ontology

Ranking

● Ranking based on multiple factors

  ● Number of links, types of links, location in ontology, etc.

● Ranking indicates degree of semantic “closeness”
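To make "paths between concepts" and ranking by number and type of links concrete, here is a small sketch over a toy set of triples. The entity names, the link-type weights, and the product-of-weights ranking are assumptions chosen for illustration; location in the ontology is not modeled.

```python
from collections import defaultdict

# Toy ontology instances as (subject, relationship, object); all names are made up.
TRIPLES = [
    ("person1", "works_for", "orgA"),
    ("person2", "works_for", "orgA"),
    ("person1", "lives_in", "city1"),
    ("person2", "lives_in", "city1"),
    ("person1", "spouse_of", "person2"),
]
# Hypothetical link-type weights in (0, 1]; higher means a more meaningful link.
LINK_WEIGHT = {"spouse_of": 1.0, "works_for": 0.8, "lives_in": 0.4}

def build_graph(triples):
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
        graph[o].append((p, s))  # associations are followed in both directions
    return graph

def associations(graph, start, goal, max_links=3):
    """Enumerate simple paths from start to goal with at most max_links links."""
    found = []

    def walk(node, visited, links):
        if node == goal:
            found.append(list(links))
            return
        if len(links) == max_links:
            return
        for rel, nxt in graph[node]:
            if nxt not in visited:
                walk(nxt, visited | {nxt}, links + [rel])

    walk(start, {start}, [])
    return found

def rank(path):
    """Product of link weights: fewer links and stronger link types score higher."""
    score = 1.0
    for rel in path:
        score *= LINK_WEIGHT[rel]
    return score

graph = build_graph(TRIPLES)
ranked = sorted(associations(graph, "person1", "person2"), key=rank, reverse=True)
```

In this toy data the direct `spouse_of` link ranks first, the shared employer second, and the shared city last.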

Page 10: An Ontological Approach to Financial Analysis and Monitoring

Semantic Associations and Ranking

Characterizing document content in terms of the ontology (“semantic annotation”)

● Correlate words/phrases from document with entities/relationships in ontology

● Entity Identification

● Meta-data added to document (from associated ontological knowledge)

● Active area of research but practically useful technology now available

● Constrained to content of ontology

Page 11: An Ontological Approach to Financial Analysis and Monitoring

Semantic Associations and Ranking

Semantic Relationships between Documents and Ontology

● Semantic associations: relationships between document concepts and ontology concepts are discovered and ranked

● Ranking based on multiple factors

● no. of links, types of links, location in ontology, …

● Ranking indicates degree of semantic “closeness”

Page 12: An Ontological Approach to Financial Analysis and Monitoring

Semantic Associations and Ranking

Document Ranking

● Highly relevant

● Closely related

● Ambiguous

● Not relevant

● Undeterminable

Page 13: An Ontological Approach to Financial Analysis and Monitoring

Semantic Associations and Ranking

Research Content

● Discovery & ranking of semantic associations

● Characterizing “need to know” in terms of ontological concepts & relationships

● Meta-data annotation of data and (semi-structured & unstructured) documents

  ● Correlation of document content & concepts in ontology

Page 14: An Ontological Approach to Financial Analysis and Monitoring

Semantic Associations and Ranking

Research Challenges

In this project we are addressing:

● Discovery of Semantic Associations per entity per document

● Input/Visualization/Management of Context of Investigation

● Scalability on number of documents & ontology size

  ● Performs well with thousand documents

● Ranking of documents

Page 15: An Ontological Approach to Financial Analysis and Monitoring

Semantic Associations and Ranking

Ranking of Documents Relevance

“Closely related entities are more relevant than distant entities”

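\[ E = \{\, e \mid e \in \text{Document} \,\} \]

\[ E_k = \{\, f \mid \operatorname{distance}(f, e \in E) = k \,\} \]

\[ \text{Document\_Relevance} = \sum_{k=0}^{n} \bigl( \text{entities\_Relevance}(E_k) + \text{relations\_Relevance}(E_k) + \text{classes\_Relevance}(E_k) \bigr) \]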
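The sets E and E_k defined on this slide can be computed with a breadth-first search, as sketched below. This is an illustration under stated assumptions, not the project's relevance function: the toy graph is invented, the three relevance terms are collapsed into a single count of context entities per layer, and the 1 / (k + 1) discount is an assumed way to make closer entities count more.

```python
from collections import deque

# Toy undirected ontology graph; entity names are illustrative only.
GRAPH = {
    "acme": ["alice", "bob"],
    "alice": ["acme", "carol"],
    "bob": ["acme"],
    "carol": ["alice", "dave"],
    "dave": ["carol"],
}

def distance_layers(graph, doc_entities, n):
    """Return [E_0, ..., E_n]: E_k holds entities at distance k from the document's entities."""
    dist = {e: 0 for e in doc_entities}
    queue = deque(doc_entities)
    while queue:
        node = queue.popleft()
        if dist[node] == n:
            continue  # do not expand beyond distance n
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return [{e for e, d in dist.items() if d == k} for k in range(n + 1)]

def document_relevance(graph, doc_entities, context, n=2):
    """Sum context hits per layer, discounted by 1 / (k + 1) (an assumed weighting)."""
    layers = distance_layers(graph, doc_entities, n)
    return sum(len(layer & context) / (k + 1) for k, layer in enumerate(layers))

layers = distance_layers(GRAPH, {"alice"}, 2)
relevance = document_relevance(GRAPH, {"alice"}, {"alice", "acme", "dave"})
```

Here a document mentioning only `alice` still picks up partial credit for `acme` (distance 1) and `dave` (distance 2), each weighted less than a direct mention.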

Page 16: An Ontological Approach to Financial Analysis and Monitoring

Semantic Associations and Ranking

Relevance Measures for Documents

● Relevance engine input

  ● the set of semantically annotated documents
  ● the context of investigation for the assignment
  ● the ontology schema represented in RDFS, and the ontology instances represented in RDF

● Relevance measure function used to verify whether the entity annotations in the annotated document fit the entity classes, entity instances, and/or keywords specified in the context of investigation.

Page 17: An Ontological Approach to Financial Analysis and Monitoring

Semantic Associations and Ranking

Ranking of Documents Relevance

Four groups of document-ranking:

● Not Related Documents

  ● unable to determine relation to context

● Ambiguously Related Documents

  ● some relationship exists to the context

● Somehow Related Documents

  ● Entities are closely related to the context

● Highly Related Documents

  ● Entities are a direct match to the context

Page 18: An Ontological Approach to Financial Analysis and Monitoring

Semantic Associations and Ranking

Ranking of Documents Relevance continued

Cut-off values determine grouping of documents w.r.t. relevance

● These are customizable cut-off values (more control and more meaningful parameters compared to, say, automatic classification or statistical approaches)
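A sketch of how customizable cut-off values could map a relevance score to one of the four groups follows. The numeric cut-offs and the 0 to 1 score range are assumptions; the slides give no concrete values.

```python
# Hypothetical cut-off values on a 0..1 relevance score, highest first.
DEFAULT_CUTOFFS = [
    (0.75, "Highly Related"),
    (0.40, "Somehow Related"),
    (0.10, "Ambiguously Related"),
]

def group_for(score, cutoffs=DEFAULT_CUTOFFS):
    """Return the first group whose cut-off the score reaches, else 'Not Related'."""
    for minimum, label in cutoffs:
        if score >= minimum:
            return label
    return "Not Related"

def group_documents(scores, cutoffs=DEFAULT_CUTOFFS):
    """Bucket document ids by group for a dict of {doc_id: score}."""
    groups = {}
    for doc_id, score in scores.items():
        groups.setdefault(group_for(score, cutoffs), []).append(doc_id)
    return groups

scores = {"doc1": 0.92, "doc2": 0.55, "doc3": 0.12, "doc4": 0.01}
groups = group_documents(scores)

# An analyst can tighten the cut-offs without touching the scoring itself.
stricter = group_documents(
    scores,
    [(0.95, "Highly Related"), (0.50, "Somehow Related"), (0.20, "Ambiguously Related")],
)
```

Keeping the cut-offs as plain data is what gives the analyst direct control over the grouping.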

“Inspection” of a document is possible via (a) original document or (b) original document with highlighted entities

Page 19: An Ontological Approach to Financial Analysis and Monitoring

Semantic Associations and Ranking

Ontology-driven Thematic Association Lifecycle

Building a scalable and high-performance capability with support for:

• Task domain ontology creation and maintenance

• Ontology “Knowledge” based on trusted sources supporting Document Classification

• Ontology-driven Semantic Metadata Extraction/Annotation

• Utilizing semantic metadata and ontology to associate document theme(s) with analytical task

• Weighting process used to measure degree of relevance

[Figure: lifecycle diagram with the stages Task Domain Schema Creation, Ontology Population, Metadata Extraction and Annotation Enhancement, Thematic Association/Relationships Discovery, and Semantic Relationship Rank Analysis, connected through an Ontology API]

Page 20: An Ontological Approach to Financial Analysis and Monitoring

MathML-S

● An interface has been developed that allows the user to specify the set of things that need to be verified for any given individual.

● This kind of “ultimate flexibility” is possible due to the ontological approach used.

● An application of research in modeling rules for identifying financial irregularities using MathML (MathML-S)

● Traversals, formulas, rules and profiles represent the data, calculations and checks that need to be performed

● These are created using the graphical interface developed, and stored in the Component Library

Page 21: An Ontological Approach to Financial Analysis and Monitoring

MathML-S

● A traversal is a path through the ontology ending with a data-type value.

● A formula is a computation of a value using concepts found in the ontology. It may be constrained using data found in the ontology instance data.

● A rule is a verification (with boolean value) performed on:

  ● A computed value that comes from a formula
  ● The existence of certain types of relationships
  ● Other types of rules may be added based on feedback.

● A profile is a collection of rules, where rules may be given different weights. A profile value can be computed for a person represented in the ontology.
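The four component types defined above could be modeled roughly as follows. This is a sketch, not the MathML-S representation itself: the toy instance data, the relationship names, the threshold, and the weighted-fraction profile value are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy instance data: entity -> relationship or attribute -> list of values.
ONTOLOGY = {
    "person1": {"owns": ["acct1", "acct2"]},
    "acct1": {"value": [500.0]},
    "acct2": {"value": [700.0]},
}

def traverse(start: str, path: List[str]) -> list:
    """Traversal: follow a sequence of relationships ending in data-type values."""
    frontier = [start]
    for step in path:
        frontier = [v for node in frontier for v in ONTOLOGY.get(node, {}).get(step, [])]
    return frontier

def total_account_value(person: str) -> float:
    """Formula: a value computed from traversal results."""
    return sum(traverse(person, ["owns", "value"]))

@dataclass
class Rule:
    name: str
    check: Callable[[str], bool]  # boolean verification for one person
    weight: float = 1.0

@dataclass
class Profile:
    rules: List[Rule]

    def value(self, person: str) -> float:
        """Weighted fraction of rules the person satisfies (an assumed scoring)."""
        total = sum(r.weight for r in self.rules)
        passed = sum(r.weight for r in self.rules if r.check(person))
        return passed / total if total else 0.0

profile = Profile([
    # Rule on a computed value that comes from a formula.
    Rule("holds_over_1000", lambda p: total_account_value(p) > 1000.0, weight=2.0),
    # Rule on the existence of a certain type of relationship.
    Rule("has_employer", lambda p: bool(traverse(p, ["works_for"])), weight=1.0),
])
value = profile.value("person1")
```

Keeping the formula as a standalone function lets several rules share it, and the profile only aggregates rule outcomes.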

Page 22: An Ontological Approach to Financial Analysis and Monitoring

MathML-S

Rule Example

Solvency Ratio Check

Traversals

● Asset_T = value of Asset(n)

● Liability_T = value of Liability(n)

Formulas

● Total_Assets = Asset_T1 + Asset_T2 + … + Asset_Tn

● Total_Liabilities = Liability_T1 + Liability_T2 + … + Liability_Tn

● Solvency_Ratio = Total_Assets / Total_Liabilities

Rule

● Solvency_Ratio_Check = Solvency_Ratio > 1.1
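The solvency ratio check can be worked through with concrete numbers, as in the sketch below. The asset and liability values are invented, and the handling of zero liabilities is an assumption, since the slide does not cover that case.

```python
def solvency_ratio_check(assets, liabilities, threshold=1.1):
    """Return (Solvency_Ratio, Solvency_Ratio_Check) for lists of traversal values."""
    total_assets = sum(assets)            # Total_Assets
    total_liabilities = sum(liabilities)  # Total_Liabilities
    if total_liabilities == 0:
        # Assumption: with no liabilities the check is treated as passing.
        return float("inf"), True
    ratio = total_assets / total_liabilities
    return ratio, ratio > threshold

# Invented example values for one individual.
ratio, passed = solvency_ratio_check([150000.0, 20000.0], [120000.0, 30000.0])
ratio_even, passed_even = solvency_ratio_check([100000.0], [100000.0])
```

In the first case 170000 / 150000 is about 1.13, which clears the 1.1 threshold. In the second the ratio is exactly 1.0 and the check fails.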

Page 23: An Ontological Approach to Financial Analysis and Monitoring

MathML-S

● Data integrated in the ontology is queried to verify compliance with respect to a set of customizable rules

● These rules include math calculations, verification of conditions, calculation and verification of ratios, etc.

● The ontological approach allows such rules to be defined using the concepts and relationship types of the ontology, so a rule defined on a concept also applies to its sub-concepts.

Page 24: An Ontological Approach to Financial Analysis and Monitoring

MathML-S

● Semantic matching and the graph-like nature of the ontology provide flexibility in defining the rules, yet a few research issues remain to be addressed

● Operands of a formula are data items in the ontology. Some data values are retrieved by traversing specific sequences of relationships

● Formulas are self-contained so that they can be reused in various rules (thus providing flexibility in maintenance)

● A profile is a set of rules; verifying it involves querying the ontology while computing formula and rule values

Page 25: An Ontological Approach to Financial Analysis and Monitoring

MathML-S