56
Ontologies, Taxonomies and Search Denise Bedford Denise Bedford Special Libraries Association Special Libraries Association Baltimore, Maryland Baltimore, Maryland June, 2006 June, 2006

Enterprise Architecture Topics

  • Upload
    aamir97

  • View
    868

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Enterprise Architecture Topics

Ontologies, Taxonomies and Search

Denise BedfordDenise Bedford

Special Libraries AssociationSpecial Libraries AssociationBaltimore, MarylandBaltimore, Maryland

June, 2006June, 2006

Page 2: Enterprise Architecture Topics

Presentation OverviewPresentation Overview

I will talk today from the ‘end game’ perspective I will talk today from the ‘end game’ perspective of search – placing taxonomies and ontologies in of search – placing taxonomies and ontologies in that contextthat context

– Search – User NeedsSearch – User Needs– Search Functional Architecture Search Functional Architecture – Metadata & CategorizationMetadata & Categorization– Semantic TechnologiesSemantic Technologies

Page 3: Enterprise Architecture Topics

Search – User NeedsSearch – User Needs

Page 4: Enterprise Architecture Topics

Basic Assumptions About Search Basic Assumptions About Search SystemsSystems

Search systems are not WYSIWYG – what you see is Search systems are not WYSIWYG – what you see is not what you getnot what you get

Search systems are more like ice-bergs - most of Search systems are more like ice-bergs - most of the search system is below the surface – you can’t the search system is below the surface – you can’t see it, you don’t know how it has been configured see it, you don’t know how it has been configured or what components it hasor what components it has

Helpful to have a framework against which you can Helpful to have a framework against which you can judge the suitability of any search tool to your judge the suitability of any search tool to your environmentenvironment

Assess the search environment before you decide Assess the search environment before you decide what kind of a search system you needwhat kind of a search system you need

Page 5: Enterprise Architecture Topics

The EnvironmentThe Environment Like other libraries we acquire, create, license, access Like other libraries we acquire, create, license, access

information in 30 subject domains – ranging from Law & information in 30 subject domains – ranging from Law & Justice, to Transport, to Environment, to Agriculture, to Justice, to Transport, to Environment, to Agriculture, to Education, Health & Nutrition, etc. Education, Health & Nutrition, etc.

We have 500 different kinds of content/document types We have 500 different kinds of content/document types

We have a set of business processes which represent the We have a set of business processes which represent the way we do our workway we do our work

We have six working languages – but actually working in We have six working languages – but actually working in many moremany more

We have a rich history – content dating back to the 1940sWe have a rich history – content dating back to the 1940s

Our priorities change over time Our priorities change over time

Everyone in the Bank is a recognized international expert Everyone in the Bank is a recognized international expert

Page 6: Enterprise Architecture Topics

Underlying Search ArchitectureUnderlying Search Architecture

U

ser

Inte

rfa

ceS

ea

rch

En

gin

es

I

nfo

rma

tion

UI - Image BankUI - Intranet UI - External WebUI - IRIS

Oracle IntermediaOracle IntermediaGoogle Google

IRISImage BankIntranet External Web

UI-JOLISUI-

SAP&PeopleSoft UI-External

Sources

JOLIS

BRS-search SAP&PeopleSoft

SAP&PeopleSoft

Individual search engines

External Sources

Silos of Information = Multiple search engines and user interfaces

Page 7: Enterprise Architecture Topics

Search EnvironmentSearch Environment It is very challenging to design a search system which It is very challenging to design a search system which

meets the needs of this complex environment meets the needs of this complex environment

Internal Experts do ‘known item’ searching and look for Internal Experts do ‘known item’ searching and look for other experts to talk toother experts to talk to

General public looks for information ‘about’ somethingGeneral public looks for information ‘about’ something

Most people don’t know all the dimensions of the BankMost people don’t know all the dimensions of the Bank

People need to search in other than English and find People need to search in other than English and find content written in other languagescontent written in other languages

Our new Knowledge and Learning initiative is moving us Our new Knowledge and Learning initiative is moving us towards a W3 working environment - What information I towards a W3 working environment - What information I need when I need it, where I need it need when I need it, where I need it

We need a search architecture that supports the user’s We need a search architecture that supports the user’s needsneeds

Page 8: Enterprise Architecture Topics

Information Architecture: Moving Towards Information Architecture: Moving Towards ContextualizationContextualization

Anonymous Access(Context

Free)

Group Access (Context

Sensitive – Group)

Individualized Access (Context

Sensitive – Individual)

Customized Access to Information

Co

nte

nt

– In

form

atio

n A

sset

s

Str

uctu

red

Less

str

uctu

red

Intranet

External Web

Operations Portal

myPortal*

myPortal*

Client Connections

PeopleBasic Information “Customer” Information

3W Information Model

What I Want

Where I Want

When I Want

Page 9: Enterprise Architecture Topics

Search Use Case AnalysisSearch Use Case Analysis

Why Search Wasn’t WorkingWhy Search Wasn’t Working

Page 10: Enterprise Architecture Topics

Search Has Always Been Search Has Always Been ProblemmaticProblemmatic

Despite any reports you may read in the general Despite any reports you may read in the general literature, buying a fast crawler will not solve your literature, buying a fast crawler will not solve your search problemssearch problems

Implementing a fast crawler simply surfaces Implementing a fast crawler simply surfaces information management and data quality information management and data quality challenges directly to the userschallenges directly to the users

The crawler approach provides high hit rates, and The crawler approach provides high hit rates, and very low relevancy ratesvery low relevancy rates

It was a challenge to understand the search It was a challenge to understand the search problem from the system-side problem from the system-side

Try to understand the search problem from the Try to understand the search problem from the user’s sideuser’s side

Page 11: Enterprise Architecture Topics

Consultant is looking for the CAS for the Congo. Consultant is looking for the CAS for the Congo.

Steps 1 - 2

DEC Consultant

F

Intranet Search

ImageBank Search

Steps 4-6

F

Page 12: Enterprise Architecture Topics

Adviser wants to find social indicators, relevant quotes and Bank’s position on racism, as Adviser wants to find social indicators, relevant quotes and Bank’s position on racism, as well as learn what the Bank has done in this area in order to prepare a 30 min. speech for well as learn what the Bank has done in this area in order to prepare a 30 min. speech for

a WB Managing Director for conference.a WB Managing Director for conference.

Steps 1-4

HD Adviser

F

WB Intranet Search

Google Internet

Steps 7-12PS

External NGOColleague

Bank Colleagues LAC Group

Steps 15-16

SSteps 5-6 S

PS

Page 13: Enterprise Architecture Topics

What We LearnedWhat We Learned– The absolute success rate per search task was low at The absolute success rate per search task was low at

43.18%43.18%

– The Search experience generally consists of multiple steps The Search experience generally consists of multiple steps and multiple searches within and across sourcesand multiple searches within and across sources

– The source with the fewest number of steps was a The source with the fewest number of steps was a Colleague or Personal Contact. The source with the Colleague or Personal Contact. The source with the greatest number of steps was External Web Search Browsegreatest number of steps was External Web Search Browse

– Each source has its own behavior, business rules, functional Each source has its own behavior, business rules, functional architectures – users need to learn each systemarchitectures – users need to learn each system

– The Intranet search was selected as a logical place to look The Intranet search was selected as a logical place to look for information in over 65% of the use cases. However, the for information in over 65% of the use cases. However, the success rate for the Intranet Google search was only 35%success rate for the Intranet Google search was only 35%

Page 14: Enterprise Architecture Topics

What We LearnedWhat We Learned

You need to understand search from the You need to understand search from the searcher’s point of view searcher’s point of view

Until you can see it this way, you are simply Until you can see it this way, you are simply guessing at what problems need to be solved and guessing at what problems need to be solved and how to solve themhow to solve them

This is a challenge for an organization whose This is a challenge for an organization whose users are the public, but it is possible to design users are the public, but it is possible to design an architecture that suits the general public and an architecture that suits the general public and the internal staff, and the important stakeholdersthe internal staff, and the important stakeholders

The critical success factor is to design an open The critical success factor is to design an open architecture for agility and flexibilityarchitecture for agility and flexibility

Page 15: Enterprise Architecture Topics

Enterprise SearchEnterprise Search

Evolving Search ArchitectureEvolving Search Architecture

Page 16: Enterprise Architecture Topics

Contextualization, Personalization, Recommender, Content Syndication, Q&A Systems, Intelligent Search

What the user sees

Search Engine PartsSearch Engine Parts

Search System Inputs

Metadata Repository

for Enterprise

Search

Index Architectur

e & Constructi

on

Vocabulary Support

(thesaurus)

Classification

Schemes (taxonomy

)

Query Transformation

& MatchingBoolean matching

Exact phrase matching

Fuzzy matching

Term matching + synonym expansion

Term matching + cross language

expansionTerm weighted

matchingQuery term root

matching & dictionary expansion

Wild card matchingTerm proximity

matchingNeural network

expansionGenetic algorithm

expansion

Search OutputsQuery Manipulation

Simple search

Fielded Search

Query Processing Algorithms

Search term assistance (thesaurus, dictionaries)

Search language selection

Sources selection

Display ResultsIntegrated search results

Search results sorting

Search results contextualization

Search results relevancy ranking

The basic search programs

Foundation or Baseline of The Search System

Query Processing -Transforming the

query to add tags,

-Synonyms, etc.

Page 17: Enterprise Architecture Topics

Enterprise Search Proposed Enterprise Search Proposed SolutionSolution

What Enterprise Search is and what it is notWhat Enterprise Search is and what it is not

Distinguish between the web services platform for accessing Distinguish between the web services platform for accessing Enterprise Search and the enterprise’s full store of content Enterprise Search and the enterprise’s full store of content (its not only electronic, and its not all in html format!!)(its not only electronic, and its not all in html format!!)

What kind of an architecture do you need to support What kind of an architecture do you need to support enterprise search?enterprise search?

What kind of tools do you need to support enterprise What kind of tools do you need to support enterprise search? Its much more than just a web crawler….search? Its much more than just a web crawler….

What do you need to support true concept level cross-What do you need to support true concept level cross-language searching? language searching?

How does Enterprise Search respect security classifications How does Enterprise Search respect security classifications with/without restricting knowledge of what exists? with/without restricting knowledge of what exists?

Page 18: Enterprise Architecture Topics

ConsolidatedMetadata

Store

Enhanced Common

DataStores*

MDRTools & Utilities

Metadata Extracts

Metadata Extracts

Metadata Loads

Metadata Maintenance Utilities

Security Policy

Change Mgmt.Processes

Utilities

Interface Templates

ParametricIndexes

IndexUtilities

Teragram Metadata Capture

SearchInterface

-Simple andFielded Search

ResultsDisplay &

Manipulation

QueryManipulation

Options

QueryProcessingAlgorithms

MetaModelRepository

RelationalMetaModel

BusinessMetaModel

Logical MetaModel

ApplicationMetaModel

Metadata RepositoryMetaModel

Including• transformation rules• reporting specs• loader programs• data standards• data rationalization

ContentAggregator

RecommenderEngine

ContentSyndication

PersonalizationProfiles

SocialOr TaskFiltering

ThresholdFiltering

Search tools

Enterprise Search Functional Architecture

JOLIS

MD

IRAMS

MD

Global JOLIS

MD

Image Bank

MD

LMS

MD

CMS

MD

IRIS

MD

Union Index

*includes thesaurus support and taxonomies

Vocabulary Support

ClassificationSchemes

CrossLanguageSearching

Page 19: Enterprise Architecture Topics

Taxonomies and Ontologies Taxonomies and Ontologies in in

SearchSearch

Page 20: Enterprise Architecture Topics

Using Taxonomies in SearchUsing Taxonomies in Search

Controlled vocabularies – guiding users through the search Controlled vocabularies – guiding users through the search process process (flat taxonomies)(flat taxonomies)

Classification schemes – browsing, navigation, syndication, Classification schemes – browsing, navigation, syndication, contextualization contextualization (hierarchical taxonomies)(hierarchical taxonomies)

Metadata – supporting fielded search Metadata – supporting fielded search (faceted taxonomies)(faceted taxonomies)

Thesauri – supporting knowledge discovery, preventing Thesauri – supporting knowledge discovery, preventing Zero results, supporting cross-language searching Zero results, supporting cross-language searching (network (network taxonomies)taxonomies)

Synonym Rings – improving relevancy, reducing Synonym Rings – improving relevancy, reducing information scattering, managing recall information scattering, managing recall (ring taxonomies)(ring taxonomies)

Page 21: Enterprise Architecture Topics

Flat taxonomy

Hierarchical taxonomy

Ring taxonomy

Ring taxonomy

Fielded Search = Faceted Taxonomy

Vision of An Enterprise Advanced Search

Page 22: Enterprise Architecture Topics

Ring Taxonomy

NetworkTaxonomy

Metadata

Page 23: Enterprise Architecture Topics

Data Driving the Enterprise Data Driving the Enterprise ArchitectureArchitecture

30 Second Description of 30 Second Description of Taxonomies and OntologiesTaxonomies and Ontologies

Page 24: Enterprise Architecture Topics

Taxonomies & Ontologies Taxonomies & Ontologies

I just mentioned several kinds of taxonomies that searchI just mentioned several kinds of taxonomies that search

– Faceted taxonomies (metadata schemes)Faceted taxonomies (metadata schemes)– Flat taxonomies (controlled vocabularies, picklists)Flat taxonomies (controlled vocabularies, picklists)– Hierarchical taxonomies (classification schemes)Hierarchical taxonomies (classification schemes)– Network taxonomies (thesauri, semantic networks) Network taxonomies (thesauri, semantic networks) – Ring taxonomies – to handle synonym rings and Ring taxonomies – to handle synonym rings and

synonym searhingsynonym searhing

Let’s walk through these kinds of taxonomiesLet’s walk through these kinds of taxonomies

Then let’s see how they all make up an ontology Then let’s see how they all make up an ontology

Page 25: Enterprise Architecture Topics

Facet TaxonomiesFacet Taxonomies

Faceted taxonomy representedas a star data structure. Eachnode in the start structure isliked to the center focus. Any node can be linked to other nodes in other stars. Appears simple, but becomes complex quickly.

Page 26: Enterprise Architecture Topics

Core Metadata StrategyCore Metadata Strategy

What is a core metadata strategy?What is a core metadata strategy?

What’s the process you use to discover your organization’s What’s the process you use to discover your organization’s core metadata strategy?core metadata strategy?

Libraries have many core metadata standards – COSATI, Libraries have many core metadata standards – COSATI, Dublin Core, MARC, MODS, COSATI, AIIM TR48Dublin Core, MARC, MODS, COSATI, AIIM TR48

The important question for metadata in search is how are The important question for metadata in search is how are you using your metadata to support search? you using your metadata to support search?

Too often we put a ‘dumb’ search engine on top of ‘smart Too often we put a ‘dumb’ search engine on top of ‘smart metadata’ and do nothing more with it than ‘publish’ it metadata’ and do nothing more with it than ‘publish’ it

It’s time to think smarter about how we use our metadataIt’s time to think smarter about how we use our metadata

Page 27: Enterprise Architecture Topics

Purpose of Core MetadataPurpose of Core Metadata

Agent Country Authorized By

Record Identifier

Title Region Rights Management

Disposal Status

Date Abstract/ Summary

Access Rights

Disposal Review Date

Format Keywords Location Management History

Publisher Subject-Sector- Theme-Topic

Use History Retention Schedule/ Mandate

Language Business Function

Preservation History

Version Disclosure Status Aggregation Level

Series & Series #

Disclosure Review Date

Relation

Content Type

Identification/ Distinction

Search & Browse

Use ManagementCompliant Document Management

Page 28: Enterprise Architecture Topics

Capturing Core MetadataCapturing Core Metadata

Agent Country Authorized By

Record Identifier

Title Region Rights Management

Disposal Status

Date Abstract/ Summary

Access Rights

Disposal Review Date

Format Keywords Location Management History

Publisher Subject-Sector- Theme-Topic

Use History Retention Schedule/Mandate

Language Business Function

Disclosure Review Date Preservation History

Version Disclosure Status Aggregation Level

Series & Series #

Relation

Content Type

Identification/ Distinction

Use Management Compliant Document Management

Human CreationProgrammatic Capture

Inherit from System Context

Extrapolate from Business Rules

Search & Browse

Page 29: Enterprise Architecture Topics

Flat Taxonomy StructureFlat Taxonomy Structure

Energy Environment Education Economics Transport Trade Labor Agriculture

Page 30: Enterprise Architecture Topics

Hierarchical TaxonomyHierarchical Taxonomy

A hierarchical taxonomy is represented as a tree data structure in a database application. The tree data structure consists of nodes and links. In an RDBMSenvironment, the relationships become associations. In a hierarchical taxonomy, a nodecan have only one parent.

Page 31: Enterprise Architecture Topics

Hierarchical Taxonomies – Hierarchical Taxonomies – Classification SchemesClassification Schemes

Ranganathan is the supreme authority on how to create a Ranganathan is the supreme authority on how to create a well-design classification schemewell-design classification scheme

Most of what we teach in graduate school, though, is how to Most of what we teach in graduate school, though, is how to use a classification scheme not how to create oneuse a classification scheme not how to create one

Classification schemes are the controlling reference source Classification schemes are the controlling reference source for metadata attributesfor metadata attributes

For example, the Enterprise Topic Classification Scheme is For example, the Enterprise Topic Classification Scheme is the reference source for the attribute Topicthe reference source for the attribute Topic

We use tools to help us discover what the scheme should We use tools to help us discover what the scheme should be and also to help us classify content be and also to help us classify content

Page 32: Enterprise Architecture Topics

Network TaxonomiesNetwork Taxonomies

A network taxonomy is a plex data structure. Each node can have more than one parent. Any item in a plex structure can be linked to any other item. In plex structures, links can be meaningful & different.

Page 33: Enterprise Architecture Topics

Poverty mitigation

Poverty alleviation

Poverty elimination

Poverty reduction

Poverty eradication

Poverty abatement

Poverty prevention

Poverty reducation

Ring Taxonomy

Rings can include all kinds of synonyms - true, misspellings, predecessors, abbreviations

Page 34: Enterprise Architecture Topics

Content Entity1

Content Elements

Content

Metadata

Topic Class Scheme

Business ProcessScheme

Thesaurus

Country Names

Region Names

Skill Sets/Competencies

Standard Statistical Variables

Has values

usesHas

Contains

UserHas relationship to

Has Meaning in

Context

ContextualMatrix &Sensiing

Understood

uses

Not in progress In progress In Use

Profile

Has

Business Rule

Has

Ontology Design from 50,000 Feet

Has values

Content Parts

Has

Metadata

Has

Page 35: Enterprise Architecture Topics

Building & Maintaining Building & Maintaining TaxonomiesTaxonomies

Moving Towards Automted Moving Towards Automted Metadata CaptureMetadata Capture

Page 36: Enterprise Architecture Topics

Relationships across data

classes

3 OracleData

classes

Topic data class

SubtopicData Class

Page 37: Enterprise Architecture Topics

Building and Maintaining Building and Maintaining TaxonomiesTaxonomies

Moving towards automated metadata generation means Moving towards automated metadata generation means that catalogers shift their effort to reviewing the metadata that catalogers shift their effort to reviewing the metadata generated and to maintaining LCSH and LC classification as generated and to maintaining LCSH and LC classification as part of a suite of categorization toolspart of a suite of categorization tools

Level of effort shifts to training and developing the tools Level of effort shifts to training and developing the tools and away from original cataloging and metadata capture and away from original cataloging and metadata capture

Continue to work closely with subject experts to define the Continue to work closely with subject experts to define the controlled vocabularies and classification schemescontrolled vocabularies and classification schemes

Page 38: Enterprise Architecture Topics

The problem with metadataThe problem with metadata Metadata….Metadata….

– Is expensive and time consuming to createIs expensive and time consuming to create– Is sometimes subjective and not granular enoughIs sometimes subjective and not granular enough– Doesn’t always address the ways that users and systems think Doesn’t always address the ways that users and systems think

about the information it describesabout the information it describes– May not tell us enough about the information to trust it May not tell us enough about the information to trust it – May address only one context – the context for which it is May address only one context – the context for which it is

createdcreated– May live in the source application where it was createdMay live in the source application where it was created– May not be as accessible as the information objectMay not be as accessible as the information object

The future depends on metadata so we have to resolve these The future depends on metadata so we have to resolve these problems problems

We are resolving the problem using automated metadata creation We are resolving the problem using automated metadata creation and extraction technologiesand extraction technologies

Metadata extraction uses rules, classifiers and grammar enginesMetadata extraction uses rules, classifiers and grammar engines

Metadata creation uses categorization and rules enginesMetadata creation uses categorization and rules engines

Page 39: Enterprise Architecture Topics

Smart Use of TechnologiesSmart Use of Technologies

Sample structure – Bank Topics Classification Scheme Sample structure – Bank Topics Classification Scheme (hierarchical taxonomy)(hierarchical taxonomy)

– Oracle data classes used to represent Topic Classification Oracle data classes used to represent Topic Classification scheme scheme hierarchical taxonomy as reference source for the hierarchical taxonomy as reference source for the

attribute – Topicattribute – Topic used for Browse, Search, Content Syndication, used for Browse, Search, Content Syndication,

PersonalizationPersonalization

– 11stst challenge is to architect the hierarchy correctly challenge is to architect the hierarchy correctly 3 distinct data classes, not a tree structure with 3 distinct data classes, not a tree structure with

inheritanceinheritance Allows you to use the three data classes for distinct Allows you to use the three data classes for distinct

functions across systems but still enforce relationships functions across systems but still enforce relationships across the classesacross the classes

Page 40: Enterprise Architecture Topics

What is Teragram?What is Teragram?

Semantic analysis tools which support concept extraction, Semantic analysis tools which support concept extraction, categorization, summarization and pattern matching rules categorization, summarization and pattern matching rules enginesengines

Teragram works in 23 languagesTeragram works in 23 languages

Use categorization to capture Topics, Business Activities, Use categorization to capture Topics, Business Activities, Regions, Sectors, Themes, etc.Regions, Sectors, Themes, etc.

Use Concept Extraction to capture keywordsUse Concept Extraction to capture keywords

Use Rules Engine to capture Loan #, Credit #, Project ID, Trust Use Rules Engine to capture Loan #, Credit #, Project ID, Trust Fund #, etc.Fund #, etc.

Use Summarization to generate a ‘gist’ of the contentUse Summarization to generate a ‘gist’ of the content

Page 41: Enterprise Architecture Topics

How does semantic analysis work?How does semantic analysis work?

Page 42: Enterprise Architecture Topics

Automatically Generated MetadataAutomatically Generated Metadata

Metadata is generated using a categorization profile – reflects the way people organize.

Page 43: Enterprise Architecture Topics

Automatically Automatically ExtractedExtracted Metadata Metadata

<?xml version="1.0" encoding="UTF-8"?>

<Proper_Noun_Concept>

<Source><Source_Type>file</Source_Type>

<Source_Name>W:/Concept Extraction/Media Monitoring Negative Training Set/ 001B950F2EE8D0B4452570B4003FF816.txt</Source_Name>

</Source><Profile_Name>PEOPLE_ORG</Profile_Name>

<keywords>Abdul Salam Syed, Aruna Roy, Arundhati Roy, Arvind Kesarival, Bharat Dogra, Kwazulu Natal, Madhu Bhaduri, </keywords><keyword_count>7</keyword_count>

</Proper_Noun_Concept>

Proper Noun Profile for People Names uses grammars to find and extract the names of people referenced in the document.

Page 44: Enterprise Architecture Topics

Semantic Analysis BasicsSemantic Analysis Basics

Once you have made some sense of the sentence, Once you have made some sense of the sentence, reconstruct entities for information extraction (compose)reconstruct entities for information extraction (compose)

– Identify names and other fixed form expressions – Identify names and other fixed form expressions – people, organizations, conferencespeople, organizations, conferences

– Identify basic noun groups, verb groups, Identify basic noun groups, verb groups, presentations, other grammatical elementspresentations, other grammatical elements

– Construct complex noun groups and verb groupsConstruct complex noun groups and verb groups

– Identify event structuresIdentify event structures

– Identify common elements and associate Identify common elements and associate

Page 45: Enterprise Architecture Topics

Categorizing ContentCategorizing Content

Let’s look how we’re categorizing our content to this Let’s look how we’re categorizing our content to this structure automaticallystructure automatically

Topic classification, geographical region assignment, Topic classification, geographical region assignment, keywording exampleskeywording examples

Can apply this approach to any kind of content Can apply this approach to any kind of content

Enables us to build a robust metadata repository model, Enables us to build a robust metadata repository model, with strong metadata quality, to move towards SI at the with strong metadata quality, to move towards SI at the functional levelfunctional level

Also note that we can do this across many languagesAlso note that we can do this across many languages

Page 46: Enterprise Architecture Topics

Leveraging the StructureLeveraging the Structure Each subtopic is a knowledge domain Each subtopic is a knowledge domain (hierarchical (hierarchical

taxonomy) and etaxonomy) and each subtopic has an extensive concept ach subtopic has an extensive concept level definition (1,000 – 5,000+ concepts)level definition (1,000 – 5,000+ concepts)

Concepts are controlled vocabularies in their raw form Concepts are controlled vocabularies in their raw form (flat (flat taxonomy)taxonomy)

Concepts with relationships (extensive per new Z39.19 Concepts with relationships (extensive per new Z39.19 standard) comprise semantic network standard) comprise semantic network (network taxonomy)(network taxonomy)

Categorization tools work with topic structure & concept Categorization tools work with topic structure & concept definitions to categorize and index content definitions to categorize and index content

The following screen illustrates how that same structure is The following screen illustrates how that same structure is embedded into Teragram profile to support categorizationembedded into Teragram profile to support categorization

Page 47: Enterprise Architecture Topics

Subtopics

Domain concepts or controlled vocabulary

Page 48: Enterprise Architecture Topics

Extensive operators allow us to write

grammatical rules to manage typical semantic

problems

Page 49: Enterprise Architecture Topics

Concept based rules engine allows us to define patterns to

capture other kinds of data

Page 50: Enterprise Architecture Topics

Example of use of Authority Control to capture country

names but extract ‘authorized’ version of

country name

Example of use of a gazetteer + concept

extraction + rules engine to support semantic

interoperability

Page 51: Enterprise Architecture Topics

Use of concept extraction + rules engine to capture Loan #, Credit #,

Project ID#

Page 52: Enterprise Architecture Topics

Enterprise Profile

Development & Maintenance

Enterprise Metadata Profile

Concept Extraction TechnologyCountryOrganization NamePeople NameSeries Name/Collection TitleAuthor/CreatorTitlePublisher Standard Statistical VariableVersion/Edition

Categorization TechnologyTopic CategorizationBusiness Function CategorizationRegion CategorizationSector CategorizationTheme Categorization

Rule-Based CaptureProject IDTrust Fund #Loan #Credit #Series #Publication DateLanguage

Summarization

e-CDS Reference Sources forCountry, Region, Topics

Business Function, Keywords,Project ID, People, Organization

Data GovernanceProcess for

Topics, Business Function,Country, Region, Keywords,

People, Organizations, Project ID

Teragram Team

TK240 Client

System 1 System 2 System 3

System 4

System 5

Enterprise Profile Creation and Maintenance

UCM ServiceRequests

Update & Change Requests

Page 53: Enterprise Architecture Topics

ImageBank Integration

Content Capture

ISP Integration

Enterprise Profile

Development &

Maintenance

XML Wrapped Metadata

Dedicated Server – Teragram Semantic

Engine – Concept Extraction, Categorization, Clustering, Rule Based Engine, Language Detection

APIs & Integration

APIs & Integration

Content Capture

XML Wrapped Metadata

Factiva Metadata Database

IRIS Integration

APIs & Integration

EnterpriseMetadata Capture Strategy

TK240 Client

XML Output

Reference Sources

APIs & Technical Integration

Content OwnersContent Owners

Business Analyst

Indexers Librarians

FunctionalTeam

Enterprise Metadata Capture – Functional Reference Model

Page 54: Enterprise Architecture Topics

Caution Regarding ToolsCaution Regarding Tools

Not all tools will do what we describing hereNot all tools will do what we describing here

You need to have an underlying semantic engine You need to have an underlying semantic engine which can perform semantic analysiswhich can perform semantic analysis

You need to have a semantic engine in multiple You need to have a semantic engine in multiple languages – semantics vary by languagelanguages – semantics vary by language

You need to have access to the programs through a You need to have access to the programs through a user-friendly interface so you can adapt them to your user-friendly interface so you can adapt them to your environment without having to have programming environment without having to have programming knowledgeknowledge

You need to have several different kinds of You need to have several different kinds of technologies to do what I’m describing heretechnologies to do what I’m describing here

Not all the tools on the market today support this workNot all the tools on the market today support this work

Page 55: Enterprise Architecture Topics

Impacts & OutcomesImpacts & Outcomes Information Access impactsInformation Access impacts

– Increased precision of searchIncreased precision of search– Better control over recall Better control over recall – Searching like we talk Searching like we talk – Exact match searching – known item searching will work betterExact match searching – known item searching will work better– Metadata based searching now begins to resemble full-text Metadata based searching now begins to resemble full-text

searching but with all the advantages of structure & context, and searching but with all the advantages of structure & context, and a significant reduction in the amount of noisea significant reduction in the amount of noise

Productivity ImprovementsProductivity Improvements– Can now assign deep metadata to all kinds of content Can now assign deep metadata to all kinds of content – Remove the human review aspect from the metadata captureRemove the human review aspect from the metadata capture– Reduce unit times where human review is still usedReduce unit times where human review is still used

Information Quality impactsInformation Quality impacts– Apply quality metrics at the metadata level to eliminate need to Apply quality metrics at the metadata level to eliminate need to

build ‘fuzzy search architectures’ – these rarely scale or improve build ‘fuzzy search architectures’ – these rarely scale or improve in performancein performance

– Use the technologies to identify and fix problems with dataUse the technologies to identify and fix problems with data

Page 56: Enterprise Architecture Topics

Thank You.Thank You.Questions & DiscussionsQuestions & Discussions

Contact Information:Contact Information:[email protected]@[email protected]@georgetown.edu