Ontologies, Taxonomies and Search
Denise Bedford
Special Libraries Association, Baltimore, Maryland
June, 2006
Presentation Overview
I will talk today from the ‘end game’ perspective of search – placing taxonomies and ontologies in that context
– Search – User Needs
– Search Functional Architecture
– Metadata & Categorization
– Semantic Technologies
Search – User Needs
Basic Assumptions About Search Systems
Search systems are not WYSIWYG – what you see is not what you get
Search systems are more like icebergs – most of the search system is below the surface – you can’t see it, you don’t know how it has been configured or what components it has
Helpful to have a framework against which you can judge the suitability of any search tool to your environment
Assess the search environment before you decide what kind of a search system you need
The Environment
Like other libraries we acquire, create, license and access information in 30 subject domains – ranging from Law & Justice, to Transport, to Environment, to Agriculture, to Education, Health & Nutrition, etc.
We have 500 different kinds of content/document types
We have a set of business processes which represent the way we do our work
We have six working languages – but are actually working in many more
We have a rich history – content dating back to the 1940s
Our priorities change over time
Everyone in the Bank is a recognized international expert
Underlying Search Architecture
[Diagram: three layers – User Interface, Search Engines, Information]
User Interface: UI - Image Bank, UI - Intranet, UI - External Web, UI - IRIS, UI - JOLIS, UI - SAP & PeopleSoft, UI - External Sources
Search Engines: Oracle Intermedia, Google, BRS-search, SAP & PeopleSoft, individual search engines
Information: IRIS, Image Bank, Intranet, External Web, JOLIS, SAP & PeopleSoft, External Sources
Silos of Information = Multiple search engines and user interfaces
Search Environment
It is very challenging to design a search system which meets the needs of this complex environment
Internal Experts do ‘known item’ searching and look for other experts to talk to
General public looks for information ‘about’ something
Most people don’t know all the dimensions of the Bank
People need to search in other than English and find content written in other languages
Our new Knowledge and Learning initiative is moving us towards a W3 working environment – What information I need when I need it, where I need it
We need a search architecture that supports the user’s needs
Information Architecture: Moving Towards Contextualization
[Diagram: Customized Access to Information]
Access levels: Anonymous Access (Context Free); Group Access (Context Sensitive – Group); Individualized Access (Context Sensitive – Individual)
Content – Information Assets: Structured, Less structured
Channels: Intranet, External Web, Operations Portal, myPortal*, Client Connections
Information: Basic Information, People, “Customer” Information
3W Information Model: What I Want, Where I Want, When I Want
Search Use Case Analysis
Why Search Wasn’t Working
Search Has Always Been Problematic
Despite any reports you may read in the general literature, buying a fast crawler will not solve your search problems
Implementing a fast crawler simply surfaces information management and data quality challenges directly to the users
The crawler approach provides high hit rates, and very low relevancy rates
It was a challenge to understand the search problem from the system side
Try to understand the search problem from the user’s side
Consultant is looking for the CAS for the Congo.
[Use case diagram: DEC Consultant. Sources consulted: Intranet Search, ImageBank Search. Step groups: Steps 1-2, Steps 4-6. Outcome markers: F, F]
Adviser wants to find social indicators, relevant quotes and Bank’s position on racism, as well as learn what the Bank has done in this area in order to prepare a 30 min. speech for a WB Managing Director for conference.
[Use case diagram: HD Adviser. Sources consulted: WB Intranet Search, Google Internet, External NGO Colleague, Bank Colleagues, LAC Group. Step groups: Steps 1-4, Steps 5-6, Steps 7-12, Steps 15-16. Outcome markers: F, PS, S]
What We Learned
– The absolute success rate per search task was low at 43.18%
– The Search experience generally consists of multiple steps and multiple searches within and across sources
– The source with the fewest number of steps was a Colleague or Personal Contact. The source with the greatest number of steps was External Web Search Browse
– Each source has its own behavior, business rules, functional architectures – users need to learn each system
– The Intranet search was selected as a logical place to look for information in over 65% of the use cases. However, the success rate for the Intranet Google search was only 35%
What We Learned
You need to understand search from the searcher’s point of view
Until you can see it this way, you are simply guessing at what problems need to be solved and how to solve them
This is a challenge for an organization whose users are the public, but it is possible to design an architecture that suits the general public and the internal staff, and the important stakeholders
The critical success factor is to design an open architecture for agility and flexibility
Enterprise Search
Evolving Search Architecture
Contextualization, Personalization, Recommender, Content Syndication, Q&A Systems, Intelligent Search
What the user sees
Search Engine Parts
Search System Inputs
– Metadata Repository for Enterprise Search
– Index Architecture & Construction
– Vocabulary Support (thesaurus)
– Classification Schemes (taxonomy)
Query Transformation & Matching
– Boolean matching
– Exact phrase matching
– Fuzzy matching
– Term matching + synonym expansion
– Term matching + cross language expansion
– Term weighted matching
– Query term root matching & dictionary expansion
– Wild card matching
– Term proximity matching
– Neural network expansion
– Genetic algorithm expansion
Search Outputs
Query Manipulation
– Simple search
– Fielded Search
– Query Processing Algorithms
– Search term assistance (thesaurus, dictionaries)
– Search language selection
– Sources selection
Display Results
– Integrated search results
– Search results sorting
– Search results contextualization
– Search results relevancy ranking
The basic search programs
Foundation or Baseline of The Search System
Query Processing – transforming the query to add tags, synonyms, etc.
Enterprise Search Proposed Solution
What Enterprise Search is and what it is not
Distinguish between the web services platform for accessing Enterprise Search and the enterprise’s full store of content (it’s not only electronic, and it’s not all in HTML format!)
What kind of an architecture do you need to support enterprise search?
What kind of tools do you need to support enterprise search? It’s much more than just a web crawler.
What do you need to support true concept level cross-language searching?
How does Enterprise Search respect security classifications with/without restricting knowledge of what exists?
[Diagram: Enterprise Search Functional Architecture]
Consolidated Metadata Store; Enhanced Common Data Stores*
MDR Tools & Utilities: Metadata Extracts, Metadata Loads, Metadata Maintenance Utilities, Security Policy, Change Mgmt. Processes, Utilities, Interface Templates
Parametric Indexes; Index Utilities; Union Index; Teragram Metadata Capture
Search Interface: Simple and Fielded Search, Results Display & Manipulation, Query Manipulation Options, Query Processing Algorithms
MetaModel Repository: Relational MetaModel, Business MetaModel, Logical MetaModel, Application MetaModel, Metadata Repository MetaModel (including transformation rules, reporting specs, loader programs, data standards, data rationalization)
Search tools: Content Aggregator, Recommender Engine, Content Syndication, Personalization Profiles, Social or Task Filtering, Threshold Filtering
Source metadata (MD): JOLIS, IRAMS, Global JOLIS, Image Bank, LMS, CMS, IRIS
Vocabulary Support; Classification Schemes; Cross Language Searching
*includes thesaurus support and taxonomies
Taxonomies and Ontologies in Search
Using Taxonomies in Search
Controlled vocabularies – guiding users through the search process (flat taxonomies)
Classification schemes – browsing, navigation, syndication, contextualization (hierarchical taxonomies)
Metadata – supporting fielded search (faceted taxonomies)
Thesauri – supporting knowledge discovery, preventing Zero results, supporting cross-language searching (network taxonomies)
Synonym Rings – improving relevancy, reducing information scattering, managing recall (ring taxonomies)
[Screenshot callouts: Flat taxonomy, Hierarchical taxonomy, Ring taxonomy]
Fielded Search = Faceted Taxonomy
Vision of An Enterprise Advanced Search
[Screenshot callouts: Ring Taxonomy, Network Taxonomy, Metadata]
Data Driving the Enterprise Architecture
30 Second Description of Taxonomies and Ontologies
Taxonomies & Ontologies
I just mentioned several kinds of taxonomies that search uses
– Faceted taxonomies (metadata schemes)
– Flat taxonomies (controlled vocabularies, picklists)
– Hierarchical taxonomies (classification schemes)
– Network taxonomies (thesauri, semantic networks)
– Ring taxonomies – to handle synonym rings and synonym searching
Let’s walk through these kinds of taxonomies
Then let’s see how they all make up an ontology
Facet Taxonomies
A faceted taxonomy is represented as a star data structure. Each node in the star structure is linked to the center focus. Any node can be linked to other nodes in other stars. Appears simple, but becomes complex quickly.
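To make the star structure concrete, here is a minimal sketch in Python. It is not the Bank's implementation: the class name `Star`, the helper `fielded_search`, the facet names and the two sample records are all hypothetical, chosen only to show a center focus with facet nodes linked to it and a fielded search that filters on those facets.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Star:
    """One content item (the center focus) with facet nodes linked to it."""
    focus: str
    facets: dict[str, set[str]] = field(default_factory=dict)

    def link(self, facet: str, value: str) -> None:
        # Attach a facet node (for example country=Congo) to the center focus.
        self.facets.setdefault(facet, set()).add(value)


def fielded_search(stars: list[Star], **criteria: str) -> list[str]:
    """Return the focus of every star whose facets contain all requested values."""
    return [
        star.focus
        for star in stars
        if all(value in star.facets.get(name, set()) for name, value in criteria.items())
    ]


# Hypothetical records, for illustration only.
report = Star("Transport sector review")
report.link("country", "Congo")
report.link("topic", "Transport")
report.link("language", "French")

note = Star("Education policy note")
note.link("country", "Congo")
note.link("topic", "Education")
note.link("language", "English")

hits = fielded_search([report, note], country="Congo", topic="Transport")
```

Two stars that share a facet value (here, the same country) are what makes the structure grow complex quickly: every shared value is a potential link between stars.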
Core Metadata Strategy
What is a core metadata strategy?
What’s the process you use to discover your organization’s core metadata strategy?
Libraries have many core metadata standards – COSATI, Dublin Core, MARC, MODS, AIIM TR48
The important question for metadata in search is how are you using your metadata to support search?
Too often we put a ‘dumb’ search engine on top of ‘smart metadata’ and do nothing more with it than ‘publish’ it
It’s time to think smarter about how we use our metadata
Purpose of Core Metadata
[Table: core metadata attributes grouped by purpose]
Purposes: Identification/Distinction; Search & Browse; Use Management; Compliant Document Management
Attributes: Agent, Title, Date, Format, Publisher, Language, Version, Series & Series #, Content Type, Country, Region, Abstract/Summary, Keywords, Subject-Sector-Theme-Topic, Business Function, Authorized By, Rights Management, Access Rights, Location, Use History, Disclosure Status, Disclosure Review Date, Record Identifier, Disposal Status, Disposal Review Date, Management History, Retention Schedule/Mandate, Preservation History, Aggregation Level, Relation
Capturing Core Metadata
[Table: core metadata attributes grouped by purpose, with a legend of capture methods]
Purposes: Identification/Distinction; Search & Browse; Use Management; Compliant Document Management
Attributes: Agent, Title, Date, Format, Publisher, Language, Version, Series & Series #, Content Type, Country, Region, Abstract/Summary, Keywords, Subject-Sector-Theme-Topic, Business Function, Authorized By, Rights Management, Access Rights, Location, Use History, Disclosure Status, Disclosure Review Date, Record Identifier, Disposal Status, Disposal Review Date, Management History, Retention Schedule/Mandate, Preservation History, Aggregation Level, Relation
Capture methods: Human Creation; Programmatic Capture; Inherit from System Context; Extrapolate from Business Rules
Flat Taxonomy Structure
Energy Environment Education Economics Transport Trade Labor Agriculture
Hierarchical Taxonomy
A hierarchical taxonomy is represented as a tree data structure in a database application. The tree data structure consists of nodes and links. In an RDBMS environment, the relationships become associations. In a hierarchical taxonomy, a node can have only one parent.
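The single-parent rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any production scheme: the class `HierarchicalTaxonomy` and the subtopics "Roads" and "Railways" are hypothetical, while "Transport" and "Energy" are taken from the flat list above.

```python
class HierarchicalTaxonomy:
    """Tree of nodes and links in which every node has at most one parent."""

    def __init__(self):
        self.parent = {}  # child -> parent (None for a top-level node)

    def add(self, node, parent=None):
        # Refuse a second placement: that would give the node two parents.
        if node in self.parent:
            raise ValueError(f"{node!r} already has a place in the hierarchy")
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent {parent!r}")
        self.parent[node] = parent

    def path(self, node):
        """Return the chain of nodes from the top level down to the given node."""
        trail = []
        while node is not None:
            trail.append(node)
            node = self.parent[node]
        return list(reversed(trail))

    def children(self, node):
        return sorted(child for child, parent in self.parent.items() if parent == node)


taxonomy = HierarchicalTaxonomy()
taxonomy.add("Transport")
taxonomy.add("Energy")
taxonomy.add("Roads", parent="Transport")     # hypothetical subtopic
taxonomy.add("Railways", parent="Transport")  # hypothetical subtopic

breadcrumb = taxonomy.path("Roads")
```

In a relational database the `parent` mapping would typically become a foreign-key association from a child row to its parent row.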
Hierarchical Taxonomies – Classification Schemes
Ranganathan is the supreme authority on how to create a well-designed classification scheme
Most of what we teach in graduate school, though, is how to use a classification scheme, not how to create one
Classification schemes are the controlling reference source for metadata attributes
For example, the Enterprise Topic Classification Scheme is the reference source for the attribute Topic
We use tools to help us discover what the scheme should be and also to help us classify content
Network Taxonomies
A network taxonomy is a plex data structure. Each node can have more than one parent. Any item in a plex structure can be linked to any other item. In plex structures, links can be meaningful & different.
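A plex structure with meaningful, different links can be sketched as a small thesaurus. The relationship codes BT (broader term), NT (narrower term) and RT (related term) are the conventional thesaurus relationships; the class `Thesaurus` and the sample terms other than "Poverty reduction" are hypothetical and only illustrate a node with more than one parent.

```python
from collections import defaultdict


class Thesaurus:
    """Network taxonomy: typed links, and a term may have several broader terms."""

    # Each link type is stored together with its reciprocal.
    INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT"}

    def __init__(self):
        self.links = defaultdict(set)  # term -> {(relation, other term)}

    def relate(self, term, relation, other):
        self.links[term].add((relation, other))
        self.links[other].add((self.INVERSE[relation], term))

    def related(self, term, relation):
        return sorted(other for rel, other in self.links[term] if rel == relation)


thesaurus = Thesaurus()
# Hypothetical relationships, for illustration only.
thesaurus.relate("Poverty reduction", "BT", "Social development")
thesaurus.relate("Poverty reduction", "BT", "Economic policy")
thesaurus.relate("Poverty reduction", "RT", "Income distribution")

broader = thesaurus.related("Poverty reduction", "BT")
```

Because "Poverty reduction" has two broader terms, it could not live in the single-parent tree of a hierarchical taxonomy without being duplicated.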
Poverty mitigation
Poverty alleviation
Poverty elimination
Poverty reduction
Poverty eradication
Poverty abatement
Poverty prevention
Poverty reducation
Ring Taxonomy
Rings can include all kinds of synonyms - true, misspellings, predecessors, abbreviations
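As a rough sketch of how a synonym ring could drive query expansion, the Python below uses the poverty terms listed above, including the misspelling, as one ring. The function names `expand` and `search`, the substring matching and the two sample documents are assumptions made for illustration; a real search engine would apply the ring inside its own query processing.

```python
# One ring built from the terms on the slide; the misspelled member is deliberate.
POVERTY_RING = frozenset({
    "poverty mitigation",
    "poverty alleviation",
    "poverty elimination",
    "poverty reduction",
    "poverty eradication",
    "poverty abatement",
    "poverty prevention",
    "poverty reducation",
})
RINGS = [POVERTY_RING]


def expand(query):
    """Return every member of the ring the query belongs to, or the query alone."""
    normalized = query.strip().lower()
    for ring in RINGS:
        if normalized in ring:
            return sorted(ring)
    return [normalized]


def search(query, documents):
    """Naive substring match of any expanded term against each document."""
    terms = expand(query)
    return [doc for doc in documents if any(term in doc.lower() for term in terms)]


# Hypothetical documents, for illustration only.
documents = [
    "Strategies for poverty eradication in rural areas",
    "Road transport statistics",
]
matches = search("Poverty reduction", documents)
```

A search for any one member of the ring retrieves content that uses any other member, which is how a ring reduces information scattering.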
[Diagram: Ontology Design from 50,000 Feet]
Entities: Content Entity, Content Elements, Content Parts, Content, Metadata, Topic Class Scheme, Business Process Scheme, Thesaurus, Country Names, Region Names, Skill Sets/Competencies, Standard Statistical Variables, User, Profile, Business Rule, Context, Contextual Matrix & Sensing
Relationships: Has, Has values, Uses, Contains, Has relationship to, Has Meaning in, Understood
Status legend: Not in progress, In progress, In Use
Building & Maintaining Taxonomies
Moving Towards Automated Metadata Capture
[Diagram: relationships across data classes – 3 Oracle data classes, including Topic data class and Subtopic data class]
Building and Maintaining Taxonomies
Moving towards automated metadata generation means that catalogers shift their effort to reviewing the metadata generated and to maintaining LCSH and LC classification as part of a suite of categorization tools
Level of effort shifts to training and developing the tools and away from original cataloging and metadata capture
Continue to work closely with subject experts to define the controlled vocabularies and classification schemes
The problem with metadata
Metadata…
– Is expensive and time consuming to create
– Is sometimes subjective and not granular enough
– Doesn’t always address the ways that users and systems think about the information it describes
– May not tell us enough about the information to trust it
– May address only one context – the context for which it is created
– May live in the source application where it was created
– May not be as accessible as the information object
The future depends on metadata so we have to resolve these problems
We are resolving the problem using automated metadata creation and extraction technologies
Metadata extraction uses rules, classifiers and grammar engines
Metadata creation uses categorization and rules engines
Smart Use of Technologies
Sample structure – Bank Topics Classification Scheme (hierarchical taxonomy)
– Oracle data classes used to represent Topic Classification scheme
  hierarchical taxonomy as reference source for the attribute – Topic
  used for Browse, Search, Content Syndication, Personalization
– 1st challenge is to architect the hierarchy correctly
  3 distinct data classes, not a tree structure with inheritance
  Allows you to use the three data classes for distinct functions across systems but still enforce relationships across the classes
What is Teragram?
Semantic analysis tools which support concept extraction, categorization, summarization and pattern matching rules engines
Teragram works in 23 languages
Use categorization to capture Topics, Business Activities, Regions, Sectors, Themes, etc.
Use Concept Extraction to capture keywords
Use Rules Engine to capture Loan #, Credit #, Project ID, Trust Fund #, etc.
Use Summarization to generate a ‘gist’ of the content
How does semantic analysis work?
Automatically Generated Metadata
Metadata is generated using a categorization profile – reflects the way people organize.
Automatically Extracted Metadata
<?xml version="1.0" encoding="UTF-8"?>
<Proper_Noun_Concept>
<Source><Source_Type>file</Source_Type>
<Source_Name>W:/Concept Extraction/Media Monitoring Negative Training Set/ 001B950F2EE8D0B4452570B4003FF816.txt</Source_Name>
</Source><Profile_Name>PEOPLE_ORG</Profile_Name>
<keywords>Abdul Salam Syed, Aruna Roy, Arundhati Roy, Arvind Kesarival, Bharat Dogra, Kwazulu Natal, Madhu Bhaduri, </keywords><keyword_count>7</keyword_count>
</Proper_Noun_Concept>
Proper Noun Profile for People Names uses grammars to find and extract the names of people referenced in the document.
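A downstream system has to read this XML-wrapped metadata before it can load it into a repository. The sketch below parses the record shown above with Python's standard `xml.etree.ElementTree` module. The function name `read_record` and the keys of the returned dictionary are hypothetical; the element names come from the sample record itself.

```python
import xml.etree.ElementTree as ET

# The sample record from the slide, passed as bytes because it declares an encoding.
RECORD = b"""<?xml version="1.0" encoding="UTF-8"?>
<Proper_Noun_Concept>
<Source><Source_Type>file</Source_Type>
<Source_Name>W:/Concept Extraction/Media Monitoring Negative Training Set/ 001B950F2EE8D0B4452570B4003FF816.txt</Source_Name>
</Source><Profile_Name>PEOPLE_ORG</Profile_Name>
<keywords>Abdul Salam Syed, Aruna Roy, Arundhati Roy, Arvind Kesarival, Bharat Dogra, Kwazulu Natal, Madhu Bhaduri, </keywords><keyword_count>7</keyword_count>
</Proper_Noun_Concept>"""


def read_record(xml_bytes):
    """Turn one extraction record into a plain dictionary."""
    root = ET.fromstring(xml_bytes)
    # The keyword list is comma separated and ends with a trailing comma.
    raw_keywords = root.findtext("keywords", "")
    keywords = [item.strip() for item in raw_keywords.split(",") if item.strip()]
    return {
        "profile": root.findtext("Profile_Name"),
        "source": root.findtext("Source/Source_Name"),
        "keywords": keywords,
        "declared_count": int(root.findtext("keyword_count", "0")),
    }


record = read_record(RECORD)
```

Comparing the parsed keyword list against `declared_count` is a cheap quality check before the metadata is loaded anywhere else.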
Semantic Analysis Basics
Once you have made some sense of the sentence, reconstruct entities for information extraction (compose)
– Identify names and other fixed form expressions – people, organizations, conferences
– Identify basic noun groups, verb groups, presentations, other grammatical elements
– Construct complex noun groups and verb groups
– Identify event structures
– Identify common elements and associate
Categorizing Content
Let’s look at how we’re categorizing our content to this structure automatically
Topic classification, geographical region assignment, keywording examples
Can apply this approach to any kind of content
Enables us to build a robust metadata repository model, with strong metadata quality, to move towards SI at the functional level
Also note that we can do this across many languages
Leveraging the Structure
Each subtopic is a knowledge domain (hierarchical taxonomy) and each subtopic has an extensive concept level definition (1,000 – 5,000+ concepts)
Concepts are controlled vocabularies in their raw form (flat taxonomy)
Concepts with relationships (extensive per new Z39.19 standard) comprise semantic network (network taxonomy)
Categorization tools work with topic structure & concept definitions to categorize and index content
The following screen illustrates how that same structure is embedded into Teragram profile to support categorization
[Screenshot callouts]
Subtopics
Domain concepts or controlled vocabulary
Extensive operators allow us to write grammatical rules to manage typical semantic problems
Concept based rules engine allows us to define patterns to capture other kinds of data
Example of use of Authority Control to capture country names but extract ‘authorized’ version of country name
Example of use of a gazetteer + concept extraction + rules engine to support semantic interoperability
Use of concept extraction + rules engine to capture Loan #, Credit #, Project ID#
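The authority control callout can be sketched as a lookup from name variants to one authorized form. This is a plain Python illustration, not the profile shown in the screenshots: the `AUTHORITY` table, its variants and the sample sentence are hypothetical and are not taken from the Bank's authority file.

```python
import re

# Hypothetical authority file: authorized form -> known variants (lower case).
AUTHORITY = {
    "Congo, Democratic Republic of": {"democratic republic of congo", "dr congo", "drc", "zaire"},
    "Kyrgyz Republic": {"kyrgyzstan", "kirghizia"},
}

# Every variant, plus the authorized form itself, points at the authorized form.
LOOKUP = {
    variant: name
    for name, variants in AUTHORITY.items()
    for variant in variants | {name.lower()}
}

# Longest variants first so that a longer name wins over a shorter one inside it.
PATTERN = re.compile(
    "|".join(r"\b" + re.escape(v) + r"\b" for v in sorted(LOOKUP, key=len, reverse=True)),
    re.IGNORECASE,
)


def authorized_countries(text):
    """Find any known variant in the text and return the authorized names."""
    return sorted({LOOKUP[match.group(0).lower()] for match in PATTERN.finditer(text)})


# Hypothetical sentence, for illustration only.
countries = authorized_countries("The mission visited Zaire and Kyrgyzstan.")
```

The document may say "Zaire", but the metadata records the authorized name, so a search on the authorized form still finds it.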
[Diagram: Enterprise Profile Creation and Maintenance]
Enterprise Profile Development & Maintenance
Enterprise Metadata Profile
– Concept Extraction Technology: Country, Organization Name, People Name, Series Name/Collection Title, Author/Creator, Title, Publisher, Standard Statistical Variable, Version/Edition
– Categorization Technology: Topic Categorization, Business Function Categorization, Region Categorization, Sector Categorization, Theme Categorization
– Rule-Based Capture: Project ID, Trust Fund #, Loan #, Credit #, Series #, Publication Date, Language
– Summarization
e-CDS Reference Sources for Country, Region, Topics, Business Function, Keywords, Project ID, People, Organization
Data Governance Process for Topics, Business Function, Country, Region, Keywords, People, Organizations, Project ID
Teragram Team; TK240 Client; System 1, System 2, System 3, System 4, System 5
[Diagram: Enterprise Metadata Capture – Functional Reference Model]
Enterprise Metadata Capture Strategy
Requests: UCM Service Requests; Update & Change Requests
Content Capture: ImageBank Integration, ISP Integration, IRIS Integration, Factiva Metadata Database, APIs & Technical Integration
Dedicated Server – Teragram Semantic Engine – Concept Extraction, Categorization, Clustering, Rule Based Engine, Language Detection
Enterprise Profile Development & Maintenance; TK240 Client; Reference Sources
Output: XML Wrapped Metadata
Functional Team: Content Owners, Business Analyst, Indexers, Librarians
Caution Regarding Tools
Not all tools will do what we are describing here
You need to have an underlying semantic engine which can perform semantic analysis
You need to have a semantic engine in multiple languages – semantics vary by language
You need to have access to the programs through a user-friendly interface so you can adapt them to your environment without having to have programming knowledge
You need to have several different kinds of technologies to do what I’m describing here
Not all the tools on the market today support this work
Impacts & Outcomes
Information Access impacts
– Increased precision of search
– Better control over recall
– Searching like we talk
– Exact match searching – known item searching will work better
– Metadata based searching now begins to resemble full-text searching but with all the advantages of structure & context, and a significant reduction in the amount of noise
Productivity Improvements
– Can now assign deep metadata to all kinds of content
– Remove the human review aspect from the metadata capture
– Reduce unit times where human review is still used
Information Quality impacts
– Apply quality metrics at the metadata level to eliminate need to build ‘fuzzy search architectures’ – these rarely scale or improve in performance
– Use the technologies to identify and fix problems with data
Thank You.
Questions & Discussions
Contact Information: [email protected]@georgetown.edu