12.10.2006 TMRA '06 2
Basics
Knowledge model based information retrieval
Fulltext search enhanced with Topic Maps = Semantic search
Search driven navigation
12.10.2006 TMRA '06 3
Search technologies
[Chart: search technologies compared by level of precision ("intelligence") and data volume (domain size): full-text search, conceptual search, semantic search, and Compass]
12.10.2006 TMRA '06 4
Given...
a web site with a lot of text,
which is unstructured (no markup, no tags),
a controlled domain (we know what the discourse domain is), and
an inadequate search engine...
12.10.2006 TMRA '06 5
We would like to...
get relevant hits within a meaningful context,
spare the work of structuring our data,
add semantics to the content by defining a knowledge model.
12.10.2006 TMRA '06 6
Compass-bowl:
Take a fulltext search engine.
Take a Topic Maps engine.
Add a hint of semantics.
Define the correct processes for orchestrating the components.
Mix them thoroughly.
Serve to the public!
12.10.2006 TMRA '06 7
Full text search engine
Apache Lucene (open source)
Possible to index most file formats: html, asp, php, jsp, pdf, rtf, txt, doc, ppt, xls, pst…
The index is independent of the model: no need to re-index when changes are made to the model
Small index size: typically less than 10% of the size of the data
Fast index lookup: less than 20 ms for index size >20000
12.10.2006 TMRA '06 8
The knowledge model
Based on the ISO International Standard for Topic Maps
Semantic model of the discourse domain
Concept words = topic names/synonyms
Semantic relationships through associations
Compass Weight (CW) defines “closeness” between topics; it is a property on association types
12.10.2006 TMRA '06 9
Example
Ovitas hasEmployee Christopher (association type hasEmployee: CW=0.7)
Ovitas hasProduct Compass (association type hasProduct: CW=0.8)
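The example can be written down as a small data structure. The following is only a minimal in-memory sketch, not the TMCore API: the class and method names (`TopicMap`, `add_topic`, `find_topic`, `neighbours`) are hypothetical, the synonym "Chris" is an illustrative assumption, and the direction of the two associations is read off the diagram. It shows concept words as topic names/synonyms and the CW stored once per association type.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    synonyms: set = field(default_factory=set)


@dataclass(frozen=True)
class Association:
    source: str      # topic name
    type_name: str   # association type, which carries the CW
    target: str      # topic name


class TopicMap:
    """Hypothetical in-memory stand-in for a Topic Maps engine."""

    def __init__(self):
        self.topics = {}
        self.type_weights = {}   # association type -> Compass Weight (CW)
        self.associations = []

    def add_topic(self, name, *synonyms):
        self.topics[name] = Topic(name, set(synonyms))

    def add_association_type(self, type_name, cw):
        self.type_weights[type_name] = cw

    def associate(self, source, type_name, target):
        self.associations.append(Association(source, type_name, target))

    def find_topic(self, word):
        """Match a concept word against topic names and synonyms."""
        wanted = word.lower()
        for topic in self.topics.values():
            names = {topic.name.lower(), *(s.lower() for s in topic.synonyms)}
            if wanted in names:
                return topic
        return None

    def neighbours(self, name):
        """Yield (related topic, CW) pairs, following associations both ways."""
        for assoc in self.associations:
            cw = self.type_weights[assoc.type_name]
            if assoc.source == name:
                yield assoc.target, cw
            elif assoc.target == name:
                yield assoc.source, cw


tm = TopicMap()
tm.add_topic("Ovitas")
tm.add_topic("Christopher", "Chris")   # synonym is an assumption
tm.add_topic("Compass")
tm.add_association_type("hasEmployee", 0.7)
tm.add_association_type("hasProduct", 0.8)
tm.associate("Ovitas", "hasEmployee", "Christopher")
tm.associate("Ovitas", "hasProduct", "Compass")
```

Because the weight lives on the association type, changing how "close" employees are to their company is a single edit to `hasEmployee`, not one edit per association.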
12.10.2006 TMRA '06 10
Compass orchestrator
Guides the processes of the search:
1. Search for term in the topic map
2. Expand the map for relevant/related topics
3. Send all these terms off to a fulltext search
4. Calculates relevance (based on the combination of CW and Lucene weights) and prepares the result list as an XML instance
5. Render XML as wished
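The five steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the Compass implementation: the slides do not say how CW and Lucene weights are combined, so the sketch multiplies the CWs along a path and multiplies the result by the full-text score; the threshold is applied to the path weight; `fulltext_score` is a naive term-frequency stand-in for Lucene; and the XML element names, documents and graph are made up for the example.

```python
import xml.etree.ElementTree as ET

# Topic -> [(related topic, CW)]; a stand-in for the topic map.
GRAPH = {
    "Ovitas": [("Christopher", 0.7), ("Compass", 0.8)],
    "Christopher": [("Ovitas", 0.7)],
    "Compass": [("Ovitas", 0.8)],
}

# Hypothetical indexed documents.
DOCS = {
    "about.html": "Ovitas builds Compass",
    "team.html": "Christopher works on search",
    "news.html": "Compass Compass release",
}


def expand(term, max_hops, threshold):
    """Steps 1-2: find the term, then expand hop by hop.

    A topic's weight is the product of the CWs on the best path to it
    (an assumption); topics below the threshold are not followed.
    """
    if term not in GRAPH:
        return {}
    weights = {term: 1.0}
    frontier = [term]
    for _ in range(max_hops):
        next_frontier = []
        for topic in frontier:
            for other, cw in GRAPH.get(topic, []):
                weight = weights[topic] * cw
                if weight >= threshold and weight > weights.get(other, 0.0):
                    weights[other] = weight
                    next_frontier.append(other)
        frontier = next_frontier
    return weights


def fulltext_score(term, text):
    """Step 3 stand-in: relative term frequency instead of a Lucene score."""
    tokens = text.lower().split()
    return tokens.count(term.lower()) / len(tokens) if tokens else 0.0


def search(term, max_hops=1, threshold=0.5):
    """Step 4: combine both weights and group the hits by related topic."""
    results = {}
    for topic, weight in expand(term, max_hops, threshold).items():
        hits = []
        for doc, text in DOCS.items():
            score = fulltext_score(topic, text)
            if score > 0:
                hits.append((doc, weight * score))
        if hits:
            results[topic] = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return results


def to_xml(results):
    """Step 4, continued: the result list as an XML instance."""
    root = ET.Element("results")
    for topic, hits in results.items():
        group = ET.SubElement(root, "topic", name=topic)
        for doc, relevance in hits:
            ET.SubElement(group, "hit", href=doc, relevance=f"{relevance:.3f}")
    return ET.tostring(root, encoding="unicode")


results = search("Ovitas", max_hops=1, threshold=0.5)
print(to_xml(results))   # step 5: hand the XML to whatever renders it
```

Raising the threshold to 0.75 in this sketch drops the hasEmployee branch (CW=0.7) while keeping hasProduct (CW=0.8), which is the kind of tuning the hop count and threshold settings allow.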
[Screenshot annotations]
Topic Map expansion
Search term
Hits in the fulltext grouped by the related topics
Relevant documents ranked by the weighting result
Search term in the topic map, but not in the text
Relevant information about “Chris Searle”
Synonym search
12.10.2006 TMRA '06 15
Creating/maintaining the model
An MS Excel plug-in serves as the topic map editor
Can be put under version control
Import the model into the topic map engine: one click only
For complex topic maps a custom user interface can be used to enter instance data
12.10.2006 TMRA '06 16
Navigation
Navigation through the associations between topics
Navigation by search
12.10.2006 TMRA '06 17
User configurations
What pages to index
What topic map to use
The number of hops to perform
The threshold for relevance
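As an illustration only, the four settings could be held in one validated object. The slides do not describe a configuration format, so every name here (`CompassConfig` and its fields), the example URL and file name, and the assumption that the threshold lies between 0 and 1 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompassConfig:
    """Hypothetical container for the four user-configurable settings."""

    pages_to_index: tuple           # URLs of the pages to index
    topic_map: str                  # which topic map to use
    max_hops: int = 1               # number of hops to perform
    relevance_threshold: float = 0.5

    def __post_init__(self):
        if self.max_hops < 0:
            raise ValueError("max_hops must not be negative")
        # Assumed range: CW values in the example lie between 0 and 1.
        if not 0.0 <= self.relevance_threshold <= 1.0:
            raise ValueError("relevance_threshold must be between 0 and 1")


config = CompassConfig(
    pages_to_index=("http://example.com/",),
    topic_map="products.xtm",
    max_hops=2,
    relevance_threshold=0.6,
)
```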
12.10.2006 TMRA '06 18
Content lifecycle management
Easy to integrate with content repositories
A content management or publishing system can send a request to the indexer to re-index a particular resource
Incremental indexing: add, update or delete documents
HTTP is used as the basic mechanism to address content
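A rough sketch of what incremental indexing addressed by URL could look like. It is not the Compass indexer or the Lucene API: `IncrementalIndex`, `reindex` and `http_fetch` are hypothetical names, the index is a plain in-memory inverted index, and treating HTTP 404/410 as "resource deleted" is an assumption. A re-index request for one URL resolves to an add, an update or a delete depending on what the URL returns.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def http_fetch(url):
    """Return the body at url, or None if the resource is gone (404/410)."""
    try:
        with urlopen(url) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        if err.code in (404, 410):
            return None
        raise


class IncrementalIndex:
    """Hypothetical inverted index that is updated one resource at a time."""

    def __init__(self, fetch=http_fetch):
        self.fetch = fetch
        self.postings = {}    # term -> set of URLs containing it
        self.documents = {}   # URL -> set of terms indexed for it

    def _remove(self, url):
        for term in self.documents.pop(url, set()):
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]

    def reindex(self, url):
        """Re-index a single resource; the rest of the index is untouched."""
        text = self.fetch(url)
        known = url in self.documents
        self._remove(url)
        if text is None:
            return "deleted" if known else "absent"
        terms = set(text.lower().split())
        self.documents[url] = terms
        for term in terms:
            self.postings.setdefault(term, set()).add(url)
        return "updated" if known else "added"

    def lookup(self, term):
        return set(self.postings.get(term.lower(), ()))


# Usage with a fake fetcher in place of real HTTP requests.
pages = {"http://example.com/a": "Compass semantic search"}
index = IncrementalIndex(fetch=pages.get)
first = index.reindex("http://example.com/a")    # new resource
pages["http://example.com/a"] = "Topic Maps"
second = index.reindex("http://example.com/a")   # changed resource
del pages["http://example.com/a"]
third = index.reindex("http://example.com/a")    # removed resource
```

Passing the fetcher in as a parameter keeps the sketch testable without network access; a content management or publishing system would trigger `reindex` for the resource it has just changed.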
12.10.2006 TMRA '06 19
Architecture
SOA (service oriented architecture), no dependency on platform or components
Web service interface (HTTP/REST)
.NET platform
Integrated components:
TMCore Topic Maps engine by NetworkedPlanet
Apache Lucene: full text engine
12.10.2006 TMRA '06 20
Architecture diagram
[Diagram components: TM Core, Full Text, Excel Editor, Compass Service, TMNav, Services; actors: TM editor (person), User, Publishing System]
12.10.2006 TMRA '06 21
Compass - Summary
Semantic search based on Topic Maps
Search in any document format
Organize information in a topic-oriented manner
Link to relevant information without touching the data content
Conceptual navigation by Topic Maps
Tools for maintaining/evolving the classification
Fast and easy implementation