Content Management, Metadata & Semantic Web

Keynote Address, Net.ObjectDAYS 2001, Erfurt, Germany, September 11, 2001

Amit Sheth
CTO/SrVP, Voquette (www.voquette.com) [formerly Founder/CEO, Taalee, www.taalee.com]
Director, Large Scale Distributed Information Systems Lab, University of Georgia (lsdis.cs.uga.edu)
[email protected]

Metadata Extraction is a patent-pending technology of Taalee, Inc. Semantic Engine and WorldModel are trademarks of Taalee, Inc.



Page 2

Enterprise Content Management – sample user requirements (from a large Financial Svcs Company)

“If a new bond comes into inventory, then we should get a message, an alert...and be able to refine to say that I only have California, Oregon and Washington clients...."

“In the month of July, I received 95 e-mails from my subscriptions. These e-mails included 61 that had 143 attachments that had 67 more attachments. In total therefore, I received almost 400 documents including 5 different types (HTML,PDF, Word, Rich Media, …). Even with this volume, I had subscribed to only 10 categories in the Equities area. There are a total of 26 Equity Subscription areas and a total of 166 categories to which a user can subscribe across all Product Areas.”

Professional users of a traditional Content Management Product/Solution

Page 3

Enterprise Content Management – sample user requirements (from a large Financial Svcs Company)

The real question is, "Which sales ideas may have significant relevance to my book of business?" For example, an earnings warning on an equity rated Hold or Lower and not owned by any of my clients may not be of high relevance to me.

Ideally, a relevance analysis would:

Greatly reduce the volume of Product Area Ideas sent to every FA, hopefully to perhaps 10% to 20% or less of today's volume with ideas that are potentially actionable for that FA and his/her client

Result in FAs reading and evaluating the Product Area Ideas, taking appropriate actions, and generating sales because the Product Area Ideas would be relevant

Result in customer satisfaction because clients would understand FAs are paying attention to their needs and developing focused ideas

Professional users of a traditional Content Management Product/Solution
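
The relevance rule quoted on this slide can be illustrated with a small filter. This is only a sketch under stated assumptions: the `Idea` record, the three-level rating scale, the ticker names, and the pass-through default are all hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass

# Hypothetical rating scale, ordered from weakest to strongest.
RATINGS = ["Sell", "Hold", "Buy"]

@dataclass
class Idea:
    ticker: str
    kind: str    # e.g. "earnings_warning" (hypothetical label)
    rating: str  # one of RATINGS

def is_relevant(idea, client_holdings):
    """Rule from the slide: an earnings warning on an equity rated Hold or
    lower is only relevant if at least one client owns that equity."""
    hold_or_lower = RATINGS.index(idea.rating) <= RATINGS.index("Hold")
    if idea.kind == "earnings_warning" and hold_or_lower:
        return idea.ticker in client_holdings
    return True  # assumption: all other ideas pass through unfiltered

def filter_ideas(ideas, client_holdings):
    return [i for i in ideas if is_relevant(i, client_holdings)]

# Made-up ideas and a made-up book of business holding only "BBB".
ideas = [
    Idea("AAA", "earnings_warning", "Hold"),
    Idea("BBB", "earnings_warning", "Hold"),
    Idea("CCC", "upgrade", "Buy"),
]
kept_tickers = [i.ticker for i in filter_ideas(ideas, {"BBB"})]
```

Here the warning on "AAA" is dropped because no client owns it, while the warning on "BBB" and the unrelated upgrade are kept.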

Page 4

Enterprise Content Management – sample product requirements (from a large Financial Svcs Company)

“Content generation is a more complex and probably costly problem to solve ... we reportedly create about 9 million messages a month for field delivery. On average, this would mean 1,000 messages per month per ‘big user’ or perhaps only 500 to 600 per ‘little user’. … I strongly believe an analysis is in order of the nature and necessity of generated content, the establishment of content generation standards, the movement towards development and implementation of a relevance engine, …”

Director (Product Management) of a large company that uses a leading Content Management Product

Page 5

New Enterprise Content Management Challenges

1. More variety and complexity
   More formats (MPEG, PDF, MS Office, WM, Real, AVI, etc.)
   More types (Docs, Images -> Audio, Video, variety of text: structured, unstructured)
   More sources (internal, extranet, internet, feeds)

2. Information Overload
   Too much data, precious little information (Relevance)

3. Creating Value from Content
   How to distribute the right content to the right people as needed? (Personalization -- book of business)
   Customized delivery for different consumption options (mobile/desktop, devices)
   Insight, Decision Making (Actionable)

Page 6

New Enterprise Content Management Technical Challenges

1. Aggregation
   Feed handlers/Agents that understand content representation and media semantics
   Push-pull, Web-DB-Files, Structured-Semi-structured-Unstructured data of different types

2. Homogenization and Enhancement
   Enterprise-wide common view
   Domain model, taxonomy/classification, metadata standards
   Semantic Metadata, created automatically if possible

3. Semantic Applications
   Search, personalization, directory, alerts, etc. using metadata and semantics (semantic association and correlation), for improved relevance, intelligent personalization, customization
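
As a rough illustration of the homogenization step, the sketch below maps source-specific tag names onto one common schema. The feed names (`feedA`, `feedB`), their tag names, and the common field names are hypothetical; the slide does not prescribe a schema.

```python
# Hypothetical per-source mapping from native tag names to common field names.
TAG_MAP = {
    "feedA": {"src": "feed_source", "date": "posted_date", "co": "company"},
    "feedB": {"Source": "feed_source", "PostedOn": "posted_date", "Company": "company"},
}

def homogenize(source, record):
    """Rename a record's tags to the common schema.
    Tags with no known mapping are kept under "extra" rather than dropped."""
    mapping = TAG_MAP[source]
    common = {}
    for key, value in record.items():
        target = mapping.get(key)
        if target is None:
            common.setdefault("extra", {})[key] = value
        else:
            common[target] = value
    return common

a = homogenize("feedA", {"src": "iSyndicate", "date": "11/20/2000", "co": "Equant"})
b = homogenize("feedB", {"Source": "Reuters", "PostedOn": "11/21/2000", "lang": "en"})
```

Once both records share the same field names, downstream search and personalization code can treat them uniformly, which is the point of the enterprise-wide common view.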

Page 7

Semantics

“meaning or relationship of meanings, or relating to meaning” (Webster)

is concerned with the relationship between the linguistic symbols and their meaning or real-world objects

meaning and use of data (Information System)

Example: Palm -> Company, Product, Technology, Tree Name, part of location (Palm Springs, Palm Beach)

Semantics, Ontologies (Domain Models), Metamodels, Metadata, Content/Data
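
The "Palm" example on this slide can be made concrete with a toy disambiguator that picks a sense from surrounding words. This is a minimal sketch, not a description of any product: the cue words per sense are invented for illustration, and a real system would draw them from a domain model.

```python
# Hypothetical context cues for four of the senses of "Palm" listed above.
SENSES = {
    "Company": {"shares", "earnings", "ceo", "inc"},
    "Product": {"handheld", "pda", "pilot", "stylus"},
    "Tree": {"leaves", "coconut", "tropical", "fronds"},
    "Location": {"springs", "beach", "california", "florida"},
}

def disambiguate(text):
    """Return the sense whose cue words overlap most with the text,
    or "Unknown" when no cue word appears at all."""
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    scores = {sense: len(cues & words) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"

s1 = disambiguate("Palm shares fell after the earnings call.")
s2 = disambiguate("We stayed near Palm Beach, Florida.")
s3 = disambiguate("Palm")
```

The same surface string resolves to different meanings depending on context, which is what keyword matching alone cannot express.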

Page 8

“The Web of data (and connections) with meaning in the sense that a computer program can learn enough about what the data means to process it. . . . Imagine what computers can understand when there is a vast tangle of interconnected terms and data that can automatically be followed.” (Tim Berners-Lee, Weaving the Web, 1999)

A Content Management centric definition of Semantic Web: The concept that Web-accessible content can be organized and utilized semantically, rather than through syntactic and structural methods.

Semantics: The Next Step in the Web’s Evolution

Page 9

Organizing Content

Different and Related Objectives: Search, Browse, Summarization, Association/Relationships

Indexing
Clustering
Classification
Controlled Vocabulary, Reference Data/Dictionary/Thesaurus
Metadata
Knowledge Base (Entities/Objects and Relationships)

Page 10

Traditional Text Categorization

[Figure: categorization flow. Labels: Customer Article Feed (article 4715); Statistical/AI Techniques; Customer Training Set; Classification of Article 4715; Classify / Place in a taxonomy; Routing/Distribution.]

Standard Metadata
   Feed Source: iSyndicate
   Posted Date: 11/20/2000

Most traditional Content Management Products support categorization of unstructured content.
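
To make the idea of training-set-based categorization concrete, here is a minimal sketch using multinomial Naive Bayes with add-one smoothing. Naive Bayes is chosen here only as one well-known statistical technique; the slide does not say which algorithm any product uses, and the training sentences and category names are made up.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (text, category) pairs.
    Returns per-category word counts and per-category document counts."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in examples:
        doc_counts[category] += 1
        word_counts[category].update(text.lower().split())
    return word_counts, doc_counts

def classify(text, word_counts, doc_counts):
    """Pick the category with the highest log posterior."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        score = math.log(doc_counts[category] / total_docs)  # log prior
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Made-up training set standing in for the "Customer Training Set".
training_set = [
    ("quarterly earnings beat analyst estimates", "Company News"),
    ("company announces merger and earnings call", "Company News"),
    ("team wins the championship game", "Sports"),
    ("quarterback injured in the game", "Sports"),
]
model = train(training_set)
label = classify("earnings call scheduled by the company", *model)
```

The resulting label is what would then drive placement in a taxonomy and routing.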

Page 11

Voquette/Taalee’s Categorization & Automatic Metadata Creation

[Figure: Knowledge-base & Statistical/AI Techniques. Labels: Article Feed (article 4715); Customer Training Set & KB; Taalee Training Set & KB; Classification of Article 4715; Classify / Place in a taxonomy; Map to another taxonomy; Semantic Engine™; Metadata Catalog; Routing/Distribution; Precise Personalization/Syndication/Filtering; Automated Content Enrichment (ACE).]

Article 4715 Metadata

Standard metadata
   Feed Source: iSyndicate
   Posted Date: 11/20/2000

Semantic metadata
   Company Name: France Telecom, Equant
   Ticker Symbol: FTE, ENT
   Exchange: NYSE
   Topic: Company News

FTE: Company Analysis, Conference Calls, Earnings, Stock Analysis
ENT: Company Analysis, Conference Calls, Earnings, Stock Analysis
NYSE: Member Companies, Market News, IPOs
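
The step from standard to semantic metadata can be sketched as a knowledge-base lookup. This is an illustration under assumptions, not the Semantic Engine's actual method: the `KNOWLEDGE_BASE` structure, the field names, and the sample sentence are invented, while the company, ticker, and exchange values are the ones shown for Article 4715.

```python
# Hypothetical knowledge base; values mirror the Article 4715 example.
KNOWLEDGE_BASE = {
    "France Telecom": {"ticker": "FTE", "exchange": "NYSE"},
    "Equant": {"ticker": "ENT", "exchange": "NYSE"},
}

def enrich(standard_metadata, text):
    """Add semantic metadata for every known company mentioned in the text.
    Ticker and exchange come from the knowledge base, not from the text."""
    companies = [name for name in KNOWLEDGE_BASE if name in text]
    semantic = {
        "company_name": companies,
        "ticker_symbol": [KNOWLEDGE_BASE[c]["ticker"] for c in companies],
        "exchange": sorted({KNOWLEDGE_BASE[c]["exchange"] for c in companies}),
    }
    return {**standard_metadata, **semantic}

# Made-up sample sentence; it names the companies but no ticker or exchange.
article = "France Telecom said it will acquire a majority stake in Equant."
metadata = enrich(
    {"feed_source": "iSyndicate", "posted_date": "11/20/2000"}, article
)
```

The ticker symbols and exchange end up in the metadata even though the text never mentions them, which is the kind of enrichment the slide contrasts with standard metadata.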

Page 12

Technologies for Organizing Content

Information Retrieval/Document Indexing
TF-IDF/statistical, Clustering, LSI
Statistical learning/AI: Machine learning, Bayesian, Markov Chains, Neural Network
Lexical, Natural language
Thesaurus, Reference data, Domain models (Ontology)
Information Extractors
Reasoning/Inferencing: Logic based, Knowledge-based, Rule processing

Most powerful solutions require combining several of these, addressing more of the objectives
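
Of the techniques listed, TF-IDF is the easiest to show in a few lines. The sketch below uses one common variant (term frequency normalized by document length, times the natural log of inverse document frequency); other weightings exist, and the three toy documents are made up.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.
    weight = (count / document length) * ln(N / document frequency)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

docs = [
    "cisco reports earnings".split(),
    "lucent reports earnings".split(),
    "cisco launches router".split(),
]
w = tf_idf(docs)
```

In the third document, "router" gets a higher weight than "cisco" because it appears in only one document, so it is more useful for indexing that document.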

Page 13

Ontology

Standardizes meaning, description, representation of involved concepts/terms/attributes

Captures the semantics involved via domain characteristics, resulting in semantic metadata

“Ontological Commitment” forms basis for knowledge sharing and reuse

Ontology provides semantic underpinning.

Page 14

An Ontology

[Figure: example ontology for disasters, annotated with the four kinds of elements below.]

Hierarchies
   Disaster
      Natural Disaster
         Volcano
         Earthquake
      Man-made Disaster
         NuclearTest

Terms/Concepts (Attributes)
   eventDate, description, site, latitude, longitude, damage, numberOfDeaths, damagePhoto, magnitude, bodyWaveMagnitude, conductedBy, explosiveYield

Functional Dependencies (FDs)
   site => latitude, longitude

Domain Rules
   magnitude > 0, magnitude < 10
   bodyWaveMagnitude > 0, bodyWaveMagnitude < 10
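
One way to read the four elements of this ontology (hierarchies, attributes, functional dependencies, domain rules) is as a small class model. The sketch below is an assumption-laden illustration: which class owns which attribute is guessed, since the flattened figure does not show it, and the sample values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Disaster:
    event_date: str
    description: str
    site: str
    latitude: float
    longitude: float

@dataclass
class NaturalDisaster(Disaster):
    damage: Optional[str] = None           # assumed owner of this attribute
    number_of_deaths: Optional[int] = None

@dataclass
class ManMadeDisaster(Disaster):
    pass

@dataclass
class Earthquake(NaturalDisaster):
    magnitude: float = 1.0  # placeholder default

    def __post_init__(self):
        # Domain rule: magnitude > 0 and magnitude < 10
        if not 0 < self.magnitude < 10:
            raise ValueError("magnitude must be between 0 and 10")

@dataclass
class NuclearTest(ManMadeDisaster):
    conducted_by: str = ""
    explosive_yield: Optional[float] = None
    body_wave_magnitude: float = 1.0  # placeholder default

    def __post_init__(self):
        # Domain rule: bodyWaveMagnitude > 0 and bodyWaveMagnitude < 10
        if not 0 < self.body_wave_magnitude < 10:
            raise ValueError("bodyWaveMagnitude must be between 0 and 10")

def check_site_fd(events):
    """Functional dependency site => latitude, longitude:
    the same site must always map to the same coordinates."""
    seen = {}
    for event in events:
        coords = (event.latitude, event.longitude)
        if seen.setdefault(event.site, coords) != coords:
            return False
    return True

# Placeholder instances, not real events.
quake = Earthquake("2001-01-01", "sample quake", "site-A", 10.0, 20.0, magnitude=6.5)
test_event = NuclearTest("2001-01-02", "sample test", "site-B", 30.0, 40.0,
                         conducted_by="unknown", body_wave_magnitude=5.0)
fd_ok = check_site_fd([quake, test_event])
```

Class inheritance carries the hierarchy, the `__post_init__` checks carry the domain rules, and `check_site_fd` expresses the functional dependency as a constraint over a set of instances.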

Page 15

Controlled Vocabularies/ Classifications/Taxonomies/Ontologies

WordNet

Cyc

The Medical Subject Headings (MeSH): NLM's controlled vocabulary used for indexing articles, for cataloging books and other holdings, and for searching MeSH-indexed databases, including MEDLINE. MeSH terminology provides a consistent way to retrieve information that may use different terminology for the same concepts. Year 2000 MeSH includes more than 19,000 main headings, 110,000 Supplementary Concept Records (formerly Supplementary Chemical Records), and an entry vocabulary of over 300,000 terms.

Page 16

Open Directory Project (ODP): Classification/Taxonomy & Directory

Page 17

Example 1 – Snapshots (“Jamal Anderson”)

Search for ‘Jamal Anderson’ in ‘Football’

Click on first result for Jamal Anderson

View metadata. Note that Team name and League name are also included in the metadata

View the original source HTML page. Verify that the source page contains no mention of Team name and League name. They were Taalee’s value-additions to the metadata to facilitate easier search.

Page 18

Example 2 – Snapshots (“Gary Sheffield”)

Search for ‘Gary Sheffield’ in ‘Baseball’

Click on first result for Gary Sheffield

View metadata. Note that Team name and League name are also included in the metadata

View the original source HTML page. Verify that the source page contains no mention of Team name and League name. They were Taalee’s value-additions to the metadata to facilitate easier search.

Page 19

Semantic Web – Intelligent Content (supported by Taalee Semantic Engine)

[Figure: a central COMPANY node linked to related content. Node labels: Related Stock News; Industry News; Technology Products; Competition; Regulations (SEC, EPA). Link labels: COMPANIES in Same or Related INDUSTRY; COMPANIES in INDUSTRY with Competing PRODUCTS; Impacting INDUSTRY or Filed By COMPANY; Important to INDUSTRY or COMPANY.]

Intelligent Content = What You Asked for + What you need to know!

Page 20

Semantic Application – Equity Dashboard

[Figure: dashboard screenshot with the following callouts.]

Focused relevant content organized by topic (semantic categorization)
Automatic Content Aggregation from multiple content providers and feeds
Related news not specifically asked for (Semantic Associations)
Competitive research inferred automatically
Automatic 3rd party content integration

Page 21

Metadata centric Content Management Architecture

[Figure: architecture diagram. Labels: Internal Source 1 (Research); Internal Source 2; External feeds/Web (e.g. Reuters); Extractor Agent 1; Extractor Agent 2; Extractor Agent 3; Voquette Metabase; World Model; Semantic Engine; Third-party Content Mgmt And Syndication; Semantic Application (ASP/Enterprise hosted); XCM-compliant metadata, XML or other format; Story on Cisco; Story on Lucent.]

1. Cisco story from Source 1 passed on to add semantic associations
2. Consults Knowledge Base for Cisco’s competition
3. Returns result: Lucent is a competitor of Cisco
4. Lucent story from external feeds picked for publishing as “semantically related” to Cisco story, passed on to Dashboard
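
The Cisco/Lucent walk-through on this slide can be sketched as a lookup over a small knowledge base of relationships. This is a simplified illustration, not the actual Semantic Engine: the fact triples, the `competitor_of` relation name, the story dictionaries, and the "Acme" company are all hypothetical.

```python
# Hypothetical knowledge base of (subject, relation, object) facts.
FACTS = {
    ("Cisco", "competitor_of", "Lucent"),
    ("Lucent", "competitor_of", "Cisco"),
}

def competitors(company):
    """Consult the knowledge base for a company's competition."""
    return {o for s, r, o in FACTS if s == company and r == "competitor_of"}

def related_stories(story, candidates):
    """Return candidate stories that mention a competitor of any company
    in the given story, i.e. stories that are "semantically related"."""
    rivals = set()
    for company in story["companies"]:
        rivals |= competitors(company)
    return [c for c in candidates
            if c is not story and rivals & set(c["companies"])]

cisco_story = {"id": "s1", "source": "Internal Source 1", "companies": ["Cisco"]}
external_feed = [
    {"id": "e1", "source": "External feed", "companies": ["Lucent"]},
    {"id": "e2", "source": "External feed", "companies": ["Acme"]},
]
related_ids = [s["id"] for s in related_stories(cisco_story, external_feed)]
```

The Lucent story is selected even though it shares no keyword with the Cisco story; the link comes entirely from the knowledge base relationship.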

Page 22

Semantic Technology Features

Unstructured Text Content
Semi-Structured Content
Structured Content
Audio/Video Content with associated text (transcript, journalist notes)

Create a Customized "World Model" (Taxonomy Tree with customized domain attributes)
Automatically homogenize content feed tags
Automatically categorize unstructured text
Automatically create tags based on text itself
Create and maintain a Customized Knowledge Base for any domain
Automatically enhance content tags based on information beyond text
Build contextually relevant custom research applications
Contextual Search (an order of magnitude better than keyword-based search)
Support push or pull delivery/ingestion of content
Personalization/Alerts/Notifications
Real Time Indexing (stories indexed for search/personalization within a minute)
Provide the user with relevant information not explicitly asked for (Semantic Associations)

Page 23

Along with the evolution of metadata and semantic technologies enabling the next generation of the Web, Content Management has entered the next generation of Enhanced Content Management.

Page 24

Resources/References

RDF: www.w3.org/TR/REC-rdf-syntax/
ICE: www.icestandard.org
Meta Object Facility (MOF) Specification, Version 1.3, September 27, 1999: http://cgi.omg.org/cgi-bin/doc?ad/99-09-05
XML Metadata Interchange (XMI) Specification, Version 1.1, October 25, 1999: http://cgi.omg.org/cgi-bin/doc?ad/9910-02 http://cgi.omg.org/cgi-bin/doc?ad/99-10-03
DAML: www.daml.org
NEWSML: newsshowcase.reuters.com
PRISM: www.prismstandard.org/techdev/prismspec1.asp
RIXML: www.rixml.org
XCM: www.vignette.com
OIL: www.ontoknowledge.org/oil
SEMANTICWEB: www.semanticweb.org, business.semanticweb.org
VOICEXML: www.voicexml.org
MPEG7: www.darmstadt.gmd.de/mobile/MPEG7/
Taalee: www.taalee.com
Applied Semantics: www.appliedsemantics.com
Ontoprise: www.ontoprise.com

Page 25

Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media, Amit Sheth & Wolfgang Klas, Eds., McGraw Hill, ISBN: 0-07-057735-8, 1998.

Information Brokering, Vipul Kashyap & Amit Sheth, Kluwer Academic Publishers, 2001.

Voquette Semantic Technology White Paper.

Mysteries of Metadata, Speaker – Amit Sheth, Workshop at Content World 2001.

Infoquilt Project, LSDIS lab.

http://www.taalee.com http://lsdis.cs.uga.edu/~amit