Improving Search for Discovery
Tom Reamy, Chief Knowledge Architect
KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Improving Search for Discovery and Everything Else
Agenda
Introduction
What is Wrong With Search?
What Works?
– Metadata & taxonomies
– Infrastructure / Information Life Cycle
Yes, But:
– Missing Link – Text Analytics
– Search and Beyond
Conclusion
Introduction: KAPS Group
Knowledge Architecture Professional Services – Network of Consultants
Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
Services:
– Strategy – IM & KM – Text Analytics, Social Media, Integration
– Taxonomy/Text Analytics development, consulting, customization
– Text Analytics Quick Start – Audit, Evaluation, Pilot
– Social Media: Text-based applications – design & development
Partners: Smart Logic, Expert Systems, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc.
Presentations, Articles, White Papers – www.kapsgroup.com
Program Chair – Text Analytics World
Improving Search for Discovery
They Won’t Work!
Improving Search for Discovery: Why Won’t It Work?
Search Engines are Stupid! – (and people have better things to do)
Documents deal in language BUT it’s all chicken scratches to Search
Relevance requires meaning
– Imagine trying to understand what a document is about in a language you don’t know:
Mzndin agenpfre napae ponaoen afpenafpenae timtnoe.
– Dictionary of chicken scratches (variants, related)
– Count the number of chicken scratches = relevance – Not!
Google = popularity of web sites and Best Bets
– For documents in an enterprise – Counting and Weighting
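To make the "count the chicken scratches" point concrete, here is a minimal sketch of counting-and-weighting relevance in Python. A small variant dictionary folds surface forms together, and each query term contributes its count times a smoothed inverse document frequency (IDF). The variant list, the two sample sentences and the exact IDF formula are illustrative assumptions, not the scoring of any particular engine.

```python
import math
import re
from collections import Counter

# Illustrative variant dictionary: surface form -> canonical "scratch".
VARIANTS = {"trials": "trial", "humans": "human"}

def tokens(text):
    """Lowercase word tokens with known variants folded to one form."""
    return [VARIANTS.get(w, w) for w in re.findall(r"[a-z]+", text.lower())]

def relevance(query, doc, corpus):
    """Count each query term in doc and weight it by a smoothed IDF."""
    counts = Counter(tokens(doc))
    n = len(corpus)
    score = 0.0
    for term in set(tokens(query)):
        df = sum(1 for d in corpus if term in tokens(d))
        score += counts[term] * (math.log((n + 1) / (df + 1)) + 1)
    return score

corpus = [
    "Clinical trials in humans showed the drug was safe.",
    "Tennis trial, trial, trial: no humans allowed on court.",
]
query = "clinical trial humans"
for doc in corpus:
    print(round(relevance(query, doc, corpus), 2), doc)
```

The tennis note scores 4.0 against about 3.41 for the clinical abstract simply because it repeats "trial"; nothing in the arithmetic knows what either document is about.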
Improving Search for Discovery: Why Won’t It Work?
Option – Add metadata – good for archiving & indexing
Keywords – don’t scale
– Pilots or small doc sets and many authors
– Folksonomies don’t really work
Tagging – Governance – Thou Shalt Tag! – No, they won’t, or they tag really badly
Add taxonomies – beautiful to behold, but there is a gap between the taxonomy and the documents, and they are too complex for authors
Power Search – statistical signature of a document – apply all kinds of math = Find Similar!
Not trashing search, but just want to say:
– Survey Says – Users Unhappy with Search
– Text Analytics is (part of) the answer
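"Find Similar" usually means comparing statistical signatures of documents. A minimal sketch, assuming a plain bag-of-words signature and cosine similarity (real engines use richer weighting and math), ranks a small corpus against a seed text. The documents are made up for illustration.

```python
import math
import re
from collections import Counter

def signature(text):
    """Bag-of-words term counts used as a crude document signature."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def find_similar(seed, corpus, k=3):
    """Return the k corpus documents whose signatures are closest to seed."""
    s = signature(seed)
    return sorted(corpus, key=lambda d: cosine(s, signature(d)), reverse=True)[:k]

docs = [
    "Customer will cancel his account next month.",
    "Banana bread recipe with walnuts.",
    "Subscriber wants to terminate the contract.",
]
print(find_similar("customer wants to cancel account", docs, k=2))
```

The third document ranks second only because it shares "wants to" with the seed, not because the math knows that terminating a contract means cancelling an account: the same meaning gap described above.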
Semantic Infrastructure: Text Analytics Features
Text Mining – NLP, machine learning, complex statistics
Noun Phrase Extraction – Feed facets
– People, Organizations, Dates, Geographic, Methods, etc.
– Catalogs with variants, rule-based, dynamic
Sentiment Analysis – Positive and Negative Phrases
– Dictionaries & rules – “I hate your product”
Summarization – replace snippets
Ontologies – fact extraction + reasoning about relationships
Auto-categorization – built on a taxonomy
– Training sets, Terms, Semantic Networks
– Rules: AND, OR, NOT, DIST, PARAGRAPH, SENTENCE
– Foundation – subjects, disambiguation, add intelligence to all
Case Study – Categorization & Sentiment
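Dictionary-and-rule sentiment, as in the “I hate your product” example, can be sketched in a few lines of Python. The word lists and the single rule (flip polarity when a negator appears within the two words before a sentiment term) are assumptions for illustration; production dictionaries and rule sets are far larger.

```python
import re

# Illustrative sentiment dictionaries (assumed, deliberately tiny).
POSITIVE = {"love", "like", "great", "excellent", "happy"}
NEGATIVE = {"hate", "awful", "broken", "terrible", "upset"}
NEGATORS = {"not", "never", "don't", "dont"}

def sentiment(text):
    """Sum +1/-1 per dictionary hit, flipping sign after a nearby negator."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, w in enumerate(words):
        polarity = 1 if w in POSITIVE else -1 if w in NEGATIVE else 0
        if polarity and any(x in NEGATORS for x in words[max(0, i - 2):i]):
            polarity = -polarity  # rule: negation within two preceding words
        score += polarity
    return score

print(sentiment("I hate your product"))         # prints -1
print(sentiment("I don't hate it, I love it"))  # prints 2
```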
Improving Search: Adding Meaning and Structure
Text Analytics and Taxonomy Together
– Text Analytics provides the power to apply the taxonomy
– And metadata of all kinds
– Consistent in every dimension, powerful and economical
Hybrid Model
– Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata -> present to author
– Cognitive task is simple -> react to a suggestion instead of selecting from your head or from a complex taxonomy
– Feedback – if the author overrides -> suggestion for a new category
– Facets – require a lot of metadata – Entity Extraction feeds facets
Hybrid – Automatic is really a spectrum – depends on context
– Automatic – adding structure at search results
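The hybrid publishing loop can be sketched as follows. The catalog, the category trigger terms and the function names (`suggest`, `publish`) are hypothetical; the point is the shape of the workflow: text analytics proposes categories and catalog entities, the author reacts, and any override is logged as a candidate new category.

```python
import re
from dataclasses import dataclass, field

# Hypothetical entity catalog with variants (entity extraction feeds facets).
CATALOG = {
    "Genentech": ["genentech", "genentech inc"],
    "FDA": ["fda", "food and drug administration"],
}
# Hypothetical taxonomy nodes with trigger terms.
CATEGORIES = {
    "Clinical Research": {"trial", "trials", "dosage"},
    "Regulation": {"approval", "compliance", "fda"},
}

@dataclass
class Feedback:
    """Author overrides collected as candidate new categories."""
    new_category_candidates: list = field(default_factory=list)

def suggest(text):
    """Text analytics step: propose categories and catalog entities."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    categories = {c for c, terms in CATEGORIES.items() if words & terms}
    entities = {
        name for name, variants in CATALOG.items()
        if any(re.search(r"\b" + re.escape(v) + r"\b", lowered) for v in variants)
    }
    return categories, entities

def publish(text, author_categories, feedback):
    """Author reacts to suggestions; None means 'accept as suggested'."""
    suggested, entities = suggest(text)
    final = suggested if author_categories is None else set(author_categories)
    feedback.new_category_candidates.extend(sorted(final - suggested))
    return {"categories": final, "entities": entities}

doc = "Genentech submitted trial results to the Food and Drug Administration."
fb = Feedback()
print(publish(doc, None, fb))          # author accepts the suggestions
print(publish(doc, {"Oncology"}, fb))  # author overrides
print(fb.new_category_candidates)      # ['Oncology']
```

The author's cognitive task is only to accept or change a suggestion, and the override itself becomes feedback for taxonomy maintenance.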
Improving Search: Adding Meaning and Structure
Documents are not unstructured – they have a variety of structures
Categorization by page, sections (text markers), or even sentence or phrase
Use generic components – like the level of generality of terms or concepts (general and context-specific)
Additional metadata – document types, purpose, authors
Relevance – complex rules based on structure (intelligent use of titles, headlines, sections) + complex categorization
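One simple way to use document structure in relevance is to weight a match by where it occurs. In this Python sketch the section names and weights are assumptions chosen for illustration, and documents are assumed to arrive already split into named sections (for example by text markers).

```python
import re

# Assumed weights: a mention in a title counts far more than one in body text.
SECTION_WEIGHTS = {"title": 5.0, "headline": 3.0, "abstract": 2.0, "body": 1.0}

def structured_score(doc_sections, term):
    """Weighted count of term across the named sections of one document."""
    term = term.lower()
    score = 0.0
    for section, text in doc_sections.items():
        hits = re.findall(r"[a-z]+", text.lower()).count(term)
        score += SECTION_WEIGHTS.get(section, 1.0) * hits
    return score

a = {"title": "Text analytics for search", "body": "We review tools."}
b = {"title": "Quarterly report", "body": "search search search"}
print(structured_score(a, "search"), structured_score(b, "search"))
```

A single title mention outweighs three passing body mentions: a very rough first step toward counting major mentions rather than every mention.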
Improving Search: Document Type Rules
(START_2000, (AND, (OR, _/article:"[Abstract]",
_/article:"[Methods]"), (OR,_/article:"clinical trial*", _/article:"humans",
(NOT, (DIST_5, (OR,_/article:"approved", _/article:"safe", _/article:"use", _/article:"animals"),
If the article has sections like Abstract or Methods, AND has phrases around “clinical trials / humans”, AND does NOT have words like “animals” within 5 words of the “clinical trial” words, count it and add to a relevancy score
Primary issue – major mentions, not every mention
– Combination of noun phrase extraction and categorization
– Results – virtually 100%
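The plain-English reading of the rule above can be approximated in Python. This is not the rule engine's syntax or semantics. Assumptions: START_2000 means "within the first 2,000 characters", DIST_5 means "at most five words apart", the [Abstract] and [Methods] section markers are approximated by those words appearing in the text, and only "animals" is used as the exclusion term, following the explanation rather than the full (truncated) rule.

```python
import re

TRIAL_TERMS = {"clinical", "trial", "trials", "humans"}
TRIAL_ANCHORS = {"trial", "trials"}
EXCLUDED_NEAR_TRIAL = {"animals"}          # assumed; the slide's rule lists more
SECTION_MARKERS = {"abstract", "methods"}  # stand-ins for [Abstract]/[Methods]

def positions(words, vocab):
    """Word indexes at which any term from vocab occurs."""
    return [i for i, w in enumerate(words) if w in vocab]

def dist(words, vocab_a, vocab_b, n):
    """DIST_n (assumed): some term from vocab_a within n words of one from vocab_b."""
    return any(abs(i - j) <= n
               for i in positions(words, vocab_a)
               for j in positions(words, vocab_b))

def clinical_trial_score(text, start=2000):
    """Relevancy score for the 'human clinical trial' document type."""
    words = re.findall(r"[a-z]+", text[:start].lower())  # START_2000 (assumed)
    if not SECTION_MARKERS & set(words):
        return 0
    if dist(words, EXCLUDED_NEAR_TRIAL, TRIAL_ANCHORS, 5):
        return 0
    return len(positions(words, TRIAL_TERMS))

print(clinical_trial_score("Abstract: A clinical trial in humans found the drug effective."))
print(clinical_trial_score("Abstract: A clinical trial in animals found the drug effective."))
```

Every qualifying mention adds to the score here; the slide's point that what matters is major mentions, not every mention, is what noun phrase extraction and categorization add on top of a rule like this.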
Need One More Piece: Smart Semantic Infrastructure
Integrate the entire information life cycle & environment
Semantic Layer = Content, Taxonomies, Metadata, Vocabularies + Text Analytics
– Integrated / Federated Search – all content
Technology Layer
– Search, Content Management, SharePoint, Intranets
People – communities (formal and dynamic), business processes (embedded information needs and behaviors)
Publishing process
– Hybrid human/automatic structure (tagging)
Feedback is essential – from direct user comments to deep analytics
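Federated search has to merge result lists whose scores are not comparable across sources. A common, simple approach (assumed here, not prescribed by the slide) is to min-max normalize each source's scores before merging. The source names, document IDs and scores below are made up for illustration.

```python
def normalize(results):
    """Min-max scale one source's (doc_id, score) pairs into [0, 1]."""
    if not results:
        return []
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [(d, (s - lo) / span if span else 1.0) for d, s in results]

def federate(sources):
    """Merge normalized lists; a doc found in several sources keeps its best score."""
    best = {}
    for name, results in sources.items():
        for doc_id, score in normalize(results):
            if score > best.get(doc_id, (-1.0, None))[0]:
                best[doc_id] = (score, name)
    return sorted(best.items(), key=lambda kv: kv[1][0], reverse=True)

sources = {
    "sharepoint": [("policy.docx", 12.0), ("memo.docx", 3.0)],
    "intranet": [("faq.html", 0.9), ("policy.docx", 0.4)],
}
for doc_id, (score, source) in federate(sources):
    print(f"{score:.2f}  {doc_id}  ({source})")
```

Min-max scaling is crude (a source that returns a single hit always scores 1.0), which is one reason a shared semantic layer of categories and metadata across sources helps more than score arithmetic alone.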
Search Can Work!
Simple Subject Taxonomy structure – Easy to develop and maintain
Combined with categorization capabilities
– Added power and intelligence
Combined with Faceted Metadata
– Dynamic selection of simple categories
– Allow multiple user perspectives
• Can’t predict all the ways people think
• Monkey, Banana, Panda
Combined with ontologies and semantic data
– Multiple applications – Text mining to Search
Combined with feedback before and after Search
ROI is enormous – $7M per 1,000 employees a year
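Faceted navigation lets each user slice the same content along the dimension they think in. A minimal Python sketch, with made-up records and facet names: filter by the current selections, then recompute value counts for every facet so the interface can offer the next choices.

```python
from collections import Counter

# Made-up records; each facet is one metadata field.
DOCS = [
    {"id": 1, "subject": "Primates", "diet": "Fruit", "region": "Africa"},
    {"id": 2, "subject": "Primates", "diet": "Leaves", "region": "Asia"},
    {"id": 3, "subject": "Bears", "diet": "Leaves", "region": "Asia"},
    {"id": 4, "subject": "Bears", "diet": "Fish", "region": "North America"},
]
FACETS = ("subject", "diet", "region")

def facet_search(docs, selections):
    """Return matching docs and per-facet value counts over those docs."""
    hits = [d for d in docs if all(d[f] == v for f, v in selections.items())]
    counts = {f: Counter(d[f] for d in hits) for f in FACETS}
    return hits, counts

hits, counts = facet_search(DOCS, {"diet": "Leaves"})
print([d["id"] for d in hits])  # [2, 3]
print(counts["subject"])
```

The same records group differently depending on which facet a user starts from, in the spirit of the Monkey, Banana, Panda example: nobody has to predict in advance which grouping a given user will want.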
Enterprise Text Analytics – Building on the Foundation: Applications
Focus on business value, cost cutting
Enhancing information access is a means, not an end
– Governance, Records Management, Doc duplication, Compliance
– Business Intelligence, CI, Behavior Prediction
– eDiscovery, litigation support, Risk Management
– Productivity / Portals – KM communities & knowledge bases
Sentiment Analysis, Social Media Analysis
– Adding Search-based intelligence – context
– New taxonomies – emotion, Appraisal
Beyond Search: Info Apps
Search-based Applications Plus
Legal Review
– Significant trend – computer-assisted review
– TA – categorize and filter to a smaller, more relevant set
– Payoff is big – One firm with 1.6M docs saved $2M
Expertise Location
– Data (HR) plus text – authored documents – subject & level
Financial Services
– Combine structured data (what) and unstructured text (why)
– Anti-Money Laundering
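Expertise location combines structured HR data with the subjects of what people write. A Python sketch, assuming authored documents have already been auto-categorized by subject and using a naive "level" heuristic (number of documents per subject) that is purely illustrative. All names and records are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical HR records (structured data).
HR = {"alice": {"dept": "R&D"}, "bob": {"dept": "Legal"}}

# Hypothetical authored documents, already auto-categorized: (author, subject).
AUTHORED = [
    ("alice", "Clinical Research"), ("alice", "Clinical Research"),
    ("alice", "Regulation"), ("bob", "Regulation"),
]

def expertise_index(hr, authored, expert_at=2):
    """Map subject -> list of (person, dept, level) built from authored docs."""
    per_person = defaultdict(Counter)
    for person, subject in authored:
        per_person[person][subject] += 1
    index = defaultdict(list)
    for person, subjects in per_person.items():
        for subject, n in subjects.items():
            level = "expert" if n >= expert_at else "familiar"
            index[subject].append((person, hr[person]["dept"], level))
    return index

print(expertise_index(HR, AUTHORED)["Regulation"])
```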
Beyond Search: Info Apps – Behavior Prediction – Telecom Customer Service
Problem – distinguish customers likely to cancel from mere threats
Basic Rule
– (START_20, (AND, (DIST_7,"[cancel]", "[cancel-what-cust]"),
– (NOT,(DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
Examples:
– customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
– cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
More sophisticated analysis of text and context in text
Combine text analytics with Predictive Analytics and traditional behavior monitoring for new applications
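A Python sketch of the basic rule above, run on the two example notes. Assumptions: the contents of the bracketed word classes are invented for illustration; [one-line] is left out because its contents (probably a phrase) are not given; DIST_n means "at most n words apart"; and the START_20 positional constraint is omitted because its exact meaning depends on the rule engine.

```python
import re

# Hypothetical contents for the rule's word classes.
CANCEL = {"cancel", "cancell", "cancelled", "canceling", "cancellation"}
CANCEL_WHAT_CUST = {"account", "acct", "act", "service", "contract"}
NOT_NEAR_CANCEL = {"if", "restore", "reinstate"}  # [restore] and [if]; [one-line] omitted

def positions(words, vocab):
    """Word indexes at which any term from vocab occurs."""
    return [i for i, w in enumerate(words) if w in vocab]

def near(a, b, n):
    """DIST_n (assumed): some pair of hits at most n words apart."""
    return any(abs(i - j) <= n for i in a for j in b)

def likely_to_cancel(note):
    """(DIST_7 cancel, cancel-what-cust) AND NOT (DIST_10 cancel, exclusions)."""
    words = re.findall(r"[a-z]+", note.lower())
    cancel = positions(words, CANCEL)
    return (near(cancel, positions(words, CANCEL_WHAT_CUST), 7)
            and not near(cancel, positions(words, NOT_NEAR_CANCEL), 10))

notes = [
    "customer called to say he will cancell his account if the does not "
    "stop receiving a call from the ad agency.",
    "cci and is upset that he has the asl charge and wants it off or her "
    "is going to cancel his act",
]
for note in notes:
    print(likely_to_cancel(note), note[:40])
```

Under these assumptions the first note is filtered out by the nearby "if", while the second is flagged, even though "wants it off or her is going to cancel" reads as a conditional threat too. That gap is exactly why the slide calls for more sophisticated analysis of text and context.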
Beyond Search: Info Apps – Pronoun Analysis: Fraud Detection – Enron Emails
Patterns of “Function” words reveal a wide range of insights
Function words = pronouns, articles, prepositions, conjunctions, etc.
– Used at a high rate, short and hard to detect, very social, processed in the brain differently than content words
Areas: sex, age, power-status, personality – individuals and groups
Lying / Fraud detection: Documents with lies have
– Fewer and shorter words, fewer conjunctions, more positive emotion words
– More use of “if, any, those, he, she, they, you”, less “I”
– More social and causal words, more discrepancy words
Current research – 76% accuracy in some contexts
Text Analytics can improve accuracy and utilize new sources
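Several of the signals listed above are simple rates that can be computed directly. This Python sketch only measures features (word count, mean word length, conjunction rate, the rate of "if, any, those, he, she, they, you", and the rate of "I"); it makes no fraud prediction, and the conjunction list is an assumption. The 76% figure on the slide refers to the research it mentions, not to anything like this code.

```python
import re

# Marker words from the slide; the conjunction list is an illustrative assumption.
DECEPTION_MARKERS = {"if", "any", "those", "he", "she", "they", "you"}
CONJUNCTIONS = {"and", "but", "or", "so", "because", "yet", "nor"}

def function_word_profile(text):
    """Per-word rates of some function-word features for one document."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words)
    if n == 0:
        return {}
    return {
        "word_count": n,
        "mean_word_length": sum(len(w) for w in words) / n,
        "conjunction_rate": sum(w in CONJUNCTIONS for w in words) / n,
        "marker_rate": sum(w in DECEPTION_MARKERS for w in words) / n,
        "i_rate": words.count("i") / n,
    }

print(function_word_profile("I think they said you did it and I agree"))
```

Profiles like this become useful only when compared across many documents or authors, which is where combining them with categorization and other text analytics comes in.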
Conclusions
Traditional Search improvements – nice, but:
– Relevance needs meaning
– Keywords and human tagging don’t work
Search + Text Analytics + Semantic Infrastructure work
Text Analytics is THE essential component of a multi-modal solution
Semantic Infrastructure
– Content, People, Technology, Processes
– Integration of text analytics, search, content management
– Hybrid Model of tagging – best of human & machine
Smart Search as the foundation for a new universe of Apps = Success beyond your wildest dreams!
Conclusions
Now You Believe!
So, what next – how can you get started?
Quick Start – software evaluation, Knowledge Map, POC or Pilot = Good choice and Learn by doing
Fall – Attend ESS, TBC, KMWorld – latest ideas
Or develop a time machine, go back to yesterday and take my workshop
Fall 2014 – early 2015: New Book:
– Text Analytics: Everything You Need to Know to Conquer Information Overload, Mine Social Media for Real Value, and Turn Big Text Into Big Data
– The title might be shorter, but it will cover all you need to know
Questions?