Applying Semantics to Search Text Analytics Tom Reamy Chief Knowledge Architect KAPS Group http://www.kapsgroup.com Enterprise Search Summit New York

Page 1: Applying Semantics to Search Text Analytics

Applying Semantics to Search
Text Analytics

Tom Reamy
Chief Knowledge Architect
KAPS Group
http://www.kapsgroup.com

Enterprise Search Summit
New York

Page 2: Applying Semantics to Search Text Analytics

Agenda
Introduction – Search, Semantics, Text Analytics
– How do you mean?
Getting (Re)Started with Text Analytics – 3 ½ steps
Preliminary: Strategic Vision
– What is text analytics and what can it do?
Step 1: Self Knowledge – TA Audit
Step 2: Text Analytics Software Evaluation
Step 3: POC / Quick Start – Pilot to Development
Rest of your Life: Refinement, Feedback, Learning
Conclusions

Page 3: Applying Semantics to Search Text Analytics

KAPS Group: General
Knowledge Architecture Professional Services – Network of Consultants
Partners – SAS, SAP, IBM, FAST, Smart Logic, Concept Searching
– Attensity, Clarabridge, Lexalytics
Strategy – IM & KM - Text Analytics, Social Media, Integration
Services:
– Taxonomy/Text Analytics development, consulting, customization
– Text Analytics Fast Start – Audit, Evaluation, Pilot
– Social Media: Text based applications – design & development
Clients:
– Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, etc.
Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
Presentations, Articles, White Papers – http://www.kapsgroup.com

Page 4: Applying Semantics to Search Text Analytics

Introduction: Search, Semantics, Text Analytics
What do you mean?
All Search is (should be) semantic
– Humans search concepts not chicken scratches
Is this semantics?
– NLP, Concept Search, Semantic Web (ontologies)
Meaning in Text
– Text Analytics – categorization
– Extraction – noun phrase, facts-triples
Meaning from Search Results
– A conversation, not a list of ranked (poorly) documents

Page 5: Applying Semantics to Search Text Analytics

What is Text Analytics?
Text Analytics Features
Noun Phrase Extraction
– Catalogs with variants, rule based dynamic
– Multiple types, custom classes – entities, concepts, events
– Feeds facets
Summarization
– Customizable rules, map to different content
Fact Extraction
– Relationships of entities – people-organizations-activities
– Ontologies – triples, RDF, etc.
Sentiment Analysis
– Rules & statistical
– Objects, products, companies, and phrases
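
To make "catalogs with variants" concrete, here is a minimal Python sketch of catalog-based entity extraction that feeds facets. The catalog entries, class names, and the `extract_entities` function are illustrative assumptions, not taken from any vendor product; real tools add rule-based and statistical extraction on top of this.

```python
import re
from collections import defaultdict

# Hypothetical catalog: canonical entity -> (entity class, known variants).
CATALOG = {
    "International Business Machines": ("organization", ["IBM", "I.B.M."]),
    "Food and Drug Administration": ("organization", ["FDA"]),
    "New York": ("place", ["NYC", "New York City"]),
}


def extract_entities(text):
    """Return {entity class: sorted list of canonical names found in text}."""
    found = defaultdict(set)
    for canonical, (entity_class, variants) in CATALOG.items():
        for surface in [canonical, *variants]:
            # Match whole tokens only, so "FDA" does not fire inside a longer word.
            pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
            if re.search(pattern, text):
                found[entity_class].add(canonical)
                break
    return {cls: sorted(names) for cls, names in found.items()}


facets = extract_entities("The FDA met with I.B.M. staff in NYC.")
print(facets)
```

Every variant maps back to one canonical name, so the facet shows a single value per entity no matter how the author spelled it.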

Page 6: Applying Semantics to Search Text Analytics

What is Text Analytics?
Text Analytics Features
Auto-categorization
– Training sets – Bayesian, Vector space
– Terms – literal strings, stemming, dictionary of related terms
– Rules – simple – position in text (Title, body, url)
– Semantic Network – Predefined relationships, sets of rules
– Boolean – Full search syntax – AND, OR, NOT
– Advanced – DIST(#), ORDDIST#, PARAGRAPH, SENTENCE
This is the most difficult to develop
Build on a Taxonomy
Combine with Extraction
– If any of list of entities and other words
– Disambiguation - Ford
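
As a rough illustration of a Boolean rule with a proximity operator, the Python sketch below disambiguates "Ford". It assumes a DIST-style operator means "two terms within N tokens of each other"; the exact semantics differ by vendor, and the cue words and category names are invented for the example.

```python
import re


def tokens(text):
    """Lowercase word tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def within_distance(toks, term_a, term_b, max_gap):
    """True if term_a and term_b occur at most max_gap tokens apart."""
    pos_a = [i for i, t in enumerate(toks) if t == term_a]
    pos_b = [i for i, t in enumerate(toks) if t == term_b]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


def categorize_ford(text):
    """Assign a 'Ford' mention to a hypothetical category, or None."""
    toks = tokens(text)
    if "ford" not in toks:
        return None
    company_cues = {"motor", "truck", "dealer", "mustang"}  # assumed cue list
    person_cues = {"president", "gerald", "harrison", "actor"}  # assumed cue list
    # Rule: ford AND (company cue within 5 tokens) AND NOT (any person cue)
    near_company = any(within_distance(toks, "ford", c, 5) for c in company_cues)
    has_person = any(c in toks for c in person_cues)
    if near_company and not has_person:
        return "Automotive/Ford Motor Company"
    if has_person:
        return "People/Ford"
    return None


print(categorize_ford("The Ford dealer recalled every truck."))
print(categorize_ford("President Gerald Ford signed the bill."))
```

A text that mentions "ford" with no cue on either side stays uncategorized, which is usually safer than guessing.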

Page 7: Applying Semantics to Search Text Analytics

Case Study – Categorization & Sentiment

Page 8: Applying Semantics to Search Text Analytics

Case Study – Categorization & Sentiment

Page 14: Applying Semantics to Search Text Analytics

Preliminary: Text Analytics Vision
What can Text Analytics Do?
Strategic Questions – why, what value from the text analytics, how are you going to use it
– Platform or Applications?
What are the basic capabilities of Text Analytics?
What can Text Analytics do for Search?
– After 10 years of failure – get search to work?
What can you do with smart search based applications?
– RM, PII, Social
ROI for effective search – difficulty of believing
– Problems with metadata, taxonomy

Page 15: Applying Semantics to Search Text Analytics

Preliminary: Text Analytics Vision
Adding Structure to Unstructured Content
How do you bridge the gap – taxonomy to documents?
Tagging documents with taxonomy nodes is tough
– And expensive – central or distributed
Library staff – experts in categorization not subject matter
– Too limited, narrow bottleneck
– Often don’t understand business processes and business uses
Authors – Experts in the subject matter, terrible at categorization
– Intra and Inter inconsistency, “intertwingleness”
– Choosing tags from taxonomy – complex task
– Folksonomy – almost as complex, wildly inconsistent
– Resistance – not their job, cognitively difficult = non-compliance
Text Analytics is the answer(s)!

Page 16: Applying Semantics to Search Text Analytics

Preliminary: Text Analytics Vision
Adding Structure to Unstructured Content
Text Analytics and Taxonomy Together – Platform
– Text Analytics provides the power to apply the taxonomy
– And metadata of all kinds
– Consistent in every dimension, powerful and economic
Hybrid Model
– Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata -> present to author
– Cognitive task is simple -> react to a suggestion instead of select from head or a complex taxonomy
– Feedback – if author overrides -> suggestion for new category
– Facets – Requires a lot of Metadata - Entity Extraction feeds facets
Hybrid – Automatic is really a spectrum – depends on context
– Automatic – adding structure at search results
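
The hybrid loop above (publish, analyze, suggest, author reacts, overrides feed back) could be sketched in Python roughly as follows. The keyword rules stand in for a real text analytics engine, and `TaggingSession`, `suggest_categories`, and `RULES` are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaggingSession:
    """Tracks suggestions shown to an author and what the author did with them."""
    suggested: list
    accepted: list = field(default_factory=list)
    new_category_candidates: list = field(default_factory=list)

    def review(self, author_choice):
        """Record the author's final tags; overrides become taxonomy feedback."""
        for tag in author_choice:
            if tag in self.suggested:
                self.accepted.append(tag)
            else:
                # The author overrode the suggestions: queue for taxonomy review.
                self.new_category_candidates.append(tag)


def suggest_categories(text, rules):
    """Stand-in for the text analytics step: keyword rules per category."""
    lowered = text.lower()
    return [cat for cat, words in rules.items() if any(w in lowered for w in words)]


RULES = {"Search": ["search", "query"], "Taxonomy": ["taxonomy", "facet"]}  # hypothetical
doc = "Faceted search needs a taxonomy and clean metadata."
session = TaggingSession(suggested=suggest_categories(doc, RULES))
session.review(["Search", "Metadata"])
print(session.accepted, session.new_category_candidates)
```

The author only reacts to a short list; anything typed outside that list is kept as a candidate for a new category instead of being silently dropped.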

Page 17: Applying Semantics to Search Text Analytics

Step 1: TA Information Audit
Start with Self Knowledge
Info Problems – what, how severe
Formal Process - KA audit – content, users, technology, business and information behaviors, applications - Or informal for smaller organizations
Contextual interviews, content analysis, surveys, focus groups, ethnographic studies, Text Mining
Category modeling – Cognitive Science – how people think
Natural level categories mapped to communities, activities
• Novices prefer higher levels
• Balance of informativeness and distinctiveness
Text Analytics Strategy/Model – forms, technology, people

Page 18: Applying Semantics to Search Text Analytics

Step 1: TA Information Audit
Start with Self Knowledge
Ideas – Content and Content Structure
– Map of Content – Tribal language silos
– Structure – articulate and integrate
– Taxonomic resources
People – Producers & Consumers
– Communities, Users, Central Team
Activities – Business processes and procedures
– Semantics, information needs and behaviors
– Information Governance Policy
Technology – CMS, Search, portals, text analytics
– Applications – BI, CI, Semantic Web, Text Mining

Page 19: Applying Semantics to Search Text Analytics

Step 2: TA Evaluation
Varieties of Taxonomy/ Text Analytics Software
Taxonomy Management - extraction
Full Platform
– SAS, SAP, Smart Logic, Concept Searching, Expert System, IBM, Linguamatics, GATE
Embedded – Search or Content Management
– FAST, Autonomy, Endeca, Vivisimo, NLP, etc.
– Interwoven, Documentum, etc.
Specialty / Ontology (other semantic)
– Sentiment Analysis – Attensity, Lexalytics, Clarabridge, Lots
– Ontology – extraction, plus ontology

Page 20: Applying Semantics to Search Text Analytics

Step 2: Text Analytics Evaluation
Different Kind of software evaluation
Traditional Software Evaluation - Start
– Filter One - Ask Experts - reputation, research – Gartner, etc.
• Market strength of vendor, platforms, etc.
• Feature scorecard – minimum, must have, filter to top 6
– Filter Two – Technology Filter – match to your overall scope and capabilities – Filter not a focus
– Filter Three – In-Depth Demo – 3-6 vendors
Reduce to 1-3 vendors
Vendors have different strengths in multiple environments
– Millions of short, badly typed documents, Build application
– Library 200 page PDF, enterprise & public search
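
One possible reading of the Filter One feature scorecard is sketched below in Python: drop any vendor that misses a must-have feature, then keep the top scorers. The vendor names, the 0-3 scoring scale, and the `shortlist` function are assumptions for illustration; the slides do not specify how the scorecard is computed.

```python
def shortlist(vendors, must_have, top_n=6):
    """Drop vendors missing any must-have feature, then rank by total score."""
    qualified = [
        v for v in vendors
        if all(v["features"].get(f, 0) > 0 for f in must_have)
    ]
    ranked = sorted(qualified, key=lambda v: sum(v["features"].values()), reverse=True)
    return [v["name"] for v in ranked[:top_n]]


# Hypothetical vendors and 0-3 feature scores, for illustration only.
VENDORS = [
    {"name": "Vendor A", "features": {"categorization": 3, "extraction": 2, "sentiment": 0}},
    {"name": "Vendor B", "features": {"categorization": 2, "extraction": 3, "sentiment": 2}},
    {"name": "Vendor C", "features": {"categorization": 0, "extraction": 3, "sentiment": 3}},
]
print(shortlist(VENDORS, must_have=["categorization", "extraction"]))
```

A scorecard like this is only a filter: it narrows the field for the in-depth demos and does not pick the winner.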

Page 21: Applying Semantics to Search Text Analytics

Design of the Text Analytics Selection Team
Traditional Candidates – IT, Business, Library
IT - Experience with software purchases, needs assessment, budget
– Search/Categorization is unlike other software, deeper look
Business - understand business, focus on business value
They can get executive sponsorship, support, and budget
– But don’t understand information behavior, semantic focus
Library, KM - Understand information structure
Experts in search experience and categorization
– But don’t understand business or technology

Page 22: Applying Semantics to Search Text Analytics

Design of the Text Analytics Selection Team

Interdisciplinary Team, headed by Information Professionals
Relative Contributions
– IT – Set necessary conditions, support tests
– Business – provide input into requirements, support project
– Library – provide input into requirements, add understanding of search semantics and functionality
Much more likely to make a good decision
Create the foundation for implementation

Page 23: Applying Semantics to Search Text Analytics

Step 3: Proof of Concept / Pilot Project

4 weeks POC – bake off / or short pilot
Real life scenarios, categorization with your content
2 rounds of development, test, refine / Not OOB
Need SMEs as test evaluators – also to do an initial categorization of content
Measurable Quality of results is the essential factor
Majority of time is on auto-categorization
Need to balance uniformity of results with vendor unique capabilities – have to determine at POC time
Taxonomy Developers – expert consultants plus internal taxonomists

Page 24: Applying Semantics to Search Text Analytics

Step 3: Proof of Concept
POC Design: Evaluation Criteria & Issues
Basic Test Design – categorize test set
– Score – by file name, human testers
Categorization & Sentiment – Accuracy 80-90%
– Effort Level per accuracy level
Quantify development time – main elements
Comparison of two vendors – how score?
– Combination of scores and report
Quality of content & initial human categorization
– Normalize among different test evaluators
Quality of taxonomists – experience with text analytics software and/or experience with content and information needs and behaviors
Quality of taxonomy – structure, overlapping categories
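
A minimal Python sketch of the basic test design follows: score each vendor's categorization against the human labels, keyed by file name, and check how far two human evaluators agree before trusting either as the reference. The file names, labels, and function names are hypothetical; a real POC would also record effort per accuracy level.

```python
def accuracy(predicted, gold):
    """Share of documents whose predicted category matches the human label."""
    shared = predicted.keys() & gold.keys()
    if not shared:
        return 0.0
    hits = sum(1 for doc in shared if predicted[doc] == gold[doc])
    return hits / len(shared)


def agreement(evaluator_a, evaluator_b):
    """Share of commonly labeled documents on which two evaluators agree."""
    return accuracy(evaluator_a, evaluator_b)


# Hypothetical test set keyed by file name.
gold = {"a.txt": "HR", "b.txt": "Legal", "c.txt": "HR", "d.txt": "Finance", "e.txt": "Legal"}
vendor_1 = {"a.txt": "HR", "b.txt": "Legal", "c.txt": "HR", "d.txt": "Finance", "e.txt": "HR"}
vendor_2 = {"a.txt": "HR", "b.txt": "HR", "c.txt": "HR", "d.txt": "Legal", "e.txt": "Legal"}
second_sme = {"a.txt": "HR", "b.txt": "Legal", "c.txt": "Finance", "d.txt": "Finance"}

print(accuracy(vendor_1, gold))   # 4 of 5 match
print(accuracy(vendor_2, gold))   # 3 of 5 match
print(agreement(gold, second_sme))  # 3 of the 4 shared files match
```

Low agreement between evaluators suggests the reference labels need reconciling before the vendor scores mean much.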

Page 25: Applying Semantics to Search Text Analytics

Step 3: Proof of Concept
POC and Early Development: Risks and Issues
CTO Problem – This is not a regular software process
Semantics is messy not just complex
– 30% accuracy isn’t 30% done – could be 90%
Variability of human categorization
Categorization is iterative, not “the program works”
– Need realistic budget and flexible project plan
Anyone can do categorization
– Librarians often overdo, SMEs often get lost (keywords)
Meta-language issues – understanding the results
– Need to educate IT and business in their language

Page 26: Applying Semantics to Search Text Analytics

Step 3: Proof of Concept / Quick Start
Outcomes
POC – understand how text analytics can work in your environment
Learn the software – internal resources trained by doing
Learn the language – syntax (Advanced Boolean)
Learn categorization and extraction
Good categorization rules
– Balance of general and specific
– Balance of recall and precision
Develop or refine taxonomies for categorization
POC – can be the Quick Start or the Start of the Quick Start
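
The balance of recall and precision can be shown with a small Python sketch. Precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that were retrieved. The document ids and the two example rules are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one category, given sets of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {1, 2, 3, 4}                 # documents a human placed in the category
broad_rule = {1, 2, 3, 4, 5, 6, 7, 8}   # general rule: finds everything, plus noise
narrow_rule = {1, 2}                    # specific rule: clean but incomplete

print(precision_recall(broad_rule, relevant))   # (0.5, 1.0)
print(precision_recall(narrow_rule, relevant))  # (1.0, 0.5)
```

The general rule wins on recall and the specific rule wins on precision, which is why rule refinement is usually a matter of trading one against the other.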

Page 27: Applying Semantics to Search Text Analytics

Development, Implementation
Quick Start – First Application: Search and TA
Simple Subject Taxonomy structure
– Easy to develop and maintain
Combined with categorization capabilities
– Added power and intelligence
Combined with people tagging, refining tags
Combined with Faceted Metadata
– Dynamic selection of simple categories
– Allow multiple user perspectives
• Can’t predict all the ways people think
• Monkey, Banana, Panda
Combined with ontologies and semantic data
– Multiple applications – Text mining to Search
– Combine search and browse
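
Faceted metadata with dynamic selection can be sketched in a few lines of Python: count the values of each facet over the current results, let the user pick one, and recount. The result records, facet names, and helper functions are hypothetical and only show the mechanics.

```python
from collections import Counter


def facet_counts(documents, facet):
    """Count how many documents carry each value of one facet."""
    counts = Counter()
    for doc in documents:
        for value in doc.get(facet, []):
            counts[value] += 1
    return counts


def refine(documents, facet, value):
    """Keep only documents tagged with the selected facet value."""
    return [doc for doc in documents if value in doc.get(facet, [])]


# Hypothetical tagged search results.
RESULTS = [
    {"title": "Q3 report", "topic": ["Finance"], "organization": ["FDA"]},
    {"title": "Audit memo", "topic": ["Finance", "Legal"], "organization": ["GAO"]},
    {"title": "Hiring plan", "topic": ["HR"], "organization": ["GAO"]},
]

print(facet_counts(RESULTS, "topic"))
gao_only = refine(RESULTS, "organization", "GAO")
print([doc["title"] for doc in gao_only])
print(facet_counts(gao_only, "topic"))
```

Because each facet is filtered independently, one user can start from organization and another from topic and still reach the same document.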

Page 28: Applying Semantics to Search Text Analytics

3. Roles and Responsibilities
Sample roles matrix:

Page 29: Applying Semantics to Search Text Analytics

3. Roles and Responsibilities
Common Roles and SharePoint Permissions:

Recommended Roles     SharePoint Permissions
Site Administrator    Site Administrator
SharePoint Owner      Full Control
SharePoint Member     Contribute
SharePoint Visitor    Read

Page 30: Applying Semantics to Search Text Analytics

Rest of Your Life: Maintenance, Refinement, Application, Learning
This is easy – if you did the TA Audit and POC/Quick Start
Content – new content – calls for flexible, new methods
People – Have a trained team and extended team
Technology – integrate into variety of applications – SBA
Processes, workflow – how semi-automate, part of normal
Maintenance – Refinement – in world of rapid change
– Mechanisms for feedback, learning – of text analysts and software
Future Directions - Advanced Applications
– Embedded Applications, Semantic Web + Unstructured Content
– Integration of Enterprise and External - Social Media
– Expertise Analysis, Behavior Prediction (Predictive Analytics)
– Voice of the Customer, Big Data
– Turning unstructured content into data – new worlds

Page 31: Applying Semantics to Search Text Analytics

Conclusion

Text Analytics can fulfill the promise of taxonomy and metadata
– Economic and consistent structure for unstructured content
Search and Text Analytics
– Search that works – finally!
– Platform for Search-Based Applications
Text Analytics is a different kind of software / solution
– Infrastructure – Hybrid CM to Search and feedback
How to Get Started with Text Analytics
– Strategic Vision of Text Analytics
– Three steps – TA Audit, TA evaluation, POC/Quick Start
Text Analytics opens up new worlds of applications

Page 32: Applying Semantics to Search Text Analytics

Questions?
Tom Reamy
[email protected]
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com

www.TextAnalyticsWorld.com Oct 3-4, Boston