Transcript
Page 1: Text Analytics Past, Present & Future

Text Analytics Past, Present & Future

Seth Grimes

Page 2: Text Analytics Past, Present & Future

>> Past, Present & Future

He who controls the present, controls the past. He who controls the past, controls the future.

-- derived from George Orwell’s 1984

Page 3: Text Analytics Past, Present & Future

>> The Present: Today’s Market

I have estimated a $350 million global market in 2008, up 40% from $250 million in 2007.
• Covers software licenses, vendor-provided support, and professional services.

$(hundreds) million more value created by:
• Universities and research centers, especially in the life sciences.
• Government, particularly for intelligence & counter-terrorism.
• OEM licensees, for listening platforms, e-discovery, etc.
• Systems integrators and consultants.

Page 4: Text Analytics Past, Present & Future

>> Applications Today

Broadly grouped --
• Intelligence and counter-terrorism.
• Life sciences.
• Content management, publishing & search.
• Customer & market intelligence.
• E-discovery.
• Enterprise feedback.
• Law enforcement.
• Risk, fraud, compliance, and investigation.

Page 5: Text Analytics Past, Present & Future

>> On the Demand Side…

How do current and prospective users see the market?

I recently published a study report, “Text Analytics 2009: User Perspectives on Solutions and Providers.” Drawing from the findings…

Page 6: Text Analytics Past, Present & Future

>> Primary Applications

What are your primary applications where text comes into play?

[Bar chart, horizontal axis 0% to 45%. Category labels surviving in the transcript: Law enforcement; Other; E-discovery; Insurance, risk management, or fraud; Content management or publishing; Research (not listed); Competitive intelligence. Bar values as transcribed: 7%, 8%, 13%, 14%, 15%, 15%, 17%, 18%, 19%, 22%, 33%, 33%, 37%, 40%.]

Page 7: Text Analytics Past, Present & Future

>> Primary Applications

Results found by Fern Halper of Hurwitz & Associates.

[Bar chart repeated from the previous slide, horizontal axis 0% to 50%, with the same category labels and bar values.]

Page 8: Text Analytics Past, Present & Future

>> The “Unstructured Data” Challenge

Sources are highly varied –
• Web sites, news & journal articles, images, video.
• Blogs, forum postings, and social media.
• E-mail, contact-center notes and transcripts; recorded conversation.
• Surveys, feedback forms, warranty & insurance claims.
• Office documents, regulatory filings, reports, scientific papers.
• And every other sort of document imaginable.

Page 9: Text Analytics Past, Present & Future

>> Important Sources

What textual information are you analyzing or do you plan to analyze?

Current users responded:
• Blogs and other social media (Twitter, social-network sites, etc.): 62%
• News articles: 55%
• On-line forums: 41%
• E-mail and correspondence: 38%
• Customer/market surveys: 35%

Page 10: Text Analytics Past, Present & Future

>> Finding Business Value

Why? In customer-experience initiatives, for example, “more unsolicited, unstructured data [implies] increasing use of text analytics.”

-- Bruce Temkin, Forrester Research

Page 11: Text Analytics Past, Present & Future

>> Information in Text

Do you need (or expect to need) to extract or analyze:

[Bar chart, horizontal axis 0% to 80%. Category labels as transcribed:]
• Named entities – people, companies, geographic locations, brands, ticker symbols, etc.
• Topics and themes
• Sentiment, opinions, attitudes, emotions
• Concepts, that is, abstract groups of entities
• Events, relationships, and/or facts
• Metadata such as document author, publication date, title, headers, etc.
• Other entities – phone numbers, e-mail & street addresses
• Other

[Bar values as transcribed: 71%, 65%, 60%, 58%, 55%, 53%, 40%, 15%.]

Page 12: Text Analytics Past, Present & Future

>> Text Analytics Satisfaction

Please rate your overall experience -- your satisfaction.

Fern Halper of Hurwitz & Associates found in her 2009 survey that “all of the companies that had deployed text analytics stated that the implementations either met or exceeded their expectations. And, close to 60% stated that text analytics had actually exceeded expectations.”

Page 13: Text Analytics Past, Present & Future

>> Today’s Text Analytics Players

Data mining and analytics.

Enterprise- and specialized-application focus.

Search tools and services.

Software-tool, OEM suppliers.*

Text analytics pure-plays, diverse applications.*

Web services.

* TEMIS categories.

Page 14: Text Analytics Past, Present & Future

>> Today’s Text Analytics

Contrast with the 1999 landscape –

“The nascent field of text data mining (TDM) has the peculiar distinction of having a name and a fair amount of hype but as yet almost no practitioners.”

-- Prof. Marti A. Hearst, “Untangling Text Data Mining,” 1999

(For our purposes, “text analytics” = “text mining” = “text data mining.”)

Page 15: Text Analytics Past, Present & Future

>> What’s Past is Prologue

“Don't look back. Something might be gaining on you.”

-- Satchel Paige

Page 16: Text Analytics Past, Present & Future

>> Understanding the Challenge

Marti Hearst in 1999:

“Text expresses a vast, rich range of information, but encodes this information in a form that is difficult to decipher automatically.”

“[A] way to view text data mining is as a process of exploratory data analysis that leads to the discovery of heretofore unknown information, or to answers for questions for which the answer is not currently known.”

Challenges: Access, decoding, discovery, application.

Page 17: Text Analytics Past, Present & Future

>> In Business Terms

Business intelligence (BI) as defined in 1958: “In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera... The notion of intelligence is also defined here... as ‘the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.’”

-- Hans Peter Luhn, “A Business Intelligence System,” IBM Journal, October 1958

Page 18: Text Analytics Past, Present & Future

[Figure: diagram from H.P. Luhn, “A Business Intelligence System,” IBM Journal, October 1958. Labels surviving in the transcript: Document input and processing; Information extraction; Knowledge management.]

Page 19: Text Analytics Past, Present & Future

>> Statistical Analysis of Content

Hans Peter Luhn, “The Automatic Creation of Literature Abstracts,” IBM Journal, April 1958

“Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance.”
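Luhn's idea of deriving significance from word frequency can be illustrated with a minimal sketch. This is a simplification, not Luhn's published procedure: it scores each sentence by the average document-wide frequency of its non-stop-words. The stop-word list and the function names `tokenize`, `sentence_scores`, and `top_sentence` are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on", "by", "for"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentence_scores(text):
    """Pair each sentence with the average document-wide frequency of its content words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    scored = []
    for sentence in sentences:
        words = [w for w in tokenize(sentence) if w not in STOP_WORDS]
        score = sum(freq[w] for w in words) / len(words) if words else 0.0
        scored.append((score, sentence))
    return scored


def top_sentence(text):
    """Return the highest-scoring sentence (the first one wins on ties)."""
    return max(sentence_scores(text), key=lambda pair: pair[0])[1]


sample = (
    "Text mining finds patterns in text. "
    "The weather was nice. "
    "Text analytics turns text into data."
)
print(top_sentence(sample))
```

Sentences that reuse the document's frequent content words rise to the top, which is the "relative measure of significance" in its crudest form; grammar and meaning play no part in the score.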

Page 20: Text Analytics Past, Present & Future

>> Significance from Semantics

“This rather unsophisticated argument on ‘significance’ avoids such linguistic implications as grammar and syntax... No attention is paid to the logical and semantic relationships the author has established.”

-- Hans Peter Luhn, 1958

Page 21: Text Analytics Past, Present & Future

>> Methods

Technologists developed approaches to taming text:
• Vector-space representations.
  Salton, Wong & Yang, 1975, “A Vector Space Model for Automatic Indexing.”
• Clustering & classification algorithms.
  Naive Bayes.
  Support Vector Machine.
  K-nearest neighbor.
• Linguistic methods.
• Machine learning.
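As a rough illustration of the vector-space approach listed above, the sketch below represents each document as a sparse vector of raw term counts and ranks documents by cosine similarity to a query. Raw counts stand in for the term weighting a production system would use, and the function names `term_vector`, `cosine_similarity`, and `rank` are assumptions, not taken from the cited paper.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Represent a document as a sparse term-frequency vector (word -> count)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return document indices ordered from most to least similar to the query."""
    q = term_vector(query)
    sims = [cosine_similarity(q, term_vector(d)) for d in documents]
    return sorted(range(len(documents)), key=lambda i: sims[i], reverse=True)


docs = [
    "fraud detection in insurance claims",
    "customer feedback surveys",
    "insurance fraud investigation",
]
print(rank("insurance fraud", docs))
```

The same vectors can feed the classifiers named in the list, for instance a k-nearest-neighbor classifier that labels a document by the labels of its most similar neighbors.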

Page 22: Text Analytics Past, Present & Future

>> Looking Ahead

Page 23: Text Analytics Past, Present & Future

>> Market Trends

Stronger than ever:
• Life sciences.
• Intelligence & counter-terrorism.

Continued steep growth:
• Media & publishing.
  Seek to mine and to classify/process. For users, semantic annotations ease navigation and boost findability.
• Customer experience. Key to quality, satisfaction.
• Market intelligence including competitive intelligence. Aggregates and details are both important.

“The Diverse and Exploding Digital Universe,” (IDC, 2008)

Page 24: Text Analytics Past, Present & Future

>> Technology Initiatives

Now and near future.
• Semantic search.
  Guha (IBM), McCool (Stanford), Miller (W3C): “The addition of explicit semantics can improve [navigational and research] search” (2003).
• Question answering.
  Matthew Glotzbach, Google: “Question answering is the future of enterprise search” (2006).
• Sentiment analysis.
  Bing Liu, Univ of Illinois: “The Web has dramatically changed the way that people express their views and opinions.”

Page 25: Text Analytics Past, Present & Future

>> Technology Initiatives 2

Now and near future.
• Listening platforms.
  Bruce Temkin, Forrester Research: “The future is clearly about analyzing feedback in any form that your customers give it. That’s a trend that won’t go away.”
• Text visualization.
  We’re still coming to terms with the idea of actually extracting and exploiting the information content of rich media.
• Web 3.0 & the Semantic Web.
  Ronen Feldman, Bar-Ilan University and Hebrew University: “Text analytics [is] driving the Semantic Web” (2006).

Page 26: Text Analytics Past, Present & Future

>> Search, from Keywords to Intelligence

Text analytics enables smarter search that better responds to user goals.

Page 27: Text Analytics Past, Present & Future

>> Question Answering

Text analytics (information extraction) feeds curated knowledge bases.

Page 28: Text Analytics Past, Present & Future

>> Sentiment Analysis

Two assertions:
• Human communications are inherently subjective.
• Opinion often masquerades as Fact.

Page 29: Text Analytics Past, Present & Future

>> Sentiment Analysis

“Sentiment analysis is the task of identifying positive and negative opinions, emotions, and evaluations.”

-- Wilson, Wiebe & Hoffmann, 2005, “Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis”

“Great hotel, just a few brilliant streets, full of restaurants and shops, from La Rambla. Beautiful hotel restaurant and the pool is UNBELIEVABLE! Single room is very modern and the blackout blind is awesome on mornings that you wish to sleep for a few more minutes. Will definitely be back!”

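« Logiciel d’apparence assez simple (j’aime beaucoup l’icône de l’application), mais qui se trouve être très malin et sait se différencier de ses concurrents, par la possibilité de lui appliquer des thèmes ! »

(In English: “Software that looks fairly simple (I really like the application’s icon), but that turns out to be very clever and sets itself apart from its competitors through the ability to apply themes to it!”)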
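A minimal sketch of polarity detection on review text like the hotel example above, assuming a tiny hand-made English lexicon. The word lists and the names `polarity` and `label` are hypothetical; this is plain lexicon counting with a one-word negation check, not the contextual-polarity method of the cited paper.

```python
import re

# Hypothetical mini-lexicon; real systems use far larger, curated resources.
POSITIVE = {"great", "beautiful", "brilliant", "awesome", "unbelievable", "modern"}
NEGATIVE = {"bad", "dirty", "noisy", "rude", "terrible"}
NEGATORS = {"not", "no", "never"}


def polarity(text):
    """Count positive minus negative hits, flipping a hit that directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


def label(text):
    """Map the numeric polarity score to a coarse label."""
    score = polarity(text)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


print(label("Great hotel, beautiful restaurant and the pool is UNBELIEVABLE!"))
print(label("The room was not great and the staff were rude."))
```

An English-only lexicon scores the French example as neutral, which hints at why multilingual coverage and context handling matter in practice.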

Page 30: Text Analytics Past, Present & Future

>> Text Visualization

http://www.wordle.net/

Page 31: Text Analytics Past, Present & Future

>> Web 3.0 & the Semantic Web

“We have many of the tools in place -- from Web 2.0 technologies… to unstructured data search software and the Semantic Web -- to tame the digital universe. Done right, we can turn information growth into economic growth.”

-- “The Diverse and Exploding Digital Universe,” (IDC, 2008)

“The Semantic Web is a web of data, in some ways like a global database.” -- Tim Berners-Lee, 1998

Web 3.0 = Web 2.0 + the Semantic Web + semantic tools.

Page 32: Text Analytics Past, Present & Future

>> Web 3.0 & the Semantic Web

Recurring themes:
• Semantically enriched -- context sensitive -- localized.

Technical concepts:
• Linked Data -- Microformats, RDF, SPARQL -- OWL.

Text analytics enables Web 3.0 and the Semantic Web.
• Automated content categorization and classification.
• Text augmentation: metadata generation, content tagging.
• Information extraction to databases.
• Exploratory analysis and visualization.

Page 33: Text Analytics Past, Present & Future

Text Analytics Past, Present & Future

Seth Grimes

http://altaplana.com

