Presentation to the May 8 2014 LIDER roadmapping workshop in Madrid
Text Analytics Applied
Seth Grimes, Alta Plana Corporation
@sethgrimes
2nd LIDER roadmapping workshop – Madrid, May 8, 2014
“Organizations embracing text analytics all report having an epiphany moment when they suddenly knew more than before.” -- Philip Russom, the Data Warehousing Institute, 2007
http://tdwi.org/articles/2007/05/09-what-works/bi-search-and-text-analytics.aspx
Document input and processing
Knowledge handling is key
Desk Set (1957): Computer engineer Richard Sumner (Spencer Tracy), television network librarian Bunny Watson (Katharine Hepburn), and the "electronic brain" EMERAC.
Hans Peter Luhn, “A Business Intelligence System,” IBM Journal, October 1958
Statistics and semantics
Text analytics involves statistical characterization and semantic understanding of text-derived features –
• Named entities: people, companies, places, etc.
• Pattern-based entities: e-mail addresses, phone numbers, etc.
• Concepts: abstractions of entities.
• Facts and relationships.
• Events.
• Concrete and abstract attributes (e.g., “expensive” & “comfortable”) including measure-value pairs.
• Subjectivity in the forms of opinions, sentiments, and emotions: attitudinal data.
– applied to business ends.
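Pattern-based entities are the easiest of these feature types to illustrate. The following is a minimal sketch, assuming two deliberately simple regular expressions; the pattern names, the function `extract_pattern_entities`, and the sample sentence are illustrative assumptions, not part of the presentation.

```python
import re

# Deliberately simple, illustrative patterns (assumptions); production
# extractors handle many more formats and edge cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def extract_pattern_entities(text):
    """Return the e-mail addresses and phone numbers found in text."""
    return {
        "email": EMAIL_RE.findall(text),
        "phone": [match.strip() for match in PHONE_RE.findall(text)],
    }


# Hypothetical sample sentence.
sample = "Contact jane.doe@example.com or call +33 1 42 68 53 00 for details."
entities = extract_pattern_entities(sample)
print(entities)
```

Named entities, concepts, and sentiment generally need dictionaries, linguistic analysis, or statistical models rather than fixed patterns like these.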
Sources
It’s a truism that 80% of enterprise-relevant information originates in “unstructured” form:
• E-mail and messages.
• Web pages, online news & blogs, forum postings, and other social media.
• Contact-center notes and transcripts.
• Surveys, feedback forms, warranty claims.
• Scientific literature, books, legal documents.
• ...
Non-text “unstructured” content?
• Images
• Audio including speech
• Video
Value derives from patterns.
Value
What do we do with information online, on-social, and in the enterprise?
1. Post/Publish, Manage, and Archive.
2. Index and Search.
3. Categorize and Classify according to metadata & contents.
4. Extract and Analyze.
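Step 3, categorizing by contents, can be sketched with a small term-matching classifier. This is a minimal illustration under stated assumptions: the `TAXONOMY` dictionary, its category names, and the `categorize` function are hypothetical, and real systems typically combine taxonomies with statistical classifiers.

```python
import re

# Hypothetical taxonomy mapping category names to trigger terms.
TAXONOMY = {
    "billing": {"invoice", "refund", "charge"},
    "delivery": {"shipping", "late", "package"},
}


def categorize(text, taxonomy=TAXONOMY):
    """Return the sorted categories whose trigger terms appear in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(name for name, terms in taxonomy.items() if words & terms)


labels = categorize("My package arrived late and I want a refund.")
print(labels)
```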
Semantics, analytics, and IR
Text analytics generates semantics to bridge search, BI, and applications, enabling next-generation information systems.
[Venn diagram; circles: Search, BI/Big Data, Applications]
• Search-based applications (search + text + apps)
• Information access (search + analytics)
• Synthesis (text + BI)/(big data)
• Text analytics (inner circle)
• Semantic search (search + text)
• NextGen CRM, EFM, MR, marketing, apps…
New York Times, September 8, 1957
http://open.blogs.nytimes.com/2012/02/16/rnews-is-here-and-this-is-what-it-means/
<div itemscope itemtype="http://schema.org/Organization">
  <span itemprop="name">Google.org (GOOG)</span>
  Contact Details:
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    Main address:
    <span itemprop="streetAddress">38 avenue de l'Opera</span>
    <span itemprop="postalCode">F-75002</span>
    <span itemprop="addressLocality">Paris, France</span>
    ,
  </div>
  Tel:<span itemprop="telephone">( 33 1) 42 68 53 00 </span>,
  Fax:<span itemprop="faxNumber">( 33 1) 42 68 53 01 </span>,
  E-mail: <span itemprop="email">secretariat(at)google.org</span>
</div>
http://schema.org/Organization
Structure matters
http://img.freebase.com/api/trans/raw/m/02dtnzv
http://www.cambridgesemantics.com/semantic-university/semantic-search-and-the-semantic-web
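To illustrate why structure matters, the sketch below reads schema.org microdata like the Organization example with Python's standard `html.parser` module. It is a simplified illustration, not a conforming microdata parser: the class name `ItempropCollector` is an assumption, only `<span>` elements are read, and nested items are flattened into one dictionary.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collect the text of <span> elements that carry an itemprop attribute.

    Simplification (assumption): nested items are flattened into a single
    dictionary, which a full microdata parser would not do.
    """

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # itemprop name of the <span> being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "itemprop" in attrs:
            self._current = attrs["itemprop"]
            self.properties[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            value = self.properties[self._current].strip()
            self.properties[self._current] = value
            self._current = None


# Abbreviated copy of the schema.org Organization markup.
markup = """
<div itemscope itemtype="http://schema.org/Organization">
  <span itemprop="name">Google.org (GOOG)</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">38 avenue de l'Opera</span>
    <span itemprop="postalCode">F-75002</span>
    <span itemprop="addressLocality">Paris, France</span>
  </div>
  Tel: <span itemprop="telephone">( 33 1) 42 68 53 00 </span>
</div>
"""

parser = ItempropCollector()
parser.feed(markup)
parser.close()
organization = parser.properties
print(organization["name"], "|", organization["addressLocality"])
```

Because the markup names each field, no entity extraction is needed to recover the organization name, address, or telephone number.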
Exploratory analysis, synthesis
Decisive Analytics, http://www.dac.us/
http://www.geeklawblog.com/2011/12/lexis-advance-platform-launch-two.html
A big data analytics architecture (example)
Applications
Synthesis is cool, but let’s take a step back…
Text analytics has applications in:
• Intelligence & law enforcement.
• Life sciences & clinical medicine.
• Media & publishing including social-media analysis and contextual advertising.
• Competitive intelligence.
• Voice of the Customer: CRM, product management & marketing.
• Public administration & policy.
• Legal, tax & regulatory (LTR) including compliance.
• Recruiting.
Sentiment analysis
A specialization, of relevance to:
• Brand/reputation management.
• Customer experience management (CEM).
• Competitive intelligence.
• Survey analysis (EFM).
• Market research.
• Product design/quality.
• Trend spotting.
http://altaplana.com/TA2014
What are your primary applications where text comes into play?
[Bar chart, axis 0% to 45%]
• Military/national security/intelligence: 5%
• Law enforcement: 6%
• Intellectual property/patent analysis: 8%
• Financial services/capital markets: 9%
• Product/service design, quality assurance, or warranty claims: 10%
• Other: 11%
• Insurance, risk management, or fraud: 13%
• E-discovery: 14%
• Life sciences or clinical medicine: 15%
• Online commerce including shopping, price intelligence, reviews: 16%
• Content management or publishing: 25%
• Customer/CRM: 27%
• Search, information access, or Question Answering: 29%
• Competitive intelligence: 33%
• Brand/product/reputation management: 38%
• Research (not listed): 38%
• Voice of the Customer / Customer Experience Management: 39%
Voice of the Customer
Text analytics is applied to improve customer service and boost satisfaction and loyalty.
Analyze customer interactions and opinions –
• E-mail, contact-center notes, survey responses.
• Forum & blog posting and other social media.
– to –
• Address customer product & service issues.
• Improve quality.
• Manage brand & reputation.
Assessment of qualitative information from text helps users –
• Gain feedback on interactions.
• Assess customer value.
• Understand root causes.
• Mine data for measures such as churn likelihood.
Online commerce
Text analytics is applied for marketing, search optimization, competitive intelligence.
Analyze social media and enterprise feedback to understand the Voice of the Market:
• Opportunities
• Threats
• Trends
Categorize product and service offerings for on-site search and faceted navigation and to enrich content delivery.
Annotate pages to enhance Web-search findability, ranking.
Scrape competitor sites for offers and pricing.
Analyze social and news media for competitive information.
E-Discovery and compliance
Text analytics is applied for compliance, fraud and risk, and e-discovery.
Regulatory mandates and corporate practices dictate –
• Monitoring corporate communications
• Managing electronically stored information for production in event of litigation
Sources include e-mail (!!), news, social media
Risk avoidance and fraud detection are key to effective decision making
• Text analytics mines critical data from unstructured sources
• Integrated text-transactional analytics provides rich insights
What textual information are you analyzing or do you plan to analyze?
[Bar chart, axis 0% to 50%; values range from 5% to 46%. Labeled sources: insurance claims or underwriting notes; video or animated images; photographs or other graphical images; field/intelligence reports; patent/IP filings; text messages/instant messages/SMS; Web-site feedback; chat; contact-center notes or transcripts; online reviews; Facebook postings; customer/market surveys; news articles; Twitter, Sina Weibo, or other microblogs.]
What textual information are you analyzing or do you plan to analyze?
[Bar chart comparing 2014, 2011, and 2009 responses, axis 0% to 70%; the labeled values are:]
• Web-site feedback: 16%
• social media not listed above: 19%
• chat: 20%
• employee surveys: 20%
• contact-center notes or transcripts: 22%
• e-mail and correspondence: 26%
• online reviews: 31%
• scientific or technical literature: 31%
• Facebook postings: 32%
• on-line forums: 36%
• customer/market surveys: 37%
• comments on blogs and articles: 38%
• news articles: 42%
• blogs (long form) including Tumblr: 43%
• Twitter, Sina Weibo, or other microblogs: 46%
Do you currently need (or expect to need) to extract or analyze...
[Bar chart, axis 0% to 100%]
• Events: current 33%, expect 21%
• Semantic annotations: current 31%, expect 24%
• Other entities – phone numbers, part/product numbers, e-mail & street addresses, etc.: current 34%, expect 23%
• Metadata such as document author, publication date, title, headers, etc.: current 47%, expect 23%
• Concepts, that is, abstract groups of entities: current 51%, expect 28%
• Named entities – people, companies, geographic locations, brands, ticker symbols, etc.: current 56%, expect 25%
• Relationships and/or facts: current 47%, expect 33%
• Sentiment, opinions, attitudes, emotions, perceptions, intent: current 54%, expect 28%
• Topics and themes: current 66%, expect 22%
What is important in a solution?
[Bar chart, axis 0% to 70%; values range from 16% to 64%. Labeled options: export to Semantic Web formats (RDF, OWL, microformats, etc.); media monitoring/analysis interface; supports data fusion / unified analytics; BI (business intelligence) integration; big data capabilities, e.g., via Hadoop/MapReduce; open source; sentiment scoring; low cost; document classification; ability to use specialized dictionaries, taxonomies, ontologies, or extraction rules.]
Non-English language support?
[Bar chart with two series, Current and Within 2 years, axis -10% to 60%. Labeled languages: Arabic; Chinese; French; Greek; Italian; Korean; Portuguese; Scandinavian or Baltic; Turkish or Turkic; Other Arabic script (including Urdu, Pashto, Farsi, Dari); Other European or Slavic/Cyrillic.]
Software & platform options
Text-analytics options may be grouped in general classes.
• Installed text-analysis application, whether desktop or server or deployed in-database.
• Data mining workbench.
• Hosted.
• Programming tool.
• As-a-service, via an application programming interface (API).
• Code library or component of a business/vertical application, for instance for CRM, e-discovery, search.
Text analytics is frequently embedded in search or other end-user applications.
The slides that follow next will present leading options in each category except Hosted…
User decision criteria
Primary considerations include –
• Adaptation or specialization: To a business or cultural domain, language, information type (e.g., text, speech, images) & source (e.g., Twitter, e-mail, online news).
• By-user customization possibilities: For instance, via custom taxonomies, rules, lexicons.
• Sentiment resolution: Aggregate, message, or feature level. (What features? Topics, coreferenced entities?)
• What sentiment? Valence & what else? Emotion? Intent?
• Outputs: E.g., annotated text, models, indicators, dashboards, exploratory data interfaces.
• Usage mode: As-a-service (API), installed, or hosted/cloud.
• Capacity: Volume, performance, throughput, latency.
• Cost.
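The difference between message-level and aggregate-level sentiment resolution can be sketched with a toy lexicon. Everything here is an illustrative assumption: the `LEXICON` entries, the function names, and the sample reviews are hypothetical, and the sketch covers valence only, not feature-level resolution, emotion, or intent.

```python
import re

# Toy lexicon: an assumption for illustration, not a real sentiment resource.
LEXICON = {"great": 1, "comfortable": 1, "love": 1,
           "expensive": -1, "slow": -1, "broken": -1}


def message_sentiment(message):
    """Message-level valence: sum of lexicon scores for the words present."""
    words = re.findall(r"[a-z']+", message.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def aggregate_sentiment(messages):
    """Aggregate-level valence: mean of the message-level scores."""
    scores = [message_sentiment(message) for message in messages]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical sample messages.
reviews = [
    "Great seats, very comfortable.",
    "Too expensive and the service was slow.",
    "I love it.",
]
per_message = [message_sentiment(review) for review in reviews]
overall = aggregate_sentiment(reviews)
print(per_message, overall)
```

The aggregate figure hides the strongly negative second message, which is one reason resolution level matters when comparing solutions.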
Linked Data Links?