Ikanow oanyc summit

REALIZING BUSINESS VALUE FROM OPEN SOURCE DATA AND OPEN SOURCE INTELLIGENCE

Presented by: Chris Morgan

http://bit.ly/data-vending

DATA AND ART (PRIMER)

Providing value on the potential of bad news to serve out a bag of salty potato chips

harnessing the power of open data and sentiment

Data Intelligence

Operational Lens

Intelligence is information that has been transformed to meet an operational need

Intelligence

Intelligence Cycle

No matter what methodology you use… intelligence analysis is an iterative process.

Open Source Analysis Goals

• Provide value to the organization – turn data into intelligence using an “operational lens”

• Ensure cyclical feedback occurs during collection, processing, analysis, and consumption

• Validate that a particular network is the right source of data for the questions you need answered

Common Pitfalls

Analyzing What Instead of Why

The important thing is often not what people are saying… but why they are saying it.

Common Pitfalls

Using the Wrong Analysis Tools

Reporting tools rarely help dig into the why. Many common tools, reports, and metrics are misleading:

– Word clouds atomize message context

– Sentiment metrics are often highly inaccurate

– Information in aggregate hides more than it reveals

Use Case

Sentiment Analysis

http://bit.ly/ikanow-and-r

Enron Sentiment Analysis

Data source

~500,000 publicly available Enron emails

Enron Sentiment Analysis

Hypothesis

Use sentiment analysis as a first-order process to prioritize and streamline the overall analysis process

Enron Sentiment Analysis

Caveats

• Sentiment was attributed only to the sender

• Not a complete representation of an organization's email corpus

• Counteraction of uneven coverage was estimated

• Not a full analysis of the set of information (the objective was to use sentiment analysis as a reduction technique)

Workflow

• Data Ingestion Process
– Extraction of entities, events, facts, and some basic statistics

• Aggregation and Reduction
– Aggregation of keywords with sentiment from each email
– Average sentiment score
– Follow-on aggregation by email address of the sender over a given week (average sentiment score)

• Visualize and Analyze
– Imported into Infinit.e and R for visualization
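The aggregation and reduction steps can be sketched roughly as follows. This is a minimal illustration and not the Infinit.e pipeline itself: the record layout (`sender`, `sent`, `keyword_sentiment`), the sample values, and the function names are all assumptions made for the example, and it assumes keyword-level sentiment scores already exist from the ingestion step.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical input: one record per email. The field names and the scores
# are invented for illustration; they are not taken from the Enron data.
emails = [
    {"sender": "a@example.com", "sent": date(2001, 10, 1), "keyword_sentiment": [0.4, 0.2]},
    {"sender": "a@example.com", "sent": date(2001, 10, 3), "keyword_sentiment": [-0.5]},
    {"sender": "a@example.com", "sent": date(2001, 10, 8), "keyword_sentiment": [-0.8, -0.6]},
    {"sender": "b@example.com", "sent": date(2001, 10, 2), "keyword_sentiment": [0.1]},
]


def email_score(email):
    """Average the keyword-level sentiment scores of a single email."""
    return mean(email["keyword_sentiment"])


def weekly_sender_averages(emails):
    """Average the per-email scores for each (sender, ISO year, ISO week)."""
    buckets = defaultdict(list)
    for email in emails:
        if not email["keyword_sentiment"]:
            continue  # skip emails with no scored keywords
        iso = email["sent"].isocalendar()
        buckets[(email["sender"], iso[0], iso[1])].append(email_score(email))
    return {key: mean(scores) for key, scores in buckets.items()}


weekly = weekly_sender_averages(emails)
for (sender, year, week), score in sorted(weekly.items()):
    print(f"{sender} {year}-W{week:02d} {score:+.2f}")
```

Grouping by ISO week is one possible reading of "over a given week"; the original analysis may have used a different week boundary.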

• Horizontal Bar
– Positive sentiment = Green
– Negative sentiment = Red

• Chart on Left
– Positive sentiment = Green
– Negative sentiment = Red

• Chart on Right
– Heuristic: weeks with abrupt negative shifts indicated problems in the organization
– Positive sentiment = Blue
– Negative sentiment = Red

One email sender’s Weekly Average Sentiment across time
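The "abrupt negative shift" heuristic could be approximated with a simple week-over-week comparison, as in the sketch below. The slides do not say how "abrupt" was defined, so the `drop_threshold` value, the function name, and the sample scores are assumptions chosen only to illustrate the idea.

```python
def flag_abrupt_negative_shifts(weekly_scores, drop_threshold=0.5):
    """Return the indices of weeks whose average sentiment fell by at least
    drop_threshold compared with the previous week.

    drop_threshold is an assumed cutoff, not a value from the presentation.
    """
    flagged = []
    for i in range(1, len(weekly_scores)):
        drop = weekly_scores[i - 1] - weekly_scores[i]
        if drop >= drop_threshold:
            flagged.append(i)
    return flagged


# Hypothetical weekly average sentiment for one sender, in time order.
scores = [0.25, 0.5, -0.25, -0.5, 0.0]
flagged_weeks = flag_abrupt_negative_shifts(scores)
print(flagged_weeks)  # [2]
```

Only the week at index 2 is flagged here: the fall from 0.5 to -0.25 exceeds the assumed threshold, while the smaller decline in the following week does not.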

Workflow

Close-up snapshot of a subset of 20 individuals' average email sentiment scores over time

Individual analysis based on the reduction of the information by the sentiment analysis process

Workflow

Findings

• Indicators and Additional Analysis
– 801 weeks highlighted out of 11,500 weeks as important for further investigation
– Keywords found could be used for further statistical investigation of the 801 weeks highlighted for manual review
– Individual evaluation of emails highlighted through a reduction process (case construction)
– Pipeline created for further analysis
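To put the reduction in proportion, the two counts reported above can be turned into a share. The arithmetic below uses only the 801 and 11,500 figures from the findings; the variable names are illustrative.

```python
flagged_weeks = 801     # weeks highlighted for further investigation
total_weeks = 11_500    # weeks evaluated in total

share_flagged = flagged_weeks / total_weeks
share_removed = 1 - share_flagged

print(f"{share_flagged:.1%} of weeks flagged for review")   # 7.0%
print(f"{share_removed:.1%} of weeks set aside")             # 93.0%
```

In other words, the sentiment pass left roughly one week in fourteen for manual review.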

Lessons Learned

1. Drastically reduced the timeline necessary for case construction

Lessons Learned

2. Multiple contexts for this type of technique

• Intelligence Analysis

• E-Discovery

• Brand Management

• Social Media Analysis

Lessons Learned

3. Only negative shifts were investigated; analysis of positive shifts could easily be applied to different questions in other use cases

Lessons Learned

4. R and Infinit.e provide an interesting technology integration for evaluating and reducing unstructured data

Chris Morgan

cmorgan@ikanow.com

www.ikanow.com

THANK YOU

github.com/ikanow/infinit.e
