
Andres Dorado - Finding Relevant Data in the Cloud


Page 1: Andres Dorado -Finding Relevant Data in the Cloud

© CGI GROUP INC. All rights reserved

_experience the commitment TM

The Cloud: Searching for Meaning

Finding Relevant Data in the Cloud for Actionable Decisions

APRIL 2012

Page 2: Andres Dorado -Finding Relevant Data in the Cloud

Confidential

Agenda

• Information Retrieval

• The “ABC” Formula

• Some of the Challenges

• Example 1: The Right Profile

• Example 2: Like it

• Example 3: Promote it

• Conclusions

• Q&A

Page 4: Andres Dorado -Finding Relevant Data in the Cloud

Information Retrieval is beyond databases

[Diagram labels: DBMS, Enterprise Data, “> SELECT * FROM”, and a search box with a Go button]

Information Retrieval*, aka Search, is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

“An Information Need* is the topic about which the user desires to know more, and is differentiated from a query, which is what the user conveys to the computer in an attempt to communicate the information need.”

* Manning, C. D., Raghavan, P. and Schütze, H. An Introduction to Information Retrieval. 2009
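To make the difference between a collection and a query concrete, here is a minimal sketch of keyword search over a tiny in-memory collection. It assumes whitespace tokenization and Boolean AND matching; the function names (build_index, search) and the three sample documents are hypothetical and are not taken from the presentation.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the postings of the first term, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical three-document collection, for illustration only.
docs = {
    1: "cloud storage pricing for enterprise data",
    2: "searching large text collections in the cloud",
    3: "enterprise database administration guide",
}
index = build_index(docs)
hits = search(index, "cloud enterprise")
print(hits)
```

The query "cloud enterprise" is only what the user typed; whether document 1 actually satisfies the underlying information need is a separate question that the index cannot answer.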

Page 5: Andres Dorado -Finding Relevant Data in the Cloud

Volume, variety and velocity… Big Data

Big Data refers to fast-growing, large data sets that cannot be managed with “traditional” Database Management Systems.

[Diagram labels: The “Cloud” and a search box with a Go button]

Page 6: Andres Dorado -Finding Relevant Data in the Cloud

The consumer market is already there, and organizations can learn from it

[Diagram labels: Personal Data, The “Cloud”, iPhone, Siri: “Searching for…”]

Page 7: Andres Dorado -Finding Relevant Data in the Cloud

Analytics is enabling these capabilities

Information Retrieval applies analytic techniques such as clustering and classification to support users in browsing or filtering document collections or further processing a set of retrieved documents.*

[Diagram labels: The “Cloud”, Big Data, DBMS, Enterprise Data, “> SELECT * FROM”, and a search box with a Go button]

* Manning, C. D., Raghavan, P. and Schütze, H. An Introduction to Information Retrieval. 2009
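As one possible illustration of clustering a set of retrieved documents, the sketch below groups documents whose term sets overlap. It assumes a greedy single-pass scheme with Jaccard similarity and an arbitrary threshold of 0.25; the function names, the threshold, and the sample documents are hypothetical, and production systems typically use more robust methods.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of terms."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.25):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose seed document is similar
    enough; otherwise it becomes the seed of a new cluster.
    """
    clusters = []  # list of (seed_terms, [doc_ids])
    for doc_id, text in docs.items():
        terms = set(text.lower().split())
        for seed_terms, members in clusters:
            if jaccard(terms, seed_terms) >= threshold:
                members.append(doc_id)
                break
        else:
            # No existing cluster was close enough.
            clusters.append((terms, [doc_id]))
    return [members for _, members in clusters]


# Hypothetical retrieved documents, for illustration only.
retrieved = {
    "a": "cloud storage pricing",
    "b": "cloud storage security",
    "c": "recruiting software engineers",
    "d": "recruiting data engineers",
}
groups = cluster(retrieved)
print(groups)
```

Grouping results this way lets a user browse by topic instead of scanning one flat list.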

Page 9: Andres Dorado -Finding Relevant Data in the Cloud

The “ABC” Formula

[Diagram labels: The “Cloud”, Big Data, Analytics, DBMS, Enterprise Data, “> SELECT * FROM”, and a search box with a Go button]

Analytics + Big Data + The “Cloud” = Enhanced Business Operations

Page 11: Andres Dorado -Finding Relevant Data in the Cloud

Some of the Challenges

Finding relevant data

Large-scale data sets

Quality of search results

• “A document is relevant* if it is one that the user perceives as containing information of value with respect to their personal information need.”

• “Something (A) is relevant** to a task (T) if it increases the likelihood of accomplishing the goal (G), which is implied by T.”

* Manning, C. D., Raghavan, P. and Schütze, H. An Introduction to Information Retrieval. 2009

** Hjørland, B. and Christensen, F. S. Work tasks and socio-cognitive relevance: A specific example. 2002

Page 12: Andres Dorado -Finding Relevant Data in the Cloud

Some of the Challenges

Finding relevant data

Large-scale data sets

Quality of search results

• Personal Information Retrieval: The system searches operating systems, e-mail, and other device applications.

• Enterprise, Institutional, and domain-specific search: Documents are typically stored on centralized file systems and/or dedicated servers.

• Web Search: The system has to provide search over billions of documents stored on millions of computers.

Page 13: Andres Dorado -Finding Relevant Data in the Cloud

Some of the Challenges

Finding relevant data

Large-scale data sets

Quality of search results

• To assess effectiveness of an Information Retrieval system (i.e., the quality of its search results), a user will usually want to know two key statistics about the system’s returned results for a query or search:

• Precision: What fraction of the returned results are relevant to the information need?

• Recall: What fraction of the relevant documents in the collection were returned by the system?
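The two statistics above can be computed directly from the set of returned results and the set of relevant documents. The sketch below is a minimal illustration; the function name precision_recall and the document ids are hypothetical, and returning 0.0 for an empty set is a convention chosen here, not something the slides specify.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for one query.

    precision = |returned AND relevant| / |returned|
    recall    = |returned AND relevant| / |relevant|
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 results returned, 5 documents actually relevant.
returned = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5", "d6", "d7"]
p, r = precision_recall(returned, relevant)
print(p, r)  # 0.5 0.4
```

Here two of the four returned documents are relevant (precision 0.5), and two of the five relevant documents were found (recall 0.4). Returning more documents tends to raise recall at the cost of precision.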

Page 15: Andres Dorado -Finding Relevant Data in the Cloud

Example 1: The Right Profile

Big Data: LinkedIn (150 million professionals)

Analytics: Text Mining

[Diagram labels: The “Cloud”, DBMS, Pipeline Data, “> SELECT * FROM”, and a search box with a Go button]

Analytics + Big Data + The “Cloud” = Enhanced Recruitment Process

Page 16: Andres Dorado -Finding Relevant Data in the Cloud

Example 1: The Right Profile

Page 18: Andres Dorado -Finding Relevant Data in the Cloud

Example 2: Like it

Big Data: Twitter (340 million tweets/day)

Analytics: Sentiment Analysis

[Diagram labels: The “Cloud”, DBMS, Pipeline Data, “> SELECT * FROM”, and a search box with a Go button]

Analytics + Big Data + The “Cloud” = Enhanced Customer Satisfaction

Page 19: Andres Dorado -Finding Relevant Data in the Cloud

Example 2: Like it

Public Relations using “Twitter Earth”

Case: Tracking tweets and displaying them by location
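As a rough illustration of combining sentiment analysis with location tracking, the sketch below scores tweets against a tiny word list and sums the scores per location. It does not reflect how “Twitter Earth” works; the word lists, function names, and sample tweets are all hypothetical, and a lexicon this small is far too crude for real use.

```python
from collections import defaultdict

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"love", "great", "good", "happy"}
NEGATIVE = {"hate", "bad", "slow", "broken"}


def score(text):
    """Count positive words minus negative words in one tweet."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_by_location(tweets):
    """Sum tweet scores per location; tweets are (location, text) pairs."""
    totals = defaultdict(int)
    for location, text in tweets:
        totals[location] += score(text)
    return dict(totals)


# Hypothetical tweets, for illustration only.
tweets = [
    ("Toronto", "Love the new service, great support!"),
    ("Toronto", "Checkout page is broken."),
    ("Montreal", "Slow delivery and bad packaging"),
]
summary = sentiment_by_location(tweets)
print(summary)
```

A per-location total like this is the kind of figure that could then be plotted on a map.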

Page 21: Andres Dorado -Finding Relevant Data in the Cloud

Example 3: Promote it

Big Data: Facebook (800 million users)

Analytics: “Wisdom”

[Diagram labels: The “Cloud”, DBMS, Pipeline Data, “> SELECT * FROM”, and a search box with a Go button]

Analytics + Big Data + The “Cloud” = Enhanced Marketing Effectiveness

Page 22: Andres Dorado -Finding Relevant Data in the Cloud

Example 3: Promote it

Social Intelligence using “Wisdom”

Case: Analyzing 10 million Facebook users to promote Engineering

Page 24: Andres Dorado -Finding Relevant Data in the Cloud

Conclusions

• Analytics adds capabilities to information retrieval systems that facilitate finding relevant data in the “cloud”.

• Analytics enables information retrieval systems to deal with large-scale data sets and is therefore recommended for working with Big Data.

• Analytics provides advanced techniques for more effective browsing and filtering of Big Data.

How are you driving business value with the data assets accessible to your organization?

Consider the “ABC” formula

Page 26: Andres Dorado -Finding Relevant Data in the Cloud

_experience the commitment TM

Our commitment to you

CGI delivers outcomes your business can count on.