
CS 572: Information Retrieval

Web 2.0+: Social Media/CGC

Status

• Final project: updated proposal due by today/tomorrow

• Questions? Discuss at end of class.

3/28/2016 CS 572: Information Retrieval. Spring 2016 2

Eugene Agichtein, Emory University, IR Lab

• Collaboratively Generated Content: CGC

– Creation models, quality

• Twitter/Weibo

• Searching:

– Indexing

– Ranking


Estimates of daily content creation [Ramakrishnan & Tomkins, IEEE Computer 2007]

Content type – amount produced per day:

• Published (books, magazines, newspapers): 3-4 GB
• Professional Web (paid creation, e.g., corporate Web site): 2 GB
• User-generated (reviews, blogs, personal Web sites): 8-10 GB
• Private text (instant messages, email): 3,000 GB (= 3 TB)

The value of CGC

• Extrinsic value – impact on AI, IR and NLP applications
• Intrinsic value – resources for the people
  – How to make the data more useful (e.g., increase user engagement)
• Synergistic view of CGC
  – AI techniques can be used to improve the quality of CGC repositories,
  – which can in turn improve the performance of other AI applications


Extrinsic value: sources of knowledge

Collaboratively Generated Content ("after"):
• 3,600,000 articles since 2001
• 2,500,000 entries since 2002
• 1,000,000,000 Q&A since 2005
• 4,000,000,000 images since 2004

Traditional resources ("before"):
• 65,000 articles since 1768
• WordNet: 150,000 entries since 1985
• CYC: 4,600,000 assertions since 1984

Extrinsic value: what we can do with all this knowledge

• Under the hood
  – CGC as an additional (huge!) corpus
    • Better term statistics; observe the document authoring process
  – Repositories of knowledge
    • Concepts as features (Bag of Words + Concepts)
    • Concepts as word senses (way beyond WordNet)
  – More comprehensive lexicons, gazetteers
  – Extending existing knowledge repositories (e.g., WordNet)
• Higher-level tasks
  – (Concept-based) information retrieval
  – Question answering

(Text) Social Media Today

Published: 4 GB/day. Social media: 10 GB/day. Yahoo! Answers: 120M users, 40M questions, 1B answers.

"Yes, we could read your blog. Or, you could tell us about your day."

Models for CGC generation: Overview

• Wikipedia: contribution, curation, and coordination

– http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia%27s_growth

• Community forums: Yahoo! Answers, focused forums

• Sharing: Twitter

Models of CGC Generation: Link Creation – "Rich-Get-Richer" (for a while ...)

• Preferential attachment: more popular Wikipedia pages are more likely to acquire links (up to a point!) [Capocci et al. '06]
• Flickr and other sites also show strong evidence of preferential attachment [Mislove et al. '08]

(Figure: new links added daily vs. page popularity.)

Models of CGC Generation: Edits – "Rich-Get-Richer" (again, up to a point)

• Popular Wiki pages (> 1,000 edits): editing activity initially increases, then drops off
• Later on, arriving visitors are less able to contribute novel information [Das and Magdon-Ismail '10]

Models of CGC generation: Slowing Growth [Suh et al., WikiSym 2009]

• Article growth (# per month) extrapolated to slow down dramatically by 2013
• Increased coordination and overhead, resistance to new edits

Models for CGC generation: Wikipedia conflict and coordination [Kittur & Kraut, 2008]

• Explicit coordination: discussion page
• Implicit coordination: leading the effort
  – Example: "Music of Italy" page – the bulk of edits are made by TUF-KAT and Jeffmatt
• Page quality increases with the number of editors iff effort is concentrated; otherwise, more editors means higher overhead and lower quality

Models of CGC Generation: Content Visibility and Evolution

• Dynamics of votes of select Digg stories over a period of 4 days [Lerman '07]
• Dynamics of Y! Answers posts over a period of 4 days [Aji et al. '10]

Models for CGC Generation: Forums – info seekers vs. info providers

• In Yahoo! Answers, question and answer quality are strongly related: bad questions attract bad or mediocre answers [Agichtein et al. '08]
• Askers (Java forums) prefer help from users with just slightly more expertise [Zhang et al. '07]

Lots of content... Do we trust it?


On the Internet, nobody knows you are a dog.

CGC Quality Filtering (Overview)

• Multiple dimensions of CGC content to exploit:

– Content persistence (Wikipedia)

– Surface quality (language-wise): all CGC

– Metadata: Author reliability, geo-spatial edit patterns

– Citation/link-based estimation of authority

• Most effective systems use a combination of all of the above


Content-Driven Author Reputation [Adler & de Alfaro, WWW 2007]


Automatic Vandalism Detection

• Hoaxes

• Libelous content

• Malware distribution

• Errors picked up by mainstream media!

• Wikipedia “truthiness”

Vandalized Ben Franklin Wikipedia page


Assigning Trust to Wikipedia Text

• Estimating text trust from edit history information.

• Trust is computed as:

1) Insertion: trust value for newly inserted text (E)

2) Edge effect: the text at the edge of blocks has the same trust as newly inserted text.

3) Revision: text trust may increase, if author reputation gets higher

[Adler et al., WikiSym 2008]. Online demo + code available: http://wikitrust.soe.ucsc.edu/

(Figure: text trust coloring, before and after an edit.)

NLP-based Vandalism Detection

• Use language modeling to predict edit quality

– Lexical: vocabulary, regular expression/lists

– Syntactic: N-gram analysis using part-of-speech tags

– Semantic: N-gram analysis with hand-picked words

[Wang et al., COLING 2010]

Figure adapted from Andrew G. West, 2010
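The lexical feature class above can be sketched with a few regular-expression cues. This is a minimal illustration in the spirit of the slide, not the feature set of Wang et al.; the word list and patterns are invented for the example.

```python
import re

# Toy lexical vandalism cues computed over the text inserted by an edit.
# The word list and regexes are illustrative assumptions, not from the paper.
PROFANITY = {"stupid", "dumb", "lol"}       # stand-in offensive-word list
SHOUT_RE = re.compile(r"\b[A-Z]{4,}\b")     # long all-caps tokens
REPEAT_RE = re.compile(r"(.)\1{3,}")        # character runs like "loooool"

def lexical_features(inserted_text: str) -> dict:
    tokens = re.findall(r"\w+", inserted_text.lower())
    n = max(len(tokens), 1)
    return {
        "profanity_ratio": sum(t in PROFANITY for t in tokens) / n,
        "shouting": len(SHOUT_RE.findall(inserted_text)),
        "char_repeats": len(REPEAT_RE.findall(inserted_text)),
    }

print(lexical_features("This is STUPID loooool"))
```

A real detector would feed such features, together with syntactic and semantic n-gram features, into a classifier.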


Comparison of Vandalism Detection Approaches: Shared Tasks [Potthast et al. '10]

• Shared vandalism corpus, 32,000 edits
• Best single approach: NLP-based
• Meta-detector performs best – the task requires different classes of features
• CLEF 2010 Competition on Vandalism Detection:
  – Best result: AUC 0.92; Precision = 100% at Recall = 20%

Finding High Quality Content in SM

• Well-written
• Interesting
• Relevant (answer)
• Factually correct
• Popular?
• Provocative?
• Useful?

As judged by professional editors.

E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne, Finding High Quality Content in Social Media, WSDM 2008

How do Question and Answer Quality relate?

Community

Link Analysis for Authority Estimation

(Figure: bipartite graph of questions, answers, and users; edges connect askers to the users who answered their questions.)

HITS on the asker-answerer graph:

H(i) = Σ_{j=0..K} A(j)   (hub score of asker i, summed over the answerers j it links to)
A(j) = Σ_{i=0..M} H(i)   (authority score of answerer j, summed over the askers i linking to it)

Hub (asker), Authority (answerer)

[Jurczyk & Agichtein, CIKM 2007]
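The HITS computation on the asker-answerer graph can be sketched as follows; the toy edge list and iteration count are illustrative, not data from the paper.

```python
# Sketch of HITS over an asker->answerer graph: an edge (u, v) means
# user u asked a question that user v answered. Hubs correspond to
# askers, authorities to answerers. Edges and iterations are illustrative.
def hits(edges, iterations=50):
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # authority: total hub score of askers pointing at the node
        auth = {n: sum(hub[u] for u, v in edges if v == n) for n in nodes}
        # hub: total authority score of answerers the node points at
        hub = {n: sum(auth[v] for u, v in edges if u == n) for n in nodes}
        for scores in (auth, hub):              # L2-normalize each vector
            norm = sum(x * x for x in scores.values()) ** 0.5 or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth

edges = [("u1", "u4"), ("u2", "u4"), ("u3", "u4"), ("u3", "u5")]
hub, auth = hits(edges)
# u4 answers the most questions (top authority); u3 asks questions
# answered by both answerers (top hub)
```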

Qualitative Observations

HITS effective

HITS ineffective


Random forest classifier

Result 1: Identifying High Quality Questions


Top Features for Question Classification

• Asker popularity (“stars”)

• Punctuation density

• Topical category

• Page views

• KL Divergence from reference corpus LM
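The KL-divergence feature listed above can be sketched by building smoothed unigram language models for a question and for a reference corpus, then computing the divergence between them. The toy texts and add-one smoothing are illustrative assumptions.

```python
import math
from collections import Counter

# Sketch of the question-quality feature: KL divergence of a question's
# unigram LM from a reference-corpus LM, with add-one smoothing.
def unigram_lm(text, vocab):
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}

def kl_divergence(p, q):
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

reference = "how do i install python on windows how do i update it"
question = "how do i install python"
vocab = set(reference.lower().split()) | set(question.lower().split())
p = unigram_lm(question, vocab)
q = unigram_lm(reference, vocab)
print(round(kl_divergence(p, q), 4))
```

A low divergence suggests the question is phrased like typical text in the reference corpus.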


Identifying High Quality Answers


Top Features for Answer Classification

• Answer length
• Community ratings
• Answerer reputation
• Word overlap
• Kincaid readability score

CGC as a corpus

• Better statistics on a larger or cleaner text collection

– Compute IDF on the entire Wikipedia corpus

• Beware of different style / topic distribution

– Cf. Web as a corpus (Mihalcea’04, Gonzalo et al.’ 03)

• Web corpus statistics, such as counts of search results

– Cf. using very large corpora in NLP (Banko and Brill ‘01a, ‘01b)
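Computing IDF over a CGC collection is straightforward; a minimal sketch on a toy stand-in for a Wikipedia dump (the documents are illustrative):

```python
import math

# Sketch of "CGC as a corpus": compute IDF from a (toy) collection
# standing in for the entire Wikipedia corpus.
def idf(corpus):
    n = len(corpus)
    df = {}
    for doc in corpus:
        for term in set(doc.lower().split()):   # document frequency
            df[term] = df.get(term, 0) + 1
    return {t: math.log(n / d) for t, d in df.items()}

corpus = [
    "topology is a branch of mathematics",
    "a leopard is a large cat",
    "a cat is a small animal",
]
weights = idf(corpus)
# "a" appears in every document, so its IDF is 0; rarer terms score higher
```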


CGC as a corpus (cont’d)

• Extend existing knowledge repositories (e.g., WordNet)

• Create new datasets, labeled by construction

– WSD datasets using OpenMind – Chklovski & Mihalcea ‘02, ‘03

– WSD datasets based on Wikipedia – Mihalcea’07; Bunescu & Pasca ‘06; Cucerzan ’07

– Text categorization datasets based on the ODP (Davidov et al. ‘04)

• Study the authoring process by observing multiple revisions of a document


CGC as corpus / Extending WordNet

Using corpus statistics for term weighting

• J. Elsas and S. Dumais. Leveraging temporal dynamics of document content in relevance ranking. WSDM 2010
  – Periodic crawls (starting from an unknown point in a document's life) vs. formally tracked revision history
  – LM-specific vs. general term frequency model, which can be plugged into any retrieval model
  – Aji et al. '10 also model bursts – "significant" events in the document's life
• M. Efron. Linear time series models for term weighting in information retrieval. JASIST 2010
  – Temporal behavior of terms as the entire collection changes over time

CGC as corpus / Document revisions

Extending term weighting models with Revision History Analysis (RHA) [Aji et al. ’10]

• RHA captures term importance from the document authoring process.

• Natural integration with state of the art retrieval models (BM25, LM)

• Consistent improvement over existing retrieval models

• Beyond Wikipedia

– Tracking changes in MS Word documents; version control

– Past snapshots of Web pages

(Elsas & Dumais ’10 constructed a new corpus)


Observable document generation process. Example: revisions of "Topology" in Wikipedia (figure: the 1st revision, the 250th revision, and the current revision).

Revision History Analysis redefines Term Frequency (TF)

• TF is a key indicator of document relevance
• TF can be naturally integrated into ranking models

BM25:
S(Q,D) = Σ_{t∈Q} IDF(t) · TF(t,D)·(k1+1) / ( TF(t,D) + k1·(1 − b + b·|D|/avgdl) )

Language Model (KL-divergence ranking):
S(Q,D) = −D(Q ‖ D) = −Σ_{t∈V} P(t|Q) · log( P(t|Q) / P(t|D) )

RHA global model

Define the term frequency based on the whole document generation process:
– a document grows steadily over time
– a term is relatively important if it appears in the early revisions

TF_global(t,d) = Σ_{j=1..n} c(t,v_j) / j^α

where c(t,v_j) is the frequency of term t in revision v_j and α is the decay factor; the sum iterates over revisions.
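A minimal sketch of the global model, directly following the formula above; the revision texts and the value of α are illustrative.

```python
# RHA global model: term frequency summed over all revisions with a
# j^alpha decay, so early-revision occurrences weigh more.
def tf_global(term, revisions, alpha=1.0):
    return sum(
        rev.lower().split().count(term) / (j ** alpha)
        for j, rev in enumerate(revisions, start=1)
    )

revisions = [
    "topology is a branch of mathematics",               # 1st revision
    "topology is a branch of mathematics about shapes",  # 2nd revision
    "topology studies shapes and spaces",                # current
]
print(tf_global("topology", revisions))
```

A term present since the first revision ("topology") accumulates more weight than one introduced later ("shapes").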

RHA global model: example (figure: per-revision term counts decayed by the factor j^α, giving the decayed term importance).

But some pages are different: "Avatar (2009 film)" has a very bursty editing process (figure: the 1st, 500th, and current revisions).

RHA burst model

• A burst resets the decay clock for a term
• The weight will decrease after a burst

TF_burst(t,d) = Σ_{j=1..m} Σ_{k in burst j} c(t,v_k) / (k − b_j + 1)^β

where c(t,v_k) is the frequency of term t in revision v_k, b_j is the first revision of the j-th burst, and β is the decay factor; the outer sum iterates over bursts, the inner sum over revisions within each burst.

Burst detection:
• Content-based (relative content change)
• Activity-based (intensive edit activity)
• Both combined
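A matching sketch of the burst model; here the burst boundaries are supplied directly rather than detected from content change or edit activity as in the paper, and the revisions and β are illustrative.

```python
# RHA burst model: each burst restarts the decay clock, so terms in the
# first revisions of a burst weigh more.
def tf_burst(term, revisions, burst_starts, beta=1.0):
    bounds = list(burst_starts) + [len(revisions)]
    total = 0.0
    for j in range(len(burst_starts)):             # iterate over bursts
        for k in range(bounds[j], bounds[j + 1]):  # revisions in burst j
            count = revisions[k].lower().split().count(term)
            total += count / ((k - bounds[j] + 1) ** beta)
    return total

revisions = ["avatar film", "avatar film by cameron",
             "avatar sequel announced", "avatar sequel delayed"]
print(tf_burst("avatar", revisions, burst_starts=[0, 2]))
```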

RHA burst model: example (figure: decayed term importance, with the decay clock restarting at each burst).

Putting it all together: RHA Term Frequency

Combining the global model with the burst model (global, burst, and original TF components):

TF_rha(t,D) = λ1·TF_global(t,D) + λ2·TF_burst(t,D) + λ3·TF(t,D),  where λ1 + λ2 + λ3 = 1

Integrating RHA into retrieval models

BM25 + RHA: replace TF(t,D) with TF_rha(t,D):
S(Q,D) = Σ_{t∈Q} IDF(t) · TF_rha(t,D)·(k1+1) / ( TF_rha(t,D) + k1·(1 − b + b·|D|/avgdl) )

Statistical Language Models + RHA: replace P(t|D) with the RHA term probability P_rha(t|D):
S(Q,D) = −Σ_{t∈V} P(t|Q) · log( P(t|Q) / P_rha(t|D) )

Performance improvements: +2-7% (INEX), +1-4% (TREC)
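Plugging a revision-aware TF into BM25 can be sketched as below. The tf values, document-frequency table, and document-length parameters are illustrative, and standard k1/b defaults are assumed.

```python
import math

# BM25 with the standard TF(t,D) replaced by a precomputed revision-aware
# TF (e.g., from the combined global+burst model). All inputs are toy values.
def bm25(query_terms, tf, doc_len, avgdl, n_docs, df, k1=1.2, b=0.75):
    score = 0.0
    for t in query_terms:
        if t not in tf:
            continue
        # Lucene-style smoothed IDF (always positive)
        idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
        denom = tf[t] + k1 * (1 - b + b * doc_len / avgdl)
        score += idf * tf[t] * (k1 + 1) / denom
    return score

# tf_rha would come from the combined global+burst model; here a stub dict
tf_rha = {"topology": 3.2, "shapes": 1.1}
print(bm25(["topology", "shapes"], tf_rha, doc_len=120, avgdl=100,
           n_docs=1000, df={"topology": 12, "shapes": 80}))
```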

Overview of Part 3

• Use CGC as an additional (huge!) corpus
  – Better term statistics
  – Observing the authoring process
• CGC as repositories of world knowledge
  – Distilling knowledge from CGC structure and content
  – Concepts as features (Bag of Words + Concepts)
    • Semantic relatedness, and later IR
  – Concepts as word senses
    • WSD

CGC as repositories of senses and concepts

• Articles / concepts as representation features

– From the bag of words to

the bag of words + concepts

– Semantic relatedness of words (and texts)

– Later: concept-based IR

• Articles / concepts as word senses

– Word sense disambiguation

CGC as knowledge repositories

Semantic relatedness of words and texts

• How related are

cat mouse

Augean stables golden apples of the Hesperides

慕田峪 (Mutianyu) 司马台 (Simatai)

• Used in many applications

– Information retrieval

– Word-sense disambiguation

– Error correction (“Web site” or “Web sight”)

CGC as knowledge / Semantic relatedness

Evaluation criterion: correlation with human judgments

Word pair – human score [0 .. 10]:
• journey / voyage: 9.29
• Jerusalem / Israel: 8.46
• money / property: 7.57
• football / tennis: 6.63
• alcohol / chemistry: 5.54
• coast / hill: 4.38
• professor / cucumber: 0.23
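The evaluation above ranks relatedness measures by correlation with these human scores. A from-scratch Spearman rank correlation (no tie handling, which the toy data does not need; the system scores are invented for the example):

```python
# Spearman rank correlation between system scores and human judgments.
def spearman(xs, ys):
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

human = [9.29, 8.46, 7.57, 6.63, 5.54, 4.38, 0.23]   # scores from the slide
system = [0.91, 0.85, 0.70, 0.72, 0.50, 0.41, 0.05]  # illustrative scores
print(round(spearman(human, system), 3))
```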


Previous approaches

• WordNet-based measures

– Using graph structure (path length, random walks), information content, glosses

– Comprehensive review: Budanitsky & Hirst ‘06

– Patwardhan et al. ‘03 – using semantic relatedness measures for WSD

• Jiang-Conrath (WN structure + info content) performs on par with Adapted Lesk (gloss overlap)

• All other WN-based measures are inferior to Adapted Lesk

• Co-occurrence based measures

– Latent Semantic Analysis (Deerwester et al. ‘90)


The role of background knowledge

When humans judge semantic relatedness, they use massive amounts of background knowledge. Can we endow computers with such a capability for knowledge-based reasoning?

Using Wikipedia for computing semantic relatedness

• Structure (links)

– Strube & Ponzetto ’06 (WikiRelate)

– Milne & Witten ’08 (WLM)

• Content (text)

– Gabrilovich & Markovitch ’07 (ESA)

• Structure + content

– Yeh et al. ’09 (WikiWalk)


Explicit Semantic Analysis (ESA) [Gabrilovich & Markovitch '06-'09]

• Wikipedia-based semantic representation
• Some ESA applications:
  – Computing semantic relatedness of words and texts
  – Text categorization
  – Information retrieval
    • Cross-lingual retrieval utilizing cross-language links in Wikipedia

The AI Dream (diagram: a problem solver maps problems directly to solutions).

CGC concepts as features

The circular dependency problem (diagram: the problem solver requires Natural Language Understanding, which is itself an unsolved problem).

Circumventing the need for deep Natural Language Understanding (diagram: Wikipedia-based semantics, built with statistical text processing, substitutes for full NLU in the problem solver).

Every Wikipedia article represents a concept (e.g., Leopard).

Wikipedia can be viewed as an ontology – a collection of concepts (e.g., Leopard, Isaac Newton, Ocean).

The semantics of a word is a vector of its associations with Wikipedia concepts. (Cf. Bengio's talk tomorrow on learning text and image representations.)

Word representation: 2D example (figure: a word is a point in the multi-dimensional space defined by the set of Wikipedia concepts; the axes are Wikipedia concepts A and B, with coordinates in [0.0, 1.0]).

Text representation: 2D example (figure: the semantics of a text fragment is the centroid of the vectors of its individual words).

"Cat" (figure: concepts associated with the word "cat": Leopard, Mouse, Veterinarian, Pets, Tennis ball, Elephant, House).

Concept and word representation in Explicit Semantic Analysis [Gabrilovich & Markovitch '09]

Each concept (article) is represented by its salient words:
• Cat: Cat [0.95], Felis [0.74], Claws [0.61]
• Panthera: Cat [0.92], Leopard [0.84], Roar [0.77]
• Tom & Jerry: Animation [0.85], Mouse [0.82], Cat [0.67]

The word "cat" is then represented by its weight in each concept: Cat 0.95, Panthera 0.92, Tom & Jerry 0.67.

Computing semantic relatedness as cosine between two vectors

Example concept vectors (from the figure):
• Cat 0.95, Panthera 0.92, Tom & Jerry 0.67
• Leopard 0.84, Mac OS X v10.5 0.61, Cat 0.52

Relatedness is the cosine between the two vectors.
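The cosine step can be sketched over sparse concept vectors; the weights below reuse the figure's illustrative values.

```python
import math

# ESA-style relatedness: each word is a sparse vector of Wikipedia-concept
# weights; relatedness is the cosine of the two vectors.
def cosine(u, v):
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

cat = {"Cat": 0.95, "Panthera": 0.92, "Tom & Jerry": 0.67}
other = {"Leopard": 0.84, "Mac OS X v10.5": 0.61, "Cat": 0.52}
print(round(cosine(cat, other), 3))
```

Only the shared concept ("Cat") contributes to the dot product, so the two vectors are related but far from identical.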

Experimental results (individual words)

Algorithm – correlation with human judgments:
• WordNet-based: 0.33–0.35
• Roget's Thesaurus: 0.55
• LSA: 0.56
• WikiRelate: 0.49
• WLM: 0.69
• ESA (ODP): 0.65
• ESA (Wikipedia): 0.75
• TSA [Radinsky et al. '11]: 0.80
• EWC (ESA/Wikipedia + WordNet + collocations) [Haralambous & Klyuev, forthcoming at IJCNLP '11]: 0.79–0.87

Using Wikipedia for Named Entity Disambiguation [Bunescu & Pasca '06]

• Both the method and the dataset are based on Wikipedia
• Dataset: 1.7M labeled examples
  – ''The [[Vatican City | Vatican]] is now an independent enclave ...''
  – the geographical location, not the Holy See

The basic approach: compute the cosine similarity between the word context and the target articles.

CGC for WSD

Problems with the basic approach

1. Short articles and stubs

2. Synonymy (lexical mismatch between the word context and the article)

Solution: use categories for generalization


Word-Category Correlations

"John Williams and the Boston Pops conducted a summer Star Wars concert at Tanglewood."

Which John Williams?
• John Williams (composer): Film score composers → Composers → Musicians
• John Williams (wrestler): Professional wrestlers → Wrestlers → People known in connection with sports and hobbies → People by occupation

When a Wikipedia article is too short, get additional context from its categories.

The full solution [Bunescu & Pasca ’06]

• ML-based approach (SVM) to rank candidate senses

• Features:

– Cosine(word context, article text)

– Word-category affinities (binary features indicating if a word has occurred in any article of a given category)

• Dimensionality (|V| x |C|) can be reduced by thresholding word frequencies

• Accuracy improvement vs. cosine-only: 3%-25.5%


Results [Bunescu & Pasca '06]

• The classification model can be trained on Wikipedia, then applied to any input text
  – Senses can only be resolved to those defined in Wikipedia
• Another idea to overcome the lexical mismatch: augment Wikipedia articles with contexts of corresponding word senses
  – Cf. anchor text aggregation [Metzler et al. '09] and [Cucerzan '07]

Taxonomy Kernel

An alternative approach to named entity disambiguation [Cucerzan '07]

• Applicable to any text, not solely that from Wikipedia
  – Sense inventory still limited to Wikipedia
• The system performs both NE identification and disambiguation
• World knowledge used:
  – Known entities (Wikipedia articles)
  – Their class (Person, Location, Organization)
  – Known surface forms (redirection pages + anchor text)
  – Words that describe or co-occur with the entity
  – Categories, lists, enumerations, tables

Restricted document representation: aggregation of features of all identified surface forms, for all entity names (example: entities that can be referred to as "Columbia").

Key idea: joint disambiguation

• Maximize the agreement
  – between the entity vectors and the document, and
  – between the categories of the entities
• Example: {Columbia, Discovery}
  – Space Shuttle Columbia & Space Shuttle Discovery
  – Columbia Pictures & Discovery Investigations
  – LIST_astronomical_topics, CAT_manned_spacecraft, CAT_space_shuttles
• If the "one sense per discourse" assumption is violated – shrink the context iteratively

Results [Cucerzan '07]

• Baseline: most frequent Wikipedia sense
• Wikipedia dataset (labeled by construction): 86.2% → 88.3%
• MSNBC news stories (human labeling): 51.7% → 91.4%
  – Surface forms common in news are highly ambiguous (figure examples: Blackberry, China)
  – Popularity in Wikipedia ≠ popularity in news

Case Study: Searching Tweets


Twitter Statistics (2014)

• 284 million monthly active users

• 500 million tweets are sent per day

• More than 300 billion tweets have been sent since company founding in 2006

• Tweets-per-second record: one-second peak of 143,199 TPS.

• More than 2 billion search queries per day.


Twitter Search Architecture (figures)

Active Index Segments: Write Friendly



Twitter Lucene Extensions


Twitter Lucene Extensions (cont’d)


Lucene Extensions: Storage


In-Memory Real Time Index


Revisit Old Idea: Probabilistic Skip Lists
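The structure behind this slide can be sketched with a minimal generic probabilistic skip list (not Twitter's implementation): each inserted element is promoted to the next level with probability 1/2, giving expected O(log n) search over a sorted linked structure such as a posting list.

```python
import random

# Generic probabilistic skip list sketch (illustrative, not Twitter's code).
class Node:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * level  # next node at each level

class SkipList:
    MAX_LEVEL = 8

    def __init__(self, seed=0):
        self.rng = random.Random(seed)          # fixed seed: repeatable
        self.head = Node(None, self.MAX_LEVEL)  # sentinel head node

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < 0.5:
            level += 1                          # promote with p = 1/2
        return level

    def insert(self, value):
        # remember the rightmost node visited at each level
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in reversed(range(self.MAX_LEVEL)):
            while node.forward[i] and node.forward[i].value < value:
                node = node.forward[i]
            update[i] = node
        new = Node(value, self._random_level())
        for i in range(len(new.forward)):       # splice in at each level
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, value):
        node = self.head
        for i in reversed(range(self.MAX_LEVEL)):
            while node.forward[i] and node.forward[i].value < value:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.value == value

sl = SkipList()
for doc_id in [17, 3, 42, 8, 23]:   # toy posting-list document IDs
    sl.insert(doc_id)
print(sl.contains(23), sl.contains(24))
```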


Top Entities (cont’d)


Resources

• CGC tutorial at WSDM 2012: http://www.mathcs.emory.edu/~eugene/wsdm2012-cgc-tutorial/
• Real-time Search at Twitter: https://www.youtube.com/watch?v=cbksU0qkFm0
• http://www.umiacs.umd.edu/~jimmylin/publications/Busch_etal_ICDE2012.pdf
• http://www.slideshare.net/lucidworks/search-at-twitter-presented-by-michael-busch-twitter


Recommended