MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

"MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd", talk given at the 2nd Real Time Analysis and Mining of Social Streams Workshop (RAMSS) colocated with WWW 2013, Rio de Janeiro, Brazil

Raphaël Troncy

raphael.troncy@eurecom.fr / @rtroncy

Conferences and natural disasters

14/05/2013 Real-Time Analysis and Mining of Social Streams (RAMSS) - Rio de Janeiro - 2

Social Media: some definitions

Media Item: a photo or a video that is shared on a social network

Micropost: a text status message that can optionally accompany a media item

Social Network: an online service that focuses on building and reflecting social relationships among people sharing interests or activities

Media Sharing Platform: emphasis on sharing media, but with blurred boundaries with social networks since users are encouraged to react to media content (like, comment, favorite, etc.)

Social networks and media items

First-order support: posting requires the inclusion of a media item. Examples: Flickr, YouTube

Second-order support: media items can be posted, but so can text-only messages. Example: Facebook

Third-order support: no direct support for media items; the network relies on third-party applications to host them. Example: Twitter before the introduction of native photo support

Media Server

Composition of media item extractors (12 social networks)

Relies on search APIs, with a fixed 30 s timeout window to provide results

Falls back on screen scraping when necessary (Twitter ecosystem)

Implemented as a NodeJS server

Serializes results in a common schema (JSON)
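
The fan-out with a fixed timeout window can be sketched as follows. This is only an illustration in Python, not the Media Server code (which is a NodeJS server): the two extractors, the field names mediaUrl, type and micropost, and the function search_all are hypothetical stand-ins. The idea shown is that all networks are queried in parallel and whatever has answered when the window closes is returned.

```python
import asyncio
import json

# Hypothetical extractors: each stands in for one social network's search API.
async def fast_extractor(term):
    await asyncio.sleep(0.01)
    return [{"mediaUrl": f"https://example.org/{term}.jpg",
             "type": "photo",
             "micropost": f"photo about {term}"}]

async def slow_extractor(term):
    await asyncio.sleep(5)  # longer than the window used in the demo below
    return [{"mediaUrl": "https://example.org/late.jpg",
             "type": "photo",
             "micropost": "arrives too late"}]

async def search_all(term, extractors, timeout_s=30.0):
    """Query every extractor in parallel; keep what arrives within the window."""
    tasks = {asyncio.create_task(fn(term)): name
             for name, fn in extractors.items()}
    done, pending = await asyncio.wait(tasks.keys(), timeout=timeout_s)
    for task in pending:
        task.cancel()  # networks that did not answer in time are dropped
    results = {}
    for task in done:
        if task.exception() is None:  # a failing network does not break the rest
            results[tasks[task]] = task.result()
    return results

# Demo with a shortened window (0.2 s instead of 30 s) so it finishes quickly.
results = asyncio.run(search_all(
    "www2013",
    {"fast": fast_extractor, "slow": slow_extractor},
    timeout_s=0.2,
))
payload = json.dumps(results)  # one JSON document, same shape for every network
```

Only the "fast" network appears in the output here, since the "slow" one misses the window.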

Deep link / Permalink

Clean text for NLP processing

Aggregate view of ALL social interactions

12 Social Networks

Media Finder (www2013)

Media Finder (zooming on media items)

Media Finder (timeline view)

Named Entities are Pivotal

Standalone software: GATE, Stanford CoreNLP, Temis

Web APIs

http://nerd.eurecom.fr/

What is NERD?

REST API [2], ontology [1], UI [3]

[1] http://nerd.eurecom.fr/ontology
[2] http://nerd.eurecom.fr/api/application.wadl
[3] http://nerd.eurecom.fr

The NERD ontology has been integrated in the NIF project, developed in the context of LOD2 (Creating Knowledge out of Interlinked Data), an EU FP7 project

NERD REST API

GET, POST, PUT, DELETE

/document /user /annotation/{extractor} /extraction /evaluation ...

JSON/RDF*

"entities": [{
  "entity": "Tim Berners-Lee",
  "type": "Person",
  "uri": "http://dbpedia.org/resource/Tim_berners_lee",
  "nerdType": "http://nerd.eurecom.fr/ontology#Person",
  "startChar": 30,
  "endChar": 45,
  "confidence": 1,
  "relevance": 0.5
}]
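
A minimal sketch of how a client might read such a response, in Python. The payload is the fragment above wrapped in an enclosing object so that it parses as JSON, which is an assumption about the full response; the helper confident_entities and its min_confidence threshold are hypothetical and not part of the NERD API.

```python
import json

# The slide fragment, wrapped in {...} so that it is a valid JSON document.
response = """{
  "entities": [{
    "entity": "Tim Berners-Lee",
    "type": "Person",
    "uri": "http://dbpedia.org/resource/Tim_berners_lee",
    "nerdType": "http://nerd.eurecom.fr/ontology#Person",
    "startChar": 30,
    "endChar": 45,
    "confidence": 1,
    "relevance": 0.5
  }]
}"""

def confident_entities(payload, min_confidence=0.5):
    """Return (surface form, NERD class, URI) for sufficiently confident entities."""
    selected = []
    for entity in json.loads(payload)["entities"]:
        if entity["confidence"] >= min_confidence:
            # The NERD class is the fragment of the ontology URI after '#'.
            nerd_class = entity["nerdType"].rsplit("#", 1)[-1]
            selected.append((entity["entity"], nerd_class, entity["uri"]))
    return selected

entities = confident_entities(response)
```

In this example the character offsets span 15 characters, which matches the length of the surface form "Tim Berners-Lee".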

Rizzo G., Troncy R. (2012), NERD: A Framework for Unifying Named Entity Recognition and Disambiguation Web Extraction Tools. In: European chapter of the Association for Computational Linguistics (EACL'12), Avignon, France.

Media Finder Architecture

Media items harvesting using the Media Server

http://eventmedia.eurecom.fr/media-server/search/{combined}/{term}

https://github.com/vuknje/media-server (@tomayac fork)

Image near de-duplication: DCT signature on images and video frames, Hamming distance between image pairs
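
The combination of a DCT signature and a Hamming distance can be sketched as below. This is a generic perceptual-hash illustration in Python, not the Media Finder implementation: the 8x8 block of low frequencies, the median threshold, the unnormalized DCT-II and the synthetic 32x32 grayscale images are all assumptions made for the example.

```python
import math
import random
import statistics

def dct_signature(pixels, keep=8):
    """63-bit signature from the low-frequency DCT-II coefficients of a square image."""
    n = len(pixels)
    # cos_table[u][x] = cos((2x + 1) * u * pi / (2n)); scaling factors are omitted.
    cos_table = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
                 for u in range(keep)]
    coeffs = []
    for u in range(keep):
        for v in range(keep):
            total = 0.0
            for x in range(n):
                cu = cos_table[u][x]
                row = pixels[x]
                for y in range(n):
                    total += row[y] * cu * cos_table[v][y]
            coeffs.append(total)
    ac = coeffs[1:]  # drop the DC term, which only encodes overall brightness
    median = statistics.median(ac)
    bits = 0
    for c in ac:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits

def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")

# Synthetic grayscale images: an original, a brightened copy, an unrelated image.
rng = random.Random(0)
image = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
brighter = [[p + 10 for p in row] for row in image]
other = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]

near_duplicate_distance = hamming(dct_signature(image), dct_signature(brighter))
unrelated_distance = hamming(dct_signature(image), dct_signature(other))
```

A pair of items would then be treated as near duplicates when the distance falls under a threshold; the threshold itself has to be tuned on real data and is not given in the slides.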

Clustering and disambiguation: Named Entity Extraction using NERD, Topic Generation using LDA

Media Finder (named entities clustering)

Media Finder (zooming in a cluster)

Media Finder

Live Topic Generation from Event Streams

Meet us at the WWW 2013 Demo Session

http://www.youtube.com/watch?v=8iRiwz7cDYY

Tracking an event: Italian Election

Repeated queries over a period of time

We tracked and analyzed media posts tagged as elezioni2013 from 2013-02-26 to 2013-03-03

Cron job: every 30 minutes over the 6 days

Data sliced into 24-hour slots
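
The slicing step can be sketched as below, in Python. The field name publishedAt, the sample posts and the choice of midnight on 2013-02-26 as the start of the first slot are assumptions for illustration; the slides do not specify time zones or slot boundaries.

```python
from collections import defaultdict
from datetime import datetime, timedelta

START = datetime(2013, 2, 26)   # assumed start of the first slot
SLOT = timedelta(hours=24)

def slot_index(timestamp, start=START, slot=SLOT):
    """Zero-based index of the 24-hour slot a timestamp falls into."""
    return (timestamp - start) // slot

def slice_posts(posts):
    """Group posts by 24-hour slot."""
    slots = defaultdict(list)
    for post in posts:
        slots[slot_index(post["publishedAt"])].append(post)
    return dict(slots)

# Hypothetical sample posts.
posts = [
    {"id": "a", "publishedAt": datetime(2013, 2, 26, 9, 30)},
    {"id": "b", "publishedAt": datetime(2013, 2, 26, 23, 59)},
    {"id": "c", "publishedAt": datetime(2013, 3, 3, 18, 0)},
]
slots = slice_posts(posts)

# A poll every 30 minutes over 6 days amounts to 6 * 48 = 288 queries.
polls = timedelta(days=6) // timedelta(minutes=30)
```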

Research question: can we re-create the news headlines?

Storyboarding: http://mediafinder.eurecom.fr/story/elezioni2013

Tracking an event: Italian Election

Dataset: ~16501 microposts containing (duplicate) media items; ~21087 Named Entities extracted

Clustering: NER and LDA

Generate a Bag of Entities (BOE), each entity disambiguated with a DBpedia URI

Examples: Monti, Bersani, Italia, Berlusconi, Grillo, Stelle
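
A Bag of Entities can be pictured as a count of disambiguated URIs over a set of microposts, with each URI also serving as a cluster key. The Python sketch below illustrates that idea only: the microposts, the specific DBpedia URIs and the helper names are made up for the example and are not taken from the elezioni2013 dataset.

```python
from collections import Counter, defaultdict

# Hypothetical microposts with the DBpedia URIs extracted from their text.
microposts = [
    {"id": 1, "entities": ["http://dbpedia.org/resource/Mario_Monti",
                           "http://dbpedia.org/resource/Italy"]},
    {"id": 2, "entities": ["http://dbpedia.org/resource/Beppe_Grillo"]},
    {"id": 3, "entities": ["http://dbpedia.org/resource/Mario_Monti"]},
]

def bag_of_entities(posts):
    """Count how many microposts mention each disambiguated entity."""
    bag = Counter()
    for post in posts:
        bag.update(set(post["entities"]))  # one vote per micropost
    return bag

def cluster_by_entity(posts):
    """Map each entity URI to the ids of the microposts that mention it."""
    clusters = defaultdict(list)
    for post in posts:
        for uri in sorted(set(post["entities"])):
            clusters[uri].append(post["id"])
    return dict(clusters)

bag = bag_of_entities(microposts)
clusters = cluster_by_entity(microposts)
top_entity, top_count = bag.most_common(1)[0]
```

Because the counts are keyed by URI rather than by surface form, different spellings of the same person end up in the same cluster once they are disambiguated.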

Tracking an event: Italian Election

Tracking and Analyzing The 2013 Italian Election

To appear at the ESWC 2013 Demo Session

http://www.youtube.com/watch?v=jIMdnwMoWnk

Take Home Message

Media Server / Media Finder:

Aggregating fresh social media items

Making sense of media collections for video hyper-linking

NERD platform for extracting key information

Vision: adoption of semantic multimedia technologies will foster a European market for media fragment re-purposing and re-selling

Sneak preview: interact with a Kinect and discover enriched hypervideo

http://www.youtube.com/watch?v=4mSC685AG7k

Credits

Vuk Milicic … interaction designer

Giuseppe Rizzo … NERD guru

José Luis Redondo Garcia … triplification and clustering

Thomas Steiner … Media Server original code

http://www.slideshare.net/troncy
