TEMPORAL SUMMARIZATION OF EVENT-RELATED UPDATES IN WIKIPEDIA
Mihai Georgescu, Dang Duc Pham, Sergej Zerr, Nattiya Kanhabua, Stefan Siersdorfer, Wolfgang Nejdl
L3S Research Center, Leibniz University Hannover


Overview

• Introduction and Motivation
• Methods
• Demo

Introduction

• Wikipedia is a free multilingual online encyclopedia covering a wide range of general and specific knowledge

• Most up-to-date encyclopedia

• One of the reasons that drive editing and updating in Wikipedia is the occurrence of new events in the real world

• All updates are kept in an edit history

• Use the edit history of Wikipedia for extracting event-related information and then present it in a comprehensive way

Events cause increased activity on Wikipedia

Extract and present event-related information from the Wikipedia updates

[Diagram with labels: Event, entity]

Peaks in update activity correlate with events

[Figure: Edit history for the Barack Obama article (monthly), Mar-04 to Jan-10; y-axis from 0 to 1600. Annotated events:
• Supported the Secure Fence Act
• Announced his candidacy, February 10, 2007
• Presidential Campaign Events
• November 4, Obama won the presidency
• Inauguration, January 20, 2009
• Won the 2009 Nobel Peace Prize]

Motivation

Donald Rumsfeld’s resignation as US Secretary of Defense causes a burst of event-related updates

[Figure: Event-related updates for Donald Rumsfeld, Oct-01 to Jan-10; y-axis from 0 to 800. Annotated peak: November 8, 2006]

Wikipedia Update

Difference between current version and previous version

• Previous Revision
• Current Revision
• Words Added
• Words Removed
• Comment
• Section Title
• Timestamp
• Author
• Position
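The fields listed on this slide can be pictured as a small record type. The following is only a minimal sketch: the class name `Update`, the helper `word_diff`, the field types, and the bag-of-words comparison are illustrative assumptions, not the representation used in the original system.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class Update:
    """One update: the difference between two consecutive revisions (hypothetical layout)."""
    timestamp: datetime
    author: str
    comment: str
    section_title: str
    position: int  # assumption: index of the changed sentence in the article
    words_added: List[str]
    words_removed: List[str]


def word_diff(previous: str, current: str) -> Tuple[List[str], List[str]]:
    """Return (words added, words removed) via a simple bag-of-words comparison."""
    prev_counts = Counter(previous.split())
    curr_counts = Counter(current.split())
    # Counter subtraction keeps only positive counts, i.e. the surplus words.
    added = sorted((curr_counts - prev_counts).elements())
    removed = sorted((prev_counts - curr_counts).elements())
    return added, removed
```

A bag-of-words diff ignores word order, which keeps the sketch short; a sequence-based diff would be needed to recover where in the text a change happened.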

Pipeline for identifying and summarizing event-related information from Wikipedia updates

Entity → All Updates → Event-related updates detection → Event-Related Updates → Event identification and summarization → Events and summaries

Event-Related Updates Detection

[Figure: monthly update counts, Mar-04 to Jan-10; y-axis from 0 to 1600]

• Classify (SVM) (2616/10680)
• Event-Related Updates
• Detect Bursts

Event-Related Updates Detection

• Burst Detection
• Classification
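The slides do not say which burst detection algorithm is used. As a rough illustration only, a burst can be treated as a run of consecutive time intervals whose update count exceeds the mean by `k` standard deviations; the function name `detect_bursts` and the threshold rule are assumptions made for this sketch.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def detect_bursts(counts: List[int], k: float = 2.0) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges where counts exceed mean + k * std."""
    if not counts:
        return []
    threshold = mean(counts) + k * pstdev(counts)
    bursts: List[Tuple[int, int]] = []
    start = None
    for i, count in enumerate(counts):
        if count > threshold:
            if start is None:
                start = i  # a new burst begins here
        elif start is not None:
            bursts.append((start, i - 1))  # the burst ended at the previous interval
            start = None
    if start is not None:
        bursts.append((start, len(counts) - 1))  # burst runs to the end of the series
    return bursts


# Monthly update counts with one obvious peak (made-up numbers).
monthly = [10, 12, 9, 11, 10, 200, 180, 12, 10, 11]
print(detect_bursts(monthly, k=1.0))
```

On short series a single large peak inflates the standard deviation, so a smaller `k` may be needed, as in the usage line above.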

Temporal Summarization

• Time-based clustering
  • Burst detection (each burst corresponds to an event)
• Sentence identification
  • Weight = number of updates made to the sentence
  • Positions occupied in the updated revisions
• Text-based clustering
  • Incremental clustering with Jaccard similarity => Sentence Cluster
  • Cluster weight: aggregation of the member sentences' weights
  • Representative sentence
• Position-based clustering
  • Maximum gap of 10 sentences => Positions Cluster
  • Mapping Sentence Cluster to Positions Cluster
• Summarization as ranked sentences
  • Top M Sentence Clusters: Representative, Position Clusters
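The text-based clustering and ranking steps above could be sketched as follows. Several details are assumptions made for illustration, since the slides do not state them: the similarity threshold of 0.5, comparing each sentence against the cluster representative, summing weights as the aggregation, and picking the heaviest member as representative. The names `jaccard`, `cluster_sentences` and `top_m` are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_sentences(weights: Dict[str, int], threshold: float = 0.5) -> List[dict]:
    """Incrementally place each sentence in the first sufficiently similar cluster."""
    clusters: List[dict] = []
    for sentence, weight in weights.items():
        tokens = set(sentence.lower().split())
        for cluster in clusters:
            rep_tokens = set(cluster["representative"].lower().split())
            if jaccard(tokens, rep_tokens) >= threshold:
                cluster["members"].append(sentence)
                cluster["weight"] += weight  # assumption: aggregate by summing
                if weight > cluster["rep_weight"]:
                    # assumption: the heaviest member represents the cluster
                    cluster["representative"] = sentence
                    cluster["rep_weight"] = weight
                break
        else:
            # no similar cluster found: open a new one
            clusters.append({"members": [sentence], "weight": weight,
                             "representative": sentence, "rep_weight": weight})
    return clusters


def top_m(clusters: List[dict], m: int) -> List[str]:
    """Return the representatives of the m heaviest clusters."""
    ranked = sorted(clusters, key=lambda c: c["weight"], reverse=True)
    return [c["representative"] for c in ranked[:m]]
```

Here `weights` maps each sentence to the number of updates made to it, matching the sentence weight defined in the list above.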

Temporal Summarization

1. Burst Detection (time similarity)

2. Sentence Extraction

3. Text Similarity

4. Spatial Similarity
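For the spatial similarity step, the slides give only the rule "maximum gap of 10 sentences". One possible reading, sketched below, is to sort the sentence positions and start a new cluster whenever two neighbouring positions are more than `max_gap` apart; the function name `cluster_positions` and the inclusive treatment of the gap are assumptions.

```python
from typing import List


def cluster_positions(positions: List[int], max_gap: int = 10) -> List[List[int]]:
    """Group sentence positions so that neighbours in a group are at most max_gap apart."""
    clusters: List[List[int]] = []
    for pos in sorted(set(positions)):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)  # close enough to the previous position
        else:
            clusters.append([pos])  # gap too large: start a new position cluster
    return clusters


# Positions (sentence indices) touched by updates within one burst (made-up numbers).
print(cluster_positions([3, 5, 14, 40, 45, 120]))
```

Each resulting position cluster can then be mapped to the sentence clusters whose members occupy those positions.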

Demo Example – Charlie Sheen

Demo and dataset: www.l3s.de/wiki-events