TIMELINE FROM NEWS KK Lo

503 Final Presentation

DESCRIPTION

Computational Linguistics Course Project

Page 1: 503 Final Presentation

TIMELINE FROM NEWS

KK Lo

Page 2: 503 Final Presentation

GOAL...

Page 3: 503 Final Presentation
Page 4: 503 Final Presentation

RELATED WORK

Page 5: 503 Final Presentation

Topic Detection and Tracking

Temporal and Event Tagging

2 communities

Page 6: 503 Final Presentation

Topic Detection and Tracking

tracking topics ≈ classifying documents

discovering new topics

Events of interest

Page 7: 503 Final Presentation

assume each article is an event

Problems

lack of details

publication date = time the event happened?

Page 8: 503 Final Presentation

Temporal and Event Tagging

Tagging events and their temporal relationships

Page 9: 503 Final Presentation

too many Events....

Problems

Result obtained from the TARSQI toolkit

[Figure: TARSQI output, densely labeled with "Event" tags]

Page 10: 503 Final Presentation

MY SOLUTION

Page 11: 503 Final Presentation

APPLY SUMMARIZATION TECHNIQUE AS EVENT FILTERING

Page 12: 503 Final Presentation

3 components

Page 13: 503 Final Presentation

Prior Ranking

1. Sentence A

2. Sentence B

3. Sentence C

4. ...

Sentences near the beginning have a higher prior probability

0 prior probability
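
A minimal sketch of a position-based prior, for illustration only. The slide says only that earlier sentences get a higher prior and that some get zero; the decay form `p ** -alpha`, the `alpha` value, and the `cutoff` parameter are assumptions, not taken from the slides.

```python
def position_prior(num_sentences, alpha=0.25, cutoff=None):
    """Return a prior distribution over sentence positions (index 0 = first).

    Assumptions (not from the slides): the weight of the sentence at 1-based
    position p is p ** -alpha, and sentences at or beyond `cutoff` get zero.
    """
    if num_sentences <= 0:
        return []
    if cutoff is None:
        cutoff = num_sentences
    # Keep at least one sentence with non-zero weight so we can normalise.
    cutoff = max(1, min(cutoff, num_sentences))
    weights = [
        (pos + 1) ** -alpha if pos < cutoff else 0.0
        for pos in range(num_sentences)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Five sentences; only the first three receive any prior mass.
prior = position_prior(5, cutoff=3)
```

The result sums to 1, decreases with position, and is exactly 0 past the cutoff.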

Page 14: 503 Final Presentation

Grasshopper

A PageRank-like ranking algorithm

[Figure: graph of sentences s1 to s5, with edges weighted by cosine similarities]
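
A rough sketch of the PageRank-like part only: a random walk over the sentence graph that teleports according to the prior ranking. It does not include the step in which Grasshopper turns already-selected items into absorbing states to encourage diversity. The function name, `damping` value, and toy similarity matrix are assumptions for illustration.

```python
def stationary_ranking(similarity, prior, damping=0.9, iterations=200):
    """Score sentences by power iteration on a similarity graph.

    similarity: square list of lists with non-negative weights.
    prior: teleport distribution over sentences (sums to 1).
    """
    n = len(similarity)
    # Row-normalise similarities into transition probabilities.
    transition = []
    for row in similarity:
        total = sum(row)
        if total > 0:
            transition.append([v / total for v in row])
        else:
            transition.append([1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            damping * sum(scores[i] * transition[i][j] for i in range(n))
            + (1.0 - damping) * prior[j]
            for j in range(n)
        ]
    return scores


# Toy graph: s1 and s2 are near-duplicates, s3 is unrelated.
sim = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
scores = stationary_ranking(sim, prior=[0.5, 0.3, 0.2])
```

With this toy input the first sentence scores highest, since it is both central in the graph and favored by the prior.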

Page 15: 503 Final Presentation

TARSQI Toolkit

explicit time

event instance

event-time link

event-event link

From TEXT to TimeML

Page 16: 503 Final Presentation

Event Filtering

Events in TimeML

Appear in the Top Selected Sentences?

YES: PICK

NO: BYE
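
The filtering rule above can be sketched as follows. The `Event` record and its fields are hypothetical stand-ins for whatever the TimeML output provides; only the keep-if-in-top-sentences rule comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str        # hypothetical identifier for a tagged event
    sentence_index: int  # index of the sentence that contains the event


def filter_events(events, ranked_sentences, top_k=10):
    """Keep only events whose sentence is among the top_k ranked sentences."""
    selected = set(ranked_sentences[:top_k])
    return [e for e in events if e.sentence_index in selected]


events = [Event("e1", 0), Event("e2", 4), Event("e3", 7), Event("e4", 4)]
ranking = [4, 0, 9, 7]  # sentence indices, best first
kept = filter_events(events, ranking, top_k=2)
```

Here only events in sentences 4 and 0 survive, so `e3` is dropped.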

Page 17: 503 Final Presentation

Temporal Reasoner

Find the (start, end) bounds for each event

[Figure: timeline from Dec 2008 to 2009 with event1, event2, and event3 placed on it]
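
One way such bounds could be derived is simple constraint propagation over "before" links, sketched below. This is an illustrative assumption, not the reasoner used in the project; the month-index encoding, function names, and example events are all hypothetical.

```python
import math


def month_index(year, month):
    """Encode a calendar month as a single integer so bounds are comparable."""
    return year * 12 + (month - 1)


def propagate_bounds(events, anchors, before_links):
    """Tighten (start, end) bounds using event-time anchors and BEFORE links.

    anchors: {event: (earliest, latest)} from explicit time expressions.
    before_links: iterable of (earlier_event, later_event) pairs.
    """
    bounds = {e: list(anchors.get(e, (-math.inf, math.inf))) for e in events}
    changed = True
    while changed:
        changed = False
        for earlier, later in before_links:
            # The later event cannot start before the earlier one can.
            if bounds[later][0] < bounds[earlier][0]:
                bounds[later][0] = bounds[earlier][0]
                changed = True
            # The earlier event cannot end after the later one must.
            if bounds[earlier][1] > bounds[later][1]:
                bounds[earlier][1] = bounds[later][1]
                changed = True
    return {e: tuple(b) for e, b in bounds.items()}


dec_2008 = month_index(2008, 12)
anchors = {
    "event1": (dec_2008, dec_2008),
    "event3": (month_index(2009, 1), month_index(2009, 12)),
}
links = [("event1", "event2"), ("event2", "event3")]
bounds = propagate_bounds(["event1", "event2", "event3"], anchors, links)
```

The unanchored `event2` ends up bounded between Dec 2008 and the end of 2009. An event with no path to any anchor keeps infinite bounds, which is one way an anchoring failure can show up.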

Page 18: 503 Final Presentation

RESULT?

Page 19: 503 Final Presentation

Sentence Selection Quality

Special Thanks to

for the data and ROUGE =p

250-word summary from 25 documents with the DUC2007 data set
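
For readers unfamiliar with ROUGE, a simplified unigram-recall variant (ROUGE-1 recall) can be sketched as below. This is an assumption-laden toy: the official ROUGE toolkit adds stemming, stop-word options, and other variants, and the slides do not say which ROUGE setting was used.

```python
from collections import Counter


def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams that also appear in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    # Clip each token's credit to the number of times the candidate has it.
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    return overlap / sum(ref.values())


score = rouge1_recall("the cat sat", "the cat sat on the mat")
```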

Page 20: 503 Final Presentation

How can we represent 3320 events on a timeline?

Effect of Sentence Filtering

                            D0701A   D0720E
#Events before filtering      3320     1435
#Events after filtering         67       37

choosing the top 10 sentences

Page 21: 503 Final Presentation

This shows that my approach is a failure

Time-Event Anchoring

                            D0701A   D0720E
#Events before filtering      3320     1435
#Failures                     3085     1129
#Events after filtering         67       37
#Failures                       49       29

Page 22: 503 Final Presentation

WHY?

Unable to deduce the relationships for all pairs of events

TARSQI only supports a single document

e.g. 50 tagged events: only 50 pairs of relations are tagged; should be C(50, 2) = 1225
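
The pair count can be checked directly, and a small transitive closure shows how a few tagged BEFORE relations can in principle imply the rest. The closure routine and the five-event chain are illustrative assumptions; the slides do not describe how missing relations were inferred.

```python
import math

# Number of unordered event pairs among 50 events.
pairs_needed = math.comb(50, 2)


def transitive_closure(before):
    """Expand a set of (a, b) 'a before b' pairs with everything they imply."""
    closure = set(before)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# A chain e0 < e1 < e2 < e3 < e4 has 4 tagged links but orders all 10 pairs.
chain = [(f"e{i}", f"e{i + 1}") for i in range(4)]
implied = transitive_closure(chain)
```

When the tagged relations do not form connected chains like this, most of the 1225 pairs stay undetermined.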

Page 23: 503 Final Presentation

LESSON LEARNED

Page 24: 503 Final Presentation

3 areas

Topic Detection and Tracking

Temporal and Event Tagging

Automatic Summarization

my project

Page 25: 503 Final Presentation

The limit of existing technology

OR EVEN

The limit of temporal analysis

cannot get enough information from the documents

Page 26: 503 Final Presentation

Cosine similarity with tf-idf weighting is computationally expensive

2.5 hrs for 867 sentences
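
A minimal sketch of tf-idf cosine similarity between sentences, using sparse dictionaries. The whitespace tokenizer and the unsmoothed `log(N / df)` weighting are assumptions; the slides do not specify the exact variant. The pair count assumes a full similarity graph over all 867 sentences.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build one sparse tf-idf vector (a dict) per sentence."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n / doc_freq[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


vecs = tfidf_vectors([
    "troops entered the city",
    "troops left the city",
    "markets rallied on friday",
])
close = cosine(vecs[0], vecs[1])
far = cosine(vecs[0], vecs[2])

# Unordered sentence pairs in a full similarity graph over 867 sentences.
pair_count = 867 * 866 // 2
```

Computing the document frequencies once and keeping vectors sparse, as here, avoids most of the repeated work in the 375,411 pairwise comparisons.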

Page 27: 503 Final Presentation

DUC2007 documents are hard to parse

different documents have different formats...

no standard date format...

contain some special characters that cause trouble for XML parsers...
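
A defensive-parsing sketch for problems of this kind: escape bare ampersands, strip control characters, and try several date formats in turn. The tag names (`DOC`, `DATE`, `TEXT`), the sample document, and the list of date formats are hypothetical and are not taken from the DUC2007 files.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# An ampersand that does not start a valid entity reference.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9a-fA-F]+);)")
# Control characters that XML 1.0 does not allow.
CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

# Candidate formats, tried in order (an assumed list, extend as needed).
DATE_FORMATS = ("%Y-%m-%d", "%Y%m%d", "%m/%d/%Y", "%B %d, %Y")


def parse_loose_xml(text):
    """Clean up common well-formedness problems, then parse."""
    cleaned = CONTROL.sub("", BARE_AMP.sub("&amp;", text))
    return ET.fromstring(cleaned)


def parse_date(raw):
    """Return a date for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


doc = parse_loose_xml("<DOC><DATE>19990412</DATE><TEXT>AT&T rose</TEXT></DOC>")
published = parse_date(doc.findtext("DATE"))
```

Returning `None` for an unrecognized date, rather than raising, lets the caller decide whether to skip the document or fall back to another source for the date.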

Page 28: 503 Final Presentation

Q & A