CMU TDT Report
TIDES PI Meeting 2002

The CMU TDT Team: Jaime Carbonell, Yiming Yang, Ralf Brown, Jian Zhang, Nianli Ma, Chun Jin
Language Technologies Institute, CMU
Time Line for TDT Activities

- Restarted TDT: Summer 2001
  - Tasks: FSD, SLD, Detection
- New techniques: Nov 2001 – present
  - Topic-conditional novelty (FSD)
  - Situated NEs (all tasks)
  - Source-conditional interpolated training (SLD)
- Evaluations
  - TDT: Oct 2001, July 2002
  - New FSD (internal): July 2002 (KDD Conference)
2002 Dry Run Results: DET

Evaluation Conditions                        Systran   EBMT         DICT
SR=nwt+bnasr TE=mul,eng boundary DEF=10      0.3646    0.3465 [1]
SR=nwt+bnasr TE=mul,eng noboundary DEF=10    0.4040
SR=nwt+bnman TE=arb,eng boundary DEF=10      0.2011                 0.6799 [2]
                                                                    0.1966 [3]
SR=nwt+bnman TE=arb,nat boundary DEF=10      0.1732

[1] Using our Mandarin-to-English EBMT, with Systran's story boundaries substituted for our own.
[2] Using our dictionary-based Arabic-to-English translation with our own boundaries, so the evaluation boundaries and our results are mismatched.
[3] Using our dictionary-based Arabic-to-English translation, with Systran's boundaries substituted for our own.
Baseline FSD Method (Unconditional)

- Dissimilarity with the past
- Decision threshold on the most-similar story
- (Linear) temporal decay
- Length filter (for teasers)
- Cosine similarity with standard weights:

  tfidf = (1 + log(tf)) * log(N/df)
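The baseline decision rule above can be sketched as follows. This is a minimal illustration, not the CMU implementation: the threshold and linear decay rate are placeholder values, and `past_docs` is an assumed list of (age, term-weight dict) pairs.

```python
import math
from collections import Counter

def ltc_weights(tf_counts, df, n_docs):
    """The slide's standard weighting: (1 + log tf) * log(N/df)."""
    return {term: (1.0 + math.log(tf)) * math.log(n_docs / df.get(term, 1))
            for term, tf in tf_counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def is_first_story(new_doc, past_docs, threshold=0.2, decay=0.01):
    """Flag a story as 'first' if its decayed similarity to every past
    story stays below the decision threshold (linear temporal decay)."""
    best = 0.0
    for age, old_doc in past_docs:  # age = time elapsed since old_doc arrived
        sim = cosine(new_doc, old_doc) * max(0.0, 1.0 - decay * age)
        best = max(best, sim)
    return best < threshold
```

A length filter (dropping very short "teaser" segments before scoring) would sit in front of this loop.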
2002 Dry Run Results: FSD

Evaluation Conditions                          (C_fsd)_norm   (C_fsd)_norm, optimal
SR=nwt+bnasr; TE=eng,nat; boundary; DEF=10     0.6174         0.5846
SR=nwt+bnasr; TE=eng,nat; noboundary; DEF=10   0.6899         0.6403
2002 Dry Run DET: CMU-FSD
FSD Observations

- Cross-site comparable baselines (cost ≈ .7)
- "Events-vs-topics" issue (e.g. Asia crisis)
- A few mislabeled stories wreak havoc for FSD
- Eager auto-segmentation is a problem (misses)
- Recommendations for TDT labeling
  - FSD on true events, or events within topic(s)
  - Change the auto-segmentation optimality criterion??
- Recommendations for TDT researchers
  - Keep working hard on FSD – not cracked yet
New FSD Directions

- Topic-conditional models
  - E.g. "airplane," "investigation," "FAA," "FBI," "casualties" characterize the topic, not the event
  - "TWA 800," "March 12, 1997" characterize the event
- First categorize into a topic, then use maximally-discriminative terms within that topic
- Rely on situated named entities
  - E.g. "Arcan as victim," "Sharon as peacemaker"
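The two-step scheme above (categorize into a topic, then detect novelty within it) can be sketched as follows. The `classify` callback, the `past_by_topic` dict, and the threshold are illustrative assumptions, not the actual CMU components.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def two_level_fsd(new_doc, past_by_topic, classify, threshold=0.5):
    """Two-level FSD: (1) assign the story to a broad topic, then
    (2) test novelty only against past stories of that topic, so that
    topic-general words ("airplane", "FAA") no longer mask
    event-specific terms ("TWA 800")."""
    topic = classify(new_doc)                      # step 1: broad topic
    rivals = past_by_topic.setdefault(topic, [])
    best = max((cosine(new_doc, d) for d in rivals), default=0.0)
    rivals.append(new_doc)
    return topic, best < threshold                 # step 2: novelty within topic
```

Down-weighting topic-specific stop words and up-weighting named entities, as in the ideal case later in the deck, would be applied to the term-weight dicts before this comparison.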
Broad Topics vs Events
Two-level Scheme for FSD
Confusability between Intra-topic Events
[Similarity-matrix plots: AIRPLANE ACCIDENTS vs. BOMBINGS]

- Each data point in the matrix is the similarity between the two corresponding documents.
- Documents are sorted by event as the first key and by time of arrival as the second key, so the diagonal sub-matrices are intra-event document similarities, while the off-diagonal sub-matrices are inter-event document similarities.
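The matrix construction described in those notes can be sketched as follows; the per-document fields `event`, `time`, and `vec` are an assumed schema for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def confusability_matrix(docs):
    """Pairwise similarity matrix over documents sorted by (event,
    arrival time); with that ordering, the diagonal blocks hold
    intra-event similarities and the off-diagonal blocks hold
    inter-event similarities."""
    docs = sorted(docs, key=lambda d: (d['event'], d['time']))
    vecs = [d['vec'] for d in docs]
    return [[cosine(u, v) for v in vecs] for u in vecs]
```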
Measuring Effectiveness of NEs
[1] f denotes a named entity; S_k, the k-th of the seven types of named entities.
[2] We use the effectiveness of each NE type to measure how well it can differentiate intra-topic events.
Effectiveness of Named Entities
Experimental Design

- Baseline: conventional FSD
- Simple case: two-level FSD with "perfect" topic labels
- Ideal case: two-level FSD with "perfect" topic labels, weighted NEs, and topic-specific stop words removed
- Real case: same as the ideal case, but using system-predicted topic labels
Data Description

- Broadcast News: published by Primary Source Media; 261,209 transcripts of news articles from ABC, CNN, NPR, and MSNBC, covering 1992 to 1998.
- Document structure: each document (story) is composed of several fields, such as Title, Topic, Keywords, Date, Abstract, and Body.
- (Training) topic labels provided by PSM (4 topics): airplane accidents, bombings, tornados, hijackings.
- CMU students labeled 36 events within the 4 topics (divided into 50% training and 50% test).
Results for Topic-Conditioned FSD
Confusability Reduction (5 events within the topic "airplane accidents" in test data)

NOTE:
1. These graphs contain only test data (5 events for the topic "airplane accidents").
2. The left graph is the baseline; the right one is the ideal case.
Topic-Conditioned Approach to First Story Detection for TDT