Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology). Discovering Interesting Usage Patterns in Text Collections: Integrating Text Mining with Visualization. Presenter: Wei-Hao Huang. Authors: Anthony Don, Elena Zheleva, Machon Gregory, Sureyya Tarkan, Loretta Auvil, Tanya Clement, Ben Shneiderman, Catherine Plaisant. CIKM 2007.

Page 1: Discovering Interesting Usage Patterns in Text Collections: Integrating Text Mining with Visualization

Intelligent Database Systems Lab
國立雲林科技大學 National Yunlin University of Science and Technology

Presenter: Wei-Hao Huang
Authors: Anthony Don, Elena Zheleva, Machon Gregory, Sureyya Tarkan, Loretta Auvil, Tanya Clement, Ben Shneiderman, Catherine Plaisant
CIKM 2007

Page 2: Outline

· Motivation
· Objectives
· Methodology
· Experiments
· Conclusions
· Comments

Page 3: Motivation

· Critical interpretation of literary works is difficult.
· Researchers are rarely supported in their interpretation and in the development of new hypotheses.
· Text mining algorithms typically return a large number of patterns, which are difficult to interpret out of context.

Page 4: Objectives

• To propose combining text mining with visualization so that the results are easier to interpret for humanities scholars, journalists, intelligence analysts, and other researchers, in order to support the analysis of text collections.

Page 5: Methodology

· FeatureLens
· Frequent expressions
· Frequent words
· Frequent closed itemsets of n-grams

Page 6: FeatureLens

Page 7: Frequent expressions

· “Expression” is used to qualify a word or a longer sequence of words
  ─ N-gram
  ─ Support of an expression

Ex: This is a book.
2-gram: {“This is”, “is a”, “a book”}
3-gram: {“This is a”, “is a book”}
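
The n-gram example above can be reproduced with a short Python sketch. The tokenizer (a simple letters-only regular expression) and the `support` function are assumptions made for illustration: the slide only names "support of an expression", so it is taken here to mean the number of paragraphs that contain the expression. The sample paragraphs are hypothetical.

```python
import re


def ngrams(text, n):
    """Return the n-grams (runs of n consecutive words) of text, in order."""
    # Assumption: words are runs of letters/apostrophes; punctuation is dropped.
    words = re.findall(r"[A-Za-z']+", text)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def support(expression, paragraphs):
    """Count the paragraphs that contain the expression as an n-gram.

    Assumption: this paragraph-count reading of "support" is not spelled
    out on the slide.
    """
    n = len(expression.split())
    return sum(1 for p in paragraphs if expression in ngrams(p, n))


bigrams = ngrams("This is a book.", 2)
trigrams = ngrams("This is a book.", 3)
print(bigrams)   # ['This is', 'is a', 'a book']
print(trigrams)  # ['This is a', 'is a book']

# Hypothetical paragraphs, not taken from the presentation.
paragraphs = ["This is a book.", "Is this a book?", "That is a book too."]
print(support("a book", paragraphs))  # 3
print(support("is a", paragraphs))    # 2 (matching is case-sensitive)
```

Matching is kept case-sensitive so that the output mirrors the slide's example; a real analysis would probably normalize case first.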

Page 8: Frequent words

· D2K/T2K provides the means to perform the frequent words analysis with stemming.
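
As a rough illustration of frequent-word analysis with stemming, the Python sketch below counts word stems across paragraphs. It does not use D2K/T2K: `crude_stem` is a hypothetical suffix stripper written only for this example, the `min_count` threshold is an assumption, and the sample sentences are made up.

```python
import re
from collections import Counter


def crude_stem(word):
    """Hypothetical suffix stripper; NOT the stemmer used by D2K/T2K."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def frequent_words(paragraphs, min_count):
    """Return {stem: count} for stems occurring at least min_count times."""
    counts = Counter(
        crude_stem(w)
        for p in paragraphs
        for w in re.findall(r"[a-z']+", p.lower())
    )
    return {stem: c for stem, c in counts.items() if c >= min_count}


# Hypothetical sentences, not taken from the presentation.
sample = [
    "We improve schools.",
    "Improving schools improves lives.",
    "He improved the school.",
]
frequent = frequent_words(sample, min_count=2)
print(frequent)  # {'improv': 4, 'school': 3}
```

Stemming merges "improve", "improving", "improves" and "improved" into one stem, which is why the count reaches 4 even though no single surface form occurs more than once.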

Page 9: Frequent closed itemsets of n-grams

· <par_id, X>: a paragraph identifier par_id paired with a set X of n-grams drawn from I
· X1 is a frequent closed itemset but X2 and X3 are not.

I = {“I will improve”, “will improve medical”, “will improve security”, “will improve education”, “improve medical aid”, “improve security in”, “improve education in”, “medical aid in”, “aid in our”, “security in our”, “education in our”, “in our country”}

“improve our health care system”
“improve our health our citizens”
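
The idea of a frequent closed itemset can be sketched in Python under a few stated assumptions. The three paragraphs below are reconstructed from the 3-grams listed in I and are not quoted from the slides. "Frequent" is taken to mean support of at least `min_support` paragraphs, and "closed" is the standard definition: no proper superset has the same support. The slide's X1, X2 and X3 are not defined in this transcript, so the sketch does not try to match them.

```python
from itertools import combinations


def ngram_set(text, n):
    """Return the set of n-grams of a whitespace-tokenized text."""
    words = text.split()
    return frozenset(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))


def frequent_closed_itemsets(transactions, min_support):
    """Return {itemset: support} for closed itemsets with support >= min_support.

    transactions maps par_id -> frozenset of items. Every closed itemset is
    the intersection of the paragraphs that contain it, so intersecting each
    group of at least min_support paragraphs enumerates all of them.
    Brute force: exponential in the number of paragraphs.
    """
    closed = {}
    par_ids = list(transactions)
    for k in range(max(1, min_support), len(par_ids) + 1):
        for group in combinations(par_ids, k):
            common = frozenset.intersection(*(transactions[p] for p in group))
            if common and common not in closed:
                # Support = number of paragraphs containing the whole itemset.
                closed[common] = sum(1 for x in transactions.values() if common <= x)
    return closed


# Assumption: paragraphs reconstructed from the 3-grams in I.
paragraphs = {
    1: "I will improve medical aid in our country",
    2: "I will improve security in our country",
    3: "I will improve education in our country",
}
transactions = {pid: ngram_set(text, 3) for pid, text in paragraphs.items()}
items = frozenset().union(*transactions.values())  # the 12 items of I

result = frequent_closed_itemsets(transactions, min_support=2)
for itemset, sup in result.items():
    print(sorted(itemset), sup)  # ['I will improve', 'in our country'] 3
```

With these paragraphs, {"I will improve"} alone also has support 3 but is not closed, because its superset {"I will improve", "in our country"} has the same support. Reporting only closed itemsets therefore removes such redundant sub-patterns.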

Page 10: Experiments

· With two different types of text
  ─ The State of the Union Addresses
  ─ The Making of Americans

Page 11: The State of the Union Addresses

1. How many times did “terrorist” appear in 2002? The president mentions “the American people” and “terrorist” in the same speeches; did the two terms ever appear in the same paragraph?

2. What was the longest pattern? In which year and paragraphs did it occur? What is the meaning of it?

Page 12: The Making of Americans

Page 13: Conclusions

• These text mining concepts can help the user to analyze the text, and to create insights and new hypotheses.

• FeatureLens helps to discover and present interesting insights about the text.

Page 14: Comments

· Advantages
  ─ Text mining with visualization
· Applications
  ─ Text mining