TEXT MINING – MP1 Prepared by: Mohammad Al Boni


TASKS & IMPLEMENTATION STRATEGIES

Some implementation tips before you start!
1.1 Understand Zipf's Law.
1.2 Construct a controlled vocabulary.
1.3 Compute similarity between documents.
2.1 Maximum likelihood estimation for statistical language models with proper smoothing.
2.2 Generate text documents from a language model.
2.3 Language model evaluation.


IMPLEMENTATION TIPS

Use an IDE such as Eclipse or NetBeans.
Divide and conquer!

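Parallel computing vs. multi-threading:

ArrayList<Thread> threads = new ArrayList<Thread>();
// Presumably runs inside worker thread `core`: process every NumberOfProcessors-th file, starting at index `core`.
for (int j = 0; j + core < FilesSize; j += NumberOfProcessors)
    analyzer.analyzeDocumentDemo(analyzer.LoadJson(Files.get(j + core)), core);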

Use separate code files for separate problems.
Save and load intermediate results.
Always test your code on a small data sample.
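The strided loop in the multi-threading tip gives worker `core` the files at indices core, core + NumberOfProcessors, core + 2 × NumberOfProcessors, and so on. The slides do not show how the threads are created, so the following is only a minimal sketch of one way to do it. `StridedWorkers`, `processFile`, and `processAll` are hypothetical names, and `processFile` merely stands in for the `analyzer.analyzeDocumentDemo(analyzer.LoadJson(...), core)` call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class StridedWorkers {
    // Hypothetical stand-in for analyzer.analyzeDocumentDemo(analyzer.LoadJson(file), core).
    static void processFile(String file, int core, AtomicInteger processed) {
        processed.incrementAndGet(); // thread-safe counter shared by all workers
    }

    // Starts one thread per worker; worker `core` handles files core, core + n, core + 2n, ...
    static int processAll(List<String> files, int numberOfProcessors) {
        AtomicInteger processed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int c = 0; c < numberOfProcessors; c++) {
            final int core = c;
            Thread t = new Thread(() -> {
                for (int j = 0; j + core < files.size(); j += numberOfProcessors) {
                    processFile(files.get(j + core), core, processed);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait for every worker before reading the result
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return processed.get();
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.json", "b.json", "c.json", "d.json", "e.json");
        int n = Runtime.getRuntime().availableProcessors();
        System.out.println(processAll(files, n) + " files processed");
    }
}
```

Because each index is visited by exactly one worker, no file is analyzed twice. Any shared statistics (here the counter) still need a thread-safe container or a per-thread copy that is merged after `join()`.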


TASKS - 1.3 COMPUTE SIMILARITY BETWEEN DOCUMENTS

Approach:
Load the controlled vocabulary from part 1.2.
Load the test documents.
Load the reviews from query.json.
Compute similarities and get the top 3 similar reviews.
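The slides do not say which similarity measure or term weighting to use. The sketch below assumes cosine similarity over raw term-frequency vectors restricted to the controlled vocabulary. `TopSimilar`, `toVector`, `cosine`, and `topK` are hypothetical names, and the in-memory token arrays stand in for documents loaded from disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TopSimilar {
    // Builds a sparse term-frequency vector, keeping only controlled-vocabulary terms.
    // Raw counts are an assumption; another weighting (e.g. TF-IDF) could be swapped in here.
    static Map<String, Double> toVector(String[] tokens, Set<String> vocabulary) {
        Map<String, Double> v = new HashMap<>();
        for (String t : tokens) {
            if (vocabulary.contains(t)) v.merge(t, 1.0, Double::sum);
        }
        return v;
    }

    // Cosine similarity between two sparse vectors; defined as 0 if either is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w;
        }
        for (double w : b.values()) normB += w * w;
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Indices of the k documents most similar to the query, most similar first.
    static List<Integer> topK(Map<String, Double> query, List<Map<String, Double>> docs, int k) {
        double[] scores = new double[docs.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            scores[i] = cosine(query, docs.get(i));
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> -scores[i]));
        return new ArrayList<>(order.subList(0, Math.min(k, order.size())));
    }

    public static void main(String[] args) {
        Set<String> vocab = new HashSet<>(Arrays.asList("good", "food", "bad", "service"));
        Map<String, Double> query = toVector(new String[] {"good", "food"}, vocab);
        List<Map<String, Double>> docs = new ArrayList<>();
        docs.add(toVector(new String[] {"bad", "service"}, vocab));
        docs.add(toVector(new String[] {"good", "food", "the"}, vocab));
        docs.add(toVector(new String[] {"good", "service"}, vocab));
        System.out.println(topK(query, docs, 3));
    }
}
```

Sorting all scores is simple and adequate for a small test set. For a large collection, a bounded priority queue of size 3 avoids the full sort.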


TASK 2.1 LM SMOOTHING


TASK 2.1 MAXIMUM LIKELIHOOD ESTIMATION

Figure 2. Linear interpolation smoothing
Figure 3. Absolute discounting smoothing


Figure 4. Linear interpolation smoothing
Figure 5. Absolute discounting smoothing
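The figures name two smoothing methods but the formulas did not survive extraction. As a rough sketch under stated assumptions, the class below estimates a bigram model that backs off to an additively smoothed unigram model. The class name, the parameter names `delta`, `lambda`, and `d`, and the convention that `lambda` weights the unigram term are all assumptions, not taken from the slides.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SmoothedBigramModel {
    final Map<String, Integer> unigram = new HashMap<>();              // c(w)
    final Map<String, Map<String, Integer>> bigram = new HashMap<>();  // c(u, w)
    final Map<String, Integer> contextTotal = new HashMap<>();         // c(u) = sum over w of c(u, w)
    final double delta;                                                // additive smoothing constant
    int totalTokens = 0;

    SmoothedBigramModel(List<String> tokens, double delta) {
        this.delta = delta;
        for (int i = 0; i < tokens.size(); i++) {
            String w = tokens.get(i);
            unigram.merge(w, 1, Integer::sum);
            totalTokens++;
            if (i > 0) {
                String u = tokens.get(i - 1);
                bigram.computeIfAbsent(u, k -> new HashMap<>()).merge(w, 1, Integer::sum);
                contextTotal.merge(u, 1, Integer::sum);
            }
        }
    }

    // p(w) = (c(w) + delta) / (N + delta * |V|), with V taken as the training vocabulary.
    double unigramProb(String w) {
        return (unigram.getOrDefault(w, 0) + delta) / (totalTokens + delta * unigram.size());
    }

    // p(w | u) = (1 - lambda) * c(u, w) / c(u) + lambda * p(w); unseen contexts fall back to p(w).
    double linearInterpolation(String u, String w, double lambda) {
        int cu = contextTotal.getOrDefault(u, 0);
        if (cu == 0) return unigramProb(w);
        int cuw = bigram.get(u).getOrDefault(w, 0);
        return (1 - lambda) * cuw / cu + lambda * unigramProb(w);
    }

    // p(w | u) = max(c(u, w) - d, 0) / c(u) + (d * S(u) / c(u)) * p(w),
    // where S(u) is the number of distinct words seen after u and 0 < d < 1.
    double absoluteDiscounting(String u, String w, double d) {
        int cu = contextTotal.getOrDefault(u, 0);
        if (cu == 0) return unigramProb(w);
        Map<String, Integer> followers = bigram.get(u);
        int cuw = followers.getOrDefault(w, 0);
        return Math.max(cuw - d, 0) / cu + d * followers.size() / cu * unigramProb(w);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "the food was good the food was bad the service was good".split(" "));
        SmoothedBigramModel lm = new SmoothedBigramModel(tokens, 0.1);
        System.out.println(lm.linearInterpolation("was", "good", 0.1));
        System.out.println(lm.absoluteDiscounting("was", "the", 0.1));
    }
}
```

A useful sanity check on a small sample is that, for any seen context, the smoothed probabilities sum to 1 over the training vocabulary and every unseen bigram receives a non-zero probability.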


TASK 2.2 GENERATE TEXT DOCUMENTS FROM A LANGUAGE MODEL.
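The slide content for this task was not preserved. One common way to generate text from a language model is to draw each word by walking the cumulative distribution. The sketch below assumes the model is already available as plain probability maps, and `TextSampler`, `sampleWord`, and `generate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class TextSampler {
    // Draws one word from a distribution given as word -> probability (assumed to sum to 1).
    static String sampleWord(Map<String, Double> dist, Random rng) {
        double r = rng.nextDouble(); // uniform in [0, 1)
        double cumulative = 0;
        String last = null;
        for (Map.Entry<String, Double> e : dist.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (r < cumulative) return last;
        }
        return last; // reached only if rounding leaves the total slightly below 1
    }

    // Generates `length` words. The first word comes from the unigram model; each later
    // word comes from p(. | previous word), falling back to the unigram model when the
    // previous word has no conditional distribution.
    static List<String> generate(Map<String, Double> unigram,
                                 Map<String, Map<String, Double>> conditional,
                                 int length, Random rng) {
        List<String> words = new ArrayList<>();
        String prev = null;
        for (int i = 0; i < length; i++) {
            Map<String, Double> dist =
                (prev == null) ? unigram : conditional.getOrDefault(prev, unigram);
            prev = sampleWord(dist, rng);
            words.add(prev);
        }
        return words;
    }

    public static void main(String[] args) {
        Map<String, Double> unigram = new LinkedHashMap<>();
        unigram.put("good", 0.5);
        unigram.put("food", 0.3);
        unigram.put("service", 0.2);
        // An empty conditional map makes this a pure unigram generator.
        Map<String, Map<String, Double>> conditional = new LinkedHashMap<>();
        System.out.println(String.join(" ", generate(unigram, conditional, 10, new Random(42))));
    }
}
```

Passing a seeded `Random` makes a run reproducible, which helps when comparing documents generated by differently smoothed models.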


THANK YOU!