Faster and Smaller N-Gram LMs
Adam Pauls and Dan Klein
Presented by SUN Jun

Overview

• N-gram LMs
• A short review of LM implementations
  – Trie
  – Array: implicit trie

• This work: a combination of multiple techniques
  – Implicit encoding of the query word
  – Variable-length encoding for compression
  – Speed-ups for the decoder

Back-Off LM

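• LM: An n-gram LM assigns a probability to each word given its history, the preceding n-1 words; the probability of a word sequence is the product of these conditional probabilities.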

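• Back-off LM: Trust the highest-order model that contains the n-gram; when the n-gram is missing, back off to the next lower order and multiply by the context's back-off weight.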
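
A minimal sketch of the back-off rule, assuming a toy model stored ARPA-style as Python dicts of log10 probabilities (`LOGPROB`) and back-off weights (`BACKOFF`) keyed by word tuples. The names and numbers are hypothetical; only the control flow reflects the technique: try the longest n-gram first, and each time the history is shortened, add the dropped context's back-off weight (multiplication in log space).

```python
from typing import Dict, Sequence, Tuple

NGram = Tuple[str, ...]

# Hypothetical toy model: log10 probabilities and back-off weights keyed by
# word tuples, as in an ARPA file. The numbers are made up for illustration.
LOGPROB: Dict[NGram, float] = {
    ("the",): -1.0,
    ("cat",): -2.0,
    ("sat",): -2.5,
    ("the", "cat"): -0.5,
    ("cat", "sat"): -0.75,
    ("the", "cat", "sat"): -0.25,
}
BACKOFF: Dict[NGram, float] = {
    ("the",): -0.25,
    ("cat",): -0.5,
    ("the", "cat"): -0.125,
}

def backoff_logprob(history: Sequence[str], word: str) -> float:
    """log10 P(word | history) under Katz-style back-off."""
    context: NGram = tuple(history)
    penalty = 0.0  # accumulated log10 back-off weights
    while True:
        ngram = context + (word,)
        if ngram in LOGPROB:
            # Highest-order n-gram that is present: trust its estimate.
            return penalty + LOGPROB[ngram]
        if not context:
            # Even the unigram is missing; a real LM would map it to <unk>.
            return float("-inf")
        # Multiply by the context's back-off weight (add in log space);
        # a context with no listed weight contributes log10(1) = 0.
        penalty += BACKOFF.get(context, 0.0)
        context = context[1:]  # drop the oldest word and try the lower order
```

For example, `backoff_logprob(("the", "cat"), "the")` finds no trigram or bigram, so it returns -0.125 + -0.5 + -1.0 = -1.625.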

Implementation of Back-off LM

• File based
• Trie
• Reverse trie
• Array-a: implicit trie
• Array-b: implicit trie with reverse index to parent
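
The reverse trie listed above can be sketched as follows. This is a generic illustration, not code from the paper: n-grams are inserted last word first, so the path for "the cat" passes through the node for "cat", and a single walk from the root finds the longest stored n-gram that ends in the query word, which is the highest order a back-off query can use. `Node`, `ReverseTrie`, and `longest_match` are hypothetical names.

```python
from typing import Dict, Optional, Sequence, Tuple

class Node:
    """Trie node; `logprob` is set only if the root-to-node path is a stored n-gram."""
    __slots__ = ("children", "logprob")

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.logprob: Optional[float] = None

class ReverseTrie:
    """N-grams are stored last word first, so each suffix of an n-gram
    (its lower-order back-off n-grams) is a prefix of its path."""

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, ngram: Sequence[str], logprob: float) -> None:
        node = self.root
        for word in reversed(ngram):
            node = node.children.setdefault(word, Node())
        node.logprob = logprob

    def longest_match(self, words: Sequence[str]) -> Tuple[int, Optional[float]]:
        """Walk from the last word backwards; return (order, logprob) of the
        longest stored n-gram that is a suffix of `words`, or (0, None)."""
        node, best = self.root, (0, None)
        for depth, word in enumerate(reversed(words), start=1):
            node = node.children.get(word)
            if node is None:
                break
            if node.logprob is not None:
                best = (depth, node.logprob)
        return best
```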

This paper

• This work: a combination of multiple techniques
  – Implicit encoding of the query word
  – Variable-length encoding for compression
  – Speed-ups for the decoder

Implicit Encoding of query word

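• Sorted array: keys kept in sorted order and found by binary search; no wasted slots, but each lookup costs O(log n) comparisons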
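
A minimal sketch of a sorted-array lookup, assuming each key packs the query word id and a context offset into one integer with the word in the high bits. The 32-bit split and the names `pack` and `SortedArrayLM` are assumptions for illustration. In this layout a key's position in the array can serve as its offset, which a higher-order array could then store as its context.

```python
import bisect
from typing import List, Optional, Tuple

CONTEXT_BITS = 32  # assumption: context offsets fit in 32 bits

def pack(word_id: int, context_offset: int) -> int:
    """Pack (word id, context offset) into one integer, word id in the high bits."""
    if not 0 <= context_offset < (1 << CONTEXT_BITS):
        raise ValueError("context offset out of range")
    return (word_id << CONTEXT_BITS) | context_offset

class SortedArrayLM:
    """One n-gram order: parallel key/value arrays sorted by packed key."""

    def __init__(self, entries: List[Tuple[int, int, float]]) -> None:
        # entries are (word_id, context_offset, logprob) triples
        entries = sorted(entries, key=lambda e: pack(e[0], e[1]))
        self.keys = [pack(w, c) for w, c, _ in entries]
        self.values = [v for _, _, v in entries]

    def find(self, word_id: int, context_offset: int) -> Optional[int]:
        """Binary search; return the entry's index (its offset) or None."""
        key = pack(word_id, context_offset)
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```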

Implicit Encoding of query word

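• Hash table: keys placed by hashing, giving expected constant-time lookups at the cost of extra memory for unused slots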
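
For comparison, a minimal open-addressing hash table with linear probing over packed integer keys. Linear probing, the multiplicative hash constant, the 0.5 default load factor, and the name `LinearProbingTable` are choices made for this sketch, not details taken from the slides. Keeping the table half empty means most lookups touch only one or two slots.

```python
from typing import List, Optional

EMPTY = -1  # sentinel: packed keys are assumed non-negative

class LinearProbingTable:
    """Open-addressing hash table with linear probing for integer keys."""

    def __init__(self, num_keys: int, load_factor: float = 0.5) -> None:
        size = max(1, int(num_keys / load_factor))
        self.keys: List[int] = [EMPTY] * size
        self.values: List[float] = [0.0] * size

    def _slot(self, key: int) -> int:
        # Multiplicative (Fibonacci) hashing truncated to 64 bits.
        return ((key * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF) % len(self.keys)

    def put(self, key: int, value: float) -> None:
        i = self._slot(key)
        for _ in range(len(self.keys)):
            if self.keys[i] in (EMPTY, key):
                self.keys[i], self.values[i] = key, value
                return
            i = (i + 1) % len(self.keys)  # probe the next slot
        raise RuntimeError("hash table is full")

    def get(self, key: int) -> Optional[float]:
        i = self._slot(key)
        for _ in range(len(self.keys)):
            if self.keys[i] == EMPTY:
                return None  # an empty slot ends the probe sequence
            if self.keys[i] == key:
                return self.values[i]
            i = (i + 1) % len(self.keys)
        return None
```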

Implicit Encoding of query word

• All (w_i, c) keys that share the same query word w_i are stored in one contiguous range, so storing w_i in every entry is redundant. We exploit this redundancy by storing only the context offsets c in the main array, using as many bits as needed to encode all context offsets (32 bits for Web1T).

• In auxiliary arrays, one for each n-gram order, we store for each w_i the beginning and end of the range of the trie array that holds all of its (w_i, c) keys.
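
The two bullets above can be sketched as follows, assuming one n-gram order is given as a list of `(word_id, context_offset)` pairs. Sorting groups each word's keys into one run, so the main array keeps only the context offsets, and a per-word range table (a dict here, standing in for the auxiliary array) tells the lookup where to search. The binary search within the range and the name `ImplicitWordArray` are assumptions of this sketch.

```python
import bisect
from typing import Dict, List, Optional, Tuple

class ImplicitWordArray:
    """One n-gram order stored without explicit word ids.

    Keys (w, c) are sorted by word, then context offset, so each word's keys
    form a contiguous run. Only c is kept in the main array; `ranges` maps
    each word to its [begin, end) run.
    """

    def __init__(self, keys: List[Tuple[int, int]]) -> None:
        keys = sorted(keys)
        self.contexts: List[int] = [c for _, c in keys]
        self.ranges: Dict[int, Tuple[int, int]] = {}
        for i, (w, _) in enumerate(keys):
            begin, _end = self.ranges.get(w, (i, i))
            self.ranges[w] = (begin, i + 1)

    def find(self, word_id: int, context_offset: int) -> Optional[int]:
        """Return the array index of (word_id, context_offset), or None."""
        if word_id not in self.ranges:
            return None
        begin, end = self.ranges[word_id]
        # Search only this word's run; the word itself is implied by the range.
        i = bisect.bisect_left(self.contexts, context_offset, begin, end)
        if i < end and self.contexts[i] == context_offset:
            return i
        return None
```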

Variable length encoding for compression
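
This slide gives no details, so the following is a generic stand-in, not necessarily the paper's exact scheme: a sorted key array is compressed by storing the gaps between consecutive keys, and each gap is written as a variable-byte integer (7 payload bits per byte, high bit set when more bytes follow). Small gaps, which are common in a dense sorted array, then take a single byte. The function names are hypothetical.

```python
from typing import List

def encode_varint(n: int, out: bytearray) -> None:
    """Append non-negative n using 7 data bits per byte; high bit = more bytes follow."""
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)

def compress_sorted(keys: List[int]) -> bytes:
    """Delta-encode a sorted list of non-negative ints, then varint-encode each gap."""
    out = bytearray()
    prev = 0
    for k in keys:
        if k < prev:
            raise ValueError("keys must be sorted and non-negative")
        encode_varint(k - prev, out)
        prev = k
    return bytes(out)

def decompress_sorted(data: bytes) -> List[int]:
    """Invert compress_sorted: decode varints and take a running sum."""
    keys: List[int] = []
    prev, n, shift = 0, 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation bit set: more bytes for this gap
        else:
            prev += n
            keys.append(prev)
            n, shift = 0, 0
    return keys
```

For example, the keys 3, 130, 131, 100000 have gaps 3, 127, 1, 99869 and encode into 1 + 1 + 1 + 3 = 6 bytes.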

Speeding up the decoder

• Repetitive queries
  – Served from a cache

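• Scrolling queries: consecutive n-gram queries from the decoder overlap in all but one word (w_{i-n+1} ... w_i is followed by w_{i-n+2} ... w_{i+1}), so work done for one query can be reused by the next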
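
For repetitive queries, one simple option (an assumption here, not necessarily the paper's cache design) is a direct-mapped cache in front of the full lookup: each n-gram hashes to exactly one slot, and a colliding n-gram overwrites the old entry, so a hit costs one hash and one comparison with no eviction bookkeeping. `DirectMappedCache` and its `lookup` callback are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

NGram = Tuple[int, ...]

class DirectMappedCache:
    """Fixed-size cache in front of an LM lookup function.

    Each n-gram maps to one slot; a colliding n-gram simply overwrites the
    previous occupant.
    """

    def __init__(self, lookup: Callable[[NGram], float], size: int = 1 << 16) -> None:
        self.lookup = lookup
        self.keys: List[Optional[NGram]] = [None] * size
        self.values: List[float] = [0.0] * size
        self.hits = 0
        self.misses = 0

    def logprob(self, ngram: NGram) -> float:
        slot = hash(ngram) % len(self.keys)
        if self.keys[slot] == ngram:
            self.hits += 1
            return self.values[slot]
        self.misses += 1
        value = self.lookup(ngram)  # fall through to the full LM query
        self.keys[slot], self.values[slot] = ngram, value
        return value
```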

Experiments

Exp

Exp

Exp

END