26
Machine Learning, Predictive Text, and Topic Models Hanna Wallach University of Massachusetts Amherst [email protected]

Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Machine Learning, Predictive Text, and Topic Models

Hanna WallachUniversity of Massachusetts Amherst

[email protected]

Page 2: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 2

Outline

● What is machine learning?

● Examples of machine learning in practice:

$30: Dinner, Cambridge MA $50: Bus ticket, Cambridge MA$5000: Hotel suite, Hong Kong $20: Beer, Amherst MA $10: Lunch, Amherst MA

BALLGAMETEAM

FOOTBALLBASEBALLPLAYERS

PLAYFIELDPLAYER

BASKETBALLCOACHPLAYEDPLAYING

HITTENNIS

JOBWORKJOBS

CAREEREXPERIENCE

EMPLOYMENTOPPORTUNITIES

WORKINGTRAINING

SKILLSCAREERS

POSITIONSFIND

POSITIONFIELD

Page 3: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 3

Machine Learning (ML)

● There are increasingly large amounts of digital data available:

● Machine learning uses computers to find the most salient features in data to further knowledge and make life easier

... with as little human input as possible

Page 4: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 4

Uncertainty

● There is uncertainty in almost all real world situations:

● ML explicitly represents uncertainty using probability:

– Pr (lemon) = how certain I am that this is a lemon

● Probability provides a framework for reasoning under uncertainty

Page 5: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 5

USPS Digit Recognition

● Problem:

– USPS needs to sort letters by zip code

● Solution:

– Teach a computer to recognize hand-written digits

– Only ask human when computer is uncertain:

1 or 7?

Page 6: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 6

Credit Card Fraud

● Problem:

– Want to detect credit card fraud

● Solution:

– Train a computer to recognise normal and abnormal usages

– Alert card-holder if abnormal pattern is detected

$30: Dinner, Cambridge MA$50: Bus ticket, Cambridge MA$10: Lunch, Amherst MA$20: Beer, Amherst MA$10: Lunch, Amherst MA

$30: Dinner, Cambridge MA $50: Bus ticket, Cambridge MA$5000: Hotel suite, Hong Kong $20: Beer, Amherst MA $10: Lunch, Amherst MA

Page 7: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 7

Sorting News Stories by Genre

Page 8: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 8

Predictive Text Entry

● e.g., T9 or iTAP

● Used on cell phones

● Enables use of reduced keyboard

● Enter as much text as possible with as few gestures as possible

TextGestures

(as few as possible)

Page 9: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 9

Predictive Text Entry

● This is like the reverse of text compression

● Text compression: want to go from as much text as possible to as small a representation as possible

TextBit string

(preferably short)

Page 10: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 10

Writing and Text Compression

● Optimal text compression

TextBit string

(preferably short)

probabilisticmodel

Page 11: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 11

Writing and Text Compression

● Optimal text compression and writing with predictive text entry

TextBit string

(preferably short)

probabilisticmodel

TextGestures

(as few as possible)

probabilisticmodel

Page 12: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 12

Dasher [http://www.dasher.org.uk]

● Driven by 2D continuous gestures

● Uses a model of language

● Available for

– Windows

– Linux

– Mac OS X

– Pocket PC

– etc.

Page 13: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 13

Dasher: Screen Layout

● Box sizes are proportional to probabilities

● Probabilities come from a letter-based language model

● P(X) = b P(X, Y) = a

Page 14: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 14

Dasher: Dynamics

Point where you want to go

● Like driving a car

● Motion sickness?

● Not if you're driving!

Page 15: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 15

Dasher: Benefits

● Keyboards: one gesture per character

● Dasher: some gestures select many characters

● Works with any language

● Inaccurate gestures can be compensated for by later gestures

Page 16: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 16

Topic Models

● Humans can read a document and identify the small number of topics that best characterize that document

Page 17: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 17

Topic Models

● Topics are mixtures of words and documents are mixtures of topics

Page 18: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 18

Topic Models

● Infer topic information from word-document co-occurrences

Page 19: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 19

Example Topics [Tenenbaum et al.]

STORYSTORIES

TELLCHARACTER

CHARACTERSAUTHOR

READTOLD

SETTINGTALESPLOT

TELLINGSHORTFICTIONACTION

FIELDMAGNETICMAGNET

WIRENEEDLE

CURRENTCOIL

POLESIRON

COMPASSLINESCORE

ELECTRICDIRECTION

FORCE

SCIENCESTUDY

SCIENTISTSSCIENTIFIC

KNOWLEDGEWORK

RESEARCHCHEMISTRY

TECHNOLOGYMANY

MATHEMATICSBIOLOGYFIELD

PHYSICSLABORATORY

BALLGAMETEAM

FOOTBALLBASEBALLPLAYERS

PLAYFIELDPLAYER

BASKETBALLCOACHPLAYEDPLAYING

HITTENNIS

JOBWORKJOBS

CAREEREXPERIENCE

EMPLOYMENTOPPORTUNITIES

WORKINGTRAINING

SKILLSCAREERS

POSITIONSFIND

POSITIONFIELD

Page 20: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 20

Transfer between Topics [Mimno]

Page 21: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 21

Entities and Topics [Newman et al.]

Page 22: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 22

Topics and Email

● Enron email corpus:

– 250k email messages, 23k people

Page 23: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 23

Selecting Email Keywords [Dredze et al.]

● Without topics: producst pliers stmt hammer wrench

● With topics: team meeting services lisa ase

Page 24: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 24

Senders, Recipients, Topics [McCallum et al.]

“Chief Operations Officer” “Government Relations Executive”

Page 25: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 25

Summary

● Machines can learn a lot from unstructured digital data

● We can use machine learning to build useful applications, some of which you are already using!

Page 26: Machine Learning, Predictive Text, and Topic Modelswallach/talks/ml_intro.pdfHanna Wallach Machine Learning, Predictive Text, and Topic Models 4 Uncertainty There is uncertainty in

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 26

Questions?