61
Compositional and Multimodal Distributional Semantics Stephen Clark Bridges Programme Summer School, University of Warwick 21 September 2016 University of Cambridge Computer Laboratory 1

Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Compositional and Multimodal Distributional Semantics

Stephen Clark

Bridges Programme Summer School, University of Warwick

21 September 2016

University of Cambridge Computer Laboratory

1

Page 2: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Semantics and Language Technology

2

What’s the meaning representation?What representations are currently used in state-of-the-art Language Technology?

Page 3: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Words in Google

3

Page 4: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Sentences in Google

4

Page 5: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Image Search

5

Page 6: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Talk Outline

• Traditional (“symbolic”) compositional semantics

• Data-driven distributional (lexical) semantics

• Distributed representations in neural machine translation

• “Grounding” distributional semantics in psychological feature norms

6

Page 7: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Collaborators

7

Page 8: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Formal (Montague) Semantics

S → NP VP : VP �(NP �)

The dog sleeps : sleep�(dog �)

dog � picks out an individual in some model

sleep� is a relation

The dog sleeps � is true if dog � is in sleep� and false otherwise

8

Page 9: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Semantic Analysis Tools

Boxer output for Every computer on the internet has a domain name

9

Page 10: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Semantic Tools and Resources

Theorem provers and model builders: Vampire, otter, mace, paradox

Roberto Navigli

10

Page 11: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Semantic Similarity

11

Regular coffee breaks diminish the risk of gettingAlzheiemers and dementia in old age.

Three cups of coffee a day greatly reduce the chanceof developing dementia or alzheimers later in life.

Page 12: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Semantic Similarity

• Semantic Similarity is at the heart of many NLP problems

• Set theory is the wrong maths for similarity

12

Regular coffee breaks diminish the risk of gettingAlzheiemers and dementia in old age.

Three cups of coffee a day greatly reduce the chanceof developing dementia or alzheimers later in life.

Page 13: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Semantic Similarity

• Semantic representations need to be learned rather than programmed

• Discrete logic-based representations are difficult to learn

13

Regular coffee breaks diminish the risk of gettingAlzheiemers and dementia in old age.

Three cups of coffee a day greatly reduce the chanceof developing dementia or alzheimers later in life.

Page 14: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Cognitive Scientists vs. Philosophers

• Natural language is flexible, dynamic, ambiguous, metaphorical ...

• Cognitive scientists have always recognised this (Lakoff, 1987)

• Geometric models of meaning seem a better fit (Gardenfors, 2014)

14

Women, Fire and Dangerous Things (Lakoff, 1987); The Geometry of Meaning (Gardenfors, 2014)

Page 15: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Talk Outline

• Traditional (“symbolic”) compositional semantics

• Data-driven distributional (lexical) semantics

• Distributed representations in neural machine translation

• “Grounding” distributional semantics in psychological feature norms

15

Page 16: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

16

Distributional Semantics

... examples of felines are the big cats – the lion , tiger , leopard , ...

... Calico cat is an American name for cats with three colors . Out ...

... Siamese is a breed of cat from Thailand . It is a well-known cat ...

... Cats is a musical . It is based on Old Possums Book of Practical ...

feline lion ... musical color ...

cat

tiger

10 20 4 5

2 4 0 1

target words context words

tTestpwi, cjq “ ppwi, cjq ´ ppwiqppcjqappwiqppcjq

Page 17: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Dimensionality Reduction

17

Page 18: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Vector Space Semantics

18

furry

stroke

pet��

��✠

✁✁✁✁✁✁✁✕

���

��✒

cat

dog

Page 19: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Neural Representations

19

Mikolov et al. (2013); word2vec

Page 20: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Neural Representations

20

The study of distributed representations currently dominates NLP/LT research

Page 21: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

From Words to Phrases and Sentences

s1

s2

s3�

��

��✠

✂✂✂✂✂✍

���✒

man killed dog

man murdered cat

man killed by dog

21

Page 22: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Talk Outline

• Traditional (“symbolic”) compositional semantics

• Data-driven distributional (lexical) semantics

• Distributed representations in neural machine translation

• “Grounding” distributional semantics in psychological feature norms

22

Talk Outline

Page 23: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Machine Translation

23

Page 24: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

MT with Vector Spaces

24

s1

s2

s3�

��

��✠

✂✂✂✂✂✍

���✒

man killed dog

man murdered cat

man killed by dog

Page 25: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

MT with Vector Spaces

25

Grefenstette et. al, New Directions in Vector Space Models of Meaning (ACL, 2014)

Page 26: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Recurrent Neural Networks

26

Taken from Colah’s blog: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Page 27: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

RNNs for POS Tagging

27

The dog chased cheese

DT NN VBD NN

Page 28: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

RNNs for MT

28

Picture fromKyunghyun Cho

Page 29: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Training the Encoder-Decoder End-to-End

29

• Optimise using SGD

• Exploit GPUs

• Neural Net libraries

• “Tricks of the trade”

Page 30: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Training the Encoder-Decoder End-to-End

30

• Optimise using SGD

• Exploit GPUs

• Neural Net libraries

• “Tricks of the trade”

Applications of RNNs currently dominate NLP conferences

Page 31: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

How Far Can Neural Models Go?

• Can neural models learn to make inferences?

• Can neural models incorporate logical operators?

• Can symbolic knowledge be incorporated into neural models?

• Can neural/distributed models connect language with other modalities?

31

Page 32: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

How Far Can Neural Models Go?

• Can neural models learn to make inferences?

• Can neural models incorporate logical operators?

• Can symbolic knowledge be incorporated into neural models?

• Can neural/distributed models connect language with other modalities?

32

Page 33: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Negation and Vector Spaces

33

Page 34: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Negation and Vector Spaces

34

Page 35: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Negation and Vector Spaces

35

Page 36: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Negation and Vector Spaces

36

Page 37: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Talk Outline

• Traditional (“symbolic”) compositional semantics

• Data-driven distributional (lexical) semantics

• Distributed representations in neural machine translation

• “Grounding” distributional semantics in psychological feature norms

37

Page 38: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Multimodal Semantics

• The watermelon fruit has a smooth exterior rind (usually green with dark green stripes or yellow spots) and a juicy, sweet interior flesh.

• Watermelon not only boosts your "health esteem," but it is has excellent levels of vitamins A and C and a good level of vitamin B6.

38

Page 39: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Multimodal Semantics

• The watermelon fruit has a smooth exterior rind (usually green with dark green stripes or yellow spots) and a juicy, sweet interior flesh.

• Watermelon not only boosts your "health esteem," but it is has excellent levels of vitamins A and C and a good level of vitamin B6.

39

Page 40: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Multimodal Semantics

40

Page 41: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Feature Norms

• Human subjects are asked to identify a concept’s key attributes:

41

Page 42: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Feature Norms

• Feature norms used extensively in psychology but expensive to produce

• Feature norms are more “cognitively sound” than text-based distributional models, and more interpretable (find shirts with/without sleeves)

42

Page 43: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Predicting Feature Norms

43

Page 44: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Training and Testing the Mapping

• Use standard linear regression model

• Various distributional spaces built from Wikipedia

• “Zero-shot” learning: train on 540 concepts, test on 1

44

Page 45: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Numerical Evaluation

45

Page 46: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Example Nearest Neighbours

46

Page 47: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Predicted Features

47

Page 48: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Predicting Feature Norms (from visual space)

48

Page 49: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Image Vectors

49

IMAGE_VECTOR_FOR_BIKE

• Convolutional Neural Network trained to do object recognition

Page 50: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Predicting Norms from Image Vectors

50

Page 51: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

51

Example Nearest Neighbours

Page 52: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

52

Predicted Features

Page 53: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

What does a Concept Look Like?

53

Page 54: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Image Prediction One Norm at a Time

54

Page 55: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Image Prediction One Norm at a Time

55

Page 56: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Image Prediction One Norm at a Time

56

Page 57: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Image Prediction One Norm at a Time

57

Page 58: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Image Prediction One Norm at a Time

58

Page 59: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

Image Prediction One Norm at a Time

59

Page 60: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

The Automated Wine Taster (?!)

60

Lightly floral notes open up into a broad honey and citrus aroma with a nice earthy mushroom touch

Place of origin: AustraliaGrape: ChardonnayPrice: £8.99

Page 61: Compositional and Multimodal Distributional …...S → NP VP : VP (NP ) The dog sleeps : sleep (dog ) dog picks out an individual in some model sleep is a relation The dog sleeps

The End

Papers available at: http://www.cl.cam.ac.uk/∼sc609/

61