CSCI 252: Neural Networks and Graphical Models, Fall Term 2016, Prof. Levy. Zhao, Li, & Kohonen (2010): Contextual Self-Organizing Map: Software for Constructing Semantic Representations




CSCI 252: Neural Networks and Graphical Models

Fall Term 2016

Prof. Levy

Zhao, Li, & Kohonen (2010):
Contextual Self-Organizing Map:
Software for Constructing Semantic Representations

Classical View of Word Meaning

Problem with the Classical View

• Young children acquire seven to ten new words per day

• Clearly, they can’t be doing this by hearing a dictionary definition!

• Possible solutions (a combination of both is likely):

– Real-world usage context (see a giraffe, learn the word)

– Context of other words

Usage Context

For a large class of cases – though not for all – in which we employ the word “meaning” it can be defined thus: the meaning of a word is its use in the language.

Ludwig Wittgenstein (1889-1951)

You shall know a word by the company it keeps.

– J.R. Firth (1890-1960)

Exploring a Context-Based Alternative

• Running an experiment with human subjects learning / creating word meanings is doable, but costly.

• It’s easier / cheaper to do corpus-based experiments, using a large body of (online) text.

Zhao, Li, & Kohonen (2010)

• A true tabula rasa (“blank slate”) approach: each word in the text starts out as a vector of completely random values.

• Vectors have low-precision values (either 0 or 1), and a large number of dimensions (100): a distributed representation that avoids the Grandmother Cell problem.

• The “meaning” of each word is an emergent property of its context: the average vector of all the words preceding it, plus the average of all the words that follow it (a “trigram window”).

Trigram Window

the green vest

a green forest

my green shirt

some green plants

pretty green fabric

etc.
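The scheme above, applied to phrases like these, can be sketched in a few lines of Python. This is my own toy rendition, not the authors' software: all function names are mine, and I concatenate the left and right averages into one 200-dimensional vector, which is one reasonable reading of “plus.”

```python
# Toy sketch of the tabula-rasa context vectors (my own names/details):
# every word starts as a random 0/1 vector; its "meaning" is the average
# of its left neighbors concatenated with the average of its right
# neighbors, collected over every trigram window it appears in.
import random

random.seed(0)
DIM = 100  # dimensionality used in the paper

def random_vector(dim=DIM):
    """A word's initial, meaning-free code: random 0/1 values."""
    return [random.randint(0, 1) for _ in range(dim)]

def mean(vectors, dim=DIM):
    """Component-wise average of a list of vectors."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def context_vector(word, corpus, codes):
    """Average of left neighbors concatenated with average of right neighbors."""
    left = [codes[corpus[i - 1]] for i in range(1, len(corpus)) if corpus[i] == word]
    right = [codes[corpus[i + 1]] for i in range(len(corpus) - 1) if corpus[i] == word]
    return mean(left) + mean(right)  # list concatenation: 2 * DIM values

phrases = [["the", "green", "vest"], ["a", "green", "forest"],
           ["my", "green", "shirt"], ["some", "green", "plants"],
           ["pretty", "green", "fabric"]]
corpus = [w for p in phrases for w in p]
codes = {w: random_vector() for w in set(corpus)}
green = context_vector("green", corpus, codes)  # 200-dimensional
```

Note that “green” never sees a dictionary definition: its first 100 components summarize the determiners that precede it, the last 100 the nouns that follow it, so words used in similar slots end up with similar vectors.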

Zhao, Li, & Kohonen (2010)

• Resulting trigram word vectors are then used as the input data for an SOM.

• To display the “location” of each word in the trained SOM, run a final pass, locating each word at the unit u that “wins” it.

• Resulting plot corresponds (in nontrivial ways) to our classical understanding of words – e.g., which part of speech (noun, verb, adjective, preposition) each belongs to.
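The train-then-label procedure above can be hand-rolled as a minimal SOM in NumPy. This is my own sketch, not the authors' software: grid size, decay schedules, and function names are all assumptions.

```python
# Minimal self-organizing map sketch (my own toy implementation):
# train a grid of weight vectors on the input data, then run a final
# pass locating each input at the unit that "wins" it.
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, grid=(6, 6), epochs=200, lr0=0.5, sigma0=2.0):
    """Fit an h-by-w grid of weight vectors to the rows of `data`."""
    h, w = grid
    dim = data.shape[1]
    weights = rng.random((h, w, dim))
    # Grid coordinates of each unit, for neighborhood distances.
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                  indexing="ij"), axis=-1)
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)               # learning rate decays
        sigma = sigma0 * (1 - frac) + 0.5   # neighborhood radius shrinks
        for x in data[rng.permutation(len(data))]:
            # Best-matching ("winning") unit for this input.
            dists = np.linalg.norm(weights - x, axis=-1)
            win = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighborhood pull toward the input.
            g = np.exp(-np.sum((coords - np.array(win)) ** 2, axis=-1)
                       / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
    return weights

def winner(weights, x):
    """Final pass: the grid unit whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)
```

Feeding each word's trigram context vector through `winner` after training yields the kind of map plotted on the next slides: words with similar contexts land on the same or neighboring units.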

English (Grimm’s Fairy Tales)

Chinese (BNU Textbook Corpus)