A Probabilistic Approach to Semantic Representation
Tom Griffiths
Mark Steyvers
Josh Tenenbaum
• How do we store the meanings of words?
– question of representation
– requires efficient abstraction
• Why do we store this information?
– function of semantic memory
– predictive structure
Latent Semantic Analysis (Landauer & Dumais, 1997)
[Figure: a word-document co-occurrence matrix X, with rows for words ("words", "in", "semantic", "spaces", …) and columns for documents (Doc1, Doc2, Doc3, …). The SVD factors X = U D V^T, placing words in a high-dimensional space.]
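For concreteness, here is a minimal numpy sketch of the LSA pipeline just described; the toy counts and the helper names are illustrative, not taken from the talk.

    import numpy as np

    # Toy word-document co-occurrence counts (rows = words, columns = documents).
    # The values are made up; LSA would use counts from a large corpus.
    X = np.array([
        [6, 0, 1],   # "semantic"
        [2, 3, 4],   # "in"
        [3, 0, 2],   # "words"
        [5, 1, 0],   # "spaces"
    ], dtype=float)

    # Truncated SVD: X ~ U D V^T, keeping k dimensions.
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    k = 2
    word_vectors = U[:, :k] * d[:k]   # each row places a word in the k-dimensional space

    # Word similarity is then measured by the cosine between word vectors.
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))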
Mechanistic Claim
Some component of word meaning can be extracted from co-occurrence statistics
But…
– Why should this be true?
– Is the SVD the best way to treat these data?
– What assumptions are we making about meaning?
Mechanism and Function
Some component of word meaning can be extracted from co-occurrence statistics
Semantic memory is structured to aid retrieval via context-specific prediction
Functional Claim
Semantic memory is structured to aid retrieval via context-specific prediction
– Motivates sensitivity to co-occurrence statistics
– Identifies how co-occurrence data should be used
– Allows the role of meaning to be specified exactly, and finds a meaningful decomposition of language
A Probabilistic Approach
• The function of semantic memory
– The psychological problem of meaning
– One approach to meaning
• Solving the statistical problem of meaning
– Maximum likelihood estimation
– Bayesian statistics
• Comparisons with Latent Semantic Analysis
– Quantitative
– Qualitative
The Function of Semantic Memory
• To predict what concepts are likely to be needed in a context, and thereby ease their retrieval
• Similar to rational accounts of categorization and memory (Anderson, 1990)
• Same principle appears in semantic networks (Collins & Quillian, 1969; Collins & Loftus, 1975)
The Psychological Problem of Meaning
• Simply memorizing the whole word-document co-occurrence matrix does not help
• Generalization requires abstraction, and this abstraction identifies the nature of meaning
• Specifying a generative model for documents allows inference and generalization
One Approach to Meaning
• Each document is a mixture of topics
• Each word is chosen from a single topic
• Words drawn from topic-specific distributions: P(w | z = j) = φ(j)
• Topics drawn from document-specific mixture weights: P(z = j) = θ(d)
One Approach to Meaning
topic 1: P(w | z = 1) = φ(1)
HEART 0.2, LOVE 0.2, SOUL 0.2, TEARS 0.2, JOY 0.2, SCIENTIFIC 0.0, KNOWLEDGE 0.0, WORK 0.0, RESEARCH 0.0, MATHEMATICS 0.0
topic 2: P(w | z = 2) = φ(2)
HEART 0.0, LOVE 0.0, SOUL 0.0, TEARS 0.0, JOY 0.0, SCIENTIFIC 0.2, KNOWLEDGE 0.2, WORK 0.2, RESEARCH 0.2, MATHEMATICS 0.2
Choose mixture weights for each document, generate “bag of words”
One Approach to Meaning
θ = {P(z = 1), P(z = 2)}
θ = {0, 1}: MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK
θ = {0.25, 0.75}: SCIENTIFIC KNOWLEDGE MATHEMATICS SCIENTIFIC HEART LOVE TEARS KNOWLEDGE HEART
θ = {0.5, 0.5}: MATHEMATICS HEART RESEARCH LOVE MATHEMATICS WORK TEARS SOUL KNOWLEDGE HEART
θ = {0.75, 0.25}: WORK JOY SOUL TEARS MATHEMATICS TEARS LOVE LOVE LOVE SOUL
θ = {1, 0}: TEARS LOVE JOY SOUL LOVE TEARS SOUL SOUL TEARS JOY
[Graphical model: topic z generates word w]
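For concreteness, a minimal Python sketch of this two-topic generative process; the topic distributions match the example above, while the function name and random seed are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    vocab = ["HEART", "LOVE", "SOUL", "TEARS", "JOY",
             "SCIENTIFIC", "KNOWLEDGE", "WORK", "RESEARCH", "MATHEMATICS"]

    # phi[j] = P(w | z = j): each topic is a distribution over the vocabulary.
    phi = np.array([
        [0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0],   # topic 1
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2, 0.2, 0.2],   # topic 2
    ])

    def generate_document(theta, n_words=10):
        """Sample a bag of words: pick a topic for each word, then a word from that topic."""
        words = []
        for _ in range(n_words):
            z = rng.choice(len(theta), p=theta)    # topic assignment for this word
            w = rng.choice(len(vocab), p=phi[z])   # word drawn from the chosen topic
            words.append(vocab[w])
        return words

    print(generate_document(theta=[0.75, 0.25]))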
One Approach to Meaning
• Generative model for co-occurrence data
• Introduced by Blei, Ng, and Jordan (2002)
• Clarifies pLSI (Hofmann, 1999)
Matrix Interpretation
[Figure: the normalized word-document co-occurrence matrix (words × documents) factors into mixture components (words × topics) times mixture weights (topics × documents).]
A form of non-negative matrix factorization
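In symbols, the factorization described by the figure can be written as follows (my notation, using Φ for the matrix of topic distributions φ(j) and Θ for the matrix of mixture weights θ(d)):

\[
P(w \mid d) \;=\; \sum_{j=1}^{T} P(w \mid z = j)\, P(z = j \mid d),
\qquad
\underbrace{C}_{W \times D} \;\approx\; \underbrace{\Phi}_{W \times T}\,\underbrace{\Theta}_{T \times D}
\]

where C is the normalized co-occurrence matrix, each column of Φ is a topic's distribution over words, and each column of Θ gives a document's mixture weights.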
Matrix Interpretation
[Figure: LSA factors the word-document matrix into U (words × vectors), D (vectors × vectors), and V (vectors × documents); the topic model instead factors it into mixture components (words × topics) times mixture weights (topics × documents).]
The Function of Semantic Memory
• Prediction of needed concepts aids retrieval
• Generalization aided by a generative model
• One generative model: mixtures of topics
• Gives a non-negative, non-orthogonal factorization of the word-document co-occurrence matrix
A Probabilistic Approach
• The function of semantic memory
– The psychological problem of meaning
– One approach to meaning
• Solving the statistical problem of meaning
– Maximum likelihood estimation
– Bayesian statistics
• Comparisons with Latent Semantic Analysis
– Quantitative
– Qualitative
The Statistical Problem of Meaning
• Generating data from parameters is easy
• Learning parameters from data is hard
• Two approaches to this problem
– Maximum likelihood estimation
– Bayesian statistics
Inverting the Generative Model
• Maximum likelihood estimation: WT + DT parameters
• Variational EM (Blei, Ng & Jordan, 2002): WT + T parameters
• Bayesian inference: 0 parameters (φ and θ integrated out)
Bayesian Inference
• The sum in the denominator is over T^n terms
• The full posterior is only tractable up to a constant
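The posterior referred to here can be written as follows (a standard statement of the problem; z is the vector of topic assignments for the n words w in the corpus):

\[
P(\mathbf{z} \mid \mathbf{w}) \;=\; \frac{P(\mathbf{w} \mid \mathbf{z})\, P(\mathbf{z})}{\sum_{\mathbf{z}'} P(\mathbf{w} \mid \mathbf{z}')\, P(\mathbf{z}')}
\]

The denominator sums over all T^n possible assignments, which is why only the unnormalized posterior can be evaluated.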
Markov Chain Monte Carlo
• Sample from a Markov chain which converges to target distribution
• Allows sampling from an unnormalized posterior distribution
• Can compute approximate statistics from intractable distributions
(MacKay, 2002)
Gibbs Sampling
For variables x1, x2, …, xn:
Draw x_i^(t) from P(x_i | x_-i), where
x_-i = x_1^(t), x_2^(t), …, x_{i-1}^(t), x_{i+1}^(t-1), …, x_n^(t-1)
Gibbs Sampling
(MacKay, 2002)
Gibbs Sampling
• Need the full conditional distributions of the variables
• Since we only sample z, we need P(z_i = j | z_-i, w), which depends on:
– the number of times word w_i has been assigned to topic j
– the number of times topic j has been used in document d_i
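The resulting full conditional is the standard collapsed-Gibbs update for this model (as written out in Griffiths & Steyvers, 2004; α and β are Dirichlet smoothing parameters and W is the vocabulary size):

\[
P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
\frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + W\beta}
\cdot
\frac{n^{(d_i)}_{-i,j} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha}
\]

where n^{(w_i)}_{-i,j} counts how often word w_i is assigned to topic j and n^{(d_i)}_{-i,j} counts how often topic j is used in document d_i, both excluding the current assignment z_i.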
Gibbs Sampling
[Animation: a table of the 50 word tokens in the toy corpus, with columns i, w_i, d_i, and z_i. At iteration 1 each token (MATHEMATICS, KNOWLEDGE, RESEARCH, WORK, …, JOY) starts with an initial topic assignment; at iteration 2 the assignments z_i are resampled one token at a time from the full conditional, and the process continues for many iterations (shown up to iteration 1000).]
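A compact, illustrative implementation of this sampler in Python; the variable names, hyperparameter values, and the dropping of the constant document-length term are my own choices, not details from the talk.

    import numpy as np

    def gibbs_lda(words, docs, W, D, T, n_iter=1000, alpha=0.1, beta=0.01, seed=0):
        """words[i], docs[i]: word id and document id of token i; T: number of topics."""
        rng = np.random.default_rng(seed)
        N = len(words)
        z = rng.integers(T, size=N)              # random initial topic assignments
        nwt = np.zeros((W, T))                   # word-topic counts
        ndt = np.zeros((D, T))                   # document-topic counts
        nt = np.zeros(T)                         # total tokens per topic
        for i in range(N):                       # initialize count matrices
            nwt[words[i], z[i]] += 1; ndt[docs[i], z[i]] += 1; nt[z[i]] += 1
        for _ in range(n_iter):
            for i in range(N):
                w, d, t = words[i], docs[i], z[i]
                nwt[w, t] -= 1; ndt[d, t] -= 1; nt[t] -= 1     # remove token i's counts
                # unnormalized full conditional for z_i
                # (the document-length denominator is constant in j and is dropped)
                p = (nwt[w] + beta) / (nt + W * beta) * (ndt[d] + alpha)
                p /= p.sum()
                t = rng.choice(T, p=p)                          # resample z_i
                z[i] = t
                nwt[w, t] += 1; ndt[d, t] += 1; nt[t] += 1      # add counts back
        return z, nwt, ndt

After burn-in, estimates of φ and θ can be read off the smoothed count matrices nwt and ndt.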
A Visual Example: Bars
• pixel = word, image = document
• sample each pixel from a mixture of topics
• [Figures: sample images, and the bar-shaped topics recovered from 1000 images]
Interpretable Decomposition
• SVD gives a basis for the data, but not an interpretable one
• The true basis is not orthogonal, so rotation does no good
Application to Corpus Data
• TASA corpus: text from first grade to college
• Vocabulary of 26,414 words
• Set of 36,999 documents
• Approximately 6 million words in the corpus
A Selection of Topics
• THEORY, SCIENTISTS, EXPERIMENT, OBSERVATIONS, SCIENTIFIC, EXPERIMENTS, HYPOTHESIS, EXPLAIN, SCIENTIST, OBSERVED, EXPLANATION, BASED, OBSERVATION, IDEA, EVIDENCE, THEORIES, BELIEVED, DISCOVERED, OBSERVE, FACTS
• SPACE, EARTH, MOON, PLANET, ROCKET, MARS, ORBIT, ASTRONAUTS, FIRST, SPACECRAFT, JUPITER, SATELLITE, SATELLITES, ATMOSPHERE, SPACESHIP, SURFACE, SCIENTISTS, ASTRONAUT, SATURN, MILES
• ART, PAINT, ARTIST, PAINTING, PAINTED, ARTISTS, MUSEUM, WORK, PAINTINGS, STYLE, PICTURES, WORKS, OWN, SCULPTURE, PAINTER, ARTS, BEAUTIFUL, DESIGNS, PORTRAIT, PAINTERS
• STUDENTS, TEACHER, STUDENT, TEACHERS, TEACHING, CLASS, CLASSROOM, SCHOOL, LEARNING, PUPILS, CONTENT, INSTRUCTION, TAUGHT, GROUP, GRADE, SHOULD, GRADES, CLASSES, PUPIL, GIVEN
• BRAIN, NERVE, SENSE, SENSES, ARE, NERVOUS, NERVES, BODY, SMELL, TASTE, TOUCH, MESSAGES, IMPULSES, CORD, ORGANS, SPINAL, FIBERS, SENSORY, PAIN, IS
• CURRENT, ELECTRICITY, ELECTRIC, CIRCUIT, IS, ELECTRICAL, VOLTAGE, FLOW, BATTERY, WIRE, WIRES, SWITCH, CONNECTED, ELECTRONS, RESISTANCE, POWER, CONDUCTORS, CIRCUITS, TUBE, NEGATIVE
• NATURE, WORLD, HUMAN, PHILOSOPHY, MORAL, KNOWLEDGE, THOUGHT, REASON, SENSE, OUR, TRUTH, NATURAL, EXISTENCE, BEING, LIFE, MIND, ARISTOTLE, BELIEVED, EXPERIENCE, REALITY
A Selection of Topics
• THIRD, FIRST, SECOND, THREE, FOURTH, FOUR, GRADE, TWO, FIFTH, SEVENTH, SIXTH, EIGHTH, HALF, SEVEN, SIX, SINGLE, NINTH, END, TENTH, ANOTHER
• STORY, STORIES, TELL, CHARACTER, CHARACTERS, AUTHOR, READ, TOLD, SETTING, TALES, PLOT, TELLING, SHORT, FICTION, ACTION, TRUE, EVENTS, TELLS, TALE, NOVEL
• MIND, WORLD, DREAM, DREAMS, THOUGHT, IMAGINATION, MOMENT, THOUGHTS, OWN, REAL, LIFE, IMAGINE, SENSE, CONSCIOUSNESS, STRANGE, FEELING, WHOLE, BEING, MIGHT, HOPE
• WATER, FISH, SEA, SWIM, SWIMMING, POOL, LIKE, SHELL, SHARK, TANK, SHELLS, SHARKS, DIVING, DOLPHINS, SWAM, LONG, SEAL, DIVE, DOLPHIN, UNDERWATER
• DISEASE, BACTERIA, DISEASES, GERMS, FEVER, CAUSE, CAUSED, SPREAD, VIRUSES, INFECTION, VIRUS, MICROORGANISMS, PERSON, INFECTIOUS, COMMON, CAUSING, SMALLPOX, BODY, INFECTIONS, CERTAIN
A Selection of Topics
• FIELD, MAGNETIC, MAGNET, WIRE, NEEDLE, CURRENT, COIL, POLES, IRON, COMPASS, LINES, CORE, ELECTRIC, DIRECTION, FORCE, MAGNETS, BE, MAGNETISM, POLE, INDUCED
• SCIENCE, STUDY, SCIENTISTS, SCIENTIFIC, KNOWLEDGE, WORK, RESEARCH, CHEMISTRY, TECHNOLOGY, MANY, MATHEMATICS, BIOLOGY, FIELD, PHYSICS, LABORATORY, STUDIES, WORLD, SCIENTIST, STUDYING, SCIENCES
• BALL, GAME, TEAM, FOOTBALL, BASEBALL, PLAYERS, PLAY, FIELD, PLAYER, BASKETBALL, COACH, PLAYED, PLAYING, HIT, TENNIS, TEAMS, GAMES, SPORTS, BAT, TERRY
• JOB, WORK, JOBS, CAREER, EXPERIENCE, EMPLOYMENT, OPPORTUNITIES, WORKING, TRAINING, SKILLS, CAREERS, POSITIONS, FIND, POSITION, FIELD, OCCUPATIONS, REQUIRE, OPPORTUNITY, EARN, ABLE
A Probabilistic Approach
• The function of semantic memory
– The psychological problem of meaning
– One approach to meaning
• Solving the statistical problem of meaning
– Maximum likelihood estimation
– Bayesian statistics
• Comparisons with Latent Semantic Analysis
– Quantitative
– Qualitative
Probabilistic Queries
• Conditional probabilities such as P(w2 | w1) can be computed in different ways
• Fixed topic assumption: assume the words share a single topic
• Multiple samples: average the prediction over samples of topic assignments
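Under the fixed (single shared) topic assumption, one natural form of the query is the following; this is my reconstruction, not the slide's exact formula:

\[
P(w_2 \mid w_1) \;=\; \sum_{j=1}^{T} P(w_2 \mid z = j)\, P(z = j \mid w_1),
\qquad
P(z = j \mid w_1) \;\propto\; P(w_1 \mid z = j)\, P(z = j)
\]

With multiple Gibbs samples, the same quantity is simply averaged across samples.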
Quantitative Comparisons
• Two types of task
– general semantic tasks: dictionary, thesaurus
– prediction of memory data
• All tests use LSA with 400 vectors, and a probabilistic model with 100 samples each using 500 topics
Fill in the Blank
• 12,856 sentences extracted from WordNet
• Overall performance
– LSA gives median rank of 3393
– Probabilistic model gives median rank of 3344
Examples: "his cold deprived him of his sense of ___", "silence broken by dogs barking ___", "a ___ hybrid accent"
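As a sketch of how such a fill-in-the-blank query might be scored with a fitted model (my own illustration; phi is a T × W matrix of P(w | z), and the uniform topic prior is an assumption):

    import numpy as np

    def score_candidates(context_ids, candidate_ids, phi):
        """Rank candidate words by P(word | context words) under a fitted topic model."""
        T = phi.shape[0]
        # Posterior over topics given the context words, assuming a uniform topic prior.
        log_post = np.full(T, -np.log(T))
        for w in context_ids:
            log_post += np.log(phi[:, w] + 1e-12)
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        # P(candidate | context) = sum_j P(candidate | z = j) P(z = j | context)
        scores = phi[:, candidate_ids].T @ post
        return np.argsort(-scores)   # candidate indices, best first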
Fill in the Blank
Synonyms
• 280 sets of five synonyms from WordNet, ordered by number of senses
• Two tasks:
– Predict first synonym
– Predict last synonym
• Increasing number of synonyms
BREAK (78) EXPOSE (9) DISCOVER (8) DECLARE (7) REVEAL (3)
CUT (72) REDUCE (19) CONTRACT (12) SHORTEN (5) ABRIDGE (1)
RUN (53) GO (34) WORK (25) FUNCTION (9) OPERATE (7)
[Results figures: First Synonym; Last Synonym; Synonyms and Word Frequency (Probabilistic vs. LSA); Word Frequency and Filling Blanks (Probabilistic vs. LSA)]
Performance on Semantic Tasks
• Performance comparable, neither great
• Difference in effects of word frequency due to treatment of co-occurrence data
• Probabilistic approach useful in addressing psychological data: frequency important
Intrusions in Free Recall
• Intrusion rates from Deese (1959)
• Used average word vectors in LSA, P(word|list) in probabilistic model
• Favors LSA, since probabilistic combination can be multimodal
Example study list: CHAIR, FOOD, DESK, TOP, LEG, EAT, CLOTH, DISH, WOOD, DINNER, MARBLE, TENNIS
[Results figures: intrusion rates in free recall for the probabilistic model, LSA, and word frequency models]
Word Frequency is Not Enough
• An explanation needs to address two questions:
– Why do these words intrude?
– Why do other words not intrude?
• Median word frequency rank: 1698.5
• Median rank in model: 21
Word Association
• Word association norms from Nelson et al. (1998)
people: EARTH, STARS, SPACE, SUN, MARS, UNIVERSE, SATURN, GALAXY
model: STARS, STAR, SUN, EARTH, SPACE, SKY, PLANET, UNIVERSE, PLANETS
[Figure: overlap between people's associates and the model's predictions, by associate number (1–8)]
Word Association
Performance on Memory Tasks
• Outperforms LSA on simple memory tasks, both far better at predicting memory data
• Improvement due to role of word frequency
• Not a complete account, but can form a part of more complex memory models
Qualitative Comparisons
• Naturally deals with complications for LSA
– Polysemy
– Asymmetry
• Respects natural statistics of language
• Easily extends to other models of meaning
Beyond the Bag of Words
[Graphical models: in the bag-of-words topic model each word w is generated from its own topic z; the extended model adds a sequence of states s associated with successive words.]
Semantic categories
• FOOD, FOODS, BODY, NUTRIENTS, DIET, FAT, SUGAR, ENERGY, MILK, EATING, FRUITS, VEGETABLES, WEIGHT, FATS, NEEDS, CARBOHYDRATES, VITAMINS, CALORIES, PROTEIN, MINERALS
• MAP, NORTH, EARTH, SOUTH, POLE, MAPS, EQUATOR, WEST, LINES, EAST, AUSTRALIA, GLOBE, POLES, HEMISPHERE, LATITUDE, PLACES, LAND, WORLD, COMPASS, CONTINENTS
• DOCTOR, PATIENT, HEALTH, HOSPITAL, MEDICAL, CARE, PATIENTS, NURSE, DOCTORS, MEDICINE, NURSING, TREATMENT, NURSES, PHYSICIAN, HOSPITALS, DR, SICK, ASSISTANT, EMERGENCY, PRACTICE
• BOOK, BOOKS, READING, INFORMATION, LIBRARY, REPORT, PAGE, TITLE, SUBJECT, PAGES, GUIDE, WORDS, MATERIAL, ARTICLE, ARTICLES, WORD, FACTS, AUTHOR, REFERENCE, NOTE
• GOLD, IRON, SILVER, COPPER, METAL, METALS, STEEL, CLAY, LEAD, ADAM, ORE, ALUMINUM, MINERAL, MINE, STONE, MINERALS, POT, MINING, MINERS, TIN
• BEHAVIOR, SELF, INDIVIDUAL, PERSONALITY, RESPONSE, SOCIAL, EMOTIONAL, LEARNING, FEELINGS, PSYCHOLOGISTS, INDIVIDUALS, PSYCHOLOGICAL, EXPERIENCES, ENVIRONMENT, HUMAN, RESPONSES, BEHAVIORS, ATTITUDES, PSYCHOLOGY, PERSON
• CELLS, CELL, ORGANISMS, ALGAE, BACTERIA, MICROSCOPE, MEMBRANE, ORGANISM, FOOD, LIVING, FUNGI, MOLD, MATERIALS, NUCLEUS, CELLED, STRUCTURES, MATERIAL, STRUCTURE, GREEN, MOLDS
• PLANTS, PLANT, LEAVES, SEEDS, SOIL, ROOTS, FLOWERS, WATER, FOOD, GREEN, SEED, STEMS, FLOWER, STEM, LEAF, ANIMALS, ROOT, POLLEN, GROWING, GROW
Syntactic categories
• GOOD, SMALL, NEW, IMPORTANT, GREAT, LITTLE, LARGE, *BIG, LONG, HIGH, DIFFERENT, SPECIAL, OLD, STRONG, YOUNG, COMMON, WHITE, SINGLE, CERTAIN
• THE, HIS, THEIR, YOUR, HER, ITS, MY, OUR, THIS, THESE, A, AN, THAT, NEW, THOSE, EACH, MR, ANY, MRS, ALL
• MORE, SUCH, LESS, MUCH, KNOWN, JUST, BETTER, RATHER, GREATER, HIGHER, LARGER, LONGER, FASTER, EXACTLY, SMALLER, SOMETHING, BIGGER, FEWER, LOWER, ALMOST
• ON, AT, INTO, FROM, WITH, THROUGH, OVER, AROUND, AGAINST, ACROSS, UPON, TOWARD, UNDER, ALONG, NEAR, BEHIND, OFF, ABOVE, DOWN, BEFORE
• SAID, ASKED, THOUGHT, TOLD, SAYS, MEANS, CALLED, CRIED, SHOWS, ANSWERED, TELLS, REPLIED, SHOUTED, EXPLAINED, LAUGHED, MEANT, WROTE, SHOWED, BELIEVED, WHISPERED
• ONE, SOME, MANY, TWO, EACH, ALL, MOST, ANY, THREE, THIS, EVERY, SEVERAL, FOUR, FIVE, BOTH, TEN, SIX, MUCH, TWENTY, EIGHT
• HE, YOU, THEY, I, SHE, WE, IT, PEOPLE, EVERYONE, OTHERS, SCIENTISTS, SOMEONE, WHO, NOBODY, ONE, SOMETHING, ANYONE, EVERYBODY, SOME, THEN
• BE, MAKE, GET, HAVE, GO, TAKE, DO, FIND, USE, SEE, HELP, KEEP, GIVE, LOOK, COME, WORK, MOVE, LIVE, EAT, BECOME
Sentence generation
RESEARCH:
[S] THE CHIEF WICKED SELECTION OF RESEARCH IN THE BIG MONTHS
[S] EXPLANATIONS
[S] IN THE PHYSICISTS EXPERIMENTS
[S] HE MUST QUIT THE USE OF THE CONCLUSIONS
[S] ASTRONOMY PEERED UPON YOUR SCIENTISTS DOOR
[S] ANATOMY ESTABLISHED WITH PRINCIPLES EXPECTED IN BIOLOGY
[S] ONCE BUT KNOWLEDGE MAY GROW
[S] HE DECIDED THE MODERATE SCIENCE
LANGUAGE:
[S] RESEARCHERS GIVE THE SPEECH
[S] THE SOUND FEEL NO LISTENERS
[S] WHICH WAS TO BE MEANING
[S] HER VOCABULARIES STOPPED WORDS
[S] HE EXPRESSLY WANTED THAT BETTER VOWEL
Sentence generation
LAW:
[S] BUT THE CRIME HAD BEEN SEVERELY POLITE OR CONFUSED
[S] CUSTODY ON ENFORCEMENT RIGHTS IS PLENTIFUL
CLOTHING:
[S] WEALTHY COTTON PORTFOLIO WAS OUT OF ALL SMALL SUITS
[S] HE IS CONNECTING SNEAKERS
[S] THUS CLOTHING ARE THOSE OF CORDUROY
[S] THE FIRST AMOUNTS OF FASHION IN THE SKIRT
[S] GET TIGHT TO GET THE EXTENT OF THE BELTS
[S] ANY WARDROBE CHOOSES TWO SHOES
THE ARTS:
[S] SHE INFURIATED THE MUSIC
[S] ACTORS WILL MANAGE FLOATING FOR JOY
[S] THEY ARE A SCENE AWAY WITH MY THINKER
[S] IT MEANS A CONCLUSION
Conclusion
Taking a probabilistic approach can clarify some of the central issues in semantic representation
– Motivates sensitivity to co-occurrence statistics
– Identifies how co-occurrence data should be used
– Allows the role of meaning to be specified exactly, and finds a meaningful decomposition of language
Probabilities and Inner Products
• Single word
• List of words
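One way to read this slide (my gloss, not the slide's exact formulas): in LSA, similarity is an inner product of word vectors, while in the topic model the single-word query

\[
P(w_2 \mid w_1) \;=\; \sum_{j=1}^{T} \phi^{(j)}_{w_2}\, P(z = j \mid w_1)
\]

is an inner product between the topic probabilities of w2 and the posterior over topics given w1; for a list of words, the posterior is conditioned on the whole list.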
Model Selection
• How many topics does a language contain?
• Major issue for parametric models
• Not so much for non-parametric models
– Dirichlet process mixtures
– Expect more topics than tractable
– Choice of number is choice of scale
Gibbs Sampling and EM
• How many topics does a language contain?
• EM finds fixed set of topics, single estimate
• Sampling allows for multiple sets of topics, and multimodal posterior distributions
Natural Statistics
• Treating co-occurrence data as frequencies preserves the natural statistics of language
• Word frequency
• Zipf’s Law of Meaning
Word Association
people
KING JEWEL QUEEN HEAD HAT TOP
ROYAL THRONE
model
KING TEETH HAIR
TOOTH ENGLAND
MOUTH QUEEN PRINCE
CROWN
Word Association
people: CHRISTMAS, TOYS, LIE
model: MEXICO, SPANISH, CALIFORNIA
SANTA