GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3 Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation]

“This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 601138”.

P. Wittek, S. Daranyi, E. Kontopoulos, T. Moysiadis, I. Kompatsiaris

Monitoring term drift based on semantic consistency in an evolving vector field



Page 2: Monitoring term drift based on semantic consistency in an evolving vector field

Propose a field approach to lexical analysis

Use evolving fields for expressing time-dependent changes in a vector space model

◦ Random indexing

◦ Evolving self-organizing maps (ESOM)


Page 3: Monitoring term drift based on semantic consistency in an evolving vector field

Semantic continuity hypothesis

◦ Actual & potential word content

◦ Observable locations & “lexical gaps”

Continuity modelled as evolving field

◦ Actual & potential word content constantly dislocated over time

Time-stamped data

◦ Measure dislocations

◦ “Semantic drift”, an indicator of language change
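As a rough illustration of measuring dislocations on time-stamped data, the sketch below compares each term's vector in two periods using cosine distance. It assumes both periods share one coordinate system; the function names and the toy vectors are hypothetical and are not taken from the project's code.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / norm

def term_dislocation(vectors_t0, vectors_t1):
    """Cosine distance of every term present in both periods.

    Assumption of this sketch: both dicts map term -> vector and the
    vectors of the two periods live in the same coordinate system.
    """
    shared = sorted(set(vectors_t0) & set(vectors_t1))
    return {t: cosine_distance(vectors_t0[t], vectors_t1[t]) for t in shared}

# Hypothetical toy vectors for two periods.
period_1 = {"kindle": [1.0, 0.0], "novel": [0.0, 1.0]}
period_2 = {"kindle": [0.0, 1.0], "novel": [0.0, 2.0]}
drift = term_dislocation(period_1, period_2)
```

In this toy data "kindle" changes direction completely while "novel" only changes length, so only the former registers as dislocated under cosine distance.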


Page 4: Monitoring term drift based on semantic consistency in an evolving vector field

Semantic similarity

Semantic fields

Measuring semantic consistency

Semantic drifts


Page 5: Monitoring term drift based on semantic consistency in an evolving vector field

1. Evaluate semantic consistency within single time periods of an evolving data set

2. Determine whether semantic drift can be detected by analysing the change in semantic consistency



Page 7: Monitoring term drift based on semantic consistency in an evolving vector field

Distributional Similarity & Random Indexing

◦ TFIDF vector space model of the corpus

◦ Random indexing

Semantic fields in ESOMs

◦ Embed vector space on a 2D surface using ESOMs

◦ Resulting network reflects local topology of the high-dimensional space

WN-based similarity metrics

◦ Path-based, content-based, feature-based, hybrid
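The random indexing step can be sketched as follows. This is a minimal illustration, not the SemanticVectors implementation: each document receives a sparse ternary index vector, and a term's vector is the sum of the index vectors of the documents it occurs in. TFIDF weighting is omitted, and the dimensionality, sparsity, and toy documents are assumptions chosen for readability.

```python
import random

def index_vector(dim, nonzero, rng):
    """Sparse ternary vector: `nonzero` entries are +1 or -1, the rest 0."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec

def random_indexing(documents, dim=64, nonzero=4, seed=0):
    """Build term vectors by summing document index vectors.

    The index vector of a document is added once per occurrence of a
    term, so repeated terms are weighted by raw frequency (no TFIDF).
    """
    rng = random.Random(seed)
    term_vectors = {}
    for doc in documents:
        doc_index = index_vector(dim, nonzero, rng)
        for term in doc.lower().split():
            vec = term_vectors.setdefault(term, [0] * dim)
            for i, value in enumerate(doc_index):
                vec[i] += value
    return term_vectors

# Hypothetical toy corpus.
docs = [
    "great plot and great characters",
    "the plot was thin",
    "battery life is great",
]
vectors = random_indexing(docs)
```

The point of the technique is that the dimensionality stays fixed at `dim` however many documents are added, which is what makes it convenient for data that keeps growing over time.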


Page 8: Monitoring term drift based on semantic consistency in an evolving vector field

12.8M Amazon book reviews over 18 yrs *

Lucene, SemanticVectors, Somoclu

WordNet 3.0 & WS4J

Wu and Palmer’s semantic similarity method

All experiments are open source

* Stanford University’s SNAP project: http://snap.stanford.edu/index.html
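For orientation, Wu and Palmer's measure scores two concepts by the depth of their deepest shared ancestor relative to their own depths: 2 * depth(lcs) / (depth(a) + depth(b)). The sketch below applies it to a hypothetical toy taxonomy with a single parent per node; it does not call WS4J or WordNet, and it assumes the root has depth 1.

```python
def path_to_root(node, parent):
    """Return [node, parent(node), ..., root] for a child -> parent mapping."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def depth(node, parent):
    """Depth counted in nodes, so the root has depth 1 (an assumption)."""
    return len(path_to_root(node, parent))

def wu_palmer(a, b, parent):
    """Wu-Palmer similarity in a single-rooted tree taxonomy."""
    ancestors_b = set(path_to_root(b, parent))
    # First node on a's path that is also an ancestor of b: the deepest one.
    lcs = next(n for n in path_to_root(a, parent) if n in ancestors_b)
    return 2 * depth(lcs, parent) / (depth(a, parent) + depth(b, parent))

# Hypothetical toy taxonomy (child -> parent), not WordNet data.
toy = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "animal",
    "cat": "feline",
    "feline": "animal",
}
sim_dog_wolf = wu_palmer("dog", "wolf", toy)
sim_dog_cat = wu_palmer("dog", "cat", toy)
```

WordNet allows a synset to have several hypernyms, so a real implementation has to choose among multiple paths; the single-parent tree here sidesteps that.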


Page 9: Monitoring term drift based on semantic consistency in an evolving vector field

Proximity → sem-sim?

Neurons >1 term

avg-sim between terms

5-term neurons

Normal distribution

N ≥ 3, 5, 10


Terms within a neuron demonstrated significantly greater similarity in comparison to a randomly selected group of terms
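One way such a comparison could be set up is sketched below: for every neuron holding more than one term, the average pairwise similarity of its terms is compared with the mean over randomly drawn groups of the same size. The slides do not specify the statistical test used, so this only computes the two quantities; the function names, the toy vocabulary, and the toy similarity function are hypothetical.

```python
import itertools
import random
import statistics

def avg_pairwise(terms, sim):
    """Mean similarity over all unordered pairs of terms."""
    return statistics.mean(sim(a, b) for a, b in itertools.combinations(terms, 2))

def neuron_vs_random(neurons, vocabulary, sim, trials=200, seed=0):
    """Map neuron -> (within-neuron average, random-group baseline).

    Neurons with fewer than two terms are skipped because they have
    no term pairs to average over.
    """
    rng = random.Random(seed)
    results = {}
    for neuron, terms in neurons.items():
        if len(terms) < 2:
            continue
        baseline = statistics.mean(
            avg_pairwise(rng.sample(vocabulary, len(terms)), sim)
            for _ in range(trials)
        )
        results[neuron] = (avg_pairwise(terms, sim), baseline)
    return results

# Hypothetical toy data: terms sharing a first letter count as similar.
vocabulary = ["a1", "a2", "a3", "b1", "b2", "b3"]
neurons = {(0, 0): ["a1", "a2", "a3"], (0, 1): ["b1"], (1, 0): ["a1", "b2"]}

def toy_sim(a, b):
    return 1.0 if a[0] == b[0] else 0.0

results = neuron_vs_random(neurons, vocabulary, toy_sim)
```

In practice `toy_sim` would be replaced by a WordNet-based measure, and the paired values would feed whatever significance test the analysis calls for.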

Page 10: Monitoring term drift based on semantic consistency in an evolving vector field

For each of the 3 periods, the process was repeated

Percentages slightly decreased from period to period

Decrease was not statistically significant – no divergence

More periods needed


Page 11: Monitoring term drift based on semantic consistency in an evolving vector field

Models of evolving semantic content

◦ Dynamic vector field

◦ Semantic continuity in the vocabulary

◦ Experiment confirmed that similarity of terms within ESOM grid neurons was significantly higher than that of randomly selected groups of terms

Future work

◦ Increase number of periods & grid granularity

◦ Smoothen transition between periods

◦ Interpretations of vector field model

◦ Application in other domains

