Visualizing Concepts over Time
Translantis Project
Digital Humanities Approaches to Reference Cultures: The Emergence of the United States in Dutch Public Discourse 1890-1990
Carlos Martinez-Ortiz, Tom Kenter, Melvin Wevers, Pim Huijnen, Jaap Verheul, and Joris van Eijnatten
DESIGN AND IMPLEMENTATION OF SHICO
SHICO: SHIFTING CONCEPTS OVER TIME
• The emergence and transformation of concepts, ideas, and thoughts
• How can we study concepts in large digitized corpora?
OUR APPROACH
• Data: >600,000 digitized newspaper issues from the Dutch National Library
• Multi-dimensional word-vector space using Google’s word2vec (word embeddings)
• Semantic and syntactic information is represented geometrically (Baroni & Kruszewski, 2014; Wijaya & Yeniterzi, 2011)
[Figure: timeline marked 1950, 1960, 1970]
• 1 model = 10 years
• 40 models for the period 1950-1990
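To make "representation by geometry" concrete, here is a minimal sketch of how closeness between word vectors can be measured. It assumes cosine similarity as the distance measure, and the three-dimensional vectors are invented for illustration; they do not come from the newspaper models. The names `cosine_similarity`, `nearest`, and `toy_vectors` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings, invented for illustration only.
# Real word2vec vectors typically have a few hundred dimensions.
toy_vectors = {
    "doping": [0.9, 0.1, 0.3],
    "epo": [0.8, 0.2, 0.4],
    "krant": [0.1, 0.9, 0.2],
}


def nearest(word, vectors, k=2):
    """Return the k words closest to `word`, most similar first."""
    scores = [
        (other, cosine_similarity(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    for other, score in nearest("doping", toy_vectors):
        print(f"{other}: {score:.3f}")
```

In this toy space "epo" lies much closer to "doping" than "krant" does, which is the kind of relation the per-decade models are queried for.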
TRACING CONCEPTS
• A concept is a network of words
• Find words closely related to a seed word, based on distance
• One or more words as entry points into the concept
• Weighting based on frequency plus summed distance
• Tracing concepts: which words remain in the network and which disappear from it
• See: Kenter, Wevers, Huijnen, de Rijke (2015) “Ad Hoc Monitoring of Vocabulary Shifts over Time”
[Diagram: vocabulary at time t -> expand to semantic graph with semantic space for time t+1 -> prune -> t = t + 1]
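The expand-and-prune loop can be sketched as follows. This is an illustration under stated assumptions, not the ShiCo implementation: cosine similarity stands in for "distance", "frequency" is read as the number of vocabulary words that list a candidate among their neighbours, "sum distance" as the summed similarity, and each word counts as its own closest neighbour so that a seed stays in the concept while it exists in a period's space. All function names, period labels, and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expand(vocabulary, space, k):
    """For each vocabulary word present in `space`, collect its k nearest words.

    Returns two dicts: how many vocabulary words selected each candidate
    (frequency) and the summed similarity of those selections.
    """
    count, total = {}, {}
    for seed in vocabulary:
        if seed not in space:
            continue  # the word does not occur in this period's model
        ranked = sorted(
            ((cosine(space[seed], vec), word) for word, vec in space.items()),
            reverse=True,
        )[:k]
        for sim, word in ranked:
            count[word] = count.get(word, 0) + 1
            total[word] = total.get(word, 0.0) + sim
    return count, total


def prune(count, total, max_terms):
    """Keep the max_terms candidates with the highest (frequency, summed similarity)."""
    ranked = sorted(count, key=lambda w: (count[w], total[w]), reverse=True)
    return ranked[:max_terms]


def trace(seeds, spaces, k=2, max_terms=2):
    """Follow a concept through chronologically ordered (label, space) pairs."""
    vocabulary = list(seeds)
    history = []
    for label, space in spaces:
        count, total = expand(vocabulary, space, k)
        kept = prune(count, total, max_terms)
        history.append((label, kept))
        if kept:
            vocabulary = kept  # vocabulary at time t feeds the step for t + 1
    return history


def changes(previous, current):
    """Which words remained, disappeared, or appeared between two vocabularies."""
    return {
        "remained": [w for w in current if w in previous],
        "disappeared": [w for w in previous if w not in current],
        "appeared": [w for w in current if w not in previous],
    }


# Toy 2-dimensional spaces, invented for illustration only.
spaces = [
    ("1950-1959", {"doping": [1.0, 0.0], "stimulant": [0.9, 0.1], "weather": [0.0, 1.0]}),
    ("1960-1969", {"doping": [1.0, 0.0], "amphetamine": [0.9, 0.1], "weather": [0.0, 1.0]}),
]

if __name__ == "__main__":
    history = trace(["doping"], spaces)
    for label, words in history:
        print(label, words)
    print(changes(history[0][1], history[1][1]))
```

In the toy data "stimulant" drops out of the second period's space and "amphetamine" takes its place, so the concept keeps its seed while its surrounding vocabulary shifts.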
VISUALIZATION
• [Screenshot of the interface]
THREE TYPES OF VISUALIZATIONS
• Stream Graphs
• Network Graphs
• Vocabularies (lists of words in network)
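As a rough illustration of the data behind a stream graph, the sketch below turns per-period weighted vocabularies into one series of weights per word, with zero where a word is absent from the network. The input shape, the function name `stream_series`, and the weights are assumptions made for this example; how ShiCo itself prepares its graphs is not described here.

```python
def stream_series(weighted_vocabularies):
    """Convert [(period_label, {word: weight}), ...] into per-word series.

    Returns the list of period labels and a dict mapping each word to its
    weight in every period (0.0 where the word is not in the network).
    """
    labels = [label for label, _ in weighted_vocabularies]
    words = []
    for _, weights in weighted_vocabularies:
        for word in weights:
            if word not in words:
                words.append(word)  # keep first-seen order for stable layering
    series = {
        word: [weights.get(word, 0.0) for _, weights in weighted_vocabularies]
        for word in words
    }
    return labels, series


# Hypothetical weights, invented for illustration only.
example = [
    ("1950-1959", {"doping": 1.0, "stimulant": 0.8}),
    ("1960-1969", {"doping": 1.0, "amphetamine": 0.9}),
]

if __name__ == "__main__":
    labels, series = stream_series(example)
    print(labels)
    for word, values in series.items():
        print(word, values)
```

Each series becomes one layer of the stream graph; the same per-period dictionaries, read one period at a time, give the vocabulary lists.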
Non-adaptive Model
Adaptive Model
Propaganda: Adaptive & Non-Adaptive
DOPING: STREAM GRAPH
DOPING: VOCABULARIES
DOPING: NETWORK GRAPHS
INTEGRATION
DO-IT-YOURSELF
• Code and how-to manual are on GitHub: https://github.com/NLeSC/ShiCo
• Models are also on GitHub
• Easy to integrate into other tools
Thank you!