Kalev Leetaru, Eric Shook, and Shaowen Wang
CyberInfrastructure and Geospatial Information Laboratory (CIGI)
Department of Geography and Geographic Information Science
School of Earth, Society, and Environment
National Center for Supercomputing Applications (NCSA)
University of Illinois at Urbana-Champaign
CyberGIS '12, Urbana, IL, August 8, 2012
A CyberGIS Approach to Digital Humanities and Social Sciences: The World of Textual Geography and a Case Study of Wikipedia’s History of the World
http://www.sgi.com/go/wikipedia
Workflow
[Workflow diagram. Labels: Fulltext Geocoding; Sentiment Mining; CyberGIS; Emotional Heatmap]
Inside the CyberGIS “black box”
[Architecture diagram. Labels: Security; Domain Decomposition; GISolve Middleware; Data & Viz; Resource Selection; Task Scheduling; Workflow Management Services; Open Service API; CI; XSEDE; OSG; Clouds]
Data Input for a Topic
A set of locations with 3 attributes
Latitude, longitude point location
1. Number of articles mentioning this location
2. Number of articles mentioning both this location and topic
3. Average tone of articles mentioning both this location and topic
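The input described above can be sketched as one record per location. This is only an illustration: the class name LocationRecord, its field names, and the sample values are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationRecord:
    """One geocoded location for a single topic (hypothetical layout)."""
    lat: float               # latitude of the point location
    lon: float               # longitude of the point location
    n_articles: int          # 1. articles mentioning this location
    n_topic_articles: int    # 2. articles mentioning both location and topic
    avg_tone: float          # 3. average tone of the articles counted in (2)

    def __post_init__(self):
        # Articles about the topic are a subset of all articles at the location.
        if self.n_topic_articles > self.n_articles:
            raise ValueError("n_topic_articles cannot exceed n_articles")


# Made-up sample values, for illustration only.
records = [
    LocationRecord(lat=40.11, lon=-88.21, n_articles=120,
                   n_topic_articles=15, avg_tone=-3.2),
    LocationRecord(lat=33.31, lon=44.36, n_articles=300,
                   n_topic_articles=210, avg_tone=-7.5),
]
```

The subset check reflects the attribute definitions: attribute 2 counts a subset of the articles counted by attribute 1.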
Spatializing Emotion
3 important elements
1. Importance of location
2. Prevalence of topic
3. Emotion toward topic
Goal: Capture 3 elements on a single map
1) Importance of Location
Every mention of a location increases its importance
Generate a density map of the number of times a location is mentioned in text using Kernel Density Estimation (KDE) based on k nearest neighbor search
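A minimal sketch of one way to read "KDE based on k nearest neighbor search": the bandwidth at each query point is set to the distance to its k-th nearest sample location. The slides do not specify the kernel or how the neighbor search is used, so the Gaussian kernel, the planar distance on raw coordinates, and the function name knn_kde are all assumptions.

```python
import math


def knn_kde(points, weights, query, k=3):
    """Weighted Gaussian KDE with a k-nearest-neighbor bandwidth (assumed variant).

    points  : list of (x, y) sample locations
    weights : number of mentions at each location
    query   : (x, y) location where the density is evaluated
    """
    qx, qy = query
    dists = [math.hypot(px - qx, py - qy) for px, py in points]
    # Bandwidth = distance to the k-th nearest sample (guarded against zero).
    kth = sorted(dists)[min(k, len(dists)) - 1]
    h = max(kth, 1e-9)
    total = 0.0
    for d, w in zip(dists, weights):
        total += w * math.exp(-0.5 * (d / h) ** 2)
    # 2-D Gaussian normalisation; the result is a mention intensity,
    # not a probability density, because weights are not normalised.
    return total / (2.0 * math.pi * h * h)


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
mentions = [5, 3, 2, 1]
dense = knn_kde(pts, mentions, (0.2, 0.2))
sparse = knn_kde(pts, mentions, (5.0, 5.0))
```

With this choice the kernel narrows where locations cluster and widens where they are sparse, so dense comes out larger than sparse for the sample data.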
2) Prevalence of Topic
We use the term topic intensity for the prevalence of a topic relative to other topics, and estimate it with a method commonly used in epidemiological studies: relative risk
Relative risk is the ratio of the KDE of disease infection locations to the KDE of case-control locations
Topic Intensity
Relative Risk = KDE(points with disease) / KDE(points without disease)
Topic Intensity = KDE(articles that mention a topic) / KDE(articles that do not mention the topic)
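The ratio above can be sketched as follows. The fixed-bandwidth Gaussian kernel, the eps guard against division by zero, and the function names are assumptions made for illustration. The slides give only the ratio itself.

```python
import math


def gaussian_kde(points, weights, query, h=1.0):
    """Weighted fixed-bandwidth Gaussian KDE at one query point (assumed kernel)."""
    qx, qy = query
    total = sum(
        w * math.exp(-0.5 * (math.hypot(px - qx, py - qy) / h) ** 2)
        for (px, py), w in zip(points, weights)
    )
    return total / (2.0 * math.pi * h * h)


def topic_intensity(points, n_articles, n_topic, query, h=1.0, eps=1e-12):
    """KDE(articles mentioning the topic) / KDE(articles not mentioning it)."""
    n_other = [a - t for a, t in zip(n_articles, n_topic)]
    on_topic = gaussian_kde(points, n_topic, query, h)
    off_topic = gaussian_kde(points, n_other, query, h)
    return on_topic / (off_topic + eps)  # eps avoids division by zero


pts = [(0.0, 0.0), (5.0, 0.0)]
all_articles = [100, 100]
topic_articles = [50, 10]
near = topic_intensity(pts, all_articles, topic_articles, (0.0, 0.0))
far = topic_intensity(pts, all_articles, topic_articles, (5.0, 0.0))
```

In the sample data, half the articles at the first location mention the topic, so near is close to 1. At the second location only 10 of 100 do, so far is close to 10/90.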
3) Emotion Toward a Topic
Challenging question: Is the emotional measure, tone, discrete or continuous?
– Is tone "countable" like trees, or does it exist as a continuum like air temperature?
Tone is a continuum:
– Cannot have "number of tones"
3) Emotion Toward a Topic
A different method is used, because tone is continuous and not discrete
Inverse distance weighted (IDW) interpolation is used to estimate tone across space, creating a tone map
The tone map captures positive and negative tone toward a particular topic across space
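A minimal sketch of IDW interpolation at a single query point. The power parameter of 2 and the planar distance on raw coordinates are assumptions, since the slides do not state which settings were used.

```python
import math


def idw(points, values, query, power=2.0):
    """Inverse distance weighted estimate of tone at one query location.

    points : list of (x, y) locations with a known average tone
    values : average tone at each location
    """
    qx, qy = query
    num = 0.0
    den = 0.0
    for (px, py), v in zip(points, values):
        d = math.hypot(px - qx, py - qy)
        if d == 0.0:
            return v  # query coincides with a sample: use its value directly
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


pts = [(0.0, 0.0), (2.0, 0.0)]
tones = [-10.0, 10.0]
midpoint = idw(pts, tones, (1.0, 0.0))
closer_to_first = idw(pts, tones, (0.5, 0.0))
```

Evaluating this function over a regular grid of query points would give a tone surface. Unlike a density estimate, each value is a weighted average of observed tones and stays within their range.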
Overview – 3 layers
1) Article density - Proxy: Importance of location
2) Topic intensity - Proxy: Prevalence of topic relative to other topics
3) Tone - Proxy: Emotion toward a topic
First two layers represent scaling factors for tone
Value range: 0 - 1
Value range: 0 - 100
Value range: -100 - 100
Emotional Heatmap
Emotional Heatmap = Article Density * Topic Intensity * Tone
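The layer product can be sketched as a cell-by-cell multiplication of three rasters of equal shape. Representing each layer as a nested list, and the function name emotional_heatmap, are assumptions for illustration. Any rescaling of the layers to their stated value ranges is left out because the slides do not describe it.

```python
def emotional_heatmap(density, intensity, tone):
    """Multiply three equally shaped grids cell by cell.

    density   : article density layer (scaling factor)
    intensity : topic intensity layer (scaling factor)
    tone      : tone layer (signed)
    """
    if not (len(density) == len(intensity) == len(tone)):
        raise ValueError("layers must have the same number of rows")
    result = []
    for d_row, i_row, t_row in zip(density, intensity, tone):
        if not (len(d_row) == len(i_row) == len(t_row)):
            raise ValueError("layers must have the same number of columns")
        result.append([d * i * t for d, i, t in zip(d_row, i_row, t_row)])
    return result


# Made-up 1 x 2 grids, for illustration only.
density = [[1.0, 0.5]]
intensity = [[2.0, 0.0]]
tone = [[-10.0, 40.0]]
heat = emotional_heatmap(density, intensity, tone)
```

Because the first two layers act only as scaling factors, the sign of each output cell comes from the tone layer, and a cell where the topic is absent (intensity 0) contributes nothing regardless of tone.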
Emotional Heatmap of Armed Conflict in 2003 (Wikipedia)
Summary
First steps, but started the dialogue
Balance
– Managing the complexity of cyberinfrastructure access
– Simplifying the workflow of chaining spatial analytics
– Making sense of what’s involved
Scientific rigor
Ongoing Work
Translate spatial knowledge to domain knowledge by answering a basic question: why is this here and not there?
Tackle spatial aggregation issues
– Represent locations as areas not points
– Areal interpolation
Acknowledgments
Guofeng Cao, Anand Padmanabhan
National Science Foundation
– BCS-0846655
– OCI-1047916
– Open Science Grid
– XSEDE SES070004N
Thanks!