Visualizing The Semantic Similarity of Geographic Features

Introduction Methods Result Conclusion & Future Work References

Visualizing The Semantic Similarity of GeographicFeatures

Gengchen Mai 1 Krzysztof Janowicz 1 Sathya Prasad 2

Bo Yan 1

1STKO Lab, UC Santa Barbara, CA, USA

2Esri Inc, Redlands, CA, USA

April, 2018

A Semantically Enriched Visualization

Maps: extensively used to visualize GI and spatialrelationships.

Difficult to directly express non-spatial relationships (semanticsimilarity) using such maps.

A Semantically Enriched Visualization: An analogy of thematicmaps to visualize the distribution of geographic features in asemantic space.

Points: Geographic Coordinates → Locations in the SemanticSpace

Polygons: Administration Regions/Continents → SemanticContinents

A Semantically Enriched Visualization:

Semantically similar entities are clusters within the sameregion;

The distance between geographic features represents howsimilar they are.

In this work, a semantically enriched geospatial datavisualization and searching framework are presented.

We evaluate it using a subset of places from DBpedia.

Multiple techniques:

Paragraph VectorSpatial ClusteringConcave Hull ConstructionInformation Retrieval (IR) Model

A semantically enriched visualization resembles cartographic layouts

The Workflow

Paragraph Vectors Computing

Paragraph Vector (or called Doc2Vec) is a representationlearning method proposed by Natural Language Processingcommunity.

Idea: Give a collection of documents, Doc2Vec learns ahigh-dimensional continuous vector (embedding) for eachdocument.

The cosine similarity between the learned document vectorsrepresents the semantic similarity between theircorresponding documents.

The two-layer neural network architectural of Doc2Vec

Outputs of Doc2Vec:

Embeddings of documents;

Embeddings of word tokens in the document corpus.

Data Source: All entities typed dbo:HistoricalPlace inDBpedia (21010 places)

Each historic place has an abstract, comments, images, andgeographic coordinates.

Method: Doc2Vec Model (PVDM [4])

Textual data collection: Treat each place as a documentwhose content is its abstract and commentsTextual data preprocessing: tokenization and lemmalizationParagraph vector training: embedding dimention K = 300;window size N = 10; learning rate α = 0.025

Information Retrieval Model

Place Embeddings: the learned embedding of each historicplace from Doc2Vec.

Query Embeddings:Utilize the Doc2Vec.infer vector() function from gensim’sDoc2Vec packageThe TF-IDF score weighted embedding based on wordembeddings of query word tokensThe simple average of the query tokens’ embeddingsafter stop words removal

Semantic Similarity Score Function: the cosine similaritybetween the query embedding and place embeddings

An API 1 is provided for the semantic searching functionalityamong DBpedia historic places.

1http://stko-testing.geog.ucsb.edu:

3050/semantic/search?searchText=grave%20yard.

Semantic Similarity Map Construction

Spatialization: how to construct an overview of the semanticdistribution of geographic entities such that it follows acartographic tradition (semantic continent).

K-means clustering: group these place embeddings into differentclusters;

Try #(clusters) from 2 to 30 andcompute silhouette coefficient [5] of theclustering results;

#(clusters) = 16 gives the highestsilhouette coefficient;

The descriptions of places in each clusterare combined as one document;

Word clouds are produced from 10 wordwith highest TF-IDF score;

Each cluster is named according to the itstop 10 words.

The word cloud for Educationcluster

Dimension reduction: to visualize the semantic distribution ofgeographic entities in a 2-dimensional space

Different dimension deduction methods including PCA andt-SNE are experimented;

t-SNE performs best and the clusters derived from k-meansare still well separated.

DBSCAN:

Although t-SNE produces a gooddimension reduction result, somepoints are far away from theircluster centroids and scattered inthe 2D space.

We apply DBSCAN [3] to eachprojected k-means cluster toextract the “core” parts of them.

Visual interpretation are used toselect the parameter combinationfor DBSCAN. (Eps = 1.1 andMinPts = 6)

Concave Hull Construction: Chi-shape algorithm [2]

It first constructs a Delaunay triangulation;

It erodes the boundary by deleting boundary’sedges until the longest edge less than athreshold.

A normalized length parameter λp ∈ [1, 100]controls this threshold;

To get optimal λp, a fitness score function [1] isused to balance the complexity and emptynessof the resulting concave hull.

φ(P,D) = Emptiness(P,D) + C ∗ Complexity(P)(1)

P: the derived simple polygon; D: the Delaunaytriangulation of the corresponding point cluster.

Concave Hull Construction:

We iterate λp from 1 to 100 and compute the average fitnessscore of all point clusters produced by DBSCAN;

The optimal λp with the lowest average fitness score is 30.

The average fitness score for different λp among all DBSCAN clusters.

Publishing the Semantic View Webmap in ArcGIS Online:

We publish this map as a webmap service in ArcGIS Online 2.

2http://www.arcgis.com/home/item.html?id=

7e15f98399ff4788a502fd04320bdafc

Result

We have deployed a web-based user interface3 to showcase thefunctionality using the historical places dataset.

the search result of “grave yard” in the semantic space

3http://stko-testing.geog.ucsb.edu:3050/

Result

the search result of “grave yard” in the geographic space

Result

The pop-up window shows some basic information fordbo:Istre Cemetery Grave Houses.

Conclusion & Future Work

In this work, we presented a semantically enriched geospatialdata visualization and search framework.

In the future, the proposed methods have to be calibrated,e.g., by setting the hyperparameters, based on results ofhuman participants testing.

This work has been accepted to AGILE 2018.

Gengchen Mai, Krzysztof Janowicz, Sathya Prasad, Bo Yan.Visualizing The Semantic Similarity of Geographic Features, In:Proceedings of AGILE 2018, June 12 - 15, 2018, Lund, Sweden.

References I

[1] Akdag, Fatih, Eick, Christoph F, & Chen, Guoning. 2014.Creating polygon models for spatial clusters. Pages 493–499 of:International Symposium on Methodologies for IntelligentSystems. Springer.

[2] Duckham, Matt, Kulik, Lars, Worboys, Mike, & Galton,Antony. 2008. Efficient generation of simple polygons forcharacterizing the shape of a set of points in the plane. PatternRecognition, 41(10), 3224–3236.

[3] Ester, Martin, Kriegel, Hans-Peter, Sander, Jorg, Xu, Xiaowei,et al. . 1996. A density-based algorithm for discovering clustersin large spatial databases with noise. Pages 226–231 of: Kdd,vol. 96.

References II

[4] Le, Quoc, & Mikolov, Tomas. 2014. Distributedrepresentations of sentences and documents. Pages 1188–1196of: International Conference on Machine Learning.

[5] Rousseeuw, Peter J. 1987. Silhouettes: a graphical aid to theinterpretation and validation of cluster analysis. Journal ofcomputational and applied mathematics, 20, 53–65.

Visualizing The Semantic Similarity of Geographic Features

Documents

Visualizing Residential Vacancy by Length of Vacancy · Geographic Information Systems (GIS) organize and clarify the patterns of human ... residential vacancies for four categories:

WRITING AND REVISING - DiVA portal143740/FULLTEXT01.pdf · visualizing writing revision 6.2 Summary of Paper II. GIS for writing: applying geographic 19 information systems techniques

MS in Data Science Worksheetcsis.pace.edu/.../GradWorksheets/DataScienceMSWorksheet.pdf · 2020. 5. 5. · IS 668 Visualizing Information Systems: Introduction to Geographic Information

Visualizing the marrow of science - CORE · Visualizing the Marrow of Science SCImago Research Group: Félix de Moya-Anegón 1, ... similarity to a data set of 7,121 journals covering

Graph/Network Visualizationsfang/cs552/cs552-graph.pdf · Visualizing graphs and clusters as geographic maps to represent node relations (geographic neighbors) Topological graph simplification

Visualizing the spatial gene expression organization in ... · PDF fileVisualizing the spatial gene expression organization in the brain through non-linear similarity embeddings

Visualizing graph dynamics and similarity for enterprise ...dial/publications/liao2010visualizing.pdf · Visualizing Graph Dynamics and Similarity for Enterprise Network Security

Improving volunteered geographic data quality using semantic similarity measurements

Visualizing Historic Space Through the Integration of GIS ... · Visualizing Historic Space through the Integration of Geographic Information Science in Secondary School Curriculums:

The semantics of similarity in geographic information retrievalgeog.ucsb.edu/~jano/semofsim2011.pdf · 2014. 6. 22. · mation indexing, relevance rankings, search engines, evaluation

Visualization in AI · tSNE - Multi-dimensional Data Visualization tool t-Stochastic Neighbour Embedding, or t-SNE Visualizing data similarity of high-dimensional data by maintaining

Visualizing multiple word similarity measures Brent Kievit-Kylar

Evaluating semantic similarity using GML in Geographic Information Systems Fernando Ferri 1, Anna Formica 2, Patrizia Grifoni 1, and Maurizio Rafanelli

Evaluating semantic similarity using GML in Geographic Information Systems

Visualizing Geographic Classiﬁcations Using Colorpalantir.cs.colby.edu/maxwell/papers/pdfs/Maxwell-JnlCart-2000.pdf · Visualizing Geographic Classiﬁcations Using Color 1 Introduction

Visualizing Data with Geographic Information Systems (GIS)

Mapping potential foodsheds in New York State: A spatial ...faculty.bennington.edu/~kwoods/classes/agric_hist... · foodshed maps as a tool for visualizing the geographic extent of

Visualization in AIg.legrady/academic/... · tSNE - Multi-dimensional Data Visualization tool t-Stochastic Neighbour Embedding, or t-SNE Visualizing data similarity of high-dimensional

Visualizing Music and Audio using Self-Similaritymusic.ucsd.edu/~sdubnov/CATbox/Reader/p77-foote.pdf · Visualizing Music and Audio using Self-Similarity ... Matrix entries (i,j)

GeoGraphic patterns of sonG similarity in the Dickcissel ... · GeoGraphic patterns of sonG similarity in the Dickcissel (Spiza americana) Derek M. Schook,1 Michael D. collinS,1,5