The Black Art of Geocoding: Finding that elusive lat/lon. John Fagan, Microsoft

John Fagan - The Black Art of Geocoding


DESCRIPTION

Mapping/LBS applications require three core engines: Mapping, Routing and Geocoding. The last is often overlooked, but Geocoding is the fundamental component of all Mapping and LBS applications. If you don't have a lat/lon, how do you find a map, get from A to B, or plot your data? This paper gives a whistle-stop tour of the basics of mapping and routing engines and then takes a deep dive into Geocoding. It suggests that we have solved routing and mapping, but that we still have a lot of work to do on Geocoding.

Citation preview

Page 1: John Fagan - The Black Art of Geocoding

The Black Art of Geocoding

Finding that elusive lat/lon

John Fagan, Microsoft

Page 2: John Fagan - The Black Art of Geocoding

The Black Art of Geocoding

Finding that elusive lat/lon

John Fagan
Program Manager

Microsoft Corporation

@johnbfagan

Page 3: John Fagan - The Black Art of Geocoding

We have been making maps for thousands of years

Page 4: John Fagan - The Black Art of Geocoding

Well known and established standards/principles

Page 5: John Fagan - The Black Art of Geocoding

Lots of experience in building software to create bitmaps from vector and raster data

Page 7: John Fagan - The Black Art of Geocoding

Mapping easy to scale

Page 8: John Fagan - The Black Art of Geocoding

...and so is routing

Thousands of years of experience in wayfinding
Over 50 years of experience in routing algorithms
Dijkstra's shortest path algorithm (1959)
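To show how settled the routing side is, Dijkstra's algorithm fits in a few lines. This is a minimal sketch, assuming a toy road graph: the `ROADS` dictionary, its node names and its edge weights are invented for illustration and do not come from the talk.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]  # priority queue of (distance so far, node)
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter route to node was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


# Hypothetical road network: keys are junctions, weights are travel costs.
ROADS = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 3},
    "C": {"B": 2, "D": 7},
    "D": {},
}

print(dijkstra(ROADS, "A"))
```

The algorithm only needs a graph and non-negative edge weights, which is part of why routing counts as well understood here.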

Page 10: John Fagan - The Black Art of Geocoding

Routing, easy to scale

Page 11: John Fagan - The Black Art of Geocoding

Geocoding not so easy

20 years experience
10 years of global Geocoding
5 years exposing geocoding to the mass consumer

No standard algorithms
Very few databases purpose built (maybe GNAF)
Very hard to scale

Page 12: John Fagan - The Black Art of Geocoding

Geocoding is fundamental

Can't get a map without a geocode
Can't get a route without a geocode
Can't view your data without a geocode
80% of all information contains a geographic element.

Page 13: John Fagan - The Black Art of Geocoding

It used to be easier

Page 14: John Fagan - The Black Art of Geocoding

Now it's hard

Page 15: John Fagan - The Black Art of Geocoding

User expectations change with unstructured input

67 hill veiw road, s61 2bn in the 1850's
1.5 hours from Nice
exact directions from Bangkok Patana School to Suvanapumi Airport in Bangkok.
10 mile radius from se20 7ua
how long would it take me to walk around cancun
how to get to m13 gb from g83 9le by car
do bearded dragons bite?

Page 16: John Fagan - The Black Art of Geocoding

But... Geocoding is NOT about Search

Page 17: John Fagan - The Black Art of Geocoding

52.19157,-1.70415

Page 18: John Fagan - The Black Art of Geocoding

"The reason it's called 'I'm Feeling Lucky,' is of course that's a pretty damn ambitious goal. I mean to get the exact right one thing without even giving you a list of choices, and so you have to feel a little bit lucky if you're going to try that with one go," tried to explain Sergey Brin.

Page 19: John Fagan - The Black Art of Geocoding

Why is it hard? (2 reasons)

Page 20: John Fagan - The Black Art of Geocoding

Parsing: Hard to understand unstructured input

Page 21: John Fagan - The Black Art of Geocoding

Finding Stratford-upon-Avon

stratford
stratford upon avon
Stratford upon haven
StratfordUponAvon
Stratford-Upon-Avon
stratford on avon
stratford-on-avon
stratford 0n avon
stratford - upon-avon
stratford on avaon
stratford apon avon
stratford upon aavon
stratford uppon avon
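One way to cope with variants like these is to normalise the input and then fuzzy-match it against a gazetteer. The sketch below uses Python's standard `difflib`; the three-entry `GAZETTEER`, the `normalise` and `best_match` helpers, and the 0.6 cutoff are all assumptions made for illustration, not how Multimap or Bing Maps actually did it.

```python
import difflib
import re

# Hypothetical gazetteer of normalised place names.
GAZETTEER = ["stratford upon avon", "stratford", "stafford"]


def normalise(text):
    """Lower-case the input and reduce punctuation and case quirks to spaces."""
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)  # split StratfordUponAvon
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())   # hyphens, commas, runs of spaces
    return text.strip()


def best_match(query, cutoff=0.6):
    """Return the closest gazetteer entry, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(normalise(query), GAZETTEER, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for variant in ["StratfordUponAvon", "stratford 0n avon", "Stratford upon haven"]:
    print(variant, "->", best_match(variant))
```

Note that a bare "stratford" resolves to the shorter gazetteer entry, not to Stratford-upon-Avon. String similarity alone cannot tell which Stratford the user meant.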

Page 22: John Fagan - The Black Art of Geocoding

Finding Stratford-upon-Avon

Page 23: John Fagan - The Black Art of Geocoding

Parsing

In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar.

http://en.wikipedia.org/wiki/Parsing

Page 24: John Fagan - The Black Art of Geocoding

Old way of Parsing – Rules based

A rules based approach (mainly done with regular expressions)
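A rules based parser of this kind might look like the sketch below. The `POSTCODE` pattern is a deliberately simplified approximation of UK postcodes, and `parse_rules` and its output keys are hypothetical names chosen for this example.

```python
import re

# Simplified UK postcode shape (outward + inward code); real postcodes have more special cases.
POSTCODE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE)
# Rule: a leading number is assumed to be a house number.
HOUSE_NUMBER = re.compile(r"^\s*(\d+[A-Za-z]?)\b")


def parse_rules(query):
    """Pull out a postcode and a leading house number; keep whatever is left as 'rest'."""
    result = {}
    match = POSTCODE.search(query)
    if match:
        result["postcode"] = f"{match.group(1).upper()} {match.group(2).upper()}"
        query = query[:match.start()] + query[match.end():]
    match = HOUSE_NUMBER.match(query)
    if match:
        result["street_number"] = match.group(1)
        query = query[match.end():]
    rest = query.strip(" ,")
    if rest:
        result["rest"] = rest
    return result


print(parse_rules("165 fleet street london EC4A 2DY"))
print(parse_rules("10 mile radius from se20 7ua"))
```

The second call shows the weakness of fixed rules: the "10" in "10 mile radius" is read as a house number, because the rule has no notion of what the user meant.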

Page 25: John Fagan - The Black Art of Geocoding

Probabilistic approach

Machine learned
Requires you to “train” the engine
Requires truth sets of training data

http://en.wikipedia.org/wiki/Hidden_Markov_model

Page 26: John Fagan - The Black Art of Geocoding

Probabilistic approach: Hidden Markov Model

input --> 165 fleet street london EC4A 2DY

output --> address {
  street number : 165
  street : fleet street
  city : london
  postcode : EC4A 2DY
}
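The tagging above can be reproduced with a tiny Hidden Markov Model decoded by the Viterbi algorithm. This is only a sketch: the states, the observation classes and every probability below are hand-set assumptions for illustration. A real engine would learn them from truth sets of training data.

```python
STATES = ["street_number", "street", "city", "postcode"]
STREET_TYPES = {"street", "road", "lane", "avenue"}

# All probabilities below are invented for this example, not learned.
START = {"street_number": 0.5, "street": 0.3, "city": 0.15, "postcode": 0.05}
TRANS = {
    "street_number": {"street_number": 0.01, "street": 0.9, "city": 0.05, "postcode": 0.04},
    "street": {"street_number": 0.01, "street": 0.5, "city": 0.4, "postcode": 0.09},
    "city": {"street_number": 0.01, "street": 0.09, "city": 0.3, "postcode": 0.6},
    "postcode": {"street_number": 0.01, "street": 0.04, "city": 0.05, "postcode": 0.9},
}
EMIT = {
    "street_number": {"digits": 0.9, "street_type": 0.01, "postcode_part": 0.04, "word": 0.05},
    "street": {"digits": 0.03, "street_type": 0.35, "postcode_part": 0.02, "word": 0.6},
    "city": {"digits": 0.01, "street_type": 0.05, "postcode_part": 0.04, "word": 0.9},
    "postcode": {"digits": 0.05, "street_type": 0.01, "postcode_part": 0.9, "word": 0.04},
}


def observe(token):
    """Map a raw token to a coarse observation class."""
    if token.isdigit():
        return "digits"
    if token.lower() in STREET_TYPES:
        return "street_type"
    if any(c.isdigit() for c in token) and any(c.isalpha() for c in token):
        return "postcode_part"
    return "word"


def viterbi(tokens):
    """Return the most likely state sequence for the tokens."""
    obs = [observe(t) for t in tokens]
    # best[s] = (probability, path) of the likeliest tagging that ends in state s
    best = {s: (START[s] * EMIT[s][obs[0]], [s]) for s in STATES}
    for o in obs[1:]:
        step = {}
        for s in STATES:
            prob, path = max(
                ((best[p][0] * TRANS[p][s], best[p][1]) for p in STATES),
                key=lambda item: item[0],
            )
            step[s] = (prob * EMIT[s][o], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


def parse(query):
    """Tag each token, then join tokens that share a tag into one field."""
    tokens = query.split()
    if not tokens:
        return {}
    fields = {}
    for token, tag in zip(tokens, viterbi(tokens)):
        fields[tag] = (fields.get(tag, "") + " " + token).strip()
    return fields


print(parse("165 fleet street london EC4A 2DY"))
```

Unlike the rules based approach, nothing here says "a leading number is a house number". The tagging falls out of which sequence of states is most probable overall.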

Page 27: John Fagan - The Black Art of Geocoding

Multimap stats

Grammar                                  % share
Postcode                                 67.9
Locality                                 14.8
Landmark                                 3.3
Street name                              3
Street name, Locality                    2.4
Street number, street name               0.5
County                                   0.5
Street number, street name, locality     0.5
Locality, county                         0.4

Page 28: John Fagan - The Black Art of Geocoding

Parsing has its limitations

Parsing failures
Multimap/Bing Maps (st andrews scotland)
Google (uk near Boston, MA, USA)
All fail - House number plus postcode (165, EC4A 2DY)

Page 29: John Fagan - The Black Art of Geocoding

Parsing using a Spatial Engine

http://research.microsoft.com/en-us/people/josephj/acm_gis_2007_robust_location_search.pdf

Page 30: John Fagan - The Black Art of Geocoding

Why is it hard (Data)

Page 32: John Fagan - The Black Art of Geocoding

[OSM-talk] Baghdad maps

I am informed that any road may have up to 4 names (which may be the same or different):

1) The pre-Saddam name
2) The Saddam-era name.
3) The "public" name - What the people who live there call it.
4) The "Official" name - What the new Government calls it.

This situation is further complicated by language and social issues:

Language:

5) The roads are named in Arabic.
6) There is no fixed translation between the Arabic and Latin alphabets.

Social Issues:

1) Sunnis tend to use the Saddam-era names
7) Shia tend to rename streets and won't acknowledge Saddam-era names.
8) Ethnic cleansing is changing the neighbourhoods and hence the names.
9) Names (such as 14th July Bridge) will change later.

My translator's opinion is that street names are going to take at least 2-3 years to settle down.

http://lists.openstreetmap.org/pipermail/talk/2007-February/011273.html

Page 33: John Fagan - The Black Art of Geocoding

Don't throw away your data

Multimap have always kept old postcodes
10% of Multimap’s postcode database is of “dead” postcodes
This might not work for routing and mapping, but very valuable for Geocoding
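Keeping dead postcodes can be as simple as a second lookup table that is consulted when the live one misses. A minimal sketch: the `LIVE` and `RETIRED` tables, the `geocode_postcode` helper, and above all the coordinates are placeholders invented for illustration, not real postcode data.

```python
# Placeholder tables: the coordinates are made up and only roughly "central London".
LIVE = {"EC4A 2DY": (51.51, -0.11)}
RETIRED = {"EC4A 1HE": (51.51, -0.11)}  # postcodes no longer in use, kept on purpose


def geocode_postcode(postcode):
    """Look up a postcode, falling back to retired postcodes before giving up."""
    key = " ".join(postcode.upper().split())  # normalise case and spacing
    if key in LIVE:
        return {"location": LIVE[key], "status": "live"}
    if key in RETIRED:
        return {"location": RETIRED[key], "status": "retired"}
    return None


print(geocode_postcode("ec4a 1he"))
```

Returning a status alongside the location lets the caller decide whether a retired postcode is good enough, which it often is for geocoding even when it would not be for routing.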

Page 34: John Fagan - The Black Art of Geocoding

EC4A 1HE – Postcode of vintage 2002

Page 35: John Fagan - The Black Art of Geocoding

Lash data and enrich

Stratford-upon-Avon

Page 36: John Fagan - The Black Art of Geocoding
Page 37: John Fagan - The Black Art of Geocoding

Future = Real time Geocoding?

Page 38: John Fagan - The Black Art of Geocoding

Summary

Mapping and Routing – FIXED
Geocoding – Must Try Harder
Parsing
Data

Page 39: John Fagan - The Black Art of Geocoding

thanks

john fagan

ubergeo.com

@johnbfagan