Page 1: OpenStreetMap Geocoder Based on Solr

Ishan Chattopadhyaya
LucidWorks
OpenStreetMap Foundation
Twitter: @ichattopadhyaya, OSM: chatman


What is OpenStreetMap?

● Wikipedia of GeoData

● OpenStreetMap is a project aimed squarely at creating and providing free geographic data, such as street maps, to anyone who wants them.


State of OSM

● Commercial competitors

– Google Maps

– Bing Maps

● Side-by-side map comparison: http://tools.geofabrik.de/mc/


The OpenStreetMap Software Stack


What is a Geocoder?

● Input: raw query (a free-text place name or address)

● Output: geocoordinates (latitude and longitude)


Nominatim

● The existing OSM geocoder: http://nominatim.openstreetmap.org/


Goals for the new Geocoder

● Search for:

– Cities and towns

– Streets

– Address points

– Points of Interest, Businesses, Amenities, Attractions, etc.

● Reverse geocoding

● Support for fuzzy queries
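Fuzzy matching in Lucene/Solr is edit-distance based: the standard query parser accepts `term~N` (for example `street:landsdowne~1`), and Lucene 4.x supports at most 2 edits. The sketch below is plain Python, not Lucene's automaton-based FuzzyQuery; it computes a Levenshtein distance to show why a misspelled token can still find the intended street. The function names, the 2-edit default and the small street list are assumptions for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def fuzzy_matches(token: str, vocabulary: list[str], max_edits: int = 2) -> list[str]:
    """Vocabulary terms within max_edits of token, closest (then alphabetical) first."""
    scored = sorted((levenshtein(token.lower(), term.lower()), term) for term in vocabulary)
    return [term for dist, term in scored if dist <= max_edits]


# Hypothetical vocabulary of indexed street names.
streets = ["lansdowne", "leeson", "baggot", "pembroke"]
print(fuzzy_matches("landsdowne", streets))  # ['lansdowne']
```

One extra "d" is a single deletion, so "landsdowne" is within one edit of "lansdowne" while the other names are several edits away.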


Good changes in Lucene/Solr 4.x

● Support for indexing polygons

– RecursivePrefixTree indexing

● Special spatial search predicates

– Contains

– IsWithin

– Intersects

– Etc.

● Reference: David Smiley's Lucene Revolution presentation

● SolrCloud mode for distributed indexing/searching
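To make the Contains predicate concrete, and to show how it serves the reverse-geocoding goal, here is a plain-Python even-odd ray-casting test that lists which polygons contain a given point. It only illustrates what the predicate means: Lucene's RecursivePrefixTree answers the same question from indexed grid cells rather than by testing every polygon. The region names and rectangles are made up.

```python
Point = tuple[float, float]  # (lon, lat)


def contains(polygon: list[Point], pt: Point) -> bool:
    """Even-odd ray casting: count how many polygon edges a ray going east from pt crosses."""
    x, y = pt
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def reverse_geocode(areas: dict[str, list[Point]], pt: Point) -> list[str]:
    """Names of every area whose polygon contains pt."""
    return [name for name, ring in areas.items() if contains(ring, pt)]


# Made-up rectangles in (lon, lat) order; not real OSM boundaries.
areas = {
    "Region A": [(-7.0, 53.0), (-6.0, 53.0), (-6.0, 54.0), (-7.0, 54.0)],
    "Region B": [(-6.3, 53.3), (-6.2, 53.3), (-6.2, 53.4), (-6.3, 53.4)],
    "Region C": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
}
print(reverse_geocode(areas, (-6.232063, 53.333833)))  # ['Region A', 'Region B']
```

A point usually falls inside several nested areas; a reverse geocoder would report the whole chain, from country down to the most specific area.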


Architecture

Planet dumps → Indexer → Solr → API Layer → www.Geocoder.in


Indexing: OSM Data format

● Node

– “A node defines a single geospatial point using a latitude and longitude.”

● Way

– “A way is an ordered list of between 2 and 2,000 nodes. Ways are used to represent linear features (vectors), such as rivers or roads.”

● Relation

– “A Relation is an all-purpose data structure that documents a relationship between two or more other objects.”
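The three element types can be modelled as small Python dataclasses. This is a hypothetical in-memory representation for illustration (field names follow the OSM XML attributes `id`, `lat`, `lon`, `ref`, `role` and `type`), not the format or code the indexer actually uses; the node ids and coordinates are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A single point; tags are free-form key/value pairs."""
    id: int
    lat: float
    lon: float
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of node ids (a road, a river, or a closed boundary ring)."""
    id: int
    node_refs: list[int]
    tags: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Enforce the 2..2,000 node range from the definition quoted above.
        if not 2 <= len(self.node_refs) <= 2000:
            raise ValueError(f"way {self.id} has {len(self.node_refs)} nodes")


@dataclass
class Member:
    type: str        # "node", "way" or "relation"
    ref: int         # id of the referenced element
    role: str = ""   # e.g. "outer" for a boundary ring


@dataclass
class Relation:
    """Groups other elements, e.g. the ways forming an administrative boundary."""
    id: int
    members: list[Member]
    tags: dict[str, str] = field(default_factory=dict)


def way_coordinates(way: Way, nodes: dict[int, Node]) -> list[tuple[float, float]]:
    """Resolve a way's node references to (lon, lat) pairs, keeping their order."""
    return [(nodes[ref].lon, nodes[ref].lat) for ref in way.node_refs]


# Invented ids and coordinates, for illustration only.
nodes = {1: Node(1, 53.334, -6.233), 2: Node(2, 53.333, -6.231)}
road = Way(10, [1, 2], {"highway": "residential", "name": "Lansdowne Road"})
print(way_coordinates(road, nodes))
```

Ways hold no coordinates themselves, so an indexer has to join each way against its nodes (and each relation against its members) before it can build a shape.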


Indexing: Facts and figures

● Number of OSM nodes in the database = 2,071,039,612

● Number of OSM ways in the database = 202,570,637

● Number of OSM relations in the database = 2,217,240


Indexing: Schema

● Fields: admin2, admin3, admin4, admin5, admin6, admin7, street, st_type, name, level, geo, popularity

● Example (street document):

– Location fields: Ireland, Dublin County, Dublin, Ballsbridge, Lansdowne, Street

– name = Lansdowne Street, level = s, geo = <shape>


● Example (city document):

– Location fields: Ireland, Dublin County, Dublin

– name = Dublin, level = 6, geo = <shape>, popularity = 1


Indexing: Schema (POIs)

● Fields: admin2, admin3, admin4, admin5, admin6, admin7, street, st_type, name, category, geo

● Example (POI document):

– Location fields: Ireland, Dublin County, Dublin, Ballsbridge

– name = Ballsbridge Hotel, category = hotel, geo = <shape>
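One way to picture these schemas is as flat, denormalized documents: every street or POI carries its whole administrative hierarchy, so a single query can constrain any level at once. The sketch below builds two such documents in Python and serializes them as a JSON array, the form Solr's JSON update handler accepts. Which adminN column holds which value is a guess, the `id` values are hypothetical, and the `geo` shape is left out.

```python
import json

# Hypothetical column assignment: the slides list these values but do not make
# clear which adminN column each one occupies.
DUBLIN_ADMIN = {"admin2": "Ireland", "admin4": "Dublin County", "admin5": "Dublin"}


def make_doc(doc_id: str, admin: dict[str, str], **fields: object) -> dict[str, object]:
    """Merge an admin hierarchy and per-document fields into one flat document."""
    return {"id": doc_id, **admin, **fields}


docs = [
    # Street document: level "s", with the street name and street type split out.
    make_doc("street-1", DUBLIN_ADMIN, admin6="Ballsbridge",
             street="Lansdowne", st_type="Street", name="Lansdowne Street", level="s"),
    # POI document: a category instead of a level.
    make_doc("poi-1", DUBLIN_ADMIN, admin6="Ballsbridge",
             name="Ballsbridge Hotel", category="hotel"),
]

# The geo field (the feature's shape) is omitted from this sketch.
payload = json.dumps(docs, indent=2)
print(payload)
```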


Searching

Raw query → Classifier → Classifications → Validator → Valid classifications → Geocoder (lookup) → Structured location + geocodes


Searching: Classification

Query → Tokenizer → Shingles → Bloom Filters → Classifications

● Query = “hotels near lansdowne rd dublin”

● Shingles: hotels, near, lansdowne, rd, dublin, hotels near, near lansdowne, lansdowne rd, rd dublin, .., hotels near lansdowne rd dublin
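Shingles are simply all contiguous word n-grams of the query. Lucene ships a ShingleFilter for this inside an analyzer chain; the sketch below is plain Python instead, and orders the shingles by length so that the output lines up with the list above. The function name and the `max_size` parameter are illustrative assumptions.

```python
def shingles(query: str, max_size: int = 0) -> list[str]:
    """All contiguous word n-grams of the query, shortest first (max_size 0 = no limit)."""
    tokens = query.lower().split()
    limit = max_size or len(tokens)
    return [
        " ".join(tokens[start:start + size])
        for size in range(1, limit + 1)
        for start in range(len(tokens) - size + 1)
    ]


result = shingles("hotels near lansdowne rd dublin")
print(len(result))  # 15: 5 + 4 + 3 + 2 + 1
print(result[:9])
print(result[-1])   # hotels near lansdowne rd dublin
```

An n-token query yields n(n+1)/2 shingles, which stays small for typical address queries and lets multi-word names such as "lansdowne rd" be looked up as a unit.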

● Shingles looked up in the per-field Bloom filters (Cat, A2, A4, A5, Streets):

– hotels: match in Cat

– dublin: matches in A5 and Streets

– lansdowne: matches in A5 and Streets
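A Bloom filter answers "is this string possibly in the set?" with no false negatives and a small, tunable rate of false positives, which makes it a cheap first pass before querying Solr; a false positive would only produce an extra classification for the validator stage to reject. A minimal sketch, assuming one filter per field built from the indexed names; the bit-array size, the number of hashes, the SHA-256 double-hashing scheme and the tiny vocabularies are all assumptions.

```python
import hashlib
from collections.abc import Iterator


class BloomFilter:
    """Bit array with k hashed positions per item: no false negatives, rare false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_filters(names_by_field: dict[str, list[str]]) -> dict[str, BloomFilter]:
    """One Bloom filter per field, filled with that field's lower-cased names."""
    filters = {}
    for field_name, names in names_by_field.items():
        bloom = BloomFilter()
        for name in names:
            bloom.add(name.lower())
        filters[field_name] = bloom
    return filters


def classify(shingles: list[str], filters: dict[str, BloomFilter]) -> list[tuple[str, str]]:
    """(shingle, field) pairs for every filter that reports a possible match."""
    return [(s, name) for s in shingles for name, bloom in filters.items() if s in bloom]


# Tiny hypothetical vocabularies; real filters would be built from the indexed names.
filters = build_filters({
    "Cat": ["hotels", "pubs"],
    "A2": ["ireland"],
    "A4": ["dublin county"],
    "A5": ["dublin", "lansdowne"],
    "Streets": ["dublin", "lansdowne"],
})
print(classify(["hotels", "near", "lansdowne", "rd", "dublin"], filters))
```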

Searching: Classifications

● Query = “hotels near lansdowne rd dublin”

● Classifications:

– hotels = category

– lansdowne = admin5

– lansdowne = street

– dublin = admin5

– dublin = street

● Possible permutations (one character per query token: C = category, 5 = admin5, S = street, . = unclassified): C.5.5, C.S.5, C.5.S, C...5, C.5.., etc.
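Reading each permutation as one character per query token, the candidates form a Cartesian product of each token's classifications, plus the option of leaving a token unclassified. A sketch with `itertools.product`, assuming single-word tokens only (a multi-word shingle such as "lansdowne rd" would need span handling) and dropping only the all-unclassified pattern; the real geocoder may prune candidates differently.

```python
from itertools import product

CODES = {"category": "C", "admin5": "5", "street": "S"}


def candidate_patterns(tokens: list[str], classifications: dict[str, list[str]]) -> list[str]:
    """One character per token; '.' means the token is left unclassified."""
    options = []
    for token in tokens:
        codes = [CODES[c] for c in classifications.get(token, [])]
        options.append(codes + ["."])  # any token may also be left unclassified
    patterns = ("".join(combo) for combo in product(*options))
    return [p for p in patterns if set(p) != {"."}]  # drop the empty interpretation


tokens = "hotels near lansdowne rd dublin".split()
classifications = {
    "hotels": ["category"],
    "lansdowne": ["admin5", "street"],
    "dublin": ["admin5", "street"],
}
patterns = candidate_patterns(tokens, classifications)
print(len(patterns))  # 2 * 3 * 3 - 1 = 17
print(patterns[:4])
```

The product grows multiplicatively with ambiguous tokens, which is why a validator stage that discards implausible combinations is useful before any Solr queries are sent.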

Searching: Solr Query

● Query = “hotels near lansdowne rd dublin”

● Possible permutations:

– C.5.5: +level:5 +admin5:lansdowne +admin5:dublin

– C.S.5: +level:s +street:lansdowne +admin5:dublin

– C.5.S: +level:s +street:dublin +admin5:lansdowne

– C...5: +level:5 +admin5:dublin

– C.5..: +level:5 +admin5:lansdowne

– etc.

● Resolved location: "POINT (-6.232063,53.333833)"

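The five query examples follow a regular pattern: a `+level` clause set to `s` when any token is read as a street and `5` otherwise, then the `+street:` clauses, then the `+admin5:` clauses in query order, with the category token left for the POI search. The Python sketch below reproduces that rule; it is inferred from the examples rather than taken from the geocoder's code, and it does not cover other admin levels or multi-word values.

```python
from typing import Optional


def to_solr_query(pattern: str, tokens: list[str]) -> Optional[str]:
    """Turn a classification pattern into a Solr query (rule inferred from the slide examples)."""
    # Category tokens ("C") are ignored here; they are used later by the POI search.
    streets = [tok for code, tok in zip(pattern, tokens) if code == "S"]
    admins = [tok for code, tok in zip(pattern, tokens) if code == "5"]
    if not streets and not admins:
        return None  # nothing location-like to look up
    level = "s" if streets else "5"  # target the most specific level present
    clauses = [f"+level:{level}"]
    clauses += [f"+street:{tok}" for tok in streets]
    clauses += [f"+admin5:{tok}" for tok in admins]
    return " ".join(clauses)


tokens = "hotels near lansdowne rd dublin".split()
for pattern in ["C.5.5", "C.S.5", "C.5.S", "C...5", "C.5.."]:
    print(f"{pattern}: {to_solr_query(pattern, tokens)}")
```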
Searching: Searching for POIs

● Query = “hotels near lansdowne rd dublin”

● Rewritten query = “hotels near” near "POINT (-6.232063,53.333833)"

● Solr query:

– fl=*,score

– sort=score asc

– q={!geofilt score=distance filter=false sfield=geo pt=53.333833,-6.232063 d=10}

– fq=+category:hotel
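The request above can be assembled programmatically. A sketch using only the standard library; the function name and the `/select` path are assumptions, while the parameters are the ones listed above. In Solr 4 spatial search, `filter=false` makes the geofilt query score documents by distance instead of filtering them, `sort=score asc` then puts the nearest first, and `d` is the radius (usually in kilometres).

```python
from urllib.parse import urlencode


def poi_params(category: str, lat: float, lon: float, radius_km: float = 10) -> dict[str, str]:
    """Solr request parameters: POIs of one category, ranked by distance from (lat, lon)."""
    geofilt = (
        "{!geofilt score=distance filter=false sfield=geo "
        f"pt={lat},{lon} d={radius_km}}}"
    )
    return {
        "q": geofilt,                    # score every document by distance from pt
        "fq": f"+category:{category}",   # restrict to the requested category
        "fl": "*,score",                 # return stored fields plus the score
        "sort": "score asc",             # nearest first
    }


params = poi_params("hotel", 53.333833, -6.232063)
print(params["q"])
print("/select?" + urlencode(params))
```

Note that `pt` takes latitude first, whereas the WKT-style POINT string above lists longitude first.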


Challenges: Indexing

● Street Associativity

● Incomplete polygons


Challenges

● Handling Updates

● Data validation


Distributed Search

● Need for distributed search?

● Geographical partitioning
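One possible form of geographical partitioning, sketched under assumptions: split the globe into fixed lat/lon tiles, use the tile as a SolrCloud compositeId prefix (`<tile>!<doc id>`) so that features in the same tile are co-located on one shard, and work out which tiles a bounding-box query has to visit. The 30° tile size, the id format and the function names are hypothetical, and the sketch ignores the antimeridian.

```python
def tile_for(lat: float, lon: float, tile_deg: float = 30.0) -> tuple[int, int]:
    """(row, col) of the fixed lat/lon tile containing the point."""
    rows, cols = int(180 // tile_deg), int(360 // tile_deg)
    row = min(int((lat + 90) // tile_deg), rows - 1)   # clamp lat = 90 into the top row
    col = min(int((lon + 180) // tile_deg), cols - 1)  # clamp lon = 180 into the last column
    return row, col


def routed_id(lat: float, lon: float, osm_type: str, osm_id: int) -> str:
    """compositeId-style id '<shard key>!<doc id>'; same tile gives the same shard key."""
    row, col = tile_for(lat, lon)
    return f"{row}_{col}!{osm_type}/{osm_id}"


def tiles_for_bbox(min_lat: float, min_lon: float, max_lat: float, max_lon: float,
                   tile_deg: float = 30.0) -> list[tuple[int, int]]:
    """Tiles a bounding-box query must visit (no antimeridian wrap-around)."""
    r0, c0 = tile_for(min_lat, min_lon, tile_deg)
    r1, c1 = tile_for(max_lat, max_lon, tile_deg)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


print(routed_id(53.333833, -6.232063, "node", 1))  # 4_5!node/1
print(tiles_for_bbox(53.0, -1.0, 54.0, 1.0))       # [(4, 5), (4, 6)]
```

Most geocoding queries are local, so a tile-based key keeps one lookup on one or two shards; the trade-off is uneven shard sizes, since OSM data density varies a lot between regions.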


Conclusion

● http://www.geocoder.in/

● Twitter: @ichattopadhyaya

