•1
The GeoParser
•2
Overview
• What is a geoparser?
– Software for the automated extraction of place names from text
• Why would you want one?
– Document characterisation
– Explicit geocoding of metadata, making documents inherently geographically searchable
• How?
– ‘brute force’
– rule based
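The two approaches named above can be contrasted in a minimal sketch. Everything here is an assumption made for illustration: the tiny `GAZETTEER` set, the cue-word rule, and the sample sentence (loosely modelled on the Statistical Account text) are not taken from the geoparser itself.

```python
import re

# Hypothetical mini-gazetteer; a real system would query a gazetteer database.
GAZETTEER = {"Cullen", "Banff", "Aberdeen", "Fordyce"}

def brute_force(text, gazetteer=GAZETTEER):
    """'Brute force': test every word in the text against the gazetteer."""
    return [t for t in re.findall(r"[A-Za-z]+", text) if t in gazetteer]

# Assumed rule: a capitalised word that follows a cue such as "parish of".
CUE_RULE = re.compile(r"\b(?:[Pp]arish|[Cc]ounty|[Bb]urn) of ([A-Z][a-z]+)")

def rule_based(text):
    """Rule based: accept only names introduced by a known cue phrase."""
    return CUE_RULE.findall(text)

sample = ("Cullen is a royal burgh. The parish extends from the "
          "Burn of Cullen towards the county of Banff.")
print(brute_force(sample))
print(rule_based(sample))
```

The brute-force pass finds every gazetteer hit, including the bare "Cullen", while the rule picks up only names with an explicit cue, which is one way to see the trade-off between recall and context.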
•3
Geo-spatial data
“data that have some form of spatial or geographic reference that enables them to be located in two- or three-dimensional space”
Statistical Account of Scotland
NUMBER XIII.
PARISH OF CULLEN.
(COUNTY OF BANFF, SYNOD OF ABERDEEN, PRESBYTERY OF FORDYCE.)
By the Rev. Mr. ROBERT GRANT.
Royalty, Extent, Climate, etc.
CULLEN, as appears from old charters, was originally called Inverculan, because it stands upon the bank of the Burn of Cullen, which, at the N. end of the town, falls into the sea: but now it is known by the name of Cullen only. Cullen is a royal burgh, formerly a constabulary, of which the Earl of Findlater was hereditary constable. The set, as it is called, of the council, consists of 19, in which number are included the Earl of Findlater, hereditary preses, 3 bailies, a treasurer, a dean-of-guild, and 13 counsellors. The parish extends from the sea southward, about 2 English miles in length.
•4
Geoparsing Flowline
Input document → Geoparse → Review → Output document
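The four stages of the flowline could be sketched as a chain of small functions. This is only an illustration under stated assumptions: the `Candidate` record, the simple substring lookup used for the geoparse stage, and the way the review step is simulated (a set of rejected offsets standing in for user decisions) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    offset: int  # character offset of the name in the input document

def geoparse(text, gazetteer):
    """Stage 2: propose candidates by plain gazetteer lookup (an assumption)."""
    found = []
    for name in gazetteer:
        start = text.find(name)
        while start != -1:
            found.append(Candidate(name, start))
            start = text.find(name, start + 1)
    return sorted(found, key=lambda c: c.offset)

def review(candidates, rejected_offsets):
    """Stage 3: stand-in for user review; drops the candidates a user rejected."""
    return [c for c in candidates if c.offset not in rejected_offsets]

def output(candidates):
    """Stage 4: reduce the accepted candidates to a sorted list of names."""
    return sorted({c.name for c in candidates})

# Stage 1: the input document (illustrative sentence).
text = "Cullen stands on the Burn of Cullen, in the county of Banff."
candidates = geoparse(text, {"Cullen", "Banff"})
accepted = review(candidates, rejected_offsets={candidates[1].offset})
print(output(accepted))
```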
•5
Geoparser architecture
Components:
– Web Interface
– geoXwalk Database
– Text docs / web pages
– Parser: rule-based place name identification
– Results table / map preview
– Downloadable metadata record: xml (gml?)
Stages: 1. Inputs, 2. Geoparse, 3. Review, 4. Outputs
•6
Demonstration
•7
Broad Issues
• What’s a geoparser for?
– Geo-referencing tool for enhancing metadata?
– Text analysis tool?
• Areas for improvement
– Need for more reliable geoparsing algorithms
    • to disambiguate multiple occurrences of the same place name in the same text
    • to develop automated feature typing
• Degree of user intervention: how ‘semi’ should semi-automatic be?
– Interface design depends largely on the ‘accuracy’ of the parser and the user’s motivations?
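One way the two improvement areas might be approached is to look at the word that introduces each occurrence of a name. The sketch below is an assumption, not the parser's method: the `FEATURE_CUES` mapping, the feature type labels, and the sample sentence are all invented for illustration.

```python
import re

# Hypothetical cue words mapped to hypothetical feature types.
FEATURE_CUES = {"burn": "watercourse", "parish": "parish", "county": "county"}

def type_occurrences(text, name):
    """Return (offset, feature type) for each occurrence of `name`.

    The type is guessed from an optional "<cue> of" phrase directly
    before the name; occurrences without a known cue are 'unknown'.
    """
    pattern = re.compile(
        r"(?:\b(?P<cue>\w+) of )?\b(?P<name>" + re.escape(name) + r")\b"
    )
    results = []
    for m in pattern.finditer(text):
        cue = (m.group("cue") or "").lower()
        results.append((m.start("name"), FEATURE_CUES.get(cue, "unknown")))
    return results

text = "The parish of Cullen lies by the Burn of Cullen. Cullen is a royal burgh."
for offset, feature_type in type_occurrences(text, "Cullen"):
    print(offset, feature_type)
```

Occurrences left as 'unknown' are the ones a user would most plausibly be asked to review, which ties the accuracy of the parser to the amount of intervention needed.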
•8
(An aside - Possible Solutions)
• Implement a variety of parsing methods
– user selects depending on use, e.g.
    • context-based approach
    • definitive place name matching against gazetteer
• Tools made available to the user depend on the type and number of documents and the intended use.
– Need to find a balance between text analysis and user interaction
– e.g. batch facility limited to certain document types and a user-selected parsing method, with minimal user intervention.
•9
Specific Issues
• The distinction between parser-selected locations and gazetteer locations needs to be more explicit
– no. of occurrences in text following geo-referencing?
• Users will be able to search the gazetteer and add records to output
• Addition of ‘rogue’ place names to the gazetteer
– (Quality assurance issues)
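A results table that keeps the two kinds of location apart could look like the sketch below. The row layout, the `source` flag, and the function name `build_results` are assumptions for illustration; the occurrence count is simply the number of parser hits per name.

```python
from collections import Counter

def build_results(parser_hits, gazetteer_additions):
    """Build result rows, flagging where each place name came from.

    parser_hits: names found in the text by the parser (with repeats).
    gazetteer_additions: names the user added from a gazetteer search.
    """
    counts = Counter(parser_hits)
    rows = [
        {"name": name, "occurrences": n, "source": "parser"}
        for name, n in counts.items()
    ]
    known = set(counts)
    for name in gazetteer_additions:
        if name not in known:  # do not duplicate a parser-selected name
            rows.append({"name": name, "occurrences": 0, "source": "gazetteer"})
            known.add(name)
    return rows

rows = build_results(["Cullen", "Banff", "Cullen"], ["Fordyce", "Cullen"])
for row in rows:
    print(row)
```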
•10
Continued...
• Implementation of sorting functions for the results table
• Output options
– currently preview results table
– map view for geo-referenced place names
– file download
    • required formats (xml, gml?)
    • Original document marked up in html(?)
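Sorting and an XML download could be sketched as follows. The row layout and the element and attribute names (`places`, `place`, `source`, `occurrences`) are hypothetical, since the required output schema is still an open question above; the example rows are illustrative, not real results.

```python
import xml.etree.ElementTree as ET

# Illustrative rows only.
rows = [
    {"name": "Cullen", "occurrences": 3, "source": "parser"},
    {"name": "Banff", "occurrences": 1, "source": "parser"},
    {"name": "Fordyce", "occurrences": 1, "source": "gazetteer"},
]

def sort_rows(rows, key, descending=False):
    """Sort the results table by any column, e.g. 'name' or 'occurrences'."""
    return sorted(rows, key=lambda r: r[key], reverse=descending)

def to_xml(rows):
    """Serialise rows to a simple XML string (hypothetical schema)."""
    root = ET.Element("places")
    for r in rows:
        place = ET.SubElement(
            root, "place",
            source=r["source"], occurrences=str(r["occurrences"]),
        )
        place.text = r["name"]
    return ET.tostring(root, encoding="unicode")

print([r["name"] for r in sort_rows(rows, "name")])
print(to_xml(sort_rows(rows, "occurrences", descending=True)))
```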
•11