Disambiguating Queries for Geographic Information Retrieval
Carolyn Hafernik
Thesis Proposal
May 10, 2006
Computer Science
Advisor: Lisa Ballesteros



Information Retrieval (IR)

- What are the goals of an IR system?
- What is a relevant document?
- How does one determine which documents are relevant?
- How are IR systems evaluated?
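One common way to answer the evaluation question is to compare a system's ranked output against human relevance judgments. The sketch below computes three measures:

- set-based precision and recall;
- rank-aware average precision;
- average precision averaged over topics, which gives mean average precision.

The document IDs and judgments are made up for illustration. This is a minimal sketch, not the official scoring tool of any evaluation campaign.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranking, relevant):
    """Sum of precision@k at each rank k holding a relevant document,
    divided by the number of relevant documents (unretrieved ones add 0)."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per topic."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranking = ["d3", "d1", "d7", "d2", "d9"]  # hypothetical system output, best first
relevant = {"d1", "d2", "d5"}             # hypothetical relevance judgments
print(precision_recall(ranking, relevant))   # (0.4, 0.666...)
print(average_precision(ranking, relevant))  # 0.333...
```

Average precision rewards systems that place relevant documents near the top. Precision and recall ignore the order of the retrieved list.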


Geographic Information Retrieval (GIR)

- GIR is an extension of IR
- It aims to use geospatial information to help improve retrieval effectiveness
- What makes GIR challenging?
  - Poor query specification
  - Ambiguity of language
  - No central repository for geospatial information
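Ambiguity of language is easy to see with place names: one name can refer to several places. The sketch below assumes a hypothetical in-memory gazetteer; `Place`, `GAZETTEER`, and `disambiguate` are illustrative names, not part of any existing system. It applies one simple heuristic. It picks the referent whose enclosing regions are also mentioned in the query. When there is no such evidence, it leaves the name unresolved.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Place:
    name: str
    # Enclosing regions, most specific first (toy data for illustration).
    hierarchy: list[str] = field(default_factory=list)


# Hypothetical gazetteer: one surface name mapped to several candidate referents.
GAZETTEER = {
    "paris": [
        Place("Paris", ["Île-de-France", "France", "Europe"]),
        Place("Paris", ["Texas", "United States", "North America"]),
    ],
}


def disambiguate(name: str, query: str, gazetteer=GAZETTEER) -> Place | None:
    """Return the candidate whose enclosing regions are mentioned most in the query.

    Returns None when the name is unknown or no enclosing region is mentioned;
    on a tie, the earlier candidate in the list wins.
    """
    text = query.lower()
    best, best_score = None, 0
    for place in gazetteer.get(name.lower(), []):
        # Count whole-phrase mentions of each enclosing region in the query.
        score = sum(
            1 for region in place.hierarchy
            if re.search(r"\b" + re.escape(region.lower()) + r"\b", text)
        )
        if score > best_score:
            best, best_score = place, score
    return best


print(disambiguate("Paris", "museums in Paris, Texas").hierarchy[0])  # Texas
print(disambiguate("Paris", "hotels in Paris"))                        # None
```

A query such as "hotels in Paris" stays ambiguous under this heuristic. That is where additional evidence, such as population or document context, would be needed.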


Geospatial Information

[Map from www.lib.utexas.edu/maps/usmet.html]

- Locations
- Population statistics
- Name variations
- Nearby landmarks

How can geospatial information be used to increase retrieval effectiveness given a query?

Example query: “Hiking near the Bay Area”
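The kinds of geospatial information listed on this slide can be stored as one gazetteer record per place and then used to expand a query. The sketch assumes a hypothetical `GazetteerEntry` record and a hand-written entry for the Bay Area. It is illustrative only, not a complete gazetteer. The population field is left empty rather than filled with invented figures.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GazetteerEntry:
    """Hypothetical record for the geospatial information types on the slide."""
    name: str
    name_variations: list[str] = field(default_factory=list)
    contained_places: list[str] = field(default_factory=list)
    nearby_landmarks: list[str] = field(default_factory=list)
    population: int | None = None  # deliberately left empty in this toy entry


# Hand-written illustrative entry, not taken from an authoritative gazetteer.
BAY_AREA = GazetteerEntry(
    name="Bay Area",
    name_variations=["San Francisco Bay Area", "SF Bay Area"],
    contained_places=["San Francisco", "Oakland", "San Jose"],
    nearby_landmarks=["Mount Tamalpais", "Mount Diablo"],
)


def expand_query(query: str, entry: GazetteerEntry) -> list[str]:
    """Return the query followed by the entry's location terms, skipping any
    term that already occurs in the query text (case-insensitive)."""
    expanded = [query]
    query_text = query.lower()
    for term in entry.name_variations + entry.contained_places + entry.nearby_landmarks:
        if term.lower() not in query_text:
            expanded.append(term)
    return expanded


print(expand_query("Hiking near the Bay Area", BAY_AREA))
```

For the example query, none of the seven location terms appears verbatim in the query text, so all of them are appended after the original query.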


Sample GeoCLEF 2005 Topics

<top>
<num> GC001 </num>
<orignum> C084 </orignum>
<EN-title> Shark Attacks off Australia and California </EN-title>
<EN-desc> Documents will report any information relating to shark attacks on humans. </EN-desc>
<EN-narr> Identify instances where a human was attacked by a shark, including where the attack took place and the circumstances surrounding the attack. Only documents concerning specific attacks are relevant; unconfirmed shark attacks or suspected bites are not relevant. </EN-narr>
<!-- NOTE: This topic has added tags for GeoCLEF -->
<EN-concept> Shark Attacks </EN-concept>
<EN-spatialrelation> near </EN-spatialrelation>
<EN-location> Australia </EN-location>
<EN-location> California </EN-location>
</top>

<top>
<num> GC004 </num>
<orignum> C126 </orignum>
<EN-title> Actions against the fur industry in Europe and the U.S.A. </EN-title>
<EN-desc> Find information on protests or violent acts against the fur industry. </EN-desc>
<EN-narr> Relevant documents describe measures taken by animal right activists against fur farming and/or fur commerce, e.g. shops selling items in fur. Articles reporting actions taken against people wearing furs are also of importance. </EN-narr>
<!-- NOTE: This topic has added tags for GeoCLEF -->
<EN-concept> Animal Rights Actions against the fur industry </EN-concept>
<EN-spatialrelation> in </EN-spatialrelation>
<EN-location> Europe </EN-location>
<EN-location> United States </EN-location>
</top>
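The GeoCLEF tags separate a topic's subject (`EN-concept`) from its geographic part (`EN-spatialrelation`, `EN-location`). A query analyzer needs exactly this split. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes each topic is a single well-formed `<top>` element, as in the samples above. `parse_topic` is a hypothetical helper. A file listing several `<top>` elements without a common root element would need to be wrapped in one first.

```python
import xml.etree.ElementTree as ET

# First sample topic from the slide, with the narrative omitted for brevity.
TOPIC = """<top>
<num> GC001 </num>
<orignum> C084 </orignum>
<EN-title> Shark Attacks off Australia and California </EN-title>
<EN-desc> Documents will report any information relating to shark attacks on humans. </EN-desc>
<!-- NOTE: This topic has added tags for GeoCLEF -->
<EN-concept> Shark Attacks </EN-concept>
<EN-spatialrelation> near </EN-spatialrelation>
<EN-location> Australia </EN-location>
<EN-location> California </EN-location>
</top>"""


def parse_topic(xml_text: str) -> dict:
    """Split one <top> element into its concept and geographic components."""
    root = ET.fromstring(xml_text)  # the default parser drops XML comments

    def text(tag: str) -> str:
        el = root.find(tag)
        return el.text.strip() if el is not None and el.text else ""

    return {
        "num": text("num"),
        "title": text("EN-title"),
        "concept": text("EN-concept"),
        "relation": text("EN-spatialrelation"),
        "locations": [el.text.strip() for el in root.findall("EN-location") if el.text],
    }


topic = parse_topic(TOPIC)
print(topic["concept"], topic["relation"], topic["locations"])
# Shark Attacks near ['Australia', 'California']
```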


Previous Work

- GeoCLEF 2005
- Common approaches
  - Resources for storing geospatial information (e.g., gazetteers and geographical thesauri)
  - Named Entity Recognition
  - Query Expansion
  - Traditional IR approaches
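Named Entity Recognition for locations can be approximated by a gazetteer lookup. The sketch below tags place names by longest match, using a hypothetical hard-coded name list (`PLACE_NAMES`, `tag_locations`). It is a simplification: matching is case-sensitive and ambiguous names are not resolved.

```python
from __future__ import annotations

import re

# Hypothetical name list; a real system would load names from its gazetteer.
PLACE_NAMES = [
    "Australia", "California", "Europe", "United States",
    "San Francisco", "San Francisco Bay Area",
]


def tag_locations(text: str, place_names=PLACE_NAMES) -> list[tuple[int, int, str]]:
    """Return (start, end, name) for each place name found in text.

    Names are tried longest first, so "San Francisco Bay Area" is tagged as one
    span rather than as "San Francisco".
    """
    ordered = sorted(place_names, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b")
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


print([name for _, _, name in tag_locations("Hiking in the San Francisco Bay Area")])
# ['San Francisco Bay Area']
```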


Hypothesis

- My hypothesis is that using geospatial information to expand each query and to re-weight its geospatial components will improve retrieval effectiveness.
- Improvement is expected because the expanded query gives the system more specific information than the original query contains.
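The hypothesis combines two operations: adding expansion terms and re-weighting the geospatial part of the query. Below is a minimal sketch under stated assumptions:

- terms are single lowercase tokens;
- documents are scored by a weighted sum of term frequencies, a toy stand-in for a real retrieval model;
- the default weights are hypothetical placeholders, not values from the proposal.

```python
from collections import Counter


def build_weighted_query(concept_terms, location_terms, expansion_terms,
                         location_weight=1.0, expansion_weight=0.5):
    """Return a term -> weight map for an expanded geographic query.

    The weights are hypothetical placeholders; in the proposed approach the
    geospatial components would be re-weighted for each query.
    """
    weights = {t: 1.0 for t in concept_terms}
    for t in location_terms:
        weights[t] = location_weight
    for t in expansion_terms:
        weights.setdefault(t, expansion_weight)  # keep the weight of an original term
    return weights


def score(doc_tokens, weights):
    """Toy ranking function: weighted sum of query-term frequencies in the document."""
    tf = Counter(doc_tokens)
    return sum(w * tf[t] for t, w in weights.items())


weights = build_weighted_query(["shark", "attack"], ["australia"], ["queensland", "sydney"])
print(score("shark attack reported near sydney".split(), weights))  # 2.5
print(score("shark attack in florida".split(), weights))            # 2.0
```

Neither toy document names Australia, yet the one mentioning Sydney outscores the one about Florida. This illustrates the kind of gain the hypothesis anticipates. Raising `location_weight` shifts the balance toward the original location terms.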


Timeline

Fall Semester
- Build the Gazetteer
- Modify Query Analyzer
- Design Experiments
- Do More Background Reading
- Start writing thesis

January Term
- Run experiments
- Continue writing thesis

Spring Semester
- Analyze results
- Run more experiments (if necessary)
- Finish thesis


References

[1] Davide Buscaldi, Paolo Rosso, Emilio Sanchis Arnal. A WordNet-based Query Expansion method for Geographical Information Retrieval. 2005.
[2] Nuno Cardoso, Bruno Martins, Marcirio Silveira Chaves, Leonardo Andrade, Mario J. Silva. The XLDB Group at GeoCLEF 2005. 2005.
[3] O. Ferrandez, Z. Kozareva, A. Toral, E. Noguera, A. Montoyo, R. Munoz, Fernando Llopis. University of Alicante at GeoCLEF 2005. 2005.
[4] Daniel Ferres, Alicia Ageno, Horacio Rodriguez. The GeoTALP-IR System at GeoCLEF-2005: Experiments Using a QA-based IR System, Linguistic Analysis, and a Geographical Thesaurus. 2005.
[5] Fredric Gey, Ray Larson, Mark Sanderson, Hideo Joho, Paul Clough. GeoCLEF: the CLEF 2005 Cross-Language Geographic Information Retrieval Track Overview. 2005.
[6] Fredric Gey, Vivien Petras. Berkeley2 at GeoCLEF: Cross-Language Geographic Information Retrieval of German and English Documents. 2005.
[7] Rocio Guillen. CSUSM Experiments in GeoCLEF2005: Monolingual and Bilingual Tasks. 2005.
[8] Baden Hughes. NICTA i2d2 at GeoCLEF 2005. 2005.
[9] Andras Kornai. MetaCarta at GeoCLEF 2005. 2005.
[10] Sara Lana-Serrano, Jose M. Goni-Menoyo, Jose C. Gonzalez-Cristobal. Miracle’s 2005 Approach to Geographical Information Retrieval. 2005.
[11] Ray R. Larson. Cheshire II at GeoCLEF: Fusion and Query Expansion for GIR. 2005.
[12] Jochen L. Leidner. Preliminary Experiments with Geo-Filtering Predicates for Geographic IR. 2005.
[13] Johannes Leveling, Sven Hartrumpf, Dirk Veiel. University of Hagen at GeoCLEF 2005: Using Semantic Networks for Interpreting Geographical Queries. 2005.


Thank you!

Questions? Comments?