Search engine and services Course: Location Aware Machine Intelligence Presented by : Celestine Mkama Kalendero 25.02.2014

Search engine and services

Page 1: Search engine and services

Search engine and services

Course: Location Aware Machine Intelligence
Presented by: Celestine Mkama Kalendero

25.02.2014

Page 2: Search engine and services

Outline
1. Search Engine results ranking based on location
2. Review of Personalized Mobile Search Engine
3. Extraction of Address Data from Unstructured Text

Page 3: Search engine and services

Search Engine Results Ranking based on Location

Carolyn Watters and Ghada Amoudi
Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada.
E-mail: [email protected]
Year: 2003

Page 4: Search engine and services

Result Ranking in Search Engines (as of 2002)

Search engines build their indexes based on:
a) Keyword occurrence
b) Frequency of query negotiation

Pros
+ Robust, fast

Cons
- Users have to sort through pages of results when queries relate to physical distance and location
- 44% of users frustrated by search engines (Realname, 2000)

Page 5: Search engine and services

Geosearcher
- Location-based ranking system
- Translates the search reference point into coordinates (Long, Lat)
- Ranks search results in ascending order of distance

Geosearcher architecture

Page 6: Search engine and services

Geosearcher architecture - Query
- Presented by end-system users, e.g. "skiing resort District of Columbia"
  Query: skiing resort
  Reference point: District of Columbia
- Sample of random URLs available (used for evaluation)

Page 7: Search engine and services

Geosearcher architecture-Geocoding

Process of assigning latitude and longitude coordinates to the host for each site;

- Preliminary work (performed by researchers)
  a) Determine location
  b) Create lookup table

Page 8: Search engine and services

Geosearcher architecture-Geocoding

a) Determining location
- From host URLs: DNS, country codes, Whois database
- Map location into coordinates, e.g. use the Getty Thesaurus (GS) to map a location into coordinates
  + Containing state and area code for US, Canada
  + Other countries

b) Lookup table
- Country codes with coordinates

www.about.com
www.dartmouth.ca
mathresource.com

Page 9: Search engine and services

Geosearcher architecture-Geocoding

Lookup Table
Country Code | State Code | Area Code | Coordinates (Lat, Long)
US           | AL         | 256       | 34.9200, 87.2703
US           | CA         | 530       | 38.8951, 77.0367
CA           | NS         | 902       | 45.0000, 63.0000
FI           | Helsinki   |           | 60.1708, 24.9375
NO           | Oslo       |           | 59.9500, 10.7500

Page 10: Search engine and services

Example: Location Information

Getty thesaurus

Whois Database

Page 11: Search engine and services

Geosearcher architecture-Geocoding

The Process
a) Check for coordinates in the host table.
b) If not found, send the domain to whois, which returns the country code (CC) and area code on a match. If the CC is ca or us and an area code is present, look up the state or province name in the lookup table.
c) If there is no match, strip the domain down by one level (e.g. data.about.com to about.com).
d) Names that are still unmatched are checked with IPtoLL (host to lat/long conversion). IPtoLL uses the administrative contact.
Store the results in the host table.

Page 12: Search engine and services

Geosearcher architecture-Geocoding

Host Table
Host              | Coordinates (Lat, Long)
www.skibluemt.com | 34.9200, 87.2703
www.dcski.com     | 38.8951, 77.0367

Page 13: Search engine and services

Distance and Ranking

- Used to rank the URLs in the host table relative to the reference location
- Distance calculated using the haversine formula
- Distances stored in the session host table
- Results ranked by distance (insertion sort)
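A minimal sketch of the distance and ranking step, assuming a mean Earth radius of 6371 km and coordinates given as `(lat, long)` in degrees. The slide mentions insertion sort; Python's built-in `sorted` is used here instead, which yields the same ascending order. The function names are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, long) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def rank_by_distance(host_table, reference):
    """Return host names ordered by ascending distance from the reference point."""
    return sorted(host_table,
                  key=lambda host: haversine_km(host_table[host], reference))


# Host table entries taken from the slides; the reference point is illustrative.
hosts = {
    "www.skibluemt.com": (34.9200, 87.2703),
    "www.dcski.com": (38.8951, 77.0367),
}
ranking = rank_by_distance(hosts, (38.8951, 77.0367))
```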

Page 14: Search engine and services

Results

Unranked Result-

Altavista

Using Geosearcher

Page 15: Search engine and services

Results (contd.)
Validation of accuracy
- Examined 100 results manually for location information
- 90 websites assigned correctly
- 78% of 83 URLs were accurately identified

Page 16: Search engine and services

Results (contd.)
Algorithm effectiveness
- Tested with 10 sets of 100 URLs using the Yahoo Random Link generator

Page 17: Search engine and services

Personalized Mobile Search Engine Using Location and Content Concepts

Namrata G Kharate ME-Computer-II

MCOERC, Nasik-India

Prof. S. A. Bhavsar
Assistant Prof., Computer Dept.

MCOERC, Nasik-India

Publication: November, 2013

Page 18: Search engine and services

Search on Mobile Devices
- Search queries on mobile devices: shorter, ambiguous
- Search results: less accurate

Solution
- We need a system that captures user preferences to return a personalized result ranking
- Personalized Mobile Search Engine (PMSE)

Page 19: Search engine and services

PMSE- System Architecture

RSVM: Ranking Support Vector Machine

Page 20: Search engine and services

Page 21: Search engine and services

PMSE

Client
- Receive user requests
- Store clickthrough data (location, content)
- Submit request to server
- Display results
- Profile preferences in an ontology-based user profile

Server
- Forward request to commercial search engine
- RSVM training
- Search result reranking
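To make the server-side RSVM training and reranking steps more concrete, here is a rough sketch of a linear ranking SVM trained on preference pairs with a pairwise hinge loss and plain subgradient descent. Everything specific here is an assumption for illustration: the two features (content match, location match), the hyperparameters, and the way preference pairs would be derived from clickthrough data are not taken from the PMSE paper.

```python
def train_ranking_svm(pairs, dim, epochs=200, lr=0.1, reg=0.01):
    """pairs: list of (preferred_features, other_features) tuples.

    Minimises reg/2 * ||w||^2 + sum(max(0, 1 - w.(preferred - other)))
    by subgradient descent, one pair at a time.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for preferred, other in pairs:
            diff = [p - o for p, o in zip(preferred, other)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            for k in range(dim):
                grad = reg * w[k]
                if margin < 1.0:      # pair violates the margin
                    grad -= diff[k]
                w[k] -= lr * grad
    return w


def score(w, features):
    return sum(wi * fi for wi, fi in zip(w, features))


def rerank(results, w):
    """results: list of (url, features); highest score first."""
    return sorted(results, key=lambda r: score(w, r[1]), reverse=True)


# Hypothetical features: [content_match, location_match].
# In both pairs the user preferred the result that matched their location.
pairs = [([0.2, 1.0], [0.9, 0.0]), ([0.5, 1.0], [0.5, 0.0])]
w = train_ranking_svm(pairs, dim=2)
reranked = rerank([("a.example", [0.9, 0.0]), ("b.example", [0.2, 1.0])], w)
```

The learned weight vector plays the role of the user's preference profile: results from the commercial search engine are rescored with it and returned in the new order.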

Page 22: Search engine and services

Extraction of Address Data from Unstructured Text using Free Knowledge Resources

Sebastian [email protected]

Simon [email protected]

Publication: November, 2013

Ralf [email protected]

Christoph [email protected]

Multimedia Communications Lab, Technische Universität Darmstadt, Germany

Page 23: Search engine and services

Extraction of Address Data

Is of interest in various domains
o Location-based services
o Address repositories (automatically created)

- Automatic harvesting of web addresses is not possible

Solution
- Identify business address data using a hybrid approach
- Combine patterns and gazetteers

Page 24: Search engine and services

Address Structure-Germany

- Company name: no special pattern
- Street: varies, e.g. Burgermeister-Jung, Bgm.-Jung
- Street number: digit sequence, e.g. 45a, 45-47
- Postal code: exactly 5 numbers, reserved
- Cities: e.g. Frankfurt, Ffm, Frankfurt/Main

Page 25: Search engine and services

Address Data Identification: Workflow

Page 26: Search engine and services

Address Data Identification: Preprocessing
- Strip HTML markup, e.g. using the Beautiful Soup library
- Cleaning: removing non-Unicode characters and whitespace between numbers
- Line splitting and tokenizing, using the Apache OpenNLP toolkit
- Part-of-speech tagging, using TreeTagger
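The first three preprocessing steps can be approximated with the Python standard library alone, as sketched below. This is a stand-in, not the authors' pipeline: `html.parser` replaces Beautiful Soup, a whitespace split replaces OpenNLP tokenizing, "non-Unicode characters" is interpreted here as non-printable characters, and part-of-speech tagging is left out. The sample address is invented.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and inserts line breaks at block-level tags."""
    BLOCK_TAGS = {"br", "p", "div", "li", "tr"}

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)


def strip_markup(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def clean(text):
    # Drop non-printable characters (assumed reading of "non-Unicode chars").
    text = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    # Remove spaces/tabs between digits, e.g. "64 283" -> "64283".
    return re.sub(r"(?<=\d)[ \t]+(?=\d)", "", text)


def split_and_tokenize(text):
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return [ln.split() for ln in lines]


html = "<p>Musterfirma GmbH<br>Hauptstr. 45a<br>64 283 Darmstadt</p>"
tokens = split_and_tokenize(clean(strip_markup(html)))
```

Only spaces and tabs are removed between digits, so numbers on separate lines (for example a street number and the postal code below it) are never merged.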

Page 27: Search engine and services
Page 28: Search engine and services

Address Data Identification
- Line splitting and tokenizing, using the Apache OpenNLP toolkit

Page 29: Search engine and services

Address Data Identification
1. Postal codes
   Token regular expression: [0-9]{5}
2. Cities
   Generated list based on OpenStreetMap, accessed via the Overpass API (28,087 entries)
   o Known city found in the list
   o Preceded directly by a postal code
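A small sketch of the two rules above: a token counts as a postal code if it fully matches `[0-9]{5}`, and a city is accepted when it is in the known-city list and directly follows a postal code. The tiny `known_cities` set in the usage line is a placeholder for the OpenStreetMap-derived list, and the function name is illustrative.

```python
import re

POSTAL_CODE = re.compile(r"[0-9]{5}")  # pattern from the slide


def find_postal_code_and_city(tokens, known_cities):
    """Return (postal_code, city) for the first match in a token list, else None."""
    for i in range(len(tokens) - 1):
        code, city = tokens[i], tokens[i + 1]
        if POSTAL_CODE.fullmatch(code) and city in known_cities:
            return code, city
    return None


# Placeholder gazetteer; the slides use 28,087 entries from OpenStreetMap.
known_cities = {"Darmstadt", "Frankfurt"}
match = find_postal_code_and_city(["64283", "Darmstadt"], known_cities)
```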

Page 30: Search engine and services

Address Data Identification
3. Street numbers
   Use the regular expression ([0-9]{1,3})([a-zA-Z][0-9]?)?(([+|-])([0-9]{1,3})([a-zA-Z][0-9]?)?)?
4. Street names
   Generated list based on OpenStreetMap, accessed via the Overpass API (300,000 entries)
   o Use street name endings, e.g. str
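The street number pattern from the slide can be tried out directly in Python. The sketch below applies it as a full-token match and adds a simple ending check for street names. Only the ending "str" comes from the slides; the `known_streets` parameter is a placeholder for the OpenStreetMap-derived list, and both function names are illustrative.

```python
import re

# Regular expression copied from the slide.
STREET_NUMBER = re.compile(
    r"([0-9]{1,3})([a-zA-Z][0-9]?)?(([+|-])([0-9]{1,3})([a-zA-Z][0-9]?)?)?"
)
STREET_ENDINGS = ("str",)  # only "str" is named in the source


def is_street_number(token):
    return STREET_NUMBER.fullmatch(token) is not None


def looks_like_street_name(token, known_streets=frozenset()):
    """True if the token is a known street or ends in a typical street ending."""
    if token in known_streets:
        return True
    return token.rstrip(".").lower().endswith(STREET_ENDINGS)
```

For example, `is_street_number("45a")` and `is_street_number("45-47")` both hold, while a four-digit token such as `"1234"` is rejected because the pattern allows at most three digits per number.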

Page 31: Search engine and services

Address Data Identification
5. Company name
   - Search for identifying terms (from Wikipedia): 29 terms, e.g. GmbH (private), AG (public)
   - Exploit the standard address structure
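As a rough illustration of the term-based company name step, the sketch below scans tokenized lines for a legal-form term and returns the first line that contains one. Only GmbH and AG are named in the slides, so the term set here is deliberately incomplete, and the sample lines are invented.

```python
# Two of the 29 legal-form terms; the rest are not listed in the slides.
LEGAL_FORM_TERMS = {"GmbH", "AG"}


def find_company_line(lines):
    """lines: list of token lists. Return the first line containing a legal-form term."""
    for tokens in lines:
        if any(tok in LEGAL_FORM_TERMS for tok in tokens):
            return " ".join(tokens)
    return None


company = find_company_line([
    ["Impressum"],
    ["Musterfirma", "GmbH"],   # hypothetical company
    ["Hauptstr.", "45a"],
    ["64283", "Darmstadt"],
])
```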

Page 32: Search engine and services

Evaluation & Methodology
- Sites with a legal note (1,576 websites)
- Metric: fraction of full addresses identified correctly
- Rcorrect address: 0.946, Rcompany: 0.82

[Figure: precision and recall (axis from 0.5 to 1) for complete address w/o company name, complete address with company name, company name, street, city]

Page 33: Search engine and services

Conclusion
Search engine ranking
- Evaluation: the algorithm was accurate and effective
- Efficiency: impacted by reliance on external databases

Recommendation
- Have a database of special resources to increase efficiency
- Address extraction: adaptation to other languages

Page 34: Search engine and services

Thank You!

(Q&A)