Business Identification: Local Neighborhood Alexander Darino


Page 1: Business Identification: Local Neighborhood Alexander Darino

Business Identification: Local Neighborhood

Alexander Darino

Page 2: Business Identification: Local Neighborhood Alexander Darino

Outline

• Where Am I? project obtains geolocation of camera from image
• Objective: Obtain the geolocation and address of Businesses in image
  – Assume Business is nearby, e.g. < 100 m from camera
  – Compare methods of obtaining this information
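The "nearby" assumption (< 100 m from the camera) can be sketched as a great-circle distance filter. This is a minimal illustration, not the project's code: the function names, the `(name, lat, lon)` tuple layout, and the placeholder candidates are all assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(camera, candidates, max_m=100.0):
    """Keep names of candidates (name, lat, lon) within max_m metres of the camera."""
    lat0, lon0 = camera
    return [name for name, lat, lon in candidates
            if haversine_m(lat0, lon0, lat, lon) <= max_m]


# Hypothetical candidates: one roughly 56 m north, one roughly 1.1 km north.
camera = (40.441127, -80.002822)
candidates = [("near", 40.441627, -80.002822), ("far", 40.451127, -80.002822)]
print(nearby(camera, candidates))
```

The 100 m cutoff is the slide's example figure; it is exposed as `max_m` so other radii can be compared.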

Page 3: Business Identification: Local Neighborhood Alexander Darino

Outline

[Pipeline diagram. Labels: Latitude/Longitude; Geocoding / Reverse Geocoding; Nearby Businesses; Image; OCR; Detected Text; Business Name Matching; Business Identification]

Page 4: Business Identification: Local Neighborhood Alexander Darino

Outline

• This Week:
  – Finding Local Businesses via Geocode Search
  – Finding Local Addresses via Reverse Geocoding
  – Extracting Identifying Text (i.e. store names) via Optical Character Recognition (OCR)
  – Matching OCR text to Business Names
• Next Steps/Weekend Objectives
• Acknowledgements

Page 5: Business Identification: Local Neighborhood Alexander Darino

Obtaining Business Names

Page 6: Business Identification: Local Neighborhood Alexander Darino

Local Businesses: Geocode Search

• Used Three Place-Search APIs:
  – Yelp API - detailed yellow page-type results
  – Google Places API - "Skinny" + Reference to more information
  – CityGrid API - minimal yellow page-type results
    • Used by Yellow Pages, Super Pages
• At present, only interested in business names
• Aggregated names from all three APIs
• Example (next slide)
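Aggregating names across the three APIs can be sketched as a merge over plain lists of names. The sketch below deliberately makes no API calls; collapsing near-duplicate spellings is an added assumption (the slides only say the names were aggregated), and the split of names across the three input lists is purely illustrative.

```python
import re


def normalize(name):
    """Lowercase and replace punctuation with spaces so near-identical listings compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def aggregate(*result_lists):
    """Merge name lists from several sources, keeping the first spelling seen for each name."""
    seen = {}
    for results in result_lists:
        for name in results:
            seen.setdefault(normalize(name), name)
    return list(seen.values())


# Illustrative input only: which API returned which name is not known.
source_a = ["Bella Sera On the Square", "Nicholas Coffee Co"]
source_b = ["Bella Sera on the Square", "Giggles"]
source_c = ["Nicholas Coffee Co", "Jimmy John's"]
merged = aggregate(source_a, source_b, source_c)
print(merged)
```

Normalization here only catches case and punctuation differences; variants such as "Denham & Company Salon" and "Denham & Co Salon" would still appear as two entries.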

Page 7: Business Identification: Local Neighborhood Alexander Darino

Local Businesses: Geocode Search

40.441127247181797 -80.002821624487595

Denham & Company Salon
Ullrich's Shoe Repairing
Nicholas Coffee Co
Bella Sera On the Square
A & J Ribs
Starbucks Coffee
Jenny Lee Bakery
Galardi's 30 Minute Cleaners
Jimmy John's Gourmet Sandwiches
Charley's Grilled Subs
Fresh Corner
Lagondola Pizzeria & Restaurant
Camera Repair Service Inc
Pittsburgh Cigar Bar
Original Oyster House
MixStirs
1902 Tavern
Costanzo's
Pittsburgh Silver Llc
Graeme St
Galardi's 30 Minute Cleaners
Denham & Co Salon
Bruegger's Bagel Bakery
Nicholas Coffee Co
Market Square
Fat Tommy's Pizzeria
Mixstirs Cafe
Giggles
Rycon Construction Inc
Garbera, Dennis C, Dds - Emmert Dental Assoc
Bella Sera on the Square
Mancini's Bread Co
Las Velas
Ciao Baby
Washington Reprographics Inc
Highmark Life Insurance Co
Fischer, Donald R, Md - Highmark Life Insurance Co
Jimmy John's
Lynx Energy Partners Inc
Emmert Dental Assoc

Page 8: Business Identification: Local Neighborhood Alexander Darino

Local Businesses: Geocode Search

Results: 12 Success, 3 Partial

Q9: First Presbyterian Church [turns out it wasn't a cathedral] (SUCCESS)
Q28: Moe's (SUCCESS)
Q34: Bruegger's Bagels (SUCCESS)
Q35: Bruegger's, Tavern, Nicholas (SUCCESS)
Q42: Tavern, Nicholas, Costanzo's [in distance] (SUCCESS)
Q57: Tambellini (SUCCESS)
Q63: Benedum Center (SUCCESS)
Q141: Roberts/7-Eleven (PARTIAL - misses Roberts)
Q200: Goodyear (SUCCESS)
Q238: Far from Bruegger's, Tavern, Nicholas (PARTIAL - misses Tavern)
Q246: Some theater (can't read it) (SUCCESS)
Q249: George Aikens (SUCCESS)
Q260: Dogs Dun Wright, Cherrie's diner (SUCCESS)
Q300: Giggles, Bruegger's, Tavern (in distance) (SUCCESS)
Q318: Fifth Avenue Place, Wine & Spirits (PARTIAL - misses Wine & Spirits)

Page 9: Business Identification: Local Neighborhood Alexander Darino

Local Businesses: Geocode Search

• Strengths
  – Aggregated results almost always found Business of interest
• Weaknesses
  – Each API limits query result set size - this is why we aggregate
  – Contacted Yelp, Google, CityGrid for extended API Access
    • Heard back from CityGrid; conference call next week
  – Only businesses listed
  – Not all businesses listed
    • All but one "Partial" result were for unlisted businesses
• Limitations
  – Have only tested for 15 Pittsburgh images - unknown result quality for rural areas

Page 10: Business Identification: Local Neighborhood Alexander Darino

Local Businesses: Geocode Search

✓*

✓* Implicitly verified: APIs can search by latitude/longitude OR address

Page 11: Business Identification: Local Neighborhood Alexander Darino

Local Addresses: Reverse Geocoding

• Used Two Reverse-Geocoding APIs
  – Google: provides a range of addresses on the same road
    • Usually the road is correct, but sometimes it's slightly off
    • Sometimes the road is correct, but the actual address number is not in the range
  – Bing: provides one or two proximate addresses
    • Rates its own confidence. Even 'Medium' confidences are very accurate
    • Address is never exact, but is almost always adjacent to correct address
    • Results returned are never consistent: always returns one or the other or both of the two addresses regardless of confidence level

Page 12: Business Identification: Local Neighborhood Alexander Darino

Local Addresses: Reverse Geocoding

• Intent: Get up to ~500 nearby addresses
• No Address Search API Available

✓*

✓✗

Page 13: Business Identification: Local Neighborhood Alexander Darino

Extracting Identifying Text: OCR

[Pipeline diagram. Labels: Latitude/Longitude; Geocoding / Reverse Geocoding; Nearby Businesses; Image; OCR; Detected Text; Business Name Matching; Business Identification]

Page 14: Business Identification: Local Neighborhood Alexander Darino

Extracting Identifying Text: OCR

• Given:
  – List of nearby businesses (names, addresses, etc.)
  – Image containing businesses with visible names
• Objective:
  – Extract names of businesses from image
  – Identify businesses located in image
    • Match names extracted from image to names in business list

Page 15: Business Identification: Local Neighborhood Alexander Darino

Extracting Identifying Text: OCR

• Used Two OCR APIs:
  – GNU OCR (Ocrad)
  – GOCR
• OCR APIs highly sensitive to:
  – Font (only works well with roman font)
  – Perspective
  – Scale
  – Binarization Threshold
  – Dark on Light vs. Light on Dark (inversion)

Page 16: Business Identification: Local Neighborhood Alexander Darino

Extracting Identifying Text: OCR

• OCR API evaluations
  – Ocrad - could not yield any meaningful data across over 200 scale/threshold/inversion combinations
  – GOCR - produced good results across 10 scales with and without inversion using threshold automatically determined by Otsu's method
• Examples of GOCR output (next slides)
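The binarization step mentioned above, Otsu's method with optional inversion, can be sketched in plain Python. This is an illustration of the technique on a flat list of 8-bit grayscale values, not the pipeline actually used with GOCR; the function names and the tiny synthetic image are assumptions.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) maximizing between-class variance.

    Pixels <= t form the dark class, pixels > t the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the dark class
    sum0 = 0  # intensity sum of the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break     # every pixel is already in the dark class
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, invert=False):
    """Map pixels to 1 (text) or 0 (background).

    Default assumes dark text on a light background; invert=True
    assumes light text on a dark background.
    """
    t = otsu_threshold(pixels)
    return [int((p > t) if invert else (p <= t)) for p in pixels]


# Synthetic example: 50 dark pixels followed by 50 light pixels.
pixels = [10] * 50 + [200] * 50
print(otsu_threshold(pixels), sum(binarize(pixels)), sum(binarize(pixels, invert=True)))
```

Running each image both with and without `invert` mirrors the "with and without inversion" sweep described on the slide.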

Page 17: Business Identification: Local Neighborhood Alexander Darino

Extracting Identifying Text: OCR

Page 18: Business Identification: Local Neighborhood Alexander Darino

Extracting Identifying Text: OCR

n.c.......o.a...u..............oU..D.oa..e......_RuEGGE..KERy..J...w...........L........M.II.....c..

...i

.......l.

.J

.t...llt...lSHA.P.It..tllt.........._.l...Jy._.c_...._tt.._....t.._.r.........t.t_t.._.._.l..J.r.r.I.

Page 19: Business Identification: Local Neighborhood Alexander Darino

Extracting Identifying Text: OCR

Page 20: Business Identification: Local Neighborhood Alexander Darino

Extracting Identifying Text: OCR

u..........._nq......eoR.E.l.e...í....e...n....n....n.e.R.E...e....o._....E.R.E.IKE........I.ltlO.........rE..o......E.....I.K.E.o.....

J.n....c...E.R.E.I.E.......M..E.R.E...E...aJ...Gu.ge..geE.F.._.....E..gE.D...fUlI..lll.lll.IIi.l..Xl..

Page 21: Business Identification: Local Neighborhood Alexander Darino

Extracting Identifying Text: OCR

Page 22: Business Identification: Local Neighborhood Alexander Darino

Extracting Identifying Text: OCR

..e_..w.._......D.........uJ.....J.................n......n..........n_..r.l_d..J.ec.m._..n.......J.n.._...tn..ct..._.................D.u.v...e.n....u..

Y.._w.n.n....Jn.......G..o..r..._........J...ml.t..l.tt.l.._w....................._....l....t........j..ilI.i..

Page 23: Business Identification: Local Neighborhood Alexander Darino

Extracting Identifying Text: OCR

Page 24: Business Identification: Local Neighborhood Alexander Darino

Extracting Identifying Text: OCR

__.ncu_.l..._..._J...ne......._n._..v.....ra......d_..._.............i..n..UllREsT.unAN...r.c.....r...Tt.rJll......m...c.....n.......

..

.Jn.I..c...r.rESTAU.ANT.r.O....c.cc.

Note: Even though "Tambellini" is a roman font, it is too stretched to be picked up by GOCR

Page 25: Business Identification: Local Neighborhood Alexander Darino

Extracting Identifying Text: OCR

• Strengths
  – Applicable to expected input of orthogonal images
  – Output can be run through word similarity matching algorithms
• Weaknesses
  – Only works well(-ish) for strictly roman font
• Limitations
  – Will perform poorly for artistic fonts and business signs
• Conclusion
  – By itself, OCR is not the best approach towards Business identification (poor recognition, franchises, perspective, etc.)
  – OCR could be used as part of a Business identification voting scheme

Page 26: Business Identification: Local Neighborhood Alexander Darino

Matching OCR Text to Business Names

[Pipeline diagram. Labels: Latitude/Longitude; Geocoding / Reverse Geocoding; Nearby Businesses; Image; OCR; Detected Text; Business Name Matching; Business Identification]

Page 27: Business Identification: Local Neighborhood Alexander Darino

Matching OCR Text to Business Names

• Fuzzy String Matching: TRE Package
  – Approximate Regular Expression Matching
  – Returns edit-distance of matched text
• Filter OCR text
  – Trimming
  – Chunking
  – Uselessness (i.e. less than two letters)
• Developing algorithm to rate confidence of business name appearing in image
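The filtering and matching steps above can be sketched without TRE. The sketch below substitutes a plain dynamic-programming approximate substring match (Sellers' algorithm) for TRE's approximate regex matching, and assumes that "chunking" means splitting the OCR output on runs of `.`, `_`, and whitespace; neither choice is confirmed by the slides, and both function names are hypothetical.

```python
import re


def chunk_ocr(text, min_letters=2):
    """Split raw OCR output into chunks and drop those with fewer than min_letters letters."""
    chunks = re.split(r"[._\s]+", text)
    return [c for c in chunks if sum(ch.isalpha() for ch in c) >= min_letters]


def fuzzy_substring_distance(pattern, text):
    """Smallest edit distance between pattern and any substring of text (case-insensitive)."""
    pattern, text = pattern.lower(), text.lower()
    prev = [0] * (len(text) + 1)  # row 0: a match may start anywhere in text at no cost
    for i, pc in enumerate(pattern, start=1):
        cur = [i]  # matching i pattern characters against nothing costs i deletions
        for j, tc in enumerate(text, start=1):
            cur.append(min(prev[j] + 1,                # skip a pattern character
                           cur[j - 1] + 1,             # skip a text character
                           prev[j - 1] + (pc != tc)))  # match or substitute
        prev = cur
    return min(prev)  # a match may end anywhere in text


# Fragment taken from the GOCR output shown on an earlier slide.
chunks = chunk_ocr("..D.oa..e......_RuEGGE..KERy..J")
for chunk in chunks:
    print(chunk, fuzzy_substring_distance(chunk, "Bruegger's Bagel Bakery"))
```

Each surviving chunk is used as the pattern and the business name as the text, so short but clean fragments such as `KERy` can still register against "Bakery".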

Page 28: Business Identification: Local Neighborhood Alexander Darino

Matching OCR Text to Business Names

$$\mathrm{Confidence}(\mathit{Name}) = \frac{1}{\text{OCR Matches}} \sum_{\mathit{OCR}} \frac{\mathrm{Length}(\mathit{OCR})}{1 + \#\mathrm{Errors}}$$
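One reading of the confidence score on this slide is the mean, over the OCR chunks matched to a name, of Length(OCR) / (1 + Errors). A minimal sketch under that reading follows; the function name, the `(chunk, errors)` input layout, and the zero score for a name with no matches are assumptions.

```python
def confidence(matches):
    """Mean of len(chunk) / (1 + errors) over the OCR chunks matched to one business name.

    matches: list of (ocr_chunk, errors) pairs, where errors is the edit distance
    of the chunk's best approximate match against the name.
    """
    if not matches:
        return 0.0  # assumption: a name with no matched chunks scores zero
    return sum(len(chunk) / (1 + errors) for chunk, errors in matches) / len(matches)


# Two clean fragments score higher than one short fragment with an error.
print(confidence([("RuEGGE", 0), ("KERy", 0)]))
print(confidence([("oa", 1)]))
```

Longer matched fragments raise the score and each edit error lowers it, which matches the intent of rating how likely a business name is to appear in the image.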

Page 29: Business Identification: Local Neighborhood Alexander Darino

Matching OCR Text to Business Names

Page 30: Business Identification: Local Neighborhood Alexander Darino

Next Steps/Weekend Objectives

• Implement 'chunking' of OCR output
• Evaluate and refine algorithm against multiple inputs
• Detect location of text in image

Page 31: Business Identification: Local Neighborhood Alexander Darino

Acknowledgements

• Subh
  – Directed us to the Ocrad and GOCR OCR packages
  – Provided feedback on how to calibrate OCR packages to extract meaningful text (e.g. scaling, inversion, etc.)

Page 32: Business Identification: Local Neighborhood Alexander Darino

Thank You