Business Identification: Spatial Detection Alexander Darino Week 5


Page 1: Business Identification: Spatial Detection

Business Identification: Spatial Detection

Alexander Darino
Week 5

Page 2: Business Identification: Spatial Detection

2

Outline

• Recap of Previous Work
• Business Name Detection
• Business Name Matching
• Business Spatial Detection
• Weaknesses to Current Approach
• Alternatives to Current Approach
• Acknowledgements

Page 3: Business Identification: Spatial Detection

3

Outline

[Pipeline diagram. Stages: Latitude/Longitude, Geocoding, Reverse Geocoding, Nearby Businesses, Image, OCR, Detected Text, Business Name Matching, Business Identification, Business Spatial Detection. Stages are labeled Week 4 or Week 5.]

Page 4: Business Identification: Spatial Detection

Previous Work

4

[Diagram labels: Image, Where Am I?, Latitude/Longitude, Geocoding, Reverse Geocoding, Nearby Businesses]

Nearby Businesses

65
• George S Aiken Co
• Winghart's Burger & Whiskey Bar
• Market Square
• Bella Sera On the Square
• Chipotle
• NOLA
• Las Velas
• …

Page 5: Business Identification: Spatial Detection

5

Business Name Detection


Page 6: Business Identification: Spatial Detection

6

Business Name Detection

Page 7: Business Identification: Spatial Detection

7

Business Name Detection

…
<line dy="95" dx="1573" y="420" x="11" value="1">
<space dy="26" dx="9" y="379" x="11"/>
<box dy="26" dx="9" y="379" x="11" value="0" weights="96" numac="1"/>
<box dy="25" dx="6" y="406" x="11" value="J" weights="98,62" numac="2" achars="p"/>
<box dy="19" dx="5" y="382" x="19" value="n" weights="96" numac="1"/>
<space dy="5" dx="30" y="441" x="25"/>
<box dy="5" dx="7" y="441" x="56" value="."/>
<box dy="24" dx="5" y="401" x="57" value="."/>
<box dy="13" dx="8" y="429" x="58" value="v" weights="98" numac="1"/>
<box dy="26" dx="9" y="402" x="60" value="." weights="94" numac="1"/>
<box dy="22" dx="5" y="406" x="67" value="0" weights="96" numac="1"/>
<box dy="10" dx="12" y="444" x="71" value="."/>
</line>
…
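The OCR output above is XML in which each `<box>` appears to carry one recognized character (`value`) together with a position (`x`, `y`) and size (`dx`, `dy`). The following is a minimal Python sketch of reading one `<line>` into text plus character boxes. The attribute interpretation is an assumption, the sample is an abbreviated copy of the slide's fragment, and `parse_line` is a hypothetical helper, not part of the original work.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample modeled on the OCR output shown on the slide.
SAMPLE = """
<line dy="95" dx="1573" y="420" x="11" value="1">
  <box dy="26" dx="9" y="379" x="11" value="0" weights="96" numac="1"/>
  <box dy="25" dx="6" y="406" x="11" value="J" weights="98,62" numac="2" achars="p"/>
  <box dy="19" dx="5" y="382" x="19" value="n" weights="96" numac="1"/>
  <space dy="5" dx="30" y="441" x="25"/>
  <box dy="5" dx="7" y="441" x="56" value="."/>
</line>
"""


def parse_line(xml_text):
    """Return (text, boxes) for one <line> element.

    Assumes `value` is the recognized character and that x, y, dx, dy give
    its position and size in pixels (an interpretation of the sample, not
    something the slides state).
    """
    line = ET.fromstring(xml_text.strip())
    chars = []
    boxes = []
    for node in line:
        if node.tag == "space":
            # A <space> element separates tokens on the same line.
            chars.append(" ")
        elif node.tag == "box":
            value = node.get("value", "")
            chars.append(value)
            boxes.append({
                "value": value,
                "x": int(node.get("x")),
                "y": int(node.get("y")),
                "width": int(node.get("dx")),
                "height": int(node.get("dy")),
            })
    return "".join(chars), boxes


text, boxes = parse_line(SAMPLE)
```

Keeping the per-character boxes alongside the text is what would later allow a matched name to be located in the image.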

Page 8: Business Identification: Spatial Detection

8

Business Name Matching


Page 9: Business Identification: Spatial Detection

9

Business Name Matching

• Developed Confidence Attribution Algorithm
  – Confidence of OCR Token being Name Token
    • Example: Confidence of “ESTUANT” representing “RESTAURANT”
    • Point-based system
  – Confidence of Name appearing in Image
    • Sum of points of matching OCR Text
    • Use logarithmically-normalized points to determine business inclusion threshold
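The slides do not give the scoring formula, so the following is only a rough Python sketch of how a point-based, logarithmically normalized confidence could be put together. Everything specific here is an assumption: `difflib.SequenceMatcher` stands in for the token comparison, and the names `token_points`, `name_confidence`, `MIN_RATIO`, and `THRESHOLD`, as well as the constants, are hypothetical.

```python
import math
from difflib import SequenceMatcher

MIN_RATIO = 0.6   # hypothetical cut-off for counting a token as a match
THRESHOLD = 0.5   # hypothetical inclusion threshold on the normalized score


def token_ratio(ocr_token, name_token):
    """Similarity in [0, 1] between an OCR token and a name token."""
    return SequenceMatcher(None, ocr_token.upper(), name_token.upper()).ratio()


def token_points(ocr_token, name_token):
    """Points = number of matched characters, or 0 below MIN_RATIO."""
    matcher = SequenceMatcher(None, ocr_token.upper(), name_token.upper())
    if matcher.ratio() < MIN_RATIO:
        return 0
    return sum(block.size for block in matcher.get_matching_blocks())


def name_confidence(ocr_tokens, business_name):
    """Log-normalized sum of the best points earned by each name token."""
    name_tokens = business_name.split()
    max_points = sum(len(token) for token in name_tokens)
    if max_points == 0:
        return 0.0
    points = sum(
        max((token_points(ocr, token) for ocr in ocr_tokens), default=0)
        for token in name_tokens
    )
    return math.log1p(points) / math.log1p(max_points)


def businesses_in_image(ocr_tokens, nearby_names):
    """Keep the nearby businesses whose confidence clears the threshold."""
    return [name for name in nearby_names
            if name_confidence(ocr_tokens, name) >= THRESHOLD]


# Made-up noisy OCR tokens checked against two of the nearby businesses.
ocr_tokens = ["BELLA", "SEFA", "SQUAE"]
nearby = ["Bella Sera On the Square", "Chipotle"]
found = businesses_in_image(ocr_tokens, nearby)
```

With these toy inputs only "Bella Sera On the Square" clears the threshold; "Chipotle" earns no points because no OCR token resembles it.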

Page 10: Business Identification: Spatial Detection

10

Business Name Matching

Page 11: Business Identification: Spatial Detection

11

Page 12: Business Identification: Spatial Detection

12

Business Name Matching

Page 13: Business Identification: Spatial Detection

13

Page 14: Business Identification: Spatial Detection

14

Business Name Matching

Page 15: Business Identification: Spatial Detection

15

Business Name Matching

Note: k is usually 2 or 3

Page 16: Business Identification: Spatial Detection

16

Business Name Matching

Page 17: Business Identification: Spatial Detection

17

Business Name Matching

Note: This originally did not appear because it did not exceed the confidence threshold. It now appears because it contributes to the Business Name Identification.

Page 18: Business Identification: Spatial Detection

18

Business Spatial Identification


Page 19: Business Identification: Spatial Detection

19

Business Spatial Identification

Page 20: Business Identification: Spatial Detection

20

Business Spatial Identification

Aiken George S Co

Category: Food, Grocery
Address: 218 Forbes Ave, Pittsburgh, PA 15222
Phone: (412) 391-6358
Rating: 4.5/5 (2 Reviews)

Page 21: Business Identification: Spatial Detection

21

Business Spatial Identification

Page 22: Business Identification: Spatial Detection

22

Business Spatial Identification

Page 23: Business Identification: Spatial Detection

23

Business Spatial Identification

Bruegger's Bagels

Category: Bagels
Address: Market Sq, Pittsburgh, PA 15222
Phone: (412) 281-2515
Rating: Not Rated

Page 24: Business Identification: Spatial Detection

24

Weaknesses to Current Approach


Page 25: Business Identification: Spatial Detection

25

Weaknesses to Current Approach

Lots of Garbage

Page 26: Business Identification: Spatial Detection

26

Weaknesses to Current Approach

Fragmented Word Detection

Page 27: Business Identification: Spatial Detection

27

Weaknesses to Current Approach

Fails with non-orthogonal perspective

Did I already mention lots of garbage?

Page 28: Business Identification: Spatial Detection

28

Weaknesses to Current Approach

Fails with non-Roman text

Not scale-invariant

Page 29: Business Identification: Spatial Detection

29

ALTERNATIVE APPROACHES

Two different

Page 30: Business Identification: Spatial Detection

30

Alternative #1: Image Matching

[Pipeline diagram. Stages: Latitude/Longitude, Geocoding, Reverse Geocoding, Nearby Businesses, Image, Match to Storefront Image, Business Identification, Business Spatial Detection]

Page 31: Business Identification: Spatial Detection

31

Alternative #1: Image Matching

Page 32: Business Identification: Spatial Detection

32

Alternative #1: Image Matching

• Weaknesses
  – Storefront images aren’t always available for matching
  – Computationally Expensive
    • Hundreds of images to compare to
  – Nothing new
  – Boring!

Page 33: Business Identification: Spatial Detection

33

Alternative #2: Template Matching

[Pipeline diagram. Stages: Latitude/Longitude, Geocoding, Reverse Geocoding, Nearby Businesses, Image, Render Templates of Business Names in Different Fonts, Template Images, Image Matching (e.g. SIFT, HAAR), Business Spatial Detection, Business Identification]

Page 34: Business Identification: Spatial Detection

34

Alternative #2: Template Matching

• Tambellini
• Tambellini
• Tambellini
• Tambellini
• Tambellini
• Tambellini
• Tambellini
• Tambellini

Page 35: Business Identification: Spatial Detection

35

Alternative #2: Template Matching

OCR
• Not Scale Invariant
• Unbounded Search
• Fragmented Recognition
• Roman-only font

Alternative #2
• Scale Invariant
• Bounded Search
• Whole-word recognition
• All fonts
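The bounded-search idea behind Alternative #2 can be illustrated with a toy Python sketch: only templates for nearby businesses are tried, and each is slid over the image. Several things here are assumptions made for illustration. Plain normalized cross-correlation is swapped in for the SIFT/HAAR matching named on the slides, the tiny hand-written bitmaps stand in for rendered business names, and `ncc`, `best_match`, and `identify` are hypothetical names. Unlike feature-based matching, this sketch is not scale invariant.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation between two equal-sized 2D grids."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def best_match(image, template):
    """Slide the template over the image; return (score, row, col)."""
    th, tw = len(template), len(template[0])
    best = (-1.0, 0, 0)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, r, c)
    return best


def identify(image, templates):
    """Bounded search: try only the templates of nearby businesses.

    `templates` maps (business_name, font_name) to a bitmap.
    Returns (score, business_name, font_name, (row, col)) of the best hit.
    """
    results = []
    for (name, font), bitmap in templates.items():
        score, r, c = best_match(image, bitmap)
        results.append((score, name, font, (r, c)))
    return max(results)


# Toy 3x3 bitmaps standing in for names rendered in some font.
templates = {
    ("Tambellini", "font-1"): [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    ("Chipotle", "font-1"): [[1, 0, 1], [1, 1, 1], [1, 0, 1]],
}
# Toy image containing the first bitmap at row 1, column 2.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
score, name, font, location = identify(image, templates)
```

Because the best-matching position comes back with the name, a single pass of this kind would give both the identification and the spatial location of the business in the image.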

Page 36: Business Identification: Spatial Detection

36

Acknowledgements

• Subh
  – Provided several ideas regarding template matching using SIFT, HAAR features, etc.

Page 37: Business Identification: Spatial Detection

Thank You