Business Identification: Spatial Detection Alexander Darino Week 5


Business Identification: Spatial Detection

Alexander Darino
Week 5


Outline

• Recap of Previous Work
• Business Name Detection
• Business Name Matching
• Business Spatial Detection
• Weaknesses to Current Approach
• Alternatives to Current Approach
• Acknowledgements


Outline

[Pipeline diagram. Boxes: Latitude/Longitude; Geocoding; Reverse Geocoding; Nearby Businesses; Image; OCR; Detected Text; Business Name Matching; Business Spatial Detection; Business Identification. Stage labels: Week 4, Week 5.]

Previous Work

[Diagram labels: Image; Where Am I?; Latitude, Longitude; Geocoding; Reverse Geocoding; Nearby Businesses]

65
George S Aiken Co
Winghart's Burger & Whiskey Bar
Market Square
Bella Sera On the Square
Chipotle
NOLA
Las Velas
…
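The slides do not say which service produced the list of nearby businesses. As a minimal local stand-in, the sketch below filters a list of businesses by great-circle (haversine) distance from the photo's latitude and longitude. The function names, the 100 m radius, and the placeholder businesses and coordinates are all assumptions made for illustration; they are not taken from the slides.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(businesses, lat, lon, radius_m=100.0):
    """Return names of businesses within radius_m of (lat, lon), nearest first.

    `businesses` is an iterable of (name, lat, lon) tuples (hypothetical format).
    """
    hits = [(haversine_m(lat, lon, b_lat, b_lon), name) for name, b_lat, b_lon in businesses]
    return [name for dist, name in sorted(hits) if dist <= radius_m]


# Placeholder data, not real businesses or coordinates.
PLACEHOLDER_BUSINESSES = [
    ("Business C", 40.0100, -80.0),
    ("Business B", 40.0005, -80.0),
    ("Business A", 40.0000, -80.0),
]
```

Calling `nearby(PLACEHOLDER_BUSINESSES, 40.0, -80.0)` keeps only the candidates inside the radius, which is what bounds the later name-matching step to a short list.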


Business Name Detection

[Pipeline diagram. Boxes: Latitude/Longitude; Geocoding; Reverse Geocoding; Nearby Businesses; Image; OCR; Detected Text; Business Name Matching; Business Spatial Detection; Business Identification.]


Business Name Detection


Business Name Detection

…
<line dy="95" dx="1573" y="420" x="11" value="1">
  <space dy="26" dx="9" y="379" x="11"/>
  <box dy="26" dx="9" y="379" x="11" value="0" weights="96" numac="1"/>
  <box dy="25" dx="6" y="406" x="11" value="J" weights="98,62" numac="2" achars="p"/>
  <box dy="19" dx="5" y="382" x="19" value="n" weights="96" numac="1"/>
  <space dy="5" dx="30" y="441" x="25"/>
  <box dy="5" dx="7" y="441" x="56" value="."/>
  <box dy="24" dx="5" y="401" x="57" value="."/>
  <box dy="13" dx="8" y="429" x="58" value="v" weights="98" numac="1"/>
  <box dy="26" dx="9" y="402" x="60" value="." weights="94" numac="1"/>
  <box dy="22" dx="5" y="406" x="67" value="0" weights="96" numac="1"/>
  <box dy="10" dx="12" y="444" x="71" value="."/>
</line>
…
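The OCR output on this slide can be read with a standard XML parser. The sketch below pulls each recognized character out of a `<line>` element and keeps only the ones above a confidence cutoff. It assumes that the first number in `weights` is the confidence of the reported `value`, and the cutoff of 95 is arbitrary; neither is stated on the slide. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# One <line> element, copied from the slide.
SAMPLE = '''<line dy="95" dx="1573" y="420" x="11" value="1">
<space dy="26" dx="9" y="379" x="11"/>
<box dy="26" dx="9" y="379" x="11" value="0" weights="96" numac="1"/>
<box dy="25" dx="6" y="406" x="11" value="J" weights="98,62" numac="2" achars="p"/>
<box dy="19" dx="5" y="382" x="19" value="n" weights="96" numac="1"/>
<space dy="5" dx="30" y="441" x="25"/>
<box dy="5" dx="7" y="441" x="56" value="."/>
<box dy="24" dx="5" y="401" x="57" value="."/>
<box dy="13" dx="8" y="429" x="58" value="v" weights="98" numac="1"/>
<box dy="26" dx="9" y="402" x="60" value="." weights="94" numac="1"/>
<box dy="22" dx="5" y="406" x="67" value="0" weights="96" numac="1"/>
<box dy="10" dx="12" y="444" x="71" value="."/>
</line>'''


def read_line(line_xml):
    """Return a list of (character, weight) pairs; weight is None when absent."""
    chars = []
    for element in ET.fromstring(line_xml):
        if element.tag == "space":
            chars.append((" ", None))
        elif element.tag == "box":
            weights = element.get("weights")
            # Assumption: the first weight belongs to the reported value.
            weight = int(weights.split(",")[0]) if weights else None
            chars.append((element.get("value"), weight))
    return chars


def confident_text(chars, min_weight=95):
    """Keep spaces and any character whose weight is at least min_weight."""
    kept = []
    for ch, weight in chars:
        if ch == " " and weight is None:
            kept.append(ch)
        elif weight is not None and weight >= min_weight:
            kept.append(ch)
    return "".join(kept)
```

Even after filtering, this line yields no recognizable word, which illustrates why the raw OCR text needs a tolerant matching step rather than exact string comparison.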


Business Name Matching

[Pipeline diagram. Boxes: Latitude/Longitude; Geocoding; Reverse Geocoding; Nearby Businesses; Image; OCR; Detected Text; Business Name Matching; Business Spatial Detection; Business Identification.]


Business Name Matching

• Developed Confidence Attribution Algorithm
  – Confidence of OCR Token being Name Token
    • Example: Confidence of “ESTUANT” representing “RESTAURANT”
    • Point-based system
  – Confidence of Name appearing in Image
    • Sum of points of matching OCR Text
    • Use logarithmically-normalized points to determine business inclusion threshold
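The slides name the two confidence levels but do not give the point formula. The sketch below is one possible reading, not the author's algorithm: token confidence is approximated with `difflib.SequenceMatcher`, a name's points are the sum of its matched token confidences weighted by token length, and the points are log-normalized against the best-scoring name before applying an inclusion threshold. The similarity measure, the length weighting, and both cutoffs (0.7 and 0.5) are assumptions.

```python
import math
from difflib import SequenceMatcher


def token_confidence(ocr_token, name_token):
    """Similarity in [0, 1] between an OCR token and a business-name token."""
    return SequenceMatcher(None, ocr_token.upper(), name_token.upper()).ratio()


def name_points(ocr_tokens, business_name, min_confidence=0.7):
    """Sum points over name tokens that some OCR token matches well enough."""
    points = 0.0
    for name_token in business_name.split():
        best = max((token_confidence(t, name_token) for t in ocr_tokens), default=0.0)
        if best >= min_confidence:
            # Assumed weighting: longer matched tokens earn more points.
            points += best * len(name_token)
    return points


def log_normalized(points, max_points):
    """Map points onto [0, 1] on a logarithmic scale relative to max_points."""
    if max_points <= 0:
        return 0.0
    return math.log1p(points) / math.log1p(max_points)


def matched_businesses(ocr_tokens, names, threshold=0.5):
    """Return the names whose log-normalized points reach the threshold."""
    scores = {name: name_points(ocr_tokens, name) for name in names}
    top = max(scores.values(), default=0.0)
    return [name for name, pts in scores.items() if log_normalized(pts, top) >= threshold]
```

With this measure, `token_confidence("ESTUANT", "RESTAURANT")` comes out above 0.8, so the garbled token from the slide's example still counts toward a name containing "RESTAURANT", while unrelated tokens fall below the cutoff.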


Business Name Matching


Business Name Matching


Business Name Matching


Business Name Matching

Note: k is usually 2 or 3


Business Name Matching


Business Name Matching

Note: This originally did not appear because it did not exceed the confidence threshold. It now appears because it contributes to the Business Name Identification.


Business Spatial Identification

[Pipeline diagram. Boxes: Latitude/Longitude; Geocoding; Reverse Geocoding; Nearby Businesses; Image; OCR; Detected Text; Business Name Matching; Business Spatial Detection; Business Identification.]


Business Spatial Identification


Business Spatial Identification

Aiken George S Co

Category: Food, Grocery
Address: 218 Forbes Ave, Pittsburgh, PA 15222
Phone: (412) 391-6358
Rating: 4.5/5 (2 Reviews)


Business Spatial Identification


Business Spatial Identification


Business Spatial Identification

Bruegger's Bagels

Category: Bagels
Address: Market Sq, Pittsburgh, PA 15222
Phone: (412) 281-2515
Rating: Not Rated


Weaknesses to Current Approach

[Pipeline diagram. Boxes: Latitude/Longitude; Geocoding; Reverse Geocoding; Nearby Businesses; Image; OCR; Detected Text; Business Name Matching; Business Spatial Detection; Business Identification.]


Weaknesses to Current Approach

Lots of Garbage


Weaknesses to Current Approach

Fragmented Word Detection


Weaknesses to Current Approach

Fails with non-orthogonal perspective

Did I already mention lots of garbage?


Weaknesses to Current Approach

Fails with non-Roman text

Not scale-invariant


ALTERNATIVE APPROACHES

Two different


Alternative #1: Image Matching

[Pipeline diagram. Boxes: Latitude/Longitude; Geocoding; Reverse Geocoding; Nearby Businesses; Image; Match to Storefront Image; Business Spatial Detection; Business Identification.]


Alternative #1: Image Matching


Alternative #1: Image Matching

• Weaknesses
  – Storefront images aren’t always available for matching
  – Computationally Expensive
    • Hundreds of images to compare to
  – Nothing new
  – Boring!


Alternative #2: Template Matching

[Pipeline diagram. Boxes: Latitude/Longitude; Geocoding; Reverse Geocoding; Nearby Businesses; Image; Render Templates of Business Names in Different Fonts; Template Images; Image Matching (e.g. SIFT, HAAR); Business Spatial Detection; Business Identification.]
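To make the bounded-search idea concrete, the sketch below tries only the templates belonging to nearby businesses and reports where each one matches in the image. It is a deliberately simple stand-in: it slides small binary templates over a binary image and scores pixel agreement, rather than using SIFT or HAAR features, and unlike those it is not scale-invariant. Rendering names in real fonts is out of scope here, so templates are passed in as ready-made grids. All function names and the 0.9 threshold are hypothetical.

```python
def match_score(image, template, top, left):
    """Fraction of template pixels that agree with the image window at (top, left)."""
    rows, cols = len(template), len(template[0])
    agree = sum(
        image[top + r][left + c] == template[r][c]
        for r in range(rows)
        for c in range(cols)
    )
    return agree / (rows * cols)


def best_match(image, template):
    """Slide the template over the image; return (score, top, left) of the best window."""
    rows, cols = len(template), len(template[0])
    best = (0.0, -1, -1)
    for top in range(len(image) - rows + 1):
        for left in range(len(image[0]) - cols + 1):
            score = match_score(image, template, top, left)
            if score > best[0]:
                best = (score, top, left)
    return best


def identify(image, templates_by_name, threshold=0.9):
    """Bounded search: only the supplied (nearby) business templates are tried.

    `templates_by_name` maps a business name to a list of templates, one per
    rendered font. Returns {name: (top, left)} for names found in the image.
    """
    found = {}
    for name, templates in templates_by_name.items():
        score, top, left = max(best_match(image, t) for t in templates)
        if score >= threshold:
            found[name] = (top, left)
    return found
```

Because each match returns a window position, the same step that identifies the business also locates it in the image, which is the spatial detection the pipeline needs.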


Alternative #2: Template Matching

• Tambellini
• Tambellini
• Tambellini
• Tambellini
• Tambellini
• Tambellini
• Tambellini
• Tambellini


Alternative #2: Template Matching

OCR
• Not Scale Invariant
• Unbounded Search
• Fragmented Recognition
• Roman-only font

Alternative #2
• Scale Invariant
• Bounded Search
• Whole-word recognition
• All fonts


Acknowledgements

• Subh
  – Provided several ideas regarding template matching using SIFT, HAAR features, etc.


Thank You