Spatial Business Detection and Recognition from Images Alexander Darino Weeks 10 & 11

STR Implementation
- STR implementation: Automatic Detection and Recognition of Signs From Natural Scenes
  - Multiresolution-based potential characters detection
  - Character/layout geometry and color properties analysis
  - Local affine rectification
  - Refined detection

Refined Detection
- One font per classifier, a-z A-Z
- Generate alphabet templates
- Resize and center templates; divide into a 7x7 grid
- Apply several 2D Gabor filters to each grid patch
  - Different orientations, frequencies, variances
  - For each pixel, yields the real and imaginary components of the transformation
- Feed data into Linear Discriminant Analysis (LDA)
  - Reduces features and forms the classifier at the same time

2D Gabor Filter
- Kernel: a Gaussian multiplied by a sine wave; the kernel is convolved with the image
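The Gabor step can be sketched in a few lines. This is only an illustration, not the code used in the project: the parameter names (size, wavelength, theta, sigma), the bank of 2 wavelengths x 4 orientations, and the 6x6 toy patch are all assumptions, and the response is computed as a plain correlation with zero padding to keep the example free of dependencies.

```python
import cmath
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Return a size x size complex Gabor kernel as nested lists (size must be odd).

    Each entry is a Gaussian envelope multiplied by a complex sinusoid
    that oscillates along direction theta (radians).
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            carrier = cmath.exp(2j * math.pi * x_rot / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def filter_response(patch, kernel, row, col):
    """Complex response at one pixel (correlation; pixels outside the patch count as 0)."""
    half = len(kernel) // 2
    total = 0j
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            r, c = row + dy, col + dx
            if 0 <= r < len(patch) and 0 <= c < len(patch[0]):
                total += patch[r][c] * kernel[dy + half][dx + half]
    return total

# Hypothetical filter bank: 2 wavelengths x 4 orientations.
bank = [gabor_kernel(5, wavelength, k * math.pi / 4, 2.0)
        for wavelength in (3.0, 6.0) for k in range(4)]

# Toy 6x6 grid patch containing a vertical edge.
patch = [[1.0 if c >= 3 else 0.0 for c in range(6)] for r in range(6)]

# Feature vector: real and imaginary response for every pixel and every filter.
features = []
for kernel in bank:
    for r in range(6):
        for c in range(6):
            z = filter_response(patch, kernel, r, c)
            features.extend((z.real, z.imag))
```

A feature vector like this, computed per grid patch for every template, is the kind of input the slides describe feeding into LDA.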

Training Process

Character Determination
- Each grid patch has its own LDA classifier; the classifier returns a vector of probabilities, one per symbol
- To classify the overall character, recursively consider all 9-neighborhoods and multiply the corresponding probabilities together
- When only one grid patch remains, the highest probability wins

Recognition Process
- Color properties analysis: choose the channel with the highest confidence of best distinguishing foreground from background
- Binarization threshold (50% of Otsu's method)
- Intermediate representation: trim, resize, and center the binary image
- Perform OCR on variations of the intermediate representation: stretched, eroded (Gaussian-based), dilated
- Aggregate and return votes
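The character-determination step can be read as a repeated 3x3 reduction over the grid of per-patch probability vectors. The sketch below rests on assumptions the slides do not spell out: "all 9-neighborhoods" is taken to mean every 3x3 window of patches, so a 7x7 grid shrinks to 5x5, 3x3, and finally 1x1. The renormalization after each pass is an addition to keep the numbers from underflowing. The function names are hypothetical.

```python
def combine_once(grid):
    """Replace every 3x3 window of probability vectors by its normalized product."""
    n = len(grid)
    n_symbols = len(grid[0][0])
    out = []
    for r in range(n - 2):
        out_row = []
        for c in range(n - 2):
            product = [1.0] * n_symbols
            for dr in range(3):
                for dc in range(3):
                    vec = grid[r + dr][c + dc]
                    product = [p * v for p, v in zip(product, vec)]
            total = sum(product)
            if total > 0:
                out_row.append([p / total for p in product])
            else:
                # Everything underflowed to zero: fall back to a uniform vector.
                out_row.append([1.0 / n_symbols] * n_symbols)
        out.append(out_row)
    return out

def classify(grid, symbols):
    """Reduce a square, odd-sized grid to one vector and return the top symbol."""
    if len(grid) % 2 == 0:
        raise ValueError("grid side must be odd so it can shrink to 1x1")
    while len(grid) > 1:
        grid = combine_once(grid)
    final = grid[0][0]
    best_index = max(range(len(symbols)), key=final.__getitem__)
    return symbols[best_index]

# Toy input: every patch slightly prefers G over B and g.
symbols = ["G", "B", "g"]
grid = [[[0.40, 0.35, 0.25] for _ in range(7)] for _ in range(7)]
winner = classify(grid, symbols)
```

Multiplying probabilities treats the patches as independent evidence, so one confident patch can dominate; this may be part of why the later slides suggest trying an SVM instead.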

Recognition Process Example: G using Trebuchet-MS Classifier

Query Character (Actual Size)

Intermediate Representation (Actual Size)

Alphabet template: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

Identified character for each of the 15 variations (each shown at actual size), in slide order:
g, s, G, g, g, B, G, G, B, B, B, G, G, B, a

Final Results:
- B: 5/15
- G: 5/15
- g: 3/15
- a: 1/15 (6.7%)
- s: 1/15 (6.7%)
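The final tally is a simple vote count over the fifteen identified characters from the example. A minimal sketch follows, with the list copied from the slides in order; the variable names are illustrative only.

```python
from collections import Counter

# Character identified for each of the 15 variations of the query "G".
identified = ["g", "s", "G", "g", "g", "B", "G", "G",
              "B", "B", "B", "G", "G", "B", "a"]

votes = Counter(identified)
total = len(identified)

for symbol, count in votes.most_common():
    print(f"{symbol}: {count}/{total} ({100 * count / total:.1f}%)")

# Symbols sharing the highest vote count; more than one entry means a tie.
top = max(votes.values())
tied = sorted(symbol for symbol, count in votes.items() if count == top)
```

Here the tie list contains both B and G, so the vote alone cannot decide this character.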

GEORGE (Trebuchet-MS)

Votes, one line per character slide, in slide order:
- E: 14/15, t: 1/15
- j: 13/15, i: 2/15 (j is the default when unable to decide; should invert during preprocessing)
- j: 13/15, i: 1/15, M: 1/15 (j is the default when unable to decide; should invert during preprocessing)
- B: 5/15, G: 5/15, g: 3/15, a: 1/15, s: 1/15
- j: 12/15, Y: 2/15, X: 1/15 (j is the default when unable to decide; should invert during preprocessing or training)

Note on the Inversion Problem
- Easy to fix; common problem in OCR systems
- Will likely detect and correct during the preprocessing stage as opposed to training
  - More training data: slower, less reliable
  - Preprocessing: like trying many different lenses at the eye doctor and taking your best guess with each lens
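One simple way to detect inversion during preprocessing is to look at the border of the binarized character image. This is a swapped-in heuristic for illustration, not the method from the slides: it assumes the border of a trimmed sign region is mostly background, and that foreground should be encoded as 1. The function names are hypothetical.

```python
def border_pixels(image):
    """Yield every pixel on the outer frame of a binary image (nested lists of 0/1)."""
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                yield image[r][c]

def normalize_polarity(image):
    """Return a copy with foreground = 1, inverting when the border is mostly 1."""
    border = list(border_pixels(image))
    if sum(border) * 2 > len(border):
        return [[1 - p for p in row] for row in image]
    return [row[:] for row in image]

# Toy inverted input: background encoded as 1, a single foreground pixel as 0.
inverted = [[1] * 5 for _ in range(5)]
inverted[2][2] = 0
fixed = normalize_polarity(inverted)
```

A closer match to the "many lenses" idea would be to run OCR on both polarities and keep the more confident result; the border check is just the cheapest option to show.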

BAKERY (Actual: Tw-Cen-MT, Used: Arial)

Alphabet template: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

Votes, one line per character slide, in slide order:
- B: 9/15, j: 3/15, H: 2/15, F: 1/15
- A: 9/15, j: 5/15, n: 1/15
- K: 12/15, j: 2/15, H: 1/15
- E: 5/15, j: 3/15, L: 3/15, r: 2/15, F: 2/15
- p: 12/15, j: 3/15 (slide annotation: PR)
- Y: 12/15, j: 3/15

UNIVERSITY (Used: Times New Roman)

Alphabet template: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

Votes, one line per character slide, in slide order:
- U: 8/15, C: 3/15, j: 2/15, s: 1/15, O: 1/15
- N: 12/15, j: 3/15
- l (el): 9/15, I (eye): 6/15
- v: 9/15, j: 3/15, V: 3/15
- F: 9/15, L: 5/15, l (el): 1/15
- G: 9/15, j: 6/15
- j: 12/15, x: 2/15, w: 1/15
- j: 5/15, C: 4/15, O: 4/15, x: 2/15
- T: 9/15, l: 3/15, i: 1/15, j: 1/15, L: 1/15
- Y: 10/15, j: 3/15, i: 2/15

Evaluation
- Biggest weaknesses in preprocessing stage
  - OCR sensitive to thresholding/color inversion
  - Occasionally color modeling chooses a bad channel to use for OCR
    - Happens more often on low-resolution images
- Works surprisingly well for low-resolution images
- Font does not need to be exact, but proportions need to be roughly the same

How do I use this information?

The Big Picture
Diagram labels: Latitude, Longitude, Geocoding, Reverse Geocoding, Nearby Businesses, Image, STR, Detected Text, Business Name Matching, Business Identification, Business Spatial Detection

Old Approach
- Form words from highest-voted characters
- Compare to lexicon using Levenshtein distance
- Use existing ranking system afterwards

BOKFRY > BAKERY (L-DIST = 2)
GFQRGF > GEORGE (L-DIST = 3)

New Approach (Lexicon-assisted STR)
- Minimize Levenshtein distance with best permutation of voted characters
- Use existing ranking system afterwards
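Both approaches rely on Levenshtein distance. A minimal sketch of the standard dynamic-programming version reproduces the two old-approach distances quoted above; the three-word lexicon and the helper name closest are assumptions made for the example.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def closest(word, lexicon):
    """Return the lexicon entry with the smallest distance to word."""
    return min(lexicon, key=lambda entry: levenshtein(word, entry))

# Hypothetical lexicon of nearby business names.
lexicon = ["GEORGE", "BAKERY", "UNIVERSITY"]
match = closest("BOKFRY", lexicon)
```

In the old approach the word fed to this lookup is built from the single highest-voted character at each position, so every misread character costs one edit.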

B O K F P Y
G U H E R I    >>> BAKERY (L-DIST = 0)
J A j L I l
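The candidate table above can be matched against the lexicon without enumerating every combination of voted characters. The sketch below is one possible reading of "best permutation": it runs the usual Levenshtein recurrence but lets position i match for free when the lexicon character is any of the candidates voted for that position. The function name, the three-word lexicon, and the case-sensitive comparison are assumptions.

```python
def assisted_distance(candidates, word):
    """Levenshtein distance where position i matches any character in candidates[i].

    This equals the minimum ordinary Levenshtein distance over all words that
    can be formed by picking one candidate per position.
    """
    prev = list(range(len(word) + 1))
    for i, options in enumerate(candidates, start=1):
        curr = [i]
        for j, ch in enumerate(word, start=1):
            cost = 0 if ch in options else 1
            curr.append(min(prev[j] + 1,          # skip this position
                            curr[j - 1] + 1,      # insert ch
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

# Voted characters per position, read column by column from the table.
candidates = ["BGJ", "OUA", "KHj", "FEL", "PRI", "YIl"]

# Hypothetical lexicon of nearby business names.
lexicon = ["GEORGE", "BAKERY", "UNIVERSITY"]
best = min(lexicon, key=lambda w: assisted_distance(candidates, w))
```

With these candidates every letter of BAKERY appears in its column, which is why the distance drops to 0 where the old approach reported 2.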

The End Result
Bruegger's Bagels
Category: Bagels
Address: Market Sq, Pittsburgh, PA 15222
Phone: (412) 281-2515
Rating: Not Rated

Next Steps
- Fix STR preprocessing
  - Bug in color modeling code found online
  - Inversion determination
  - Multiple thresholds
- Word matching: generate templates of words/logos instead of letters
- Text detector: fix character/word fragmentation by reading papers that address the issue
- Test more images; fix problems as they arise
- Ideas to consider:
  - Feed grid-patch probability vectors into an SVM instead of smoothing
  - Generate disambiguation classifiers to differentiate:
    - Between top contending votes. Remember how G and B got confused? Dynamically create a classifier to tell them apart
    - Between commonly confused letters, e.g. E/F, l/i/j, o/c
  - Don't consider statistically insignificant confidences
- Text detection
  - Look into it after more work has been done on STR
  - Need to address issues:
    - Intracharacter segmentation
    - Intercharacter segmentation
    - Word segmentation
  - Needed to make the STR system automated like before

Thank You
