Performance Evaluation of GANs in a semi-supervised OCR Use … · 2019-03-13 · OCR with...

Performance Evaluation of GANs in a semi-supervised OCR Use Case

Florian Wilhelm, London, 2018-10-11

Special Interests

• Mathematical Modelling

• Recommendation Systems

• Data Science in Production

• Python Data Stack

• Maintainer of PyScaffold

Dr. Florian Wilhelm

Principal Data Scientist @ inovex

@FlorianWilhelm

FlorianWilhelm

florianwilhelm.info

Florian Tanten

Master Thesis @ inovex, October 2017 - May 2018

IT-project house for digital transformation:

‣ Agile Development & Management

‣ Web · UI/UX · Replatforming · Microservices

‣ Mobile · Apps · Smart Devices · Robotics

‣ Big Data & Business Intelligence Platforms

‣ Data Science · Data Products · Search · Deep Learning

‣ Data Center Automation · DevOps · Cloud · Hosting

‣ Trainings & Coachings

Using technology to inspire our clients. And ourselves.

inovex offices in Karlsruhe · Cologne · Munich · Pforzheim · Hamburg · Stuttgart.

www.inovex.de

Agenda

1. Use Case

2. Text Spotting

3. Data and Pipeline

4. Generative Adversarial Networks

5. Semi-supervised Learning

6. Results

https://www.autocheck.com/vehiclehistory/autocheck/en/vinbasics

Vehicle Identification Number (VIN)

Unique identifier like a fingerprint of a vehicle

serial number

country

security code

model year

assembly plant

details

flexible fuel vehicles

manufacturer

Use Case

VIN:WF0DXXGAKDEJ37385

VIN-Decoder

Information about the car:

Manufacturer: BMW

Model: X3

Year: 2013-03-21

Engine power: 143 PS

Equipment:

- Xenon Lights...

Spotting the vehicle identification number (VIN) in images of vehicle registration documents

OCR Libraries

Commercial software

Open source tools

„VSSZZZGJZHR03G533“

OCR with Tesseract


Character detection & extraction

Character recognition

Girshick et al. (2014), „Region-Based Convolutional Networks for Accurate Object Detection and Segmentation“

Methodology in Text Spotting

Sliding Window

Computer Vision Tools

Others

- Connected components
- Stroke width transform
- Edge detection

- SVM
- Learning with HOG
- CNN

- Region proposal
- Hypotheses CNN pooling

Character or word

CNN + RNN

Nearest Neighbor

High-performer current studies

CNN = Convolutional Neural Network

SVM = Support Vector Machine

HOG = Histogram of Oriented Gradients

RNN = Recurrent Neural Network

RL = Reinforcement Learning

Character Recognition

Spotting = Detection + Recognition

https://en.wikipedia.org/wiki/Convolutional_neural_network; http://intellabs.github.io/ParallelJavaScript/

Convolutional Neural Network

Convolution with 3x3 kernel and stride = 1

Max pooling with a 2x2 filter and stride = 2
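The two operations named on this slide can be sketched in plain Python. This is a minimal illustration, not the network used in the talk: the function names `conv2d` and `max_pool2d`, the toy 6x6 input, and the centre-picking kernel are assumptions made for the example.

```python
def conv2d(image, kernel, stride=1):
    """Valid (unpadded) 2D cross-correlation, the 'convolution' of CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(
                image[i * stride + di][j * stride + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 6x6 "image": pixel value = row * 6 + column (hypothetical data).
image = [[r * 6 + c for c in range(6)] for r in range(6)]
# Hypothetical 3x3 kernel that simply picks the centre pixel of each window.
centre = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

features = conv2d(image, centre, stride=1)        # 6x6 -> 4x4
pooled = max_pool2d(features, size=2, stride=2)   # 4x4 -> 2x2
```

With stride 1 and no padding, the 3x3 kernel shrinks each side by 2; the 2x2 pooling with stride 2 then halves each side.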


Objectives

1. Implementation of a prototype

2. Comparison of classifiers

a) Supervised method

b) Semi-supervised method

Dataset:

- ~170 images of vehicle registration documents

„XLG0H200NA0A10348“

Text Spotting

End-to-End Text Spotting Pipeline

Sliding window

Character Detector (2 classes)

Character Recognizer (36 classes)

Only one window per character

All windows

Non Maximum Suppression

All windows with characters

Region of Interest Extractor

Image depicting only VIN

X L G 0 H 2 0 N A 10 04 43 80
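The sliding-window and non-maximum-suppression stages of the pipeline can be sketched as follows. This is a hedged illustration only: the window size, step, IoU threshold of 0.3, score cut-off of 0.5, and the hard-coded detector scores are all assumptions, and the real pipeline would obtain the scores from the character detector.

```python
def sliding_windows(width, height, win_w, win_h, step):
    """Yield (x1, y1, x2, y2) boxes that scan the image left to right, top to bottom."""
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            yield (x, y, x + win_w, y + win_h)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return [boxes[i] for i in keep]


# Hypothetical 40x10 strip scanned with 10x10 windows and a step of 5 pixels.
windows = list(sliding_windows(width=40, height=10, win_w=10, win_h=10, step=5))
# Made-up "character" probabilities standing in for the detector output.
scores = [0.9, 0.6, 0.1, 0.5, 0.95, 0.4, 0.1]

# Detector stage: keep only windows classified as "character".
detected = [(w, s) for w, s in zip(windows, scores) if s >= 0.5]
# NMS stage: only one window per character survives.
kept = non_max_suppression([w for w, _ in detected], [s for _, s in detected])
kept_sorted = sorted(kept)  # left-to-right reading order for the recognizer
```

Sorting the surviving boxes by their x coordinate gives the reading order in which the recognizer would classify the characters.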

Small Dataset

What to do about that?

1. Data Generation

2. Data Augmentation

Data Augmentation

Data augmentation:

Datasets:

Original image labeled manually as „0“

2 classes 36 classes

Label: „0“

Label: „character“

Label: „no character“
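The slides do not say which augmentation transforms were used. As a sketch under that caveat, the example below multiplies one labeled crop into several training samples using small random shifts and pixel noise; the helper names `shift`, `add_noise`, and `augment`, the 3x3 toy crop, and all parameter values are assumptions.

```python
import random


def shift(image, dx, dy, fill=0):
    """Translate a 2D grayscale image by (dx, dy), padding with `fill`."""
    h, w = len(image), len(image[0])
    return [
        [
            image[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else fill
            for x in range(w)
        ]
        for y in range(h)
    ]


def add_noise(image, rng, amplitude=10):
    """Add uniform integer noise to every pixel, clipped to [0, 255]."""
    return [
        [min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in row]
        for row in image
    ]


def augment(image, label, n_copies, seed=0):
    """Create n_copies randomly shifted, noisy variants that share one label."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_copies):
        dx, dy = rng.randint(-1, 1), rng.randint(-1, 1)
        samples.append((add_noise(shift(image, dx, dy), rng), label))
    return samples


# Hypothetical 3x3 crop with a bright vertical stroke, labeled manually as "0".
crop = [[0, 255, 0], [0, 255, 0], [0, 255, 0]]
shifted_right = shift(crop, dx=1, dy=0)
augmented = augment(crop, label="0", n_copies=5)
```

Every variant inherits the manual label of the original crop, so one annotation yields many training samples.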

170 images of vehicle registration documents

Training set: 85 images

Testing set: 85 images

Training set → Data Augmentation → Training sets of classifiers and Testing sets of classifiers (Detector, Recognizer)

Detector: ~42000 images, 2 classes

Testing set → Testing sets of pipeline: 85 images

Datasets

Classifiers

1. Supervised Convolutional Neural Network

2. Semi-supervised Generative Adversarial Network

Generator, Discriminator

Input → Feature extraction → Classification


Yann LeCun, Director of Facebook AI Research, Prof. at NYU

“... (GANs) and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion.“

Ian J. Goodfellow @ Google Brain

Generative Adversarial Network

Generator (G) Discriminator (D)

Goal: Generate images that look realistic

Goal: Differentiate between fake and real images

Generative Adversarial Network

Generator (G)

Discriminator (D) Is D correct?

„D classified the generated image as 10% real“

„yes“

AB...89F

Real images

Real labeled images

Goodfellow et al. (2014), Generative Adversarial Networks

Mathematical formulation

Discriminator output for real images

Discriminator output for fake images

Discriminator calculates likelihood [0,1] for an image being real

Maximizing discriminator loss

Minimizing generator loss

Objective function
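For reference, the minimax objective from the cited Goodfellow et al. (2014) paper is reproduced here; the assumption is that the slide annotates this formula. $D(x)$ is the discriminator output for real images and $D(G(z))$ the discriminator output for fake images generated from noise $z$.

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator maximizes $V$ by pushing $D(x)$ towards 1 and $D(G(z))$ towards 0, while the generator minimizes the second term by pushing $D(G(z))$ towards 1.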

Training (alternating)
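A small numeric sketch of the two alternating goals, assuming the standard value function of Goodfellow et al. (2014), V = mean(log D(x)) + mean(log(1 - D(G(z)))). The function names and the probabilities are illustrative; no network is trained here, the code only evaluates the quantities each player would optimize in its turn.

```python
import math


def value_function(d_real, d_fake):
    """V(D, G) estimated from discriminator outputs on real and fake batches."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """The part of V the generator can influence; the generator minimizes it."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)


# Discriminator step: D wants V to be large.
# A confident D rates real images 90% real and generated images 10% real.
v_confident = value_function(d_real=[0.9], d_fake=[0.1])
# A fooled D cannot tell the two apart and outputs 50% for everything.
v_fooled = value_function(d_real=[0.5], d_fake=[0.5])

# Generator step: G wants its loss to be small, i.e. D(G(z)) close to 1.
loss_when_caught = generator_loss(d_fake=[0.1])
loss_when_fooling = generator_loss(d_fake=[0.9])
```

In alternating training, one step updates the discriminator to increase `value_function`, and the next step updates the generator to decrease `generator_loss`.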

Example of generated images

Training images:

Generated images during learning process:


Semi-supervised Learning

Supervised Learning

Unsupervised Learning

Semi-supervised Learning

• Makes use of unlabeled data

• Combines supervised and unsupervised learning

Semi-supervised GAN for Character Detection

Real labeled images

Real unlabeled images

Generator

Discriminator
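The slides do not spell out how the discriminator uses the three kinds of input. One common formulation, assumed here purely for illustration, gives the discriminator K + 1 outputs: the K real classes (for the detector: "no character" and "character") plus one extra "fake" class. The class indices and function names below are hypothetical.

```python
import math

NO_CHARACTER, CHARACTER, FAKE = 0, 1, 2  # assumed class indices (K = 2, plus "fake")


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def labeled_loss(logits, label):
    """Real labeled image: ordinary cross-entropy on the known class."""
    return -math.log(softmax(logits)[label])


def unlabeled_real_loss(logits):
    """Real unlabeled image: any real class is fine, only 'fake' is penalized."""
    return -math.log(1.0 - softmax(logits)[FAKE])


def generated_loss(logits):
    """Generated image: the discriminator should assign the 'fake' class."""
    return -math.log(softmax(logits)[FAKE])


# Undecided discriminator: all three classes get probability 1/3.
undecided = [0.0, 0.0, 0.0]
# Discriminator that is sure the image is real, without committing to a class.
sure_real = [3.0, 3.0, -3.0]

loss_labeled = labeled_loss(undecided, CHARACTER)
loss_unlabeled_undecided = unlabeled_real_loss(undecided)
loss_unlabeled_sure = unlabeled_real_loss(sure_real)
```

Under this assumption, unlabeled images contribute a training signal even though their class is unknown, which is how they can help when labeled data is scarce.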


Pretraining of DCNN

Manually generated images with CAPTCHA methods

Labels: „Character“, „No character“

[Figure: results by size of labeled training set (20 to 42000), y-axis from 60% to 100%. Series: DCNN, DCNN pretrained]

Supervised GAN

Generator

Discriminator

Real labeled images

[Figure: results by size of labeled training set (20 to 42000), y-axis from 60% to 100%. Series: DCNN, DCNN pretrained, Supervised GAN]

Semi-supervised GAN

Generator

Discriminator CCF

Real labeled images

Real unlabeled images

[Figure: results by size of labeled training set (20 to 42000), y-axis from 60% to 100%. Series: DCNN, DCNN pretrained, Supervised GAN, Semi-supervised GAN]

Character Recognizer (36 classes)

[Figure: two panels, Character Detector and Character Recognizer. Character Recognizer panel: x-axis from 36 to 8000, y-axis from 10% to 100%. Character Detector panel: x-axis from 20 to 1000, y-axis from 60% to 100%. Series: DCNN, DCNN pretrained, Supervised GAN]

End-to-End Text Spotting Pipeline

Sliding window

Accuracy = 99.94%

85 images

Google Cloud Vision API

Sliding window

85 images

∅ Levenshtein distance = 4.49

85 images of VINs

Our Approach vs. Google Cloud Vision API

∅ Levenshtein distance = 0.011

Levenshtein distance:

Classification Label

AYZ33 XYZ321 = 3
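The slide's example can be checked with a short dynamic-programming implementation of the Levenshtein distance. The function names are illustrative, and `mean_levenshtein` is an assumed helper showing how an average distance (∅) over a test set could be computed.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,               # delete ca
                    current[j - 1] + 1,            # insert cb
                    previous[j - 1] + (ca != cb),  # substitute, or match at no cost
                )
            )
        previous = current
    return previous[-1]


def mean_levenshtein(predictions, labels):
    """Average distance between predicted strings and their ground-truth labels."""
    pairs = list(zip(predictions, labels))
    return sum(levenshtein(p, t) for p, t in pairs) / len(pairs)


# Example from the slide: classification "AYZ33" against label "XYZ321".
distance = levenshtein("AYZ33", "XYZ321")
```

Here the three edits are: substitute A with X, substitute the second 3 with 2, and insert the trailing 1.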

Key Learnings

• Custom solutions can tremendously outperform off-the-shelf software in a specific use case

• Semi-supervised GANs can be successfully applied in use cases with little data

• With simple data augmentation techniques, even a small dataset might be enough

Bibliography

- Krizhevsky et al. (2012), „ImageNet Classification with Deep Convolutional Neural Networks“

- Girshick et al. (2014), „Region-Based Convolutional Networks for Accurate Object Detection and Segmentation“

- Girshick (2015), „Fast R-CNN“

- Ren et al. (2015), „Faster R-CNN“

- He et al. (2017), „Mask R-CNN“

- Goodfellow et al. (2014), „Generative Adversarial Networks“

Thank you!

Florian Wilhelm

Principal Data Scientist

inovex GmbH

Schanzenstraße 6-20, Kupferhütte 1.13, 51063 Köln

florian.wilhelm@inovex.de
