Performance Evaluation of GANs in a semi-supervised OCR Use … · 2019-03-13 · OCR with...


Performance Evaluation of GANs in a semi-supervised OCR Use Case

Florian Wilhelm · London, 2018-10-11

Special Interests

• Mathematical Modelling

• Recommendation Systems

• Data Science in Production

• Python Data Stack

• Maintainer of PyScaffold

Dr. Florian Wilhelm

Principal Data Scientist @ inovex

@FlorianWilhelm

FlorianWilhelm

florianwilhelm.info


Florian Tanten, Master Thesis @ inovex, October 2017 to May 2018

IT project house for digital transformation:
‣ Agile Development & Management
‣ Web · UI/UX · Replatforming · Microservices
‣ Mobile · Apps · Smart Devices · Robotics
‣ Big Data & Business Intelligence Platforms
‣ Data Science · Data Products · Search · Deep Learning
‣ Data Center Automation · DevOps · Cloud · Hosting
‣ Trainings & Coachings

Using technology to inspire our clients. And ourselves.

inovex offices in Karlsruhe · Cologne · Munich · Pforzheim · Hamburg · Stuttgart.

www.inovex.de


Agenda

1. Use Case

2. Text Spotting

3. Data and Pipeline

4. Generative Adversarial Networks

5. Semi-supervised Learning

6. Results

https://www.autocheck.com/vehiclehistory/autocheck/en/vinbasics

Vehicle Identification Number (VIN)

Unique identifier, like a fingerprint, of a vehicle

Figure labels for the VIN positions: serial number · country · security code · model year · assembly plant · details · flexible fuel vehicles · manufacturer
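The „security code“ label presumably refers to the 9th character, which in North American VINs is a check digit: each character is transliterated to a number, multiplied by a position weight, summed, and reduced modulo 11, with a remainder of 10 written as „X“. The sketch below assumes that rule; the function names are hypothetical. European VINs need not carry a check digit. The VINs in these slides have letters other than „X“ at position 9 (e.g. „K“ in WF0DXXGAKDEJ37385), so the rule does not apply to them.

```python
# Minimal sketch of the North American VIN check-digit rule.
# Assumption: a VIN has 17 characters from 0-9 and A-Z, excluding I, O, Q.

# Letter-to-value transliteration table (digits map to themselves).
TRANSLITERATION = {
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
# Position weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def char_value(c: str) -> int:
    """Map a single VIN character to its numeric value."""
    if c.isdigit():
        return int(c)
    if c in TRANSLITERATION:
        return TRANSLITERATION[c]
    raise ValueError(f"invalid VIN character: {c!r}")


def vin_check_digit(vin: str) -> str:
    """Compute the expected check character for a 17-character VIN."""
    vin = vin.upper()
    if len(vin) != 17:
        raise ValueError("a VIN has exactly 17 characters")
    total = sum(char_value(c) * w for c, w in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def has_valid_check_digit(vin: str) -> bool:
    """True if the 9th character matches the computed check character."""
    return vin_check_digit(vin) == vin.upper()[8]


print(vin_check_digit("1M8GDM9AXKP042788"))  # X
```

In a pipeline like the one in this talk, such a check could at most serve as an extra plausibility filter for VINs that actually carry a check digit.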


Use Case

Spotting the vehicle identification number (VIN) in images of vehicle registration documents

VIN: WF0DXXGAKDEJ37385

VIN decoder output (information about the car):
Manufacturer: BMW · Model: X3 · Year: 2013-03-21 · Engine power: 143 PS
Equipment: Xenon lights, ...


OCR Libraries

Commercial software · Open-source tools (e.g. PyOCR)


OCR with Tesseract

„VSSZZZGJZHR03G533“ → ???


Methodology in Text Spotting

Spotting = Detection + Recognition

Character detection & extraction:
• Sliding window: SVM, learning with HOG, CNN
• Computer vision tools: connected components, stroke width transform, edge detection
• Others: region proposal, hypotheses CNN pooling

Character recognition (character or word):
• CNN
• CNN + RNN
• SVM
• Nearest neighbor

Legend: high performers in current studies

CNN = Convolutional Neural Network · SVM = Support Vector Machine · HOG = Histogram of Oriented Gradients · RNN = Recurrent Neural Network · RL = Reinforcement Learning

Girshick et al. (2014), „Region-Based Convolutional Networks for Accurate Object Detection and Segmentation“

Convolutional Neural Network

Convolution with a 3x3 kernel and stride = 1
Max pooling with a 2x2 filter and stride = 2

https://en.wikipedia.org/wiki/Convolutional_neural_network; http://intellabs.github.io/ParallelJavaScript/
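The two operations on this slide can be sketched in NumPy for a single channel: a 3x3 convolution with stride 1 (implemented, as in most deep learning libraries, as cross-correlation without padding), followed by 2x2 max pooling with stride 2. The helper names and the mean-filter kernel are assumptions for illustration, not taken from the talk.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """2D cross-correlation without padding ("valid"), as used in CNN layers."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out


def max_pool(x: np.ndarray, size: int = 2, stride: int = 2) -> np.ndarray:
    """Max pooling: keep the largest value in each size x size window."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size, j * stride:j * stride + size].max()
    return out


image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0          # 3x3 mean filter as an example kernel
features = conv2d_valid(image, kernel)  # 6x6 input -> 4x4 feature map
pooled = max_pool(features)             # 4x4 feature map -> 2x2
print(features.shape, pooled.shape)     # (4, 4) (2, 2)
```

In a trained CNN the kernel weights are learned and many kernels run in parallel over many channels; the explicit loops here only make the sliding of kernel and pooling window visible.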


Objectives

1. Implementation of a text spotting prototype („XLG0H200NA0A10348“)

2. Comparison of classifiers:
   a) Supervised method
   b) Semi-supervised method

Dataset: ~170 images of vehicle registration documents


End-to-End Text Spotting Pipeline

1. Region of Interest Extractor: image depicting only the VIN
2. Sliding window: all windows
3. Character Detector (2 classes): all windows with characters
4. Non-Maximum Suppression: only one window per character
5. Character Recognizer (36 classes): the recognized characters of the VIN
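Steps 2 to 4 of the pipeline can be sketched as follows: enumerate fixed-size windows, keep those the character detector scores above a threshold, and apply greedy non-maximum suppression so that only one window per character remains, sorted left to right for the recognizer. The window size, step, IoU threshold of 0.3, and score threshold of 0.5 are assumptions for illustration (the talk does not state them), and the detector scores are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def sliding_windows(img_w: int, img_h: int, win_w: int, win_h: int, step: int) -> List[Box]:
    """All fixed-size windows over the image, top to bottom and left to right."""
    return [
        (x, y, win_w, win_h)
        for y in range(0, img_h - win_h + 1, step)
        for x in range(0, img_w - win_w + 1, step)
    ]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], scores: List[float], max_iou: float = 0.3) -> List[int]:
    """Greedy NMS: visit boxes by descending score, keep one unless it overlaps a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= max_iou for k in keep):
            keep.append(i)
    return keep


# Example: a 40x20 pixel VIN strip, 20x20 windows, step of 5 pixels -> 5 windows.
windows = sliding_windows(img_w=40, img_h=20, win_w=20, win_h=20, step=5)
# Hypothetical detector probabilities for "character", one per window.
scores = [0.2, 0.9, 0.8, 0.3, 0.95]
# Keep only windows the detector classifies as "character".
candidates = [i for i, s in enumerate(scores) if s > 0.5]
kept = non_max_suppression([windows[i] for i in candidates], [scores[i] for i in candidates])
# Sort surviving windows left to right so the recognizer reads characters in order.
result = sorted((windows[candidates[k]] for k in kept), key=lambda box: box[0])
print(result)  # [(5, 0, 20, 20), (20, 0, 20, 20)]
```

A step smaller than the window width is what produces several overlapping detections of the same character; NMS then collapses them to the most confident one.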


Small dataset: what to do about that?

1. Data Generation

2. Data Augmentation


Data Augmentation

An original image, labeled manually as „0“, is expanded by data augmentation into training images for both datasets:
• Character Detector (2 classes): labels „character“ and „no character“
• Character Recognizer (36 classes): label „0“
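The talk does not list the exact transformations. As a minimal sketch, assuming small random shifts, brightness changes, and Gaussian noise, one manually labeled crop can be expanded into many labeled training images; the function names, parameter ranges, and crop size are hypothetical.

```python
import numpy as np


def augment(crop: np.ndarray, rng: np.random.Generator, max_shift: int = 2) -> np.ndarray:
    """Return a randomly shifted, brightness-scaled, noisy copy of a grayscale crop in [0, 1]."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around the border; padding would be an alternative.
    shifted = np.roll(crop, shift=(int(dy), int(dx)), axis=(0, 1))
    scaled = shifted * rng.uniform(0.8, 1.2)
    noisy = scaled + rng.normal(0.0, 0.05, size=crop.shape)
    return np.clip(noisy, 0.0, 1.0)


def augment_labeled(crop: np.ndarray, label: str, n: int, seed: int = 0):
    """Create n augmented (image, label) pairs from one manually labeled crop."""
    rng = np.random.default_rng(seed)
    return [(augment(crop, rng), label) for _ in range(n)]


crop = np.zeros((32, 32))
crop[8:24, 12:20] = 1.0  # stand-in for a manually labeled crop
samples = augment_labeled(crop, label="0", n=50)
print(len(samples), samples[0][0].shape)  # 50 (32, 32)
```

Small shifts keep the label valid; for the detector, crops shifted far enough that no character is centered could plausibly serve as „no character“ examples, though the talk does not say so.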


Datasets

170 images of vehicle registration documents, split into:
• Training set (85 images), expanded by data augmentation into the training sets of the classifiers: Detector ~42000 images (2 classes), Recognizer ~8000 images (36 classes)
• Testing set (85 images), expanded by data augmentation into the testing sets of the classifiers: Detector ~42000 images (2 classes), Recognizer ~8000 images (36 classes); the 85 images also form the testing set of the pipeline
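The split is made at the level of whole documents, and the classifier datasets are derived from each half afterwards. A minimal sketch of such a document-level split, which ensures that no crops of one registration document end up in both training and testing data, might look like this; the document IDs, the seed, and the helper names are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def split_documents(doc_ids: List[str], test_fraction: float = 0.5,
                    seed: int = 42) -> Tuple[List[str], List[str]]:
    """Split whole documents, so every crop inherits its document's side of the split."""
    ids = sorted(doc_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


def crops_for(docs: List[str], crops_by_doc: Dict[str, List[str]]) -> List[str]:
    """Collect all (augmented) crops belonging to the given documents."""
    return [crop for doc in docs for crop in crops_by_doc[doc]]


doc_ids = [f"doc_{i:03d}" for i in range(170)]
train_docs, test_docs = split_documents(doc_ids)
# Hypothetical crop names; in practice these would be the augmented images.
crops_by_doc = {d: [f"{d}_crop{j}" for j in range(3)] for d in doc_ids}
train_crops = crops_for(train_docs, crops_by_doc)
test_crops = crops_for(test_docs, crops_by_doc)
print(len(train_docs), len(test_docs))  # 85 85
```

Splitting the crops instead of the documents would let near-identical augmented copies of one character appear on both sides and inflate the test accuracy.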


Classifiers

1. Supervised Convolutional Neural Network

2. Semi-supervised Generative Adversarial Network (generator and discriminator)

Figure labels: input · feature extraction · classification


Yann LeCun, Director of Facebook AI Research, Professor at NYU:

“... (GANs) and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion.”

Ian J. Goodfellow @ Google Brain


Generative Adversarial Network

Generator (G): goal is to generate images that seem realistic

Discriminator (D): goal is to differentiate between fake and real images


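Generative Adversarial Network

The Generator (G) produces images; the Discriminator (D) receives them together with real images (real labeled images).

„D classified the generated image as 10% real.“ Is D correct? „Yes.“

Discriminator output labels on the slide: A, B, ..., 8, 9, F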
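The feedback loop can be made concrete with binary cross-entropy. Assuming D outputs the probability that an image is real, the sketch below evaluates the discriminator loss and two generator losses for the situation quoted above, where D rates a generated image as 10% real. The score of 0.9 for a real image is hypothetical.

```python
import math


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy for D: real images should score 1, generated ones 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss_minimax(d_fake: float) -> float:
    """Generator term of the original minimax game: G minimizes log(1 - D(G(z)))."""
    return math.log(1.0 - d_fake)


def generator_loss_non_saturating(d_fake: float) -> float:
    """Common alternative: G minimizes -log D(G(z)), i.e. maximizes log D(G(z))."""
    return -math.log(d_fake)


d_fake = 0.10  # "D classified the generated image as 10% real"
d_real = 0.90  # hypothetical score for a real image
print(round(discriminator_loss(d_real, d_fake), 4))     # 0.2107
print(round(generator_loss_non_saturating(d_fake), 4))  # 2.3026
```

When D confidently rejects a generated image, the derivative of log(1 - D) with respect to D's output is only -1/(1 - 0.1) ≈ -1.11, whereas that of -log D is -1/0.1 = -10. This is why Goodfellow et al. (2014) suggest training G to maximize log D(G(z)) early in learning.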

Mathematical formulation (Goodfellow et al. (2014), „Generative Adversarial Networks“)

The discriminator calculates the likelihood, in [0, 1], that an image is real. The objective function combines the discriminator output for real images with the discriminator output for fake images:
• the discriminator maximizes the objective
• the generator minimizes it

Training alternates between the two.
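The objective function from Goodfellow et al. (2014), shown as an image on the original slide, reads as follows. The first expectation contains the discriminator output for real images x; the second contains the discriminator output for fake images G(z) generated from noise z:

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

In alternating training, D takes gradient ascent steps on V while G is held fixed, then G takes a gradient descent step on V while D is held fixed.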


Example of generated images

Figure: training images and images generated during the learning process


Semi-supervised Learning

Figure panels: Supervised Learning · Unsupervised Learning · Semi-supervised Learning

• Makes use of unlabeled data

• Combines supervised and unsupervised learning


Semi-supervised GAN for Character Detection

Figure: Generator and Discriminator, with real labeled images and real unlabeled images as additional inputs to the Discriminator
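The slides do not spell out the loss of the semi-supervised discriminator. One common formulation, the K+1-class discriminator of Salimans et al. (2016), „Improved Techniques for Training GANs“ (not cited in the talk), fits the picture. The discriminator predicts K real classes (here „character“ and „no character“) plus one extra „fake“ class. If the fake logit is fixed to 0, the probability of „real“ becomes D(x) = Z(x) / (Z(x) + 1) with Z(x) = Σ_k exp(l_k(x)). The sketch below computes the three loss terms for one image each; the function names and logits are hypothetical.

```python
import math
from typing import List


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def supervised_loss(logits: List[float], label: int) -> float:
    """Cross-entropy over the K real classes, for a labeled real image."""
    return logsumexp(logits) - logits[label]


def unsupervised_real_loss(logits: List[float]) -> float:
    """-log D(x) for an unlabeled real image, where D(x) = Z / (Z + 1), Z = sum(exp(logits))."""
    lse = logsumexp(logits)
    return softplus(lse) - lse


def unsupervised_fake_loss(logits: List[float]) -> float:
    """-log(1 - D(G(z))) = log(Z + 1) for a generated image."""
    return softplus(logsumexp(logits))


# Hypothetical logits for the classes "character" and "no character".
labeled_logits = [2.0, -1.0]   # a real image labeled "character" (class 0)
unlabeled_logits = [0.5, 0.3]  # a real image without a label
fake_logits = [-2.0, -1.5]     # an image produced by the generator

d_loss = (
    supervised_loss(labeled_logits, label=0)
    + unsupervised_real_loss(unlabeled_logits)
    + unsupervised_fake_loss(fake_logits)
)
print(round(d_loss, 4))
```

Unlabeled images contribute only through the real-versus-fake term, which is how they can help when labels are scarce. In practice each term is averaged over a mini-batch.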


Character Detector (2 classes)

Figure: accuracy (60% to 100%) vs. size of labeled training set (20 to 42000 images) for DCNN and DCNN pretrained. The DCNN is pretrained on manually generated „character“ / „no character“ images created with CAPTCHA methods.


Character Detector (2 classes)

Figure: accuracy (60% to 100%) vs. size of labeled training set (20 to 42000 images) for DCNN, DCNN pretrained, and Supervised GAN. Inset: in the supervised GAN, the discriminator receives generated images and real labeled images; its outputs are labeled C, C, F.


Character Detector (2 classes)

Figure: accuracy (60% to 100%) vs. size of labeled training set (20 to 42000 images) for DCNN, DCNN pretrained, Supervised GAN, and Semi-supervised GAN. Inset: in the semi-supervised GAN, the discriminator additionally receives real unlabeled images; its outputs are labeled C, C, F.


Character Recognizer (36 classes)

Figure, two panels: Character Detector (accuracy 60% to 100% vs. 20 to 42000 labeled training images) and Character Recognizer (accuracy 0% to 100% vs. 36 to 8000 labeled training images); legend: DCNN, DCNN pretrained, Supervised GAN.


End-to-End Text Spotting Pipeline

Region of Interest Extractor → Sliding window → Character Detector (2 classes) → Non-Maximum Suppression → Character Recognizer (36 classes)

Evaluated on 85 images: accuracy = 99.94%


Our Approach vs. Google Cloud Vision API

Test data: 85 images of VINs

• Our approach (Region of Interest Extractor → Sliding window → Character Detector (2 classes) → Non-Maximum Suppression → Character Recognizer (36 classes)): average Levenshtein distance = 0.011
• Google Cloud Vision API: average Levenshtein distance = 4.49

Levenshtein distance between classification and label, e.g. classification „AYZ33“ vs. label „XYZ321“: distance = 3
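The evaluation metric can be sketched with the standard dynamic-programming algorithm for the Levenshtein distance: the minimum number of insertions, deletions, and substitutions that turn one string into the other. Averaging it over all test images gives the per-approach figures on this slide. A minimal Python version that reproduces the example (the helper names are illustrative):

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def mean_levenshtein(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Average distance between predicted and true strings over a test set."""
    pairs = list(zip(predictions, labels))
    return sum(levenshtein(p, t) for p, t in pairs) / len(pairs)


print(levenshtein("AYZ33", "XYZ321"))  # 3
```

Here A→X and 3→2 are two substitutions and the trailing 1 is one insertion. Unlike plain accuracy, the distance also penalizes missing or extra characters, which matters when a generic OCR service merges or splits characters.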


Key Learnings

• Custom solutions can tremendously outperform off-the-shelf software in a specific use case

• Semi-supervised GANs can be successfully applied in use cases with little data

• With simple data augmentation techniques, a small amount of data might be enough


Bibliography

- Krizhevsky et al. (2012), „ImageNet Classification with Deep Convolutional Neural Networks“

- Girshick et al. (2014), „Region-Based Convolutional Networks for Accurate Object Detection and Segmentation“

- Girshick (2015), „Fast R-CNN“

- Ren et al. (2015), „Faster R-CNN“

- He et al. (2017), „Mask R-CNN“

- Goodfellow et al. (2014), „Generative Adversarial Networks“

Thank you!

Florian Wilhelm

Principal Data Scientist

inovex GmbH

Schanzenstraße 6-20, Kupferhütte 1.13, 51063 Köln

florian.wilhelm@inovex.de
