Performance Evaluation of GANs in a semi-supervised OCR Use Case Florian Wilhelm London, 2018-10-11



Special Interests

• Mathematical Modelling

• Recommendation Systems

• Data Science in Production

• Python Data Stack

• Maintainer of PyScaffold

Dr. Florian Wilhelm

Principal Data Scientist @ inovex

@FlorianWilhelm

FlorianWilhelm

florianwilhelm.info

Florian Tanten

Master Thesis @ inovex, October 2017 - May 2018

IT-project house for digital transformation:

‣ Agile Development & Management
‣ Web · UI/UX · Replatforming · Microservices
‣ Mobile · Apps · Smart Devices · Robotics
‣ Big Data & Business Intelligence Platforms
‣ Data Science · Data Products · Search · Deep Learning
‣ Data Center Automation · DevOps · Cloud · Hosting
‣ Trainings & Coachings

Using technology to inspire our clients. And ourselves.

inovex offices in Karlsruhe · Cologne · Munich · Pforzheim · Hamburg · Stuttgart.

www.inovex.de


Agenda

1. Use Case

2. Text Spotting

3. Data and Pipeline

4. Generative Adversarial Networks

5. Semi-supervised Learning

6. Results

Vehicle Identification Number (VIN)

Unique identifier of a vehicle, like a fingerprint

[Figure: a VIN annotated with the fields it encodes: country, manufacturer, details, flexible fuel vehicles, security code, model year, assembly plant, serial number]

Source: https://www.autocheck.com/vehiclehistory/autocheck/en/vinbasics
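As a rough illustration of the VIN fields listed above, the sketch below slices a 17-character VIN into positional fields. The position ranges follow the common North American layout and are an assumption here, not something the slide states (many European VINs, for instance, do not use position 9 as a check digit). The function name `split_vin` and the field names are hypothetical.

```python
def split_vin(vin: str) -> dict:
    """Split a VIN into positional fields (assumed North American layout)."""
    vin = vin.strip().upper()
    if len(vin) != 17 or not vin.isalnum():
        raise ValueError("a VIN has exactly 17 alphanumeric characters")
    return {
        "country": vin[0],           # position 1
        "manufacturer": vin[1:3],    # positions 2-3
        "details": vin[3:8],         # positions 4-8
        "security_code": vin[8],     # position 9 (check digit)
        "model_year": vin[9],        # position 10
        "assembly_plant": vin[10],   # position 11
        "serial_number": vin[11:],   # positions 12-17
    }


# The example VIN is the one shown on the use-case slide.
fields = split_vin("WF0DXXGAKDEJ37385")
for name, value in fields.items():
    print(f"{name}: {value}")
```

The fields are kept as raw substrings; turning them into a manufacturer name or a model year requires lookup tables that are outside the scope of this sketch.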

Use Case

Spotting the vehicle identification number (VIN) in images of vehicle registration documents

VIN: WF0DXXGAKDEJ37385

VIN-Decoder

Information about the car:

Manufacturer: BMW
Model: X3
Year: 2013-03-21
Engine power: 143 PS

Equipment:
- Xenon Lights
...

OCR Libraries

Commercial software

Open source tools (e.g. PyOCR)

OCR with Tesseract

[Figure: a VIN image processed with Tesseract; recovered labels: „VSSZZZGJZHR03G533“ and „???“]


Methodology in Text Spotting

Spotting = Detection + Recognition

Character detection & extraction

Sliding Window
- SVM
- Learning with HOG
- CNN

Computer Vision Tools
- Connected components
- Stroke width transform
- Edge detection

Others
- Region proposal
- Hypotheses CNN pooling

Character recognition (character or word)
- CNN
- CNN + RNN
- SVM
- Nearest Neighbor
- ...

Legend: high performers in current studies are highlighted on the slide.

CNN = Convolutional Neural Network
SVM = Support Vector Machine
HOG = Histogram of Oriented Gradients
RNN = Recurrent Neural Network
RL = Reinforcement Learning

Source: Girshick et al. (2014), „Region-Based Convolutional Networks for Accurate Object Detection and Segmentation“

Convolutional Neural Network

Convolution with 3x3 kernel and stride = 1

Max pooling with a 2x2 filter and stride = 2

Sources: https://en.wikipedia.org/wiki/Convolutional_neural_network; http://intellabs.github.io/ParallelJavaScript/
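The two operations named on this slide can be sketched in plain Python. This is a minimal illustration on a toy 6x6 image, not the network used in the talk; the function names `conv2d` and `max_pool2d` and the all-ones kernel are assumptions chosen so the numbers are easy to follow. As in most CNN frameworks, the "convolution" does not flip the kernel.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2-D convolution of a list-of-rows image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(
                image[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size block."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy input: a 6x6 image whose pixel at (row, col) has the value 6 * row + col.
image = [[float(6 * r + c) for c in range(6)] for r in range(6)]
kernel = [[1.0] * 3 for _ in range(3)]  # 3x3 kernel that sums its patch

features = conv2d(image, kernel, stride=1)        # 6x6 -> 4x4
pooled = max_pool2d(features, size=2, stride=2)   # 4x4 -> 2x2
print(len(features), len(features[0]), len(pooled), len(pooled[0]))
```

A 3x3 kernel with stride 1 shrinks each side by two pixels, and 2x2 pooling with stride 2 then halves each side, which is why the 6x6 input ends up as a 2x2 map.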


Objectives

Text Spotting

1. Implementation of a prototype (example output: „XLG0H200NA0A10348“)

2. Comparison of classifiers

a) Supervised method

b) Semi-supervised method

Dataset:

- ~170 images of vehicle registration documents

End-to-End Text Spotting Pipeline

1. Region of Interest Extractor: image depicting only the VIN

2. Sliding window: all windows

3. Character Detector (2 classes): all windows with characters

4. Non Maximum Suppression: only one window per character

5. Character Recognizer (36 classes): the recognized characters of the VIN (X L G 0 H 2 0 ...)
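The sliding-window, detection and non-maximum-suppression steps of this pipeline can be sketched as follows. This is a simplified one-dimensional version that only slides along the width of the VIN strip; the detector is replaced by a list of made-up scores, and the window width, step size and thresholds are assumptions, not values from the talk.

```python
def sliding_windows(image_width, window_width, step):
    """Return (x_start, x_end) intervals sliding from left to right."""
    return [
        (x, x + window_width)
        for x in range(0, image_width - window_width + 1, step)
    ]


def overlap_ratio(a, b):
    """Intersection over union of two 1-D intervals."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union


def non_maximum_suppression(windows, scores, max_overlap=0.3):
    """Greedily keep the best-scoring window and drop overlapping ones."""
    order = sorted(range(len(windows)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(overlap_ratio(windows[i], windows[k]) <= max_overlap for k in kept):
            kept.append(i)
    return sorted(windows[k] for k in kept)


# Step 2: all windows over a 60-pixel-wide strip.
windows = sliding_windows(image_width=60, window_width=20, step=5)

# Step 3: hypothetical detector scores ("probability of a character").
scores = [0.2, 0.9, 0.6, 0.1, 0.3, 0.8, 0.95, 0.4, 0.1]
detected = [(w, s) for w, s in zip(windows, scores) if s >= 0.5]

# Step 4: only one window per character survives.
characters = non_maximum_suppression(
    [w for w, _ in detected], [s for _, s in detected]
)
print(characters)
```

Each surviving window would then be cropped and passed to the 36-class recognizer.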

Small Dataset

What to do about that?

1. Data Generation

2. Data Augmentation

Data Augmentation

Original image labeled manually as „0“

Data augmentation produces two datasets:

- Character Detector (2 classes), labels: „character“, „no character“
- Character Recognizer (36 classes), label: „0“
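The slides do not say which transformations were used for augmentation. As a minimal sketch under that caveat, the example below creates several labeled variants of one manually labeled character image by applying small random shifts and pixel noise; both transformations, the function names and all parameter values are assumptions.

```python
import random


def shift(image, dx, dy, fill=0):
    """Translate a grayscale image (list of rows) by (dx, dy) pixels."""
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < height and 0 <= src_x < width:
                out[y][x] = image[src_y][src_x]
    return out


def add_noise(image, rng, sigma=10.0):
    """Add Gaussian noise and clamp the result to the range 0..255."""
    return [
        [min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
        for row in image
    ]


def augment(image, label, n, seed=0):
    """Return n randomly perturbed copies of image, all with the same label."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        dx, dy = rng.randint(-2, 2), rng.randint(-2, 2)
        samples.append((add_noise(shift(image, dx, dy), rng), label))
    return samples


# Toy 8x8 "0": a bright ring on a dark background.
original = [[0] * 8 for _ in range(8)]
for i in range(1, 7):
    original[1][i] = original[6][i] = 255
    original[i][1] = original[i][6] = 255

samples = augment(original, label="0", n=5)
print(len(samples), "augmented images with label", samples[0][1])
```

Because the label is copied unchanged, one manually labeled image yields many training samples at no extra labeling cost.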

Datasets

170 images of vehicle registration documents, split into:

- Training set: 85 images
- Testing set: 85 images

Training sets of classifiers (training set after data augmentation):

- Detector: ~42000 images, 2 classes
- Recognizer: ~8000 images, 36 classes

Testing sets of classifiers (testing set after data augmentation):

- Detector: ~42000 images, 2 classes
- Recognizer: ~8000 images, 36 classes

Testing set of pipeline: 85 images

Classifiers

1. Supervised Convolutional Neural Network: Input, Feature extraction, Classification

2. Semi-supervised Generative Adversarial Network: Generator, Discriminator


“... (GANs) and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion.”

Yann LeCun, Director of Facebook AI Research, Prof. at NYU

Ian J. Goodfellow @ Google Brain

Generative Adversarial Network

Generator (G)

Goal: generate images that look realistic

Discriminator (D)

Goal: differentiate between fake and real images

Generative Adversarial Network

[Figure: the Generator (G) produces images; the Discriminator (D) receives generated images together with real images (real labeled images in the classification variant, with outputs A, B, ..., 8, 9, F). Feedback loop „Is D correct?“ with the example „D classified the generated image as 10% real“ and the answer „yes“.]
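To make the feedback "D classified the generated image as 10% real" concrete, the sketch below turns discriminator outputs into loss values using the binary cross-entropy terms of Goodfellow et al. (2014). It is an illustration for a single real and a single generated image; the function names and the 0.9 score for the real image are assumptions.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Negative of log D(x) + log(1 - D(G(z))); D wants this to be small."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Minimax generator term log(1 - D(G(z))); G wants this to be small."""
    return math.log(1.0 - d_fake)


d_fake = 0.10  # from the slide: generated image rated as 10% real
d_real = 0.90  # assumed score for a real image

print("discriminator loss:", discriminator_loss(d_real, d_fake))
print("generator loss:", generator_loss(d_fake))

# If the generator improves and D rates its image as 90% real,
# the generator loss drops while the discriminator loss rises.
print("generator loss after improvement:", generator_loss(0.90))
print("discriminator loss after improvement:", discriminator_loss(d_real, 0.90))
```

In the example D is correct about the fake image, so its loss is low, while the generator receives a strong signal to improve.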

Mathematical formulation

Discriminator calculates likelihood [0,1] for an image being real

Objective function

- Discriminator output for real images
- Discriminator output for fake images

Training (alternating)

- Maximizing discriminator loss
- Minimizing generator loss

Source: Goodfellow et al. (2014), Generative Adversarial Networks
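For reference, the objective function in the cited paper by Goodfellow et al. (2014) reads as follows. The symbols are that paper's notation and are not taken from the slide: x is a real image drawn from the data distribution, z is the noise input of the generator.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

Here D(x) is the discriminator output for real images and D(G(z)) is its output for fake images. In alternating training, D takes a step that increases V, then G takes a step that decreases it.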

Example of generated images

[Figure: training images next to generated images during the learning process]



Semi-supervised Learning

Supervised Learning

Unsupervised Learning

Semi-supervised Learning

• Makes use of unlabeled data

• Combines supervised and unsupervised learning


Semi-supervised GAN for Character Detection

Real labeled images

Real unlabeled images

Generator

Discriminator
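One common way to let a discriminator learn from labeled, unlabeled and generated images at once is to give it K + 1 outputs: the K real classes plus one "fake" class, as proposed by Salimans et al. (2016). Whether the thesis used exactly this formulation is an assumption; the class indices, function names and toy logits below are hypothetical.

```python
import math

# Assumed class indices for the character detector: K = 2 real classes + fake.
CHARACTER, NO_CHARACTER, FAKE = 0, 1, 2


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def labeled_loss(logits, label):
    """Real labeled image: ordinary cross-entropy on the true class."""
    return -math.log(softmax(logits)[label])


def unlabeled_loss(logits):
    """Real unlabeled image: any real class is fine, only 'fake' is wrong."""
    return -math.log(1.0 - softmax(logits)[FAKE])


def generated_loss(logits):
    """Generated image: should be assigned to the 'fake' class."""
    return -math.log(softmax(logits)[FAKE])


def discriminator_loss(labeled, unlabeled, generated):
    """Supervised term plus the two unsupervised terms, each averaged."""
    supervised = sum(labeled_loss(l, y) for l, y in labeled) / len(labeled)
    unsupervised = (
        sum(unlabeled_loss(l) for l in unlabeled) / len(unlabeled)
        + sum(generated_loss(l) for l in generated) / len(generated)
    )
    return supervised + unsupervised


# Toy logits for one image of each kind.
good = discriminator_loss(
    labeled=[([5.0, 0.0, 0.0], CHARACTER)],
    unlabeled=[[5.0, 0.0, -5.0]],
    generated=[[0.0, 0.0, 5.0]],
)
bad = discriminator_loss(
    labeled=[([0.0, 5.0, 0.0], CHARACTER)],
    unlabeled=[[0.0, 0.0, 5.0]],
    generated=[[5.0, 0.0, 0.0]],
)
print("loss with sensible outputs:", good)
print("loss with confused outputs:", bad)
```

The unlabeled term is what makes the method semi-supervised: real images without labels still push the discriminator's features toward the real classes.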


Character Detector (2 classes)

Pretraining of DCNN: manually generated images with CAPTCHA methods, labeled „Character“ or „No character“

[Figure: accuracy (60% to 100%) against size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000) for DCNN and DCNN pretrained]

Character Detector (2 classes)

Supervised GAN: the Discriminator receives images from the Generator and real labeled images; its outputs are labeled C, C, F on the slide

[Figure: accuracy (60% to 100%) against size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000) for DCNN, DCNN pretrained and Supervised GAN]

Character Detector (2 classes)

Semi-supervised GAN: the Discriminator receives images from the Generator, real labeled images and real unlabeled images; its outputs are labeled C, C, F on the slide

[Figure: accuracy (60% to 100%) against size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000) for DCNN, DCNN pretrained, Supervised GAN and Semi-supervised GAN]

Character Recognizer (36 classes)

[Figure: accuracy (0% to 100%) against size of labeled training set (36, 72, 108, 200, 300, 400, 600, 800, 1000, 5000, 8000) for the Character Recognizer; legend: DCNN, DCNN pretrained, Supervised GAN. The Character Detector plot (accuracy 60% to 100%, training set sizes 20 to 42000) is shown alongside.]

End-to-End Text Spotting Pipeline

Test input: 85 images (1., 2., ..., 85.)

Pipeline: Region of Interest Extractor, Sliding window, Character Detector (2 classes), Non Maximum Suppression, Character Recognizer (36 classes)

Accuracy = 99.94%

Our Approach vs. Google Cloud Vision API

Test data: 85 images of VINs

- Our approach (Region of Interest Extractor, Sliding window, Character Detector (2 classes), Non Maximum Suppression, Character Recognizer (36 classes)): average Levenshtein distance = 0.011
- Google Cloud Vision API: average Levenshtein distance = 4.49

Levenshtein distance between classification and label, for example:

Classification „AYZ33“, label „XYZ321“: distance = 3
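The Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. A minimal sketch of the standard dynamic-programming computation, checked against the example on this slide, is given below; the helper `mean_levenshtein` is a hypothetical name for the averaged metric.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed row by row."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def mean_levenshtein(predictions, labels):
    """Average edit distance over paired predictions and labels."""
    pairs = list(zip(predictions, labels))
    return sum(levenshtein(p, l) for p, l in pairs) / len(pairs)


distance = levenshtein("AYZ33", "XYZ321")
print(distance)
```

For the slide's example the three edits are: substitute A with X, substitute the last 3 with 2, and insert 1. A perfect read of a VIN has distance 0, so an average close to 0 over the 85 test images means almost every VIN was read without error.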

Key Learnings

• Custom solutions can tremendously outperform off-the-shelf software in a specific use case

• Semi-supervised GANs can be successfully applied in use cases with little data

• With simple data augmentation techniques, even a small amount of data might be enough

Bibliography

- Krizhevsky et al. (2012), „ImageNet Classification with Deep Convolutional Neural Networks“

- Girshick et al. (2014), „Region-Based Convolutional Networks for Accurate Object Detection and Segmentation“

- Girshick (2015), „Fast R-CNN“

- Ren et al. (2015), „Faster R-CNN“

- He et al. (2017), „Mask R-CNN“

- Goodfellow et al. (2014), „Generative Adversarial Networks“


Thank you!

Florian Wilhelm

Principal Data Scientist

inovex GmbH

Schanzenstraße 6-20
Kupferhütte 1.13
51063 Köln

[email protected]