Capture, sort and identify all types of documents and forms, with IRISCapture Pro


Jean-Pierre Ksenicz, IRISCapture Pro Product Manager – R&D

Brigitte Lehmann, IRISCapture Pro Development Team Manager – R&D

Introduction

Applications
• Document Archiving and Retrieval
• Automatic Document Reading (ADR)
• Digital Mailroom

Techniques
• Separation
• Identification / Classification

A Little Story…
• From structured forms to unstructured documents

The Sorting Tree
• Combination of techniques

Identification, why?

Document Archiving & Retrieval

Capture a document

Identify the document type

Extract indexes
• manually or automatically (ADR)

Automatic Document Reading

Capture a document

Identify the document

Automatically extract the data (“indexes” or “fields”)

Export

The document type must be identified in order to apply the appropriate data extraction:
• by OCR, ICR, OMR (tick marks) and barcodes, for structured documents (forms with fixed regions of interest)
• by full text OCR with contextual analysis, for semi-structured documents (invoices, contracts,…) or unstructured documents (letters, reports,…)
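The rule above, that the identified document type selects the extraction method, can be sketched as a small lookup. This is only an illustration: the category keys, the example document types and the `extraction_method` helper are hypothetical names, not part of IRISCapture Pro.

```python
# Hypothetical mapping from document structure to extraction approach,
# following the two cases listed on the slide.
EXTRACTION_BY_STRUCTURE = {
    "structured": "zonal OCR/ICR/OMR/barcode reading on fixed regions of interest",
    "semi-structured": "full text OCR with contextual analysis",
    "unstructured": "full text OCR with contextual analysis",
}

# Example document types from the slide; the grouping keys are an assumption.
STRUCTURE_BY_DOC_TYPE = {
    "form": "structured",
    "invoice": "semi-structured",
    "contract": "semi-structured",
    "letter": "unstructured",
    "report": "unstructured",
}


def extraction_method(doc_type: str) -> str:
    """Return the extraction approach for an identified document type."""
    try:
        structure = STRUCTURE_BY_DOC_TYPE[doc_type]
    except KeyError:
        # Without an identified type, no extraction method can be chosen.
        raise ValueError(f"unidentified document type: {doc_type!r}") from None
    return EXTRACTION_BY_STRUCTURE[structure]


print(extraction_method("invoice"))
```

An unidentified type raises an error rather than falling back to a default, which mirrors the point of the slide: extraction cannot start before identification.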

Digital Mailroom

Capture a document

Identify the document type

Extract the routing data
• Addressee, department,…
• Manually or automatically

Techniques

Document Separation

Detection of a Separation Sheet

• A sheet with a patch code or a barcode can be used as a trigger for the detection of a new document
• The barcode usually contains additional information, like the document type or document indexes
• A white page is often used as a separation sheet
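Separation by separator sheets can be sketched as a single pass over the scanned pages. The `Page` record and its fields are hypothetical; in practice the code value and the blank-page flag would come from the capture engine.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Page:
    name: str
    separator_code: Optional[str] = None  # hypothetical: patch code or barcode value, if any
    is_blank: bool = False                # hypothetical: result of a white-page check


def is_separator(page: Page) -> bool:
    """A page is a separation sheet if it carries a code or is a white page."""
    return page.separator_code is not None or page.is_blank


def split_documents(pages: Iterable[Page]) -> List[List[Page]]:
    """Start a new document at each separation sheet; the sheet itself is dropped."""
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_separator(page):
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


batch = [
    Page("sep-A", separator_code="INVOICE"),
    Page("p1"),
    Page("p2"),
    Page("blank", is_blank=True),
    Page("p3"),
]
print([[p.name for p in doc] for doc in split_documents(batch)])
```

The sketch discards the separator sheet. A real project would usually keep the barcode content, since the slide notes that it often carries the document type or indexes.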

First Page Identification

• By several techniques, which can be mixed:
– Fit with anchor points, text in a zone, titles, fingerprint, barcode, classification results, … (see later slides)

Document Identification

Descriptive criteria are defined to identify the document, such as:
• Anchor points
• Titles, text in a region, keywords
• Barcode
• Fuzzy search, regular expressions…

A “fingerprint” of each page to be identified is stored in a library

Document Classification

Document classification without pre-definition (self-training)

IRISClassify

A Little Story…

From Structured Forms to Unstructured Documents

Fixed Layouts (1)
• Form identification with descriptive criteria
– A unique value is printed to identify precisely each document type
– High speed (about 20 images/sec, independent of the number of document types)

Fixed Layouts (2)
• Form identification by fitting
– Graphical shapes: lines, frames, logos
– Text
– Very high speed (about 30 to 50 images/sec)

Semi-structured Documents (1)
• Identification by titles
– Speed: about 3 to 5 images/sec, nearly constant

Semi-structured Documents (2)
• Identification by keywords
– Keywords may be found anywhere in the document
– Fuzzy search algorithm
– Regular expressions
– Speed: about 1 to 3 images/sec (depends on the size of the OCR zone)
– Requires expertise to identify the mix of documents, and time to define the project
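Keyword identification with fuzzy search and regular expressions can be illustrated with the Python standard library. This is a minimal sketch, not the product's algorithm: the rules, the 0.8 similarity threshold and the sample OCR text are all assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def fuzzy_contains(text: str, keyword: str, threshold: float = 0.8) -> bool:
    """True if some word in text is similar enough to keyword (tolerates OCR errors)."""
    keyword = keyword.lower()
    return any(
        SequenceMatcher(None, word, keyword).ratio() >= threshold
        for word in re.findall(r"\w+", text.lower())
    )


# Hypothetical rules: each document type lists fuzzy keywords and one regex.
RULES = {
    "invoice": {"keywords": ["invoice"], "pattern": r"\bVAT\b"},
    "contract": {"keywords": ["contract", "agreement"], "pattern": r"\bclause\s+\d+"},
}


def identify(text: str) -> Optional[str]:
    """Return the first document type whose keyword and pattern both match."""
    for doc_type, rule in RULES.items():
        has_keyword = any(fuzzy_contains(text, k) for k in rule["keywords"])
        has_pattern = re.search(rule["pattern"], text, flags=re.IGNORECASE) is not None
        if has_keyword and has_pattern:
            return doc_type
    return None


# "INV0ICE" contains a zero instead of the letter O, a typical OCR confusion.
ocr_text = "INV0ICE no. 2041  Total incl. VAT: 121.00"
print(identify(ocr_text))
```

Every rule has to be written by hand for each document type, which is the expertise and project-definition cost the slide mentions.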

IRISFingerPrint (1)

Identification based only on graphical features:
• Size
• Layout
• Logo
• Lines
• Marks
• ...

[Figure: example fingerprint values (… 26 32 23 41 76 59 92 …) and (… 1 2 -2 4 2 3 -2 …), corresponding to a match of 94.36%]

IRISFingerPrint (2)
– No manual definition: the fingerprints are trained
– Speed: about 3 to 5 images/sec, only loosely dependent on the number of document types
– The documents must have significant layout differences
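The idea of comparing a page against a library of trained fingerprints can be sketched as a nearest-match lookup over numeric feature vectors. This is not IRISFingerPrint's actual algorithm, which the slides do not describe: the similarity measure, the 0..100 feature scale, the 0.9 threshold and the library values are all assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple

MAX_FEATURE_VALUE = 100  # assumption: every feature is scaled to 0..100


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1]: one minus the normalised mean absolute difference."""
    if not a or len(a) != len(b):
        raise ValueError("fingerprints must be non-empty and of equal length")
    mean_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 - mean_diff / MAX_FEATURE_VALUE


def best_match(
    page: Sequence[float],
    library: Dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> Tuple[Optional[str], float]:
    """Return (document type, score) for the closest fingerprint, or (None, score)."""
    best_type, best_score = None, 0.0
    for doc_type, reference in library.items():
        score = similarity(page, reference)
        if score > best_score:
            best_type, best_score = doc_type, score
    if best_score < threshold:
        return None, best_score  # too different from every trained fingerprint
    return best_type, best_score


# Invented library values, for illustration only.
library = {
    "form_a": [25, 30, 25, 37, 74, 56, 94],
    "form_b": [80, 10, 60, 5, 20, 90, 15],
}
page = [26, 32, 23, 41, 76, 59, 92]
print(best_match(page, library))
```

Because every library entry is compared in turn, the cost grows with the number of document types in this naive version; an engine whose speed is only loosely linked to that number presumably uses a smarter lookup.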

IRISClassify (1)
• For structured and unstructured documents
– Letters, contracts, forms,… may belong to the same class
– Training of predefined classes, no definition required
– Speed: about 0.25 to 0.5 image/sec

IRISClassify (2)
– Other documents from the same class:

Summary

• Configuration: Pentium IV, 2.66 GHz, 2 GB RAM

Method | Speed (images/s) | Pros | Cons | Doc Type
Unique criteria, unique OCR value, barcode, fit | 20 to 50 | Highest speed, high volume, highest accuracy | Manual definition | Structured or semi-structured
Identification by title | 3 to 5 | Speed | Manual definition | Structured or semi-structured
IRISFingerPrint | 3 to 5 | Training, no definition | Only graphical elements | Structured, with sufficient graphical
IRISClassify | 0.25 to 0.5 | Training, no definition, wide mix of docs | Time for full text OCR and statistics |
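To put the speeds from the summary in perspective, the short calculation below converts them into processing time for a batch. The speed ranges are copied from the summary; the batch size of 10,000 images is an arbitrary assumption.

```python
from typing import Tuple

# Speed ranges in images per second, copied from the summary.
SPEEDS = {
    "Unique criteria / barcode / fit": (20, 50),
    "Identification by title": (3, 5),
    "IRISFingerPrint": (3, 5),
    "IRISClassify": (0.25, 0.5),
}


def processing_minutes(images: int, speed_range: Tuple[float, float]) -> Tuple[float, float]:
    """Return (best case, worst case) processing time in minutes."""
    slowest, fastest = speed_range
    return images / fastest / 60, images / slowest / 60


BATCH = 10_000  # hypothetical batch size
for method, speeds in SPEEDS.items():
    best, worst = processing_minutes(BATCH, speeds)
    print(f"{method}: {best:.1f} to {worst:.1f} min")
```

For this batch the fastest methods finish in under ten minutes, while full classification takes between roughly five and a half and eleven hours on the stated configuration.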

The Sorting Tree

Sorting Tree: The Mix of Both Worlds

Identification & Classification working together
• All classical criteria may be used
• Use of IRISFingerPrint and IRISClassify

Use of any third-party module:
• For special identification based on:
– cursive handwriting
– color scheme
– …

Sorting Tree: Get the Optimum

Choose the best technology
• for each document class of a project
• to optimize the balance speed/accuracy

Combine any technology
• With logical AND-OR-NOT operators
• Unique identifier, fit, title, keywords,…
• IRISFingerPrint
• IRISClassify

Include third-party engines
• Open for specific identification needs
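Combining criteria with logical AND-OR-NOT operators can be sketched with small predicate combinators. The page record and property names (`title`, `class`, `fit_booklet`) are hypothetical; any identification result, including one from a third-party engine, could be exposed as such a property.

```python
from typing import Callable, Dict

Page = Dict[str, object]            # hypothetical page record: property name -> value
Criterion = Callable[[Page], bool]


def has(name: str, value: object = True) -> Criterion:
    """Criterion that holds when the page property equals the expected value."""
    return lambda page: page.get(name) == value


def all_of(*criteria: Criterion) -> Criterion:
    """Logical AND of several criteria."""
    return lambda page: all(c(page) for c in criteria)


def any_of(*criteria: Criterion) -> Criterion:
    """Logical OR of several criteria."""
    return lambda page: any(c(page) for c in criteria)


def not_(criterion: Criterion) -> Criterion:
    """Logical NOT of a criterion."""
    return lambda page: not criterion(page)


# (title says "Invoice" OR the classifier says "invoice") AND NOT a booklet page
is_invoice = all_of(
    any_of(has("title", "Invoice"), has("class", "invoice")),
    not_(has("fit_booklet")),
)

print(is_invoice({"title": "Invoice"}))
print(is_invoice({"class": "invoice", "fit_booklet": True}))
```

Since `all` and `any` short-circuit, listing cheap criteria first means slower ones are only evaluated when needed, which fits the speed/accuracy balance described above.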

Example of a Sorting Tree

Image Fit?

Booklet Header

Booklet pages

Unique ID?

Unknown for review

Appendix…

Classify

Class 1

Class 2

Unknown for review
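A sorting tree of this kind can be sketched as nodes that each carry a test and optional child nodes. The tree below only borrows node labels from the slide; the exact branching of the original diagram is not recoverable from the text, so the structure, the `Node` class and the page properties are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Page = Dict[str, object]  # hypothetical page record: property name -> value
UNKNOWN = "Unknown for review"


@dataclass
class Node:
    label: str
    test: Callable[[Page], bool]
    children: List["Node"] = field(default_factory=list)


def sort_page(page: Page, nodes: List[Node]) -> str:
    """Follow the first matching node at each level; unmatched pages go to review."""
    for node in nodes:
        if node.test(page):
            return sort_page(page, node.children) if node.children else node.label
    return UNKNOWN


# Fast image fit is tried first; slower classification is the last resort.
tree = [
    Node("Image Fit", lambda p: p.get("fit") is not None, [
        Node("Booklet Header", lambda p: p.get("fit") == "header"),
        Node("Booklet pages", lambda p: p.get("fit") == "page"),
    ]),
    Node("Classify", lambda p: True, [
        Node("Class 1", lambda p: p.get("class") == 1),
        Node("Class 2", lambda p: p.get("class") == 2),
    ]),
]

print(sort_page({"fit": "header"}, tree))
print(sort_page({"class": 2}, tree))
print(sort_page({}, tree))
```

Pages that no leaf accepts end up as "Unknown for review", matching the review outcome shown in the example.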

Example of a Sorting Tree: Get the Optimum (1)

Image Fit

Doc VAT625

Text length

App VAT625

Image Fit 1

Booklet

Unique ID

Doc 30501

Doc 30502

Doc 30503

Image Fit 2

Doc RABO 4”

Unique Barcode

Sep sheet 1

Sep sheet 2

Classify

Invoice

Cash Transfer

Small Size

Ticket 1

Ticket 2

Example of a Sorting Tree: Get the Optimum (2)

<Node Name="Rabo4Inch" Base="FormatA4">
  <PageType Value="Rabo4Inch"/>
  <DocType Value="Default"/>
  <Property Name="FitRabo4Inch" UseLayout="FitRabo4Inch"/>
  <Identification>
    <MatchProperty Name="FitRabo4Inch" Value="True"/>
  </Identification>
</Node>
<Node Name="Booklet" Base="FormatA4">
  <Property Name="FitBooklet" UseLayout="FitBooklet"/>
  <Identification>
    <MatchProperty Name="FitBooklet" Value="True"/>
  </Identification>
</Node>
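To show how such a definition might be read, the sketch below parses the two nodes with Python's standard XML parser and checks a page's properties against each `MatchProperty`. The matching semantics (all listed properties must equal the given value, first matching node wins) and the `<Tree>` wrapper element are assumptions; the slides do not specify how IRISCapture Pro evaluates the file.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

CONFIG = """
<Node Name="Rabo4Inch" Base="FormatA4">
  <PageType Value="Rabo4Inch"/>
  <DocType Value="Default"/>
  <Property Name="FitRabo4Inch" UseLayout="FitRabo4Inch"/>
  <Identification>
    <MatchProperty Name="FitRabo4Inch" Value="True"/>
  </Identification>
</Node>
<Node Name="Booklet" Base="FormatA4">
  <Property Name="FitBooklet" UseLayout="FitBooklet"/>
  <Identification>
    <MatchProperty Name="FitBooklet" Value="True"/>
  </Identification>
</Node>
"""


def load_rules(xml_fragment: str) -> Dict[str, Dict[str, str]]:
    """Map each node name to its {property name: required value} pairs."""
    # The fragment has two top-level nodes, so wrap it in a single root element.
    root = ET.fromstring(f"<Tree>{xml_fragment}</Tree>")
    return {
        node.get("Name"): {
            match.get("Name"): match.get("Value")
            for match in node.findall("Identification/MatchProperty")
        }
        for node in root.iter("Node")
    }


def identify(properties: Dict[str, str], rules: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return the first node whose required property values all match the page."""
    for name, required in rules.items():
        if required and all(properties.get(k) == v for k, v in required.items()):
            return name
    return None


rules = load_rules(CONFIG)
print(identify({"FitRabo4Inch": "False", "FitBooklet": "True"}, rules))
```

Here the property values are plain strings, as they appear in the XML attributes; the fit results themselves would come from the layout-fitting step named in `UseLayout`.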

Review Module

Manual Identification

• For unidentified documents

Document Reordering

• Split, merge, move documents

Image Review

• Rotation


Conclusion

Identification and Classification

• Mix of techniques in a sorting tree: it makes sense!

Sorting Tree: Get the Optimum

• Get the optimum
• The sorting tree optimizes the speed-accuracy balance for each document class in a project

Questions & Answers

A step further

• Please visit our booth for a demo
• White Paper on IRISFingerPrint
• IRISClassify presentation
• IRIS Training Sessions
• www.irislink.com

Thank You!
