Input: Document image
Noise cleaning and Binarization
Skew Correction
Text & Graphics Segmentation
Line & Word Segmentation
Parsing (CC Analysis)
Feature Extraction
Classification Converter & Post-processing
Document Reconstruction
Output: Text/Unicode
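The pipeline above depends on binarization followed by connected component (CC) analysis. A minimal sketch of those two steps, assuming a fixed global threshold and 8-connectivity (the slide states neither; `binarize` and `connected_components` are hypothetical helper names, not the authors' code):

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map a grayscale image (list of rows, values 0-255) to 1 = ink, 0 = background.

    Assumption: dark pixels are ink and a single global threshold is enough.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_components(binary):
    """Label 8-connected ink regions with a breadth-first flood fill.

    Returns (label image, number of components); label 0 means background.
    """
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or labels[r][c] != 0:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # Visit the 8 neighbours (the centre pixel is already labelled).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

Each labelled component would then be passed on to feature extraction; a real system would likely use an adaptive threshold rather than the fixed value assumed here.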
Input
CC Analysis
Convert to symbols
Reorder symbols
Render the word image
CC Analysis
Labels the CCs
33 51 122 52 113 107
DP-based matching to align R and W
MAP FILE
RULES FILE
R
CC Analysis Label 37 55 107 57 37 58 43 63 14
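The figure names a DP-based matching step that aligns R and W. A minimal sketch of such an alignment, assuming both are already reduced to sequences of comparable CC descriptors, with a 0/1 substitution cost and a unit gap cost (the slide does not give the actual cost function; `align` is a hypothetical name):

```python
def align(r, w, gap=1):
    """Align sequences r and w by dynamic programming (edit-distance style).

    Returns (total cost, list of (i, j) index pairs); i or j is None for a gap.
    """
    n, m = len(r), len(w)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if r[i - 1] == w[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match / substitute
                             cost[i - 1][j] + gap,      # r[i-1] unmatched
                             cost[i][j - 1] + gap)      # w[j-1] unmatched

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        sub = 0 if i > 0 and j > 0 and r[i - 1] == w[j - 1] else 1
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs
```

Once aligned, labels known for the components of R could be transferred to the matched components of W; in practice the substitution cost would compare image features rather than test for equality.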
Feature Extraction
IIIT Hyderabad
Feature  Dim  Classifiers
              MLP    KNN    ANN    SVM-1  SVM-2  NB     DTC
C.M      20   12.04  4.16   5.86   10.04  9.19   11.93  5.57
DFT      16   8.35   8.96   9.35   7.88   7.86   15.33  13.85
DCT      16   5.43   5.11   5.92   5.25   5.24   8.96   7.89
ZM       47   1.30   1.98   2.34   1.24   1.23   3.99   8.04
PCA      350  1.04   1.14   2.39   0.37   0.35   4.83   5.97
LDA      350  0.55   0.52   1.04   0.35   0.34   3.20   4.77
RP       350  0.33   0.50   0.74   0.34   0.34   3.12   8.04
DT       400  1.94   1.27   1.98   1.84   1.84   4.28   2.20
IMG      400  0.32   0.56   0.78   0.32   0.31   1.22   2.45
Error rate using CNN: 0.93
Error rates on Malayalam dataset.
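The table lists IMG at 400 dimensions and RP at 350. A minimal sketch of how a random projection (RP) feature could be derived from a flattened 20x20 image, assuming a Gaussian random matrix (the slides do not specify the projection used; `random_projection_matrix` and `project` are hypothetical names):

```python
import random


def random_projection_matrix(in_dim=400, out_dim=350, seed=0):
    """Draw an out_dim x in_dim matrix of i.i.d. Gaussian entries.

    Entries are scaled by 1/sqrt(out_dim) so that projected vectors keep
    roughly the same squared length as the inputs.
    """
    rng = random.Random(seed)
    scale = 1.0 / out_dim ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, image):
    """Flatten a 2-D image (list of rows) and multiply it by the matrix."""
    vec = [float(px) for row in image for px in row]
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


# Usage: a 20x20 image becomes a 350-dimensional feature vector.
matrix = random_projection_matrix()
image = [[(r + c) % 2 for c in range(20)] for r in range(20)]
feature = project(matrix, image)
```

The same matrix must be reused for every training and test sample, which is why the sketch fixes the seed.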
Error rates of SVM-2 classifiers with varying number of features.
Accuracy of different classifiers vs. number of classes (feature used: LDA).
Images from dataset
Feature  D-1    D-2    D-3    Blobs  Cuts   Shear
C.M      9.45   9.46   10.97  16.28  12.33  30.07
DFT      7.89   7.93   7.98   26.70  8.73   18.90
DCT      5.71   5.72   6.07   19.80  7.93   16.46
ZM       1.96   1.98   2.10   8.41   4.35   17.75
PCA      0.39   0.39   0.40   2.17   0.64   8.59
LDA      0.30   0.31   0.32   2.01   0.61   7.32
RP       0.48   0.67   1.04   3.61   0.71   6.75
DT       1.75   1.98   2.21   10.33  5.07   12.34
IMG      0.32   0.33   0.33   2.78   0.66   6.84
Accuracies of the SVM-2 classifier when trained on 4 fonts and tested on the 5th font. S1: dataset without degradation; S2: dataset with degradation.
Features  Telugu (350 classes)  English (72 classes)
          20x20     40x40       20x20     40x40
C.M       20.78     12.32       7.25      6.48
DFT       8.45      5.48        2.04      1.12
DCT       9.67      2.71        2.14      1.04
ZM        15.71     6.71        5.37      3.31
PCA       4.62      2.93        0.86      0.46
LDA       2.56      1.67        0.29      0.23
RP        2.49      1.66        0.28      0.23
DT        3.48      3.17        0.98      0.87
IMG       3.18      2.84        0.28      0.23
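[Figure: triangular graph of class-pair nodes for 5 classes, with input x at the top. Rows from the top: (1,5); (2,5) (1,4); (3,5) (2,4) (1,3); (4,5) (3,4) (2,3) (1,2). Leaf labels as extracted: 5 4 1 2 3. Example shown: sample x from class 4.]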
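The pair layout in the figure above has the shape of a decision directed acyclic graph (DDAG) over pairwise classifiers: each node compares two classes and discards one. That reading is an interpretation, since the slide does not name the structure. A minimal sketch under that assumption, where `prefers_first` is a hypothetical stand-in for a trained binary classifier:

```python
def ddag_predict(x, classes, prefers_first):
    """Classify x by walking a decision DAG of pairwise classifiers.

    classes: ordered list of class labels, e.g. [1, 2, 3, 4, 5].
    prefers_first(x, a, b): hypothetical binary classifier; returns True
    when x looks more like class a than class b.
    """
    lo, hi = 0, len(classes) - 1
    while lo < hi:
        # Node (classes[lo], classes[hi]): one of the two classes is rejected.
        if prefers_first(x, classes[lo], classes[hi]):
            hi -= 1  # reject classes[hi]
        else:
            lo += 1  # reject classes[lo]
    return classes[lo]


# Toy usage: a scalar "sample" and a nearest-label rule as the pairwise test.
def nearest(x, a, b):
    return abs(x - a) <= abs(x - b)


print(ddag_predict(4.1, [1, 2, 3, 4, 5], nearest))  # prints 4
```

With N classes the walk evaluates N - 1 pairwise classifiers per sample, even though N(N - 1)/2 of them exist in the graph.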
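\[ \text{Character Edit Distance (CER)} = \frac{\text{CharEditDistance}(C, O)}{|C|} \]
\[ \text{Symbol Error Rate} = \frac{\text{No. of Misclassified and Recognizable Symbols}}{\text{Total No. of Recognizable Symbols}} \]
\[ \text{Unicode Error Rate} = \frac{\text{No. of Misclassified Unicode}}{\text{Total No. of Unicode}} \]
\[ \text{Word level Accuracy} = \frac{\text{No. of Correct Words}}{\text{Total No. of Words}} \]
\[ \text{Word level Accuracy} = \frac{\text{No. of Correct or Recognizable Words}}{\text{Total No. of Recognizable Words}} \]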
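The character-level metric, the edit distance between C and O divided by |C|, can be sketched as follows. The sketch assumes that C is the correct (ground-truth) text, O is the OCR output, and the edit distance is the standard Levenshtein distance; the slide defines none of these explicitly, and the function names are hypothetical. The word-level function covers only the simpler ratio of correct words to total words and assumes the two word lists are already aligned one to one.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def character_error_rate(correct, output):
    """CharEditDistance(C, O) / |C|; assumes a non-empty ground truth."""
    return edit_distance(correct, output) / len(correct)


def word_accuracy(correct_words, output_words):
    """No. of correct words / total no. of words, for aligned word lists."""
    hits = sum(c == o for c, o in zip(correct_words, output_words))
    return hits / len(correct_words)
```

Because insertions count as errors, the character error rate can exceed 1 when the output is much longer than the ground truth.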
Sarada
Sanjayan
0.85%
Thiruttu
0.85%