31
Document Image Segmentation using Fuzzy Classifier and the Dual-Tree DWT Jamal Saeedi, Reza Safabakhsh from Amirkabir University of Technology and Saeed Mozaffari from Semnan University Presented By: Jamal Saeedi

Document image segmentation using fuzzy classifier and the dual tree dwt

Embed Size (px)

Citation preview

Document Image Segmentation using Fuzzy Classifier and the

Dual-Tree DWTJamal Saeedi, Reza Safabakhsh

from Amirkabir University of Technology

and Saeed Mozaffarifrom Semnan University

Presented By: Jamal Saeedi

Content

What is Document Image Segmentation (DIS)?

DIS Important Applications

Wavelet-based DIS Methods

The New DIS algorithm using Fuzzy Classifier and the

Dual-Tree DWT

Evaluation Methods and Results

Conclusion

2/31

What is Document Image Segmentation?

Document image segmentation (DIS) is the first step of

document image analysis and understanding.

DIS partitions a document image into distinct regions.

In the ideal case, these regions correspond to the image

entities such as:

Text and Non-Text blocks

Text, Picture and Background blocks

3/31

What is Document Image Segmentation?

Document Image Text and Non-Text blocksText, Picture and Background blocks

4/31

DIS Important Applications

Image and text database retrieval

Automated documents reading and processing

Digital documents storing

5/31

DIS MethodsTop-down Methods: Classic top-down techniques are based on the Run Length Smoothing (RLS) algorithm and projection profiles. After the document image is smoothed with RLS, projection profiles is used to split the document into blocks.

Bottom-down Methods: Bottom-up methods group small components (starting with pixels as connected components) into successively larger components until all regions are identified in the image.

Texture-based Methods: Texture-based methods use the observation that texts in images have distinct textural properties that distinguish them from the background.

6/31

Wavelet-based DIS MethodsThe wavelet transform, originally developed in the mid 80’s, is a signal analysis tool that provides a multi-resolution decomposition of an image .

Document Image DWT Sub-bands at first level

LL LH

HL HH

HH

LL

LH

HL

7/31

Wavelet-based DIS Methods

Document Image DWT Sub-bands at first level

or

Texture Feature Extraction Classification

K-means Clustering(Kundu and Acharya 2002)

(G. Lee et al. 2007)

Fisher Classifier(Y. Sheng et al. 2004)(S. Kumar et al. 2007)

Class Labels

LL LH

HL HH

8/31

K-means ClusteringAssuming samples should be divided into C clusters, algorithm is as follows: First, select the appropriate C as the initial centers. Second, for each sample, calculate the distances between this sample and centers. It belongs to the cluster from which the distance is min. Third, calculate the average of all samples that belongs to the same cluster as the new cluster center. Finally, calculate the error between the new cluster centers and the original center, if it is less than the value given beforehand, then the process finished. Otherwise go into the second step.

9/31

Fisher Classifier-1For a feature vector X, the Fisher classifier projects X onto one dimension Y in direction W:

The Fisher criterion finds the optimal projection direction W0 by maximizing the ratio of the between-class scatter to the within-class scatter, which benefits the classification:

10/31

Fisher Classifier-2The optimal projection direction is the eigenvector of the corresponding to its largest eigenvalue. For a two-class classification problem, we do not need to calculate the eigenvectors of It is shown that the optimal projection direction is:

Let Y1 and Y2 be the projections of two classes and let E[Y1] and E[Y2] be the means of Y1 and Y2, respectively. Suppose E[Y1] > E[Y2], then the decision can be made as:

11/31

The DWT has been shown to be an optimal representation of one-dimensional 1D piece-wise smooth signals. However, there are limitations of the separable DWT:

lack of shift invariance

Limited Angular Resolution

The DT-DWT was proposed to overcome shift variance and directionality limitations of the DWT while maintaining the perfect reconstruction property with limited redundancy.

The Dual-Tree DWT-1

12/31

The DT-DWT is basically two parallel DWT filter bank trees. The wavelet and scaling functions used in one tree can be defined as approximate Hilbert transforms of the functions in the other tree .

The Dual-Tree DWT-2

1-D DT- DWT 3-Level Decomposition Filter Bank 13/31

Properties of the complex wavelet transform:

The magnitudes of complex coefficients are almost shift invariant.

Unlike the orthogonal DWT, the basis functions have directional selectivity at 15, 45, 75, -75 –45,-15 degrees; therefore they are a more efficient tool than DWT to represent line or curve singularities.

The Dual-Tree DWT-3

2D DT- DWT, Scale and orientation labeled sub-bands

14/31

The Dual-Tree DWT of the document image:

Document Image

DT-DWT Sub-bands at first levelDT-DWT Sub-bands at second level

-15 +15-45 +45

-75 +75

-45 -15 +15 +45

-75 +75

The Dual-Tree DWT-4

15/31

Now the inverse DT-DWT is applied to generate The Gradient image while the low-frequency sub-bands are vanished.

DT-DWT Sub-bands at second levelThe Gradient Image

The Gradient Image

16/31

In the gradient image:

Text regions have large local variations.

Backgrounds are continuous tone low frequency regions with

monotonous features although mixed with noise.

Images are continuous tone regions falling in between text and

background.

In order to enhance the gradient image information, we use two

texture features to capture and magnify these variations making

the input image appropriate for the classification step.

Feature Extraction-1

17/31

The first feature calculates local entropy and the second one calculates local range of the gradient image:

Local Range Local Entropy

Feature Extraction-2

18/31

The New DIS algorithm using Fuzzy Classifier and the Dual-Tree DWT

Document Image DT-DWT Sub-bands at second level The Gradient Image

Texture Feature Extraction Classification

Fuzzy Rule-based Classifier

Class Labels19/31

Fuzzy rule-based classifier-1In this step we utilize a fuzzy rule-based classifier for classification of document image to:

Text and Non-Text classes

Text, Picture and Background classes

This classifier can be constructed by specifying classification rules as linguistic rules:

.

.

.

20/31

Fuzzy rule-based classifier-2Each linguistic value is represented by a membership function. Having the values of membership function, the firing strength of the rules are calculated as:

.

.

.

we use the train image and the confidence value of each class to assign class label for each rule:

21/31

Fuzzy rule-based classifier-3Specifying weight to rules has a great effect on fuzzy rule-based classifier performance, and the most proper definition of the rule weight would be as following (H. Ishibuchi 2005):

On the other hand, we also obtain a support for each rule:

Finally we utilize the fuzzy rules which their confidence and support are more than a predefined minimum value.

22/31

Fuzzy rule-based classifier-4In our test, the number of final rules is 5 that we have mentioned as sample below:

R1: IF F1 is large AND F2 is large THEN class is Text

R2: IF F1 is large AND F2 is small THEN class is Picture

R3: IF F1 is medium AND F2 is large THEN class is Text

R4: IF F1 is medium AND F2 is small THEN class is Picture

R5: IF F1 is small THEN class is Background

After finding the firing strength of each rule, the output of classifier is obtained:

23/31

Post-ProcessingAfter completion of classification, there may existed some misclassified pixels. In order to solve this problem, we have used a morphological reconstruction, namely region filling.

Before Post-Processing After Post-Processing 24/31

Evaluation Methods and ResultsSeveral images with different properties have been used in this section to demonstrate the performance of the proposed algorithm. The scanned images are taken from different websites.

25/31

Text/non-text classification results-1

Results of three different classification methods:

DWT+K-means DWT+Fisher

DT-DWT+Fuzzy DT-DWT+Fuzzy+Post-Processing 26/31

Text/non-text classification results-2

We use tow criteria to evaluate these methods objectively. Text extraction is a two class classification problem. The precision and recall accuracy of this classification is computed as:

K-means Fisher Fuzzy Fuzzy+P-Proc65

70

75

80

85

90

95

82.5

75.5

86.25

90.4

K-means Fisher Fuzzy Fuzzy+P-Proc0

10

20

30

40

50

60

70

80

90

100

58.25

73.581.25

86.34Precision Rate Recall Rate

27/31

Results of three different classification methods:

Text/picture/background classification results-1

DWT+Fisher

DT-DWT+Fuzzy DT-DWT+Fuzzy+Post-Processing 28/31

We performed other experiments to objectively evaluate these methods. Accuracy measurement is defined as follow:

Text/picture/background classification results-2

Fisher Fuzzy Fuzzy+P-Proc0

10

20

30

40

50

60

70

80

90

100

60.25

83.5 86.75

Accuracy

29/31

In this paper, we have presented a new algorithm for locating the text part based on textural attributes using the dual-tree DWT and fuzzy rule-based classifier. The proposed feature extraction method is simple, effective with low computational requirements. We have applied our algorithm on several document images with different characteristics and complex backgrounds. The obtained results are promising. Considering the input image as a three-class set and classifying it into text/picture/background components is another paper’s novelty.

Conclusion

30/31

Thanks for your attention

31/31