Adaptive page segmentation for color technical journals' cover images


<ul><li><p>Adaptive page segmentation for color technical journals' cover images<sup>1</sup></p><p>Wei-Yuan Chen, Shu-Yuan Chen*</p><p>Department of Computer Engineering and Science, Yuan-Ze University, 135 Far-East Rd, Nei-Li, Chung-Li, Taoyuan 32026, Taiwan, Republic of China</p><p>Received 20 September 1996; received in revised form 18 September 1997; accepted 8 December 1997</p><p>Abstract</p><p>Page segmentation to locate text blocks is a preliminary and essential step in document processing, in particular for understanding a journal's cover page. However, texts, graphics and images are usually isolated in most documents, unlike cover pages, in which texts may be overlaid onto graphics or images. In this paper a new adaptive page segmentation method is proposed to extract text blocks from various types of color technical journals' cover images. Although color provides useful information for overcoming the overlapping problem, color processing requires a heavy computational load. Thus, a complexity analysis is included to adaptively adjust the processing steps in our approach. In other words, simple cover images, with few colors and no text-graphics/image overlapping, can be treated as monochrome images to speed up processing, while for complex cover images, with many colors and text-graphics/image overlapping, correct segmentation results can still be obtained but more processing time is required. To accomplish the design concept mentioned above, our method includes several components. First, in order to reduce the processing complexity of true color images, a new simple quantization method is employed to reduce the number of colors from 24-bit true color to 42 colors or fewer. In the block segmentation stage, smearing, labeling and complexity analysis techniques are used together with edge and color information to find coherent blocks adaptively. 
After that, in the block classification stage, some conventional and some new features are computed from each block to decide whether it is a text block or not. Finally, in the post-processing stage, some spatial relations are adopted to rectify the classification results. Experimental results demonstrate the feasibility and practicality of the proposed approach. © 1998 Elsevier Science B.V. All rights reserved.</p><p>Keywords: Page segmentation; Text extraction; Color quantization; Block classification; Document processing; Complex background</p><p>1. Introduction</p><p>1.1. Motivation</p><p>Rapid progress in computer technology allows computer vision to be applied in many fields. However, many challenging research issues must be overcome to accomplish the goal of replacing human eyesight by computer vision. For example, extracting text blocks from a complex background in color images and then recognizing the characters within these blocks is a difficult and worthwhile problem, although it is a trivial activity for human eyes. More importantly, this particular pattern recognition technique can be applied to journal contents recognition and journal check-in automation for library automation. Thus, the goal of this paper is to develop a good page segmentation method to separate text blocks from non-text blocks in color technical journals' cover images.</p><p>Actually, text-block extraction has been studied in the field of document analysis for many years and several techniques have been developed to meet these requirements. But texts, graphics and images are usually isolated in most documents, unlike cover pages, in which texts may be overlaid on graphics or images. 
Moreover, most previous research focused on monochrome images. Therefore, we intend to solve this type of problem so as not only to propose a novel page segmentation method for the color images of cover pages but also to develop a general text extraction method for any type of color document image.</p><p>Unfortunately, color cover images are highly diverse. According to the criterion of processing difficulty, the color cover images can be classified into the following two levels. One includes all the simple images in which texts, images or graphics are isolated, as shown in Fig. 11; the other includes all the complex images in which text blocks are overlaid on images or graphics, as shown in Fig. 13. In order to handle all the varieties, the proposed method must be able to tune itself to the complexity of color cover images. In other words, the proposed method must adjust its processing steps such that both categories can be processed correctly but with different performance.</p><p>0262-8856/98/$ - see front matter © 1998 Elsevier Science B.V. All rights reserved. PII S0262-8856(98)00062-6</p><p>* Corresponding author. Tel: 886 3463 8800 ext. 357; fax: 886 3463 8850; e-mail:</p><p>1 This work was supported partially by the National Science Council, Republic of China, under NSC 85-2213-E-155-021.</p><p>Image and Vision Computing 16 (1998) 855-877</p><p>IMAVIS 1518</p></li><li><p>1.2. Survey of related studies</p><p>Since the early 1980s many techniques have been proposed to accomplish the task of page segmentation, but most of these techniques have been proposed for monochrome document images. For example, there are two well-known approaches, the RLSA (run-length smoothing algorithm) [1] and the RXYC (recursive X-Y cuts) [2], on which most later studies [3-9,24,25,27] are based. The RLSA was proposed by Johnston [1] to distinguish text blocks from graphics and first extended by Wong et al. 
[3] to obtain various types of blocks, each containing the same type of data. The method smears a binary image by connecting the black pixels which are close to each other. Blocks can then be found by combining the horizontal and vertical smearing results. Nagy, Seth and Stoddard [2] use horizontal and vertical projections and make cuts corresponding to deep valleys in the profiles to segment a document image into disjoint blocks. However, most of the segmentation procedures based on the above two approaches will fail on images having severe tilt angles. Several major categories of skew detection are based on the techniques of projection profile [10], Hough transform [11], white streams [12] and nearest-neighbor clustering [13,14].</p><p>Besides the two commonly used approaches above, several other techniques have been proposed to achieve page segmentation for monochrome documents. Examples are the use of a knowledge model [15,16] and of text regularity properties [12,14]. Nagy et al. [15,16] accomplish both document segmentation and component labeling simultaneously. The segmentation is aided by performing component labeling based on a specific grammar regarding technical journal layout. The text regularity property is included in Ref. [12] by using white spaces for block segmentation and using correlation between adjacent scanlines for text identification. In [14] such a property is captured in the document spectrum, or docstrum, which is based on nearest-neighbor clustering of page components.</p><p>Actually, the docstrum can provide proper information to obtain the orientation, text lines and text blocks of a document. On the other hand, Jain [17,18] exploits such regularities by regarding text regions as texture. Then the multichannel filtering technique for texture segmentation, with specific Gabor filters, is used for page segmentation. For a broad survey, see [19-22].</p><p>Recently, more and more printed documents with complex layouts and multiple colors are published every day. 
One of the most difficult tasks in developing a general document analysis system is handling the millions of colors in the digitized image rapidly and accurately. Median-cut color quantization [23] is a simple and useful method. Tsai et al. used this method for color quantization in Ref. [24] and further modified it into a two-step method in [25]. Another approach is based on histogram analysis. Zhong et al. [26] labeled image pixels with a few prototype colors. The prototypes are found as local maxima in a smoothed color histogram of the input image. Mostly, the number of prototypes is about 5-500. A different quantization method, unlike the above two, was proposed by Suen and Wang [27]. It used edge detection techniques to quantize the true color images into binary images.</p><p>Different color quantization methods can greatly affect the text features and classification procedures that follow. Basically, features and classification techniques proposed for monochrome images can be used directly on an individual color plane of a quantized color image if each color plane is treated as a monochrome one [24,25]. However, such an approach has two shortcomings. One is that if the segmentation procedure is applied to each color plane individually, computing time increases greatly. The other is that treating color as a whole can provide more information than treating color as three independent components. Thus, many approaches have been proposed to develop document analysis techniques directly on the color images. In Ref. [26], a hybrid method, combining a connected component method and spatial variance techniques, was proposed to locate texts in complex color images. The basic idea is that regions with high variance correspond to text lines and regions with low variance correspond to white regions between the text lines. Together with the color quantization method based on histogram analysis, text regions can be located. 
However, this method will fail when texts are overlaid on a complex graphic background. On the other hand, an edge-based color quantization method was used in Ref. [27] to extract text strings from images of color printed documents. For documents with a uniform background, it is a good and rapid technique.</p><p>1.3. Proposed approach</p><p>1.3.1. Overview of the proposed approach</p><p>To meet the requirement mentioned in Section 1.1, a human heuristic is employed in the proposed approach. That is, texts can be observed only when the contrast between them and their surrounding background is high enough. More specifically, we define all the blocks with a large color difference from their surroundings as primary blocks. Further processing can then be restricted to these primary blocks to reduce the computational load.</p></li><li><p>Fig. 1. Flowchart of the proposed method: (a) the overall page segmentation; (b) the edge-based block segmentation; (c) the color-based block segmentation.</p></li><li><p>However, for a primary block, color processing still involves a great deal of computational load. On the other hand, color is useful information for overcoming the text-graphics/image overlapping problem, since texts with uniform color can be detected by considering only those pixels with a specific color. In order to solve the overlapping problem without sacrificing processing speed in the non-overlapping case, we employ an adaptive block segmentation procedure on the primary block. In other words, complexity analysis for further subdivision, which is based on the number of colors included in the block, is performed on the primary block first. 
The resulting simple primary block can be considered a coherent block on which no subdivision is needed, while the resulting complex primary block, with texts, graphics and images intermixed, must be further decomposed into coherent blocks using color information. In general, simple images will be decomposed into many smaller primary blocks, each of which is a coherent block and can be classified directly. In contrast, complex images will be decomposed into a few larger mixed primary blocks, each of which needs further subdivision before classification can be applied.</p><p>In summary, a simple image can be handled by involving only one color plane, just like a monochrome image, while for a complex image, multiple color planes must be involved and thus more computation time is required. In this way, adaptive page segmentation is accomplished.</p><p>To realize the design concept mentioned above, our method is divided into four major parts: color quantization, adaptive block segmentation including edge-based and color-based block segmentation, block classification and post-processing, as shown in Fig. 1(a). Moreover, the edge-based and color-based block segmentation are each composed of three steps, as shown in Fig. 1(b) and (c), respectively. In the remainder of this paper the four major parts are described in Sections 2-5, respectively. Experimental results and conclusions are included in Sections 6 and 7.</p><p>1.3.2. Assumptions</p><p>Some restrictions are made in this paper to reduce the complexity of our method:</p><p>1. The skew angle of the scanned image must not be too large.</p><p>2. A character is assumed to have uniform color. Texts with gradient colors are excluded.</p><p>3. Documents contain horizontal text lines only. The proposed method can be extended to handle text lines written in the vertical direction, but such a generalized algorithm would involve excessive computation.</p><p>2. 
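Simple color quantization</p><p>When an A4 journal cover page is scanned as a 250 dpi (dots per inch) true color image, the raw data will occupy over 15 MB. Storing this much data is a heavy load for image processing, and processing the true color image directly is a heavy burden as well. In fact, only a few primary colors are sufficient for segmentation processing. To reduce the computational requirement, we propose a simple color quantization method to reduce the number of color clusters from 16 777 216 to 42 or fewer. In this section, the quantized colors are first defined in Section 2.1, followed by the quantization method described in Section 2.2.</p><p>2.1. Definition of quantized colors</p><p>The RGB color system is first transformed to the YIQ color system, because the YIQ color system is more closely related to human visual perception.</p><p>Table 1. The initial cluster centers of the 21 quantized colors in the basic group</p><table>
<tr><th>Color</th><th>Chrome</th><th>Color description</th><th>R</th><th>G</th><th>B</th></tr>
<tr><td>0</td><td>0</td><td>Dark red</td><td>DARK</td><td>0</td><td>0</td></tr>
<tr><td>1</td><td>1</td><td>Dark green</td><td>0</td><td>DARK</td><td>0</td></tr>
<tr><td>2</td><td>2</td><td>Dark blue</td><td>0</td><td>0</td><td>DARK</td></tr>
<tr><td>3</td><td>3</td><td>Dark yellow</td><td>DARK</td><td>DARK</td><td>0</td></tr>
<tr><td>4</td><td>4</td><td>Dark magenta</td><td>DARK</td><td>0</td><td>DARK</td></tr>
<tr><td>5</td><td>5</td><td>Dark cyan</td><td>0</td><td>DARK</td><td>DARK</td></tr>
<tr><td>6</td><td>6</td><td>Dark gray (black)</td><td>DARK</td><td>DARK</td><td>DARK</td></tr>
<tr><td>7</td><td>0</td><td>Red</td><td>MIDDLE</td><td>0</td><td>0</td></tr>
<tr><td>8</td><td>1</td><td>Green</td><td>0</td><td>MIDDLE</td><td>0</td></tr>
<tr><td>9</td><td>2</td><td>Blue</td><td>0</td><td>0</td><td>MIDDLE</td></tr>
<tr><td>10</td><td>3</td><td>Yellow</td><td>MIDDLE</td><td>MIDDLE</td><td>0</td></tr>
<tr><td>11</td><td>4</td><td>Magenta</td><td>MIDDLE</td><td>0</td><td>MIDDLE</td></tr>
<tr><td>12</td><td>5</td><td>Cyan</td><td>0</td><td>MIDDLE</td><td>MIDDLE</td></tr>
<tr><td>13</td><td>6</td><td>Gray</td><td>MIDDLE</td><td>MIDDLE</td><td>MIDDLE</td></tr>
<tr><td>14</td><td>0</td><td>Light red</td><td>LIGHT</td><td>0</td><td>0</td></tr>
<tr><td>15</td><td>1</td><td>Light green</td><td>0</td><td>LIGHT</td><td>0</td></tr>
<tr><td>16</td><td>2</td><td>Light blue</td><td>0</td><td>0</td><td>LIGHT</td></tr>
<tr><td>17</td><td>3</td><td>Light yellow</td><td>LIGHT</td><td>LIGHT</td><td>0</td></tr>
<tr><td>18</td><td>4</td><td>Light magenta</td><td>LIGHT</td><td>0</td><td>LIGHT</td></tr>
<tr><td>19</td><td>5</td><td>Light cyan</td><td>0</td><td>LIGHT</td><td>LIGHT</td></tr>
<tr><td>20</td><td>6</td><td>Light gray (white)</td><td>LIGHT</td><td>LIGHT</td><td>LIGHT</td></tr>
</table><p>DARK = 32; MIDDLE = 112; LIGHT = 208.</p></li><li><p>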
The conversion from RGB to YIQ is defined by [28]:</p><p>\[ \begin{bmatrix} Y \\ I \\ Q \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.275 & -0.321 \\ 0.212 & -0.523 & 0.311 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}. \tag{1} \]</p><p>A basic group including 21 quantized colors is then defined by the combination of seven different chromes (Red, Green, Blue, Yellow, Magenta, Cyan and Gray, denoted by chrome labels 0-6) and three different luminance values (DARK, MIDDLE and LIGHT, set to the empirical values 32, 112 and 208, respectively, in this paper). Note that the seven chromes include three primary colors of light, three primary col...</p></li></ul>
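
The RGB-to-YIQ conversion of Eq. (1) and the 21 initial cluster centers of the basic group can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `rgb_to_yiq`, `basic_group_centers` and `CHROME_MASKS` are hypothetical, and the only values taken from the paper are the matrix coefficients and the DARK, MIDDLE and LIGHT levels from Table 1. The quantization procedure itself (Section 2.2) is not covered here.

```python
# Illustrative sketch only; helper names are hypothetical, not from the paper.
# Matrix coefficients follow Eq. (1); luminance levels follow Table 1.

YIQ_FROM_RGB = (
    (0.299, 0.587, 0.114),    # Y row
    (0.596, -0.275, -0.321),  # I row
    (0.212, -0.523, 0.311),   # Q row
)

DARK, MIDDLE, LIGHT = 32, 112, 208

# Chrome labels 0-6 expressed as on/off masks over (R, G, B).
CHROME_MASKS = (
    (1, 0, 0),  # 0: red
    (0, 1, 0),  # 1: green
    (0, 0, 1),  # 2: blue
    (1, 1, 0),  # 3: yellow
    (1, 0, 1),  # 4: magenta
    (0, 1, 1),  # 5: cyan
    (1, 1, 1),  # 6: gray
)


def rgb_to_yiq(r, g, b):
    """Apply the 3x3 matrix of Eq. (1) to one RGB triple."""
    return tuple(row[0] * r + row[1] * g + row[2] * b for row in YIQ_FROM_RGB)


def basic_group_centers():
    """Return the 21 initial RGB cluster centers in Table 1 order.

    The color index equals 7 * luminance_index + chrome_label.
    """
    centers = []
    for level in (DARK, MIDDLE, LIGHT):
        for mask in CHROME_MASKS:
            centers.append(tuple(level * m for m in mask))
    return centers


if __name__ == "__main__":
    for index, rgb in enumerate(basic_group_centers()):
        print(index, index % 7, rgb, rgb_to_yiq(*rgb))
```

Because the I and Q rows of the matrix each sum to zero, every gray center (chrome label 6) maps to I = Q = 0 up to rounding, which is a quick sanity check on the signs restored in Eq. (1).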

