Pattern Recognition 39 (2006) 317–327
www.elsevier/locate/patcog
Adaptive degraded document image binarization
B. Gatos ∗, I. Pratikakis, S.J. Perantonis
Computational Intelligence Laboratory, Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”,
153 10 Athens, Greece
Received 11 January 2005; accepted 9 September 2005
Abstract
This paper presents a new adaptive approach for the binarization and enhancement of degraded documents. The proposed method does not require any parameter tuning by the user and can deal with degradations which occur due to shadows, non-uniform illumination, low contrast, large signal-dependent noise, smear and strain. We follow several distinct steps: a pre-processing procedure using a low-pass Wiener filter, a rough estimation of foreground regions, a background surface calculation by interpolating neighboring background intensities, a thresholding by combining the calculated background surface with the original image while incorporating image up-sampling, and finally a post-processing step in order to improve the quality of text regions and preserve stroke connectivity. After extensive experiments, our method demonstrated superior performance compared with four (4) well-known techniques on numerous degraded document images.
© 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Keywords: Degraded document images; Local adaptive binarization
1. Introduction
Document image binarization (threshold selection) refers to the conversion of a gray-scale image into a binary image. It is the initial step of most document image analysis and understanding systems. Usually, it distinguishes text areas from background areas, so it is used as a text locating technique. Binarization plays a key role in document processing since its performance critically affects the degree of success of subsequent character segmentation and recognition. When processing degraded document images, binarization is not an easy task. Degradations appear frequently and may occur for several reasons, which range from the acquisition source type to environmental conditions. Examples of degradation influence include the appearance of variable background intensity caused by non-uniform intensity, shadows, smear, smudge and low contrast.
∗ Corresponding author. Tel.: +30 210 6503183; fax: +30 210 6532175.
E-mail addresses: bgat@ (B. Gatos), ipratika@ (I. Pratikakis), sper@ (S.J. Perantonis)
URL: www./cil .
0031-3203/$30.00 © 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2005.09.010
In general, approaches that deal with document image binarization are either global or local. In a global approach, threshold selection leads to a single threshold value for the entire image. Global thresholding [1–5] performs well when there is a good separation between the foreground and the background. However, very often, document images are exposed to degradations that weaken any guarantee of such a separation. Unlike global approaches, local (adaptive) thresholding techniques [6–14] use local area information to guide the threshold value for each pixel. These techniques have been widely used in document image analysis because they perform better in extracting the character strokes from an image that contains spatially uneven gray levels due to degradations. While it is essential to have a binarization technique that can correctly keep all essential textual information, it is of equal importance to apply this technique automatically, without requiring the user to adjust a set of parameters each time it is applied.
Taking all the above into account, in this paper we propose a novel locally adaptive thresholding scheme which binarizes and enhances poor quality and degraded documents for the location of meaningful textual information
without requiring any parameter tuning. The proposed scheme consists of five basic steps. The first step is dedicated to a denoising procedure using a low-pass Wiener filter. We use an adaptive Wiener method based on local statistics. In the second step, we use a first rough estimation of foreground regions. Next, as a third step, we compute the background surface of the image by interpolating neighboring background intensities into the foreground areas that result from the previous step. In the fourth step, we proceed to the final binarization by combining information from the calculated background surface and the original image. Text areas are located if the distance of the original image from the calculated background exceeds a threshold. This threshold adapts to the gray-scale value of the background surface in order to preserve textual information even in very dark background areas. In the last step, we proceed to a post-processing that eliminates noise, improves the quality of text regions and preserves stroke connectivity. The proposed method has been extensively tested with a variety of degraded document images and has demonstrated superior performance compared with four (4) well-known techniques.
The paper is organized as follows. Section 2 briefly reviews the state-of-the-art with particular emphasis on the local adaptive methods used during our experiments for comparison purposes. In Section 3, our methodology is described in detail, while in Section 4 we discuss our experimental results. Finally, conclusions are drawn in Section 5.
2. Related work
In the literature, binarization is performed either globally or locally. For the global methods (global thresholding), a single calculated threshold value is used to classify image pixels into object or background classes [1–5], while for the local methods (adaptive thresholding), local area information guides the threshold value for each pixel [6,7]. Most of the image binarization algorithms rely on statistical methods, without taking into account the special nature of document images [8–10]. However, some document-directed binarization techniques have been developed [11–14]. For document image binarization, global thresholding methods are not sufficient, since document images are usually degraded and of poor quality, with shadows, non-uniform illumination, low contrast, large signal-dependent noise, smear and strains. Concerning local methods, a goal-directed performance evaluation of eleven popular local thresholding algorithms has been performed for map images [13]. According to this evaluation, local algorithms work well for a slowly changing background. However, with a complex background, it appeared that none can be tuned with a set of operating parameters good for all images. Furthermore, local algorithms were dependent on stroke width. A recent exhaustive survey of forty (40) image binarization methods, both global and local, is presented in Ref. [15]. In a quantitative performance evaluation, local methods are shown to perform better. Nevertheless, this evaluation took into consideration only text document images that are degraded with noise and blur.
In the following, we review three (3) local binarization algorithms that are considered as the current state-of-the-art. These algorithms have been used for the comparison and evaluation of our approach.
Niblack [8] introduced an algorithm that calculates a pixelwise threshold by shifting a rectangular window across the image. The threshold T for the center pixel of the window is computed using the mean m and the standard deviation s of the gray values in the window:
$$T = m + k\,s, \qquad (1)$$
where k is a constant set to −0.2. The value of k determines how much of the total print object boundary is taken as a part of the given object. This method can distinguish the object from the background effectively in the areas close to the objects. The results are not very sensitive to the window size as long as the window covers at least 1–2 characters. However, noise that is present in the background remains dominant in the final binary image. Consequently, if the objects are sparse in an image, a lot of background noise will be left.
Sauvola and Pietikainen [11] propose a method that solves this problem by adding a hypothesis on the gray values of text and background pixels (text pixels have gray values near 0 and background pixels have gray values near 255), which results in the following formula for the threshold:
$$T = m\,\bigl(1 - k\,(1 - s/R)\bigr), \qquad (2)$$
where R is the dynamic range of the standard deviation, fixed to 128, and k takes on positive values (usually set to 0.5). This method gives better results for document images.
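As an illustration only, the two thresholds reviewed above can be sketched in plain Python. The sketch uses the product form of Sauvola's threshold, T = m(1 − k(1 − s/R)). The helper names (`window_stats`, `binarize`), the window half-size and the clipping of the window at the image borders are assumptions of this sketch and are not taken from Refs. [8,11].

```python
from math import sqrt

def window_stats(img, x, y, half):
    """Mean and standard deviation of the gray values in a (2*half+1)^2
    window centred on (x, y); the window is clipped at the image borders."""
    h, w = len(img), len(img[0])
    vals = [img[j][i]
            for j in range(max(0, y - half), min(h, y + half + 1))
            for i in range(max(0, x - half), min(w, x + half + 1))]
    m = sum(vals) / len(vals)
    s = sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
    return m, s

def niblack_threshold(m, s, k=-0.2):
    """Eq. (1): T = m + k*s."""
    return m + k * s

def sauvola_threshold(m, s, k=0.5, R=128.0):
    """Eq. (2) in product form: T = m*(1 - k*(1 - s/R))."""
    return m * (1.0 - k * (1.0 - s / R))

def binarize(img, threshold_fn, half=7):
    """Return a 0/1 image in which 1 marks pixels darker than their local threshold."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            m, s = window_stats(img, x, y, half)
            out[y][x] = 1 if img[y][x] < threshold_fn(m, s) else 0
    return out
```

With a flat window (s = 0), Sauvola's threshold drops to m(1 − k), so a uniform background is never marked as text, whereas Niblack's threshold stays at the local mean and is therefore sensitive to background noise.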
Kim et al. [7] propose a local adaptive thresholding method where an image is regarded as a 3D terrain and its local property is characterized by a water flow model. The water flow model locally detects the valleys corresponding to regions that are lower than neighboring regions. The deep valleys are filled with dropped water whereas the smooth plain regions remain dry. The final step in this method concerns the application of a global thresholding, such as Otsu’s method [2], on a difference image between the original terrain and the water-filled terrain. A shortcoming of this method is that its two critical parameters, namely the amount of rainfall, w, and the mask size, s, are selected on an experimental basis.
3. Methodology
The proposed methodology for degraded and poor quality document image binarization, text preservation and enhancement is illustrated in Fig. 1 and fully described in this section.
Fig. 1. Block diagram of the proposed methodology for low quality historical document text preservation.
Fig. 2. Image pre-processing: (a) original image; (b) 3×3 Wiener filtering.
3.1. Pre-processing
For degraded and poor quality documents, a pre-processing stage of the grayscale source image is essential for the elimination of noisy areas, smoothing of background texture as well as contrast enhancement between background and text areas. The use of a low-pass Wiener filter [16] has proved efficient for the above goals. The Wiener filter is commonly used in filtering theory for image restoration. Our pre-processing module involves an adaptive Wiener method based on statistics estimated from a local neighborhood around each pixel. The grayscale source image $I_s$ is transformed to the filtered grayscale image $I$ according to the following formula:
$$I(x,y) = \mu + \frac{(\sigma^2 - v^2)\,(I_s(x,y) - \mu)}{\sigma^2}, \qquad (3)$$
where $\mu$ is the local mean, $\sigma^2$ the variance at a 3×3 neighborhood around each pixel and $v^2$ is the average of all estimated variances for each pixel in the neighborhood. Fig. 2 shows the results of applying a 3×3 Wiener filter to a document image.
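A minimal sketch of Eq. (3) in plain Python is given below, under stated assumptions. Here the noise term $v^2$ is taken as the average of the local variances over the whole image, the window is clipped at the image borders, and the output falls back to the local mean wherever the local variance does not exceed $v^2$ (which also avoids a division by zero). These three choices, and the function names, are assumptions of the sketch rather than details given in the text.

```python
def local_mean_var(img, x, y, half=1):
    """Local mean and variance in a (2*half+1)^2 window clipped at the borders."""
    h, w = len(img), len(img[0])
    vals = [img[j][i]
            for j in range(max(0, y - half), min(h, y + half + 1))
            for i in range(max(0, x - half), min(w, x + half + 1))]
    mu = sum(vals) / len(vals)
    var = sum((v - mu) ** 2 for v in vals) / len(vals)
    return mu, var

def wiener_filter(src, half=1):
    """Adaptive Wiener filtering following Eq. (3); half=1 gives a 3x3 window."""
    h, w = len(src), len(src[0])
    stats = [[local_mean_var(src, x, y, half) for x in range(w)] for y in range(h)]
    # Assumed noise estimate v^2: average of all local variances in the image.
    noise = sum(var for row in stats for _, var in row) / (h * w)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            mu, var = stats[y][x]
            if var <= noise:
                # Assumed fallback: flat or noise-dominated area, keep the local mean.
                out[y][x] = mu
            else:
                out[y][x] = mu + (var - noise) * (src[y][x] - mu) / var
    return out
```

In flat regions the filter returns the local mean (strong smoothing), while in high-variance regions such as stroke edges the factor $(\sigma^2 - v^2)/\sigma^2$ approaches 1 and the source value is largely preserved.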
3.2. Rough estimation of foreground regions
At this step, we obtain a rough estimation of foreground (text) regions. Our intention is to proceed to an initial segmentation of foreground and background regions that will provide us with a superset of the correct set of foreground pixels. This is refined at a later step (Section 3.3). Sauvola’s approach for adaptive thresholding [11], using k = 0.2, is suitable for this case (see Fig. 3). At this step, we process image I(x,y) in order to extract the binary image S(x,y), where 1’s correspond to the rough estimated foreground regions.
Fig. 3. Adaptive thresholding using Sauvola’s approach: (a) original image; (b) rough estimation of foreground regions.
3.3. Background surface estimation
At this stage, we compute an approximate background surface B(x,y) of the image I(x,y). A similar approach has been proposed for the binarization of camera images [17]. Background surface estimation is guided by the values of the S(x,y) image. For pixels that correspond to 0’s in image S(x,y), the corresponding value of B(x,y) equals I(x,y). For the remaining pixels, the value of B(x,y) is computed by a neighboring pixel interpolation, as described in
$$B(x,y) = \begin{cases} I(x,y) & \text{if } S(x,y) = 0, \\[6pt] \dfrac{\sum_{ix=x-d_x}^{x+d_x} \sum_{iy=y-d_y}^{y+d_y} I(ix,iy)\,\bigl(1 - S(ix,iy)\bigr)}{\sum_{ix=x-d_x}^{x+d_x} \sum_{iy=y-d_y}^{y+d_y} \bigl(1 - S(ix,iy)\bigr)} & \text{if } S(x,y) = 1. \end{cases} \qquad (4)$$
Fig. 4. Background surface estimation: (a) image I; (b) background surface B.
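The interpolation of Eq. (4) can be sketched as follows. This is an illustrative plain-Python version, not the authors' implementation: `dx` and `dy` are used as the half-widths that appear in the summation limits, the window is clipped at the image borders, and a foreground pixel whose window contains no background pixel keeps its value from I. The last two choices are assumptions, since Eq. (4) does not define these cases.

```python
def background_surface(img, fg, dx, dy):
    """Background surface B of Eq. (4).

    img: grayscale image I as a list of rows; fg: binary image S (1 = foreground).
    Background pixels are copied from img; each foreground pixel receives the
    mean of the background pixels found in its (2*dx+1) x (2*dy+1) window.
    """
    h, w = len(img), len(img[0])
    B = [[float(v) for v in row] for row in img]
    for y in range(h):
        for x in range(w):
            if fg[y][x] == 0:
                continue  # first case of Eq. (4): B = I
            total, count = 0.0, 0
            for iy in range(max(0, y - dy), min(h, y + dy + 1)):
                for ix in range(max(0, x - dx), min(w, x + dx + 1)):
                    if fg[iy][ix] == 0:
                        total += img[iy][ix]
                        count += 1
            if count:  # assumed fallback: keep I(x, y) if no background is in reach
                B[y][x] = total / count
    return B
```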
The interpolation window of size $d_x \times d_y$ is defined to cover at least two image characters. An example of the background surface estimation is shown in Fig. 4.
3.4. Final thresholding
In this step, we proceed to final thresholding by combining the calculated background surface B(x,y) with the preprocessed image I(x,y). Text areas are located if the distance between the preprocessed image I(x,y) and the calculated background B(x,y) exceeds a threshold d. We suggest that the threshold d must change according to the gray-scale value of the background surface B(x,y) in order to preserve textual information even in very dark background areas. For this reason, we propose a threshold d that has smaller values for darker regions. The final binary image T(x,y) is given by the following formula:
$$T(x,y) = \begin{cases} 1 & \text{if } B(x,y) - I(x,y) > d(B(x,y)), \\ 0 & \text{otherwise.} \end{cases} \qquad (5)$$
A typical histogram of a document image (see Fig. 5) has two peaks. One peak corresponds to text regions and the other peak corresponds to background regions. Note that we consider gray value document images in the range [0,255] and that textual information has a low gray level profile. The average distance between the foreground and background can be calculated by the following formula:
$$\delta = \frac{\sum_x \sum_y \bigl(B(x,y) - I(x,y)\bigr)}{\sum_x \sum_y S(x,y)}. \qquad (6)$$
In the case of document images with uniform illumination, the minimum threshold d between text pixels and background pixels can be defined as $q \cdot \delta$, where q is a weighting parameter. Fixing the value of q at 0.8, we achieve total character body preservation that leads to successful OCR results [17]. At low contrast regions that appear in degraded and poor quality documents, we require a smaller value for the threshold d. To achieve this, we first compute the average background value b of the background surface B over the pixels that are not marked as text in image S, denoted as
$$b = \frac{\sum_x \sum_y B(x,y)\,\bigl(1 - S(x,y)\bigr)}{\sum_x \sum_y \bigl(1 - S(x,y)\bigr)}. \qquad (7)$$
We require that the threshold be equal to the value $q \cdot \delta$ when the background value is high (greater than the average background value b of Eq. (7)) and equal to $p_2 \cdot q \cdot \delta$ when the background value is low (less than $p_1 \cdot b$), with $p_1, p_2 \in [0,1]$. To simulate this requirement, we use the following logistic sigmoid function, which exhibits the desired saturation behavior for large and small values of the background, as shown in Fig. 6:
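$$d(B(x,y)) = q\,\delta \left( \frac{1 - p_2}{1 + \exp\!\left( \dfrac{-4\,B(x,y)}{b\,(1 - p_1)} + \dfrac{2\,(1 + p_1)}{1 - p_1} \right)} + p_2 \right), \qquad (8)$$
where $\delta$ denotes the average foreground-to-background distance of Eq. (6) and b the average background value of Eq. (7).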
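To make the interplay of Eqs. (5)–(8) concrete, a plain-Python sketch is given below. The function names are illustrative, the default parameter values are the ones suggested in this section for degraded documents (q = 0.6, p1 = 0.5, p2 = 0.8), and the sketch assumes that S contains at least one foreground and one background pixel so that the denominators of Eqs. (6) and (7) are non-zero.

```python
from math import exp

def average_distance(B, I, S):
    """Eq. (6): summed distance B - I divided by the number of foreground pixels in S."""
    num = sum(bv - iv for rb, ri in zip(B, I) for bv, iv in zip(rb, ri))
    den = sum(sv for row in S for sv in row)
    return num / den

def average_background(B, S):
    """Eq. (7): mean of B over the pixels where S = 0."""
    num = sum(bv * (1 - sv) for rb, rs in zip(B, S) for bv, sv in zip(rb, rs))
    den = sum(1 - sv for row in S for sv in row)
    return num / den

def threshold_d(value, delta, b, q=0.6, p1=0.5, p2=0.8):
    """Eq. (8): sigmoid threshold between p2*q*delta (dark background) and q*delta."""
    z = -4.0 * value / (b * (1.0 - p1)) + 2.0 * (1.0 + p1) / (1.0 - p1)
    return q * delta * ((1.0 - p2) / (1.0 + exp(z)) + p2)

def final_threshold(B, I, S, q=0.6, p1=0.5, p2=0.8):
    """Eq. (5): a pixel is text (1) if B - I exceeds the adaptive threshold d(B)."""
    delta = average_distance(B, I, S)
    b = average_background(B, S)
    return [[1 if bv - iv > threshold_d(bv, delta, b, q, p1, p2) else 0
             for bv, iv in zip(rb, ri)]
            for rb, ri in zip(B, I)]
```

For example, with $\delta$ = 100 and b = 100 the threshold is close to $q\delta$ = 60 for a bright background (B = 255) and close to $p_2 q \delta$ = 48 for a very dark one (B = 0), which is the saturation behavior required above.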
After experimental work, for the case of degraded and poor quality document images, we suggest the following parameter values: q = 0.6, $p_1$ = 0.5, $p_2$ = 0.8.
3.5. Upsampling
In order to achieve a better quality binary image, we incorporate in the previous step an efficient upsampling technique. Among available image upsampling techniques, bicubic interpolation is the most common technique that provides satisfactory results [18]. It estimates the value at a pixel in the destination image by a weighted average of the 16 pixels surrounding the closest corresponding pixel in the source image. Bicubic interpolation for image upsampling can be incorporated in our algorithm by substituting formula (5) with the following formula:
$$T(x',y') = \begin{cases} 1 & \text{if } B(x,y) - I_u(x',y') > d(B(x,y)), \\ 0 & \text{otherwise,} \end{cases} \qquad (9)$$
where $T(x',y')$ is the binary image of size M times the size of the original gray scale image, $x = (\mathrm{int})\,x'/M$, $y = (\mathrm{int})\,y'/M$ and $I_u$ is given by the following formula:
$$I_u(x',y') = -b\,(1-b)^2\,F(x',y-1) + (1 - 2b^2 + b^3)\,F(x',y) + b\,(1 + b - b^2)\,F(x',y+1) - b^2\,(1-b)\,F(x',y+2), \qquad (10)$$
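Fig. 5. Document image histogram: (a) original image; (b) gray level histogram.
Fig. 6. Function d(B(x,y)). [Plot: threshold d on the vertical axis, with the levels $p_2 q \delta$ and $q \delta$ marked, against the background value B on the horizontal axis (0 to 250), with $p_1 b$ and $b$ marked.]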
where $F(\cdot)$ is calculated as follows:
$$F(x',m) = -a\,(1-a)^2\,I(x-1,m) + (1 - 2a^2 + a^3)\,I(x,m) + a\,(1 + a - a^2)\,I(x+1,m) - a^2\,(1-a)\,I(x+2,m), \qquad (11)$$
where $a = (x'/M) - x$ and $b = (y'/M) - y$.
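The separable interpolation of Eqs. (10) and (11) can be sketched in plain Python as follows. The function names and the replication of border pixels (needed because the formulas reference positions x − 1, x + 2, y − 1 and y + 2 outside the image) are assumptions of this sketch. Note that a and b here are the fractional offsets defined above, not the average background value of Eq. (7).

```python
def cubic_weights(t):
    """The four weights of Eqs. (10)-(11) for a fractional offset t in [0, 1)."""
    return (-t * (1 - t) ** 2,
            1 - 2 * t ** 2 + t ** 3,
            t * (1 + t - t ** 2),
            -(t ** 2) * (1 - t))

def upsample_value(I, xp, yp, M=2):
    """I_u(x', y') of Eq. (10) for destination pixel (xp, yp) and factor M."""
    h, w = len(I), len(I[0])
    x, y = xp // M, yp // M            # x = (int) x'/M, y = (int) y'/M
    a, b = xp / M - x, yp / M - y      # fractional offsets
    wa, wb = cubic_weights(a), cubic_weights(b)

    def pixel(ix, iy):
        # Assumed border handling: replicate the nearest edge pixel.
        return I[min(max(iy, 0), h - 1)][min(max(ix, 0), w - 1)]

    def F(m):
        # Eq. (11): interpolation along x on source row m.
        return sum(wa[k] * pixel(x - 1 + k, m) for k in range(4))

    # Eq. (10): interpolation along y of the four row results.
    return sum(wb[k] * F(y - 1 + k) for k in range(4))

def upsample(I, M=2):
    """Return the image enlarged M times in each direction."""
    h, w = len(I), len(I[0])
    return [[upsample_value(I, xp, yp, M) for xp in range(w * M)]
            for yp in range(h * M)]
```

The four weights always sum to 1, so a constant image stays constant, and for a = b = 0 they reduce to (0, 1, 0, 0), so destination pixels that coincide with source pixels keep their original value.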
In most cases, a double-sized binary image (M = 2) is adequate for an improved quality result.
3.6. Post-processing
In the final step, we proceed to post-processing of the resulting binary image in order to eliminate noise, improve the quality of text regions and preserve stroke connectivity by isolated pixel removal and filling of possible breaks, gaps or holes. A detailed step-by-step description of the post-processing algorithm follows; it consists of a successive application of shrink and swell filtering [19].
Step 1: A shrink filter is used to remove noise from the background. The entire binary image is scanned and each foreground pixel is examined. If $P_{sh}$ is the number of background pixels in a sliding n×n window, which has the foreground pixel as the central pixel, then this pixel is changed to background if $P_{sh} > k_{sh}$, where $k_{sh}$ can be defined experimentally.
Step 2: A swell filter is used to fill possible breaks, gaps or holes in the foreground. The entire binary image is scanned and each background pixel is examined. If $P_{sw}$ is the number of foreground pixels in a sliding n×n window, which has the background pixel (x,y) as the central pixel, and $x_a$, $y_a$ are the average coordinates of all foreground pixels in the n×n window, then this pixel is changed to foreground if $P_{sw} > k_{sw}$ and $|x - x_a| < d_x$ and $|y - y_a| < d_y$. The latter two conditions are used in order to prevent an increase in the thickness of character strokes, since we examine only background pixels among uniformly distributed foreground pixels.
Step 3: An extension of the above conditions leads to a further application of a swell filter that is used to improve the quality of the character strokes. The entire binary image is scanned and each background pixel is examined. If $P_{sw1}$ is the number of foreground pixels in a sliding n×n window, which has the background pixel as the central pixel, then this pixel is changed to foreground if $P_{sw1} > k_{sw1}$.
All the parameters used at this step depend on the average character height $l_h$, which is estimated by using connected component analysis. A height histogram of the connected components is formed and the largest peak is selected as the average character height. After experimental work on a representative training set, we suggest the following parameter values for the post-processing phase: $n = 0.15\,l_h$, $k_{sh} = 0.9\,n^2$, $k_{sw} = 0.05\,n^2$, $d_x = d_y = 0.25\,n$, $k_{sw1} = 0.35\,n^2$. An example of a resulting binary image after the post-processing steps is given in Fig. 7.
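The three post-processing steps can be sketched in plain Python as shown below. This is an illustration under assumptions that the text does not fix: the window size n is rounded to an odd integer of at least 3 so that it has a central pixel, windows are clipped at the image borders, and each pass reads from the input image and writes to a copy. The function names are illustrative.

```python
def _window(img, x, y, n):
    """Coordinates of the n x n window centred on (x, y), clipped at the borders."""
    h, w, half = len(img), len(img[0]), n // 2
    return [(ix, iy)
            for iy in range(max(0, y - half), min(h, y + half + 1))
            for ix in range(max(0, x - half), min(w, x + half + 1))]

def shrink(img, n, k_sh):
    """Step 1: a foreground pixel becomes background if P_sh > k_sh."""
    out = [row[:] for row in img]
    for y in range(len(img)):
        for x in range(len(img[0])):
            if img[y][x] == 1:
                p_sh = sum(1 for ix, iy in _window(img, x, y, n) if img[iy][ix] == 0)
                if p_sh > k_sh:
                    out[y][x] = 0
    return out

def swell(img, n, k_sw, dx=None, dy=None):
    """Steps 2 and 3: a background pixel becomes foreground if P_sw > k_sw and,
    when dx and dy are given, it lies close to the centroid of the foreground
    pixels in its window."""
    out = [row[:] for row in img]
    for y in range(len(img)):
        for x in range(len(img[0])):
            if img[y][x] == 1:
                continue
            fg = [(ix, iy) for ix, iy in _window(img, x, y, n) if img[iy][ix] == 1]
            if len(fg) <= k_sw:
                continue
            if dx is not None and dy is not None:
                xa = sum(ix for ix, _ in fg) / len(fg)
                ya = sum(iy for _, iy in fg) / len(fg)
                if abs(x - xa) >= dx or abs(y - ya) >= dy:
                    continue
            out[y][x] = 1
    return out

def post_process(img, char_height):
    """Apply Steps 1-3 with the parameter values suggested in the text."""
    n = max(3, int(round(0.15 * char_height)))
    if n % 2 == 0:
        n += 1  # assumed: odd window size so that a central pixel exists
    img = shrink(img, n, 0.9 * n * n)
    img = swell(img, n, 0.05 * n * n, 0.25 * n, 0.25 * n)
    return swell(img, n, 0.35 * n * n)
```

The centroid test in Step 2 is what separates a hole inside a stroke (foreground on all sides, centroid near the pixel) from a background pixel lying next to a stroke (foreground on one side only, centroid shifted away), so only the former is filled.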
4. Experimental results
The proposed algorithm was tested using degraded document images which belong to three distinct categories: historical handwritten documents, old newspapers and poor quality modern documents. All images have varying resolutions, stroke sizes, illumination contrast, background complexity and noise levels. The historical handwritten documents were selected from the Library of Congress on-line database [20], from the Mount Sinai Foundation Collection [21], from the Bodleian Library [22], as well as from private collections. All historical handwritten document images are of poor quality and have shadows, non-uniform illumination, smear and strain. In some of these documents, ink seeped through the pages, so that characters on the reverse side become visible and interfere with the characters on the front side. Example historical handwritten documents used for testing are shown in Fig. 8. Old newspaper images come from the Library of Congress on-line database [20] and suffer from problems similar to historical documents. Additionally, old newspaper images have extra noise due to the old printing matrix quality or ink diffusion. Example old newspaper documents used for testing are shown in Fig. 9. Poor quality modern documents originate from the MediaTeam Oulu Document Database [23] as well as from recent scans of books and magazines. All modern documents are degraded, with difficulties in distinguishing text and background regions.
Fig. 7. Post-processing stage: (a) resulting binary image after final thresholding; (b) resulting image after post-processing Step 1; (c) resulting image after post-processing Step 2; (d) resulting image after post-processing Step 3.
Fig. 8. Examples from the handwritten historical documents used for testing.
Fig. 9. Examples from the old newspapers used for testing.
We compared the performance of our algorithm with four (4) well-known binarization techniques. We evaluated the following: Otsu’s global thresholding method [2], Niblack’s adaptive thresholding method [8], Sauvola et al. adaptive