Pattern Recognition 39 (2006) 317–327
www.elsevier.com/locate/patcog
Adaptive degraded document image binarization
B. Gatos∗, I. Pratikakis, S.J. Perantonis
Computational Intelligence Laboratory, Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”, 15310 Athens, Greece
Received 11 January 2005; accepted 9 September 2005
Abstract
This paper presents a new adaptive approach for the binarization and enhancement of degraded documents. The proposed method does not require any parameter tuning by the user and can deal with degradations which occur due to shadows, non-uniform illumination, low contrast, large signal-dependent noise, smear and strain. We follow several distinct steps: a pre-processing procedure using a low-pass Wiener filter, a rough estimation of foreground regions, a background surface calculation by interpolating neighboring background intensities, a thresholding by combining the calculated background surface with the original image while incorporating image up-sampling, and finally a post-processing step in order to improve the quality of text regions and preserve stroke connectivity. After extensive experiments, our method demonstrated superior performance against four (4) well-known techniques on numerous degraded document images.
© 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Keywords: Degraded document images; Local adaptive binarization
1. Introduction
Document image binarization (threshold selection) refers to the conversion of a gray-scale image into a binary image. It is the initial step of most document image analysis and understanding systems. Usually, it distinguishes text areas from background areas, so it is used as a text locating technique. Binarization plays a key role in document processing since its performance critically affects the degree of success in subsequent character segmentation and recognition. When processing degraded document images, binarization is not an easy task. Degradations appear frequently and may occur due to several reasons which range from the acquisition source type to environmental conditions. Examples of degradation influence include the appearance of variable background intensity caused by non-uniform illumination, shadows, smear, smudge and low contrast.
∗ Corresponding author. Tel.: +30 210 6503183; fax: +30 210 6532175.
E-mail addresses: bgat@ (B. Gatos), ipratika@ (I. Pratikakis), sper@ (S.J. Perantonis).
URL: www./cil.
0031-3203/$30.00 © 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2005.09.010
In general, approaches that deal with document image binarization are either global or local. In a global approach, threshold selection leads to a single threshold value for the entire image. Global thresholding [1–5] performs well when there is a good separation between the foreground and the background. However, document images are very often exposed to degradations that weaken any guarantee of such a separation. Unlike global approaches, local (adaptive) thresholding techniques [6–14] let local area information guide the threshold value for each pixel. These techniques have been widely used in document image analysis because they perform better in extracting the character strokes from an image that contains spatially uneven gray levels due to degradations. While it is essential to have a binarization technique that can correctly keep all essential textual information, it is of equal importance to apply this technique automatically, without requiring the user to adjust a set of parameters each time it is applied.
Taking all the above into account, in this paper we propose a novel locally adaptive thresholding scheme which binarizes and enhances poor quality and degraded documents for the location of meaningful textual information without requiring any parameter tuning.
The proposed scheme consists of five basic steps. The first step is dedicated to a denoising procedure using a low-pass Wiener filter. We use an adaptive Wiener method based on local statistics. In the second step, we obtain a first rough estimation of foreground regions. Next, as a third step, we compute the background surface of the image by interpolating neighboring background intensities into the foreground areas that result from the previous step. In the fourth step, we proceed to the final binarization by combining information from the calculated background surface and the original image. Text areas are located where the distance of the original image from the calculated background exceeds a threshold. This threshold adapts to the gray-scale value of the background surface in order to preserve textual information even in very dark background areas. In the last step, we proceed to a post-processing stage that eliminates noise, improves the quality of text regions and preserves stroke connectivity. The proposed method has been extensively tested on a variety of degraded document images and has demonstrated superior performance against four (4) well-known techniques.
The paper is organized as follows. Section 2 briefly reviews the state-of-the-art with particular emphasis on the local adaptive methods used in our experiments for comparison purposes. In Section 3, our methodology is described in detail, while in Section 4 we discuss our experimental results. Finally, conclusions are drawn in Section 5.
2. Related work
In the literature, binarization is performed either globally or locally. In global methods (global thresholding), a single calculated threshold value is used to classify image pixels into object or background classes [1–5], while in local methods (adaptive thresholding), local area information guides the threshold value for each pixel [6,7]. Most image binarization algorithms rely on statistical methods, without taking into account the special nature of document images [8–10]. However, some document-directed binarization techniques have been developed [11–14]. For document image binarization, global thresholding methods are not sufficient since document images are usually degraded and of poor quality, exhibiting shadows, non-uniform illumination, low contrast, large signal-dependent noise, smear and strains. Concerning local methods, a goal-directed performance evaluation of eleven popular local thresholding algorithms has been performed for map images [13]. According to this evaluation, local algorithms work well for a slowly changing background; with a complex background, however, none could be tuned with a set of operating parameters good for all images. Furthermore, local algorithms were dependent on stroke width. A recent exhaustive survey of forty (40) image binarization methods, both global and local, is presented in Ref. [15]. In a quantitative performance evaluation, local methods are shown to perform better. Nevertheless, this evaluation took into consideration only text document images degraded with noise and blur.
In the following, we review three (3) local binarization algorithms that are considered as the current state-of-the-art. These algorithms have been used for the comparison and evaluation of our approach.
Niblack [8] introduced an algorithm that calculates a pixelwise threshold by sliding a rectangular window across the image. The threshold T for the center pixel of the window is computed using the mean m and the standard deviation s of the gray values in the window:
T = m + k · s, (1)

where k is a constant set to −0.2. The value of k is used to determine how much of the total print object boundary is taken as a part of the given object. This method can distinguish the object from the background effectively in the areas close to the objects. The results are not very sensitive to the window size as long as the window covers at least 1–2 characters. However, noise that is present in the background remains dominant in the final binary image. Consequently, if the objects are sparse in an image, a lot of background noise will be left.
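As a concrete illustration, here is a minimal Python sketch of Niblack's rule; NumPy and SciPy, the 25-pixel window and the dark-text foreground convention are our assumptions for the example, not specifications from the paper:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def niblack_binarize(img, window=25, k=-0.2):
    """Binarize with Niblack's rule T = m + k*s (Eq. (1)).

    m and s are the local mean and standard deviation computed over a
    sliding window; pixels darker than T are taken as foreground (text).
    """
    img = img.astype(np.float64)
    m = uniform_filter(img, size=window)          # local mean m
    m2 = uniform_filter(img ** 2, size=window)    # local mean of squares
    s = np.sqrt(np.maximum(m2 - m ** 2, 0.0))     # local standard deviation s
    return (img <= m + k * s).astype(np.uint8)    # 1 = foreground
```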
Sauvola and Pietikainen [11] propose a method that solves this problem by adding a hypothesis on the gray values of text and background pixels (text pixels have gray values near 0 and background pixels have gray values near 255), which results in the following formula for the threshold:

T = m · (1 − k · (1 − s/R)), (2)

where R is the dynamic range of the standard deviation, fixed to 128, and k takes positive values (usually set to 0.5). This method gives better results for document images.
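Sauvola's rule differs from Niblack's only in how the local statistics enter the threshold. A matching sketch, under the same assumptions as above (the helper name is ours):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sauvola_threshold(img, window=25, k=0.5, R=128.0):
    """Per-pixel Sauvola threshold T = m * (1 - k*(1 - s/R)) (Eq. (2))."""
    img = img.astype(np.float64)
    m = uniform_filter(img, size=window)          # local mean m
    m2 = uniform_filter(img ** 2, size=window)    # local mean of squares
    s = np.sqrt(np.maximum(m2 - m ** 2, 0.0))     # local standard deviation s
    return m * (1.0 - k * (1.0 - s / R))
```

A pixel is then classified as text when its gray value falls below the returned threshold.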
Kim et al. [7] propose a local adaptive thresholding method in which an image is regarded as a 3D terrain whose local property is characterized by a water flow model. The water flow model locally detects the valleys corresponding to regions that are lower than neighboring regions. The deep valleys are filled with dropped water whereas the smooth plain regions remain dry. The final step of this method applies a global thresholding technique, such as Otsu's method [2], to the difference image between the original terrain and the water-filled terrain. A shortcoming of this method is that two critical parameters, namely the amount of rainfall w and the mask size s, must be selected on an experimental basis.
3. Methodology
The proposed methodology for degraded and poor quality document image binarization, text preservation and enhancement is illustrated in Fig. 1 and fully described in this section.

Fig. 1. Block diagram of the proposed methodology for low quality historical document text preservation.

Fig. 2. Image pre-processing: (a) original image; (b) 3×3 Wiener filtering.

3.1. Pre-processing
For degraded and poor quality documents, a pre-processing stage of the grayscale source image is essential for the elimination of noisy areas, the smoothing of background texture, and contrast enhancement between background and text areas. The use of a low-pass Wiener filter [16] has proved efficient for these goals. The Wiener filter is commonly used in filtering theory for image restoration. Our pre-processing module involves an adaptive Wiener method based on statistics estimated from a local neighborhood around each pixel. The grayscale source image I_s is transformed to the filtered grayscale image I according to the following formula:
I(x, y) = μ + (σ² − ν²)(I_s(x, y) − μ)/σ², (3)

where μ is the local mean, σ² the variance in a 3×3 neighborhood around each pixel, and ν² the average of all estimated variances for each pixel in the neighborhood. Fig. 2 shows the result of applying a 3×3 Wiener filter to a document image.
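As an aside, SciPy's scipy.signal.wiener implements exactly this local-statistics form of the adaptive Wiener filter, so the pre-processing step can be sketched in one call (the wrapper name preprocess is ours):

```python
import numpy as np
from scipy.signal import wiener

def preprocess(img_gray):
    """3x3 adaptive Wiener denoising in the spirit of Eq. (3).

    scipy.signal.wiener estimates a local mean and variance around each
    pixel and, when no noise power is given, uses the average of the
    local variances as the noise estimate, matching the text above.
    """
    return wiener(img_gray.astype(np.float64), mysize=3)
```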
3.2. Rough estimation of foreground regions

At this step, we obtain a rough estimation of foreground (text) regions. Our intention is to proceed to an initial segmentation of foreground and background regions that provides us with a superset of the correct set of foreground pixels; this set is refined at a later step (Section 3.3). Sauvola's approach for adaptive thresholding [11] with k = 0.2 is suitable for this case (see Fig. 3). At this step, we process image I(x, y) in order to extract the binary image S(x, y), where 1's correspond to the roughly estimated foreground regions.

Fig. 3. Adaptive thresholding using Sauvola's approach: (a) original image; (b) rough estimation of foreground regions.
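Under the dark-text convention of the sauvola_threshold sketch from Section 2 (an illustrative helper, not the paper's code), this step reduces to a single comparison:

```python
# Rough foreground mask S: 1 where the pre-processed image I falls
# below the Sauvola threshold computed with k = 0.2.
S = (I <= sauvola_threshold(I, k=0.2)).astype(np.uint8)
```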
3.3. Background surface estimation

At this stage, we compute an approximate background surface B(x, y) of the image I(x, y). A similar approach has been proposed for the binarization of camera images [17]. Background surface estimation is guided by the image S(x, y). For pixels that correspond to 0's in S(x, y), the corresponding value of B(x, y) equals I(x, y). For the remaining pixels, the value of B(x, y) is computed by interpolation from neighboring background pixels, as described in

B(x, y) = I(x, y) if S(x, y) = 0,

B(x, y) = [ Σ_{ix=x−dx}^{x+dx} Σ_{iy=y−dy}^{y+dy} I(ix, iy)(1 − S(ix, iy)) ] / [ Σ_{ix=x−dx}^{x+dx} Σ_{iy=y−dy}^{y+dy} (1 − S(ix, iy)) ] if S(x, y) = 1. (4)

Fig. 4. Background surface estimation: (a) image I; (b) background surface B.
The interpolation window of size dx × dy is defined so as to cover at least two image characters. An example of the background surface estimation is shown in Fig. 4.
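A direct, if slow, transcription of Eq. (4) follows; the border clipping and the explicit Python loop are our simplifications, not part of the paper:

```python
import numpy as np

def background_surface(I, S, dx, dy):
    """Background surface B of Eq. (4).

    I: pre-processed grayscale image; S: rough foreground mask (1 = text);
    dx, dy: interpolation half-window, chosen to cover about two characters.
    """
    H, W = I.shape
    B = I.astype(np.float64).copy()            # where S = 0: B(x,y) = I(x,y)
    for y, x in np.argwhere(S == 1):           # rows are y, columns are x
        y0, y1 = max(0, y - dy), min(H, y + dy + 1)
        x0, x1 = max(0, x - dx), min(W, x + dx + 1)
        w = 1 - S[y0:y1, x0:x1]                # weight background neighbors only
        if w.sum() > 0:
            B[y, x] = (I[y0:y1, x0:x1] * w).sum() / w.sum()
    return B
```

In practice both window sums can be computed for all pixels at once with box filters, which removes the per-pixel loop.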
3.4. Final thresholding

In this step, we proceed to the final thresholding by combining the calculated background surface B(x, y) with the preprocessed image I(x, y). Text areas are located where the distance of the preprocessed image I(x, y) from the calculated background B(x, y) exceeds a threshold d. We suggest that the threshold d must change according to the gray-scale value of the background surface B(x, y) in order to preserve textual information even in very dark background areas. For this reason, we propose a threshold d that takes smaller values for darker regions. The final binary image T(x, y) is given by the following formula:
T(x, y) = 1 if B(x, y) − I(x, y) > d(B(x, y)), and T(x, y) = 0 otherwise. (5)
A typical histogram of a document image (see Fig. 5) has two peaks: one corresponds to text regions and the other to background regions. Note that we consider gray-value document images in the range [0, 255], in which textual information has a low gray-level profile. The average distance δ between the foreground and the background can be calculated by the following formula:
δ = [ Σ_x Σ_y (B(x, y) − I(x, y)) ] / [ Σ_x Σ_y S(x, y) ]. (6)

In the case of document images with uniform illumination, the minimum threshold d between text and background pixels can be defined as q · δ, where q is a weighting parameter. Fixing the value of q at 0.8, we achieve total character body preservation that leads to successful OCR results [17]. At the low contrast regions that appear in degraded and poor quality documents, we require a smaller value for the threshold d. To achieve this, we first compute the average value b of the background surface B over the background regions of S (i.e., where S(x, y) = 0):
b = [ Σ_x Σ_y B(x, y)(1 − S(x, y)) ] / [ Σ_x Σ_y (1 − S(x, y)) ]. (7)
We require that the threshold be equal to q · δ when the background value is high (greater than the average background value b of Eq. (7)) and equal to p2 · q · δ when the background value is low (less than p1 · b), with p1, p2 ∈ [0, 1]. To simulate this requirement, we use the following logistic sigmoid function, which exhibits the desired saturation behavior for large and small values of the background, as shown in Fig. 6:
d(B(x, y)) = q · δ · [ (1 − p2) / (1 + exp(−4B(x, y)/(b(1 − p1)) + 2(1 + p1)/(1 − p1))) + p2 ]. (8)
After experimental work on degraded and poor quality document images, we suggest the following parameter values: q = 0.6, p1 = 0.5, p2 = 0.8.
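Eqs. (5)–(8) combine into a short routine (the upsampling of Section 3.5 is omitted); this is a sketch under our naming, with the parameter values suggested above as defaults:

```python
import numpy as np

def final_binarization(I, S, B, q=0.6, p1=0.5, p2=0.8):
    """Final thresholding of Eqs. (5)-(8), without upsampling."""
    I = I.astype(np.float64)
    B = B.astype(np.float64)
    delta = (B - I).sum() / S.sum()        # average fg/bg distance, Eq. (6)
    bg = 1.0 - S
    b = (B * bg).sum() / bg.sum()          # average background value, Eq. (7)
    # Sigmoid of Eq. (8): d saturates at q*delta for bright background
    # and at p2*q*delta for dark background.
    expo = -4.0 * B / (b * (1.0 - p1)) + 2.0 * (1.0 + p1) / (1.0 - p1)
    d = q * delta * ((1.0 - p2) / (1.0 + np.exp(expo)) + p2)
    return ((B - I) > d).astype(np.uint8)  # Eq. (5): 1 = text
```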
3.5. Upsampling

In order to achieve a better quality binary image, we incorporate an efficient upsampling technique into the previous step. Among the available image upsampling techniques, bicubic interpolation is the most common one that provides satisfactory results [18]. It estimates the value at a pixel of the destination image from an average of 16 pixels surrounding the closest corresponding pixel in the source image. Bicubic interpolation for image upsampling can be incorporated in our algorithm by substituting formula (5) with the following formula:
T(x′, y′) = 1 if B(x, y) − I_u(x′, y′) > d(B(x, y)), and T(x′, y′) = 0 otherwise, (9)

where T(x′, y′) is the binary image of size M times the size of the original gray-scale image, x = (int)(x′/M), y = (int)(y′/M), and I_u is given by the following formula:

I_u(x′, y′) = −b(1 − b)² F(x′, y − 1) + (1 − 2b² + b³) F(x′, y) + b(1 + b − b²) F(x′, y + 1) − b²(1 − b) F(x′, y + 2), (10)
Fig. 5. Document image histogram: (a) original image; (b) gray level histogram.
Fig. 6. Function d(B(x, y)).
where F(·) is calculated as follows:

F(x′, m) = −a(1 − a)² I(x − 1, m) + (1 − 2a² + a³) I(x, m) + a(1 + a − a²) I(x + 1, m) − a²(1 − a) I(x + 2, m), (11)

where a = (x′/M) − x and b = (y′/M) − y.
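The interpolation can be transcribed directly from Eqs. (10) and (11). The sketch below computes a single destination sample; interior coordinates (no border handling) and I[row, col] indexing are our simplifications:

```python
def bicubic_sample(I, xp, yp, M):
    """I_u(x', y') of Eqs. (10)-(11) for an M-times upscaled grid."""
    x, y = int(xp / M), int(yp / M)
    a, b = xp / M - x, yp / M - y              # fractional offsets

    def F(m):                                  # Eq. (11): interpolate along x in row m
        return (-a * (1 - a) ** 2 * I[m, x - 1]
                + (1 - 2 * a ** 2 + a ** 3) * I[m, x]
                + a * (1 + a - a ** 2) * I[m, x + 1]
                - a ** 2 * (1 - a) * I[m, x + 2])

    return (-b * (1 - b) ** 2 * F(y - 1)       # Eq. (10): blend four rows
            + (1 - 2 * b ** 2 + b ** 3) * F(y)
            + b * (1 + b - b ** 2) * F(y + 1)
            - b ** 2 * (1 - b) * F(y + 2))
```

Eq. (9) then sets T(x′, y′) = 1 whenever B(x, y) − bicubic_sample(I, x′, y′, M) exceeds d(B(x, y)).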
In most cases, a double-sized resulting binary image (M = 2) is adequate for an improved quality result.

3.6. Post-processing
In the final step, we post-process the resulting binary image in order to eliminate noise, improve the quality of text regions and preserve stroke connectivity, by removing isolated pixels and filling possible breaks, gaps or holes. A detailed step-by-step description of the post-processing algorithm, which consists of a successive application of shrink and swell filtering [19], follows.
Step 1: A shrink filter is used to remove noise from the background. The entire binary image is scanned and each foreground pixel is examined. If Psh is the number of background pixels in a sliding n × n window centered on the foreground pixel, then this pixel is changed to background if Psh > ksh, where ksh is defined experimentally.
Step 2: A swell filter is used to fill possible breaks, gaps or holes in the foreground. The entire binary image is scanned and each background pixel is examined. If Psw is the number of foreground pixels in a sliding n × n window centered on the background pixel (x, y), and xa, ya are the average coordinates of all foreground pixels in the n × n window, then this pixel is changed to foreground if Psw > ksw and |x − xa| < dx and |y − ya| < dy. The latter two conditions prevent an increase in the thickness of character strokes, since only background pixels lying among uniformly distributed foreground pixels are flipped.
Step 3: An extension of the above conditions leads to a further application of a swell filter, used to improve the quality of the character strokes. The entire binary image is scanned and each background pixel is examined. If Psw1 is the number of foreground pixels in a sliding n × n window centered on the background pixel, then this pixel is changed to foreground if Psw1 > ksw1.
All the parameters used at this step depend on the average character height lh, which is estimated using connected component analysis: a height histogram of the connected components is formed and the largest peak is selected as the average character height. After experimental work on a representative training set, we suggest the following parameter values for the post-processing phase: n = 0.15 lh, ksh = 0.9 n², ksw = 0.05 n², dx = dy = 0.25 n, ksw1 = 0.35 n². An example of a resulting binary image after the post-processing steps is given in Fig. 7.
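The shrink and swell passes of Steps 1 and 3 can be sketched with box-filter counting; the Step 2 centroid conditions (|x − xa| < dx, |y − ya| < dy) are omitted for brevity, and n is assumed to be rounded to an odd integer:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def shrink(T, n, k_sh):
    """Step 1: a foreground pixel becomes background if more than k_sh
    of its n x n neighbors are background (noise removal)."""
    P_sh = uniform_filter((T == 0).astype(np.float64), size=n) * n * n
    out = T.copy()
    out[(T == 1) & (P_sh > k_sh)] = 0
    return out

def swell(T, n, k_sw):
    """Step 3: a background pixel becomes foreground if more than k_sw
    of its n x n neighbors are foreground (stroke quality)."""
    P_sw = uniform_filter((T == 1).astype(np.float64), size=n) * n * n
    out = T.copy()
    out[(T == 0) & (P_sw > k_sw)] = 1
    return out
```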
4. Experimental results
The proposed algorithm was tested on degraded document images belonging to three distinct categories: historical handwritten documents, old newspapers and poor quality modern documents. All images have varying resolutions, stroke sizes, illumination contrast, background complexity and noise levels. The historical handwritten documents were selected from the Library of Congress on-line database [20], from the Mount Sinai Foundation Collection [21], from the Bodleian Library [22], as well as from private collections. All historical handwritten document images are of poor quality and exhibit shadows, non-uniform illumination, smear and strain. In some of these documents, ink has seeped through the pages, so that characters on the reverse side become visible and interfere with the characters on the front side. Example historical handwritten documents used for testing are shown in Fig. 8. Old newspaper images come from the Library of Congress on-line database [20] and suffer from problems similar to those of the historical documents. Additionally, old newspaper images carry extra noise due to the quality of the old printing matrix or ink diffusion. Example old newspaper documents used for testing are shown in Fig. 9. Poor quality modern documents originate from the MediaTeam Oulu Document Database [23] as well as from recent scannings of books and magazines. All modern documents are degraded in ways that make it difficult to distinguish text from background regions.

Fig. 7. Post-processing stage: (a) resulting binary image after final thresholding; (b) resulting image after post-processing Step 1; (c) resulting image after post-processing Step 2; (d) resulting image after post-processing Step 3.

Fig. 8. Examples from the handwritten historical documents used for testing.

Fig. 9. Examples from the old newspapers used for testing.
We compared the performance of our algorithm with four (4) well-known binarization techniques. We evaluated the following: Otsu's global thresholding method [2], Niblack's adaptive thresholding method [8], Sauvola et al. adaptive