Adaptive Multi-Column Deep Neural Networks
with Application to Robust Image Denoising

Forest Agostinelli    Michael R. Anderson    Honglak Lee
Division of Computer Science and Engineering
University of Michigan
Ann Arbor, MI 48109, USA
{agostifo,mrander,honglak}@umich.edu
Abstract
Stacked sparse denoising autoencoders (SSDAs) have recently been shown to be successful at removing noise from corrupted images. However, like most denoising techniques, the SSDA is not robust to variation in noise types beyond what it has seen during training. To address this limitation, we present the adaptive multi-column stacked sparse denoising autoencoder (AMC-SSDA), a novel technique of combining multiple SSDAs by (1) computing optimal column weights via solving a nonlinear optimization program and (2) training a separate network to predict the optimal weights. We eliminate the need to determine the type of noise, let alone its statistics, at test time and even show that the system can be robust to noise not seen in the training set. We show that state-of-the-art denoising performance can be achieved with a single system on a variety of different noise types. Additionally, we demonstrate the efficacy of AMC-SSDA as a preprocessing (denoising) algorithm by achieving strong classification performance on corrupted MNIST digits.
1 Introduction
Digital images are often corrupted with noise during acquisition and transmission, degrading performance in later tasks such as image recognition and medical diagnosis. Many denoising algorithms have been proposed to improve the accuracy of these tasks when corrupted images must be used. However, most of these methods are carefully designed only for a certain type of noise or require assumptions about the statistical properties of the corrupting noise.
For instance, the Wiener filter [28] is an optimal linear filter in the sense of minimum mean-square error and performs very well at removing speckle and Gaussian noise, but the input signal and noise are assumed to be wide-sense stationary processes, and known autocorrelation functions of the input are required [7]. Median filtering outperforms linear filtering for suppressing noise in images with edges and gives good output for salt & pepper noise [2], but it is not as effective for the removal of additive Gaussian noise [1]. Periodic noise such as scan-line noise is difficult to eliminate using spatial filtering but is relatively easy to remove using Fourier-domain band-stop filters once the period of the noise is known [6].
Much of this research has taken place in the field of medical imaging, most recently because of a drive to reduce patient radiation exposure. As radiation dose is decreased, noise levels in medical images increase [12, 15], so noise reduction techniques have been key to maintaining image quality while improving patient safety [25]. In this application, assumptions must also be made or statistical properties must also be determined for these techniques to perform well [24].
Recently, various types of neural networks have been evaluated for their denoising efficacy. Xie et al. [29] had success at removing noise from corrupted images with the stacked sparse denoising autoencoder (SSDA). The SSDA is trained on images corrupted with a particular noise type, so it too has a dependence on a priori knowledge about the general nature of the noise.
In this paper, we present the adaptive multi-column stacked sparse denoising autoencoder (AMC-SSDA), a method to improve the SSDA's robustness to various noise types. In the AMC-SSDA, columns of single-noise SSDAs are run in parallel and their outputs are linearly combined to produce the final denoised image. Taking advantage of the sparse autoencoder's capability for learning features, the features encoded by the hidden layers of each SSDA are supplied to an additional network to determine the optimal weighting for each column in the final linear combination.
We demonstrate that a single AMC-SSDA network provides better denoising results both for noise types present in the training set and for noise types not seen by the denoiser during training. A given instance of noise corruption might have features in common with one or more of the training-set noise types, allowing the best combination of denoisers to be chosen based on that image's specific noise characteristics. With our method, we eliminate the need to determine the type of noise, let alone its statistics, at test time. Additionally, we demonstrate the efficacy of AMC-SSDA as a preprocessing (denoising) algorithm by achieving strong classification performance on corrupted MNIST digits. We will make our code available for reproducible research.
2 Related work
Numerous approaches have been proposed for image denoising using signal processing techniques (e.g., see [21, 8] for a survey). Some methods transfer the image signal to an alternative domain where noise can be easily separated from the signal [23, 19]. Portilla et al. [23] proposed a wavelet-based Bayes Least Squares with a Gaussian Scale-Mixture (BLS-GSM) method. More recent approaches exploit the "non-local" statistics of images: different patches in the same image are often similar in appearance, and thus they can be used together in denoising [11, 20, 8]. This class of algorithms (BM3D [11] in particular) represents the current state-of-the-art in natural image denoising; however, it is targeted primarily toward Gaussian noise. In our preliminary evaluation, BM3D did not perform well on many of the noise types we considered.
While BM3D is a well-engineered algorithm, Burger et al. [9] showed that it is possible to achieve state-of-the-art denoising performance with a plain multi-layer perceptron (MLP) that maps noisy patches onto noise-free ones, once the capacity of the MLP, the patch size, and the training set are large enough. Therefore, neural networks indeed have great potential for image denoising. Vincent et al. [27] introduced stacked denoising autoencoders as a way of providing a good initial representation of the data in deep networks for classification tasks. Our proposed AMC-SSDA builds upon this work by using the denoising autoencoder's internal representation to determine the optimal column weighting for robust denoising.
Cireşan et al. [10] presented a multi-column approach for image classification, averaging the output of several deep neural networks (or columns) trained on inputs preprocessed in different ways. However, based on our experiments, this approach (i.e., simply averaging the output of each column) is not robust in denoising, since each column has been trained on a different type of noise. To address this problem, we propose an adaptive weighting scheme that can handle a variety of noise types. Jain et al. [17] used deep convolutional neural networks for image denoising. Rather than using a convolutional approach, our proposed method applies multiple sparse autoencoder networks in combination to the denoising task. Tang et al. [26] applied deep learning (i.e., extensions of the deep belief network with local receptive fields) to denoising and classifying MNIST digits. In comparison, we achieve favorable classification performance on corrupted MNIST digits.
3 Algorithm
In this section, we first describe the SSDA [29]. Then we present the AMC-SSDA and describe the process of finding optimal column weights and predicting column weights for test images.
3.1 Stacked sparse denoising autoencoders
A denoising autoencoder (DA) [27] is typically used as a way to pre-train layers in a deep neural network, avoiding the difficulty of training such a network as a whole from scratch by performing greedy layer-wise pre-training (e.g., [4, 5, 13]). As Xie et al. [29] showed, a denoising autoencoder is also a natural fit for denoising tasks, due to its behavior of taking a noisy signal as input and reconstructing the original, clean signal as output.
Commonly, a series of DAs are connected to form a stacked denoising autoencoder (SDA), a deep network formed by feeding the hidden layer's activations of one DA into the input of the next DA. Typically, SDAs are pre-trained in an unsupervised fashion, where each DA layer is trained by generating new noise [27]. We follow Xie et al.'s method of SDA training by calculating the first-layer activations for both the clean input and noisy input to use as training data for the second layer. As they showed, this modification to the training process allows the SDA to better learn the features for denoising the original corrupting noise.
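The second-layer training-data construction described above can be sketched as follows; the function name and NumPy representation are illustrative stand-ins, not the authors' code.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def second_layer_training_data(X_noisy, Y_clean, W1, b1):
    """Xie et al.-style layer-wise training data: instead of corrupting
    fresh inputs, the layer-2 DA is trained to map first-layer activations
    of the noisy input onto first-layer activations of the clean input.
    X_noisy, Y_clean: (N, D) patches; W1: (K, D), b1: (K,) from layer 1."""
    H_input = sigmoid(X_noisy @ W1.T + b1)    # layer-2 DA inputs
    H_target = sigmoid(Y_clean @ W1.T + b1)   # layer-2 DA reconstruction targets
    return H_input, H_target
```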
More formally, let y ∈ R^D be an instance of uncorrupted data and x ∈ R^D be the corrupted version of y. We can define the feedforward functions of the DA with K hidden units as follows:
h(x) = f(Wx + b)    (1)
ŷ(x) = g(W′h + b′)    (2)

where f(·) and g(·) are respectively the encoding and decoding functions (for which the sigmoid function σ(s) = 1/(1 + exp(−s)) is often used),¹ W ∈ R^{K×D} and b ∈ R^K are the encoding weights and biases, and W′ ∈ R^{D×K} and b′ ∈ R^D are the decoding weights and biases. h(x) ∈ R^K is the hidden layer's activation, and ŷ(x) ∈ R^D is the reconstruction of the input (i.e., the DA's output). Given training data D = {(x_1, y_1), ..., (x_N, y_N)} with N training examples, the DA is trained by backpropagation to minimize the sparsity-regularized reconstruction loss given by
L_DA(D; Θ) = (1/N) ∑_{i=1}^{N} ‖y_i − ŷ(x_i)‖²₂ + β ∑_{j=1}^{K} KL(ρ ‖ ρ̂_j) + (λ/2) (‖W‖²_F + ‖W′‖²_F)    (3)
where Θ = {W, b, W′, b′} are the parameters of the model, and the sparsity-inducing term KL(ρ ‖ ρ̂_j) is the Kullback-Leibler divergence between ρ (the target activation) and ρ̂_j (the empirical average activation of the j-th hidden unit):

KL(ρ ‖ ρ̂_j) = ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j)),   where   ρ̂_j = (1/N) ∑_{i=1}^{N} h_j(x_i)    (4)
and λ, β, and ρ are scalar-valued hyperparameters determined by cross-validation.
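Eqs. (1)-(4) can be sketched directly in NumPy. This is a minimal illustration, assuming sigmoid encoding and decoding; the hyperparameter defaults are placeholders, not the values used in the paper.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def da_forward(x, W, b, W2, b2):
    """Feedforward pass of a DA, Eqs. (1)-(2), with sigmoid f and g.
    x: (D,) corrupted input; W: (K, D), b: (K,); W2: (D, K), b2: (D,)."""
    h = sigmoid(W @ x + b)        # hidden activation, Eq. (1)
    y_hat = sigmoid(W2 @ h + b2)  # reconstruction, Eq. (2)
    return h, y_hat

def da_loss(X, Y, W, b, W2, b2, beta=0.01, lam=1e-4, rho=0.05):
    """Sparsity-regularized reconstruction loss, Eqs. (3)-(4), over a batch.
    X: (N, D) corrupted inputs; Y: (N, D) clean targets."""
    H = sigmoid(X @ W.T + b)            # (N, K) hidden activations
    Y_hat = sigmoid(H @ W2.T + b2)      # (N, D) reconstructions
    recon = np.mean(np.sum((Y - Y_hat) ** 2, axis=1))
    rho_hat = H.mean(axis=0)            # empirical activation per hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    decay = 0.5 * lam * (np.sum(W ** 2) + np.sum(W2 ** 2))
    return recon + beta * kl + decay
```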
In this work, two DAs are stacked as shown in Figure 1a, where the activation of the first DA's hidden layer provides the input to the second DA, which in turn provides the input to the output layer of the first DA. This entire network (the SSDA) is trained again by backpropagation in a fine-tuning stage, minimizing the loss given by
L_SSDA(D; Θ) = (1/N) ∑_{i=1}^{N} ‖y_i − ŷ(x_i)‖²₂ + (λ/2) ∑_{l=1}^{2L} ‖W^(l)‖²_F    (5)
where L is the number of stacked DAs (we used L = 2 in our experiments), and W^(l) denotes the weights for the l-th layer in the stacked deep network.² The sparsity-inducing term is not needed for this step because sparsity was already incorporated in the pre-trained DAs. Our experiments show that there is not a significant change in performance when sparsity is included.
3.2 Adaptive Multi-Column SSDA
The adaptive multi-column SSDA is the linear combination of several SSDAs, or columns, each trained on a single type of noise, using optimized weights determined by the features of each given input image. Taking advantage of the SSDA's capability for feature learning, we use the features generated by the activation of the SSDA's hidden layers as inputs to a neural-network-based regression component, referred to here as the weight prediction module. As shown in Figure 1b, this module then uses these features to compute the optimal weights used to linearly combine the column outputs into a weighted average.
¹ In particular, the sigmoid function is often used for decoding the input data when their values are bounded between 0 and 1. For general cases, other types of functions (such as tanh, rectified linear, or linear functions) can be used.
² After pre-training, we initialized W^(1) and W^(4) from the encoding and decoding weights of the first-layer DA, and W^(2) and W^(3) from the encoding and decoding weights of the second-layer DA, respectively.
Additionally, the output of each column for each image is collected into a matrix Ŷ = [ŷ_1, ..., ŷ_C] ∈ R^{D×C}, with each column being the output ŷ_c of one of the SSDA columns. To determine the ideal linear weighting of the SSDA columns for a given image, we solve the following minimization (a quadratic program):³
minimize_{s_c}   ‖Ŷs − y‖²    (6)
subject to   0 ≤ s_c ≤ 1, ∀c    (7)
             1 − δ ≤ ∑_{c=1}^{C} s_c ≤ 1 + δ    (8)
Here s ∈ R^C is the vector of weights s_c corresponding to each SSDA column c. Constraining the weights between 0 and 1 was shown to allow for better weight predictions by reducing overfitting. The constraint in Eq. (8) helps to avoid degenerate cases where weights for very bright or dark spots would otherwise be very high or low. Although making the weights sum exactly to one is more intuitive, we found that the performance slightly improved when given some flexibility, as shown in Eq. (8). For our experiments, δ = 0.05 is used.

³ In addition to the L2 error shown in Equation (6), we also tested minimizing the L1 distance as the error function, a standard method in the related field of image registration [3]. The version using the L1 error performed slightly better in our noisy digit classification task, suggesting that the loss function might need to be tuned to the task and images at hand.

Noise Type       Parameter   Parameter values
Gaussian         σ²          0.02, 0.06, 0.10, 0.14, 0.18, 0.22, 0.26
Speckle          ρ           0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35
Salt & Pepper    ρ           0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35

Table 1: SSDA training noises in the 21-column AMC-SSDA. ρ is the noise density.
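The constrained minimization in Eqs. (6)-(8) can be sketched with a general-purpose solver. SLSQP here is a stand-in: the text does not specify which QP solver was used, and the helper name is ours.

```python
import numpy as np
from scipy.optimize import minimize

def optimal_column_weights(Y_hat, y, delta=0.05):
    """Solve Eqs. (6)-(8): minimize ||Y_hat @ s - y||^2 over s, with
    0 <= s_c <= 1 for every column c and 1-delta <= sum(s) <= 1+delta.
    Y_hat: (D, C) matrix of column outputs; y: (D,) clean image."""
    C = Y_hat.shape[1]
    objective = lambda s: np.sum((Y_hat @ s - y) ** 2)
    bounds = [(0.0, 1.0)] * C                       # Eq. (7)
    constraints = [                                 # Eq. (8) as two inequalities
        {"type": "ineq", "fun": lambda s: np.sum(s) - (1.0 - delta)},
        {"type": "ineq", "fun": lambda s: (1.0 + delta) - np.sum(s)},
    ]
    s0 = np.full(C, 1.0 / C)   # start from a plain average of the columns
    res = minimize(objective, s0, method="SLSQP", bounds=bounds,
                   constraints=constraints)
    return res.x
```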
3.2.3 Learning to predict optimal column weights via RBF networks
The final training phase is to train the weight prediction module. A radial basis function (RBF) network is trained to take the feature vector φ as input and produce a weight vector s, using the optimal-weight training set described in Section 3.2.2. An RBF network was chosen for our experiments because of its known performance in function approximation [22]. However, other function approximation techniques could be used in this step.
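As a rough stand-in for this module, a Gaussian-kernel RBF interpolant can play the role of the RBF network. The feature vectors and weight targets below are synthetic placeholders for the optimal-weight training set, not data from the paper.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Synthetic placeholders: phi_train holds M feature vectors of length F,
# s_train the corresponding C optimal column weights per example.
rng = np.random.default_rng(0)
M, F, C = 50, 6, 3
phi_train = rng.normal(size=(M, F))
s_train = np.clip(rng.normal(0.5, 0.2, size=(M, C)), 0.0, 1.0)

# A Gaussian-kernel RBF interpolant acting as the weight prediction module.
weight_predictor = RBFInterpolator(phi_train, s_train,
                                   kernel="gaussian", epsilon=1.0)

s_pred = weight_predictor(phi_train[:1])   # predicted weights, shape (1, C)
```

A trained RBF network with learned centers would generalize better than exact interpolation; the sketch only illustrates the input/output contract of the module.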
3.2.4 Denoising with the AMC-SSDA
Once training has been completed, the AMC-SSDA is ready for use. A noisy image x is supplied as input to each of the columns, which together produce the output matrix Ŷ, each column of which is the output of a particular column of the AMC-SSDA. The feature vector φ is created from the activation of the hidden layers of each SSDA (as described in Section 3.2.2) and fed into the weight prediction module (as described in Section 3.2.3), which then computes the predicted column weights s*. The final denoised image ŷ is produced by linearly combining the columns using these weights: ŷ = Ŷs*.⁴
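Putting the pieces together, the test-time procedure amounts to the following sketch, where `columns` and `predict_weights` are hypothetical interfaces standing in for the trained SSDAs and the trained weight prediction module.

```python
import numpy as np

def amc_ssda_denoise(x, columns, predict_weights):
    """AMC-SSDA test-time denoising: run every SSDA column on the noisy
    image x, build the feature vector phi from the columns' hidden
    activations, predict the column weights, and linearly combine.
    `columns`: list of (denoise_fn, features_fn) pairs;
    `predict_weights`: maps a feature vector to a (C,) weight vector."""
    Y_hat = np.column_stack([denoise(x) for denoise, _ in columns])  # (D, C)
    phi = np.concatenate([features(x) for _, features in columns])   # feature vector
    s_star = predict_weights(phi)                                    # (C,) weights
    return Y_hat @ s_star                                            # y_hat = Y s*
```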
4 Experiments
We performed a number of denoising tasks by corrupting and denoising images of computed tomography (CT) scans of the head from the Cancer Imaging Archive [16] (Section 4.1). Quantitative evaluation of denoising results was performed using the peak signal-to-noise ratio (PSNR), a standard metric for evaluating denoising performance. PSNR is defined as PSNR = 10 log₁₀(p²_max / σ²_e), where p_max is the maximum possible pixel value and σ²_e is the mean-square error between the noisy and original images. We also tested the AMC-SSDA as a pre-processing step in an image classification task by corrupting the MNIST database of handwritten digits [18] with various types of noise and then denoising and classifying the digits with a classifier trained on the original images (Section 4.2).
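The PSNR definition above is straightforward to compute; this minimal sketch assumes pixel values normalized to [0, p_max].

```python
import numpy as np

def psnr(reference, image, p_max=1.0):
    """Peak signal-to-noise ratio in dB between a reference image and
    the image under evaluation, for pixel values in [0, p_max]."""
    mse = np.mean((reference - image) ** 2)   # sigma_e^2 in the text
    return 10.0 * np.log10(p_max ** 2 / mse)
```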
4.1 Image denoising
To evaluate general denoising performance, images of CT scans of the head were corrupted with seven variations each of Gaussian, salt-and-pepper, and speckle noise, resulting in the 21 noise types shown in Table 1. Twenty-one individual SSDAs were trained on randomly selected 8-by-8-pixel patches from the corrupted images; each SSDA was trained on a single type of noise. These twenty-one SSDAs were used as columns to create an AMC-SSDA.⁵ The testing noise is given in Table 2. The noise was produced using Matlab's imnoise function, with the exception of uniform noise, which was produced with our own implementation. For Poisson noise, the image is divided by λ prior to applying the noise; the result is then multiplied by λ.
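An approximation of these corruption processes can be sketched as follows. The parameter semantics mirror Matlab's imnoise (variance for Gaussian and speckle, density for salt & pepper), and the Poisson λ-scaling follows the description above; this is an illustrative generator, not the authors' exact one.

```python
import numpy as np

def add_noise(img, kind, param, rng=None):
    """Corrupt an image with values in [0, 1]; see Tables 1-2 for the
    parameter ranges. The 255-count Poisson rate is an assumption about
    how [0, 1] intensities map to photon counts."""
    rng = np.random.default_rng() if rng is None else rng
    if kind == "gaussian":            # param = variance sigma^2
        out = img + rng.normal(0.0, np.sqrt(param), img.shape)
    elif kind == "speckle":           # param = variance of multiplicative noise
        a = np.sqrt(3.0 * param)      # uniform(-a, a) has variance param
        out = img + img * rng.uniform(-a, a, img.shape)
    elif kind == "salt_pepper":       # param = noise density rho
        out = img.astype(float).copy()
        mask = rng.random(img.shape) < param
        out[mask] = rng.integers(0, 2, size=int(mask.sum()))  # salt=1, pepper=0
    elif kind == "poisson":           # param = scaling factor lambda
        # Divide by lambda, treat 255*value as a Poisson rate, rescale back.
        out = rng.poisson((img / param) * 255.0) / 255.0 * param
    else:
        raise ValueError(f"unknown noise type: {kind}")
    return np.clip(out, 0.0, 1.0)
```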
To train the weight predictor for the AMC-SSDA, a set of images disjoint from the training set of the individual SSDAs was used. The training images for the AMC-SSDA were corrupted with the same noise types used to train its columns. The AMC-SSDA was tested on another set of images disjoint from both the individual-SSDA and AMC-SSDA training sets. The AMC-SSDA was trained

⁴ We have tried alternatives to this approach. Some of these involved using a single unified network to combine the columns, such as joint training. In our preliminary experiments, these approaches did not yield significant improvements.
⁵ We also evaluated AMC-SSDAs with a smaller number of columns. In general, we achieved better performance with more columns. We discuss its statistical significance later in this section.
