Adaptive Multi-Column Deep Neural Networks
with Application to Robust Image Denoising

Forest Agostinelli    Michael R. Anderson    Honglak Lee
Division of Computer Science and Engineering
University of Michigan
Ann Arbor, MI 48109, USA
{agostifo,mrander,honglak}@umich.edu
Abstract
Stacked sparse denoising autoencoders (SSDAs) have recently been shown to be successful at removing noise from corrupted images. However, like most denoising techniques, the SSDA is not robust to variation in noise types beyond what it has seen during training. To address this limitation, we present the adaptive multi-column stacked sparse denoising autoencoder (AMC-SSDA), a novel technique of combining multiple SSDAs by (1) computing optimal column weights via solving a nonlinear optimization program and (2) training a separate network to predict the optimal weights. We eliminate the need to determine the type of noise, let alone its statistics, at test time and even show that the system can be robust to noise not seen in the training set. We show that state-of-the-art denoising performance can be achieved with a single system on a variety of different noise types. Additionally, we demonstrate the efficacy of AMC-SSDA as a preprocessing (denoising) algorithm by achieving strong classification performance on corrupted MNIST digits.
1 Introduction
Digital images are often corrupted with noise during acquisition and transmission, degrading performance in later tasks such as image recognition and medical diagnosis. Many denoising algorithms have been proposed to improve the accuracy of these tasks when corrupted images must be used. However, most of these methods are carefully designed only for a certain type of noise or require assumptions about the statistical properties of the corrupting noise.
For instance, the Wiener filter [28] is an optimal linear filter in the sense of minimum mean-square error and performs very well at removing speckle and Gaussian noise, but the input signal and noise are assumed to be wide-sense stationary processes, and known autocorrelation functions of the input are required [7]. Median filtering outperforms linear filtering for suppressing noise in images with edges and gives good output for salt & pepper noise [2], but it is not as effective for the removal of additive Gaussian noise [1]. Periodic noise such as scan-line noise is difficult to eliminate using spatial filtering but is relatively easy to remove using Fourier-domain band-stop filters once the period of the noise is known [6].
Much of this research has taken place in the field of medical imaging, most recently because of a drive to reduce patient radiation exposure. As radiation dose is decreased, noise levels in medical images increase [12, 15], so noise reduction techniques have been key to maintaining image quality while improving patient safety [25]. In this application, assumptions must also be made or statistical properties must also be determined for these techniques to perform well [24].
Recently, various types of neural networks have been evaluated for their denoising efficacy. Xie et al. [29] had success at removing noise from corrupted images with the stacked sparse denoising autoencoder (SSDA). The SSDA is trained on images corrupted with a particular noise type, so it too has a dependence on a priori knowledge about the general nature of the noise.
In this paper, we present the adaptive multi-column stacked sparse denoising autoencoder (AMC-SSDA), a method to improve the SSDA's robustness to various noise types. In the AMC-SSDA, columns of single-noise SSDAs are run in parallel and their outputs are linearly combined to produce the final denoised image. Taking advantage of the sparse autoencoder's capability for learning features, the features encoded by the hidden layers of each SSDA are supplied to an additional network to determine the optimal weighting for each column in the final linear combination.
We demonstrate that a single AMC-SSDA network provides better denoising results both for noise types present in the training set and for noise types not seen by the denoiser during training. A given instance of noise corruption might have features in common with one or more of the training-set noise types, allowing the best combination of denoisers to be chosen based on that image's specific noise characteristics. With our method, we eliminate the need to determine the type of noise, let alone its statistics, at test time. Additionally, we demonstrate the efficacy of AMC-SSDA as a preprocessing (denoising) algorithm by achieving strong classification performance on corrupted MNIST digits. We will make our code available for reproducible research.
2 Related work
Numerous approaches have been proposed for image denoising using signal processing techniques (e.g., see [21, 8] for a survey). Some methods transfer the image signal to an alternative domain where noise can be easily separated from the signal [23, 19]. Portilla et al. [23] proposed a wavelet-based Bayes Least Squares with a Gaussian Scale-Mixture (BLS-GSM) method. More recent approaches exploit the "non-local" statistics of images: different patches in the same image are often similar in appearance, and thus they can be used together in denoising [11, 20, 8]. This class of algorithms (BM3D [11] in particular) represents the current state-of-the-art in natural image denoising; however, it is targeted primarily toward Gaussian noise. In our preliminary evaluation, BM3D did not perform well on many of the noise types we considered.
While BM3D is a well-engineered algorithm, Burger et al. [9] showed that it is possible to achieve state-of-the-art denoising performance with a plain multi-layer perceptron (MLP) that maps noisy patches onto noise-free ones, once the capacity of the MLP, the patch size, and the training set are large enough. Therefore, neural networks indeed have great potential for image denoising. Vincent et al. [27] introduced stacked denoising autoencoders as a way of providing a good initial representation of the data in deep networks for classification tasks. Our proposed AMC-SSDA builds upon this work by using the denoising autoencoder's internal representation to determine the optimal column weighting for robust denoising.
Cireşan et al. [10] presented a multi-column approach for image classification, averaging the output of several deep neural networks (or columns) trained on inputs preprocessed in different ways. However, based on our experiments, this approach (i.e., simply averaging the output of each column) is not robust in denoising, since each column has been trained on a different type of noise. To address this problem, we propose an adaptive weighting scheme that can handle a variety of noise types. Jain et al. [17] used deep convolutional neural networks for image denoising. Rather than using a convolutional approach, our proposed method applies multiple sparse autoencoder networks in combination to the denoising task. Tang et al. [26] applied deep learning (i.e., extensions of the deep belief network with local receptive fields) to denoising and classifying MNIST digits. In comparison, we achieve favorable classification performance on corrupted MNIST digits.
3 Algorithm
In this section, we first describe the SSDA [29]. Then we present the AMC-SSDA and describe the process of finding optimal column weights and predicting column weights for test images.
3.1 Stacked sparse denoising autoencoders
A denoising autoencoder (DA) [27] is typically used as a way to pre-train layers in a deep neural network, avoiding the difficulty of training such a network as a whole from scratch by performing greedy layer-wise pre-training (e.g., [4, 5, 13]). As Xie et al. [29] showed, a denoising autoencoder is also a natural fit for denoising tasks, due to its behavior of taking a noisy signal as input and reconstructing the original, clean signal as output.
Commonly, a series of DAs are connected to form a stacked denoising autoencoder (SDA), a deep network formed by feeding the hidden layer's activations of one DA into the input of the next DA. Typically, SDAs are pre-trained in an unsupervised fashion, where each DA layer is trained by generating new noise [27]. We follow Xie et al.'s method of SDA training by calculating the first-layer activations for both the clean input and noisy input to use as training data for the second layer. As they showed, this modification to the training process allows the SDA to better learn the features for denoising the original corrupting noise.
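The second-layer training-data construction described above can be sketched as follows; the function name and NumPy representation are illustrative stand-ins, not the authors' code.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def second_layer_training_data(X_noisy, Y_clean, W1, b1):
    """Xie et al.-style layer-wise training data: instead of corrupting
    fresh inputs, the layer-2 DA is trained to map first-layer activations
    of the noisy input onto first-layer activations of the clean input.
    X_noisy, Y_clean: (N, D) patches; W1: (K, D), b1: (K,) from layer 1."""
    H_input = sigmoid(X_noisy @ W1.T + b1)    # layer-2 DA inputs
    H_target = sigmoid(Y_clean @ W1.T + b1)   # layer-2 DA reconstruction targets
    return H_input, H_target
```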
More formally, let y ∈ R^D be an instance of uncorrupted data and x ∈ R^D be the corrupted version of y. We can define the feedforward functions of the DA with K hidden units as follows:
h(x) = f(Wx + b)    (1)
ŷ(x) = g(W′h + b′)    (2)

where f(·) and g(·) are respectively the encoding and decoding functions (for which the sigmoid function σ(s) = 1/(1 + exp(−s)) is often used),¹ W ∈ R^{K×D} and b ∈ R^K are the encoding weights and biases, and W′ ∈ R^{D×K} and b′ ∈ R^D are the decoding weights and biases. h(x) ∈ R^K is the hidden layer's activation, and ŷ(x) ∈ R^D is the reconstruction of the input (i.e., the DA's output). Given training data D = {(x_1, y_1), ..., (x_N, y_N)} with N training examples, the DA is trained by backpropagation to minimize the sparsity-regularized reconstruction loss given by
L_DA(D; Θ) = (1/N) ∑_{i=1}^{N} ‖y_i − ŷ(x_i)‖²₂ + β ∑_{j=1}^{K} KL(ρ ‖ ρ̂_j) + (λ/2) (‖W‖²_F + ‖W′‖²_F)    (3)
where Θ = {W, b, W′, b′} are the parameters of the model, and the sparsity-inducing term KL(ρ ‖ ρ̂_j) is the Kullback-Leibler divergence between ρ (the target activation) and ρ̂_j (the empirical average activation of the j-th hidden unit):

KL(ρ ‖ ρ̂_j) = ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j)),   where   ρ̂_j = (1/N) ∑_{i=1}^{N} h_j(x_i)    (4)
and λ, β, and ρ are scalar-valued hyperparameters determined by cross-validation.
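Eqs. (1)-(4) can be sketched directly in NumPy. This is a minimal illustration, assuming sigmoid encoding and decoding; the hyperparameter defaults are placeholders, not the values used in the paper.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def da_forward(x, W, b, W2, b2):
    """Feedforward pass of a DA, Eqs. (1)-(2), with sigmoid f and g.
    x: (D,) corrupted input; W: (K, D), b: (K,); W2: (D, K), b2: (D,)."""
    h = sigmoid(W @ x + b)        # hidden activation, Eq. (1)
    y_hat = sigmoid(W2 @ h + b2)  # reconstruction, Eq. (2)
    return h, y_hat

def da_loss(X, Y, W, b, W2, b2, beta=0.01, lam=1e-4, rho=0.05):
    """Sparsity-regularized reconstruction loss, Eqs. (3)-(4), over a batch.
    X: (N, D) corrupted inputs; Y: (N, D) clean targets."""
    H = sigmoid(X @ W.T + b)            # (N, K) hidden activations
    Y_hat = sigmoid(H @ W2.T + b2)      # (N, D) reconstructions
    recon = np.mean(np.sum((Y - Y_hat) ** 2, axis=1))
    rho_hat = H.mean(axis=0)            # empirical activation per hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    decay = 0.5 * lam * (np.sum(W ** 2) + np.sum(W2 ** 2))
    return recon + beta * kl + decay
```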
In this work, two DAs are stacked as shown in Figure 1a, where the activation of the first DA's hidden layer provides the input to the second DA, which in turn provides the input to the output layer of the first DA. This entire network (the SSDA) is trained again by backpropagation in a fine-tuning stage, minimizing the loss given by
L_SSDA(D; Θ) = (1/N) ∑_{i=1}^{N} ‖y_i − ŷ(x_i)‖²₂ + (λ/2) ∑_{l=1}^{2L} ‖W^(l)‖²_F    (5)
where L is the number of stacked DAs (we used L = 2 in our experiments), and W^(l) denotes the weights for the l-th layer in the stacked deep network.² The sparsity-inducing term is not needed for this step because sparsity was already incorporated in the pre-trained DAs. Our experiments show that there is not a significant change in performance when sparsity is included.
3.2 Adaptive Multi-Column SSDA
The adaptive multi-column SSDA is the linear combination of several SSDAs, or columns, each trained on a single type of noise, using optimized weights determined by the features of each given input image. Taking advantage of the SSDA's capability for feature learning, we use the features generated by the activation of the SSDA's hidden layers as inputs to a neural-network-based regression component, referred to here as the weight prediction module. As shown in Figure 1b, this module then uses these features to compute the optimal weights used to linearly combine the column outputs into a weighted average.
¹ In particular, the sigmoid function is often used for decoding the input data when their values are bounded between 0 and 1. For general cases, other types of functions (such as tanh, rectified linear, or linear functions) can be used.
² After pre-training, we initialized W^(1) and W^(4) from the encoding and decoding weights of the first-layer DA, and W^(2) and W^(3) from the encoding and decoding weights of the second-layer DA, respectively.
Additionally, the output of each column for each image is collected into a matrix Ŷ = [ŷ_1, ..., ŷ_C] ∈ R^{D×C}, with each column being the output ŷ_c of one of the SSDA columns. To determine the ideal linear weighting of the SSDA columns for a given image, we solve the following minimization (a quadratic program):³
minimize_{s_c}   ‖Ŷs − y‖²    (6)
subject to   0 ≤ s_c ≤ 1, ∀c    (7)
             1 − δ ≤ ∑_{c=1}^{C} s_c ≤ 1 + δ    (8)
Here s ∈ R^C is the vector of weights s_c corresponding to each SSDA column c. Constraining the weights between 0 and 1 was shown to allow for better weight predictions by reducing overfitting. The constraint in Eq. (8) helps to avoid degenerate cases where weights for very bright or dark spots would otherwise be very high or low. Although making the weights sum exactly to one is more intuitive, we found that the performance slightly improved when given some flexibility, as shown in Eq. (8). For our experiments, δ = 0.05 is used.

³ In addition to the L2 error shown in Equation (6), we also tested minimizing the L1 distance as the error function, a standard method in the related field of image registration [3]. The version using the L1 error performed slightly better in our noisy digit classification task, suggesting that the loss function might need to be tuned to the task and images at hand.

Noise Type       Parameter   Parameter values
Gaussian         σ²          0.02, 0.06, 0.10, 0.14, 0.18, 0.22, 0.26
Speckle          ρ           0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35
Salt & Pepper    ρ           0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35

Table 1: SSDA training noises in the 21-column AMC-SSDA. ρ is the noise density.
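The constrained minimization in Eqs. (6)-(8) can be sketched with a general-purpose solver. SLSQP here is a stand-in: the text does not specify which QP solver was used, and the helper name is ours.

```python
import numpy as np
from scipy.optimize import minimize

def optimal_column_weights(Y_hat, y, delta=0.05):
    """Solve Eqs. (6)-(8): minimize ||Y_hat @ s - y||^2 over s, with
    0 <= s_c <= 1 for every column c and 1-delta <= sum(s) <= 1+delta.
    Y_hat: (D, C) matrix of column outputs; y: (D,) clean image."""
    C = Y_hat.shape[1]
    objective = lambda s: np.sum((Y_hat @ s - y) ** 2)
    bounds = [(0.0, 1.0)] * C                       # Eq. (7)
    constraints = [                                 # Eq. (8) as two inequalities
        {"type": "ineq", "fun": lambda s: np.sum(s) - (1.0 - delta)},
        {"type": "ineq", "fun": lambda s: (1.0 + delta) - np.sum(s)},
    ]
    s0 = np.full(C, 1.0 / C)   # start from a plain average of the columns
    res = minimize(objective, s0, method="SLSQP", bounds=bounds,
                   constraints=constraints)
    return res.x
```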
3.2.3 Learning to predict optimal column weights via RBF networks
The final training phase is to train the weight prediction module. A radial basis function (RBF) network is trained to take the feature vector φ as input and produce a weight vector s, using the optimal-weight training set described in Section 3.2.2. An RBF network was chosen for our experiments because of its known performance in function approximation [22]. However, other function approximation techniques could be used in this step.
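As a rough stand-in for this module, a Gaussian-kernel RBF interpolant can play the role of the RBF network. The feature vectors and weight targets below are synthetic placeholders for the optimal-weight training set, not data from the paper.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Synthetic placeholders: phi_train holds M feature vectors of length F,
# s_train the corresponding C optimal column weights per example.
rng = np.random.default_rng(0)
M, F, C = 50, 6, 3
phi_train = rng.normal(size=(M, F))
s_train = np.clip(rng.normal(0.5, 0.2, size=(M, C)), 0.0, 1.0)

# A Gaussian-kernel RBF interpolant acting as the weight prediction module.
weight_predictor = RBFInterpolator(phi_train, s_train,
                                   kernel="gaussian", epsilon=1.0)

s_pred = weight_predictor(phi_train[:1])   # predicted weights, shape (1, C)
```

A trained RBF network with learned centers would generalize better than exact interpolation; the sketch only illustrates the input/output contract of the module.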
3.2.4 Denoising with the AMC-SSDA
Once training has been completed, the AMC-SSDA is ready for use. A noisy image x is supplied as input to each of the columns, which together produce the output matrix Ŷ, each column of which is the output of a particular column of the AMC-SSDA. The feature vector φ is created from the activation of the hidden layers of each SSDA (as described in Section 3.2.2) and fed into the weight prediction module (as described in Section 3.2.3), which then computes the predicted column weights s*. The final denoised image ŷ is produced by linearly combining the columns using these weights: ŷ = Ŷs*.⁴
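Putting the pieces together, the test-time procedure amounts to the following sketch, where `columns` and `predict_weights` are hypothetical interfaces standing in for the trained SSDAs and the trained weight prediction module.

```python
import numpy as np

def amc_ssda_denoise(x, columns, predict_weights):
    """AMC-SSDA test-time denoising: run every SSDA column on the noisy
    image x, build the feature vector phi from the columns' hidden
    activations, predict the column weights, and linearly combine.
    `columns`: list of (denoise_fn, features_fn) pairs;
    `predict_weights`: maps a feature vector to a (C,) weight vector."""
    Y_hat = np.column_stack([denoise(x) for denoise, _ in columns])  # (D, C)
    phi = np.concatenate([features(x) for _, features in columns])   # feature vector
    s_star = predict_weights(phi)                                    # (C,) weights
    return Y_hat @ s_star                                            # y_hat = Y s*
```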
4 Experiments
We performed a number of denoising tasks by corrupting and denoising images of computed tomography (CT) scans of the head from the Cancer Imaging Archive [16] (Section 4.1). Quantitative evaluation of denoising results was performed using the peak signal-to-noise ratio (PSNR), a standard metric for evaluating denoising performance. PSNR is defined as PSNR = 10 log₁₀(p²_max / σ²_e), where p_max is the maximum possible pixel value and σ²_e is the mean-square error between the noisy and original images. We also tested the AMC-SSDA as a pre-processing step in an image classification task by corrupting the MNIST database of handwritten digits [18] with various types of noise and then denoising and classifying the digits with a classifier trained on the original images (Section 4.2).
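The PSNR definition above is straightforward to compute; this minimal sketch assumes pixel values normalized to [0, p_max].

```python
import numpy as np

def psnr(reference, image, p_max=1.0):
    """Peak signal-to-noise ratio in dB between a reference image and
    the image under evaluation, for pixel values in [0, p_max]."""
    mse = np.mean((reference - image) ** 2)   # sigma_e^2 in the text
    return 10.0 * np.log10(p_max ** 2 / mse)
```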
4.1 Image denoising
To evaluate general denoising performance, images of CT scans of the head were corrupted with seven variations each of Gaussian, salt-and-pepper, and speckle noise, resulting in the 21 noise types shown in Table 1. Twenty-one individual SSDAs were trained on randomly selected 8-by-8-pixel patches from the corrupted images; each SSDA was trained on a single type of noise. These twenty-one SSDAs were used as columns to create an AMC-SSDA.⁵ The testing noise is given in Table 2. The noise was produced using Matlab's imnoise function, with the exception of uniform noise, which was produced with our own implementation. For Poisson noise, the image is divided by λ prior to applying the noise; the result is then multiplied by λ.
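An approximation of these corruption processes can be sketched as follows. The parameter semantics mirror Matlab's imnoise (variance for Gaussian and speckle, density for salt & pepper), and the Poisson λ-scaling follows the description above; this is an illustrative generator, not the authors' exact one.

```python
import numpy as np

def add_noise(img, kind, param, rng=None):
    """Corrupt an image with values in [0, 1]; see Tables 1-2 for the
    parameter ranges. The 255-count Poisson rate is an assumption about
    how [0, 1] intensities map to photon counts."""
    rng = np.random.default_rng() if rng is None else rng
    if kind == "gaussian":            # param = variance sigma^2
        out = img + rng.normal(0.0, np.sqrt(param), img.shape)
    elif kind == "speckle":           # param = variance of multiplicative noise
        a = np.sqrt(3.0 * param)      # uniform(-a, a) has variance param
        out = img + img * rng.uniform(-a, a, img.shape)
    elif kind == "salt_pepper":       # param = noise density rho
        out = img.astype(float).copy()
        mask = rng.random(img.shape) < param
        out[mask] = rng.integers(0, 2, size=int(mask.sum()))  # salt=1, pepper=0
    elif kind == "poisson":           # param = scaling factor lambda
        # Divide by lambda, treat 255*value as a Poisson rate, rescale back.
        out = rng.poisson((img / param) * 255.0) / 255.0 * param
    else:
        raise ValueError(f"unknown noise type: {kind}")
    return np.clip(out, 0.0, 1.0)
```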
To train the weight predictor for the AMC-SSDA, a set of images disjoint from the training set of the individual SSDAs was used. The training images for the AMC-SSDA were corrupted with the same noise types used to train its columns. The AMC-SSDA was tested on another set of images disjoint from both the individual-SSDA and AMC-SSDA training sets. The AMC-SSDA was trained

⁴ We have tried alternatives to this approach. Some of these involved using a single unified network to combine the columns, such as joint training. In our preliminary experiments, these approaches did not yield significant improvements.
⁵ We also evaluated AMC-SSDAs with a smaller number of columns. In general, we achieved better performance with more columns. We discuss its statistical significance later in this section.
