Normalized Energy Density-Based Forensic Detection of Resampled Images
Xiaoying Feng, Member, IEEE, Ingemar J. Cox, Fellow, IEEE, and Gwenaël Doërr, Member, IEEE
Abstract—We propose a new method to detect resampled imagery. The method is based on examining the normalized energy density present within windows of varying size in the second derivative of the image in the frequency domain, and exploiting this characteristic to derive a 19-D feature vector that is used to train an SVM classifier. Experimental results are reported on 7500 raw images from the BOSS database. Comparison with prior work reveals that the proposed algorithm performs similarly for resampling rates greater than 1, and is superior to prior work for resampling rates less than 1. Experiments are performed for both bilinear and bicubic interpolations, and qualitatively similar results are observed for each. Results are also provided for the detection of resampled imagery with noise corruption and JPEG compression. As expected, some degradation in performance is observed as the noise increases or the JPEG quality factor declines.
Index Terms—Image forensics, normalized energy density, resampling detection.
I. INTRODUCTION
FORENSIC signal processing attempts to identify the variety of processing steps that a signal has undergone. Such information is useful in determining whether, for example, a signal is authentic or has been tampered with. There are two main approaches to multimedia forensics: active forensics and passive forensics [1]. Active forensics relies on modifying the multimedia signal prior to its distribution to assist in later forensic analysis. Digital watermarks [2] are one example of active forensics. A limitation of active forensics is the need for content-generating devices, e.g., cameras, sensors, microphones, that embed watermarks. Such devices are often not available, and in these cases active forensics cannot be applied. Passive forensics, in contrast, does not rely on any prior modification of the signal. As such, passive forensics is, in theory, applicable to a broader range of operating scenarios.
A variety of passive forensic methods have been developed to detect, for example, requantization [3]–[7], resampling [8]–[14], and affine transformation [15], [16]. In this paper, we consider the problem of determining whether an image¹ has undergone resampling. Section II provides a review of prior work on this topic. Section III describes our normalized energy density-based method. Section IV reports experimental results performed using the BOSS version 0.9 database [17] consisting of 7500 raw images. Comparison is made with the algorithm described in [9]. Finally, Section V summarizes our results and discusses possible directions for future work.

Manuscript received October 02, 2011; revised October 02, 2011; accepted March 01, 2012. Date of publication April 03, 2012; date of current version May 11, 2012. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Dinei A. Florencio.
X. Feng and I. J. Cox are with the Department of Computer Science, University College London, London WC1E 6BT, U.K. (e-mail: x.feng@cs.ucl.ac.uk).
G. Doërr is with the Technicolor R&D, Security and Content Protection Labs, 35576 Cesson-Sévigné Cedex, France.
Color versions of one or more of the figures in this paper are available online.
Digital Object Identifier 10.1109/TMM.2012.2191946
II. PRIOR WORK
Early work on detecting image resampling was based on the observation that artifacts were introduced in the resampled imagery due to interpolation, a basic operation involved in resampling. Generally, these artifacts are periodic in the spatial domain and therefore manifest themselves as peaks in the corresponding frequency domain.
Popescu and Farid [8] noted that the interpolation process introduces correlations between the resampled imagery pixels. They proposed measuring these correlations based on an expectation/maximization (EM) algorithm. The EM algorithm estimates the "average" correlation that is present in the image and subsequently computes the probability of the pixels being correlated to their neighbors. The corresponding correlation probability map (p-map) in the DFT domain exhibits periodic peaks that are not present in single-sampled images.
The work of Popescu and Farid was subsequently refined. Mahdian and Saic [9] proposed an automatic detector that searches for local maxima, which are defined as values that exceed a local average magnitude by a given factor. Kirchner [10], [11] replaced the EM algorithm, which is computationally demanding, with linear filtering [10] and linear row and column predictors [11], and proposed an automatic detector based on the maximum gradient of the p-map's spectrum. Dalgaard et al. [12], and Vazquez-Padín and Pérez-González [13] exploited the underlying cyclostationarity of communication signals. Based on cyclostationarity theory, they examined a series of prefilters in order to improve the detection accuracy of imagery resampling. More recently, Feng et al. [14] proposed a new normalized energy density-based method for resampling detection. In this paper, we further investigate this method from both the theoretical and experimental perspectives.
The method described in [9] is used as the baseline against which we compare our algorithm. The second derivative is taken along either the horizontal or the vertical dimension of an image. The Radon transformation then projects the second derivative to each of 180 directions, where the projection angles are integers from 0 to 179. The detection of resampling is based on the detection of periodicity in the autocovariance of the projected vectors. In so doing, the first derivative of each of the 180 projected vectors is calculated. The autocovariance of the first derivative is then computed. Finally, the periodicity of the autocovariance is detected in the DFT domain, using a local maxima detector. Further details of the algorithm can be found in [9].

¹ For the sake of simplicity, we only discuss grey-scale images in this paper. However, a color image can be represented by three channels, i.e., intensity, saturation and hue. Our proposed method can be directly applied in the intensity channel of color images.
1520-9210/$31.00 © 2012 IEEE
III. NORMALIZED ENERGY DENSITY-BASED METHOD
Our method for detecting resampling is based on a 19-D feature vector that is derived from an examination of the normalized energy density present at various window sizes in the DFT of the second derivative of an image.
A. Normalized Energy Density
Let $I(x,y)$ denote an image, and $F(u,v)$ denote its frequency representation, i.e., the DFT of the image. Assume, for simplicity and without loss of generality, that the image is a square with side length $N$, i.e., there are $N \times N$ pixels present in the image. Using Parseval's equation, the energy, $E$, present in an image, is given by

$$E = \sum_{u=-f_c}^{f_c} \sum_{v=-f_c}^{f_c} |F(u,v)|^2 \tag{1}$$

where $f_c$ is the cutoff frequency of the image. We define $E(w)$ to be the energy present in the power spectrum, in a window of dimension $w \times w$, where

$$E(w) = \sum_{u=-w/2}^{w/2} \sum_{v=-w/2}^{w/2} |F(u,v)|^2 \tag{2}$$

The energy density, $E_d(w)$, is then defined by the averaged energy within the window, i.e.,

$$E_d(w) = \frac{E(w)}{w^2} \tag{3}$$

To accommodate the likely difference of dimensions between the original image and its resized versions, we normalize the window size, $w$, with respect to the cutoff frequency, $f_c$, i.e.,

$$w_n = \frac{w}{2 f_c} \tag{4}$$

where $w_n$ takes values between 0 and 1. As a result, the corresponding normalized energy density, $E_n(w_n)$, which is independent of the size of the image under investigation, is given by

$$E_n(w_n) = \frac{E_d(2 f_c w_n)}{E_d(2 f_c)} \tag{5}$$

where $E_d(2 f_c)$ represents the energy density of the whole image.

B. Image Resampling
Let $I_s(x,y)$ and $I_r(x,y)$ denote the original image and its corresponding resampled version respectively. Mathematically, resampling can be represented as the convolution of the single-sampled image, $I_s(x,y)$, and the impulse response function (IRF) of the resampling system, $h(x,y)$, that corresponds to the specific interpolation method and any post-processing being applied, i.e.,

$$I_r(x,y) = I_s(x,y) * h(x,y) \tag{6}$$

Correspondingly, in the frequency domain, we have

$$F_r(u,v) = F_s(u,v) \cdot H(u,v) \tag{7}$$

In practice, bilinear interpolation and bicubic interpolation are two widely-used interpolation methods. Specifically, bilinear interpolation considers the closest 2 × 2 neighborhood of known pixel values surrounding the interpolated (unknown) pixel. It then takes a weighted average of these four pixels to calculate the interpolated value. In contrast, bicubic interpolation considers the closest 4 × 4 neighborhood of known pixels. Similarly, it then takes a weighted average of these 16 pixels to calculate the interpolated value. In this paper, both bilinear and bicubic interpolations are considered for comparative purposes.
Note that anti-aliasing filters are routinely applied as a post-processing step after down-sampling an image to avoid undesired visual artifacts. Therefore, the anti-aliasing option is activated with the MATLAB function imresize() to perform the resampling operations in the experiments in Section IV. However, for the theoretical analysis of resampling, we do not consider any post-processing in this Section.
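As an illustration of the bilinear case described above, the following is a minimal NumPy sketch of resampling by a weighted average over the closest 2 × 2 neighborhood. It is not the MATLAB imresize() routine used in the experiments: the function name `bilinear_resize`, the pixel-centre coordinate mapping, and the absence of any anti-aliasing filter are all assumptions made for this example.

```python
import numpy as np

def bilinear_resize(img, rate):
    """Resample a 2-D grey-scale array by `rate` with bilinear interpolation.

    Hypothetical helper for illustration only; no anti-aliasing is applied.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    new_h = max(1, int(round(h * rate)))
    new_w = max(1, int(round(w * rate)))
    # Map each output pixel centre back to (fractional) source coordinates.
    ys = np.clip((np.arange(new_h) + 0.5) / rate - 0.5, 0, h - 1)
    xs = np.clip((np.arange(new_w) + 0.5) / rate - 0.5, 0, w - 1)
    # Indices of the 2 x 2 neighbourhood surrounding each source coordinate.
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    # Interpolation weights along each axis.
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy
```

For instance, `bilinear_resize(image, 1.5)` would up-sample and `bilinear_resize(image, 0.5)` would down-sample a grey-scale array; a bicubic variant would replace the four-pixel average with a 16-pixel one.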
C. Up-Sampling and Down-Sampling
Suppose that the cutoff frequency of a single-sampled image is $f_c$. Note that an original image and its resized versions will not have the same dimensions. In the frequency domain, these different dimensions will be represented by different cutoff frequencies. For up-sampled images, i.e., resampling rate $r > 1$, the resampled images will have larger dimensions than the single-sampled images. In other words, the cutoff frequencies of up-sampled images are larger than those of the single-sampled images. Similarly, the cutoff frequencies of down-sampled images, i.e., $r < 1$, are smaller than those of the corresponding single-sampled images.
Mathematically, the cutoff frequency of a resampled image with resampling rate $r$ should be $r f_c$. In other words, the frequency spectrum of a resampled image is between $-r f_c$ and $r f_c$. Therefore, in order to generate a resampled image, it is equivalent for the single-sampled image to go through a low-pass filter with the cutoff frequency $r f_c$. For the purpose of theoretical analysis, we consider an ideal low-pass filter, i.e.,

$$H(u,v) = \begin{cases} 1 & \text{if } |u| \le r f_c \text{ and } |v| \le r f_c \\ 0 & \text{otherwise} \end{cases} \tag{8}$$
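1) Up-Sampling: Based on the ideal low-pass filter shown in (8), the width of the frequency spectrum of an up-sampled image, $F_r(u,v)$, is extended from $f_c$ to $r f_c$, i.e.,

$$F_r(u,v) = \begin{cases} F_s(u,v) & \text{if } |u| \le f_c \text{ and } |v| \le f_c \\ 0 & \text{if } f_c < |u| \le r f_c \text{ or } f_c < |v| \le r f_c \end{cases} \tag{9}$$

Therefore, the energy density of an up-sampled image, $E_d^r(w)$, is given by

$$E_d^r(w) = \begin{cases} E_d^s(w) & \text{if } w \le 2 f_c \\ \left(\dfrac{2 f_c}{w}\right)^2 E_d^s(2 f_c) & \text{if } 2 f_c < w \le 2 r f_c \end{cases} \tag{10}$$

where $E_d^s$ and $E_d^r$ represent the energy density of the single-sampled and resampled images respectively.
2) Down-Sampling: Based on the ideal low-pass filter shown in (8), the width of the frequency spectrum of a down-sampled image, $F_r(u,v)$, is narrowed from $f_c$ to $r f_c$, i.e.,

$$F_r(u,v) = F_s(u,v), \quad |u| \le r f_c \text{ and } |v| \le r f_c \tag{11}$$

Therefore, the energy density of a down-sampled image, $E_d^r(w)$, is given by

$$E_d^r(w) = E_d^s(w), \quad w \le 2 r f_c \tag{12}$$

Fig. 1. Normalized spectrum in the diagonal direction (i.e., $u = v$). (a) Without high-pass filter. (b) After first-derivative high-pass filter. (c) After second-derivative high-pass filter. The curves are averaged over 7500 images from the BOSS database.
Fig. 2. Normalized energy density curves for a set of resampling rates, averaged over 7500 images from the BOSS database (without high-pass filtering, without anti-aliasing, fixed content cropping). (a) Bilinear interpolation. (b) Bicubic interpolation.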
D. High-Pass Filtering
It is well-known that the spectrum present in an image is typically concentrated in the lower frequencies, and drops off with increasing frequency. The shape of the spectrum is often approximated by a Gaussian model. Fig. 1(a) shows the normalized spectrum in the diagonal direction, i.e., $u = v$, averaged over all 7500 images in the BOSS database.
Note that the high concentration of energy in the lower frequencies can mask high frequency effects. Fig. 2 shows the normalized energy density curves for a set of different resampling rates, over 7500 images from the BOSS database for window sizes ranging from 0 to 1, including: 1) single-sampled images (solid curve); 2) resampled images up-sampled at rates 1.5 and 2.0; and 3) resampled images down-sampled at rates 0.1 and 0.5. It is clear that the energy curves of different resampling rates cannot be distinguished from each other, due to their similar strong energy in the lower frequencies. We also observe a "discrepancy", where the normalized energy density for down-sampled imagery with resampling rate 0.1 decreases. This "discrepancy" appears to be caused by the high concentration of energy in the lower frequencies, and is absent when high-pass filtering is performed.
Fig. 3. Normalized energy density curves for a set of resampling rates, averaged over 7500 images from the BOSS database (after first-derivative high-pass filtering, without anti-aliasing, fixed content cropping). (a) Bilinear interpolation. (b) Bicubic interpolation.
Fig. 4. Normalized energy density curves for a set of resampling rates, averaged over 7500 images from the BOSS database (after second-derivative high-pass filtering, without anti-aliasing, fixed content cropping). (a) Bilinear interpolation. (b) Bicubic interpolation.

In order to eliminate the effect of the high energy concentration in the lower frequencies, an image is first high-pass filtered before examining its power spectrum. The first-derivative gradient filter and second-derivative Laplacian filter are widely-used high-pass filters. It is interesting to compare the effect of these two high-pass filters. Fig. 1(b) and (c) compare the normalized spectrum in the diagonal direction averaged over all 7500 images in the database we used. We can see that strong low-frequency components remain in the spectrum after the first-derivative high-pass filter, although the dc element equals 0. In contrast, low-frequency components are significantly reduced by the second-derivative high-pass filter. For the normalized energy density curves, we would expect that the energy curves for different resampling rates cannot be easily separated after the first-derivative high-pass filtering, due to the remaining strong low-frequency spectrum. In contrast, it is expected that the energy curves for different resampling rates would be well distinguished after second-derivative high-pass filtering since it suppresses more of the low frequencies. Figs. 3 and 4 show the normalized energy density curves for the images with a set of different resampling rates, after the first-derivative and the second-derivative high-pass filters respectively. The reported energy curves clearly support the discussion above. We therefore use a second-derivative filter. Specifically, a Laplacian high-pass filter, with the kernel given in (13), is used in our experiments.

(13)
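The second-derivative filtering step can be sketched as follows. The kernel of (13) is not reproduced in this text, so the sketch assumes the common 4-neighbour discrete Laplacian; the kernel actually used in the experiments may differ. The names `LAPLACIAN_4` and `filter3x3`, and the edge-replicating border handling, are likewise assumptions of this example.

```python
import numpy as np

# Assumed kernel: the common 4-neighbour discrete Laplacian.
# The kernel of (13) is not available here and may be different.
LAPLACIAN_4 = np.array([[0.0, 1.0, 0.0],
                        [1.0, -4.0, 1.0],
                        [0.0, 1.0, 0.0]])

def filter3x3(img, kernel=LAPLACIAN_4):
    """Apply a 3 x 3 kernel to a 2-D array, replicating border pixels.

    This is a correlation; for a symmetric kernel such as the Laplacian
    it coincides with convolution.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    # Accumulate the nine shifted copies of the image, one per kernel tap.
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out
```

Because the kernel taps sum to zero, a constant image maps to zero, which corresponds to removing the dc component before the power spectrum is examined.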
E. Peak Value and Its Location
The normalized energy density curves of Fig. 4 are visually well separated for different resampling rates. Specifically, for the single-sampled imagery, we see a curve originating at zero (since there is no energy in a window size of zero), increasing monotonically to a peak at an intermediate window size, and then monotonically decreasing to a value of 1 when the window size encompasses the entire power spectrum. If we now compare this to the up-sampled images, we observe a qualitatively similar curve, but the peak values are shifted to the left, i.e., the peak values occur for smaller window sizes. Similarly for the down-sampled images, we observe a qualitatively similar curve, but the peak values are shifted to the right, i.e., the peak values occur for larger window sizes. That is, the peak value and its location in the normalized energy density curve are shifted as the resampling rate changes. We theoretically explain this relationship next.
Fig. 5. Normalized energy density curves for a set of resampling rates, averaged over 7500 images from the BOSS database (after second-derivative high-pass filtering, without anti-aliasing, fixed content cropping). The error bars indicate the standard deviations over 7500 images. (a) Bilinear interpolation. (b) Bicubic interpolation.

Let us use $E_n^r(w_n)$ and $E_n^s(w_n)$ to represent the normalized energy density of the resampled and single-sampled images respectively. Suppose the peak value in the normalized energy density curve of a single-sampled image occurs at the window size $w_p$, we have

$$w_p = \arg\max_{w_n} E_n^s(w_n) \tag{14}$$

Given (10) and (12), we can replace $E_d^r(w)$ with $E_d^s(w)$, since the peak in the curve does not occur for a value of $w$ between $2 f_c$ and $2 r f_c$ when up-sampling occurs. Thus we have

$$\arg\max_{w_n} E_n^r(w_n) = \arg\max_{w_n} \frac{E_d^r(2 r f_c w_n)}{E_d^r(2 r f_c)} \tag{15a}$$
$$= \arg\max_{w_n} E_d^r(2 r f_c w_n) \tag{15b}$$
$$= \arg\max_{w_n} E_d^s(2 r f_c w_n) \tag{15c}$$
$$= \frac{w_p}{r} \tag{15d}$$

The derivation from (15a) to (15d) relies on (10) for up-sampling and on (12) for down-sampling. As a result, for a resampled image, the peak value of the normalized energy density curve is shifted to the window size $w_p / r$.
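The inverse scaling of the peak location with the resampling rate can be checked numerically under the ideal low-pass model. The sketch below is only an illustration: `single_density` is a made-up toy curve (not measured from any image) chosen so that its peak lies at one third of the single-sampled cutoff, and all function names are hypothetical.

```python
import numpy as np

def single_density(x):
    """Toy energy density of a single-sampled image (an assumption).

    x is the window size as a fraction of the single-sampled cutoff;
    the curve peaks at x = 1/3.
    """
    return x * np.exp(-3.0 * x)

def resampled_density(x):
    """Ideal model: unchanged inside the original band, and a 1/x^2
    decay beyond it, where up-sampling adds no new energy."""
    x = np.asarray(x, dtype=float)
    inside = single_density(np.minimum(x, 1.0))
    return np.where(x <= 1.0, inside, single_density(1.0) / x ** 2)

def normalized_curve(w, r):
    """Normalized energy density at normalized window sizes w, rate r."""
    return resampled_density(r * np.asarray(w)) / resampled_density(r)

w = np.linspace(0.001, 1.0, 1000)
w_p = w[np.argmax(normalized_curve(w, 1.0))]
for r in (2.0, 0.5, 0.25):
    peak = w[np.argmax(normalized_curve(w, r))]
    # Compare the observed peak with the predicted location, capped at 1.
    print(r, peak, min(w_p / r, 1.0))
```

In this toy setting the printed peak locations should agree with the single-sampled peak divided by the rate, capped at a window size of one for strong down-sampling.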
1) Up-Sampling: For up-sampling, i.e., $r > 1$, we have $w_p / r < w_p$. Therefore, the peak of the normalized energy density curve is shifted to the left. This is to be expected, because for up-sampled imagery, we would expect that less energy is present in the very high frequencies, as these frequencies are absent in the original images. Furthermore, the peak value after up-sampling is given by

$$E_n^r\!\left(\frac{w_p}{r}\right) = \frac{E_d^s(2 f_c w_p)}{E_d^r(2 r f_c)} = r^2 E_n^s(w_p) \tag{16}$$

It is clear that the peak value is increased after up-sampling.
2) Down-Sampling: For down-sampling, i.e., $r < 1$, we have $w_p / r > w_p$. Therefore, the peak of the normalized energy density curve is shifted to the right. This is to be expected, because for down-sampled imagery, the fact that we have more energy in the higher frequencies indicates that these images were derived from images containing higher frequency information. Furthermore, the peak value after down-sampling is given by:

$$E_n^r\!\left(\frac{w_p}{r}\right) = \frac{E_d^s(2 f_c w_p)}{E_d^s(2 r f_c)} = \frac{E_d^s(2 f_c)}{E_d^s(2 r f_c)} \, E_n^s(w_p) \tag{17}$$

Note that (17) is only applicable if $r \ge w_p$ (i.e., $w_p / r \le 1$), since the possible value of a window size, $w_n$, should be between 0 and 1. In this case, the fraction in (17) is less than one and the peak value is therefore decreased after down-sampling.
However, if the down-sampling rate, $r$, is smaller than $w_p$, i.e., $w_p / r > 1$, the energy density curve is expected to increase monotonically to the peak value of one at the window size of one.

F. Detection Method
The curves in Fig. 4 suggest that it is possible to differentiate single-sampled and resampled imagery based on the normalized energy density present at various window sizes in the DFT of the second derivative of the scrutinized images. Fig. 5 shows the normalized energy density curves of Fig. 4 with associated error bars of standard deviation, computed over the 7500 images. Clearly, the standard deviations are relatively large, sometimes exhibiting considerable overlap between curves. Consequently, it is not possible to discriminate single-sampling and resampling based on a single peak energy value. In order to overcome this problem, we use a 19-D vector, the values of each dimension being the normalized energy density for window sizes ranging from 0.05 to 0.95 in steps of 0.05. It is this vector that is used in the experiments reported next.
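A minimal sketch of how such a 19-D vector might be computed is given below, assuming the image has already been high-pass filtered. The function names, the rounding of the window half-width to whole DFT bins, and the use of a centred square window after `fftshift` are assumptions of this example rather than details taken from the paper.

```python
import numpy as np

def normalized_energy_density(img, window_sizes):
    """Normalized energy density of a (pre-filtered) square image.

    window_sizes are normalized sizes in (0, 1]; a size of 1 covers the
    whole spectrum. Assumes the image is not identically zero.
    """
    img = np.asarray(img, dtype=float)
    n = min(img.shape)
    img = img[:n, :n]                       # keep a square region
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    centre = n // 2                         # zero-frequency bin after fftshift
    whole = power.mean()                    # energy density of the full spectrum
    values = []
    for w in window_sizes:
        half = int(round(w * n / 2))        # half-width of the window in bins
        lo = max(centre - half, 0)
        hi = min(centre + half + 1, n)
        # Average energy per bin inside the window, relative to the whole image.
        values.append(power[lo:hi, lo:hi].mean() / whole)
    return np.array(values)

def ned_feature_vector(filtered_img):
    """19 window sizes: 0.05, 0.10, ..., 0.95."""
    return normalized_energy_density(filtered_img, np.arange(1, 20) * 0.05)
```

The resulting vector would then be passed to the classifier; for an image with a flat power spectrum, every entry equals one.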
IV. EXPERIMENTAL RESULTS
Our experiments use the BOSS database version 0.9 [17], which consists of 7500 raw images, i.e., the images have never undergone resampling. Experiments are performed for both bilinear and bicubic interpolations. Note that both interpolation and anti-aliasing are applied to the images, as this is a more realistic operating scenario.
The 19-D vector was used as input to train support vector machine (SVM) classifiers [18]. The widely-used radial basis function (RBF) kernel is used. The reported results are the average of 10 trials by applying random subsampling validation, where for each of the 10 trials, training is performed on a random 20% subset of the database, and testing on the remaining 80%.
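The random subsampling validation described above can be sketched as an index generator. This only illustrates the 20%/80% splitting over 10 trials; the SVM training itself is omitted, and the function name, the seed, and the use of NumPy's random generator are assumptions of this example.

```python
import numpy as np

def random_subsampling_splits(n_samples, n_trials=10, train_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs for random subsampling validation.

    Each trial draws a fresh random subset for training and keeps the
    remaining samples for testing. The seed is an arbitrary choice.
    """
    rng = np.random.default_rng(seed)
    n_train = int(round(train_fraction * n_samples))
    for _ in range(n_trials):
        perm = rng.permutation(n_samples)
        yield perm[:n_train], perm[n_train:]
```

A classifier would be trained on the feature vectors indexed by `train_idx`, evaluated on those indexed by `test_idx`, and the scores averaged over the trials.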
Instead of manually setting a fixed threshold, a receiver operating characteristics (ROC) curve [19], the threshold of which is varied, is used to evaluate our detection results. Moreover, in order to summarize the classification performance with a single scalar number, we report the area under the ROC curve (AUC). Note that an AUC value of 0 means the detection is always false, whereas an AUC value of 1 means a perfect detection. An AUC value of 0.5 represents random guessing, which is reflected by the diagonal line between (0,0) and (1,1) in the ROC curve.
The method described in [9] is used as the baseline algorithm for comparative purposes. Baseline #1 is fully implemented according to [9]. In baseline #2, the discriminating features are implemented according to [9], but detection is based on the same SVM classifier as our method. The purpose of baseline #2 is to help identify whether the performance variation is due to different discriminating features or different detectors.

Fig. 6. Normalized energy density curves for a set of resampling rates, averaged over 7500 images from the BOSS database (after second-derivative high-pass filtering, with anti-aliasing, fixed content cropping). (a) Bilinear interpolation. (b) Bicubic interpolation.
Fig. 7. Normalized energy density curves for a set of resampling rates, averaged over 7500 images from the BOSS database (after second-derivative high-pass filtering, with anti-aliasing, variable content cropping). (a) Bilinear interpolation. (b) Bicubic interpolation.

A. Practical Considerations About Resizing
A typical raw camera image in the BOSS database has a resolution of 5234 × 3487. It is not necessary to perform the forensic analysis on the whole image. Instead, an image of smaller dimensions can be used. We therefore crop the original image by selecting its 512 × 512 central part. However, we can perform this cropping before or after resampling, as discussed next.
1) Fixed Content: With this first cropping option, the content of a single-sampled image and its corresponding resampled versions are the same. More specifically, suppose we have an original raw camera image with a resolution of 5234 × 3487; the single-sampled version will be the central 512 × 512 part cropped from the original raw camera image. Based on this single-sampled version, a set of resampled images at different resampling rates is generated accordingly, using either bilinear interpolation or bicubic interpolation. As a result, we obtain a set of images at different resampling rates with the same image content. Note that the dimensions of the resulting images at different resampling rates are different in this case.
2) Variable Content: With this second cropping option, the content of a single-sampled image and its corresponding resampled versions are different. Specifically, suppose we have an original raw camera image with a resolution of 5234 × 3487. Then we first generate a set of resampled images at different resampling rates, using either bilinear interpolation or bicubic interpolation. Both the original raw camera image and the resampled images are then cropped to 512 × 512 by selecting their central part. As a result, we have a set of images at different resampling rates with the same image size. Note that the image content at different resampling rates is different with this option.
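The two cropping options differ only in the order of cropping and resampling, which the following sketch makes explicit. The function names are hypothetical, and `resample` stands for any resampling routine taking an image and a rate; it is passed in as a parameter because the sketch does not assume a particular implementation.

```python
import numpy as np

def center_crop(img, size=512):
    """Return the central size x size part of a 2-D image."""
    img = np.asarray(img)
    h, w = img.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the requested crop")
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]

def fixed_content(raw, resample, rate, size=512):
    """Crop first, then resample: same content, output size varies with rate."""
    return resample(center_crop(raw, size), rate)

def variable_content(raw, resample, rate, size=512):
    """Resample first, then crop: output is always size x size, content varies."""
    return center_crop(resample(raw, rate), size)
```

With the variable-content option, the resampled image must still be at least `size` pixels on each side, which limits how small the down-sampling rate can be for a given raw image.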
Fig. 6 shows the normalized energy density curves with fixed content cropping, while Fig. 7 shows the equivalent curves with variable content cropping. This pair of figures suggests that the normalized energy density curves at different resampling rates remain well separated in both cases. However, the curves for "fixed content" cropping are more separable. In the remainder of this paper, we report results using "fixed content" cropping.

B. One Classifier Per Resampling Rate
In the first experiment, one SVM-based classifier is trained for each individual resampling rate, which ranges from 0.1 to