Study of Cover Source Mismatch in Steganalysis and Ways to
Mitigate its Impact
Jan Kodovský, Vahid Sedighi, and Jessica Fridrich
Department of ECE, Binghamton University, NY, USA
E-mail: {jan.kodovsky, vsedigh1, fridrich}@binghamton.edu; dde.binghamton.edu
ABSTRACT
When a steganalysis detector trained on one cover source is applied to images from a different source, the detection error generally increases due to the mismatch between the two sources. In steganography, this situation is recognized as the so-called cover source mismatch (CSM). The drop in detection accuracy depends on many factors, including the properties of both sources, the detector construction, the feature space used to represent the covers, and the steganographic algorithm. Although well recognized as the single most important factor negatively affecting the performance of steganalyzers in practice, the CSM has received surprisingly little attention from researchers. One of the reasons for this is the diversity with which the CSM can manifest itself. On a series of experiments in the spatial and JPEG domains, we refute some of the common misconceptions that the severity of the CSM is tied to the feature dimensionality or their "fragility." The impact of the CSM on detection appears too difficult to predict due to the effect of complex dependencies among the features. We also investigate ways to mitigate the negative effect of the CSM using simple measures, such as enlarging the diversity of the training set (training on a mixture of sources) and employing a bank of detectors trained on multiple different sources, with testing carried out by the detector trained on the closest source.
Keywords: Steganalysis, steganography, cover source mismatch, machine learning
1. INTRODUCTION
The problem of the so-called cover source mismatch (CSM) pertains to the situation when a steganalyzer is trained on a source of images and then presented with cover (or stego) examples from a different source. The mismatched detector typically exhibits a lower detection accuracy. The negative impact of the CSM has already been commented upon in [3,7], but the problem became more widely recognized and documented after the BOSS competition [1] in the work of BOSS participants [6,9]. The drop of performance due to the CSM can range from a small to moderate loss of detection accuracy to literally catastrophic results, when a relatively accurate detector on one source completely fails on another.
In general, the impact of the CSM on detection accuracy depends on many factors, including the steganographic algorithm and the distribution of the payload size, the steganalysis feature space, the properties of the cover source, and the actual implementation of the steganalyzer. The differences between cover sources can take many different forms, examples of which are images of different resolution, size, and format, processed images, and images acquired by different hardware under different conditions. Given the great diversity and the associated complexity of the problem, it is not surprising that, at present, the problem of the CSM is rather poorly understood. It has also been identified as one of the main obstacles to deploying steganalysis in the real world [11].
This article is rather exploratory in nature. Positioning ourselves somewhere between the lab and the real world, we intentionally constrained our experiments to controlled image sets in order to eliminate any hidden variables and to better isolate the effects of various aspects of the CSM on detection accuracy. The focus is on gaining some understanding of how various factors contribute to the severity of the CSM and whether its impact can be mitigated using selected simple measures. On examples, we debunk some widely believed misconceptions that the features' high dimensionality or their "fragility"∗ are the main reasons for a catastrophic loss of detection performance under the CSM. Evidence gathered through experiments suggests that the interplay (mutual dependencies) among features is an equally important factor affecting the ability of the detector to generalize to previously unseen sources of images.
∗See the definition of fragility at the end of Section 4.4.
The experiments are divided into two parts based on the steganographic embedding domain. In each domain, we investigate one embedding algorithm using two feature sets – one low-dimensional set and one rich model. The steganalysis detectors are always implemented as binary classifiers using the ensemble classifier [16]. In the spatial domain, the sources differ by the image-acquiring hardware and the applied processing. In the JPEG domain, we restrict our experiments to the CSM due to different quantization tables. We investigate the severity of the CSM impact based on the feature dimension, the type of calibration (in the JPEG domain), and the processing applied to the cover source. We also study the effectiveness of two simple counter-measures: training a single classifier on a mixture of sources, and training a bank of classifiers first and then testing a given unseen source with the classifier trained on the closest source. The detection accuracy is measured using the total minimum probability of error under equal priors averaged over ten realizations of each experiment (over various random splits of the training and testing sets).
In the next section, we summarize several different approaches for fighting the CSM that were proposed in the machine learning community and in steganalysis. The next two sections contain the main results of this paper. Section 3 focuses on experiments in the spatial domain, while Section 4 deals with the steganalysis of JPEG images. Each section contains a detailed interpretation of the results and a summary of the lessons learned. The paper is concluded in Section 5, where we outline possible avenues for future research.
2. CSM IN STEGANALYSIS: PRIOR ART
The problem of a mismatched detector is not new. It has been extensively studied within the context of robust statistical hypothesis testing [22]. Such robust methods guarantee an upper bound on the drop of detection accuracy as a function of some measure quantifying the mismatch between the assumed and true distributions. On the side of the machine learning community, the interest in constructing robust classifiers has been steadily increasing. The methods of domain adaptation [8,10] can be applied when the analyst has access to a set of (unlabeled) samples from the testing source and can utilize this information for the training of the detector.† This requires a potentially expensive retraining of the classifier for each new testing domain. Domain generalization, on the other hand, tries to transform the feature space so that the differences between a multitude of training sources are as small as possible while keeping the ability of the transformed features to distinguish between the classes (of cover and stego images). A recently developed general domain generalization method is the Domain Independent Component Analysis (DICA) [18]. The term transfer learning [2,19] is used for general techniques directed at addressing the problem when there is a mismatch between the distributions of training and testing data.
For domain adaptation and generalization techniques to work, it is necessary that the training and testing sources be close. This condition is, however, hardly satisfied in real-world applications of steganalysis, at least with current feature representations of digital media. Techniques with a lesser tendency to overtrain to a specific training cover source include the Constrained Least Squares (CLS) [21], which appears to be a special case of DICA with linear kernels in both the example and label spaces.‡ The CLS is an example of a promising direction based on the idea of linearly transforming the feature space to minimize the statistical spread of the cover features while maximizing the correlation between the stego features and the embedding change rate.
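To make this direction concrete, a linear projection w of this type could be obtained from an objective of roughly the following form, where x_i and y_i denote the cover and stego feature vectors of the i-th training image, \bar{x} is the cover mean, \beta_i is the embedding change rate, and \lambda > 0 balances the two terms. This is a hedged sketch of the idea described above, not the exact formulation from [21]:

    \min_{\mathbf{w}} \; \frac{1}{n}\sum_{i=1}^{n}\bigl(\mathbf{w}^{\top}(\mathbf{x}_i-\bar{\mathbf{x}})\bigr)^{2} \;-\; \lambda\,\operatorname{corr}\bigl(\mathbf{w}^{\top}(\mathbf{y}_i-\mathbf{x}_i),\,\beta_i\bigr)

The first term penalizes the spread of the projected cover features (making the detector less sensitive to the cover source), while the second rewards projections that respond proportionally to the embedding change rate.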
The effect of the classifier complexity, together with the size and diversity of the training set, has been experimentally studied in [17]. There, the authors concluded that simpler classifiers, such as the perceptron or the ensemble classifier [16], indeed appear to produce more robust detectors than more complex machine learning tools, such as Gaussian SVMs. The authors also pointed out that it is better to train on samples from a larger number of diverse sources than on a less diverse source.
†This situation corresponds to the setup of the BOSS competition [1].
‡T. Pevný. Personal communication, November 2013.
   Cover source                  Format/processing                 Native resolution
 1 Canon EOS 400D                RAW, dcraw                        3906×2602
 2 Canon EOS 7D                  RAW, dcraw                        5202×3465
 3 Canon EOS Digital Rebel XSi   RAW, dcraw                        4290×2856
 4 Pentax K20D                   RAW, dcraw                        4688×3124
 5 Nikon D70                     RAW, dcraw                        3039×2014
 6 Leica M9                      RAW, dcraw                        5216×3472
 7 Canon EOS 5D Mark II          RAW, dcraw                        5634×3753
 8 Canon EOS 20D                 RAW, dcraw                        3522×2384
 9 Canon EOS 550D                RAW, dcraw                        5202×3465
10 Canon 6D                      RAW, ufraw                        5496×3670
11 Canon 6D                      JPEG                              5472×3648
12 Sony DSC HX100V               JPEG                              4608×3456
13 Leica M9                      RAW, dcraw, resized to 1024×1024  5216×3472

Table 1. Thirteen cover sources for experiments in the spatial domain.
3. SPATIAL DOMAIN
3.1 Cover sources
For experiments in the spatial domain, we selected 13 different sources listed in Table 1, each consisting of 10,000 images. Sources 1–9 were prepared using 9 different cameras by converting 1,000 images originally acquired at full resolution in the raw format to a 24-bit color TIFF image using 'dcraw'§ (calling 'dcraw -d -T img_filename') and then converting the true-color image to 8-bit grayscale in Matlab (rgb2gray). Ten 512×512 disjoint image blocks were cut from each of these images to form 10,000 images per source. Source 10 was prepared in the same manner except the 'dcraw' routine was replaced with 'ufraw'¶ (calling 'ufraw-batch img_filename'). Sources 11–12 were produced by taking 1,000 images directly in the JPEG format in the camera and decompressing them to the spatial domain. Canon 6D images were compressed with two JPEG quality factors for the luminance channel: 75 and ≈98, while the SONY used three customized quantization tables. Two tables were close to the standard table with quality factor ≈90, while the other was close to 94. Source 13 was obtained from 5,000 raw Leica M9 images (a different batch than those used for Source 6) using the same conversion as for Source 6. The grayscale full-size images were then resized in Matlab with the bicubic kernel with no antialiasing so that the smaller dimension was 1024, centrally cropped to 1024×1024, and finally cut up into two 512×512 images positioned in the lower part.
§www.cybercom.net/~dcoffin/dcraw/
¶ufraw.sourceforge.net/
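For concreteness, the per-image preparation of Sources 1–9 could look roughly as follows. This is a minimal sketch assuming 'dcraw' is on the PATH; the order in which the ten disjoint blocks are selected is our assumption, as it is not specified above.

    import subprocess
    import numpy as np
    from PIL import Image

    def prepare_blocks(raw_path, n_blocks=10, block=512):
        # Develop the raw file with the same call as in the text: 'dcraw -d -T'.
        subprocess.run(["dcraw", "-d", "-T", raw_path], check=True)
        tiff_path = raw_path.rsplit(".", 1)[0] + ".tiff"   # dcraw's output name
        img = np.asarray(Image.open(tiff_path), dtype=np.float64)
        if img.ndim == 3:
            # Matlab's rgb2gray uses the ITU-R BT.601 luma weights.
            img = img @ [0.2989, 0.5870, 0.1140]
        gray = np.clip(np.round(img), 0, 255).astype(np.uint8)
        # Cut disjoint 512x512 blocks; raster-scan order is an assumption.
        blocks = []
        for r in range(0, gray.shape[0] - block + 1, block):
            for c in range(0, gray.shape[1] - block + 1, block):
                if len(blocks) < n_blocks:
                    blocks.append(gray[r:r + block, c:c + block])
        return blocks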
3.2 Stego algorithm and feature sets
All tests in this section were carried out with non-adaptive LSB matching with a fixed change rate of 0.1 changes per pixel. The steganalyzer was the ensemble ver. 2.0 (dde.binghamton.edu/download/ensemble/) run with the default settings. The experiments were conducted with two different feature vectors: 1) the 338-dimensional SQUARE submodel of the Spatial Rich Model (SRM) [5] with the quantization step q = 1, and 2) the 12,753-dimensional SRMQ1 feature vector.
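The embedding can be simulated directly. The following is a minimal sketch of non-adaptive LSB matching at a fixed change rate; the handling of saturated pixels at 0 and 255 is our assumption, as implementations differ at the boundaries.

    import numpy as np

    def lsb_matching(cover, change_rate=0.1, seed=None):
        # Simulate LSB matching: change each pixel independently with
        # probability 'change_rate' by +1 or -1 chosen at random.
        rng = np.random.default_rng(seed)
        stego = cover.astype(np.int16)
        flip = rng.random(cover.shape) < change_rate
        delta = rng.choice([-1, 1], size=cover.shape)
        # Push saturated pixels inward so the result stays in [0, 255]
        # (an assumption about boundary handling).
        delta[stego == 0] = 1
        delta[stego == 255] = -1
        stego[flip] += delta[flip]
        return stego.astype(np.uint8)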
3.3 Experiments
The experiments consisted of the following tasks. First, we trained on Source i and tested on Source j to see the impact of the CSM on detection accuracy. For training, a randomly selected half of the 10,000 cover–stego pairs was chosen, while the testing was done on 5,000 randomly selected pairs from Source j. For i = j, the no-CSM case, care was taken to make sure the training and testing sets were disjoint.
Second, we tested three simple strategies for mitigating the CSM impact. The first, called 'Mixture', consisted of training on a mixture of 5,000 images from 12 sources (12×417 cover–stego pairs uniformly randomly selected from each source) and testing on 5,000 images from the remaining 13th source. The second strategy, 'Closest source', used a bank of 13 classifiers trained on one randomly selected half of each source. Given a set of images from testing Source j, we first determined the closest source among the other 12 sources and then tested images from Source j using the classifier trained for the closest cover source. In the third method, 'Closest source (indiv.)', we steganalyzed each testing image separately with the classifier trained for the cover source closest to that individual image. Note that the first and third methods do not require other examples from the testing source than the test image, while the second method assumes the availability of multiple images from the testing source. These images are not used for retraining but merely for computing the distances between the sources.
We experimented with several different measures of source closeness. The overall best performer was a simple L2 norm between the centers of gravity of the cover feature clusters. The Mahalanobis distance performed slightly worse, while a measure of closeness based on the detection error of a classifier trained to distinguish both cover sources was unable to provide useful information when the sources were very different and the classifier error was near zero (e.g., between any raw source and the decompressed JPEGs).
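A minimal sketch of this closest-source selection follows; variable names are illustrative, and the feature extraction itself is not shown.

    import numpy as np

    def closest_source(test_features, train_features_per_source):
        # Distance between sources = L2 norm between the centers of gravity
        # (sample means) of their cover-feature clusters.
        test_center = test_features.mean(axis=0)
        dists = [np.linalg.norm(test_center - f.mean(axis=0))
                 for f in train_features_per_source]
        return int(np.argmin(dists))   # index of the closest training source

For the per-image variant 'Closest source (indiv.)', the same distance can be computed with the single test image's feature vector in place of the mean.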
SQUARE (338)

   Cover source                  No CSM       Worst case   Median       Mixture      Closest source  Closest source (indiv.)
 1 Canon EOS 400D                .0457±.0010  .4992±.0002  .1043±.0026  .0863±.0018  .0610±.0017     .1123±.0013
 2 Canon EOS 7D                  .2529±.0027  .5009±.0011  .3625±.0044  .3623±.0047  .3615±.0039     .3590±.0026
 3 Canon EOS Digital Rebel XSi   .0774±.0013  .4998±.0001  .1745±.0057  .1479±.0029  .1104±.0101     .1327±.0058
 4 Pentax K20D                   .0869±.0016  .4991±.0002  .1338±.0027  .1271±.0016  .1042±.0018     .1611±.0024
 5 Nikon D70                     .0816±.0022  .4995±.0003  .1825±.0035  .1674±.0034  .1180±.0038     .1355±.0031
 6 Leica M9                      .0720±.0016  .4991±.0002  .2238±.0057  .1758±.0031  .1416±.0022     .2256±.0027
 7 Canon EOS 5D Mark II          .0389±.0008  .4984±.0003  .0780±.0040  .0465±.0009  .0536±.0017     .2429±.0006
 8 Canon EOS 20D                 .0465±.0011  .4985±.0003  .0947±.0046  .0726±.0017  .0567±.0015     .0992±.0013
 9 Canon EOS 550D                .0742±.0020  .4994±.0002  .1300±.0035  .1161±.0029  .0969±.0025     .1319±.0018
10 Canon 6D                      .1209±.0022  .4992±.0003  .4683±.0157  .4740±.0086  .4941±.0005     .4578±.0120
11 Canon 6D (JPEG)               .0028±.0007  .4702±.0024  .4381±.0098  .0834±.0150  .3029±.0089     .2437±.0094
12 Sony DSC HX100V (JPEG)        .0000±.0000  .4963±.0010  .4878±.0035  .0593±.0237  .0002±.0001     .1180±.0108
13 Leica M9 (Resized)            .0619±.0015  .4928±.0029  .3400±.0085  .2366±.0116  .3325±.0055     .4105±.0038

Table 2. Detection error for no CSM, for CSM (worst and median values when training on the remaining 12 sources), and when employing one of the three strategies. See the text for more details. Feature space: 338-dimensional 'SQUARE'.
SRM (12,753)

   Cover source                  No CSM       Worst case   Median       Mixture      Closest source  Closest source (indiv.)
 1 Canon EOS 400D                .0022±.0004  .4987±.0002  .0116±.0010  .0081±.0007  .0104±.0008     .0440±.0008
 2 Canon EOS 7D                  .0348±.0012  .4983±.0003  .2349±.0130  .1303±.0102  .4744±.0059     .2913±.0036
 3 Canon EOS Digital Rebel XSi   .0037±.0005  .4994±.0001  .0260±.0023  .0214±.0034  .0136±.0011     .0726±.0018
 4 Pentax K20D                   .0067±.0006  .4978±.0003  .0372±.0018  .0185±.0010  .0228±.0009     .0310±.0005
 5 Nikon D70                     .0052±.0006  .4986±.0002  .0248±.0009  .0210±.0025  .0229±.0012     .0420±.0013
 6 Leica M9                      .0063±.0005  .4976±.0009  .0667±.0070  .0468±.0043  .0641±.0022     .0672±.0033
 7 Canon EOS 5D Mark II          .0062±.0007  .4976±.0003  .0197±.0009  .0107±.0011  .0113±.0005     .0438±.0003
 8 Canon EOS 20D                 .0033±.0006  .4967±.0003  .0134±.0020  .0080±.0007  .0091±.0008     .0295±.0002
 9 Canon EOS 550D                .0058±.0006  .4986±.0002  .0222±.0029  .0127±.0011  .0174±.0010     .0285±.0006
10 Canon 6D                      .0402±.0017  .4994±.0002  .4991±.0002  .4749±.0084  .4988±.0004     .4988±.0001
11 Canon 6D (JPEG)               .0029±.0005  .4696±.0106  .3511±.0162  .0812±.0134  .3487±.0019     .3156±.0090
12 Sony DSC HX100V (JPEG)        .0000±.0000  .4948±.0018  .2929±.0362  .1297±.0494  .0037±.0027     .1804±.0313
13 Leica M9 (Resized)            .0516±.0011  .4845±.0014  .4089±.0126  .2797±.0045  .4845±.0014     .4407±.0081

Table 3. Detection error for no CSM, for CSM (worst and median values when training on the remaining 12 sources), and when employing one of the three strategies. See the text for more details. Feature space: 12,753-dimensional 'SRMQ1'.
The results of all experiments are summarized in Tables 2–3 showing the minimum total detection error under equal priors,

    P_E = \min_{P_{FA}} \frac{1}{2}\left(P_{FA} + P_{MD}\right),    (1)

averaged over ten splits of the training and testing sources into halves (over random selection of images for the mixture), with the statistical spread expressed using the Mean Absolute Deviation (MAD). The first column concerns the case of no CSM. The second and third columns show P_E for the worst case of the CSM and the median P_E over all 12 training sources different from the testing source. These two columns are meant to show how bad the impact of the CSM can be. Columns 4–6 show the detection error P_E when employing one of the three strategies to mitigate the CSM impact.
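Empirically, P_E can be computed from the soft outputs of a detector by sweeping the decision threshold. A minimal sketch (variable names illustrative; inputs are numpy arrays of per-image scores):

    import numpy as np

    def p_e(cover_scores, stego_scores):
        # Total probability of error under equal priors, Eq. (1), minimized
        # over all decision thresholds induced by the observed scores.
        thresholds = np.unique(np.concatenate([cover_scores, stego_scores]))
        best = 0.5                               # random guessing achieves 0.5
        for t in thresholds:
            p_fa = np.mean(cover_scores >= t)    # false-alarm rate on covers
            p_md = np.mean(stego_scores < t)     # missed-detection rate on stego
            best = min(best, (p_fa + p_md) / 2)
        return best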
For sources that are not very diverse (1–9), the three simple strategies seem to be quite effective in suppressing the negative impact of the CSM on detection, which can otherwise be quite severe (cf. columns 2 and 3). Surprisingly, this remains true when steganalyzing with the compact 'SQUARE' feature space as well as with the rich model 'SRMQ1'. The main differences between Sources 1–9 are due to different sensor resolution (which affects the correlations among neighboring pixels) and processing during the signal transfer and quantization. The remainder of the processing pipeline stayed the same – it was executed using 'dcraw'.
The three simple strategies failed for Sources 10–13. Source 10 differs from the first 9 sources in the RAW format converter – 'ufraw' was used instead of 'dcraw'. The differences in the two processing routines apparently produce very different sources. Although untested, we hypothesize that sources generated using either of the two raw converters sensitively depend on the settings, which include the color interpolation algorithm as well as the parameters for further processing, such as color correction, white balance, gamma correction, noise reduction, and a multitude of other operations that can be used to enhance the final image, including adjustment of shadows, highlights, blacks, whites, contrast, exposure, tint, color, saturation, vibrance, clarity, denoising, and lens-distortion and chromatic-aberration correction (both involve resampling). Given the enormous diversity with which a raw camera image can be processed, the three simple strategies can only be effective in practice when appropriately scaled up.
It is worth pointing out the results for the two decompressed JPEG sources, no. 11 and 12. Since ≈70% of the Canon 6D images were compressed with quality factor ≈98, this source is a good training source for Source 12, which contains decompressed JPEGs from the SONY with quality factors ≈90 and ≈94. On the other hand, the SONY images are relatively poor training data for the Canon 6D decompressed JPEGs, which contain images originally stored with a much lower quality factor of 75. This very limited experiment seems to suggest that cover sources comprised of decompressed JPEGs could be effectively steganalyzed by one of the tested strategies by enlarging the range of quality factors for the training sources.
The last source, Leica M9 resized to 1024×1024 (and cropped to 512×512), is again an outlier source because none of the 12 training sources contains images with resizing artifacts. As shown in [15], the resizing algorithm and its parameters can have a very substantial effect on steganalysis.
4. JPEG DOMAIN
Since JPEG compression acts as a low-pass filter, it largely suppresses differences among sources acquired using different cameras as well as differences due to processing, such as resizing. On the other hand, JPEG images depend on a vector parameter – the quantization table(s). Since quantization has a dramatic effect on the distribution of the DCT coefficients forming the JPEG file, it also has a major effect on the accuracy with which steganography can be detected and on the robustness of the detectors to the CSM. Therefore, in this section we study the impact of the CSM caused by mismatched quantization tables. Note that it is not feasible to construct a detector for each quantization table, as many cameras today use custom tables that may even depend on the image content.
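As a toy illustration of why the quantization table matters so much (our own example, not an experiment from this paper): a DCT coefficient d is stored as round(d/q), so even modest changes of the quantization step q reshape the coefficient distribution, in particular the fraction of zeros.

    import numpy as np

    rng = np.random.default_rng(0)
    d = rng.laplace(scale=8.0, size=100_000)   # crude model of one AC DCT mode
    for q in (2, 5, 11):                       # hypothetical quantization steps
        c = np.round(d / q).astype(int)        # quantized coefficients
        print(f"q={q:2d}: fraction of zeros = {np.mean(c == 0):.2f}")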
4.1 Cover sources
We use the BOSSbase 1.01 database consisting of 10,000 8-bit grayscale images of size 512×512. Multiple different sources were created by compressing this mother database with quantization tables for JPEG quality factors 65–100, as well as a set of custom quantization tables extracted from JPEG images coming from several different camera models.
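A minimal sketch of building the standard-table sources follows; the directory paths and the assumption that BOSSbase is stored as PGM files are illustrative, and the custom camera tables are not reproduced here.

    from pathlib import Path
    from PIL import Image

    def make_jpeg_source(src_dir, dst_dir, quality):
        # Compress every BOSSbase image with the standard quantization
        # table for the given JPEG quality factor.
        Path(dst_dir).mkdir(parents=True, exist_ok=True)
        for pgm in sorted(Path(src_dir).glob("*.pgm")):
            img = Image.open(pgm).convert("L")            # 8-bit grayscale
            img.save(Path(dst_dir) / (pgm.stem + ".jpg"), quality=quality)

    for qf in range(65, 101):
        make_jpeg_source("BOSSbase_1.01", f"BOSSbase_QF{qf}", qf)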