ON THE HEAVY-TAILED DISTRIBUTION OF THE
SCENE DURATION IN VBR VIDEO
E. Casilari, A. Reyes, A. Díaz-Estrella and
F. Sandoval
Dpto. Tecnología Electrónica, E.T.S.I. Telecomunicación, Universidad de Málaga, Campus de Teatinos, 29071 Málaga (Spain). Telephone no.: 34-95-2132755; Fax: 34-95-2131447;
E-mail: casilari@dte.uma.es
INDEXING TERMS
VBR Video Traffic, Scene Oriented Model, heavy-tailed distribution, Hill's estimate, Long-Range-Dependence.
ABSTRACT
In this letter we propose a classification of the scene detection techniques commonly used in Variable Bit Rate (VBR) video traffic modelling. Using real video traces, we show that scene duration follows heavy-tailed distributions. This heavy-tailed nature is shown to be independent of the technique and the threshold used for scene detection. This invariant property of video sequences offers a physical explanation for the existence of Long Range Dependence (LRD) in video traffic.
INTRODUCTION
Due to the increasing importance of video services, Variable Bit Rate (VBR) video modelling has become a key issue in the field of multimedia traffic. Accurate video traffic models are needed to solve the still open problems that multimedia traffic management raises, such as policing, shaping or call admission control. VBR traffic is determined by the nature of the audiovisual sequences being transmitted. Many services (such as video on demand or broadcast TV) impose a special variability on video traffic because of their intrinsic evolution through scenes of different complexity and degree of motion. Moreover, it has been argued that Long Range Dependence (LRD), that is, the long-term variability present in VBR video traffic, could be caused by the existence of scenes [1]. This LRD, which can seriously impact network performance [2], can be modelled in two ways: using fractal processes (such as fractional Gaussian noise or fractionally integrated ARMA filters) or following a scene oriented modelling strategy. While fractal processes approximate reality in a behaviourist way ("black box" modelling), including parameters without an evident physical meaning (such as the Hurst parameter), scene oriented models offer a structural approximation ("white box" modelling) which directly imitates the underlying mechanisms of traffic generation. Scenic models assume the existence of a higher level (the scene) which modulates the long-term traffic flow. A necessary step to define this scene level is to divide the real VBR video traces into different scenes. A proper fit of the statistical distribution of the scene duration is essential to fully characterise the impact of scenes on traffic.
In this letter we propose a classification of the scene detection criteria used in the modelling literature. Considering all these criteria and using a wide set of traces compressed under different schemes, we show that infinite variance (the Noah effect) is an invariant property of the scene duration in video services.
SCENE DETECTION TECHNIQUES
The following classification summarises the most common techniques to determine scene changes in real VBR video traces:
- Visual detection: in this non-analytical technique, scenes are detected visually through careful observation of the real uncompressed video sequence. This solution has several obvious drawbacks: the visual sequences are often not available, long human monitoring (frame by frame) is required, and scene limits are not always visually clear (because of effects such as camera panning, zooms or fading). Moreover, real scenes and changes in the bit rate do not always correspond. For instance, two visually different scenes can generate the same traffic if they have the same degree of motion and image complexity.
- High pass filtering [3]: as scene changes usually provoke sudden rate changes in the traffic flow, a first or second order high pass filter can be enough to detect them. According to this technique, a scene change occurs in the i-th interval if
$$X[i] - X[i-1] > X_T \qquad (1)$$
where X[i] represents the traffic generated during the i-th interval (normally the frame period) and X_T is a threshold.
- A combination of low and high pass filtering [1]: this is a variant of the previous technique. In order to avoid noise, the difference between adjacent samples is compared with the mean value of the last W samples, which is calculated with an averaging filter:
$$\frac{X[i] - X[i-1]}{\frac{1}{W}\sum_{j=i-W}^{i-1} X[j]} > X_T \qquad (2)$$
- Low pass filtering and scene classification [4]: in this case, after averaging the traffic series, each sample of the signal is classified into one of N different types of scenes depending on the averaged traffic of the W adjacent samples.
- Clustering of the state space [5]: this method divides the state space (X[n], X[n+1]) into N clusters representing N types of scenes. The division is performed by minimising the distances to N centroids within the state space. An iterative search algorithm is required to optimise the position of the N centroids.
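The two threshold-based criteria in (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the made-up `trace` values and the handling of the first W samples (skipped until a full window is available) are assumptions, and the difference between adjacent samples is taken as signed, following the expressions as written.

```python
from typing import List, Sequence


def high_pass_changes(x: Sequence[float], threshold: float) -> List[int]:
    """Indices i where X[i] - X[i-1] exceeds the threshold, as in (1)."""
    return [i for i in range(1, len(x)) if x[i] - x[i - 1] > threshold]


def combined_filter_changes(x: Sequence[float], window: int,
                            threshold: float) -> List[int]:
    """Indices i where the jump, divided by the mean of the W previous
    samples, exceeds the threshold, as in (2)."""
    changes = []
    for i in range(window, len(x)):  # assumption: wait for a full window
        local_mean = sum(x[i - window:i]) / window
        if local_mean > 0 and (x[i] - x[i - 1]) / local_mean > threshold:
            changes.append(i)
    return changes


def scene_durations(changes: Sequence[int], n_samples: int) -> List[int]:
    """Scene lengths (in samples) delimited by the detected change points."""
    bounds = [0] + list(changes) + [n_samples]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical per-frame traffic with two upward jumps.
trace = [10.0] * 5 + [30.0] * 4 + [60.0] * 6
changes = combined_filter_changes(trace, window=3, threshold=0.5)
print(changes, scene_durations(changes, len(trace)))  # prints [5, 9] [5, 4, 6]
```

The durations returned by `scene_durations` (in frame periods) are the kind of samples whose tail behaviour is studied in the remainder of the letter.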
ESTIMATION OF THE HEAVY-TAILED NATURE OF THE SCENE DURATION
A probability distribution function F_X(x) of a random variable X[n] is said to be heavy-tailed if it exhibits a hyperbolic decay of the form
$$G_X(x) = \Pr(X[n] > x) = 1 - F_X(x) = G_o \cdot x^{-\alpha}$$
when x → ∞, where G_X(x) represents the complementary distribution function and G_o is a constant.
It can be proved that for α equal to or lower than 2 the distribution has infinite variance (the Noah effect), which is related to an extreme variability of X[n].
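The threshold α = 2 can be checked with a short calculation. The sketch below assumes, for simplicity, a non-negative variable whose tail follows the hyperbolic form exactly above some cutoff x_0 > 0 (x_0 is a symbol introduced here only for illustration).

```latex
\begin{aligned}
E\!\left[X^{2}\right]
  &= \int_{0}^{\infty} 2x\,G_X(x)\,dx
   = \int_{0}^{x_0} 2x\,G_X(x)\,dx
   + \int_{x_0}^{\infty} 2\,G_o\,x^{1-\alpha}\,dx, \\[1ex]
\int_{x_0}^{\infty} 2\,G_o\,x^{1-\alpha}\,dx
  &= \begin{cases}
       \dfrac{2\,G_o\,x_0^{\,2-\alpha}}{\alpha-2} < \infty, & \alpha > 2,\\[2ex]
       \infty, & \alpha \le 2.
     \end{cases}
\end{aligned}
```

The first integral is at most x_0^2, so under this assumption the second moment, and therefore the variance, is finite only when α > 2.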
There exist two common methods to determine α from a series X[n]:
- Plotting G_X(x) on a log-log scale yields, for a heavy-tailed distribution, an approximately straight line for large x-values, with a slope of -α. So, a common way to estimate α is to perform a least squares regression on this representation.
- A statistically more rigorous method is known as Hill's estimate [6]:
$$\hat{\alpha}(k) \propto \left[\frac{1}{k}\sum_{i=1}^{k-1}\log\left(\frac{X_{[N-i]}}{X_{[N-k]}}\right)\right]^{-1} \qquad (3)$$
where X_[N-i] is the i-th largest element of the N samples. As the index k increases, the series α̂(k) stabilises at values close to α.
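A minimal Python sketch of Hill's estimate follows. It uses the common textbook form, in which the k largest samples are compared with the (k+1)-th largest one, so its indexing may differ slightly from (3). The function name and the synthetic Pareto and exponential series are illustrative assumptions, not the traces analysed in this letter.

```python
import math
import random
from typing import List, Sequence


def hill_estimates(samples: Sequence[float]) -> List[float]:
    """Return alpha_hat(k) for k = 1 .. N-1; all samples must be positive.

    Entry k-1 uses the k largest values relative to the (k+1)-th largest.
    """
    ordered = sorted(samples, reverse=True)
    estimates = []
    log_sum = 0.0
    for k in range(1, len(ordered)):
        log_sum += math.log(ordered[k - 1])
        # Mean of log(x_(i) / x_(k+1)) over the k largest samples.
        mean_log_ratio = log_sum / k - math.log(ordered[k])
        estimates.append(1.0 / mean_log_ratio if mean_log_ratio > 0 else math.inf)
    return estimates


# Synthetic data (assumption): Pareto tail with alpha = 1.5 versus exponential.
rng = random.Random(0)
pareto = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(10000)]
expo = [rng.expovariate(1.0) for _ in range(10000)]

k = 500  # number of upper order statistics used
print("Pareto      alpha_hat(k):", hill_estimates(pareto)[k - 1])
print("Exponential alpha_hat(k):", hill_estimates(expo)[k - 1])
```

Because Hill's estimate describes only the upper tail, k should stay small relative to the number of samples; plotting the whole sequence against k, as in a Hill plot, shows where it levels off.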
The presence of the Noah effect is strongly related to the existence of LRD. In [6] it is proved that traffic sources whose activity periods (scenes, in the case of video) follow heavy-tailed distributions (e.g. Pareto) generate LRD or self-similar traffic, that is, traffic whose variability is not limited to a certain time scale.
To analyse the nature of the scene duration distribution and the influence of the detection technique chosen, we consider five long real VBR video traces compressed under different schemes (interframe and intraframe): 1) a series called "Wurzburg" consisting of several 30-minute sequences of different video signals, containing TV programs, news, films, sports and cartoons, compressed with an MPEG encoder at the University of Wurzburg (Germany); 2) the film "E.T. the Extraterrestrial" compressed under Motion JPEG (intraframe); 3) a series called "MTV", also compressed under M-JPEG, including several hours of a TV channel; 4) and 5) the film "Star Wars" with MPEG and M-JPEG compression, respectively. Figure 1 shows the evolution of Hill's estimate when it is applied to the scene durations of these traces. In all cases a combination of high and low pass filtering was used to detect scene changes. The figure also plots the estimates for an exponentially distributed random series of 10000 samples. In contrast to the exponential series, for all the real video series the estimator converges to values under 2, reflecting the existence of an infinite variance. These results are confirmed in Table I, where the regression method is applied to calculate α.
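The regression method mentioned above can be sketched as follows. This is an illustration of how such a fit might be done, not the procedure used for Table I: the function name, the `tail_fraction` parameter (the share of largest samples used in the fit) and the use of the empirical Pr(X >= x) as the plotted tail probability are all assumptions made here.

```python
import math
import random
from typing import Sequence


def tail_slope_alpha(samples: Sequence[float], tail_fraction: float = 0.1) -> float:
    """Estimate alpha as minus the least-squares slope of log G(x) versus
    log x, fitted over the largest samples only (samples must be positive)."""
    ordered = sorted(samples)
    n = len(ordered)
    start = int(n * (1.0 - tail_fraction))
    # Empirical tail probability at the j-th smallest value (0-based):
    # the fraction of samples greater than or equal to it.
    log_x = [math.log(ordered[j]) for j in range(start, n)]
    log_g = [math.log((n - j) / n) for j in range(start, n)]
    mean_x = sum(log_x) / len(log_x)
    mean_g = sum(log_g) / len(log_g)
    sxy = sum((u - mean_x) * (v - mean_g) for u, v in zip(log_x, log_g))
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    return -sxy / sxx


# Synthetic Pareto series with alpha = 1.5 (assumption, for illustration only).
rng = random.Random(0)
pareto = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(10000)]
print("alpha from log-log regression:", tail_slope_alpha(pareto))
```

Restricting the fit to the upper tail matters because the straight-line behaviour is only expected for large x-values; the fitted slope can change noticeably with the chosen cutoff.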
Figure 2 depicts Hill's estimate when it is applied to the scenes of "Wurzburg" obtained with different methods and thresholds, which are normalised by the mean value. In particular, we consider a low pass filtering (with W = 100 and N = 3 types of scenes), a high pass filtering (with threshold X_T = 0.5), a clustering method (with N = 2 types of scenes) and four combinations of high and low pass filters. In three of them the window W is fixed to 100 and three thresholds are considered (X_T = 0.5, 0.8 and 1). In the other case, X_T is 0.5 and W is changed to 300. Although the results differ in most cases, α again tends to stabilise at values lower than 2, independently of the threshold or the window size of the averaging filter.
CONCLUSIONS
In this letter we have shown that the scene duration of VBR video is intrinsically heavy-tailed, exhibiting infinite variance. This property has been detected independently of the nature of the audiovisual signal (film or TV) and of the criterion used to detect scene changes. This phenomenon provides a physical explanation for the existence of LRD or self-similarity in video traffic, establishing an invariant property that VBR traffic models should take into account.
ACKNOWLEDGEMENTS
This work has been partially supported by the Spanish Comisión Interministerial de Ciencia y Tecnología (CICYT), Project No. TIC96-0743.
REFERENCES
[1] Jelenkovic, P.R., Lazar, A.A., and Semret, N.: "The Effect of Multiple Time Scales and Subexponentiality in MPEG Video Streams on Queueing Behavior", IEEE Journal on Selected Areas in Communications, Vol. 15, No. 6, August 1997, pp. 1052-1071.
[2] Huang, C., Devetsikiotis, M., Lambadaris, I., and Kaye, A.R.: "Self-Similar Traffic and Its Implications for ATM Network Design", Proc. of ICCT'96, Beijing, China, May 1996.
[3] Melamed, B., and Pendarakis, D.: "A TES-Based Model for Compressed 'Star Wars' Video", Proceedings of the Communications Theory Mini-Conference at GLOBECOM'94, San Francisco, California, USA, November 1994, pp. 70-81.
[4] Casilari, E., Lorente, M., Reyes, A., Díaz-Estrella, A., and Sandoval, F.: "Scene Oriented Model For VBR Video", IEE Electronics Letters, Vol. 34, No. 2, January 1998, pp. 166-168.
[5] Chandra, K., and Reibman, A.R.: "Modeling One and Two-Layer Variable Bit Rate Video", Research report, AT&T Laboratories, New Jersey, USA, 1997.
[6] Willinger, W., Taqqu, M.S., Sherman, R., and Wilson, D.V.: "Self-Similarity Through High-Variability: Statistical Analysis of Ethernet LAN Traffic at the Source Level", Proceedings of the ACM/SIGCOMM'95, Cambridge, Massachusetts, USA, August 1995, pp. 100-113.
FIGURE CAPTIONS:
Figure 1. Hill's estimate of α for the scene duration of different video traces.
Figure 2. Hill's estimate of α for the scene duration of "Wurzburg" using different scene detection techniques.
[Figure 1: Hill's estimate for scene duration distribution, α(k) plotted against k.]
