Adaptive background mixture models for real-time tracking
Chris Stauffer    W.E.L. Grimson
The Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Cambridge, MA 02139
Abstract
A common method for real-time segmentation of moving regions in image sequences involves “background subtraction,” or thresholding the error between an estimate of the image without moving objects and the current image. The numerous approaches to this problem differ in the type of background model used and the procedure used to update the model. This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model. The Gaussian distributions of the adaptive mixture model are then evaluated to determine which are most likely to result from a background process. Each pixel is classified based on whether the Gaussian distribution which represents it most effectively is considered part of the background model.
This results in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. This system has been run almost continuously for 16 months, 24 hours a day, through rain and snow.
1 Introduction
In the past, computational barriers have limited the complexity of real-time video processing applications. As a consequence, most systems were either too slow to be practical, or succeeded by restricting themselves to very controlled situations. Recently, faster computers have enabled researchers to consider more complex, robust models for real-time analysis of streaming data. These new methods allow researchers to begin modeling real world processes under varying conditions.
Consider the problem of video surveillance and monitoring. A robust system should not depend on careful placement of cameras. It should also be robust to whatever is in its visual field or whatever lighting effects occur. It should be capable of dealing with movement through cluttered areas, objects overlapping in the visual field, shadows, lighting changes, effects of moving elements of the scene (e.g., swaying trees), slow-moving objects, and objects being introduced or removed from the scene. Traditional approaches based on backgrounding methods typically fail in these general situations. Our goal is to create a robust, adaptive tracking system that is flexible enough to handle variations in lighting, moving scene clutter, multiple moving objects and other arbitrary changes to the observed scene. The resulting tracker is primarily geared towards scene-level video surveillance applications.
1.1 Previous work and current shortcomings
Most researchers have abandoned non-adaptive methods of backgrounding because of the need for manual initialization. Without re-initialization, errors in the background accumulate over time, making this method useful only in highly-supervised, short-term tracking applications without significant changes in the scene.
A standard method of adaptive backgrounding is averaging the images over time, creating a background approximation which is similar to the current static scene except where motion occurs. While this is effective in situations where objects move continuously and the background is visible a significant portion of the time, it is not robust to scenes with many moving objects, particularly if they move slowly. It also cannot handle bimodal backgrounds, recovers slowly when the background is uncovered, and has a single, predetermined threshold for the entire scene.
Changes in scene lighting can cause problems for many backgrounding methods. Ridder et al. [5] modeled each pixel with a Kalman filter, which made their system more robust to lighting changes in the scene. While this method does have a pixel-wise automatic threshold, it still recovers slowly and does not handle bimodal backgrounds well. Koller et al. [4] have successfully integrated this method in an automatic traffic monitoring application.
Pfinder [7] uses a multi-class statistical model for the tracked objects, but the background model is a single Gaussian per pixel. After an initialization period where the room is empty, the system reports good results. There have been no reports on the success of this tracker in outdoor scenes.
Friedman and Russell [2] have recently implemented a pixel-wise EM framework for detection of vehicles that bears the most similarity to our work. Their method attempts to explicitly classify the pixel values into three separate, predetermined distributions corresponding to the road color, the shadow color, and colors corresponding to vehicles. Their attempt to mediate the effect of shadows appears to be somewhat successful, but it is not clear what behavior their system would exhibit for pixels which did not contain these three distributions. For example, pixels may present a single background color or multiple background colors resulting from repetitive motions, shadows, or reflectances.
1.2 Our approach
Rather than explicitly modeling the values of all the pixels as one particular type of distribution, we simply model the values of a particular pixel as a mixture of Gaussians. Based on the persistence and the variance of each of the Gaussians of the mixture, we determine which Gaussians may correspond to background colors. Pixel values that do not fit the background distributions are considered foreground until there is a Gaussian that includes them with sufficient, consistent evidence supporting it.
Our system adapts to deal robustly with lighting changes, repetitive motions of scene elements, tracking through cluttered regions, slow-moving objects, and introducing or removing objects from the scene. Slowly moving objects take longer to be incorporated into the background, because their color has a larger variance than the background. Also, repetitive variations are learned, and a model for the background distribution is generally maintained even if it is temporarily replaced by another distribution, which leads to faster recovery when objects are removed.
Our backgrounding method contains two significant parameters: α, the learning constant, and T, the proportion of the data that should be accounted for by the background. Without needing to alter parameters, our system has been used in an indoor, human-computer interface application and, for the past 16 months, has been continuously monitoring outdoor scenes.
2 The method
If each pixel resulted from a particular surface under particular lighting, a single Gaussian would be sufficient to model the pixel value while accounting for acquisition noise. If only lighting changed over time, a single, adaptive Gaussian per pixel would be sufficient. In practice, multiple surfaces often appear in the view frustum of a particular pixel and the lighting conditions change. Thus, multiple, adaptive Gaussians are necessary. We use a mixture of adaptive Gaussians to approximate this process.

Figure 1: The execution of the program. (a) the current image, (b) an image composed of the means of the most probable Gaussians in the background model, (c) the foreground pixels, (d) the current image with tracking information superimposed. Note: while the shadows are foreground in this case, if the surface was covered by shadows a significant amount of the time, a Gaussian representing those pixel values may be significant enough to be considered background.
Each time the parameters of the Gaussians are updated, the Gaussians are evaluated using a simple heuristic to hypothesize which are most likely to be part of the “background process.” Pixel values that do not match one of the pixel’s “background” Gaussians are grouped using connected components. Finally, the connected components are tracked from frame to frame using a multiple hypothesis tracker. The process is illustrated in Figure 1.
2.1 Online mixture model
Figure 2: Images and scatter plots of the red and green values of a single pixel from the image over time, illustrating some of the difficulties involved in real environments. (a) shows two scatter plots from the same pixel taken 2 minutes apart; this would require two thresholds. (b) shows a bimodal distribution of pixel values resulting from specularities on the surface of water. (c) shows another bimodality resulting from monitor flicker.

We consider the values of a particular pixel over time as a “pixel process”. The “pixel process” is a time series of pixel values, e.g. scalars for grayvalues or vectors for color images. At any time, t, what is known about a particular pixel, \{x_0, y_0\}, is its history

\{X_1, \ldots, X_t\} = \{ I(x_0, y_0, i) : 1 \le i \le t \} \qquad (1)

where I is the image sequence. Some “pixel processes” are shown by the (R,G) scatter plots in Figure 2(a)-(c), which illustrate the need for adaptive systems with automatic thresholds. Figure 2(b) and (c) also highlight a need for a multi-modal representation.
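For concreteness, the history of Eq. (1) could be gathered as follows. This is a minimal sketch; the frame layout and indexing convention are our assumptions, not from the paper:

```python
import numpy as np

def pixel_process(frames, x0, y0):
    """Return the history {X_1, ..., X_t} of pixel (x0, y0) per Eq. (1).

    frames -- iterable of H x W x 3 images I(., ., i), i = 1, ..., t
    """
    return np.array([frame[y0, x0] for frame in frames])
```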
The value of each pixel represents a measurement of the radiance in the direction of the sensor of the first object intersected by the pixel’s optical ray. With a static background and static lighting, that value would be relatively constant. If we assume that independent, Gaussian noise is incurred in the sampling process, its density could be described by a single Gaussian distribution centered at the mean pixel value. Unfortunately, the most interesting video sequences involve lighting changes, scene changes, and moving objects. If lighting changes occurred in a static scene, it would be necessary for the Gaussian to track those changes. If a static object was added to the scene and was not incorporated into the background until it had been there longer than the previous object, the corresponding pixels could be considered foreground for arbitrarily long periods. This would lead to accumulated errors in the foreground estimation, resulting in poor tracking behavior. These factors suggest that more recent observations may be more important in determining the Gaussian parameter estimates.
An additional aspect of variation occurs if moving objects are present in the scene. Even a relatively consistently colored moving object is generally expected to produce more variance than a “static” object. Also, in general, there should be more data supporting the background distributions because they are repeated, whereas pixel values for different objects are often not the same color.
These are the guiding factors in our choice of model and update procedure. The recent history of each pixel, \{X_1, \ldots, X_t\}, is modeled by a mixture of K Gaussian distributions. The probability of observing the current pixel value is

P(X_t) = \sum_{i=1}^{K} \omega_{i,t} \, \eta(X_t, \mu_{i,t}, \Sigma_{i,t}) \qquad (2)

where K is the number of distributions, \omega_{i,t} is an estimate of the weight (what portion of the data is accounted for by this Gaussian) of the i-th Gaussian in the mixture at time t, \mu_{i,t} is the mean value of the i-th Gaussian in the mixture at time t, \Sigma_{i,t} is the covariance matrix of the i-th Gaussian in the mixture at time t, and \eta is a Gaussian probability density function

\eta(X_t, \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \, e^{-\frac{1}{2} (X_t - \mu_t)^T \Sigma^{-1} (X_t - \mu_t)} \qquad (3)
K is determined by the available memory and computational power. Currently, from 3 to 5 are used. Also, for computational reasons, the covariance matrix is assumed to be of the form

\Sigma_{k,t} = \sigma_k^2 I \qquad (4)

This assumes that the red, green, and blue pixel values are independent and have the same variances. While this is certainly not the case, the assumption allows us to avoid a costly matrix inversion at the expense of some accuracy.
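Under this assumption, Eqs. (2)-(4) reduce to a handful of array operations. The following is a minimal sketch; function and variable names are ours, not from the paper:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Eq. (3) specialized to Eq. (4), Sigma = var * I, so that
    |Sigma|^(1/2) = var^(n/2) and Sigma^(-1) = I / var."""
    n = x.shape[0]
    diff = x - mu
    norm = (2.0 * np.pi) ** (n / 2.0) * var ** (n / 2.0)
    return float(np.exp(-0.5 * (diff @ diff) / var) / norm)

def mixture_probability(x, weights, means, variances):
    """Eq. (2): P(X_t) = sum_i omega_{i,t} * eta(X_t, mu_{i,t}, Sigma_{i,t})."""
    return sum(w * gaussian_pdf(x, mu, var)
               for w, mu, var in zip(weights, means, variances))
```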
Thus, the distribution of recently observed values of each pixel in the scene is characterized by a mixture of Gaussians. A new pixel value will, in general, be represented by one of the major components of the mixture model and used to update the model.
If the pixel process could be considered a stationary process, a standard method for maximizing the likelihood of the observed data is expectation maximization [1]. Unfortunately, each pixel process varies over time as the state of the world changes, so we use an approximate method which essentially treats each new observation as a sample set of size 1 and uses standard learning rules to integrate the new data.
Because there is a mixture model for every pixel in the image, implementing an exact EM algorithm on a window of recent data would be costly. Instead, we implement an on-line K-means approximation. Every new pixel value, X_t, is checked against the existing K Gaussian distributions until a match is found. A match is defined as a pixel value within 2.5 standard deviations of a distribution.¹ This threshold can be perturbed with little effect on performance. This is effectively a per pixel/per distribution threshold. This is extremely useful when different regions have different lighting (see Figure 2(a)), because objects which appear in shaded regions do not generally exhibit as much noise as objects in lighted regions. A uniform threshold often results in objects disappearing when they enter shaded regions.
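In code, the match test reduces to a per-distribution distance check. A sketch under the same diagonal-covariance assumption as above (the function name and argument layout are ours):

```python
import numpy as np

def find_match(x, means, variances, threshold=2.5):
    """Return the index of the first Gaussian whose mean lies within
    `threshold` standard deviations of x, or None if no Gaussian matches."""
    for k, (mu, var) in enumerate(zip(means, variances)):
        # The test is scaled by each distribution's own sigma, giving the
        # per pixel/per distribution threshold described above.
        if np.linalg.norm(x - mu) < threshold * np.sqrt(var):
            return k
    return None
```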
If none of the K distributions match the current pixel value, the least probable distribution is replaced with a distribution with the current value as its mean value, an initially high variance, and low prior weight.
The prior weights of the K distributions at time t, \omega_{k,t}, are adjusted as follows:

\omega_{k,t} = (1 - \alpha)\, \omega_{k,t-1} + \alpha\, M_{k,t} \qquad (5)

where α is the learning rate² and M_{k,t} is 1 for the model which matched and 0 for the remaining models. After this approximation, the weights are renormalized. 1/α defines the time constant which determines the speed at which the distribution’s parameters change. \omega_{k,t} is effectively a causal low-pass filtered average of the (thresholded) posterior probability that pixel values have matched model k given observations from time 1 through t. This is equivalent to the expectation of this value with an exponential window on the past values.
The μ and σ parameters for unmatched distributions remain the same. The parameters of the distribution which matches the new observation are updated as follows:

\mu_t = (1 - \rho)\, \mu_{t-1} + \rho\, X_t \qquad (6)

\sigma_t^2 = (1 - \rho)\, \sigma_{t-1}^2 + \rho\, (X_t - \mu_t)^T (X_t - \mu_t) \qquad (7)

where

\rho = \alpha\, \eta(X_t \mid \mu_k, \sigma_k) \qquad (8)

¹ Depending on the kurtosis of the noise, some percentage of the data points “generated” by a Gaussian will not “match”. The resulting random noise is easily ignored by neglecting connected components containing only 1 or 2 pixels.

² While this rule is easily interpreted as an interpolation between two points, it is often shown in the equivalent form \omega_{k,t} = \omega_{k,t-1} + \alpha (M_{k,t} - \omega_{k,t-1}), which is effectively the same type of causal low-pass filter as mentioned above, except that only the data which matches the model is included in the estimation.
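Putting Eqs. (5)-(8) together with the replacement rule above, one update of a single pixel's mixture might look as follows. This is a sketch reusing `gaussian_pdf` and `find_match` from the earlier sketches; the initialization constants and the handling of the weights in the no-match case are our assumptions:

```python
import numpy as np

INIT_VARIANCE = 900.0  # "initially high variance" -- constant is our assumption
INIT_WEIGHT = 0.05     # "low prior weight" -- constant is our assumption

def update_pixel(x, weights, means, variances, alpha=0.01):
    """One on-line update of a single pixel's K-Gaussian mixture.

    weights   -- (K,) array of mixture weights omega_{k,t}
    means     -- (K, n) array of means mu_{k,t}
    variances -- (K,) array of sigma_k^2 (Eq. (4))
    """
    k = find_match(x, means, variances)
    if k is None:
        # No match: replace the least probable distribution with one
        # centered on the current value.
        k = int(np.argmin(weights))
        means[k] = x
        variances[k] = INIT_VARIANCE
        weights[k] = INIT_WEIGHT
    else:
        rho = alpha * gaussian_pdf(x, means[k], variances[k])        # Eq. (8)
        means[k] = (1 - rho) * means[k] + rho * x                    # Eq. (6)
        diff = x - means[k]
        variances[k] = (1 - rho) * variances[k] + rho * (diff @ diff)  # Eq. (7)
        weights *= (1 - alpha)                                       # Eq. (5), M = 0
        weights[k] += alpha                                          # Eq. (5), M = 1
    weights /= weights.sum()  # renormalize after the approximation
```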
One of the significant advantages of this method is that when something is allowed to become part of the background, it doesn’t destroy the existing model of the background. The original background color remains in the mixture until it becomes the K-th most probable and a new color is observed. Therefore, if an object is stationary just long enough to become part of the background and then it moves, the distribution describing the previous background still exists with the same μ and σ², but a lower ω, and will be quickly re-incorporated into the background.
2.2 Background Model Estimation
As the parameters of the mixture model of each pixel change, we would like to determine which of the Gaussians of the mixture are most likely produced by background processes. Heuristically, we are interested in the Gaussian distributions which have the most supporting evidence and the least variance.
To understand this choice, consider the accumulation of supporting evidence and the relatively low variance for the “background” distributions when a static, persistent object is visible. In contrast, when a new object occludes the background object, it will not, in general, match one of the existing distributions, which will result in either the creation of a new distribution or the increase in the variance of an existing distribution. Also, the variance of the moving object is expected to remain larger than a background pixel until the moving object stops. To model this, we need a method for deciding what portion of the mixture model best represents background processes.
First, the Gaussians are ordered by the value of ω/σ. This value increases both as a distribution gains more evidence and as the variance decreases. After re-estimating the parameters of the mixture, it is sufficient to sort from the matched distribution towards the most probable background distribution, because only the matched model’s relative value will have changed. This ordering of the model is effectively an ordered, open-ended list, where the most likely background distributions remain on top and the less probable transient background distributions gravitate towards the bottom and are eventually replaced by new distributions.
Then the first B distributions are chosen as the background model, where

B = \operatorname{argmin}_b \left( \sum_{k=1}^{b} \omega_k > T \right) \qquad (9)

where T is a measure of the minimum portion of the data that should be accounted for by the background. This takes the “best” distributions until a certain portion, T, of the recent data has been accounted for. If a small value for T is chosen, the background model is usually unimodal. If this is the case, using only the most probable distribution will save processing.
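Operationally, Eq. (9) is a descending sort by ω/σ followed by a cumulative-weight cutoff. A sketch (the value of T shown is a placeholder, not a setting from the paper):

```python
import numpy as np

def background_indices(weights, variances, T=0.7):
    """Return the indices of the first B distributions of Eq. (9),
    assuming `weights` have already been renormalized to sum to 1."""
    order = np.argsort(-weights / np.sqrt(variances))  # descending omega/sigma
    cumulative = np.cumsum(weights[order])
    # Smallest b such that the cumulative weight exceeds T.
    B = int(np.searchsorted(cumulative, T, side="right")) + 1
    return order[:B]
```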
If T is higher, a multi-modal distribution caused by a repetitive background motion (e.g., leaves on a tree, a flag in the wind, a construction flasher, etc.) could result in more than one color being included in the background model. This results in a transparency effect which allows the background to accept two or more separate colors.
2.3 Connected components
The method described above allows us to identify foreground pixels in each new frame while updating the description of each pixel’s process. These labeled foreground pixels can then be segmented into regions by a two-pass, connected components algorithm [3].
Because this procedure is effective in determining the whole moving object, moving regions can be characterized not only by their position, but by size, moments, and other shape information. Not only can these characteristics be useful for later processing and classification, but they can aid in the tracking process.
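As an illustration, the labeling and per-region statistics can be sketched with an off-the-shelf labeling routine (our substitution for the two-pass algorithm of [3]; the statistics computed are examples):

```python
import numpy as np
from scipy import ndimage

def extract_regions(foreground_mask, min_pixels=3):
    """Group foreground pixels into connected regions and return simple
    shape statistics; 1-2 pixel components are dropped as noise (cf.
    footnote 1)."""
    labels, count = ndimage.label(foreground_mask)
    regions = []
    for i in range(1, count + 1):
        ys, xs = np.nonzero(labels == i)
        if ys.size < min_pixels:
            continue
        regions.append({
            "size": int(ys.size),
            "centroid": np.array([xs.mean(), ys.mean()]),
            "bbox": (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())),
        })
    return regions
```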
2.4 Multiple Hypothesis Tracking
While this section is not essential to the understanding of the background subtraction method, it will allow one to better understand and evaluate the results in the following sections.
Establishing correspondence of connected components between frames is accomplished using a linearly predictive multiple hypotheses tracking algorithm which incorporates both position and size. We have implemented an online method for seeding and maintaining sets of Kalman filters.
At each frame, we have an available pool of Kalman models and a new available pool of connected components that they could explain. First, the models are probabilistically matched to the connected regions that they could explain. Second, the connected regions which could not be sufficiently explained are checked to find new Kalman models. Finally, models whose fitness (as determined by the inverse of the variance of its prediction error) falls below a threshold are removed.
Matching the models to the connected components involves checking each existing model against the available pool of connected components which are larger than a pixel or two. All matches are used to update the corresponding model. If the updated model has sufficient fitness, it will be used in the following frame. If no match is found, a “null” match can be hypothesized which propagates the model as expected and decreases its fitness by a constant factor.
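A highly simplified sketch of this matching loop is shown below. It substitutes a constant-velocity prediction for the paper's Kalman filters, and the gate, fitness increments, decay factor, and removal threshold are all our assumptions rather than values from the paper:

```python
import numpy as np

FITNESS_DECAY = 0.5  # penalty factor for a "null" match -- our choice
MIN_FITNESS = 0.1    # models below this fitness are removed -- our choice

def step_tracker(models, regions, gate=25.0):
    """Match models to connected components (dicts with an ndarray
    'centroid', e.g. from extract_regions above), propagate unmatched
    models with decayed fitness, and prune unfit models."""
    survivors = []
    unmatched = list(regions)
    for m in models:  # m: {"pos": ndarray, "vel": ndarray, "fitness": float}
        pred = m["pos"] + m["vel"]  # linear prediction of position
        best = min(unmatched,
                   key=lambda r: np.linalg.norm(r["centroid"] - pred),
                   default=None)
        if best is not None and np.linalg.norm(best["centroid"] - pred) < gate:
            m["vel"] = best["centroid"] - m["pos"]
            m["pos"] = best["centroid"]
            m["fitness"] = min(1.0, m["fitness"] + 0.1)  # crude fitness update
            unmatched.remove(best)
        else:
            m["pos"] = pred                # "null" match: propagate as expected
            m["fitness"] *= FITNESS_DECAY  # and decrease fitness by a constant
        if m["fitness"] > MIN_FITNESS:
            survivors.append(m)
    return survivors, unmatched  # unmatched regions can seed new models
```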
The unmatched models from the current frame and the previous two frames are then used to hypothesize new models. Using pairs of unmatched connected components from the previous two frames, a model is hypothesized. If the current frame contains a match with sufficient fitness, the updated model is added to the existing models. To avoid possible combinatorial explosions in noisy situations, it may be desirable to limit the maximum number of existing models by removing the least probable models when excessive models exist. In noisy situations (e.g., cameras in low-light conditions), it is often useful to remove the short tracks that may result from random correspondences. Further details of this method can be found at www.ai.mit.edu/projects/vsam/.
3 Results
On an SGI O2 with an R10000 processor, this method can process 11 to 13 frames a second (frame size 160x120 pixels). The variation in the frame rate is due to the variation in the amount of foreground present. Our tracking system has been effectively storing tracking information for five scenes for over 16 months [6]. Figure 3 shows accumulated tracks in one scene over the period of a day.
While quick changes in cloud cover (relative to α, the learning rate) can sometimes necessitate a new set of background distributions, the system will stabilize within 10-20 seconds and tracking will continue unhindered.
Because of the stability and completeness of the representation, it is possible to do some simple classification. Figure 4 shows the classification of objects which appeared in a scene over a 10 minute period using a simple binary threshold on the time-averaged aspect ratio of the object. Tracks lasting less than a second were removed.
Every object which entered this scene (33 cars and 34 people in total) was tracked. The system successfully classified every car except in one case, where it classified two cars as the same object because one car entered the scene simultaneously with another car leaving at the same point. It found only one person in two cases where two people were walking in physical contact. It also double counted 2 objects because their tracks were not matched properly.
4 Applicability
When deciding on a tracker to implement,the most important information to a researcher is where the
