
Fast abnormal event detection from video surveillance

S. Javanbakhti, S. Zinger, P. H. N. de With, Electrical Engineering Department, Eindhoven, The Netherlands

Abstract— Video surveillance systems are becoming increasingly important both in private and public environments to monitor activity. In this context, this paper presents a novel block-based approach to detect abnormal situations by analyzing the pixel-wise motion context, as an alternative to the conventional object-based approach. We proceed directly with event characterization at the pixel level, based on motion estimation techniques. Optical flow is used to extract information such as density and velocity of motion. The proposed approach identifies abnormal motion variations in regions of motion activity based on the entropy of Discrete Cosine Transform coefficients. We aim at a simple block-based approach to support a real-time implementation. We report successful results on the detection of abnormal events in surveillance videos captured at an airport.

Keywords: Video surveillance systems, optical flow, Discrete Cosine Transform, event detection, real-time

1. Introduction

Recently, government agencies, businesses, and even schools are turning toward video surveillance as a means to increase public security. Video surveillance has been a key component in ensuring security at airports, banks, casinos, and correctional institutions [1]. The goal of visual surveillance is not only to use cameras instead of human eyes, but also to accomplish the entire surveillance task as automatically as possible using video analysis. Visual surveillance in dynamic scenes has a wide range of potential applications, such as security for infrastructure and important buildings, traffic surveillance in cities and on highways, and detection of military targets [2].

For automatic dynamic scene analysis, anomaly detection is a challenging task, especially given a scene consisting of complex correlated activities of multiple objects [3]. Anomaly detection techniques can be divided into two broad families: pattern-recognition-based and machine-learning-based methods. Pattern recognition approaches are typically those where the type of abnormal activity or object is known a priori. However, such shape recognition methods require a list of objects or behavior patterns that are anomalous. Unfortunately, compiling such a list is not always possible, especially when suspicious activities cannot be known in advance. An alternative approach is based on learning "normal" behavior from a video sequence exhibiting regular activity and then flagging moving objects whose behavior deviates from it [4]. As discussed in several review papers [5], [6], many methods implement a general pipeline-based framework: first, moving objects are detected; then they are classified and tracked over a certain number of frames; and finally, the resulting paths are used to distinguish "normal" behavior of objects from "abnormal" behavior [7], [8]. Although tracking-based methods have proven successful in different applications, they suffer from fundamental limitations. First, such pipeline methods can result in a fragile architecture in which an error propagates through the subsequent processing stages. Second, tracking multiple objects at the same time requires complex algorithms and is computationally heavy. Consequently, multi-object tracking is not always effective in crowded areas where objects regularly occlude each other fully or partially. This task is especially hard in surveillance videos, where quality and color information can be poor. Third, tracking is efficient mostly for rigid moving bodies such as cars, trains, or pedestrians, and is not well suited to unstructured motion such as waves on water or trees shaking in the wind [4].

To address these limitations, some authors have recently proposed learning methods based on characteristics other than motion paths [9]. In such a case, there is no need for object tracking; instead, we consider pixel-level features. The main idea is to analyze the general motion context instead of tracking subjects one by one. We design a general framework based on features directly extracted from motion, such as velocity at the pixel level. This leads to an image that expresses the motion in the scene. We then analyze the information content of that image in the frequency domain by computing the entropy of the DCT coefficients that describe the local motion.

After successfully analyzing the motion in each frame, we need to understand the behavior of the objects. Behavior understanding involves the analysis and recognition of motion patterns, and the high-level description of actions and interactions [2]. To classify an event as normal or abnormal based on motion features, we compare the entropy of each block to its median value over time. For the complete system, we aim at simple and fast methods suitable for real-time operation and allowing parallel processing due to block-based computing. The real-time aspect also explains why we have adopted the DCT, as this transform can be implemented efficiently.

The remainder of the paper is organized as follows. Section 2 describes the approach for abnormal event detection, including motion


estimation, entropy measurement, and the detection of abnormal events. Section 3 presents our experimental results and a comparison with a state-of-the-art algorithm [9]. Section 4 concludes the paper.

2. Abnormal event detection algorithm

Our abnormal event detection is based on motion features extracted with a motion estimation technique. Motion estimation in image sequences aims at detecting regions corresponding to moving objects such as vehicles and humans. Detecting moving regions provides features for later processing, such as tracking and behavior analysis. We base our features on pixel-based optical flow, since this is the most natural technique for capturing motion independent of appearance [10]. Computer vision experience suggests that the computation of optical flow is not very accurate, particularly on coarse and noisy data. To deal with this, we compute the optical flow at each frame using the Lucas-Kanade algorithm [11]. Optical-flow-based motion estimation uses the characteristics of the flow vectors of moving objects over time to detect moving regions in an image sequence, relating each image to the next. Each vector represents the apparent displacement of a pixel from one image to the next [12]. The result of the optical flow is the displacement of each pixel in both the vertical and horizontal direction. We combine these displacements into a motion magnitude per pixel. To process these motion vectors, we substitute the estimated motion magnitudes for the pixel values and divide each frame into blocks. We expect that during abnormal events the motion patterns, and therefore the energy of the images containing the motion vectors, change compared to normal behavior. We apply a DCT to each block, as the DCT provides a compact representation of the signal's energy. Then we compute the entropy of the DCT coefficients to measure their information content [13]. The entropy is defined as [14]:

E = −∑_{i=1}^{N} p_i log p_i,    (1)

where N is the size of the image and p_i is the probability of the motion intensity value at pixel i. The number of bins in the histogram is specified by the image type. In our case, the motion magnitude per pixel forms a gray-scale image and we use 256 bins, corresponding to the number of gray levels. To decide whether an event is normal or abnormal, we compare the entropy value with thresholds that we learn per block at the beginning of the video sequence. Each threshold is based on the median of the entropies estimated during the first 500 frames of the video. We assume, of course, that no abnormalities occur during the first 500 frames of the surveillance sequence. In general, this median-based threshold can be computed continuously throughout the video, because abnormalities are filtered out by taking the median; we limit the median filtering over time to control the total complexity of the calculations. An abnormal event is indicated when the entropy value for the current frame is higher than the threshold defined for that block.

An alternative approach [9], used for comparison, defines a statistical measure that describes how organized or cluttered the optical flow vectors are in the frame. That algorithm uses a metric which is the scalar product of the normalized values of the following factors: direction variance, motion magnitude variance, and direction histogram peaks. This metric is then compared to a threshold that is manually set to a constant value. Figure 1 illustrates our block-based processing framework in dynamic scenes.
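As an illustration, the per-block computation described above (motion-magnitude image, block-wise DCT, 256-bin entropy) could be sketched as follows. The function name and the block grid are our own choices, and the dense motion-magnitude image is assumed to have been produced beforehand, e.g. by Lucas-Kanade optical flow:

```python
import numpy as np
from scipy.fft import dctn


def block_dct_entropies(motion_mag, grid=(2, 2), bins=256):
    """Entropy of the DCT coefficients of each block of a motion-magnitude image.

    motion_mag : 2-D array holding the optical-flow magnitude per pixel.
    Returns an array of shape `grid` with one entropy value per block.
    """
    h, w = motion_mag.shape
    bh, bw = h // grid[0], w // grid[1]
    entropies = np.zeros(grid)
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = motion_mag[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            coeffs = dctn(block, norm='ortho')         # 2-D DCT of the block
            hist, _ = np.histogram(coeffs, bins=bins)  # 256-bin histogram
            p = hist / hist.sum()
            p = p[p > 0]                               # drop empty bins, avoid log(0)
            entropies[i, j] = -np.sum(p * np.log2(p))  # Eq. (1)
    return entropies
```

With 256 bins, the entropy of a block is bounded by log2(256) = 8 bits.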

Fig. 1: Block-based processing framework.

3. Experimental results

In our experiments, we use surveillance videos of an airport where several abnormal situations are simulated by a group of volunteers. These situations include several people running to and from the middle of the scene, dancing in a circle, and forming a fast-growing group (overcrowding). In total, 6 such situations occur in the video sequence of 12,000 frames. The spatial resolution of an original video frame is 720 × 576 pixels at 29 frames per second, which is typical for surveillance in broadcast quality. We apply our block-based approach described above to this data set. Furthermore, we compare the performance of our approach with the alternative approach presented by Ihaddadene and Djeraba [9] and analyze the obtained results.

3.1 Performance of our approach

In our current implementation, we divide each frame into 4 blocks for simplicity, but this can easily be changed. For each block, we calculate the entropy of the DCT coefficients of the motion vector magnitudes and then compute the median value over the first 500 frames. Based on experiments and evaluation, the threshold for classifying an abnormal event is empirically set to 3 times this median entropy. The abnormality indicator for the whole frame is raised if an abnormality is detected in any of the blocks. Figure 2(a) shows an example frame of our videos, and in Figure 2(b) we indicate the presence of the alarm. Figure 3 illustrates the result of our system in an abnormal


situation. The original frame of the airport video sequence shows a situation where people are moving in a circle, and the analysis window shows a clear warning with bars on all 4 blocks. From this video sequence, our approach detected 5 out of 6 abnormal events, without producing false alarms. Figure 4(a-d) also shows frames of abnormal situations. Figure 5(a) visualizes the ground-truth results for block number 3 and Figure 5(b) illustrates the output of our system for that block in those intervals.
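The decision rule used above (threshold = 3 times the per-block median entropy over the first 500 frames, with a frame-level alarm if any block exceeds its threshold) might look as follows; the function names and array shapes are our own illustration:

```python
import numpy as np


def learn_thresholds(training_entropies, factor=3.0):
    """Per-block threshold: `factor` times the median entropy over the
    training frames. `training_entropies` has shape (n_frames, n_blocks)."""
    return factor * np.median(training_entropies, axis=0)


def detect_abnormal(frame_entropies, thresholds):
    """Per-block alarms, plus a frame-level alarm raised if any block fires."""
    block_alarms = frame_entropies > thresholds
    return bool(block_alarms.any()), block_alarms
```

In this sketch, updating the thresholds continuously would simply mean re-running `learn_thresholds` over a sliding window of past entropies.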

3.2 Result of comparison

We have applied the approach of Ihaddadene and Djeraba [9] to our video sequences. In airports, people normally move in any desired direction; therefore, we do not consider their movement direction a suitable feature, although it has been used in [9]. After experimenting with the original approach of these authors, we have tried to improve the results by considering only the motion variance per frame. We have experimented with several threshold values for this approach, and the best results obtained are illustrated in Figure 5(c). We can observe that this method produces false alarms in normal situations and misses 3 abnormal events.
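For reference, the clutter measure of [9] as summarized in Section 2 (a scalar product of normalized direction variance, motion-magnitude variance, and direction-histogram peak count) might be sketched as below. The normalization constants and the bin count are our own assumptions, since the exact values are not given here:

```python
import numpy as np


def clutter_measure(angles, magnitudes, bins=16):
    """Rough sketch of the frame-level metric of Ihaddadene and Djeraba [9]:
    product of normalized direction variance, magnitude variance, and the
    fraction of peaks in the direction histogram."""
    dir_var = np.var(angles) / (np.pi ** 2)                      # directions in [-pi, pi]
    mag_var = np.var(magnitudes) / (magnitudes.max() ** 2 + 1e-12)
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    # Count a bin as a peak when it exceeds both of its neighbours.
    peaks = np.sum((hist[1:-1] > hist[:-2]) & (hist[1:-1] > hist[2:]))
    return dir_var * mag_var * (peaks / bins)
```

Unlike our automatically learned per-block thresholds, this value is compared to a manually chosen constant threshold for the whole frame.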

4. Conclusions

In this paper, we have developed a motion-context-based algorithm to detect abnormal events in surveillance videos of a public place. Our major contribution is the introduction of informative motion-based features and the use of an automatically updated threshold to detect abnormal events. We have found that the entropy of the DCT-transformed motion magnitude is a reliable measure for classifying whether the current activity in the video is normal or not. Because the proposed method is block-based, we can indicate exactly in which part of the frame the abnormal event takes place. Another advantage of using blocks is the possibility of parallel processing in a real-time implementation, since each block can be processed independently. The type of signal processing is chosen such that fast real-time analysis is enabled. We obtain processing results on an airport surveillance video by detecting 5 out of 6 abnormal events without false alarms. Our framework is generic and does not depend on the type of scene. For future work, we envisage more research on DCT-based features for abnormal event detection. We would also like to test our approach on different types of scenes involving not only humans but also other moving objects.

References

[1] L. Ovsenik, A. Kolesarova, J. Turan, "Video Surveillance Systems", Acta Electrotechnica et Informatica, Kosice: FEI TU, vol. 10, no. 4, pp. 46-53, 2010.


Fig. 2: Result of our system in a normal situation.


Fig. 3: Original frame with the result of our system in an abnormal situation (block numbering is column-wise starting at the top left).


Fig. 4: (a) Empty scene before dancing, (b) dancing, (c) overcrowding, and (d) running.



Fig. 5: (a) Ground-truth results in block number 3 for the complete surveillance video, (b) warnings based on our detection corresponding to those events (bars show an alarm in block number 3 at the current frame; the first two bars correspond to one event), and (c) warnings for comparison corresponding to the system of [9].

[2] W. Hu, T. Tan, "A Survey on Visual Surveillance of Object Motion and Behaviors", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 34, no. 3, August 2004.

[3] J. Li, S. Gong, T. Xiang, "Global Behaviour Inference using Probabilistic Latent Semantic Analysis", 19th British Machine Vision Conference (BMVC), pp. 193-202, Leeds, UK, September 2008.

[4] Y. Benezeth, P.-M. Jodoin, V. Saligrama, C. Rosenberger, "Abnormal Event Detection Based on Spatio-Temporal Co-occurrence", in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2458-2465, 2009.

[5] T. Chen, H. Haussecker, A. Bovyrin, et al., "Computer Vision Workload Analysis: Case Study of Video Surveillance Systems", Intel Technology Journal, vol. 9, no. 2, pp. 109-118, 2005.

[6] H. Buxton, "Learning and understanding dynamic scene activity: a review", Image and Vision Computing, vol. 21, no. 1, pp. 125-136, 2003.

[7] N. Johnson and D. Hogg, "Learning the distribution of object trajectories for event recognition", Image and Vision Computing, vol. 14, no. 8, pp. 609-615, August 1996.

[8] I. Junejo, O. Javed and M. Shah, "Multi feature path modeling for video surveillance", in Proc. of ICPR, pp. 716-719, 2004.

[9] N. Ihaddadene, C. Djeraba, "Real-time Crowd Motion Analysis", 19th International Conference on Pattern Recognition (ICPR), pp. 08-11, Tampa, Florida, USA, December 2008.

[10] A. A. Efros, A. C. Berg, G. Mori, J. Malik, "Recognizing action at a distance", Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV'03), pp. 13-16, Nice, France, October 2003.

[11] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision", in DARPA Image Understanding Workshop, April 1981.

[12] A. Wali, A. M. Alimi, "Event Detection from Video Surveillance Data Based on Optical Flow Histogram and High-level Feature Extraction", 20th International Workshop on Database and Expert Systems Applications, 2009.

[13] N. Kingsbury, "The Haar Transform", Connexions, 2005. [Online]. Available: http://cnx.org/content/m11087/latest

[14] R. C. Gonzalez, R. E. Woods, S. L. Eddins, Digital Image Processing Using MATLAB, 2nd edition, Gatesmark Publishing, 2009.