
Video Trans-coding in Smart Camera for Ubiquitous Multimedia Environment

Ekta Rai Chutani
Indian Institute of Technology
Multimedia Laboratory
New Delhi, India
[email protected]

Santanu Chaudhury
Indian Institute of Technology
Multimedia Laboratory
New Delhi, India
[email protected]

Abstract

Smart cameras are expected to be important components for creating ubiquitous multimedia environments. In this paper, we propose a scheme for on-line semantic transcoding of the video captured by a smart camera. The transcoding process selects frames of importance and regions of interest for use by other processing elements in a ubiquitous computing environment. We propose a local associative computation based change detection scheme for identifying frames of interest. The algorithm also segments out the region of change. The computation is structured for easy implementation in a DSP based embedded environment. The transcoding scheme enables the camera to communicate only the regions of change in frames of interest to a server or a peer. Consequently, communication and processing overhead is reduced in a networked application environment. Experimental results establish the effectiveness of the transcoding scheme.

1. Introduction

Digital cameras with embedded CODECs are common commercial products. Smart cameras can not only process and interpret the data that they capture in real time but can also intelligently decide what to store and what to communicate. Smart cameras are expected to be important components for creating ubiquitous multimedia environments. In general, a video transcoder converts one compressed video bitstream into another with a different format, size (resolution), bit rate, or frame rate. The goal of transcoding is to enable the interoperability of heterogeneous multimedia networks, reducing complexity and run time by avoiding the complete decoding and re-encoding of a video stream [1]. In contrast, in this paper we propose a semantic transcoding scheme which can be used to obtain a filtered output from a smart camera satisfying a semantically meaningful condition. The proposed semantic filtering scheme selects only frames of interest from an input video stream for further processing. The algorithm is structured to facilitate easy implementation on an ASIC or dedicated DSP used in a smart camera [8].

Our transcoding scheme is based on a change detection scheme for identifying frames of interest. Change detection is important in any vision based monitoring system. Various approaches to change detection have been proposed in the literature, based on properties like the color of the scene content, the shape of the object of interest, motion parameters of objects, etc. Predictions based on past history have also been used [7]. Change detection has been used for a variety of applications, for example: (a) segmenting a video sequence into logical shots using object based features [4]; (b) segmentation of background and foreground objects [2, 9]; (c) classifying video shots based on change detection features (identifying action shots) [3]; (d) prediction based change detection [7]; and (e) various map conditions based on pixel colors [5].

Most change detection algorithms are pixel based or block based. In the pixel based approach, the difference between two consecutive frames is calculated for every pixel; change is then identified using a threshold value that depends on the application. In the block based approach, the frame is divided into blocks of the required size, and a block matching algorithm is applied to each block of the frame. There are a number of methods for deciding the goodness of a block match, among them the cross correlation function, pel difference classification (PDC), mean absolute difference, mean squared difference, and integral projection [5].
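As an illustration of the pixel based approach, the following is a minimal Python sketch, assuming 8-bit RGB frames stored as NumPy arrays; the function name and the threshold value of 30 are illustrative choices, not taken from the paper.

```python
import numpy as np

def pixelwise_change_mask(prev: np.ndarray, curr: np.ndarray,
                          threshold: int = 30) -> np.ndarray:
    """Flag pixels whose inter-frame difference exceeds a threshold.

    prev, curr: HxWx3 uint8 RGB arrays of two consecutive frames.
    threshold: application dependent; 30 is a placeholder value.
    """
    # Sum of absolute per-channel differences at each pixel.
    diff = np.abs(prev.astype(np.int16) - curr.astype(np.int16)).sum(axis=2)
    return diff > threshold  # boolean HxW change mask
```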

For semantic transcoding, we have proposed a clustering based change detection scheme motivated by [2]. However, our algorithm uses a computationally simpler clustering scheme for change detection and, unlike [2], propagates cluster label information in the image space for the extraction of areas of interest. The clustering based scheme has the advantage of detecting only those changes which are not consistent with the past history. Our method offers a unified approach which combines the capability of change detection for identifying frames of interest (FOI), segmentation for identifying the regions of change, and consequent transcoding by reducing the number of frames and selecting only regions in frames of interest for storage and communication, without any loss of required information. The semantic transcoding algorithms, with their local and associative computational structure, are the novel contributions of this work.

2. On-line Key Frame Detection

The key factor is to detect changes in a video sequence which convey some relevant information for the application. For instance, a person or object entering the area of focus is such a change. In a video, any moving object can give rise to a change between consecutive frames. However, repetitive continuous changes in the background, such as flying birds, moving leaves of trees, or constantly moving traffic on a road, are not significant. Figure 1 and Figure 2 show two examples of object-centric changes in the scene.

Figure 1. Changed position of the person in the video sequence.

Figure 2. A cow enters the area of focus.

Some examples of pseudo-stationary backgrounds are shown in Figure 3 and Figure 4. In such cases it is difficult to give a single definition of change; it varies from application to application. Here we propose a clustering based change detection scheme which learns the characteristics of the pseudo-stationary background and flags a change when a definitive deviation from the past history occurs.

Figure 3. Leaves of the tree are in continuous motion. The detected change is the person who entered.

Figure 4. Traffic in the background on the road is in continuous motion. The detected change is the car that entered.

We hypothesize that frames which indicate significant deviation from the past are the only candidates about which information needs to be shared between computing elements in a ubiquitous computing environment. Here, we propose an algorithm for detecting only such substantive changes. The algorithm does not assume a background model but learns the past history using an unsupervised incremental clustering algorithm. Further, the algorithm makes use of localized computation and consequently has the ability to locate the changes as well.

2.1. Clustering of Pixel Values

We assume that input frames are in R, G, B color space with identical resolution for each plane. Each frame is partitioned into 4x4 blocks, and we perform incremental clustering in color space for each of these blocks. Typically, in an application setting we need to record the pixel history and identify change with reference to the past. The pseudo-stationary color values at a pixel have a multi-modal distribution, and the incremental clustering process is expected to discover the modes of this distribution. For each block we maintain a set of clusters, and for each cluster we store:

i. Centroid value (in RGB)

ii. Frame number of the frame that last updated the cluster;


iii. A counter for the number of frames mapped to the cluster; and

iv. Optionally, a flag field associated with the cluster.

The cluster set for each block is initialized with a single cluster whose centroid is set to the average color value of the corresponding block of the first frame. Each list may have a maximum of five cluster nodes.
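As a concrete illustration, here is a minimal Python sketch of the per-block cluster record and its initialization; the class and field names are ours, not the authors', and the optional flag field anticipates the object label used in Section 3.

```python
import numpy as np
from dataclasses import dataclass

MAX_CLUSTERS = 5  # at most five cluster nodes per 4x4 block

@dataclass
class ClusterNode:
    centroid: np.ndarray   # mean R, G, B value of the blocks mapped here
    last_frame: int        # number of the frame that last updated this node
    count: int = 1         # number of frames mapped to this cluster
    flag: str = ""         # optional label, e.g. "O" for object (Section 3)

def init_block_clusters(first_frame: np.ndarray) -> dict:
    """Create one single-node cluster list per 4x4 block of the first frame."""
    h, w, _ = first_frame.shape
    clusters = {}
    for by in range(0, h, 4):
        for bx in range(0, w, 4):
            block = first_frame[by:by + 4, bx:bx + 4].reshape(-1, 3)
            centroid = block.astype(float).mean(axis=0)  # average block color
            clusters[(by // 4, bx // 4)] = [ClusterNode(centroid, last_frame=0)]
    return clusters
```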

Figure 5. Basic Computational Scheme.

When we receive a new frame, we compute its difference with the previous frame. The difference is calculated by taking the Manhattan distance (sum of absolute differences) between the R, G and B values of each pixel in the following way:

ΔR = |R1 − R2|

ΔG = |G1 − G2|

ΔB = |B1 − B2|

Diff = ΔR + ΔG + ΔB

Obviously, we have avoided the overhead of multiplication in the difference computation. We use the average difference value for each block. For low difference blocks (difference less than a threshold A), we increment the cluster counter and update the frame number associated with the cluster node of the corresponding block of the previous frame. For blocks having a high difference (> A), we find the nearest cluster centroid in the cluster set, using the same Manhattan distance for the similarity computation. If the difference from the nearest cluster is greater than the threshold A, a new cluster node is created; otherwise the centroid is updated, following the same strategy as in [2]. The algorithm allows a maximum of five such nodes for each 4x4 block. A new cluster is created when a given pixel assumes a new value (completely different from the past) corresponding to changes happening in the scene. In the case of periodic changes in pixel values, re-occurrences of past sequences are mapped to existing clusters. If the number of nodes would exceed five, we apply the principle of aging to delete old cluster nodes while recording the temporal evolution of pixel values: when a new cluster node is to be created and the number of existing cluster nodes is five, we eliminate the cluster node which has not been updated for the longest period of time. Hence, we need not make any a priori assumption about the possible number of clusters.

Now we can look at the Cluster Update procedure, which performs the necessary processing for the incremental clustering. We assume that the set of clusters for a block is implemented as a vector data structure; a sketch is given below.
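The following minimal Python sketch shows one plausible form of this procedure, building on the ClusterNode record and MAX_CLUSTERS constant sketched earlier; the threshold value A and the running-mean centroid update (standing in for the update strategy of [2]) are our assumptions, not the authors' exact implementation.

```python
import numpy as np

def manhattan(c1, c2) -> float:
    """Manhattan distance (sum of absolute differences) between RGB triples."""
    return float(np.abs(np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float)).sum())

def cluster_update(nodes: list, block_color: np.ndarray, frame_no: int,
                   A: float = 40.0) -> bool:
    """Map one block's average color into its cluster list (mutated in place).

    Assumes the ClusterNode class and MAX_CLUSTERS from the earlier sketch.
    A is a placeholder threshold. Returns True if a new cluster node was
    created, i.e. the block is a candidate changed block.
    """
    # Nearest existing centroid under the Manhattan distance.
    nearest = min(nodes, key=lambda n: manhattan(n.centroid, block_color))
    if manhattan(nearest.centroid, block_color) <= A:
        # Consistent with past history: update the matched node.
        # A running mean stands in for the centroid update strategy of [2].
        nearest.centroid = (nearest.centroid * nearest.count + block_color) / (nearest.count + 1)
        nearest.count += 1
        nearest.last_frame = frame_no
        return False
    # New mode of the pixel-value distribution: create a node, evicting the
    # least recently updated node (principle of aging) if the list is full.
    if len(nodes) >= MAX_CLUSTERS:
        nodes.remove(min(nodes, key=lambda n: n.last_frame))
    nodes.append(ClusterNode(np.asarray(block_color, dtype=float), last_frame=frame_no))
    return True
```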

2.2. Key Frame Detection

The clustering algorithm presented in the previous section is used to generate a set of key frames for further processing. These key frames are expected to capture the essential content of the video stream. Given a camera installed at a fixed location, the informative contents of the video stream are (i) the background view and (ii) changes in the foreground. These contents can be easily captured through the clustering process. At a given instant, using the centroids of the clusters updated by the last frame, we can generate a low resolution (one-fourth of the original resolution) view of the scene on demand. This view can be used by a remote visualization terminal. Whenever we introduce a new cluster for a block, it is obvious that the pixel values of the block have changed from the past. Further, blocks mapped to new clusters (with count less than N) are also likely to belong to a foreground object. Such blocks are marked and a low resolution binary image is generated. Next, we find the connected components in this binary image. If the size of any connected component exceeds 5% of the image size, we flag the frame as a key frame. It is clear from the above discussion that when a new object, with spectral properties different from those of the background, enters the view, it will be captured in the key frames. We shall continue to generate key frames for an object in motion for at least N frames. Subsequently, if the object becomes static, no new clusters will be created for any block in the image and hence we shall stop generating key frames. In Figure 6 and Figure 7 we show the objects detected as the scene changes. It may be noted that in Figure 7 only the foreground objects have been identified despite the pseudo-stationary nature of the background. The continuous stream of video is replaced by these frames which indicate substantive change. The basic computational steps involved in cluster generation and change detection are block-wise independent and can be executed in parallel; in fact, each such task can be mapped to an individual PE of a VLIW DSP.
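As an illustration of the key frame test, a minimal Python sketch follows, assuming the per-block change marks have been collected into a low resolution binary image; SciPy's connected-component labelling is used here as a stand-in for whichever labelling procedure the authors implemented.

```python
import numpy as np
from scipy import ndimage

def is_key_frame(change_mask: np.ndarray, area_fraction: float = 0.05) -> bool:
    """Flag a frame as key if any connected changed region is large enough.

    change_mask: 2D boolean array, one entry per 4x4 block (True = new cluster).
    area_fraction: 5% of the (low resolution) image size, as in the paper.
    """
    labels, n = ndimage.label(change_mask)   # 4-connected components
    if n == 0:
        return False
    sizes = ndimage.sum(change_mask, labels, range(1, n + 1))
    return bool(sizes.max() >= area_fraction * change_mask.size)
```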

Figure 6. Binary images showing the changes in the videos of Figure 1 and Figure 2.

Figure 7. Changes in the videos shown in Figure 3 and Figure 4.

3. Segmentation

The smart camera is expected to segment the object of interest and may communicate the region of interest in high resolution to the other cameras in the network. In this section, we present a simple scheme for segmenting out objects of interest.

The connected components found in the low resolution binary images, as described in the previous section, are the segments of interest in a frame. These connected components are further refined with reference to the original frame for the extraction of segments of different colors.

However, the above scheme will fail to segment out an object once it becomes static. In other words, the object will be missed if a large number of blocks belonging to it have accumulated counts greater than N, because in subsequent frames these blocks will not be marked as changed blocks. We therefore need a technique to overcome this problem: we propose to make use of information from the past frame to perform segmentation even when no change is detected.

We consider significant connected components obtained from the previous frame to be indicative of a substantive change or the existence of a foreground object. We mark the cluster centroids corresponding to the blocks belonging to these connected components with an object label, say O. In the absence of object motion, blocks belonging to the object will be mapped to clusters already marked O. We then extract connected components treating the change/new flag and the O flag as identical. This enables extraction of the region of interest even in the absence of motion. In the figure above, we show an example of object extraction, with connected components of different colors shown in bounding boxes.
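To make the mechanism concrete, here is a minimal Python sketch of the O-label bookkeeping described above, reusing the ClusterNode record from the Section 2.1 sketch; the function names and the set-based representation of marked blocks are our illustrative choices.

```python
def mark_object_label(clusters: dict, component_blocks: set) -> None:
    """Tag the clusters of blocks inside a significant component with 'O'.

    clusters: (row, col) block index -> list of ClusterNode (earlier sketch).
    component_blocks: block indices of a significant connected component.
    """
    for key in component_blocks:
        for node in clusters[key]:
            node.flag = "O"   # simplification: label every node of the block

def region_of_interest_blocks(clusters: dict, changed_blocks: set) -> set:
    """Blocks flagged as changed OR mapped to an 'O'-labelled cluster.

    Treating the two flags as identical lets the connected-component step
    recover the object region even after it has stopped moving.
    """
    roi = set(changed_blocks)
    for key, nodes in clusters.items():
        if any(node.flag == "O" for node in nodes):
            roi.add(key)
    return roi
```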

4. Results and Discussions

Case 1: Moving object detection with a static background. In general, most change detection algorithms for moving objects make use of motion vectors or optical flow methods. Here, instead, a clustering technique is used for change detection. As a result, Figure 6 shows only those blocks whose values change due to the movements of the person and the cow, respectively. This clustering approach also reduces the computational complexity by about 30% compared with the other techniques, which makes it well suited to VLIW architecture based systems.

Case 2: Pseudo-stationary background. Commonly in such situations, background-foreground separation is performed explicitly. The novelty of our approach is that continuous movements in the background are eliminated by the scheme itself. In Figure 7, the change is shown in the area where a person or a car enters the region of focus. The results show that the movement of the leaves and of the background traffic is ignored without invoking any special technique. Blocks which show changes over a long stretch of the video sequence are merged into the background, being treated as constant throughout.

Segmentation is done on the basis of color. It extracts the objects of different colors by matching neighbouring blocks.

The algorithms have been implemented on a standard Linux based platform. We experimented with about 20 video sequences with an average length of 500 frames (ranging from 300 to 1000 frames). Our algorithm detected about 90% of the true changes and reduced the number of frames in a sequence by a factor of approximately 9 on average.

5. Conclusions

In this paper we have presented algorithms for semantic transcoding of video which can be useful for distributed monitoring applications. The transcoding is done on the basis of regions of interest: only those frames are retained in which the information is relevant to change detection. The algorithms have been designed so that they can be decomposed into a set of local, pixel based associative computations amenable to SIMD processing or to mapping onto a VLIW DSP. Semantic transcoding schemes amenable to implementation in an embedded environment are the key contribution of this paper. We have also shown that these algorithms yield good results.

References

[1] M. A. Bonuccelli, F. Lonetti, and F. Martelli. Video transcoding architectures for multimedia real time services. ERCIM News, (62), July 2005. http://www.ercim.org/publication/ercim_news/enw62/bonucelli.html.

[2] D. E. Butler, V. M. Bove Jr., and S. Sridharan. Real-time adaptive foreground/background segmentation. EURASIP Journal on Applied Signal Processing, 2005(1):2292–2304, January 2005.

[3] H.-W. Chen, J.-H. Kuo, W.-T. Chu, and J.-L. Wu. Action movies segmentation and summarization based on tempo analysis. In Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, pages 251–258, 2004.

[4] J. Feng, K.-T. Lo, and H. Mehrpour. Scene change detection algorithm for MPEG video sequence. In Proceedings of the IEEE International Conference on Image Processing, 2:821–824, September 1996.

[5] R. J. Radke, S. Andra, O. Al-Kofahi, and B. Roysam. Image change detection algorithms: A systematic survey. IEEE Transactions on Image Processing, 14(3):294–307, March 2005.

[6] G. School. Change detection tutorial. www.globe.unh.edu/MultiSpec/Change/Change.pdf.

[7] M. Steyvers and S. Brown. Prediction and change detection. Advances in Neural Information Processing Systems, 18:1281–1288, 2006. http://psiexp.ss.uci.edu/research/papers/.

[8] A. Vetro, C. Christopoulos, and H. Sun. Video transcoding architectures and techniques: An overview. IEEE Signal Processing Magazine, 20(2):18–29, March 2003.

[9] H. Wang and D. Suter. A consensus-based method for tracking: Modelling background scenario and foreground appearance. Pattern Recognition, 40(3):1091–1105, March 2007.
