


Knowledge-Based Systems 70 (2014) 407–419

Contents lists available at ScienceDirect

Knowledge-Based Systems

journal homepage: www.elsevier .com/ locate /knosys

A knowledge-based component library for high-level computer vision tasks

http://dx.doi.org/10.1016/j.knosys.2014.07.017
0950-7051/© 2014 Elsevier B.V. All rights reserved.

D. Fernández-López, R. Cabido, A. Sierra-Alonso, A.S. Montemayor, J.J. Pantrigo *
Departamento de Ciencias de la Computación, Universidad Rey Juan Carlos, c/ Tulipán s/n, 28933 Móstoles, Spain

* Corresponding author. E-mail addresses: [email protected] (D. Fernández-López), raul.cabido@urjc.es (R. Cabido), [email protected] (A. Sierra-Alonso), [email protected] (A.S. Montemayor), [email protected] (J.J. Pantrigo).


Article history: Received 15 November 2013; Received in revised form 27 June 2014; Accepted 23 July 2014; Available online 2 August 2014.

Keywords: Knowledge modeling; Software component reuse; Computer vision; Visual surveillance; Knowledge-based systems; Domain modelling; Software engineering.

Abstract
Computer vision is an interdisciplinary field that includes methods for acquiring, processing, analyzing, and understanding visual information. In computer vision, problem-solving approaches are usually application-specific and, therefore, reusing captured knowledge is hard. However, the aim of knowledge modeling (KM) is to capture and reuse knowledge to solve different problems. In this paper, we propose a knowledge-based component library for computer vision tasks such as video surveillance applications. The developed components are based on the region of interest (ROI), a well-known concept in the image processing and computer vision fields. We provide a set of reusable components that are specializations and/or compositions of ROIs. Finally, we propose several case studies that illustrate the feasibility of the proposal. Experimental results show that the proposed method deals effectively and efficiently with real-life computer vision problems.


1. Introduction

Computer vision is an interdisciplinary research area whose objectives include the acquisition, processing and analysis of visual information. During the last two decades, there has been an increasing interest in the development of visual systems which interpret and understand image sequences, mainly motivated by the growing computational power available. Nowadays, computer vision is applied in real-life systems such as video surveillance, driver assistance systems, biometrics and human–computer interaction, to cite only a few [6,16]. These applications involve visual processing techniques such as feature detection, segmentation, optical flow computation, background model estimation, visual tracking, object recognition, etc. [9].

Different strategies have been proposed for the analysis of image sequences. One of them is context modeling. A contextual model is typically used as a tool to interpret a sequence of images [22]. A system is context-aware if it uses contextual data (about space, time, geometry, physics, lighting, behavior, etc.) to provide relevant information and/or services to the user, where relevancy depends on the user's task [8].

Different approaches can also be found in the literature to acquire, model and use contextual models in computer vision. Mira et al. [15] propose a knowledge-based model for the task of moving object detection in image sequences using the CommonKADS methodology [1,2]. The authors focus on the convenience of a knowledge modeling strategy based on the definition of tasks and methods in terms of a reusable component library like KADS. The task is decomposed into four subtasks: (a) thresholded segmentation; (b) motion detection; (c) extraction of silhouettes; and (d) fusion of silhouette parts of moving objects. Brémond and Thonnat [4] address the problem of context representation for scene interpretation systems. The authors represent the context of the interpretation process in video surveillance applications and use this context representation to modify the structure of the global interpretation system. The method provides a tool to manually assign context to regions in an image.

There are also proposals in the literature that try to acquire contextual information automatically. For example, Brdiczka et al. [3] propose a generic situation acquisition algorithm based on a state model, called the situation model, to represent contextual information and human behavior. This model consists of different layers referring to entities, filters, roles, etc., and the authors present a framework to automatically acquire these different layers. The proposal is tested on a video surveillance task. Rabinovich et al. [20] apply semantic context to visual object categorization tasks.

Fig. 1. Automatic traffic monitoring proposal for roundabouts.

They present a comparison of two sources of context: one learned from training data and another queried from Google Sets (a tool to generate lists of similar items by providing just a reduced initial set of examples). The authors conclude that incorporating context into object categorization greatly improves the accuracy of the task.

Pantrigo et al. [16] propose a knowledge modeling system for tracking complex (articulated and multiple) objects. The proposed visual tracking task is based on a synergistic combination of strategies coming from particle filters and population-based metaheuristics. The knowledge modeling components used in the decomposition of the visual tracking task can be reused as generic elements of problem-solving methods in related problems. This approach was enriched in [18], where abstraction techniques were applied to the design of visual tracking algorithms. Sánchez et al. [22] present an extension of a general tracking system that uses context knowledge to solve tracking issues. The authors show how the context knowledge representation and the reasoning methods can be easily adapted to different scenarios. The experimental results demonstrate that the performance of the tracking system is improved, enabling real-time execution. Pantrigo et al. [17] propose a visual tracking system for video surveillance applications, capable of tracking a variable number of objects of interest. Objects can enter or exit the scene through special regions modeled as enter-exit areas.

When experts design context-based applications, they usually build a set of components that are only useful for dealing with a specific domain. In most cases, component reusability is not taken into account at all. Therefore, when the application domain changes, the developed components are rarely reused, and new ones are built from scratch to solve the new problem. Although there are specific software libraries oriented to solving computer vision tasks which enable code reuse (such as the OpenCV library1 for C++ and Python, or the Image Processing Toolbox2 and the Computer Vision System Toolbox3 for Matlab), they are not focused on knowledge acquisition and modeling. In this work, we propose a knowledge-based reusable component library for the contextual modeling of visual tasks. The structure of the library allows its capabilities to be extended with (i) new components and (ii) new functionality for existing components. In short, the highlights of this work can be summarized as follows:

• The proposed library provides a set of elements which are applicable to model and implement visual tasks.
• It also provides the capability of extending the library by creating new components. The set of components can be enlarged in three different ways:
  – creating new components,
  – adapting existing ones to fit the context model of a given scenario, and
  – combining existing ones to develop increasingly specific components.

As a consequence, the library can save significant time in software development, which is one of the main goals of this work.

The component library stands on the notion of "Region of Interest" (ROI). This is the usual way to denote a polygonal (usually rectangular) selection from an image (see Fig. 1 for a visual representation of some ROIs for extracting visual information in a roundabout). In this work, the ROI concept is exploited to develop more elaborate and expressive elements such as counters, speedometers, etc. The composition and aggregation of ROI-based components provides a higher-level description of the visual context.

1 www.opencv.org/
2 www.mathworks.com/products/images/
3 www.mathworks.com/products/computer-vision/

The ROI-based approach allows experts to develop very efficient applications, as the measurement process can be performed only on these ROIs, rather than over the whole image (the ROIs are usually small compared to the image size). The proposed model also lends itself to parallel computation, since the set of tasks to be performed over the whole scenario (image) is decomposed into tasks on subsets of data that can be processed independently. Indeed, the number of ROIs associated with a typical scenario is usually in the order of a dozen. Each of these ROIs could easily be handled by one processing core of a common CPU, yielding a coarse-grained parallelism approach. Nowadays, consumer CPUs offer up to 8 simultaneous processing threads on 4 physical cores, so they can improve performance remarkably. Moreover, CPUs also offer powerful vector registers and instructions that can be used in a fine-grained parallelism approach, as every ROI is usually composed of hundreds of pixels.
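As an illustration of this coarse-grained scheme, the following Python sketch distributes the per-ROI work over a thread pool. It assumes each component exposes an update(frame) method (a hypothetical name, consistent with the sketches given later in Section 3); ThreadPoolExecutor is one reasonable choice here because OpenCV releases the GIL inside most of its native kernels.

```python
# Coarse-grained, per-ROI parallelism: one task per component per frame.
# Illustrative sketch; assumes each component has an update(frame) method.
from concurrent.futures import ThreadPoolExecutor

def process_frame(components, frame, max_workers=8):
    """Run every ROI-based component on the same frame, one task per component."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # each ROI is small and independent of the others, so the tasks
        # map cleanly onto the cores of a consumer CPU
        list(pool.map(lambda c: c.update(frame), components))
```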

The library components are modeled by means of the Unified Modeling Language (UML). UML has been used in the modeling of a plethora of knowledge systems [13,14,21] and is a de facto standard in Software Engineering. As stated by Rhem [21], UML can be used as the central modeling notation to capture knowledge from a specific domain.

The rest of the paper is organized as follows. Section 2 gives an overview of the library, presenting its inner structure. Section 3 details the "what" and "how" of each component. Section 4 illustrates the performance of the component library when applied to four different scenarios. The problem being solved in each case study determines the required components; therefore, this section focuses on the "why" of this work. Finally, Section 5 summarizes the main contributions and conclusions of the proposal.

2. Component library overview

Many problems related to scene interpretation implicitly rely on a model of the considered scenario. This model usually consists of a set of regions in the image, each of them devoted to measuring a relevant magnitude in a predefined area of interest in order to make a decision. For example, considering the scenario depicted in Fig. 1, the objective is to visually determine the most-used entrances and exits, the presence of traffic jams, the number of vehicles in the roundabout at every time, etc. In this example, the context modeling is performed by placing vehicle counters at each entrance and exit (represented as double square red boxes in the figure) and presence detectors at each entrance (rectangular blue bounding boxes in the figure).

Fig. 2. Level-based structure of the component library.
Fig. 3. The developed component library organized by levels.
Fig. 4. ROI component class.

Vehicle counters allow us to know the number of vehicles that use each entrance and exit and, as a consequence, the number of vehicles in the roundabout at every moment. Presence detectors give us important information to detect possible traffic jams at each entrance. Note that these components (i.e., counters and presence detectors) can be applied 'as is' to solve other problems, as they are domain independent. Therefore, they are prone to be reused in different application contexts.

We have designed a component library whose fundamental element is based on the concept of region of interest (ROI). In computer vision, a ROI is a region of an image (i.e., a subimage) that is used to perform a specific measurement, transformation or operation. ROIs have traditionally been used to limit the computational overhead of particular image processing operations that would appear when processing an entire image. By applying ROIs, we can reduce this computational overhead and focus on relevant zones. Moreover, considering an upper abstraction level, we can mix different operations on different ROIs in an image to obtain high-level knowledge of the particular scene.

The library is constituted by software components which, on the one hand, are specializations of ROIs and, on the other hand, are made up of associations of several ROIs. Fig. 2 depicts the inner structure of the proposed component library. It shows a tree-shaped view of the library, with the ROI component acting as the root. The second level consists of specializations of the ROI component, designed to apply specific operations to the image region pixels, and is called the Specialized ROI Components level. These components are related to the ROI by an inheritance relationship, and they capture the knowledge from the computer vision methods in the literature. In the third level, components that are formed from sets of other components appear. The elements in the third level are related to the ones in the second level by composition relationships instead of inheritance, as in the previous level. This third level is called the Composed Components level. The researchers' experience is codified by means of the relationships between components at this level.

Fig. 5. (a) Presence detector, (b) edge detector and (c) color detector classes and examples of use.

Finally, at the deepest level, the developed components are combined to perform specific tasks in real-world applications. This combination of elements is called a scenario. In other words, a scenario is an aggregation of a suitable set of single and/or composed components to accomplish a visual task for a given application. Therefore, the users' knowledge is codified at this level. It is important to notice that, as the systems are developed as aggregations of components, each of these components can also be considered as a sub-system, since they are functional parts of a system.

Fig. 6. Checkpoint: (a) class, and (b) application to the detection of vehicles on a highway.

3. The component library

This section provides a detailed view of the specific components developed at each level: (1) ROI, (2) specialized ROI, and (3) composed component level. For each one, we present a description of its methods and attributes. Fig. 3 particularizes Fig. 2 for the developed components of this work. It is worth mentioning that this initial set of components can be extended to deal with other applications. As an example, it is possible to develop new components as a combination of existing ones.

3.1. Region of interest component

The ROI component is the root of the level-based structure of the library. This component has no associated functionality and, therefore, it can be seen as an abstract class. The functionality is delegated to the specialized ROI components developed at the next level. The attributes of this component define a region of interest in an image: position, size and mask. Sometimes, a rectangular box does not provide a suitable region modeling because we are interested in an arbitrarily-shaped image region. This can be naturally modeled by means of a mask. A mask is a binary matrix where the zones evaluated to true are considered in the operation to be performed, while the zones evaluated to false are ignored.

Table 1. Specialized components performance (in frames per second, fps) in terms of the ROI size (in pixels).

ROI size (pixels)   Presence detector (fps)   Edge detector (fps)   Color filter (fps)
20 × 20             533.87                    531.86                529.57
40 × 40             529.62                    524.06                524.87
80 × 80             511.35                    493.16                521.32
160 × 160           439.83                    379.21                432.87
320 × 320           259.13                    183.80                259.35

As an example, consider the blue ROIs modeling road lanes in Fig. 1. As the mentioned lanes are not rectangular-shaped, the modeler can provide a mask to restrict the desired computation to the interesting region, rather than to the whole rectangular ROI. Fig. 4 represents the UML class of the ROI component. It has five attributes: x, y (the position of the ROI, in pixels), Lx, Ly (the size of the ROI, in pixels) and the Mask (a binary matrix of the same size as the ROI).
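As a concrete reading of Fig. 4, the following Python sketch shows a minimal ROI root component, assuming images are numpy arrays indexed in (y, x) order. The attribute names follow the UML class; the crop() helper is our own hypothetical convenience, not an attribute of the class in the figure.

```python
# Minimal sketch of the ROI root component of Fig. 4 (illustrative only).
import numpy as np

class ROI:
    """Abstract root component: a rectangular region plus an optional binary mask."""
    def __init__(self, x, y, Lx, Ly, mask=None):
        self.x, self.y = x, y          # position of the ROI, in pixels
        self.Lx, self.Ly = Lx, Ly      # size of the ROI, in pixels
        # default mask: every pixel of the rectangle takes part in the operation
        self.mask = mask if mask is not None else np.ones((Ly, Lx), dtype=bool)

    def crop(self, frame):
        """Return the subimage covered by this ROI."""
        return frame[self.y:self.y + self.Ly, self.x:self.x + self.Lx]
```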

Fig. 7. Finite State Machine that governs the behavior of the checkpoint component.

Fig. 8. An example of the performance of the checkpoint component.

Fig. 9. Activity diagram for the CheckRtoL() algorithm. Comments refer to the corresponding states depicted in Fig. 8.
Fig. 10. Speedometer: (a) class, and (b) an example of use for measuring vehicle speed.
Fig. 11. Activity diagram for the GetSpeed() method.

3.2. Specialized ROI components

We have devised three different ROI specializations: (i) presence detector, (ii) edge detector and (iii) color detector. This component level can be extended to deal with other usual tasks in computer vision, such as motion detection or optical flow computation.

3.2.1. Presence detector
A presence detector is a component which tries to discover foreground objects within the boundaries of the predefined ROI. This task is basically addressed by comparing, pixel by pixel, a background image with each new frame coming from the camera. If the difference between a pair of compared pixels is greater than a given threshold, then the pixel is considered part of the foreground. Otherwise, it is considered part of the background. Such a component has a plethora of potential applications in the context of automatic visual surveillance, as described in Section 4.

However, background images change over time for different reasons, such as variations in lighting conditions, background objects, and camera motion. To deal with these situations, it is more desirable to provide a self-adaptive background model rather than a static background image. Much research work has been devoted to dynamic background modeling; we refer the reader to Piccardi [19], Brutzer et al. [5] and Herrero and Bescós [10] for an introductory view of this topic. In this work, we use the adaptive background mixture model proposed by KadewTraKuPong and Bowden [11], which is available in the popular OpenCV library. In any case, this method could be replaced by any other one without affecting the functionality of the rest of the library components.

Fig. 5a represents the presence detector class. We have implemented the following functionality associated with the detection and description of presence:

• PresenceDetected returns a boolean value indicating whether the region is occupied or not.
• IsOccupiedSince returns the timestamp since which the region has been continuously occupied.
• DetectBlobs computes the set of connected components in the current frame.

Fig. 5a also illustrates an example of usage of the IsOccupiedSince method to detect traffic jams. In this example, if the component continuously detects the presence of vehicles for longer than 100 s, then it is considered that a traffic jam has occurred.
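A minimal Python sketch of this component is shown below, reusing the ROI class sketched in Section 3.1. We stand in OpenCV's createBackgroundSubtractorMOG2 for the KadewTraKuPong-Bowden mixture model cited above; the method names follow Fig. 5a, while update() and the occupancy-ratio threshold are our own illustrative additions.

```python
# Minimal sketch of the presence detector (illustrative, not the authors' code).
import time
import cv2

class PresenceDetector(ROI):
    def __init__(self, *args, min_foreground_ratio=0.05, **kwargs):
        super().__init__(*args, **kwargs)
        # stand-in for the adaptive background mixture model of [11]
        self.subtractor = cv2.createBackgroundSubtractorMOG2()
        self.occupied_since = None   # timestamp when continuous occupancy began
        self.min_ratio = min_foreground_ratio
        self.fg = None               # last foreground mask

    def update(self, frame):
        """Feed the ROI pixels of a new frame to the background model."""
        self.fg = self.subtractor.apply(self.crop(frame))
        self.fg[~self.mask] = 0      # ignore pixels outside the binary mask
        if self.PresenceDetected():
            if self.occupied_since is None:
                self.occupied_since = time.time()
        else:
            self.occupied_since = None

    def PresenceDetected(self):
        """True when enough foreground pixels fall inside the mask."""
        return cv2.countNonZero(self.fg) > self.min_ratio * self.mask.sum()

    def IsOccupiedSince(self):
        """Timestamp since which the region has been continuously occupied."""
        return self.occupied_since

    def DetectBlobs(self):
        """Connected components of the current foreground mask."""
        n, labels = cv2.connectedComponents((self.fg > 0).astype('uint8'))
        return n - 1, labels         # blob count (background excluded), label map
```

With this sketch, the traffic-jam rule of Fig. 5a reduces to checking whether time.time() - detector.IsOccupiedSince() exceeds 100 s.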

3.2.2. Edge detector
Edge detection is a fundamental operation in many computer vision systems concerning feature detection and extraction. Many relevant feature extraction algorithms in the literature, like SIFT (Scale Invariant Feature Transform) [12] and HOG (Histogram of Oriented Gradients) [7], rely on the edge information of small patches. From an image processing point of view, an edge is composed of regions at which the image brightness presents discontinuities. These regions are typically sets of curved line segments.

Fig. 12. Safe distance meter: (a) class, and (b) an example of use for measuring the safe distance between vehicles on a road.

The edge detector component applies this operation on the image region defined by its ROI. Fig. 5b represents the edge detector class and an example of use applied to traffic control. As with the presence detector, there are many methods for estimating edge information, such as gradient-based techniques and second-order derivatives. Here, we choose the well-known Sobel approximation to the image gradient.
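A minimal sketch of the component follows, again assuming the ROI class of Section 3.1; the gradient-magnitude threshold is an illustrative parameter.

```python
# Minimal sketch of the edge detector based on the Sobel gradient (illustrative).
import cv2

class EdgeDetector(ROI):
    def DetectEdges(self, frame, threshold=100.0):
        """Return a binary edge map of the ROI from the Sobel gradient magnitude."""
        gray = cv2.cvtColor(self.crop(frame), cv2.COLOR_BGR2GRAY)
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)   # horizontal gradient
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)   # vertical gradient
        magnitude = cv2.magnitude(gx, gy)
        return (magnitude > threshold) & self.mask        # restrict to the mask
```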

3.2.3. Color filter
Color filtering is another technique in common use in the preliminary stages of computer vision systems. For that reason, we have implemented a color filter component. It analyzes an image region and determines the pixels that satisfy a condition related to color information. Fig. 5c represents the color detector class and an example of use for detecting human skin color, with a possible application to human–computer interaction. Again, color can be described in many ways, depending on the color space (i.e., RGB, HSV, CIELAB, etc.) as well as on the technique used to represent color features (i.e., mixtures of Gaussians, color histograms, logical expressions, etc.).
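The sketch below follows the same pattern as the previous components; the HSV skin range is a common heuristic chosen purely for illustration.

```python
# Minimal sketch of the color filter (illustrative HSV range for skin color).
import cv2
import numpy as np

class ColorFilter(ROI):
    def __init__(self, *args, lower=(0, 40, 60), upper=(25, 180, 255), **kwargs):
        super().__init__(*args, **kwargs)
        self.lower = np.array(lower, dtype=np.uint8)      # HSV lower bound
        self.upper = np.array(upper, dtype=np.uint8)      # HSV upper bound

    def FilterColor(self, frame):
        """Return the ROI pixels whose HSV color falls inside the configured range."""
        hsv = cv2.cvtColor(self.crop(frame), cv2.COLOR_BGR2HSV)
        return (cv2.inRange(hsv, self.lower, self.upper) > 0) & self.mask
```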

3.2.4. Performance analysis
This section presents a flavour of the performance of the proposed components during execution. In particular, 10 runs for different ROI sizes in each component have been tested, and an average framerate has been obtained for every configuration, as shown in Table 1.

We can observe that, as expected, the larger the size of the ROI, the lower the framerate. The presence detector offers 511 fps for a ROI of 80 × 80 pixels, 439 fps for 160 × 160 pixels, and 259 fps for 320 × 320 pixels. For the edge detector, the results are 493 fps for 80 × 80 pixels, 379 fps for 160 × 160 and 183 fps for 320 × 320 pixels. The edge detector operation is a little more computationally expensive than the presence detector, as it is based on a convolution operation around a 3 × 3 neighborhood. The color detector offers 521 fps for 80 × 80, 432 fps for 160 × 160 and 259 fps for 320 × 320 pixels. It is important to highlight that all the obtained processing rates are much higher than what is traditionally considered real-time performance in imaging applications (about 30 fps), even for the largest ROIs.
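A harness along the following lines reproduces the measurement methodology just described (several runs per configuration, averaged framerate); the component and the frame source are placeholders.

```python
# Minimal timing-harness sketch for the per-component benchmarks (illustrative).
import time

def measure_fps(component, frames, runs=10):
    """Average frames-per-second of component.update over several runs."""
    rates = []
    for _ in range(runs):
        t0 = time.perf_counter()
        for frame in frames:
            component.update(frame)
        rates.append(len(frames) / (time.perf_counter() - t0))
    return sum(rates) / len(rates)
```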

3.3. Composed ROI components

In the next level of the library, components created by the arrangement of two or more single ones are built. The aim is to take advantage of the establishment of relationships among single components to solve increasingly complex tasks.

Fig. 13. Safe distance meter activity diagram for the GetDistance method.


Fig. 14. Representative frames of the roundabout visual control case study.

Table 2. Results obtained by the proposed model on the roundabout case study.

Instance      #TP           #FN          #FP
roundabout1   344 (95.6%)   13 (3.6%)    3 (0.8%)
roundabout2   278 (88.2%)   28 (8.9%)    9 (2.9%)
roundabout3   309 (91.2%)   28 (8.3%)    2 (0.5%)
Total         931 (91.8%)   69 (6.8%)    14 (1.4%)


We have developed three components at this level, called checkpoint, speedometer and safe distance meter.

3.3.1. Checkpoint
A checkpoint is used to detect objects crossing a predefined region of the scenario. It consists of two presence detectors, placed one beside the other. The rationale behind this composition is to determine when an object crosses the area defined by the ROIs, and its direction of motion. This is done by analyzing the existence of objects of interest in the two presence detectors over consecutive time steps. Fig. 6a shows the class of the checkpoint component, which includes its main attributes and methods, while Fig. 6b depicts an example for detecting vehicles on a highway.

The attributes section of the checkpoint component is made of two presence detectors, called L and R (for the left and right presence detectors, respectively), and a list of the states that the component has reached over time, called StateList. The behavior of this component can be described by a finite state machine (FSM). Fig. 7 shows the structure of this FSM, where states are represented by circles and transitions by arrows between states. The states are determined by the occupancy values of each presence detector and the current state, as follows:

• 'Await': the initial state of the FSM, reached when no presence is detected in either detector.
• 'L→' (respectively, 'R→'): this state is reached when the checkpoint starts detecting an object in the left (respectively, right) presence detector.
• 'L→R' (respectively, 'R→L'): this state is reached when the checkpoint detects an object crossing from left to right (respectively, from right to left).
• '→R' (respectively, '→L'): this state is reached when the object is ending its crossing from left to right (respectively, from right to left).

Boxes on the arrows represent the events that trigger the transitions between states. They are related to the occupancy of the left and right presence detectors: a value of 1 means that the presence detector is occupied, and a value of 0 means that it is not.

Fig. 15. Representative frames of the traffic speed visual control case study.

Table 3. Results obtained by the proposed model on the traffic speed control case study.

Instance       #TP          #FN          #FP
speedometer1   25 (78.1%)   7 (21.9%)    0 (0.0%)
speedometer2   55 (80.9%)   11 (16.2%)   2 (2.9%)
Total          80 (80.0%)   18 (18.0%)   2 (2.0%)


Fig. 8 represents an example of using the checkpoint to determine the direction of vehicles on a road. Specifically, it describes the states visited by the checkpoint component when the method CheckRtoL returns true. The left column shows a selected set of representative frames, while the right column represents the FSM state associated with each frame. Fig. 8a shows a vehicle before entering the right presence detector of the checkpoint; the associated FSM state is 'Await'. Fig. 8b represents the vehicle entering the right presence detector and, as a consequence, a transition is produced from 'Await' to 'R→'. The next frame represents the vehicle crossing both presence detectors simultaneously, reaching the state 'R→L'. Fig. 8d depicts the vehicle abandoning the right presence detector, and the corresponding transition to '→L'. Finally, in the last frame, the vehicle is completely out of the checkpoint, and the FSM shows how the checkpoint returns to the initial 'Await' state.

Fig. 9 represents a simplified version of how the CheckRtoL() algorithm works. That is, it details the algorithm that drives the transitions described in Fig. 8. It starts by repeatedly checking the presence of an object in the right presence detector R. When a new object is detected in R, the system goes from the state 'Await' to 'R→' and the algorithm focuses on the presence detector L. As the object is moving from right to left, it is expected to enter the region of interest of L, and the system then goes to state 'R→L'. Then, the object in its movement will completely leave the ROI associated with R (state '→L') and, finally, the ROI of L (returning to the initial state 'Await'), completing the movement from right to left and crossing the checkpoint. Symmetrically, the CheckLtoR() algorithm works in an equivalent way, just exchanging the left and right presence detectors.
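A minimal Python sketch of the checkpoint follows, reusing the PresenceDetector of Section 3.2.1. The state names and the transition table are our reading of Figs. 7-9 (with '->' standing for the arrow in the state names), so the sketch is illustrative rather than a literal transcription of the authors' implementation.

```python
# Minimal sketch of the checkpoint FSM of Fig. 7 (illustrative transition table).
class Checkpoint:
    def __init__(self, left, right):
        self.L, self.R = left, right       # two adjacent presence detectors
        self.state = 'Await'
        self.state_list = ['Await']        # visited states (StateList in Fig. 6a)

    def step(self, frame):
        """Advance the FSM with the occupancy of both detectors for one frame."""
        self.L.update(frame)
        self.R.update(frame)
        l, r = self.L.PresenceDetected(), self.R.PresenceDetected()
        s = self.state
        # right-to-left crossing: Await -> R-> -> R->L -> ->L -> Await
        if s == 'Await' and r and not l:   s = 'R->'
        elif s == 'R->' and l and r:       s = 'R->L'
        elif s == 'R->L' and l and not r:  s = '->L'
        elif s == '->L' and not l:         s = 'Await'
        # left-to-right crossing: Await -> L-> -> L->R -> ->R -> Await
        elif s == 'Await' and l and not r: s = 'L->'
        elif s == 'L->' and l and r:       s = 'L->R'
        elif s == 'L->R' and r and not l:  s = '->R'
        elif s == '->R' and not r:         s = 'Await'
        # simultaneous or backing-out events are left unhandled for brevity
        self.state = s
        self.state_list.append(s)

    def CheckRtoL(self):
        """True just after a complete right-to-left crossing."""
        return self.state_list[-2:] == ['->L', 'Await']

    def CheckLtoR(self):
        """True just after a complete left-to-right crossing."""
        return self.state_list[-2:] == ['->R', 'Await']
```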

3.3.2. Speedometer
A composed use of checkpoints is the speedometer. Once we know the direction of motion of an object, we can easily compute its speed. The speedometer component consists of two checkpoints separated by a known physical distance (or a distance in image pixels, if we know the correspondence to real distances by calibrating the camera). It is used to determine the speed of objects passing through it.


The straightforward application is speed traffic control using visual features. Its operating fundamentals are very simple, mainly because the speedometer is based on other, already developed library components. Each checkpoint can detect the object passing through it and determine the timestamp when this event is produced. It is then possible to measure the time between successive activations of the checkpoints and, therefore, the average speed of the object in this region. Fig. 10a shows the class of this component and Fig. 10b illustrates an application to speed control in a real-world scenario, where the speedometer is represented by two red checkpoints.

Fig. 11 represents the GetSpeed algorithm. It computes the average speed of an object crossing the space between the two checkpoints. The speedometer detects a vehicle crossing the checkpoint CheckA by calling the method CheckA.CheckLtoR(). Then, a time mark, called t1 in the figure, is captured. In a similar way, when the object crosses the checkpoint CheckB, the speedometer detects it (invoking the method CheckB.CheckLtoR()) and captures a second time mark t2. As the distance between both checkpoints is known, the method returns the average velocity.
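The sketch below follows the GetSpeed logic of Fig. 11, reusing the Checkpoint sketch above; the physical separation distance_m and the conversion to km/h are illustrative assumptions.

```python
# Minimal sketch of the speedometer's GetSpeed logic (illustrative).
import time

class Speedometer:
    def __init__(self, check_a, check_b, distance_m):
        self.check_a, self.check_b = check_a, check_b   # two checkpoints
        self.distance_m = distance_m                    # known physical separation
        self.t1 = None                                  # first time mark (t1)

    def GetSpeed(self):
        """Average speed in km/h of an object crossing A and then B, else None."""
        if self.t1 is None:
            if self.check_a.CheckLtoR():
                self.t1 = time.time()                   # object passed checkpoint A
        elif self.check_b.CheckLtoR():
            t2 = time.time()                            # object passed checkpoint B
            speed_kmh = self.distance_m / (t2 - self.t1) * 3.6
            self.t1 = None                              # ready for the next object
            return speed_kmh
        return None
```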

3.3.3. Safe distance meter
As another composition of checkpoint components, we can create a safe distance meter. The aim of this component is to approximately compute the distance between two moving objects. Its most prominent application is dynamically determining the safe distance between vehicles on a road. Fig. 12a represents the class of this component. The safe distance meter component consists of a set of N checkpoints, linearly distributed and separated by a known distance. Its operation is based on the activation of the checkpoint controls. When these elements detect objects passing through them, the safe distance meter is able to determine the number of objects within its area. When there is more than one vehicle, this component computes the distance between each pair of them. Fig. 12b depicts an application of safe distance control.

Fig. 13 represents a simplified activity diagram for the GetDistance method. It starts by analyzing the set of checkpoints, searching for an FSM state of 'L→R' by calling the method Checks[I].CheckLtoR(), where 0 < I ≤ N. When an object is detected, the safe distance meter component checks the rest of the checkpoints to detect a second object, in the range 0 ≤ J < I. If a second object is detected, then the safe distance meter component returns the distance between them. Otherwise, it returns void.
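A sketch of the GetDistance logic of Fig. 13 follows, assuming N checkpoints with a known, uniform spacing. Here the test on the checkpoint state stands in for the Checks[I].CheckLtoR() call of the diagram, and None plays the role of void.

```python
# Minimal sketch of the safe distance meter's GetDistance logic (illustrative).
class SafeDistanceMeter:
    def __init__(self, checkpoints, spacing_m):
        self.checks = checkpoints      # N checkpoints, index 0..N-1 along the road
        self.spacing_m = spacing_m     # known distance between consecutive ones

    def GetDistance(self):
        """Distance between the first pair of objects found inside the meter."""
        for i in range(len(self.checks) - 1, 0, -1):
            if self.checks[i].state == 'L->R':       # leading object at index I
                for j in range(i):                   # trailing object at J < I
                    if self.checks[j].state == 'L->R':
                        return (i - j) * self.spacing_m
        return None                                  # fewer than two objects
```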

4. Case studies

This section illustrates the performance of the proposed reusable components when dealing with real-life applications. We present four different case studies: (a) visual traffic control in a roundabout, (b) speed traffic control on a highway, (c) safe distance control on a road, and (d) pedestrian access control to restricted areas. All the experiments were performed on a laptop with an Intel Core 2 Duo T6400 2 GHz processor and 4 GB of RAM. It is important to remark that, in all the considered examples, the processing rates can be considered beyond real-time restrictions for visual surveillance applications. A representative collection of the output videos considered in this section is available at http://www.gavab.etsii.urjc.es/capo/kbscv/videos.zip.

4.1. Case study 1: Traffic control in a roundabout

The first example considers the analysis of the traffic in a roundabout located in a medium-sized Spanish city. Fig. 14 shows some selected representative frames of a video sequence. As can be seen in the figure, the scenario is modeled by an aggregation of the following library components:

• Six checkpoints, one for each roundabout entrance and exit. The checkpoint is represented in the figure as a pair of rectangular boxes with red edges. These components are able to detect (and hence, count) the passage of vehicles through them. As a consequence, it is also possible to use them to know the number of vehicles in the roundabout at any time. This indirect measure is shown as a red number in the roundabout center.
• Two presence detectors, in two selected entrances of the roundabout. They are represented as rectangular boxes with blue edges and a blue transparency depicting the mask. Their purpose is to detect traffic jams in these roundabout entrances. It is considered that a traffic congestion event is produced if the presence detector detects occupancy for longer than a given threshold (a sketch of this aggregation follows the list).
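A possible aggregation of these components into a roundabout scenario, reusing the sketches of Section 3, could look as follows; the class and attribute names, the per-frame driving loop, and the assumption that every crossing is detected with CheckLtoR() are all illustrative simplifications (a real deployment would pick the crossing direction per entrance and exit).

```python
# Illustrative sketch of a scenario aggregating checkpoints and presence detectors.
import time

class RoundaboutScenario:
    def __init__(self, entrances, exits, jam_detectors):
        self.entrances, self.exits = entrances, exits   # checkpoint components
        self.jam_detectors = jam_detectors              # presence detectors
        self.vehicles_inside = 0

    def step(self, frame, jam_after_s=100):
        """Process one frame; return the vehicle count and any jammed entrances."""
        for cp in self.entrances + self.exits:
            cp.step(frame)
        for pd in self.jam_detectors:
            pd.update(frame)
        # every completed crossing updates the indirect roundabout count
        self.vehicles_inside += sum(cp.CheckLtoR() for cp in self.entrances)
        self.vehicles_inside -= sum(cp.CheckLtoR() for cp in self.exits)
        now = time.time()
        jams = [pd for pd in self.jam_detectors
                if pd.IsOccupiedSince() is not None
                and now - pd.IsOccupiedSince() > jam_after_s]
        return self.vehicles_inside, jams
```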

We have tested this model over a set of three video sequences (called roundabout1, roundabout2 and roundabout3), totaling nearly 500,000 video frames. Table 2 summarizes the obtained results. The first column contains the name of the instance (video sequence); the second, third and fourth columns present the number of detected vehicles (#TP, true positives), the number of undetected vehicles (#FN, false negatives), and the number of false detections (#FP, false positives), respectively.

As evidenced by the results shown in the table, the proposal deals with the problem accurately. Specifically, the ratio of true positives with respect to the total number of vehicles (the sum of #TP and #FN) was 93.10%. On the other hand, the ratio of false detections with respect to the total number of vehicles was 1.40%, which can be considered a very reasonable result. In view of the obtained results, it is possible to confirm that the proposed model properly describes the real behavior of the traffic in the considered roundabout. Finally, it is remarkable that the average processing rate was higher than 30 frames per second, as the processing is restricted to the considered ROIs.

4.2. Case study 2: Traffic speed control

Our second example focuses on traffic speed control based on visual information. Usually, traffic speed control systems rely on radar technology, as it is more accurate and robust to varying weather conditions. What we present here is a prototype that is not mature enough to be considered a realistic alternative to radar-based systems. Nevertheless, we think that this example illustrates the potential application field of our proposal.

Fig. 15 illustrates the performance of the components on the traffic speed control case study. We have modeled the scenario by the aggregation of the following library components:

• Two speedometers, one for each highway lane. The speedometers are composed of two checkpoints, each one represented as two rectangular boxes with red edges. The computed speed (in km/h) is also represented in the figure (see for example Fig. 15c, e and f).
• Two presence detectors, one for each speedometer. They are represented as rectangular boxes with blue edges and a blue transparency depicting the mask. When a vehicle is detected, the color of the mask turns red (see for example Fig. 15b and d). These presence detectors are not strictly necessary, but they can be useful as a second measure to prevent false positives.

The proposed model was tested over a set of two video sequences (called speedometer1 and speedometer2), totaling 6000 video frames.

Fig. 16. Representative frames of the safe distance visual control case study.
Fig. 17. Representative frames of the access control to restricted areas case study.

Table 3 shows the obtained results in the same form as in the previous application.

The results in Table 3 show that the proposed approach can properly deal with the problem of determining traffic speed. In summary, we have obtained a ratio of 80% true positives, while the ratio of false detections with respect to the total number of vehicles was 2%. False detections are due to shadows of big vehicles passing through adjacent lanes. Unfortunately, we do not possess the actual speeds (ground truth) of the detected vehicles as measured by a radar device and, as a consequence, we cannot quantitatively evaluate the performance of the system in this task.


However, the speed values computed by the system are reasonably compatible with the evaluated highway section. Finally, the obtained processing rate was 60 frames per second.

Fig. 18. A typical system error counting people when they form groups.

4.3. Case study 3: Safe distance control

The safe distance between vehicles is a matter of great importance to road safety which, as far as we know, currently does not have an associated control system in operation. In the absence of such a control system, maintaining a proper safety distance is a decision that depends on the driver's criterion. In this case study, we propose a basic prototype for safe distance control based on visual information. The proposed system calculates the distance between consecutive vehicles driving in the same direction. With the knowledge of experts, it would also be possible to model the most suitable safe distance in terms of the velocity of the involved vehicles. Although this issue is out of the scope of the present work, we are planning to devote more attention to it in future work.

Fig. 16 illustrates the safe distance control case study. We have modeled the scenario with a safe distance meter composed component. This component is represented in the figure as a series of twenty-two checkpoints, each one depicted as rectangular boxes with red edges. They are linearly arranged and placed at a distance equivalent to about 2.5 m between each pair of consecutive checkpoints. When a checkpoint detects a vehicle passing through it, its edge color turns green.

We have analyzed seven video sequences (called distance1 to distance7), totaling 900 frames. All the vehicles were detected when passing through the safe distance meter, and all the distances between each pair of consecutive vehicles were accurately computed. The component successfully discriminates vehicles moving in different directions but, in these situations, the safe distance is not computed. With respect to the performance of the proposal, we have obtained an average processing rate of about 50 fps.

4.4. Case study 4: Access control to restricted areas

Access control to restricted areas and people counting are two well-known applications in the field of automatic video surveillance. The first of these problems consists of determining the presence of people in a restricted area using visual information. The second problem deals with counting the number of people in a controlled environment. Fig. 17 illustrates the proposed model to address these objectives. One presence detector is devoted to detecting people in the grass area, which is considered a non-allowed area in our experimental design. A second one is placed in the passing zone and tries to determine the number of people in each video frame.

We have analyzed two video sequences from the PETS dataset,4 called campus1 and campus2, totaling about 1000 video frames. All the entrances into the restricted area were properly detected. As expected, the people counter works especially well when people are not occluded in the observed region. Note that we do not include high-level methods to overcome the occlusion problem, although they could be incorporated for individual tracking applications. Under these experimental conditions, the system obtains an error rate of 10.86% on average. By contrast, when people are grouped, the system tends to count each group as a single person. Fig. 18 shows a malfunction example over the video campus2, where groups of people are counted as a single person. This could be solved by using typical heuristics such as the size of a person given the context of the scene, ground plane extraction, or the use of more sophisticated people detection algorithms such as HOG.

4 http://www.cvg.rdg.ac.uk/PETS2013/a.html.

With respect to the performance of the proposal, we have obtained an average processing rate of about 15 fps. This significant performance difference with respect to the previous examples is due to the large dimensions of the ROIs. Nevertheless, this framerate is compatible with real-time constraints for many applications in the context of automatic visual surveillance.

5. Conclusions

In this paper, we propose a knowledge-based component library for computer vision tasks in the context of video processing. The devised components are based on the specialization and/or composition of regions of interest (ROIs). This approach allows us to obtain an increasingly high-level description of the visual context and, at the same time, eases the reuse of previously developed components. In addition, the ROI-based approach results in very efficient (and potentially parallelizable) applications, as the different processes only focus on small areas (which can also be processed simultaneously, in parallel), rather than on the whole image.

We have tested the proposal on four different case studies, mainly related to real-world surveillance and traffic control applications. Experimental results demonstrate the feasibility of the proposal. In addition, the component library is easy for a domain expert to use, as it is easy to determine the components needed to build a model which satisfies the needs of a specific application.

Acknowledgments

This research has been partially supported by the Spanish Government research projects Ref. TIN2011-28151 and TIN2012-31104.

References

[1] A. Aamodt, B. Bredeweg, J. Breuker, C. Duursma, C. Löckenhoff, K. Orsvarn, J. Top, A. Valente, W. Van de Velde, The CommonKADS Library, Document KADS-II/T1.3/VUB/TR/005/1.0, 1993.
[2] J. Breuker, W. Van de Velde, CommonKADS Library for Expertise Modelling: Reusable Problem Solving Components, IOS Press/OHM, 1994.
[3] O. Brdiczka, P.C. Yuen, S. Zaidenberg, P. Reignier, J.L. Crowley, Automatic acquisition of context models and its application to video surveillance, in: Proceedings of the 18th International Conference on Pattern Recognition, 2006.
[4] F. Brémond, M. Thonnat, A context representation for surveillance systems, in: Proceedings of the Workshop on Conceptual Descriptions from Images at the European Conference on Computer Vision, 1996.
[5] S. Brutzer, B. Höferlin, G. Heidemann, Evaluation of background subtraction techniques for video surveillance, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2011), 2011, pp. 1937–1944.
[6] P.J. Burt, A pyramid-based front-end processor for dynamic vision applications, Proc. IEEE 90 (7) (2002) 1188–1200.
[7] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit. (CVPR 2005) 1 (1) (2005) 886–893.
[8] A.K. Dey, Understanding and using context, J. Pers. Ubiquitous Comput. 5 (1) (2001) 4–7.
[9] R.C. González, K. Viji, Digital Image Processing, 3rd ed., Prentice-Hall, 2001.
[10] S. Herrero, J. Bescós, Background subtraction techniques: systematic evaluation and comparative analysis, Adv. Concepts Intell. Vision Syst. (2009) 33–42.
[11] P. KadewTraKuPong, R. Bowden, An improved adaptive background mixture model for real-time tracking with shadow detection, in: Proceedings of the 2nd European Workshop on Advanced Video-Based Surveillance Systems, 2001.
[12] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision 60 (2) (2004) 91–110.
[13] A. Manjarres, R. Martínez-Tomás, J. Mira, A new task for expert system analysis libraries: the decision task and the HM method, Expert Syst. Appl. 16 (3) (1999) 325–341.
[14] A. Manjarres, S. Pickin, J. Mira, Knowledge model reuse: therapy decision through specialisation of a generic decision model, Expert Syst. Appl. 23 (2) (2002) 113–135.
[15] J. Mira, A.E. Delgado, A. Fernández-Caballero, M.A. Fernández, Knowledge modeling for the motion detection task: the algorithmic lateral inhibition method, Expert Syst. Appl. 27 (2) (2004) 169–185.
[16] J.J. Pantrigo, A. Sánchez, J. Mira, On knowledge modeling of the visual tracking task, Expert Syst. Appl. 35 (1–2) (2008) 69–81.
[17] J.J. Pantrigo, J. Hernández, A. Sánchez, Multiple and variable target visual tracking for video surveillance applications, Pattern Recognit. Lett. 31 (12) (2010) 1577–1590.
[18] J.J. Pantrigo, A.S. Montemayor, A. Sánchez, Heuristic particle filter: applying abstraction techniques to the design of visual tracking algorithms, Expert Syst. 28 (1) (2011) 49–69.
[19] M. Piccardi, Background subtraction techniques: a review, IEEE Int. Conf. Syst. Man Cybern. 4 (2004) 3099–3104.
[20] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, S. Belongie, Objects in context, in: Proceedings of the International Conference on Computer Vision, 2007.
[21] A.J. Rhem, UML for Developing Knowledge Management Systems, Auerbach Publications, 2006.
[22] A.M. Sánchez, M.A. Patricio, J. García, J.M. Molina, A context model and reasoning system to improve object tracking in complex scenarios, Expert Syst. Appl. 36 (8) (2009) 10995–11005.