
Scene Understanding: perception, multi-sensor fusion, spatio-temporal reasoning and activity recognition. François Brémond, PULSAR project-team, INRIA



  • Slide 1
  • 1 Scene Understanding: perception, multi-sensor fusion, spatio-temporal reasoning and activity recognition. François Brémond, PULSAR project-team, INRIA Sophia Antipolis, FRANCE. [email protected] http://www-sop.inria.fr/pulsar/ Key words: artificial intelligence, knowledge-based systems, cognitive vision, human behavior representation, scenario recognition.
  • Slide 2
  • 2 Video Understanding: Performance Evaluation (V. Valentin, R. Ma). ETISEO: French initiative for algorithm validation and knowledge acquisition: http://www-sop.inria.fr/orion/ETISEO/ Approach based on 3 critical evaluation concepts: (1) selection of test video sequences: follow a specified characterization of problems; study one problem at a time, at several levels of difficulty; collect long sequences for significance. (2) Ground-truth definition: up to the event level; give clear and precise instructions to the annotator (e.g., annotate both the visible and occluded parts of objects). (3) Metric definition: a set of metrics for each video processing task; performance indicators: sensitivity and precision.
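The two performance indicators can be computed directly from counts of matched detections; a minimal sketch (the function names are ours, not ETISEO's):

```python
def precision(tp, fp):
    """Fraction of detected items that match the ground truth."""
    return tp / (tp + fp) if tp + fp else 0.0

def sensitivity(tp, fn):
    """Fraction of ground-truth items that were detected (recall)."""
    return tp / (tp + fn) if tp + fn else 0.0
```

For example, 40 true positives with 5 misses and no false alarms gives precision 1.0 and sensitivity 40/45 ≈ 0.888.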
  • Slide 3
  • 3 Evaluation: current approach (A.T. Nghiem). ETISEO limitations: selection of video sequences according to difficulty levels is subjective; generalization of evaluation results is subjective; one video sequence may contain several video processing problems at many difficulty levels. Approach: treat each video processing problem separately; define a measure to compute difficulty levels of input data (e.g. video sequences); select video sequences containing only the current problem at various difficulty levels; for each algorithm, determine the highest difficulty level at which the algorithm still has acceptable performance. Approach validation: applied to two problems: detecting weakly contrasted objects, and detecting objects mixed with shadows.
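The last step, finding the highest difficulty level at which an algorithm still performs acceptably, can be sketched as follows (the 0.8 threshold and the score layout are illustrative assumptions, not values from the evaluation):

```python
def highest_acceptable_level(scores_by_level, threshold=0.8):
    """Return the highest difficulty level at which performance is
    still acceptable, scanning levels in increasing order and
    stopping at the first failure. Returns None if even the easiest
    level fails. The 0.8 threshold is purely illustrative."""
    best = None
    for level in sorted(scores_by_level):
        if scores_by_level[level] >= threshold:
            best = level
        else:
            break
    return best
```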
  • Slide 4
  • 4 Evaluation: conclusion. A new evaluation approach to generalise evaluation results, implemented for two problems. Limitation: it only detects the upper bound of an algorithm's capacity; the difference between the upper bound and the real performance may be significant if the test video sequence contains several video processing problems, or if the same set of parameters must be tuned differently to adapt to several concurrent problems. Ongoing evaluation campaigns: PETS at ECCV 2008, TRECVid (NIST) with the i-LIDS videos. Benchmarking databases: http://homepages.inf.ed.ac.uk/cgi/rbf/CVONLINE/entries.pl?TAG363 http://www.hitech-projects.com/euprojects/cantata/datasets_cantata/dataset.html
  • Slide 5
  • 5 Video Understanding: Program Supervision
  • Slide 6
  • 6 Supervised Video Understanding: Proposed Approach. Goal: easy creation of reliable supervised video understanding systems. Approach: use of a supervised video understanding platform, a reusable software tool composed of three separate components (program library, control, knowledge base); formalize a priori knowledge of video processing programs; make the control of video processing programs explicit. Issues: video processing programs which can be supervised; a friendly formalism to represent knowledge of programs; a general control engine to implement different control strategies; a learning tool to adapt system parameters to the environment.
  • Slide 7
  • 7 Control Application Domain Expert Video Processing Expert Application domain knowledge base Scene environment knowledge base Video processing program knowledge base Learning Evaluation Particular System Evaluation Video Processing Program Library Proposed Approach
  • Slide 8
  • 8 Supervised Video Understanding Platform: Operator Formalism. Use of an operator formalism [Clément and Thonnat, 93] to represent knowledge of video processing programs, composed of frames and production rules. Frames: declarative knowledge. Operators: abstract models of video processing programs; a primitive operator models a particular program, a composite operator a particular combination of programs. Production rules: inferential knowledge; choice and optional criteria; initialization criteria; assessment criteria; adjustment and repair criteria.
  • Slide 9
  • 9 Program Supervision: Knowledge and Reasoning. Primitive operator: functionality, characteristics, input data, parameters, output data, preconditions, postconditions, effects, calling syntax; rule bases: parameter initialization rules, parameter adjustment rules, result evaluation rules, repair rules. Composite operator: functionality, characteristics, input data, parameters, output data, preconditions, postconditions, effects, decomposition into suboperators (sequential, parallel, alternative), data flow; rule bases: parameter initialization rules, parameter adjustment rules, choice rules, result evaluation rules, repair rules.
  • Slide 10
  • 10 Video Understanding: Learning Parameters (B. Georis). Objective: a learning tool to automatically tune algorithm parameters from experimental data; used for learning the segmentation parameters with respect to the illumination conditions. Method: identify a set of parameters of a task (18 segmentation thresholds) depending on an environment characteristic (the image intensity histogram); study the variability of the characteristic (histogram clustering -> 5 clusters); determine optimal parameters for each cluster (optimization of the 18 segmentation thresholds).
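The histogram-clustering step can be sketched with a toy k-means; the slides do not name the clustering algorithm, so k-means and all details below are our assumption:

```python
def cluster_histograms(histograms, k=5, iters=20):
    """Toy k-means over normalized image-intensity histograms
    (pure Python). In the approach above, each resulting cluster
    would then receive its own optimized set of 18 segmentation
    thresholds."""
    def normalize(h):
        s = float(sum(h))
        return [v / s for v in h]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    points = [normalize(h) for h in histograms]
    centers = [list(p) for p in points[:k]]      # first k points as seeds
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members)
                              for col in zip(*members)]
    return labels
```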
  • Slide 11
  • 11 Video Understanding: Learning Parameters Camera View
  • Slide 12
  • 12 Learning Parameters: Clustering the Image Histograms. [Figure: image histograms stacked along Y, with pixel intensity [0-255] on X and number of pixels [%] on Z; an X-Z slice represents one image histogram, and optimal intensities i_opt1 ... i_opt5 mark the 5 clusters.]
  • Slide 13
  • 13 Video Understanding: Knowledge Discovery (E. Corvee, J.L. Patino-Vilchis). CARETAKER: an FP6 IST European initiative to provide an efficient tool for the management of large multimedia collections, with applications to surveillance and safety issues, urban/environment planning, resource optimization, and disabled/elderly person monitoring. Currently being validated on large underground video recordings (Torino, Roma). [Diagram: multiple audio/video sensors -> audio/video acquisition and encoding (raw data) -> generic event recognition (primitives, events and metadata) -> knowledge discovery, from raw data through simple events to complex events.]
  • Slide 14
  • 14 Event detection examples
  • Slide 15
  • 15 Data Flow: object/event detection -> information modelling. Object detection: id, type, 2D info, 3D info. Event detection: id, type (inside_zone, stays_inside_zone), involved mobile object, involved contextual object. Results are stored in three tables: the mobile object table, the event table and the contextual object table.
  • Slide 16
  • 16 Table Contents. Mobile objects: people characterised by their trajectory, their shape and the significant events in which they are involved. Contextual objects: find interactions between mobile objects and contextual objects (interaction type, time). Events: model the normal activities in the metro station (event type, involved objects, time).
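The three tables can be mirrored by simple record types; a hypothetical sketch in which every field name is ours, chosen to match the slide's vocabulary:

```python
from dataclasses import dataclass, field

@dataclass
class MobileObject:
    obj_id: int
    obj_type: str                                   # e.g. "Person"
    trajectory: list = field(default_factory=list)  # (t, x, y) samples
    events: list = field(default_factory=list)      # significant event ids

@dataclass
class ContextualObject:
    obj_id: int
    obj_type: str                                   # e.g. "Vending_Machine"

@dataclass
class Event:
    event_id: int
    event_type: str        # e.g. "inside_zone", "stays_inside_zone"
    mobile_id: int         # involved mobile object
    contextual_id: int     # involved contextual object
    start: float           # event interval, in seconds
    end: float
```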
  • Slide 17
  • 17 Knowledge Discovery: trajectory clustering. Objective: clustering of trajectories into k groups to match people's activities. Feature set: entry and exit points of an object; direction, speed, duration. Clustering techniques: agglomerative hierarchical clustering, K-means, Self-Organizing (Kohonen) Maps. Evaluation of each cluster set is based on ground truth.
  • Slide 18
  • 18 Trajectory Clustering Methods: feature vector. [Figure: a trajectory with entry point (x_entry, y_entry) and key points m_1, m_2, ..., m_k, ..., m_K in (x, y) coordinates.] Parameter tuning: which features?
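Turning a trajectory into such a feature vector can be sketched as follows; the straight-line direction and mean speed are our simplification of the slide's feature set:

```python
import math

def trajectory_features(points):
    """Flatten a trajectory, given as (t, x, y) samples, into a
    feature vector: entry point, exit point, overall direction,
    mean speed and duration."""
    (t0, x0, y0), (t1, x1, y1) = points[0], points[-1]
    duration = t1 - t0
    dist = math.hypot(x1 - x0, y1 - y0)
    direction = math.atan2(y1 - y0, x1 - x0)   # radians
    speed = dist / duration if duration else 0.0
    return [x0, y0, x1, y1, direction, speed, duration]
```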
  • Slide 19
  • 19 Trajectory Clustering: agglomerative clustering. Parameter tuning: which distance function?
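Agglomerative clustering with a pluggable distance, the open tuning question on this slide, can be sketched naively; single linkage is our choice for the sketch:

```python
def agglomerative(items, distance, k):
    """Naive single-linkage agglomerative clustering: repeatedly
    merge the two closest clusters until only k remain. `distance`
    is the pluggable trajectory distance the slide asks about."""
    clusters = [[item] for item in items]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters
```

Swapping the `distance` argument (Euclidean on feature vectors, trajectory-shape distances, ...) changes the clustering without touching the algorithm.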
  • Slide 20
  • 20 Results on the Torino subway (45 min), 2052 trajectories
  • Slide 21
  • 21 Trajectory Analysis: SOM, K-means and agglomerative clustering produce groups with mixed overlap.
  • Slide 22
  • 22 Trajectory: semantic characterisation. SOM CL14 / K-means CL12 / agglomerative CL21: consistency of clusters between algorithms; semantic meaning: walking towards the vending machines.
  • Slide 23
  • 23 Trajectory Analysis: intraclass & interclass variance. The SOM algorithm has the lowest intraclass variance and the highest interclass separation. Parameter tuning: which clustering technique?
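The comparison criterion can be made concrete; a minimal sketch for clusters of 1-D feature values (the 1-D restriction is ours, for brevity):

```python
def intra_inter_variance(clusters):
    """Within-cluster (intraclass) and between-cluster (interclass)
    variance for clusters of 1-D feature values; a good clustering
    has low intraclass and high interclass variance, the criterion
    under which SOM wins on this data."""
    means = [sum(c) / len(c) for c in clusters]
    n = sum(len(c) for c in clusters)
    overall = sum(sum(c) for c in clusters) / n
    intra = sum((x - m) ** 2
                for c, m in zip(clusters, means) for x in c) / n
    inter = sum(len(c) * (m - overall) ** 2
                for c, m in zip(clusters, means)) / n
    return intra, inter
```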
  • Slide 24
  • 24 Trajectory Analysis. Video features are modeled in three different tables with topological and temporal relations, for quantitative and semantic description. Trajectory clustering gives information about frequent entry/exit zones, occupation density and behavior characterization. Meaningful trajectory clusters are validated by their consistency across different algorithms.
  • Slide 25
  • 25 Mobile Objects
  • Slide 26
  • 26 Mobile Object Analysis: building statistics on objects. There is an increase in the number of people after 6:45.
  • Slide 27
  • 27 Contextual Object Analysis: vending machines 1 and 2. With the increase in the number of people, there is an increase in the use of the vending machines.
  • Slide 28
  • 28 Contextual Object Analysis: gates. [Figure: camera view of gates 1-9.]
  • Slide 29
  • 29 Analysis: use of the gates. Gates 7 to 9 (right side) are the most used; gates 1 to 3 (left side) are the least used.
  • Slide 30
  • 30 Results : Trajectory Clustering
  • Slide 31
  • 31 Knowledge Discovery: achievements. Semantic knowledge extracted by off-line long-term analysis of on-line interactions between moving objects and contextual objects: 70% of people come from the north entrance; most people spend 10 sec in the hall; 64% of people go directly to the gates without stopping at the ticket machine; at rush hours people are 40% quicker to buy a ticket. Issues: at which level(s) should clustering techniques be designed: low level (image features), middle level (trajectories, shapes) or high level (primitive events)? To learn what: visual concepts, scenario models? Uncertainty (noise/outliers/rare events): what are the activities of interest? Parameter tuning (e.g. distance, clustering technique) and performance evaluation (criteria, ground truth).
  • Slide 32
  • 32 Video Understanding: Learning Scenario Models (A. Toshev), or Frequent Composite Event Discovery in Videos. [Figure: an event time series.]
  • Slide 33
  • 33 Learning Scenarios: Motivation. Why unsupervised model learning in video understanding? Models are complex, containing many events; there is a large variety of models; different models need different parameters. The learning of models should therefore be automated. Example: video surveillance of a parking lot.
  • Slide 34
  • 34 Learning Scenarios: Problem Definition. Input: a set of primitive events from the vision module, e.g. object-inside-zone(Vehicle, Entrance) [5,16]. Output: frequent event patterns, where a pattern is a set of events: object-inside-zone(Vehicle, Road) [0, 35]; object-inside-zone(Vehicle, Parking_Road) [36, 47]; object-inside-zone(Vehicle, Parking_Places) [62, 374]; object-inside-zone(Person, Road) [314, 344]. Goals: automatic data-driven modeling of composite events; reoccurring patterns of primitive events correspond to frequent activities; find classes with large size and similar patterns.
  • Slide 35
  • 35 Learning Scenarios: A PRIORI Method. Approach: an iterative method from data mining for efficient frequent-pattern discovery in large datasets. A PRIORI: sub-patterns of frequent patterns are also frequent (Agrawal & Srikant, 1995); at the i-th step, consider only i-patterns whose (i-1)-sub-patterns are frequent, so the search space is pruned. A PRIORI property for activities represented as classes: size(C_{m-1}) >= size(C_m), where C_m is a class containing patterns of length m and C_{m-1} is a sub-activity of C_m.
  • Slide 36
  • 36 Learning Scenarios: A PRIORI Method. Merge two i-patterns with (i-1) primitive events in common to form an (i+1)-pattern.
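One candidate-generation step of this method can be sketched as follows; representing patterns as unordered sets of event labels is a simplification of ours (the real method also keeps temporal information):

```python
from itertools import combinations

def apriori_candidates(frequent):
    """Merge two frequent i-patterns sharing i-1 primitive events
    into an (i+1)-candidate, then keep it only if every one of its
    i-sub-patterns is frequent (the A PRIORI property). Patterns
    are frozensets of event labels, all of the same size i."""
    freq = set(frequent)
    candidates = set()
    for a, b in combinations(freq, 2):
        merged = a | b
        if len(merged) == len(a) + 1:              # share i-1 events
            if all((merged - {e}) in freq for e in merged):
                candidates.add(merged)
    return candidates
```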
  • Slide 37
  • 37 Learning Scenarios: Similarity Measure. Two types of similarity measure between event patterns: similarity between event attributes, and similarity between pattern structures. A generic similarity measure should use generic properties when possible (for easy usage in different domains), but should also incorporate domain-dependent properties (for relevance to the concrete application).
  • Slide 38
  • 38 Learning Scenarios: Attribute Similarity. Attributes: the corresponding events in two patterns should have similar (same) attributes (duration, names, object types, ...). Comparison is made between corresponding events (same type, same color); for numeric attributes: G(x,y)= attr(p_i, p_j) is the average of all event attribute similarities.
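The formula for G(x, y) is truncated in this transcript, so the sketch below assumes a Gaussian kernel purely for illustration; the averaging of attribute similarities follows the slide:

```python
import math

def numeric_similarity(x, y, sigma=1.0):
    """Similarity of two numeric attribute values in [0, 1]. The
    slide's G(x, y) is truncated, so a Gaussian kernel is an
    assumption made here for illustration."""
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def pattern_similarity(events_a, events_b, sigma=1.0):
    """attr(p_i, p_j): average attribute similarity over
    corresponding events of two patterns; each event is a dict of
    numeric attributes with matching keys."""
    sims = [numeric_similarity(ea[key], eb[key], sigma)
            for ea, eb in zip(events_a, events_b) for key in ea]
    return sum(sims) / len(sims)
```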
  • Slide 39
  • 39 Learning Scenarios: Evaluation. Test data: video surveillance of a parking lot; 4 hours of recordings from 2 days, split into 2 test sets; each test set contains approx. 100 primitive events. Results: in both test sets the following event pattern was recognized: object-inside-zone(Vehicle, Road); object-inside-zone(Vehicle, Parking_Road); object-inside-zone(Vehicle, Parking_Places); object-inside-zone(Person, Parking_Road).
  • Slide 42
  • 42 Learning Scenarios: Evaluation. The recognized event pattern corresponds to a parking maneuver: Maneuver Parking!
  • Slide 43
  • 43 Learning Scenarios: Conclusion & Future Work. Conclusion: application of a data mining approach; handling of uncertainty without losing computational effectiveness; general framework: only a similarity measure and a primitive event library must be specified. Future work: other similarity measures; handling of different aspects of uncertainty; qualification of the learned patterns (does frequent equal interesting?); different applications, with different event libraries or features.
  • Slide 44
  • 44 HealthCare Monitoring (N. Zouba). GERHOME (CSTB, INRIA, CHU Nice): the ageing population. http://gerhome.cstb.fr/ Approach: multi-sensor analysis based on sensors embedded in the home environment; detect in real time any alarming situation; identify a person's profile (his/her usual behaviors) from the global trends of life parameters, and then detect any deviation from this profile.
  • Slide 45
  • 45 Monitoring of Activities of Daily Living for the Elderly. Goal: increase independence and quality of life: enable the elderly to live longer in their preferred environment; reduce costs for public health systems; relieve family members and caregivers. Approach: detecting alarming situations (e.g. falls); detecting changes in behavior (missing activities, disorder, interruptions, repetitions, inactivity); calculating the degree of frailty of elderly people. Example of normal activity: meal preparation (in the kitchen, 11h-12h); eating (in the dining room, 12h-12h30); resting, TV watching (in the living room, 13h-16h).
  • Slide 46
  • 46 Gerhome laboratory. GERHOME (Gerontology at Home): homecare laboratory. http://www-sop.inria.fr/orion/personnel/Francois.Bremond/topicsText/gerhomeProject.html Experimental site at CSTB (Centre Scientifique et Technique du Bâtiment) in Sophia Antipolis: http://gerhome.cstb.fr Partners: INRIA, CSTB, CHU-Nice, Philips-NXP, CG06.
  • Slide 47
  • 47 Gerhome laboratory: position of the sensors. Video cameras installed in the kitchen and in the living-room to detect and track the person in the apartment. Contact sensors mounted on many devices to determine the person's interactions with them. Presence sensors installed in front of the sink and the cooking stove to detect the presence of people nearby.
  • Slide 48
  • 48 Sensors installed in the Gerhome laboratory. [Photos: in the kitchen; video camera in the living-room; pressure sensor underneath the legs of the armchair; contact sensor on the window; contact sensor on the cupboard door.]
  • Slide 49
  • 49 Event modelling. We have modelled a set of activities using an event recognition language developed in our team. Example for the Meal preparation event:
        CompositeEvent(Prepare_meal_1,    // detected by a video camera combined with contact sensors
          PhysicalObjects((p: Person), (Microwave: Equipment), (Fridge: Equipment), (Kitchen: Zone))
          Components(
            (p_inz: PrimitiveState Inside_zone(p, Kitchen))        // detected by video camera
            (open_fg: PrimitiveEvent Open_Fridge(Fridge))          // detected by contact sensor
            (close_fg: PrimitiveEvent Close_Fridge(Fridge))        // detected by contact sensor
            (open_mw: PrimitiveEvent Open_Microwave(Microwave))    // detected by contact sensor
            (close_mw: PrimitiveEvent Close_Microwave(Microwave))) // detected by contact sensor
          Constraints((open_fg during p_inz)
                      (open_mw before_meet open_fg)
                      (open_fg Duration >= 10)
                      (open_mw Duration >= 5))
          Action(AText(Person prepares meal) AType(NOT URGENT)))
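The temporal constraints above (during, before_meet) can be checked on simple (start, end) intervals; a minimal sketch with our own interval encoding, not the actual recognition engine:

```python
def during(inner, outer):
    """Interval inner = (start, end) occurs within interval outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def before_meet(a, b):
    """Interval a ends before (or exactly when) interval b starts."""
    return a[1] <= b[0]

def prepare_meal(p_inz, open_fg, open_mw):
    """Toy check of the Prepare_meal_1 constraints on (start, end)
    intervals in seconds: the fridge is opened while the person is
    in the kitchen, the microwave opening precedes the fridge
    opening, and both opening events last long enough."""
    return (during(open_fg, p_inz)
            and before_meet(open_mw, open_fg)
            and open_fg[1] - open_fg[0] >= 10
            and open_mw[1] - open_mw[0] >= 5)
```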
  • Slide 50
  • 50 Multi-sensor monitoring: results and evaluation. We have validated and visualized the recognized events with a 3D visualization tool. We have studied and tested a range of activities in the Gerhome laboratory, such as using the microwave, using the fridge, and preparing a meal.
        Activity            #Videos  #Events  TP  FN  FP  Precision  Sensitivity
        In the kitchen        10       45     40   5   0    1          0.888
        In the living-room    10       35     40   0   5    0.888      1
        Open microwave         8       15     15   0   0    1          1
        Open fridge            8       24     24   0   0    1          1
        Open cupboard          8       30     30   0   0    1          1
        Preparing meal 1       8        3      3   0   0    1          1
  • Slide 51
  • 51 Recognition of the "Prepare meal" event: visualization of a recognized event in the Gerhome laboratory. The person is recognized with the posture "standing with one arm up", located in the kitchen and using the microwave.
  • Slide 52
  • 52 Recognition of the "Resting in living-room" event: visualization of a recognized event in the Gerhome laboratory. The person is recognized with the posture "sitting in the armchair" and located in the living-room.
  • Slide 53
  • 53 End-users. There are several end-users in homecare. Doctors (gerontologists): frailty measurement (depression, ...); alarm detection (falls, gas, dementia, ...). Caregivers and nursing homes: cost reduction (no false alarms, reduced employee involvement); employee protection. Persons with special needs, including young children, disabled and elderly people: feeling safe at home; autonomy (at night, lighting up the way to the bathroom); improving life (smart mirror; summary of the user's day, week or month in terms of walking distance, TV, water consumption). Family members and relatives: elderly safety and protection; social connectivity.
  • Slide 54
  • 54 Social problems and solutions.
        Problem: privacy, confidentiality and ethics — video (and other data) recording, processing and transmission. Solution: no video recording or transmission, only textual alarms.
        Problem: acceptability for the elderly. Solution: user empowerment.
        Problem: usability. Solution: easy ergonomic interface (no keyboard, large screen), friendly usage of the system.
        Problem: cost effectiveness. Solution: the right service for the right price, a large variety of solutions.
        Problem: legal issues, no certification. Solution: robustness, benchmarking, on-site evaluation.
        Problem: installation, maintenance, training, interoperability with other home devices. Solution: adaptability, X-Box integration, wireless, standards (OSGi, ...).
        Problem: research financing? France (no money, lobbies), Europe (delay), US, Asia.
  • Slide 55
  • 55 Conclusion. A global framework for building video understanding systems. Hypotheses: mostly fixed cameras; a 3D model of the empty scene; predefined behavior models. Results: real-time video understanding systems for individuals, groups of people, vehicles, crowds or animals; knowledge structured within the different abstraction levels (i.e. processing worlds); a formal description of the empty scene; structures for algorithm parameters; structures for object detection rules, tracking rules, fusion rules, ...; an operational language for event recognition (more than 60 states and events) and a video event ontology; tools for knowledge management; metrics and tools for performance evaluation and learning; parsers and formats for data exchange.
  • Slide 56
  • 56 Conclusion: perspectives. Object and video event detection: finer human shape description (gesture models); video analysis robustness (reliability computation). Knowledge acquisition: design of learning techniques to complement a priori knowledge (visual concept learning, scenario model learning). System reusability: use of program supervision techniques (dynamic configuration of programs and parameters). Scaling issue: managing large networks of heterogeneous sensors (cameras, microphones, optical cells, radars, ...).