

IIIT Hyderabad

Trajectory Based Video Object Manipulation
Rajvi Shah & P J Narayanan. Center for Visual Information Technology, IIIT Hyderabad, India. http://researchweb.iiit.ac.in/~rajvi.shah/vnm

PROBLEM STATEMENT

Despite the tremendous increase in home-made videos, high-level video manipulation is still uncommon among home users due to the lack of easy-to-use yet powerful platforms. Can we improve the usability of video authoring interfaces?

MOTIVATION

Basic video editing platforms for home users provide limited functionality, such as synchronizing media objects, adding captions, and splitting and merging videos. Professional video editing platforms provide rich functionality but demand a certain level of training and expertise.

The motivation of our work is to improve the usability of video authoring interfaces for novice users, using computer vision and image processing techniques.

We propose an object-trajectory based interface that allows users to navigate and manipulate video objects in an intuitive ‘click and drag’ fashion.

BASIC CONCEPT

Most video editing platforms model and represent videos as a collection of frames against a timeline, which makes object-centric manipulation and browsing an unnatural and laborious experience.

The basic concept in our approach is to use an object-time model instead of a frame-time model for video representation.

Model the video as a collection of spatiotemporal object volumes and a static background. Represent object trajectories in a 3D space-time grid to perform object-centric operations.
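The object-time model can be sketched as a simple data structure (a hypothetical illustration, not the authors' implementation): each object tube stores one bounding box per frame, and its trajectory is the sequence of box centroids over time in the space-time grid.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectTube:
    """A spatiotemporal object volume: one bounding box per frame."""
    start_frame: int
    boxes: list = field(default_factory=list)  # (x, y, w, h) per frame

    def trajectory(self):
        """Trajectory as (frame, cx, cy) points in the space-time grid."""
        return [
            (self.start_frame + i, x + w / 2.0, y + h / 2.0)
            for i, (x, y, w, h) in enumerate(self.boxes)
        ]

# A video is then a static background plus a set of object tubes.
tube = ObjectTube(start_frame=10, boxes=[(0, 0, 20, 10), (5, 5, 20, 10)])
print(tube.trajectory())  # [(10, 10.0, 5.0), (11, 15.0, 10.0)]
```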

This representation enables a user to perform a number of object-centric manipulation tasks by interactively manipulating the object trajectories in a simple “click and drag” fashion.

Users can interactively modify the trajectories in the interaction grid to modify the spatiotemporal object volume in the output video, and simultaneously visualize the resulting spatial occupancy and object overlap in the visualization grid.
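One way the visualization grid could flag object overlap is a per-frame intersection test between the bounding boxes of the edited tubes. A minimal sketch, assuming tubes stored as frame-to-box dictionaries of (x, y, w, h) boxes (not the poster's actual code):

```python
def boxes_overlap(a, b):
    """Axis-aligned overlap test for two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def overlapping_frames(tube_a, tube_b):
    """Frames where two object tubes spatially collide after an edit."""
    frames = []
    for f, box_a in tube_a.items():      # tubes as {frame: box} dicts
        box_b = tube_b.get(f)
        if box_b is not None and boxes_overlap(box_a, box_b):
            frames.append(f)
    return frames

car = {0: (0, 0, 10, 10), 1: (8, 0, 10, 10)}
person = {1: (15, 0, 10, 10), 2: (15, 0, 10, 10)}
print(overlapping_frames(car, person))  # [1]
```

Frames returned by such a check could be highlighted in the visualization grid so the user sees collisions as they drag.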

This representation replaces complex input elements, such as parameter specification dialogs and tools, with interactive curve-manipulation operations like translate, erase, cut, copy, and paste. Most home and office users are already familiar with such operations.

The visual nature of such operations makes the interface intuitive and easy to grasp. In a number of applications, the proposed interface reduces a seemingly complex task to simple drag-and-drop operations.

Applications: Video Browsing, Retiming Independent Objects, Synopsis, Cloning and Object Removal, Object Annotation

RELATED WORK

Direct Manipulation Video Browsing
- Kimber et al. (ICME ’07): Map-based interface for surveillance videos
- Dragicevic et al. (CHI ’08): SIFT feature-flow based playback interface
- Karrer et al. (CHI ’08)

Video Navigation, Annotation and Composition
- Goldman et al. (UIST ’08): Particle-video based editing interface

Making a Long Video Short: Dynamic Video Synopsis
- Rav-Acha et al. (CVPR ’06)

PRE-PROCESSING FLOW

EXAMPLE MANIPULATION TASKS

EXAMPLE APPLICATION Dance Video Montage

FUTURE WORK

We are extending our approach to support simple camera motion using a mosaic-based representation.

Another useful extension of this work is to estimate the complexity of object motion and represent it visually, to help a user focus on the likely more important video segments.

CONCLUSION

We believe that augmenting user interfaces with video context and motion cues can significantly improve the usability of video manipulation tools. The proposed interface is one step toward that overall goal. We believe that the fidelity and popularity of such interfaces will increase with progress in computer vision and video processing techniques.

INTERFACE

Object Tube Model Interaction Grid Visualization Grid

(a) Initial ROI Selection by User (b) Background Subtraction and Cleaning

(c) Classifier based Object Detection


Tracking, segmentation, or a combination of both is used to extract spatiotemporal object volumes.

An articulated object matte is not required; only the outer bounding box is needed.

Input Video

We use manual input to select a good set of training frames for “background learning” and also to overcome tracking errors. The user interactively resets the ROI bounding box in erroneous frames.

Ideally, a combined approach to interactive tracking and segmentation should be used.
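The “background learning” step above could, in its simplest form, be a per-pixel median over the user-selected training frames, with foreground found by thresholding the difference. A sketch under that assumption (grayscale frames; the actual pipeline also uses classifier-based detection and manual ROI correction):

```python
import numpy as np

def learn_background(training_frames):
    """Per-pixel median over user-selected clean frames."""
    return np.median(np.stack(training_frames), axis=0)

def foreground_mask(frame, background, thresh=30):
    """Binary mask of pixels that differ from the learned background."""
    return np.abs(frame.astype(float) - background) > thresh

# Five empty training frames, then a frame containing a moving object.
bg_frames = [np.zeros((4, 4)) for _ in range(5)]
background = learn_background(bg_frames)
frame = np.zeros((4, 4))
frame[1:3, 1:3] = 200            # a bright 2x2 moving object
mask = foreground_mask(frame, background)
print(int(mask.sum()))           # 4 foreground pixels
```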

Object Detection

Object Volume Extraction

Manual Annotation

Select Points (Mark point/segment of interest)

Erase Segment (Object Removal in selection)

Stretch/Shrink Segment (Speed Up/Slow Down Object)

Move Trajectory (Synchronize/Reorder Events)

Cut Trajectory (Split the object tube)

Copy-Paste Trajectory (Clone the object)

Invert Trajectory (Reverse the object)
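Two of the operations above, stretching a trajectory segment to slow an object down and inverting it to play the object in reverse, can be sketched as pure functions on a list of (frame, x, y) samples (hypothetical helper names, not the poster's implementation):

```python
def invert(traj):
    """Reverse the object: play its samples backwards over the same frames."""
    frames = [f for f, _, _ in traj]
    positions = [(x, y) for _, x, y in traj][::-1]
    return [(f, x, y) for f, (x, y) in zip(frames, positions)]

def stretch(traj, factor):
    """Slow an object down by spreading its samples over factor-times
    as many frames (nearest-sample resampling)."""
    start = traj[0][0]
    n_out = int(round((len(traj) - 1) * factor)) + 1
    out = []
    for i in range(n_out):
        src = min(len(traj) - 1, int(round(i / factor)))
        _, x, y = traj[src]
        out.append((start + i, x, y))
    return out

traj = [(0, 0.0, 0.0), (1, 1.0, 0.0), (2, 2.0, 0.0)]
print(invert(traj))             # [(0, 2.0, 0.0), (1, 1.0, 0.0), (2, 0.0, 0.0)]
print(len(stretch(traj, 2.0)))  # 5 frames: the object moves at half speed
```

Shrinking is the same operation with a factor below 1, and erase/cut/copy reduce to list slicing on the samples.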

Original State. User drags object trajectories to reorder events.

User extends the blue car’s trajectory to prevent it from disappearing.

Frame at 00:18 in Modified Video. Observe the blue car in the parking space.

Frame at 00:18 in Original Video

User selects a segment of the red car’s trajectory and stretches it to slow down the red car’s motion in the corresponding time segment of the video. The blue car is sped up by shrinking its trajectory segment. Observe the person as a common point of reference.

Frames at 00:06 in Original Video (Left) and Retimed Video (Right).

Reordering Events

Retiming Individual Events

Object Removal and Cloning

Object Based Video Navigation

Operations: Erase, Copy, Move, Invert, Resize

Frames from the output video: the blue car in reverse motion and a duplicate of the red car.

Dancer’s Original Trajectory

User’s arrangement of trajectory segments

Magnified XZ view of the Interaction grid (at left)

Sample frames from final montage Video

Simple Video Navigation: In this mode, the video is not altered, but the user can browse it by scrubbing the object trajectories in the interaction grid.

Single Object Navigation: The user can scrub an object’s trajectory to play the video tube of only the object of interest. The other objects are replaced by the static background.

WYSIWYG: In this mode, the user’s navigation actions are recorded to later create a video with the same effects.
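In all three navigation modes, scrubbing reduces to mapping a picked point in the interaction grid back to a frame index, e.g. by finding the nearest trajectory sample. A minimal sketch, again assuming trajectories as (frame, x, y) samples:

```python
def scrub_to_frame(traj, px, py):
    """Return the frame of the trajectory sample nearest the picked point."""
    best_frame, best_d2 = traj[0][0], float("inf")
    for f, x, y in traj:
        d2 = (x - px) ** 2 + (y - py) ** 2
        if d2 < best_d2:
            best_frame, best_d2 = f, d2
    return best_frame

traj = [(0, 0.0, 0.0), (1, 5.0, 0.0), (2, 10.0, 0.0)]
print(scrub_to_frame(traj, 4.2, 0.5))  # 1
```

Simple navigation would display that frame of the full video; single-object navigation would composite only the chosen tube over the static background at that frame.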