Exploring Video Hyperlinking in Broadcast Media

Maria Eskevich, Quoc-Minh Bui, Hoang-An Le, Benoit Huet

Multimedia Department, EURECOM

Sophia-Antipolis, France


http://www.eurecom.fr/

SLAM Workshop @ ACM MM 2015, Brisbane, Australia

Motivation

CURRENTLY: a constantly growing quantity of multimedia content is produced by both professionals and individual users.

NEED: navigation systems that give access to this data at different levels of granularity, in order to support further discovery of a topic of interest to the user or to facilitate individual browsing within a collection.

10/30/15



Task and Challenges

We envisage that users want to create their own path through the multimedia collection, depending on their tasks and on their level of awareness/knowledge of the collection.

How do we create a hyperlinked collection that allows each individual user to follow their own path of interest within it?

Why is it challenging?

Lack of interpretability of the content: only an overall textual description is available at the video level, which means that:

Retrieval is based on text features, while the rich multimodality of videos is not exploited;

A specific fragment cannot be retrieved on its own, so the user has to watch the entire video.

Archives are not static: links need to be created dynamically.




Insights into Hyperlinking

Hyperlinking: creating “links” between media.

Video hyperlinking: video to video; video fragment to video fragment.
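To make “video fragment to video fragment” concrete, a link can be modelled as a pair of time-bounded fragments. The sketch below is a hypothetical Python data structure (the names `Fragment`, `Hyperlink` and `to_seconds` are illustrative, not taken from the system presented here). It reuses the `videoId#t=start,end` identifier style of the indexed shots shown on the Solr slide, under the assumption that start and end times are truncated to whole seconds.

```python
from dataclasses import dataclass


def to_seconds(timestamp: str) -> float:
    """Convert an 'HH:MM:SS.mmm' timestamp into seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


@dataclass(frozen=True)
class Fragment:
    video_id: str
    begin: float  # seconds from the start of the video
    end: float    # seconds from the start of the video

    def uri(self) -> str:
        # Temporal media-fragment style id: '<videoId>#t=<start>,<end>'.
        # Truncating to whole seconds is an assumption matching the index ids.
        return f"{self.video_id}#t={int(self.begin)},{int(self.end)}"


@dataclass(frozen=True)
class Hyperlink:
    # Directed link from an anchor fragment to a target fragment
    source: Fragment
    target: Fragment
    score: float  # relevance assigned by the retrieval step


shot = Fragment(
    "20080401_013000_bbcfour_legends_marty_feldman_six_degrees_of",
    to_seconds("00:06:39.644"),
    to_seconds("00:06:42.285"),
)
print(shot.uri())  # 20080401_013000_bbcfour_legends_marty_feldman_six_degrees_of#t=399,402
```

Keeping fragments immutable (`frozen=True`) lets them be used as dictionary keys, e.g. to collect all outgoing links of an anchor.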




Related experiments: Search and Hyperlinking @ MediaEval/TRECVid

Different techniques based on:

Video segmentation method:

Fixed-length units (same length, sentences, speech segments), further adjusted to match full sentences using speech segment boundaries and pauses; in one approach, a probabilistic framework models the importance of words and refines segment boundaries accordingly;

Meaningful segments based on topics derived from transcripts: classification trees define the start and end times of these segments; lexical cohesion within segments.
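The first strategy, fixed-length units adjusted so that they end on full sentences, can be sketched as follows. This is a simplified illustration under stated assumptions: sentences arrive as `(start, end)` times in seconds from the transcript, the 60-second `target` length is a hypothetical parameter, and the pause-based and probabilistic boundary refinements mentioned above are not modelled.

```python
def segment_by_sentences(sentences, target=60.0):
    """Group (start, end) sentence spans into segments of roughly `target`
    seconds, cutting only at sentence boundaries."""
    segments = []
    seg_start = None
    for start, end in sentences:
        if seg_start is None:
            seg_start = start  # open a new segment at this sentence
        # Close the segment at the first sentence end past the target length
        if end - seg_start >= target:
            segments.append((seg_start, end))
            seg_start = None
    if seg_start is not None:  # flush the trailing, shorter segment
        segments.append((seg_start, sentences[-1][1]))
    return segments


sentences = [(0, 20), (20, 45), (45, 70), (70, 90), (90, 100)]
print(segment_by_sentences(sentences))  # [(0, 70), (70, 100)]
```

Segments therefore overshoot the target slightly rather than cutting a sentence in half, which is the trade-off the fixed-length-plus-adjustment approach accepts.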

Features used for retrieval:

Text similarity (vector-based models and TF-IDF weighting);

Named entities or synonyms;

Visual information: visual concepts or SURF and SIFT features, detected at the shot level.
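Text similarity with TF-IDF weighting, the most common feature listed above, can be illustrated with a small vector-space sketch. It uses plain whitespace tokenisation, raw term counts and `log(N / df)` as the inverse document frequency; these are assumptions for illustration, and Lucene/Solr's own scoring differs in its details.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


segments = [
    "school children write a poem",
    "children on a poetry trip",
    "football match report",
]
vectors = tfidf_vectors(segments)
anchor = vectors[0]
# Rank the other segments by similarity to the anchor segment
ranking = sorted(range(1, len(segments)), key=lambda i: cosine(anchor, vectors[i]), reverse=True)
print(ranking)  # [1, 2]
```

Terms that occur in every segment get an IDF of zero and so contribute nothing, which is what makes shared but rare words such as “children” drive the ranking.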




Search and Hyperlinking @ MediaEval/TRECVid

Our solution:

A scene segmentation technique based on the visual and temporal coherence of the video segments that constitute the scenes of the video;

Experiments with hybrid segmentation: visual, topic and temporal coherence;

Use of visual features to improve the ranking of hyperlinking results, by including visual analysis in the search process.
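One generic way to use visual features to improve a ranking is late fusion of a text score with visual concept scores. The sketch below is not necessarily the method used in this work (the Solr slide expresses concept constraints inside the query itself); the linear weighting, the `weight` value and the assumption that text scores are already normalised to [0, 1] are all hypothetical.

```python
def rerank(results, concepts, weight=0.3):
    """Re-rank text results by fusing in visual concept scores.

    Each result is a dict with an 'id', a 'text_score' assumed normalised to
    [0, 1], and one detector score per visual concept (e.g. 'Animal').
    """
    def fused(result):
        # Average detector confidence over the concepts judged relevant to the query
        visual = (
            sum(result.get(c, 0.0) for c in concepts) / len(concepts) if concepts else 0.0
        )
        return (1 - weight) * result["text_score"] + weight * visual

    return sorted(results, key=fused, reverse=True)


results = [
    {"id": "a", "text_score": 0.8, "Animal": 0.0},
    {"id": "b", "text_score": 0.7, "Animal": 0.9},
]
print([r["id"] for r in rerank(results, ["Animal"])])  # ['b', 'a']
```

With `weight=0` or an empty concept list, the function simply returns the text-only ranking, so the visual contribution can be switched off for comparison.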



System overview

Broadcast media: BBC content

Manually transcribed subtitles

Metadata: title, cast, description, broadcast time

Shot segmentation, keyframes

Visual content analysis:

Scene segmentation

Concept detection on shot level

Lucene/Solr: indexing/retrieval on shot level

Media database: 76 214 fragments

Videos: 2323 items / 1697 hours

Web service (HTML5/AJAX/PHP)

User interface

Shot segmentation, keyframe extraction

Result list at media fragment level



151 Visual Concepts (TrecVid 2012)

• 3_Or_More_People • Actor • Adult • Adult_Female_Human • Adult_Male_Human • Airplane • Airplane_Flying • Airport_Or_Airfield • Anchorperson • Animal • Animation_Cartoon • Armed_Person • Athlete • Baby • Baseball • Basketball • Beach • Bicycles • Bicycling • Birds • Boat_Ship • Boy

• Building • Bus • Car • Car_Racing • Cats • Cattle • Chair • Charts • Child • Church • City • Cityscape • Classroom • Clouds • Construction_Vehicles • Court • Crowd • Dancing • Daytime_Outdoor • Demonstration_Or_Protest • Desert • Dogs

• Emergency_Vehicles • Explosion_Fire • Face • Factory • Female-Human-Face-Closeup • Female_Anchor • Female_Human_Face • Female_Person • Female_Reporter • Fields • Flags • Flowers • Football • Forest • Girl • Golf • Graphic • Greeting • Ground_Combat • Gun • Handshaking • Harbors

• Helicopter_Hovering • Helicopters • Highway • Hill • Hockey • Horse • Hospital • Human_Young_Adult • Indoor • Insect • Kitchen • Laboratory • Landscape • Machine_Guns • Male-Human-Face-Closeup • Male_Anchor • Male_Human_Face • Male_Person • Male_Reporter • Man_Wearing_A_Suit • Maps • Meeting • …




Scene segmentation

Scene: a group of shots, formed on the basis of content similarity and temporal consistency among shots;

Content similarity: visual similarity between HSV histograms extracted from the keyframes of different shots;

Grouping is performed using two extensions of the Scene Transition Graph (STG):

the first reduces the computational cost of STG-based shot grouping by considering shot linking transitivity;

the second builds on the first to construct a probabilistic framework that alleviates the need for manual STG parameter selection.

[Apostolidis et al. “Automatic fine-grained hyperlinking of videos within a closed collection using scene segmentation.” ACM MM 2014]
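The general idea of grouping shots by keyframe colour similarity and temporal proximity, with links treated transitively, can be sketched as below. This is a deliberately simplified overlapping-links grouping, not the STG extensions of Apostolidis et al.: it assumes each shot's keyframe HSV histogram has already been computed and L1-normalised (the toy two-bin histograms in the example stand in for real ones), and `sim_threshold` and `window` are hypothetical parameters.

```python
def histogram_intersection(h1, h2):
    """Similarity of two L1-normalised histograms, in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def group_shots(histograms, sim_threshold=0.7, window=5):
    """Group consecutive shots into scenes; returns lists of shot indices.

    Shots i < j at most `window` shots apart are linked when their keyframe
    histograms are similar enough. A link joins every shot between i and j,
    so links propagate transitively along the timeline.
    """
    n = len(histograms)
    # reach[i]: furthest later shot that shot i is linked to (itself if none)
    reach = list(range(n))
    for i in range(n):
        for j in range(i + 1, min(n, i + window + 1)):
            if histogram_intersection(histograms[i], histograms[j]) >= sim_threshold:
                reach[i] = j  # j increases, so this keeps the furthest link
    scenes, current, furthest = [], [], -1
    for k in range(n):
        current.append(k)
        furthest = max(furthest, reach[k])
        if furthest == k:  # no link crosses the boundary after shot k
            scenes.append(current)
            current = []
    return scenes


A, B, C = [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]
print(group_shots([A, B, A, C, C]))  # [[0, 1, 2], [3, 4]]
```

Shot 1 ends up in the first scene even though it resembles neither neighbour, because the A-to-A link between shots 0 and 2 spans it; this is the kind of transitivity that lets alternating shot patterns (e.g. a dialogue) form one scene.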




Handling Visual Concepts in Solr


Query:

http://localhost:8983/solr/collection_mediaEval/select?q=text:(Children out on poetry trip Exploration of poetry by school children Poem writing) Animal:[0.2 TO 1] Building:[0.2 TO 1]

The same query, URL-encoded:

http://localhost:8983/solr/collection_mediaEval/select?q=text:(Children+out+on+poetry+trip+Exploration+of+poetry+by+school+children+Poem+writing)+Animal:%5B0.2+TO+1%5D+Building:%5B0.2+TO+1

Schema = structure of a document using fields of different types, e.g. one indexed shot:

<doc>
  <field name="id">20080401_013000_bbcfour_legends_marty_feldman_six_degrees_of#t=399,402</field>
  <field name="begin">00:06:39.644</field>
  <field name="end">00:06:42.285</field>
  <field name="videoId">20080401_013000_bbcfour_legends_marty_feldman_six_degrees_of</field>
  <field name="subtitle">'It was very, very successful.'</field>
  <field name="Actor">0.143</field>
  <field name="Adult">0.239</field>
  <field name="Animal">0.0572</field>
</doc>
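Query URLs of this form can be generated rather than written by hand. A minimal Python sketch, assuming the `collection_mediaEval` core and `text` field from the example query and using only `urllib.parse.quote_plus` from the standard library; the helper names `build_query` and `build_url` and the default 0.2 threshold (taken from the example) are illustrative. Assuming Solr's default OR operator, the concept range clauses are optional clauses that raise the score of shots meeting the threshold rather than hard filters.

```python
from urllib.parse import quote_plus

# Endpoint copied from the example query; adjust host and core as needed
SOLR_SELECT = "http://localhost:8983/solr/collection_mediaEval/select"


def build_query(text, concepts, threshold=0.2):
    """Combine a free-text clause with one range clause per visual concept."""
    clauses = [f"text:({text})"]
    # Match shots whose detector score for the concept lies in [threshold, 1]
    clauses += [f"{name}:[{threshold} TO 1]" for name in concepts]
    return " ".join(clauses)


def build_url(text, concepts, threshold=0.2):
    query = build_query(text, concepts, threshold)
    # Spaces become '+', brackets become %5B/%5D; ':' and parentheses stay readable
    return f"{SOLR_SELECT}?q={quote_plus(query, safe=':()')}"


url = build_url(
    "Children out on poetry trip Exploration of poetry by school children Poem writing",
    ["Animal", "Building"],
)
print(url)
```

This prints the encoded form of the example query; passing a different concept list or threshold changes only the range clauses.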






Results (S&H 2014)




Results discussion

Text (audio) performs best on the 2014 MediaEval Hyperlinking task;

Using visual features isn’t straightforward;

The ‘MoreLikeThis’ approach is outperformed by the text-only search, possibly due to differences in query formulation: sentences vs. keywords;

Visual scenes outperform the other fragmentation levels (topic, sentences).

[Safadi et al. “When textual and visual information join forces for multimedia retrieval”, ICMR 2014]



DEMO: Hyper Video Browser

Search/Browse, Hyperlinking



Conclusion and Future work

We proposed and evaluated an approach that includes visual properties in the search for video segments for hyperlinking.

Experimental results show that:

Visual properties provide meaningful cues for segmenting the video;

Mapping text-based queries to visual concepts is not easy;

Automatic selection of relevant concepts is required (human intervention is impractical and does not necessarily lead to perfect results);

Operating at the keyword level rather than on full text offers improvements.

Current/Future work: incorporate query semantics when identifying key visual semantic concepts, based on named entity recognition approaches and keyword/visual concept co-occurrences.
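The keyword/visual concept co-occurrence idea listed as future work could, for instance, be prototyped as below. Everything here is a hypothetical sketch: it assumes a set of training shots, each described by its subtitle keywords and by the concepts whose detector score passes some threshold, and it ranks concepts by summed pointwise mutual information (PMI) with the query keywords, which is only one of several possible association measures.

```python
import math
from collections import Counter


def select_concepts(query_keywords, shots, top_k=2):
    """Rank visual concepts by their PMI with the query keywords.

    `shots` is a list of (keywords, concepts) pairs of sets: the subtitle
    keywords of a shot and the concepts detected in it above a threshold.
    """
    n = len(shots)
    keyword_count = Counter(k for words, _ in shots for k in words)
    concept_count = Counter(c for _, concepts in shots for c in concepts)
    pair_count = Counter((k, c) for words, concepts in shots for k in words for c in concepts)

    scores = Counter()
    for k in query_keywords:
        for c in concept_count:
            joint = pair_count[(k, c)]
            if joint:
                # PMI(k, c) = log( P(k, c) / (P(k) * P(c)) ), estimated from shot counts
                scores[c] += math.log(joint * n / (keyword_count[k] * concept_count[c]))
    return [c for c, _ in scores.most_common(top_k)]


shots = [
    ({"children", "school"}, {"Child", "Classroom"}),
    ({"children", "poetry"}, {"Child"}),
    ({"football", "match"}, {"Football", "Crowd"}),
    ({"match", "crowd"}, {"Crowd"}),
]
print(select_concepts(["children", "poetry"], shots))  # ['Child', 'Classroom']
```

The selected concept names could then feed the range clauses of a Solr query, replacing the manual concept choice that the results above show to be impractical.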




References

E. Apostolidis, V. Mezaris, M. Sahuguet, B. Huet, B. Cervenkova, D. Stein, S. Eickeler, J. L. Redondo Garcia, R. Troncy, and L. Pikora. “Automatic fine-grained hyperlinking of videos within a closed collection using scene segmentation”. In ACM MM 2014, 22nd ACM International Conference on Multimedia, Orlando, Florida, USA, November 2014.

H. Le, Q. Bui, B. Huet, B. Cervenkova, J. Bouchner, E. Apostolidis, F. Markatopoulou, A. Pournaras, V. Mezaris, D. Stein, S. Eickeler, and M. Stadtschnitzer. “LinkedTV at MediaEval 2014 Search and Hyperlinking Task”. In Proceedings of the MediaEval 2014 Workshop, 2014.

R. J. F. Ordelman, M. Eskevich, R. Aly, B. Huet, and G. J. F. Jones. “Defining and evaluating video hyperlinking for navigating multimedia archives”. In Proceedings of the 24th International Conference on World Wide Web Companion, WWW 2015, Florence, Italy, May 18-22, 2015, Companion Volume, pages 727-732, 2015.

M. Sahuguet, B. Huet, B. Cervenkova, E. Apostolidis, V. Mezaris, D. Stein, S. Eickeler, J. L. Redondo Garcia, and L. Pikora. “LinkedTV at MediaEval 2013 Search and Hyperlinking Task”. In MediaEval 2013 Workshop, Barcelona, Spain, October 2013.

B. Safadi, M. Sahuguet, and B. Huet. “When textual and visual information join forces for multimedia retrieval”. In Proceedings of the International Conference on Multimedia Retrieval (ICMR '14), ACM, New York, NY, USA, page 265 (8 pages), 2014.



Questions?

Thank you for your attention!

