SEMANTIC VIDEO CLASSIFICATION BASED ON SUBTITLES AND DOMAIN TERMINOLOGIES
FROM: KAMC '07, 1ST INTERNATIONAL WORKSHOP ON KNOWLEDGE ACQUISITION FROM MULTIMEDIA CONTENT
AUTHORS: POLYXENI KATSIOULI, VASSILEIOS TSETSOS, STATHES HADJIEFTHYMIADES
PRESENTER: 蘇鼎文  ADVISOR: 林熙禎
MOTIVATION
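A new education revolution: Chinese students can take courses from famous American universities at home without spending a cent
MOOCs: a new education revolution
Free online education service: Coursera already has 7 million registered students, more than the combined number of university students in the UK and France.
One third of Coursera users come from developing economies.
What is a MOOC? Massive Open Online Course (large-scale, free, open online courses)
Rooted in the educational philosophy of open educational resources
Focuses on making e-learning easier for students to access and more sustainable to run
Resources can be accessed freely
No limit on the number of students
Advantages of MOOCs
Online learning needs only an Internet connection
Free sharing, free critique, and free browsing
Flexible courses
Free!!
Challenges of MOOCs
Learners easily get confused or lose their way
Requires a self-directed, self-managed attitude toward learning
Guided Learning
The paper "Video-sharing educational tool applied to the teaching in renewable energy subjects" showed experimentally that a video learning system can help students improve their learning ability and motivation
However, the videos are added manually by experts, which is time-consuming and cannot be automated
Could YouTube's massive video library help?
Methods for automatically classifying videos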
Text MetaData
Title, Description, Tags
Entity Extraction from consistent text
A/V Features
Audio and Video signal classification
ideal for games
Less ideal for general content
Video Context
Entities from context
Comments
Web embeds
User engagement
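Problems
For educational videos on YouTube, the text metadata is usually too sparse
Image and audio processing is harder and more computationally expensive
Is there another text source that could lead to a better solution?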
Subtitle
Abstract
An unsupervised approach to classify video content by analyzing the corresponding subtitles
Based on WordNet and WordNet Domains
Apply natural language processing techniques on video subtitles
INTRODUCTION
semantic information from multimedia content
multimedia databases gain more and more popularity
a critical and challenging topic
explore efficient ways to index their content based on its features and semantics
Subtitles carry information through natural language sentences
may not be able to detect all video semantics, but have several benefits:
a more lightweight process than video and audio processing
high-level semantics are more closely related to human language
RELATED WORK
Semantic Video Indexing and Summarization Using Subtitles
partitions the script in segments
represents each one as a term frequency inverse document frequency (TF-IDF) vector
video retrieval and summarization are described through the application of machine learning techniques
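The TF-IDF representation of script segments mentioned above can be sketched as follows. This is a minimal illustration, not the cited work's code. It assumes whitespace-tokenized segments and the plain weighting tf × log(N/df). The function name `tfidf_vectors` is hypothetical, and the cited work may use a different TF-IDF variant.

```python
import math
from collections import Counter


def tfidf_vectors(segments: list[list[str]]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per tokenized segment."""
    n = len(segments)
    # Document frequency: number of segments in which each term occurs.
    df = Counter(term for seg in segments for term in set(seg))
    vectors = []
    for seg in segments:
        tf = Counter(seg)
        total = len(seg) or 1  # avoid division by zero on empty segments
        vectors.append({
            # Term frequency normalized by segment length, times inverse
            # document frequency; terms found in every segment get weight 0.
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors
```

A term that appears in every segment (such as "the") gets weight 0. Rarer terms dominate each segment's vector.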
MUMIS project: use of natural language processing techniques for indexing and searching multimedia content
natural language processing based on an XML-encoded ontology is applied separately to textual sources of different types and in different languages
combines the annotations extracted from such sources into one integrated, formal description of their content
Semantic principal video shot classification via mixture Gaussian
a framework for semantic classification of educational surgery videos, two phases:
1. video content characterization via principal video shots
2. video classification through a mixture Gaussian model
Content-based Video Classification Using Support Vector Machines
based on low-level features such as color, shape and motion
use a Support Vector Machine (SVM) classifier
to classify them in one of the following class labels: “cartoons”, “commercials”, “cricket”, “football” and “tennis”
Text Classification
Decision trees are one of the most important and successful machine learning techniques
leaves represent classifications
branches correspond to the combinations of attributes that lead to those classifications
In this paper, we compare the proposed method for classification with a decision tree classifier
WORDNET AND WORDNET DOMAINS
WordNet
a large dictionary (or lexical database)
English nouns, verbs, adjectives and adverbs are grouped into sets of synonyms called “synsets”
A synset contains a group of synonymous words or collocations
vs. traditional dictionaries
Traditional dictionaries are arranged alphabetically
WordNet is arranged semantically
EX:
noun synset {base, alkali}
noun synset {basis, base, foundation, fundament, groundwork, cornerstone}
verb synset {establish, base, ground, found}.
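A toy in-memory model built from the three example synsets above makes the contrast with an alphabetical dictionary concrete. Looking up a word returns every synset (sense) it belongs to. The synset IDs and the helpers `senses_of` and `synonyms` are hypothetical. Real WordNet access would normally go through a library such as NLTK's WordNet corpus reader.

```python
from collections import defaultdict

# Hypothetical synset IDs; members copied from the example synsets above.
SYNSETS: dict[str, set[str]] = {
    "base.n.alkali": {"base", "alkali"},
    "base.n.basis": {"basis", "base", "foundation", "fundament",
                     "groundwork", "cornerstone"},
    "base.v.establish": {"establish", "base", "ground", "found"},
}

# Inverted index: word -> IDs of every synset that contains it.
WORD_INDEX: dict[str, list[str]] = defaultdict(list)
for synset_id, members in SYNSETS.items():
    for word in members:
        WORD_INDEX[word].append(synset_id)


def senses_of(word: str) -> list[str]:
    """Return the IDs of all synsets (senses) that contain `word`."""
    return sorted(WORD_INDEX.get(word, []))


def synonyms(synset_id: str) -> set[str]:
    """Return the words grouped together in one synset."""
    return SYNSETS[synset_id]
```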
semantic relations
Most synsets are connected to other synsets through a number of semantic relations
noun synsets are related through hypernymy (generalization), hyponymy (specialization), holonymy (whole of), and meronymy (part of) relations
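Semantic relations example: artefact is the root synset of the example hierarchy
motorcar and motor vehicle: motor vehicle is a hypernym of motorcar, and motorcar is a hyponym of motor vehicle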
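The hypernym/hyponym relation can be illustrated with a toy fragment of the noun hierarchy. The links below are simplified and hypothetical: real WordNet has more intermediate synsets between motorcar and artefact. Following hypernym links generalizes a synset, and inverting those links gives its hyponyms.

```python
# Simplified, hypothetical hypernym links: child synset -> parent synset.
HYPERNYM: dict[str, str] = {
    "motorcar": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "vehicle": "artefact",
    "bicycle": "vehicle",
}


def hypernym_path(synset: str) -> list[str]:
    """Follow hypernym links upward until the root synset is reached."""
    path = [synset]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def hyponyms(synset: str) -> list[str]:
    """Invert the hypernym links: all direct specializations of `synset`."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == synset)
```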
WordNet domains
augmenting WordNet with domain labels
approximately 200 domain labels enhance WordNet synsets
If none of the domain labels is adequate for a specific synset, the label Factotum is assigned to it (almost 35% of synsets)
Example
Fig. 1. Some senses of the word "plant" with their corresponding domains
SCHEME
Step 1: Text Preprocessing
subtitles are segmented into sentences
a POS tagger is applied to the words of each phrase
stop words are removed as they carry no semantics and do not contribute to the understanding of the main text concepts
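Here is a minimal sketch of Step 1. It assumes a regex-based sentence splitter and a small hand-picked stop-word list, both hypothetical stand-ins. POS tagging is left out because it needs a trained tagger. In practice, a library tagger would annotate each token before the stop words are dropped.

```python
import re

# Hypothetical, deliberately small stop-word list for illustration.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "and", "is", "are",
              "to", "he", "she", "it", "they", "this", "that"}


def split_sentences(subtitles: str) -> list[str]:
    """Split subtitle text into sentences at ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", subtitles.strip())
    return [p for p in parts if p]


def content_words(sentence: str) -> list[str]:
    """Lower-case, tokenize into alphabetic words, and drop stop words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(subtitles: str) -> list[list[str]]:
    """Return the content words of each subtitle sentence."""
    return [content_words(s) for s in split_sentences(subtitles)]
```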
Step 2: Keyword Extraction
identify and select only the most important and relevant subtitle words for further classifying the video
implemented the TextRank algorithm
The number of keywords extracted is based on the size of the text
TextRank
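a completely unsupervised, graph-based ranking model
used for keyword extraction or text summarization
Based on voting: each word casts votes for its neighbors, and the weight of a vote depends on how many votes the voting word has itself received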
derived from Google’s PageRank algorithm
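The voting idea behind TextRank can be sketched as PageRank over a word co-occurrence graph. The sketch rests on several assumptions:

- The graph is undirected and links words that are adjacent in the text (a window of 2 tokens).
- The damping factor is 0.85 and the number of iterations is fixed.
- The function name and all parameter defaults are hypothetical.

The slides tie the number of keywords to the text length. That choice is left to the caller through `top_k`.

```python
from collections import defaultdict


def textrank_keywords(words: list[str], top_k: int, window: int = 2,
                      damping: float = 0.85, iterations: int = 50) -> list[str]:
    """Rank words with PageRank on a co-occurrence graph and return the top_k."""
    # Undirected edges between distinct words that co-occur within `window` tokens.
    neighbours: dict[str, set[str]] = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbours[w].add(words[j])
                neighbours[words[j]].add(w)
    nodes = list(neighbours)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Each word splits its current score evenly among its neighbours
        # (its "votes"), so votes from high-scoring words weigh more.
        score = {
            n: (1 - damping) + damping * sum(
                score[m] / len(neighbours[m]) for m in neighbours[n])
            for n in nodes
        }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(nodes, key=lambda n: (-score[n], n))[:top_k]
```

Usage: `textrank_keywords(["lion", "hunts", "zebra", "lion", "eats", "zebra", "lion", "sleeps"], 2)` ranks "lion" first, because it co-occurs with every other word.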
Step 3: Word Sense Disambiguation
Most words in natural language are characterized by polysemy
Ex:
BANK: a financial institution, or the sloping land beside a river
WSD algorithm
adaptation of Lesk’s algorithm for WSD
Lesk’s algorithm:
based on glosses found in traditional dictionaries
assigns the sense whose gloss shares the largest number of words with the glosses of the other words in the context
Extended Lesk's algorithm
uses WordNet to include the glosses of related words
through semantic relations, e.g., hyponyms and hypernyms
Related words are easier to find among hypernyms and hyponyms
Example: “he sat on the bank of the river”
Lesk's algorithm: the context words are sit and river
Extended version: words related through WordNet are also used, e.g., stream, watercourse (for river) and lounge, sprawl (for sit)
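The difference between the original and the extended Lesk overlap can be sketched on the "bank" example. The sketch makes the following assumptions:

- The glosses are paraphrased stand-ins, not verbatim WordNet text.
- Context words are already lemmatized ("sat" becomes "sit").
- The context signature is the context words plus their glosses.
- Overlap counts shared distinct words.

The paper's adaptation may score overlaps differently.

```python
# Hypothetical, paraphrased glosses (not verbatim WordNet text).
BANK_SENSES = {
    "bank.financial": "a financial institution that accepts deposits and lends money",
    "bank.river": "sloping land beside a body of water such as a river or stream",
}
CONTEXT_GLOSSES = {
    "sit": "be seated",
    "river": "a large natural flow of water",
}
# Extended variant only: lemmas plus a paraphrased gloss of a synset related
# to each context word (e.g. the hypernym {stream, watercourse} of "river").
RELATED_GLOSSES = {
    "sit": "lounge sprawl lie with limbs spread out",
    "river": "stream watercourse natural body of running water",
}
STOP = {"a", "an", "the", "of", "or", "that", "such", "as", "be", "and", "with"}


def bag(text: str) -> set[str]:
    """Distinct non-stop-words of a gloss."""
    return {w for w in text.lower().split() if w not in STOP}


def lesk(senses: dict[str, str], context: list[str],
         extended: bool = False) -> tuple[str, int]:
    """Pick the sense whose gloss overlaps most with the context signature."""
    signature = set(context)
    for word in context:
        signature |= bag(CONTEXT_GLOSSES.get(word, ""))
        if extended:
            signature |= bag(RELATED_GLOSSES.get(word, ""))
    overlaps = {s: len(bag(gloss) & signature) for s, gloss in senses.items()}
    best = max(overlaps, key=overlaps.get)
    return best, overlaps[best]
```

Both variants pick the river sense. The extended signature widens the margin (overlap 4 instead of 2) because related words such as "stream" also match.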
Step 4: WordNet Domains Extraction
derive the domains which these synsets correspond to
calculate the occurrence score of each domain label and sort them in decreasing order.
extract the WordNet domains with the highest occurrence score
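Step 4 reduces to counting. This minimal sketch assumes each keyword has already been mapped to a synset (Step 3). A hypothetical `SYNSET_DOMAINS` table stands in for WordNet Domains, and synsets with no adequate label fall back to Factotum, as described above. The slides do not say whether Factotum stays in the ranking, so that choice is exposed as an assumed flag.

```python
from collections import Counter

# Hypothetical synset -> WordNet Domains labels table (illustrative only).
SYNSET_DOMAINS: dict[str, list[str]] = {
    "lion.n.01": ["animals"],
    "zebra.n.01": ["animals"],
    "river.n.01": ["geography"],
    "hunt.v.01": ["animals", "sport"],
}


def domains_of(synset: str) -> list[str]:
    """Domain labels of a synset, or Factotum when none is adequate."""
    return SYNSET_DOMAINS.get(synset, ["factotum"])


def video_domains(synsets: list[str], keep_factotum: bool = False) -> list[str]:
    """Return the video's domains W_v sorted by decreasing occurrence score."""
    counts = Counter(d for s in synsets for d in domains_of(s))
    if not keep_factotum:
        counts.pop("factotum", None)
    # Ties broken alphabetically so the ranking is deterministic.
    return [d for d, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
```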
Illustration: keyword → synset → WordNet domain (Domain X, Domain X, Domain Y, Domain Z); the domains are counted and ranked into the video's domain list $W_v$ = (Dx, Dy, Dz)
Step 5: Definition of correspondences between category labels and WordNet domains
choose the most appropriate class label
First, we looked up in WordNet the senses related to each category label
obtained the WordNet domains that correspond to the senses of each category
calculated for each category the occurrence score of each of the derived domains
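Step 5 applies the same counting to the category labels instead of the video keywords. This is a sketch under assumptions. `CATEGORY_SENSE_DOMAINS` is a hypothetical table that lists, for each category label, the domains of each of its WordNet senses. Factotum is skipped, and the number of top domains kept (`top_n`) is an assumed parameter. The result for each category is its ranked domain list D'_c.

```python
from collections import Counter

# Hypothetical: for each category label, the domains of each of its senses.
CATEGORY_SENSE_DOMAINS: dict[str, list[list[str]]] = {
    "animals": [["animals"], ["animals", "biology"], ["factotum"]],
    "history": [["history"], ["history", "politics"], ["literature"]],
    "geography": [["geography"], ["geography", "geology"]],
}


def category_domains(top_n: int = 3) -> dict[str, list[str]]:
    """Map each category label c to its top-ranked domain list D'_c."""
    result = {}
    for category, senses in CATEGORY_SENSE_DOMAINS.items():
        # Occurrence score of each domain over all senses of the label.
        counts = Counter(d for sense in senses for d in sense if d != "factotum")
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result[category] = [d for d, _ in ranked[:top_n]]
    return result
```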
Illustration: for each category label c, its WordNet senses are mapped to domains ($D_c$), and counting their occurrences (e.g., Dx, Dx, Dy, Dz) gives the ranked domain list $D'_c$
Step 6: Category label assignment
Inputs: the top-ranked WordNet domains of each category label (Step 5) and the video's ranked set of WordNet domains $W_v$ (Step 4)
The proposed method assigns a category label to the video
Equation (1)
Let $C$ be the set of all category labels and $D$ the set of the ranked WordNet domain lists $D'_c$ that correspond to the category labels:
$$D = \bigcup_{c \in C} \{D'_c\} \tag{1}$$
Illustration: $D$ holds one ranked domain list per category, e.g., $D'_{c_1}$ = (Dx, Dy, Dz), $D'_{c_2}$ = (Da, Dc, Db), $D'_{c_3}$ = (Dx, Dy, Db), ..., $D'_{c_N}$ = (Dc, Dy)
Equation (2): check which category $c \in C$ satisfies
$$D'_c[0] = W_v[0] \tag{2}$$
and classify video $v$ under that category $c$
If there is more than one candidate, compare the second elements ($D'_c[1]$ and $W_v[1]$), and so on
Worked example: the video's ranked domains are $W_v$ = (Dx, Dy, Dz). Categories c1 and c3 satisfy Equation (2), since $D'_{c_1}$ = (Dx, Dy, Dz) and $D'_{c_3}$ = (Dx, Dy, Db) both start with Dx, so the candidate set is $C_v = \{c_1, c_3\}$. Their second elements are both Dy, so the tie remains; at the third element only c1 matches ($D'_{c_1}[2]$ = Dz = $W_v[2]$), so the video is classified under c1.
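The assignment rule of Equations (1) and (2) can be sketched as a position-by-position filter. It keeps the categories whose i-th domain equals the video's i-th domain, and moves to the next position only while more than one candidate remains. This reading of "compare the second elements and so on" is an interpretation. The slides do not specify what happens when no category matches at some position, so this sketch assumes the previous candidates are kept. The function name `assign_category` is hypothetical, and the domain names reuse the placeholders Dx, Dy, ... from the slides.

```python
def assign_category(w_v: list[str], d: dict[str, list[str]]) -> list[str]:
    """Return the candidate category label(s) C_v for a video.

    w_v: the video's ranked domain list W_v (Step 4).
    d:   maps each category c to its ranked domain list D'_c (Step 5).
    """
    candidates = list(d)
    for i, domain in enumerate(w_v):
        # Categories whose i-th ranked domain equals the video's i-th domain.
        matching = [c for c in candidates if i < len(d[c]) and d[c][i] == domain]
        if not matching:
            # Assumption: with no further match, keep the current candidates.
            break
        candidates = matching
        if len(candidates) == 1:
            break
    return candidates
```

Usage: with `d = {"c1": ["Dx", "Dy", "Dz"], "c2": ["Da", "Dc", "Db"], "c3": ["Dx", "Dy", "Db"], "cN": ["Dc", "Dy"]}`, the call `assign_category(["Dx", "Dy", "Dz"], d)` narrows the candidates to c1 and c3, then to c1 alone.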
EXPERIMENT
Experiment on documentaries
36 documentaries, with general documentary categories:
Geography, History, Animals, Politics…
Documentaries are easier to classify:
they are usually restricted to a specific domain
they contain narration
Statistical information
approximately 44% of all the WordNet domains extracted from each video are assigned the label ‘Factotum’
Evaluation
Classification accuracy: the proportion of the classifier's category assignments that agree with the user's assignments
Recall and F-measure were used to evaluate the classification results for each individual category
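The evaluation measures can be computed from paired lists of user-assigned (gold) and classifier-assigned labels. In this minimal sketch, F-measure is assumed to be the balanced F1, the harmonic mean of per-category precision and recall; the slides do not state which variant was used. The function names are hypothetical.

```python
def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Share of videos whose predicted category agrees with the user's label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def per_category_scores(gold: list[str],
                        predicted: list[str]) -> dict[str, tuple[float, float]]:
    """Return {category: (recall, F1)} for every category in the gold labels."""
    scores = {}
    for c in sorted(set(gold)):
        pairs = list(zip(gold, predicted))
        tp = sum(g == c and p == c for g, p in pairs)
        fn = sum(g == c and p != c for g, p in pairs)
        fp = sum(g != c and p == c for g, p in pairs)
        recall = tp / (tp + fn)  # c occurs in gold, so tp + fn > 0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (recall, f1)
    return scores
```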
Domains and categories
Comparison: the results were compared with those of the J4.8 decision tree classifier of the WEKA tool
The results are very promising: the method achieved an accuracy of 69.4%
Some gap relative to J4.8, a supervised classifier, is expected, since the proposed method is unsupervised
POLYSEMA Platform
This work has been carried out in the context of the POLYSEMA project,
which develops an end-to-end platform for interactive TV services by exploiting the metadata of the broadcast transmission
The present work is part of the activity “Development of semantics extraction techniques for automatic annotation of audiovisual content”
Three kinds of techniques are currently investigated:
video summarization
domain ontology learning
video classification
CONCLUSION
Look back
an innovative method for unsupervised classification of video content
applying natural language processing techniques on their subtitles
promising experimental results using documentaries, especially given the fact that no training phase is required.
Improvement
Segment videos and subtitles
Compare with other text classification algorithms (mainly unsupervised approaches)
Define more knowledge domains closer to movie classification
Improve the keyword extraction algorithm
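Comment
Subtitle-based text mining mostly relies on entity extraction; more recent work also uses MWH (multi-wing harmoniums) and analysis of entities' temporal features
As an unsupervised approach, its mapping between categories and domain labels is built statically, so adjusting it dynamically would likely be difficult
It currently assumes a single topic per video, but a video may cover more than one topic. Automating video segmentation may not be easy; is there a way to detect topic shifts?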
Comment
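Online educational resources keep appearing but are usually hard for ordinary people to reach; an integrated system is missing.
If we can understand the semantics of videos, we may be able to build useful applications, for example helping students find supplementary learning materials.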