26th
IEEEP Students Seminar 2011
Pakistan Navy Engineering College
National University of Sciences & Technology
Semantic Based Multimedia Analysis and Retrieval
Sana Aslam, Mahwash Makhdoom, Madeeha Khan, Amna Basharat
FAST National University of Computer & Emerging Sciences
Department of Computer Sciences
AK Brohi Road, H-11/4 Islamabad, Pakistan
Abstract: The volume of multimedia files is increasing day by
day. With the advancement of religious e-learning and
multimedia knowledge resources in particular, there is a
growing demand for effective search and retrieval methods in
this area. Many renowned scholars from all over the world
deliver lectures in which they discuss diverse topics, and
these lectures are usually hours long; navigating manually to
a particular segment within such a multimedia file is
therefore difficult and time consuming. Efficient search
methods are needed that let the user select the topic of
concern within a multimedia file with a single click.
Moreover, users should be able to query in natural language
instead of relying on keyword based search, so that retrieval
improves and results are based on content and context. In
this paper we propose a method that enables users to navigate
to a particular segment within a multimedia file. The
architecture of Semantic based Multimedia Analysis and
Retrieval (SMART) allows content indexing and time stamped
alignment of transcriptions with the multimedia file. It
incorporates natural language processing techniques for query
handling and modern semantic web technologies for search.
Search will be handled by modeling the knowledge base with
ontologies. The advantage of ontologies is that they allow
machine interoperability, which in turn helps to retrieve
relevant results. The architecture envisions combining natural
language processing techniques with modern semantic web
technologies and applying them in a domain that opens new
ways of knowledge sharing and information retrieval.
Keywords: Natural Language Processing,
Transcription Alignment, Islamic Scholarly Lectures, Multimedia
Segment Navigation, Ontology
I. INTRODUCTION
As user needs change, new ways of storing information have
been introduced and developed. Advances in e-learning have
turned multimedia resources into a very valuable source of
knowledge and information. Current search engines do not
enable users to search for a particular segment within a media
file; instead, the search is performed on the basis of text
associated with the media file. The major issue in the
current process is that the search results have high recall but
low precision [1], whereas contemporary users demand
efficient and precise information. Multimedia data is usually
detailed and lengthy, and different topics are often discussed
in a single file, so it becomes tedious for a user to search
manually for a particular topic, or for the answer to a
particular question, within the file.
With the advancement in this field, users increasingly expect
to retrieve the most relevant, accurate and precise data when
they query. Keyword based search provides hundreds or
thousands of hits, and it becomes frustrating and tiresome for
users to find the relevant content among them. To overcome
this problem, current search techniques need to be remodeled
so that users not only retrieve the most accurate results by
querying in natural language but are also provided with the
exact content that matches their search criteria. A number of
search engines are presently working to incorporate semantics
in their architecture, e.g. Hakia [2] is a semantic based image
retrieval search engine, and True Knowledge [3] enables users
to query in natural language and then returns a precise
answer. Still, significant effort is needed to incorporate
semantics for efficient retrieval of multimedia data
resources, especially in the domain of Islamic scholarly
lectures.
The architecture of SMART uses the ELAN tool for alignment
of transcriptions with the multimedia file and then uses
modern semantic based technologies to annotate that data so
that relevant data retrieval is achieved [4]. Studies have
shown that the use of semantics can empower the capabilities
of e-learning [7, 8] and can support knowledge
virtualization.
In SMART we have transcribed the media file and then time
aligned the media file with the corresponding text. Metadata
containing information about the file is attached to the media
file. A knowledge base is attached to the system. The tags are
generated with the help of the transcribed file and the
knowledge base. The search is performed with the help of the
tags, and the media file is returned to the user, navigated to
the segment that the user requested.
In order to test our approach, we have chosen Islamic
scholarly lectures as our target domain; the domain concepts
represented in the ontologies therefore concern the views of
different Islamic scholars. The main aim of SMART is to open
new ways of incorporating technology in the Islamic world and
to bridge the gap between the two, helping people understand
concepts within the religion with ease.
Structure of this paper: Section 2 describes the motivation
behind the project and the challenges associated with working
in this domain. Section 3 describes the design goals and the
detailed architecture of SMART. Section 4 discusses the
implementation details. Conclusions and future work are
discussed in Section 5.
II. MOTIVATION BEHIND WORKING IN THIS DOMAIN AND CHALLENGES
A. Motivation
The motivating factors behind carrying out this project are:
To enable users to retrieve relevant information from the
multimedia files. We would achieve this by pruning the
irrelevant results. The search results provided would be fewer
but more precise and relevant.
One of the main targets is to enable users to get the views of
different scholars on diverse topics in less time by reducing
the query time, and also to facilitate users by enabling them
to query in natural language.
Previously, keyword based search has been performed upon
text resources and multimedia lectures. To our knowledge,
semantic based search techniques have not been applied to
multimedia files till now. We hope that SMART will open new
ways of efficient and meaningful search in this area.
B. Challenges Associated
The challenges associated with this project are described as
follows.
One of the basic tasks is to convert the multimedia file into
text. One way to achieve this is through speech recognition,
but when tested, the results provided through speech
recognition were not up to the mark, as the domain contains
words that are not frequently spoken in the English language.
The videos also contained many Arabic words. The accuracy
achieved through speech recognition was between 35% and 45%,
which was too low to work with, as the search process depends
on the text associated with the media file. Another challenge
is that no previous work has been done in this area; moreover,
complexity arises from the many diverse views of different
scholars on the same topic. Creating a link between them and
providing the user with sound results is therefore a
challenging task in this domain.
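The paper does not state how the recognition accuracy was measured. One common convention, assumed here purely for illustration, is word accuracy: one minus the word-level edit distance between the recognizer output and a reference transcript, divided by the reference length. The sketch below shows that calculation in Python; the function names and the sample sentences are invented and are not taken from the experiments reported above.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Return 1 - (edit distance / reference length), floored at 0."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


# Invented example: two domain terms are misrecognized out of eight words.
reference = "the lecture explains zakat and sadaqah in detail"
hypothesis = "the lecture explains sugar and soda in detail"
accuracy = word_accuracy(reference, hypothesis)
```

Under this convention, domain terms that the recognizer replaces with common English words each count as one error, which is consistent with the difficulty described above.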
III. SMART DESIGN GOALS AND SMART ARCHITECTURE
A. Design Goals
Existing search engines retrieve multimedia content based
upon the tags associated with the file, such as its title,
name of speaker, etc. Enabling the user to navigate
efficiently to a particular segment of interest remains a
challenging and demanding task, and little work appears to
have been done in this direction. This research proposes an
architecture that lets the user navigate to a particular
segment within the media file, and to do so by querying in
natural language. The purpose of supporting natural language
queries is to target and return more precisely the content the
user wants and to prune the results that are of no use to the
user. The goals on which the architecture of SMART is based
are elaborated below.
1) Processing of Textual Content of Multimedia Resources:
SMART should be able to process and align the textual
content, i.e. the transcriptions associated with a multimedia
file, efficiently so that the timestamps associated with the
text of the multimedia file are acquired. Time stamped
information will help to create a link between the text of the
file and the multimedia content.
2) Automatic Tagging of New Multimedia Files Added
in Repository:
The knowledge base comprises the most commonly occurring
terms in this domain. Whenever a new file is added to the
repository, SMART should be able to automatically tag the
segments of the file that contain those domain terms and save
their timestamps.
3) Natural Language Processing:
This research is envisioned to provide the user with the
facility to query in natural language. It will allow the user
to pose queries as sentences in English. SMART should be able
to process the query and apply Natural Language Processing
techniques [9] to the query structure so that the meaning of
the query is extracted and precisely associated with the text
of the media.
4) Ontological Knowledge Model:
To enable efficient search, data needs to be modeled as
knowledge so that an ontological model of knowledge can be
designed for this particular domain of Islamic Scholarly
Lectures. This ontological model would capture information
such as speaker, topic, etc.
5) Intelligent Retrieval of Information:
SMART should be able to retrieve meaningful results
efficiently by pruning the irrelevant hits and providing the
user with only the most precise results.
B. High Level System Architecture of SMART
The high level system architecture of SMART is shown in
Figure 1. It comprises five major activities: Transcription
Alignment, Query Processing, Knowledge Extraction, Knowledge
Modeling, and Query-Result Accuracy Analysis. In the first
phase, the multimedia resources are aligned with the
transcriptions; the aligned resources form the repository of
SMART. The Query Processing unit applies Natural Language
Processing techniques to the user query and extracts its
meaning so that it can be mapped accurately onto the
transcribed data. The Knowledge Extraction unit comprises
metadata generation and tag generation. In metadata
generation, the aligned transcription is parsed so that it
carries descriptive information about itself, i.e. its
metadata. In the tag generation unit, tags are generated on
the parsed data to initiate the navigation process. Knowledge
Modeling builds the respective ontology models for religious
scholarly texts using ontology schemas. The ontology
repository stores the ontologies. The ontology repository and
the metadata repository form the knowledge base for the
system, which will be used for efficient search and retrieval.
Semantic web technologies are incorporated to speed up the
search process and reduce the response time of the overall
process. In the final phase, Query-Result Accuracy Analysis,
the compatibility of the query with the extracted result and
the accuracy of that result are determined using natural
language techniques and by analyzing the thematic coherence
between the query and the results. Finally, the results are
returned to the user and displayed through the user interface
of the SMART application. The results will comprise a list of
different speakers and the files associated with them, with
accurate timestamps at which the answer to the user query
occurs within the multimedia stream.
Figure 1: High level System Architecture for Multimedia Segment Navigation
The detailed architecture of SMART, on the basis of which
the design and implementation details are formulated, is
discussed in the following subsections:
1) Transcription Alignment Unit:
As discussed above, a challenge associated with SMART is to
convert the audio into text. The complexity lies in the fact
that the scholarly lectures, although in English, contain
Arabic terms and some Urdu terms, so automatic recognition
cannot achieve sufficient accuracy, and the risk of errors
cannot be neglected in such a sensitive domain as religious
lectures. For this reason manual transcriptions are generated
for each multimedia file. The transcriptions are then aligned
with the multimedia stream using ELAN, an open source tool for
aligning transcriptions with multimedia. The importance of
this unit lies in the fact that the more accurate the
alignment, the more efficient the search process. ELAN aligns
transcriptions along with timestamps, which are required for
segment navigation [5].
2) Knowledge Extraction Unit:
This unit consists of two components: Metadata Generation
and Tag Generation. A brief description of both components
follows:
3) Metadata Generation:
This component is responsible for generating metadata of the
multimedia files. The metadata holds information about the
file by storing the name of the file, its topic and its
location [6].
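As an illustration, a metadata record of this kind could be represented as follows. This is a minimal Python sketch, not the SMART implementation: the class name MediaMetadata, the field names, the JSON serialization and the sample values are all assumptions. The speaker field is included because Algorithm 1 later looks up files by speaker in the metadata file.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MediaMetadata:
    """Hypothetical metadata record for one multimedia file."""
    file_name: str   # name of the media file
    topic: str       # main topic of the lecture
    location: str    # path or URL where the file is stored
    speaker: str     # assumed field, used for speaker-based lookup


record = MediaMetadata(
    file_name="lecture_01.mp4",
    topic="fasting",
    location="lectures/lecture_01.mp4",
    speaker="Speaker A",
)

# One possible on-disk form: a JSON object per media file.
serialized = json.dumps(asdict(record), sort_keys=True)
```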
4) Tag Generation:
The tag generation unit is responsible for generating tags in
the multimedia file by identifying useful tags with the help
of the knowledge base.
5) Query Processing Unit:
The user query would be passed on to the Query Processing
Engine, which would extract useful keywords from the user
query. Here, Natural Language Processing techniques would
be applied to the user query to understand its syntax and
semantics.
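The paper does not specify which NLP techniques the Query Processing Engine applies. As a deliberately simple stand-in, the Python sketch below extracts keywords from an English query by lowercasing, tokenizing and removing stop words. The stop-word list and the function name extract_keywords are assumptions made for this example; a fuller pipeline would add steps such as stemming or parsing.

```python
import re

# Small illustrative stop-word list (an assumption, not from the paper).
STOP_WORDS = frozenset({
    "a", "an", "the", "is", "are", "was", "what", "which", "who", "how",
    "on", "in", "of", "to", "for", "about", "while", "and", "or", "do",
    "does",
})


def extract_keywords(query):
    """Return the non-stop-word tokens of a query, in order, without duplicates."""
    tokens = re.findall(r"[a-z']+", query.lower())
    keywords = []
    for token in tokens:
        if token not in STOP_WORDS and token not in keywords:
            keywords.append(token)
    return keywords


keywords = extract_keywords("What is the ruling on fasting while travelling?")
```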
6) Knowledge Base:
The knowledge base would contain the most frequently used
words in the Islamic domain. With the help of these words
the tags for the particular video would be generated.
7) Knowledge Modeling Unit:
In this unit, the data will be transformed into knowledge
models with the help of the generated metadata and the use of
ontologies. The ontological model of the data will be stored
in this unit and will comprise schemas that incorporate
semantics in the data for efficient search and retrieval. The
user query from the Query Processing Unit will be mapped onto
the data to find the exact location holding the answer to that
query. Because the data has already been shaped in the form of
ontologies, the overall process of knowledge extraction and
acquisition is sped up. The use of ontologies facilitates
machine interoperability and conceptualizes the domain in a
format that is understood by the machine [10].
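To make the idea of an ontological model concrete, the sketch below stores a few subject-predicate-object statements in plain Python and answers the question of which speakers discuss a given topic. It is an assumption-laden illustration: the predicate names (hasSpeaker, discussesTopic, subTopicOf), the identifiers and the helper functions are hypothetical, and a real system would more likely use an ontology language and a dedicated triple store.

```python
# Hypothetical statements about two lectures, as (subject, predicate, object).
TRIPLES = [
    ("lecture_01", "hasSpeaker", "speaker_a"),
    ("lecture_01", "discussesTopic", "fasting"),
    ("lecture_02", "hasSpeaker", "speaker_b"),
    ("lecture_02", "discussesTopic", "fasting"),
    ("lecture_02", "discussesTopic", "zakat"),
    ("fasting", "subTopicOf", "worship"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def speakers_on(triples, topic):
    """Return the sorted names of speakers whose lectures discuss the topic."""
    lectures = [s for s, _, _ in match(triples, predicate="discussesTopic", obj=topic)]
    speakers = set()
    for lecture in lectures:
        for _, _, speaker in match(triples, subject=lecture, predicate="hasSpeaker"):
            speakers.add(speaker)
    return sorted(speakers)


fasting_speakers = speakers_on(TRIPLES, "fasting")
```

Modeling speaker and topic as explicit relations is what allows a query about a topic to return the views of several scholars rather than a single file.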
8) Query-Result Compatibility Analysis Unit:
In this unit the query and its corresponding mapping obtained
in the data would be verified. This is necessary because the
domain under consideration is very sensitive and there is a
risk involved in returning results to the user without proper
validation and verification. From this unit, the verified
results would be returned to the user interface of the SMART
application.
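The verification method is not detailed in the paper. One simple check, assumed here only for illustration, is to measure what fraction of the query keywords actually occur in a candidate segment and to discard candidates below a threshold. The function names, the threshold value and the sample segments in this Python sketch are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def coverage(query_terms, segment_text):
    """Fraction of query terms that occur in the segment text."""
    terms = {t.lower() for t in query_terms}
    if not terms:
        return 0.0
    return len(terms & tokenize(segment_text)) / len(terms)


def verify(query_terms, candidates, threshold=0.5):
    """Keep only candidates whose text covers enough of the query terms."""
    return [c for c in candidates
            if coverage(query_terms, c["text"]) >= threshold]


# Invented candidate segments for the query terms below.
candidates = [
    {"file": "lecture_01.mp4", "start_ms": 0,
     "text": "Rulings on fasting while travelling"},
    {"file": "lecture_02.mp4", "start_ms": 61000,
     "text": "Fasting and its spiritual benefits"},
]
verified = verify(["ruling", "fasting", "travelling"], candidates, threshold=0.6)
```

Exact token overlap is a crude proxy for thematic coherence; in this example "ruling" does not match "rulings", which is the kind of gap that stemming or ontology-based matching would close.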
IV. IMPLEMENTATION
The implementation of the SMART architecture is divided into
two modules. The first is the navigation strategy and the
second is the application of NLP techniques with semantic
incorporation. So far we have implemented the first module,
i.e. segment navigation within the multimedia stream based on
keywords. The subsections of this module are elaborated as
follows:
A. Navigation Strategy
It comprises transcription generation, alignment with the
multimedia stream, and keyword based navigation of the
multimedia stream. The three subsections are discussed in
detail as follows:
1) Transcription Generation:
Before discussing transcription generation in detail, it is
important to understand the reason for using transcriptions
when many speech recognition engines are available these
days. The domain we are targeting contains Arabic and some
Urdu terms even if the whole lecture is in English. This
raises the challenge of recognizing a multilingual data
stream, which to date cannot be done reliably and efficiently.
Another issue is that the speech recognition engines available
today rely on machine learning techniques; this approach works
well if there is a single speaker, because the machine has to
be trained on that speaker. Moreover, due to the diversity in
dialects and pronunciation of various speakers, it is very
difficult to recognize the spoken words with accuracy [11].
This domain is so sensitive that users require different views
to understand the concepts within the religion. Restricting
recognition to a single trained speaker would constrict the
domain to that speaker, which would not benefit users who want
to know the views of different scholars on the same topic. To
deal with the above mentioned issues, we have transcribed the
multimedia files manually.
2) Alignment with Multimedia Stream:
Navigation within the stream is possible if we have time
stamped information about the topics discussed in the
multimedia file. This requires aligning the transcriptions
with the multimedia stream. We have used the ELAN tool for
this purpose. ELAN (the EUDICO Linguistic Annotator) is a
program that allows aligning transcriptions and adding
annotations to a video file. ELAN aligns the transcriptions
with the media file and returns the time stamped data, i.e.
the words spoken in the video along with the time at which
they were spoken [5].
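ELAN stores its annotations in an XML-based file format (EAF). The Python sketch below reads time-aligned segments from a small, hand-written fragment that follows the EAF layout as understood here: time slots with millisecond values, and alignable annotations that refer to a start slot and an end slot. The fragment is trimmed for illustration (real files carry additional header and type elements), and the function name read_segments and the sample text are assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed, hand-written fragment in the style of an ELAN .eaf file.
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="4200"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="9100"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcript">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>Introduction to fasting in Ramadan</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>Rulings on zakat and charity</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_segments(eaf_xml):
    """Return (start_ms, end_ms, text) tuples sorted by start time."""
    root = ET.fromstring(eaf_xml)
    # Map each time slot identifier to its offset in milliseconds.
    slots = {slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
             for slot in root.iter("TIME_SLOT")}
    segments = []
    for ann in root.iter("ALIGNABLE_ANNOTATION"):
        text = ann.findtext("ANNOTATION_VALUE", default="").strip()
        start = slots[ann.get("TIME_SLOT_REF1")]
        end = slots[ann.get("TIME_SLOT_REF2")]
        segments.append((start, end, text))
    return sorted(segments)


segments = read_segments(SAMPLE_EAF)
```

Each resulting tuple links a piece of transcript text to the interval of the media stream in which it is spoken, which is the information segment navigation relies on.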
3) Keyword Based Navigation:
In this unit of the architecture, keyword based search is
implemented. Here we discuss in detail the working of the
Knowledge Extraction unit of the SMART architecture. The
knowledge extraction unit consists of two subunits. One is
the tag generation unit and the other attaches the metadata
to the corresponding media file. The tag generation unit
takes as input the time aligned data file. The tags are
generated with the help of the knowledge base. The knowledge
base consists of the most commonly used words in the Islamic
domain so that relevant tags can be added to the multimedia
files. A tagged repository is maintained, which consists of
metadata. The metadata comprises the keyword information and
the corresponding media file in which the keyword occurs. The
metadata associated with the transcription consists of
detailed information regarding the keywords contained within
the knowledge base, their corresponding timestamps and the
path of the multimedia file containing those keywords. Figure
2 shows the detailed working of tag generation.
Figure 2: Detailed Architecture of the Tag Generation Unit
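The tag generation step described above can be sketched in Python as follows. The sketch assumes that the aligned transcript is available as (start_ms, end_ms, text) tuples and that the knowledge base is a set of lowercase domain terms; the function name generate_tags, the record layout and the sample data are hypothetical.

```python
import re


def generate_tags(segments, knowledge_base, media_path):
    """Emit one tag per knowledge-base term found in each aligned segment."""
    tags = []
    for start_ms, end_ms, text in segments:
        words = set(re.findall(r"[a-z']+", text.lower()))
        for term in sorted(knowledge_base):
            if term in words:
                tags.append({
                    "keyword": term,           # term from the knowledge base
                    "start_ms": start_ms,      # where the segment begins
                    "end_ms": end_ms,          # where the segment ends
                    "media_path": media_path,  # file containing the keyword
                })
    return tags


# Invented sample data for illustration.
aligned_segments = [
    (0, 4200, "Introduction to fasting in Ramadan"),
    (4200, 9100, "Rulings on zakat and charity"),
]
domain_terms = {"fasting", "zakat", "hajj"}
tags = generate_tags(aligned_segments, domain_terms, "lectures/lecture_01.mp4")
```

Each tag carries the three pieces of information named in the text: the keyword, its timestamp and the path of the multimedia file.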
With the acquisition of tagged information, it is now possible
to navigate to a particular segment within the multimedia file.
The detailed algorithm of the search process implemented is
discussed below.
ALGORITHM 1: NAVIGATION WITHIN MULTIMEDIA STREAM
1. Initialize UserQuery to U
2. Initialize SelectedSpeaker to S
Input the user query
3. if the user selects the speaker
4.     Store speaker name in Temp
5.     Retrieve the names of multimedia files of the
       corresponding speaker from the meta-data file
6.     Search the database for the USERQUERY WHERE
       multimedia file name belongs to retrieved list
7.     Retrieve the results, their corresponding timestamps
       and multimedia file names
8. else
9.     Search the database for the USERQUERY
10.    Retrieve the results, their corresponding timestamps
       and multimedia file names
11. Display the retrieved results to the users
12. User selects the multimedia file and plays it
The workflow of the components on the basis of the
algorithm is shown in Figure 3.
Figure 3: Detailed Architecture of the Search Engine
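The search steps of Algorithm 1 can be sketched in Python with an in-memory SQLite database. The paper does not describe the storage layer, so the table names (metadata, tags), the column names and the sample rows are assumptions; the sketch only mirrors the two branches of the algorithm, with and without a selected speaker.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with invented metadata and tag rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE metadata (file_name TEXT PRIMARY KEY, speaker TEXT);
        CREATE TABLE tags (keyword TEXT, start_ms INTEGER, file_name TEXT);
    """)
    conn.executemany("INSERT INTO metadata VALUES (?, ?)", [
        ("lecture_01.mp4", "Speaker A"),
        ("lecture_02.mp4", "Speaker B"),
    ])
    conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", [
        ("fasting", 0, "lecture_01.mp4"),
        ("zakat", 4200, "lecture_01.mp4"),
        ("fasting", 61000, "lecture_02.mp4"),
    ])
    return conn


def search(conn, user_query, speaker=None):
    """Return (keyword, start_ms, file_name) rows matching the query."""
    keyword = user_query.strip().lower()
    if speaker is not None:
        # Steps 4-5: look up the selected speaker's files in the metadata.
        files = [row[0] for row in conn.execute(
            "SELECT file_name FROM metadata WHERE speaker = ?", (speaker,))]
        if not files:
            return []
        # Steps 6-7: restrict the keyword search to those files.
        placeholders = ",".join("?" * len(files))
        sql = ("SELECT keyword, start_ms, file_name FROM tags "
               f"WHERE keyword = ? AND file_name IN ({placeholders}) "
               "ORDER BY file_name, start_ms")
        return conn.execute(sql, [keyword, *files]).fetchall()
    # Steps 9-10: search all files when no speaker is selected.
    sql = ("SELECT keyword, start_ms, file_name FROM tags "
           "WHERE keyword = ? ORDER BY file_name, start_ms")
    return conn.execute(sql, (keyword,)).fetchall()


conn = build_demo_db()
all_hits = search(conn, "fasting")
speaker_hits = search(conn, "Fasting", speaker="Speaker B")
```

Each returned row gives the player what it needs for steps 11 and 12: the file to open and the offset, in milliseconds, to seek to.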
B. Incorporation of Semantics
The second module of implementation of SMART includes
incorporation of semantics in the architecture for efficient
search and using NLP techniques for query processing.
V. CONCLUSIONS & FUTURE WORK
In this paper we have proposed a potentially powerful and
novel approach for the retrieval of multimedia information.
The crux of our innovation is the development of a procedure
through which we can retrieve a particular segment from a
multimedia file. We have used the domain of Islamic scholarly
lectures for project demonstration, but we expect that the
approach can be generalized and applied to other domains as
well. Moreover, speech recognition does not prove to be a
viable approach for working in this domain due to the variety
of words spoken in different languages within a single
lecture. That is why working with transcriptions is necessary
for an effective search. Combined with modern semantic
technologies, we are hopeful that SMART, in comparison with
other semantic based search engines, would prove to be an
efficient and effective search engine for multimedia files.
Although we are confident that the conceptual framework for
this project is sound and that its implementation is feasible
from a technical standpoint, some important aspects still
need to be covered in future. These include adding semantics
to achieve efficiency and effectiveness while retrieving the
results.
Moreover, until now we have been working with videos in
English. Later on we would like to incorporate videos in
Urdu language as well. The need for this lies in the fact that
the domain has a vast amount of data in Urdu that could be
used as a valuable resource of knowledge and information. In
future we would also work on user studies and evaluation.
REFERENCES
[1] Latifur, Dennis, "Effective Retrieval of Audio
Information from Annotated Text Using Ontologies," ACM
SIGKDD Workshop on Multimedia Data Mining, Boston, MA,
August 2000.
[2] http://www.hakia.com [Accessed 15 September 2010]
[3] http://www.trueknowledge.com/ [Accessed 28 October 2010]
[4] Y. Xiao, M. Xiao, and F. Zhang, "Agents-Based
Intelligent Retrieval Framework for the Semantic Web," in
Proc. WiCom, 2007, pp. 5357-5360.
[5] http://www.lat-mpi.eu [Accessed 15 November 2010]
[6] R. Guenther and J. Radebaugh, Understanding Metadata.
Bethesda: NISO, 2004.
[7] Y. Li and M. Dong, "Towards a Knowledge Portal for
E-Learning Based on Semantic Web," in Proc. 8th IEEE Int.
Conf. on Advanced Learning Technologies, ICALT'08, 2008,
pp. 910-912.
[8] N. Henze, P. Dolog, and W. Nejdl, "Reasoning and
Ontologies for Personalized E-Learning in the Semantic
Web," Educational Technology & Society, pp. 82-97.
[9] O. Kucuktunc, U. Gudukbay, and O. Ulusoy, "A natural
language-based interface for querying a video database,"
IEEE MultiMedia, 14(1):83-89, 2007.
[10] H. Alani, S. Kim, D.E. Millard, M.J. Weal, W. Hall,
P.H. Lewis and N.R. Shadbolt, "Automatic Ontology-Based
Knowledge Extraction from Web Documents," Proc. IEEE
Intelligent Systems, 2003, pp. 14-21.
[11] M. Forsberg, "Why is Speech Recognition Difficult,"
Chalmers University of Technology, 2003.