Semantic Based Multimedia Analysis and Retrieval

Embed Size (px)

Citation preview

  • 8/3/2019 Semantic Based Multimedia Analysis and Retrieval

    1/6

    26th

    IEEEP Students Seminar 2011

    Pakistan Navy Engineering College

    National University of Sciences & Technology

    Semantic Based Multimedia Analysis and Retrieval

    Sana Aslam, Mahwash Makhdoom, Madeeha Khan, Amna Basharat

    FAST National University of Computer & Emerging Sciences

    Department of Computer Sciences

    AK Brohi Road, H-11/4 Islamabad, Pakistan

    [email protected], [email protected], [email protected], [email protected]

    AbstractThe volume of multimedia files is increasing day by

    day. Especially with the advancement in religious e-learning and

    multimedia knowledge resources, it has become highly

    demanding that effective retrieval and search methodologies

    should be incorporated in this area. Many renowned scholars

    from all over the world deliver lectures in which they discussdiverse issues/topics and these are usually hours long; therefore

    it is problematic and time consuming to navigate to a particular

    segment within the multimedia files manually. The necessity of

    present time is that efficient search methods should be devised

    which facilities the user to select the topic of concern within the

    multimedia files with just a single click. Moreover, users should

    be assisted in a way that allows them to query in natural

    language instead of keyword based search so that the retrieval is

    improved and results are based upon content and context. In

    this paper we are proposing a method that enables the users to

    navigate to a particular segment within the multimedia files.

    The architecture of Semantic based Multimedia Analysis and

    Retrieval (SMART) allows content indexing and time stamped

    alignment of transcriptions with multimedia file. The

    architecture of SMART incorporates the natural language

    processing techniques for efficient query and the modern

    semantic web technologies for efficient search. Search will be

    handled by modeling the knowledge base with ontologies. The

    advantage of making ontologies is that it allows machine

    interoperability; further this would help to retrieve the relevant

    results. The architecture envisions combining natural language

    processing techniques along with modern semantic web

    technologies and use them in a domain that opens new ways ofknowledge sharing and information retrieval.

    Keywords-component: Natural Language Processing,

    Transcription Alignment, Islamic Scholarly Lectures, Multimedia

    Segment Navigation, Ontology

    I. INTRODUCTIONWith the changes in needs of users new trends have been

    introduced and developed to store information. The

    advancement in the field of e-learning has transformed

    multimedia resources as a very valuable source of knowledge

    and information. The search engines nowadays, do not enable

    users to search a particular segment within the media file,

    further presently the search is performed on the basis of text

    associated with the media files. The major issue in the

    current process is that the search results have high recall but

    low precision [1], but contemporary users demand efficient

    and precise information. Usually Multimedia data is very

    detailed and lengthy and often different topics are discussed

    in a single file, so it becomes tedious and tiresome when it

    comes for user to search a particular topic or finding answer

    to particular information from the file manually.

    With the advancement in this field, there has been an urge in

    users to retrieve the most relevant, accurate and precise data

    when they are querying. Keyword based search provide

    hundreds and thousands of hits but it becomes frustrating and

    tiresome for users to search the relevant content. In order to

    overcome the problem, there is a need to model the current

    searching techniques that enable the users to not only retrieve

    the most accurate results by querying in natural language but

    also provide them with the exact content that matches theirsearch criteria. There are a number of search engines

    presently that are working to incorporate semantics in their

    architecture e.g. Hakia [2] is a semantic based image retrieval

    search engine, True Knowledge [3] is another that enables

    users to query in natural language and then returns the

    precise answer to that, still a significant effort needs to be

    done to incorporate semantics for efficient retrieval of

    multimedia data resources especially in the domain of

    Islamic scholarly lectures.

    The architecture of SMART uses ELAN tool for alignment

    of transcriptions with multimedia file and then uses the

    modern semantic based technologies to annotate that data

    efficiently so that relevant data retrieval is achieved [4].Studies have shown that use of semantics can empower the

    capabilities of e-learning [7, 8] and can support knowledge

    virtualization.

    In SMART we have transcribed the media file and then time

    aligned the media file with the corresponding text. A

    metadata is attached with the media file that contains

    information about the file. A knowledge base is attached with

    the system

    mailto:[email protected],%[email protected],%[email protected],%[email protected]:[email protected],%[email protected],%[email protected],%[email protected]:[email protected],%[email protected],%[email protected],%[email protected]
  • 8/3/2019 Semantic Based Multimedia Analysis and Retrieval

    2/6

    26th

    IEEEP Students Seminar 2011

    Pakistan Navy Engineering College

    National University of Sciences & Technology

    The tags are generated with the help of the transcribed file

    and the knowledge base. The search is performed with the

    help of the tags and the media file is returned to the user

    which is navigated to the segment which the user demanded.

    In order to test our approach, we have chosen the Islamic

    scholarly lectures as our target domain thus the domainconcepts represented in the ontologies is concerning the

    views of different Islamic scholars. The main aim of SMART

    is to open new ways of incorporating technology in Islamic

    world and to bridge the gap between these two to help people

    in understanding the concepts within the religion with ease.

    Structure of this paper: Section 2 describes the motivation

    behind the project and challenges associated with working in

    this domain. Section 3 describes the design goals, detailed

    architecture of SMART and implementation. Section 4

    discusses the implementation details. Conclusion and future

    work is further discussed in Section 5.

    II. MOTIVATION BEHIND WORKING IN THIS DOMAIN ANDCHALLENGESA. MotivationThe motivating factors behind carrying out this project are:

    To enable users to retrieve relevant information from the

    multimedia files. We would achieve this by pruning the

    irrelevant results. The search results provided would be fewer

    but would be more precise and relevant.

    One of the main targets is to enable users to get the view of

    different scholars on diverse topics in a less amount of time

    by reducing the query time and also facilitate the users by

    enabling them to query in natural language

    Previously keyword based search has been performed upon

    text resources and multimedia lectures. Semantic based

    search techniques have not been performed on multimedia

    files till now. So we hope that SMART would open new

    ways of efficient and meaningful search in this area

    B. Challenges AssociatedThe challenges associated with this project are described as

    follows

    One of the basic tasks is to convert the multimedia file into

    text. One way to achieve this is through speech recognition,

    but when tested, the results provided through speechrecognition were not up to the mark as the domain contained

    words which are not frequently spoken in English language.

    Also the videos contained many words of Arabic language.

    So the accuracy achieved through speech recognition was

    between 35-45 % which was quite low to work with, as the

    search process was dependent on the text associated with the

    media file. Another challenge associated with working in this

    domain is that no previous work has been done in this area

    and more over due to the complexity resulting from many

    diverse views of different scholars on the same topic. So to

    create a link between them and provide the user with sound

    results is a challenging task in this domain.

    III. SMARTDESIGN GOALS AND SMARTARCHITECTUREA. Design GoalsThe existing search engines retrieve the multimedia content

    based upon the tags associated with the file such as its title,

    name of speaker etc. To facilitate the user to efficiently

    navigate to particular segment of interest is a challenging and

    demanding task nowadays. Many researches show that there

    has not been done a significant amount of work in this

    prospect. This research claims to propose an architecture that

    is capable of facilitating the user to navigate to a particular

    segment within the media file and that too by allowing the

    user to query in natural language. The purpose behind

    facilitating the user to query in natural language is that it will

    target and return more precise content the user wants to

    search and will prune the results that are of no use for user.

    Further elaborations on the goals have been provided below

    that provide the basis on which the architecture of SMART is

    formulated

    1) Processing of Textual Content of MultimediaResources:

    SMART should be able to process and align the textual

    content i.e. transcriptions associated with multimedia file

    efficiently so that the acquisition of timestamps associated

    with text of a multimedia file is achieved. Time stamped

    information will help to create link between the text of the

    file and multimedia content.

    2) Automatic Tagging of New Multimedia Files Added

    in Repository:

    The knowledge base comprises of the most commonly

    occurring terms in this domain, whenever a new file is added

    in the repository, SMART should be able to automatically tag

    the segments of file that contains those domain terms and

    should save their timestamps.

    3) Natural Language Processing

    This research is envisioned to provide the user with facility to

    query in natural language. It will allow the user to do query

    as sentences in English language. SMART should be able to

    process the query and apply Natural Language Processingtechniques [9] on query structure so that extraction of the

    meaning out of query and its precise association with text of

    the media is ensured.

    4) Ontological Knowledge Model:

    To enable efficient search, the need of hour is to model data

    to knowledge so that ontological model of knowledge can be

    designed for this particular domain of Islamic Scholarly

  • 8/3/2019 Semantic Based Multimedia Analysis and Retrieval

    3/6

    26th

    IEEEP Students Seminar 2011

    Pakistan Navy Engineering College

    National University of Sciences & Technology

    Lectures. This ontological model would plot the information

    such as speaker, topic, etc.

    5) Intelligent Retrieval of Information:

    SMART should be able to retrieve efficient and meaningful

    results by pruning the irrelevant hits and only providing the

    user with most precise results.

    B. High Level System Architecture of SMART

    The high level system architecture of SMART is shown in

    Figure 1 and it comprises of five major activities:

    Transcription Alignment, Query Processing, Knowledge

    Extraction, Knowledge Modeling, and Query-Result

    Accuracy analysis. In the first phase, the multimedia

    resources are aligned with the transcriptions that would be

    treated as the repository of SMART. In the Query Processing

    unit applies Natural Language Processing Techniques on the

    user Query and extract its meaning so that it can be mapped

    with accuracy in the transcribed data. The Knowledge

    Extraction unit comprises of metadata generation and tagsgeneration in which the transcribed aligned data is parsed in a

    way that it incorporates in it the data relevant to the data i.e.

    metadata of the data. In the tag generation unit the tags are

    generated on the parsed data to initiate the navigation

    process. Knowledge Modeling builds respective ontology

    models for religious scholarly texts and with use of the

    ontology schemas. Ontology repository stores the Ontologies.

    The ontology repository and the metadata repository form the

    knowledge base for the system which will be used for

    efficient search retrieval purpose. The incorporation of

    semantic web technologies is to speed up the search process

    and reduce the response time of the overall process. In the

    final phase of Query-Result Accuracy Analysis, the

    compatibility analysis of query and the extracted result and

    its accuracy is determined using the natural language

    techniques and by analysis of the thematic coherence

    between query and the results. Finally the results are returned

    back to the user and displayed through user interface of

    SMART application. The results will comprise of the list of

    different speakers and files associated with them that contain

    accurate timestamps of occurrence of the answer of userquery within the multimedia stream.

    Figure 1: High level System Architecture for Multimedia Segment Navigation

    The detailed architecture of SMART on the basis of which

    the design and implementation details are formulated isdiscussed in following subsections:

    1) Transcription Alignment Unit:As discussed above, the challenge associated with SMART is

    to convert the audio into text. The complexity lies with the

    fact that the scholarly lectures in English language, contains

    Arabic terms and some Urdu terms, so accuracy cannot be

    achieved and risk factor cannot be neglected in such a

    sensitive domain of religious lectures. For this reason manual

    transcriptions are generated for each of the multimedia file.The transcriptions are then aligned with the multimedia

    stream using ELAN tool that is an open source tool used for

    aligning transcriptions with multimedia. The importance of

    this unit lies with the fact that the accurate the alignment, the

    more efficient would be the search process. ELAN aligns

    transcriptions along with timestamps which are required for

    segment navigation [5].

    2) Knowledge Extraction Unit:

  • 8/3/2019 Semantic Based Multimedia Analysis and Retrieval

    4/6

    26th

    IEEEP Students Seminar 2011

    Pakistan Navy Engineering College

    National University of Sciences & Technology

    This unit consists of two components; these are Metadata

    Generation and Tag generation. A brief detail of both the

    components is as follows:

    3) Metadata Generation:

    This component is responsible for generating metadata of the

    multimedia files. The metadata holds information of the file

    by storing the name of file, its topic, its location [6].

    4) Tag Generation

    Tag generation unit is responsible for generating tags in the

    multimedia file by identifying useful tags with the help of

    knowledge base.

    5) Query Processing Unit:

    The user query would be passed on to the Query Processing

    Engine, which would extract useful keywords from the user

    query. Here, Natural Language Processing techniques would

    be applied on the user query to understand the syntax and

    semantics of the user Query.

    6) Knowledge Base:

    The knowledge base would contain the most frequently used

    words in the Islamic domain. With the help of these words

    the tags for the particular video would be generated.

    7) Knowledge Modeling Unit

    In this unit, the data will be transformed into the form of

    knowledge models with the help of metadata generated with

    the use of ontologies. The ontological model of data will be

    stored in this unit that would comprise of schemas to

    incorporate semantics in the data for efficient search and

    retrieval purpose. The user query from the Query ProcessingUnit will be mapped onto the data for acquiring the exact

    location holding the answer to that query. For efficient

    retrieval the data has already been shaped in the form of

    ontologies so it would facilitate to speed up the overall

    process of knowledge extraction and acquisition. The use of

    ontologies facilitates machine interoperability and

    conceptualizes the domain in a format that is understood [10]

    by the machine.

    8) Query-Result Compatibility Analysis Unit

    In this unit the query and its corresponding mapping obtained

    in the data would be verified. This would be necessary

    because the domain under consideration is very sensitive andthere is a risk involved in returning the results to the user

    without its proper validation and verification. From this unit,

    the verified results would be returned to the user-interface for

    SMART application.

    IV. IMPLEMENTATION

    The implementation of SMART architecture is divided into

    two modules. The navigation strategy completion and second

    is the NLP techniques with semantic incorporation. Till now

    we have implemented the first module i.e. the

    implementation of segment navigation within multimedia

    stream based on keywords. The details of subsections of this

    module are elaborated as follows:

    A. Navigation StrategyIt comprises of Transcription generation, alignment with

    multimedia stream and facilitating keyword based navigation

    of multimedia stream. The three subsections are discussed in

    detail as follows:

    1) Transcription Generation

    Before discussing in detail the first part of transcription

    generation, it is important to understand the reason behind

    using transcriptions when there are many speech recognition

    engines available these days. This is due to the fact that the

    domain we are targeting holds in it concepts of Arabic andsome Urdu terms even if the whole lecture is in English

    language. This raises the challenge of recognizing

    multilingual stream of data file, which to date is not

    achievable and efficient. Another issue is that the speech

    recognition engines available today are workable with

    applying machine learning techniques on them, this approach

    works well if there is a single speaker because the machine

    has to be trained on it. Moreover due to diversity in the

    dialects and pronunciation of various speakers, it is very

    difficult to recognize the spoken words with accuracy [11].

    This domain is so sensitive that different views are required

    by users to understand the concepts within the religion. Also

    this would constrict the domain to a single speaker thatwould not benefit the users who want to know views of

    different scholars on a same topic. So to deal with above

    mentioned issues, we have transcribed the multimedia files.2) Alignment with Multimedia Stream:

    Navigation within the stream is possible if we get the time

    stamped information of the topics discussed in the

    multimedia file. For this there is a need of aligning

    transcriptions with the multimedia stream. We have used

    ELAN tool for this purpose. ELAN (the Eudico Linguistic

    Annotator) is a program that allows aligning the

    transcriptions and adding annotations to a video file. ELAN

    aligns the transcriptions with the media file and returns thetime stamped data i.e. the words spoken in the video along

    with the time at which they were spoken. [5]

    3) Keyword Based Navigation:

    In this unit of the architecture, keywords based search is

    implemented. In this we will discuss in detail the working of

    Knowledge Extraction unit of SMART architecture. The

    knowledge extraction unit consists of two subunits. One is

  • 8/3/2019 Semantic Based Multimedia Analysis and Retrieval

    5/6

    26th

    IEEEP Students Seminar 2011

    Pakistan Navy Engineering College

    National University of Sciences & Technology

    the tag generation unit and the other is to attach the metadata

    with the corresponding media file. The tag generation unit

    gets the input in the form of the time aligned data file. The

    tags are generated with the help of Knowledge base. The

    knowledge base consists of most commonly used words in

    the Islamic domain so that tags could be added to in

    relevance to the multimedia files. The tagged repository is

    maintained which consists of metadata. The metadata is

    comprised of the keyword information and the corresponding

    media file in which it is occurring. The metadata associated

    with the transcription consists of detailed information

    regarding keywords contained within knowledge base, their

    corresponding timestamps and path of multimedia file

    containing those keywords. Figure 2 shows the detailed

    working of tag generation.

    Figure 2: Detailed Architecture of the Tag Generation Unit

    With the acquisition of tagged information, it is now possible

    to navigate to a particular segment within the multimedia file.

    The detailed algorithm of the search process implemented isdiscussed below.

    ALGORITHM1:NAVIGATION WITHIN MULTIMEDIA STREAM

    1. Initialize UserQuery to U2. Initialize SelectedSpeakerto S

    Input the user query

    3. ifthe user selects the speaker4. Store speaker name in Temp5. Retrieve the names of multimedia files of the

    corresponding speaker from the meta-data file

    6. Search the database for the USERQUERYWHEREmultimedia file name belongs to retrieved list

    7. Retrieve the results, their corresponding timestampsand multimedia file names

    8. else9. Search the database for the USERQUERY10. Retrieve the results, their corresponding timestamps

    and multimedia file names

    11. Display the retrieved results to the users12. User selects the multimedia file and plays it

    The workflow of the components on the basis of the

    algorithm is showed in Figure 3

    Figure 3: Detailed Architecture of the Search Engine

    B. Incorporation of Semantics

    The second module of implementation of SMART includes

    incorporation of semantics in the architecture for efficient

    search and using NLP techniques for query processing.

    V. CONCLUSIONS &FUTURE WORKIn this paper we have proposed a potentially powerful and

    novel approach for the retrieval of multimedia information.

    The crux of our innovation is the development of a procedure

    through which we can retrieve a particular segment from amultimedia file. We have used a domain of Islamic Scholarly

    lectures for project demonstration but our results can be

    generalized and can be applied on other domains as well.

    Moreover, speech recognition does not prove to be a vital

    approach for working in this domain due to the variety of

    words spoken in different languages within a single lecture.

    That is why going with transcriptions is necessary for an

    effective search. Combined with modern semantic

    technologies, we are hopeful that SMART, in comparison

    with other semantic based search engines would prove to be

    an efficient and effective search engine for multimedia files.

    Although we are confident that the conceptual framework for

    this project is sound, and its implementation is completely

    feasible from a technical standpoint, but still some other

    important aspects are needed to be covered in future. These

    include adding semantics to achieve efficiency and

    effectiveness while retrieving the results.

    Moreover, until now we have been working with videos in

    English. Later on we would like to incorporate videos in

  • 8/3/2019 Semantic Based Multimedia Analysis and Retrieval

    6/6

    26th

    IEEEP Students Seminar 2011

    Pakistan Navy Engineering College

    National University of Sciences & Technology

    Urdu language as well. The need of this lies with the fact that

    the domain has a vast amount of data in Urdu language that

    could be used a valuable resource of knowledge and

    information. In future we would also work on user studies

    and evaluation.

    REFERENCES

    [1] Latifur, Dennis August 2000 Effective Retrieval of

    Audio Information from Annotated Text Using

    Ontologies1, ACM SIGKDD Workshop on

    Multimedia Data Mining, Boston, MA

    [2] http://www.hakia.com [Accessed 15 September

    2010]

    [3] http://www.trueknowledge.com/[Accessed 28

    October 2010]

    [4] Y. Xiao, M. Xiao, and F. Zhang, Agents-Based

    Intelligent Retrieval Framework for the Semantic

    Web, in Proc. WiCom, 2007, pp. 5357-5360.

    [5] http://www.lat-mpi.eu [Accessed 15 November2010]

    [6] R. Guenther and J. Radebaugh: Understanding

    Metadata Bethesda: NISO, 2004

    [7] Y. Li and M. Dong, Towards a Knowledge Portal

    for E-Learning Based on Semantic Web, inProc.

    8th IEEE Int. Conf on Advanced Learning

    Technologies, ICALT'08. 2008, pp. 910-912.

    [8] N. Henze, P. Dolog, and W. Nejdl, Reasoning and

    Ontologies for Personalized E-Learning in the

    Semantic Web. Educational Technology &Society,

    pp. 82-97.

    [9] O. Kucuktunc, U. Gudukbay, and O. Ulusoy. A

    natural language-basedinterface for querying a videodatabase. IEEE MultiMedia, 14(1):8389,2007.

    [10] H. Alani, S. Kim, D.E. Millard, M.J. Weal, W. Hall,

    P.H. Lewis and N.R. Shadbolt, Automatic

    Ontology-Based Knowledge Extraction from Web

    Documents, Proc. IEEE Intelligent Systems, 2003,

    pp. 14-21.

    [11] Forsberg, M. (2003). Why is Speech Recognition

    Difficult. Chalmers University of Technology

    http://www.hakia.com/http://www.trueknowledge.com/http://www.trueknowledge.com/http://www.lat-mpi.eu/http://www.lat-mpi.eu/http://www.lat-mpi.eu/http://www.trueknowledge.com/http://www.hakia.com/