Accepted Manuscript
Title: An Effective and Economical Architecture for Semantic-based Heterogeneous Multimedia Big Data Retrieval
Author: Kehua Guo, Wei Pan, Mingming Lu, Xiaoke Zhou, Jianhua Ma
PII: S0164-1212(14)00204-0
DOI: http://dx.doi.org/10.1016/j.jss.2014.09.016
Reference: JSS 9382
To appear in: The Journal of Systems and Software
Received date: 5-12-2013
Revised date: 30-8-2014
Accepted date: 8-9-2014
Please cite this article as: Guo, K., Pan, W., Lu, M., Zhou, X., Ma, J., An Effective and Economical Architecture for Semantic-based Heterogeneous Multimedia Big Data Retrieval, The Journal of Systems and Software (2014), http://dx.doi.org/10.1016/j.jss.2014.09.016
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Highlights
- The precision rate outperforms some other approaches in the case of user feedback.
- When the database grows, the time cost is significantly lower than in other approaches.
- The semantic information is stored in the database instead of directly processing large multimedia data, reducing storage and I/O costs.
- Low-end computers together with open-source frameworks are adopted, so the investment possesses good economic efficiency.
An Effective and Economical Architecture for Semantic-based
Heterogeneous Multimedia Big Data Retrieval
Kehua Guoa, Wei Panb, Mingming Lua, Xiaoke Zhoua, Jianhua Mac
a School of Information Science & Engineering, Central South University, Changsha, China
b School of Software, Central South University, Changsha, China
c Faculty of Computer and Information Sciences, Hosei University, Tokyo, Japan
Corresponding author: Kehua Guo, [email protected]
Abstract
Data variety has been one of the most critical features of multimedia big data. Some
multimedia documents, although in different data formats and storage structures, often express
similar semantic information. Therefore, how to manage and retrieve multimedia documents
that reflect users' intent in heterogeneous big data environments has become an important issue.
In this paper, we present an effective and economical architecture named SHMR
(Semantic-based Heterogeneous Multimedia Retrieval), which stores and retrieves semantic
information from heterogeneous multimedia data at low cost. Firstly, the particularity of
heterogeneous multimedia retrieval in big data environments is addressed. Secondly, an
approach to extract and represent semantic information for heterogeneous multimedia
documents is proposed. Thirdly, a NoSQL-based approach to semantic storage, in which
multimedia can be processed in parallel on distributed nodes, is provided. Finally, a
MapReduce-based retrieval algorithm is presented, and a user-feedback-supported scheme is
designed to achieve high retrieval precision and a good user experience. The experimental
results indicate that the retrieval performance and economic efficiency of SHMR are suitable
for multimedia information retrieval in heterogeneous big data environments.
Keywords: heterogeneous multimedia, semantic-based retrieval, information retrieval, variety, big data
1 Introduction
Nowadays, huge volumes of multimedia such as images, audio, videos and text
documents are being generated and consumed daily. With the development of Internet
technology and multimedia provision, the amount of rich multimedia content has exploded.
Currently, multimedia makes up 60% of Internet traffic, 70% of mobile phone traffic, and 70%
of all available unstructured data (Smith, 2011). Multimedia has become a form of big data
that gives users valuable information such as event occurrence, network computing,
purchase recommendation and workflow control (Chen and Yang, 2011; Wang and Jiang
et al., 2014; Wang and Liu et al., 2014). Therefore, multimedia content retrieval from big data
environments is spurring a tremendous amount of research (Liu et al., 2013).
Multimedia big data retrieval has its own particularities. In this paradigm, multimedia
computing has switched to a distributed pattern to store and process massive multimedia
content. Although this approach alleviates the maintenance and computing burden on the
client, multimedia big data storage and processing face great challenges. In big data
environments, a large number of commodity computers, which possess massive computation
power and storage capacity, generate multimedia content. Since many services and
applications provide, edit, process, and retrieve rich multimedia content, some
multimedia documents, probably having different data formats and storage structures, often
express similar semantic information. Incompatible data formats, non-aligned data structures
and inconsistent data semantics have become important problems in multimedia big data research.
Therefore, the most fundamental challenge for multimedia big data storage and retrieval is
heterogeneity, which can be highlighted as follows: (1) Content heterogeneity. Multimedia
content generated from applications may be varied and unstructured. For example, the
different types of multimedia services generate images, videos, audio, graphics or text
documents. Even for videos, the content may be generated by transportation cameras, video
conferencing or user uploading, etc. (2) Service requirement heterogeneity. Information
retrieval may exist in different services, such as photo sharing, information rendering and
semantic retrieval. Different users require different quality of service (QoS) and quality of
experience (QoE). In this case, the service providers should guarantee the service performance
for millions of users and simultaneously meet different requirements as far as possible. (3)
Terminal device heterogeneity. In big data environments, numerous types of terminals, such as
personal computers (PCs), laptops, pads, mobile phones, etc., can be used to access massive
multimedia. Moreover, even for a single type of terminal, various forms of profile exist. For
instance, mobile phones have different types of operating system (OS), such as Windows CE,
Android and Apple's iOS. Thus, the retrieval architecture should provide ubiquitous services
for various clients. The above features have become new challenges in the area of multimedia
big data storage and retrieval.
In the multimedia retrieval process, users' intent is another critical issue. In some traditional
approaches, retrieval is usually restricted to a single type of multimedia content, such as
images. This constraint reduces the QoS of multimedia retrieval because the returned results
may fail to identify users' search intent due to the shortage of type diversity.
Therefore, solving type heterogeneity, storage distribution and users' intent for good
retrieval performance and economic efficiency has become a significant issue (Smith, 2012). In
this paper, a semantic-based approach to represent users' intent is adopted, and a novel storage
and retrieval architecture named SHMR (Semantic-based Heterogeneous Multimedia Retrieval)
is proposed to support heterogeneous multimedia big data retrieval. The characteristics of
SHMR are as follows: (1) heterogeneous multimedia retrieval, because any type of multimedia
document can be uploaded and retrieved; (2) convenience, since it offers a familiar retrieval
interface similar to traditional commercial search engines; (3) reduced I/O cost, as we store
ontology-represented semantic information in the database and provide links to the real
multimedia documents, rather than directly processing large multimedia data; (4) economic
efficiency, as low-end computers together with open-source frameworks are adopted to store
the NoSQL database and process the retrieval, respectively.
The remainder of this paper is structured as follows. Section 2 reviews related works and
briefly introduces the overall concept of SHMR. Section 3 provides a detailed description of our
proposed architecture, including semantic extraction and representation, semantic storage,
the multimedia retrieval algorithm and user feedback. Section 4 provides the performance
evaluation and experimental results. Section 5 concludes our contribution and points out future work.
2 Related Works
In the past decades, multimedia retrieval was mainly founded on text-based approaches,
which rely solely on the text contents surrounding multimedia in certain host files.
Although keywords can be utilized to retrieve various types of multimedia documents, this
method does not intrinsically support heterogeneity, and the retrieval cannot achieve excellent
performance because of the noise (Zhao and Grosky, 2002; Yang et al., 2012).
Although users find it more convenient to retrieve multimedia content through text
keywords, content-based retrieval has been widely used in commercial search engines, such as
Google and Bing Image Search. However, it is extremely difficult to execute heterogeneous
retrieval based on multimedia content (Smeulders et al., 2000; Zhou et al., 2012). For instance,
given a video document and an audio document of the same artist, content-based approaches
have no ability to identify the artist or extract other similar features from the binary data of
the two documents because of the difference in data formats. Thus, in many cases, the
content-based approach may ignore users' retrieval intent.
To support retrieval that reflects users' intent, some contributions focused on introducing
relevance feedback (RF) into content-based retrieval. Comprehensive surveys of RF in image
retrieval systems were presented in (Datta et al., 2008). Representative contributions include
an active learning algorithm for conducting effective relevance feedback (He, 2010), Support
Vector Machines (SVMs) based feedback analysis (Wang et al., 2011), local geometrical graph
based feedback learning (Chen et al., 2011) and the Biased Discriminant Analysis (BDA)
approach (Zhang et al., 2012). However, the main drawback of RF is increased user
involvement. Query users are expected to provide only limited feedback, and excessive
feedback will increase their burden (Datta et al., 2008).
To support heterogeneous multimedia retrieval and reflect users' intent, a feasible
approach is to use social semantic information and automatic semantic analysis (Wong and
Leung, 2008; Gijsenij and Gevers, 2010). Related models have been widely used. At present,
text semantic information is generally extracted using topic models such as PLSA
(Probabilistic Latent Semantic Analysis) (Hofmann, 2001) and LDA (Latent Dirichlet
Allocation) (Blei et al., 2003). In addition, the BoW (Bag-Of-Words) model (Wu et al., 2010)
has become a typical model to express visual words. For semantic information representation,
ontology is the most widely used method (Maedche and Staab, 2001). Some works have
improved traditional ontology. For example, Yang et al. (2008) proposed a
hierarchical ontology-based knowledge representation model, and Wang et al. (2008) used the
Semantic Web Rule Language to define semantic ontology in the Web environment.
In recent years, some contributions using the above approaches to support heterogeneous
multimedia retrieval have been proposed. For text-image retrieval, Rasiwasia et al. (2010)
modeled the correlations between text and image modalities and learned them with canonical
correlation analysis. In 2014, this approach was revised to achieve better performance (Costa
et al., 2014). Zhai et al. (2013) proposed a heterogeneous media similarity measure with
nearest neighbors which considers both intra-media and inter-media correlations. Liu et al.
(2014) used an accumulated reconstruction error vector to combine the original feature
descriptions into a shared semantic space. However, the above approaches only support
heterogeneous retrieval between image and text documents.
To achieve retrieval supporting various types, Lu et al. (2012) proposed the IBCR (Indexing-
based Cross-Media Retrieval) approach and designed an indexing MK-tree based on the
heterogeneous data distribution to manage media objects within the semantic space and
improve the performance of heterogeneous multimedia retrieval. Yang et al. (2012)
constructed a semi-semantic graph by jointly analyzing heterogeneous multimedia data.
However, these approaches ignore the semantic information provided by social users and
only focus on automatic learning and relevance feedback from query users. In these systems,
semantic features and multimedia documents are stored in the servers' databases; when the data
scale increases, processing large multimedia data will consume considerable computation
resources.
Retrieval performance and economic efficiency are very important factors in evaluating
multimedia big data retrieval systems. In big data environments, information retrieval
encounters particular problems such as data complexity, uncertainty and emergence
(Liu et al., 2013). Traditional RDBMS (Relational Database Management System) technology
cannot satisfy the requirements of heterogeneous information retrieval due to the data
variety and the high investment (Smith, 2012). At present, NoSQL technology is useful for storing
information that can be represented in map format. Apache HBase is a typical database
realizing the NoSQL idea, which simplifies the design, supports horizontal scaling and offers finer
control over availability. The features of HBase are outlined in the original work on the Google
File System (Ghemawat et al., 2003) and BigTable (Chang et al., 2008). In HBase, tables serve as
the input and output for MapReduce (Dean and Ghemawat, 2008) jobs running in Hadoop (Apache
Hadoop, 2013), and may be accessed through typical APIs such as Java (Apache HBase, 2013).
In this paper, SHMR demonstrates an effective and economical architecture which uses an
inexpensive investment to store and retrieve semantic information from heterogeneous
multimedia data. In this architecture, large multimedia data are not directly processed;
HBase only stores ontology-represented semantic information, which can be processed in parallel
on distributed nodes with the MapReduce-based retrieval algorithm. The experimental results show
that SHMR can effectively identify heterogeneous multimedia.
3 Methodology
3.1 Overview
This section presents how SHMR combines semantic information and
multimedia documents to perform big data retrieval. Generally speaking, big data processing
tools (e.g. Hadoop) are open source and freely available. On the one hand, Hadoop basically
provides a programming model to perform distributed computing. Thus, the distributed
paradigm can be followed to revise the traditional retrieval algorithm as long as it meets the
MapReduce programming specification, so multimedia big data retrieval can be performed
without increasing the users' cost. On the other hand, the semantic information of the
multimedia documents can also be easily obtained and saved thanks to the existence of
various computation models (Hofmann, 2001; Blei et al., 2003; Wu et al., 2010). Hence, in
SHMR, such valuable assets are applied to facilitate intent-reflected heterogeneous
multimedia big data retrieval.
SHMR adopts a four-step architecture, as shown in Fig. 1. The architecture mainly consists
of the multimedia semantic input (Fig. 1(a)), ontology semantic representation (Fig. 1(b)),
NoSQL-based semantic storage (Fig. 1(c)) and MapReduce-based heterogeneous multimedia
retrieval (Fig. 1(d)) steps. In consideration of economic efficiency, we select Apache
Hadoop as the implementation tool.
[Figure: multimedia from Web crawling, sensor collecting and user generating is annotated (social annotating, automatic learning), represented as ontology, converted to map structures, and stored in the HBase semantic database on the Hadoop framework across data nodes; retrieval users upload queries (d1), semantic extraction is performed (d2), the MapReduce-based retrieval algorithm returns results (d3), and social users re-annotate the selected documents (d4).]
Fig. 1 Overview of SHMR
In the first step, the multimedia content will be obtained from various sources such as
Web crawling, sensor collection and user generating, etc. The multimedia types may include
images, videos, audio or text documents in various formats. The semantic information is
initialized in two ways: (1) social annotating, which means extracting semantic information
from annotations provided by social users; (2) automatic learning, which denotes analyzing
semantic information from multimedia features using topic models. After the semantic
extraction, the semantic fields together with the multimedia location are represented by
ontology in the second step. A weight adjustment scheme is employed to adjust the weight of
every semantic field.
In the third step, ontology files are saved into HBase and linked to the real multimedia data
by the location information. For better adaptation to the NoSQL-based big data processing tool,
we use a map<key, value> structure conversion process to normalize the correspondence
between multimedia location and semantic fields. Next, the index and storage blocks are
generated, according to which the ontology is saved into the NoSQL-based distributed
semantic database managed by HBase.
In the fourth step, users can upload annotated multimedia documents in arbitrary formats
to execute the heterogeneous multimedia retrieval (Fig. 1 (d1)). The semantic information of
the uploaded file(s) is extracted by social annotating and automatic learning (Fig. 1 (d2)). Then
the engine executes the ontology generation and map structure conversion to adapt to the
MapReduce-based retrieval. After the retrieval, the engine returns the results as
thumbnails with the file locations (Fig. 1 (d3)). Finally, social users are asked to give
additional annotations to the multimedia documents they selected (Fig. 1 (d4)) to make the
annotations more abundant and accurate.
3.2 Semantic Fields Extraction
For multimedia documents, semantic fields are extracted by social annotating and
automatic learning. Fig. 2 shows the two approaches to semantic fields extraction from a
typical document in Flickr.
[Figure: the user tags of a Flickr document are extracted as the initial social annotations, and its text comments are analyzed with LDA (Blei et al., 2003); retrieval users can add tags (Guo et al., 2014); both sources feed the semantic fields.]
Fig. 2 Semantic Fields Extraction from a Flickr Document
Social annotating is divided into two categories: (1) In the initialization phase, the user
tags of multimedia documents are extracted as the social annotations (note: the dataset is
crawled from typical websites such as Flickr, Wikipedia and YouTube for simulation,
since the user tags in these websites can be easily analyzed); (2) During the use of SHMR,
social users can manually annotate multimedia documents using the software interfaces
proposed in our previous research (Guo et al., 2014). All the semantic fields are described by
text, which is converted into bytes for storage.
In automatic learning, the LDA model (Blei et al., 2003) is employed to analyze the semantic
information. This model performs the extraction process on the multimedia text comments,
which are supplied by the content provider and embedded in the host document.
The LDA model is utilized to extract the topic text as the semantic fields, using the API
implementation from STMT (Stanford Topic Modeling Toolbox, 2014). In addition, this step
considers the multimedia relationships, which are established across different document types
through hyperlinks. In this case, the related hyperlinks are added to the semantic fields.
3.3 Multimedia Semantic Input
Define M as a multimedia document and C as the set of all input multimedia documents,
satisfying C = \{M_1, M_2, \ldots, M_N\} (where N is the number of multimedia documents).
Any M_i \in C is saved in the file system. The location information of M_i is represented as
text linked to the real file.

Although any M_i \in C has numerous semantic fields, not all fields can accurately
represent users' understanding of M_i. Therefore, every semantic field s_j \in S_{mi} is
assigned a weight. Hence, any M_i \in C has a final semantic matrix S_{mi} as follows:

S_{mi} = \begin{bmatrix} s_1 & s_2 & \cdots & s_n \\ w_1 & w_2 & \cdots & w_n \end{bmatrix}^{T}    (1)

where s_i is the i-th semantic field, n is the number of semantic fields of M_i, and w_i
is the corresponding weight. Therefore, all the semantic matrices for the multimedia
documents can be defined as S = \{S_{m1}, S_{m2}, \ldots, S_{mN}\}. The weight w_i of any
M_i \in C is assigned an initial value of 1/n.

Evidently, w_i cannot remain constant for every semantic field during retrieval.
Semantic fields used more frequently during the retrieval process describe users' intent
better, and they should be assigned greater weights. An adjustment scheme is designed to
adjust the weight of every semantic field during retrieval of the returned document M. The
algorithm is detailed as follows:
Algorithm 1. Weight Adjustment Scheme
 1  1. Input: Define matrix S_mi for every input multimedia M_i, define semantic matrix S.
 2  2. Initialize: (1) Obtain the semantic fields and store them to S_mi.
 3     (2) Assign w_i in S_mi as 1/n.
 4     (3) Combine all the S_mi to generate semantic matrix S.
 5  3. Adjustment During Retrieval: Set step=1
 6     For each returned document M
 7       For j=1 to n
 8         If M_i is retrieved by s_j Then
 9           set k_j=1
10         Else set k_j=0
11         End If
12         w_j = w_j + k_j/n
13       End For
14     End For
15  4. Output: Restore all the adjusted semantic matrices to generate new matrix S.
It can be seen from lines 8 to 12 that Algorithm 1 assigns greater weights to more
frequently used semantic fields. In a later algorithm, the fields with less weight are
eliminated to make the semantic information more accurate.

The initial weight assignment has to check all the multimedia documents in the database,
which is computationally expensive. To solve this problem, this process is executed only
once, when the search engine is initialized, and it is performed in a background thread.
Considering the weight adjustment scheme during the retrieval process, the computation
complexity can be measured as a function of the returned list R. In this algorithm, the
weight adjustment scheme has to check the semantic fields of every returned document. Hence,
the computation complexity of the weight adjustment scheme is O(n|R|) (where |·| represents
the cardinality of a set).
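Algorithm 1 can be sketched in a few lines. This is a simplified illustration, not the authors' code: each document's semantic matrix is reduced to a dict mapping a semantic field to its weight, initialised to 1/n, and the example fields are made up.

```python
# Minimal sketch of the weight adjustment scheme (Algorithm 1).
def init_weights(fields):
    """Assign every semantic field the initial weight 1/n."""
    n = len(fields)
    return {s: 1.0 / n for s in fields}

def adjust_weights(s_mi, query_fields):
    """Increase by 1/n the weight of every field the retrieval used."""
    n = len(s_mi)
    for s in s_mi:
        if s in query_fields:   # k_j = 1 when the field matched the query
            s_mi[s] += 1.0 / n  # w_j = w_j + k_j / n
    return s_mi

s_mi = init_weights(["artist", "concert", "video"])
adjust_weights(s_mi, {"artist"})
print(s_mi)  # "artist" now outweighs the other fields
```

Over many retrievals, frequently matched fields accumulate weight, which is exactly what the later refinement step exploits.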
3.4 NoSQL-based Semantic Storage
NoSQL databases have been widely used in industry, including big data and real-time
Web applications. This technology is used to store the semantic fields and multimedia location
which are represented as highly optimized map<key-value> format. The data can be
stored and retrieved utilizing models that employ less constrained consistency than traditional
relational databases such as Oracle and Microsoft SQL Server. In SHMR, Apache HBase is
adopted to simplify the storage.
In SHMR, ontology nodes at the first level are used to represent the most obvious features.
The second and other levels of semantic fields will be provided based on the previous levels.
All the information is extracted through the original semantic input and user feedback. SHMR
adopts composite pattern, where objects can be composed as a tree structure to represent the
part-whole hierarchy (Guo et al., 2014). This pattern regards simple and complex elements
as common elements: the client uses the same method to deal with complex elements as with
simple elements, so that the internal structure of the complex elements remains independent
of the client program (Guo and Zhang, 2013).
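The composite pattern described above can be sketched as follows. The class and method names are illustrative (loosely following the OntologyComponent/OntologyComposite/OntologyLeaf naming of Fig. 3), not the actual SHMR implementation.

```python
# Sketch of the composite pattern for the ontology tree: composites and
# leaves share one interface, so clients traverse the tree uniformly.
class OntologyComponent:
    def fields(self):
        raise NotImplementedError

class OntologyLeaf(OntologyComponent):
    """A single semantic field."""
    def __init__(self, field):
        self.field = field
    def fields(self):
        return [self.field]

class OntologyComposite(OntologyComponent):
    """A node whose children may be leaves or further composites."""
    def __init__(self):
        self.children = []
    def add(self, child):
        self.children.append(child)
    def fields(self):
        out = []
        for c in self.children:
            out.extend(c.fields())  # same call for leaf or composite
        return out

root = OntologyComposite()
root.add(OntologyLeaf("artist"))
sub = OntologyComposite()
sub.add(OntologyLeaf("concert"))
sub.add(OntologyLeaf("stage"))
root.add(sub)
print(root.fields())  # ['artist', 'concert', 'stage']
```

Because `fields()` is the same call on either node type, client code never needs to know how deep the semantic hierarchy goes.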
To facilitate the subsequent data processing in MapReduce, the data structure is converted
into map<key, value> pairs because the map function takes a key-value pair as input.
HBase stores the files as blocks in data nodes; the size of every block is a fixed value
(e.g. 64 MB), and the corresponding multimedia semantic ontology files are recorded in each
block. The file format is shown in Fig. 3.
[Figure: a block file consists of a block header followed by records; each record stores the record size, the key bytes (multimedia location) and the value bytes (multimedia ontology). Data nodes hold partitions of block files, while the original multimedia files are stored separately. The ontology follows the composite pattern (OntologyComponent, OntologyComposite, OntologyLeaf with children).]
Fig. 3 Map Structure of Block and Record
As Fig. 3 shows, multimedia data are not stored in the block files, which reduces the
network I/O load between data nodes during job dispatching. For a record, the key is the
location of the original multimedia document, and the value is the ontology content
represented in byte array format.
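The key-value layout of a record can be illustrated with a short sketch. The serialisation format (JSON) and the example HDFS-style path are assumptions for illustration; the paper does not specify how the ontology bytes are encoded.

```python
# Sketch of a Fig. 3 record: key = multimedia file location,
# value = ontology content serialised to a byte array.
import json

def make_record(location, ontology):
    key = location.encode("utf-8")
    value = json.dumps(ontology).encode("utf-8")  # assumed encoding
    return key, value

key, value = make_record(
    "hdfs://datanode01/media/concert_0042.mp4",   # hypothetical path
    {"fields": {"artist": 0.5, "concert": 0.5}},
)
print(len(key), len(value))
```

Because the value holds only the compact ontology bytes, shipping records between nodes never moves the large multimedia payload itself.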
3.5 MapReduce-based Heterogeneous Multimedia Retrieval
The MapReduce-based algorithm consists of two steps: (1) the mapper function, which
processes a key-value pair to generate a set of intermediate key-value pairs;
(2) the reducer function, which processes the intermediate values associated with the same
intermediate key. Every query is assigned a QueryId and a QueryOntology, and the
returned result is formed as a ReturnedList. The MapReduce-based retrieval process is
shown in Fig. 4.
[Figure: user queries (q1..qn) are matched by mapper functions against the HBase storage (m1..mn) in the Hadoop environment; the intermediate results are grouped and passed to reducer functions (r1..rn), which produce the ReturnedList.]
Fig. 4 MapReduce-based Retrieval Process
In the retrieval, for any M_i, M_j \in C, the similarity function is computed as:

\mathrm{similarity}(M_i, M_j) = \frac{|O(M_i) \cap O(M_j)|}{N(i) \cdot N(j)}    (2)

where N(i) and N(j) are the row numbers of S_{mi} and S_{mj}, and O(M_i) and O(M_j)
are the collections of all the semantic fields of multimedia M_i and M_j, respectively.
In the queries and returned lists, the information is represented as byte arrays. The mapper
function takes pairs of record key (multimedia location) and record value (multimedia
ontology). For each pair, the retrieval engine executes all queries, computing each match with
the similarity function defined in formula (2). The MapReduce tool runs the mapper functions
in parallel on each machine. When the mapper functions finish, the MapReduce tool groups the
intermediate output by QueryId. For each QueryId, the reducer function, running locally on
each machine, simply takes the results whose similarity is above the average value and
outputs them into the ReturnedList. The pseudo code in Algorithm 2 outlines the retrieval
implementation.
Algorithm 2. MapReduce-based Retrieval Algorithm
 1  1. Input: (1) Records (containing RecordSize, RecordKey, RecordValue).
 2     (2) Queries (containing QueryIds, QueryOntologies).
 3  2. Initialize: Configure the Hadoop running environment.
 4  3. Retrieval:
 5     mapper (RecordKey, RecordValue)
 6       For each (QueryId, QueryOntology) in Queries
 7         sim = similarity(QueryOntology, RecordValue)
 8         If (sim > 0) Then
 9           output (QueryId, (RecordKey, RecordValue, sim))
10         End If
11       End For
12     End Function
13     reducer (QueryId, Pairs(RecordKey, sim))
14       avg = the average of all similarity values of QueryId
15       For each (RecordKey, sim) in Pairs
16         If (sim > avg) Then
17           insert (QueryId, (RecordKey, RecordValue)) into ReturnedList
18         End If
19       End For
20     End Function
21  4. Output:
22     For each (RecordKey, RecordValue) in ReturnedList
23       output (QueryId, (RecordKey, RecordValue))
24     End For
In this algorithm, the user query is split into <QueryId, QueryOntology> pairs. In the
mapper process, the QueryOntology of every pair is compared with the RecordValue, and all
matching records are cached. In the reducer process, only the records with greater
similarity are selected into the returned list. In addition, the returned list is sorted
by similarity using the insertion sort algorithm, which guarantees that records with
greater similarity appear earlier in the result list.
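The mapper/reducer flow of Algorithm 2 can be simulated in a single process. This is an illustrative sketch, not Hadoop code: the similarity follows Eq. (2) with the number of shared semantic fields over the product of field counts, the records and query are toy data, and the mapper emits only (RecordKey, sim) for brevity.

```python
# Single-process sketch of the MapReduce-based retrieval (Algorithm 2).
from collections import defaultdict

def similarity(fields_a, fields_b):
    """Eq. (2): shared fields over the product of field counts."""
    shared = len(set(fields_a) & set(fields_b))
    return shared / (len(fields_a) * len(fields_b))

def mapper(record_key, record_fields, queries):
    for query_id, query_fields in queries.items():
        sim = similarity(query_fields, record_fields)
        if sim > 0:
            yield query_id, (record_key, sim)

def reducer(query_id, pairs):
    """Keep only above-average matches, best first."""
    avg = sum(sim for _, sim in pairs) / len(pairs)
    return sorted((p for p in pairs if p[1] > avg),
                  key=lambda p: p[1], reverse=True)

records = {
    "img_01.jpg": ["artist", "concert", "stage"],
    "vid_07.mp4": ["artist", "concert"],
    "txt_03.txt": ["football", "score"],
}
queries = {"q1": ["artist", "concert"]}

intermediate = defaultdict(list)        # grouped by QueryId
for key, fields in records.items():
    for qid, pair in mapper(key, fields, queries):
        intermediate[qid].append(pair)

returned = {qid: reducer(qid, pairs) for qid, pairs in intermediate.items()}
print(returned)  # {'q1': [('vid_07.mp4', 0.5)]}
```

In the real system the mapper runs on each data node over its local records and only the grouped intermediate pairs travel to the reducers, so the heavy comparison work stays distributed.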
3.6 Semantic Field Refinement and User Feedback
For any M_i \in C, the semantic matrix S_{mi} stems from different users' understanding
or automatic learning. Hence, |S_{mi}| increases continuously during the use of SHMR. In
S_{mi}, wrong or less frequently used semantic fields inevitably exist, which waste
retrieval resources and storage space. To solve this problem, a semantic field
refinement scheme is designed to retain the higher-frequency annotations and eliminate the
annotations with less use. The scheme is detailed as follows:
Algorithm 3. Semantic Field Refinement Scheme
 1  1. Input: (1) Load semantic matrix S.
 2     (2) Define a threshold value \alpha (0 < \alpha < 1).
 3  2. Refinement: Set step=1
 4     For i=1 to N
 5       Load S_mi and compute t_mi = (\alpha/n) \sum_{j=1}^{n} w_j
 6       For j=1 to n
 7         If w_j < t_mi Then
 8           remove the j-th row from S_mi
 9         End If
10       End For
11       Rebuild S_mi
12     End For
13  3. Output: New semantic matrix S.
It can be seen from lines 5 to 10 that Algorithm 3 eliminates the fields whose weights
are less than the threshold value t_mi, which makes the semantic information increasingly
accurate. After the semantic field refinement, SHMR updates the ontology information in
HBase according to the location information of the multimedia documents.
The computation complexity of Algorithm 3 can be measured as a function of the
semantic matrix S. In this algorithm, the semantic field refinement scheme checks the semantic
fields of every multimedia document. For every document, computing t_mi and eliminating the
fields with less weight takes O(n) time. Therefore, in total, the whole running time of
Algorithm 3 is O(n|S|). This complexity is high and needs enormous computation resources,
so this algorithm is executed only at long intervals, such as every 24 hours.
SHMR supports user feedback: for a particular returned document, a social user can add
additional semantic fields to enrich the semantic information. For these semantic fields, the
initial weight is assigned as t_mi. During the retrieval process, the semantic fields become
increasingly abundant and accurate, and useless fields are removed gradually. Therefore,
SHMR is a dynamic architecture for long-term application.
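The refinement scheme of Algorithm 3 can be sketched as follows. The semantic matrix is again simplified to a field-to-weight dict, and the value of alpha and the example weights are made-up illustrations.

```python
# Sketch of the semantic field refinement scheme (Algorithm 3):
# fields whose weight falls below t_mi = (alpha/n) * sum(w) are removed.
def refine(s_mi, alpha=0.5):
    n = len(s_mi)
    t_mi = (alpha / n) * sum(s_mi.values())  # per-document threshold
    return {s: w for s, w in s_mi.items() if w >= t_mi}

s_mi = {"artist": 0.6, "concert": 0.3, "typo_tag": 0.02}
refined = refine(s_mi, alpha=0.5)
print(refined)  # the rarely used "typo_tag" field is eliminated
```

Run periodically (e.g. every 24 hours, as the text suggests), this keeps each document's ontology compact while the feedback loop keeps adding fresh fields at weight t_mi.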
4 Experimental Evaluations
4.1 Dataset and Experiment Tools
Many general datasets have been proposed for constructing experimental evaluations (Khan
et al., 2009; Lu et al., 2012; Yang et al., 2012; Costa et al., 2014). However, some of these
datasets only support experiments on particular multimedia types (e.g. image
and text files). Heterogeneous multimedia retrieval requires a wide variety of files, such as
images, videos, audio and text documents, so these datasets are not appropriate for performing
the experiments. In our experiment, a multimedia database containing various multimedia
types is constructed. This multimedia database stores 50,000 multimedia documents, including
20,000 images, 10,000 videos, 10,000 audio files, and 10,000 text documents. The documents are
gathered from Flickr, Wikipedia, and YouTube webpages. We use the same categorization
approach as Costa et al. (2014) and divide all the documents into 10 categories referring to the
10 most populated categories in Wikipedia featured articles, listed as: Art &
architecture, Biology, Geography & places, History, Literature & theatre, Media, Music,
Royalty & nobility, Sport & recreation, Warfare (Costa et al., 2014). In every category, file
Page 18 of 30
Accep
ted
Man
uscr
ipt
sub-categories are defined. Hence, every recognizable category contains about 1,000
multimedia documents.
The semantic information is collected in two ways. On the one hand, users provide text tags on shared multimedia files in Flickr, Wikipedia and YouTube webpages; we directly analyze the page structure and crawl these tags as the initial social annotations for simulation. On the other hand, the LDA model (Blei et al., 2003) and the STMT tool (Stanford Topic Modeling Toolbox, 2014) are employed to analyze the text descriptions and extract topic words. For each crawled webpage, the multimedia document, its file location and its semantic information are used to build the dataset according to the approach proposed in Section 3. For simplicity, the size range of the multimedia documents is restricted. Table 1 shows the initial annotation quantity and size range for the two approaches.
Table 1. Initial Annotation Quantity and Size Range

Multimedia Type      Image       Video      Audio      Text
Social Annotating    63,587      35,478     22,174     11,258
Automatic Learning   98,869      47,586     36,352     24,693
Size Range           300KB-1MB   3MB-10MB   1MB-4MB    10KB-30KB
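Following Section 3, each crawled page yields one dataset record combining the document, its file location and its semantic fields. A minimal sketch of this assembly (the record layout and count-based weighting are our illustrative assumptions, not the paper's exact scheme):

```python
def build_record(doc_id, location, social_tags, topic_words):
    """Combine crawled social annotations and LDA topic words into one
    semantic-field map for a multimedia document (hypothetical layout;
    weights here are raw occurrence counts)."""
    fields = {}
    for tag in social_tags + topic_words:
        fields[tag] = fields.get(tag, 0) + 1   # tag seen by both sources => heavier
    return {"doc": doc_id, "location": location, "semantic_fields": fields}
```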
The SHMR architecture is implemented on 10 computers, which together simulate a parallel and distributed system. Fig. 5 shows the running architecture used in the experiments.
[Fig. 5 depicts the cluster layout: users upload documents to a Web Server (Tomcat 6.0); node 01 serves as the master, nodes 01-10 act as slaves, and every node runs Ubuntu Linux, Hadoop 2.0 and OpenSSH.]

Fig. 5 Running Architecture of Experiments
In this architecture, every node is a low-end PC (2.0 GHz CPU, 2 GB RAM) installed with Ubuntu Linux, Hadoop 2.0 and the supporting tools (e.g. Java SDK 6.0, OpenSSH, etc.). The nodes are numbered from 01 to 10; node 01 is taken as the master node, and all 10 nodes are organized as slave nodes. Therefore, in total, this parallel system has 10 machines, 10 processors, 20 GB of memory, 10 disks and 10 slave data nodes. All the experiments are conducted in this simulated environment.
Some additional software tools were developed to verify the effectiveness of SHMR: (1) an annotation interface, which lets users annotate the multimedia documents (Guo et al., 2014); (2) a retrieval interface, a convenient operating interface similar to traditional commercial search engines, in which users can upload multimedia documents and submit the information to the server; it is developed in HTML5 and runs on typical terminals; and (3) a search engine, which is deployed in the Web Server (Tomcat 6.0). In the experiments, the threshold value of Algorithm 3 is set to 0.8 and the background process is executed every 24 hours.
4.2 Performance Evaluation Model
In this section, performance evaluation model will be designed to measure the
performance. The model is based on the following three criteria: precision rate, time cost and
storage cost.
(1) Precision rate. Precision rate is one of the most frequently used measurements for evaluating retrieval performance. For better comparison, we slightly modify the traditional definition and compute the precision rate over the top returned list R_t. Defining the set of all relevant multimedia documents as R_l, the precision rate p is the proportion of retrieved relevant documents in R_t:

p = |R_l ∩ R_t| / |R_t|    (3)
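Under this definition, equation (3) is a direct set computation:

```python
def precision_at_top(returned, relevant):
    """Precision over the top returned list R_t (Eq. 3):
    |R_l ∩ R_t| / |R_t|, or 0 when nothing is returned."""
    r_t, r_l = set(returned), set(relevant)
    return len(r_l & r_t) / len(r_t) if r_t else 0.0
```

For example, if 2 of the top 4 returned documents are relevant, p = 0.5.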
(2) Time cost. Time cost includes two factors. The first is the time cost of data processing. In SHMR, several background processes are time consuming. The background process time is defined as:

t_b = t_pre + t_ref    (4)

where t_pre is the preprocessing time (converting the semantic and multimedia location information to the map structure) and satisfies:

t_pre = Σ_{i=1}^{N} t_pre,i    (5)

and t_ref is the semantic field refinement time (eliminating redundant or erroneous semantic information and adding new semantic information from feedback).
The second factor is retrieval time. We define t_r as the time cost of a retrieval. In fact, t_r includes the extraction time (extracting semantic information from HBase) and the matching time (matching the semantic similarity between the sample document and the stored files).
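A sketch of how t_r could be measured around these two stages (`extract` and `match` are placeholders for the real HBase-backed operations, not SHMR's API):

```python
import time

def timed_retrieval(extract, match, sample, stored_docs):
    """Measure t_r = extraction time + matching time for one retrieval.
    extract(sample) yields the sample's semantic fields; match() decides
    whether a stored document is semantically similar."""
    start = time.perf_counter()
    semantics = extract(sample)                        # extraction stage
    results = [d for d in stored_docs if match(semantics, d)]  # matching stage
    t_r = time.perf_counter() - start                  # total retrieval time (s)
    return results, t_r
```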
(3) Storage cost. Because HBase stores the map information, the storage cost has to be taken into consideration. The storage increase rate p_s is defined as:

p_s = s_ont / s_org    (6)

where s_ont and s_org are, respectively, the total sizes of the ontology files and the multimedia documents:

s_ont = Σ_{i=1}^{N} s_ont,i,   s_org = Σ_{i=1}^{N} s_org,i    (7)
4.3 Precision Rate Evaluation
In the first experiment, we verify the effectiveness of our algorithm. We upload the sample multimedia document shown in Fig. 6(a) and search for multimedia documents similar to it. For simulation, the sample file has already been annotated by other users. Fig. 6(b) shows part of the returned documents. This retrieval costs 4218 ms and returns 625 images, 136 videos, 17 audios and 295 text documents. The time cost includes extracting semantic information from the sample document and matching the semantic fields in HBase.
Fig. 6. An Example of Multimedia Retrieval: (a) Sample Document; (b) Some Returned Documents from SHMR
To demonstrate heterogeneous retrieval performance, in the second experiment we record the precision rates of submitting one sample file to search across the four document types (e.g. using an image to search for images, videos, audios and text documents). For every document type, we perform 10 different retrievals using 10 sample documents randomly chosen from the multimedia dataset and compute the average precision rates, which are illustrated in Fig. 7.
[Fig. 7 plots the average precision rates (roughly 66%-84%) for each sample type (image, video, audio, text) against each target type.]

Fig. 7. Average Precision Rates of Heterogeneous Retrieval
Fig. 7 indicates that the precision rates are not reduced even when retrieving across different multimedia types. This is because SHMR completely abandons physical feature extraction and executes the retrieval process based only on semantic fields.
The third experiment compares the precision of our algorithm with three typical heterogeneous multimedia retrieval approaches: IBCR (Lu et al., 2012), LRGA (Yang et al., 2012) and SCM (Costa et al., 2014). Images, videos, audios and text documents are used as sample files to execute retrieval. For every sample type, we perform 10 different retrievals and compute the average precision rates. Since not all approaches support every document type, we calculate the precision rate only for the supported ones. The average precision rates are shown in Fig. 8.
[Fig. 8 plots precision rate (%) against |R_t| (100-500) for IBCR, LRGA, SCM and SHMR.]

Fig. 8. Precision Rates Comparison
Fig. 8 shows that SHMR achieves good retrieval precision rates. However, this experiment reveals no obvious advantage over the existing approaches; for instance, SHMR's precision rate hardly exceeds that of SCM.
To demonstrate the effectiveness of user feedback, in the fourth experiment we record the precision rates after asking social users to give feedback annotations on the returned documents. We define the feedback quantity as the number of feedback annotations. For every feedback quantity (0, 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100), we perform 10 different retrievals and compute the average precision rates. The precision rates at each feedback quantity are illustrated in Fig. 9.
[Fig. 9 plots precision rate (%) against feedback quantity (0-100) for IBCR, LRGA, SCM and SHMR.]

Fig. 9. Precision Rates After User Feedback
Fig. 9 indicates that user feedback increases precision. As the feedback quantity grows, the gap widens substantially. From this comparison, we can see that SHMR outperforms the other approaches when user feedback is available.
4.4 Time Cost Evaluation
In order to carry out the retrieval process, SHMR has to perform several background
processes whose time cost is bt , which includes pret and reft . Table 2 shows the time cost
of the background processes.
Table 2. Time Cost of Background Processes (s)

Multimedia Type   t_pre   t_ref   t_b
Image             102     34      136
Video             57      21      78
Audio             56      19      75
Text              83      34      117
Table 2 shows that t_pre and t_ref take many seconds (t_b is about 136 seconds for the image type, 78 seconds for video, 75 seconds for audio and 117 seconds for text). However, the background processes are not always running: the preprocess is executed once at initialization, and the semantic field refinement is executed every 24 hours in a background thread.
Next, we measure the retrieval time t_r. We record the time cost of 16 retrieval processes: for every document type (image, video, audio and text), we perform 4 different retrievals (samples numbered 01 to 04) and record t_r for each. The detailed time costs of the 16 retrievals are listed in Table 3.
Table 3. Time Cost of 16 Retrievals (ms)

Sample Type   01     02     03     04
Image         3928   4350   3814   3020
Video         4215   4624   5742   5235
Audio         4521   4012   3871   3214
Text          3785   3541   3020   3147
Table 3 shows that semantic information extraction costs only a very short time, because we only need to extract the semantic segment directly from the sample document.
To compare performance as the data scale increases, we perform a further experiment comparing the time cost of our algorithm with the other approaches, using datasets of different scales: document quantities of 5,000, 10,000, 15,000, 20,000, 25,000, 30,000, 35,000, 40,000, 45,000 and 50,000. For every approach, we perform 10 different retrievals and compute the average time cost, shown in Fig. 10.
[Fig. 10 plots the average time cost (2000-8000 ms) against document quantity (×100, from 100 to 500) for IBCR, LRGA, SCM and SHMR.]

Fig. 10. Time Cost Comparison When Data Scale Increases
Fig. 10 indicates that although SHMR has no obvious advantage at small data scales, its time cost becomes significantly lower than the other approaches as the database grows. Therefore, SHMR is well suited to heterogeneous multimedia retrieval in distributed big data environments.
4.5 Storage Cost Evaluation
In this section, storage cost will be taken into consideration because the HBase has to
store the ontology represented semantic files. Table 4 shows the storage space cost in our
architecture.
Table 4. Storage Space Cost (MB)

Multimedia Type   s_ont   s_org   p_s (%)
Image             319     15023   2.12
Video             182     73007   0.25
Audio             163     23575   0.69
Text              43.2    215     20.09
Table 4 shows that the semantic information files occupy 20.09% of the original size for the text type, because the semantic information in text files is abundant. However, the storage overhead is negligible for the image, video and audio types (p_s is about 2.12%, 0.25% and 0.69%, respectively).
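As a sanity check, the p_s column of Table 4 follows directly from equation (6) applied to the listed sizes:

```python
# Recompute Table 4's storage increase rates p_s = s_ont / s_org (sizes in MB).
table4 = {
    "Image": (319, 15023),
    "Video": (182, 73007),
    "Audio": (163, 23575),
    "Text":  (43.2, 215),
}
rates = {k: round(100 * s_ont / s_org, 2) for k, (s_ont, s_org) in table4.items()}
# rates == {'Image': 2.12, 'Video': 0.25, 'Audio': 0.69, 'Text': 20.09}
```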
5 Conclusions
In this paper, a novel architecture named SHMR, supporting semantic multimedia retrieval in heterogeneous big data environments, has been proposed. We described semantic field extraction, data storage, semantic-based multimedia retrieval and the performance evaluation model. The architecture consists of four independent steps. The algorithms proposed in this paper solve two critical problems in semantic-based heterogeneous multimedia retrieval: first, how to eliminate the noise in semantic information that does not reflect users' intent, so as to guarantee better retrieval precision; and second, given multimedia documents with semantic information, how to convert that information to the map structure and retrieve it from the database.
The framework possesses excellent economic efficiency. On the Hadoop-based platform, users only need to purchase some cheap computers to perform data storage and retrieval, which reduces the hardware investment. Open-source tools such as Ubuntu Linux, the Java SDK and Hadoop can be freely downloaded from the corresponding websites, saving software investment. In addition, Apache Hadoop provides simplified programming models for reliable, scalable, distributed computing, allowing distributed processing of large data sets across clusters of computers (Leverich and Kozyrakis, 2010). This saves learning cost.
We conducted several experiments on the proposed framework. The comparisons demonstrated that the framework achieves remarkable performance, especially as the data scale increases; after user feedback, its precision outperforms the existing approaches. In addition, storage and I/O costs in the architecture are significantly reduced by the proposed scheme.
However, in this paper the experimental dataset was acquired from specific websites such as Flickr, Wikipedia and YouTube, and the semantic provision by social users is still a simulation. In future work, we plan to explore several improvements to SHMR, including performing the experiments in a real Internet environment and increasing the retrieval speed.
Acknowledgments
We thank Professor Li Kuang for assistance with the research on semantic-based heterogeneous multimedia big data retrieval. This work is supported by the Hunan Science and Technology Plan (2012RS4054), the Natural Science Foundation of China (61202341), the China Scholarship Council (201308430049) and the Major Science & Technology Research Program for Strategic Emerging Industry of Hunan (2012GK4054). The authors declare that they have no conflict of interest.
References
Apache Hadoop, 2013. Available at http://hadoop.apache.org/ (accessed on 1 November 2013).
Apache HBase, 2013. Available at http://hbase.apache.org/ (accessed on 1 November 2013).
Blei, D.M., Ng, A.Y., Jordan, M.I., 2003. Latent Dirichlet allocation. Journal of Machine
Learning Research 3, 993-1022.
Chang, F., Dean, J., Ghemawat S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T.,
Fikes, A., Gruber, R. E., 2008. Bigtable: A distributed storage system for structured data.
ACM Transactions on Computer Systems 26, 1-26.
Chen, J., Yang, Y., 2011. Temporal dependency-based checkpoint selection for dynamic
verification of temporal constraints in scientific workflow systems. ACM Transactions on
Software Engineering and Methodology 20, 9.
Chen, R., Cao, Y.F., Sun, H., 2011. Active sample-selecting and manifold learning-based
relevance feedback method for synthetic aperture radar image retrieval. IET Radar, Sonar
& Navigation 5, 118-127.
Costa Pereira, J., Coviello, E., Doyle, G., Rasiwasia, N., Lanckriet, G., Levy, R., Vasconcelos,
N., 2014. On the role of correlation and abstraction in cross-modal multimedia retrieval.
IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 521-535.
Datta, R., Joshi, D., Li, J., Wang, J. Z., 2008. Image retrieval: ideas, influences, and trends of
the new age. ACM Computing Surveys 40, 1-60.
Dean, J., Ghemawat, S., 2008. MapReduce: simplified data processing on large clusters.
Communications of the ACM 51, 107-113.
Ghemawat, S., Gobioff, H., Leung, S.T., 2003. The Google file system. In: Proceedings of the
19th ACM Symposium on Operating Systems Principles, New York, USA, pp. 29-43.
Gijsenij, A., Gevers, T., 2010. Color constancy using natural image statistics and scene
semantics. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 687-698.
Guo, K., Ma, J., Duan, G., 2014. DHSR: a novel semantic retrieval approach for ubiquitous
multimedia. Wireless Personal Communications 76, 779-793.
Guo, K., Zhang, S., 2013. A semantic medical multimedia retrieval approach using ontology information hiding. Computational and Mathematical Methods in Medicine 2013, 407917.
He, X., 2010. Laplacian regularized D-Optimal design for active learning and its application
to image retrieval. IEEE Transactions on Image Processing 19, 254-263.
Hofmann, T., 2001. Unsupervised learning by probabilistic latent semantic analysis. Machine
Learning 42, 177-196.
Khan, I., Saffari, A., Bischof, H., 2009. Tvgraz: Multi-modal learning of object categories by
combining textual and visual features. In: Proceedings of 33rd Workshop Austrian
Association for Pattern Recognition, Austria, pp. 213-224.
Leverich, J., Kozyrakis, C., 2010. On the energy (in) efficiency of hadoop clusters. ACM
SIGOPS Operating Systems Review 44, 61-65.
Liu, C., Chen, J., Yang, L., Zhang, X., Yang, C., Ranjan, R., Ramamohanarao, K., 2013.
Authorized public auditing of dynamic big data storage on cloud with efficient verifiable
fine-grained updates. IEEE Transactions on Parallel and Distributed Systems 99, online.
Liu, K., Wei, S., Zhao, Y., Zhu, Z., Wei, Y., Xu, C., 2014. Accumulated reconstruction error
vector (AREV): a semantic representation for cross-media retrieval. Multimedia Tools
and Applications, online.
Lu, B., Wang, G. R., Yuan, Y., 2012. A novel approach towards large scale cross-media
retrieval. Journal of Computer Science and Technology 27, 1140-1149.
Maedche, A., Staab, S., 2001. Ontology learning for the semantic web. IEEE Intelligent
systems 16, 72-79.
Rasiwasia, N., Costa Pereira, J., Coviello, E., Doyle, G., Lanckriet, G. R., Levy, R.,
Vasconcelos, N., 2010. A new approach to cross-modal multimedia retrieval.
In: Proceedings of the ACM international conference on Multimedia, Firenze, Italy, pp.
251-260.
Smeulders, A. W., Worring, M., Santini, S., Gupta, A., Jain, R., 2000. Content-based image
retrieval at the end of the early years. IEEE Transactions Pattern Analysis and Machine
Intelligence 22, 1349-1380.
Smith, J. R., 2011. History made everyday. IEEE Multimedia 18, 2-3.
Smith, J. R., 2012. Minding the gap. IEEE MultiMedia 19, 2-3.
Stanford Topic Modeling Toolbox, 2014. Available at http://nlp.stanford.edu/software/tmt/tmt-0.4/
(accessed on 1 April 2014).
Wang, J., Liu, Z., Zhang, S., Zhang, X., 2014. Defending collaborative false data injection
attacks in wireless sensor networks. Information Sciences 254, 39-53.
Wang, X., Lv, T., Wang, S., Wang, Z., 2008. An ontology and swrl based 3d model retrieval
system. Lecture Notes in Computer Science 4993, 335-344.
Wang, X.Y., Chen, J.W., Yang, H.Y., 2011. A new integrated SVM classifiers for relevance
feedback content-based image retrieval using EM parameter estimation. Applied Soft
Computing 11, 2787-2804.
Wong, R.C.F., Leung, C.H.C., 2008. Automatic semantic annotation of real-world Web
images. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 1933-1944.
Wu, L., Hoi, S. C., Yu, N., 2010. Semantics-preserving bag-of-words models and
applications. IEEE Transactions on Image Processing 19, 1908-1920.
Wang, G., Jiang, W., Wu, J., Xiong, Z., 2014. Fine-grained feature-based social influence
evaluation in online social networks. IEEE Transactions on Parallel and Distributed
Systems 25, 2286-2296.
Yang, D., Dong, M., Miao, R., 2008. Development of a product configuration system with an
ontology-based approach. Computer-Aided Design 40, 863-878.
Yang, Y., Nie, F., Xu, D., Luo, J., Zhuang, Y., Pan, Y., 2012. A multimedia retrieval
architecture based on semi-supervised ranking and relevance feedback. IEEE
Transactions on Pattern Analysis and Machine Intelligence 34, 723-742.
Zhai, X., Peng, Y., Xiao, J., 2013. Cross-media retrieval by intra-media and inter-media
correlation mining. Multimedia Systems 19, 395-406.
Zhang, L., Wang, L., Lin, W., 2012. Generalized Biased Discriminant Analysis for content-
based image retrieval. IEEE Transactions on Systems, Man, and Cybernetics, Part B:
Cybernetics 42, 282-290.
Zhao, R., Grosky, W. I., 2002. Narrowing the semantic gap-improved text-based Web
document retrieval using visual features. IEEE Transactions on Multimedia 4, 189-200.
Zhou, G. T., Ting, K. M., Liu, F. T., Yin, Y., 2012. Relevance feature mapping for content-
based multimedia information retrieval. Pattern Recognition 45, 1707-1720.