


ARTICLE IN PRESS JID: JSS [m5G;December 8, 2014;12:13]

The Journal of Systems and Software 000 (2014) 1–9

Contents lists available at ScienceDirect

The Journal of Systems and Software

journal homepage: www.elsevier.com/locate/jss

Semantic based representing and organizing surveillance big data using video structural description technology

Zheng Xu b,a,∗ , Yunhuai Liu a, Lin Mei a, Chuanping Hu a, Chen Lan a

a The Third Research Institute of Ministry of Public Security, Shanghai, China
b Tsinghua University, China

Article info

Article history:

Received 23 September 2013

Revised 27 May 2014

Accepted 13 July 2014

Available online xxx

Keywords:

Video structural description

Surveillance big data

Big data representing and organizing

Abstract

Big data is an emerging paradigm applied to datasets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. In particular, the data volume of all video surveillance devices in Shanghai, China, is up to 1 TB every day. Thus, it is important to accurately describe video content and to enable organizing and searching potential videos in order to detect and analyze related surveillance events. Unfortunately, raw data and low level features cannot meet the needs of video based tasks. In this paper, a semantic based model is proposed for representing and organizing video big data. The proposed surveillance video representation method defines a number of concepts and their relations, which allows users to use them to annotate related surveillance events. The defined concepts include persons, vehicles, and traffic signs, which can be used for annotating and representing video traffic events unambiguously. In addition, the spatial and temporal relations between objects in an event are defined, which can be used for annotating and representing the semantic relations between objects in related surveillance events. Moreover, a semantic link network is used for organizing video resources based on their associations. In the application, one case study is presented to analyze the surveillance big data.

© 2014 Elsevier Inc. All rights reserved.

1. Introduction

Big data is an emerging paradigm applied to datasets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time (Wigan and Clarke, 2013). Such datasets often come from various sources (Variety) yet are unstructured, such as social media, sensors, scientific applications, surveillance, video and image archives, Internet texts and documents, Internet search indexing, medical records, business transactions, and web logs; and they are of large size (Volume) with fast data in/out (Velocity). More importantly, big data has to be of high value (Value). Various technologies are being discussed to support the handling of big data, such as massively parallel processing databases (Yuan et al., 2013), scalable storage systems (Zhang et al., 2013a), cloud computing platforms (Liu et al., 2013), and MapReduce (Zhang et al., 2013b). Distributed systems are a classical research discipline investigating various distributed computing technologies and applications such as cloud computing (Yan et al., 2013a, 2013b; Lizhe et al., 2010) and MapReduce (Ze et al., 2014; Dan et al., 2013). With new paradigms and technologies, distributed systems research keeps producing innovative outcomes from both industry and academia.

Recent research shows that videos "in the wild" are growing at a staggering rate (Cisco Visual Networking Index, 2013; Great Scott, 2013). For example, with the rapid growth of video resources on the World Wide Web, on YouTube1 alone, 35 h of video are uploaded every minute, and over 700 billion videos were watched in 2010. Vast amounts of videos with no metadata have emerged. Thus, automatically understanding raw videos solely based on their visual appearance becomes an important yet challenging problem. The rapidly increasing number of video resources has brought an urgent need to develop intelligent methods to represent and annotate video events. Typical applications in which video events are represented and annotated include criminal investigation systems (Wu and Wang, 2010), video surveillance (Liu et al., 2009), intrusion detection systems (Zhang et al., 2008), video resource browsing and indexing systems (Yu et al., 2012), sport event detection (Xu et al., 2008), and many others. These urgent needs have posed challenges for video resource management, and have attracted the research

∗ Corresponding author at: Tsinghua University, China. Tel.: +86 13817917970. E-mail address: [email protected] (Z. Xu).
1 www.youtube.com.

http://dx.doi.org/10.1016/j.jss.2014.07.024
0164-1212/© 2014 Elsevier Inc. All rights reserved.

Please cite this article as: Z. Xu et al., Semantic based representing and organizing surveillance big data using video structural description technology, The Journal of Systems and Software (2014), http://dx.doi.org/10.1016/j.jss.2014.07.024



of the multimedia analysis and understanding. Overall, the goal is to enable users to search the related events from the huge number of video resources. The ultimate goal of extracting video events brings the challenge of building an intelligent method to automatically detect and retrieve video events.

In fact, the huge number of newly emerging video surveillance data becomes a new application field of big data. The processing and analysis of video surveillance data follow the 4V features of big data.

(1) Variety: Video surveillance data comes from different devices such as traffic cameras, hotel cameras, and so on. Besides coming from different kinds of surveillance device, these devices are also located in different regions. The distributed nature of video surveillance data augments the variety of the resources. For example, in criminal investigation systems, video surveillance data from different surveillance devices is processed and analyzed to detect related people, cars, or things. The variety of video surveillance devices brings big challenges for storing and managing distributed video surveillance data.

(2) Volume: With the rapid deployment of surveillance devices (for example, the number of surveillance devices in Shanghai is up to 200,000), video surveillance data becomes big data. The data volume of all video surveillance devices in Shanghai is up to 1 TB every day. The whole volume of all video surveillance data in Shanghai Pudong is up to 25 PB. The huge volume of video surveillance data brings big challenges for processing and analyzing distributed video surveillance data.

(3) Velocity: Video surveillance devices produce fast data in/out. They usually work 24 h per day and collect real-time videos, which are usually uploaded to a storage server or data center. The velocity of collecting video surveillance data is higher than that of processing and analyzing it; for example, the speed of processing and analyzing video surveillance data is much lower than the speed of collecting it. This high velocity brings big challenges for processing and analyzing video surveillance data.

(4) Value: Video surveillance data usually has high value. For example, in criminal investigation systems, video surveillance can help the police find a suspect. In a traffic surveillance system, the video data can detect illegal vehicles or people. On the other hand, the huge volume brings challenges for mining the value from the video surveillance data. The phenomenon of "high volume, low value" also exists in video surveillance big data.

In this paper, a semantic based model for representing and organizing video resources is proposed for bridging the gap between low-level representative features and high-level semantic content in terms of object, event, and spatial and temporal relation extraction. The proposed model is named Video Structural Description (VSD). In order to meet the need of representing and annotating objects, events, and spatial–temporal relations during the video understanding process, a wide-domain applicable traffic ontology that uses objects and spatial/temporal relations in an event is developed. In order to organize the video resources based on their associations, a semantic link network (Zhuge, 2011) based method is used. The major contributions of this paper are summarized as follows.

(1) A whole framework for building the domain ontology of VSD is proposed. The basic concepts, events, and relations of a given domain are defined. Moreover, a domain-independent rule construction standard is given to construct domain ontologies. Domain ontologies are enriched by including additional rule definitions.

(2) The proposed method defines a number of concepts and their relations, which allows users to use them to detect video traffic events. A number of concepts including person, vehicle, and traffic sign are given, which can be used by users for annotating and representing video traffic events unambiguously. In addition, the spatial and temporal relations in an event are proposed, which can be used for annotating and representing the semantic relations between objects in video traffic events.

(3) In order to organize the video resources, a semantic link network based method is used. The semantic link network model can mine and organize video resources based on their associations.

(4) A semantic video annotation tool is implemented for annotating and organizing video resources based on the video annotation ontology. The annotation tool allows annotators to use domain specific vocabularies from the traffic field to describe the video resources. These annotated video resources are managed based on the semantic relations between annotations. A semantic-based video organizing platform is provided for searching videos. It supports reasoning operations over the annotations of video resources.

The organization of the paper is as follows. In Section 2, the related work is reviewed. The proposed VSD framework is given in Section 3. In Section 4, the ontology of the traffic events domain is built. In Section 5, the semantic link network model is proposed to mine and organize video resources based on their associations. In Sections 6 and 7, the application and a case study for mining video surveillance data are given. Finally, the conclusions and future research directions are discussed.

2. Related work

The key issue in semantic content extraction from videos is the representation of the semantic content. Many researchers have studied this from different aspects. A simple representation method may associate the video events with low level features (texture, shape, color, etc.) using frames or shots from videos. These simple methods do not use any relations between features, such as spatial or temporal relations. Obviously, using spatial or temporal relations between objects in videos is important for achieving accurate extraction of events. Systems such as BilVideo (Donderler et al., 2005), extended-AVIS (Sevilmis et al., 2008), multiView (Fan et al., 2001) and classView (Fan et al., 2004) used spatial and temporal relations but do not have ontology-based models for semantic content representation. Bai et al. (2007) presented a semantic based framework using domain ontology. Their work represents video events with temporal description logic. However, the event extraction is manual and event descriptions only use temporal information. Nevatia and Natarajan (2005) gave an ontology model using spatial–temporal relations to extract complex events, where the extraction process is manual. In Bagdanov et al. (2007), each defined concept is related to a corresponding visual concept with only temporal relations for soccer videos. Nevatia and Natarajan (2005) built an event ontology for natural representation of complex spatial–temporal events given simpler events. A Video Event Recognition Language (VERL) (Nevatia et al., 2005) that allows users to define events without interacting with the low level processing is defined. VERL is intended to be a language for representing events for the purpose of designing the ontology of the domain, and Video Event Markup Language (VEML) is used to manually annotate VERL events in videos. The lack of low level processing and the use of manual annotation are the drawbacks of this study. Akdemir et al. (2008) present a systematic approach to address



Fig. 1. The hierarchical structure of VSD.


the problem of designing ontologies for visual activity recognition. The general ontology design principles are adapted to the specific domain of human activity ontologies using spatial–temporal relations between contextual entities. However, most of the contextual entities which are utilized as critical entities in spatial and temporal relations must be manually provided for activity recognition. Some researchers pay attention to symbolic representation, i.e. semantic relations between visual symbols. Marszalek et al. (2007) used semantic hierarchies from WordNet to integrate prior knowledge about inter-class relationships into visual appearance learning. Deng et al. (2009) launched Image-Net, aiming at building the synsets in WordNet with an average of 500–1000 images selected manually by humans. Yao et al. (2010) presented an image parsing to text description (I2T) framework that generates text descriptions of image and video content based on image understanding.

3. The overview of video structural description

Video structural description (VSD) aims at parsing video content into text information, using spatiotemporal segmentation (Chen and Ahuja, 2012), feature selection (Javed et al., 2012), object recognition (Choi et al., 2012), and semantic web technology (Luo et al., 2011; Xu et al., 2011; Liu et al., 2010, 2011; Plebani and Pernici, 2009). The parsed text information preserves the semantics of the video content, which can be understood by both human and machine. Generally speaking, the definition of VSD includes two aspects. Firstly, VSD aims at extracting the semantic content from the video. Relying on the standard video content description mechanism, the objects and their features in the video are recognized and expressed in the form of text. Secondly, VSD aims at organizing the video resources with their semantic relations. With semantic links across multiple cameras, it is possible to use data mining methods for effective analysis and semantic retrieval of videos. Moreover, semantic linking between the video resources and other information systems becomes possible. VSD is the foundation of building the next generation of intelligent and semantic video surveillance networks. VSD also makes systematical, interconnected, and diverse applications on video surveillance systems possible. With the help of VSD, the simple data acquisition mode of a video surveillance system can be transferred to an integrated mode of data acquisition, content processing, and semantic information services. The primary key issue and main innovation of VSD is the integration of video understanding and semantic web technologies. The semantic web technologies are used for representing and organizing the huge number of video resources.

3.1. The hierarchical structure of VSD

VSD is set as a hierarchical semantic data model including three different layers. The different layers of VSD are illustrated in Fig. 1.

(1) Pattern recognition layer: In this layer, VSD technology extracts and represents the content of the videos. For example, the people, vehicles, and traffic signs of a traffic video are extracted. Different from existing video content extraction and representation methods, VSD uses a domain ontology including basic concepts, events, and relations. These domain ontologies can be used by users for annotating and representing video traffic events unambiguously. In addition, the spatial and temporal relations are defined in event and concept definitions, which can be used by users for annotating and representing the semantic relations between objects in video traffic events.

(2) Video resources layer: In the pattern recognition layer, VSD extracts and represents the content of a single video. In the video resources layer, VSD technology aims at linking the video resources with their semantic relations. Similar to the World Wide Web, which uses hyperlinks to link resources, VSD uses semantic links instead of hyperlinks to link video resources.

(3) User demands layer: The pattern recognition layer and video resources layer focus on processing video resources using their semantics. The user demands layer focuses on processing the needs of users and returning the related resources. In the user demands layer, the video resources are clustered and integrated according to the user's needs.

From Fig. 1, the bottom layer consists of different objects. These objects, recognized by related pattern recognition methods, compose single videos. The middle layer consists of different videos. These videos consist of the different objects from the bottom layer. Semantic relations also exist between video resources. In the top layer, users can search, annotate, and browse the related video resources. For example, if a user wants to know which vehicles cross the red traffic light in a video, the video resources layer can return the related videos.
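The three-layer model described above can be sketched as plain data structures: recognized objects feed into annotated videos, and a user-level query runs over the annotations. This is an illustrative sketch only; the class and function names (DetectedObject, Video, search_events) are assumptions, not part of the paper.

```python
# Hypothetical sketch of the three VSD layers as data structures.
from dataclasses import dataclass, field

@dataclass
class DetectedObject:          # pattern recognition layer: one recognized object
    concept: str               # e.g. "vehicle", "traffic_light"
    attributes: dict

@dataclass
class Video:                   # video resources layer: a video plus semantic links
    video_id: str
    objects: list
    annotations: set = field(default_factory=set)  # e.g. {"cross_red_light"}
    links: dict = field(default_factory=dict)      # semantic links to other videos

def search_events(videos, event):
    """User demands layer: return ids of videos annotated with the event."""
    return [v.video_id for v in videos if event in v.annotations]

v1 = Video("cam01-0800", [DetectedObject("vehicle", {"color": "red"})],
           {"cross_red_light"})
v2 = Video("cam02-0800", [DetectedObject("person", {})], {"jaywalking"})
print(search_events([v1, v2], "cross_red_light"))  # -> ['cam01-0800']
```

In this reading, the red-light query from the example above is simply a lookup over event annotations, while the `links` field carries the semantic relations between videos used in the middle layer.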

3.2. The supporting technologies of VSD

In this section, the supporting technologies of VSD are introduced. These technologies are used in the different layers of VSD to achieve the ultimate goal of VSD. The supporting technologies are listed as follows.

(1) Computer vision: Computer vision is a field that includes methods for acquiring, processing, analyzing, and understanding images. A theme in the development of this field has been to duplicate the abilities of human vision by electronically perceiving and understanding an image. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory. The computer vision technologies can be used in the pattern recognition layer. For example, the cars and people in a traffic video can be detected by object detection technologies from the computer vision field.

(2) Semantic web: The Semantic Web (Berners-Lee et al., 2001; Ma et al., 2010; Zhuge, 2009) is a collaborative movement led by the international standards body, the World Wide Web Consortium (W3C). The standard promotes common data formats on the World Wide Web. By encouraging the inclusion of semantic


Fig. 2. An example of representing a key frame of a video using the domain ontology.

Fig. 3. An example ontology.


content in web pages, the Semantic Web aims at converting the current web, dominated by unstructured and semi-structured documents, into a "web of data". The semantic web technology can be used in the pattern recognition layer. For example, with the help of specific domain ontologies, the objects and relations of videos can be detected accurately.

(3) Semantic link network: A semantic link network (SLN) is a relational network consisting of the following main parts: a set of semantic nodes, a set of semantic links between the nodes, and a semantic space. Semantic nodes can be anything. The semantic link between nodes is regulated by the attributes of nodes or generated by interactions between nodes. The semantic space includes a classification hierarchy of concepts and a set of rules for reasoning about and inferring semantic links, for influencing nodes and links, for networking, and for evolving the network. The semantic link network can be used in the video resources layer. For example, with the help of the semantic link network model, the videos can be organized with their semantic relations.

(4) Cloud computing: Cloud computing is a colloquial expression used to describe a variety of different computing concepts that involve a large number of computers connected through a real-time communication network. In science, cloud computing is a synonym for distributed computing over a network and means the ability to run a program on many connected computers at the same time. The cloud computing technologies can be used in the video application layer. For example, with the help of cloud computing technologies, the huge number of videos can be managed and indexed efficiently and robustly.

4. The bottom layer – building domain ontology for representing video surveillance data

In this section, the domain ontology of traffic events is built. Since the number of traffic videos is huge, a standard ontology can help to represent videos accurately and efficiently.

4.1. Basic definitions

Concepts, objects, attributes, spatial relations, temporal relations, and events are the basic components of the proposed ontology framework. In this section, we give basic definitions of these components. Moreover, we add the ontology constraints which are used to give the standard when building ontologies. Figs. 2 and 3 give an example of representing a key frame of a video using the domain ontology.

Definition 1. Domain Ontology (DO): Domain ontology is the standard representation of a special domain, including concepts, objects, attributes, spatial relations, temporal relations, and events. The domain ontology can be denoted as

DO = {Concept, Attribute, Object, Temporal relation, Spatial relation, Event} (1)


Definition 2. Concept (C): Concept is the standard taxonomy of the objects of a special domain. Concepts are similar to the nodes of WordNet.2 The concept set can be denoted as

Concept = {c1, c2, . . . , cm} (2)

where m is the number of concepts of the domain ontology.

Definition 3. Object (O): Object is an extracted component from a video. Each extracted object is mapped to a concept. The object set can be denoted as

Object = {o1, o2, . . . , on}, (∀oi → ∃cj ∈ Concept) → oi ⇒ cj (3)

2 www.Wordnet.princeton.edu.


Fig. 4. The hierarchical structure of traffic signs.


where n is the number of objects of the domain ontology, and oi ⇒ cj denotes the mapping operation from object to concept.

Definition 4. Attribute (A): Attribute is a visual feature of the objects in a video. The attribute set can be denoted as

Attribute = {a1, a2, . . . , ak} (4)

where k is the number of attributes of the domain ontology; thus, an object can be represented as a vector of attributes:

oi = {a1, a2, . . . , ak} (5)

Definition 5. Temporal Relation (TR): Temporal relation is the timing relation between the different time intervals of a video. The temporal relation can be denoted as

TR = {before, during, overlap, equal, meet} (6)

Suppose two time intervals 〈t1, t2〉 and 〈t3, t4〉; the temporal relations can be denoted as

before(〈t1, t2〉, 〈t3, t4〉) → t2 < t3
during(〈t1, t2〉, 〈t3, t4〉) → t1 > t3 ∧ t2 < t4
overlap(〈t1, t2〉, 〈t3, t4〉) → t1 < t3 ∧ t3 < t2
equal(〈t1, t2〉, 〈t3, t4〉) → t1 = t3 ∧ t2 = t4
meet(〈t1, t2〉, 〈t3, t4〉) → t2 = t3 (7)
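The five interval relations of Eq. (7) transcribe directly into boolean predicates; the following sketch assumes intervals are given as (start, end) pairs with start < end.

```python
# The temporal relations of Eq. (7) as predicates on intervals
# i = (t1, t2) and j = (t3, t4), with t1 < t2 and t3 < t4.

def before(i, j):   return i[1] < j[0]                    # t2 < t3
def during(i, j):   return i[0] > j[0] and i[1] < j[1]    # t1 > t3 and t2 < t4
def overlap(i, j):  return i[0] < j[0] and j[0] < i[1]    # t1 < t3 < t2
def equal(i, j):    return i[0] == j[0] and i[1] == j[1]  # t1 = t3 and t2 = t4
def meet(i, j):     return i[1] == j[0]                   # t2 = t3

print(before((0, 5), (6, 9)))   # -> True
print(during((3, 4), (1, 8)))   # -> True
print(overlap((0, 5), (3, 9)))  # -> True
print(meet((0, 5), (5, 9)))     # -> True
```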

Definition 6. Spatial Relation (SR): Spatial relation is the position relation between the different objects of a video. The spatial relation can be denoted as

SR = {inside, touch, partially inside, right, left, above, below, far, near} (8)

Suppose the coordinates of two objects are

oi = 〈[x1, y1], [x2, y2]〉, oj = 〈[x3, y3], [x4, y4]〉 (9)

It is noted that the shape of an object is a rectangle, which means that [x1, y1] and [x2, y2] are the upper left and lower right coordinates of the object. The spatial relations can be denoted as

inside(oi, oj) → x1 < x3 ∧ x4 < x2 ∧ y1 < y3 ∧ y4 < y2
touch(oi, oj) → x2 = x3 ∨ y2 = y3
partially inside(oi, oj) → (x1 < x3 ∧ (x2 < x4 ∨ y1 > y3 ∨ y2 < y4)) ∨ (y1 < y3 ∧ (y2 < y4 ∨ x1 > x3 ∨ x2 < x4))
right(oi, oj) → x2 < x3
left(oi, oj) → x1 > x4
above(oi, oj) → y2 < y3
below(oi, oj) → y1 > y4
far(oi, oj) → x3 − x2 > α ∨ x1 − x4 > α ∨ y3 − y2 > α ∨ y1 − y4 > α
near(oi, oj) → x3 − x2 < α ∨ x1 − x4 < α ∨ y3 − y2 < α ∨ y1 − y4 < α (10)

where α is a threshold that distinguishes the far and near relations.

efinition 7. Event (E): Event is the combine of the objects and their

patial–temporal relation, which can be denoted as

vent = (object, spatial relation, temporal relation) (11)
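The interval and rectangle predicates of Definitions 5 and 6 translate directly into code. The following is a minimal sketch of Eqs. (6)–(10); the function names and the default α value are our own illustration, not part of the VSD specification:

```python
# Temporal relations between time intervals a = (t1, t2) and b = (t3, t4),
# following Eqs. (6)-(7).
def before(a, b):  return a[1] < b[0]
def during(a, b):  return a[0] > b[0] and a[1] < b[1]
def overlap(a, b): return a[0] < b[0] and b[0] < a[1]
def equal(a, b):   return a[0] == b[0] and a[1] == b[1]
def meet(a, b):    return a[1] == b[0]

# Spatial relations between axis-aligned rectangles given as
# ((x1, y1), (x2, y2)) upper-left / lower-right corner pairs,
# following Eqs. (8)-(10). Per Eq. (10), inside(oi, oj) holds
# when oj's corners lie strictly within oi.
def inside(oi, oj):
    (x1, y1), (x2, y2) = oi
    (x3, y3), (x4, y4) = oj
    return x1 < x3 and x4 < x2 and y1 < y3 and y4 < y2

def right_of(oi, oj):
    return oi[1][0] < oj[0][0]        # x2 < x3, as in Eq. (10)

def far(oi, oj, alpha=50):            # alpha: the threshold of Eq. (10)
    (x1, y1), (x2, y2) = oi
    (x3, y3), (x4, y4) = oj
    return (x3 - x2 > alpha or x1 - x4 > alpha or
            y3 - y2 > alpha or y1 - y4 > alpha)
```

These predicates are the building blocks from which event definitions such as Eq. (11) can be assembled.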

4.2. Case study on representing video traffic events

In this section, the proposed ontology building method is applied to the traffic events field. The events, objects, and spatial–temporal relations are built together. The traffic events are all extracted from the illegal actions of vehicles published on the web site of the Ministry of Public Security. These illegal traffic events consist of two parts:

(1) Vehicles: Vehicles are the core components of the illegal traffic events. The concepts of all potential vehicles which may appear in the illegal traffic events are built.

(2) Traffic signs: Traffic signs are the basic components of the illegal traffic events. The concepts of all potential traffic signs which may appear in the illegal traffic events are built. Fig. 4 gives an illustration of the defined objects of the traffic domain.

Please cite this article as: Z. Xu et al., Semantic based representing and organizing surveillance big data using video structural description technology, The Journal of Systems and Software (2014), http://dx.doi.org/10.1016/j.jss.2014.07.024


Fig. 5. The ontology of example event 1.

Fig. 6. The ontology of example event 2.


Two examples of the illegal traffic events are given. In total, we build 215 illegal traffic events.3

Example Event 1: A motor vehicle crosses a red traffic light at the crossroads.

In this example, the objects contain motor vehicles, a traffic light, and a stop line. Besides the objects, the spatial and temporal relations between them should be considered. Fig. 5 shows the ontology of this event. From Fig. 5, we can see that event 1 involves a temporal relation between two different times. At time1, the motor vehicle is not inside the stop line; at the next time2, the motor vehicle is inside the stop line. Through the ontology of event 1, we can detect whether a car crosses the red light or not.

Example Event 2: A vehicle overtakes the front vehicle on the right side.

In this example, the objects only contain motor vehicles. Besides the motor vehicles, the spatial and temporal relations between them should be considered. Fig. 6 shows the ontology of this event. From Fig. 6, we can see that event 2 involves temporal relations between three different times. At time1, motor vehicle1 is below motor vehicle2. At the next time2, motor vehicle1 is at the right side of motor vehicle2. At the last time3, motor vehicle1 is above motor vehicle2.
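Event detection against such an ontology amounts to checking how a spatial relation changes across ordered time intervals. A hedged sketch of example event 1 (our own toy encoding, not the paper's implementation: each trace entry records the light state and whether the vehicle's bounding box is inside the stop-line region):

```python
# Event 1 fires when the vehicle moves from "not inside" the stop-line
# region to "inside" it while the light is red -- the time1 -> time2
# temporal pattern of Fig. 5.
def crosses_red_light(trace):
    for prev, curr in zip(trace, trace[1:]):
        if (curr["light"] == "red"
                and not prev["vehicle_inside_stop_line"]
                and curr["vehicle_inside_stop_line"]):
            return True
    return False

trace = [
    {"light": "red", "vehicle_inside_stop_line": False},  # time1
    {"light": "red", "vehicle_inside_stop_line": True},   # time2
]
```

Example event 2 would follow the same shape, with the below/right/above relations between the two vehicles checked across three time intervals.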

5. The middle layer – using semantic link network for organizing

video surveillance big data

In this section, the semantic link network is used for organizing traffic video resources. The semantic link network has been verified in large-scale resource environments (Luo et al., 2011).

5.1. The introduction of Semantic Link Network

The Semantic Link Network (SLN) (Zhuge, 2009) was proposed

as a semantic data model for organizing various Web resources by

extending the Web’s hyperlink to a semantic link. SLN is a directed

network consisting of semantic nodes and semantic links. A seman-

tic node can be a concept, an instance of concept, a schema of data

set, a URL, any form of resources, or even an SLN (Zhuge, 2011). A

semantic link reflects a kind of relational knowledge represented

as a pointer with a tag describing such semantic relations as cause

3 The 215 illegal traffic events are got from www.shjtaq.com/zwfg/dmb2012.htm.

s

e

r

Please cite this article as: Z. Xu et al., Semantic based representing and

technology, The Journal of Systems and Software (2014), http://dx.doi.org

ffect, implication, subtype, similar, instance, sequence, reference,

nd equal. The semantics of tags are usually common sense and can be

egulated by its category, relevant reasoning rules, and use cases. A set

f general semantic relation reasoning rules was suggested in Zhuge

2010) and Zhuge (2012). A relation could have a reverse relation.

elations and their corresponding reverse relations are knowledge

or supporting semantic relation reasoning. SLN is a self-organized

etwork since any node can link to any other node via a semantic

ink. SLN has been used to improve the efficiency of query routing

n P2P network (Zhuge et al., 2008), and it has been adopted as one

f the major mechanisms of organizing resources for the Knowledge

rid.

.2. Using SLN for Organizing Traffic Videos

Since ALN model focuses on Web resources, the model should be

evised when using on the video resources. Some related definitions

re given first.

efinition 8. Object Relation (OR): Object relation (OR(oi, oj)) is

he semantic relation between the different objects of a video, for

xample, the same car in the different videos.

In the VSD technology, the object relation between two objects

s detected by their attributes. For example, if two cars with same

olor appear in the different videos, the object relation between these

ideos is detected.

efinition 9. Video Spatial Relation (VSR): Video spatial relation

VSR(vi, vj)) is the spatial relation between videos.

In the VSD technology, since the videos are obtained from the

urveillance equipment, the spatial information can be got easily. For

xample, two videos are in the close cross roads, the video spatial

elation between these videos is detected.

efinition 10. Video Temporal Relation (VTR): Video temporal re-

ation (VTR(vi, vj)) is the temporal relation between videos.

In the VSD technology, since the videos are obtained from the

urveillance equipment, the time information can be got easily. For

xample, two videos are in the related time, the video temporal

elation between these videos is detected.

organizing surveillance big data using video structural description

/10.1016/j.jss.2014.07.024
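Definitions 8–10 suggest a directed graph whose nodes are video resources and whose edges carry OR, VSR, or VTR tags. A minimal sketch of such an organization layer (the class, method, and node names are our own assumptions, not the paper's implementation):

```python
from collections import defaultdict

class VideoSLN:
    """A toy semantic link network over videos: nodes are video ids,
    and each link is tagged with a semantic relation such as
    'OR' (shared object), 'VSR' (nearby cameras), or
    'VTR' (related capture times)."""

    def __init__(self):
        self.links = defaultdict(list)   # video id -> [(target, tag), ...]

    def add_link(self, vi, vj, relation):
        self.links[vi].append((vj, relation))

    def related(self, vi, relation):
        # All videos reachable from vi via one link with the given tag.
        return [vj for vj, r in self.links[vi] if r == relation]

sln = VideoSLN()
sln.add_link("cam1_0800", "cam2_0803", "VSR")  # nearby crossroads
sln.add_link("cam1_0800", "cam2_0803", "VTR")  # close capture times
sln.add_link("cam1_0800", "cam7_0930", "OR")   # same car attributes detected
```

Querying `related("cam1_0800", "OR")` then returns the videos associated with the same object, which is the association-based organization the middle layer relies on.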



Fig. 7. The annotation interface for users.


6. The top layer – the application on annotating and searching traffic events

In this section, the applications on annotating and searching of video resources are given.

6.1. The video annotation ontology

The video annotation ontology and annotation instances are stored in a Resource Description Framework (RDF)4 scheme, and the ontologies reuse a number of RDF vocabularies. These ontology vocabularies are extracted from the following knowledge repository.

(1) The traffic law of China: We analyze the traffic law of China and extract the basic concepts from it, for example, the traffic light, car, person, road line and so on. These basic concepts are provided for users when they annotate the video resources. Since the video resources are all about traffic events, these ontologies are enough for users.

(2) The basic features of a car: We give the basic features of a car, such as color, shape and so on.

(3) The basic features of a person: We give the basic features of a person, such as clothing color, hair style and so on.

These basic concepts are built as the annotation ontologies with Protégé5 and TBC.6 Protégé is an ontology building tool developed by Stanford University. This tool can simply generate the ontology based on our selected features from traffic law, cars, and persons. Moreover, Protégé supports SWRL7 based semantic reasoning, which can facilitate the semantic mining procedures in the other modules. TBC is a free ontology generating platform based on the Eclipse development environment, which provides the instances and meta-fields of ontologies.

6.2. The video annotation and searching module

The video annotation module provides the core function for users. Users can use this module to annotate video resources. Of course, the annotation concepts should follow the ontologies. The annotation procedure of a user is as follows.

(1) Select or upload video resources: Users can choose to annotate an existing video resource or upload their own video resources. It is noted that the users of the proposed annotation tool are all policemen, so the uploaded videos are also about traffic events.

(2) According to the given ontologies, users select the appropriate

concepts to annotate the videos. For example, if a video con-

tains a car, users should annotate the color, style, and other

features of it.

(3) In a video, users can annotate different frames at different timestamps.

Fig. 7 shows the annotation interface for users. From Fig. 7, we can see that the annotation interface contains the following parts.

(1) Annotation part: The annotation part is in the middle of the annotation interface. Users can draw a rectangle around their interested parts. For example, in Fig. 7, users annotate the person in the car.

4 www.w3.org/RDF/, 2013.
5 http://protege.stanford.edu/, 2013.
6 www.docjar.org, 2013.
7 www.w3.org/Submission/SWRL/, 2013.

(2) Input part: The input part is on the right of the annotation interface. Users can input the detailed features of the provided attributes. For example, in Fig. 7, users annotate the hair style and clothing color of the person.

(3) Time scroll part: The time scroll part is at the bottom of the annotation interface. Users can scroll a video forward or backward. For example, in Fig. 7, users annotate the frame at 8:02:43 of the video.

The video searching module provides the search function for users. Users can use this module to search video resources. Of course, the search concepts should follow the ontologies. The searching interface contains the following parts.

(1) The query input part: The query input part is at the front of the searching module. Users can input search queries in this part, for example, the query "car light".

(2) The searching results part: The searching results part is in the middle of the searching module. Users can browse the search results in this part. For example, if users search for "car light", the returned results are all annotated image or video resources that contain the concept "car light" in their annotated meta-data.
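The annotate-then-search workflow above can be mimicked with RDF-style (subject, predicate, object) triples. A hedged sketch with made-up frame identifiers and concept names in the spirit of the ontology of Section 6.1 (the real system stores such triples in an RDF database):

```python
# Annotations as (video frame, predicate, value) triples.
triples = [
    ("video12#08:02:43", "contains", "person"),
    ("video12#08:02:43", "hair_style", "short"),
    ("video12#08:02:43", "cloth_color", "blue"),
    ("video07#10:15:02", "contains", "car light"),
]

def search(concept):
    """Return every annotated frame whose metadata mentions the concept."""
    return sorted({s for s, p, o in triples if o == concept})
```

For example, `search("car light")` returns only the frames annotated with that concept, matching the behavior described for the searching results part.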

The video searching module is implemented on the Virtuoso8 database. Java is used to add, delete, and revise the database. Different from LarKC,9 the Virtuoso database offers better performance and a friendlier interface.

7. Case study

This case study aims at finding illegal cars that use other cars' licenses. The detailed information of this study is listed as follows.

(1) Task: Finding the illegal cars using other licenses. In China, each car should have a sole license and a sole license number. For example, the license number of a car in Shanghai is A-86812. Since the license number is the sole identifier of a car, some

8 www.virtuoso.com/, 2013.
9 www.larkc.eu/, 2013.




owners of cars may use other licenses in order to avoid the

punishment of the ministry of public security.

(2) Data set: 1.19 billion records from the traffic speed cameras. The data consist of three pieces of information that are important for solving the task: the license number, the GIS information of the car, and the capture time of the traffic speed camera. Overall, from the data set, we can know the appearing time and place of a car with a given license number.

(3) Data processing: Ten servers are used to store and process these 1.19 billion records. 380 blocks are used to store the data, and each block stores 3 million records. The total storage space is up to 103 GB. The time of copying the data to the ten servers is up to 200 min.

(4) MapReduce: We use the map function to classify the cars into different time intervals and the reduce function to group the cars by license number. The MapReduce framework is then used to process the cars with the same license number. The time of the MapReduce process is up to 50 min.

(5) Rules: We set the rule for detecting cars with illegal license numbers as "the distance between cars with the same license number should be lower than 15 km within a time interval of 10 min." In other words, if the time interval between two sightings of the same car is 10 min, the distance between them should be lower than 15 km. For example, a car with the license number A-86812 appears at place A at 10:00 on 2013.9.13. Another car with the license number A-86812 appears at place B at 10:05 on 2013.9.13. The distance between places A and B is longer than 15 km. Obviously, a car can hardly run 15 km in 5 min.

(6) Results: 394 cars are selected as candidates of illegal cars according to the defined rules. These candidates are compared with the car information database of the ministry of public security. For example, the brand of a candidate car with the license number A-86812 is BMW, but the brand information in the car information database of the ministry of public security is Audi. Thus, we can say that this candidate car uses another car's license number.

In the above case study, the VSD technologies are used to detect

the basic information of a car such as the brand, the color, and the

license number. The MapReduce technology is used for processing

the original data.
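The grouping and rule-checking pipeline of steps (4) and (5) can be mimicked in miniature: group sightings by license number (the reduce step) and flag plates whose consecutive sightings violate the 15 km / 10 min rule. A hedged sketch with synthetic data; distances here are straight-line kilometres on a flat grid, whereas the real system works on GIS coordinates:

```python
from collections import defaultdict

# Synthetic speed-camera records: (license, minutes since midnight, x, y),
# with x/y in km on a flat local grid (a stand-in for GIS data).
records = [
    ("A-86812", 600, 0.0, 0.0),   # 10:00, place A
    ("A-86812", 605, 20.0, 0.0),  # 10:05, place B, 20 km away
    ("B-12345", 600, 0.0, 0.0),
    ("B-12345", 610, 5.0, 0.0),
]

def suspicious_licenses(records, max_km=15.0, window_min=10):
    # "Reduce" step: group all sightings by license number.
    by_license = defaultdict(list)
    for plate, t, x, y in records:
        by_license[plate].append((t, x, y))
    # Rule: two sightings within window_min minutes must be closer
    # than max_km kilometres; otherwise the plate is likely cloned.
    flagged = set()
    for plate, sightings in by_license.items():
        sightings.sort()
        for (t1, x1, y1), (t2, x2, y2) in zip(sightings, sightings[1:]):
            dist = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
            if t2 - t1 <= window_min and dist > max_km:
                flagged.add(plate)
    return flagged
```

On this toy data only A-86812 is flagged: it appears 20 km apart within 5 min, while B-12345 stays within the allowed distance.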

8. Conclusion

The increasing need for video based applications raises the importance of parsing and organizing the content in videos. However, the accurate understanding and managing of video contents at the semantic level is still insufficient. In this paper, a semantic based model named Video Structural Description (VSD) for representing and organizing the content in videos has been proposed. Video structural description aims at parsing video content into text information, using spatiotemporal segmentation, feature selection, object recognition, and semantic web technology. The proposed surveillance video representation method defines a number of concepts and their relations, which allows users to use them to annotate related surveillance events. The defined concepts include persons, vehicles, and traffic signs, which can be used for annotating and representing video traffic events unambiguously. In addition, the spatial and temporal relations between objects in an event have been defined, which can be used for annotating and representing the semantic relations between objects in related surveillance events. Moreover, the semantic link network has been used for organizing video resources based on their associations. In the application, one case study has been presented to analyze the surveillance big data.

Acknowledgements

This work was supported in part by the National Science and Technology Major Project under Grant 2013ZX01033002-003, in part by the National High Technology Research and Development Program of China (863 Program) under Grants 2013AA014601 and 2013AA014603, in part by the National Key Technology Support Program under Grant 2012BAH07B01, in part by the National Science Foundation of China under Grant 61300202, and in part by the Science Foundation of Shanghai under Grant 13ZR1452900.

References

Akdemir, U., Turaga, P., Chellappa, R., 2008. An ontology based approach for activity recognition from video. In: Proceedings of the ACM International Conference on Multimedia, pp. 709–712.
Bagdanov, A., Bertini, M., Del Bimbo, A., Torniai, C., Serra, G., 2007. Semantic annotation and retrieval of video events using multimedia ontologies. In: Proceedings of IEEE International Conference on Semantic Computing.
Bai, L., Lao, S., Jones, G., Smeaton, A., 2007. Video semantic content analysis based on ontology. In: Proceedings of the 11th International Machine Vision and Image Processing Conference, pp. 117–124.
Berners-Lee, T., Hendler, J., Lassila, O., 2001. The semantic web. Sci. Am. 284 (5), 34–43.
Chen, H., Ahuja, N., 2012. Exploiting nonlocal spatiotemporal structure for video segmentation. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 741–748.
Choi, M., Torralba, A., Willsky, A., 2012. A tree-based context model for object recognition. IEEE Trans. Pattern Anal. Mach. Intell. 34 (2), 240–252.
2013. Cisco Visual Networking Index: Forecast and Methodology, 2009–2014. Available: http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/whitepaper_c11-481360_ns827_Networking_Solutions_WhitePaper.html.
Dan, C., Zhixin, L., Lizhe, W., Minggang, D., Jingying, C., Hui, L., 2013. Natural disaster monitoring with wireless sensor networks: a case study of data-intensive applications upon low-cost scalable systems. ACM/Springer Mob. Netw. Appl. 18 (5), 651–663.
Deng, J., Socher, R., Li, L.-J., Fei-Fei, L., 2009. ImageNet: a large-scale hierarchical image database. In: IEEE Proc. CVPR.
Donderler, M., Saykol, E., Arslan, U., Ulusoy, O., Gudukbay, U., 2005. Bilvideo: design and implementation of a video database management system. Multimed. Tools Appl. 27 (1), 79–104.
Fan, J., Aref, W., Elmagarmid, A., Hacid, M., Marzouk, M., Zhu, X., 2001. Multiview: multilevel video content representation and retrieval. J. Electron. Imaging 10 (4), 895–908.
Fan, J., Elmagarmid, A., Zhu, X., Aref, W., Wu, L., 2004. Classview: hierarchical video shot classification, indexing, and accessing. IEEE Trans. Multimed. 6 (1), 70–86.
2013. Great Scott! Over 35 hours of video uploaded every minute to Youtube. The Official YouTube Blog. Available: http://youtube-global.blogspot.com/2010/11/great-scott-over-35-hours-of-video.html.
Javed, K., Babri, H., Saeed, M., 2012. Feature selection based on class-dependent densities for high-dimensional binary data. IEEE Trans. Knowl. Data Eng. 24 (3), 465–477.
Liu, L., Li, Z., Delp, E., 2009. Efficient and low-complexity surveillance video compression using backward-channel aware Wyner-Ziv video coding. IEEE Trans. Circuits Syst. Video Technol. 19 (4), 452–465.
Liu, Y., Zhang, Q., Lionel, M.N., 2010. Opportunity-based topology control in wireless sensor networks. IEEE Trans. Parallel Distrib. Syst. 21 (3), 405–416.
Liu, Y., Zhu, Y., Lionel, M., Ni, G.X., 2011. A reliability-oriented transmission service in wireless sensor networks. IEEE Trans. Parallel Distrib. Syst. 22 (12), 2100–2107.
Liu, X., Yang, Y., Yuan, D., Chen, J., 2013. Do we need to handle every temporal violation in scientific workflow systems. ACM Trans. Softw. Eng. Methodol. (early access).
Lizhe, W., von Laszewski, G., Younge, A.J., Xi, H., Kunze, M., Jie, T., Cheng, F., 2010. Cloud computing: a perspective study. N. Gener. Comput. 28 (2), 137–146.
Luo, X., Zheng, X., Yu, J., Chen, X., 2011. Building association link network for semantic link on web resources. IEEE Trans. Autom. Sci. Eng. 8 (3), 482–494.
Ma, H., Zhu, J., Lyu, M., King, I., 2010. Bridging the semantic gap between image contents and tags. IEEE Trans. Multimed. 12 (5), 462–473.
Marszalek, M., Schmid, C., Inria, M., 2007. Semantic hierarchies for visual object recognition. In: IEEE Proc. CVPR.
Nevatia, R., Natarajan, P., 2005. EDF: a framework for semantic annotation of video. In: Proceedings of the 10th IEEE International Conference on Computer Vision Workshops, 1876 pp.
Nevatia, R., Hobbs, J., Bolles, R., Smith, J., 2005. VERL: an ontology framework for representing and annotating video events. IEEE Multimed. 12 (4), 76–86.
Plebani, P., Pernici, B., 2009. URBE: web service retrieval based on similarity evaluation. IEEE Trans. Knowl. Data Eng. 21 (11), 1629–1642.


Sevilmis, T., Bastan, M., Gudukbay, U., Ulusoy, O., 2008. Automatic detection of salient objects and spatial relations in videos for a video database system. Image Vis. Comput. 26 (10), 1384–1396.
Wigan, M., Clarke, R., 2013. Big data's big unintended consequences. Computer 46 (6), 46–53.
Wu, L., Wang, Y., 2010. The process of criminal investigation based on grey hazy set. In: 2010 IEEE International Conference on System Man and Cybernetics, pp. 26–28.
Xu, C., Zhang, Y., Zhu, G., Rui, Y., Lu, H., Huang, Q., 2008. Using webcast text for semantic event detection in broadcast sports video. IEEE Trans. Multimed. 10 (7), 1342–1355.
Xu, Z., Luo, X., Wang, L., 2011. Incremental building association link network. Comput. Syst. Sci. Eng. 26 (3), 153–162.
Yan, M., Lizhe, W., Dingsheng, L., Tao, Y., Peng, L., Wanfeng, Z., 2013a. Distributed data structure templates for data-intensive remote sensing applications. Concurr. Comput.: Pract. Exp. 25, 1784–1797.
Yan, M., Lizhe, W., Zomaya, A.Y., Dan, C., Ranjan, R., 2013b. Task-tree based large-scale mosaicking for remote sensed imageries with dynamic dag scheduling. IEEE Trans. Parallel Distrib. Comput. doi:10.1109/TPDS.2013.272.
Yao, B., Yang, X., Lin, L., Lee, M., Zhu, S., 2010. I2T: image parsing to text description. Proc. IEEE 98 (8), 1485–1508.
Yu, H., Pedrinaci, C., Dietze, S., Domingue, J., 2012. Using linked data to annotate and search educational video resources for supporting distance learning. IEEE Trans. Learn. Technol. 5 (2), 130–142.
Yuan, D., Yang, Y., Liu, X., Li, W., Cui, L., Xu, M., Chen, J., 2013. A highly practical approach towards achieving minimum datasets storage cost in the cloud. IEEE Trans. Parallel Distrib. Syst. 24 (6), 1234–1244.
Ze, D., Xiaomin, W., Lizhe, W., Xiaodao, C., Ranjan, R., Zomaya, A., Dan, C., 2014. Parallel processing of dynamic continuous queries over streaming data flows. IEEE Trans. Parallel Distrib. Syst. doi:10.1109/TPDS.2014.2311811 (forthcoming).
Zhang, J., Zulkernine, M., Haque, A., 2008. Random-forests-based network intrusion detection systems. IEEE Trans. Syst. Man Cybern. C: Appl. Rev. 38 (5), 649–659.
Zhang, X., Liu, C., Nepal, S., Pandev, S., Chen, J., 2013a. A privacy leakage upper-bound constraint based approach for cost-effective privacy preserving of intermediate datasets in cloud. IEEE Trans. Parallel Distrib. Syst. 24 (6), 1192–1202.
Zhang, X., Yang, T., Liu, C., Chen, J., 2013b. A scalable two-phase top-down specialization approach for data anonymization using MapReduce on cloud. IEEE Trans. Parallel Distrib. Syst. (early access).
Zhuge, H., 2009. Communities and emerging semantics in semantic link network: discovery and learning. IEEE Trans. Knowl. Data Eng. 21 (6), 785–799.
Zhuge, H., 2010. Interactive semantics. Artif. Intell. 174, 190–204.
Zhuge, H., 2011. Semantic linking through spaces for cyber-physical-socio intelligence: a methodology. Artif. Intell. 175, 988–1019.
Zhuge, H., 2012. The Knowledge Grid – Toward Cyber-Physical Society, second ed. World Scientific Publishing Co., Singapore.
Zhuge, H., Chen, X., Sun, X., Yao, E., 2008. HRing: a structured P2P overlay based on harmonic series. IEEE Trans. Parallel Distrib. Syst. 19 (2), 145–158.

Zheng Xu was born in Shanghai, China, in 1984. He received the diploma and PhD degrees from the School of Computer Engineering and Science, Shanghai University, Shanghai, in 2007 and 2012, respectively. He is currently working in the Third Research Institute of the Ministry of Public Security and Tsinghua University, China. His current research interests include topic detection and tracking, the semantic web, and web mining.

Yunhuai Liu is a professor in the Third Research Institute of the Ministry of Public Security, China. He received the PhD degree from the Hong Kong University of Science and Technology (HKUST) in 2008. His main research interests include wireless sensor networks, pervasive computing, and wireless networks. He has authored or co-authored more than 50 publications, which have appeared in IEEE Transactions on Parallel and Distributed Systems, IEEE Journal on Selected Areas in Communications, IEEE Transactions on Mobile Computing, IEEE Transactions on Vehicular Technology, etc.

Lin Mei received his PhD degree from Xi'an Jiaotong University, China. He is currently working in the Third Research Institute of the Ministry of Public Security, China. He is the dean professor of the Department of Internet of Things.

Chuanping Hu received his PhD degree from Tongji University, China. He is currently working in the Third Research Institute of the Ministry of Public Security, China. He is the dean professor of the Third Research Institute of the Ministry of Public Security.

Lan Chen is a PhD candidate at Beihang University, China. She is currently working in the Third Research Institute of the Ministry of Public Security, China.
