8/3/2019 Dimensions in Data Processing 2402
http://slidepdf.com/reader/full/dimensions-in-data-processing-2402 1/76
Dimensions in Data Processing & Data Management Technology -
Data Structures & Algorithms Efficiency Concerns
Vishwambhar Pathak Sr. Lecturer, Dept. of Computer Science & Engineering, BITIC-RAK (UAE)
The efficiency of a computer application is greatly affected by the features of the underlying Database Management System. Driven by rapid advances in applications of computing and information technology, database technology is also expanding at a phenomenal rate.
We examine two aspects of database management, data structures and data processing algorithms, with regard to varying data characteristics and processing environments.
The present work aims to provide a comprehensive and integrated account of the different types of databases current in the literature, with comments and reviews, bringing out the points of similarity among them.
INTRODUCTION
The contemporary database management methodologies viz. Relational data model, Object data model, AI
techniques based Knowledge (base) management, Information Retrieval & Exploration and Data Configuration
(XML) techniques are largely under enhancements as the characteristics of data and the processing environment of
various applications largely vary and pose considerable challenges.
Data Characteristics: Multimedia data, Time-Series data, Temporal data, XML data, Multidimensional data
Processing Environments: Real-time processing, Parallel & Distributed processing, Mobile Computing, P2P networks
The focus of current research is:
i) To find solutions (data representation, indexing, querying, processing algorithms) to unsolved difficulties arising from the data characteristics and the peculiarities of the processing environments summarized above.
ii) To find better ways to enhance the performance of data management techniques.
Review Of Contemporary Research Related To Representation And Processing Of
Multimedia Data
[ Data Representation Concerns: 3D graphics/object handling (may be studied later); event-centric and logic-based representation of multimedia data; representation of audio data; a universal common format ]
[ Processing Concerns and Solutions: Content-based retrieval (querying): CBIR, color-based retrieval, content-based image authentication, 3D browsing tools, reactive retrieval in distributed environments, semantic-based access in mobile networks. Data hiding: block-based lossless data hiding, distortion-based data hiding, ERC, quantization-based data hiding. Error concealment (ERC); SAR image denoising; protecting sensitive data. Replication: multi-quality data replication, transparent replication. Information exploration (learning from data): data history tools, clustering using time series, feature extraction. Indexing: content-based, graph-based, transform-based, and wavelet-based; for human motion, image data, multi-feature music, and multidimensional VLDBs (2^n-tree). Geo-spatial-temporal data processing ]
For data to be gainfully and meaningfully used in various applications, it is essential to have efficient schemes for data management and manipulation, which broadly involve the acquisition, organization, storage, querying, retrieval, transmission, and presentation of data. A DataBase Management System (DBMS) organizes huge amounts of data into a database and provides utilities for the efficient storage, usage, and management of those data. A multimedia database management system (MMDBMS) should have the capabilities of a traditional DBMS and much more. With multimedia data, the ability to access all data with similar features is limited under keyword-based indexing and exact (or range) searching. This makes automatic analysis, classification, content-based querying, and similarity-based search a necessary part of an MMDBMS.
Data Representation Models and Concerns
The use of multimedia data in many applications has increased significantly. Some examples of these applications
are distance learning, digital libraries, video surveillance systems, and medical videos. As a consequence, there are
increasing demands on modeling, indexing and retrieving these data.
[ Concerns: 2D video; graph-based video; moving-object detection and tracking; compressed human motion ]
I. Modeling and refinement of Scalable Video Coding
The modeling and refinement of Scalable Video Coding (SVC) has been studied extensively [1]. The scalable extension of H.264/MPEG4-AVC is a current standardization project of the Joint Video Team (JVT) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The basic SVC design can be classified as a layered video codec.
In general, the coder structure as well as the coding efficiency depends on the scalability space that is required. An important feature of the SVC design is that scalability is provided at the bit-stream level. Bit-streams for a reduced spatial and/or temporal resolution can be obtained simply by discarding those NAL units (or network packets) of a global SVC bit-stream that are not required for decoding the target resolution. NAL units of PR (progressive refinement) slices can additionally be truncated in order to further reduce the bit-rate, at the cost of the associated reconstruction quality.
Temporal Scalability: In H.264/MPEG4-AVC, any picture can be marked as a reference picture and used for motion-compensated prediction of following pictures, independent of the corresponding slice coding types. These features allow the coding of picture sequences with arbitrary temporal dependencies.
So-called key pictures are coded in regular intervals by using only previous key pictures as references. The pictures
between two key pictures are hierarchically predicted as shown in Fig. 2. It is obvious that the sequence of key
pictures represents the coarsest supported temporal resolution, which can be refined by adding pictures of following
temporal prediction levels.
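The dyadic key-picture hierarchy described above can be sketched in a few lines. This is an illustrative reading of the prediction-level structure, not code from the standard; the function name and the assumption of a power-of-two GOP size are mine. A picture's temporal level is determined by the largest power of two dividing its index:

```python
def temporal_level(i, gop_size=8):
    """Temporal prediction level of picture i in a dyadic hierarchy.

    Key pictures (multiples of the GOP size) are level 0; the picture
    halfway between two key pictures is level 1, and so on.
    """
    levels = gop_size.bit_length() - 1          # e.g. GOP 8 -> 3 refinement levels
    if i % gop_size == 0:
        return 0                                # key picture
    trailing_zeros = (i & -i).bit_length() - 1  # largest power of 2 dividing i
    return levels - trailing_zeros

# Dropping all pictures above a given level halves the frame rate per level removed:
print([temporal_level(i) for i in range(9)])   # [0, 3, 2, 3, 1, 3, 2, 3, 0]
```

Keeping only pictures with level 0 yields the coarsest temporal resolution; each additional level doubles the frame rate, matching the refinement behavior described in the text.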
Spatial scalability is achieved by an oversampled pyramid approach. The pictures of different spatial layers are
independently coded with layer specific motion parameters as illustrated in Fig. 1. However, in order to improve the
coding efficiency of the enhancement layers in comparison to simulcast, additional inter-layer prediction
mechanisms have been introduced.
Inter-layer prediction techniques:
The following three inter-layer prediction techniques are included in the SVC design. In the following, only the
original concepts based on simple dyadic spatial scalability are described.
1. Inter-layer motion prediction: In order to employ base-layer motion data for spatial enhancement-layer coding, additional macroblock modes have been introduced in spatial enhancement layers. The macroblock partitioning is obtained by upsampling the partitioning of the co-located 8x8 block in the lower-resolution layer. The reference picture indices are copied from the co-located base-layer blocks, and the associated motion vectors are scaled by a factor of 2. These scaled motion vectors are either used unmodified or refined by an additional quarter-sample motion vector refinement. Additionally, a scaled motion vector of the lower resolution can be used as a motion vector predictor for the conventional macroblock modes.
2. Inter-layer residual prediction: A flag that is transmitted for all inter-coded macroblocks signals the usage of
inter-layer residual prediction. When this flag is true, the base layer signal of the co-located block is block-wise
upsampled and used as prediction for the residual signal of the current macroblock, so that only the corresponding
difference signal is coded.
3. Inter-layer intra prediction: Furthermore, an additional intra macroblock mode is introduced, in which the prediction signal is generated by upsampling the co-located reconstruction signal of the lower layer. For this prediction it is generally required that the lower layer be completely decoded, including the computationally complex operations of motion-compensated prediction and deblocking. However, this problem can be circumvented when inter-layer intra prediction is restricted to those parts of the lower-layer picture that are intra-coded. With this restriction, each supported target layer can be decoded with a single motion compensation loop.
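The motion-vector part of inter-layer motion prediction can be sketched as follows. The function name and interface are mine, not from the SVC reference software; the assumption is dyadic (2x) spatial scalability and a quarter-sample refinement of at most one step per component:

```python
def inter_layer_mv(base_mv, refinement=(0, 0)):
    """Predict an enhancement-layer motion vector from the co-located
    base-layer vector for dyadic (2x) spatial scalability.

    Vectors are in quarter-sample units; the optional refinement is the
    additional quarter-sample correction signaled in the bit-stream.
    """
    mvx, mvy = base_mv
    rx, ry = refinement
    assert rx in (-1, 0, 1) and ry in (-1, 0, 1), "quarter-sample refinement only"
    return (2 * mvx + rx, 2 * mvy + ry)

print(inter_layer_mv((3, -2)))          # (6, -4): scaled by 2, used unmodified
print(inter_layer_mv((3, -2), (1, 0)))  # (7, -4): scaled, then refined
```

The factor of 2 mirrors the doubling of spatial resolution between layers, exactly as the text describes for the copied reference indices and scaled vectors.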
Recently, a fast-search motion estimation algorithm for the H.264/AVC SVC (scalable video coding) base layer with a hierarchical B-frame structure for temporal decomposition has been presented [2]. The proposed technique is a block-matching-based motion estimation algorithm working in two steps, called Coarse search and Fine search. The Coarse search is performed for each frame in display order and, for each 16x16 macroblock, chooses the best motion vector at half-pel accuracy. The Fine search is performed for each frame in encoding order and finds the best prediction for each block type, reference frame, and direction, choosing the best motion vector at quarter-pel accuracy using R-D optimization. Both the Coarse and Fine searches test 3 spatial and 3 temporal predictors and add a set of updates to the best one. The spatial predictors for the Fine search are the results of the Fine search already performed for the previous blocks, while the temporal predictors are the results of the Coarse search scaled by an appropriate coefficient. This scaling is performed because in the Coarse search each picture is always estimated with respect to the previous one, whereas in the Fine search the temporal distance between the current picture and its references depends on the temporal decomposition level. Moreover, in the Fine search the number and the values of the updates tested depend on the distance between the current picture and its references. These sets of updates are the result of a large number of simulations on test sequences with different motion features.
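A minimal integer-pel version of the predictor-plus-updates search described above might look as follows. This is a simplification of my own: the actual Coarse search works at half-pel accuracy and the Fine stage uses R-D costs, whereas here plain SAD and integer offsets stand in for both:

```python
import numpy as np

def sad(block, ref, x, y):
    """Sum of absolute differences between `block` and the ref patch at (x, y)."""
    h, w = block.shape
    if x < 0 or y < 0 or x + w > ref.shape[1] or y + h > ref.shape[0]:
        return float("inf")                     # candidate falls outside the frame
    return int(np.abs(block.astype(int) - ref[y:y + h, x:x + w].astype(int)).sum())

def best_mv(cur, ref, bx, by, bs, predictors, updates):
    """Test every predictor plus every update offset; keep the cheapest vector."""
    block = cur[by:by + bs, bx:bx + bs]
    best_cost, best = float("inf"), (0, 0)
    for px, py in predictors:
        for ux, uy in updates:
            mx, my = px + ux, py + uy
            cost = sad(block, ref, bx + mx, by + my)
            if cost < best_cost:
                best_cost, best = cost, (mx, my)
    return best

# A frame shifted right by 2 pixels should yield motion vector (2, 0):
rng = np.random.default_rng(0)
cur = rng.integers(0, 256, (40, 40))
ref = np.roll(cur, 2, axis=1)
mv = best_mv(cur, ref, 8, 8, 16, predictors=[(0, 0)],
             updates=[(ux, uy) for ux in range(-3, 4) for uy in range(-3, 4)])
```

Seeding the predictor list with spatial and temporal candidates, as the paper does, lets the update set stay small while still tracking large motion.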
II. Storage Concerns and Solutions for 2D Scalable Video (H.264/MPEG-4 Scalable Video Coding (SVC))
SVC provides multi-dimensional scalability: it supports multiple temporal, spatial, and SNR resolutions simultaneously. Through this multi-dimensional scalability, SVC enables much more flexible adaptation to varying user demands and network conditions. With a scalable video, the video server has to extract, from the full-resolution stream, exactly the sub-stream data that corresponds to the requested resolution. In this case, the extracted sub-stream data may be dispersed across the disk. Thus, accessing a scalable video stream at a given resolution may incur more disk requests, which degrades overall disk throughput severely. Alternatively, the server can retrieve all streams, including extra sub-streams that are not requested but are located between the currently requested sub-streams. However, this may also waste a great deal of disk bandwidth and memory buffer, since considerable disk throughput is consumed retrieving unnecessary data, which must then be retained in memory until transmission. Disk throughput is a crucial factor in video server design, since it may restrict the maximum number of clients serviced simultaneously.
There have been several works on the placement of scalable video streams, or of multi-resolution non-scalable video streams, on one disk or a disk array. For multi-resolution non-scalable video streams, Shenoy [7.1] and Lim [7.2] have proposed a placement strategy that interleaves multi-resolution video streams on a disk array and enables a video server to efficiently support playback of these streams at different resolution levels. This placement algorithm ensures that each sub-stream within a stream is independently accessible at any resolution and that the seek time and rotational latency overheads are minimized. In addition, they presented an encoding technique that enables a video server to efficiently support scan operations such as fast-forward and rewind. Rangaswami [7.4] developed an interactive media proxy that transforms non-interactive broadcast or multicast streams into interactive ones. They carefully manage the disk device by considering disk geometry for allocation and by creating several stream files according to the fast-forward levels. However, this method consumes a large amount of storage space, and they did not consider disk array management. For scalable video data, Chang [7.3] have proposed a strategy for scalable video data placement that maximizes the total data transfer rate of a disk for an arbitrary distribution of requested data rates. The main concept of this strategy is frame grouping, which orders data-rate layers within one storage unit on a disk. It allows optimal disk operation in each service round by performing one seek and a contiguous read of the exact amount of data requested. Kang [7.9] presented a harmonic placement strategy. In this scheme, the layers are partitioned into a set of lower layers and a set of upper layers. In the lower-layer group, they interleave data blocks of all layers within the same service round, while in the upper-layer group they cluster the data blocks of each layer together. Using this scheme they can reduce disk seek time, since the frequently accessed layers are clustered together. However, the schemes described above do not fully utilize the characteristics of scalable video in a video server that provides multi-dimensional scalable video streams; they are limited to single-dimensional scalable video.
In a recent work [7], an efficient data reorganization and placement scheme for two-dimensional scalable video in a disk array-based video server has been proposed, which considers both disk utilization and load balancing. In this scheme, sub-streams are reorganized taking into account both the decoding dependency of two-dimensional scalable video and the locations at which they are stored in the disk array.
The Two Dimensional SVC Rearrangement
SVC provides tools for three scalability dimensions: temporal scalability, spatial scalability, and quality (SNR) scalability. For the sake of simplicity, we focus on two of them, spatial and temporal scalability. The spatial scalability technique encodes a video into several levels that have different spatial resolutions. Temporal scalability, on the other hand, encodes a video sequence into several levels having different frame rates. These scalability dimensions can easily be combined into a general scalable coding scheme providing a wide range of spatial and temporal scalability.
Figure 1. An Illustration of Two Dimensional Scalable Video
Figure 1(a) describes a combined scalability that supports spatial and temporal scalability simultaneously. When combined scalability is considered, the strict notion of a layer no longer applies [2]. Instead, we define a combined scalability level consisting of Ls and Lt, i.e. each scalability dimension has its own level. Ls and Lt represent the spatial and temporal scalability levels, respectively. The scalability level in each dimension represents the quality of the video in the corresponding dimension. In a scalable video stream, data segments can be grouped into a minimum sub-stream that is capable of extending the scalability level. Thus, in a scalable video server, data retrievals are requested in units of this minimum sub-stream. We define this sub-stream as the unit sub-stream (US) for two-dimensional scalability. The US Uk(l, m) is defined as a partial stream of the kth GOP which is essential for reconstructing the video at a resolution higher than spatial scalability level l and temporal scalability level m. Thus, to reconstruct the kth GOP at spatial scalability level Ls and temporal scalability level Lt, all the USs Uk(l, m) such that l <= Ls and m <= Lt should be extracted from the entire stream. GOPk(Ls, Lt), the set of sub-streams for the kth GOP at spatial scalability level Ls and temporal scalability level Lt, is represented with USs as follows.
GOPk(Ls, Lt) = { Uk(l, m) | l <= Ls, m <= Lt }   (1)
Figure 1(a) describes the relation between scalability levels and USs. It also shows how the scalability level is related to frame rate and frame size. The encoded scalable video streams are stored in units of frames, as shown in Figure 1(b). The number marked on top of each frame represents the decoding order; the data of an encoded video stream are basically stored in decoding order. To exploit the access pattern determined by scalability, the data should first be partitioned according to USs. Figure 1(c) shows this data placement. Starting from this placement, the work proposes a more efficient placement scheme.
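The selection rule of Eq. (1) is easy to state in code. The helper below is mine, with the assumption that scalability levels are numbered from 1; it enumerates the unit sub-streams a server must extract for one GOP:

```python
def required_us(Ls, Lt):
    """Unit sub-streams U(l, m) needed to reconstruct one GOP at spatial level
    Ls and temporal level Lt: all (l, m) with l <= Ls and m <= Lt (Eq. 1)."""
    return {(l, m) for l in range(1, Ls + 1) for m in range(1, Lt + 1)}

# Raising either scalability level only ever adds sub-streams:
assert required_us(1, 2) <= required_us(2, 3)
print(sorted(required_us(2, 2)))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

The monotonicity shown by the assertion is what makes placement matter: higher-resolution requests always cover the sub-streams of lower-resolution ones, so low-level USs are the most frequently accessed.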
For scalable video, the requested video streams are likely to be retrieved in a discontinuous manner during one service-round duration, since the extracted sub-stream data are dispersed on the disk. Thus, accessing these streams at a given resolution may incur more disk requests. To reduce seek overhead, the server can retrieve the needed sub-streams along with extra sub-streams that are not requested but are located between the currently requested sub-streams. In view of this, the retrieval policy is that one disk request is generated per round duration for each disk, even if it retrieves unnecessary sub-streams. We try to find the optimal placement based on this retrieval policy. Meanwhile, load balancing between the disks of a disk array is important. When video streams are stored on a disk array, disk striping is performed by dividing the video data into blocks according to their decoding order and storing these blocks on different disks. Sub-streams at a given resolution may be located on some disks but not on others, which incurs biased disk requests and load imbalance between disks and is not efficient in a disk array-based server. Thus, the optimal data placement is the one that satisfies both of two criteria:
Criterion 1. For each disk request, the server should retrieve a minimum of unnecessary sub-streams, to maximize disk utilization during one service round.
Criterion 2. The server should generate disk requests that balance the load between disks during one service round.
Suppose a two-dimensional scalable video that has three spatial scalability levels and five temporal scalability levels, and a disk array consisting of four disks. The scalable video stream is first arranged and partitioned into USs, as described in the previous section. These are then initially stored on the disks, where a stripe is the closed set of one round duration, as shown in Figure 2. The GOP data can be filled to match the striping distance using the FGS layer of quality scalability, described as U(F) in the figure.
As a general approach, the optimal data placement can be obtained by finding the placement that minimizes the request size retrieved from each disk and distributes requests as evenly as possible between disks during one service round. Let pij be the probability of retrieving the sub-stream at Ls = i, Lt = j, and let Sk = [s1, s2, ..., sN] denote one of the possible data placement sequences on the kth disk, where sn denotes the nth US of a GOP. Accordingly, S = <S1, ..., SK> denotes the continuous sequence across the K disks for one GOP. Let Rij(Sk) denote the request size incurred when the sub-stream at Ls = i, Lt = j is retrieved from the stream organized as sequence Sk on the kth disk. Let the numbers of spatial and temporal scalability levels be L and M, respectively, and the number of disks be K. In the first step, following Criterion 1, we obtain the first data placement by finding the Sk for each disk that minimizes R(S), the total request size retrieved during one service round, from the following equation.
Several candidate placements can be obtained from Eq. 2. In the next step, we select the one that maximizes disk load balancing, following Criterion 2. Let Lij(S) denote the load-balancing factor for scalability level Ls = i and Lt = j. Load balancing between disks measures how evenly the disk requests are distributed, so the overall load-balancing factor L(S) can be described by the following equation,
where δij denotes the number of disks to be accessed for scalability level Ls = i and Lt = j. The placement policy is then to find the stream sequence Sk for each disk by maximizing Eq. 3.
The procedure of the local optimal placement search is as follows.
1. Reorganize a raw scalable video stream, whose data are basically placed in decoding order, into USs, so that the data are ordered according to scalability level. Then let i = 1 and α = 1; accordingly, the initial sequence is denoted Sα(1).
2. Whenever i increases, the sequence of the stream Sα(i) is re-ordered, with the scraper USα relocated to the ith location within the sequence.
3. Each sequence Sα(i) is split into sub-sequences Sk(i), one for each disk in the disk array. The total retrieval size R(Sα(i)) and the load-balancing factor L(Sα(i)) are then calculated for that sequence from Eqs. 2 and 3. If it is better than the previous one, it replaces the current optimal sequence S. For the current α, this search is repeated until i reaches (L · M).
4. As α increases from 1 to (L · M), the scraper is changed; using this US, USα, steps 2 and 3 are repeated. Finally, the local optimal sequence of the stream, S, is selected at the end of the iteration.
When this search algorithm is applied to the initial sequence of Figure 2, the placement of Figure 3 is obtained. In the above placement, the client distribution probability is assumed to be a pre-defined parameter. In particular, the placement is optimal when all scalability levels are requested with the same probability.
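The flavor of this optimization can be reproduced with a toy model. Everything below is my simplification: one contiguous read per disk per round, with cost equal to the span from the first to the last needed US on each disk; exhaustive permutation search stands in for the scraper-based local search, and the expected cost plus an access-spread tie-breaker stand in for Eqs. 2 and 3:

```python
from itertools import permutations

def round_cost(disks, needed):
    """Per-round cost of serving `needed` USs: each disk does one contiguous
    read from its first to its last needed US, so USs in between are read too.
    Returns (total blocks read, number of disks accessed)."""
    total = accessed = 0
    for seq in disks:
        hits = [i for i, us in enumerate(seq) if us in needed]
        if hits:
            total += hits[-1] - hits[0] + 1
            accessed += 1
    return total, accessed

def search_placement(all_us, n_disks, probs):
    """Brute-force the US ordering minimizing expected request size, breaking
    ties toward more evenly spread disk accesses (tiny instances only)."""
    per_disk = len(all_us) // n_disks
    best_key = best_disks = None
    for order in permutations(all_us):
        disks = [order[d * per_disk:(d + 1) * per_disk] for d in range(n_disks)]
        exp_cost = exp_spread = 0.0
        for (Ls, Lt), p in probs.items():
            needed = {(l, m) for l in range(1, Ls + 1) for m in range(1, Lt + 1)}
            cost, accessed = round_cost(disks, needed)
            exp_cost += p * cost
            exp_spread += p * accessed
        key = (exp_cost, -exp_spread)
        if best_key is None or key < best_key:
            best_key, best_disks = key, disks
    return best_disks

all_us = [(1, 1), (1, 2), (2, 1), (2, 2)]           # L = M = 2
probs = {(1, 1): 0.4, (1, 2): 0.3, (2, 2): 0.3}     # client distribution p_ij
placement = search_placement(all_us, 2, probs)
```

Even at this scale the trade-off is visible: clustering frequently requested USs shortens reads, while spreading them across disks evens out the load, which is exactly the tension between Criteria 1 and 2.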
III. Graph-based Approach for modeling and indexing Video:
In [5], a new graph-based video data structure, called the Spatio-Temporal Region Graph (STRG), is presented, which represents spatio-temporal features and the relationships among video objects. A Region Adjacency Graph (RAG) is generated from each frame, and the STRG is constructed by connecting the RAGs. For efficiency, the STRG is segmented into a number of pieces corresponding to shots. Each segmented STRG is then decomposed into subgraphs, called Object Graphs (OGs) and Background Graphs (BGs), in which redundant BGs are eliminated to reduce index size and search time. The proposed indexing starts by clustering OGs using the Expectation Maximization (EM) algorithm [5.1] for more accurate indexing. Clustering requires a distance measure between two OGs; for this, the paper proposes the Extended Graph Edit Distance (EGED), because existing measures are not well suited to OGs. The EGED is defined in a non-metric space for clustering OGs, and it is extended to a metric space to compute the key values for indexing. Based on the clusters of OGs and the EGED, a new indexing structure, the STRG-Index, is proposed, which provides efficient retrieval.
Spatio-Temporal Region Graph
For a given video, each frame is segmented into a number of regions using a region segmentation technique. Then a Region Adjacency Graph (RAG) is obtained by converting each region into a node, and the spatial relationships among regions into edges [5.2]. It is defined as follows:
Definition 1: Given the nth frame fn in a video, a Region Adjacency Graph of fn, Gr(fn), is a four-tuple
Gr(fn) = (V, ES, ν, ξ),
where
• V is a finite set of nodes for the segmented regions in fn,
• ES ⊆ V × V is a finite set of spatial edges between adjacent nodes in fn,
• ν : V → AV is a set of functions generating node attributes, and
• ξ : ES → AES is a set of functions generating spatial edge attributes.
The node attributes (AV) represent the size (i.e., number of pixels), dominant color, and location of the corresponding region; the spatial edge attributes (AES) represent the relationships between two adjacent nodes, such as spatial distance and orientation. A RAG is good at representing the spatial relationships among nodes corresponding to the segmented regions, but it cannot represent the temporal characteristics of video. The new graph-based data structure for video, the Spatio-Temporal Region Graph (STRG), consists of temporally connected RAGs [5.3]. The STRG can handle both temporal and spatial characteristics of video and is defined as follows:
Definition 2: Given a video segment S, a Spatio-Temporal Region Graph, Gst(S), is a six-tuple Gst(S) = (V, ES, ET, ν, ξ, τ), where
• V is a finite set of nodes for segmented regions from S,
• ES ⊆ V × V is a finite set of spatial edges between adjacent nodes in S,
• ET ⊆ V × V is a finite set of temporal edges between temporally consecutive nodes in S,
• ν : V → AV is a set of functions generating node attributes,
• ξ : ES → AES is a set of functions generating spatial edge attributes, and
• τ : ET → AET is a set of functions generating temporal edge attributes.
In an STRG, the temporal edge attributes (AET) represent the relationships between corresponding nodes in two consecutive frames, such as velocity and moving direction. Figures 1(a) and (b) show actual frames of a sample video and their region segmentation results, respectively. Figure 1(c) shows part of the STRG for frames #141 − #143, constructed by adding temporal edges, drawn as horizontal lines between the frames.
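Definitions 1 and 2 translate directly into a small container type. The class below is an illustrative, dictionary-backed structure of my own (not the authors' implementation), covering the node, spatial-edge, and temporal-edge attribute sets:

```python
class STRG:
    """Spatio-Temporal Region Graph: nodes are segmented regions; spatial
    edges link adjacent regions in one frame, temporal edges link a region
    across consecutive frames (Definition 2)."""

    def __init__(self):
        self.nodes = {}           # node id -> attributes (A_V)
        self.spatial_edges = {}   # (u, v) -> attributes (A_ES)
        self.temporal_edges = {}  # (u, v) -> attributes (A_ET)

    def add_region(self, node, frame, size, color, location):
        self.nodes[node] = {"frame": frame, "size": size,
                            "color": color, "location": location}

    def add_spatial_edge(self, u, v, distance, orientation):
        assert self.nodes[u]["frame"] == self.nodes[v]["frame"]
        self.spatial_edges[(u, v)] = {"distance": distance,
                                      "orientation": orientation}

    def add_temporal_edge(self, u, v, velocity, direction):
        assert self.nodes[v]["frame"] == self.nodes[u]["frame"] + 1
        self.temporal_edges[(u, v)] = {"velocity": velocity,
                                       "direction": direction}

g = STRG()
g.add_region("v1", frame=141, size=250, color="red", location=(10, 12))
g.add_region("v2", frame=141, size=300, color="blue", location=(14, 12))
g.add_region("v3", frame=142, size=248, color="red", location=(11, 12))
g.add_spatial_edge("v1", "v2", distance=4.0, orientation=0.0)
g.add_temporal_edge("v1", "v3", velocity=1.0, direction=0.0)
```

The two assertions enforce the defining constraints: spatial edges stay within a frame, temporal edges connect consecutive frames.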
An STRG is an extension of RAGs obtained by adding temporal edges (ET) to them. ET represents the temporal relationships between corresponding nodes in two consecutive RAGs. The main procedure in building an STRG is therefore constructing ET, which is similar to the problem of object tracking in a video sequence. To find the corresponding nodes in two consecutive RAGs, graph isomorphism and maximal common subgraph algorithms were used. These algorithms are conceptually simple but have high computational complexity. To address this, a RAG is decomposed into its neighborhood graphs (GN(v)), which are subgraphs of the RAG, as follows:
Definition 3: GN(v) is the neighborhood graph of a given node v in a RAG if every node u ∈ GN(v) is adjacent to v, having an edge eS = (v, u).
Let GN^m and GN^(m+1) be the sets of neighborhood graphs in the mth and (m+1)th frames, respectively. For each node v in the mth frame, the goal is to find the corresponding target node v' in the (m+1)th frame. To decide these corresponding nodes, we use the neighborhood graphs of Definition 3. For each neighborhood graph GN(v) in GN^m, the goal becomes finding the corresponding target graph GN(v') in GN^(m+1), which is an isomorphic or the most similar graph to GN(v). First, we look for a neighborhood graph in GN^(m+1) that is isomorphic to GN(v). Second, if no isomorphic graph exists in GN^(m+1), we find the most similar neighborhood graph to GN(v) using a similarity measure SG(GN(v), GN(v')), which is defined as follows:

SG(GN(v), GN(v')) = |GC| / min(|GN(v)|, |GN(v')|)   (1)

where |G| denotes the number of nodes of G, and GC is the maximal common subgraph of GN(v) and GN(v'). GC can be computed based on maximal clique detection. For GN(v) ∈ GN^m, GN(v') is the corresponding neighborhood graph in GN^(m+1) whose SG with GN(v) is the largest among the neighborhood graphs in GN^(m+1) and greater than a certain threshold value. In this way, we find all pairs of corresponding neighborhood graphs (and eventually corresponding nodes) from GN^m to GN^(m+1).
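The SG measure of Eq. (1) can be exercised with a deliberately crude stand-in for its hardest part: below, |GC| is approximated by the multiset overlap of node labels, whereas the paper computes the true maximal common subgraph via maximal clique detection. The function name and representation are mine:

```python
from collections import Counter

def sg_similarity(labels_v, labels_w):
    """SG(GN(v), GN(v')) = |GC| / min(|GN(v)|, |GN(v')|), Eq. (1), with |GC|
    approximated by the shared node labels of the two neighborhood graphs."""
    common = sum((Counter(labels_v) & Counter(labels_w)).values())
    return common / min(len(labels_v), len(labels_w))

# Neighborhood graphs represented by their node (region) labels:
print(sg_similarity(["red", "blue", "blue"], ["blue", "red"]))   # 1.0
print(sg_similarity(["red", "green", "sky"], ["blue", "red"]))   # 0.5
```

Normalizing by the smaller graph keeps SG in [0, 1], so the thresholding step described in the text (accept the best match only above a cutoff) applies uniformly regardless of neighborhood size.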
Object Graph
An STRG is first decomposed into Object Region Graphs (ORGs) to model moving objects. Consider a temporal subgraph, defined as a set of sequential nodes connected to each other by a set of temporal edges (ET). An ORG is the special case of a temporal subgraph of the STRG in which the spatial edge set ES is empty. However, due to the limitations of region segmentation techniques, differently colored regions belonging to a single object may not be detected as a single region. For instance, a person's body may consist of several regions such as the head, upper body, and lower body. Figure 2(a) shows an object segmented into four regions over three frames. Since there are four regions in each frame, four ORGs are built, i.e. (v1, v5, v9), (v2, v6, v10), (v3, v7, v11), and (v4, v8, v12), as in Figure 2(b). Since they belong to a single object, it is better to merge these ORGs into one.
For convenience, we refer to the merged ORGs as an Object Graph (OG). In order to merge two ORGs that belong to a single object, we consider the attributes (i.e. velocity and moving direction) of the temporal edges (ET). If two ORGs have the same moving direction and the same velocity, they can be merged into one. In Figure 2(c), the four ORGs are merged into a single OG, i.e. (v2, v6, v10). After the OGs are extracted, the remainder of the STRG represents the background information of the video. We call this graph a Background Graph (BG), and it is used in indexing.
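The merge rule (same velocity and same moving direction) amounts to grouping ORGs by their temporal-edge attributes. A sketch under assumptions of my own follows: each ORG is a small dict, and each group keeps its first node chain as the representative OG, whereas a real merger would fuse the member regions:

```python
from collections import defaultdict

def merge_orgs(orgs):
    """Group ORGs whose temporal-edge attributes match, then emit one OG per
    group. Each ORG is {'nodes': tuple, 'velocity': ..., 'direction': ...}."""
    groups = defaultdict(list)
    for org in orgs:
        groups[(org["velocity"], org["direction"])].append(org["nodes"])
    # Keep the first chain of each group as the representative OG:
    return [chains[0] for chains in groups.values()]

orgs = [{"nodes": ("v1", "v5", "v9"),  "velocity": 1.0, "direction": 90},
        {"nodes": ("v2", "v6", "v10"), "velocity": 1.0, "direction": 90},
        {"nodes": ("v3", "v7", "v11"), "velocity": 1.0, "direction": 90},
        {"nodes": ("v4", "v8", "v12"), "velocity": 1.0, "direction": 90}]
print(merge_orgs(orgs))  # [('v1', 'v5', 'v9')]: one object, one OG
```

Since all four chains share the same motion, the four ORGs of the Figure 2 example collapse into a single OG, matching the behavior described in the text.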
STRG Indexing
In this section, the paper proposes a graph-based video indexing method, called the Spatio-Temporal Region Graph Index (STRG-Index), which uses the metric-space Extended Graph Edit Distance (EGED_M) as a distance measure, together with clustered OGs.
The Extended Graph Edit Distance (EGED) between two object graphs OGm^s and OGn^t is defined in [5].
In order to satisfy the triangle inequality, the EGED is specialized into a metric distance function (see Theorem 1) by comparing the current value with a fixed constant.
Theorem 1: If gi is a fixed constant, then EGED is a metric.
STRG-Index Tree Structure
To build an index for video data, the procedure of tree construction proposed for the M-tree [5.4] is adapted, since it requires a minimal number of distance computations and has good I/O performance. In the M-tree, a number of representative data items are selected for efficient indexing. There are several ways to select them, such as sampling or random selection. In the STRG-Index, the clustering results are employed to determine the representative data items. The STRG-Index tree structure consists of three levels of nodes: the shot node, cluster node, and object node, as seen in Figure 3.
The top level has the shot node, which contains the information of each shot in a video. Each
record in the shot node represents a segmented shot whose frames share a background.
The record has a shot identifier (ShotID), a key RAG (Grkey), an actual BG (BGr), and an
associated pointer (ptr) that references the top of the corresponding cluster node. The
following figure shows an example of a record in the shot node.
The mid level has the cluster nodes, which contain the centroid OGs representing cluster
centroids. Each record indicates a representative OG among a group of similar OGs. A record
contains its identifier (ClusID), the centroid OG (OGc) of its cluster, and an associated
pointer (ptr) that references the top of the corresponding object node. The following figure
shows an example of a record in a cluster node.
The low level has the object nodes, which contain the OGs belonging to the same cluster. Each
record in an object node represents an object in a video and has the index key
(computed as EGEDM(OGm, OGc)), an actual OG (OGm), and an associated pointer
(ptr) that references the actual video clip on disk. The following figure shows an
example of a record in an object node.
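The three record layouts can be sketched as plain data structures. The field names (ShotID, Grkey, BGr, ClusID, OGc, OGm, ptr) follow the text, but the concrete Python types are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ObjectRecord:           # low level: one object graph
    key: float                # EGED_M(OGm, OGc), the metric index key
    og: object                # OGm, the actual object graph
    ptr: str = ""             # reference to the actual video clip on disk

@dataclass
class ClusterRecord:          # mid level: one cluster of similar OGs
    clus_id: int              # ClusID
    centroid_og: object       # OGc, the centroid object graph
    ptr: Optional[ObjectRecord] = None    # -> top of the object node

@dataclass
class ShotRecord:             # top level: one segmented shot
    shot_id: int              # ShotID
    key_rag: object           # Grkey, the key Region Adjacency Graph
    bg: object                # BGr, the actual Background Graph
    ptr: Optional[ClusterRecord] = None   # -> top of the cluster node
```

The `ptr` chain mirrors the shot → cluster → object traversal of the index.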
STRG-Index Tree Construction
Based on the STRG decomposition described above, an input video is separated into
foreground (OGs) and background (BG) subgraphs of the STRG. The extracted BGs are
stored at the root node without any parent. All the OGs sharing one BG are placed in the same cluster
node, which can reduce the size of the index significantly. For example, in surveillance videos the
camera is stationary, so the background is usually fixed; therefore, only one record (BG)
in the shot node is sufficient to index the background of the entire video.
We synthesize a centroid OG (OGc) for each cluster, which serves as the representative OG for that
cluster. This centroid OG is inserted into an appropriate cluster node as a record, and it is
updated as member OGs are inserted or deleted.
Each record in a cluster node also has a pointer to an object node. The object node holds the
actual OGs of a cluster, which are indexed by EGEDM. To decide the index value for each
OG, we compute EGEDM between the representative OG (OGc) of the corresponding cluster
and the OG (OGm) to be indexed. Since EGEDM is a metric distance by Theorem 1, this value
can serve as the key of the OG to be indexed.
IV. Moving Object Detection
Moving object detection is very important in intelligent surveillance. The main
detection algorithms currently include the frame difference method, the background subtraction method,
the optical flow method, and statistical learning methods. The optical flow method is the most
complex and takes more time than the other methods, while statistical learning methods
need many training samples and are also computationally expensive. These two
approaches are therefore not suitable for real-time processing. The background subtraction method is
extremely sensitive to changes in lighting. The frame difference method is simple and easy to
implement, but its results are not accurate enough, because changes in
background brightness cause misjudgment [6.1, 6.2, 6.3, 6.4]. Motivated by the observation that the human eye is
sensitive to both movement and edges, a recent work [6] presents an efficient algorithm for
moving object detection based on frame difference and edge detection. Figure 1
gives the flow chart of the frame difference method.
Figure 1 The flow chart of frame difference method
The flow chart of the detection process using the moving edge method is shown in Figure 2.
Figure 2 The flow chart of moving edge method
The flow chart of the detection process using the method based on frame difference and
edge detection presented in [6] is shown in Figure 3.
Figure 3 The flow chart of the improved algorithm
Further, object segmentation is performed to divide the image into a moving area and a static
area. After separating the moving objects from the background, the objects must be located so
as to obtain their exact positions. The common approach is to compute
connected components in the binary image, delete those connected components whose areas
are too small, and obtain the circumscribing rectangle of each remaining object.
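A minimal sketch of the frame-difference step described above, using only NumPy (the full algorithm in [6] also applies edge detection and discards small connected components; the threshold and frame sizes here are illustrative):

```python
import numpy as np

def detect_moving_regions(prev_frame, curr_frame, thresh=25):
    """Threshold the absolute inter-frame difference and return the
    circumscribing rectangle (x0, y0, x1, y1) of the changed pixels,
    or None when no motion is detected."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    mask = diff > thresh                      # binary motion mask
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

# Synthetic example: a bright 10x10 "object" appears between two frames
prev = np.zeros((64, 64), dtype=np.uint8)
curr = np.zeros((64, 64), dtype=np.uint8)
curr[20:30, 40:50] = 200
print(detect_moving_regions(prev, curr))  # -> (40, 20, 49, 29)
```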
V. Motion Picture Storage with Compression [8]
Animation of human-like virtual characters has potential applications in the design of human-
computer interfaces, computer games, and the modeling of virtual environments using power-
constrained devices such as laptop computers in battery mode, pocket PCs, and PDAs.
Distributed virtual human animation is used in many applications that depict human models
interacting with networked virtual environments. The two major issues involved in the
streaming of MoCap (Human Motion Capture) animation data to mobile devices are 1)
limited bandwidth available for streaming MoCap data, and 2) limited power available to
receive, decompress, and render the compressed MoCap data. It is desirable to have a
compression method that reduces the network bandwidth enough to allow streaming to,
and use on, mobile devices, and that also requires less computation, and hence less power consumption, on
the client side to reconstruct the motion data from the compressed data stream. In order to
standardize virtual human animation, MPEG-4 has proposed H-Anim standards for
representation of virtual humans and the format of the corresponding motion capture
(MoCap) data to be used for rendering and animating the virtual human [8.1], [8.2], [8.3]. A
recent compression algorithm for MoCap data (or, equivalently, MPEG-4 Body Animation
Parameters (BAP) data), termed BAP-Indexing [8.4], uses indexing techniques for the
compression of BAP data, resulting in a significant reduction in the power consumption required
for decompression. BAP-Indexing exploits the structural hierarchy of the virtual human to
achieve efficient compression, which, though lossy, results in reconstructed motion of good
quality.
Fig. 1. The standard compression and decompression pipeline for MPEG-4 Motion Capture (MoCap) data or Body Animation Parameter (BAP) data.
Matrix Representation of MoCap Data
The MoCap Data (or, equivalently, MPEG-4 BAP data) is represented by an n x m-
dimensional matrix X, where n is a multiple of the video sampling rate or frame rate
expressed as frames per second (fps) and m is the number of degrees of freedom for the
virtual human (the maximum value of m = 296 as defined in the MPEG-4 standard). Each
row of the matrix represents a pose of the virtual human for a small time step. Each column
of the matrix corresponds to either the displacement of the model from a fixed origin, or the
Euler angle of rotation needed to achieve the desired pose. We have used a 62-dimensional
virtual human, with a frame rate of 33 fps. This means that, for a 10 second motion
sequence, the motion matrix X is a 330 x 62 array of floating point numbers. The first three
columns of X represent the absolute displacement of the virtual human with respect to a
fixed origin in the 3D virtual world. The next three columns represent the absolute
orientation of the virtual human with respect to the virtual world coordinates. The remaining
56 columns correspond to the angles made by the degrees of freedoms associated with the
various joints in the skeletal virtual human model.
As a first step in the compression process, the matrix X is equivalently represented as a
difference matrix d of dimension (n-1) x m and an initial pose vector I, where I is assigned the first row of X,
and the rows of d are the differences between successive rows of X.
Ij = X1j,  j = 1, 2, …, m
dij = X(i+1)j − Xij,  i = 1, …, n−1; j = 1, …, m.
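The transform above and its exact inverse (cumulative summation of the differences) can be sketched as:

```python
import numpy as np

def pose_difference(X):
    """Split the motion matrix X into the initial pose vector I
    and the frame-to-frame difference matrix d."""
    I = X[0].copy()            # I_j = X_{1j}
    d = np.diff(X, axis=0)     # d_ij = X_{i+1,j} - X_{ij}
    return I, d

def reconstruct(I, d):
    """Inverse transform: cumulative sums of d restore X exactly."""
    return np.vstack([I, I + np.cumsum(d, axis=0)])

# e.g. a 10-second sequence at 33 fps with 62 degrees of freedom
X = np.random.RandomState(1).randn(330, 62)
I, d = pose_difference(X)
assert d.shape == (329, 62)
assert np.allclose(reconstruct(I, d), X)
```

The lossless round trip holds before quantization; the lossy step comes later, in the bucketing of d.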
The difference matrix d, subsequently termed the motion matrix, can be interpreted as
successive small angular increments (floating point numbers) needed by the virtual human,
for each of its degrees of freedom, in order to realize the desired animation. Without loss of
generality, we assume that d has n rows.
BAP-Indexing: Indexing of the BAP Data
For approximately periodic and regular motions such as walking, jogging, and running, a
collection of all the n x m floating point numbers within the corresponding motion matrix d
exhibit a tendency to form a finite number of well separated clusters. Taking a cue from this
observation, we assign the n x m floating point numbers in d to a finite number of buckets.
Each bucket, in turn, is associated with a representative number which best describes the
collection of the numbers within the bucket. The basic concept underlying the proposed
indexing technique is to be able to index some (perhaps all) of the numbers within the
original motion matrix d and generate a corresponding lookup table for the indices.
Indexing the Motion Matrix d
Step 1: All the data in the matrix d are collected into a single 1D array A of size n x m. The array
A is sorted in ascending order. All the numbers in A are multiplied by the resolution
quantization term (RQT), M. The RQT depends on the number of significant digits used to
represent the floating point numbers. For example, if the required accuracy of the floating
point numbers is a maximum of four digits, RQT = 10,000. The numbers are rounded off to
integers in the range [Amin·M, Amax·M].
Step 2: The integers in the range [Amin·M, Amax·M] are divided into buckets numbered from
0 to 255. It is desirable to allocate each of the 256 buckets an equal share of the n x m
numbers in A. The rationale behind assigning the 256 buckets an equal share of the numbers
in the motion sequence is that BAP data clusters containing more data points are
assigned more buckets (hence, more indices). In essence, the indices are distributed among
the clusters in proportion to cluster size (note that the number of indices is fixed by
fixing the number of bits per index). This scheme is similar to adaptive vector quantization,
which is known to reduce the overall encoding error. Thus, each bucket should have freq =
(n x m)/256 numbers allocated to it. This is achieved by computing the histogram
of the integers in A and dividing the histogram into 256 vertical strips such that each strip
has the same area, freq. After all the numbers in A have been allocated to a bucket numbered
from 0 to 255, the numbers in A are divided by the RQT to recover the original values.
At the end of Step 2, we get a set of 256 buckets, denoted bucket(j) for j = 0 to 255,
such that each floating point entry in the motion data matrix d is contained in exactly one
of the 256 buckets. An index matrix dindex is used to store the bucket number (index) for
the corresponding entry in the matrix d.
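Steps 1 and 2 can be sketched with empirical quantiles, which is equivalent to cutting the histogram into 256 strips of equal area; the RQT value and matrix sizes below are illustrative.

```python
import numpy as np

def index_motion_matrix(d, n_buckets=256, rqt=10_000):
    """Quantize d by the RQT and assign every entry to one of
    n_buckets equal-frequency buckets derived from the quantiles
    of the sorted, scaled data (Steps 1-2 of BAP-Indexing)."""
    A = np.sort(np.rint(d.ravel() * rqt))     # Step 1: scale, round, sort
    # Interior quantiles act as bucket boundaries (255 edges -> 256 buckets)
    edges = np.quantile(A, np.linspace(0, 1, n_buckets + 1)[1:-1])
    d_index = np.searchsorted(edges, np.rint(d * rqt), side="right")
    return d_index.astype(np.uint8)           # one byte per entry

d = np.random.RandomState(0).randn(330, 62)   # a synthetic motion matrix
d_index = index_motion_matrix(d)
assert d_index.shape == d.shape and d_index.max() <= 255
```

Storing one byte per entry in place of a float is where the basic compression gain comes from; the lookup table described next recovers approximate values.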
Lookup Table for the Index Matrix dindex
The lookup table is used to map each index to a corresponding representative
number such that a suitable approximation to the original motion matrix d can be recovered.
The creation of an appropriate lookup table for the recovery of the original motion data
matrix d from the index matrix dindex is critical, since recovery of the original data after
discretization invariably results in motion distortion. A straightforward method to recover the
number associated with a bucket is to compute the simple average of all the floating point
numbers assigned to that bucket. However, this invariably leads to a poor approximation of the
original motion matrix d. We have observed that intelligent exploitation of the hierarchical
structure of the skeletal virtual human model can lead to a better lookup
table Tlookup, which in turn reduces the error in the motion reconstructed using the
lookup table. The steps for creating the lookup table are detailed as follows:
Step 1: The virtual human is represented by a hierarchical skeletal model. For each m-
dimensional pose vector, each dimension, or column in the motion matrix d, is assigned a
level li (Fig. 2). The level li signifies the importance of the degree of freedom associated with
a particular joint in the overall displacement of the model joints. A joint i, at level li = 1,
when given a small angular displacement, affects the model more in terms of the overall
displacement, than a joint j at level lj = 2, 3, 4, 5, or 6.
Step 2: After assigning level values to the various joints of the virtual human model, these
joint level values are used to compute a weighted sum of the numbers belonging to a
bucket. The jth lookup value in lookup table Tlookup is given by:
where η is a constant. Empirical observations have revealed that as η increases, the
Tlookup values yield a better approximation to the data, resulting in reduced
displacement error. This is because the numbers associated with level = 1 affect
the displacements of the body the most; hence, emphasizing the numbers within a bucket
with level = 1 leads to a better approximation of the motion data. As η → ∞, all the weighting
terms in (1) tend to zero, except for the terms with level = 1. Hence, when computing the
weighted sum of the numbers in a bucket, we consider only those numbers with level = 1
(selective averaging) and compute a simple mean of these numbers. If none of the entries
in a bucket has level = 1, we use the next smallest level to compute the weighted sum.
Our empirical observations have shown that the BAP data values from all levels of the virtual
human model form compact and well separated clusters. The data values with level = 1 in
each bucket are fairly close to each other. This allows selective averaging (1) to be
performed without introducing too much visual distortion.
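The selective-averaging rule (the η → ∞ limit of the weighted sum) can be sketched as follows; the data here are synthetic and the function signature is our own.

```python
import numpy as np

def build_lookup(values, levels, bucket_ids, n_buckets=256):
    """For each bucket, average only the entries whose joint level is
    the smallest present (level 1 when available), mimicking the
    selective-averaging limit described in the text."""
    table = np.zeros(n_buckets)
    for j in range(n_buckets):
        in_bucket = bucket_ids == j
        if not in_bucket.any():
            continue                        # empty bucket stays at 0
        lmin = levels[in_bucket].min()      # prefer level-1 entries
        sel = in_bucket & (levels == lmin)
        table[j] = values[sel].mean()
    return table

rng = np.random.RandomState(2)
values = rng.randn(1000)                    # entries of the motion matrix d
levels = rng.randint(1, 7, size=1000)       # joint levels 1..6 per entry
bucket_ids = rng.randint(0, 256, size=1000) # bucket index per entry
T = build_lookup(values, levels, bucket_ids)
assert T.shape == (256,)
```

Decompression then reduces to the table lookup `T[d_index]`, which is what makes the scheme cheap on power-constrained clients.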
Fig. 2. An example of the hierarchical structure of a virtual human skeletal model consisting of 31 nodes, with a total
of 62 degrees of freedom of motion (rotational and translational). For convenience, the root node is drawn at the
bottom.
The above-mentioned paper further describes Motion Matrix Decomposition for motion
sequences of long duration.
VI. Other Techniques for Motion Pictures
Besides MPEG-4, there exist other ad hoc quantization methods for efficient use and
distribution of MoCap data over a network. Endo et al. [8.5] propose quantization of the
motion type, rather than the motion data itself. Hijiri et al. [8.6] describe a new data packet
format that allows flexible scaling of the transmission rate, together with a data compression
method, termed SHCM, which maximizes the efficacy of this format by exploiting the 3D
scene structure. BAP-Indexing [8.4] uses quantization to achieve data
compression in a manner somewhat similar to the above work, but incorporates intelligent
exploitation of the hierarchical structure of the human skeletal model. Giacomo et al. [8.7]
present methods for adapting a virtual human’s representation and the resulting animation
stream, and provide practical details for the integration of these methods into MPEG-4 and
MPEG-21 architectures. Aubel et al. [8.8] present a technique for using impostors to improve
the display rate of animated characters by acting solely on the geometric and rendering
information. Recently, Arikan [8.9] has presented a comprehensive MoCap database
compression scheme which is shown to result in a significantly compressed MoCap
database. The above techniques, although very efficient in terms of compression ratio, do
not address the need for customized compression of BAP data for power-aware devices. To
this end, the BAP-Indexing technique is a refined and special case of standard
clustering, quantization, and lookup (CQL) based compression schemes. BAP-Indexing not
only allows for low-bitrate encoding of motion data, but is also suitable for data reception
and data reconstruction on power-constrained devices.
Clues for future work on motion picture:
A drawback associated with most animation research is that there is no perfect quantitative
measure of the quality of the reconstructed motion. The compression error (the
displacement of a body segment from its original location) is easily perceptible when the
body segment touches an object in the environment, whereas a relatively large error is acceptable
when the body segment is moving in empty space. This observation can be exploited to
enhance the compression ratio, provided that detailed models of the environment and of the
virtual human's interaction with the environment are available. Finally, the intelligent
use of the hierarchical structure of the model yields good results for full-body motions of the
virtual human; for small, delicate motions such as finger movement, or for facial
animation, the proposed technique offers considerable scope for future improvement.
VII. Modeling and refinement of Audio data
The management of large collections of music data in a multimedia database has received
much attention in the past few years. The inherent characteristics of audio data impose
demands for huge storage space, large bandwidth with real-time transmission
requirements, content-based queries, similarity-based search and
retrieval, and synchronization of retrieval results. Of interest to the user are easy-to-use
queries with fast and correct retrievals from the audio/multimedia database. To this end, (1)
derivation of good features from the data to be used as indices during search, (2)
organization of these indices in a suitable multi-dimensional data structure with efficient
search, and (3) a good measure of similarity (distance measure) are important factors. An
audio database supporting content-based retrievals should have the indices structured with
respect to the audio features, which are extracted from the data.
In research on content-based music retrieval, many approaches extract features,
such as key melodies, rhythms, and chords, from the music objects and develop indices that
help to retrieve the relevant music efficiently [9.5][9.8][9.12]. Several reports have also
pointed out that these features of music can be transformed and represented as
music feature strings [9.1][9.2][9.4][9.6][9.7] or numeric values [9.10][9.11], so that the
indices can be created for music retrieval. These features can also be combined to support
various types of queries.
Existing Multi-feature Indexing for Music Data
In research on indexing for music database retrieval, most existing work has
concentrated on constructing single-feature index structures for query searching: for
instance, in 1999, Key Melody Extraction and N-note Indexing by Tseng, Melodic
Matching Techniques by Uitdenbogerd et al., and an Approximate String Matching Algorithm by
21
8/3/2019 Dimensions in Data Processing 2402
http://slidepdf.com/reader/full/dimensions-in-data-processing-2402 22/76
Liu et al. [9.9]; in 2000, Query by Music Segments by Chen et al. [9.2]; and in 2002,
Numeric Indexing by Lo et al. [9.10]. Only a couple of studies have emphasized
how to create a multi-feature index for music data retrieval. The most recent works are
Multi-Feature Index Structures [9.6] and Multi-Feature Numeric Indexing [9.11]. We briefly
discuss these two approaches in the following subsections.
i. Grid-Twin Suffix Trees
Four multi-feature index structures for music data retrieval were proposed by Lee and
Chen [9.6]: Combined Suffix Trees, Independent Suffix Trees, Twin Suffix Trees, and
Grid-Twin Suffix Trees. The authors claimed that the Grid-Twin Suffix Trees structure
provides the most scalability among them. The Grid-Twin Suffix Trees structure is an
improved version of the Twin Suffix Trees. An example of Twin Suffix Trees is shown in
Figure 1. There can be two music features in the Twin Suffix Trees; each feature has
its own independent suffix tree, and there are links pointing from each node in one
independent suffix tree to the corresponding feature nodes in the other.
Figure 1. Construction of the Twin Suffix Trees.
Figure 2. An example of the Grid-Twin Suffix Trees.
To construct the Grid-Twin Suffix Trees, they first use a hash function to map each suffix of a
feature string into a specific bucket of a 2-dimensional grid. The hash function uses the first
n symbols of the suffix to map it into a specific bucket. After hashing all suffixes, the
remaining symbols of the feature string following the first n are used to construct the Twin
Suffix Trees attached under the buckets. Figure 2 shows an overview of the
structure of the Grid-Twin Suffix Trees. Considering melody and rhythm only, the hash function
is as follows,
where x and y are the row and column coordinates, respectively, and P(x, y) denotes the
position of the bucket. Numm and Numr are the alphabet sizes of the melody and rhythm
features, and Mi and Ri are the values of the ith symbols of the melody and rhythm,
respectively. The length of the hashed suffix prefix is denoted by n.
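The exact hash appears as an equation image in [9.6] and is not reproduced here; the sketch below is one plausible reading, treating the first n symbols of each feature as digits of a positional number system. The alphabet sizes (num_m, num_r) are made-up illustrative values.

```python
def grid_bucket(melody, rhythm, n=2, num_m=12, num_r=8):
    """Map a (melody, rhythm) suffix pair to a 2D grid bucket using
    only the first n symbols of each feature string."""
    x = sum(melody[i] * num_m ** (n - 1 - i) for i in range(n))  # row
    y = sum(rhythm[i] * num_r ** (n - 1 - i) for i in range(n))  # column
    return (x, y)   # P(x, y): position of the bucket in the grid

# Suffixes sharing the same first n symbols land in the same bucket,
# regardless of what follows:
assert grid_bucket([3, 5, 7], [1, 2, 4]) == grid_bucket([3, 5, 0], [1, 2, 9])
```

This also makes the memory cost visible: the grid has num_m^n x num_r^n buckets, which grows quickly with n, the drawback noted later in the text.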
ii. Multi-Feature Numeric Indexing
The Multi-Feature Numeric Index for music data retrieval was proposed by Lo and Chen
[9.11]. To translate music data into numeric values, they assume that the music symbols
'a', 'b', 'c', …, 'm' map to the integer values 0, 1, 2, …, m-1, respectively. If we pick a
music segment of n sequential notes from a melody feature string, denoted x1x2…xn, the
integer value of each note can be represented by P(xi), 1 ≤ i ≤ n. This segment of
n sequential notes can then be transformed into a numeric value by the conversion function
v(n), as shown below.
Each music feature segment can be converted into a numeric value by equation (2), and
the values for a set of feature segments can be regarded as a coordinate in a multi-
dimensional space. The coordinate can then be inserted into a multi-dimensional index
tree, such as an R-tree [9.3], for music retrieval. The scheme can therefore also be extended to
convert three or more features into a high-dimensional index tree.
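Equation (2) is likewise shown as an image in the source; a hedged reading is a base-m positional encoding of the n-note segment, which maps each distinct segment to a unique integer.

```python
def segment_value(segment, m=13):
    """Convert an n-note segment (a string over 'a'..'m') into a single
    numeric value, assuming P('a') = 0, P('b') = 1, ... as in the text."""
    return sum((ord(c) - ord('a')) * m ** (len(segment) - 1 - i)
               for i, c in enumerate(segment))

assert segment_value("abc") == 0 * 13**2 + 1 * 13 + 2   # = 15
assert segment_value("abc") != segment_value("acb")      # order-sensitive
```

Because the encoding is tied to the segment length n, a query of a different length does not map into the same coordinate space, which is exactly the inflexibility criticized below.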
The authors claimed that the Grid-Twin Suffix Trees provide more scalability than the
other three index structures in [9.6]. However, if there are more features, or if more
symbols of the suffixes (n > 2) are used to map into the buckets, a massive amount of memory
is needed for the Grid-Twin Suffix Trees to construct the buckets of the grid structure,
and the grid may become a sparse matrix. In addition, since the
numeric index is created by transforming music segments of a fixed length (n in equation (2))
into numeric values, the main drawback of the Multi-Feature Numeric Index is that the length of a
query (Query By Example, QBE) is inflexible: it should equal the length of the music
segments over which the index was created. Otherwise, the search time for the query increases
severalfold.
iii. Hybrid Multi-Feature Indexing
In a work [9], a hybrid multi-feature indexing scheme has been proposed. It combines the advantages of the
Multi-Feature Numeric Index and the Grid-Twin Suffix Trees to construct a new index structure
that needs less memory space than the Grid-Twin
Suffix Trees and, unlike the Multi-Feature Numeric Index, has no query length
restriction. To construct the Hybrid Multi-Feature Index, a multi-feature tree structure is used
instead of the grid structure of the Grid-Twin Suffix Trees. The Twin Suffix Trees originally placed under each
bucket are now linked under the corresponding leaf nodes of the multi-feature tree in the Hybrid Multi-
Feature Index. The work organizes the creation of the indexing approach in the following
three steps:
Step 1: Suppose that there are d features in the music data and that, in each music feature string,
the first n symbols of the suffix are transformed into a coordinate. Equation (3)
for the d-feature coordinate P(x1, …, xd) is designed as follows,
where F1(i), …, Fd(i) and N1, …, Nd represent, respectively, the symbol values and alphabet
sizes for the d music features. We note that any suffix of a music segment, such as
"a1" or "a1b2", has exactly one corresponding coordinate.
Step 2: The coordinate derived in Step 1 is then inserted into a d-feature (d-dimensional)
tree. The degree of each non-leaf node in this d-feature tree is 2^d. Each non-leaf node also has a center
point. The coordinate (x1c, x2c, …, xdc) of the center point is
computed by averaging the coordinates inserted under the current node and its descendant
nodes. Thus, if there are 2 features and the center point is (x1c, x2c), the node is
partitioned into four domains: (≥ x1c, ≥ x2c), (< x1c, ≥ x2c), (≥ x1c, < x2c), and (< x1c, <
x2c). To keep the index tree balanced, as in an R-tree, each non-leaf node in this d-feature tree
contains at least 2^(d-1) non-null links (half full). Therefore, inserting a new coordinate into a
node may cause the center point to be recomputed, or may cause the index tree to be
reorganized.
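The Step 2 partitioning rule can be sketched as a child-selection function; the bit-per-dimension encoding of the 2^d domains is our illustrative choice, not necessarily the layout used in [9].

```python
def child_index(p, center):
    """Route a d-dimensional coordinate p to one of the 2^d children
    of a non-leaf node: bit k of the index records whether p[k] >= center[k]."""
    idx = 0
    for k, (pk, ck) in enumerate(zip(p, center)):
        if pk >= ck:
            idx |= 1 << k
    return idx   # in [0, 2^d - 1]

# d = 2, center (5, 5): the four domains listed in the text
assert child_index((7, 9), (5, 5)) == 0b11   # (>= x1c, >= x2c)
assert child_index((2, 9), (5, 5)) == 0b10   # (< x1c,  >= x2c)
```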
Step 3: The remaining symbols behind the first n symbols of suffix are then used to
construct the Twin Suffix Trees linked under d-feature tree. Figure 3 and Figure 4 represent
the structures of Hybrid 2-Feature Index and Hybrid 3-Feature Index, respectively.
Figure 3. The structure of the Hybrid 2-Feature Index.
Figure 4. The structure of the Hybrid 3-Feature Index.
Use of Stochastic (Statistical) Analysis in Image Processing
I. Image Compression
A proprietary method of image compression has been developed [10] which intelligently
stores a coarse version of the imagery and recovers it by means of
Stochastic Matrix Method (SMM) function recovery. Figure 5 illustrates this.
Figure 5. Closeup look at the face of a bird. Top: input image; Bottom: interpolated image. The input image is what is
stored and the interpolated image is what is viewed by a user.
The input image is very coarse and certain features are hard to discern, but the interpolated
image recovers much of this content and makes it intelligible to the human eye.
Figure 6 illustrates the advantages over the ubiquitous JPEG DCT coder. On close
inspection, the JPEG DCT coder reveals its 8-pixel by 8-pixel blocks in its characteristic
artifact, which becomes a nuisance at significant compression levels. It is clear from the
figure that function recovery by means of SMMs does not suffer from this problem.
Figure 6. This is a closeup look at the back of the head of a bird in order to illustrate the blockiness of JPEG
compression and the lack of it with the compression scheme by means of SMM function recovery. Left: JPEG DCT
image, the 8 x 8 blocks are apparent. Centre: image interpolated from Right by means of SMM function recovery.
Right: image that is actually stored and which is the input image for the function recovery.
II. Moving Object Detection [11]
A common approach to detecting foreground objects is to collect pixels in the current frame
that deviate significantly from the model estimations. Such methods can generally be
classified as predictive or non-predictive. Predictive methods develop dynamical time-series models to predict the current input based on past observations. The Kalman
filter was first introduced by Koller et al. [11.1] for modeling the dynamic states of
background pixels. The optical flow based method is a natural approach to modeling persistent
motion behavior. Wixson [11.2] presented a method to detect salient motion by
accumulating directionally consistent flow. Tian [11.3] combined temporal difference
imaging and a temporally filtered motion field to detect salient motion in complex
environments. Recent methods are based on more complicated models; in [11.4], an
autoregressive model was proposed to capture the properties of dynamic scenes. Non-
predictive, density-based methods neglect the order of observations and build a
probabilistic representation (PDF) of the observations at a particular pixel. Wren [11.5] used
a single Gaussian intensity distribution for each pixel. The idea was subsequently extended to
the mixture of Gaussians model (MGM) proposed by Stauffer and Grimson [11.6] to address
the multi-modality of the background. When density functions are so complex that they cannot be
modeled parametrically, the non-parametric approaches proposed by Elgammal [11.7] are
considered more suitable for handling arbitrary densities; there, kernel density functions [11.8]
are used for pixel-wise background modeling. However, this is computationally costly and makes no
explicit use of the spatial correlation of the pixel features.
In a work by Tang, Gao, and Liu [11], a real-time moving object detection
algorithm is proposed that recursively clusters salient motion points into a spatial and kinetic mixture of Gaussians
model. In each frame, temporal difference filtering first generates a set of
feature points; then validation and salience evaluations are performed for every feature
point, followed by resampling operations, so that only those samples that
strongly support a cluster of salient moving objects in the feature space are preserved. The clusters are
instantiated and updated using an online approximation algorithm, and are terminated
when their component weights drop below a threshold.
Brief overview of the model:
Model Specification
A four-dimensional feature vector describes the state of each sample, i.e., zi =
(x, y, x', y')i, i ∈ [1, N], zi ∈ ℜ4, N ∈ ℵ, where (x, y)i represents the point's
coordinates, (x', y')i denotes its motion velocity, and N is the number of samples. For
simplicity, let si = (x, y)i and vi = (x', y')i describe the spatial and motion information
respectively. Assuming we have the initial mixture distribution, feature points can be
associated with one of the K clusters (Fig. 3). The likelihood of a feature point belonging to
the foreground can be written as:
(1)
where qk is the prior of the kth Gaussian component in the mixture model, and η(zi; μk, Σk)
is the kth Gaussian component, defined as:
(2)
where d = 4 is the dimension of the MGM models.
We further assume that the spatial and kinetic components of the MGM model are
decoupled, i.e., the covariance matrix of each Gaussian component takes the block diagonal
form:
(3)
where s and v stand for the spatial and kinetic features respectively. With such
decomposition, each Gaussian component has the following factorized form:
(4)
Based on the above representation of moving objects, clustering analysis
was then performed, employing a Gaussian-distribution-based K-means technique over the
sample data.
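Equations (3) and (4), shown as images in the source, state that with a block-diagonal covariance the 4-D Gaussian density factorizes into a spatial term over s = (x, y) and a kinetic term over v = (x', y'). The sketch below verifies this numerically; the specific covariance values are illustrative.

```python
import numpy as np

def gauss(u, m, C):
    """Multivariate Gaussian density at u with mean m, covariance C."""
    k = len(m)
    diff = u - m
    norm = 1.0 / np.sqrt((2 * np.pi) ** k * np.linalg.det(C))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(C) @ diff)

def factored_density(z, mu, cov_s, cov_v):
    """Factorized form: spatial term over (x, y) times kinetic term
    over (x', y'), valid when the covariance is block-diagonal."""
    return gauss(z[:2], mu[:2], cov_s) * gauss(z[2:], mu[2:], cov_v)

z = np.array([1.0, 2.0, 0.5, -0.5])       # one feature sample (x, y, x', y')
mu = np.zeros(4)
cov_s, cov_v = 4.0 * np.eye(2), 0.25 * np.eye(2)
full_cov = np.block([[cov_s, np.zeros((2, 2))],
                     [np.zeros((2, 2)), cov_v]])
# The product of the two 2-D densities equals the full 4-D density
assert np.isclose(factored_density(z, mu, cov_s, cov_v),
                  gauss(z, mu, full_cov))
```

The decoupling is what allows spatial and velocity clusters to be maintained and updated independently within each mixture component.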
To address the selection and estimation of the features, several steps are performed: a motion map is first obtained by temporal difference of Gaussian from each frame, from which a number of feature points are extracted using Monte Carlo importance sampling, and their associated velocities in the sequence are calculated using the LK optical flow algorithm. Feature points are thus extracted in the position-velocity space. Temporal difference imaging helps to detect slow-moving objects, gives better object boundaries, and speeds up the algorithm, because the temporal filter of optical flow is applied only to the regions of change detected by temporal difference imaging. Within such a region, a pixel's motion is considered salient if the pixel and its neighborhood move in the same direction over a period of time.
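The motion-map and sampling steps can be sketched as follows. This is a simplified NumPy sketch: a plain absolute frame difference stands in for the temporal difference of Gaussian, and uniform sampling over the changed region stands in for Monte Carlo importance sampling over the motion map.

```python
import numpy as np

def motion_map(prev_frame, frame, threshold=15):
    """Temporal difference imaging: mark pixels whose intensity changed
    by more than `threshold` between consecutive frames. Feature
    extraction and optical flow are then restricted to this region."""
    diff = np.abs(frame.astype(np.int32) - prev_frame.astype(np.int32))
    return diff > threshold

def sample_feature_points(mask, n_points, rng=None):
    """Draw feature points from the changed region (a simplification of
    Monte Carlo importance sampling over the motion map)."""
    rng = rng or np.random.default_rng(0)
    ys, xs = np.nonzero(mask)
    idx = rng.choice(len(xs), size=min(n_points, len(xs)), replace=False)
    return np.stack([xs[idx], ys[idx]], axis=1)   # (x, y) coordinates
```

The sampled points would then be paired with LK optical-flow velocities to form the (x, y, x′, y′) vectors of the model above.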
Figure 1. Foreground (light gray) and background (dark gray) pixel colour distributions.
Figure 2. Example of clustering motion vectors extracted from two reversing cars by using SKMGM. (a) represents the sample points and (b) is an instance of the SKMGM distribution.
Figure 3. Example of the MGM distributions in position and velocity space respectively. (a) specifies the spatial distribution, and (b) depicts the velocity-space distribution.
Even though many background models have been proposed in the literature, the problem of moving-object detection in complex environments is still far from completely solved. The above-mentioned techniques are important for object detection and tracking in video surveillance and similar applications.
Indexing: An explicit discussion
Since the relative proportion of multimedia (video, image and audio) data within databases is expected to increase substantially in the future, keyword-based indexing would be inadequate, and efficient content-based query and retrieval are required. The problem of devising content-based query, indexing, and retrieval for these newer data types remains open and challenging. Apart from the techniques discussed above, in particular multifeature music representation and retrieval and the graph-based model for video data, we find the following approaches available in the literature and in practice for other data types such as audio, and in general.
I. Content Based Indexing & Retrieval
Content-based retrieval of multimedia database calls for content-based indexing techniques.
Different from conventional databases, where data items are represented by a set of
attributes of elementary data types, multimedia objects in multimedia databases are
represented by a collection of features; similarity of object contents depends on context and
frame of reference; and features of objects are characterized by multimodal feature
measures. These lead to great challenges for content-based indexing. On the other hand,
there are special requirements on content-based indexing: To support visual browsing,
similarity retrieval, and fuzzy retrieval, nodes of the index should represent certain meaningful categories.
Indexes are crucial for those large databases to speed up the retrieval. On the other hand,
visual, fuzzy and similarity queries in those large content-based databases cannot be
implemented using conventional indexing techniques such as B-trees and inverted files,
which have proved very effective in traditional databases for indexing attributes and text.
This is because the feature measures of object contents are complex and are usually
multidimensional and multimodal. Conventional indexing techniques are based on individual
keys, which are definite and not visual. For the purpose of handling complex feature
measures, research has been carried out to extend the concept of indexing using abstraction and classification [12.8], [12.9], [12.10], [12.20]. To handle multimodal feature measures and to gain self-organization and learning capabilities in indexing, Jian-Kang Wu [12] developed a Content-based Indexing (ContIndex) method for indexing multimedia objects.
Content-Based Retrieval
For completeness of the discussion, let us start from the multimedia object definition in [5]
as follows:
A Multimedia Object (MOB) can be defined as a six-tuple O_mob = ⟨U, F, M, A, O_p, S⟩, where:
• U is the multimedia data component.
• F = {F_1, F_2, ...} represents a set of features derived from the data. A feature F_i can be either numerically characterized by feature measures in the feature space F_i^1 × F_i^2 × F_i^3 × … × F_i^{n_i}, or conceptually described by a set of concepts.
• M^j = {M_1^j, M_2^j, ...} represents the interpretations of the features F_i, i = 1, 2, ...
• A stands for a set of attributes or particulars of O_mob.
• O_p is a set of pointers or links, expressed as O_p = {O_p^sup, O_p^sub, O_p^other}, the three types of pointers/links pointing/linking to superobjects, subobjects, and other objects, respectively.
• S represents the set of states of O_mob.
The content of a multimedia object is the content of its data set U, which is restricted to a certain set of features F_i, i = 1, 2, ... of the object, characterized by feature measure sets F_i^k, k = 1, 2, ..., and further described by concept sets M^j, j = 1, 2, ... In many cases, feature measures are vectors, written as F_i^j = (x_1, x_2, …, x_n)^T.
For example, a facial image can be represented by focusing attention on visual features such as chin, hair, eyes, eyebrows, nose, and mouth. To characterize eyes, we need to extract measures such as area and fitting parameters of a deformed template. These feature measures are vectors and can be considered as points in feature spaces. Eyes can also be described by a set of concepts such as big eyes, medium eyes, or small eyes. The concepts “big,” “medium,” and “small” are interpretations of the facial feature “eyes.”
Fig. 1 shows a representation hierarchy for images in content-based image databases. In the image archival phase, a bottom-up process is performed to derive from the original image data the feature measures of regions of interest, and interpretations if necessary. This bottom-up process consists of three steps, namely segmentation, feature extraction, and concept mapping. It performs information abstraction and provides keys for easy access to large image data. In the retrieval phase, the image data are accessed through their feature measures (similarity query) or interpretations (descriptive query), which are considered keys from the database point of view. Content-based retrieval usually does not access the data through the attributes A, or directly through the data component U; instead, it operates on feature measures.
Fig. 1. Image representation hierarchy. To archive images into content-based image database, images are first segmented to identify
regions of interest. Feature measures are then extracted from the image data within these regions. Interpretations can be finally
generated by mapping of the feature measures into a set of concepts.
Content-based retrieval is to find the best matches from a large database for a given query object. The best match is defined in terms of a similarity measure. Since the contents of objects are represented by features, similarity is defined with respect to these features:

Sim(O_q, O) = Σ_i w_i · sim(F_i^q, F_i)    (1)

where w_i denotes the weight for the i-th feature, and sim(F_i^q, F_i) denotes the similarity between the query object and an object in the database with respect to the i-th feature. Here we simply express the similarity between objects as a linear combination of the measures of their common and distinctive features [12.15].
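The weighted combination of Eq. (1) can be sketched as follows. This is an illustrative Python sketch; the inverse-distance per-feature similarity is an assumption, and any per-feature similarity in [0, 1] could be substituted.

```python
import numpy as np

def feature_sim(fq, f):
    """Similarity between two feature vectors, here 1/(1 + Euclidean
    distance); an assumed stand-in for sim(F_i^q, F_i)."""
    return 1.0 / (1.0 + np.linalg.norm(np.asarray(fq) - np.asarray(f)))

def object_similarity(query_feats, obj_feats, weights):
    """Eq. (1): weighted linear combination of per-feature similarities."""
    return sum(w * feature_sim(fq, f)
               for w, fq, f in zip(weights, query_feats, obj_feats))

def best_matches(query_feats, database, weights, top=3):
    """Rank database objects (each a list of feature vectors) by Eq. (1)."""
    scored = [(object_similarity(query_feats, feats, weights), oid)
              for oid, feats in database.items()]
    return [oid for _, oid in sorted(scored, reverse=True)[:top]]
```

The weights w_i let an application emphasize, say, eye shape over hair colour without changing the retrieval machinery.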
Content-based indexing aims to create indexes that facilitate fast content-based retrieval of multimedia objects in large databases. The index in traditional databases is quite simple: it operates on attributes of primitive data types such as integer, float, and string. For example, to build a binary index tree on the age of people in a database, the first two branches can be created for “age >= 35” and “age < 35.” Here the operation is simple and the meaning is definite and obvious. The situation becomes far more complex in content-based indexing, which operates on complex feature measures.
The challenges for content-based indexing are:
• The index must be created using all features of an object class, so that visual browsing of the object class is facilitated and similarity retrieval using the similarity measure in (1) can be easily implemented.
• The context and frame of reference in similarity evaluation require that nodes in the index tree show consistency with respect to the context and frame of reference. For example, if, at one level of an index tree, the similarity is evaluated with respect to eye size, the nodes at this level will represent object categories with various eye sizes. This implies that the index tree has properties similar to a classification tree.
• Multiple multimodal feature measures should be fused properly to generate the index tree so that a valid categorization is possible. Two issues must be addressed here: first, one measure alone is usually not adequate because of the complexity of objects; second, to ensure the specified context and frame of reference, care must be taken in the feature selection process.
The Content-Based Indexing method developed in [12] seeks to solve the above difficulties. It shares features with a classification tree. Horizontal links among nodes at the same level enhance the flexibility of the index. A special neural-network model, called Learning based on Experiences and Perspectives (LEP), has been developed to create node categories by fusing multimodal feature measures. It brings to the index the capability of self-organizing
nodes with respect to certain context and frames of reference. Algorithms have been
developed to support multimedia object archival and retrieval using ContIndex.
Assume Σ is a set of multimedia objects and Ω = {ω_1, ω_2, ..., ω_m} represents a set of m classes into which Σ is to be classified. Assume also that Ω satisfies:
1) ω_i ≠ Σ for all i = 1, 2, …, m;
2) ∪_{1≤i≤m} ω_i = Σ;
3) ω_i ≠ ω_j for i ≠ j.
The indexing process consists of the recursive application of a mapping Γ = η(D, Ω), where D is a set of parameters defining the mapping; the classes in Ω represent the categories of the multimedia object set Σ and are associated with the nodes N_1, N_2, ..., N_m of the index tree. In the ContIndex tree, the number of classes m is kept the same for all intermediate nodes for manipulation efficiency; in this case, the index tree is an m-tree. The mapping Γ is defined by D and Ω. According to the definition, Ω is a set of classes representing the goal of the mapping, and D is related to the set of feature measures used for the mapping. When the mapping is defined, D is represented by a set of reference feature vectors. For simplicity, only one feature is used to create each level of the index tree.
Fig. 2 shows the first three levels of a ContIndex tree. The features selected for the creation of these three levels are F_{l0} = F_i, F_{l1} = F_j, and F_{l2} = F_k. Nodes are labeled with a number of digits equal to their level number (the root is at level 0). For example, N21 is a node in the second level and is the first child of node N2; N21, N22, ... are the children of node N2. They are similar with respect to feature F_{l0} = F_i, inherit the reference feature vectors of feature F_i, and represent categories (ω21, ω22, ...) with respect to feature F_{l0} = F_i. New reference feature vectors will be created for them upon the creation of these nodes.
Fig. 2. The structure of the content-based index ContIndex. As indicated in the figure, the features selected for the creation of the three levels of the index tree are F_{l0} = F_i, F_{l1} = F_j, and F_{l2} = F_k. Nodes are labeled with a number of digits equal to their level number (the root is at level 0). For example, N21 is a node in the second level; it is the first child of node N2.
A top-down algorithm for the creation of m-tree ContIndex is summarized as follows:
1) Attach all objects to root and start the indexing process from the root and down to leaf
node level.
2) For each node at a level: Select a feature, partition the multimedia objects into m classes
by using a set of feature measures, create a node for each class, and generate a reference
feature vector(s) of the selected feature and an iconic image for each node.
3) Repeat the second step until each node has, at most, m descendants.
4) Starting from the second level, build horizontal links with respect to features that have already been used at the levels above.
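Steps 1-3 above can be sketched as follows. This is illustrative Python, not the published algorithm: a crude sort-and-split partition stands in for the LEP network, and the per-level feature tables are hypothetical.

```python
import numpy as np

class Node:
    def __init__(self, objects, level):
        self.objects = objects        # object ids at or below this node
        self.level = level
        self.children = []
        self.reference = None         # reference feature vector of the class

def build_contindex(objects, features, m=2, level=0):
    """Top-down creation of an m-tree index (steps 1-3).
    `objects` is a list of ids; `features[level]` maps an id to the
    feature vector used at this level (one feature per level, as in
    ContIndex). Partitioning is a plain sort-and-split stand-in for LEP."""
    root = Node(objects, level)
    if len(objects) <= m or level >= len(features):
        return root
    feat = features[level]
    vecs = np.array([feat[o] for o in objects])
    order = np.argsort(vecs[:, 0])            # crude 1-D partition
    for group in np.array_split(order, m):
        if len(group) == 0:
            continue
        members = [objects[i] for i in group]
        child = build_contindex(members, features, m, level + 1)
        child.reference = vecs[group].mean(axis=0)  # reference feature vector
        root.children.append(child)
    return root
```

Step 4 (horizontal links) would then connect nodes at the same level that share a category with respect to an earlier feature.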
Horizontal zooming is facilitated by horizontal links between nodes in the same level. Consider the nodes in the second level. Nodes at this level under the same parent, N_p1, N_p2, ..., N_pm, represent categories with respect to feature F_{l1} and fall under the same category with respect to feature F_{l0}. Now suppose the user finds N_pq preferable with respect to feature F_{l1} and wants to look at the categories of feature F_{l0}, which are represented by the nodes N_1q, N_2q, ..., N_mq. To achieve that, we can simply create horizontal links among these nodes.
Fig. 3. ContIndex indexing tree and its horizontal links.
Multimedia objects in the database represent event/object cases. ContIndex performs
abstraction/generalization of these event/object cases and produces a content-based index. Intermediate nodes in the index tree represent categories of cases. They are generalizations
of cases, and cases are instances of these categories. If, for example, under a category there are similar patients a doctor has treated, the category represents the experience of this doctor regarding that type of patient. In general, an intermediate node represents a
certain concept, which is an abstraction of cases under it. To capture the validity of the
concept, for each intermediate node, a record of confidence is maintained. The confidence
record of a concept is high if the number of cases supporting it is large.
Content-Based Retrieval Using ContIndex
The retrieval process is a top-down classification process starting from the root of the
tree. At each node, the process chooses from the child nodes one or more nodes which are
the nearest to the query object with respect to the feature used for creation of this node in
the index creation process. The number of child nodes chosen depends on the weight of the feature: a higher weight implies that the feature is more critical, so fewer child nodes should be chosen.
Spatial Self-Organization
For visual browsing of databases, spatial organization of nodes is preferable. For example, to view types of eyes by eye size, we prefer that all icon images be displayed on the screen in order of size, from largest to smallest. For this purpose, the Self-Organizing Map (SOM) of Kohonen [12.13] is an effective neural-network paradigm for ContIndex creation.
II. Transform based Indexing of Audio Data [13]
For representation and indexing of audio data, various methods are available, including methods that use pitch characterization [13.10] or several acoustical characteristics [13.9]. In work by Subramanya et al. [13], a transform-based indexing method has been developed that accrues the many useful properties of working in the frequency domain familiar from data compression and signal processing applications, such as low sensitivity to additive or multiplicative scaling, low sensitivity to (high-frequency or white) noise, and low space utilization.
Basics of transforms
Transforms are applied to signals (time-domain signals such as audio, or spatial signals such as images) to transform the data to the frequency domain. This offers several advantages, such as easy noise removal and compression, and facilitates several kinds of processing. Specifically, given a vector X = (x_1, x_2, …, x_N) representing a discrete signal, the application of a transform yields a vector Y = (y_1, y_2, …, y_N) of transform coefficients, and the original signal X can be recovered from Y by applying the inverse transform. The basics of transform/inverse-transform pairs for the DFT and DCT are used here. In particular, the standard DFT pair is given by:

Y(k) = Σ_{n=0}^{N−1} x(n) e^{−j2πkn/N},  k = 0, 1, …, N−1
x(n) = (1/N) Σ_{k=0}^{N−1} Y(k) e^{j2πkn/N},  n = 0, 1, …, N−1
One of the features of a good transform is that, after the application of the transform, only a
fraction of the coefficients in the resulting vector Y can be used to reconstruct a good
approximation of the original signal.
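This property is easy to demonstrate with the DFT. The following NumPy sketch keeps only the k largest-magnitude coefficients (the threshold selection described below) and reconstructs the signal from them; for a pure tone, two coefficients suffice.

```python
import numpy as np

def truncated_fft(x, k):
    """Keep only the k largest-magnitude DFT coefficients (threshold
    selection) and zero out the rest."""
    Y = np.fft.fft(x)
    keep = np.argsort(np.abs(Y))[-k:]
    Yk = np.zeros_like(Y)
    Yk[keep] = Y[keep]
    return Yk

def reconstruct(Yk):
    """The inverse transform recovers an approximation of the signal."""
    return np.fft.ifft(Yk).real

x = np.sin(2 * np.pi * np.arange(64) / 8)       # a pure tone, period 8
approx = reconstruct(truncated_fft(x, 2))       # 2 coefficients suffice here
```

For real signals with more spectral content the reconstruction is approximate, which is exactly the trade-off the indexing scheme exploits.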
Outline of indexing scheme

Each audio file or stream is divided into small blocks of contiguous samples, and a transform
like discrete fourier transform or discrete cosine transform is applied to each block. This
yields a set of coefficients in the frequency domain. With a suitable transform, only a few
significant coefficients are adequate to reconstruct a good approximation of the original
signal. (This feature of the transform has also been the main basis for lossy data
compression.) Selecting an appropriate subset of the frequency coefficients and retaining a pointer to the original audio block creates an index entry. Thus, the index occupies less space
than the data and allows for faster searching. Next, a query is similarly divided into blocks to
each of which the transform is applied and a subset of transform coefficients is selected.
This forms the pattern. Then, the index is searched for an occurrence of this pattern. In this
case, two strings are considered matched if they are within a small enough “distance” of
each other when distance is measured according to the root-mean-square-difference of the
real-valued components of the strings.
Specifically, suppose A = a1 a2 … an represents the discrete samples of the original audio signal (essentially the contents of the audio file) and Q = q1 q2 … qm represents the samples of a given query. Both the original signal and the query are divided into blocks of size L. Without loss of generality, assume that the lengths of the data and the query are integral multiples of the block size (the other case can be suitably handled). Let the blocks of the original audio and the query be A1, A2, …, AN and Q1, Q2, …, QM. Generally M << N.

Consider a block of the original signal. Application of a transform (say FFT, DCT, or any similar transform) to Ai will yield a new sequence of values Yi = y1 y2 … yL, where Yi = T · Ai and T is the transform matrix, independent of the input signal. With a suitable transform, usually a few significant values of Yi (the first few values by position (zonal selection) or the largest few values by magnitude (threshold selection)) are enough to reconstruct a good approximation of the original data. Suppose k significant values of each block are retained to serve as the index for the original data; the k values retained for block Ai, denoted DBCi, form its index entry. With threshold selection, we also need to remember the locations (positions) of the coefficients, and these are saved in DBCLi. There are N such indices for A, one for each block of A. Together, these form the index set for the original data. Similarly, application of the same transform to a block Qi of the query will yield a sequence of values QBCi = z1 z2 … zL, where QBCi = T · Qi. The appropriate k values of QBCi are compared against the index sets to determine a match (exact or close).
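The index construction just described can be sketched as follows. This is illustrative Python with assumed parameters (block length L = 8, k = 3), using the real FFT and zonal selection; the paper leaves the transform and parameters open.

```python
import numpy as np

BLOCK = 8   # L: block length (assumed for illustration)
K = 3       # k: significant coefficients kept per block (zonal selection)

def block_index(signal):
    """Build the index: transform each block and keep the first K
    coefficients (zonal selection) plus a pointer to the block start."""
    n = len(signal) // BLOCK
    index = []
    for i in range(n):
        block = signal[i * BLOCK:(i + 1) * BLOCK]
        coeffs = np.fft.rfft(block)[:K]
        index.append((coeffs, i * BLOCK))   # (DBC_i, pointer into the audio)
    return index

def query_pattern(query):
    """The query is processed identically to form the search pattern QBC."""
    return [c for c, _ in block_index(query)]
```

With threshold selection instead of zonal selection, the coefficient positions (DBCL) would be stored alongside each entry.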
Blocking and segmentation
To derive the transform-based index, the audio data (signal) is divided into fixed-size units called blocks, a process referred to as blocking. A suitable transform is then applied to these individual blocks. The advantages of blocking are the following:
• When transforms are applied to the whole signal, the transform coefficients capture global averages but not the finer details.
• Blocks of appropriate sizes contain samples which are highly intercorrelated, so that when transforms are applied, there is more energy compaction and thus fewer transform coefficients adequately describe the data.
• The transforms on the individual blocks can be carried out in parallel.

In segmentation, on the other hand, the audio data is divided into variable-length units called segments. The data within a segment does not vary much; the positions in the audio data where very sharp changes occur define the segment boundaries.
Search algorithm and analysis
In the work presented in this paper, the audio data and the query are divided into fixed-size blocks. In the index
searches, the transform coefficients of the query are compared with corresponding coefficients of the data blocks
and the distance between them is determined. If the distance is below an experimentally determined threshold, it is
accepted as a match. In the following algorithms and their analysis, the following notations are used:
L: the length of a block (number of samples); N: number of blocks of the data; M: number of blocks of the query; k: the number of significant transform coefficients per block retained as index.
QBC: Query Block Coefficients (obtained by applying the transform to query blocks).
DBC: Data Block Coefficients (obtained by applying the transform to data blocks).
DBCL: Data Block Coefficient Locations. RBC: Reconstructed Data Block Coefficients. RBCL: Reconstructed Data Block Coefficient Locations.
Each block of QBC contains L elements, and each block of DBC and DBCL contains k elements (each block of RBC and RBCL also contains k elements).
Robust search algorithm
It assumes that the query block boundaries are aligned with those of the data block boundaries.
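Under that alignment assumption, the search can be sketched as follows. This is an illustrative Python sketch (the original algorithm listing is not reproduced here): query block coefficients are compared to data block coefficients by root-mean-square difference, and a run of M consecutive matches below the experimentally determined threshold is reported as a hit.

```python
import numpy as np

def rms_distance(a, b):
    """Root-mean-square difference between two coefficient vectors."""
    d = np.asarray(a) - np.asarray(b)
    return np.sqrt(np.mean(np.abs(d) ** 2))

def robust_search(dbc, qbc, threshold):
    """Assuming query blocks align with data blocks, report every starting
    block position where all M query blocks match within `threshold`."""
    n, m = len(dbc), len(qbc)
    hits = []
    for start in range(n - m + 1):
        if all(rms_distance(dbc[start + j], qbc[j]) <= threshold
               for j in range(m)):
            hits.append(start)
    return hits
```

Because the match criterion is a distance threshold rather than equality, the search tolerates additive noise and small scaling, which is the robustness the scheme is designed for.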
III. Indexing for Very Large Multidimensional data [14]
As the speed of processors continues to improve, researchers are performing large-scale scientific simulations to
study very complex phenomena at increasingly finer resolution scales. Such studies have resulted in the generation
of datasets that are characterized by their very large sizes ranging from hundreds of gigabytes to tens of terabytes,
thereby generating an imperative need for new interactive visualization capabilities.
A typical way of visualizing such a large multidimensional volumetric data set is to first reduce the dimension of the
data set using techniques such as slicing and then to render the result using one of the isosurface or volume
rendering techniques. Slicing is a very useful tool because it removes or reduces occlusion problems in visualizing
such a multidimensional volumetric data set and it enables fast visual exploration of such a large data set. In order to
efficiently handle the process, we need an efficient out-of-core indexing structure because such a data set very often
does not fit in main memory.
A typical approach to build indexing structures in the case of time-varying volumetric data is to build a separate
indexing structure on each time step of the data set. For example, Sutton and Hansen’s temporal branch-on-need
structure (T-BON) [14.3] is the most representative. Their strategy is to build an out-of-core version of Branch-On-
Need-Octree (BONO) [14.4], in which each leaf node is of disk page size, for each time step and to store general
common infrastructure of the trees in a single file. However, building (n-1)-dimensional trees along a particular dimension, as the T-BON does, unfortunately results in the index size increasing linearly with the resolution of that dimension (the number of time steps in the case of the T-BON). This is because it does not exploit any possible coherence across that dimension. This lack of scalability
becomes more problematic as we generate higher and higher resolution data in every dimension including the time
dimension.
Building a series of (n-1)-dimensional indexing structures on n-dimensional data thus causes a scalability problem as resolution grows in every dimension. However, building a single n-dimensional indexing structure can cause an indexing-effectiveness problem compared to the former case. The information-aware 2^n-tree has been proposed [14] to maximize indexing efficiency by ensuring that the subdivision of space has as similar coherence as possible along each dimension. It is particularly useful when the data distribution along each dimension constantly shows a different degree of coherence from every other dimension.
Information-Aware 2^n-Trees

Information-Aware 2^n-trees (IA 2^n-trees) are basically 2^n-trees (e.g., quadtrees for 2-D and octrees for 3-D [14.13]) for n-dimensional space. They differ, however, in how the extent ratios of a subvolume are decided when multiple dimensions are integrated into one hierarchical indexing structure. The coherence information along each dimension is extracted and used for this decision, so that each subvolume contains as similar coherence as possible along each dimension.
A. Dimension Integration
We present an entropy-based dimension integration technique. Entropy [14] is a numerical measure of the uncertainty of the outcome of an event x, given by

H(x) = −Σ_{i=1}^{n} p_i log2 p_i,

where x is a random variable, n is the number of possible states of x, and p_i is the probability of x being in state i. This measure indicates how much information is contained in observing x. The greater the variability of x, the more unpredictable x is, and the higher the entropy. For example, consider a series of scalar field values for a voxel v over the time dimension. The temporal entropy of v indicates the degree of variability in the series. Therefore, high entropy implies high information content, and thus more resources are required to store the series. Note that the entropy is maximized when all the probabilities p_i are equal.
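The entropy estimate can be sketched as follows. In this NumPy sketch a uniform histogram quantizer stands in for the non-uniform (Lloyd-Max) quantizer used for floating-point data; probabilities are estimated from bin counts.

```python
import numpy as np

def entropy(series, bins=8):
    """H(x) = -sum_i p_i log2 p_i, estimated from a histogram of the
    series. A uniform quantizer stands in for the Lloyd-Max quantizer."""
    counts, _ = np.histogram(series, bins=bins)
    p = counts[counts > 0] / counts.sum()   # drop empty bins (0 log 0 = 0)
    return float(-(p * np.log2(p)).sum())
```

A constant series (e.g., the y dimension of Figure 1) has entropy 0, while a highly variable series approaches log2(bins), driving the finer subdivision described next.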
Fig. 1. Entropy estimation in each dimension. Note that the y dimension has almost zero entropy in this example.
Fig. 2. Different supercell sizes and corresponding hierarchical indexing structures for the data of Figure 1: (a) standard supercell; (b) information-aware supercell.
Higher entropy of a dimension relative to the other dimensions implies that this dimension needs to be split at finer scales than the other dimensions. For example, if the temporal entropy is twice the spatial entropy, we design the supercell to be of size s × s × s × s/2, where s is the size of each spatial dimension of the supercell. Figures 1 and 2 show how this entropy-based dimension integration leads to an indexing structure for the 3-D case. Figure 1 shows an extreme case in which the values along the y dimension remain almost constant over all possible (x, z) values (that is, the entropy of y is almost zero), while each of the x and z dimensions has some degree of variability. The supercell size and the corresponding hierarchical indexing structure will be designed as shown in Figure 2(b); that is, it has a quadtree structure, unlike the standard octree of Figure 2(a) in which the supercell has the same size in each dimension. To estimate the ratios of the entropy values among the n dimensions, we randomly select a set of n-dimensional subvolumes and, for each subvolume, obtain the ratios by simply computing each entropy value along each dimension. The ratios are averaged and globally applied in building indexing
structures. In computing the entropy values, if the number of possible scalar field values is large (as in the case of floating-point values), we first quantize the original values into n levels using a non-uniform quantizer such as the Lloyd-Max quantizer. Further, we compute the spatiotemporal entropy ratio, defined as the ratio of the average spatial entropy to the temporal entropy.
B. Indexing Structures
We make use of the entropy ratios for the purpose of guiding the branching of the tree and ultimately
adjusting the size of supercells by dividing the dimension of high entropy more finely and that of low entropy more
coarsely. It is simply carried out by multiplying the original size of each dimension by its entropy value, which
becomes the ‘effective’ size of the dimension, and then using the ‘effective’ size instead of the original size in
branching of the tree. In addition to that, we adopt the Branch-On-Need strategy [14.4] by delaying the branching of
each dimension until it is absolutely necessary. For efficient isosurface rendering, each tree node contains the
minimum and maximum values of the scalar fields in the region represented by the node. The size of the tree can be
reduced by pruning nodes in which the minimum and maximum values are the same because they do not contribute
to isosurface extraction.
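The effective-size, branch-on-need splitting rule can be sketched as follows. This is an illustrative heuristic, not the authors' exact criterion: each dimension's size is scaled by its entropy, and only dimensions whose effective size remains large (and which can still shrink) are halved at a given node.

```python
def split_dimensions(sizes, entropies, min_size=1):
    """Decide which dimensions to halve at one tree node. Dimensions are
    compared by 'effective' size (original size x entropy), so
    high-entropy dimensions divide more finely; a dimension is split
    only while it can still shrink (branch-on-need)."""
    effective = [s * e for s, e in zip(sizes, entropies)]
    target = max(effective) / 2
    return [i for i, (s, eff) in enumerate(zip(sizes, effective))
            if s > min_size and eff > target]
```

Applied to the data of Figure 1 (near-zero entropy along y), this rule splits only x and z, yielding the quadtree-like structure of Figure 2(b).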
Further work may include evaluating the goodness of the entropy measure in comparison to other
measures and finding out a more adaptive way of applying the coherence difference in the subdivision as well as a
more effective way of decomposing the time series.
Universal Communication Format for Multimedia Data
A common data format is desired in the expanding field of machine communication. XDR [15.1] is a hardware-level data standard, and ACL [15.2] is an agent-level logical transaction standard. The authors have been developing an application-level content representation called UDF (Universal Data Format), which is flexible and capable of representing multimedia data [15.3-15.4]. However, multimedia data transmissions tend to be large even if the receiver requires only a part of the data. The receiver needs to communicate with the sender about what quantity to send and what quality is required. To meet this requirement, we designed UCF (Universal Communication Format), which has bi-directional communication capability as an extension of UDF.
Brief description of UDF
The UDF is designed to represent any data that can be used on intelligent equipment and software. The following are the basics of UDF:
(1) Content indication: a data section is wrapped by tags, as in <text> TEXT DATA </text>.
(2) Tag and data flexibility: any tag can be defined and any data can be presented in the data section. Which tags can be processed depends on the receiving software.
(3) Multimedia multiplexing: multimedia data sequences are switched as <audio>…</audio><video>…</video><audio>…</audio>…
The following are the key features of UCF, which enable bi-directional communication:
(1) Target addressing: although tags were used as data-type identifiers in UDF, they can also be thought of as the names of the objects, specified by the sender, that are to receive the data. Therefore, we define UCF tags as the names of objects. Data to program A on host B is expressed as <B><A>DATA</A></B>. The wrapped tag (in this case, <A>) in the data
section indicates the inner object address. In this manner, any communication object, including hosts and programs, can be expressed by tags.
(2) Source addressing: a receiving object may need an address to reply to when returning data or messages. In UCF, the <s> tag indicates the reply-to address, as in <A><s>B</s>SEND ME DATA</A>.
(3) Data interpretation: we intended to make a common data representation for each data type, such as text, graphic objects, images, audio, and video. However, it is difficult to define the ultimate, best common data format, so a practical solution is to leave the details to each object. Each named object can define its own data format and interpretations.
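The nested addressing described in (1) can be sketched as follows. These helpers are hypothetical illustrations, not part of the published UCF specification; `ucf_unwrap` handles one level of well-formed wrapping only.

```python
import re

def ucf_wrap(data, *path):
    # Wrap data for the outermost object first: ucf_wrap("DATA", "B", "A")
    # produces <B><A>DATA</A></B>, i.e. data for program A on host B.
    for name in reversed(path):
        data = f"<{name}>{data}</{name}>"
    return data

def ucf_unwrap(message):
    # Peel one addressing layer: return (outer object name, inner payload).
    m = re.fullmatch(r"<(\w+)>(.*)</\1>", message, re.DOTALL)
    if m is None:
        return None, message          # no more wrapping: raw data
    return m.group(1), m.group(2)

msg = ucf_wrap("DATA", "B", "A")      # "<B><A>DATA</A></B>"
outer, inner = ucf_unwrap(msg)        # ("B", "<A>DATA</A>")
```

Each object would unwrap its own layer and forward the inner payload to the named inner object, which is how cross-layer delivery proceeds.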
The hierarchically wrapped addressing scheme of UCF naturally enables cross-layer communications. The sequential nature of UCF multimedia multiplexing implies synchronization of media, e.g. audio and video. Standard schemes for message generation and handling are yet to be investigated.
Figure: Example of UCF control data.
Data Hiding and Error Concealment
[ ……… to do……………………]
Review of contemporary researches in concerns related to representation and processing of
Time-series data
Data Representation Models and Concerns
[ Concerns: Data management, framework ]
The concept of time series data is relevant to, for example, videos, images, audio, financial data, time series of traffic flow, and so on, where there are now high expectations for exploring the data at hand. Typical manipulations take the form of video/image/audio processing, including automatic speech recognition, which requires a fairly large amount of storage and is computationally intensive.
Many approaches and techniques that address time series data representation and manipulation have been proposed in the past decade. The most commonly used representations are the Discrete Fourier Transform (DFT), the Discrete Wavelet
Transform (DWT), Singular Value Decomposition (SVD), Adaptive Piecewise Constant Approximation (APCA), and Piecewise Aggregate Approximation (PAA). Recently, a promising representation method called Symbolic Aggregate Approximation (SAX) was proposed.
Major Processing Concerns and Solutions
[Concerns: Clustering: traffic, K-means, hierarchical, for clinical data; Correlation analysis; Unsupervised-outlier; Periodic patterns; Similarity mining; Visual
exploration for financial data; An improved data mining algorithm for traffic flow ]
There has been much recent work on adapting data mining algorithms to time series databases. [17.1] introduced a kernel-density-based algorithm that ensures uninteresting sequences do not affect the clustering result. [17.2] first used the k-means algorithm; the prototypes of the resulting clusters were then used as time variables to develop an autoregressive model relating the expression of the prototypes to each other. Among hierarchical clustering algorithms, [17.3] developed a method called Gecko, similar to Chameleon, which divides the clustering process into three steps (segmentation, merging, and determining the best clustering level) and is used for time-series anomaly detection; its disadvantage is that clustering takes too much time. [17.4] proposed a density-based hierarchical clustering method and proved that the method is not sensitive to noisy data. Pedro Rodrigues et al. developed an online divisive-agglomerative clustering system for time-series data streams in [17.5]. However, the research mentioned above is mostly aimed at time series of gene expression data in biology.
I. Time Series for Gene Expressions
For estimating gene networks from time series gene expression data measured by microarrays, much attention has been focused on statistical methods, including Boolean networks [19.1, 19.11], differential equations [19.3, 19.5], dynamic Bayesian networks [19.6, 19.7, 19.8], state space models [19.2, 19.4], and so on. While these methods have provided many successful applications, a serious drawback in using them to estimate gene networks has been their basic assumption that the network structure does not change through all time points, whereas the real gene network has a time-dependent structure. A recent work [19] provided a solution to this problem, establishing a statistical methodology to estimate gene networks with time-dependent structure using dynamic linear models with Markov switching. This model is based on the linear state space model, also known as the dynamic linear model (DLM). In the DLM, the high-dimensional observation vector is compressed into a lower-dimensional hidden state vector. For microarray analysis, the observation vector corresponds to the gene expression value vector, and the state variables can be considered a transcriptional module, that is, a set of co-regulated genes.
Dynamic Linear Model
Let yt be a vector of d observed random variables containing the expression values of d genes at time point t. The DLM relates the collection of observations y1, …, yT to the hidden k-dimensional state vector xt in the following way:

yt = At xt + wt.

Here At is a d x k measurement matrix and wt is Gaussian white noise, wt ~ N(0, Rt). Usually the dimension of the state vector is taken to be much smaller than that of the data, k < d. In the DLM, the time evolution of the state variables is modeled by a first-order Markov process as

xt = Bt xt-1 + vt,
where Bt is the k x k state transition matrix and the additive system noise vt follows the Gaussian distribution vt ~ N(0, Qt). The noise covariance matrices Rt and Qt are assumed to be diagonal. Notice that the model parameters depend on the time index; this implies that the underlying dynamics change discontinuously at certain undetermined points in time.
The process of the DLM starts with an initial Gaussian state x0 that has a given mean and covariance matrix. In the DLM, the dynamics of the states and observations are governed by their joint probability distribution, in which all components are Gaussian densities.
The DLM, in its canonical form, implicitly assumes an interesting causal relationship among the d variates (genes). To sum up, the time-dependent DLM describes the consecutive changes in module sets of genes, module-module interactions and gene-gene interactions with the underlying canonical form (see Figure 1). After learning the model parameters, including the projection matrix, we can identify the time-dependent network structure by testing whether or not these parameters lie in a region significantly far from zero. This problem amounts to classical hypothesis testing or bootstrap confidence intervals.
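As a concrete illustration of the observation and state equations above, the generative process yt = A xt + wt, xt = B xt-1 + vt can be simulated as follows. The matrices, noise scales and dimensions below are invented purely for illustration; they are not from the paper.

```python
import random

def matvec(M, v):
    # Plain-list matrix-vector product.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def simulate_dlm(A, B, T, sigma_w=0.1, sigma_v=0.1, seed=0):
    """Simulate y_t = A x_t + w_t,  x_t = B x_{t-1} + v_t  (time-invariant sketch)."""
    rng = random.Random(seed)
    k = len(B)
    x = [0.0] * k                                                # initial state mean 0
    ys = []
    for _ in range(T):
        x = [xi + rng.gauss(0, sigma_v) for xi in matvec(B, x)]  # state transition
        y = [yi + rng.gauss(0, sigma_w) for yi in matvec(A, x)]  # observation
        ys.append(y)
    return ys

# d = 3 "genes" compressed into k = 2 hidden "modules"
A = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]   # d x k measurement matrix
B = [[0.9, 0.0], [0.0, 0.8]]               # k x k state transition matrix
ys = simulate_dlm(A, B, T=50)
```

Estimation runs in the opposite direction: given only ys, the learning problem is to recover A, B and the hidden states.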
DLM with Markov Switching
The problem of modeling change in an evolving time series can be handled by allowing the dynamics of the underlying model to change discontinuously at certain undetermined points in time. In a real biological system, the structural change might occur smoothly. To incorporate a reasonable switching structure, the DLM-MS approach assumes that each observation is generated by one of G possible regimes evolving according to a Markov chain. In this context, the model parameters are assumed to take one of G possible configurations at each time point. For notational convenience, a hidden vector of G class labels is introduced to indicate the configurations.
The DLM-MS, in its basic form, assumes that the discrete regime variable evolves according to a first-order Markov chain with a transition probability matrix M of order G x G, where the (h, g) element defines the probability of moving from regime h to regime g.
Each row of M is restricted to be a probability vector. The smoothness of change in regimes is controlled by the entropy of the hth row of M, for h = 1, …, G.
Bayesian Inference
For some gene expression data, each array contains genes whose fluorescence intensity measurements were flagged by the experimenter and recorded as missing data points. In such a case, the observation vector is incomplete. To deal with the missing-data problem, we partition the d-dimensional observed vector into its observed and missing components. Consequently, the DLM-MS treats the observations, hidden states and regime labels together as a complete dataset having a joint distribution.
The parameters to be learned from the observed dataset are collected into a set, together with the initial distributions that drive the dynamic system. Our attention turns to Bayesian learning of the DLM-MS, which requires prior distributions for all model parameters and initial distributions for the hidden states. In this study, we employ the natural conjugate priors.
Let ai and bi be the ith rows of A and B, respectively. A family of conjugate priors of the DLM-MS is used, in which the noise variances follow inverse-gamma distributions with given shape and scale parameters, and the rows of the transition probability matrix follow Dirichlet distributions with given prior sample sizes. Note that the prior distribution of A is specified by a truncated Gaussian distribution whose support is restricted to the positive part. In the DLM setting, the underlying dynamical system is invariant under certain sign and scale transformations of the parameters; to avoid this lack of identifiability, we use the truncated prior distribution. Once the prior distributions are given, the augmented parameters are estimated through the posterior distribution.
Within Bayesian framework, all inferences are made based on the marginal posterior distribution, for instance,
from these full-conditional distributions. If the iterations have proceeded long enough, the simulation is roughly representative of the target distribution. To diminish the effect of the starting point, we generally discard the first p simulated samples and focus attention on the remaining n - p. This retained set is used to summarize the posterior distribution and to compute quantiles and other summaries of interest as needed.
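The burn-in-and-summarize step can be sketched as follows. This is generic MCMC post-processing; the chain here is a toy stand-in, not the DLM-MS sampler itself.

```python
def posterior_summary(samples, burn_in, quantiles=(0.025, 0.5, 0.975)):
    """Discard the first burn_in draws of a scalar MCMC chain and summarize the rest."""
    kept = sorted(samples[burn_in:])
    n = len(kept)
    mean = sum(kept) / n
    # simple nearest-rank empirical quantiles
    qs = {q: kept[min(n - 1, int(q * n))] for q in quantiles}
    return mean, qs

# toy chain: 100 burn-in draws stuck at 0, then 5 "stationary" draws
chain = [0.0] * 100 + [1.0, 2.0, 3.0, 4.0, 5.0]
mean, qs = posterior_summary(chain, burn_in=100)
```

Without the burn-in cut, the initial transient would dominate every summary, which is exactly the starting-point effect the text warns about.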
II. Time Series for Traffic Data
Using data mining technology to analyze time series of traffic flow can not only forecast short-term or long-term traffic volume, but also identify which streets of a city are bottlenecks, which helps greatly in analyzing the traffic situation of the city. In fact, clustering similar change trends of traffic flow time series is currently an interesting issue. On one hand, we can obtain typical patterns of traffic flow; on the other hand, we can group the sections of highway where the detectors are located according to their flow characteristics. The sections of highway in one group then have similar traffic flow characteristics, while sections in different groups have distinct characteristics. Combined with spatial information, useful spatial and temporal distribution patterns in transportation could be revealed.
Linkage Difference
In [17], average linkage is used as the distance (or similarity) between clusters. Given two clusters A = {a1, a2, …, am} and B = {b1, b2, …, bn}, where m and n are the sizes of A and B, let W be the similarity matrix among the time series, so that W(u, v) is the similarity between series u and v. The average linkage D(A, B) between cluster A and cluster B is the mean of all pairwise similarities between members of A and members of B.
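A direct sketch of the average-linkage computation; here W is stored as a dictionary of pairwise similarities, and the ids and values are illustrative.

```python
def average_linkage(A, B, W):
    """Mean pairwise similarity between members of clusters A and B.
    W maps (a, b) pairs of series ids to similarity values."""
    total = sum(W[(a, b)] for a in A for b in B)
    return total / (len(A) * len(B))

W = {(1, 3): 0.2, (1, 4): 0.4, (2, 3): 0.6, (2, 4): 0.8}
d = average_linkage([1, 2], [3, 4], W)   # (0.2 + 0.4 + 0.6 + 0.8) / 4 = 0.5
```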
An algorithm for similarity mining in time series data based on the Grey Markov SCGM(1,1) model has also been proposed in a recent work by Xiong et al. [18].
Encoded-Bitmap-Approach-Based Swap
Given two clusters A and B (that is, the number of clusters k = 2), for an arbitrary time series u ∈ A ∪ B we can compute the linkage difference ∆D. For every u ∈ A, there are two cases for the value of ∆D(u,A,B):
1. ∆D(u,A,B) = D(A,u) – D(B,u) < 0. Series u has a relatively larger linkage to cluster B even though it is located in cluster A, so we move series u to cluster B.
2. ∆D(u,A,B) = D(A,u) – D(B,u) ≥ 0. Series u has a relatively larger linkage to its initial cluster A, so we do nothing.
For every v ∈ B, there are two similar cases:
1. ∆D(v,B,A) = D(B,v) – D(A,v) < 0. Series v has a relatively larger linkage to cluster A even though it is located in cluster B, so we move series v to cluster A.
2. ∆D(v,B,A) = D(B,v) – D(A,v) ≥ 0. Series v has a relatively larger linkage to its initial cluster B, so we do nothing.
If the number of existing clusters k > 2 (say A, B, C, D, …), then similarly: for every u ∈ A, if all ∆D(u,A,X) are greater than zero we do nothing; otherwise we move u to the cluster X with the largest |∆D(u,A,X)| among those with ∆D(u,A,X) < 0. This procedure of swapping series is called the “Encoded-Bitmap-Approach-Based Swap”. The algorithm for k = 2 is presented below.
Algorithm: EncodedBitmap_Based_Swap(ACluster, BCluster)
// Here k = 2. Input: original clusters ACluster and BCluster. Output: the two new clusters.
Begin
Step 1. Use the Encoded Bitmap Approach to calculate the similarity matrix W(ACluster, BCluster).
Step 2. For every time series u ∈ ACluster, calculate D(A,u) and D(B,u).
    If ∆D(u,A,B) = D(A,u) – D(B,u) < 0, then move u to BCluster.
Step 3. For every time series v ∈ BCluster, calculate D(B,v) and D(A,v).
    If ∆D(v,B,A) = D(B,v) – D(A,v) < 0, then move v to ACluster.
End.
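A minimal Python sketch of the k = 2 swap pass. Here D(X, u) is taken to be the average similarity between u and the members of X, consistent with the linkage definition above; the similarity values in the example are invented.

```python
def bitmap_based_swap(A, B, W):
    """One swap pass for k = 2 clusters; W maps frozenset pairs to similarities."""
    def D(cluster, u):
        others = [v for v in cluster if v != u]
        if not others:
            return float("inf")        # no other members: singletons are not moved out
        return sum(W[frozenset((u, v))] for v in others) / len(others)

    A, B = list(A), list(B)
    for u in list(A):
        if D(A, u) - D(B, u) < 0:      # u is more similar to B: move it
            A.remove(u); B.append(u)
    for v in list(B):
        if D(B, v) - D(A, v) < 0:      # v is more similar to A: move it
            B.remove(v); A.append(v)
    return A, B

# series 3 starts in cluster A but is far more similar to the members of B
W = {frozenset(p): s for p, s in {
    (1, 2): 0.9, (1, 3): 0.1, (1, 4): 0.1, (1, 5): 0.1,
    (2, 3): 0.1, (2, 4): 0.1, (2, 5): 0.1,
    (3, 4): 0.9, (3, 5): 0.9, (4, 5): 0.9}.items()}
A2, B2 = bitmap_based_swap([1, 2, 3], [4, 5], W)   # 3 moves to B
```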
Both grey relation and the Encoded-Bitmap-Approach-Based Swap were adopted in [17] to improve the classic hierarchical clustering algorithm.
Algorithm: Improved Hierarchical Clustering Method
// Input: Time Series Datasets Output: K Clusters
Begin
1. Start by assigning each item to its own cluster, so that if there are N items, there are N clusters, each containing just one item.
2. Use grey relation as the time series similarity measurement, and let the similarities between clusters equal the similarities between the items they contain.
3. Find the most similar pair of clusters and merge them into a single cluster, so that the number of clusters is reduced by one.
4. Compute the average linkage as the similarity between the new cluster and each of the old clusters.
5. Repeat steps 3 and 4 until K clusters remain.
6. Adopt the encoded-bitmap-approach-based swap to refine the K clusters from step 5, yielding the new K clusters.
End.
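The merge loop (steps 1 to 5) can be sketched as follows. The grey-relation similarity is replaced here by a toy stand-in (negative difference of series means), purely for illustration.

```python
def agglomerate(series_list, K, sim):
    """Agglomerative clustering with average linkage over a similarity function.
    `sim` stands in for the paper's grey-relation measure (an assumption here)."""
    clusters = [[i] for i in range(len(series_list))]           # step 1
    S = {}                                                      # step 2: pairwise similarities
    for i in range(len(series_list)):
        for j in range(i + 1, len(series_list)):
            S[(i, j)] = sim(series_list[i], series_list[j])

    def avg_link(A, B):                                         # step 4
        return sum(S[tuple(sorted((a, b)))] for a in A for b in B) / (len(A) * len(B))

    while len(clusters) > K:                                    # steps 3 and 5
        p, q = max(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: avg_link(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]                 # merge most similar pair
        del clusters[q]
    return clusters

# toy similarity: larger when the series means are close (hypothetical stand-in)
sim = lambda s, t: -abs(sum(s) / len(s) - sum(t) / len(t))
data = [[1, 1, 1], [1, 2, 1], [9, 9, 9], [9, 8, 9]]
clusters = agglomerate(data, K=2, sim=sim)
```

Step 6 would then apply the swap pass above to refine the K resulting clusters.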
Experimental results show that, compared with the classic hierarchical clustering method, the above method better separates the change trends of time series.
III. Time Series in Multimedia data
Typical multimedia manipulations require a considerable amount of storage and are computationally intensive. Generally, various image processing techniques [16.8][16.12][16.14][16.24][16.26] can be used to cluster multimedia data by measuring similarities among the raw videos or images using features such as color, texture, or shape. However, recent work [16.17][16.22] has demonstrated the utility of time series representation as an efficient alternative to the raw multimedia data, whose advantages include time and space complexity reduction in clustering, classification, and other data mining tasks. In clustering multimedia time series data, the k-medoids algorithm with the Dynamic Time Warping (DTW) distance measure is often used. In fact, many other distance measures can be effectively used for time series data, but we focus mainly on DTW because its shape-based similarity measurement breaks the limitation of the one-to-one mapping in Euclidean distance, the most well-known distance metric. Although k-medoids with DTW gives satisfactory results, k-means clustering is conceivably much more typical in clustering tasks, where an averaging algorithm is a crucial subroutine in finding a data representation of each cluster. In general, the Euclidean distance metric (or another Minkowski metric) is used to find an average of all the data within a cluster. However, its one-to-one mapping nature is unable to capture the average shape of two time series, in which case Dynamic Time Warping is more favorable. The work by Gupta et al. [16.9] introduced a shape averaging approach using Dynamic Time Warping. Niennattrakul et al. [16] provided a generic time series shape averaging method with a proof of correctness.
Distance Measurement
Distance measures are extensively used in finding the similarity/dissimilarity between any two time series. The two well-known measures are the Euclidean distance metric and the DTW
distance measure. A distance metric must satisfy four properties: symmetry, self-identity, non-negativity, and the triangular inequality. A distance measure, however, does not need to satisfy all of these properties. DTW [21] is a well-known shape-based similarity measure for time series data. Unlike the Minkowski distance function, dynamic time warping breaks the limitation of one-to-one alignment and also supports time series of unequal length. It uses a dynamic programming technique to find all possible paths and selects the one that yields the minimum distance between the two time series, using a distance matrix in which each element is the cumulative distance of the minimum of its three surrounding neighbors. Suppose we have two time series, a sequence Q = q1, q2, …, qi, …, qn and a sequence C = c1, c2, …, cj, …, cm. First, we create an n-by-m matrix, where every (i, j) element of the matrix is the cumulative distance of the distance at (i, j) and the minimum of the three neighboring elements, where 0 ≤ i ≤ n and 0 ≤ j ≤ m. We can define the (i, j) element γ(i, j) as:

γ(i, j) = (qi – cj)² + min{ γ(i–1, j), γ(i, j–1), γ(i–1, j–1) }   (1)

which is the sum of the squared distance between qi and cj and the minimum cumulative distance of the three elements surrounding the (i, j) element. Then, to find the optimal path, we choose the path that gives the minimum cumulative distance at (n, m). The distance is defined as:

DTW(Q, C) = min over all P of sqrt( Σ k=1..K wk )   (2)

where P is the set of all possible warping paths, wk is the value at the (i, j) element of the matrix visited by the kth element of a warping path, and K is the length of the warping path. The algorithm may generate different optimal warping paths, but the warping distance will always turn out to be the same.
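The recurrence of eq. (1) and the distance of eq. (2) can be sketched as follows (an illustrative implementation, not the paper's code):

```python
def dtw(Q, C):
    """DTW distance between numeric sequences Q and C via the cumulative-cost matrix."""
    n, m = len(Q), len(C)
    INF = float("inf")
    g = [[INF] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (Q[i - 1] - C[j - 1]) ** 2          # squared local distance
            g[i][j] = cost + min(g[i - 1][j],          # insertion
                                 g[i][j - 1],          # deletion
                                 g[i - 1][j - 1])      # match
    return g[n][m] ** 0.5
```

Note how warping lets [0, 0, 1] align perfectly with the shorter [0, 1], which one-to-one Euclidean matching cannot do.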
Dynamic Time Warping Averaging
In some situations, we may need to find a template or model of a collection of time series, in which case a shape averaging algorithm is desired for a more accurate and meaningful template. The DTW distance measure is exploited to find the appropriate mappings for an average. More specifically, the algorithm creates a DTW distance matrix and finds an optimal warping path. After the path is discovered, a time series average is calculated along this path by using the index (i, j) of each data point wk on the warping path, which corresponds to the data points qi and cj on the time series Q and C, respectively. Each data point in the averaged time series is simply the mean of the two values on the two time series
that index (i, j) maps to. W = w1, w2, …, wk, …, wK is an optimal warping path, where wk is the mean of the two time series values whose indices are i and j:
wk = (qi + cj) / 2   (3)

In query refinement, where the two time series may have different weights, αQ for sequence Q and αC for sequence C, eq. (3) above may be generalized according to the desired weights:

wk = (αQ qi + αC cj) / (αQ + αC)   (4)
What we want from a shape averaging algorithm is illustrated in Figure 1(a), where DTW is used. If the Euclidean or any other one-to-one mapping distance measure were used, we would probably end up with an undesirable result, as shown in Figure 1(b).
Figure 1. A comparison between (a) shape averaging and (b) amplitude averaging
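Given an optimal warping path (assumed precomputed by a DTW routine such as the one sketched earlier), the averaging of eq. (3) and (4) reduces to:

```python
def dtw_average(Q, C, path, alpha_Q=0.5, alpha_C=0.5):
    """Average two series along a warping path of 0-indexed (i, j) pairs.
    With equal weights this is eq. (3); unequal weights give eq. (4) up to
    normalization (here the weights are assumed to sum to 1)."""
    return [alpha_Q * Q[i] + alpha_C * C[j] for i, j in path]

Q, C = [0.0, 0.0, 1.0], [0.0, 1.0]
path = [(0, 0), (1, 0), (2, 1)]        # an optimal DTW warping path for Q, C
avg = dtw_average(Q, C, path)          # shape-preserving average
```

Notice that the averaged series has length K (the path length), not the length of either input, which is exactly what makes the result shape-based rather than point-wise.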
K-means Clustering
As shown in Table 1 below, the k-means algorithm [16.4] tries to divide N data objects into k partitions or clusters, each with one object (the mean) as its cluster center, representing all data objects within that cluster. We then assign the rest of the objects to the proper clusters and recalculate the new centers, repeating this step until all cluster centers are stable. In general, after each iteration, the quality of the clusters and of the means themselves is essentially improved.
K-medoids Clustering
k-medoids [16.11] differs from k-means only in the way the cluster centers are chosen and represented (step 4): it finds new cluster centers by choosing the existing data member within each cluster that best represents its cluster center, instead of calculating the average of the cluster members.
Figure 3. Examples of six species in the Leaf dataset. Figure 4. Examples of six-class Leaf images.
Figure 6. Examples of four different Face profiles after conversion into time series. Figure 7. Tracking hand position in each video frame.
K-means Clustering with DTW
It has been demonstrated that k-medoids clustering for multimedia time series data runs smoothly with DTW. In contrast, it has been observed [16] that if the k-means method is used instead, there is a high probability of failure compared with the k-medoids algorithm (which is probably why Euclidean averaging is often used for k-means shape averaging despite the use of DTW in cluster membership assignment). The paper [16] pointed out some interesting problems that occur when using k-means clustering with DTW. Future study may investigate how these problems can be resolved and propose remedies for accurately averaging shape-based time series data.
IV. Time Series in Financial Applications
Financial time series data has its own characteristics compared with other time series data. One special characteristic is that it is typically characterized by a few critical points, and multi-resolution consideration is always necessary for long-term and short-term analyses. A second is that financial time series data is continuous, large and unbounded. There are many technical analytical methods for financial time series data that identify patterns of market behavior. In these financial analytical methods, critical or extreme points, which the original SAX cannot handle, are very important to discover. To reduce the loss of these important points, an Extended SAX representation, aimed especially at financial data analysis and mining tasks, was devised by Lkhagva et al. [20]. The basic idea of the proposed method builds on two previously proposed representation techniques: the PAA and SAX representations.
Piecewise Aggregate Approximation (PAA)
Yi and Faloutsos [20.7] and Keogh et al. [20.4] independently proposed PAA. In PAA, each sequence of time series data is divided into k segments of equal length, and the average value of each segment is used as a coordinate of a k-dimensional feature vector.
Figure 1: A time series C is represented by PAA (by the mean values of equal segments). In the
example above, the dimensionality is reduced from n = 60 to k = 6.
The advantages of this transform are that (1) it is very fast and easy to implement, and (2) the index can be built in linear time. As shown in Figure 1, in order to reduce the time series from n dimensions to k dimensions, the data is divided into k equal-sized segments. The mean value of the data falling within each segment is calculated, and the vector of these values becomes the data-reduced representation.
More formally, a time series C of length n can be represented in a k-dimensional space by a vector whose ith element is calculated by the following equation [20.4]:

c̄i = (k/n) Σ cj, where the sum runs over j = (n/k)(i–1)+1, …, (n/k)i   (1)
However, since the PAA approach reduces dimensionality by taking the mean values of equal-sized frames, this mean-based representation may miss some important patterns in some time series data analyses.
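A minimal PAA sketch of eq. (1), assuming for simplicity that the series length is divisible by k:

```python
def paa(series, k):
    """Piecewise Aggregate Approximation: reduce a length-n series to k segment means.
    Assumes n is divisible by k."""
    n = len(series)
    seg = n // k                       # points per segment (n/k)
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(k)]

coeffs = paa([1, 2, 3, 4, 5, 6], 3)   # three segment means: [1.5, 3.5, 5.5]
```

The mean-based reduction is also where the caveat above bites: a single extreme point inside a segment is smoothed away by its neighbors.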
Symbolic Aggregate Approximation (SAX)
Lin, Keogh et al. [20.3] proposed a new approach called SAX. SAX is based on PAA [20.4, 20.7] and assumes normality of the resulting aggregated values. SAX is the first symbolic representation of time series with an approximate distance function that lower-bounds the Euclidean distance. In SAX, the data is first transformed into the PAA representation, and the transformed PAA representation is then symbolized into a sequence of discrete strings. There are two important advantages to doing this:
Dimensionality Reduction: the dimensionality reduction of PAA [20.4, 20.7] is automatically carried over to this representation.
Lower Bounding: the lower-bounding property of the distance measure between two symbolic strings can be proved by simply pointing to the existing proofs for the PAA representation itself [20.4].
In order to obtain a string representation after a time series is transformed into the PAA representation, the symbolization regions must be determined. Empirical testing on more than 50 datasets showed that normalized subsequences have a highly Gaussian distribution [20.3]. From this result, the “breakpoints” that produce equal-sized areas under the Gaussian curve are determined. Breakpoints are defined as follows.
Definition 1 [20.3]: Breakpoints are a sorted list of numbers β1, …, βa-1 such that the areas under an N(0,1) Gaussian curve between consecutive breakpoints are all equal to 1/a (with β0 and βa defined as –∞ and ∞, respectively). These breakpoints can be determined by looking them up in a statistical table, e.g.
Table 1: A lookup table containing the breakpoints that divide a Gaussian distribution into an arbitrary number (from 3 to 5) of equiprobable regions.
Using these defining breakpoints, a time series is discretized in the following example. First a PAA of the
time series is obtained. Then, all PAA coefficients that are below the smallest breakpoint are mapped to the symbol
“A,” all coefficients greater than or equal to the smallest breakpoint and less than the second smallest breakpoint are
mapped to the symbol “B,” etc. Figure 2 illustrates the idea.
Figure 2: A time series is discretized by SAX. In the example above, with n = 60, k = 6 and a = 3, the
time series is mapped to the word ABCBBA.
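As a sketch, the normalize-PAA-symbolize pipeline might be implemented as follows. Here the breakpoints are computed from the standard normal inverse CDF rather than looked up in a table, the series length is assumed divisible by k, and the alphabet size a is at most 8.

```python
from statistics import NormalDist, mean, pstdev

def sax(series, k, a):
    """SAX: z-normalize, apply PAA to k segments, then map each segment mean to
    one of a letters using equiprobable N(0,1) breakpoints."""
    mu, sd = mean(series), pstdev(series)
    z = [(x - mu) / sd for x in series]                 # z-normalize
    seg = len(z) // k                                   # assume len divisible by k
    coeffs = [sum(z[i * seg:(i + 1) * seg]) / seg for i in range(k)]
    breaks = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    word = ""
    for c in coeffs:
        idx = sum(c >= b for b in breaks)               # count breakpoints below c
        word += "ABCDEFGH"[idx]                         # below smallest breakpoint -> 'A'
    return word

word = sax([0, 0, 0, 0, 10, 10, 10, 10], k=2, a=3)      # low half, high half
```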
SAX also has some disadvantages, such as its dimensionality-reducing nature, which may miss important patterns in some datasets, as depicted in Figure 3.
Figure 3: Financial time series data represented by SAX. Some important points (shown in red) are missing. (US$ and Japanese yen exchange rate data over 2 months.) The SAX representation is CFCBFD.
Figure 4: Financial time series data represented by Extended SAX. The Extended SAX representation is ACFFDFFCAABFFFFDCA. (US$ and Japanese yen exchange rate data over 2 months.)
A further modified technique, Extended SAX, has been proposed in [20], with the result depicted in Figure 4.
Review of contemporary researches in concerns related to representation and processing of
Spatial data
Data Representation Models and Concerns
A pictorial database plays an important role in many applications, including geographical information systems, computer-aided design, office automation, medical image archiving, and trademark picture registration. In such fields there is a need to manage geometric, geographic, or spatial data, i.e., data related to space. The space of interest can be, for example, the two-dimensional abstraction of (parts of) the surface of the earth (geographic space, the most prominent example), a man-made space like the layout of a VLSI design, a volume containing a model of the human brain, or another 3D space representing the arrangement of chains of protein molecules.
Representation of relative spatial relations between objects is required in many multimedia database
applications. Quantitative representation of spatial relations taking into account shape, size, orientation and distance
is often required. This cannot be accomplished by assimilating an object to elementary entities such as the centroid
or the minimum bounding rectangle. Thus many authors have proposed numerous representations based on the
notion of histograms of angles. There are many general-purpose content-based image retrieval systems, e.g. the
QBIC [21.6] system and the Photobook [21.14]. They mainly use color, texture and shape as image features.
However, representing the spatial relations between objects is also an important component of image content
description and access. For example, the spatial relationship between brain lesions and anatomical brain structures in
medical images is critically important for early disease diagnosis and thus important for image retrieval. Typical
applications of spatial relation representations are content-based image retrieval (e.g. [21.3, 21.8, 21.15, 21.17]),
video indexing and retrieval (e.g. [21.5]), computer vision, robot navigation, and Geographic Information Systems
(GIS).
To assess the degree of similarity of two images according to the spatial relations between objects, first we
need to extract a compact representation of spatial relations from images, and then define a (dis)similarity measure
(e.g. a distance function) on such representations. Our ultimate goal is to answer queries like “find similar MR
images to one with a lesion inside the frontal lobes”, or “find similar surveillance video sequences to one in which a
man walks from the middle of a room to the east side.”
Significant work has been reported on spatial relation representation. Many authors have stressed the
importance of qualitative spatial relationships [21.4]. Approaches have been based on Allen’s interval relations
[21.1] (e.g. [21.13, 21.16]), 2D strings [21.3] and their variants, Attributed Relational Graphs (ARGs) (e.g. [21.15])
or the spatial orientation graph (e.g. [21.8]). All of these approaches assimilate an object to very elementary entities such as the centroid (e.g. [21.8, 21.9]) or the minimum bounding rectangle (e.g. [21.13]). This simplification cannot give a satisfactory modelling of the spatial relations. For example, projecting two objects onto each of the dimensions and considering each dimension separately is inadequate, because the two objects may not overlap at all even when their projections onto the x and y axes overlap simultaneously. In [21.12], Miyajima and Ralescu introduced
the notion of the histogram of angles to represent directional relations.
I. Histogram based representation
In [21], a new histogram representation of spatial relations called the R-Histogram is proposed. Here we assume the images are segmented and each object is assigned a unique label, i.e., we deal with symbolic images, as defined formally in [21.8]. The dissimilarity between two images is then defined by the distance between the two corresponding R-Histograms.
The R-Histogram
Given a reference object R and an object of interest A, the goal is to represent, quantitatively, the spatial relations between R and A. Consider the vector originating from a pixel x on the boundary of R to a pixel y on the boundary of A. If x and y do not coincide, we compute the angle between the x-axis of the coordinate frame and the vector xy. This angle, denoted θ(x, y), takes values in [–π, π]. As in the histogram of angles [21.12], the set of angles from any pixel on the boundary of R to a pixel on the boundary of A expresses the directional relations between R and A. The novel idea introduced in this paper is the labeled distance. The labeled distance from x to y, denoted LD(x, y), is defined as a pair (d(x, y), l(x, y)), where d(x, y) is the Euclidean distance from x to y and l(x, y) is defined in Table 1.
Here, column 1 describes whether pixel x is inside object A, and column 2 describes whether pixel y is inside object R.
For the set of vectors originating from any pixel on the boundary of R to any pixel on the boundary of A,
we construct a histogram as follows. Let x and y be pixels on the boundary of R and A, respectively. The bin H(I, J, L) is incremented by one for every pair (x, y) such that

θ(x, y) ∈ A_I, d(x, y) ∈ D_J, and l(x, y) = L,   (1)

where A_I is the range of angle values spanned by bin H(I, J, L), D_J is the range of distance values spanned by bin H(I, J, L), and L ∈ {0, 1, 2, 3} is the label associated with the distance values spanned by bin H(I, J, L).
Then the histogram is normalized as follows:

RH(A, R)(I, J, L) = H(I, J, L) / Σ_{I=1..n_A} Σ_{J=1..n_D} Σ_{L=0..3} H(I, J, L),   (2)

where n_A is the number of angle bins and n_D the number of distance bins. The normalized histogram, denoted RH(A, R), is defined to be the R-Histogram of object A relative to object R.

An R-Histogram example is illustrated in Figure 2, where the x-axis is associated with angles and the y-axis with distances.
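As an illustration, the construction above can be sketched in a few lines of Python. This is a hypothetical implementation: the bin counts, the 4-neighbour boundary test, the image-coordinate angle convention, and the label encoding of Table 1 are all assumptions.

```python
import numpy as np

def boundary_pixels(mask):
    """Pixels of a binary mask with at least one 4-neighbour outside the mask."""
    pts = []
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            if not mask[i, j]:
                continue
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if ni < 0 or ni >= h or nj < 0 or nj >= w or not mask[ni, nj]:
                    pts.append((i, j))
                    break
    return pts

def r_histogram(mask_R, mask_A, n_angle=8, n_dist=4):
    """Normalized histogram H[I, J, L] over angle, distance and label for
    vectors from boundary pixels of R to boundary pixels of A."""
    bR, bA = boundary_pixels(mask_R), boundary_pixels(mask_A)
    pairs = [(x, y) for x in bR for y in bA if x != y]
    d_max = max(np.hypot(y[0] - x[0], y[1] - x[1]) for x, y in pairs)
    H = np.zeros((n_angle, n_dist, 4))
    for x, y in pairs:
        # image-coordinate convention assumed: columns as x-axis, rows as y-axis
        theta = np.arctan2(y[0] - x[0], y[1] - x[1])           # in [-pi, pi]
        d = np.hypot(y[0] - x[0], y[1] - x[1])
        I = min(int((theta + np.pi) / (2 * np.pi) * n_angle), n_angle - 1)
        J = min(int(d / d_max * n_dist), n_dist - 1)
        L = 2 * int(mask_A[x]) + int(mask_R[y])                # assumed Table 1 encoding
        H[I, J, L] += 1
    return H / H.sum()                                         # Eq. (2): unit total mass
```

The quadratic loop over boundary-pixel pairs reflects the O(N²) worst case discussed below.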
Figure 2: RH( A,R) for the two objects in Figure 1. Each quadrant is associated with a unique label.
Time complexity concerns: Let N be the number of pixels in an image. We assume the objects are homeomorphic to a 2-ball. In the worst case, the number of pixels on the boundary of an object is O(N); therefore, the computation of the R-Histogram takes O(N²) time. If the objects are convex, the number of boundary pixels is O(N^(1/2)) and the time complexity drops to O(N).
Distance Metric
The dissimilarity between two images is defined by the distance between corresponding R-Histograms.
There are many histogram distance metrics; the distance metric used here is the histogram intersection. It is shown in [21.18] that when the histograms are normalized, the histogram intersection is given by

d(H1, H2) = 1 − Σ_k min(H1(k), H2(k)).   (3)
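A minimal sketch of this distance, assuming the common one-minus-intersection form for normalized histograms:

```python
import numpy as np

def intersection_distance(h1, h2):
    """1 minus the sum of elementwise minima; 0 for identical normalized histograms."""
    return 1.0 - np.minimum(h1, h2).sum()
```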
Future work may model the spatial relations of multiple objects in an image by using R-Histograms as the arc attributes in ARGs. Further attempts are likely to improve the time complexity of R-Histogram computation and to investigate the possibility of extracting semantic meanings from R-Histogram representations.
II. Content Based Image Retrieval
Content-based image retrieval (CBIR) is the current trend in designing image database systems, as opposed to text-based image retrieval [22.7], [22.11], [22.14], [22.18], [22.23], [22.24], [22.25], [22.27]. The features used in content-based image retrieval can be roughly divided into two categories: low-level visual features (such as color, texture, and shape) and high-level features (such as pairwise spatial relationships between objects). Some
examples of content-based image retrieval systems are QBIC [22.8], Virage [22.1], Retrieval Ware [22.29],
VisualSEEK [22.26], WaveGuide [22.17], and Photobook [22.21]. They allow users to retrieve similar pictures from
a large image database based on low-level visual features. On the other hand, there is also a large group of
researchers emphasizing image retrieval based on spatial relationships between objects [22.3], [22.4], [22.5],
[22.10], [22.15], [22.16], [22.20], [22.22], [22.28].
The method of representing images is one of the major concerns in designing an image database system.
An ideal representation method for symbolic pictures should provide image database systems with many important
functions such as similarity retrieval, visualization, browsing, spatial reasoning, and picture indexing. One way of
representing an image is to construct a symbolic picture for that image which in turn is encoded into a 2D-string
[22.5]. The 2D string representation method opened up a new approach to spatial reasoning, picture indexing, and
similarity retrieval. There are many follow-up research works based on the concept of the 2D string, such as the 2D C-string [22.15], [22.16] and the 2D C+-string [22.9]. In [22], we find a new scheme for encoding spatial relations called the 9-Direction SPanning Area (9D-SPA) representation.
Overview Of Spatial Knowledge Representation
Binary spatial relationships between objects have been identified as one of the most important features for
describing the contents of images [22.6]. For example, a query such as “finding all the pictures containing a house to
the east of a tree” relies on spatial relations to retrieve the desired pictures. Different kinds of spatial knowledge
representations have been proposed so far. Chang et al. [22.5] proposed the 2D string as a spatial knowledge
representation to capture the spatial information about the content of a picture. The fundamental idea of the 2D string is
to project the objects of a picture along the x and y-directions to form two strings representing the relative positions
of objects in the x and y-axis, respectively. Since a 2D string preserves the spatial relationships between any two
objects in a picture, it has the advantage of facilitating spatial reasoning. Moreover, since a query picture [22.6] can
also be represented as a 2D string, the problem of similarity retrieval becomes a problem of 2D string subsequence
matching. Jungert [22.12], Chang et al. [22.4], and Jungert and Chang [22.13] extended the idea of 2D strings to
form 2D G-strings by introducing several new spatial operators to represent more relative positional relationships
among objects of a picture. The 2D G-string representation embeds more information about spatial relationships
between objects and, thus, facilitates spatial reasoning about sizes and relative positions of objects. Following the same concept, Lee and Hsu [22.15] proposed the 2D C-string representation based on a special cutting mechanism.
Since the number of subparts generated by this new cutting mechanism is reduced significantly, the lengths of the
strings representing pictures are much shorter while still preserving the spatial relationships among objects. The 2D
C-string representation is more economical in terms of storage space efficiency and navigation complexity in spatial
reasoning. The 2D C+-string representation [22.9] extended the 2D C-string representation by adding relative metric
information about the picture to the strings. As a consequence, reasoning about relative sizes and locations of
objects, as well as the relative distance between objects in a symbolic picture becomes possible. Chang [22.3]
proposed a structure called 9DLT to encode the spatial relationships between objects in terms of nine directions.
Since the 9DLT method uses the centroid to represent the position of an object, such a representation is too sensitive for spatial reasoning. For example, the spatial relationships between the two objects shown in Figs. 1a, 1b, and 1c are all different in the 9DLT representation; however, they appear barely different to human visual perception.
The representation of spatial relations proposed by Zhou and Ang [22.28] combines the nine directional
relations proposed in 9DLT with the five topological relations, namely, disjoint, meet, partly-overlap, contain, and
inside. The topological relation can record the 2D relationship between any two nonzero-sized objects with irregular
shapes and, therefore, makes spatial reasoning more accurate as compared to using MBR or centroid to represent an
object. However, Zhou and Ang’s method still has the problem with being too sensitive when reasoning about
directional relations. Instead of combining the nine directional relations with the five topological relations, the 2D-PIR proposed by Nabil et al. [22.19], [22.20] combines the 13 projection interval relations with the topological relations. Although 2D-PIR seems particularly useful in similarity retrieval, it provides no picture reconstruction mechanism for visualization. Besides, incorporating 2D-PIR into an indexing structure is difficult; thus, similarity retrieval based on 2D-PIR becomes inefficient as the volume of images in the database increases.
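The reduction of similarity retrieval to string subsequence matching, mentioned above for 2D strings, can be illustrated with a simplified one-dimensional stand-in (actual 2D string matching applies the same idea along both axes):

```python
# A query string matches a picture string when its symbols occur in order.
def is_subsequence(query, picture):
    it = iter(picture)
    # `sym in it` advances the iterator, so symbol order is enforced
    return all(sym in it for sym in query)
```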
9D-SPA Representation
The picture has to be preprocessed first. We assume that the objects in a picture can be identified by some
image segmentation and object recognition procedures. Various techniques of image segmentation and object
recognition can be found in [22.2]. Suppose that a picture P contains n objects (O1, O2, ..., On). Then, the 9D-SPA representation of P can be encoded as a set of 4-tuples R = {(Oij, Dij, Dji, Tij) | Oi, Oj ∈ P and 1 <= i < j <= n}, where Oij is the code for the object-pair (Oi, Oj), Dij is the code for the direction relation between objects Oi and Oj with Oj as the reference object, Dji is the code for the direction relation between Oi and Oj with Oi as the reference object, and Tij is the code for the topological relation between Oi and Oj. Obviously, the number of 4-tuples in R is n(n-1)/2.
Let Oi be the ith object in the image database (1 <= i <= n). We assign integer i to object Oi as its object number. Then, Oij is called the object-pair code for the object-pair (Oi, Oj). Given two objects Oi and Oj, the object-pair code Oij can be computed by a simple pairing formula, and the two object numbers i and j can be decoded from Oij by finding the largest integer a satisfying the corresponding inequality.
Dij represents the value assigned to the directional relationship between objects Oi and Oj with Oj as the reference object. The value of Dij is determined by the following procedure. First, we find the Minimal Bounding Rectangle (MBR) of the reference object Oj. Then, we extend the four boundaries of this MBR horizontally and vertically until
they cut the whole picture into nine neighborhood areas, and we assign each area a binary code as shown in Table 1. The value of Dij is then determined by the formula Dij = Σ_k bk · wk, where wk is the binary code of neighborhood area k, and bk = 1 if object Oi overlaps area k and bk = 0 otherwise. The value of Tij indicates the topological relationship between objects Oi and Oj. Possible values assigned to topological relations are: 0 (stands for “disjoint”), 1 (stands for “meet”), 2 (stands for “partly_overlap”), 3 (stands for “cover”), and 4 (stands for “contain” or “inside”).
Fig. 2. Pictures (a) and (b) are not distinguishable in all 2D+-string representations. However, the difference can be
easily determined by the 9D-SPA representation.
Let us look at the two pictures shown in Figs. 2a and 2b. Assume that object B is the reference object in
both pictures. Then, in Fig. 2a, the code for DAB is
(00001000 + 00010000+ 00100000 + 01000000 + 10000000)2 = 248
and the code for TAB is 0. In Fig. 2b, the code for DAB is
(00000001 + 00000010 + 00100000 + 01000000 + 10000000)2 = 227
and the code for TAB is 0. In 2D+-string representations, the pictures in Figs. 2a and 2b are not distinguishable
because they have the same spatial representation (i.e., A%B in both x and y-directions). However, we can easily tell
the difference between them by using 9D-SPA representation because DAB in Fig. 2a is 248, while DAB in Fig. 2b is
227. Moreover, from DAB = 248= (11111000)2, we can easily determine that object A spans five neighborhood areasof object B, namely, the northwest, the west, the southwest, the south, and the southeast neighborhood areas as
shown in Fig. 2a. Similarly, from DAB = 227= (11100011)2, we can easily determine that object A spans another
different five neighborhood areas of object B, namely, the northeast, the east, the southeast, the south, and the
southwest neighborhood areas as shown in Fig. 2b.
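The arithmetic of the direction code in this example can be sketched as follows. The weight assigned to each neighborhood area is an assumption chosen to be consistent with the worked values 248 and 227; the paper's Table 1 may order the areas differently.

```python
# Hypothetical binary weights for the nine neighborhood areas around the
# reference object's MBR (consistent with Figs. 2a and 2b, but assumed).
WEIGHTS = {"NE": 1, "E": 2, "N": 4, "NW": 8, "W": 16,
           "SW": 32, "S": 64, "SE": 128, "center": 256}

def direction_code(overlapped_areas):
    """D_ij = sum of w_k over the neighborhood areas k that the object
    overlaps (b_k = 1), i.e. D_ij = sum(b_k * w_k)."""
    return sum(WEIGHTS[a] for a in overlapped_areas)

# Fig. 2a: A spans NW, W, SW, S, SE of B  ->  248 = (11111000)2
# Fig. 2b: A spans NE, E, SE, S, SW of B  ->  227 = (11100011)2
```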
III. Content Based Image Retrieval and Spatial Data Mining for Medical data
In database systems for supporting contemporary advanced applications like medical image analysis and
disease detection and prediction systems, the techniques of content based image retrieval and of spatial data mining
in images are of much importance. Similar techniques are applicable to applications like surveillance systems and
GIS-based decision support systems. As an example, Chung and Wang [25] discuss the creation of a skin cancer image database using a three-tier system.
Database Design
An automatic segmentation method for the images of skin tumors is developed in [25.2]. This method first
reduces a color image into an intensity image and then finds an approximate segmentation by intensity thresholding.
Finally, it refines the segmentation using image edges. One table is designed for this skin cancer database to store
the features of the tumors. Besides the tumor features, some other attributes are added into the table. These include
record number as a primary key of this table, patient id number, the date that the image was taken, the image id
number to identify the image, and the image file name. Image file names are stored in the database instead of the image files themselves. Although images can be stored in the database as a BLOB type, this approach is more flexible because image files can be stored elsewhere, such as on a multimedia server, and a DBMS can be easily integrated with multimedia servers. One advantage is that it is easy to integrate multimedia files with existing databases; another is that non-database applications can access those multimedia files without going through the database. During browsing or content-based retrieval, Java applets find and display the images using the file names stored in the database. The skin cancer database can be used for medical information retrieval, expert
diagnosis, and medical pattern discovery.
Image Feature Definitions
Irregularity is associated with skin malignancies, including malignant melanoma, but until now it has been defined only in subjective terms, such as jagged, notched, not smooth, or not round. One common way to measure irregularity (I) is

I = p² / (4πA),

where p and A are the perimeter and area, respectively [25.3]. Asymmetry is determined about the near-axis of symmetry by comparing absolute area differences to the total area of the tumor shape [25.4]. Entropy is a feature which measures the randomness of the gray-level distribution. It is defined as

Entropy = −Σ_i Σ_j P[i, j] log P[i, j]   [25.5]
P[i, j] is the gray-level co-occurrence matrix. It is defined by first specifying a displacement vector d = (dx, dy) and counting all pairs of pixels separated by d having gray levels i and j. Entropy is highest when all entries in P[i, j] are equal [25.5].

Energy is defined as

Energy = Σ_i Σ_j P[i, j]²   [25.5]

Homogeneity is defined as

Homogeneity = Σ_i Σ_j P[i, j] / (1 + |i − j|)   [25.5]

Inertia is defined as

Inertia = Σ_i Σ_j (i − j)² P[i, j]   [25.6]
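The four co-occurrence features above can be computed directly from the matrix P. The following sketch uses the standard GLCM definitions; the logarithm base for entropy is a free choice here.

```python
import numpy as np

def glcm_features(P):
    """Entropy, energy, homogeneity and inertia of a gray-level
    co-occurrence matrix P[i, j] (normalized internally)."""
    P = P / P.sum()
    i, j = np.indices(P.shape)
    nz = P > 0                                   # avoid log(0)
    entropy = -np.sum(P[nz] * np.log2(P[nz]))
    energy = np.sum(P ** 2)
    homogeneity = np.sum(P / (1.0 + np.abs(i - j)))
    inertia = np.sum((i - j) ** 2 * P)
    return entropy, energy, homogeneity, inertia
```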
Database Browsing and Retrieval
The project is implemented in a three-tier architecture. The applets running in the browser form the front layer, the web server is the middle layer, and the backend database server is the third layer. The JDBC-ODBC bridge is used for the
communication between web server and database server. Users can retrieve images by their content, i.e. by
specifying the attribute values or by using a synthesized color.
Data mining, which is also referred to as knowledge discovery in databases, means a process of nontrivial
extraction of implicit, previously unknown and potentially useful information from databases. Its goal is to extract
significant patterns or interesting rules from databases. Data mining can be broadly classified into three categories:
Classification (clustering)---finding rules that partition the database into finite, disjoint, and previously known (unknown) classes. Sequences---extracting commonly occurring sequences in ordered data. Association rules (a form of summarization)---finding the sets of most commonly occurring groupings of items [25.7]. In this project, mining association rules in a skin cancer database has been implemented.
IV. Image Analysis for brain data
Studies of schizophrenia, Parkinson’s disease, Alzheimer’s disease and other illnesses caused by disruption
of brain functions, are often based on collections of brain images, usually obtained at different resolutions through
computed tomography for human subjects, or through surgical procedures for other species. Atlases have been a common way to organize such image series. Multiple examples of such atlases with corresponding image
segmentation and 2D and 3D visualization techniques have been developed [e.g., for mouse brain: 24.3, 24.4, 24.8,
24.12, 24.13]; several are available online [24.11]. A comprehensive list of available brain data sources and atlases
is maintained by [24.6]. In [24], principles and techniques that enable spatial data interoperability, including spatial
registration, discovery, query, and visualization across brain data sources have been explored.
V. XML based spatial data management for Geo-Computation in distributed systems
Today, geo-computation research such as data mining and knowledge discovery places greater demands on the underlying data infrastructure. A new data infrastructure which is distributed, extensible, and platform-independent is needed to provide more powerful and flexible data services for geo-computation research and applications. Grid computing is a new research agenda which evolved from distributed computing and meta-computing; it tries to provide virtual computation resources by decoupling the power of resources from the computer hardware and software. When grid computing technology is adopted in the research domain of spatial information and geo-computation, the Spatial Information Grid (SIG) [26.6] is proposed and studied. In SIG, computing power, data, models, algorithms, and other resources are shared and assembled as abstract resources through a series of middleware, toolkits, and infrastructure. It aims to be a powerful and easy-to-use infrastructure for spatial information applications. An SIG-based 4-layered data infrastructure built up from data nodes, data sources, data agencies, support libraries, and other components is proposed in [26]. It is distributed in both geography and management: the infrastructure has many service nodes distributed geographically, and the nodes are managed in a distributed fashion by their owners instead of by a centralized organization. The SIG-based data infrastructure is designed with XML schemas for cooperation and implemented in the Java language, so that it can run on almost any platform and supports any type of data source.
Architecture
The SIG-based data infrastructure also adopts SOA (service-oriented architecture) as its foundation and regards all components, including data sources and data agencies, as web services. Logically, it takes the 4-layered architecture shown in Figure 1. By invoking a web service with a well-defined XML-based protocol, data stored in a data node can be searched and accessed. In order to keep the design of the data infrastructure simple and neat, the data agencies are required to adopt the same protocol as the data sources. This protocol is called the “eXtensible Data Accessing Language,” or XDAL for short.
The eXtensible Data Accessing Language
The data infrastructure shares spatial data of different types, with different formats, and for different goals in a uniform infrastructure. Because the data are often stored on different platforms, the data sources have to be invoked through a platform-independent protocol such as SOAP. Furthermore, a well-defined extensible data accessing language which suits any data source and any data type is needed for data source access based on SOAP. In order to make it platform-independent, XML should be adopted as its format.
There are several frequently used operations on a data source or data agency: searching data, downloading
data, and querying its capability description. The grammars and usage rules of request and response for these
operations are standardized by the well-defined XDAL in XML schema. Users can accomplish the operation by
invoking the web service provided by data source or data agency, passing the request to it, and analyzing the
response for the result.
In XDAL, a request should have a root element named <query>, <access>, <getCapability>, <getStatus>, or <getResult>, corresponding to the functions of searching data, downloading data, and getting the capability description of a data source, and to the commands of getting operation status and getting an operation result. A response should have a root element named <response>, <status>, or <result>, for the responses to starting an operation, getting operation status, and getting an operation result. Figure 3 is a sample request searching for data from the satellite “Landset-7” with a given acquirement date.
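Since the Figure 3 sample is not reproduced here, the following hypothetical sketch shows what such an XDAL <query> request might look like when built programmatically; only the <query> root element comes from the text, and all other element names and values are assumptions.

```python
import xml.etree.ElementTree as ET

# Build a hypothetical XDAL <query> request; only the <query> root is
# specified by XDAL as described above, the rest is illustrative.
query = ET.Element("query")
ET.SubElement(query, "satellite").text = "Landset-7"   # spelled as in the text
date = ET.SubElement(query, "acquirementDate")          # hypothetical element
date.text = "2002-07-15"                                # hypothetical date
request = ET.tostring(query, encoding="unicode")
```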
Considering the extensibility of the system, XDAL is designed as an XML-based extensible language. The <query>, <access>, and <getCapability> requests and the <result> response can all be extended substantially. By extending the requests and responses, XDAL can be made to suit almost all kinds of data sources and geo-computation applications.
The paper further provides some useful data sources, data agencies, and support libraries. A test of the user interface of the data infrastructure shows that it can organize the distributed data nodes and data agencies dynamically; build an extensible, robust, and autonomic data infrastructure; and serve users one-stop as an organic whole.
On Management of Object Databases in Distributed, Real-Time Environments
I. High-performance data management support for real-time visualization of time-varying flow fields, e.g., blood flow
Aneurysm surgery remains dangerous because surgeons have limited knowledge of the 3D geometry of an aneurysm and its complex, time-dependent hemodynamic factors such as flow, shear force, and pressure. This information is essential to determine whether the aneurysm is suitable for a certain surgical procedure. The handling of large amounts of data in real-time virtual reality visualization systems is an important and complex problem of high significance. Navigation and exploration of such large datasets stress computational resources, requiring users and visualization systems to make tradeoffs between time, space, and flexibility.
To make it possible for physicians to obtain such information, Liu and Karplus [26] have designed and
developed a Virtual Aneurysm (VA) system that supports an interactive exploration environment suited for the
particular needs of brain aneurysm specialists and directly assists them in their investigations. VA system mainly
consists of a client-server configuration that provides an immersive environment allowing a physician to move
around and into an aneurysm, interactively navigate to explore its complex computer simulated fluid dynamics
within the vascular system using virtual reality and scientific visualization techniques [26.3].
VA system description in brief
The VA system is based on numerical solutions of the Navier-Stokes equations for three-dimensional time-varying flows. Flow simulations are computed over time as the heart goes through its pumping cycle. To ensure numerical stability, simulations are computed with a small time step size, such that only a very small fraction of the total data changes value from one step to the next. Adding the time dimension drastically increases the dataset size, increasing storage requirements and computational complexity. Simulations typically run
for tens or hundreds of hours on high-performance computing machines and periodically generate snapshots of states. The large quantities of simulated data are subsequently stored in archives on disk. After data are off-loaded,
they are analyzed and post-processed using scientific visualization and animation techniques to explore the evolving
state of the simulated fluid dynamics within the vascular system from local graphics workstations. Data of such
unprecedented size often exceeds the memory and performance capacity of typical desktop graphics workstations.
The frame rate is the frequency with which the renderer processes new frames. The frame rate of a visualization should be kept as constant and as high as possible so that the animation is smooth. The greater the frame
rate, the more frames there are and the more work is required to produce the animation. Typical visualizations require a significant amount of I/O bandwidth for accessing data at different time steps when there is not enough memory space for the entire time sequence. The results of data access must be communicated to the graphics workstations for display, which not only causes significant data movement across slow networks but also involves complex human-computer interactions.
Data Representation
Successful visualization systems must be designed to handle datasets of arbitrary size. A method has been exploited for producing a hierarchy of representations of reduced data at various levels of detail, which retain as many of the essential features of the original data as possible but are small enough to be loaded one chunk at a time into main memory. This multi-resolution data reduction method was further expanded to allow any number of data variables by developing an algorithm for constructing octree-like data structures that measure the error introduced in the multivariate data, ensuring that the errors in the multivariate data do not exceed a pre-defined upper bound.
Figure 1: Data flows through the visualization pipeline.
Figure 2: Schematic of the flow visualization environment.
Octrees, like quadtrees, are hierarchical data structures based on decomposition of space. In quadtrees,
space is recursively subdivided into four subregions [26.6]. Octrees are three-dimensional extensions of quadtrees,
where space is recursively subdivided into eight subvolumes [26.4]. The octree-based approach illustrates the
advantage of a regular partition of the 3D space. A hierarchical partition of the space into octants and suboctants,
down to any desired level of granularity, provides a general-purpose scheme for organizing the space as a skeleton
to which any kind of spatial data can be attached for systemic access. This skeleton supports multi-resolution
visualization of large three-dimensional datasets so that the current regions of interest are always displayed at a
higher resolution than the rest. In many instances, these higher resolution regions make up a relatively small
percentage of the entire data, which accelerates the visualization with only slight degradation of image quality. The notion of multi-resolution is also a key concept for controlling the traffic of network connections to guarantee quality of service (QoS) in many multimedia applications. Octree nodes are partitioned and then restructured into many page-sized blocks for efficient storage on disk. For simple partitioning, tree nodes are visited in depth-first order and are accumulated into the current block if the number of nodes does not exceed the block size. The traversal recursively descends the tree and continues. If a node would overfill the current block, the current block is closed, leaving it slightly unfilled, and work starts on the next block. Because page size can be controlled, any size data
can be run on any size computer, with good scalable characteristics as the computer system grows in memory,
computational power, and data bandwidth. Furthermore, each page-sized data is loaded into main memory one time
step ahead of when it is actually required, resulting in smooth streaming of data from storage device to graphics
pipeline.
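The depth-first packing rule described above can be sketched as follows; the node structure and block size are illustrative, not taken from the original system.

```python
class Node:
    def __init__(self, children=None):
        self.children = children or []

def pack_blocks(root, block_size):
    """Visit octree nodes depth-first, accumulating them into blocks of at
    most block_size nodes; a node that would overfill the current block
    starts a new one, leaving the previous block slightly unfilled."""
    blocks, current = [], []

    def visit(node):
        nonlocal current
        if len(current) >= block_size:
            blocks.append(current)
            current = []
        current.append(node)
        for child in node.children:
            visit(child)

    visit(root)
    if current:
        blocks.append(current)
    return blocks
```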
II. Data Management for Distributed Moving Object databases
The need to manage massive volumes of continuously produced information is increasing rapidly in many emerging applications, such as Location-Based Services (LBS), stream data processing, wireless sensor networks, and RFID-enabled ubiquitous computing. To realize location-based services, it is essential to develop efficient management schemes for the location information of heavy volumes of moving objects, where a moving object can be anything that can change its position, including a human, a piece of equipment, or a vehicle. There have been many related research efforts, but most current research activities are single-node oriented, making it difficult to handle extreme situations that must cope with a very large volume, at least millions, of moving objects. The architecture named the Gracefully Aging Location Information System (GALIS) is a cluster-based distributed computing system architecture consisting of multiple data processors, each dedicated to keeping records relevant to a different geographical zone and a different time zone. Much further work has been done in [27, 27.7, 27.8, 27.9, 27.10].
CONCLUSION
Various data processing and computing needs, based on the different data characteristics and processing typicalities posed by the requirements of different applications and computing environments, have been studied and presented above. Mostly, an approach addressing the needs of a particular application requires focusing on the design of a proper data representation (data structure). This must be done with regard to the efficiency of the various operations required to be performed over that data, as expected by the users of the application. As we can observe, the major operations include search over the data. To provide efficient search, the core data representation may be supported with proper indexing. Further, recent applications mostly demand descriptive as well as predictive learning from the data sets. This requires techniques for such queries over data forms ranging from static text, image, audio, and raw
bit streams to data describing time-varying phenomena, e.g. moving objects. To solve such problems, several
techniques have been applied, devised, invented, and are still being studied. Integrating Pattern Recognition,
Machine Learning, and Image Processing techniques [30,31,32,33,34] with emerging Computational Intelligence
and Optimization techniques holds much promise for meeting the growing needs of industry. Among the
applications studied, improvement of spatial-data-mining techniques was found to be of particular importance;
it addresses a wide range of application areas including multimedia applications, geographic information systems
(GIS), medical image processing, surveillance systems, and many more. Statistical methods play a vital role in all
of the above solutions. For example, the recently growing techniques of Swarm Intelligence, an extension of
evolutionary computing, blend algorithm-design techniques with statistical techniques [24,36,37,38,39,40].
Particle Swarm Optimization (PSO), a population-based stochastic optimization technique, is a remarkable
example that solves many such problems with good efficiency. Also in demand are programming languages and
environments that expose, through user-friendly interfaces, the functionality devised (by applying the underlying
mathematical and computing techniques) for a particular application area. All such efforts may contribute
considerably to the development of Relational, Object-Oriented, and Object-Relational Database Management
Systems, adding to their applicability in the respective fields of contemporary applications.
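As a concrete illustration of the swarm-based stochastic optimization discussed above, here is a minimal PSO loop in its standard textbook form; the objective function and coefficient values are illustrative choices, not taken from any of the cited works:

```python
import random

# Minimal Particle Swarm Optimization sketch: a swarm of candidate solutions
# moves through the search space, each particle steering toward a blend of
# its own best-known position and the swarm's best-known position, with
# random perturbation. Coefficients w, c1, c2 are common textbook defaults.

def pso(objective, dim=2, swarm_size=20, iters=200,
        w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]                    # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(swarm_size), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best

    for _ in range(iters):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Minimize the sphere function; the optimum is 0 at the origin.
best, best_val = pso(lambda p: sum(x * x for x in p))
```

The same loop structure applies when the objective scores a clustering or a classifier polynomial, as in the swarm-based data-mining work cited below.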
References:
[1] Heiko Schwarz, Detlev Marpe, and Thomas Wiegand, “Overview of the Scalable H.264/MPEG4-AVC Extension”, 2007, Fraunhofer
Institute for Telecommunications – Heinrich Hertz Institute, Image Processing Department
[2] Livio Lima, Daniele Alfonso, Luca Pezzoni, Riccardo Leonardi, “New fast search algorithm for base layer of H.264 scalable video
coding extension”, 2007 Data Compression Conference (DCC'07), IEEE.
[5] Jeongkyu Lee, Department of Computer Science and Engineering, University of Bridgeport, “A Graph-based Approach for
Modeling and Indexing Video Data”, Proceedings of the Eighth IEEE International Symposium on Multimedia (ISM'06)
[6] Zhan Chaohui, Duan Xiaohui, Xu Shuoyu, Song Zheng, Luo Min, “An Improved Moving Object Detection Algorithm Based on
Frame Difference and Edge Detection”, Fourth International Conference on Image and Graphics, 2007.
[7] Seung-Ho Lim, Man-Keun Seo and Kyu Ho Park, “Scrap: Data Reorganization and Placement of Two Dimensional Scalable Video
in a Disk Array-based Video Server”, Computer Engineering Research Laboratory, Department of Electrical Engineering and
Computer Science, KAIST, Ninth IEEE International Symposium on Multimedia 2007 - Workshops
[8] Siddhartha Chattopadhyay, Suchendra M. Bhandarkar, Member, IEEE, and Kang Li, “Human Motion Capture Data Compression by
Model-Based Indexing: A Power Aware Approach”, IEEE Transactions On Visualization And Computer Graphics, Vol. 13, No. 1,
January/February 2007.
[9] Yu-lung Lo, Chun-hsiung Wang, “Hybrid Multi-Feature Indexing for Music Data Retrieval”, 6th IEEE/ACIS International Conference on
Computer and Information Science (ICIS 2007).
[10] Daniel Howard, Joseph Kolibal, “Image Analysis by Means of the Stochastic Matrix Method of Function Recovery”, 2007 ECSIS
Symposium on Bio-inspired, Learning, and Intelligent Systems for Security
[11] Peng Tang, Lin Gao and Zhifang Liu, “Salient Moving Object Detection Using Stochastic Approach Filtering”, Fourth
International Conference on Image and Graphics, IEEE, 2007.
[12] Jian-Kang Wu, “Content-Based Indexing of Multimedia Databases”, IEEE Transactions On Knowledge And Data Engineering, Vol.
9, No. 6, November/December 1997.
[13] S.R. Subramanya, Rahul Simha, B. Narahari, Abdou Youssef, “Transform-Based Indexing of Audio Data for Multimedia
Databases”, IEEE, 1997.
[14] Jusub Kim, Joseph JaJa, “Information-Aware 2n-Tree for Efficient Out-of-Core Indexing of Very Large Multidimensional
Volumetric Data”
[15] Yukio Hiranaka, Hitoshi Sakakibara and Toshihiro Taketa, “Universal Communication Format for Multimedia Data”,
Proceedings of the Sixth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA’05)
[16] Vit Niennattrakul, Chotirat Ann Ratanamahatana, “On Clustering Multimedia Time Series Data Using K-Means and Dynamic Time Warping”,
2007 International Conference on Multimedia and Ubiquitous Engineering (MUE'07)
[17] Jian Yin, Duanning Zhou and Qiong-Qiong Xie, “A Clustering Algorithm For Time Series Data”, Proceedings of the Seventh
International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'06), 2006
[18] Guoqiang Xiong, Qingjing Gao, “An Algorithm of Similarity Mining in Time Series Data on the Basis of Grey
Markov SCGM(1,1) Model”, 2007 IFIP International Conference on Network and Parallel Computing - Workshops
[19] Ryo Yoshida, Seiya Imoto, Higuchi, “Estimating Time-Dependent Gene Networks from Time Series Microarray Data by Dynamic Linear Models
with Markov Switching”, Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference (CSB’05)
[20] Battuguldur Lkhagva, Yu Suzuki and Kyoji Kawagoe, “New Time Series Data Representation ESAX for Financial Applications”, Proceedings of the
22nd International Conference on Data Engineering Workshops (ICDEW'06), 2006.
[21] Yuhang Wang, Fillia Makedon, “R-Histogram: Quantitative Representation of Spatial Relations for Similarity-Based Image Retrieval”
[22] Po-Whei Huang and Chu-Hui Lee, “Image Database Design Based on 9D-SPA Representation for Spatial Relations”, IEEE Transactions On
Knowledge And Data Engineering, Vol. 16, No. 12, December 2004
[23] Keith Marsolo, Michael Twa, “Classification of Biomedical Data Through Model-based Spatial Averaging”, Proceedings of the 5th IEEE
Symposium on Bioinformatics and Bioengineering (BIBE’05), 2005.
[24] Ilya Zaslavsky, Haiyun He, Joshua Tran, Maryann E. Martone, Amarnath Gupta, “Integrating Brain Data Spatially: Spatial Data Infrastructure and
Atlas Environment for Online Federation and Analysis of Brain Images”, Proceedings of the 15th International Workshop on Database and Expert
Systems Applications (DEXA’04), 2004
[25] Soon M. Chung and Qing Wang, “Content-based Retrieval and Data Mining of a Skin Cancer Image Database”, Proceedings of the
International Conference on Information Technology: Coding and Computing (ITCC '01), 2001.
[26] Damon Liu and Walter Karplus, “Data Management for Exploring Complex Time-Dependent Flow Datasets”, Proceedings of the International
Conference on Information Technology: Coding and Computing (ITCC '01), 2001.
[27] Ho Lee, Jaeil Hwang, Joonwoo Lee, Seungyong Park, Chungwoo Lee, Yunmook Nah, “Long-term Location Data Management for Distributed
Moving Object Databases”, Proceedings of the Ninth IEEE International Symposium on Object and Component-Oriented Real-Time Distributed
Computing, 2006
[28] Narendra Ahuja, Jack Veenstra, “Generating Octrees from Object Silhouettes in Orthographic Views”, IEEE Transactions On Pattern Analysis And
Machine Intelligence, Vol. 11, No. 2, February 1989, p. 137.
[29] Qingmin Shi, Joseph JaJa, “Isosurface Extraction and Spatial Filtering Using Persistent Octree (POT)”, IEEE Transactions On Visualization And
Computer Graphics, Vol. 12, No. 5, September/October 2006
Books/e-Books available: Data Mining / Machine Learning / Image Processing
[30] Ian H. Witten & Eibe Frank, “Data Mining: Practical Machine Learning Tools and Techniques”, 2/e, Morgan Kaufmann Publishers
[31] Daniel T. Larose, “Discovering Knowledge in Data: An Introduction to Data Mining”, Wiley Interscience.
[32] Acharya and Roy, “Image Processing - Principles and Applications”, Wiley Interscience.
[33] Euripides G.M. Petrakis, “Image Representation, Indexing and Retrieval Based on Spatial Relationships and Properties of Objects”,
PhD dissertation, Department of Computer Science, University of Crete.
[34] Dwayne Phillips, “Image Processing in C”, 2/e, R&D Publications / Miller-Freeman Inc. / CMP Media Inc.
[35] Martin H. Trauth, “MATLAB Recipes for Earth Sciences”, 2/e, Springer.
More research papers: Statistical Techniques
[36] Tiago Sousa, Ana Neves, Arlindo Silva, “Swarm Optimisation as a New Tool for Data Mining”, Proceedings of the International Parallel and
Distributed Processing Symposium (IPDPS’03)
[37] Gianluigi Folino, Agostino Forestiero, Giandomenico Spezzano, “Swarming Agents for Discovering Clusters in Spatial Data”, Proceedings of the
Second International Symposium on Parallel and Distributed Computing (ISPDC’03)
[38] A.K. Jain, M.N. Murty, P.J. Flynn, “Data Clustering: A Review”, ACM Computing Surveys, Vol. 31, No. 3, September 1999
[39] Bin Gao, Tie-Yan Liu, Wei-Ying Ma, “Star-Structured High-Order Heterogeneous Data Co-clustering based on Consistent Information Theory”,
Proceedings of the Sixth International Conference on Data Mining (ICDM'06)
[40] Bijan Bihari Misra, Suresh Chandra Satapathy, P. K. Dash, “Particle Swarm Optimized Polynomials for Data
Classification”, Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications (ISDA'06)
[41] Ofer Miller, Ety Navon, Amir Averbuch, “Tracking of Moving Objects Based on Graph Edges Similarity”, ICME 2003
References within references
[5.1] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood estimation from incomplete data via the EM algorithm (with discussion).
J. R. Statist. Soc. B, 39:1–38, 1977.
[5.2] S. Lu, M. Lyu, and I. King. Video Summarization by Spatial-Temporal Graph Optimization. In Proceedings of the 2004 International Symposium on
Circuits and Systems, volume 2, pages 197–200, Vancouver, Canada, May 2004.
[5.3] J. Lee, J. Oh, and S. Hwang. STRG-Index: Spatio-Temporal Region Graph Indexing for Large Video Databases. In Proceedings of the 2005 ACM
SIGMOD, pages 718–729, Baltimore, MD, June 2005.
[5.4] H. T. Chen, H. Lin, and T. L. Liu. Multi-object tracking using dynamical graph matching. Proc. of the 2001 IEEE Conf. on CVPR, pages 210–217,
2001.
[6.1] Wang Junqing, Shi Zelin, and Huang Shabai, “Detection of Moving Targets in Video Sequences”. Opto-Electronic Engineering, Dec 2005, pp. 5-8.
[6.2] Ren Mingwu, and Sun Han, “A Practical Method for Moving Target Detection Under Complex Background”. Computer Engineering, Oct 2005, pp.
33-34.
[6.3] Milan Sonka, Vaclav Hlavac, and Roger Boyle, “Image Processing, Analysis, and Machine Vision (Second Edition)”, Posts & Telecom Press,
Beijing, Sep 2003.
[6.4] Zhang Yunchu, Liang Zize, Li En, and Tan Min, “A Background Reconstruction Algorithm Based on C-means Clustering for Video Surveillance”,
Computer Engineering and Application, 2006, pp. 45-47.
[7.1] P. Shenoy and H. M. Vin. Efficient support for interactive operations in multiresolution video server. ACM Multimedia Syst., 7(3):241–253, Nov.
1999.
[7.2] S. Lim, Y. Jeong and K. Park. Interactive media server with media synchronized RAID storage system. Proc. of International Workshop on
Network and Operating Systems Support for Digital Audio and Video 2005, Jun. 2005.
[7.3] E. Chang and A. Zakhor. Disk-based storage for scalable video. IEEE Trans. on circuits and systems for video technology, 7(5):758–770, Oct.
1997.
[7.4] R. Rangaswami, Z. Dimitrijevic, E. Chang and S.-H. G. Chan. Fine-grained Device Management in an Interactive Media Server. IEEE
Transactions on Multimedia, Vol. 5, No. 4, pages 558-569, Dec. 2003.
[7.5] S. Kang, Y. Won and S. Roh. Harmonic placement: file system support for scalable streaming of layer encoded object. Proc. of International
Workshop on Network and Operating Systems Support for Digital Audio and Video 2006, May 2006.
[8.1] ISO/IEC 14496-1:1999, “Coding of Audio-Visual Objects, Systems,” Amendment 1, Dec. 1999.
[8.2] ISO/IEC 14496-2:1999, “Coding of Audio-Visual Objects, Visual,” Amendment 1, Dec. 1999.
[8.3] M. Preda, A. Salomie, F. Preteux, and G. Lafruit, “Virtual Character Definition and Animation within the MPEG-4 Standard,” 3D Modeling and
Animation: Synthesis and Analysis Techniques for the Human Body, M. Strintzis and N. Sarris, eds., chapter 2, pp. 27-69, IRM Press, 2004.
[8.4] S. Chattopadhyay, S.M. Bhandarkar, and K. Li, “Efficient Compression and Delivery of Stored Motion Data for Virtual Human Animation in
Resource Constrained Devices,” Proc. ACM Conf. Virtual Reality Software and Technology (VRST ’05), pp. 235- 243, Nov. 2005.
[8.5] M. Endo, T. Yasuda, and S. Yokoi, “A Distributed Multi-User Virtual Space System,” IEEE Computer Graphics and Applications, vol. 23, no. 1, pp.
50-57, Jan./Feb. 2003.
[8.6] T. Hijiri, K. Nishitani, T. Cornish, T. Naka, and S. Asahara, “A Spatial Hierarchical Compression Method for 3D Streaming Animation,” Proc. Fifth
Symp. Virtual Reality Modeling Language (Web3D-VRML), pp. 95-101, 2000.
[8.7] T. Giacomo, C. Joslin, S. Garchery, and N. Magnenat-Thalmann, “Adaptation of Facial and Body Animation for MPEG-Based Architectures,” Proc.
Int’l Conf. Cyberworlds, p. 221, 2003.
[8.8] A. Aubel, R. Boulic, and D. Thalmann, “Animated Impostors for Real-Time Display of Numerous Virtual Humans,” Proc. First Int’l Conf. Virtual
Worlds (VW ’98), vol. 1434, pp. 14-28, 1998.
[8.9] O. Arikan, “Compression of Motion Capture Database,” Proc. ACM Trans. Graphics (ACM TOG), vol. 25, no. 3, pp. 890-897, 2006.
[9.1] James C.C. Chen and Arbee L.P. Chen, “Query by Rhythm: An Approach for Song Retrieval in Music Databases,” In Proc. of Int’l Workshop on
Research Issues in Data Engineering, Pages 139-146, 1998.
[9.2] Arbee L.P. Chen, M. Chang, J. Chen, J.L. Hsu, C.H. Hsu, and Spot Y.S. Hua, “Query by Music Segments: An Efficient Approach for Song
Retrieval,” In Proc. of IEEE Int’l Conf. on Multimedia and Expo, 2000.
[9.4] J.L. Hsu, C.C. Liu, and Arbee L.P. Chen, “Efficient Repeating Pattern Finding in Music Databases,” In Proc. of ACM Int’l Conf. on Information and
Knowledge Management, 1998.
[9.5] C. L. Krumhansl, “Cognitive Foundations of Musical Pitch,” Oxford University Press, New York, 1990.
[9.6] W. Lee and A.L.P. Chen, “Efficient Multi-Feature Index Structures for Music Data Retrieval,” In Proc. Of SPIE Conf. on Storage and Retrieval for
Image and Video Database, 2000.
[9.7] Chia-Han Lin and Arbee L. P. Chen, “Indexing and Matching Multiple-Attribute Strings for Efficient Multimedia Query Processing,” IEEE
Transactions On Multimedia, Vol. 8, No. 2, April 2006.
[9.8] C.C. Liu, J.L. Hsu, and Arbee L.P. Chen, “Efficient Theme and Non-Trivial Repeating Pattern Discovering in Music Databases,” In Proc. of IEEE
Data Engineering, Pages 14-21, 1999.
[9.9] C.C. Liu, J.L. Hsu, and Arbee L.P. Chen, “An Approximate String Matching Algorithm for Content-Based Music Data Retrieval,” In Proc. of IEEE
Int’l Conf. on Multimedia Computing and Systems, Pages 451-456, 1999.
[9.10] Yu-lung Lo and Shiou-jiuan Chen, “The Numeric Indexing For Music Data,” in Proc. of the IEEE 22nd ICDCS Workshops – the 4th Int’l Workshop
on Multimedia Network Systems and Applications (MNSA’2002), Vienna, Austria, Pages 258-263, July 2002.
[9.11] Yu-lung Lo and Shiou-jiuan Chen, “Multi-feature Indexing For Music Data,” in Proc. of the IEEE 23rd ICDCS Workshops – the 5th Int’l
Workshop on Multimedia Network Systems and Applications (MNSA’2003), Providence, Rhode Island, USA, Pages 654-659, May 19-22, 2003.
[9.12] Yu-lung Lo, Ho-cheng Yu, and Mei-chin Fan, “Efficient Non-trivial Repeating Pattern Discovering in Music Databases,” Tamsui Oxford Journal of
Mathematical Sciences, Vol. 17, No. 2, Pages 163-187, Nov. 2001.
[11.1] D. Koller, J. Weber, and J. Malik, Robust Multiple Car Tracking with Occlusion Reasoning, in Proc. ECCV 94. Stockholm, Sweden. 1994.
[11.2] L. Wixson, Detecting salient motion by accumulating directionally-consistent flow, IEEE Trans. Pattern Analysis and Machine Intelligence, 2000.
22(8): p. 774-780.
[11.3] Tian, Y.-L. and A. Hampapur. Robust Salient Motion Detection with Complex Background for Real-time Video Surveillance, in IEEE Computer
Society Workshop on Motion and Video Computing 2005. Breckenridge, Colorado.
[11.4] A. Monnet, A. Mittal, and N. Paragios, Background modeling and subtraction of dynamic scenes, in Proc. ICCV 2003: p. 1305-1312.
[11.5] C. R. Wren, et al., Pfinder: real-time tracking of the human body, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997. 19(7): p. 780-
785.
[11.6] C. Stauffer and W.E.L. Grimson, Learning patterns of activity using real-time tracking, IEEE Trans. Pattern Analysis and Machine Intelligence,
2000. 22(8): p. 747-757.
[11.7] A. Elgammal, D. Harwood, and L.S. Davis. Nonparametric Model for Background Subtraction, in Proc. ICCV Frame-Rate Workshop. 1999.
Kerkyra, Greece.
[11.8] A. Mittal and N. Paragios, Motion-based background subtraction using adaptive kernel density estimation, in Proc. IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, 2004, vol. 2.
[12.5] J.K. Wu, A.D. Narasimhalu, B.M. Mehtre, C.P. Lam, and Y.J. Gao, “CORE: A Content-Based Retrieval Engine for Multimedia Databases,” ACM
Multimedia Systems, vol. 3, pp. 3-25, 1995.
[12.8] C. Faloutsos, M. Flickner, W. Niblack, D. Petkovic, W. Equitz, and R. Barber, “Efficient and Effective Querying by Image Content,” Technical
Report, IBM Research Division, Almaden Research Center, RJ 9453 (83074), Aug. 1993.
[12.9] S.K. Chang, C. Yan, D.C. Dimitroff, and T. Arndt, “An Intelligent Image Database System,” IEEE Trans. Software Eng. vol. 14, pp. 681- 688,
1988.
[12.10] W.I. Grosky and R. Mehrota, “Index-Based Object Recognition in Pictorial Data Management,” CVGIP, vol. 52, pp. 416-436, 1990.
[12.13] T. Kohonen, “The Self-Organizing Map,” Proc. IEEE, vol. 78, pp. 1464-1480, 1990.
[12.15] A. Tversky, “Features of Similarity,” Psychological Rev., vol. 84, pp. 327-352, 1977.
[12.20] J.-K. Wu, F. Gao, and P. Yang, “Model-Based 3D Object Recognition,” Proc. Second Int’l Conf. Automation, Robotics, and Computer Vision,
Singapore, Sept. 1992.
[13.9] E. Wold et al. Content-based classification, search and retrieval of audio data. IEEE Multimedia Magazine, 1996.
[13.10] A. Ghias et al. Query by humming. Proc. ACM Multimedia Conf., 1995.
[14.12] C. Silva, Y. Chiang, J. El-Sana, and P. Lindstrom, “Out-of-core algorithms for scientific visualization and computer graphics,” IEEE Visualization
Course Notes, 2002.
[14.3] P. M. Sutton and C. D. Hansen, “Accelerated isosurface extraction in time-varying fields,” IEEE Transactions on Visualization and Computer
Graphics, vol. 6, no. 2, pp. 98–107, Apr 2000.
[14.4] J. Wilhelms and A. V. Gelder, “Octrees for faster isosurface generation,” ACM Transactions on Graphics, vol. 11, no. 3, pp. 201–227, Jul 1992.
[14.5] J. Vitter, “External memory algorithms and data structures: Dealing with massive data.” ACM Computing Surveys, March 2000.
[14.13] H. Samet, The design and analysis of spatial data structures. Addison Wesley, 1990.
[14.14] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley, 1991.
[15.1] R. Srinivasan, XDR: External Data Representation Standard, RFC1832, 1995.
[15.2] M.P. Singh, Agent Communication Languages: Rethinking the Principles, IEEE Computer, vol.31, no.12, pp.40-47, 1998.
[15.3] Y. Hiranaka and M. Kato, Multimedia Data Representation by the Universal Data Format, Trans. IPSJ Meeting, 4V-9, 3-577/578, 1999.
[15.4] T. Obata, T. Taketa and Y. Hiranaka, Multimedia User Interface, Trans. IPSJ Tohoku Chapter Meeting, 00-4-6, 2001.
[17.1] Anne Denton, “Density-based Clustering of Time Series Subsequences”, In Proceedings The Third Workshop on Mining Temporal and
Sequential Data (TDM 04) in conjunction with The Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle,
WA, Aug. 22, 2004.
[17.2] Darvish A., Bak E., Gopalakrishnan K., Zadeh R.H., Najarian K., “A New Hierarchical Method for Identification of Dynamic Regulatory Pathways
from Time-Series DNA Microarray Data”, Proceedings of The 3rd annual Computational Systems Bioinformatics conference (CSB2004), Stanford, CA,
U.S.A.. pp.602-603, August 16–20, 2004.
[17.3] S. Salvador, P. Chan, J. Brodie, “Learning States and Rules for Time Series Anomaly Detection”, Proc. 17th Intl. FLAIRS Conf, pp.300-305,
2004.
[17.4] Daxin Jiang, Jian Pei, Aidong Zhang, “DHC: A Density-Based Hierarchical Clustering Method for Time Series Gene Expression Data”, BIBE,
pp.393-400, 2003.
[17.5] Pedro Rodrigues, Joao Gama, Joao Pedro Pedroso, “Hierarchical Time-Series Clustering for Data Streams”, First International Workshop on
Knowledge Discovery in Data Streams, 2004
[19.1] T. Miyano and Kuhara. Identification of genetic networks from a small number of gene expression patterns under the Boolean network model.
Pac. Symp. Biocomput., 4, 17-28, 1999.
[19.2] M.J. Beal, F. Falciani, Z. Ghahramani, and D.L. Wild. A Bayesian approach to reconstructing genetic regulatory networks with hidden factors.
BioInformatics, 21(3), 2005
[19.3] T. Chen, H. He and G. Church. Modeling gene expression with differential equations. Pacific Symposium on BioComputing, 1999.
[19.4] C. Rangel, J. Angus, Z. Ghahramani, M. Lioumi, E. Sotheran, A. Gaiba, D.L. Wild, and Falciani. Modeling T-cell activation using gene expression
profiling and state-space models, BioInformatics, 20(9), 2004.
[19.5] M.J.L. de Hoon, Imoto, Kobayashi, Ogasawara, Miyano. Inferring gene regulatory networks from time-ordered gene expression data of Bacillus
subtilis using differential equations. Pac. Symp. Biocomput., 8, 2003.
[19.6] N. Friedman, K. Murphy and S. Russell. Learning the structure of dynamic probabilistic networks. Proc. Conference on Uncertainty in Artificial
Intelligence, 139-147, 1998.
[19.7] S. Kim, S. Imoto and S. Miyano. Inferring gene networks from time series microarray data using dynamic Bayesian networks. Brief. Bioinform.,
4(3):228-235, 2003.
[19.8] S. Kim, S. Imoto and S. Miyano. Dynamic Bayesian network and nonparametric regression for nonlinear modeling of gene networks from time
series gene expression data. Biosystems, 75(1-3), 57-65, 2004.
[19.11] I. Shmulevich, E.R. Dougherty, S. Kim, and W. Zhang. Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory
networks. Bioinformatics, 18(2), 2002.
[20.1] Agrawal, R., Faloutsos, C., & Swami, A. “Efficient similarity search in sequence databases” Proceedings of the 4th Conference on Foundations
of Data Organization and Algorithms. (1993)
[20.2] Chan, K. & Fu, W. “Efficient time series matching by wavelets”, Proceedings of the 15th IEEE International Conference on Data Engineering.
(1999).
[20.3] Lin, J., Keogh, E., Lonardi, S. & Chiu, B. “A Symbolic Representation of Time Series, with Implications for Streaming Algorithms”, In proceedings
of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. (2003).
[20.4] Keogh, E,. Chakrabarti, K,. Pazzani, M. & Mehrotra “Dimensionality reduction for fast similarity search in large time series databases”, Journal of
Knowledge and Information Systems. (2000).
[20.5] Eamonn J. Keogh, Michael J. Pazzani, “An Indexing Scheme for Fast Similarity Search in Large Time Series Databases”, 11th International
Conference on Scientific and Statistical Database Management, 1999
[20.51] Keogh, E., Chakrabarti, K., Pazzani, M. & Mehrotra, S. “Locally adaptive dimensionality reduction for indexing large time series databases”, In
proceedings of ACM SIGMOD Conference on Management of Data. Santa Barbara, CA, May 21-24. pp 151-162. (2001).
[20.6] Keogh, E., Chu, S., Hart, D. & Pazzani, M. “An Online Algorithm for Segmenting Time Series”. In Proceedings of IEEE International Conference
on Data Mining. pp 289-296. (2001).
[20.7] Yi, B-K and Faloutsos, C., “Fast Time Sequence Indexing for Arbitrary Lp Norms”, Proceedings of the VLDB, Cairo, Egypt, Sept, (2000).
[21.1] J. F. Allen. Maintaining knowledge about temporal intervals. Commun. ACM, 26(11):832–843, 1983.
[21.2] I. Bloch and A. Ralescu. Directional relative position between objects in image processing: a comparison between fuzzy approaches. Pattern
Recognition, 36(7):1563–1582, 2003.
[21.3] S.-K. Chang, Q.-Y. Shi, and C.-W. Yan. Iconic indexing by 2-D strings. PAMI, 9(3):413–428, 1987.
[21.4] A. G. Cohn and S. M. Hazarika. Qualitative spatial representation and reasoning: an overview. Fundamenta Informaticae, 46(1-2):1–29, 2001.
[21.5] S. Dagtas and A. Ghafoor. Indexing and retrieval of video based on spatial relation sequences. In Proc. Of ACM Multimedia’99, pages 119–122,
1999.
[21.6] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker. Query
by image and video content: the QBIC system. Computer, 28(9):23–32, 1995.
[21.8] V. N. Gudivada and V. V. Raghavan. Design and evaluation of algorithms for image retrieval by spatial similarity. ACM Trans. on Information
Systems, 13(2):115–144, 1995.
[21.9] J. Keller and X. Wang. Comparison of spatial relation definitions in computer vision. In Proc. of ISUMA - NAFIPS’95, pages 679–684, 1995.
[21.13] M. Nabil, A. H. H. Ngu, and J. Shepherd. Picture similarity retrieval using the 2D projection interval representation. IEEE Trans. Knowl. Data
Eng., 8(4):533–539, 1996.
[21.14] A. Pentland, R. W. Picard, and S. Sclaroff. Photobook: Content based manipulation of image databases. Int’l J. of Computer Vision, 18(3):233–
254, 1996.
[21.15] E. Petrakis, C. Faloutsos, and K.-I. Lin. ImageMap: an image indexing method based on spatial similarity. IEEE Trans. on Knowl. and Data
Eng., 14(5):979–987, 2002.
[21.16] J. Sharma and D. M. Flewelling. Inferences from combined knowledge about topology and directions. In Advances in Spatial Databases,
volume 951 of Lecture Notes in Computer Science, pages 279–291. 1995.
[21.17] C.-R. Shyu and P. Matsakis. Spatial lesion indexing for medical image databases using force histograms. In Proc. of IEEE CVPR’01, pages
603–608, 2001.
[21.18] M. Swain and D. Ballard. Color indexing. Int’l J. of Computer Vision, 7(1):11–32, 1991.
[22.1] J.R. Bach, C. Fuller, A. Gupta, A. Hampapur, B. Horowitz, R. Humphrey, R.C. Jain, and C. Shu, “Virage Image Search Engine: An Open
Framework for Image Management,” Proc. Symp. Electronic Imaging: Science and Technology—Storage & Retrieval for Image and Video Database
IV, pp. 76-87, 1996.
[22.2] B. Bhanu and S. Lee, Genetic Learning for Adaptive Image Segmentation. Norwell: Kluwer Academic, 1994.
[22.3] C.C. Chang, “Spatial Match Retrieval of Symbolic Pictures,” J. Information Science and Eng., vol. 7, pp. 405-422, Dec. 1991.
[22.4] S.K. Chang, E. Jungert, and Y. Li, “Representation and Retrieval of Symbolic Pictures Using Generalized 2D Strings,” technical report, Univ. of
Pittsburg, 1988.
[22.5] S.K. Chang, Q.Y. Shi, and C.W. Yan, “Iconic Indexing by 2-D Strings,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 9, no. 3, pp.
413-428, May 1987.
[22.6] S.K. Chang, Principles of Pictorial Information Systems Design. Englewood Cliffs, N.J.: Prentice-Hall Inc., 1989.
[22.7] Y. Chen and J.Z. Wang, “A Region-Based Fuzzy Feature Matching Approach to Content-Based Image Retrieval,” IEEE Trans. Pattern Analysis
and Machine Intelligence, vol. 24, no. 9, pp. 1252-1267, Sept. 2002.
[22.8] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker,
“Query by Image and Video Content: The QBIC System,” Computer, vol. 28, no. 9, pp. 23-32, Sept. 1995.
[22.10] P.W. Huang and Y.R. Jean, “Design of Large Intelligent Image Database Systems,” Int’l J. Intelligent Systems, vol. 11, pp. 347-365, 1996.
[22.11] P.W. Huang and S.K. Dai, “Image Retrieval by Texture Similarity,” Pattern Recognition, vol. 36, pp. 665-679, 2003.
[22.14] L.J. Latecki and R. Lakamper, “Application of Planar Shape Comparison to Object Retrieval in Image Database,” Pattern Recognition, vol. 35,
pp. 15-29, 2002.
[22.15] S.Y. Lee and F.J. Hsu, “2D C-String: A New Spatial Knowledge Representation for Image Database Systems,” Pattern Recognition, vol. 23, no.
10, pp. 1077-1087, Oct. 1990.
[22.16] S.Y. Lee and F.J. Hsu, “Spatial Reasoning and Similarity Retrieval of Images Using 2D C-String Knowledge Representation,” Pattern
Recognition, vol. 25, no. 3, pp. 305-318, Mar. 1992.
[22.17] K.C. Liang and C.C. Jay Kuo, “WaveGuide: A Joint Wavelet-Based Image Representation and Description System,” IEEE Trans. Image
Processing, vol. 8, no. 11, pp. 1619-1629, 1999.
[22.18] A.K. Majumdar, I. Bhattacharya, and A.K. Saha, “An Object- Oriented Fuzzy Data Model for Similarity Detection in Image Databases,”
IEEE Trans. Knowledge and Data Eng., vol. 14, no. 5, pp. 1186-1189, Sept./Oct. 2002.
[22.21] A. Pentland, R.W. Picard, and S. Sclaroff, “Photobook: Tool for Content-Based Manipulation of Image Databases,” Int’l J. Computer
Vision, vol. 18, no. 3, pp. 233-254, June 1996.
[22.22] G. Petraglia, M. Sebillo, M. Tucci, and G. Tortora, “Virtual Images for Similarity Retrieval in Image Databases,” IEEE Trans. Knowledge and
Data Eng., vol. 13, no. 6, pp. 951-967, Nov./Dec. 2001.
[22.23] E. Petrakis, C. Faloutsos, and K.I. Lin, “ImageMap: An Image Indexing Method Based on Spatial Similarity,” IEEE Trans. Knowledge and Data
Eng., vol. 14, no. 5, pp. 979-987, Sept./Oct. 2002.
[22.24] A. Rao, R.K. Srihari, L. Zhu, and A. Zhang, “A Method for Measuring the Complexity of Image Databases,” IEEE Trans. Multimedia, vol. 4, no.
2, pp. 160-173, Mar./Apr. 2002.
[22.25] Y. Rui, T.S. Huang, “Image Retrieval: Current Techniques, Promising Directions, and Open Issues,” J. Visual Comm. Image Representation,
vol. 10, pp. 39-62, 1999.
[22.26] J.R. Smith and S.F. Chang, “VisualSEEK: A Full Automated Content-Based Image Query System,” Proc. Fourth ACM Int’l Multimedia Conf., pp.
87-98, 1996.
[22.27] J. Vleugels, R.C. Veltkamp, and C. Remco, “Efficient Image Retrieval through Vantage Objects,” Pattern Recognition, vol. 35, pp. 69-80, 2002.
[22.28] X.M. Zhou and C.H. Ang, “Retrieving Similar Pictures from a Pictorial Database by an Improved Hashing Table,” Pattern Recognition Letters,
vol. 18, pp. 751-758, 1997.
[22.29] http://www.annapolistech.com/reseller/retrieval.htm, 2004.
[25.2] L. Xu, M. Jackowski, A. Goshtasby, C. Yu, D. Roseman, and S. Bines, "Segmentation of Skin Cancer Images," Image and Vision Computing,
17(1), 1999, pp. 65-74.
[25.3] J. E. Golston, W. V. Stoecker, R. H. Moss, and I. P. S. Dhillon, "Automatic Detection of Irregular Borders in Melanoma and Other Skin Tumors,"
Computerized Medical Imaging and Graphics, 16(3), 1992, pp. 188-203.
[25.4] W. V. Stoecker, W. W. Li, and R. H. Moss, "Automatic Detection of Asymmetry in Skin Tumors," Computerized Medical Imaging and Graphics,
16(3), 1992, pp. 191-197.
[25.5] R. Jain, R. Kasturi, and B. G. Schunck, Machine Vision, McGraw-Hill, 1995.
[25.6] D. H. Ballard and C. M. Brown, Computer Vision, Prentice-Hall, 1982.
[25.7] P. Adriaans and D. Zantinge, Data Mining, Addison-Wesley, 1996.
[26.3] D. Liu, M. Burgin, and W. Karplus, “Computer support system for aneurysm treatment,” Proc. of the 13th IEEE Symposium on Computer-Based
Medical Systems, Houston, Texas, pp. 13-18, June 2000.
[26.4] D.J. Meagher, “Geometric modeling using octree encoding,” Computer Graphics and Image Processing, vol. 19, no. 2, pp. 129-147, June 1982.
[26.5] D.A. Patterson, P.M. Chen, G. Gibson, and R.H. Katz, “Introduction to Redundant Arrays of Inexpensive Disks (RAID),” Proc. IEEE COMPCON
Spring '89, pp. 112-117, IEEE Computer Society Press, 1989.
[26.6] H. Samet, “The quadtree and related hierarchical data structures,” Computing Surveys, vol. 16, no. 2, pp. 186-260, June 1984.
[27.7] Nah, Y., Wang, T., Kim, K.H., Kim, M.H., and Yang, Y.K., "TMO-structured Cluster-based Real-time Management of Location Data on Massive
Volume of Moving Items," in Proc. STFES 2003, IEEE Press, Hakodate, Japan, May 2003, pp.89-92.
[27.8] Nah, Y., Kim, K.H., Wang, T., Kim, M.H., Lee, J., and Yang, Y.K., "A Cluster-based TMO-structured Scalable Approach for Location Information
Systems," in Proc. WORDS 2003 Fall, IEEE CS Press, Capri Island, Italy, October 2003, pp.225-233.
[27.9] Nah, Y., Lee., J. Lee, W.J., Lee, H., Kim, M.H. and Han, K.J., “Distributed Scalable Location Data Management System based on the GALIS