8/3/2019 Dimensions in Data Processing 2402
http://slidepdf.com/reader/full/dimensions-in-data-processing-2402 1/76
Dimensions in Data Processing & Data Management Technology -
Data Structures & Algorithms Efficiency Concerns
Vishwambhar Pathak Sr. Lecturer, Dept. of Computer Science & Engineering, BITIC-RAK (UAE)
The efficiency of a computer application is greatly affected by the features of the underlying Database Management System. Driven by rapid advances in applications of computing and information technology, database technology is also expanding at a phenomenal rate.
We examine two aspects of database management, data structures and data processing algorithms, with regard to varying data characteristics and processing environments.
The present work aims to provide a comprehensive and integrated account of the different types of databases current in the literature, with comments and reviews, bringing out the points of similarity among them.
INTRODUCTION
The contemporary database management methodologies viz. Relational data model, Object data model, AI
techniques based Knowledge (base) management, Information Retrieval & Exploration and Data Configuration
(XML) techniques are largely under enhancements as the characteristics of data and the processing environment of
various applications largely vary and pose considerable challenges.
Data Characteristics: Multimedia data, Time-Series data, Temporal data, XML data, Multidimensional data
Processing Environments: Real-time processing, Parallel & Distributed processing, Mobile Computing, P2P networks
The focus of current research is:
i) To find solutions (data representation, indexing, querying, processing algorithms) to unsolved difficulties arising from the data characteristics and the peculiarities of the processing environments summarized above.
ii) To find better ways to enhance the performance of data management techniques.
Review Of Contemporary Research Related To Representation And Processing Of
Multimedia Data
[ Data Representation Concerns: 3D graphics/object handling (may be studied later); event-centric and logic-based representation of multimedia data; representation of audio data; a universal common format ]
[ Processing Concerns and Solutions: Content-based retrieval (querying): CBIR, color-based retrieval, content-based image authentication, 3D browsing tools, reactive retrieval in distributed environments, semantic-based access in mobile networks. Data hiding: block-based lossless data hiding, distortion-based data hiding, ERC, quantization-based data hiding. Error concealment (ERC); SAR image denoising; protecting sensitive data. Replication: multi-quality data replication, transparent replication. Information exploration (learning from data): data history tools, clustering using time series, feature extraction. Indexing: content-based, graph-based, transform-based, and wavelet-based; for human motion, image data, multi-feature music, and multidimensional VLDBs (2^n-tree). Geo-spatial-temporal data processing ]
For data to be gainfully and meaningfully used in various applications, it is essential to have efficient schemes for data management and manipulation, which broadly involve the acquisition, organization, storage, querying, retrieval, transmission, and presentation of data. A DataBase Management System (DBMS) organizes huge amounts of data into a database and provides utilities for the efficient storage, usage, and management of those data. A multimedia database management system (MMDBMS) should have the capabilities of a traditional DBMS and much more. With multimedia data, the ability to access all data with similar features is limited under keyword-based indexing and exact (or range) searching. This makes automatic analysis, classification, content-based querying, and similarity-based search a necessary part of an MMDBMS.
Data Representation Models and Concerns
The use of multimedia data in many applications has increased significantly. Some examples of these applications
are distance learning, digital libraries, video surveillance systems, and medical videos. As a consequence, there are
increasing demands on modeling, indexing and retrieving these data.
[ Concerns: 2D video; graph-based video; moving-object detection and tracking; compressed human motion ]
I. Modeling and refinement of Scalable Video Coding
The modeling and refinement of Scalable Video Coding (SVC) has been studied extensively [1]. The scalable extension of H.264/MPEG4-AVC is a current standardization project of the Joint Video Team (JVT) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The basic SVC design can be classified as a layered video codec.
In general, the coder structure as well as the coding efficiency depends on the scalability space that is required. An important feature of the SVC design is that scalability is provided at the bit-stream level. Bit-streams for a reduced spatial and/or temporal resolution can be obtained simply by discarding those NAL units (or network packets) of a global SVC bit-stream that are not required for decoding the target resolution. NAL units of PR (progressive refinement) slices can additionally be truncated in order to further reduce the bit-rate, at the cost of the associated reconstruction quality.
Temporal Scalability: In H.264/MPEG4-AVC, any picture can be marked as a reference picture and used for motion-compensated prediction of following pictures, independent of the corresponding slice coding types. These features allow the coding of picture sequences with arbitrary temporal dependencies.
So-called key pictures are coded in regular intervals by using only previous key pictures as references. The pictures
between two key pictures are hierarchically predicted as shown in Fig. 2. It is obvious that the sequence of key
pictures represents the coarsest supported temporal resolution, which can be refined by adding pictures of following
temporal prediction levels.
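The dyadic key-picture hierarchy described above can be sketched in a few lines. This is an illustrative reading of the prediction-level structure, not code from the standard; the function name and the assumption of a power-of-two GOP size are mine. A picture's temporal level is determined by the largest power of two dividing its index:

```python
def temporal_level(i, gop_size=8):
    """Temporal prediction level of picture i in a dyadic hierarchy.

    Key pictures (multiples of the GOP size) are level 0; the picture
    halfway between two key pictures is level 1, and so on.
    """
    levels = gop_size.bit_length() - 1          # e.g. GOP 8 -> 3 refinement levels
    if i % gop_size == 0:
        return 0                                # key picture
    trailing_zeros = (i & -i).bit_length() - 1  # largest power of 2 dividing i
    return levels - trailing_zeros

# Dropping all pictures above a given level halves the frame rate per level removed:
print([temporal_level(i) for i in range(9)])   # [0, 3, 2, 3, 1, 3, 2, 3, 0]
```

Keeping only pictures with level 0 yields the coarsest temporal resolution; each additional level doubles the frame rate, matching the refinement behavior described in the text.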
Spatial scalability is achieved by an oversampled pyramid approach. The pictures of different spatial layers are
independently coded with layer specific motion parameters as illustrated in Fig. 1. However, in order to improve the
coding efficiency of the enhancement layers in comparison to simulcast, additional inter-layer prediction
mechanisms have been introduced.
Inter-layer prediction techniques:
The following three inter-layer prediction techniques are included in the SVC design. In the following, only the
original concepts based on simple dyadic spatial scalability are described.
1. Inter-layer motion prediction: In order to employ base-layer motion data for spatial enhancement-layer coding, additional macroblock modes have been introduced in spatial enhancement layers. The macroblock partitioning is obtained by upsampling the partitioning of the co-located 8x8 block in the lower-resolution layer. The reference picture indices are copied from the co-located base-layer blocks, and the associated motion vectors are scaled by a factor of 2. These scaled motion vectors are either used unmodified or refined by an additional quarter-sample motion vector refinement. Additionally, a scaled motion vector of the lower resolution can be used as a motion vector predictor for the conventional macroblock modes.
2. Inter-layer residual prediction: A flag that is transmitted for all inter-coded macroblocks signals the usage of
inter-layer residual prediction. When this flag is true, the base layer signal of the co-located block is block-wise
upsampled and used as prediction for the residual signal of the current macroblock, so that only the corresponding
difference signal is coded.
3. Inter-layer intra prediction: Furthermore, an additional intra macroblock mode is introduced, in which the prediction signal is generated by upsampling the co-located reconstruction signal of the lower layer. For this prediction it is generally required that the lower layer be completely decoded, including the computationally complex operations of motion-compensated prediction and deblocking. However, this problem can be circumvented when inter-layer intra prediction is restricted to those parts of the lower-layer picture that are intra-coded. With this restriction, each supported target layer can be decoded with a single motion compensation loop.
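The motion-vector part of inter-layer motion prediction can be sketched as follows. The function name and interface are mine, not from the SVC reference software; the assumption is dyadic (2x) spatial scalability and a quarter-sample refinement of at most one step per component:

```python
def inter_layer_mv(base_mv, refinement=(0, 0)):
    """Predict an enhancement-layer motion vector from the co-located
    base-layer vector for dyadic (2x) spatial scalability.

    Vectors are in quarter-sample units; the optional refinement is the
    additional quarter-sample correction signaled in the bit-stream.
    """
    mvx, mvy = base_mv
    rx, ry = refinement
    assert rx in (-1, 0, 1) and ry in (-1, 0, 1), "quarter-sample refinement only"
    return (2 * mvx + rx, 2 * mvy + ry)

print(inter_layer_mv((3, -2)))          # (6, -4): scaled by 2, used unmodified
print(inter_layer_mv((3, -2), (1, 0)))  # (7, -4): scaled, then refined
```

The factor of 2 mirrors the doubling of spatial resolution between layers, exactly as the text describes for the copied reference indices and scaled vectors.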
Recently, a fast-search motion estimation algorithm for the H.264/AVC SVC (scalable video coding) base layer with a hierarchical B-frame structure for temporal decomposition has been presented [2]. The proposed technique is a block-matching-based motion estimation algorithm working in two steps, called Coarse search and Fine search. The Coarse search is performed for each frame in display order and, for each 16x16 macroblock, chooses the best motion vector at half-pel accuracy. The Fine search is performed for each frame in encoding order and finds the best prediction for each block type, reference frame, and direction, choosing the best motion vector at quarter-pel accuracy using R-D optimization. Both the Coarse and Fine searches test 3 spatial and 3 temporal predictors and add a set of updates to the best one. The spatial predictors for the Fine search are the results of the Fine search already performed for the previous blocks, while the temporal predictors are the results of the Coarse search scaled by an appropriate coefficient. This scaling is performed because in the Coarse search each picture is always estimated with respect to the previous one, whereas in the Fine search the temporal distance between the current picture and its references depends on the temporal decomposition level. Moreover, in the Fine search the number and the values of the updates tested depend on the distance between the current picture and its references. These sets of updates are the result of a large number of simulations on test sequences with different motion features.
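A minimal integer-pel version of the predictor-plus-updates search described above might look as follows. This is a simplification of my own: the actual Coarse search works at half-pel accuracy and the Fine stage uses R-D costs, whereas here plain SAD and integer offsets stand in for both:

```python
import numpy as np

def sad(block, ref, x, y):
    """Sum of absolute differences between `block` and the ref patch at (x, y)."""
    h, w = block.shape
    if x < 0 or y < 0 or x + w > ref.shape[1] or y + h > ref.shape[0]:
        return float("inf")                     # candidate falls outside the frame
    return int(np.abs(block.astype(int) - ref[y:y + h, x:x + w].astype(int)).sum())

def best_mv(cur, ref, bx, by, bs, predictors, updates):
    """Test every predictor plus every update offset; keep the cheapest vector."""
    block = cur[by:by + bs, bx:bx + bs]
    best_cost, best = float("inf"), (0, 0)
    for px, py in predictors:
        for ux, uy in updates:
            mx, my = px + ux, py + uy
            cost = sad(block, ref, bx + mx, by + my)
            if cost < best_cost:
                best_cost, best = cost, (mx, my)
    return best

# A frame shifted right by 2 pixels should yield motion vector (2, 0):
rng = np.random.default_rng(0)
cur = rng.integers(0, 256, (40, 40))
ref = np.roll(cur, 2, axis=1)
mv = best_mv(cur, ref, 8, 8, 16, predictors=[(0, 0)],
             updates=[(ux, uy) for ux in range(-3, 4) for uy in range(-3, 4)])
```

Seeding the predictor list with spatial and temporal candidates, as the paper does, lets the update set stay small while still tracking large motion.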
II. Storage Concerns and Solutions for 2D Scalable Video (H.264/MPEG-4 Scalable Video Coding (SVC))
SVC provides multi-dimensional scalability: it supports multiple temporal, spatial, and SNR resolutions simultaneously. Through this multi-dimensional scalability, SVC enables much more flexible adaptation to varying user demands and network conditions. With a scalable video, the video server has to extract, from the full-resolution stream, exactly the sub-stream data that corresponds to the requested resolution. In this case, the extracted sub-stream data may be dispersed across the disk. Thus, accessing a scalable video stream at a given resolution may incur more disk requests, which degrades overall disk throughput severely. Alternatively, the server can retrieve all streams, including extra sub-streams that are not requested but are located between the currently requested sub-streams. However, this may also waste a great deal of disk bandwidth and memory buffer, since considerable disk throughput is consumed retrieving unnecessary data, which must then be retained in memory until transmission. Disk throughput is a crucial factor in video server design, since it may restrict the maximum number of clients serviced simultaneously.
There have been several works on the placement of scalable video streams, or of multi-resolution non-scalable video streams, on one disk or a disk array. For multi-resolution non-scalable video streams, Shenoy [7.1] and Lim [7.2] have proposed a placement strategy that interleaves multi-resolution video streams on a disk array and enables a video server to efficiently support playback of these streams at different resolution levels. This placement algorithm ensures that each sub-stream within a stream is independently accessible at any resolution and that the seek time and rotational latency overheads are minimized. In addition, they presented an encoding technique that enables a video server to efficiently support scan operations such as fast-forward and rewind. Rangaswami [7.4] developed an interactive media proxy that transforms non-interactive broadcast or multicast streams into interactive ones. They carefully manage the disk device by considering disk geometry for allocation and by creating several stream files according to the fast-forward levels. However, this method consumes a large amount of storage space, and they did not consider disk array management. For scalable video data, Chang [7.3] have proposed a strategy for scalable video data placement that maximizes the total data transfer rate of a disk for an arbitrary distribution of requested data rates. The main concept of this strategy is frame grouping, which orders data-rate layers within one storage unit on a disk. It allows optimal disk operation in each service round by performing one seek and a contiguous read of the exact amount of data requested. Kang [7.9] presented a harmonic placement strategy. In this scheme, the layers are partitioned into a set of lower layers and a set of upper layers. In the lower-layer group, they interleave data blocks of all layers within the same service round, while in the upper-layer group they cluster the data blocks of each layer together. Using this scheme they can reduce disk seek time, since the frequently accessed layers are clustered together. However, the schemes described above do not fully utilize the characteristics of scalable video in a video server that provides multi-dimensional scalable video streams; they are limited to single-dimensional scalable video.
In a recent work [7], an efficient data reorganization and placement scheme for two-dimensional scalable video in a disk array-based video server has been proposed, which considers both disk utilization and load balancing. In this scheme, sub-streams are reorganized taking into account both the decoding dependency of two-dimensional scalable video and the locations at which they are stored in the disk array.
The Two Dimensional SVC Rearrangement
SVC provides tools for three scalability dimensions: temporal scalability, spatial scalability, and quality (SNR) scalability. For the sake of simplicity, we focus on two of them, spatial and temporal scalability. The spatial scalability technique encodes a video into several levels that have different spatial resolutions. Temporal scalability, on the other hand, encodes a video sequence into several levels having different frame rates. These scalability dimensions can easily be combined into a general scalable coding scheme providing a wide range of spatial and temporal scalability.
Figure 1. An Illustration of Two Dimensional Scalable Video
Figure 1(a) describes a combined scalability that supports spatial and temporal scalability simultaneously. When combined scalability is considered, the strict notion of a layer no longer applies [2]. Instead, we define a combined scalability level consisting of Ls and Lt, i.e. each scalability dimension has its own level. Ls and Lt represent the spatial and temporal scalability levels, respectively. The scalability level in each dimension represents the quality of the video in the corresponding dimension. In a scalable video stream, data segments can be grouped into a minimum sub-stream that is capable of extending the scalability level. Thus, in a scalable video server, data retrievals are requested in units of this minimum sub-stream. We define this sub-stream as the unit sub-stream (US) for two-dimensional scalability. The US Uk(l, m) is defined as a partial stream of the kth GOP which is essential for reconstructing the video at a resolution higher than spatial scalability level l and temporal scalability level m. Thus, to reconstruct the kth GOP at spatial scalability level Ls and temporal scalability level Lt, all the USs Uk(l, m) such that l <= Ls and m <= Lt should be extracted from the entire stream. GOPk(Ls, Lt), the set of sub-streams for the kth GOP at spatial scalability level Ls and temporal scalability level Lt, is represented with USs as follows.
GOPk(Ls, Lt) = { Uk(l, m) | l <= Ls, m <= Lt }   (1)
Figure 1(a) describes the relation between scalability levels and USs. It also shows how the scalability level is related to frame rate and frame size. The encoded scalable video streams are stored in units of frames, as shown in Figure 1(b). The number marked on top of each frame represents the decoding order; the data of an encoded video stream are basically stored in decoding order. To exploit the access pattern determined by scalability, the data should first be partitioned according to USs. Figure 1(c) shows this data placement. Starting from this placement, the work proposes a more efficient placement scheme.
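The selection rule of Eq. (1) is easy to state in code. The helper below is mine, with the assumption that scalability levels are numbered from 1; it enumerates the unit sub-streams a server must extract for one GOP:

```python
def required_us(Ls, Lt):
    """Unit sub-streams U(l, m) needed to reconstruct one GOP at spatial level
    Ls and temporal level Lt: all (l, m) with l <= Ls and m <= Lt (Eq. 1)."""
    return {(l, m) for l in range(1, Ls + 1) for m in range(1, Lt + 1)}

# Raising either scalability level only ever adds sub-streams:
assert required_us(1, 2) <= required_us(2, 3)
print(sorted(required_us(2, 2)))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

The monotonicity shown by the assertion is what makes placement matter: higher-resolution requests always cover the sub-streams of lower-resolution ones, so low-level USs are the most frequently accessed.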
For scalable video, the requested video streams are likely to be retrieved in a discontinuous manner during one service-round duration, since the extracted sub-stream data are dispersed on the disk. Thus, accessing these streams at a given resolution may incur more disk requests. To reduce seek overhead, the server can retrieve the needed sub-streams along with extra sub-streams that are not requested but are located between the currently requested sub-streams. In view of this, the retrieval policy is that one disk request is generated per round duration for each disk, even if it retrieves unnecessary sub-streams. We try to find the optimal placement based on this retrieval policy. Meanwhile, load balancing between the disks of a disk array is important. When video streams are stored on a disk array, disk striping is performed by dividing the video data into blocks according to their decoding order and storing these blocks on different disks. Sub-streams at a given resolution may be located on some disks but not on others, which incurs biased disk requests and load imbalance between disks and is not efficient in a disk array-based server. Thus, the optimal data placement is the one that satisfies both of two criteria:
Criterion 1. For each disk request, the server should retrieve a minimum of unnecessary sub-streams, to maximize disk utilization during one service round.
Criterion 2. The server should generate disk requests that balance the load between disks during one service round.
Suppose a two-dimensional scalable video that has three spatial scalability levels and five temporal scalability levels, and a disk array consisting of four disks. The scalable video stream is first arranged and partitioned into USs, as described in the previous section. These are then initially stored on the disks, where a stripe is the closed set of one round duration, as shown in Figure 2. The GOP data can be filled to match the striping distance using the FGS layer of quality scalability, described as U(F) in the figure.
As a general approach, the optimal data placement can be obtained by finding the placement that minimizes the request size retrieved from each disk and distributes requests as evenly as possible between disks during one service round. Let pij be the probability of retrieving the sub-stream at Ls = i, Lt = j, and let Sk = [s1, s2, ..., sN] denote one of the possible data placement sequences on the kth disk, where sn denotes the nth US of a GOP. Accordingly, S = <S1, ..., SK> denotes the continuous sequence across the K disks for one GOP. Let Rij(Sk) denote the request size incurred when the sub-stream at Ls = i, Lt = j is retrieved from the stream organized as sequence Sk on the kth disk. Let the numbers of spatial and temporal scalability levels be L and M, respectively, and the number of disks be K. In the first step, following Criterion 1, we obtain the first data placement by finding the Sk for each disk that minimizes R(S), the total request size retrieved during one service round, from the following equation.
Several candidate placements can be obtained from Eq. 2. In the next step, we select the one that maximizes disk load balancing, following Criterion 2. Let Lij(S) denote the load-balancing factor for scalability level Ls = i and Lt = j. Load balancing between disks measures how evenly the disk requests are distributed, so the overall load-balancing factor L(S) can be described by the following equation,
where δij denotes the number of disks to be accessed for scalability level Ls = i and Lt = j. The placement policy is then to find the stream sequence Sk for each disk by maximizing Eq. 3.
The procedure of the local optimal placement search is as follows.
1. Reorganize a raw scalable video stream, whose data are basically placed in decoding order, into USs, so that the data are ordered according to scalability level. Then let i = 1 and α = 1; accordingly, the initial sequence is denoted Sα(1).
2. Whenever i increases, the sequence of the stream Sα(i) is re-ordered, with the scraper USα relocated to the ith location within the sequence.
3. Each sequence Sα(i) is split into sub-sequences Sk(i), one for each disk in the disk array. The total retrieval size R(Sα(i)) and the load-balancing factor L(Sα(i)) are then calculated for that sequence from Eqs. 2 and 3. If it is better than the previous one, it replaces the current optimal sequence S. For the current α, this search is repeated until i reaches (L · M).
4. As α increases from 1 to (L · M), the scraper is changed; using this US, USα, steps 2 and 3 are repeated. Finally, the local optimal sequence of the stream, S, is selected at the end of the iteration.
When this search algorithm is applied to the initial sequence of Figure 2, the placement of Figure 3 is obtained. In the above placement, the client distribution probability is assumed to be a pre-defined parameter. In particular, the placement is optimal when all scalability levels are requested with the same probability.
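The flavor of this optimization can be reproduced with a toy model. Everything below is my simplification: one contiguous read per disk per round, with cost equal to the span from the first to the last needed US on each disk; exhaustive permutation search stands in for the scraper-based local search, and the expected cost plus an access-spread tie-breaker stand in for Eqs. 2 and 3:

```python
from itertools import permutations

def round_cost(disks, needed):
    """Per-round cost of serving `needed` USs: each disk does one contiguous
    read from its first to its last needed US, so USs in between are read too.
    Returns (total blocks read, number of disks accessed)."""
    total = accessed = 0
    for seq in disks:
        hits = [i for i, us in enumerate(seq) if us in needed]
        if hits:
            total += hits[-1] - hits[0] + 1
            accessed += 1
    return total, accessed

def search_placement(all_us, n_disks, probs):
    """Brute-force the US ordering minimizing expected request size, breaking
    ties toward more evenly spread disk accesses (tiny instances only)."""
    per_disk = len(all_us) // n_disks
    best_key = best_disks = None
    for order in permutations(all_us):
        disks = [order[d * per_disk:(d + 1) * per_disk] for d in range(n_disks)]
        exp_cost = exp_spread = 0.0
        for (Ls, Lt), p in probs.items():
            needed = {(l, m) for l in range(1, Ls + 1) for m in range(1, Lt + 1)}
            cost, accessed = round_cost(disks, needed)
            exp_cost += p * cost
            exp_spread += p * accessed
        key = (exp_cost, -exp_spread)
        if best_key is None or key < best_key:
            best_key, best_disks = key, disks
    return best_disks

all_us = [(1, 1), (1, 2), (2, 1), (2, 2)]           # L = M = 2
probs = {(1, 1): 0.4, (1, 2): 0.3, (2, 2): 0.3}     # client distribution p_ij
placement = search_placement(all_us, 2, probs)
```

Even at this scale the trade-off is visible: clustering frequently requested USs shortens reads, while spreading them across disks evens out the load, which is exactly the tension between Criteria 1 and 2.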
III. Graph-based Approach for modeling and indexing Video:
In [5], a new graph-based video data structure, called the Spatio-Temporal Region Graph (STRG), is presented, which represents spatio-temporal features and the relationships among video objects. A Region Adjacency Graph (RAG) is generated from each frame, and the STRG is constructed by connecting the RAGs. For efficiency, the STRG is segmented into a number of pieces corresponding to shots. Each segmented STRG is then decomposed into subgraphs, called Object Graphs (OGs) and Background Graphs (BGs), in which redundant BGs are eliminated to reduce index size and search time. The proposed indexing starts by clustering OGs using the Expectation Maximization (EM) algorithm [5.1] for more accurate indexing. Clustering requires a distance measure between two OGs; for this, the paper proposes the Extended Graph Edit Distance (EGED), because existing measures are not well suited to OGs. The EGED is defined in a non-metric space for clustering OGs, and it is extended to a metric space to compute the key values for indexing. Based on the clusters of OGs and the EGED, a new indexing structure, the STRG-Index, is proposed, which provides efficient retrieval.
Spatio-Temporal Region Graph
For a given video, each frame is segmented into a number of regions using a region segmentation technique. Then a Region Adjacency Graph (RAG) is obtained by converting each region into a node, and the spatial relationships among regions into edges [5.2]. It is defined as follows:
Definition 1: Given the nth frame fn in a video, a Region Adjacency Graph of fn, Gr(fn), is a four-tuple
Gr(fn) = (V, ES, ν, ξ),
where
• V is a finite set of nodes for the segmented regions in fn,
• ES ⊆ V × V is a finite set of spatial edges between adjacent nodes in fn,
• ν : V → AV is a set of functions generating node attributes, and
• ξ : ES → AES is a set of functions generating spatial edge attributes.
The node attributes (AV) represent the size (i.e., number of pixels), dominant color, and location of the corresponding region; the spatial edge attributes (AES) represent the relationships between two adjacent nodes, such as spatial distance and orientation. A RAG is good at representing the spatial relationships among nodes corresponding to the segmented regions, but it cannot represent the temporal characteristics of video. The new graph-based data structure for video, the Spatio-Temporal Region Graph (STRG), consists of temporally connected RAGs [5.3]. The STRG can handle both temporal and spatial characteristics of video and is defined as follows:
Definition 2: Given a video segment S, a Spatio-Temporal Region Graph, Gst(S), is a six-tuple Gst(S) = (V, ES, ET, ν, ξ, τ), where
• V is a finite set of nodes for segmented regions from S,
• ES ⊆ V × V is a finite set of spatial edges between adjacent nodes in S,
• ET ⊆ V × V is a finite set of temporal edges between temporally consecutive nodes in S,
• ν : V → AV is a set of functions generating node attributes,
• ξ : ES → AES is a set of functions generating spatial edge attributes, and
• τ : ET → AET is a set of functions generating temporal edge attributes.
In an STRG, the temporal edge attributes (AET) represent the relationships between corresponding nodes in two consecutive frames, such as velocity and moving direction. Figures 1(a) and (b) show actual frames of a sample video and their region segmentation results, respectively. Figure 1(c) shows part of the STRG for frames #141 − #143, constructed by adding temporal edges, drawn as horizontal lines between the frames.
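Definitions 1 and 2 translate directly into a small container type. The class below is an illustrative, dictionary-backed structure of my own (not the authors' implementation), covering the node, spatial-edge, and temporal-edge attribute sets:

```python
class STRG:
    """Spatio-Temporal Region Graph: nodes are segmented regions; spatial
    edges link adjacent regions in one frame, temporal edges link a region
    across consecutive frames (Definition 2)."""

    def __init__(self):
        self.nodes = {}           # node id -> attributes (A_V)
        self.spatial_edges = {}   # (u, v) -> attributes (A_ES)
        self.temporal_edges = {}  # (u, v) -> attributes (A_ET)

    def add_region(self, node, frame, size, color, location):
        self.nodes[node] = {"frame": frame, "size": size,
                            "color": color, "location": location}

    def add_spatial_edge(self, u, v, distance, orientation):
        assert self.nodes[u]["frame"] == self.nodes[v]["frame"]
        self.spatial_edges[(u, v)] = {"distance": distance,
                                      "orientation": orientation}

    def add_temporal_edge(self, u, v, velocity, direction):
        assert self.nodes[v]["frame"] == self.nodes[u]["frame"] + 1
        self.temporal_edges[(u, v)] = {"velocity": velocity,
                                       "direction": direction}

g = STRG()
g.add_region("v1", frame=141, size=250, color="red", location=(10, 12))
g.add_region("v2", frame=141, size=300, color="blue", location=(14, 12))
g.add_region("v3", frame=142, size=248, color="red", location=(11, 12))
g.add_spatial_edge("v1", "v2", distance=4.0, orientation=0.0)
g.add_temporal_edge("v1", "v3", velocity=1.0, direction=0.0)
```

The two assertions enforce the defining constraints: spatial edges stay within a frame, temporal edges connect consecutive frames.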
An STRG is an extension of RAGs obtained by adding temporal edges (ET) to them. ET represents the temporal relationships between corresponding nodes in two consecutive RAGs. The main procedure in building an STRG is therefore constructing ET, which is similar to the problem of object tracking in a video sequence. To find the corresponding nodes in two consecutive RAGs, graph isomorphism and maximal common subgraph algorithms were used. These algorithms are conceptually simple but have high computational complexity. To address this, a RAG is decomposed into its neighborhood graphs (GN(v)), which are subgraphs of the RAG, as follows:
Definition 3: GN(v) is the neighborhood graph of a given node v in a RAG if every node u ∈ GN(v) is adjacent to v, having an edge eS = (v, u).
Let GN^m and GN^(m+1) be the sets of neighborhood graphs in the mth and (m+1)th frames, respectively. For each node v in the mth frame, the goal is to find the corresponding target node v' in the (m+1)th frame. To decide these corresponding nodes, we use the neighborhood graphs of Definition 3. For each neighborhood graph GN(v) in GN^m, the goal becomes finding the corresponding target graph GN(v') in GN^(m+1), which is an isomorphic or the most similar graph to GN(v). First, we look for a neighborhood graph in GN^(m+1) that is isomorphic to GN(v). Second, if no isomorphic graph exists in GN^(m+1), we find the most similar neighborhood graph to GN(v) using a similarity measure SG(GN(v), GN(v')), which is defined as follows:

SG(GN(v), GN(v')) = |GC| / min(|GN(v)|, |GN(v')|)   (1)

where |G| denotes the number of nodes of G, and GC is the maximal common subgraph of GN(v) and GN(v'). GC can be computed based on maximal clique detection. For GN(v) ∈ GN^m, GN(v') is the corresponding neighborhood graph in GN^(m+1) whose SG with GN(v) is the largest among the neighborhood graphs in GN^(m+1) and greater than a certain threshold value. In this way, we find all pairs of corresponding neighborhood graphs (and eventually corresponding nodes) from GN^m to GN^(m+1).
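The SG measure of Eq. (1) can be exercised with a deliberately crude stand-in for its hardest part: below, |GC| is approximated by the multiset overlap of node labels, whereas the paper computes the true maximal common subgraph via maximal clique detection. The function name and representation are mine:

```python
from collections import Counter

def sg_similarity(labels_v, labels_w):
    """SG(GN(v), GN(v')) = |GC| / min(|GN(v)|, |GN(v')|), Eq. (1), with |GC|
    approximated by the shared node labels of the two neighborhood graphs."""
    common = sum((Counter(labels_v) & Counter(labels_w)).values())
    return common / min(len(labels_v), len(labels_w))

# Neighborhood graphs represented by their node (region) labels:
print(sg_similarity(["red", "blue", "blue"], ["blue", "red"]))   # 1.0
print(sg_similarity(["red", "green", "sky"], ["blue", "red"]))   # 0.5
```

Normalizing by the smaller graph keeps SG in [0, 1], so the thresholding step described in the text (accept the best match only above a cutoff) applies uniformly regardless of neighborhood size.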
Object Graph
An STRG is first decomposed into Object Region Graphs (ORGs) to model moving objects. Consider a temporal subgraph, defined as a set of sequential nodes connected to each other by a set of temporal edges (ET). An ORG is the special case of a temporal subgraph of the STRG in which the spatial edge set ES is empty. However, due to the limitations of region segmentation techniques, differently colored regions belonging to a single object may not be detected as a single region. For instance, a person's body may consist of several regions such as the head, upper body, and lower body. Figure 2(a) shows an object segmented into four regions over three frames. Since there are four regions in each frame, four ORGs are built, i.e. (v1, v5, v9), (v2, v6, v10), (v3, v7, v11), and (v4, v8, v12), as in Figure 2(b). Since they belong to a single object, it is better to merge these ORGs into one.
For convenience, we refer to the merged ORGs as an Object Graph (OG). In order to merge two ORGs that belong to a single object, we consider the attributes (i.e. velocity and moving direction) of the temporal edges (ET). If two ORGs have the same moving direction and the same velocity, they can be merged into one. In Figure 2(c), the four ORGs are merged into a single OG, i.e. (v2, v6, v10). After the OGs are extracted, the remainder of the STRG represents the background information of the video. We call this graph a Background Graph (BG), and it is used in indexing.
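The merge rule (same velocity and same moving direction) amounts to grouping ORGs by their temporal-edge attributes. A sketch under assumptions of my own follows: each ORG is a small dict, and each group keeps its first node chain as the representative OG, whereas a real merger would fuse the member regions:

```python
from collections import defaultdict

def merge_orgs(orgs):
    """Group ORGs whose temporal-edge attributes match, then emit one OG per
    group. Each ORG is {'nodes': tuple, 'velocity': ..., 'direction': ...}."""
    groups = defaultdict(list)
    for org in orgs:
        groups[(org["velocity"], org["direction"])].append(org["nodes"])
    # Keep the first chain of each group as the representative OG:
    return [chains[0] for chains in groups.values()]

orgs = [{"nodes": ("v1", "v5", "v9"),  "velocity": 1.0, "direction": 90},
        {"nodes": ("v2", "v6", "v10"), "velocity": 1.0, "direction": 90},
        {"nodes": ("v3", "v7", "v11"), "velocity": 1.0, "direction": 90},
        {"nodes": ("v4", "v8", "v12"), "velocity": 1.0, "direction": 90}]
print(merge_orgs(orgs))  # [('v1', 'v5', 'v9')]: one object, one OG
```

Since all four chains share the same motion, the four ORGs of the Figure 2 example collapse into a single OG, matching the behavior described in the text.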
STRG Indexing
In this section, the paper proposes a graph-based video indexing method, called the Spatio-Temporal Region Graph Index (STRG-Index), which uses the metric-space Extended Graph Edit Distance (EGED_M) as a distance measure, together with clustered OGs.
The Extended Graph Edit Distance (EGED) between two object graphs OGm^s and OGn^t is defined in [5].
In order to satisfy the triangle inequality, the EGED is specialized into a metric distance function (see Theorem 1) by comparing the current value with a fixed constant.
Theorem 1: If gi is a fixed constant, then EGED is a metric.
STRG-Index Tree Structure
To build an index for video data, the procedure of tree construction proposed for the M-tree [5.4] is adapted, since it requires a minimal number of distance computations and has good I/O performance. In the M-tree, a number of representative data items are selected for efficient indexing. There are several ways to select them, such as sampling or random selection. In the STRG-Index, the clustering results are employed to determine the representative data items. The STRG-Index tree structure consists of three levels of nodes: the shot node, cluster node, and object node, as seen in Figure 3.
The top level has the shot node, which contains the information of each shot in a video. Each
record in the shot node represents a segmented shot whose frames share a background.
The record has a shot identifier (ShotID), a key RAG (Grkey), an actual BG (BGr), and an
associated pointer (ptr) that references the top of the corresponding cluster node. The
following figure shows an example of a record in the shot node.
The mid level has the cluster nodes, which contain the centroid OGs representing cluster
centroids. Each record indicates a representative OG among a group of similar OGs. A record
contains its identifier (ClusID), the centroid OG (OGc) of its cluster, and an associated
pointer (ptr) that references the top of the corresponding object node. The following figure
shows an example of a record in a cluster node.
The low level has the object nodes, which contain the OGs belonging to the same cluster. Each
record in an object node represents an object in a video and has the index key
(computed as EGEDM(OGm, OGc)), an actual OG (OGm), and an associated pointer
(ptr) that references the actual video clip on disk. The following figure shows an
example of a record in an object node.
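The three record layouts can be sketched as plain data structures. The field names (ShotID, Grkey, BGr, ClusID, OGc, OGm, ptr) follow the text, but the concrete Python types are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ObjectRecord:           # low level: one object graph
    key: float                # EGED_M(OGm, OGc), the metric index key
    og: object                # OGm, the actual object graph
    ptr: str = ""             # reference to the actual video clip on disk

@dataclass
class ClusterRecord:          # mid level: one cluster of similar OGs
    clus_id: int              # ClusID
    centroid_og: object       # OGc, the centroid object graph
    ptr: Optional[ObjectRecord] = None    # -> top of the object node

@dataclass
class ShotRecord:             # top level: one segmented shot
    shot_id: int              # ShotID
    key_rag: object           # Grkey, the key Region Adjacency Graph
    bg: object                # BGr, the actual Background Graph
    ptr: Optional[ClusterRecord] = None   # -> top of the cluster node
```

The `ptr` chain mirrors the shot → cluster → object traversal of the index.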
STRG-Index Tree Construction
Based on the STRG decomposition described above, an input video is separated into
foreground (OGs) and background (BG) subgraphs of the STRG. The extracted BGs are
stored at the root node without any parent. All the OGs sharing one BG are placed in the same cluster
node, which can reduce the size of the index significantly. For example, in surveillance videos the
camera is stationary, so the background is usually fixed; therefore, only one record (BG)
in the shot node is sufficient to index the background of the entire video.
We synthesize a centroid OG (OGc) for each cluster, which serves as the representative OG for that
cluster. This centroid OG is inserted into an appropriate cluster node as a record, and it is
updated as member OGs are inserted or deleted.
Each record in a cluster node also has a pointer to an object node. The object node holds the
actual OGs of a cluster, which are indexed by EGEDM. To decide the index value for each
OG, we compute EGEDM between the representative OG (OGc) of the corresponding cluster
and the OG (OGm) to be indexed. Since EGEDM is a metric distance by Theorem 1, this value
can serve as the key of the OG to be indexed.
IV. Moving Object Detection
Moving object detection is very important in intelligent surveillance. The main
detection algorithms currently include the frame difference method, the background subtraction method,
the optical flow method, and statistical learning methods. The optical flow method is the most
complex and takes more time than the other methods, while statistical learning methods
need many training samples and are also computationally expensive. These two
approaches are therefore not suitable for real-time processing. The background subtraction method is
extremely sensitive to changes in lighting. The frame difference method is simple and easy to
implement, but its results are not accurate enough, because changes in
background brightness cause misjudgment [6.1, 6.2, 6.3, 6.4]. Motivated by the observation that the human eye is
sensitive to both movement and edges, a recent work [6] presents an efficient algorithm for
moving object detection based on frame difference and edge detection. Figure 1
gives the flow chart of the frame difference method.
Figure 1 The flow chart of frame difference method
The flow chart of the detection process using the moving edge method is shown in Figure 2.
Figure 2 The flow chart of moving edge method
The flow chart of the detection process using the method based on frame difference and
edge detection presented in [6] is shown in Figure 3.
Figure 3 The flow chart of the improved algorithm
Further, object segmentation is performed to divide the image into a moving area and a static
area. After separating the moving objects from the background, the objects must be located so
as to obtain their exact positions. The common approach is to compute
connected components in the binary image, delete those connected components whose areas
are too small, and obtain the circumscribing rectangle of each remaining object.
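A minimal sketch of the frame-difference step described above, using only NumPy (the full algorithm in [6] also applies edge detection and discards small connected components; the threshold and frame sizes here are illustrative):

```python
import numpy as np

def detect_moving_regions(prev_frame, curr_frame, thresh=25):
    """Threshold the absolute inter-frame difference and return the
    circumscribing rectangle (x0, y0, x1, y1) of the changed pixels,
    or None when no motion is detected."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    mask = diff > thresh                      # binary motion mask
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

# Synthetic example: a bright 10x10 "object" appears between two frames
prev = np.zeros((64, 64), dtype=np.uint8)
curr = np.zeros((64, 64), dtype=np.uint8)
curr[20:30, 40:50] = 200
print(detect_moving_regions(prev, curr))  # -> (40, 20, 49, 29)
```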
V. Motion Picture Storage with Compression [8]
Animation of human-like virtual characters has potential applications in the design of human-
computer interfaces, computer games, and the modeling of virtual environments using power-
constrained devices such as laptop computers in battery mode, pocket PCs, and PDAs.
Distributed virtual human animation is used in many applications that depict human models
interacting with networked virtual environments. The two major issues involved in the
streaming of MoCap (Human Motion Capture) animation data to mobile devices are 1)
limited bandwidth available for streaming MoCap data, and 2) limited power available to
receive, decompress, and render the compressed MoCap data. It is desirable to have a
compression method that reduces the network bandwidth enough to allow streaming to,
and use on, mobile devices, and that also requires less computation, and hence less power consumption, on
the client side to reconstruct the motion data from the compressed data stream. In order to
standardize virtual human animation, MPEG-4 has proposed H-Anim standards for
representation of virtual humans and the format of the corresponding motion capture
(MoCap) data to be used for rendering and animating the virtual human [8.1], [8.2], [8.3]. A
recent compression algorithm for MoCap data (or, equivalently, MPEG-4 Body Animation
Parameters (BAP) data), termed BAP-Indexing [8.4], uses indexing techniques for the
compression of BAP data, resulting in a significant reduction in the power consumption required
for decompression. BAP-Indexing exploits the structural hierarchy of the virtual human to
achieve efficient compression, which, though lossy, results in reconstructed motion of good
quality.
Fig. 1. The standard compression and decompression pipeline for MPEG-4 Motion Capture (MoCap) data or Body Animation Parameter (BAP) data.
Matrix Representation of MoCap Data
The MoCap Data (or, equivalently, MPEG-4 BAP data) is represented by an n x m-
dimensional matrix X, where n is a multiple of the video sampling rate or frame rate
expressed as frames per second (fps) and m is the number of degrees of freedom for the
virtual human (the maximum value of m = 296 as defined in the MPEG-4 standard). Each
row of the matrix represents a pose of the virtual human for a small time step. Each column
of the matrix corresponds to either the displacement of the model from a fixed origin, or the
Euler angle of rotation needed to achieve the desired pose. We have used a 62-dimensional
virtual human, with a frame rate of 33 fps. This means that, for a 10 second motion
sequence, the motion matrix X is a 330 x 62 array of floating point numbers. The first three
columns of X represent the absolute displacement of the virtual human with respect to a
fixed origin in the 3D virtual world. The next three columns represent the absolute
orientation of the virtual human with respect to the virtual world coordinates. The remaining
56 columns correspond to the angles made by the degrees of freedoms associated with the
various joints in the skeletal virtual human model.
As a first step in the compression process, the matrix X is equivalently represented as a
difference matrix d of dimension (n-1) x m and an initial pose vector I, where I is assigned the first row of X,
and the rows of d are the differences between successive rows of X.
Ij = X1j,  j = 1, 2, …, m
dij = X(i+1)j − Xij,  i = 1, …, n−1; j = 1, …, m.
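The transform above and its exact inverse (cumulative summation of the differences) can be sketched as:

```python
import numpy as np

def pose_difference(X):
    """Split the motion matrix X into the initial pose vector I
    and the frame-to-frame difference matrix d."""
    I = X[0].copy()            # I_j = X_{1j}
    d = np.diff(X, axis=0)     # d_ij = X_{i+1,j} - X_{ij}
    return I, d

def reconstruct(I, d):
    """Inverse transform: cumulative sums of d restore X exactly."""
    return np.vstack([I, I + np.cumsum(d, axis=0)])

# e.g. a 10-second sequence at 33 fps with 62 degrees of freedom
X = np.random.RandomState(1).randn(330, 62)
I, d = pose_difference(X)
assert d.shape == (329, 62)
assert np.allclose(reconstruct(I, d), X)
```

The lossless round trip holds before quantization; the lossy step comes later, in the bucketing of d.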
The difference matrix d, subsequently termed the motion matrix, can be interpreted as
successive small angular increments (floating point numbers) needed by the virtual human,
for each of its degrees of freedom, in order to realize the desired animation. Without loss of
generality, we assume that d has n rows.
BAP-Indexing: Indexing of the BAP Data
For approximately periodic and regular motions such as walking, jogging, and running, a
collection of all the n x m floating point numbers within the corresponding motion matrix d
exhibit a tendency to form a finite number of well separated clusters. Taking a cue from this
observation, we assign the n x m floating point numbers in d to a finite number of buckets.
Each bucket, in turn, is associated with a representative number which best describes the
collection of the numbers within the bucket. The basic concept underlying the proposed
indexing technique is to be able to index some (perhaps all) of the numbers within the
original motion matrix d and generate a corresponding lookup table for the indices.
Indexing the Motion Matrix d
Step 1: All the data in the matrix d are collected into a single 1D array A of size n x m. The array
A is sorted in ascending order. All the numbers in A are multiplied by the resolution
quantization term (RQT), M. The RQT depends on the number of significant digits used to
represent the floating point numbers. For example, if the required accuracy of the floating
point numbers is a maximum of four digits, RQT = 10,000. The numbers are rounded off to
integers in the range [Amin·M, Amax·M].
Step 2: The integers in the range [Amin·M, Amax·M] are divided into buckets numbered from
0 to 255. It is desirable to allocate each of the 256 buckets an equal share of the n x m
numbers in A. The rationale behind assigning the 256 buckets an equal share of the numbers
in the motion sequence is that BAP data clusters containing more data points are
assigned more buckets (hence, more indices). In essence, the indices are distributed among
the clusters in proportion to cluster size (note that the number of indices is fixed by
fixing the number of bits per index). This scheme is similar to adaptive vector quantization,
which is known to reduce the overall encoding error. Thus, each bucket should have freq =
(n x m)/256 numbers allocated to it. This is achieved by computing the histogram
of the integers in A and dividing the histogram into 256 vertical strips such that each strip
has the same area, freq. After all the numbers in A have been allocated to a bucket numbered
from 0 to 255, the numbers in A are divided by the RQT to recover the original values.
At the end of Step 2, we get a set of 256 buckets, denoted bucket(j) for j = 0 to 255,
such that each floating point entry in the motion data matrix d is contained in exactly one
of the 256 buckets. An index matrix dindex is used to store the bucket number (index) for
the corresponding entry in the matrix d.
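Steps 1 and 2 can be sketched with empirical quantiles, which is equivalent to cutting the histogram into 256 strips of equal area; the RQT value and matrix sizes below are illustrative.

```python
import numpy as np

def index_motion_matrix(d, n_buckets=256, rqt=10_000):
    """Quantize d by the RQT and assign every entry to one of
    n_buckets equal-frequency buckets derived from the quantiles
    of the sorted, scaled data (Steps 1-2 of BAP-Indexing)."""
    A = np.sort(np.rint(d.ravel() * rqt))     # Step 1: scale, round, sort
    # Interior quantiles act as bucket boundaries (255 edges -> 256 buckets)
    edges = np.quantile(A, np.linspace(0, 1, n_buckets + 1)[1:-1])
    d_index = np.searchsorted(edges, np.rint(d * rqt), side="right")
    return d_index.astype(np.uint8)           # one byte per entry

d = np.random.RandomState(0).randn(330, 62)   # a synthetic motion matrix
d_index = index_motion_matrix(d)
assert d_index.shape == d.shape and d_index.max() <= 255
```

Storing one byte per entry in place of a float is where the basic compression gain comes from; the lookup table described next recovers approximate values.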
Lookup Table for the Index Matrix dindex
The lookup table is used to map each index to a corresponding representative
number such that a suitable approximation to the original motion matrix d can be recovered.
The creation of an appropriate lookup table for the recovery of the original motion data
matrix d from the index matrix dindex is critical, since recovery of the original data after
discretization invariably results in motion distortion. A straightforward method to recover the
number associated with a bucket is to compute the simple average of all the floating point
numbers assigned to that bucket. However, this invariably leads to a poor approximation of the
original motion matrix d. We have observed that intelligent exploitation of the hierarchical
structure of the skeletal virtual human model can lead to a better lookup
table Tlookup, which in turn reduces the error in the motion reconstructed using the
lookup table. The steps for creating the lookup table are detailed as follows:
Step 1: The virtual human is represented by a hierarchical skeletal model. For each m-
dimensional pose vector, each dimension, or column in the motion matrix d, is assigned a
level li (Fig. 2). The level li signifies the importance of the degree of freedom associated with
a particular joint in the overall displacement of the model joints. A joint i, at level li = 1,
when given a small angular displacement, affects the model more in terms of the overall
displacement, than a joint j at level lj = 2, 3, 4, 5, or 6.
Step 2: After assigning level values to the various joints of the virtual human model, these
joint level values are used to compute a weighted sum of the numbers belonging to a
bucket. The jth lookup value in lookup table Tlookup is given by:
where η is a constant. Empirical observations have revealed that as η increases, the
Tlookup values yield a better approximation to the data, resulting in reduced
displacement error. This is because the numbers associated with level = 1 affect
the displacements of the body the most; hence, emphasizing the numbers within a bucket
with level = 1 leads to a better approximation of the motion data. As η → ∞, all the weighting
terms in (1) tend to zero, except for the terms with level = 1. Hence, when computing the
weighted sum of the numbers in a bucket, we consider only those numbers with level = 1
(selective averaging) and compute a simple mean of these numbers. If none of the entries
in a bucket has level = 1, we use the next smallest level to compute the weighted sum.
Our empirical observations have shown that the BAP data values from all levels of the virtual
human model form compact and well separated clusters. The data values with level = 1 in
each bucket are fairly close to each other. This allows selective averaging (1) to be
performed without introducing too much visual distortion.
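The selective-averaging rule (the η → ∞ limit of the weighted sum) can be sketched as follows; the data here are synthetic and the function signature is our own.

```python
import numpy as np

def build_lookup(values, levels, bucket_ids, n_buckets=256):
    """For each bucket, average only the entries whose joint level is
    the smallest present (level 1 when available), mimicking the
    selective-averaging limit described in the text."""
    table = np.zeros(n_buckets)
    for j in range(n_buckets):
        in_bucket = bucket_ids == j
        if not in_bucket.any():
            continue                        # empty bucket stays at 0
        lmin = levels[in_bucket].min()      # prefer level-1 entries
        sel = in_bucket & (levels == lmin)
        table[j] = values[sel].mean()
    return table

rng = np.random.RandomState(2)
values = rng.randn(1000)                    # entries of the motion matrix d
levels = rng.randint(1, 7, size=1000)       # joint levels 1..6 per entry
bucket_ids = rng.randint(0, 256, size=1000) # bucket index per entry
T = build_lookup(values, levels, bucket_ids)
assert T.shape == (256,)
```

Decompression then reduces to the table lookup `T[d_index]`, which is what makes the scheme cheap on power-constrained clients.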
Fig. 2. An example of the hierarchical structure of a virtual human skeletal model consisting of 31 nodes, with a total
of 62 degrees of freedom of motion (rotational and translational). For convenience, the root node is drawn at the
bottom.
The above-mentioned paper further describes Motion Matrix Decomposition for motion
sequences of long duration.
VI. Other Techniques for Motion Pictures
Besides MPEG-4, there exist other ad hoc quantization methods for efficient use and
distribution of MoCap data over a network. Endo et al. [8.5] propose quantization of the
motion type, rather than the motion data itself. Hijiri et al. [8.6] describe a new data packet
format that allows flexible scaling of the transmission rate, together with a data compression
method, termed SHCM, which maximizes the efficacy of this format by exploiting the 3D
scene structure. BAP-Indexing [8.4] uses quantization to achieve data
compression in a manner somewhat similar to the above work, but incorporates intelligent
exploitation of the hierarchical structure of the human skeletal model. Giacomo et al. [8.7]
present methods for adapting a virtual human’s representation and the resulting animation
stream, and provide practical details for the integration of these methods into MPEG-4 and
MPEG-21 architectures. Aubel et al. [8.8] present a technique for using impostors to improve
the display rate of animated characters by acting solely on the geometric and rendering
information. Recently, Arikan [8.9] has presented a comprehensive MoCap database
compression scheme which is shown to result in a significantly compressed MoCap
database. The above techniques, although very efficient in terms of compression ratio, do
not address the need for customized compression of BAP data for power-aware devices. To
this end, the BAP-Indexing technique is a refined and special case of standard
clustering, quantization, and lookup (CQL) based compression schemes. BAP-Indexing not
only allows for low-bitrate encoding of motion data, but is also suitable for data reception
and data reconstruction on power-constrained devices.
Clues for future work on motion picture:
A drawback associated with most animation research is that there is no perfect quantitative
measure of the quality of the reconstructed motion. The compression error (the
displacement of a body segment from its original location) is easily perceptible when the
body segment touches an object in the environment, whereas a relatively large error is acceptable
when the body segment is moving in empty space. This observation can be exploited to
enhance the compression ratio, provided that detailed models of the environment and of the
virtual human's interaction with the environment are available. Finally, the intelligent
use of the hierarchical structure of the model yields good results for full-body motions of the
virtual human; for small, delicate motions such as finger movement, or for facial
animation, the proposed technique offers considerable scope for future improvement.
VII. Modeling and refinement of Audio data
The management of large collections of music data in a multimedia database has received
much attention in the past few years. The inherent characteristics of audio data impose
demands for huge storage space, large bandwidth with real-time transmission
requirements, content-based queries, similarity-based search and
retrieval, and synchronization of retrieval results. Of interest to the user are easy-to-use
queries with fast and correct retrievals from the audio/multimedia database. To this end, (1)
derivation of good features from the data to be used as indices during search, (2)
organization of these indices in a suitable multi-dimensional data structure with efficient
search, and (3) a good measure of similarity (distance measure) are important factors. An
audio database supporting content-based retrievals should have the indices structured with
respect to the audio features, which are extracted from the data.
In research on content-based music retrieval, many approaches extract features,
such as key melodies, rhythms, and chords, from the music objects and develop indices that
help to retrieve the relevant music efficiently [9.5][9.8][9.12]. Several reports have also
pointed out that these features of music can be transformed and represented as
music feature strings [9.1][9.2][9.4][9.6][9.7] or numeric values [9.10][9.11], so that the
indices can be created for music retrieval. These features can also be combined to support
various types of queries.
Existing Multi-feature Indexing for Music Data
In research on indexing for music database retrieval, most existing work has
concentrated on constructing single-feature index structures for query searching: for
instance, in 1999, Key Melody Extraction and N-note Indexing by Tseng, Melodic
Matching Techniques by Uitdenbogerd et al., and an Approximate String Matching Algorithm by
21
8/3/2019 Dimensions in Data Processing 2402
http://slidepdf.com/reader/full/dimensions-in-data-processing-2402 22/76
Liu et al. [9.9]; in 2000, Query by Music Segments by Chen et al. [9.2]; and in 2002,
Numeric Indexing by Lo et al. [9.10]. Only a couple of studies have emphasized
how to create a multi-feature index for music data retrieval. The most recent works are
Multi-Feature Index Structures [9.6] and Multi-Feature Numeric Indexing [9.11]. We briefly
discuss these two approaches in the following subsections.
i. Grid-Twin Suffix Trees
Four multi-feature index structures for music data retrieval were proposed by Lee and
Chen [9.6]: Combined Suffix Trees, Independent Suffix Trees, Twin Suffix Trees, and
Grid-Twin Suffix Trees. The authors claimed that the Grid-Twin Suffix Trees structure
provides the most scalability among them. The Grid-Twin Suffix Trees structure is an
improved version of the Twin Suffix Trees. An example of Twin Suffix Trees is shown in
Figure 1. There can be two music features in the Twin Suffix Trees; each feature has
its own independent suffix tree, and there are links pointing from each node in one
independent suffix tree to the corresponding feature nodes in the other.
Figure 1. Construction of the Twin Suffix Trees.
Figure 2. An example of the Grid-Twin Suffix Trees.
To construct the Grid-Twin Suffix Trees, they first use a hash function to map each suffix of a
feature string into a specific bucket of a 2-dimensional grid. The hash function uses the first
n symbols of the suffix to map it into a specific bucket. After hashing all suffixes, the
remaining symbols of the feature string following the first n are used to construct the Twin
Suffix Trees attached under the buckets. Figure 2 shows an overview of the
structure of the Grid-Twin Suffix Trees. Considering melody and rhythm only, the hash function
is as follows,
where x and y are the row and column coordinates, respectively, and P(x, y) denotes the
position of the bucket. Numm and Numr are the alphabet sizes of the melody and rhythm
features, and Mi and Ri are the values of the ith symbols of the melody and rhythm,
respectively. The length of the hashed suffix prefix is denoted by n.
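The exact hash appears as an equation image in [9.6] and is not reproduced here; the sketch below is one plausible reading, treating the first n symbols of each feature as digits of a positional number system. The alphabet sizes (num_m, num_r) are made-up illustrative values.

```python
def grid_bucket(melody, rhythm, n=2, num_m=12, num_r=8):
    """Map a (melody, rhythm) suffix pair to a 2D grid bucket using
    only the first n symbols of each feature string."""
    x = sum(melody[i] * num_m ** (n - 1 - i) for i in range(n))  # row
    y = sum(rhythm[i] * num_r ** (n - 1 - i) for i in range(n))  # column
    return (x, y)   # P(x, y): position of the bucket in the grid

# Suffixes sharing the same first n symbols land in the same bucket,
# regardless of what follows:
assert grid_bucket([3, 5, 7], [1, 2, 4]) == grid_bucket([3, 5, 0], [1, 2, 9])
```

This also makes the memory cost visible: the grid has num_m^n x num_r^n buckets, which grows quickly with n, the drawback noted later in the text.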
ii. Multi-Feature Numeric Indexing
The Multi-Feature Numeric Index for music data retrieval was proposed by Lo and Chen
[9.11]. To translate music data into numeric values, they assume that the music symbols
'a', 'b', 'c', …, 'm' map to the integer values 0, 1, 2, …, m-1, respectively. If we pick a
music segment of n sequential notes from a melody feature string, denoted x1x2…xn, the
integer value of each note can be represented by P(xi), 1 ≤ i ≤ n. This segment of
n sequential notes can then be transformed into a numeric value by the conversion function
v(n), as shown below.
Each music feature segment can be converted into a numeric value by equation (2), and
the values for a set of feature segments can be regarded as a coordinate in a multi-
dimensional space. The coordinate can then be inserted into a multi-dimensional index
tree, such as an R-tree [9.3], for music retrieval. The scheme can therefore also be extended to
convert three or more features into a high-dimensional index tree.
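Equation (2) is likewise shown as an image in the source; a hedged reading is a base-m positional encoding of the n-note segment, which maps each distinct segment to a unique integer.

```python
def segment_value(segment, m=13):
    """Convert an n-note segment (a string over 'a'..'m') into a single
    numeric value, assuming P('a') = 0, P('b') = 1, ... as in the text."""
    return sum((ord(c) - ord('a')) * m ** (len(segment) - 1 - i)
               for i, c in enumerate(segment))

assert segment_value("abc") == 0 * 13**2 + 1 * 13 + 2   # = 15
assert segment_value("abc") != segment_value("acb")      # order-sensitive
```

Because the encoding is tied to the segment length n, a query of a different length does not map into the same coordinate space, which is exactly the inflexibility criticized below.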
The authors claimed that the Grid-Twin Suffix Trees provide more scalability than the
other three index structures in [9.6]. However, if there are more features, or if more
symbols of the suffixes (n > 2) are used to map into the buckets, a massive amount of memory
is needed for the Grid-Twin Suffix Trees to construct the buckets of the grid structure,
and the grid may become a sparse matrix. In addition, since the
numeric index is created by transforming music segments of a fixed length (n in equation (2))
into numeric values, the main drawback of the Multi-Feature Numeric Index is that the length of a
query (Query By Example, QBE) is inflexible: it should equal the length of the music
segments over which the index was created. Otherwise, the search time for the query increases
severalfold.
iii. Hybrid Multi-Feature Indexing
In a work [9], a hybrid multi-feature indexing scheme has been proposed. It combines the advantages of the
Multi-Feature Numeric Index and the Grid-Twin Suffix Trees to construct a new index structure
that needs less memory space than the Grid-Twin
Suffix Trees and, unlike the Multi-Feature Numeric Index, has no query length
restriction. To construct the Hybrid Multi-Feature Index, a multi-feature tree structure is used
instead of the grid structure of the Grid-Twin Suffix Trees. The Twin Suffix Trees originally placed under each
bucket are now linked under the corresponding leaf nodes of the multi-feature tree in the Hybrid Multi-
Feature Index. The work organizes the creation of the indexing approach in the following
three steps:
Step 1: Suppose that there are d features in the music data and that, in each music feature string,
the first n symbols of the suffix are transformed into a coordinate. Equation (3)
for the d-feature coordinate P(x1, …, xd) is designed as follows,
where F1(i), …, Fd(i) and N1, …, Nd represent, respectively, the symbol values and alphabet
sizes for the d music features. We note that any suffix of a music segment, such as
"a1" or "a1b2", has exactly one corresponding coordinate.
Step 2: The coordinate derived in Step 1 is then inserted into a d-feature (d-dimensional)
tree. The degree of each non-leaf node in this d-feature tree is 2^d. Each non-leaf node also has a center
point. The coordinate (x1c, x2c, …, xdc) of the center point is
computed by averaging the coordinates inserted under the current node and its descendant
nodes. Thus, if there are 2 features and the center point is (x1c, x2c), the node is
partitioned into four domains: (≥ x1c, ≥ x2c), (< x1c, ≥ x2c), (≥ x1c, < x2c), and (< x1c, <
x2c). To keep the index tree balanced, as in an R-tree, each non-leaf node in this d-feature tree
contains at least 2^(d-1) non-null links (half full). Therefore, inserting a new coordinate into a
node may cause the center point to be recomputed, or may cause the index tree to be
reorganized.
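The Step 2 partitioning rule can be sketched as a child-selection function; the bit-per-dimension encoding of the 2^d domains is our illustrative choice, not necessarily the layout used in [9].

```python
def child_index(p, center):
    """Route a d-dimensional coordinate p to one of the 2^d children
    of a non-leaf node: bit k of the index records whether p[k] >= center[k]."""
    idx = 0
    for k, (pk, ck) in enumerate(zip(p, center)):
        if pk >= ck:
            idx |= 1 << k
    return idx   # in [0, 2^d - 1]

# d = 2, center (5, 5): the four domains listed in the text
assert child_index((7, 9), (5, 5)) == 0b11   # (>= x1c, >= x2c)
assert child_index((2, 9), (5, 5)) == 0b10   # (< x1c,  >= x2c)
```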
Step 3: The remaining symbols behind the first n symbols of suffix are then used to
construct the Twin Suffix Trees linked under d-feature tree. Figure 3 and Figure 4 represent
the structures of Hybrid 2-Feature Index and Hybrid 3-Feature Index, respectively.
Figure 3. The structure of the Hybrid 2-Feature Index.
Figure 4. The structure of the Hybrid 3-Feature Index.
Use of Stochastic (Statistical) Analysis in Image Processing
I. Image Compression
A proprietary method of image compression has been developed [10] which intelligently
stores a coarse version of the imagery and recovers it by means of
Stochastic Matrix Method (SMM) function recovery. Figure 5 illustrates this.
Figure 5. Closeup look at the face of a bird. Top: input image; Bottom: interpolated image. The input image is what is
stored and the interpolated image is what is viewed by a user.
The input image is very coarse and certain features are hard to discern, but the interpolated
image recovers much of this content and makes it intelligible to the human eye.
Figure 6 illustrates the advantages over the ubiquitous JPEG DCT coder. On close
inspection, the JPEG DCT coder reveals its 8-pixel by 8-pixel blocks in its characteristic
artifact, which becomes a nuisance at significant compression levels. It is clear from the
figure that function recovery by means of SMMs does not suffer from this problem.
Figure 6. This is a closeup look at the back of the head of a bird in order to illustrate the blockiness of JPEG
compression and the lack of it with the compression scheme by means of SMM function recovery. Left: JPEG DCT
image, the 8 x 8 blocks are apparent. Centre: image interpolated from Right by means of SMM function recovery.
Right: image that is actually stored and which is the input image for the function recovery.
II. Moving Object Detection [11]
A common approach to detecting foreground objects is to collect pixels in the current frame
that deviate significantly from the model estimations. Such methods can generally be
classified as predictive or non-predictive. Predictive methods develop dynamical time-series models to predict the current input based on past observations. The Kalman
filter was first introduced by Koller et al. [11.1] for modeling the dynamic states of
background pixels. The optical flow based method is a natural approach to modeling persistent
motion behavior. Wixson [11.2] presented a method to detect salient motion by
accumulating directionally consistent flow. Tian [11.3] combined temporal difference
imaging and a temporally filtered motion field to detect salient motion in complex
environments. Recent methods are based on more complicated models; in [11.4], an
autoregressive model was proposed to capture the properties of dynamic scenes. Non-
predictive, density-based methods neglect the order of observations and build a
probabilistic representation (PDF) of the observations at a particular pixel. Wren [11.5] used
a single Gaussian intensity distribution for each pixel. The idea was subsequently extended to
the mixture of Gaussians model (MGM) proposed by Stauffer and Grimson [11.6] to address
the multi-modality of the background. When density functions are so complex that they cannot be
modeled parametrically, the non-parametric approaches proposed by Elgammal [11.7] are
considered more suitable for handling arbitrary densities; there, kernel density functions [11.8]
are used for pixel-wise background modeling. However, this is computationally costly and makes no
explicit use of the spatial correlation of the pixel features.
In a work by Tang, Gao, and Liu [11], a real-time moving object detection
algorithm is proposed that recursively clusters salient motion points into a spatial and kinetic mixture of Gaussians
model. In each frame, temporal difference filtering first generates a set of
feature points; then validation and salience evaluations are performed for every feature
point, followed by resampling operations, so that only those samples that
strongly support a cluster of salient moving objects in the feature space are preserved. The clusters are
instantiated and updated using an online approximation algorithm, and are terminated
when their component weights drop below a threshold.
Brief overview of the model:
Model Specification
A four-dimensional feature vector describes the state of each sample, i.e., zi =
(x, y, x', y')i, i ∈ [1, N], zi ∈ ℜ4, N ∈ ℵ, where (x, y)i represents the point's
coordinates, (x', y')i denotes its motion velocity, and N is the number of samples. For
simplicity, let si = (x, y)i and vi = (x', y')i describe the spatial and motion information
respectively. Assuming we have the initial mixture distribution, feature points can be
associated with one of the K clusters (Fig. 3). The likelihood of a feature point belonging to
the foreground can be written as:
(1)
where qk is the prior of the kth Gaussian component in the mixture model, and η(zi; μk, Σk)
is the kth Gaussian component, defined as:
(2)
where d = 4 is the dimension of the MGM models.
We further assume that the spatial and kinetic components of the MGM model are
decoupled, i.e., the covariance matrix of each Gaussian component takes the block diagonal
form:
(3)
where s and v stand for the spatial and kinetic features respectively. With such
decomposition, each Gaussian component has the following factorized form:
(4)
Based on the above representation of moving objects, clustering analysis
was then performed, employing a Gaussian-distribution-based K-means technique over the
sample data.
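Equations (3) and (4), shown as images in the source, state that with a block-diagonal covariance the 4-D Gaussian density factorizes into a spatial term over s = (x, y) and a kinetic term over v = (x', y'). The sketch below verifies this numerically; the specific covariance values are illustrative.

```python
import numpy as np

def gauss(u, m, C):
    """Multivariate Gaussian density at u with mean m, covariance C."""
    k = len(m)
    diff = u - m
    norm = 1.0 / np.sqrt((2 * np.pi) ** k * np.linalg.det(C))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(C) @ diff)

def factored_density(z, mu, cov_s, cov_v):
    """Factorized form: spatial term over (x, y) times kinetic term
    over (x', y'), valid when the covariance is block-diagonal."""
    return gauss(z[:2], mu[:2], cov_s) * gauss(z[2:], mu[2:], cov_v)

z = np.array([1.0, 2.0, 0.5, -0.5])       # one feature sample (x, y, x', y')
mu = np.zeros(4)
cov_s, cov_v = 4.0 * np.eye(2), 0.25 * np.eye(2)
full_cov = np.block([[cov_s, np.zeros((2, 2))],
                     [np.zeros((2, 2)), cov_v]])
# The product of the two 2-D densities equals the full 4-D density
assert np.isclose(factored_density(z, mu, cov_s, cov_v),
                  gauss(z, mu, full_cov))
```

The decoupling is what allows spatial and velocity clusters to be maintained and updated independently within each mixture component.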
To address the selection and estimation of the features, several steps are performed: a motion map is first obtained by temporal difference of Gaussian from each frame, from which a number of feature points are extracted using Monte Carlo importance sampling, and their associated velocities in the sequence are calculated using the LK optical flow algorithm. Feature points are thus extracted in the position-velocity space. Temporal difference imaging helps to detect slow-moving objects, gives better object boundaries, and speeds up the algorithm, because the temporal filter of optical flow is applied only to the regions of change detected by temporal difference imaging. Within such a region, a pixel's motion is considered salient if the pixel and its neighborhood move in the same direction over a period of time.
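The motion-map and sampling steps can be sketched as follows. This is a simplified NumPy sketch: a plain absolute frame difference stands in for the temporal difference of Gaussian, and uniform sampling over the changed region stands in for Monte Carlo importance sampling over the motion map.

```python
import numpy as np

def motion_map(prev_frame, frame, threshold=15):
    """Temporal difference imaging: mark pixels whose intensity changed
    by more than `threshold` between consecutive frames. Feature
    extraction and optical flow are then restricted to this region."""
    diff = np.abs(frame.astype(np.int32) - prev_frame.astype(np.int32))
    return diff > threshold

def sample_feature_points(mask, n_points, rng=None):
    """Draw feature points from the changed region (a simplification of
    Monte Carlo importance sampling over the motion map)."""
    rng = rng or np.random.default_rng(0)
    ys, xs = np.nonzero(mask)
    idx = rng.choice(len(xs), size=min(n_points, len(xs)), replace=False)
    return np.stack([xs[idx], ys[idx]], axis=1)   # (x, y) coordinates
```

The sampled points would then be paired with LK optical-flow velocities to form the (x, y, x′, y′) vectors of the model above.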
Figure 1. Foreground (light gray) and background (dark gray) pixel colour distributions.
Figure 2. Example of clustering motion vectors extracted from two reversing cars by using SKMGM. (a) represents the sample points and (b) is an instance of the SKMGM distribution.
Figure 3. Example of the MGM distributions in position and velocity space respectively. (a) specifies the spatial distribution, and (b) depicts the velocity-space distribution.
Even though many background models have been proposed in the literature, the problem of moving-object detection in complex environments is still far from completely solved. The above-mentioned techniques are important for object detection and tracking in video surveillance and similar applications.
Indexing: An explicit discussion
Since the relative proportion of multimedia (video, image and audio) data within databases is expected to increase substantially in the future, keyword-based indexing would be inadequate, and efficient content-based query and retrieval are required. The problem of devising content-based query, indexing, and retrieval for these newer data types remains open and challenging. Apart from the techniques discussed above, in particular multifeature music representation and retrieval and the graph-based model for video data, we find the following approaches available in the literature and in practice for other data types such as audio, and in general.
I. Content Based Indexing & Retrieval
Content-based retrieval of multimedia database calls for content-based indexing techniques.
Different from conventional databases, where data items are represented by a set of
attributes of elementary data types, multimedia objects in multimedia databases are
represented by a collection of features; similarity of object contents depends on context and
frame of reference; and features of objects are characterized by multimodal feature
measures. These lead to great challenges for content-based indexing. On the other hand,
there are special requirements on content-based indexing: To support visual browsing,
similarity retrieval, and fuzzy retrieval, nodes of the index should represent certain meaningful categories.
Indexes are crucial for those large databases to speed up the retrieval. On the other hand,
visual, fuzzy and similarity queries in those large content-based databases cannot be
implemented using conventional indexing techniques such as B-trees and inverted files,
which have proved very effective in traditional databases for indexing attributes and text.
This is because the feature measures of object contents are complex and are usually
multidimensional and multimodal. Conventional indexing techniques are based on individual
keys, which are definite and not visual. For the purpose of handling complex feature
measures, research has been carried out to extend the concept of indexing using abstraction and classification [12.8], [12.9], [12.10], [12.20]. To handle multimodal feature measures and to gain self-organization and learning capabilities in indexing, Jian-Kang Wu [12] developed a Content-based Indexing (ContIndex) method for indexing multimedia objects.
Content-Based Retrieval
For completeness of the discussion, let us start from the multimedia object definition in [5]
as follows:
A Multimedia Object (MOB) can be defined as a six-tuple O_mob = ⟨U, F, M, A, O_p, S⟩, where:
• U is the multimedia data component.
• F = {F_1, F_2, ...} represents a set of features derived from the data. A feature F_i can be either numerically characterized by feature measures in the feature space F_i^1 × F_i^2 × F_i^3 × … × F_i^{n_i}, or conceptually described by a set of concepts.
• M^j = {M_1^j, M_2^j, ...} represents the interpretations of the features F_i, i = 1, 2, ...
• A stands for a set of attributes or particulars of O_mob.
• O_p is a set of pointers or links, expressed as O_p = {O_p^sup, O_p^sub, O_p^other}, the three types of pointers/links pointing/linking to superobjects, subobjects, and other objects, respectively.
• S represents the set of states of O_mob.
The content of a multimedia object is the content of its data set U, which is restricted to a certain set of features F_i, i = 1, 2, ... of the object, characterized by feature measure sets F_i^k, k = 1, 2, ..., and further described by concept sets M^j, j = 1, 2, ... In many cases, feature measures are vectors, written as F_i^j = (x_1, x_2, …, x_n)^T.
For example, a facial image can be represented by focusing attention on visual features such as chin, hair, eyes, eyebrows, nose, and mouth. To characterize eyes, we need to extract measures such as area and fitting parameters of a deformed template. These feature measures are vectors and can be considered as points in feature spaces. Eyes can also be described by a set of concepts such as big eyes, medium eyes, or small eyes. The concepts “big,” “medium,” and “small” are interpretations of the facial feature “eyes.”
Fig. 1 shows a representation hierarchy for images in content-based image databases. In the image archival phase, a bottom-up process is performed to derive from the original image data the feature measures of regions of interest, and interpretations if necessary. This bottom-up process consists of three steps, namely segmentation, feature extraction, and concept mapping. It performs information abstraction and provides keys for easy access to large image data. In the retrieval phase, the image data are accessed through their feature measures (similarity query) or interpretations (descriptive query), which are considered keys from the database point of view. Content-based retrieval usually does not access the data through the attributes A, or directly through the data component U; instead, it operates on feature measures.
Fig. 1. Image representation hierarchy. To archive images into content-based image database, images are first segmented to identify
regions of interest. Feature measures are then extracted from the image data within these regions. Interpretations can be finally
generated by mapping of the feature measures into a set of concepts.
Content-based retrieval is to find the best matches from a large database for a given query object. The best match is defined in terms of a similarity measure. Since the contents of objects are represented by features, similarity is defined with respect to these features:

Sim(O_q, O) = Σ_i w_i · sim(F_i^q, F_i)    (1)

where w_i denotes the weight for the i-th feature, and sim(F_i^q, F_i) denotes the similarity between the query object and an object in the database with respect to the i-th feature. Here we simply express the similarity between objects as a linear combination of the measures of their common and distinctive features [12.15].
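The weighted combination of Eq. (1) can be sketched as follows. This is an illustrative Python sketch; the inverse-distance per-feature similarity is an assumption, and any per-feature similarity in [0, 1] could be substituted.

```python
import numpy as np

def feature_sim(fq, f):
    """Similarity between two feature vectors, here 1/(1 + Euclidean
    distance); an assumed stand-in for sim(F_i^q, F_i)."""
    return 1.0 / (1.0 + np.linalg.norm(np.asarray(fq) - np.asarray(f)))

def object_similarity(query_feats, obj_feats, weights):
    """Eq. (1): weighted linear combination of per-feature similarities."""
    return sum(w * feature_sim(fq, f)
               for w, fq, f in zip(weights, query_feats, obj_feats))

def best_matches(query_feats, database, weights, top=3):
    """Rank database objects (each a list of feature vectors) by Eq. (1)."""
    scored = [(object_similarity(query_feats, feats, weights), oid)
              for oid, feats in database.items()]
    return [oid for _, oid in sorted(scored, reverse=True)[:top]]
```

The weights w_i let an application emphasize, say, eye shape over hair colour without changing the retrieval machinery.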
Content-based indexing aims to create indexes that facilitate fast content-based retrieval of multimedia objects in large databases. The index in traditional databases is quite simple: it operates on attributes of primitive data types such as integer, float, and string. For example, to build a binary index tree on the age of people in a database, the first two branches can be created for “age >= 35” and “age < 35.” Here the operation is simple and the meaning is definite and obvious. The situation becomes far more complex in content-based indexing, which operates on complex feature measures.
The challenges for content-based indexing are:
• The index must be created using all features of an object class, so that visual browsing of the object class is facilitated and similarity retrieval using the similarity measure in (1) can be easily implemented.
• The context and frame of reference in similarity evaluation require that nodes in the index tree show consistency with respect to the context and frame of reference. For example, if, at one level of an index tree, the similarity is evaluated with respect to eye size, the nodes at this level will represent object categories with various eye sizes. This implies that the index tree has properties similar to a classification tree.
• Multiple multimodal feature measures should be fused properly to generate the index tree so that a valid categorization is possible. Two issues must be addressed here: first, one measure alone is usually not adequate because of the complexity of objects; second, to ensure the specified context and frame of reference, care must be taken in the feature selection process.
The Content-Based Indexing method developed in [12] seeks to solve the above difficulties. It shares features with a classification tree. Horizontal links among nodes at the same level enhance the flexibility of the index. A special neural-network model, called Learning based on Experiences and Perspectives (LEP), has been developed to create node categories by fusing multimodal feature measures. It brings to the index the capability of self-organizing
nodes with respect to certain context and frames of reference. Algorithms have been
developed to support multimedia object archival and retrieval using ContIndex.
Assume Σ is a set of multimedia objects and Ω = {ω_1, ω_2, ..., ω_m} represents a set of m classes into which Σ is to be classified. Assume also that Ω satisfies:
1) ω_i ≠ Σ for all i = 1, 2, …, m;
2) ∪_{1≤i≤m} ω_i = Σ;
3) ω_i ≠ ω_j for i ≠ j.
The indexing process consists of the recursive application of a mapping Γ = η(D, Ω), where D is a set of parameters defining the mapping; the classes in Ω represent the categories of the multimedia object set Σ and are associated with the nodes N_1, N_2, ..., N_m of the index tree. In the ContIndex tree, the number of classes m is kept the same for all intermediate nodes for manipulation efficiency; in this case, the index tree is an m-tree. The mapping Γ is defined by D and Ω. According to the definition, Ω is a set of classes representing the goal of the mapping, and D is related to the set of feature measures used for the mapping. When the mapping is defined, D is represented by a set of reference feature vectors. For simplicity, only one feature is used to create each level of the index tree.
Fig. 2 shows the first three levels of a ContIndex tree. The features selected for the creation of these three levels are F_{l0} = F_i, F_{l1} = F_j, and F_{l2} = F_k. Nodes are labeled with a number of digits equal to their level number (the root is at level 0). For example, N21 is a node in the second level and is the first child of node N2; N21, N22, ... are the children of node N2. They are similar with respect to feature F_{l0} = F_i, inherit the reference feature vectors of feature F_i, and represent categories (ω21, ω22, ...) with respect to feature F_{l0} = F_i. New reference feature vectors will be created for them upon the creation of these nodes.
Fig. 2. The structure of the content-based index ContIndex. As indicated in the figure, the features selected for the creation of the three levels of the index tree are F_{l0} = F_i, F_{l1} = F_j, and F_{l2} = F_k. Nodes are labeled with a number of digits equal to their level number (the root is at level 0). For example, N21 is a node in the second level; it is the first child of node N2.
A top-down algorithm for the creation of m-tree ContIndex is summarized as follows:
1) Attach all objects to root and start the indexing process from the root and down to leaf
node level.
2) For each node at a level: Select a feature, partition the multimedia objects into m classes
by using a set of feature measures, create a node for each class, and generate a reference
feature vector(s) of the selected feature and an iconic image for each node.
3) Repeat the second step until each node has, at most, m descendants.
4) Starting from the second level, build horizontal links with respect to features that have already been used at the levels above.
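Steps 1-3 above can be sketched as follows. This is illustrative Python, not the published algorithm: a crude sort-and-split partition stands in for the LEP network, and the per-level feature tables are hypothetical.

```python
import numpy as np

class Node:
    def __init__(self, objects, level):
        self.objects = objects        # object ids at or below this node
        self.level = level
        self.children = []
        self.reference = None         # reference feature vector of the class

def build_contindex(objects, features, m=2, level=0):
    """Top-down creation of an m-tree index (steps 1-3).
    `objects` is a list of ids; `features[level]` maps an id to the
    feature vector used at this level (one feature per level, as in
    ContIndex). Partitioning is a plain sort-and-split stand-in for LEP."""
    root = Node(objects, level)
    if len(objects) <= m or level >= len(features):
        return root
    feat = features[level]
    vecs = np.array([feat[o] for o in objects])
    order = np.argsort(vecs[:, 0])            # crude 1-D partition
    for group in np.array_split(order, m):
        if len(group) == 0:
            continue
        members = [objects[i] for i in group]
        child = build_contindex(members, features, m, level + 1)
        child.reference = vecs[group].mean(axis=0)  # reference feature vector
        root.children.append(child)
    return root
```

Step 4 (horizontal links) would then connect nodes at the same level that share a category with respect to an earlier feature.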
Horizontal zooming is facilitated by horizontal links between nodes in the same level. Consider the nodes in the second level. Nodes at this level under the same parent, N_p1, N_p2, ..., N_pm, represent categories with respect to feature F_{l1} and fall under the same category with respect to feature F_{l0}. Now suppose the user finds N_pq preferable with respect to feature F_{l1} and wants to look at the categories of feature F_{l0}, which are represented by the nodes N_1q, N_2q, ..., N_mq. To achieve that, we can simply create horizontal links among these nodes.
Fig. 3. ContIndex indexing tree and its horizontal links.
Multimedia objects in the database represent event/object cases. ContIndex performs
abstraction/generalization of these event/object cases and produces a content-based index. Intermediate nodes in the index tree represent categories of cases. They are generalizations
of cases, and cases are instances of these categories. If, for example, under a category there are similar patients a doctor has treated, the category represents the experience of this doctor regarding that type of patient. In general, an intermediate node represents a
certain concept, which is an abstraction of cases under it. To capture the validity of the
concept, for each intermediate node, a record of confidence is maintained. The confidence
record of a concept is high if the number of cases supporting it is large.
Content-Based Retrieval Using ContIndex
The retrieval process is a top-down classification process starting from the root of the
tree. At each node, the process chooses from the child nodes one or more nodes which are
the nearest to the query object with respect to the feature used for creation of this node in
the index creation process. The number of child nodes chosen depends on the weight of the feature: a higher weight implies that the feature is more critical, so fewer child nodes should be chosen.
Spatial Self-Organization
For visual browsing of databases, spatial organization of nodes is preferable. For example, to view types of eyes by eye size, we prefer that all icon images be displayed on the screen in order of size, from largest to smallest. For this purpose, the Self-Organizing Map (SOM) of Kohonen [12.13] is an effective neural-network paradigm for ContIndex creation.
II. Transform based Indexing of Audio Data [13]
For representation and indexing of audio data, various methods are available, including methods that use pitch characterization [13.10] or several acoustical characteristics [13.9]. In work by Subramanya et al. [13], a transform-based indexing method has been developed that accrues the many useful properties of working in the frequency domain familiar from data compression and signal processing applications, such as low sensitivity to additive or multiplicative scaling, low sensitivity to (high-frequency or white) noise, and low space utilization.
Basics of transforms
Transforms are applied to signals (time-domain signals such as audio, or spatial signals such as images) to transform the data to the frequency domain. This offers several advantages, such as easy noise removal and compression, and facilitates several kinds of processing. Specifically, given a vector X = (x_1, x_2, …, x_N) representing a discrete signal, the application of a transform yields a vector Y = (y_1, y_2, …, y_N) of transform coefficients, and the original signal X can be recovered from Y by applying the inverse transform. The basics of transform/inverse-transform pairs for the DFT and DCT are used here. In particular, the standard DFT pair is given by:

Y(k) = Σ_{n=0}^{N−1} x(n) e^{−j2πkn/N},  k = 0, 1, …, N−1
x(n) = (1/N) Σ_{k=0}^{N−1} Y(k) e^{j2πkn/N},  n = 0, 1, …, N−1
One of the features of a good transform is that, after the application of the transform, only a
fraction of the coefficients in the resulting vector Y can be used to reconstruct a good
approximation of the original signal.
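This property is easy to demonstrate with the DFT. The following NumPy sketch keeps only the k largest-magnitude coefficients (the threshold selection described below) and reconstructs the signal from them; for a pure tone, two coefficients suffice.

```python
import numpy as np

def truncated_fft(x, k):
    """Keep only the k largest-magnitude DFT coefficients (threshold
    selection) and zero out the rest."""
    Y = np.fft.fft(x)
    keep = np.argsort(np.abs(Y))[-k:]
    Yk = np.zeros_like(Y)
    Yk[keep] = Y[keep]
    return Yk

def reconstruct(Yk):
    """The inverse transform recovers an approximation of the signal."""
    return np.fft.ifft(Yk).real

x = np.sin(2 * np.pi * np.arange(64) / 8)       # a pure tone, period 8
approx = reconstruct(truncated_fft(x, 2))       # 2 coefficients suffice here
```

For real signals with more spectral content the reconstruction is approximate, which is exactly the trade-off the indexing scheme exploits.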
Outline of indexing scheme

Each audio file or stream is divided into small blocks of contiguous samples, and a transform
like discrete fourier transform or discrete cosine transform is applied to each block. This
yields a set of coefficients in the frequency domain. With a suitable transform, only a few
significant coefficients are adequate to reconstruct a good approximation of the original
signal. (This feature of the transform has also been the main basis for lossy data
compression.) Selecting an appropriate subset of the frequency coefficients and retaining a pointer to the original audio block creates an index entry. Thus, the index occupies less space
than the data and allows for faster searching. Next, a query is similarly divided into blocks to
each of which the transform is applied and a subset of transform coefficients is selected.
This forms the pattern. Then, the index is searched for an occurrence of this pattern. In this
case, two strings are considered matched if they are within a small enough “distance” of
each other when distance is measured according to the root-mean-square-difference of the
real-valued components of the strings.
Specifically, suppose A = a1 a2 … an represents the discrete samples of the original audio signal (essentially the contents of the audio file) and Q = q1 q2 … qm represents the samples of a given query. Both the original signal and the query are divided into blocks of size L. Without loss of generality, assume that the lengths of the data and the query are integral multiples of the block size (the other case can be suitably handled). Let the blocks of the original audio and the query be A1, A2, …, AN and Q1, Q2, …, QM. Generally M << N.

Consider a block of the original signal. Application of a transform (say FFT, DCT, or any similar transform) to Ai will yield a new sequence of values Yi = y1 y2 … yL, where Yi = T · Ai and T is the transform matrix, independent of the input signal. With a suitable transform, usually a few significant values of Yi (the first few values by position (zonal selection) or the largest few values by magnitude (threshold selection)) are enough to reconstruct a good approximation of the original data. Suppose k significant values of each block are retained to serve as the index for the original data; the k values retained for block Ai, denoted DBCi, form its index entry. With threshold selection, we also need to remember the locations (positions) of the coefficients, and these are saved in DBCLi. There are N such indices for A, one for each block of A. Together, these form the index set for the original data. Similarly, application of the same transform to a block Qi of the query will yield a sequence of values QBCi = z1 z2 … zL, where QBCi = T · Qi. The appropriate k values of QBCi are compared against the index sets to determine a match (exact or close).
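The index construction just described can be sketched as follows. This is illustrative Python with assumed parameters (block length L = 8, k = 3), using the real FFT and zonal selection; the paper leaves the transform and parameters open.

```python
import numpy as np

BLOCK = 8   # L: block length (assumed for illustration)
K = 3       # k: significant coefficients kept per block (zonal selection)

def block_index(signal):
    """Build the index: transform each block and keep the first K
    coefficients (zonal selection) plus a pointer to the block start."""
    n = len(signal) // BLOCK
    index = []
    for i in range(n):
        block = signal[i * BLOCK:(i + 1) * BLOCK]
        coeffs = np.fft.rfft(block)[:K]
        index.append((coeffs, i * BLOCK))   # (DBC_i, pointer into the audio)
    return index

def query_pattern(query):
    """The query is processed identically to form the search pattern QBC."""
    return [c for c, _ in block_index(query)]
```

With threshold selection instead of zonal selection, the coefficient positions (DBCL) would be stored alongside each entry.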
Blocking and segmentation
To derive the transform-based index, the audio data (signal) is divided into fixed-size units called blocks, a process referred to as blocking. A suitable transform is then applied to these individual blocks. The advantages of blocking are the following:
• When transforms are applied to the whole signal, the transform coefficients capture global averages but not the finer details.
• Blocks of appropriate sizes contain samples which are highly intercorrelated, so that when transforms are applied, there is more energy compaction and thus fewer transform coefficients adequately describe the data.
• The transforms on the individual blocks can be carried out in parallel.

In segmentation, on the other hand, the audio data is divided into variable-length units called segments. The data within a segment does not vary much; the positions in the audio data where very sharp changes occur define the segment boundaries.
Search algorithm and analysis
In the work presented in this paper, the audio data and the query are divided into fixed-size blocks. In the index
searches, the transform coefficients of the query are compared with corresponding coefficients of the data blocks
and the distance between them is determined. If the distance is below an experimentally determined threshold, it is
accepted as a match. In the following algorithms and their analysis, the following notations are used:
L: the length of a block (number of samples); N: number of blocks of the data; M: number of blocks of the query; k: the number of significant transform coefficients per block retained as index.
QBC: Query Block Coefficients (obtained by applying the transform to query blocks).
DBC: Data Block Coefficients (obtained by applying the transform to data blocks).
DBCL: Data Block Coefficient Locations. RBC: Reconstructed Data Block Coefficients. RBCL: Reconstructed Data Block Coefficient Locations.
Each block of QBC contains L elements, and each block of DBC and DBCL contains k elements (each block of RBC and RBCL also contains k elements).
Robust search algorithm
It assumes that the query block boundaries are aligned with those of the data block boundaries.
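Under that alignment assumption, the search can be sketched as follows. This is an illustrative Python sketch (the original algorithm listing is not reproduced here): query block coefficients are compared to data block coefficients by root-mean-square difference, and a run of M consecutive matches below the experimentally determined threshold is reported as a hit.

```python
import numpy as np

def rms_distance(a, b):
    """Root-mean-square difference between two coefficient vectors."""
    d = np.asarray(a) - np.asarray(b)
    return np.sqrt(np.mean(np.abs(d) ** 2))

def robust_search(dbc, qbc, threshold):
    """Assuming query blocks align with data blocks, report every starting
    block position where all M query blocks match within `threshold`."""
    n, m = len(dbc), len(qbc)
    hits = []
    for start in range(n - m + 1):
        if all(rms_distance(dbc[start + j], qbc[j]) <= threshold
               for j in range(m)):
            hits.append(start)
    return hits
```

Because the match criterion is a distance threshold rather than equality, the search tolerates additive noise and small scaling, which is the robustness the scheme is designed for.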
III. Indexing for Very Large Multidimensional data [14]
As the speed of processors continues to improve, researchers are performing large-scale scientific simulations to
study very complex phenomena at increasingly finer resolution scales. Such studies have resulted in the generation
of datasets that are characterized by their very large sizes ranging from hundreds of gigabytes to tens of terabytes,
thereby generating an imperative need for new interactive visualization capabilities.
A typical way of visualizing such a large multidimensional volumetric data set is to first reduce the dimension of the
data set using techniques such as slicing and then to render the result using one of the isosurface or volume
rendering techniques. Slicing is a very useful tool because it removes or reduces occlusion problems in visualizing
such a multidimensional volumetric data set and it enables fast visual exploration of such a large data set. In order to
efficiently handle the process, we need an efficient out-of-core indexing structure because such a data set very often
does not fit in main memory.
A typical approach to build indexing structures in the case of time-varying volumetric data is to build a separate
indexing structure on each time step of the data set. For example, Sutton and Hansen’s temporal branch-on-need
structure (T-BON) [14.3] is the most representative. Their strategy is to build an out-of-core version of Branch-On-
Need-Octree (BONO) [14.4], in which each leaf node is of disk page size, for each time step and to store general
common infrastructure of the trees in a single file. However, building (n-1)-dimensional trees along a particular dimension, as the T-BON does, unfortunately results in the index size increasing linearly with the resolution of that dimension (the number of time steps in the case of the T-BON). This is because it does not exploit any possible coherence across that dimension. This lack of scalability
becomes more problematic as we generate higher and higher resolution data in every dimension including the time
dimension.
Building a series of (n-1)-dimensional indexing structures on n-dimensional data thus causes a scalability problem as resolution grows in every dimension. However, building a single n-dimensional indexing structure can cause an indexing-effectiveness problem compared to the former case. The information-aware 2^n-tree has been proposed [14] to maximize indexing efficiency by ensuring that the subdivision of space has as similar coherence as possible along each dimension. It is particularly useful when the data distribution along each dimension constantly shows a different degree of coherence from every other dimension.
Information-Aware 2^n-Trees

Information-Aware 2^n-trees (IA 2^n-trees) are basically 2^n-trees (e.g., quadtrees for 2-D and octrees for 3-D [14.13]) for n-dimensional space. They differ, however, in how the extent ratios of a subvolume are decided when multiple dimensions are integrated into one hierarchical indexing structure. The coherence information along each dimension is extracted and used for this decision, so that each subvolume contains as similar coherence as possible along each dimension.
A. Dimension Integration
We present an entropy-based dimension integration technique. Entropy [14] is a numerical measure of the uncertainty of the outcome of an event x, given by

H(x) = −Σ_{i=1}^{n} p_i log2 p_i,

where x is a random variable, n is the number of possible states of x, and p_i is the probability of x being in state i. This measure indicates how much information is contained in observing x. The greater the variability of x, the more unpredictable x is, and the higher the entropy. For example, consider a series of scalar field values for a voxel v over the time dimension. The temporal entropy of v indicates the degree of variability in the series. Therefore, high entropy implies high information content, and thus more resources are required to store the series. Note that the entropy is maximized when all the probabilities p_i are equal.
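The entropy estimate can be sketched as follows. In this NumPy sketch a uniform histogram quantizer stands in for the non-uniform (Lloyd-Max) quantizer used for floating-point data; probabilities are estimated from bin counts.

```python
import numpy as np

def entropy(series, bins=8):
    """H(x) = -sum_i p_i log2 p_i, estimated from a histogram of the
    series. A uniform quantizer stands in for the Lloyd-Max quantizer."""
    counts, _ = np.histogram(series, bins=bins)
    p = counts[counts > 0] / counts.sum()   # drop empty bins (0 log 0 = 0)
    return float(-(p * np.log2(p)).sum())
```

A constant series (e.g., the y dimension of Figure 1) has entropy 0, while a highly variable series approaches log2(bins), driving the finer subdivision described next.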
Fig. 1. Entropy estimation in each dimension. Note that the y dimension has almost zero entropy in this example.
Fig. 2. Different supercell sizes and corresponding hierarchical indexing structures for the data of Figure 1: (a) standard supercell; (b) information-aware supercell.
Higher entropy of a dimension relative to the other dimensions implies that this dimension needs to be split at finer scales than the other dimensions. For example, if the temporal entropy is twice the spatial entropy, we design the supercell to be of size s × s × s × s/2, where s is the size of each spatial dimension of the supercell. Figures 1 and 2 show how this entropy-based dimension integration leads to an indexing structure for the 3-D case. Figure 1 shows an extreme case in which the values along the y dimension remain almost constant over all possible (x, z) values (that is, the entropy of y is almost zero), while each of the x and z dimensions has some degree of variability. The supercell size and the corresponding hierarchical indexing structure will be designed as shown in Figure 2(b); that is, it has a quadtree structure, unlike the standard octree of Figure 2(a) in which the supercell has the same size in each dimension. To estimate the ratios of the entropy values among the n dimensions, we randomly select a set of n-dimensional subvolumes and, for each subvolume, obtain the ratios by simply computing each entropy value along each dimension. The ratios are averaged and globally applied in building indexing
structures. In computing the entropy values, if the number of possible scalar field values is large (as in the case of floating-point values), we first quantize the original values into n levels using a non-uniform quantizer such as the Lloyd-Max quantizer. Further, we compute the spatiotemporal entropy ratio, defined as the ratio of the average spatial entropy to the temporal entropy.
B. Indexing Structures
We make use of the entropy ratios for the purpose of guiding the branching of the tree and ultimately
adjusting the size of supercells by dividing the dimension of high entropy more finely and that of low entropy more
coarsely. It is simply carried out by multiplying the original size of each dimension by its entropy value, which
becomes the ‘effective’ size of the dimension, and then using the ‘effective’ size instead of the original size in
branching of the tree. In addition to that, we adopt the Branch-On-Need strategy [14.4] by delaying the branching of
each dimension until it is absolutely necessary. For efficient isosurface rendering, each tree node contains the
minimum and maximum values of the scalar fields in the region represented by the node. The size of the tree can be
reduced by pruning nodes in which the minimum and maximum values are the same because they do not contribute
to isosurface extraction.
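The effective-size, branch-on-need splitting rule can be sketched as follows. This is an illustrative heuristic, not the authors' exact criterion: each dimension's size is scaled by its entropy, and only dimensions whose effective size remains large (and which can still shrink) are halved at a given node.

```python
def split_dimensions(sizes, entropies, min_size=1):
    """Decide which dimensions to halve at one tree node. Dimensions are
    compared by 'effective' size (original size x entropy), so
    high-entropy dimensions divide more finely; a dimension is split
    only while it can still shrink (branch-on-need)."""
    effective = [s * e for s, e in zip(sizes, entropies)]
    target = max(effective) / 2
    return [i for i, (s, eff) in enumerate(zip(sizes, effective))
            if s > min_size and eff > target]
```

Applied to the data of Figure 1 (near-zero entropy along y), this rule splits only x and z, yielding the quadtree-like structure of Figure 2(b).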
Further work may include evaluating the goodness of the entropy measure in comparison to other
measures and finding out a more adaptive way of applying the coherence difference in the subdivision as well as a
more effective way of decomposing the time series.
Universal Communication Format for Multimedia Data
A common data format is desired in the expanding field of machine communication. XDR [15.1] is a hardware-level data standard, and ACL [15.2] is an agent-level logical transaction standard. The authors have been developing an application-level content representation called UDF (Universal Data Format), which is flexible and capable of representing multimedia data [15.3-15.4]. However, multimedia data transmissions tend to be large even if the receiver requires only a part of the data. The receiver needs to communicate with the sender about what quantity to send and what quality is required. To meet this requirement, we designed UCF (Universal Communication Format), which has bi-directional communication capability as an extension of UDF.
Brief description of UDF
The UDF is designed to represent any data that can be used on intelligent equipment and software. The following are the basics of UDF:
(1) Content indication: a data section is wrapped by tags, as in <text> TEXT DATA </text>.
(2) Tag and data flexibility: any tag can be defined and any data can be presented in the data section. Which tags can be processed depends on the receiving software.
(3) Multimedia multiplexing: multimedia data sequences are switched as <audio>…</audio><video>…</video><audio>…</audio>…
The following are the key features of UCF, which enable bi-directional communication:
(1) Target addressing: although tags were used as data-type identifiers in UDF, they can also be thought of as the names of the objects, specified by the sender, that are to receive the data. Therefore, we define UCF tags as the names of objects. Data to program A on host B is expressed as <B><A>DATA</A></B>. The wrapped tag (in this case, <A>) in the data
section indicates the inner object address. In this manner, any communication object, including hosts and programs, can be expressed by tags.
(2) Source addressing: a receiving object may need an address to reply to when returning data or messages. In UCF, the <s> tag indicates the reply-to address, as in <A><s>B</s>SEND ME DATA</A>.
(3) Data interpretation: we intended to make a common data representation for each data type, such as text, graphic objects, images, audio, and video. However, it is difficult to define the ultimate, best common data format, so a practical solution is to leave the details to each object. Each named object can define its own data format and interpretations.
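The nested addressing described in (1) can be sketched as follows. These helpers are hypothetical illustrations, not part of the published UCF specification; `ucf_unwrap` handles one level of well-formed wrapping only.

```python
import re

def ucf_wrap(data, *path):
    # Wrap data for the outermost object first: ucf_wrap("DATA", "B", "A")
    # produces <B><A>DATA</A></B>, i.e. data for program A on host B.
    for name in reversed(path):
        data = f"<{name}>{data}</{name}>"
    return data

def ucf_unwrap(message):
    # Peel one addressing layer: return (outer object name, inner payload).
    m = re.fullmatch(r"<(\w+)>(.*)</\1>", message, re.DOTALL)
    if m is None:
        return None, message          # no more wrapping: raw data
    return m.group(1), m.group(2)

msg = ucf_wrap("DATA", "B", "A")      # "<B><A>DATA</A></B>"
outer, inner = ucf_unwrap(msg)        # ("B", "<A>DATA</A>")
```

Each object would unwrap its own layer and forward the inner payload to the named inner object, which is how cross-layer delivery proceeds.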
The hierarchically wrapped addressing scheme of UCF naturally enables cross-layer communications. The sequential nature of UCF multimedia multiplexing implies synchronization of media, e.g. audio and video. Standard schemes for message generation and handling are yet to be investigated.
Figure: Example of UCF control data.
Data Hiding and Error Concealment
[ ……… to do……………………]
Review of contemporary researches in concerns related to representation and processing of
Time-series data
Data Representation Models and Concerns
[ Concerns: Data management, framework ]
The concept of time series data is relevant to, for example, videos, images, audio, financial data, time series of traffic flow, and so on, where there are now high expectations for exploring the data at hand. Typical manipulations take the form of video/image/audio processing, including automatic speech recognition, which requires a fairly large amount of storage and is computationally intensive.
Many approaches and techniques that address time series data representation and manipulation have been proposed in the past decade. The most commonly used representations are the Discrete Fourier Transform (DFT), the Discrete Wavelet
Transform (DWT), Singular Value Decomposition (SVD), Adaptive Piecewise Constant Approximation (APCA), and Piecewise Aggregate Approximation (PAA). Recently, a promising representation method called Symbolic Aggregate Approximation (SAX) was proposed.
Major Processing Concerns and Solutions
[Concerns: Clustering: traffic, K-means, hierarchical, for clinical data; Correlation analysis; Unsupervised-outlier; Periodic patterns; Similarity mining; Visual
exploration for financial data; An improved data mining algorithm for traffic flow ]
There has been much recent work on adapting data mining algorithms to time series databases. [17.1] introduced a kernel-density-based algorithm that ensures uninteresting sequences do not affect the clustering result. [17.2] first used the k-means algorithm; the prototypes of the resulting clusters were then used as time variables to develop an autoregressive model relating the expression of the prototypes to each other. Among hierarchical clustering algorithms, [17.3] developed a method called Gecko, similar to Chameleon, which divides the clustering process into three steps (segmentation, merging, and determining the best clustering level) and is used for time-series anomaly detection; its disadvantage is that clustering takes too much time. [17.4] proposed a density-based hierarchical clustering method and proved that the method is not sensitive to noisy data. Pedro Rodrigues et al. developed an online divisive-agglomerative clustering system for time-series data streams in [17.5]. However, the research mentioned above is mostly aimed at time series of gene expression data in biology.
I. Time Series for Gene Expressions
For estimating gene networks from time series gene expression data measured by microarrays, much attention has been focused on statistical methods, including Boolean networks [19.1, 19.11], differential equations [19.3, 19.5], dynamic Bayesian networks [19.6, 19.7, 19.8], state space models [19.2, 19.4], and so on. While these methods have provided many successful applications, a serious drawback in using them to estimate gene networks has been their basic assumption that the network structure does not change through all time points, whereas the real gene network has a time-dependent structure. A recent work [19] provided a solution to this problem, establishing a statistical methodology to estimate gene networks with time-dependent structure using dynamic linear models with Markov switching. This model is based on the linear state space model, also known as the dynamic linear model (DLM). In the DLM, the high-dimensional observation vector is compressed into a lower-dimensional hidden state vector. For microarray analysis, the observation vector corresponds to the gene expression value vector, and the state variables can be considered a transcriptional module, that is, a set of co-regulated genes.
Dynamic Linear Model
Let yt be a vector of d observed random variables containing the expression values of d genes at time point t. The DLM relates the collection of observations y1, …, yT to the hidden k-dimensional state vector xt in the following way:

yt = At xt + wt.

Here At is a d x k measurement matrix and wt is Gaussian white noise, wt ~ N(0, Rt). Usually the dimension of the state vector is taken to be much smaller than that of the data, k < d. In the DLM, the time evolution of the state variables is modeled by a first-order Markov process as

xt = Bt xt-1 + vt,
where Bt is the k x k state transition matrix and the additive system noise vt follows the Gaussian distribution vt ~ N(0, Qt). The noise covariance matrices Rt and Qt are assumed to be diagonal. Notice that the model parameters depend on the time index; this implies that the underlying dynamics change discontinuously at certain undetermined points in time.
The process of the DLM starts with an initial Gaussian state x0 that has a given mean and covariance matrix. In the DLM, the dynamics of the states and observations are governed by their joint probability distribution, in which all components are Gaussian densities.
The DLM, in its canonical form, implicitly assumes an interesting causal relationship among the d variates (genes). To sum up, the time-dependent DLM describes the consecutive changes in module sets of genes, module-module interactions and gene-gene interactions with the underlying canonical form (see Figure 1). After learning the model parameters, including the projection matrix, we can identify the time-dependent network structure by testing whether or not these parameters lie in a region significantly far from zero. This problem amounts to classical hypothesis testing or bootstrap confidence intervals.
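As a concrete illustration of the observation and state equations above, the generative process yt = A xt + wt, xt = B xt-1 + vt can be simulated as follows. The matrices, noise scales and dimensions below are invented purely for illustration; they are not from the paper.

```python
import random

def matvec(M, v):
    # Plain-list matrix-vector product.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def simulate_dlm(A, B, T, sigma_w=0.1, sigma_v=0.1, seed=0):
    """Simulate y_t = A x_t + w_t,  x_t = B x_{t-1} + v_t  (time-invariant sketch)."""
    rng = random.Random(seed)
    k = len(B)
    x = [0.0] * k                                                # initial state mean 0
    ys = []
    for _ in range(T):
        x = [xi + rng.gauss(0, sigma_v) for xi in matvec(B, x)]  # state transition
        y = [yi + rng.gauss(0, sigma_w) for yi in matvec(A, x)]  # observation
        ys.append(y)
    return ys

# d = 3 "genes" compressed into k = 2 hidden "modules"
A = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]   # d x k measurement matrix
B = [[0.9, 0.0], [0.0, 0.8]]               # k x k state transition matrix
ys = simulate_dlm(A, B, T=50)
```

Estimation runs in the opposite direction: given only ys, the learning problem is to recover A, B and the hidden states.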
DLM with Markov Switching
The problem of modeling change in an evolving time series can be handled by allowing the dynamics of the underlying model to change discontinuously at certain undetermined points in time. In a real biological system, the structural change might occur smoothly. To incorporate a reasonable switching structure, the DLM-MS approach assumes that each observation is generated by one of G possible regimes evolving according to a Markov chain. In this context, the model parameters are assumed to take one of G possible configurations at each time point. For notational convenience, a hidden vector of G class labels is introduced to indicate the configurations.
The DLM-MS, in its basic form, assumes that the discrete regime variable evolves according to a first-order Markov chain with a transition probability matrix M of order G x G, where the (h, g) element defines the probability of moving from regime h to regime g.
Each row of M is restricted to be a probability vector. The smoothness of change in regimes is controlled by the entropy of the hth row of M, for h = 1, …, G.
Bayesian Inference
For some gene expression data, each array contains genes whose fluorescence intensity measurements were flagged by the experimenter and recorded as missing data points. In such a case, the observation vector is incomplete. To deal with the missing-data problem, we partition the d-dimensional observed vector into its observed and missing components. Consequently, the DLM-MS treats the observations, hidden states and regime labels together as a complete dataset having a joint distribution.
The parameters to be learned from the observed dataset are collected into a set, together with the initial distributions that drive the dynamic system. Our attention turns to Bayesian learning of the DLM-MS, which requires prior distributions for all model parameters and initial distributions for the hidden states. In this study, we employ the natural conjugate priors.
Let ai and bi be the ith rows of A and B, respectively. A family of conjugate priors of the DLM-MS is used, in which the noise variances follow inverse-gamma distributions with given shape and scale parameters, and the rows of the transition probability matrix follow Dirichlet distributions with given prior sample sizes. Note that the prior distribution of A is specified by a truncated Gaussian distribution whose support is restricted to the positive part. In the DLM setting, the underlying dynamical system is invariant under certain sign and scale transformations of the parameters; to avoid this lack of identifiability, we use the truncated prior distribution. Once the prior distributions are given, the augmented parameters are estimated through the posterior distribution.
Within Bayesian framework, all inferences are made based on the marginal posterior distribution, for instance,
from these full-conditional distributions. If the iterations have proceeded long enough, the simulation is roughly representative of the target distribution. To diminish the effect of the starting point, we generally discard the first p simulated samples and focus attention on the remaining n - p. This retained set is used to summarize the posterior distribution and to compute quantiles and other summaries of interest as needed.
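The burn-in-and-summarize step can be sketched as follows. This is generic MCMC post-processing; the chain here is a toy stand-in, not the DLM-MS sampler itself.

```python
def posterior_summary(samples, burn_in, quantiles=(0.025, 0.5, 0.975)):
    """Discard the first burn_in draws of a scalar MCMC chain and summarize the rest."""
    kept = sorted(samples[burn_in:])
    n = len(kept)
    mean = sum(kept) / n
    # simple nearest-rank empirical quantiles
    qs = {q: kept[min(n - 1, int(q * n))] for q in quantiles}
    return mean, qs

# toy chain: 100 burn-in draws stuck at 0, then 5 "stationary" draws
chain = [0.0] * 100 + [1.0, 2.0, 3.0, 4.0, 5.0]
mean, qs = posterior_summary(chain, burn_in=100)
```

Without the burn-in cut, the initial transient would dominate every summary, which is exactly the starting-point effect the text warns about.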
II. Time Series for Traffic Data
Using data mining technology to analyze time series of traffic flow can not only forecast short-term or long-term traffic volume, but also identify which streets of a city are bottlenecks, which helps greatly in analyzing the traffic situation of the city. In fact, clustering similar change trends of traffic flow time series is currently an interesting issue. On one hand, we can obtain typical patterns of traffic flow; on the other hand, we can group the sections of highway where the detectors are located according to their flow characteristics. The sections of highway in one group then have similar traffic flow characteristics, while sections in different groups have distinct characteristics. Combined with spatial information, useful spatial and temporal distribution patterns in transportation could be revealed.
Linkage Difference
In [17], average linkage is used as the distance (or similarity) between clusters. Given two clusters A = {a1, a2, …, am} and B = {b1, b2, …, bn}, where m and n are the sizes of A and B, let W be the similarity matrix among the time series, so that W(u, v) is the similarity between series u and v. The average linkage D(A, B) between cluster A and cluster B is the mean of all pairwise similarities between members of A and members of B.
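A direct sketch of the average-linkage computation; here W is stored as a dictionary of pairwise similarities, and the ids and values are illustrative.

```python
def average_linkage(A, B, W):
    """Mean pairwise similarity between members of clusters A and B.
    W maps (a, b) pairs of series ids to similarity values."""
    total = sum(W[(a, b)] for a in A for b in B)
    return total / (len(A) * len(B))

W = {(1, 3): 0.2, (1, 4): 0.4, (2, 3): 0.6, (2, 4): 0.8}
d = average_linkage([1, 2], [3, 4], W)   # (0.2 + 0.4 + 0.6 + 0.8) / 4 = 0.5
```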
An algorithm for similarity mining in time series data based on the Grey Markov SCGM(1,1) model has also been proposed in a recent work by Xiong et al. [18].
Encoded-Bitmap-Approach-Based Swap
Given two clusters A and B (that is, the number of clusters k = 2), for an arbitrary time series u ∈ A ∪ B we can compute the linkage difference ∆D. For every u ∈ A, there are two cases for the value of ∆D(u,A,B):
1. ∆D(u,A,B) = D(A,u) – D(B,u) < 0. Series u has a relatively larger linkage to cluster B even though it is located in cluster A, so we move series u to cluster B.
2. ∆D(u,A,B) = D(A,u) – D(B,u) ≥ 0. Series u has a relatively larger linkage to its initial cluster A, so we do nothing.
For every v ∈ B, there are two similar cases:
1. ∆D(v,B,A) = D(B,v) – D(A,v) < 0. Series v has a relatively larger linkage to cluster A even though it is located in cluster B, so we move series v to cluster A.
2. ∆D(v,B,A) = D(B,v) – D(A,v) ≥ 0. Series v has a relatively larger linkage to its initial cluster B, so we do nothing.
If the number of existing clusters k > 2 (say A, B, C, D, …), then similarly: for every u ∈ A, if all ∆D(u,A,X) are greater than zero we do nothing; otherwise we move u to the cluster X with the largest |∆D(u,A,X)| among those with ∆D(u,A,X) < 0. This procedure of swapping series is called the “Encoded-Bitmap-Approach-Based Swap”. The algorithm for k = 2 is presented below.
Algorithm: EncodedBitmap_Based_Swap(ACluster, BCluster)
// Here k = 2. Input: original clusters ACluster and BCluster. Output: the two new clusters.
Begin
Step 1. Use the Encoded Bitmap Approach to calculate the similarity matrix W(ACluster, BCluster).
Step 2. For every time series u ∈ ACluster, calculate D(A,u) and D(B,u).
    If ∆D(u,A,B) = D(A,u) – D(B,u) < 0, then move u to BCluster.
Step 3. For every time series v ∈ BCluster, calculate D(B,v) and D(A,v).
    If ∆D(v,B,A) = D(B,v) – D(A,v) < 0, then move v to ACluster.
End.
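A minimal Python sketch of the k = 2 swap pass. Here D(X, u) is taken to be the average similarity between u and the members of X, consistent with the linkage definition above; the similarity values in the example are invented.

```python
def bitmap_based_swap(A, B, W):
    """One swap pass for k = 2 clusters; W maps frozenset pairs to similarities."""
    def D(cluster, u):
        others = [v for v in cluster if v != u]
        if not others:
            return float("inf")        # no other members: singletons are not moved out
        return sum(W[frozenset((u, v))] for v in others) / len(others)

    A, B = list(A), list(B)
    for u in list(A):
        if D(A, u) - D(B, u) < 0:      # u is more similar to B: move it
            A.remove(u); B.append(u)
    for v in list(B):
        if D(B, v) - D(A, v) < 0:      # v is more similar to A: move it
            B.remove(v); A.append(v)
    return A, B

# series 3 starts in cluster A but is far more similar to the members of B
W = {frozenset(p): s for p, s in {
    (1, 2): 0.9, (1, 3): 0.1, (1, 4): 0.1, (1, 5): 0.1,
    (2, 3): 0.1, (2, 4): 0.1, (2, 5): 0.1,
    (3, 4): 0.9, (3, 5): 0.9, (4, 5): 0.9}.items()}
A2, B2 = bitmap_based_swap([1, 2, 3], [4, 5], W)   # 3 moves to B
```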
Both grey relation and the Encoded-Bitmap-Approach-Based Swap were adopted in [17] to improve the classic hierarchical clustering algorithm.
Algorithm: Improved Hierarchical Clustering Method
// Input: Time Series Datasets Output: K Clusters
Begin
1. Start by assigning each item to its own cluster, so that if there are N items, there are N clusters, each containing just one item.
2. Use grey relation as the time series similarity measurement, and let the similarities between clusters equal the similarities between the items they contain.
3. Find the most similar pair of clusters and merge them into a single cluster, so that the number of clusters is reduced by one.
4. Compute the average linkage as the similarity between the new cluster and each of the old clusters.
5. Repeat steps 3 and 4 until K clusters remain.
6. Adopt the encoded-bitmap-approach-based swap to refine the K clusters from step 5, yielding the new K clusters.
End.
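The merge loop (steps 1 to 5) can be sketched as follows. The grey-relation similarity is replaced here by a toy stand-in (negative difference of series means), purely for illustration.

```python
def agglomerate(series_list, K, sim):
    """Agglomerative clustering with average linkage over a similarity function.
    `sim` stands in for the paper's grey-relation measure (an assumption here)."""
    clusters = [[i] for i in range(len(series_list))]           # step 1
    S = {}                                                      # step 2: pairwise similarities
    for i in range(len(series_list)):
        for j in range(i + 1, len(series_list)):
            S[(i, j)] = sim(series_list[i], series_list[j])

    def avg_link(A, B):                                         # step 4
        return sum(S[tuple(sorted((a, b)))] for a in A for b in B) / (len(A) * len(B))

    while len(clusters) > K:                                    # steps 3 and 5
        p, q = max(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: avg_link(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]                 # merge most similar pair
        del clusters[q]
    return clusters

# toy similarity: larger when the series means are close (hypothetical stand-in)
sim = lambda s, t: -abs(sum(s) / len(s) - sum(t) / len(t))
data = [[1, 1, 1], [1, 2, 1], [9, 9, 9], [9, 8, 9]]
clusters = agglomerate(data, K=2, sim=sim)
```

Step 6 would then apply the swap pass above to refine the K resulting clusters.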
Experimental results show that, compared with the classic hierarchical clustering method, the above method better separates the change trends of time series.
III. Time Series in Multimedia data
Typical multimedia manipulations require a considerable amount of storage and are computationally intensive. Generally, various image processing techniques [16.8][16.12][16.14][16.24][16.26] can be used to cluster multimedia data by measuring similarities among the raw videos or images using features such as color, texture, or shape. However, recent work [16.17][16.22] has demonstrated the utility of time series representation as an efficient alternative to the raw multimedia data, whose advantages include time and space complexity reduction in clustering, classification, and other data mining tasks. In clustering multimedia time series data, the k-medoids algorithm with the Dynamic Time Warping (DTW) distance measure is often used. In fact, many other distance measures can be effectively used for time series data, but we focus mainly on DTW because its shape-based similarity measurement breaks the limitation of the one-to-one mapping in Euclidean distance, the most well-known distance metric. Although k-medoids with DTW gives satisfactory results, k-means clustering is conceivably much more typical in clustering tasks, where an averaging algorithm is a crucial subroutine in finding a data representation of each cluster. In general, the Euclidean distance metric (or another Minkowski metric) is used to find an average of all the data within a cluster. However, its one-to-one mapping nature is unable to capture the average shape of two time series, in which case Dynamic Time Warping is more favorable. The work by Gupta et al. [16.9] introduced a shape averaging approach using Dynamic Time Warping. Niennattrakul et al. [16] provided a generic time series shape averaging method with a proof of correctness.
Distance Measurement
Distance measures are extensively used in finding the similarity/dissimilarity between any two time series. The two well-known measures are the Euclidean distance metric and the DTW
distance measure. A distance metric must satisfy four properties: symmetry, self-identity, non-negativity, and the triangular inequality. A distance measure, however, does not need to satisfy all of these properties. DTW [21] is a well-known shape-based similarity measure for time series data. Unlike the Minkowski distance function, dynamic time warping breaks the limitation of one-to-one alignment and also supports time series of unequal length. It uses a dynamic programming technique to find all possible paths and selects the one that yields the minimum distance between the two time series, using a distance matrix in which each element is the cumulative distance of the minimum of its three surrounding neighbors. Suppose we have two time series, a sequence Q = q1, q2, …, qi, …, qn and a sequence C = c1, c2, …, cj, …, cm. First, we create an n-by-m matrix, where every (i, j) element of the matrix is the cumulative distance of the distance at (i, j) and the minimum of the three neighboring elements, where 0 ≤ i ≤ n and 0 ≤ j ≤ m. We can define the (i, j) element γ(i, j) as:

γ(i, j) = (qi – cj)² + min{ γ(i–1, j), γ(i, j–1), γ(i–1, j–1) }   (1)

which is the sum of the squared distance between qi and cj and the minimum cumulative distance of the three elements surrounding the (i, j) element. Then, to find the optimal path, we choose the path that gives the minimum cumulative distance at (n, m). The distance is defined as:

DTW(Q, C) = min over all P of sqrt( Σ k=1..K wk )   (2)

where P is the set of all possible warping paths, wk is the value at the (i, j) element of the matrix visited by the kth element of a warping path, and K is the length of the warping path. The algorithm may generate different optimal warping paths, but the warping distance will always turn out to be the same.
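The recurrence of eq. (1) and the distance of eq. (2) can be sketched as follows (an illustrative implementation, not the paper's code):

```python
def dtw(Q, C):
    """DTW distance between numeric sequences Q and C via the cumulative-cost matrix."""
    n, m = len(Q), len(C)
    INF = float("inf")
    g = [[INF] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (Q[i - 1] - C[j - 1]) ** 2          # squared local distance
            g[i][j] = cost + min(g[i - 1][j],          # insertion
                                 g[i][j - 1],          # deletion
                                 g[i - 1][j - 1])      # match
    return g[n][m] ** 0.5
```

Note how warping lets [0, 0, 1] align perfectly with the shorter [0, 1], which one-to-one Euclidean matching cannot do.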
Dynamic Time Warping Averaging
In some situations, we may need to find a template or model of a collection of time series, in which case a shape averaging algorithm is desired for a more accurate and meaningful template. The DTW distance measure is exploited to find the appropriate mappings for an average. More specifically, the algorithm creates a DTW distance matrix and finds an optimal warping path. After the path is discovered, a time series average is calculated along this path by using the index (i, j) of each data point wk on the warping path, which corresponds to the data points qi and cj on the time series Q and C, respectively. Each data point in the averaged time series is simply the mean of the two values on the two time series
that index (i, j) maps to. W = w1, w2, …, wk, …, wK is an optimal warping path, where wk is the mean of the two time series values whose indices are i and j:
wk = (qi + cj) / 2   (3)

In query refinement, where the two time series may have different weights, αQ for sequence Q and αC for sequence C, eq. (3) above may be generalized according to the desired weights:

wk = (αQ qi + αC cj) / (αQ + αC)   (4)
What we want from a shape averaging algorithm is illustrated in Figure 1(a), where DTW is used. If the Euclidean or any other one-to-one mapping distance measure were used, we would probably end up with an undesirable result, as shown in Figure 1(b).
Figure 1. A comparison between (a) shape averaging and (b) amplitude averaging
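Given an optimal warping path (assumed precomputed by a DTW routine such as the one sketched earlier), the averaging of eq. (3) and (4) reduces to:

```python
def dtw_average(Q, C, path, alpha_Q=0.5, alpha_C=0.5):
    """Average two series along a warping path of 0-indexed (i, j) pairs.
    With equal weights this is eq. (3); unequal weights give eq. (4) up to
    normalization (here the weights are assumed to sum to 1)."""
    return [alpha_Q * Q[i] + alpha_C * C[j] for i, j in path]

Q, C = [0.0, 0.0, 1.0], [0.0, 1.0]
path = [(0, 0), (1, 0), (2, 1)]        # an optimal DTW warping path for Q, C
avg = dtw_average(Q, C, path)          # shape-preserving average
```

Notice that the averaged series has length K (the path length), not the length of either input, which is exactly what makes the result shape-based rather than point-wise.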
K-means Clustering
As shown in Table 1 below, the k-means algorithm [16.4] tries to divide N data objects into k partitions or clusters, each with one object (the mean) as its cluster center, representing all data objects within that cluster. We then assign the rest of the objects to the proper clusters and recalculate the new centers, repeating this step until all cluster centers are stable. In general, after each iteration, the quality of the clusters and of the means themselves is essentially improved.
K-medoids Clustering
k-medoids [16.11] differs from k-means only in the way the cluster centers are chosen and represented (step 4): it finds new cluster centers by choosing the existing data member within each cluster that best represents its cluster center, instead of calculating the average of the cluster members.
Figure 3. Examples of six species in the Leaf dataset. Figure 4. Examples of six-class Leaf images.
Figure 6. Examples of four different Face profiles after conversion into time series. Figure 7. Tracking hand position in each video frame.
K-means Clustering with DTW
It has been demonstrated that k-medoids clustering for multimedia time series data runs smoothly with DTW. In contrast, it has been observed [16] that if the k-means method is used instead, there is a high probability of failure compared with the k-medoids algorithm (which is probably why Euclidean averaging is often used for k-means shape averaging despite the use of DTW in cluster membership assignment). The paper [16] pointed out some interesting problems that occur when using k-means clustering with DTW. Future study may investigate how these problems can be resolved and propose remedies for accurately averaging shape-based time series data.
IV. Time Series in Financial Applications
Financial time series data has its own characteristics compared with other time series data. One special characteristic is that it is typically characterized by a few critical points, and multi-resolution consideration is always necessary for long-term and short-term analyses. A second is that financial time series data is continuous, large and unbounded. There are many technical analytical methods for financial time series data that identify patterns of market behavior. In these financial analytical methods, critical or extreme points, which the original SAX cannot handle, are very important to discover. To reduce the loss of these important points, an Extended SAX representation, aimed especially at financial data analysis and mining tasks, was devised by Lkhagva et al. [20]. The basic idea of the proposed method builds on two previously proposed representation techniques: the PAA and SAX representations.
Piecewise Aggregate Approximation (PAA)
Yi and Faloutsos [20.7] and Keogh et al. [20.4] independently proposed PAA. In PAA, each sequence of time series data is divided into k segments of equal length, and the average value of each segment is used as a coordinate of a k-dimensional feature vector.
Figure 1: A time series C is represented by PAA (by the mean values of equal segments). In the
example above, the dimensionality is reduced from n = 60 to k = 6.
The advantages of this transform are that (1) it is very fast and easy to implement, and (2) the index can be built in linear time. As shown in Figure 1, in order to reduce the time series from n dimensions to k dimensions, the data is divided into k equal-sized segments. The mean value of the data falling within each segment is calculated, and the vector of these values becomes the data-reduced representation.
More formally, a time series C of length n can be represented in a k-dimensional space by a vector whose ith element is calculated by the following equation [20.4]:

c̄i = (k/n) Σ cj, where the sum runs over j = (n/k)(i–1)+1, …, (n/k)i   (1)
However, since the PAA approach reduces dimensionality by taking the mean values of equal-sized frames, this mean-based representation may miss some important patterns in some time series data analyses.
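A minimal PAA sketch of eq. (1), assuming for simplicity that the series length is divisible by k:

```python
def paa(series, k):
    """Piecewise Aggregate Approximation: reduce a length-n series to k segment means.
    Assumes n is divisible by k."""
    n = len(series)
    seg = n // k                       # points per segment (n/k)
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(k)]

coeffs = paa([1, 2, 3, 4, 5, 6], 3)   # three segment means: [1.5, 3.5, 5.5]
```

The mean-based reduction is also where the caveat above bites: a single extreme point inside a segment is smoothed away by its neighbors.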
Symbolic Aggregate Approximation (SAX)
Lin, Keogh et al. [20.3] proposed a new approach called SAX. SAX is based on PAA [20.4, 20.7] and assumes normality of the resulting aggregated values. SAX is the first symbolic representation of time series with an approximate distance function that lower-bounds the Euclidean distance. In SAX, the data is first transformed into the PAA representation, and the transformed PAA representation is then symbolized into a sequence of discrete strings. There are two important advantages to doing this:
Dimensionality Reduction: the dimensionality reduction of PAA [20.4, 20.7] is automatically carried over to this representation.
Lower Bounding: the lower-bounding property of the distance measure between two symbolic strings can be proved by simply pointing to the existing proofs for the PAA representation itself [20.4].
In order to obtain a string representation after a time series is transformed into the PAA representation, the symbolization regions must be determined. Empirical testing on more than 50 datasets showed that normalized subsequences have a highly Gaussian distribution [20.3]. From this result, the “breakpoints” that produce equal-sized areas under the Gaussian curve are determined. Breakpoints are defined as follows.
Definition 1 [20.3]: Breakpoints are a sorted list of numbers β1, …, βa-1 such that the areas under an N(0,1) Gaussian curve between consecutive breakpoints are all equal to 1/a (with β0 and βa defined as –∞ and ∞, respectively). These breakpoints can be determined by looking them up in a statistical table, e.g.
Table 1: A lookup table containing the breakpoints that divide a Gaussian distribution into an arbitrary number (from 3 to 5) of equiprobable regions.
Using these defining breakpoints, a time series is discretized in the following example. First a PAA of the
time series is obtained. Then, all PAA coefficients that are below the smallest breakpoint are mapped to the symbol
“A,” all coefficients greater than or equal to the smallest breakpoint and less than the second smallest breakpoint are
mapped to the symbol “B,” etc. Figure 2 illustrates the idea.
Figure 2: A time series is discretized by SAX. In the example above, with n = 60, k = 6 and a = 3, the
time series is mapped to the word ABCBBA.
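As a sketch, the normalize-PAA-symbolize pipeline might be implemented as follows. Here the breakpoints are computed from the standard normal inverse CDF rather than looked up in a table, the series length is assumed divisible by k, and the alphabet size a is at most 8.

```python
from statistics import NormalDist, mean, pstdev

def sax(series, k, a):
    """SAX: z-normalize, apply PAA to k segments, then map each segment mean to
    one of a letters using equiprobable N(0,1) breakpoints."""
    mu, sd = mean(series), pstdev(series)
    z = [(x - mu) / sd for x in series]                 # z-normalize
    seg = len(z) // k                                   # assume len divisible by k
    coeffs = [sum(z[i * seg:(i + 1) * seg]) / seg for i in range(k)]
    breaks = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    word = ""
    for c in coeffs:
        idx = sum(c >= b for b in breaks)               # count breakpoints below c
        word += "ABCDEFGH"[idx]                         # below smallest breakpoint -> 'A'
    return word

word = sax([0, 0, 0, 0, 10, 10, 10, 10], k=2, a=3)      # low half, high half
```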
SAX also has some disadvantages, such as its dimensionality-reducing nature, which may miss important patterns in some datasets, as depicted in Figure 3.
Figure 3: Financial time series data represented by SAX. Some important points (shown in red) are missing. (US$ and Japanese yen exchange rate data over 2 months.) The SAX representation is CFCBFD.
Figure 4: Financial time series data represented by Extended SAX. The Extended SAX representation is ACFFDFFCAABFFFFDCA. (US$ and Japanese yen exchange rate data over 2 months.)
A further modified technique, Extended SAX, has been proposed in [20], with the result depicted in Figure 4.
Review of contemporary researches in concerns related to representation and processing of
Spatial data
Data Representation Models and Concerns
A pictorial database plays an important role in many applications, including geographical information systems, computer-aided design, office automation, medical image archiving, and trademark picture registration. In such fields there is a need to manage geometric, geographic, or spatial data, i.e., data related to space. The space of interest can be, for example, the two-dimensional abstraction of (parts of) the surface of the earth (geographic space, the most prominent example), a man-made space like the layout of a VLSI design, a volume containing a model of the human brain, or another 3D space representing the arrangement of chains of protein molecules.
Representation of relative spatial relations between objects is required in many multimedia database
applications. Quantitative representation of spatial relations taking into account shape, size, orientation and distance
is often required. This cannot be accomplished by assimilating an object to elementary entities such as the centroid
or the minimum bounding rectangle. Thus many authors have proposed numerous representations based on the
notion of histograms of angles. There are many general-purpose content-based image retrieval systems, e.g. the
QBIC [21.6] system and the Photobook [21.14]. They mainly use color, texture and shape as image features.
However, representing the spatial relations between objects is also an important component of image content
description and access. For example, the spatial relationship between brain lesions and anatomical brain structures in
medical images is critically important for early disease diagnosis and thus important for image retrieval. Typical
applications of spatial relation representations are content-based image retrieval (e.g. [21.3, 21.8, 21.15, 21.17]),
video indexing and retrieval (e.g. [21.5]), computer vision, robot navigation, and Geographic Information Systems
(GIS).
To assess the degree of similarity of two images according to the spatial relations between objects, first we
need to extract a compact representation of spatial relations from images, and then define a (dis)similarity measure
(e.g. a distance function) on such representations. Our ultimate goal is to answer queries like “find similar MR
images to one with a lesion inside the frontal lobes”, or “find similar surveillance video sequences to one in which a
man walks from the middle of a room to the east side.”
Significant work has been reported on spatial relation representation. Many authors have stressed the
importance of qualitative spatial relationships [21.4]. Approaches have been based on Allen’s interval relations
[21.1] (e.g. [21.13, 21.16]), 2D strings [21.3] and their variants, Attributed Relational Graphs (ARGs) (e.g. [21.15])
or the spatial orientation graph (e.g. [21.8]). All of these approaches assimilate an object to very elementary entities such as the centroid (e.g. [21.8, 21.9]) or the minimum bounding rectangle (e.g. [21.13]). This simplification cannot give a satisfactory modelling of the spatial relations. For example, projecting two objects onto each of the dimensions and considering each dimension separately is inadequate, because the two objects may not overlap at all even when their projections onto the x and y axes overlap simultaneously. In [21.12], Miyajima and Ralescu introduced
the notion of the histogram of angles to represent directional relations.
I. Histogram based representation
In [21], a new histogram representation of spatial relations called the R-Histogram is proposed. Here we assume the images are segmented and each object is assigned a unique label, i.e., we deal with symbolic images, as defined formally in [21.8]. The dissimilarity between two images is then defined by the distance between the two corresponding R-Histograms.
The R-Histogram
Given a reference object R and an object of interest A, the goal is to represent, quantitatively, the spatial relations between R and A. Consider the vector originating from a pixel x on the boundary of R to a pixel y on the boundary of A. If x and y do not coincide, we compute the angle between the x-axis of the coordinate frame and the vector xy. This angle, denoted θ(x, y), takes values in [–π, π]. As in the histogram of angles [21.12], the set of angles from any pixel on the boundary of R to a pixel on the boundary of A expresses the directional relations between R and A. The novel idea introduced in this paper is the labeled distance. The labeled distance from x to y, denoted LD(x, y), is defined as a pair (d(x, y), l(x, y)), where d(x, y) is the Euclidean distance from x to y and l(x, y) is defined in Table 1.
Here, column 1 describes whether pixel x is inside object A, and column 2 describes whether pixel y is inside object R.
For the set of vectors originating from any pixel on the boundary of R to any pixel on the boundary of A,
we construct a histogram as follows. Let x and y be pixels on the boundary of R and A, respectively. The bin H(I, J, L) is incremented by one for every pair (x, y) such that

θ(x, y) ∈ A_I, d(x, y) ∈ D_J, and l(x, y) = L,   (1)

where A_I is the range of angle values spanned by bin H(I, J, L), D_J is the range of distance values spanned by bin H(I, J, L), and L ∈ {0, 1, 2, 3} is the label associated with the distance values spanned by bin H(I, J, L).
Then the histogram is normalized as follows:

RH(A, R)(I, J, L) = H(I, J, L) / Σ_{I=1..n_A} Σ_{J=1..n_D} Σ_{L=0..3} H(I, J, L),   (2)

where n_A is the number of angle bins and n_D the number of distance bins. The normalized histogram, denoted RH(A, R), is defined to be the R-Histogram of object A relative to object R.

An R-Histogram example is illustrated in Figure 2, where the x-axis is associated with angles and the y-axis with distances.
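As an illustration, the construction above can be sketched in a few lines of Python. This is a hypothetical implementation: the bin counts, the 4-neighbour boundary test, the image-coordinate angle convention, and the label encoding of Table 1 are all assumptions.

```python
import numpy as np

def boundary_pixels(mask):
    """Pixels of a binary mask with at least one 4-neighbour outside the mask."""
    pts = []
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            if not mask[i, j]:
                continue
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if ni < 0 or ni >= h or nj < 0 or nj >= w or not mask[ni, nj]:
                    pts.append((i, j))
                    break
    return pts

def r_histogram(mask_R, mask_A, n_angle=8, n_dist=4):
    """Normalized histogram H[I, J, L] over angle, distance and label for
    vectors from boundary pixels of R to boundary pixels of A."""
    bR, bA = boundary_pixels(mask_R), boundary_pixels(mask_A)
    pairs = [(x, y) for x in bR for y in bA if x != y]
    d_max = max(np.hypot(y[0] - x[0], y[1] - x[1]) for x, y in pairs)
    H = np.zeros((n_angle, n_dist, 4))
    for x, y in pairs:
        # image-coordinate convention assumed: columns as x-axis, rows as y-axis
        theta = np.arctan2(y[0] - x[0], y[1] - x[1])           # in [-pi, pi]
        d = np.hypot(y[0] - x[0], y[1] - x[1])
        I = min(int((theta + np.pi) / (2 * np.pi) * n_angle), n_angle - 1)
        J = min(int(d / d_max * n_dist), n_dist - 1)
        L = 2 * int(mask_A[x]) + int(mask_R[y])                # assumed Table 1 encoding
        H[I, J, L] += 1
    return H / H.sum()                                         # Eq. (2): unit total mass
```

The quadratic loop over boundary-pixel pairs reflects the O(N²) worst case discussed below.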
Figure 2: RH( A,R) for the two objects in Figure 1. Each quadrant is associated with a unique label.
Time complexity concerns: Let N be the number of pixels in an image. We assume the objects are homeomorphic to a 2-ball. In the worst case, the number of pixels on the boundary of an object is O(N); therefore, the computation of the R-Histogram takes O(N²) time. If the objects are convex, the number of boundary pixels is O(N^(1/2)) and the time complexity drops to O(N).
Distance Metric
The dissimilarity between two images is defined by the distance between corresponding R-Histograms.
There are many histogram distance metrics; the distance metric used here is the histogram intersection. It is shown in [21.18] that when the histograms are normalized, the histogram intersection is given by

d(H1, H2) = 1 − Σ_k min(H1(k), H2(k)).   (3)
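A minimal sketch of this distance, assuming the common one-minus-intersection form for normalized histograms:

```python
import numpy as np

def intersection_distance(h1, h2):
    """1 minus the sum of elementwise minima; 0 for identical normalized histograms."""
    return 1.0 - np.minimum(h1, h2).sum()
```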
Future work may model the spatial relations of multiple objects in an image by using R-Histograms as the arc attributes in ARGs. Further attempts are likely to improve the time complexity of R-Histogram computation and to investigate the possibility of extracting semantic meanings from R-Histogram representations.
II. Content Based Image Retrieval
Content-based image retrieval (CBIR) is the current trend in designing image database systems, as opposed to text-based image retrieval [22.7], [22.11], [22.14], [22.18], [22.23], [22.24], [22.25], [22.27]. The features used in content-based image retrieval can be roughly divided into two categories: low-level visual features (such as color, texture, and shape) and high-level features (such as pairwise spatial relationships between objects). Some
examples of content-based image retrieval systems are QBIC [22.8], Virage [22.1], Retrieval Ware [22.29],
VisualSEEK [22.26], WaveGuide [22.17], and Photobook [22.21]. They allow users to retrieve similar pictures from
a large image database based on low-level visual features. On the other hand, there is also a large group of
researchers emphasizing image retrieval based on spatial relationships between objects [22.3], [22.4], [22.5],
[22.10], [22.15], [22.16], [22.20], [22.22], [22.28].
The method of representing images is one of the major concerns in designing an image database system.
An ideal representation method for symbolic pictures should provide image database systems with many important
functions such as similarity retrieval, visualization, browsing, spatial reasoning, and picture indexing. One way of
representing an image is to construct a symbolic picture for that image which in turn is encoded into a 2D-string
[22.5]. The 2D string representation method opened up a new approach to spatial reasoning, picture indexing, and
similarity retrieval. There are many follow-up research works based on the concept of the 2D string, such as the 2D C-string [22.15], [22.16] and the 2D C+-string [22.9]. In [22], we find a new scheme for encoding spatial relations called the 9-Direction SPanning Area (9D-SPA) representation.
Overview Of Spatial Knowledge Representation
Binary spatial relationships between objects have been identified as one of the most important features for
describing the contents of images [22.6]. For example, a query such as “finding all the pictures containing a house to
the east of a tree” relies on spatial relations to retrieve the desired pictures. Different kinds of spatial knowledge
representations have been proposed so far. Chang et al. [22.5] proposed the 2D string as a spatial knowledge
representation to capture the spatial information about the content of a picture. The fundamental idea of the 2D string is
to project the objects of a picture along the x and y-directions to form two strings representing the relative positions
of objects in the x and y-axis, respectively. Since a 2D string preserves the spatial relationships between any two
objects in a picture, it has the advantage of facilitating spatial reasoning. Moreover, since a query picture [22.6] can
also be represented as a 2D string, the problem of similarity retrieval becomes a problem of 2D string subsequence
matching. Jungert [22.12], Chang et al. [22.4], and Jungert and Chang [22.13] extended the idea of 2D strings to
form 2D G-strings by introducing several new spatial operators to represent more relative positional relationships
among objects of a picture. The 2D G-string representation embeds more information about spatial relationships
between objects and, thus, facilitates spatial reasoning about sizes and relative positions of objects. Following the same concept, Lee and Hsu [22.15] proposed the 2D C-string representation based on a special cutting mechanism.
Since the number of subparts generated by this new cutting mechanism is reduced significantly, the lengths of the
strings representing pictures are much shorter while still preserving the spatial relationships among objects. The 2D
C-string representation is more economical in terms of storage space efficiency and navigation complexity in spatial
reasoning. The 2D C+-string representation [22.9] extended the 2D C-string representation by adding relative metric
information about the picture to the strings. As a consequence, reasoning about relative sizes and locations of
objects, as well as the relative distance between objects in a symbolic picture becomes possible. Chang [22.3]
proposed a structure called 9DLT to encode the spatial relationships between objects in terms of nine directions.
Since the 9DLT method uses the centroid to represent the position of an object, such a representation is too sensitive for spatial reasoning. For example, the spatial relationships between the two objects shown in Figs. 1a, 1b, and 1c are all different in the 9DLT representation; however, they appear barely different to human visual perception.
The representation of spatial relations proposed by Zhou and Ang [22.28] combines the nine directional
relations proposed in 9DLT with the five topological relations, namely, disjoint, meet, partly-overlap, contain, and
inside. The topological relation can record the 2D relationship between any two nonzero-sized objects with irregular
shapes and, therefore, makes spatial reasoning more accurate as compared to using MBR or centroid to represent an
object. However, Zhou and Ang’s method still has the problem with being too sensitive when reasoning about
directional relations. Instead of combining the nine directional relations with the five topological relations, the 2D-PIR proposed by Nabil et al. [22.19], [22.20] combines the 13 projection interval relations with the topological relations. Although 2D-PIR seems particularly useful in similarity retrieval, it provides no picture reconstruction mechanism for visualization. Besides, incorporating 2D-PIR into an indexing structure is difficult; thus, similarity retrieval based on 2D-PIR becomes inefficient as the volume of images in the database increases.
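The reduction of similarity retrieval to string subsequence matching, mentioned above for 2D strings, can be illustrated with a simplified one-dimensional stand-in (actual 2D string matching applies the same idea along both axes):

```python
# A query string matches a picture string when its symbols occur in order.
def is_subsequence(query, picture):
    it = iter(picture)
    # `sym in it` advances the iterator, so symbol order is enforced
    return all(sym in it for sym in query)
```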
9D-SPA Representation
The picture has to be preprocessed first. We assume that the objects in a picture can be identified by some
image segmentation and object recognition procedures. Various techniques of image segmentation and object
recognition can be found in [22.2]. Suppose that a picture P contains n objects (O1, O2, ..., On). Then, the 9D-SPA representation of P can be encoded as a set of 4-tuples R = {(Oij, Dij, Dji, Tij) | Oi, Oj ∈ P and 1 <= i < j <= n}, where Oij is the code for the object-pair (Oi, Oj), Dij is the code for the direction relation between objects Oi and Oj with Oj as the reference object, Dji is the code for the direction relation between Oi and Oj with Oi as the reference object, and Tij is the code for the topological relation between Oi and Oj. Obviously, the number of 4-tuples in R is n(n-1)/2.
Let Oi be the ith object in the image database (1 <= i <= n). We assign integer i to object Oi as its object number. Then, Oij is called the object-pair code for the object-pair (Oi, Oj). Given two objects Oi and Oj, the object-pair code Oij can be computed by a simple pairing formula, and the two object numbers i and j can be decoded from Oij by finding the largest integer a satisfying the corresponding inequality.
Dij represents the value assigned to the directional relationship between objects Oi and Oj with Oj as the reference object. The value of Dij is determined by the following procedure. First, we find the Minimal Bounding Rectangle (MBR) of the reference object Oj. Then, we extend the four boundaries of this MBR horizontally and vertically until
they cut the whole picture into nine neighborhood areas, and we assign each area a binary code as shown in Table 1. The value of Dij is then determined by the formula Dij = Σ_k bk · wk, where wk is the binary code of neighborhood area k, and bk = 1 if object Oi overlaps area k and bk = 0 otherwise. The value of Tij indicates the topological relationship between objects Oi and Oj. Possible values assigned to topological relations are: 0 (stands for “disjoint”), 1 (stands for “meet”), 2 (stands for “partly_overlap”), 3 (stands for “cover”), and 4 (stands for “contain” or “inside”).
Fig. 2. Pictures (a) and (b) are not distinguishable in all 2D+-string representations. However, the difference can be
easily determined by the 9D-SPA representation.
Let us look at the two pictures shown in Figs. 2a and 2b. Assume that object B is the reference object in
both pictures. Then, in Fig. 2a, the code for DAB is
(00001000 + 00010000+ 00100000 + 01000000 + 10000000)2 = 248
and the code for TAB is 0. In Fig. 2b, the code for DAB is
(00000001 + 00000010 + 00100000 + 01000000 + 10000000)2 = 227
and the code for TAB is 0. In 2D+-string representations, the pictures in Figs. 2a and 2b are not distinguishable
because they have the same spatial representation (i.e., A%B in both x and y-directions). However, we can easily tell
the difference between them by using 9D-SPA representation because DAB in Fig. 2a is 248, while DAB in Fig. 2b is
227. Moreover, from DAB = 248= (11111000)2, we can easily determine that object A spans five neighborhood areasof object B, namely, the northwest, the west, the southwest, the south, and the southeast neighborhood areas as
shown in Fig. 2a. Similarly, from DAB = 227= (11100011)2, we can easily determine that object A spans another
different five neighborhood areas of object B, namely, the northeast, the east, the southeast, the south, and the
southwest neighborhood areas as shown in Fig. 2b.
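The arithmetic of the direction code in this example can be sketched as follows. The weight assigned to each neighborhood area is an assumption chosen to be consistent with the worked values 248 and 227; the paper's Table 1 may order the areas differently.

```python
# Hypothetical binary weights for the nine neighborhood areas around the
# reference object's MBR (consistent with Figs. 2a and 2b, but assumed).
WEIGHTS = {"NE": 1, "E": 2, "N": 4, "NW": 8, "W": 16,
           "SW": 32, "S": 64, "SE": 128, "center": 256}

def direction_code(overlapped_areas):
    """D_ij = sum of w_k over the neighborhood areas k that the object
    overlaps (b_k = 1), i.e. D_ij = sum(b_k * w_k)."""
    return sum(WEIGHTS[a] for a in overlapped_areas)

# Fig. 2a: A spans NW, W, SW, S, SE of B  ->  248 = (11111000)2
# Fig. 2b: A spans NE, E, SE, S, SW of B  ->  227 = (11100011)2
```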
III. Content Based Image Retrieval and Spatial Data Mining for Medical data
In database systems for supporting contemporary advanced applications like medical image analysis and
disease detection and prediction systems, the techniques of content based image retrieval and of spatial data mining
in images are of much importance. Similar techniques are applicable to applications like surveillance systems and
GIS-based decision support systems. As an example, Chung and Wang [25] discuss the creation of a skin cancer image database using a three-tier system.
Database Design
An automatic segmentation method for the images of skin tumors is developed in [25.2]. This method first
reduces a color image into an intensity image and then finds an approximate segmentation by intensity thresholding.
Finally, it refines the segmentation using image edges. One table is designed for this skin cancer database to store
the features of the tumors. Besides the tumor features, some other attributes are added into the table. These include
record number as a primary key of this table, patient id number, the date that the image was taken, the image id
number to identify the image, and the image file name. Image file names are stored in the database instead of the image files themselves. Although images can be stored in the database as a BLOB type, this approach is more flexible because image files can be stored elsewhere, such as on a multimedia server, and a DBMS can be easily integrated with multimedia servers. One advantage is that it is easy to integrate multimedia files with existing databases; another is that non-database applications can access those multimedia files without going through the database. During browsing or content-based retrieval, Java applets find and display the images using the file names stored in the database. The skin cancer database can be used for medical information retrieval, expert
diagnosis, and medical pattern discovery.
Image Feature Definitions
Irregularity is associated with skin malignancies, including malignant melanoma, but until now it has been defined only in subjective terms, such as jagged, notched, not smooth, or not round. One common way to measure irregularity (I) is

I = p² / (4πA),

where p and A are the perimeter and area, respectively [25.3]. Asymmetry is determined about the near-axis of symmetry by comparing absolute area differences to the total area of the tumor shape [25.4]. Entropy is a feature which measures the randomness of the gray-level distribution. It is defined as

Entropy = −Σ_i Σ_j P[i, j] log P[i, j]   [25.5]
P[i, j] is the gray-level co-occurrence matrix. It is defined by first specifying a displacement vector d = (dx, dy) and counting all pairs of pixels separated by d having gray levels i and j. Entropy is highest when all entries in P[i, j] are equal [25.5].

Energy is defined as

Energy = Σ_i Σ_j P[i, j]²   [25.5]

Homogeneity is defined as

Homogeneity = Σ_i Σ_j P[i, j] / (1 + |i − j|)   [25.5]

Inertia is defined as

Inertia = Σ_i Σ_j (i − j)² P[i, j]   [25.6]
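The four co-occurrence features above can be computed directly from the matrix P. The following sketch uses the standard GLCM definitions; the logarithm base for entropy is a free choice here.

```python
import numpy as np

def glcm_features(P):
    """Entropy, energy, homogeneity and inertia of a gray-level
    co-occurrence matrix P[i, j] (normalized internally)."""
    P = P / P.sum()
    i, j = np.indices(P.shape)
    nz = P > 0                                   # avoid log(0)
    entropy = -np.sum(P[nz] * np.log2(P[nz]))
    energy = np.sum(P ** 2)
    homogeneity = np.sum(P / (1.0 + np.abs(i - j)))
    inertia = np.sum((i - j) ** 2 * P)
    return entropy, energy, homogeneity, inertia
```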
Database Browsing and Retrieval
The project is implemented in a three-tier architecture. The applets running in the browser form the front layer, the web server is the middle layer, and the backend database server is the third layer. The JDBC-ODBC bridge is used for the
communication between web server and database server. Users can retrieve images by their content, i.e. by
specifying the attribute values or by using a synthesized color.
Data mining, which is also referred to as knowledge discovery in databases, means a process of nontrivial
extraction of implicit, previously unknown and potentially useful information from databases. Its goal is to extract
significant patterns or interesting rules from databases. Data mining can be broadly classified into three categories:
Classification (clustering)---finding rules that partition the database into finite, disjoint, and previously known (unknown) classes. Sequences---extracting commonly occurring sequences in ordered data. Association rules (a form of summarization)---finding the sets of most commonly occurring groupings of items [25.7]. In this project, mining association rules in a skin cancer database has been implemented.
IV. Image Analysis for brain data
Studies of schizophrenia, Parkinson’s disease, Alzheimer’s disease and other illnesses caused by disruption
of brain functions, are often based on collections of brain images, usually obtained at different resolutions through
computed tomography for human subjects, or through surgical procedures for other species. Atlases have been a common way to organize such image series. Multiple examples of such atlases with corresponding image
segmentation and 2D and 3D visualization techniques have been developed [e.g., for mouse brain: 24.3, 24.4, 24.8,
24.12, 24.13]; several are available online [24.11]. A comprehensive list of available brain data sources and atlases
is maintained by [24.6]. In [24], principles and techniques that enable spatial data interoperability, including spatial
registration, discovery, query, and visualization across brain data sources have been explored.
V. XML based spatial data management for Geo-Computation in distributed systems
Today, geo-computation research such as data mining and knowledge discovery places greater demands on the underlying data infrastructure. A new data infrastructure which is distributed, extensible, and platform-independent is needed to provide more powerful and flexible data services for geo-computation research and applications. Grid computing is a new research agenda which evolved from distributed computing and meta-computing; it tries to provide virtual computation resources by decoupling the power of resources from the computer hardware and software. When grid computing technology is adopted in the research domain of spatial information and geo-computation, the Spatial Information Grid (SIG) [26.6] is proposed and studied. In SIG, computing power, data, models, algorithms, and other resources are shared and assembled as abstract resources through a series of middleware, toolkits, and infrastructure. It aims to be a powerful and easy-to-use infrastructure for spatial information applications. An SIG-based 4-layered data infrastructure built up from data nodes, data sources, data agencies, support libraries, and other components is proposed in [26]. It is distributed in both geography and management: the infrastructure has many service nodes distributed geographically, and the nodes are managed in a distributed fashion by their owners instead of by a centralized organization. The SIG-based data infrastructure is designed with XML schemas for cooperation and implemented in the Java language, so that it can run on almost any platform and supports any type of data source.
Architecture
The SIG-based data infrastructure also adopts SOA (service-oriented architecture) as its foundation and regards all components, including data sources and data agencies, as web services. Logically, it takes the 4-layered architecture shown in Figure 1. By invoking a web service with a well-defined XML-based protocol, data stored in a data node can be searched and accessed. In order to keep the design of the data infrastructure simple and neat, the data agencies are required to adopt the same protocol as the data sources. This protocol is called the “eXtensible Data Accessing Language,” or XDAL for short.
The eXtensible Data Accessing Language
The data infrastructure shares spatial data of different types, with different formats, and for different goals in a uniform infrastructure. Because the data are often stored on different platforms, the data sources have to be invoked through a platform-independent protocol such as SOAP. Furthermore, a well-defined extensible data accessing language which suits any data source and any data type is needed for data source access based on SOAP. In order to make it platform-independent, XML should be adopted as its format.
There are several frequently used operations on a data source or data agency: searching data, downloading
data, and querying its capability description. The grammars and usage rules of request and response for these
operations are standardized by the well-defined XDAL in XML schema. Users can accomplish the operation by
invoking the web service provided by data source or data agency, passing the request to it, and analyzing the
response for the result.
In XDAL, a request should have a root element named <query>, <access>, <getCapability>, <getStatus>, or <getResult>, corresponding to the functions of searching data, downloading data, and getting the capability description of a data source, and to the commands of getting operation status and getting an operation result. A response should have a root element named <response>, <status>, or <result>, for the responses to starting an operation, getting operation status, and getting an operation result. Figure 3 is a sample request searching for data from the satellite “Landset-7” with a given acquirement date.
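Since the Figure 3 sample is not reproduced here, the following hypothetical sketch shows what such an XDAL <query> request might look like when built programmatically; only the <query> root element comes from the text, and all other element names and values are assumptions.

```python
import xml.etree.ElementTree as ET

# Build a hypothetical XDAL <query> request; only the <query> root is
# specified by XDAL as described above, the rest is illustrative.
query = ET.Element("query")
ET.SubElement(query, "satellite").text = "Landset-7"   # spelled as in the text
date = ET.SubElement(query, "acquirementDate")          # hypothetical element
date.text = "2002-07-15"                                # hypothetical date
request = ET.tostring(query, encoding="unicode")
```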
Considering the extensibility of the system, XDAL is designed as an XML-based extensible language. The <query>, <access>, and <getCapability> requests and the <result> response can all be extended substantially. By extending the requests and responses, XDAL can be made to suit almost all kinds of data sources and geo-computation applications.
The paper further provides some useful data sources, data agencies, and support libraries. A test of the user interface of the data infrastructure shows that it can organize the distributed data nodes and data agencies dynamically; build an extensible, robust, and autonomic data infrastructure; and serve users one-stop as an organic whole.
On Management of Object Databases in Distributed, Real-Time Environments
I. High-performance data management support for real-time visualization of time-varying flow fields, e.g., blood flow
Aneurysm surgery remains dangerous because surgeons have limited knowledge of the 3D geometry of an aneurysm and its complex, time-dependent hemodynamic factors such as flow, shear force, and pressure. This information is essential to determine whether the aneurysm is suitable for a certain surgical procedure. The handling of large amounts of data in real-time virtual reality visualization systems is an important and complex problem of high significance. Navigation and exploration of such large datasets stress computational resources, requiring users and visualization systems to make tradeoffs between time, space, and flexibility.
To make it possible for physicians to obtain such information, Liu and Karplus [26] have designed and
developed a Virtual Aneurysm (VA) system that supports an interactive exploration environment suited for the
particular needs of brain aneurysm specialists and directly assists them in their investigations. VA system mainly
consists of a client-server configuration that provides an immersive environment allowing a physician to move
around and into an aneurysm, interactively navigate to explore its complex computer simulated fluid dynamics
within the vascular system using virtual reality and scientific visualization techniques [26.3].
VA system description in brief
The VA system is based on numerical solutions of the Navier-Stokes equations for three-dimensional time-varying flows. Flow simulations are computed over time as the heart goes through its pumping cycle. To ensure numerical stability, simulations are computed with a small time step size, such that only a very small fraction of the total data changes value from one step to the next. Adding the time dimension drastically increases the dataset size, increasing storage requirements and computational complexity. Simulations typically run
for tens or hundreds of hours on high-performance computing machines and periodically generate snapshots of states. The large quantities of simulated data are subsequently stored in archives on disk. After data are off-loaded,
they are analyzed and post-processed using scientific visualization and animation techniques to explore the evolving
state of the simulated fluid dynamics within the vascular system from local graphics workstations. Data of such
unprecedented size often exceeds the memory and performance capacity of typical desktop graphics workstations.
The frame rate is the frequency with which the renderer processes new frames. The frame rate of a visualization should be kept as constant and as high as possible so that the animation is smooth. The greater the frame
rate, the more frames there are and the more work is required to produce the animation. Typical visualizations require a significant amount of I/O bandwidth for accessing data at different time steps when there is not enough memory space for the entire time sequence. The results of data access must be communicated to the graphics workstations for display, which not only causes significant data movement across slow networks but also involves complex human-computer interactions.
Data Representation
Successful visualization systems must be designed to handle datasets of arbitrary size. A method has been exploited for producing a hierarchy of representations of reduced data at various levels of detail, which retain as many of the essential features of the original data as possible but are small enough to be loaded one chunk at a time into main memory. This multi-resolution data reduction method was further expanded to allow any number of data variables by developing an algorithm for constructing octree-like data structures that measure the error introduced in the multivariate data, ensuring that the errors in the multivariate data do not exceed a pre-defined upper bound.
Figure 1: Data flows through the visualization pipeline.
Figure 2: Schematic of the flow visualization environment.
Octrees, like quadtrees, are hierarchical data structures based on decomposition of space. In quadtrees,
space is recursively subdivided into four subregions [26.6]. Octrees are three-dimensional extensions of quadtrees,
where space is recursively subdivided into eight subvolumes [26.4]. The octree-based approach illustrates the
advantage of a regular partition of the 3D space. A hierarchical partition of the space into octants and suboctants,
down to any desired level of granularity, provides a general-purpose scheme for organizing the space as a skeleton
to which any kind of spatial data can be attached for systemic access. This skeleton supports multi-resolution
visualization of large three-dimensional datasets so that the current regions of interest are always displayed at a
higher resolution than the rest. In many instances, these higher resolution regions make up a relatively small
percentage of the entire data, which accelerates the visualization with only slight degradation of image quality. The notion of multi-resolution is also a key concept for controlling the traffic of network connections to guarantee quality of service (QoS) in many multimedia applications. Octree nodes are partitioned and then restructured into many page-sized blocks for efficient storage on disk. For simple partitioning, tree nodes are visited in depth-first order and are accumulated into the current block if the number of nodes does not exceed the block size. The traversal recursively descends the tree and continues. If a node would overfill the current block, the current block is closed, leaving it slightly unfilled, and work starts on the next block. Because page size can be controlled, any size data
can be run on any size computer, with good scalable characteristics as the computer system grows in memory,
computational power, and data bandwidth. Furthermore, each page-sized data is loaded into main memory one time
step ahead of when it is actually required, resulting in smooth streaming of data from storage device to graphics
pipeline.
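The depth-first packing rule described above can be sketched as follows; the node structure and block size are illustrative, not taken from the original system.

```python
class Node:
    def __init__(self, children=None):
        self.children = children or []

def pack_blocks(root, block_size):
    """Visit octree nodes depth-first, accumulating them into blocks of at
    most block_size nodes; a node that would overfill the current block
    starts a new one, leaving the previous block slightly unfilled."""
    blocks, current = [], []

    def visit(node):
        nonlocal current
        if len(current) >= block_size:
            blocks.append(current)
            current = []
        current.append(node)
        for child in node.children:
            visit(child)

    visit(root)
    if current:
        blocks.append(current)
    return blocks
```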
II. Data Management for Distributed Moving Object databases
The need to manage massive volumes of continuously produced information is increasing rapidly in many emerging applications, such as Location-Based Services (LBS), stream data processing, wireless sensor networks, and RFID-enabled ubiquitous computing. To realize location-based services, it is essential to develop efficient management schemes for the location information of heavy volumes of moving objects, where a moving object can be anything that can change its position, including a human, a piece of equipment, or a vehicle. There have been many related research efforts, but most current research activities are single-node oriented, making it difficult to handle extreme situations that must cope with a very large volume, at least millions, of moving objects. The architecture named the Gracefully Aging Location Information System (GALIS) is a cluster-based distributed computing system architecture consisting of multiple data processors, each dedicated to keeping records relevant to a different geographical zone and a different time zone. Much further work has been done in [27, 27.7, 27.8, 27.9, 27.10].
CONCLUSION
Various data processing and computing needs, based on the different data characteristics and processing typicalities posed by the requirements of different applications and computing environments, have been studied and presented above. Mostly, an approach addressing the needs of a particular application requires focusing on the design of a proper data representation (data structure). This must be done with regard to the efficiency of the various operations required to be performed over that data, as expected by the users of the application. As we can observe, the major operations include search over the data. To provide efficient search, the core data representation may be supported with proper indexing. Further, recent applications mostly demand descriptive as well as predictive learning from the data sets. This requires techniques for such queries over data forms ranging from static text, image, audio, and raw
bit streams to data describing time-varying phenomena, e.g. moving objects. To solve such problems, several
techniques have been applied, devised, invented, and are still being studied. Integrating Pattern Recognition,
Machine Learning, and Image Processing techniques [30,31,32,33,34] with emerging Computational Intelligence
and Optimization techniques holds much promise for meeting the growing needs of industry. Among the
applications studied, improvement of spatial-data-mining techniques was found to be of particular importance;
it addresses a wide range of application areas including multimedia applications, geographic information systems
(GIS), medical image processing, surveillance systems, and many more. Statistical methods play a vital role in all
of the above solutions. For example, the recently growing techniques of Swarm Intelligence, an extension of
evolutionary computing, blend algorithm-design techniques with statistical techniques [24,36,37,38,39,40].
Particle Swarm Optimization (PSO), a population-based stochastic optimization technique, is a remarkable
example that solves many such problems with good efficiency. Also in demand are programming languages and
environments that expose, through user-friendly interfaces, the functionality devised (by applying the underlying
mathematical and computing techniques) for a particular application area. All such efforts may contribute
considerably to the development of Relational, Object-Oriented, and Object-Relational Database Management
Systems, adding to their applicability in the respective fields of contemporary applications.
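As a concrete illustration of the swarm-based stochastic optimization discussed above, here is a minimal PSO loop in its standard textbook form; the objective function and coefficient values are illustrative choices, not taken from any of the cited works:

```python
import random

# Minimal Particle Swarm Optimization sketch: a swarm of candidate solutions
# moves through the search space, each particle steering toward a blend of
# its own best-known position and the swarm's best-known position, with
# random perturbation. Coefficients w, c1, c2 are common textbook defaults.

def pso(objective, dim=2, swarm_size=20, iters=200,
        w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]                    # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(swarm_size), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best

    for _ in range(iters):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Minimize the sphere function; the optimum is 0 at the origin.
best, best_val = pso(lambda p: sum(x * x for x in p))
```

The same loop structure applies when the objective scores a clustering or a classifier polynomial, as in the swarm-based data-mining work cited below.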
References:
[1] Heiko Schwarz, Detlev Marpe, and Thomas Wiegand, “Overview of the Scalable H.264/MPEG4-AVC Extension”, 2007, Fraunhofer
Institute for Telecommunications – Heinrich Hertz Institute, Image Processing Department
[2] Livio Lima, Daniele Alfonso, Luca Pezzoni, Riccardo Leonardi, “New fast search algorithm for base layer of H.264 scalable video
coding extension”, 2007 Data Compression Conference (DCC'07), IEEE.
[5] Jeongkyu Lee, Department of Computer Science and Engineering, University of Bridgeport, “A Graph-based Approach for
Modeling and Indexing Video Data”, Proceedings of the Eighth IEEE International Symposium on Multimedia (ISM'06)
[6] Zhan Chaohui, Duan Xiaohui, Xu Shuoyu, Song Zheng, Luo Min, “An Improved Moving Object Detection Algorithm Based on
Frame Difference and Edge Detection”, Fourth International Conference on Image and Graphics, 2007.
[7] Seung-Ho Lim, Man-Keun Seo and Kyu Ho Park, “Scrap: Data Reorganization and Placement of Two Dimensional Scalable Video
in a Disk Array-based Video Server”, Computer Engineering Research Laboratory, Department of Electrical Engineering and
Computer Science, KAIST, Ninth IEEE International Symposium on Multimedia 2007 - Workshops
[8] Siddhartha Chattopadhyay, Suchendra M. Bhandarkar, Member, IEEE, and Kang Li, “Human Motion Capture Data Compression by
Model-Based Indexing: A Power Aware Approach”, IEEE Transactions On Visualization And Computer Graphics, Vol. 13, No. 1,
January/February 2007.
[9] Yu-lung Lo, Chun-hsiung Wang, “Hybrid Multi-Feature Indexing for Music Data Retrieval”, 6th IEEE/ACIS International Conference on
Computer and Information Science (ICIS 2007).
[10] Daniel Howard, Joseph Kolibal, “Image Analysis by Means of the Stochastic Matrix Method of Function Recovery”, 2007 ECSIS
Symposium on Bio-inspired, Learning, and Intelligent Systems for Security
[11] Peng Tang, Lin Gao and Zhifang Liu, “Salient Moving Object Detection Using Stochastic Approach Filtering”, Fourth
International Conference on Image and Graphics, IEEE, 2007.
[12] Jian-Kang Wu, “Content-Based Indexing of Multimedia Databases”, IEEE Transactions On Knowledge And Data Engineering, Vol.
9, No. 6, November/December 1997.
[13] S.R. Subramanya, Rahul Simha, B. Narahari, Abdou Youssef, “Transform-Based Indexing of Audio Data for Multimedia
Databases”, IEEE, 1997.
[14] Jusub Kim, Joseph JaJa, “Information-Aware 2n-Tree for Efficient Out-of-Core Indexing of Very Large Multidimensional
Volumetric Data”
[15] Yukio Hiranaka, Hitoshi Sakakibara and Toshihiro Taketa, “Universal Communication Format for Multimedia Data”,
Proceedings of the Sixth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA’05)
[16] Vit Niennattrakul, Chotirat Ann Ratanamahatana, “On Clustering Multimedia Time Series Data Using K-Means and Dynamic Time Warping”,
2007 International Conference on Multimedia and Ubiquitous Engineering (MUE'07)
[17] Jian Yin, Duanning Zhou and Qiong-Qiong Xie, “A Clustering Algorithm For Time Series Data”, Proceedings of the Seventh
International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'06), 2006
[18] Guoqiang Xiong, Qingjing Gao, “An Algorithm of Similarity Mining in Time Series Data on the Basis of Grey
Markov SCGM(1,1) Model”, 2007 IFIP International Conference on Network and Parallel Computing - Workshops
[19] Ryo Yoshida, Seiya Imoto, Higuchi, “Estimating Time-Dependent Gene Networks from Time Series Microarray Data by Dynamic Linear Models
with Markov Switching”, Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference (CSB’05)
[20] Battuguldur Lkhagva, Yu Suzuki and Kyoji Kawagoe, “New Time Series Data Representation ESAX for Financial Applications”, Proceedings of the
22nd International Conference on Data Engineering Workshops (ICDEW'06), 2006.
[21] Yuhang Wang, Fillia Makedon, “R-Histogram: Quantitative Representation of Spatial Relations for Similarity-Based Image Retrieval”
[22] Po-Whei Huang and Chu-Hui Lee, “Image Database Design Based on 9D-SPA Representation for Spatial Relations”, IEEE Transactions On
Knowledge And Data Engineering, Vol. 16, No. 12, December 2004
[23] Keith Marsolo, Michael Twa, “Classification of Biomedical Data Through Model-based Spatial Averaging”, Proceedings of the 5th IEEE
Symposium on Bioinformatics and Bioengineering (BIBE’05), 2005.
[24] Ilya Zaslavsky, Haiyun He, Joshua Tran, Maryann E. Martone, Amarnath Gupta, “Integrating Brain Data Spatially: Spatial Data Infrastructure and
Atlas Environment for Online Federation and Analysis of Brain Images”, Proceedings of the 15th International Workshop on Database and Expert
Systems Applications (DEXA’04), 2004
[25] Soon M. Chung and Qing Wang, “Content-based Retrieval and Data Mining of a Skin Cancer Image Database”, Proceedings of the
International Conference on Information Technology: Coding and Computing (ITCC '01), 2001.
[26] Damon Liu and Walter Karplus, “Data Management for Exploring Complex Time-Dependent Flow Datasets”, Proceedings of the International
Conference on Information Technology: Coding and Computing (ITCC '01), 2001.
[27] Ho Lee, Jaeil Hwang, Joonwoo Lee, Seungyong Park, Chungwoo Lee, Yunmook Nah, “Long-term Location Data Management for Distributed
Moving Object Databases”, Proceedings of the Ninth IEEE International Symposium on Object and Component-Oriented Real-Time Distributed
Computing, 2006
[28] Narendra Ahuja, Jack Veenstra, “Generating Octrees from Object Silhouettes in Orthographic Views”, IEEE Transactions On Pattern Analysis And
Machine Intelligence, Vol. 11, No. 2, February 1989, p. 137.
[29] Qingmin Shi, Joseph JaJa, “Isosurface Extraction and Spatial Filtering Using Persistent Octree (POT)”, IEEE Transactions On Visualization And
Computer Graphics, Vol. 12, No. 5, September/October 2006
Books/e-Books available: Data Mining / Machine Learning / Image Processing
[30] Ian H. Witten & Eibe Frank, “Data Mining: Practical Machine Learning Tools and Techniques”, 2/e, Morgan Kaufmann Publishers
[31] Daniel T. Larose, “Discovering Knowledge in Data: An Introduction to Data Mining”, Wiley Interscience.
[32] Acharya and Roy, “Image Processing - Principles and Applications”, Wiley Interscience.
[33] Euripides G.M. Petrakis, “Image Representation, Indexing and Retrieval Based on Spatial Relationships and Properties of Objects”,
PhD dissertation, Department of Computer Science, University of Crete.
[34] Dwayne Phillips, “Image Processing in C”, 2/e, R&D Publications / Miller-Freeman Inc. / CMP Media Inc.
[35] Martin H. Trauth, “MATLAB Recipes for Earth Sciences”, 2/e, Springer.
More research papers: Statistical Techniques
[36] Tiago Sousa, Ana Neves, Arlindo Silva, “Swarm Optimisation as a New Tool for Data Mining”, Proceedings of the International Parallel and
Distributed Processing Symposium (IPDPS’03)
[37] Gianluigi Folino, Agostino Forestiero, Giandomenico Spezzano, “Swarming Agents for Discovering Clusters in Spatial Data”, Proceedings of the
Second International Symposium on Parallel and Distributed Computing (ISPDC’03)
[38] A.K. Jain, M.N. Murty, P.J. Flynn, “Data Clustering: A Review”, ACM Computing Surveys, Vol. 31, No. 3, September 1999
[39] Bin Gao, Tie-Yan Liu, Wei-Ying Ma, “Star-Structured High-Order Heterogeneous Data Co-clustering based on Consistent Information Theory”,
Proceedings of the Sixth International Conference on Data Mining (ICDM'06)
[40] Bijan Bihari Misra, Suresh Chandra Satapathy, P. K. Dash, “Particle Swarm Optimized Polynomials for Data
Classification”, Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications (ISDA'06)
[41] Ofer Miller, Ety Navon, Amir Averbuch, “Tracking of Moving Objects Based on Graph Edges Similarity”, ICME 2003
References within references
[5.1] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood estimation from incomplete data via the EM algorithm (with discussion).
J. R. Statist. Soc. B, 39:1–38, 1977.
[5.2] S. Lu, M. Lyu, and I. King. Video Summarization by Spatial-Temporal Graph Optimization. In Proceedings of the 2004 International Symposium on
Circuits and Systems, volume 2, pages 197–200, Vancouver, Canada, May 2004.
[5.3] J. Lee, J. Oh, and S. Hwang. STRG-Index: Spatio-Temporal Region Graph Indexing for Large Video Databases. In Proceedings of the 2005 ACM
SIGMOD, pages 718–729, Baltimore, MD, June 2005.
[5.4] H. T. Chen, H. Lin, and T. L. Liu. Multi-object tracking using dynamical graph matching. Proc. of the 2001 IEEE Conf. on CVPR, pages 210–217,
2001.
[6.1] Wang Junqing, Shi Zelin, and Huang Shabai, “Detection of Moving Targets in Video Sequences”. Opto-Electronic Engineering, Dec 2005, pp. 5-8.
[6.2] Ren Mingwu, and Sun Han, “A Practical Method for Moving Target Detection Under Complex Background”. Computer Engineering, Oct 2005, pp.
33-34.
[6.3] Milan Sonka, Vaclav Hlavac, and Roger Boyle, “Image Processing, Analysis, and Machine Vision (Second Edition)”, Posts & Telecom Press,
Beijing, Sep 2003.
[6.4] Zhang Yunchu, Liang Zize, Li En, and Tan Min, “A Background Reconstruction Algorithm Based on C-means Clustering for Video Surveillance”,
Computer Engineering and Application, 2006, pp. 45-47.
[7.1] P. Shenoy and H. M. Vin. Efficient support for interactive operations in multiresolution video server. ACM Multimedia Syst., 7(3):241–253, Nov.
1999.
[7.2] S. Lim, Y. Jeong and K. Park. Interactive media server with media synchronized RAID storage system. Proc. of International Workshop on
Network and Operating Systems Support for Digital Audio and Video 2005, Jun. 2005.
[7.3] E. Chang and A. Zakhor. Disk-based storage for scalable video. IEEE Trans. on circuits and systems for video technology, 7(5):758–770, Oct.
1997.
[7.4] R. Rangaswami, Z. Dimitrijevic, E. Chang and S.-H. G. Chan. Fine-grained Device Management in an Interactive Media Server. IEEE
Transactions on Multimedia, Vol. 5, No. 4, pages 558-569, Dec. 2003.
[7.5] S. Kang, Y. Won and S. Roh. Harmonic placement: file system support for scalable streaming of layer encoded object. Proc. of International
Workshop on Network and Operating Systems Support for Digital Audio and Video 2006, May 2006.
[8.1] ISO/IEC 14496-1:1999, “Coding of Audio-Visual Objects, Systems,” Amendment 1, Dec. 1999.
[8.2] ISO/IEC 14496-2:1999, “Coding of Audio-Visual Objects, Visual,” Amendment 1, Dec. 1999.
[8.3] M. Preda, A. Salomie, F. Preteux, and G. Lafruit, “Virtual Character Definition and Animation within the MPEG-4 Standard,” 3D Modeling and
Animation: Synthesis and Analysis Techniques for the Human Body, M. Strintzis and N. Sarris, eds., chapter 2, pp. 27-69, IRM Press, 2004.
[8.4] S. Chattopadhyay, S.M. Bhandarkar, and K. Li, “Efficient Compression and Delivery of Stored Motion Data for Virtual Human Animation in
Resource Constrained Devices,” Proc. ACM Conf. Virtual Reality Software and Technology (VRST ’05), pp. 235- 243, Nov. 2005.
[8.5] M. Endo, T. Yasuda, and S. Yokoi, “A Distributed Multi-User Virtual Space System,” IEEE Computer Graphics and Applications, vol. 23, no. 1, pp.
50-57, Jan./Feb. 2003.
[8.6] T. Hijiri, K. Nishitani, T. Cornish, T. Naka, and S. Asahara, “A Spatial Hierarchical Compression Method for 3D Streaming Animation,” Proc. Fifth
Symp. Virtual Reality Modeling Language (Web3D-VRML), pp. 95-101, 2000.
[8.7] T. Giacomo, C. Joslin, S. Garchery, and N. Magnenat-Thalmann, “Adaptation of Facial and Body Animation for MPEG-Based Architectures,” Proc.
Int’l Conf. Cyberworlds, p. 221, 2003.
[8.8] A. Aubel, R. Boulic, and D. Thalmann, “Animated Impostors for Real-Time Display of Numerous Virtual Humans,” Proc. First Int’l Conf. Virtual
Worlds (VW ’98), vol. 1434, pp. 14-28, 1998.
[8.9] O. Arikan, “Compression of Motion Capture Database,” Proc. ACM Trans. Graphics (ACM TOG), vol. 25, no. 3, pp. 890-897, 2006.
[9.1] James C.C. Chen and Arbee L.P. Chen, “Query by Rhythm: An Approach for Song Retrieval in Music Databases,” In Proc. of Int’l Workshop on
Research Issues in Data Engineering, Pages 139-146, 1998.
[9.2] Arbee L.P. Chen, M. Chang, J. Chen, J.L. Hsu, C.H. Hsu, and Spot Y.S. Hua, “Query by Music Segments: An Efficient Approach for Song
Retrieval,” In Proc. of IEEE Int’l Conf. on Multimedia and Expo, 2000.
[9.4] J.L. Hsu, C.C. Liu, and Arbee L.P. Chen, “Efficient Repeating Pattern Finding in Music Databases,” In Proc. of ACM Int’l Conf. on Information and
Knowledge Management, 1998.
[9.5] C. L. Krumhansl, “Cognitive Foundations of Musical Pitch,” Oxford University Press, New York, 1990.
[9.6] W. Lee and A.L.P. Chen, “Efficient Multi-Feature Index Structures for Music Data Retrieval,” In Proc. Of SPIE Conf. on Storage and Retrieval for
Image and Video Database, 2000.
[9.7] Chia-Han Lin and Arbee L. P. Chen, “Indexing and Matching Multiple-Attribute Strings for Efficient Multimedia Query Processing,” IEEE
Transactions On Multimedia, Vol. 8, No. 2, April 2006.
[9.8] C.C. Liu, J.L. Hsu, and Arbee L.P. Chen, “Efficient Theme and Non-Trivial Repeating Pattern Discovering in Music Databases,” In Proc. of IEEE
Data Engineering, Pages 14-21, 1999.
[9.9] C.C. Liu, J.L. Hsu, and Arbee L.P. Chen, “An Approximate String Matching Algorithm for Content-Based Music Data Retrieval,” In Proc. of IEEE
Int’l Conf. on Multimedia Computing and Systems, Pages 451-456, 1999.
[9.10] Yu-lung Lo and Shiou-jiuan Chen, “The Numeric Indexing For Music Data,” in Proc. of the IEEE 22nd ICDCS Workshops – the 4th Int’l Workshop
on Multimedia Network Systems and Applications (MNSA’2002), Vienna, Austria, Pages 258-263, July 2002.
[9.11] Yu-lung Lo and Shiou-jiuan Chen, “Multi-feature Indexing For Music Data,” in Proc. of the IEEE 23rd ICDCS Workshops – the 5th Int’l
Workshop on Multimedia Network Systems and Applications (MNSA’2003), Providence, Rhode Island, USA, Pages 654-659, May 19-22, 2003.
[9.12] Yu-lung Lo, Ho-cheng Yu, and Mei-chin Fan, “Efficient Non-trivial Repeating Pattern Discovering in Music Databases,” Tamsui Oxford Journal of
Mathematical Sciences, Vol. 17, No. 2, Pages 163-187, Nov. 2001.
[11.1] D. Koller, J. Weber, and J. Malik, Robust Multiple Car Tracking with Occlusion Reasoning, in Proc. ECCV 94. Stockholm, Sweden. 1994.
[11.2] L. Wixson, Detecting salient motion by accumulating directionally-consistent flow, IEEE Trans. Pattern Analysis and Machine Intelligence, 2000.
22(8): p. 774-780.
[11.3] Tian, Y.-L. and A. Hampapur. Robust Salient Motion Detection with Complex Background for Real-time Video Surveillance, in IEEE Computer
Society Workshop on Motion and Video Computing 2005. Breckenridge, Colorado.
[11.4] A. Monnet, A. Mittal, and N. Paragios, Background modeling and subtraction of dynamic scenes, in Proc. ICCV 2003: p. 1305-1312.
[11.5] C. R. Wren, et al., Pfinder: real-time tracking of the human body, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997. 19(7): p. 780-
785.
[11.6] C. Stauffer and W.E.L. Grimson, Learning patterns of activity using real-time tracking, IEEE Trans. Pattern Analysis and Machine Intelligence,
2000. 22(8): p. 747-757.
[11.7] A. Elgammal, D. Harwood, and L.S. Davis. Nonparametric Model for Background Subtraction, in Proc. ICCV Frame-Rate Workshop. 1999.
Kerkyra, Greece.
[11.8] A. Mittal and N. Paragios, Motion-based background subtraction using adaptive kernel density estimation, in Proc. IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, 2004, vol. 2.
[12.5] J.K. Wu, A.D. Narasimhalu, B.M. Mehtre, C.P. Lam, and Y.J. Gao, “CORE: A Content-Based Retrieval Engine for Multimedia Databases,” ACM
Multimedia Systems, vol. 3, pp. 3-25, 1995.
[12.8] C. Faloutsos, M. Flickner, W. Niblack, D. Petkovic, W. Equitz, and R. Barber, “Efficient and Effective Querying by Image Content,” Technical
Report, IBM Research Division, Almaden Research Center, RJ 9453 (83074), Aug. 1993.
[12.9] S.K. Chang, C. Yan, D.C. Dimitroff, and T. Arndt, “An Intelligent Image Database System,” IEEE Trans. Software Eng. vol. 14, pp. 681- 688,
1988.
[12.10] W.I. Grosky and R. Mehrota, “Index-Based Object Recognition in Pictorial Data Management,” CVGIP, vol. 52, pp. 416-436, 1990.
[12.13] T. Kohonen, “The Self-Organizing Map,” Proc. IEEE, vol. 78, pp. 1464-1480, 1990.
[12.15] A. Tversky, “Features of Similarity,” Psychological Rev., vol. 84, pp. 327-352, 1977.
[12.20] J.-K. Wu, F. Gao, and P. Yang, “Model-Based 3D Object Recognition,” Proc. Second Int’l Conf. Automation, Robotics, and Computer Vision,
Singapore, Sept. 1992.
[13.9] E. Wold et al. Content-based classification, search and retrieval of audio data. IEEE Multimedia Magazine, 1996.
[13.10] A. Ghias et al. Query by humming. Proc. ACM Multimedia Conf., 1995.
[14.12] C. Silva, Y. Chiang, J. El-Sana, and P. Lindstrom, “Out-of-core algorithms for scientific visualization and computer graphics,” IEEE Visualization
Course Notes, 2002.
[14.3] P. M. Sutton and C. D. Hansen, “Accelerated isosurface extraction in time-varying fields,” IEEE Transactions on Visualization and Computer
Graphics, vol. 6, no. 2, pp. 98–107, Apr 2000.
[14.4] J. Wilhelms and A. V. Gelder, “Octrees for faster isosurface generation,” ACM Transactions on Graphics, vol. 11, no. 3, pp. 201–227, Jul 1992.
[14.5] J. Vitter, “External memory algorithms and data structures: Dealing with massive data.” ACM Computing Surveys, March 2000.
[14.13] H. Samet, The design and analysis of spatial data structures. Addison Wesley, 1990.
[14.14] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley, 1991.
[15.1] R. Srinivasan, XDR: External Data Representation Standard, RFC1832, 1995.
[15.2] M.P. Singh, Agent Communication Languages: Rethinking the Principles, IEEE Computer, vol.31, no.12, pp.40-47, 1998.
[15.3] Y. Hiranaka and M. Kato, Multimedia Data Representation by the Universal Data Format, Trans. IPSJ Meeting, 4V-9, 3-577/578, 1999.
[15.4] T. Obata, T. Taketa and Y. Hiranaka, Multimedia User Interface, Trans. IPSJ Tohoku Chapter Meeting, 00-4-6, 2001.
[17.1] Anne Denton, “Density-based Clustering of Time Series Subsequences”, In Proceedings The Third Workshop on Mining Temporal and
Sequential Data (TDM 04) in conjunction with The Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle,
WA, Aug. 22, 2004.
[17.2] Darvish A., Bak E., Gopalakrishnan K., Zadeh R.H., Najarian K., “A New Hierarchical Method for Identification of Dynamic Regulatory Pathways
from Time-Series DNA Microarray Data”, Proceedings of The 3rd annual Computational Systems Bioinformatics conference (CSB2004), Stanford, CA,
U.S.A.. pp.602-603, August 16–20, 2004.
[17.3] S. Salvador, P. Chan, J. Brodie, “Learning States and Rules for Time Series Anomaly Detection”, Proc. 17th Intl. FLAIRS Conf, pp.300-305,
2004.
[17.4] Daxin Jiang, Jian Pei, Aidong Zhang, “DHC: A Density-Based Hierarchical Clustering Method for Time Series Gene Expression Data”, BIBE,
pp.393-400, 2003.
[17.5] Pedro Rodrigues, Joao Gama, Joao Pedro Pedroso, “Hierarchical Time-Series Clustering for Data Streams”, First International Workshop on
Knowledge Discovery in Data Streams, 2004
[19.1] T. Miyano and Kuhara. Identification of genetic networks from a small number of gene expression patterns under the Boolean network model.
Pac. Symp. Biocomput., 4, 17-28, 1999.
[19.2] M.J. Beal, F. Falciani, Z. Ghahramani, and D.L. Wild. A Bayesian approach to reconstructing genetic regulatory networks with hidden factors.
BioInformatics, 21(3), 2005
[19.3] T. Chen, H. He and G. Church. Modeling gene expression with differential equations. Pacific Symposium on BioComputing, 1999.
[19.4] C. Rangel, J. Angus, Z. Ghahramani, M. Lioumi, E. Sotheran, A. Gaiba, D.L. Wild, and Falciani. Modeling T-cell activation using gene expression
profiling and state-space models, BioInformatics, 20(9), 2004.
[19.5] M.J.L. de Hoon, Imoto, Kobayashi, Ogasawara, Miyano. Inferring gene regulatory networks from time-ordered gene expression data of Bacillus
subtilis using differential equations. Pac. Symp. Biocomput., 8, 2003.
[19.6] N. Friedman, K. Murphy and S. Russell. Learning the structure of dynamic probabilistic networks. Proc. Conference on Uncertainty in Artificial
Intelligence, 139-147, 1998.
[19.7] S. Kim, S. Imoto and S. Miyano. Inferring gene networks from time series microarray data using dynamic Bayesian networks. Brief. Bioinform.,
4(3):228-235, 2003.
[19.8] S. Kim, S. Imoto and S. Miyano. Dynamic Bayesian network and nonparametric regression for nonlinear modeling of gene networks from time
series gene expression data. Biosystems, 75(1-3), 57-65, 2004.
[19.11] I. Shmulevich, E.R. Dougherty, S. Kim, and W. Zhang. Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory
networks. Bioinformatics, 18(2), 2002.
[20.1] Agrawal, R., Faloutsos, C., & Swami, A. “Efficient similarity search in sequence databases” Proceedings of the 4th Conference on Foundations
of Data Organization and Algorithms. (1993)
[20.2] Chan, K. & Fu, W. “Efficient time series matching by wavelets”, Proceedings of the 15th IEEE International Conference on Data Engineering.
(1999).
[20.3] Lin, J., Keogh, E., Lonardi, S. & Chiu, B. “A Symbolic Representation of Time Series, with Implications for Streaming Algorithms”, In proceedings
of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. (2003).
[20.4] Keogh, E,. Chakrabarti, K,. Pazzani, M. & Mehrotra “Dimensionality reduction for fast similarity search in large time series databases”, Journal of
Knowledge and Information Systems. (2000).
[20.5] Eamonn J. Keogh, Michael J. Pazzani, “An Indexing Scheme for Fast Similarity Search in Large Time Series Databases”, 11th International
Conference on Scientific and Statistical Database Management, 1999
[20.51] Keogh, E., Chakrabarti, K., Pazzani, M. & Mehrotra, S. “Locally adaptive dimensionality reduction for indexing large time series databases”, In
proceedings of ACM SIGMOD Conference on Management of Data. Santa Barbara, CA, May 21-24. pp 151-162. (2001).
[20.6] Keogh, E., Chu, S., Hart, D. & Pazzani, M. “An Online Algorithm for Segmenting Time Series”. In Proceedings of IEEE International Conference
on Data Mining. pp 289-296. (2001).
[20.7] Yi, B-K and Faloutsos, C., “Fast Time Sequence Indexing for Arbitrary Lp Norms”, Proceedings of the VLDB, Cairo, Egypt, Sept, (2000).
[21.1] J. F. Allen. Maintaining knowledge about temporal intervals. Commun. ACM, 26(11):832–843, 1983.
[21.2] I. Bloch and A. Ralescu. Directional relative position between objects in image processing: a comparison between fuzzy approaches. Pattern
Recognition, 36(7):1563–1582, 2003.
[21.3] S.-K. Chang, Q.-Y. Shi, and C.-W. Yan. Iconic indexing by 2-D strings. PAMI, 9(3):413–428, 1987.
[21.4] A. G. Cohn and S. M. Hazarika. Qualitative spatial representation and reasoning: an overview. Fundamenta Informaticae, 46(1-2):1–29, 2001.
[21.5] S. Dagtas and A. Ghafoor. Indexing and retrieval of video based on spatial relation sequences. In Proc. Of ACM Multimedia’99, pages 119–122,
1999.
[21.6] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker. Query
by image and video content: the QBIC system. Computer, 28(9):23–32, 1995.
[21.8] V. N. Gudivada and V. V. Raghavan. Design and evaluation of algorithms for image retrieval by spatial similarity. ACM Trans. on Information
Systems, 13(2):115–144, 1995.
[21.9] J. Keller and X. Wang. Comparison of spatial relation definitions in computer vision. In Proc. of ISUMA - NAFIPS’95, pages 679–684, 1995.
[21.13] M. Nabil, A. H. H. Ngu, and J. Shepherd. Picture similarity retrieval using the 2D projection interval representation. IEEE Trans. Knowl. Data
Eng., 8(4):533–539, 1996.
[21.14] A. Pentland, R. W. Picard, and S. Sclaroff. Photobook: Content based manipulation of image databases. Int’l J. of Computer Vision, 18(3):233–
254, 1996.
[21.15] E. Petrakis, C. Faloutsos, and K.-I. Lin. ImageMap: an image indexing method based on spatial similarity. IEEE Trans. on Knowl. and Data
Eng., 14(5):979–987, 2002.
[21.16] J. Sharma and D. M. Flewelling. Inferences from combined knowledge about topology and directions. In Advances in Spatial Databases,
volume 951 of Lecture Notes in Computer Science, pages 279–291. 1995.
[21.17] C.-R. Shyu and P. Matsakis. Spatial lesion indexing for medical image databases using force histograms. In Proc. of IEEE CVPR’01, pages
603–608, 2001.
[21.18] M. Swain and D. Ballard. Color indexing. Int’l J. of Computer Vision, 7(1):11–32, 1991.
[22.1] J.R. Bach, C. Fuller, A. Gupta, A. Hampapur, B. Horowitz, R. Humphrey, R.C. Jain, and C. Shu, “Virage Image Search Engine: An Open
Framework for Image Management,” Proc. Symp. Electronic Imaging: Science and Technology—Storage & Retrieval for Image and Video Database
IV, pp. 76-87, 1996.
[22.2] B. Bhanu and S. Lee, Genetic Learning for Adaptive Image Segmentation. Norwell: Kluwer Academic, 1994.
[22.3] C.C. Chang, “Spatial Match Retrieval of Symbolic Pictures,” J. Information Science and Eng., vol. 7, pp. 405-422, Dec. 1991.
[22.4] S.K. Chang, E. Jungert, and Y. Li, “Representation and Retrieval of Symbolic Pictures Using Generalized 2D Strings,” technical report, Univ. of
Pittsburg, 1988.
[22.5] S.K. Chang, Q.Y. Shi, and C.W. Yan, “Iconic Indexing by 2-D Strings,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 9, no. 3, pp.
413-428, May 1987.
[22.6] S.K. Chang, Principles of Pictorial Information Systems Design. Englewood Cliffs, N.J.: Prentice-Hall Inc., 1989.
[22.7] Y. Chen and J.Z. Wang, “A Region-Based Fuzzy Feature Matching Approach to Content-Based Image Retrieval,” IEEE Trans. Pattern Analysis
and Machine Intelligence, vol. 24, no. 9, pp. 1252-1267, Sept. 2002.
[22.8] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker,
“Query by Image and Video Content: The QBIC System,” Computer, vol. 28, no. 9, pp. 23-32, Sept. 1995.
[22.10] P.W. Huang and Y.R. Jean, “Design of Large Intelligent Image Database Systems,” Int’l J. Intelligent Systems, vol. 11, pp. 347-365, 1996.
[22.11] P.W. Huang and S.K. Dai, “Image Retrieval by Texture Similarity,” Pattern Recognition, vol. 36, pp. 665-679, 2003.
[22.14] L.J. Latecki and R. Lakamper, “Application of Planar Shape Comparison to Object Retrieval in Image Database,” Pattern Recognition, vol. 35,
pp. 15-29, 2002.
[22.15] S.Y. Lee and F.J. Hsu, “2D C-String: A New Spatial Knowledge Representation for Image Database Systems,” Pattern Recognition, vol. 23, no.
10, pp. 1077-1087, Oct. 1990.
[22.16] S.Y. Lee and F.J. Hsu, “Spatial Reasoning and Similarity Retrieval of Images Using 2D C-String Knowledge Representation,” Pattern
Recognition, vol. 25, no. 3, pp. 305-318, Mar. 1992.
[22.17] K.C. Liang and C.C. Jay Kuo, “WaveGuide: A Joint Wavelet-Based Image Representation and Description System,” IEEE Trans. Image
Processing, vol. 8, no. 11, pp. 1619-1629, 1999.
[22.18] A.K. Majumdar, I. Bhattacharya, and A.K. Saha, “An Object- Oriented Fuzzy Data Model for Similarity Detection in Image Databases,”
IEEE Trans. Knowledge and Data Eng., vol. 14, no. 5, pp. 1186-1189, Sept./Oct. 2002.
[22.21] A. Pentland, R.W. Picard, and S. Sclaroff, “Photobook: Tool for Content-Based Manipulation of Image Databases,” Int’l J. Computer
Vision, vol. 18, no. 3, pp. 233-254, June 1996.
[22.22] G. Petraglia, M. Sebillo, M. Tucci, and G. Tortora, “Virtual Images for Similarity Retrieval in Image Databases,” IEEE Trans. Knowledge and
Data Eng., vol. 13, no. 6, pp. 951-967, Nov./Dec. 2001.
[22.23] E. Petrakis, C. Faloutsos, and K.I. Lin, “ImageMap: An Image Indexing Method Based on Spatial Similarity,” IEEE Trans. Knowledge and Data
Eng., vol. 14, no. 5, pp. 979-987, Sept./Oct. 2002.
[22.24] A. Rao, R.K. Srihari, L. Zhu, and A. Zhang, “A Method for Measuring the Complexity of Image Databases,” IEEE Trans. Multimedia, vol. 4, no.
2, pp. 160-173, Mar./Apr. 2002.
[22.25] Y. Rui, T.S. Huang, “Image Retrieval: Current Techniques, Promising Directions, and Open Issues,” J. Visual Comm. Image Representation,
vol. 10, pp. 39-62, 1999.
[22.26] J.R. Smith and S.F. Chang, “VisualSEEK: A Full Automated Content-Based Image Query System,” Proc. Fourth ACM Int’l Multimedia Conf., pp.
87-98, 1996.
[22.27] J. Vleugels, R.C. Veltkamp, and C. Remco, “Efficient Image Retrieval through Vantage Objects,” Pattern Recognition, vol. 35, pp. 69-80, 2002.
[22.28] X.M. Zhou and C.H. Ang, “Retrieving Similar Pictures from a Pictorial Database by an Improved Hashing Table,” Pattern Recognition Letters,
vol. 18, pp. 751-758, 1997.
[22.29] http://www.annapolistech.com/reseller/retrieval.htm, 2004.
[25.2] L. Xu, M. Jackowski, A. Goshtasby, C. Yu, D. Roseman, and S. Bines, "Segmentation of Skin Cancer Images," Image and Vision Computing,
17(1), 1999, pp. 65-74.
[25.3] J. E. Golston, W. V. Stoecker, R. H. Moss, and I. P. S. Dhillon, "Automatic Detection of Irregular Borders in Melanoma and Other Skin Tumors,"
Computerized Medical Imaging and Graphics, 16(3), 1992, pp. 188-203.
[25.4] W. V. Stoecker, W. W. Li, and R. H. Moss, "Automatic Detection of Asymmetry in Skin Tumors," Computerized Medical Imaging and Graphics,
16(3), 1992, pp. 191-197.
[25.5] R. Jain, R. Kasturi, and B. G. Schunck, Machine Vision, McGraw-Hill, 1995.
[25.6] D. H. Ballard and C. M. Brown, Computer Vision, Prentice-Hall, 1982.
[25.7] P. Adriaans and D. Zantinge, Data Mining, Addison-Wesley, 1996.
[26.3] D. Liu, M. Burgin, and W. Karplus, “Computer support system for aneurysm treatment,” Proc. of the 13th IEEE Symposium on Computer-Based
Medical Systems, Houston, Texas, pp. 13-18, June 2000.
[26.4] D.J. Meagher, “Geometric modeling using octree encoding,” Computer Graphics and Image Processing, vol. 19, no. 2, pp. 129-147, June 1982.
[26.5] D.A. Patterson, P.M. Chen, G. Gibson, and R.H. Katz, “Introduction to Redundant Arrays of Inexpensive Disks (RAID),” Proc. IEEE COMPCON
Spring '89, pp. 112-117, IEEE Computer Society Press, 1989.
[26.6] H. Samet, “The quadtree and related hierarchical data structures,” Computing Surveys, vol. 16, no. 2, pp. 186-260, June 1984.
[27.7] Nah, Y., Wang, T., Kim, K.H., Kim, M.H., and Yang, Y.K., "TMO-structured Cluster-based Real-time Management of Location Data on Massive
Volume of Moving Items," in Proc. STFES 2003, IEEE Press, Hakodate, Japan, May 2003, pp.89-92.
[27.8] Nah, Y., Kim, K.H., Wang, T., Kim, M.H., Lee, J., and Yang, Y.K., "A Cluster-based TMO-structured Scalable Approach for Location Information
Systems," in Proc. WORDS 2003 Fall, IEEE CS Press, Capri Island, Italy, October 2003, pp.225-233.
[27.9] Nah, Y., Lee., J. Lee, W.J., Lee, H., Kim, M.H. and Han, K.J., “Distributed Scalable Location Data Management System based on the GALIS