Search Techniques for Multimedia Databases
Similarity-Based Queries
Similarity Computation
Indexing Techniques
Data Clustering
Search Algorithms
2
Characteristics of Multimedia Queries
• We normally retrieve a few records from a traditional DBMS through the specification of exact queries based on the notion of "equality".
• The types of queries expected in an image/video DBMS are relatively vague or fuzzy, and are based on the notion of "similarity".
The indexing structure should be able to satisfy similarity-based queries for a wide range of similarity measures.
3
Content-Based Retrieval
• It is necessary to extract the features which are characteristic of the image and index the image on these features.
  Examples: shape descriptions, texture properties.
• Typically, there are a few different quantitative measures which describe the various aspects of each feature.
  Example: the texture attribute of an image can be modeled as a 3-dimensional vector with measures of directionality, contrast, and coarseness.
4
Measure of Similarity
A suitable measure of similarity between an image feature vector F and query vector Q is the weighted metric D:

D(F, Q) = (F − Q)^T W (F − Q),

where W is an n×n matrix which can be used to specify suitable weighting measures. For a diagonal W this expands to

D(F, Q) = W1(F1 − Q1)² + W2(F2 − Q2)² + … + Wn(Fn − Qn)²,

the square of the weighted Euclidean distance.
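The weighted metric above can be sketched in a few lines of NumPy (an illustrative addition; function and variable names are my own):

```python
import numpy as np

def weighted_distance(f, q, w):
    """Weighted metric D(F, Q) = (F - Q)^T W (F - Q).

    f, q : feature and query vectors (length n)
    w    : n x n weight matrix (diagonal for per-feature weights)
    """
    d = np.asarray(f, float) - np.asarray(q, float)
    return float(d @ np.asarray(w, float) @ d)

# With W = I this reduces to the squared Euclidean distance.
print(weighted_distance([3, 4, 6], [2, 4, 6], np.eye(3)))  # 1.0
```

With a diagonal W, the i-th diagonal entry weights the squared difference in the i-th dimension, exactly as in the expansion above.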
6
Similarity Based on Euclidean Distance

D(F, Q) = (F − Q)^T A (F − Q), with A the identity matrix.

Example (3-D feature vectors):
F1 = (3, 4, 6), F2 = (2, 4, 7), F3 = (3, 4, 7), Q = (2, 4, 6)

D(F1, Q) = (1, 0, 0) I (1, 0, 0)^T = 1
D(F2, Q) = (0, 0, 1) I (0, 0, 1)^T = 1
D(F3, Q) = (1, 0, 1) I (1, 0, 1)^T = 2

D(F1, Q) = D(F2, Q): F1 and F2 are equally similar to Q.
D(F3, Q) > D(F1, Q): F3 is less similar to Q.
7
Similarity Based on Euclidean Distance (cont.)

Points which lie at the same distance from the query point are considered equally similar, e.g., F1 and F2.

[Figure: F1, F2, F3 and Q plotted against feature 1 and feature 2; features 1 and 2 are treated equally, so equally similar points lie on a circle around Q.]
8
Similarity Based on Weighted Euclidean Distance

D(F, Q) = (F − Q)^T W (F − Q), where W is a diagonal matrix.

Example: F1 = (4, 5, 7), F2 = (3, 5, 8), Q = (3, 5, 7), W = diag(1, 1, 2).

D(F1, Q) = (1, 0, 0) W (1, 0, 0)^T = 1
D(F2, Q) = (0, 0, 1) W (0, 0, 1)^T = 2

D(F1, Q) < D(F2, Q): F1 is more similar to Q; dissimilarity in the 3rd dimension is emphasized.
9
How to determine the weights?

A = diag(1/σ1², 1/σ2², 1/σ3²), where σi² is the statistical variance of the i-th feature measure.

The variance of the individual feature measures can be used to determine their weights.

Rationale:
• Variance characterizes the dispersion among the measures.
• Use a larger weight for a feature with smaller variance.
Effect of Weights
10
[Figure: a cluster of similar objects (e.g., cars) in a shape-color feature space. The feature with the larger variance (e.g., color) gets the smaller weight, so points at the same weighted distance from the query Q, such as F1 and F2, lie on an ellipsoid, not a circle; under the unweighted Euclidean distance, equidistant points would lie on a circle.]
11
• 1-norm distance (Manhattan distance): d(x, y) = Σ |xi − yi|
• 2-norm distance (Euclidean distance): d(x, y) = (Σ |xi − yi|²)^(1/2)
• p-norm distance (Minkowski distance): d(x, y) = (Σ |xi − yi|^p)^(1/p)
• Infinity-norm distance (Chebyshev distance): d(x, y) = max_i |xi − yi|, the maximum distance between any component of the two vectors
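The family of norms above can be computed with one small function (a sketch; names are illustrative):

```python
import numpy as np

def minkowski(x, y, p):
    """p-norm (Minkowski) distance; p=1 Manhattan, p=2 Euclidean,
    p=inf Chebyshev (max component difference)."""
    diff = np.abs(np.asarray(x, float) - np.asarray(y, float))
    if p == float("inf"):
        return float(diff.max())
    return float((diff ** p).sum() ** (1.0 / p))

x, y = [0, 0], [3, 4]
print(minkowski(x, y, 1))             # 7.0  (Manhattan)
print(minkowski(x, y, 2))             # 5.0  (Euclidean)
print(minkowski(x, y, float("inf")))  # 4.0  (Chebyshev)
```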
Common Properties of a Distance
12
• Distances, such as the Euclidean distance, have some well-known properties.
1. Positive definiteness: d(p, q) ≥ 0 for all p and q, and d(p, q) = 0 only if p = q.
2. Symmetry: d(p, q) = d(q, p) for all p and q.
3. Triangle inequality: d(p, r) ≤ d(p, q) + d(q, r) for all points p, q, and r.
where d(p, q) is the distance (dissimilarity) between points (data objects) p and q.
• A distance that satisfies these properties is a metric.
13
Query Types
• k-Nearest-Neighbor (k-NN) Queries: The user specifies the number of close matches to the given query point.
Retrieve 10 images most similar to this sample
• Range queries: An interval is given for each dimension of the feature space and all the images which fall inside this hypercube are retrieved.
[Figure: a range query (hypercube around Q), a vague query around Q with a large radius r and with a small radius r, and a 4-nearest-neighbor query.]
14
Multiattribute and Spatial Indexing
Spatial Databases: Queries involve regions that are represented as multidimensional objects.
Example: A rectangle in a 2-dimensional space is represented by two corner points (X1, Y1) and (X2, Y2), i.e., four values, or a 4-D vector.
Access methods that index on multidimensional keys yield better performance for spatial queries.
15
Multiattribute and Spatial Indexing of Multimedia Objects
Multimedia Databases: Multimedia objects typically have several attributes that characterize them.
Example: Attributes of an image include coarseness, shape, color, etc.
Multimedia databases are also good candidates for multikey search structures.

[Figure: images as points in a 3-D feature space with axes average color, shape, and coarseness.]
16
Indexing Multimedia Objects

• Can't we index multiple features using a B+-tree?
  – A B+-tree defines a linear order (e.g., according to X).
  – Similar objects (e.g., O1 and O2) can be far apart in the indexing order.
• Why multidimensional indexing?
  – A multidimensional index defines a "spatial order".
  – Conceptually similar objects are spatially near each other in the indexing order (e.g., O1 and O2).

[Figure: objects O1–O5 in an X–Y feature space; O5 is closer to O1 than O2 is in B+-tree order.]
17
Some Multidimensional Search Structures
• k-d Tree
• Multidimensional Trie
• Grid File
• R Tree
• Point-Quad Tree
• D-Tree
18
k-d Tree
• Each node consists of a "record" and two pointers. The pointers are either null or point to another node.
• Nodes have levels, and each level of the tree discriminates on one attribute.
• The partitioning of the space alternates among the attributes of the n-dimensional search space.

Example: a 2-d tree built from the input sequence A = (65, 50), B = (60, 70), C = (70, 60), D = (75, 25), E = (50, 90), F = (90, 65), G = (10, 30), H = (80, 85), I = (95, 75).

[Figure: the resulting tree with root A(65, 50); the discriminator alternates X, Y, X, … down the levels, with B(60, 70) and C(70, 60) as A's children, G(10, 30) and E(50, 90) under B, D(75, 25) and F(90, 65) under C, and H(80, 85) and I(95, 75) under F.]

Insertion order can affect performance.
19
k-d Tree: Search Algorithm
• Notations:
  Disc(L): the discriminator at L's level
  Ki(L): the i-th attribute value of node L
  Low(L): the left child of L
  High(L): the right child of L

• Algorithm: search for P = (K1, ..., Kn)
  Q := Root;  /* Q will be used to navigate the tree */
  While NOT DONE do the following:
    if (Ki(P) = Ki(Q) for i = 1, ..., n) then  /* agree in each dimension */
      we have located the node and we are DONE
    otherwise, let A = Disc(Q):
      if KA(P) < KA(Q) then Q := Low(Q) else Q := High(Q)

• Performance: O(log N) on average, where N is the number of records.
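A minimal k-d tree with insertion and the search routine above can be sketched as follows (an illustrative sketch; the node layout and names are my own, not the slide's):

```python
class Node:
    def __init__(self, point):
        self.point, self.low, self.high = point, None, None

def insert(root, point, k=2, depth=0):
    if root is None:
        return Node(point)
    a = depth % k                      # discriminator alternates per level
    if point[a] < root.point[a]:
        root.low = insert(root.low, point, k, depth + 1)
    else:
        root.high = insert(root.high, point, k, depth + 1)
    return root

def search(root, point, k=2, depth=0):
    if root is None:
        return False
    if root.point == point:            # agree in every dimension
        return True
    a = depth % k
    child = root.low if point[a] < root.point[a] else root.high
    return search(child, point, k, depth + 1)

root = None
for p in [(65, 50), (60, 70), (70, 60), (75, 25), (50, 90),
          (90, 65), (10, 30), (80, 85), (95, 75)]:
    root = insert(root, p)
print(search(root, (75, 25)))  # True
print(search(root, (40, 40)))  # False
```

Inserting the slide's sequence A–I reproduces the tree in the example, with A(65, 50) as the root.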
20
Multidimensional Trie

Multidimensional tries, or k-d tries, are similar to k-d trees except that they divide the embedding space: each split evenly divides a region.

Example: construction of a 2-D trie by inserting A(65, 50), B(60, 70), C(70, 60), and D(75, 25).

[Figure: the trie after each insertion. The first split divides the space at X = 50; subsequent splits divide the resulting regions at Y = 50, X = 75, X = 62.5, Y = 75, and Y = 25, until A, B, C, and D fall in separate regions. A companion plot shows the corresponding partitioning of the space.]
21
Disadvantages of the k-d Trie

• The maximum level of decomposition depends on the minimum separation between two points.
  A solution: split a region only if it contains more than p points (i.e., use buckets).
• Not a balanced tree → unpredictable performance.
22
Grid File

Split strategy: the partitioning is done with only one hyperplane, but the split extends to all regions in the splitting direction.

[Figure: a grid directory with linear scales (0, 25, 50, 75, 100 on each axis); each directory cell points to a data bucket (A–M), and several adjacent cells may share a bucket, e.g., bucket K.]
23
Grid File - Potential Issues

• The directory can be quite sparse.
• Many adjacent directory entries may point to the same data block.
• For partial-match and range queries, many directory entries, but only a few data blocks, may have to be scanned (i.e., sequential search might be faster).

[Figure: a grid directory in which an entire column of entries points to the same data bucket K.]
24
Point-Quad Tree
• Each node of a k-dimensional point-quad tree partitions the object space into 2^k quadrants (4 in 2-D: NW, NE, SW, SE).
• The partitioning is performed along all search dimensions and is data dependent, like the k-d tree.

Example: to search for P(55, 75) in a tree containing A(50, 50), B(75, 80), C(90, 65), D(35, 85), E(25, 25):
• Since XA < XP and YA < YP → go to NE (i.e., B).
• Since XB > XP and YB > YP → go to SW, which in this case is null, so P is not in the tree.

[Figure: the points partitioning the space, and the corresponding quad tree with root A and quadrant links NW, NE, SW, SE; not a balanced tree.]
25
R-tree (Region Tree)

• The R-tree is a generalization of the B-tree to higher dimensions.
• The nodes correspond to disk pages.
• All leaf nodes appear at the same level.
• Root and intermediate nodes correspond to the smallest rectangle that encloses their child nodes, i.e., they contain [r, <page pointer>] pairs.
• Leaf nodes contain pointers to the actual objects, i.e., they contain [r, <RID>] pairs.
• A rectangle may be spatially contained in several nodes (e.g., J), yet it can be associated with only one node. This may incur redundant search.

[Figure: rectangles D–L grouped into covering rectangles A, B, C; the corresponding tree has the root with entries A, C, B at level 2 and leaf entries D–L at level 3.]
26
R-tree: Insertion
• A new object is added to the appropriate leaf node.
• If insertion causes the leaf node to overflow, the node must be split and the records distributed between the two leaf nodes. Split goals:
  – Minimize the total area of the covering rectangles (compact clusters).
  – Minimize the area common to the covering rectangles (to minimize redundant search).
• Splits are propagated up the tree (similar to the B-tree).
27
R-tree: Delete

If a deletion causes a node to underflow, its entries are reinserted (instead of being merged with adjacent nodes as in the B-tree).
Reason: there is no concept of adjacency in an R-tree.
28
D-tree: Domain Decomposition

If the number of objects inside a domain exceeds a certain threshold, the domain is split into two subdomains along its longest dimension.

[Figure: a horizontal split and a vertical split of an original domain containing objects A–G into a 1st and a 2nd subdomain; an object lying on the split line is a border object.]
29
D-tree: Split Example

[Figure: a sequence of splits shown side by side as the growing D-tree and the corresponding decomposition of the embedding space. Domain D is split into D1 and D2, then D1 into D11 and D12, then D12 into D121 and D122, and finally D2 into D21 and D22. Internal nodes hold domain entries with child pointers; external nodes point to data pages such as D22.P.]
30
D-tree: Split Examples

[Figure: D-tree and embedding space side by side. The initial tree is a single domain node D with one data node; after three insertions the data node fills; the 1st split produces subdomains D1 and D2, and the 2nd split produces D11, D12, and D2.]
31
D-tree: Split Example (continued)

[Figure: after the 3rd split, D12 is divided into D121 and D122 (external nodes D11, D2, D121, D122); after the 4th split, D2 is divided into D21 and D22, giving internal nodes D1 and D2 over external nodes D11, D121, D122, D21, D22, each pointing to a data page such as D22.P.]
32
D-tree: Search Algorithm

Search(D_tree_root, search_object)
  Current_node := D_tree_root
  For each entry (D, P) in Current_node such that D contains search_object:
    – if Current_node is an external node, retrieve the objects through D.P and compare them with search_object
    – if Current_node is an internal node, call Search(D.P, search_object)  /* recursive call */
33
D-tree: Range Query

A range query can be represented as a hypercube embedded in the search space.

Search strategy:
• Use the D-tree to retrieve all subdomains which overlap with the query cube.
• For each such subdomain which is not fully contained in the query cube, discard the objects falling outside the query cube.

[Figure: a range-query rectangle overlapping several subdomains; an object inside an overlapping subdomain but outside the rectangle is discarded.]
34
D-tree: Range Query
Search(D_tree_root, search_cube)
Current_node = D_tree_root
For each entry in Current_node, say (D, P), if D overlaps with search_cube, do:
– If Current_node is an external node, retrieve the objects in D.P, which fall within the overlap region.
– If Current_node is an internal node, call Search(D.P, search_cube).
35
D-tree: Desirable Properties
• The D-tree is a balanced tree.
• The search path for an object is unique → no redundant search.
• More splits occur in denser regions of the search space → no unnecessary splits; objects are evenly distributed among data nodes.
• Similar objects are physically clustered in the same, or neighboring, data nodes.
• Good performance is ensured regardless of the insertion order of the data.
36
Curse of Dimensionality

As the number of dimensions D increases, the probability of finding data in the inscribed sphere (Vol-S/Vol-C) decreases exponentially:

D    Vol-S/Vol-C
1    100%
2    78%
3    52%
4    31%
5    17%
6    9%

→ Most data are in the "corners" of the cube; corners are very dense in high-dimensional spaces.
→ The more dimensions we have, the more similar things appear (i.e., data become nearly equidistant).
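The Vol-S/Vol-C column can be reproduced from the volume of the unit-diameter sphere inscribed in the unit cube (a small check I added; small differences from the table are rounding):

```python
import math

def sphere_to_cube_ratio(d):
    """Vol-S/Vol-C for the sphere inscribed in the unit cube:
    (pi^(d/2) / Gamma(d/2 + 1)) * 0.5^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * 0.5 ** d

for d in range(1, 7):
    print(d, f"{sphere_to_cube_ratio(d):.1%}")
```

For D = 2 this gives π/4 ≈ 78.5%, and the ratio shrinks exponentially as D grows.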
37
Effect of High Dimensionality
• Figure (a): As dimensionality increases, a substantially greater search radius is required to retrieve the same percentage of data. (Note: 0.02% means retrieving 3 from 16,000 images.)
• Figure (b): In a high-dimensional space, the approximating hypercube query returns substantially more candidate data items as the search radius increases; most of them are irrelevant data in the corners.
• Figure (c): The retrieval time increases exponentially with the number of dimensions (for a given selectivity) beyond a certain dimensionality.

[Figure: a sphere query and its approximating hypercube query within the data space.]
38
Effect on k-NN Queries
• Processing a k-NN query:
  – Use an approximating hypercube query to find kc candidate neighbors.
  – Examine the candidates to determine the k nearest neighbors.
• Effect of high dimensionality:
  – In a high-dimensional space, kc >> k.
  – When kc is a very large percentage of the database, no index structure is helpful.

[Figure: an approximating hypercube query around a query point; only the points inside the inscribed sphere are the true nearest neighbors.]
39
Sequential Scan is Better

– In a high-dimensional space, tree-based indexing structures examine a large fraction of the leaf nodes.
– Instead of visiting so many tree nodes, it is better to:
  • scan the whole data set, and
  • avoid performing seeks altogether.
40
Vector Approximation (VA) File
• How to speed up the linear scan?
• A solution - use approximation:
  – Divide the data space into cells and allocate a bit string to each cell.
  – Vectors inside a cell are approximated by the cell.
  – The VA-file is an array of these geometric approximations.
  – For search:
    • the VA-file is scanned to select candidate vectors (i.e., relevant "cells");
    • candidates are then verified by visiting the vector file (i.e., the original vectors).
41
VA-File Example

Vector file (original vectors):    VA-file (approximations):
O1 (0.1, 0.9)                      O1 00 11
O2 (0.7, 0.7)                      O2 10 10
O3 (0.3, 0.4)                      O3 01 01
O4 (0.8, 0.2)                      O4 11 00

[Figure: the 2-D data space divided into a 4×4 grid of cells; each axis interval is labeled 00, 01, 10, 11, and each vector is approximated by the bit string of the cell containing it.]
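The quantization in the example (2 bits per dimension over [0, 1)) can be sketched as follows (an illustrative sketch; function names are my own):

```python
def approximate(vec, bits=2):
    """Map a vector in [0, 1)^d to the grid cell containing it."""
    cells = 1 << bits                         # 4 intervals per dimension
    return tuple(min(int(x * cells), cells - 1) for x in vec)

def to_bitstring(approx, bits=2):
    return " ".join(format(c, f"0{bits}b") for c in approx)

vectors = {"O1": (0.1, 0.9), "O2": (0.7, 0.7), "O3": (0.3, 0.4), "O4": (0.8, 0.2)}
va_file = {k: approximate(v) for k, v in vectors.items()}
print(to_bitstring(va_file["O1"]))  # 00 11
print(to_bitstring(va_file["O4"]))  # 11 00
```

Running this over the four vectors reproduces the VA-file column of the example.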
Principal Component Analysis

• Principal Component Analysis (PCA)
  – The goal is to find a projection that captures the largest amount of variation in the data.
  – It transforms a number of possibly correlated variables (X1 and X2) into a smaller number of uncorrelated variables called principal components.
• This technique can be used to reduce the number of dimensions in content-based image retrieval (CBIR).
42
[Figure: 2-D data points (axes X1, X2) with the principal direction e along the largest variation.]
Review: Variance
43
Mean of n data items: x̄ = (Σ xi) / n.
Standard deviation, a measure of how spread out the data are: s = √( Σ (xi − x̄)² / (n − 1) ).
Variance s² is a less computationally expensive version of the standard deviation (it avoids the square root).
Covariance
44
Variance operates on only one dimension. Covariance is a similar measure of how much two dimensions vary from their means with respect to each other:
cov(X, Y) = Σ (xi − x̄)(yi − ȳ) / (n − 1).
Covariance Matrix
45
For dimensions X1, …, X8, the covariance matrix is the 8×8 matrix whose (i, j) entry is Cov(Xi, Xj), e.g., entry (3, 4) is Cov(X3, X4). The diagonal holds the variances, e.g., Cov(X7, X7) = σ7².
Review: Transformation Matrix
46
A transformation matrix maps a vector (a point in the multidimensional space) to another vector, i.e., it transforms the vector from its original position.
Review: Eigenvector (2)
47
• A matrix acts on a vector by changing both its magnitude and its direction.
• A matrix may act on certain vectors by changing only their magnitude and leaving their direction unchanged (or possibly reversing it).
• These vectors are the eigenvectors of the matrix: the result is a scalar multiple of the original vector, i.e., the direction remains the same.

[Figure: a vector whose direction is changed by the matrix (not an eigenvector), and one that is only scaled (an eigenvector).]
A Transformation Example
48
• This transformation matrix does not change the direction or magnitude of the vectors along the central vertical axis (e.g., the red vector).
• All the pixels along the central vertical axis are eigenvectors of this transformation matrix.

[Figure: an image before and after applying the transformation.]
Eigenvector: Properties
49
• Eigenvectors can only be found for square matrices.
• Not every square matrix has eigenvectors.
• If an n×n matrix does have eigenvectors, there are n of them.
• If we scale the vector by some amount before the multiplication, we still get the same multiple of it as a result.
• All the unit eigenvectors (i.e., of length 1) of a symmetric matrix, such as a covariance matrix, are perpendicular.
Eigenvalue
50
• The amount by which the original vector is scaled after multiplication by the square matrix is always the same: no matter what multiple of the eigenvector we take before the multiplication, we always get, e.g., 4 times the scaled vector as the result.
• "4" is the eigenvalue associated with this eigenvector.
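These eigenvector properties can be checked numerically with NumPy (an added illustration; the matrix is my own example, chosen symmetric like a covariance matrix):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # symmetric, like a covariance matrix
vals, vecs = np.linalg.eigh(A)           # columns of vecs are unit eigenvectors

v = vecs[:, 0]
print(np.allclose(A @ v, vals[0] * v))              # True: A v = lambda v
print(np.allclose(A @ (3 * v), vals[0] * (3 * v)))  # True: scaling v first scales the result
print(np.isclose(vecs[:, 0] @ vecs[:, 1], 0.0))     # True: unit eigenvectors are perpendicular
```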
Principal Components Analysis (1)
51
1. Prepare the matrix Data to hold the original data set, with a data item (feature vector) in each column and each row holding a separate dimension (1st dimension, 2nd dimension, …, nth dimension).
Principal Components Analysis (2)
52
2. Compute the mean for each dimension of the data set (i.e., average each row of Data).
Principal Components Analysis (3)
53
3. Compute the matrix DataAdjust by subtracting from each entry of Data the mean of the corresponding dimension.
• Note: this produces a data set whose mean in each dimension is zero.
Principal Components Analysis (4)
54
4. Compute the covariance matrix for the dimensions
Principal Components Analysis (5)
55
5. Calculate the unit eigenvectors and eigenvalues of the covariance matrix.
Note: Most math packages give unit eigenvectors
Principal Components Analysis (6)
56
6. Sort the eigenvectors in descending order of their eigenvalues.
• The eigenvector with the largest eigenvalue is the first principal component.
• The sorting order gives the components in order of significance.
PCA & Dimension Reduction
57
• The eigenvectors corresponding to the principal components are orthogonal.
• We can map data from the original space into the new space defined by the orthogonal vectors.
• We can reduce the number of dimensions by dropping some of the less important components.

[Figure: the first principal component (more important) and the second principal component (less important) drawn through the data.]
PCA: Deriving the New Data Set
58
1. Choose the components (eigenvectors) we want to keep and form a transformation matrix F, with an eigenvector in each row.
2. Compute the new data set by applying the transformation F to the adjusted data matrix (a coordinate transformation):

   FinalData = F × DataAdjust

• We have projected the data from the original coordinate system onto a lower-dimensional space defined by the chosen eigenvectors.
• The relative spatial distances among the original data items are mostly preserved in the lower-dimensional space.
PCA - Summary
• Determine the eigenvectors of the covariance matrix.
• These eigenvectors define the new space with lower dimensionality.
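The six PCA steps can be sketched compactly with NumPy (an illustrative sketch; variable names are my own, with data items as columns and dimensions as rows, as above):

```python
import numpy as np

def pca(data, keep):
    mean = data.mean(axis=1, keepdims=True)    # step 2: per-dimension mean
    adjust = data - mean                       # step 3: DataAdjust
    cov = np.cov(adjust)                       # step 4: covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # step 5: eigenvectors/eigenvalues
    order = np.argsort(vals)[::-1]             # step 6: sort by eigenvalue, descending
    F = vecs[:, order[:keep]].T                # kept eigenvectors as rows
    return F @ adjust                          # FinalData = F x DataAdjust

rng = np.random.default_rng(0)
data = rng.normal(size=(3, 100))               # 100 items in 3 dimensions
reduced = pca(data, keep=2)
print(reduced.shape)  # (2, 100)
```

The first row of the result carries the most variance, the second the next most, as the sorting step guarantees.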
59
Data Clustering
• Supervised Classification
• Semi-supervised Classification
• Unsupervised kMeans Clustering
• Semi-Supervised kMeans Clustering
• Constrained kMeans Clustering
60
Supervised Classification
61
[Figure: training data (labeled data).]
Supervised Classification
62
[Figure: the result of supervised learning (a dividing line between the classes).]
Supervised Classification
63
[Figure: a new item to be classified.]
Supervised Classification
64
[Figure: the item is classified based on the dividing line.]
Supervised Classification
• Support Vector Machine (SVM)
• Artificial Neural Networks (ANN)
65
Support Vector Machine (1)
• The dashed lines mark the distance between the dividing line and the closest vectors (points) to the line.
• The vectors that constrain the width of the margin are the support vectors.
• SVM analysis finds the dividing line that maximizes the margin.

[Figure: two candidate dividing lines, one with a large margin and one with a small margin; the support vectors lie on the margin boundaries.]
Support Vector Machine (2)
• Points on a 2-dimensional plane can be separated by a 1-dimensional hyperplane (i.e., a line).
• Points in a d-dimensional space can be separated by a (d−1)-dimensional hyperplane.
67
Support Vector Machine (3)
• Problem: What if the points are separated by a nonlinear region?
• Solution: Use a kernel function to map the data into a higher-dimensional space where a hyperplane can be used to do the separation.
Semi-supervised Learning

Labeling a lot of data can be expensive.
• Solution: semi-supervised learning
  – Make use of unlabeled data in conjunction with a small amount of labeled data.
• Examples
  – Semi-supervised EM [Ghahramani, NIPS94], [Nigam, ML00]
  – Transductive SVM [Joachims, ICML99]
  – Co-training [Blum, COLT98]
69
Co-training (1)
• Many problems have two complementary views that can be used to label data.
• Example: faculty home pages
  1. "my advisor" pointing to a page is a good indicator that it is a faculty home page.
  2. "I am teaching" in a page is a good indicator that it is a faculty home page.
70
Co-training (2)

Features can be split into two sets, i.e., x = (x1, x2).
L: set of labeled examples; U: set of unlabeled examples.

Repeat k times:
  Use L to train a classifier c1 that considers only x1.
  Use L to train a classifier c2 that considers only x2.
  Apply c1 to label p positive and n negative examples from U.
  Apply c2 to label p positive and n negative examples from U.
  Move these self-labeled examples from U to L.
  /* c1 adds labeled examples to L that c2 will be able to use for learning, and vice versa */

Assumption: the class of each instance can be accurately predicted from each of the two feature subsets alone.

71
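One round of the loop above can be sketched as follows (a toy illustration; the threshold "classifier" and the data are stand-ins of my own, not from the slides):

```python
def train(labeled, view):
    # Toy "classifier": positive if the view's feature exceeds the mean
    # of that feature over the labeled examples.
    thr = sum(x[view] for x, _ in labeled) / len(labeled)
    return lambda x: x[view] > thr

def cotrain_round(L, U, p=1, n=1):
    c1, c2 = train(L, 0), train(L, 1)        # each classifier sees one view
    for c in (c1, c2):                       # each view self-labels examples
        pos = [x for x in U if c(x)][:p]
        neg = [x for x in U if not c(x)][:n]
        for x in pos:
            L.append((x, True))
        for x in neg:
            L.append((x, False))
        for x in pos + neg:
            U.remove(x)                      # move self-labeled examples to L
    return L, U

L = [((0.9, 0.8), True), ((0.1, 0.2), False)]
U = [(0.8, 0.9), (0.2, 0.1), (0.7, 0.6), (0.3, 0.4)]
L, U = cotrain_round(L, U)
print(len(L), len(U))  # 6 0
```

Each round, c1's self-labeled examples become training data that c2 can use in the next round, and vice versa.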
Unsupervised Clustering: kMeans
72
[Figure: randomly initialize k means.]
Unsupervised Clustering: kMeans
73
[Figure: assign points to clusters.]
Unsupervised Clustering: kMeans
74
[Figure: re-estimate the means.]
Unsupervised Clustering: kMeans
75
[Figure: re-assign points to clusters.]
Unsupervised Clustering: kMeans
76
[Figure: re-estimate the means.]
Unsupervised Clustering: kMeans
77
[Figure: assign points to clusters; no changes → convergence. These are the final clusters.]
Unsupervised Clustering: kMeans
• Initialize k cluster centers.
• Repeat until convergence:
  – Assign points to the cluster with the closest center.
  – For each cluster, re-estimate the center as the mean of the points in that cluster.
78
Property: kMeans locally minimizes the sum of distances between the data points and their corresponding cluster centers → more compact clusters. In practice, choose well-separated initial centers.
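The assign/re-estimate loop above can be written as a minimal pure-Python sketch (illustrative names and toy data of my own):

```python
def kmeans(points, centers, iters=100):
    clusters = [[] for _ in centers]
    for _ in range(iters):
        # assignment step: each point goes to the nearest center
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[j].append(p)
        # update step: each center becomes the mean of its cluster
        new = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centers[i]
               for i, cl in enumerate(clusters)]
        if new == centers:          # no changes -> converged
            break
        centers = new
    return centers, clusters

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, clusters = kmeans(pts, [(0, 0), (10, 10)])
print(centers)   # [(0.0, 0.5), (10.0, 10.5)]
```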
Semi-supervised Clustering
79
Idea: use a small amount of labeled data to guide (bias) the clustering of unlabeled data.

Example: use the labeled data to initialize the clusters in the kMeans algorithm.

[Figure: a data set containing a few labeled points.]
Semi-supervised Clustering
80
[Figure: the resulting three clusters.]
Constrained kMeans Clustering
• Must-link constraints specify that the two points have to be in the same cluster
• Cannot-link constraints specify that the two points must not be placed in the same cluster
81
D: data set; Con=: set of must-link constraints; Con≠: set of cannot-link constraints
Constrained kMeans Clustering [Wagstaff, ICML01]
1. Let C1 … Ck be the initial cluster centers
2. For each point di in D, assign it to the closest cluster Cj such that VIOLATION(di, Cj, Con=, Con≠) is false. If no such cluster exists, fail (return {})
3. For each cluster Ci , update its center by averaging all of the points di that have been assigned to it
4. Repeat (2) and (3) until convergence
5. Return {C1 … Ck }
82
VIOLATION(data point d, cluster C, must-link constraints Con=, cannot-link constraints Con≠)
/* Can we assign d to C without violating any constraint? */
1. For each (d, d=) ∈ Con=: if d= is in some cluster C' ≠ C, return true.
2. For each (d, d≠) ∈ Con≠: if d≠ is in C, return true.
3. Otherwise, return false.
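The VIOLATION test can be sketched as follows (an illustrative sketch; `assign` maps already-placed points to cluster ids, and the point names are my own):

```python
def violates(d, c, assign, must_link, cannot_link):
    """True if assigning point d to cluster c breaks a constraint."""
    for a, b in must_link:
        other = b if a == d else a if b == d else None
        if other in assign and assign[other] != c:
            return True                      # must-link partner is elsewhere
    for a, b in cannot_link:
        other = b if a == d else a if b == d else None
        if other in assign and assign[other] == c:
            return True                      # cannot-link partner is in this cluster
    return False

assign = {"p1": 0, "p2": 1}
must, cannot = [("p1", "p3")], [("p2", "p4")]
print(violates("p3", 1, assign, must, cannot))  # True: p3 must join p1 (cluster 0)
print(violates("p3", 0, assign, must, cannot))  # False
print(violates("p4", 1, assign, must, cannot))  # True: p4 cannot join p2's cluster
```

In step 2 of the algorithm, a point is assigned to the closest cluster for which this test returns false.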
83
Content-Based Image Indexing
• Keyword Approach
– Problem: there is no commonly agreed-upon vocabulary for describing image properties.
• Computer Vision Techniques
– Problem: General image understanding and object recognition is beyond the capability of current computer vision technology.
• Image Analysis Techniques
– It is relatively easy to capture the primitive image properties such as
• prominent regions,
• their colors and shapes,
• and related layout and location information within images.
– These features can be used to index image data.
– Challenge: the semantic gap!

[Figure: an original image, its segmentation, and the extracted contour.]
84
Features Acquisition: Image Segmentation
• Group adjacent pixels with similar color properties into one region, and
• segment the pixels with distinct color properties into different regions.
85
Image Indexing by contents
By applying image segmentation techniques, a set of regions are detected along with their locations, sizes, colors, and shapes.
These features can be used to index image data.
86
Color
• We can divide the color space into a small number of zones, each of which is clearly distinct from the others to human eyes.
• Each of the zones is assigned a sequence number beginning from zero.

Note: human eyes are not very sensitive to colors; in fact, users have only a vague idea about the colors they want to specify.
87
Shape

The shape feature can be measured by two properties: circularity and major-axis orientation.

– Circularity:  circularity = 4π · area / perimeter²
  The more circular the shape, the closer the circularity is to one.

  Examples:
  • Circle of radius r:  4π(πr²) / (2πr)² = 1
  • Square of side a:  4π(a²) / (4a)² = π/4 ≈ 0.785
  • Rectangle a × 2a:  4π(2a²) / (6a)² = 2π/9 ≈ 0.698

– Major-axis orientation:  0° ≤ orientation ≤ 360°
88
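The circularity examples above can be verified directly (a small added check):

```python
import math

def circularity(area, perimeter):
    """Circularity = 4*pi*area / perimeter**2; equals 1 for a circle."""
    return 4 * math.pi * area / perimeter ** 2

r, a = 3.0, 2.0
print(round(circularity(math.pi * r * r, 2 * math.pi * r), 6))  # circle: 1.0
print(round(circularity(a * a, 4 * a), 3))                      # square: 0.785
print(round(circularity(2 * a * a, 6 * a), 3))                  # a x 2a rectangle: 0.698
```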
Location
• The image is divided into sub-areas, each labeled with a number (e.g., 0–8 in a 3×3 grid).
• A region's location is represented by the ID of the sub-area that contains the region's center of gravity, e.g., the location of region A is 4 and the location of region B is 1.

Note: when a user queries the database by visual contents, approximate feature values are used; it is meaningless to use absolute feature values as indices.
89
Size
• The size range is divided into groups, and a region's size is represented by the corresponding group number.
Example (Asub is the area of one sub-area cell):

ASSIGNED SIZE   ACTUAL SIZE S
1               ¼ Asub ˂ S ≤ ½ Asub
2               ½ Asub ˂ S ≤ Asub
3               Asub ˂ S ≤ 2 Asub
4               2 Asub ˂ S ≤ 3 Asub
5               3 Asub ˂ S ≤ 4 Asub
6               4 Asub ˂ S ≤ 5 Asub
7               5 Asub ˂ S ≤ 6 Asub
8               6 Asub ˂ S ≤ 7 Asub
9               7 Asub ˂ S ≤ 8 Asub
10              8 Asub ˂ S ≤ 9 Asub

Note: only regions larger than one-fourth of a sub-area are registered.
90
Texture Areas
Texture areas and images with dominant high-frequency components are beyond the capability of image segmentation techniques. Matching on the distribution of colors (i.e., color histograms) is a simple yet effective means for these areas.

Strategy: divide an image into sub-areas and create a histogram for each of the sub-areas. The partitioning of the image captures locality information: we do not want to match an image with a red balloon on top against an image with a red car at the bottom.
91
Histograms
• Gray-level histogram: a plot of the number of pixels that assume each discrete value the quantized image intensity can take.
• Color histogram: holds information on the color distribution; it is a plot of the statistics of the R, G, B components in the 3-D color space. For example, quantizing R into 3 levels, G into 2, and B into 3 gives 3 × 2 × 3 = 18 bins.

[Figure: a color image with its color histogram in RGB space, and a gray-level histogram with counts for white, gray, and black.]
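A coarse color histogram of this kind can be sketched in a few lines (illustrative; pixel values are assumed to be in 0..255, and the 3 × 2 × 3 = 18-bin quantization matches the example):

```python
def color_histogram(pixels, levels=(3, 2, 3)):
    """Count pixels per quantized (R, G, B) bin."""
    hist = {}
    for r, g, b in pixels:
        bin_ = tuple(min(v * n // 256, n - 1) for v, n in zip((r, g, b), levels))
        hist[bin_] = hist.get(bin_, 0) + 1
    return hist

pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255)]
h = color_histogram(pixels)
print(h)  # {(2, 0, 0): 2, (0, 0, 2): 1}
```

Two nearly-red pixels fall into the same bin, which is what makes histograms robust approximate descriptors.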
92
Histograms (cont.)

Most histogram bins are sparsely populated, with only a small number of bins capturing the majority of pixel counts. We can therefore use the largest bins, say 20, as the representative bins of the histogram; these 20 bins form a chain in the 3-D color space.

[Figure: representative bins such as (0,1,1), (2,3,0), (6,2,0), (8,2,6) connected into a chain in RGB space, starting from the origin O.]
93
Histograms (cont.)

If we can represent such chains by a single number, then we can index the color images using a B+-tree.

– Connecting order: representative bins are sorted in ascending order of their distance from the origin O of the color space.
– Weighted perimeter:  WP = Σ_{i=1}^{20} C_i² · d_{i−1,i},  where d_{i−1,i} is the distance between consecutive bins and C_i is the count of bin i.
– Weighted angle:  WA = Σ_{i=1}^{20} C_i² · a_i,  where a_i is the angle at bin i along the chain.
– Format of the index key:  WP (10 bits) | WA (10 bits)

[Figure: the chain through bins (0,1,1), (2,3,0), (3,2,3), (6,2,0), (8,2,6) with segment lengths d_{i−1,i} and angles a_i.]
94
Video Content Extraction

• Other forms of information extraction can be employed:
  – closed-captioned text
  – speech recognition
  – descriptive information from the screenplay
  – key frames that characterize a shot
• This content information can be associated with the video story units.
95
Story Units

Shot: frames recorded in one camera operation form a shot.
Scene: one or several related shots are combined in a scene.
Sequence: a series of related scenes forms a sequence.
Video: a video is composed of different story units such as shots, scenes, and sequences, arranged according to some logical structure (defined by the screenplay).

These concepts can be used to organize video data.
96
Video Modeling Approaches
• Physical Feature based Modeling
• Semantic Content based Modeling
97
Physical Feature based Modeling

• A video is represented and indexed based on audio-visual features.
  – Features can be extracted automatically.
• Queries are formulated in terms of color, texture, audio, or motion information.
  – Very limited in expressing queries close to human thinking.
  – It would be difficult to ask for a video clip showing the sinking of the ship in the movie "Titanic" using only a color description.
98
Semantic Content based Modeling

• Video semantics are captured and organized to support video retrieval.
  – Difficult to automate.
  – Partially relies on manual annotation.
• Capable of supporting natural-language-like queries.
99
Semantic-Level Models
• Segmentation-based Models
• Stratification-based Models
• Temporal Coherent Model
100
Segmentation-based Modeling
• A video stream is segmented into temporally continuous segments
• Each segment is associated with a description which could be natural text, keywords, or other kinds of annotation.
• Disadvantages:
  – Lack of flexibility.
  – Limited in representing semantics.

[Figure: a video stream segmented on a timeline (0, 10, 30, 35, 70, 90) with one annotation per segment: "Teacher enters class", "A student raises hand", "Break", "Principal interrupts class", "Empty class".]
101
Stratification-based
• We partition the contextual information into single events.
• Each event is associated with a video segment called a stratum.
• Strata can overlap or encompass each other.

[Figure: events 1–4 as possibly overlapping intervals on a timeline (0, 5, 15, 20, 30, 35, 60, 70, 85, 90).]
102
Temporal Coherent
• Each event is associated with a set of video segments where it happens.
• More flexible in structuring video semantics.
• Advantage: allows easy retrieval by keyword, e.g., using an inverted index.

[Figure: events 1–4, each mapped to one or more segments along the time axis.]
103
Stratum

The concept of stratification can be used to assign descriptions to video footage.
– Each stratum refers to a sequence of video frames.
– The strata may overlap or totally encompass each other.

[Figure: strata over the video frames of a car-wreck rescue mission along time: "Medics"; "Victim" with sub-strata "pulled free", "in stretcher", "in ambulance"; "Siren"; "Ambulance".]
104
Inverted Index

Each document entry contains:
• the frequency of the term in the document,
• the locations of the term in the document,
• other information pertaining to the relationship of the term and the document.

Example inverted index (terms, located via a B+-tree, mapped to document lists):

ANSI        D1, D2
C           D3, D4, D5
DB2         D1, D6, D21
GUI         D2, D8, D11
SYBASE      D2, D6, D17
MULTIMEDIA  D5, D15
ORACLE      D3, D11, D19
RELATIONAL  D2, D3, D11, D19
SQL         D2, D11, D20
JAVA        D3, D20
105
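Building such a term-to-documents map can be sketched in a few lines (an illustrative sketch with toy documents; a real system would also record frequencies and positions as listed above):

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the sorted list of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.upper().split():
            index[term].add(doc_id)
    return {t: sorted(ids) for t, ids in index.items()}

docs = {"D1": "ANSI DB2", "D2": "ANSI GUI SQL", "D3": "C ORACLE"}
index = build_index(docs)
print(index["ANSI"])  # ['D1', 'D2']
print(index["SQL"])   # ['D2']
```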
Video Algebra

Goal: to provide a high-level abstraction that
– models the complex information associated with digital video data, and
– supports content-based access.

Strategy:
– The algebraic video data model consists of hierarchical compositions of video expressions with high-level semantic descriptions.
– The video expressions are constructed using video algebra operations.
106
Presentation
The fundamental entity is a presentation.
• A presentation is described by a video expression.
• A video expression describes a multi-window, spatial-temporal, and content combination of video segments.
• An algebraic video node provides a means of abstraction by which video expressions can be named, stored, and manipulated as units.
[Figure: an algebraic video node: a tree of video expressions built over raw video]
– Compound video expression: constructed from simpler expressions using video algebra operations.
– Primitive video expression: creates a single-window presentation from a raw video segment.
108
Video Algebra Operations
The video algebra operations fall into four categories:
1. Creation: defines the construction of video expressions from raw video.
2. Description: associates content attributes with a video expression.
3. Composition: defines temporal relationships between component video expressions.
4. Output: defines spatial layout and audio output for component video expressions.
109
Descriptions
• description E1 content: specifies that E1 is described by content.
– A content is a Boolean combination of attributes, each consisting of a field name and a value.
– Some field names have predefined semantics (e.g., title), while other fields are user-definable.
– Values can assume a variety of types, including strings and video node names.
– Field names and values need not be unique within a description.
Example: title = "CNN Headline News"
• hide-content E1: defines a presentation that hides the content of E1 (i.e., E1 does not carry any description).
– This operation provides a method for creating abstraction barriers for content-based access.
110
Composition
The composition operations can be combined to produce complex scheduling definitions and constraints.

C1 = create Cnn.HeadlineNews.rv 10 30
C2 = create Cnn.HeadlineNews.rv 20 40
C3 = create Cnn.HeadlineNews.rv 32 65
D1 = (description C1 "Anchor speaking")
D2 = (description C2 "Professor Smith")
D3 = (description C3 "Economic reform")

(D1 ∪ D2 ∪ D3)

Here create builds a video presentation from a raw video segment. In the union, D3 follows D2, which follows D1, and common footage is not repeated: the result is a non-redundant video stream from the three overlapping segments.
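A sketch of the effect (not the system's actual implementation): since C1, C2, and C3 come from the same raw video, composing them without repeating common footage amounts to merging their (start, end) intervals.

```python
def union_no_repeat(intervals):
    """Merge overlapping (start, end) intervals from the same raw video,
    so that common footage appears only once in the output stream."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # overlaps the previous interval: extend it instead of repeating
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# C1 = [10, 30], C2 = [20, 40], C3 = [32, 65] from Cnn.HeadlineNews.rv
print(union_no_repeat([(10, 30), (20, 40), (32, 65)]))  # [(10, 65)]
```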
111
Composition Operators (1)
• E1 ; E2 (concatenation): defines the presentation where E2 follows E1.
• E1 ∪ E2 (union): defines the presentation where E2 follows E1 and common footage is not repeated.
• E1 ∩ E2 (intersection): defines the presentation where only common footage of E1 and E2 is played.
• E1 - E2 (difference): defines the presentation where only footage of E1 that is not in E2 is played.
• E1 || E2 (parallel): E1 and E2 are played concurrently and terminate simultaneously.
• (test) ? E1:E2:...:En (conditional): Ei is played if test evaluates to i.
• loop E1 time: defines a repetition of E1 for a duration of time.
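Under the simplifying assumption that expressions over the same raw video are just sets of frame intervals, the intersection and difference operators can be sketched as set operations (the real algebra composes presentations, not bare sets):

```python
def frames(intervals):
    # expand [start, end) intervals into the set of frame indices they cover
    return {f for s, e in intervals for f in range(s, e)}

def intersection(a, b):
    # only common footage of a and b is played
    return frames(a) & frames(b)

def difference(a, b):
    # only footage of a that is not in b is played
    return frames(a) - frames(b)

a, b = [(10, 30)], [(20, 40)]
print(min(intersection(a, b)), max(intersection(a, b)))  # 20 29
print(len(difference(a, b)))                             # 10 (frames 10..19)
```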
112
Composition Operators (2)
• stretch E1 factor: sets the duration of the presentation equal to factor times the duration of E1 by changing the playback speed of the video segment.
• limit E1 time: sets the duration of the presentation equal to the minimum of time and the duration of E1; the playback speed is not changed.
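The resulting durations can be sketched as follows (hypothetical helper names; durations in seconds):

```python
def stretch_duration(duration, factor):
    # playback speed changes, so total time scales by factor
    return duration * factor

def limit_duration(duration, time):
    # playback speed is unchanged; the presentation is cut off at `time`
    return min(duration, time)

print(stretch_duration(60, 0.5))  # 30.0: plays at double speed
print(limit_duration(60, 45))     # 45: only the first 45 seconds play
```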
113
Composition Operators (3)
• transition E1 E2 type time: defines a transition effect of the given type between E1 and E2; time defines the duration of the transition effect.
The transition type is one of a set of transition effects, such as dissolve, fade, and wipe.
• contains E1 query: defines the presentation that contains the component expressions of E1 that match query (similar to the FROM clause in SQL).
A query is a Boolean combination of attributes.
Example: text: smith AND text: question
114
Output Characteristics
Video expressions include output characteristics that specify the screen layout and audio output for playing back child streams.
– window E1 (X1 , Y1 ) - (X2 , Y2 ) priority
specifies that E1 will be displayed with priority in the window defined by the top-left corner (X1 , Y1) and the bottom-right corner (X2 , Y2) such that Xi in [0, 1] and Yi in [0, 1].
Window priorities are used to resolve overlap conflicts of screen display.
– audio E1 channel force priority
specifies that the audio of E1 will be output to channel with priority; if force is true, then the audio operation overrides any channel specifications of the component video expressions.
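Window priorities resolve overlap: a sketch (hypothetical representation) where, at any screen point, the visible window is the covering one with the highest priority.

```python
def visible_window(windows, x, y):
    """windows: list of (name, (x1, y1), (x2, y2), priority), coords in [0, 1].
    Return the name of the highest-priority window covering (x, y)."""
    covering = [w for w in windows
                if w[1][0] <= x <= w[2][0] and w[1][1] <= y <= w[2][1]]
    return max(covering, key=lambda w: w[3])[0] if covering else None

windows = [
    ("P1", (0, 0), (0.5, 0.5), 10),
    ("P2", (0, 0.5), (0.5, 1), 20),
    ("P6", (0.4, 0.4), (1, 1), 60),  # overlaps P1 and P2; higher priority wins
]
print(visible_window(windows, 0.45, 0.45))  # P6
```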
116
Output Characteristics Example

C1 = create MagicvsBulls.rv 30:0 50:0
P1 = window C1 (0, 0) - (0.5, 0.5) 10
P2 = window C1 (0, 0.5) - (0.5, 1) 20
P3 = window C1 (0.5, 0.5) - (1, 1) 30
P4 = window C1 (0.5, 0) - (1, 0.5) 40
P5 = (P1 || P2 || P4)
P6 = (P1 || P2 || P3 || P4)

(P5 || (window (P5 || (window P6 (0.5, 0.5) - (1, 1) 60)) (0.5, 0.5) - (1, 1) 50))

Each window is given by its top-left corner, its bottom-right corner, and a priority (a larger value means higher priority).
[Figure: the resulting presentation: P5 fills the screen, a copy of P5 is nested into its bottom-right quadrant, and P6 is nested into the bottom-right quadrant of that copy]
Since expressions can be nested, the spatial layout of any particular video expression is defined relative to the parent rectangle.
117
Scope of a video node description
The scope of a given algebraic video node description is the subgraph that originates from the node.
– The components of a video expression inherit descriptions by context;
– i.e., content attributes associated with a parent video node are also associated with all of its descendant nodes.
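A sketch of this scope rule (hypothetical node names and attributes): a node's effective attributes are its own plus those accumulated from every ancestor on the path from the root.

```python
# node -> children, and node -> its own content attributes
tree = {"root": ["news", "sports"], "news": ["smith"], "sports": [], "smith": []}
attrs = {
    "root": {"source: CNN"},
    "news": {"text: anchor"},
    "smith": {"text: smith"},
}

def effective_attrs(node, cur="root", inherited=frozenset()):
    """Attributes of `node`, including those inherited by context."""
    acc = set(inherited) | attrs.get(cur, set())
    if cur == node:
        return acc
    for child in tree.get(cur, []):
        found = effective_attrs(node, child, acc)
        if found is not None:
            return found
    return None

print(sorted(effective_attrs("smith")))
# ['source: CNN', 'text: anchor', 'text: smith']
```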
118
Content-Based Access
• search query: searches a collection of video nodes for video expressions that match query.
Example: search text: smith AND text: question
Strategy: Matching a query to the attributes of an expression must take into account all of the attributes of that expression, including the attributes of its encompassing expressions.
[Figure: a graph of video nodes over raw video ("Anchor Smith", "Question from audience", "Question", "Smith on economic reform"); "Smith on economic reform" is the result of the query; a descendant node also satisfies the query but is not returned because it is a descendant of a node already in the result set]
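The search strategy can be sketched as follows (hypothetical tree and attributes): return only the topmost matching nodes, skipping any node that satisfies the query but descends from a node already in the result set.

```python
tree = {"root": ["report"], "report": ["question"], "question": []}
attrs = {
    "report": {"text: smith", "text: question"},
    "question": {"text: smith", "text: question"},  # matches, but is a descendant
}

def search(query_attrs, node="root", inherited=frozenset()):
    """Return topmost nodes whose effective (inherited) attributes match."""
    effective = set(inherited) | attrs.get(node, set())
    if query_attrs <= effective:
        return [node]  # stop here: matching descendants are not reported
    result = []
    for child in tree.get(node, []):
        result += search(query_attrs, child, effective)
    return result

print(search({"text: smith", "text: question"}))  # ['report']
```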
119
Browsing and Navigation
• Playback video-expression: plays the video expression, enabling the user to view the presentation it defines.
• Display video-expression: displays the video expression, allowing the user to inspect it.
• Get-parent video-expression: returns the set of nodes that directly point to video-expression.
• Get-children video-expression: returns the set of nodes that video-expression directly points at.
120
Algebraic Video System Prototype
The implementation is built on top of three existing subsystems:
– The VuSystem is used for managing raw video data and for its support of Tcl (Tool command language) programming. It provides an environment for recording, processing, and playing video.
– The Semantic File System is used as a storage subsystem with content-based access to data for indexing and retrieving files that represent algebraic video nodes.
– The Web server provides a graphical interface to the system that includes facilities for querying, navigating, video editing and composing, and invoking the video player.