A Graduate Course on Multimedia Technology
8. Automatic Content Analysis
© Wolfgang Effelsberg, Ralf Steinmetz
8.2 Deriving Video Semantics
8.2.1 Cut Detection
A very simple and basic scheme for the investigation of video semantics is cut detection. A cut denotes the border between two shots in a movie. During a shot the camera films continuously.
One distinguishes hard cuts and soft cuts (fade-out, fade-in, dissolve, ...).
The most important use of cut detection is to decompose a video into smaller units which can then be analyzed. A sample application is the use of shots as the “atomic” units for storage and retrieval in a video archive.
Cut Detection with Color Histograms
The simplest approach to cut detection is by means of color histograms: if the color histograms of two successive video frames i and i+1 differ by at least a threshold value T, a hard cut is detected.
Let H(r,g,b,i) be the value of the histogram for a color triple (r,g,b) in frame i, i.e., the number of pixels in frame i that have color (r,g,b). A cut is detected if and only if:
$$\sum_{r,g,b} \bigl(H(r,g,b,i) - H(r,g,b,i+1)\bigr)^2 \ge T$$
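The histogram criterion can be sketched in a few lines of Python. This is only an illustration under assumptions: a frame is taken to be a flat list of (r, g, b) tuples, and the function names and the toy threshold are invented for the example, not taken from the course.

```python
from collections import Counter

def color_histogram(frame):
    """Count how many pixels of the frame have each (r, g, b) color."""
    return Counter(frame)

def histogram_difference(h1, h2):
    """Sum of squared bin differences over all colors present in either frame."""
    colors = set(h1) | set(h2)
    return sum((h1[c] - h2[c]) ** 2 for c in colors)

def detect_hard_cuts(frames, threshold):
    """Return the indices i for which a cut lies between frame i and frame i+1."""
    hists = [color_histogram(f) for f in frames]
    return [i for i in range(len(hists) - 1)
            if histogram_difference(hists[i], hists[i + 1]) >= threshold]

# Toy input (assumption): four frames of four pixels each.
red, blue = (255, 0, 0), (0, 0, 255)
frames = [[red] * 4, [red] * 3 + [blue], [blue] * 4, [blue] * 4]
cuts = detect_hard_cuts(frames, threshold=10)
print(cuts)
```

In practice the colors would usually be quantized into a small number of bins first, since a histogram over all possible (r, g, b) triples is very sparse.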
Example: Cut Detection with Gray-scale Histogram Differences
[Figure: intensity histogram difference plotted over the frame number; three cuts and one dissolve are marked.]
Typical Detection Errors
If we use color histogram differences, the rate of successful detection of hard cuts is 90% to 98% for a typical video.
This method, however, fails when the colors change significantly between two adjacent frames even if no cut is present. False hits are much more common than misses.
Examples
• The light is switched on in a room,
• an explosion occurs,
• the camera pans very quickly.
Cut Detection with the Edge Change Ratio
Generally, the edges in the first frame after a cut differ largely from the edges in the last frame before the cut. Thus the ECR can be used to detect hard cuts.
Let ECR_i be the edge change ratio between frame i and frame i+1. A cut is detected if and only if
$$ECR_i \ge T$$
where T is a threshold.
A drawback of this approach is that it requires the computation of motion compensation for a video: fast camera panning or zooming or fast object movements lead to a high ECR for successive frames even if there is no cut. Object motion or camera operations can be distinguished from a hard cut since they always last for several frames.
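One way to use the observation that motion lasts for several frames, while a hard cut affects a single frame transition, is to accept only isolated peaks of the ECR. The sketch below assumes the ECR values have already been computed; the isolated-peak rule and the function name are illustrative assumptions, not the method prescribed by the course.

```python
def detect_cuts_from_ecr(ecr, threshold):
    """Return indices i with ecr[i] >= threshold whose direct neighbours are below it."""
    cuts = []
    for i, value in enumerate(ecr):
        if value < threshold:
            continue
        prev_high = i > 0 and ecr[i - 1] >= threshold
        next_high = i + 1 < len(ecr) and ecr[i + 1] >= threshold
        # A run of high values is attributed to motion, a single peak to a cut.
        if not prev_high and not next_high:
            cuts.append(i)
    return cuts

# Toy ECR sequence (assumption): one isolated peak, then a longer run of high values.
ecr = [0.1, 0.1, 0.9, 0.1, 0.6, 0.7, 0.6, 0.1]
cuts = detect_cuts_from_ecr(ecr, threshold=0.5)
print(cuts)
```

Only the peak at index 2 is reported; the run at indices 4 to 6 is treated as object motion or a camera operation.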
Example: Cut Detection with ECR
Detecting Soft Cuts
A fade between successive scenes is much harder to detect than a hard cut. One possibility is to try to detect a characteristic behaviour of the edge change ratio (ECR) in the area of fade-outs, fade-ins and dissolves.
Example:
During a dissolve, the edges of the old shot disappear with a constant ratio. At the same time, the edges of the new shot gradually appear. We observe a characteristic pattern of the ECR:
[Figure: ECR over the sequence of frames during a dissolve.]
Fade-ins and Fade-outs (1)
In a similar fashion, fade-ins and fade-outs in a video can be determined: when fading out, the number of pixels located on an edge will be zero in the last frame of the fade. When fading in, the number of pixels located on an edge is zero in the first frame.
Example:
ECR during a fade-in and a fade-out.
[Figure: ECR over the sequence of frames; panels: fade-in, fade-out.]
Could We Detect Soft Cuts With Color Histograms?
It is obvious that soft transitions between color histograms are frequent in a video, and can have many reasons. Thus, it is practically impossible to detect soft cuts with color histogram differences.
8.2.2 Action Intensity
The intensity of action in a video shot is an interesting parameter. For example, it can be used to distinguish between different genres of TV broadcasts, such as newscasts vs. music clips.
Action intensity can easily be estimated by means of motion vectors: one computes the average absolute value of all vectors in each frame of the shot; this includes both the motion of objects within the scene and camera operations. If the video was compressed using motion vectors, this physical parameter can be computed with minimum cost.
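A minimal sketch of this estimate follows. It assumes the motion vectors of each frame are available as (dx, dy) pairs; averaging the per-frame values over the shot is one possible reading of the description above, and the function names are made up for the example.

```python
import math

def frame_action(vectors):
    """Average length of the motion vectors (dx, dy) of one frame."""
    if not vectors:
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)

def shot_action_intensity(frames):
    """Mean of the per-frame averages over all frames of the shot."""
    if not frames:
        return 0.0
    return sum(frame_action(v) for v in frames) / len(frames)

# Toy shots (assumption): two frames with two motion vectors each.
calm_shot = [[(0, 0), (0, 1)], [(0, 0), (1, 0)]]
busy_shot = [[(3, 4), (6, 8)], [(3, 4), (0, 5)]]
print(shot_action_intensity(calm_shot), shot_action_intensity(busy_shot))
```

A genre classifier could then compare such values across shots, for example to separate static newscast shots from music clips.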
The edge change ratio ECR can also be used as an indicator for action: long and relatively static shots have a low ECR while shots with a lot of motion have a high ECR.
8.2.3 Detection of Camera Operations
The notion of a camera operation encompasses panning, zooming, etc. It is characteristic of camera operations that they affect all pixels of a frame in a similar, deterministic manner.
Example 1
When the camera pans, all pixels move in the same direction, by the same amount, between two frames.
Example 2
When the camera zooms in, all pixels (except the one in the center of the frame) move from the center to the boundaries.
Algorithm for the Detection of Camera Operations
Algorithm Detect-Camera-Operations
• Use the motion vectors of the underlying compression algorithm (e.g., MPEG-2), or compute the optical flow for a sequence of frames.
• Determine if the direction and the length of the motion vectors fit the scheme of a pre-defined camera operation.
The detection of camera operations works well when the shot under consideration contains no or little object motion. When camera operation and object motion appear simultaneously (which is quite common in practice), automatic detection becomes very difficult.
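The second step of the algorithm can be illustrated for two pre-defined schemes, pan and zoom. The sketch assumes a motion vector field given as pairs of a position (measured from the frame center) and a vector; the tolerance `tol`, the decision rules and the names are assumptions for this example and ignore object motion entirely.

```python
import math

def classify_camera_operation(field, tol=0.5):
    """field: list of ((x, y), (dx, dy)); (x, y) is measured from the frame center."""
    if not field:
        return "unknown"
    n = len(field)
    mean_dx = sum(d[0] for _, d in field) / n
    mean_dy = sum(d[1] for _, d in field) / n
    # Pan: every vector is close to the (non-zero) mean vector.
    spread = max(math.hypot(d[0] - mean_dx, d[1] - mean_dy) for _, d in field)
    if math.hypot(mean_dx, mean_dy) > tol and spread <= tol:
        return "pan"
    # Zoom: the sign of the dot product tells whether a vector points
    # away from the center (zoom in) or towards it (zoom out).
    radial = [p[0] * d[0] + p[1] * d[1] for p, d in field if p != (0, 0)]
    if radial and all(r > 0 for r in radial):
        return "zoom in"
    if radial and all(r < 0 for r in radial):
        return "zoom out"
    return "unknown"

corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
pan = [(p, (2, 0)) for p in corners]
zoom_in = [(p, p) for p in corners]
print(classify_camera_operation(pan), classify_camera_operation(zoom_in))
```

As soon as moving objects contribute their own vectors, neither rule holds for all vectors, which mirrors the difficulty mentioned above.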
Deriving a Panoramic Image from Camera Motion
When camera motion can be detected automatically, it is also possible to compute a panoramic image from a panning operation.
Example
Computation of panoramic images from a video that was recorded by a camera placed on a person's head (Steve Mann, MIT Media Lab, 1996)
Panoramic Image Generation
We want to generate a panoramic image of our environment from a fixed position.
Parallax effect: if the camera moves, objects at different distances appear shifted against each other. Therefore the camera must not change position.
[Figure: parallax effect when the camera moves.]
The easy approach: a cylindrical panorama. The camera only rotates around its vertical axis. Images are mapped onto a virtual cylinder around the camera position.
Cylindrical Panoramas (1)
Map the image to cylindrical coordinates.
[Figure: input image with axes x, y, z, and the panorama cylinder.]
$$\theta = \arctan(x / z), \qquad v = \frac{y}{\sqrt{x^2 + z^2}}$$
Note that the focal length z has to be known for the coordinate transform.
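As a small illustration, the mapping to cylindrical coordinates can be written as a function of one image point. The sketch assumes the standard form theta = arctan(x/z), v = y / sqrt(x^2 + z^2), with x and y measured from the image center and the focal length z given in the same units (pixels); the function name is hypothetical.

```python
import math

def to_cylindrical(x, y, z):
    """Map the image point (x, y) to cylindrical coordinates (theta, v).

    x, y are measured from the image center; z is the focal length (z > 0).
    """
    theta = math.atan(x / z)
    v = y / math.sqrt(x * x + z * z)
    return theta, v

# The image center maps to the origin of the cylinder coordinates.
print(to_cylindrical(0, 0, 100))
# A point as far to the right as the focal length lies at an angle of 45 degrees.
theta, v = to_cylindrical(100, 50, 100)
print(math.degrees(theta), v)
```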
Cylindrical Panoramas (2)
[Figure: image example; input image in coordinates (x, y) and transformed image in coordinates (θ, v).]
Cylindrical Panoramas (3)
Transformed images have to be aligned by matching the image content in the overlapping areas.
Find translation vector (t_x; t_y) by minimizing the matching error between image f and image g:
$$\mathrm{Err}(t_x, t_y) = \sum_{x,y} \bigl| f(x,y) - g(x + t_x, y + t_y) \bigr|^2$$
The minimum can be found by an exhaustive search over (t_x; t_y). However, computational complexity for image size NxN pixels and search range MxM pixels is:
$$O(N^2) \cdot O(M^2) \approx O(N^4)$$
We conclude that we need faster algorithms.
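The exhaustive search can be sketched as follows. Images are assumed to be lists of rows of gray values. One deviation from the plain sum is made deliberately and is an assumption of this sketch: the error is averaged over the overlapping pixels, so that shifts with a small overlap are not favoured. All names are hypothetical.

```python
def matching_error(f, g, tx, ty):
    """Mean of |f(x, y) - g(x + tx, y + ty)|^2 over the pixels where both exist."""
    total, count = 0, 0
    for y, row in enumerate(f):
        for x, value in enumerate(row):
            gx, gy = x + tx, y + ty
            if 0 <= gy < len(g) and 0 <= gx < len(g[0]):
                total += (value - g[gy][gx]) ** 2
                count += 1
    return total / count if count else float("inf")

def exhaustive_search(f, g, search_range):
    """Try every (tx, ty) with |tx|, |ty| <= search_range and keep the best one."""
    candidates = [(tx, ty)
                  for ty in range(-search_range, search_range + 1)
                  for tx in range(-search_range, search_range + 1)]
    return min(candidates, key=lambda t: matching_error(f, g, t[0], t[1]))

def pattern(x, y):
    """Synthetic brightness function used to build the toy images."""
    return 3 * x * x + 5 * y * y + x * y

# g shows the same content as f, displaced by (1, 2).
f = [[pattern(x, y) for x in range(6)] for y in range(6)]
g = [[pattern(x - 1, y - 2) for x in range(6)] for y in range(6)]
print(exhaustive_search(f, g, search_range=2))
```

The two nested loops over the image and the two nested loops over the candidates are exactly the O(N^2) and O(M^2) factors of the complexity estimate.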
Cylindrical Panoramas (4)
Pel-Recursive Motion Estimation:
For simplicity consider the one-dimensional case. Let g(x) be an exact copy of f(x), displaced by a constant t:
$$g(x) = f(x + t)$$
[Figure: brightness over x for f(x) and g(x) = f(x+t); the displacement t and the brightness difference between f(x0) and g(x0) are marked.]
With the spatial derivative of f, we obtain t as:
$$t = \frac{g(x_0) - f(x_0)}{\frac{\mathrm{d}f}{\mathrm{d}x}(x_0)}$$
However, this only works for small displacements t.
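The one-dimensional estimate follows from the first-order approximation g(x0) = f(x0 + t) ≈ f(x0) + t · f'(x0). The sketch below evaluates a single such step (not a full recursive scheme) and approximates the derivative numerically; the step width `h` and the test functions are assumptions made for the example.

```python
def estimate_displacement(f, g, x0, h=1e-3):
    """Estimate t in g(x) = f(x + t) from the values and the slope at x0."""
    slope = (f(x0 + h) - f(x0 - h)) / (2 * h)   # central-difference derivative of f
    return (g(x0) - f(x0)) / slope

def f(x):
    return x * x

true_t = 0.05

def g(x):
    return f(x + true_t)

t = estimate_displacement(f, g, x0=2.0)
print(t)   # close to true_t, but not exact because f is not linear
```

For a large displacement the linear approximation no longer holds and the estimate drifts away from the true value, which is why the method is restricted to small t.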
Cylindrical Panoramas (5)
Hierarchical Motion Estimation
Algorithm
• Scale the original image down by a constant factor to build a pyramid of the image at different resolutions.
• Do motion estimation on the lowest resolution layer.
• Scale the obtained motion model upward to the next resolution level and refine the motion model.
[Figure: image pyramid; each step to the next finer level is labeled "scale + refinement".]
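The three steps can be sketched for a purely translational motion model. Everything specific here is an assumption for illustration: a scale factor of 2, averaging of 2x2 blocks for downscaling, a small exhaustive search on the coarsest level, and a refinement by at most one pixel on each finer level. Image sizes are assumed to be even on every level that is scaled down.

```python
def matching_error(f, g, tx, ty):
    """Mean squared difference between f(x, y) and g(x + tx, y + ty) on the overlap."""
    total, count = 0, 0
    for y, row in enumerate(f):
        for x, value in enumerate(row):
            gx, gy = x + tx, y + ty
            if 0 <= gy < len(g) and 0 <= gx < len(g[0]):
                total += (value - g[gy][gx]) ** 2
                count += 1
    return total / count if count else float("inf")

def downsample(img):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, len(img[0]) - 1, 2)]
            for y in range(0, len(img) - 1, 2)]

def best_shift(f, g, center, radius):
    """Exhaustive search in a (2 * radius + 1)^2 window around center."""
    cx, cy = center
    candidates = [(cx + dx, cy + dy)
                  for dy in range(-radius, radius + 1)
                  for dx in range(-radius, radius + 1)]
    return min(candidates, key=lambda t: matching_error(f, g, *t))

def hierarchical_shift(f, g, levels=2, coarse_radius=2):
    """Estimate on the coarsest level, then scale by 2 and refine on each finer level."""
    pyramid = [(f, g)]
    for _ in range(levels - 1):
        pf, pg = pyramid[-1]
        pyramid.append((downsample(pf), downsample(pg)))
    t = best_shift(*pyramid[-1], center=(0, 0), radius=coarse_radius)
    for pf, pg in reversed(pyramid[:-1]):
        t = best_shift(pf, pg, center=(2 * t[0], 2 * t[1]), radius=1)
    return t

def pattern(x, y):
    """Synthetic brightness function used to build the toy images."""
    return 3 * x * x + 5 * y * y + x * y

f = [[pattern(x, y) for x in range(16)] for y in range(16)]
g = [[pattern(x - 4, y - 2) for x in range(16)] for y in range(16)]
print(hierarchical_shift(f, g))
```

A displacement of 4 pixels is found although no level searches further than 2 pixels, which is the source of the speed-up over the exhaustive search at full resolution.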
Cylindrical Panoramas (6)
Example result of a cylindrical panorama:
[Figure: example panorama.]
Disadvantages of cylindrical panoramas
• Distortion of straight lines to curved lines
• Focal length (zoom) has to be known or estimated
• Rotation axis has to be perpendicular to the optical axis and to the horizontal image axis:
  - Standing on a hill and looking down is not possible
  - Camera may not be rotated around horizontal axis (but mathematical extension to spherical panoramas is possible).
[Figure: optical axis, virtual image plane and horizontal image axis.]
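Full-Perspective Panoramas (1)
Assume that your environment is painted on a single, large plane (environment painted on a glass plane). Describe the motion that the solid plane can perform in space.
2D: affine motion (translation, rotation, scaling)
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \underbrace{\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}}_{\text{rotate, scale}} \begin{pmatrix} x \\ y \end{pmatrix} + \underbrace{\begin{pmatrix} t_x \\ t_y \end{pmatrix}}_{\text{translate}}$$
3D: perspective transformation (affine + 2 additional parameters for perspective projection)
$$x' = \frac{a_{11} x + a_{12} y + t_x}{b_1 x + b_2 y + 1}, \qquad y' = \frac{a_{21} x + a_{22} y + t_y}{b_1 x + b_2 y + 1}$$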
Full-Perspective Panoramas (2)
Think of our camera taking images of the glass plane from different orientations. To align the images, the relative transformation between the images has to be estimated.
Exhaustive search for the parameters is not possible because the parameter space is very large (8 parameters).
Possible approach for parameter estimation:
• Find an estimate with a lower-dimensional model (e.g., translational only, to coarsely compensate motion)
• Determine an initial estimate for the perspective model (see next slide)
• Use the gradient-descent technique for fine alignment.
Full-Perspective Panoramas (3)
Determine initial estimation of the perspective model:
1. Search for four blocks with characteristic features.
2. Even though we have compensated translational motion, the features will not match perfectly. Refine matching positions of the blocks in the second image by block-matching. In general, the motion vectors will be different for the four blocks.
3. These four motion vectors (with two parameters each) can be used to determine the eight parameters of the perspective motion model (eight equations for eight unknowns).
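Step 3 amounts to solving a linear system. Multiplying the perspective equations x' = (a11 x + a12 y + tx) / (b1 x + b2 y + 1) and y' = (a21 x + a22 y + ty) / (b1 x + b2 y + 1) by their denominator gives two equations per block that are linear in the eight parameters. The sketch below is one possible formulation; the parameter order, the function names and the plain Gaussian elimination are assumptions of this example, and degenerate block positions (e.g. three blocks on a line) are not handled.

```python
def solve_linear(a, b):
    """Solve a * p = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * p[c] for c in range(r + 1, n))
        p[r] = (m[r][n] - s) / m[r][r]
    return p

def perspective_from_blocks(blocks):
    """blocks: four ((x, y), (dx, dy)) pairs of block position and motion vector.

    Returns p = [a11, a12, a21, a22, tx, ty, b1, b2].
    """
    a, b = [], []
    for (x, y), (dx, dy) in blocks:
        xp, yp = x + dx, y + dy
        # a11 x + a12 y + tx - b1 x x' - b2 y x' = x'
        a.append([x, y, 0, 0, 1, 0, -x * xp, -y * xp])
        b.append(xp)
        # a21 x + a22 y + ty - b1 x y' - b2 y y' = y'
        a.append([0, 0, x, y, 0, 1, -x * yp, -y * yp])
        b.append(yp)
    return solve_linear(a, b)

# Toy input (assumption): all four blocks moved by the same vector (2, 3).
blocks = [((0, 0), (2, 3)), ((10, 0), (2, 3)), ((0, 10), (2, 3)), ((10, 10), (2, 3))]
p = perspective_from_blocks(blocks)
print([round(v, 6) for v in p])
```

For identical motion vectors the solution degenerates to a pure translation, i.e. a11 = a22 = 1, tx = 2, ty = 3 and all other parameters 0 up to rounding.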
Full-Perspective Panoramas (4)
Fine alignment
1. Start with the parameter vector estimated in the last step (initial estimate):
$$\vec{p}_0 = (a_{11}; a_{12}; a_{21}; a_{22}; t_x; t_y; b_1; b_2)$$
2. Consider the matching error depending on the transformation parameters (error function, hard to calculate):
$$\mathrm{Err}(\vec{p}) = \sum_{x,y} \bigl| f(x,y) - g(x',y') \bigr|^2$$
3. Do a gradient descent search to find a better match:
$$\vec{p}_{i+1} = \vec{p}_i - \nabla \mathrm{Err}(\vec{p}_i)$$
Full-Perspective Panoramas (5)
[Figure: image example.]
8.2.4 Text Detection
Goal
Extraction of text appearing in a video. Motivation: text is rich in semantics.
We distinguish between artificially generated text (such as a movie title) and text appearing in a scene.
Algorithm
• Detect regions of the frame containing text,
• cut them out,
• run an OCR algorithm (optical character recognition).
Characteristics of generated text in a video
• monochrome
• rigid
• in the foreground
• the letters have a minimum and maximum size
• either stationary or moving linearly
• high contrast to background
• appears repeatedly in successive frames
• letters appear in groups.
Text Segmentation (1)
Text is monochrome
[Figure: original frame of a video; segmentation into regions.]
Text Segmentation (2)
Apply thresholds for the size of the letters and the contrast.
[Figure: previous result; after thresholding.]
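The thresholding step can be pictured as a filter over the candidate regions of the previous segmentation. The sketch assumes each region is described by a dictionary with a height in pixels and a contrast value between 0 and 1; these fields, the function name and the threshold values are hypothetical and only serve the example.

```python
def filter_text_candidates(regions, min_height, max_height, min_contrast):
    """Keep the regions whose size and contrast are plausible for letters."""
    return [r for r in regions
            if min_height <= r["height"] <= max_height
            and r["contrast"] >= min_contrast]

# Toy regions (assumption): only the first one passes both thresholds.
regions = [
    {"height": 12, "contrast": 0.8},   # plausible letter
    {"height": 3, "contrast": 0.9},    # too small
    {"height": 14, "contrast": 0.2},   # too little contrast to the background
    {"height": 80, "contrast": 0.9},   # too large
]
kept = filter_text_candidates(regions, min_height=8, max_height=40, min_contrast=0.5)
print(len(kept))
```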
Text Segmentation (3)
The text must be either stationary or moving linearly (here: horizontally).
[Figure: previous result; after application of the rule of horizontal motion.]
Experimental Results
Detection of text regions in a video

test video        number of frames   number of letters   successful findings
title sequences   7372               6423                99%
advertisements    6858               1065                99%
newscast          18624              1411                97%

(from the dissertation of Rainer Lienhart, Praktische Informatik IV, U. Mannheim)
8.2.5 Face Detection
Goal
Detection of areas in a video frame that show the frontal view of a human face.
Approach
• Construct a neural network for the detection of texture
• Train this network with thousands of faces, where the line between the eyes as well as an orthogonal line towards the tip of the nose are marked
• Process an unknown image with search areas in varying sizes:
  - pre-processing / filtering / standardization of the illumination
  - test on the hypothesis “face” with the trained neural network.
Algorithm ”Face Detection”
[Figure: processing pipeline. Pyramid of the input image (subsampling); search area (20x20 pixels); pre-processing: standardization of luminance, standardization of histogram; neural network: input into the network (20 x 20 pixels), receptive fields, layer of hidden neurons, result of the network.]
Multiple Detection
Visualization of Results