Upload
trinhcong
View
224
Download
2
Embed Size (px)
Citation preview
TEORIJA INFORMACIJE I KODIRANJE
Željko
Jeričević, dr. sc.Zavod
za
računarstvo, Tehnički
fakultet
&
Zavod
za
biologiju
i medicinsku
genetiku, Medicinski
fakultet51000 Rijeka, Croatia
Phone: (+385) 51-651 594 E-mail: [email protected]
http://www.riteh.uniri.hr/~zeljkoj/Zeljko_Jericevic.html
10 February 2012 Zeljko Jericevic, Ph.D. 2
TEORIJA INFORMACIJE I KODIRANJE
Predavač:Željko
Jeričević
2-54 651-594 [email protected]://www.riteh.uniri.hr/~zeljkoj/Zeljko_Jericevic.html
10 February 2012 Zeljko Jericevic, Ph.D. 3
Teorija
informacija
u obradi podataka
Univerzalnost
teorije informacija
10 February 2012 Zeljko Jericevic, Ph.D. 4
Teorija
informacija
u obradi podataka
Nominalna
varijabla: poprima
vrijednosti
iz
nekog neporedanog
skupa
(mjesto
i država
boravka, …
itd)
Ordinalna
varijabla: poprima
diskretne
vrijednosti
iz nekog
poredanog
skupa, ne nužno
s metričkim
distancama
(redosljed
planeta
u Sunčevu
sustavu, ocijene
u školi, …
itd)
Kontinuirane
varijable
poprimaju
vrijednosti
iz
skupa realnih
brojeva
(vrijeme, distanca, temperatura, …
itd)
10 February 2012 Zeljko Jericevic, Ph.D. 5
Nominal variablesA variable is called nominal if its values are the members
of some unordered set. For example, “state of residence” is a nominal variable that (in the U.S.) takes on one of 50
values; in astrophysics, “type of galaxy”
is a nominal variable with the three values “spiral,”
“elliptical,”
and
“irregular”.The central tendency of a nominal attribute is given by its
mode; neither the mean nor the median can be defined.Variables assessed on a nominal scale are sometimes called
categorical variables.
6
Mode• In statistics, the mode
is the value that occurs most
frequently
in a data set or a probability distribution. In some fields, notably education, sample data are often called scores, and the sample mode is known as the modal score.
• Like the statistical mean and the median, the mode is a way of capturing important information about a random variable or a population in a single quantity. The mode is in general different from the mean and median, and may be very different for strongly skewed distributions.
7
ModeThe mode is not necessarily unique, since the same maximum frequency may be attained at different values. The most ambiguous case occurs in uniform distributions, wherein all values are equally likely.
8
Mode• The mode of a discrete probability distribution is the
value x at which its probability mass function takes its maximum value. In other words, it is the value that is most likely to be sampled.
• The mode of a continuous probability distribution is the value x at which its probability density function attains its maximum value, so, informally speaking, the mode is at the peak.
• As noted above, the mode is not necessarily unique, since the probability mass function or probability density function may achieve its maximum value at several points x1
, x2
, etc.
10 February 2012 Zeljko Jericevic, Ph.D. 9
ModeThe above definition tells us that only global maxima are modes. Slightly confusingly, when a probability density function has multiple local maxima it is common to refer to all of the local maxima as modes of the distribution. Such a continuous distribution is called multimodal (as opposed to unimodal).
10 February 2012 Zeljko Jericevic, Ph.D. 10
ModeIn symmetric unimodal
distributions, such as the
normal (or Gaussian) distribution (the distribution whose density function, when graphed, gives the famous "bell curve"), the mean (if defined), median and mode all coincide. For samples, if it is known that they are drawn from a symmetric distribution, the sample mean can be used as an estimate of the population mode. The mode is if there is more than one number in the plot Example: 1,2,2,3,4,5,5,6,5,1 The numbers that repeat are the mode. There can be more than one mode anytime
10 February 2012 Zeljko Jericevic, Ph.D. 11
ModeThe mode of a data sample is the element that occurs most often in the collection. For example, the mode of the sample [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17] is 6. Given the list of data [1, 1, 2, 4, 4] the mode is not unique -
the
dataset may be said to be bimodal, while a set with more than two modes may be described as multimodal.
10 February 2012 Zeljko Jericevic, Ph.D. 12
ModeFor a sample from a continuous distribution, such as [0.935..., 1.211..., 2.430..., 3.668..., 3.874...], the concept is unusable in its raw form, since each value will occur precisely once. The usual practice is to discretize
the
data by assigning frequency values to intervals of equal distance, as for making a histogram, effectively replacing the values by the midpoints of the intervals they are assigned to. The mode is then the value where the histogram reaches its peak.
10 February 2012 Zeljko Jericevic, Ph.D. 13
ModeFor small or middle-sized samples the outcome of this procedure is sensitive to the choice of interval width if chosen too narrow or too wide; typically one should have a sizable fraction of the data concentrated in a relatively small number of intervals (5 to 10), while the fraction of the data falling outside these intervals is also sizable.
10 February 2012 Zeljko Jericevic, Ph.D. 14
Ordinal variableA variable is termed ordinal if its values are the members
of a discrete, but ordered, set. Examples are grade in school, planetary order from the Sun (Mercury = 1, Venus = 2, : : :), and number of offspring. There need not be any concept of “equal metric distance”
between
the values of an ordinal variable, only that they be intrinsically ordered. Another example is Mohs
scale of
mineral hardness”
10 February 2012 Zeljko Jericevic, Ph.D. 15
Ordinal variable: Mohs
scale of mineral hardness
10 February 2012 Zeljko Jericevic, Ph.D. 16
Ordinal variableWhen using an ordinal scale, the central tendency of a
group of items can be described by using the group's mode (or most common item) or its median (the middle-
ranked item), but the mean (or average) cannot be defined.
Mean does not make sense for ordinal variables but median does!
10 February 2012 Zeljko Jericevic, Ph.D. 17
Continuous variables“We will call a variable continuous if its values are real numbers, as are times, distances, temperatures, etc. (Social scientists sometimes distinguish between interval and ratio continuous variables, but we do not find that distinction very compelling.)”Press, NR3
18
Interval continuous variablesQuantitative attributes are all measurable on interval scales, as any difference
between the levels of an
attribute can be multiplied by any real number to exceed or equal another difference. A highly familiar example of interval scale measurement is temperature with the Celsius scale. In this particular scale, the unit of measurement is 1/100 of the temperature difference between the freezing and boiling points of water under a pressure of 1 atmosphere.
19
Interval continuous variablesThe "zero point" on an interval scale is arbitrary; and negative values can be used. The formal mathematical term is an affine space (in this case an affine line). Variables measured at the interval level are called "interval variables" or sometimes "scaled variables" as they have units of measurement.
10 February 2012 Zeljko Jericevic, Ph.D. 20
Interval continuous variablesRatios between numbers on the scale are not meaningful, so operations such as multiplication and division cannot be carried out directly. But ratios of differences can be expressed; for example, one difference can be twice another.
10 February 2012 Zeljko Jericevic, Ph.D. 21
Interval continuous variablesThe central tendency of a variable measured at the
interval level can be represented by its mode, its median, or its arithmetic mean. Statistical dispersion can be measured in most of the usual ways, which just involved differences or averaging, such as range, and standard deviation.
10 February 2012 Zeljko Jericevic, Ph.D. 22
Ratio continuousMost measurement in the physical sciences and engineering is done on ratio scales. Mass, length, time, plane angle, energy and electric charge are examples of physical measures that are ratio scales. The scale type takes its name from the fact that measurement is the estimation of the ratio between a magnitude of a continuous quantity and a unit magnitude of the same kind.
10 February 2012 Zeljko Jericevic, Ph.D. 23
Ratio continuousInformally, the distinguishing feature of a ratio scale is the possession of a non-arbitrary zero value. For example, the Kelvin temperature scale has a non-
arbitrary zero point of absolute zero, which is denoted 0K and is equal to -273.15 degrees Celsius. This zero point is non arbitrary as the particles that compose matter at this temperature have zero kinetic energy.
Obrada podataka
Example of a contingency table for two nominal variables, here sex and color. The row and column marginals
(totals) are shown.
The variables are “nominal,”
i.e., the order in which their values are listed is arbitrary and does not affect the result of the contingency table analysis. If the ordering of values has some intrinsic meaning, then the variables are “ordinal”
or
“continuous,”
and correlation techniques can be utilized.
Nominal variables
Measures of association between nominal variables: For any pair of nominal variables, the data can be displayed as a contingency table, a table whose rows are labeled by the values of one nominal variable, whose columns are labeled by the values of the other nominal variable, and whose entries are nonnegative integers giving the number of observed events for each combination of row
and column. The analysis of association between nominal variables is thus called contingency table analysis or cross tabulation analysis.
10 February 2012 Zeljko Jericevic, Ph.D. 26
Nominal variablesChi-square statistic does a good job of characterizing the significance of association but is only so-so as a measure of the strength (principally because its numerical values have no very direct interpretations). Contingency table analysis based on the information-
theoretic concept of entropy, will say little about the significance of association (use chi-square for that!) but is capable of very elegantly characterizing the strength of an association already known to be significant. NR3-742
27
Measures of Association Based on χ2Some notation first: Let Nij
denote the number of events that occur with the first variable x taking on its i-th
value and the
second variable y taking on its j-th
value.Let N denote the total number of events, the sum of all the Nij
’s. Let Ni
denote the number of events for which the first variable x takes on its i-th
value regardless of the value of y; Nj
is the number of events with the j th
value of y regardless of x.
i ij j ijj i
i ji j
N N N N
N N N
• •
• •
= =
= =
∑ ∑
∑ ∑
10 February 2012 Zeljko Jericevic, Ph.D. 28
χ2
Based Association
In other words, “dot”
is a placeholder that means, “sum over the missing index”. N.j
and Ni.
are sometimes called the row and column totals or marginals.
i ij j ijj i
i ji j
N N N N
N N N
• •
• •
= =
= =
∑ ∑
∑ ∑
29
χ2
Based AssociationThe null hypothesis is that the two variables x and
y have no association. In this case, the probability of a particular value of x given a particular value of y should be the same as the probability of that value of x regardless of y. Therefore, in the null hypothesis, the expected number
for any Nij
, which we will denote nij
, can be calculated from only the row and column totals
= which implies ij i jiij
j
n N NN nN N N
• ••
•
=
30
χ2
Based AssociationNotice that if a column or row total is zero, then the
expected number for all the entries in that column or row is also zero; in that case, the never-occurring bin of x or y should simply be removed from the analysis.
The chi-square statistic is now given by equation below which, in the present case, is summed over all entries in the table:
( )2
2
,
ij ij
i j ij
N nn
χ−
=∑
10 February 2012 Zeljko Jericevic, Ph.D. 31
χ2
Based AssociationSuppose there is a significant association. How do we
quantify its strength, so that (e.g.) we can compare the strength of one association with another? The idea here is to find some reparametrization
of χ2
that maps it into
some convenient interval, like 0 to 1, where the result is not dependent on the quantity of data that we happen to sample, but rather depends only on the underlying population from which the data were drawn. There are several different ways of doing this. Two of the more common are called Cramer’s V and the contingency coefficient C.
32
Obrada
podatakaThe formula for Cramer’s V is
where I and J are again the numbers of rows and columns, and N is the total number of events. Cramer’s V has the pleasant property that it lies between zero and one inclusive, equals zero when there is no association, and equals one only when the association is perfect: All the events in any row lie in one unique column, and vice versa. (In chess parlance, no two rooks, placed
on a nonzero table entry, can capture each other.)
( )2
min 1, 1v
N I Jχ
=− −
33
Obrada
podatakaIn the case of I = J = 2, Cramer’s V is also referred to as
the phi statistic. The contingency coefficient C is defined as
It also lies between zero and one, but (as is apparent from the formula) it can never achieve the upper limit. While it can be used to compare the strength of association of two tables with the same I and J , its upper limit depends on I and J . Therefore it can never be used to compare tables of different sizes.
2
2CN
χχ
=+
34
Obrada
podatakaInformation-Theoretic Properties of DistributionsIn this section we return to nominal distributions, that is to say, to distributions with discrete outcomes that have no meaningful ordering. Information theory provides a different, and sometimes very useful, perspective on the nature of such a distribution p with outcomes i , 0 ≤
i ≤
I-
1, and associated probabilities pi
, and on the relation between two or more such distributions.
10 February 2012 Zeljko Jericevic, Ph.D. 35
Obrada
podataka
Entropy of a Distribution
21
20
ln
ln 0lim
je vjerojatnost stanja
I
i ii
p
i
H p p
p p
p i
=
→
= −∑
=
10 February 2012 Zeljko Jericevic, Ph.D. 36
Information theoryZdružena
entropija
H(X,Y)
kada
zajednički
promatramo
dvije
slučajne
varijable
(xi ,yj ):
( ) ( )21 1
( , ) , log ,n m
i j i ji j
H X Y p x y p x y= =
= −∑∑
10 February 2012 Zeljko Jericevic, Ph.D. 37
Information theoryUvjetna
entropija
H(Y|X)
je prosječna
vrijednost
entropije
slučajne
varijable
Y uz
poznati
X. Prosjek
se uzima
po
svim
vrijednostima
varijable
X
( ) ( )
( ) ( ) ( )
( ) ( )
1
21 1
21 1
( | ) |
| log |
, log |
n
i ii
n m
i j i j ii j
n m
i j j ii j
H X Y p x H Y x x
p x p y x p y x
p x y p y x
=
= =
= =
= =
= −
= −
∑
∑ ∑
∑∑
10 February 2012 Zeljko Jericevic, Ph.D. 38
Information theoryDrugi
način
gledanja
na
uvjetnu
entropiju
H(Y|X)
je prosječna
neodređenost
slučajne varijable
Y nakon
što
je poznata
varijabla
X. Prije
nego
što
je varijabla
X bila
poznata, entropija
varijable
Y je H(Y). Uz
pretpostavku
da
X utječe na Y, nakon
što
je X poznat, entropija
Y se smanjuje
i postaje
H(Y|X).
10 February 2012 Zeljko Jericevic, Ph.D. 39
Information theorySrednji
uzajamni
sadržaj
informacije
i relativna
entropija.
Relativna
entropija
DKL (p||q)
između
dvaju
razdioba
vjerojatnosti (p(X) i q(X)) slučajne
varijable
X je mjera
divergencije
(Kullback–
Leibler
divergence) između
spomenutih
razdioba. Budući da DKL (p||q) ≠
DKL (q||p)
nije
matematički
ispravno
zvati
DKL
udaljenošću, kao
što
je uobičajeno.
( ) ( ) ( )( )2
1|| log
ni
KL ii i
p xD p q p x
q x=
=∑
10 February 2012 Zeljko Jericevic, Ph.D. 40
Information theoryRelativna
entropija
( ) ( ) ( )( )2
1|| log
ni
KL ii i
p xD p q p x
q x=
=∑
10 February 2012 Zeljko Jericevic, Ph.D. 41
Teorija
informacija
u obradi podataka
Example 1. Suppose that we are seeing events drawn from the distribution p, but we want to rule out an alternative hypothesis that they are drawn from q. We might do this by computing a likelihood ratio L,
( )( )
||
i
data i
p Data p pLp Data q q
= =Π
10 February 2012 Zeljko Jericevic, Ph.D. 42
Teorija
informacija
u obradi podataka
and rejecting the alternative hypothesis q if this ratio is larger than some large number, say 106. (In the above shorthand notation, the product over “data”
means that
we substitute for i in each factor the particular outcome of that factor’s individual data event.) You can easily see that, under hypothesis p, the average increase in lnL
per
data event is just D(p||q). In otherwords, the Kullback- Leibler
distance is the expected log-likelihood with
which a false hypothesis q can be rejected, per event. As we might expect, this has something to do with “how different”
q is from p.
10 February 2012 Zeljko Jericevic, Ph.D. 43
Information theoryUzajamni
sadržaj
informacije
I(X;Y)
(transinformacija)
između slučajnih
varijabli
X i Y je relativna
entropija između
razdiobe
njihovih
združenih
vjerojatnosti
i
razdiobe
umnožaka
njihovih
pojedinačnih
vjerojatnosti:
( ) ( ) ( )( ) ( )2
1 1
,, , log
n mi j
i ji j i j
p x yI X Y p x y
p x p y= =
= ∑∑
10 February 2012 Zeljko Jericevic, Ph.D. 44
Information theoryUzajamni
sadržaj
informacije
I(X;Y)
(transinformacija)
izražava
mjeru
koliko
informacije
jedna
varijabla
sadrži o drugoj.
Ako
su
varijable
potpuno
neovisne, I(X;Y) = 0
jer
p(xi ,yj ) = p(xi )p(yj )
( ) ( ) ( )( ) ( )2
1 1
,, , log
n mi j
i ji j i j
p x yI X Y p x y
p x p y= =
= ∑∑
10 February 2012 Zeljko Jericevic, Ph.D. 45
Information theoryUzajamni
sadržaj
informacije
I(X;Y)
(transinformacija)
izražava
mjeru
koliko
informacije
jedna
varijabla
sadrži o drugoj.
Ako
su
varijable
jednake, I(X;Y) = H(X) = H(Y)
jer
jednu varijablu
možemo
u potpunosti
opisati
drugom:
p(xi ,yj ) = p(xi ) = p(yj ) za
i=jp(xi ,yj ) = 0 za
i≠j
10 February 2012 Zeljko Jericevic, Ph.D. 46
Information theoryEntropija
i uzajamni
sadržaj
informacije:
I(X;Y) = H(X) –
H(X|Y)je smanjenje
neodređenosti
(entropije) varijable
X
uzrokovano
poznavanjem
varijable
Y. Vrijedi
i obrnuto:
I(Y;X) = H(Y) –
H(Y|X)Simetrija
uzajamnog
sadržaja
informacije
dviju
varijabli:
I(X;Y) = I(Y;X)
10 February 2012 Zeljko Jericevic, Ph.D. 47
Information theoryEntropija
H(X), združena
entropija
H(X,Y)
i uvjetna
entropija
H(Y|X)
bazira
se na
definiciji
združene entropije
i odnosa
vjerojatnosti: p(x,y)=p(x)p(y|x)
( ) ( ) ( ) ( ) ( ) ( )
( ) ( ) ( ) ( )
( ) ( ) ( ) ( )( )
2 21 1 1 1
2 21 1 1 1
2 21 1 1
, , log , , log |
, log , log |
log , log |
n m n m
i j i j i j i i ji j i j
n m n m
i j i i j j ii j i j
n n m
i i i j j ii i j
H X Y p x y p x y p x y p x p y x
p x y p x p x y p y x
p x p x p x y p y x
H X H
= = = =
= = = =
= = =
= − = −
= − −
= − −
= +
∑∑ ∑∑
∑∑ ∑∑
∑ ∑∑( )|Y X
10 February 2012 Zeljko Jericevic, Ph.D. 48
Information theoryZdružena
entropija
H(X,Y), para
varijabli
jednaka
je
zbroju
entropije
jedne
varijable
H(X)
i preostale entropija
druge
varijable
uz
uvjet
da
je prva
varijabla
poznata
H(Y|X).Iz
toga proizlazi
da
je uzajamni
sadržaj
informacije
I(X;Y)
jednak:I(X;Y) = H(X) + H(Y) -
H(X,Y)
Gdje
je H(X,Y)
korekcija
entropije
u slučaju
ovisnih varijabli
10 February 2012 Zeljko Jericevic, Ph.D. 49
Information theoryGrafički
prikaz
odnosa
među
informacijskim
mjerama:
Združene
entropije
H(X,Y), entropija
H(X) i
H(Y), uvjetnih entropija
H(X|Y) i
H(Y|X), i uzajamnog
sadržaja
informacije
I(X;Y) i I(Y;X)
10 February 2012 Zeljko Jericevic, Ph.D. 50
Information theoryOdnosi
informacijskih
mjera: Združene
entropije
H(X,Y),
entropija
H(X) i
H(Y), uvjetnih
entropija
H(X|Y) i H(Y|X), i uzajamnog
sadržaja
informacije
I(X;Y) i I(Y;X)
10 February 2012 Zeljko Jericevic, Ph.D. 51
Information theory
Odnos
i svojstva
informacijskih
mjera
1 I(X;Y) = H(X) -
H(X|Y) Uzajamni
sadržaj informacije
–
poznavanje
jedne varijable
smanjuje
neodređenost
druge varijable
2 I(X;Y) = H(Y) -
H(Y|X)
3 I(X;Y) = H(Y) + H(Y) -
H(Y,X) Korekcija
za
iznos uzajamnog
sadržaja
informacije
10 February 2012 Zeljko Jericevic, Ph.D. 52
Information theory
Odnos
i svojstva
informacijskih
mjera
4 H(X,Y) = H(X) -
H(Y|X) Združena
entropija para
varijabli
jednaka
je zbroju entropije
jedne
varijable
i uvjetne entropije
druge
varijable
5 H(X,Y) = H(Y) -
H(X|Y)
10 February 2012 Zeljko Jericevic, Ph.D. 53
Information theory
Odnos
i svojstva
informacijskih
mjera
6 I(X;Y) = I(Y;X) Simetrija
uzajamnog
sadržaja informacije
7 I(X;X) = H(X) Vlastiti
sadržaj
informacije (entropija)
8 I(X;Y) ≥
0 Jedna
varijabla
može
nositi informaciju
o drugoj
10 February 2012 Zeljko Jericevic, Ph.D. 54
Information theory
Odnos
i svojstva
informacijskih
mjera
9 H(X|Y) ≤
H(X) Poznavanje
jedne
varijable
može smanjiti
neodređenost
druge
varijable
10 February 2012 Zeljko Jericevic, Ph.D. 55
Teorija
informacija
u obradi podataka
Nekoliko
jednostavnih
primjera
procesiranja
slika:Sobel
operator je gradijent
operator za
detekciju
rubova
u
slici
2 2
1 2 1 1 0 10 0 0 2 0 21 2 1 1 0 1
arctan
y x
yx y
x
S S
SS S S
S
−⎡ ⎤ ⎡ ⎤⎢ ⎥ ⎢ ⎥= = −⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥− − − −⎣ ⎦ ⎣ ⎦
⎛ ⎞= + Θ = ⎜ ⎟
⎝ ⎠
10 February 2012 Zeljko Jericevic, Ph.D. 56
Teorija
informacija
u obradi podataka
Nekoliko
jednostavnih
primjera
procesiranja
slika:H = {7.266727, 6.401074, 3.954096, 6.857007} bitsOriginal Sobel
H(9 točaka) H(25 točaka)
10 February 2012 Zeljko Jericevic, Ph.D. 57
Teorija
informacija
u obradi podataka
Ostale
informacijske
mjere
upotrebljive
za
problem nalaženja
rubova: Združena
entropija
H(X,Y), entropija
H(X) i
H(Y), uvjetne
entropije
H(X|Y) i
H(Y|X), uzajamni
sadržaj
informacije
I(X;Y) i I(Y;X), relativna
entropija
DKL (p||q) i
DKL (q||p)
Original Sobel
H(9 točaka) H(25 točaka)
10 February 2012 Zeljko Jericevic, Ph.D. 58
Teorija
informacija
u obradi podataka
Inverzni
problemiSuppose that u(x) is some unknown or underlying physical
process, which we hope to determine by a set of N measurements ci
, i=0,…N-1. The relation between u(x) and the ci
’s
is that each ci
measures a (hopefully distinct) aspect of u(x) through its own linear response kernel ri
, and with its own measurement error ni
. In other words,
( ) ( ) i i i i ic s n r x u x dx n≡ + = +∫
59
Teorija
informacija
u obradi podataka
Inverzni
problemi
Within the assumption of linearity, this is quite a general formulation. The ci
’s
might approximate values of u(x) at certain locations xi
, in which case ri
(x) would have the form of a more or less narrow instrumental response centered around x=xi
. Or, the ci
’s might “live”
in an entirely different function space from u(x), measuring different Fourier components of u(x), for example.
( ) ( ) i i i i ic s n r x u x dx n≡ + = +∫
10 February 2012 Zeljko Jericevic, Ph.D. 60
Teorija
informacija
u obradi podataka
The single central idea in inverse theory:
Almost all inverse problem methods involve a trade-off between two optimizations: agreement between data and solution, or “sharpness”
of mapping between true and estimated solutions
(here denoted A), and smoothness or stability of the solution (here denoted B). Among all possible solutions, shown here schematically as the shaded region, those on the boundary connecting the unconstrained minimum of A and the unconstrained minimum of B are the “best”
solutions, in the sense
that every other solution is dominated by at least one solution on the curve.
minimize: A Bλ+
10 February 2012 Zeljko Jericevic, Ph.D. 61
Teorija
informacija
u obradi podataka
Zeroth-order regularization, though dominated by better methods, demonstrates most of the basic ideas that are used in inverse problem theory. In general, there are two positive functionals, call them A and B. The first, A, measures something like the agreement of a model to the data or sometimes a related quantity
like the “sharpness”
of the mapping between the solution and the underlying function.
When A by itself is minimized, the agreement or sharpness becomes very good (often impossibly good), but the solution becomes unstable, wildly oscillating, or in other ways unrealistic, reflecting that A alone typically defines a highly degenerate minimization problem.
10 February 2012 Zeljko Jericevic, Ph.D. 62
Teorija
informacija
u obradi podataka
That is where B comes in. It measures something like the “smoothness”
of the desired solution, or sometimes a
related quantity that parametrizes
the stability of the solution with respect to variations in the data, or sometimes a quantity reflecting a priori judgments about the likelihood of a solution. B is called the stabilizing functional or regularizing operator. In any case, minimizing B by itself is supposed to give a solution that is “smooth”
or “stable”
or “likely”
— and that has
nothing at all to do with the measured data.
10 February 2012 Zeljko Jericevic, Ph.D. 63
Teorija
informacija
u obradi podataka
Metoda
maksimalne
entropije
za
restoraciju
slike
( )
,1
1 21
1
1,...,
nepoznanica (idelna sika) mjerenje (zamrljana slika)
,..., log
ˆ ˆexp 1 exp 1
n
j j i i ji
n
n i ii
mj
i j ji jj
d A f e j m
f d
H f f f f
f A eλ
μ λρ
=
=
=
= + =
= −
⎛ ⎞ ⎛ ⎞= − + + = − +⎜ ⎟ ⎜ ⎟
⎝ ⎠⎝ ⎠
∑
∑
∑
10 February 2012 Zeljko Jericevic, Ph.D. 64
Teorija
informacija
u obradi podataka
Metoda maksimalne
entropije
za restoraciju
slike
10 February 2012 Zeljko Jericevic, Ph.D. 65
Teorija
informacija
u obradi podataka
Metoda
maksimalne
entropije
za
restoraciju
slike
10 February 2012 Zeljko Jericevic, Ph.D. 66
Hvala
na
pažnjiŽeljko
Jeričević, dr. sc.
Zavod
za
računarstvo, Tehnički
fakultet
& Zavod
za
biologiju
i medicinsku
genetiku, Medicinski
fakultet
51000 Rijeka, CroatiaPhone: (+385) 51-651 594
E-mail: [email protected]://www.riteh.uniri.hr/~zeljkoj/Zeljko_Jericevic.html