
INFORMATION THEORY AND CODING

Željko Jeričević, dr. sc.
Department of Computer Engineering, Faculty of Engineering
& Department of Biology and Medical Genetics, Faculty of Medicine
51000 Rijeka, Croatia

Phone: (+385) 51-651 594 E-mail: [email protected]

http://www.riteh.uniri.hr/~zeljkoj/Zeljko_Jericevic.html

10 February 2012 Zeljko Jericevic, Ph.D.

Lecturer: Željko Jeričević

2-54, 651-594, [email protected], http://www.riteh.uniri.hr/~zeljkoj/Zeljko_Jericevic.html


Information theory in data processing

Universality of information theory


• Nominal variable: takes values from some unordered set (place and country of residence, etc.)

• Ordinal variable: takes discrete values from some ordered set, not necessarily with metric distances (order of the planets in the Solar System, school grades, etc.)

• Continuous variables take values from the set of real numbers (time, distance, temperature, etc.)


Nominal variables

A variable is called nominal if its values are the members of some unordered set. For example, “state of residence” is a nominal variable that (in the U.S.) takes on one of 50 values; in astrophysics, “type of galaxy” is a nominal variable with the three values “spiral,” “elliptical,” and “irregular.” The central tendency of a nominal attribute is given by its mode; neither the mean nor the median can be defined. Variables assessed on a nominal scale are sometimes called categorical variables.

Mode

• In statistics, the mode is the value that occurs most frequently in a data set or a probability distribution. In some fields, notably education, sample data are often called scores, and the sample mode is known as the modal score.

• Like the statistical mean and the median, the mode is a way of capturing important information about a random variable or a population in a single quantity. The mode is in general different from the mean and median, and may be very different for strongly skewed distributions.

The mode is not necessarily unique, since the same maximum frequency may be attained at different values. The most ambiguous case occurs in uniform distributions, wherein all values are equally likely.

• The mode of a discrete probability distribution is the value x at which its probability mass function takes its maximum value. In other words, it is the value that is most likely to be sampled.

• The mode of a continuous probability distribution is the value x at which its probability density function attains its maximum value; informally speaking, the mode is at the peak.

• As noted above, the mode is not necessarily unique, since the probability mass function or probability density function may achieve its maximum value at several points $x_1$, $x_2$, etc.


The above definition tells us that only global maxima are modes. Slightly confusingly, when a probability density function has multiple local maxima it is common to refer to all of the local maxima as modes of the distribution. Such a continuous distribution is called multimodal (as opposed to unimodal).


In symmetric unimodal distributions, such as the normal (or Gaussian) distribution (the distribution whose density function, when graphed, gives the famous “bell curve”), the mean (if defined), median and mode all coincide. For samples, if it is known that they are drawn from a symmetric unimodal distribution, the sample mean can be used as an estimate of the population mode. Note that a value is not a mode merely because it repeats: in the data 1, 2, 2, 3, 4, 5, 5, 6, 5, 1 the values 1 and 2 each occur twice, but 5 occurs three times, so the mode is 5. A data set has more than one mode only when several values share the highest frequency.


The mode of a data sample is the element that occurs most often in the collection. For example, the mode of the sample [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17] is 6. Given the list of data [1, 1, 2, 4, 4] the mode is not unique: the dataset may be said to be bimodal, while a set with more than two modes may be described as multimodal.
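A minimal Python sketch of the sample mode that reports every value tied for the highest frequency, so bimodal and multimodal samples are visible as such. The function name `modes` is our own; the standard library's `statistics.multimode` (Python 3.8+) behaves the same way.

```python
from collections import Counter


def modes(data):
    """Return all values that attain the highest frequency, in first-seen order."""
    counts = Counter(data)
    if not counts:
        raise ValueError("the mode of an empty sample is undefined")
    top = max(counts.values())
    return [value for value, c in counts.items() if c == top]


print(modes([1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]))  # [6]
print(modes([1, 1, 2, 4, 4]))                        # [1, 4]  (bimodal)
print(modes([1, 2, 2, 3, 4, 5, 5, 6, 5, 1]))         # [5]
```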


For a sample from a continuous distribution, such as [0.935..., 1.211..., 2.430..., 3.668..., 3.874...], the concept is unusable in its raw form, since each value will occur precisely once. The usual practice is to discretize the data by assigning frequency values to intervals of equal distance, as for making a histogram, effectively replacing the values by the midpoints of the intervals they are assigned to. The mode is then the value where the histogram reaches its peak.


For small or middle-sized samples the outcome of this procedure is sensitive to the choice of interval width, which must be neither too narrow nor too wide; typically one should have a sizable fraction of the data concentrated in a relatively small number of intervals (5 to 10), while the fraction of the data falling outside these intervals is also sizable.
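A sketch of the histogram procedure described above, assuming equal-width bins spanning the sample range (the helper `histogram_mode` and the toy sample are ours, not from the source). Running it with two bin counts on the same data shows how the estimate moves with the interval width.

```python
def histogram_mode(sample, n_bins):
    """Estimate the mode of a continuous sample as the midpoint of the fullest
    equal-width bin; ties go to the lowest such bin."""
    lo, hi = min(sample), max(sample)
    if hi == lo:
        return lo
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in sample:
        # The maximum lands exactly on the upper edge; keep it in the last bin.
        k = min(int((x - lo) / width), n_bins - 1)
        counts[k] += 1
    fullest = max(range(n_bins), key=counts.__getitem__)
    return lo + (fullest + 0.5) * width


sample = [1.0, 2.1, 2.2, 2.3, 2.4, 5.0]  # hypothetical data
print(histogram_mode(sample, 4))  # fullest bin [2, 3) -> midpoint 2.5
print(histogram_mode(sample, 2))  # fullest bin [1, 3) -> midpoint 2.0
```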


Ordinal variable

A variable is termed ordinal if its values are the members of a discrete, but ordered, set. Examples are grade in school, planetary order from the Sun (Mercury = 1, Venus = 2, ...), and number of offspring. There need not be any concept of “equal metric distance” between the values of an ordinal variable, only that they be intrinsically ordered. Another example is the Mohs scale of mineral hardness.


Ordinal variable: Mohs scale of mineral hardness


When using an ordinal scale, the central tendency of a group of items can be described by using the group's mode (or most common item) or its median (the middle-ranked item), but the mean (or average) cannot be defined.

Mean does not make sense for ordinal variables, but median does!
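A sketch of the median for ordinal data, assuming the categories are supplied as a list in their intrinsic order; the five-grade scale below is only an illustration. For an even number of items the two middle ranks cannot be averaged on an ordinal scale, so this sketch takes the lower one (`statistics.median_low`), which is one convention among several.

```python
from statistics import median_low

# Hypothetical ordered scale: list order is the ranking (lowest to highest).
GRADES = ["insufficient", "sufficient", "good", "very good", "excellent"]


def ordinal_median(values, scale):
    """Median of ordinal data: rank the categories, take the middle-ranked
    item, and map the rank back to its category. No arithmetic on the
    categories themselves is needed, only their order."""
    rank = {category: r for r, category in enumerate(scale)}
    return scale[median_low(rank[v] for v in values)]


print(ordinal_median(["good", "excellent", "sufficient", "good", "very good"], GRADES))
```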


Continuous variables

“We will call a variable continuous if its values are real numbers, as are times, distances, temperatures, etc. (Social scientists sometimes distinguish between interval and ratio continuous variables, but we do not find that distinction very compelling.)” Press, NR3

Interval continuous variables

Quantitative attributes are all measurable on interval scales, as any difference between the levels of an attribute can be multiplied by any real number to exceed or equal another difference. A highly familiar example of interval scale measurement is temperature with the Celsius scale. In this particular scale, the unit of measurement is 1/100 of the temperature difference between the freezing and boiling points of water under a pressure of 1 atmosphere.

The “zero point” on an interval scale is arbitrary, and negative values can be used. The formal mathematical term is an affine space (in this case an affine line). Variables measured at the interval level are called “interval variables” or sometimes “scaled variables” as they have units of measurement.


Ratios between numbers on the scale are not meaningful, so operations such as multiplication and division cannot be carried out directly. But ratios of differences can be expressed; for example, one difference can be twice another.
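A small numerical check of this statement, using the Celsius-to-Fahrenheit conversion as an example of moving to another interval scale (a change of unit plus a shift of the zero point). The variable names are ours.

```python
def celsius_to_fahrenheit(t_c):
    # Affine map: change of unit (factor 9/5) plus a shift of the zero point (+32).
    return 9 / 5 * t_c + 32


a, b, c = 10.0, 20.0, 40.0  # three temperatures in degrees Celsius
fa, fb, fc = (celsius_to_fahrenheit(t) for t in (a, b, c))

# Ratios of values depend on the arbitrary zero point.
ratio_c, ratio_f = b / a, fb / fa             # 2.0 versus about 1.36

# Ratios of differences do not: the shift cancels and the unit factor divides out.
diff_ratio_c = (c - b) / (b - a)
diff_ratio_f = (fc - fb) / (fb - fa)          # both 2.0 (up to rounding)
print(ratio_c, ratio_f, diff_ratio_c, diff_ratio_f)
```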


The central tendency of a variable measured at the interval level can be represented by its mode, its median, or its arithmetic mean. Statistical dispersion can be measured in most of the usual ways, which involve only differences or averaging, such as the range and the standard deviation.


Ratio continuous variables

Most measurement in the physical sciences and engineering is done on ratio scales. Mass, length, time, plane angle, energy and electric charge are examples of physical measures that are ratio scales. The scale type takes its name from the fact that measurement is the estimation of the ratio between a magnitude of a continuous quantity and a unit magnitude of the same kind.


Informally, the distinguishing feature of a ratio scale is the possession of a non-arbitrary zero value. For example, the Kelvin temperature scale has a non-arbitrary zero point of absolute zero, which is denoted 0 K and is equal to -273.15 degrees Celsius. This zero point is non-arbitrary because it is the lowest possible temperature, at which matter has its minimum possible energy (in classical terms, its particles have zero kinetic energy).
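For contrast with the interval-scale case, converting between two ratio scales that share the same absolute zero (Kelvin to Rankine, a pure change of unit) leaves ratios of values unchanged. A minimal sketch, with our own helper names:

```python
def celsius_to_kelvin(t_c):
    return t_c + 273.15          # shift to the absolute zero point


def kelvin_to_rankine(t_k):
    return 9 / 5 * t_k           # change of unit only; 0 K corresponds to 0 °R


t1_k, t2_k = celsius_to_kelvin(10.0), celsius_to_kelvin(20.0)
ratio_k = t2_k / t1_k
ratio_r = kelvin_to_rankine(t2_k) / kelvin_to_rankine(t1_k)
print(ratio_k, ratio_r)  # both about 1.0353, not 2
```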

Data processing

Example of a contingency table for two nominal variables, here sex and color. The row and column marginals (totals) are shown.

The variables are “nominal,” i.e., the order in which their values are listed is arbitrary and does not affect the result of the contingency table analysis. If the ordering of values has some intrinsic meaning, then the variables are “ordinal” or “continuous,” and correlation techniques can be utilized.

Measures of association between nominal variables: for any pair of nominal variables, the data can be displayed as a contingency table, a table whose rows are labeled by the values of one nominal variable, whose columns are labeled by the values of the other nominal variable, and whose entries are nonnegative integers giving the number of observed events for each combination of row and column. The analysis of association between nominal variables is thus called contingency table analysis or cross tabulation analysis.


The chi-square statistic does a good job of characterizing the significance of association but is only so-so as a measure of its strength (principally because its numerical values have no very direct interpretation). Contingency table analysis based on the information-theoretic concept of entropy says little about the significance of association (use chi-square for that!) but can very elegantly characterize the strength of an association already known to be significant. NR3-742

Measures of Association Based on χ²

Some notation first: let $N_{ij}$ denote the number of events that occur with the first variable x taking on its i-th value and the second variable y taking on its j-th value. Let N denote the total number of events, the sum of all the $N_{ij}$'s. Let $N_{i\cdot}$ denote the number of events for which the first variable x takes on its i-th value regardless of the value of y; $N_{\cdot j}$ is the number of events with the j-th value of y regardless of x. Thus

$$N_{i\cdot} = \sum_j N_{ij}, \qquad N_{\cdot j} = \sum_i N_{ij}, \qquad N = \sum_i N_{i\cdot} = \sum_j N_{\cdot j}$$


In other words, the “dot” is a placeholder that means “sum over the missing index.” $N_{i\cdot}$ and $N_{\cdot j}$ are sometimes called the row and column totals, or marginals.

The null hypothesis is that the two variables x and y have no association. In this case, the probability of a particular value of x given a particular value of y should be the same as the probability of that value of x regardless of y. Therefore, in the null hypothesis, the expected number for any $N_{ij}$, which we will denote $n_{ij}$, can be calculated from only the row and column totals:

$$\frac{n_{ij}}{N_{\cdot j}} = \frac{N_{i\cdot}}{N} \quad\text{which implies}\quad n_{ij} = \frac{N_{i\cdot}\,N_{\cdot j}}{N}$$

Notice that if a column or row total is zero, then the expected number for all the entries in that column or row is also zero; in that case, the never-occurring bin of x or y should simply be removed from the analysis.

The chi-square statistic is then given by the equation below, which in the present case is summed over all entries in the table:

$$\chi^2 = \sum_{i,j} \frac{(N_{ij} - n_{ij})^2}{n_{ij}}$$
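A minimal Python sketch of the computation above: marginals, expected counts $n_{ij}$ and χ², with never-occurring rows and columns dropped first as recommended. The function name is ours; for real work, `scipy.stats.chi2_contingency` computes the same statistic (note that it applies Yates' continuity correction to 2×2 tables unless `correction=False`).

```python
def chi_square(table):
    """Chi-square statistic of an I x J contingency table of counts N_ij, with
    expected counts n_ij = N_i. * N_.j / N under the null hypothesis of no
    association. Rows and columns whose total is zero are removed first."""
    table = [row for row in table if sum(row) > 0]
    keep = [j for j in range(len(table[0])) if sum(row[j] for row in table) > 0]
    table = [[row[j] for j in keep] for row in table]

    row_tot = [sum(row) for row in table]          # N_i.
    col_tot = [sum(col) for col in zip(*table)]    # N_.j
    n = sum(row_tot)                               # N
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # n_ij
            chi2 += (n_ij - expected) ** 2 / expected
    return chi2


print(chi_square([[10, 20], [30, 40]]))  # 200/252, about 0.794
```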


Suppose there is a significant association. How do we quantify its strength, so that (e.g.) we can compare the strength of one association with another? The idea here is to find some reparametrization of χ² that maps it into some convenient interval, like 0 to 1, where the result is not dependent on the quantity of data that we happen to sample, but rather depends only on the underlying population from which the data were drawn. There are several different ways of doing this. Two of the more common are called Cramér's V and the contingency coefficient C.

The formula for Cramér's V is

$$V = \sqrt{\frac{\chi^2}{N \min(I-1,\, J-1)}}$$

where I and J are again the numbers of rows and columns, and N is the total number of events. Cramér's V has the pleasant property that it lies between zero and one inclusive, equals zero when there is no association, and equals one only when the association is perfect: all the events in any row lie in one unique column, and vice versa. (In chess parlance, no two rooks, placed on a nonzero table entry, can capture each other.)

In the case of I = J = 2, Cramér's V is also referred to as the phi statistic. The contingency coefficient C is defined as

$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$

It also lies between zero and one, but (as is apparent from the formula) it can never achieve the upper limit. While it can be used to compare the strength of association of two tables with the same I and J, its upper limit depends on I and J. Therefore it can never be used to compare tables of different sizes.
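Both coefficients follow directly from χ². The sketch below (function names ours, and assuming no zero row or column totals) applies them to perfectly associated 2×2 and 3×3 tables: V reaches 1 in both, while C tops out at $\sqrt{1/2}$ and $\sqrt{2/3}$ respectively, which is why C cannot compare tables of different sizes.

```python
import math


def _chi_square(table):
    """Return (chi2, N) for a table of counts with no zero marginals."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (n_ij - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (N * min(I - 1, J - 1)))."""
    chi2, n = _chi_square(table)
    return math.sqrt(chi2 / (n * min(len(table) - 1, len(table[0]) - 1)))


def contingency_c(table):
    """Contingency coefficient C = sqrt(chi2 / (chi2 + N)); always below 1."""
    chi2, n = _chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))


perfect_2x2 = [[10, 0], [0, 10]]
perfect_3x3 = [[5, 0, 0], [0, 5, 0], [0, 0, 5]]
for t in (perfect_2x2, perfect_3x3):
    print(cramers_v(t), contingency_c(t))
```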


Information-Theoretic Properties of Distributions

In this section we return to nominal distributions, that is to say, to distributions with discrete outcomes that have no meaningful ordering. Information theory provides a different, and sometimes very useful, perspective on the nature of such a distribution p with outcomes i, $0 \le i \le I-1$, and associated probabilities $p_i$, and on the relation between two or more such distributions.


Entropy of a Distribution

$$H(p) = -\sum_{i=0}^{I-1} p_i \ln p_i, \qquad \lim_{p \to 0} p \ln p = 0$$

where $p_i$ is the probability of state i.
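A direct transcription of the formula, with the convention $0 \ln 0 = 0$ handled by skipping zero probabilities. The `base` parameter is our addition: natural logarithms (as above) give nats, base 2 gives bits as used in the following slides.

```python
import math


def entropy(p, base=math.e):
    """H(p) = -sum_i p_i log p_i; terms with p_i = 0 contribute 0 (lim p log p = 0).
    base=math.e gives nats (as in the formula above), base=2 gives bits."""
    if abs(sum(p) - 1.0) > 1e-9 or any(pi < 0 for pi in p):
        raise ValueError("p must be a probability distribution")
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)


print(entropy([0.25] * 4))          # ln 4, about 1.386 nats
print(entropy([0.25] * 4, base=2))  # 2 bits (up to rounding)
```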


Information theory

Joint entropy H(X,Y), when two random variables $(x_i, y_j)$ are observed together:

$$H(X,Y) = -\sum_{i=1}^{n}\sum_{j=1}^{m} p(x_i, y_j)\log_2 p(x_i, y_j)$$


The conditional entropy H(Y|X) is the average value of the entropy of the random variable Y given a known X. The average is taken over all values of the variable X:

$$\begin{aligned}
H(Y|X) &= \sum_{i=1}^{n} p(x_i)\, H(Y|X = x_i) \\
&= -\sum_{i=1}^{n} p(x_i) \sum_{j=1}^{m} p(y_j|x_i)\log_2 p(y_j|x_i) \\
&= -\sum_{i=1}^{n}\sum_{j=1}^{m} p(x_i, y_j)\log_2 p(y_j|x_i)
\end{aligned}$$
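A sketch computing both quantities from a joint probability table, assuming rows index $x_i$ and columns index $y_j$ (our convention) and using $p(y_j|x_i) = p(x_i,y_j)/p(x_i)$. Function names are ours.

```python
import math


def joint_entropy(pxy):
    """H(X,Y) in bits for a joint table pxy[i][j] = p(x_i, y_j)."""
    return -sum(p * math.log2(p) for row in pxy for p in row if p > 0)


def conditional_entropy_y_given_x(pxy):
    """H(Y|X) = -sum_i sum_j p(x_i, y_j) log2 p(y_j | x_i), in bits."""
    h = 0.0
    for row in pxy:
        px = sum(row)                    # marginal p(x_i)
        for p in row:
            if p > 0:
                h -= p * math.log2(p / px)   # p / px is p(y_j | x_i)
    return h


pxy = [[0.5, 0.25],
       [0.0, 0.25]]  # hypothetical joint distribution
print(joint_entropy(pxy), conditional_entropy_y_given_x(pxy))
```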


Another way of looking at the conditional entropy H(Y|X) is as the average uncertainty of the random variable Y once the variable X is known. Before X is known, the entropy of Y is H(Y). If X influences Y, then knowing X reduces the entropy of Y, on average, to H(Y|X).


Average mutual information and relative entropy

The relative entropy $D_{KL}(p\|q)$ between two probability distributions p(X) and q(X) of a random variable X is a measure of the divergence (Kullback–Leibler divergence) between these distributions. Since in general $D_{KL}(p\|q) \neq D_{KL}(q\|p)$, it is not mathematically correct to call $D_{KL}$ a distance, as is commonly done.

$$D_{KL}(p\|q) = \sum_{i=1}^{n} p(x_i)\log_2\frac{p(x_i)}{q(x_i)}$$
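A minimal sketch of the relative entropy in bits, with the example distributions p = (0.5, 0.5) and q = (0.9, 0.1) chosen by us to show the asymmetry numerically; the function name is ours.

```python
import math


def kl_divergence(p, q):
    """D_KL(p||q) = sum_i p_i log2(p_i / q_i), in bits.
    Terms with p_i = 0 contribute 0; q_i = 0 where p_i > 0 makes it infinite."""
    d = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            d += pi * math.log2(pi / qi)
    return d


p, q = [0.5, 0.5], [0.9, 0.1]
print(kl_divergence(p, q))  # log2(5/3), about 0.737
print(kl_divergence(q, p))  # about 0.531: a different value, so D_KL is not symmetric
```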




Example 1. Suppose that we are seeing events drawn from the distribution p, but we want to rule out an alternative hypothesis that they are drawn from q. We might do this by computing a likelihood ratio L,

$$L = \frac{P(\text{Data}\mid p)}{P(\text{Data}\mid q)} = \prod_{\text{data}} \frac{p_i}{q_i}$$


and rejecting the alternative hypothesis q if this ratio is larger than some large number, say $10^6$. (In the above shorthand notation, the product over “data” means that we substitute for i in each factor the particular outcome of that factor's individual data event.) You can easily see that, under hypothesis p, the average increase in ln L per data event is just D(p||q), with D computed using the natural logarithm. In other words, the Kullback–Leibler divergence (often called the Kullback–Leibler distance) is the expected log-likelihood with which a false hypothesis q can be rejected, per event. As we might expect, this has something to do with “how different” q is from p.
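A quick Monte Carlo check of this claim under assumed distributions p = (0.5, 0.5) and q = (0.9, 0.1): events are drawn from p, and the average of ln(p_i/q_i) per event is compared with D(p||q) computed with natural logarithms. The function names and the seed are ours.

```python
import math
import random


def mean_log_likelihood_ratio(p, q, n_events, seed=0):
    """Draw n_events outcomes from p and return (ln L) / n_events, where
    L = product over the data of p_i / q_i."""
    rng = random.Random(seed)
    outcomes = rng.choices(range(len(p)), weights=p, k=n_events)
    return sum(math.log(p[i] / q[i]) for i in outcomes) / n_events


def kl_nats(p, q):
    """D(p||q) with natural logarithms (nats)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p, q = [0.5, 0.5], [0.9, 0.1]
estimate = mean_log_likelihood_ratio(p, q, 100_000)
exact = kl_nats(p, q)          # equals ln(5/3), about 0.511
print(estimate, exact)         # the estimate lies close to the exact value
```

With D ≈ 0.511 nats per event in this example, roughly 27 events are needed before the expected ln L exceeds ln 10^6 ≈ 13.8.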


The mutual information I(X;Y) (transinformation) between random variables X and Y is the relative entropy between the distribution of their joint probabilities and the distribution of the products of their individual (marginal) probabilities:

$$I(X;Y) = \sum_{i=1}^{n}\sum_{j=1}^{m} p(x_i, y_j)\log_2\frac{p(x_i, y_j)}{p(x_i)\,p(y_j)}$$



Mutual information I(X;Y) (transinformation) measures how much information one variable contains about the other. If the variables are completely independent, I(X;Y) = 0, because $p(x_i, y_j) = p(x_i)\,p(y_j)$ and every logarithm in the sum above is $\log_2 1 = 0$.



If the variables are identical, I(X;Y) = H(X) = H(Y), because one variable completely describes the other: $p(x_i, y_j) = p(x_i) = p(y_j)$ for i = j, and $p(x_i, y_j) = 0$ for i ≠ j.
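A sketch of the definition, checked against the two limiting cases described above (a hypothetical independent table and a table with Y = X); the function name and the tables are ours.

```python
import math


def mutual_information(pxy):
    """I(X;Y) = sum_ij p(x_i,y_j) log2[p(x_i,y_j) / (p(x_i) p(y_j))], in bits,
    i.e. D_KL between the joint table and the product of its marginals."""
    px = [sum(row) for row in pxy]
    py = [sum(col) for col in zip(*pxy)]
    return sum(p * math.log2(p / (px[i] * py[j]))
               for i, row in enumerate(pxy)
               for j, p in enumerate(row) if p > 0)


independent = [[0.25, 0.25], [0.25, 0.25]]  # p(x,y) = p(x) p(y)
identical = [[0.5, 0.0], [0.0, 0.5]]        # Y = X, with H(X) = 1 bit
print(mutual_information(independent), mutual_information(identical))  # 0.0 1.0
```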


Entropy and mutual information: $I(X;Y) = H(X) - H(X|Y)$ is the reduction in the uncertainty (entropy) of the variable X caused by knowing the variable Y. The converse also holds: $I(Y;X) = H(Y) - H(Y|X)$. Symmetry of the mutual information of two variables: $I(X;Y) = I(Y;X)$.


The relation between the entropy H(X), the joint entropy H(X,Y) and the conditional entropy H(Y|X) follows from the definition of joint entropy and the probability relation $p(x,y) = p(x)\,p(y|x)$:

$$\begin{aligned}
H(X,Y) &= -\sum_{i=1}^{n}\sum_{j=1}^{m} p(x_i, y_j)\log_2 p(x_i, y_j) = -\sum_{i=1}^{n}\sum_{j=1}^{m} p(x_i, y_j)\log_2\big[p(x_i)\,p(y_j|x_i)\big] \\
&= -\sum_{i=1}^{n}\sum_{j=1}^{m} p(x_i, y_j)\log_2 p(x_i) - \sum_{i=1}^{n}\sum_{j=1}^{m} p(x_i, y_j)\log_2 p(y_j|x_i) \\
&= -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i) - \sum_{i=1}^{n}\sum_{j=1}^{m} p(x_i, y_j)\log_2 p(y_j|x_i) \\
&= H(X) + H(Y|X)
\end{aligned}$$


The joint entropy H(X,Y) of a pair of variables equals the sum of the entropy of one variable, H(X), and the remaining entropy of the other variable given that the first variable is known, H(Y|X). Combining $H(Y|X) = H(X,Y) - H(X)$ with $I(X;Y) = H(Y) - H(Y|X)$ gives the mutual information

$$I(X;Y) = H(X) + H(Y) - H(X,Y)$$

where subtracting H(X,Y) corrects the sum H(X) + H(Y) for the dependence between the variables; for independent variables H(X,Y) = H(X) + H(Y) and I(X;Y) = 0.


Graphical representation of the relations among information measures: the joint entropy H(X,Y), the entropies H(X) and H(Y), the conditional entropies H(X|Y) and H(Y|X), and the mutual information I(X;Y) and I(Y;X).


Relations among information measures: the joint entropy H(X,Y), the entropies H(X) and H(Y), the conditional entropies H(X|Y) and H(Y|X), and the mutual information I(X;Y) and I(Y;X).


Relations and properties of information measures

1. $I(X;Y) = H(X) - H(X|Y)$: mutual information; knowing one variable reduces the uncertainty of the other
2. $I(X;Y) = H(Y) - H(Y|X)$
3. $I(X;Y) = H(X) + H(Y) - H(X,Y)$: correction for the amount of mutual information
4. $H(X,Y) = H(X) + H(Y|X)$: the joint entropy of a pair of variables equals the sum of the entropy of one variable and the conditional entropy of the other
5. $H(X,Y) = H(Y) + H(X|Y)$
6. $I(X;Y) = I(Y;X)$: symmetry of mutual information
7. $I(X;X) = H(X)$: self-information (entropy)
8. $I(X;Y) \ge 0$: one variable can carry information about the other, never negative information
9. $H(X|Y) \le H(X)$: knowing one variable can only reduce (never increase) the uncertainty of the other
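The nine relations can be checked numerically on any joint distribution. The sketch below uses a hypothetical 3×2 table (our choice) and computes each quantity from its own definition rather than from the identities, so every line of the list is tested independently.

```python
import math


def h(probs):
    """Entropy in bits of an iterable of probabilities (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_info(table):
    """I(X;Y) in bits from a joint table (rows: x, columns: y)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return sum(p * math.log2(p / (rows[i] * cols[j]))
               for i, r in enumerate(table) for j, p in enumerate(r) if p > 0)


def close(a, b):
    return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)


# Hypothetical joint distribution p(x_i, y_j); rows index x, columns index y.
pxy = [[0.30, 0.10],
       [0.05, 0.25],
       [0.10, 0.20]]
px = [sum(r) for r in pxy]
py = [sum(c) for c in zip(*pxy)]

H_X, H_Y, H_XY = h(px), h(py), h(p for r in pxy for p in r)
H_Y_given_X = -sum(p * math.log2(p / px[i])
                   for i, r in enumerate(pxy) for p in r if p > 0)
H_X_given_Y = -sum(p * math.log2(p / py[j])
                   for r in pxy for j, p in enumerate(r) if p > 0)
I_XY = mutual_info(pxy)
diag_x = [[px[i] if i == j else 0.0 for j in range(len(px))] for i in range(len(px))]

checks = {
    "1  I(X;Y) = H(X) - H(X|Y)": close(I_XY, H_X - H_X_given_Y),
    "2  I(X;Y) = H(Y) - H(Y|X)": close(I_XY, H_Y - H_Y_given_X),
    "3  I(X;Y) = H(X) + H(Y) - H(X,Y)": close(I_XY, H_X + H_Y - H_XY),
    "4  H(X,Y) = H(X) + H(Y|X)": close(H_XY, H_X + H_Y_given_X),
    "5  H(X,Y) = H(Y) + H(X|Y)": close(H_XY, H_Y + H_X_given_Y),
    "6  I(X;Y) = I(Y;X)": close(I_XY, mutual_info([list(c) for c in zip(*pxy)])),
    "7  I(X;X) = H(X)": close(mutual_info(diag_x), H_X),
    "8  I(X;Y) >= 0": I_XY >= 0,
    "9  H(X|Y) <= H(X)": H_X_given_Y <= H_X,
}
for name, ok in checks.items():
    print(name, ok)
```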


A few simple examples of image processing: the Sobel operator is a gradient operator for edge detection in images.

$$S_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}, \qquad S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$$

$$S = \sqrt{S_x^2 + S_y^2}, \qquad \Theta = \arctan\left(\frac{S_y}{S_x}\right)$$
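A minimal pure-Python sketch of the Sobel operator on a small gray-level image given as a list of rows. It correlates both kernels with each interior pixel's 3×3 neighborhood (border pixels are simply left at zero, one of several possible border policies) and uses `math.atan2` in place of a plain arctan so that $S_x = 0$ does not divide by zero. The toy image and function names are ours.

```python
import math

# Kernels as in the formula above.
S_Y = [[1, 2, 1],
       [0, 0, 0],
       [-1, -2, -1]]
S_X = [[-1, 0, 1],
       [-2, 0, 2],
       [-1, 0, 1]]


def apply_3x3(img, kernel, r, c):
    """Correlate a 3x3 kernel with the neighborhood of pixel (r, c)."""
    return sum(kernel[i][j] * img[r + i - 1][c + j - 1]
               for i in range(3) for j in range(3))


def sobel(img):
    """Return (magnitude, direction) images; border pixels are left at 0."""
    rows, cols = len(img), len(img[0])
    mag = [[0.0] * cols for _ in range(rows)]
    theta = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = apply_3x3(img, S_X, r, c)
            gy = apply_3x3(img, S_Y, r, c)
            mag[r][c] = math.hypot(gx, gy)     # S = sqrt(S_x^2 + S_y^2)
            theta[r][c] = math.atan2(gy, gx)   # arctan(S_y / S_x), quadrant-aware
    return mag, theta


# Toy image (hypothetical): dark left half, bright right half, one vertical edge.
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
magnitude, _ = sobel(image)
print(magnitude[2])  # nonzero only at the two columns straddling the edge
```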


A few simple examples of image processing: H = {7.266727, 6.401074, 3.954096, 6.857007} bits.

[Figure panels: Original, Sobel, H(9 points), H(25 points)]
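The slide does not say how H(9 points) and H(25 points) were computed; one plausible reading, assumed here, is the Shannon entropy of the gray levels in the 3×3 (9-point) and 5×5 (25-point) neighborhood of each pixel. A sketch under that assumption, with our own function names:

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy (bits) of the empirical distribution of gray levels."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def local_entropy(img, radius=1):
    """Entropy of the (2*radius+1)^2 neighborhood around each interior pixel:
    radius=1 uses 9 points, radius=2 uses 25 points. Border pixels stay at 0."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(radius, rows - radius):
        for c in range(radius, cols - radius):
            window = [img[i][j]
                      for i in range(r - radius, r + radius + 1)
                      for j in range(c - radius, c + radius + 1)]
            out[r][c] = entropy_bits(window)
    return out


image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]  # hypothetical toy image
print(entropy_bits([p for row in image for p in row]))  # whole-image entropy
print(local_entropy(image, radius=1)[2])  # high only where the window crosses the edge
```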


Other information measures usable for the edge-detection problem: the joint entropy H(X,Y), the entropies H(X) and H(Y), the conditional entropies H(X|Y) and H(Y|X), the mutual information I(X;Y) and I(Y;X), and the relative entropies $D_{KL}(p\|q)$ and $D_{KL}(q\|p)$.

[Figure panels: Original, Sobel, H(9 points), H(25 points)]


Inverse problems

Suppose that u(x) is some unknown or underlying physical process, which we hope to determine by a set of N measurements $c_i$, i = 0, …, N−1. The relation between u(x) and the $c_i$'s is that each $c_i$ measures a (hopefully distinct) aspect of u(x) through its own linear response kernel $r_i$, and with its own measurement error $n_i$. In other words,

$$c_i \equiv s_i + n_i = \int r_i(x)\,u(x)\,dx + n_i$$

Within the assumption of linearity, this is quite a general formulation. The $c_i$'s might approximate values of u(x) at certain locations $x_i$, in which case $r_i(x)$ would have the form of a more or less narrow instrumental response centered around $x = x_i$. Or, the $c_i$'s might “live” in an entirely different function space from u(x), measuring different Fourier components of u(x), for example.
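A sketch of the forward model for the first case above (narrow instrumental responses), assuming a uniform grid on [0, 1], Gaussian kernels of width 0.02 and zero noise; all of these choices, and the function names, are ours. The integral is approximated by a Riemann sum.

```python
import math

# Hypothetical discretization: u(x) is sampled on a uniform grid over [0, 1].
N_GRID = 201
XS = [k / (N_GRID - 1) for k in range(N_GRID)]
DX = XS[1] - XS[0]


def gaussian_kernel(center, width):
    """A narrow instrumental response r_i(x) centered on x = center, with unit area."""
    norm = 1.0 / (width * math.sqrt(2.0 * math.pi))
    return [norm * math.exp(-0.5 * ((x - center) / width) ** 2) for x in XS]


def measure(u, kernels, noise):
    """c_i = integral of r_i(x) u(x) dx + n_i, with the integral as a Riemann sum."""
    u_vals = [u(x) for x in XS]
    return [DX * sum(r * uv for r, uv in zip(kernel, u_vals)) + n_i
            for kernel, n_i in zip(kernels, noise)]


centers = [0.2, 0.4, 0.6, 0.8]
kernels = [gaussian_kernel(x_i, 0.02) for x_i in centers]
c = measure(lambda x: math.sin(math.pi * x), kernels, noise=[0.0] * len(centers))
print([round(ci, 3) for ci in c])  # each c_i is close to sin(pi * x_i)
```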


The single central idea in inverse theory:

Almost all inverse problem methods involve a trade-off between two optimizations: agreement between data and solution, or “sharpness” of mapping between true and estimated solutions (here denoted A), and smoothness or stability of the solution (here denoted B). Among all possible solutions, shown here schematically as the shaded region, those on the boundary connecting the unconstrained minimum of A and the unconstrained minimum of B are the “best” solutions, in the sense that every other solution is dominated by at least one solution on the curve.

$$\text{minimize: } A + \lambda B$$


Zeroth-order regularization, though dominated by better methods, demonstrates most of the basic ideas that are used in inverse problem theory. In general, there are two positive functionals, call them A and B. The first, A, measures something like the agreement of a model to the data, or sometimes a related quantity like the “sharpness” of the mapping between the solution and the underlying function.

When A by itself is minimized, the agreement or sharpness becomes very good (often impossibly good), but the solution becomes unstable, wildly oscillating, or in other ways unrealistic, reflecting that A alone typically defines a highly degenerate minimization problem.


That is where B comes in. It measures something like the “smoothness” of the desired solution, or sometimes a related quantity that parametrizes the stability of the solution with respect to variations in the data, or sometimes a quantity reflecting a priori judgments about the likelihood of a solution. B is called the stabilizing functional or regularizing operator. In any case, minimizing B by itself is supposed to give a solution that is “smooth” or “stable” or “likely”, and that has nothing at all to do with the measured data.
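A minimal sketch of zeroth-order regularization, assuming the common choices A = |R u − c|² (squared data misfit for a discretized kernel matrix R) and B = |u|²; the slides do not spell out A and B in this form. Setting the gradient of A + λB to zero gives the linear system $(R^T R + \lambda I)\,u = R^T c$, solved here with a small Gaussian-elimination helper so the block needs no third-party libraries. The 2×3 problem is hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b_i] for row, b_i in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def zeroth_order_regularization(R, c, lam):
    """Minimize A + lam * B with A = |R u - c|^2 and B = |u|^2 by solving
    (R^T R + lam I) u = R^T c."""
    n_obs, n = len(R), len(R[0])
    rtr = [[sum(R[k][i] * R[k][j] for k in range(n_obs)) + (lam if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    rtc = [sum(R[k][i] * c[k] for k in range(n_obs)) for i in range(n)]
    return solve(rtr, rtc)


# Hypothetical underdetermined problem: 2 measurements, 3 unknowns.
R = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
c = [2.0, 3.0]
for lam in (1e-6, 1.0, 100.0):
    u = zeroth_order_regularization(R, c, lam)
    misfit = sum((sum(r * ui for r, ui in zip(row, u)) - ci) ** 2 for row, ci in zip(R, c))
    print(lam, [round(ui, 3) for ui in u], round(misfit, 6))
```

For tiny λ the solution approaches the minimum-norm solution (1/3, 5/3, 4/3) of the underdetermined system; as λ grows, |u| shrinks and the misfit grows, tracing out the A-versus-B trade-off described above.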


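Maximum entropy method for image restoration

$$d_j = \sum_{i=1}^{n} A_{ji} f_i + e_j, \qquad j = 1, \dots, m$$

where f is the unknown (ideal image) and d is the measurement (blurred image). The entropy of the estimate is

$$H(f_1, \dots, f_n) = -\sum_{i=1}^{n} f_i \ln f_i$$

and maximizing H subject to the measurement constraints (neglecting the noise $e_j$), with Lagrange multipliers $\lambda_j$, gives

$$\hat f_i = \exp\left(-1 + \sum_{j=1}^{m} \lambda_j A_{ji}\right)$$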
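A toy sketch of the maximum-entropy estimate, assuming noise-free data ($e_j = 0$) and finding the multipliers $\lambda_j$ by plain gradient descent on the convex dual function; the solver, step size and the 4-pixel example are our choices, not the method used for the restored images in the slides.

```python
import math


def maxent_restore(A, d, step=0.1, tol=1e-10, max_iter=100_000):
    """Maximum-entropy estimate f maximizing H(f) = -sum_i f_i ln f_i subject to
    sum_i A[j][i] * f_i = d[j] for every measurement j (noise neglected).

    Stationarity of the Lagrangian gives f_i = exp(-1 + sum_j lam_j A[j][i]);
    the multipliers lam_j are found by gradient descent on the convex dual
    Phi(lam) = sum_i f_i(lam) - sum_j lam_j d[j], whose gradient is A f - d."""
    m, n = len(A), len(A[0])
    lam = [0.0] * m
    for _ in range(max_iter):
        f = [math.exp(-1.0 + sum(lam[j] * A[j][i] for j in range(m)))
             for i in range(n)]
        residual = [sum(A[j][i] * f[i] for i in range(n)) - d[j] for j in range(m)]
        if max(abs(r) for r in residual) < tol:
            return f
        lam = [lam[j] - step * residual[j] for j in range(m)]
    raise RuntimeError("dual gradient descent did not converge")


# Toy 1-D "image" of 4 pixels (hypothetical): the data only give two pair sums.
A = [[1, 1, 0, 0],
     [0, 0, 1, 1]]
d = [1.0, 3.0]
print([round(fi, 6) for fi in maxent_restore(A, d)])  # [0.5, 0.5, 1.5, 1.5]
```

When the data constrain only sums of pixels, the estimate spreads the intensity as evenly as the constraints allow, which is the characteristic maximum-entropy behavior.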




Thank you for your attention
