32
Statistical inference The use of a set of sample data to draw information (make statements) about some aspects of the process that generated those data is referred to as statistical inference. DATA GENERATING PROCESS (DGP) Largely unknown STATISTICAL INFERENCE Statement(s) about the unknown parameter(s) that governs the process OBSERVED DATA x The data generating process (DGP) can be modeled by a certain probability density function (PDF) f(x) with = 1 , 2 ,…, k (a vector of parameters). If, for example, X is distributed according to a normal PDF, N(, 2 ), i.e., a two-parameter PDF , the observed data (x 1 ,x 2 ,…,x n ) are a random sample from a normal distribution with parameters and 2 . Note that if one (or more) parameter(s) is(are) left as unknown, not a unique distribution, but a family of distributions is specified.

Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Statistical inference

The use of a set of sample data to drawinformation (make statements) about someaspects of the process that generated those datais referred to as statistical inference.

DATA GENERATING PROCESS (DGP)

Largely unknown

STATISTICAL INFERENCE

Statement(s) about the unknown parameter(s) that

governs the process

OBSERVED DATAxThe data generating process (DGP) can be modeled

by a certain probability density function (PDF)f(x) with = 1, 2 ,…, k (a vector ofparameters).

If, for example, X is distributed according to anormal PDF, N(,2), i.e., a two-parameter PDF, theobserved data (x1,x2,…,xn) are a random samplefrom a normal distribution with parameters and2.

Note that if one (or more) parameter(s) is(are) leftas unknown, not a unique distribution, but a familyof distributions is specified.

Page 2: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Two major approaches to statistical inference are the following:

Frequentist (maximum likelihood) Bayesian

Note that, in the statistical context, a distinction is made between terms «likelihood»and «probability», depending on the aspect focused on, whether outcomes orparameters:

«probability» is used before data are available, to describe the plausibility of a futureoutcome, given a value for the parameter;

«likelihood» is used after data are available, to describe the plausibility of a parametervalue.

The three major types of statistical inference are:

Point Estimation (what single value of the parameter is most appropriate?) Interval Estimation (what interval of parameter values is most consistent with data?) Hypothesis Testing (which of two values of the parameter is most consistent with

data?)

Page 3: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Frequentist approach

This approach judges inferences based on their performance in repeated sampling, i.e.,based on the sampling distributon of the statistic used to make the inference.Several ad hoc methods are used to select the statistics used for inference.

Bayesian approach

This approach assumes that the inference problem is subjective and proceeds by:

eliciting a prior distribution of the parameter; combining prior distribution with the density of data to obtain the joint distribution

of the parameter and the data; using Bayes’ Theorem to obtain the posterior distribution of the parameter, given the

data.

Page 4: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Frequentist approach - point estimation

Let us assume that we have data (x1,x2,…,xn), which represent a random sample from anormal distribution with parameters and 2 and, for simplicity, let us consider 2 asknown.

The probability density function in this case is:

Available data have to be used to determine an estimate of .

The frequentist approach uses the sample mean of data, x, as a point estimate of ,based on the fact that the sampling distribution related to f(x) is itself a normaldistribution centered on and having a sampling variance 2 /n.

Page 5: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Frequentist approach - interval estimation (confidence interval)

In this approach an interval of parameter values consistent with data or supported bydata has to be found.

At this aim Jerzy Neyman introduced confidence intervals in 1937, giving them thefollowing definition:

«A P% confidence interval for a parameter is an interval generated by a procedure that,on repeated sampling, has an P% of containing the true value of the parameter»

The confidence level indicates, then, the proportion of observed intervals, obtainedfrom many separate data analyses of replicated experiments, that contain the true valueof the parameter.

The P value is called the confidence level, usually indicated as 1-a, whereas a is calledsignificance, or confidence, coefficient.

Page 6: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

As a consequence of the original definition, a confidence interval among those calculatedfrom several samples might not include the true value of the parameter.

In the following picture blue vertical line segments represent 50 calculations ofconfidence intervals for the population mean μ, represented as a red horizontal dashedline:

Note that some confidence intervals do not contain the population mean, as expected.If we randomly choose one realization, in 95% of cases we end up having chosen aninterval that contains the parameter; however we may be unlucky and pick the wrongone.

The procedure leading to the construction of the 50 intervals is referred to as confidenceprocedure.

*

*

*

Page 7: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Types of confidence interval for the mean when a normally-distributed variable isconsidered

First case: the variance, 2, is known

Second case: the variance, 2, is unknown; n > 30

X Xn

z )2/1( a

n

z )2/1( a- +

Xn

s z )2/1( aX

n

s z )2/1( a- +

Third case: the variance, 2, is unknown; n < 30

- +Xn

s t )2/1(1n a Xn

s t )2/1(1n a

n-1 t 0.95 t 0.975 t0.995

1 6.31 12.71 63.662 2.92 4.30 9.923 2.35 3.18 5.844 2.13 4.78 4.605 2.01 2.57 4.036 1.94 2.45 3.717 1.89 2.37 3.508 1.86 2.31 3.559 1.83 2.26 3.2510 1.81 2.23 3.1720 1.72 2.09 2.8530 1.70 2.04 2.7560 1.67 2.00 2.66120 1.66 1.98 2.62 1.645 1.96 2.58

Page 8: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Bayesian approach: Bayes’ theorem

The theorem is stated mathematically as thefollowing equation:

where A and B are events and P(B) 0.

P(A|B) is a conditional probability: the probability of event A to occur if B is true

P(B|A) is another conditional probability: the probability of event B to occur if A istrue

P(A) and P(B) are called marginal probabilities: the probabilities of observing A and Bindependently on each other.

Page 9: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

A simple example of the Bayesian approach

Suppose that a test for using a particular drug is 99% sensitive and 99% specific.That is, the test will produce 99% true positive results for drug users and 99% true negativeresults for non-drug users.Suppose that 0.5% of people are users of the drug.What is the probability that a randomly selected individual with a positive test is a druguser?

The calculation based on the Bayes’ theorem is:

In this equation P(+|User), the probability that the test will be positive for a drug user, isknown by hypothesis, i.e. 99% (0.99).Also P(User), the probability of finding a drug user in the population, is known byhypothesys, i.e. 0.5 % (0.005).

P(+) is the total probability that the test is positive.

Page 10: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

P(+) has to account for true positive tests but also for false positive tests, i.e. cases in whichthe test is positive although the person is NOT a drug user. Then:

Under the initial hypothesys P(+| Non-user) is 0.01 % (0.001) and P(Non-user) is 99.5%(0.995).

The final calculation is, then

Note that the key probability, in this case, is that of false positives, P(+|Non-user), since itis multiplied by a number close to unity (i.e. the probability of finding a non-user, which ishigh), thus if it is not small the denominator is raised and the P(User|+) is decreased.

Page 11: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

The calculation can be visualized by the scheme below, in which U are users and Ū are nonusers:

Note that the final probability, 33.2%, arises from the ratio between P(U ꓵ +) and the sumbetween it and P(Ū ꓵ +), where the symbol ꓵ indicates that both U and + events have tooccur.

It is clear that only by reducing further P(+|Ū), the probability of false negatives, the P(Ū ꓵ+) can be reduced and thus the P(U ꓵ +) has a greater weight on the total of positiveresults.

Page 12: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Bayesian approach – credible intervals

The Bayesian approach provides an alternative to confidence intervals, represented bycredible intervals.

According to the definition, a 95% credible interval is the interval of values in which weare 95% certain that an unobserved parameter falls, based on sample data of size n.

With the credible interval approach one starts from sample data to arrive to thepopulation parameter, whereas, according to statisticians not favorable to the confidenceinterval approach, i.e., to the frequentist approach, this appears to do the reverse.

Page 13: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

An ironic representation of this controversy is given by the following cartoon, in whichsymbols [X|Q] and [Q|X] are intended to indicate, respectively, that X, the sample data,are inferred if Q, the parameter value, is known, and viceversa.

German Molina & Enrique ter Horst, 2001

Page 14: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Calculation of a credible interval

Let us assume that we have data (x1,x2,…,xn), that can be referred to as a vector x, obtainedfrom a random variable X and suppose that this variable is distributed according to aparameter .

In Bayesian analysis also this parameter is treated as a random variable, taking values in aspace Q and supposed to be distributed with a probability density function h(), which isknown as prior distribution.

The joint probability density function, i.e., the probability that a data vector x will beobtained if a specific value of the parameter is true, P(x, ),

can be calculated from the product:

Additionally, the (unconditional) probability density function of X is the function f given by:

if the parameter has a discrete distribution

Page 15: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

if the parameter has a continuous distribution

Finally, starting from the Bayes’s theorem, the posterior probability density function of canbe determined by the following equation:

In one of the most common Bayesian approaches, a credible interval (or Bayesian confidenceset) at a 1-a level of confidence, indicated as C(x), is a subset of the parameter space thatdepends on the data vector x so that:

P[ C(x)|X = x] = 1-a

In this definition is random and the interval can be obtained only if h(|x) is known.

An example of the calculation is given in the following for the case of Normal Distribution.

Page 16: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Credible intervals: the case of Normal Distribution

Suppose that x = (x1,x2,…,xn) is a random sample of size n obtained from a normaldistribution with unknown mean and known variance 2:

Suppose, also, that μ has a normal distribution h() with mean a and variance b(representing the prior distribution):

Page 17: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Under these conditions the following equation can be written:

with:

Page 18: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

On the other hand:

Therefore:

If the pre-exponential term is condensed into a term C together with parts of the exponentialfunction not including (those marked by red or green boxes), the following equation isobtained:

Page 19: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

The expression can be rearranged as follows:

where:

Page 20: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

According to the Bayes’ theorem, the h()f(x|) product should be divided by f(x) to obtainh(|x):

Considering that f(x) can be considered as a normalizing factor (i.e. a numerical value), theresulting expression, corresponding to the posterior distribution h(|x), is proportional to anormal distribution with the following mean and variance:

mean variance

This distribution is said to be conjugate to the normal distribution with unknown mean andknown variance.Note that the variance of the posterior distribution is deterministic, since it depends on dataonly through the sample size n.

In the special case in which b = σ, the posterior distribution is normal with mean (y+a)/(n+1)and variance σ2/(n+1).

Page 21: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

A numerical comparison between frequentist and Bayesian confidence intervals

Initial conditions

The length of a certain machined part is supposed to be 10 centimeters but, due toimperfections in the manufacturing process, the actual length is normally distributed withmean μ and variance σ2.

The variance is due to inherent factors in the process, which remain fairly stable over time.From historical data, it is known that σ = 0.3. On the other hand, μ may be set by adjustingvarious parameters in the process and hence may change to an unknown value fairlyfrequently. Thus, suppose that we consider for μ a prior normal distribution with mean 10and standard deviation 0.3; moreover, a sample of 100 parts has mean 10.2.

Frequentist 95% confidence interval

In this case: X Xn

z )2/1( a

n

z )2/1( a- +

Since z0.975 = 1.96, the interval is:

10.2 – (1.96 × 0.3)/10 10.2 + (1.96 × 0.3)/10 10.1412 10.2588

Page 22: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Bayesian 95% confidence interval

In the specific example, the standard deviation of the prior distribution (b) and are thesame, so the posterior distribution, that has to be used to calculate the confidence interval,has the following parameters:

Mean) (y+a)/(n+1)

y corresponds to the sum of values obtained experimentally, i.e., 10.2 × 100 = 1020a is the mean of the prior distribution, equal to 10,n is the number of parts subjected to length measurements, i.e. 100.

The mean of posterior distribution is, then, equal to 1030/101 = 10.198

Standard deviation) [σ2/(n+1)]1/2

= 0.3, thus the standard deviation of the posterior distribution is equal to:

[0.32/101]1/2 = 0.0299

Page 23: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Since z0.975 = 1.96, the Bayesian 95% confidence interval is:

10.198 – (1.96 × 0.0299) 10.198 + (1.96 × 0.0299) 10.1394 10.2566

The comparison between the two types of intervals, before making rounding off decimalfigures:

Frequentist) 10.1412 10.2588

Bayesian) 10.1394 10.2566

shows that there can be cases in which the two intervals may become identical once therounding of figures is performed, although this cannot be considered as a general rule.

Page 24: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Conceptual comparison between frequentist and Bayesian approaches

The difference in the approach to statistical inference followed by frequentists and Bayesianscan be conceptualized as follows.

A frequentist believes that probabilities are only defined as the quantities obtained in thelimit after the number of independent trials tends to infinity. For example, if an unbiasedcoin is tossed over numerous trials, the probability 1/2 represents the value to which theratio between heads (or tails) and the total number of trials will converge as the number oftrials tends to infinity.

A Bayesian interprets probabilities as the degree of belief in a hypothesis. Under thisphilosophy, it is perfectly valid to begin with a prior distribution, gather a few observations,and then make decisions based on the resulting posterior distribution from applying Bayes’theorem.

Bayesian credible intervals treat their bounds as fixed and the estimated parameter as arandom variable, whereas frequentist confidence intervals treat their bounds as randomvariables and the parameter as a fixed value.

Page 25: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Apart from special cases, Bayesian credible intervals do not coincide with frequentistconfidence intervals for two reasons:

credible intervals incorporate problem-specific contextual information from the priordistribution, whereas confidence intervals are based only on the data;

credible intervals and confidence intervals treat nuisance parameters in radically differentways.

Page 26: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Qualification criteria for estimators

Four main criteria can be cited among those that enable a qualification of the estimatorof a parameter referred to a random variable:

A. Consistency

B. Non-distorsion

C. Efficiency

D. Minimum variance

Page 27: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

A. Consistency

Consistency of an estimator is theproperty usually required as mandatoryto discriminate a general «statistics»from an «estimate» of a parameter.

Consistency is an asymptotic property:an estimator n can be defined«consistent» if it tends to theparameter when n, the number ofdata it is based on, tends to infinity:

As shown in the following, the sampling mean is a typical example of consistent estimator.

Page 28: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Let us consider a random variable X N(, 2), with unknown and 2 known andgreater than 0.It is easy to show that the sampling mean,

is distributed as a normal distribution centered on and with variance 2/n.

As a consequence, when n diverges towards infinity, the distribution dispersion tends to0.

As for the consistency, if the Chebyshev theorem is applied:

the following equation can be obtained:

Page 29: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

B. Non-distorsion

An estimator is defined as notdistorted or «unbiased» if itsexpectation is equal to theparameter:

UnbiasedBiased

E =

Let us consider the mean as theparameter to be estimated for acontinuous random variable X.It is easy to demonstrate thatexpectation of the sampling mean:

corresponds to the mean , thus the sampling mean is an unbiased and consistentestimator of the population mean.

Page 30: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

C. Efficiency

As a premise to the concept of efficiency of an estimator, it is worth noting that if twounbiased estimators exist for a parameter , also every possible linear combination of thetwo estimators is still an unbiased estimator.

Consequently, infinite unbiased estimators exist, thus the criterion of efficiency fulfils thetask of choosing between several unbiased estimators.

In particular, the efficiency ratio, Re, based on the variances of the two estimators, can beadopted:

Page 31: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

Specifically, an estimator is more efficient than another one if its variance is lower, whichmeans that the efficiency ratio expressed with respect to the other one is higher than 1.

As an example, let us compare two unbiased estimators of , corresponding to thesampling mean and the sampling median .

Since the variance of the sampling mean is equal to 2/n, whereas the variance of thesampling median (for large numbers of data) can be demonstrated to be (/2) 2/n, theefficiency ratio is given by:

Thus the sampling mean is a more efficient estimator than the sampling median.

Page 32: Inferenza statistica e stimatoripuccini.chimica.uniba.it/~losito/LezioniChemiometria/3. Inferenza statistica.pdfTitle: Microsoft PowerPoint - Inferenza statistica e stimatori Author:

D. Minimum variance

If a class of unbiased estimators isavailable, the optimal estimator is theone having the minimum variance.

The Rao-Cramer theorem expresses alower bound on the variance ofunbiased estimators of a deterministic(fixed, though unknown) parameter.

It is possible to demonstrate that thesampling mean is the estimator withminimum variance.

The sampling mean can be defined as aUniformly Minimum Variance UnbiasedEstimator (UMVUE).

UMVUE = Uniformly Minimum VarianceUnbiased Estimator