doc.Ing. Zlata Sojková, CSc. 1
Inferential statistics
Suppose we have a bag of nuts. I choose one of the nuts, crack it, and it is empty. What can I conclude? The optimist says: "Just this one! Only one nut is bad, and I happened to pull it. At least we got rid of it." The pessimist says: "This is what I was afraid of: the bag is full of bad nuts." What will the statistician say? "I declare that both the pessimist and the optimist may be right.
To determine whether the nuts in the bag are bad, it is enough to select a few nuts from different places in the bag and crack them ..."
Statistical inference is based on sample investigation
Statistical inference is the process of using sample results to draw conclusions about the parameters of a population.
The sample should be representative of the population. In the picture it is not ...
Examples of inferential statistics
• Household accounts
• Marketing research of consumer behavior (patterns?)
• Sample investigation of agricultural enterprises
• Survey of public opinion
• Quality control
Inferential statistics (or Statistical inference)
Assume that we are working with a sample and we calculate sample statistics such as the sample average, the sample variance and the sample standard deviation.
Based on the sample, we infer the properties of the population.
This means that the values of the sample statistics are used to estimate the unknown values of the population parameters.
Usually we estimate population parameters such as the population mean, the population variance and the population standard deviation.
Graphically
[Figure: a sample of size n drawn from a population of size N (resp. infinite)]
Symbols:
• parameters of the population: $\mu$, $\sigma^2$, $\sigma$, generally $Q$
• sample characteristics: $\bar{x}$, $s^2$, $s$, generally $u_n$
Statistical inference (SI) has two basic tasks
• Statistical estimation: unknown population parameters are estimated by sample characteristics.
• Statistical hypothesis testing: we express assumptions about the unknown parameters of the population. If we can formulate these assumptions as statistical hypotheses and verify their validity by statistical procedures, then this process is statistical hypothesis testing.
Some other tasks of SI
• To determine the sample size (n) that is sufficient for a reliable estimation of parameters
• To determine suitable methods of sampling statistical units from the population
Explanation: the sample characteristics are deterministic in relation to the sample, but they are random variables in relation to the population, so they have some probability distribution.
That means it is important to choose the right model for the distribution of the sample characteristic used in statistical inference (statisticians have done this for us). The standardized sample mean, with the variance estimated from the sample, usually has a Student distribution, but for large samples (n > 30) we can approximate the Student distribution by the Normal distribution.
Random sampling
There are many methods that can be used to select a sample from a population.
From the repetition point of view:
• selection with replacement
• selection without replacement
Classification based on how the population is subdivided:
• simple random sample (from a finite or infinite population)
• composite sample, which can be, for example:
  • based on the selection of groups
  • quota sampling, etc.
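The difference between selection with and without replacement can be sketched in a few lines of Python. This is only an illustration: the population of ten numbered units and the seed are hypothetical and do not come from the slides.

```python
import random

# Hypothetical population of 10 numbered units (illustrative only).
population = list(range(1, 11))
rng = random.Random(42)  # fixed seed so that the run is repeatable

# Selection with replacement: the same unit may be drawn more than once.
with_replacement = rng.choices(population, k=5)

# Selection without replacement: each unit can appear at most once.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

With replacement, the draws are independent; without replacement, each draw changes the composition of what remains, which matters mainly for small finite populations.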
Theory of Estimation
Recall: the main goal of the theory of estimation is to estimate population parameters such as $\mu$ and $\sigma$ by using sample characteristics.
There are two types of estimates:
• Point estimate
• Interval estimate
Point estimation of population parameter Q (generally)
Point estimate: a single numerical value used as an estimate of the population parameter Q. Geometrically, it is one point.
Notation (est. abbreviates estimate): est $Q = u_n$
Mostly we estimate:
• the population mean $\mu$
• the population variance $\sigma^2$ and the population standard deviation $\sigma$
Attributes of point estimates
The best estimator satisfies the following conditions:
• Unbiasedness
• Consistency
• Efficiency
• Sufficiency
We explain the first two conditions in detail.
Unbiasedness
$E(u_n - Q) = 0$, i.e. $E(u_n) = Q$
If we repeat the sampling many times, each time we get a different error, and so a different sample average $\bar{x}$.
Unbiasedness requires that the expected value of all errors be equal to zero. We require that all errors are only random, so that we neither systematically underestimate nor overestimate the population mean.
[Figure: values of $\bar{x}$ from repeated samples]
An asymptotically unbiased estimator of $Q$ is a sample characteristic $u_n$ that satisfies the condition:
$\lim_{n \to \infty} E(u_n) = Q$
Consistency
$\lim_{n \to \infty} P(|u_n - Q| < \varepsilon) = 1$ for every $\varepsilon > 0$
The principle of consistency rests on the law of large numbers. In statistical practice, consistency ensures that the error of estimation decreases as the sample size increases.
For large samples the error of estimation is very small.
A sufficient condition for consistency is that $u_n$ is an asymptotically unbiased estimator and that it meets the condition:
$\lim_{n \to \infty} D(u_n) = 0$
Efficiency of a point estimate
Any sample characteristic is a random variable with some variance.
If we have two unbiased point estimators of the same population parameter, the estimator with the smaller variance is said to have greater efficiency.
$D(u_n) = \min$
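Point estimator of population mean $\mu$
$E(\bar{x}) = \dots = \mu$, $D(\bar{x}) = \dots = \sigma^2 / n$
Thus $\bar{x}$ is an unbiased estimator of $\mu$, and:
$\lim_{n \to \infty} D(\bar{x}) = \lim_{n \to \infty} \sigma^2 / n = 0$
The sufficient condition for consistency is satisfied, so $\bar{x}$ is an unbiased and consistent estimator of the population mean: est $\mu = \bar{x}$
! The standard deviation of the average, the standard error of the mean, is $\sigma_{\bar{x}} = \sigma / \sqrt{n}$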
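The two properties $E(\bar{x}) = \mu$ and $D(\bar{x}) = \sigma^2 / n$ can be checked empirically with a small simulation. The sketch below assumes a Normal population with hypothetical parameters (mu = 50, sigma = 10), a sample size of 25 and a fixed seed; none of these values come from the slides.

```python
import random
import statistics

rng = random.Random(0)          # fixed seed, illustrative only
mu, sigma, n = 50.0, 10.0, 25   # assumed population parameters and sample size
repeats = 20000                 # number of repeated samples

# Draw many samples of size n and record each sample average.
sample_means = [
    statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(repeats)
]

mean_of_means = statistics.fmean(sample_means)    # should be close to mu
var_of_means = statistics.pvariance(sample_means) # should be close to sigma^2 / n
theoretical_var = sigma ** 2 / n

print("average of sample means: ", round(mean_of_means, 3))
print("variance of sample means:", round(var_of_means, 3))
print("sigma^2 / n:             ", theoretical_var)
```

The empirical values only approximate the theoretical ones; the agreement improves as the number of repeated samples grows.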
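Point estimator of variance $\sigma^2$ (resp. standard deviation $\sigma$)
$E(s^2) = \dots = \frac{n-1}{n} \sigma^2$
The sample variance $s^2$ is not an unbiased estimator of the population variance $\sigma^2$: it gives a negatively biased estimate.
The bias is equal to $-\sigma^2 / n$.
The sample variance is an asymptotically unbiased estimator of $\sigma^2$, since
$\lim_{n \to \infty} E(s^2) = \lim_{n \to \infty} \frac{n-1}{n} \sigma^2 = \sigma^2$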
So the unbiased point estimator of the population variance $\sigma^2$ is the sample variance $s_1^2$, which is computed as:
$s_1^2 = \frac{n}{n-1} s^2 = \frac{1}{n-1} \sum_{j=1}^{n} (x_j - \bar{x})^2$
The factor $\frac{n}{n-1}$ is Bessel's correction.
The difference between $s_1^2$ and $s^2$ decreases with increasing sample size n. For a sample size greater than 50 (n > 50) the difference is negligible.
Conclusion: est $\mu = \bar{x}$, est $\sigma^2 = s_1^2$
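The relation between the biased sample variance $s^2$ (division by n) and the corrected variance $s_1^2$ (division by n - 1) can be illustrated with Python's standard library. The five data values below are hypothetical and serve only to show the arithmetic.

```python
import statistics

# Hypothetical sample (illustrative values, not taken from the slides).
sample = [4.0, 7.0, 6.0, 9.0, 4.0]
n = len(sample)

s2 = statistics.pvariance(sample)    # s^2: sum of squared deviations divided by n
s1_2 = statistics.variance(sample)   # s1^2: the same sum divided by n - 1

# Bessel's correction links the two: s1^2 = n / (n - 1) * s^2
corrected = n / (n - 1) * s2

print("s^2  =", s2)
print("s1^2 =", s1_2)
print("n/(n-1) * s^2 =", corrected)
```

For this sample the squared deviations sum to 18, so $s^2 = 18/5$ and $s_1^2 = 18/4$; with larger n the factor n/(n - 1) approaches 1 and the two values converge.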
Example: In 400 randomly selected households in one of the regions of the SR (Slovak Republic), expenditures on alcoholic drinks and cigarettes were investigated. We make a point estimate of the mean and of its standard error.
est $\mu = \bar{x} = 973$ Sk, est $\sigma = s_1 = 286$ Sk
$s_{\bar{x}} = \frac{s_1}{\sqrt{n}} = \frac{286}{20} = 14.3$ Sk
The estimated standard error of the mean is relatively small: it is only 1.5% of the mean. We can expect that the error in the estimate of average expenditures on alcoholic drinks and cigarettes is not too large.
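The arithmetic of this example can be reproduced with a short Python sketch. The numbers (n = 400, mean 973 Sk, standard deviation 286 Sk) are those of the example; the variable names are only illustrative.

```python
import math

n = 400         # number of households in the sample
x_bar = 973.0   # sample mean of expenditures (Sk)
s1 = 286.0      # corrected sample standard deviation (Sk)

# Estimated standard error of the mean: s1 / sqrt(n)
standard_error = s1 / math.sqrt(n)

# Standard error relative to the mean, as a fraction
relative_error = standard_error / x_bar

print("standard error:", standard_error, "Sk")
print("relative to the mean:", round(100 * relative_error, 1), "%")
```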
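Comparison of the distribution of the attribute X in the population with the distribution of the sample average $\bar{x}$:
[Figure: density $f(x)$ of the population and density $f(\bar{x})$ of the sample average, with $\sigma_{\bar{x}} = \sigma / \sqrt{n}$]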
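Interval estimate of parameter Q
$P(q_1 \le Q \le q_2) = 1 - \alpha$
• $1 - \alpha$: confidence level
• $\alpha$: risk of estimation
• $q_1$, $q_2$: lower and upper limits of the interval (random)
[Figure: density $f(g)$ with probability $1 - \alpha$ between $q_1$ and $q_2$ and probability $\alpha/2$ in each tail]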
Interval estimation of population mean $\mu$
Suppose that the statistical attribute X has a Normal distribution, $X \sim N(\mu, \sigma^2)$. If we choose a sample of size n, then the arithmetic average also has a Normal distribution, $\bar{x} \sim N(\mu, \sigma^2/n)$.
The confidence interval for $\mu$ depends on the available information and on the sample size:
a) If the population variance $\sigma^2$ is known (a theoretical assumption), we can create the standardized normal variable:
$u = \frac{\bar{x} - \mu}{\sigma} \sqrt{n}$
$u$ has the distribution $N(0, 1)$, independent of the estimated value.
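$P\left(-u_{1-\alpha/2} \le \frac{\bar{x} - \mu}{\sigma} \sqrt{n} \le u_{1-\alpha/2}\right) = 1 - \alpha$
where $u_{1-\alpha/2}$ is the $(1 - \alpha/2)$ quantile of $N(0, 1)$.
[Figure: density $f(u)$ with probability $1 - \alpha$ between $-u_{1-\alpha/2}$ and $u_{1-\alpha/2}$]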
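After transformation we get
$P\left(\bar{x} - u_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + u_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
$\Delta = u_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}$: the sampling error, half of the width of the interval; it determines the accuracy of the estimate.
The interval estimate is actually the point estimate $\pm \Delta$, i.e.
$\bar{x} - \Delta \le \mu \le \bar{x} + \Delta$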
b) If the population variance is unknown, est $\sigma^2 = s_1^2$, and the sample size is large, n > 30, we can use N(0,1):
$\bar{x} \pm u_{1-\alpha/2} \frac{s_1}{\sqrt{n}}$
c) If the population variance is unknown, est $\sigma^2 = s_1^2$, and the sample size is small, n ≤ 30:
$\bar{x} \pm t_{\alpha}(n-1) \frac{s_1}{\sqrt{n}}$
$t_{\alpha}(n-1)$: (two-sided) critical value of Student's distribution at level $\alpha$ with $n - 1$ degrees of freedom
Example: Based on the point estimate of household expenditure on cigarettes and alcohol, we construct an interval estimate with 95% probability.
$n = 400$, $\bar{x} = 973$ Sk, $s_{\bar{x}} = \frac{s_1}{\sqrt{n}} = \frac{286}{\sqrt{400}} = 14.3$ Sk
$u_{1-\alpha/2} = u_{1-0.025} = u_{0.975} = 1.96$ (Excel: NORMSINV(0.975))
$\Delta = u_{1-\alpha/2} \frac{s_1}{\sqrt{n}} = 1.96 \cdot 14.3 = 28.03$
$973 - 28.03 < \mu < 973 + 28.03$, i.e. $944.97 < \mu < 1001.03$
With 95% probability we estimate the average expenditure to be between 945 Sk and 1001 Sk.
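As a cross-check of this example outside Excel, the same interval can be sketched in Python. `NormalDist().inv_cdf` from the standard library plays the role of NORMSINV here; the variable names are illustrative, and the data are those of the example.

```python
import math
from statistics import NormalDist

n, x_bar, s1 = 400, 973.0, 286.0   # values from the household example
alpha = 0.05                       # risk of estimation for a 95% interval

# Quantile u_{1 - alpha/2} of the standard Normal distribution (about 1.96)
u = NormalDist().inv_cdf(1 - alpha / 2)

# Sampling error (half-width of the interval)
delta = u * s1 / math.sqrt(n)

lower, upper = x_bar - delta, x_bar + delta
print(f"u = {u:.2f}, delta = {delta:.2f}")
print(f"{lower:.2f} < mu < {upper:.2f}")
```

Because the quantile is not rounded to 1.96 before multiplying, the limits may differ from the hand calculation in the third decimal place.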
Example: A study investigated the weight loss of carrots after one week of storage. 20 samples, each weighing 1 kg at the beginning of storage, were analyzed and the weight loss was measured. The average weight loss was 49 g with a sample standard deviation of 4 g. We assume that the weight loss has a normal distribution. We estimate the average weight loss with 95% confidence. Because n < 30 we use Student's distribution:
$\bar{x} \pm t_{\alpha}(n-1) \frac{s_1}{\sqrt{n}} = 49 \pm 2.09 \cdot \frac{4}{\sqrt{20}} = 49 \pm 1.9$
$47.1 \le \mu \le 50.9$
$t_{\alpha}(n-1)$: quantile of Student's distribution, $t_{0.05}(19) = 2.09$ (Excel: TINV(0.05;19))
With 95% confidence, the average weight loss of a 1 kg carrot sample lies in the interval from 47.1 g to 50.9 g.
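The small-sample interval for the carrot example can be sketched in Python as well. The standard library has no Student quantile function, so the sketch simply reuses the critical value 2.09 quoted in the example rather than computing it; the variable names are illustrative.

```python
import math

n, x_bar, s1 = 20, 49.0, 4.0   # carrot example: sample size, mean (g), std. dev. (g)
t_crit = 2.09                  # t_0.05(19), taken from the example (not computed here)

# Sampling error (half-width of the interval)
delta = t_crit * s1 / math.sqrt(n)

lower, upper = x_bar - delta, x_bar + delta
print(f"delta = {delta:.2f} g")
print(f"{lower:.1f} g <= mu <= {upper:.1f} g")
```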
What does the size of the sampling error $\Delta$ depend on?
• the confidence probability $(1 - \alpha)$
• the standard error of the mean, which depends on:
  • the variability of the attribute, which we cannot change
  • the sample size, which we can change!
The sample size needed to achieve a chosen reliability and accuracy can be determined using the following formula:
$n = \frac{u_{1-\alpha/2}^2 \, s_1^2}{\Delta^2}$
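The sample size formula $n = u_{1-\alpha/2}^2 s_1^2 / \Delta^2$ can be turned into a small helper. This is a minimal sketch: the function name `required_sample_size` is hypothetical, the result is rounded up to the next whole unit, and the target accuracy of 20 Sk is an assumed value chosen only for illustration (the standard deviation of 286 Sk is taken from the household example).

```python
import math
from statistics import NormalDist

def required_sample_size(s1, delta, alpha=0.05):
    """Smallest n for which u_{1-alpha/2} * s1 / sqrt(n) does not exceed delta."""
    u = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((u * s1 / delta) ** 2)

# Assumed target: sampling error of at most 20 Sk at 95% confidence.
n_needed = required_sample_size(s1=286.0, delta=20.0)
print("required sample size:", n_needed)

# Consistency check: an accuracy of 28.03 Sk corresponds to about 400 households.
n_check = required_sample_size(s1=286.0, delta=28.03)
print("sample size for delta = 28.03:", n_check)
```

Halving the sampling error requires roughly four times as many units, because n grows with the square of 1/Δ.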
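Confidence interval for variance $\sigma^2$
$\chi^2 = \frac{(n-1) s_1^2}{\sigma^2}$
$P\left(\chi^2_{1-\alpha/2} \le \chi^2 \le \chi^2_{\alpha/2}\right) = 1 - \alpha$
$\chi^2_{1-\alpha/2}$, $\chi^2_{\alpha/2}$: critical values of the chi-square distribution with $n - 1$ degrees of freedom; $\chi^2_p$ denotes the value exceeded with probability $p$
[Figure: density $f(\chi^2)$ with probability $1 - \alpha$ between $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ and probability $\alpha/2$ in each tail]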
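After transformation we obtain:
$P\left(\frac{(n-1) s_1^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1) s_1^2}{\chi^2_{1-\alpha/2}}\right) = 1 - \alpha$
and, respectively, the confidence interval for the standard deviation:
$P\left(\sqrt{\frac{(n-1) s_1^2}{\chi^2_{\alpha/2}}} \le \sigma \le \sqrt{\frac{(n-1) s_1^2}{\chi^2_{1-\alpha/2}}}\right) = 1 - \alpha$
Here $\chi^2_{\alpha/2}$ is the larger (upper-tail) critical value and $\chi^2_{1-\alpha/2}$ the smaller one.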
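The slides give no numerical example for the variance, so the following sketch is only an illustration. It applies the interval to the carrot data (n = 20, $s_1$ = 4 g) under the assumption that the critical values are read from a chi-square table for 19 degrees of freedom: 32.852 is the value exceeded with probability 0.025 and 8.907 the value exceeded with probability 0.975. The variable names are hypothetical.

```python
import math

n, s1 = 20, 4.0   # carrot example: sample size and corrected standard deviation (g)

# Tabulated chi-square critical values for 19 degrees of freedom (assumed from a table):
chi2_upper = 32.852   # exceeded with probability alpha/2 = 0.025
chi2_lower = 8.907    # exceeded with probability 1 - alpha/2 = 0.975

# 95% confidence limits for the variance
var_lower = (n - 1) * s1 ** 2 / chi2_upper
var_upper = (n - 1) * s1 ** 2 / chi2_lower

# 95% confidence limits for the standard deviation
sd_lower, sd_upper = math.sqrt(var_lower), math.sqrt(var_upper)

print(f"{var_lower:.2f} <= sigma^2 <= {var_upper:.2f}")
print(f"{sd_lower:.2f} <= sigma <= {sd_upper:.2f}")
```

Note that the larger critical value gives the lower limit, and that the interval is not symmetric around $s_1^2$ because the chi-square distribution is skewed.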
Questions
• What is the relevant difference between point and interval estimation?
• How do the interval limits depend on the confidence level?
• How does the confidence level influence the accuracy of the confidence interval?
• How can we ensure an interval estimate of the mean with a chosen confidence and accuracy?