Sampling and Hypothesis Testing (I) in MATLAB

Kajal Rai

parametric hypothesis testing using MATLAB


Contents:

• Sampling

• Hypothesis Test

• Types of parametric test

• One sample t-test

• Paired t-test

• Tailed t-test

• Two sample t-test

• z-test

• F-test

• Difference between t-test, z-test and F-test

Sampling:

• Sampling is the technique used to select the items for the sample from the population.

• Simple Random Sampling: each and every unit of the population has an equal opportunity of being selected in the sample.

• Can be done with or without replacement.

• If done with replacement, each item has a probability of 1/N of being drawn at each selection.

• If done without replacement, each item has a probability of 1/N of being drawn at the first selection, each remaining item has a probability of 1/(N-1) at the second selection, and so on.

Random Sampling in MATLAB

• y = randsample(n,k) returns a vector of k values sampled uniformly at random, without replacement, from the integers 1 to n.

• y = randsample(population,k) returns a vector of k values sampled uniformly at random, without replacement, from the values in the vector population.

Random Sampling in MATLAB cntd…

• y = randsample(n,k,replacement) or y = randsample(population,k,replacement) returns a sample taken with replacement if replacement is true, or without replacement if replacement is false. By default it is false.
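
The randsample call forms listed so far can be tried as follows. This is a minimal sketch: the seed, the population values and the variable names are illustrative assumptions, and randsample requires the Statistics Toolbox.

```matlab
% Illustrative values only; the seed makes the draw repeatable.
rng(1);

population = [10 20 30 40 50];

y1 = randsample(10, 4);                 % 4 distinct integers from 1..10
y2 = randsample(population, 3);         % 3 distinct values from population
y3 = randsample(population, 8, true);   % 8 values, repeats allowed

disp(y1);
disp(y2);
disp(y3);
```

Drawing more items than the population holds (here k = 8 from 5 values) is only possible with replacement.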

Random Sampling in MATLAB cntd…

• y = randsample(n,k,true,w) or y = randsample(population,k,true,w) returns a weighted sample taken with replacement, using a vector of positive weights w, whose length is n. The probability that the integer i is selected for an entry of y is w(i)/sum(w). Usually w is a vector of probabilities (weights that sum to 1), but any positive weights work because they are normalized by sum(w).

• randsample does not support weighted sampling without replacement.

Generate a random sequence of the characters A, C, G, and T, with replacement, according to the specified probabilities.
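
A minimal sketch of this example, assuming a sequence length of 48 and the probabilities 0.15, 0.35, 0.35, 0.15 for A, C, G, T (both are assumed values, chosen only for illustration):

```matlab
% Weighted sampling with replacement from a character population.
% The length (48) and the weights are assumed values, not taken from the slides.
rng(1);                               % repeatable draw

w = [0.15 0.35 0.35 0.15];            % weights for 'A', 'C', 'G', 'T'
R = randsample('ACGT', 48, true, w);  % char vector of 48 letters

disp(R);
```

With these weights, C and G should each appear roughly 35% of the time in a long sequence.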

Hypothesis Tests:

• A hypothesis test is a procedure for determining if an assertion about a characteristic of a population is correct.

• In hypothesis testing, the goal is to see if there is sufficient statistical evidence to reject a presumed null hypothesis in favor of a conjectured alternative hypothesis[1].

• The null hypothesis is usually denoted H0 while the alternative hypothesis is usually denoted H1.

Types of parametric test:

• One sample t-test: The one-sample t-test is used when we want to know whether our sample comes from a particular population but we do not have full population information available to us. It is used when the population variance is unknown.

• Paired t-test: A paired t-test looks at the difference between paired values in two samples, takes into account the variation of values within each sample, and produces a single number known as a t-value.

• Two sample t-test: Used to compare responses from two groups. These two groups can come from different experimental treatments, or different "populations".

• z-test: It is an appropriate parametric statistical procedure when there is one sample that is being compared to a population with a known mean and standard deviation.

• F-test: The F-test is designed to test if two population variances are equal.

One sample t-test:

• [h,p,ci,stats] = ttest(X,M) performs a t-test of the hypothesis that the data in X come from a distribution with mean M.

• h is 0 if the null hypothesis cannot be rejected at the 5% significance level and 1 if it can; p is the p-value of the test.

• ci is a 100*(1-ALPHA)% confidence interval for the true mean of X.

• stats is a structure with the following fields:

• 'tstat' -- the value of the test statistic

• 'df' -- the degrees of freedom of the test

• 'sd' -- the estimated population standard deviation.

One sample t-test example:

• Ex: A specimen of 10 copper wires drawn from a large lot has the following breaking strengths (in kg weight):

• 578, 572, 570, 568, 572, 578, 570, 572, 596, 544

• Test (using Student’s t-statistic) whether the mean breaking strength of the lot may be taken to be 578 kg weight (test at the 5 percent level of significance).
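
The copper-wire example can be run as a sketch like the following. The variable names are illustrative; the call is the two-tailed ttest(X,M) form at the default 5% level.

```matlab
% One-sample t-test: H0 is "mean breaking strength = 578".
x = [578 572 570 568 572 578 570 572 596 544];

[h, p, ci, stats] = ttest(x, 578);

fprintf('h = %d, p = %.4f\n', h, p);
fprintf('t = %.3f, df = %d, sd = %.3f\n', stats.tstat, stats.df, stats.sd);
fprintf('95%% CI for the mean: [%.2f, %.2f]\n', ci(1), ci(2));
```

The sample mean is 572 and the t statistic is about -1.49 on 9 degrees of freedom, which lies inside the two-tailed critical values ±2.262, so h = 0: the mean breaking strength of the lot may be taken to be 578 kg weight.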

t-test with own significance level:

• [h,p,ci,stat] = TTEST(...,ALPHA) performs the test at the significance level (100*ALPHA)%. ALPHA must be a scalar.
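
A minimal sketch of how the decision depends on the significance level, reusing the copper-wire data. It assumes the 'Alpha' name-value form of ttest found in recent releases; the 20% level is chosen only to make the contrast visible.

```matlab
% Same data and null hypothesis, two different significance levels.
x = [578 572 570 568 572 578 570 572 596 544];

[h05, p] = ttest(x, 578);                 % default: 5% level
h20      = ttest(x, 578, 'Alpha', 0.20);  % 20% level (illustrative)

fprintf('p = %.4f, h at 5%% = %d, h at 20%% = %d\n', p, h05, h20);
```

The p-value does not change with ALPHA; only the decision h and the width of the confidence interval do. Here p lies between 0.10 and 0.20, so the null hypothesis is rejected at the 20% level but not at the 5% level.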

Paired t-test:

• A paired t-test looks at the difference between paired values in two samples, takes into account the variation of values within each sample, and produces a single number known as a t-value.

Paired t-test in MATLAB:

• H = TTEST(X,Y) performs a paired T-test of the hypothesis that two matched samples, in the vectors X and Y, come from distributions with equal means. The difference X-Y is assumed to come from a normal distribution with unknown variance.

• X and Y must have the same length.

Example: Paired t-test

• The memory capacity of 9 students was tested before and after training. State at the 5 percent level of significance whether the training was effective, given the following scores:

• Before: 10, 15, 9, 3, 7, 12, 16, 17, 4

• After: 12, 17, 8, 5, 6, 11, 18, 20, 3

• Take the score before training as X and the score after training as Y, and take the null hypothesis that the mean of the differences is zero.
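
A sketch of the paired test on these scores. The left-tailed alternative (mean of X-Y below zero, i.e. scores went up after training) is an assumption made here because the question is directional; the default two-tailed call leads to the same decision for these data.

```matlab
% Paired t-test on before/after scores; H0: mean(before - after) = 0.
before = [10 15 9 3 7 12 16 17 4];
after  = [12 17 8 5 6 11 18 20 3];

% 'left' tests the alternative mean(before - after) < 0 (assumed direction).
[h, p, ci, stats] = ttest(before, after, 'Tail', 'left');

fprintf('h = %d, p = %.4f, t = %.3f, df = %d\n', h, p, stats.tstat, stats.df);
```

The mean difference is about -0.78 and t is about -1.36 on 8 degrees of freedom, which does not pass the one-tailed critical value of -1.860, so h = 0.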

We do not reject H0 and conclude that the difference in scores before and after training is not significant, i.e., it can be attributed to sampling fluctuations. Hence we cannot infer that the training was effective.

Tailed t-test:

• A one- or two-tailed t-test is determined by whether the total area of α is placed in one tail or divided equally between the two tails.

• The one-tailed t-test is performed if the results are interesting only if they turn out in a particular direction.

• The two-tailed t-test is performed if the results would be interesting in either direction.

One-Tailed t-Test:

• There are two different one-tailed t-tests, one for each tail.

• In a one-tailed t-test, all the area associated with α is placed in either one tail or the other. Selection of the tail depends upon which direction t would be (+ or -) if the results of the experiment came out as expected.

• The selection of the tail must be made before the experiment is conducted and analyzed.

• It is used to test whether one mean is higher than the other.

One-tailed t-test in the positive direction

The value tcrit would be positive. For example when α is set to .05 with ten degrees of freedom (df=10), tcrit would be equal to +1.812.

One-tailed t-test in the negative direction 

The value tcrit would be negative. For example, when α is set to .05 with ten degrees of freedom (df=10), tcrit would be equal to -1.812.

Two-Tailed t-Test:

• A two-tailed t-test divides α in half, placing half in each tail. The null hypothesis in this case is a particular value, and there are two alternative hypotheses, one positive and one negative. The critical value of t, tcrit, is written with both a plus and minus sign (±). For example, the critical value of t when there are ten degrees of freedom (df=10) and α is set to .05 is tcrit = ±2.228.

• We would use a two-tailed test to see whether two means are different from each other (i.e., come from different populations) or not (i.e., come from the same population).
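
The critical values quoted for the one- and two-tailed cases can be reproduced with the inverse t distribution. This is a small sketch using tinv from the Statistics Toolbox; the variable names are illustrative.

```matlab
% Critical t values for alpha = .05 and df = 10.
alpha = 0.05;
df    = 10;

tcritRight = tinv(1 - alpha, df);      % one-tailed, positive direction
tcritLeft  = tinv(alpha, df);          % one-tailed, negative direction
tcritTwo   = tinv(1 - alpha/2, df);    % two-tailed, used as +/- tcritTwo

fprintf('right: %+.3f  left: %+.3f  two-tailed: +/-%.3f\n', ...
        tcritRight, tcritLeft, tcritTwo);
```

Splitting α across both tails pushes the critical value further out (about 2.228 instead of about 1.812), which is why a two-tailed test needs a larger |t| to reject.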

Tailed t-test in MATLAB

• H = TTEST(...,TAIL) performs the test against the alternative hypothesis specified by TAIL:

• 'both' -- "mean is not M" (two-tailed test)

• 'right' -- "mean is greater than M" (right-tailed test)

• 'left' -- "mean is less than M" (left-tailed test)

One tailed t-test in MATLAB
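
A minimal sketch of a left-tailed call, reusing the copper-wire breaking strengths as illustrative data and assuming the 'Tail' name-value form of ttest:

```matlab
% Left-tailed test: H1 is "mean breaking strength is less than 578".
x = [578 572 570 568 572 578 570 572 596 544];

[h, p, ~, stats] = ttest(x, 578, 'Tail', 'left');

fprintf('h = %d, p = %.4f, t = %.3f\n', h, p, stats.tstat);
```

Here t is about -1.49, which does not pass the left-tailed critical value of about -1.833 for 9 degrees of freedom, so h = 0.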

Two-tailed test in MATLAB
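
A minimal sketch of an explicit two-tailed call next to the two one-tailed calls, again on the copper-wire data (illustrative) and assuming the 'Tail' name-value form:

```matlab
% Compare the two-tailed p-value with the two one-tailed p-values.
x = [578 572 570 568 572 578 570 572 596 544];

[hBoth, pBoth] = ttest(x, 578, 'Tail', 'both');   % same as the default
[~, pLeft]     = ttest(x, 578, 'Tail', 'left');
[~, pRight]    = ttest(x, 578, 'Tail', 'right');

fprintf('both: %.4f  left: %.4f  right: %.4f\n', pBoth, pLeft, pRight);
```

The two-tailed p-value is twice the smaller one-tailed p-value, and the two one-tailed p-values add up to 1.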

Two sample t-test

• H = TTEST2(X,Y) performs a T-test of the hypothesis that two independent samples, in the vectors X and Y, come from distributions with equal means, and returns the result of the test in H.

• H=0 indicates that the null hypothesis ("means are equal") cannot be rejected at the 5% significance level. H=1 indicates that the null hypothesis can be rejected at the 5% level.

• The data are assumed to come from normal distributions with unknown, but equal, variances.

• X and Y can have different lengths.

Example:

• A group of seven-week-old chickens reared on a high-protein diet weigh 12, 15, 11, 16, 14, 14 and 16; a second group of five chickens, similarly treated except that they receive a low-protein diet, weigh 8, 10, 14, 10 and 13. Test at the 5 percent level whether there is significant evidence that additional protein has increased the weight of the chickens. (For hand calculation, use an assumed mean of 10 for the sample of 7 and an assumed mean of 8 for the sample of 5.)

• Take the null hypothesis that additional protein has not increased the weight of the chickens.
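
The chicken example can be sketched as a right-tailed two-sample test, since the alternative is that the high-protein group is heavier. The variable names are illustrative, and the 'Tail' name-value form of ttest2 is assumed.

```matlab
% Two-sample t-test with pooled (equal) variances.
% H0: the two means are equal; H1: mean(high) > mean(low).
high = [12 15 11 16 14 14 16];   % high-protein diet, n = 7
low  = [8 10 14 10 13];          % low-protein diet,  n = 5

[h, p, ci, stats] = ttest2(high, low, 'Tail', 'right');

fprintf('h = %d, p = %.4f, t = %.3f, df = %d\n', h, p, stats.tstat, stats.df);
```

The group means are 14 and 11, the pooled variance is 4.6, and t is about 2.39 on 10 degrees of freedom, which exceeds the one-tailed critical value of 1.812, so h = 1.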

Two sample t-test in MATLAB

We reject H0 and conclude that additional protein has increased the weight of the chickens, at the 5 percent level of significance.

Two sample t-test in MATLAB cntd…

• H = TTEST2(X,Y,ALPHA,TAIL,VARTYPE) allows you to specify the type of test. When VARTYPE is 'equal', TTEST2 performs the default test assuming equal variances.

• When VARTYPE is 'unequal', TTEST2 performs the test assuming that the two samples come from normal distributions with unknown and unequal variances. This is known as the Behrens-Fisher problem.
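
A sketch of the unequal-variance option on the chicken weights from the two-sample example (illustrative reuse of that data), assuming the 'Vartype' and 'Tail' name-value forms of ttest2:

```matlab
% Same data, with and without the equal-variance assumption.
high = [12 15 11 16 14 14 16];
low  = [8 10 14 10 13];

[hEq, ~, ~, sEq] = ttest2(high, low, 'Tail', 'right');   % 'equal' is the default
[hUn, ~, ~, sUn] = ttest2(high, low, 'Tail', 'right', 'Vartype', 'unequal');

fprintf('equal:   h = %d, t = %.3f, df = %.2f\n', hEq, sEq.tstat, sEq.df);
fprintf('unequal: h = %d, t = %.3f, df = %.2f\n', hUn, sUn.tstat, sUn.df);
```

With 'unequal' the degrees of freedom are approximated rather than n1+n2-2, so they are generally not an integer (about 7.3 here instead of 10). For these data both variants reject H0.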

z-test in MATLAB

A z-test is used for testing the mean of a population or comparing the means of two populations, with large (n ≥ 30) samples when we know the population standard deviation.

• H = ZTEST(X,M,SIGMA) performs a Z-test of the hypothesis that the data in the vector X come from a distribution with mean M, and returns the result of the test in H.

• H=0 indicates that the null hypothesis ("mean is M") cannot be rejected at the 5% significance level. H=1 indicates that the null hypothesis can be rejected at the 5% level.

• The data are assumed to come from a normal distribution with standard deviation SIGMA.

Example:

• A dog food manufacturer has created the new Super Vitamin Enriched Puppy Chow, specially designed for the active and growing Doberman Pinscher.

• A sample of 10 Doberman puppies was fed nothing but Super Vitamin Enriched Puppy Chow. When these dogs reached adulthood, they weighed 27.5, 33.5, 36.8, 39.5, 40.5, 42.5, 40.0, 22.9, 39.8 and 40.8 kg. The reference adult weight to test against is 39.7 kg on average (M), with σ = 6.2 kg.

• Did Puppy Chow make them grow especially big? Test with α = .05.

• H0: The puppy chow did not make the dogs grow larger than normal.

• H1: The puppy chow did make the dogs grow larger than normal.
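
A sketch of the z-test on the puppy weights, with M = 39.7 and σ = 6.2 as given. The right-tailed alternative ("larger than normal") and the 'Tail' name-value form of ztest are assumptions made for this illustration.

```matlab
% z-test with known sigma; H1 (assumed): mean weight is greater than M.
w     = [27.5 33.5 36.8 39.5 40.5 42.5 40.0 22.9 39.8 40.8];
M     = 39.7;   % mean to test against
sigma = 6.2;    % known population standard deviation

[h, p, ci, zval] = ztest(w, M, sigma, 'Tail', 'right');

fprintf('mean = %.2f, z = %.3f, p = %.4f, h = %d\n', mean(w), zval, p, h);
```

The sample mean is 36.38 kg, below M, so z is negative (about -1.69) and the right-tailed test cannot reject the null hypothesis: h = 0.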

The sample mean (36.38 kg) is below M and the z statistic is about -1.69, so the test at α = .05 gives no significant evidence that Super Vitamin Enriched Puppy Chow makes Doberman Pinschers grow larger than normal.

F-test:

• The F-test is used to compare the variances of two independent samples.

• This test is also used in the context of analysis of variance (ANOVA) for judging the significance of more than two sample means at one and the same time.

• It is also used for judging the significance of multiple correlation coefficients.

F-test in MATLAB

• H = vartest2(X,Y) performs an F test of the hypothesis that two independent samples, in the vectors X and Y, come from normal distributions with the same variance, against the alternative that they come from normal distributions with different variances.

• The result is H=0 if the null hypothesis ("variances are equal") cannot be rejected at the 5% significance level, or H=1 if the null hypothesis can be rejected at the 5% level.

• X and Y can have different lengths.

Example:

• Two random samples drawn from two normal populations are:

• Sample1: 20 16 26 27 23 22 18 24 25 19

• Sample2: 27 33 42 35 32 34 38 28 41 43 30 37

• Test at the 5% significance level.

• We take the null hypothesis that the two populations from which the samples have been drawn have the same variance.
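
The two samples can be compared with vartest2 as in the following sketch (variable names are illustrative; the default two-tailed alternative is used):

```matlab
% F-test for equal variances of two independent normal samples.
s1 = [20 16 26 27 23 22 18 24 25 19];
s2 = [27 33 42 35 32 34 38 28 41 43 30 37];

[h, p, ci, stats] = vartest2(s1, s2);

fprintf('var1 = %.3f, var2 = %.3f\n', var(s1), var(s2));
fprintf('h = %d, p = %.4f, F = %.4f, df1 = %d, df2 = %d\n', ...
        h, p, stats.fstat, stats.df1, stats.df2);
```

The sample variances are about 13.33 and 28.55, so F = var(s1)/var(s2) is about 0.467 on (9, 11) degrees of freedom. Equivalently, 28.55/13.33 is about 2.14 on (11, 9) degrees of freedom, which is not extreme enough to reject at the 5% level, so h = 0.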

Since the p-value is more than 0.05, we do not reject the null hypothesis and conclude that the samples may have been drawn from two populations having the same variance.

Difference between t-test, z-test and F-test:

• t-test: A t-test is used for testing the mean of one population against a standard, or comparing the means of two populations, when you do not know the populations’ standard deviation and have a limited sample (n < 30).

• z-test: A z-test is used for testing the mean of a population versus a standard, or comparing the means of two populations, with large (n ≥ 30) samples when we know the population standard deviation. It is also used for testing the proportion of some characteristic versus a standard proportion, or comparing the proportions of two populations.

• F-test: An F-test is used to compare two populations’ variances. The samples can be any size. It is the basis of ANOVA.

References:

• Kothari, C.R., 1985, Research Methodology: Methods and Techniques, New Delhi, Wiley Eastern Limited.

• Gupta, S.P., 2009, Statistical Methods, eighth revised edition.

• http://www.mathworks.in/help/stats/ztest.html#btriieq

• http://www.math.uah.edu/stat/hypothesis/Introduction.html

• http://www.mathworks.in/products/statistics/description7.html

• How to Do a T-Test in MATLAB eHow http://www.ehow.com/how_12211819_ttest-matlab.html#ixzz2WSQ6BN6o

THANK YOU