TASK 2 (REFER TO INDIVIDUAL PROJECT INSTRUCTIONS)

QUESTION
______________________________________________________________________

a) Give suitable short notes for the given topic above that cover all that has been learned. (Give at least five (5) pages with references and do not use the class notes.)

i. Definition / introduction

ii. State the important statements / options

iii. When to use

iv. Uses of the procedure

v. Advantages & disadvantages

INTRODUCTION

The ANOVA procedure is one of several procedures available in SAS/STAT software for analysis of variance.

One-way analysis of variance (ANOVA) is used for experimental data in which there is a continuous response variable and a single independent classification variable. The total variation in the response variable is explained as the sum of the variation due to the effects of the classification variable and the variation due to random error.

In a one-way design, the effect of a single independent factor variable on a response variable is of interest. The resulting ANOVA is called a one-way ANOVA.

When ANOVA is appropriate (that is, when the data within each group are normally distributed), the samples within the groups are best summarized by their means and standard deviations.

In a one-way design we can obtain estimates of the population means within the k groups, μ1, …, μk, and confidence intervals simply by considering each group separately (see continuous data). For the confidence intervals to be valid we need to assume (1) and (2) but not

When using one-way analysis of variance, looking up the resulting value of F in an F-distribution table is reliable under the following assumptions: the values in each of the groups (as a whole) follow the normal curve, with possibly different population averages (though the null hypothesis is that all of the group averages are equal) and equal population standard deviations.

Hypotheses

The null hypothesis is that all population means are equal; the alternative hypothesis is that at least one mean is different.

In the following, lowercase letters apply to the individual samples and capital letters apply to the entire set collectively. That is, n is one of many sample sizes, but N is the total sample size.
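In symbols, the two hypotheses can be written as below. This is a sketch that assumes k groups with population means μ1, …, μk, as in the introduction.

```latex
H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k
\qquad\text{versus}\qquad
H_1:\ \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j
```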

Grand Mean

The grand mean of a set of samples is the total of all the data values divided by the total sample size. This requires that you have all of the sample data available to you, which is usually the case, but not always. It turns out that all that is necessary to perform a one-way analysis of variance are the number of samples, the sample means, the sample variances, and the sample sizes.

Another way to find the grand mean is to find the weighted average of the sample means. The weight applied is the sample size.
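Both routes to the grand mean can be written as one formula. The symbols are assumptions introduced here for illustration: x is an individual data value, n_i and \bar{x}_i are the size and mean of sample i, k is the number of samples, and N is the total sample size.

```latex
\bar{X}_{GM} \;=\; \frac{\sum x}{N}
\;=\; \frac{\sum_{i=1}^{k} n_i\,\bar{x}_i}{\sum_{i=1}^{k} n_i}
```

The second form is the one to use when only the sample means and sample sizes are available.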

Total Variation

The total variation (not variance) is the sum of the squares of the differences of each data value from the grand mean.

It is made up of the between-group variation and the within-group variation. The whole idea behind the analysis of variance is to compare the ratio of between-group variance to within-group variance. If the variance between the samples is much larger than the variance that appears within each group, that is evidence that the means are not all the same.
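The comparison of between-group and within-group variation can be written out as follows. The notation is an assumption introduced for illustration: x_{ij} is the j-th value in group i, \bar{x}_i and n_i are the mean and size of group i, \bar{X}_{GM} is the grand mean, k is the number of groups, and N is the total sample size.

```latex
\begin{aligned}
SS(\text{Total}) &= \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{X}_{GM}\right)^2 \;=\; SS(B) + SS(W),\\
SS(B) &= \sum_{i=1}^{k} n_i\left(\bar{x}_i-\bar{X}_{GM}\right)^2,\\
SS(W) &= \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)^2,\\
F &= \frac{SS(B)/(k-1)}{SS(W)/(N-k)}.
\end{aligned}
```

A large F means the between-group variance is large relative to the within-group variance.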

IMPORTANT STATEMENT & SYNTAX

PROC ANOVA: Syntax

The following statements are available in PROC ANOVA.

PROC ANOVA ;
CLASS variables ;
MODEL dependents=effects ;
BY variables ;
MEANS effects ;

The PROC ANOVA, CLASS, and MODEL statements are required, and they must precede the first RUN statement. The CLASS statement must precede the MODEL statement. If you use a BY statement, it must precede the first RUN statement. The MEANS statement must follow the MODEL statement.

Adapted from SAS (V. 8.02) System Help phbrito 2003

CLASS variables ;

The CLASS statement names the classification variables to be used in the model. Typical class variables are TREATMENT, SEX, RACE, GROUP, and REPLICATION. The CLASS statement is required, and it must appear before the MODEL statement.

Classification variables are also called categorical, qualitative, discrete, or nominal variables. The values of a class variable are called levels. Class variables can be either numeric or character. This is in contrast to the response (or dependent) variables, which are continuous. Response variables must be numeric.

MODEL dependents=effects ;

The MODEL statement names the dependent variables and independent effects. If no independent effects are specified, only an intercept term is fit. This tests the hypothesis that the mean of the dependent variable is zero. All variables in effects that you specify in the MODEL statement must appear in the CLASS statement.

BY variables ;

You can specify a BY statement with PROC ANOVA to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. The variables are one or more variables in the input data set. If your input data set is not sorted, you should sort the data using the SORT procedure with a similar BY statement. If the BY statement is used, it must appear before the first RUN statement or it is ignored. When you use a BY statement, the interactive features of PROC ANOVA are disabled.
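A minimal sketch of the sort-then-BY pattern described above. The data set DEMO, the variables SITE, GROUP and SCORE, and all the values are made up purely for illustration; they do not come from the case studies in these notes.

```sas
/* Hypothetical data: two sites, two groups, two observations per cell */
DATA DEMO;
  INPUT SITE $ GROUP $ SCORE @@;
  DATALINES;
B X 10 B X 12 B Y 15 B Y 17
A X 11 A X 13 A Y 14 A Y 18
;
RUN;

/* The input is not in SITE order, so sort it with the same BY variable first */
PROC SORT DATA=DEMO;
  BY SITE;
RUN;

/* One separate one-way analysis per value of SITE */
PROC ANOVA DATA=DEMO;
  BY SITE;
  CLASS GROUP;
  MODEL SCORE = GROUP;
RUN;
```

The output should contain one ANOVA table for SITE=A and another for SITE=B.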

MEANS effects ;

PROC ANOVA can compute means of the dependent variables for any effect that appears on the right-hand side in the MODEL statement.
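As a sketch, the MEANS statement can be tried on the reading-speed values from the case study in part (b). The TUKEY option is an addition not covered in these notes; it asks for Tukey's studentized range comparisons of the group means.

```sas
/* Reading-speed data from the case study in part (b) */
DATA READING;
  INPUT GROUP $ WORDS @@;
  DATALINES;
X 700 X 850 X 820 X 640 X 920
Y 480 Y 460 Y 500 Y 570 Y 580
Z 500 Z 550 Z 480 Z 600 Z 610
;
RUN;

PROC ANOVA DATA=READING;
  CLASS GROUP;
  MODEL WORDS = GROUP;
  MEANS GROUP;           /* means and standard deviations of WORDS per group */
  MEANS GROUP / TUKEY;   /* optional addition: pairwise comparisons of group means */
RUN;
QUIT;
```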

WHEN TO USE?

To compare the means of more than two independent, normally distributed samples of equal size, PROC ANOVA should be used.

It is also used to study the effect of the classification variable on the response variable and to test the significance of the model.

USES OF THE PROCEDURE

The ANOVA procedure is designed to handle balanced data (that is, data with equal numbers of observations for every combination of the classification factors).

Because PROC ANOVA takes into account the special structure of a balanced design, it is faster and uses less storage than PROC GLM for balanced data.

It compares the means between the groups you are interested in and determines whether any of those means are significantly different from each other.
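One way to check that the data are balanced before choosing PROC ANOVA is to count the observations at each level of the classification variable. The sketch below uses the reading-speed values from part (b). PROC FREQ is not part of these notes and is used here only as a convenience.

```sas
/* Reading-speed data from the case study in part (b) */
DATA READING;
  INPUT GROUP $ WORDS @@;
  DATALINES;
X 700 X 850 X 820 X 640 X 920
Y 480 Y 460 Y 500 Y 570 Y 580
Z 500 Z 550 Z 480 Z 600 Z 610
;
RUN;

/* Equal frequencies for every level indicate a balanced one-way layout */
PROC FREQ DATA=READING;
  TABLES GROUP;
RUN;
```

Each of X, Y and Z has five observations here, so the design is balanced. For unbalanced data with several classification factors, PROC GLM is generally the recommended alternative.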

ADVANTAGES & DISADVANTAGES

The main advantage of the within-subjects design is:

i. It controls for individual differences between participants.

ii. In between-groups designs, some fluctuation in the scores of the groups is due to different participants providing the scores.

iii. To control this unwanted variability, participants provide scores for each of the treatment levels. The variability due to the participants is then assumed not to vary across the treatment levels.

REFERENCES

https://statistics.laerd.com/statistical-guides/one-way-anova-statistical-guide-4.php

http://www.public.asu.edu/~davidpm/classes/psy530/examples/samplesas.htm#one

http://people.stat.sfu.ca/~sgchiu/Grace/S330/handouts/1-way/

http://www.stat.purdue.edu/~tqin/system101/method/method_one_way_ANOVA_sas.htm#data

http://math.colgate.edu/math102/dlantz/examples/ANOVA/anovahyp.html

https://people.richland.edu/james/lecture/m170/ch13-1wy.html

b) Create / find a dataset with an appropriate sample size for a suitable statistical analysis. Create the case study and give at least two (2) examples for the given topic above.

i. 15 subjects in three treatment groups X, Y and Z. Response: number of words a subject reads per minute.

X        Y        Z
700      480      500
850      460      550
820      500      480
640      570      600
920      580      610

How do we know whether the observed means differ because of differences in the reading programs (X, Y, Z)?

ii. A psychologist wants to know if weather conditions affect problem-solving. 30 undergraduates are each assigned to 1 of 3 conditions: raining outside, snowing outside, or sunny outside. Subjects took 30 minutes to solve problems.

Raining  Snowing  Sunny
12       15       15
17       12       20
14       13       17
13       14       16
15       17       18
17       16       20
15       13       18
13       18       16
18       15       21
16       17       19
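The second case study can be analysed the same way as the first. The sketch below assumes that the 30 values are listed row by row under the Raining, Snowing and Sunny headings, which gives ten subjects per condition. The data set name WEATHER and the variable names CONDITION and SCORE are hypothetical.

```sas
/* Weather data, assuming the values read row by row across the three columns */
DATA WEATHER;
  INPUT CONDITION $ SCORE @@;
  DATALINES;
RAIN 12 RAIN 17 RAIN 14 RAIN 13 RAIN 15
RAIN 17 RAIN 15 RAIN 13 RAIN 18 RAIN 16
SNOW 15 SNOW 12 SNOW 13 SNOW 14 SNOW 17
SNOW 16 SNOW 13 SNOW 18 SNOW 15 SNOW 17
SUNNY 15 SUNNY 20 SUNNY 17 SUNNY 16 SUNNY 18
SUNNY 20 SUNNY 18 SUNNY 16 SUNNY 21 SUNNY 19
;
RUN;

PROC ANOVA DATA=WEATHER;
  CLASS CONDITION;            /* three levels: RAIN, SNOW, SUNNY */
  MODEL SCORE = CONDITION;    /* one-way model */
  MEANS CONDITION;            /* mean score per weather condition */
RUN;
QUIT;
```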

c) Based on your creativity, do the appropriate analysis and comment on the output.

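/* Read GROUP (character) and WORDS (numeric); @@ allows several observations per line */
DATA READING;
  INPUT GROUP $ WORDS @@;
  DATALINES;
X 700 X 850 X 820 X 640 X 920
Y 480 Y 460 Y 500 Y 570 Y 580
Z 500 Z 550 Z 480 Z 600 Z 610
;

/* List the data to check that it was read correctly */
PROC PRINT DATA=READING;
RUN;

/* One-way ANOVA of WORDS by reading program */
PROC ANOVA DATA=READING;
  TITLE 'ANALYSIS OF READING DATA';
  CLASS GROUP;
  MODEL WORDS=GROUP;
  MEANS GROUP;
RUN;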

From the given output, it can be concluded that the group means differ because of differences in the reading programs (X, Y, Z).

We can also conclude that the model is significant, since the p-value is 0.0003, and that 73.67% of the total variation in the response variable is explained by the reading programs.
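The 73.67% figure can be checked by hand from the data in part (b). The worked sketch below uses the standard sums-of-squares formulas; the notation SS(B) for between-group and SS(W) for within-group variation is introduced here for illustration.

```latex
\begin{aligned}
\bar{x}_X &= 786,\quad \bar{x}_Y = 518,\quad \bar{x}_Z = 548,\quad
\bar{X}_{GM} = \tfrac{9260}{15} \approx 617.33,\\
SS(B) &= 5\left[(786-617.33)^2 + (518-617.33)^2 + (548-617.33)^2\right] \approx 215613.33,\\
SS(W) &= 51920 + 11680 + 13480 = 77080,\\
R^2 &= \frac{SS(B)}{SS(B)+SS(W)} = \frac{215613.33}{292693.33} \approx 0.7367,\\
F &= \frac{SS(B)/(3-1)}{SS(W)/(15-3)} = \frac{107806.67}{6423.33} \approx 16.78.
\end{aligned}
```

The R-square of about 0.7367 matches the 73.67% quoted above, and the F value has 2 and 12 degrees of freedom.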

d) Create mind mapping to summarize the following topics as follows: (Hint: use Microsoft Office).

**Note: Include the references such as book, journal, papers, internet access and others.

[Mind map: One-Way ANOVA]

One-Way ANOVA:
    used for continuous response variable
    does not work for unbalanced data
    explained as the sum of the variation

STATEMENT & SYNTAX:
    PROC ANOVA
    CLASS
    MODEL
    BY
    MEANS

WHEN TO USE:
    to compare the means
    to study the variable effects

USES OF THE PROCEDURE:
    to handle balanced data
    determines whether means are different