ikhram-johari
TASK 2 (REFER TO INDIVIDUAL PROJECT INSTRUCTIONS)
QUESTION
a) Write suitable short notes on the topic given above, covering everything that has been learned.
(Write at least five (5) pages with references, and do not use the class notes.)
i. Definition / introduction
ii. Important statements / options
iii. When to use
iv. Uses of the procedure
v. Advantages & disadvantages
INTRODUCTION
The ANOVA procedure is one of several procedures available in SAS/STAT software for
analysis of variance.
One-way analysis of variance (ANOVA) is used for experimental data in which there is a
continuous response variable and a single independent classification variable.
The total variation in the response variable is explained as the sum of the variation due
to the effects of the classification variable and the variation due to random error.
In a one-way design, the effect of a single independent factor variable on a response
variable is of interest.
The resulting ANOVA is called a one-way ANOVA.
When ANOVA is appropriate (that is, the data are normally distributed), the samples within
the groups are best summarized by their means and standard deviations.
In a one-way design we can obtain estimates of the population means within the k groups,
μ1, …, μk, and their confidence intervals simply by
considering each group separately (see continuous data). For the confidence intervals to
be valid we need to assume (1) and (2) but not
When using one-way analysis of variance, looking up the resulting value of F in an
F-distribution table is reliable under the following assumptions:
i. the values in each of the groups (as a whole) follow the normal curve, with possibly
different population averages (though the null hypothesis is that all of the group
averages are equal), and
ii. the groups have equal population standard deviations.
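The two assumptions above can be checked in SAS before the F test is trusted. The following is only a sketch of one possible approach, not part of the original notes: it reuses the reading-speed data from part (b), requests normality tests per group with PROC UNIVARIATE, and asks PROC ANOVA for Levene's test of equal variances through the HOVTEST= option of the MEANS statement.

```sas
/* Reading-speed data from the case study in part (b) */
data reading;
  input group $ words @@;
  datalines;
X 700 X 850 X 820 X 640 X 920
Y 480 Y 460 Y 500 Y 570 Y 580
Z 500 Z 550 Z 480 Z 600 Z 610
;

/* Assumption i: the NORMAL option requests tests for normality,
   and CLASS gives a separate analysis for each group */
proc univariate data=reading normal;
  class group;
  var words;
run;

/* Assumption ii: Levene's test for homogeneity of variance,
   available for one-way models on the MEANS statement */
proc anova data=reading;
  class group;
  model words = group;
  means group / hovtest=levene;
run;
quit;
```

With only five observations per group, these tests have little power, so the results are best read alongside a plot of the data.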
Hypotheses
The null hypothesis is that all population means are equal; the alternative
hypothesis is that at least one mean is different.
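In symbols, for k groups with population means μ1, …, μk, the two hypotheses can be written as follows (the labels H0 and H1 are conventional notation, not taken from the notes):

```latex
\begin{aligned}
H_0 &: \mu_1 = \mu_2 = \cdots = \mu_k \\
H_1 &: \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j
\end{aligned}
```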
In the following, lower case letters apply to the individual samples and capital letters
apply to the entire set collectively. That is, n is one of many sample sizes, but N is the
total sample size.
Grand Mean
The grand mean of a set of samples is the total of all the data values divided by the total
sample size. This requires that you have all of the sample data available to you, which is
usually the case, but not always. It turns out that all that is necessary to perform a
one-way analysis of variance are the number of samples, the sample means, the sample
variances, and the sample sizes.
Another way to find the grand mean is to find the weighted average
of the sample means. The weight applied is the sample size.
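Both ways of finding the grand mean can be written as one formula. The symbols are assumptions introduced here for illustration: x̄_GM is the grand mean, x̄_i and n_i are the mean and size of sample i, k is the number of samples, and N is the total sample size.

```latex
\bar{x}_{GM} \;=\; \frac{\sum x}{N} \;=\; \frac{\sum_{i=1}^{k} n_i\,\bar{x}_i}{\sum_{i=1}^{k} n_i}
```

For the reading data in part (b), each group has 5 subjects and the group means are 786, 518 and 548, so the grand mean is (5·786 + 5·518 + 5·548) / 15 = 9260 / 15 ≈ 617.33.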
Total Variation
The total variation (not variance) is the sum
of the squares of the differences between each data value and
the grand mean.
It splits into the between group variation and the within group variation. The whole idea
behind the analysis of variance is to compare the ratio of between group variance to
within group variance. If the variance between the groups is
much larger than the variance that appears within each group, this suggests that
the means are not all the same.
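The comparison described above can be summarized in the usual sums-of-squares form. This is a sketch in assumed notation that does not appear in the notes: x_ij is observation j in group i, x̄_i and s_i² are the mean and variance of group i, n_i is its size, x̄_GM is the grand mean, k is the number of groups, and N is the total sample size.

```latex
\begin{aligned}
SS(\text{Total})   &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_{GM}\right)^2 \\
SS(\text{Between}) &= \sum_{i=1}^{k} n_i \left(\bar{x}_i - \bar{x}_{GM}\right)^2,
                     \qquad df = k - 1 \\
SS(\text{Within})  &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_i\right)^2
                    = \sum_{i=1}^{k} (n_i - 1)\,s_i^2,
                     \qquad df = N - k \\
SS(\text{Total})   &= SS(\text{Between}) + SS(\text{Within}) \\
F &= \frac{SS(\text{Between})/(k-1)}{SS(\text{Within})/(N-k)}
\end{aligned}
```

A large value of F means the between group variance is large relative to the within group variance, which is evidence against equal means.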
IMPORTANT STATEMENT & SYNTAX
PROC ANOVA: Syntax
The following statements are available in PROC ANOVA.
PROC ANOVA ;
   CLASS variables ;
   MODEL dependents=effects ;
   BY variables ;
   MEANS effects ;
The PROC ANOVA, CLASS, and MODEL statements are required, and they must precede
the first RUN statement. The CLASS statement must precede the MODEL statement. If you
use a BY statement, it must precede the first RUN statement. The MEANS statement must
follow the MODEL statement.
Adapted from SAS (V. 8.02) System Help phbrito 2003
CLASS variables ;
The CLASS statement names the classification variables to be used in the model.
Typical class variables are TREATMENT, SEX, RACE, GROUP, and REPLICATION.
The CLASS statement is required, and it must appear before the MODEL statement.
Classification variables are also called categorical, qualitative, discrete, or nominal
variables. The values of a class variable are called levels. Class variables can be either
numeric or character. This is in contrast to the response (or dependent) variables, which
are continuous. Response variables must be numeric.
MODEL dependents=effects ;
The MODEL statement names the dependent variables and independent effects. If no
independent effects are specified, only an intercept term is fit. This tests the hypothesis
that the mean of the dependent variable is zero. All variables in effects that you specify
in the MODEL statement must appear in the CLASS statement.
BY variables ;
You can specify a BY statement with PROC ANOVA to obtain separate analyses on
observations in groups defined by the BY variables. When a BY statement appears, the
procedure expects the input data set to be sorted in order of the BY variables. The
variables are one or more variables in the input data set. If your input data set is not
sorted, you should sort the data using the SORT procedure with a similar BY statement.
If the BY statement is used, it must appear before the first RUN statement or it is
ignored. When you use a BY statement, the interactive features of PROC ANOVA are disabled.
MEANS effects ;
PROC ANOVA can compute means of the dependent variables for any effect that
appears on the right-hand side in the MODEL statement.
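The statements described above can be combined as in the following sketch. The data set TRIAL and its variables SITE, TREATMENT and SCORE are hypothetical names and values made up for illustration. The data are balanced (two observations per treatment within each site), as PROC ANOVA expects.

```sas
/* Hypothetical data: SITE is the BY variable, TREATMENT the
   class variable, SCORE the numeric response */
data trial;
  input site $ treatment $ score @@;
  datalines;
A T1 12 A T1 14 A T2 18 A T2 17 A T3 11 A T3 13
B T1 15 B T1 13 B T2 20 B T2 19 B T3 12 B T3 14
;

/* BY-group processing expects the input sorted by the BY variable */
proc sort data=trial;
  by site;
run;

proc anova data=trial;
  by site;                  /* one separate analysis per site      */
  class treatment;          /* required, must precede MODEL        */
  model score = treatment;  /* dependent = effect                  */
  means treatment;          /* must follow MODEL                   */
run;
quit;
```

The QUIT statement ends the procedure, since PROC ANOVA otherwise stays active after RUN.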
WHEN TO USE?
PROC ANOVA should be used to compare the means of more than two independent, normally
distributed samples of equal size.
It is also used to study the effect of the classification variable on the response
variable and the significance of the model.
USES OF THE PROCEDURE
The ANOVA procedure is designed to handle balanced data (that is, data with equal
numbers of observations for every combination of the classification factors).
Because PROC ANOVA takes into account the special structure of a balanced design, it
is faster and uses less storage than PROC GLM for balanced data.
It compares the means of the groups you are interested in and determines whether
any of those means are significantly different from each other.
ADVANTAGES & DISADVANTAGES
The main advantage of the within-subjects design is:
i. It controls for individual differences between participants.
ii. In between-groups designs, some fluctuation in the scores of the groups is
due to different participants providing the scores.
iii. To control this unwanted variability, participants provide scores for each of the
treatment levels; the variability due to the participants is then assumed not to vary
across the treatment levels.
REFERENCES
https://statistics.laerd.com/statistical-guides/one-way-anova-statistical-guide-4.php
http://www.public.asu.edu/~davidpm/classes/psy530/examples/samplesas.htm#one
http://people.stat.sfu.ca/~sgchiu/Grace/S330/handouts/1-way/
http://www.stat.purdue.edu/~tqin/system101/method/method_one_way_ANOVA_sas.htm#data
http://math.colgate.edu/math102/dlantz/examples/ANOVA/anovahyp.html
https://people.richland.edu/james/lecture/m170/ch13-1wy.html
b) Create or find a dataset with an appropriate sample size for the suitable statistical
analysis. Create a case study and give at least two (2) examples for the topic given
above.
i. 15 subjects in three treatment groups X, Y and Z. Response: number of words a
subject reads per minute.
X Y Z
700 480 500
850 460 550
820 500 480
640 570 600
920 580 610
How do we know whether the means obtained are different because of differences in the
reading programs (X, Y, Z)?
ii. A psychologist wants to know if weather conditions affect problem-solving. 30
undergraduates were each assigned to 1 of 3 conditions:
Raining outside
Snowing outside
Sunny outside
Subjects took 30 minutes to solve problems.
Raining Snowing Sunny
12 15 15
17 12 20
14 13 17
13 14 16
15 17 18
17 16 20
15 13 18
13 18 16
18 15 21
16 17 19
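The second example can be analyzed with the same procedure. The sketch below assumes that the 30 values listed above read row by row as Raining, Snowing, Sunny, giving 10 subjects per condition. The names WEATHER, CONDITION and SCORE are hypothetical, and the notes do not say what the response measures.

```sas
/* Assumption: each line holds one Raining, one Snowing and one
   Sunny value, in the order the values are listed in the notes */
data weather;
  input condition $ score @@;
  datalines;
Raining 12 Snowing 15 Sunny 15
Raining 17 Snowing 12 Sunny 20
Raining 14 Snowing 13 Sunny 17
Raining 13 Snowing 14 Sunny 16
Raining 15 Snowing 17 Sunny 18
Raining 17 Snowing 16 Sunny 20
Raining 15 Snowing 13 Sunny 18
Raining 13 Snowing 18 Sunny 16
Raining 18 Snowing 15 Sunny 21
Raining 16 Snowing 17 Sunny 19
;

proc anova data=weather;
  title 'ANALYSIS OF WEATHER DATA';
  class condition;           /* three weather conditions */
  model score = condition;   /* one-way model            */
  means condition;           /* mean score per condition */
run;
quit;
```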
c) Using your own judgment, carry out the appropriate analysis and comment on the
output.
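/* Read the reading-speed data; @@ allows several observations per line */
DATA READING;
   INPUT GROUP $ WORDS @@;
   DATALINES;
X 700 X 850 X 820 X 640 X 920
Y 480 Y 460 Y 500 Y 570 Y 580
Z 500 Z 550 Z 480 Z 600 Z 610
;
/* List the data to check that it was read correctly */
PROC PRINT DATA=READING;
RUN;
/* One-way ANOVA of WORDS by GROUP, with the group means */
PROC ANOVA DATA=READING;
   TITLE 'ANALYSIS OF READING DATA';
   CLASS GROUP;
   MODEL WORDS=GROUP;
   MEANS GROUP;
RUN;
QUIT;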
From the output, it can be concluded that the means obtained are different
because of differences in the reading programs (X, Y, Z): at least one program mean
differs from the others.
We can also conclude that the model is appropriate, since the p-value is 0.0003 and
73.67% of the total variation in the response variable is explained by the reading
programs.
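The 73.67% figure can be checked by hand from the data in part (b). The working below is a sketch in assumed notation (SS(B) for the between group sum of squares, SS(W) for the within group sum of squares, x̄_GM for the grand mean). The p-value itself is taken from the SAS output and is not recomputed here.

```latex
\begin{aligned}
\bar{x}_X &= 786, \quad \bar{x}_Y = 518, \quad \bar{x}_Z = 548, \quad
\bar{x}_{GM} = \tfrac{9260}{15} \approx 617.33 \\
SS(B) &= 5\left[(786 - 617.33)^2 + (518 - 617.33)^2 + (548 - 617.33)^2\right]
       \approx 215613.33 \\
SS(W) &= 51920 + 11680 + 13480 = 77080 \\
SS(\text{Total}) &= SS(B) + SS(W) \approx 292693.33 \\
F &= \frac{SS(B)/(3-1)}{SS(W)/(15-3)}
   \approx \frac{107806.67}{6423.33} \approx 16.78 \\
R^2 &= \frac{SS(B)}{SS(\text{Total})} \approx 0.7367
\end{aligned}
```

The F statistic has 2 and 12 degrees of freedom, and an R² of about 0.7367 corresponds to the 73.67% of variation explained by the reading programs.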
d) Create a mind map to summarize the topics as follows: (Hint: use Microsoft Office).
**Note: Include references such as books, journals, papers, internet sources and others.
One-Way ANOVA
used for continuous response variable; does not work for unbalanced data; explained as the sum of the variation
STATEMENT & SYNTAX: PROC ANOVA, CLASS, MODEL, BY, MEANS
WHEN TO USE: to compare the means; to study the variable effects
USES OF THE PROCEDURE: to handle balanced data; determines whether means are different