Basic Statistics for Scientific Research
Statistics involves math and calculations of numbers. It also guides how numbers are chosen, interpreted, and compared to other numbers.
Consider the following three scenarios and the interpretations based upon the presented statistics. You will find that the numbers may be right, but the interpretation may be wrong. Try to identify a major flaw with each interpretation before we describe it.
1) A new advertisement for Ben and Jerry's ice cream introduced in late May of last year resulted in a 30% increase in ice cream sales for the following three months. Thus, the advertisement was effective.
A major flaw is that other things also affect ice cream consumption and these other variables were not controlled or accounted for. Ice cream consumption generally increases in the months of June, July, and August regardless of advertisements. This effect is called a history effect and leads people to interpret outcomes as the result of one variable when another variable (in this case, the passage of time) is actually responsible.
2) The more churches in a city, the more crime there is. Thus, churches lead to crime.
A major flaw is that other things also affect crime rates and these other variables were not controlled or accounted for. Both the number of churches and the crime rate can be explained by population size: in bigger cities, there are both more churches and more crime. This is an example of the third-variable problem: a third variable can cause both outcomes, yet people erroneously infer a causal relationship between the two primary variables rather than recognizing that a third variable drives both.
3) You measured the value of a variable repeatedly and found an average value of 7.0. Another lab measured an average value of 8.0. Do the two results agree?
NO MEASUREMENT IS EXACT because there are random errors in every measurement. Therefore, every measured value really represents a range of values (7 ± uncertainty). Without knowledge of the uncertainty in the average value, no comparison between the 2 results can be made. How close is close enough for agreement?
Statistics is not just mathematics and calculations of numbers. It is also a guide to interpreting those numbers.
VALIDITY OF A CONCLUSION
Almost every research problem investigates a relationship. There are basically 2 possible conclusions (and either could be wrong):
1. There is NO relationship (but the relationship could have been missed because it is weak or infrequent, or because not enough data was collected).
2. There is a relationship (but things could have been seen that were not really there).

Things that improve conclusion validity:
- Statistics (collect more information, use a larger sample size)
- Be aware of the assumptions made in the data analysis
- Make more precise, less noisy measurements
In most research, data analysis involves three major steps, done in roughly this order:
• Data Preparation - Logging, cleaning and organizing the data for analysis
• Descriptive Statistics – numbers used to summarize and describe data. Together with graphical analysis, they form the basis of almost every quantitative analysis of data. With descriptive stats, you are simply describing what the data shows
• Inferential Statistics - Testing hypotheses and models. Conclusions from inferential statistics extend beyond the immediate data (the sample) and try to infer more general conclusions (the population)
Types of Statistics/Analyses

Descriptive Statistics – describe the basic features of the data and simplify large amounts of data with single indicators:
– Means (central tendency)
– Standard deviation/variance (dispersion around the mean)
– Frequencies/percentages (distribution)
Questions answered: How many? How much? How variable? How uncertain? (e.g., BP, HR, BMI, IQ)

Inferential Statistics – go beyond the immediate data to make inferences about population phenomena: proving or disproving theories, associations between phenomena, whether the sample relates to the larger population (e.g., diet and health):
– Hypothesis testing
– Correlation
– Confidence intervals / t-tests
– Significance testing
– Prediction
Descriptive Statistics used to present quantitative descriptions in a manageable form. In a research study there may be lots of measures. Descriptive statistics helps simplify large amounts of data in a sensible way. Each descriptive statistic reduces lots of data into a simpler summary.
Examples of Descriptive Stats
- Batting average – summarizes how well a batter performs. The single number describes a large number of discrete events, but it doesn’t tell whether the hits were home runs or singles, or whether the batter was in a slump or on a streak.
- Grade Point Average (GPA) – a single number that describes the general performance of a student across a potentially wide range of course experiences, but it doesn’t tell whether the courses were difficult or easy, or in the major field or others.

Using a single indicator for a large set of observations can distort the original data or lose important detail.
Descriptive Statistics
Descriptive statistics can be used to summarize and describe a single variable at a time (UNIvariate analysis). There are 3 major characteristics of a single variable (found by repeating the measurement):
1. The distribution (frequencies, percentages)
2. The central tendency (mean, median, mode)
3. The dispersion or spread around the central tendency (standard deviation, range)
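As a concrete illustration, here is a minimal Python sketch (not from the slides; the ten values reuse the example measurements that appear later in the deck) computing these three characteristics for one variable:

```python
# Univariate summary of one repeatedly measured variable:
# distribution, central tendency, and dispersion.
from collections import Counter
import statistics

measurements = [9, 4, 7, 6, 10, 5, 5, 7, 8, 8]   # example repeated measurements

# 1. Distribution: frequencies and percentages
freq = Counter(measurements)
percentages = {value: 100 * count / len(measurements) for value, count in freq.items()}

# 2. Central tendency
mean = statistics.mean(measurements)      # 7.0
median = statistics.median(measurements)  # 7.0
mode = statistics.mode(measurements)      # most common value (first of the ties, Python 3.8+)

# 3. Dispersion around the central tendency
sd = statistics.stdev(measurements)       # sample SD, divides by n - 1
value_range = max(measurements) - min(measurements)

print(freq, percentages)
print(mean, median, mode, sd, value_range)
```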
Types of Variables

DISCRETE or QUALITATIVE
• Categorical or nominal: a variable that can take on one of a limited number of possible values, groups or categories; there is no intrinsic ordering to the categories. Examples: gender, hair color, blood type (A, B, AB, O), state a resident lives in.
• Ordinal: similar to categorical, but there is a clear ordering of the categories. Examples: economic class, educational experience, Likert scales or ratings.

CONTINUOUS or QUANTITATIVE
• Interval: similar to ordinal except the intervals between the values are equally spaced, so they can be measured along a continuum and given numerical values. Examples: annual income, temperature, reaction time, BP.
• Ratio: interval variables with the added condition that 0 indicates that there is none of that variable, so ratios between measurements are meaningful. Examples: height, mass, distance, weight, temperature in Kelvin only.
Distributions of Discrete Variables
Use frequencies (counts) and percentages. Examples of discrete variables: levels, types, groupings, yes/no, Drug A vs. Drug B.
Color     Frequency     %
Red          18        32.7
Brown        17        30.9
Yellow        7        12.7
Green         7        12.7
Orange        4         7.3
Blue          2         3.6
Total        55

Table 1. Frequencies in the Bag of M&M's
Different ways to display frequencies and percentages for M&M data:
[Bar chart and pie chart of the M&M color frequencies]
FREQUENCY DISTRIBUTIONS (good if there are more than 20 observations)
Distributions of Ordinal Level Data
Frequencies and percentages can also be computed for ordinal data (which is also discrete)
– Examples: Likert Scales (Strongly Disagree to Strongly Agree); High School/Some College/College Graduate/Graduate School
[Bar chart of response frequencies (0–60) for Strongly Agree, Agree, Disagree, Strongly Disagree]
Distribution of Interval/Ratio (Continuous) Data
We can compute frequencies and percentages for continuous (interval and ratio level) data as well
– Examples: Age, Temperature, Height, Weight, Many Clinical Serum Levels
Distribution of Injury Severity Score in a population of patients
NO MEASUREMENT IS EXACT. There is uncertainty in measured values due to random errors. Since random errors are, by nature, erratic, they are subject to the laws of probability or chance.
The uncertainty of a measurement can be reduced by repeating a measurement many times and taking the average (central tendency). The individual measurements will be scattered or distributed around the average.
The amount of spread or dispersion of individual measurements around the average value is a measure of the uncertainty of the measurement.
[Histograms of repeated measurements (62–68 cm), MEAN = 65.36 cm: two distributions with the same average value, one with a larger spread or uncertainty (less precise) and one with a smaller spread or uncertainty (more precise).]
The average spread of the multiple measurements around a mean value is an estimate of the uncertainty and is called the standard deviation, SD.
Experiment: what is the effect of release height on the horizontal landing distance? Here we characterize ONE VARIABLE – the distance (for one height).
STANDARD DEVIATION AND SAMPLE SIZE (degrees of freedom)
[Distributions of the measurements for n = 10, n = 50 and n = 150]
As the sample size n increases, the uncertainty in the estimated mean (the standard error, SD/√n) decreases, so the mean is known with less uncertainty.
The average spread of the multiple measurements around a mean value represents the uncertainty of the variable and is called the standard deviation, SD.
• 68% of all the measurements fall within 1 SD of the mean
• 95% of all the measurements fall within 2 SDs of the mean
• 99% of all the measurements fall within 3 SDs of the mean
NORMAL DISTRIBUTION of a Variable
In a normal distribution, points are distributed symmetrically around the mean. Many naturally occurring phenomena can be approximated surprisingly well by this distribution.
[Normal curve marked off in SD units from the mean: the mean is the central tendency, dispersion is measured by the SD, and the vertical axis is frequency or probability. Only 3 points in 1000 will fall outside 3 SD from the mean; the extreme tails are labelled p < 0.05.]
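A minimal Python sketch (assuming SciPy is available; not part of the original slides) confirming the 68% / 95% / 99% rule for a normally distributed variable:

```python
# Fraction of a normal distribution lying within 1, 2, and 3 SDs of the mean.
from scipy.stats import norm

for k in (1, 2, 3):
    fraction = norm.cdf(k) - norm.cdf(-k)   # area under the standard normal curve between -k and +k SDs
    print(f"within {k} SD: {fraction:.4f}")
# within 1 SD: 0.6827, within 2 SD: 0.9545, within 3 SD: 0.9973
# (0.9973 matches "only 3 points in 1000 fall outside 3 SD from the mean")
```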
Example: computing the SD of 10 repeated measurements

trial   measurement x   spread (x − x̄)   (x − x̄)²
1            9               2               4
2            4              −3               9
3            7               0               0
4            6              −1               1
5           10               3               9
6            5              −2               4
7            5              −2               4
8            7               0               0
9            8               1               1
10           8               1               1
mean = 7                     total = 33

SD = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{33}{9}} \approx 1.9   (about 27% of the mean)

mean = 7 ± 1.9 (SD)

The SD is also known as the square root of the variance, s².
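A minimal Python sketch (not part of the original slides) reproducing the calculation in the table above:

```python
# Sample standard deviation of the 10 example trials (mean = 7, sum of squared spreads = 33).
import math

x = [9, 4, 7, 6, 10, 5, 5, 7, 8, 8]
n = len(x)
mean = sum(x) / n                               # 7.0
ss = sum((xi - mean) ** 2 for xi in x)          # 33.0
sd = math.sqrt(ss / (n - 1))                    # sqrt(33/9) ≈ 1.91
print(mean, ss, sd, f"{100 * sd / mean:.0f}% of the mean")
```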
For the example above (mean = 7, SD = 1.9):
• 68% confidence that another measurement would be within one SD of the average value (between 5.1 and 8.9)
• 95% confidence that another measurement would be within two SDs of the average value (between 3.2 and 10.8)
• 99% confidence that another measurement would be within three SDs of the average value (between 1.3 and 12.7)
Example
measured average = 7 ± 0.3 (SD); accepted value = 8

Does your measurement agree with the accepted value?
NO, since the range within 2 SDs of the mean (95% confidence interval, 6.4 – 7.6) does not overlap with the accepted value. Therefore, there is a statistically significant difference between the two values.

What is the relative error of your measurement?
\text{Relative Error (\%)} = \dfrac{|\text{measurement} - \text{accepted}|}{\text{accepted}} \times 100 = \dfrac{|7 - 8|}{8} \times 100 = 12.5\%
Example
measured average = 7 ± 0.6 (SD); accepted value = 8

Does your measurement agree with the accepted value?
YES, since within 2 SDs of uncertainty your measured average agrees with the accepted value: you have 95% confidence that the true value is within two SDs of the measured average (between 5.8 and 8.2), which includes 8.

What is the relative error of your measurement?
\text{Relative Error (\%)} = \dfrac{|\text{measurement} - \text{accepted}|}{\text{accepted}} \times 100 = \dfrac{|7 - 8|}{8} \times 100 = 12.5\%
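A minimal Python sketch (hypothetical helper names, not from the slides) of the agreement check and relative-error calculation used in the two examples above:

```python
# Agreement within k SDs (k = 2 -> roughly 95% confidence) and relative error vs. an accepted value.
def agrees(measured, sd, accepted, k=2):
    """True if the accepted value lies within k SDs of the measured average."""
    return abs(measured - accepted) <= k * sd

def relative_error_percent(measured, accepted):
    return abs(measured - accepted) / accepted * 100

print(agrees(7.0, 0.3, 8.0))                 # False: 6.4-7.6 does not include 8
print(agrees(7.0, 0.6, 8.0))                 # True:  5.8-8.2 includes 8
print(relative_error_percent(7.0, 8.0))      # 12.5
```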
OTHER DISTRIBUTIONS of a single variable
• Normal or symmetric
• Right skewed
• Left skewed (rare)
• Bimodal
• Fat tailed
Distributions
The distribution of scores or values can also be displayed using box and whiskers plots and frequency histograms.
BOXPLOT (BOX AND WHISKER PLOT)
[Box and whisker plots of pain scores (VAS) for females and males (N = 27, 74). The box spans the inter-quartile range from the 25th to the 75th centile, with the median (50th centile) marked inside it; the whiskers extend to the 2.5th and 97.5th centiles.]
SKEWED DISTRIBUTION
[Skewed distribution with the MEAN and the MEDIAN marked. The median is the value with 50% of the observations lying on either side of it; in a skewed distribution the mean and median differ.]
DESCRIBING DATA – Central Tendency
MEAN – average or arithmetic mean of the data
MEDIAN – the value which comes half way when the data are ranked in order
MODE – the most common value observed
• In a normal distribution, mean and median are the same
• If the median and mean are different, this indicates that the data are not normally distributed
• The mode is of little if any practical use
DISTRIBUTIONS: EXAMPLES

NORMAL DISTRIBUTION
• Height
• Weight
• Haemoglobin
• IQ
• BP
• Test scores
• Speed on a highway at a given spot

SKEWED DISTRIBUTION
• Bankers’ bonuses (+)
• Number of marriages (+)
• Mileage on used cars for sale (+)
• Age at death in developed countries (−)
• Number of fingers (−)
HOW TO TELL IF A VARIABLE FOLLOWS A NORMAL DISTRIBUTION
• Important because parametric statistics assume normal distributions
• Statistics packages can test normality
• Distribution unlikely to be normal if:
– the mean is very different from the median
– two SDs below the mean give an impossible answer (e.g. height < 0 cm)
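A minimal Python sketch (assuming SciPy; the height data are simulated purely for illustration) of these informal checks plus the kind of formal normality test a statistics package provides:

```python
# Informal normality checks (mean vs. median, mean - 2 SD) and a Shapiro-Wilk test.
import numpy as np
from scipy.stats import shapiro

heights = np.random.default_rng(0).normal(loc=170, scale=10, size=100)  # simulated heights in cm

mean, median, sd = heights.mean(), np.median(heights), heights.std(ddof=1)
print("mean vs median:", mean, median)     # should be similar for a normal distribution
print("mean - 2*SD:", mean - 2 * sd)       # should not be an impossible value (e.g. height < 0 cm)

stat, p = shapiro(heights)                 # formal normality test
print("Shapiro-Wilk p =", p)               # p > 0.05: no evidence against normality
```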
Descriptive Statistics
Descriptive statistics can be used to summarize and describe a single variable at a time (UNIvariate analysis). There are 3 major characteristics of a single variable (repeatedly measured):
1. The distribution (frequencies, percentages)
2. The central tendency (mean, median, mode)
3. The dispersion or spread around the central tendency (standard deviation, variance, range). A large SD means that there is a lot of uncertainty in the mean and that the measurement is not precise.

By taking multiple measurements, you obtain a good estimate of these 3 properties, which fully characterize the measured variable and its uncertainty.
INFERENTIAL STATISTICS
With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data (the sample) to a population. Inferential statistics can be used to prove or disprove theories, determine associations between variables, and determine if findings are significant and whether or not we can generalize from our sample to the entire population.

The types of inferential statistics we will go over:
• T-tests / ANOVA
• Correlation
• Chi-square
• Logistic Regression
Inferential Statistics
• Comparisons of ONE VARIABLE between 2 or more groups (UNIvariate analysis)
– T-tests (comparison of one variable between 2 groups)
– ANOVA (comparison of one variable between 3 or more groups)
• Relating how TWO VARIABLES vary together (BIvariate analysis)
– Correlation
– Chi-square
– Logistic Regression
T-Tests
One of the simplest inferential tests. Used to compare the average of ONE VARIABLE between two groups to see if there is a statistical difference. E.g., do 8th-grade boys and girls differ in math test scores? Or does the outcome measure differ from a control group?
The t-test assesses whether the means of two groups of data are statistically different from each other.
The two distributions are widely separated so their means are clearly different
The distributions overlap, so it is unclear whether the samples come from the same population
To judge the difference between two groups, we must consider the difference between their means relative to the spread or uncertainty of the group. The t-test does just this.
t\text{-value} = \dfrac{\text{signal}}{\text{noise}} = \dfrac{\text{Difference between means}}{\text{SE of the difference}}
T-Tests What does it mean to say that the averages for two groups are statistically different?
[Two pairs of distributions with the same difference between means: one pair with lower variability, one with higher variability.] In these 2 examples, the difference between the means is the same.
With smaller sample sizes, the standard error increases, as does the overlap between the two curves, so the t-value decreases:

SE = \dfrac{SD}{\sqrt{\text{sample size}}}

t = \dfrac{\text{Difference between means}}{\text{SE of the difference}}
T-Tests
What does it mean to say that the averages for two groups are statistically different? It depends on
- the difference between the means,
- the variability or dispersion of the data sets, and
- the sample size or degrees of freedom (since this affects the variability).

t\text{-value} = \dfrac{\text{signal}}{\text{noise}} = \dfrac{\text{difference between means}}{\text{variability}} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{SD_1^2}{n_1} + \dfrac{SD_2^2}{n_2}}}
Does the t-value represent a statistically significant difference? It depends on
- the significance level (usually set to at least 0.05, which means that the risk of concluding that a difference exists when there is no actual difference is less than 5%; p < 0.05 is defined as statistically significant and not due to chance), and
- the df or sample size (df = n_total − 2 = combined sample size − 2).

You can look in a standard table of significance to see if the t-value is large enough to say that the difference between the groups is not likely to have been due to chance (p < 0.05).
T-Test: Are the averages of 2 sets of data, x1 and x2, statistically different?
2 types of t-tests
1. Paired t-tests: used to compare the MEANS of a continuous variable in two non-independent or related samples. Examples:
a) measurements on the same people before and after a treatment
b) Is diet X effective in lowering serum cholesterol levels in a sample of 12 people?
c) Do patients who receive drug X have lower blood pressure after treatment than they did before treatment?
2. Independent-samples t-tests: used to compare the MEANS of a continuous variable in two independent samples. Examples:
a) measurements from two different groups of people
b) Do people with diabetes have the same systolic blood pressure as people without diabetes?
c) Do patients who receive a new drug treatment have lower blood pressure than those who receive a placebo?
Tip: if you have more than 2 different groups, you use ANOVA, which compares the means of 3 or more groups.
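A minimal Python sketch (assuming SciPy; the blood-pressure numbers are made up purely for illustration) contrasting the two types of t-test:

```python
# Paired vs. independent-samples t-tests on illustrative blood-pressure data.
from scipy.stats import ttest_rel, ttest_ind

# Paired: the same 6 patients measured before and after a treatment
before = [140, 150, 135, 160, 145, 155]
after  = [132, 145, 130, 150, 140, 148]
t_paired, p_paired = ttest_rel(before, after)

# Independent: two different groups of people (drug vs. placebo)
drug    = [128, 135, 130, 141, 126, 133]
placebo = [140, 138, 145, 136, 142, 139]
t_indep, p_indep = ttest_ind(drug, placebo)

print(t_paired, p_paired)
print(t_indep, p_indep)
```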
EXAMPLE: Sam Sleepresearcher hypothesizes that people who sleep for only 4 hours will score significantly lower than people who sleep for 8 hours on a cognitive skills test. He brings 16 participants into his sleep lab and randomly assigns them to one of two groups. Group A sleeps 8 hrs, group B sleeps 4 hrs. The next morning he administers the SCAT to all participants. (Scores on SCAT range from 1-9 with high scores representing better performance).
A scores   B scores
5          8
7          1
5          4
3          6
5          6
3          4
3          1
9          2
Avr = 5    Avr = 4
sd = 2.14  sd = 2.56

t = \dfrac{\bar{A} - \bar{B}}{\sqrt{\dfrac{sd_A^2}{n_A} + \dfrac{sd_B^2}{n_B}}} = \dfrac{5 - 4}{\sqrt{\dfrac{2.14^2}{8} + \dfrac{2.56^2}{8}}} = 0.847

According to the t-table, with df = 14, t must be at least 2.145 to reach p < 0.05, so this difference is not statistically significant.
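A minimal Python sketch (assuming SciPy; not part of the original slides) reproducing the t-value above from the summary statistics and checking the critical value for df = 14:

```python
# Reproduce the worked example: t from summary statistics, t from raw scores, and the critical t.
import math
from scipy.stats import t as t_dist, ttest_ind

A = [5, 7, 5, 3, 5, 3, 3, 9]
B = [8, 1, 4, 6, 6, 4, 1, 2]

# From the summary statistics (the form shown on the slide)
mean_A, mean_B, sd_A, sd_B, n = 5, 4, 2.14, 2.56, 8
t_value = (mean_A - mean_B) / math.sqrt(sd_A**2 / n + sd_B**2 / n)
print(t_value)                         # ≈ 0.847

# Same comparison from the raw scores
print(ttest_ind(A, B))                 # t ≈ 0.85, p ≈ 0.41 (not significant)

# Critical t for p < 0.05 (two-tailed) with df = 14
print(t_dist.ppf(0.975, df=14))        # ≈ 2.145
```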
What does no significant difference mean?
If a t-test indicates no significant difference between 2 groups, that does NOT prove that there is no difference. YOU CANNOT PROVE A NEGATIVE RESULT. You can only say that the probability or confidence level is less than a particular p-value.
If there is no statistically significant difference between groups, it could indicate that either
- the difference is very small and not enough data was collected to see it (a larger sample size is needed), or
- there really is no difference.
A low p-value indicates that a difference as large as the one observed would be very unlikely if there were no real difference between the groups.
Research reported in March 2012 found evidence for the existence of the Higgs boson. However, the evidence for the existence of the particle was not statistically significant. Did the researchers conclude that their investigation had been a failure, or did they conclude that they had evidence of the particle, just not strong enough evidence to draw a confident conclusion?
How do you report t-test results?
“As can be seen in Figure 1, specialty candidates had significantly higher scores on questions dealing with treatment than residency candidates (t = [insert t-value from stats output], p < .001).”
“As can be seen in Figure 1, children’s mean reading performance was significantly higher on the post-tests in all four grades (t = [insert from stats output], p < .05).”
SUMMARY THUS FAR …
INDEPENDENT-SAMPLES t-TEST (UNPAIRED)
Used to compare means of two independent samples
PAIRED t-TEST (MATCHED PAIR)
Used to compare two (repeated) measures from the same subjects
T-tests
• What does a t-test tell you?
Whether there is a statistically significant difference (p < 0.05) between the means of two variables or groups. The 2 groups or variables being compared can be either independent OR related (i.e. before and after, control and experimental).
• What do the results look like? Student’s t
• How do you interpret it?
By looking at the p-value that corresponds to the t-value:
– p < 0.05 means the 2 groups are significantly different from each other
– p > 0.05 means the difference between the groups is NOT statistically significant. It does NOT mean the 2 groups are statistically the same (either the difference is small and more measurements are required to detect it, or there is no difference between the groups).
Comparisons of one variable between THREE OR MORE samples – ANOVA
• Cannot use t-test (only for 2 samples)
• Use ANalysis Of VAriance (ANOVA)
• Essentially, ANOVA involves dividing the variance in the groups into:
– Variance BETWEEN groups (variance of averages)
– Variance WITHIN groups
The greater F, the more statistically significant the difference is (critical values of F are found in standard tables).

F = \dfrac{\text{Measure of BETWEEN-groups variance}}{\text{Measure of WITHIN-groups variance}}
ANOVA
[Three group distributions.] Here, the BETWEEN-group variance is large relative to the WITHIN-group variance, so F will be LARGE.
[Three group distributions.] Here, the WITHIN-group variance is larger and the BETWEEN-group variance smaller, so F will be smaller (reflecting the likelihood of no significant differences between these three sample means).
F = \dfrac{\text{BETWEEN-groups variance}}{\text{WITHIN-groups variance}}

F = 0 if the group means are identical, and F > 0 if they are not; but F could be > 0 by chance. So how large must F be to be statistically significant?
ANOVA Significance
How large an F-value represents a statistically significant difference? It depends on
- the significance level (usually 0.05, which means that the risk of concluding that a difference exists when there is no actual difference is less than 5%; this is defined as statistically significant and not due to chance), and
- the df or sample size (degrees of freedom).

You can look in a table of significance to see if the F-value is large enough to say that the difference between the groups is significant and not likely to have been due to chance (p < 0.05).
ANOVA – AN EXAMPLE
• Focus group where 68 participants were asked to rate DVD players
• The table shows scores for ‘Total DVD assessment’ by 6 different age groups
• Data from the SPSS sample data file ‘dvdplayer.sav’; results from running ‘One Way ANOVA’

Age Group    n    Mean   SD
18-24       13    31.9   5.0
25-31       12    31.1   5.7
32-38       10    35.8   5.3
39-45       10    38.0   6.6
46-52       12    29.3   6.0
53-59       11    28.5   5.3
Total       68    32.2   6.4
BETWEEN-group variance: the 6 group means are treated as a dataset with its own variance. The BETWEEN variance is related to
- SS = \sum (x_i - \bar{x})^2, where the x_i are the group means
- df = n_groups − 1 = 5 in this case

WITHIN-group variance: the WITHIN variance is related to
- SS = \sum (x_i - \bar{x})^2, summed within each group
- df = n_total − n_groups = 68 − 6 = 62 in this case
ANOVA Example

Source                             Sum of Squares   df                  Variance (SS/df)   F      Sig. (p <)
BETWEEN Groups (6 groups)          733.27           5  (n_groups − 1)   146.65             4.60   0.0012
WITHIN Groups (68 participants)    1976.42          62                  31.88
Total                              2709.69          67

• ‘BETWEEN Groups’ Sum of Squares is related to the variance (or variability) of the average values of the groups. Thus we treat the collection of the 6 group means as data.
• ‘WITHIN Groups’ Sum of Squares concerns the variance within the groups.
• The degrees of freedom (df) represent the number of independent data points required to define each value calculated.

\text{Var} = \dfrac{\sum (x_i - \bar{x})^2}{df}, \qquad F = \dfrac{\text{Var}_{BETWEEN}}{\text{Var}_{WITHIN}} = \dfrac{146.65}{31.88} = 4.60
Does the F-value represent a statistically significant difference? It depends on the significance level and the sample size (df).
• This would be reported as follows:
Mean scores of total DVD assessment varied significantly between age groups (F(5,62)=4.60, p=0.0012)
• Have to include the Between Groups and Within Groups degrees of freedom because these determine the significance of F
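A minimal Python sketch (assuming SciPy; not from the slides) reconstructing the F statistic from the per-group n, mean and SD in the table above. Because the published SDs are rounded, the result only approximately matches F = 4.60, p = 0.0012:

```python
# One-way ANOVA F statistic computed from group summary statistics.
from scipy.stats import f as f_dist

groups = [  # (n, mean, SD) for the six age groups
    (13, 31.9, 5.0), (12, 31.1, 5.7), (10, 35.8, 5.3),
    (10, 38.0, 6.6), (12, 29.3, 6.0), (11, 28.5, 5.3),
]
n_total = sum(n for n, _, _ in groups)
grand_mean = sum(n * m for n, m, _ in groups) / n_total

ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
ss_within  = sum((n - 1) * sd ** 2 for n, _, sd in groups)
df_between = len(groups) - 1              # 5
df_within  = n_total - len(groups)        # 62

F = (ss_between / df_between) / (ss_within / df_within)
p = f_dist.sf(F, df_between, df_within)   # upper-tail probability of F
print(ss_between, ss_within, F, p)        # ≈ 733, ≈ 1979, F ≈ 4.6, p ≈ 0.001
```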
T-test and ANOVA Assumptions
There are basic assumptions used in t-tests and ANOVA:
1. The variances within each group are equal to each other
2. The data are normally distributed
Inferential Statistics: Correlation and Regression
T-tests compare one variable between 2 groups, e.g. comparison of BP in females and males, or comparison of BP before and after a drug.
Correlation and regression analysis is used to see how two variables vary together, e.g. how are the variables BP and heart rate related?
Correlation
One of the most common and most useful statistics. A correlation is a single number that describes the degree of relationship between two variables. It is NOT related to cause and effect.
• When to use it? When you want to know about the association or relationship between two continuous variables, e.g. the relationship between food intake and weight; drug dosage and blood pressure; air temperature and metabolic rate.
• What does it tell you? If a linear relationship exists between two variables, and how strong that relationship is.
• What do the results look like?
– The correlation coefficient = Pearson’s r (you can use the CORREL() function in Excel)
– Ranges from -1 to +1
Correlation
Guide for interpreting the strength of correlations:
r = 0 – 0.25: little / no relationship
r = 0.25 – 0.50: fair relationship
r = 0.50 – 0.75: moderate relationship
r = 0.75 – 1.0: strong relationship
r = 1.0: perfect correlation
Correlation
• How do you interpret it?
a) If r is positive, high values of one variable are associated with high values of the other variable (both go in the SAME direction: ↑↑ or ↓↓). E.g. diastolic blood pressure tends to rise with age, thus the two variables are positively correlated.
b) If r is negative, low values of one variable are associated with high values of the other variable (the variables change in opposite directions: ↑↓ or ↓↑). E.g. heart rate tends to be lower in persons who exercise frequently; the two variables correlate negatively.
c) r = 0: a correlation of 0 indicates NO linear relationship.
• How do you report it? “Diastolic blood pressure was positively correlated with age (r = .75, p < .05).”

Tip: Correlation does NOT equal causation! If two variables are highly correlated, this does NOT mean that one CAUSES the other. Conversely, NO correlation (r = 0) does NOT mean there is no relationship at all (the relationship may be non-linear).
Correlation Example
Let's assume that we want to look at the relationship between two variables, ice cream sales and temperature.
Ice Cream Sales vs Temperature for 12 days

Temperature °C   Ice Cream Sales
14.2°            $215
16.4°            $325
11.9°            $185
15.2°            $332
18.5°            $406
22.1°            $522
19.4°            $412
25.1°            $614
23.4°            $544
18.1°            $421
22.6°            $445
17.2°            $408
We can easily see that warmer weather goes with more sales; the relationship is good but not perfect. r = 0.9575.
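A minimal Python sketch (assuming SciPy; not part of the original slides) computing r for the ice cream data above; Excel's CORREL() gives the same coefficient:

```python
# Pearson correlation between temperature and ice cream sales for the 12 days.
from scipy.stats import pearsonr

temperature = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales       = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]

r, p = pearsonr(temperature, sales)
print(r, p)    # r ≈ 0.9575, p well below 0.05
```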
Correlation does NOT mean causation
Correlation Significance
How statistically significant is the correlation value? It depends on
- the significance level (usually 0.05, which is the probability that the correlation occurred by chance; p < 0.05 is defined as statistically significant and not due to chance), and
- the df or sample size (degrees of freedom).

You can look in a table of r values to see if r is large enough to say that the correlation is significant and not likely to have been due to chance (p < 0.05).
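Instead of a table lookup, the p-value for r can be computed directly. Here is a minimal Python sketch using the standard conversion of r to a t-value with df = n − 2 (a textbook formula, not from the slides):

```python
# Two-tailed p-value for a Pearson correlation coefficient r computed from n data points.
import math
from scipy.stats import t as t_dist

def correlation_p_value(r, n):
    t_value = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return 2 * t_dist.sf(abs(t_value), df=n - 2)

print(correlation_p_value(0.9575, 12))   # ice cream example: p far below 0.05
print(correlation_p_value(0.30, 12))     # a weak r with only 12 points is not significant
```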
Correlation Example
Correlation is not good at curves; it only works well for relationships that are linear. Here is the ice cream example again, but there has been a heat wave.
There is a clear peak at around 25 °C, but correlation analysis cannot “see” this non-linear relationship. A regression analysis of the scatter plot is needed.
r = 0, so there is no linear correlation between ice cream sales and temperature. That does NOT mean that there is no relationship between the variables.