Basic Statistics for Scientific Research
Statistics involves math and calculations of numbers. It also guides how numbers are chosen, interpreted, and compared to other numbers.
Consider the following three scenarios and the interpretations based upon the presented statistics. You will find that the numbers may be right, but the interpretation may be wrong. Try to identify a major flaw with each interpretation before we describe it.
1) A new advertisement for Ben and Jerry's ice cream introduced in late May of last year resulted in a 30% increase in ice cream sales for the following three months. Thus, the advertisement was effective.
A major flaw is that other things also affect ice cream consumption and these other variables were not controlled or accounted for. Ice cream consumption generally increases in the months of June, July, and August regardless of advertisements. This effect is called a history effect and leads people to interpret outcomes as the result of one variable when another variable (in this case, the passage of time) is actually responsible.
2) The more churches in a city, the more crime there is. Thus, churches lead to crime.
A major flaw is that other things also affect crime rates and these other variables were not controlled or accounted for. Both the number of churches and the crime rate can be explained by population size: in bigger cities, there are both more churches and more crime. This is an example of the third-variable problem: a third variable can cause both outcomes, yet people erroneously infer a causal relationship between the two primary variables rather than recognizing that a third variable drives both.
3) You measured the value of a variable repeatedly and found an average value of 7.0. Another lab measured an average value of 8.0. Do the two results agree?
NO MEASUREMENT IS EXACT because there are random errors in every measurement. Therefore, every measured value really represents a range of values (7 ± uncertainty). Without knowledge of the uncertainty in the average value, no comparison between the 2 results can be made. How close is close enough for agreement?
Statistics is not just mathematics and calculations of numbers. It is also a guide to interpreting those numbers.
VALIDITY OF A CONCLUSION
Almost every research problem investigates a relationship. There are basically 2 possible conclusions (and either could be wrong):
1. There is NO relationship (but the relationship could have been missed because it is weak or infrequent, or because not enough data was collected).
2. There is a relationship (but things could have been seen that were not really there).

Things that improve conclusion validity:
- Statistics (collect more information, use a larger sample size)
- Be aware of the assumptions made in the data analysis
- Make more precise, less noisy measurements
In most research, data analysis involves three major steps, done in roughly this order:
• Data Preparation - Logging, cleaning and organizing the data for analysis
• Descriptive Statistics – numbers used to summarize and describe data. Together with graphical analysis, they form the basis of almost every quantitative analysis of data. With descriptive stats, you are simply describing what the data shows
• Inferential Statistics - Testing hypotheses and models. Conclusions from inferential statistics extend beyond the immediate data (the sample) and try to infer more general conclusions (the population)
Types of Statistics/Analyses

Descriptive Statistics – describe the basic features of the data and simplify large amounts of data with single indicators:
– Means (central tendency)
– Standard deviation/variance (dispersion around the mean)
– Frequencies/percentages (distribution)
Questions answered: How many? How much? How variable? How uncertain? (e.g., BP, HR, BMI, IQ)

Inferential Statistics – go beyond the immediate data to make inferences about population phenomena: proving or disproving theories, associations between phenomena, whether the sample relates to the larger population (e.g., diet and health):
– Hypothesis testing
– Correlation
– Confidence intervals / t-tests
– Significance testing
– Prediction
Descriptive Statistics used to present quantitative descriptions in a manageable form. In a research study there may be lots of measures. Descriptive statistics helps simplify large amounts of data in a sensible way. Each descriptive statistic reduces lots of data into a simpler summary.
Examples of Descriptive Stats
- Batting average – summarizes how well a batter performs. The single number describes a large number of discrete events, but it doesn’t tell whether the hits were home runs or singles, or whether the batter was in a slump or on a streak.
- Grade Point Average (GPA) – a single number that describes the general performance of a student across a potentially wide range of course experiences, but it doesn’t tell whether the courses were difficult or easy, or in the major field or others.

Using a single indicator for a large set of observations can distort the original data or lose important detail.
Descriptive Statistics
Descriptive statistics can be used to summarize and describe a single variable at a time (UNIvariate analysis). There are 3 major characteristics of a single variable (found by repeating the measurement):
1. The distribution (frequencies, percentages)
2. The central tendency (mean, median, mode)
3. The dispersion or spread around the central tendency (standard deviation, range)
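As a concrete illustration, here is a minimal Python sketch (not from the slides; the ten values reuse the example measurements that appear later in the deck) computing these three characteristics for one variable:

```python
# Univariate summary of one repeatedly measured variable:
# distribution, central tendency, and dispersion.
from collections import Counter
import statistics

measurements = [9, 4, 7, 6, 10, 5, 5, 7, 8, 8]   # example repeated measurements

# 1. Distribution: frequencies and percentages
freq = Counter(measurements)
percentages = {value: 100 * count / len(measurements) for value, count in freq.items()}

# 2. Central tendency
mean = statistics.mean(measurements)      # 7.0
median = statistics.median(measurements)  # 7.0
mode = statistics.mode(measurements)      # most common value (first of the ties, Python 3.8+)

# 3. Dispersion around the central tendency
sd = statistics.stdev(measurements)       # sample SD, divides by n - 1
value_range = max(measurements) - min(measurements)

print(freq, percentages)
print(mean, median, mode, sd, value_range)
```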
Types of Variables

DISCRETE or QUALITATIVE
• Categorical or nominal: a variable that can take on one of a limited number of possible values, groups or categories; there is no intrinsic ordering to the categories. Examples: gender, hair color, blood type (A, B, AB, O), state a resident lives in.
• Ordinal: similar to categorical, but there is a clear ordering of the categories. Examples: economic class, educational experience, Likert scales or ratings.

CONTINUOUS or QUANTITATIVE
• Interval: similar to ordinal except the intervals between the values are equally spaced, so they can be measured along a continuum and given numerical values. Examples: annual income, temperature, reaction time, BP.
• Ratio: interval variables with the added condition that 0 indicates that there is none of that variable, so ratios between measurements are meaningful. Examples: height, mass, distance, weight, temperature in Kelvin only.
Distributions of Discrete Variables
Use frequencies (counts) and percentages. Examples of discrete variables: levels, types, groupings, yes/no, Drug A vs. Drug B.
Color     Frequency     %
Red          18        32.7
Brown        17        30.9
Yellow        7        12.7
Green         7        12.7
Orange        4         7.3
Blue          2         3.6
Total        55

Table 1. Frequencies in the Bag of M&M's
Different ways to display frequencies and percentages for M&M data:
[Bar chart and pie chart of the M&M color frequencies]
FREQUENCY DISTRIBUTIONS (good if there are more than 20 observations)
Distributions of Ordinal Level Data
Frequencies and percentages can also be computed for ordinal data (which is also discrete)
– Examples: Likert Scales (Strongly Disagree to Strongly Agree); High School/Some College/College Graduate/Graduate School
[Bar chart of response frequencies (0–60) for Strongly Agree, Agree, Disagree, Strongly Disagree]
Distribution of Interval/Ratio (Continuous) Data
We can compute frequencies and percentages for continuous (interval and ratio level) data as well
– Examples: Age, Temperature, Height, Weight, Many Clinical Serum Levels
Distribution of Injury Severity Score in a population of patients
NO MEASUREMENT IS EXACT. There is uncertainty in measured values due to random errors. Since random errors are, by nature, erratic, they are subject to the laws of probability or chance.
The uncertainty of a measurement can be reduced by repeating a measurement many times and taking the average (central tendency). The individual measurements will be scattered or distributed around the average.
The amount of spread or dispersion of individual measurements around the average value is a measure of the uncertainty of the measurement.
[Histograms of repeated measurements (62–68 cm), MEAN = 65.36 cm: two distributions with the same average value, one with a larger spread or uncertainty (less precise) and one with a smaller spread or uncertainty (more precise).]
The average spread of the multiple measurements around a mean value is an estimate of the uncertainty and is called the standard deviation, SD.
Experiment: what is the effect of release height on the horizontal landing distance? Here we characterize ONE VARIABLE – the distance (for one height).
STANDARD DEVIATION AND SAMPLE SIZE (degrees of freedom)
[Distributions of the measurements for n = 10, n = 50 and n = 150]
As the sample size n increases, the uncertainty in the estimated mean (the standard error, SD/√n) decreases, so the mean is known with less uncertainty.
The average spread of the multiple measurements around a mean value represents the uncertainty of the variable and is called the standard deviation, SD.
• 68% of all the measurements fall within 1 SD of the mean
• 95% of all the measurements fall within 2 SDs of the mean
• 99% of all the measurements fall within 3 SDs of the mean
NORMAL DISTRIBUTION of a Variable
In a normal distribution, points are distributed symmetrically around the mean. Many naturally occurring phenomena can be approximated surprisingly well by this distribution.
[Normal curve marked off in SD units from the mean: the mean is the central tendency, dispersion is measured by the SD, and the vertical axis is frequency or probability. Only 3 points in 1000 will fall outside 3 SD from the mean; the extreme tails are labelled p < 0.05.]
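A minimal Python sketch (assuming SciPy is available; not part of the original slides) confirming the 68% / 95% / 99% rule for a normally distributed variable:

```python
# Fraction of a normal distribution lying within 1, 2, and 3 SDs of the mean.
from scipy.stats import norm

for k in (1, 2, 3):
    fraction = norm.cdf(k) - norm.cdf(-k)   # area under the standard normal curve between -k and +k SDs
    print(f"within {k} SD: {fraction:.4f}")
# within 1 SD: 0.6827, within 2 SD: 0.9545, within 3 SD: 0.9973
# (0.9973 matches "only 3 points in 1000 fall outside 3 SD from the mean")
```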
Example: computing the SD of 10 repeated measurements

trial   measurement x   spread (x − x̄)   (x − x̄)²
1            9               2               4
2            4              −3               9
3            7               0               0
4            6              −1               1
5           10               3               9
6            5              −2               4
7            5              −2               4
8            7               0               0
9            8               1               1
10           8               1               1
mean = 7                     total = 33

SD = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{33}{9}} \approx 1.9   (about 27% of the mean)

mean = 7 ± 1.9 (SD)

The SD is also known as the square root of the variance, s².
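A minimal Python sketch (not part of the original slides) reproducing the calculation in the table above:

```python
# Sample standard deviation of the 10 example trials (mean = 7, sum of squared spreads = 33).
import math

x = [9, 4, 7, 6, 10, 5, 5, 7, 8, 8]
n = len(x)
mean = sum(x) / n                               # 7.0
ss = sum((xi - mean) ** 2 for xi in x)          # 33.0
sd = math.sqrt(ss / (n - 1))                    # sqrt(33/9) ≈ 1.91
print(mean, ss, sd, f"{100 * sd / mean:.0f}% of the mean")
```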
For the example above (mean = 7, SD = 1.9):
• 68% confidence that another measurement would be within one SD of the average value (between 5.1 and 8.9)
• 95% confidence that another measurement would be within two SDs of the average value (between 3.2 and 10.8)
• 99% confidence that another measurement would be within three SDs of the average value (between 1.3 and 12.7)
Example
measured average = 7 ± 0.3 (SD); accepted value = 8

Does your measurement agree with the accepted value?
NO, since the range within 2 SDs of the mean (95% confidence interval, 6.4 – 7.6) does not overlap with the accepted value. Therefore, there is a statistically significant difference between the two values.

What is the relative error of your measurement?
\text{Relative Error (\%)} = \dfrac{|\text{measurement} - \text{accepted}|}{\text{accepted}} \times 100 = \dfrac{|7 - 8|}{8} \times 100 = 12.5\%
Example
measured average = 7 ± 0.6 (SD); accepted value = 8

Does your measurement agree with the accepted value?
YES, since within 2 SDs of uncertainty your measured average agrees with the accepted value: you have 95% confidence that the true value is within two SDs of the measured average (between 5.8 and 8.2), which includes 8.

What is the relative error of your measurement?
\text{Relative Error (\%)} = \dfrac{|\text{measurement} - \text{accepted}|}{\text{accepted}} \times 100 = \dfrac{|7 - 8|}{8} \times 100 = 12.5\%
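A minimal Python sketch (hypothetical helper names, not from the slides) of the agreement check and relative-error calculation used in the two examples above:

```python
# Agreement within k SDs (k = 2 -> roughly 95% confidence) and relative error vs. an accepted value.
def agrees(measured, sd, accepted, k=2):
    """True if the accepted value lies within k SDs of the measured average."""
    return abs(measured - accepted) <= k * sd

def relative_error_percent(measured, accepted):
    return abs(measured - accepted) / accepted * 100

print(agrees(7.0, 0.3, 8.0))                 # False: 6.4-7.6 does not include 8
print(agrees(7.0, 0.6, 8.0))                 # True:  5.8-8.2 includes 8
print(relative_error_percent(7.0, 8.0))      # 12.5
```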
OTHER DISTRIBUTIONS of a single variable
• Normal or symmetric
• Right skewed
• Left skewed (rare)
• Bimodal
• Fat tailed
Distributions
The distribution of scores or values can also be displayed using box and whiskers plots and frequency histograms.
BOXPLOT (BOX AND WHISKER PLOT)
[Box and whisker plots of pain scores (VAS) for females and males (N = 27, 74). The box spans the inter-quartile range from the 25th to the 75th centile, with the median (50th centile) marked inside it; the whiskers extend to the 2.5th and 97.5th centiles.]
SKEWED DISTRIBUTION
[Skewed distribution with the MEAN and the MEDIAN marked. The median is the value with 50% of the observations lying on either side of it; in a skewed distribution the mean and median differ.]
DESCRIBING DATA – Central Tendency
MEAN – average or arithmetic mean of the data
MEDIAN – the value which comes half way when the data are ranked in order
MODE – the most common value observed
• In a normal distribution, mean and median are the same
• If the median and mean are different, this indicates that the data are not normally distributed
• The mode is of little if any practical use
DISTRIBUTIONS: EXAMPLES

NORMAL DISTRIBUTION
• Height
• Weight
• Haemoglobin
• IQ
• BP
• Test scores
• Speed on a highway at a given spot

SKEWED DISTRIBUTION
• Bankers’ bonuses (+)
• Number of marriages (+)
• Mileage on used cars for sale (+)
• Age at death in developed countries (−)
• Number of fingers (−)
HOW TO TELL IF A VARIABLE FOLLOWS A NORMAL DISTRIBUTION
• Important because parametric statistics assume normal distributions
• Statistics packages can test normality
• Distribution unlikely to be normal if:
– the mean is very different from the median
– two SDs below the mean give an impossible answer (e.g. height < 0 cm)
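A minimal Python sketch (assuming SciPy; the height data are simulated purely for illustration) of these informal checks plus the kind of formal normality test a statistics package provides:

```python
# Informal normality checks (mean vs. median, mean - 2 SD) and a Shapiro-Wilk test.
import numpy as np
from scipy.stats import shapiro

heights = np.random.default_rng(0).normal(loc=170, scale=10, size=100)  # simulated heights in cm

mean, median, sd = heights.mean(), np.median(heights), heights.std(ddof=1)
print("mean vs median:", mean, median)     # should be similar for a normal distribution
print("mean - 2*SD:", mean - 2 * sd)       # should not be an impossible value (e.g. height < 0 cm)

stat, p = shapiro(heights)                 # formal normality test
print("Shapiro-Wilk p =", p)               # p > 0.05: no evidence against normality
```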
Descriptive Statistics
Descriptive statistics can be used to summarize and describe a single variable at a time (UNIvariate analysis). There are 3 major characteristics of a single variable (repeatedly measured):
1. The distribution (frequencies, percentages)
2. The central tendency (mean, median, mode)
3. The dispersion or spread around the central tendency (standard deviation, variance, range). A large SD means that there is a lot of uncertainty in the mean and that the measurement is not precise.

By taking multiple measurements, you obtain a good estimate of these 3 properties, which fully characterize the measured variable and its uncertainty.
INFERENTIAL STATISTICS
With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data (the sample) to a population. Inferential statistics can be used to prove or disprove theories, determine associations between variables, and determine if findings are significant and whether or not we can generalize from our sample to the entire population.

The types of inferential statistics we will go over:
• T-tests / ANOVA
• Correlation
• Chi-square
• Logistic Regression
Inferential Statistics
• Comparisons of ONE VARIABLE between 2 or more groups (UNIvariate analysis)
– T-tests (comparison of one variable between 2 groups)
– ANOVA (comparison of one variable between 3 or more groups)
• Relating how TWO VARIABLES vary together (BIvariate analysis)
– Correlation
– Chi-square
– Logistic Regression
T-Tests
One of the simplest inferential tests. Used to compare the average of ONE VARIABLE between two groups to see if there is a statistical difference. E.g., do 8th-grade boys and girls differ in math test scores? Or does the outcome measure differ from a control group?
The t-test assesses whether the means of two groups of data are statistically different from each other.
The two distributions are widely separated so their means are clearly different
The distributions overlap, so it is unclear whether the samples come from the same population
To judge the difference between two groups, we must consider the difference between their means relative to the spread or uncertainty of the group. The t-test does just this.
t\text{-value} = \dfrac{\text{signal}}{\text{noise}} = \dfrac{\text{Difference between means}}{\text{SE of the difference}}
T-Tests What does it mean to say that the averages for two groups are statistically different?
[Two pairs of distributions with the same difference between means: one pair with lower variability, one with higher variability.] In these 2 examples, the difference between the means is the same.
With smaller sample sizes, the standard error increases, as does the overlap between the two curves, so the t-value decreases:

SE = \dfrac{SD}{\sqrt{\text{sample size}}}

t = \dfrac{\text{Difference between means}}{\text{SE of the difference}}
T-Tests
What does it mean to say that the averages for two groups are statistically different? It depends on
- the difference between the means,
- the variability or dispersion of the data sets, and
- the sample size or degrees of freedom (since this affects the variability).

t\text{-value} = \dfrac{\text{signal}}{\text{noise}} = \dfrac{\text{difference between means}}{\text{variability}} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{SD_1^2}{n_1} + \dfrac{SD_2^2}{n_2}}}
Does the t-value represent a statistically significant difference? It depends on
- the significance level (usually set to at least 0.05, which means that the risk of concluding that a difference exists when there is no actual difference is less than 5%; p < 0.05 is defined as statistically significant and not due to chance), and
- the df or sample size (df = n_total − 2 = combined sample size − 2).

You can look in a standard table of significance to see if the t-value is large enough to say that the difference between the groups is not likely to have been due to chance (p < 0.05).
T-Test: Are the averages of 2 sets of data, x1 and x2, statistically different?
2 types of t-tests
1. Paired t-tests: used to compare the MEANS of a continuous variable in two non-independent or related samples. Examples:
a) measurements on the same people before and after a treatment
b) Is diet X effective in lowering serum cholesterol levels in a sample of 12 people?
c) Do patients who receive drug X have lower blood pressure after treatment than they did before treatment?
2. Independent-samples t-tests: used to compare the MEANS of a continuous variable in two independent samples. Examples:
a) measurements from two different groups of people
b) Do people with diabetes have the same systolic blood pressure as people without diabetes?
c) Do patients who receive a new drug treatment have lower blood pressure than those who receive a placebo?
Tip: if you have more than 2 different groups, you use ANOVA, which compares the means of 3 or more groups.
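A minimal Python sketch (assuming SciPy; the blood-pressure numbers are made up purely for illustration) contrasting the two types of t-test:

```python
# Paired vs. independent-samples t-tests on illustrative blood-pressure data.
from scipy.stats import ttest_rel, ttest_ind

# Paired: the same 6 patients measured before and after a treatment
before = [140, 150, 135, 160, 145, 155]
after  = [132, 145, 130, 150, 140, 148]
t_paired, p_paired = ttest_rel(before, after)

# Independent: two different groups of people (drug vs. placebo)
drug    = [128, 135, 130, 141, 126, 133]
placebo = [140, 138, 145, 136, 142, 139]
t_indep, p_indep = ttest_ind(drug, placebo)

print(t_paired, p_paired)
print(t_indep, p_indep)
```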
EXAMPLE: Sam Sleepresearcher hypothesizes that people who sleep for only 4 hours will score significantly lower than people who sleep for 8 hours on a cognitive skills test. He brings 16 participants into his sleep lab and randomly assigns them to one of two groups. Group A sleeps 8 hrs, group B sleeps 4 hrs. The next morning he administers the SCAT to all participants. (Scores on SCAT range from 1-9 with high scores representing better performance).
A scores   B scores
5          8
7          1
5          4
3          6
5          6
3          4
3          1
9          2
Avr = 5    Avr = 4
sd = 2.14  sd = 2.56

t = \dfrac{\bar{A} - \bar{B}}{\sqrt{\dfrac{sd_A^2}{n_A} + \dfrac{sd_B^2}{n_B}}} = \dfrac{5 - 4}{\sqrt{\dfrac{2.14^2}{8} + \dfrac{2.56^2}{8}}} = 0.847

According to the t-table, with df = 14, t must be at least 2.145 to reach p < 0.05, so this difference is not statistically significant.
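A minimal Python sketch (assuming SciPy; not part of the original slides) reproducing the t-value above from the summary statistics and checking the critical value for df = 14:

```python
# Reproduce the worked example: t from summary statistics, t from raw scores, and the critical t.
import math
from scipy.stats import t as t_dist, ttest_ind

A = [5, 7, 5, 3, 5, 3, 3, 9]
B = [8, 1, 4, 6, 6, 4, 1, 2]

# From the summary statistics (the form shown on the slide)
mean_A, mean_B, sd_A, sd_B, n = 5, 4, 2.14, 2.56, 8
t_value = (mean_A - mean_B) / math.sqrt(sd_A**2 / n + sd_B**2 / n)
print(t_value)                         # ≈ 0.847

# Same comparison from the raw scores
print(ttest_ind(A, B))                 # t ≈ 0.85, p ≈ 0.41 (not significant)

# Critical t for p < 0.05 (two-tailed) with df = 14
print(t_dist.ppf(0.975, df=14))        # ≈ 2.145
```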
What does no significant difference mean?
If a t-test indicates no significant difference between 2 groups, that does NOT prove that there is no difference. YOU CANNOT PROVE A NEGATIVE RESULT. You can only say that the probability or confidence level is less than a particular p-value.
If there is no statistically significant difference between groups, it could indicate that either
- the difference is very small and not enough data was collected to see it (a larger sample size is needed), or
- there really is no difference.
A low p-value indicates that a difference as large as the one observed would be very unlikely if there were no real difference between the groups.
Research reported in March 2012 found evidence for the existence of the Higgs boson. However, the evidence for the existence of the particle was not statistically significant. Did the researchers conclude that their investigation had been a failure, or did they conclude that they had evidence of the particle, just not strong enough evidence to draw a confident conclusion?
How do you report t-test results?
“As can be seen in Figure 1, specialty candidates had significantly higher scores on questions dealing with treatment than residency candidates (t = [insert t-value from stats output], p < .001).”
“As can be seen in Figure 1, children’s mean reading performance was significantly higher on the post-tests in all four grades (t = [insert from stats output], p < .05).”
SUMMARY THUS FAR …
INDEPENDENT-SAMPLES t-TEST (UNPAIRED)
Used to compare means of two independent samples
PAIRED t-TEST (MATCHED PAIR)
Used to compare two (repeated) measures from the same subjects
T-tests
• What does a t-test tell you?
Whether there is a statistically significant difference (p < 0.05) between the means of two variables or groups. The 2 groups or variables being compared can be either independent OR related (i.e. before and after, control and experimental).
• What do the results look like? Student’s t
• How do you interpret it?
By looking at the p-value that corresponds to the t-value:
– p < 0.05 means the 2 groups are significantly different from each other
– p > 0.05 means the difference between the groups is NOT statistically significant. It does NOT mean the 2 groups are statistically the same (either the difference is small and more measurements are required to detect it, or there is no difference between the groups).
Comparisons of one variable between THREE OR MORE samples – ANOVA
• Cannot use t-test (only for 2 samples)
• Use ANalysis Of VAriance (ANOVA)
• Essentially, ANOVA involves dividing the variance in the groups into:
– Variance BETWEEN groups (variance of averages)
– Variance WITHIN groups
The greater F, the more statistically significant the difference is (critical values of F are found in standard tables).

F = \dfrac{\text{Measure of BETWEEN-groups variance}}{\text{Measure of WITHIN-groups variance}}
ANOVA
[Three group distributions.] Here, the BETWEEN-group variance is large relative to the WITHIN-group variance, so F will be LARGE.
[Three group distributions.] Here, the WITHIN-group variance is larger and the BETWEEN-group variance smaller, so F will be smaller (reflecting the likelihood of no significant differences between these three sample means).
F = \dfrac{\text{BETWEEN-groups variance}}{\text{WITHIN-groups variance}}

F = 0 if the group means are identical, and F > 0 if they are not; but F could be > 0 by chance. So how large must F be to be statistically significant?
ANOVA Significance
How large an F-value represents a statistically significant difference? It depends on
- the significance level (usually 0.05, which means that the risk of concluding that a difference exists when there is no actual difference is less than 5%; this is defined as statistically significant and not due to chance), and
- the df or sample size (degrees of freedom).

You can look in a table of significance to see if the F-value is large enough to say that the difference between the groups is significant and not likely to have been due to chance (p < 0.05).
ANOVA – AN EXAMPLE
• Focus group where 68 participants were asked to rate DVD players
• The table shows scores for ‘Total DVD assessment’ by 6 different age groups
• Data from the SPSS sample data file ‘dvdplayer.sav’; results from running ‘One Way ANOVA’

Age Group    n    Mean   SD
18-24       13    31.9   5.0
25-31       12    31.1   5.7
32-38       10    35.8   5.3
39-45       10    38.0   6.6
46-52       12    29.3   6.0
53-59       11    28.5   5.3
Total       68    32.2   6.4
BETWEEN-group variance: the 6 group means are treated as a dataset with its own variance. The BETWEEN variance is related to
- SS = \sum (x_i - \bar{x})^2, where the x_i are the group means
- df = n_groups − 1 = 5 in this case

WITHIN-group variance: the WITHIN variance is related to
- SS = \sum (x_i - \bar{x})^2, summed within each group
- df = n_total − n_groups = 68 − 6 = 62 in this case
ANOVA Example

Source                             Sum of Squares   df                  Variance (SS/df)   F      Sig. (p <)
BETWEEN Groups (6 groups)          733.27           5  (n_groups − 1)   146.65             4.60   0.0012
WITHIN Groups (68 participants)    1976.42          62                  31.88
Total                              2709.69          67

• ‘BETWEEN Groups’ Sum of Squares is related to the variance (or variability) of the average values of the groups. Thus we treat the collection of the 6 group means as data.
• ‘WITHIN Groups’ Sum of Squares concerns the variance within the groups.
• The degrees of freedom (df) represent the number of independent data points required to define each value calculated.

\text{Var} = \dfrac{\sum (x_i - \bar{x})^2}{df}, \qquad F = \dfrac{\text{Var}_{BETWEEN}}{\text{Var}_{WITHIN}} = \dfrac{146.65}{31.88} = 4.60
Does the F-value represent a statistically significant difference? It depends on the significance level and the sample size (df).
• This would be reported as follows:
Mean scores of total DVD assessment varied significantly between age groups (F(5,62)=4.60, p=0.0012)
• Have to include the Between Groups and Within Groups degrees of freedom because these determine the significance of F
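A minimal Python sketch (assuming SciPy; not from the slides) reconstructing the F statistic from the per-group n, mean and SD in the table above. Because the published SDs are rounded, the result only approximately matches F = 4.60, p = 0.0012:

```python
# One-way ANOVA F statistic computed from group summary statistics.
from scipy.stats import f as f_dist

groups = [  # (n, mean, SD) for the six age groups
    (13, 31.9, 5.0), (12, 31.1, 5.7), (10, 35.8, 5.3),
    (10, 38.0, 6.6), (12, 29.3, 6.0), (11, 28.5, 5.3),
]
n_total = sum(n for n, _, _ in groups)
grand_mean = sum(n * m for n, m, _ in groups) / n_total

ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
ss_within  = sum((n - 1) * sd ** 2 for n, _, sd in groups)
df_between = len(groups) - 1              # 5
df_within  = n_total - len(groups)        # 62

F = (ss_between / df_between) / (ss_within / df_within)
p = f_dist.sf(F, df_between, df_within)   # upper-tail probability of F
print(ss_between, ss_within, F, p)        # ≈ 733, ≈ 1979, F ≈ 4.6, p ≈ 0.001
```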
T-test and ANOVA Assumptions
There are basic assumptions used in t-tests and ANOVA:
1. The variances within each group are equal to each other
2. The data are normally distributed
Inferential Statistics: Correlation and Regression
T-tests compare one variable between 2 groups, e.g. comparison of BP in females and males, or comparison of BP before and after a drug.
Correlation and regression analysis is used to see how two variables vary together, e.g. how are the variables BP and heart rate related?
Correlation
One of the most common and most useful statistics. A correlation is a single number that describes the degree of relationship between two variables. It is NOT related to cause and effect.
• When to use it? When you want to know about the association or relationship between two continuous variables, e.g. the relationship between food intake and weight; drug dosage and blood pressure; air temperature and metabolic rate.
• What does it tell you? If a linear relationship exists between two variables, and how strong that relationship is.
• What do the results look like?
– The correlation coefficient = Pearson’s r (you can use the CORREL() function in Excel)
– Ranges from -1 to +1
Correlation
Guide for interpreting the strength of correlations:
r = 0 – 0.25: little / no relationship
r = 0.25 – 0.50: fair relationship
r = 0.50 – 0.75: moderate relationship
r = 0.75 – 1.0: strong relationship
r = 1.0: perfect correlation
Correlation
• How do you interpret it?
a) If r is positive, high values of one variable are associated with high values of the other variable (both go in the SAME direction: ↑↑ or ↓↓). E.g. diastolic blood pressure tends to rise with age, thus the two variables are positively correlated.
b) If r is negative, low values of one variable are associated with high values of the other variable (the variables change in opposite directions: ↑↓ or ↓↑). E.g. heart rate tends to be lower in persons who exercise frequently; the two variables correlate negatively.
c) r = 0: a correlation of 0 indicates NO linear relationship.
• How do you report it? “Diastolic blood pressure was positively correlated with age (r = .75, p < .05).”

Tip: Correlation does NOT equal causation! If two variables are highly correlated, this does NOT mean that one CAUSES the other. Conversely, NO correlation (r = 0) does NOT mean there is no relationship at all (the relationship may be non-linear).
Correlation Example
Let's assume that we want to look at the relationship between two variables, ice cream sales and temperature.
Ice Cream Sales vs Temperature for 12 days

Temperature °C   Ice Cream Sales
14.2°            $215
16.4°            $325
11.9°            $185
15.2°            $332
18.5°            $406
22.1°            $522
19.4°            $412
25.1°            $614
23.4°            $544
18.1°            $421
22.6°            $445
17.2°            $408
We can easily see that warmer weather goes with more sales; the relationship is good but not perfect. r = 0.9575.
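A minimal Python sketch (assuming SciPy; not part of the original slides) computing r for the ice cream data above; Excel's CORREL() gives the same coefficient:

```python
# Pearson correlation between temperature and ice cream sales for the 12 days.
from scipy.stats import pearsonr

temperature = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales       = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]

r, p = pearsonr(temperature, sales)
print(r, p)    # r ≈ 0.9575, p well below 0.05
```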
Correlation does NOT mean causation
Correlation Significance
How statistically significant is the correlation value? It depends on
- the significance level (usually 0.05, which is the probability that the correlation occurred by chance; p < 0.05 is defined as statistically significant and not due to chance), and
- the df or sample size (degrees of freedom).

You can look in a table of r values to see if r is large enough to say that the correlation is significant and not likely to have been due to chance (p < 0.05).
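Instead of a table lookup, the p-value for r can be computed directly. Here is a minimal Python sketch using the standard conversion of r to a t-value with df = n − 2 (a textbook formula, not from the slides):

```python
# Two-tailed p-value for a Pearson correlation coefficient r computed from n data points.
import math
from scipy.stats import t as t_dist

def correlation_p_value(r, n):
    t_value = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return 2 * t_dist.sf(abs(t_value), df=n - 2)

print(correlation_p_value(0.9575, 12))   # ice cream example: p far below 0.05
print(correlation_p_value(0.30, 12))     # a weak r with only 12 points is not significant
```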
Correlation Example
Correlation is not good at curves; it only works well for relationships that are linear. Here is the ice cream example again, but there has been a heat wave.
There is a clear peak at around 25 °C, but correlation analysis cannot “see” this non-linear relationship. A regression analysis of the scatter plot is needed.
r = 0, so there is no linear correlation between ice cream sales and temperature. That does NOT mean that there is no relationship between the variables.