Quantitative Methods brought to Life Biometris Data collection and Statistics Evert Jan Bakker and Gerrit Gort Biometris - Wageningen University

Page 1: Data collection and Statistics

Quantitative Methods brought to Life

Biometris

Data collection and Statistics

Evert Jan Bakker and Gerrit Gort

Biometris - Wageningen University

Page 2: Data collection and Statistics

Introduction: What is Statistics?

1. Probability calculus: theoretical and exact.

2. Descriptive statistics: just describes the data. All conclusions refer only to the sample, so the conclusions are ‘always correct’. Can already be convincing. Includes graphical representations of the data.

3. Inference (test of hypothesis, estimate, confidence interval): conclusions are drawn about a population (e.g. Wageningen students) or a general phenomenon (maize yield), using only data from a limited sample.

4. Experimental design / sampling design: randomisation, blocking, special designs, … / sample size.

Page 3: Data collection and Statistics

Inference

An experiment used for inference:

1. Question / hypothesis
2. Design of the experiment (statistics)
3. Carry out the experiment
4. Analysis of the experimental data (statistics)

For standard designs, the data analysis follows a fixed calculation pattern, which is known before the experiment is done.

Page 4: Data collection and Statistics

2 types of research aims

1. Exploration: generate new ideas. Measure many response variables; report any fact of interest (relationships, differences), using “any” descriptive analysis.

2. Inference (test / confidence interval): drawing conclusions about a population or a general phenomenon based on sample data. Inference has to be done according to the rules, so as not to ‘Lie with Statistics’. The model of analysis should be reasonable.

Page 5: Data collection and Statistics

Qualitative vs Quantitative data

“green”

Page 6: Data collection and Statistics

Data collection

Primary data collection:

for observational research: sampling (how? how many?)

for experimental research: design of the experiment (choice of experimental units, randomisation, measurement of response(s), number of replications)

In case secondary data are used: know how the data were obtained (meta-data). Otherwise the conclusion will be about an unknown population.

Sampling: random, stratification, subsampling, ... A conclusion can be drawn about the population from which a random sample was taken.

Page 7: Data collection and Statistics

Design principles: brief overview

1. Repetition (n > 1)

required for more precision. 1-sample example: the st.dev of $\bar{y}$ is $\sigma/\sqrt{n}$

required to know the natural variation. 2-sample example: $\bar{y}_1 - \bar{y}_2$ must be compared with the natural variation, which is impossible without repetition

2. Random drawings / random allocation of treatments

no bias (systematic error)

introduction of chance in the system
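Random allocation of treatments can be sketched in a few lines of Python. This is only an illustration of a completely randomised design with equal replication; the function name `allocate_treatments`, the unit labels and the seed are assumptions made for the example, not part of the original slides.

```python
import random

def allocate_treatments(units, treatments, seed=None):
    """Randomly allocate treatments to units, with equal replication."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    reps = len(units) // len(treatments)
    # One label per unit: each treatment appears `reps` times.
    labels = [t for t in treatments for _ in range(reps)]
    rng = random.Random(seed)
    rng.shuffle(labels)  # chance, not the researcher, decides which unit gets what
    return dict(zip(units, labels))

# Hypothetical example: 8 plots, 2 treatments, 4 replications each.
plots = [f"plot{i}" for i in range(1, 9)]
plan = allocate_treatments(plots, ["control", "treated"], seed=42)
for unit, treatment in plan.items():
    print(unit, treatment)
```

Fixing the seed makes the plan reproducible for the lab notebook; a different seed gives a different, equally valid allocation.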

Page 8: Data collection and Statistics

Design principles: brief overview (2)

3. Increase homogeneity: all experimental units are as similar, and kept in as similar conditions, as possible, except for the conditions influenced by the treatment.

4. Measure other variables that may influence the response; in the analysis these are used as covariates.

5. In case of known other possible sources of variation: blocking. Create homogeneous groups (blocks). In the analysis, block effects can be corrected for.

Without blocking / covariates: Total variation = Treatment effect + Error

With blocking / covariates: Total variation = Treatment effect + Block/covariate effect + Error
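The split of the total variation can be illustrated with sums of squares for a small randomised block layout (one observation per block-treatment combination). The yields below are invented purely for illustration, and the function name `decompose` is hypothetical; this is a sketch of the standard balanced two-way decomposition, not the authors' own calculation.

```python
from statistics import mean

# Hypothetical yields: rows = blocks, columns = treatments A, B, C.
data = [
    [10.0, 12.0, 14.0],
    [13.0, 15.0, 17.5],
    [16.0, 18.5, 20.0],
]

def decompose(table):
    """Return (SS_total, SS_treatment, SS_block, SS_error) for a balanced layout."""
    n_blocks, n_treat = len(table), len(table[0])
    grand = mean(v for row in table for v in row)
    block_means = [mean(row) for row in table]
    treat_means = [mean(row[j] for row in table) for j in range(n_treat)]
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)
    # Residual after removing both the block mean and the treatment mean.
    ss_error = sum(
        (table[i][j] - block_means[i] - treat_means[j] + grand) ** 2
        for i in range(n_blocks)
        for j in range(n_treat)
    )
    return ss_total, ss_treat, ss_block, ss_error

ss_total, ss_treat, ss_block, ss_error = decompose(data)
print("total:", round(ss_total, 2))
print("treatment:", round(ss_treat, 2))
print("block:", round(ss_block, 2))
print("error with blocking:", round(ss_error, 2))
print("error if blocks are ignored:", round(ss_block + ss_error, 2))
```

If the blocks are ignored in the analysis, the block sum of squares ends up in the error term, which is why blocking on a known source of variation makes the treatment comparison more precise.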

Page 9: Data collection and Statistics

Lessons, also from personal experience

Own PhD experience: not believing the results led to an extra year of analyses!

Lesson: know your analysis in advance.

Real-life research experience in Mali: choice of experimental units.

Page 10: Data collection and Statistics

Cows observed in pasture land - example

During 10 days, 3 cows are observed, one per observer, during 8 hours, 12 times per hour, for 60 s each time. Measurement: amount of time spent walking (%) = y.

Result for walking (%) between 10 and 12 a.m.: 72 observations per cow; suppose the within-cow standard deviation is sE = 10.

Some cows walk more than others, e.g. the between-cow standard deviation of the mean time spent walking is sC = 4.

Suppose $\bar{y}$ = 20. What is the standard error?

Page 11: Data collection and Statistics

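Cows example

Model: y = C + E, where C = mean for a (random) cow and E = deviation = measurement - C.

$\mathrm{Var}(\bar{y}) = \mathrm{Var}(\bar{C}) + \mathrm{Var}(\bar{E}) = 4^2/3 + 10^2/72 = 5.33 + 1.39 = 6.72$

So, using 1 cow per observer: $\mathrm{se}(\bar{y}) = \sqrt{6.72} = 2.6$

If 2 cows per observer were used: $\mathrm{Var}(\bar{y}) = \mathrm{Var}(\bar{C}) + \mathrm{Var}(\bar{E}) = 4^2/6 + 10^2/72 = 2.67 + 1.39 = 4.06$

$\mathrm{se}(\bar{y}) = \sqrt{4.06} = 2.01$

If 4 cows per observer were used, ..... $\mathrm{se}(\bar{y}) = 1.65$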
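The standard errors quoted on this slide (2.6, 2.01 and 1.65) can be reproduced with a short calculation. The sketch below assumes that 72 is the total number of within-cow observations and that this total stays fixed when the observers spread their effort over more cows, as the divisor in the slide's first formula suggests; the function name `se_overall_mean` is hypothetical.

```python
from math import sqrt

def se_overall_mean(n_cows, n_obs_total, sd_between=4.0, sd_within=10.0):
    """Standard error of the overall mean under the model y = C + E.

    Var(mean) = sd_between^2 / n_cows + sd_within^2 / n_obs_total
    """
    return sqrt(sd_between**2 / n_cows + sd_within**2 / n_obs_total)

N_OBSERVERS = 3
N_OBS_TOTAL = 72  # assumption: total observation effort is held constant

results = {}
for cows_per_observer in (1, 2, 4):
    n_cows = N_OBSERVERS * cows_per_observer
    results[cows_per_observer] = round(se_overall_mean(n_cows, N_OBS_TOTAL), 2)
    print(cows_per_observer, "cow(s) per observer: se =", results[cows_per_observer])
```

The between-cow term dominates, so observing more cows reduces the standard error far more than taking extra observations on the same cows.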

Page 12: Data collection and Statistics

Cows example

Make sure to think about the sources of variation. Important sources need to be sampled often and independently.

The observations were pseudo-replications.

The many within-cow observations enabled us to have a very precise estimate of the mean walking % for each of the 3 cows, but not of the overall mean.

Experimental / sampling units: units to which a treatment is assigned / that were randomly sampled.

Measured units: units on which measurements are taken. Example: pens vs chickens in the pen.

Page 13: Data collection and Statistics

Sample size calculations: 2 treatments

2 hypothetical populations, one for each treatment. We call the population means μ1 and μ2.

Parameter of interest: Δ = μ1 - μ2

Samples: y1,1, …, y1,n1; y2,1, …, y2,n2

Model = assumptions: the data are outcomes of n1 and n2 independent drawings from N(μ1, σ1) and N(μ2, σ2).

Extra assumption: σ1 = σ2 = σ.

Page 14: Data collection and Statistics

3 (of many) possible realities

Δ= 0 (no difference)

Δ= Δ1 (large difference)

Δ= Δ2 (small difference)

Assumed: Normality and σ1= σ2

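[Figure: pairs of Normal curves with common σ for the control and treatment populations (means μC and μT), one panel for each reality: μC = μT (Δ = 0), Δ = Δ1 and Δ = Δ2]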

Page 15: Data collection and Statistics

Testing: reality vs. conclusion

Given a relevant Ha reality (a value for Δ) and given α (e.g. 0.05), the power of a planned experiment can be calculated.
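As a rough sketch of such a calculation, the power of a two-sided test for two samples of size n each can be approximated with the Normal distribution. This assumes σ is known (a z-test rather than a t-test), so it is slightly optimistic for small n; the function name `power_two_sample` is an assumption made for this example.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test with n per sample."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Size of the true difference in units of its standard error.
    shift = delta / (sigma * sqrt(2 / n))
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# If there is no difference, the "power" equals alpha.
print(round(power_two_sample(delta=0.0, sigma=1.0, n=16), 3))
# A difference of one standard deviation with 16 units per treatment.
print(round(power_two_sample(delta=1.0, sigma=1.0, n=16), 3))
```

Varying `delta`, `n` and `alpha` shows the trade-offs directly: power rises with a larger true difference, a larger sample and a less strict α.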

Page 16: Data collection and Statistics

Simulations to mimic the test result

Excel: simulations 2 samples.xls

one experiment with test is repeated 200 times

we assume that σ is approximately known

we can vary “reality” Δ = μ1 - μ2: let us assume that Δ is …. (so and so much), then see how frequently H0 is rejected (= power of the test)

we can vary the sample size n (= n1 = n2)

we can vary α

we can then simulate the power

(demonstration of simulation program)
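The same idea can be sketched outside Excel. The Python code below is not the authors' spreadsheet; it is a minimal stand-in that repeats one two-sample experiment 200 times, treats σ as known (so a z-test is used) and counts how often H0 is rejected. The function name `simulate_power` and the fixed seed are assumptions made for the example.

```python
import random
from statistics import NormalDist, mean

def simulate_power(delta, sigma, n, alpha=0.05, n_experiments=200, seed=1):
    """Fraction of simulated experiments in which H0: delta = 0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n) ** 0.5  # standard error of the difference of two means
    rejections = 0
    for _ in range(n_experiments):
        # "Reality": population means differ by delta, common sigma.
        y1 = [rng.gauss(delta, sigma) for _ in range(n)]
        y2 = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = (mean(y1) - mean(y2)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_experiments

print("no difference:     ", simulate_power(delta=0.0, sigma=1.0, n=16))
print("delta = 1 sigma:   ", simulate_power(delta=1.0, sigma=1.0, n=16))
print("delta = 1, n = 32: ", simulate_power(delta=1.0, sigma=1.0, n=32))
```

With only 200 repetitions the simulated power is itself noisy (a few percentage points), which is worth keeping in mind when comparing settings.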

Page 17: Data collection and Statistics

Formula for sample size: confidence interval

Confidence interval limits: $\bar{y}_1 - \bar{y}_2 \pm t_{\alpha/2}\, s \sqrt{2/n}$

Formula (n per sample) for a (1-α) C.I. with Error Margin ≤ M:

$n \ge \dfrac{2\, t_{\alpha/2}^{2}\, \sigma^{2}}{M^{2}}$, with $t_{\alpha/2} \approx 2.0$ to $2.2$

Precision criteria that have to be specified: 1-α = confidence level and M = maximum Error Margin.

Notes
1) σ has to be estimated.
2) If α = 0.05, z = 1.96.
3) If the outcome for n is small (< 10), change the t-value to the one with df = 2(n-1) and calculate again.
4) In testing, instead of M, we specify Δ, the minimum relevant difference, and β (= 1 - power).
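A small helper makes the sample size formula for the confidence interval concrete. It is a sketch under the slide's assumptions (two samples of equal size n, common σ); the function name `n_per_sample` and the example values σ = 10 and M = 5 are hypothetical. By default it uses the Normal quantile z, and a t-value such as 2.0 can be passed in instead, as the notes on this slide suggest for small n.

```python
from math import ceil
from statistics import NormalDist

def n_per_sample(sigma, margin, alpha=0.05, t_value=None):
    """Smallest n per sample with error margin <= margin: n >= 2 t^2 sigma^2 / M^2."""
    if t_value is None:
        # Large-sample approximation: use z (1.96 for alpha = 0.05).
        t_value = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(2 * t_value**2 * sigma**2 / margin**2)

# Hypothetical planning values: sigma guessed at 10, required margin 5.
print(n_per_sample(sigma=10, margin=5))               # using z = 1.96
print(n_per_sample(sigma=10, margin=5, t_value=2.0))  # using t = 2.0
```

Because n grows with the square of σ/M, halving the required margin quadruples the sample size.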

Page 18: Data collection and Statistics

2C. Power calculation with Russ Lenth

Lenth, R. V. (2006).  Java Applets for Power and Sample Size [Computer software].  Retrieved March 15, 2009, from http://www.stat.uiowa.edu/~rlenth/Power.

Example: estimate p = fraction of babies with constipation (< 0.2) with an Error Margin of at most 1%. Define y = 1 (yes) or 0 (no). Then Var(y) = σ² = p(1-p) < 0.2 × 0.8 = 0.16. Formula: n ≥ …
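The slide leaves the formula open. One common way to complete such an example, shown here only as an assumption and not as the authors' own calculation, is the one-sample Normal approximation n ≥ z² p(1-p) / M², evaluated at the upper bound p = 0.2. The function name `n_for_proportion` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_for_proportion(p_max, margin, alpha=0.05):
    """Sample size so that z * sqrt(p(1-p)/n) <= margin, using the bound p_max."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    variance_bound = p_max * (1 - p_max)  # 0.16 for p_max = 0.2
    return ceil(z**2 * variance_bound / margin**2)

n_needed = n_for_proportion(p_max=0.2, margin=0.01)
print(n_needed)  # about 6.1 thousand babies for a 1% margin
```

The very large n shows why a 1% margin on a proportion is expensive: binary observations carry little information per unit.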

Page 19: Data collection and Statistics

Conclusions

In design phase

Think about the relevant “sources of variation” (influential factors): which of them will you include in the design, and which of them will you keep constant? Block design? Split plot?

Measure conditions that vary (weather, ...).

Measure general conditions (even if they do not vary across treatments in your experiment).

Avoid / be aware of pseudo-replication: experimental unit vs. measured unit, sampling unit vs. measured unit.

Correct randomisation.

Page 20: Data collection and Statistics

Analysis

Conclusions from a statistical analysis are drawn in the context of a statistical model. The correctness and the relevance of the conclusion depend on the correctness and the relevance of the model.

Model = assumptions about the observations

Systematic part: how the mean value of the response depends on the factor levels / factor level combinations.

Random part: independence, Normality and equal variance (independence follows from correct randomisation).

Page 21: Data collection and Statistics

Conclusions

For sample size calculations, the researcher must

know beforehand which analysis she will perform with the collected data.

specify research goals in terms of precision requirements: minimum relevant difference Δ, power (0.8/0.9), α (5%).

know the error variation σ (guess: range/4).

Decide on sample sizes (Russ Lenth Power).

Measure and store quantitative data, when possible, not binary data.

Page 22: Data collection and Statistics

Conclusions

Enter data once in a database. Use programs to derive calculated variables or partial data sets on which you do an analysis.

In case of need, contact a statistician! ... beforehand.