Quantitative Methods brought to Life
Biometris
Data collection and Statistics
Evert Jan Bakker and Gerrit Gort
Biometris - Wageningen University
Introduction: What is Statistics?
1. Probability calculus: theoretical and exact.
2. Descriptive statistics: just describes the data. All conclusions refer only to the sample, so they are 'always correct'. Can already be convincing. Includes graphical representations of the data.
3. Inference (test of hypothesis, estimate with confidence interval): conclusions are drawn about a population (e.g. Wageningen students) or a general phenomenon (e.g. maize yield), using only data from a limited sample.
4. Experimental design / sampling design: randomisation, blocking, special designs, sample size.
Inference
An experiment used for inference:
1. Question / hypothesis
2. Design of the experiment (statistics)
3. Carry out the experiment
4. Analysis of the experimental data (statistics)
For standard designs, the data analysis follows a fixed calculation pattern, which is known before the experiment is done.
2 types of research aims
1. Exploration: generate new ideas. Measure many response variables; report any fact of interest / relationship / difference, using "any" descriptive analysis.
2. Inference (test / confidence interval): drawing conclusions about a population or a general phenomenon based on sample data. Inference has to be done according to the rules, so as not to 'Lie with Statistics'. The model of analysis should be reasonable.
Qualitative vs Quantitative data
Example of a qualitative value: "green"
Data collection
Primary data collection:
- for observational research: sampling (how? how many?)
- for experimental research: design of the experiment (choice of experimental units, randomisation, measurement of response(s), number of replications)
Secondary data: know how the data were obtained (meta-data). Otherwise the conclusion will be about an unknown population.
Sampling: random, stratified, subsampling, ... Conclusions can be drawn about the population from which a random sample was taken.
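The difference between a simple random sample and a stratified sample can be sketched in a few lines of Python. This is only an illustration: the sampling frame, the stratum names and the sample sizes are hypothetical and do not come from the slides.

```python
import random


def simple_random_sample(units, n, seed=0):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(units, n)


def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw a simple random sample of fixed size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}


# Hypothetical sampling frame: 40 student IDs in two study programmes.
frame = {"BSc": list(range(0, 25)), "MSc": list(range(25, 40))}
all_units = frame["BSc"] + frame["MSc"]

srs = simple_random_sample(all_units, 8)   # strata may be unevenly represented
strat = stratified_sample(frame, 4)        # exactly 4 units from each stratum
```

In the stratified version each stratum is guaranteed to be represented, whereas the simple random sample leaves the split between strata to chance.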
Design principles: brief overview
1. Repetition (n > 1)
- required for more precision. 1-sample example: the standard deviation of $\bar{y}$ is $\sigma/\sqrt{n}$
- required to know the natural variation. 2-sample example: $\bar{y}_1 - \bar{y}_2$ must be compared with the natural variation, which is impossible without repetition
2. Random drawings / random allocation of treatments
- no bias (systematic error)
- introduction of chance in the system
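Both principles can be illustrated with a short Python sketch. It checks by simulation that the standard deviation of a sample mean is close to σ/√n, and it allocates treatments to units at random in equal numbers. The plot names, treatment labels and all numeric values are hypothetical.

```python
import math
import random
import statistics


def se_of_mean(sigma, n):
    """Theoretical standard deviation of the mean of n independent draws."""
    return sigma / math.sqrt(n)


def simulated_sd_of_mean(sigma, n, n_draws=2000, seed=0):
    """Empirical standard deviation of many simulated sample means."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
             for _ in range(n_draws)]
    return statistics.stdev(means)


def randomise(units, treatments, seed=0):
    """Shuffle the units, then deal the treatments out in equal numbers."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    reps = len(units) // len(treatments)
    return {unit: treatments[i // reps] for i, unit in enumerate(shuffled)}


# Hypothetical experiment: 12 plots, 3 treatments, 4 replications each.
plots = [f"plot{i}" for i in range(1, 13)]
allocation = randomise(plots, ["A", "B", "C"])
```

With σ = 10 and n = 16 the theoretical value is 2.5, and `simulated_sd_of_mean(10, 16)` should land close to it.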
Design principles: brief overview (2)
3. Increase homogeneity: all experimental units are as similar, and kept in as similar conditions, as possible, except for the conditions influenced by the treatment.
4. Measure other variables that may influence the response; they are used as covariates in the analysis.
5. In case of other known possible sources of variation: blocking. Create homogeneous groups (blocks). In the analysis, block effects can be corrected for.
Without blocks or covariates: Total variation = Treatment effect + Error
With blocks or covariates: Total variation = Treatment effect + Block/covariate effect + Error
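The split of the total variation can be made concrete with sums of squares. The sketch below assumes a randomised complete block design with one observation per block and treatment combination; the yield table is invented for illustration and is not data from the slides.

```python
def rcbd_sums_of_squares(table):
    """Split the total sum of squares for table[block][treatment]."""
    n_blocks = len(table)
    n_treat = len(table[0])
    grand = sum(sum(row) for row in table) / (n_blocks * n_treat)
    block_means = [sum(row) / n_treat for row in table]
    treat_means = [sum(row[t] for row in table) / n_blocks
                   for t in range(n_treat)]

    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    # Residual after removing both the block mean and the treatment mean.
    ss_error = sum((table[b][t] - block_means[b] - treat_means[t] + grand) ** 2
                   for b in range(n_blocks) for t in range(n_treat))
    return {"total": ss_total, "treatment": ss_treat,
            "block": ss_block, "error": ss_error}


# Hypothetical yields: 3 blocks (rows) by 3 treatments (columns).
yields = [[20, 24, 28],
          [30, 33, 39],
          [25, 30, 32]]
ss = rcbd_sums_of_squares(yields)
# Error sum of squares if the blocks were ignored in the analysis.
error_without_blocking = ss["total"] - ss["treatment"]
```

In this made-up table most of the variation sits between blocks, so correcting for blocks leaves a much smaller error term than an analysis that ignores them.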
Lessons, also from personal experience
Own PhD experience: Not believing the results led to an extra year of analyses!
Lesson: know your analysis in advance
Real-life research experience in Mali: choice of experimental units
Cows observed in pasture land - example
During 10 days, 3 cows are observed, one per observer, during 8 hours, 12 times per hour, for 60 s each time. Measurement: amount of time spent walking (%) = y.
Result for walking (%) between 10 and 12 a.m.: 72 observations per cow; suppose the within-cow standard deviation is $s_E = 10$.
Some cows walk more than others: suppose the between-cow standard deviation of the mean time spent walking is $s_C = 4$.
Suppose $\bar{y} = 20$. What is the standard error?
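Cows example
y = C + E, where C = mean for a (random) cow and E = deviation = measurement − C.
Using 1 cow per observer (3 cows): $\mathrm{Var}(\bar{y}) = \mathrm{Var}(\bar{C}) + \mathrm{Var}(\bar{E}) = 4^2/3 + 10^2/72 = 5.33 + 1.39 = 6.72$, so $\mathrm{se}(\bar{y}) = \sqrt{6.72} \approx 2.6$.
If 2 cows per observer were used (6 cows), with the within-cow term unchanged: $\mathrm{Var}(\bar{y}) = \mathrm{Var}(\bar{C}) + \mathrm{Var}(\bar{E}) = 4^2/6 + 10^2/72 = 2.67 + 1.39 = 4.06$, so $\mathrm{se}(\bar{y}) = \sqrt{4.06} \approx 2.01$.
If 4 cows per observer were used, ..., $\mathrm{se}(\bar{y}) = 1.65$.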
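The three standard errors quoted above (about 2.6, 2.01 and 1.65) can be reproduced with a small Python sketch. It assumes, as the first calculation does, that the within-cow variance is divided by 72 in every scenario and that only the number of cows changes; the function and argument names are made up for this illustration.

```python
import math


def se_overall_mean(s_cow, n_cows, s_within, n_obs):
    """Standard error of the overall mean with two sources of variation.

    Assumption: Var(mean) = s_cow^2 / n_cows + s_within^2 / n_obs,
    where n_obs is the divisor used for the within-cow term (72 here).
    """
    return math.sqrt(s_cow ** 2 / n_cows + s_within ** 2 / n_obs)


# 1, 2 and 4 cows per observer, with 3 observers.
for cows_per_observer in (1, 2, 4):
    n_cows = 3 * cows_per_observer
    se = se_overall_mean(s_cow=4, n_cows=n_cows, s_within=10, n_obs=72)
    print(n_cows, "cows: se =", round(se, 2))
```

The between-cow term dominates, so adding cows lowers the standard error far more than adding observations on the same cows would.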
Cows example
Make sure to think about the sources of variation. Important sources need to be sampled often and independently.
The observations were pseudo-replications.
The many within-cow observations gave a very precise estimate of the mean walking % for each of the 3 cows, but not of the overall mean.
Experimental / sampling units: units to which a treatment is assigned, or that were randomly sampled.
Measured units: units on which measurements are taken. Example: pens vs chickens in the pen.
Sample size calculations: 2 treatments
2 hypothetical populations, one for each treatment. We call the population means μ1 and μ2.
Parameter of interest: Δ = μ1 − μ2
Samples: $y_{1,1}, \dots, y_{1,n_1}$; $y_{2,1}, \dots, y_{2,n_2}$
Model = assumptions: the data are outcomes of n1 and n2 independent drawings from N(μ1, σ1) and N(μ2, σ2).
Extra assumption: σ1 = σ2 = σ.
3 (of many) possible realities
Δ = 0 (no difference): μC = μT
Δ = Δ1 (large difference)
Δ = Δ2 (small difference)
Assumed: Normality and σ1 = σ2
[Figure: distributions with means μC and μT and common σ, shown for Δ = 0, Δ1 and Δ2.]
Testing: reality vs. conclusion
Given a relevant Ha reality (value for Δ), and given α (e.g. 0.05), the power of a planned experiment can be calculated.
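One way to do such a calculation is a Normal approximation. The sketch below assumes a two-sided test, equal sample sizes n per treatment and a known common σ; these are simplifying assumptions made for this illustration, not necessarily the method used in the course.

```python
import math
from statistics import NormalDist


def approx_power(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided two-sample test with known sigma."""
    z = NormalDist()
    se = sigma * math.sqrt(2 / n)          # standard error of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)      # 1.96 for alpha = 0.05
    shift = delta / se                     # true difference in standard-error units
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


print(round(approx_power(delta=0.0, sigma=1.0, n=16), 3))   # equals alpha when H0 is true
print(round(approx_power(delta=1.0, sigma=1.0, n=16), 3))   # power under one Ha reality
```

When Δ = 0 the function returns α, the chance of a false rejection; for Δ ≠ 0 it returns the power, which grows with n and with |Δ|/σ.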
Simulations to mimic the test result
Excel: simulations 2 samples.xls
- one experiment with test is repeated 200 times
- we assume that σ is approximately known
- we can vary "reality": Δ = μ1 − μ2. That is: let us assume that Δ is … (so and so much), then see how frequently H0 is rejected (= power of the test)
- we can vary the sample size n (= n1 = n2)
- we can vary α
- we can then simulate power
(demonstration of simulation program)
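The spreadsheet itself is not reproduced here, but the same idea can be sketched in Python. The version below is an assumption-laden stand-in: it repeats one two-sample experiment 200 times, treats σ as known (so it uses a z-test rather than a t-test), and reports the fraction of experiments in which H0 is rejected.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_power(delta, sigma, n, alpha=0.05, n_experiments=200, seed=1):
    """Fraction of simulated experiments in which H0: delta = 0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * math.sqrt(2 / n)
    rejections = 0
    for _ in range(n_experiments):
        sample1 = [rng.gauss(delta, sigma) for _ in range(n)]  # mean mu1 = delta
        sample2 = [rng.gauss(0.0, sigma) for _ in range(n)]    # mean mu2 = 0
        z_stat = (fmean(sample1) - fmean(sample2)) / se
        if abs(z_stat) > z_crit:
            rejections += 1
    return rejections / n_experiments


# Vary "reality" (delta), the sample size n, or alpha and watch the power change.
print(simulate_power(delta=0.0, sigma=1.0, n=16))  # close to alpha
print(simulate_power(delta=1.0, sigma=1.0, n=16))  # simulated power
```

With only 200 repetitions the simulated power is itself noisy, so small differences between runs should not be over-interpreted.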
Formula for sample size: confidence interval
Confidence interval limits: $\bar{y}_1 - \bar{y}_2 \pm t_{\alpha/2}\, s \sqrt{2/n}$
Formula (n per sample) for a (1 − α) C.I. with Error Margin ≤ M: $n \ge \dfrac{2\, t_{\alpha/2}^2\, \sigma^2}{M^2}$, with $t_{\alpha/2} \approx 2.0$ to $2.2$
Precision criteria that have to be specified: 1 − α = confidence level and M = max Error Margin
Notes
1) σ has to be estimated.
2) If α = 0.05, z = 1.96.
3) If the outcome for n is small (< 10), change the t-value to the one with df = 2(n − 1) and calculate again.
4) In testing, instead of M, we specify Δ, the minimum relevant difference, and β (= 1 − power).
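The sample size formula for a confidence interval, n ≥ 2 t² σ² / M² per sample, is easy to evaluate in code. The Python sketch below takes the t-value as an argument (default 2.0) instead of looking it up, so the recalculation in note 3 is done by passing a t-value read from a t-table. The numeric inputs are hypothetical.

```python
import math


def n_per_sample(sigma, margin, t=2.0):
    """Smallest n per sample with error margin t * sigma * sqrt(2/n) <= margin."""
    return math.ceil(2 * t ** 2 * sigma ** 2 / margin ** 2)


# Hypothetical planning values: sigma guessed at 10, margin of at most 5.
print(n_per_sample(sigma=10, margin=5))           # 32

# Small outcome (< 10): recalculate with the t-value for df = 2(n - 1).
n_first = n_per_sample(sigma=4, margin=5)         # 6, so df = 10
n_second = n_per_sample(sigma=4, margin=5, t=2.228)  # t-table value for df = 10
print(n_first, n_second)                          # 6 7
```

The value 2.228 is the two-sided 5% t-value for 10 degrees of freedom; in practice the step is repeated until n no longer changes.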
2C. Power calculation with Russ Lenth
Lenth, R. V. (2006). Java Applets for Power and Sample Size [Computer software]. Retrieved March 15, 2009, from http://www.stat.uiowa.edu/~rlenth/Power.
Example: estimate p = fraction of babies with constipation (< 0.2) with an Error Margin of at most 1%. Define y = 1 (yes) or 0 (no). Then Var(y) = σ² = p(1 − p) < 0.2 × 0.8 = 0.16. Formula: n ≥ …
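The slide does not spell out the formula used for this example. As a rough sketch only, the Python code below assumes the usual one-sample Normal-approximation margin z·σ/√n ≤ M with a 95% confidence level, which gives n ≥ z² σ² / M²; both the formula and the confidence level are assumptions made here, not taken from the slides.

```python
import math
from statistics import NormalDist


def n_for_proportion(p_upper, margin, confidence=0.95):
    """Sample size so that z * sqrt(p(1-p)/n) <= margin at the worst-case p."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    variance = p_upper * (1 - p_upper)   # largest variance when p < p_upper <= 0.5
    return math.ceil(z ** 2 * variance / margin ** 2)


# p below 0.2, error margin of at most 1 percentage point.
print(n_for_proportion(p_upper=0.2, margin=0.01))
```

Under these assumptions roughly six thousand babies would be needed, which shows how expensive a 1% margin is for a binary response.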
Conclusions
In the design phase:
- Think about the relevant "sources of variation" (influential factors): which of them will you include in the design, and which will you keep constant? Block design? Split plot?
- Measure conditions that vary (weather, ...)
- Measure general conditions (even if they do not vary across treatments in your experiment)
- Avoid / be aware of pseudo-replication: experimental unit vs measured unit, sampling unit vs measured unit
- Correct randomisation
Analysis
Conclusions from a statistical analysis are drawn in the context of a statistical model. The correctness and the relevance of the conclusion depend on the correctness and the relevance of the model.
Model = assumptions about the observations:
- Systematic part (how the mean value of the response depends on the factor levels / factor level combinations)
- Random part: independence, Normality and equal variance (independence follows from correct randomisation)
Conclusions
For sample size calculations, the researcher must
- know beforehand which analysis she will perform with the collected data
- specify research goals in terms of precision requirements: minimum relevant difference Δ, power (0.8/0.9), α (5%)
- know the error variation σ (guess: range/4)
- decide on sample sizes (Russ Lenth Power)
Measure and store quantitative data, when possible, not binary data.
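These ingredients can be combined in a back-of-the-envelope calculation. The Python sketch below uses the common Normal-approximation formula n ≈ 2 (z_{α/2} + z_β)² σ² / Δ² per treatment for a two-sided two-sample test. This is a stand-in chosen for this illustration, not the calculation performed by the Russ Lenth software, and the range and Δ are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_treatment(delta, sigma, power=0.8, alpha=0.05):
    """Approximate n per treatment to detect a difference delta with given power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)            # beta = 1 - power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical planning values: responses expected to span a range of about 40,
# so sigma is guessed as range / 4 = 10; minimum relevant difference delta = 10.
sigma_guess = 40 / 4
print(n_per_treatment(delta=10, sigma=sigma_guess, power=0.8))
print(n_per_treatment(delta=10, sigma=sigma_guess, power=0.9))
```

Because the formula uses z-values instead of t-values, it slightly underestimates n for small samples; it is meant as a first check before using dedicated power software.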
Conclusions
Enter data once in a database. Use programs to derive calculated variables or partial data sets for which you do an analysis.
In case of need, contact a statistician! ... beforehand.