
I Critical evaluation of (veterinary) scientific literature

II Applied epidemiology

III Common biostatistics concepts & methods

IV Field group research projects

Scientific literacy – four sections

BCPM Scientific Literacy - Module III, June 5, 2008


Agenda items

1. Three critical reviews of scientific papers

Jeffrie Fox & Brad Jones: Sowell et al, 1999. Feeding and watering behavior of healthy and morbid steers in a commercial feedlot. J Anim Sci 77(5):1105-1112.

Tom Furman & Jeff Ondrak: Ellis et al, 2002. Comparative efficacy of an injectable vaccine and an intranasal vaccine in stimulating Bordetella bronchiseptica-reactive antibody responses in seropositive dogs. JAVMA 220(1):43-48.

John Davidson & Richard Linhart: Barling et al, 2005. Acute trichomoniasis and suboptimal bull fertility in a cow/calf herd: an investigation and case management. Bovine Practitioner 39(1):1-5.


Scientific literacy - Agenda items - cont

2. Keen - Common biostatistics concepts

3. Group research projects

- 10 to 15 minute oral presentation update by each project group on state of your project (who, what, when, where, why, how?)

4. Dave Smith – Diagnostic test evaluation

BCPM Research Project - resources & support materials

Module III – 5 June 2008

Little Handbook of Statistical Practice Gerard Dallal, Tufts University PhD biostatistician http://www.tufts.edu/~gdallal/LHSP.HTM

Some Aspects of Study Design Gerard Dallal, Tufts University PhD biostatistician http://www.tufts.edu/~gdallal/STUDY.HTM

Some Statistical Basics - B Gerstman, San Jose State University, DVM PhD epidemiologist/biostatistician http://www.sjsu.edu/faculty/gerstman/EpiInfo/basics.htm

Data Management - B Gerstman, San Jose State University, DataEntry.pdf – two-page pdf file on BCPM website

EpiData – non-spreadsheet freeware for data management http://www.epidata.dk => can download software here http://www.epidata.org/wiki/index.php/Field_Guide

BCPM Research Project - resources & support materials-continued


British Medical Journal – Statistical Notes Gerard Dallal Website, Tufts University http://www.tufts.edu/~gdallal/bmj.htm (link to articles)

An excellent & ongoing series of short articles on use of statistics in bio-medicine published on occasional basis since mid-1990s

LittleHandbookofStatisticalPracticeDallal.pdf (3-page table of contents only, on BCPM website)

http://www.tufts.edu/~gdallal/LHSP.HTM

Some Aspects of Study Design - Gerard Dallal, Tufts University biostatistician http://www.tufts.edu/~gdallal/STUDY.HTM

StudyDesignDallal.pdf (21-page complete pdf on BCPM website)

Some Statistical Basics - B Gerstman, San Jose State University, http://www.sjsu.edu/faculty/gerstman/EpiInfo/basics.htm

Some Statistical Basics Gerstman.pdf (8-page complete pdf file on BCPM website)

DataEntry.pdf (2-page pdf on BCPM website)

EpiDataIntro.pdf (from B Gerstman)

Being able to critically read an article puts the power back in your hands, freeing you from an overreliance on "experts". Reading a paper requires addressing the same three basic issues:

validity, results & relevance

A researcher is in a gondola of a balloon that loses lift and lands in the middle of a field near a road. Of course, it looks like the balloon landed in the middle of nowhere. As the researcher ponders appropriate courses of action, another person wanders by.

The researcher asks, "Where am I?" The other person responds, "You are in the gondola of a balloon in the middle of a field."

The researcher comments, "You must design clinical trials." "Well, that’s amazing, how did you know?" "Your answer was correct and precise and totally useless."

BCPM Critical Scientific Review


BCPM Critical Scientific Review support materials

Follies and Fallacies in Medicine - Petr Skrabanek Follies-and-Fallacies-in–Medicine-1up.pdf- 183 page pdf file in BCPM website (out of print book)

Scepticemia

Excerpt from Follies & Fallacies in Medicine

Research projects

An important scientific question is important because of the question, not the answer

Common problems in study protocols

• Too ambitious - too many questions (false economy)

• Insufficient attention to literature (repeat history)

• Poor justification: why is it important to answer this question? What impact does it have?

• Poorly formulated objectives

• Inappropriate analysis

• Inadequate description

• Absence of pilot data


Epi & biostats – important issues


“There is no biological or life science where the epidemiologic approach and principles cannot be applied.”

Epidemiology (from Greek roots)

epi = on, upon

demos = people or population

logos = knowledge, understanding

Translation - the study of what befalls the population = medical or veterinary ecology

= Disease patterns that exist under field conditions

Therefore, epidemiology must be applied in the field to be effective


Two major epidemiology concepts

1. “Epidemiology is the science of denominators. Thus, it is the rational counterbalance to clinical medicine, which tends to be preoccupied with numerators (ie cases)”

Clinical => focus on patients, cases & individuals versus

Epidemiology => focus on both sick & healthy animals & on groups (not just individuals)

Sick animals / (Sick + healthy animals) = Numerator / Denominator = Cases / Population at risk

- Denominators permit calculation of risk, rates & ratios
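The numerator/denominator idea can be sketched in a few lines. This is only an illustration: the function name `risk` and the pen sizes are hypothetical, not data from the course.

```python
def risk(cases: int, population_at_risk: int) -> float:
    """Proportion of the population at risk that became cases."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    if not 0 <= cases <= population_at_risk:
        raise ValueError("cases must lie between 0 and the population at risk")
    return cases / population_at_risk


# Hypothetical pens: the same numerator (12 sick steers) means very
# different things once the denominator is known.
small_pen = risk(12, 60)    # 12 cases among 60 animals at risk
large_pen = risk(12, 600)   # 12 cases among 600 animals at risk
risk_ratio = small_pen / large_pen

print(f"small pen risk = {small_pen:.2f}, large pen risk = {large_pen:.2f}")
print(f"risk ratio (small vs large) = {risk_ratio:.1f}")
```

Counting cases alone would rate the two pens as equal; the denominators show a tenfold difference in risk.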


Types of epidemiology

Chronic disease epidemiology = non-infectious diseases (eg heart attacks or diabetes)

Infectious disease epidemiology = infectious diseases (eg brucellosis, avian influenza)

Descriptive epidemiology = summarize what is happening in groups by counting or measuring events and rates
- by place of occurrence of the event of interest
- by time of occurrence of the event of interest
- by demography (eg animal age, breed, gender, parity)

Analytical epidemiology = compare groups for important differences in clinical (sickness, death) or other (eg production performance) outcomes

Epi & statistics

“It is as important to know what kind of man has the disease as it is to know what kind of disease has the man.” Osler, 1849-1919

“Medical statistics will be our standard of measurement; we will weigh life for life and see where the dead lie thicker, among the worker or the privileged.” Virchow, 1849

Two major epidemiology concepts (cont)

2. Disease occurrence is not random

- The critical epidemiologic assumption

- Goals of epidemiology
a. Identify the disease occurrence pattern
b. Determine key determinants = risk factors which can be manipulated

- Biostatistics => tool used to detect randomness or patterns

Types of Distributions

RANDOM - any point is equally likely to occur at any location, and the position of any point is not affected by the position of any other point.

Non-random:

UNIFORM/DISPERSED - every point is as far from all of its neighbors as possible; “unlikely to be close”

CLUSTERED – many points are concentrated close together and there are large areas that contain very few, if any, points; “unlikely to be distant”

Distribution of world airports 3100 airports in 220 countries

In nature or human culture, few distributions are random
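One simple way to put a number on random versus uniform versus clustered patterns is the variance-to-mean ratio of counts per quadrat (grid cell). This index is not named in the slides; it is brought in here as an illustration, and the quadrat counts below are made up. A ratio near 1 is what a random (Poisson) pattern tends to give, well below 1 suggests a uniform pattern, and well above 1 suggests clustering.

```python
from statistics import mean, variance


def dispersion_index(quadrat_counts):
    """Variance-to-mean ratio of counts per quadrat (sample variance)."""
    m = mean(quadrat_counts)
    if m == 0:
        raise ValueError("no points were counted")
    return variance(quadrat_counts) / m


# Hypothetical counts of cases in 8 equal-sized quadrats (40 cases each time).
uniform_counts = [5, 5, 5, 5, 5, 5, 5, 5]     # every quadrat alike
random_like_counts = [3, 8, 5, 4, 8, 2, 6, 4]  # moderate scatter
clustered_counts = [0, 0, 38, 0, 1, 0, 1, 0]   # almost everything in one quadrat

for label, counts in [("uniform", uniform_counts),
                      ("random-like", random_like_counts),
                      ("clustered", clustered_counts)]:
    print(f"{label:>12}: index = {dispersion_index(counts):.2f}")
```

The result depends on the quadrat size chosen, so the index is a screening tool rather than proof of a pattern.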


Descriptive epidemiology: Who? What? Where? When? How many?
- Observe
- Descriptive study: Design, Implement, Analyze, Interpret
- Rule out: Bias, Chance, Confounding
- Epidemiologic inference
- Hypothesize

Analytic epidemiology: Why? How?
- Compare subgroups
- Analytic study: Design, Implement, Analyze, Interpret
- Control for: Bias, Chance, Confounding
- Epidemiologic inference
- Causal inference

"The main point is gained if the student is put in a position not to be paralyzed by the mere mention of such things but ... feels that they are inherently rational and manageable and that if he encounters them he will be in a position to find out, at need, what to do with them." RA Fisher on teaching intro statistics


Dr.H.Qotba 27

Statistics - science of collecting, organizing, summarising, analysing, and making inference from data

Descriptive: collecting, organizing, summarising, analysing, and presenting data

Inferential: making inferences, hypothesis testing, determining relationships, making predictions

Statistical study summary

1. There exists a Population

2. An investigator draws a Random sample

3. The sample generates Numerical data

4. Used to evaluate pertinent Statistics

5. Used to estimate Parameters

Statistical inference

• A user of statistics is always working in two worlds!
– Ideal world – population level
– World of reality – sample level

• Statistical Inference
– The process whereby one draws conclusions about a population from the results observed in a sample from that population.


Two categories of inference

– Estimation (point & interval, eg mean + 95% CI)
• Estimating the value of an unknown population parameter
• Predicts the most likely location of a population parameter
• eg “What is the prevalence of Tritrichomonas foetus in bulls in Texas?” (point estimation)

– Hypothesis testing
• Making a decision about a hypothesized value of an unknown population parameter
• eg “Is prevalence of Tritrichomonas foetus in bulls in Texas higher than in Nebraska?” (Yes or No?)
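A minimal sketch of the two kinds of inference for a prevalence question follows. The counts are invented for illustration and are not survey results for any state; the pooled two-proportion z test is one common choice for a yes/no comparison of two prevalences, assumed here rather than taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(pos_a, n_a, pos_b, n_b):
    """Return (difference, z, one_sided_p) for H1: prevalence A > prevalence B."""
    p_a, p_b = pos_a / n_a, pos_b / n_b          # point estimates
    pooled = (pos_a + pos_b) / (n_a + n_b)       # prevalence if H0 is true
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 1 - NormalDist().cdf(z)            # upper-tail probability
    return p_a - p_b, z, p_value


# Hypothetical samples: 30 of 500 bulls positive in state A, 15 of 500 in state B.
diff, z, p = two_proportion_z_test(30, 500, 15, 500)
print(f"estimated difference in prevalence = {diff:.3f}")
print(f"z = {z:.2f}, one-sided p = {p:.3f}")
```

The first printed line is estimation (how big is the difference?); the second supports the yes/no decision of hypothesis testing.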


• Three questions concerning a random variable of interest at the population level:

– What is the location?
– How much variation?
– What is the shape of the distribution?

• Do the values of the variable tend to fall into a bell-shaped, flat, u-shaped, or some other distinctive pattern?

• A common distribution is the normal distribution.

Threats to validity

1. Chance – random error, two types
- False positive association = convict the innocent: p value, alpha, p = 0.05, confidence intervals (precision)
- False negative = free the guilty: power

2. Bias => systematic error, many types
- Selection bias
- Measurement bias

3. Confounding

Should I believe my measurement?

[Figure: Mayonnaise => Salmonella, RR = 4.3. Is the association explained by chance, confounding, or bias? If not, it is a true association (causal or non-causal). Figure labels: domain of statistics; domain of proper design.]
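For readers unfamiliar with the RR (relative risk) quoted for the mayonnaise and Salmonella example, here is a small sketch of how such a figure is computed. The slide gives only RR = 4.3; the counts below are hypothetical numbers chosen so that the arithmetic lands on that value.

```python
def relative_risk(ill_exposed, n_exposed, ill_unexposed, n_unexposed):
    """Risk of illness in the exposed divided by risk in the unexposed."""
    risk_exposed = ill_exposed / n_exposed
    risk_unexposed = ill_unexposed / n_unexposed
    return risk_exposed / risk_unexposed


# Hypothetical outbreak data (not from the slide):
# 43 of 100 people who ate the mayonnaise became ill,
# 10 of 100 people who did not eat it became ill.
rr = relative_risk(43, 100, 10, 100)
print(f"RR = {rr:.1f}")
```

The calculation itself says nothing about chance, bias, or confounding; those questions still have to be asked of the study that produced the counts.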

Errors

• Two broad types of error
– Random error (chance) - reflects amount of variability
– Systematic error (bias)

Definition of bias

Any systematic error in an epidemiological study resulting in an incorrect estimate of association between exposure and risk of disease

Imprecision & Bias - target analogy

Systematic error (bias): off base on the average

Random error (imprecision): scatters about the target
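The target analogy can be made concrete with two hypothetical scales weighing the same animal. In this sketch, bias is taken as the mean error and imprecision as the standard deviation of repeated readings; the readings and the true weight are invented for illustration.

```python
from statistics import mean, stdev


def bias_and_imprecision(measurements, true_value):
    """Return (bias, imprecision): mean error and scatter of the readings."""
    bias = mean(measurements) - true_value
    imprecision = stdev(measurements)
    return bias, imprecision


true_weight = 500.0  # hypothetical true weight, kg

# Scale A: on target on average, but readings scatter widely.
scale_a = [498.0, 503.0, 495.0, 506.0, 498.0]
# Scale B: readings agree closely, but all sit about 10 kg too high.
scale_b = [509.8, 510.1, 510.0, 509.9, 510.2]

bias_a, sd_a = bias_and_imprecision(scale_a, true_weight)
bias_b, sd_b = bias_and_imprecision(scale_b, true_weight)
print(f"scale A: bias = {bias_a:+.1f} kg, imprecision (SD) = {sd_a:.2f} kg")
print(f"scale B: bias = {bias_b:+.1f} kg, imprecision (SD) = {sd_b:.2f} kg")
```

Taking more readings would shrink the uncertainty from scale A's scatter, but no number of readings removes scale B's offset.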

Errors in epi studies

[Figure: error (vertical axis) versus study size (horizontal axis), with one curve for systematic error (bias) and one for random error (chance). Source: Rothman, 2002]

The main purpose of analytic epidemiology is to attempt to overcome bias.

It is not easy to overcome bias. This is one major reason for epi noise (eg non-repeatability of studies)

Systematic Error (Bias)

• Bias is a systematic error in inference

• Consider the direction of bias
– Toward the null (effects are underestimated)
– Away from the null (effects are overestimated)

• Three categories of bias
– Selection bias
– Information bias
– Confounding

Selection Bias

• Selection bias: selection of study participants in a way that favors a certain outcome

• Examples (pp. 229 – 231)
– Publicity bias
– Healthy worker effect

• Historical illustration: Dewey Defeats Truman. Republicans were more likely to be polled than Democrats

Example of Information Bias: “The Loaded Question”

• A loaded question is a question with a false, disputed, or question-begging presupposition

• "Have you stopped beating your wife?" presupposes that you have beaten your wife prior to its asking. There are only the following possible answers, both of which entail the presupposition of the question:
1. "Yes", which entails "I was beating my wife."
2. "No", which entails "I am still beating my wife."

Hypothesis Tests are not Perfect

Test result | Truth: No association | Truth: Association
P > 0.05 | Correct decision | Type II (β) error
P < 0.05 | Type I (α) error | Correct decision

Measurement error, bias, confounding

Confidence Intervals

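• The “95% confidence interval” is a range of values calculated so that, over repeated samples, 95% of such intervals contain the true value of the difference between groups

• Values are not equally well supported across the confidence interval: those near the point estimate are more compatible with the data than those near the limits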

• The narrower the confidence interval, the more precise the estimate

The Confidence interval

Picture the mean (an estimate) with an interval around it.
– The interval is a “random” interval with endpoints that are calculated and based on the sample information.
– The interval has a probability associated with it – the confidence associated with the estimated mean

• Example: 95% confidence interval
– Probability of trapping the population mean is 95/100
– 5 of 100 intervals will not “trap” it due to chance!
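The “trapping” idea can be checked by simulation. The sketch below draws many samples from a population whose mean is known, builds a 95% interval around each sample mean, and counts how often the interval contains the true mean. All settings (population mean 50, SD 10, samples of 25, SD treated as known so that z = 1.96 applies) are assumptions made for the illustration.

```python
import random
from math import sqrt


def coverage(true_mean=50.0, sd=10.0, n=25, n_intervals=2000, seed=1):
    """Fraction of 95% intervals (known SD, z = 1.96) that contain true_mean."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    margin = 1.96 * sd / sqrt(n)       # half-width of every interval
    hits = 0
    for _ in range(n_intervals):
        sample_mean = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / n_intervals


print(f"share of intervals that trapped the true mean: {coverage():.3f}")
```

The printed share should sit close to 0.95; the intervals that miss do so purely because of sampling variation.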

Confidence intervals - Coin Toss Example

# Tosses | H | T | Point estimate | 95% CI
2 | 1 | 1 | 0.50 | 0.00-1.00
10 | 5 | 5 | 0.50 | 0.19-0.81
50 | 25 | 25 | 0.50 | 0.36-0.65
100 | 50 | 50 | 0.50 | 0.40-0.60
1000 | 500 | 500 | 0.50 | 0.47-0.53

Precision vs sample size
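The coin toss figures can be approximated with a short calculation. The slides do not say which interval method was used; this sketch assumes the normal-approximation (Wald) interval, clipped to the range 0 to 1, so a value may differ from the slide in the last digit.

```python
from math import sqrt


def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% CI for a proportion, clipped to [0, 1]."""
    p = successes / n
    margin = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - margin), min(1.0, p + margin)


# Half heads at every sample size: the point estimate never changes,
# only the width of the interval around it.
for n in (2, 10, 50, 100, 1000):
    low, high = wald_ci(n // 2, n)
    print(f"{n:>5} tosses: estimate 0.50, 95% CI {low:.2f}-{high:.2f}")
```

Going from 10 to 1000 tosses narrows the interval roughly tenfold, because the margin of error shrinks with the square root of the sample size.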

Preference for Confidence Interval

In Comparison 1: Wt Loss = 7 Lbs, P = 0.0005, 95% CI (5-8)

In Comparison 2: Wt Loss = 7 Lbs, P = 0.0047, 95% CI (3-11)

P < 0.05 is not a good description of the information in the data

Evidence-Based Medicine 2005;10:133-134

Variables

Quantitative
• Discrete
• Continuous

Qualitative
• Ordinal
• Categorical

Data types

• Quantitative data
– Produced when one either measures or counts a characteristic for each sample element.
– Measured characteristic: e.g., weight, age. Continuous data with meaningful scale; no gaps between data values.
– Counted characteristic: e.g., number of piglets. Discrete data, integer data.

• Qualitative data
– Produced when one groups each sample element into distinct categories based on the “value” of a specific characteristic.
– Categorical data, two types: nominal and ordinal
– Nominal: groups without inherent ordering (breed)
– Ordinal: groups with inherent ordering (body condition score)

• Quantitative and qualitative data are summarized, analyzed, and graphically presented in different fashions.

Parametric vs non-parametric tests

• Parametric - decision making method where the distribution of the sampling statistic is known, eg normal distribution

• Non-Parametric - decision making method which does not require knowledge of the distribution of the sampling statistic

How to select appropriate statistical test

• Type of variables
– Quantitative (blood pres.)
– Qualitative (gender)

• Type of research question
– Association
– Comparison
– Risk factor

• Data structure
– Independent
– Paired
– Matched
– Distribution (normal, skewed)

Most popular errors when doing biostatistics

1. Use parametric statistics for nominal data.

2. Use Standard Error of the Mean (SEM) to describe data.

3. Use standard deviations, SEMs, or confidence intervals (CIs) to describe data that are not normally distributed.

4. Study sample size is too small, ie power is close to 0.5

5. Assume that if an effect is not significant, it is zero (ie “absence of evidence is evidence of absence”)

6. Assume that the level of statistical significance indicates the importance or size of a difference or relation
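Error 2 in the list above is easier to see with numbers. The sketch below computes the standard deviation (SD) and the standard error of the mean (SEM) for a hypothetical set of weights, then for the same values repeated eight times to mimic a larger sample. The data are invented for illustration.

```python
from math import sqrt
from statistics import stdev


def sd_and_sem(values):
    """Return (SD, SEM): spread of the data and uncertainty of the mean."""
    sd = stdev(values)
    return sd, sd / sqrt(len(values))


weights = [410, 425, 390, 445, 430, 405, 415, 440]  # hypothetical weights, kg

sd, sem = sd_and_sem(weights)
sd_big, sem_big = sd_and_sem(weights * 8)  # same values, 8 times the sample size

print(f"n =  8: SD = {sd:.1f} kg, SEM = {sem:.1f} kg")
print(f"n = 64: SD = {sd_big:.1f} kg, SEM = {sem_big:.1f} kg")
```

The SD stays about the same because the animals vary just as much, while the SEM shrinks with sample size. Reporting the SEM as if it described the data makes the animals look far more alike than they are.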

Ten ways to cheat on statistical tests when writing up results

1. Throw all your data into a computer and report as significant any relation where P<0.05

2. If baseline differences between the groups favor the intervention group, remember not to adjust for them

3. Do not test your data to see if they are normally distributed. If you do, you might get stuck with non-parametric tests, which aren't as much fun

4. Ignore all withdrawals (drop outs) and non-responders, so the analysis only concerns subjects who fully complied with treatment

5. Always assume that you can plot one set of data against another and calculate an "r value" (Pearson correlation coefficient), and assume that a "significant" r value proves causation

6. If outliers (points which lie a long way from the others on your graph) are messing up your calculations, just rub them out. But if outliers help your case, even if they seem to be spurious results, leave them in

7. If the confidence intervals of your result overlap zero difference between the groups, leave them out of your report. Better still, mention them briefly in the text but don't draw them in on the graph, and ignore them when drawing your conclusions

8. If the difference between two groups becomes significant four and a half months into a six month trial, stop the trial and start writing it up. Alternatively, if at six months the results are "nearly significant," extend the trial for three more weeks

9. If your results prove uninteresting, ask the computer to go back and see if any particular subgroups behaved differently

10. If analysing your data the way you plan to does not give the result you wanted, run the figures through a selection of other tests

t-test

• Compare the means of a continuous variable in two samples in order to determine whether or not the difference between the 2 means exceeds the difference that would be expected by chance

What is the probability that a difference in means this large would arise by chance?

T test requirements

• The observations are independent
• Drawn from a normally distributed population

Types of t-test

• One sample t test - test if a sample mean for a variable differs significantly from a given (known) population mean

• Unpaired or independent t test - test if the population means estimated by 2 independent samples differ significantly (eg a group of males and a group of females)

• Paired t test - test if the population means estimated by dependent samples differ significantly (eg mean of pre- and post-treatment values for the same set of animals)
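The three t tests differ mainly in how the standard error is built. The sketch below computes only the t statistics, using hypothetical before and after measurements; turning a t statistic into a P value requires the t distribution (a table or a statistics package), which is left out here. The unpaired version assumes equal variances (pooled variance), which is one common form and an assumption of this sketch.

```python
from math import sqrt
from statistics import mean, stdev, variance


def one_sample_t(sample, population_mean):
    """t for a sample mean against a known population mean."""
    return (mean(sample) - population_mean) / (stdev(sample) / sqrt(len(sample)))


def unpaired_t(group_a, group_b):
    """Pooled-variance t for two independent groups (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


def paired_t(before, after):
    """t for paired data: a one-sample test on the within-animal differences."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0.0)


# Hypothetical measurements on the same 5 animals before and after treatment.
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]

print(f"one-sample t (before vs known mean 10): {one_sample_t(before, 10):.2f}")
print(f"unpaired t (wrongly ignoring pairing):  {unpaired_t(after, before):.2f}")
print(f"paired t (respecting pairing):          {paired_t(before, after):.2f}")
```

With these numbers the paired statistic is much larger than the unpaired one, because pairing removes the animal-to-animal variation from the comparison. Choosing the test that matches the data structure matters.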

Chi² test

• Used to test for association between qualitative variables

• Used for categorical data

Chi² test requirements

• Data should be in the form of frequencies (counts)
• Total number of observations must exceed 20
• Expected frequency in any category or cell must be >5
– When 1 of the cells has <5 observed: Yates correction
– When 1 of the cells has <5 expected: Fisher exact test

• Observed minus chance expected
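“Observed minus chance expected” can be spelled out for a 2 x 2 table. This sketch computes the Pearson chi-square statistic without continuity correction, reports the smallest expected count so the requirement above can be checked, and uses the fact that for 1 degree of freedom the P value equals erfc(sqrt(chi2 / 2)). The table counts are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction.

    Returns (chi2, p_value, smallest_expected_count).
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    min_expected = float("inf")
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count expected by chance
            min_expected = min(min_expected, expected)
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value, min_expected


# Hypothetical table: rows = exposed / unexposed, columns = sick / healthy.
chi2, p, min_expected = chi_square_2x2(20, 30, 10, 40)
print(f"chi2 = {chi2:.2f}, P = {p:.3f}, smallest expected count = {min_expected:.0f}")
```

If the smallest expected count were below 5, the requirements above point to the Fisher exact test rather than this calculation.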

ANOVA(Analysis of variance)

• Used to compare two or more means

Correlation and Regression

• Methods to study magnitude and direction of the association and the functional relationship between two or more variables

Association of two variables (dep, indep)

Dependent variable | Independent variable | Test
Quantitative | Quantitative | Spearman correlation, linear regression
Quantitative | Categorical | t test (2 groups), ANOVA (3+ groups)
Categorical | Quantitative | Logistic regression
Categorical | Categorical | Chi-square

Comparing (difference) variables

Variable | 2 groups | Paired data | >2 groups
Quantitative | t test | Paired t test | ANOVA
Ordinal | Mann-Whitney | Wilcoxon | Kruskal-Wallis
Categorical | Chi-square* | McNemar | Chi-square

* When 1 of the cells has <5 expected: Fisher exact test

When 1 of the cells has <5 observed: Yates correction

Risk Factors

Dependent variable | Several independent variables | Test
Categorical | Categorical | Multiple logistic regression
Quantitative | Categorical | ANOVA
Quantitative | Quantitative | Linear, log regression

Sample Size Estimation: Logistical Considerations

• Need to identify outcome(s) that determine sample size
– Primary versus secondary outcomes

• Budget
• Ability to recruit from target population
• Accrual period
• Anticipated refusal rate
• Anticipated dropout rate (longitudinal only)

Sample Size Estimation: Statistical Considerations

• Type I error rate (α; usually .05)
• Type II error rate (β; 1 – β = Power)
• Variability in the outcome (e.g., standard deviation)
• Size of effect you would like to detect
– Minimum clinically relevant effect size
– Not the same as an effect found by someone else
– What is the smallest policy-relevant difference?
– Example: Difference in adherence rates > 15%
• Sample size
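The ingredients listed above combine into a sample size. As a rough sketch, the function below uses the standard normal-approximation formula for comparing two proportions with a two-sided test. The function name and the adherence rates of 60% and 75% (a 15 percentage point difference) are illustrative assumptions, not figures from the course.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect p1 vs p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # Type I error, two-sided
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - Type II error
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)   # variability in the outcome
    effect = p1 - p2                               # size of effect to detect
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / effect ** 2)


# Hypothetical adherence rates: 60% versus 75%.
print("power 0.80:", n_per_group_two_proportions(0.60, 0.75), "per group")
print("power 0.90:", n_per_group_two_proportions(0.60, 0.75, power=0.90), "per group")
print("smaller effect (60% vs 70%):", n_per_group_two_proportions(0.60, 0.70), "per group")
```

Asking for more power or a smaller detectable difference pushes the required sample size up. The result should then be inflated for the anticipated refusal and dropout rates listed under the logistical considerations.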

Confidence Intervals

• The confidence interval (CI) surrounds the point estimate with a margin of error.

• One margin of error below the point estimate is the lower confidence limit.

• One margin of error above the point estimate is the upper confidence limit.

• The confidence interval’s width quantifies the precision of the estimate (narrow confidence intervals => precise estimates).

• Precision increases with sample size (big studies => narrow confidence intervals => precise estimates)