Wilhelm Gaus, Institute for Epidemiology and Medical Biometry, Medical Faculty, University of Ulm, Germany Explorative versus Confirmative Interpretation of Statistical Significance in Toxicological Research using Ginkgo Biloba as an Example




Principle of a Statistical Test

The null-hypothesis states that there is no effect in the data.

Assume the null-hypothesis is true.

Then the probability is computed that the observed data (or more extreme data) arise from random effects alone.

If this probability is small, i.e. less than the selected level of significance, then we decide that the assumption of the calculation, namely the null-hypothesis, is wrong.

We say the test is significant.

As a consequence we decide that the alternative to the null-hypothesis is right.


Meaning of the p-value

If data from a random number generator are tested at a 5% level of significance, then the probability of getting a significant result is 5%.

p-value = probability (test significant | random numbers as data) = 5%

p-value = probability (test significant | null-hypothesis is right)
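The 5% figure can be checked with a small simulation. The sketch below is only an illustration and not part of the original slides: it assumes normally distributed random numbers with a known standard deviation of 1, two groups of 10 values each, and a two-sided z-test. The function names are hypothetical.

```python
import random
from statistics import NormalDist, mean


def two_sided_z_test(group_a, group_b, sigma=1.0):
    """Two-sided p-value of a two-sample z-test with known standard deviation."""
    standard_error = sigma * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def share_significant(n_experiments, group_size=10, alpha=0.05, seed=1):
    """Share of significant tests when both groups are pure random numbers."""
    rng = random.Random(seed)
    significant = 0
    for _experiment in range(n_experiments):
        a = [rng.gauss(0, 1) for _ in range(group_size)]
        b = [rng.gauss(0, 1) for _ in range(group_size)]
        if two_sided_z_test(a, b) < alpha:
            significant += 1
    return significant / n_experiments


rate = share_significant(20_000)
print(f"share of significant tests on random numbers: {rate:.3f}")
```

With 20,000 simulated experiments the printed share should lie close to 0.05, the selected level of significance.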


Let us make an Experiment in our Minds.

A random number generator gives us n-times data for e.g. 2 groups.

For each pair of groups we make a statistical test with a 5% level of significance. How many significant results do we expect?

Expected number of false significant results:

n =    1:     1 × 0.05 =  0.05
n =    2:     2 × 0.05 =  0.10
n =    5:     5 × 0.05 =  0.25
n =   10:    10 × 0.05 =  0.50
n =   50:    50 × 0.05 =  2.50
n =  100:   100 × 0.05 =  5.00
n =  500:   500 × 0.05 = 25.00
n = 1000:  1000 × 0.05 = 50.00

The more we test, the more false significant results we get.
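The expected counts are simply the number of tests multiplied by the level of significance. A minimal sketch of that arithmetic follows; the function name is hypothetical and not taken from the slides.

```python
ALPHA = 0.05  # level of significance used throughout the slides


def expected_false_significant(n_tests, alpha=ALPHA):
    """Expected number of significant tests when every null-hypothesis is true."""
    return n_tests * alpha


for n in (1, 2, 5, 10, 50, 100, 500, 1000):
    print(f"n = {n:4d}: {expected_false_significant(n):6.2f}")
```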


Number of Possible Tests

How many tests are possible in a data set with v variables measured for g groups at t time-points?

Between g groups, ½ g (g – 1) pairwise comparisons are possible.
Between t time-points, ½ t (t – 1) pairwise comparisons are possible.

These comparisons can be done for each of the v variables.

Thus in total ½ v g t (g + t – 2) pairwise comparisons are possible.
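The total can be derived in a few steps. The derivation below is a sketch under one assumption that the slide does not spell out: groups are compared with each other separately at each of the t time-points, and time-points are compared with each other separately within each of the g groups. N denotes the total number of pairwise comparisons.

```latex
\begin{align*}
N &= v \left[ t \cdot \tfrac{1}{2}\, g\,(g-1) \;+\; g \cdot \tfrac{1}{2}\, t\,(t-1) \right] \\
  &= \tfrac{1}{2}\, v\, g\, t \left[ (g-1) + (t-1) \right] \\
  &= \tfrac{1}{2}\, v\, g\, t\, (g + t - 2)
\end{align*}
```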


Example for Number of Possible Tests

Observational study of in-patients in a rehabilitation hospital

18 variables: Body-mass-index, systolic and diastolic blood pressure, resting heart rate, left ventricular ejection fraction (LVEF), haemoglobin, haematocrit, HbA1c, LDL, HDL, serum creatinine, bilirubin, ALT, AST, gamma-GT, forced expiratory volume in 1 sec (FEV1), CRP and power in bicycle test.

2 gender groups: female, male
2 age groups: ≤ 65 years, > 65 years
Together: 2 × 2 = 4 groups

2 time-points: After admission and before discharge

Number of possible tests (pairwise comparisons) =

½ v g t (g + t – 2) = ½ × 18 × 4 × 2 × (4 + 2 – 2) = 288 tests possible
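The count for this example can be checked by listing the comparisons explicitly. The sketch below is an illustration with hypothetical function names; it assumes that groups are compared at each time-point and time-points are compared within each group, for every variable.

```python
from itertools import combinations


def count_by_formula(v, g, t):
    """Closed form from the slides: 1/2 * v * g * t * (g + t - 2)."""
    return v * g * t * (g + t - 2) // 2


def count_by_enumeration(v, g, t):
    """Count the pairwise comparisons one by one."""
    count = 0
    for _variable in range(v):
        for _time_point in range(t):
            # compare every pair of groups at this time-point
            count += sum(1 for _pair in combinations(range(g), 2))
        for _group in range(g):
            # compare every pair of time-points within this group
            count += sum(1 for _pair in combinations(range(t), 2))
    return count


v, g, t = 18, 2 * 2, 2  # 18 variables, 2 gender x 2 age groups, 2 time-points
print(count_by_formula(v, g, t))
print(count_by_enumeration(v, g, t))
```

Both functions give the same number for the rehabilitation hospital example.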


Expected Number of False Significant Tests

288 pairwise comparisons, i.e. 288 tests are possible.

Assume a level of significance of 5% was selected for all tests.

Assume further, all data come from a random number generator.

Then we expect 288 × 0.05 = 14.4 significant tests.

All these significant results are wrong!


Tests Possible versus Tests Actually Computed

It does not matter whether a test is actually computed, or whether descriptive statistics already show that it would not be significant, so that the superfluous computational work is saved.

What is relevant is the number of tests possible!
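A simulation can illustrate why the number of possible tests matters. The sketch below is not from the slides and rests on stated assumptions: each simulated study offers 20 possible comparisons of pure random numbers (normal, standard deviation 1, 10 values per group), the investigator looks at the descriptive statistics, and computes a z-test only for the comparison with the largest difference in means. All names are hypothetical.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value of a two-sample z-test with known standard deviation."""
    standard_error = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def share_of_studies_with_a_finding(n_studies, n_possible=20, group_size=10,
                                    alpha=0.05, seed=2):
    """Only ONE test is computed per study: the most promising comparison."""
    rng = random.Random(seed)
    studies_with_finding = 0
    for _study in range(n_studies):
        best_gap, best_pair = -1.0, None
        for _comparison in range(n_possible):
            a = [rng.gauss(0, 1) for _ in range(group_size)]
            b = [rng.gauss(0, 1) for _ in range(group_size)]
            gap = abs(mean(a) - mean(b))  # descriptive statistic only
            if gap > best_gap:
                best_gap, best_pair = gap, (a, b)
        if z_test_p_value(*best_pair) < alpha:
            studies_with_finding += 1
    return studies_with_finding / n_studies


share = share_of_studies_with_a_finding(2_000)
print(f"studies reporting a significant result: {share:.2f}")
```

Although a single test is computed per study, far more than 5% of the simulated studies report a significant result. For 20 independent comparisons the share should be near 1 – 0.95^20, about 0.64.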


Example: National Toxicology Program (NTP), Technical Report 578 on Ginkgo biloba

Four studies are reported: a 3-month study on rats, a 3-month study on mice, a 2-year study on rats, and a 2-year study on mice.

3-month study on rats:
Groups of 10 male and 10 female rats were administered 0, 62.5, 125, 250, 500, or 1000 mg of Ginkgo biloba extract per kg body weight in corn oil by gavage, 5 days a week for 14 weeks.

At the end of the experiment, the stomach, liver, bile duct, thyroid gland, kidney, nose, and other locations were investigated. Typical findings were hypertrophy, atrophy, hyperplasia, inflammation, hyperkeratosis, ulcers, pigmentation etc.


How Many Tests are Possible? 3-Month Study on Rats Only

5 dosed groups + control group = 6 dose groups
2 sex groups: female and male rats

About 10 organs: stomach, liver, bile duct, thyroid gland, kidney, nose, etc.

About 10 typical findings: hypertrophy, atrophy, hyperplasia, inflammation, hyperkeratosis, ulcers, pigmentation, etc.

Number of possible tests = 6 dose groups × 2 sex groups × 10 locations × 10 possible findings

= 1200 possible pairwise comparisons = 1200 possible tests

Of course, not all these tests were computed, but a test would have been computed if promising descriptive statistical results were seen.


How Many Significant Tests are Expected?

Assume:
All data came from a random number generator, i.e. all null-hypotheses are valid. (Of course, NTP report 578 presents real data!)
5% was chosen as the level of significance.

About 1200 tests × 0.05 = 60 wrongly significant tests are expected.

Actually 33 significant results were reported.

I conclude: none of the reported significant results is a “statistical proof”; they are newly generated hypotheses.
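The arithmetic behind this comparison can be written out as a short sketch. The counts of organs and findings are the rounded "about 10" figures from the slides, not exact values from the NTP report, and the variable names are hypothetical.

```python
# Rounded counts taken from the slides (organs and findings are "about 10").
dose_groups = 6   # 5 dosed groups + control group
sex_groups = 2    # female and male rats
locations = 10    # approximate number of organs examined
findings = 10     # approximate number of typical findings
alpha = 0.05      # level of significance

possible_tests = dose_groups * sex_groups * locations * findings
expected_false_significant = possible_tests * alpha
reported_significant = 33  # number stated on the slide

print(f"possible tests:             {possible_tests}")
print(f"expected false significant: {expected_false_significant:.0f}")
print(f"reported significant:       {reported_significant}")
```

The number of reported significant results is below the number expected if all null-hypotheses were valid.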


The 2 Steps of Experimental Research

Step 1: Hypothesis generation

Starting point: screening investigation, survey, result of data mining, interesting thing in available data, or sub-analysis in a controlled trial

Statistical test significant (exploratory testing)

Precise hypothesis is generated

Step 2: Hypothesis confirmation

Precise hypothesis is established

A specific and controlled study is planned and performed. Specific data are gained.

Statistical test significant (confirmative testing)

Hypothesis is confirmed: “Statistical proof”


Prerequisites for Confirmative Interpretation of Significance

Hypothesis was established beforehand (a priori)

Level of significance was established beforehand (a priori)

New data for confirmative testing = independent data for hypothesis generation and hypothesis confirmation

If more than one statistical test was done: adjustment for multiple testing, e.g. Bonferroni-Holm procedure

If one or more of these 4 prerequisites are not fulfilled: the significance is exploratory, i.e. a hypothesis is generated.
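The Bonferroni-Holm procedure named among the prerequisites can be sketched in a few lines. This is a minimal illustration of the standard step-down procedure, not code from the presentation; the function name and the example p-values are made up.

```python
def holm_reject(p_values, alpha=0.05):
    """Bonferroni-Holm step-down procedure.

    Returns one boolean per p-value (in the original order):
    True if the null-hypothesis is rejected after adjustment.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, index in enumerate(order):
        # the k-th smallest p-value is compared with alpha / (m - k + 1)
        if p_values[index] <= alpha / (m - rank):
            reject[index] = True
        else:
            break  # stop at the first non-rejection; all larger p-values fail too
    return reject


example_p_values = [0.001, 0.012, 0.020, 0.040, 0.300]  # invented for illustration
print(holm_reject(example_p_values))
```

Without adjustment, four of the five invented p-values would be below 5%; after the adjustment only the two smallest remain significant.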


Irrelevant for the Distinction between Exploratory and Confirmative

Type of study: Clinical study, animal research, cell cultures

Study design: Parallel groups or cross-over

Type of outcome variable: Qualitative, rating, quantitative, normally distributed

Statistical Test used: Chi-square test, Wilcoxon test, Student t-test


What You Should Take Home

If you are told that something is significant …
If a p-value < 5% is presented …

then you should clarify whether the significance is exploratory or confirmative.

There is another important rule, which was not the topic of my presentation:
Never interpret a significant p-value without the appropriate descriptive statistics.


How to Lie with Statistics?

There are several techniques to lie with statistics.

Now you have learned one of them.

Thank you for your attention!


References

Gaus W: Which level of evidence does the US National Toxicology Program provide? Statistical considerations using the Technical Report 578 on Ginkgo biloba as an example. Toxicology Letters 229 (2014), pp. 402-404, http://dx.doi.org/10.1016/j.toxlet.2014.06.017

Heinonen T, Gaus W: Cross matching observations on toxicological and clinical data for the assessment of tolerability and safety of Ginkgo biloba leaf extract. Toxicology 327 (2015), pp. 95-115, http://dx.doi.org/10.1016/j.tox.2014.10.013

Gaus W, Muche R, Mayer B: Interpretation of Statistical Significance – Exploratory Versus Confirmative Testing in Clinical Trials, Epidemiological Studies, Meta-Analyses and Toxicological Screening (Using Ginkgo biloba as an Example). Clinical & Experimental Pharmacology 2015, 5:4, http://dx.doi.org/10.4172/2161-1459.1000182

Brand new!

Not in the abstract.