Results II (Figures) Numbers & Statistics Forestry 545 March 4 2014. Dr Sue Watts Faculty of Forestry University of British Columbia Vancouver, BC Canada sue.watts@ubc.ca. General manuscript format. Title Authors Abstract Introduction Materials & Methods Results Discussion
Results II (Figures) Numbers & Statistics
Forestry 545
March 4 2014
Dr Sue Watts
Faculty of Forestry
University of British Columbia
Vancouver, BC Canada
sue.watts@ubc.ca
General manuscript format
Title Authors Abstract Introduction Materials & Methods Results Discussion References
2
Figures
• Photographs
• Drawings
• Gazintas
• Algorithms
• Maps
• Line graphs
• Bar graphs
• Pie charts
• Pictographs
4
Figures
• As with tables, figures should be independent and indispensable
• Good visual material will spark reader interest
• Interested readers will look to the text for answers
5
Figures
• Need to be attractive but not glitzy
• Watch out for size and scale (reduction may accentuate some flaws)
• After reduction to publication size capital letters should be about 2 mm high
• X and Y axis lines should be no wider than lettering
6
Avoid chart junk
[Figure: two versions of the same line graph, Local index (0 to 100) against Year (1900 to 1920)]
7
Katz 2008
Figure captions
• Reader looks at figures then legends
• Title should explain meaning without need to read manuscript
• Does not need to be a complete sentence
• Like table title, usually in two parts– Descriptive title– Essential details
8
Figure captions
• Captions for figures go below figure
• In a manuscript, figure captions are placed on a separate sheet
• How could you improve this caption and graph?
Cumulative weeks to delivery of the women in group A (n = 78) and group B (n = 78)
9
Gustavii 2002
Improved caption & graph
Gestational duration did not differ between the treated women and control
Gustavii 2002
10
Figures
• Photograph – used for documentary illustration
• Drawing
• Gazinta
• Algorithm
• Map
• Line graph
• Bar graph
• Pie chart
• Pictograph
11
Photograph
• Value to article can range from Ø to more valuable than any text!
• If you need a photo, pick a journal that produces high quality reproduction
• Crop or mark with arrows to highlight important detail
12
Figures
• Photograph
• Drawing, gazinta, algorithm, map – all used as explanatory artwork
• Line graph
• Bar graph
• Pie chart
• Pictograph
16
Gazinta
Visuals that show hierarchy, organization or interaction
• Tree gazintas show sub-assemblies of the same relative importance
• Block diagrams are interaction gazintas
19
“Gazinta” (organization tree)
ELECTRON MICROSCOPE LABORATORY
TRANSMISSION EM SCANNING EM IMAGE PROCESSING
TECHNICAL PERSONNEL
SAMPLE SECTIONING
SAMPLE STAINING
A typical drawing tree gazinta describes a relatively stable situation.
20
Mathews and Mathews 2008
Algorithm
• Flowcharts & taxonomic keys
• Algorithms are illustrations of a means of making a decision by considering only those factors relevant to that decision
• Algorithms are usually easier to follow than the written text equivalent
21
Flow chart algorithm
[Figure: flow chart headed "We need antigen testing". Four starting situations (about to receive a heartworm preventative for the first time; on a monthly macrolide heartworm preventative; resuming a daily DEC preventative for the coming HW season; history and heartworm status unknown) lead to "Perform a heartworm antigen test". Yes/no decision boxes follow ("Is test positive?", "Are microfilariae present?", "What kind?" with D. immitis or D. reconditum, "Is infection confirmed?"), ending in outcomes such as "Dog is free from heartworm infection. May begin preventative regimen", "Dog has a heartworm infection. Evaluate extent of disease. Determine treatment protocol" and "Retest in 3-6 months or contact test manufacturer for consultation". The connections between boxes did not survive extraction.]
22
Mathews and Mathews 2008
Figures
• Photograph
• Drawing
• Gazinta
• Algorithm
• Map
• Line graph, bar graph, histogram, pie chart, pictograph – all used to promote understanding of numerical results
24
Line graph
Graphs are a good choice when you think that a relationship is more important to the reader than the actual numbers
25
Line graph
• Line graphs, scatter graphs, bar graphs, histograms, pies and pictographs are used to promote understanding of numerical results
• Tables present results
• Graphs promote understanding of results and suggest interpretation of their meaning
26
Table or figure?
Blood glucose levels

Time (hour)   Normal (mg/dl*)   Diabetic (mg/dl)
midnight      100.3             175.8
2:00           93.6             165.7
4:00           88.2             159.4
6:00          100.5              72.1
8:00          138.6             271.0
10:00         102.4             224.6
noon           93.8             161.8
2:00          132.3             242.7
4:00          103.8             219.4
6:00           93.6             152.6
8:00          127.8             227.1
10:00         109.2             221.3

* milligrams per deciliter

[Figure: line graph captioned "Blood glucose levels for normal individual and diabetic"; blood glucose level (mg/dl, 0 to 300) against hour (12:00, 6:00 am, 12:00, 6:00 pm, 12:00), with breakfast, lunch and dinner marked and separate lines for Normal and Diabetic]
27
Gustavii 2002
Line graph
[Figure: line graph of the number of confirmed cases (0 to 10 000) against year (1988 to 1992), with separate lines for USA and Canada. Caption: Changes in rabies disease incidence over time.]
28
Mathews and Mathews 2008
Line graph labeling
[Figure: two versions of the same line graph, pupil diameter (% change, -20 to 100) against minutes (0 to 150) for right eye and left eye; both panels are marked "Tyramine"]
29
Gustavii 2002
Line graph symbols
• Use standard symbols on line graphs (order below is suggested)
• In some cases there can be symbolic use of symbols, e.g. filled circle for treatment and unfilled circle for the control
Symbols for Line Graphs
30
Scatter graphs
[Figure: two scatter graphs of y against x (both axes 0 to 16), one labeled "visible pattern" and the other "no visible pattern"]
31
Katz 2006
Bar graph
• Used to present discrete (unrelated) variables in a forceful way
• Downside is that they present a relatively small amount of information in quite a large space
32
Comparative bar graph
This effective bar graph relates insect type to turning choices.
34
Mathews and Mathews 2008
Histogram
• An estimate of the probability distribution of a continuous variable
• Used to present continuous variables in a forceful way
37
Comparative histogram
Probability of dying in a coronary care unit after admission with initial working diagnosis of acute myocardial infarction.
[Figure: comparative histogram with probability (0 to 0.4) on the vertical axis; of the category labels only "<45" survives]
Can replace legend with symbols
38
Gustavii 2002
Comparative histogram
[Figure: comparative histogram of pH (0 to 6) against time (0, 0.5, 10, 20, 30, 40, 60 and 80 min); the group legend did not survive extraction]
Maximum three groups per category
39
Gustavii 2002
Pie graph
• Good for getting attention
• Show relationship of a number of parts to the whole
• Arrange segments in size order with largest at 12 o’clock
• Downside is that you cannot compare areas
40
Pie graph
Dandelion (50%)
Apple (25%)
Violet (20%)
Rose (5%)
Typical Honeybee Pollen Load Composition (n = 1,034 pellets)
This effective divided-circle graph shows which flowers contribute to a typical honeybee pollen load. To help readers compare the proportions, percentages are included.
41
Mathews and Mathews 2008
Pictograph
Number of Flowering Plant Species in West Suffolk County
1985 1990 1995 2000
110
75
65
55
In this effective pictograph, the length of the flower stems corresponds to the number of plant species.
43
Mathews and Mathews 2008
Using statistics
Using statistics properly is a skill
Never be afraid to ask for advice
Dr Tony Kozak
Wednesdays 8:30 – 11:00 am
FSC 2027 by appointment
tony.kozak@ubc.ca
47
Descriptive statistics
Usually want to reduce the volume of your data to a few characteristic numbers
These characteristic numbers are descriptive statistics
Certain descriptive statistics are particularly helpful in your Results section
48
Common descriptive statistics
• Size
• Range
• Middle
– Mean
– Mode
– Median
• Spread
– Standard deviation
– Central 50%
50
Size and range
• Size – this is the total number of data points referred to as N
• Real world data is referred to as the sample and the output of the mathematical formula is called the population
• Range – Distance between smallest and largest data values
51
Middle
• Mean – Average data value
• Mode – Data value that occurs most often
• Median – Value such that half the data values are less than this and half are greater
52
Spread
• Standard deviation – Typical deviation of the data points from the mean (square root of the average squared deviation)
• Large standard deviation means data points are more spread out
• Central 50% – Boundaries in which the middle half of the data points lie when all placed in order
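The descriptive statistics listed on the preceding slides can be computed in a few lines. The following is only a minimal sketch using Python's standard `statistics` module; the seedling heights are invented for illustration and are not data from the course.

```python
import statistics

# Hypothetical sample: heights (cm) of 11 seedlings, invented for illustration.
heights = [12, 15, 15, 16, 18, 19, 21, 22, 22, 22, 30]

n = len(heights)                           # size (N)
data_range = max(heights) - min(heights)   # range: largest minus smallest value
mean = statistics.mean(heights)            # middle: average value
mode = statistics.mode(heights)            # middle: most frequent value
median = statistics.median(heights)        # middle: half below, half above
sd = statistics.stdev(heights)             # spread: sample standard deviation
# Spread: quartiles; q1 and q3 bound the central 50% of the ordered data.
q1, _, q3 = statistics.quantiles(heights, n=4)

print(f"N = {n}, range = {data_range}")
print(f"mean = {mean:.1f}, mode = {mode}, median = {median}")
print(f"SD = {sd:.1f}, central 50% = {q1} to {q3}")
```

Note that quartile conventions differ slightly between software packages, so the central 50% boundaries may not match another program exactly.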
53
Referring to mean and standard deviation
Use mean (SD) = 44% (3)
mean of 44% (SD 3)
Not mean ± SD = 44 ± 3%
56
Standard error or standard deviation?
• Standard error (SE) is not a measure of variability
• Standard error is the standard deviation of a statistic and as such is a measure of precision for an estimate
• However, SE is often used descriptively and must be properly identified to avoid confusion
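The difference between SD and SE can be seen by summarizing a small and a large sample with the same spread. This is a sketch under stated assumptions: the data are invented, the helper name `summarize` is hypothetical, and the 95% confidence interval uses the large-sample normal approximation (mean ± about 1.96 SE), which is only one of several ways to compute it.

```python
import math
import statistics
from statistics import NormalDist

def summarize(values):
    """Return size, mean, SD, SE and an approximate 95% CI for the mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)        # variability of the data
    se = sd / math.sqrt(n)               # precision of the estimated mean
    z = NormalDist().inv_cdf(0.975)      # about 1.96 (normal approximation)
    return {"n": n, "mean": mean, "sd": sd, "se": se,
            "ci95": (mean - z * se, mean + z * se)}

# Invented data: same spread, different sample sizes.
small = summarize([4.0, 6.0] * 8)        # n = 16
large = summarize([4.0, 6.0] * 200)      # n = 400

for label, s in (("small", small), ("large", large)):
    print(f"{label}: n={s['n']}, SD={s['sd']:.2f}, SE={s['se']:.3f}, "
          f"95% CI={s['ci95'][0]:.2f} to {s['ci95'][1]:.2f}")
```

The SD is nearly the same for both samples, while the SE shrinks as the sample grows. That is why SE describes the precision of an estimate rather than the variability of the data.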
57
Inferential statistics
• Pure mathematics exists in an abstract universe, parallel to the real world
• Inferential statistics is done in the mathematical universe and infers the identity of the mathematical formula from the real world sample
58
Inferential statistics
• Statistical judgments are made by working on the formula in the mathematical universe
• Inferences are covered in your Discussion
59
Normal distribution
• A curve with a smooth bell shape
• Mean, median and mode have same value
• The exact shape of any normal distribution can be defined with just 2 numbers
– Its mean and
– Its standard deviation
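To illustrate that two numbers fully define a normal distribution, the sketch below builds one with Python's standard `statistics.NormalDist`. The mean and SD (3.4 kg and 0.5 kg) are assumed values chosen only for illustration, not measured birth weights.

```python
from statistics import NormalDist

# Assumed parameters for illustration only (kg).
weights = NormalDist(mu=3.4, sigma=0.5)

def share_within(dist, k):
    """Proportion of the distribution lying within k SDs of the mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

within_1sd = share_within(weights, 1)
within_2sd = share_within(weights, 2)
print(f"within 1 SD: {within_1sd:.1%}, within 2 SD: {within_2sd:.1%}")
```

The proportions within 1 and 2 SDs are the same for every normal distribution, whatever mean and SD are chosen.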
60
Normal distribution
• In the real world no data set makes a perfect curve with infinite smoothness
• Nevertheless, we frequently call real world data sets “normally distributed”
• Many large sets of real world data CAN be well approximated with a normal distribution (baby birth weights). Normal distributions are frequently used in statistical analyses
61
Normal distribution
• Examine your data set carefully
• Look at its shape and do not make any assumptions based on a normal distribution if you are not sure
• Check with a statistician to be certain
63
Non-normal distribution
Many sets of real world data are not normally distributed
– Consider the assignment grades in a graduate level communications course where data points are concentrated asymmetrically in the upper percent numbers
– Consider the histogram of the number of people dying at each age where asymmetry is in the upper ages
64
Non-normal distribution
When you have a non-normal distribution you cannot use mean and standard deviation to describe the distribution – you must use median and range
Consider the “hand-to-floor stretch” of pregnant women (Gustavii 2002)
– reported as mean of 12 cm (SD 14)
(Does this suggest some poked their fingers through the floor?)
– should have used median and percentile range
66
Non-normal distribution
Rule of thumb
If SD is greater than half the mean, the data are unlikely to be normally distributed
Most results in biomedical science are asymmetrically distributed
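The rule of thumb, applied to the hand-to-floor example (mean 12 cm, SD 14), can be checked numerically. This is a rough sketch: the function name is hypothetical, and the calculation simply asks what a normal distribution with those two numbers would imply.

```python
from statistics import NormalDist

def sd_exceeds_half_mean(mean, sd):
    """Rule of thumb: True suggests the data are unlikely to be normal."""
    return sd > mean / 2

mean_cm, sd_cm = 12, 14   # values reported for the hand-to-floor stretch
flagged = sd_exceeds_half_mean(mean_cm, sd_cm)

# If the data really were normal, this share of values would be below 0 cm.
share_below_zero = NormalDist(mu=mean_cm, sigma=sd_cm).cdf(0)
print(f"flagged: {flagged}, implied share below zero: {share_below_zero:.0%}")
```

A normal distribution with these values would place roughly one fifth of the women below zero, which is impossible for this measurement. Hence the median and a percentile range are the better summary.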
67
Hypothesis testing
• In hypothesis testing, you need to specify the probability of a type I error, or significance level (α)
• Usually use α = 0.05
• Results from hypothesis testing should include
– Test statistic
– Degrees of freedom
– P value
68
Choosing a significance test
Do not begin with a test in mind
Answer yes/no questions about what you want to assign confidence levels to
Is my data normally distributed?
Is my data random?
Does my data match someone else’s?
Does my data from exp A differ from data set of exp B?
69
Choosing a significance test
Now pick a significance test that will directly answer your questions using the data in the form that you have generated
Do not be afraid to ask for advice
70
Probability values
• P value is the probability of obtaining a value of the test statistic at least as large as that observed if chance alone were at work (that is, if the null hypothesis were true)
• Do not confuse this P value with the significance level of the test (α)
• Simply stating that a P value was greater or less than a significance level reduces interpretation to a yes or no
71
Probability values
• Yes/no answers do not indicate the chances of getting a more extreme result
• A P value of 0.04 and 0.06 could be interpreted similarly
• Reporting an actual P value allows the reader to evaluate the actual probability
72
Statistical reporting
Always report
• Name of test
• If data conformed to assumptions of test
• Absolute differences between groups
• 95% confidence interval for each difference
• Practical relevance of each difference
73
Statistical reporting
Always report
• Name of statistical software package that you have used – commercially available packages have usually been well validated, may not be case for custom packages
74
Statistical reporting
• Report statistics parenthetically with individual elements of a test separated by commas
…were significant (χ²=18.2, df=2, P<0.001)
• Use zero to left of decimal when reporting P values and correlation coefficients
...means differed by 17.8 g (p=0.23)
75
Statistical reporting
• Do not use more than 3 decimal places when reporting P values
• Use exact values rather than inequalities
• Smallest P value that needs to be reported is p<0.001
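The reporting rules above (elements separated by commas, a zero before the decimal point, at most 3 decimal places, and P<0.001 as the smallest reported value) can be sketched as a small formatter. The function names are hypothetical. The chi-square P value is computed only for the special case of 2 degrees of freedom, where it equals exp(-χ²/2) exactly; other cases would need a statistics package.

```python
import math

def format_p(p):
    """Format a P value: leading zero, at most 3 decimals, floor at P<0.001."""
    if p < 0.001:
        return "P<0.001"
    text = f"{p:.3f}".rstrip("0").rstrip(".")   # drop trailing zeros
    return f"P={text}"

def chi_square_p_df2(chi2):
    """Upper-tail P value of a chi-square statistic with exactly 2 df."""
    return math.exp(-chi2 / 2)

def report(name, statistic, df, p):
    """Parenthetical report: test statistic, degrees of freedom, P value."""
    return f"({name}={statistic}, df={df}, {format_p(p)})"

line = report("χ²", 18.2, 2, chi_square_p_df2(18.2))
print(line)
```

The same formatter gives `P=0.23` for a P value of 0.23, so exact values are kept wherever they are large enough to report.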
76
Statistical reporting
• Statistical methods do not need elaborate presentation – a simple statement of the chosen test and the probability level is usually all that is needed
• Reference a text that details the procedure if you feel that this is necessary
77
Statistical reporting (Mathews et al 2000)
To determine whether the two species differed in their egg cannibalism rate (Table 1), we used the Fisher Exact Probability Test, with p = (A+B)!(C+D)!(A+C)!(B+D)! / (N!A!B!C!D!), to obtain a p=0.05, which was not significant
Better
The differences in the egg cannibalism rates of the two species (Table 1) were not significant (Fisher Exact Probability Test, p=0.05)
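For readers curious about what the formula in the first version computes, here is a minimal sketch. The formula gives the probability of one 2 × 2 table with cells A, B, C, D; the test's P value is obtained by summing such probabilities over all tables with the same margins that are no more probable than the observed one. The counts used below are invented and are not the data in Table 1.

```python
from math import factorial

def table_probability(a, b, c, d):
    """Probability of one 2 x 2 table with fixed margins (the quoted formula)."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator

def fisher_exact_two_sided(a, b, c, d):
    """Sum the probabilities of all tables no more probable than the observed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    observed = table_probability(a, b, c, d)
    total = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, n - row1 - col1 + x)
        if p <= observed * (1 + 1e-9):   # small tolerance for float rounding
            total += p
    return total

# Invented counts for illustration: rows are species, columns are outcomes.
p_table = table_probability(3, 1, 1, 3)
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"table probability = {p_table:.3f}, two-sided P = {p_value:.3f}")
```

This is exactly the kind of detail that belongs in a referenced text rather than in the manuscript, which is the point of the "Better" version.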
78
Statistical significance & scientific importance
Scientific research yields 2 kinds of significance
Scientific
Statistical
Scientific importance is often ignored as it involves some subjectivity
Statistical significance is easy to convey but may lack scientific vigour
79
Statistical significance & scientific importance
A test result may be statistically significant but the difference between the means tested may be so small that it is scientifically irrelevant
Also, the power of a test increases with sample size and large samples may reveal differences that small ones would not
80
Statistical significance & scientific importance
Statistically significant results should always be accompanied by a discussion of the scientific importance of the findings
81
Statistical significance & scientific importance
Drug lowered blood pressure by a mean of 8 mm Hg, from 100 to 92 mm Hg
Statistically significant (p<0.05)
Better way to present this is with 95% confidence interval (CI)
Here, CI was 2 to 14 mm Hg
Scientifically important to decrease blood pressure by as much as 14 mm Hg; a reduction of 2 mm Hg would not be important
Example from Gustavii 2002
82
Statistical significance & scientific importance
In this example could have said
Blood pressure was lowered by a mean of 8 mm Hg from 100-92 mm Hg (95% CI=2-14 mm Hg; p=0.02)
P values estimate statistical significance
CI values also estimate scientific importance
When CI is used readers can judge for themselves
83
Potentially problematic statistical terms (CSE 2006)
Random sample: implies true randomization. Often confused with “sampling without known bias”
Confidence interval or limit: better to use interval, as limit implies 2 discrete and unchanging values
Standard deviation: better to note as SD rather than S. Does not need ± sign
84
Potentially problematic statistical terms (CSE 2006)
Standard error of the mean (SE) has little practical value on its own
Use SD (or interpercentile range) not SE to indicate variability in a set of data
Use CI rather than SE as a measure of precision for an estimate
85
Significant digits (CSE 2006)
• Calculated values (means, standard deviations) should be to no more than one significant digit beyond the accuracy of the data
• Only when sample sizes are large (>100) should percentages be expressed to one decimal place
86
Rounding numbers (CSE 2006)
To retain 3 significant digits
If 4th digit is less than 5, leave 3rd unchanged
4.282 becomes 4.28
If 4th digit is greater than 5, increase 3rd by 1
4.286 becomes 4.29
87
Rounding numbers (CSE 2006)
To retain 3 significant digits
If 4th digit is 5 and 5th is zero, leave 3rd digit unchanged when third digit is even
4.285 becomes 4.28
When 3rd digit is odd, increase it by 1
4.275 becomes 4.28
If 4th digit is 5 and 5th is not zero, increase 3rd by 1
4.2851 becomes 4.29
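The rounding rules on these two slides correspond to what is often called "round half to even". A minimal sketch with Python's standard `decimal` module follows; the function name `round_sig` is hypothetical, and values are passed as strings so that binary floating-point error does not change the digits being rounded.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_sig(text, digits=3):
    """Round a decimal number (given as a string) to `digits` significant digits."""
    value = Decimal(text)
    if value == 0:
        return value
    # Position of the last digit to keep, relative to the decimal point.
    exponent = value.adjusted() - (digits - 1)
    return value.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_EVEN)

for example in ("4.282", "4.286", "4.285", "4.275", "4.2851"):
    print(example, "becomes", round_sig(example))
```

Run as written, this reproduces the five examples on the slides.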
88
Numbers and units
Ranges and units – can use single unit after second number
23 to 47 km or 23 km to 47 km
Not so with percentages
10% to 15% not 10 to 15% (but 10-15% is acceptable)
Close up numbers and non-alphanumeric symbols
3 mm 44% $98
89
Scientific notation (CSE 2006)
Express very large or very small numbers to the power of 10 (scientific notation)
2.6 × 10⁴ ……. not 26 000
4.23 × 10⁸ …… not 423 000 000
7.41 × 10⁻⁶ …… not 0.000 007 41
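Converting a number to this form can be sketched with Python's built-in `e` format. The function name is hypothetical, and the caret (`^`) stands in for the superscript exponent used in print.

```python
def to_scientific(value, sig_digits):
    """Write `value` as 'm × 10^e' with the given number of significant digits."""
    mantissa, exponent = f"{value:.{sig_digits - 1}e}".split("e")
    return f"{mantissa} × 10^{int(exponent)}"

print(to_scientific(26000, 2))
print(to_scientific(423000000, 3))
print(to_scientific(0.00000741, 3))
```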
90
Writing numbers
Some rules
Most style manuals now suggest using numerals for all numbers (including those <10)
New rule: In 1 of the 19 forest stands…
Still need to spell out numbers at beginning of sentence
91
Writing numbers
Example following this rule:
Three thousand eight hundred and seventy-six seedlings were measured at 8-12 weeks following fertilizer treatment. One hundred and sixty-six (4.3%) were found to have increased height growth.
Correct, but do you find this difficult to grasp?
92
Writing numbers
Better to re-write so that numbers fall somewhere in the middle
Height measurements of 3 876 seedlings at 8-12 weeks following fertilizer treatment showed that 166 (4.3%) had increased growth.
93
Writing numbers
Numbers side by side:
The spiders with dorsal stripes had an average of 257, 112 red and 145 other colours
Need to separate:
The spiders had an average dorsal stripe count of 257, of which 112 were red and 145 were other colours
94
Writing numbers
• American and British practice is to indicate thousands with commas
• However, to avoid confusion with decimal marker, many style manuals recommend the use of a space to mark off thousands
12 345 (not 12,345)
Follow your journal style
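If a journal asks for spaces rather than commas, the conversion can be done when the numbers are formatted. This is a small sketch with a hypothetical helper name; it uses an ordinary space, whereas a journal may require a thin or non-breaking space.

```python
def space_thousands(number):
    """Format an integer with spaces marking off thousands, e.g. 12 345."""
    return f"{number:,}".replace(",", " ")

print(space_thousands(12345))
print(space_thousands(423000000))
```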
95
Using percentages
• If the total number is less than 25, do not use percentages
• If the total number is between 25 and 100, percentages should be expressed without decimals (7%, not 7.1%)
• If the total number is between 100 and 100 000, one decimal place may be added (7.1%, not 7.13%)
• Only if the total number exceeds 100 000 may two decimals be added (7.13%)
96
Using percentages
The original data should always be included
Order of presentation is important
Height growth occurred in 209 (7.5%) of the 2,801 trees
Do not write Height growth occurred in 7.5% (209) of the 2,801 trees
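The decimal-place rules and the recommended order (count first, percentage in parentheses, then the total) can be combined in one helper. This is a sketch only: the function names are hypothetical, and a total of exactly 100 is treated as "no decimals", which is an assumption because the slides leave that boundary open.

```python
def percent_decimals(total):
    """Decimal places allowed for a percentage, or None if none should be given."""
    if total < 25:
        return None          # too few observations: report counts only
    if total <= 100:         # assumption: exactly 100 gets no decimals
        return 0
    if total <= 100_000:
        return 1
    return 2

def describe(count, total, noun):
    """Give the count first, then the percentage, then the total."""
    decimals = percent_decimals(total)
    if decimals is None:
        return f"{count} of the {total} {noun}"
    percent = 100 * count / total
    return f"{count:,} ({percent:.{decimals}f}%) of the {total:,} {noun}"

print("Height growth occurred in " + describe(209, 2801, "trees"))
print("Height growth occurred in " + describe(3, 19, "forest stands"))
```

The comma used for thousands here follows the example on the slide; swap it for a space if the journal style requires one.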
97
Using percentages
Do not use prose descriptions for numerical data without the actual numbers
When 51 researchers were asked to quantify “often”, the range was between 28 and 92 percent (average 59%)
Better to say
Most of the trees (82%)….
98