
MedCalc: a new computer program for medical statistics


ELSEVIER Computer Methods and Programs in Biomedicine 48 (1995) 257-262


F. Schoonjans *, A. Zalata ², C.E. Depuydt, F.H. Comhaire

Department of Internal Medicine, Section for Endocrinology and Metabolic Diseases, University Hospital, De Pintelaan 185, B-9000 Gent, Belgium

Received 21 June 1995; revision received 2 September 1995; accepted 13 September 1995

In recent years, the use and abuse of statistics in the medical literature have been extensively reviewed. Among other issues, the importance of the P-value has been challenged and the use of misleading graphics, including 3-dimensional displays, has been criticized. The ease of access to more complex statistical procedures, since the introduction of several statistical software packages for personal computers, has been identified as one of the factors involved in the misuse of statistics. Therefore, we have developed a new computer program that includes those statistical procedures commonly encountered in the medical literature and in statistical textbooks for medical researchers. More complex statistical analyses are not implemented in the software. If researchers with limited statistical training require more sophisticated statistical analyses, they should refer to a statistician, not to a more complete statistical software package.

Keywords: Medical statistics; Statistical software; ROC curves; Bland and Altman plot

1. Introduction

Since the introduction of several statistical software packages for personal computers, more complex statistical procedures have become available to medical researchers. This, however, has not led to more reliable statistics [1]. In addition, errors in statistical software and manuals have been documented [2].

Among other issues, the importance of P-values has been overemphasized in the medical literature and more attention should be given to the use of confidence intervals [3-8].

* Corresponding author, Tel: +32-9-2402141; e-mail: [email protected]

¹ A demo of the program can be downloaded from the internet: http://allserv.rug.ac.be/~fschoonj/.

² On leave of absence from Department of Biochemistry, Faculty of Medicine, Mansoura University, Mansoura, Egypt.

Although correlation and regression are two distinct statistical methods, serving different purposes, they are commonly confused [1,9]. One of the reasons may be that computer programs treat correlation and regression simultaneously.

The use of 3-dimensional graphs may be important for publicity and marketing of some statistical packages, but they do not have any scientific significance [10]. Other graphs, such as the frequently seen bar chart with error bars representing 1 standard deviation (S.D.) or 1 standard error of the mean (S.E.M.), are not useful [10] and may even be misleading [3].

0169-2607/95/$09.50 © 1995 Elsevier Science Ireland Ltd. All rights reserved. SSDI 0169-2607(95)01703-V

Since the use of statistical procedures in the medical literature is well known [11] (Table 1), a specific software package for medical researchers can be limited to those statistics and tests described and used in medical journals.

Table 1
Statistical content of 115 original articles in volume 321 (1989) of the New England Journal of Medicine [11]

Procedure                                        n (%)
No statistical methods or descriptive methods    14 (12)
t-tests                                          45 (39)
Contingency tables                               41 (36)
Survival methods                                 37 (32)
Epidemiologic statistics                         25 (22)
Non-parametric tests                             24 (21)
Analysis of variance                             23 (20)
Pearson correlation                              22 (19)
Multiple regression                              16 (14)
Multiway tables                                  11 (10)
Multiple comparisons                             10 (9)
Adjustment and standardization                   10 (9)
Simple linear regression                         10 (9)
Transformation                                    8 (7)
Power                                             4 (3)
Non-parametric correlation                        1 (1)
Cost-benefit analysis                             0 (0)
Other methods                                    10 (9)

All these aspects have been taken into account during the development of a new computer program for medical statistics named MedCalc.

2. Description of the software

2.1. Program description

The program is written in Modula-2 [12], using the Topspeed Modula-2 development system (version 1.17). This package was enhanced by adding several new libraries and modules in order to keep the system up to date (mouse functions, menu and form manager, extended memory functions).

2.2. User interface

The requirements to run the MedCalc program are an IBM-compatible AT computer, MS-DOS version 3.00 or higher, a Microsoft-compatible mouse, 640 KB of RAM and a hard disk. Extended memory is not required, but is useful.

MedCalc uses an up-to-date user interface including drop-down menus, dialogue panels and a window display system (Fig. 1). Data can easily be entered in the integrated spreadsheet window. The user can import Lotus, dBASE or delimited ASCII files. A built-in text editor can be used to document the data files. Comprehensive on-line help is available. The complete menu structure is represented in Table 2.

Fig. 1. The user interface includes drop-down menus and a window display system. Data are entered in an integrated spreadsheet.

Table 2
The program's menu structure

Menu         Options
Info         About MedCalc, File information, System information, Help index, Glossary
File         New, Load file, Save, Save as, Add file, Read Lotus file, Read dBASE file, Import ASCII file, Export ASCII file, Delete file, Print data, Quit MedCalc
Data         Open spreadsheet window, Find, Find and Replace, Sort, Define column, Fill column, Calculate, Insert rows/columns, Delete rows/columns, Clear rows/columns, List variables, Create variables list, Remove variable, List selection criteria, Clear selection criteria, Remove criterion
Statistics   Summary statistics, Histogram, Cumulative distribution, Box-and-whisker plot, Correlation, Rank correlation, Scatter diagram, Regression, Scatter diagram and Regression line, Multiple regression, t-tests, Wilcoxon tests, One-way analysis of variance, Two-way analysis of variance, Kruskal-Wallis test, Contingency table
ROC          ROC curve analysis, Display ROC curve, Interactive dot diagram, Predictive values, Comparison of ROC curves, Display ROC curves
Tests        Test for one mean, Chi-square test, Comparison of means (t-test), Comparison of standard deviations (F-test), Correlation coefficients, Percentages, Relative risk, Odds ratio, Inter-rater agreement
Graphs       Data comparison graphs, Multiple comparison graphs, Bland and Altman plot, Kaplan-Meier survival curve, Quality control chart
Sampling     Single mean, Single proportion, Comparison of two means, Comparison of two proportions
Notes        Open text window, Clear text, Print text, Page layout, Import text, Export text, Print results
Options      Settings, Row synchronization, Column synchronization, Directories, Graphics board, Graphics printer, Graph colours, Save options

2.3. Spreadsheet

In 1989, we developed the 3-dimensional spreadsheet 3D-Calc for Atari ST computers [13,14]. In this spreadsheet program we introduced the STDS and VARS functions for the calculation of the sample standard deviation and variance. These functions are now common in newer versions of other spreadsheet programs (Lotus, Excel) and are also available in the MedCalc spreadsheet.
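To make the distinction concrete, the following is a minimal Python sketch of a sample variance and sample standard deviation. It assumes the usual textbook definition with an n - 1 divisor; the function names `sample_variance` and `sample_sd` are illustrative and are not the spreadsheet's own STDS and VARS implementations, which the text does not describe in detail.

```python
import math


def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


if __name__ == "__main__":
    # Hypothetical measurements, used only to exercise the functions.
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(sample_variance(data), sample_sd(data))
```

Dividing by n - 1 rather than n matters most for the small samples that are common in clinical studies.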

The names of the variables are entered in the top row of the different columns of the spreadsheet. Cells may contain numeric or alphanumeric entries, or formulas. These formulas may include relative or absolute cell references. The functions' arguments may also be the name of a variable. Spreadsheet functions can also be used for data transformation, e.g. the LOG function for logarithmic transformations.

Up to 42 mathematical, statistical, string, logical and date functions, similar to the ones known in other spreadsheet programs such as Lotus 123, can be used in the spreadsheet formulas. Some special functions are, for example, the RND function, which generates uniformly distributed random numbers, and the RNDNORM function, which generates random numbers drawn from a normal distribution.

The MedCalc spreadsheet consists of 256 columns and 2048 rows. This means that up to 256 variables can be entered for 2048 cases. If extended memory is available, all these data can be stored in memory without problems. Furthermore, when started from within Windows, MedCalc can make use of Windows' virtual memory manager. If the user does not use Windows, the MedCalc program can be started with the N parameter. It will then use an integrated virtual memory system.

2.4. Text module

A simple text editor is integrated in the program. The notes entered in this editor are saved with the spreadsheet data file. Since categorical variables are preferably entered in the spreadsheet by means of numeric codes, this editor can be used, for instance, for documenting and describing the codes used for the different variables.

Fig. 2. Presentation of results for summary statistics.

2.5. Statistics

The statistical procedures and graphs implemented in MedCalc can be recognized in the menu structure (Table 2).

In the summary statistics display (Fig. 2), MedCalc automatically gives the 95% confidence interval of the mean and the median. A test for normal distribution is always performed. Besides the classical 5th, 10th, 25th, 75th, 90th and 95th percentiles, the program also lists the 2.5th and 97.5th percentiles, giving the 95% central range.
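As an illustration of the quantities listed in the summary statistics display, here is a minimal Python sketch that computes a 95% confidence interval for the mean and the 2.5th and 97.5th percentiles. It rests on two assumptions that are not taken from the program: the interval uses the large-sample normal critical value (a t-based value would be wider for small samples), and percentiles are obtained by linear interpolation between ordered observations. The names `mean_ci`, `percentile` and `central_range` are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Confidence interval for the mean using a normal critical value (large-sample assumption)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    sem = stdev(values) / math.sqrt(n)          # standard error of the mean
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    m = mean(values)
    return m - crit * sem, m + crit * sem


def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between ordered observations."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no observations")
    k = (len(xs) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def central_range(values):
    """95% central range: the 2.5th and 97.5th percentiles."""
    return percentile(values, 2.5), percentile(values, 97.5)


if __name__ == "__main__":
    # Hypothetical data: the integers 1..101.
    data = list(range(1, 102))
    print(mean_ci(data))
    print(central_range(data))
```

Several percentile conventions exist, so results for small samples may differ slightly between programs.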

The program makes a clear distinction between correlation and regression, which are different options in the menu. The correlation coefficient is given with a 95% confidence interval. Correlation is accompanied by a scatter diagram. Regression is accompanied by a scatter diagram plus regression line and confidence intervals. The regression line does not exceed the range of observations.
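A confidence interval for a correlation coefficient can be sketched as follows. The text does not say which method the program uses; this example assumes the commonly taught Fisher z transformation, and the function names `pearson_r` and `correlation_ci` are illustrative only.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_ci(r, n, level=0.95):
    """Confidence interval for r via the Fisher z transformation (assumed method)."""
    if n < 4:
        raise ValueError("at least four pairs are required")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                 # Fisher z transform
    se = 1 / math.sqrt(n - 3)         # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


if __name__ == "__main__":
    # Hypothetical example: r = 0.60 observed in 50 pairs.
    print(correlation_ci(0.60, 50))
```

Because the interval is built on the transformed scale, it is asymmetric around r and never extends beyond -1 or 1.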

When researchers want to compare their results with results reported in the literature and do not have access to the raw data, they can still perform statistical tests on the summary figures (e.g. comparison of means, percentages, correlation coefficients). This is a feature rarely seen in other statistical programs.
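A comparison of two means from published summary figures can be sketched in a few lines of Python. The example assumes a two-sample Student t-test with pooled variance, which is one standard choice and not necessarily the program's exact procedure; the function name `t_from_summary` and the numbers in the usage example are hypothetical.

```python
import math


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and degrees of freedom from means, SDs and sample sizes (pooled variance)."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    return (mean1 - mean2) / se_diff, df


if __name__ == "__main__":
    # Own study versus a published series (all figures invented for illustration).
    t, df = t_from_summary(10.0, 2.0, 8, 8.0, 2.0, 8)
    print(t, df)
```

The P-value then follows from the t distribution with the returned degrees of freedom, for example from a printed table.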

The program does not offer 3-dimensional graphs. This should be considered an asset, not a shortcoming, since neither 3-D realism nor perspective features are considered useful for scientific presentation [10].

The well-known bar and line charts with error bars are available in the program. However, these error bars do not represent 1 standard deviation (S.D.) or 1 standard error of the mean (S.E.M.), a presentation which is suggestive and leads to misinterpretation [3,9,10]. Several better options are available, including error bars representing 2 S.D. or a 95% confidence interval of the mean. As alternatives, the box-and-whisker plot [15] and the notched box-and-whisker plot [16] are also available. The latter plot allows for a pairwise comparison of the medians at the 95% confidence level. Furthermore, all these graphs offer the possibility of also displaying individual observations [3].

The program includes some statistical tests which are not available in other commercial packages, namely Receiver Operating Characteristic (ROC) curve analysis and the Bland and Altman plot for method comparison. Both the ROC and Bland and Altman methods have been described in medical journals [17-20] and in statistical textbooks for medicine [1].

In ROC curve analysis [21], the power of a test to discriminate between two populations is studied, one population known to be positive and one known to be negative. For every possible threshold value, the program calculates the test's sensitivity, specificity and predictive power (Fig. 3), and the ROC plot is obtained by plotting the sensitivity for every threshold value against 1 - specificity (Fig. 4). The further the curve shifts to the upper left corner of the ROC plot, the better the characteristic can discriminate between the two populations. If the ROC curve coincides with the diagonal, the characteristic cannot separate diseased from non-diseased cases and, therefore, has no diagnostic value. The ROC curve for an ideal diagnostic test (perfect separation) reaches the upper left hand corner. In a second graphical display, an interactive dot diagram, the user can use the mouse to select different threshold values and read the corresponding sensitivity and specificity from the screen (Fig. 5). In addition, the program has an option to compare the areas under two ROC curves [22].

Fig. 3. Presentation of results for Receiver Operating Characteristic curve analysis.

Fig. 4. Receiver Operating Characteristic curve: graphical display.
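The threshold-by-threshold calculation described above can be sketched in Python. This is an illustration under stated assumptions, not the program's code: higher test values are taken to indicate the positive group, a case counts as test-positive when its value is at or above the threshold, and the area under the curve is computed through its known equivalence with the Mann-Whitney statistic. The names `roc_points` and `auc` and the sample values are hypothetical.

```python
def roc_points(positives, negatives):
    """Sensitivity and specificity for every observed threshold (test positive if value >= threshold)."""
    if not positives or not negatives:
        raise ValueError("both groups must contain observations")
    points = []
    for threshold in sorted(set(positives) | set(negatives)):
        sensitivity = sum(v >= threshold for v in positives) / len(positives)
        specificity = sum(v < threshold for v in negatives) / len(negatives)
        points.append((threshold, sensitivity, specificity))
    return points


def auc(positives, negatives):
    """Area under the ROC curve: share of positive/negative pairs ranked correctly, ties count half."""
    if not positives or not negatives:
        raise ValueError("both groups must contain observations")
    score = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5
    return score / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Invented marker values for a diseased and a non-diseased group.
    diseased = [3, 4, 5]
    healthy = [1, 2, 3]
    for threshold, sens, spec in roc_points(diseased, healthy):
        print(threshold, sens, 1 - spec)  # plot sens against 1 - spec to draw the curve
    print(auc(diseased, healthy))
```

An area of 0.5 corresponds to a curve on the diagonal (no diagnostic value) and an area of 1.0 to perfect separation.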

The Bland and Altman plot is a relatively new graphical method for method comparison studies (Fig. 6) [20]. The graph is basically a scatter diagram of the differences plotted against the averages of two measurements. Horizontal lines are drawn at the mean difference and at the mean difference plus and minus 2 times the standard deviation of the differences. The plot is useful to reveal a possible relationship between the differences and the averages, to look for any systematic biases and to identify possible outliers. If the differences within mean ± 2 S.D. are not clinically important, the two methods may be used interchangeably. The Bland and Altman plot may also be useful to assess the repeatability of a method by comparing repeated measurements using one single method on a series of subjects. The graph can then also be used to check whether the variability or precision of a method is related to the size of the characteristic being measured.

Fig. 5. Interactive dot diagram. The user can select different cut-off values by clicking with the mouse in the display and read the corresponding sensitivity and specificity from the screen.

Fig. 6. Bland and Altman plot [20] for method comparison. In this graph, the differences between two methods are plotted against the averages of the two methods. Horizontal lines represent the mean difference and the mean difference plus and minus 2 standard deviations.
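The numbers behind a Bland and Altman plot can be computed with a short Python sketch. It follows the description in the text (differences against averages, horizontal lines at the mean difference and at the mean difference plus and minus 2 standard deviations of the differences); the function name `bland_altman`, the dictionary keys and the measurement values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(method1, method2):
    """Points and horizontal reference lines for a Bland and Altman plot of paired measurements."""
    if len(method1) != len(method2) or len(method1) < 2:
        raise ValueError("two equally long series with at least two pairs are required")
    differences = [a - b for a, b in zip(method1, method2)]
    averages = [(a + b) / 2 for a, b in zip(method1, method2)]
    mean_diff = mean(differences)   # systematic bias between the methods
    sd_diff = stdev(differences)    # sample S.D. of the differences
    return {
        "averages": averages,           # x-axis values
        "differences": differences,     # y-axis values
        "mean_difference": mean_diff,
        "lower_limit": mean_diff - 2 * sd_diff,
        "upper_limit": mean_diff + 2 * sd_diff,
    }


if __name__ == "__main__":
    # Invented paired measurements from two methods on four subjects.
    result = bland_altman([10, 12, 14, 16], [9, 12, 13, 18])
    print(result["mean_difference"], result["lower_limit"], result["upper_limit"])
```

Whether the resulting limits are narrow enough for the methods to be used interchangeably is a clinical judgement, not a statistical one.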

3. Conclusion

Several statistical computer packages are available for personal computers and medical doctors can now perform sophisticated statistical analyses. However, when unqualified users turn to a software package designed for statisticians, there is a danger of misinterpretation and error.

We have described a new computer program which fulfils the needs of medical researchers. If these researchers, who often have limited statistical training [23,24], require more sophisticated statistical analyses, they should refer to a statistician and should not be tempted to use more complete and also more complex statistical packages. Nevertheless, there is still a need for greater emphasis on statistical training in medical education.

References

[l] D.G. Altman, Practical statistics for medical research. (Chapman and Hall, London, 1991).

[2] J.M. Bland and D.G. Altman, Misleading statistics: errors in textbooks, software and manuals, Int. J. Epidemiol. 17 (1988) 245-247.

[3] D.G. Altman, S.M. Gore, M.J. Gardner and S.J. Pocock, Statistical guidelines for contributors to medical journals, Br. Med. J. 286 (1983) 1489-1493.

[4] M.J. Gardner and D.G. Altman, Confidence intervals rather than P values: estimation rather than hypothesis testing, Br. Med. J. 292 (1986) 746-750.

[5] C.J. Bulpitt, Confidence intervals, Lancet i (1987) 494-497.

[6] S.J.W. Evans, P. Mills and J. Dawson, The end of the p value? Br. Heart J. 60 (1988) 177-180.

[7] International Committee of Medical Journal Editors, Uniform requirements for manuscripts submitted to biomedical journals, N. Engl. J. Med. 324 (1991) 424-428.

[8] G. Guyatt, R. Jaeschke, N. Heddle, D. Cook, H. Shannon and S. Walter, Basic statistics for clinicians: 2. Interpreting study results: confidence intervals, Can. Med. Assoc. J. 152 (1995) 169-173.

[9] D.G. Altman, Statistics and ethics in medical research. VI - Presentation of results, Br. Med. J. 281 (1980) 1542-1544.

[10] L. Wilkinson, Graphical displays, Stat. Methods Med. Res. 1 (1992) 3-25.

[11] J.D. Emerson and G.A. Colditz, Use of statistical analysis in the New England Journal of Medicine, in Medical Uses of Statistics, eds. J.C. Bailar III and F. Mosteller, pp. 45-57 (NEJM Books, Boston MA, 1992).

[12] N. Wirth, Programming in Modula-2, 3rd ed (Springer-Verlag, Berlin, 1985).

[13] F. Schoonjans, 3D-Calc: the 3-dimensional spreadsheet. Program manual, 1st ed (Istari Corp., Ghent, Belgium, 1989).

[14] F. Schoonjans, 3D-Calc: the 3-dimensional spreadsheet. Program manual, 3rd ed (FaST Club, Nottingham, UK, 1994).

[15] J.W. Tukey, Exploratory data analysis (Addison-Wesley, Reading, MA, 1977).

[16] R. McGill, J.W. Tukey and W.A. Larsen, Variations of box plots, Am. Stat. 32 (1978) 12-16.

[17] C.E. Metz, Basic principles of ROC analysis, Semin. Nucl. Med. 8 (1978) 283-298.

[18] P.F. Griner, R.J. Mayewski, A.I. Mushlin and P. Greenland, Selection and interpretation of diagnostic tests and procedures, Ann. Int. Med. 94 (1981) 553-600.

[19] D.G. Altman and J.M. Bland, Diagnostic tests, 3: receiver operating characteristic plots, Br. Med. J. 309 (1994) 188.

[20] J.M. Bland and D.G. Altman, Statistical methods for assessing agreement between two methods of clinical measurement, Lancet i (1986) 307-310.

[21] M.H. Zweig and G. Campbell, Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine, Clin. Chem. 39 (1993) 561-577.

[22] J.A. Hanley and B.J. McNeil, A method of comparing the areas under receiver operating characteristic curves derived from the same cases, Radiology 148 (1983) 839-843.

[23] H.R. Wulff, B. Andersen, P. Brandenhoff and F. Guttler, What do doctors know about statistics? Stat. Med. 6 (1987) 3-10.

[24] D.G. Altman and J.M. Bland, Improving doctors' understanding of statistics (with discussion), J. R. Stat. Soc., A. 154 (1991) 223-267.