Statistics in the Life and Medical Sciences
Author(s): Norman E. Breslow
Source: Journal of the American Statistical Association, Vol. 95, No. 449 (Mar., 2000), pp. 281-282
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2669546


Statistics in the Year 2000: Vignettes

The turn of the century and the millennium is a natural time to review some of the major developments in statistics and to consider research areas that may see increased activity in the years to come.

In the year 2000, the Journal of the American Statistical Association will publish a series of short articles under four subject areas: life and medical sciences, social science and business, physical sciences and engineering, and theory and methods. One subject area will be featured in the Review Section of each of the four issues in 2000.

Each of these vignette collections will highlight some of the most important statistical advances in the given area, as well as outline potentially fruitful topics of research.

We examine our statistical past, comment on our present, and speculate on our future. These will not be exhaustive pieces, but rather selected "snapshots" of the world of statistics at the end of the twentieth century. Each of these four vignette collections will be coordinated and edited by a Guest Editor and will consist of vignettes by distinguished authors. In this issue we are pleased to present the life and medical sciences vignettes, edited by Professor Norman Breslow.

Adrian E. Raftery, Martin A. Tanner, Martin T. Wells

Statistics in the Life and Medical Sciences Norman E. BRESLOW

One of the pleasures of working as an applied statistician is the awareness it brings of the wide diversity of scientific fields to which our profession contributes critical concepts and methods. My own awareness was enhanced by accepting the invitation from the editors of JASA to serve as guest editor for this section of vignettes celebrating the significant contributions made by statisticians to the life and medical sciences in the twentieth century. The goal of the project was not an encyclopedic catalog of all the major developments, but rather a sampling of some of the most interesting work. Of the 12 vignettes, 10 focus on particular areas of application: environmetrics, wildlife populations, animal breeding, human fertility, toxicology, medical diagnosis, clinical trials, environmental epidemiology, statistical genetics, and molecular biology. The two vignettes that begin the series focus more on methods that have had, or promise to have, impact across a range of subject matter areas: survival analysis and causal analysis.

The concept of a counterfactual true treatment effect was introduced by Neyman for agricultural field experiments in the 1920s, and Fisher's method of randomization provided a physical basis for making causal inferences. Bradford Hill's advocacy of these principles for use in medicine led to the randomized, double-blind, placebo-controlled clinical trial. As Harrington points out, this was arguably the most important scientific advance in medicine during the twentieth century. Greenland's vignette describes recent theory and methods developed from these same foundations for causal analysis of observational data that may help sort out some vexing public health issues.

The impact of survival analysis has been immense. Weinberg and Dunson discuss survival methods for population monitoring of fertility. Ryan describes how transition rate models for carcinogenicity underlie the analysis and interpretation of data from the lifetime rodent bioassay, which still strongly influence regulatory policy. Oakes mentions the importance of multivariate survival methods for genetic epidemiology, Pollock cites applications to wildlife studies, and Gianola notes increased use of survival models even in animal breeding. But these many applications still represent only a small sampling of the whole. Kaplan and Meier's product limit estimate, Mantel and Peto's log-rank test, and Cox's proportional hazards regression model are the indispensable tools of a large cadre of statisticians working on clinical trials in industry, government, and academia. The fact that Cox received the 1990 General Motors prize for clinical cancer research underscores the enormously beneficial impact of this work on clinical medicine.

Norman E. Breslow is Professor of Biostatistics, University of Washington, Seattle, WA 98195 (E-mail: [email protected]).

Preventive medicine has been no less affected by the concepts and methods of survival analysis. The key epidemiologic measure of incidence rate is rooted firmly in the centuries-old tradition of the life table, whereas the more recent concept of relative risk is best understood as a ratio of such rates. The proportional hazards model provided the mathematical foundation for classical epidemiologic methods of relative risk estimation. It paved the way for modern developments by connecting the field to Fisher's likelihood inference and its semiparametric extensions. Particularly important are the new epidemiologic designs that have been stimulated by ideas from survival analysis: the nested case-control design, the case-cohort design, the case-crossover design, and two-phase stratified versions of all of these.

© 2000 American Statistical Association
Journal of the American Statistical Association
March 2000, Vol. 95, No. 449, Vignettes


The vignettes by Oakes and Thomas reference some of this work and cite recent, comprehensive reviews.
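The link between incidence rates, relative risk, and the proportional hazards model described above can be written out briefly. The notation here is an assumption introduced for illustration and does not appear in the original text: $\lambda_1$ and $\lambda_0$ denote incidence rates in exposed and unexposed groups, $x$ is a binary exposure indicator, $\lambda_0(t)$ is a baseline hazard, and $\beta$ is a regression coefficient.

```latex
\[
\text{incidence rate} \;=\; \frac{\text{number of new events}}{\text{person-time at risk}},
\qquad
\mathrm{RR} \;=\; \frac{\lambda_1}{\lambda_0}.
\]
\[
\lambda(t \mid x) \;=\; \lambda_0(t)\, e^{\beta x}
\quad\Longrightarrow\quad
\frac{\lambda(t \mid x = 1)}{\lambda(t \mid x = 0)} \;=\; e^{\beta}.
\]
```

Under this sketch the rate ratio does not depend on $t$, which is the sense in which estimating $\beta$ in a proportional hazards model amounts to estimating a relative risk.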

Hierarchical modeling is a cross-cutting development whose great importance is chronicled in several vignettes. Statisticians who have discovered its value in their own areas of application owe a great debt to the pioneering efforts of those working in the field of animal breeding, notably Henderson, and Patterson and Thompson. Gianola argues that the mixed model equations and their best linear unbiased predictors (BLUPs) of genetic value are probably "the most important technological contribution of statistics to animal breeding." Analogous predictors of random effects in both linear and nonlinear mixed-effects models play no less a role in spatial statistics. Thomas, for example, notes their value for smoothing of small area disease rates prior to map construction.

Although hierarchical modeling can proceed using only the mixed model equations and restricted maximum likelihood (REML) estimation of variance components, the advantages of a full Bayes approach are increasingly apparent. Gianola argues that this provides the only satisfactory solution to assessing uncertainty in variance components and BLUPs. Markov chain Monte Carlo (MCMC) calculations, furthermore, are essential for fitting models with large numbers of random effects (he cites a case with 700,000). Thomas calls attention to the importance of Bayes model averaging techniques in epidemiology. Guttorp cites several applications of MCMC for spatial prediction in environmental problems, and Wong notes the use of MCMC for multiple alignment of DNA sequence data. But Bayesians are not alone in their use of MCMC and other computationally intensive procedures. Efron's bootstrap has also dramatically impacted both the theory and practice of statistics. Pollock in particular notes its application to capture-recapture data.

Public health statisticians tend to favor marginal mean regression models over their hierarchical counterparts, because the parameters then have a desired interpretation in terms of population averages. The generalized estimating equation (GEE) approach with a specified "working" correlation matrix, as developed by Liang and Zeger, has revolutionized the analysis of longitudinal and other forms of clustered data. Ryan notes the impact of these methods on the analysis of data from reproductive toxicology studies, where the correlation of outcomes among littermates is of little intrinsic interest, and Thomas mentions their importance in epidemiology. Pepe cites both marginal and hierarchical approaches to the analysis of receiver operating characteristic data.

This series of short vignettes provides a sampling of the fascinating statistical problems that arise from the life and medical sciences, of the crucial contributions made by statisticians to those sciences, and of the statistical concepts and techniques that have led to this success. They confirm that the statistics of the twenty-first century will be heavily influenced by the revolutionary developments in technology, particularly in the information and biomedical sciences, and by the availability of vast new repositories of geographic and molecular data. The authors, referees, and editors who have contributed their hard work to this project will be amply rewarded if the series helps to attract students of statistical science into the fields that have so stimulated their own interest and productivity.

Survival Analysis David OAKES

1. INTRODUCTION

Survival analysis concerns data on times $T$ to some event; for example, death, relapse into active disease after a period of remission, failure of a machine component, or time to secure a job after a period of unemployment. Such data are often right-censored; that is, the actual survival time $T_i = t_i$ for the $i$th subject is observed only if $t_i < c_i$ for some potential censoring time $c_i$. Otherwise, the fact that $\{T_i > c_i\}$ is observed, but the actual value of $T_i$ is not. For example, in a study of mortality following a heart attack, we will typically know the exact date of death for patients who died, but for those patients who survived, we will know only that they were alive on the date of their last follow-up. As an important but sometimes overlooked practical point, these event-free follow-up times must be recorded to allow any meaningful analysis of the data. Usually the $c_i$ will vary from patient to patient, typically depending on when they entered the study. The paper of Kaplan and Meier (1958) in this journal brought the analysis of right-censored data to the attention of mathematical statisticians by formulating and solving this estimation problem via nonparametric maximum likelihood. Over the next few years, attention focused largely on extending nonparametric tests, such as logrank, Wilcoxon, and Kruskal-Wallis, to allow for possible right censoring. In this context, Efron (1967) introduced the notion of self-consistency ("to thine own self be true"), a key to the modern approach to missing-data problems via the EM algorithm (Dempster, Laird, and Rubin 1977).

David Oakes is Professor and Chair, Department of Biostatistics, University of Rochester, Rochester, NY 14642. This work was supported in part by National Cancer Institute grant R01 CA52572.
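To make the handling of right-censored data concrete, the following is a minimal sketch of the product-limit (Kaplan-Meier) estimate of the survival function. The function name `kaplan_meier`, its argument names, and the five-subject data set in the usage note are hypothetical and chosen only for illustration; the sketch assumes the usual convention that a subject censored at time t is still counted as at risk for events occurring at t.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : observed follow-up time for each subject, i.e. the event
             time if the event was seen, else the censoring time
    events : 1 if the event was observed, 0 if right-censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    deaths = Counter(t for t, d in zip(times, events) if d)
    exits = Counter(times)      # everyone leaving the risk set at t
    at_risk = len(times)        # size of the risk set just before t
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Events and censored subjects both leave the risk set after t.
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Hypothetical data: subjects 3 and 5 are censored (event flag 0).
    follow_up = [2, 3, 3, 5, 8]
    observed = [1, 1, 0, 1, 0]
    for t, s in kaplan_meier(follow_up, observed):
        print(f"t = {t}: S(t) = {s:.3f}")
```

With the hypothetical data above, the estimate steps down at the three observed event times (2, 3, and 5). The censored subjects contribute to the risk sets up to their last follow-up, which is why those event-free times must be recorded.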
