FREE-KNOT SPLINES AND BOOTSTRAPPING FOR NONLINEAR MODELING IN COMPLEX SAMPLES

by

SCOTT W. KEITH

DAVID B. ALLISON, CHAIR

CHARLES R. KATHOLI

CHARLES D. COWAN

NENGJUN YI

OLIVIA THOMAS

EDWARD W. GREGG

A DISSERTATION

Submitted to the graduate faculty of The University of Alabama at Birmingham, in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

BIRMINGHAM, ALABAMA

2008

Copyright by Scott W. Keith

2008

FREE-KNOT SPLINES AND BOOTSTRAPPING FOR NONLINEAR MODELING IN COMPLEX SAMPLES

Scott W. Keith

BIOSTATISTICS

ABSTRACT

Studies on body mass index (BMI) as it relates to headache or mortality have

noted considerable nonlinearity. Approaches to rigorously analyzing these relationships

have been limited to generalized linear models and survival models using either

categorizations or polynomials of BMI. I have designed, evaluated, and implemented a

piecewise linear logistic regression framework for modeling nonlinearity between a

binary outcome and a continuous predictor, such as BMI, adjusted for covariates in

complex samples. Least squares and maximum likelihood estimation methods were used

to numerically optimize free-knot splines. Inference methods utilized both parametric and

nonparametric bootstrapping. Parameter estimates were structured for interpretability by

investigators familiar with logistic regression. Unlike other nonlinear software, this

framework accounts for multistage cross-sectional survey sample designs.

I applied this framework to complex datasets to examine the US population for

headache among women and mortality as they respectively relate to BMI. For headache,

datasets included the National Health Interview Survey (NHIS) and the first National

Health and Nutrition Examination Survey (NHANES I). A common nadir in the BMI-

headache relationship was detected around a BMI of 20, relative to which mild obesity

(BMI of 30) and severe obesity (BMI of 40) were respectively associated with roughly

35% and 80% increased odds of headache. Mortality analyses focused on NHANES III.

BMI showed a checkmark-shaped relationship with odds of mortality, but elevated BMI

did not show significantly increased odds. This was unexpected and the product of a

nascent analysis plan. Thus, this finding should be viewed as preliminary. Waist-to-hip

ratio (WHR) has been used as an anthropometric predictor of mortality risk, but the shape

of the relationship has not been carefully examined. For comparison with BMI, I

investigated WHR in NHANES III. Linear logistic regression methods were sufficient for

the WHR-mortality relationship, but WHR was a significant predictor for women only.

The results of these studies relate broadly to the US population and the methods

provide a flexible logistic regression framework for detecting and characterizing

nonlinear relationships. The estimates may provide impetus for more focused obesity,

headache, and mortality research which might realistically affect long-term public health

policy and risk awareness.

DEDICATION

I owe a great debt of gratitude to my best friend and wife, Aimee A. Dugas. Her

love and tireless support of me in developing my career have been a priceless gift. It is to

her that I dedicate this work.

ACKNOWLEDGMENTS

I would like to thank the members of this dissertation committee for their

guidance, criticism, and support. In particular, I wish to acknowledge my friend and

mentor, David B. Allison, whose trust and encouragement have profoundly impacted my

thinking, skill set, and potential for success. Much appreciation also goes to Tapan Mehta

for his assistance in running simulation programs and porting code for high-performance

parallel computing. Thanks to the International Journal of Obesity, which published

“Putative contributors to the secular increase in obesity: Exploring the roads less

traveled,” and to Obesity, which published “BMI and headache among women: Results

from 11 epidemiologic datasets.” This research was supported in part by

the following NIH Grants: T32HL079888, P30DK056336, and R01DK076771; and

Ortho-McNeil Pharmaceutical, Inc.

TABLE OF CONTENTS

ABSTRACT
DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

INTRODUCTION
    Goals and Objectives
    How These Ideas Developed
    An Overview of the Components of This Research
        Piecewise Linear Free-Knot Splines
        Knot Selection Via Parametric Bootstrapping
        Complex Multistage Probability Samples
        Adjustments to Estimates and Confidence Intervals
        Simulations
        Applications to Public Health Outcomes
    Specific Aims
        Specific Aim 1
        Specific Aim 2
    The Papers

PUTATIVE CONTRIBUTORS TO THE SECULAR INCREASE IN OBESITY: EXPLORING THE ROADS LESS TRAVELED

A FREE-KNOT SPLINE MODELING FRAMEWORK FOR PIECEWISE LINEAR LOGISTIC REGRESSION IN COMPLEX SAMPLES

BMI AND HEADACHE AMONG WOMEN: RESULTS FROM 11 EPIDEMIOLOGIC DATASETS

BODY MASS INDEX AND WAIST-TO-HIP RATIO AS THEY RELATE TO MORTALITY IN NHANES III

CONCLUSION

GENERAL LIST OF REFERENCES

APPENDIX
    A Future Directions: Nonlinear Cox Proportional Hazards Regression
    B On the effects of ignoring sample weights or those with extremely high BMI

LIST OF TABLES

Tables

BMI AND HEADACHE AMONG WOMEN: RESULTS FROM 11 EPIDEMIOLOGIC DATASETS

1 Description of epidemiologic datasets used

2 The coding of headache among the 11 datasets

3 Piecewise logistic regression primary model results

4 Piecewise logistic regression extended model results

5 Odds ratios and 95% confidence intervals across selected BMI values for the primary and extended models

BODY MASS INDEX AND WAIST-TO-HIP RATIO AS THEY RELATE TO MORTALITY IN NHANES III

1 NHANES III data description

2 Piecewise linear logistic regression model results for relating log-odds of mortality during follow-up to BMI and WHR by gender

3 Odds ratios and bootstrap 95% CI across selected BMI and WHR values by gender

LIST OF FIGURES

Figures

PUTATIVE CONTRIBUTORS TO THE SECULAR INCREASE IN OBESITY: EXPLORING THE ROADS LESS TRAVELED

1 Secular changes in a number of key indicators of factors that may be related to the increase in obesity

A FREE-KNOT SPLINE MODELING FRAMEWORK FOR PIECEWISE LINEAR LOGISTIC REGRESSION IN COMPLEX SAMPLES

1 Plotted B-spline basis functions of order m = 2 having knots at BMI = 21 and BMI = 34

2 Model selection simulation results for the parametric bootstrap 2 df forward selection procedure

3 A comparison of knot selection simulation results

BMI AND HEADACHE AMONG WOMEN: RESULTS FROM 11 EPIDEMIOLOGIC DATASETS

1 Odds ratios for headaches among women by BMI (reference BMI = 20)

BODY MASS INDEX AND WAIST-TO-HIP RATIO AS THEY RELATE TO MORTALITY IN NHANES III

1 Histograms depicting the distributions of BMI and WHR by gender

2 Unadjusted proportion of deaths observed among those grouped per unit of BMI and 1/100 unit of WHR, respectively, by gender

3 Odds ratios plotted for BMI* and WHR by gender

LIST OF ABBREVIATIONS

(Entries are listed alphabetically)

AIC Akaike’s information criterion

alt. alternative or “under the alternative hypothesis”

B-spline basis spline

BARS Bayesian adaptive regression splines

BIC Schwarz’s Bayesian information criterion

BMI body mass index (kg/m²)

BRR balanced repeated replication

CV cross validation

df degree(s) of freedom

GAM generalized additive model

GLM generalized linear model

GCV generalized cross validation

IML integrated matrix language package in SAS

LMF linked mortality file

log(odds) the natural logarithm of the odds in favor of some binary event occurring

LR likelihood ratio

LSE least squares estimate, estimator, or estimation

arg min() argument of the minimum (i.e., minimize)

MARS multivariate adaptive regression splines

MLE maximum likelihood estimate, estimator, or estimation

NCHS National Center for Health Statistics

NHANES National Health and Nutrition Examination Survey

NHIS National Health Interview Survey

NLIN least squares nonlinear optimization package in SAS

NLP nonlinear programming optimization package in SAS

OR odds ratio(s)

P-splines penalized splines

PLS piecewise linear slope(s) (refers to representation of spline parameters)

PROC SAS software procedure

PSU primary sampling unit

RDC Research Data Center at NCHS

rep replicate (i.e., indicating a randomly drawn replicate)

SAS Statistical Analysis Software

SSE sum of squared residual error

SRS simple random sample or sampling

SUDAAN a SAS-callable package for common analyses of complex samples

TPSLINE thin plate spline modeling package in SAS

TRANSREG transformation regression modeling package in SAS

WHO World Health Organization

WHR waist-to-hip ratio

INTRODUCTION

Goals and Objectives

In this dissertation research I investigated an approach to nonlinear modeling

designed to take advantage of complex sample design features common to large

nationally representative observational datasets. This nonlinear modeling framework

focused on:

1) using piecewise linear free-knot splines to estimate the nonlinear relationship

between a binary outcome of interest and a continuous predictor variable adjusted

for relevant covariate information; and

2) applying bootstrap methods to perform two important functions:

a) estimating the complexity of the free-knot spline by determining the

number of knot parameters that provide the optimal piecewise linear fit to

the data; and

b) making appropriate adjustments to parameter estimates, variance

estimates, and confidence intervals by taking into account complex sample

design information.

To demonstrate the properties of the novel aspects of this nonlinear modeling

framework, I have designed and carried out a comparative simulation study. Next, I

focused my efforts on making original contributions to public health knowledge by

applying the framework to real data from the third National Health and Nutrition

Examination Survey (NHANES III) and other large, cross-sectional datasets. In

particular, I have conducted two potentially nonlinear analyses to investigate the

following:

1) the relationship between headaches and body mass index (BMI: weight in kg

divided by the square of height in meters) among women (Keith et al., 2008); and

2) the risk of mortality as it relates to BMI and waist-to-hip ratio (WHR; waist

circumference divided by hip circumference), respectively.

The estimates I have computed may then be used to provide a basis for designing

more focused observational and clinical quantitative research in the areas of obesity,

headache, and mortality. The eventual results from these studies may be expressed by

complex decision models (Parmigiani, 2002) that may realistically affect public health

policy in the long-term.

How These Ideas Developed

The ideas for this research stem primarily from my efforts to model the apparently

nonlinear relationships between headache and BMI among women sampled throughout

the United States in datasets that included the first National Health and Nutrition

Examination Survey (NHANES I: 1971-1975) and the National Health Interview Surveys

(NHIS series 1997-2003). These eight publicly available datasets each had complex,

multistage sample designs that efficiently yielded samples representative of the

civilian non-institutionalized US population.

I could find no software packages that would fit nonlinear models and make the

appropriate adjustments for the complex sample designs of these large cross-sectional

datasets. Thus, I set out to construct my own nonlinear modeling software capable of

utilizing the strata, primary sampling unit (PSU), and sample weight information

provided by these surveys.

An Overview of the Components of This Research

Here I discuss very briefly the topics and issues involved in this research. Each of

these areas will be discussed in detail in later chapters.

Piecewise Linear Free-Knot Splines

After I extensively reviewed the literature on nonlinear modeling, free-knot splines

stood out for their potential for flexibility and interpretability. A free-knot spline may be

loosely described as a nonlinear regression built from piecewise polynomials of

order m joined at locations called knots, where adjoining segments typically agree

through their (m-2)th derivative. Both the number and the locations of the knots are

parameters to be estimated along with the other model parameters.
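
To make the description concrete, the following is a minimal sketch of a piecewise linear (order m = 2) free-knot spline with a single knot, fit by least squares on the truncated power basis. It is an illustration only, not the software developed for this research: the function names, the simulated data, and the use of a grid search over candidate knot locations (in place of a numerical optimizer) are all assumptions made for brevity. With m = 2, the adjoining segments agree in value at the knot but may differ in slope.

```python
import numpy as np

def design_matrix(x, knots):
    """Truncated power basis of order 2: columns 1, x, and (x - k)_+ for each knot k."""
    cols = [np.ones_like(x), x]
    cols += [np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

def fit_fixed_knots(x, y, knots):
    """Ordinary least squares for fixed knots; returns coefficients and the SSE."""
    X = design_matrix(x, knots)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def fit_free_knot(x, y, candidates):
    """Treat the knot location as a parameter: keep the candidate with the smallest SSE."""
    best = min(candidates, key=lambda k: fit_fixed_knots(x, y, [k])[1])
    beta, sse = fit_fixed_knots(x, y, [best])
    return best, beta, sse

# Simulated data (hypothetical): the slope changes from -0.5 to +0.3 at x = 25.
rng = np.random.default_rng(0)
x = rng.uniform(15.0, 45.0, size=400)
y = 10.0 - 0.5 * x + 0.8 * np.clip(x - 25.0, 0.0, None) + rng.normal(0.0, 0.5, size=400)

knot, beta, sse = fit_free_knot(x, y, np.linspace(17.0, 43.0, 261))
print(f"estimated knot: {knot:.1f}, slope change at knot: {beta[2]:.2f}")
```

The coefficient on the (x - k)_+ column is the change in slope at the knot, so the slope to the right of the knot is the sum of the second and third coefficients.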

Knot Selection Via Parametric Bootstrapping

Picking the most appropriate free-knot spline model is a complicated problem. I

have devised a unique method of knot selection based on parametric bootstrap

methodology.

Complex Multistage Probability Samples

The complex sample designs for which I designed this framework provide

investigators with the information necessary to adjust analyses for the planned sampling schemes.

These designs employ multistage probability sampling involving stratification, clustering,

assessment of non-response, and oversampling of specific subpopulations (e.g., age or

race subgroups) that would be difficult to represent well with simple random sampling

(SRS). Ignoring these characteristics can result in biased and possibly misleading

estimates.
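
The bias from ignoring the design can be seen in a small simulation. The sketch below covers only one design feature, oversampling with inverse-probability sample weights, and leaves out stratification and clustering. The population, the selection probabilities, and the outcome rates are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: about 90% in group A (outcome rate 0.10)
# and about 10% in group B (outcome rate 0.40).
N = 100_000
group_b = rng.random(N) < 0.10
outcome = rng.random(N) < np.where(group_b, 0.40, 0.10)

# Oversample group B: selection probability 0.20 for B versus 0.02 for A.
p_select = np.where(group_b, 0.20, 0.02)
sampled = rng.random(N) < p_select

y = outcome[sampled].astype(float)
w = 1.0 / p_select[sampled]      # sample weight = inverse selection probability

true_rate = outcome.mean()
unweighted = y.mean()            # ignores the design; pulled toward group B
weighted = np.sum(w * y) / np.sum(w)
print(f"population {true_rate:.3f}, unweighted {unweighted:.3f}, weighted {weighted:.3f}")
```

Because group B is overrepresented in the sample, the unweighted proportion overstates the population rate, while the weighted proportion stays close to it.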

Adjustments to Estimates and Confidence Intervals

Research into adjustments for complex, multistage probability sample designs

revealed useful applications of resampling techniques, such as the bootstrap, which may

be conveniently employed to make appropriate adjustments.
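
As a sketch of one such resampling adjustment, the code below generates rescaled bootstrap weights in the manner of Rao and colleagues (1992), drawing n_h - 1 PSUs with replacement within each stratum h and multiplying each original weight by n_h / (n_h - 1) times the number of times its PSU was drawn. The toy survey layout, the variable names, and the use of a weighted proportion (rather than spline parameters) as the estimate are assumptions made to keep the example short.

```python
import numpy as np

def rao_wu_weights(weights, strata, psus, rng):
    """One replicate of rescaled bootstrap weights (n_h - 1 PSUs drawn per stratum)."""
    new_w = np.zeros_like(weights, dtype=float)
    for h in np.unique(strata):
        in_h = strata == h
        psu_ids = np.unique(psus[in_h])
        n_h = len(psu_ids)                  # each stratum needs at least 2 PSUs
        draws = rng.choice(psu_ids, size=n_h - 1, replace=True)
        for i in psu_ids:
            r_hi = np.sum(draws == i)       # times PSU i was resampled
            mask = in_h & (psus == i)
            new_w[mask] = weights[mask] * (n_h / (n_h - 1)) * r_hi
    return new_w

def weighted_mean(y, w):
    return np.sum(w * y) / np.sum(w)

# Hypothetical toy survey: 4 strata x 3 PSUs x 25 respondents.
rng = np.random.default_rng(2)
strata = np.repeat(np.arange(4), 75)
psus = np.tile(np.repeat(np.arange(3), 25), 4)
weights = rng.uniform(50.0, 150.0, size=300)
y = (rng.random(300) < 0.3).astype(float)

estimate = weighted_mean(y, weights)
reps = np.array([weighted_mean(y, rao_wu_weights(weights, strata, psus, rng))
                 for _ in range(500)])
se = reps.std(ddof=1)
lo, hi = np.percentile(reps, [2.5, 97.5])
print(f"estimate {estimate:.3f}, bootstrap SE {se:.3f}, 95% percentile CI ({lo:.3f}, {hi:.3f})")
```

The same replicate weights could in principle be passed to any weighted estimator, which is what makes the approach convenient for models that have no closed-form design-based variance.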

Simulations

The knot selection procedure I have devised for the piecewise linear free-knot spline

modeling framework will be carefully evaluated on its performance in analyzing data

simulated under a variety of conditions.

Applications to Public Health Outcomes

Experts tend to agree that obesity is complex (Keith et al., 2006a) and costly

(Allison et al., 1999; WHO, 1998), having relationships with headache (e.g., Keith et al.,

2008; Bigal et al., 2006) and mortality rate (e.g., Flegal et al., 2005; Fontaine et al., 2003;

Calle et al., 1999; Narayan et al., 2007; Keith et al., 2006b), respectively, where many

factors contribute to or modify an individual’s susceptibility to obesity and its correlates.

The contemporary studies of these relationships call for large sample sizes and analytical

tools capable of identifying significant associations that are likely to be unevenly

distributed over groups that vary by age, race, gender, geographic locations,

socioeconomic status, etc. Moreover, statistical tools, such as those which use splines,

that are capable of providing highly flexible models are called for in these settings (Korn

and Graubard, 1999).

The information derived from this research may be useful for

clinicians and biostatisticians. Firstly, BMI is an easily measured and modifiable risk

factor. Thus, clinicians and public health officials could use the results of this study to

advise or counsel patients regarding the benefits associated with remaining “below” or

“above” a given BMI to reduce their risk of headache or mortality. Secondly,

biostatisticians and other quantitative researchers may be interested in more closely

estimating the nonlinear functional form of the association between a continuous

predictor and a binary outcome related to their particular field.
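
To illustrate how a fitted piecewise linear log-odds function translates into odds ratios relative to a reference BMI, the sketch below evaluates a one-knot model at several BMI values. The coefficients are hypothetical numbers chosen only for illustration and are not estimates from this research; the function names are likewise assumptions.

```python
import numpy as np

def log_odds(bmi, intercept, slope, knots, slope_changes):
    """Piecewise linear log-odds: intercept + slope*bmi + sum_j change_j * (bmi - knot_j)_+."""
    bmi = np.asarray(bmi, dtype=float)
    eta = intercept + slope * bmi
    for k, d in zip(knots, slope_changes):
        eta = eta + d * np.clip(bmi - k, 0.0, None)
    return eta

def odds_ratio(bmi, reference, **params):
    """Odds ratio at `bmi` relative to `reference`; the intercept cancels in the difference."""
    return np.exp(log_odds(bmi, **params) - log_odds(reference, **params))

# Hypothetical coefficients (illustration only): slope -0.04 below a knot at BMI 20,
# and -0.04 + 0.06 = +0.02 above it.
params = dict(intercept=-1.0, slope=-0.04, knots=[20.0], slope_changes=[0.06])

for b in (18.0, 20.0, 30.0, 40.0):
    print(f"BMI {b:.0f}: OR vs BMI 20 = {float(odds_ratio(b, 20.0, **params)):.2f}")
```

With the knot placed at the reference value, the odds ratio for any BMI above the knot is simply the exponential of the upper slope times the distance from the reference.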

Specific Aims

Specific Aim 1

To complete development of and evaluate a piecewise linear free-knot spline

approach to modeling the nonlinear relationship between a binary outcome and a

continuous prognostic variable in large datasets having complex sampling designs and

covariate information.

In brief, the framework carries forward concepts and ideas developed for least

squares estimation (LSE) to applying maximum likelihood estimation (MLE) in

piecewise linear free-knot splines based upon either the truncated power basis or

B-splines (de Boor, 1978). Statistical software will be employed in this framework to

simultaneously optimize model equations with respect to multiple covariate and knot

parameters. For the purpose of selecting the optimal number of knots and their locations,

I have developed a model selection algorithm that utilizes the binomial probability

distribution assumption under the piecewise linear logistic regression model. This

framework will employ parametric bootstrapping (Davison and Hinkley, 1997) of a two

degree of freedom test of model improvement from adding a knot parameter and a slope

parameter (my “2 df knot testing procedure”). In order to compute accurate standard error

estimates and confidence intervals, another level of specialized nonparametric

bootstrapping must be applied to rescale individual sampling weights according to the

methods outlined by Rao and colleagues (1992). A critical component of this aim

involved evaluating the 2 df knot testing procedure with respect to its efficiency and

qualities. This evaluation involved simulating data under a variety of controlled

conditions, applying the method to the simulated data, and plotting the results for

comparison to the “true” simulated model. Comparisons to other popular alternative

methods, such as AIC and BIC, were also an important component of this simulation study

which demonstrated the advantages and disadvantages of using the proposed method.
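
The idea behind the 2 df knot test can be sketched as follows, under several simplifying assumptions that are not part of the framework itself: there are no covariates or sample weights, only the step from zero knots to one knot is tested, the knot is chosen by grid search rather than numerical optimization, and the predictor is already centered and scaled. One motivation for bootstrapping here is that the knot location is not identified when the added slope is zero, so a standard chi-square reference distribution is not guaranteed to apply. All function names and simulated values are hypothetical.

```python
import numpy as np

def design(x, knots):
    """Columns 1, x, and (x - k)_+ for each knot k (piecewise linear basis)."""
    cols = [np.ones_like(x), x] + [np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

def fit_logistic(X, y, n_iter=50):
    """Maximum likelihood by Newton-Raphson; returns (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30.0, 30.0)))
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + 1e-8 * np.eye(X.shape[1])
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    eta = np.clip(X @ beta, -30.0, 30.0)
    return beta, float(np.sum(y * eta - np.log1p(np.exp(eta))))

def lr_add_knot(x, y, candidates):
    """LR statistic: linear model versus the best one-knot model (adds a knot and a slope)."""
    beta0, ll0 = fit_logistic(design(x, []), y)
    ll1 = max(fit_logistic(design(x, [k]), y)[1] for k in candidates)
    return 2.0 * (ll1 - ll0), beta0

def bootstrap_knot_test(x, y, candidates, n_boot, rng):
    """Parametric bootstrap p-value for adding one knot and one slope."""
    lr_obs, beta0 = lr_add_knot(x, y, candidates)
    p0 = 1.0 / (1.0 + np.exp(-(design(x, []) @ beta0)))    # fitted null probabilities
    lr_star = np.empty(n_boot)
    for b in range(n_boot):
        y_star = (rng.random(len(x)) < p0).astype(float)   # binary outcomes drawn under the null
        lr_star[b] = lr_add_knot(x, y_star, candidates)[0]
    p_value = (1 + np.sum(lr_star >= lr_obs)) / (n_boot + 1)
    return lr_obs, p_value

# Simulated data (hypothetical): log-odds with a true knot at x = 0.
rng = np.random.default_rng(3)
x = rng.uniform(-1.5, 1.5, size=600)
eta = -0.5 - 1.5 * x + 3.0 * np.clip(x, 0.0, None)
y = (rng.random(600) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

candidates = np.quantile(x, np.linspace(0.1, 0.9, 17))
lr_obs, p_value = bootstrap_knot_test(x, y, candidates, n_boot=199, rng=rng)
print(f"LR statistic {lr_obs:.1f}, parametric bootstrap p-value {p_value:.3f}")
```

A forward selection procedure would repeat this comparison, treating the current k-knot model as the null and the (k + 1)-knot model as the alternative, and stop when the added knot and slope no longer improve the fit significantly.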

Specific Aim 2

To apply the framework to nationally representative datasets to carefully examine

1) the risk of headache; and 2) the risk of mortality associated with BMI while adjusting

for the effects of covariates and complex sample designs.

This aim focused on applying my nonlinear modeling framework. Cross-sectional

data analyses of the relationship between BMI and headache outcomes were conducted in

eleven large datasets, many of which had complex sample designs. For estimation and

comparison, nonlinear analyses of the respective relationships of BMI and WHR with

mortality in the United States have been carried out on data from NHANES III. This

dataset was large (over 14,000 adult participants), had a complex, nonignorable

multistage probability cluster sampling design, and contained several detailed measures

of adiposity. When used in conjunction with its Linked Mortality File (LMF), NHANES

III provided information on vital status, BMI, WHR, and an abundance of covariate

measures.

The Papers

This dissertation follows the “three papers” model. The first paper is an extensive

literature review which points out alternative contributors to the obesity epidemic to be

considered alongside the “big two” contributors (i.e., food marketing practices and

reductions in physical activity). This paper illustrates the importance of developing new

ideas and challenging the assumptions we, as scientists, often make in obesity-related

research. With data resources growing in size and scope, so too should our abilities to

draw connections between health outcomes and possible predictors. The statistical

methodology and applications detailed in the following three papers take that general

aim. The second paper describes my unique nonlinear modeling framework for complex

samples and the implementation of B-splines and likelihood-based methods to improve

computational performance and stability. It also includes a simulation study of my novel

knot testing procedure. The third paper applies the framework, based on least-squares

estimation by the Levenberg-Marquardt optimization procedure, to analyze the possibly

nonlinear relationship between BMI and headache among women in 11 large

epidemiologic datasets. The fourth paper details an application of my likelihood-based

modeling framework to NHANES III for the purpose of comparing and contrasting the

predictive capacities of BMI and WHR as they relate to all-cause mortality as a binary

outcome.

PUTATIVE CONTRIBUTORS TO THE SECULAR INCREASE IN OBESITY:

EXPLORING THE ROADS LESS TRAVELED

by

SCOTT W. KEITH, DAVID T. REDDEN, PETER KATZMARZYK, MARY M. BOGGIANO, ERIN C. HANLON, RUTH M. BENCA, DOUGLAS RUDEN,

ANGELO PIETROBELLI, JAMIE BARGER, KEVIN R. FONTAINE, CHENXI WANG, LOUIS J. ARONNE, SUZANNE WRIGHT, MONICA BASKIN,

NIKHIL DHURANDHAR, MARIA C. LIJOI, CARLOS M. GRILO, MARIA DELUCA, ANDREW O. WESTFALL, DAVID B. ALLISON

International Journal of Obesity. 30:1585-94.

Copyright 2006 by

Scott W. Keith

Abstract

Objective: To investigate plausible contributors to the obesity epidemic beyond the two

most commonly suggested factors, reduced physical activity and food marketing

practices.

Design: A narrative review of data and published materials that provide evidence of the

role of additional putative factors in contributing to the increasing prevalence of obesity.

Data: Information was drawn from ecological and epidemiological studies of humans,

animal studies, and studies addressing physiological mechanisms when available.

Results: For at least 10 putative additional explanations for the increased prevalence of

obesity over recent decades, we found supportive (though not conclusive) evidence that

in many cases is as compelling as the evidence for more commonly discussed putative

explanations.

Conclusion: Undue attention has been devoted to two postulated causes for increases in

the prevalence of obesity leading to neglect of other plausible mechanisms and well-

intentioned, but potentially ill-founded proposals for reducing obesity rates.

Key Words: additional explanations, prevalence of obesity, obesity epidemic, body mass

index, food marketing, physical activity.

Introduction

The prevalence of obesity has increased substantially since 1970.1 Although the

causes are uncertain, many contend that environmental changes are almost certainly

responsible and focus overwhelmingly on food marketing practices and technology, and

on institution-driven reductions in physical activity (the “Big Two”), eschewing the

importance of other influences. This has created a hegemony whereby the importance of

the Big Two is accepted as established and other putative factors are not seriously

explored. The result may be well-intentioned but ill-founded proposals for reducing

obesity rates.

We begin by reviewing key facts about the secular increase in obesity (“the

epidemic”). We then highlight evidence showing that the obesogenic influence of the Big

Two is largely ‘circumstantial’, relying heavily on ecological correlations rather than

individual-level epidemiologic data or randomized experiments. Subsequently, we

delineate the evidence for 10 other putative factors for which the evidence is also

circumstantial but in many cases, at least equally compelling. We conclude that undue

attention has been devoted to 2 postulated causes for the epidemic, yielding neglect of

other plausible mechanisms.

The Epidemic

Obesity prevalence in the United States has been increasing for at least 100 years2

with an apparent acceleration in the past 3 decades. The distribution of body mass index

(BMI; kg/m²) has increased modestly in median and moderately in mean. What has

increased far more dramatically is the positive (right-tailed) skewness of the distribution,

such that the most obese segments of the distribution are far more obese than in years

past. Obesity has increased in every age, sex, race, and smoking-status stratum of the

population, which has correctly been taken to indicate that changes in the distribution of

age, race, sex, and smoking status cannot completely account for the epidemic. However,

as we show later, this finding does not indicate that changes in the distribution of these

variables are not contributing to the epidemic.

Evidence for the Big Two

Reduced physical activity,3 particularly from reduced school-based physical

education,4 and specific food manufacturing and marketing practices (e.g., vending

machines in schools,5 increased portion size,6 increased availability of fast-food,3,7,8 use

of high-fructose corn syrup (HFCS)9) comprise the Big Two explanations proffered for

the obesity epidemic and are frequently cited as targets of potential public health

interventions. We do not intend to imply that the Big Two are not salient contributors to

the epidemic. Rather, we offer that the evidence of their role as primary players in

producing the epidemic (as well as the evidence supporting their potential ability to

reverse the trend if manipulated) is both equivocal and largely circumstantial—that is, the

hypothesized effects are underdetermined by the data. Data rarely, if ever, stem from

randomized controlled trials of the effects in population settings and in many cases do not

even include a consistently supportive body of individual-level epidemiologic studies.

The arguments for the effects of each subcomponent tend to rely heavily (though not

exclusively) on presumed mechanisms of action and ecological studies10 in which

associations between the putative factor and obesity rates are shown at the aggregate

population level across times or geographic locations. According to the Food and Drug

Administration,11 because ecological “studies do not examine the relationship between

exposure and disease among individuals, the studies have been traditionally regarded as

useful for generating, rather than definitively testing, a scientific hypothesis.” Consider

several examples. Regarding physical education classes, Pathways, a large, expensive,

and expertly designed childhood obesity prevention program emphasized increasing

frequency and quality of physical education classes and found no effect on BMI.12

Regarding vending machines, a thorough evidence-based review (Faith et al.,

unpublished, 2005) found no published randomized trials, quasi-experiments, or

observational epidemiologic studies evaluating their effects on obesity. Regarding fast-

food availability, although some studies showed associations with obesity, Burdette and

Whitaker13 found no association between being overweight and proximity to fast-food

restaurants in over 7000 children. Regarding HFCS, the leading source (in the United

States) is sweetened beverages and 3 out of 4 studies conducted in children have found

no association between soft drink consumption and BMI when controlling for total

energy intake,14-17 raising the possibility that HFCS calories have no independent effect on

body weight apart from a pleasant taste that may lead to increased total caloric intake,

as any food might.

Regarding TV viewing, a recent meta-analysis concluded “A statistically

significant relationship exists between TV viewing and body fatness among children and

youth although it is likely to be too small to be of substantial clinical relevance. …media-based

[TV-based] inactivity may be unfairly implicated in recent epidemiologic trends of

overweight and obesity among children and youth”.18 Regarding portion size, Rolls has

presented considerable evidence that portion size may increase daily food intake.

Nevertheless, Rolls19 wrote, “…. that adults who are obese eat bigger portions of energy-

dense foods do[es] not prove that portion size plays a role in the etiology of obesity.

Indeed, at this time we know of no data showing such a causal relationship.”

Again, these data and quotations do not disprove the importance of those factors

listed but highlight their less-than-unequivocal evidential basis. Realizing this should

serve as an impetus for more vigorous consideration of additional factors.

Additional Explanations for the Increase in Obesity

We do not review all plausible contributors to the epidemic but select those that

are most interesting and for which the totality of current evidence is strongest. Figure 1

portrays the secular increase in a number of key indicators of these putative causal

influences. For most of the Additional Explanations we offer, the conclusion that a factor

(e.g., X) has contributed to the epidemic follows logically from accepting two

propositions: 1) X has a causal influence on human adiposity and 2) during the past

several decades, the frequency distribution of X has changed such that the relative

frequency of values of X leading to higher adiposity levels has increased. Absent

countervailing forces, if both propositions are true, obesity levels will increase.

Therefore, for postulated factors supported by this line of propositional argument

(Additional Explanations 1-7), we evaluate evidence addressing whether the factor can

increase fatness and whether the factor’s frequency distribution has changed in the

obesogenic direction. For the remaining Additional Explanations, propositional

arguments vary in form and are outlined separately.

Additional Explanation 1: Sleep Debt

Evidence that less sleep can cause increased body weight. For children and adults,

hours of sleep per night is inversely related to BMI and obesity in cross-sectional studies

and incident obesity in longitudinal studies.20,21 In animals, sleep deprivation produces

hyperphagia, offering a mechanism of action.22 Evidence for the physiologic mechanism

includes decreased leptin and thyroid stimulating hormone secretion, increased ghrelin

levels and decreased glucose tolerance, all endocrine changes that occur with sleep

deprivation.23-25 Sleep restriction in humans has recently been shown to produce similar

effects, including increased hunger and appetite.26 These changes are consistent with

chronic sleep deprivation leading to increased risk of obesity.

Has average sleep debt increased? Data clearly show that the average amount of

sleep has steadily decreased among U.S. adults and children during the past several

decades.27,28 Average daily sleep has decreased from over 9 to just over 7 hours among

adults.

We note that future studies examining the association between sleep debt and BMI

or any cause-effect link between them would benefit from utilizing more objective

assessments of sleep duration and sleep quality (vs. self-reporting). A good example is

the measure of spontaneous physical activity during sleep measured by microwave radar

detector. Bitz et al. (2005)29 used this technique in finding increased sleep disruptions

among diabetic subjects. Resta et al. (2003)30 found that even in the absence of sleep

apnea, obese subjects were observed to suffer more sleep disruptions defined as higher

sleep latency, a lower percentage of REM sleep, and a lower sleep efficiency (a ratio

between total sleep time and time spent in the bed) than non-obese subjects. The effect

of age should be controlled in such assessments, as it correlates positively with sleep time

activity.29 Large-scale self-report studies could also be improved with subjects’ use of

actigraphy watches to verify self-reported sleep times.

Additional Explanation 2: Endocrine Disruptors

Evidence that endocrine disruptors can increase adiposity. Endocrine disruptors

(EDs) are lipophilic, environmentally stable, industrially produced substances that can

affect endocrine function and include dichlorodiphenyltrichloroethane (DDT), some

polychlorinated biphenyls (PCBs), and some alkylphenols. By disturbing endogenous

hormonal regulation, EDs may fatten through multiple pathways. Consider the effect of

estrogen on white adipose tissue: In rodents, white adipose tissue is increased by ovariectomy

and decreased by estrogen replacement therapy.31 Similarly, postmenopausal women

have increased white adipose tissue, which is reduced by estrogen replacement therapy.32

Estrogen receptor-α knockout mice of both sexes have increased white adipose

tissue.33 Some EDs directly bind to nuclear receptors, including the peroxisome

proliferator-activated receptor γ and the retinoid X receptor. Kanayama et al.34 found

that the organotin EDs are high-affinity agonists for the peroxisome proliferator-

activated receptor γ and retinoid X receptor and stimulate adipocyte proliferation.

Other EDs are antagonists of certain nuclear receptors. For example, vinclozolin is a

dicarboximide fungicide and an androgen receptor antagonist.35 Some EDs are anti-

androgens36 and may thereby alter nutrient partitioning toward a more fatty body

composition. EDs can also inhibit aromatases37 and the aromatase knockout mouse has

increased adiposity. In humans, body ED burden and BMI or fat mass are positively

correlated even when normalized to total body triglyceride.38

Evidence that ED exposure has increased. EDs have increased in the food

chain.39,40 One example indicator is that polybrominated diphenyl ether concentration in

Swedish women’s breast milk almost doubled every 5 years from 1972 to 1998.39

Additional Explanation 3: Reduction in Variability in Ambient Temperature

Evidence that remaining in the thermoneutral zone promotes adiposity. The

thermoneutral zone (TNZ) is the range of ambient temperature in which energy

expenditure is not required for homeothermy. Exposure to ambient temperatures above or

below the TNZ increases energy expenditure, which, all other things being equal,

decreases energy stores (i.e., fat). This effect was shown in short-term controlled human

experiments41,42 and the decreases in adiposity were evidenced in controlled animal

experiments; these effects are widely exploited in livestock husbandry, where selecting

the environment to maximize weight gain is critical.43

Animal44 and human45 studies show that excursions above the TNZ markedly

reduce food intake. Herman45 cited a consumer survey suggesting that after an air-

conditioning breakdown, restaurant sales drop dramatically.

Evidence that time in the TNZ has increased. Humans dwell more in the TNZ

than they did 30 years ago. For example, the average internal U.K. home temperature

increased from 13°C to 18°C between 1970 and 2000.46 The U.S. thermal standard for

winter comfort increased from 18°C in 1923 to 24.6°C in 1986.47,48 The percentage of

U.S. homes with central air conditioning increased from 23% to 47% between 1978 and

1997 while the percentage of homes with no air-conditioning decreased from 44% to

28%. In the southern United States, where some of the highest obesity rates are observed,

the percentage of homes with central air conditioning increased from 37% to 70%

between 1978 and 1997 and the percentage of homes without any air-conditioning

decreased from 26% to 7%.49

Additional Explanation 4: Decreased Smoking

Evidence that smoking reduces weight. Epidemiologic and clinical studies

consistently show that smokers tend to weigh less than nonsmokers and weight gain

follows smoking cessation.50,51 Nicotine has both thermogenic and appetite suppressant

effects and its effects on appetite are enhanced by caffeine.52

Evidence that smoking rates have decreased. Rates of cigarette smoking among

U.S. adults steadily declined during the past several decades.53 Centers for Disease

Control and Prevention scientists estimated that between 1978 and 1990 smoking

cessation was responsible for about one quarter (2.3 of 9.6 percentage points) of the

increase in the prevalence in overweight in men and for about one sixth (1.3 of 8.0

percentage points) of the increase in women.50
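
The fractions quoted above can be checked directly from the percentage-point figures in the text. The short sketch below does only that arithmetic; the function name is our own, and the calculation is a simple ratio, not a reconstruction of the estimation method used in the cited study.

```python
def share_of_increase(points_from_cessation, total_points):
    """Fraction of the total rise in prevalence attributed to smoking cessation."""
    return points_from_cessation / total_points

# Percentage-point figures as quoted in the text (1978 to 1990)
men = share_of_increase(2.3, 9.6)     # close to one quarter
women = share_of_increase(1.3, 8.0)   # close to one sixth

print(f"men: {men:.2f}, women: {women:.2f}")
```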

Additional Explanation 5: Pharmaceutical Iatrogenesis

Evidence that certain pharmaceuticals increase weight. Weight gain is induced

by many psychotropic medications (antipsychotics, antidepressants, mood stabilizers),

anticonvulsants, antidiabetics, antihypertensives, steroid hormones and contraceptives,

antihistamines, and protease inhibitors. Selective serotonin reuptake inhibitors

(antidepressants) may also produce weight gain but data are less consistent.54-56 Almost

all atypical antipsychotics produce markedly more weight gain than placebo or than

traditional antipsychotics. For olanzapine and clozapine, mean weight gains were over 4

kg at 10 weeks.57 These drugs are active at many receptors involved in body weight

regulation58 and these findings were reproduced in animal models.59 Most antidiabetics

including insulin, sulfonylureas, and thiazolidinediones also promote adiposity,

especially the newer thiazolidinediones, which promote adipocyte proliferation.60 Beta-

blockers induce a mean weight gain of approximately 1.2 kg.61 Data are less consistent

for oral contraceptives, but one study estimated a mean weight gain of approximately 5

kg at 2 years.62 Antihistamines also appear to induce weight gain, with more potent

antihistamines producing greater weight gain.63 HIV antiretroviral drugs and protease

inhibitors also produce weight gain and increased abdominal adiposity.64

Evidence that use of such pharmaceuticals has increased. Most pharmaceuticals

described above were introduced or had their use dramatically increased in the past 3

decades. In the past 30 years, outpatient prescriptions for atypical antipsychotic

medications have increased from essentially zero to nearly 70% of the prescriptions for

this large patient population.65,66 Oral antidiabetic prescriptions increased more than 2-

fold from 1990 to 2001.67 Similar increases were also observed for use of

anticonvulsants68 and antihypertensives.69 HIV therapies were only introduced in the past

several decades.

Additional Explanation 6: Changes in Distribution of Ethnicity and Age

Evidence that some age and ethnic groups have higher prevalences of obesity

than others. Compared with young European-Americans, middle-age adults, African-

Americans (when comparing women only), and Hispanic-Americans have a markedly

higher obesity prevalence.1

Evidence that those age and ethnic groups have increased in relative frequency.

As a proportion of U.S. adults, the Hispanic-American population increased from less

than 5% in 1970 to approximately 13% in 2000.70,71 Similarly, from 1970 to 2000, the

proportion of the total U.S. adult population aged 35–44 years and 45–54 years increased

by 43% and 18%, respectively.71 Given that these groups have higher than average

obesity rates, it is likely that these demographic changes in the population are

contributing to the increased prevalence of obesity in at least a small way.
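
The compositional argument can be illustrated with a weighted average. In the sketch below, the population shares (about 5% and 13%) come from the text, but the two group-specific prevalences are hypothetical placeholders chosen only to show the mechanism; they are not estimates from any cited source, and the two-group split is a simplifying assumption.

```python
def overall_prevalence(shares, prevalence_by_group):
    """Population prevalence as a share-weighted average of group prevalences."""
    assert abs(sum(shares.values()) - 1.0) < 1e-9, "shares must sum to 1"
    return sum(shares[g] * prevalence_by_group[g] for g in shares)

# HYPOTHETICAL group-specific obesity prevalences, held constant over time
prevalence_by_group = {"hispanic": 0.35, "other": 0.25}

# Approximate population shares quoted in the text
shares_1970 = {"hispanic": 0.05, "other": 0.95}
shares_2000 = {"hispanic": 0.13, "other": 0.87}

p1970 = overall_prevalence(shares_1970, prevalence_by_group)
p2000 = overall_prevalence(shares_2000, prevalence_by_group)
print(f"change from composition alone: {100 * (p2000 - p1970):.1f} percentage points")
```

Even with no change in prevalence within either group, the overall figure rises, and with placeholder values of this size the rise is small, consistent with the modest contribution suggested above.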

Additional Explanation 7: Increasing Gravida Age

Evidence that greater gravida age increases risk of offspring obesity. Wilkinson

et al.72 studied obese British children and found that a common risk factor was having an

elderly mother. Patterson et al.73 studied girls aged 9–10 years and found that the odds of

obesity increased 14.4% for every 5-year increment in maternal age. Biological data

support these findings. Symonds et al.74 observed a correlation between maternal age and

fat deposition in sheep. This correlation is in part related

to an accelerated loss of brown adipose uncoupling protein 1 in the offspring of

adult primiparous mothers after birth, which may act to increase white adipose tissue

deposition in later life.74

Evidence that gravida age is increasing. Gravida age is increasing globally,75,76

rising in mean by 1.4 years in the United Kingdom between 1984 and 199475 and in

median by 2 years in Canada from 1981 to 1987.76 Mean age at first birth increased 2.6

years among U.S. mothers since 1970.77 Given Patterson et al.’s73 finding above, these

increases in maternal age might produce a clinically meaningful ~7% increase in the odds

of obesity.
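
The ~7% figure can be reproduced by scaling the reported odds increase to the observed change in maternal age. The sketch below assumes, as our own simplification, that the 14.4% increase per 5 years compounds log-linearly across maternal age; the function and parameter names are illustrative.

```python
def odds_multiplier(age_increase_years, odds_ratio_per_step=1.144, step_years=5.0):
    """Odds multiplier implied by a maternal age increase, assuming the
    per-step odds ratio compounds log-linearly with age."""
    return odds_ratio_per_step ** (age_increase_years / step_years)

# 2.6-year increase in mean age at first birth among U.S. mothers (from the text)
us_multiplier = odds_multiplier(2.6)
print(f"implied increase in odds: {100 * (us_multiplier - 1):.1f}%")
```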

Additional Explanation 8: Intrauterine and Intergenerational Effects

Some influences on obesity may occur in utero or even 2 generations back when

oocytes are formed in the grandmother.78 These may occur partly through epigenetic

(e.g., methylation) events as evidenced by the fact that cloned mice tend to be obese yet

do not pass on this obesity to their offspring.79 Thus, the increases in obesity we see today

may well be due, in part, to environmental changes that affected prior generations.

Obesity, which began increasing at least a century ago,2 may perpetuate its own increase

through a fetally-driven positive feedback loop. Specifically, maternal obesity and

resulting diabetes during gestation and lactation may promote the same conditions in

subsequent generations.80

Animal studies testing the fetal origins hypothesis provide support.81-83 In one

study, offspring from parent rats fed high-fat and low-fat diets were fed a high-fat diet.

Not only were body weight and abdominal adiposity increased in the offspring of high-

fat-fed parents, but the effect remained significant over 3 generations.81,84 Similarly,

overfeeding first generation female pups produced heavier pups as compared with a

control group and effects persisted for 2 subsequent generations.84 In humans, birth

weight positively correlates with adult BMI. However, as Allison et al.85 showed, barring

extreme variations, this association seems to reflect common genetic influences on birth

weight and adult BMI rather than an intrauterine environment that affects both birth

weight and adult obesity. Nevertheless, there may be intrauterine effects on adult BMI

that are not manifested in high birth weight. New evidence suggests that low birth weight

and/or the rapid catch-up growth that often follows it may be a risk factor for later obesity

and its life-shortening sequelae.86 It is then noteworthy that the incidence of low birth

weight in the United States has increased. According to Hamilton et al.,87 low birth

weight increased to 7.8% for 2002, the highest in more than 3 decades; the rate of low

birth weight had declined in the 1970s and early 1980s but has increased since the mid-

1980s. Furthermore, mothers who were themselves low-birth-weight infants are at

increased risk for gestational diabetes,88 which, in turn, places their offspring at increased

obesity risk.89

Thus, it is possible that the extremes of energy imbalance in utero (overfeeding

and low birth weight) may contribute to obesity. We may now be seeing the

transgenerational obesogenic effects of environmental changes initiated one or more

generations ago. Forebodingly, obesity’s prevalence could increase further if children of

the current generation’s overweight or obese parents are thereby predisposed further still.

Additional Explanation 9: Greater BMI is Associated With Greater Reproductive Fitness

Yielding Selection for Obesity-Predisposing Genotypes

Reproductive fitness can be defined as one’s capacity to pass on one’s DNA.

BMI-associated reproductive fitness (viz., natural selection) would increase obesity

prevalence if BMI has a genetic component (i.e., is heritable) and if individuals

genetically predisposed toward higher BMIs reproduce at a higher rate than do

individuals genetically predisposed toward lower BMIs.

Proposition A. BMI has a genetic component. That BMI (or adiposity) has a

heritable component is well supported by animal breeding studies and human twin,

family, and adoption studies90 with an estimated heritability of approximately 65%.91

Proposition B. Individuals with genetic predisposition toward greater adiposity

are reproducing at a higher rate than are individuals with a predisposition toward lesser

adiposity. Number of offspring is positively correlated with BMI among women.91 One

might assume that this is because childbearing or child rearing leads to weight gain.

Although this is plausible, other mechanisms may be contributing to this correlation.

Specifically, mild-to-moderate (but not severe) phenotypic obesity and/or a genotypic

predisposition to obesity may increase fecundity relative to phenotypic thinness and/or a

genetic predisposition to thinness because 1) obesity (at least in women) leads to

socioeconomic falling93 that, in turn, is associated with producing more offspring;94 2)

leanness beyond a certain point impairs fertility in women;95 and 3) other biological,

social, or economic factors may induce a positive correlation between genetic

predisposition to obesity and fecundity. Indeed, evidence shows that the direction of

causation may be from obesity predisposition to fecundity and not only the reverse. First,

although high BMI (> 25) is associated with reduced sperm concentration and total

sperm count, so too is low BMI (< 20), and the reduction is greater among men with low

BMI.96 Moreover, there is an association between parent adiposity and number of offspring for both

fathers and mothers.97 Although this does not rule out that child rearing leads to obesity,

the correlation among fathers obviously cannot be ascribed to the effects of childbearing.

Second, at least one study showed that higher BMI among parents before producing

offspring is associated with subsequent offspring number.97 Finally, animal studies are

supportive: In cattle, calving rate and adiposity have a positive genetic correlation98 and

in male rhesus monkeys, adiposity is positively correlated with siring rate.99

Additional Explanation 10: Assortative Mating and Floor Effects

Assortative mating is a pattern of nonrandom mating that we will use to refer to

positive assortment in which the probability that 2 individuals mate is positively related

to their degree of phenotypic similarity. Assortative mating increases genetic variance in

a population even though it does not affect allele frequencies (it does affect genotype

frequencies). Three propositions imply that assortative mating is contributing to increased

obesity prevalence:100,101 1) human adiposity variations have a genetic component, 2) the

adiposity threshold for defining obesity was historically above the population median,

and 3) humans assortatively mate for adiposity. Moreover, if factors are present that

prevent most people from becoming extremely thin (i.e., floor effects), then the

population distribution of adiposity will become increasingly positively skewed, further

increasing the population mean. The extent of assortative mating does not need to have

increased over time for it to have contributed to increasing prevalence of obesity over

time.
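
The way a constant level of assortative mating can raise prevalence over generations can be sketched numerically. The code below is a minimal illustration under assumptions of our own, not a model taken from the cited work: adiposity is normally distributed with a fixed mean, inheritance follows a simple infinitesimal model with constant segregation variance, environmental variance is constant, the obesity threshold sits 1 standard deviation above the mean, and floor effects are ignored. The spousal correlation (0.15) and heritability (0.65) are the values quoted elsewhere in this chapter; all names are hypothetical.

```python
from math import erf, sqrt

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

def prevalence_by_generation(spousal_r, h2=0.65, threshold=1.0, generations=5):
    """Share of the population above a fixed threshold in each generation."""
    vg0 = h2         # starting genetic variance (phenotypic variance scaled to 1)
    ve = 1.0 - h2    # environmental variance, assumed constant
    vg = vg0
    prevalence = []
    for _ in range(generations + 1):
        vp = vg + ve
        prevalence.append(upper_tail(threshold / sqrt(vp)))
        # Phenotypic assortment induces a genetic correlation between mates
        mate_r_genetic = spousal_r * vg / vp
        # Offspring genetic variance: mid-parent variance plus segregation variance
        vg = 0.5 * vg * (1.0 + mate_r_genetic) + 0.5 * vg0
    return prevalence

random_mating = prevalence_by_generation(spousal_r=0.0)
assortative = prevalence_by_generation(spousal_r=0.15)
print([round(p, 4) for p in random_mating])
print([round(p, 4) for p in assortative])
```

In this sketch the mean never moves; prevalence rises only because the variance widens and the threshold lies above the median, which is why the second proposition matters.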

Evidence that human adiposity variations have a genetic component. This was

discussed in the context of Additional Explanation 9.

The threshold for defining obesity was historically above the population median.

The threshold for defining obesity is currently a BMI of 30. This is above the current and

historical median.1

Humans assortatively mate for adiposity. Extensive research shows that for BMI

and other adiposity indicators, the spousal correlation is small (~0.15) but clearly

statistically significant and cannot be attributed to the effects of cohabitation.102 This

combined evidence strongly suggests that assortative mating has contributed to the

epidemic.100,101 Finally, there are clear floor effects on BMI103 that have likely

accentuated these effects.

Putting It All Together – Interconnections

Having laid out several of these possible contributing factors, it is interesting to

consider what their relative importance may be and whether there are interconnections

among these putative causal variables. With respect to their relative importance,

importance can be judged in multiple ways. For example, one could judge importance in

terms of the amount of variance in BMI explained, the magnitude of the mean increase in

BMI, a population attributable fraction, or some other measure of effect. Unfortunately,

we do not believe we are currently at the point where we can confidently say what the

effect size metrics are for each of these putatively causal variables and therefore cannot

confidently evaluate their relative importance on these metrics. Another way to consider

the importance of variables is their potential modifiability. It is unlikely that anyone

would suggest that we should have more people take up smoking as a way of controlling

body weight. Therefore, further consideration of the effects of smoking cessation on

population increases in BMI may be less important than consideration of other factors

that we might be more willing or able to modify. In this regard, factors such as sleep

reduction and increased use of heating and air conditioning might be things that are easily

modifiable and for which modifications in the direction that would hypothetically reduce

obesity levels would also have added benefits (e.g., a healthier and more alert population

and less use of fossil fuels). Thus, these types of putative contributing factors may be

more important in terms of meriting more attention.

It is also noteworthy that there may be interconnections among these putative

contributing factors. For example, Additional Explanation 6 specifies that the average

age of the US adult population has increased relative to the average age of that population

several decades ago. Even if the rates of reproduction within any age category remain

constant, this would not only result in an older adult population who are more likely to be

obese solely by virtue of their own age, but would also result in increasing gravida age on

average (Additional Explanation 7) which may lead to more obesity among offspring.

Moreover, the greater obesity among the parental generation, due in part to increasing

age, may also predispose to greater obesity among the offspring generation as articulated

in Additional Explanation 8. Similarly, it is possible that the effects of assortative

mating, as discussed in Additional Explanation 10, may be accentuated by all other

factors. That is, it is possible that the influence of assortative mating is quite modest

when most people lie within some intermediate range of BMI with very few people being

severely obese. However, as larger proportions of the population become severely obese

as a result of the influence of other factors, it may be that there is a greater pattern of

intermating among these severely obese individuals which may then further accelerate

the increase in obesity levels in subsequent generations. There may yet be additional

connections among these factors that remain to be explored.

Discussion

The evidence for the putative roles of the 10 Additional Explanations in the

epidemic is compelling and in most cases consists of the concurrence of ecological

correlations, epidemiologic study results, model organism studies, and strong theoretical

models or plausible mechanisms of action. Nevertheless, we do not claim that all of the

Additional Explanations definitively are contributors, only that they are as plausibly so as

are the Big Two and deserve more attention and study.

Although the effect of any one factor may be small, the combined effects may be

consequential. Moreover, the Additional Explanations we consider do not exhaust the

possibilities. Other factors potentially involved in the epidemic with varying degrees of

evidential support include an epidemic of adenovirus-36,104 increases in childhood

depression,105 less calcium (or dairy) consumption,106 and hormones in agricultural

species.107 In trying to reduce obesity levels, we consider only factors that have changed

over time and potentially contributed to the epidemic. Other factors such as shift

work108,109 and not breastfeeding110 can contribute to obesity and decreasing them may

alleviate the epidemic even though they may not have contributed to it, because their

rates have not increased in the past 30 years.111,112 Of course, as we consider any

environmental factor it is important to remain cognizant that such factors act in concert

with individual genetic susceptibilities.113

Bray and Champagne114 have recently published a review of five environmental

agents that they found disturb energy balance and cause obesity in susceptible hosts.

While they offer three available strategies for combating the epidemic (nutrition

education, regulation of serving size and food labels, and modification to the food

system), their suggested measures target the Big Two and not the drugs, chemicals,

viruses, or toxins that they have implicated as contributing factors. If the Additional

Explanations we have offered are probable contributors to the epidemic as we believe,

then additional research is warranted to evaluate how much they actually contribute, their

mechanisms of action, their interaction effects, and how they may be counteracted.

While we are not suggesting in this paper that one discount the potential effects of the

Big Two, if Additional Explanations are veracious, the expectations for the likely public

health impact of programs that only target the Big Two might be tempered. Public health

practitioners and clinicians may need to address a broader range of influential factors to

more adequately address the epidemic.

Acknowledgements

Each author contributed to writing one or more sections of the manuscript and

each author edited the entire manuscript. We gratefully acknowledge Richard Forshee,

Ph.D. of the Center for Food and Nutrition Policy at Virginia Polytechnic Institute and

State University for his suggestions. This research was supported in part by NIH grant

P30DK056336. This funding source had no involvement in the writing of or the decision

to submit this paper.

References

1 Hedley AA, Ogden CL, Johnson CL, Carroll MD, Curtin LR, Flegal KM. Prevalence

of overweight and obesity among US children, adolescents, and adults, 1999-2002.

JAMA 2004;291:2847-2850.

2 Heimburger DC, Allison DB, Goran MI, et al. A festschrift for Roland L. Weinsier:

nutrition scientist, educator, and clinician. Obes Res 2003;11:1246-1262.

3 Swinburn B, Egger G. The runaway weight gain train: too many accelerators, not

enough brakes. BMJ 2004;329:736-769.

4 Gabbard C. The need for quality physical education. J Sch Nurs 2001;17:73-75.

5 Sothern MS. Obesity prevention in children: physical activity and nutrition. Nutrition

2004;20:704-708.

6 Matthiessen J, Fagt S, Biltoft-Jensen A, Beck AM, Ovesen L. Size makes a

difference. Public Health Nutr 2003;6:65-72.

7 Ebbeling CB, Sinclair KB, Pereira MA, Garcia-Lago E, Feldman HA, Ludwig DS.

Compensation for energy intake from fast food among overweight and lean adolescents.

JAMA 2004;291:2828-2833.

8 Rogers JH. Living on the fat of the land: How to have your burger and sue it too.

Washington Univ Law Q 2003;81:859-884.

9 Bray GA. The epidemic of obesity and changes in food intake: the fluoride

hypothesis. Physiol Behav 2004;82:115-121.

10 Morgenstern H. Ecologic studies in epidemiology: concepts, principles, and methods.

Annu Rev Public Health 1995;16:61-81.

11 U.S. Food and Drug Administration. Redbook 2000: Toxicological Principles for the

Safety Assessment of Food Ingredients. Available at:

http://vm.cfsan.fda.gov/~redbook/red-vib.html. Accessed March 3, 2005.

12 Caballero B, Clay T, Davis SM, Ethelbah B, Rock BH, Lohman T, Norman J, Story

M, Stone EJ, Stephenson L, Stevens J; Pathways Study Research Group. Pathways: a

school-based, randomized controlled trial for the prevention of obesity in American

Indian schoolchildren. Am J Clin Nutr 2003;78:1030-1038.

13 Burdette HL, Whitaker RC. Neighborhood playgrounds, fast food restaurants, and

crime: relationships to overweight in low-income preschool children. Prev Med

2004;38:57-63.

14 Berkey CS, Rockett HR, Field AE, Gillman MW, Colditz GA. Sugar-added

beverages and adolescent weight change. Obes Res 2004;12:778-788.

15 Field AE, Austin SB, Gillman MW, Rosner B, Rockett HR, Colditz GA. Snack food

intake does not predict weight change among children and adolescents. Int J Obes Relat

Metab Disord 2004;28:1210-1216.

16 Ludwig DS, Peterson KE, Gortmaker SL. Relation between consumption of sugar-

sweetened drinks and childhood obesity: a prospective, observational analysis. Lancet

2001;357:505-508.

17 Newby PK, Peterson KE, Berkey CS, Leppert J, Willett WC, Colditz GA. Beverage

consumption is not associated with changes in weight and body mass index among low-

income preschool children in North Dakota. J Am Diet Assoc 2004;104:1086-1094.

18 Marshall SJ, Biddle SJ, Gorely T, Cameron N, Murdey I. Relationships between

media use, body fatness and physical activity in children and youth: a meta-analysis. Int J

Obes Relat Metab Disord 2004;28:1238-1246.

19 Rolls BJ. The supersizing of America: portion size and the obesity epidemic. Nutr

Today 2003;38:42-53.

20 von Kries R, Toschke AM, Wurmser H, Sauerwald T, Koletzko B. Reduced risk for

overweight and obesity in 5- and 6-y-old children by duration of sleep—a cross-sectional

study. Int J Obes Relat Metab Disord 2002;26:710-716.

21 Gangwisch J, Heymsfield S. Sleep deprivation as a risk factor for obesity: results

based on the NHANES I. North American Association for the Study of Obesity

(NAASO) 2004;Abstract no. 42-OR:A11.

22 Everson CA. Functional consequences of sustained sleep deprivation in the rat. Behav

Brain Res 1995;69:43-54.

23 Spiegel K, Leproult R, Van Cauter E. Impact of sleep debt on metabolic and

endocrine function. Lancet 1999;354:1435-1439.

24 Spiegel K, Leproult R, L'hermite-Baleriaux M, Copinschi G, Penev PD, Van Cauter

E. Leptin levels are dependent on sleep duration: relationships with sympathovagal

balance, carbohydrate regulation, cortisol, and thyrotropin. J Clin Endocrinol Metab

2004;89:5762-5771.

25 Taheri S, Lin L, Austin D, Young T, Mignot E. Short sleep duration is associated

with reduced leptin, elevated ghrelin, and increased body mass index. PLoS Med

2004;1:e62.

26 Spiegel K, Tasali E, Penev P, Van Cauter E. Brief communication: Sleep curtailment

in healthy young men is associated with decreased leptin levels, elevated ghrelin levels,

and increased hunger and appetite. Ann Intern Med 2004;141:846-850.

27 Bonnet MH, Arand DL. We are chronically sleep deprived. Sleep 1995;18:908-911.

28 Iglowstein I, Jenni OG, Molinari L, Largo RH. Sleep duration from infancy to

adolescence: reference values and generational trends. Pediatrics 2003;111:302-307.

29 Bitz C, Harder H, Astrup A. A paradoxical diurnal movement pattern in obese

subjects with type 2 diabetes: a contributor to impaired appetite and glycemic control?

Diabetes Care 2005;28:2040-2041.

30 Resta O, Foschino BMP, Bonfitto P, Giliberti T, Depalo A, Pannacciulli N,

De Pergola G. Low sleep quality and daytime sleepiness in obese patients without

obstructive sleep apnoea syndrome. J Intern Med 2003;253:536-543.

31 Wade GN, Gray JM, Bartness TJ. Gonadal influences on adiposity. Int J Obes

1985;9(suppl 1): 83-92.

32 Haarbo J, Marslew U, Gotfredsen A, Christiansen C. Postmenopausal hormone

replacement therapy prevents central fat distribution. Metabolism 1991;40:1323-1326.

33 Heine PA, Taylor JA, Iwamoto GA, Lubahn DB, Cooke PS. Increased adipose tissue

in male and female estrogen receptor-alpha knockout mice. Proc Natl Acad Sci USA

2000;97:12729-12734.

34 Kanayama T, Kobayashi N, Mamiya S, Nakanishi T, Nishikawa J. Organotin

compounds promote adipocyte differentiation as agonists of the peroxisome proliferator-

activated receptor γ/ retinoid X receptor pathway. Mol Pharmacol 2005;67:766-774.

35 Uzumcu M, Suzuki H, Skinner M. Effect of the anti-androgenic endocrine disruptor

vinclozolin on embryonic testis cord formation and postnatal testis development and

function. Reprod Toxicol 2004;18:765-774.

36 Sohoni P, Sumpter JP. Several environmental oestrogens are also anti-androgens. J

Endocrinol 1998;158:327-339.

37 Woodhouse AJ, Cooke GM. Suppression of aromatase activity in vitro by PCBs 28

and 105 and Aroclor 1221. Toxicol Lett 2004;152:91-100.

38 Pelletier C, Imbeault P, Tremblay A. Energy balance and pollution by

organochlorines and polychlorinated biphenyls. Obes Rev 2003;4:17-24.

39 Noren K, Meironyte D. Certain organochlorine and organobromine contaminants in

Swedish human milk in perspective of past 20-30 years. Chemosphere 2000;40:1111-

1123.

40 Nilsson R. Endocrine modulators in the food chain and environment. Toxicol Pathol

2000;28:420-431.

41 Westerterp-Plantenga MS, van Marken Lichtenbelt WD, Cilissen C, Top S. Energy

metabolism in women during short exposure to the thermoneutral zone. Physiol Behav

2002;75:227-235.

42 Saxton C. Effects of severe heat stress on respiration and metabolic rate in resting

man. Aviat Space Environ Med 1981;52:281-286.

43 Mader TL. Environmental stress in confined beef cattle. J Anim Sci 2003;81:E110-

E119.

44 Collin A, van Milgen J, Dubois S, Noblet J. Effect of high temperature on feeding

behaviour and heat production in group-housed young pigs. Br J Nutr 2001;86:63-70.

45 Herman CP. Effects of heat on appetite. In: Marriott BM, ed. Nutritional Needs in

Hot Environments: Applications for Military Personnel in Field Operations. Washington,

DC: National Academy Press, 1993:187-214.

46 EHCS 2000. Housing Research Summary: English House Condition Survey 1996:

Energy Report (No. 120). Office of the Deputy Prime Minister, The Stationery Office,

UK.

47 Understanding comfort, behavior, and productivity. Available at:

http://www.esource.com/public/pdf/Heating.pdf. Accessed March 3, 2005.

48 E Source space heating atlas. Available at:

http://www.esource.com/public/products/atlas_heating.asp. Accessed March 3, 2005.

49 Type of air-conditioning equipment by census region and survey year. Available at:

http://www.eia.doe.gov/emeu/consumptionbriefs/recs/actrends/recs_ac_trends_table2.html. Accessed March 3, 2005.

50 Flegal KM, Troiano RP, Pamuk ER, Kuczmarski RJ, Campbell SM. The influence of

smoking cessation on the prevalence of overweight in the United States. N Engl J Med

1995;333:1165-1170.

51 Filozof C, Fernandez Pinilla MC, Fernandez-Cruz A. Smoking cessation and weight

gain. Obes Rev 2004;5:95-103.

52 Jessen AB, Buemann B, Toubro S, Skovgaard IM, Astrup A. The appetite-

suppressant effect of nicotine is enhanced by caffeine. Diab Obes Metab 2005;7:327-333.

53 Centers for Disease Control and Prevention. Cigarette smoking among adults—

United States, 2002. MMWR 2004;53:427-431.

54 Fava M. Weight gain and antidepressants. J Clin Psychiatry 2000;61(suppl 11):37-41.

55 Garland EJ, Remick RA, Zis AP. Weight gain with antidepressants and lithium. J

Clin Psychopharmacol 1988;8:323-330.

56 Sussman N, Ginsberg DL, Bikoff J. Effects of nefazodone on body weight: a pooled

analysis of selective serotonin reuptake inhibitor- and imipramine-controlled trials. J Clin

Psychiatry 2001;62:256-260.

57 Allison DB, Mentore JL, Heo M, et al. Antipsychotic-induced weight gain: a

comprehensive research synthesis. Am J Psychiatry 1999;156:1686-1696.

58 Allison DB, Casey DE. Antipsychotic-induced weight gain: a review of the literature.

J Clin Psychiatry 2001;62(suppl 7):22-31.

59 Cope MB, Nagy TR, Fernandez JR, Geary N, Casey DE, Allison DB. Antipsychotic

drug–induced weight gain: development of an animal model. Int J Obes Relat Metab

Disord 2005;29:607-14.

60 Fonseca V. Effect of thiazolidinediones on body weight in patients with diabetes

mellitus. Am J Med 2003;115(suppl 8A):42S-48S.

61 Sharma AM, Pischon T, Hardt S, et al. Hypothesis: beta-adrenergic receptor blockers

and weight gain. A systematic analysis. Hypertension 2001;37:250-254.

62 Espey E, Steinhart J, Ogburn T, Qualls C. Depo-provera associated with weight gain

in Navajo women. Contraception 2000;62:55-58.

63 Aronne LJ. Drug-induced weight gain: non-CNS medications. In: Aronne LJ, ed. A

Practical Guide to Drug-induced Weight Gain. Minneapolis: McGraw-Hill, 2002:77-91.

64 Stricker RB, Goldberg B. Weight gain associated with protease inhibitor therapy in

HIV-infected patients. Res Virol 1998;149(2):123-126.

65 Daumit GL, Crum RM, Guallar E, Rowe RN, Primm AB, Steinwachs EM, Ford DE.

Outpatient prescriptions for atypical antipsychotics for African Americans, Hispanics and

Whites in the United States. JAMA 2003;60:121-128.

66 Hermann RC, Yang D, Ettner SL, Marcus SC, Yoon C, Abraham M. Prescription of

antipsychotic drugs by office-based physicians in the United States, 1989-1997. Psychiatr

Serv 2002;53:425-430.

67 Wysowski DK, Armstrong G, Governale L. Rapid increase in the use of oral

antidiabetic drugs in the United States, 1990-2001. Diabetes Care 2003;26:1852-1855.

68 Citrome L, Jaffe A, Levine J, Allingham B. Use of mood stabilizers among patients

with schizophrenia, 1994-2001. Psychiatr Serv 2002;53:1212.

69 Psaty BM, Manolio TA, Smith NL, et al. Time trends in high blood pressure control

and use of antihypertensive medications in older adults. Arch Intern Med 2002;162:2325-

2332.

70 Race and Hispanic origin 1790 to 1990. Available at:

http://www.census.gov/population/documentation/twps0056/tab01.pdf. Accessed March

15, 2005.

71 The Hispanic population 2000. Available at:

http://www.census.gov/prod/2001pubs/c2kbr01-3.pdf. Accessed March 15, 2005.

72 Wilkinson PW, Parkin JM, Pearlson J, Philips PR, Sykes P. Obesity in childhood: a

community study in Newcastle upon Tyne. Lancet 1977;1:350-352.

73 Patterson ML, Stern S, Crawford PB, et al. Sociodemographic factors and obesity in

preadolescent black and white girls: NHLBI's Growth and Health Study. J Natl Med

Assoc 1997;89:594-600.

74 Symonds ME, Pearce S, Bispham J, Gardner DS, Stephenson T. Timing of nutrient

restriction and programming of fetal adipose tissue development. Proc Nutr Soc

2004;63:397-403.

75 Armitage B, Babb P. Population review: (4). Trends in fertility. Popul Trends

1996;84:7-13.

76 Wadhera S. Trends in birth and fertility rates, Canada, 1921-1987. Health Rep

1989;1(2):211-223.

77 Mathews TJ, Hamilton BE. Mean age of mother, 1970-2000. Natl Vital Stat Rep

2002;51:1-13.

78 Finch CE, Loehlin JC. Environmental influences that may precede fertilization: a first

examination of the prezygotic hypothesis from maternal age influences on twins. Behav

Genet 1998;28(2):101-106.

79 Inui A. Obesity—a chronic health problem in cloned mice? Trends Pharmacol Sci

2003;24(2):77-80.

80 Levin B, Govek E. Gestational obesity accentuates obesity in obesity-prone progeny.

Am J Physiol 1998;275:R1374-R1379.

81 Wu Q, Mizushima Y, Komiya M, Matsuo T, Suzuki M. Body fat accumulation in the

male offspring of rats fed high-fat diet. J Clin Biochem Nutr 1998;25:71-79.

82 Wu Q, Mizushima Y, Komiya M, Matsuo T, Suzuki M. The effects of high-fat diet

feeding over generations on body fat accumulation with lipoprotein lipase and leptin in

rat adipose tissues. Asia Pacific J Clin Nutr 1999;8:46-52.

83 Lim K, Shimomura Y, Suzuki M. Effect of high-fat diet feeding over generations on

body fat accumulation. Japan Sci Soc Press 1991; 181-190.

84 Diaz J, Taylor EM. Abnormally high nourishment during sensitive periods results in

body weight changes across generations. Obes Res 1998;6:368-374.

85 Allison DB, Paultre F, Heymsfield SB, Pi-Sunyer FX. Is the intra-uterine period

really a critical period for the development of adiposity? Int J Obes Relat Metab Disord

1995;19:397-402.

86 Ozanne SE, Hales CN. Lifespan: catch-up growth and obesity in male mice. Nature

2004;427:411-412.

87 Hamilton BE, Martin JA, Sutton PD. Births: preliminary data for 2003. Natl Vital

Stat Rep 2004;53(9):1-17.

88 Bo S, Marchisio B, Volpiano M, Menato G, Pagano G. Maternal low birth weight and

gestational hyperglycemia. Gynecol Endocrinol 2003;17(2):133-136.

89 Silverman BL, Rizzo TA, Cho NH, Metzger BE. Long-term effects of the intrauterine

environment. The Northwestern University Diabetes in Pregnancy Center. Diabetes Care

1998;21:B142-B149.

90 Allison DB, Pietrobelli A, Faith MS, Fontaine KR, Gropp E, Fernández JR.

Genetic influences on obesity. In: Eckel R, ed. Obesity: Mechanisms & Clinical

Management. New York: Elsevier, 2003:31-74.

91 Segal NL, Allison DB. Twins and virtual twins: bases of relative body weight

revisited. Int J Obes Relat Metab Disord 2002;26:437-441.

92 Weng HH, Bastion LA, Taylor DH, Moser BK, Ostbye T. Number of

children associated with obesity in middle-aged women and men: Results from the Health

and Retirement Study. J Womens Health 2004;13:85-91.

93 Lipowicz A. Effect of husbands' education on fatness of wives. Am J Hum Biol

2003;15:1-7.

94 Salihu HM, Kinniburgh, Aliyu MH, Kirby RS, Alexander GR. Racial disparity in

stillbirth among singleton, twin and triplet gestations in the United States. Obstet Gynecol

2004;104:734-740.

95 Frisch RE. Body fat, menarche, fitness and fertility. Hum Reprod 1987;2:521-533.

96 Jensen TK, Andersson AM, Jorgensen N et al. Body mass index in relation to semen

quality and reproductive hormones among 1,558 Danish men. Fertil Steril 2004;82:863-

70.

97 Ellis L, Haman D. Population increases in obesity appear to be partly due to genetics.

J Biosoc Sci 2004;36:547-559.

98 Splan RK, Cundiff LV, Van Vleck LD. Genetic correlations between male carcass

and female growth and reproductive traits in beef cattle. Available at:

http://elib.tiho-hannover.de/publications/6wcgalp/papers/23274.pdf. Accessed March 4, 2005.

99 Bercovitch FB, Nurnberg P. Socioendocrine and morphological correlates of

paternity in rhesus macaques (Macaca mulatta). J Reprod Fertil 1996;107:59-68.

100 Hebebrand J, Wulftange H, Goerg T, et al. Epidemic obesity: are genetic factors

involved via increased rates of assortative mating? Int J Obes Relat Metab Disord

2000;24:345-353.

101 Katzmarzyk PT, Hebebrand J, Bouchard C. Spousal resemblance in the Canadian

population: implications for the obesity epidemic. Int J Obes Relat Metab Disord

2002;26:241-246.

102 Katzmarzyk PT, Perusse L, Rao DC, Bouchard C. Spousal resemblance and risk of

7-year increases in obesity and central adiposity in the Canadian population. Obes Res

1999;7:545-551.

103 Henry CJK. Variability in adult body size: uses in defining the limits of human

survival. In: Ulijaszek SJ, Mascie-Taylor CGN, eds. Anthropometry: The Individual and

the Population. New York: Cambridge University Press, 1994:117-129.

104 Atkinson RL, Dhurandhar NV, Allison DB, et al. Human adenovirus-36 is associated

with increased body weight and paradoxical reduction of serum lipids. Int J Obes

2005;29:281-6.

105 Pine DS, Goldstein RB, Wolk S, Weissman MM. The association between childhood

depression and adulthood body mass index. Pediatrics 2001;107:1049-1056.

106 Zemel MB, Thompson W, Milstead A, Morris K, Campbell P. Calcium and dairy

acceleration of weight and fat loss during energy restriction in obese adults. Obes Res

2004;12:582-590.

107 Mayfield R. (2003). Hormones in meat—what you should know! News from Dr.

Robin. You Can Feel Good!, No. 6, April 22, 2003. Available at:

http://www.drrobinmayfield.com/newsletters/newsletter-6.html. Accessed March 4, 2005.

108 Di Lorenzo L, De Pergola G, Zocchetti C, et al. Effect of shift work on body mass

index: results of a study performed in 319 glucose-tolerant men working in a Southern

Italian industry. Int J Obes Relat Metab Disord 2003;27:1353-1358.

109 Kivimaeki M, Kuisma P, Virtanen M, Elovainio M. Does shift work lead to poorer

health habits? A comparison between women who had always done shift work with those

who had never done shift work. Work-and-Stress 2001;15:3-13.

110 Arenz S, Ruckerl R, Koletzko B, von Kries R. Breast-feeding and childhood obesity -

a systematic review. Int J Obes 2004;28:1247-1256.

111 Hamermesh DS. The Timing of Work Over Time. Economic J. 1999;109. Available at:

http://www.res.org.uk/journals/abstracts.asp?ref=0013-0133&vid=109&iid=452&aid=390. Accessed March 4, 2005.

112 Breastfeeding by mothers 15-44 years of age by year of baby’s birth, according to

selected characteristics of mother: United States, average annual 1972-74 to 1993-94.

Available at: http://www.cdc.gov/nchs/data/hus/tables/2003/03hus018.pdf. Accessed

March 4, 2005.

113 Friedman JM. A war on obesity, not the obese. Science 2003;299:856-8.

114 Bray GA, Champagne CM. Beyond energy balance: there is more to obesity than

kilocalories. J Am Diet Assoc 2005;105(5 Suppl 1):S17-23.

115 Middleton N, Gunnell D, Whitley E, Dorling D, Frankel S. Secular trends in

antidepressant prescribing in the UK, 1975-1998. J Public Health Med 2001;23:262-267.

Figure 1. Secular changes in a number of key indicators of factors that may be related to the increase in obesity. These indicators include: mean age of US mothers at first birth [77]; antidepressant prescribing in the UK [115]; prevalence of AC (the percentage of US households equipped with air conditioning) [49]; UK average internal home temperature [46]; PBDE concentration (the concentration of polybrominated diphenyl ethers in the breast milk of Swedish women from 1972 to 1978) [39]; proportion of US adult population that is Hispanic and/or between 35 and 55 years of age [71]; time spent awake [27, 28]; non-smoker prevalence [50, 53]; and adult obesity prevalence (US adults only; BMI ≥ 30 indicates obesity) [1].

A FREE-KNOT SPLINE MODELING FRAMEWORK FOR PIECEWISE LINEAR

LOGISTIC REGRESSION IN COMPLEX SAMPLES

by

SCOTT W. KEITH, DAVID B. ALLISON

In preparation for Statistics in Medicine

Format adapted for dissertation

Summary

This paper details the design, evaluation, and implementation of a framework for

modeling nonlinearity between a binary outcome and a continuous prognostic variable

adjusted for covariates in complex samples. The primary objective of this methodology is

to analyze non-random survey samples by applying sophisticated modeling techniques

capable of detecting nonlinearity and adjusting model flexibility. Providing familiar-

looking parameterizations of output, such as linear slope coefficients and odds ratios, is

the secondary objective. Estimation methods include least squares or maximum

likelihood optimization of piecewise linear free-knot splines formulated as truncated

power bases or B-splines. Correctly specifying the optimal number and positions of the

knots improves the approximating power of a spline model, but has been marked by

computational intensity and numerical instability. Inference methods utilize both

parametric and nonparametric bootstrapping. Unlike other nonlinear modeling packages,

this framework accounts for multistage cross-sectional survey sample designs common to

nationally representative datasets. We conducted a simulation study of our novel

procedure for specifying the optimal number of knots. Under the conditions we

simulated, our method was commonly more accurate than Schwarz’s Bayesian

Information Criterion (BIC) and very similar to Akaike’s Information Criterion (AIC) in

terms of accuracy and precision as long as sample sizes were large. AIC and BIC were

not effective model selection methods when complex sampling weights were

incorporated into the likelihood functions.

Keywords: Free-knot splines, nonlinear logistic regression, bootstrap, complex samples,

body mass index.

Introduction

Large cross-sectional epidemiological health surveys are powerful sources of

observational information for investigating health outcomes as they relate to potentially

predictive or confounding factors. Appropriately analyzing the data from many of these

surveys, such as the National Health and Nutrition Examination Survey (NHANES) and

the National Health Interview Survey (NHIS), is not straightforward as their participants

are not selected by simple random sampling (SRS). Conducting an SRS of a large,

diverse population would be exorbitantly expensive. Instead, the survey designers plan

the sampling of groups of individuals in multiple stages with oversampling of certain

demographic or geographic clusters to collect a complex sample which represents the

population more efficiently than by SRS. There is a drawback in the statistical analysis of

these survey samples. The observations should not be considered independent and

identically distributed (iid) and therefore traditional statistical methods for modeling and

hypothesis testing must be adjusted to account for the correlation induced by the survey

sampling design (Korn and Graubard, 1999; U.S. DHHS NHANES III Analytic and

Reporting Guidelines, 1996).

Specialized software packages, such as SUDAAN or WesVar, have been

designed for conducting many types of statistical analyses on complex samples.

However, such software is not currently available for free-knot spline modeling.

Bessaoud and colleagues (2005) have pointed out the utility of these models for

effectively representing nonlinear associations between continuous predictors and a

binary outcome. Interestingly, they also describe how certain free parameters in their

models can be interpreted as thresholds for distinguishing groups with differing risk

relationships.

Free-knot spline modeling methodology could be very useful in providing an

alternative to traditional quantitative epidemiological methods for characterizing

nonlinear risk relationships (i.e., categorizing the continuous predictor or using

polynomials). A free-knot spline may be loosely described as a nonlinear regression

characterized by piecewise polynomials of order m joined at locations called knots where

the adjoining segments typically agree at their (m-2)th derivative and both the number and

locations of the knots are free parameters estimated along with other model parameters

(de Boor, 1978).

We propose in this paper a free-knot spline framework for conducting piecewise

linear logistic regression in complex multi-stage cross-sectional survey samples using B-

splines and bootstrapping with a focus on likelihood function maximization for model

computation. Piecewise linear representations of parameter estimates and odds ratios are

output for expressing results in a familiar-looking format. Simulation study results will

demonstrate the performance of our procedure for specifying the optimal number of

knots.

Free-Knot Splines: Nonlinear Modeling and Parameter Estimation

What is Available for Nonlinear Modeling

It appears that the literature regarding innovation in nonlinear modeling and

smoothing methods in recent years has been focused in several areas: penalized splines

with fixed knots (P-splines) (e.g., Eilers and Marx, 1996; Ruppert et al., 2003);

multivariate adaptive regression splines (MARS) (Friedman, 1991); incorporating splines

into logistic regression (e.g., Bessaoud et al., 2005; Johnson, 2007) and survival analysis

(e.g., Kooperberg et al., 1995; Gray, 1996; Gray, 1994; Rosenberg, 1995; Molinari et al.,

2001); Bayesian methods that utilize Reversible Jump Markov Chain Monte Carlo such

as penalized free-knot splines (e.g., Lindstrom, 2002) and Bayesian Adaptive Regression

Splines (BARS) (DiMatteo et al., 2001); applying mixed models to smoothing (e.g.,

Ruppert et al., 2003; Wand and Pearce, 2006); generalized additive models (GAM)

(Hastie and Tibshirani, 1986; 1990; Wood, 2006); and free-knot splines (e.g. Lindstrom,

1999; Stone et al., 1997). Some researchers are also applying bootstrapping methods to

spline estimation (Kauermann et al., 2006; Bessaoud et al., 2005; Molinari et al., 2001).

Research in spline methodology continues to be popular as new methods, particularly

those utilizing the increased computing power of today’s technology, are increasingly

important for summarizing information and drawing inferences from data sources that are

growing in number and complexity.

Features of P-splines and GAM. P-splines and GAM are perhaps the most popular

modeling tools available for modeling a binary outcome as a nonlinear function of one or

more continuous predictor variables. P-splines were introduced by Eilers and Marx

(1996) as a semiparametric method for analyzing nonlinear relationships by fitting B-splines

with a relatively large number of fixed knot locations and difference penalties on

adjacent B-spline coefficients. The penalties imposed, often based on Akaike’s

Information Criterion (AIC) (Akaike, 1974), Schwarz’s Bayesian Information Criterion

(BIC) (Schwarz, 1978), cross validation (CV) (e.g., Eilers and Marx, 1996), or

generalized cross validation (GCV) (e.g., Wahba, 1990), adjust the smoothness of the

fitted function. Methods have also been developed to select the number of knots for P-

spline models by fitting a dense map of knots and iteratively adding and removing knots

(Ruppert, 2002).
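For concreteness, the difference penalty described above can be written in one line. The following is only a sketch in notation assumed here rather than taken from this paper: $\alpha_1, \dots, \alpha_n$ are the coefficients of $n$ B-spline basis functions, $\Delta^d$ is the $d$th-order difference operator, $\lambda \ge 0$ is the smoothing parameter, and $\ell(\alpha)$ is the unpenalized log-likelihood (the logistic log-likelihood for a binary outcome).

$$
\ell_{\mathrm{pen}}(\alpha) = \ell(\alpha) - \frac{\lambda}{2} \sum_{j=d+1}^{n} \left( \Delta^{d} \alpha_j \right)^2,
\qquad \Delta \alpha_j = \alpha_j - \alpha_{j-1}.
$$

Setting $\lambda = 0$ recovers the unpenalized B-spline fit, and larger values of $\lambda$ give smoother fitted functions; criteria such as AIC, BIC, CV, or GCV are then used to choose $\lambda$.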

GAMs were introduced by Hastie and Tibshirani (1986) as a way of additively

relating the mean of a response (outcome) variable to a set of linear predictors in addition

to a set of smoothed predictors. Any GAM reduces to a generalized linear model (GLM)

(Nelder and Wedderburn, 1972; McCullagh and Nelder, 1989) by “zeroing” or

“shrinking” spline parameters. CV or GCV methods are commonly used in GAM to

optimize smoothing parameters by balancing residual and prediction errors. Hence, they

control the dilemma surrounding over- or under-fitting the data. This has been referred to

as the “bias-variance trade-off.” Unpenalized nonlinear modeling based on minimizing

residual sums of squares results in over-fitting to the degree that the model interpolates

the data points themselves. Over-penalizing the nonlinear model will result in excessive

residual error variance and an under-fitted model. For a thorough general discussion on

these issues, see Hastie et al. (2001).

What Free-knot Splines Offer and Why They are Used

Specialized statistical modeling tools are called for in clinical and epidemiological

settings for constructing useful models under circumstances of nonlinearity, non-

normality, and heteroscedasticity which represent departures from GLM assumptions

(Korn and Graubard, 1999). Of particular interest are those models with localized

estimation, such as free-knot splines (Lindstrom, 1999; 2002), which can limit the

influence of observations to particular regions of the fitted model. This property may lead

to a better characterization of associations in the tails of the predictor and response joint

distribution where small proportions of perhaps the most interesting observations exist.

Although there are many nonlinear modeling tools available, free-knot splines offer these

as well as other features, which make them appealing for the applications in health survey

research.

Our nonlinear framework utilizes piecewise linear free-knot splines to build

an additive model of an outcome (or a function of an outcome) as a nonlinear function of

a continuous predictor. The knots will be estimated as free parameters along with other

linear continuous or categorical covariate parameters. Estimating the optimal number and

positions of the knots improves the approximating power (Burchard, 1974) of the model,

but has been marked by computational intensity and numerical instability. Free-knot

splines are very sensitive to local optima in either the likelihood or residual sums of

squares surfaces. Effort has been made to mitigate these ailments by the introduction and

evaluation of B-splines (de Boor, 1978) and penalties for coalescent knots (Lindstrom,

1999). When free knots coalesce or overlap, the result is poor computational performance

described by Jupp (1978) as the “lethargy property” of free-knot splines.

In our research, we will be restricting our method to nonlinearity between an

outcome and only one independent variable. This will allow us to avoid the “curse of

dimensionality” (Bellman, 1957; Hastie and Tibshirani, 1986) which may be described as

the problem of extremely rapid increases in data sparseness as the dimension of the

nonlinear multivariate space increases. Regardless, the optimization of even one

nonlinear relationship via a free-knot spline has proven to be a difficult task in large

datasets. If the computational demands and numerical instability associated with free-

knot splines may be overcome, the free-knot models may have great potential for optimal

fit to observed data.

A key feature of the framework is that the splines may be represented

algebraically and interpreted according to their piecewise polynomial segments which

gives the output from these models a familiar appearance to researchers accustomed to

interpreting GLM results. This is an important aspect as the framework is intended to be

accessible and attractive for use by epidemiologists and other quantitative researchers.

Interpretability of the Knots can be Biologically or Clinically Useful

Effectively estimating both the numbers and locations of the knots tends to

produce a simpler, low dimensional analytic function than fixing either the number or

locations (or both) of the knots a priori. This is appealing from the perspective of

maximizing the parsimony of the fitted model, but can also provide an interesting

interpretation for the knots. Assuming that the true model has the same order and number

of knots as that estimated, then the model can be considered parametric. Bessaoud et al.

(2005) and Molinari et al. (2001) use this to their advantage by interpreting the knots in

their free-knot spline models as cut-points in a risk relationship that define thresholds

between groups with differential patterns of association with the outcome of interest. We

suggest that this may be inappropriate for cubic, perhaps even quadratic, free-knot splines

as the ability to correctly specify the true model seems to decrease with increasing order.

However, in situations where the aforementioned assumption holds sufficiently well, this

interpretation of the knots can yield compelling biological or clinical insights.

Basis Functions

A spline is constructed from basis functions. A basis function is an element of the

basis for a function space. Each function in a given function space can be expressed as a

linear combination of its basis functions. For example, the class of cubic polynomials

with real-valued coefficients has a basis consisting of $\{1, x, x^2, x^3\}$. Every cubic function

can be written as a linear combination of this basis (i.e., $a + bx + cx^2 + dx^3$). Basis function

expansions must be explicitly specified in order to calculate free-knot splines. We will

consider two possible bases for our framework: the truncated power basis (Ruppert et al.,

2003) and the B-spline basis (de Boor, 1978).

Truncated power basis. The truncated power basis of order m (Ruppert et al.,

2003) can be expressed as,

$$
g(\mu) = f(x) = \beta_0 + \beta_1 x + \dots + \beta_p x^p + \sum_{i=1}^{K} b_{pi}\,(x - \zeta_i)_+^p \tag{1}
$$

where some function of an average response, $g(\mu)$, is a nonlinear predictive function of an independent variable, $f(x)$; $\zeta_i$ is the $i$th of $K$ knots such that $\zeta_1 \le \zeta_2 \le \dots \le \zeta_K$; and $u_+ = \max(u, 0)$. Here, we limit our scope to the piecewise linear truncated power basis (order $m = p + 1 = 2$),

$$
g(\mu) = f(x) = \beta_0 + \beta_1 x + \sum_{i=1}^{K} b_i\,(x - \zeta_i)_+ . \tag{2}
$$
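As an illustration of the piecewise linear truncated power basis in (2), the following minimal Python sketch evaluates the function for given coefficients and knots. The function names and the numerical values (knots at 20 and 30, loosely suggestive of BMI) are hypothetical and chosen only for the example; they are not estimates from this work.

```python
def hinge(u):
    """Truncated linear function u_+ = max(u, 0)."""
    return max(u, 0.0)


def truncated_power_linear(x, beta0, beta1, b, knots):
    """Evaluate beta0 + beta1*x + sum_i b[i] * (x - knots[i])_+ as in (2)."""
    if len(b) != len(knots):
        raise ValueError("need exactly one coefficient per knot")
    return beta0 + beta1 * x + sum(bi * hinge(x - z) for bi, z in zip(b, knots))


# Hypothetical coefficients and knots, for illustration only.
knots = [20.0, 30.0]
b = [0.15, 0.05]
values = [truncated_power_linear(x, 1.0, -0.1, b, knots) for x in (18.0, 25.0, 35.0)]
```

Each knot contributes nothing to the left of its location and a linear term to the right, so the fitted function is continuous with a change in slope of `b[i]` at `knots[i]`.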

A piecewise linear representation. We use indicator functions, $I\{\cdot\}$, to then express the truncated power basis as a piecewise linear function on the $i$th contiguous interval of the domain of the predictor, $x$, delimited by either the knots or bounds of $x$. Suppose that $x \in \Re^+$ and that we fix knots that will not be estimated at the endpoints $a = \min(x)$ and $b = \max(x)$ such that $\zeta_0 = a$ and $\zeta_{K+1} = b$, then we have,

$$
g(\mu) = h[f(x)] = a_0 + a_1 x\, I\{x < \zeta_1\} + \sum_{i=1}^{K} \Big[ a_{i+1}(x - \zeta_i) + \sum_{j=1}^{i} a_j(\zeta_j - \zeta_{j-1}) \Big] I\{\zeta_i \le x < \zeta_{i+1}\}, \tag{3}
$$

where using the coefficients from (2) can give us,

$$
a_l = \beta_1 + \sum_{i=1}^{l-1} b_i, \quad (l = 1, \dots, K+1), \tag{4}
$$

the slope parameter for any observed $x \in [\zeta_{l-1}, \zeta_l)$. Note that although algebraically equivalent, this basis representation of the piecewise linear space is much less stable for computation than the truncated power basis.
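The relationship in (4) between truncated power coefficients and segment slopes amounts to a running sum. The sketch below, with hypothetical function names and illustrative values, computes the K + 1 slopes and, assuming a logit link, converts a slope into an odds ratio for a given change in the predictor within one segment.

```python
import math
from itertools import accumulate


def segment_slopes(beta1, b):
    """Return [a_1, ..., a_{K+1}] where a_l = beta1 + sum of b_i for i < l, as in (4)."""
    return list(accumulate(b, initial=beta1))


def odds_ratio(slope, delta=1.0):
    """Odds ratio for a change of `delta` in the predictor within a single segment,
    assuming the slope is on the log-odds (logit) scale."""
    return math.exp(slope * delta)


# Hypothetical values: slope -0.1 before the first knot, then changes of 0.15 and 0.05.
slopes = segment_slopes(-0.1, [0.15, 0.05])
or_per_5_units = [odds_ratio(a, 5.0) for a in slopes]
```

The odds ratio applies only to changes that stay inside one interval between knots; a change that crosses a knot combines the slopes of both segments.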

To illustrate our model including covariates, suppose that X is an N × 1 vector of data on some continuous prognostic variable of interest and Z represents an N × (p + 1) matrix consisting of a column of ones followed by p columns of data on covariates. Let η(·) be a parametric function of p + 1 linear covariate predictors multiplied by their respective logistic regression coefficients (β) and the K + 1 piecewise linear spline coefficients ($a_1, \dots, a_{K+1}$). For the $q$th individual ($q = 1, \dots, N$),

$$
\eta(X_q \mid \mathbf{Z}_q) = \beta_0 + \beta_1 Z_{q2} + \dots + \beta_p Z_{q(p+1)} + a_1 X_q\, I\{X_q < \zeta_1\}
+ \sum_{i=1}^{K} \Big[ a_{i+1}(X_q - \zeta_i) + \sum_{j=1}^{i} a_j(\zeta_j - \zeta_{j-1}) \Big] I\{\zeta_i \le X_q < \zeta_{i+1}\}. \tag{5}
$$

For comparison, consider the simpler truncated power basis expression,

$$
\eta(X_q \mid \mathbf{Z}_q) = \beta_0 + \beta_1 Z_{q2} + \dots + \beta_p Z_{q(p+1)} + b_0 X_q + \sum_{i=1}^{K} b_i\,(X_q - \zeta_i)_+ \tag{6}
$$

B-splines. B-spline bases are easy to incorporate into the framework as de Boor

(1978) describes a recurrence relation for their practical implementation. B-splines are

used extensively throughout the nonlinear modeling literature. Here, we will discuss them

only in brief detail.

Consider a knot sequence, $\min(\mathbf{X}) = \zeta_0 = \zeta_1 \le \zeta_2 \le \dots \le \zeta_K \le \zeta_{K+1} \le \zeta_{K+2} = \zeta_{K+3} = \max(\mathbf{X})$, such that there are $K$ interior knots. Let $\boldsymbol{\zeta} = [\zeta_2 \dots \zeta_{K+1}]^{\mathsf{T}}$. By definition of B-splines (see de Boor, 1978), the $j$th B-spline of order $m = 1$ (piecewise constant) is,

$$
B_{j1}(X_q) = \begin{cases} 1, & \text{if } \zeta_j \le X_q < \zeta_{j+1} \\ 0, & \text{otherwise} \end{cases} \tag{7}
$$

and the higher order B-splines may be constructed by this recurrence relation,

$$
B_{jm} = \omega_{jm} B_{j(m-1)} + \left(1 - \omega_{(j+1)m}\right) B_{(j+1)(m-1)}, \tag{8}
$$

where,

$$
\omega_{jm}(X_q) = \frac{X_q - \zeta_j}{\zeta_{j+m-1} - \zeta_j}. \tag{9}
$$

So, the linear B-spline basis of order $m = 2$ (piecewise linear) we use can be expressed for any $X_q \in \Re$ as,

$$
B_{j2}(X_q) = \frac{X_q - \zeta_j}{\zeta_{j+1} - \zeta_j}\, I\{\zeta_j \le X_q < \zeta_{j+1}\} + \frac{\zeta_{j+2} - X_q}{\zeta_{j+2} - \zeta_{j+1}}\, I\{\zeta_{j+1} \le X_q < \zeta_{j+2}\}, \quad j = 0, \dots, K+1, \tag{10}
$$

where K is the number of interior knots fitted. Thus we have,

$$
\eta[X_q \mid \mathbf{Z}_q] = \mathbf{Z}_q \boldsymbol{\beta} + \mathbf{B}_q \mathbf{b} = \beta_0 + \beta_1 Z_{q2} + \dots + \beta_p Z_{q(p+1)} + \sum_{i=1}^{K+1} b_i B_{i2}(X_q), \tag{11}
$$

where $b_1, \dots, b_{K+1}$ are linear regression coefficients corresponding to their respectively

indexed values in the $q$th row vector, $\mathbf{B}_q$, of the B-spline expansion matrix $\mathbf{B}$. This shows

how η is an additive, linear expression of B-spline parameters. Note that (11) may be

easily transformed to a polynomial expression as piecewise polynomial coefficients are

clearly linear combinations of the B-spline coefficients. Figure 1 shows what B-spline

basis functions of order m = 2 look like for a continuous predictor (e.g., BMI) with two

knots.
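The recurrence in (7) through (9) can be sketched directly in Python. The code below is a minimal illustration rather than the implementation used in this work: the function names are hypothetical, the boundary knots are duplicated as in the knot sequence above, and a term whose denominator in (9) is zero is treated as zero (a common convention, assumed here).

```python
def omega(x, knots, j, m):
    """Weight (x - zeta_j) / (zeta_{j+m-1} - zeta_j) from (9); zero when knots coincide."""
    denom = knots[j + m - 1] - knots[j]
    return (x - knots[j]) / denom if denom > 0 else 0.0


def bspline(x, knots, j, m):
    """j-th B-spline of order m at x, built with the recurrence (7)-(8)."""
    if m == 1:
        return 1.0 if knots[j] <= x < knots[j + 1] else 0.0
    return (omega(x, knots, j, m) * bspline(x, knots, j, m - 1)
            + (1.0 - omega(x, knots, j + 1, m)) * bspline(x, knots, j + 1, m - 1))


def linear_basis_row(x, interior, lo, hi):
    """All K + 2 order-2 basis values at x for interior knots and bounds lo, hi."""
    knots = [lo, lo] + sorted(interior) + [hi, hi]
    return [bspline(x, knots, j, 2) for j in range(len(knots) - 2)]


# Hypothetical predictor range 15 to 45 with interior knots at 20 and 30.
row = linear_basis_row(25.0, [20.0, 30.0], 15.0, 45.0)
```

Inside the range the basis values at any point sum to one, with at most two of them nonzero. Because the intervals in (7) are half-open, the upper bound itself evaluates to zero in every basis function in this sketch, so an implementation would need to treat the maximum observed value separately.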

Computation Methods

Before we discuss optimizing the fit of the spline to data, we briefly consider

some computational aspects. Mathematicians and computer scientists have demonstrated

that B-splines can have desirable properties, such as local linear independence (de Boor,

1978) and computational stability (Dierckx, 1993). As such, they have been a popular

choice and used extensively for free-knot modeling (e.g., Bessaoud et al., 2005;

Lindstrom, 1999).

We will be fitting nonlinear functions constrained to the class of piecewise linear

free-knot spline functions mapping a continuous independent variable onto the space of

the outcome variable as a projected estimate of the mean response surface. Our goal is to

first find the optimal fit for a given number of knots, K, and then determine which value

of K best represents the data. There are two general approaches to these computations:

1) by minimizing the distance between observed and predicted values (i.e., least

squares estimation or LSE); and

2) by maximizing the likelihood function (i.e., maximum likelihood or MLE

approach).

Least Squares

LSE in this context involves minimizing the distance, measured by the residual sum of

squares (SSE), between the observed outcome or a function of the observed outcome

and the nonlinear estimates. This typically requires a method, such as the Gauss-Newton

method with the Levenberg-Marquardt adjustment (Levenberg, 1944; Marquardt, 1963),

which uses derivatives or estimates of derivatives to pick out the optimal fit.

No canned SAS procedures (version 9.1; SAS Institute Inc, Cary, NC) such as

PROC GAM, PROC TRANSREG, or PROC TPSPLINE are capable of fitting free-knot

splines. Thus, we programmed a spline basis using SAS macros and PROC NLIN for

least squares estimation. This involves minimizing a measure of distance between

vectors, $\lVert \mathbf{f} - \hat{\mathbf{f}} \rVert^{2}$, which represents the nonlinear SSE in a multidimensional space where f

is the collected data and f̂ is a collection of nonlinear estimates as a function of the data,

complex sample weights, and model parameters including free-knots.

Maximum Likelihood

For MLE, the nonlinear logistic likelihood function must be numerically

maximized to find the parameter values under which the observed data was most likely

produced. In theory, these estimates might have the asymptotic efficiency and invariance

under reparameterization which makes MLE attractive in general (Casella and Berger,

2002). The invariance property might be important to our framework as we intend to


perform the optimization with B-splines, but to report linear combinations of B-spline

parameters that represent the local piecewise linear slopes as they are easier to interpret in

practice.

The Nelder-Mead simplex (Nelder and Mead, 1965) is a popular and powerful

direct search procedure for likelihood-based optimization. The attraction of this method is

that the simplex does not use any derivatives and does not assume that the objective

function being optimized has continuous derivatives. Nelder-Mead simplex optimization

is the only method currently available in SAS which does not require derivative

calculations to search the parameter space. In cases such as ours (i.e., piecewise linear

splines) we do not expect continuity in the first derivatives at the knot locations.

Therefore an MLE and simplex optimization approach seems more reasonable than the

LSE and residual sum of squares minimization approach. Direct search methods can,

however, be much less efficient or even highly unstable as compared to derivative-based

LSE or MLE methods when sample sizes are as large as the datasets common to complex

survey designs. Hence, as a compromise, we have used “quasi-Newton methods” with

estimated derivatives to perform the MLE.

Nonlinear Logistic Likelihood. Let’s now examine the nonlinear logistic

likelihood function for modeling binary outcomes. The probability of the qth participant

having experienced the outcome of interest, Yq = 1, can be expressed as,

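\[
\pi\big(\eta([\mathbf{Z}_q\; X_q])\big) = P\big(Y_q = 1 \mid [\mathbf{Z}_q\; X_q]\big) = \frac{\exp\{\eta([\mathbf{Z}_q\; X_q])\}}{1 + \exp\{\eta([\mathbf{Z}_q\; X_q])\}}, \tag{12}
\]

where η([Zq Xq]) may be equation (11). Note that the logit or log(odds) function of this

probability,

\[
\operatorname{logit}\big\{\pi\big(\eta([\mathbf{Z}_q\; X_q])\big)\big\} = \log\left\{\frac{\pi\big(\eta([\mathbf{Z}_q\; X_q])\big)}{1 - \pi\big(\eta([\mathbf{Z}_q\; X_q])\big)}\right\} = \eta([\mathbf{Z}_q\; X_q]), \tag{13}
\]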

may reasonably be modeled piecewise linearly as a function of the variables in [Zq Xq].

We may express a weighted likelihood function,

\[
L\big(\boldsymbol{\theta} \mid [\mathbf{Z}\; \mathbf{X}], \mathbf{W}\big) = \prod_{q=1}^{n} \left[\pi_q^{\,y_q} (1 - \pi_q)^{1 - y_q}\right]^{w_q}, \tag{14}
\]

where θ is a vector of all the linear and spline parameters expressed in (11), n = sample

size, πq is defined above in (12), yq is the binary outcome, and wq is the complex sample

weight in the weight vector, W, assigned to the qth participant by the study designers. The

weighted log-likelihood, which is more convenient for use in optimization procedures,

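\[
\log\big\{L\big(\boldsymbol{\theta} \mid [\mathbf{Z}\; \mathbf{X}], \mathbf{W}\big)\big\} = \sum_{q=1}^{n} w_q \left[\log(1 - \pi_q) + y_q \log\left(\frac{\pi_q}{1 - \pi_q}\right)\right], \tag{15}
\]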

may be maximized numerically using PROC NLP in SAS.
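For readers who want to see the weighted log-likelihood of equation (15) spelled out, the following Python sketch evaluates it for a given linear predictor. The function and argument names are assumptions made for illustration; the actual maximization in this work was done with PROC NLP.

```python
import math


def weighted_log_likelihood(eta, y, w):
    """Weighted Bernoulli log-likelihood in the form of equation (15).

    eta: linear predictor (log-odds) for each participant
    y:   binary outcomes (0 or 1)
    w:   complex sample weights
    """
    total = 0.0
    for eta_q, y_q, w_q in zip(eta, y, w):
        pi_q = 1.0 / (1.0 + math.exp(-eta_q))      # equation (12)
        log_odds = math.log(pi_q / (1.0 - pi_q))   # equals eta_q, equation (13)
        total += w_q * (math.log(1.0 - pi_q) + y_q * log_odds)
    return total


# Two participants with equal odds of the event and weights 2 and 3.
example_value = weighted_log_likelihood([0.0, 0.0], [1, 0], [2.0, 3.0])
```

An optimizer would minimize the negative of this quantity over the parameters that determine eta.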

Optimization

Our goal is to first find the optimal fit for a given number of knots and leave the

optimization with respect to the number of knots for the next section.

Levenberg-Marquardt adaptation to the Gauss-Newton algorithm for nonlinear

LSE. The nonlinear LSE optimization procedure by the Gauss-Newton method is fairly

straightforward. Consider this nonlinear system of equations that represent our nonlinear


model between a vector of outcomes, Y, and a function of the observed data, X, and

parameters,

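\[
\mathbf{Y} = F(\boldsymbol{\theta}, \mathbf{X}) + \boldsymbol{\varepsilon}, \tag{16}
\]

where ε is the error vector. The general approach to solving for the minimum distance

between Y and $F(\hat{\boldsymbol{\theta}}, \mathbf{X})$, that is, the residual distance $\mathbf{e} = \mathbf{Y} - F(\hat{\boldsymbol{\theta}}, \mathbf{X})$, is to solve the

nonlinear “normal” equations,

\[
\mathbf{D}^{T} F(\hat{\boldsymbol{\theta}}, \mathbf{X}) = \mathbf{D}^{T}\mathbf{Y}, \tag{17}
\]

where D represents the gradient matrix,

\[
\mathbf{D} = \frac{\partial F(\hat{\boldsymbol{\theta}}, \mathbf{X})}{\partial \hat{\boldsymbol{\theta}}}. \tag{18}
\]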

Note that, in practice, we cannot actually calculate D because the derivatives at the knot

locations do not exist. Instead, we used finite difference approximations to D.

A closed form solution to (17) generally will not exist, so we try to find a solution

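by an iterative process beginning with some starting values for the parameters, $\hat{\boldsymbol{\theta}}_{old}$, and

continuing to update $\hat{\boldsymbol{\theta}}_{old}$ to $\hat{\boldsymbol{\theta}}_{new}$ until $\mathbf{e}^{T}\mathbf{e}$, the residual sum of squares (SSE), shows no

major improvement after reiterating,

\[
\mathrm{SSE}(\hat{\boldsymbol{\theta}}_{new}) = \mathrm{SSE}(\hat{\boldsymbol{\theta}}_{old} + k\boldsymbol{\Delta}) < \mathrm{SSE}(\hat{\boldsymbol{\theta}}_{old}), \tag{19}
\]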

where ∆ represents the next “step,” and k is a coefficient that can be adjusted to control

the size of the step.

For this LSE approach to numerically solving the piecewise linear function

optimization problem, SAS software offers several popular iterative algorithms. We


chose the Levenberg-Marquardt updating formula (Levenberg, 1944; Marquardt, 1963)

defined as follows:

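\[
\boldsymbol{\Delta} = \left(\mathbf{D}^{T}\mathbf{D} + \lambda\,\operatorname{diag}(\mathbf{D}^{T}\mathbf{D})\right)^{-} \mathbf{D}'\mathbf{e}. \tag{20}
\]

This method is a compromise between the Gauss-Newton and steepest descent ($\boldsymbol{\Delta} = \mathbf{D}'\mathbf{e}$)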

methods (Marquardt, 1963) affected by adjusting the magnitude of λ. Lindstrom (1999)

suggested that for estimating free-knot parameter locations, the Levenberg-Marquardt

method increases the chance of finding the global optimum.
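To make the updating formula (20) concrete, here is a minimal Python sketch of a single Levenberg-Marquardt step with a forward finite-difference approximation to D. It is only an illustration under stated assumptions: the names `lm_step` and `solve` are hypothetical, the matrix in (20) is assumed to be nonsingular so that an ordinary linear solve can stand in for the inverse, and this is not the PROC NLIN implementation.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def lm_step(f, theta, y, lam, h=1e-6):
    """One Levenberg-Marquardt step in the spirit of equation (20).

    f(theta) returns the vector of fitted values. The gradient matrix D is
    approximated column by column with forward finite differences.
    """
    fitted = f(theta)
    e = [y_i - f_i for y_i, f_i in zip(y, fitted)]          # residual vector
    p = len(theta)
    d_cols = []
    for k in range(p):
        bumped = theta[:]
        bumped[k] += h
        d_cols.append([(fb - f0) / h for fb, f0 in zip(f(bumped), fitted)])
    dtd = [[sum(u * v for u, v in zip(d_cols[i], d_cols[j])) for j in range(p)]
           for i in range(p)]
    dte = [sum(u * v for u, v in zip(d_cols[i], e)) for i in range(p)]
    for i in range(p):
        dtd[i][i] *= (1.0 + lam)                            # D'D + lam * diag(D'D)
    return solve(dtd, dte)


# With lam = 0 the step is Gauss-Newton; for a model that is linear in theta
# a single step from zero reaches the least squares solution.
x_values = [0.0, 1.0, 2.0]
step = lm_step(lambda t: [t[0] + t[1] * x for x in x_values],
               [0.0, 0.0], [1.0, 3.0, 5.0], lam=0.0)
```

Increasing `lam` shrinks the step and tilts it toward the steepest descent direction, which is the compromise described above.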

Quasi-Newton methods for MLE. Using quasi-Newton methods worked more

efficiently and produced more stable results than the Nelder-Mead simplex. Quasi-

Newton methods are a class of optimization algorithms which we used to locate minima

in the negative natural logarithm of (14). The particular quasi-Newton procedure we

employed is called the dual Broyden-Fletcher-Goldfarb-Shanno method (DBFGS)

(Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970). This is a complicated

procedure, the details of which extend beyond our scope. In brief, DBFGS uses line

searches along feasible descent search directions in combination with estimation of the

Cholesky factor of the Hessian matrix of second derivatives to iteratively update the

overall search for minima. Although this method requires first derivatives, we were able

to calculate derivative estimates by using finite difference approximations as we did for

the LSE methods. In application to large survey datasets, we have found that the MLE

methods suffer fewer problems with convergence than the LSE methods.


Knot Selection

A Novel Parametric Bootstrap-Based Method

We outline in this section a novel method of selecting the optimal number of

knots. Knot locations, linear and nonlinear coefficients, and a common intercept are

parameters optimized simultaneously having complex sample weights incorporated into

the function fitted to achieve unbiased and fully adjusted estimates. Like Bessaoud et al.

(2005) and Molinari et al. (2001) we will be interested in interpreting the fitted knots to

define clinically meaningful groups with differential patterns of risk. It will be very

important to correctly specify a parsimonious number of knots, say 4 or fewer, which

would indicate 5 or fewer different risk groups. Therefore, keeping the framework from

producing models with unnecessary knots is a priority.

Our technique involves a forward selection procedure based on the concept of a

two degree of freedom test for the addition of two parameters, a knot and a slope, to the

piecewise linear model (our “2 df knot testing procedure”). As depicted in (11), we are

considering a set of p linear or categorical covariates for adjustment purposes, but this

procedure is targeted at optimizing the complexity necessary to effectively model the one

potentially nonlinear prognostic variable, X. The test statistic for the LSE framework is

an F-ratio,

\[
F = \frac{(\mathrm{SSE}_{reduced} - \mathrm{SSE}_{full})/2}{\mathrm{SSE}_{full}/df_{full}}, \tag{21}
\]

where $df_{full}$ = N-(p+2K) (i.e., one less than the sample size minus the number of free

parameters estimated: the intercept, p linear coefficients, K free-knots, and the K+1 spline

parameters (the piecewise linear slopes)).


We do not know the distribution of F, so we use the parametric bootstrap (Efron,

1982) as described by Davison and Hinkley (1997) to build a hypothetic distribution of F-

ratio test statistics under the null hypothesis that the reduced model having K knots is true

against an alternative having K+1 knots. We draw D1 parametrically resampled replicate

datasets of binary outcomes and compute the F-ratio distribution $\{F^{rep}_{1}, \ldots, F^{rep}_{D_1}\}$. A

bootstrap p-value representing the probability that adding the (K+1)th knot produces an F-

ratio at least as large as what might be observed by chance alone can be calculated from

this F-ratio distribution as,

\[
p_{boot} = \frac{1 + \sum_{j=1}^{D_1} I\{F^{rep}_{j} \ge F\}}{D_1 + 1}. \tag{22}
\]

This is analogous to integrating the distribution of F to determine the probability of

observing the data given that the null model is true.

We select a value for α to represent the significance level or inclusion criterion for

this test of the contribution to reducing SSE. That is, the p-value (22) would have to be

smaller than α in order to reject the null hypothesis that the model with K knots is the true

model in favor of the model with K+1 knots. We can control the flexibility of the model

by manipulating α.
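The F-ratio of equation (21) and the bootstrap p-value of equation (22) are simple to compute once the SSE values are in hand. The Python sketch below shows both; the function names are hypothetical and the numbers in the usage lines are made up for illustration.

```python
def f_ratio(sse_reduced, sse_full, df_full):
    """F-ratio of equation (21) for adding one knot and one slope (2 df)."""
    return ((sse_reduced - sse_full) / 2.0) / (sse_full / df_full)


def bootstrap_p_value(f_observed, f_replicates):
    """Bootstrap p-value of equation (22) from D1 replicate F-ratios."""
    exceed = sum(1 for f_rep in f_replicates if f_rep >= f_observed)
    return (1 + exceed) / (len(f_replicates) + 1)


f_obs = f_ratio(sse_reduced=12.0, sse_full=10.0, df_full=100)
p_boot = bootstrap_p_value(3.0, [1.0, 2.0, 3.0, 4.0])
```

Adding one to both the numerator and the denominator keeps the p-value strictly positive, so the smallest attainable value is 1/(D1 + 1).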

The LSE approach to the 2 df knot testing procedure for the null hypothesis that

the “true” model has Knull knots versus an alternative of Kalt. can be specified for model

parameters (θ; including linear and nonlinear free parameters as expressed in (11)) fitted

to a dataset having binary outcome (Y), potentially nonlinear continuous predictor (X),

covariates (Z), and sample weights (W) by these algorithm specifications:


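Step 1: Set: $K_{null}$ = 0, $K_{alt.}$ = $K_{null}$ + 1;

Step 2: Input: Y, X, Z, W;

Step 3: Initialize: $\boldsymbol{\theta}^{0}_{null}$, $\boldsymbol{\theta}^{0}_{alt.}$;

Step 4: Minimize: $\hat{\boldsymbol{\theta}}_{null}$ = arg min(objective = SSE($\boldsymbol{\theta}_{null}$)), start = $\boldsymbol{\theta}^{0}_{null}$;
        Compute: SSE($\hat{\boldsymbol{\theta}}_{null}$);

Step 5: Minimize: $\hat{\boldsymbol{\theta}}_{alt.}$ = arg min(objective = SSE($\boldsymbol{\theta}_{alt.}$)), start = $\boldsymbol{\theta}^{0}_{alt.}$;
        Compute: SSE($\hat{\boldsymbol{\theta}}_{alt.}$);

Step 6: Compute: $F = \dfrac{(\mathrm{SSE}_{null} - \mathrm{SSE}_{alt.})/2}{\mathrm{SSE}_{alt.}/df_{alt.}}$;

Step 7: Parametric Bootstrap:
        for j = 1 to D1 do
            Generate $\mathbf{Y}^{rep}_{j}$ by drawing a random binary outcome for each subject, i = 1,…,N, from Bernoulli($p_i \mid \hat{\boldsymbol{\theta}}_{null}$);
            Repeat Steps 2 through 6 replacing Y with $\mathbf{Y}^{rep}_{j}$;
            Compute: $F^{rep}_{j}$, the replicate of F under $H_o$: $\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}_{null}$;
        End do;

Step 9: Compute: $p_{boot} = \dfrac{1}{D_1 + 1}\left(1 + \sum_{j=1}^{D_1} I\{F \le F^{rep}_{j}\}\right)$;

Step 10: Select the model:
        If $p_{boot}$ ≤ α and $K_{null}$ ≤ 2 then do
            Set $K_{null}$ = $K_{null}$ + 1, $K_{alt.}$ = $K_{alt.}$ + 1;
            Repeat Steps 2 through 9;
        End do;
        Else if $p_{boot}$ ≤ α and $K_{null}$ = 3 then do
            K = 4; $\hat{\boldsymbol{\theta}} = \hat{\boldsymbol{\theta}}_{alt.}$;
        End do;
        Else do;
            K = $K_{null}$; $\hat{\boldsymbol{\theta}} = \hat{\boldsymbol{\theta}}_{null}$;
        End do;

Step 11: Compute: $\hat{\boldsymbol{\theta}}^{PLS}$ from $\hat{\boldsymbol{\theta}}$ where B-spline parameter elements have been linearly transformed to piecewise linear slope parameters;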

For the MLE, we adopted a similar approach, but with a likelihood ratio (LR) test statistic

in place of the F-ratio statistics.


The MLE approach to the 2 df knot testing procedure algorithm is specified as

follows:

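Step 1: Set: $K_{null}$ = 0, $K_{alt.}$ = $K_{null}$ + 1;

Step 2: Input: Y, X, Z, W;

Step 3: Initialize: $\boldsymbol{\theta}^{0}_{null}$, $\boldsymbol{\theta}^{0}_{alt.}$;

Step 4: Minimize: $\hat{\boldsymbol{\theta}}_{null}$ = arg min(objective = -ln[L($\boldsymbol{\theta}_{null}$)]), start = $\boldsymbol{\theta}^{0}_{null}$;
        Compute: -ln[L($\hat{\boldsymbol{\theta}}_{null}$)];

Step 5: Minimize: $\hat{\boldsymbol{\theta}}_{alt.}$ = arg min(objective = -ln[L($\boldsymbol{\theta}_{alt.}$)]), start = $\boldsymbol{\theta}^{0}_{alt.}$;
        Compute: -ln[L($\hat{\boldsymbol{\theta}}_{alt.}$)];

Step 6: Compute: the likelihood ratio statistic, LR, from -ln[L($\hat{\boldsymbol{\theta}}_{null}$)] and -ln[L($\hat{\boldsymbol{\theta}}_{alt.}$)];

Step 7: Parametric Bootstrap:
        for j = 1 to D1 do
            Generate $\mathbf{Y}^{rep}_{j}$ by drawing a random binary outcome for each subject, i = 1,…,N, from Bernoulli($p_i \mid \hat{\boldsymbol{\theta}}_{null}$);
            Repeat Steps 2 through 6 replacing Y with $\mathbf{Y}^{rep}_{j}$;
            Compute: $LR^{rep}_{j}$, the replicate of LR under $H_o$: $\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}_{null}$;
        End do;

Step 9: Compute: $p_{boot} = \dfrac{1}{D_1 + 1}\left(1 + \sum_{j=1}^{D_1} I\{LR \le LR^{rep}_{j}\}\right)$;

Step 10: Select the model:
        If $p_{boot}$ ≤ α and $K_{null}$ ≤ 2 then do
            Set $K_{null}$ = $K_{null}$ + 1, $K_{alt.}$ = $K_{alt.}$ + 1;
            Repeat Steps 2 through 9;
        End do;
        Else if $p_{boot}$ ≤ α and $K_{null}$ = 3 then do
            K = 4; $\hat{\boldsymbol{\theta}} = \hat{\boldsymbol{\theta}}_{alt.}$;
        End do;
        Else do;
            K = $K_{null}$; $\hat{\boldsymbol{\theta}} = \hat{\boldsymbol{\theta}}_{null}$;
        End do;

Step 11: Compute: $\hat{\boldsymbol{\theta}}^{PLS}$ from $\hat{\boldsymbol{\theta}}$ where B-spline parameter elements have been linearly transformed to piecewise linear slope parameters;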

Thus, by either approach, we end with output parameters, $\hat{\boldsymbol{\theta}}^{PLS}$, for an optimal piecewise

linear model expressed in terms of local linear slope coefficients, knot locations, and

covariate coefficients.
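The control flow shared by the two knot testing algorithms above can be sketched in a few lines of Python. Everything model-specific is passed in as a callable, so `fit`, `simulate`, and `statistic` are hypothetical placeholders supplied by the caller (for example, `statistic` could be an F-ratio or a likelihood ratio); this is an outline of the forward selection logic under those assumptions and not the SAS program itself.

```python
import random


def select_number_of_knots(fit, simulate, statistic, y, alpha, d1, k_max=4, seed=0):
    """Forward selection of the number of knots by a parametric bootstrap test.

    Hypothetical callables supplied by the caller:
      fit(y, k)                -> (objective, params); smaller objective is better
      simulate(params, rng)    -> replicate outcome vector drawn under params
      statistic(o_null, o_alt) -> test statistic; larger favours the extra knot
    """
    rng = random.Random(seed)
    k_null = 0
    while True:
        obj_null, params_null = fit(y, k_null)
        obj_alt, params_alt = fit(y, k_null + 1)
        observed = statistic(obj_null, obj_alt)
        exceed = 0
        for _ in range(d1):
            y_rep = simulate(params_null, rng)       # outcomes drawn under the null fit
            rep_null, _ = fit(y_rep, k_null)
            rep_alt, _ = fit(y_rep, k_null + 1)
            if statistic(rep_null, rep_alt) >= observed:
                exceed += 1
        p_boot = (1 + exceed) / (d1 + 1)
        if p_boot > alpha:
            return k_null, params_null               # keep the reduced model
        if k_null + 1 == k_max:
            return k_max, params_alt                 # largest model allowed
        k_null += 1
```

Each pass through the loop refits both models on every replicate, which is why the number of bootstrap replicates dominates the running time.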

Grid Search

The selection of good starting values is critical for iterative optimization

procedures in avoiding locally optimal model parameter settings in favor of converging to

the global optima. It has been difficult identifying such starting values when modeling

with free-knot splines. This issue is ubiquitous in the literature and is particularly

troublesome when the functional surface relating the nonlinear predictor and response is

nearly flat. Not only is it important to place the knots well, but the algorithm must also

start with well placed covariate and spline parameters. To address this, we start spline

coefficient parameters at zero and any covariate coefficient parameters at their

multivariate GLM estimates. For the knots, we objectively search the free-knot parameter

space for plausible knot locations by using a grid search algorithm similar to that

described by Bessaoud et al. (2005). This obviates the need for subjectivity in assigning

starting values, but comes with extreme computational costs as increasing the size of the

grid has a multiplicative effect on the number of times we need to run the bootstrap

testing procedure.

The grid search was implemented in steps 4 and 5 of the 2 df MLE testing

procedure algorithm to locate the best starting values, $\boldsymbol{\theta}^{0}_{null}$ and $\boldsymbol{\theta}^{0}_{alt.}$. In the LSE approach

applied to headaches among women (Keith et al., 2008), we had started with values over

a fairly sparse set of plausible locations for where to place the knot values and simply

picked the configurations which minimized the SSE most. By either approach, we set

starting covariate coefficient parameters in $\boldsymbol{\theta}^{0}$ equal to those estimated by a linear logistic

regression in SAS PROC SURVEYLOGISTIC. For the MLE, we calculate models for L

possible starting locations placed at nearly equal distances throughout the range of the

predictor to avoid getting stuck on local maxima and help prevent coalescent knots. To

ensure that knots did not overlap, we also enforced linear constraints so that a small

minimal distance was maintained between any two knots, including the boundary knots.

As noted, the search can be extremely computationally expensive as for each loop from

step 2 through 9 of the algorithm, we call PROC NLP $\binom{L}{K}$ times. In the most extensive

case, where we reject the model having K = 2, we would require a number of PROC NLP

calls that grows with both $D_1$ and the binomial coefficients $\binom{L}{0}, \ldots, \binom{L}{4}$. For instance, this

quantity can vary from 22,604 when $D_1$ = 200 and L = 6 up to 511,004 when $D_1$ = 1,000

and L = 9. The latter is a large number of calls and is best run in a high performance

parallel processor computing environment.
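The size of the starting grid can be seen by enumerating it directly. The following Python sketch lists every configuration of K starting knots chosen from L evenly spaced candidate locations; the function name and the BMI range in the usage line are assumptions made for this example, and the minimum-distance constraints between knots are not modeled.

```python
from itertools import combinations
from math import comb


def starting_knot_grid(lower, upper, n_locations, k):
    """All C(L, K) starting configurations of K knots on an even grid.

    The L = n_locations candidates are placed strictly inside (lower, upper)
    at equal spacing, and each configuration is an increasing tuple.
    """
    step = (upper - lower) / (n_locations + 1)
    grid = [lower + step * (i + 1) for i in range(n_locations)]
    return list(combinations(grid, k))


configurations = starting_knot_grid(17.0, 45.0, n_locations=6, k=2)
number_of_starts = len(configurations)
```

Because the count is a binomial coefficient, adding even a few candidate locations quickly multiplies the number of optimizer calls needed per bootstrap replicate.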

Simulations

Evaluating the MLE 2 df Knot Testing Procedure

We have devised a simulation study to assess how well our MLE 2 df knot testing

procedure performs in correctly specifying the optimal number of knots and compare

results to those by,

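\[
\mathrm{AIC} = -2\log\{L(\hat{\boldsymbol{\theta}} \mid \mathbf{X}, \mathbf{W})\} + 2r, \tag{23}
\]

and

\[
\mathrm{BIC} = -2\log\{L(\hat{\boldsymbol{\theta}} \mid \mathbf{X}, \mathbf{W})\} + \log(n)\,r, \tag{24}
\]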

where r represents the number of parameters in the model and n is the sample size. The

conditions we focused upon in simulating data were the size of the sample (n) and the

proportion of events (po). For each we selected two settings: $n \in \{500, 5000\}$ and

$p_o \in \{0.10, 0.33\}$. The simulated data were generated from randomly selected samples of

n BMI records from the Third National Health and Nutrition Examination Survey

(NHANES III) – a complex sample weighted nationally representative cross-sectional

survey conducted between 1988 and 1994 with mortality follow-up in 2000. 100

simulated sets of n binary outcomes were generated conditional on the n BMI records and

a true log(odds) model. More precisely, for a given set of n BMI values, each one, q = 1,

…, n, was assigned a Bernoulli random variable, Yqj | Xq = BMIq (j = 1, …, 100), based

on the probability of event defined by,

\[
\pi\big(\eta([X_q])\big) = P\big(Y_q = 1 \mid [X_q]\big) = \frac{\exp\{\eta([X_q])\}}{1 + \exp\{\eta([X_q])\}}, \tag{25}
\]

where the η function was specified by a true log(odds) model having Ktrue = 2 knots fixed

at BMI = 25 and BMI = 32 and piecewise slopes fixed at a1 = -0.4, a2 = 0.0, and a3 = 0.2

as in (3). These parameter settings were chosen as they define a functional shape that

characterizes the nonlinear U-shaped BMI relationship with a binary outcome variable

indicating mortality during follow-up among NHANES III participants having 17 ≤ BMI

≤ 45 at baseline. The intercept of the true model, a0, was calculated for each BMI dataset

as it must be conditioned on the desired level of po. To evaluate sensitivity to the


inclusion criterion (α), we ran our MLE 2 df knot testing procedure with settings of

$\alpha \in \{0.10, 0.25\}$ on all combinations of the n and po settings.
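The data-generating step of the simulation can be sketched as follows. This Python example assumes a continuous piecewise linear log(odds) with the stated knots (25 and 32) and slopes (-0.4, 0.0, 0.2), solves for the intercept by bisection so that the average event probability matches the target proportion, and then draws Bernoulli outcomes. The uniform BMI values are only a stand-in for the NHANES III records, and all function names are hypothetical.

```python
import math
import random


def log_odds(bmi, a0, knots=(25.0, 32.0), slopes=(-0.4, 0.0, 0.2)):
    """Continuous piecewise linear log(odds) with the stated knots and slopes."""
    eta = a0 + slopes[0] * min(bmi, knots[0])
    eta += slopes[1] * max(0.0, min(bmi, knots[1]) - knots[0])
    eta += slopes[2] * max(0.0, bmi - knots[1])
    return eta


def mean_probability(bmis, a0):
    """Average event probability implied by intercept a0."""
    return sum(1.0 / (1.0 + math.exp(-log_odds(b, a0))) for b in bmis) / len(bmis)


def solve_intercept(bmis, p_target, lo=-50.0, hi=50.0):
    """Bisection for the intercept that yields the target event proportion."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mean_probability(bmis, mid) < p_target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


rng = random.Random(1)
bmis = [rng.uniform(17.0, 45.0) for _ in range(500)]   # stand-in for sampled BMI records
a0 = solve_intercept(bmis, 0.10)
outcomes = [1 if rng.random() < 1.0 / (1.0 + math.exp(-log_odds(b, a0))) else 0
            for b in bmis]
```

Because the intercept depends on the particular BMI values drawn, it has to be recomputed for each simulated BMI dataset, as described above.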

Figure 2 displays the true model (in red) and replicated models (in black)

resulting from the application of the MLE 2 df knot testing procedure to 20 randomly

selected simulated datasets for each combination of settings. When the sample size was

low (n = 500 in Figure 2 frames a) – d)), there was considerable error variance or “noise”

distorting the true model “signal” which generated the binary simulated data and our

procedure did not perform nearly as well as when there was more information available

(n = 5,000 in Figure 2 frames e) – i)). When the proportion of events was elevated (po =

0.33 in frames b), d), f), and h)), it also introduced more information and reduced

uncertainty in where the true model was located. Note that Figure 2 frame i) shows the

only instance in which the sample weights from NHANES III had been included. They

did not greatly impact the performance of the MLE 2 df knot testing procedure, but they

did introduce an extra source of variance and possibly some degree of numerical

instability resulting in increased computation time and more varied results.

In Figure 3 we plotted the frequencies with which each of the possible number of

knots (i.e., K = 0, 1, 2, 3, or 4) was selected as optimal under each combination of

settings. Each frame has colored points representing the observed frequencies connected

by colored lines representing knot selection results from our method (in red), AIC (in

black), and BIC (in blue). All of these approaches were too conservative when the sample

size was low (n = 500 in Figure 3 frames a) – d)), but BIC was also too conservative

when the sample size was high and the proportion of events was low (n = 5000, po = 0.10

in Figure 3 frames e) and g)). Sample-weighted likelihoods from large survey samples are


not on a scale by which the AIC or BIC penalties would have any effect to curb

overfitting. You can see this result clearly in Figure 3 frame i) where our method was

accurate, but somewhat imprecise while AIC and BIC methods were neither accurate nor

precise. In general, our method worked accurately and very similarly to AIC, in cases

where no sample weights were used, as long as the sample size was large or when the

inclusion criterion was set high (α = 0.25 or n = 5000 in Figure 3 frames c) - h)).

Estimating Uncertainty in Parameter Estimates and Expressing Results

To illustrate why using the sample weights is important in such studies, consider

this simple hypothetical example. Suppose that in the population you have 20% African

Americans and 80% Caucasians and that due to planned oversampling, you have drawn a

sample consisting of 50% African Americans and 50% Caucasians. Now suppose that

the effect you are studying is more pronounced in Caucasians than African Americans. If

there is heterogeneity between these two groups and you do not adjust for the additional

weight given to the African Americans, you can expect to misspecify the variability

estimates and you might miss detecting effects or differences because of the bias induced

by over-representation of African Americans. The sampling design structures we run into

in practice are analogous, but more complicated and will be described in some detail

below. For a more general and technical review of the use of sampling weights for

analytic inference about parameters and how to incorporate the weights into statistical

models the interested reader should see Pfeffermann (1993).
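The effect of ignoring the weights in the hypothetical example above can be checked with a few lines of Python. The event rates (0.10 in the oversampled group, 0.30 in the other group) and the sample of 100 people are invented for illustration; only the 20%/80% population shares and the 50%/50% sample shares come from the example.

```python
def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


population_share = {"oversampled": 0.20, "other": 0.80}
sample_share = {"oversampled": 0.50, "other": 0.50}

# Hypothetical sample of 100 people; each value is that person's group event rate.
sample = [("oversampled", 0.10)] * 50 + [("other", 0.30)] * 50
values = [rate for _, rate in sample]

# Each person's weight is proportional to population share / sample share.
weights = [population_share[g] / sample_share[g] for g, _ in sample]

unweighted_estimate = weighted_mean(values, [1.0] * len(values))
weighted_estimate = weighted_mean(values, weights)
population_rate = 0.20 * 0.10 + 0.80 * 0.30
```

With these made-up numbers the unweighted estimate is pulled toward the oversampled group, while the weighted estimate recovers the population rate.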


Complex Samples

Standard statistical procedures and software typically have the underlying

assumption that the sample to be analyzed was collected by simple random sampling

(SRS). SRS gives equal probability of selection to each unit of the population which

results in a “random sample” of independent observations. Complex samples do not give

equal probability of selection to each unit in the population and are not independently

sampled, thus care must be taken in conducting the statistical analyses required to analyze

these samples appropriately. Analyzing a complex sample with methods designed for

SRS samples will produce incorrect estimates of variances and standard errors, and

possibly incorrect estimates of means and model parameters.

Multistage probability cluster sampling. The data we are considering will be from

samples designed to efficiently represent the US noninstitutionalized population. These

samples are drawn from the population using complex, multistage probability cluster

sampling that achieves the quality of effectively representing the population much more

quickly than the classic simple random sampling (SRS) design (Kish, 1995; U.S. DHHS

NHANES III Analytic and Reporting Guidelines, 1996). There are three components to

the information provided to the analyst to adjust for the unequal probability sampling of

multistage complex sample designs we see in datasets such as NHANES and NHIS. The

components are stratum, primary sampling unit (PSU), and sample weight. The strata are

usually based on geographic area. PSUs are clusters within a stratum and generally given

a probability of being selected for sampling that is proportional to the size of the cluster

(with the exception that some clusters, such as the New York City metropolitan PSU,

are assigned a selection probability of 1). The sample weights can be loosely defined as

giving each sampled subject a weight to indicate what proportion of the population they

represent.

The complex sample design variables actually presented to the researcher are

pseudo-variables. That is, they are false, but useful design variables provided by the

survey designers to mask the true sampling design features in order to protect participant

confidentiality. It is not clear from the pseudo-variables which PSUs have been sampled

with certainty and which have not. For more information on the issues surrounding

confidentiality and complex survey samples, the interested reader may see Lu (2000).

Making adjustments without existing software. As we are not aware of any

available tools, such as SUDAAN software, for nonlinear modeling of survey data with

complex sample designs, we found a way to make appropriate adjustments in our

program. There are two basic approaches to making complex sample adjustments:

linearization and resampling. Linearization is the application of a Taylor’s series

expansion to make first order linear approximations to possibly nonlinear parameters.

Variance estimates are then based on the linear approximations (Rao, 1997). Rao et al.

(1992) provide useful ideas for alternative approaches to this problem based on

resampling. Rao (1997) suggests that,

“An advantage of a resampling method is that it employs a single standard-error formula for all statistics $\hat{\theta}$, unlike the linearization method, which requires the derivation of a separate formula for each statistic $\hat{\theta}$. Moreover, linearization can become cumbersome in handling poststratification and nonresponse adjustments, whereas it is relatively straightforward with resampling methods… As a result, they [software using linearization] cannot handle more complex analyses such as logistic regression with poststratified weights.”

Thus, resampling provides a more general and versatile approach well suited to our

problem.

The resampling methods detailed by Rao et al. (1992) include balanced repeated

replication (BRR), the jackknife, and bootstrap. BRR involves resampling many “half-

sample” replicates by deleting one PSU from each stratum, rescaling the complex sample

weights, calculating a weighted replicate parameter estimate, and computing variance

estimates for the original parameter estimate based on the variability in the BRR

replicates. This method does not work well in cases where we have more than two PSUs

per stratum. The jackknife method deletes one PSU, rescales the sample weights,

calculates a replicate parameter estimate, and repeats this for each PSU within each

stratum. A variance estimate for the original parameter estimate can then be calculated

from these jackknife replicates.

The most convenient resampling approach is to resample the PSUs with

replacement within each stratum by using the nonparametric bootstrap method (Rao et al.,

1992) and appropriately rescale the weights. To be specific, the individual sampling

weights within the hth stratum (h = 1, …, H) are rescaled by the following equation:

\[
w^{*}_{hij} = w_{hij}\left[1 - \sqrt{\frac{d_h}{n_h - 1}} + \sqrt{\frac{d_h}{n_h - 1}} \times \frac{n_h}{d_h} \times r_{hi}\right], \tag{26}
\]

where $w^{*}_{hij}$ is the rescaled weight for the jth individual in the ith PSU, $w_{hij}$ is the original

weight for the jth individual in the ith PSU, $n_h$ and $d_h$ are respectively the number of PSUs

and the number of bootstrap samples drawn from this stratum, and $r_{hi}$ is the number of

times the ith PSU is resampled. This is the underlying methodology applied in our


framework to achieve approximately unbiased standard errors and confidence intervals

adjusted for multistage complex sample designs.
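A minimal Python sketch of the weight rescaling for a single stratum follows. It assumes the square-root form of the rescaling factor given by Rao et al. (1992); the function name and the toy weights are hypothetical, and a full replicate would repeat this for every stratum before refitting the model.

```python
import math
import random


def rescale_stratum_weights(weights_by_psu, d_h, rng):
    """Rescaled bootstrap weights for one stratum.

    weights_by_psu: list of lists holding the original weights of the
    individuals in each of the n_h PSUs. d_h PSUs are drawn with replacement.
    The square-root factor is an assumption taken from Rao et al. (1992).
    """
    n_h = len(weights_by_psu)
    draws = [rng.randrange(n_h) for _ in range(d_h)]
    r = [draws.count(i) for i in range(n_h)]             # r_hi: times PSU i was drawn
    scale = math.sqrt(d_h / (n_h - 1))
    factors = [1.0 - scale + scale * (n_h / d_h) * r_i for r_i in r]
    return [[w * f for w in psu] for psu, f in zip(weights_by_psu, factors)]


# Two PSUs per stratum and d_h = n_h - 1 = 1: the drawn PSU has its weights
# doubled and the other PSU receives zero weight.
rescaled = rescale_stratum_weights([[1.0, 1.0], [1.0, 1.0]], d_h=1, rng=random.Random(0))
```

When d_h = n_h - 1 the square-root factor equals one, so the choice between the square-root and plain forms does not affect designs with two PSUs per stratum and one draw.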

Rao et al. (1992) and Rust & Rao (1996) each discuss in detail this method for

bootstrap adjustment of complex multistage sample weights when the number of PSUs

per stratum is at least 2 (nh ≥ 2). Rao et al. (1992) suggested that this method is valid and

consistent for estimated parameters expressed as either smooth or nonsmooth functions of

totals when nh ≥ 2 and H is relatively large (e.g., H = 49 in NHANES III). Setting nh = 2

is a popular choice (common to both the NHANES and NHIS series) as it provides the

maximum amount of stratification possible for conducting valid variance estimation.

Once we have settled on a model with K knots by application of our 2 df knot

testing procedure, we are prepared to ascertain the certainty in our parameter estimates.

We begin by applying the methods suggested by Rao et al. (1992) described above to

generate D2 nonparametric bootstrap replicate estimates per each parameter of interest.

Keith et al. (2008), in applying the LSE methodology, used the bootstrap-t method

described by DiCiccio and Efron (1996) for calculating 95% CI from D2 = 1000

nonparametric bootstrap replicates. Hall and Wilson (1991) suggest this method as a

general guideline for improving statistical power and the accuracy of coverage

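probabilities (i.e. bootstrapping a distribution for an asymptotically pivotal quantity,

$T = (\hat{\theta} - \theta)/\hat{\sigma}$, by $T^{*}_{i} = (\hat{\theta}^{*}_{i} - \hat{\theta})/\hat{\sigma}^{*}_{i}$, $i = 1, \ldots, D_2$, where θ is some parameter of interest (say a

particular knot or slope), $\hat{\theta}$ is the original parameter estimate, $\hat{\sigma}$ is the original standard

deviation estimate, and $\hat{\theta}^{*}_{i}$ and $\hat{\sigma}^{*}_{i}$ are the parameter and standard deviation estimates,

respectively, from the ith bootstrapped sample). Then the bootstrap estimate of the

standard error of $\hat{\theta}$ is,

\[
\hat{\sigma}^{*} = \sqrt{\frac{1}{D_2 - 1}\left(\hat{\boldsymbol{\theta}}^{*} - \hat{\theta}^{*}\right)^{T}\left(\hat{\boldsymbol{\theta}}^{*} - \hat{\theta}^{*}\right)}, \tag{27}
\]

where $\hat{\boldsymbol{\theta}}^{*}$ represents the vector of $\hat{\theta}^{*}_{i}$’s estimated from the $D_2$ bootstrap samples and

$\hat{\theta}^{*} = \frac{1}{D_2}\mathbf{1}^{T}\hat{\boldsymbol{\theta}}^{*}$ is the mean of the bootstrap replicates.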

The distribution of T is not necessarily symmetric, so we locate the critical values

at either end of the ordered bootstrapped distribution $\mathbf{T}^{*} = \{T^{*}_{(1)}, \ldots, T^{*}_{(D_2)}\}$ such that

$P(T^{*}_{(\text{lower critical})} < T < T^{*}_{(\text{upper critical})}) \ge 0.95$, with equal probability in either tail, and applying

some algebra leads to the 95% CI for θ.
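The algebra referred to here amounts to inverting the pivot: if the critical values bracket T = (estimate - θ)/σ, then θ lies between the estimate minus the upper critical value times σ and the estimate minus the lower critical value times σ. The Python sketch below implements that inversion; the function name and the simple order-statistic rule used to pick the critical values are assumptions for illustration.

```python
import math


def bootstrap_t_interval(theta_hat, sigma_hat, theta_reps, sigma_reps, level=0.95):
    """Equal-tailed bootstrap-t confidence interval for one parameter.

    theta_reps and sigma_reps hold the replicate estimates and their
    standard deviation estimates from the bootstrap samples.
    """
    t_star = sorted((t - theta_hat) / s for t, s in zip(theta_reps, sigma_reps))
    d2 = len(t_star)
    k = math.floor((1.0 - level) / 2.0 * d2 + 1e-9)   # order statistics cut per tail
    t_lower, t_upper = t_star[k], t_star[d2 - 1 - k]
    # Invert T = (theta_hat - theta) / sigma_hat for theta.
    return theta_hat - t_upper * sigma_hat, theta_hat - t_lower * sigma_hat


# Toy replicates whose pivots are exactly the integers -20, ..., 20.
reps = [1.0 + 0.5 * t for t in range(-20, 21)]
interval = bootstrap_t_interval(1.0, 0.5, reps, [0.5] * len(reps))
```

Note that the upper critical value sets the lower confidence limit and vice versa, which is how an asymmetric bootstrap distribution produces an asymmetric interval.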

This method can be more stable and less conservative than using the more basic

percentile methods described by Davison and Hinkley (1997) and applied to free-knot

splines by Bessaoud et al. (2005); however, the standard error estimates, $\hat{\sigma}^{*}_{i}$, were drawn

from the optimization procedure (PROC NLIN) and required running the model with the

far less stable piecewise linear basis, depicted in (5), in order to apply them directly to the

bootstrap-t distribution of the slope coefficient parameters. The following specification

outlines this complex sample adjustment procedure.

Nonparametric bootstrap procedure specifications for calculating standard

errors and 95% confidence intervals for parameter estimates by the LSE approach:

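Step 1: Input: Y, X, Z, W

Step 2: Nonparametric bootstrap:
        for j = 1 to D2 do
            for h = 1 to H do
                resample with replacement $m_h = n_h - 1$ PSUs from stratum h;
                rescale sample weights;
            End do;
            Minimize: $\hat{\boldsymbol{\theta}}_{j}$ = arg min(objective = SSE($\boldsymbol{\theta}$)), start = $\hat{\boldsymbol{\theta}}$;
            Compute: $\hat{\boldsymbol{\theta}}^{PLS}_{j}$ where B-spline parameter elements have been linearly transformed to piecewise linear slope parameters;
        End do;

Step 3: Let $\boldsymbol{\Lambda}^{T}_{i}$ represent the vector transpose of the ith row of the matrix $\boldsymbol{\Lambda} = \left[\hat{\boldsymbol{\theta}}^{PLS}_{1} \ldots \hat{\boldsymbol{\theta}}^{PLS}_{D_2}\right]$;

Step 4: Compute SE and 95% CI for each model parameter, i:
        for i = 1 to p do
            Compute: $\hat{\theta}^{PLS}_{i} = \frac{1}{D_2}\mathbf{1}^{T}\boldsymbol{\Lambda}^{T}_{i}$;
            Compute: $\hat{\sigma}^{*}_{i} = \sqrt{\frac{1}{D_2 - 1}\left(\boldsymbol{\Lambda}^{T}_{i} - \hat{\theta}^{PLS}_{i}\right)^{T}\left(\boldsymbol{\Lambda}^{T}_{i} - \hat{\theta}^{PLS}_{i}\right)}$;
            Sort: $\boldsymbol{\Lambda}^{T}_{i}$ in ascending order;
            Compute: $\mathbf{T}^{*}_{i} = \{T^{*}_{i(1)}, \ldots, T^{*}_{i(D_2)}\}$ from $\boldsymbol{\Lambda}^{T}_{i}$;
            Compute: $P(T^{*}_{i(\text{lower critical})} < T_i < T^{*}_{i(\text{upper critical})}) \ge 0.95$ with equal probability in either tail;
            Compute: 95% CI for $\theta^{PLS}_{i}$ from $T^{*}_{i(\text{lower critical})}$ and $T^{*}_{i(\text{upper critical})}$;
        End do;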

For the MLE, we decided to implement the more conservative percentile method

of calculating the 95% confidence intervals and do all optimization with the B-spline

basis.

Nonparametric bootstrap procedure specifications for calculating standard

errors and 95% confidence intervals for parameter estimates by the MLE approach

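Step 1: Input: Y, X, Z, W

Step 2: Nonparametric bootstrap:
        for j = 1 to D2 do
            for h = 1 to H do
                resample with replacement $m_h = n_h - 1$ PSUs from stratum h;
                rescale sample weights;
            End do;
            Minimize: $\hat{\boldsymbol{\theta}}_{j}$ = arg min(objective = -ln[L($\boldsymbol{\theta}$)]), start = $\hat{\boldsymbol{\theta}}$;
            Compute: $\hat{\boldsymbol{\theta}}^{PLS}_{j}$ where B-spline parameter elements have been linearly transformed to piecewise linear slope parameters;
        End do;

Step 3: Let $\boldsymbol{\Lambda}^{T}_{i}$ represent the vector transpose of the ith row of the matrix $\boldsymbol{\Lambda} = \left[\hat{\boldsymbol{\theta}}^{PLS}_{1} \ldots \hat{\boldsymbol{\theta}}^{PLS}_{D_2}\right]$;

Step 4: Compute SE and 95% CI for each model parameter, i:
        for i = 1 to p do
            Compute: $\hat{\theta}^{PLS}_{i} = \frac{1}{D_2}\mathbf{1}^{T}\boldsymbol{\Lambda}^{T}_{i}$;
            Compute: $\hat{\sigma}^{*}_{i} = \sqrt{\frac{1}{D_2 - 1}\left(\boldsymbol{\Lambda}^{T}_{i} - \hat{\theta}^{PLS}_{i}\right)^{T}\left(\boldsymbol{\Lambda}^{T}_{i} - \hat{\theta}^{PLS}_{i}\right)}$;
            Sort: $\boldsymbol{\Lambda}^{T}_{i}$ in ascending order;
            Set lower bound for the 95% CI of $\theta^{PLS}_{i}$ to the 2.5th percentile of $\boldsymbol{\Lambda}^{T}_{i}$;
            Set upper bound for the 95% CI of $\theta^{PLS}_{i}$ to the 97.5th percentile of $\boldsymbol{\Lambda}^{T}_{i}$;
        End do;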
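The percentile step at the end of the MLE specification can be sketched in Python as follows. The function name is hypothetical, and linear interpolation between order statistics is only one of several common percentile conventions; the source does not state which convention was used.

```python
def percentile_interval(replicates, level=0.95):
    """Percentile-method CI from bootstrap replicates of one parameter.

    For level = 0.95 this returns the 2.5th and 97.5th percentiles, using
    linear interpolation between order statistics (an assumed convention).
    """
    ordered = sorted(replicates)
    d2 = len(ordered)

    def percentile(q):
        pos = q * (d2 - 1)
        lo = int(pos)
        hi = min(lo + 1, d2 - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    tail = (1.0 - level) / 2.0
    return percentile(tail), percentile(1.0 - tail)


lower_bound, upper_bound = percentile_interval(list(range(0, 201)))
```

Unlike the bootstrap-t interval, this calculation needs no standard error estimate from each replicate fit, so all optimization can stay in the B-spline basis.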

Odds ratios

We found odds ratios (OR) to be a powerful way of expressing event risk as a

function of the nonlinear predictor. We chose OR over the log(odds) when models have

been adjusted for covariate information because, unlike the log(odds), the OR for comparing two

otherwise similar individuals does not depend on the covariates. While computing OR in

our framework is not quite as simple as in a conventional GLM, it is straightforward.

Assume the basis in equation (5) and that all else is equal between individuals l = 1

and l = 2 except for their nonlinear predictor values, X1 and X2, respectively. Then

we may compute an odds ratio:

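$$
\mathrm{OR} = \frac{\exp\left( a_1 X_1 I\{X_1 < \zeta_1\} + \sum_{i=1}^{K} \left[ \sum_{j=1}^{i} a_j (\zeta_j - \zeta_{j-1}) + a_{i+1} (X_1 - \zeta_i) \right] I\{\zeta_i \le X_1 < \zeta_{i+1}\} \right)}{\exp\left( a_1 X_2 I\{X_2 < \zeta_1\} + \sum_{i=1}^{K} \left[ \sum_{j=1}^{i} a_j (\zeta_j - \zeta_{j-1}) + a_{i+1} (X_2 - \zeta_i) \right] I\{\zeta_i \le X_2 < \zeta_{i+1}\} \right)}. \tag{28}
$$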

Graphical representations of this OR may be created if a suitable reference level can be

fixed for X2 while allowing X1 to range.
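
The odds ratio of equation (28) can be illustrated with a short Python sketch. It assumes a continuous piecewise linear contribution of the predictor to the log(odds), with K knots and K + 1 segment slopes, measured from zero at X = 0 (that is, a lower boundary of 0 is assumed for the first segment). The function names and all numeric values (knots at BMI 20 and 35 and the three slopes) are hypothetical and are not estimates from this work.

```python
import math

def piecewise_logit(x, slopes, knots):
    """Contribution of x to the log(odds) under a continuous piecewise
    linear spline with K ascending knots and K + 1 segment slopes."""
    if len(slopes) != len(knots) + 1:
        raise ValueError("need exactly one more slope than knots")
    eta, left = 0.0, 0.0
    for slope, knot in zip(slopes, knots):
        if x < knot:
            return eta + slope * (x - left)
        eta += slope * (knot - left)   # accumulate the completed segment
        left = knot
    return eta + slopes[-1] * (x - left)

def odds_ratio(x1, x2, slopes, knots):
    """OR comparing predictor value x1 to reference x2, all else equal.
    Intercept and covariate terms cancel, so only the spline part remains."""
    return math.exp(piecewise_logit(x1, slopes, knots)
                    - piecewise_logit(x2, slopes, knots))

# Hypothetical parameters: risk falls to a nadir at BMI 20, rises to 35, then is flat.
knots = [20.0, 35.0]
slopes = [-0.10, 0.03, 0.0]
for bmi in (18, 25, 30, 35, 40):
    print(bmi, round(odds_ratio(bmi, 20.0, slopes, knots), 3))
```

Fixing the reference value (here 20) and letting the first argument range over a grid of predictor values yields the points needed for the graphical representation described above.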

Possible Future Directions

The next step for the development of the MLE framework is to introduce

penalties for coalescing knots in a fashion similar to that of Lindstrom (1999). If the data

are truly better modeled by jump-discontinuities, then the modeling framework should

allow for this possibility. Inducing penalties to avoid unnecessary overlapping of knots is

thus a more appealing approach to avoiding lethargy problems (Jupp, 1978) than

dropping models in which knots have coalesced (Bessaoud et al., 2005) or enforcing

linear constraints to ensure a certain amount of space between knots, as we have done.

Modeling time to events or censorship during follow-up in complex samples is a

crucial objective for this nonlinear modeling framework. We expect to extend our

likelihood-based MLE methods to modeling nonlinear bases in partial likelihood

formulations (Cox, 1975). This will provide a foundation to begin modeling relative risks

in terms of hazard ratios computed by nonlinear proportional hazards regression in our

framework with some modifications to the design of the MLE approach.

Discussion

The methods described in this paper have been successfully implemented and

applied elsewhere to real complex data on BMI related to headaches among women by

the LSE approach (Keith et al., 2008) and to BMI or waist-to-hip ratio related mortality

by the MLE approach (Keith et al., in preparation). Our MLE 2 df knot testing procedure

for specifying the optimal number of knots worked accurately and very similarly to AIC

as long as the sample size was large or when the inclusion criterion was set fairly high

(i.e., α = 0.25 or n = 5000). BIC was too conservative unless both the proportion of

events and the sample size were large (i.e., po = 0.33 and n = 5000). Most complex

nationally representative surveys have at least 5,000 participant records

available for analyses. However, when the data are stratified and analyses are run on

small subsets of the survey data, our methods may not have enough power to precisely or

accurately characterize the “true” model.

Neither AIC nor BIC incurred penalties sufficient to curb overfitting

when complex sampling weights were applied to the likelihood objective functions. The

weights distorted the scale of the likelihood relative to the penalty to the point that the penalties

no longer corrected for overfitting. It is clear that the likelihood and/or the penalty terms

must be normalized in some way before these methods will work correctly in the

complex sample analysis setting.
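
The scale problem can be made concrete with a small Python sketch. It compares a smaller and a larger candidate model using a weighted Bernoulli log-likelihood, first with raw weights that sum to a population-sized total and then with weights rescaled to sum to the sample size n. The rescaling is only one possible normalization, chosen here as an assumption for illustration; it is not a method evaluated in this work. All function names, weights, and fitted probabilities are hypothetical.

```python
import math

def weighted_loglik(y, p, w):
    """Weighted Bernoulli log-likelihood for outcomes y and fitted probabilities p."""
    return sum(wi * (yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
               for yi, pi, wi in zip(y, p, w))

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    return -2.0 * loglik + k * math.log(n)

def normalize_weights(w):
    """Rescale weights to sum to the sample size (an assumed normalization)."""
    n, total = len(w), sum(w)
    return [wi * n / total for wi in w]

n = 100
y = [i % 2 for i in range(n)]
raw_w = [3000.0 + 40.0 * i for i in range(n)]        # hypothetical survey weights
norm_w = normalize_weights(raw_w)

p_small = [0.5] * n                                   # smaller model, k = 2
p_big = [0.505 if yi else 0.495 for yi in y]          # larger model, k = 4, marginally better fit

for label, w in (("raw", raw_w), ("normalized", norm_w)):
    diff_aic = aic(weighted_loglik(y, p_big, w), 4) - aic(weighted_loglik(y, p_small, w), 2)
    diff_bic = bic(weighted_loglik(y, p_big, w), 4, n) - bic(weighted_loglik(y, p_small, w), 2, n)
    # A negative difference means the criterion prefers the larger model.
    print(label, "AIC difference:", round(diff_aic, 2), "BIC difference:", round(diff_bic, 2))
```

With the raw weights, the trivial gain in fit is multiplied by the weight scale and swamps the fixed penalty, so both criteria favor the larger model; after rescaling, the penalty again outweighs the gain.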

Due to computational demands and long run-times associated with our MLE 2 df

knot testing procedure, we were only able to conduct analyses on 100 simulations for 8

setting combinations with D1 = 200 parametric bootstrap replicates and a coarse grid

search over L = 6 evenly-spaced BMI locations. Bessaoud et al. (2005) suggest D1 =

1000 and L = 9 which would be feasible for this simulation study only by using high

performance parallel computing resources. We are currently porting the SAS programs to

R (R Development Core Team, 2005) for parallel processing of a more extensive

simulation study. Additionally, we did not introduce covariates into the simulated

models. In future studies we will examine how correlation structures and collinearities

might influence the performance of our knot selection procedure. We acknowledge these

weaknesses in our current study; however, we feel that the results from the simulations

are valid and characterize several properties of this aspect of the modeling framework

quite well.

Our methods are intended for use on biological data in which the knot parameters

have meaning and thus we expect to be looking for a relatively low number of knots in

data conditions where the number of observations, n, is expected to be much larger than

the number of parameters, p. Furthermore, given the computational intensity of fitting the

framework, it is not recommended for applications where p is close to or greater than n.

With enough computing power, we suggest that the nonlinear bases in our framework can

be easily extended for fitting models with more than one nonlinear predictor, say age and

BMI, as well as multiplicative interactions. We again offer the caveat that the volume of

the multivariate parameter space will increase exponentially by adding nonlinear

predictors (the “curse of dimensionality”; Bellman, 1957; Hastie and Tibshirani,

1986), as mentioned in the introduction.

We feel that our 2 df testing procedure for the significant contribution to model fit

of adding a knot is general enough to test against polynomial models. That is, the null

and/or alternative models do not necessarily have to be linear logistic or piecewise linear

logistic. The distributions of the likelihood ratio or F-ratio test statistics are constructed

empirically by the parametric bootstrap procedure and do not rely on any particular null or

alternative model specifications. The forward testing routine could then be modified to

test if a smooth polynomial would fit the data significantly better than a piecewise model.
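
The idea of building the null distribution of the likelihood ratio statistic empirically can be sketched in Python as follows. This is a simplified illustration and not the procedure as implemented in this work: it tests a linear logistic null against a one-knot piecewise linear alternative, replaces numerical optimization of the knot by a coarse grid search, omits covariates and sample weights, and uses a plain Newton-Raphson fit. All function names, the simulated data, and the tuning values are hypothetical.

```python
import math
import random

def sigmoid(e):
    e = max(-30.0, min(30.0, e))          # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-e))

def solve(A, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_logistic(X, y, iters=20):
    """Newton-Raphson logistic regression; returns (coefficients, log-likelihood)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        mu = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [sum((yi - mi) * row[j] for yi, mi, row in zip(y, mu, X)) for j in range(k)]
        hess = [[sum(mi * (1 - mi) * row[j] * row[l] for mi, row in zip(mu, X))
                 + (1e-9 if j == l else 0.0) for l in range(k)] for j in range(k)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    mu = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
    loglik = sum(math.log(mi) if yi else math.log(1.0 - mi) for yi, mi in zip(y, mu))
    return beta, loglik

def design(bmi, knot=None):
    """Intercept, centered BMI, and (for the alternative) a hinge term at the knot."""
    return [[1.0, b - 27.0] + ([max(b - knot, 0.0)] if knot is not None else [])
            for b in bmi]

def lrt_one_knot(bmi, y, grid):
    beta0, ll0 = fit_logistic(design(bmi), y)
    ll1 = max(fit_logistic(design(bmi, z), y)[1] for z in grid)
    return 2.0 * (ll1 - ll0), beta0

def parametric_bootstrap_pvalue(bmi, y, grid, n_boot, rng):
    observed, beta0 = lrt_one_knot(bmi, y, grid)
    p_null = [sigmoid(sum(b * x for b, x in zip(beta0, row))) for row in design(bmi)]
    exceed = 0
    for _ in range(n_boot):
        y_star = [1 if rng.random() < p else 0 for p in p_null]   # simulate under the null fit
        exceed += lrt_one_knot(bmi, y_star, grid)[0] >= observed
    return observed, (exceed + 1) / (n_boot + 1)

rng = random.Random(42)
bmi = [rng.uniform(16.0, 44.0) for _ in range(150)]
eta = [-1.0 - 0.20 * min(b - 21.0, 0.0) + 0.05 * max(b - 21.0, 0.0) for b in bmi]
y = [1 if rng.random() < sigmoid(e) else 0 for e in eta]
observed, p_value = parametric_bootstrap_pvalue(bmi, y, [20.0, 25.0, 30.0, 35.0], 20, rng)
print("LRT statistic:", round(observed, 3), "bootstrap p-value:", round(p_value, 3))
```

Because the reference distribution comes from refitting both models to data simulated under the fitted null, no chi-square approximation is needed, and either model could in principle be swapped for another form, such as a polynomial, by changing the design function.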

AIC and BIC were designed for testing non-nested models. Although our MLE 2

df knot testing procedure performed well in our simulation study in comparison to AIC

and BIC, it is important to note that a problem may exist for our approach to selecting the

optimal number of knots, K. One of the foundational assumptions of the forward

selection is that the model with K knots is nested within the model with K+1 knots. As

Bessaoud et al. (2005) pointed out, free-knot splines in which both the number and

locations of knots are estimated are non-nested, with the notable exception that the linear

model (K=0) is nested in all K-knot models. Although it is hard to imagine a well-fitted

(K+1)-knot model fitting any worse than a K-knot model, this could possibly happen

since the models are not nested. Defining nested models is not an easy task. Clarke

(2001) provides intuitive, but oversimplified definitions of nested and nonnested models:

“Two models are nested if one model can be reduced to the other model by

imposing a set of linear restrictions on the parameter vector.”

“Two models are nonnested, either partially or strictly, if one model cannot be

reduced to the other model by imposing a set of linear restrictions on the

parameter vector.”

This is the basic idea presented in introductory statistics courses as a foundation for the

asymptotic F-test or likelihood ratio test for testing the contribution of parameter subsets

to the overall model fit in regression analysis. Our situation with fitting free knot

parameters is more complicated than regression. When a free knot parameter is added or

removed from these models, the parameters locally fitted to construct the adjoining spline

segments do not maintain their definition. If we compare two piecewise linear free-knot

spline models (say K=1 to K=2) fitted to the same data, we cannot say that the slope

parameter to the right of the knot in the K=1 model (a2) means the same thing as the slope

parameter (a2) between the two knots in the K=2 model (note that it does not mean the

same thing as the parameter (a3) to the right of the second knot either). These models

would be nested if the knot fitted in the K=1 knot model was in a fixed location for the

K=2 knot model, but conditioning added knots on the location of the previous knot

locations undermines the properties we prize in free-knot splines. Even though our 2 df

knot testing procedure appeared to work well in simulations and in application to real

data, a more comprehensive approach to selecting the optimal K from among

nonnested candidate models deserves further research.

References

Akaike H. (1974) A new look at the statistical model identification. IEEE Transactions

on Automatic Control;19:716–723.

Bellman RE. (1957) Dynamic Programming. Princeton University Press, Princeton, NJ.

Bessaoud F, Daurès JP, Molinari N. (2005) Free knot splines for

logistic models and threshold selection. Computer Methods and Programs

in Biomedicine;77:1-9.

Broyden CG. (1970) The Convergence of a Class of Double-rank Minimization

Algorithms. Journal of the Institute of Mathematics and Its Applications;6:76-90.

Burchard HG. (1974) Splines (With Optimal Knots) Are Better. Applicable

Analysis;3:309-319.

Casella G, Berger R. (2001) Statistical Inference. 2nd Edition. New York: Duxbury.

Clarke KA. (2001) Testing nonnested models of international relations: reevaluating

realism. American Journal of Political Science; 45:724-44.

Cox DR. (1975) Partial likelihood. Biometrika;62:69-72.

Davison AC, Hinkley DV. (1997) Bootstrap Methods and their Application. New York:

Cambridge University Press.

de Boor C. (1978) A Practical Guide to Splines. New York: Springer-Verlag.

DiCiccio TJ, Efron B. (1996) Bootstrap Confidence Intervals. Stat Sci;11:189-212.

Dierckx P. (1993) Curve and Surface Fitting with Splines. Oxford Science Publications.

DiMatteo I, Genovese CR, Kass RE. (2001) Bayesian curve-fitting with free-knot splines.

Biometrika;88:1055-71.

Efron B. (1982) The Jackknife, the Bootstrap, and Other Resampling Plans.

Philadelphia: SIAM.

Eilers P, Marx B. (1996) Flexible Smoothing with B-splines and Penalties. Statistical

Science;11:89-121.

Fletcher R. (1970) A New Approach to Variable Metric Algorithms. Computer

Journal;13:317-22.

Friedman J. (1991) Invited Paper: Multivariate Adaptive Regression Splines. The Annals

of Statistics;19:1-141.

Gray RJ. (1996) Hazard Rate Regression Using Ordinary Nonparametric Regression

Smoothers. Journal of Computational and Graphical Statistics;5:190-207.

Gray RJ. (1994) Spline-based tests in survival analysis. Biometrics;50:640-52.

Goldfarb D. (1970) A Family of Variable Metric Updates Derived by Variational Means.

Mathematics of Computation;24:23-6.

Hall P, Wilson S. (1991) Two guidelines for bootstrap hypothesis testing.

Biometrics;47:757-762.

Hastie T, Tibshirani R. (1986) Generalized Additive Models (with discussion). Statistical

Science;1:297-318.

Hastie T, Tibshirani R. (1990) Generalized additive models. Chapman and Hall, London.

Hastie T, Tibshirani R, Friedman J. (2001) The Elements of Statistical Learning: Data

Mining, Inference, and Prediction. New York: Springer-Verlag.

Johnson MS. (2007) Modeling dichotomous item responses with free-knot splines.

Computational Statistics and Data Analysis;51:4178-4192.

Jupp DL. (1978) Approximation to data by splines with free knots. SIAM J Numer

Anal;15:328-343.

Kauermann G, Claeskens G, Opsomer JD. (2006). Bootstrapping for Penalized Spline

Regression. Preprint Series #06-01, Department of Statistics, Iowa State

University. Submitted to Statistical Science.

Keith SW, Wang C, Fontaine KR, Allison DB. (2008) Body mass index and headache

among women: Results from 11 epidemiologic datasets. Obesity;16:377-83.

Kish L. (1995) Survey Sampling, Wiley, New York.

Kooperberg C, Stone C, Truong Y. (1995) Hazard Regression. JASA;90:78-94.

Korn EL, Graubard BI. (1999) Analysis of Health Surveys. J. Wiley & Sons, New York.

Levenberg K. (1944) A Method for the Solution of Certain Problems in Least Squares.

Quart. Appl. Math;2:164-168.

Lindstrom MJ. (1999) Penalized estimation of free-knot splines. Journal of

Computational and Graphical Statistics;8:333-352.

Lindstrom MJ. (2002) Bayesian estimation of free-knot splines using reversible jumps.

Computational Statistics and Data Analysis;41:255-269.

Lu WW. (2000) Confidentiality and variance estimation in complex surveys. M.Sc.

Thesis, Simon Fraser University.

Marquardt D. (1963) An Algorithm for Least-Squares Estimation of Nonlinear

Parameters. SIAM J. Appl. Math;11:431-441.

McCullagh P, Nelder JA. (1989) Generalized Linear Models, Second Edition. Chapman

& Hall/CRC, Boca Raton.

Molinari N, Daures JP, Durand JF. (2001) Regression splines for threshold selection in

survival data analysis. Statistics in Medicine;20:237-247.

Nelder JA, Mead R. (1965) A Simplex Method for Function Minimization. Computer

J;7:308-313.

Nelder JA, Wedderburn R. (1972) Generalized Linear Models. J.R. Statisti. Soc.

A;135:370-384.

Pfeffermann D. (1993) The role of sampling weights when modeling survey data.

International Statistical Review;61:317-37.

R Development Core Team (2005). R: A Language and Environment for Statistical

Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-

900051-07-0, URL http://www.R-project.org/.

Rao JNK. (1997) Developments in sample survey theory: an appraisal. The Canadian

Journal of Statistics;25:1-21.

Rao JNK, Wu CFJ, Yue K. (1992) Some recent work on resampling methods for

complex surveys. Survey Methodology;18:209–217.

Rosenberg P. (1995) Hazard Function Estimation Using B-Splines. Biometrics;51:874-

887.

Ruppert D. (2002) Selecting the Number of Knots for Penalized Splines. Journal of

Computational & Graphical Statistics;11:735-57.

Ruppert D, Wand MP, Carroll RJ. (2003) Semiparametric Regression. New York:

Cambridge University Press.

Rust KF, Rao JNK. (1996) Variance estimation for complex surveys using replication

techniques. Statistical Methods in Medical Research;5:283-310.

SAS Institute Inc., Cary, NC, USA; version 9.1.

Schwarz G. (1978) Estimating the dimension of a model. Ann. Stat;6:461-464.

Shanno DF. (1970) Conditioning of Quasi-Newton Methods for Function Minimization.

Mathematics of Computation;24:647-56.

Stone CJ, Hansen MH, Kooperberg C, Truong YK. (1997) Polynomial Splines and Their

Tensor Products in Extended Linear Modeling. The Annals of Statistics;25:1371-

1425.

U.S. Department of Health and Human Services (DHHS). (1996) National Center for

Health Statistics. Third National Health and Nutrition Examination Survey, 1988-

1994, NHANES III Laboratory Data File (CD-ROM). Public Use Data File

Documentation Number 76200. Hyattsville, MD: Centers for Disease Control and

Prevention.

Wahba G. (1990) Spline Models for Observational Data. SIAM, Philadelphia.

Wand MP, Pearce ND. (2006) Penalized splines and reproducing kernel methods. The

American Statistician;60:233-240.

Wood S. (2006) Generalized Additive Models: An Introduction with R. New York:

Chapman & Hall/CRC.

Figure 1. Plotted B-spline basis functions of order m = 2 having knots at BMI =

21 and BMI = 34.

a) α=0.10; po=0.10; n=500    b) α=0.10; po=0.33; n=500    c) α=0.25; po=0.10; n=500

d) α=0.25; po=0.33; n=500    e) α=0.10; po=0.10; n=5000    f) α=0.10; po=0.33; n=5000

g) α=0.25; po=0.10; n=5000    h) α=0.25; po=0.33; n=5000    i) α=0.10; po=0.33; n=5000*

* includes complex sample weights

Figure 2. Model selection simulation results for the parametric bootstrap 2 df forward selection procedure. True log(odds) model (K = 2) plotted by BMI (in red) with results from 20 replicate datasets (in black). Each cell a) - i) depicts results from data simulated under various conditions including inclusion criterion, α, the proportion of events, po, and sample size, n.

a) α=0.10; po=0.10; n=500    b) α=0.10; po=0.33; n=500    c) α=0.25; po=0.10; n=500

d) α=0.25; po=0.33; n=500    e) α=0.10; po=0.10; n=5000    f) α=0.10; po=0.33; n=5000

g) α=0.25; po=0.10; n=5000    h) α=0.25; po=0.33; n=5000    i) α=0.10; po=0.33; n=5000*

* includes complex sample weights

Figure 3. A comparison of knot selection simulation results. Plotted lines connect frequencies for the number of knots fitted to 100 simulated datasets by method: AIC (in black), BIC (in blue), and the parametric 2 df forward selection procedure (in red). Each cell a) - i) depicts results from data simulated under various conditions including inclusion criterion, α, the proportion of events, po, and sample size, n.

BMI AND HEADACHE AMONG WOMEN: RESULTS FROM 11 EPIDEMIOLOGIC

DATASETS

by

SCOTT W. KEITH, CHENXI WANG, KEVIN R. FONTAINE, CHARLES D. COWAN, DAVID B. ALLISON

Obesity; 16:377-83.

Copyright 2008 by

Scott W. Keith

Abstract

Objective: To evaluate the association between body mass index (BMI: kg/m2) and

headaches among women.

Research Methods and Procedures: Cross-sectional analysis of 11 datasets identified

after searching for all large publicly available epidemiologic cohort study datasets

containing relevant variables. Datasets included: National Health Interview Survey:

1997-2003; the first National Health and Nutrition Examination Survey; Alameda County

Health Study; Tecumseh Community Health Study; and Women’s Health Initiative. The

women (220,370 in total) were aged 18 years or older and had reported their headache or

migraine status.

Results: Using nonlinear regression techniques and models adjusted for age, race, and

smoking, we found that increased BMI was generally associated with increased

likelihood of headache or severe headache among women. A BMI of approximately 20

was associated with the lowest risk of headache. Relative to a BMI of 20, mild obesity

(BMI of 30) was associated with roughly a 35% increase in odds of headache whereas

severe obesity (BMI of 40) was associated with roughly an 80% increase in odds. Results

were essentially unchanged when models were further adjusted for socioeconomic

variables, alcohol consumption, and hypertension. Being diagnosed with migraine

showed no association with BMI.

Discussion: Among US women, a BMI of approximately 20 (about the 5th percentile)

was associated with the lowest likelihood of headache. Consistently across studies, obese

women had significantly increased risk of headaches. In contrast, risk of diagnosed

migraine headache per se was not obviously related to BMI. The direction of causation

and mechanisms of action remain to be determined.

Key Words: women, headaches, migraines, nonlinear regression, splines.

Introduction

Various forms of headache (e.g., chronic daily headache, tension-type headache,

migraine headache) are disabling conditions (1-3) that, compared to other common pain

conditions, produce the greatest loss of productive time in the US workforce. (2) Because

the prevalence of the different forms of headache varies widely in published studies (e.g.,

1.3% to 86% for tension-type headache (1)) it is difficult both to derive a definitive

estimate and to assess whether the headache prevalence has changed over time. (4,5)

Headache has been shown to be associated with breathing disorders, caffeine

consumption, alcohol consumption, hypertension, anxiety and depressive disorders. (6)

Emerging evidence from case-control (6,7) and observational studies (8-10) suggests that

increased body mass index (BMI: kg/m2) might be a risk factor for headache.

In this study we estimate the association between BMI and headache among adult

women using data from several large publicly available epidemiologic datasets. We

restricted our analyses to women because it has been established that headache

prevalence is much higher among women, (3) and preliminary unpublished data

suggested that obesity’s association with headache varied substantially by gender. This is

consistent with the differential associations of obesity and a variety of health issues

observed between men and women. (11-14) Rather than analyzing a single dataset and

issuing the near ubiquitous call for replication in the discussion, to evaluate the

reproducibility of results and how results might change as a function of study-related

factors, we opted to analyze multiple data sets using identical statistical procedures. This

allowed us to derive estimates of the magnitude of the BMI-headache association across

all publicly available epidemiologic datasets meeting a set of inclusion criteria.

Methods

Inclusion Criteria for Datasets

To rigorously evaluate the association between BMI and headache among

women, we used cross-sectional epidemiologic datasets that met the following

requirements: (i) they must be large enough (i.e., ≥ 500 women) to allow us to generate

reasonably precise estimates across a broad range of BMI; (ii) they must contain the

height and weight of respondents (measured or self-reported) allowing calculation of

BMI; (iii) they must contain respondents’ age, race, and other variables of interest (i.e.,

smoking status, socioeconomic status, and hypertension); and (iv) they must contain

information on the presence/absence of headache.

Dataset Search Procedures

To obtain epidemiologic datasets that met the aforementioned criteria, we

searched the following electronic resources: Inter-University Consortium for Political and

Social Research (ICPSR) [http://www.icpsr.umich.edu/access/index.html], the National

Center for Health Statistics [http://www.cdc.gov/nchs/express.htm], the National Heart,

Lung and Blood Institute (NHLBI) Population Studies Dataset

[http://apps.nhlbi.nih.gov/popstudies], the North Carolina Center for Population Studies

[http://www.cpc.unc.edu/restools/sdf], the Economic and Social Data Service, United

Kingdom [http://www.esds.ac.uk/access/access.asp], and the National Library of

Medicine’s Medline and pre-Medline dataset [http://www.ncbi.nlm.nih.gov]. The search

of these resources yielded 11 datasets that met criteria for inclusion in our analyses.

Overview of Datasets Used

We briefly describe here and in Table 1 the characteristics of the 11 datasets used

in our analysis.

The Alameda County Health Study (ACHS) followed adults selected in 1965 to

represent the non-institutionalized population of Alameda County, California. (15) Data

collected included self-reported demographic information, as well as physical, cognitive,

psychological, and social functioning.

The Tecumseh Community Health Study (TCHS), initiated in 1959, investigates

health and disease determinants in the rural community of Tecumseh, Michigan.

Participants completed extensive questionnaires and medical examinations. (16)

The National Health Interview Survey (NHIS: 1997 to 2003), begun in 1969, is a

continuing nationwide survey of the U.S. civilian non-institutionalized population

conducted in households on a yearly basis. (17) A probability sample of households is

interviewed each year. Detailed information on the health of each living member of the

sample household is obtained.

The First National Health and Nutrition Examination Survey (NHANES I) was

conducted from 1971 to 1975 on a nationwide probability sample of individuals aged 1-

74 years. We analyzed the data from women aged 18 and over. NHANES I collected data

via questionnaire as well as through comprehensive medical and dental examinations.

NHANES I design and sampling methods have been reported previously. (18)

The Women’s Health Initiative (WHI) is a 40-center, national United States study

of risk factors and the prevention of common causes of mortality, morbidity, and

impaired quality of life in women. Post-menopausal women, aged 50 to 79 years,

completed health forms and attended a clinic visit at baseline and three years later.

Details of the sampling design, protocol sampling procedures, and selection criteria have

been previously published. (19)

Study Variables

Predictor. Body mass index (BMI: kg/m2) was the predictor variable of primary

interest and was calculated from either measured or self-reported (NHIS only) weight and

height. Self-reported weight has been shown to correlate very highly with measured

weight. (20) BMI is largely independent of height (r ≈ -0.03), strongly related to weight (r

≈ 0.86), and reasonably correlated with body fatness. (21)

Outcome variables. The datasets varied somewhat with regard to how headache

was assessed (see Table 2). We recoded and dichotomized headache outcomes so that: 0

= absence of an indicator of severe or frequent headache or migraine versus 1 = presence

of an indicator of severe or frequent headache or migraine.

Covariates. Data on age, race, and smoking status were included in the primary

analyses (i.e., the primary models) as covariates. We also included socio-economic status

variables (income, education, and employment status), alcohol consumption, and

hypertension as covariates in our secondary analyses (i.e., the extended models).

Missing Data. Missing data were handled using listwise deletion (22) because

more complex missing data management procedures would impose a significantly greater

computational demand on an already computationally demanding set of analyses.

Furthermore, the complex sampling designs of datasets, most notably the NHIS, would

create additional statistical issues related to imputation. Although there was no reason to

hypothesize that “missingness” was systematically related to the study variables, we

noted two datasets in which some study variables were missing information in at least 5%

of records. In these datasets, we fitted logistic regression models for each variable

producing such missingness (> 5%) to test for a relationship between missingness, coded

as a binary dependent variable, and the other study variables as possible predictors.

Statistical analysis

Traditionally, analyses of the association between BMI and dichotomous

outcomes such as mortality or the presence/absence of a given medical condition have

been estimated by treating BMI as either a continuous or categorical variable. (23) Each

approach has advantages and disadvantages. Advantages of treating BMI as a continuous

variable include that it does not degrade the data, tends to preserve power, and does not

impose arbitrary cut-points. Rather, one can adjust for curvature in data via incorporation

of polynomials of BMI into the model. The major advantage of treating BMI as a

categorical variable, with the categories chosen a priori, is the (seeming) ease of

communication of the results and the allowance for marked non-linearity that may not be

easily captured by polynomials. The nonlinear regression we used offers an alternative

that captures advantages of treating BMI as both a continuous and categorical variable.

(24) Specifically, we applied piecewise linear free knot spline logistic regression models

that do not assume a linear relationship between BMI and headache and allow for fitting

“breakpoints” in the logit function at so-called knots that may be interpreted to define

categories. (24) Thus, these data-driven models can take into account potential non-

linearity by determining BMI categories for contiguous BMI groupings of individuals

with like patterns of risk while, at the same time, allowing individuals with different BMI

in a category to have different levels of risk estimated as a function of their BMI. In brief,

we fitted nonlinear models to each dataset, used a parametric bootstrap procedure to

select the optimal spline model for BMI, and then used a nonparametric bootstrap

procedure to calculate accurate standard error estimates and confidence intervals that

adjusted for complex sample design features.

Two analyses were conducted on each dataset: primary and secondary. In the

primary analyses, we adjusted for age, race, and smoking status. In secondary analyses,

we assessed an extended model that adjusted for the aforementioned covariates as well as

socio-economic variables, alcohol consumption, and hypertension. For purposes of

comparing models within each dataset, the number of knot parameters fitted in the

extended model was fixed at the number of knots found in the primary model.

We excluded BMI values less than 14 and greater than 90 to avoid possible outlier

effects and data recording errors. After exclusions, a total of 220,370 respondents from

the 11 datasets were available for statistical analyses. Results were presented as

parameter estimates with standard errors and 95% confidence intervals. We also plotted

odds ratios for graphically demonstrating the shapes of relationships we found and

calculated odds ratios and confidence intervals at selected BMI values (i.e., 18, 25, 30,

35, 40) as compared to a reference BMI value for each of the 11 datasets.

Results

Table 3 and Figure 1 present the piecewise logistic regression results for the

primary model (i.e., adjusted for age, race, and smoking) for each of the 11 datasets.

Increased BMI was generally associated with increased risk of headache or severe

headache among women. Moreover, results from the NHIS 1997, NHIS 1999, NHIS

2003, and ACHS datasets located “breakpoints” (knots) in the logit function around a

BMI of 20 suggesting that the relationship between increased BMI and the risk of

headache may change significantly at this point. As shown in Figure 1, our models often

predicted that a BMI of approximately 20 was associated with the lowest risk of severe

and/or frequent headaches. The NHIS 1997 data also produced a second knot at a BMI of

about 35. At this point, the increased risk of headache with increased BMI significantly

decelerated suggesting that people having BMI greater than 35 may share the same level

of headache risk with respect to BMI. The risks of being diagnosed with migraine headache

(as assessed in the WHI) and of taking medication for headache (as assessed in

NHANES I) were not significantly related to BMI.

Table 4 and Figure 1 present the results for the extended models (i.e., the primary

models extended to include socio-economic status variables, alcohol consumption, and

hypertension as covariates). As can be seen, these estimates were generally in accord

with those derived from the primary models. The results from the NHANES I data were

also not materially altered when headache was coded either as [No = 0; Occasionally &

Regularly = 1] or as [No & Occasionally = 0; Regularly = 1].

Table 5 presents the odds ratios (OR’s) and confidence intervals derived from the

primary and extended models for selected BMI values (i.e., 18, 25, 30, 35, and 40)

compared to the reference BMI of 20. We chose a BMI of 20 as the reference level

because it was the most common nadir (i.e., the value most often associated with the

lowest probability of reporting headache) across the datasets, thus making the computed

OR’s greater than 1 in most cases. This allowed us to present results on a consistent scale

for visually comparing OR’s across the datasets we examined. Among

women with BMI greater than 20, the OR’s were mostly statistically significant and

exhibited a similar increasing pattern in headache risk across 9 of our 11 datasets. For

example, among the NHIS results: as compared to a BMI of 20, we estimated that mild

obesity (BMI of 30) was associated with an increase in odds of reporting headache

ranging from 31% to 65% whereas severe obesity (BMI of 40) was associated with an

increase in odds ranging from 49% to 118%.

We found that only NHANES I and TCHS had variables which were missing

information from at least 5% of records. Smoking was available in only about 40% of the

NHANES I women participants and was removed from the analyses presented here.

Analyses including smokers in NHANES I produced the same non-significant results

(data not shown). Missing smoking data were associated with decreased BMI,

decreased age, income below $20,000, not being a current drinker, and hypertension.

Hypertension was missing in 23% of women in NHANES I and missingness was

associated with taking headache medication, increased age, being white, having attended


graduate school, earning over $20,000, having ever been an alcohol drinker, and

increased BMI (data not shown). In the TCHS data, missing information on income

(22%) was associated with increased age; missing information on hypertension (9%) was

associated with income over $20,000 and decreased BMI; and missing information on

BMI (5%) was associated with having headaches (data not shown).

Discussion

In this set of analyses of 11 different, large datasets collectively containing over

200,000 US women, we found that increased BMI was generally associated with

significantly increased risk of headache, but not diagnosed migraines. We note that our

results across all datasets, with the exceptions of WHI (diagnosed migraines) and

NHANES I (taking headache medication), suggested that, as compared to a BMI of 20,

mild obesity (BMI of 30) was associated with approximately a 35% increase in odds of

reporting headache whereas severe obesity (BMI of 40) was associated with a roughly

80% increase in odds of headache. Across the databases, a BMI of about 20 was

commonly associated with the lowest risk of headache. These results were not materially

altered when socioeconomic variables, alcohol consumption, and hypertension were also

included in the model.

With regard to migraine headache, results from our primary model of the WHI

data, the only dataset that explicitly assessed migraine headache diagnosis, suggested that

BMI may not be associated with migraine, but our extended model revealed a slight

negative association. It is noteworthy that many people with migraines go undiagnosed.

Therefore, the relationship between BMI and diagnosed migraines is not likely to reflect


the BMI relationship with all migraines (both diagnosed and undiagnosed). Our main

conclusion that BMI was associated with increased likelihood of headaches is based on

the following logic: 1) the WHI analysis showed no positive correlation between BMI

and diagnosed migraines, and this is a finding of consequence because of the large WHI

sample; 2) the NHIS analyses showed a positive association between BMI and

“headaches or migraines;” 3) the ACHS and TCHS showed a positive association

between BMI and “headaches” which were probably interpreted by most participants to

include migraines; 4) finding 1) suggested that there was no association between BMI

and diagnosed migraines in our NHIS, ACHS and TCHS analyses; and 5) so we

concluded that the findings in NHIS, ACHS and TCHS suggest that BMI was associated

with non-migraine headaches and possibly undiagnosed migraine.

Most of the databases we analyzed individually provided ample statistical power

to detect the estimated effect sizes we observed. That, along with the consistency of the

results, obviated the need to conduct a formal meta-analysis. Our findings clarify and

accord with previous studies. Specifically, after adjusting for age, gender, race, and

education, Scher and colleagues (6) found that obesity was associated with prevalent

chronic daily headache (OR = 1.34). Similarly, Ohayon (10) found that

overweight/obese (BMI >27) respondents were more likely to report morning headache

than were adults with BMIs of 20-25. Among a sample of nearly 15,000 Australian

women, Brown and colleagues (9) found that obese persons were more likely to report

headache (OR = 1.47). Also, consistent with our primary model analysis of WHI data,

Bigal and colleagues (8), using data from over 30,000 participants, found that BMI was

not associated with migraine prevalence.


Interestingly, we observed some evidence in four datasets (NHIS 1997, 1999,

2003; and ACHS) that unusually low BMI may be associated with increased risk of

headache. These results suggest that increased BMI may be associated with decreased

risk of headache among the category of women having BMI less than 20 and increased

risk of headache among those having BMI over 20. This finding should be interpreted

with caution since the association was statistically significant at the 0.05 level only in the

NHIS 1997 and 1999 datasets. It was noteworthy that, since only about 5% of all study

participants had BMI values below 20, we may have lacked sufficient power to reliably

detect the elevated risk levels estimated to be associated with low BMI across studies. To

our knowledge, low BMI, in the absence of major illness (e.g., cancer), has not been

previously associated with reports of headache. Nonetheless, this finding merits further

investigation before definitive conclusions can be drawn.

Nine of the 11 datasets we examined had no study variables with more than

3% of data missing, so any resulting effects from missingness were likely minimal in these cases. In

NHANES I and TCHS, where we saw higher levels of missing data for some variables, it

was less clear what, if any, effects missing values might have had on our results.

Interestingly, in TCHS, those reporting headache were about 80% more likely to be

missing BMI data than those not reporting headache, but we cannot know how this would

influence the significant linear relationship we detected between BMI and headache.

Considering results from the more complete datasets (i.e., NHIS 1997 - 2003 and ACHS)

and the similarity of their results to those from TCHS, missingness may not have

significantly affected the TCHS results.


The mechanisms that might be responsible for the obesity-headache association

are unclear. However, obesity associates with the metabolic syndrome, a pro-

inflammatory, pro-thrombotic state which may contribute to headache development and

progression. (25,26) Headache is also related to sleep apnea, a condition highly

associated with obesity. (27) Hypertension also associates with headache (28) and obesity

is a major risk factor for hypertension. (29) Moreover, headache is one of the side effects

of many medications, including sibutramine, a medication to treat obesity. (30) Each of

these offers a hypothesis meriting further study.

This study has limitations. First, the headache-related questions in the datasets

differed, in some cases substantially. For example, the WHI headache question focused

on migraine headache and asked, “Has a doctor told you that you have ‘migraine’?” In

contrast, the NHANES headache question did not ask whether the respondent suffered

from headache but, rather, whether they used medication for headache (“During the past

6 months have you used any medicine, drugs or pills for headache?”). Although we coded

the headache variables in the datasets to create uniformity in outcome variables (see

Table 2), these two datasets (WHI and NHANES I) which asked about headache in a way

related to diagnosis or treatment are the only two that did not detect clear and statistically

significant associations. The assessment of headache in the other datasets focused

primarily on the presence/frequency/severity of headaches. Second, we only considered

cross-sectional datasets as they were more widely available and different statistical

methodology would be required to analyze longitudinal data. Hence, our analyses were

restricted to headache or migraine status concurrent with BMI status. We did not look at

data on subjects free of headaches at baseline that were followed prospectively to see if


BMI or changes in BMI would predict headache or migraine occurrence over time.

Follow-up data were available in only the two smallest studies (ACHS and TCHS). In the

future, we recommend analyzing any available longitudinal data on headache and BMI

by using nonlinear methodology similar to that which we have applied to these cross-

sectional databases.

In conclusion, the results of estimating the association between BMI and

headache in large, nationally-representative samples of women indicated that obese

women have significantly higher risk of headaches. Further research is warranted to study

the direction and mechanisms of causation as well as to investigate the possible BMI-

headache relationship among men. The possibility that weight loss may alleviate severe

or chronic headache problems among obese people also warrants investigation.

Acknowledgement

This research was supported by Ortho-McNeil Pharmaceutical, Inc. and NIH

grants P30DK056336, T32HL079888, K23MH066381, and AR49720-01A1.

Conflict of interest statement

The corresponding author, David B. Allison, had full access to all the data in the

study and had final responsibility for the decision to submit for publication. The

investigators have no financial and personal relationships with other people or

organizations that could inappropriately influence (bias) their work. They do wish to

disclose that the work was funded by Ortho-McNeil Pharmaceutical, Inc.


References

1 Schwartz BS, Stewart WF, Simon D, Lipton RB. Epidemiology of tension–type

headache. JAMA. 1998;279:381–83.

2 Stewart WF, Ricci JA, Chee E, Morganstein D, Lipton R. Lost productive time and

cost due to common pain conditions in the US workforce. JAMA. 2003;290:2443–54.

3 Scher AI, Stewart WF, Liberman J, Lipton RB. Prevalence of frequent headache in a

population sample. Headache. 1998;38:497–506.

4 Lipton RB, Diamond M, Freitag FG, Bigal M, Stewart WF, Reed ML. Migraine

prevention patterns in a community sample: Results from the American Migraine

Prevalence and Prevention (AMPP) Study. Headache. 2005;65:792. Abstract F38.

5 Rasmussen BK, Jensen R, Schroll M, Olesen J. Epidemiology of headache in the

general population: a prevalence study. J Clin Epidemiol. 1991;44:1147–57.

6 Scher AI, Stewart WF, Ricci JA, Lipton RB. Factors associated with the onset and

remission of chronic daily headache in a population–based study. Pain. 2003;106:81–

89.

7 Peres MFP, Lerario DDG, Garrido AB, Zukerman E. Primary headache in obese

patients. Arq Neuropsiquiatr. 2005;63:931–33.

8 Bigal ME, Liberman JN, Lipton RB. Obesity and migraine: A population study.

Neurology. 2006;28:545–50.

9 Brown WJ, Mishra G, Kenardy J, Dobson A. Relationships between body mass index

and well-being in young Australian women. Int J Obes. 2000;24:1360–68.

10 Ohayon MM. Prevalence and risk factors of morning headaches in the general

population. Arch Intern Med. 2004;164:97–102.


11 Calle EE, Rodriguez C, Walker-Thurmond BA, Thun MJ. Overweight, obesity and

mortality from cancer in a prospectively studied cohort of US adults. N Engl J Med.

2003;348:1625–38.

12 Haslam DW, James WP. Obesity. Lancet. 2005;366:1197–1209.

13 Klein S, Burke LE, Bray GA, et al. Clinical implications of obesity with specific

focus on cardiovascular disease: a statement for professionals from the American

Heart Association Council on Nutrition, Physical Activity, and Metabolism.

Circulation. 2004;110:2952–67.

14 Pi-Sunyer FX. Comorbidities of overweight and obesity: current evidence and

research issues. Med Sci Sports Exerc. 1999;31:S602–S608.

15 Berkman LF, Breslow L. Health and ways of living: The Alameda County Studies.

New York, NY: Oxford University Press, 1983.

16 Epstein FH, Napier JA, Block WD, et al. The Tecumseh Study: design, progress, and

perspectives. Arch Environ Health. 1970;21:402–07.

17 NCHS. National Health Interview Survey (NHIS). Public–Use Data Files.

http://www.cdc.gov/nchs/products/elec_prods/subject/nhis.htm.

18 Cox CS, Mussolino ME, Rothwell ST, et al. Plan and operation of the NHANES I

Epidemiologic Follow–Up Study, 1992. Vital Health Stat 1. 1997;35:1–231.

19 The Women’s Health Initiative Study Group. Design paper. Design of Women’s

Health Initiative Clinical Trial and Observational Study. Control Clin Trials.

1998;19:61–109.

20 Jeffrey RW. Bias in reported body weight as a function of education, occupation,

health, and weight concern. Addict Behav. 1996;21:217–22.


21 Heymsfield SB, Allison DB, Heshka S, Pierson RN. Assessment of human body

composition. In: D.B. Allison, ed. Handbook of assessment methods for eating

behaviors and weight-related problems: Measures, theory, and research. San Diego,

CA: Sage Publications, 1995:515–60.

22 Rao JNK, Wu CFJ, Yue K. Some recent work on resampling methods for complex

surveys. Survey Methodology. 1992;18:209–17.

23 Fontaine KR, Allison DB: Obesity and Mortality Rates. In: G. Bray & C Bouchard

(Eds.), Handbook of Obesity, 2nd Edition. New York, Dekker and Co., 2003:767–85.

24 Bessaoud F, Daures JP, Molinari N. Free knot splines for logistic models and

threshold selection. Comp Meth Prog Biomed. 2005;77:1–9.

25 Lee YH, Pratley RE. The evolving role of inflammation in obesity and the metabolic

syndrome. Curr Diabetes Rep. 2005;5:70–75.

26 Alessi MC, Lijnen HR, Bastelica D, Juhan-Vague I. Adipose tissue and

atherothrombosis. Pathophysiol Haemost Thromb. 2004;33:290–97.

27 Dodick DW, Eross EJ, Parish JM. Clinical, anatomical, and physiologic relationship

between sleep and headache. Headache. 2003;43:282–92.

28 Law M, Morris JK, Jordan R, Wald N. Headaches and treatment of blood pressure:

results from a meta–analysis of 94 randomized placebo–controlled trials with 24,000

participants. Circulation. 2005;112:2301–06.

29 Pi-Sunyer FX. Pathophysiology and long–term management of metabolic syndrome.

Obes Res. 2004;12 Suppl:174S–180S.

30 Loewinger LE, Young WB. Headache preventives: effect on weight. Neurology.

2002;58[7 Suppl 3]:A286.


31 SAS, Version 9.1. SAS Institute. Cary, NC. 2003.

32 Davison AC, Hinkley DV. Bootstrap Methods and their Applications. Cambridge:

Cambridge University Press, 1997.

33 Efron B. The jackknife, the bootstrap, and other resampling plans, in CBMS–NSF

Regional Conf. Series in Applied Mathematics, no. 38. SIAM. 91, 1982.

34 DiCiccio TJ, Efron B. Bootstrap Confidence Intervals. Stat Sci. 1996;11:189–212.

35 Hall P, Wilson SR. Two guidelines for bootstrap hypothesis testing. Biometrics.

1991;47:757–62.

36 Rust KF, Rao JNK. Variance estimation for complex systems using replication

techniques. Stat Methods Med Res. 1996;5:283–310.

37 Lahiri P. On the impact of Bootstrap in survey sampling and small-area estimation.

Stat Sci. 2003;18:199–210.


Figure 1. Odds ratios for headaches among women by BMI (reference BMI = 20).


Table 1 Description of epidemiologic datasets used

Study | Composition of sample | Dates of study | Female (%) | Age at entry (yrs) | Weight & height | White (%)
National Health Interview Survey (NHIS: 1997-2003) | Continuous nationwide household survey of the civilian non-institutionalized US population | Annual | ~52% | ≥ 18 | Self-report | ~72%
Women’s Health Initiative Observational Study (WHI) | Women ineligible for clinical trial components enrolled from 40 US centers | 1993-1998 | 100% | 50-79 | Measured | 83%
National Health and Nutrition Examination Survey (NHANES I) | Collects information about the health and lifestyle of a nationally representative sample of the civilian non-institutionalized US population | Annual | ~51% | 20+ | Measured | ~60%
Alameda County Health Study (ACHS) | Representative sample of Alameda County, CA | 1965-1975 | 54% | 16-94 | Measured | 79%
Tecumseh County Health Study (TCHS) | Representative sample of Tecumseh, MI | 1959-1985 | 52% | 35-69 | Measured | 100%


Table 2 The coding of headache among the 11 datasets

Dataset | Question | Response options
National Health Interview Survey (NHIS: 1997-2003) | “During the PAST THREE MONTHS, did you have… severe headache or migraine?” | Yes; No; Refused; Not ascertained; Don’t know
National Health and Nutrition Examination Survey (NHANES I) | “During the past 6 months have you used any medicine, drugs or pills for headache?” Coded and analyzed in two ways: [No = 0; Occasionally & Regularly = 1] [No & Occasionally = 0; Regularly = 1] | Regularly; Occasionally; No; Blank
Women’s Health Initiative (WHI) | “Has a doctor told you that you have any of the following conditions?” | Migraine headache
Alameda County Health Study (ACHS) | “Have you had frequent headaches during the past 12 months?” | Yes; No; No answer
Tecumseh Community Health Study (TCHS) | (If gets headaches) do they bother you just a little or quite a bit? [0,1,2 coded as absence of headache AND 3,4,5,6,7,8 coded as presence of headache] | 0. Never gets headaches; 1. Gets headaches rarely, bother a little; 2. Gets headaches frequently, bother a little; 3. Gets headaches rarely, bother quite a bit; 4. Gets headaches frequently, bother quite a bit; 5. Gets headaches, bother quite a bit; 6. Other headaches, not classifiable; 7. Headaches rarely, not ascertained how bothersome; 8. Headaches frequently, not ascertained how bothersome; 9. Not ascertained


Table 3 Piecewise logistic regression primary model results*

Study | Sample Size | Parameter | Estimate | Standard Error | 95% CI (bootstrapped)
NHIS 1997 | 19,727 | BMI Slope (low)† | -0.151 | 0.115 | (-1.235, -0.017)
 | | knot 1‡ | 18.97 | 2.236 | (16.45, 20.30)
 | | BMI Slope (mid)† | 0.041 | 0.093 | (0.033, 0.049)
 | | knot 2 | 35.08 | 4.212 | (32.74, 48.91)
 | | BMI Slope (high)† | -0.003 | 0.019 | (-0.061, 0.016)
NHIS 1998 | 17,355 | BMI Slope | 0.027 | 0.004 | (0.021, 0.032)
NHIS 1999 | 16,704 | BMI Slope (low) | -0.056 | 0.063 | (-0.137, 0.015)
 | | knot | 20.19 | 3.219 | (18.83, 22.17)
 | | BMI Slope (high) | 0.030 | 0.020 | (0.025, 0.036)
NHIS 2000 | 17,298 | BMI Slope | 0.034 | 0.003 | (0.030, 0.039)
NHIS 2001 | 17,666 | BMI Slope | 0.029 | 0.003 | (0.024, 0.033)
NHIS 2002 | 16,280 | BMI Slope | 0.031 | 0.004 | (0.025, 0.036)
NHIS 2003 | | BMI Slope (high) | 0.039 | 0.010 | (0.033, 0.044)
ACHS | | BMI Slope (high) | 0.032 | 0.013 | (0.010, 0.052)
TCHS§ | 2,397 | BMI Slope | 0.022 | 0.009 | (0.005, 0.038)
NHANES I** | 10,113 | BMI Slope | 0.013 | 0.012 | (-0.016, 0.032)
WHI | 82,953 | BMI Slope | 0.000 | 0.002 | (-0.004, 0.004)

* Adjusted for age, race, and smoking status.
† BMI was divided into 1, 2, or 3 contiguous segments (depending on the number of knots in the model – 0, 1, or 2, respectively) and will have an estimate of the linear rate of change in log odds of headache per segment: low, mid, high.
‡ Knots were entered into the model and retained only if they contributed significantly to model fit at significance level 0.05.
§ 100% Caucasian.
** Employment status on men only and not adjusted for smoking as nearly 60% of records are missing that data.


Table 4 Piecewise logistic regression extended model results*

Study | Sample Size | Parameter | Estimate | Standard Error | 95% CI (bootstrapped)
NHIS 1997 | 18,480 | BMI Slope (low)† | -0.068 | 0.093 | (-0.224, 0.046)
 | | knot 1‡ | 20.04 | 1.794 | (17.44, 23.23)
 | | BMI Slope (mid)† | 0.051 | 0.130 | (0.029, 0.125)
 | | knot 2 | 26.38 | 6.897 | (10.71, 35.82)
 | | BMI Slope (high)† | 0.017 | 0.036 | (0.006, 0.030)
NHIS 1998 | 16,132 | BMI Slope | 0.020 | 0.004 | (0.014, 0.026)
NHIS 1999 | 15,421 | BMI Slope (low) | -0.062 | 0.098 | (-0.898, -0.004)
 | | knot | 20.19 | 5.497 | (12.30, 22.28)
 | | BMI Slope (high) | 0.024 | 0.061 | (0.019, 0.032)
NHIS 2000 | 16,205 | BMI Slope | 0.029 | 0.003 | (0.024, 0.033)
NHIS 2001 | 16,429 | BMI Slope | 0.023 | 0.003 | (0.018, 0.028)
NHIS 2002 | 15,077 | BMI Slope | 0.024 | 0.004 | (0.018, 0.031)
NHIS 2003 | | BMI Slope (high) | 0.033 | 0.011 | (0.028, 0.039)
ACHS | | BMI Slope (high) | 0.013 | 0.110 | (-0.011, 0.037)
TCHS§ | 1,624 | BMI Slope | 0.027 | 0.011 | (0.006, 0.049)
NHANES I** | 7,427 | BMI Slope | -0.015 | 0.015 | (-0.048, 0.014)
WHI | 82,423 | BMI Slope | -0.009 | 0.002 | (-0.013, -0.005)

* Adjusted for age, race, smoking, alcohol, hypertension, and SES.
† BMI divided into 1, 2, or 3 contiguous segments (depending on number of knots in model) and has an estimate of the linear rate of change in log odds of headache per segment: low, mid, high.
‡ Extended models fit with same number of knots as the primary model.
§ 100% Caucasians.
** Employment status on men only and not adjusted for smoking as nearly 60% of records are missing that data.


Table 5 Odds ratios* and 95% confidence intervals across selected BMI values for the primary and extended models

Primary Model: Body Mass Index (BMI: kg/m2)

Dataset | 18 | 25 | 30 | 35 | 40
NHIS 1997 | 1.11 (0.91, 1.35) | 1.23 (1.10, 1.32) | 1.51 (1.32, 1.73) | 1.86 (1.51, 2.17) | 1.84 (1.54, 2.11)
NHIS 1998 | 0.95 (0.93, 0.96) | 1.14 (1.10, 1.19) | 1.31 (1.21, 1.41) | 1.50 (1.34, 1.68) | 1.72 (1.47, 2.00)
NHIS 1999 | 1.12 (0.94, 1.32) | 1.14 (1.03, 1.20) | 1.33 (1.19, 1.44) | 1.54 (1.36, 1.73) | 1.78 (1.53, 2.07)
NHIS 2000 | 0.93 (0.92, 0.95) | 1.19 (1.15, 1.23) | 1.41 (1.32, 1.51) | 1.68 (1.51, 1.86) | 1.99 (1.74, 2.29)
NHIS 2001 | 0.94 (0.93, 0.96) | 1.16 (1.12, 1.19) | 1.33 (1.25, 1.42) | 1.53 (1.40, 1.69) | 1.77 (1.57, 2.02)
NHIS 2002 | 0.94 (0.93, 0.95) | 1.17 (1.23, 1.21) | 1.36 (1.26, 1.46) | 1.58 (1.41, 1.77) | 1.85 (1.59, 2.14)
NHIS 2003 | 1.16 (0.91, 1.37) | 1.21 (1.12, 1.26) | 1.47 (1.34, 1.59) | 1.79 (1.59, 1.99) | 2.18 (1.87, 2.50)
ACHS | 1.23 (0.92, 1.48) | 1.17 (1.02, 1.30) | 1.37 (1.09, 1.69) | 1.60 (1.15, 2.19) | 1.88 (1.20, 2.84)
TCHS | 0.96 (0.93, 0.99) | 1.11 (1.03, 1.21) | 1.24 (1.05, 1.47) | 1.38 (1.08, 1.78) | 1.54 (1.11, 2.16)
NHANES I | 0.97 (0.93, 1.02) | 1.07 (0.96, 1.21) | 1.14 (0.91, 1.45) | 1.22 (0.87, 1.75) | 1.31 (0.83, 2.11)
WHI | 1.00 (0.99, 1.01) | 1.00 (0.98, 1.02) | 1.00 (0.96, 1.04) | 1.00 (0.94, 1.05) | 0.99 (0.92, 1.07)

Extended Model: Body Mass Index (BMI: kg/m2)

Dataset | 18 | 25 | 30 | 35 | 40
NHIS 1997 | 1.14 (0.90, 1.37) | 1.29 (1.11, 1.51) | 1.47 (1.25, 1.70) | 1.60 (1.39, 1.98) | 1.74 (1.48, 2.04)
NHIS 1998 | 0.96 (0.94, 0.98) | 1.10 (1.06, 1.15) | 1.22 (1.12, 1.33) | 1.35 (1.18, 1.54) | 1.49 (1.25, 1.77)
NHIS 1999 | 1.13 (0.93, 1.31) | 1.11 (0.99, 1.19) | 1.26 (1.12, 1.41) | 1.42 (1.24, 1.67) | 1.60 (1.36, 1.88)
NHIS 2000 | 0.94 (0.93, 0.96) | 1.15 (1.12, 1.20) | 1.33 (1.24, 1.43) | 1.54 (1.39, 1.71) | 1.79 (1.55, 2.05)
NHIS 2001 | 0.96 (0.94, 0.97) | 1.12 (1.08, 1.16) | 1.26 (1.18, 1.35) | 1.41 (1.28, 1.56) | 1.58 (1.38, 1.81)
NHIS 2002 | 0.95 (0.94, 0.97) | 1.13 (1.08, 1.18) | 1.28 (1.17, 1.39) | 1.44 (1.26, 1.63) | 1.63 (1.36, 1.92)
NHIS 2003 | 1.15 (0.92, 1.39) | 1.18 (1.11, 1.25) | 1.40 (1.28, 1.53) | 1.65 (1.47, 1.86) | 1.95 (1.67, 2.27)
ACHS | 1.23 (0.95, 1.49) | 1.06 (0.92, 1.20) | 1.13 (0.88, 1.44) | 1.21 (0.84, 1.73) | 1.28 (0.79, 2.07)
TCHS | 0.95 (0.91, 0.99) | 1.14 (1.02, 1.27) | 1.31 (1.04, 1.62) | 1.49 (1.06, 2.07) | 1.71 (1.08, 2.64)
NHANES I | 1.03 (0.97, 1.10) | 0.93 (0.79, 1.07) | 0.86 (0.62, 1.14) | 0.80 (0.49, 1.22) | 0.75 (0.39, 1.31)
WHI | 1.02 (1.01, 1.03) | 0.96 (0.94, 0.98) | 0.92 (0.88, 0.95) | 0.88 (0.82, 0.93) | 0.84 (0.77, 0.91)

* Compared to the BMI reference level of 20

BODY MASS INDEX AND WAIST-TO-HIP RATIO AS THEY RELATE TO

MORTALITY IN NHANES III

by

SCOTT W. KEITH, DAVID B. ALLISON

In preparation for The Journal of the American Medical Association

Format adapted for dissertation


Abstract

Context: As body mass index (BMI) and waist-to-hip ratio (WHR) continue to be popular

choices for characterizing obesity, it remains unclear which might better predict mortality

in the general population.

Objective: To analyze BMI and WHR for their respective capacities to predict the odds of

mortality among adults in a recent nationally representative dataset.

Design, Setting, and Participants: Piecewise linear logistic regression was applied to data

from the third National Health and Nutrition Examination Survey (NHANES III: 1988-

1994) to model the possibly nonlinear relationships between the predictors of interest

(BMI and WHR) and mortality among the non-institutionalized United States population

of adults aged at least 25 years. Models were adjusted for baseline age and indicators of

ethnicity, smoking, alcohol, and serious illnesses.

Main Outcome Measures: Mortality indicated as death prior to follow-up in 2000.

Results: We analyzed data from 14,386 participants. The likelihood of mortality related

piecewise linearly to BMI where models indicated thresholds in the odds at about 19 and

23 for women and men, respectively. The shapes of the BMI-mortality relationships were

similar for both men and women suggesting no significant elevation in odds with

increasing BMI. Linear logistic models were adequate to relate WHR to mortality and

suggested that increased WHR linearly increased log(odds) of mortality among women (β

= 2.4; 95% CI (1.3, 3.5)), but not among men.


Conclusions: Weighted logistic regression was sufficient to model the WHR-

mortality relationship, but WHR was a significant predictor for women only. BMI related

nonlinearly to mortality with a broken line shape similar among both women and men

and was a significant predictor of mortality only for very low values. This finding was

unexpected and should be viewed with caution as an example of how the nonlinear

framework, restricted to certain settings and a specific a priori analysis plan, would fit

the data, not as a final concluding result. The data suggested that a longer follow-up time

might be required for characterizing mortality at high levels of BMI.

Key words: mortality, BMI, waist-to-hip ratio, NHANES III, piecewise linear logistic

regression.


Introduction

Mortality as a Possibly Nonlinear Function of BMI or WHR

Obesity prevalence has been increasing in the United States (Ogden et al., 2002;

Flegal et al., 2002) along with many plausible contributors (Keith et al., 2006a) to the

obesity epidemic (WHO, 1998). Body mass index (BMI) is a commonly used proxy

measure of adiposity or obesity which has been shown to have a nonlinear J- or U-shaped

relationship with several health outcomes including health-related quality of life (Heo et

al., 2003); headaches (Keith et al., 2008); dementia (Rosengren et al., 2005); and

mortality rate or longevity (e.g., Allison et al., 1997; Bigaard et al., 2004; Calle et al.,

1999; Fontaine et al., 2003; Kaplan et al., 2002; Keith et al., 2006b; Troiano et al., 1996;

Zhou et al., 2002). Much effort has been directed at determining the effects associated

with elevated BMI. Allison et al. (1999a) used estimates of relative risk of mortality in

combination with the distribution of BMI and other factors to estimate the annual deaths

attributable to obesity in the United States. Since then, several studies have conducted

similar analyses and reported a variety of results ranging from 165,000 (Flegal et al.,

2005) to 365,000 (Mokdad et al., 2004; 2005) which suggests considerable uncertainty in

how much excess risk might be attributable to obesity. Because these estimates provide a basis

for estimating obesity-related healthcare costs (Allison et al., 1999b), which might

influence healthcare budgeting that can affect millions of Americans, the reliability and

quality of risk estimates are of considerable public health import.

Here we consider a different modeling approach to obtaining the mortality risk

estimates in terms of odds ratios that might better account for the nonlinearity in the


BMI-mortality relationship in the presence of covariate information. We also consider

waist-to-hip ratio (WHR) in analyses alongside those for BMI in order to compare

and contrast these two popular measures, of weight distribution and relative

weight respectively, for their capacities to predict mortality.

Goetghebeur and Pocock (1995) warn that analyzing such relationships with

many conventional methods, such as a quadratic polynomial of BMI, can distort the risk-

relationship and produce misleading results. We hypothesized that mortality and BMI

will have a relationship characterized by upturns such as those warned about by

Goetghebeur and Pocock (1995) as we expect mortality risk may increase substantially

with decreasing BMI below some unknown, but estimable, threshold and may thereafter

flatten-out or begin increasing with increasing BMI above some other unknown

threshold.

We have found little evidence in the literature to suggest that WHR might relate

nonlinearly with mortality. Studies have suggested that linear models depict

WHR-mortality relationships well (e.g., Lindqvist et al., 2006). It remains

unclear if nonlinearity might be detected with a sufficiently flexible modeling framework.

Categorical vs. Continuous Representations of Predictors: Basic Issues

A key assumption in fitting statistical models between a continuous predictor and

an outcome of interest is that there is an underlying quantitative relationship between the

predictor of interest (e.g., BMI) and the outcome (e.g., mortality odds). Furthermore, we

assume that the outcome can be represented well by some estimable function of the

predictor (i.e., a functional form). There are basically three different approaches to


modeling a continuous predictor’s relationship with an outcome: categorize the

observations, or maintain a continuous metric and apply either polynomials or piecewise

polynomial splines.

Many investigators have applied (e.g., Manson et al., 1995; Stevens et al., 1998)

or advocated (e.g., Rothman, 1992) using contiguous categories of BMI set a

priori by common standards (e.g., underweight: BMI<18; normal: 18≤BMI<25;

overweight: 25≤BMI<30; and obese: 30≤BMI), quintiles, or some other arbitrary

classification rules in the estimation of mortality relative risk. The reasons for this

treatment of BMI are most likely born of convenience or convention. Theoretically,

categorization permits an examination of differences in risk or odds of an event occurring

between categories and does not assume linearity or smoothness in the relationship.

However, categorization of BMI has significant disadvantages and limitations. They

include: 1) ignoring within-category BMI information resulting in decreased statistical

power; 2) insensitivity of trend tests to non-monotonic relationships; 3) trend tests may

indicate a trend, but cannot describe it; 4) similar individuals within a BMI category are

treated as though they have a uniformly constant risk regardless of their actual BMI level;

5) results can depend heavily upon how the categories are chosen; and 6) unfortunately, a

priori classification boundaries of BMI are not likely to represent “true” partitions or

thresholds that would group individuals according to the underlying pattern of mortality

risk or odds within a BMI category.
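
Disadvantages 1), 4), and 5) above can be seen in a few lines of Python. This is an illustrative sketch only: the function name bmi_category is hypothetical, and the cut-points are the example standards quoted above rather than any thresholds estimated in this work.

```python
def bmi_category(bmi):
    """Assign a BMI value to the a priori categories quoted in the text."""
    if bmi < 18:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"

# Two people nearly 5 BMI units apart fall in one category, so a categorical
# model assigns them the same risk and discards the within-category difference.
same_bin = (bmi_category(25.1), bmi_category(29.9))

# Two people only 0.2 BMI units apart are split by an arbitrary boundary.
split_bin = (bmi_category(29.9), bmi_category(30.1))

print(same_bin, split_bin)
```

Moving a cut-point (for example, from 30 to 29.5) would reassign individuals near the boundary and could change the estimated category contrasts, which is the dependence on category choice noted in 5).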

Treating BMI continuously with polynomial predictor variables does not degrade

data, tends to preserve power, and does not impose arbitrary cut-points. Curvilinear

relationships (U- or J-shaped) have commonly been detected between BMI and mortality


risk when modeled using polynomials of BMI. Modeling substantially nonlinear

relationships via polynomials will also have disadvantages and limitations. They include:

1) a lack of flexibility possibly leading to biased estimation, particularly in the tails of the

BMI distribution; 2) poorly parameterized models; and 3) the model will smooth over

any cutpoints between BMI groups with different mortality risk relationships.

Zhao and Kolonel (1992) suggest using categorical analysis as an outstanding

exploratory technique. Categorization can help in evaluating the extent to which any

smooth function fitted to the data adequately captures patterns in the data. Following this

exploration, in the overwhelming majority of (though perhaps not all) cases, we believe it is

appropriate to return to a continuous metric and model any nonlinearity with appropriate

functions.

Piecewise linear regression models (also called regression spline models; see de

Boor, 1978) may offer some of the best features of the polynomial and categorical

approaches to modeling BMI. When the knots (also called breakpoints, changepoints,

joinpoints, or cutpoints) that connect the linear segments in a spline model are allowed to

be fitted as free model parameters (the so-called free-knots), they may estimate the

boundaries of BMI groups experiencing differential, non-constant risk relationships. The

localized estimation properties of free-knot splines may help pick out the relationships in

the tails of the BMI distribution where polynomial models tend to lack flexibility and

protect central observations from excessive model influence of extreme values.
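
To make the piecewise linear idea concrete, the following is a minimal sketch (in Python, not the SAS programs used for the analyses) of a one-knot linear spline on the log-odds scale. The function names and all parameter values are hypothetical and chosen only for illustration. In a free-knot model the knot would be estimated jointly with the slopes rather than fixed in advance.

```python
import math

def piecewise_linear_log_odds(x, intercept, slope_below, slope_above, knot):
    """Continuous one-knot linear spline on the log-odds scale.

    `intercept` is the log-odds at the knot. Below the knot the log-odds
    change by `slope_below` per unit of x; above it, by `slope_above`.
    """
    below = min(x, knot) - knot   # distance below the knot (<= 0)
    above = max(x, knot) - knot   # distance above the knot (>= 0)
    return intercept + slope_below * below + slope_above * above

def inverse_logit(eta):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical values: a steep decline up to the knot, then a flat segment.
bmi_values = [17.0, 20.0, 25.0, 35.0]
log_odds = [piecewise_linear_log_odds(b, intercept=-2.0, slope_below=-0.5,
                                      slope_above=0.0, knot=20.0)
            for b in bmi_values]
probabilities = [inverse_logit(eta) for eta in log_odds]
```

Because the two segments share the value `intercept` at the knot, the fitted curve is continuous there while its slope changes abruptly, which is the behavior a polynomial cannot reproduce.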

To our knowledge, no studies have employed nonlinear (non-categorical)

statistical methodology to estimate risk or odds of mortality related to BMI or WHR with


appropriate adjustments for complex survey samples which are capable of representing

the US population. This represents a gap in the literature that deserves attention.

Methods

Population and sample design

The Third National Health and Nutrition Examination Survey (NHANES III)

provides a complex multistage cross-sectional sample representative of the

non-institutionalized United States population, described in detail elsewhere (Plan and

operation of the Third National Health and Nutrition Examination Survey, 1994). The

National Center for Health Statistics conducted this survey and in 2000 collected

mortality information on participants based on probabilistic match with records in the

National Death Index, thus providing up to 13 years of mortality follow-up. NHANES III

was subject to institutional review and obtained informed consent from participants.

Measurements

Table 1 lists the variables we considered in our analyses and respective

distribution summaries by gender. Both reported and measured information are available

in NHANES III. During standardized health examinations men and women were

measured for dimensions including height, weight, waist circumference, and hip

circumference. Participants were asked to report demographic information including

indications of their ethnicity or race (Non-Hispanic black, Non-Hispanic white, or

Mexican-American); tobacco smoking; heavy alcohol use; and personal history of serious


illness or disease which could include congestive heart failure (CHF), myocardial

infarction (MI), stroke, non-skin cancer, or emphysema.

Analyses included only anthropometric variables resulting from technician

measurements, and we applied the appropriate NHANES III sample weights: the “mobile

examination center (MEC) and home examination weights (WTPFHEX6)” (U.S. DHHS

NHANES III Analytic and Reporting Guidelines, 1996). These weights were the most

inclusive for our purposes. That is, they were non-zero for the largest proportion of

completed examination records of the three general sample final weights.

The predictor variables we are concerned with are anthropometric ratios which,

in the case of BMI, might indicate relative weight or adiposity or, in the case of WHR,

might indicate weight or adipose tissue distribution about the trunk. Note that 650 women

and 500 men were inexplicably missing WHR measurements from their examination

records, but otherwise had complete information on study variables.

Outcome Definition

We are interested in mortality outcomes which have occurred among adult

NHANES III participants aged 25 years or older. Specifically, this outcome was coded as

a binary random variable where participants determined to have died during follow-up

(i.e., prior to NCHS drawing mortality information from NDI on December 31, 2000)

were assigned a ‘1’ while the others assumed to have survived the follow-up period were

assigned a ‘0’.


Statistical Analysis

To model the respective relationships BMI and WHR have with mortality we

incorporated free-knot splines with B-spline bases (de Boor, 1978) into logistic

regression in a way similar to the maximum likelihood-based optimization methodology

described by Bessaoud et al. (2005). The distinctions in our methods focus on knot

selection and adjustments for the complex sample design of NHANES III. We employed

a parametric bootstrap approach (Keith et al., in preparation) to forward selection of the

optimal number of knots or change-points. Bessaoud and colleagues suggested using the

Bayesian Information Criterion (BIC) first described by Schwarz (1978) for this purpose.

However, we found that the complex sample weights inflate and distort the distribution

of the likelihood to the extent that the over-parameterization penalty imposed by the BIC

has no meaningful effect. Bootstrapping allowed us to generate distributions of 200

likelihood replicates to determine the impact of adding knots and spline parameters

relative to the cost of increased model complexity. Another distinction is that our

modeling framework incorporated the analysis of 500 nonparametric bootstrap samples

according to the methods described by Rao et al. (1992) for adjusting model parameters

for the multistage probability cluster sampling design of NHANES III.
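
As an illustration of how survey bootstrap replicates of this kind are often constructed, the sketch below generates one set of rescaled bootstrap weights in the spirit of Rao et al. (1992). It is written in Python rather than SAS and reflects an assumption about the general technique, not a transcription of the programs used here. The function name `rescaled_bootstrap_weights`, the data layout (parallel lists of stratum labels, primary sampling unit (PSU) labels, and weights), and the choice to draw n_h - 1 PSUs with replacement from the n_h PSUs in stratum h are all illustrative.

```python
import random
from collections import Counter, defaultdict

def rescaled_bootstrap_weights(strata, psus, weights, rng):
    """Return one replicate of rescaled bootstrap weights.

    Within each stratum h with n_h PSUs, n_h - 1 PSUs are drawn with
    replacement. Each observation's weight is multiplied by
    (n_h / (n_h - 1)) * (number of times its PSU was drawn).
    Assumes every stratum contains at least two PSUs.
    """
    psus_by_stratum = defaultdict(set)
    for h, p in zip(strata, psus):
        psus_by_stratum[h].add(p)

    multiplier = {}
    for h, psu_set in psus_by_stratum.items():
        psu_list = sorted(psu_set)
        n_h = len(psu_list)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        draws = Counter(rng.choice(psu_list) for _ in range(n_h - 1))
        for p in psu_list:
            multiplier[(h, p)] = n_h / (n_h - 1) * draws[p]

    return [w * multiplier[(h, p)] for h, p, w in zip(strata, psus, weights)]

# Toy data (hypothetical): two strata with three PSUs each, two people per PSU.
strata = [1] * 6 + [2] * 6
psus = [1, 1, 2, 2, 3, 3] * 2
weights = [100.0] * 12
replicate = rescaled_bootstrap_weights(strata, psus, weights, random.Random(2008))
```

Repeating this step many times (500 replicates in the analyses described above) and refitting the model to each replicate gives an empirical distribution for every slope and knot, from which bootstrap standard errors and confidence intervals can be derived.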

Due to apparent interactions between gender and BMI or WHR in their

relationships with mortality, we stratified analyses by gender group. B-splines were used

to help stabilize the computational performance of our programs; however, the raw spline

parameters have been transformed into their piecewise linear polynomial representation

(i.e., localized slopes). This gives the parameters an interpretation common to linear

logistic regression where they represent the estimated increase in log-odds of mortality


per unit increase in predictor (i.e., BMI). The slopes and optimal knot locations have been

estimated along with bootstrap estimates of their standard errors (SE) and bootstrapped

95% confidence intervals (CI). Odds ratios (OR) and bootstrapped 95% CI were

calculated and tabled for selected BMI and WHR values. To depict the bootstrapping

sampling variation, we plotted OR for both BMI and WHR along with the selected model

fitted to 20 randomly selected bootstrap replicated datasets.

All programs were written in SAS (version 9.1; SAS Institute Inc, Cary, NC)

using macro utilities, PROC IML, and PROC NLP.

Results

Table 1 shows how the study variables were distributed among the men and

women surveyed in NHANES III. More than twice as many men had smoked tobacco

and about three times as many men regularly consumed at least 0.35 oz/day of alcohol as

compared to women. There were also about 20% more deaths among men than women

during follow-up even though their baseline indications of serious illness were similar.

Mean BMI was higher among women while men generally had higher WHR. Figure 1

shows how the distributions of these anthropometrics differ by gender. Men have less

variation in BMI and WHR and there is a noticeable upward shift in the distribution of

WHR for men as compared to women.

The distributions of age and mortality per unit bin of BMI and per 1/100-unit bin

of WHR are displayed in Figure 2. The plotted values refer to the median age of

participants falling into each bin, rounded and truncated to the nearest decade. The

proportion of deaths in each bin seems to be collinear between age and WHR for both


women and men. The picture is less clear for BMI. U- or J- shaped curves are apparent in

frames a) and c) where both high and low values of BMI seem to translate to higher

proportions of death in these bin groupings across genders. It is interesting to see that

men and women with the highest levels of BMI are relatively young (many in their 30’s

and 40’s) and survived the follow-up period.

We detected one knot in the BMI-mortality relationships for women and men

located at BMI = 19.0 (95% CI (18.1, 20.0)) and BMI = 23.4 (95% CI (23.2, 25.8)),

respectively. Table 2 displays these model parameter estimates as well as the weighted

and covariate-adjusted linear slopes (β’s) in the log-odds of mortality associated with

BMI relative to adjacent knot parameters detected. That is, if a knot has been fitted in the

model, there will be a slope parameter to represent the linear relationship on one side of

the knot and another slope parameter to represent the linear relationship on the other side

of the knot. The knot points we detected were nadirs in the piecewise linear log-odds

models. They may represent biologically meaningful and gender-specific thresholds at

which points the relationships between BMI and mortality inflect from increased BMI

relating to decreased log-odds of mortality (women: β1 = -0.56, 95% CI (-1.02, -0.32);

men: β1 = -0.23, 95% CI (-0.30, -0.01)) to BMI losing any significant capacity to predict

mortality for either gender. Figure 3 frames a) and c) show plots of odds ratios (OR) for

BMI and mortality against a reference BMI of 23 among otherwise alike individuals. The

model fitted to the original data (in red) shows our best estimates by gender along with

models fitted to 20 randomly selected bootstrap replicate datasets which illustrate the

sampling variability. Table 3 shows OR results with 95% confidence intervals for

selected BMI levels which are commonly used to categorize BMI. The 95% CI for the


elevated BMI (say 30 and over) in comparison to the BMI = 23 reference all contained 1,

further showing how BMI does not perform well for predicting odds of mortality.
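
The odds ratios in Table 3 follow from the slopes and knots in Table 2. As a rough check, the sketch below (Python, illustrative only; the function names are hypothetical) accumulates the piecewise slopes between the reference BMI of 23 and a target BMI. Because it uses the rounded estimates printed in Table 2, it reproduces the Table 3 values only approximately.

```python
import math

def log_odds_difference(x, reference, slope_below, slope_above, knot):
    """Difference in log-odds between x and the reference for a
    continuous one-knot piecewise linear model."""
    def relative_to_knot(v):
        return (slope_below * (min(v, knot) - knot)
                + slope_above * (max(v, knot) - knot))
    return relative_to_knot(x) - relative_to_knot(reference)

def bmi_odds_ratio(x, slope_below, slope_above, knot, reference=23.0):
    """Odds ratio comparing BMI = x with the reference BMI."""
    return math.exp(log_odds_difference(x, reference, slope_below,
                                        slope_above, knot))

# Rounded estimates from Table 2.
women = dict(slope_below=-0.56, slope_above=0.01, knot=19.00)
men = dict(slope_below=-0.23, slope_above=0.03, knot=23.43)

or_women_18_5 = bmi_odds_ratio(18.5, **women)   # Table 3 reports 1.25
or_men_18_5 = bmi_odds_ratio(18.5, **men)       # Table 3 reports 2.85
```

For a BMI of 18.5 the rounded inputs give log-odds differences of about 0.24 for women and 1.04 for men, which is in line with the tabled odds ratios.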

No knots were necessary to adequately model the adjusted associations between

mortality and WHR among women or men. See Table 2. WHR was, however, a

significant linear predictor of adjusted log-odds of mortality only for women (β = 2.41;

95% CI (1.31, 3.46)). This translates to statistically significant OR (CI not containing 1)

from comparing a reference woman, having WHR = 0.9 (about the average), with

otherwise like women having various WHR (see Table 3). See frame b) of Figure 3 for a

graphical representation of the fitted WHR model (in red) for women. Among men, the mortality

response to WHR was essentially flat, as can be seen in the OR plot in frame d) of

Figure 3.
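
For a model with no knots, the odds ratio between two WHR values depends only on their difference. The short Python sketch below (illustrative; the function name is hypothetical) applies the rounded slope for women from Table 2 to the WHR values listed in Table 3, relative to the reference of 0.9.

```python
import math

def linear_odds_ratio(x, reference, slope):
    """Odds ratio comparing x with the reference under a linear log-odds model."""
    return math.exp(slope * (x - reference))

slope_women = 2.41   # rounded estimate from Table 2
odds_ratios = {whr: round(linear_odds_ratio(whr, 0.9, slope_women), 2)
               for whr in (0.8, 1.0, 1.1, 1.2, 1.3)}
```

With the rounded slope, the results are close to the Table 3 entries (about 1.27 at WHR = 1.0 and about 1.62 at WHR = 1.1); small discrepancies at the largest WHR values are to be expected from rounding.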

Comments

Our results did not suggest that elevated BMI predicts mortality odds significantly

in NHANES III for either men or women. This finding was unexpected and should be

viewed with caution as an example of how the nonlinear framework, restricted to certain

settings and a nascent a priori analysis plan, would fit the data, not as a final concluding

statement on how elevated BMI relates to mortality risk. It is clear that more analyses

will be required; particularly regarding possible interactions between BMI and age.

We did feel confident, however, in detecting possibly biologically meaningful

thresholds in the BMI-mortality relationships among women and men at BMIs of about

19 and 23, respectively, which suggest the points at which increased BMI no longer

relates with decreased mortality odds. No such thresholds were evident for WHR in either


women or men. The linear relationship of WHR and mortality suggested that as

compared to a woman with an average WHR (0.9), similar women having WHR of 1.0 or

1.1 might have an increase in odds of mortality of 27% or 62%, respectively. While these

results should be interpreted with some caution considering that these are observational

results, they do suggest that more focused prospective studies might produce similar

results and be more useful for direct application in clinical care and public health policy.

WHR does not appear to significantly predict mortality odds among men. This

might seem counter to what one might infer from the plots in frames b) and d) of Figure 2

which seem to suggest that the WHR-mortality relationships might be the same for men

and women. These plots are unadjusted and the apparent relationship among men

disappears when adjusted for other covariates possibly as a result of collinearity with age.

The results accord with those of Price et al. (2006), who also found in their

United Kingdom cohort stronger associations between WHR and mortality for women

than men. However, the multicollinearity apparent in Figure 2 might be causing unstable

results in the model while it attempts to sort out respective contributions to increasing the

likelihood function from age and WHR. Note that removing a possible outlier (the

octogenarian in the upper left-hand corner of Figure 2 frame d) had no effect on the

overall model. Lindqvist et al. (2006) concluded that the WHR-mortality relationship

depended on age in their Swedish cohort. We did not observe such an interaction. At least

one study has also found that waist circumference has been increasing in the US

population over time beyond what might be expected given the increases in BMI

observed over the same period (Elobeid et al., 2007). This would likely affect the


distribution of WHR over time and further complicate effective modeling of the WHR-

mortality relationship.

There are issues surrounding BMI, age, and the length of mortality follow-up which

limited our ability to model the BMI-mortality relationship. The prevalence of obesity

has increased and may be accelerating with passing time for many plausible reasons

(Keith et al., 2006a). Interpretation of Figure 2 suggests that the distribution of BMI

might be changing with calendar time and it is clear that the heaviest individuals in

NHANES III were among the youngest and tended to survive the follow-up period. We

cannot rule out that self-selection bias had entered into this survey as perhaps only the

younger and/or healthier among those having BMI ≥ 45 might have been motivated to

travel for participation in the MEC health examination. Regardless, we had little or no

information in this dataset for statistical analyses on what factors would be associated

with their mortality. These observations were very influential and pulled the adjusted log-

odds of mortality models nearly flat over the upper part of the BMI distribution despite

the fact that older participants with high BMI were showing elevated mortality odds.

Thus, NHANES III might be of only limited use for estimating BMI-mortality risk

associations unless BMI has been categorized.

Flegal et al. (2005) used categorized BMI and Cox proportional hazards modeling

to analyze mortality in NHANES I, II, and III. As compared to our continuous piecewise

linear logistic regression modeling of BMI, these two aspects represent the most obvious

dissimilarities in statistical methodology and probably account for most disparity in

results. To see how results from categorized BMI would compare to those previously

published by Flegal et al. (2005) we used SAS PROC SURVEYLOGISTIC and SAS-


Callable SUDAAN PROC RLOGIST to fit complex survey design weighted logistic

regression models (data not shown) stratified by baseline age groups (1: 25≤age<60; 2:

60≤age<70; and 3: age≥70) with participants categorized as “underweight” if BMI<18.5;

“normal weight” (reference) if 18.5≤BMI<25; “overweight” if 25≤BMI<30; “obese” if

30≤BMI<35; and “severely obese” if BMI≥35. In summary, we noted that the BMI-

mortality odds ratios from these models were quite similar in magnitude to the hazard

ratios presented by Flegal et al. (2005) except among those in their sixties, where we

noted significantly smaller odds of mortality among the obese as compared to normal

weight and an odds ratio below 1 (not statistically significant) for the severely obese. We

can offer no clear explanation for this result at this point, but it is noteworthy that when

the proportion of events is as high as 10 or 15%, odds ratios will not likely estimate

relative risks as accurately as hazard ratios. It might be important to note also that Flegal

et al. (2005) made no adjustments for height or indications of baseline illness and, to be

consistent with the NHANES I and II covariates, they coded Mexican Americans as

having ‘white’ race/ethnicity. However, our categorical BMI analyses were robust to

removal of the illness, height, and race/ethnicity covariates from the models. Thus, these

covariates were not likely the root cause of the heterogeneity between any of our findings

and those of Flegal et al. (2005).
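
The earlier remark about odds ratios and relative risks at event proportions of 10 or 15% can be illustrated with simple arithmetic. The Python sketch below is an illustration under assumed numbers, not an analysis of NHANES III: the relative risk of 1.5 and the baseline risks are hypothetical, and the function names are invented for this example.

```python
def odds(p):
    """Odds corresponding to a probability p (0 < p < 1)."""
    return p / (1.0 - p)

def odds_ratio_from_relative_risk(baseline_risk, relative_risk):
    """Odds ratio implied by a relative risk at a given baseline risk."""
    exposed_risk = baseline_risk * relative_risk
    return odds(exposed_risk) / odds(baseline_risk)

# Hypothetical relative risk of 1.5 at a rare and at a common baseline risk.
or_rare = odds_ratio_from_relative_risk(0.01, 1.5)     # close to 1.5
or_common = odds_ratio_from_relative_risk(0.15, 1.5)   # noticeably above 1.5
```

When the outcome is rare the odds ratio stays near the relative risk, but at a baseline risk of 15% the same relative risk of 1.5 corresponds to an odds ratio of roughly 1.65, so the two measures should not be read interchangeably here.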

Incorporating follow-up time, say by nonlinear Cox proportional hazards

regression, and stratifying analyses by age groupings as Flegal et al. (2005) had done or

by using time-dependent covariates might provide more information and a better

approach to modeling the sparse mortality information among the most obese

participants. However, there are no software packages of which we are aware that offer


Cox regression model tools capable of incorporating free-knot spline bases and complex

sample designs. We suggest that if NCHS collects another round of mortality data from

the NDI, there might by then be sufficient mortality information to characterize the BMI-

mortality relationship among these individuals.

Some have found evidence from studies of the first three waves of NHANES that

the effects of elevated BMI on mortality risk might be decreasing with time (Flegal et al.,

2005). In some ways, this makes intuitive sense as being overweight or obese has become

highly prevalent and healthy people are commonly reaching these levels of BMI at earlier

ages. If elevated BMI does have deleterious effects which might accumulate with time, it

might take stratification by age cohorts in addition to greater follow-up periods to tease

out the BMI-mortality relationship. On the other hand, if these individuals continue to

survive, then their excess weight may be protective against mortality as they age (Stevens

et al., 1998).

Another limitation is that we have conducted analyses on the entire sample and

our results might be subject to the biasing effects of regression dilution (Clarke et al.,

1999) and what has been called “reverse-causation” (Manson et al., 1987; Willett et al.,

2005) which tend to deflate estimates of mortality related to BMI. Given the structure of

the NHANES III data as depicted in Figure 2, we suggest that the models are being

influenced by extreme values that would likely overshadow any regression dilution

affecting the size of estimates for elevated BMI. We have taken some measures to adjust

for reverse-causation factors such as smoking and baseline illness. These factors have the

potential to confound and spuriously associate low levels of BMI with elevated risk of

death. This can result in deflated relative risks when comparing groups with high BMI to


those with low BMI (Greenberg et al., 2007). Some have suggested removing participants

reporting exposures to reverse-causation factors (e.g., Willett et al., 2005). Others have

suggested that removal of subjects experiencing early deaths or confounding exposures

will not likely remove the bias and is thus not advisable (e.g., Allison et al., 1999c). Our

modeling approach allowed for considerable flexibility over the entire range of BMI. As

opposed to fitting a quadratic or cubic polynomial curve to the data, the piecewise linear

model effectively allowed us to fit well the steep linearly decreasing association observed

between low BMI and mortality and then break abruptly to model the relatively flat

mortality response surface associated with elevated BMI. This approach has placed the

estimated thresholds where they are likely to belong (i.e., at low BMI levels) which

limited the influence of those observations likely to cause bias on those with elevated

BMI. This, in addition to adjusting to some extent for illness and smoking, appeared to

effectively diminish the biasing effects of reverse-causation without removing

participants reporting exposures to reverse-causation factors.

References

Allison DB, Faith MS, Heo M, Kotler DP. (1997) Hypothesis concerning the U-shaped

relation between BMI and mortality. Am. J. Epidemiol;146:339-349.

Allison DB, Fontaine KR, Manson JE, Stevens J, VanItallie TB. (1999a) Annual Deaths

Attributable to Obesity in the United States. JAMA;282:1530-8.

Allison DB, Heo M, Flanders DW, Faith MS, Carpenter KM, Williamson DF. (1999c)

Simulation Study Of The Effects Of Excluding Early Mortality On Risk Factor-


Mortality Analyses In The Presence Of Confounding Due To Occult Disease: The

Example Of Body Mass Index. Annals of Epidemiology;9:132-42.

Allison DB, Zannolli R, Narayan KVM. (1999b) The direct health care costs of obesity in

the United States. American Journal of Public Health;89:1194-9.




Bigaard J, Frederiksen K, Tjonneland A, Thomsen BL, Overvad K, Heitmann BL,

Sorensen TI. (2004) Body fat and fat-free mass and all-cause mortality. Obes

Res;12:1042-9.

Calle EE, Thun MJ, Petrelli JM, Rodriguez C, Heath CW Jr. (1999) Body mass index and

mortality in a prospective cohort of US adults. N Engl J Med;341:1097–1105.

Clarke R, Shipley M, Lewington S, Youngman L, Collins R, Marmot M et al. (1999)

Underestimation of risk associations due to regression dilution in long-term

follow-up prospective studies. Am J Epidemiol;150:341–353.


Elobeid MA, Desmond R, Thomas O, Keith SW, Allison DB. (2007) Waist

circumference values are increasing beyond that expected from body mass index

increases. Obesity;15:2380-3.

Flegal KM, Graubard BI, Williamson DF, Gail MH. (2005) Excess deaths associated

with underweight, overweight, and obesity. JAMA;293:1861-7.

Flegal et al. (2002) Prevalence and Trends in Obesity Among US Adults, 1999-2000.

JAMA;288:1723-7.


Fontaine KR, Redden DT, Wang C, Westfall AO, Allison DB. (2003) Years of life lost

due to obesity. JAMA;289:187-93.

Goetghebeur E, Pocock S. (1995) Detection and Estimation of J-shaped Risk-Response

Relationships. J R Statist Soc A;158:107-122.

Greenberg JA, Fontaine KR, Allison DB. (2007) Putative biases in estimating mortality

attributable to obesity in the US population. IJO;31:1449-55.

Heo M, Allison DB, Faith MS, Zhu S, Fontaine KR. (2003) Obesity and quality of life:

mediating effects of pain and comorbidities. Obes Res;11:209-16.

Kaplan RC, Heckbert SR, Furberg CD, Psaty BM. (2002) Predictors of subsequent

coronary events, stroke, and death among survivors of first hospitalized

myocardial infarction. J Clin Epidemiol;55:654-64.

Keith SW, Wang C, Fontaine KR, Allison DB. (2008) Body mass index and headache


Keith SW, Desmond R, Allison DB. (2006b) Body fat and mortality: A survival analysis

of the third National Health and Nutrition Examination Study (NHANES III).

Obesity;14 Suppl A262.

Keith SW, Redden DT, Katzmarzyk P, Boggiano MM, Hanlon EC, Benca RM, Ruden D,

Pietrobelli A, Barger J, Fontaine K, Wang C, Arronne L, Wright S, Baskin M,

Dhurandhar N, Lijoi M, Grilo CM, De Luca M, Allison DB. (2006a) Putative

Contributors to the Secular Increase in Obesity: Exploring the Roads Less

Traveled. IJO;30:1585-94.


Lindqvist P, Andersson K, Sundh V, Lissner L, Bjorkelund C, Bengtsson C. (2006)

Concurrent and separate effects of body mass index and waist-to-hip ratio on 24-

year mortality in the Population Study of Women in Gothenburg: Evidence of

age-dependency. European Journal of Epidemiology;21:789-94.

Manson JE, Stampfer MJ, Hennekens CH, Willett WC. (1987) Body weight and

longevity: a reassessment. JAMA;257:353–58.

Manson, J.E., Willett, W.C., Stampfer, M.J., Colditz, G.A., Hunter, D.J., Hankinson,

S.E., Hennekens, C.H., & Speizer, F.E. (1995) Body weight and mortality among

women. New England Journal of Medicine, 333, 677-685.

Mokdad AH, Marks JS, Stroup DF, Gerberding JL. (2004) Actual causes of death in the

United States, 2000. JAMA;291:1238-45.

Mokdad AH, Marks JS, Stroup DF, Gerberding JL. (2005) Correction: actual causes of

death in the United States, 2000. JAMA;293:293-4.

Ogden et al. (2002) Prevalence and Trends in Overweight Among US Children and

Adolescents, 1999-2000. JAMA;288:1728-32.

Plan and operation of the Third National Health and Nutrition Examination Survey,

1988-1994: series 1: programs and collection procedures. (1994) Vital Health Stat

1;32:1-407.

Price GM, Uauy R, Breeze E, Bulpitt CJ, Fletcher AE. (2006) Weight, shape, and

mortality risk in older persons: elevated waist-hip ratio, not high body mass index,

is associated with a greater risk of death. Am J Clin Nutr;84:449-60.




Rosengren A, Skoog I, Gustafson D, Wilhelmsen L. (2005) Body mass index, other

cardiovascular risk factors, and hospitalization for dementia. Arch Intern

Med;165:321-6.

Rothman KJ. Modern Epidemiology. Boston, MA: Little Brown, 1992.


Stevens J, Cai J, Pamuk ER, Williamson DF, Thun MJ, Wood JL. (1998) The effect of

age on the association between body-mass index and mortality. N Engl J

Med;338:1-7.

Troiano RP, Frongillo EA, Sobal J, Levitsky DA. (1996) The relationship between body

weight and mortality: A quantitative analysis of combined information from

existing studies. Int J Obesity;20:63-75.





Prevention.

WHO. (1998) Obesity. Preventing and managing the global epidemic. World Health

Organization, Geneva.

Willett WC, Hu FB, Colditz GA, Manson JE. (2005) Underweight, overweight, obesity,

and excess deaths. JAMA;294:551.

Zhao LP., Kolonel LN. (1992). Efficiency loss from categorizing quantitative exposures

into qualitative exposures in case-control studies. American Journal of

Epidemiology, 136(4):464-74.


Zhou BF. (2002) Effect of body mass index on all-cause mortality and incidence of

cardiovascular diseases-report for meta-analysis of prospective studies open

optimal cut-off points of body mass index in Chinese adults. Biomed Environ

Sci;15:245-52.


Table 1 NHANES III data description.*

Survey baseline years               1988-1994
Mortality follow-up                 through 2000
Unweighted sample size              14,386
Women (%)                           7,626 (53)

Baseline Study Variables            Women            Men
BMI: mean (SD)                      27.83 (6.58)     26.84 (4.83)
WHR: mean (SD)                      0.89 (0.08)      0.97 (0.07)
Age: mean (SD)                      52.22 (18.56)    52.83 (18.37)
Height (cm): mean (SD)              159.99 (7.24)    173.40 (7.52)
Ethnicity:
  Black (%)                         2,219 (29)       1,833 (27)
  Mexican-American (%)              1,843 (24)       1,849 (27)
  White (%)                         3,564 (47)       3,078 (46)
Smoking:
  Never (%)                         4,559 (60)       2,192 (32)
  Former (%)                        1,433 (19)       2,514 (37)
  Current (%)                       1,634 (21)       2,054 (31)
Alcohol ≥ 0.35 oz./day (%)          435 (6)          1,255 (19)
Illnesses (CHF, MI, stroke,
  cancer, or emphysema)             1,077 (14)       1,104 (16)

Follow-up Variable
Deaths (%)                          1,235 (16)       1,482 (22)

* Means and frequencies were not weighted or adjusted. They reflect the information available, but may not represent well population parameters.


Table 2 Piecewise linear logistic regression model* results for relating log-odds of mortality during follow-up to BMI and WHR by gender.

Predictor                       Parameter    Standard Error    95% CI
  Parameter                     Estimate     (bootstrapped)    (bootstrapped)

BMI among women (n=7,626)
  β1 (slope for BMI<knot)       -0.56        0.18              (-1.02, -0.32)
  knot                          19.00        0.54              (18.10, 20.00)
  β2 (slope for BMI>knot)       0.01         0.01              (-0.01, 0.03)

BMI among men (n=6,760)
  β1 (slope for BMI<knot)       -0.23        0.05              (-0.30, -0.01)
  knot                          23.43        0.87              (23.23, 25.81)
  β2 (slope for BMI>knot)       0.03         0.02              (-0.01, 0.06)

WHR among women (n=6,973)
  β                             2.41         0.54              (1.31, 3.46)

WHR among men (n=6,267)
  β                             -0.37        0.85              (-2.03, 1.34)

* Adjusted for baseline age, age squared, and height, and indicators of ethnicity, smoking status, alcohol use, and serious illness.


Table 3 Odds ratios and bootstrap 95% CI across selected BMI and WHR values by gender.

BMI*     18.5                25                  30                  35                  40
Women    1.25 (0.91, 1.74)   1.03 (0.99, 1.07)   1.10 (0.96, 1.25)   1.18 (0.94, 1.47)   1.26 (0.91, 1.72)
Men      2.85 (1.55, 3.57)   0.94 (0.68, 0.98)   1.07 (0.69, 1.15)   1.21 (0.69, 1.39)   1.38 (0.68, 1.74)

WHR†     0.8                 1.0                 1.1                 1.2                 1.3
Women    0.79 (0.71, 0.88)   1.27 (1.14, 1.41)   1.62 (1.30, 2.00)   2.06 (1.48, 2.82)   2.63 (1.69, 3.99)
Men      1.04 (0.87, 1.22)   0.96 (0.82, 1.14)   0.93 (0.67, 1.31)   0.90 (0.54, 1.50)   0.86 (0.44, 1.71)

* Compared to a BMI reference level of 23; † compared to a WHR reference level of 0.9.


[Figure 1 panels: a) BMI among women; b) WHR among women; c) BMI among men; d) WHR among men. Vertical axes: Frequency.]

Figure 1. Histograms depicting the distributions of BMI and WHR by gender.


[Figure 2 panels: a) BMI among women; b) WHR among women; c) BMI among men; d) WHR among men. Vertical axes: Proportion.]

Figure 2. Unadjusted proportion of deaths observed among those grouped per unit of BMI and per 1/100 unit of WHR, respectively, by gender. Plotted numbers refer to the median age rounded and truncated to the nearest decade of each subunit grouping.


[Figure 3 panels: a) BMI* among women; b) WHR† among women; c) BMI* among men; d) WHR† among men. Vertical axes: Mortality OR.]

* BMI reference level is 23; † WHR reference level is 0.9.

Figure 3. Odds ratios plotted for BMI and WHR by gender. Fitted model (in red) and 20 randomly selected bootstrap replicate models (in black).


CONCLUSIONS

The obesity research field has grown considerably in recent years, but I believe

undue attention has been devoted to two postulated causes for increases in the prevalence

of obesity leading to neglect of other plausible mechanisms and well-intentioned, but

potentially ill-founded proposals for reducing obesity rates. For at least 10 putative

additional explanations for the increased prevalence of obesity over recent decades, in my

first paper I showed supportive (though not conclusive) evidence that, in many cases, is

as compelling as the evidence for more commonly discussed putative explanations.

Although the effect of any one factor may be small, the combined effects may be

consequential.

This paper illustrates the importance of developing new ideas and challenging the

assumptions we make in research. Currently, many researchers are investigating the

connections between the putative contributors we described and the obesity epidemic.

Each of the 10 putative contributors we highlighted deserves more attention, but sleep

debt has stood out as a strong candidate for immediate extensive research. Both sleep

research and obesity research are popular and investigators have been proposing (e.g.,

http://clinicaltrials.gov/ct2/show/NCT00261898?term=obesity+AND+sleep&rank=1) and

conducting observational studies or experiments relating sleep and the regulation of

certain aspects of metabolism including appetite and satiety which may affect excess

adipose tissue accumulation (e.g., Gangswisch et al., 2005; Hasler et al., 2004; Speigel

et al., 2004). Other putative contributors are being investigated as well, such as study of


endocrine disruptors (e.g.,

http://crisp.cit.nih.gov/crisp/CRISP_LIB.getdoc?textkey=7229996&p_grant_num=5R21ES01372402&p_query=(obesity+%26+endocrine+%26+disruptors)&ticket=74028657&p_audit_session_id=357566744&p_audit_score=14&p_audit_numfound=1&p_keywords=ob)

and environmental temperature (e.g.,

http://clinicaltrials.gov/ct2/show/NCT00521729?term=obesity+AND+temperature&rank=1).

These efforts will likely result in even more cogent investigations in future

interdisciplinary research providing a more comprehensive picture of what might be

driving the epidemic so that efforts to curb the secular increases in obesity may be better

guided.

With data resources growing in size and scope, so too should our abilities to draw

connections between health outcomes and possible predictors. I have provided extensive

details on the design, evaluation, and implementation of a framework for modeling

nonlinearity between a binary outcome and a continuous prognostic variable adjusted for

covariates in complex health survey samples. The primary objective of this methodology

was to analyze non-random survey samples by applying sophisticated modeling

techniques capable of detecting nonlinearity and adjusting model flexibility. It is

important that this methodology be useful in practice, so providing familiar-looking

parameterizations of output, such as linear slope coefficients and odds ratios, was a key

objective. Unlike other nonlinear modeling packages, my framework accounted for

multistage cross-sectional survey sample designs common to nationally representative

datasets. Under the conditions I simulated, my method of selecting the optimal number of

knots was commonly more accurate than Schwarz’s Bayesian Information Criterion

(BIC) (Schwarz, 1978) and very similar to Akaike’s Information Criterion (AIC)

(Akaike, 1974) in terms of accuracy and precision as long as sample sizes were relatively

large. Moreover, AIC and BIC were substantially biased in model selection when

sampling weights were incorporated.

In the application of my framework to examine the relationship between BMI and

headaches among over 220,000 women, I found that a BMI of approximately 20 was

associated with the lowest risk of headache. Relative to a BMI of 20, mild obesity (BMI

of 30) was associated with roughly a 35% increase in odds of headache whereas severe

obesity (BMI of 40) was associated with roughly an 80% increase in odds. Results were

essentially unchanged when models were further adjusted for socioeconomic variables,

alcohol consumption, and hypertension. Consistently, across the study databases I

analyzed, obese women had significantly increased risk of headaches.

Menopause was a possible confounder of some of the results, but I would suggest

that it is unlikely to have a strong influence on the BMI-migraine association as our

results from WHI were consistent with other studies (e.g., Bigal et al., 2006) which were

not based on older women (as WHI was). From communicating with people in the field

about these results, a common opinion is that the personal volition to take medication for

illness or pain can be highly heterogeneous from person-to-person. While this might not

have confounded our results from NHANES I, per se, it might have added enough

“noise” to the outcome response (coded as headache vs. no headache) such that we could

not detect a significant association even with thousands of observations. Behavior is

almost certainly an important factor over which our analyses had no control. Stress,

physical activity, and diet behaviors each seem to have a strong impact on the frequency

and severity of migraine headaches. Research efforts on these connections and finding

treatment interventions are currently ongoing (e.g., The American Migraine Prevention

Study: http://clinicaltrials.gov/ct2/show/NCT00363506?term=headache+diet&rank=4).

BMI showed a nonlinear association with mortality in NHANES III where models

indicated thresholds in the odds of mortality at BMI values near 19 and 23 for women

and men, respectively. The broken line shapes of the BMI-mortality relationships were

similar for both men and women suggesting no significant elevation in odds with

increasing BMI. The data, however, suggested that a longer follow-up time might be

required for characterizing mortality at high levels of BMI. Linear logistic models were

adequate to relate WHR to mortality and suggested that increased WHR linearly

increased log(odds) of mortality among women (β = 2.4; 95% CI (1.3, 3.5)), but not

among men. For more information on the influence of ignoring the complex sample

design information and secondary analyses of BMI and mortality, see Appendix B.

I suggest that the methodology and information described in this research will be

useful to clinicians, public health scientists, epidemiologists, and biostatisticians. I have

implemented the only free-knot spline logistic regression modeling framework which

makes adjustments for complex sample designs. By linearly transforming the B-spline or

truncated power basis parameters to the more familiar-looking piecewise linear

polynomial parameter representations, I have provided a powerful estimation and

inference tool, the output of which may be understood by non-statisticians. Since BMI is

an easily measured and modifiable risk factor, clinicians and public health officials might

use the results I displayed from the application of my framework to advise or counsel

patients regarding the risk associated with being “below” or “above” a threshold BMI

level I detected to reduce their risk of headache or mortality. My results on the linear

relationship of WHR and mortality suggest that as compared to a woman with an average

WHR (0.9), similar women having WHR of 1.0 or 1.1 might have an increase in odds of

mortality of 25% or 62%, respectively. While these results should be interpreted with

some caution considering that these are observational cross-sectional results, they do

suggest that more focused prospective studies might produce similar results and be more

useful for direct application in clinical care and public health policy.

GENERAL LIST OF REFERENCES

Akaike H. (1974) A new look at the statistical model identification. IEEE Transactions

on Automatic Control;19:716–23.

Allison DB, Zannolli R, Narayan KVM. (1999) The direct health care costs of obesity in

the United States. American Journal of Public Health;89:1194-1199.




Bender R, Augustin T, Blettner M. (2005) Generating survival times to simulate Cox

proportional hazards models. Statistics in Medicine;24:1713-1723.

Bigal ME, Liberman JN, Lipton RB. (2006) Obesity and migraine: A population study.

Neurology;28:545–50.

Calle EE, Thun MJ, Petrelli JM, Rodriguez C, Heath CW Jr. (1999) Body mass index and

mortality in a prospective cohort of US adults. N Engl J Med;341:1097–1105.

Cox DR. (1961) Tests of separate families of hypotheses. Proceedings of the Fourth

Berkeley Symposium on Mathematical Statistics and Probability. University of

California Press, Berkeley, pp. 105–123.

Cox DR. (1962) Further results on tests of separate families of hypotheses. Journal of the

Royal Statistical Society, Series B;24:406–424.

Cox DR. (1972) Regression models and life tables (with Discussion). Journal of the

Royal Statistical Society, Series B;34:187-220.

Cox DR. (1975) Partial likelihood. Biometrika;62:69-72.

Davison AC, Hinkley DV. (1997) Bootstrap Methods and their Application. New York:



Efron B. (1977) The efficiency of Cox’s likelihood function for censored data.

JASA;72:557-65.

Flegal KM, Graubard BI, Williamson DF, Gail MH. (2005) Excess deaths associated

with underweight, overweight, and obesity. JAMA;293:1861-7.

Fontaine KR, Redden DT, Wang C, Westfall AO, Allison DB. (2003) Years of life lost

due to obesity. JAMA;289:187-93.

Gangwisch JE, Malaspina D, Boden-Albala B, Heymsfield SB. (2005) Inadequate sleep

as a risk factor for obesity: analyses of the NHANES I. Sleep;28:1289-96.

Hasler G, Buysse DJ, Klaghofer R, Gamma A, Ajdacic V, Eich D, Rossler W, Angst J.

(2004) The association between short sleep duration and obesity in young adults:

a 13-year prospective study. Sleep;27:661-6.

Keith SW, Wang C, Fontaine KR, Allison DB. (in press) Body mass index and headache


Keith SW, Desmond R, Allison DB. (2006b) Body fat and mortality: A survival analysis

of the third National Health and Nutrition Examination Study (NHANES III).

Obesity;14 Suppl A262.

Keith SW, Redden DT, Katzmarzyk P, Boggiano MM, Hanlon EC, Benca RM, Ruden D,

Pietrobelli A, Barger J, Fontaine K, Wang C, Arronne L, Wright S, Baskin M,

Dhurandhar N, Lijoi M, Grilo CM, De Luca M, Allison DB. (2006) Putative

Contributors to the Secular Increase in Obesity: Exploring the Roads Less

Traveled. IJO;30:1585-94.

Klein JP, Moeschberger ML. (2003) Survival Analysis, Second Edition. Springer-

Verlag, New York.

Korn EL, Graubard BI. (1999) Analysis of Health Surveys. J. Wiley & Sons, New York.

Molinari N, Daures JP, Durand JF. (2001) Regression splines for threshold selection in

survival data analysis; 20:237-247.

Narayan KM, Boyle JP, Thompson TJ, Gregg EW, Williamson DF. (2007) Effect of BMI

on lifetime risk for diabetes in the U.S. Diabetes Care;30:1562-6.

Parmigiani G. (2002) Measuring uncertainty in complex decision analysis models.

Statistical Methods in Medical Research;11:513-37.

Rao JNK. (1997) Developments in sample survey theory: an appraisal. The Canadian

Journal of Statistics;25:1-21.




Spiegel K, Tasali E, Penev P, Van Cauter E. (2004) Brief communication: Sleep

curtailment in healthy young men is associated with decreased leptin levels,

elevated ghrelin levels, and increased hunger and appetite. Ann Intern

Med;141:846-50.

Therneau TM, Grambsch PM. (2000) Modeling Survival Data: Extending the Cox

Model. New York: Springer.





Prevention.

APPENDIX

A. Future Directions: Nonlinear Cox Proportional Hazards Regression

Since the proposed framework can be based on maximizing likelihood functions, I

anticipate that the method may be extended to censored time-to-event data models to

construct nonlinear Cox proportional hazards models that will rely on the partial

likelihood theory introduced by Cox (1972; 1975). Suppose that X is an N x 1 vector of

data on some continuous prognostic variable of interest, Z represents an N x (p + 1)

matrix consisting of a column of ones

followed by p columns of data on covariates, and t is a vector of time to event or

censoring. The basic Cox model relies on the hazard relation,

λ(t | Z,X) = λo(t)exp{[Z X]β} ,

where t represents time to either the event of interest or right censoring (at the end of the

study or otherwise lost to follow-up) and β is a (p + 2) x 1 vector of regression parameter

coefficients which can be estimated without specifying the baseline hazard function, λo(t),

by choosing β̂ to maximize the partial likelihood (Cox, 1975).

The relationship between $\log\{\lambda(t \mid Z, X)/\lambda_0(t)\}$ and [Z X]β is linear in β and can

be easily modified to include a nonlinear predictor by replacing X with a B-spline basis

and representing $\beta_{p+2}$ with a linear combination of B-spline coefficients.

In general, the values in Z or X can be either fixed under the assumption that they will

not change over time or they can be modeled if they vary with time: Z(t) or X(t). For the

immediately planned applications, I may assume X to be fixed and leave modeling of

time varying covariates for future research perhaps to coincide with modeling survival as

a counting process (Therneau and Grambsch, 2000).

Our method will extend to model hazard ratios nonlinearly by estimating the hazard as

$$\lambda(t \mid Z, X) = \lambda_0(t)\exp\{\eta([Z\; X])\},$$

where for the qth individual,

$$\eta([Z_q\; X_q]) = Z_q\beta + B_q b = \beta_0 + \beta_1 Z_{q2} + \cdots + \beta_p Z_{q(p+1)} + \sum_{i=1}^{K+1} b_i B_{i2}(X_q).$$
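As a minimal sketch of how such a piecewise linear predictor could be evaluated, the Python below builds order-2 (linear "hat") B-spline basis functions on a knot grid and adds them to the covariate terms. The function names, the boundary knots `lower` and `upper`, and the choice to drop the first basis function (so that K knots give K + 1 spline coefficients alongside the column of ones in Z) are illustrative assumptions, not details taken from the framework itself.

```python
def linear_bspline_basis(x, interior_knots, lower, upper):
    """Evaluate the piecewise linear (order-2) B-spline hat functions at x.

    Returns K + 2 values (K = number of interior knots) that sum to 1.
    `lower` and `upper` are assumed boundary knots for this sketch.
    """
    grid = [lower] + sorted(interior_knots) + [upper]
    x = min(max(x, lower), upper)  # clamp x to the boundary knots
    basis = [0.0] * len(grid)
    for i in range(len(grid) - 1):
        left, right = grid[i], grid[i + 1]
        if left <= x <= right:
            frac = (x - left) / (right - left)
            basis[i] = 1.0 - frac      # weight on the left grid point
            basis[i + 1] = frac        # weight on the right grid point
            break
    return basis


def linear_predictor(z_row, beta, x, interior_knots, lower, upper, b):
    """Compute eta = z_row . beta + sum_i b_i * B_i(x) for one individual."""
    basis = linear_bspline_basis(x, interior_knots, lower, upper)
    # Assumption: the first basis function is dropped because z_row already
    # carries a leading 1, leaving K + 1 spline terms.
    spline_terms = basis[1:]
    if len(b) != len(spline_terms):
        raise ValueError("b must have K + 1 coefficients")
    covariate_part = sum(z * c for z, c in zip(z_row, beta))
    spline_part = sum(bi * Bi for bi, Bi in zip(b, spline_terms))
    return covariate_part + spline_part
```

For example, with interior knots at BMI 20 and 30 and assumed boundary knots at 15 and 45, a BMI of 25 places equal weight on the two hat functions centered at 20 and 30.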

Here I will be estimating parameters by maximizing the nonlinear weighted partial

likelihood,

$$PL(\theta \mid Z, X) = \prod_{q=1}^{N}\left[\frac{\exp\{\eta([Z_q\; X_q])\}}{\sum_{j=1}^{N}\exp\{\eta([Z_j\; X_j])\}\, I\{t_q \le t_j\}}\right]^{\delta_q w_q},$$

where θ = {β0, β1,…, βp, b1,…, bK+1}, wq is the complex sample weight assigned to the qth

participant, and δq is a censorship variable such that,

$$\delta_q = \begin{cases} 1, & \text{if } t_q \text{ is an event time} \\ 0, & \text{if } t_q \text{ is a censored time.} \end{cases}$$

As it may be more convenient to optimize by maximization in PROC NLP, the nonlinear

log weighted partial likelihood is,

$$\log\{PL(\theta \mid Z, X)\} = \sum_{q=1}^{N} w_q\delta_q\left[\eta([Z_q\; X_q]) - \log\left\{\sum_{j=1}^{N}\exp\{\eta([Z_j\; X_j])\}\, I\{t_q \le t_j\}\right\}\right].$$
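The log weighted partial likelihood can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the PROC NLP implementation: the function name and argument layout are hypothetical, the linear predictors are taken as already computed, the risk set for individual q is everyone with t_j ≥ t_q, and no adjustment for tied event times is made.

```python
import math


def log_weighted_partial_likelihood(eta, times, events, weights):
    """Sum over events of w_q * [eta_q - log(sum of exp(eta_j) over the risk set)].

    eta     : linear predictor for each individual (assumed precomputed)
    times   : observed time (event or censoring) for each individual
    events  : 1 if the time is an event time, 0 if censored
    weights : complex sample weight for each individual
    """
    n = len(times)
    total = 0.0
    for q in range(n):
        if events[q] == 0:
            continue  # censored observations contribute only through risk sets
        # Risk set: individuals still under observation at time t_q.
        risk_sum = sum(math.exp(eta[j]) for j in range(n) if times[q] <= times[j])
        total += weights[q] * (eta[q] - math.log(risk_sum))
    return total
```

With all linear predictors equal to zero, each event simply contributes minus the log of its risk set size, which gives a quick sanity check on the indexing.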

It is important to note that I may expect to see a fairly large number of ties in survival

times and adjustment to the partial likelihood must be made. Efron (1977) proposed an

adjustment to the partial likelihood for this situation and his method will be incorporated

into our nonlinear framework.

I propose in this section a novel method of selecting the optimal number of knots.

Like Bessaoud et al. (2005) and Molinari et al. (2001), I will be interested in identifying

the fitted knots and evaluating their potential to be interpreted as thresholds defining

clinically (or biologically) meaningful groups having differential patterns of risk. It will

be very important to specify a parsimonious number of knots, say K = 4 or fewer, which

would indicate 5 or fewer different risk groups. Therefore, keeping the framework from

producing overparameterized models having unnecessary knots will be a priority.

Again, I am considering a set of p linear and/or categorical covariates and one

potentially nonlinear prognostic variable. Model selection herein is aimed at selecting the

number of meaningful knots that best suits a given set of complex sample data. Our

technique involves a selection procedure based on Monte Carlo sampling of partial

likelihood ratio statistics for the addition of parameters (i.e., knots and adjoining slopes)

to a given piecewise linear model.

Note here that these free-knot models are generally non-nested models (Cox 1961,

1962), which requires testing nearly all of the $\binom{K_{max}+1}{2}$ possible pairs of models, where Kmax

represents the maximum number of knots (i.e., 4). The exception is that a K = 0 linear

model is nested in every free-knot spline model. So, I would have to conduct all pairwise

tests for models shown to be significantly better than the linear Cox model. That is, the

total number of tests will be the Kmax tests against the K = 0 model plus the pairwise tests

among the models found significantly better than the K = 0 model (say S of them):

$$\#\,\text{tests} = K_{max} + I\{S > 1\}\binom{S}{2}.$$
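The count of tests is easy to check numerically. The short Python sketch below (the function name is hypothetical) evaluates the expression for given values of Kmax and S.

```python
from math import comb


def number_of_tests(k_max, s):
    """Return K_max tests against the K = 0 model plus all pairwise tests
    among the s models found significantly better than the K = 0 model."""
    if not 0 <= s <= k_max:
        raise ValueError("s must lie between 0 and k_max")
    pairwise = comb(s, 2) if s > 1 else 0
    return k_max + pairwise
```

With Kmax = 4, the count ranges from 4 tests (when at most one spline model beats the linear model) to 10 tests (when all four do), and 10 equals the number of pairs among the five candidate models.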

The test statistic for this framework will be a partial likelihood ratio (PLR),

$$PLR = \frac{PL(\hat{\theta}_0 \mid X)}{PL(\hat{\theta}_1 \mid X)},$$

where $\hat{\theta}_0$ represents the parameters found to be the best fit for some model having K0

knots which is then considered to compose the null model (including the intercept, linear

coefficients, free-knots, and the spline parameters (i.e., the piecewise linear slopes)); and

$\hat{\theta}_1$ represents the parameter set that maximized the partial likelihood under some

alternative model having K1 knots such that K0 < K1 ≤ Kmax.

The parametric bootstrap described by Davison and Hinkley (1997) is a tool I can

use to repeatedly simulate Monte Carlo datasets. I use the fitted model parameter

estimates from the null model fitted to the original data to compute a hypothetical

distribution of PLR statistics, under the null hypothesis that the “reduced” null model

having K0 knots is true.

Say I draw D parametrically bootstrapped datasets of outcomes (censored time-to-event

random variables) and compute the PLR distribution $\{PLR^*_1, \ldots, PLR^*_D\}$ by fitting

models having K0 and K1 knots, respectively, to each of the D bootstrap datasets. The

bootstrap p-value represents the probability that adding (K1 - K0) knots to the null

model produces a PLR at least as small as what was observed by chance alone if the

K0-knot null model were true,

$$p_{boot} = \frac{1}{D}\sum_{i=1}^{D} I\{PLR^*_i \le PLR\}.$$
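Once the bootstrap PLR statistics are in hand, the p-value is just the proportion of them that are no larger than the observed PLR. A minimal Python sketch follows; the function and argument names are hypothetical, and the fitting of the K0-knot and K1-knot models to each bootstrap dataset is assumed to have been done elsewhere.

```python
def bootstrap_p_value(plr_observed, plr_bootstrap):
    """Proportion of bootstrap PLR statistics at least as small as the observed PLR."""
    d = len(plr_bootstrap)
    if d == 0:
        raise ValueError("at least one bootstrap statistic is required")
    return sum(1 for value in plr_bootstrap if value <= plr_observed) / d
```

A small PLR favors the alternative model, so the test rejects the K0-knot null model when this proportion falls below the chosen significance level.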

I let α represent the significance level for this parametric bootstrap test of the

contribution to significantly increasing the partial likelihood. Comparing pboot to α gives

us a likelihood ratio test of size α for the contribution to the likelihood from the K1-knot

model. Simulating a variety of data and fitting models at various levels of α should lead

to a broad understanding of the specificity and sensitivity of the parametric bootstrap

testing and how to properly adjust the framework for optimal parsimony.

Parametric bootstrapping for censored time-to-event outcomes

Imputing repeated simulated outcomes by the parametric bootstrap procedure for

free-knot selection in the piecewise logistic regression modeling (see step 4 in Table 1) is

fairly straightforward. The free-knot selection in the piecewise Cox regression modeling

will be more complicated. In order to estimate a distribution of censored time-to-event

outcomes under the null hypothesis of K knots, I will need to not only estimate the linear

and spline parameters, but also the baseline hazard function by Breslow’s estimator

(Klein and Moeschberger, 2003) to estimate the survival probabilities for each subject.

Once I have estimated the model parameters and baseline hazard function, I plan to

generate censored survival times by the methods outlined by Bender et al. (2005).
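To illustrate the kind of simulation Bender et al. (2005) describe, the Python sketch below draws a survival time by inverting the cumulative hazard, T = H0^{-1}(-log(U) exp(-η)) with U uniform on (0, 1). Two simplifications are assumptions of this sketch only: the baseline hazard is taken to be a constant (standing in for a Breslow-type estimate), so the inversion reduces to T = -log(U) / (λ0 exp(η)), and censoring is administrative at a single fixed time. All names are hypothetical.

```python
import math
import random


def simulate_censored_time(eta, baseline_hazard, censor_time, rng):
    """Draw one (observed time, event indicator) pair under a Cox model.

    eta             : linear predictor for the individual
    baseline_hazard : constant baseline hazard (an assumption of this sketch)
    censor_time     : administrative censoring time
    rng             : a random.Random instance
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], keeps log(u) finite
    event_time = -math.log(u) / (baseline_hazard * math.exp(eta))
    if event_time <= censor_time:
        return event_time, 1   # event observed
    return censor_time, 0      # right censored at the end of follow-up
```

Repeating the draw for every subject, using each subject's fitted linear predictor from the K0-knot null model, would yield one parametric bootstrap dataset of censored outcomes.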

B. On the effects of ignoring sample weights or those with extremely high BMI

To examine how robust our findings might be to ignoring the sample weights and

assuming the data were drawn by simple random sampling, I analyzed some of the data

without the survey weights and design information. Some of the headache study datasets

were not nationally representative and did not include sample weights. In these datasets,

the results were very consistent across studies that ascertained information on the

frequency or severity of headache. When I analyzed the NHIS 2003 data without

sample weights, the results were quite similar, but I noted some obviously deflated

variance estimates in comparison to the complex sample weighted results. For the

mortality study, I used the AIC and BIC methods to select the “best” nonlinear models

for BMI and WHR by gender group. This approach avoided the parametric 2 df testing

procedure allowing me to grid search over 9 BMI and WHR locations (as opposed to 6)

in a reasonable amount of time. The WHR parameter estimates were nearly identical in

magnitude, but the variance estimates were somewhat deflated. BMI, on the other hand,

would allow two knots into the model for women and three knots for the men. Figure 1

shows the results plotted in terms of odds ratios. The model for women maintained the

steep slope, as we saw before, for low BMI and the first knot remained near BMI = 20

after which the slope increased linearly until the second knot made a breakpoint around

50 for specifying a different very steep slope reflecting the lack of mortality information

for those with extremely high BMI. The model for men maintained the steep slope, as we

saw before, for low BMI and the first knot remained near BMI = 23 after which the slope

was flat until the second knot made a breakpoint around 36 for specifying an increase in

risk for the severely obese. The third knot breaks the relationship again at around BMI =

42 at which point the leverage of those with BMI ≥ 45 showing little or no mortality

information caused our model to show decreased mortality odds with increased BMI

among those with extremely high BMI. These results demonstrate what models are likely

to be fitted with our framework once we have the computing power to grid search over

more potential knot locations in the BMI spectrum.

[Figure 1 panels: a) BMI* among women; b) BMI* among men. Vertical axis: Mortality OR.]

* BMI reference level is 23

Figure 1. NHANES III mortality odds ratios plotted for BMI by gender from models selected without consideration for sample design. Models shown were selected by using AIC.

To see what the models would look like if we removed those participants with

extremely high BMI, I ran the analyses over again, but after having removed any subjects

with BMI > 45. Figure 2 shows the results from these models. The results for the women

in this restricted dataset were nearly identical to the results previously reported by our

methods. However, the model fitted to the men in this restricted dataset was quite

different. It specified four knots located at BMI ∈ {25.2, 35.7, 40.4, 41.7}. I am confident

that we are seeing some overfitting in this model especially at BMI > 40, but the

two knots at 40.4 and 41.7 significantly increased the likelihood that the data came from

this model and were necessary to partition the data to better reflect the increased odds of

mortality we expect to see, given the information presented in the fourth paper, among

men having 35 < BMI < 40.

[Figure 2 panels: a) BMI* among women; b) BMI* among men. Vertical axis: Mortality OR.]

* BMI reference level is 23

Figure 2. NHANES III mortality odds ratios calculated by our framework plotted for participants having BMI ≤ 45 by gender.
