Estimation Techniques for Dose-response Functions Presented by Bahman Shafii, Ph.D. Statistical Programs College of Agricultural and Life Sciences University

Estimation Techniques for Dose-response Functions

Presented by

Bahman Shafii, Ph.D.

Statistical Programs, College of Agricultural and Life Sciences

University of Idaho

Acknowledgments

• Research partially funded by USDA-ARS Hatch Project IDA01412, Idaho Agricultural Experiment Station.

• Collaborators:

  • William J. Price, Ph.D., Statistical Programs, University of Idaho.

  • Steven Seefeldt, Ph.D., USDA-ARS, University of Alaska Fairbanks.

Introduction

• Dose-response models are common in agricultural research.

• They can encompass many types of problems:

  • Modeling environmental effects due to exposure to chemical or temperature regimes.

  • Estimation of time-dependent responses such as germination, emergence, or hatching (e.g. Shafii and Price 2001; Shafii et al. 2009).

  • Bioassay assessments via calibration curves and quantal estimation (e.g. Shafii and Price 2006).

Estimation

• Curve estimation.

  • Linear or non-linear techniques.

• Estimate other quantities:

  • Percentiles, typically LD50, LC50, EC50, etc.

• Percentile estimation is problematic:

  • Inverted solutions.

  • Unknown distributions.

  • Approximate variances.

• The response distribution:

  • Continuous: Normal, Log Normal, Gamma, etc.

  • Discrete (quantal responses): Binomial, Multinomial (yes/no); Poisson (count).

• The response form:

  • Typically expressed as a nonlinear curve:

    • increasing or decreasing sigmoidal form

    • increasing or decreasing asymptotic form

[Figure: example dose-response curves. Axes: Dose (horizontal), Response (vertical).]

Bioassay and Calibration

• Given a dose-response curve and an observed response:

  • What dose generated the response?

  • What is the probability of a dose given an observed response and the calibration curve?

• This problem fits naturally into a Bayesian framework.

[Figure: calibration curve of Response against Dose, marking a measured response and the corresponding unknown dose.]

• Typical dose-response estimation assumes that the functional form, or tolerance distribution, is known, e.g. a sigmoidal shape.

• In some cases, however, it may be advantageous to relax this assumption and restrict estimation to a family of dose-response forms, for example when:

  • The dose-response population consists of a mixture of subpopulations which cannot be sampled separately.

  • The dose-response series exhibits more complex behavior than a simple sigmoidal shape, e.g. hormesis.

Objectives

• Outline estimation methods for dose-response models.

  • Traditional approaches.

    • Probit - Least Squares.

  • Modern approaches.

    • Probit - Maximum Likelihood.

    • Generalized non-linear models.

    • Bayesian solutions.

Objectives (cont)

• Demonstrate solutions for calibration of an unknown dose with a binary response assuming:

  • A known dose-response form.

    • Standard MLE estimation.

    • Standard parametric Bayesian estimation.

  • A family of dose-response forms.

    • Nonparametric Bayesian estimation.

Estimation Methods

Traditional Approach

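• Probit Analysis - Least Squares

  • A linearized least squares estimation (Bliss, 1934; Fisher, 1935; Finney, 1971):

    $\text{Probit}_i = \Phi^{-1}(p_{ij}) = \beta_0 + \beta_1 \cdot dose_i + \epsilon_{ij}$    (1)

    where $p_{ij} = y_{ij}/N$ and $y_{ij}$ is the number of successes out of $N$ trials in the $j$th replication of the $i$th dose. $\beta_0$ and $\beta_1$ are regression parameters and $\epsilon_{ij}$ is a random error; $\epsilon_{ij} \sim N(0, \sigma^2)$.

  • Minimize: $SS_{error} = \sum_{ij} \left( \Phi^{-1}(p_{ij}) - \widehat{\text{probit}}_i \right)^2$, where $\widehat{\text{probit}}_i = \hat{\beta}_0 + \hat{\beta}_1 \cdot dose_i$ is the fitted value on the probit scale.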

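  • $\Phi$ is a convenient CDF form or "tolerance distribution", e.g.

    • Normal: $p_{ij} = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{dose_i} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) dx$

    • Logistic: $p_{ij} = 1 / (1 + \exp(-\beta \cdot dose_i - \alpha))$

    • Modified Logistic: $p_{ij} = C + (M - C) / (1 + \exp(-\beta \cdot dose_i - \alpha))$ (e.g. Seefeldt et al. 1995)

    • Gompertz: $p_{ij} = \beta_0 \, (1 - \exp(-\exp(-\beta \cdot dose_i)))$

    • Exponential: $p_{ij} = \beta_0 \exp(-\beta \cdot dose_i)$

  • SAS: PROC REG.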
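The linearized fit in (1) can be sketched in a few lines of Python. This is only an illustration using the standard library, not the PROC REG analysis itself: the function names and the noise-free example data are hypothetical, and the Normal tolerance distribution is assumed.

```python
from statistics import NormalDist


def probit_least_squares(doses, proportions):
    """Ordinary least squares of probit(p) on dose, as in equation (1)."""
    std_normal = NormalDist()
    # Probit transform; proportions must lie strictly between 0 and 1.
    probits = [std_normal.inv_cdf(p) for p in proportions]
    n = len(doses)
    mean_x = sum(doses) / n
    mean_z = sum(probits) / n
    sxx = sum((x - mean_x) ** 2 for x in doses)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(doses, probits))
    b1 = sxz / sxx
    b0 = mean_z - b1 * mean_x
    return b0, b1


def ld50(b0, b1):
    """Dose at which the fitted proportion is 0.5, i.e. where probit = 0."""
    return -b0 / b1


# Hypothetical noise-free proportions generated from b0 = -2, b1 = 0.5.
doses = [1, 2, 3, 4, 5, 6, 7]
proportions = [NormalDist().cdf(-2 + 0.5 * d) for d in doses]
b0, b1 = probit_least_squares(doses, proportions)
print(b0, b1, ld50(b0, b1))
```

Note that the percentile comes from inverting the fitted line, here as the ratio -b0/b1, which is why its variance is only approximate.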

Modern Approaches

• Probit Analysis - Maximum Likelihood

  • The responses, $y_{ij}$, are assumed binomial at each dose $i$ with parameter $\pi_i$. Using the joint likelihood, $L(\pi_i)$:

    Maximize: $L(\pi_i) \propto \prod_{ij} (\pi_i)^{y_{ij}} (1 - \pi_i)^{(N - y_{ij})}$    (2)

    for data set $y_{ij}$, where $\pi_i = \Phi(\beta_0 + \beta_1 \cdot dose_i)$ and $\beta_0$, $\beta_1$, and $dose_i$ are those given previously.

  • The CDF, $\Phi$, is typically defined as a Normal, Logistic, or Gompertz distribution as given above.

  • SAS: PROC PROBIT.
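As a rough illustration of maximizing the binomial likelihood in (2), the sketch below uses Fisher scoring (iteratively reweighted least squares) with a Normal CDF. It is not the algorithm documented for PROC PROBIT; the function name, starting values, and the example counts are assumptions made for the example.

```python
from statistics import NormalDist


def probit_mle(doses, successes, trials, max_iter=50, tol=1e-10):
    """Maximize the binomial likelihood with pi_i = Phi(b0 + b1 * dose_i)."""
    std_normal = NormalDist()
    b0, b1 = 0.0, 0.0  # assumed starting values
    for _ in range(max_iter):
        sw = swx = swxx = swz = swxz = 0.0
        for x, y, n in zip(doses, successes, trials):
            eta = b0 + b1 * x
            pi = min(max(std_normal.cdf(eta), 1e-10), 1.0 - 1e-10)
            dens = std_normal.pdf(eta)
            weight = n * dens ** 2 / (pi * (1.0 - pi))  # Fisher information weight
            working = eta + (y - n * pi) / (n * dens)   # working response
            sw += weight
            swx += weight * x
            swxx += weight * x * x
            swz += weight * working
            swxz += weight * x * working
        # Solve the 2 x 2 weighted least squares normal equations.
        det = sw * swxx - swx ** 2
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        change = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if change < tol:
            break
    return b0, b1


# Hypothetical data: expected counts (not rounded) under b0 = -2, b1 = 0.5,
# so the maximum likelihood estimates should reproduce those values.
doses = [1, 2, 3, 4, 5, 6, 7]
trials = [40] * len(doses)
successes = [n * NormalDist().cdf(-2 + 0.5 * d) for d, n in zip(doses, trials)]
b0_hat, b1_hat = probit_mle(doses, successes, trials)
print(b0_hat, b1_hat)
```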

Probit Analysis

• Limitations:

  • Least squares limited.

    • Linearized solution to a non-linear problem.

  • Even under ML, solution for percentiles approximated.

    • Inversion.

    • Use of the ratio $\beta_0/\beta_1$ (Fieller, 1944).

  • Appropriate only for proportional data.

  • Assumes the response $\Phi^{-1}(p_{ij}) \sim N(\mu, \sigma^2)$.

  • Interval estimation and comparison of percentile values approximated.
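The ratio-based percentile interval mentioned above can be made concrete with a small sketch of Fieller's theorem. The limits are the roots in r of (a - r b)^2 = t^2 (var_a - 2 r cov_ab + r^2 var_b). The function name, the critical value, and the parameter estimates with their variances are hypothetical values chosen for illustration.

```python
import math


def fieller_interval(a, b, var_a, var_b, cov_ab, crit):
    """Confidence limits for the ratio a / b from Fieller's theorem."""
    # Coefficients of the quadratic qa * r^2 + qb * r + qc <= 0.
    qa = b ** 2 - crit ** 2 * var_b
    qb = -2.0 * (a * b - crit ** 2 * cov_ab)
    qc = a ** 2 - crit ** 2 * var_a
    if qa <= 0:
        raise ValueError("denominator not significantly nonzero: interval unbounded")
    root = math.sqrt(qb ** 2 - 4.0 * qa * qc)
    return (-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)


# Hypothetical probit estimates: b0 = -2, b1 = 0.5, so LD50 = -b0 / b1 = 4.
b0, b1 = -2.0, 0.5
var_b0, var_b1, cov_b0_b1 = 0.09, 0.004, -0.017
# The numerator is -b0, so its covariance with b1 changes sign.
lower, upper = fieller_interval(-b0, b1, var_b0, var_b1, -cov_b0_b1, crit=1.96)
print(lower, upper)
```

The interval is generally not symmetric about the point estimate, which is one reason comparisons of percentile values under this approach are only approximate.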

Modern Approaches (cont)

• Nonlinear Regression - Iterative Least Squares

  • Directly models the response as:

    $y_{ij} = f(dose_i) + \epsilon_{ij}$    (3)

    where $y_{ij}$ is an observed continuous response, $f(dose_i)$ may be generalized to any continuous function of dose, and $\epsilon_{ij} \sim N(\mu, \sigma^2)$.

  • Minimize: $SS_{error} = \sum_{ij} [\, y_{ij} - f(dose_i) \,]^2$.

  • SAS: PROC NLIN.
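A minimal sketch of iterative least squares for (3), assuming the exponential form f(dose) = b0 exp(-b1 dose) and a Gauss-Newton update with step halving. The function names, starting values, and noise-free data are hypothetical; PROC NLIN offers several algorithms and this is not a reproduction of any of them.

```python
import math


def sse(b0, b1, doses, ys):
    """Error sum of squares for y = b0 * exp(-b1 * dose)."""
    return sum((y - b0 * math.exp(-b1 * d)) ** 2 for d, y in zip(doses, ys))


def gauss_newton_exponential(doses, ys, start, max_iter=100):
    """Iterative least squares via Gauss-Newton steps with step halving."""
    b0, b1 = start
    for _ in range(max_iter):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for d, y in zip(doses, ys):
            e = math.exp(-b1 * d)
            resid = y - b0 * e
            j0 = e            # derivative of f with respect to b0
            j1 = -b0 * d * e  # derivative of f with respect to b1
            a11 += j0 * j0
            a12 += j0 * j1
            a22 += j1 * j1
            g1 += j0 * resid
            g2 += j1 * resid
        # Solve (J'J) s = J'r for the step s.
        det = a11 * a22 - a12 ** 2
        s0 = (a22 * g1 - a12 * g2) / det
        s1 = (a11 * g2 - a12 * g1) / det
        # Halve the step while it would increase the error sum of squares.
        step = 1.0
        current = sse(b0, b1, doses, ys)
        while step > 1e-8 and sse(b0 + step * s0, b1 + step * s1, doses, ys) > current:
            step /= 2.0
        b0 += step * s0
        b1 += step * s1
        if abs(step * s0) + abs(step * s1) < 1e-12:
            break
    return b0, b1


# Hypothetical noise-free responses generated from b0 = 10, b1 = 0.3.
doses = list(range(11))
ys = [10.0 * math.exp(-0.3 * d) for d in doses]
b0_fit, b1_fit = gauss_newton_exponential(doses, ys, start=(8.0, 0.2))
print(b0_fit, b1_fit)
```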

• Nonlinear Regression - Iterative Least Squares

  • Limitations:

    • Assumes the data, $y_{ij}$, are continuous; they could be discrete.

    • The response distribution may not be Normal, i.e. the assumption $\epsilon_{ij} \sim N(\mu, \sigma^2)$ may not hold.

    • Standard errors and inference are asymptotic.

    • Treatment comparisons difficult in PROC NLIN; they require:

      • differential sums of squares, or

      • specialized SAS code; PROC IML.

• Generalized Nonlinear Model - Maximum Likelihood

  • Directly models the response as:

    $y_{ij} = f(dose_i) + \epsilon_{ij}$

    where $y_{ij}$ and $f(dose_i)$ are as defined above.

  • Estimation through maximum likelihood where the response distribution may take on many forms:

    Normal: $y_{ij} \sim N(\mu_i, \sigma)$,

    Binomial: $y_{ij} \sim bin(N, \pi_i)$,

    Poisson: $y_{ij} \sim poisson(\lambda_i)$, or

    in general: $y_{ij} \sim ƒ(\theta)$.

• Generalized Nonlinear Model - Maximum Likelihood

  • Maximize: $L(\theta) \propto \prod_{ij} ƒ(y_{ij} \mid \theta)$    (4)

  • Nonlinear estimation.

  • Response distribution not restricted to Normal.

  • May also incorporate random components into the model.

  • Treatment comparisons easier in SAS.

    • Contrast and estimate statements.

  • SAS: PROC NLMIXED.
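To illustrate how the same likelihood machinery in (4) accommodates different response distributions, the sketch below evaluates Normal, Binomial, and Poisson log-likelihoods for a given set of model means. The function name, the count data, the fixed value b0 = 20, and the crude grid search are all assumptions for illustration; PROC NLMIXED uses proper numerical optimization.

```python
import math


def log_likelihood(dist, ys, means, trials=None, sigma=1.0):
    """Joint log-likelihood of responses ys given model means f(dose_i)."""
    total = 0.0
    for idx, (y, mu) in enumerate(zip(ys, means)):
        if dist == "normal":
            total += -0.5 * math.log(2.0 * math.pi * sigma ** 2)
            total += -((y - mu) ** 2) / (2.0 * sigma ** 2)
        elif dist == "binomial":
            n = trials[idx]  # here mu is the success probability
            total += math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            total += y * math.log(mu) + (n - y) * math.log(1.0 - mu)
        elif dist == "poisson":
            total += y * math.log(mu) - mu - math.lgamma(y + 1)
        else:
            raise ValueError("unknown distribution: " + dist)
    return total


# Hypothetical counts with mean f(dose) = b0 * exp(-b1 * dose), b0 fixed at 20.
doses = [0, 1, 2, 3, 4, 5]
counts = [20, 15, 11, 8, 6, 4]


def poisson_fit(b1):
    means = [20.0 * math.exp(-b1 * d) for d in doses]
    return log_likelihood("poisson", counts, means)


# Crude grid search over b1 in place of a numerical optimizer.
grid = [k / 100.0 for k in range(1, 101)]
best_b1 = max(grid, key=poisson_fit)
print(best_b1)
```

Switching the response distribution only changes the `dist` argument; the dose-response function supplying the means is unchanged.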

• Generalized Nonlinear Model - Inference

  • Formulate a full dummy variable model encompassing $k$ treatments.

  • The joint likelihood over the $k$ treatments becomes:

    $L(\theta_k) \propto \prod_{ijk} ƒ(y_{ijk} \mid \theta_k)$    (5)

    where $y_{ijk}$ is the $j$th replication of the $i$th dose in the $k$th treatment and $\theta_k$ are the parameters of the $k$th treatment.

  • Comparison of parameter values is then possible through single and multiple degree of freedom contrasts.

• Generalized Nonlinear Model

• Limitations

• percentile solution may still be based on inversion or Fieller’s theorem.

• inferences based on normal theory approximations.

• standard errors and confidence intervals asymptotic.

• Bayesian Estimation - Iterative Numerical Techniques

  • Considers the probability of the parameters, $\theta$, given the data $y_{ij}$.

  • Using Bayes theorem, estimate:

    $p(\theta \mid y_{ij}) = \dfrac{p(y_{ij} \mid \theta) \, p(\theta)}{\int p(y_{ij} \mid \theta) \, p(\theta) \, d\theta}$    (6)

    where $p(\theta \mid y_{ij})$ is the posterior distribution of $\theta$ given the data $y_{ij}$, $p(y_{ij} \mid \theta)$ is the likelihood defined above, and $p(\theta)$ is a prior probability distribution for the parameters $\theta$.


• Nonlinear estimation.

• Percentiles can be found from the distribution of $\theta$.

• The likelihood is the same as in the Generalized Nonlinear Model.

  • Flexibility in the response distribution.

  • $f(dose_i)$ may be any continuous function of dose.

• Inherently allows updating of the estimation.

• Correct interval estimation (credible intervals).

  • Agrees well with GNLM at midrange percentiles.

  • Can perform better at extreme percentiles.

• SAS: PROC MCMC.
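A minimal random-walk Metropolis sketch shows how a posterior such as (6) can be sampled numerically. It assumes a two-parameter logistic curve with rate `beta` and median dose `gamma`, flat priors on (0, 10) for both, and hypothetical binomial counts; none of these choices come from the presentation, and PROC MCMC or WinBUGS would normally handle tuning and diagnostics.

```python
import math
import random


def log_posterior(beta, gamma, doses, successes, trials):
    """Binomial log-likelihood plus flat priors on (0, 10) for both parameters."""
    if not (0.0 < beta < 10.0 and 0.0 < gamma < 10.0):
        return -math.inf  # outside the assumed prior support
    total = 0.0
    for d, y, n in zip(doses, successes, trials):
        pi = 1.0 / (1.0 + math.exp(-beta * (d - gamma)))
        pi = min(max(pi, 1e-12), 1.0 - 1e-12)
        total += y * math.log(pi) + (n - y) * math.log(1.0 - pi)
    return total


def metropolis(doses, successes, trials, draws=20000, burn_in=2000, step=0.2, seed=1):
    """Random-walk Metropolis sampler returning (beta, gamma) draws."""
    rng = random.Random(seed)
    beta, gamma = 0.5, 3.0  # assumed starting values
    current = log_posterior(beta, gamma, doses, successes, trials)
    kept = []
    for iteration in range(burn_in + draws):
        cand_beta = beta + rng.gauss(0.0, step)
        cand_gamma = gamma + rng.gauss(0.0, step)
        candidate = log_posterior(cand_beta, cand_gamma, doses, successes, trials)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, gamma, current = cand_beta, cand_gamma, candidate
        if iteration >= burn_in:
            kept.append((beta, gamma))
    return kept


# Hypothetical counts out of 50, roughly matching beta = 1 and gamma = 4.
doses = [1, 2, 3, 4, 5, 6, 7]
trials = [50] * len(doses)
successes = [2, 6, 13, 25, 37, 44, 48]
samples = metropolis(doses, successes, trials)

# Posterior summaries for the median dose (LD50) read directly from the draws.
gammas = sorted(g for _, g in samples)
posterior_mean = sum(gammas) / len(gammas)
lower = gammas[int(0.025 * len(gammas))]
upper = gammas[int(0.975 * len(gammas))]
print(posterior_mean, lower, upper)
```

Because the percentile is itself a sampled quantity, the credible interval comes from the empirical quantiles of the draws rather than from inversion or a normal approximation.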

• Limitations

  • User must specify a prior probability $p(\theta)$.

  • Estimation requires custom programming.

    • SAS: PROC MCMC

    • Specialized software: WinBUGS

  • Computationally intensive solutions.

  • Requires statistical expertise.

• Sample programs and data are available at: http://www.uidaho.edu/ag/statprog


Calibration Methods

• Tolerance Distribution: Logistic

  • The response $y_{ij}/N_i$ at dose $i = 1$ to $k$, and replication $j = 1$ to $r$, is binomial with the proportion of success given by:

    $y_{ij}/N_i = M / (1 + \exp(-\beta (dose_i - \gamma)))$    (7)

    where $\beta$ is a rate related parameter and $\gamma$ is the $dose_i$ for which the proportion of success, $y_{ij}/N_i$, is $M/2$. $M$ is the theoretical maximum proportion attainable.

  • A convenient generalization of (7) will allow $\gamma$ to represent any dose at which $y_{ij}/N_i = Q$:

    $y_{ij}/N_i = M \cdot C / (C + \exp(-\beta (dose_i - \gamma)))$    (8)

    where the constant $C = Q/(M - Q)$. Note that, if $Q = M/2$, then $C = 1$ and equation (8) reverts to the standard form given in (7).

  • Equation (8), therefore, permits an unknown dose at a given response, $Q$, to be estimated through parameter $\gamma$.
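The reparameterization in (8) can be checked numerically. In the sketch below, `beta` and `gamma` stand for the rate parameter and the calibrated dose, and the parameter values are hypothetical; the point is only that the curve passes through the target response Q exactly at dose = gamma.

```python
import math


def calibration_curve(dose, m, beta, gamma, q):
    """Equation (8): logistic curve parameterized so that response = q at dose = gamma."""
    c = q / (m - q)
    return m * c / (c + math.exp(-beta * (dose - gamma)))


def standard_logistic(dose, m, beta, gamma):
    """Equation (7): response = m / 2 at dose = gamma."""
    return m / (1.0 + math.exp(-beta * (dose - gamma)))


# Hypothetical values: maximum 0.9, rate 1.2, calibrated dose 3.5, target response 0.2.
m, beta, gamma, q = 0.9, 1.2, 3.5, 0.2
at_gamma = calibration_curve(gamma, m, beta, gamma, q)  # equals q
half_max = calibration_curve(2.0, m, beta, gamma, m / 2.0)
standard = standard_logistic(2.0, m, beta, gamma)       # matches when q = m / 2
print(at_gamma, half_max, standard)
```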

• Maximum Likelihood

  • Given the binomial responses, $y_{ij}/N_i$, a joint likelihood may be defined as:

    $L(\pi_i \mid y_{ij}/N_i) \propto \prod_{ij} (\pi_i)^{y_{ij}} (1 - \pi_i)^{(N_i - y_{ij})}$    (9)

    where the binomial parameter, $\pi_i$, is defined by (8) and the associated parameters, $\theta = [M, \beta, \gamma]$, are estimated through maximization of (9). $N_i$ and $y_{ij}$ are the total number of trials and number of successes, respectively.

  • Inferences on $\gamma$ are carried out assuming $\gamma \sim N(\mu, \sigma)$.

  • SAS: PROC NLMIXED

• Bayesian: Parametric

  • A Bayesian posterior distribution for $\theta$ is given by:

    $pr(\theta \mid y_{ij}/N_i) \propto pr(y_{ij}/N_i \mid \theta) \cdot pr(\theta)$    (10)

    where $pr(y_{ij}/N_i \mid \theta)$ is the likelihood shown in (9) and $pr(\theta)$ is a prior distribution for the parameters $\theta = [M, \beta, \gamma]$. Estimation of $\theta$ is carried out through numerically intensive techniques such as MCMC (e.g. Price and Shafii 2005).

  • Inference on $\gamma$ is obtained through integration of (10) over the parameter space of $M$ and $\beta$.

• Bayesian: Nonparametric

  • This methodology was first proposed by Mukhopadhyay (2000) and followed by Kottas et al. (2002).

  • The technique considers the dose-response series as a multinomial process with parameters $P = [p_1, p_2, p_3, \ldots, p_k]$.

  • Assuming the responses, $y_{ij}/N_i$, are binomial, a likelihood can then be defined as:

    $L(P \mid y_{ij}/N_i) \propto \prod_{ij} (p_i)^{y_{ij}} (1 - p_i)^{(N_i - y_{ij})}$    (11)

  • If the random segments between true response rates, $p_i$, are distributed as a Dirichlet Process (DP), a joint prior distribution on the $p_i$ may then be defined by:

    $pr(P) \propto \prod_i (p_i - p_{i-1})^{(\alpha_i - 1)}$    (12)

    where $\alpha_i = \alpha \, \{ F_0(dose_i) - F_0(dose_{i-1}) \}$, $\alpha$ is a precision parameter, and $F_0$ is a base tolerance distribution.

  • The precision parameter, $\alpha$, reflects how closely the final estimation follows the base distribution. Low values indicate less correspondence, while larger values indicate a tighter association.

  • The base distribution, $F_0(\cdot)$, defines a family of tolerance distributions.

  • A posterior distribution for $P$ can then be defined by combining (11) and (12) as:

    $pr(P \mid y_{ij}/N_i) \propto \prod_{ij} (p_i)^{y_{ij}} (1 - p_i)^{(N_i - y_{ij})} \cdot \prod_i (p_i - p_{i-1})^{(\alpha_i - 1)}$    (13)

  • Estimation of this posterior is again carried out numerically using techniques such as MCMC.

  • Inference on an unknown dose, $\gamma$, at a known response $p_0 = y_0/N_0$, is obtained through sampling of the posterior given in (13).
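The role of the precision parameter in the prior (12) can be illustrated by drawing ordered response rates from a Dirichlet distribution on the segments between them. This sketch covers the prior only, not the posterior (13). It assumes a logistic base distribution F0 with hypothetical parameters, takes p_0 = 0 and p_{k+1} = 1 as boundary conventions, and uses `alpha` for the precision; all function names are made up for the example.

```python
import math
import random


def base_cdf(dose, beta=1.0, gamma=4.0):
    """Hypothetical base tolerance distribution F0: a logistic CDF."""
    return 1.0 / (1.0 + math.exp(-beta * (dose - gamma)))


def dirichlet_parameters(doses, alpha):
    """alpha * {F0(dose_i) - F0(dose_{i-1})}, with F0 = 0 before the first dose and 1 after the last."""
    cdf = [0.0] + [base_cdf(d) for d in doses] + [1.0]
    return [alpha * (cdf[i] - cdf[i - 1]) for i in range(1, len(cdf))]


def sample_prior(doses, alpha, rng):
    """Draw ordered response rates p_1 <= ... <= p_k from the prior."""
    params = dirichlet_parameters(doses, alpha)
    # A Dirichlet draw is a set of independent gamma draws divided by their sum.
    gammas = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(gammas)
    rates, running = [], 0.0
    for g in gammas[:-1]:
        running += g / total
        rates.append(running)
    return rates


def mean_abs_deviation(doses, alpha, rng, draws=200):
    """Average distance between sampled rates and the base distribution."""
    total = 0.0
    for _ in range(draws):
        rates = sample_prior(doses, alpha, rng)
        total += sum(abs(p - base_cdf(d)) for p, d in zip(rates, doses)) / len(doses)
    return total / draws


doses = [1, 2, 3, 4, 5, 6, 7]
rng = random.Random(7)
loose = mean_abs_deviation(doses, alpha=1.0, rng=rng)
tight = mean_abs_deviation(doses, alpha=500.0, rng=rng)
print(loose, tight)
```

Larger precision values pull the sampled curves toward F0, while small values let them wander, which is the behavior the precision parameter is described as controlling.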

Concluding Remarks

• Dose-response models have wide application in agriculture.

• Probit models of estimation are limited in scope.

• Generalized nonlinear and Bayesian models provide the most flexible framework for dose-response estimation.

  • Can use various response distributions.

  • Can use various dose-response models.

  • Can incorporate random model effects.

  • Can be used to compare treatments.

    • GNLM: full dummy variable modeling.

    • Bayesian methods: probability statements.

  • They are useful for quantifying the relative efficacy of treatments.

• Bayesian estimation is preferred when estimating extreme percentiles.

• Generalized nonlinear models sufficient in most situations.

• Methodology proposed here uses a base tolerance distribution.

  • Should be used and interpreted with caution.

  • Standard model assessment techniques still apply.

  • Introduces more uncertainty into the estimation situation.

Concluding Remarks (cont)

• Bioassay is an important part of dose-response analysis.

• Determining an unknown dose can be problematic for some parametric functional forms.

• Dose estimation fits naturally in a Bayesian framework.

• Some dose-response data may not follow typical sigmoidal patterns.

References

Bliss, C. I. 1934. The method of probits. Science 79(2037): 38-39.

Bliss, C. I. 1938. The determination of dosage-mortality curves from small numbers. Quart. J. Pharm. 11: 192-216.

Berkson, J. 1944. Application of the Logistic function to bio-assay. J. Amer. Stat. Assoc. 39: 357-65.

Fieller, E. C. 1944. A fundamental formula in the statistics of biological assay and some applications. Quart. J. Pharm. 17: 117-23.

Finney, D. J. 1971. Probit Analysis. Cambridge University Press, London.

Fisher, R. A. 1935. Appendix to Bliss, C. I.: The case of zero survivors. Ann. Appl. Biol. 22: 164-5.

SAS Inst. Inc. 2004. SAS OnlineDoc, Version 9. Cary, NC.

Seefeldt, S. S., J. E. Jensen, and P. Fuerst. 1995. Log-logistic analysis of herbicide dose-response relationships. Weed Technol. 9: 218-227.

Kottas, A., M. D. Branco, and A. E. Gelfand. 2002. A Nonparametric Bayesian Modeling Approach for Cytogenetic Dosimetry. Biometrics 58: 593-600.

Mukhopadhyay, S. 2000. Bayesian Nonparametric Inference on the Dose Level with Specified Response Rate. Biometrics 56: 220-226.

Price, W. J. and B. Shafii. 2005. Bayesian Analysis of Dose-response Calibration Curves. Proceedings of the Seventeenth Annual Kansas State University Conference on Applied Statistics in Agriculture [CDROM], April 25-27, 2005. Manhattan, Kansas.

Shafii, B. and W. J. Price. 2001. Estimation of cardinal temperatures in germination data analysis. Journal of Agricultural, Biological and Environmental Statistics 6(3): 356-366.

Shafii, B. and W. J. Price. 2006. Bayesian approaches to dose-response calibration models. Abstract: Proceedings of the XXIII International Biometrics Conference [CDROM], July 16-21, 2006. Montreal, Quebec, Canada.

Shafii, B., W. J. Price, D. L. Barney, and O. A. Lopez. 2009. Effects of stratification and cold storage on the seed germination characteristics of cascade huckleberry and oval-leaved bilberry. Acta Hort. 810: 599-608.

Questions / Comments
