Simulation r

Embed Size (px)

Citation preview

  • 7/29/2019 Simulation r

    1/128

    Outline Introduction Distributions Current Design

    Software for Distributions in R

    David Scott1 Diethelm Wurtz2 Christine Dong1

    1Department of Statistics

    The University of Auckland

    2Institut fur Theoretische Physik

    ETH Zurich

    June 30, 2009

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    http://find/http://goback/
  • 7/29/2019 Simulation r

    2/128

    Outline Introduction Distributions Current Design

    Outline

    1 Introduction

    2 Current Software for DistributionsBase RContributed Packages

    3 Implementation in Base R

    4 Design for Distribution Implementation

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    http://find/
  • 7/29/2019 Simulation r

    3/128

    Outline Introduction Distributions Current Design

    Outline

    1 Introduction

    2 Current Software for DistributionsBase RContributed Packages

    3 Implementation in Base R

    4 Design for Distribution Implementation

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    http://find/
  • 7/29/2019 Simulation r

    4/128

    Outline Introduction Distributions Current Design

    Outline

    1 Introduction

    2 Current Software for DistributionsBase R

    Contributed Packages

    3 Implementation in Base R

    4 Design for Distribution Implementation

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    http://find/
  • 7/29/2019 Simulation r

    5/128

    Outline Introduction Distributions Current Design

    Outline

    1 Introduction

    2 Current Software for DistributionsBase R

    Contributed Packages

    3 Implementation in Base R

    4 Design for Distribution Implementation

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    O li I d i Di ib i C D i

    http://find/
  • 7/29/2019 Simulation r

    6/128

    Outline Introduction Distributions Current Design

    Outline

    1 Introduction

    2 Current Software for DistributionsBase R

    Contributed Packages

    3 Implementation in Base R

    4 Design for Distribution Implementation

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    O tli I t d ti Di t ib ti C t D i

    http://find/http://goback/
  • 7/29/2019 Simulation r

    7/128

    Outline Introduction Distributions Current Design

    Outline

    1 Introduction

    2 Current Software for DistributionsBase R

    Contributed Packages

    3 Implementation in Base R

    4 Design for Distribution Implementation

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/http://goback/
  • 7/29/2019 Simulation r

    8/128

    Outline Introduction Distributions Current Design

    1 Introduction

    2 Current Software for Distributions

    3 Implementation in Base R

    4 Design for Distribution Implementation

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/http://goback/
  • 7/29/2019 Simulation r

    9/128

    Outline Introduction Distributions Current Design

    Distributions

    Distributions are how we model uncertainty

    There is well-established theory concerning distributions

    There are standard approaches for fitting distributions

    There are many distributions which have been found to be ofinterest

    Software implementation of distributions is a well-definedsubject in comparision to say modelling of time-series

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/
  • 7/29/2019 Simulation r

    10/128

    Outline Introduction Distributions Current Design

    Distributions

    Distributions are how we model uncertainty

    There is well-established theory concerning distributionsThere are standard approaches for fitting distributions

    There are many distributions which have been found to be ofinterest

    Software implementation of distributions is a well-definedsubject in comparision to say modelling of time-series

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/
  • 7/29/2019 Simulation r

    11/128

    Outline Introduction Distributions Current Design

    Distributions

    Distributions are how we model uncertainty

    There is well-established theory concerning distributionsThere are standard approaches for fitting distributions

    There are many distributions which have been found to be ofinterest

    Software implementation of distributions is a well-definedsubject in comparision to say modelling of time-series

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/http://goback/
  • 7/29/2019 Simulation r

    12/128

    g

    Distributions

    Distributions are how we model uncertainty

    There is well-established theory concerning distributionsThere are standard approaches for fitting distributions

    There are many distributions which have been found to be ofinterest

    Software implementation of distributions is a well-definedsubject in comparision to say modelling of time-series

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/
  • 7/29/2019 Simulation r

    13/128

    g

    Distributions

    Distributions are how we model uncertainty

    There is well-established theory concerning distributionsThere are standard approaches for fitting distributions

    There are many distributions which have been found to be ofinterest

    Software implementation of distributions is a well-definedsubject in comparision to say modelling of time-series

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/
  • 7/29/2019 Simulation r

    14/128

    Introduction

    In base R there are 20 distributions implemented, at least in

    partAll univariateconsider univariate distributions only

    Numerous other distributions have been implemented in R

    CRAN packages solely devoted to one or more distributionsCRAN packages which implement distributions incidentally(e.g. VGAM)implementations of distributions not on CRAN, e.g JimLindseys work

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/http://goback/
  • 7/29/2019 Simulation r

    15/128

    Introduction

    In base R there are 20 distributions implemented, at least in

    partAll univariateconsider univariate distributions only

    Numerous other distributions have been implemented in R

    CRAN packages solely devoted to one or more distributionsCRAN packages which implement distributions incidentally(e.g. VGAM)implementations of distributions not on CRAN, e.g JimLindseys work

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/
  • 7/29/2019 Simulation r

    16/128

    Introduction

    In base R there are 20 distributions implemented, at least in

    partAll univariateconsider univariate distributions only

    Numerous other distributions have been implemented in R

    CRAN packages solely devoted to one or more distributionsCRAN packages which implement distributions incidentally

    (e.g. VGAM)implementations of distributions not on CRAN, e.g JimLindseys work

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/
  • 7/29/2019 Simulation r

    17/128

    Introduction

    In base R there are 20 distributions implemented, at least in

    partAll univariateconsider univariate distributions only

    Numerous other distributions have been implemented in R

    CRAN packages solely devoted to one or more distributionsCRAN packages which implement distributions incidentally

    (e.g. VGAM)implementations of distributions not on CRAN, e.g JimLindseys work

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/http://goback/
  • 7/29/2019 Simulation r

    18/128

    Introduction

    In base R there are 20 distributions implemented, at least in

    partAll univariateconsider univariate distributions only

    Numerous other distributions have been implemented in R

    CRAN packages solely devoted to one or more distributionsCRAN packages which implement distributions incidentally

    (e.g. VGAM)implementations of distributions not on CRAN, e.g JimLindseys work

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/http://goback/
  • 7/29/2019 Simulation r

    19/128

    Introduction

    In base R there are 20 distributions implemented, at least in

    partAll univariateconsider univariate distributions only

    Numerous other distributions have been implemented in R

    CRAN packages solely devoted to one or more distributionsCRAN packages which implement distributions incidentally

    (e.g. VGAM)implementations of distributions not on CRAN, e.g JimLindseys work

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/
  • 7/29/2019 Simulation r

    20/128

    Introduction

    There are overlaps in coverage of distributions in R

    Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality

    return structuresIt is useful to discuss some standardization of implementationof software for distributions

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/
  • 7/29/2019 Simulation r

    21/128

    Introduction

    There are overlaps in coverage of distributions in R

    Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality

    return structuresIt is useful to discuss some standardization of implementationof software for distributions

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/http://goback/
  • 7/29/2019 Simulation r

    22/128

    Introduction

    There are overlaps in coverage of distributions in R

    Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality

    return structuresIt is useful to discuss some standardization of implementationof software for distributions

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/
  • 7/29/2019 Simulation r

    23/128

    Introduction

    There are overlaps in coverage of distributions in R

    Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality

    return structuresIt is useful to discuss some standardization of implementationof software for distributions

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    I d i

    http://find/http://goback/
  • 7/29/2019 Simulation r

    24/128

    Introduction

    There are overlaps in coverage of distributions in R

    Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality

    return structuresIt is useful to discuss some standardization of implementationof software for distributions

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    I d i

    http://find/http://goback/
  • 7/29/2019 Simulation r

    25/128

    Introduction

    There are overlaps in coverage of distributions in R

    Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality

    return structuresIt is useful to discuss some standardization of implementationof software for distributions

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    I t d ti

    http://find/http://goback/
  • 7/29/2019 Simulation r

    26/128

    Introduction

    There are overlaps in coverage of distributions in R

    Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality

    return structuresIt is useful to discuss some standardization of implementationof software for distributions

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    I t d ti

    http://find/http://goback/
  • 7/29/2019 Simulation r

    27/128

    Introduction

    There are overlaps in coverage of distributions in R

    Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality

    return structuresIt is useful to discuss some standardization of implementationof software for distributions

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    St d di ti

    http://find/
  • 7/29/2019 Simulation r

    28/128

    Standardization

    There are benefits to a standardized approach

    easier for userseasier for developersfewer errors

    Deserves thought, even if not prescriptive

    Perhaps too late!

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Standardization

    http://find/http://goback/
  • 7/29/2019 Simulation r

    29/128

    Standardization

    There are benefits to a standardized approach

    easier for userseasier for developersfewer errors

    Deserves thought, even if not prescriptive

    Perhaps too late!

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Standardization

    http://find/
  • 7/29/2019 Simulation r

    30/128

    Standardization

    There are benefits to a standardized approach

    easier for userseasier for developersfewer errors

    Deserves thought, even if not prescriptive

    Perhaps too late!

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Standardization

    http://find/
  • 7/29/2019 Simulation r

    31/128

    Standardization

    There are benefits to a standardized approach

    easier for userseasier for developersfewer errors

    Deserves thought, even if not prescriptive

    Perhaps too late!

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Standardization

    http://find/http://goback/
  • 7/29/2019 Simulation r

    32/128

    Standardization

    There are benefits to a standardized approach

    easier for userseasier for developersfewer errors

    Deserves thought, even if not prescriptive

    Perhaps too late!

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Standardization

    http://find/
  • 7/29/2019 Simulation r

    33/128

    Standardization

    There are benefits to a standardized approach

    easier for userseasier for developersfewer errors

    Deserves thought, even if not prescriptive

    Perhaps too late!

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design Base R Contributed

    http://find/
  • 7/29/2019 Simulation r

    34/128

    1 Introduction

    2 Current Software for Distributions

    3 Implementation in Base R

    4 Design for Distribution Implementation

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design Base R Contributed

    Distributions in Base R

    http://find/http://goback/
  • 7/29/2019 Simulation r

    35/128

    Distributions in Base R

    Implementation in R is essentially the provision ofdpqr

    functions: the density (or probability) function, distributionfunction, quantile or inverse distribution function and randomnumber generation

    The distributions are the binomial (binom), geometric (geom),hypergeometric (hyper), negative binomial (nbinom), Poisson

    (pois), Wilcoxon signed rank statistic (signrank), Wilcoxonrank sum statistic (wilcox), beta (beta), Cauchy (Cauchy),non-central chi-squared (chisq), exponential (exp),F (f),gamma (gamma), log-normal (lnorm), logistic (logis),normal (norm), t (t), uniform (unif), Weibull (weibull),and Tukey studentized range (tukey) for which only the pand q functions are implemented

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design Base R Contributed

    Distributions in Base R

    http://find/
  • 7/29/2019 Simulation r

    36/128

    Distributions in Base R

    Implementation in R is essentially the provision ofdpqr

    functions: the density (or probability) function, distributionfunction, quantile or inverse distribution function and randomnumber generation

    The distributions are the binomial (binom), geometric (geom),hypergeometric (hyper), negative binomial (nbinom), Poisson

    (pois), Wilcoxon signed rank statistic (signrank), Wilcoxonrank sum statistic (wilcox), beta (beta), Cauchy (Cauchy),non-central chi-squared (chisq), exponential (exp),F (f),gamma (gamma), log-normal (lnorm), logistic (logis),normal (norm), t (t), uniform (unif), Weibull (weibull),and Tukey studentized range (tukey) for which only the pand q functions are implemented

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design Base R Contributed

    Constributed Packages

    http://find/http://goback/
  • 7/29/2019 Simulation r

    37/128

    Constributed Packages

    Many distributions are implemented in R

    The following list is not completesee the task viewhttp://cran.r-project.org/web/views/Distributions.html

    Primarily packages which deal with a particular distribution orset of related distributions

    Some packages not on CRAN are nonetheless available

    I know of other implementations not on CRAN and notavailable

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design Base R Contributed

    Constributed Packages

    http://cran.r-project.org/web/views/Distributions.htmlhttp://cran.r-project.org/web/views/Distributions.htmlhttp://find/http://goback/
  • 7/29/2019 Simulation r

    38/128

    C b g

    Many distributions are implemented in R

    The following list is not completesee the task viewhttp://cran.r-project.org/web/views/Distributions.html

    Primarily packages which deal with a particular distribution orset of related distributions

    Some packages not on CRAN are nonetheless available

    I know of other implementations not on CRAN and notavailable

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design Base R Contributed

    Constributed Packages

    http://cran.r-project.org/web/views/Distributions.htmlhttp://cran.r-project.org/web/views/Distributions.htmlhttp://find/
  • 7/29/2019 Simulation r

    39/128

    g

    Many distributions are implemented in R

    The following list is not completesee the task viewhttp://cran.r-project.org/web/views/Distributions.html

    Primarily packages which deal with a particular distribution orset of related distributions

    Some packages not on CRAN are nonetheless available

    I know of other implementations not on CRAN and notavailable

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design Base R Contributed

    Constributed Packages

    http://cran.r-project.org/web/views/Distributions.htmlhttp://cran.r-project.org/web/views/Distributions.htmlhttp://find/http://goback/
  • 7/29/2019 Simulation r

    40/128

    g

    Many distributions are implemented in R

    The following list is not completesee the task viewhttp://cran.r-project.org/web/views/Distributions.html

    Primarily packages which deal with a particular distribution orset of related distributions

    Some packages not on CRAN are nonetheless available

    I know of other implementations not on CRAN and notavailable

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design Base R Contributed

    Constributed Packages

    http://cran.r-project.org/web/views/Distributions.htmlhttp://cran.r-project.org/web/views/Distributions.htmlhttp://find/http://goback/
  • 7/29/2019 Simulation r

    41/128

    g

    Many distributions are implemented in R

    The following list is not completesee the task viewhttp://cran.r-project.org/web/views/Distributions.html

    Primarily packages which deal with a particular distribution orset of related distributions

    Some packages not on CRAN are nonetheless available

    I know of other implementations not on CRAN and notavailable

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design Base R Contributed

    Packages

    http://cran.r-project.org/web/views/Distributions.htmlhttp://cran.r-project.org/web/views/Distributions.htmlhttp://find/
  • 7/29/2019 Simulation r

    42/128

    Package Name Description

    actuar Collection of functions and data sets related toactuarial science applications, including loss distri-butions

    bs Package for the Birnbaum-Saunders distributionevd Functions for extreme value distributions

    Davies The Davies quantile functionevdbayes Bayesian analysis in extreme value theoryevir Extreme values in RexactRankTests Exact distributions for rank and permutation testsextRemes Extreme value toolkit

    ghyp A package on generalized hyperbolic distributionsgld Estimation and use of the generalised (Tukey)

    lambda distributionHyperbolicDist The hyperbolic distribution

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design Base R Contributed

    Packages

    http://find/
  • 7/29/2019 Simulation r

    43/128

    Package Name Description

    ig Package for the robust and classical versions of theinverse Gaussian distribution

    ismev An introduction to statistical modeling of extremevalues

    lmomco L-moments, trimmed L-moments, L-comoments,

    and many distributionsLmoments L-moments and quantile mixturesmnormt The multivariate normal and t distributionsmvtnorm Multivariate normal and t distributionnormalp Package for exponential power distributions (EPD)

    POT Generalized Pareto distribution and peaks overthreshold

    Rmetrics An environment for teaching financial engineeringand computational finance with R

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design Base R Contributed

    Packages

    http://goforward/http://find/
  • 7/29/2019 Simulation r

    44/128

    Package Name Description

    skewt The skewed Student-t distribution

    sn The skew-normal and skew-t distributionsSuppDists Supplementary distributionstdist Distribution of a linear combination of independent

    Students t-variables

    triangle Provides the standard distribution functions for thetriangle distributionVarianceGamma The variance gamma distribution

    There are a number of packages which have implementationsof a number of distributions

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design Base R Contributed

    Packages

    http://find/http://goback/
  • 7/29/2019 Simulation r

    45/128

    Package Name Distributions

    actuar Burr, inverse Burr, generalized beta, gen-

    eralized Pareto, inverse exponential, in-verse gamma, inverse paralogistic, inversePareto, inverse transformed gamma, inverseWeibull, log-gamma, log-logistic, paralogis-

    tic, Pareto, transformed gammafBasics (Rmetrics) Skew-normal, skew-t, generalized hyper-bolic, hyperbolic, normal inverse Gaussian,generalized hyperbolic Student-t, stable

    fExtremes (Rmetrics) Generalized extreme value, generalize Pareto

    fGarch (Rmetrics) Generalized exponential, double exponentialfOptions (Rmetrics) Johnson Type IV, reciprocal gamma

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design Base R Contributed

    Packages

    http://find/http://goback/
  • 7/29/2019 Simulation r

    46/128

    Package Name Distributions

    HyperbolicDist generalized hyperbolic, hyperbolic, general-

    ized inverse Gaussian, skew-LaplaceQRMlib generalized extreme value, generalized

    hyperbolic, generalized Pareto, Gumbel,probit-normal

    SuppDists Friedman chi-squared, Johnson system(types IIV), Kendalls tau, Kruskal-Wallis,normal scores, generalized hypergeometric,inverse Gaussian

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design Base R Contributed

    Packages by Jim Lindsey

    http://find/
  • 7/29/2019 Simulation r

    47/128

    Jim Lindseys packages include multiple distributions also

    Package Name Distributionsrmutil beta-binomial, Box-Cox, Burr, Consul, dou-

    ble binomial, double Poisson, gammacount, generalized extreme value, general-ized gamma, generalized inverse Gaussian,

    generalized logistic, generalized Weibull,Hjorth, Laplace, Levy, multiplicative bino-mial, multiplicative Poisson, Pareto, powerexponential, power variance function Pois-son, simplex, skew-Laplace, two-sided power

    stable stable

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/
  • 7/29/2019 Simulation r

    48/128

    1 Introduction

    2 Current Software for Distributions

    3 Implementation in Base R

    4 Design for Distribution Implementation

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    dpqr Functions

    http://find/http://goback/
  • 7/29/2019 Simulation r

    49/128

    Any experienced R user will be aware of the namingconventions for the density, cumulative distribution, quantileand random number generation functions for the base Rdistributions

    The argument lists for the dpqr functions are standard

    First argument is x, p, qand n for respectively a vector of

    quantiles, a vector of quantiles, a vector of probabilities, andthe sample size

    rwilcox is an exception using nn because n is a parameter

    Subsequent arguments give the parameters

    The gamma distribution is unusual, with argument listshape, rate =1, scale = 1/rate

    This mechanism allows the user to specify the secondparameter as either the scale or the rate

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    dpqr Functions

    http://find/http://goback/
  • 7/29/2019 Simulation r

    50/128

    Any experienced R user will be aware of the namingconventions for the density, cumulative distribution, quantileand random number generation functions for the base Rdistributions

    The argument lists for the dpqr functions are standard

    First argument is x, p, qand n for respectively a vector of

    quantiles, a vector of quantiles, a vector of probabilities, andthe sample size

    rwilcox is an exception using nn because n is a parameter

    Subsequent arguments give the parameters

    The gamma distribution is unusual, with argument listshape, rate =1, scale = 1/rate

    This mechanism allows the user to specify the secondparameter as either the scale or the rate

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    dpqr Functions

    http://find/
  • 7/29/2019 Simulation r

    51/128

    Any experienced R user will be aware of the namingconventions for the density, cumulative distribution, quantileand random number generation functions for the base Rdistributions

    The argument lists for the dpqr functions are standard

    First argument is x, p, qand n for respectively a vector of

    quantiles, a vector of quantiles, a vector of probabilities, andthe sample size

    rwilcox is an exception using nn because n is a parameter

    Subsequent arguments give the parameters

    The gamma distribution is unusual, with argument listshape, rate =1, scale = 1/rate

    This mechanism allows the user to specify the secondparameter as either the scale or the rate

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    dpqr Functions

    http://find/http://goback/
  • 7/29/2019 Simulation r

    52/128

    Any experienced R user will be aware of the namingconventions for the density, cumulative distribution, quantileand random number generation functions for the base Rdistributions

    The argument lists for the dpqr functions are standard

    First argument is x, p, qand n for respectively a vector of

    quantiles, a vector of quantiles, a vector of probabilities, andthe sample size

    rwilcox is an exception using nn because n is a parameter

    Subsequent arguments give the parameters

    The gamma distribution is unusual, with argument listshape, rate =1, scale = 1/rate

    This mechanism allows the user to specify the secondparameter as either the scale or the rate

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    dpqr Functions

    http://find/
  • 7/29/2019 Simulation r

    53/128

    Any experienced R user will be aware of the namingconventions for the density, cumulative distribution, quantileand random number generation functions for the base Rdistributions

    The argument lists for the dpqr functions are standard

    First argument is x, p, qand n for respectively a vector of

    quantiles, a vector of quantiles, a vector of probabilities, andthe sample size

    rwilcox is an exception using nn because n is a parameter

    Subsequent arguments give the parameters

    The gamma distribution is unusual, with argument listshape, rate =1, scale = 1/rate

    This mechanism allows the user to specify the secondparameter as either the scale or the rate

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    dpqr Functions

    http://find/http://goback/
  • 7/29/2019 Simulation r

    54/128

    Any experienced R user will be aware of the namingconventions for the density, cumulative distribution, quantileand random number generation functions for the base Rdistributions

    The argument lists for the dpqr functions are standard

    First argument is x, p, qand n for respectively a vector of

    quantiles, a vector of quantiles, a vector of probabilities, andthe sample size

    rwilcox is an exception using nn because n is a parameter

    Subsequent arguments give the parameters

    The gamma distribution is unusual, with argument listshape, rate =1, scale = 1/rate

    This mechanism allows the user to specify the secondparameter as either the scale or the rate

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    dpqr Functions

    http://find/
  • 7/29/2019 Simulation r

    55/128

    Any experienced R user will be aware of the namingconventions for the density, cumulative distribution, quantileand random number generation functions for the base Rdistributions

    The argument lists for the dpqr functions are standard

    First argument is x, p, qand n for respectively a vector of

    quantiles, a vector of quantiles, a vector of probabilities, andthe sample size

    rwilcox is an exception using nn because n is a parameter

    Subsequent arguments give the parameters

    The gamma distribution is unusual, with argument listshape, rate =1, scale = 1/rate

    This mechanism allows the user to specify the secondparameter as either the scale or the rate

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    dpqr Functions

    http://find/http://goback/
  • 7/29/2019 Simulation r

    56/128

    Other arguments differ among the dpqr functions

    The d functions take the argument log, the p and q functionsthe argument log.p

    These allow the extension of the range of accuratecomputation for these quantities

    The p and q functions have the argument lower.tail

    The dpqr functions are coded in C and may be found in thesource software tree at /src/math/

    They are in large part due to Ross Ihaka and Catherine Loader

    Martin Machler is now responsible for on-going maintenance

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    dpqr Functions

    http://find/http://goback/
  • 7/29/2019 Simulation r

    57/128

    Other arguments differ among the dpqr functions

    The d functions take the argument log, the p and q functionsthe argument log.p

    These allow the extension of the range of accuratecomputation for these quantities

    The p and q functions have the argument lower.tail

    The dpqr functions are coded in C and may be found in thesource software tree at /src/math/

    They are in large part due to Ross Ihaka and Catherine Loader

    Martin Machler is now responsible for on-going maintenance

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    dpqr Functions

    http://find/http://goback/
  • 7/29/2019 Simulation r

    58/128

    Other arguments differ among the dpqr functions

    The d functions take the argumentlog

    , the p and q functionsthe argument log.p

    These allow the extension of the range of accuratecomputation for these quantities

    The p and q functions have the argument lower.tail

    The dpqr functions are coded in C and may be found in thesource software tree at /src/math/

    They are in large part due to Ross Ihaka and Catherine Loader

    Martin Machler is now responsible for on-going maintenance

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    dpqr Functions

    http://find/http://goback/
  • 7/29/2019 Simulation r

    59/128

    Other arguments differ among the dpqr functions

    The d functions take the argumentlog

    , the p and q functionsthe argument log.p

    These allow the extension of the range of accuratecomputation for these quantities

    The p and q functions have the argument lower.tail

    The dpqr functions are coded in C and may be found in thesource software tree at /src/math/

    They are in large part due to Ross Ihaka and Catherine Loader

    Martin Machler is now responsible for on-going maintenance

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    dpqr Functions

    http://find/
  • 7/29/2019 Simulation r

    60/128

    Other arguments differ among the dpqr functions

    The d functions take the argument log, the p and q functionsthe argument log.p

    These allow the extension of the range of accuratecomputation for these quantities

    The p and q functions have the argument lower.tail

    The dpqr functions are coded in C and may be found in thesource software tree at /src/math/

    They are in large part due to Ross Ihaka and Catherine Loader

    Martin Machler is now responsible for on-going maintenance

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    dpqr Functions

    http://find/http://goback/
  • 7/29/2019 Simulation r

    61/128

    Other arguments differ among the dpqr functions

    The d functions take the argument log, the p and q functionsthe argument log.p

    These allow the extension of the range of accuratecomputation for these quantities

    The p and q functions have the argument lower.tail

    The dpqr functions are coded in C and may be found in thesource software tree at /src/math/

    They are in large part due to Ross Ihaka and Catherine Loader

    Martin Machler is now responsible for on-going maintenance

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    dpqr Functions

    http://find/
  • 7/29/2019 Simulation r

    62/128

    Other arguments differ among the dpqr functions

    The d functions take the argument log, the p and q functionsthe argument log.p

    These allow the extension of the range of accuratecomputation for these quantities

    The p and q functions have the argument lower.tail

    The dpqr functions are coded in C and may be found in thesource software tree at /src/math/

    They are in large part due to Ross Ihaka and Catherine Loader

    Martin Machler is now responsible for on-going maintenance

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Testing and Validation

    http://find/
  • 7/29/2019 Simulation r

    63/128

    The algorithms used in the dpqr functions are well-establishedalgorithms taken from a substantial scientific literature

    There are also tests performed, found in the directory testsin two files d-p-q-r-tests.R and p-r-random-tests.R

    Tests in d-p-q-r-tests.R are inversion tests which checkthat qdist(pdist(x))=x for values x generated by rdist

    There are tests relying on special distribution relationships,and tests using extreme values of parameters or arguments

    For discrete distributions equality ofcumsum(ddist(.))=pdist(.)

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Testing and Validation

    http://find/
  • 7/29/2019 Simulation r

    64/128

    The algorithms used in the dpqr functions are well-establishedalgorithms taken from a substantial scientific literature

    There are also tests performed, found in the directory testsin two files d-p-q-r-tests.R and p-r-random-tests.R

    Tests in d-p-q-r-tests.R are inversion tests which checkthat qdist(pdist(x))=x for values x generated by rdist

    There are tests relying on special distribution relationships,and tests using extreme values of parameters or arguments

    For discrete distributions equality ofcumsum(ddist(.))=pdist(.)

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Testing and Validation

    http://find/http://goback/
  • 7/29/2019 Simulation r

    65/128

    The algorithms used in the dpqr functions are well-establishedalgorithms taken from a substantial scientific literature

    There are also tests performed, found in the directory testsin two files d-p-q-r-tests.R and p-r-random-tests.R

    Tests in d-p-q-r-tests.R are inversion tests which checkthat qdist(pdist(x))=x for values x generated by rdist

    There are tests relying on special distribution relationships,and tests using extreme values of parameters or arguments

    For discrete distributions equality ofcumsum(ddist(.))=pdist(.)

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Testing and Validation

    http://find/http://goback/
  • 7/29/2019 Simulation r

    66/128

    The algorithms used in the dpqr functions are well-establishedalgorithms taken from a substantial scientific literature

    There are also tests performed, found in the directory testsin two files d-p-q-r-tests.R and p-r-random-tests.R

    Tests in d-p-q-r-tests.R are inversion tests which checkthat qdist(pdist(x))=x for values x generated by rdist

    There are tests relying on special distribution relationships,and tests using extreme values of parameters or arguments

    For discrete distributions equality ofcumsum(ddist(.))=pdist(.)

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Testing and Validation

    http://find/http://goback/
  • 7/29/2019 Simulation r

    67/128

    The algorithms used in the dpqr functions are well-establishedalgorithms taken from a substantial scientific literature

    There are also tests performed, found in the directory testsin two files d-p-q-r-tests.R and p-r-random-tests.R

    Tests in d-p-q-r-tests.R are inversion tests which checkthat qdist(pdist(x))=x for values x generated by rdist

    There are tests relying on special distribution relationships,and tests using extreme values of parameters or arguments

    For discrete distributions equality ofcumsum(ddist(.))=pdist(.)

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Testing and Validation

    http://find/http://goback/
  • 7/29/2019 Simulation r

    68/128

    Tests in p-r-random-tests.R are based on an inequality ofMassart:

    Pr

    supx

    |Fn(x) F(x)| >

    2 exp(2n2)

    where Fn is the empirical distribution function for a

    This is a version of the Dvoretsky-Kiefer-Wolfowitz inequalitywith the best possible constant, namely the leading 2 in theright hand side of the inequality

    The inequality above is true for all distribution functions, for

    all n and

    Distributions are tested by generating a sample of size 10,000and comparing the difference between the empiricaldistribution function and distribution function

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Testing and Validation

    b d l f

    http://find/http://goback/
  • 7/29/2019 Simulation r

    69/128

    Tests in p-r-random-tests.R are based on an inequality ofMassart:

    Pr

    supx

    |Fn(x) F(x)| >

    2 exp(2n2)

    where Fn is the empirical distribution function for a

    This is a version of the Dvoretsky-Kiefer-Wolfowitz inequalitywith the best possible constant, namely the leading 2 in theright hand side of the inequality

    The inequality above is true for all distribution functions, for

    all n and

    Distributions are tested by generating a sample of size 10,000and comparing the difference between the empiricaldistribution function and distribution function

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Testing and Validation

    T i b d i li f

    http://find/http://goback/
  • 7/29/2019 Simulation r

    70/128

    Tests in p-r-random-tests.R are based on an inequality ofMassart:

    Pr

    supx

    |Fn(x) F(x)| >

    2 exp(2n2)

    where Fn is the empirical distribution function for a

    This is a version of the Dvoretsky-Kiefer-Wolfowitz inequalitywith the best possible constant, namely the leading 2 in theright hand side of the inequality

    The inequality above is true for all distribution functions, for

    all n and

    Distributions are tested by generating a sample of size 10,000and comparing the difference between the empiricaldistribution function and distribution function

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Testing and Validation

    T t i d R b d i lit f

    http://find/http://goback/
  • 7/29/2019 Simulation r

    71/128

    Tests in p-r-random-tests.R are based on an inequality ofMassart:

    Pr

    supx

    |Fn(x) F(x)| >

    2 exp(2n2)

    where Fn is the empirical distribution function for a

    This is a version of the Dvoretsky-Kiefer-Wolfowitz inequalitywith the best possible constant, namely the leading 2 in theright hand side of the inequality

    The inequality above is true for all distribution functions, for

    all n and

    Distributions are tested by generating a sample of size 10,000and comparing the difference between the empiricaldistribution function and distribution function

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    http://find/http://goback/
  • 7/29/2019 Simulation r

    72/128

    1 Introduction

    2 Current Software for Distributions

    3 Implementation in Base R

    4 Design for Distribution Implementation

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    What Should be Provided?

    Besides the obvious dpqr functions what else is needed?

    http://find/http://goback/
  • 7/29/2019 Simulation r

    73/128

    Besides the obvious dpqr functions, what else is needed?

    moments, at least low order ones

    the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests

    methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    What Should be Provided?

    Besides the obvious dpqr functions what else is needed?

    http://find/
  • 7/29/2019 Simulation r

    74/128

    Besides the obvious dpqr functions, what else is needed?

    moments, at least low order ones

    the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests

    methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    What Should be Provided?

    Besides the obvious dpqr functions what else is needed?

    http://find/
  • 7/29/2019 Simulation r

    75/128

    Besides the obvious dpqr functions, what else is needed?

    moments, at least low order ones

    the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests

    methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    What Should be Provided?

    Besides the obvious dpqr functions what else is needed?

    http://find/http://goback/
  • 7/29/2019 Simulation r

    76/128

    Besides the obvious dpqr functions, what else is needed?

    moments, at least low order ones

    the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests

    methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    What Should be Provided?

    Besides the obvious dpqr functions, what else is needed?

    http://find/
  • 7/29/2019 Simulation r

    77/128

    Besides the obvious dpqr functions, what else is needed?

    moments, at least low order ones

    the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests

    methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    What Should be Provided?

    Besides the obvious dpqr functions, what else is needed?

    http://find/http://goback/
  • 7/29/2019 Simulation r

    78/128

    pq ,

    moments, at least low order ones

    the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests

    methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    What Should be Provided?

    Besides the obvious dpqr functions, what else is needed?

    http://find/
  • 7/29/2019 Simulation r

    79/128

    pq ,

    moments, at least low order ones

    the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests

    methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    What Should be Provided?

    Besides the obvious dpqr functions, what else is needed?

    http://find/
  • 7/29/2019 Simulation r

    80/128

    pq

    moments, at least low order ones

    the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests

    methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    What Should be Provided?

    Besides the obvious dpqr functions, what else is needed?

    http://find/
  • 7/29/2019 Simulation r

    81/128

    moments, at least low order ones

    the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests

    methods associated with fit results:print, plot, summary,

    print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Fitting Diagnostics

    To assess the fit of a distribution, diagnostic plots should be

    http://find/
  • 7/29/2019 Simulation r

    82/128

    g pprovided

    Some useful plots area histogram or empirical density with fitted densitya log-histogram with fitted log-densitya QQ-plot with optional fitted linea PP-plot with optional fitted line

    For maximum likelihood estimation, contour plots andperspective plots for pairs of parameters, and likelihood profileplots

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting Diagnostics

    To assess the fit of a distribution, diagnostic plots should be

    http://find/
  • 7/29/2019 Simulation r

    83/128

    g pprovided

    Some useful plots area histogram or empirical density with fitted densitya log-histogram with fitted log-densitya QQ-plot with optional fitted linea PP-plot with optional fitted line

    For maximum likelihood estimation, contour plots andperspective plots for pairs of parameters, and likelihood profileplots

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting Diagnostics

    To assess the fit of a distribution, diagnostic plots should be

    http://find/
  • 7/29/2019 Simulation r

    84/128

    provided

    Some useful plots area histogram or empirical density with fitted densitya log-histogram with fitted log-densitya QQ-plot with optional fitted linea PP-plot with optional fitted line

    For maximum likelihood estimation, contour plots andperspective plots for pairs of parameters, and likelihood profileplots

    David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting Diagnostics

    To assess the fit of a distribution, diagnostic plots should be

    http://find/
  • 7/29/2019 Simulation r

    85/128

    provided

    Some useful plots area histogram or empirical density with fitted densitya log-histogram with fitted log-densitya QQ-plot with optional fitted linea PP-plot with optional fitted line

    For maximum likelihood estimation, contour plots andperspective plots for pairs of parameters, and likelihood profileplots

    David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting Diagnostics

    To assess the fit of a distribution, diagnostic plots should be

    http://find/http://goback/
  • 7/29/2019 Simulation r

    86/128

    provided

    Some useful plots area histogram or empirical density with fitted densitya log-histogram with fitted log-densitya QQ-plot with optional fitted linea PP-plot with optional fitted line

    For maximum likelihood estimation, contour plots andperspective plots for pairs of parameters, and likelihood profileplots

    David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting Diagnostics

    To assess the fit of a distribution, diagnostic plots should be

    http://find/http://goback/
  • 7/29/2019 Simulation r

    87/128

    provided

    Some useful plots area histogram or empirical density with fitted densitya log-histogram with fitted log-densitya QQ-plot with optional fitted linea PP-plot with optional fitted line

    For maximum likelihood estimation, contour plots andperspective plots for pairs of parameters, and likelihood profileplots

    David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting Diagnostics

    To assess the fit of a distribution, diagnostic plots should be

    http://find/
  • 7/29/2019 Simulation r

    88/128

    provided

    Some useful plots area histogram or empirical density with fitted densitya log-histogram with fitted log-densitya QQ-plot with optional fitted linea PP-plot with optional fitted line

    For maximum likelihood estimation, contour plots andperspective plots for pairs of parameters, and likelihood profileplots

    David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting

    Some generic fitting routines are currently available

    http://find/
  • 7/29/2019 Simulation r

    89/128

    mle from stats4 can be used to fit distributions but the log

    likelihood and starting values must be supplied

    fitdistr from MASS will automatically fit most of thedistributions from base R

    Other distributions can be fitted using mle by supplying the

    density and started valuesIn designing fitting functions, the structure of the objectreturned and the methods available are vital aspects

    mle returns an S4 object of class mle

    fitdistr produces and S3 object of class fitdistr

    David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting

    Some generic fitting routines are currently available

    http://find/http://goback/
  • 7/29/2019 Simulation r

    90/128

    mle from stats4 can be used to fit distributions but the log

    likelihood and starting values must be supplied

    fitdistr from MASS will automatically fit most of thedistributions from base R

    Other distributions can be fitted using mle by supplying the

    density and started valuesIn designing fitting functions, the structure of the objectreturned and the methods available are vital aspects

    mle returns an S4 object of class mle

    fitdistr produces and S3 object of class fitdistr

    David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting

    Some generic fitting routines are currently available

    http://find/
  • 7/29/2019 Simulation r

    91/128

    mle from stats4 can be used to fit distributions but the log

    likelihood and starting values must be supplied

    fitdistr from MASS will automatically fit most of thedistributions from base R

    Other distributions can be fitted using mle by supplying the

    density and started valuesIn designing fitting functions, the structure of the objectreturned and the methods available are vital aspects

    mle returns an S4 object of class mle

    fitdistr produces and S3 object of class fitdistr

    David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting

    Some generic fitting routines are currently available

    http://find/
  • 7/29/2019 Simulation r

    92/128

    mle from stats4 can be used to fit distributions but the log

    likelihood and starting values must be supplied

    fitdistr from MASS will automatically fit most of thedistributions from base R

    Other distributions can be fitted using mle by supplying the

    density and started valuesIn designing fitting functions, the structure of the objectreturned and the methods available are vital aspects

    mle returns an S4 object of class mle

    fitdistr produces and S3 object of class fitdistr

    David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting

    Some generic fitting routines are currently available

    f fi

    http://find/http://goback/
  • 7/29/2019 Simulation r

    93/128

    mle from stats4 can be used to fit distributions but the log

    likelihood and starting values must be supplied

    fitdistr from MASS will automatically fit most of thedistributions from base R

    Other distributions can be fitted using mle by supplying the

    density and started valuesIn designing fitting functions, the structure of the objectreturned and the methods available are vital aspects

    mle returns an S4 object of class mle

    fitdistr produces and S3 object of class fitdistr

    David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting

    Some generic fitting routines are currently available

    f 4 b d fi di ib i b h l

    http://find/http://goback/
  • 7/29/2019 Simulation r

    94/128

    mle from stats4 can be used to fit distributions but the log

    likelihood and starting values must be suppliedfitdistr from MASS will automatically fit most of thedistributions from base R

    Other distributions can be fitted using mle by supplying the

    density and started valuesIn designing fitting functions, the structure of the objectreturned and the methods available are vital aspects

    mle returns an S4 object of class mle

    fitdistr produces and S3 object of class fitdistr

    David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting

    Some generic fitting routines are currently available

    l f 4 b d t fit di t ib ti b t th l

    http://find/
  • 7/29/2019 Simulation r

    95/128

    mle from stats4 can be used to fit distributions but the log

    likelihood and starting values must be suppliedfitdistr from MASS will automatically fit most of thedistributions from base R

    Other distributions can be fitted using mle by supplying the

    density and started valuesIn designing fitting functions, the structure of the objectreturned and the methods available are vital aspects

    mle returns an S4 object of class mle

    fitdistr produces and S3 object of class fitdistr

    David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting

    The methods available for an object of class mle are:Method Action

    http://find/
  • 7/29/2019 Simulation r

    96/128

    Method Action

    confint Confidence intervals from likelihood profileslogLik Extract maximized log-likelihoodprofile Likelihood profile generationshow Display object brieflysummary Generate object summary

    update Update fitvcov Extract variance-covariance matrix

    For fitdistr the methods are print, coef, and logLik

    Neither function returns the data, so a plot method which

    produces suitable diagnostic plots is not possible

    Ideally a fit should return and object of class distFit say,and the mle class should extend that

    David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting

    The methods available for an object of class mle are:Method Action

    http://find/http://goback/
  • 7/29/2019 Simulation r

    97/128

    Method Action

    confint Confidence intervals from likelihood profileslogLik Extract maximized log-likelihoodprofile Likelihood profile generationshow Display object brieflysummary Generate object summary

    update Update fitvcov Extract variance-covariance matrix

    For fitdistr the methods are print, coef, and logLik

    Neither function returns the data, so a plot method which

    produces suitable diagnostic plots is not possible

    Ideally a fit should return and object of class distFit say,and the mle class should extend that

    David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design

    Fitting

    The methods available for an object of class mle are:Method Action

    http://find/
  • 7/29/2019 Simulation r

    98/128

    Method Action

    confint Confidence intervals from likelihood profileslogLik Extract maximized log-likelihoodprofile Likelihood profile generationshow Display object brieflysummary Generate object summary

    update Update fitvcov Extract variance-covariance matrix

    For fitdistr the methods are print, coef, and logLik

    Neither function returns the data, so a plot method which

    produces suitable diagnostic plots is not possible

    Ideally a fit should return and object of class distFit say,and the mle class should extend that

    D id S tt Diethel W t Ch isti e D S ft e f Dist ib ti s i R Outline Introduction Distributions Current Design

    Fitting

    The methods available for an object of class mle are:Method Action

    http://find/
  • 7/29/2019 Simulation r

    99/128

    t o ct o

    confint Confidence intervals from likelihood profileslogLik Extract maximized log-likelihoodprofile Likelihood profile generationshow Display object brieflysummary Generate object summary

    update Update fitvcov Extract variance-covariance matrix

    For fitdistr the methods are print, coef, and logLik

    Neither function returns the data, so a plot method which

    produces suitable diagnostic plots is not possible

    Ideally a fit should return and object of class distFit say,and the mle class should extend that

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Some Principles

    There are obvious advantages in a standard design, both fordevelopers and for users

    http://find/
  • 7/29/2019 Simulation r

    100/128

    p

    Some principles arethe major guide to the design should be what exists in base Rthe design should be logical with as few special cases aspossiblethe design should minimize the possibility of programming

    mistakes by users and developersthe design should simplify as much as possible the provision ofthe range of facilities needed to implement a distribution

    A naming scheme for functions is an important part of astandard

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Some Principles

    There are obvious advantages in a standard design, both fordevelopers and for users

    http://find/http://goback/
  • 7/29/2019 Simulation r

    101/128

    p

    Some principles arethe major guide to the design should be what exists in base Rthe design should be logical with as few special cases aspossiblethe design should minimize the possibility of programming

    mistakes by users and developersthe design should simplify as much as possible the provision ofthe range of facilities needed to implement a distribution

    A naming scheme for functions is an important part of astandard

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Some Principles

    There are obvious advantages in a standard design, both fordevelopers and for users

    http://find/
  • 7/29/2019 Simulation r

    102/128

    p

    Some principles arethe major guide to the design should be what exists in base Rthe design should be logical with as few special cases aspossiblethe design should minimize the possibility of programming

    mistakes by users and developersthe design should simplify as much as possible the provision ofthe range of facilities needed to implement a distribution

    A naming scheme for functions is an important part of astandard

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Some Principles

    There are obvious advantages in a standard design, both fordevelopers and for users

    http://find/
  • 7/29/2019 Simulation r

    103/128

    Some principles arethe major guide to the design should be what exists in base Rthe design should be logical with as few special cases aspossiblethe design should minimize the possibility of programming

    mistakes by users and developersthe design should simplify as much as possible the provision ofthe range of facilities needed to implement a distribution

    A naming scheme for functions is an important part of astandard

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Some Principles

    There are obvious advantages in a standard design, both fordevelopers and for users

    http://find/http://goback/
  • 7/29/2019 Simulation r

    104/128

    Some principles arethe major guide to the design should be what exists in base Rthe design should be logical with as few special cases aspossiblethe design should minimize the possibility of programming

    mistakes by users and developersthe design should simplify as much as possible the provision ofthe range of facilities needed to implement a distribution

    A naming scheme for functions is an important part of astandard

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    http://find/
  • 7/29/2019 Simulation r

    105/128

    Outline Introduction Distributions Current Design

    Some Principles

    There are obvious advantages in a standard design, both fordevelopers and for users

  • 7/29/2019 Simulation r

    106/128

    Some principles arethe major guide to the design should be what exists in base Rthe design should be logical with as few special cases aspossiblethe design should minimize the possibility of programming

    mistakes by users and developersthe design should simplify as much as possible the provision ofthe range of facilities needed to implement a distribution

    A naming scheme for functions is an important part of astandard

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    A Possible Naming Scheme

    Function Name Description

    ddist Density function or probability functiondi t

    http://find/
  • 7/29/2019 Simulation r

    107/128

    pdist

    Cumulative distribution functionqdist Quantile functionrdist Random number generatordistMean Theoretical meandistVar Theoretical variance

    distSkew Theoretical skewnessdistKurt Theoretical kurtosisdistMode Mode of the densitydistMoments Theoretical moments (and mode)distMGF Moment generating function

    distChFn Characteristic functiondistChangePars Change parameterization of the distributiondistFit Result of fitting the distribution to datadistTest Test the distribution

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Some Alternatives

    Use dots: hyperb.mean, usually deprecated because ofconfusion with S3 methods

    http://find/http://goback/
  • 7/29/2019 Simulation r

    108/128

    Use initial letters: mhyperb, but what about the mgf,multivariate distributions?

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Some Alternatives

    Use dots: hyperb.mean, usually deprecated because ofconfusion with S3 methods

    http://find/http://goback/
  • 7/29/2019 Simulation r

    109/128

    Use initial letters: mhyperb, but what about the mgf,multivariate distributions?

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Function Arguments

    Specification of parameters could allow for the parameters tobe specified as a vector

    http://find/
  • 7/29/2019 Simulation r

    110/128

    This is helpful for maximum likelihood estimationThe approach used for the gamma distribution can be used toallow for both single parameter specification and vectorparameter specification

    Separate parameters should be in the order of location, scale,skewness, kurtosis

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Function Arguments

    Specification of parameters could allow for the parameters tobe specified as a vector

    http://find/
  • 7/29/2019 Simulation r

    111/128

    This is helpful for maximum likelihood estimationThe approach used for the gamma distribution can be used toallow for both single parameter specification and vectorparameter specification

    Separate parameters should be in the order of location, scale,skewness, kurtosis

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Function Arguments

    Specification of parameters could allow for the parameters tobe specified as a vector

    http://find/
  • 7/29/2019 Simulation r

    112/128

    This is helpful for maximum likelihood estimationThe approach used for the gamma distribution can be used toallow for both single parameter specification and vectorparameter specification

    Separate parameters should be in the order of location, scale,skewness, kurtosis

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Function Arguments

    Specification of parameters could allow for the parameters tobe specified as a vector

    http://find/http://goback/
  • 7/29/2019 Simulation r

    113/128

    This is helpful for maximum likelihood estimationThe approach used for the gamma distribution can be used toallow for both single parameter specification and vectorparameter specification

    Separate parameters should be in the order of location, scale,skewness, kurtosis

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Classes

    I dont have a view on whether S3 or S4 classes should beused, but probably S4 classes should be aimed for

    http://find/
  • 7/29/2019 Simulation r

    114/128

    For a fit from a distribution, the class could be called distfitA fit for a particular distribution would add the distributionname: hyperbfit, distfit with S3 methods

    For S4 methods the class distfit would be extended

    Similar ideas would be used for tests: a Kolomogorov-Smirnovtest would have class kstest, htest with S3 methods

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Classes

    I dont have a view on whether S3 or S4 classes should beused, but probably S4 classes should be aimed for

    http://find/
  • 7/29/2019 Simulation r

    115/128

    For a fit from a distribution, the class could be called distfitA fit for a particular distribution would add the distributionname: hyperbfit, distfit with S3 methods

    For S4 methods the class distfit would be extended

    Similar ideas would be used for tests: a Kolomogorov-Smirnovtest would have class kstest, htest with S3 methods

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Classes

    I dont have a view on whether S3 or S4 classes should beused, but probably S4 classes should be aimed for

    http://find/http://goback/
  • 7/29/2019 Simulation r

    116/128

    For a fit from a distribution, the class could be called distfitA fit for a particular distribution would add the distributionname: hyperbfit, distfit with S3 methods

    For S4 methods the class distfit would be extended

    Similar ideas would be used for tests: a Kolomogorov-Smirnovtest would have class kstest, htest with S3 methods

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Classes

    I dont have a view on whether S3 or S4 classes should beused, but probably S4 classes should be aimed for

    http://find/
  • 7/29/2019 Simulation r

    117/128

    For a fit from a distribution, the class could be called distfitA fit for a particular distribution would add the distributionname: hyperbfit, distfit with S3 methods

    For S4 methods the class distfit would be extended

    Similar ideas would be used for tests: a Kolomogorov-Smirnovtest would have class kstest, htest with S3 methods

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Classes

    I dont have a view on whether S3 or S4 classes should beused, but probably S4 classes should be aimed for

    http://find/http://goback/
  • 7/29/2019 Simulation r

    118/128

    For a fit from a distribution, the class could be called distfitA fit for a particular distribution would add the distributionname: hyperbfit, distfit with S3 methods

    For S4 methods the class distfit would be extended

    Similar ideas would be used for tests: a Kolomogorov-Smirnovtest would have class kstest, htest with S3 methods

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Testing

    Testing is vital to quality of software and developers shouldprovide test data and code

    http://find/http://goback/
  • 7/29/2019 Simulation r

    119/128

    Firstly test parameter sets should be provided, which coverthe range of the parameter set

    Unit tests should be provided for all functions: the RUnitpackage supports this

    The distributions in Rmetrics have tests of this type, althoughfurther development seems warranted

    The Massart inequality seems ideal to use in testing

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Testing

    Testing is vital to quality of software and developers shouldprovide test data and code

    http://find/http://goback/
  • 7/29/2019 Simulation r

    120/128

    Firstly test parameter sets should be provided, which coverthe range of the parameter set

    Unit tests should be provided for all functions: the RUnitpackage supports this

    The distributions in Rmetrics have tests of this type, althoughfurther development seems warranted

    The Massart inequality seems ideal to use in testing

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Testing

    Testing is vital to quality of software and developers shouldprovide test data and code

    http://find/http://goback/
  • 7/29/2019 Simulation r

    121/128

    Firstly test parameter sets should be provided, which coverthe range of the parameter set

    Unit tests should be provided for all functions: the RUnitpackage supports this

    The distributions in Rmetrics have tests of this type, althoughfurther development seems warranted

    The Massart inequality seems ideal to use in testing

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Testing

    Testing is vital to quality of software and developers shouldprovide test data and code

    http://find/http://goback/
  • 7/29/2019 Simulation r

    122/128

    Firstly test parameter sets should be provided, which coverthe range of the parameter set

    Unit tests should be provided for all functions: the RUnitpackage supports this

    The distributions in Rmetrics have tests of this type, althoughfurther development seems warranted

    The Massart inequality seems ideal to use in testing

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Testing

    Testing is vital to quality of software and developers shouldprovide test data and code

    http://find/http://goback/
  • 7/29/2019 Simulation r

    123/128

    Firstly test parameter sets should be provided, which coverthe range of the parameter set

    Unit tests should be provided for all functions: the RUnitpackage supports this

    The distributions in Rmetrics have tests of this type, althoughfurther development seems warranted

    The Massart inequality seems ideal to use in testing

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Final Thoughts

    The distr package is an object-oriented implementation ofdistributions

    I f l d b h l

    http://goforward/http://find/http://goback/
  • 7/29/2019 Simulation r

    124/128

    It facilitates operations on distributions such as convolutionsIt uses sampling for calculation of moments and distributionfunctions

    The package VarianceGamma has been designed andimplemented using these ideas

    Implementing the logarithm options of the dpq functions isquite difficult for the distributions in which I am interested

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Final Thoughts

    The distr package is an object-oriented implementation ofdistributions

    I f ili i di ib i h l i

    http://find/
  • 7/29/2019 Simulation r

    125/128

    It facilitates operations on distributions such as convolutionsIt uses sampling for calculation of moments and distributionfunctions

    The package VarianceGamma has been designed andimplemented using these ideas

    Implementing the logarithm options of the dpq functions isquite difficult for the distributions in which I am interested

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Final Thoughts

    The distr package is an object-oriented implementation ofdistributions

    I f ili i di ib i h l i

    http://find/
  • 7/29/2019 Simulation r

    126/128

    It facilitates operations on distributions such as convolutionsIt uses sampling for calculation of moments and distributionfunctions

    The package VarianceGamma has been designed andimplemented using these ideas

    Implementing the logarithm options of the dpq functions isquite difficult for the distributions in which I am interested

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Final Thoughts

    The distr package is an object-oriented implementation ofdistributions

    It f ilit t ti di t ib ti h l ti

    http://find/http://goback/
  • 7/29/2019 Simulation r

    127/128

    It facilitates operations on distributions such as convolutionsIt uses sampling for calculation of moments and distributionfunctions

    The package VarianceGamma has been designed andimplemented using these ideas

    Implementing the logarithm options of the dpq functions isquite difficult for the distributions in which I am interested

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    Outline Introduction Distributions Current Design

    Final Thoughts

    The distr package is an object-oriented implementation ofdistributions

    It f ilit t ti di t ib ti h l ti

    http://find/http://goback/
  • 7/29/2019 Simulation r

    128/128

    It facilitates operations on distributions such as convolutionsIt uses sampling for calculation of moments and distributionfunctions

    The package VarianceGamma has been designed andimplemented using these ideas

    Implementing the logarithm options of the dpq functions isquite difficult for the distributions in which I am interested

    David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R

    http://find/