Upload
verstatik
View
216
Download
0
Embed Size (px)
Citation preview
7/29/2019 Simulation r
1/128
Outline Introduction Distributions Current Design
Software for Distributions in R
David Scott1 Diethelm Wurtz2 Christine Dong1
1Department of Statistics
The University of Auckland
2Institut fur Theoretische Physik
ETH Zurich
June 30, 2009
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
http://find/http://goback/7/29/2019 Simulation r
2/128
Outline Introduction Distributions Current Design
Outline
1 Introduction
2 Current Software for DistributionsBase RContributed Packages
3 Implementation in Base R
4 Design for Distribution Implementation
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
http://find/7/29/2019 Simulation r
3/128
Outline Introduction Distributions Current Design
Outline
1 Introduction
2 Current Software for DistributionsBase RContributed Packages
3 Implementation in Base R
4 Design for Distribution Implementation
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
http://find/7/29/2019 Simulation r
4/128
Outline Introduction Distributions Current Design
Outline
1 Introduction
2 Current Software for DistributionsBase R
Contributed Packages
3 Implementation in Base R
4 Design for Distribution Implementation
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
http://find/7/29/2019 Simulation r
5/128
Outline Introduction Distributions Current Design
Outline
1 Introduction
2 Current Software for DistributionsBase R
Contributed Packages
3 Implementation in Base R
4 Design for Distribution Implementation
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
O li I d i Di ib i C D i
http://find/7/29/2019 Simulation r
6/128
Outline Introduction Distributions Current Design
Outline
1 Introduction
2 Current Software for DistributionsBase R
Contributed Packages
3 Implementation in Base R
4 Design for Distribution Implementation
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
O tli I t d ti Di t ib ti C t D i
http://find/http://goback/7/29/2019 Simulation r
7/128
Outline Introduction Distributions Current Design
Outline
1 Introduction
2 Current Software for DistributionsBase R
Contributed Packages
3 Implementation in Base R
4 Design for Distribution Implementation
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/http://goback/7/29/2019 Simulation r
8/128
Outline Introduction Distributions Current Design
1 Introduction
2 Current Software for Distributions
3 Implementation in Base R
4 Design for Distribution Implementation
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/http://goback/7/29/2019 Simulation r
9/128
Outline Introduction Distributions Current Design
Distributions
Distributions are how we model uncertainty
There is well-established theory concerning distributions
There are standard approaches for fitting distributions
There are many distributions which have been found to be ofinterest
Software implementation of distributions is a well-definedsubject in comparision to say modelling of time-series
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/7/29/2019 Simulation r
10/128
Outline Introduction Distributions Current Design
Distributions
Distributions are how we model uncertainty
There is well-established theory concerning distributionsThere are standard approaches for fitting distributions
There are many distributions which have been found to be ofinterest
Software implementation of distributions is a well-definedsubject in comparision to say modelling of time-series
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/7/29/2019 Simulation r
11/128
Outline Introduction Distributions Current Design
Distributions
Distributions are how we model uncertainty
There is well-established theory concerning distributionsThere are standard approaches for fitting distributions
There are many distributions which have been found to be ofinterest
Software implementation of distributions is a well-definedsubject in comparision to say modelling of time-series
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/http://goback/7/29/2019 Simulation r
12/128
g
Distributions
Distributions are how we model uncertainty
There is well-established theory concerning distributionsThere are standard approaches for fitting distributions
There are many distributions which have been found to be ofinterest
Software implementation of distributions is a well-definedsubject in comparision to say modelling of time-series
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/7/29/2019 Simulation r
13/128
g
Distributions
Distributions are how we model uncertainty
There is well-established theory concerning distributionsThere are standard approaches for fitting distributions
There are many distributions which have been found to be ofinterest
Software implementation of distributions is a well-definedsubject in comparision to say modelling of time-series
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/7/29/2019 Simulation r
14/128
Introduction
In base R there are 20 distributions implemented, at least in
partAll univariateconsider univariate distributions only
Numerous other distributions have been implemented in R
CRAN packages solely devoted to one or more distributionsCRAN packages which implement distributions incidentally(e.g. VGAM)implementations of distributions not on CRAN, e.g JimLindseys work
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/http://goback/7/29/2019 Simulation r
15/128
Introduction
In base R there are 20 distributions implemented, at least in
partAll univariateconsider univariate distributions only
Numerous other distributions have been implemented in R
CRAN packages solely devoted to one or more distributionsCRAN packages which implement distributions incidentally(e.g. VGAM)implementations of distributions not on CRAN, e.g JimLindseys work
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/7/29/2019 Simulation r
16/128
Introduction
In base R there are 20 distributions implemented, at least in
partAll univariateconsider univariate distributions only
Numerous other distributions have been implemented in R
CRAN packages solely devoted to one or more distributionsCRAN packages which implement distributions incidentally
(e.g. VGAM)implementations of distributions not on CRAN, e.g JimLindseys work
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/7/29/2019 Simulation r
17/128
Introduction
In base R there are 20 distributions implemented, at least in
partAll univariateconsider univariate distributions only
Numerous other distributions have been implemented in R
CRAN packages solely devoted to one or more distributionsCRAN packages which implement distributions incidentally
(e.g. VGAM)implementations of distributions not on CRAN, e.g JimLindseys work
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/http://goback/7/29/2019 Simulation r
18/128
Introduction
In base R there are 20 distributions implemented, at least in
partAll univariateconsider univariate distributions only
Numerous other distributions have been implemented in R
CRAN packages solely devoted to one or more distributionsCRAN packages which implement distributions incidentally
(e.g. VGAM)implementations of distributions not on CRAN, e.g JimLindseys work
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/http://goback/7/29/2019 Simulation r
19/128
Introduction
In base R there are 20 distributions implemented, at least in
partAll univariateconsider univariate distributions only
Numerous other distributions have been implemented in R
CRAN packages solely devoted to one or more distributionsCRAN packages which implement distributions incidentally
(e.g. VGAM)implementations of distributions not on CRAN, e.g JimLindseys work
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/7/29/2019 Simulation r
20/128
Introduction
There are overlaps in coverage of distributions in R
Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality
return structuresIt is useful to discuss some standardization of implementationof software for distributions
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/7/29/2019 Simulation r
21/128
Introduction
There are overlaps in coverage of distributions in R
Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality
return structuresIt is useful to discuss some standardization of implementationof software for distributions
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/http://goback/7/29/2019 Simulation r
22/128
Introduction
There are overlaps in coverage of distributions in R
Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality
return structuresIt is useful to discuss some standardization of implementationof software for distributions
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/7/29/2019 Simulation r
23/128
Introduction
There are overlaps in coverage of distributions in R
Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality
return structuresIt is useful to discuss some standardization of implementationof software for distributions
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
I d i
http://find/http://goback/7/29/2019 Simulation r
24/128
Introduction
There are overlaps in coverage of distributions in R
Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality
return structuresIt is useful to discuss some standardization of implementationof software for distributions
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
I d i
http://find/http://goback/7/29/2019 Simulation r
25/128
Introduction
There are overlaps in coverage of distributions in R
Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality
return structuresIt is useful to discuss some standardization of implementationof software for distributions
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
I t d ti
http://find/http://goback/7/29/2019 Simulation r
26/128
Introduction
There are overlaps in coverage of distributions in R
Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality
return structuresIt is useful to discuss some standardization of implementationof software for distributions
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
I t d ti
http://find/http://goback/7/29/2019 Simulation r
27/128
Introduction
There are overlaps in coverage of distributions in R
Implementations of distributions in R are inconsistentnaming of objectsparameterizationsfunction argumentsfunctionality
return structuresIt is useful to discuss some standardization of implementationof software for distributions
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
St d di ti
http://find/7/29/2019 Simulation r
28/128
Standardization
There are benefits to a standardized approach
easier for userseasier for developersfewer errors
Deserves thought, even if not prescriptive
Perhaps too late!
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Standardization
http://find/http://goback/7/29/2019 Simulation r
29/128
Standardization
There are benefits to a standardized approach
easier for userseasier for developersfewer errors
Deserves thought, even if not prescriptive
Perhaps too late!
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Standardization
http://find/7/29/2019 Simulation r
30/128
Standardization
There are benefits to a standardized approach
easier for userseasier for developersfewer errors
Deserves thought, even if not prescriptive
Perhaps too late!
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Standardization
http://find/7/29/2019 Simulation r
31/128
Standardization
There are benefits to a standardized approach
easier for userseasier for developersfewer errors
Deserves thought, even if not prescriptive
Perhaps too late!
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Standardization
http://find/http://goback/7/29/2019 Simulation r
32/128
Standardization
There are benefits to a standardized approach
easier for userseasier for developersfewer errors
Deserves thought, even if not prescriptive
Perhaps too late!
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Standardization
http://find/7/29/2019 Simulation r
33/128
Standardization
There are benefits to a standardized approach
easier for userseasier for developersfewer errors
Deserves thought, even if not prescriptive
Perhaps too late!
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design Base R Contributed
http://find/7/29/2019 Simulation r
34/128
1 Introduction
2 Current Software for Distributions
3 Implementation in Base R
4 Design for Distribution Implementation
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design Base R Contributed
Distributions in Base R
http://find/http://goback/7/29/2019 Simulation r
35/128
Distributions in Base R
Implementation in R is essentially the provision ofdpqr
functions: the density (or probability) function, distributionfunction, quantile or inverse distribution function and randomnumber generation
The distributions are the binomial (binom), geometric (geom),hypergeometric (hyper), negative binomial (nbinom), Poisson
(pois), Wilcoxon signed rank statistic (signrank), Wilcoxonrank sum statistic (wilcox), beta (beta), Cauchy (Cauchy),non-central chi-squared (chisq), exponential (exp),F (f),gamma (gamma), log-normal (lnorm), logistic (logis),normal (norm), t (t), uniform (unif), Weibull (weibull),and Tukey studentized range (tukey) for which only the pand q functions are implemented
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design Base R Contributed
Distributions in Base R
http://find/7/29/2019 Simulation r
36/128
Distributions in Base R
Implementation in R is essentially the provision ofdpqr
functions: the density (or probability) function, distributionfunction, quantile or inverse distribution function and randomnumber generation
The distributions are the binomial (binom), geometric (geom),hypergeometric (hyper), negative binomial (nbinom), Poisson
(pois), Wilcoxon signed rank statistic (signrank), Wilcoxonrank sum statistic (wilcox), beta (beta), Cauchy (Cauchy),non-central chi-squared (chisq), exponential (exp),F (f),gamma (gamma), log-normal (lnorm), logistic (logis),normal (norm), t (t), uniform (unif), Weibull (weibull),and Tukey studentized range (tukey) for which only the pand q functions are implemented
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design Base R Contributed
Constributed Packages
http://find/http://goback/7/29/2019 Simulation r
37/128
Constributed Packages
Many distributions are implemented in R
The following list is not completesee the task viewhttp://cran.r-project.org/web/views/Distributions.html
Primarily packages which deal with a particular distribution orset of related distributions
Some packages not on CRAN are nonetheless available
I know of other implementations not on CRAN and notavailable
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design Base R Contributed
Constributed Packages
http://cran.r-project.org/web/views/Distributions.htmlhttp://cran.r-project.org/web/views/Distributions.htmlhttp://find/http://goback/7/29/2019 Simulation r
38/128
C b g
Many distributions are implemented in R
The following list is not completesee the task viewhttp://cran.r-project.org/web/views/Distributions.html
Primarily packages which deal with a particular distribution orset of related distributions
Some packages not on CRAN are nonetheless available
I know of other implementations not on CRAN and notavailable
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design Base R Contributed
Constributed Packages
http://cran.r-project.org/web/views/Distributions.htmlhttp://cran.r-project.org/web/views/Distributions.htmlhttp://find/7/29/2019 Simulation r
39/128
g
Many distributions are implemented in R
The following list is not completesee the task viewhttp://cran.r-project.org/web/views/Distributions.html
Primarily packages which deal with a particular distribution orset of related distributions
Some packages not on CRAN are nonetheless available
I know of other implementations not on CRAN and notavailable
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design Base R Contributed
Constributed Packages
http://cran.r-project.org/web/views/Distributions.htmlhttp://cran.r-project.org/web/views/Distributions.htmlhttp://find/http://goback/7/29/2019 Simulation r
40/128
g
Many distributions are implemented in R
The following list is not completesee the task viewhttp://cran.r-project.org/web/views/Distributions.html
Primarily packages which deal with a particular distribution orset of related distributions
Some packages not on CRAN are nonetheless available
I know of other implementations not on CRAN and notavailable
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design Base R Contributed
Constributed Packages
http://cran.r-project.org/web/views/Distributions.htmlhttp://cran.r-project.org/web/views/Distributions.htmlhttp://find/http://goback/7/29/2019 Simulation r
41/128
g
Many distributions are implemented in R
The following list is not completesee the task viewhttp://cran.r-project.org/web/views/Distributions.html
Primarily packages which deal with a particular distribution orset of related distributions
Some packages not on CRAN are nonetheless available
I know of other implementations not on CRAN and notavailable
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design Base R Contributed
Packages
http://cran.r-project.org/web/views/Distributions.htmlhttp://cran.r-project.org/web/views/Distributions.htmlhttp://find/7/29/2019 Simulation r
42/128
Package Name Description
actuar Collection of functions and data sets related toactuarial science applications, including loss distri-butions
bs Package for the Birnbaum-Saunders distributionevd Functions for extreme value distributions
Davies The Davies quantile functionevdbayes Bayesian analysis in extreme value theoryevir Extreme values in RexactRankTests Exact distributions for rank and permutation testsextRemes Extreme value toolkit
ghyp A package on generalized hyperbolic distributionsgld Estimation and use of the generalised (Tukey)
lambda distributionHyperbolicDist The hyperbolic distribution
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design Base R Contributed
Packages
http://find/7/29/2019 Simulation r
43/128
Package Name Description
ig Package for the robust and classical versions of theinverse Gaussian distribution
ismev An introduction to statistical modeling of extremevalues
lmomco L-moments, trimmed L-moments, L-comoments,
and many distributionsLmoments L-moments and quantile mixturesmnormt The multivariate normal and t distributionsmvtnorm Multivariate normal and t distributionnormalp Package for exponential power distributions (EPD)
POT Generalized Pareto distribution and peaks overthreshold
Rmetrics An environment for teaching financial engineeringand computational finance with R
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design Base R Contributed
Packages
http://goforward/http://find/7/29/2019 Simulation r
44/128
Package Name Description
skewt The skewed Student-t distribution
sn The skew-normal and skew-t distributionsSuppDists Supplementary distributionstdist Distribution of a linear combination of independent
Students t-variables
triangle Provides the standard distribution functions for thetriangle distributionVarianceGamma The variance gamma distribution
There are a number of packages which have implementationsof a number of distributions
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design Base R Contributed
Packages
http://find/http://goback/7/29/2019 Simulation r
45/128
Package Name Distributions
actuar Burr, inverse Burr, generalized beta, gen-
eralized Pareto, inverse exponential, in-verse gamma, inverse paralogistic, inversePareto, inverse transformed gamma, inverseWeibull, log-gamma, log-logistic, paralogis-
tic, Pareto, transformed gammafBasics (Rmetrics) Skew-normal, skew-t, generalized hyper-bolic, hyperbolic, normal inverse Gaussian,generalized hyperbolic Student-t, stable
fExtremes (Rmetrics) Generalized extreme value, generalize Pareto
fGarch (Rmetrics) Generalized exponential, double exponentialfOptions (Rmetrics) Johnson Type IV, reciprocal gamma
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design Base R Contributed
Packages
http://find/http://goback/7/29/2019 Simulation r
46/128
Package Name Distributions
HyperbolicDist generalized hyperbolic, hyperbolic, general-
ized inverse Gaussian, skew-LaplaceQRMlib generalized extreme value, generalized
hyperbolic, generalized Pareto, Gumbel,probit-normal
SuppDists Friedman chi-squared, Johnson system(types IIV), Kendalls tau, Kruskal-Wallis,normal scores, generalized hypergeometric,inverse Gaussian
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design Base R Contributed
Packages by Jim Lindsey
http://find/7/29/2019 Simulation r
47/128
Jim Lindseys packages include multiple distributions also
Package Name Distributionsrmutil beta-binomial, Box-Cox, Burr, Consul, dou-
ble binomial, double Poisson, gammacount, generalized extreme value, general-ized gamma, generalized inverse Gaussian,
generalized logistic, generalized Weibull,Hjorth, Laplace, Levy, multiplicative bino-mial, multiplicative Poisson, Pareto, powerexponential, power variance function Pois-son, simplex, skew-Laplace, two-sided power
stable stable
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/7/29/2019 Simulation r
48/128
1 Introduction
2 Current Software for Distributions
3 Implementation in Base R
4 Design for Distribution Implementation
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
dpqr Functions
http://find/http://goback/7/29/2019 Simulation r
49/128
Any experienced R user will be aware of the namingconventions for the density, cumulative distribution, quantileand random number generation functions for the base Rdistributions
The argument lists for the dpqr functions are standard
First argument is x, p, qand n for respectively a vector of
quantiles, a vector of quantiles, a vector of probabilities, andthe sample size
rwilcox is an exception using nn because n is a parameter
Subsequent arguments give the parameters
The gamma distribution is unusual, with argument listshape, rate =1, scale = 1/rate
This mechanism allows the user to specify the secondparameter as either the scale or the rate
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
dpqr Functions
http://find/http://goback/7/29/2019 Simulation r
50/128
Any experienced R user will be aware of the namingconventions for the density, cumulative distribution, quantileand random number generation functions for the base Rdistributions
The argument lists for the dpqr functions are standard
First argument is x, p, qand n for respectively a vector of
quantiles, a vector of quantiles, a vector of probabilities, andthe sample size
rwilcox is an exception using nn because n is a parameter
Subsequent arguments give the parameters
The gamma distribution is unusual, with argument listshape, rate =1, scale = 1/rate
This mechanism allows the user to specify the secondparameter as either the scale or the rate
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
dpqr Functions
http://find/7/29/2019 Simulation r
51/128
Any experienced R user will be aware of the namingconventions for the density, cumulative distribution, quantileand random number generation functions for the base Rdistributions
The argument lists for the dpqr functions are standard
First argument is x, p, qand n for respectively a vector of
quantiles, a vector of quantiles, a vector of probabilities, andthe sample size
rwilcox is an exception using nn because n is a parameter
Subsequent arguments give the parameters
The gamma distribution is unusual, with argument listshape, rate =1, scale = 1/rate
This mechanism allows the user to specify the secondparameter as either the scale or the rate
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
dpqr Functions
http://find/http://goback/7/29/2019 Simulation r
52/128
Any experienced R user will be aware of the namingconventions for the density, cumulative distribution, quantileand random number generation functions for the base Rdistributions
The argument lists for the dpqr functions are standard
First argument is x, p, qand n for respectively a vector of
quantiles, a vector of quantiles, a vector of probabilities, andthe sample size
rwilcox is an exception using nn because n is a parameter
Subsequent arguments give the parameters
The gamma distribution is unusual, with argument listshape, rate =1, scale = 1/rate
This mechanism allows the user to specify the secondparameter as either the scale or the rate
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
dpqr Functions
http://find/7/29/2019 Simulation r
53/128
Any experienced R user will be aware of the namingconventions for the density, cumulative distribution, quantileand random number generation functions for the base Rdistributions
The argument lists for the dpqr functions are standard
First argument is x, p, qand n for respectively a vector of
quantiles, a vector of quantiles, a vector of probabilities, andthe sample size
rwilcox is an exception using nn because n is a parameter
Subsequent arguments give the parameters
The gamma distribution is unusual, with argument listshape, rate =1, scale = 1/rate
This mechanism allows the user to specify the secondparameter as either the scale or the rate
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
dpqr Functions
http://find/http://goback/7/29/2019 Simulation r
54/128
Any experienced R user will be aware of the namingconventions for the density, cumulative distribution, quantileand random number generation functions for the base Rdistributions
The argument lists for the dpqr functions are standard
First argument is x, p, qand n for respectively a vector of
quantiles, a vector of quantiles, a vector of probabilities, andthe sample size
rwilcox is an exception using nn because n is a parameter
Subsequent arguments give the parameters
The gamma distribution is unusual, with argument listshape, rate =1, scale = 1/rate
This mechanism allows the user to specify the secondparameter as either the scale or the rate
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
dpqr Functions
http://find/7/29/2019 Simulation r
55/128
Any experienced R user will be aware of the namingconventions for the density, cumulative distribution, quantileand random number generation functions for the base Rdistributions
The argument lists for the dpqr functions are standard
First argument is x, p, qand n for respectively a vector of
quantiles, a vector of quantiles, a vector of probabilities, andthe sample size
rwilcox is an exception using nn because n is a parameter
Subsequent arguments give the parameters
The gamma distribution is unusual, with argument listshape, rate =1, scale = 1/rate
This mechanism allows the user to specify the secondparameter as either the scale or the rate
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
dpqr Functions
http://find/http://goback/7/29/2019 Simulation r
56/128
Other arguments differ among the dpqr functions
The d functions take the argument log, the p and q functionsthe argument log.p
These allow the extension of the range of accuratecomputation for these quantities
The p and q functions have the argument lower.tail
The dpqr functions are coded in C and may be found in thesource software tree at /src/math/
They are in large part due to Ross Ihaka and Catherine Loader
Martin Machler is now responsible for on-going maintenance
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
dpqr Functions
http://find/http://goback/7/29/2019 Simulation r
57/128
Other arguments differ among the dpqr functions
The d functions take the argument log, the p and q functionsthe argument log.p
These allow the extension of the range of accuratecomputation for these quantities
The p and q functions have the argument lower.tail
The dpqr functions are coded in C and may be found in thesource software tree at /src/math/
They are in large part due to Ross Ihaka and Catherine Loader
Martin Machler is now responsible for on-going maintenance
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
dpqr Functions
http://find/http://goback/7/29/2019 Simulation r
58/128
Other arguments differ among the dpqr functions
The d functions take the argumentlog
, the p and q functionsthe argument log.p
These allow the extension of the range of accuratecomputation for these quantities
The p and q functions have the argument lower.tail
The dpqr functions are coded in C and may be found in thesource software tree at /src/math/
They are in large part due to Ross Ihaka and Catherine Loader
Martin Machler is now responsible for on-going maintenance
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
dpqr Functions
http://find/http://goback/7/29/2019 Simulation r
59/128
Other arguments differ among the dpqr functions
The d functions take the argumentlog
, the p and q functionsthe argument log.p
These allow the extension of the range of accuratecomputation for these quantities
The p and q functions have the argument lower.tail
The dpqr functions are coded in C and may be found in thesource software tree at /src/math/
They are in large part due to Ross Ihaka and Catherine Loader
Martin Machler is now responsible for on-going maintenance
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
dpqr Functions
http://find/7/29/2019 Simulation r
60/128
Other arguments differ among the dpqr functions
The d functions take the argument log, the p and q functionsthe argument log.p
These allow the extension of the range of accuratecomputation for these quantities
The p and q functions have the argument lower.tail
The dpqr functions are coded in C and may be found in thesource software tree at /src/math/
They are in large part due to Ross Ihaka and Catherine Loader
Martin Machler is now responsible for on-going maintenance
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
dpqr Functions
http://find/http://goback/7/29/2019 Simulation r
61/128
Other arguments differ among the dpqr functions
The d functions take the argument log, the p and q functionsthe argument log.p
These allow the extension of the range of accuratecomputation for these quantities
The p and q functions have the argument lower.tail
The dpqr functions are coded in C and may be found in thesource software tree at /src/math/
They are in large part due to Ross Ihaka and Catherine Loader
Martin Machler is now responsible for on-going maintenance
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
dpqr Functions
http://find/7/29/2019 Simulation r
62/128
Other arguments differ among the dpqr functions
The d functions take the argument log, the p and q functionsthe argument log.p
These allow the extension of the range of accuratecomputation for these quantities
The p and q functions have the argument lower.tail
The dpqr functions are coded in C and may be found in thesource software tree at /src/math/
They are in large part due to Ross Ihaka and Catherine Loader
Martin Machler is now responsible for on-going maintenance
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Testing and Validation
http://find/7/29/2019 Simulation r
63/128
The algorithms used in the dpqr functions are well-establishedalgorithms taken from a substantial scientific literature
There are also tests performed, found in the directory testsin two files d-p-q-r-tests.R and p-r-random-tests.R
Tests in d-p-q-r-tests.R are inversion tests which checkthat qdist(pdist(x))=x for values x generated by rdist
There are tests relying on special distribution relationships,and tests using extreme values of parameters or arguments
For discrete distributions equality ofcumsum(ddist(.))=pdist(.)
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Testing and Validation
http://find/7/29/2019 Simulation r
64/128
The algorithms used in the dpqr functions are well-establishedalgorithms taken from a substantial scientific literature
There are also tests performed, found in the directory testsin two files d-p-q-r-tests.R and p-r-random-tests.R
Tests in d-p-q-r-tests.R are inversion tests which checkthat qdist(pdist(x))=x for values x generated by rdist
There are tests relying on special distribution relationships,and tests using extreme values of parameters or arguments
For discrete distributions equality ofcumsum(ddist(.))=pdist(.)
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Testing and Validation
http://find/http://goback/7/29/2019 Simulation r
65/128
The algorithms used in the dpqr functions are well-establishedalgorithms taken from a substantial scientific literature
There are also tests performed, found in the directory testsin two files d-p-q-r-tests.R and p-r-random-tests.R
Tests in d-p-q-r-tests.R are inversion tests which checkthat qdist(pdist(x))=x for values x generated by rdist
There are tests relying on special distribution relationships,and tests using extreme values of parameters or arguments
For discrete distributions equality ofcumsum(ddist(.))=pdist(.)
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Testing and Validation
http://find/http://goback/7/29/2019 Simulation r
66/128
The algorithms used in the dpqr functions are well-establishedalgorithms taken from a substantial scientific literature
There are also tests performed, found in the directory testsin two files d-p-q-r-tests.R and p-r-random-tests.R
Tests in d-p-q-r-tests.R are inversion tests which checkthat qdist(pdist(x))=x for values x generated by rdist
There are tests relying on special distribution relationships,and tests using extreme values of parameters or arguments
For discrete distributions equality ofcumsum(ddist(.))=pdist(.)
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Testing and Validation
http://find/http://goback/7/29/2019 Simulation r
67/128
The algorithms used in the dpqr functions are well-establishedalgorithms taken from a substantial scientific literature
There are also tests performed, found in the directory testsin two files d-p-q-r-tests.R and p-r-random-tests.R
Tests in d-p-q-r-tests.R are inversion tests which checkthat qdist(pdist(x))=x for values x generated by rdist
There are tests relying on special distribution relationships,and tests using extreme values of parameters or arguments
For discrete distributions equality ofcumsum(ddist(.))=pdist(.)
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Testing and Validation
http://find/http://goback/7/29/2019 Simulation r
68/128
Tests in p-r-random-tests.R are based on an inequality ofMassart:
Pr
supx
|Fn(x) F(x)| >
2 exp(2n2)
where Fn is the empirical distribution function for a
This is a version of the Dvoretsky-Kiefer-Wolfowitz inequalitywith the best possible constant, namely the leading 2 in theright hand side of the inequality
The inequality above is true for all distribution functions, for
all n and
Distributions are tested by generating a sample of size 10,000and comparing the difference between the empiricaldistribution function and distribution function
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Testing and Validation
b d l f
http://find/http://goback/7/29/2019 Simulation r
69/128
Tests in p-r-random-tests.R are based on an inequality ofMassart:
Pr
supx
|Fn(x) F(x)| >
2 exp(2n2)
where Fn is the empirical distribution function for a
This is a version of the Dvoretsky-Kiefer-Wolfowitz inequalitywith the best possible constant, namely the leading 2 in theright hand side of the inequality
The inequality above is true for all distribution functions, for
all n and
Distributions are tested by generating a sample of size 10,000and comparing the difference between the empiricaldistribution function and distribution function
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Testing and Validation
T i b d i li f
http://find/http://goback/7/29/2019 Simulation r
70/128
Tests in p-r-random-tests.R are based on an inequality ofMassart:
Pr
supx
|Fn(x) F(x)| >
2 exp(2n2)
where Fn is the empirical distribution function for a
This is a version of the Dvoretsky-Kiefer-Wolfowitz inequalitywith the best possible constant, namely the leading 2 in theright hand side of the inequality
The inequality above is true for all distribution functions, for
all n and
Distributions are tested by generating a sample of size 10,000and comparing the difference between the empiricaldistribution function and distribution function
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Testing and Validation
T t i d R b d i lit f
http://find/http://goback/7/29/2019 Simulation r
71/128
Tests in p-r-random-tests.R are based on an inequality ofMassart:
Pr
supx
|Fn(x) F(x)| >
2 exp(2n2)
where Fn is the empirical distribution function for a
This is a version of the Dvoretsky-Kiefer-Wolfowitz inequalitywith the best possible constant, namely the leading 2 in theright hand side of the inequality
The inequality above is true for all distribution functions, for
all n and
Distributions are tested by generating a sample of size 10,000and comparing the difference between the empiricaldistribution function and distribution function
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
http://find/http://goback/7/29/2019 Simulation r
72/128
1 Introduction
2 Current Software for Distributions
3 Implementation in Base R
4 Design for Distribution Implementation
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
What Should be Provided?
Besides the obvious dpqr functions what else is needed?
http://find/http://goback/7/29/2019 Simulation r
73/128
Besides the obvious dpqr functions, what else is needed?
moments, at least low order ones
the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests
methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
What Should be Provided?
Besides the obvious dpqr functions what else is needed?
http://find/7/29/2019 Simulation r
74/128
Besides the obvious dpqr functions, what else is needed?
moments, at least low order ones
the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests
methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
What Should be Provided?
Besides the obvious dpqr functions what else is needed?
http://find/7/29/2019 Simulation r
75/128
Besides the obvious dpqr functions, what else is needed?
moments, at least low order ones
the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests
methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
What Should be Provided?
Besides the obvious dpqr functions what else is needed?
http://find/http://goback/7/29/2019 Simulation r
76/128
Besides the obvious dpqr functions, what else is needed?
moments, at least low order ones
the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests
methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
What Should be Provided?
Besides the obvious dpqr functions, what else is needed?
http://find/7/29/2019 Simulation r
77/128
Besides the obvious dpqr functions, what else is needed?
moments, at least low order ones
the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests
methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
What Should be Provided?
Besides the obvious dpqr functions, what else is needed?
http://find/http://goback/7/29/2019 Simulation r
78/128
pq ,
moments, at least low order ones
the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests
methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
What Should be Provided?
Besides the obvious dpqr functions, what else is needed?
http://find/7/29/2019 Simulation r
79/128
pq ,
moments, at least low order ones
the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests
methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
What Should be Provided?
Besides the obvious dpqr functions, what else is needed?
http://find/7/29/2019 Simulation r
80/128
pq
moments, at least low order ones
the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests
methods associated with fit results: print, plot, summary,print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
What Should be Provided?
Besides the obvious dpqr functions, what else is needed?
http://find/7/29/2019 Simulation r
81/128
moments, at least low order ones
the mode for unimodal distributionsmoment generating function and characteristic functionfunctions for changing parameterisationsfunctions for fitting of distributions and fit diagnosticsgoodness-of-fit tests
methods associated with fit results:print, plot, summary,
print.summary and coeffor maximum likelihood fits, methods such as logLik andprofile
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Fitting Diagnostics
To assess the fit of a distribution, diagnostic plots should be
http://find/7/29/2019 Simulation r
82/128
g pprovided
Some useful plots area histogram or empirical density with fitted densitya log-histogram with fitted log-densitya QQ-plot with optional fitted linea PP-plot with optional fitted line
For maximum likelihood estimation, contour plots andperspective plots for pairs of parameters, and likelihood profileplots
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting Diagnostics
To assess the fit of a distribution, diagnostic plots should be
http://find/7/29/2019 Simulation r
83/128
g pprovided
Some useful plots area histogram or empirical density with fitted densitya log-histogram with fitted log-densitya QQ-plot with optional fitted linea PP-plot with optional fitted line
For maximum likelihood estimation, contour plots andperspective plots for pairs of parameters, and likelihood profileplots
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting Diagnostics
To assess the fit of a distribution, diagnostic plots should be
http://find/7/29/2019 Simulation r
84/128
provided
Some useful plots area histogram or empirical density with fitted densitya log-histogram with fitted log-densitya QQ-plot with optional fitted linea PP-plot with optional fitted line
For maximum likelihood estimation, contour plots andperspective plots for pairs of parameters, and likelihood profileplots
David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting Diagnostics
To assess the fit of a distribution, diagnostic plots should be
http://find/7/29/2019 Simulation r
85/128
provided
Some useful plots area histogram or empirical density with fitted densitya log-histogram with fitted log-densitya QQ-plot with optional fitted linea PP-plot with optional fitted line
For maximum likelihood estimation, contour plots andperspective plots for pairs of parameters, and likelihood profileplots
David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting Diagnostics
To assess the fit of a distribution, diagnostic plots should be
http://find/http://goback/7/29/2019 Simulation r
86/128
provided
Some useful plots area histogram or empirical density with fitted densitya log-histogram with fitted log-densitya QQ-plot with optional fitted linea PP-plot with optional fitted line
For maximum likelihood estimation, contour plots andperspective plots for pairs of parameters, and likelihood profileplots
David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting Diagnostics
To assess the fit of a distribution, diagnostic plots should be
http://find/http://goback/7/29/2019 Simulation r
87/128
provided
Some useful plots area histogram or empirical density with fitted densitya log-histogram with fitted log-densitya QQ-plot with optional fitted linea PP-plot with optional fitted line
For maximum likelihood estimation, contour plots andperspective plots for pairs of parameters, and likelihood profileplots
David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting Diagnostics
To assess the fit of a distribution, diagnostic plots should be
http://find/7/29/2019 Simulation r
88/128
provided
Some useful plots area histogram or empirical density with fitted densitya log-histogram with fitted log-densitya QQ-plot with optional fitted linea PP-plot with optional fitted line
For maximum likelihood estimation, contour plots andperspective plots for pairs of parameters, and likelihood profileplots
David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting
Some generic fitting routines are currently available
http://find/7/29/2019 Simulation r
89/128
mle from stats4 can be used to fit distributions but the log
likelihood and starting values must be supplied
fitdistr from MASS will automatically fit most of thedistributions from base R
Other distributions can be fitted using mle by supplying the
density and started valuesIn designing fitting functions, the structure of the objectreturned and the methods available are vital aspects
mle returns an S4 object of class mle
fitdistr produces and S3 object of class fitdistr
David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting
Some generic fitting routines are currently available
http://find/http://goback/7/29/2019 Simulation r
90/128
mle from stats4 can be used to fit distributions but the log
likelihood and starting values must be supplied
fitdistr from MASS will automatically fit most of thedistributions from base R
Other distributions can be fitted using mle by supplying the
density and started valuesIn designing fitting functions, the structure of the objectreturned and the methods available are vital aspects
mle returns an S4 object of class mle
fitdistr produces and S3 object of class fitdistr
David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting
Some generic fitting routines are currently available
http://find/7/29/2019 Simulation r
91/128
mle from stats4 can be used to fit distributions but the log
likelihood and starting values must be supplied
fitdistr from MASS will automatically fit most of thedistributions from base R
Other distributions can be fitted using mle by supplying the
density and started valuesIn designing fitting functions, the structure of the objectreturned and the methods available are vital aspects
mle returns an S4 object of class mle
fitdistr produces and S3 object of class fitdistr
David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting
Some generic fitting routines are currently available
http://find/7/29/2019 Simulation r
92/128
mle from stats4 can be used to fit distributions but the log
likelihood and starting values must be supplied
fitdistr from MASS will automatically fit most of thedistributions from base R
Other distributions can be fitted using mle by supplying the
density and started valuesIn designing fitting functions, the structure of the objectreturned and the methods available are vital aspects
mle returns an S4 object of class mle
fitdistr produces and S3 object of class fitdistr
David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting
Some generic fitting routines are currently available
f fi
http://find/http://goback/7/29/2019 Simulation r
93/128
mle from stats4 can be used to fit distributions but the log
likelihood and starting values must be supplied
fitdistr from MASS will automatically fit most of thedistributions from base R
Other distributions can be fitted using mle by supplying the
density and started valuesIn designing fitting functions, the structure of the objectreturned and the methods available are vital aspects
mle returns an S4 object of class mle
fitdistr produces and S3 object of class fitdistr
David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting
Some generic fitting routines are currently available
f 4 b d fi di ib i b h l
http://find/http://goback/7/29/2019 Simulation r
94/128
mle from stats4 can be used to fit distributions but the log
likelihood and starting values must be suppliedfitdistr from MASS will automatically fit most of thedistributions from base R
Other distributions can be fitted using mle by supplying the
density and started valuesIn designing fitting functions, the structure of the objectreturned and the methods available are vital aspects
mle returns an S4 object of class mle
fitdistr produces and S3 object of class fitdistr
David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting
Some generic fitting routines are currently available
l f 4 b d t fit di t ib ti b t th l
http://find/7/29/2019 Simulation r
95/128
mle from stats4 can be used to fit distributions but the log
likelihood and starting values must be suppliedfitdistr from MASS will automatically fit most of thedistributions from base R
Other distributions can be fitted using mle by supplying the
density and started valuesIn designing fitting functions, the structure of the objectreturned and the methods available are vital aspects
mle returns an S4 object of class mle
fitdistr produces and S3 object of class fitdistr
David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting
The methods available for an object of class mle are:Method Action
http://find/7/29/2019 Simulation r
96/128
Method Action
confint Confidence intervals from likelihood profileslogLik Extract maximized log-likelihoodprofile Likelihood profile generationshow Display object brieflysummary Generate object summary
update Update fitvcov Extract variance-covariance matrix
For fitdistr the methods are print, coef, and logLik
Neither function returns the data, so a plot method which
produces suitable diagnostic plots is not possible
Ideally a fit should return and object of class distFit say,and the mle class should extend that
David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting
The methods available for an object of class mle are:Method Action
http://find/http://goback/7/29/2019 Simulation r
97/128
Method Action
confint Confidence intervals from likelihood profileslogLik Extract maximized log-likelihoodprofile Likelihood profile generationshow Display object brieflysummary Generate object summary
update Update fitvcov Extract variance-covariance matrix
For fitdistr the methods are print, coef, and logLik
Neither function returns the data, so a plot method which
produces suitable diagnostic plots is not possible
Ideally a fit should return and object of class distFit say,and the mle class should extend that
David Scott Diethelm Wurtz Christine Dong Software for Distributions in R Outline Introduction Distributions Current Design
Fitting
The methods available for an object of class mle are:Method Action
http://find/7/29/2019 Simulation r
98/128
Method Action
confint Confidence intervals from likelihood profileslogLik Extract maximized log-likelihoodprofile Likelihood profile generationshow Display object brieflysummary Generate object summary
update Update fitvcov Extract variance-covariance matrix
For fitdistr the methods are print, coef, and logLik
Neither function returns the data, so a plot method which
produces suitable diagnostic plots is not possible
Ideally a fit should return and object of class distFit say,and the mle class should extend that
D id S tt Diethel W t Ch isti e D S ft e f Dist ib ti s i R Outline Introduction Distributions Current Design
Fitting
The methods available for an object of class mle are:Method Action
http://find/7/29/2019 Simulation r
99/128
t o ct o
confint Confidence intervals from likelihood profileslogLik Extract maximized log-likelihoodprofile Likelihood profile generationshow Display object brieflysummary Generate object summary
update Update fitvcov Extract variance-covariance matrix
For fitdistr the methods are print, coef, and logLik
Neither function returns the data, so a plot method which
produces suitable diagnostic plots is not possible
Ideally a fit should return and object of class distFit say,and the mle class should extend that
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Some Principles
There are obvious advantages in a standard design, both fordevelopers and for users
http://find/7/29/2019 Simulation r
100/128
p
Some principles arethe major guide to the design should be what exists in base Rthe design should be logical with as few special cases aspossiblethe design should minimize the possibility of programming
mistakes by users and developersthe design should simplify as much as possible the provision ofthe range of facilities needed to implement a distribution
A naming scheme for functions is an important part of astandard
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Some Principles
There are obvious advantages in a standard design, both fordevelopers and for users
http://find/http://goback/7/29/2019 Simulation r
101/128
p
Some principles arethe major guide to the design should be what exists in base Rthe design should be logical with as few special cases aspossiblethe design should minimize the possibility of programming
mistakes by users and developersthe design should simplify as much as possible the provision ofthe range of facilities needed to implement a distribution
A naming scheme for functions is an important part of astandard
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Some Principles
There are obvious advantages in a standard design, both fordevelopers and for users
http://find/7/29/2019 Simulation r
102/128
p
Some principles arethe major guide to the design should be what exists in base Rthe design should be logical with as few special cases aspossiblethe design should minimize the possibility of programming
mistakes by users and developersthe design should simplify as much as possible the provision ofthe range of facilities needed to implement a distribution
A naming scheme for functions is an important part of astandard
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Some Principles
There are obvious advantages in a standard design, both fordevelopers and for users
http://find/7/29/2019 Simulation r
103/128
Some principles arethe major guide to the design should be what exists in base Rthe design should be logical with as few special cases aspossiblethe design should minimize the possibility of programming
mistakes by users and developersthe design should simplify as much as possible the provision ofthe range of facilities needed to implement a distribution
A naming scheme for functions is an important part of astandard
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Some Principles
There are obvious advantages in a standard design, both fordevelopers and for users
http://find/http://goback/7/29/2019 Simulation r
104/128
Some principles arethe major guide to the design should be what exists in base Rthe design should be logical with as few special cases aspossiblethe design should minimize the possibility of programming
mistakes by users and developersthe design should simplify as much as possible the provision ofthe range of facilities needed to implement a distribution
A naming scheme for functions is an important part of astandard
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
http://find/7/29/2019 Simulation r
105/128
Outline Introduction Distributions Current Design
Some Principles
There are obvious advantages in a standard design, both fordevelopers and for users
7/29/2019 Simulation r
106/128
Some principles arethe major guide to the design should be what exists in base Rthe design should be logical with as few special cases aspossiblethe design should minimize the possibility of programming
mistakes by users and developersthe design should simplify as much as possible the provision ofthe range of facilities needed to implement a distribution
A naming scheme for functions is an important part of astandard
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
A Possible Naming Scheme
Function Name Description
ddist Density function or probability functiondi t
http://find/7/29/2019 Simulation r
107/128
pdist
Cumulative distribution functionqdist Quantile functionrdist Random number generatordistMean Theoretical meandistVar Theoretical variance
distSkew Theoretical skewnessdistKurt Theoretical kurtosisdistMode Mode of the densitydistMoments Theoretical moments (and mode)distMGF Moment generating function
distChFn Characteristic functiondistChangePars Change parameterization of the distributiondistFit Result of fitting the distribution to datadistTest Test the distribution
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Some Alternatives
Use dots: hyperb.mean, usually deprecated because ofconfusion with S3 methods
http://find/http://goback/7/29/2019 Simulation r
108/128
Use initial letters: mhyperb, but what about the mgf,multivariate distributions?
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Some Alternatives
Use dots: hyperb.mean, usually deprecated because ofconfusion with S3 methods
http://find/http://goback/7/29/2019 Simulation r
109/128
Use initial letters: mhyperb, but what about the mgf,multivariate distributions?
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Function Arguments
Specification of parameters could allow for the parameters tobe specified as a vector
http://find/7/29/2019 Simulation r
110/128
This is helpful for maximum likelihood estimationThe approach used for the gamma distribution can be used toallow for both single parameter specification and vectorparameter specification
Separate parameters should be in the order of location, scale,skewness, kurtosis
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Function Arguments
Specification of parameters could allow for the parameters tobe specified as a vector
http://find/7/29/2019 Simulation r
111/128
This is helpful for maximum likelihood estimationThe approach used for the gamma distribution can be used toallow for both single parameter specification and vectorparameter specification
Separate parameters should be in the order of location, scale,skewness, kurtosis
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Function Arguments
Specification of parameters could allow for the parameters tobe specified as a vector
http://find/7/29/2019 Simulation r
112/128
This is helpful for maximum likelihood estimationThe approach used for the gamma distribution can be used toallow for both single parameter specification and vectorparameter specification
Separate parameters should be in the order of location, scale,skewness, kurtosis
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Function Arguments
Specification of parameters could allow for the parameters tobe specified as a vector
http://find/http://goback/7/29/2019 Simulation r
113/128
This is helpful for maximum likelihood estimationThe approach used for the gamma distribution can be used toallow for both single parameter specification and vectorparameter specification
Separate parameters should be in the order of location, scale,skewness, kurtosis
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Classes
I dont have a view on whether S3 or S4 classes should beused, but probably S4 classes should be aimed for
http://find/7/29/2019 Simulation r
114/128
For a fit from a distribution, the class could be called distfitA fit for a particular distribution would add the distributionname: hyperbfit, distfit with S3 methods
For S4 methods the class distfit would be extended
Similar ideas would be used for tests: a Kolomogorov-Smirnovtest would have class kstest, htest with S3 methods
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Classes
I dont have a view on whether S3 or S4 classes should beused, but probably S4 classes should be aimed for
http://find/7/29/2019 Simulation r
115/128
For a fit from a distribution, the class could be called distfitA fit for a particular distribution would add the distributionname: hyperbfit, distfit with S3 methods
For S4 methods the class distfit would be extended
Similar ideas would be used for tests: a Kolomogorov-Smirnovtest would have class kstest, htest with S3 methods
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Classes
I dont have a view on whether S3 or S4 classes should beused, but probably S4 classes should be aimed for
http://find/http://goback/7/29/2019 Simulation r
116/128
For a fit from a distribution, the class could be called distfitA fit for a particular distribution would add the distributionname: hyperbfit, distfit with S3 methods
For S4 methods the class distfit would be extended
Similar ideas would be used for tests: a Kolomogorov-Smirnovtest would have class kstest, htest with S3 methods
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Classes
I dont have a view on whether S3 or S4 classes should beused, but probably S4 classes should be aimed for
http://find/7/29/2019 Simulation r
117/128
For a fit from a distribution, the class could be called distfitA fit for a particular distribution would add the distributionname: hyperbfit, distfit with S3 methods
For S4 methods the class distfit would be extended
Similar ideas would be used for tests: a Kolomogorov-Smirnovtest would have class kstest, htest with S3 methods
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Classes
I dont have a view on whether S3 or S4 classes should beused, but probably S4 classes should be aimed for
http://find/http://goback/7/29/2019 Simulation r
118/128
For a fit from a distribution, the class could be called distfitA fit for a particular distribution would add the distributionname: hyperbfit, distfit with S3 methods
For S4 methods the class distfit would be extended
Similar ideas would be used for tests: a Kolomogorov-Smirnovtest would have class kstest, htest with S3 methods
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Testing
Testing is vital to quality of software and developers shouldprovide test data and code
http://find/http://goback/7/29/2019 Simulation r
119/128
Firstly test parameter sets should be provided, which coverthe range of the parameter set
Unit tests should be provided for all functions: the RUnitpackage supports this
The distributions in Rmetrics have tests of this type, althoughfurther development seems warranted
The Massart inequality seems ideal to use in testing
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Testing
Testing is vital to quality of software and developers shouldprovide test data and code
http://find/http://goback/7/29/2019 Simulation r
120/128
Firstly test parameter sets should be provided, which coverthe range of the parameter set
Unit tests should be provided for all functions: the RUnitpackage supports this
The distributions in Rmetrics have tests of this type, althoughfurther development seems warranted
The Massart inequality seems ideal to use in testing
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Testing
Testing is vital to quality of software and developers shouldprovide test data and code
http://find/http://goback/7/29/2019 Simulation r
121/128
Firstly test parameter sets should be provided, which coverthe range of the parameter set
Unit tests should be provided for all functions: the RUnitpackage supports this
The distributions in Rmetrics have tests of this type, althoughfurther development seems warranted
The Massart inequality seems ideal to use in testing
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Testing
Testing is vital to quality of software and developers shouldprovide test data and code
http://find/http://goback/7/29/2019 Simulation r
122/128
Firstly test parameter sets should be provided, which coverthe range of the parameter set
Unit tests should be provided for all functions: the RUnitpackage supports this
The distributions in Rmetrics have tests of this type, althoughfurther development seems warranted
The Massart inequality seems ideal to use in testing
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Testing
Testing is vital to quality of software and developers shouldprovide test data and code
http://find/http://goback/7/29/2019 Simulation r
123/128
Firstly test parameter sets should be provided, which coverthe range of the parameter set
Unit tests should be provided for all functions: the RUnitpackage supports this
The distributions in Rmetrics have tests of this type, althoughfurther development seems warranted
The Massart inequality seems ideal to use in testing
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Final Thoughts
The distr package is an object-oriented implementation ofdistributions
I f l d b h l
http://goforward/http://find/http://goback/7/29/2019 Simulation r
124/128
It facilitates operations on distributions such as convolutionsIt uses sampling for calculation of moments and distributionfunctions
The package VarianceGamma has been designed andimplemented using these ideas
Implementing the logarithm options of the dpq functions isquite difficult for the distributions in which I am interested
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Final Thoughts
The distr package is an object-oriented implementation ofdistributions
I f ili i di ib i h l i
http://find/7/29/2019 Simulation r
125/128
It facilitates operations on distributions such as convolutionsIt uses sampling for calculation of moments and distributionfunctions
The package VarianceGamma has been designed andimplemented using these ideas
Implementing the logarithm options of the dpq functions isquite difficult for the distributions in which I am interested
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Final Thoughts
The distr package is an object-oriented implementation ofdistributions
I f ili i di ib i h l i
http://find/7/29/2019 Simulation r
126/128
It facilitates operations on distributions such as convolutionsIt uses sampling for calculation of moments and distributionfunctions
The package VarianceGamma has been designed andimplemented using these ideas
Implementing the logarithm options of the dpq functions isquite difficult for the distributions in which I am interested
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Final Thoughts
The distr package is an object-oriented implementation ofdistributions
It f ilit t ti di t ib ti h l ti
http://find/http://goback/7/29/2019 Simulation r
127/128
It facilitates operations on distributions such as convolutionsIt uses sampling for calculation of moments and distributionfunctions
The package VarianceGamma has been designed andimplemented using these ideas
Implementing the logarithm options of the dpq functions isquite difficult for the distributions in which I am interested
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
Outline Introduction Distributions Current Design
Final Thoughts
The distr package is an object-oriented implementation ofdistributions
It f ilit t ti di t ib ti h l ti
http://find/http://goback/7/29/2019 Simulation r
128/128
It facilitates operations on distributions such as convolutionsIt uses sampling for calculation of moments and distributionfunctions
The package VarianceGamma has been designed andimplemented using these ideas
Implementing the logarithm options of the dpq functions isquite difficult for the distributions in which I am interested
David Scott, Diethelm Wurtz, Christine Dong Software for Distributions in R
http://find/