Matlab Statistics Toolbox: Fitting Distributions to Data

刘静远 0711160008 2007.12.18

Analyzing Survival or Reliability Data

In biological or medical applications, this is called survival analysis. The times may represent the survival time of an organism or the time until a disease is cured.

In engineering applications, this is called reliability analysis. The times may represent the time to failure of a piece of equipment.


Special Properties of Lifetime Data
Ways of Looking at Distributions
Fitting a Weibull Distribution
Adding a Smooth Nonparametric Estimate
Alternative Models


Special Properties of Lifetime Data

Some features of lifetime data distinguish them from other types of data:

Lifetimes are always positive values.

Some lifetimes may not be observed exactly (they are censored).

The distributions and analysis techniques used are fairly specific to lifetime data.
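
The second property, censoring, changes how a distribution is fitted to the data. One common way to write the likelihood for right-censored observations is sketched below. The notation is illustrative and not taken from the slides: t_i is the observed time, c_i = 1 if observation i is censored and 0 otherwise, f is the density, S = 1 - CDF is the survivor function, and θ stands for the distribution parameters.

```latex
\[
L(\theta) = \prod_{i=1}^{n} f(t_i;\theta)^{\,1-c_i}\, S(t_i;\theta)^{\,c_i},
\qquad
\log L(\theta) = \sum_{i=1}^{n} \Big[ (1-c_i)\,\log f(t_i;\theta) + c_i\,\log S(t_i;\theta) \Big].
\]
```

An exactly observed failure contributes its density, while a censored observation contributes only the probability of surviving past the time at which observation stopped.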


% Simulate 100 lifetimes: 90 from the main population, 10 early failures
rand('state',1);
lifetime = [wblrnd(15000,3,90,1); wblrnd(1500,3,10,1)];
% Observation stops at time T, so longer lifetimes are censored at T
T = 14000;
obstime = sort(min(T, lifetime));
failed = obstime(obstime<T); nfailed = length(failed);
survived = obstime(obstime==T); nsurvived = length(survived);
censored = (obstime >= T);
% One horizontal line per item, from time 0 to its observed time
plot([zeros(size(obstime)),obstime]', repmat(1:length(obstime),2,1), ...
     'Color','b','LineStyle','-')
% Dotted lines mark the unobserved part of the censored lifetimes
line([T;3e4], repmat(nfailed+(1:nsurvived), 2, 1), 'Color','b','LineStyle',':');
line([T;T], [0;nfailed+nsurvived],'Color','k','LineStyle','-')
text(T,30,'<--Unknown survival time past here')
xlabel('Survival time'); ylabel('Observation number')


Ways of Looking at Distributions

Consider different ways of looking at a probability distribution:

A probability density function (PDF).

A survivor function (1-CDF).

The hazard rate, which is the PDF divided by the survivor function: PDF./(1-CDF).

A probability plot, which is a re-scaled CDF used to compare data to a fitted distribution.
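
For the Weibull family used throughout these slides, the first three views have closed forms. As a reference sketch, with a denoting the scale parameter and b the shape parameter (the symbols are chosen here to match the argument order of wblpdf and wblcdf), they are:

```latex
\begin{align*}
f(x) &= \frac{b}{a}\left(\frac{x}{a}\right)^{b-1}
        \exp\!\left[-\left(\frac{x}{a}\right)^{b}\right], \\
S(x) &= 1 - F(x) = \exp\!\left[-\left(\frac{x}{a}\right)^{b}\right], \\
h(x) &= \frac{f(x)}{S(x)} = \frac{b}{a}\left(\frac{x}{a}\right)^{b-1},
        \qquad x > 0.
\end{align*}
```

The hazard rises with x when b > 1, falls when b < 1, and equals the constant 1/a when b = 1.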


% Compare three Weibull distributions (scale, shape) in four views
x = linspace(1,30000);
subplot(2,2,1);
plot(x,wblpdf(x,14000,2),x,wblpdf(x,18000,2),x,wblpdf(x,14000,1.1))
title('Prob. Density Fcn')
subplot(2,2,2);
plot(x,1-wblcdf(x,14000,2),x,1-wblcdf(x,18000,2),x,1-wblcdf(x,14000,1.1))
title('Survivor Fcn')
subplot(2,2,3);
% Hazard rate = PDF divided by survivor function
wblhaz = @(x,a,b) (wblpdf(x,a,b) ./ (1-wblcdf(x,a,b)));
plot(x,wblhaz(x,14000,2),x,wblhaz(x,18000,2),x,wblhaz(x,14000,1.1))
title('Hazard Rate Fcn')
subplot(2,2,4);
% Probability plot of 40 simulated Weibull values
probplot('weibull',wblrnd(14000,2,40,1))
title('Probability Plot')


Fitting a Weibull Distribution

The Weibull distribution is a generalization of the exponential distribution. If lifetimes follow an exponential distribution, then they have a constant hazard rate. A Weibull distribution allows the hazard rate to increase or decrease over time.

Other distributions used for modeling lifetime data include the lognormal, gamma, and Birnbaum-Saunders distributions.


% Empirical CDF with confidence bounds, accounting for censoring
subplot(1,1,1);
[empF,x,empFlo,empFup] = ecdf(obstime,'censoring',censored);
stairs(x,empF);
hold on;
stairs(x,empFlo,':'); stairs(x,empFup,':');
hold off
xlabel('Time'); ylabel('Proportion failed'); title('Empirical CDF')
% Fit a Weibull distribution by maximum likelihood, with censoring
paramEsts = wblfit(obstime,'censoring',censored);
[nlogl,paramCov] = wbllike(paramEsts,obstime,censored);
xx = linspace(1,2*T,500);
[wblF,wblFlo,wblFup] = wblcdf(xx,paramEsts(1),paramEsts(2),paramCov);
% Overlay the fitted CDF and its confidence bounds on the empirical CDF
stairs(x,empF);
hold on
handles = plot(xx,wblF,'r-',xx,wblFlo,'r:',xx,wblFup,'r:');
hold off
xlabel('Time'); ylabel('Fitted failure probability'); title('Weibull Model vs. Empirical')


Adding a Smooth Nonparametric Estimate

The pre-defined functions provided with the Statistics Toolbox don't include any distributions that have an excess of early failures like this. We might want to draw a smooth, nonparametric curve through the empirical CDF, using the function ksdensity.


% Remove the Weibull confidence bounds, keep the fitted curve
delete(handles(2:end))
% Kernel-smoothed CDF with the default bandwidth u
[npF,ignore,u] = ksdensity(obstime,xx,'cens',censored,'function','cdf');
line(xx,npF,'Color','g');
% Same estimate with one third of the default bandwidth
npF3 = ksdensity(obstime,xx,'cens',censored,'function','cdf','width',u/3);
line(xx,npF3,'Color','m');
xlim([0 1.3*T])
title('Weibull and Nonparametric Models vs. Empirical')
legend('Empirical','Fitted Weibull','Nonparametric, default','Nonparametric, 1/3 default', ...
       'location','northwest');
% Nonparametric hazard rate = smoothed PDF divided by smoothed survivor function
hazrate = ksdensity(obstime,xx,'cens',censored,'width',u/3) ./ (1-npF3);
plot(xx,hazrate)
title('Hazard Rate for Nonparametric Model')
xlim([0 T])


Alternative Models

For this example, the Weibull distribution was not a suitable fit. We were able to fit the data well with a nonparametric fit, but that model was only useful within the range of the data.

Alternatives:

Use a different parametric distribution. The Statistics Toolbox includes functions for other lifetime distributions such as the lognormal, gamma, and Birnbaum-Saunders.

Define and fit a custom parametric model; see the Fitting Custom Univariate Distributions, Part 2 demo.

Use a mixture of two parametric distributions, one representing early failure and the other representing the rest of the distribution; see the Fitting Custom Univariate Distributions demo.


Fitting Custom Univariate Distributions

Use the mle function to fit custom distributions to univariate data. You write code to compute the probability density function (PDF) for the distribution that you want to fit, and mle does most of the remaining work for you.


Fitting Custom Distributions: A Zero-Truncated Poisson Example

In some situations, counts that are zero do not get recorded in the data.

For this example, we'll use simulated data from a zero‐truncated Poisson distribution.  
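
Written out, the zero-truncated Poisson probability function is the ordinary Poisson probability renormalized by the probability of a nonzero count. The formula below is a reference sketch using λ for the Poisson rate and k for the count:

```latex
\[
P(X = k \mid X > 0)
  = \frac{\lambda^{k} e^{-\lambda}}{k!\,\left(1 - e^{-\lambda}\right)},
  \qquad k = 1, 2, \dots
\]
```

The mean of this distribution is λ / (1 - e^{-λ}), which is larger than λ, so the sample mean of the truncated data overestimates λ and serves only as a starting value for the fit.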


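% Simulate Poisson counts and discard the zeros
randn('state',0); rand('state',0);
n = 75;
lambda = 1.75;
x = poissrnd(lambda,n,1);
x = x(x > 0);
length(x)                                   % ans = 68
hist(x,[0:1:max(x)+1]);
% Zero-truncated Poisson probability function; the denominator is 1-Pr{0}
pf_truncpoiss = @(x,lambda) poisspdf(x,lambda) ./ (1-poisscdf(0,lambda));
start = mean(x)                             % start = 2.1029
% Maximum likelihood estimate and confidence interval, with lambda >= 0
[lambdaHat,lambdaCI] = mle(x, 'pdf',pf_truncpoiss, 'start',start, 'lower',0)
% Standard error from the asymptotic variance
avar = mlecov(lambdaHat, x, 'pdf',pf_truncpoiss);
stderr = sqrt(avar)                         % stderr = 0.1827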


Supplying Additional Values to the Distribution Function: A Truncated Normal Example

It sometimes also happens that continuous data are truncated. For example, observations larger than some fixed value might not be recorded because of limitations in the way data are collected. This example shows how to fit a normal distribution to truncated data, using the function mle.
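
The density being fitted is the normal density renormalized by the probability of falling below the truncation point. As a reference sketch, with x_T denoting the truncation point and φ and Φ the standard normal PDF and CDF (notation chosen here for illustration):

```latex
\[
f(x \mid x < x_T)
  = \frac{\dfrac{1}{\sigma}\,\varphi\!\left(\dfrac{x-\mu}{\sigma}\right)}
         {\Phi\!\left(\dfrac{x_T-\mu}{\sigma}\right)},
  \qquad x < x_T .
\]
```

The truncation point is a fixed, known value rather than a parameter to estimate, so only μ and σ are fitted.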


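% Simulate normal data and discard values at or above the truncation point
n = 75;
mu = 1;
sigma = 3;
x = normrnd(mu,sigma,n,1);
xTrunc = 4;
x = x(x < xTrunc);
length(x)                                   % ans = 64
hist(x,[-10:.5:4]);
% Truncated normal PDF; xTrunc is supplied as a fixed additional value
pdf_truncnorm = @(x,mu,sigma) normpdf(x,mu,sigma) ./ normcdf(xTrunc,mu,sigma);
start = [mean(x),std(x)]                    % start = 0.4491    2.3565
% Maximum likelihood fit with sigma constrained to be nonnegative
[paramEsts,paramCIs] = mle(x, 'pdf',pdf_truncnorm, 'start',start, 'lower',[-Inf 0])
                                            % paramEsts = 1.7136    3.1553
% Standard errors from the asymptotic covariance matrix
acov = mlecov(paramEsts, x, 'pdf',pdf_truncnorm)
stderr = sqrt(diag(acov))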


The end

Thank you!


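R = WBLRND(A,B,M,N,...) returns an array of random numbers from the Weibull distribution with scale parameter A and shape parameter B. B determines the shape of the curve, and A stretches or compresses the curve along the time axis.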