MATLAB Statistics Toolbox: Fitting Distributions to Data
刘静远 0711160008 2007.12.18
Analyzing Survival or Reliability Data
In biological or medical applications, this is known as survival analysis. The times may represent the survival time of an organism or the time until a disease is cured.
In engineering applications, this is known as reliability analysis. The times may represent the time to failure of a piece of equipment.
Special Properties of Lifetime Data
Ways of Looking at Distributions
Fitting a Weibull Distribution
Adding a Smooth Nonparametric Estimate
Alternative Models
Special Properties of Lifetime Data
Some features of lifetime data distinguish them from other types of data:
Lifetimes are always positive values.
Some lifetimes may not be observed exactly; they are known only to exceed some value (censoring).
The distributions and analysis techniques used are fairly specific to lifetime data.
rand('state',1);
% 90 typical lifetimes plus 10 early failures
lifetime = [wblrnd(15000,3,90,1); wblrnd(1500,3,10,1)];
T = 14000;   % observation stops at time T
obstime = sort(min(T, lifetime));
failed = obstime(obstime<T); nfailed = length(failed);
survived = obstime(obstime==T); nsurvived = length(survived);
censored = (obstime >= T);   % true where the lifetime is only known to exceed T
plot([zeros(size(obstime)),obstime]', repmat(1:length(obstime),2,1), ...
     'Color','b','LineStyle','-')
line([T;3e4], repmat(nfailed+(1:nsurvived), 2, 1), 'Color','b','LineStyle',':');
line([T;T], [0;nfailed+nsurvived],'Color','k','LineStyle','-')
text(T,30,'<--Unknown survival time past here')
xlabel('Survival time'); ylabel('Observation number')
Ways of Looking at Distributions
Consider different ways of looking at a probability distribution.
A probability density function (PDF).
A survivor function (1-CDF).
The hazard rate: the PDF divided by the survivor function, PDF./(1-CDF).
A probability plot: a re-scaled CDF, used to compare data to a fitted distribution.
x = linspace(1,30000);
subplot(2,2,1);
plot(x,wblpdf(x,14000,2),x,wblpdf(x,18000,2),x,wblpdf(x,14000,1.1))
title('Prob. Density Fcn')
subplot(2,2,2);
plot(x,1-wblcdf(x,14000,2),x,1-wblcdf(x,18000,2),x,1-wblcdf(x,14000,1.1))
title('Survivor Fcn')
subplot(2,2,3);
wblhaz = @(x,a,b) (wblpdf(x,a,b) ./ (1-wblcdf(x,a,b)));
plot(x,wblhaz(x,14000,2),x,wblhaz(x,18000,2),x,wblhaz(x,14000,1.1))
title('Hazard Rate Fcn')
subplot(2,2,4);
probplot('weibull',wblrnd(14000,2,40,1))
title('Probability Plot')
Fitting a Weibull Distribution
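The Weibull distribution is a generalization of the exponential distribution. If lifetimes follow an exponential distribution, then they have a constant hazard rate. The Weibull shape parameter lets the hazard rate increase (shape > 1) or decrease (shape < 1) over time; shape = 1 gives the exponential distribution.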
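As a quick check of the constant-hazard statement, the sketch below evaluates the hazard rate of a Weibull distribution with shape parameter 1 (the exponential special case) and, for contrast, with shape parameter 2. The scale value 14000 and the evaluation grid are arbitrary choices made for illustration.

```matlab
% Hazard rate = PDF ./ survivor function (same definition as wblhaz in the slides).
wblhaz = @(x,a,b) wblpdf(x,a,b) ./ (1 - wblcdf(x,a,b));

a = 14000;                          % scale (arbitrary illustration value)
x = linspace(1, 30000, 7);          % evaluation grid (arbitrary)

hazExp  = wblhaz(x, a, 1);          % shape = 1: exponential case
hazWear = wblhaz(x, a, 2);          % shape = 2: hazard grows with time

maxDev = max(abs(hazExp - 1/a));    % deviation from the constant value 1/a
isIncreasing = all(diff(hazWear) > 0);
```

Here maxDev should be at the level of floating-point rounding, and isIncreasing should be true.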
Other distributions used for modeling lifetime data include the lognormal, gamma, and Birnbaum‐Saunders distributions.
subplot(1,1,1);
[empF,x,empFlo,empFup] = ecdf(obstime,'censoring',censored);
stairs(x,empF);
hold on;
stairs(x,empFlo,':'); stairs(x,empFup,':');
hold off
xlabel('Time'); ylabel('Proportion failed'); title('Empirical CDF')

paramEsts = wblfit(obstime,'censoring',censored);
[nlogl,paramCov] = wbllike(paramEsts,obstime,censored);
xx = linspace(1,2*T,500);
[wblF,wblFlo,wblFup] = wblcdf(xx,paramEsts(1),paramEsts(2),paramCov);
stairs(x,empF);
hold on
handles = plot(xx,wblF,'r-',xx,wblFlo,'r:',xx,wblFup,'r:');
hold off
xlabel('Time'); ylabel('Fitted failure probability'); title('Weibull Model vs. Empirical')
Adding a Smooth Nonparametric Estimate
The predefined distributions in the Statistics Toolbox do not include any with an excess of early failures like the one in this data set. Instead, we might draw a smooth, nonparametric curve through the empirical CDF, using the function ksdensity.
delete(handles(2:end))
[npF,ignore,u] = ksdensity(obstime,xx,'cens',censored,'function','cdf');
line(xx,npF,'Color','g');
npF3 = ksdensity(obstime,xx,'cens',censored,'function','cdf','width',u/3);
line(xx,npF3,'Color','m');
xlim([0 1.3*T])
title('Weibull and Nonparametric Models vs. Empirical')
legend('Empirical','Fitted Weibull','Nonparametric, default','Nonparametric, 1/3 default', ...
       'location','northwest');

hazrate = ksdensity(obstime,xx,'cens',censored,'width',u/3) ./ (1-npF3);
plot(xx,hazrate)
title('Hazard Rate for Nonparametric Model')
xlim([0 T])
Alternative Models
For this example, the Weibull distribution was not a suitable fit. We were able to fit the data well with a nonparametric fit, but that model was only useful within the range of the data.
Two alternatives:
Use a different parametric distribution. The Statistics Toolbox includes functions for other lifetime distributions such as the lognormal, gamma, and Birnbaum-Saunders. See the Fitting Custom Univariate Distributions, Part 2 demo.
Use a mixture of two parametric distributions, one representing early failure and the other representing the rest of the distribution. See the Fitting Custom Univariate Distributions demo.
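A minimal sketch of trying a different parametric distribution, assuming censored data simulated the same way as earlier in the slides. The lognormal is fitted with lognfit, and the two fits are compared by negative log-likelihood. The seed call rng(1) and the 0.05 confidence level are assumptions added for this sketch, so the numbers will not match the earlier plots.

```matlab
rng(1);                                   % assumed seed, for repeatability only
lifetime = [wblrnd(15000,3,90,1); wblrnd(1500,3,10,1)];
T = 14000;
obstime  = min(T, lifetime);              % observation stops at time T
censored = (obstime >= T);                % true where lifetime is only known to exceed T

% Maximum likelihood fits that account for the censored observations.
wblParams  = wblfit(obstime, 0.05, censored);    % [scale shape]
lognParams = lognfit(obstime, 0.05, censored);   % [mu sigma] of log(lifetime)

% Negative log-likelihoods; both models have two parameters,
% so the smaller value indicates the closer fit.
nllWbl  = wbllike(wblParams,  obstime, censored);
nllLogn = lognlike(lognParams, obstime, censored);
```

A lower negative log-likelihood does not by itself mean the model is adequate; a plot against the empirical CDF, as in the Weibull example, remains the more informative check.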
Fitting Custom Univariate Distributions
Use the mle function to fit custom distributions to univariate data. You write code to compute the probability density function (PDF) for the distribution that you want to fit, and mle does most of the remaining work for you.
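To see the mechanics on a case where the answer is known, the sketch below passes a hand-written exponential PDF to mle and compares the result with the closed-form maximum likelihood estimate, which is the sample mean. The simulated data, the seed, and the name pdf_exp are illustrative assumptions, not part of the original demo.

```matlab
rng(0);                                   % assumed seed, for repeatability only
x = exprnd(3, 200, 1);                    % simulated data; the mean of 3 is arbitrary

% Custom PDF: first argument is the data, the rest are the parameters to fit.
pdf_exp = @(x,mu) exp(-x ./ mu) ./ mu;

muHat = mle(x, 'pdf', pdf_exp, 'start', 1, 'lower', 0);

% For the exponential distribution the MLE of mu is mean(x),
% so the numerical result should agree up to optimizer tolerance.
relErr = abs(muHat - mean(x)) / mean(x);
```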
Fitting Custom Distributions: A Zero-Truncated Poisson Example
In some situations, counts that are zero do not get recorded in the data.
For this example, we'll use simulated data from a zero‐truncated Poisson distribution.
randn('state',0); rand('state',0);
n = 75;
lambda = 1.75;
x = poissrnd(lambda,n,1);
x = x(x > 0);
length(x)
% ans = 68
hist(x,[0:1:max(x)+1]);
% The denominator is 1 - Pr{0}
pf_truncpoiss = @(x,lambda) poisspdf(x,lambda) ./ (1-poisscdf(0,lambda));
start = mean(x)
% start = 2.1029
[lambdaHat,lambdaCI] = mle(x, 'pdf',pf_truncpoiss, 'start',start, 'lower',0)
avar = mlecov(lambdaHat, x, 'pdf',pf_truncpoiss);
stderr = sqrt(avar)
% stderr = 0.1827
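The mle result can be cross-checked without the toolbox. For a zero-truncated Poisson, the maximum likelihood estimate satisfies mean(x) = lambda / (1 - exp(-lambda)), so it can be found as the root of a one-dimensional equation. The sketch below does this with fzero, using the sample mean 2.1029 reported above; the names g and lambdaCheck are hypothetical.

```matlab
xbar = 2.1029;                            % sample mean reported in the example (rounded)

% Mean of a zero-truncated Poisson minus the sample mean.
g = @(lambda) lambda ./ (1 - exp(-lambda)) - xbar;

% g changes sign between 0.1 and xbar, so fzero can bracket the root.
lambdaCheck = fzero(g, [0.1 xbar]);
```

The root should agree closely with lambdaHat from mle, up to the rounding of the sample mean.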
Supplying Additional Values to the Distribution Function: A Truncated Normal Example
It sometimes also happens that continuous data are truncated. For example, observations larger than some fixed value might not be recorded because of limitations in the way data are collected. This example shows how to fit a normal distribution to truncated data, using the function mle.
n = 75;
mu = 1;
sigma = 3;
x = normrnd(mu,sigma,n,1);
xTrunc = 4;
x = x(x < xTrunc);
length(x)
% ans = 64
hist(x,[-10:.5:4]);
pdf_truncnorm = @(x,mu,sigma) normpdf(x,mu,sigma) ./ normcdf(xTrunc,mu,sigma);
start = [mean(x),std(x)]
% start = 0.4491 2.3565
[paramEsts,paramCIs] = mle(x, 'pdf',pdf_truncnorm, 'start',start, 'lower',[-Inf 0])
% paramEsts = 1.7136 3.1553
acov = mlecov(paramEsts, x, 'pdf',pdf_truncnorm)
stderr = sqrt(diag(acov))
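A custom PDF passed to mle should integrate to 1 over the range where data can occur. The sketch below checks this numerically for the truncated normal density, using the parameter values from the simulation; the variable name area is hypothetical.

```matlab
xTrunc = 4;                               % truncation point used in the example

% xTrunc is captured when the anonymous function is created,
% so mle treats only mu and sigma as parameters to estimate.
pdf_truncnorm = @(x,mu,sigma) normpdf(x,mu,sigma) ./ normcdf(xTrunc,mu,sigma);

mu = 1; sigma = 3;                        % values used to simulate the data
area = integral(@(x) pdf_truncnorm(x,mu,sigma), -Inf, xTrunc);
```

Here area should be 1 up to numerical integration error. Dividing by normcdf(xTrunc,mu,sigma) is what rescales the normal density to account for the unrecorded values above xTrunc.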
The end
Thank you!
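R = WBLRND(A,B,M,N,...) returns an M-by-N-by-... array of random numbers drawn from the Weibull distribution with scale parameter A and shape parameter B. B determines the shape of the curve, and A stretches or compresses it along the time axis.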
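One way to see the role of the scale parameter: the Weibull CDF is 1 - exp(-(x/A)^B), so at x = A it equals 1 - exp(-1), about 0.632, for any shape B. The sketch below checks this for a few shape values; the values of A and shapes are arbitrary illustration choices.

```matlab
A = 14000;                          % scale (arbitrary illustration value)
shapes = [0.5 1 3];                 % a few shape values (arbitrary)

% CDF evaluated at x = A for each shape value.
FatA = arrayfun(@(B) wblcdf(A, A, B), shapes);

maxDev = max(abs(FatA - (1 - exp(-1))));   % deviation from 1 - exp(-1)
```

In other words, A is the time by which about 63.2% of units are expected to have failed, whatever the shape.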