KSCE Journal of Civil Engineering (2013) 17(4):638-645, DOI 10.1007/s12205-013-0147-x

www.springer.com/12205

Construction Management

The Probability Distribution of Project Completion Times in Simulation-based Scheduling

Dong-Eun Lee*, David Arditi**, and Chang-Baek Son***

Received March 22, 2012/Accepted July 9, 2012

Abstract

The assumption of the normality of the distribution of Project Completion Times (PCTs) in simulation-based scheduling has been generally accepted as the norm. However, it is well established in the literature that PCTs are not always normally distributed and that their distribution and variability are affected by the distribution and variability of activity durations. This paper presents an automated risk quantification method that determines the best fit Probability Distribution Function (PDF) of PCTs. The algorithm is programmed in MATLAB and generates a set of simulation outputs obtained by systematically changing the probability distribution functions that define activities’ durations in a network and analyzes the effect of different distributions of activity durations on the distribution of the PCTs. The procedure is described and the findings are presented. This easy-to-use computerized tool improves the reliability of simulation-based scheduling by calculating the exact PDFs of activity durations, simulating the network, and calculating the exact PDF of PCTs. It also simplifies the tedious process involved in finding the PDFs of the many activity durations, and is a welcome replacement for the normality assumptions used by most simulation-based scheduling researchers.

Keywords: stochastic networks, simulation-based scheduling, project risk analysis

1. Introduction

Simulation-based scheduling enhances the value of traditional scheduling methods by relaxing some of the restrictive assumptions of PERT. It enhances reliability by describing the Project Completion Time (PCT) as a probability distribution. But little attention has been paid to the normality assumption that is built into these scheduling methods, or to the opportunity to improve their reliability by finding the best-fit PDFs of the many activities in a schedule and an exact PDF of PCTs. Even though the statistical process to identify the best-fit PDFs is well established (Ang and Tang, 1975), it has not been effectively used in simulation-based scheduling.

PERT requires three time estimates (optimistic, most likely, and pessimistic times) for each activity. The estimates determine the probability distributions of the activity durations and eventually the completion time of the entire project. PERT assumes that activity duration is a random variable that can be derived by using a simple formula. The expected mean activity durations hence calculated are used to generate PCTs and their variance. Given that the PCT is the sum of the durations of the activities located on the critical path(s), researchers argue that it can be approximated with a normal distribution because it is known that the sum of a large number of independent and identically distributed random variables will approximate a normally distributed random variable. Therefore PERT assumes that PCTs constitute normally distributed random variables even though this assumption has been challenged by many researchers since the seminal study by MacCrimmon and Ryavec (1962).
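The "simple formula" is not spelled out in the text. The sketch below assumes the classical PERT relations, mean = (a + 4m + b) / 6 and variance = ((b - a) / 6)^2, and applies the normal approximation to a single critical path. The three-activity path and the function names are hypothetical and only illustrate the reasoning.

```python
import math
from statistics import NormalDist


def pert_mean_var(optimistic, most_likely, pessimistic):
    """Classical PERT estimates of an activity's mean and variance."""
    mean = (optimistic + 4.0 * most_likely + pessimistic) / 6.0
    var = ((pessimistic - optimistic) / 6.0) ** 2
    return mean, var


def path_completion_probability(estimates, deadline):
    """Normal approximation of P(PCT <= deadline) for one critical path.

    estimates: list of (optimistic, most_likely, pessimistic) tuples.
    Activities are assumed independent, so means and variances add.
    """
    stats = [pert_mean_var(*e) for e in estimates]
    total_mean = sum(m for m, _ in stats)
    total_sd = math.sqrt(sum(v for _, v in stats))
    return NormalDist(total_mean, total_sd).cdf(deadline)


# Hypothetical three-activity critical path (durations in days).
critical_path = [(2, 4, 6), (3, 5, 13), (1, 2, 3)]
probability = path_completion_probability(critical_path, deadline=12.0)
print(f"P(PCT <= 12) = {probability:.3f}")
```

The deadline here equals the sum of the PERT means, so the normal approximation returns one half. This is the same reasoning that puts PERT at a 50% probability of meeting its own expected completion time.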

The large majority of simulation-based scheduling methods assume implicitly that PCTs follow a normal distribution too. For example, Ang and Tang (1975) assumed that the PDF of PCTs is the same as that of activity durations, which is a Gaussian random variable. Halpin and Riggs (1992) proposed a CYCLONE-CPM simulation approach to obtain the average project completion time using the CYCLONE system (1990). Lu and AbouRizk (2000) calculated the probability of completing a project by a specified duration using a theoretical normal distribution in their simplified CPM/PERT simulation model. The assumption of normality in calculating the PCT has been accepted and used without question and without examining its validity. But if this assumption is not correct, it may lead to errors in the results. That is why the best fit PDF of PCTs needs to be determined empirically, and automatically, so that the large networks frequently encountered in practice can be handled.

It is noteworthy that durations of all activities are not always identically distributed, nor necessarily independent from each other. Sometimes the durations of different activities are modeled using different Probability Distribution Functions (PDFs) having unique parameters. Some activity durations may have very long tails, and may be skewed to the left or to the right. In such cases, it would be incorrect to assume that PCTs are normally distributed.

This paper presents an automated method that generates the best fit PDF that characterizes the PCTs obtained from network simulations, given that historical activity durations are assigned specific PDFs. The algorithm is implemented by using MATLAB (Chapra and Canale, 2002; Schilling and Harris, 2000). The best fit PDF of PCTs can be predicted more accurately and with more confidence if one uses this method, which integrates the automated tool that finds the best fit PDF into stochastic simulation-based scheduling.

*Member, Associate Professor, School of Architecture & Civil Engineering, KyungPook National University, Daegu 702-701, Korea (Corresponding Author, E-mail: [email protected])
**Professor, Dept. of Civil and Architectural Engineering, Illinois Inst. of Tech., Chicago, IL 60616, USA (E-mail: [email protected])
***Member, Professor, Dept. of Architectural Engineering, Semyung University, Chungbuk 390-711, Korea (E-mail: [email protected])

2. PDFs Used in Simulation-based Scheduling

A detailed review of simulation-based scheduling methods was conducted by Adlakha and Kulkarni (1989). Much of the research deals with project completion times (Ahuja and Nandakumar, 1985; Barraza et al., 2004; Dodin and Sirvanci, 1990; Lee, 2005; Lee and Arditi, 2006; Sculli, 1983), whereas only a few researchers have attempted to determine the probability distribution of PCTs (Barraza et al., 2004; Cottrell, 1999; Lu and AbouRizk, 2000). The research studies in this field can be categorized into: (1) exact methods, (2) approximation methods, and (3) simulation methods.

The exact methods (Dodin, 1985; Fisher et al., 1985; Hagstrom, 1990; Kulkarni and Adlakha, 1986) use a direct approach, but make some restrictive assumptions resulting in limitations. For example, Hagstrom (1990) assumes that the probability distribution of the activity duration is a discrete distribution.

The approximation methods (Dodin, 1985; Dodin and Sirvanci, 1990; Golenko-Ginzburg, 1989; Sculli, 1989; Sculli and Wong, 1985) determine the distribution of PCTs by using an indirect approach. For example, Sculli (1983) proposes a method to compute the mean and variance of the PCT approximately. Cottrell (1999) proposes a simplified PERT by reducing the number of estimates of activity durations from three to two (i.e., most likely and pessimistic times). Finally, Kamburowski (1985) attempts to use the normal distribution rather than the PERT-beta to model activity duration.

The simulation methods (Barraza et al., 2004; Lee, 2005; Lee and Arditi, 2006; Lee and Shi, 2004; Lu and AbouRizk, 2000; Sculli, 1983) obtain the desired PCT statistics with activity durations that have specific PDFs. After running tests, Sculli (1983) concludes that simulation is more accurate and more economical than PERT. In addition, Sculli (1989) proposes variance reduction techniques using a multivariate normal distribution to model PCTs. Barraza et al. (2004) present stochastic S-curves, which provide probability distributions for time and cost at every intermediate point and at completion. After Lee (2008) identified the non-normality issues in simulation-based scheduling, Kim et al. (2009) proposed a statistical process that effectively identifies the best-fit PDFs.

The reliability of network calculations depends on the probability distribution of PCTs. The distribution of PCTs may vary from a normal to an asymmetric distribution depending on factors such as the size and configuration of the network, the dependence between paths, the probability distributions assigned to activity durations, and the number of competing and/or dominating paths.
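A minimal sketch of this effect, under assumptions that are not taken from the paper: a hypothetical network with two competing parallel paths of five activities each, simulated once with near-symmetric (normal) durations and once with long-tailed (exponential) durations. The sample skewness of the resulting PCTs indicates how far the output departs from symmetry.

```python
import random
import statistics


def sample_skewness(values):
    """Moment-based sample skewness (third standardized moment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate_pcts(draw_duration, n_runs, n_activities=5, seed=1):
    """PCTs of a hypothetical network made of two competing parallel paths."""
    rng = random.Random(seed)
    pcts = []
    for _ in range(n_runs):
        path_a = sum(draw_duration(rng) for _ in range(n_activities))
        path_b = sum(draw_duration(rng) for _ in range(n_activities))
        pcts.append(max(path_a, path_b))  # the longer path governs the PCT
    return pcts


# Illustrative parameters: every activity has a mean duration of 1.0.
normal_pcts = simulate_pcts(lambda rng: rng.gauss(1.0, 0.2), 20000)
expo_pcts = simulate_pcts(lambda rng: rng.expovariate(1.0), 20000)

skew_normal = sample_skewness(normal_pcts)
skew_expo = sample_skewness(expo_pcts)
print(f"normal durations:      skewness = {skew_normal:.2f}")
print(f"exponential durations: skewness = {skew_expo:.2f}")
```

Under these assumptions the exponential case should give a clearly right-skewed set of PCTs, while the normal case stays close to symmetric. The competing paths alone introduce a small positive skew because the PCT is a maximum rather than a sum.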

Thirty years of experience have been accumulated on the performance of simulation-based scheduling methods. But the assumption that the simulation output is normally distributed has been used over and over again because obtaining the mean and standard deviation of a normal distribution is easier (Ang and Tang, 1975; Halpin and Riggs, 1992; Lu and AbouRizk, 1990). After Perry and Greig (1975) found that a beta distribution is appropriate when the distribution of a data set is not known, identifying the parameters of this PDF has been studied by several researchers. For example, AbouRizk et al. (1991) presented a procedure that estimates the beta parameters (α and β). AbouRizk and Halpin (1992) demonstrated that most earthmoving construction operations can be described by a beta PDF. Farid and Koning (1994), Maio and Schexnayder (2000), Fente et al. (2002), and Schexnayder et al. (2005) determined the parameters of a beta PDF and confirmed that the beta distribution is a close fit for modeling construction task time distributions. These methods are particularly useful when not enough data are available or when subjective estimates are involved. Beyond this work on activity durations, however, efforts to advance simulation-based scheduling have remained stagnant. It should be noted that the normality assumption of the PCT may misrepresent the variation in the data obtained from simulation and may limit the project scheduler’s ability to draw inferences and make informed predictions and/or decisions.

3. Methodology

The method proposed in the study is described in the flowchart presented in Fig. 1. The algorithm runs many simulation experiments using activity durations modeled either with one specific PDF for all activities or with different PDFs for different activities. Several sets of PCTs are generated by the algorithm presented in Fig. 1.

● Step ①: A network schedule is modeled using Primavera Project Planner (P3). The schedule data exported from P3 are read by the system.

● Step ②: The schedule data read in step ① are converted into an appropriate data structure for simulation runs. The converted schedule data are saved in an Excel spreadsheet file.

● Step ③: The deterministic activity duration data are read by the system.

● Step ④: In CPM mode, a set of predefined deterministic durations are assigned to the activities.

● Step ⑤: Deterministic CPM calculations are performed. The PCT is saved in the computer’s memory.

● Step ⑥: In PERT mode, activities’ most likely times are represented by predefined deterministic durations. The most likely times are then used to compute optimistic and pessimistic times based on the assumption that the Coefficient of Variation (COV) is 20% as in Ang and Tang (1975). It is noteworthy that the COV can be adjusted to suit the user’s preference. The three time estimates (i.e., optimistic, most-likely, and pessimistic times) hence generated are used in the probabilistic PERT mode.

● Step ⑦: Probabilistic PERT calculations are performed. The mean and variance of the PCTs are computed and saved in the computer’s memory.

● Step ⑧: In Simulation mode, the durations of the activities are generated by simulation. The deterministic activity durations read in step ③ are used as the mean values of the exponential distribution as described later in Case I. Since the coefficient of variation is taken as 20%, the system automatically generates a user defined number of random variates for each activity by using the activity’s probability distribution. The random variates of activity durations are saved in an Excel spreadsheet file.

● Step ⑨: The Maximum Likelihood Estimates (MLEs) of the parameters of the PDFs (i.e., normal, lognormal, beta, uniform, exponential, gamma, generalized extreme value, extreme value, and Weibull, etc.) are computed by using the activity durations obtained in step ⑧.

● Step ⑩: A PDF is selected with respect to the MLE information calculated in step ⑨. It is also possible to specify a PDF depending on the user’s preference. Seven PDFs (normal, lognormal, beta, uniform, triangular, exponential, and Weibull) were specified in this study in addition to the deterministic and PERT-Beta values.

● Step ⑪: Stochastic schedule simulation (S3) is conducted. The S3 algorithm is described in the steps that follow, starting with step ⑫.

● Step ⑫: The number of iterations is set by the user of the system. In the experiments conducted in this study, 3,000 activity durations were generated for each activity for each PDF.

● Step ⑬: The initial number of iterations is set to zero.

● Step ⑭: Activity durations are generated using a random number generator that produces random variates using the PDFs and their MLEs estimated in step ⑨. The random number generation functions (i.e., Normrnd for normal, Lognrnd for lognormal, Betarnd for beta, Unifrnd for uniform, Trirnd for triangular, Exprnd for exponential, and Webrnd for Weibull distributions, etc.) available in MATLAB are used to generate activity durations based on the PDF assigned to an activity. The kernel smoothing function in MATLAB is used to convert the discrete data into a continuous distribution of PCTs.

● Step ⑮: The forward pass algorithm is executed by using the random durations generated in step ⑭. PCTs are obtained after performing the CPM calculations. The project completion time corresponds to the event time of the end node.

● Step ⑯: After the maximum number of specified simulation iterations is completed, the PCTs obtained in step ⑭ are collected and saved in a vector. When 3,000 simulation iterations are completed, 3,000 sets of PCTs are obtained and saved in respective vectors.

Fig. 1. Algorithm to Establish the Distribution of PCTs Using Simulation

● Step ⑰: If the total number of simulation runs is below the predefined number of iterations, the program repeats the iteration loop that starts at step ⑬. The algorithm moves to the next step as soon as the number of iterations reaches the maximum number of iterations set by the user (3,000 in this study). The “for repetition structure” is used to repeat steps ⑬ to ⑮ for the specified number of iterations.

● Step ⑱: After getting a set of 3,000 PCTs, the minimum number of simulation runs is calculated (Ang and Tang, 1975; Lee and Arditi, 2006).

● Step ⑲: The system checks whether the simulation experiment passes the maturity test, i.e., whether more than 3,000 iterations are necessary. The algorithm checks whether the minimum number of simulation runs determined in step ⑱ is greater than the 3,000 iterations set automatically by the system at the outset in step ⑫.

● Step ⑳: If the minimum number of simulation runs is equal to or smaller than 3,000, the simulation experiment is considered to have reached appropriate maturity (Lee and Arditi, 2006). Otherwise, the workspace is cleared, the maximum number of simulation iterations is set to the value calculated in step ⑱, and the algorithm returns to step ⑪ and repeats the simulation using the “for repetition structure”.

● Step ㉑: When the simulation experiments reach maturity, the best-fit PDF and its parameters describing the sets of PCTs are identified by using the automated distribution fitting tool named BestFitPDF implemented in MATLAB. Then the PDFs and their parameters are saved in the computer’s memory.

● Step ㉒: The system checks whether the simulation experiments using the PDFs are completed.

● Step ㉓: The results computed in the deterministic CPM mode (refer to step ⑤), the mean and standard deviation computed in the probabilistic PERT mode (refer to step ⑦), and the best-fit PDFs and their parameters computed in the simulation mode (refer to step ㉑) for all PDFs are presented to the system user in graphical format. The Probability Distribution Functions (PDFs) and Cumulative Distribution Functions (CDFs) of the PCTs, each describing 3,000 durations, are plotted using the vectors holding the PCTs generated at each iteration. The theoretical cumulative distribution functions are plotted rather than the corresponding empirical cumulative distribution functions for the samples. Tests are conducted to identify the best fit for each PDF and to compute its statistical parameters (i.e., means and standard deviations) by using an automated risk quantification method. The best fit PDF is identified by inspecting the Log likelihood values. The method of maximum likelihood is used for parameter estimation. An automated distribution fitting algorithm named BestFitPDF (pseudocode available upon request) was developed by the authors in MATLAB to find the distribution that best fits the PCTs at hand. As per Ang and Tang’s recommendation (Ang and Tang, 1975), the PDF with the largest Log likelihood value is selected as the best fit.
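The simulation loop and forward pass described in the steps above can be sketched as follows. This is an illustrative Python sketch, not the authors' MATLAB program: the four-activity network, the sampler parameters, and all names (`NETWORK`, `SAMPLERS`, `forward_pass`, `simulate`) are hypothetical, and an activity-on-node layout is assumed for brevity.

```python
import random

# Hypothetical network: activity -> (predecessors, mean duration).
# Activities are listed in topological order (predecessors come first).
NETWORK = {
    "A": ([], 3.0),
    "B": (["A"], 4.0),
    "C": (["A"], 5.0),
    "D": (["B", "C"], 2.0),
}

COV = 0.20  # coefficient of variation used for the normal sampler

# One sampler per activity, so different activities can use different PDFs.
# The distribution parameters are illustrative only.
SAMPLERS = {
    "A": lambda rng, m: max(0.0, rng.gauss(m, COV * m)),
    "B": lambda rng, m: rng.expovariate(1.0 / m),
    "C": lambda rng, m: rng.triangular(0.5 * m, 1.5 * m, m),
    "D": lambda rng, m: rng.uniform(0.8 * m, 1.2 * m),
}


def forward_pass(durations):
    """Return the PCT, i.e. the latest early finish time in the network."""
    early_finish = {}
    for act, (preds, _) in NETWORK.items():
        early_start = max((early_finish[p] for p in preds), default=0.0)
        early_finish[act] = early_start + durations[act]
    return max(early_finish.values())


def simulate(n_iterations=3000, seed=42):
    """Draw random durations and run one forward pass per iteration."""
    rng = random.Random(seed)
    pcts = []
    for _ in range(n_iterations):
        durations = {a: SAMPLERS[a](rng, m) for a, (_, m) in NETWORK.items()}
        pcts.append(forward_pass(durations))
    return pcts


deterministic_pct = forward_pass({a: m for a, (_, m) in NETWORK.items()})
pcts = simulate()
mean_pct = sum(pcts) / len(pcts)
print(f"CPM PCT = {deterministic_pct:.2f}, simulated mean PCT = {mean_pct:.2f}")
```

Keeping one sampler per activity mirrors the idea of assigning either the same PDF or different PDFs to the activities. The vector `pcts` plays the role of the collected PCTs that are later passed to the distribution-fitting step.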

The program described above has the capability to handle a network that consists of an unlimited number of activities very efficiently in simulation. It directly uses schedule data exported from Primavera Project Planner (P3). The only other requirement is a file containing historical activity duration data. So if one has access to historical activity duration data and a P3 schedule file that includes information about activity ID, activity name, predecessors, successors, and activity duration, the program will automatically find the PDFs of the many activities in the network, simulate the network and find the exact PDF of PCTs. Furthermore, the program computes four different modes, (1) CPM, (2) PERT, (3) normality based simulation, and (4) best fit based simulation, for comparison purposes.
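BestFitPDF itself is not listed in the paper, so the following is only a sketch of the selection rule it is said to apply: fit each candidate PDF by maximum likelihood and keep the one with the largest log likelihood. Three candidates with closed-form MLEs are assumed here (normal, lognormal, exponential); the function names and the synthetic sample are hypothetical.

```python
import math
import random
import statistics


def loglik_normal(x):
    """Maximized normal log likelihood (the MLE variance uses 1/n)."""
    n = len(x)
    var = statistics.pvariance(x)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def loglik_lognormal(x):
    """Maximized lognormal log likelihood; requires strictly positive data."""
    logs = [math.log(v) for v in x]
    # Normal likelihood of log(x) plus the Jacobian term of the transform.
    return loglik_normal(logs) - sum(logs)


def loglik_exponential(x):
    """Maximized exponential log likelihood (MLE rate = 1 / sample mean)."""
    return -len(x) * (math.log(statistics.fmean(x)) + 1.0)


CANDIDATES = {
    "normal": loglik_normal,
    "lognormal": loglik_lognormal,
    "exponential": loglik_exponential,
}


def best_fit(x):
    """Return the candidate with the largest log likelihood and all scores."""
    scores = {name: fn(x) for name, fn in CANDIDATES.items()}
    return max(scores, key=scores.get), scores


# Synthetic, right-skewed "PCT" sample used only to exercise the function.
rng = random.Random(7)
sample_pcts = [rng.lognormvariate(2.3, 0.4) for _ in range(3000)]
best_name, scores = best_fit(sample_pcts)
print(best_name, {k: round(v, 1) for k, v in scores.items()})
```

Comparing raw log likelihoods is reasonable here because the candidates have similar numbers of parameters. A tool that mixes one-, two-, and three-parameter families might prefer a penalized criterion, which the paper does not discuss.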

The simulation method proposed in Fig. 1 provides modeling flexibility to assign different PDFs to the durations of different activities in a network. It is noteworthy that the simulation method can define activities’ durations using the same PDF for all activities on a network or different PDFs for different activities. This is a major advancement compared with existing research by Kim et al. (2009) and Lee (2008). It facilitates the running of experiments in simulation-based scheduling. The predictability of simulation-based scheduling may be improved by applying the automated statistical method that finds the best fit PDF of PCTs, from which one can infer the probability of completion of a project within a PCT with higher confidence.

4. Case Studies

4.1 Case I

The network shown in Fig. 2 was reproduced from Abdelkader’s work (2004) to demonstrate the procedure described in the preceding section. It consists of an activity-on-arrow network composed of 13 nodes and 24 activities. Each activity is assigned a most likely duration by the user. The durations are either used directly for CPM or converted to appropriate parameters depending on the PDF that is selected to model the activity durations.

Fig. 2. Network Data for Case I (Reproduced from Abdelkader, 2004)

After defining activities’ durations as PDFs (or deterministic or PERT durations), the algorithm presented in Fig. 1 was used to find the PDF that best describes the distribution of the PCTs. Simulation was performed by using seven PDFs for activity durations (normal, lognormal, beta, uniform, triangular, exponential, and Weibull), hence generating seven sets of simulation outputs for PCTs.

PCTs were generated automatically by the algorithm at each of the 3,000 simulation iterations. The algorithm plotted the theoretical PDFs and CDFs of the PCTs. It then performed tests to find the best fit for each PDF and calculated the respective parameters using BestFitPDF, the tool developed by the authors and described briefly earlier.

The results of CPM, PERT and simulation using various PDFs that represent the best fits were compared and analyzed as shown in Figs. 3 and 4. This information includes the means, standard deviations, and probabilities of completion presented in the right half of Table 1 (columns 7 to 11). The probabilities to complete a project within the PCT dictated by deterministic CPM (i.e., 8.3333 in this example) were computed using the parameters of the best fit PDFs and are presented in column (11) of Table 1. The findings indicate that the lognormal distribution is the best fitting distribution for PCTs when asymmetric distributions (e.g., Weibull, exponential and lognormal) with a long tail are used to model activity duration. On the other hand, the normal distribution appears to be the best fitting distribution for PCTs when symmetric distributions (e.g., uniform, triangular, normal and beta) are used to model activity duration.

In addition, if one uses the CPM and PERT PCT as benchmark (8.3333), all schedules that use the PDFs tested generate longer PCTs. The exponential and Weibull distributions appear to be particularly conservative. When the probability of completion is considered for the different PDFs, it can be seen that the exponential distribution results in the most conservative PCT whereas CPM results in the least conservative PCT, with the PCTs of the other distributions ranking as follows:

Exponential > Weibull > Uniform > Triangular > Lognormal > Normal > Beta > PERT > CPM

Fig. 3. Comparative Study of Distributions of PCTs

Fig. 4. Comparative Study of Cumulative Distributions of PCTs

Table 1. Statistics of Normal vs Best Fit PDFs Representing PCTs, Using Different PDFs for Activity Durations

Columns (3) to (6): statistics of normally distributed PCTs (normal distribution for all cases). Columns (7) to (11): statistics of PCTs fitted with best PDFs. Column (12): percent difference in completion probabilities between best fit PDF and normal PDF, [(11)-(6)] / (11).

| (1) Method | (2) PDF of activity duration | (3) Assumed distribution | (4) Mean | (5) Standard deviation | (6) Probability of completing in 8.3333 | (7) Best fitting distribution identified | (8) Log likelihood | (9) Mean | (10) Standard deviation | (11) Probability of completing in 8.3333 | (12) Percent difference |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Simulation | Exponential | Normal | 10.8058 | 3.0017 | 21.290% | Lognormal | -7580.53 | 10.7868 | 3.2305 | 23.825% | 11% |
| Simulation | Weibull | Normal | 10.8442 | 3.4079 | 24.800% | Lognormal | -734.361 | 10.4494 | 2.9714 | 25.669% | 3% |
| Simulation | Uniform | Normal | 8.62264 | 0.8691 | 36.959% | Normal | -3835.28 | 8.62264 | 0.8691 | 36.959% | 0% |
| Simulation | Triangular | Normal | 8.47445 | 0.6308 | 41.147% | Normal | -2873.86 | 8.47445 | 0.6308 | 41.147% | 0% |
| Simulation | Lognormal | Normal | 8.3754 | 0.3164 | 45.440% | Lognormal | -91.7136 | 8.40673 | 0.3294 | 41.906% | 8% |
| Simulation | Normal | Normal | 8.35890 | 0.3214 | 46.816% | Normal | -84.6774 | 8.35890 | 0.3214 | 46.816% | 0% |
| Simulation | Beta | Normal | 8.33799 | 0.3043 | 49.385% | Normal | -68.2551 | 8.33799 | 0.3043 | 49.385% | 0% |
| PERT | PERT-Beta | | 8.3333 | 1.3889 | 50% | | | 8.3333 | 1.3889 | 50% | |
| CPM | Deterministic | | 8.3333 | - | 100% | | | 8.3333 | - | 100% | |

On the other hand, when the variability is considered for the different PDFs, it is observed that the variability ranking is as follows:

(Exponential, Weibull) > PERT > (Uniform, Triangular) > (Normal, Lognormal, Beta) > CPM

The probabilities of completion assuming a normal distribution for PCTs in all cases are presented in column (6) of Table 1. A comparison of the probabilities presented in columns (6) and (11) in Table 1 indicates that the probabilities obtained by the two modes (i.e., normality based simulation, and best fit based simulation), differ by 11%, 3%, and 8%, respectively, when exponential, Weibull, and lognormal distributions are used to model activity durations in the case study. This finding confirms that assuming normally distributed PCTs may compromise the accuracy of the simulation output and lead to erroneous results. In addition, this case study confirms that the PDF of PCTs is highly dependent on the choice of a PDF to model activity durations, even though appropriate MLEs may have been provided for each activity’s PDF.
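The arithmetic behind columns (6) and (12) can be checked with a short sketch. It uses values quoted from Table 1; the function names are hypothetical, and the absolute value in the percent difference is an assumption made here because the table reports a positive 8% for the lognormal case even though (11) is smaller than (6).

```python
from statistics import NormalDist

BENCHMARK_PCT = 8.3333  # deterministic CPM completion time in Case I


def normal_completion_probability(mean, sd, deadline=BENCHMARK_PCT):
    """Column (6): P(PCT <= deadline) when PCTs are assumed normal."""
    return NormalDist(mean, sd).cdf(deadline)


def percent_difference(p_best_fit, p_normal):
    """Column (12): |(11) - (6)| / (11), with the absolute value assumed."""
    return abs(p_best_fit - p_normal) / p_best_fit


# Uniform activity durations: mean and standard deviation from Table 1.
p_uniform = normal_completion_probability(8.62264, 0.8691)

# Probabilities from columns (11) and (6) for the three asymmetric cases.
diff_exponential = percent_difference(0.23825, 0.21290)
diff_weibull = percent_difference(0.25669, 0.24800)
diff_lognormal = percent_difference(0.41906, 0.45440)

print(f"uniform, normal-based probability: {p_uniform:.5f}")
print(f"percent differences: {diff_exponential:.1%}, "
      f"{diff_weibull:.1%}, {diff_lognormal:.1%}")
```

Rounded to whole percentages, the three differences agree with the 11%, 3%, and 8% quoted in the text, and the uniform-case probability agrees with the 36.959% listed in the table.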

Therefore, this case study confirms that the normality assumption, which has frequently been used in construction simulation studies, may lead to misleading decisions. In addition, the information in Table 1 confirms that Kamburowski’s (1985) and Cottrell’s (1999) attempts to use the normal distribution rather than the PERT-beta to model activity duration are acceptable even though they may result in some deviation. The statistics of PCTs may vary depending on several factors (e.g., the size and configuration of the network, the dependence between paths, the PDFs assigned to activity durations, and the number of competing and/or dominating paths). However, the proposed system would always provide information similar to the information in Table 1.

4.2 Case II

Because the critical path(s) of the network used in Case I seldom change when the activity durations are changed stochastically, another case study was undertaken where the network generates different critical paths when activity durations are changed. The network illustrated in this case study has critical paths that compete very intensively with each other as activity durations change.

The network shown in Fig. 5 was adapted from Lee and Arditi’s work (2006) and is used to verify the effects of variability in the distribution of activity durations on the distribution of the PCTs under the circumstances of changing critical paths. This time, a sensitivity analysis was performed by changing the range and shape parameters of the beta distribution (i.e., q and r) that fit the set of data under study. With the q and r parameters held equal to each other, the PDFs of each set of PCTs were plotted as shown in Fig. 6. When q = r = 1, q = r = 2, q = r = 3, and q = r = 4, the means and standard deviations were calculated as µ1 = 303.5842 and σ1 = 64.3486; µ2 = 288.4372 and σ2 = 50.2670; µ3 = 282.0477 and σ3 = 40.9736; and µ4 = 277.0368 and σ4 = 37.8118, respectively. These findings show that the greater the variability assigned to activity durations, the wider the scattering of the PCTs. In addition, when the parameter r was changed from 1 to 6 while holding q constant, the distribution of the PCTs duplicated the skewness of the activity durations’ distribution. The parameters (q and r) of the beta distribution were computed in this case by assuming that the coefficient of variation is 20%, as used by Ang and Tang (1975). But when the coefficient of variation was changed to 10%, the resulting PDF (µ = 182.5377 and σ = 60.7378) of the PCTs was affected significantly. Statistically speaking, this case suggests that the more dispersed the beta distributions of the activity durations, the more the probability of completing the project is underestimated compared to the PERT benchmark.
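The sensitivity analysis on q = r can be sketched in Python as follows. This is only an illustration under stated assumptions: the three-path network and its duration bounds are hypothetical (they are not the Fig. 5 data), and q and r are assumed to map directly onto the two shape parameters of the standard beta distribution scaled to each activity's range.

```python
import random
from statistics import mean, stdev

# Hypothetical network (not Fig. 5): three parallel paths of three activities,
# each activity given as (minimum, maximum) duration bounds.
# The PCT of one run is the length of the longest path.
PATHS = [
    [(20, 60), (30, 90), (25, 75)],
    [(25, 65), (20, 80), (30, 80)],
    [(15, 70), (35, 85), (20, 70)],
]


def simulate_pcts(q, r, n_runs=5000, seed=7):
    """Simulate PCTs with beta(q, r) durations scaled to each activity's bounds."""
    rng = random.Random(seed)
    pcts = []
    for _ in range(n_runs):
        path_lengths = [
            sum(lo + (hi - lo) * rng.betavariate(q, r) for lo, hi in path)
            for path in PATHS
        ]
        pcts.append(max(path_lengths))
    return pcts


# Larger q = r concentrates each duration around the middle of its range.
results = {}
for shape in (1, 2, 3, 4):
    pcts = simulate_pcts(shape, shape)
    results[shape] = (mean(pcts), stdev(pcts))
    print(f"q = r = {shape}: mean = {results[shape][0]:.2f}, "
          f"std = {results[shape][1]:.2f}")
```

Because the three paths have similar expected lengths, the critical path changes from run to run, so in this sketch both the spread and the mean of the PCTs shrink as the activity durations become less dispersed.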

Fig. 5. Network Data for Case II (Adapted from Lee and Arditi, 2006)

Fig. 6. The Distributions of PCTs when the Beta Distribution is Used to Define Activity Duration

5. Conclusions

It has been stated in the literature by many researchers and it has been proven by the two case studies presented in this paper that: (1) PCTs do not always display a normal distribution, and (2) the distribution and variability of PCTs are affected by the distribution and variability of activity durations. Nevertheless, previous research studies relative to the existing methods (i.e., most simulation-based scheduling) have assumed that PCTs are normally distributed because it is easier and simpler to compute the probability of project completion when one assumes normality. This study presents an automated tool developed by using the facilities of MATLAB that generates the best fit PDF that characterizes the PCTs given that activities are assigned specific PDFs. This tool improves the reliability of stochastic simulation-based scheduling.

This research found that a normal distribution should not always be assumed in computing PCT if one wants to achieve reliable results. As shown in Table 1, if one assumes that PCTs are normally distributed, PERT may lead to an approximately 10 to 30% more optimistic PCT than when activity durations are generated assuming triangular, uniform, exponential, and Weibull functions. It is noteworthy that PERT underestimates the PCT by approximately 10, 20, and 30% compared to when activity durations are modeled assuming triangular (or uniform), exponential, and Weibull functions, respectively, as shown in Fig. 4. Figs. 3 and 4 show that the distribution defining PCTs should not always be assumed to be normal. Also, as shown in Figs. 5 and 6, the distribution of PCTs is affected significantly by the distribution and variability of activity durations.

In the case studies presented in this paper, all activities of a network have the same PDF. However, in practice, it is possible that different activities display different characteristics that necessitate the use of different PDFs. It is possible to use different PDFs for different activities in the same network when calculating a PCT using the risk quantification method presented in this paper. Using the automated system, one would be in a better position to make reliable predictions in determining PCT by reducing the error contributed by inappropriate assumptions and flawed analysis.

Assigning realistic individual PDFs to represent activity durations and using simulation and “goodness of fit” principles to identify the true distribution of PCTs improves the reliability of the network simulation model. This process eliminates the errors contributed by the assumption of normality in the distribution of PCTs. The PCT can be predicted more reliably and with more confidence if one uses the automated method proposed in this study that allows users to establish the true distribution of the PCTs based on the different PDFs that represent the different activities’ durations.
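The idea of assigning a different PDF to each activity and then selecting the distribution of the PCTs by goodness of fit can be sketched in Python as below. This is a simplified illustration, not the authors' MATLAB procedure: the network, the distribution parameters, the three candidate families, and the use of the Kolmogorov-Smirnov distance as the ranking criterion are all assumptions, and the candidate parameters are plain sample-based estimates.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def simulate_pcts(n_runs=4000, seed=3):
    """Hypothetical network: A, then B and C in parallel, then D.
    Each activity is given its own (assumed) duration PDF."""
    rng = random.Random(seed)
    pcts = []
    for _ in range(n_runs):
        a = rng.triangular(8, 20, 12)       # low, high, mode
        b = rng.uniform(15, 30)
        c = rng.expovariate(1 / 18.0)       # mean duration of 18
        d = rng.lognormvariate(2.5, 0.4)    # mu and sigma of log-duration
        pcts.append(a + max(b, c) + d)
    return pcts


def ks_statistic(sorted_data, cdf):
    """Largest gap between the empirical CDF and a candidate CDF."""
    n = len(sorted_data)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(sorted_data)
    )


def rank_candidates(data):
    """Fit each candidate to the data and sort by KS distance (best first)."""
    data = sorted(data)
    logs = [math.log(x) for x in data]
    normal = NormalDist(mean(data), stdev(data))
    log_normal = NormalDist(mean(logs), stdev(logs))
    rate = 1.0 / mean(data)
    candidates = {
        "normal": normal.cdf,
        "lognormal": lambda x: log_normal.cdf(math.log(x)),
        "exponential": lambda x: 1.0 - math.exp(-rate * x),
    }
    scores = {name: ks_statistic(data, cdf) for name, cdf in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])


ranking = rank_candidates(simulate_pcts())
for name, distance in ranking:
    print(f"{name}: KS distance = {distance:.4f}")
```

The probability of completing the project by a target date would then be read from the CDF of the top-ranked candidate rather than from a normal distribution chosen in advance. A fuller implementation would add more candidate families and a formal test at a chosen significance level.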

The main contribution of this study is the development of an easy-to-use computerized tool that is a welcome addition to stochastic simulation-based scheduling researchers’ arsenal. When calculating the probability of project completion, the tool presented here is as easy and simple to use as assuming a normal distribution. Since it is well established that PCTs are not always normally distributed, there is every reason to use this tool rather than assuming normality.

It is cumbersome, time-consuming, and demanding to collect reliable activity duration data in most situations, because it is difficult to separate the effect of operation disturbances, which are defined as unexpected occurrences causing an interruption or a delay in the execution of tasks and causing a significant discrepancy between the target and actual productivity. The dependency of project scheduling on the availability of historical activity duration data can be eliminated by using simulated activity duration data as a feasible substitute to historical activity duration data, hence reducing the burden involved in collecting vast amounts of input data. On the other hand, if one wants to make use of real (not simulated) data, one should consider further research into developing a formal method to collect historical activity duration data and an algorithm that automatically retrieves the historical activity durations from a project data warehouse.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2011-0027641). The contribution of the Ministry of Education, Science and Technology is gratefully acknowledged. This research was also partially supported by the Kyungpook National University Research Fund, 2012.

References

Abdelkader, H. Y. (2004). “Evaluating project completion times when activity times are Weibull distributed.” European Journal of Operational Research, Vol. 157, No. 3, pp. 704-715.

AbouRizk, S. M. and Halpin, D. W. (1992). “Statistical properties of construction duration data.” Journal of Construction Engineering and Management, ASCE, Vol. 118, No. 3, pp. 525-543.

AbouRizk, S. M., Halpin, D. W., and Wilson, J. R. (1991). “Visual interactive fitting of beta distributions.” Journal of Construction Engineering and Management, ASCE, Vol. 117, No. 4, pp. 589-605.

Adlakha, V. G. and Kulkarni, V. G. (1989). “A classified bibliography of research on stochastic PERT networks: 1966-1987.” INFOR, Vol. 27, No. 3, pp. 272-296.

Ahuja, N. T. H. and Nandakumar, V. (1985). “Simulation model to forecast project completion time.” Journal of Construction Engineering and Management, ASCE, Vol. 111, No. 4, pp. 325-342.

Ang, A. H.-S. and Tang, W. H. (1975). Probability concepts in engineering planning and design: Volume I - basic principles, Wiley, New York, NY.

Barraza, A. G., Back, W. E., and Mata, F. (2004). “Probabilistic forecasting of project performance using stochastic S curves.” Journal of Construction Engineering and Management, ASCE, Vol. 130, No. 1, pp. 25-32.

Chapra, C. S. and Canale, P. R. (2002). Numerical methods for engineers with software and programming applications, Fourth Edition, McGraw-Hill, New York, NY.

Cottrell, D. W. (1999). “Simplified Program Evaluation and Review Technique (PERT).” Journal of Construction Engineering and Management, ASCE, Vol. 125, No. 1, pp. 16-22.

Dodin, B. M. (1985). “Approximating the distribution functions in stochastic networks.” Computers and Operations Research, Vol. 12, No. 3, pp. 251-264.

Dodin, B. M. (1985). “Bounding the project completion time distribution in PERT networks.” Operations Research, Vol. 24, No. 4, pp. 862-881.

Dodin, B. M. and Sirvanci, M. (1990). “Stochastic networks and the extreme value distribution.” Computers and Operations Research, Vol. 17, No. 4, pp. 397-409.

Farid, F. and Koning, T. L. (1994). “Simulation verifies queuing program for selecting loader-truck fleets.” Journal of Construction Engineering and Management, Vol. 120, No. 2, pp. 386-404.

Fente, J., Schexnayder, C., and Knutson, K. (2002). “Defining a probability distribution function for construction simulation.” Journal of Construction Engineering and Management, Vol. 126, No. 3, pp. 234-241.

Fisher, D. L., Saisi, D., and Goldstein, W. M. (1985). “Stochastic PERT networks: OP diagrams, critical paths and the project completion time.” Computers and Operations Research, Vol. 12, No. 5, pp. 471-482.

Golenko-Ginzburg, D. (1989). “A new approach to the activity-time distribution in PERT.” Journal of Operational Research Society, Vol. 40, No. 4, pp. 389-393.

Hagstrom, J. N. (1990). “Computing the probability distribution of project duration in a PERT network.” Networks, Vol. 20, No. 2, pp. 231-244.

Halpin, D. W. (1990). MICROCYCLONE user’s manual, Purdue University, Division of Construction Engineering and Management, West Lafayette, IN.

Halpin, D. W. and Riggs, L. S. (1992). Planning and analysis of construction operations, Wiley, New York, NY.

Kamburowski, J. (1985). “Normally distributed activity durations in PERT networks.” Journal of Operational Research Society, Vol. 36, No. 11, pp. 1051-1057.

Kim, R.-H., Bae, T.-H., and Lee, D.-E. (2009). “Advanced stochastic simulation-based scheduling system for improving usability in practice.” Journal of Architectural Research, Architectural Institute of Korea, Vol. 25, No. 5, pp. 221-230.

Kulkarni, V. G. and Adlakha, V. G. (1986). “Markov and Markov-regenerative PERT networks.” Operations Research, Vol. 34, No. 5, pp. 769-781.

Lee, D.-E. (2005). “Probability of project completion using Stochastic Project Scheduling Simulation (SPSS).” Journal of Construction Engineering and Management, ASCE, Vol. 131, No. 3, pp. 310-318.

Lee, D.-E. (2008). “Non-normality of probability distribution of project completion times in simulation based scheduling.” Journal of Architectural Research, Architectural Institute of Korea, Vol. 24, No. 4, pp. 143-151.

Lee, D.-E. and Arditi, D. (2006). “Automated statistical analysis in stochastic project scheduling simulation.” Journal of Construction Engineering and Management, ASCE, Vol. 132, No. 3, pp. 268-277.

Lee, D.-E. and Shi, J. J. (2004). “Statistical analyses for simulating schedule networks.” In: Proceedings of the 2004 Winter Simulation Conference, Washington, D.C., pp. 1283-1289.

Lu, M. and AbouRizk, S. M. (2000). “Simplified CPM/PERT simulation model.” Journal of Construction Engineering and Management, ASCE, Vol. 126, No. 3, pp. 219-226.

MacCrimmon, K. R. and Ryavec, C. A. (1962). An analytical study of the PERT assumptions, Memorandum RM-3408-PR.

Maio, C. and Schexnayder, C. (2000). “Probability distribution function for construction simulation.” Journal of Construction Engineering and Management, Vol. 126, No. 4, pp. 285-292.

Perry, C. and Greig, I. D. (1975). “Estimating the mean and variance of subjective distributions in PERT and decision analysis.” Management Science, Vol. 21, No. 12, pp. 1477-1480.

Schexnayder, C., Knutson, K., and Fente, J. (2005). “Describing a beta probability distribution function for construction simulation.” Journal of Construction Engineering and Management, Vol. 131, No. 2, pp. 221-229.

Schilling, J. R. and Harris, L. S. (2000). Applied numerical methods for engineers using MATLAB and C, Brooks/Cole, Boston, MA.

Sculli, D. (1983). “The completion time of PERT networks.” The Journal of the Operational Research Society, Vol. 34, No. 2, pp. 155-158.

Sculli, D. (1989). “A simulation solution to the PERT problem.” IMA Journal of Management Mathematics, Vol. 2, No. 3, pp. 255-265.

Sculli, D. and Wong, K. L. (1985). “The maximum and sum of two beta variables and the analysis of PERT networks.” Omega, Vol. 13, No. 3, pp. 233-240.