7/30/2019 Project Data Mining and Project Estimation Top-Down Methodology With TRANSCALE Tool
PM World Today June 2010 (Vol XII, Issue VI)
Introduction
Experience with project data mining over the last 40-50 years has shown that applying primitive methods of statistical analysis in this area gives equally primitive and, most importantly, wrong results. For unknown reasons, it is assumed that the complex and multifaceted problem of project data mining can be solved by a method as primitive as the direct application of regression analysis to these data.
Even prolonged failure in this area has not shaken people's faith that their intuition is sufficient to solve the complex quantitative problems of project data mining. Some people even claim that applying basic quantitative methods in this area is an end in itself and cannot lead to success.
Applying the above to project data mining specifically, one can only be amazed by the insistence of leading universities and research centers on continuing to use statistical methods for this purpose, even though, in terms of accuracy, these methods have never paid off in project management over the past 40-50 years.
Usually, if a scientific methodology systematically performs poorly, people stop using it and try to replace it with newer, more reasonable solutions. Surprisingly, this has not happened in project data mining and the estimation of project parameters. It is time to recognize that solving the problems of this area requires a shift from outdated methods of statistical analysis to more scientifically sound methodologies. To do this, following the experience of more developed areas of knowledge, we need to move beyond the simple empiricism that currently dominates this area and develop sounder scientific methodologies for project management.
Fortunately, one can cite many instructive examples from other areas of knowledge. Nearly every serious quantitative science has gone down this path, so there is no need to break new ground in project management. To follow it, we must use the experience of those areas of knowledge that are closest in spirit to the problems of project management.
In this sense, it is important to use the experience of classical thermodynamics, which has traveled all the way from primitive empiricism to the highest levels of scientific and practical achievement. Experience in other fields of knowledge shows that, by overcoming the limitations of the statistical approach, one can proceed to develop a genuine mathematical theory of projects.
On this path, we must first get rid of the so-called statistical curse, in which the results of data processing depend directly on the choice of specific data. In a truly scientific approach this cannot happen: stable results of data processing should always be invariant with respect to the specific data.
PM World Todayis a free monthly eJournal - Subscriptions available at http://www.pmworldtoday.net Page 2
As an example, consider Ohm's law, whose essence is independent of any specific data. The same can be said about other laws of a fundamental nature. It is precisely such approaches and theories that must be developed in the field of project management.
For example, if we study the functional relationship between total project effort and project duration, each new dataset can lead to a new and uncertain result.
Of course, over 15-20 years one can collect data on, say, 10,000 projects and be confident that the 7-8 projects from the past month cannot change the statistical trend derived from those 10,000 projects. But that does not mean that these stable results are correct or that this approach to data processing is legitimate.
In reality this is simply self-deception, whether conscious or unconscious. Consider again the functional relationship between project effort and duration. The mere fact that the project data were collected over a long period of time makes joint processing of the whole dataset meaningless, because productivity changes during that time owing to new methodologies and tools.
On the other hand, if we use only the most recent projects for analysis, we inevitably face the problem that statistical methods are not applicable to small datasets.
The persistent application of statistical methods to small project datasets has already become a caricature and can be justified only by business considerations. Clearly, such a statistical interpretation of small project data has nothing to do with the scientific method.
Project data mining: State of the art
To analyze contemporary methods of project data mining, let us use a database of 56 projects. The database contains the complexity of the projects W, their total effort E, their duration T, the average team size N_av and the team productivity P.
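The five database fields can be held in a simple record type. The sketch below is a minimal Python illustration, not the TRANSCALE data format; the class name, field names and the 5% tolerance are assumptions. It also checks each record against E = N_av·T and W = E·P, which follow from the state equation of projects [3] presented later in the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectRecord:
    """One project of the database (hypothetical field names)."""
    w: float     # project complexity W
    e: float     # total effort E
    t: float     # project duration T
    n_av: float  # average team size N_av
    p: float     # team productivity P

    def is_consistent(self, rel_tol: float = 0.05) -> bool:
        """True if E ~ N_av * T and W ~ E * P within rel_tol (assumed 5%)."""
        return (math.isclose(self.e, self.n_av * self.t, rel_tol=rel_tol)
                and math.isclose(self.w, self.e * self.p, rel_tol=rel_tol))


# Hypothetical record, not taken from the 56-project database.
example = ProjectRecord(w=600.0, e=300.0, t=10.0, n_av=30.0, p=2.0)
print(example.is_consistent())
```

A check like this is only a data-quality filter; it says nothing about which functional relationships hold between the parameters.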
Fig.1 shows a multi-parametric flat representation of these data produced with the TRANSCALE tool [1]. Using the sequence of coordinate axes, let us denote this representation of the data as [N_av, T, E, P]. There are numerous other multi-parametric plane representations of these data, and the TRANSCALE tool enables smooth transitions between them.
According to contemporary methods of project data mining, these data can be processed by statistical methods [2]. This empirical analysis yields functional relationships between the project parameters (Fig.2.1 - Fig.2.8).
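To make the criticized approach concrete, here is a minimal sketch of a single-curve fit. It assumes a power-law form y = a·x^b fitted by ordinary least squares on log-log axes; the actual models behind [2] and Fig.2 may differ, and the function name is hypothetical.

```python
import math


def fit_power_law(x, y):
    """Ordinary least squares on log-log axes for y = a * x**b.

    Returns (a, b). All x and y values must be positive.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxx = sum((u - mean_x) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Hypothetical (duration, effort) pairs, not the article's data.
a, b = fit_power_law([1, 4, 9, 16], [2, 4, 6, 8])
print(round(a, 6), round(b, 6))  # prints 2.0 0.5
```

Whatever the data, the result is one curve for the whole dataset, which is exactly the property the following sections question.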
Although, from a qualitative point of view, all the obtained curves behave correctly (that is, the logic of increases and decreases of project parameters is not violated), this is not enough to ensure that the functional relationships obtained in this way are adequate and practical.
Fig.1 [N_av, T, E, P] presentation of project data in the flat multi-dimensional project space
In addition, for one of the curves even qualitative adequacy is not ensured: the functional relationship between team productivity P and average team size N_av falls too fast. The other curves also contain qualitative discrepancies; it is just that these cannot be detected with the naked eye.
An overall analysis of statistical methods for processing project data shows that their accuracy is very low. This is easy to see by applying the obtained empirical relationships to the assessment of individual projects. This indicates that the statistical methodology is a dead end for quantitative project management.
A more detailed analysis shows that the statistical approach to data mining and project estimation has two main drawbacks. Let us analyze these shortcomings using the statistical curves presented in Fig.2.1 - Fig.2.8.
1. These curves contain qualitative discrepancies, which simply means that the trends presented in Fig.2 do not reflect the genuine behavior of the functional relationships between project parameters.
[Figure panels Fig.2.1 - Fig.2.8]
Fig.2 Results of the statistical treatment of project data
2. Even if these curves contained no qualitative discrepancies, they still could not provide highly accurate estimates, because statistical processing of the entire system of data produces a single fitting curve.
By the most elementary considerations based on the method of least squares, a single curve cannot, in principle, provide the accuracy required for data mining and project estimation.
This problem can be solved only by covering the data with families of curves rather than with a single curve. Such a family of curves can be constructed on different principles. The most basic and obvious of these is to construct the approximating curves from the state equation of projects under different conditions in which the values of certain project parameters are held constant.
Representation of project data in the form of a family of curves
For precise experimental investigation of phenomena, people typically proceed as follows. If a phenomenon is described by a large number of parameters, the two parameters whose functional relationship is being investigated are left free, and the other parameters are kept constant. Then, while the value of one free parameter is varied, the values of the other free parameter are measured. The same procedure is then repeated for other constant values of the remaining parameters. This approach permits the direct application of regression analysis to the data. But it is possible only when the parameters of the object under study can be controlled.
Unfortunately, such classical experimentation is simply impossible in project management, because it involves huge organizational and financial difficulties.
Fig.3 Presentation of the functional relationship between project effort E and its complexity W in the form of a family of curves
Fig.4 Presentation of the functional relationship between team productivity P and the average team size N_av in the form of a family of curves
For a more detailed discussion of the problem we turn to the state equation of projects [3].
Fig.5 Presentation of the functional relationship between team productivity P and project effort E in the form of a family of curves
In project management, where there is no experiment in the classical sense of the word and project data are the result of random collection, there are other ways to overcome such difficulties.
In particular, the project data can be divided into groups using the condition that one of the project parameters is relatively constant.
At the systemic level, the state equation of projects combines the parameters of the project and the development team [3]:

$$ N_{av} \cdot T \cdot P = W \qquad (1) $$

and

$$ E \cdot P = W \qquad (2) $$
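Dividing equation (1) by equation (2) gives a relation that ties the effort to the two parameters describing its distribution over time: the total effort equals the average team size multiplied by the project duration.

```latex
E = \frac{W}{P} = \frac{N_{av}\,T\,P}{P} = N_{av}\,T
```

Note that this fixes only the product N_av·T, not the individual values of N_av and T.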
To divide the project data into groups, we can order the data by increasing team productivity and split this sequence of projects into groups. As a result we obtain groups of projects with relatively constant productivity. Fig.3 shows the functional relationship between project effort and complexity for four groups of projects with relatively constant productivity. This allows us to replace the single approximating curve of Fig.2.2 with a family of straight lines (Fig.3), which approximate the data more accurately.
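The grouping step can be sketched as follows. This is a minimal Python illustration under stated assumptions: projects are dicts with hypothetical keys "W", "E" and "P"; groups have near-equal size (the text does not say how the group boundaries of Fig.3 were chosen); and each straight line is fitted through the origin, an assumption motivated by E = W/P from equation (2).

```python
from statistics import mean


def split_by_key(projects, key, n_groups):
    """Sort projects by `key` and cut the sequence into n_groups
    contiguous groups of near-equal size."""
    if not 1 <= n_groups <= len(projects):
        raise ValueError("n_groups must be between 1 and len(projects)")
    ordered = sorted(projects, key=lambda pr: pr[key])
    size, extra = divmod(len(ordered), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        end = start + size + (1 if i < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups


def slope_through_origin(xs, ys):
    """Least-squares slope k of the line y = k * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def effort_lines_by_productivity(projects, n_groups=4):
    """Fit E = k * W inside each productivity group.

    Returns (mean P of the group, k) pairs; by equation (2), k should be
    close to 1 / P when productivity within a group is nearly constant.
    """
    lines = []
    for group in split_by_key(projects, "P", n_groups):
        k = slope_through_origin([pr["W"] for pr in group],
                                 [pr["E"] for pr in group])
        lines.append((mean(pr["P"] for pr in group), k))
    return lines


# Hypothetical projects obeying E = W / P exactly, four productivity levels.
data = [{"W": w, "P": p, "E": w / p} for p in (1, 2, 4, 8) for w in (10, 30)]
for p_mean, k in effort_lines_by_productivity(data):
    print(p_mean, round(k, 6))
```

With real data the productivity inside a group only varies within a band, so the fitted slopes scatter around 1/P; narrower bands give tighter lines.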
Similarly, Fig.4 presents the functional relationship between team productivity and average team size for relatively constant values of the ratio W/T, which is the throughput.
Comparing the accuracy of the family of curves shown in this figure with the accuracy of the approximation in Fig.2.7, it is easy to see that the family of curves is more accurate. By narrowing the interval within which the ratio W/T is held relatively constant, one can achieve even greater accuracy of approximation.
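Equation (1) shows why the data in Fig.4 arrange themselves into a family of curves. Solving (1) for P, with θ = W/T standing for the throughput (the symbol θ is introduced only for this note):

```latex
P = \frac{W}{N_{av}\,T} = \frac{\theta}{N_{av}}, \qquad \theta = \frac{W}{T}
```

For projects whose throughput lies near some value θ_k, the points should scatter around the hyperbola P = θ_k / N_av, so each throughput group yields one member of the family, and narrowing the θ interval reduces the scatter within a group.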
Fig.5 gives another example, showing the functional relationship between team productivity and project effort as a family of curves corresponding to zones of constant project complexity.
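By the same reasoning, equation (2) explains the family in Fig.5. Inside a zone where the complexity stays close to some value W_k (a label introduced here only for illustration), productivity is inversely proportional to effort:

```latex
P = \frac{W}{E} \approx \frac{W_k}{E}, \qquad W \approx W_k
```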
Project data mining and project estimation have a common methodological basis
From a methodological point of view, project data mining and project estimation are closely linked, because they share a common conceptual framework. Let us therefore consider this conceptual framework and the common sources of information on which both project data mining and project estimation are based.
At the system level, a project can be represented by the following three main components:
1. The model of accumulation of the work performed during the execution of the project, or simply the model of projects;
2. The objectives of the project (development cost, project duration, risk and other program-level or corporate-level goals and objectives);
3. The restrictions imposed on the project.
At the structural level, this three-component presentation of the project is shown in Fig.6. Such a presentation can be used for different purposes. In particular, it applies both to project data mining and to the planning and execution of projects; only the inputs and outputs differ between these applications and have different meanings.
In project planning, with project complexity W and team productivity P as inputs, it is necessary to find the total effort E required for the project and the distribution of that effort over time, including the planned project duration T and the required number of people N_av.
Fig. 6 Quantitative presentation of projects with three components
In project data mining, all parameters of individual projects are known, and it is necessary to solve problems associated with the classification of projects and to find the functional dependencies between project parameters.
Information needed for the reconstruction of projects
For the sake of simplicity, let us first determine which input information is needed to reconstruct the average behavior of a project.
If, to achieve this goal, we use as input information (1) project complexity W and (2) team productivity P, hoping that these data are sufficiently reliable, then only the total project effort can be estimated on this basis.
But to plan or synthesize a project we need more than the total amount of effort. We must also have the distribution of this effort over time, that is, the number of working people as a function of time. If finding this function is difficult, we must at least know the average team size N_av.
But finding the distribution of effort over time from complexity W and productivity P alone is impossible in principle. This means that solving this problem requires additional input.
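The underdetermination can be shown in a few lines. In this minimal Python sketch (function names and units are hypothetical), W and P fix the effort E = W/P through equation (2), but every trial duration T paired with N_av = E/T spends exactly the same effort, so W and P alone cannot pick one schedule.

```python
def effort_from_inputs(w, p):
    """Total effort from equation (2): E = W / P."""
    if w <= 0 or p <= 0:
        raise ValueError("W and P must be positive")
    return w / p


def candidate_schedules(w, p, durations):
    """Pair each trial duration T with the team size N_av = E / T.

    Every pair spends the same effort E = N_av * T, so all of them are
    equally consistent with the inputs W and P.
    """
    e = effort_from_inputs(w, p)
    return [(t, e / t) for t in durations]


# Hypothetical inputs: W = 1200 complexity units, P = 2 units per person-month.
for t, n_av in candidate_schedules(1200, 2, [6, 12, 24]):
    print(f"T = {t} months, N_av = {n_av} people")
```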
To clarify the nature of this additional information, consider different possible implementations of the same project (Fig. 7).
[Fig. 6 diagram labels: Model of the process of work; Project goals and objectives; Project constraints and restrictions; Project model; Inputs; Outputs]
in turn, will help us to improve the quality and accuracy of estimates and predictions of project
parameters.
1. Thus, since the complexity W of the project remains constant, it obviously cannot be the cause of changes in the project parameters N_av, T and E.
2. Similarly, the changes in project parameters have little to do with team productivity P. In particular, analysis of the functional relationship between team productivity P and average team size N_av shows that productivity is a slowly decreasing function of the number of people. Moreover, for large average team sizes the change in productivity is so small that, as a first approximation, team productivity can be considered constant. This means that only a small part of the change in N_av is related to productivity; that change is mainly determined by the change in project duration T.
3. If team productivity P is almost constant for larger values of N_av, then, since W is constant and E = W/P by equation (2), the total project effort E is also almost constant in this case.
4. In turn, this means that the distribution of project effort over time is associated only with the values of N_av and T and their changes ΔN_av and ΔT, and has almost nothing to do with project complexity W and team productivity P.
5. This leads to the main conclusion: it is fundamentally impossible to obtain the effort distribution over time with project complexity W and team productivity P as the only inputs.
6. This means that any project estimation system designed to determine project duration and team size must have at least one more input besides project complexity and team productivity. Otherwise, estimates of project duration and average team size will be an arena of arbitrary decisions (which, incidentally, is what is happening now).
Project objectives and effort distribution over time
Analysis shows that, to find the distribution of project effort over time, we first of all need information about the goals and objectives of the project and their relative importance for achieving maximum benefit at the level of the whole enterprise.
Finding the effort distribution over time means defining the duration of the project and, at a minimum, the average size of the development team.
These values, in turn, are directly linked to the project objectives, because within its feasibility range each project can be performed either in a short time with a large number of people or over a long time with a small number of people.
On the other hand, it is also known that the development cost of projects increases as project duration is reduced. This means that finding the effort distribution over time is closely related to the trade-off between the duration and the cost of the project.
This, in turn, means that the problem can be reduced to analyzing the priorities of the project objective functions and representing these priorities quantitatively.
A detailed discussion of this problem can be found in [4], where projects are classified in terms of their objectives. As the criterion of project similarity, this classification uses the ratio of project duration to average team size, R = T / N_av.
From the standpoint of project data mining, the above analysis means that this criterion should be used to divide the database into groups of similar projects, after which regression techniques can be applied to each group separately.
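A minimal Python sketch of this two-step procedure follows. The dict keys, the bin edges and the choice of a log-log regression of T on E are all assumptions for illustration; [4] defines its own classes. Within a group of exactly constant R, E = N_av·T and R = T/N_av give T = sqrt(R·E), so the fitted log-log slope comes out at 0.5 on the idealized data below; real data would scatter around their group's curve.

```python
import bisect
import math
from collections import defaultdict


def group_by_ratio(projects, edges):
    """Bin projects by R = T / N_av; `edges` are sorted bin boundaries."""
    groups = defaultdict(list)
    for pr in projects:
        r = pr["T"] / pr["N_av"]
        groups[bisect.bisect_right(edges, r)].append(pr)
    return dict(groups)


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(v) for v in xs]
    ly = [math.log(v) for v in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    return sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sxx


def duration_vs_effort_by_group(projects, edges):
    """Regress log T on log E separately inside each R-group."""
    slopes = {}
    for key, group in sorted(group_by_ratio(projects, edges).items()):
        efforts = [pr["N_av"] * pr["T"] for pr in group]
        if len(set(efforts)) >= 2:  # need spread in E to fit a slope
            slopes[key] = loglog_slope(efforts, [pr["T"] for pr in group])
    return slopes


# Hypothetical projects with three exact R values (0.5, 2 and 8).
data = [{"N_av": n, "T": r * n} for r in (0.5, 2, 8) for n in (2, 4, 8)]
print(duration_vs_effort_by_group(data, edges=[1, 4]))
```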
In terms of project estimation, this means that a complete description of the project requires, along with the project complexity W and productivity P, quantitative information about the project objectives and their priorities.
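One way to see why a single extra input can close the gap: suppose, as an assumption, that the objectives and their priorities are condensed into a target value of the similarity ratio R = T/N_av used in [4]. Together with W and P, this fixes both the duration and the team size. The function below is a hypothetical sketch; how objectives are mapped to a value of R is not specified here.

```python
import math


def schedule_from_ratio(w, p, r):
    """Solve N_av * T = W / P together with T / N_av = R.

    Multiplying the two relations gives T**2 = R * W / P; dividing them
    gives N_av**2 = W / (P * R).
    """
    if min(w, p, r) <= 0:
        raise ValueError("W, P and R must be positive")
    effort = w / p
    duration = math.sqrt(effort * r)
    team_size = math.sqrt(effort / r)
    return duration, team_size


# Hypothetical inputs: E = 288 / 2 = 144 person-months, R = 4 months per person.
t, n_av = schedule_from_ratio(288, 2, 4)
print(t, n_av)  # prints 24.0 6.0
```

A larger R (a cost-driven priority) stretches the duration and shrinks the team for the same effort; a smaller R (a schedule-driven priority) does the opposite.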
Missing input information in the modern systems of project data mining and project estimation
The main result of the above analysis is that modern systems of project data mining and project estimation lack information on project objectives and their priorities. This makes it impossible to obtain accurate functional dependencies between project parameters by statistical data processing.
Further use of these inaccurate functional relationships to assess and predict new projects results in large errors in estimating the parameters of new projects and, ultimately, in project failures.
The need to integrate project goals and their priorities into data mining is explained by the fact that each specific value of the project parameters reflects the entire design process, including the direct impact of goals and priorities on those parameters. Accordingly, data processing must take the same considerations into account. In particular, processing of project data must account for the considerable impact of project objectives and their priorities on project duration.
The same applies to estimating projects during planning. Using only project complexity and team productivity as inputs to planning systems is not enough for a comprehensive assessment of a project. To estimate project duration and average team size, input information on the project objectives and their priorities is needed.
Conclusions
1. The accuracy of the statistical methods of contemporary quantitative project management is unacceptably low, and therefore these methods are completely unsuitable for the daily needs of industry.
2. The statistical methodology of project data mining and analysis has a number of fundamental weaknesses. It is unsuitable for small datasets, since in that case it can generate highly random results that, moreover, depend strongly on the specific data. It is also unsuitable for large databases, because the collected data are difficult to process together owing to their incompatibility with each other.
3. The Achilles heel of statistical methods of project data mining is the strong instability of their results and their dependence on the specific data.
4. Even if statistical treatment of a large project database yields stable results, they may still be unsuitable for practical applications, since a stable result does not mean a correct result.
5. Very often the stable results of statistical project data mining do not reflect reality adequately; moreover, they may simply be wrong.
6. One of the main reasons for the inaccuracy of statistical methods of project data mining is the replacement of the entire system of data by a single approximating curve.
7. To increase the accuracy of statistical methods of project data mining, it is necessary to cover the system of data points not with a single curve but with families of curves.
8. For that purpose the system of data points must be divided into groups by applying
advanced methods of project similarity analysis.
9. These methods of project similarity analysis should be based on top-down analysis of the
project objectives and their priorities.
10. The main shortcoming of modern methods of project data mining and project estimation is that they do not take into account the project objectives and their priorities.
11. Methods of project data mining and project estimation should share the same methodological framework, based on accounting for the objectives of projects and their priorities.
References
1. Pavel Barseghyan (2010) Project Nonlinear Scaling and Transformation Methodology and TRANSCALE Tool. PM World Today, May 2010 (Vol XII, Issue V). 16 pages.
2. S. Oligny, P. Bourque, A. Abran, B. Fournier. Exploring the Relation Between Effort and Duration in Software Engineering Projects. http://www.lrgl.uqam.ca/publications/pdf/536.pdf
3. Pavel Barseghyan. (2009). Principles of Top-Down Quantitative Analysis of Projects. Part1: State Equation of Projects and Project Change Analysis. PM World Today May 2009
(Vol XI, Issue V) http://www.pmworldtoday.net/featured_papers/2009/may/Principlesof
Top-Down-Quantitative-Analysis-of-Projects.html
4. Pavel Barseghyan (2009) Problems of the Mathematical Theory of Human Work (Principles of mathematical modeling in project management). PM World Today, August 2009 (Vol XI, Issue VIII).
About the Author
Pavel Barseghyan, PhD
Dr. Pavel Barseghyan is a consultant in the field of quantitative project management, project data mining and organizational science. He is the founder of Systemic PM, LLC, a project management company. He has over 40 years of experience in academia, the electronics industry, the EDA industry and Project Management Research and tools development. During the period of 1999-2010 he was the Vice President of Research for Numetrics Management Systems. Prior to joining Numetrics, Dr. Barseghyan worked as an R&D manager at Infinite Technology Corp. in Texas. He was also a founder and the president of an EDA start-up company, DAN Technologies, Ltd., which focused on high-level chip design planning and RTL structural floor planning technologies. Before joining ITC, Dr. Barseghyan was head of the Electronic Design and CAD department at the State Engineering University of Armenia, focusing on development of the Theory of Massively Interconnected Systems and its applications to electronic design. During the period of 1975-1990, he was also a member of the University Educational Policy Commission for Electronic Design and CAD Direction in the Higher Education Ministry of the former USSR. Earlier in his career he was a senior researcher in Yerevan Research and Development Institute of Mathematical Machines (Armenia). He is an author of nine monographs and textbooks and more than 100 scientific articles in the area of quantitative project management, mathematical theory of human work, electronic design and EDA methodologies, and tools development. More than 10 Ph.D. degrees have been awarded under his supervision. Dr. Barseghyan holds an MS in Electrical Engineering (1967) and Ph.D. (1972) and Doctor of Technical Sciences (1990) in Computer Engineering from Yerevan Polytechnic Institute (Armenia). Pavel can be contacted at [email protected].