RILEY / DIEBERT 1
MICHAEL LEROY RILEY II
IAN DIEBERT
DR. SAM SIEWERT
CS-332 ORGANIZATION OF PROGRAMMING LANGUAGES
20APR17
COBOL: Lethal Injection or Fountain of Youth?
Towards the end of the 1950s there were growing concerns among computer users and
manufacturers regarding the rising cost of programming. During this time people did not process
data on their laptops as we do today, and servers were not the familiar concept that they are
today. (Wikipedia) Data processing installations were dedicated businesses, such as the
Computer Services Bureau, which offered services such as payroll but at a significant cost. A
survey conducted in 1959 found that in any data processing installation, the programming cost
was approximately $800,000.00 on average and translating programs to run on new hardware
would cost approximately $600,000.00. (Wikipedia) While those figures are not appetizing for
any business, let’s remember the context from which they are derived. Those dollar amounts
are from 1959, so $800,000.00 would be approximately equal to $6,702,432.99 in 2017 after
inflation, an increase of 737.8%. (US Inflation Calculator) In addition to these costs, there
were many other reasons to create a domain-specific programming language. Many businesses
during this time were creating their own programming languages and/or software packages to
perform the tasks relevant to their organization. From the narrow vantage point of a single
organization this makes sense; however, when one views the business arena from a broader
perspective, it becomes immediately clear that most of these businesses were of a similar
nature, particularly in the sense that they were
businesses first. Although nearly every business was producing different physical products, their
data for business transactions differed only in the format of the records and the length of fields.
From a software engineering perspective, and coupling these facts with the exorbitantly high cost
of development, these businesses were similar enough to warrant exploration into the creation of
a domain-specific ‘Common Business Oriented Language’, abbreviated as COBOL.
There were only two major professionally developed programming languages present at
the time COBOL arose. Figure 1 below shows a minimal view of the status of
programming languages at that time.
Figure 1: Origination of COBOL with
Key (www.digibarn.com)
Looking at Figure 1 above, it may be difficult to grasp the significance of the number of parallel
software development projects occurring in companies between Fortran and COBOL. Figure 2,
below, shows a tree of programming language development between the years of 1955 and 1960 for
just the Electronic Numerical Integrator and Computer (ENIAC). Although the ENIAC was a
machine rather than a programming language, it was under development at the same time. It is not a stretch to
conclude that as the hardware and capability of the ENIAC progressed and as
different customers with different business needs acquired it, the rate of programming language
development would rapidly increase.
Figure 2: Software Development Efforts between 1955 and 1960 (Gordon)
In 1959, a meeting was held at the University of Pennsylvania with the goal of
formulating ideas for common business languages. This group soon asked the United
States Department of Defense (DoD) to sponsor an effort to create such a language. The DoD
was heavily invested in the crisis that was enveloping the business and software engineering
worlds at this time. The DoD itself operated 225 computers, had another 175 on back order, and
had already spent over $200 million on software development efforts for these machines. Needless
to say, the DoD was quite willing to assist in the effort, and in early 1960 compiler construction
for COBOL-60 was underway. (Wikipedia)
If COBOL was developed to satisfy a niche market need, then it is fair to ask why
many software/computer engineers and scientists would call for its replacement.
There are many arguments both for the continuation of COBOL via specification refinement and
for the complete retirement of the language in favor of a more modern language such as C++.
Arguments for the replacement of COBOL are centered around the cost of the programming
language in today’s market. Due to the poor design and documentation of legacy systems, it is
difficult to check downstream and upstream dependencies for COBOL modules, making it
expensive and risky to make any changes in the code. (Moffitt, 2017) COBOL programmers,
with and without vast experience, are paid handsomely for code maintenance and even more so
for code generation. According to www.payscale.com, a COBOL programmer with less than
one year of experience can expect to earn approximately $45k, while a ‘COBOLER’ with over
five years of experience can expect to earn $70k. These figures may not seem very impressive at
first, but it must be considered that they represent the mean, not the maximum. A C++ programmer’s
salary is comparable; however, the market is overloaded with C++ developers relative to
COBOLERs. The US government is also concerned with the vulnerabilities and expenses of an
aging IT infrastructure. On September 22, 2016, the US House of Representatives passed The
Modernizing Government Technology Act (Bailey, 2016).
Another area of concern that COBOL is facing in today’s market is the constant threat of
cyber-attacks, an area which may overwhelm the COBOL programming language and force it
into retirement. The United States government has experienced a number of attacks against
COBOL systems in recent years. The Office of Personnel Management (OPM) suffered a breach
of security in 2015, as did the Veterans Administration (VA). At the heart of their information
technology environments was COBOL. According to Nextgov.com, three out of ten of the oldest
federal IT systems still in use run on COBOL. Not surprisingly, the VA and OPM are listed among
the agencies still running COBOL. One question that persists in debates regarding this particular
topic is how cyber-attacks affect, directly and indirectly, COBOL systems. According to John
Walker, owner of Secure-Bastion Ltd, “At the end of the day it is a case of ‘security through
obscurity’ versus ‘devoid security through confusion’. On one hand, we have the inferred
security represented by say, non-routable protocols, or mainframe speciation partitions such as
LPAR. On the other side of the security coin we have unpatched outdated systems such as NT4.0
residing inside virtualization. No matter, the outcomes are the same: confusion, and an enormous
potential for unknown unknowns of insecurity to reside within the operational environment.”
(Naked Security) Although this argument seems logical, there seems to be an easy workaround
to its crucial point. If the main defense against injection attacks or any other form of
cyber-attack is a lack of knowledge regarding a subsystem and its associated protocols or layers,
then the only thing required of a malicious actor is to learn about these facets of the system.
Furthermore, if the only assurance against the malicious actor gaining the requisite knowledge is
the fact that those systems are not well documented, then that really is not a great deal of
assurance at all. If a malicious actor had physical access to a COBOL system, then learning
about it would be trivial. According to Dr. Jon Haass and Dr. Paul Hriljac, professors in the
College of Security and Intelligence at Embry-Riddle Aeronautical University Prescott, AZ,
“What is more likely the case is that a gateway or filter is used to provide connection to the
legacy system and modern systems. The challenge for COBOL is its lack of strong
authentication so if you can get past the gateway, you are in.” This seems to be a serious risk
that financial organizations are taking and it also seems to be a risk that will eventually result in
catastrophic failure. Dr. Haass and Dr. Hriljac continue on to say, “A person does not have to
have physical access to interfere, instead you hack the gateway or bridge system and then you
can practically access everything in the legacy code. Very little work is being done with Cobol
except to link up to new front ends whether they are web based or some other device.” When
cast in this light, there seems to be some very real evidence that COBOL systems may need to be
ported to another programming language.
C++ exists as a potential competitor to COBOL and is one of the many languages that
have been analyzed and discussed for their possible benefits should a portion of the industry
decide to transition to them. C++ offers many well-known and well-understood benefits: it is
more powerful (in terms of overall code length and supported features), it is better documented
(with perhaps many thousands of manuals and books dedicated to the language), and it is known
by a much wider set of programmers. These points are easy to verify on a personal level: one
can search Google regarding function templates and see that they are not supported in COBOL,
or look up COBOL references/manuals and see that the number of results is far smaller than
for C++, or simply search the trends regarding COBOL and C++ to find that C++ is a more active
topic.
The primary concerns regarding COBOL are that it is obsolete and that it lacks the dedicated
programmer bases maintained by languages such as Java, Python, and of course C++. Many
argue that a programming language which is generally less useful and performs to a worse
degree should not be maintained as the standard for the majority of the business and transactional
sectors of programming. Despite this argument, which is very sound from a high-level
perspective, businesses have always supported and still continue to run COBOL on even the most
critical of data systems. This leads to the hypothesis that, for the usage of COBOL to continue,
there must be reasons and a market to substantiate it.
C++ was originally designed to be a general-purpose language that extended C with
object-oriented features while staying low-level and close to the hardware.
As a near superset of C, C++ would also retain the speed, portability, and functionality present
within the C language; this made it an easy transition for all those programmers who were
already using C. COBOL was by then established as a language, but had already been witnessing a
decline in overall popularity; as early as 1975 (over a decade before the official
release of C++), Edsger Dijkstra stated that “the use of COBOL cripples the mind,” referring to
the overall lack of structure present in the COBOL language.
C++ excels in the areas in which COBOL is most lacking: structure, the
compatibility/efficiency of additions to the language, the verbosity of syntax, support from the
greater computer science community, and the methods of design undertaken by the original
creators and subsequent standards committee.
C++ is incredibly well structured, primarily due to its derivation from C and the
standards upon which it has been built. Well-written C++ code can be read and understood
(albeit with some significant work) even as part of a million-line program, meaning that it can be
modified, updated, or ported to another language without the fear of tearing apart an entire
digital ecosystem simply through the alteration of a single piece of data. On the other end of the
spectrum, COBOL suffers from just that problem: much of the legacy COBOL code can no longer be
modified, as the older versions exist as archaic, unstructured, and illegible constructs that can no
longer be understood or, more importantly, changed without the danger of tearing down a
critical system.
C++ has also demonstrated many times that it is very capable of being modified without
sacrificing existing legacy support, or support for most of the modern C language. COBOL, as an
old and monolithic language, could not be upgraded without leaving behind much of the
code that had already been implemented. When COBOL-85 was released, it was not fully
compatible with older versions and was therefore highly criticized, with users citing the heavy
reprogramming costs of implementing the updated standard. It is important to note that the
board of standards for COBOL was unable to convince the then users of the language to switch
from it to a more modern version of itself. This has led to over 300 different dialects of COBOL
and a user base that cannot and will not switch even to a more modern version of COBOL,
let alone to a separate modern language.
This ultimately led to COBOL’s final primary detriment, which is the lack of continued
support by engineers and scientists at the top of the field. COBOL was already an underused
language during the mid-80s, a situation strongly exacerbated by the alienation of the
computer science community from its development, which was driven by members of commerce and
government. Elements such as the exclusion of Backus-Naur form for COBOL’s syntax in favor of
its own metalanguage prevented most of the computer engineering/science community from
paying much notice at all to COBOL.
The COBOL and C++ programming languages were evaluated against each other with
regard to binary executable size on disk, lines of code to perform the same operations, and
program execution time. Each program was written to the following requirements to ensure the
comparison was as representative of a true analysis as possible.
Read in a .CSV file
Store the data from the file into memory
Randomly generate values
Update the entire contents of the file by adding the randomly generated number to a specific
field per row of the entire data set
Write the data back to a .CSV file
Be run on the same computer (laptop) during the evaluation and analysis events
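The requirements above can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the program that was profiled: the function names, the 1 to 100 random range, and the choice of column are all hypothetical, fields are assumed to contain no quoted commas, and the updated column is assumed to hold integers.

```cpp
#include <cstddef>
#include <fstream>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// One CSV row, split into its comma-separated fields.
using Row = std::vector<std::string>;

// Split one line on commas (no quoted-field handling; an assumption of this sketch).
Row split_csv_line(const std::string& line) {
    Row fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) {
        fields.push_back(field);
    }
    return fields;
}

// Read every row of the CSV file into memory.
std::vector<Row> read_csv(const std::string& path) {
    std::vector<Row> rows;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        rows.push_back(split_csv_line(line));
    }
    return rows;
}

// Add a randomly generated integer to one numeric column of every row.
// Rows too short to contain the column are left untouched.
void add_random_to_column(std::vector<Row>& rows, std::size_t column,
                          std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(1, 100);  // hypothetical range
    for (Row& row : rows) {
        if (column < row.size()) {
            row[column] = std::to_string(std::stol(row[column]) + dist(rng));
        }
    }
}

// Write the rows back out as comma-separated lines.
void write_csv(const std::string& path, const std::vector<Row>& rows) {
    std::ofstream out(path);
    for (const Row& row : rows) {
        for (std::size_t i = 0; i < row.size(); ++i) {
            if (i > 0) out << ',';
            out << row[i];
        }
        out << '\n';
    }
}
```

A caller would seed a std::mt19937, then call read_csv, add_random_to_column, and write_csv in that order. Keeping the three stages separate makes it easier to attribute profiler time to file input, the update, and file output individually.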
Figure 3: Visual Studio Performance Analysis of COBOL Execution
Figure 3, above, shows a performance analysis of the COBOL program run in Microsoft Visual
Studio 2015 (VS15). Although an execution time of about 2.02 seconds is depicted, the Visual
Studio profiler removed the time it took for the program to gain admittance to the processor and
for garbage collection in order to report actual time on the CPU. According to research, the
resulting 1.2 seconds is very long for a COBOL file IO and sort operation. This is most likely
attributable to running COBOL code in a non-mainframe setting without Job Control Language (JCL)
files, COBOL Copy (CPY) files, or any other of the many resources that COBOLERs typically make use of.
Figure 4: Breakdown of COBOL Function Calls During Analysis
VS15 also created a separate window showing the functions that took the most time to complete.
In Figure 4, above, the first two and last functions called by the program were native to the
MicroFocus programming environment. These functions consumed 63.37% of the total execution
time for the program. The functions written for this program, not built in to COBOL or native to
MicroFocus, consumed 36.63% of the total program execution time which, applied to the
1.2-second total, yields an execution time of 0.43956 seconds. This number reflects what most
COBOLERs refer to as a ‘good enough’ execution time for small COBOL operations. Further
analysis of the execution timing reveals that the user-defined sorting function took up 26.59%
of the total execution time. COBOL has a built-in SORT operation, which was also tested earlier
in the development phase. While the SORT operation does work on tables with some modification,
it was determined that, due to time constraints, it would be wiser to write a Bubble Sort
method. It is noted that not using the built-in SORT operation makes the comparison unfair to COBOL.
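The percentages quoted above can be checked with a few lines of arithmetic. The sketch below assumes a total of 1.2 seconds of CPU time, which is the figure implied by 0.43956 / 0.3663; the constant and function names are illustrative only.

```cpp
// Figures quoted in the text. The 1.2 s total is an assumption of this
// sketch, inferred from 0.43956 s / 36.63%.
constexpr double kTotalSeconds = 1.2;
constexpr double kRuntimePercent = 63.37;     // MicroFocus-native functions
constexpr double kUserCodePercent = 36.63;    // functions written for the test
constexpr double kBubbleSortPercent = 26.59;  // user-defined sort

// Seconds represented by `percent` of `total_seconds`.
constexpr double share_of(double total_seconds, double percent) {
    return total_seconds * percent / 100.0;
}

// Fraction of the user-written time that was spent sorting.
constexpr double sort_fraction_of_user_code() {
    return kBubbleSortPercent / kUserCodePercent;
}
```

Under these assumptions the user-written code accounts for about 0.44 seconds and the Bubble Sort alone for about 0.32 seconds, roughly 73 percent of the user-written share, which is why the choice of sort matters so much to the comparison.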
Figure 5: Visual Studio Performance Analysis of C++ Execution with Optimization Settings
O2 – Maximize Speed
Figure 5, above, shows the performance analysis of the C++ program completing the
same task as the COBOL program. Perhaps not surprisingly, C++ had an average runtime of 542
milliseconds, performing just over twice as fast as the COBOL program. One of the more
challenging aspects of this research was finding examples of data and industry-standard COBOL. We
would have liked to have implemented programs and data sets similar to those used in the business
world; however, these implementations were more elusive than we thought they would be,
perhaps because of their secure nature. Due to this, we cannot say with one hundred percent
certainty that the data shows C++ to be faster; however, for basic tasks using unoptimized code
this is shown to be the case.
Figure 6: Breakdown of C++ Function Calls During Analysis (Optimized O2)
This is the separate window containing the C++ program’s functions requiring the largest
amount of completion time. The oddity here (not represented in the above figure) was that almost
every time the program was run, a different function would take up the largest percentage of
individual work. For example, in the case above it happened to be the ‘<<’ operator in some
fashion, while other times the ‘getline’ function would take that place. Whenever it was run, it
consistently ended with the result that one function would take ~60% of the work. There was no
clear answer as to why this was the case. Overall, the ‘FileIn’ portion of the code was the most
time consuming for the program, as can be seen in the figure above.
Figure 7: C++ Execution with Optimizations Disabled
MicroFocus in Microsoft Visual Studio does not provide settings for optimization of the
COBOL compilation, yet C++ optimization settings are certainly available. After seeing the C++
and COBOL execution times, and factoring out MicroFocus overhead, we were curious as to
the performance of C++ when optimizations were disabled. Figure 7, above, shows the execution
profile of the C++ program with optimizations disabled and no other changes. The execution
profile without optimizations presented a striking divergence from the profile with optimizations.
As shown above, the un-optimized C++ program took fourteen seconds to complete, which is roughly
twenty-five times slower than the optimized build and would be considered unacceptable for
database file IO.
Figure 8: Breakdown of Un-Optimized C++ Program Execution
An interesting observation emerges from this profiling of the C++ program. During this
profile the operations which utilized the Central Processing Unit (CPU) the most were related to
vector operations, most notably the vector push_back() operation and its associated allocation
operations. An order-of-magnitude difference in execution time, as depicted above, leads us to
think that perhaps some re-thinking of the C++ program could be done to bring the un-optimized
C++ program closer to the COBOL program in terms of execution time. Even so, it is hard to argue
for the speed of COBOL on this basis, as the 14-second result could very likely be attributed to
C++ runtime or operating system service calls during vector operations.
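One possible re-thinking, offered here as an assumption rather than as a change that was actually tested, is to reserve the vector's storage before the read loop so that push_back() never has to reallocate. The sketch below only counts reallocations; it does not measure time, and the function name and the "row" placeholder record are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Append `count` placeholder records, counting how often the vector's
// capacity changes. Each change means a new allocation plus a move or copy
// of every element already stored.
std::size_t count_reallocations(std::size_t count, bool reserve_first) {
    std::vector<std::string> records;
    if (reserve_first) {
        records.reserve(count);  // one up-front allocation
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = records.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        records.push_back("row");
        if (records.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = records.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set to true the count is zero for any record count, while without it the vector grows in several steps. Whether this accounts for a meaningful part of the fourteen seconds would have to be confirmed by profiling, since an un-optimized build adds overhead to many other operations as well.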
The total lines of code in the COBOL program could have been reduced by about 25
lines. The implementation of the structures used to hold the data from the .CSV file was most
likely not optimized when compared to operational COBOL code. Time constraints and the
significant learning curve encountered when learning COBOL resulted in an implementation of a
‘data structure’ for containing the .CSV file data that is less than optimal by professional
standards. The size of the binary executable on disk is probably about right, since COBOL was
created during a time when memory management was crucial.
The C++ program was similar, and could have been reduced by at least 50 lines without a
large amount of effort (although this would have made it less explicit) and likely
reduced even further with the fine tuning of some algorithms. An expert/professional C++
programmer could probably implement this functionality in a moderately better way; however, for
the purposes of the research done in this report we felt the data to be both accurate and
conclusive.
COBOL at first seems foreign and obfuscated to the typical imperative procedural
programmer, yet given some time the syntax of COBOL begins to make perfect sense. As
programmers learn languages, the concept of programming and the act of writing a program evolve
to a level at which the English language becomes foreign and obfuscated to them when viewed
within the context of a program’s code. When Michael Riley began his journey upon the road to
COBOL he was met with a great deal of confusion and uncertainty. In most other imperative
procedural languages, a conditional statement would logically consist of an LVALUE, a relational
operator, and an RVALUE. Yet in COBOL a conditional statement looks something like the
following:
if rec-counter is greater than or equal to 1 then
Similarly, a loop can be constructed in the following manner:
perform varying rec-counter from 1 by 1 until rec-counter is greater than 1526
Both examples above can be written as English or as something more closely resembling C++,
where = is a comparison rather than an assignment, for example:
if (rec-counter = 24) then display "hello" end-if.
One can see after some simple reading that this syntax actually makes logical sense. Many
people claim that this is not the case, though, and cite the absence of software engineering
influence during the design and creation of COBOL. On one hand, the syntax above made sense in
its day because the individuals writing the programs weren’t actually programmers but business
professionals. However, in current times this syntax is not as logically sound, since those who
will be writing COBOL in the future will be software engineers.
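For comparison, the COBOL conditional and loop shown above might be expressed in C++ as in the sketch below. The function names are hypothetical, hyphens become underscores because C++ identifiers cannot contain them, and the loop assumes COBOL's default behavior of testing the UNTIL condition before each pass.

```cpp
// Mirrors: if rec-counter is greater than or equal to 1 then
bool at_least_one(int rec_counter) {
    return rec_counter >= 1;
}

// Mirrors: perform varying rec-counter from 1 by 1
//          until rec-counter is greater than <limit>
// Returns the number of times the loop body runs.
int count_iterations(int limit) {
    int iterations = 0;
    // UNTIL states when to stop, so the C++ condition is its negation.
    for (int rec_counter = 1; !(rec_counter > limit); ++rec_counter) {
        ++iterations;
    }
    return iterations;
}
```

The main difference for a reader coming from C++ is that UNTIL names the stopping condition, whereas a for loop names the continuing condition.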
Overall, our efforts were not a direct comparison between the two programming
languages due to our inability to write COBOL in the manner for which it was intended. Running
COBOL via MicroFocus on a modern laptop is a far cry from running COBOL code on an
IBM mainframe in a business environment. On the other end, C++ could have been
implemented in a more efficient manner and with potentially better hardware. Further testing is
required to reach a definitive answer as to whether COBOL should be replaced by C++ in
industry. Such testing is beyond the capability and scope of our efforts for CS332; however, this
does seem to be a topic for future research, perhaps at the master’s level or above. What we
ultimately learned while performing this research is that there are many reasons to support both
sides of the argument and it is clear this debate exists for strong reasons. C++ may execute more
quickly, but even if this is true for the expertly programmed, wholly optimized version of a
similar test, it is still perhaps not worth the switch. There may be little benefit to code that
performs marginally faster in this particular workflow, as these transactions seem to work fine
for the modern marketplace. Our goal in performing further research would be to find clear
examples of code and data used in industry, and to find the hardware capable of running it. With
this information we would be able to better test against C++, as well as perhaps multiple other
languages that have been proposed as competitors to COBOL.
Special Thanks to Mr. Bill Woodger, GnuCOBOL forum Senior Member, for his
assistance in helping Michael get his mind around COBOL. Without his assistance this project
may have failed to generate a comparison entirely.
Works Cited
Wikipedia contributors. "COBOL." Wikipedia, The Free Encyclopedia. Wikipedia, The Free
Encyclopedia, 9 Apr. 2017. Web. 20 Apr. 2017.
Boutin, Paul, Brent Hailpern, Todd Proebsting, and Gio Wiederhold. "Mother Tongues."
www.digibarn.com, n.d. Web. 20 Apr. 2017.
Admin. "US Inflation Calculator." US Inflation Calculator. N.p., n.d. Web. 20 Apr. 2017.
"Dr. Grace Murray Hopper." Dr. Grace Murray Hopper: COBOL Computer Language. N.p., n.d.
Web. 20 Apr. 2017.
Bell, Gordon. "Computer Trees." Computer Trees. N.p., 22 Mar. 2015. Web. 20 Apr. 2017.
"Not Just a Load of Old COBOLers: Systems Are Still Running on Old Code." Naked Security.
N.p., 31 Mar. 2017. Web. 20 Apr. 2017.
Moffitt, Kevin. "A Framework and Implementation for Detecting Source Code Faults in COBOL
Code." By Kevin Moffitt :: SSRN. SSRN, 05 Apr. 2017. Web. 20 Apr. 2017.
"COBOL Skill Salary - PayScale." COBOL Skill Salary, Average Salaries | PayScale. N.p., n.d.
Web. 20 Apr. 2017.
Bailey, C. "How Mainframe Innovation Can Help Modernize Federal Government IT." Inside Tech
Talk. N.p., 23 Sept. 2016. Web. http://insidetechtalk.com/modernize-federalgovernment-it/
Moore, Jack. "HERE ARE 10 OF THE OLDEST IT SYSTEMS IN THE FEDERAL
GOVERNMENT." Nextgov. N.p., 25 May 2016. Web. 20 Apr. 2017.