RILEY / DIEBERT 1

MICHAEL LEROY RILEY II

IAN DIEBERT

DR. SAM SIEWERT

CS-332 ORGANIZATION OF PROGRAMMING LANGUAGES

20APR17

COBOL: Lethal Injection or Fountain of Youth?

Towards the end of the 1950s there were growing concerns among computer users and manufacturers regarding the rising cost of programming. During this time people did not process data on their laptops as we do today, and servers were not the familiar concept that they are today. (Wikipedia) Data processing installations were dedicated businesses, such as the Computer Services Bureau, which offered services such as payroll but at a significant cost. A 1959 survey found that in any data processing installation the programming cost was approximately $800,000.00 on average and that translating programs to run on new hardware would cost approximately $600,000.00. (Wikipedia) While those figures are not appetizing for any business, let’s remember the context from which they are derived. Those dollar amounts are from 1959, so $800,000.00 would be approximately equal to $6,702,432.99 in 2017 after inflation, an increase of 737.8%. (US Inflation Calculator) In addition to these costs, there were other reasons to create a domain-specific programming language. Many businesses during this time were creating their own programming languages and/or software packages to perform the tasks relevant to their organization. From the narrow vantage point of a single organization this makes sense; however, when one views the business arena from a broader perspective, it becomes clear that most of these businesses were of a similar nature, particularly in the sense that they were businesses first. Although nearly every business was producing different physical products, their data for business transactions differed only in the format of the records and the length of fields. From a software engineering perspective, and coupling these facts with the exorbitantly high cost of development, these businesses were similar enough to warrant exploration into the creation of a domain-specific ‘Common Business Oriented Language’, abbreviated COBOL.

There were only two major professionally developed programming languages present at the time COBOL arose. Figure 1 below shows a minimalistic view of the status of programming languages at that time.

Figure 1: Origination of COBOL with Key (www.digibarn.com)

Looking at Figure 1 above, it may be difficult to grasp the significance of the number of parallel software development projects occurring in companies between Fortran and COBOL. Figure 2, below, shows a tree of programming language development between the years 1955 and 1960 for just the Electronic Numerical Integrator and Computer (ENIAC). Although the ENIAC was a machine rather than a programming language, it was under development at the same time. It is not a stretch to conclude that as the hardware and capability of the ENIAC progressed, and as different customers with different business needs acquired it, the rate of programming language development would increase sharply.

Figure 2: Software Development Efforts between 1955 and 1960 (Bell)

In 1959, a meeting was held at the University of Pennsylvania with the goal of formulating ideas for common business languages. This group soon approached the United States Department of Defense (DoD) to sponsor an effort to create such a language. The DoD was heavily invested in the crisis that was enveloping the business and software engineering worlds at this time. The DoD itself operated 225 computers, had another 175 on back order, and had already spent over $200 million on software development efforts for these machines. Needless to say, the DoD was quite willing to assist in the effort, and in early 1960 compiler construction for COBOL-60 was underway. (Wikipedia)

If COBOL was developed to satisfy a niche market need, then it is fair to ask why many software and computer engineers and scientists call for its replacement. There are many arguments both for the continuation of COBOL via specification refinement and for the complete retirement of the language in favor of a more modern language such as C++. Arguments for the replacement of COBOL center on the cost of the programming language in today’s market. Due to the poor design and documentation of legacy systems, it is difficult to check downstream and upstream dependencies for COBOL modules, making it expensive and risky to make any changes in the code. (Moffitt, 2017) COBOL programmers, with and without vast experience, are paid handsomely for code maintenance and even more so for code generation. According to www.payscale.com, a COBOL programmer with less than one year of experience can expect to earn approximately $45k, while a ‘COBOLER’ with over five years of experience can expect to earn $70k. These figures may not seem very impressive at first, but it must be considered that they represent the mean, not the maximum. A C++ programmer’s salary is comparable; however, the market is overloaded with C++ developers relative to COBOLERs. The US government is also concerned with the vulnerabilities and expenses of an aging IT infrastructure. On September 22, 2016, the US House of Representatives passed the Modernizing Government Technology Act (Bailey, 2016).

Another area of concern that COBOL is facing in today’s market is the constant threat of cyber-attacks, an area which may overwhelm the COBOL programming language and force it into retirement. The United States government has experienced a number of attacks against COBOL systems in recent years. The Office of Personnel Management (OPM) suffered a breach of security in 2015, as did the Veterans Administration (VA). At the heart of their information technology environment was COBOL. According to Nextgov.com, three of the ten oldest federal IT systems still in use run on COBOL. Not surprisingly, the VA and OPM are listed among the agencies still running COBOL. One question that persists in debates regarding this particular topic is how cyber-attacks affect, directly and indirectly, COBOL systems. According to John

Walker, owner of Secure-Bastion Ltd, “At the end of the day it is a case of ‘security through

obscurity’ versus ‘devoid security through confusion’. On one hand, we have the inferred

security represented by say, non-routable protocols, or mainframe speciation partitions such as

LPAR. On the other side of the security coin we have unpatched outdated systems such as NT4.0

residing inside virtualization. No matter, the outcomes are the same: confusion, and an enormous

potential for unknown unknowns of insecurity to reside within the operational environment.”

(Naked Security) Although this argument seems logical, there seems to be an easy workaround to the crucial point. If the main defense against injection attacks or any other form of cyber-attack is a lack of knowledge regarding a subsystem and its associated protocols or layers, then the only thing required of a malicious actor is to learn about these facets of the system. Furthermore, if the only assurance against the malicious actor gaining the requisite knowledge is the fact that those systems are not well documented, then that really is not a great deal of assurance at all. If a malicious actor had physical access to a COBOL system, then learning about it would be trivial. According to Dr. Jon Haass and Dr. Paul Hriljac, professors in the

College of Security and Intelligence at Embry-Riddle Aeronautical University Prescott, AZ,

“What is more likely the case is that a gateway or filter is used to provide connection to the

legacy system and modern systems. The challenge for COBOL is its lack of strong

authentication so if you can get past the gateway, you are in.” This seems to be a serious risk

that financial organizations are taking and it also seems to be a risk that will eventually result in

catastrophic failure. Dr. Haass and Dr. Hriljac continue on to say, “A person does not have to

have physical access to interfere, instead you hack the gateway or bridge system and then you

can practically access everything in the legacy code. Very little work is being done with Cobol

except to link up to new front ends whether they are web based or some other device.” When cast in this light, there seems to be some very real evidence that COBOL systems may need to be ported to another programming language.

C++ exists as a potential competitor to COBOL and is one of the many languages that have been analyzed and discussed for their possible benefits should a portion of the industry decide to transition to them. C++ offers many well-known and well-understood benefits: it is more powerful (in terms of overall code length and supported features), it is better documented (with perhaps many thousands of manuals and books dedicated to the language), and it is known by a much wider set of programmers. These points are easy to verify on a personal level; one can search Google regarding function templates and see that they are not supported in COBOL, look up COBOL references and manuals to see that the number of results is far smaller than for C++, or simply search the trends regarding COBOL and C++ to find that C++ is a more active topic.
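As a small illustration of the function template feature mentioned above, the following C++ sketch defines one function that works for any type with a less-than comparison. The name larger is hypothetical and chosen only for this example.

```cpp
#include <string>

// A function template: a single definition that the compiler instantiates
// for every type T that supports operator<.
template <typename T>
const T& larger(const T& a, const T& b) {
    return (a < b) ? b : a;
}
```

The same definition can be called as larger(3, 7) with integers or as larger(std::string("a"), std::string("b")) with strings, without writing a second function.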

The primary concerns regarding COBOL are that it is obsolete and that it lacks the dedicated programmer bases maintained by languages such as Java, Python, and of course C++. Many argue that a programming language which is generally less useful and performs worse should not be maintained as the standard for the majority of the business and transactional sectors of programming. Despite this argument, which is very sound from a high-level perspective, businesses have always supported, and still continue to support, COBOL on even the most critical of data systems. This leads to the hypothesis that, for the usage of COBOL to continue, there must be reasons and a market to substantiate it.

C++ was originally designed to be a general-purpose language that superseded C with object-oriented features while staying low-level and close to the hardware. As a superset of C, C++ would also retain the speed, portability, and functionality of the C language; this made for an easy transition for programmers who were already using C. By then COBOL was established as a language, but it had already been witnessing a decline in overall popularity; as early as 1975 (a decade before the 1985 release of C++) Edsger Dijkstra stated, “the use of COBOL cripples the mind,” regarding the overall lack of structure present in the COBOL language.

C++ excels in the areas in which COBOL is weakest: structure, the compatibility and efficiency of additions to the language, the verbosity of syntax, support from the greater computer science community, and the methods of design undertaken by the original creators and subsequent standards committee.

C++ is incredibly well structured, primarily due to its derivation from C and the standards upon which it has been built. Well-written C++ code can be read and understood (albeit with some significant work) even as part of a million-line program, meaning that it can be modified, updated, or ported to another language without the fear of tearing apart an entire digital ecosystem through the alteration of a single piece of data. COBOL, on the other end of the spectrum, suffers from just that: much of the legacy COBOL code exists as archaic, unstructured, and illegible constructs that can no longer be understood or, more importantly, modified without the danger of tearing down a critical system.

C++ has also demonstrated many times that it can be extended without sacrificing existing legacy support or support for C. COBOL, as an old and monolithic language, could not be upgraded without leaving behind much of the code that had already been implemented. When COBOL-85 was released, it was not fully compatible with older versions and was therefore highly criticized, with users citing the heavy reprogramming costs of implementing the updated standard. It is important to note that the board of standards for COBOL was unable to convince the users of the language at the time to switch to a more modern version of itself. This has led to over 300 different dialects of COBOL and a user base that cannot and will not switch to even a more modern version of COBOL, let alone to a separate modern language.

This ultimately led to COBOL’s final primary detriment, which is the lack of continued support by engineers and scientists at the top of the field. COBOL was already an underused language during the mid-80s, a situation strongly exacerbated by the alienation of the computer science community from its development, which was driven by members of commerce and government. Elements such as the exclusion of Backus-Naur form for COBOL’s syntax in favor of its own metalanguage prevented most of the computer engineering and science community from paying much notice at all to COBOL.

The COBOL and C++ programming languages were evaluated against each other with regard to binary executable size on disk, lines of code to perform the same operations, and program execution time. Each program was written to the following requirements so that the comparison would be as representative as possible.

Read in a .CSV file

Store the data from the file into memory

Randomly generate values

Update the entire contents of the file by adding the randomly generated number to a specific field per row of the entire data set

Write the data back to a .CSV file

Be run on the same computer (laptop) during the evaluation and analysis events
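The requirements above can be sketched in C++ as follows. This is a minimal sketch, not the program that was profiled: the names Row, readCsv, addRandomToColumn, and writeCsv are hypothetical, and it assumes a CSV with no header row, no quoted fields, and an integer in the updated column.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// One CSV row held in memory as a list of text fields (hypothetical type).
using Row = std::vector<std::string>;

// Read every line of a CSV stream and split it on commas.
// Assumption: no quoted fields and no embedded commas.
std::vector<Row> readCsv(std::istream& in) {
    std::vector<Row> rows;
    std::string line;
    while (std::getline(in, line)) {
        Row row;
        std::stringstream fields(line);
        std::string field;
        while (std::getline(fields, field, ',')) {
            row.push_back(field);
        }
        rows.push_back(row);
    }
    return rows;
}

// Add a random integer in [lo, hi] to the given column of every row.
// Assumption: the column holds integers; rows that are too short are skipped.
void addRandomToColumn(std::vector<Row>& rows, std::size_t column,
                       int lo, int hi, std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(lo, hi);
    for (Row& row : rows) {
        if (column < row.size()) {
            row[column] = std::to_string(std::stoi(row[column]) + dist(rng));
        }
    }
}

// Write the rows back out as comma-separated lines.
void writeCsv(std::ostream& out, const std::vector<Row>& rows) {
    for (const Row& row : rows) {
        for (std::size_t i = 0; i < row.size(); ++i) {
            if (i > 0) {
                out << ',';
            }
            out << row[i];
        }
        out << '\n';
    }
}
```

A caller would open the input with std::ifstream, pass it to readCsv, call addRandomToColumn with a seeded std::mt19937, and hand a std::ofstream to writeCsv. Working on streams rather than file names keeps the three steps easy to test in isolation.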

Figure 3: Visual Studio Performance Analysis of COBOL Execution

Figure 3, above, shows a performance analysis of the COBOL program run in Microsoft Visual Studio 2015 (VS15). Although an execution time of about 2.02 seconds is depicted, the Visual Studio profiler removed the time it took for the program to gain admittance to the processor and for garbage collection, leaving about 1.2 seconds of actual time on the CPU. According to research, 1.2 seconds is very long for a COBOL file IO and sort operation. This is most likely attributed to running COBOL code in a non-mainframe setting without Job Control Language (JCL) files, COBOL Copy (CPY) files, or any of the many other resources that COBOLERs typically make use of.

Figure 4: Breakdown of COBOL Function Calls During Analysis

VS15 also created a separate window showing the functions that took the most time to complete. In Figure 4, above, the first two and the last functions called by the program were native to the MicroFocus programming environment. These functions consumed 63.37% of the total execution time for the program. The functions written for this program, not built in to COBOL or native to MicroFocus, consumed 36.63% of the total program execution time, which yields an execution time of 0.43956 seconds (0.3663 × 1.2 seconds). This number reflects what most COBOLERs refer to as a ‘good enough’ execution time for small COBOL operations. Further analysis of the execution timing reveals that the user-defined sorting function took up 26.59% of the total execution time. COBOL has a built-in SORT operation, which was also tested earlier in the development phase. While the SORT operation does work on tables with some modification, it was determined that due to time constraints it would be wiser to write a Bubble Sort method. We note that not using the built-in SORT operation makes the comparison unfair to COBOL.
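To illustrate why the choice of sort matters for the timing, the following C++ sketch places a hand-written bubble sort next to the standard library sort. It is only an analogy: the COBOL bubble sort used in the test is not reproduced here, and bubbleSort and librarySort are hypothetical names. Bubble sort performs on the order of n² comparisons in the worst case, while std::sort performs on the order of n log n.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hand-written bubble sort: repeatedly swaps adjacent out-of-order elements.
// Stops early when a full pass makes no swaps.
void bubbleSort(std::vector<int>& values) {
    bool swapped = true;
    for (std::size_t pass = 0; swapped && pass + 1 < values.size(); ++pass) {
        swapped = false;
        // After each pass the largest remaining value has settled at the end,
        // so the inner loop can stop one element earlier each time.
        for (std::size_t i = 0; i + 1 < values.size() - pass; ++i) {
            if (values[i] > values[i + 1]) {
                std::swap(values[i], values[i + 1]);
                swapped = true;
            }
        }
    }
}

// Library sort for comparison.
void librarySort(std::vector<int>& values) {
    std::sort(values.begin(), values.end());
}
```

Both functions produce the same ordering; the difference is in how the work grows with the number of records, which is why a built-in sort would likely have given the COBOL program a fairer showing.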

Figure 5: Visual Studio Performance Analysis of C++ Execution with Optimization Settings O2 – Maximize Speed

Figure 5, above, shows the performance analysis of the C++ program completing the same task as the COBOL program. Perhaps not surprisingly, C++ had an average runtime of 542 milliseconds, performing just over twice as fast as the COBOL program. One of the more challenging aspects of this research was finding examples of data and industry-standard COBOL. We would have liked to implement programs and data sets similar to those used in the business world; however, these implementations were more elusive than we thought they would be, perhaps because of their secure nature. Due to this, we cannot say with one hundred percent certainty that the data shows C++ to be faster; however, for basic tasks using code that has not been tuned by experts, this is shown to be the case.
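The timings in this report come from the Visual Studio profiler. As a rough cross-check that does not depend on a profiler, a wall-clock average could be taken with std::chrono, as in this sketch. The helper names elapsedMilliseconds and averageMilliseconds are hypothetical and were not part of the tested program.

```cpp
#include <chrono>

// Run a callable once and return the elapsed wall-clock time in milliseconds.
template <typename Work>
double elapsedMilliseconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Average several runs to smooth out scheduling noise between runs.
template <typename Work>
double averageMilliseconds(Work&& work, int runs) {
    double total = 0.0;
    for (int i = 0; i < runs; ++i) {
        total += elapsedMilliseconds(work);
    }
    return runs > 0 ? total / runs : 0.0;
}
```

Note that wall-clock time includes waiting for the processor, so it corresponds more closely to the larger elapsed figure a profiler reports than to the CPU-only figure.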

Figure 6: Breakdown of C++ Function Calls During Analysis (Optimized O2)

This is the separate window containing the C++ program’s functions requiring the largest amount of completion time. The oddity here (not represented in the above figure) was that almost every time the program was run, a different function would take up the largest percentage of individual work. For example, in the case above it happened to be the ‘<<’ operator in some fashion, while other times the ‘getline’ function would take that place. Whenever the program was run, one function consistently took about 60% of the work. There was no clear answer as to why this was the case. Overall, the ‘FileIn’ portion of the code was the most time consuming for the program, as can be seen in the figure above.

Figure 7: C++ Execution with Optimizations Disabled

MicroFocus in Microsoft Visual Studio does not provide settings for optimization of the COBOL compilation, yet C++ optimization settings are certainly available. After seeing the C++ and COBOL execution times and factoring out MicroFocus overhead, we were curious about the performance of C++ when optimizations were disabled. Figure 7, above, shows the execution profile of the C++ program with optimizations disabled and no other changes. The execution profile without optimizations presented a striking divergence from the profile with optimizations. As shown above, the un-optimized C++ program took fourteen seconds to complete, which is roughly twenty-five times slower than the optimized build and would be considered unacceptable for database file IO.

Figure 8: Breakdown of Un-Optimized C++ Program Execution

An interesting observation emerges from this profiling of the C++ program. During this profile the operations which utilized the Central Processing Unit (CPU) the most were related to vector operations, most notably the vector push_back() operation and its associated allocation operations. A difference of the magnitude depicted above leads us to think that some re-thinking of the C++ program could bring its un-optimized execution time closer to that of the COBOL program. Even so, it is hard to draw conclusions about the speed of COBOL from this result, as the 14-second run time could very likely be attributed to C++ runtime or operating system service calls during vector operations.
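One re-thinking of this kind, offered as an assumption rather than a tested fix, is to reserve the vector's storage before the read loop so that push_back() does not reallocate repeatedly. The sketch below counts capacity changes with and without reserve(); countReallocations is a hypothetical helper, and the measured program was not modified this way.

```cpp
#include <cstddef>
#include <vector>

// Append n integers and count how often the vector's capacity changes,
// used here as a proxy for how many times it reallocates its storage.
std::size_t countReallocations(std::size_t n, bool reserveFirst) {
    std::vector<int> values;
    if (reserveFirst) {
        values.reserve(n);  // one up-front allocation sized for all n elements
    }
    std::size_t reallocations = 0;
    std::size_t lastCapacity = values.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(static_cast<int>(i));
        if (values.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = values.capacity();
        }
    }
    return reallocations;
}
```

With reserveFirst set to true the count is zero, because the capacity never has to grow inside the loop. Without it the count depends on the growth policy of the standard library in use, so no specific number is claimed here.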

The total lines of code in the COBOL program could have been reduced by about 25 lines. The structures used to hold the data from the .CSV file were most likely not optimized compared to operational COBOL code. Time constraints and the significant learning curve encountered when learning COBOL resulted in a ‘data structure’ for containing the CSV file data that is less than optimal by professional standards. The size of the binary executable on disk is probably about right, since COBOL was created during a time when memory management was crucial.

The C++ program was similar and could have been reduced by at least 50 lines without a large amount of effort (however, this would have reduced its level of explicitness), and likely reduced even further with the fine-tuning of some algorithms. An expert or professional C++ programmer could probably implement this functionality in a moderately better way; however, for the purposes of the research done in this report we felt the data to be both accurate and conclusive.

COBOL at first seems foreign and obfuscated to the typical imperative procedural programmer, yet given some time the syntax of COBOL begins to make perfect sense. As programmers learn languages, the concept of programming and the act of writing a program evolve to a level at which the English language becomes foreign and obfuscated to them when viewed within the context of a program’s code. When Michael Riley began his journey upon the road to COBOL, he was met with a great deal of confusion and uncertainty. In most other imperative procedural languages, a conditional statement consists of a left operand, a relational operator, and a right operand. Yet in COBOL a conditional statement looks something like the following:

if rec-counter is greater than or equal to 1 then 

Similarly, a loop can be constructed in the following manner:

perform varying rec-counter from 1 by 1 until rec-counter is greater than 1526

Both examples above can be written as English or as something more closely resembling C++, for example:

if (rec-counter = 24) then display "hello" end-if.

Here = is a comparison, not an assignment. After some simple reading, one can see that this syntax actually makes logical sense. Many people claim that this is not the case, though, and cite the absence of software engineering influence during the design and creation of COBOL. On one hand, the syntax above made sense in its day because the individuals writing the programs weren’t actually programmers but business professionals. However, in current times this syntax is not as logically sound, since those who will be writing COBOL in the days to come will be software engineers.
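For comparison, the two COBOL statements shown earlier might be rendered in C++ roughly as follows. The function names are hypothetical, and the sketch assumes the default COBOL behavior in which the UNTIL condition is tested before each iteration.

```cpp
// C++ rendering of:
//   if rec-counter is greater than or equal to 1 then ...
bool atLeastOne(int recCounter) {
    return recCounter >= 1;
}

// C++ rendering of:
//   perform varying rec-counter from 1 by 1 until rec-counter is greater than 1526
// The loop body runs for rec-counter = 1, 2, ..., limit.
// The number of iterations is returned so the behavior can be checked.
int countIterations(int limit) {
    int iterations = 0;
    for (int recCounter = 1; !(recCounter > limit); ++recCounter) {
        ++iterations;
    }
    return iterations;
}
```

The loop condition is written as the negation of the COBOL UNTIL clause to keep the correspondence visible; an idiomatic C++ programmer would more likely write recCounter <= limit.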

Overall, our efforts were not a direct comparison between the two programming languages due to our inability to write COBOL in the way it was intended to be written. Running COBOL via MicroFocus on a modern laptop is a far cry from running COBOL code on an IBM mainframe in a business environment. On the other end, C++ could have been implemented in a more efficient manner and with potentially better hardware. Further testing is required to reach a definitive answer as to whether COBOL should be replaced by C++ in industry. Such testing is beyond the capability and scope of our efforts for CS332; however, this does seem to be a topic for future research, perhaps at the master’s level or above. What we ultimately learned while performing this research is that there are many reasons to support both sides of the argument, and it is clear this debate exists for strong reasons. C++ may execute more quickly, but even if this is true for an expertly programmed, wholly optimized version of a similar test, it is still perhaps not worth the switch. There may be little benefit to code that performs marginally faster in this particular workflow, as these transactions seem to work fine for the modern marketplace. Our goal in performing further research would be to find clear examples of code and data used in industry, and to find hardware capable of running it. With this information we would be able to better test against C++, as well as perhaps multiple other languages that have been proposed as competitors to COBOL.

Special Thanks to Mr. Bill Woodger, GnuCOBOL forum Senior Member, for his

assistance in helping Michael get his mind around COBOL. Without his assistance this project

may have failed to generate a comparison entirely.

Works Cited

Wikipedia contributors. "COBOL." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 9 Apr. 2017. Web. 20 Apr. 2017.

Boutin, Paul, Brent Hailpern, Todd Proebsting, and Gio Wiederhold. "Mother Tongues." www.digibarn.com, n.d. Web. 20 Apr. 2017.

Admin. "US Inflation Calculator." US Inflation Calculator. N.p., n.d. Web. 20 Apr. 2017.

"Dr. Grace Murray Hopper." Dr. Grace Murray Hopper: COBOL Computer Language. N.p., n.d. Web. 20 Apr. 2017.

Bell, Gordon. "Computer Trees." Computer Trees. N.p., 22 Mar. 2015. Web. 20 Apr. 2017.

"Not Just a Load of Old COBOLers: Systems Are Still Running on Old Code." Naked Security. N.p., 31 Mar. 2017. Web. 20 Apr. 2017.

Moffitt, Kevin. "A Framework and Implementation for Detecting Source Code Faults in COBOL Code." SSRN, 05 Apr. 2017. Web. 20 Apr. 2017.

"COBOL Skill Salary - PayScale." COBOL Skill Salary, Average Salaries | PayScale. N.p., n.d. Web. 20 Apr. 2017.

Bailey, C. "How Mainframe Innovation Can Help Modernize Federal Government IT." Inside Tech Talk, 23 Sept. 2016. Web. http://insidetechtalk.com/modernize-federalgovernment-it/

Moore, Jack. "Here Are 10 of the Oldest IT Systems in the Federal Government." Nextgov. N.p., 25 May 2016. Web. 20 Apr. 2017.
