Metrics and Optimization

8/2/2019 Metrics and Optimization


Team Tango & Victor


What Is It?

A metric is a rule for quantifying a characteristic or attribute of a program.

Metrics are an important part of the software development process and help developers find improvements in their code.


Common Code Metrics

Code Coverage
Program Load Time
Cohesion
Coupling
Comment Density
Source Lines of Code or Program Length
Bugs Per Line of Code
Number of Classes and Interfaces
Execution Time


Code Coverage

Definition: A measurement of how many lines of code are executed during automated testing. It is a structural testing technique.

A code coverage tool reports the percentage of the code that has been exercised.

Used to develop a rigorous yet manageable set of regression tests.


Types of Coverage

The main coverage types are:

Function Coverage: reports whether each function or procedure is invoked.

Statement Coverage: reports whether each executable statement is encountered.

Decision Coverage: reports whether each Boolean expression tested in a control structure evaluates to both true and false.

Condition/Decision Coverage: both the decision and the condition requirements must be satisfied.

Continued

Condition Coverage: reports whether each Boolean sub-expression evaluates to both true and false.

Path Coverage: reports whether each path in each function has been followed. Also known as predicate coverage.
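To make the gap between statement and decision coverage concrete, here is a minimal Python sketch of how a statement-coverage tool can work. It uses the standard `sys.settrace` hook to record which lines of a function execute. The function `clamp_negative` and the helper `statement_coverage` are hypothetical, and real coverage tools handle far more (multiple files, branch tracking, exclusions).

```python
import dis
import sys


def clamp_negative(x):
    result = x
    if x < 0:
        result = 0
    return result


def statement_coverage(func, *calls):
    """Run func(*args) for each args tuple in `calls` and return the
    fraction of func's body lines that executed at least once."""
    code = func.__code__
    first = code.co_firstlineno
    # Lines the compiler emitted code for, excluding the `def` line itself.
    body = {line for _, line in dis.findlinestarts(code)
            if line is not None and line > first}
    hit = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None              # ignore every other function
        if event == "line" and frame.f_lineno > first:
            hit.add(frame.f_lineno)  # record each executed body line
        return tracer                # keep tracing this frame

    sys.settrace(tracer)
    try:
        for args in calls:
            func(*args)
    finally:
        sys.settrace(None)
    return len(hit & body) / len(body)


# A single negative input reaches every statement...
print(statement_coverage(clamp_negative, (-5,)))
# ...while a single positive input skips `result = 0`.
print(statement_coverage(clamp_negative, (5,)))
```

The negative input alone gives full statement coverage, yet the decision `x < 0` never evaluated to false, so decision coverage for that test is only 50%.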



Advantage

Allows developers and QA to look at parts of a system that are rarely accessed under normal conditions, such as error handling. Testers can use the results to develop additional test cases that increase the overall code coverage.



Disadvantages

Time Consuming

High Cost


Program Loading Time

Definition: How long it takes for a program to load before the user can interact with it. Loading starts with the OS reading the contents of the executable into memory and carrying out other tasks to prepare it.

Once loading is done, the OS starts the program by passing control to the loaded program.



Cohesion and Coupling

(Invented by Larry Constantine)


Cohesion

What is cohesion?

Cohesion is a measure of how strongly related the pieces of functionality expressed by the source code of a software module are.



Which one is better?

High Cohesion or Low Cohesion



Disadvantages of Low Cohesion

Increased difficulty in understanding modules

Increased difficulty in maintaining a system

Increased difficulty in reusing a module


Types of Cohesion

Coincidental cohesion (worst): parts of a module are grouped arbitrarily.

Logical cohesion: parts of a module are grouped because they are logically categorized to do the same thing.

Temporal cohesion: parts of a module are grouped by when they are processed.

Procedural cohesion: parts of a module are grouped because they always follow a certain sequence of execution.


Types of Cohesion

Communicational cohesion: parts of a module are grouped because they operate on the same data.

Sequential cohesion: parts of a module are grouped because the output from one part is the input to another part.

Functional cohesion (best): parts of a module are grouped because they all contribute to a single well-defined task of the module.


Coupling (or Dependency)

Coupling is the degree to which each program module relies on each of the other modules.



Which one is better?

Tight Coupling or Loose Coupling


Disadvantages of Tight Coupling

A change in one module usually forces a ripple effect of changes in other modules.

Assembling modules might require more effort and/or time due to the increased inter-module dependency.

A particular module might be harder to reuse and/or test because dependent modules must be included.
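One common way to loosen coupling is dependency injection. In this minimal Python sketch, the report receives any object that has a `fetch_sales()` method instead of constructing a concrete database class itself. All class and method names are hypothetical.

```python
class SqlSalesStore:
    """Stand-in for a real database-backed module (hypothetical)."""

    def fetch_sales(self):
        raise RuntimeError("needs a live database connection")


class TightReport:
    # Tightly coupled: always builds its own SqlSalesStore, so it cannot be
    # tested or reused without the database module.
    def __init__(self):
        self.store = SqlSalesStore()

    def total(self):
        return sum(self.store.fetch_sales())


class LooseReport:
    # Loosely coupled: depends only on "something with fetch_sales()".
    def __init__(self, store):
        self.store = store

    def total(self):
        return sum(self.store.fetch_sales())


class FakeStore:
    """In-memory substitute used for testing."""

    def fetch_sales(self):
        return [10, 20, 30]


print(LooseReport(FakeStore()).total())
```

`LooseReport` can be tested with `FakeStore` alone, while `TightReport` drags the database module into every test.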



Types of Coupling


Cohesion vs. Coupling

Low coupling often correlates with high cohesion.

High coupling often correlates with low cohesion.


Comment Density

A measure of meaningful comments per logical line of code.

The content of the comments is important, not just their number.

Too many comments can be cumbersome.
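A rough Python sketch of comment density as the ratio of comment lines to non-blank code lines. It only recognizes full-line `#` comments and ignores docstrings and trailing comments, so it is an approximation. It also cannot judge whether a comment is meaningful. The function name `comment_density` is hypothetical.

```python
def comment_density(source):
    """Return comment lines per code line for Python source text.

    Only full-line '#' comments are counted; blank lines are ignored.
    """
    comments = code = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue                 # blank lines count as neither
        if stripped.startswith("#"):
            comments += 1
        else:
            code += 1
    return comments / code if code else 0.0


sample = """
# Compute the area of a circle.
import math

def area(r):
    # r must be non-negative
    return math.pi * r * r
"""
print(comment_density(sample))  # 2 comment lines / 3 code lines
```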


Source Lines of Code (SLOC)

A software metric used to measure the size of a program.

Useful for predicting the general amount of effort required to complete a similar program.

First used when FORTRAN and assembler were the main languages.


SLOC

Physical SLOC (LOC): the actual number of lines in the source code. Tools to measure it are easier to write, but it is subject to logically irrelevant formatting conventions.

Logical SLOC (LLOC): the number of "statements" in the source code. It is not affected by formatting conventions.


SLOC Example

Example 1:

for (i = 0; i < 100; i += 1) printf ("hello");

Example 2:

/* Now how many lines of code is this? */
for (i = 0; i < 100; i += 1)
{
    printf("hello");
}

How many physical lines of code? How many logical lines of code?
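Here is one way to answer the question under a common convention. Physical SLOC counts the non-blank lines left after removing comments. Logical SLOC counts statements, approximated here as `;` outside parentheses plus control keywords. This Python sketch is a deliberately naive counter for C-like text. It ignores string literals and preprocessor lines, and its function names are hypothetical.

```python
import re

CONTROL = re.compile(r"\b(?:for|while|if|switch)\b")


def strip_comments(src):
    src = re.sub(r"/\*.*?\*/", "", src, flags=re.S)   # block comments
    return re.sub(r"//[^\n]*", "", src)                # line comments


def physical_sloc(src):
    """Non-blank lines that remain after removing comments."""
    return sum(1 for line in strip_comments(src).splitlines() if line.strip())


def logical_sloc(src):
    """Statements: ';' at parenthesis depth 0, plus control keywords."""
    code = strip_comments(src)
    depth = statements = 0
    for ch in code:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == ";" and depth == 0:   # skip the ';' inside for (...)
            statements += 1
    return statements + len(CONTROL.findall(code))


example1 = 'for (i = 0; i < 100; i += 1) printf ("hello");'
example2 = """/* Now how many lines of code is this? */
for (i = 0; i < 100; i += 1)
{
printf("hello");
}"""

for name, src in (("example 1", example1), ("example 2", example2)):
    print(name, physical_sloc(src), logical_sloc(src))
```

Under this convention, example 1 has 1 physical and 2 logical lines. Example 2 has 4 physical lines (plus one comment line) but still 2 logical lines, which is why LLOC is less sensitive to formatting.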


Importance of SLOC

Advantages:

Intuitive metric for measuring software size

Easy for new programmers to understand

Can be used to estimate the number of bugs per chunk of code

Can be used to measure productivity as SLOC per staff hour


Importance of SLOC

Disadvantages:

Coding accounts for only a small part of the entire software creation process

Software often uses more than one language

Can encourage unnecessarily verbose code

GUI tools achieve a high level of functionality from very little work by the programmer


Types of SLOC

KLOC - 1 000 lines
KDLOC - 1 000 delivered lines
KSLOC - 1 000 source lines
MLOC - 1 000 000 lines
GLOC - 1 000 000 000 lines

Does having a lower SLOC count mean having a better program?



Similarly

Number of classes and interfaces

Number of lines of customer requirements


Bugs Per Line of Code

On average, the software industry sees 15 - 50 errors per 1000 lines of delivered code.

Microsoft applications had about 0.5 defects per 1000 lines of delivered code (in 1992).

At the upper end of the industry range (50 per 1000 lines), you should expect about 500 bugs per 10 KLOC.

You should expect to spend about 50% of your time debugging.

Count bugs and log errors to improve code quality.


How can we reduce this? Clean Room Development

Averages 3 defects per 1000 lines during testing and 0.1 defects per 1000 lines of delivered code.

Focuses on defect prevention rather than removal.

An example of software written using this method is the Space Shuttle software, which achieved 1 defect in 400,000 lines of code using formal development methods, peer reviews, and statistical testing. The downside: this came at a cost of $1000 (taxpayers' money) per line of code.


Execution Time

What is it? The time during which a program is running (executing).

Things that can influence execution time: type checking, storage allocation, code optimization, and the run time of algorithms.

How to improve it?

Try to push most tasks to compile time rather than runtime. Multithread when possible. Design better and faster algorithms.
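Here is a small sketch of measuring execution time with Python's `time.perf_counter`. It compares a quadratic duplicate check with a linear one that uses a set, as an instance of "design better and faster algorithms". The function names are hypothetical. Absolute timings depend on the machine, so no numbers are shown, and a single call is a noisy measurement (the standard `timeit` module repeats runs for steadier results).

```python
import time


def has_duplicates_quadratic(items):
    # Compares every pair: O(n^2) comparisons.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    # A set gives average O(1) membership tests: O(n) overall.
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


def elapsed(func, *args):
    """Return (result, seconds) for a single call of func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


data = list(range(2000))   # no duplicates: worst case for both versions
for f in (has_duplicates_quadratic, has_duplicates_linear):
    result, seconds = elapsed(f, data)
    print(f"{f.__name__}: {result} in {seconds:.4f} s")
```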


Number of Classes and Interfaces

The number of classes and interfaces, independent of the number of lines of code, is a good way to measure the size of a program.

If you know the number of classes and interfaces before implementing the code, you can use this number to estimate completion time.

Example: A program with 5 classes is smaller than a program with 500 classes.
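For Python code, the class count can be taken from the syntax tree with the standard `ast` module. Python has no separate interface construct, so this hypothetical helper simply counts `class` statements, including nested ones.

```python
import ast


def count_classes(source):
    """Count class definitions (including nested ones) in Python source."""
    tree = ast.parse(source)
    return sum(isinstance(node, ast.ClassDef) for node in ast.walk(tree))


module = """
class Shape:
    pass

class Circle(Shape):
    class Builder:
        pass

def helper():
    class Local:
        pass
"""
print(count_classes(module))  # 4: Shape, Circle, Builder, Local
```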


Tools for Software Metrics

There are plenty of tools for measuring software, such as:

Analyst4j - an Eclipse plug-in or stand-alone tool to measure Java programs

OOMeter - measures software using cohesion, complexity, or coupling


Summary

"You can't control what you can't measure."

To prevent bugs, worry about them while coding rather than fixing them afterwards.

Try to minimize execution time by pushing most tasks to compile time, designing efficient algorithms, and using your processor to its full potential.

Measuring your program gives you a better estimate of how much work remains and helps you understand where the code could be improved.

Use good measurement methods, because a naive approach can cause more harm than good.


Performance Tuning

"Machine independent code has machine independent performance." Greg Wilson


Performance Tuning

Moore's Law, as popularly stated, tells us that chips double in speed every 18 months.

Proebsting's Law tells us that compiler optimizations double program speed every 18 years.

After 18 years, an upgrade to the latest hardware would constitute a 4096x improvement in speed (18 years is twelve 18-month doublings, and 2^12 = 4096) over the optimization's 2x.


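What could possibly go wrong?

Optimization almost always fails when performed prematurely.

The system is constantly evolving early on.

Effects are unpredictable on complex modern systems.

bool ready = true;
while (ready) {
    // act
}

Optimized =>

while (true) {
    // act
}

If nothing inside the loop visibly changes ready (for example, another thread sets it without proper synchronization), the compiler may treat it as always true and turn the loop into an infinite one.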


Considerations before you optimize

Why is it behaving that way?

Is that behaviour really necessary?

Read the documentation! Add documentation yourself.

Then you can predict the effects of minor changes.

//TODO: Use this later
bool ready = true;
while (ready) {
    // changeReady();
}


Bentley's Rules for Optimization

Complete list at http://www.imaging.robarts.ca/~kwang/OriginTuning/sgi_html/apa.html

Excerpts from Writing Efficient Programs

25 years old

Some rules are still worth knowing

Key points of interest:
1. Data Structure Augmentation
2. Storing Computed Results
3. Lazy Evaluation
4. Packing
5. Interpreters
6. Code Motion Out of Loops
7. Combining Tests
8. Loop Unrolling & Fusion
9. Exploit Algebraic Identities
10. Short-Circuiting
11. Precomputation
12. Iteration Over Recursion
13. Recycling Objects


Data Structure Augmentation

"The time required for common operations on data can often be reduced by augmenting the structure with additional information, or by changing the information within the structure so it can be accessed more easily." (Bentley)


Augmentation High-Level Example

string quoteText = @"The time required for common operations on data can often be reduced by augmenting the structure with additional information";

Quote quote = new Quote(quoteText, "Bentley");
quote.Append(@"or by changing the information within the structure so it can be accessed more easily.");

quote.Owner = "Michael Scott";
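Here is a more conventional illustration of the rule, sketched in Python. A stack is augmented so that each entry also stores the minimum of everything beneath it. The extra field costs memory, but it turns "what is the current minimum?" from a linear scan into a constant-time lookup. The `MinStack` name is illustrative.

```python
class MinStack:
    """Stack augmented with the running minimum at each level."""

    def __init__(self):
        self._items = []   # list of (value, minimum so far) pairs

    def push(self, value):
        if self._items:
            smallest = min(value, self._items[-1][1])
        else:
            smallest = value
        self._items.append((value, smallest))

    def pop(self):
        return self._items.pop()[0]

    def current_min(self):
        # O(1): the answer is stored with the top entry, no scan needed.
        return self._items[-1][1]


s = MinStack()
for v in (5, 3, 7, 2):
    s.push(v)
print(s.current_min())   # 2
s.pop()
print(s.current_min())   # 3
```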


Store Precomputed Results

"The cost of recomputing an expensive function can be reduced by computing the function only once and storing the results. Subsequent requests for the function are handled by table lookup." (Bentley)


Storing Precomputed Results

Local variables within functions and methods

Static variables within classes

Collections and hash tables when there are multiple results

Cached database requests

Worthwhile to do this when function calls are expensive.

Get in the habit of doing this for short functions as well.



Lazy Evaluation

"We'll do it live!"


Lazy Evaluation

"Never evaluate an item until it is needed."


Examples of Lazy Evaluation

fibonacci_results = {}

def fibonacci(i):
    # Evaluate Fibonacci numbers lazily: each value is computed and cached
    # only the first time it is requested. (dict.setdefault would evaluate
    # the recursive default eagerly on every call, defeating the cache.)
    if i not in fibonacci_results:
        fibonacci_results[i] = i if i in (0, 1) else fibonacci(i - 1) + fibonacci(i - 2)
    return fibonacci_results[i]

Populate the table with only the values that are actually requested, when they are requested.

Bad:

def fibonacci(i):
    if i not in fibonacci_results:
        # Precompute at least the first N values (N is a fixed table size),
        # whether or not they are ever requested.
        fibonacci_results[0] = 0
        fibonacci_results[1] = 1
        for j in range(2, max(i, N) + 1):
            fibonacci_results[j] = fibonacci_results[j - 1] + fibonacci_results[j - 2]
    return fibonacci_results[i]
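Python generators are another form of lazy evaluation. A generator produces each value only when the consumer asks for it, so even an unbounded sequence is safe as long as only a finite prefix is taken. Here is a minimal sketch using the standard `itertools.islice`. The name `fibonacci_stream` is illustrative.

```python
from itertools import islice


def fibonacci_stream():
    """Yield Fibonacci numbers one at a time, forever."""
    a, b = 0, 1
    while True:
        yield a              # execution pauses here until the next request
        a, b = b, a + b


first_ten = list(islice(fibonacci_stream(), 10))
print(first_ten)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```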



Packing

Dense storage representations can decrease storage costs at the price of increasing the time to store and retrieve data.

Time-for-Space Rules

struct big_rgb {
    int red;
    int green;
    int blue;
};
/* Wastes 3 bytes (24 bits) per component,
 * assuming a 4-byte int.
 * Bad for high-resolution images. */

struct small_rgb {
    unsigned int red : 8;
    unsigned int green : 8;
    unsigned int blue : 8;
};
/* The compiler packs the bit fields above compactly
 * (typically into one unsigned int; the exact layout
 * is implementation-defined).
 * Could use unsigned char instead. */

Example: Storing RGB component values (0-255).
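Here is the same time-for-space trade-off sketched in Python: three 0-255 components are packed into one 24-bit integer with shifts and masks. Storage shrinks, but every read or write now pays for the bit manipulation. The `struct.calcsize` calls compare three 4-byte ints with three single bytes using standard sizes (`=`). The helper names are hypothetical.

```python
import struct


def pack_rgb(red, green, blue):
    """Pack three 8-bit components into one integer (0xRRGGBB)."""
    for c in (red, green, blue):
        if not 0 <= c <= 255:
            raise ValueError("component out of range 0-255")
    return (red << 16) | (green << 8) | blue


def unpack_rgb(value):
    """Recover (red, green, blue) from a packed 0xRRGGBB integer."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


print(struct.calcsize("=3i"), struct.calcsize("=3B"))   # 12 3
print(hex(pack_rgb(255, 128, 0)))                        # 0xff8000
print(unpack_rgb(0xFF8000))                              # (255, 128, 0)
```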



Interpreters

The space required to represent a program can often be decreased by using interpreters in which common sequences of operations are represented compactly.

Time-for-Space Rules

Example: regular expressions encoded as finite-state automata (FSAs). However, state jumping confuses compilers.
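As a sketch of the idea, here is a tiny table-driven interpreter in Python for the illustrative regular expression `ab*c`. The pattern is stored compactly as a transition table, and one small loop interprets that table for any input string.

```python
# Transition table for ab*c: state -> {character: next state}.
TRANSITIONS = {
    0: {"a": 1},
    1: {"b": 1, "c": 2},
}
ACCEPTING = {2}


def matches(text):
    """Run the finite-state automaton over text; True if it ends accepting."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get(state, {}).get(ch)
        if state is None:          # no transition: reject immediately
            return False
    return state in ACCEPTING


for s in ("ac", "abbbc", "abd", "abcc"):
    print(s, matches(s))
```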



Code Motion Out of Loops

An expression whose value does not depend on the loop variable should be calculated once, outside the loop, rather than on every iteration.

Loop Rules

Compilers are good at recognizing invariant expressions. Place expressions where it is most natural to read or write them, and let the compiler move them for you.

Example: for (i=0; i < n ;i++) { if (x[i]
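Here is a separate Python sketch of code motion. Unlike the optimizing compilers this slide has in mind, CPython's bytecode compiler does not generally hoist loop-invariant expressions, so in Python the manual version can matter. The function names and the `math.sqrt(scale)` invariant are illustrative.

```python
import math


def scale_all_naive(values, scale):
    # math.sqrt(scale) does not depend on the loop variable,
    # yet it is recomputed on every iteration.
    return [v * math.sqrt(scale) for v in values]


def scale_all_hoisted(values, scale):
    factor = math.sqrt(scale)      # computed once, outside the loop
    return [v * factor for v in values]


data = [1.0, 2.0, 3.0]
print(scale_all_hoisted(data, 4.0))   # [2.0, 4.0, 6.0]
```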




Loop Unrolling

A large part of the cost of some short loops lies in updating the loop index. That cost can often be reduced by unrolling the loop.

The goal of loop unrolling is to increase a program's speed by reducing (or eliminating) the instructions that control the loop.

Loop Rules

Example:

int x;
for (x = 0; x < 100; x++) {
    delete(x);
}

int x;
/* Unrolled by a factor of 5; works because 100 is a multiple of 5. */
for (x = 0; x < 100; x += 5) {
    delete(x);
    delete(x + 1);
    delete(x + 2);
    delete(x + 3);
    delete(x + 4);
}
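Here is a Python sketch of unrolling a summation loop by a factor of four. It includes the cleanup loop that handles lengths that are not a multiple of the unroll factor. In an interpreted language the speed-up is not guaranteed, so the point is the shape of the transformation. The function names are illustrative.

```python
def total(values):
    s = 0
    for v in values:
        s += v
    return s


def total_unrolled(values):
    s = 0
    n = len(values)
    limit = n - n % 4          # largest multiple of 4 not exceeding n
    i = 0
    while i < limit:           # one loop test per four elements
        s += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    while i < n:               # leftover 0-3 elements
        s += values[i]
        i += 1
    return s


print(total_unrolled(list(range(10))))   # 45
```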



Loop Fusion

If two nearby loops operate on the same set of elements, combine their operational parts and use only one set of loop-control operations.

However, this can conflict with modularity principles.

Loop Rules

Example:

//BAD!!

for (i=0;i
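Here is a Python sketch of loop fusion that computes a minimum and a maximum in one pass instead of two, so the loop-control work is paid only once. The function names are illustrative, and both assume a non-empty list.

```python
def min_max_two_loops(values):
    lo = values[0]
    for v in values:           # first pass
        if v < lo:
            lo = v
    hi = values[0]
    for v in values:           # second pass over the same elements
        if v > hi:
            hi = v
    return lo, hi


def min_max_fused(values):
    lo = hi = values[0]
    for v in values:           # one pass, one set of loop-control operations
        if v < lo:
            lo = v
        if v > hi:
            hi = v
    return lo, hi


print(min_max_fused([4, -2, 9, 0]))   # (-2, 9)
```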


Exploit Algebraic Identities

In a conditional expression, replace a costly expression with an algebraically equivalent expression that is cheaper to evaluate.

Compilers can sometimes do this.

Logic Rules

Example: Not (sqr(X) > 1) but (X > 1 || X < -1).

Not (!(A) && !(B)) but !(A || B) (De Morgan's Law).
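Before swapping in a cheaper expression, it helps to check that the two forms really agree. This Python sketch compares `x * x > 1` with the two-sided test on sample values, which also shows that `x > 1` alone would be wrong for negative `x`. It then checks De Morgan's law over all Boolean inputs. A finite check like this is evidence, not a proof, and the names are illustrative.

```python
from itertools import product


def costly(x):
    return x * x > 1             # square, then compare


def cheap(x):
    return x > 1 or x < -1       # equivalent without squaring


samples = [-3, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 3]
print(all(costly(x) == cheap(x) for x in samples))    # True
print(all(costly(x) == (x > 1) for x in samples))     # False (fails for x = -3)

# De Morgan's law: (not A and not B) == not (A or B) for every input.
print(all((not a and not b) == (not (a or b))
          for a, b in product([False, True], repeat=2)))   # True
```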



Short-Circuit Monotone Functions

Take advantage of the short-circuit behavior of Boolean expressions by evaluating cheap conditions first.

Compilers usually cannot rearrange the order of evaluation for you: in C, && and || are defined to evaluate their operands from left to right.

Logic Rules

Example:

//BAD: May get a run-time error.
if (((1/x) < 1) && x != 0)

vs.

//GOOD: Avoid dividing by zero.
if (x != 0 && ((1/x) < 1))



Precompute Logical Functions

Hard-code a function as a lookup table instead.

Pros: Table lookups are very fast.

Cons:

Consumes more memory, a lot more on large domains.

Takes more time to write and change.
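Here is a Python sketch of a precomputed logical function: the number of 1 bits in each byte is stored in a 256-entry table, and a 32-bit count becomes four lookups. A table over all 2^32 inputs would be impractical, which illustrates the memory cost on large domains. The names are illustrative.

```python
# Precomputed table: number of 1 bits in every possible byte value.
BITS_IN_BYTE = [bin(b).count("1") for b in range(256)]


def popcount32(x):
    """Count set bits in a 32-bit unsigned value using four table lookups."""
    return (BITS_IN_BYTE[x & 0xFF]
            + BITS_IN_BYTE[(x >> 8) & 0xFF]
            + BITS_IN_BYTE[(x >> 16) & 0xFF]
            + BITS_IN_BYTE[(x >> 24) & 0xFF])


print(popcount32(0xF0F0))        # 8
print(popcount32(0xFFFFFFFF))    # 32
```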



Iteration vs. Recursion

Performance-wise, iteration is generally at least as good as recursion.

In many cases, recursion is worse due to stack-frame allocation.

Some languages optimize tail calls (tail recursion) to avoid excessive stack-frame allocation.

for (i = 0; i < 10; ++i) {
    // DO STUFF
}

Good

void stuff(int i) {  // called as stuff(0)
    // DO STUFF
    if (i + 1 < 10)
        stuff(i + 1);
}

Bad
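CPython does not eliminate tail calls, so deep recursion hits the interpreter's recursion limit (1000 frames by default), while an equivalent loop has no such problem. Here is a minimal sketch with illustrative names.

```python
def sum_to_recursive(n, acc=0):
    # Tail call: nothing remains to do after the recursive call returns,
    # yet CPython still allocates a new stack frame for it.
    if n == 0:
        return acc
    return sum_to_recursive(n - 1, acc + n)


def sum_to_iterative(n):
    acc = 0
    for i in range(1, n + 1):
        acc += i
    return acc


print(sum_to_iterative(100_000))     # 5000050000
try:
    sum_to_recursive(100_000)
except RecursionError:
    print("RecursionError: too many stack frames")
```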



Recycling Objects

It is important to free unused memory for future use.

Manual memory management is the most efficient, but it is prone to errors: memory may be freed too early, too late, not at all, or too many times.

Automatic garbage collection is more reliable, but expensive.

"Mark-and-Sweep": mark all reachable references, then sweep away all unmarked memory.

"Stop-and-Copy": copy all reachable references to a new section of memory, then free everything left behind.

"Generational": use one of the above methods on recently allocated memory (faster, but doesn't catch everything).
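Here is a toy sketch of mark-and-sweep over an explicit object graph, written in Python for illustration. The "heap" is just a dictionary mapping hypothetical object names to the names they reference. The unreachable pair `c` and `d` is reclaimed even though its members reference each other.

```python
def mark_and_sweep(heap, roots):
    """heap: name -> list of referenced names. Returns the surviving heap."""
    # Mark phase: depth-first traversal from every root.
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(heap[obj])
    # Sweep phase: keep marked objects, drop everything else.
    return {name: refs for name, refs in heap.items() if name in marked}


heap = {
    "a": ["b"],
    "b": [],
    "c": ["d"],   # c and d reference each other...
    "d": ["c"],   # ...but nothing reachable points to them
}
print(sorted(mark_and_sweep(heap, roots=["a"])))   # ['a', 'b']
```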


More Tips

Vectors (dynamic arrays) are usually the most efficient lists.

Hash tables are usually the most efficient maps.

Don't synchronize "just in case".

Acquire the lock, then loop/recurse, then release it.

Instance variables are initialized per object, class variables just once.

Inner classes can use private methods of their containers, but at a cost.
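Here is the "acquire lock, then loop, then release" tip sketched with Python's `threading.Lock`. The batched version takes the lock once per call instead of once per item, so it performs far fewer acquire/release operations while producing the same total. Holding the lock longer does block other threads for longer, so this is a trade-off. The function names are illustrative.

```python
import threading

counter = 0
lock = threading.Lock()


def add_per_item(n):
    global counter
    for _ in range(n):
        with lock:             # acquire and release on every iteration
            counter += 1


def add_batched(n):
    global counter
    with lock:                 # acquire once, loop, then release
        for _ in range(n):
            counter += 1


threads = [threading.Thread(target=f, args=(10_000,))
           for f in (add_per_item, add_batched, add_per_item, add_batched)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 40000
```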



Questions?
