Metrics and Optimization


  • 8/2/2019 Metrics and Optimization

    1/60

    Metrics and Optimization

    Team Tango & Victor


    What is it

    A metric is a rule for quantifying a characteristic or attribute of a program.

    Metrics are an important part of the software development process.

    They help developers find improvements in their code.


    Common Code Metrics

    Code Coverage

    Program Load Time

    Cohesion

    Coupling

    Comment Density

    Source Lines of Code or Program Length

    Bugs Per Line of Code

    Number of Classes and Interfaces

    Execution Time


    Code Coverage

    Definition: A measurement of how many lines of code are executed during automated testing.

    It is a structural testing technique.

    The code coverage tool gives a percentage of how much of the code has been exercised.

    Used to develop a set of rigorous and manageable regression tests.


    Types of Coverage

    The main coverages are:

    Function Coverage: Reports whether each function or procedure is invoked.

    Statement Coverage: Reports whether each executable statement is encountered.

    Decision Coverage: Reports whether each Boolean expression tested in control structures evaluates to both true and false.

    Condition/Decision Coverage: Both the decision and condition requirements must be satisfied.


    Types of Coverage (continued)

    Condition Coverage: Reports whether each Boolean sub-expression evaluates to both true and false.

    Path Coverage: Reports whether each path in each function has been followed. Also known as predicate coverage.
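    The difference between statement and decision coverage can be sketched in a few lines. This is an illustration only: the function `absolute` and the hand-rolled `decisions` set are hypothetical stand-ins for what a real coverage tool records automatically.

```python
def absolute(x, decisions):
    """Return |x|; `decisions` records each outcome of the `x < 0` test."""
    negative = x < 0
    decisions.add(negative)   # instrumentation: remember True/False outcomes
    result = x
    if negative:
        result = -x
    return result


seen = set()
absolute(-5, seen)            # runs every statement, including `result = -x`
after_one_test = set(seen)    # only the True outcome has been observed

absolute(3, seen)             # adds the False outcome of the decision
after_two_tests = set(seen)
```

    A single test with a negative input already executes every statement, yet the decision has only evaluated to true. The second test is what brings decision coverage to 100%.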


    Advantage

    Allows developers and QA to look at parts of a system that are rarely accessed under normal conditions, such as error handling.

    Testers can use the results to develop more test case sets that increase the overall code coverage.


    Disadvantages

    Time Consuming

    High Cost


    Program Loading Time

    Definition: How long it takes for a program to load before the user can interact with it.

    Loading starts with the OS reading the contents of the executable into memory and carrying out other preparatory tasks.

    Once loading is done, the OS starts the program by passing control to the loaded program.


    Cohesion and Coupling

    (Invented by Larry Constantine)


    Cohesion

    What is Cohesion?

    Cohesion is a measure of how strongly related each piece of functionality expressed by the source code of a software module is.


    Which one is better?

    High Cohesion or Low Cohesion


    Disadvantages of Low Cohesion

    Increased difficulty in understanding modules

    Increased difficulty in maintaining a system

    Increased difficulty in reusing a module


    Types of Cohesion

    Coincidental cohesion (worst): parts of a module are grouped arbitrarily

    Logical cohesion: parts of a module are grouped because they are logically categorized to do the same thing

    Temporal cohesion: parts of a module are grouped by when they are processed

    Procedural cohesion: parts of a module are grouped because they always follow a certain sequence of execution


    Types of Cohesion (continued)

    Communicational cohesion: parts of a module are grouped because they operate on the same data

    Sequential cohesion: parts of a module are grouped because the output from one part is the input to another part

    Functional cohesion (best): parts of a module are grouped because they all contribute to a single well-defined task of the module
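    The two ends of the scale can be contrasted with a small sketch. The class names `MiscUtils` and `Circle` are hypothetical and exist only for illustration.

```python
import math


class MiscUtils:
    """Coincidental cohesion: unrelated helpers that merely share a module."""

    @staticmethod
    def circle_area(radius):
        return math.pi * radius ** 2

    @staticmethod
    def reverse_text(text):
        return text[::-1]


class Circle:
    """Functional cohesion: every member contributes to modelling a circle."""

    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

    def circumference(self):
        return 2 * math.pi * self.radius
```

    Nothing in `MiscUtils` explains why its two functions live together, whereas a reader of `Circle` can predict what else belongs in the class.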


    Coupling (or Dependency)

    Coupling is the degree to which each program module relies on each of the other modules.


    Which one is better?

    Tight Coupling or Loose Coupling


    Disadvantages of Tight Coupling

    A change in one module usually forces a ripple effect of changes in other modules.

    Assembly of modules might require more effort and/or time due to the increased inter-module dependency.

    A particular module might be harder to reuse and/or test because dependent modules must be included.
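    The testing point can be made concrete with a small sketch. All class names here (`SmtpMailer`, `TightReport`, `LooseReport`, `FakeMailer`) are hypothetical, and passing the dependency in through the constructor is just one common way to loosen coupling.

```python
class SmtpMailer:
    def send(self, to, body):
        return f"smtp -> {to}: {body}"


class TightReport:
    """Tightly coupled: builds its own mailer, so it cannot be used without it."""

    def __init__(self):
        self.mailer = SmtpMailer()

    def publish(self, to):
        return self.mailer.send(to, "report")


class LooseReport:
    """Loosely coupled: accepts any object that offers send(to, body)."""

    def __init__(self, mailer):
        self.mailer = mailer

    def publish(self, to):
        return self.mailer.send(to, "report")


class FakeMailer:
    """Test double that records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))
        return "ok"
```

    `LooseReport(FakeMailer())` can be tested in isolation, while testing `TightReport` always drags `SmtpMailer` along.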


    Types of Coupling


    Cohesion vs. Coupling

    Low Coupling often correlates with high cohesion

    High Coupling often correlates with low cohesion


    Comment Density

    A measure of meaningful comments per logical line of code

    The content of the comments is important

    Too many comments can be cumbersome


    Source Lines of Code (SLOC)

    Software metric used to measure the size of a program

    Useful for predicting the general amount of effort required to complete a similar program

    First used when FORTRAN and assembler were the main languages


    SLOC

    Physical SLOC (LOC)

      Actual number of lines in source code

      Easier to write tools to measure

      Subject to logically irrelevant formatting conventions

    Logical SLOC (LLOC)

      Number of "statements" in source code

      Does not count formatting conventions


    SLOC Example

    example 1:

    for (i = 0; i < 100; i += 1) printf("hello");

    example 2:

    /* Now how many lines of code is this? */
    for (i = 0; i < 100; i += 1)
    {
        printf("hello");
    }

    How many physical lines of code? How many logical lines of code?
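    One way to answer the question is to count mechanically. The sketch below is deliberately naive and rests on stated assumptions: physical lines are all non-blank lines (comments included), and logical lines are one per `for` header plus one per semicolon outside parentheses. Real SLOC tools use more careful rules, and the helper names are hypothetical.

```python
import re

EXAMPLE_1 = 'for (i = 0; i < 100; i += 1) printf("hello");'
EXAMPLE_2 = """/* Now how many lines of code is this? */
for (i = 0; i < 100; i += 1)
{
    printf("hello");
}"""


def physical_sloc(source):
    """Count non-blank physical lines, comments included."""
    return sum(1 for line in source.splitlines() if line.strip())


def logical_sloc(source):
    """Naive count: one per `for` header plus one per `;` outside parentheses."""
    code = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)  # drop block comments
    count = len(re.findall(r"\bfor\b", code))
    depth = 0
    for ch in code:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == ";" and depth == 0:
            count += 1
    return count
```

    Under these rules example 1 has 1 physical line and example 2 has 5, while both have 2 logical lines: the `for` statement and the `printf` statement.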


    Importance of SLOC

    Advantages:

    Intuitive software size measuring metric

    Easy for new programmers to understand

    Can estimate number of bugs per chunk of code

    SLOC per staff hour


    Importance of SLOC

    Disadvantages:

    Coding accounts for a small chunk of the entire software creation process

    Software often uses more than one language

    Often encourages unnecessarily verbose code

    GUI tools achieve a high level of functionality from very little work from the programmer


    Types of SLOC

    KLOC - 1 000 lines

    KDLOC - 1 000 delivered lines

    KSLOC - 1 000 source lines

    MLOC - 1 000 000 lines

    GLOC - 1 000 000 000 lines

    Does having a lower SLOC count mean having a better program?


    Similarly

    Number of classes and interfaces

    Number of lines of customer requirements


    Bugs Per Line of Code

    On average:

      In the software industry there are 15 - 50 errors per 1000 lines of delivered code.

      Microsoft applications have about 0.5 defects per 1000 lines of delivered code (in 1992).

    At the top of the industry range, you should expect 500 bugs per 10 KLOC

    You should spend 50% of the time debugging

    Count bugs and log errors to improve code quality.
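    The arithmetic behind these figures is a simple scaling of a defects-per-1000-lines rate. A minimal sketch, using only the rates quoted on this slide (the function name is hypothetical):

```python
def expected_defects(lines_of_code, defects_per_kloc):
    """Scale a defects-per-1000-lines rate to a program of the given size."""
    return lines_of_code / 1000 * defects_per_kloc


size = 10_000  # a 10 KLOC program

industry_low = expected_defects(size, 15)     # lower end of the industry range
industry_high = expected_defects(size, 50)    # upper end of the industry range
microsoft_1992 = expected_defects(size, 0.5)  # rate quoted for delivered code
```

    For 10 KLOC this gives 150 to 500 defects at industry rates, against 5 at the 0.5 per KLOC rate.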


    How can we reduce this?

    Clean Room Development

    Averages 3 defects per 1000 lines during testing and 0.1 defects per 1000 lines of delivered code.

    Focuses on defect prevention rather than removal.

    An example of software written using this method is "The Space-Shuttle software", which achieved 1 defect in 400,000 lines of code using formal development methods, peer reviews, and statistical testing. The downside: this came at a cost of $1000 (taxpayers' money) per line of code.


    Execution Time

    What is it?

      Defined by the time during which a program is running or executing.

    Things that can influence execution time:

      Type checking
      Storage allocation
      Code optimization
      Run time of algorithms

    How to improve?

      Try to push most tasks to compile time rather than run time
      Multithread when possible
      Design better and faster algorithms
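    Execution time has to be measured before it can be improved. The sketch below times two ways of summing 1..n with the standard library's `time.perf_counter`; the helper `timed` and both sum functions are hypothetical examples, and the actual timings will vary by machine.

```python
import time


def timed(func, *args):
    """Return (result, elapsed_seconds) for a single call of func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def sum_loop(n):
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_formula(n):
    return n * (n + 1) // 2   # closed form: a better algorithm, not a faster loop


slow_result, slow_time = timed(sum_loop, 1_000_000)
fast_result, fast_time = timed(sum_formula, 1_000_000)
print(f"loop: {slow_time:.4f}s, formula: {fast_time:.6f}s")
```

    Both calls return the same value; only the run time of the algorithm differs, which is the last item in the list of influences above.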


    Number of Classes and Interfaces

    The number of classes and interfaces, independent of the number of lines of code, is a good way to measure the size of the program

    If you know the number of classes and interfaces before implementing the code, then you can use this number to estimate completion time

    Example: A program with 5 classes is smaller than a program with 500 classes.


    Tools for Software Metrics

    There are plenty of tools used to measure software, such as:

    Analyst4j - An Eclipse plug-in or stand-alone tool to measure Java programs

    OOMeter - Measures software using cohesion, complexity, or coupling


    Summary

    "You can't control what you can't measure."

    To prevent bugs, worry about them while coding rather than fixing them after

    Try to minimize execution time by pushing most tasks to compile time, designing efficient algorithms, and using your processor to its potential

    Measuring your program will give you a better estimate of how much work is remaining and helps you understand where code could be improved

    Use good methods of measurement, because you could cause more harm than good using a naive approach


    Performance Tuning

    "Machine independent code has machine independent performance." Greg Wilson


    Performance Tuning

    Moore's Law tells us that chips double in speed every 18 months

    Proebsting's Law tells us that compiler optimizations double program speed every 18 years

    After 18 years, an upgrade to the latest hardware would constitute a 4096x improvement in speed, versus the optimizations' 2x.
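    The 4096x figure follows directly from the two doubling periods. A short check of the arithmetic, taking the slide's doubling periods at face value:

```python
MONTHS = 18 * 12                              # an 18-year horizon, in months

hardware_doublings = MONTHS // 18             # one doubling every 18 months
compiler_doublings = MONTHS // (18 * 12)      # one doubling every 18 years

hardware_speedup = 2 ** hardware_doublings    # 2 ** 12
compiler_speedup = 2 ** compiler_doublings    # 2 ** 1
```

    Twelve hardware doublings give 2^12 = 4096, while the single compiler doubling gives 2.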


    What could possibly go wrong?

    Optimization almost always fails when performed prematurely.

    The system is constantly evolving early on.

    Effects are unpredictable on complex modern systems.

    bool ready = true;
    while (ready) {
        // act
    }

    Optimized =>
    while (true) {}


    Considerations before you optimize

    Why is it behaving that way?

    Is that behaviour really necessary?

    Read the documentation! Add documentation yourself.

    Then you can predict the effects of minor changes.

    //TODO: Use this later
    bool ready = true;
    while (ready) {
        // changeReady();
    }


    Bentley's Rules for Optimization

    Complete list at http://www.imaging.robarts.ca/~kwang/OriginTuning/sgi_html/apa.html

    Excerpts from Writing Efficient Programs

    25 years old

    Some rules still worth knowing

    Key points of interest:

    1. Data Structure Augmentation
    2. Storing Computed Results
    3. Lazy Evaluation
    4. Packing
    5. Interpreters
    6. Code Motion Out of Loops
    7. Combining Tests
    8. Loop Unrolling & Fusion
    9. Exploit Algebraic ID's
    10. Short-Circuiting
    11. Precomputation
    12. Iteration Over Recursion
    13. Recycling Objects


    Data Structure Augmentation

    "The time required for common operations on data can often be reduced by augmenting the structure with additional information, or by changing the information within the structure so it can be accessed more easily." (Bentley)


    Augmentation High-Level Example

    string quoteText = @"The time required for
    common operations on data can often be reduced
    by augmenting the structure with additional
    information";

    Quote quote = new Quote(quoteText, "Bentley");
    quote.Append(@"or by changing the information
    within the structure so it can be accessed more
    easily.");

    quote.Owner = "Michael Scott";


    Store Precomputed Results

    "The cost of recomputing an expensive function can be reduced by computing the function only once and storing the results. Subsequent requests for the function are handled by table lookup." (Bentley)


    Storing Precomputed Results

    Local variables within functions and methods

    Static variables within classes

    Collections and hashing when there are multiple results

    Cache database requests

    Worthwhile to do this when function calls are expensive

    Get in the habit of doing this for short functions as well.
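    In Python, the standard library's `functools.lru_cache` is one ready-made way to store computed results. The sketch below uses a hypothetical `slow_square` as a stand-in for an expensive function and a counter to show that the body runs only once per distinct argument.

```python
from functools import lru_cache

calls = [0]   # mutable counter: how many times the function body really ran


@lru_cache(maxsize=None)
def slow_square(n):
    """Stand-in for an expensive function."""
    calls[0] += 1
    return n * n


first = slow_square(12)    # computed and stored
second = slow_square(12)   # answered from the cache, body not executed
```

    The cache trades memory for time, so it pays off when calls are expensive and arguments repeat.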


    Lazy Evaluation

    "We'll do it live!"


    Lazy Evaluation

    "Never evaluate an item until it is needed."


    Examples of Lazy Evaluation

    fibonacci_results = {}

    def fibonacci(i):
        # Evaluate Fibonacci numbers lazily: compute a value only on first request.
        if i not in fibonacci_results:
            fibonacci_results[i] = i if i in (0, 1) else fibonacci(i - 1) + fibonacci(i - 2)
        return fibonacci_results[i]

    Populate the table with only the values that are actually requested, when they are requested.

    Bad:

    N = 100  # size of the eagerly filled table

    def fibonacci(i):
        if i not in fibonacci_results:
            # Precompute at least the first N values
            fibonacci_results[0] = 0
            fibonacci_results[1] = 1
            for j in range(2, max(i, N) + 1):
                fibonacci_results[j] = fibonacci_results[j - 1] + fibonacci_results[j - 2]
        return fibonacci_results[i]


    Packing

    Time-for-Space Rules

    Dense storage representations can decrease storage costs by increasing the time to store and retrieve data.

    Example: Storing rgb component values (0-255).

    struct big_rgb {
        int red;
        int green;
        int blue;
    };
    /* Wasting 3 bytes (24 bits)
     * per component (with a 4-byte int).
     * Bad for high resolution
     * images
     */

    struct small_rgb {
        unsigned int red:8;
        unsigned int green:8;
        unsigned int blue:8;
    };
    /* Compilers typically pack the
     * above bit fields as
     * compactly as possible.
     * Could use unsigned char */
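    The same trade-off can be sketched in Python with the standard `struct` module. The `<` prefix requests standard sizes with no padding, so the byte counts below do not depend on the platform. `pack_rgb` and `unpack_rgb` are hypothetical helpers showing where the extra time goes.

```python
import struct

pixel = (255, 128, 0)

as_ints = struct.pack("<iii", *pixel)    # three 4-byte signed ints
as_bytes = struct.pack("<BBB", *pixel)   # three unsigned bytes


def pack_rgb(red, green, blue):
    """Pack three 0-255 components into one 24-bit integer."""
    return (red << 16) | (green << 8) | blue


def unpack_rgb(value):
    """Recover the components; the shifts and masks are the extra time cost."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF
```

    The dense form needs 3 bytes instead of 12 per pixel, at the price of a few shift and mask operations on every access.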


    Interpreters

    The space required to represent a program can often be decreased by the use of interpreters where common sequences of operations are represented compactly.

    Time-for-Space Rules

    Example: regular expressions encoded as FSAs

    However, state jumping confuses compilers.


    Code Motion Out of Loops

    An expression whose value does not depend on the loop variable should be calculated once, outside the loop, rather than iteratively.

    Loop Rules

    Compilers are good at recognizing invariant expressions. Place expressions where it is most natural to read or write them, and let the compiler move them for you.

    Example: for (i=0; i < n ;i++) { if (x[i]
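    The rule can be illustrated with a minimal sketch. It is written in Python, where, as far as I know, the standard CPython interpreter does not hoist invariant expressions for you the way an optimizing C compiler can. The function names are hypothetical.

```python
import math


def scale_naive(values, angle):
    out = []
    for v in values:
        out.append(v * math.cos(angle))   # cos(angle) recomputed every iteration
    return out


def scale_hoisted(values, angle):
    factor = math.cos(angle)              # loop-invariant: computed once
    return [v * factor for v in values]
```

    Both functions return the same list; the second evaluates the invariant expression once instead of once per element.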


    Loop Unrolling

    A large part of the cost of some short loops is in modifying loop indexes. That cost can often be reduced by unrolling the loop.

    The goal of loop unrolling is to increase a program's speed by reducing (or eliminating) instructions that control the loop.

    Loop Rules

    Example:

    int x;
    for (x = 0; x < 100; x++) {
        delete(x);
    }

    /* unrolled by a factor of 5 */
    int x;
    for (x = 0; x < 100; x += 5) {
        delete(x);
        delete(x + 1);
        delete(x + 2);
        delete(x + 3);
        delete(x + 4);
    }
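    The example above works because 100 is a multiple of 5. When the trip count is not a multiple of the unroll factor, a clean-up loop has to handle the leftover iterations. A minimal sketch with hypothetical function names and an unroll factor of 4:

```python
def total_rolled(values):
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def total_unrolled(values):
    """Unrolled by a factor of 4, with a clean-up loop for leftover elements."""
    total = 0
    n = len(values)
    limit = n - n % 4                     # largest multiple of 4 not exceeding n
    for i in range(0, limit, 4):
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
    for i in range(limit, n):             # at most 3 remaining elements
        total += values[i]
    return total
```

    Whether unrolling by hand actually helps depends on the language and compiler, so it is worth measuring before and after.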


    Loop Fusion

    If two nearby loops operate on the same set of elements, combine their operational parts and use only one set of loop-control operations.

    However, this contradicts modularity principles.

    Loop Rules

    Example:

    //BAD!!

    for (i=0;i
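    As a separate illustration of the rule (not a reconstruction of the truncated example above), here is a minimal sketch with hypothetical function names that computes a sum and a maximum, first in two passes and then in one fused pass.

```python
def stats_separate(values):
    total = 0
    for v in values:          # first pass
        total += v
    largest = values[0]
    for v in values:          # second pass over the same elements
        if v > largest:
            largest = v
    return total, largest


def stats_fused(values):
    total = 0
    largest = values[0]
    for v in values:          # one pass does both jobs
        total += v
        if v > largest:
            largest = v
    return total, largest
```

    The fused version pays for loop control once, but it mixes two responsibilities in one loop body, which is the modularity cost mentioned above.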


    Exploit Algebraic Identities

    In a conditional expression, replace a costly expression with an algebraically equivalent expression that is cheaper to evaluate.

    Compilers can sometimes do this.

    Logic Rules

    Example:

      Not (sqrt(X) > 1) but (X > 1).

      Not (!(A) && !(B)) but !(A || B) (De Morgan's Law).


    Short-Circuit Monotone Functions

    Take advantage of short-circuit behavior of Boolean expressions by evaluating cheap conditions first.

    Compilers don't usually trust themselves to rearrange order of evaluation.

    Logic Rules

    Example:

    //BAD: May get a run time error.

    if(((1/x) < 1) && x != 0)

    vs.

    //GOOD: Avoid dividing by zero.

    if( x!=0 && ((1/x) < 1))


    Precompute Logical Functions

    Hard-code a function as a table instead.

    Pros: Table look-ups are very fast.

    Cons:

    Consumes more memory. A lot more on large domains.

    Takes more time to write and change.
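    A minimal sketch of the idea, using byte parity as the logical function (the function names and the choice of parity are assumptions made for illustration). The domain has only 256 values, so the whole function fits in a small table.

```python
def has_odd_parity_computed(byte):
    """True if the byte has an odd number of set bits; loops over 8 bit positions."""
    ones = sum((byte >> shift) & 1 for shift in range(8))
    return ones % 2 == 1


# Precompute the answer for every possible byte value (256 entries).
ODD_PARITY_TABLE = [has_odd_parity_computed(b) for b in range(256)]


def has_odd_parity_lookup(byte):
    """Same function, answered by a single table index."""
    return ODD_PARITY_TABLE[byte]
```

    The table costs 256 entries here. For a 32-bit input it would need about four billion, which is the memory drawback listed above.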


    Iteration vs. Recursion

    In terms of speed, iteration is almost always at least as good as recursion.

    In many cases, recursion is worse due to stack-frame allocation.

    Some languages will optimize tail recursion to avoid excessive stack-frame allocation.

    for (i = 0; i < 10; ++i) {
        // DO STUFF
    }

    Good

    void stuff(int i) {
        // DO STUFF
        i = i + 1;
        if (i < 10)
            stuff(i);
    }

    Bad
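    The stack-frame cost is easy to observe in Python, because CPython does not optimize tail calls and enforces a recursion limit. A minimal sketch with hypothetical function names:

```python
def sum_to_recursive(n):
    """Adds 1..n using one stack frame per step."""
    return 0 if n == 0 else n + sum_to_recursive(n - 1)


def sum_to_iterative(n):
    """Adds 1..n in a loop; stack depth stays constant."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


try:
    sum_to_recursive(1_000_000)
    recursion_failed = False
except RecursionError:
    recursion_failed = True   # expected under CPython's default recursion limit
```

    Both functions agree for small inputs, but only the iterative one can be relied on for large ones.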


    Recycling Objects

    Important to free unused memory for future use.

    Manual is most efficient, but prone to errors:

      Too early, too late, not at all, too many times.

    Automatic garbage collection is more reliable, but expensive.

      "Mark-and-Sweep": mark all reachable references, sweep away all unmarked memory.

      "Stop-and-Copy": copy all reachable references to a new section of memory, free everything left behind.

      "Generational": use one of the above methods on recent memory (faster, but doesn't catch everything).

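    The mark-and-sweep idea can be sketched over a toy heap. Everything here is an assumption made for illustration: object names stand in for memory blocks, the `references` dictionary stands in for pointers, and a real collector works on raw memory rather than Python sets.

```python
def mark(roots, references):
    """Mark phase: collect every object reachable from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(references.get(obj, []))
    return marked


def sweep(heap, marked):
    """Sweep phase: keep marked objects, reclaim everything else."""
    return {obj for obj in heap if obj in marked}


# Hypothetical heap: "d" and "e" reference each other but nothing reaches them.
heap = {"a", "b", "c", "d", "e"}
references = {"a": ["b"], "b": ["c"], "d": ["e"], "e": ["d"]}
live = sweep(heap, mark(["a"], references))
```

    After the sweep only "a", "b" and "c" survive. The unreachable cycle between "d" and "e" is reclaimed because marking starts from the roots and never reaches it.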

    More Tips

    Vectors are the most efficient lists.

    Hash Tables are the most efficient maps.

    Don't synchronize "just in case".

    Acquire lock, then loop/recurse, then release.

    Instance variables are initialized per-object, class variables just once.

    Inner classes can use private methods of their containers, but at a cost.


    Questions?