  • Copyright 2017 Kirk Pepperdine

    BETTER PERFORMANCE BETTER CODE

  • ABOUT ME

    ▸ Author of jPDM, a performance tuning methodology

    ▸ brings structure and predictability to performance tuning

    ▸ Founder of jClarity

    ▸ next generation of performance tooling based on jPDM

    ▸ Performance consulting and training (Kodewerk)

    ▸ Java Champion since 2006

  • Java Performance Tuning Workshop

    www.kodewerk.com

  • WHAT IS GOOD CODE

    ▸ Does what it’s supposed to do

    ▸ Is easy for humans to read

    "(?:(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}[\\+|\\-]\\d{4}): )?(\\d+(?:\\.|,)\\d{3}): "

  • WHAT IS GOOD CODE

    ▸ Does what it’s supposed to do

    ▸ Is easy for humans to read

    ▸ Translates well into the execution environment
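
    The regular expression shown earlier under "Is easy for humans to read" is hard to scan as a single string literal. One way to make such a pattern readable, sketched below, is to build it from named fragments. The class name, the fragment names, and the sample log line are hypothetical; only the resulting pattern text comes from the slide.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TimestampPrefix {
    // Named fragments; concatenated, they reproduce the slide's pattern character for character.
    static final String DATE = "\\d{4}-\\d{2}-\\d{2}";
    static final String TIME = "\\d{2}:\\d{2}:\\d{2}\\.\\d{3}";
    static final String ZONE_OFFSET = "[\\+|\\-]\\d{4}";
    static final String DATE_STAMP = "(" + DATE + "T" + TIME + ZONE_OFFSET + "): ";
    static final String DECIMAL_SEPARATOR = "(?:\\.|,)";
    static final String UPTIME = "(\\d+" + DECIMAL_SEPARATOR + "\\d{3}): ";

    // Optional date stamp (group 1) followed by a decimal number (group 2).
    static final Pattern PREFIX = Pattern.compile("(?:" + DATE_STAMP + ")?" + UPTIME);

    /** Returns group 2 of the first match, or null when the line has no such prefix. */
    static String uptimeOf(String line) {
        Matcher m = PREFIX.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input line, used only to exercise the pattern.
        System.out.println(uptimeOf("2017-03-01T12:34:56.789+0100: 12.345: [GC pause]"));
    }
}
```

    The behavior is unchanged; only the reader's job gets easier, because each fragment carries a name that says what it matches.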

  • CODING PRINCIPLES

    ▸ SOLID

    ▸ Single Responsibility

    ▸ Open Closed

    ▸ Liskov substitution

    ▸ Interface segregation

    ▸ Dependency inversion

    ▸ Delegation (tell, don’t ask)

    ▸ Small methods

    ▸ Localized variables

  • COMPLEXITY

    ▸ We need to be at war with complexity

    ▸ find the proper abstractions

    ▸ Implementations that are hard to explain

    ▸ are unlikely to be good

    ▸ often reflect the current (lack of) understanding of the problem

    ▸ Implementations that are easy to explain

    ▸ may be good

    ▸ may be too simple for the problem at hand

  • COUPLING VS COHESION

    ▸ Coupling is the degree of interdependence between classes

    ▸ you need some degree of coupling to get useful work done

    ▸ high degrees of coupling result in code that is harder to maintain

    ▸ Cohesion refers to the degree to which things belong together

    ▸ things that are related should be bound together

    ▸ low cohesion results when you bundle up things that don’t belong together

    ▸ violates SRP

  • COUPLING VS COHESION

    Tension between Coupling and Cohesion

  • STABILITY RATIO

    ▸ Afferent coupling is a count of the number of classes dependent upon a target class

    ▸ Efferent coupling is a count of the number of classes the target class is dependent upon

    ▸ Instability = efferent couplings / (afferent couplings + efferent couplings)

    ▸ Indicator of a class’s resilience to change

    ▸ Range of 0-1, where 0 is stable and 1 is unstable

    ▸ Code with a large number of dependencies is highly coupled

    ▸ Instability ratio will be closer to 1, implying the code is not resilient to change
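
    The instability ratio is simple enough to compute by hand. The sketch below only restates the formula; the class and method names are hypothetical, and treating a class with no couplings at all as stable (0) is an assumption, since the slide does not cover that case.

```java
final class InstabilityMetric {
    /**
     * Instability = efferent / (afferent + efferent).
     * afferent: number of classes that depend on the target class.
     * efferent: number of classes the target class depends on.
     */
    static double instability(int afferent, int efferent) {
        if (afferent < 0 || efferent < 0) {
            throw new IllegalArgumentException("coupling counts must be non-negative");
        }
        int total = afferent + efferent;
        if (total == 0) {
            return 0.0; // assumption: an isolated class is reported as stable
        }
        return (double) efferent / total;
    }

    public static void main(String[] args) {
        // A class used by 3 others that itself depends on 1 class.
        System.out.println(instability(3, 1));
    }
}
```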

  • WHAT IS GOOD CODE

    ▸ Does what it’s supposed to do

    ▸ Is easy for humans to read

    ▸ Translates well into the execution environment

  • EXECUTION ENVIRONMENT

    [Diagram: Java source code; javac; .class file; class loader; JVM HotSpot; method cache; Runtime; code cache; JIT; Profiler; ahead of time compilation; continuous and just in time compilation]

  • JIT COMPILERS

    ▸ C1 - client

    ▸ easy to reach optimizations

    ▸ compile count threshold 1500

    ▸ C2 - server - optimizing compiler

    ▸ deeper, more complex optimizations

    ▸ compile count threshold 10,000

    ▸ Tiered

    ▸ combination of C1 and C2

    ▸ optimizations are applied as they are found

  • BENEFIT OF HOTSPOT

    ▸ Time to complete workload

    ▸ with -Xint: 766.973 seconds

    ▸ with JIT: 124.740 seconds

    766.973/124.740 ~= 6
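
    The slide does not say which workload was timed. To get a feel for the same comparison, any CPU-bound loop will do; the prime-counting workload below is a hypothetical stand-in, not the one measured above, and the ratio it yields will differ from machine to machine.

```java
final class JitBenefitSketch {
    /** Counts the primes in [2, limit] by trial division (deliberately simple and CPU-bound). */
    static int countPrimes(int limit) {
        int count = 0;
        for (int n = 2; n <= limit; n++) {
            if (isPrime(n)) {
                count++;
            }
        }
        return count;
    }

    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        // The long multiplication avoids int overflow for large n.
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int primes = countPrimes(5_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(primes + " primes in " + elapsedMs + " ms");
    }
}
```

    Running it once as `java -Xint JitBenefitSketch` (interpreter only) and once as `java JitBenefitSketch` (JIT enabled) gives the two timings to compare.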

  • STATIC AND DYNAMIC OPTIMIZATIONS

    Inlining, delayed compilation, tiered compilation, on-stack replacement, dependence graph representation, static single assignment representation, exact type inference, memory value inference, constant folding, reassociation, operator strength reduction, null check elimination, type test strength reduction, type test elimination, algebraic simplification, common subexpression elimination, integer range typing, conditional constant propagation,

    dominating test detection, flow-carried type narrowing, dead code elimination, dead value elimination, class hierarchy analysis, devirtualization, symbolic constant propagation, autobox elimination, escape analysis, lock elision, lock fusion, de-reflection, optimistic nullness assertions, optimistic type assertions, optimistic type strengthening, optimistic array length strengthening, untaken branch pruning, optimistic N-morphic inlining, branch frequency prediction, call frequency prediction, expression hoisting, expression sinking,

    redundant store elimination, adjacent store fusion, card-mark elimination, merge-point splitting, loop unrolling, loop peeling, safepoint elimination, loop vectorization, inlining (graph integration), global code motion, heat-based code layout, switch balancing, throw inlining, local code scheduling, local code bundling, delay slot filling, graph-coloring register allocation, live range splitting, copy coalescing, constant splitting, copy removal, address mode matching, instruction peepholing, DFA-based code generator

  • FOO() CALLS BAR()

    ▸ Forms a call site

    ▸ virtual method lookup in a virtual method table

    ▸ vtable is constructed at class loading time

    ▸ jmp to the code for bar(), execute it, and jmp back on return

    ▸ involves pushing and popping variables on the stack

    ▸ Inlining eliminates the call site

    ▸ replaces the call site in foo() with the body of bar()
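
    A source-level picture of what inlining does, as a rough sketch: `fooInlined()` is a hand-written equivalent of `foo()` with the body of `bar()` substituted for the call. The JIT performs this on its internal representation, not on source code, and the class and method bodies here are hypothetical.

```java
final class InliningSketch {
    private final int offset;

    InliningSketch(int offset) {
        this.offset = offset;
    }

    // Small instance method: a natural inlining candidate.
    int bar(int x) {
        return x * 2 + offset;
    }

    // foo() contains a call site: control transfers to bar() and back.
    int foo(int x) {
        return bar(x) + 3;
    }

    // What foo() looks like once the call site is replaced by the body of bar().
    int fooInlined(int x) {
        return (x * 2 + offset) + 3;
    }

    public static void main(String[] args) {
        InliningSketch s = new InliningSketch(7);
        System.out.println(s.foo(5) == s.fooInlined(5));
    }
}
```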

  • MASTERMIND

    ▸ Game to discover a hidden code

    ▸ make a guess, which is scored

    ▸ Red -> both color and column are correct

    ▸ White -> only color is correct

    ▸ use previous guesses to refine the current guess

    ▸ can our current guess produce the scores for all the previous guesses?

    ▸ P(8,4) = 1680 possible combinations

    ▸ very small solution space for a computer
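
    The scoring and filtering rules above can be sketched as follows. This is a minimal illustration, not the code from the talk: it assumes colors are encoded as ints and that a code never repeats a color (consistent with counting codes as permutations), and all names are hypothetical.

```java
import java.util.Arrays;

final class MastermindScoring {
    /** Returns {red, white}: red = right color in the right column, white = right color only. */
    static int[] score(int[] secret, int[] guess) {
        if (secret.length != guess.length) {
            throw new IllegalArgumentException("secret and guess must have the same length");
        }
        int red = 0;
        int white = 0;
        for (int col = 0; col < guess.length; col++) {
            if (guess[col] == secret[col]) {
                red++;
            } else if (contains(secret, guess[col])) {
                white++; // assumption: no repeated colors, so one membership test is enough
            }
        }
        return new int[] {red, white};
    }

    /** A candidate is worth submitting only if it would have produced the recorded score. */
    static boolean isConsistent(int[] candidate, int[] previousGuess, int[] previousScore) {
        return Arrays.equals(score(candidate, previousGuess), previousScore);
    }

    private static boolean contains(int[] code, int color) {
        for (int c : code) {
            if (c == color) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(score(new int[] {0, 1, 2, 3}, new int[] {0, 2, 1, 7})));
    }
}
```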

  • MASTERMIND SIMULATION

    ▸ P(100000,3) = 999,970,000,200,000

    ▸ very large solution space

    ▸ Player thread makes a guess

    ▸ filters the guess against all previous guesses

    ▸ if it passes, submits it to be scored

    ▸ Board records the guess with the score

    ▸ Player’s guess comes from a stack containing the permutation group

    ▸ generate all 999,970,000,200,000 permutations
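
    The two solution-space sizes quoted in the deck, P(8,4) and P(100000,3), follow from P(n,k) = n × (n-1) × … × (n-k+1). A small sketch to check them, with hypothetical names; `Math.multiplyExact` is used so that an overflow of `long` fails loudly instead of wrapping.

```java
final class PermutationCount {
    /** Number of ordered selections of k distinct symbols out of n. */
    static long count(int n, int k) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("require 0 <= k <= n");
        }
        long result = 1;
        for (int i = 0; i < k; i++) {
            result = Math.multiplyExact(result, (long) (n - i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(count(8, 4));
        System.out.println(count(100_000, 3));
    }
}
```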

  • MASTERMIND SIMULATION

    Seriously???? Do you know how long that will take????

  • MASTERMIND SIMULATION

    ▸ Player thread uses an index into the permutation group

    ▸ element is generated on the fly

    ▸ need to translate an index into an element

    ▸ but how????

  • TRANSLATE 555 TO HEX

    ▸ Pivot values

    ▸ 16 = 10Hex, 256 = 100Hex, 4096 = 1000Hex

    ▸ Calculation

    ▸ 555 / 256 = 2, 555 % 256 = 43, so 555 = (2*256) + 43

    ▸ 43 / 16 = 2, 43 % 16 = 11

    ▸ 0x22B

    digit = number / pivot value
    number = number % pivot value
    pivot value = pivot value / base
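
    The three-step loop above translates directly into code. The sketch below is one possible rendering with hypothetical names; it starts from the largest pivot that does not exceed the number and emits one digit per step.

```java
final class PivotConversion {
    /** Converts a non-negative number to the given base (2..36) using the pivot method. */
    static String toBase(int number, int base) {
        if (number < 0 || base < 2 || base > Character.MAX_RADIX) {
            throw new IllegalArgumentException("require number >= 0 and 2 <= base <= 36");
        }
        // Largest power of the base that is <= number (1 when number is 0).
        int pivot = 1;
        while ((long) pivot * base <= number) {
            pivot *= base;
        }
        StringBuilder digits = new StringBuilder();
        while (pivot > 0) {
            int digit = number / pivot;   // digit = number / pivot value
            digits.append(Character.toUpperCase(Character.forDigit(digit, base)));
            number = number % pivot;      // number = number % pivot value
            pivot = pivot / base;         // pivot value = pivot value / base
        }
        return digits.toString();
    }

    public static void main(String[] args) {
        System.out.println("0x" + toBase(555, 16));
    }
}
```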

  • TRANSLATE 0 TO ELEMENT IN P(100000,3)

    ▸ Symbols -> [0,1,2,3,4,….99999]

    ▸ Calculation

    ▸ 0 / P1 = 0, 0 % P1 = 0, symbol[0] = 0, symbols -> [1,2,3,4,….99999]

    ▸ 0 / P2 = 0, 0 % P2 = 0, symbol[0] = 1, symbols -> [2,3,4,….99999]

    ▸ 0 / P3 = 0, 0 % P3 = 0, symbol[0] = 2, symbols -> [3,4,….99999]

    ▸ Element -> 0,1,2

    digit = index / pivot value
    index = index % pivot value
    pivot value = pivot value / base
    remove symbol from list of symbols
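
    The same loop, applied to permutations, might look like the sketch below. It is an illustration under assumptions rather than the code shown in the demo: the names are hypothetical, the first pivot is taken to be P(n-1, k-1), and the "base" is taken to be the number of symbols still available, which shrinks by one at every step. Rebuilding the symbol list on each call is wasteful and is done here only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

final class PermutationUnranker {
    /** Returns the element at the given index of P(n, k), generated on the fly. */
    static int[] elementAt(long index, int n, int k) {
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("require 1 <= k <= n");
        }
        // Pivot for the first position: P(n-1, k-1) elements share each leading symbol.
        long pivot = 1;
        for (int i = 1; i < k; i++) {
            pivot = Math.multiplyExact(pivot, (long) (n - i));
        }
        long total = Math.multiplyExact(pivot, (long) n);
        if (index < 0 || index >= total) {
            throw new IllegalArgumentException("index out of range");
        }
        List<Integer> symbols = new ArrayList<>(n);
        for (int s = 0; s < n; s++) {
            symbols.add(s);
        }
        int[] element = new int[k];
        for (int pos = 0; pos < k; pos++) {
            int digit = (int) (index / pivot);   // digit = index / pivot value
            index = index % pivot;               // index = index % pivot value
            element[pos] = symbols.remove(digit); // remove symbol from list of symbols
            if (pos < k - 1) {
                pivot = pivot / (n - pos - 1);   // pivot value = pivot value / base
            }
        }
        return element;
    }

    public static void main(String[] args) {
        int[] first = elementAt(0L, 100_000, 3);
        System.out.println(first[0] + "," + first[1] + "," + first[2]);
    }
}
```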

  • Time for a Demo!

  • UGLY CODE CAN RUN FAST ALSO

  • VARIABLE ORDERING

    ▸ Violating the Single Responsibility Principle sets up the conditions for false sharing

    ▸ False sharing performance impact

    ▸ Single thread: 532ms, CPU 100%

    ▸ 8 threads with false sharing: 8310ms, CPU 800%

    ▸ 8 threads, no false sharing: 1290ms, CPU 800%

      • doubles (8) and longs (8)
      • ints (4) and floats (4)
      • shorts (2) and chars (2)
      • booleans (1) and bytes (1)
      • references (4/8)
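
    False sharing arises when threads write to different fields that happen to share a cache line. The sketch below is a hypothetical reproduction, not the benchmark behind the numbers above: two threads each update their own `volatile long`, first in a class where the fields sit side by side, then in one where padding fields are placed between them. The 64-byte cache line and the assumption that the JVM keeps the padding between the two hot fields are both unverified here, and the timings printed are not a rigorous measurement.

```java
final class FalseSharingSketch {
    /** Two hot fields that will most likely end up on the same cache line. */
    static final class Adjacent {
        volatile long a;
        volatile long b;
    }

    /** The same two fields separated by 56 bytes of padding (assumed 64-byte cache line). */
    static final class Padded {
        volatile long a;
        long p1, p2, p3, p4, p5, p6, p7; // padding only; actual field layout is up to the JVM
        volatile long b;
    }

    /** Runs both tasks on their own threads and returns the elapsed wall-clock nanoseconds. */
    static long timeNanos(Runnable first, Runnable second) {
        Thread t1 = new Thread(first);
        Thread t2 = new Thread(second);
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int iterations = 50_000_000;
        Adjacent adjacent = new Adjacent();
        Padded padded = new Padded();
        // Each field has exactly one writer thread, so the final counts are exact.
        long shared = timeNanos(
                () -> { for (int i = 0; i < iterations; i++) adjacent.a++; },
                () -> { for (int i = 0; i < iterations; i++) adjacent.b++; });
        long separate = timeNanos(
                () -> { for (int i = 0; i < iterations; i++) padded.a++; },
                () -> { for (int i = 0; i < iterations; i++) padded.b++; });
        System.out.println("adjacent fields: " + shared / 1_000_000 + " ms");
        System.out.println("padded fields:   " + separate / 1_000_000 + " ms");
    }
}
```

    The link to the Single Responsibility Principle is that a class holding unrelated state for several threads is exactly the kind of class whose fields end up packed together.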

  • Time for a Demo!

  • MONITORING HOTSPOT

    ▸ -XX:+PrintCompilation

    ▸ -XX:+LogCompilation

    ▸ requires -XX:+UnlockDiagnosticVMOptions

    ▸ log is best viewed using JITWatch

    ▸ requires -XX:+TraceClassLoading

  • INLINING STATES

    ▸ Inline hot

    ▸ the method was determined hot

    ▸ Too big cold

    ▸ the method was not inlined as the code is too big

    ▸ the method was not hot

    ▸ Too big hot

    ▸ the method was determined hot

    ▸ but not inlined because the code is too big

  • (SOME) THRESHOLDS

    ▸ Inlining thresholds

    ▸ MaxInlineSize=35 (bytes)

    ▸ MaxInlineLevel=9 (nested)

    ▸ MaxRecursiveInlineLevel=1

    ▸ Medium methods

    ▸ DesiredMethodLimit=8000 (bytecodes)

    ▸ already compiled and too big to accept more inlining

    ▸ MaxTrivialSize=6, MinInliningThreshold=250

    ▸ small methods get inlined very quickly

    ▸ HugeMethodLimit=8000

    ▸ these won’t get compiled so forget about inlining

  • Back to the Code

  • BIT OF A BOOST BUT…..

  • CONCLUSION

    ▸ Old story: code for correctness and readability

    ▸ the two are related

    ▸ software metrics can help

    ▸ Know your execution environment to make sure the code translates well into it

    ▸ tools are required

    ▸ helps you focus on the trees in the forest

    ▸ HotSpot helps

    ▸ adds an extra layer of complexity

    ▸ won't fix egregious coding mistakes

  • Questions?