PRAGMATIC OPTIMIZATION IN MODERN PROGRAMMING
ORDERING OPTIMIZATION APPROACHES

Created by Marina (geek) Kolpakova for UNN / 2015-2016

COURSE TOPICS
- Ordering optimization approaches
- Demystifying a compiler
- Mastering compiler optimizations
- Modern computer architecture concepts

OUTLINE
- What is optimization?
- Pragmatic approach
- Optimization trade-offs
- Knowledge which is required
- Where to get the performance?
- Optimization cycle
- Top-down (high-low) approach
- Optimization cycle (revised)
- Optimization steps overview
- How to learn optimization?
- Recommended literature
- Summary

WHAT IS OPTIMIZATION?
In computing, optimization is the process of modifying a system to make some aspect of it work more efficiently or use fewer resources; in particular, it is the process of transforming a piece of code to make it more efficient without changing its output.
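As a tiny illustration of that definition, the two functions below return the same output for every input, but the second one does constant work instead of running a loop. A minimal sketch; the function names are hypothetical.

```c
#include <stdint.h>

/* Original version: O(n) loop over 1..n. The counter is 64-bit so the
   loop also terminates for n == UINT32_MAX. */
static uint64_t sum_to_n_loop(uint32_t n) {
    uint64_t s = 0;
    for (uint64_t i = 1; i <= n; i++)
        s += i;
    return s;
}

/* Optimized version: same output in O(1) using n(n+1)/2.
   For any 32-bit n the product fits in 64 bits. */
static uint64_t sum_to_n_closed(uint32_t n) {
    return (uint64_t)n * ((uint64_t)n + 1) / 2;
}
```

The output is unchanged, which is the defining property; only the cost differs.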

PRAGMATIC APPROACH
"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

- Donald Knuth, Structured Programming with go to Statements

1. Find what to start from (3%)
2. Know when to stop (97%)

OPTIMIZATION TRADE-OFFS
- Code portability decreases when we go deeper
- Performance portability decreases when we go deeper
- The cost of maintenance and extensibility increases when we go deeper
- Optimizations are often not reusable
- Optimizations become obsolete very quickly

...but performance is still a crucial requirement for most applications.

KNOWLEDGE WHICH IS REQUIRED
1. The code
   - the problem it solves
   - the algorithm it implements
   - the algorithmic complexity
2. The compiler
   - compilation trajectory
   - compiler's capabilities and obstacles
3. The platform
   - architecture capabilities
      - Instruction Set Architecture
      - micro-architecture specifics

WHERE TO GET THE PERFORMANCE?
- High level: the programmer
- Middle level: the compiler
- Low level: the hardware

OPTIMIZATION CYCLE

TOP-DOWN (HIGH-LOW) APPROACH
1. Understand the code
2. Use appropriate algorithms
3. Optimize memory access patterns
4. Minimize the number of operations
5. Shrink the critical path
6. Perform HW-specific optimizations
7. Dive into assembly

OPTIMIZATION CYCLE (REVISED)

STEP #1: UNDERSTAND THE CODE
- Different people think differently: you'll need some time to get used to the code
- Understand the data flow:
   - input/output parameters
   - data dependencies
- Identify performance limiters:
   - time
   - profile
   - collect metrics, e.g. CPI, power consumption
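Timing is the simplest of the three. A minimal sketch using only standard C, assuming a hypothetical kernel `sum_array` as the code under study; `clock()` measures processor time, not wall-clock time, and finer metrics such as CPI come from a profiler or hardware counters (for example Linux `perf`), not from this harness.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel under study: sums an array. */
static long sum_array(const int *data, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += data[i];
    return s;
}

/* Calls the kernel `reps` (> 0) times and returns the average processor
   time per call in seconds, measured with the standard clock() function.
   The total of all results is written to *checksum so the work has an
   observable effect; a real harness must still check that the compiler
   did not hoist or merge the repeated calls. */
static double time_sum_array(const int *data, size_t n, int reps, long *checksum) {
    long total = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        total += sum_array(data, n);
    clock_t end = clock();
    *checksum = total;
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Measuring before and after each change is what turns the steps below into a cycle rather than guesswork.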

STEP #2: USE APPROPRIATE ALGORITHM
- Consider and lower big-O complexity
- Choose data structures wisely
- Look for optimized libraries
- Find opportunities to scalarize & parallelize

STEP #2: USE APPROPRIATE ALGORITHM
Compilers are not aware of the semantics of the code, so focus on the algorithmic aspects first:
- Decrease big-O complexity
- Use optimized libraries for subroutines
- Restructure the code to use fewer resources
- Split the problem into subtasks and organize them wisely
- Parallelize

What if you need to sort 100 MB of numerical data...

WHAT SORTING ALGORITHM WOULD YOU CHOOSE?
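The slide leaves the question open. One frequently cited answer for fixed-width integer keys (an assumption: the slide does not say what kind of numerical data it is) is a radix sort, which does k linear passes, O(k*n), instead of the O(n log n) comparisons of a comparison sort, at the cost of an n-element scratch buffer. A minimal LSD radix sort sketch for 32-bit unsigned keys; the function name is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sorts n unsigned 32-bit keys in place with a least-significant-digit
   radix sort: four stable counting-sort passes over 8-bit digits.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
static int radix_sort_u32(uint32_t *keys, size_t n) {
    uint32_t *tmp = malloc(n * sizeof *tmp);
    if (n > 0 && tmp == NULL)
        return -1;
    uint32_t *src = keys, *dst = tmp;
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[257] = {0};
        /* Histogram of the current digit, stored one slot to the right. */
        for (size_t i = 0; i < n; i++)
            count[((src[i] >> shift) & 0xFFu) + 1]++;
        /* Prefix sums turn the counts into starting offsets. */
        for (int d = 0; d < 256; d++)
            count[d + 1] += count[d];
        /* Stable scatter into the other buffer. */
        for (size_t i = 0; i < n; i++)
            dst[count[(src[i] >> shift) & 0xFFu]++] = src[i];
        uint32_t *swap = src;
        src = dst;
        dst = swap;
    }
    /* After an even number of passes (four) the sorted keys are back in `keys`. */
    free(tmp);
    return 0;
}
```

Whether this beats a library sort on a given machine is something to measure, not assume: each pass streams the whole array through memory.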

STEP #3: OPTIMIZE MEMORY ACCESSES
You'll be surprised how many algorithms are memory bound!

Optimization for memory usually involves:
- data restructuring, to load only the data that is really needed for computations;
- data packing, to shrink the data in size;
- loop transformations, to walk through the data in a more efficient way, to increase temporal & spatial locality, and to perform cache-aware optimization.
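Data restructuring often means switching from an array of structures (AoS) to a structure of arrays (SoA) when a hot loop uses only some of the fields. A minimal sketch with a hypothetical particle record; the 64-byte record size assumes 4-byte floats and no padding, which holds on common platforms.

```c
#include <stddef.h>

/* Hypothetical array-of-structures layout: summing `mass` drags every
   other field of each record through the cache as well. */
struct particle_aos {
    float x, y, z;
    float mass;
    char  label[48];
};

static float total_mass_aos(const struct particle_aos *p, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += p[i].mass;        /* 4 useful bytes out of each 64-byte record */
    return sum;
}

/* Hypothetical structure-of-arrays layout: each field is contiguous,
   so the loop streams only the bytes it actually uses. */
struct particles_soa {
    float *x, *y, *z;
    float *mass;
};

static float total_mass_soa(const struct particles_soa *p, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += p->mass[i];       /* consecutive floats: every loaded byte is used */
    return sum;
}
```

Both functions compute the same result; the SoA loop simply moves far less data.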

STEP #3: OPTIMIZE MEMORY ACCESSES
Compilers are quite good at local optimizations, such as
- loop body transformations,
- local function inlining,
- arithmetic expression simplification,

so help the compiler rather than trying to outfox it.

Work cohesively with it on
- enabling auto-vectorization,
- optimizing critical loops,
- vectorizing.
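One concrete way to help auto-vectorization in C is to tell the compiler that two pointers do not alias. A minimal sketch, assuming a C99 compiler; whether the loop is actually vectorized depends on the compiler, target and flags, and with GCC it can be checked with `-fopt-info-vec`.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]. The C99 `restrict` qualifiers promise the
   compiler that x and y do not overlap, which removes the aliasing
   question that can otherwise block or complicate auto-vectorization. */
static void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The promise is the programmer's responsibility: calling `saxpy` with overlapping arrays is undefined behavior.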

STEP #3: OPTIMIZE MEMORY ACCESSES

    for (int j = 0; j < height; j++)
        for (int i = 0; i < width; i++)
            if (img[j * width + i] > 0)
                count++;

    for (int i = 0; i < width; i++)
        for (int j = 0; j < height; j++)
            if (img[j * width + i] > 0)
                count++;

WHICH ONE IS FASTER ON A CONVENTIONAL CPU?

STEP #3: OPTIMIZE MEMORY ACCESSES
The first loop nest: img is stored row by row, so its inner loop over i touches consecutive addresses and uses every byte of each cache line it fetches. The second nest jumps width elements ahead on every inner iteration, so for wide images almost every access lands on a different cache line.
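Sometimes no loop order is contiguous for every array, as in a matrix transpose, where either the reads or the writes must be strided. Loop tiling (blocking) is the usual loop transformation for that case. A minimal sketch, assuming row-major float matrices; the function name is hypothetical and `TILE = 32` is an arbitrary illustrative value, not a tuned one.

```c
#include <stddef.h>

#define TILE 32  /* hypothetical tile size; tune for the target cache */

/* Writes the transpose of the rows x cols row-major matrix `src` into
   `dst` (cols x rows, row-major). Walking the matrix in TILE x TILE
   blocks keeps both the source rows being read and the destination
   rows being written resident in cache while a block is processed. */
static void transpose_tiled(const float *src, float *dst, size_t rows, size_t cols) {
    for (size_t jj = 0; jj < rows; jj += TILE)
        for (size_t ii = 0; ii < cols; ii += TILE) {
            size_t jend = jj + TILE < rows ? jj + TILE : rows;
            size_t iend = ii + TILE < cols ? ii + TILE : cols;
            for (size_t j = jj; j < jend; j++)
                for (size_t i = ii; i < iend; i++)
                    dst[i * rows + j] = src[j * cols + i];
        }
}
```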

STEP #4: MINIMIZE NUMBER OF OPERATIONS
Reducing the number of operations in a program doesn't necessarily decrease its runtime, but it is a good heuristic.

The compiler usually helps a lot here:
- Machine-independent optimizations
   - common sub-expression elimination
   - constant propagation
   - redundancy elimination
   - ...
- Machine-dependent optimizations
   - register allocation
   - instruction selection
   - instruction scheduling
   - peephole optimization
   - ...
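To make the first machine-independent item concrete, this is what common sub-expression elimination does, written out by hand purely for illustration (the function names are hypothetical); an optimizing compiler normally performs this transformation itself.

```c
/* Before: the sub-expression a * b + c appears twice. */
static int cse_before(int a, int b, int c, int d) {
    return (a * b + c) * d + (a * b + c);
}

/* After common sub-expression elimination: computed once, then reused. */
static int cse_after(int a, int b, int c, int d) {
    int t = a * b + c;
    return t * d + t;
}
```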

STEP #4: MINIMIZE NUMBER OF OPERATIONS

    #include <math.h>

    float pows(float a, float b, float c, float d, float e, float f, float x)
    {
        return a * powf(x, 5.f) + b * powf(x, 4.f) + c * powf(x, 3.f)
             + d * powf(x, 2.f) + e * x + f;
    }

    gcc -march=armv7-a -mfpu=neon-vfpv4 -mthumb -mfloat-abi=softfp -O3 1.c -S -o 1.s

STEP #4: MINIMIZE NUMBER OF OPERATIONS

    pows:
        push     {r3, lr}
        flds     s17, [sp, #56]
        fmsr     s24, r1
        movs     r1, #0
        fmsr     s22, r0
        movt     r1, 16544
        fmrs     r0, s17
        fmsr     s21, r2
        fmsr     s20, r3
        flds     s19, [sp, #48]
        flds     s18, [sp, #52]
        bl       powf(PLT)
        mov      r1, #1082130432
        fmsr     s23, r0
        fmrs     r0, s17
        bl       powf(PLT)
        movs     r1, #0
        movt     r1, 16448
        fmsr     s16, r0
        fmrs     r0, s17
        bl       powf(PLT)
        fmuls    s16, s16, s24
        vfma.f32 s16, s23, s22
        fmsr     s15, r0
        vfma.f32 s16, s15, s21
        fmuls    s15, s17, s17
        vfma.f32 s16, s20, s15
        vfma.f32 s16, s19, s17
        fadds    s15, s16, s18
        fldmfdd  sp!, {d8-d12}
        fmrs     r0, s15
        pop      {r3, pc}

... Let's apply Horner's rule.

STEP #4: MINIMIZE NUMBER OF OPERATIONS

    float horner(float a, float b, float c, float d, float e, float f, float x)
    {
        return ((((a * x + b) * x + c) * x + d) * x + e) * x + f;
    }

    horner:
        flds     s15, [sp, #8]
        fmsr     s11, r0
        fmsr     s12, r1
        flds     s14, [sp]
        vfma.f32 s12, s11, s15
        fmsr     s11, r2
        flds     s13, [sp, #4]
        vfma.f32 s11, s12, s15
        fcpys    s12, s11
        fmsr     s11, r3
        vfma.f32 s11, s12, s15
        vfma.f32 s14, s11, s15
        vfma.f32 s13, s14, s15
        fmrs     r0, s13
        bx       lr
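The horner() function is the degree-5 case of a general rule: a polynomial of degree n needs only n multiply-add steps and no calls to powf. A minimal sketch for an arbitrary degree; `horner_eval` and its coefficient-array parameter are hypothetical names, not from the slides.

```c
#include <stddef.h>

/* Evaluates coef[0]*x^degree + coef[1]*x^(degree-1) + ... + coef[degree]
   with Horner's rule: `degree` multiply-add steps, highest power first. */
static float horner_eval(const float *coef, size_t degree, float x) {
    float acc = coef[0];
    for (size_t k = 1; k <= degree; k++)
        acc = acc * x + coef[k];
    return acc;
}
```

With coefficients 1, 2, 3, 4, 5, 6 (playing the roles of a to f) and x = 2, `horner_eval` returns 120.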

STEP #4: MINIMIZE NUMBER OF OPERATIONS
Unfortunately, a compiler sometimes fails at some optimization steps (e.g. register allocation, scalarization) and harms performance by introducing redundant operations.

Starting from this optimization step, it is worth looking at the assembly code to check whether the compiler actually performs a particular optimization.

STEP #5: SHRINK THE CRITICAL PATH
The critical path is the longest sequence of operations in a code block that must be completed in order, usually because of dependencies between steps or operations.
- The critical path of a code block can hardly be deduced from high-level code and requires assembly inspection.
- Knowledge about architecture capabilities is required to estimate the critical path more precisely.
- Some profilers are able to do critical path analysis.
- The term can also refer to the longest sequence of dependent steps in a pipeline that limits its parallelization.
- A dependency graph is used to find the critical path.
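A common way to shrink a critical path in source code is to break one long dependency chain into several independent ones. A minimal sketch with integer sums (the function names are hypothetical). With floating-point data the same rewrite changes the rounding, which is why compilers generally do not apply it to floats unless allowed to reassociate (for example with GCC's `-ffast-math`).

```c
#include <stddef.h>

/* One accumulator: every addition depends on the previous one, so the
   chain is n dependent adds no matter how many adders the core has. */
static long sum_one_chain(const int *v, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four independent accumulators: the four chains can proceed in
   parallel, so the longest dependent chain is about n/4 adds plus a
   short final reduction. */
static long sum_four_chains(const int *v, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += v[i];
    return (s0 + s1) + (s2 + s3);
}
```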

STEP #5: SHRINK THE CRITICAL PATH
Let's look at the critical path of the following code block.

    const uint8_t* p0 = src.ptr(row0);
    const uint8_t* p1 = src.ptr(row1);
    uint8_t* dptr = dst.ptr(row);

    for (int col = 0; col < cols; ++col)
    {
        dptr[col] = (p0[col*2] + p0[col*2+1] + p1[col*2] + p1[col*2+1] + 2) >> 2;
    }

WHAT IS THE CRITICAL PATH OF THIS CODE LINE?

STEP #5: SHRINK THE CRITICAL PATH
Let's write the loop body in three-address form:

    r0  = col*2           // 1
    r1  = r0+1            // 2
    r2  = load(p0, r0)    // 3
    r3  = load(p0, r1)    // 4
    r4  = load(p1, r0)    // 5
    r5  = load(p1, r1)    // 6
    r6  = r2+r3           // 7
    r7  = r6+r4           // 8
    r8  = r7+r5           // 9
    r9  = r8+2            // 10
    r10 = shr(r9, 2)      // 11

11? Let's construct the dependency graph...
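The longest-path computation over the dependency graph can be sketched in a few lines, assuming unit latency for every instruction (a simplification: real latencies differ per instruction and per core). `struct instr`, `critical_path` and the two encodings are illustrative names, not from the slides; the reassociated ordering is one assumed ordering the compiler may choose.

```c
#include <stddef.h>

#define MAX_DEPS   2
#define MAX_INSTRS 64   /* hypothetical capacity, enough for small blocks */

/* One three-address instruction, described only by the 0-based indices of
   the instructions whose results it consumes; -1 marks an unused slot. */
struct instr {
    int deps[MAX_DEPS];
};

/* Returns the length (in instructions) of the longest dependency chain,
   assuming unit latency and that every producer is listed before its
   consumers. Returns -1 if the block is too large. */
static int critical_path(const struct instr *code, size_t n) {
    int depth[MAX_INSTRS];
    int longest = 0;
    if (n > MAX_INSTRS)
        return -1;
    for (size_t i = 0; i < n; i++) {
        int d = 0;
        for (int k = 0; k < MAX_DEPS; k++) {
            int p = code[i].deps[k];
            if (p >= 0 && depth[p] > d)
                d = depth[p];
        }
        depth[i] = d + 1;
        if (depth[i] > longest)
            longest = depth[i];
    }
    return longest;
}

/* The loop body above, with the additions in source order. */
static const struct instr body_in_order[11] = {
    {{-1, -1}}, /* r0  = col*2        */
    {{ 0, -1}}, /* r1  = r0+1         */
    {{ 0, -1}}, /* r2  = load(p0, r0) */
    {{ 1, -1}}, /* r3  = load(p0, r1) */
    {{ 0, -1}}, /* r4  = load(p1, r0) */
    {{ 1, -1}}, /* r5  = load(p1, r1) */
    {{ 2,  3}}, /* r6  = r2+r3        */
    {{ 6,  4}}, /* r7  = r6+r4        */
    {{ 7,  5}}, /* r8  = r7+r5        */
    {{ 8, -1}}, /* r9  = r8+2         */
    {{ 9, -1}}, /* r10 = shr(r9, 2)   */
};

/* The same body with the additions reassociated: the early loads are
   summed first and the constant is added while the late loads arrive. */
static const struct instr body_reassociated[11] = {
    {{-1, -1}}, /* r0  = col*2        */
    {{ 0, -1}}, /* r1  = r0+1         */
    {{ 0, -1}}, /* r2  = load(p0, r0) */
    {{ 1, -1}}, /* r3  = load(p0, r1) */
    {{ 0, -1}}, /* r4  = load(p1, r0) */
    {{ 1, -1}}, /* r5  = load(p1, r1) */
    {{ 2,  4}}, /* r6  = r2+r4        */
    {{ 6, -1}}, /* r7  = r6+2         */
    {{ 3,  5}}, /* r8  = r3+r5        */
    {{ 7,  8}}, /* r9  = r7+r8        */
    {{ 9, -1}}, /* r10 = shr(r9, 2)   */
};
```

With these encodings `critical_path` returns 8 for the source order and 6 for the reassociated order.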

STEP #5: SHRINK THE CRITICAL PATH
[Figure: dependency graph of the three-address code]

8?

STEP #5: SHRINK THE CRITICAL PATH
But the compiler reorders the instructions, since integer addition is associative.

6?
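For reference, a self-contained C version of the loop body with the sum written as two independent pair sums, i.e. a reassociated form the compiler is allowed to produce for integers. The function name and the row-pointer parameters are hypothetical stand-ins for the `src.ptr()`/`dst.ptr()` calls in the fragment, and whether a given compiler emits exactly this order is not guaranteed.

```c
#include <stdint.h>

/* Averages each 2x2 block of source rows p0 and p1 into one destination
   pixel, rounding with (sum + 2) >> 2. The two pair sums do not depend
   on each other, so they can be computed in parallel. */
static void downscale_row_2x2(const uint8_t *p0, const uint8_t *p1,
                              uint8_t *dptr, int cols) {
    for (int col = 0; col < cols; ++col) {
        int top    = p0[2 * col] + p0[2 * col + 1];   /* independent of `bottom` */
        int bottom = p1[2 * col] + p1[2 * col + 1];
        dptr[col]  = (uint8_t)((top + bottom + 2) >> 2);
    }
}
```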

STEP #5: SHRINK THE CRITICAL PATH
And let's assume that the hardware issues one arithmetic and one memory operation per clock.

AND WE ARE BACK TO 8 AGAIN

STEP #6: DO HW-SPECIFIC OPTIMIZATION
It requires a comprehensive understanding of the target HW, which usually goes beyond the compiler's abilities:
- using special hardware capabilities
- overcoming micro-architecture weaknesses
- using instructions that are specific to concrete HW
- balancing the usage of different instruction types

A classical example here is the question of recomputing a temporary value vs. loading it from memory.
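As an illustration of recompute vs. load, here are two ways to count the set bits of a byte: a few ALU operations, or a 256-entry table in memory. Which one wins depends on the hardware (ALU throughput, cache pressure from the table), which is exactly why the choice is HW-specific. The function names are hypothetical, and many ISAs also offer a dedicated population-count instruction.

```c
#include <stdint.h>

/* Option 1: recompute. SWAR bit counting with a few ALU operations. */
static unsigned popcount8_compute(uint8_t v) {
    v = (uint8_t)(v - ((v >> 1) & 0x55u));                /* 2-bit counts */
    v = (uint8_t)((v & 0x33u) + ((v >> 2) & 0x33u));      /* 4-bit counts */
    return (unsigned)((v + (v >> 4)) & 0x0Fu);            /* total, 0..8  */
}

/* Option 2: load. A 256-entry table filled once at start-up. */
static uint8_t popcount_table[256];

static void init_popcount_table(void) {
    /* popcount(i) = lowest bit + popcount(i / 2); entry 0 stays 0. */
    for (int i = 1; i < 256; i++)
        popcount_table[i] = (uint8_t)((i & 1) + popcount_table[i / 2]);
}

static unsigned popcount8_lookup(uint8_t v) {
    return popcount_table[v];
}
```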

STEP #6: DO HW-SPECIFIC OPTIMIZATION
Modern hardware is quite advanced:
- deep pipelines,
- out-of-order execution,
- sophisticated branch prediction,
- multi-level memory hierarchies,
- processor specialization,

so utilize the unique properties of the hardware.

Peephole optimization is not as important as it was 10 years ago.
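Even sophisticated branch prediction fails on data-dependent branches with random outcomes. One common HW-aware rewrite of the pixel-counting condition from Step #3 is to turn the branch into arithmetic. A minimal sketch, assuming 8-bit signed pixels (the slides do not give the type of img) and hypothetical function names; compilers often perform this if-conversion themselves, so check the assembly before and after.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy form: the `if` may become a conditional branch, which is hard
   to predict when positive and non-positive pixels are mixed randomly. */
static size_t count_positive_branchy(const int8_t *img, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (img[i] > 0)
            count++;
    return count;
}

/* Branch-free form: the comparison yields 0 or 1 and is simply added,
   so there is no data-dependent branch to mispredict. */
static size_t count_positive_branchless(const int8_t *img, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(img[i] > 0);
    return count;
}
```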

STEP #7: DIVE INTO ASSEMBLY
Assembly is a must-have for checking the compiler's output, but it is rarely used to write low-level code.

Raw assembly makes sense to:
- overcome compiler bugs & optimization limitations, such as
   - addition of redundant instructions
   - suboptimal register allocation
- use specific hardware features that are not expressible in a higher-level language

Keep in mind that:
- writing assembly is the least portable optimization
- inline assembly limits compiler optimizations
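The last point can be seen with an empty GNU extended-asm statement, a trick often used in benchmarks to make a value opaque to the optimizer. A minimal sketch, valid for GCC and Clang only (the asm syntax is a GNU extension); the function name is hypothetical.

```c
/* Sums 0..n-1. Without the asm statement an optimizing compiler may
   replace the loop with a closed-form expression or vectorize it. The
   empty asm claims to read and modify `sum` in a register, so the
   compiler must assume `sum` changes unpredictably on every iteration
   and cannot fold or vectorize the loop: the asm is opaque to it. */
static unsigned sum_with_barrier(unsigned n) {
    unsigned sum = 0;
    for (unsigned i = 0; i < n; i++) {
        sum += i;
        __asm__ volatile ("" : "+r"(sum));
    }
    return sum;
}
```

The result is still correct; only the optimizer's freedom is lost, which is the cost the slide warns about.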

HOW TO LEARN OPTIMIZATION?
Optimization is a craft rather than a science.
- Practice more. Do not make practical knowledge too theoretical.
- Look at what other people do. Find real use cases of different optimization approaches and techniques.
- Dig into an architecture. HW evolves rapidly, so devices become obsolete in a wink; comprehensive knowledge helps you see what is coming.

RECOMMENDED LITERATURE

- The Mature Optimization Handbook, by Carlos Bueno

- Is Parallel Programming Hard, And, If So, What Can You Do About It?, by Paul E. McKenney

- Engineering a Compiler, by Keith Cooper and Linda Torczon

SUMMARY
- Practice, look at what others do, and dig into an architecture.
- The main task of an optimizer is finding the critical part.
- An optimizer's mastery lies in knowing where to stop.
- Knowledge about the code, the compiler and the platform is a must-have.
- Optimization is a measure-analyze-optimize-check cycle.
- Stick to the high-to-low approach:
   - get the performance from algorithmic and data structure choices first,
   - ... ensure good memory access patterns next,
   - ... then go deeper.

THE END

MARINA KOLPAKOVA / 2015-2016