84
Devel::NYTProf Perl Source Code Profiler Tim Bunce - YAPC::EU - Pisa - August 2010

Devel::NYTProf v4 at YAPC::EU 201008

Embed Size (px)

DESCRIPTION

Slides of my talk on Devel::NYTProf and optimizing perl code at YAPC::EU in August 2010. It covers the new features in NYTProf v4 and outlines a multi-phase approach to optimizing your perl code. Best viewed fullscreen. A screencast of a slightly earlier version of this talk, that includes a demo, is available at http://blip.tv/file/3913278

Citation preview

Page 1: Devel::NYTProf v4 at YAPC::EU 201008

Devel::NYTProfPerl Source Code Profiler

Tim Bunce - YAPC::EU - Pisa - August 2010

Page 2: Devel::NYTProf v4 at YAPC::EU 201008

Devel::DProf Is Broken$ perl -we 'print "sub s$_ { sqrt(42) for 1..100 }; s$_({});\n" for 1..1000' > x.pl

Page 3: Devel::NYTProf v4 at YAPC::EU 201008

Devel::DProf Is Broken$ perl -we 'print "sub s$_ { sqrt(42) for 1..100 }; s$_({});\n" for 1..1000' > x.pl

$ perl -d:DProf x.pl

Page 4: Devel::NYTProf v4 at YAPC::EU 201008

Devel::DProf Is Broken$ perl -we 'print "sub s$_ { sqrt(42) for 1..100 }; s$_({});\n" for 1..1000' > x.pl

$ perl -d:DProf x.pl

$ dprofpp -rTotal Elapsed Time = 0.108 Seconds Real Time = 0.108 SecondsExclusive Times%Time ExclSec CumulS #Calls sec/call Csec/c Name 9.26 0.010 0.010 1 0.0100 0.0100 main::s76 9.26 0.010 0.010 1 0.0100 0.0100 main::s323 9.26 0.010 0.010 1 0.0100 0.0100 main::s626 9.26 0.010 0.010 1 0.0100 0.0100 main::s936 0.00 - -0.000 1 - - main::s77 0.00 - -0.000 1 - - main::s82

Page 5: Devel::NYTProf v4 at YAPC::EU 201008

Evolution

Devel::DProf | 1995 | Subroutine Devel::SmallProf | 1997 | Line Devel::AutoProfiler | 2002 | Subroutine Devel::Profiler | 2002 | Subroutine Devel::Profile | 2003 | Subroutine Devel::FastProf | 2005 | Line Devel::DProfLB | 2006 | Subroutine Devel::WxProf | 2008 | Subroutine Devel::Profit | 2008 | Line Devel::NYTProf v1 | 2008 | Line Devel::NYTProf v2 | 2008 | Line + Subroutine Devel::NYTProf v3 | 2009 | Line & Sub + Opcode Devel::NYTProf v4 | 2010 | Line & Sub & Op + Eval

Page 6: Devel::NYTProf v4 at YAPC::EU 201008

Profiling 101The Basics

Page 7: Devel::NYTProf v4 at YAPC::EU 201008

CPU Time Real Time

Subroutines

Statements

? ?? ?

What To Measure?

Page 8: Devel::NYTProf v4 at YAPC::EU 201008

CPU Time vs Real Time

• CPU time- Measures time CPU sent executing your code

- Not (much) affected by other load on system

- Doesn’t include time spent waiting for i/o etc.

• Real time- Measures the elapsed time-of-day

- Your time is affected by other load on system

- Includes time spent waiting for i/o etc.

Page 9: Devel::NYTProf v4 at YAPC::EU 201008

Subroutine vs Statement

• Subroutine Profiling- Measures time between subroutine entry and exit

- That’s the Inclusive time. Exclusive by subtraction.

- Reasonably fast, reasonably small data files

• Problems- Can be confused by funky control flow (goto &sub)

- No insight into where time spent within large subs

- Doesn’t measure code outside of a sub

Page 10: Devel::NYTProf v4 at YAPC::EU 201008

Subroutine vs Statement

• Line/Statement profiling- Measure time from start of one statement to next

- Exclusive time (except includes built-ins & xsubs)

- Fine grained detail

• Problems- Very expensive in CPU & I/O

- Assigns too much time to some statements

- Too much detail for large subs

- Hard to get overall subroutine times

Page 11: Devel::NYTProf v4 at YAPC::EU 201008

Devel::NYTProf

Page 12: Devel::NYTProf v4 at YAPC::EU 201008

v1 Innovations

• Fork by Adam Kaplan of Devel::FastProf- working at the New York Times

• HTML report borrowed from Devel::Cover

• More accurate: Discounts profiler overheadincluding cost of writing to the file

• Test suite!

Page 13: Devel::NYTProf v4 at YAPC::EU 201008

v2 Innovations

• Profiles time per block!- Statement times can be aggregated

to enclosing blockand enclosing sub

• Dual Profilers!- Is a statement profiler

and a subroutine profilerconcurrently

Page 14: Devel::NYTProf v4 at YAPC::EU 201008

v2 Innovations

• Subroutine profiler- tracks subroutine time per calling location

- even for xsubs

- calculates exclusive time on-the-fly

- discounts cost of statement profiler

- immune from funky control flow

- in memory, writes to file at end

- extremely fast

Page 15: Devel::NYTProf v4 at YAPC::EU 201008

v2 Innovations

• Statement profiler gives correct timing after leave ops- last statement in loops doesn’t accumulate

time spent evaluating the condition

- last statement in subs doesn’t accumulate time spent in remainder of calling statement

- previous statement profilers didn’t fix this

- slightly dependant on perl version

Page 16: Devel::NYTProf v4 at YAPC::EU 201008

v2 Other Features

• Profiles compile-time activity

• Profiling can be enabled & disabled on the fly

• Handles forks with no overhead

• Correct timing for mod_perl

• Sub-microsecond resolution

• Multiple clocks, including high-res CPU time

• Can snapshot source code & evals into profile

• Built-in zip compression

Page 17: Devel::NYTProf v4 at YAPC::EU 201008

v3 Features

• Profiles slow opcodes: system calls, regexps, ...

• Subroutine caller name noted, for call-graph

• Handles goto ⊂ e.g. AUTOLOAD

• HTML report includes interactive TreeMaps

• Outputs call-graph in Graphviz dot format

• High resolution timer on Mac OS X (100ns)

• Merge multiple profiles

Page 18: Devel::NYTProf v4 at YAPC::EU 201008

v4 Features

• Profile reporting of code inside string evals

• Smart handling of high numbers of evals

• Smart handling of ‘duplicate’ anon subs

• Better handling of assorted egde-cases

• Detection of slow regex match vars: $& $' $`

Page 19: Devel::NYTProf v4 at YAPC::EU 201008

Cost of Profiling

TimeTime File Size Perl DProf SmallProf FastProf NYTProf

+ blocks=0+ stmts=0

x 1 -

x 4.9 60.7 MB

x 22.0 -

x 6.3 42.9 MB

x 3.6 19.6 MB

x 3.3 16.9 MB

x 2.4 1.2 MB

NYTProf v4.04 running perl 5.12.0 perlcritic 1.088 on lib/Perl/Critic/Policy

Page 20: Devel::NYTProf v4 at YAPC::EU 201008

Effect of Profiling

• Your code- runs more slowly

- uses more memory, e.g. saves string eval src code

- context switches much more often

• Non-closures are as slow to create as closures- I don’t know why (perhaps perl keeps the sub

lexicals around for introspection by the ‘debugger’)

Page 21: Devel::NYTProf v4 at YAPC::EU 201008

Running NYTProf

perl -d:NYTProf ...

perl -MDevel::NYTProf ...

Configure profiler via the NYTPROF env varperldoc Devel::NYTProf for the details

To profile code that’s invoked elsewhere:PERL5OPT=-d:NYTProf

NYTPROF=file=/tmp/nytprof.out:addpid=1:...

Page 22: Devel::NYTProf v4 at YAPC::EU 201008

Reporting: KCachegrind

• KCachegrind call graph - new and cool- contributed by C. L. Kao.

- requires KCachegrind

$ nytprofcg # generates nytprof.callgraph

$ kcachegrind # load the file via the gui

Page 23: Devel::NYTProf v4 at YAPC::EU 201008

KCachegrind

Page 24: Devel::NYTProf v4 at YAPC::EU 201008

Reporting: HTML

• HTML report- page per source file, annotated with times and links

- subroutine index table with sortable columns

- interactive Treemap of subroutine times

- generates Graphviz dot file of call graph

- -m (--minimal) faster generation but less detailed

$ nytprofhtml # writes HTML report in ./nytprof/...

$ nytprofhtml --file=/tmp/nytprof.out.793 --open

Page 25: Devel::NYTProf v4 at YAPC::EU 201008
Page 26: Devel::NYTProf v4 at YAPC::EU 201008

Summary

Links to annotatedsource code

Timings for perl builtins

Link to sortable tableof all subs

Page 27: Devel::NYTProf v4 at YAPC::EU 201008

Exclusive vs. Inclusive

• Exclusive Time is best for Bottom Up- Detail of time spent “in the code of this sub”

- Where the time actually gets spent

- Useful for localized (peephole) optimisation

• Inclusive Time is best for Top Down- Overview of time spent “in and below this sub”

- Useful to prioritize structural optimizations

Page 28: Devel::NYTProf v4 at YAPC::EU 201008
Page 29: Devel::NYTProf v4 at YAPC::EU 201008

Timings for each location calling into, or out of, the subroutine

Overall time spent in and below this sub

(in + below)

Time between starting this perl statement and starting the next.So includes overhead of calls to

perl subs.

Color coding based onMedian Average Deviationrelative to rest of this file

Page 30: Devel::NYTProf v4 at YAPC::EU 201008
Page 31: Devel::NYTProf v4 at YAPC::EU 201008

Boxes represent subroutinesColors only used to show

packages (and aren’t pretty yet)

Hover over box to see detailsClick to drill-down one level

in package hierarchy

Treemap showing relative proportions of exclusive time

Page 32: Devel::NYTProf v4 at YAPC::EU 201008

Calls between packages

Page 33: Devel::NYTProf v4 at YAPC::EU 201008

Calls to/from/within package

Page 34: Devel::NYTProf v4 at YAPC::EU 201008

Let’s take a look...

Page 35: Devel::NYTProf v4 at YAPC::EU 201008

DEMO

Page 36: Devel::NYTProf v4 at YAPC::EU 201008

OptimizingHints & Tips

Page 37: Devel::NYTProf v4 at YAPC::EU 201008

Do your own testing

With your own perl binary

On your own hardware

Beware My Examples!

Page 38: Devel::NYTProf v4 at YAPC::EU 201008

Beware 2!

Page 39: Devel::NYTProf v4 at YAPC::EU 201008

Take care comparing code fragments!

Edge-effects at loop and scope boundaries.

Statement time includes time getting to the next perl statement, wherever that may be.

Beware 2!

Page 40: Devel::NYTProf v4 at YAPC::EU 201008

Beware Your Examples!

Page 41: Devel::NYTProf v4 at YAPC::EU 201008

Consider effect of CPU-level data and code caching

Tends to make second case look faster!

Swap the order to double-check alternatives

Beware Your Examples!

Page 42: Devel::NYTProf v4 at YAPC::EU 201008

Phase 0 Before you start

Page 43: Devel::NYTProf v4 at YAPC::EU 201008
Page 44: Devel::NYTProf v4 at YAPC::EU 201008

DONʼTDO IT!

Page 45: Devel::NYTProf v4 at YAPC::EU 201008

“The First Rule of Program Optimization: Don't do it.

The Second Rule of Program Optimization (for experts only!): Don't do it yet.”

- Michael A. Jackson

Page 46: Devel::NYTProf v4 at YAPC::EU 201008

Why not?

Page 47: Devel::NYTProf v4 at YAPC::EU 201008

“More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity.”

- W.A. Wulf

Page 48: Devel::NYTProf v4 at YAPC::EU 201008

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”

- Donald Knuth

Page 49: Devel::NYTProf v4 at YAPC::EU 201008

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.Yet we should not pass up our opportunities in that critical 3%.”

- Donald Knuth

Page 50: Devel::NYTProf v4 at YAPC::EU 201008

How?

Page 51: Devel::NYTProf v4 at YAPC::EU 201008

“Throw hardware at it!”

Hardware is Cheap, Programmers are Expensive.

Hardware upgrades can be less risky than software optimizations.

Page 52: Devel::NYTProf v4 at YAPC::EU 201008

“Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that's where the bottleneck is.”

- Rob Pike

Page 53: Devel::NYTProf v4 at YAPC::EU 201008

“Measure twice, cut once.”

- Old Carpenter’s Maxim

Page 54: Devel::NYTProf v4 at YAPC::EU 201008

Phase 1Low Hanging Fruit

Page 55: Devel::NYTProf v4 at YAPC::EU 201008

Low Hanging Fruit1. Profile code running representative workload.

2. Look at Exclusive Time of subroutines.

3. Do they look reasonable?

4. Examine worst offenders.

5. Fix only simple local problems.

6. Profile again.

7. Fast enough? Then STOP!

8. Rinse and repeat once or twice, then move on.

Page 56: Devel::NYTProf v4 at YAPC::EU 201008

“Simple Local Fixes”

Changes unlikely to introduce bugs

Page 57: Devel::NYTProf v4 at YAPC::EU 201008

Move invariant expressionsout of loops

Page 58: Devel::NYTProf v4 at YAPC::EU 201008

Avoid->repeated->chains->of->accessors(...);

Avoid->repeated->chains->of->accessors(...);

Use a temporary variable

Page 59: Devel::NYTProf v4 at YAPC::EU 201008

Use faster accessors

Class::Accessor-> Class::Accessor::Fast--> Class::Accessor::Faster---> Class::Accessor::Fast::XS----> Class::XSAccessor

These aren’t all compatible so consider your actual usage.

Page 60: Devel::NYTProf v4 at YAPC::EU 201008

Avoid calling subs that don’t do anything!

my $unused_variable = $self->get_foo;

my $is_logging = $log->info(...);while (...) { $log->info(...) if $is_logging; ...}

Page 61: Devel::NYTProf v4 at YAPC::EU 201008

Exit subs and loops earlyDelay initializations

return if not ...a cheap test...;return if not ...a more expensive test...;my $foo = ...initializations...;...body of subroutine...

Page 62: Devel::NYTProf v4 at YAPC::EU 201008

Fix silly code

- return exists $nav_type{$country}{$key}- ? $nav_type{$country}{$key}- : undef;

+ return $nav_type{$country}{$key};

Page 63: Devel::NYTProf v4 at YAPC::EU 201008

Beware pathological regular expressions

Devel::NYTProf shows regular expression opcodes

Page 64: Devel::NYTProf v4 at YAPC::EU 201008

Avoid unpacking argsin very hot subs

sub foo { shift->delegate(@_) }

sub bar { return shift->{bar} unless @_; return $_[0]->{bar} = $_[1]; }

Page 65: Devel::NYTProf v4 at YAPC::EU 201008
Page 66: Devel::NYTProf v4 at YAPC::EU 201008

Avoid unnecessary(capturing parens)

in regex

Page 67: Devel::NYTProf v4 at YAPC::EU 201008

Retest.

Fast enough?

STOP!Put the profiler down and walk away

Page 68: Devel::NYTProf v4 at YAPC::EU 201008

Phase 2Deeper Changes

Page 69: Devel::NYTProf v4 at YAPC::EU 201008

Profile with aknown workload

E.g., 1000 identical requests

Page 70: Devel::NYTProf v4 at YAPC::EU 201008

Check Inclusive Times(especially top-level subs)

Reasonable percentagefor the workload?

Page 71: Devel::NYTProf v4 at YAPC::EU 201008

Check subroutinecall counts

Reasonablefor the workload?

Page 72: Devel::NYTProf v4 at YAPC::EU 201008

Add cachingif appropriate

to reduce calls

Remember invalidation!

Page 73: Devel::NYTProf v4 at YAPC::EU 201008

Walk up call chainto find good spots

for caching

Remember invalidation!

Page 74: Devel::NYTProf v4 at YAPC::EU 201008

Creating many objectsthat don’t get used?

Try a lightweight proxye.g. DateTime::Tiny, DateTimeX::Lite, DateTime::LazyInit

Page 75: Devel::NYTProf v4 at YAPC::EU 201008

Reconfigure your Perlcan yield significant gains with little effort

thread support costs ~2..30%debugging support costs ~15%

Also consider: usemymalloc, use64bitint, use64bitall, uselongdouble, optimize,

and compiler (system vs gcc vs commercial).Consider binary compatibilityvs new installation directory.

Rerun make test before installing.

Page 76: Devel::NYTProf v4 at YAPC::EU 201008

Retest.

Fast enough?

STOP!Put the profiler down and walk away.

Page 77: Devel::NYTProf v4 at YAPC::EU 201008

Phase 3Structural Changes

Page 78: Devel::NYTProf v4 at YAPC::EU 201008

Push loops down

- $object->walk($_) for @dogs;

+ $object->walk_these(\@dogs);

Page 79: Devel::NYTProf v4 at YAPC::EU 201008

Change the data structure

hashes <–> arrays

Page 80: Devel::NYTProf v4 at YAPC::EU 201008

Change the algorithm

What’s the “Big O”?O(n2) or O(logn) or ...

Page 81: Devel::NYTProf v4 at YAPC::EU 201008

Rewrite hot-spotsin XS / C

Consider Inline::C but beware of deployment issues.

Page 82: Devel::NYTProf v4 at YAPC::EU 201008

Small changes add up!

“I achieved my fast times by multitudes of 1% reductions”

- Bill Raymond

Page 83: Devel::NYTProf v4 at YAPC::EU 201008

See also “Top 10 Perl Performance Tips”

• A presentation by Perrin Harkins

• Covers higher-level issues, including- Good DBI usage

- Fastest modules for serialization, caching, templating, HTTP requests etc.

• http://docs.google.com/present/view?id=dhjsvwmm_26dk9btn3g