Low maintenance perl notes

These are my notes from my talk "Low-Maintenance Perl"

3/3/12 No Title

1/10file://localhost/Users/perrinharkins/Conferences/low_maintenance_perl.html

Low-Maintenance Perl

"Optimizing for an Easy Life"

How many times have you been reading some blog and seen something like this?

"I would never use Perl for a large project."
"Perl is write-only."
"Perl is unmaintainable."

Most likely this was coming from someone who has an agenda and doesn't know much about Perl, but still, the idea gets out there. A typical response is that Perl is no worse to maintain than any other code if you hire good programmers, and I believe this is true. However, this leads to an interesting question: what would these fabled good programmers do to make their Perl code more maintainable?

[ Slide: "What Would Good Programmers Do?" ]

Larry Wall famously said that Perl is intentionally flexible in its syntax in order to allow people to optimize for the things they care about: performance, brevity (as in Perl golf), entertainment value, etc. Okay then, let's take a look at what we could do to optimize for maintainability. I want code that's easy to write, easy to read, and easy to debug.

I won't pretend to have all the answers, but I have worked on some large, long-term projects written in Perl over the years, and I'll tell you what has worked for me.

Some Things I Shouldn't Have to Tell You

Let's get something out of the way up front. Perl provides some very good tools for avoiding common mistakes, and if you aren't using these, you really can't expect to have maintainable code. This should be second nature by now for anyone working on code with an expected lifespan of more than 15 minutes.

use strict;
use warnings;

There are no more excuses for not using these. I think of these as a directive to Perl meaning "I want this code to actually work." But I'm sure you already use them, and you don't need to listen to me going on about them. This is a no-brainer.
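For anyone who does need convincing, here is a small sketch of what strict buys you. It compiles a snippet containing a misspelled variable (the variable names are made up for the example) through a string eval, purely so we can inspect the error instead of dying:

```perl
use strict;
use warnings;

# A snippet with a typo: $totl was never declared.
my $code = q{
    use strict;
    my $total = 10;
    $totl = $total + 1;
    1;
};

# Under strict this fails at compile time instead of silently
# creating a new global variable.
my $ok      = eval $code;
my $message = $ok ? "compiled fine\n" : "strict caught it: $@";
print $message;
```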

A somewhat less widely used tool for keeping your code healthy is perltidy.

[ Slide: before and after of tidied code ]

If you haven't heard of it, this is a code formatter for Perl that supports a very configurable style. It used to be that if you wanted consistent formatting for all the code in a project, you had to write up some formatting standards and then go around rapping people on the knuckles for not following them. Besides wasting a lot of time, this tended to cause a disproportionate amount of team friction.

With perltidy, consistent formatting takes no effort at all. In fact, I save a lot of time by writing my code without bothering to format it and then tidying it when I pause to test it.

Consistent formatting helps much more than you would imagine. It really removes some of the "alien" feeling of working on code written by someone else. And a big project means you will be working on other people's code.

Know Your Audience

Okay, with the preliminaries out of the way, let's get down to some juicier stuff. Consider this quote from Andy Hunt of "Pragmatic Programmers" fame:

"Code is always read many more times than it is written."

Seems pretty obvious, right? So why do we spend so much time worrying about how fast we can write it, when we're going to spend more time trying to read, understand, and possibly debug it?

A programmer working on a large or long-lived piece of code is not so different from a newspaper writer working on an article. You want your writing to be elegant, but your primary goal is to convey some information in a way that will be clear to your reader. Newspapers know something about their target audience, and they adjust the vocabulary and style of their writing to suit. Most US newspapers aim for around a 6th to 8th grade reading level.

Who is your target audience? I'll tell you who it's not. It's not you, fresh from a stroll through the perlfunc man page, with the full structure of your entire program clearly laid out in your mind. I like to think that my target audience is me, woken up by some emergency 2am phone call, still a little tipsy from the night before, trying to debug something over a flaky dial-up connection. I think that keeping this image in mind helps to curb daredevil coding.

Choosing a Dialect

Where am I going with this? Am I really suggesting that you should limit your Perl vocabulary in the name of code that's easier to maintain? I am. I'll admit that I have been called a crackpot for suggesting this, and you may agree by the time I'm done talking, but maybe it will still give you some things to chew on.

Limiting your vocabulary is a large part of optimization in a high-level language like Perl. If you want to optimize for speed, you might avoid certain file manipulation idioms, or make heavier use of the non-regex string testing functions. If you want to optimize for golf, you'd lean on the punctuation variables and default values. Optimizing for poetry would mean choosing functions whose names have the most resonance in English. And so on.

Optimizing Perl for easy maintenance is about making it harder to screw up and quicker to understand, and therefore quicker to debug. The philosophy of my Perl dialect is based on five principles:

Don't use something complex when something simple will work.

This one seems terribly obvious, doesn't it? Nevertheless, many Perl programmers, when confronted with a simple problem, reach for an extreme, whiz-bang solution. It may be neat to use AUTOLOAD to build a couple of accessor methods, but it's kind of like swatting a fly with a hand grenade.

Don't do things in a magical way when an explicit way will work.

Somewhat related to the last one, Perl provides many features that modify something about the behavior of the base language, or allow for sneaky code (like tied variables) where what appears to be happening is very different from what's really happening. This is going to make your code harder to understand. It's also going to make the error messages that show up when someone makes a mistake very confusing. They thought they were doing something simple, and suddenly they have this crazy error message about missing overload magic.

Perhaps more importantly, many of the magical features can lead to what's called "action at a distance": when some code in one section of a program modifies how code in a different part of the program behaves, possibly by accident. To Perl's credit, many of the old magic-variable tricks that were the worst offenders in this area (stuff like $[, which changes the index that arrays start at!) are clearly discouraged now.

Most of the features in programming languages for managing complexity have to do with isolating sections of code from each other, so that you don't have to keep the entire program in your head all the time just to write a line. This is a good thing, and we don't want to break it.

Don't make your code complex just so you can get a certain syntax.

Some Perl programmers really obsess over syntax. They want function prototypes so their sub calls can look like built-ins. (Guess what? They're not built-ins.) They spend lots of energy using overloading and abusing import and indirect object syntax to get certain effects.

For example (and I don't mean to pick on anyone here, but I need to show you something to make this point), the web-scraping module FEAR::API allows you to say this:

fetch("search.cpan.org") > \my @cont;

That's roughly equivalent to this:

my $scraper = FEAR::API->fear();
my $page = $scraper->fetch("search.cpan.org");
push my @cont, $page->document->as_string;

Which one of these do you think will be more likely to break when someone uses it in an unexpected way? Which one do you think will give clearer error messages if it breaks?

To be fair to the author of FEAR::API, he was trying to optimize for something totally different from what I'm after: shorter code. Having less code is generally a good thing for maintainability, but not when you have to tweak Perl's syntax to do it, and not when it sacrifices readability.

Follow common conventions when you can.

There are few rules in Perl, but there are many things that people have come to expect after seeing them again and again in books, documentation, and other people's code. Every time you go against these, you're making the reader work harder. Don't do it when you don't have to.

Take regex syntax, for example. Damian Conway made some good arguments for writing all your regexes with alternative delimiters (s{foo}{bar}) in his book Perl Best Practices. However, most Perl programmers have learned to instinctively think "regex" when they see those forward slashes (s/foo/bar/). It saves them some think time if you follow the conventions.
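A quick illustration that the two spellings do the same thing (the strings and path are made-up examples), plus the one case where braces clearly read better: a pattern that is itself full of slashes.

```perl
use strict;
use warnings;

my $greeting = 'foo fighters';

# The conventional slash form...
( my $slashes = $greeting ) =~ s/foo/bar/;

# ...and the brace form from Perl Best Practices behave identically.
( my $braces = $greeting ) =~ s{foo}{bar};

# Braces earn their keep when the pattern contains slashes,
# since nothing has to be escaped.
my $path = '/usr/local/lib/perl5';
( my $moved = $path ) =~ s{^/usr/local/}{/opt/};

print "$slashes\n$braces\n$moved\n";
```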

Don't use an obscure language feature when a common one will work.

Perl has some dark corners. There are features that many people either don't know about at all or have just never used. There are even some things that seem to work, but you wonder if they were really intended to.

This quote from a PerlMonks member is relevant here:

Dragonchild's Law

"If I have to ask if something is possible on PerlMonks, I probably should rethink my design."

Why avoid obscure features? Why not explore every nook and cranny of Perl?

Obscure features are more likely to be used incorrectly.

Life would be pretty dull if you never tried anything new, but maybe the critical path of your shipping cost calculator is not the ideal place to break out.

The docs may not be as well-reviewed.

If fewer people use the feature, fewer people feel qualified to review the docs. They may not be as good as the ones for commonly used features.

Peers may not have enough experience with them to give good advice.

Your questions on mailing lists may be answered with a resounding silence, since fewer people actually know how to use the feature you're asking about.

Obscure features are not as widely tested.

Perl has a great test suite, but the most commonly used features get more real-world testing, by definition.

Obscure features are more likely to change in future versions.

I know, Perl has great backwards compatibility. Still, some things just don't make the cut. Remember pseudo-hashes? If you built a whole bunch of code based on them, I'll bet you do. Or what about this evil static variable trick:

my $foo = 1 if $bar; # like my $foo = 1 if 0

That now gives a warning in Perl 5.10, which is great, because I've seen it cause some really hard-to-find bugs when people did it by accident. If you used this sneaky way of getting a static variable instead of one of the more obvious ones, you're going to be in trouble when Perl stops supporting this behavior (which is the plan).
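For comparison, here is a sketch of two obvious ways to get a static variable: a state variable (available through use feature 'state' from Perl 5.10 on) and, for older Perls, a lexical captured by a sub inside a bare block. The function names are hypothetical.

```perl
use strict;
use warnings;
use feature 'state';

# state: initialized once, then keeps its value between calls.
sub next_id {
    state $id = 0;
    return ++$id;
}

# Pre-5.10 equivalent: a lexical in a bare block, visible only to this sub.
{
    my $count = 0;
    sub next_count { return ++$count }
}

print join( '', next_id(), next_id(), next_id() ), "\n";   # 123
print join( '', next_count(), next_count() ), "\n";        # 12
```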

An Example Dialect

Enough talk. You probably want to see where your pet feature falls in my low-maintenance dialect of Perl, so let's get on with it.

Never

[ Slide: Toaster oven, "Don't touch it! It's evil!" ]

These are the features that I just plain avoid.

Formats

Nobody remembers how to use these. Just use sprintf or a templating module.
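A minimal sketch of a report line done with sprintf instead of a format; the order data and column widths are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical order data: name, quantity, unit price.
my @orders = (
    [ 'Widget', 3,  4.5 ],
    [ 'Gadget', 12, 19.99 ],
);

# Left-aligned name in 10 columns, right-aligned quantity in 5,
# price in 8 columns with two decimals.
my @lines = map { sprintf( '%-10s %5d %8.2f', @{$_} ) } @orders;

for my $line (@lines) {
    print "$line\n";
}
```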

Punctuation variables

As mentioned before, these are major offenders in the action-at-a-distance area. They can change a program's behavior globally.

Having said that, there are some you just can't avoid. If you want to read files efficiently, you may need the common slurp idiom:

my $text = do { local $/ = undef; <$fh>; };

Just don't forget that local! At one place I worked, someone forgot to localize a change in $/ and ended up killing all the order processing on a major e-commerce site for a few hours. When you see these punctuation variables in code, you should be on yellow alert.
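Here is the slurp idiom in context, as a self-contained sketch (the temporary file name is arbitrary). The thing to notice is the last check: because of local, $/ is back to its default newline as soon as the do block exits.

```perl
use strict;
use warnings;

# Write a small file to read back.
my $file = "slurp_demo_$$.txt";
open my $out, '>', $file or die "Can't write $file: $!";
print {$out} "line one\nline two\n";
close $out or die "Can't close $file: $!";

# Slurp: undef $/ only inside the do block.
open my $fh, '<', $file or die "Can't read $file: $!";
my $text = do { local $/ = undef; <$fh> };
close $fh;
unlink $file;

print "slurped ", length($text), " characters\n";
print "record separator is back to newline\n" if $/ eq "\n";
```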

import() functions that do something other than import

I think maybe Test::More led a lot of people down the primrose path with this one. It looks really inviting to stick some kind of configuration directives in there. Most people won't understand what's going on, though, and may accidentally break it, or be confused about alternative ways to call it.

As an example, Catalyst has code in its synopsis that looks like this:

use Catalyst qw/-Debug/;

What do you think happens if you make that an empty list, maybe to save some memory by not importing anything?

use Catalyst ();

Your code blows up, because Catalyst was using that hook to add itself to the @ISA of your class. A neat trick, but totally unnecessary. If you just inherit from Catalyst in the normal way, it works fine.
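A sketch of inheriting "in the normal way", using a hypothetical My::Framework class as a stand-in for a framework base class such as Catalyst. The parent pragma ships with Perl from 5.10.1; -norequire is needed here only because both packages share one file to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a framework base class.
package My::Framework;
sub new   { my $class = shift; return bless {}, $class }
sub hello { my $self  = shift; return 'hello from ' . ref($self) }

# Plain, visible inheritance; nothing hidden in import().
# With the base class in its own file: use parent 'My::Framework';
package My::App;
use parent -norequire, 'My::Framework';

package main;

my $app = My::App->new;
print $app->hello, "\n";   # hello from My::App
```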

Prototypes

What can I say that hasn't already been said? Lots of potential problems, all for some syntactic sugar.

The Error module is a good example of the trouble prototypes can cause. The try/catch syntax that it supports has some very confusing behaviors (which are too long to go into here, but can be found by Googling) due to the use of code ref prototypes. It used to be a common cause of memory leaks because of this too, but that has been fixed in more recent versions of Perl.
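Plain eval gives you try/catch behavior without any prototypes. This sketch uses the common "eval { ...; 1 }" idiom, which checks the return value of the eval instead of trusting $@ to be true; risky() and safe_sqrt() are hypothetical functions.

```perl
use strict;
use warnings;

# Hypothetical operation that can fail.
sub risky {
    my ($n) = @_;
    die "negative input: $n\n" if $n < 0;
    return sqrt $n;
}

sub safe_sqrt {
    my ($n) = @_;
    my $result;

    # The trailing 1 makes the eval return true only if nothing died.
    my $ok = eval { $result = risky($n); 1 };
    if ( !$ok ) {
        my $error = $@ || 'unknown error';
        warn "caught: $error";
        return;
    }
    return $result;
}

my $root = safe_sqrt(16);
print "$root\n";   # 4
```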

Indirect object syntax

It can trip you up in various ways. Just get over it. It's Class->new(), not new Class.

UNIVERSAL::

Using the UNIVERSAL:: namespace is a pretty blatant violation of encapsulation. Shoving methods into other people's classes is not nice, and seeing some other code calling methods on your own class that you know aren't there will blow your mind when you're trying to debug something.

Run-time @ISA manipulation

Sometimes people decide that they don't like the way Perl does inheritance, and the best solution is to mess with @ISA or otherwise change how inheritance works to suit them. I'd agree that other inheritance schemes might be better, but the extra risk is just not worth it to me.

Re-blessing existing objects

bless $object, 'Some::Other::Class';

It's just asking for trouble, not to mention breaking encapsulation.

Objects that are not hashes

I know people will disagree with me on this one, but hashes are how objects are done in Perl. Using something else is going to make things more confusing for everyone who has to work on the code, and is likely to break some of their favorite tools and idioms. I do understand why some people like inside-out objects, though, and I think that Jerry Hedden's Object::InsideOut is your best option if you want them.

overloading

This is another feature that obscures what's going on, and when it bites you, it bites hard.

Take a look at this example code using Exception::Class, my favorite exception module:

use Exception::Class qw(MyProject::InsufficientKarma);
# in some method, the exception is triggered
MyProject::InsufficientKarma->throw();
# in the caller, we catch it with eval and then check the type
if ($@ and $@->isa('MyProject::InsufficientKarma')) {
    # ... handle the exception ...
}

Looks reasonable, right? It doesn't work. It doesn't work because Exception::Class overloads stringification, and since that throw() call didn't pass in a message, it's testing for truth on an empty string. That's an hour of my life that I'll never get back.

You could argue (correctly) that I should have read the docs more closely, but who expects an exception to evaluate to false?
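The trap is easy to reproduce without Exception::Class. This sketch uses a hypothetical My::QuietError class that overloads stringification in a similar way and returns an empty string when there is no message. Checking blessed() from the core Scalar::Util module, instead of the truth of $@, sidesteps the overloading. (The packages share one file only to keep the example self-contained.)

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical exception class: stringifies to its message, '' if none.
package My::QuietError;
use overload '""' => sub { $_[0]->{message} }, fallback => 1;

sub new {
    my ( $class, %args ) = @_;
    return bless { message => $args{message} // '' }, $class;
}
sub throw { my $class = shift; die $class->new(@_) }

package main;

eval { My::QuietError->throw() };

# Truth test goes through the overloaded stringification: '' is false.
my $naive = ( $@ and $@->isa('My::QuietError') ) ? 1 : 0;

# blessed() ignores overloading and just asks "is this an object?"
my $safe = ( blessed($@) and $@->isa('My::QuietError') ) ? 1 : 0;

print "naive check caught it: $naive\n";       # 0
print "blessed() check caught it: $safe\n";    # 1
```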

Multiple packages in one file

Have another file. They're free. And then I won't have to grep all the code looking for where on earth that package is hidden.

Source filters

I'm sure you've heard the warnings. They're easy to break, and can play havoc with your code.

use constant

It causes too many problems with string interpolation, especially where stringification is less obvious, like with the fat comma operator. I just use normal package variables instead.

# bad
use constant TEA => 'Darjeeling';
my %beverages = (TEA => 1);    # the key is the string 'TEA', not 'Darjeeling'
# good
our $TEA = 'Darjeeling';
my %beverages = ($TEA => 1);
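To see the fat comma problem concretely, this sketch builds the same hash both ways. Calling the constant with explicit parentheses (TEA()) is the usual workaround if you do keep use constant, because the fat comma only quotes a bare identifier.

```perl
use strict;
use warnings;
use constant TEA => 'Darjeeling';

# The fat comma quotes the bareword on its left, so the key is 'TEA'.
my %quoted = ( TEA => 1 );

# With parentheses, TEA() is a function call, so the key is its value.
my %called = ( TEA() => 1 );

my ($quoted_key) = keys %quoted;
my ($called_key) = keys %called;
print "$quoted_key\n";   # TEA
print "$called_key\n";   # Darjeeling
```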

Rarely

These are things I use very rarely, when I'm out of other options.

DESTROY methods

The problem with DESTROY methods is that scoping in Perl is much more confusing than you realize, and if you put something important in a DESTROY, it may not happen when you expect it to. It's not just a simple matter of lexical variables vs. package variables: there are some strange corner cases with returns from blocks and the like that are really counter-intuitive. Tim Bunce showed us some doozies on the Class::DBI mailing list.

The module Apache::Session is a poster child for relying too much on DESTROY. It has a tied hash for an interface and uses the DESTROY method to save all changes and release locks. Sounds pretty simple, but Perl mailing lists are full of anguished souls asking why their data is not getting written to the session. An explicit method to commit changes would be a lot less error-prone.
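A sketch of the explicit alternative: a hypothetical My::Session class whose changes are written only when you call commit(). The in-memory %STORE stands in for a real database or file store, and the packages share a file only to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical session class with an explicit commit.
package My::Session;

my %STORE;   # stand-in for a real database or file store

sub load {
    my ( $class, $id ) = @_;
    my %data = %{ $STORE{$id} || {} };
    return bless { id => $id, data => \%data }, $class;
}

sub get { my ( $self, $key ) = @_; return $self->{data}{$key} }

sub set {
    my ( $self, $key, $value ) = @_;
    $self->{data}{$key} = $value;
    return;
}

# The write happens here, where you can see it, not in DESTROY.
sub commit {
    my $self = shift;
    $STORE{ $self->{id} } = { %{ $self->{data} } };
    return 1;
}

package main;

my $session = My::Session->load('abc123');
$session->set( cart => 3 );
$session->commit;

my $cart = My::Session->load('abc123')->get('cart');
print "cart: $cart\n";   # cart: 3
```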

Weak refs

There are some problems that are really hard to solve without them, but once you decide to use them, you have a whole new set of things to worry about, like how to handle refs that point to objects that are gone. This is not a feature to use lightly.
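A small sketch of that "whole new set of things to worry about", using weaken() from the core Scalar::Util module on a made-up parent/child pair that points both ways:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# A classic reference cycle: parent -> child -> parent.
my $parent = { name => 'parent' };
my $child  = { name => 'child', parent => $parent };
$parent->{child} = $child;

# Weaken the back-reference so the cycle can't keep the parent alive.
weaken( $child->{parent} );

# Drop the last strong reference to the parent.
undef $parent;

# Every user of the weak ref now has to cope with it being undef.
if ( defined $child->{parent} ) {
    print "parent: $child->{parent}{name}\n";
}
else {
    print "parent is gone\n";
}
```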

AUTOLOAD

I was a huge fan of AUTOLOAD when I first found it. It seemed like the hammer for all of my nails. Some things really are a lot easier with AUTOLOAD, but don't be too quick on the draw. Writing an AUTOLOAD that handles failures well can be a challenge, and the invisibility of the magic AUTOLOAD methods will break things like can(). As chromatic has pointed out before, you can write more code to make can() work again, but I consider that the programming equivalent of throwing good money after bad. If you don't break it, you don't have to fix it.
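Here is a sketch of the can() breakage, comparing a hypothetical AUTOLOAD-based accessor with the same accessor written by hand (two packages in one file only to keep it self-contained):

```perl
use strict;
use warnings;

# Accessor conjured up at call time by AUTOLOAD.
package Widget::Auto;
our $AUTOLOAD;
sub new { my $class = shift; return bless { color => 'red' }, $class }

sub AUTOLOAD {
    my $self = shift;
    ( my $name = $AUTOLOAD ) =~ s/.*:://;
    return if $name eq 'DESTROY';   # AUTOLOAD also catches object destruction
    die "No method '$name'\n" unless ref($self) && exists $self->{$name};
    return $self->{$name};
}

# The same accessor, written out by hand.
package Widget::Explicit;
sub new   { my $class = shift; return bless { color => 'red' }, $class }
sub color { my $self  = shift; return $self->{color} }

package main;

my $auto     = Widget::Auto->new;
my $auto_can = Widget::Auto->can('color')     ? 'yes' : 'no';
my $expl_can = Widget::Explicit->can('color') ? 'yes' : 'no';

print 'color via AUTOLOAD: ', $auto->color, "\n";    # red
print "Widget::Auto can color? $auto_can\n";         # no
print "Widget::Explicit can color? $expl_can\n";     # yes
```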

Tied variables

tie is another feature that sounds great, until you try to use it for something important. Then you discover all the caveats and limitations, and you get into confusing questions about whether you should be passing around references to the tied variable or the underlying object. The nail in the coffin is that it's actually slower than just calling methods directly.

I think the fundamental problem with tie is that it's a feature that is intentionally misleading. It obscures what's really happening, and that's exactly what we're trying to get away from.
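To make the "misleading" part concrete, this sketch ties a hash to a hypothetical Counting::Hash class built on Tie::StdHash (from the core Tie::Hash module). What reads like a plain hash lookup actually runs a FETCH method, and tied() hands back the hidden object that callers may need instead. (Two packages in one file only to keep it self-contained.)

```perl
use strict;
use warnings;

# A tied hash that quietly counts every read.
package Counting::Hash;
use Tie::Hash;                                 # defines Tie::StdHash
use parent -norequire, 'Tie::StdHash';         # object is a plain hash ref
our $FETCHES = 0;

sub FETCH {
    my ( $self, $key ) = @_;
    $FETCHES++;
    return $self->{$key};
}

package main;

tie my %prices, 'Counting::Hash';
$prices{tea} = 3;                 # calls STORE
my $cost = $prices{tea};          # looks like a lookup, but calls FETCH

print "fetches so far: $Counting::Hash::FETCHES\n";

# The object behind the curtain.
my $object = tied %prices;
print ref($object), "\n";         # Counting::Hash
```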

Sometimes

These are things I use sometimes, but try hard not to overuse, since I consider them accident-prone.

Closures

Sorry, but only MJD really knows how to use them right. Most of the little things I see people use closure variables for could just be done with package variables, and they'd be more obvious to most readers.
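A sketch of the kind of small job in question, a counter, done both ways; make_counter(), next_count() and $COUNT are hypothetical names:

```perl
use strict;
use warnings;

# Closure version: the count hides inside the returned sub.
sub make_counter {
    my $count = 0;
    return sub { return ++$count };
}
my $next = make_counter();
$next->() for 1 .. 3;

# Package-variable version: the count is visible by name ($main::COUNT).
our $COUNT = 0;
sub next_count { return ++$COUNT }
next_count() for 1 .. 3;

print join( ' ', $next->(), next_count() ), "\n";   # 4 4
```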

Sub refs

Not to pick on you Higher-Order Perl fans. I know you love them, but don't overuse them. I usually use them when I have a variation in behavior that seems too small to make a set of classes for. If they start to become the focal point of the code, or I need several of them to make something work, I switch to an OO design.
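The kind of small variation described here can be sketched as a hash of sub refs (a dispatch table); the formatter names are invented for illustration:

```perl
use strict;
use warnings;

# One small behavior that varies: how to format a name.
my %formatters = (
    plain => sub { my ($s) = @_; return $s },
    upper => sub { my ($s) = @_; return uc $s },
    title => sub {
        my ($s) = @_;
        return join ' ', map { ucfirst lc $_ } split / /, $s;
    },
);

sub format_name {
    my ( $style, $name ) = @_;
    my $formatter = $formatters{$style}
        or die "Unknown style '$style'\n";
    return $formatter->($name);
}

my $pretty = format_name( title => 'ada LOVELACE' );
print "$pretty\n";   # Ada Lovelace
```

If the table grows to many entries, or the entries need shared state, that is the point where a small set of classes starts to pay off.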

String eval

Sometimes you have to use it, but it's another yellow alert. Not something to do lightly.

Subroutine attributes

I use these a bit, in things like Test::Class, and they seem pretty reliable now. I just try to avoid situations where there's anything more than very simple data in them. In most cases, a little configuration hash would work just as well.

Exported subs

Those short subroutine calls look nice, but they just don't play well with OO code. I've had to go back and change things to class methods enough times that now I just start them out that way.
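A sketch of starting with a class method instead of an exported function; My::Text and slugify() are hypothetical. One payoff is that a subclass can override the behavior, which an exported function can't offer. (Packages share a file only to keep the example self-contained.)

```perl
use strict;
use warnings;

package My::Text;

sub slugify {
    my ( $class, $string ) = @_;
    my $slug = lc $string;
    $slug =~ s/[^a-z0-9]+/-/g;   # each run of other characters becomes one hyphen
    $slug =~ s/^-|-$//g;         # trim hyphens at either end
    return $slug;
}

# A subclass tweaks the behavior and reuses the parent's logic.
package My::Text::Underscored;
use parent -norequire, 'My::Text';

sub slugify {
    my ( $class, $string ) = @_;
    ( my $slug = $class->SUPER::slugify($string) ) =~ tr/-/_/;
    return $slug;
}

package main;

my $dashed      = My::Text->slugify('Low-Maintenance Perl!');
my $underscored = My::Text::Underscored->slugify('Low-Maintenance Perl!');
print "$dashed\n$underscored\n";
```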

Chained map/grep

These are too useful to simply avoid, but those big chains of them that you have to read backwards are a nightmare.
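A sketch of the same filter written both ways, on made-up "name:score" records. The chained version has to be read from the bottom up; the loop reads top to bottom and gives each piece a name.

```perl
use strict;
use warnings;

my @records = ( 'ann:91', 'bob:78', 'cy:85', 'dee:60' );

# Chained: keep scores >= 80, take the names, sort them.
my @chained = sort { $a cmp $b }
              map  { ( split /:/ )[0] }
              grep { ( split /:/ )[1] >= 80 } @records;

# The same logic in named steps.
my @passing;
for my $record (@records) {
    my ( $name, $score ) = split /:/, $record;
    push @passing, $name if $score >= 80;
}
my @sorted = sort { $a cmp $b } @passing;

print "@sorted\n";   # ann cy
```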

wantarray

I know, people love their contextual returns. I prefer consistent return values. First, because wantarray makes testing your code a pain: you have to test things in each context! Second, it can cause lots of hard-to-spot bugs like this Class::DBI one:

@books = Book->search(author => $author) || die "book not found";

Class::DBI returns an iterator object in scalar context, and the || forces scalar context, breaking this code. It works if you use or instead.
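The context trap can be reproduced without Class::DBI. In this sketch, a hypothetical search() returns a list in list context and a count in scalar context, standing in for Class::DBI's iterator:

```perl
use strict;
use warnings;

# Hypothetical contextual function.
sub search {
    my @found = ( 'Book A', 'Book B', 'Book C' );
    return wantarray ? @found : scalar @found;
}

# || evaluates search() in scalar context: we get the count, not the books.
my @with_pipes = search() || die "book not found";

# or binds more loosely than =, so search() keeps list context.
my @with_or = search() or die "book not found";

print scalar(@with_pipes), " element(s): @with_pipes\n";   # 1 element(s): 3
print scalar(@with_or), " element(s): @with_or\n";         # 3 element(s): Book A Book B Book C
```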

I think most experienced coders know the contextual returns of the built-ins very well, but that's no reason to make your own code harder to use.

ternary operator

It's just an uglier, harder-to-read version of if. I'll only use it when the test and results are very short. And that chained stuff makes my eyes bleed.

$_

Last, but not least. Sure, it's neat that Perl has pronouns, and it's unavoidable if you want to use map, but it's easy to break (by doing something that modifies $_ as a side effect), and almost always harder to read than a real variable name.
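A sketch of the side-effect problem: a hypothetical helper that modifies the global $_ quietly changes the original array inside map, because $_ is aliased to each element. The version with a named variable leaves the input alone.

```perl
use strict;
use warnings;

# Sloppy helper: works on (and modifies) the global $_.
sub shout_sloppy { $_ = uc $_; return $_ }

# Same job with a real variable name.
sub shout { my ($word) = @_; return uc $word }

my @words = qw(tea coffee);
my @loud  = map { shout_sloppy() } @words;
print "@words\n";    # TEA COFFEE: the originals changed too

my @drinks = qw(tea coffee);
my @louder = map { shout($_) } @drinks;
print "@drinks\n";   # tea coffee
```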

Some Anticipated Questions

Let's answer a few questions that you might have about my approach.

Doesn't this take all the fun out of programming?

No, it doesn't. Next question.

Okay, it does possibly shift the level you work at a bit, from focusing on being clever on individual lines to being clever in the larger scheme.

There's a saying that I like about writing. It goes like this: When you write a poem, you work at the word level. When you write a short story, you work at the sentence level. When you write a novel, you work at the paragraph level.

A large coding project is a novel. The higher-level view in programming is that of interface design, of data structures, of groups of classes, and finally of entire systems. If you keep the low-level stuff simple, you have more time to think about these things.

Won't this make your code longer?

Probably. Although writing less code is usually a good thing, there is a balance. Generally sound advice like "Don't Repeat Yourself" can sometimes lead people to obsessively shorten their code with techniques that make it more complex. Do you really need to run regexes on your POD to parse out the names of the arguments your script takes, just to avoid listing them again in your code? Probably not.

But AUTOLOAD is awesome!

Agreed. AUTOLOAD is awesome. Every time I drink a 40, I pour a little on the ground for my old friend AUTOLOAD. But I still don't use it.

Why don't you just use Java?

Who let that guy in here?

I have used Java, and I mostly like it, but I use Perl when I can for probably the same reason other people do: it lets me get things done faster. But it's not enough to get things done; you need them to stay done, by writing them in a way that you can maintain when other things change around them.

Beyond the Code

I think we're about out of time, but let me just briefly mention a few huge topics, in order to get credit for having thought about them.

Configuration Management

You need configuration management. You need a predictable environment for your code to run in. You need to control your Perl version, web server, database version, CPAN module versions, and probably more. Without that, you'll spend all your time debugging strange compile problems that you can't reproduce on your own system. If you are installing all your dependencies by just grabbing the latest with the CPAN shell, this means you.

Revision Control with Branches

It's not enough to just have source code control. If you aren't using multiple branches, how can you work on a new feature that will take two months while also fixing bugs on your released version? Not in any good way.

Perl::Critic

This is a really interesting module that could form the basis for evaluating how close some source code is to your own local dialect. I haven't tried doing it yet, but it's on my list.

Tests Can Save Your Life

Over the past few years, it's become clear to me that large-scale coding is essentially impossible without automated tests.
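A minimal sketch of an automated test using Test::More, which ships with Perl; shipping_cost() and its pricing rule are hypothetical, invented for the example:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical function under test: cost in cents is a flat 500
# plus 100 for every started kilogram.
sub shipping_cost {
    my ($grams) = @_;
    die "weight must be positive\n" unless $grams > 0;
    my $kilos = int( ( $grams + 999 ) / 1000 );   # round up to whole kilograms
    return 500 + 100 * $kilos;
}

is( shipping_cost(1),    600, 'a single gram is billed as one kilo' );
is( shipping_cost(1000), 600, 'exactly one kilo' );
is( shipping_cost(1001), 700, 'just over one kilo rounds up' );
ok( !eval { shipping_cost(0); 1 }, 'zero weight is rejected' );

done_testing();
```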

Test::Class Can Save Your Tests

Your test code is important, and if you're doing it right, there will be lots of it. Don't let it turn into a mess. Test::Class allows you to write test code with better organization and code reuse.

Smolder

One of my co-workers, Michael Peters, wrote this great smoke-testing server that you can use. It makes pretty graphs and has some slick AJAX features that make it more fun than your smoke tester.

http://sourceforge.net/projects/smolder/