Slides from a talk I gave at an ACCU meeting in Mountain View, California, on September 10, 2008.
Haskell for the Real World
Bryan O’Sullivan
1
Real World
• The hardest problems in modern software
• Reliability
• Modularity
• Performance
• Concurrency
2
Haskell
• Decades of work in academia
• Vehicle for leading-edge research
• “Breaking out” this decade
3
Real World + Haskell
• Fast native-code compiler (GHC)
• Debugger, code coverage, profiling
• 750+ open source packages
• Mostly BSD-licensed, one-click install
• Friendly, active user community
• #haskell 12th biggest on freenode
4
Code You Can Believe In
Bryan O’Sullivan, Don Stewart & John Goerzen
Real World Haskell
5
Language philosophy
• “Multi-paradigm”, but opinionated
• Carefully chosen defaults:
• Pure and functional
• Static strong typing
• Lazy evaluation
6
Pure and functional
• Data is immutable
• Code is a function of its visible inputs
• Consequences
• Easier to build, read, test, scale
• Many classes of bug are eliminated
7
Static strong typing
• All types are known at compile time
• The compiler infers types
• No need to keyboard them in
• Do not confuse this with the type systems of more familiar languages
8
Static strong typing
• Consequences
• Data conversions are explicit
• Many bugs caught by the compiler
• We don’t pay a “keyboard tax” for safety
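To make the two points above concrete, here is a minimal sketch. The name mean and the sample input are illustrative assumptions, not from the talk. The function has no type signature, so the compiler infers one, and the Int returned by length has to be converted explicitly before it can be used in a division.

```haskell
module Main where

-- No type signature is written for mean (an illustrative name).
-- The compiler infers  mean :: [Double] -> Double  from the
-- annotation on the result expression.
-- length returns an Int, so the conversion must be spelled out
-- with fromIntegral; omitting it is a compile-time type error.
mean xs = sum xs / fromIntegral (length xs) :: Double

main :: IO ()
main = print (mean [1, 2, 3, 4])  -- prints 2.5
```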
9
Lazy evaluation
• Defer work until needed
• Consequences
• Improves modularity
• Helps with reasoning and code reuse
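One way to see the modularity claim is a small sketch in which a producer and a consumer are written separately. The names squares and squaresBelow are assumptions made for this illustration. The producer describes an infinite list; the consumer decides how much of it is ever computed.

```haskell
module Main where

-- Producer: an infinite list of squares. Nothing is computed
-- until some consumer demands elements.
squares :: [Integer]
squares = map (^ (2 :: Int)) [1 ..]

-- Consumer: takes only the prefix it needs. The producer does
-- not have to know where the consumer will stop.
squaresBelow :: Integer -> [Integer]
squaresBelow limit = takeWhile (< limit) squares

main :: IO ()
main = print (squaresBelow 50)  -- prints [1,4,9,16,25,36,49]
```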
10
The k-minima problem
• Find the k least elements in a list
• Conventional solutions are complicated
• Haskell solution uses laziness
11
Lazy k-minima
• The “take” function extracts the first k elements from a list
• The “sort” function sorts a list
k_minima k list = take k (sort list)
12
How does this work?
• “sort” doesn’t completely sort the list
• Only enough to give the k least elements demanded by the caller
• That extra work to sort the rest of the list?
• Never happens!
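A runnable version of the definition above might look like the following sketch. The type signature and the sample input are additions for illustration. How much sorting work is actually skipped depends on the implementation of sort in Data.List, so treat the laziness claim as a property of that library function rather than something this code enforces.

```haskell
module Main where

import Data.List (sort)

-- Same definition as on the slide, with a type signature added.
-- take demands only the first k elements of the sorted list.
k_minima :: Ord a => Int -> [a] -> [a]
k_minima k list = take k (sort list)

main :: IO ()
main = print (k_minima 3 [9, 4, 7, 1, 8, 2 :: Int])  -- prints [1,2,4]
```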
13
Algebraic data types
• Powerful and ubiquitous
• Unifying key concepts of data structuring
• enum
• union
• struct
14
The enum-like view
• A type can have several constructors
data Bool = False | True
data Colour = Red | Green | Blue | Violet
15
The union-like view
data PhoneNumber
    = Home [Digit]
    | Work [Digit]
• Major bonus: we know at runtime which constructor was used
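Knowing the constructor at runtime is what makes pattern matching possible. The sketch below assumes Digit is a synonym for Int (the slide does not define it) and introduces a hypothetical function describe that branches on the constructor.

```haskell
module Main where

-- Assumption: the slide leaves Digit undefined; Int is used here.
type Digit = Int

data PhoneNumber
    = Home [Digit]
    | Work [Digit]

-- Hypothetical helper: each equation matches one constructor,
-- so the tag stored with the value selects the branch.
describe :: PhoneNumber -> String
describe (Home ds) = "home: " ++ concatMap show ds
describe (Work ds) = "work: " ++ concatMap show ds

main :: IO ()
main = putStrLn (describe (Work [5, 5, 5, 0, 1, 0, 0]))  -- work: 5550100
```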
16
The struct-like view
data Tree a
    = Node (Tree a) (Tree a)
    | Leaf a
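As a sketch of how such a type is consumed, the functions below walk a Tree by pattern matching on its fields. The names leaves and depth and the sample tree are assumptions for illustration.

```haskell
module Main where

data Tree a
    = Node (Tree a) (Tree a)
    | Leaf a

-- Collect the leaf values from left to right.
leaves :: Tree a -> [a]
leaves (Leaf x)   = [x]
leaves (Node l r) = leaves l ++ leaves r

-- Number of levels, counting a single leaf as depth 1.
depth :: Tree a -> Int
depth (Leaf _)   = 1
depth (Node l r) = 1 + max (depth l) (depth r)

main :: IO ()
main = do
    let t = Node (Node (Leaf 'a') (Leaf 'b')) (Leaf 'c')
    print (leaves t)  -- prints "abc"
    print (depth t)   -- prints 3
```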
17
Algebraic data types
data JSON = JObject [(String,JSON)]
          | JArray [JSON]
          | JString String
          | JNumber Double
          | JBoolean Bool
          | JNull
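A type like this is usually processed with one equation per constructor. The render function below is a hypothetical sketch, not code from the talk. It leans on show for strings and numbers, which follows Haskell's escaping and number formatting rather than the JSON specification, so it is only an approximation of a real serializer.

```haskell
module Main where

import Data.List (intercalate)

data JSON = JObject [(String, JSON)]
          | JArray [JSON]
          | JString String
          | JNumber Double
          | JBoolean Bool
          | JNull

-- Hypothetical renderer: one equation per constructor.
render :: JSON -> String
render (JObject kvs) = "{" ++ intercalate "," (map pair kvs) ++ "}"
  where
    pair (k, v) = show k ++ ":" ++ render v
render (JArray vs)   = "[" ++ intercalate "," (map render vs) ++ "]"
render (JString s)   = show s   -- Haskell escaping, not strict JSON
render (JNumber n)   = show n   -- e.g. 1 is shown as 1.0
render (JBoolean b)  = if b then "true" else "false"
render JNull         = "null"

main :: IO ()
main = putStrLn (render (JObject [("tags", JArray [JNumber 1, JBoolean True, JNull])]))
```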
18
Typeclasses
• Ad-hoc polymorphism
• A function’s behaviour depends on its type
• How do we express this in a non-OO language?
19
Checking for equality
• How do we express this idea?
• A type whose values can be compared for equality
class Eq a where
    (==) :: a → a → Bool
20
Testing for equality
• Define this function:
• “is a value present in a list?”
• Desired:
• One definition for all types that can be compared for equality
elem :: Eq a ⇒ a → [a] → Bool
21
One definition of elem
elem k (x:xs)
    | k == x    = True
    | otherwise = elem k xs
elem k [] = False
22
Instances
• How do we compare JSON values?
instance Eq JSON where
    JNumber a == JNumber b = a == b
    etc.
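The slide elides the remaining cases. One plausible way to fill them in is sketched below; it is an assumption, not the talk's code. Objects are compared as ordered key/value lists here, and asking the compiler for the instance with "deriving Eq" should give equivalent behaviour. The elem definition from the earlier slide is repeated so the file compiles on its own, with the Prelude version hidden.

```haskell
module Main where

import Prelude hiding (elem)

data JSON = JObject [(String, JSON)]
          | JArray [JSON]
          | JString String
          | JNumber Double
          | JBoolean Bool
          | JNull

-- Sketch of a full instance: same constructor, equal contents.
instance Eq JSON where
    JObject a  == JObject b  = a == b   -- order-sensitive comparison
    JArray a   == JArray b   = a == b
    JString a  == JString b  = a == b
    JNumber a  == JNumber b  = a == b
    JBoolean a == JBoolean b = a == b
    JNull      == JNull      = True
    _          == _          = False   -- different constructors

-- One definition, usable at every type that has an Eq instance.
elem :: Eq a => a -> [a] -> Bool
elem k (x:xs)
    | k == x    = True
    | otherwise = elem k xs
elem _ []       = False

main :: IO ()
main = do
    print (elem (JNumber 2) [JNull, JNumber 2])  -- prints True
    print (elem (JString "x") [JArray []])       -- prints False
```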
23
Gene sequencing
• Splice e.g. mouse DNA into E. coli
• Replicate
• Extract DNA fragments
• ... now what?
24
Contamination
• Fragments of target and E. coli genes mixed
• Must filter out the E. coli fragments
• How to identify them quickly?
• Standard solution: BLAST
25
Filtering in Haskell
• Our solution: 75 lines of Haskell
• 5x faster than BLAST
26
Development time
• 2 days: develop application
• 2 days: speed up app by 5x
• 2 hours: knock out 3 bugs found by QuickCheck
• 17 seconds: index human chromosome 20
• 5 minutes: check it for 100,000 E. coli fragments
27
What helped?
• Great libraries
• “bio” handles biological sequences
• “bloomfilter” for fast indexing
• “bytestring” provides efficient I/O
• “QuickCheck” for randomized testing
28
What helped?
• Laziness
• Generate all k-length E. coli sequences
allKWords k list = map (take k) (tails list)
• List generated on demand
• Constant space overhead
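Here is a runnable sketch of the definition above. As written, the result also contains shorter words taken from the last few suffixes, so a hypothetical helper fullKWords is included to keep only the full-length ones. The final line shows the on-demand behaviour by taking two words from an infinite input.

```haskell
module Main where

import Data.List (tails)

-- Definition from the slide, with a type signature added.
allKWords :: Int -> [a] -> [[a]]
allKWords k list = map (take k) (tails list)

-- Hypothetical helper: stop at the first word shorter than k.
fullKWords :: Int -> [a] -> [[a]]
fullKWords k = takeWhile ((== k) . length) . allKWords k

main :: IO ()
main = do
    print (allKWords 3 "ACGTA")   -- ["ACG","CGT","GTA","TA","A",""]
    print (fullKWords 3 "ACGTA")  -- ["ACG","CGT","GTA"]
    -- Only the demanded prefix of the infinite input is examined.
    print (take 2 (allKWords 3 (cycle "ACGT")))  -- ["ACG","CGT"]
```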
29
What helped?
• Native code compilation
• Indexing and I/O at C speeds
• Mature profiling tools
• Found opportunities for 5x speedup
30
What helped?
• QuickCheck is a life saver
• Generated random test cases for us
• If a test failed, provided a test case
• Found and fixed 3 gnarly bugs in 2 hours
• Would take days with traditional testing
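The talk does not show its properties, so the following is only a sketch of what a QuickCheck property looks like, written against the allKWords function from the previous slide. The two property names are assumptions. It requires the QuickCheck package to be installed; quickCheck generates random arguments and reports a counterexample if a property fails.

```haskell
module Main where

import Data.List (tails)
import Test.QuickCheck (quickCheck)

allKWords :: Int -> [a] -> [[a]]
allKWords k list = map (take k) (tails list)

-- Illustrative property: no word is longer than k
-- (a negative k behaves like 0 because take returns []).
prop_wordLength :: Int -> String -> Bool
prop_wordLength k s = all ((<= max 0 k) . length) (allKWords k s)

-- Illustrative property: one word per suffix of the input,
-- including the empty suffix.
prop_wordCount :: Int -> String -> Bool
prop_wordCount k s = length (allKWords k s) == length s + 1

main :: IO ()
main = do
    quickCheck prop_wordLength
    quickCheck prop_wordCount
```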
31
Parallelism
• Next step: run the code in parallel
• Expect 2 days of work needed
• Change maybe 10 lines of code
• The challenge: find the right 10 lines
• Functional programming does not give us parallelism for free ... yet
32
Concurrency
• Threaded programming is a nightmare
• Locks and condition variables do not scale
• Fundamental problem:
• Combining correct threaded functions does not give a correct threaded program
33
Software transactions
• A new approach to threaded programming
• Concurrent updates from multiple threads are atomic and isolated
• Like treating shared memory as a database
34
Interesting features
• The type system prevents us from doing unsafe operations inside a transaction
• We can compose pieces of transactional code into a larger unit
• This still runs as one transaction
• This preserves correctness!
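A minimal sketch of these features, using the stm library's Control.Concurrent.STM module. The account example and the names withdraw, deposit and transfer are assumptions for illustration. Each piece has type STM (), so it cannot perform arbitrary I/O, and the pieces compose into a larger action that atomically runs as a single transaction.

```haskell
module Main where

import Control.Concurrent.STM

-- Illustrative shared state: an account balance in a TVar.
type Account = TVar Int

-- Blocks (retries) until the balance covers the withdrawal.
withdraw :: Account -> Int -> STM ()
withdraw acc n = do
    bal <- readTVar acc
    check (bal >= n)
    writeTVar acc (bal - n)

deposit :: Account -> Int -> STM ()
deposit acc n = modifyTVar' acc (+ n)

-- Composition of two transactional pieces; still one transaction
-- when run, so no thread observes the money "in flight".
transfer :: Account -> Account -> Int -> STM ()
transfer from to n = withdraw from n >> deposit to n

main :: IO ()
main = do
    a <- newTVarIO 100
    b <- newTVarIO 0
    atomically (transfer a b 30)
    balances <- (,) <$> readTVarIO a <*> readTVarIO b
    print balances  -- prints (70,30)
```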
35
Real World Haskell
Online now, free:
book.realworldhaskell.org
In stores in November
~700 pages of good stuff
36