Upload
gregory-bence
View
218
Download
1
Embed Size (px)
Citation preview
Tools for Refactoring Functional Programs
Simon Thompsonwith
Huiqing LiClaus Reinke
www.cs.kent.ac.uk/projects/refactor-fp
LIL 2006 2
Design
Models
Prototypes
Design documents
Visible artifacts
LIL 2006 3
All in the code …
Functional programs embody their design in their code.
This is enabled by their high-level nature: constructs, types …
data Message = Message Head Body
data Head = Head Metadata Title
data Metadata = Metadata [Tags]
type Title = String …
data Message = Message Head Body
data Head = Head Metadata Title
data Metadata = Metadata [Tags]
type Title = String …
LIL 2006 4
Evolution
Successful systems are long lived …
… and evolve continuously.
Supporting evolution of code and design?
LIL 2006 5
Soft-WareThere’s no single correct design …
… different options for different situations.
Maintain flexibility as the system evolves.
LIL 2006 6
RefactoringRefactoring means changing the design or structure of a program … without changing its behaviour.
RefactorModify
LIL 2006 7
Not just programmingPaper or presentation
moving sections about; amalgamate sections; move inline code to a figure; animation; …
Proof add lemma; remove, amalgamate hypotheses, …
Programthe topic of the lecture
LIL 2006 8
Splitting a function in two
LIL 2006 9
Splitting a function in two
LIL 2006 10
Splitting a function in two
LIL 2006 11
Splitting a functionmodule Split where
f :: [String] -> String
f ys = foldr (++) [] [ y++"\n" | y <- ys ]
LIL 2006 12
Splitting a functionmodule Split where
f :: [String] -> String
f ys = foldr (++) [] [ y++"\n" | y <- ys ]
LIL 2006 13
Splitting a functionmodule Split where
f :: [String] -> String
f ys = join [y ++ "\n" | y <- ys] where join = foldr (++) []
LIL 2006 14
Splitting a functionmodule Split where
f :: [String] -> String
f ys = join [y ++ "\n" | y <- ys] where join = foldr (++) []
LIL 2006 15
Splitting a functionmodule Split where
f :: [String] -> String
f ys = join addNL where join zs = foldr (++) [] zs addNL = [y ++ "\n" | y <- ys]
LIL 2006 16
Splitting a functionmodule Split where
f :: [String] -> String
f ys = join addNL where join zs = foldr (++) [] zs addNL = [y ++ "\n" | y <- ys]
LIL 2006 17
Splitting a functionmodule Split where
f :: [String] -> String
f ys = join (addNL ys) where join zs = foldr (++) [] zs addNL ys = [y ++ "\n" | y <- ys]
LIL 2006 18
Splitting a functionmodule Split where
f :: [String] -> String
f ys = join (addNL ys) where join zs = foldr (++) [] zs addNL ys = [y ++ "\n" | y <- ys]
LIL 2006 19
Splitting a functionmodule Split where
f :: [String] -> String
f ys = join (addNL ys) join zs = foldr (++) [] zs
addNL ys = [y ++ "\n" | y <- ys]
LIL 2006 20
Overview
Example refactorings: what they involve.
Building the HaRe tool.
Design rationale.
Infrastructure.
Haskell and Erlang.
The Wrangler tool.
Conclusions.
LIL 2006 21
Haskell 98
Standard, lazy, strongly typed, functional programming language.
Layout is significant … “offside rule” … and idiosyncratic.doSwap pnt = applyTP (full_buTP (idTP `adhocTP`
inMatch `adhocTP` inExp `adhocTP` inDecl)) where inMatch ((HsMatch loc fun pats rhs ds)::HsMatchP) | fun == pnt = case pats of (p1:p2:ps) -> do pats'<-swap p1 p2 pats return (HsMatch loc fun pats' rhs ds) _ -> error "Insufficient arguments to swap." inMatch m = return m inExp exp@((Exp (HsApp (Exp (HsApp e e1)) e2))::HsExpP) | expToPNT e == pnt = swap e1 e2 exp
inExp e = return e
doSwap pnt = applyTP (full_buTP (idTP `adhocTP` inMatch `adhocTP` inExp `adhocTP` inDecl)) where inMatch ((HsMatch loc fun pats rhs ds)::HsMatchP) | fun == pnt = case pats of (p1:p2:ps) -> do pats'<-swap p1 p2 pats return (HsMatch loc fun pats' rhs ds) _ -> error "Insufficient arguments to swap." inMatch m = return m inExp exp@((Exp (HsApp (Exp (HsApp e e1)) e2))::HsExpP) | expToPNT e == pnt = swap e1 e2 exp
inExp e = return e
LIL 2006 22
Why refactor Haskell?The only design artefact is (in) the code.
Semantics of functional languages support large-scale transformations (?)
Building real tools to support functional programming … heavy lifting.
Platform for research and experimentation.
LIL 2006 23
Lift / demotef x y = … h … where h = …
Hide a function which is clearly subsidiary to f; clear up the namespace.
f x y = … (h y) … h y = …
Makes h accessible to the other functions in the module and beyond.
Free variables: which parameters of f are used in h?Need h not to be defined at the top level, … , Type of h will generally change … .
LIL 2006 24
Algebraic or abstract type?
data Tr a
= Leaf a |
Node a (Tr a) (Tr a) Tr
Leaf
Node
flatten :: Tr a -> [a]
flatten (Leaf x) = [x]
flatten (Node s t)
= flatten s ++
flatten t
LIL 2006 25
Algebraic or abstract type?
data Tr a
= Leaf a |
Node a (Tr a) (Tr a)
isLeaf = …
isNode = …
…
Tr
isLeaf
isNode
leaf
left
right
mkLeaf
mkNode
flatten :: Tr a -> [a]
flatten t
| isleaf t = [leaf t]
| isNode t
= flatten (left t)
++ flatten (right t)
LIL 2006 26
Information required
Lexical structure of programs,abstract syntax,binding structure,type system andmodule system.
LIL 2006 27
Program transformationsProgram optimisation source-to-source transformations to get more efficient code
Program derivation calculating efficient code from obviously correct specifications
Refactoring transforming code structure usually bidirectional and conditional.
Refactoring = Transformation + Condition
LIL 2006 28
Conditions: renaming f to g
“No change to the binding structure”
1. No two definitions of g at the same level.
2. No capture of g.3. No capture by g.
LIL 2006 29
Capture of renamed identifier
h x = … h … f … g … where g y = …
f x = …
h x = … h … g … g … where g y = …
g x = …
LIL 2006 30
Capture by renamed identifier
h x = … h … f … g … where f y = … f … g …
g x = …
h x = … h … g … g … where g y = … g … g …
g x = …
LIL 2006 31
Refactoring by hand?
By hand = in a text editor
Tedious
Error-prone• Implementing the transformation …• … and the conditions.
Depends on compiler for type checking, …
… plus extensive testing.
LIL 2006 32
Machine support invaluableReliable
Low cost of do / undo, even for large refactorings.
Increased effectiveness and creativity.
LIL 2006 33
Demonstration of HaRe, hosted in vim.
LIL 2006 34
LIL 2006 35
LIL 2006 36
LIL 2006 37
The refactorings in HaRe
RenameDeleteLift / DemoteIntroduce definitionRemove definitionUnfoldGeneraliseAdd/remove parameters
Move def between modules
Delete/add to exports
Clean importsMake imports
explicit
data type to ADTShort-cut, warm
fusionAll module aware
LIL 2006 38
HaRe design rationale Integrate with existing development tools.
Work with the complete language: Haskell 98
Preserve comments and the formatting style.
Reuse existing libraries and systems.
Extensibility and scriptability.
LIL 2006 39
Information required
Lexical structure of programs,abstract syntax,binding structure,type system andmodule system.
LIL 2006 40
The Implementation of HaRe
Informationgathering
Informationgathering
Pre-conditionchecking
Pre-conditionchecking
Programtransformation
Programtransformation
ProgramrenderingProgramrendering
Strafunski
LIL 2006 41
Finding free variables ‘by hand’
instance FreeVbls HsExp where freeVbls (HsVar v) = [v] freeVbls (HsApp f e) = freeVbls f ++ freeVbls e freeVbls (HsLambda ps e) = freeVbls e \\ concatMap paramNames ps freeVbls (HsCase exp cases) = freeVbls exp ++ concatMap freeVbls cases freeVbls (HsTuple _ es) = concatMap freeVbls es …
Boilerplate code: 1000 noise : 100 significant.
LIL 2006 42
StrafunskiStrafunski allows a user to write general (read generic), type safe, tree traversing programs, with ad hoc behaviour at particular points.
Top-down / bottom up, type preserving / unifying,
full stop one
LIL 2006 43
Strafunski in useTraverse the tree accumulating free variables from components, except in the case of lambda abstraction, local scopes, …
Strafunski allows us to work within Haskell …
Other options? Generic Haskell, Template Haskell, AG, Scrap Your Boilerplate, …
LIL 2006 44
Rename an identifier
rename:: (Term t)=>PName->HsName->t->Maybe t rename oldName newName = applyTP worker where worker = full_tdTP (idTP ‘adhocTP‘ idSite) idSite :: PName -> Maybe PName idSite v@(PN name orig) | v == oldName = return (PN newName orig) idSite pn = return pn
LIL 2006 45
The coding effort
Transformations: straightforward in Strafunski …
… the chore is implementing conditions that the transformation preserves meaning.
This is where much of our code lies.
LIL 2006 46
Program rendering example-- This is an example
module Main where
sumSquares x y = sq x + sq y where sq :: Int->Int sq x = x ^ pow pow = 2 :: Int
main = sumSquares 10 20
module Main where
sumSquares x y = sq pow x + sq pow y where pow = 2 :: Int
sq :: Int->Int->Intsq pow x = x ^ pow
main = sumSquares 10 20
-- This is an example
module Main where
sumSquares x y = sq pow x + sq pow y where pow = 2 :: Int
sq :: Int->Int->Intsq pow x = x ^ pow
main = sumSquares 10 20
LIL 2006 47
Token stream and ASTWhite space + comments only in token stream.
Modification of the AST guides the modification of the token stream.
After a refactoring, the program source is recovered from the token stream not the AST.
Heuristics associate comments with program entities.
LIL 2006 48
Work in progress‘Fold’ against definitions … find duplicate code.
All, some or one? Effect on the interface …
f x = … e … e …
Symbolic evaluation
Data refactorings
Interfaces … ‘bad smell’ detection.
LIL 2006 49
API and DSL
RefactoringsRefactorings
Refactoringutilities
Refactoringutilities
StrafunskiStrafunski
HaskellHaskell
Combining formsCombining forms
Library functions
Grammar as data
Strafunski
???
LIL 2006 50
What have we learned?
Efficiency and robustness of libraries in question.
• type checking large systems, • linking, • editor script languages (vim, emacs).
The cost of infrastructure in building practical tools.
Reflections on Haskell itself.
LIL 2006 51
Reflections on HaskellCannot hide items in an export list (cf import).
Field names for prelude types?
Scoped class instances not supported.
‘Ambiguity’ vs. name clash.
‘Tab’ is a nightmare!
Correspondence principle fails …
LIL 2006 52
Correspondence
Operations on definitions and operations on expressions can be placed in one to one correspondence
(R.D.Tennent, 1980)
LIL 2006 53
Correspondence
Definitions
where
f x y = e
f x | g1 = e1 | g2 = e2
Expressions
let
\x y -> e
f x = if g1 then e1 else if g2 … …
LIL 2006 54
Function clauses f x | g1 = e1
f x | g2 = e2
Can ‘fall through’ a function clause … no direct correspondence in the expression language.
f x = if g1 then e1 else if g2 …
No clauses for anonymous functions … no reason to omit them.
LIL 2006 55
Haskell 98 vs. Erlang: generalities
Haskell 98: a lazy, statically typed, purely functional programming language featuring higher-order functions, polymorphism, type classes and monadic effects.
Erlang: a strict, dynamically typed functional programming language with support for concurrency, communication, distribution and fault-tolerance.
LIL 2006 56
Haskell 98 vs. Erlang: example
%% Factorial In Erlang.
-module (fact).
-export ([fac/1]).
fac(0) -> 1;
fac(N) when N > 0 -> N * fac(N-1).
%% Factorial In Erlang.
-module (fact).
-export ([fac/1]).
fac(0) -> 1;
fac(N) when N > 0 -> N * fac(N-1).
-- Factorial In Haskell.
module Fact(fac) where
fac :: Int -> Int
fac 0 = 1
fac n | n>0 = n * fac(n-1)
-- Factorial In Haskell.
module Fact(fac) where
fac :: Int -> Int
fac 0 = 1
fac n | n>0 = n * fac(n-1)
LIL 2006 57
Haskell 98 vs. Erlang: pragmatics Type system makes implementation complex. Layout and comment preservation. Types also affect the refactorings themselves. Clearer semantics for refactorings, but more complex infrastructure.
Untyped traversals much simpler. Use the layout given by emacs. Use cases which cannot be understood statically.
Dynamic semantics of Erlang makes refactorings harder to pin down.
LIL 2006 58
Challenges of Erlang refactoring Multiple binding occurrences of variables.
Indirect function call or function spawn: apply (lists, rev, [[a,b,c]])
Multiple arities … multiple functions: rev/1
Concurrency Refactoring within a design library: OTP. Side-effects.
LIL 2006 59
Generalisation and side-effects
-module (test).
-export([f/0]).
repeat(0) -> ok;repeat(N) ->
io:format (“hello\n"), repeat(N-1).
f( ) -> repeat(5).
-module (test).
-export([f/0]).
repeat(0) -> ok;repeat(N) ->
io:format (“hello\n"), repeat(N-1).
f( ) -> repeat(5).
-module (test).
-export([f/0]).
repeat(A, 0) -> ok;repeat(A, N) ->
A, repeat(A,N-1).
f( ) -> repeat (io:format (“hello\n”), 5).
-module (test).
-export([f/0]).
repeat(A, 0) -> ok;repeat(A, N) ->
A, repeat(A,N-1).
f( ) -> repeat (io:format (“hello\n”), 5).
LIL 2006 60
Generalisation and side-effects
-module (test).
-export([f/0]).
repeat(0) -> ok;repeat(N) ->
io:format (“hello\n"), repeat(N-1).
f( ) -> repeat(5).
-module (test).
-export([f/0]).
repeat(0) -> ok;repeat(N) ->
io:format (“hello\n"), repeat(N-1).
f( ) -> repeat(5).
-module (test).
-export([f/0]).
repeat(A, 0) -> ok;repeat(A, N) ->
A(), repeat(A,N-1).
f( ) -> repeat (fun( )-> io:format (“hello\
n”), 5).
-module (test).
-export([f/0]).
repeat(A, 0) -> ok;repeat(A, N) ->
A(), repeat(A,N-1).
f( ) -> repeat (fun( )-> io:format (“hello\
n”), 5).
LIL 2006 61
The Wrangler
Scanner/ParserParse Tree
Syntax tools
AST annotated with comments
Program analysis and transformation by the refactorer
Transformed AST
Pretty printer
Program source
Program source
Refactorer
AST + comments + binding structure
LIL 2006 62
Teaching and learning design
Exciting prospect of using a refactoring tool as an integral part of an elementary programming course.
Learning a language: learn how you could modify the programs that you have written …
… appreciate the design space, and
… the features of the language.
LIL 2006 63
ConclusionsRefactoring + functional programming: good fit.
Real win from available libraries … with work.
Substantial effort in infrastructure.
De facto vs de jure: GHC vs Haskell 98.
Correctness and verification …
Language independence …