Fault-tolerance with Erlang
Seminar on Scientific Computing Languages 2013/14
Jan Winkelmann
January 9, 2014
Table of contents
1 Introduction
2 Erlang Actors
3 Fault-tolerant Applications in Erlang
4 Conclusion
Section 1
Introduction
Erlang
Developed by Ericsson in 1987 for telephony routers
Requirements:
Soft real-time capable
Provide high availability
(less than 5 seconds downtime per year)
Provide fault-tolerance
Hi = fun() -> io:format("Hello World~n") end. Hi().
Erlang: Facts
Strongly and dynamically typed (like Ruby, Python)
Impure functional, compiled (like Scala)
Actor based
Concurrent
Extremely lightweight processes in the virtual machine
Out-of-the-box distributed computing
Features I will not talk about
High availability (i.e. hot code loading)
Real-time programming
Distributed Erlang
We will focus on Erlang's fault-tolerance features
Why design fault-tolerant applications?
Your program will not fail when errors occur
Modern languages have features to deal with some errors
Exceptions, anyone?
Consider temporary hardware faults
Problem is even worse for distributed programming
Erlang has a different approach to these problems
Trends in HPC: Exascale Computing
Traditional programming approach becomes difficult
Mean time to failure will decrease
MPI has no built-in fault-tolerance
Section 2
Erlang Actors
What are Actors
Model for concurrent computation from theoretical CS
Actors can:
Compute,
Send/receive messages, and
Spawn new actors.
Used in a few languages (Scala, Erlang, ...)
Actors in Erlang
Processes in Erlang are actors
No shared memory, very little shared state
Lightweight:
Extremely fast creation, destruction, and scheduling
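How cheap process creation is can be tried out with a minimal sketch like the following. The module name many_procs and the function run/1 are made up for this illustration. Each spawned process sends its index back to the parent and then exits.

```erlang
-module(many_procs).
-export([run/1]).

%% Spawn N processes (hypothetical demo, not from the talk).
%% Every process sends {its own Pid, its index} to the parent and exits.
run(N) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), I} end) || I <- lists:seq(1, N)],
    %% Collect exactly one reply per spawned process (selective receive on Pid).
    Replies = [receive {Pid, I} -> I end || Pid <- Pids],
    %% The sum of all indices shows that every process has answered.
    lists:sum(Replies).
```

Calling many_procs:run(10000) spawns ten thousand processes and returns the sum of the indices 1 to 10000.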
Messages in Erlang
Communication via message passing
Messages sent to existing processes will arrive
But the receiving process might
Not exist
Crash before reading the message
Crash before replying
The sender gets no error in any of these cases
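The silent loss of messages can be sketched as follows. The module name lost_message and the 500 ms timeout are assumptions for this illustration. The monitor is used only to wait until the receiver has really terminated; it is not part of the messaging itself.

```erlang
-module(lost_message).
-export([demo/0]).

%% Send a message to a process that no longer exists (hypothetical demo).
demo() ->
    Pid = spawn(fun() -> ok end),            % terminates immediately
    Ref = erlang:monitor(process, Pid),
    receive {'DOWN', Ref, process, Pid, _} -> ok end,  % wait until it is gone
    %% The send succeeds without any error and simply returns the message.
    Sent = (Pid ! {ping, self()}),
    %% No reply will ever arrive, so the sender has to protect itself
    %% with a timeout (500 ms is an arbitrary choice).
    Reply = receive pong -> pong after 500 -> timeout end,
    {Sent, Reply}.
```

The sender only learns about the problem because it gives up waiting, not because the send failed.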
Actors in Erlang: Example
-module(helloworld).
-export([start/0, server/0]).

%% Spawn the server loop as a new process and return its Pid.
start() ->
    spawn(helloworld, server, []).

%% Print each {Msg, Client} message and acknowledge it;
%% any other message stops the server.
server() ->
    receive
        {Msg, Client} ->
            erlang:display(Msg),
            Client ! success,
            server();
        _ ->
            io:format("Stopping~n")
    end.
Actors in Erlang: Example session
8> c(helloworld).
9> Server = helloworld:start().
10> Server ! {12, self()}.
12
11> flush().
Shell got success
12> Server ! {notvalid}.
Stopping
13> Server ! {12, self()}. % no error, although the server has stopped
From here on we need Actors
Questions about Actors?
Section 3
Fault-tolerant Applications in Erlang
Reminder: Motivation
Let’s talk about fault-tolerance (in distributed systems):
Best example: temporary faults, hardware failure
What to do when something crashes?
Error handling can be very complex
Let it fail!
Design Pattern: Let it fail!
Recover only if easily possible
Fail as early as possible,
then retry
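A minimal sketch of failing early, under the assumption of a made-up module let_it_fail with a function parse_port/1. Instead of returning error codes for bad input, the function simply crashes; recovery is left to whoever watches the process.

```erlang
-module(let_it_fail).
-export([parse_port/1]).

%% Convert a string to a TCP port number (hypothetical example).
%% There is no defensive error handling: invalid input crashes the caller.
parse_port(Str) ->
    Port = list_to_integer(Str),                  % badarg if Str is not a number
    true = (Port > 0 andalso Port < 65536),       % badmatch if out of range
    Port.
```

let_it_fail:parse_port("8080") returns 8080, while parse_port("abc") raises badarg and parse_port("70000") raises a badmatch error.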
Actor based fault-tolerance
Let it fail for actors:
If error recovery not easily possible: fail
e.g. via an exception
Dependent processes also fail
Restart all failed processes, retry
Make process fail ✓
Make dependent processes fail ✗
Notice failed processes ✗
Link: Dependencies between Processes
Link: logical connection between two processes
Used for expressing dependency among processes
Established via function call
If a process dies, it sends an exit signal to all linked processes
Processes fail when receiving an exit signal
Process fail ✓
Dependent process fail ✓
Notice failed processes ✗
Linking: Example
1> self(). % Pid of the shell
<0.32.0>
2> link(spawn(fun() -> % exits with an error after 5 s
       timer:sleep(5000), exit(reason) end)).
true
...
** exception error: reason
3> self(). % Pid of the shell has changed
<0.37.0>
The shell process crashed and was restarted under a new Pid, because the linked actor exited abnormally
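The same effect can be sketched outside the shell with two plain processes. The module name link_demo is made up for this illustration, and the monitor serves only to observe the outcome from the calling process without linking it to anything.

```erlang
-module(link_demo).
-export([run/0]).

%% Show that an abnormal exit propagates over a link (hypothetical demo).
run() ->
    %% B waits forever; it can only end if an exit signal kills it.
    B = spawn(fun() -> receive never -> ok end end),
    %% Observe B from the caller. A monitor does not affect the caller.
    Ref = erlang:monitor(process, B),
    %% A links itself to B and then exits abnormally.
    spawn(fun() -> link(B), exit(reason) end),
    receive
        {'DOWN', Ref, process, B, Why} -> {b_died, Why}
    after 1000 ->
        b_still_alive
    end.
```

B never executes any failing code itself; it dies with the exit reason of A because the two processes are linked.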
Linking: Respawning failed processes
Normally, linked processes die when they receive an exit signal
Processes can choose to trap exit signals
The exit signal is then delivered as a message to the process's mailbox
Can be used to respawn failed processes
Worker/Supervisor Design Pattern
Process fail ✓
Dependent process fail ✓
Notice failed processes ✓
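A hand-written worker/supervisor pair could look like the following sketch. The module name respawn, the crash and stop messages, and the Report process that is told the Pid of each new worker are all assumptions for this illustration. Real applications would normally use the OTP supervisor behaviour instead.

```erlang
-module(respawn).
-export([start/0, supervisor/1, worker/0]).

%% Start a supervisor that reports every (re)started worker to the caller.
start() ->
    spawn(?MODULE, supervisor, [self()]).

supervisor(Report) ->
    %% Trap exits: exit signals arrive as {'EXIT', Pid, Reason} messages.
    process_flag(trap_exit, true),
    loop(spawn_link(?MODULE, worker, []), Report).

loop(Worker, Report) ->
    Report ! {worker, Worker},
    receive
        {'EXIT', Worker, normal} ->
            ok;                                   % worker finished, stop too
        {'EXIT', Worker, _Reason} ->
            %% Abnormal exit: respawn the worker and keep supervising.
            loop(spawn_link(?MODULE, worker, []), Report)
    end.

%% The worker crashes on 'crash' and ends normally on 'stop'.
worker() ->
    receive
        crash -> exit(crashed);
        stop  -> ok
    end.
```

After respawn:start(), the caller receives {worker, Pid}. Sending crash to that Pid makes the supervisor start a replacement and report a new Pid.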
Restart Strategies
What does a supervisor do when a watched process fails?
One for one
Restart only the failed process
One for all
Equivalent to linking watched processes
Restart all watched processes if one process fails
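With the OTP supervisor behaviour, the restart strategy is a single setting. The following sketch assumes a made-up module demo_sup with two trivial workers w1 and w2, and uses the map-based supervisor flags and child specifications of newer OTP releases (OTP 18 and later). The intensity and period values are arbitrary.

```erlang
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0, worker_loop/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Supervisor callback: choose the restart strategy and list the children.
init([]) ->
    %% Change one_for_one to one_for_all to restart every child on a failure.
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 5},
    Children = [#{id => w1, start => {?MODULE, start_worker, []}},
                #{id => w2, start => {?MODULE, start_worker, []}}],
    {ok, {SupFlags, Children}}.

%% A child start function must link the child and return {ok, Pid}.
start_worker() ->
    {ok, spawn_link(?MODULE, worker_loop, [])}.

%% Trivial worker (hypothetical): waits until it is told to crash.
worker_loop() ->
    receive crash -> exit(crashed) end.
```

supervisor:which_children(demo_sup) lists the current child Pids. With one_for_one, crashing w1 replaces only the Pid of w1, while w2 keeps running.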
Supervisor Trees
Supervisors restart watched processes,
according to a restart strategy.
Simplification: Each process is watched by only one supervisor
Allow supervisors to watch supervisors
⇒ Supervisor trees
Supervisor Trees: Example
[Figure: supervisor tree; each supervisor node is labeled 1 or a according to its restart strategy]
1: One for one restart strategy; a: One for all restart strategy
Wrap-up: Fault-tolerant parallel Erlang
Erlang can work distributed out of the box
Supervisors may run on a separate node
Restart children on another node
Erlang/OTP: Batteries included
Open Telecom Platform comes with
Supervisor trees,
Finite state machines,
Gen server (synchronous messaging)
A key value store (Mnesia)
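As an illustration of the synchronous messaging that gen_server provides, here is a minimal sketch of a counter process. The module name counter and its API function increment/1 are made up for this example.

```erlang
-module(counter).
-behaviour(gen_server).
-export([start_link/0, increment/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Start the server with an initial count of 0.
start_link() ->
    gen_server:start_link(?MODULE, 0, []).

%% Synchronous request: blocks until the server has replied.
increment(Pid) ->
    gen_server:call(Pid, increment).

%% gen_server callbacks. The server state is just the current count.
init(Count) ->
    {ok, Count}.

handle_call(increment, _From, Count) ->
    {reply, Count + 1, Count + 1}.

%% Asynchronous messages are ignored in this sketch.
handle_cast(_Msg, Count) ->
    {noreply, Count}.
```

After {ok, C} = counter:start_link(), each call to counter:increment(C) returns the new count.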
Section 4
Conclusion
Summary
Erlang is functional and actor based
Fast, distributed, process spawning
Express dependencies between processes
Restart of failed processes
⇒ Supervisor trees
Remark: not tolerant against all possible faults
Erlang and Scientific Computing
Use Erlang for scientific computing with these requirements:
Fault-tolerant,
High Availability,
Massively Large Scale,
Soft real-time
Example: Disco
Disco, a MapReduce framework by Nokia
http://discoproject.org/
Basically a Hadoop competitor
Also: Neural Networks
ejabberd, CouchDB, ZeroMQ, Wings3D, WhatsApp, High-frequency trading, ...
Sources
Programming Erlang by Joe Armstrong, ISBN: 978-1937785536
Seven Languages in Seven Weeks by Bruce Tate, ISBN: 978-1934356593
Erlang Documentation: erlang.org/doc
http://learnyousomeerlang.com
Further Resources
Erlang
Seven Languages in Seven Weeks by Bruce Tate, ISBN: 978-1934356593
learnyousomeerlang.com
Functional Programming:
learnyouahaskell.com
Real World Haskell by Bryan O’Sullivan, ISBN: 978-0596514983, book.realworldhaskell.org/read
Take a look at elixir-lang.org, a “functional, meta-programming aware language built on top of the Erlang VM.”
The End.
Questions?
There are a lot of things missing
Ask me about:
Functional Programming
Types
Toolchain (compiler, debugger)
C language bindings and CUDA
Parallelism vs. Concurrency
High performance Erlang, HiPE
Handling binary data
Factorial
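-module(factorial).
-export([factorial/1]).

%% Base case: the factorial of 0 or 1 (and of anything smaller) is 1.
factorial(N) when N =< 1 ->
    1;
%% Recursive case: N! = N * (N-1)!
factorial(N) ->
    N * factorial(N-1).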
Factorial: DEMO
1> c(factorial).
{ok,factorial}
2> factorial:factorial(100).
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000