37
Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 1 / 37

Jan Winkelmann Fault-tolerance with Erlang January 9, …hpac.rwth-aachen.de/teaching/sem-lsc-13/Erlang.pdf · Jan Winkelmann Fault-tolerance with Erlang January 9, ... Table of contents

Embed Size (px)

Citation preview

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 1 / 37

Fault-tolerance with ErlangSeminar on Scientific Computing Languages 2013/14

Jan Winkelmann

January 9, 2014

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 2 / 37

Table of contents

1 Introduction

2 Erlang Actors

3 Fault-tolerant Applications in Erlang

4 Conclusion

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 3 / 37

Introduction

Section 1

Introduction

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 4 / 37

Introduction Motivation

Erlang

Developed by Ericsson in 1987 for telephony routers

Requirements:

Soft real-time capable

Provide high availability

(less than 5 seconds downtime per year)

Provide fault-tolerance

Hi = fun() -> io:format("Hello World~n") end. Hi().

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 5 / 37

Introduction Motivation

Erlang: Facts

Strongly and dynamically typed (like Ruby, Python)

Impure functional, compiled (like Scala)

Actor based

Concurrent

Extremly lightweight processes in Virtual Machine

Out-of-the-box distributed computing

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 6 / 37

Introduction Motivation

Features I will not talk about

High availability (i.e. hot code loading)

Real-time programming

Distributed Erlang

We will focus on Erlangs fault-tolerance features

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 7 / 37

Introduction Motivation

Why design fault-tolerant applications?

Your program will not fail when errors occur

Modern languages have features to deal with some errors

Exceptions, anyone?

Consider temporary hardware faults

Problem is even worse for distributed programming

Erlang has a different approach to these problems

Trends in HPC: Exascale Computing

Traditional programming approach becomes difficultMean time to failure will increaseMPI has no build-in fault-tolerance

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 8 / 37

Erlang Actors

Section 2

Erlang Actors

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 9 / 37

Erlang Actors Basics

What are Actors

Model for concurrent computation from theoretical CS

Actors can:

Compute,Send/receive messages, andSpawn new actors.

Used in a few languages (Scala, Erlang, ...)

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 10 / 37

Erlang Actors Basics

Actors in Erlang

Processes in Erlang are actors

No shared memory, very little shared state

Leightweight:

Extremely fast creation, destruction, and scheduling

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 11 / 37

Erlang Actors Basics

Messages in Erlang

Communication via message passing

Messages sent to existent processes will arrive

But receiving process might

Not existCrash before reading messageCrash before replying

No errors for all of these

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 12 / 37

Erlang Actors Example

Actors in Erlang: Example

-module(helloworld).

-export([start/0, server/0]).

start() ->

spawn(helloworld,server,[]).

server() ->

receive

{Msg, Client} ->

erlang:display(Msg),

Client ! sucess,

server();

_ ->

io:format("Stopping~n");

end.

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 13 / 37

Erlang Actors Example

Actors in Erlang: Example

8> c(helloworld).

9> Server = helloworld:start().

10> Server ! {12, self()}.

12

11> flush().

Shell got sucess

12> Server ! {notvalid}.

Stopping

13> Server ! {12, self()}.

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 14 / 37

Erlang Actors Example

From here on we need Actors

Questions about Actors?

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 15 / 37

Fault-tolerant Applications in Erlang

Section 3

Fault-tolerant Applications in Erlang

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 16 / 37

Fault-tolerant Applications in Erlang Motivation

Reminder: Motivation

Let’s talk about fault-tolerance (in distributed systems):

Best example: temporary faults, hardware failure

What to do when something crashes?

Error handling can be very complex

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 17 / 37

Fault-tolerant Applications in Erlang Features

Let it fail!

Design Pattern: Let it fail!

Recover only if easily possible

Fail as early as possible,

then retry

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 18 / 37

Fault-tolerant Applications in Erlang Features

Actor based fault-tolerance

Let it fail for actors:

If error recovery not easily possible: fail

i.e. via exception

Dependent processes also fail

Restart all failed processes, retry

Make process fail 3

Make dependent processes fail 7

Notice failed processes 7

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 19 / 37

Fault-tolerant Applications in Erlang Actor-based fault-tolerance

Link: Dependencies between Processes

Link: logical connection between two processes

Used for expressing dependency among processes

Established via function call

If process dies it sends an exit signal to all linked processes

Processes fail when receiving an exit signal

Process fail 3

Dependent process fail 3

Notice failed processes 7

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 20 / 37

Fault-tolerant Applications in Erlang Actor-based fault-tolerance

Linking: Example

1> self(). %Pid of shell

<0.32.0>

2> link(spawn(fun() - > %Exit with error after 5s

timer:sleep(5000), exit(reason) end)).

true

...

** exception error: reason

3> self(). %Pid of shell

<0.37.0>

The shell crashed, because the linked actor exited abnormally

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 21 / 37

Fault-tolerant Applications in Erlang Actor-based fault-tolerance

Linking: Respawning failed processes

Normally linked processes die when receiving exit signal

Processes can choose to trap exit signals

Exit signal is then added to the process’ inbox.

Can be used to respawn failed processes

Worker/Supervisor Design Pattern

Process fail 3

Dependent process fail 3

Notice failed processes 3

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 22 / 37

Fault-tolerant Applications in Erlang Superviser Trees

Restart Strategies

What does a supervisor do when a watched process fails?

One for oneRestart only failed process

One for allEquivalent to linking watched processesRestart all watched processes if one process fails

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 23 / 37

Fault-tolerant Applications in Erlang Superviser Trees

Supervisor Trees

Supervisors restart watched processes,

according to a restart strategy.

Simplification: Each process is watched by only one supervisor

Allow supervisors to watch supervisors

⇒ Supervisor trees

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 24 / 37

Fault-tolerant Applications in Erlang Superviser Trees

Supervisor Trees: Example

1

1 a

a 1

1: One for one restart strategy; a: One for all restart strategy

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 25 / 37

Fault-tolerant Applications in Erlang Superviser Trees

Wrap-up: Fault-tolerant parallel Erlang

Erlang can work distributed out of the box

Supervisors may be on separate node

Restart children on another node

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 26 / 37

Fault-tolerant Applications in Erlang Superviser Trees

Erlang/OTP: Batteries included

Open Telecom Platform comes with

Supervisor trees,

Finite state machines,

Gen server (sychronous messaging)

A key value store (Mnesia)

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 27 / 37

Closing Words

Section 4

Conclusion

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 28 / 37

Closing Words Summary

Summary

Erlang is functional and actor based

Fast, distributed, process spawning

Express dependencies between processes

Restart of failed processes

⇒ Supervisor trees

Remark: not tolerant against all possible faults

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 29 / 37

Closing Words Summary

Erlang and Scientific Computing

Use Erlang for scientific computing with these requirements:

Fault-tolerant,

High Availability,

Massively Large Scale,

Soft real-time

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 30 / 37

Closing Words Industry Usage

Example: Disco

Disco, a MapReduce framework by Nokia

http://discoproject.org/

Basically a Hadoop competitor

Also: Neural Networks

ejabberd, CouchDB, ZeroMQ, Wings3D, WhatsApp, High-frequency trading, ...

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 31 / 37

Closing Words Further Resources

Sources

Programming Erlang by Joe ArmstrongISBN: 978-1937785536

Seven Languages in Seven Weeks by Bruce TateISBN: 978-1934356593

Erlang Documentation: erlang.org/doc

http://learnyousomeerlang.com

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 32 / 37

Closing Words Further Resources

Further Resources

Erlang

Seven Languages in Seven Weeks by Bruce TateISBN: 978-1934356593learnyousomeerlang.com

Functional Programming:

learnyouahaskell.com

Real World Haskell by Bryan O’SullivanISBN: 978-0596514983 book.realworldhaskell.org/read

Take a look look at elixir-lang.org a “functional,meta-programming aware language built on top of the Erlang VM.”

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 33 / 37

Closing Words Further Resources

The End.

Questions?

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 34 / 37

Closing Words Further Resources

There are a lot of things missing

Ask me about:

Functional Programming

Types

Toolchain (compiler, debugger)

C language bindings and Cuda

Parallelism vs. Concurrency

High performance Erlang, HiPE

Handling binary data

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 35 / 37

Closing Words Further Resources

Factorial

-module(factorial).

-export([factorial/1]).

factorial(N) when N =< 1 ->

1;

factorial(N) ->

N * factorial(N-1).

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 36 / 37

Closing Words Further Resources

Factorial: DEMO

1> c(factorial).

{ok,factorial}

2> factorial:factorial(100).

93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

Jan Winkelmann Fault-tolerance with Erlang January 9, 2014 37 / 37