Easier, Better, Faster, Stronger Kenta Sato July 02, 2014 1 / 30

Julia - Easier, Better, Faster, Stronger


Page 1: Julia - Easier, Better, Faster, Stronger

Easier, Better, Faster, Stronger

Kenta Sato

July 02, 2014

1 / 30

Page 2: Julia - Easier, Better, Faster, Stronger

Agenda

1. The Julia Language
2. Easier
    Familiar Syntax
    Just-In-Time Compiler
3. Better
    Types for Technical Computing
    Library Support
    Type System
4. Faster
    Benchmark
    N Queens Puzzle
5. Stronger
    Multiple Dispatch
    Macros

Page 3: Julia - Easier, Better, Faster, Stronger

Notations

Here I use the following special notation in examples.

<expression> #> <value>: The <expression> is evaluated to the <value>.

<expression> #: <output>: When the <expression> is evaluated, it prints the <output> to the screen.

<expression> #! <error>: When the <expression> is evaluated, it throws the <error>.

Examples:

    42                        #> 42
    2 + 3                     #> 5
    "hello, world"            #> "hello, world"
    println("hello, world")   #: hello, world
    42 + "hello, world"       #! ERROR: no method +(Int64, ASCIIString)

Page 4: Julia - Easier, Better, Faster, Stronger

The Julia Language

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library.

The core of the Julia implementation is licensed under the MIT license. Various libraries used by the Julia environment include their own licenses such as the GPL, LGPL, and BSD (therefore the environment, which consists of the language, user interfaces, and libraries, is under the GPL).

Source: http://julialang.org/

Page 5: Julia - Easier, Better, Faster, Stronger

Easier - Familiar Syntax

At a glance, you will feel familiar with the syntax of Julia.

The usage of for, while, and if is very close to that of Ruby or Python.

continue, break, and return work as you expect.

Defining a function is also straightforward: the function name is followed by its arguments.

You can specify the types of arguments, but doing so is optional.

@inbounds is a macro; macros always start with the @ character.

    function sort!(v::AbstractVector, lo::Int, hi::Int, ::InsertionSortAlg, o::Ordering)
        @inbounds for i in lo+1:hi
            j = i
            x = v[i]
            while j > lo
                if lt(o, x, v[j-1])
                    v[j] = v[j-1]
                    j -= 1
                    continue
                end
                break
            end
            v[j] = x
        end
        return v
    end

    base/sort.jl

Page 6: Julia - Easier, Better, Faster, Stronger

Easier - Familiar Syntax

n:m creates a range that is inclusive on both ends.

Python's range(n, m) includes the left end but not the right end, which is often confusing.

[... for x in xs] creates an array from xs, which can be anything iterable.

This notation is known as list comprehension in Python and Haskell.

    4:8                    #> 4:8
    [x for x in 4:8]       #> [4,5,6,7,8]
    [4:8]                  #> [4,5,6,7,8]
    [x * 2 for x in 4:8]   #> [8,10,12,14,16]
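As a minimal sketch for readers on a current Julia 1.x release (an assumption; the slides target v0.3), the same ideas can be written as follows. In 1.x, [4:8] no longer expands the range into its elements, so collect is used here instead. The variable names are illustrative only.

```julia
r = 4:8                          # a range, inclusive on both ends
xs = collect(r)                  # materialize the range as a Vector
doubled = [x * 2 for x in r]     # array comprehension over the range

println(length(r))
println(xs)
println(doubled)
```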

Page 7: Julia - Easier, Better, Faster, Stronger

Easier - Familiar Syntax

The index of an array always starts at 1, not 0.

That means when you allocate an array of size n, all indices in 1:n are accessible.

You can use a range to copy part of an array.

The step of a range can be placed between the start and the stop (i.e. start:step:stop).

You can also specify a negative step, which creates a reversed range.

There is a special index, end, indicating the last index of an array.

    xs = [8, 6, 4, 2, 0]
    xs[1:3]        #> [8,6,4]
    xs[4:end]      #> [2,0]
    xs[1:2:end]    #> [8,4,0]
    xs[end:-2:1]   #> [0,4,8]
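The point that range indexing copies part of an array can be checked with a small sketch like the one below. The names part and rev are illustrative and do not come from the slides.

```julia
xs = [8, 6, 4, 2, 0]

part = xs[1:3]         # indexing with a range allocates a new array
part[1] = 100          # mutating the copy leaves xs untouched

rev = xs[end:-2:1]     # a negative step walks backwards from the last index

println(xs)
println(part)
println(rev)
```

If you want to avoid the copy, Julia also offers views, but that is outside what the slide covers.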

Page 8: Julia - Easier, Better, Faster, Stronger

Easier - Just-In-Time Compiler

To run a program written in Julia, there is no need to compile it beforehand. You only have to pass the entry-point file to Julia's JIT (Just-In-Time) compiler:

    % cat myprogram.jl
    n = 10
    xs = [1:n]
    println("the total between 1 and $n is $(sum(xs))")
    % julia myprogram.jl
    the total between 1 and 10 is 55

From version 0.3, the standard libraries are precompiled when you build Julia, which greatly reduces the startup time of your program.

    % time julia myprogram.jl
    the total between 1 and 10 is 55
            0.80 real         0.43 user         0.10 sys

Page 9: Julia - Easier, Better, Faster, Stronger

Better - Types for Technical Computing

Julia supports various numerical types with different sizes.

Integer types

    Type      Signed?   Number of bits   Smallest value   Largest value
    Int8      ✓         8                -2^7             2^7 - 1
    Uint8               8                0                2^8 - 1
    Int16     ✓         16               -2^15            2^15 - 1
    Uint16              16               0                2^16 - 1
    Int32     ✓         32               -2^31            2^31 - 1
    Uint32              32               0                2^32 - 1
    Int64     ✓         64               -2^63            2^63 - 1
    Uint64              64               0                2^64 - 1
    Int128    ✓         128              -2^127           2^127 - 1
    Uint128             128              0                2^128 - 1
    Bool      N/A       8                false (0)        true (1)
    Char      N/A       32               '\0'             '\Uffffffff'

Page 10: Julia - Easier, Better, Faster, Stronger

Better - Types for Technical Computing

Floating-point types

    Type      Precision   Number of bits
    Float16   half        16
    Float32   single      32
    Float64   double      64

    10000            #> 10000
    typeof(10000)    #> Int64
    0x12             #> 0x12
    typeof(0x12)     #> Uint8
    0x123            #> 0x0123
    typeof(0x123)    #> Uint16
    1.2              #> 1.2
    typeof(1.2)      #> Float64
    1.2e-10          #> 1.2e-10

Complex numbers and rational numbers are also available:

    1 + 2im   # 1 + 2i
    6//9      # 2/3

http://julia.readthedocs.org/en/latest/manual/integers-and-floating-point-numbers/#integers-and-floating-point-numbers
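To illustrate that complex and rational values take part in ordinary arithmetic, here is a small sketch. The variable names z and q are illustrative.

```julia
z = (1 + 2im) * (3 - 1im)    # complex multiplication on integer components
q = 6//9 + 1//3              # rationals are kept in lowest terms

println(z)
println(q)
println(6//9 == 2//3)
```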

Page 11: Julia - Easier, Better, Faster, Stronger

Better - Types for Technical Computing

If you need more precise values, arbitrary-precision arithmetic is supported. Two data types offer this arithmetic:

BigInt - arbitrary-precision integers

BigFloat - arbitrary-precision floating-point numbers

    big_prime = BigInt("5052785737795758503064406447721934417290878968063369478337")
    typeof(big_prime)    #> BigInt

    precise_pi = BigFloat("3.14159265358979323846264338327950288419716939937510582097")
    typeof(precise_pi)   #> BigFloat

And if you need customized types, you can create a new type. User-defined types are instantiated by calling functions that share the type's name, called constructors:

    type Point
        x::Float64
        y::Float64
    end

    # Point is the constructor.
    p1 = Point(1.2, 3.4)
    p2 = Point(0.2, -3.1)
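A short sketch of why arbitrary precision matters: fixed-size integers wrap around on overflow, while BigInt does not. This assumes a current Julia 1.x release (parse(BigInt, ...) is used to build the expected value); the names small and large are illustrative.

```julia
small = Int64(2)^64          # Int64 arithmetic wraps around silently
large = big(2)^100           # big() converts to BigInt, so nothing overflows

println(small)
println(large)
println(typeof(large))
```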

Page 12: Julia - Easier, Better, Faster, Stronger

Better - Library Support

Julia bundles various libraries. These libraries are incorporated into the standard library, so you rarely need to know the details of the underlying APIs.

Numerical computing

    OpenBLAS: basic linear algebra subprograms
    LAPACK: linear algebra routines for solving systems
    Intel® Math Kernel Library (optional): fast math library for Intel processors
    SuiteSparse: linear algebra routines for sparse matrices
    ARPACK: subroutines designed to solve large-scale eigenvalue problems
    FFTW: library for computing discrete Fourier transforms

Other tools

    PCRE: Perl-compatible regular expressions library
    libuv: asynchronous IO library

Page 13: Julia - Easier, Better, Faster, Stronger

Better - Library Support

Here are some functions of the linear algebra library.

    a = randn((50, 1000))     # 50x1000 matrix
    b = randn((50, 1000))     # 50x1000 matrix
    x = randn((1000, 1000))   # 1000x1000 matrix

    # dot product
    dot(vec(a), vec(b))
    # matrix multiplication
    a * x
    # LU factorization
    lu(x)
    # eigen values and eigen vectors
    eig(x)

The vec function converts a multi-dimensional array into a vector without copying.
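For readers on a current Julia 1.x release (an assumption; the slides use v0.3), a hedged sketch of the same operations follows. In 1.x these functions live in the LinearAlgebra standard library and eig has been renamed eigen. The matrix sizes are reduced and the symmetric matrix S is introduced only so that the eigenvalues are real.

```julia
using LinearAlgebra              # Julia 1.x: dot, lu and eigen live here

a = randn(5, 8)
v = vec(a)                       # shares memory with a, no copy is made
v[1] = 0.0                       # writing through v changes a as well

x = randn(8, 8)
F = lu(x)                        # factorization with fields L, U and permutation p
S = Symmetric(x + x')            # symmetric matrix, so the eigenvalues are real
E = eigen(S)                     # eigenvalues in E.values, eigenvectors in E.vectors

println(a[1, 1])
println(size(a * x))
println(dot(v, v) >= 0)
```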

Page 14: Julia - Easier, Better, Faster, Stronger

Better - Type System

Julia's type system is dynamically checked: type safety is verified at runtime.

But each value has a concrete type, and that type is not implicitly converted to another type at runtime.

You can almost always assume that types must be converted explicitly.

There are two notable exceptions: arithmetic operators and constructors.

    x = 12
    typeof(x)   #> Int64
    y = 12.0
    typeof(y)   #> Float64

    # this function only accepts an Int64 argument
    function foo(x::Int64)
        println("the value is $x")
    end

    foo(x)   #: the value is 12
    foo(y)   #! ERROR: no method foo(Float64)
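The failing call can also be observed programmatically. The sketch below, written against a current Julia release, catches the error to show that it is a MethodError rather than a silent conversion. Here foo returns a string instead of printing, purely to make the result easy to inspect.

```julia
foo(x::Int64) = "the value is $x"    # only accepts an Int64 argument

ok = foo(Int64(12))

# A Float64 argument is not converted; the call fails with a MethodError.
err = try
    foo(12.0)
catch e
    e
end

println(ok)
println(typeof(err))
```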

Page 15: Julia - Easier, Better, Faster, Stronger

Better - Type System

Arithmetic operators are functions in Julia.

For example, addition of Float64 is defined as +(x::Float64, y::Float64) at float.jl:125.

But you can use these operators for differently typed values.

This automatic type conversion is called promotion, which is defined by the promote_rule function.

    x = 12
    y = 12.0
    x + y   #> 24.0
    x - y   #> 0.0
    x * y   #> 144.0
    x / y   #> 1.0

The promotion rule is defined as:

    promote_rule(::Type{Float64}, ::Type{Int64}) = Float64

Constructors also do type conversion implicitly.

    type Point
        x::Float64
        y::Float64
    end

    Point(x, y)   #> Point(12.0, 12.0)
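Promotion can be inspected directly with promote and promote_type. The sketch below assumes a current Julia 1.x release, where struct replaces the type keyword of v0.3; Point2 is a hypothetical name chosen to avoid clashing with the slide's Point.

```julia
x, y = 12, 12.0

px, py = promote(x, y)               # both values become Float64
T = promote_type(Int64, Float64)     # the common type chosen by the promotion rules

struct Point2                        # hypothetical name; `struct` is the 1.x keyword
    x::Float64
    y::Float64
end
p = Point2(x, y)                     # the default constructor converts 12 to 12.0

println((px, py))
println(T)
println(p)
```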

Page 16: Julia - Easier, Better, Faster, Stronger

Better - Type System

Types can be parameterized by other types or values. These parameters are called type parameters.

For example, an array has two type parameters: the element type and the number of dimensions.

The Array{T,D} type contains elements typed as T, and is a D-dimensional array.

    typeof([1, 2, 3])                 #> Array{Int64,1}
    typeof([1.0, 2.0, 3.0])           #> Array{Float64,1}
    typeof(["one", "two", "three"])   #> Array{ASCIIString,1}
    typeof([1 2; 3 4])                #> Array{Int64,2}

Julia allows you to define parameterized types as follows:

    type Point{T}
        x::T
        y::T
    end

    Point{Int}(1, 2)           #> Point{Int64}(1,2)
    Point{Float64}(4.2, 2.1)   #> Point{Float64}(4.2,2.1)
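As a sketch of how the type parameter constrains construction, the example below uses current Julia 1.x syntax (an assumption) and a hypothetical type Pt that mirrors the slide's Point{T}. Both fields share the parameter T, so mixed argument types are rejected unless T is given explicitly.

```julia
struct Pt{T}              # hypothetical name; counterpart of the slide's Point{T}
    x::T
    y::T
end

a = Pt(1, 2)              # T is inferred as Int
b = Pt{Float64}(4, 2)     # explicit parameter: arguments are converted to Float64

# Mixed argument types do not match Pt(::T, ::T), so this throws a MethodError.
err = try
    Pt(1, 2.0)
catch e
    e
end

println(typeof(a))
println(b)
println(typeof(err))
```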

Page 17: Julia - Easier, Better, Faster, Stronger

Faster - Benchmark

The performance of Julia is comparable to other compiled languages like C and Fortran, and much faster than other interpreted languages.

[Figure: plot of benchmark times on a logarithmic axis (10^-3 to 10^8) for Fortran, Julia, Python, R, Matlab, Octave, Mathematica, JavaScript and Go on the benchmarks fib, parse_int, quicksort, mandel, pi_sum, rand_mat_stat, rand_mat_mul and printfd.]

Figure: benchmark times relative to C (smaller is better, C performance = 1.0).

Page 18: Julia - Easier, Better, Faster, Stronger

Faster - Benchmark

The performance of Julia is comparable to other compiled languages like C and Fortran, and much faster than other interpreted languages.

Figure: benchmark times relative to C (smaller is better, C performance = 1.0).

                    Fortran     Julia   Python   R        Matlab    Octave    Mathematica   JavaScript     Go
                    gcc 4.8.1   0.2     2.7.3    3.0.2    R2012a    3.6.4     8.0           V8 3.7.12.22   go1
    fib             0.26        0.91    30.37    411.36   1992.00   3211.81   64.46         2.18           1.03
    parse_int       5.03        1.60    13.95    59.40    1463.16   7109.85   29.54         2.43           4.79
    quicksort       1.11        1.14    31.98    524.29   101.84    1132.04   35.74         3.51           1.25
    mandel          0.86        0.85    14.19    106.97   64.58     316.95    6.07          3.49           2.36
    pi_sum          0.80        1.00    16.33    15.42    1.29      237.41    1.32          0.84           1.41
    rand_mat_stat   0.64        1.66    13.52    10.84    6.61      14.98     4.52          3.28           8.12
    rand_mat_mul    0.96        1.01    3.41     3.98     1.10      3.41      1.16          14.60          8.51

C compiled by gcc 4.8.1, taking best timing from all optimization levels (-O0 through -O3). C, Fortran and Julia use OpenBLAS v0.2.8. The Python implementations of rand_mat_stat and rand_mat_mul use NumPy (v1.6.1) functions; the rest are pure Python implementations.

Page 19: Julia - Easier, Better, Faster, Stronger

Faster - N Queens Puzzle

Place N queens on an N × N chessboard so that no two queens attack each other, and return the number of possible placements.

These are some of the solutions when N = 8.

Weisstein, Eric W. "Queens Problem." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/QueensProblem.html

Page 20: Julia - Easier, Better, Faster, Stronger

Faster - N Queens Puzzle

When N gets bigger, the number of solutions grows drastically.

It may take a long time to get the answer when N is sufficiently large.

The algorithm involves a lot of arithmetic, iteration, recursive function calls, and branching.

So this puzzle is well suited to testing the efficiency of a programming language.

The number of solutions of the N queens puzzle.

    N            4   5    6   7    8    9     10    11      12       13       14        15
    #Solutions   2   10   4   40   92   352   724   2,680   14,200   73,712   365,596   2,279,184

Page 21: Julia - Easier, Better, Faster, Stronger

Faster - N Queens Puzzle

Program in Julia.

solve(n::Int): put n queens on a board, then return the number of solutions.

search(places, i, n): put a queen on the ith row.

isok(places, i, j): check whether you can put a queen at (i, j).

This algorithm is not optimal (you could exploit the symmetry of the board), but it is enough to measure the speed of Julia.

In isok, you could iterate over enumerate(places) instead, but that killed the performance of the code.

Julia

    function solve(n::Int)
        places = zeros(Int, n)
        search(places, 1, n)
    end

    function search(places, i, n)
        if i == n + 1
            return 1
        end

        s = 0
        @inbounds for j in 1:n
            if isok(places, i, j)
                places[i] = j
                s += search(places, i + 1, n)
            end
        end
        s
    end

    function isok(places, i, j)
        qi = 1
        @inbounds for qj in places
            if qi == i
                break
            elseif qj == j || abs(qi - i) == abs(qj - j)
                return false
            end
            qi += 1
        end
        true
    end
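As a sanity check, the solver can be run for small boards and compared with the known solution counts. The sketch below restates the three functions in compact form (without @inbounds) so that it is self-contained; the counts variable and the chosen range 4:8 are illustrative.

```julia
# Can a queen go at row i, column j, given the queens already placed in rows 1..i-1?
function isok(places, i, j)
    qi = 1
    for qj in places
        if qi == i
            break                 # only rows above i hold valid placements
        elseif qj == j || abs(qi - i) == abs(qj - j)
            return false          # same column or same diagonal
        end
        qi += 1
    end
    true
end

# Count the ways to fill rows i..n by trying every column in row i.
function search(places, i, n)
    i == n + 1 && return 1
    s = 0
    for j in 1:n
        if isok(places, i, j)
            places[i] = j
            s += search(places, i + 1, n)
        end
    end
    s
end

solve(n::Int) = search(zeros(Int, n), 1, n)

counts = [solve(n) for n in 4:8]
println(counts)
```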

Page 22: Julia - Easier, Better, Faster, Stronger

Faster - N Queens Puzzle

Python and C++ are the competitors in our benchmark.

Python

    def solve(n):
        places = [-1] * n
        return search(places, 0, n)

    def search(places, i, n):
        if i == n:
            return 1

        s = 0
        for j in range(n):
            if isok(places, i, j):
                places[i] = j
                s += search(places, i + 1, n)
        return s

    def isok(places, i, j):
        for qi, qj in enumerate(places):
            if qi == i:
                break
            elif qj == j or abs(qi - i) == abs(qj - j):
                return False
        return True

C++

    #include <cstdlib>  // abs
    #include <vector>

    // Forward declarations: solve and search call functions defined below.
    int search(std::vector<int>& places, int i, int n);
    bool isok(const std::vector<int>& places, int i, int j);

    int solve(int n)
    {
        std::vector<int> places(n, -1);
        return search(places, 0, n);
    }

    int search(std::vector<int>& places, int i, int n)
    {
        if (i == n)
            return 1;

        int s = 0;
        for (int j = 0; j < n; j++) {
            if (isok(places, i, j)) {
                places[i] = j;
                s += search(places, i + 1, n);
            }
        }
        return s;
    }

    bool isok(const std::vector<int>& places, int i, int j)
    {
        int qi = 0;
        for (int qj : places) {
            if (qi == i)
                break;
            else if (qj == j || abs(qi - i) == abs(qj - j))
                return false;
            qi++;
        }
        return true;
    }

Page 23: Julia - Easier, Better, Faster, Stronger

Faster - N Queens Puzzle

I measured the total time to get the answers corresponding to N = 4, 5, ..., 14.

Julia - v0.3 (commit: da158df6b5b7f918989a73317a799c909d639e5f)

    % time julia.jl eightqueen.jl 14 > /dev/null
           10.05 real         9.89 user         0.11 sys

Python - v3.4.1

    % time python3 eightqueen.py 14 > /dev/null
         1283.34 real      1255.18 user         2.67 sys

C++ - v503.0.40

    % clang++ -O3 --std=c++11 --stdlib=libc++ eightqueen.cpp
    % time ./a.out 14 > /dev/null
            8.24 real         8.17 user         0.01 sys

Page 24: Julia - Easier, Better, Faster, Stronger

Faster - N Queens Puzzle

And N = 15.

Julia

    % time julia.jl eightqueen.jl 15 > /dev/null
           64.75 real        63.73 user         0.17 sys

C++

    % time ./a.out 15 > /dev/null
           54.31 real        53.89 user         0.05 sys

Note that the Julia timings include JIT compilation time, whereas the C++ program was compiled before execution.

The execution time of Python was not measured because Python took too long when N = 15.

Platform Info: System: Darwin (x86_64-apple-darwin13.2.0), CPU: Intel(R) Core(TM) i5-2435M CPU @ 2.40GHz
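To put the measurements in perspective, the wall-clock ("real") times reported above can be turned into ratios. This is only arithmetic on the numbers from the slides; the variable names are illustrative.

```julia
# Wall-clock seconds copied from the measurements for N = 14 and N = 15.
julia14, python14, cpp14 = 10.05, 1283.34, 8.24
julia15, cpp15 = 64.75, 54.31

python_vs_julia = python14 / julia14    # how many times slower Python is at N = 14
julia_vs_cpp14  = julia14 / cpp14       # Julia relative to C++ at N = 14
julia_vs_cpp15  = julia15 / cpp15       # Julia relative to C++ at N = 15

println(python_vs_julia)
println(julia_vs_cpp14)
println(julia_vs_cpp15)
```

On these numbers Python is well over a hundred times slower than Julia, while Julia stays within roughly 20 percent of C++ even with JIT compilation time included.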

Page 25: Julia - Easier, Better, Faster, Stronger

Stronger - Multiple Dispatch

We often want to use a single function name to handle different types.

Additions of floats and integers are completely different procedures, but we always want to use the + operator in both cases.

Leaving some parameters as optional is useful.

maximum(A, dims) computes the maximum value of an array A over the given dimensions.

maximum(A) computes the maximum value of an array A, ignoring dimensions.

A unified API leaves you less to memorize.

fit(model, x, y) trains model based on the input x and the output y.

The model may be a Generalized Linear Model, Lasso, Random Forest, SVM, and so on.

Julia meets these needs with multiple dispatch: the method to run is selected according to the number of arguments and their types.
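The unified fit(model, x, y) idea can be sketched with two toy models. Everything here is hypothetical and written for a current Julia 1.x release: Model, ConstantModel and LineThroughOrigin are invented for illustration and do not correspond to any real package.

```julia
abstract type Model end                 # hypothetical hierarchy, not from the slides
struct ConstantModel <: Model end       # predicts the mean of y
struct LineThroughOrigin <: Model end   # predicts y = slope * x

# One function name; the method is chosen by the type of the first argument.
fit(::ConstantModel, x, y) = sum(y) / length(y)
fit(::LineThroughOrigin, x, y) = sum(x .* y) / sum(x .* x)

x = [1, 2, 3]
y = [2, 4, 6]

println(fit(ConstantModel(), x, y))
println(fit(LineThroughOrigin(), x, y))
```

The caller only ever writes fit(model, x, y); adding a new model means adding a new method, not a new function name.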

Page 26: Julia - Easier, Better, Faster, Stronger

Stronger - Multiple Dispatch

When the foo function is called, one of the following methods is selected based on the number of arguments.

    function foo()
        println("foo 0:")
    end

    function foo(x)
        println("foo 1: $x")
    end

    function foo(x, y)
        println("foo 2: $x $y")
    end

    foo()           #: foo 0:
    foo(100)        #: foo 1: 100
    foo(100, 200)   #: foo 2: 100 200

Page 27: Julia - Easier, Better, Faster, Stronger

Stronger - Multiple Dispatch

Multiple dispatch also distinguishes the types of arguments: the method whose type signature matches the argument values is selected.

    function foo(x::Int, y::Int)
        println("foo Int Int: $x $y")
    end

    function foo(x::Float64, y::Float64)
        println("foo Float64 Float64: $x $y")
    end

    function foo(x::Int, y::Float64)
        println("foo Int Float64: $x $y")
    end

    foo(1, 2)       #: foo Int Int: 1 2
    foo(1.0, 2.0)   #: foo Float64 Float64: 1.0 2.0
    foo(1, 2.0)     #: foo Int Float64: 1 2.0
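When several methods could apply, Julia picks the most specific one. The sketch below uses a hypothetical function describe with an Int method and a more general Real fallback to show this; it returns strings rather than printing so the result is easy to check.

```julia
describe(x::Int, y::Int) = "Int Int"        # most specific method
describe(x::Real, y::Real) = "Real Real"    # fallback for any other pair of reals

println(describe(1, 2))       # both arguments are Int
println(describe(1.0, 2))     # no exact match, so the Real fallback is used
println(describe(1, 2.5))
```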

Page 28: Julia - Easier, Better, Faster, Stronger

Stronger - Macros

Macros allow you to inspect or modify your code from within Julia itself.

In the following example, the assert macro receives the given expression (x > 0) and evaluates it in place. When the result is false, it throws an assertion error. Note that the error message contains the captured expression (x > 0) that evaluated to false; this information is useful for debugging.

    x = -5
    @assert x > 0   #! ERROR: assertion failed: x > 0

Instead of showing the expression, you can specify your own error message:

    x = -5
    @assert x > 0 "x must be positive"   #! ERROR: assertion failed: x must be positive

Page 29: Julia - Easier, Better, Faster, Stronger

Stronger - Macros

The assert macro is defined as follows in the standard library.

The macro is called with an expression (ex) and zero or more messages (msgs...).

If the messages are empty, the expression itself becomes the error message (msg).

Then the error message is constructed.

Finally, the assertion code is spliced into the call site.

    macro assert(ex,msgs...)
        msg = isempty(msgs) ? ex : msgs[1]
        if !isempty(msgs) && isa(msg, Expr)
            # message is an expression needing evaluating
            msg = :(string("assertion failed: ", $(esc(msg))))
        elseif isdefined(Base,:string)
            msg = string("assertion failed: ", msg)
        else
            # string() might not be defined during bootstrap
            msg = :(string("assertion failed: ", $(Expr(:quote,msg))))
        end
        :($(esc(ex)) ? $(nothing) : error($msg))
    end

    base/error.jl
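The same technique can be tried on a smaller scale. The macro below is a hypothetical, stripped-down variant named @check (not part of Base) that keeps only the core idea: the expression is turned into text at expansion time and spliced into the generated code.

```julia
# Hypothetical macro for illustration; it is not part of Base.
macro check(ex)
    msg = string("check failed: ", ex)        # the expression becomes text at expansion time
    :($(esc(ex)) ? nothing : error($msg))     # code spliced into the call site
end

x = -5
err = try
    @check x > 0
catch e
    e
end

println(err.msg)
```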

Page 30: Julia - Easier, Better, Faster, Stronger

Future - :)

https://twitter.com/stuinzuri/status/459352855124525056