90
JavaScript Engine Performance

Javascript engine performance

Embed Size (px)

Citation preview

Page 1: Javascript engine performance

JavaScript Engine Performance

Page 2: Javascript engine performance

关于我

• Baidu资深工程师

• 目前主要做性能优化相关的工作• 参与W3C的“HTML” 和“Web Performance” 工作组

@nwind @nwind

Page 3: Javascript engine performance

请注意

• 我不是虚拟机的专家,仅仅是业余兴趣• 很多内容都经过了简化,实际情况要复杂很多• 这里面的观点仅代表我个人看法

Page 4: Javascript engine performance

大纲

• 虚拟机的基本原理• JavaScript引擎是如何优化性能的

• V8、Dart、Node.js的介绍

• 如何编写高性能的JavaScript代码

Page 5: Javascript engine performance

VM basic

Page 6: Javascript engine performance

Virtual Machine history

• pascal 1970

• smalltalk 1980

• self 1986

• python 1991

• java 1995

• javascript 1995

Page 7: Javascript engine performance

Smalltalk的演示展现了三项惊人的成果。包括电脑之间如何实现联网,以及面向对象编程是如何工作的。

但乔布斯和他的团队对这些并不感兴趣,因为他们的注意力被...

Page 8: Javascript engine performance

How Virtual Machine Work?

• Parser

• Intermediate Representation

• Interpreter, JIT

• Runtime, Garbage Collection

Page 9: Javascript engine performance

Parser

• Tokenize

• AST

Page 10: Javascript engine performance

Tokenize

var foo = 10;keyword

identifier

equal

number

semicolon

Page 11: Javascript engine performance

AST

Assign

Variable foo Constant 10

Page 12: Javascript engine performance

Intermediate Representation

• Bytecode

• Stack vs. register

Page 13: Javascript engine performance

00000: deffun 0 null00005: nop00006: callvar 000009: int8 200011: call 100014: pop00015: stop

foo:00020: getarg 000023: one00024: add00025: return00026: stop

Bytecode (SpiderMonkey)

function foo(bar) { return bar + 1;}

foo(2);

Page 14: Javascript engine performance

8 m_instructions; 168 bytes at 0x7fc1ba3070e0; 1 parameter(s); 10 callee register(s)

[ 0] enter[ 1] mov! ! r0, undefined(@k0)[ 4] get_global_var! r1, 5[ 7] mov! ! r2, undefined(@k0)[ 10] mov! ! r3, 2(@k1)[ 13] call!! r1, 2, 10[ 17] op_call_put_result! ! r0[ 19] end! ! r0

Constants: k0 = undefined k1 = 2

3 m_instructions; 64 bytes at 0x7fc1ba306e80; 2 parameter(s); 1 callee register(s)

[ 0] enter[ 1] add! ! r0, r-7, 1(@k0)[ 6] ret! ! r0

Constants: k0 = 1

End: 3

Bytecode (JSC)

function foo(bar) { return bar + 1;}

foo(2);

Page 15: Javascript engine performance

Stack vs. register

• Stack

• JVM, .NET, PHP, Python, Old JavaScript engine

• Register

• Lua, Dalvik, Modern JavaScript engine

• Smaller, Faster (about 20%~30%)

• RISC

Page 16: Javascript engine performance

local a,t,i 1: LOADNIL 0 2 0a=a+i 2: ADD 0 0 2a=a+1 3: ADD 0 0 250 ; aa=t[i] 4: GETTABLE 0 1 2

Stack vs. registerlocal a,t,i 1: PUSHNIL 3a=a+i 2: GETLOCAL 0 ; a 3: GETLOCAL 2 ; i 4: ADD 5: SETLOCAL 0 ; aa=a+1 6: SETLOCAL 0 ; a 7: ADDI 1 8: SETLOCAL 0 ; aa=t[i] 9: GETLOCAL 1 ; t 10: GETINDEXED 2 ; i 11: SETLOCAL 0 ; a

Page 17: Javascript engine performance

Interpreter

• Switch statement

• Direct threading, Indirect threading, Token threading ...

Page 18: Javascript engine performance

while (true) {! switch (opcode) {! ! case ADD:! ! ! ...! ! ! break;

! ! case SUB:! ! ! ...! ! ! break; ...! }}

Switch statement

Page 19: Javascript engine performance

typedef void *Inst;Inst program[] = { &&ADD, &&SUB };Inst *ip = program;goto *ip++;

ADD: ... goto *ip++;

SUB: ...

Direct threading

http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

Page 20: Javascript engine performance

Threaded Code

Page 22: Javascript engine performance

Context Threading

Context Threading: A flexible and efficient dispatch technique for virtual machine interpreters

Context Threading14

Essence of our Solution

iload_1: ..ret;

iadd: ..ret;

..call iload_1call istore_1call iaddcall iload_1

call iload_1

CTT - ContextThreading Table

(generated code)

Bytecode bodies (ret terminated)

Return Branch Predictor Stack

…iload_1iload_1iaddistore_1iload_1bipush 64if_icmplt 2…

Package bodies as subroutines and call them

Page 23: Javascript engine performance

Garbage Collection

• Reference counting (php, python ...), Smart pointer

• Tracing

• Generational

• Stop-the-world, Concurrent, Incremental

• Copying, Sweep, Compact

Page 24: Javascript engine performance

Why JavaScript is slow?

• Dynamic Type

• Weak Type

• Need to parse every time

• GC

Page 25: Javascript engine performance

Fight with Weak Type

Page 26: Javascript engine performance

typedef union { void *p; double d; long l;} Value;

typedef struct { unsigned char type; Value value;} Object;

Object a;

Object model in most VM

Page 27: Javascript engine performance

Tagged pointer

Page 28: Javascript engine performance

在几乎所有系统中,指针地址会对齐 (4或8字节)

http://www.gnu.org/s/libc/manual/html_node/Aligned-Memory-Blocks.html

Page 29: Javascript engine performance

这意味着0xc00ab958 指针的最后2或3个位⼀一定是0

1 0 0 19999

PointerPointerPointerPointer

1 0 0 08888

Small NumberSmall NumberSmall NumberSmall Number

可以在最后⼀一位加1来表示指针

Page 30: Javascript engine performance

Tagged pointerMemory

...

2

0x3d2aa00

...

...

object b

...

var a = 1var b = {a:1}

Page 31: Javascript engine performance

Small Number

31位能表示十亿,对大部分应用来说足够了

230 −1= 1073741823−230 = −1073741824

Page 32: Javascript engine performance

External Fixed Typed Array

• Strong type, Fixed length

• Out of VM heap

• Example: Int32Array, Float64Array

Page 33: Javascript engine performance

Small Number + Typed Array

0

1250

2500

3750

5000

C/C++ Java(HotSpot) V8 PHP Ruby Python

420040203180

807050

Seconds (smaller is better)

http://shootout.alioth.debian.org/u32/performance.php?test=fannkuchredux

40x

Page 34: Javascript engine performance

Warning: Benchmark lies

Page 35: Javascript engine performance

ES6 will have struct

Page 36: Javascript engine performance

Point2D = new StructType({ ! x: uint32, ! y: uint32 });

ES6 StructType

Pixel = new StructType({ ! point: Point2D, ! color: Color });

Color = new StructType({ ! r: uint8, ! g: uint8, ! b: uint8 });

Page 37: Javascript engine performance

Use typed array to run faster

Page 38: Javascript engine performance

Fight with Dynamic Type

Page 39: Javascript engine performance

foo.bar

Page 40: Javascript engine performance

movl 4(%edx), %ecx //getmovl %ecx, 4(%edx) //put

foo.bar in C

Page 41: Javascript engine performance

found = HashTable.FindEntry(key)if (found) return found;

for (pt = GetPrototype(); pt != null; pt = pt.GetPrototype()) { found = pt.HashTable.FindEntry(key) if (found) return found;}

foo.bar in JavaScript

Page 42: Javascript engine performance

How to optimize?

Page 43: Javascript engine performance

First, We need to know Object layout

Page 44: Javascript engine performance

Add Type for object

add property x add property y

http://code.google.com/apis/v8/design.html

Page 45: Javascript engine performance

Inline Cache

• Slow lookup at first time

• Modify the JIT code in-place

• Next time will directly jump to the address

Page 46: Javascript engine performance

Inline cache make simple

function fun(foo) { return foo.bar;}

return foo.lookupProperty(bar);

if (foo[hiddenClass] == 0xfe1) { return foo[indexOf_bar];}return foo.lookupProperty(bar);

Page 47: Javascript engine performance

实际代码中的JS并不会那么动态Delete操作只占了0.1%

99%的原始类型可以在运行通过静态分析确定

97%的属性访问可以被inline cache

“TypeCastor: Demystify Dynamic Typing of JavaScript...”

“An Analysis of the Dynamic Behavior of JavaScript...”

Page 48: Javascript engine performance

V8 can’t handle delete yet

http://jsperf.com/test-v8-delete

20x times slower!

Page 49: Javascript engine performance

Avoid alter object property layout

Page 50: Javascript engine performance

Faster Data Structure & Algorithm

Page 51: Javascript engine performance

Array push is faster than String concat?

Page 53: Javascript engine performance

Why?

Page 54: Javascript engine performance

other string optimizations

• Adaptive string search

• Single char, Linear, Boyer-Moore-Horspool

• Adaptive ascii and utf-8

• Zero copy sub string

Page 55: Javascript engine performance

Feel free to use String in modern Engine

Page 56: Javascript engine performance

Just-In-Time (JIT)

Page 57: Javascript engine performance

JIT

• Method JIT, Trace JIT, Regular expression JIT

• Register allocation

• Code generation

Page 58: Javascript engine performance

How JIT work?

• mmap, malloc (mprotect)

• generate native code

• cast (c), reinterpret_cast (c++)

• call the function

Page 59: Javascript engine performance

V8

Page 60: Javascript engine performance

V8

• Lars Bak

• Hidden Class, PICs

• Some of Built-in objects are written in JavaScript

• Crankshaft

• Precise generation GC

Page 61: Javascript engine performance

Lars Bak

• implement VM since 1988

• Beta

• Self

• JVM (VM architect at Sun)

• V8 (Google)

Page 62: Javascript engine performance

Lines of code (VM only)

0

125000

250000

375000

500000

HotSpot V8 SpiderMonkey JSC Ruby CPython PHP-Zend

42113

1547580438086763975

70787

110831

44646108280120941

83920135547

224038

359986

.cpp/.c .h

Page 63: Javascript engine performance

Crankshaft

Page 64: Javascript engine performance
Page 65: Javascript engine performance

Source code Native Code

High-Level IR Low-Level IR Opt Native Code}Crankshaft

runtime profiling

Page 66: Javascript engine performance

Crankshaft

• Profiling

• Compiler optimization

• Generate new JIT code

• On-stack replacement

• Deoptimize

Page 67: Javascript engine performance

High-Level IR (Hydrogen)• AST to SSA

• Type inference (type feedback from inline cache)

• Compiler optimization

• Function inline

• Loop-invariant code motion, Global value numbering

• Eliminate dead phis

• ...

Page 68: Javascript engine performance

for (i = 0; i < n; i++) { a[i] = x + y;}

Loop-invariant code motion

tmp = x + y;for (i = 0; i < n; i++) { a[i] = tmp;}

Page 69: Javascript engine performance

Function inline limit for now

• big function (large than 600 bytes)

• have recursive

• have unsupported statements

• with, switch

• try/catch/finally

• ...

Page 70: Javascript engine performance

Avoid “with”, “switch” and “try” in hot path

Page 71: Javascript engine performance

function ArraySort(comparefn) { ... // In-place QuickSort algorithm. // For short (length <= 22) arrays, insertion sort is used for efficiency.

if (!IS_SPEC_FUNCTION(comparefn)) { comparefn = function (x, y) { if (x === y) return 0; if (%_IsSmi(x) && %_IsSmi(y)) { return %SmiLexicographicCompare(x, y); } x = ToString(x); y = ToString(y); if (x == y) return 0; else return x < y ? -1 : 1; }; } ...

Built-in objects written in JS

v8/src/array.js

Page 72: Javascript engine performance

GC

• Precise

• Stop-the-world

• Generation

• Incremental (2011-10)

Page 73: Javascript engine performance

V8 performance

Page 74: Javascript engine performance

V8 performance

Page 75: Javascript engine performance

V8 performance

Why?

Page 76: Javascript engine performance

V8 performance

Unfair, they are using gmp library

Page 77: Javascript engine performance

Warning: Benchmark lies

Page 78: Javascript engine performance
Page 79: Javascript engine performance

Node.JS• Pros

• Easy to write Async I/O

• One language for everything

• Maybe Faster than PHP, Python

• Bet on JavaScript is safe

• Cons

• Lack of great libraries

• Large JS is hard to maintain

• Easy to have Memory leak (compare to PHP, Erlang)

• Still too youth, unproved

Page 80: Javascript engine performance

Why Dart?

• Build for large application

• option type, structured, libraries, tools

• Performance

• lightweight process like erlang

• easy to write a faster vm than javascript

Page 81: Javascript engine performance

The future of Dart?

• It will not replace JS

• But it may replace GWT, and become a better choice for Building large front-end application

• with great IDE, mature libraries

• and some way to communicate with JavaScript

Page 82: Javascript engine performance

How to make JavaScript faster?

Page 83: Javascript engine performance

How to make JavaScript faster?

• Wait for ES6: StructType, const, WeakMap, yield...

• High performance build-in library

• WebCL

• Embed another language

• KL(FabricEngine), GLSL(WebGL)

• Wait for Quantum computer :)

Page 84: Javascript engine performance

Things you can learn also

• NaN tagging

• Polymorphic Inline Cache

• Type Inference

• Regex JIT

• Runtime optimization

• ...

Page 85: Javascript engine performance

References• The behavior of efficient virtual

machine interpreters on modern architectures

• Virtual Machine Showdown: Stack Versus Registers

• The implementation of Lua 5.0

• Why Is the New Google V8 Engine so Fast?

• Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters

• Effective Inline-Threaded Interpretation of Java Bytecode Using Preparation Sequences

• Smalltalk-80: the language and its implementation

Page 86: Javascript engine performance

References• Design of the Java HotSpotTM

Client Compiler for Java 6

• Oracle JRockit: The Definitive Guide

• Virtual Machines: Versatile platforms for systems and processes

• Fast and Precise Hybrid Type Inference for JavaScript

• LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation

• Emscripten: An LLVM-to-JavaScript Compiler

• An Analysis of the Dynamic Behavior of JavaScript Programs

Page 87: Javascript engine performance

References• Adaptive Optimization for SELF

• Bytecodes meet Combinators: invokedynamic on the JVM

• Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters

• Efficient Implementation of the Smalltalk-80 System

• Design, Implementation, and Evaluation of Optimizations in a Just-In-Time Compiler

• Optimizing direct threaded code by selective inlining

• Linear scan register allocation

• Optimizing Invokedynamic

• Threaded Code

Page 88: Javascript engine performance

References• Why Not a Bytecode VM?

• A Survey of Adaptive Optimization in Virtual Machines

• An Efficient Implementation of SELF, a Dynamically-Typed Object-Oriented Language Based on Prototypes

• Making the Compilation "Pipeline" Explicit- Dynamic Compilation Using Trace Tree Specialization

• Uniprocessor Garbage Collection Techniques

Page 89: Javascript engine performance

References• Representing Type Information in

Dynamically Typed Languages

• The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures

• Trace-based Just-in-Time Type Specialization for Dynamic Languages

• The Structure and Performance of Efficient Interpreters

• Know Your Engines: How to Make Your JavaScript Fast

• IE Blog, Chromium Blog, WebKit Blog, Opera Blog, Mozilla Blog, Wingolog’s Blog, RednaxelaFX’s Blog, David Mandelin’s Blog, Brendan Eich’s Blog...

Page 90: Javascript engine performance

!ank y"