Upload
duoyi-wu
View
5.531
Download
2
Embed Size (px)
Citation preview
JavaScript Engine Performance
关于我
• Baidu资深工程师
• 目前主要做性能优化相关的工作• 参与W3C的“HTML” 和“Web Performance” 工作组
@nwind @nwind
请注意
• 我不是虚拟机的专家,仅仅是业余兴趣• 很多内容都经过了简化,实际情况要复杂很多• 这里面的观点仅代表我个人看法
大纲
• 虚拟机的基本原理• JavaScript引擎是如何优化性能的
• V8、Dart、Node.js的介绍
• 如何编写高性能的JavaScript代码
VM basic
Virtual Machine history
• pascal 1970
• smalltalk 1980
• self 1986
• python 1991
• java 1995
• javascript 1995
Smalltalk的演示展现了三项惊人的成果。包括电脑之间如何实现联网,以及面向对象编程是如何工作的。
但乔布斯和他的团队对这些并不感兴趣,因为他们的注意力被...
How Virtual Machine Work?
• Parser
• Intermediate Representation
• Interpreter, JIT
• Runtime, Garbage Collection
Parser
• Tokenize
• AST
Tokenize
var foo = 10;keyword
identifier
equal
number
semicolon
AST
Assign
Variable foo Constant 10
Intermediate Representation
• Bytecode
• Stack vs. register
00000: deffun 0 null00005: nop00006: callvar 000009: int8 200011: call 100014: pop00015: stop
foo:00020: getarg 000023: one00024: add00025: return00026: stop
Bytecode (SpiderMonkey)
function foo(bar) { return bar + 1;}
foo(2);
8 m_instructions; 168 bytes at 0x7fc1ba3070e0; 1 parameter(s); 10 callee register(s)
[ 0] enter[ 1] mov! ! r0, undefined(@k0)[ 4] get_global_var! r1, 5[ 7] mov! ! r2, undefined(@k0)[ 10] mov! ! r3, 2(@k1)[ 13] call!! r1, 2, 10[ 17] op_call_put_result! ! r0[ 19] end! ! r0
Constants: k0 = undefined k1 = 2
3 m_instructions; 64 bytes at 0x7fc1ba306e80; 2 parameter(s); 1 callee register(s)
[ 0] enter[ 1] add! ! r0, r-7, 1(@k0)[ 6] ret! ! r0
Constants: k0 = 1
End: 3
Bytecode (JSC)
function foo(bar) { return bar + 1;}
foo(2);
Stack vs. register
• Stack
• JVM, .NET, PHP, Python, Old JavaScript engine
• Register
• Lua, Dalvik, Modern JavaScript engine
• Smaller, Faster (about 20%~30%)
• RISC
local a,t,i 1: LOADNIL 0 2 0a=a+i 2: ADD 0 0 2a=a+1 3: ADD 0 0 250 ; aa=t[i] 4: GETTABLE 0 1 2
Stack vs. registerlocal a,t,i 1: PUSHNIL 3a=a+i 2: GETLOCAL 0 ; a 3: GETLOCAL 2 ; i 4: ADD 5: SETLOCAL 0 ; aa=a+1 6: SETLOCAL 0 ; a 7: ADDI 1 8: SETLOCAL 0 ; aa=t[i] 9: GETLOCAL 1 ; t 10: GETINDEXED 2 ; i 11: SETLOCAL 0 ; a
Interpreter
• Switch statement
• Direct threading, Indirect threading, Token threading ...
while (true) {! switch (opcode) {! ! case ADD:! ! ! ...! ! ! break;
! ! case SUB:! ! ! ...! ! ! break; ...! }}
Switch statement
typedef void *Inst;Inst program[] = { &&ADD, &&SUB };Inst *ip = program;goto *ip++;
ADD: ... goto *ip++;
SUB: ...
Direct threading
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
Threaded Code
http://en.wikipedia.org/wiki/File:Pipeline,_4_stage.svg
Context Threading
Context Threading: A flexible and efficient dispatch technique for virtual machine interpreters
Context Threading14
Essence of our Solution
iload_1: ..ret;
iadd: ..ret;
..call iload_1call istore_1call iaddcall iload_1
call iload_1
CTT - ContextThreading Table
(generated code)
Bytecode bodies (ret terminated)
Return Branch Predictor Stack
…iload_1iload_1iaddistore_1iload_1bipush 64if_icmplt 2…
Package bodies as subroutines and call them
Garbage Collection
• Reference counting (php, python ...), Smart pointer
• Tracing
• Generational
• Stop-the-world, Concurrent, Incremental
• Copying, Sweep, Compact
Why JavaScript is slow?
• Dynamic Type
• Weak Type
• Need to parse every time
• GC
Fight with Weak Type
typedef union { void *p; double d; long l;} Value;
typedef struct { unsigned char type; Value value;} Object;
Object a;
Object model in most VM
Tagged pointer
在几乎所有系统中,指针地址会对齐 (4或8字节)
http://www.gnu.org/s/libc/manual/html_node/Aligned-Memory-Blocks.html
这意味着0xc00ab958 指针的最后2或3个位⼀一定是0
1 0 0 19999
PointerPointerPointerPointer
1 0 0 08888
Small NumberSmall NumberSmall NumberSmall Number
可以在最后⼀一位加1来表示指针
Tagged pointerMemory
...
2
0x3d2aa00
...
...
object b
...
var a = 1var b = {a:1}
Small Number
31位能表示十亿,对大部分应用来说足够了
230 −1= 1073741823−230 = −1073741824
External Fixed Typed Array
• Strong type, Fixed length
• Out of VM heap
• Example: Int32Array, Float64Array
Small Number + Typed Array
0
1250
2500
3750
5000
C/C++ Java(HotSpot) V8 PHP Ruby Python
420040203180
807050
Seconds (smaller is better)
http://shootout.alioth.debian.org/u32/performance.php?test=fannkuchredux
40x
Warning: Benchmark lies
ES6 will have struct
Point2D = new StructType({ ! x: uint32, ! y: uint32 });
ES6 StructType
Pixel = new StructType({ ! point: Point2D, ! color: Color });
Color = new StructType({ ! r: uint8, ! g: uint8, ! b: uint8 });
Use typed array to run faster
Fight with Dynamic Type
foo.bar
movl 4(%edx), %ecx //getmovl %ecx, 4(%edx) //put
foo.bar in C
found = HashTable.FindEntry(key)if (found) return found;
for (pt = GetPrototype(); pt != null; pt = pt.GetPrototype()) { found = pt.HashTable.FindEntry(key) if (found) return found;}
foo.bar in JavaScript
How to optimize?
First, We need to know Object layout
Add Type for object
add property x add property y
http://code.google.com/apis/v8/design.html
Inline Cache
• Slow lookup at first time
• Modify the JIT code in-place
• Next time will directly jump to the address
Inline cache make simple
function fun(foo) { return foo.bar;}
return foo.lookupProperty(bar);
if (foo[hiddenClass] == 0xfe1) { return foo[indexOf_bar];}return foo.lookupProperty(bar);
实际代码中的JS并不会那么动态Delete操作只占了0.1%
99%的原始类型可以在运行通过静态分析确定
97%的属性访问可以被inline cache
“TypeCastor: Demystify Dynamic Typing of JavaScript...”
“An Analysis of the Dynamic Behavior of JavaScript...”
V8 can’t handle delete yet
http://jsperf.com/test-v8-delete
20x times slower!
Avoid alter object property layout
Faster Data Structure & Algorithm
Array push is faster than String concat?
http://jsperf.com/nwind-string-concat-vs-array-push
Why?
other string optimizations
• Adaptive string search
• Single char, Linear, Boyer-Moore-Horspool
• Adaptive ascii and utf-8
• Zero copy sub string
Feel free to use String in modern Engine
Just-In-Time (JIT)
JIT
• Method JIT, Trace JIT, Regular expression JIT
• Register allocation
• Code generation
How JIT work?
• mmap, malloc (mprotect)
• generate native code
• cast (c), reinterpret_cast (c++)
• call the function
V8
V8
• Lars Bak
• Hidden Class, PICs
• Some of Built-in objects are written in JavaScript
• Crankshaft
• Precise generation GC
Lars Bak
• implement VM since 1988
• Beta
• Self
• JVM (VM architect at Sun)
• V8 (Google)
Lines of code (VM only)
0
125000
250000
375000
500000
HotSpot V8 SpiderMonkey JSC Ruby CPython PHP-Zend
42113
1547580438086763975
70787
110831
44646108280120941
83920135547
224038
359986
.cpp/.c .h
Crankshaft
Source code Native Code
High-Level IR Low-Level IR Opt Native Code}Crankshaft
runtime profiling
Crankshaft
• Profiling
• Compiler optimization
• Generate new JIT code
• On-stack replacement
• Deoptimize
High-Level IR (Hydrogen)• AST to SSA
• Type inference (type feedback from inline cache)
• Compiler optimization
• Function inline
• Loop-invariant code motion, Global value numbering
• Eliminate dead phis
• ...
for (i = 0; i < n; i++) { a[i] = x + y;}
Loop-invariant code motion
tmp = x + y;for (i = 0; i < n; i++) { a[i] = tmp;}
Function inline limit for now
• big function (large than 600 bytes)
• have recursive
• have unsupported statements
• with, switch
• try/catch/finally
• ...
Avoid “with”, “switch” and “try” in hot path
function ArraySort(comparefn) { ... // In-place QuickSort algorithm. // For short (length <= 22) arrays, insertion sort is used for efficiency.
if (!IS_SPEC_FUNCTION(comparefn)) { comparefn = function (x, y) { if (x === y) return 0; if (%_IsSmi(x) && %_IsSmi(y)) { return %SmiLexicographicCompare(x, y); } x = ToString(x); y = ToString(y); if (x == y) return 0; else return x < y ? -1 : 1; }; } ...
Built-in objects written in JS
v8/src/array.js
GC
• Precise
• Stop-the-world
• Generation
• Incremental (2011-10)
V8 performance
V8 performance
V8 performance
Why?
V8 performance
Unfair, they are using gmp library
Warning: Benchmark lies
Node.JS• Pros
• Easy to write Async I/O
• One language for everything
• Maybe Faster than PHP, Python
• Bet on JavaScript is safe
• Cons
• Lack of great libraries
• Large JS is hard to maintain
• Easy to have Memory leak (compare to PHP, Erlang)
• Still too youth, unproved
Why Dart?
• Build for large application
• option type, structured, libraries, tools
• Performance
• lightweight process like erlang
• easy to write a faster vm than javascript
The future of Dart?
• It will not replace JS
• But it may replace GWT, and become a better choice for Building large front-end application
• with great IDE, mature libraries
• and some way to communicate with JavaScript
How to make JavaScript faster?
How to make JavaScript faster?
• Wait for ES6: StructType, const, WeakMap, yield...
• High performance build-in library
• WebCL
• Embed another language
• KL(FabricEngine), GLSL(WebGL)
• Wait for Quantum computer :)
Things you can learn also
• NaN tagging
• Polymorphic Inline Cache
• Type Inference
• Regex JIT
• Runtime optimization
• ...
References• The behavior of efficient virtual
machine interpreters on modern architectures
• Virtual Machine Showdown: Stack Versus Registers
• The implementation of Lua 5.0
• Why Is the New Google V8 Engine so Fast?
• Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters
• Effective Inline-Threaded Interpretation of Java Bytecode Using Preparation Sequences
• Smalltalk-80: the language and its implementation
References• Design of the Java HotSpotTM
Client Compiler for Java 6
• Oracle JRockit: The Definitive Guide
• Virtual Machines: Versatile platforms for systems and processes
• Fast and Precise Hybrid Type Inference for JavaScript
• LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
• Emscripten: An LLVM-to-JavaScript Compiler
• An Analysis of the Dynamic Behavior of JavaScript Programs
References• Adaptive Optimization for SELF
• Bytecodes meet Combinators: invokedynamic on the JVM
• Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters
• Efficient Implementation of the Smalltalk-80 System
• Design, Implementation, and Evaluation of Optimizations in a Just-In-Time Compiler
• Optimizing direct threaded code by selective inlining
• Linear scan register allocation
• Optimizing Invokedynamic
• Threaded Code
References• Why Not a Bytecode VM?
• A Survey of Adaptive Optimization in Virtual Machines
• An Efficient Implementation of SELF, a Dynamically-Typed Object-Oriented Language Based on Prototypes
• Making the Compilation "Pipeline" Explicit- Dynamic Compilation Using Trace Tree Specialization
• Uniprocessor Garbage Collection Techniques
References• Representing Type Information in
Dynamically Typed Languages
• The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures
• Trace-based Just-in-Time Type Specialization for Dynamic Languages
• The Structure and Performance of Efficient Interpreters
• Know Your Engines: How to Make Your JavaScript Fast
• IE Blog, Chromium Blog, WebKit Blog, Opera Blog, Mozilla Blog, Wingolog’s Blog, RednaxelaFX’s Blog, David Mandelin’s Blog, Brendan Eich’s Blog...
!ank y"