Node.js Web Apps @ ebay scale


By Dmytro Semenov, Member of Technical Staff @ NodeJS, Cloud and Platform Services, eBay Inc.


Our Journey

[Timeline figure. Years marked: 1995, 2004, 2006, 2013, present, future. Technologies shown: Perl, C++, XSL/XML, Java, Java UI, JSP Extended, OSGi, Node, Scala, Marko/Lasso (Tessa), Polyglot initiative, Python, Go, ???]

Programming Language History

● Perl/Apache (no layers)

○ Scalability limit of 50K active listings

● C/C++/IIS/XML/XSL (monolithic client/server architecture)

○ Compiler limit of methods per class, very long build times (2+ hours)

● Java/J2EE (3 tier architecture, split build into domain specific)

○ XML/XSL (MSXSL) as presentation layer hit native memory limit

○ Java UI components, proprietary technology, steep learning curve

○ Startup time up to 40 minutes

● Java Stack (embrace open-source, JSP, modular approach)

○ OSGi conflicts at startup, very hard to get everything working at the start

○ Slow startup time (2+ minutes)

● Polyglot initiative to support popular languages/technologies

Node.JS Introduction

● Started in 2013

○ CubeJS, the first attempt, was not so good

○ Unified NodeJS, eBay and PayPal platform merge

○ Big projects decided to move to NodeJS

● Embedding into application teams

○ Accumulation of issue knowledge base

○ Build confidence and form local knowledge centers

○ Fast feedback

○ Fast project startup

○ Pick new ideas from application teams

Current State

● Node 4.x

● 200 applications and growing

● 80 million requests / day (now 1 billion)

● Platform team of 6 developers

● Very vibrant internal community

○ Local pool of knowledge among teams

○ Self sustaining support by internal community

○ PRs are welcome

○ 45 contributors to eBay NodeJS platform

● 80 platform modules

● 330 total modules

● More agile

○ Startup time <2s

○ Test coverage in the 90%-100% range

○ Faster releases (daily) vs. a 2-week cycle

○ Automatic upgrades with semver

○ Modular UI architecture based on UI components

● Better tools

○ flame graphs, waterfall charts, pre-production testing

● Learnings Applied to Java Platform

○ Incorporate best practices from NodeJS

○ Startup time < 1 minute

○ Embrace modular approach and semver

○ Lighter Java stacks

Architecture

NPM: Private

[Diagram: npmjs.ebay.com sits behind a load balancer (LB) with two Kappa instances. Kappa reads from a primary CouchDB, which replicates to a backup CouchDB, and falls back to registry.npmjs.org.]

Multi-Screen World

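[Diagram: browser and mobile web talk to the App Server (including ajax calls); native apps go through the API Gateway. Both paths reach the Domain Experience Service, which calls the underlying services.]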

● Experience Service

○ Provides view models for UI modules

○ View model is specific to the device type (native, size, desktop, tablet)

● NodeJS Web App

○ Talks to experience service

○ Handles UI functionality

○ Renders desktop and mobile web pages

● Native Apps

○ Talk to experience service via API gateway

Module View

[Diagram: pm2 runs node.js with express (3 workers). The stack builds on kraken-js, lasso/tessa and marko; platform-ebay provides analytics, logging, config, service client and the lasso plugin; the app contributes services, pages and components.]

Don’t Build Pages, Build Modules

The old directory structure (bad)...

src/
    pages/
    components/

src/components/login-form/
    index.js
    style.css
    template.marko

src/pages/login/
    index.js
    style.css
    browser.json

UI Technologies

Templating: Marko (markojs.com, github.com/marko-js/marko)

● Async + streaming

● Custom tags

● Browser + Node.js runtime

● Lightweight

● Extremely fast

● Compiles to CommonJS

UI components: built using Marko Widgets (github.com/marko-js/marko-widgets)

Uses Marko templates as the view. Plus:

● DOM diffing/patching

● Batched updates

● Stateful widgets

● Declarative event binding

● Efficient event delegation

● Lightweight runtime

Speed is king

Node is built on a streaming interface

HTTP is a streaming protocol

So why then do most templating and UI libraries render as a single chunk?

● Faster perceived page load

● Better time, resource and traffic utilization

● Progressive rendering between browser, frontend and backend

○ Frontend<->Backend: Stream of server-sent events (SSE)

○ Browser<->Frontend: Module render and flush on arrival

● Marko helps:

○ Async fragment support via marko-async

○ Re-ordering on the client side
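The "render and flush on arrival" idea with client-side re-ordering can be sketched without any templating library. This is a minimal illustration, not Marko's or marko-async's API: `createProgressiveRenderer`, the slot ids and the `<template>`-based re-ordering script are all assumptions made for the example. `out` is anything with `write()` and `end()`, such as an `http.ServerResponse`.

```javascript
'use strict';

// Hypothetical helper (not Marko's API): flush the page shell right away,
// then write each fragment in whatever order its data arrives.
function createProgressiveRenderer(out, slotIds) {
  const pending = new Set(slotIds);

  // The shell goes out first, with one empty placeholder per fragment.
  out.write('<html><body>');
  for (const id of slotIds) {
    out.write(`<div id="${id}"></div>`);
  }

  return {
    // Call as soon as the HTML for a slot is ready.
    fragment(id, html) {
      if (!pending.delete(id)) return false; // unknown slot or already sent
      // The chunk carries the markup plus a small script that moves it into
      // its placeholder, so arrival order does not matter.
      out.write(
        `<template id="${id}-src">${html}</template>` +
        `<script>document.getElementById("${id}").appendChild(` +
        `document.getElementById("${id}-src").content)</script>`
      );
      if (pending.size === 0) out.end('</body></html>');
      return true;
    },
  };
}
```

In a request handler, the renderer would be created with the response object, and `fragment()` would be called from each service callback as its view model arrives, so a slow backend call delays only its own module.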

Marko vs React: Server-side rendering

github.com/patrick-steele-idem/marko-vs-react

[Diagram: browser <-> App Server over http/https as a chunked stream; App Server <-> Domain Experience Service over http/https as an SSE stream.]

Service Invocation

Why not use the middleware pattern on the client?

[Diagram: in the web app, a request passes through middlewares (cookies, body-parser, analytics) on its way to a response. In the same way, the service client passes each call through service invocation handlers (logging, error handling, analytics, circuit breaker, retry, security/oauth, http/https/sse) on its way to the service.]

● Generic Service Client with pluggable handlers

● Handler performs a single function

● The handler pipeline is configurable per client

Configuration based:

    "my-service-client": {
        "protocol": "http:",
        "hostname": "myservice.com",
        "path": "/path/to/resource",
        "port": 8080,
        "socketTimeout": 1000,
        "pipeline": [
            "logging/handler",
            "circuit-breaker/handler",
            "error/handler"
        ]
    }

Code based:

    var serviceClient = require('service-client');

    // Register handlers in pipeline order.
    serviceClient.use(require('logging/handler'));
    serviceClient.use(require('circuit-breaker/handler'));
    serviceClient.use(require('error/handler'));

    serviceClient
        .get('http://myservice.com/path/to/resource')
        .end((err, response) => {
            if (err) {
                console.error(err);
                return;
            }
            console.log(response.body);
        });
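How such a handler pipeline might work internally can be sketched in a few lines. This is not the `service-client` module shown above: `createServiceClient`, `loggingHandler`, `retryHandler` and the `transport` function are hypothetical names, and the sketch assumes a current Node.js release with async/await.

```javascript
'use strict';

// Hypothetical generic client; `transport` stands in for the real HTTP call.
function createServiceClient(transport) {
  const handlers = [];
  const client = {
    use(handler) {
      handlers.push(handler);
      return client; // allow chaining
    },
    request(ctx) {
      // Compose from the inside out so the first registered handler runs first.
      const run = handlers.reduceRight(
        (next, handler) => () => Promise.resolve().then(() => handler(ctx, next)),
        () => Promise.resolve().then(() => transport(ctx))
      );
      return run();
    },
  };
  return client;
}

// Each handler performs a single function.
const loggingHandler = (log) => async (ctx, next) => {
  log(`-> ${ctx.url}`);
  const response = await next();
  log(`<- ${response.status}`);
  return response;
};

const retryHandler = (attempts) => async (ctx, next) => {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await next(); // re-runs everything further down the pipeline
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
};
```

Because every handler only sees `ctx` and `next`, a per-client pipeline is just a different list of `use()` calls, which is what makes the configuration-based form possible.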

Resource Bundling and Externalization

Why not deploy everything at once?

[Diagram: browser, Akamai, App Server and Resource Server, with the numbered flow below.]

1. GET http://www.ebay.com

2. GET http://rs/check/hp-24512.js

3. PUT http://rs/upload/hp-24512.js

4. <script src="hp-24512.js">

5. GET http://hp-24512.js

6. GET http://hp-24512.js

lasso.js/tessa

JavaScript module bundler + asset pipeline

github.com/lasso-js/lasso

● Supported resources

○ Defined by lasso plugins

○ CSS/JS/Images

○ Templates

○ I18n content

○ Bundle definition

● Lasso plugin

○ Adaptor between lasso and resource server

● Benefits

○ Single build and deployment

○ No synchronization problems between app and resource server deployments

○ Externalization at startup and during runtime on-the-fly
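The check/upload/script-tag part of the flow (steps 2 to 4) can be sketched as follows. This is not the lasso plugin API: the in-memory `createResourceServer`, the `externalize` function and the content-hash naming scheme are assumptions for illustration; the slides do not say how names such as `hp-24512.js` are derived.

```javascript
'use strict';
const crypto = require('crypto');

// Hypothetical in-memory stand-in for the resource server (check + upload).
function createResourceServer() {
  const store = new Map();
  return {
    has: (name) => store.has(name),
    put: (name, content) => { store.set(name, content); },
    uploads: () => store.size,
  };
}

// Assumption: name the bundle after its content so a new build gets a new URL.
function bundleName(prefix, content) {
  const hash = crypto.createHash('sha1').update(content).digest('hex').slice(0, 8);
  return `${prefix}-${hash}.js`;
}

// Check whether the bundle exists, upload it if missing, emit the script tag.
function externalize(prefix, content, resourceServer, baseUrl) {
  const name = bundleName(prefix, content);
  if (!resourceServer.has(name)) {
    resourceServer.put(name, content);
  }
  return `<script src="${baseUrl}/${name}"></script>`;
}
```

Because the app server uploads whatever is missing when it first needs it, the app and its resources ship as one build, and there is no separate resource deployment to keep in sync.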

Configuration Management

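Local configuration, packaged with the code:

app/config.json
    /node_modules
        /moduleA/config.json
        /moduleB/config.json

Remote configuration, stored in CMS:

app/1.0.1/config: {}
moduleA/1.1.1/config: {}
moduleB/1.0.0/config: {}

[Diagram: Config Deployment feeds the CMS and App Deployment feeds the App Servers; each App Server pulls configuration from the CMS every minute.]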

● Module configuration

○ Local, packaged with module

○ Remote, hot deployed

● Application configuration

○ Local

○ Remote, hot deployed

● Application can override any module configuration

● Configuration can be injected via Admin Console

● Future - Code and Config “separation”, but

○ Keep app and config together in git repo and separate at deployment.

○ Easier to manage
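The layering described above can be sketched as a merge over four sources. The precedence order (module local, then module remote, then app local, then app remote) is an assumption; the slides only state that the application can override any module configuration. `merge` and `resolveModuleConfig` are hypothetical names.

```javascript
'use strict';

const isObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

// Recursively merge `override` into a copy of `base`; override wins on conflicts.
function merge(base, override) {
  const out = Object.assign({}, base);
  for (const key of Object.keys(override || {})) {
    out[key] = isObject(out[key]) && isObject(override[key])
      ? merge(out[key], override[key])
      : override[key];
  }
  return out;
}

// Assumed precedence, lowest to highest:
// module local < module remote < app local < app remote.
function resolveModuleConfig(moduleName, sources) {
  const layers = [
    sources.moduleLocal,
    sources.moduleRemote,
    (sources.appLocal || {})[moduleName],
    (sources.appRemote || {})[moduleName],
  ];
  return layers.reduce((config, layer) => merge(config, layer), {});
}
```

With remote layers refreshed on the one-minute pull, re-running `resolveModuleConfig` after each pull would be enough to pick up hot-deployed values.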

i18n

● Use krakenjs/spud module

○ Property file format

○ Marko tags/helpers

● Externalizable as resources

● 17 main languages

● Support multiple languages per country

● Application and modules can have localizable content

Folder Structure Example

app/locales/
    US/
        en/
            ProjectName/
                xxx.properties
                yyy.properties
        ru/
            ProjectName/
                xxx.properties
                yyy.properties
    DE/
        de/
            ProjectName/
                xxx.properties
                yyy.properties
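A lookup against this country/language layout might look like the sketch below. It is not the krakenjs/spud API: `bundlePath` and `parseProperties` are hypothetical helpers, and the parser handles only plain `key=value` lines, which is a small subset of what a real property-file reader supports.

```javascript
'use strict';
const path = require('path');

// Parse a minimal "key=value" properties file; '#' starts a comment line.
function parseProperties(text) {
  const out = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) continue;
    const idx = trimmed.indexOf('=');
    if (idx === -1) continue;
    out[trimmed.slice(0, idx).trim()] = trimmed.slice(idx + 1).trim();
  }
  return out;
}

// Map a locale such as "en-US" onto the country/language folder layout.
function bundlePath(root, locale, project, bundle) {
  const [language, country] = locale.split('-');
  return path.posix.join(
    root,
    country.toUpperCase(),
    language.toLowerCase(),
    project,
    `${bundle}.properties`
  );
}
```

Keeping the country as the outer folder is what allows several languages for the same country to sit side by side.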

Security

● nsp tool

○ On-demand scan for every application project

○ Security badge for every platform module

● ScanJS

○ Source code scan

● CSRF tokens

● Redirect validation

● XSS

● Rate limiter
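As one example from this list, redirect validation can be sketched with the standard `URL` class. The allowlist contents and the `safeRedirect` name are hypothetical; the slides do not describe how eBay's check is implemented.

```javascript
'use strict';

// Hypothetical allowlist; a real application would load this from config.
const ALLOWED_HOSTS = new Set(['www.ebay.com', 'signin.ebay.com']);

// Return a safe absolute URL to redirect to, or the fallback if the target
// points at a host or scheme that is not allowed.
function safeRedirect(target, base, fallback) {
  let url;
  try {
    url = new URL(target, base); // resolves relative targets against base
  } catch (err) {
    return fallback;
  }
  const okProtocol = url.protocol === 'https:' || url.protocol === 'http:';
  return okProtocol && ALLOWED_HOSTS.has(url.hostname) ? url.href : fallback;
}
```

Parsing before checking matters: a target such as `//evil.example/x` looks relative but resolves to another host, and comparing the parsed hostname catches it.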

Logging & Monitoring

● Logging every transaction/subtransaction

○ Central Logging Repository (CAL) provides log aggregation per pool/box/datacenter

○ Use Domains to maintain context per request and avoid passing it around

● Explicit code instrumentation

○ Support DEBUG, INFO, WARN, ERROR, FATAL

○ Time span to record transaction (start and end)

○ Nested spans

● Health checks/stats monitoring and alerts

● Crash/OOM emails to the owner of the pool

● Early problem detection using traffic mirror
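Explicit instrumentation with nested spans can be sketched as below. This is not the CAL client: `createSpan` and the record shapes are assumptions, and the sketch passes the parent span explicitly instead of carrying it per request through Domains as the slides describe.

```javascript
'use strict';

// Hypothetical span logger; `now` is injectable so timings can be tested.
function createSpan(name, sink, now = Date.now, parent = null) {
  const start = now();
  const path = parent ? `${parent}/${name}` : name;
  return {
    // Open a nested span (subtransaction) under this one.
    span(childName) {
      return createSpan(childName, sink, now, path);
    },
    // Record a leveled event (DEBUG, INFO, WARN, ERROR, FATAL) inside the span.
    log(level, message) {
      sink({ type: 'event', span: path, level, message, at: now() });
    },
    // Close the span and record its start time and duration.
    end() {
      sink({ type: 'span', span: path, start, durationMs: now() - start });
    },
  };
}
```

The `sink` callback is where a real setup would forward records to the central log repository; the same start/duration records are also the raw material for waterfall charts.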

App Resiliency

● Proactive testing in pre-production

○ Traffic mirroring of read-only traffic to the box with new build

○ Easy upgrades

● Handling uncaught errors

○ Domains used to capture context

○ Send email to the owner with stack trace, group and box name, request info

○ Graceful restart

● Handling memory leaks

○ Email sent to the owner with group and box name, request info

○ Graceful restart when memory threshold is reached

● Too busy load shedding

○ 503 or connection reset to trigger browser DNS fallback

○ Filter bot traffic under heavy load

● Planning for failure

● Hystrix-like service calls

○ Fail fast

○ Circuit breaker
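The fail-fast and circuit-breaker behaviour can be sketched as a wrapper around any promise-returning call. This is a simplified illustration, not Hystrix or eBay's handler: `createCircuitBreaker`, its option names and its defaults are assumptions, and it counts consecutive failures rather than tracking an error rate.

```javascript
'use strict';

// Hypothetical breaker: after `maxFailures` consecutive failures the circuit
// opens and calls fail fast until `resetMs` has passed; after that, calls are
// let through again and a success closes the circuit.
function createCircuitBreaker(call, options = {}) {
  const { maxFailures = 3, resetMs = 10000, now = Date.now } = options;
  let failures = 0;
  let openedAt = null;

  return async function guarded(...args) {
    if (openedAt !== null && now() - openedAt < resetMs) {
      throw new Error('circuit open'); // fail fast, no service call is made
    }
    try {
      const result = await call(...args);
      failures = 0;
      openedAt = null;
      return result;
    } catch (err) {
      failures += 1;
      if (failures >= maxFailures) openedAt = now();
      throw err;
    }
  };
}
```

While the circuit is open the struggling service receives no traffic from this client, and callers get an immediate error they can turn into a fallback instead of waiting for a socket timeout.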

Performance Optimization

● Fast startup/re-start

○ Cold cache to avoid service invocation

○ Pre-compiling templates @ deployment

○ Pre-externalizing resources @ deployment

● Progressive rendering/streaming @ browser side

● Progressive chunking/streaming @ service side

● Performance tuning

○ Flame Graphs

○ Waterfall charts @ server side


Tools > Flame Graphs

Flame Graphs

Why not use v8-profiler data?

● How to

○ Use v8-profiler to generate json data file

○ Aggregate stack frames into json

○ Render using d3-flame-graph

● No special environment

● No special steps

● Single button generation

● Can be used in production/dev/qa

● Exposes only the JavaScript side of the code
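The "aggregate stack frames into json" step can be sketched as folding sampled stacks into the nested `{ name, value, children }` shape that d3-flame-graph renders. The input format (an array of `{ stack, count }` samples, outermost frame first) and the `aggregateStacks` name are assumptions; converting v8-profiler output into such samples is not shown.

```javascript
'use strict';

// Fold sampled call stacks (outermost frame first) into a nested
// { name, value, children } tree.
function aggregateStacks(samples) {
  const root = { name: 'root', value: 0, children: [] };
  for (const { stack, count = 1 } of samples) {
    root.value += count;
    let node = root;
    for (const frame of stack) {
      let child = node.children.find((c) => c.name === frame);
      if (!child) {
        child = { name: frame, value: 0, children: [] };
        node.children.push(child);
      }
      child.value += count; // inclusive: own samples plus all callees
      node = child;
    }
  }
  return root;
}
```

Each node's `value` is inclusive, so the width of a frame in the rendered graph reflects all the time spent in it and in everything it called.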

Used For ...

● CPU profiling

● Troubleshooting the problem at hand in production

● Memory leak investigation

● Regular sampling

Flame Graphs: Practice Fire Safety

Tools > Waterfall Charts

● Logs are hard to read

● Timestamps are hard to compare

● We need a faster tool

Why not use the same method used by developer tools in the browser?

Waterfall Charts

● Requires code instrumentation

● Easy and quick to assess what is going on

● Easy to spot synchronous events

● Analyze for possible task parallelization
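A text-only version of such a chart shows the idea. The span shape (`{ name, start, end }` in milliseconds) and the `renderWaterfall` name are assumptions; the actual tool presumably renders graphically from the instrumented spans.

```javascript
'use strict';

// Render spans ({ name, start, end } in ms) as a text waterfall, one row per span.
function renderWaterfall(spans, width = 40) {
  const t0 = Math.min(...spans.map((s) => s.start));
  const t1 = Math.max(...spans.map((s) => s.end));
  const scale = width / Math.max(t1 - t0, 1);
  const label = Math.max(...spans.map((s) => s.name.length));

  return spans
    .slice()
    .sort((a, b) => a.start - b.start)
    .map((s) => {
      const offset = Math.round((s.start - t0) * scale);
      const length = Math.max(1, Math.round((s.end - s.start) * scale));
      const bar = ' '.repeat(offset) + '#'.repeat(length);
      return `${s.name.padEnd(label)} |${bar} ${s.end - s.start}ms`;
    })
    .join('\n');
}
```

Two service calls whose bars never overlap, with the second starting exactly where the first ends, stand out immediately as candidates for parallelization, which is much harder to see by comparing timestamps in logs.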

Lessons Learned

● Latency: set TCP_NODELAY=true

● Handling request close, finish, error events is important

● No DNS cache out of the box

○ Use OS level caching to allow restarts

● Avoid stateful modules

● Embedding within App teams to bootstrap works great

● Use cold cache to keep restarts fast
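The first two lessons can be combined in one small helper. `trackRequest` and its outcome names are hypothetical; `socket.setNoDelay(true)` is the standard Node.js call that sets TCP_NODELAY on a connection.

```javascript
'use strict';

// Attach lifecycle listeners so every request ends in exactly one outcome:
// 'finish' (response fully handed off), 'aborted' (connection closed before
// the response finished), or 'error'.
function trackRequest(req, res, onDone) {
  let done = false;
  const settle = (outcome, err) => {
    if (done) return; // report only the first terminal event
    done = true;
    onDone(outcome, err);
  };

  // Disable Nagle's algorithm so small chunks are not held back.
  if (req.socket && typeof req.socket.setNoDelay === 'function') {
    req.socket.setNoDelay(true);
  }

  res.on('finish', () => settle('finish'));
  res.on('close', () => settle('aborted'));
  req.on('error', (err) => settle('error', err));
  res.on('error', (err) => settle('error', err));
}
```

Called at the top of a request handler, this gives one place to close log spans and release per-request resources, including for clients that disconnect mid-stream.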

Challenges

● Version control

○ npm shrinkwrap does not guarantee versions

○ Switched to Uber shrinkwrap

● App and platform coupling in one build

○ It is still monolithic, platform coupled to app

● Upgrading to major versions

○ Need to stay backwards compatible

○ Teams go at their own pace

● Memory leak analysis

● Debugging

○ Not stable, gets broken frequently

So Far So Good

What’s next?

● 1 billion requests / day

● Decoupling platform from application

○ Moving platform components into separate processes

○ Independent platform deployments

○ More resilient apps

○ Problem isolation (easier memory leak detection)

● Platform microservices

● Docker

● Kubernetes

● SenecaJS

● NodeJS services

References

• Progressive rendering: http://www.ebaytechblog.com/2014/12/08/async-fragments-rediscovering-progressive-html-rendering-with-marko

• AMP (SSE streaming between frontend and backend): http://www.ebaytechblog.com/2016/06/30/browse-ebay-with-style-and-speed/

• http://www.ebaytechblog.com/2016/06/15/igniting-node-js-flames/

• http://www.ebaytechblog.com/2016/07/14/mastering-the-fire/

• http://www.slideshare.net/tcng3716/ebay-architecture

• Cloud: http://www.computerweekly.com/news/2240222899/Case-study-How-eBay-uses-its-own-OpenStack-private-cloud

• http://www.ebaytechblog.com/2014/10/02/dont-build-pages-build-modules/

• History: http://www.addsimplicity.com/downloads/eBaySDForum2006-11-29.pdf

• lasso: https://www.npmjs.com/package/lasso

• marko: http://markojs.com/

• d3-flame-graph: https://github.com/spiermar/d3-flame-graph

Questions?

Back slides