Scaling Gilt: from monolith ruby app to micro service scala service architecture

Preview:

DESCRIPTION

Gilt Lead Software Engineer Yoni Goldberg delivered this presentation at the NYC Tech Talks' January 14, 2014 meetup at Gilt.

Citation preview

SCALING GILTFrom Monolith Ruby App

to

Distributed Scala Micro-Services

#NYCTECHTALKS

Lead Engineer - GILT Podium => http://bit.ly/podiumapp

Yoni (Jonathan) Goldberg

ABOUT ME- Leading the Popeye Team

- Sale Personalization, Loyalty, SEO Post-purchase, Login/Registration flows

- MIT CS BS/Meng | Google | IBM | IDF

- Brooklyn | Coffee | Arduino | Running | Kite Surfing |Online Collaboration | Poker

Excited to be part of the NYC Tech community

THE LESSONS AND CHALLENGES THATWE HAD/HAVE WITH

MICRO-SERVICE ARCHITECTURE

Flash Sales Business Founded in 2007

Top 50 Internet-Retailer~150 Engineers

WHAT IS GILT?

ANOTHER WAY TO LOOK AT GILT

Three day traffic pattern

THE CLASSICSTARTUP STORY

THE EARLY DAYS2007 - Ruby on Rails the hottest new thing

The goal was to get to market fast

WE WERE ABLE TO HANDLE OURTRAFFIC PRETTY WELL

UNTIL LOUBOUTIN CAME TO GILT

TECHNOLOGY PAIN POINTS - 2009Spike required to launch 1,000s of ruby processesPostgres was overloadedRouting traffic between ruby processes sucked

|Note to self| - hide from the ruby fan boys

DEV PAIN POINTS1000 Models/Controllers, 200K LOC, 100s of jobsLots of contributors + no ownershipDifficult deployments with long integration cyclesHard to identify root causes

WE NEEDED TO SOLVETHE PROBLEM FAST

THREE THINGS HAPPENEDStarted the transition to the JVMM(a/i)cro-Service Era StartedDedicated data stores

WHY JVM?Widely adoptedStableBetter support for concurrencyBetter GC vs MRI

FIRST 10 SERVICES

We solved 90% of our arch scaling problemBut not the Dev points

PAIN POINTSSpike required to launch 1,000s of ruby processesPostgres was overloadedRouting traffic between ruby processes suckedNew services became semi-monolithic1000 Models/Controllers, 200K LOC, 100s of jobsLots of contributors + no ownershipDifficult deployments with long integration cycles

WHY WE DOUBLED DOWN ON MICRO-SERVICES

Empower teams and ownershipSmaller scopeSimpler and Easier deployments and rollbacks

MICRO SERVICE ARCHITECTURESTARTED TO GET TRACTION

AS OF LAST WEEK WE HAVE MORETHAN

450 SERVICES

APP BOOTSTRAPrake bootstrap:admin-web # Bootstrap a admin-web service rake bootstrap:babylon-docs # Bootstrap a babylon-docs service rake bootstrap:client-server-core # Bootstrap a client-server-core service rake bootstrap:jersey-java # Bootstrap a jersey-java service rake bootstrap:jersey-scala # Bootstrap a jersey-scala service rake bootstrap:play # Bootstrap a play service rake bootstrap:play-ui-build # Bootstrap a play-ui-build service rake bootstrap:sbt-library # Bootstrap a sbt-library service rake bootstrap:schema # Bootstrap a schema service

WE BEGAN THE TRANSITION TO SCALAAND PLAY

LOSA - Lots Of Small AppsSame motivation and benefits of Micro-Service

Architecture

NEW CHALLENGESDev/Integration EnvironmentsWho owns this service!?MonitoringDeployments and Testing (Functional/Integration)

ON DEV/INTEGRATION ENVIRONMENTSThe hardware is not strong enoughNo one wants to compile 20 services

EACH TEAM HAS A STAGING ENVSERVICE_PORTS=[ 4001, #listing-service 8235, #svc-user-set 9420, #svc-free-fall 7895, #svc-Loyalty 8155, #web-loyalty 9410, #web inventory status 7898, #admin-loyalty 7899, #notification 7102, #rouge 9530, #svc-component 6802, #svc-waitlist-submit 4066, #svc-action-sale ....

PORT_FORWARD_ARGS=SERVICE_PORTS.map { |port| ['-L', "#{port}:localhost:#{port}"] }

exec(*[%w{ssh -a -C -N -n}, PORT_FORWARD_ARGS, GW_HOST].flatten)

STAGING DIFFICULTIES:Hard to keep all the services up to dateMaxed our staging env capacitiesRequires to have internet connection for some of theservices (e.g LOSA-apps)

The Future

DOCKERAn extension to Linux Containers (LXC)

DecentralizationSimple ConfigurationsMuch lighter than a VMImmutableSupports services and platforms

ON OWNERSHIP "code stays much longer than people" - SB

CODE OWNERSHIP

CURRENT APPROACHCode Review!Code Review!Code Review!Team owns services, not individual developersOwnership transfer

DATA OWNERSHIP

WE TRANSITIONED TO MICRO-DBSThird of the services have their own

MongoDB

Postgres

Voldemort

MANAGE MICRO-RELATIONAL DBS SCHEMA EVOLUTION MANAGER

https://github.com/gilt/schema-evolution-manager

PRINCIPLES OF SCHEMA EVOLUTION MANAGER

Can manage the schema evolutions in a Git repoSchema changes are deployed as tar fliesNo rollbacksSchema changes are required to be incremental

echo "create table releases (id integer)" > new.sql sem-add ./new.sql #Created a git commit sem-dist # generates the tar e.g schema-ion-cannon0.0.2.tar.gz # Scp and untar on your server cd schema-ion-cannon-0.0.2 sem-apply host --localhost --name ion-cannon --user ion-cannon

ON MONITORING

THE TOOLS WE USE

graphite / openTSDB

ON DEPLOYMENTS AND TESTING

(FUNCTIONAL/INTEGRATION) "Testing is HARD" - the dev that sits on your left

THE CHALLENGES THAT WE FACED:Hard to execute functional tests between servicesFrustrating to deploy semi-manually (Capistrano)Scary to deploy other teams services

SBTMotivation: Scala adaptionComplex Scala syntaxCool features: ~test, shell, consoleHard to debug

GILT-SBT-BUILDSimple config for all the servicesPulls many plugins: [nexus, testing, RPMs, run scripts, Monitoring,SemVer, ...]Custom commands (e.g 'sbt release')

object Build extends ClientServerCoreProject with Dependencies { val name = 'svc-sale-activation' val coreDeps = .... val serverDeps = ... val clientDeps = ...

override val ioncannonTrack = IonCannon.FastTrack }

ION-CANNON + SBTRun functional/Selenium tests on dedicated EnvSupports Canary releasesEasy rollbacksIntegrated health checks

MAIN TAKEAWAYSSimplicity - Do you really need it?We feel that it was the right choice for usMicroServices promise works for most casesAs of 2014 - You will need to invest in Tools!

Keep in touch: jgoldberg@gilt.com

PODIUM TIMEWe are hiring...

www.yonigoldberg.com

Recommended