Aviran Mordo Head of Engineering @aviranm linkedin.com/in/aviran aviransplace.com The Road to Continuous Delivery

Road to Continuous Delivery - Wix.com


How many built a website?

Wix is a web publishing platform

Wix In Numbers
- Over 66,000,000 users
- Static storage is >2PB of data
- 3 data centers + 3 clouds (Google, Amazon)
- 2B HTTP requests/day
- 1000 people work at Wix

Traditional Dev Pipeline


Waterfall
- Long development cycle
- Time waste (wait)
- Late feedback
- Hard to fix
- 1-2 releases a year

Scrum

Scrum != Agile

Let's Go Back In Time

Where We Were
- Working traditional waterfall
- With fear of change
- With low product quality
- With slow development velocity
- With a traditional enterprise development lifecycle
  - Three months of VERSION development and QA
  - Six months of crisis mode stabilizing the system


Production System

Approach to Production
- Build only what is needed
- Stop if something goes wrong
- Eliminate anything which does not add value

Philosophy of Work
- Respect for workers
- Full utilization of workers' capabilities
- Entrust workers with responsibility & authority

Taiichi Ohno (1912-1990)

Taiichi Ohno, Toyota's chief of production in the post-WWII period. He was THE main developer of Toyota Production System (TPS).


Seeing Waste

Seven Wastes of Manufacturing
- Inventory
- Extra Processing
- Overproduction
- Transportation
- Waiting
- Motion
- Defects

Seven Wastes of Software Development
- Partially Done Work
- Paperwork
- Extra Features
- Building the wrong thing
- Waiting for information
- Task switching
- Defects


The Biggest Source Of Waste
Features and functions used in a typical system
- Often or always used: 20%
- Rarely or never used: 64%


Lean Product Development
Top 5 Most-Used Commands in Microsoft Word
1. Paste
2. Save
3. Copy
4. Undo
5. Bold

Paste itself accounts for more than 11% of all commands used, and has more than twice as much usage as the #2 entry on the list, Save.

32% of the total command usage

Scaling challenges: Product
Minimum Viable Product (MVP)
- Does the MVP meet your product standards?
- What about tooltips, help, first-time UX, etc.?
- And can it win in an A/B test?

To Be Implemented


Get out of thought land
The law of failure: most new "its" will fail even if they are flawlessly executed

Invest less, get less attached, and it becomes easier to admit failure
Data beats opinions - let the customer decide

Make sure you are building the right "it" before you build it right

Quick Feedback


Continuous Delivery

Risk
- Waterfall - minimize number of deployments
- CD - minimize number of changes and impact in $$

Risk = #deployments * chance of something going wrong (~ number of changes) * impact of something going wrong in $$
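To make the formula concrete, here is a small worked example. Every number in it (change counts, failure chance per change, dollar impact) is a made-up illustration rather than Wix data, and `release_risk` is a hypothetical helper name. It assumes the chance of failure grows roughly linearly with the number of changes in a deployment, capped at 1.

```python
def release_risk(deployments, changes_per_deployment,
                 failure_chance_per_change, impact_dollars):
    """Risk = #deployments * chance of something going wrong * impact in $$."""
    # Assumption: chance grows linearly with the number of changes, capped at 100%.
    chance = min(1.0, changes_per_deployment * failure_chance_per_change)
    return deployments * chance * impact_dollars


# Illustrative numbers only.
# Waterfall-style: 2 big releases a year, 500 changes each, expensive failures.
waterfall_risk = release_risk(2, 500, 0.001, 100_000)
# CD-style: 1,000 tiny releases a year, 1 change each, cheap failures.
cd_risk = release_risk(1000, 1, 0.001, 500)
```

With these assumed inputs both styles ship the same 1,000 changes, so `#deployments * chance` comes out equal. The whole difference sits in the impact term, which is the one CD tries to shrink.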

Small Development Iterations
- No Waterfall
- No Scrum
- No Iterations
- No long documents
- Build something small
- When it is ready, deploy it
  - Measure it
  - Then fix it
  - Repeat, until Dev, Product and Customers are happy

Product / Dev / QA / Ops boundaries are going down

What Is The Common Denominator?
- Product manager
- Project manager
- QA
- Operations
- DBA

Developers can do these jobs

CD is culture & mindset
- Trust the developers
  - Empower developers to change production
  - The developer knows his system best
- Automation as a default choice
  - No more "is it worth automating?"
  - Everything should be automated
- Welcome to the twilight zone
  - Product/Dev/QA boundaries are going down
  - Everyone needs to care about everything
  - Less formality: Corridor - IN, Meeting Room - OUT


Dev Centric Culture - Involve the Developer
- Product definition (with product)
- Development (with architect)
- Testing (with QA developers)
- Deployment / Rollback (with ops)
- Monitoring / BI (with BI team)
- DevOps to enable deployment and rollback, fully automated

Support Circle

Continuous Delivery principles
- The process for releasing/deploying software MUST be repeatable and reliable
- Automate everything!
- If something's difficult or painful, do it more often
- Keep everything in source control
- Done means released
- Built in quality
- Everybody has responsibility for the release process
- Improve continuously

Test Driven Development
- No new code is pushed to Git without being fully tested
  - We currently have over 40,000 automated tests
- Before fixing a bug, first write a test to reproduce the bug
- Cover legacy (untested) systems with integration tests
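As a minimal sketch of the "test first, then fix" rule for bugs, assume a hypothetical `average_rating` function whose original version crashed on an empty list. The function, the bug and the test names are all invented for illustration. The first test is written before the fix and is expected to fail until the guard is added.

```python
import unittest


def average_rating(ratings):
    """Return the mean rating, or 0.0 when there are no ratings yet.

    Hypothetical bug: the original version divided by len(ratings)
    unconditionally and raised ZeroDivisionError for an empty list.
    """
    if not ratings:  # the fix, added only after the reproducing test failed
        return 0.0
    return sum(ratings) / len(ratings)


class AverageRatingTest(unittest.TestCase):
    def test_empty_list_reproduces_reported_bug(self):
        # Written first: this failed with ZeroDivisionError before the fix.
        self.assertEqual(average_rating([]), 0.0)

    def test_existing_behaviour_is_unchanged(self):
        self.assertEqual(average_rating([4, 5]), 4.5)
```

Saved as a module, the tests can be run with `python -m unittest`. The reproducing test then stays in the suite as a regression guard.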

What people think of TDD
- TDD slows down development
- With TDD we write more code (product + test code)
- TDD has no significant impact on quality

TDD Actual impact on development

- We develop products faster
- Removes fear of change
- Easier to enter someone else's project
- Do we still need QA? (Yes, they code automation tests)
  - We don't have QA for back-end applications
- Writing a feature is 10-30% slower, with 45-90% fewer bugs
- 50% faster to reach production
- Considerably less time to fix bugs (almost no need for a debugger)

Guidelines for successful TDD
- Tests should run on project checkout to a random computer
- Tests should be debugged on a developer's machine
- Tests should run fast
- Tests have to be readable: they are the project's specs
- Fixture is evil!

Refactoring

Is Refactoring Rework?
- Absolutely NOT!
- Refactoring is the outcome of learning
- Refactoring is the cornerstone of improvement
- Refactoring builds the capacity to change
- Refactoring doesn't cost, it pays

Refactoring
- Refactor from inside out
  - Small iterations with tests
  - Refactor small methods and make sure the tests don't break
  - Deploy often
- Re-write from the outside in
  - Write from scratch (one piece at a time)
  - Code duplication is sometimes needed (temporarily)
  - Protected by Feature Toggle

Before refactoring, cover everything with tests
 - Legacy code is usually covered by integration (IT) tests

Feature Toggles

One of the key components to successful CD

[Diagram: code branch. FT opened? Yes: new code. No: old code.]

Usage example: a simple if statement in your code

Feature Toggles
- Everyone develops on the trunk
- Every piece of code can get to production at any time
- Unused new code can go to production with no harm done
- Operational new code goes in behind a guard: use new or old code by feature toggle
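The guard described above really is just an `if`. A minimal sketch follows, assuming a hypothetical in-memory `FeatureToggles` class and two made-up price formatters. None of these names come from Wix's code.

```python
class FeatureToggles:
    """Hypothetical in-memory toggle store; a real one would be configurable at runtime."""

    def __init__(self, open_toggles=()):
        self._open = set(open_toggles)

    def is_open(self, name):
        return name in self._open


def old_price_formatter(cents):
    return "$%.2f" % (cents / 100)


def new_price_formatter(cents):
    # New code path: adds thousands separators.
    return f"${cents // 100:,}.{cents % 100:02d}"


def format_price(cents, toggles):
    # The guard: both code paths are deployed, the toggle picks one.
    if toggles.is_open("new_price_formatter"):
        return new_price_formatter(cents)
    return old_price_formatter(cents)
```

With the toggle closed, the new formatter is deployed but never executed, which is what lets unfinished code ship on the trunk.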

DB Schema Changes Without Downtime
- Adding columns
  - Use another table linked by primary key
  - Use a blob field for schema flexibility
- Removing fields
  - Stop using them. Do not make any DB schema changes

New DB schema with data migration
- Plan a lazy migration path controlled by feature toggle
  1. Write to old / Read from old
  2. Write to both / Read from old
  3. Write to both / Read from new, fallback to old
  4. Write to new / Read from new, fallback to old
  5. Eagerly migrate data in the background
  6. Write to new / Read from new
- Backward compatibility is a must
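The migration phases above could be sketched roughly as follows. This is an illustration under stated assumptions: the two "databases" are plain dicts, `Phase` and `MigratingStore` are hypothetical names, and in practice the current phase would be driven by a feature toggle rather than set by hand.

```python
from enum import Enum


class Phase(Enum):
    OLD_ONLY = 1             # write old / read old
    WRITE_BOTH_READ_OLD = 2  # write both / read old
    WRITE_BOTH_READ_NEW = 3  # write both / read new, fallback to old
    WRITE_NEW_READ_NEW = 4   # write new / read new, fallback to old
    NEW_ONLY = 5             # write new / read new (after background migration)


class MigratingStore:
    def __init__(self, old, new, phase=Phase.OLD_ONLY):
        self.old = old      # stand-in for the old schema
        self.new = new      # stand-in for the new schema
        self.phase = phase  # assumption: set from a feature toggle

    def write(self, key, value):
        if self.phase.value <= Phase.WRITE_BOTH_READ_NEW.value:
            self.old[key] = value
        if self.phase.value >= Phase.WRITE_BOTH_READ_OLD.value:
            self.new[key] = value

    def read(self, key):
        if self.phase.value <= Phase.WRITE_BOTH_READ_OLD.value:
            return self.old.get(key)
        if key in self.new:
            return self.new[key]
        if self.phase is Phase.NEW_ONLY:
            return None
        return self.old.get(key)  # lazy fallback for rows not migrated yet

    def migrate_remaining(self):
        """Eager background migration: copy rows the new store lacks."""
        for key, value in self.old.items():
            self.new.setdefault(key, value)
```

Because each phase keeps the previous read path working, the toggle can be moved back one step at any point without losing data.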

Feature Toggle Strategies (gradual exposure to users)
- Company employees
- Specific users or group of users
- Percentage of traffic
- By GEO
- By language
- By user-agent
- User profile based
- By context (site id or some kind of hash on site id)
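Two of the strategies above, "company employees" and "percentage of traffic by a hash on site id", might look like this in code. It is a sketch, not Wix's implementation: the function names, the `@example.com` employee domain and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def bucket(key, buckets=100):
    """Map a string to a stable bucket in [0, buckets) using a hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def is_feature_open(feature, site_id, user_email="", percent=0,
                    employee_domain="@example.com"):
    # Strategy 1: company employees always see the feature.
    if user_email.endswith(employee_domain):
        return True
    # Strategy 2: a stable percentage of sites, keyed by a hash on the site id.
    # Including the feature name gives each feature its own independent split.
    return bucket(f"{feature}:{site_id}") < percent
```

Because the bucket depends only on the feature name and site id, raising `percent` from 10 to 50 keeps every site that was already exposed and only adds new ones.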

Feature Toggle Override
- By specific server
  - Used to test system load
  - New database flows/migration
  - Refactoring that may affect performance and memory usage
- By URL parameter
  - Enable internal testing
  - Product acceptance
  - Faking GEO
- By FT cookie value
  - Testing
  - When working with API on a single page application

Full load on a single server
Override size limitation by setting a cookie on the client
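One way the three override channels could be resolved is sketched below. The `ft_` prefix, the precedence order (URL parameter, then cookie, then per-server override, then the default) and the function name are assumptions for illustration only. The slides do not specify them.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit


def resolve_toggle(name, default, url="", cookie_header="", server_overrides=None):
    """Return the effective toggle value after applying overrides."""
    key = f"ft_{name}"  # hypothetical naming convention

    # 1. URL parameter, e.g. ?ft_newEditor=true (internal testing, acceptance)
    query = parse_qs(urlsplit(url).query)
    if key in query:
        return query[key][0] == "true"

    # 2. Cookie value, which survives API calls made by a single page app
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    if key in cookies:
        return cookies[key].value == "true"

    # 3. Per-server override, e.g. to put full load on a single server
    if server_overrides and name in server_overrides:
        return server_overrides[name]

    return default
```

For example, `resolve_toggle("newEditor", False, url="https://example.com/edit?ft_newEditor=true")` opens the toggle for that one request only.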

A/B Test

A/B Test
- Every new feature is A/B tested
- We open the new feature to a % of users
  - Define KPIs to check if the new feature is better
  - If it is better, keep it
  - If worse, check why and improve
  - The impact of flaws is limited to just a % of our users

Link to purchase on the editor was causing drop in conversion because users went there too soon without intent

An interesting side effect on product
- How many times did you have the conversation "what is better?"
  - Put the menu on top / on the side
- Well, how about building both and A/B testing?

Marking users for persistent UX
- Anonymous user
  - Toss is randomly determined
  - Cannot guarantee a persistent experience if the user changes browser
- Registered user
  - Toss is determined by the user ID
  - Guarantees toss persistency across browsers
  - Allows setting additional tossing criteria (for example new users only)
  - Only use this for sections where the user has to be authenticated
- Do not mix anonymous and registered tests
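The difference between the two kinds of toss can be sketched as below. This is a hypothetical illustration: the function names, the cookie-based persistence for anonymous users and the SHA-256 hash are assumptions, not a description of Wix's actual experiment system.

```python
import hashlib
import random


def registered_toss(experiment, user_id, percent_b):
    """Deterministic toss: the same user ID always lands in the same group."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return "B" if int(digest, 16) % 100 < percent_b else "A"


def anonymous_toss(percent_b, existing_cookie=None, rng=random):
    """Random toss, remembered only through a cookie in the current browser."""
    if existing_cookie in ("A", "B"):
        return existing_cookie  # persistent only while this cookie survives
    return "B" if rng.random() * 100 < percent_b else "A"
```

A registered user gets the same group from any browser because the input is the user ID. An anonymous visitor who switches browsers has no cookie and is tossed again, which is why the two kinds of test should not be mixed.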

A/B test percentage of users with optional filters
- New users only (registered users only)
- By language
- By GEO
- By browser user-agent / OS
- Any other criteria you have on your users

A/B Test Features
- A/B test override
- Start
- Stop
- Pause
- Bots are always excluded from the test

Wix PETRI

NOT !!!

Gradual Deployment
- Assume two components
- We shut down one server and install the new version on it. It is not active yet
- Do self test
- Activate the new server if it passes self test
- Continue deploying the other servers, a few at a time, checking each one with self test

[Diagram: servers running A 1.1 and B 1.1, with the B servers upgraded to 1.2 a few at a time]
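The rollout procedure above can be sketched as a loop. This is a simplified model under stated assumptions: `Server` and `gradual_deploy` are hypothetical names, the first batch is a single server, and on a failed self test the rollout halts, leaving that server inactive for investigation.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    version: str
    active: bool = True


def gradual_deploy(servers, new_version, self_test, batch_size=2):
    """Upgrade one server first, then a few at a time.

    Returns the names of servers that were upgraded and re-activated.
    Stops at the first server that fails its self test.
    """
    upgraded = []
    batches = [servers[:1]] + [
        servers[i:i + batch_size] for i in range(1, len(servers), batch_size)
    ]
    for batch in batches:
        for server in batch:
            server.active = False          # take it out of rotation
            server.version = new_version   # install; not active yet
            if not self_test(server):
                return upgraded            # halt: failed server stays inactive
            server.active = True           # passed self test, activate
            upgraded.append(server.name)
    return upgraded
```

Servers that were never reached keep running the old version, so a failure in the middle of a rollout affects at most the one server being tested.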

Self Test / Post Deployment Test
- After each server deployment, run a self test before deploying the next server
- Checking server configuration and topology
  - Make sure DB is accessible
  - Is the schema the one I expect
  - Access required local resources (files, config, templates, etc.)
  - Access remote resources
  - RPC / REST endpoints reachable and operational
- Server will refuse requests unless it passes the self test
- Allow a way to skip self test (and continue deployment)
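A minimal sketch of a server that refuses traffic until its self test passes is shown below. The class name, the status codes and the shape of the checks (a dict of named callables returning a bool) are assumptions for illustration. Real checks would probe the database, schema, local files and remote endpoints.

```python
class SelfTestingServer:
    def __init__(self, checks, skip_self_test=False):
        self._checks = checks        # name -> callable returning True/False
        self._skip = skip_self_test  # escape hatch to continue a deployment
        self.failures = []
        self.ready = False

    def run_self_test(self):
        if self._skip:
            self.ready = True
            return True
        self.failures = []
        for name, check in self._checks.items():
            try:
                ok = bool(check())
            except Exception:
                ok = False           # a crashing check counts as a failure
            if not ok:
                self.failures.append(name)
        self.ready = not self.failures
        return self.ready

    def handle(self, request):
        # The server refuses requests unless it passed the self test.
        if not self.ready:
            return 503, "self test not passed"
        return 200, f"handled {request}"
```

Recording the names of failed checks in `failures` gives the deployer something to show on a status page when a rollout halts.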

Tools - App-info Self Test

Backward and Forward compatible
- Assume two components
- We release a new version of one
- Now roll back the other

[Diagram: servers running mixed versions of components A and B (1.0, 1.1, 1.2) during release and rollback]

A Story on Wix Time Machine

Time machine event =
- Deployment capabilities: no-click deployment
  - Dozens of services, 130+ servers, 3 data centers
- Backward and forward compatibility at the extreme: a field test case
  - Mixed versions of services / DB with no service downtime
- Empowerment
  - The power we give to the individual
- Risk taking and embracing failure

Wix in 2014
- 17,000 deployments (production changes) a year
- Double the velocity from last year
- Every 7 minutes production changes its state (during working hours)

Do You Have The Guts To Deploy 60 Times A Day?

CD: prepare to invest
- Dev infrastructure
  - Refactor, Refactor, Refactor
- Testing infrastructure & know-how
- Deployment infrastructure & tools
  - Automation, Automation, Automation
- Monitoring (business and technical)
  - Hundreds of aspects; use of thresholds is a must
  - Monitor business KPIs
  - Internal & external
- Endless tuning & learning

How does it work: CD Practices
- Test driven development
- Small Development Iterations
- Backwards and Forwards compatible
- Gradual Deployment & Self-Test
- Feature Toggle
- A/B Testing
- Exception Classification
- Production visibility

Tools - App-info - Dashboard

Tools - App-info Running Experiments

App-Info Resource Pools

Tools Monitoring - New Relic

Tools Frying Pan

Tools Lifecycle To Rule Them All

Where are we today?
- We have re-written our Flash editor product as an HTML5 editor
  - In just 4 months
- Introduced Wix 3rd party applications (developers API)
  - In just 6 weeks
- We are easily replacing significant parts of our infrastructure
- And we are doing ~60 releases a day!
- Production state changes every 7 minutes

Aviran Mordo, Head of Back-end Engineering

@aviranm linkedin.com/in/aviran aviransplace.com

Read more: The Road to Continuous Delivery: http://goo.gl/K6zEK

Dev-Centric Culture: http://goo.gl/0Vo70t

How would you do it?
How will you change the Wix session encryption key with as little service interruption as possible?
- Encryption key is currently hard coded in the framework
- All the services have the encryption key
- User server creates a session
- Services can renew a session
- External services not in the framework also have the encryption key