How Salesforce built a Scalable,
World-Class, Performance
Engineering Team
September 18th, 2012
Kasey Lee, Salesforce, VP Performance Engineering
in/leekasey
Safe Harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if
any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-
looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of
product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of
management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments
and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our
service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth,
interruptions or delays in our Web hosting, breach of our security measures, the outcome of intellectual property and other litigation, risks associated
with possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain,
and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling
non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the
financial results of salesforce.com, inc. is included in our quarterly report on Form 10-Q for the most recent fiscal quarter ended July 31, 2012. This
document and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may
not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently
available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
Welcome! What brings you here?
A. I’m curious how PerfEng can excel in an Agile Environment
B. I’m curious how to utilize a Performance Engineer's time
C. I’d like to understand how to better articulate the value of
Performance Engineering
D. I thought this was a great place to take a break and check my
social feeds before dinner
E. A, B, or C
F. All of the above
What do typical Performance Teams start as?
“Performance Engineering is run as a Shared Services
model so your charter is the entire organization with
maximum visibility. Everything flows through PerfEng
because it’s so critical. Dev, QE, Technical
Operations, Level II and III Support, and Professional
Services want the most out of your engineers by
leveraging your talent across projects to scale mission
critical applications”
What it sounds like
PerfEng
What it actually feels like ;)
PerfEng
Top Ten Signs Your Team Needs Help
1. You laugh when asked to sign off at Feature Freeze (and Release Freeze)
2. Your engineers work on 6-12 parallel projects (others work on 1-2 projects serially)
3. If you attended each of your assigned scrum teams’ daily 15 min standup you’d
never sit down (the entire week)
4. When you can’t signoff on a feature, everyone wants to raise the goals instead of
fixing the performance problem
5. Every day you answer “How did you decide to prioritize my feature? How can I
escalate this?” (even after you had agreement)
6. You’re told to commit to a plan for the next release while your team is busiest in the
current release and has no time to plan
7. Your team wants to influence the product or hardware architecture but can’t find the
time to even write up their analysis
8. Developers discount poor results due to variance without looking at the data (even
though the results of the latest release are always worse)
9. IT always asks “Why do you need isolated labs? Dev and QA don’t need them”
10. Devs ask your engineers to do manual tasks at all hours
That sounds like my
situation… How did
Salesforce approach this?
What’s in store?
Introduction
The Unique Challenge at Salesforce
How the Team Scales
Workloads
Automation, Tools, Environments
Closing Thoughts and Tips
Q: Who is Kasey?
Brief Background
VP @ Salesforce
Performance Engineering
Sr. Director / Tech Lead @ Wily Technology
Performance Engineering, Software Tools, QA,
R&D Lab
Architect @ Event Zero
Developer, consultant for startups
Developer @ Ziff-Davis Benchmark Operation
Industry Standard Software Benchmarks
iBench, WebBench, ServerBench, NetBench
What drew me to Salesforce?
• Performance and Scalability is one of the
top three core values of the company
• One of the most complex Enterprise
scalability challenges anywhere
• As of today one of the best funded teams in
the industry and growing as quickly as we
can find the best people
What are some key challenges at Salesforce?
1. Mission Critical Enterprise Apps Customers pay for
No perf testing in production on unwary customers
No tolerance for downtime or slow response times which
immediately impact customers’ bottom line
2. Security is Paramount
Extremely difficult to access production systems / data
Can’t easily examine load and data shapes in detail
3. True Multi-Tenant Architecture
Every customer can create completely different load / data
characteristics at a moment’s notice
Noteworthy Milestones
Mid 2006 – “System Test” Team created from HA crisis
April 2008 – Kasey Lee joins a struggling team of 7
Sept 2008 – Automation & Tools Team Created
Sept 2009 – Team averts162 R1 Load Balancer Disaster
Jan 2010 – Leads solution to Capacity Planning crisis
Sept 2010 – Team predicts GC Heap 168 R1 Regression
Sept 2010 – Team leads solutions to NA6 Perf
Nov 2011 – Team helps reduce production CPU >60%
May 2012 – Team Triages 178 R1 Bytecode Regression
June 2012 – Team size rises to 60
Jan 2013 - Target size: 80+
Traffic & complexity continue to increase to ~60B / quarter, but response times have decreased!
Major Accomplishments ex. – “CPU 15”
• SWAT team optimization /
tuning efforts saved the
company ~$150 million
• Optimizations include potential
to change the JVM spec
directly to benefit everyone
• Great example of ROI
Not only in dollars, but helps
build the credibility that you can
leverage to do even more
Performance Daily Dashboard – 10/24/2011 – RED!
Performance Daily Dashboard – 11/15/2011 – Look at all that GREEN!
How do we accomplish this?
• Baseline Functionality & Benchmarking
• New Feature Benchmarking
• Patches / Production Support
• Hardware / Infrastructure Analysis
• Special Studies / Research / POC
• Production Visualizations
• Capacity / Sizing Guides
• Architecture Expertise
• Profiling Concepts and Training
• Automation Frameworks
• Self Service Frameworks
• Data Analysis, Creation, Visualization Tools
• Load Generation Tools
• Environment Design
• Optimization
What We Continually Focus On
Blazing fast performance delivered by Cloud teams and PerfEng through collaboration, innovation and transparency
Empowered and engaged PerfEng inspired by the real world impact of
their work and widely recognized as industry thought leaders
Quick and accurate test results, effective testing, seamless scheduling
and flexible environments
Frequent assessment, optimizations, and deep visibility into feature
performance during development and in production
Fully integrating PerfEng into product development as beneficial and
essential members of Cloud teams
Performance built in by Cloud teams and able to catch obvious
performance issues themselves
What really makes us so effective?
1. Our Perf/Dev ratios have been adopted (after numerous “discussions”)
2. We have a Software Development Team
3. We have a Product Owner (Prod. Mgr) for our Labs
4. We have a dedicated TechOps team “PerfInfra” for Labs
5. We have a substantial lab for testing
6. We have a Program Manager focused on cross functional project strategy,
visibility, and communications
Performance Engineering Team Structure
Performance
• Sales/Service/Data – Features, Workloads
• Chatter – Features, Workloads
• Platform/Mobile/UI – Features, Workloads
• Core/Search/Analytics – Features, Workloads
Automation & Tools & Env
• Software Tools Developers
• Environments
• Product Owner
• Special Projects Lead
Architect
Program Manager
The Importance of Early Starts… PerfEng Historical Lag per Release
[Timeline: product development sprints, Jan–Jun, with milestones Planning, Final Plans Due, Feature Freeze, Release Sprint, Release Freeze, Sandbox, and R1 Release. Historically PerfEng began testing late in the release, so performance bug debt accumulated and the cost of finding & fixing bugs grew.]
Why Start Early and Profile Upstream?
Avoid obvious problems earlier:
• Late starts with minimal workloads
• Increased workloads and decreased time to bring online
• No longer need to track
PerfEng Starts vs. Release Timeline
[Timeline: the same Jan–Jun product development sprints and milestones – Planning, Final Plans Due, Feature Freeze, Release Sprint, Release Freeze, Sandbox, R1 Release – showing when PerfEng starts testing relative to the release.]
Q: How do we scale PerfEng to meet the
demands of a larger organization?
Ratios are Key to Establish and Socialize
PerfEng established a 1:8 ratio of Perf/Dev IC
No more than two scrum teams or three projects / release
Does not include workload engineers (min of two per cloud)
Does not include Managers or Software Tools Engineers
Perf Managers/IC ratio may need to be higher than 1:8
Managers may require 1:3 or 1:5 due to the additional teams
managers interact with cross functionally
Find a ratio that enables PerfEng
Factor early participation, deep dives, optimization work to provide
meaningful contributions
Support discussion with velocity points, automation and efficiency
examples, ROI Examples
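As a back-of-the-envelope illustration of how these ratios add up (the developer and cloud counts below are hypothetical, not Salesforce's actual numbers):

```python
# Hypothetical sketch: turn the stated staffing ratios into a headcount model.
# Assumptions from the slide: 1:8 Perf/Dev ICs, a minimum of two workload
# engineers per cloud, and managers at roughly 1:5; tools engineers excluded.
import math

def perfeng_headcount(num_devs: int, num_clouds: int) -> dict:
    feature_ics = math.ceil(num_devs / 8)     # 1:8 Perf/Dev IC ratio
    workload_engineers = 2 * num_clouds       # min of two per cloud
    ics = feature_ics + workload_engineers
    managers = math.ceil(ics / 5)             # 1:5 manager ratio
    return {"feature_ics": feature_ics,
            "workload_engineers": workload_engineers,
            "managers": managers,
            "total": ics + managers}

# Example: 400 developers across 4 clouds -> 50 feature ICs, 8 workload
# engineers, 12 managers, 70 total.
print(perfeng_headcount(400, 4))
```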
Gap is closing today, but still hasn’t reached the target
[Chart: Dev headcount vs. PerfEng headcount per release – the gap was >2x, >2x, and is now >1.2x.]
Self Service to the Rescue!
Embed Performance Mindset into Every Team
“Closely partner with scrum teams to
provide early, fast, continuous
architecture engagement / results /
analysis for complex scenarios and
enable scrum teams to catch obvious
performance issues with self service
tools, automation, and processes before
they reach Performance Engineering”
How We Interact with Scrum Teams
• Each scrum team appoints one Dev and one QE engineer
who are mapped to a single PerfEng rep
• Teams must co-develop their release plans and sign off
criteria up front
• Teams are accountable for their features (complete
ownership coming back to PerfEng as team scales up)
• Teams must characterize obvious performance criteria
themselves every sprint (Cadence, PTest)
• Teams must deliver their features on time or accept
testing into the release sprint or beyond
PerfEng Rep Mapping
[Diagram: one PerfEng rep mapped to the Dev/QE reps of multiple scrum teams.]
Embedding Performance – A Tiered Approach
• 80% Scrum Team + 0% PerfEng: Single user transactions on Desktops/Local Builds; Single user transactions in PTests
• 15% Scrum Team + 40% PerfEng: Single user transactions on Corsa; High Load, High Concurrency on Corsa
• 5% Scrum Team + 60% PerfEng: Single user transactions on IST; High Load, High Concurrency on IST
(Test complexity and feature risk increase from tier to tier.)
Scrum Teams focus on catching obvious low-hanging fruit; PerfEng focuses on difficult-to-construct, high load/concurrency scenarios requiring highly specialized knowledge to detect and analyze
86 GB of metadata primarily from PerfEng workload tests!
1.4 TB of metadata from tests created by Devs and outside teams!
Q: What are the key Agile Release
Milestones and activities for PerfEng?
Release Timeline and PerfEng Activities
[Timeline: product development sprints, Jan–Jun, with milestones Planning, Final Plans Due, Feature Freeze, Release Sprint, Release Freeze, Sandbox, R1 Release.]
Milestones and activities:
• Planning / Final Plans Due: Appoint Liaisons; Complete Release Plans; Double Check Exit Criteria
• Feature Freeze: Initial visibility into all features; Signoff on ¾ of features; Get workloads green
• Release Freeze: Signoff on all Features and Workloads; Continue Workload Optimizations
• Sandbox / R1 Release: Monitor Sandbox; Final Optimizations
How Do We Allocate Engineers’ Time?
70% Velocity Points Open
Feature or Workloads work for a specific cloud
30% Velocity Points Reserved
PTOn (9 days/year to work on whatever they want)
External Training Classes (e.g. SQL Tuning)
Other Clouds’ projects they are interested in
Conferences (e.g. HBase, Hadoop)
Foundation events (1:1:1)
We leverage Agile and ADM to enable people’s changing interests
Templates Cover the Most Important Phases of a Project: Requirements/Arch, Strategy/Test Plan, Analysis/Results
Release Signoff Criteria and Team Dynamics
• PerfEng will only sign off on features we
worked on directly (or have thoroughly
reviewed the plans and results)
• Scrum Teams may sign off on features by
themselves at their own risk for any feature
with Medium or Less Risk (if PerfEng is
short of resources)
Quick Tip – Negotiating Release Criteria
Bring in teams from operations and support
Quote examples of consequences of releasing
without adequate throttles and caps in place
Cite examples from your company or other
leading companies of the cost of reduced
customer credibility
Workloads
What is a “Workload”?
• A repeatable test simulation or benchmark that provides a
meaningful result by utilizing specific inputs into the system under
test while recording numerical metric data, which is subsequently
analyzed and weighted to perform a qualitative assessment
• Changing a variable in the workload and re-running provides a
meaningful comparison
• Baseline Workloads are automated and enhanced release over
release wherever possible
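A minimal sketch of that definition in code (all names are illustrative, not actual Salesforce tooling): run the same inputs repeatedly, record numeric metrics, and compare two runs that differ in one variable.

```python
# Illustrative sketch of the workload definition above; these functions
# do not correspond to actual Salesforce tools.
import statistics
from typing import Callable

def run_workload(inject_load: Callable[[], list[float]], iterations: int = 5) -> dict:
    """Run the same load repeatedly and record numeric metrics."""
    samples: list[float] = []
    for _ in range(iterations):
        samples.extend(inject_load())           # e.g. response times in ms
    return {"mean_ms": statistics.mean(samples),
            "p95_ms": statistics.quantiles(samples, n=20)[-1]}

def compare(baseline: dict, candidate: dict, threshold: float = 0.05) -> str:
    """Changing one variable and re-running yields a meaningful comparison."""
    delta = (candidate["mean_ms"] - baseline["mean_ms"]) / baseline["mean_ms"]
    return "REGRESSION" if delta > threshold else "OK"
```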
Workloads Map
DB Workloads
Grinder
Force.com
Apex
VisualForce
Chatter
Search
MQ
UI
Sharing
“Shape” Terminology
Load Shape – The distribution, rate, and type of requests injected into the system under test (SUT)
Data Shape – The size, skew, and type of data, files, etc. accessed during the test
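One hypothetical way to make the two terms concrete as data structures (field names and example values are invented for illustration, not taken from Salesforce tooling):

```python
# Hypothetical encoding of the two "shape" concepts.
from dataclasses import dataclass, field

@dataclass
class LoadShape:
    """Distribution, rate, and type of requests injected into the SUT."""
    requests_per_second: float
    concurrency: int
    request_mix: dict[str, float] = field(default_factory=dict)

@dataclass
class DataShape:
    """Size, skew, and type of data, files, etc. accessed during the test."""
    total_size_gb: float
    skew: str
    record_types: list[str] = field(default_factory=list)

# e.g. a rough description of the Grinder workload's shapes from the table
# below (concurrency and size values here are invented for illustration):
grinder_load = LoadShape(400.0, 200, {"replayed_transaction": 1.0})
grinder_data = DataShape(1000.0, "production skew (sanitized copy)", ["all"])
```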
Categories
Playback tests take production traffic logs and replay traffic against the cut of data from
that time period
• This enables Salesforce.com to properly capture data skews, volumes, and transactions that customers have
run at a particular time and cover features that are heavily customizable
Synthetic tests involve utilizing custom tools to profile production load and data shapes
and then using custom tools to create workloads that mimic the desired characteristics
• Synthetic tests enable the team to create data and load shapes that may be far greater or more accentuated
than in production, in a deterministic and precise fashion that enables granular studies of linearity,
bottlenecks, and resource utilization
• In most situations different versions of Salesforce are compared against one another, although absolute
performance metrics are used for new features or situations where it is too difficult to make meaningful
comparisons
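A bare-bones sketch of the playback idea: parse a production traffic log and re-issue each request against the test pod, preserving the original inter-arrival times. The log format and replay loop here are hypothetical, not Salesforce's actual tooling.

```python
# Hypothetical playback sketch: replay logged production requests against a
# test pod, preserving the original request pacing. Log format is invented:
# each line is "<epoch_seconds> <METHOD> <path>".
import time
import urllib.request

def replay(log_path: str, target_host: str) -> None:
    prev_ts = None
    with open(log_path) as log:
        for line in log:
            ts_str, method, path = line.rstrip("\n").split(" ", 2)
            ts = float(ts_str)
            if prev_ts is not None:
                time.sleep(max(0.0, ts - prev_ts))   # keep production pacing
            prev_ts = ts
            req = urllib.request.Request(f"https://{target_host}{path}", method=method)
            try:
                urllib.request.urlopen(req, timeout=30)
            except Exception:
                pass  # a real harness would record errors/second separately
```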
Workload Highlights

DB Workloads
Summary: A workload that replays real production requests against customer data in a precise fashion to meticulously identify proper DB stats and tuning
Load Shape: 100,000 complex reports and filters
Data Shape: Sanitized copy of real-world production data, with emphasis on massive data sets for reports

Grinder
Summary: A large scale, high load, high concurrency test that simulates an hour of peak production traffic by replaying transactions
Load Shape: 400 RPS; target production steady state utilization of 35%, peaks of 80%
Data Shape: Sanitized copy of real-world production data

Force.com
Summary: Simulates traffic against a standard Ideas site / base application
Load Shape: Requests are generated across 40 different URLs / operations
Data Shape: Synthetic data based on the real-world force.com app “Ideas”

Visualforce
Summary: A read-only targeted test isolating specific components of VF at high request rates. Apex components are designed to be constant across all requests so regressions are due to pure VF
Load Shape: 32 concurrent requests across 10 orgs
Data Shape: Small synthetic VF classes; Viewstates, Wrapperless / Wrapped nested data presentation, and Namespaces

Apex
Summary: A targeted test that exercises the components of Apex Cache, CPU consumption, and Memory
Load Shape: 64 threads across 16 organizations
Data Shape: Synthetic set of classes that exercise Apex Cache, CPU use of Apex L1, maximal number of lines of Apex, creation of temporary objects

Sharing
Summary: A workload that performs DML operations on Sharing Enabled Orgs, performs Sharing Rule Maintenance operations on various entities, Territory Management operations and Account/Opportunity reassignments, and routes bulk processing via Message Queues
Load Shape: 2 app servers, 10 concurrent users, 7 thread groups, 3 bulk processing
Data Shape: Synthetic Orgs (one Territory Managed, one Regular)
Workload Highlights (continued)

Search
Summary: High load, high concurrency test that simulates peak production traffic by replaying searches and concurrently simulating incremental indexing. Monitors and reports metrics on the entire stack [Indexers, DB, Query Servers, App Servers, Memcached]
Load Shape: Replays production searches and performs incremental indexing at peak load; issues searches at 55 RPS
Data Shape: Sanitized copy of real-world production data

MQ Workload: QPID (transport in isolation)
Summary: A workload which enqueues messages into QPID on an IST using multiple IST app servers. Tests QPID (the MQ transport service) in isolation. Suitable for acceptance testing an upgrade
Load Shape: 20 app servers x 20 threads enqueue messages of varying sizes for 10min-6hr
Data Shape: Synthetic; configurable message size

MQ Workload: Hydra (integrated)
Summary: A workload which creates load on the integrated SFDC MQ framework using the SFDC MQ API library. Uses synthetic asynchronous handlers running on the app servers to simulate message and resource consumption. Suitable for running with every release, and for simulating the impact of a new asynchronous handler (see the enqueue sketch after this table)
Load Shape: 20 app servers x 20 threads enqueue messages of varying sizes for 10min-6hr
Data Shape: Synthetic; configurable message size

Mobile
Summary: Workloads simulate user actions over a real 3G network. Captures metrics to measure end-user perceived response times on slow networks and real devices
Load Shape: Real device & emulator, on real 3G networks
Data Shape: Sanitized copy of real-world production data

UI
Summary: Workloads simulate user actions in a real browser. Captures metrics to measure end-user perceived response times. Org with Chatter data is very large
Load Shape: 6 browsers – nightly tests
Data Shape: Synthetic user data across all standard pages; 3 different orgs to test across different skins/Chatter
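As referenced in the MQ rows above, the per-app-server side of that load shape could look like the following sketch; the `enqueue` callable stands in for whatever MQ client the transport exposes, since the actual QPID/Hydra APIs aren't shown here.

```python
# Illustrative per-app-server load generator for the MQ workloads above:
# 20 threads enqueue messages of varying sizes for a fixed duration.
# `enqueue` is a placeholder for the real MQ client call.
import os
import random
import threading
import time
from typing import Callable

def mq_load(enqueue: Callable[[bytes], None],
            threads: int = 20,
            duration_s: float = 600.0,                  # 10 min (slides: 10min-6hr)
            sizes: tuple[int, ...] = (1_024, 16_384, 262_144)) -> None:
    deadline = time.monotonic() + duration_s

    def worker() -> None:
        while time.monotonic() < deadline:
            enqueue(os.urandom(random.choice(sizes)))   # varying message sizes

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
```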
Workload End to End Coverage* (At a Glance)
UI Network App Search Indexer FFX Batch DB SAN
DB Wkld 8 1
Grinder 1 8 3 3 3 3 7 3
Force.com 1 8 1
VF 7
Apex 7
Sharing 7 2
Search 6 6 6 4 3
MQ 6 2
Mobile 5 4 1
UI 8 5 4 1
Batch 6
*Higher numbers indicate better coverage in a given tier
Daily DB, Appserver, and UI Performance Tests!
UI – End User Response Time Workloads
Appserver Workloads
Database Workloads
168 – Performance Bugs ROI
290 Total = 78 Workloads (27%), 211 Feature Testing (73%)
Note that >50% of P0 bugs were
found by baseline workloads!
Automation and Tools
Tools (Analysis, Monitoring, Automation)
Custom
Michelangelo / Caliper
StatsForce
Leonardo
LightHouse
Suzuki
Cadence
SUIT / CSP
PTest
ReplayForce
DataForce
Off the shelf
JMeter
STAF
JProfiler
Dynatrace Ajax
Splunk
Shunra
Jiffy
HTTP Analyzer
Selenium
Fiddler, YSlow, PageSpeed, Firebug
Michelangelo – Results Viewer
• Provides single point of
entry into all automated
tests
• Dynamic Test vs. Test
views
• Automatic Averaging of
test runs and filtering of
outliers
• Compare baseline to
results trends
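The automatic averaging and outlier filtering could be as simple as a median-absolute-deviation cut before averaging; this is a guess at the approach, not Michelangelo's actual algorithm.

```python
# Sketch of automatic run averaging with outlier filtering; a plausible
# approach, not Michelangelo's actual implementation.
import statistics

def filtered_average(run_results: list[float], k: float = 3.0) -> float:
    """Average test runs after dropping runs > k MADs from the median."""
    med = statistics.median(run_results)
    mad = statistics.median(abs(x - med) for x in run_results) or 1e-9
    kept = [x for x in run_results if abs(x - med) / mad <= k]
    return statistics.mean(kept)

# e.g. one noisy run out of five is discarded before trending:
print(filtered_average([812.0, 805.0, 798.0, 1504.0, 821.0]))  # ~809
```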
Michelangelo Changelist Trend Example
• Dramatically shows changes in performance down to the specific changelist
• Row and Column
highlighting
• Color Coding
• Annotations
• Compare baseline to
results trends
• Absolute and Relative
difference comparisons
Callout: a specific changelist fix results in 33% more GC activity
Michelangelo Demo
StatsForce – High Resolution, Time-Correlated Visualizations
• OS Statistics
• Application Statistics
• JVM Statistics
• Errors
• Mix and match
representations and chart
types on demand
Callout: Notice the benefits of time correlation!
Callout: Notice different chart types (scatter) on the same timeline!
Callout: Notice how Full GCs affect Response Times!
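The value of putting every metric on one timeline is easy to reproduce with any plotting library; here is a minimal matplotlib sketch with synthetic data (not StatsForce output) showing full-GC pauses overlaid on response times:

```python
# Minimal sketch of a time-correlated visualization: full-GC events plotted
# on the same timeline as response times. All data here is synthetic.
import random
import matplotlib.pyplot as plt

t = list(range(600))                               # one sample per second
resp = [120 + random.gauss(0, 10) for _ in t]      # baseline response (ms)
full_gcs = [90, 260, 470]                          # full-GC timestamps (s)
for gc in full_gcs:                                # spike responses near GCs
    for i in range(gc, min(gc + 5, len(t))):
        resp[i] += 900

fig, ax = plt.subplots()
ax.plot(t, resp, label="response time (ms)")
for gc in full_gcs:
    ax.axvline(gc, color="red", linestyle="--",
               label="full GC" if gc == full_gcs[0] else None)
ax.set_xlabel("seconds")
ax.set_ylabel("ms")
ax.legend()
plt.show()
```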
StatsForce Example – Force.com Workload Load Balancer Regression (166 vs. 164)
Callout: “This looks odd!”
StatsForce Example – Errors Per Second
Callout: “This looks odd!”
StatsForce Example – Custom Test vs. Test Views!
Environments
Environment Types
IST (Integration System Testing)
• Large scale pod. Closest to production in both software and hardware configuration (load balancers, 8 node RAC database, etc.)
• Primarily uses production data
CST (Comparison System Testing)
• Small environments focused on Database workloads
• Primarily uses production data
DB Load (Prod, Synthetic)
• Small environment with large sized DBs (4TB – 20TB)
• “Prod” uses production data, “Synthetic” uses synthetic data
Corsa (“Race”)
• Small environments with hardware vertically identical to production
• Fewer horizontal nodes, focused on a particular SUT (Search, DB)
• Does not utilize production data
VMs / Autobuilds
• Dedicated environment for each engineer for development purposes
Desktops / Adhocs
• Dev local machines or Adhocs for PerfEng – dedicated for each engineer for local tests or development
Continuous Data Refresh System
• Enables teams to access the latest production / synthetic data with minimal downtime
• Performance tests can modify / delete TBs of data and roll back in minutes
Details
• Production snapshots and Corsa images are taken periodically and stored on the SAN
• A “jukebox” server prepares snaps into “green” database images
• The jukebox applies schema updates to keep them “green”
• The “green” images are always ready to use and are rsynched directly to the environments
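The refresh flow described above could be orchestrated roughly as follows; paths, hosts, and the snapshot/schema-update commands are placeholders, not the actual jukebox implementation.

```python
# Hypothetical orchestration of the refresh flow; every path, host, and
# helper script here is a placeholder, not the real jukebox.
import subprocess

SNAP_DIR = "/san/snapshots/latest"   # periodic production/Corsa snapshot
GREEN_DIR = "/san/green"             # always-ready "green" image

def prepare_green_image() -> None:
    """Jukebox step: turn a snapshot into a ready 'green' database image."""
    subprocess.run(["cp", "-a", SNAP_DIR, GREEN_DIR], check=True)
    # placeholder for whatever applies schema updates to keep images current
    subprocess.run(["./apply_schema_updates.sh", GREEN_DIR], check=True)

def refresh_environment(env_host: str) -> None:
    """Rsync the green image directly to a test environment."""
    subprocess.run(["rsync", "-a", "--delete", GREEN_DIR + "/",
                    f"{env_host}:/data/db/"], check=True)

# After a destructive test, re-running refresh_environment() rolls the
# environment back to the green image in minutes rather than hours.
```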
Closing Thoughts and Tips
Where is Salesforce.com PerfEng Today? 30,000 ft. view
• Team has evolved from seven “Systest” engineers who
struggled to produce meaningful analysis, to a world class
Performance Engineering organization of >60 engineers with
no significant production issues the day after release for
almost three years
• Actively participates in features, provides visibility and risk
assessment at critical milestones, averts major degradations, helps
triage and mitigate production issues, and delivers optimizations
across the stack; its skills and headcount are now lobbied for by
Development teams
• Automation has increased from two workloads which ran a
handful of times late in the release, to over 15 sophisticated
workload suites that run every day and are critical to signoff
Top Ten Tips for Scaling Your Team
1. Socialize your ratios for PerfEng to Developers to eventually embed into teams
2. Propose a dedicated model over a shared service model
3. In a pinch, provide teams the velocity points they have funded, and ask them to prioritize
4. Build out your management team at every opportunity
5. Develop meaningful automated workloads with low variance and show the ROI regularly
6. Create a tools team that spends >=75% of their time developing automation and tools
7. Make your Labs and Test Frameworks self service
8. Develop production monitoring tools to collect relevant data for workloads and exit criteria
9. Create frameworks to enable staged work from Dev desktop to large scale Perf environments
10. Develop training classes for PerfEng, new hires, Dev/QE liaisons – smaller population first
What else could be responsible for this dramatic optimization?
Could increasing PerfEng continue this trend…?
Bonus Tips for a Happy Team
• Contribute to a positive atmosphere that
promotes Autonomy, Mastery, and Purpose
with interesting projects to tackle in depth
• Focus on your strengths and strive to
improve at every opportunity
• Set a bold vision with achievable
milestones, and celebrate progress
“Is anything truly impossible? Perhaps it is
temporarily impractical or unlikely” – Kasey Lee
Ex. Human Exoskeletons (2:05)
What will you take from today?
What will you change starting next week?
Kasey Lee
VP Performance Engineering
in/leekasey
Turn your PerfEng team from this… into this…
From: Manual Black Box Testers
Into: Architecture / Analysis / Simulation / Optimization / Visualization / Automation / Monitoring Experts
Lines of Defense
1. Single user requests in PTest on VMs
2. Single user requests / high load on Corsa
3. Concurrent / high load on Corsa
4. Single user requests on DB Load
5. Concurrent / high load on DB Load
6. Single user requests on IST
7. Concurrent / high load on IST
Questions