How Salesforce built a Scalable,
World-Class, Performance
Engineering Team
September 18th, 2012
Kasey Lee, Salesforce, VP Performance Engineering
in/leekasey
Safe Harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if
any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-
looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of
product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of
management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments
and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our
service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth,
interruptions or delays in our Web hosting, breach of our security measures, the outcome of intellectual property and other litigation, risks associated
with possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain,
and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling
non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the
financial results of salesforce.com, inc. is included in our quarterly report on Form 10-Q for the most recent fiscal quarter ended July 31, 2012. This
document and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may
not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently
available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
Welcome! What brings you here?
A. I’m curious how PerfEng can excel in an Agile Environment
B. I’m curious how to utilize a Performance Engineer's time
C. I’d like to understand how to better articulate the value of
Performance Engineering
D. I thought this was a great place to take a break and check my
social feeds before dinner
E. A, B, or C
F. All of the above
What do typical Performance Teams start as?
“Performance Engineering is run as a Shared Services
model so your charter is the entire organization with
maximum visibility. Everything flows through PerfEng
because it’s so critical. Dev, QE, Technical
Operations, Level II and III Support, and Professional
Services want the most out of your engineers by
leveraging your talent across projects to scale mission
critical applications”
What it sounds like
PerfEng
What it actually feels like ;)
PerfEng
Top Ten Signs Your Team Needs Help
1. You laugh when asked to sign off at Feature Freeze (and Release Freeze)
2. Your engineers work on 6-12 parallel projects (others work on 1-2 projects serially)
3. If you attended each of your assigned scrum teams’ daily 15 min standup you’d
never sit down (the entire week)
4. When you can’t signoff on a feature, everyone wants to raise the goals instead of
fixing the performance problem
5. Every day you answer “How did you decide to prioritize my feature? How can I
escalate this?” (even after you had agreement)
6. You’re told to commit to a plan for the next release while your team is busiest in the
current release and has no time to plan
7. Your team wants to influence the product or hardware architecture but can’t find the
time to even write up their analysis
8. Developers discount poor results due to variance without looking at the data (even
though the results of the latest release are always worse)
9. IT always asks “Why do you need isolated labs? Dev and QA don’t need them”
10. Devs ask your engineers to do manual tasks at all hours
That sounds like my
situation… How did
Salesforce approach this?
What’s in store?
Introduction
The Unique Challenge at Salesforce
How the Team Scales
Workloads
Automation, Tools, Environments
Closing Thoughts and Tips
Q: Who is Kasey?
Brief Background
VP @ Salesforce
Performance Engineering
Sr. Director / Tech Lead @ Wily Technology
Performance Engineering, Software Tools, QA,
R&D Lab
Architect @ Event Zero
Developer, consultant for startups
Developer @ Ziff-Davis Benchmark Operation
Industry Standard Software Benchmarks
iBench, WebBench, ServerBench, NetBench
What drew me to Salesforce?
• Performance and Scalability is one of the
top three core values of the company
• One of the most complex Enterprise
scalability challenges anywhere
• As of today one of the best funded teams in
the industry and growing as quickly as we
can find the best people
What are some key challenges at Salesforce?
1. Mission Critical Enterprise Apps Customers pay for
No perf testing in production on unwary customers
No tolerance for downtime or slow response times which
immediately impact customers’ bottom line
2. Security is Paramount
Extremely difficult to access production systems / data
Can’t easily examine load and data shapes in detail
3. True Multi-Tenant Architecture
Every customer can create completely different load / data
characteristics at a moment’s notice
Noteworthy Milestones
Mid 2006 – “System Test” Team created from HA crisis
April 2008 – Kasey Lee joins a struggling team of 7
Sept 2008 – Automation & Tools Team Created
Sept 2009 – Team averts162 R1 Load Balancer Disaster
Jan 2010 – Leads solution to Capacity Planning crisis
Sept 2010 – Team predicts GC Heap 168 R1 Regression
Sept 2010 – Team leads solutions to NA6 Perf
Nov 2011 – Team helps reduce production CPU >60%
May 2012 – Team Triages 178 R1 Bytecode Regression
June 2012 – Team size rises to 60
Jan 2013 - Target size: 80+
Traffic & complexity continue to increase to ~60B / quarter, but response times have decreased!
Major Accomplishments ex. – “CPU 15”
• SWAT team optimization /
tuning efforts saved the
company ~$150 million
• Optimizations include potential
to change the JVM spec
directly to benefit everyone
• Great example of ROI
Not only in dollars, but helps
build the credibility that you can
leverage to do even more
Performance Daily Dashboard – 10/24/2011 – RED!
Performance Daily Dashboard – 11/15/2011 – Look at all that GREEN!
How do we accomplish this?
• Baseline Functionality & Benchmarking
• New Feature Benchmarking
• Patches / Production Support
• Hardware / Infrastructure Analysis
• Special Studies / Research / POC
• Production Visualizations
• Capacity / Sizing Guides
• Architecture Expertise
• Profiling Concepts and Training
• Automation Frameworks
• Self Service Frameworks
• Data Analysis, Creation, Visualization Tools
• Load Generation Tools
• Environment Design
• Optimization
What We Continually Focus On
Blazing fast performance delivered by Cloud teams and PerfEng through collaboration, innovation and transparency
Empowered and engaged PerfEng inspired by the real world impact of
their work and widely recognized as industry thought leaders
Quick and accurate test results, effective testing, seamless scheduling
and flexible environments
Frequent assessment, optimizations, and deep visibility into feature
performance during development and in production
Fully integrating PerfEng into product development as beneficial and
essential members of Cloud teams
Performance built in by Cloud teams and able to catch obvious
performance issues themselves
What really makes us so effective?
1. Our Perf/Dev ratios have been adopted (after numerous “discussions”)
2. We have a Software Development Team
3. We have a Product Owner (Prod. Mgr) for our Labs
4. We have a dedicated TechOps team “PerfInfra” for Labs
5. We have a substantial lab for testing
6. We have a Program Manager focused on cross functional project strategy,
visibility, and communications
Performance Engineering Team Structure
Performance
• Sales/Service/Data – Features, Workloads
• Chatter – Features, Workloads
• Platform/Mobile/UI – Features, Workloads
• Core/Search/Analytics – Features, Workloads
Automation & Tools & Env
• Software Tools Developers
• Environments
• Product Owner
• Special Projects Lead
Architect
Program Manager
The Importance of Early Starts… PerfEng Historical Lag per Release
[Timeline: product development sprints, Jan–Jun, with milestones Planning, Final Plans Due, Feature Freeze, Release Sprint, Release Freeze, Sandbox, and R1 Release. Historically PerfEng began testing late in the release, so performance bug debt accumulated and the cost of finding & fixing bugs grew.]
Why Start Early and Profile Upstream?
Avoid obvious problems earlier:
• Late starts with minimal workloads
• Increased workloads and decreased time to bring online
• No longer need to track
PerfEng Starts vs. Release Timeline
[Timeline: the same Jan–Jun product development sprints and milestones – Planning, Final Plans Due, Feature Freeze, Release Sprint, Release Freeze, Sandbox, R1 Release – showing when PerfEng starts testing relative to the release.]
Q: How do we scale PerfEng to meet the
demands of a larger organization?
Ratios are Key to Establish and Socialize
PerfEng established a 1:8 ratio of Perf/Dev IC
No more than two scrum teams or three projects / release
Does not include workload engineers (min of two per cloud)
Does not include Managers or Software Tools Engineers
Perf Managers/IC ratio may need to be higher than 1:8
Managers may require 1:3 or 1:5 due to the additional teams
managers interact with cross functionally
Find a ratio that enables PerfEng
Factor early participation, deep dives, optimization work to provide
meaningful contributions
Support discussion with velocity points, automation and efficiency
examples, ROI Examples
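As a back-of-the-envelope illustration of how these ratios add up (the developer and cloud counts below are hypothetical, not Salesforce's actual numbers):

```python
# Hypothetical sketch: turn the stated staffing ratios into a headcount model.
# Assumptions from the slide: 1:8 Perf/Dev ICs, a minimum of two workload
# engineers per cloud, and managers at roughly 1:5; tools engineers excluded.
import math

def perfeng_headcount(num_devs: int, num_clouds: int) -> dict:
    feature_ics = math.ceil(num_devs / 8)     # 1:8 Perf/Dev IC ratio
    workload_engineers = 2 * num_clouds       # min of two per cloud
    ics = feature_ics + workload_engineers
    managers = math.ceil(ics / 5)             # 1:5 manager ratio
    return {"feature_ics": feature_ics,
            "workload_engineers": workload_engineers,
            "managers": managers,
            "total": ics + managers}

# Example: 400 developers across 4 clouds -> 50 feature ICs, 8 workload
# engineers, 12 managers, 70 total.
print(perfeng_headcount(400, 4))
```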
Gap is closing today, but still hasn’t reached the target
[Chart: Dev headcount vs. PerfEng headcount per release – the gap was >2x, >2x, and is now >1.2x.]
Self Service to the Rescue!
Embed Performance Mindset into Every Team
“Closely partner with scrum teams to
provide early, fast, continuous
architecture engagement / results /
analysis for complex scenarios and
enable scrum teams to catch obvious
performance issues with self service
tools, automation, and processes before
they reach Performance Engineering”
How We Interact with Scrum Teams
• Each scrum team appoints one Dev and one QE engineer
who are mapped to a single PerfEng rep
• Teams must co-develop their release plans and sign off
criteria up front
• Teams are accountable for their features (complete
ownership coming back to PerfEng as team scales up)
• Teams must characterize obvious performance criteria
themselves every sprint (Cadence, PTest)
• Teams must deliver their features on time or accept
testing into the release sprint or beyond
PerfEng Rep Mapping
[Diagram: one PerfEng rep mapped to the Dev/QE reps of multiple scrum teams.]
Embedding Performance – A Tiered Approach
• 80% Scrum Team + 0% PerfEng: Single user transactions on Desktops/Local Builds; Single user transactions in PTests
• 15% Scrum Team + 40% PerfEng: Single user transactions on Corsa; High Load, High Concurrency on Corsa
• 5% Scrum Team + 60% PerfEng: Single user transactions on IST; High Load, High Concurrency on IST
(Test complexity and feature risk increase from tier to tier.)
Scrum Teams focus on catching obvious low-hanging fruit; PerfEng focuses on difficult-to-construct, high load/concurrency scenarios requiring highly specialized knowledge to detect and analyze
86 GB of metadata primarily from PerfEng workload tests!
1.4 TB of metadata from tests created by Devs and outside teams!
Q: What are the key Agile Release
Milestones and activities for PerfEng?
Release Timeline and PerfEng Activities
[Timeline: product development sprints, Jan–Jun, with milestones Planning, Final Plans Due, Feature Freeze, Release Sprint, Release Freeze, Sandbox, R1 Release.]
Milestones and activities:
• Planning / Final Plans Due: Appoint Liaisons; Complete Release Plans; Double Check Exit Criteria
• Feature Freeze: Initial visibility into all features; Signoff on ¾ of features; Get workloads green
• Release Freeze: Signoff on all Features and Workloads; Continue Workload Optimizations
• Sandbox / R1 Release: Monitor Sandbox; Final Optimizations
How Do We Allocate Engineers’ Time?
70% Velocity Points Open
Feature or Workloads work for a specific cloud
30% Velocity Points Reserved
PTOn (9 days/year to work on whatever they want)
External Training Classes (e.g. SQL Tuning)
Other Clouds’ projects they are interested in
Conferences (e.g. HBase, Hadoop)
Foundation events (1:1:1)
We leverage Agile and ADM to enable people’s changing interests
Templates Cover the Most Important Phases of a Project: Requirements/Arch, Strategy/Test Plan, Analysis/Results
Release Signoff Criteria and Team Dynamics
• PerfEng will only sign off on features we
worked on directly (or have thoroughly
reviewed the plans and results)
• Scrum Teams may sign off on features by
themselves at their own risk for any feature
with Medium or Less Risk (if PerfEng is
short of resources)
Quick Tip – Negotiating Release Criteria
Bring in teams from operations and support
Quote examples of consequences of releasing
without adequate throttles and caps in place
Cite examples from your company or other
leading companies of the cost of reduced
customer credibility
Workloads
What is a “Workload”?
• A repeatable test simulation or benchmark that provides a
meaningful result by utilizing specific inputs into the system under
test while recording numerical metric data, which is subsequently
analyzed and weighted to perform a qualitative assessment
• Changing a variable in the workload and re-running provides a
meaningful comparison
• Baseline Workloads are automated and enhanced release over
release wherever possible
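A minimal sketch of that definition in code (all names are illustrative, not actual Salesforce tooling): run the same inputs repeatedly, record numeric metrics, and compare two runs that differ in one variable.

```python
# Illustrative sketch of the workload definition above; these functions
# do not correspond to actual Salesforce tools.
import statistics
from typing import Callable

def run_workload(inject_load: Callable[[], list[float]], iterations: int = 5) -> dict:
    """Run the same load repeatedly and record numeric metrics."""
    samples: list[float] = []
    for _ in range(iterations):
        samples.extend(inject_load())           # e.g. response times in ms
    return {"mean_ms": statistics.mean(samples),
            "p95_ms": statistics.quantiles(samples, n=20)[-1]}

def compare(baseline: dict, candidate: dict, threshold: float = 0.05) -> str:
    """Changing one variable and re-running yields a meaningful comparison."""
    delta = (candidate["mean_ms"] - baseline["mean_ms"]) / baseline["mean_ms"]
    return "REGRESSION" if delta > threshold else "OK"
```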
Workloads Map
DB Workloads
Grinder
Force.com
Apex
VisualForce
Chatter
Search
MQ
UI
Sharing
“Shape” Terminology
Load Shape – The distribution, rate, and type of requests injected into the system under test (SUT)
Data Shape – The size, skew, and type of data, files, etc. accessed during the test
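One hypothetical way to make the two terms concrete as data structures (field names and example values are invented for illustration, not taken from Salesforce tooling):

```python
# Hypothetical encoding of the two "shape" concepts.
from dataclasses import dataclass, field

@dataclass
class LoadShape:
    """Distribution, rate, and type of requests injected into the SUT."""
    requests_per_second: float
    concurrency: int
    request_mix: dict[str, float] = field(default_factory=dict)

@dataclass
class DataShape:
    """Size, skew, and type of data, files, etc. accessed during the test."""
    total_size_gb: float
    skew: str
    record_types: list[str] = field(default_factory=list)

# e.g. a rough description of the Grinder workload's shapes from the table
# below (concurrency and size values here are invented for illustration):
grinder_load = LoadShape(400.0, 200, {"replayed_transaction": 1.0})
grinder_data = DataShape(1000.0, "production skew (sanitized copy)", ["all"])
```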
Categories
Playback tests take production traffic logs and replay traffic against the cut of data from
that time period
• This enables Salesforce.com to properly capture data skews, volumes, and transactions that customers have
run at a particular time and cover features that are heavily customizable
Synthetic tests involve utilizing custom tools to profile production load and data shapes
and then using custom tools to create workloads that mimic the desired characteristics
• Synthetic tests enable the team to create data and load shapes that may be far greater or more accentuated
than in production, in a deterministic and precise fashion that enables granular studies of linearity,
bottlenecks, and resource utilization
• In most situations different versions of Salesforce are compared against one another, although absolute
performance metrics are used for new features or situations where it is too difficult to make meaningful
comparisons
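A bare-bones sketch of the playback idea: parse a production traffic log and re-issue each request against the test pod, preserving the original inter-arrival times. The log format and replay loop here are hypothetical, not Salesforce's actual tooling.

```python
# Hypothetical playback sketch: replay logged production requests against a
# test pod, preserving the original request pacing. Log format is invented:
# each line is "<epoch_seconds> <METHOD> <path>".
import time
import urllib.request

def replay(log_path: str, target_host: str) -> None:
    prev_ts = None
    with open(log_path) as log:
        for line in log:
            ts_str, method, path = line.rstrip("\n").split(" ", 2)
            ts = float(ts_str)
            if prev_ts is not None:
                time.sleep(max(0.0, ts - prev_ts))   # keep production pacing
            prev_ts = ts
            req = urllib.request.Request(f"https://{target_host}{path}", method=method)
            try:
                urllib.request.urlopen(req, timeout=30)
            except Exception:
                pass  # a real harness would record errors/second separately
```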
Workload Highlights

DB Workloads
Summary: A workload that replays real production requests against customer data in a precise fashion to meticulously identify proper DB stats and tuning
Load Shape: 100,000 complex reports and filters
Data Shape: Sanitized copy of real-world production data, with emphasis on massive data sets for reports

Grinder
Summary: A large scale, high load, high concurrency test that simulates an hour of peak production traffic by replaying transactions
Load Shape: 400 RPS; target production steady state utilization of 35%, peaks of 80%
Data Shape: Sanitized copy of real-world production data

Force.com
Summary: Simulates traffic against a standard Ideas site / base application
Load Shape: Requests are generated across 40 different URLs / operations
Data Shape: Synthetic data based on the real-world force.com app “Ideas”

Visualforce
Summary: A read-only targeted test isolating specific components of VF at high request rates. Apex components are designed to be constant across all requests so regressions are due to pure VF
Load Shape: 32 concurrent requests across 10 orgs
Data Shape: Small synthetic VF classes; Viewstates, Wrapperless / Wrapped nested data presentation, and Namespaces

Apex
Summary: A targeted test that exercises the components of Apex Cache, CPU consumption, and Memory
Load Shape: 64 threads across 16 organizations
Data Shape: Synthetic set of classes that exercise Apex Cache, CPU use of Apex L1, maximal number of lines of Apex, creation of temporary objects

Sharing
Summary: A workload that performs DML operations on Sharing Enabled Orgs, performs Sharing Rule Maintenance operations on various entities, Territory Management operations and Account/Opportunity reassignments, and routes bulk processing via Message Queues
Load Shape: 2 app servers, 10 concurrent users, 7 thread groups, 3 bulk processing
Data Shape: Synthetic Orgs (one Territory Managed, one Regular)
Workload Highlights (continued)

Search
Summary: High load, high concurrency test that simulates peak production traffic by replaying searches and concurrently simulating incremental indexing. Monitors and reports metrics on the entire stack [Indexers, DB, Query Servers, App Servers, Memcached]
Load Shape: Replays production searches and performs incremental indexing at peak load; issues searches at 55 RPS
Data Shape: Sanitized copy of real-world production data

MQ Workload: QPID (transport in isolation)
Summary: A workload which enqueues messages into QPID on an IST using multiple IST app servers. Tests QPID (the MQ transport service) in isolation. Suitable for acceptance testing an upgrade
Load Shape: 20 app servers x 20 threads enqueue messages of varying sizes for 10min-6hr
Data Shape: Synthetic; configurable message size

MQ Workload: Hydra (integrated)
Summary: A workload which creates load on the integrated SFDC MQ framework using the SFDC MQ API library. Uses synthetic asynchronous handlers running on the app servers to simulate message and resource consumption. Suitable for running with every release, and for simulating the impact of a new asynchronous handler (see the enqueue sketch after this table)
Load Shape: 20 app servers x 20 threads enqueue messages of varying sizes for 10min-6hr
Data Shape: Synthetic; configurable message size

Mobile
Summary: Workloads simulate user actions over a real 3G network. Captures metrics to measure end-user perceived response times on slow networks and real devices
Load Shape: Real device & emulator, on real 3G networks
Data Shape: Sanitized copy of real-world production data

UI
Summary: Workloads simulate user actions in a real browser. Captures metrics to measure end-user perceived response times. Org with Chatter data is very large
Load Shape: 6 browsers – nightly tests
Data Shape: Synthetic user data across all standard pages; 3 different orgs to test across different skins/Chatter
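As referenced in the MQ rows above, the per-app-server side of that load shape could look like the following sketch; the `enqueue` callable stands in for whatever MQ client the transport exposes, since the actual QPID/Hydra APIs aren't shown here.

```python
# Illustrative per-app-server load generator for the MQ workloads above:
# 20 threads enqueue messages of varying sizes for a fixed duration.
# `enqueue` is a placeholder for the real MQ client call.
import os
import random
import threading
import time
from typing import Callable

def mq_load(enqueue: Callable[[bytes], None],
            threads: int = 20,
            duration_s: float = 600.0,                  # 10 min (slides: 10min-6hr)
            sizes: tuple[int, ...] = (1_024, 16_384, 262_144)) -> None:
    deadline = time.monotonic() + duration_s

    def worker() -> None:
        while time.monotonic() < deadline:
            enqueue(os.urandom(random.choice(sizes)))   # varying message sizes

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
```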
Workload End to End Coverage* (At a Glance)
UI Network App Search Indexer FFX Batch DB SAN
DB Wkld 8 1
Grinder 1 8 3 3 3 3 7 3
Force.com 1 8 1
VF 7
Apex 7
Sharing 7 2
Search 6 6 6 4 3
MQ 6 2
Mobile 5 4 1
UI 8 5 4 1
Batch 6
*Higher numbers indicate better coverage in a given tier
Daily DB, Appserver, and UI Performance Tests!
UI – End User Response Time Workloads
Appserver Workloads
Database Workloads
168 – Performance Bugs ROI
290 Total = 78 Workloads (27%), 211 Feature Testing (73%)
Note that >50% of P0 bugs were
found by baseline workloads!
Automation and Tools
Tools (Analysis, Monitoring, Automation)
Custom
Michelangelo / Caliper
StatsForce
Leonardo
LightHouse
Suzuki
Cadence
SUIT / CSP
PTest
ReplayForce
DataForce
Off the shelf
JMeter
STAF
JProfiler
Dynatrace Ajax
Splunk
Shunra
Jiffy
HTTP Analyzer
Selenium
Fiddler, YSlow, PageSpeed, Firebug
Michelangelo – Results Viewer
• Provides single point of
entry into all automated
tests
• Dynamic Test vs. Test
views
• Automatic Averaging of
test runs and filtering of
outliers
• Compare baseline to
results trends
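The automatic averaging and outlier filtering could be as simple as a median-absolute-deviation cut before averaging; this is a guess at the approach, not Michelangelo's actual algorithm.

```python
# Sketch of automatic run averaging with outlier filtering; a plausible
# approach, not Michelangelo's actual implementation.
import statistics

def filtered_average(run_results: list[float], k: float = 3.0) -> float:
    """Average test runs after dropping runs > k MADs from the median."""
    med = statistics.median(run_results)
    mad = statistics.median(abs(x - med) for x in run_results) or 1e-9
    kept = [x for x in run_results if abs(x - med) / mad <= k]
    return statistics.mean(kept)

# e.g. one noisy run out of five is discarded before trending:
print(filtered_average([812.0, 805.0, 798.0, 1504.0, 821.0]))  # ~809
```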
Michelangelo Changelist Trend Example
• Dramatically shows changes in performance down to the specific changelist
• Row and Column
highlighting
• Color Coding
• Annotations
• Compare baseline to
results trends
• Absolute and Relative
difference comparisons
Callout: a specific changelist fix results in 33% more GC activity
Michelangelo Demo
StatsForce – High Resolution, Time-Correlated Visualizations
• OS Statistics
• Application Statistics
• JVM Statistics
• Errors
• Mix and match
representations and chart
types on demand
Callout: Notice the benefits of time correlation!
Callout: Notice different chart types (scatter) on the same timeline!
Callout: Notice how Full GCs affect Response Times!
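The value of putting every metric on one timeline is easy to reproduce with any plotting library; here is a minimal matplotlib sketch with synthetic data (not StatsForce output) showing full-GC pauses overlaid on response times:

```python
# Minimal sketch of a time-correlated visualization: full-GC events plotted
# on the same timeline as response times. All data here is synthetic.
import random
import matplotlib.pyplot as plt

t = list(range(600))                               # one sample per second
resp = [120 + random.gauss(0, 10) for _ in t]      # baseline response (ms)
full_gcs = [90, 260, 470]                          # full-GC timestamps (s)
for gc in full_gcs:                                # spike responses near GCs
    for i in range(gc, min(gc + 5, len(t))):
        resp[i] += 900

fig, ax = plt.subplots()
ax.plot(t, resp, label="response time (ms)")
for gc in full_gcs:
    ax.axvline(gc, color="red", linestyle="--",
               label="full GC" if gc == full_gcs[0] else None)
ax.set_xlabel("seconds")
ax.set_ylabel("ms")
ax.legend()
plt.show()
```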
StatsForce Example – Force.com Workload Load Balancer Regression (166 vs. 164)
Callout: “This looks odd!”
StatsForce Example – Errors Per Second
Callout: “This looks odd!”
StatsForce Example – Custom Test vs. Test Views!
Environments
Environment Types
IST (Integration System Testing)
• Large scale pod. Closest to production in both software and hardware configuration (load balancers, 8 node RAC database, etc.)
• Primarily uses production data
CST (Comparison System Testing)
• Small environments focused on Database workloads
• Primarily uses production data
DB Load (Prod, Synthetic)
• Small environment with large sized DBs (4TB – 20TB)
• “Prod” uses production data, “Synthetic” uses synthetic data
Corsa (“Race”)
• Small environments with hardware vertically identical to production
• Fewer horizontal nodes, focused on a particular SUT (Search, DB)
• Does not utilize production data
VMs / Autobuilds
• Dedicated environment for each engineer for development purposes
Desktops / Adhocs
• Dev local machines or Adhocs for PerfEng – dedicated for each engineer for local tests or development
Continuous Data Refresh System
• Enables teams to access the latest production / synthetic data with minimal downtime
• Performance tests can modify / delete TBs of data and roll back in minutes
Details
• Production snapshots and Corsa images are taken periodically and stored on the SAN
• A “jukebox” server prepares snaps into “green” database images
• The jukebox applies schema updates to keep them “green”
• The “green” images are always ready to use and are rsynched directly to the environments
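The refresh flow described above could be orchestrated roughly as follows; paths, hosts, and the snapshot/schema-update commands are placeholders, not the actual jukebox implementation.

```python
# Hypothetical orchestration of the refresh flow; every path, host, and
# helper script here is a placeholder, not the real jukebox.
import subprocess

SNAP_DIR = "/san/snapshots/latest"   # periodic production/Corsa snapshot
GREEN_DIR = "/san/green"             # always-ready "green" image

def prepare_green_image() -> None:
    """Jukebox step: turn a snapshot into a ready 'green' database image."""
    subprocess.run(["cp", "-a", SNAP_DIR, GREEN_DIR], check=True)
    # placeholder for whatever applies schema updates to keep images current
    subprocess.run(["./apply_schema_updates.sh", GREEN_DIR], check=True)

def refresh_environment(env_host: str) -> None:
    """Rsync the green image directly to a test environment."""
    subprocess.run(["rsync", "-a", "--delete", GREEN_DIR + "/",
                    f"{env_host}:/data/db/"], check=True)

# After a destructive test, re-running refresh_environment() rolls the
# environment back to the green image in minutes rather than hours.
```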
Closing Thoughts and Tips
Where is Salesforce.com PerfEng Today? 30,000 ft. view
• Team has evolved from seven “Systest” engineers who
struggled to produce meaningful analysis, to a world class
Performance Engineering organization of >60 engineers with
no significant production issues the day after release for
almost three years
• Actively participates in features, provides visibility and risk
assessment at critical milestones, averts major degradations, helps
triage and mitigate production issues, and delivers optimizations
across the stack; its skills and headcount are now lobbied for by
Development teams
• Automation has increased from two workloads which ran a
handful of times late in the release, to over 15 sophisticated
workload suites that run every day and are critical to signoff
Top Ten Tips for Scaling Your Team
1. Socialize your ratios for PerfEng to Developers to eventually embed into teams
2. Propose a dedicated model over a shared service model
3. In a pinch, provide teams the velocity points they have funded, and ask them to prioritize
4. Build out your management team at every opportunity
5. Develop meaningful automated workloads with low variance and show the ROI regularly
6. Create a tools team that spends >=75% of their time developing automation and tools
7. Make your Labs and Test Frameworks self service
8. Develop production monitoring tools to collect relevant data for workloads and exit criteria
9. Create frameworks to enable staged work from Dev desktop to large scale Perf environments
10. Develop training classes for PerfEng, new hires, Dev/QE liaisons – smaller population first
What else could be responsible for this dramatic optimization?
Could increasing PerfEng continue this trend…?
Bonus Tips for a Happy Team
• Contribute to a positive atmosphere that
promotes Autonomy, Mastery, and Purpose
with interesting projects to tackle in depth
• Focus on your strengths and strive to
improve at every opportunity
• Set a bold vision with achievable
milestones, and celebrate progress
“Is anything truly impossible? Perhaps it is
temporarily impractical or unlikely” – Kasey Lee
Ex. Human Exoskeletons (2:05)
What will you take from today?
What will you change starting next week?
Kasey Lee
VP Performance Engineering
in/leekasey
Turn your PerfEng team from this… into this…
From: Manual Black Box Testers
Into: Architecture / Analysis / Simulation / Optimization / Visualization / Automation / Monitoring Experts
Lines of Defense
1. Single user requests in PTest on VMs
2. Single user requests / high load on Corsa
3. Concurrent / high load on Corsa
4. Single user requests on DB Load
5. Concurrent / high load on DB Load
6. Single user requests on IST
7. Concurrent / high load on IST
Questions