Scaling Development
Darrin West - Emergent
Scaling Development: Challenges
- Change
- Scale of team
- Scale of SW/codebase/content
- Time crunch; long critical path
- Software engineering at scale adds overhead, but allows for parallelism

Team Organization / Software Engineering Processes
- Team size – code/content change rate
- Red Team, Blue Team (pipelining)
- Enforced modularity (functional areas)
- Better, more reliable direct parallelism (no locking)
- Reduce overhead, retain correctness/causality
- Less waste, less blocking
Configuration/Change Management
- Builds breaking (probabilities)
- A 1% chance of a break (say, twice a year) for each of 50 engineers gives only a 60% chance of a good build (0.99^50 ≈ 0.605)
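In general, with N engineers each carrying an independent probability p of breaking the build, the chance of a good build is

    P(good) = (1 - p)^N = (1 - 0.01)^50 ≈ 0.605

so the odds fall off exponentially as the team grows.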
Sample SE Processes
- Per-developer branches; an Integration Engineer
- Continuous Integration: tests run during the build
- Fast incremental builds and fast load/test times, or engineers might skip them
- Automatic regression testing
- Pre-checkin tools
- Branch and run: don't wait for everyone at the end of a "phase"
Nimble (but Big) Development
- Iteration rate: build, test, reliability
- Lazy (minimal) content loading: small init time. Don't wait to optimize this; build a sophisticated asset loader early. It pays every day (a sketch follows below)
- Play test: reliable, quick injection of new ideas
- Design tuning (without making a new build): content iteration
- Incremental development: always demoable, small sprints
- Scale and break testing
- Rapid application of changes to Production
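A minimal sketch of the lazy-loading idea, with hypothetical Asset and loadFromDisk names (the real pipeline read goes where the stub is); content is fetched and cached on first use, so init time stays small no matter how large the content base grows:

    #include <memory>
    #include <string>
    #include <unordered_map>

    // Hypothetical asset record; in a real engine this would be a mesh,
    // texture, script, etc.
    struct Asset { std::string path; /* ...payload... */ };

    // Stub for the real disk/pipeline read.
    std::shared_ptr<Asset> loadFromDisk(const std::string& path) {
        return std::make_shared<Asset>(Asset{path});
    }

    class LazyAssetCache {
    public:
        // Returns the asset, loading it on first request only.
        std::shared_ptr<Asset> get(const std::string& path) {
            auto it = cache_.find(path);
            if (it != cache_.end()) return it->second;  // already resident
            auto asset = loadFromDisk(path);            // pay only when needed
            cache_.emplace(path, asset);
            return asset;
        }
    private:
        std::unordered_map<std::string, std::shared_ptr<Asset>> cache_;
    };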
En-Nimbling Suggestions
- Fast iteration: don't restart the app/server just for new content; don't load all content, use lazy loading
- Minimize build times: incremental build/link; break dependencies on implementation to minimize recompiles
- Fast adaptation to change: design for change; script development; very modular; solid lower layers; fewer side effects
- Autotesting for a sure footing; easy refactoring
- Optimize later (Kent Beck, rephrased): make it work at all (and start getting feedback); make it work right (well-formed, refactored, optimizable); make it work efficiently (and still right)
Independent Iterations
- Individual, parallel iterations require modularity and independence of modules
- The ability to merge (even content)
- Invest in the overhead (OH) of configuration management (C.M.) to get some parallelism (||ism)
- Which OHs are *really* needed/best? Content/code locking? A checkin "token"? Pre/post-checkin code reviews?
- A consistent, simple, flexible approach (software architecture)
Technique: Modularity and Interfaces
- A change is encapsulated/insulated: it makes the system flexible, makes the system resilient to breakage, and lets parts be reused
- Dependencies: short include paths, faster builds, dependency checks
- Acyclic compile and link dependencies: forward declare rather than #include; use opaque pointers. The structure tends to "drift", so decide how to avoid cycles even in an emergency
- Interfaces/abstractions; layering/levelizing: "low level" modules should publish an interface (so the implementation can change). Implementation can depend on Interface; nothing should depend on Implementation. Pure-virtual classes (a sketch follows the diagram below)
[Diagram: two module dependency graphs over modules A-G, contrasting a tangled dependency structure with an acyclic, layered one.]
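A minimal sketch of these techniques, with hypothetical names (Renderer, Texture, GlRenderer): the published header forward-declares its types and exposes only a pure-virtual interface, so clients hold an opaque pointer and never rebuild when the implementation changes:

    // renderer.h -- the published interface; no implementation details leak.
    class Texture;                      // forward declaration, not #include
    class Renderer {
    public:
        virtual ~Renderer() = default;
        virtual void draw(const Texture& t) = 0;  // pure virtual
    };
    Renderer* createRenderer();         // callers see only the abstraction

    // renderer.cpp -- the implementation; only this file recompiles on change.
    #include "renderer.h"
    #include "texture.h"                // the heavy include stays in the .cpp
    class GlRenderer : public Renderer {
    public:
        void draw(const Texture& t) override { /* ...GL calls... */ }
    };
    Renderer* createRenderer() { return new GlRenderer; }

Implementation depends on Interface; nothing depends on Implementation, and the include graph stays acyclic.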
Modularity (2)
- DLL/.so: a significant reduction in link times; encapsulates change; enables plugins and optional loading (a sketch follows below)
- With interfaces, you don't have to export symbols: faster link and load times, fewer global symbols
- Disentangled dependencies: easier-to-understand code, easier to test, easier to integrate
- One header file change doesn't rebuild everything, and base libraries are less likely to rebuild. If they do change, only the upper layers need to relink
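A sketch of plugin/optional loading on POSIX, reusing the hypothetical Renderer interface above; the shared object exports a single extern "C" factory, so almost nothing lands in the global symbol table:

    // plugin.cpp -- compiled into renderer_gl.so; one exported symbol.
    #include "renderer.h"
    extern "C" Renderer* CreateRenderer() { return createRenderer(); }

    // host.cpp -- loads the plugin only if and when it is wanted.
    #include <dlfcn.h>
    #include "renderer.h"

    Renderer* loadRendererPlugin(const char* soPath) {
        void* lib = dlopen(soPath, RTLD_NOW | RTLD_LOCAL);  // optional loading
        if (!lib) return nullptr;                   // feature is simply absent
        using Factory = Renderer* (*)();
        auto create = reinterpret_cast<Factory>(dlsym(lib, "CreateRenderer"));
        return create ? create() : nullptr;
    }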
Frameworks
- Sharing cycles: message driven, with multiple message handlers. Message oriented, not locks (a sketch follows below)
- Ease of understanding/debugging: single threaded, single writer
- Multi-process works the same as multi-threaded
- Allocate units of work to more or fewer processes based on measurements/policy, not data/chance
- Single paradigm: easier for an engineer to move to a new part of the app. No one engineer can know all of the app
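A minimal sketch of the message-driven style, with hypothetical Message/Dispatcher names: handlers run one message at a time on a single thread, so each piece of state has a single writer and handlers need no locks:

    #include <deque>
    #include <functional>
    #include <unordered_map>
    #include <utility>

    struct Message { int type; /* ...payload... */ };
    using Handler = std::function<void(const Message&)>;

    class Dispatcher {
    public:
        void registerHandler(int type, Handler h) { handlers_[type] = std::move(h); }
        void post(const Message& m) { queue_.push_back(m); }

        // Single-threaded pump: one message at a time, single writer, no locks.
        void run() {
            while (!queue_.empty()) {
                Message m = queue_.front();
                queue_.pop_front();
                auto it = handlers_.find(m.type);
                if (it != handlers_.end()) it->second(m);
            }
        }
    private:
        std::deque<Message> queue_;
        std::unordered_map<int, Handler> handlers_;
    };

The same handler code runs whether dispatchers live in one process or many, which is what makes multi-process the same as multi-threaded.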
Lessons Learned from COM & .NET
- Lifetime management (who owns this ptr?) (a sketch follows below)
- Interface/abstraction
- Plugin/optional loading
- Factories
- Init phases
- Recompile independence
- Dependency simplicity
- Layers; peers/app
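A sketch of the COM-style answer to "who owns this ptr?": intrusive reference counting with AddRef/Release (names borrowed from COM; simplified here, and not thread-safe):

    // Every holder of the pointer AddRef()s; every holder Release()s;
    // the last Release() destroys the object, so ownership is explicit.
    class RefCounted {
    public:
        void AddRef() { ++refs_; }
        void Release() { if (--refs_ == 0) delete this; }
    protected:
        virtual ~RefCounted() = default;
    private:
        int refs_ = 1;   // the creator holds the first reference
    };

In modern C++ the same question is usually answered by having factories return std::shared_ptr.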
Pitfalls/Alternatives
- Massive link
- Circular dependencies
- "Infinite" include paths
- Unknowable dependencies
- Bad layering, leading to no reuse
Resources
- "Large-Scale C++ Software Design", John Lakos. Some out-of-date ideas, some obvious, some very good! Scale has gone up an order of magnitude since
- "Refactoring", Martin Fowler. Because you won't get it right the first time
- "The Game Asset Pipeline", Ben Carter. Good tips, but a fundamentally new approach is still needed
- Scrum: http://danube.com
Scaling with Parallelism
Darrin West - Emergent
Scaling Performance: Challenges
- In distributed systems, it is not just code efficiency but interaction patterns/timing
- Compute "load" varies in time, location, project phase...
- The OH vs. granularity/parallelism trade
- Blocking/idling and deadlock
Work/Computation Organization
- Variable compute load. You can't know it: it changes during development and in real time
- Design changes over the life of the project; SW implementation, structure, and function-level changes
- The community does the unexpected; load varies across areas and times of day
- Heterogeneous host speed and memory (incremental upgrades/purchasing)
- An unpredictable number of processors deployed on; cores or hosts might go offline
Solve Variability: Flexibility and Policy
- This is mainly about the server side, but the lessons do apply to clients, particularly symmetric multiprocessors. Watch out for scheduler assumptions. Where is the data?
- Decompose into small bits, and map them: allows explicit mapping (tunable via metrics and tools) or automatic mapping (if you have the development time)
- Separate mechanism from policy. Policy is easy to adapt; a new mechanism or decomposition is hard (a sketch follows below)
- Break it into a gazillion pieces, but re-aggregate it parametrically to minimize overheads
- Be in control of ||ism, not at the mercy of your data
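A sketch of mechanism vs. policy, with hypothetical names: the mechanism (Mapper) only asks a pluggable policy where an entity should run, and the worker count is a configuration parameter, so both can be retuned without touching the decomposition:

    #include <cstddef>
    #include <functional>
    #include <utility>

    // Policy: decides where an entity runs. Easy to swap or tune.
    using PlacementPolicy =
        std::function<std::size_t(long entityId, std::size_t numWorkers)>;

    // Simplest policy: hash ids across however many workers were configured.
    std::size_t hashPlacement(long entityId, std::size_t numWorkers) {
        return static_cast<std::size_t>(entityId) % numWorkers;
    }

    // Mechanism: stable code that re-aggregates many small pieces
    // onto a parametric number of workers.
    class Mapper {
    public:
        Mapper(std::size_t numWorkers, PlacementPolicy p)
            : numWorkers_(numWorkers), policy_(std::move(p)) {}
        std::size_t workerFor(long entityId) const {
            return policy_(entityId, numWorkers_);
        }
    private:
        std::size_t numWorkers_;   // from config/metrics, not from the data
        PlacementPolicy policy_;
    };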
Logical Processes and Messages
[Diagram: many logical processes (LPs) mapped onto Physical Processor 1 and Physical Processor 2. Caption: "Entities communicate using Messages only. Entities can therefore be mapped to any processor."]
Maximize Flexibility
- Communicating Sequential Processes: single writer, disjoint memory, lock-free
- Decompose into small bits, use a uniform interaction pattern, then recombine
- Beware the performance saddle and its overheads: too many processors, too many processes
Early Decisions on Decomposition/Mapping
- What fraction of the problem goes in each "unit" of work, and onto how many processors? What percentage to graphics and to other functions?
- Interaction patterns: sparseness in space vs. an even mapping to processors
Tradeoffs in the Decision
- Concurrency and non-determinism buy performance, at the cost of serial, logical, repeatable computations
- Correctness vs. efficiency: level of detail, causality
- Overheads: the amount of ||ism is traded against the overheads
- These affect intrinsic vs. achieved ||ism, synthetic ||ism, and the critical path
Effect of Overhead: the Performance Saddle
- The Amdahl limit, plus scheduling/communication overhead
- Ideal vs. real speedup (which adds the OHs); intrinsic, synthetic, achieved...
[Chart: elapsed time (0-12,000) vs. number of processors (0-35); real elapsed time drops at first, bottoms out, then climbs again as overhead dominates.]
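One simple way to model this curve (an illustrative assumption, not a formula from the slides): with serial time Ts, parallelizable work Tp, and a per-processor scheduling/communication cost c,

    T(P) = Ts + Tp/P + c*P

The Tp/P term is the Amdahl-style speedup; the c*P term is the growing overhead. T(P) is minimized near P* = sqrt(Tp/c), and beyond that point adding processors makes it slower.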
Efficiency: Examples of Overhead
- Linear increases in overhead, not logarithmic: a shared bus, a shared network, global locks (which affect other processors), blocking/idling, and other MMU/cache-coherency limits
- Ultimately, it is communication/synchronization OH. Speed of light in 3D: real scale is limited to O(N^3)
- Yes, Amdahl says break it into a gazillion pieces to "not limit parallelism", but that adds overhead. What is the ratio of real work to overhead?
- Optimize OH on the critical path by running PingPong, an all-overhead application (a sketch follows below)
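A sketch of such a PingPong measurement between two threads (all overhead, no real work), here using a mutex and condition variable as the synchronization being measured:

    #include <chrono>
    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <thread>

    // Bounce a token back and forth; elapsed time / iterations approximates
    // the per-message synchronization cost, since no real work is done.
    int main() {
        std::mutex m;
        std::condition_variable cv;
        bool ping = true;                 // whose turn it is
        const int iterations = 100000;

        auto t0 = std::chrono::steady_clock::now();
        std::thread pong([&] {
            for (int i = 0; i < iterations; ++i) {
                std::unique_lock<std::mutex> lk(m);
                cv.wait(lk, [&] { return !ping; });
                ping = true;
                cv.notify_one();
            }
        });
        for (int i = 0; i < iterations; ++i) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return ping; });
            ping = false;
            cv.notify_one();
        }
        pong.join();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      std::chrono::steady_clock::now() - t0).count();
        std::printf("%.2f us per round trip\n", double(us) / iterations);
        return 0;
    }

The same harness run across processes or hosts exposes the communication overhead sitting on the critical path.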
Best Policy: Measuring, Reacting
- Sharing "nicely" allows multiple processes per machine
- Waste: constant looping vs. instrumentation/measurement. See the remaining/available processing. Vs. a dynamic level of detail?
- Locating bottlenecks; changing the mapping dynamically; load balance vs. nearness of communication (a sketch follows below)
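A sketch of a measured, reactive placement policy (hypothetical load figures): instead of the fixed hash above, put new work on whichever worker currently reports the lowest busy fraction:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // One instrumented figure per worker: busy fraction over the last
    // measurement window (0.0 = idle, 1.0 = saturated).
    std::size_t leastLoadedWorker(const std::vector<double>& busyFraction) {
        return static_cast<std::size_t>(
            std::min_element(busyFraction.begin(), busyFraction.end()) -
            busyFraction.begin());
    }

The same measurements can drive dynamic remapping: when a worker stays saturated, migrate some of its entities elsewhere, trading load balance against nearness of communication.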
How to Visualize Parallelism: Easy
[Chart: "Compute Hogs": compute time per processor (0-0.14), processors ordered by load.]
[Chart: "Communication Hogs": per-processor communication delay, and communication plus scheduling/queuing delay (0-14).]
[Diagram: per-processor timelines (Processors 1-4) showing computation "grains", messages/requests, scheduling overhead, and idle time. Synchronization/mutex is considered "commo" for simplicity.]
How to Visualize Parallelism: Causality Tools
Summary: Why Message-Passing FWs
- Extracting ||ism
- Efficiency: low OH, lock-free, single writer
- Tunability: granular, mapped
- Cost and time to build: simple, flexible...
- Plan for change. Be in control. Configure.
- Adding processors can make it slower; adding processes can make it slower. Find the best point and tune for that
||ism Alternatives
- Multi-threaded, multiple readers/writers: critical sections bring priority inversion and deadlock; data-access locks are slow even with no contention, and multi-resource races remain
- Stream processing: can cost extra data copying; scheduling effectiveness/efficiency; granularity and the performance saddle
- Functional decomposition: a limited number of pieces; the long pole, Amdahl
- Pipelining (added latency)
- Optimistic execution: memory, complexity, debugging, I/O and GVT
- Parallelizing compilers/small-scale ||ism: communication OH swamps the computing, except in special cases (e.g., with lots of pipelining)
- Think: ease of adapting to change
Development Similarity to Parallel Processing
- You can't tell where the time is going to be spent
- Enough small pieces to keep everyone busy; mapping
- Keep parts independent (single writer)
- Avoid communication and synchronization overheads
- Note immediately when someone is blocked or idle (no busy-work)
- Only work on "real" stuff. No rollback (well, maybe "some")
- Avoid "global" activities: meetings, checkin "semaphores", consensus. Don't wait for everyone to catch up
- Measure and react
- Independent teams coordinated with "messages"
- General-purpose "processors", reassigned to today's task