Scaling Development
Darrin West - Emergent
Scaling Development: Challenges
- Change
- Scale of team
- Scale of SW/codebase/content
- Time crunch; long critical path
- Software engineering at scale adds overhead, but allows for parallelism

Team Organization / Software Engineering Processes
- Team size – code/content change rate
- Red Team, Blue Team (pipelining)
- Enforced modularity (functional areas)
- Better, more reliable direct parallelism (no locking)
- Reduce overhead, retain correctness/causality
- Less waste, less blocking
Configuration/Change Management
- Builds breaking (probabilities)
- A 1% chance of a break (say, twice a year) for each of 50 engineers gives only a 60% chance of a good build (0.99^50 ≈ 0.605)
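In general, with N engineers each carrying an independent probability p of breaking the build, the chance of a good build is

    P(good) = (1 - p)^N = (1 - 0.01)^50 ≈ 0.605

so the odds fall off exponentially as the team grows.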
Sample SE Processes
- Per-developer branches; an Integration Engineer
- Continuous Integration: tests run during the build
- Fast incremental builds and fast load/test times, or engineers might skip them
- Automatic regression testing
- Pre-checkin tools
- Branch and run: don't wait for everyone at the end of a "phase"
Nimble (but Big) Development
- Iteration rate: build, test, reliability
- Lazy (minimal) content loading: small init time. Don't wait to optimize this; build a sophisticated asset loader early. It pays every day (a sketch follows below)
- Play test: reliable, quick injection of new ideas
- Design tuning (without making a new build): content iteration
- Incremental development: always demoable, small sprints
- Scale and break testing
- Rapid application of changes to Production
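A minimal sketch of the lazy-loading idea, with hypothetical Asset and loadFromDisk names (the real pipeline read goes where the stub is); content is fetched and cached on first use, so init time stays small no matter how large the content base grows:

    #include <memory>
    #include <string>
    #include <unordered_map>

    // Hypothetical asset record; in a real engine this would be a mesh,
    // texture, script, etc.
    struct Asset { std::string path; /* ...payload... */ };

    // Stub for the real disk/pipeline read.
    std::shared_ptr<Asset> loadFromDisk(const std::string& path) {
        return std::make_shared<Asset>(Asset{path});
    }

    class LazyAssetCache {
    public:
        // Returns the asset, loading it on first request only.
        std::shared_ptr<Asset> get(const std::string& path) {
            auto it = cache_.find(path);
            if (it != cache_.end()) return it->second;  // already resident
            auto asset = loadFromDisk(path);            // pay only when needed
            cache_.emplace(path, asset);
            return asset;
        }
    private:
        std::unordered_map<std::string, std::shared_ptr<Asset>> cache_;
    };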
En-Nimbling Suggestions
- Fast iteration: don't restart the app/server just for new content; don't load all content, use lazy loading
- Minimize build times: incremental build/link; break dependencies on implementation to minimize recompiles
- Fast adaptation to change: design for change; script development; very modular; solid lower layers; fewer side effects
- Autotesting for a sure footing; easy refactoring
- Optimize later (Kent Beck, rephrased): make it work at all (and start getting feedback); make it work right (well-formed, refactored, optimizable); make it work efficiently (and still right)
Independent Iterations
- Individual, parallel iterations require modularity and independence of modules
- The ability to merge (even content)
- Invest in the overhead (OH) of configuration management (C.M.) to get some parallelism (||ism)
- Which OHs are *really* needed/best? Content/code locking? A checkin "token"? Pre/post-checkin code reviews?
- A consistent, simple, flexible approach (software architecture)
Technique: Modularity and Interfaces
- A change is encapsulated/insulated: it makes the system flexible, makes the system resilient to breakage, and lets parts be reused
- Dependencies: short include paths, faster builds, dependency checks
- Acyclic compile and link dependencies: forward declare rather than #include; use opaque pointers. The structure tends to "drift", so decide how to avoid cycles even in an emergency
- Interfaces/abstractions; layering/levelizing: "low level" modules should publish an interface (so the implementation can change). Implementation can depend on Interface; nothing should depend on Implementation. Pure-virtual classes (a sketch follows the diagram below)
[Diagram: two module dependency graphs over modules A-G, contrasting a tangled dependency structure with an acyclic, layered one.]
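A minimal sketch of these techniques, with hypothetical names (Renderer, Texture, GlRenderer): the published header forward-declares its types and exposes only a pure-virtual interface, so clients hold an opaque pointer and never rebuild when the implementation changes:

    // renderer.h -- the published interface; no implementation details leak.
    class Texture;                      // forward declaration, not #include
    class Renderer {
    public:
        virtual ~Renderer() = default;
        virtual void draw(const Texture& t) = 0;  // pure virtual
    };
    Renderer* createRenderer();         // callers see only the abstraction

    // renderer.cpp -- the implementation; only this file recompiles on change.
    #include "renderer.h"
    #include "texture.h"                // the heavy include stays in the .cpp
    class GlRenderer : public Renderer {
    public:
        void draw(const Texture& t) override { /* ...GL calls... */ }
    };
    Renderer* createRenderer() { return new GlRenderer; }

Implementation depends on Interface; nothing depends on Implementation, and the include graph stays acyclic.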
Modularity (2)
- DLL/.so: a significant reduction in link times; encapsulates change; enables plugins and optional loading (a sketch follows below)
- With interfaces, you don't have to export symbols: faster link and load times, fewer global symbols
- Disentangled dependencies: easier-to-understand code, easier to test, easier to integrate
- One header file change doesn't rebuild everything, and base libraries are less likely to rebuild. If they do change, only the upper layers need to relink
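A sketch of plugin/optional loading on POSIX, reusing the hypothetical Renderer interface above; the shared object exports a single extern "C" factory, so almost nothing lands in the global symbol table:

    // plugin.cpp -- compiled into renderer_gl.so; one exported symbol.
    #include "renderer.h"
    extern "C" Renderer* CreateRenderer() { return createRenderer(); }

    // host.cpp -- loads the plugin only if and when it is wanted.
    #include <dlfcn.h>
    #include "renderer.h"

    Renderer* loadRendererPlugin(const char* soPath) {
        void* lib = dlopen(soPath, RTLD_NOW | RTLD_LOCAL);  // optional loading
        if (!lib) return nullptr;                   // feature is simply absent
        using Factory = Renderer* (*)();
        auto create = reinterpret_cast<Factory>(dlsym(lib, "CreateRenderer"));
        return create ? create() : nullptr;
    }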
Frameworks
- Sharing cycles: message driven, with multiple message handlers. Message oriented, not locks (a sketch follows below)
- Ease of understanding/debugging: single threaded, single writer
- Multi-process works the same as multi-threaded
- Allocate units of work to more or fewer processes based on measurements/policy, not data/chance
- Single paradigm: easier for an engineer to move to a new part of the app. No one engineer can know all of the app
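A minimal sketch of the message-driven style, with hypothetical Message/Dispatcher names: handlers run one message at a time on a single thread, so each piece of state has a single writer and handlers need no locks:

    #include <deque>
    #include <functional>
    #include <unordered_map>
    #include <utility>

    struct Message { int type; /* ...payload... */ };
    using Handler = std::function<void(const Message&)>;

    class Dispatcher {
    public:
        void registerHandler(int type, Handler h) { handlers_[type] = std::move(h); }
        void post(const Message& m) { queue_.push_back(m); }

        // Single-threaded pump: one message at a time, single writer, no locks.
        void run() {
            while (!queue_.empty()) {
                Message m = queue_.front();
                queue_.pop_front();
                auto it = handlers_.find(m.type);
                if (it != handlers_.end()) it->second(m);
            }
        }
    private:
        std::deque<Message> queue_;
        std::unordered_map<int, Handler> handlers_;
    };

The same handler code runs whether dispatchers live in one process or many, which is what makes multi-process the same as multi-threaded.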
Lessons Learned from COM & .NET
- Lifetime management (who owns this ptr?) (a sketch follows below)
- Interface/abstraction
- Plugin/optional loading
- Factories
- Init phases
- Recompile independence
- Dependency simplicity
- Layers; peers/app
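A sketch of the COM-style answer to "who owns this ptr?": intrusive reference counting with AddRef/Release (names borrowed from COM; simplified here, and not thread-safe):

    // Every holder of the pointer AddRef()s; every holder Release()s;
    // the last Release() destroys the object, so ownership is explicit.
    class RefCounted {
    public:
        void AddRef() { ++refs_; }
        void Release() { if (--refs_ == 0) delete this; }
    protected:
        virtual ~RefCounted() = default;
    private:
        int refs_ = 1;   // the creator holds the first reference
    };

In modern C++ the same question is usually answered by having factories return std::shared_ptr.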
Pitfalls/Alternatives
- Massive link
- Circular dependencies
- "Infinite" include paths
- Unknowable dependencies
- Bad layering, leading to no reuse
Resources
- "Large-Scale C++ Software Design", John Lakos. Some out-of-date ideas, some obvious, some very good! Scale has gone up an order of magnitude since
- "Refactoring", Martin Fowler. Because you won't get it right the first time
- "The Game Asset Pipeline", Ben Carter. Good tips, but a fundamentally new approach is still needed
- Scrum: http://danube.com
Scaling with Parallelism
Darrin West - Emergent
Scaling Performance: Challenges
- In distributed systems, it is not just code efficiency but interaction patterns/timing
- Compute "load" varies in time, location, project phase...
- The OH vs. granularity/parallelism trade
- Blocking/idling and deadlock
Work/Computation Organization
- Variable compute load. You can't know it: it changes during development and in real time
- Design changes over the life of the project; SW implementation, structure, and function-level changes
- The community does the unexpected; load varies across areas and times of day
- Heterogeneous host speed and memory (incremental upgrades/purchasing)
- An unpredictable number of processors deployed on; cores or hosts might go offline
Solve Variability: Flexibility and Policy
- This is mainly about the server side, but the lessons do apply to clients, particularly symmetric multiprocessors. Watch out for scheduler assumptions. Where is the data?
- Decompose into small bits, and map them: allows explicit mapping (tunable via metrics and tools) or automatic mapping (if you have the development time)
- Separate mechanism from policy. Policy is easy to adapt; a new mechanism or decomposition is hard (a sketch follows below)
- Break it into a gazillion pieces, but re-aggregate it parametrically to minimize overheads
- Be in control of ||ism, not at the mercy of your data
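A sketch of mechanism vs. policy, with hypothetical names: the mechanism (Mapper) only asks a pluggable policy where an entity should run, and the worker count is a configuration parameter, so both can be retuned without touching the decomposition:

    #include <cstddef>
    #include <functional>
    #include <utility>

    // Policy: decides where an entity runs. Easy to swap or tune.
    using PlacementPolicy =
        std::function<std::size_t(long entityId, std::size_t numWorkers)>;

    // Simplest policy: hash ids across however many workers were configured.
    std::size_t hashPlacement(long entityId, std::size_t numWorkers) {
        return static_cast<std::size_t>(entityId) % numWorkers;
    }

    // Mechanism: stable code that re-aggregates many small pieces
    // onto a parametric number of workers.
    class Mapper {
    public:
        Mapper(std::size_t numWorkers, PlacementPolicy p)
            : numWorkers_(numWorkers), policy_(std::move(p)) {}
        std::size_t workerFor(long entityId) const {
            return policy_(entityId, numWorkers_);
        }
    private:
        std::size_t numWorkers_;   // from config/metrics, not from the data
        PlacementPolicy policy_;
    };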
Logical Processes and Messages
[Diagram: many logical processes (LPs) mapped onto Physical Processor 1 and Physical Processor 2. Caption: "Entities communicate using Messages only. Entities can therefore be mapped to any processor."]
Maximize Flexibility
- Communicating Sequential Processes: single writer, disjoint memory, lock-free
- Decompose into small bits, use a uniform interaction pattern, then recombine
- Beware the performance saddle and its overheads: too many processors, too many processes
Early Decisions on Decomposition/Mapping
- What fraction of the problem goes in each "unit" of work, and onto how many processors? What percentage to graphics and to other functions?
- Interaction patterns: sparseness in space vs. an even mapping to processors
Tradeoffs in the Decision
- Concurrency and non-determinism buy performance, at the cost of serial, logical, repeatable computations
- Correctness vs. efficiency: level of detail, causality
- Overheads: the amount of ||ism is traded against the overheads
- These affect intrinsic vs. achieved ||ism, synthetic ||ism, and the critical path
Effect of Overhead: the Performance Saddle
- The Amdahl limit, plus scheduling/communication overhead
- Ideal vs. real speedup (which adds the OHs); intrinsic, synthetic, achieved...
[Chart: elapsed time (0-12,000) vs. number of processors (0-35); real elapsed time drops at first, bottoms out, then climbs again as overhead dominates.]
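One simple way to model this curve (an illustrative assumption, not a formula from the slides): with serial time Ts, parallelizable work Tp, and a per-processor scheduling/communication cost c,

    T(P) = Ts + Tp/P + c*P

The Tp/P term is the Amdahl-style speedup; the c*P term is the growing overhead. T(P) is minimized near P* = sqrt(Tp/c), and beyond that point adding processors makes it slower.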
Efficiency: Examples of Overhead
- Linear increases in overhead, not logarithmic: a shared bus, a shared network, global locks (which affect other processors), blocking/idling, and other MMU/cache-coherency limits
- Ultimately, it is communication/synchronization OH. Speed of light in 3D: real scale is limited to O(N^3)
- Yes, Amdahl says break it into a gazillion pieces to "not limit parallelism", but that adds overhead. What is the ratio of real work to overhead?
- Optimize OH on the critical path by running PingPong, an all-overhead application (a sketch follows below)
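A sketch of such a PingPong measurement between two threads (all overhead, no real work), here using a mutex and condition variable as the synchronization being measured:

    #include <chrono>
    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <thread>

    // Bounce a token back and forth; elapsed time / iterations approximates
    // the per-message synchronization cost, since no real work is done.
    int main() {
        std::mutex m;
        std::condition_variable cv;
        bool ping = true;                 // whose turn it is
        const int iterations = 100000;

        auto t0 = std::chrono::steady_clock::now();
        std::thread pong([&] {
            for (int i = 0; i < iterations; ++i) {
                std::unique_lock<std::mutex> lk(m);
                cv.wait(lk, [&] { return !ping; });
                ping = true;
                cv.notify_one();
            }
        });
        for (int i = 0; i < iterations; ++i) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return ping; });
            ping = false;
            cv.notify_one();
        }
        pong.join();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      std::chrono::steady_clock::now() - t0).count();
        std::printf("%.2f us per round trip\n", double(us) / iterations);
        return 0;
    }

The same harness run across processes or hosts exposes the communication overhead sitting on the critical path.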
Best Policy: Measuring, Reacting
- Sharing "nicely" allows multiple processes per machine
- Waste: constant looping vs. instrumentation/measurement. See the remaining/available processing. Vs. a dynamic level of detail?
- Locating bottlenecks; changing the mapping dynamically; load balance vs. nearness of communication (a sketch follows below)
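A sketch of a measured, reactive placement policy (hypothetical load figures): instead of the fixed hash above, put new work on whichever worker currently reports the lowest busy fraction:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // One instrumented figure per worker: busy fraction over the last
    // measurement window (0.0 = idle, 1.0 = saturated).
    std::size_t leastLoadedWorker(const std::vector<double>& busyFraction) {
        return static_cast<std::size_t>(
            std::min_element(busyFraction.begin(), busyFraction.end()) -
            busyFraction.begin());
    }

The same measurements can drive dynamic remapping: when a worker stays saturated, migrate some of its entities elsewhere, trading load balance against nearness of communication.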
How to Visualize Parallelism: Easy
[Chart: "Compute Hogs": compute time per processor (0-0.14), processors ordered by load.]
[Chart: "Communication Hogs": per-processor communication delay, and communication plus scheduling/queuing delay (0-14).]
[Diagram: per-processor timelines (Processors 1-4) showing computation "grains", messages/requests, scheduling overhead, and idle time. Synchronization/mutex is considered "commo" for simplicity.]
How to Visualize Parallelism: Causality Tools
Summary: Why Message-Passing FWs
- Extracting ||ism
- Efficiency: low OH, lock-free, single writer
- Tunability: granular, mapped
- Cost and time to build: simple, flexible...
- Plan for change. Be in control. Configure.
- Adding processors can make it slower; adding processes can make it slower. Find the best point and tune for that
||ism Alternatives
- Multi-threaded, multiple readers/writers: critical sections bring priority inversion and deadlock; data-access locks are slow even with no contention, and multi-resource races remain
- Stream processing: can cost extra data copying; scheduling effectiveness/efficiency; granularity and the performance saddle
- Functional decomposition: a limited number of pieces; the long pole, Amdahl
- Pipelining (added latency)
- Optimistic execution: memory, complexity, debugging, I/O and GVT
- Parallelizing compilers/small-scale ||ism: communication OH swamps the computing, except in special cases (e.g., with lots of pipelining)
- Think: ease of adapting to change
Development Similarity to Parallel Processing
- You can't tell where the time is going to be spent
- Enough small pieces to keep everyone busy; mapping
- Keep parts independent (single writer)
- Avoid communication and synchronization overheads
- Note immediately when someone is blocked or idle (no busy-work)
- Only work on "real" stuff. No rollback (well, maybe "some")
- Avoid "global" activities: meetings, checkin "semaphores", consensus. Don't wait for everyone to catch up
- Measure and react
- Independent teams coordinated with "messages"
- General-purpose "processors", reassigned to today's task