(Image on page 3: it's the traditional fast/good/cheap trade-off. Something glitched in the conversion.)
Path Dependent Development
Nick Coghlan (@ncoghlan_dev)
Red Hat Toolsmith
CPython Core Developer
Usefully Wrong
“All models are wrong. Some models are useful.”
“... the practical question is: How wrong do they have to be to not be useful?”
George E. P. Box (statistician) “Empirical Model-Building”
Choose Any Two?
Path Dependence
● “Good enough to be useful” -> ship it
● The decisions we make leave their mark on the software we ship
● These marks remain long after the scope of the software expands to other use cases
What is “Good Enough”?
● Depends on your priorities and resources
– What are you building?
– Why are you building it?
– Who are you building it for?
– Who is building it?
– What are you building it with?
– How much risk can you tolerate?
Context Matters
● Building an intranet web service
– Trusted network
– Enforced user base
● Building a web startup
– Hostile network
– Business lives or dies by user choice
● Building hardware control and management systems
– Usage driven by hardware
– Software as a necessary evil
Trade-Offs Needed: Inquire Within
Functionality
● Doing one (or a few) things well is often better than doing a lot of things badly
● Adding functionality later is usually easier to sell than taking it away (no matter how broken it turns out to be)
Flexibility
● Don't make things configurable
● Configurability = testing and maintenance pain
● Do separate concerns (if you make it configurable later, only one place needs to change)
● Do use flexible support tools
– SQLAlchemy makes it easy to change databases
– Django locks in some major decisions (like ORM and templating language) but provides a rich ecosystem of prebuilt components that work well together
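One way to read "separate concerns" is sketched below: the storage choice is hard-coded rather than configurable, but only one function knows about it. The function names (`open_storage`, `save_job`, `list_jobs`) and the table layout are hypothetical, and the standard library `sqlite3` module is used only to keep the sketch self-contained.

```python
import sqlite3


def open_storage():
    """The single place that knows which database is used (an assumption here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE IF NOT EXISTS jobs (name TEXT PRIMARY KEY)")
    return conn


def save_job(conn, name):
    # Callers never mention the storage backend, only the connection they were given.
    conn.execute("INSERT INTO jobs (name) VALUES (?)", (name,))


def list_jobs(conn):
    return [row[0] for row in conn.execute("SELECT name FROM jobs ORDER BY name")]


conn = open_storage()
save_job(conn, "simple_sync")
save_job(conn, "archive_sync")
job_names = list_jobs(conn)
```

If the backend ever has to become configurable, `open_storage` is the only function that would need to change.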
Security
● A lot of software is still insecure by default
– Unhashed (or poorly hashed) passwords
– Unencrypted communications channels
● Multiple layers of defence can hide this
● Try to make the “easy option” and the “secure option” one and the same
● Can be very hard to fix poor security choices
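As an illustration of making the easy option the secure one, the sketch below offers only helpers that store a salted hash, so there is no plaintext code path to fall back on. It uses PBKDF2 from the Python standard library; the iteration count, salt size and helper names are assumptions for illustration, not recommendations from the talk.

```python
import hashlib
import hmac
import os

# Assumed parameters for this sketch; tune the iteration count for real hardware.
_ITERATIONS = 100_000
_SALT_BYTES = 16


def hash_password(password, salt=None):
    """Return (salt, digest) computed with PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(_SALT_BYTES)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, _ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_password("correct horse battery staple")
```

Only `salt` and `stored` would ever be persisted; the password itself is never written anywhere.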
Reinventing Wheels
● Reuse means dependency management
● Often simpler to roll your own to start
● With good modularity, easy to replace later
● Watch for increasing complexity
Documentation
● How sophisticated are users expected to be?
– Installed by developers? Admins? End users?
– Intended for domain experts only?
● Is it stable enough to document?
● Documentation can highlight design flaws
Test Quality
● Fine grained tests pinpoint failures easily
● Coarse grained tests are often easier to write
● Can easily start with coarse grained tests, then add more fine grained tests to narrow down failures
● Slow tests are better than no tests
● Tests with external dependencies are better than no tests
● Regression tests are great, but don't let them block fixes for problems that can't be reproduced reliably
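The coarse-then-fine progression might look like the sketch below. The code under test (`parse_line`, `parse_config`) is a hypothetical stand-in: one coarse test covers the whole path, and fine-grained tests are added afterwards to pinpoint which helper broke.

```python
import unittest


# Hypothetical code under test: parse "key = value" lines into a dict.
def parse_line(line):
    key, _, value = line.partition("=")
    return key.strip(), value.strip()


def parse_config(text):
    return dict(parse_line(line) for line in text.splitlines() if line.strip())


class CoarseTest(unittest.TestCase):
    # Written first: easy to write, but a failure only says "something broke".
    def test_whole_config(self):
        text = "host = localhost\npath = /demo/simple/\n"
        self.assertEqual(
            parse_config(text),
            {"host": "localhost", "path": "/demo/simple/"},
        )


class FineTest(unittest.TestCase):
    # Added later: each failure points at one specific behaviour.
    def test_parse_line_strips_whitespace(self):
        self.assertEqual(parse_line("  host =  localhost "), ("host", "localhost"))

    def test_parse_line_without_value(self):
        self.assertEqual(parse_line("flag"), ("flag", ""))


def run_tests():
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(CoarseTest),
        loader.loadTestsFromTestCase(FineTest),
    ])
    return unittest.TextTestRunner(verbosity=0).run(suite)
```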
Code Reviews
● Code is written to:
– Tell the computer what to do
– Tell future maintainers what it does
● Tests cover the first, reviews the second
● Debatable value for small teams
● Highly valuable for large teams
● Needs appropriate tools
Performance & Scalability
● Don't stress about it if you don't need to
● Start with measurement infrastructure
● If simple is fast enough, stick with simple
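"Measurement infrastructure" can start very small. The sketch below is one assumed form of it: a decorator that records wall-clock durations per function in an in-process dictionary. The names `timed` and `timings` are hypothetical.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-process registry: function name -> list of durations in seconds.
timings = defaultdict(list)


def timed(func):
    """Record the wall-clock duration of every call to func."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if func raises, so slow failures are visible too.
            timings[func.__name__].append(time.perf_counter() - start)
    return wrapper


@timed
def simple_sum(values):
    # The "simple" implementation; replace it only if the numbers say so.
    return sum(values)


total = simple_sum(range(1000))
```

With numbers like these in hand, "is simple fast enough?" becomes a question with an answer rather than a guess.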
Reliability
● Not all software is mission critical
● Pay attention to failure modes
● Error quality matters
Usability
● Humans are still a lot smarter than computers
● If users have no choice, they'll usually cope
● Hence, awful UX in most “enterprise” software
Maintainability & Business Risks
● The Bus Factor
– Most startups = 1
– Large companies want it to be higher
● Developer docs (including comments)
● Legal risks (copyrights, patents, trademarks)
Automation
● Critical to speeding up release cycles
● Is a process stable enough to automate?
Managing Path Dependence
Exit Strategies
● Know what you're not doing
● Have a vague idea how to fix it when needed
● Actual fixes will depend on future needs
● Sometimes, the only right answer is “No”
Patterns and Processes
● Keep your options open
● Minimise current complexity
● This is not easy
– Software architecture and design patterns
– Software processes and methodologies
● “Interim” solutions may last a long time
● If you don't have a test suite, start there
Prototyping vs Implementation
● Two very different modes of development
● Prototyping
– Exploration
– Trying to figure out what is feasible
● Implementation
– Already known to be feasible
– Making it happen to a known specification
● Big difference in priorities!
Social Implications
● Design decisions are context dependent
● Easy to criticise in hindsight
● Design trade-offs can influence community
● Actually getting better at building software
● Ambitions are (more than?) keeping pace
Path Dependence in Action
An Innocent Start
● PulpDist: Mirroring network based on rsync
● Simple job definitions
{
  "remote_server": "localhost",
  "remote_path": "/demo/simple/",
  "local_path": "/var/www/pub/sync_demo_raw/",
  ...
}
● Simple custom validator for JSON data
– Checks on individual values
– Overall sanity checks on full jobs
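A validator of this kind could be sketched as follows. This is not PulpDist's actual code: only the three field names shown on the slide are used (the elided fields are left out), and every rule below is an assumption chosen to illustrate per-value checks followed by a whole-job sanity check.

```python
import json

# Field names come from the slide; the rules themselves are assumptions.
REQUIRED_FIELDS = ("remote_server", "remote_path", "local_path")


def validate_job(job):
    """Return a list of human-readable problems (empty if the job looks sane)."""
    errors = []
    # Checks on individual values
    for field in REQUIRED_FIELDS:
        value = job.get(field)
        if not isinstance(value, str) or not value:
            errors.append(f"{field}: expected a non-empty string, got {value!r}")
    for field in ("remote_path", "local_path"):
        value = job.get(field)
        if isinstance(value, str) and value and not value.endswith("/"):
            errors.append(f"{field}: expected a path ending in '/', got {value!r}")
    # Overall sanity check on the full job (only once the values are well formed)
    if (not errors and job["remote_server"] == "localhost"
            and job["remote_path"] == job["local_path"]):
        errors.append("remote_path and local_path are the same directory on localhost")
    return errors


good_job = json.loads("""{
    "remote_server": "localhost",
    "remote_path": "/demo/simple/",
    "local_path": "/var/www/pub/sync_demo_raw/"
}""")
bad_job = {"remote_server": "localhost", "remote_path": "/demo/simple"}
```

Returning a list of messages rather than raising on the first problem lets a user fix a whole job definition in one pass.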
Don't Repeat Yourself
● Simple format turned out to be too simple
– Hard to modify given multiple jobs from same source
● Enhanced format with reusable elements
{
  "mirror_id": "local_copy",
  "tree_id": "simple_sync",
  "site_id": "bne",
  ...
}
● Simple validator was no longer adequate
What To Do?
● Upgrade the existing validator
– Possible, but tedious to test properly
– Not a good wheel to reinvent
● JSON validation library
– Research would be starting from scratch
– Hard to assess quality quickly
● Relational database
– Enforces the constraints by its very nature
– Error quality would likely be poor
Two Birds...
● For validation, I needed to:
– Ensure identifiers were unique
– Ensure cross references were valid
● For UI purposes I also needed:
– To filter by component identifiers
– To sort by various fields
● Sound familiar?
...One Stone
● An in-memory SQLite database was perfect
● But writing SQL by hand is still horrible
● SQLAlchemy was already in the target environment
● Problem solved!
– Config loaded into DB after simple field validation
– If the DB accepts it, references are also valid
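The idea can be sketched with the standard library `sqlite3` module (the talk used SQLAlchemy; plain SQL is used here only to keep the example self-contained). The identifier names come from the enhanced job format, but the table layout and the shape of the `config` dict are assumptions.

```python
import sqlite3

# Hypothetical schema: identifier names are from the slides, the layout is assumed.
SCHEMA = """
CREATE TABLE sites (site_id TEXT PRIMARY KEY);
CREATE TABLE trees (tree_id TEXT PRIMARY KEY);
CREATE TABLE mirrors (
    mirror_id TEXT PRIMARY KEY,
    tree_id TEXT NOT NULL REFERENCES trees(tree_id),
    site_id TEXT NOT NULL REFERENCES sites(site_id)
);
"""


def load_config(config):
    """Load config into a fresh in-memory DB; raises IntegrityError if invalid."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT INTO sites VALUES (?)", [(s,) for s in config["sites"]])
        conn.executemany("INSERT INTO trees VALUES (?)", [(t,) for t in config["trees"]])
        conn.executemany(
            "INSERT INTO mirrors VALUES (:mirror_id, :tree_id, :site_id)",
            config["mirrors"],
        )
    return conn


def is_valid(config):
    """Unique identifiers and valid cross references are checked by the DB itself."""
    try:
        load_config(config).close()
    except sqlite3.IntegrityError:
        return False
    return True


valid = {
    "sites": ["bne"],
    "trees": ["simple_sync"],
    "mirrors": [{"mirror_id": "local_copy", "tree_id": "simple_sync", "site_id": "bne"}],
}
conn = load_config(valid)
# The same database also serves the UI needs: filtering and sorting.
mirrors_at_bne = [row[0] for row in conn.execute(
    "SELECT mirror_id FROM mirrors WHERE site_id = ? ORDER BY mirror_id", ("bne",)
)]
```

The primary keys cover uniqueness and the foreign keys cover cross references, so no hand-written reference checking is needed.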
How Does The Story End?
● Still some very rough edges
– SQLite error messages are quite user hostile
– Schema changes are triple-keyed
● Future changes?
– Master in database, JSON only as export?
– Improved error messages?
– Switch to an actual schema engine?
● Other priorities!
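One possible shape for the "improved error messages" idea, offered as an assumption rather than anything PulpDist does: catch the database's integrity error at the point of insertion and re-raise it with the offending record attached. `ConfigError` and `insert_record` are hypothetical names.

```python
import sqlite3


class ConfigError(ValueError):
    """Hypothetical user-facing error type for this sketch."""


def insert_record(conn, table, record):
    """Insert one record, adding context to SQLite's terse integrity errors."""
    # table and column names are assumed to come from trusted code, not user input.
    columns = ", ".join(record)
    placeholders = ", ".join(f":{name}" for name in record)
    try:
        conn.execute(
            f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", record
        )
    except sqlite3.IntegrityError as exc:
        raise ConfigError(f"invalid {table} entry {record!r}: {exc}") from exc


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mirrors (mirror_id TEXT PRIMARY KEY)")
insert_record(conn, "mirrors", {"mirror_id": "local_copy"})

message = None
try:
    insert_record(conn, "mirrors", {"mirror_id": "local_copy"})  # duplicate identifier
except ConfigError as exc:
    message = str(exc)
```

The database still does the validation; the wrapper only says which entry it was looking at when the check failed.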
Q & A
Pulp:
http://pulpproject.org/
PulpDist:
https://fedorahosted.org/pulpdist/
CPython Sprints
Monday & Tuesday