Path Dependent Development (PyCon India)

Path Dependent Development

Nick Coghlan

@ncoghlan_dev

Red Hat Toolsmith

CPython Core Developer

Usefully Wrong

“All models are wrong. Some models are useful.”

“... the practical question is: How wrong do they have to be to not be useful?”

George E. P. Box (statistician) “Empirical Model-Building”

Choose Any Two?

Path Dependence

● “good enough to be useful” -> ship it

● The decisions we make leave their mark on the software we ship

● These marks remain long after the scope of the software expands to other use cases

What is “Good Enough”?

● Depends on your priorities and resources

– What are you building?

– Why are you building it?

– Who are you building it for?

– Who is building it?

– What are you building it with?

– How much risk can you tolerate?

Context Matters

● Building an intranet web service

– Trusted network

– Enforced user base

● Building a web startup

– Hostile network

– Business lives or dies by user choice

● Building hardware control and management systems

– Usage driven by hardware

– Software as a necessary evil

Trade-Offs Needed: Inquire Within

Functionality

● Doing one (or a few) things well is often better than doing a lot of things badly

● Adding functionality later is usually easier to sell than taking it away (no matter how broken it turns out to be)

Flexibility

● Don't make things configurable

● Configurability = testing and maintenance pain

● Do separate concerns (if you make it configurable later, only one place needs to change)

● Do use flexible support tools

– SQLAlchemy makes it easy to change databases

– Django locks in some major decisions (like ORM and templating language) but provides a rich ecosystem of prebuilt components that work well together

Security

● A lot of software is still insecure by default

– Unhashed (or poorly hashed) passwords

– Unencrypted communications channels

● Multiple layers of defence can hide this

● Try to make the “easy option” and the “secure option” one and the same

● Can be very hard to fix poor security choices

Reinventing Wheels

● Reuse means dependency management

● Often simpler to roll your own to start

● With good modularity, easy to replace later

● Watch for increasing complexity

Documentation

● How sophisticated are users expected to be?

– Installed by developers? Admins? End users?

– Intended for domain experts only?

● Is it stable enough to document?

● Documentation can highlight design flaws

Test Quality

● Fine grained tests pinpoint failures easily

● Coarse grained tests are often easier to write

● Can easily start with coarse grained tests, then add more fine grained tests to narrow down failures

● Slow tests are better than no tests

● External dependencies are better than no tests

● Regression tests are great, but don't let them block fixes for problems that can't be reproduced reliably

Code Reviews

● Code is written to:

– Tell the computer what to do

– Tell future maintainers what it does

● Tests cover the first, reviews the second

● Debatable value for small teams

● Highly valuable for large teams

● Needs appropriate tools

Performance & Scalability

● Don't stress about it if you don't need to

● Start with measurement infrastructure

● If simple is fast enough, stick with simple
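
As one illustration of what "measurement infrastructure" might look like at its smallest, the sketch below records wall-clock timings per named operation. The names `timed`, `timings` and `summarise` are hypothetical helpers invented for this example; a real project would more likely feed such numbers into whatever metrics system it already has.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-process store: operation name -> list of durations (seconds)
timings = defaultdict(list)


@contextmanager
def timed(name):
    """Record how long the enclosed block takes under the given name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failures are measured too
        timings[name].append(time.perf_counter() - start)


def summarise():
    """Return {name: (call count, total seconds)} for everything recorded."""
    return {name: (len(d), sum(d)) for name, d in timings.items()}


with timed("sum_squares"):
    total = sum(i * i for i in range(10_000))

print(summarise())
```

With numbers like these in hand, "is simple fast enough?" becomes a question that can be answered by looking rather than guessing.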

Reliability

● Not all software is mission critical

● Pay attention to failure modes

● Error quality matters

Usability

● Humans are still a lot smarter than computers

● If users have no choice, they'll usually cope

● Hence, awful UX in most “enterprise” software

Maintainability & Business Risks

● The Bus Factor

– Most startups = 1

– Large companies want it to be higher

● Developer docs (including comments)

● Legal risks (copyrights, patents, trademarks)

Automation

● Critical to speeding up release cycles

● Is a process stable enough to automate?

Managing Path Dependence

Exit Strategies

● Know what you're not doing

● Have a vague idea how to fix it when needed

● Actual fixes will depend on future needs

● Sometimes, the only right answer is “No”

Patterns and Processes

● Keep your options open

● Minimise current complexity

● This is not easy

– Software architecture and design patterns

– Software processes and methodologies

● “Interim” solutions may last a long time

● If you don't have a test suite, start there

Prototyping vs Implementation

● Two very different modes of development

● Prototyping

– Exploration

– Trying to figure out what is feasible

● Implementation

– Already known to be feasible

– Making it happen to a known specification

● Big difference in priorities!

Social Implications

● Design decisions are context dependent

● Easy to criticise in hindsight

● Design trade-offs can influence community

● Actually getting better at building software

● Ambitions are (more than?) keeping pace

Path Dependence in Action

An Innocent Start

● PulpDist: Mirroring network based on rsync

● Simple job definitions

    {
      "remote_server": "localhost",
      "remote_path": "/demo/simple/",
      "local_path": "/var/www/pub/sync_demo_raw/",
      ...
    }

● Simple custom validator for JSON data

– Checks on individual values

– Overall sanity checks on full jobs
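
To make the two validation levels concrete, here is a minimal sketch of a hand-rolled validator. Only the three field names come from the job definition shown on the slide; the function names and every rule (non-empty strings, absolute paths ending in "/", the same-directory check) are assumptions made for illustration, not PulpDist's actual checks.

```python
import json

# Field names taken from the slide's example job; everything else is assumed
REQUIRED_FIELDS = ("remote_server", "remote_path", "local_path")


def check_value(name, value):
    """Check one field in isolation; return a list of error strings."""
    errors = []
    if not isinstance(value, str) or not value:
        errors.append(f"{name}: expected a non-empty string")
    elif name.endswith("_path") and not (
        value.startswith("/") and value.endswith("/")
    ):
        errors.append(f"{name}: expected an absolute path ending in '/'")
    return errors


def validate_job(job):
    """Run per-value checks, then a sanity check across the whole job."""
    errors = []
    for name in REQUIRED_FIELDS:
        if name not in job:
            errors.append(f"{name}: missing required field")
        else:
            errors.extend(check_value(name, job[name]))
    # Whole-job check (assumed rule): a local sync must not target its source
    if (
        not errors
        and job["remote_server"] == "localhost"
        and job["remote_path"] == job["local_path"]
    ):
        errors.append("job: remote_path and local_path are the same directory")
    return errors


job = json.loads("""
{"remote_server": "localhost",
 "remote_path": "/demo/simple/",
 "local_path": "/var/www/pub/sync_demo_raw/"}
""")
print(validate_job(job))
```

Returning a list of messages rather than raising on the first problem lets a user fix every mistake in a job definition in one pass.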

Don't Repeat Yourself

● Simple format turned out to be too simple

– Hard to modify given multiple jobs from same source

● Enhanced format with reusable elements

    {
      "mirror_id": "local_copy",
      "tree_id": "simple_sync",
      "site_id": "bne",
      ...
    }

● Simple validator was no longer adequate

What To Do?

● Upgrade the existing validator

– Possible, but tedious to test properly

– Not a good wheel to reinvent

● JSON validation library

– Research would be starting from scratch

– Hard to assess quality quickly

● Relational database

– Enforces the constraints by its very nature

– Error quality would likely be poor

Two Birds...

● For validation, I needed to:

– Ensure identifiers were unique

– Ensure cross references were valid

● For UI purposes I also needed:

– To filter by component identifiers

– To sort by various fields

● Sound familiar?

...One Stone

● An in-memory SQLite database was perfect

● But writing SQL by hand is still horrible

● SQLAlchemy available in target environment

● Problem solved!

– Config loaded into DB after simple field validation

– If the DB accepts it, references are also valid
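
The idea of letting the database do the cross-reference checking can be sketched as follows. The talk's solution used SQLAlchemy; to stay dependency-free, this sketch swaps in the standard library `sqlite3` module with hand-written SQL. The table layout, the `load_config` function and the shape of the `config` dict are hypothetical; only the identifier names `mirror_id`, `tree_id` and `site_id` (and their example values) come from the enhanced job format.

```python
import sqlite3

# Hypothetical schema: primary keys enforce unique identifiers,
# foreign keys enforce valid cross references.
SCHEMA = """
CREATE TABLE sites (site_id TEXT PRIMARY KEY);
CREATE TABLE trees (tree_id TEXT PRIMARY KEY);
CREATE TABLE mirrors (
    mirror_id TEXT PRIMARY KEY,
    tree_id TEXT NOT NULL REFERENCES trees(tree_id),
    site_id TEXT NOT NULL REFERENCES sites(site_id)
);
"""


def load_config(config):
    """Load a config dict into a fresh in-memory database.

    Raises sqlite3.IntegrityError on duplicate identifiers or on
    cross references that point at nothing.
    """
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO sites VALUES (?)", [(s,) for s in config["sites"]])
    db.executemany("INSERT INTO trees VALUES (?)", [(t,) for t in config["trees"]])
    db.executemany(
        "INSERT INTO mirrors VALUES (:mirror_id, :tree_id, :site_id)",
        config["mirrors"],
    )
    db.commit()
    return db


config = {
    "sites": ["bne"],
    "trees": ["simple_sync"],
    "mirrors": [
        {"mirror_id": "local_copy", "tree_id": "simple_sync", "site_id": "bne"},
    ],
}
db = load_config(config)

# The same database answers the UI's filter-and-sort queries
rows = db.execute(
    "SELECT mirror_id FROM mirrors WHERE site_id = ? ORDER BY mirror_id",
    ("bne",),
).fetchall()
print(rows)
```

A mirror that names an unknown tree or site is rejected at insert time with `sqlite3.IntegrityError`, which is the "if the DB accepts it, references are also valid" property. The error text SQLite produces is terse, which matches the rough edge noted on the next slide.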

How Does The Story End?

● Still some very rough edges

– SQLite error messages are quite user hostile

– Schema changes are triple-keyed

● Future changes?

– Master in database, JSON only as export?

– Improved error messages?

– Switch to an actual schema engine?

● Other priorities!

Q & A

Pulp:

http://pulpproject.org/

PulpDist:

https://fedorahosted.org/pulpdist/

CPython Sprints

Monday & Tuesday
