Your Code is Wrong

My keynote at NoSQL Now! on August 21st, 2013

Nathan Marz (@nathanmarz)

Let’s start with an example

Storm’s “reportError” method

(Storm is a realtime computation system, like Hadoop but for realtime)

Storm architecture

Nimbus: master node (similar to Hadoop JobTracker)

Zookeeper: used for cluster coordination

Supervisors: run worker processes

Storm’s “reportError” method

Used to show errors in the Storm UI

Error info is stored in Zookeeper

What happens when a user deploys code like this?

Denial-of-service on Zookeeper and cluster goes down

Robust!

Designed input space Actual input space

Your code is wrong

Your code is literally wrong

Your code is wrong

Why do you believe your code is correct?

Your code

Dependency 1, Dependency 2, ... Dependency 9 ... Dependency 3,000,000

Hardware

Electronics

Chemistry

Atomic physics

Quantum mechanics

“I think I can safely say that nobody understands quantum mechanics.”

Richard Feynman

Your code is wrong

Your code

...

All the software you’ve used has had bugs in it

Including the software you’ve written

Your code is sometimes correct

That’s good enough!

Treat code as nondeterministic

Embrace “your code is wrong” to design better software

Robust!

Designed input space Actual input space

An example

Learning from Hadoop

JobTracker

Job

Job

Job

Your code is wrong

So your processes will crash

Storm’s daemons are process fault-tolerant

Storm

Nimbus

Topology

Topology

Topology

Robust!

Designed input space Actual input space

The impact of code being wrong

Robust!

Designed input space Actual input space

Failures! Bad performance! Security holes!

Irrelevant!

Design principle #1

Measuring and monitoring are the foundation of solid engineering

Measuring: Under what range of inputs does my software function well?

Monitoring: What’s the actual input space of my software?

Measure & Monitor

Latency, throughput, stack traces, buffer sizes, memory usage, CPU usage, #threads spawned, ...

How you monitor your software is as important as its functionality
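As a rough illustration of measuring and monitoring, the sketch below records latency and error counts for a function. The `Metrics` class, the `timed` decorator, and the `parse` function are hypothetical names invented for this example; a real deployment would presumably ship these numbers to an external monitoring system rather than keep them in memory.

```python
import time
from collections import defaultdict
from functools import wraps


class Metrics:
    """Hypothetical in-process recorder (an assumption, not a real library)."""

    def __init__(self):
        self.latencies = defaultdict(list)  # name -> latency samples in seconds
        self.errors = defaultdict(int)      # name -> number of raised exceptions

    def timed(self, name):
        """Decorator that records the latency and failures of each call."""
        def decorator(fn):
            @wraps(fn)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    self.errors[name] += 1
                    raise
                finally:
                    # Runs on success and on failure, so every call is measured.
                    self.latencies[name].append(time.perf_counter() - start)
            return wrapper
        return decorator

    def summary(self, name):
        """Summarize what was actually observed for one metric name."""
        samples = sorted(self.latencies[name])
        if not samples:
            return {"calls": 0, "errors": self.errors[name]}
        return {
            "calls": len(samples),
            "errors": self.errors[name],
            "p50_s": samples[len(samples) // 2],
            "max_s": samples[-1],
        }


metrics = Metrics()


@metrics.timed("parse")
def parse(line):
    """Example workload: fails on input outside its designed input space."""
    return int(line)
```

Calling `parse` on a mix of valid and invalid input and then reading `metrics.summary("parse")` shows both how the function performed and how often the actual input fell outside what it was designed for.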

Design principle #2

Embrace immutability

Application on a read/write database: MySQL, MongoDB, Riak, Cassandra, HBase

Your code is wrong

So data will be corrupted

And you may not know why

Immutable, ever-growing data

Views

Application

Architecture based on immutability

Lambda architecture
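One way to picture an architecture based on immutability is the minimal sketch below: facts are only ever appended, and a view is a pure function of all the facts. Every name here (`PageView`, `MasterDataset`, the two view functions, `recompute`) is an assumption made for illustration and is not taken from the talk. The point it tries to show is that a bug in view code corrupts only the view; the raw data is untouched, so a corrected view can be rebuilt from it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class PageView:
    """One immutable fact: a user viewed a URL."""
    user: str
    url: str


class MasterDataset:
    """Append-only store: there is no update or delete operation."""

    def __init__(self) -> None:
        self._events: List[PageView] = []

    def append(self, event: PageView) -> None:
        self._events.append(event)

    def events(self) -> Tuple[PageView, ...]:
        # Hand out a tuple so callers cannot mutate the stored data.
        return tuple(self._events)


View = Dict[str, int]


def buggy_views_per_url(events: Tuple[PageView, ...]) -> View:
    counts: View = {}
    for e in events:
        counts[e.url] = 1  # bug: overwrites instead of incrementing
    return counts


def fixed_views_per_url(events: Tuple[PageView, ...]) -> View:
    counts: View = {}
    for e in events:
        counts[e.url] = counts.get(e.url, 0) + 1
    return counts


def recompute(dataset: MasterDataset,
              view_fn: Callable[[Tuple[PageView, ...]], View]) -> View:
    """Throw the old view away and rebuild it from all the raw data."""
    return view_fn(dataset.events())
```

With a read/write database the buggy code would have overwritten the only copy of the counts. Here, deploying `fixed_views_per_url` and calling `recompute` again is enough to repair the view.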

Design principle #3

Minimize dependencies

The less that can go wrong, the less that will go wrong

Example: Storm’s usage of Zookeeper

Worker locations stored in Zookeeper

All workers must know locations of other workers to send messages

Two ways to get location updates

1. Poll Zookeeper

Worker Zookeeper

2. Use Zookeeper “watch” feature to get push notifications

Worker Zookeeper

Method 2 is faster but relies on another feature

Storm uses both methods

Worker Zookeeper

If watch feature fails, locations still propagate via polling

Eliminating dependence justified by small amount of code required
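The combination of push notifications and polling can be sketched roughly as follows. This is not Storm's code and it does not use the Zookeeper client API: `FakeCoordinator` is a hypothetical in-memory stand-in whose `watches_work` flag simulates the watch feature silently failing, and `Worker` is an equally hypothetical consumer.

```python
class FakeCoordinator:
    """In-memory stand-in for a coordination service (hypothetical)."""

    def __init__(self, watches_work=True):
        self._locations = {}
        self._watchers = []
        self.watches_work = watches_work

    def get_locations(self):
        # Return a copy so callers cannot mutate the coordinator's state.
        return dict(self._locations)

    def watch(self, callback):
        self._watchers.append(callback)

    def set_location(self, worker_id, address):
        self._locations[worker_id] = address
        if self.watches_work:  # when False, notifications are silently lost
            for callback in self._watchers:
                callback(self.get_locations())


class Worker:
    """Learns peer locations by push (fast) and by polling (fallback)."""

    def __init__(self, coordinator):
        self.coordinator = coordinator
        self.locations = coordinator.get_locations()
        coordinator.watch(self._on_update)  # method 2: push notifications

    def _on_update(self, locations):
        self.locations = locations

    def poll(self):
        # Method 1: a real worker would call this on a timer.
        self.locations = self.coordinator.get_locations()
```

If the push path breaks, the worker's view is stale only until the next `poll()`. The fallback costs one extra method, which is the trade-off the slides describe.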

Design principle #4

Explicitly respect functional input ranges

Storm’s “reportError” method

Implement self-throttling to avoid overloading other systems
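A minimal sketch of self-throttling, assuming a fixed-window limit: at most `max_reports` errors are forwarded per `interval_s` seconds and the rest are dropped and counted. The class name, the `sink` callback (standing in for whatever writes to Zookeeper), and the default limits are all assumptions for illustration; this is not how Storm's `reportError` is actually implemented.

```python
import time


class ThrottledErrorReporter:
    """Forwards at most max_reports errors per interval_s seconds (hypothetical)."""

    def __init__(self, sink, max_reports=5, interval_s=10.0, clock=time.monotonic):
        self._sink = sink              # callable that stores one error string
        self._max = max_reports
        self._interval = interval_s
        self._clock = clock            # injectable so the window can be tested
        self._window_start = clock()
        self._count = 0
        self.dropped = 0               # monitor this: it reveals misbehaving code

    def report_error(self, error):
        """Return True if the error was forwarded, False if it was throttled."""
        now = self._clock()
        if now - self._window_start >= self._interval:
            # A new window begins: reset the budget.
            self._window_start = now
            self._count = 0
        if self._count >= self._max:
            self.dropped += 1
            return False
        self._count += 1
        self._sink(repr(error))
        return True
```

User code that reports errors in a tight loop then costs the downstream system at most `max_reports` writes per window, no matter how often `report_error` is called.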

Design principle #5

Embrace recomputation

“Your code is wrong” meanings

1. Design input space differs from actual input space

2. The logic of your code is wrong

3. Requirements are constantly changing

You must be able to change your code to match shifting requirements

Example: blogging software

New requirement: search

Have to build a search index

Recomputation gives you so much more
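For the blogging example, recomputation might look like the sketch below: the search index is just a function run over every post ever written, so adding the feature needs no migration of existing data. `Post`, `build_search_index`, and `search` are hypothetical names, and the whitespace tokenizer is deliberately naive.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class Post(NamedTuple):
    """One stored blog post (hypothetical schema)."""
    post_id: int
    title: str
    body: str


def build_search_index(posts: Iterable[Post]) -> Dict[str, Set[int]]:
    """Recompute an inverted index (word -> post ids) from all posts."""
    index = defaultdict(set)
    for post in posts:
        text = (post.title + " " + post.body).lower()
        for token in text.split():
            word = token.strip(".,!?;:\"'()")
            if word:  # skip tokens that were only punctuation
                index[word].add(post.post_id)
    return dict(index)


def search(index: Dict[str, Set[int]], word: str) -> Set[int]:
    """Return the ids of posts containing the word, case-insensitively."""
    return index.get(word.lower(), set())
```

If the tokenizer later turns out to be wrong, the same function can be fixed and run again over all posts to produce a corrected index.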

Building software is no different than any other engineering

The underlying challenges are the same

What will break it?

What are the limits of my dependencies?

How can I add redundancy to increase robustness?

Can I isolate failures?

Our raw materials are ideas instead of matter

Thank you
