Laura Frank Director of Engineering, Codeship
Scalable and Available Services with $CONTAINER_TOOL
RAINING ON YOUR PARADE
Highly-available applications existed before containers
We love to think we’re solving new problems in new ways
We shouldn’t confuse new tools with new problems
Container tooling has changed the way we design, build, run, and ship applications.
Container tooling is a new solution for a longstanding problem.
Containers aren’t the point
We reason about services
Before the late 1980s
1990s-ish
3:00am when you’re on call
How can we guarantee availability in an environment
that will definitely fail?
DISTRIBUTED APPLICATIONS ENGINEERING, 1998
“Redundancy and recovery are the two main approaches to
solve this problem.”
An Imprecise Guideline (ignoring many system constraints)
[Chart: redundancy required (number of replicas) vs. time to recover from failure (generic time units)]
Container tools have some pretty sweet ways to deal with both redundancy and recovery.
Recovery
Control Theory FTW
Your orchestration platform is continuously trying to reconcile actual state with declared state.
[Diagram: control loop. The orchestrator compares Desired State against the cluster's Actual State at time T and takes actions to converge state.]
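That reconcile loop can be sketched as a toy shell loop. The replica counts here are made up; a real orchestrator reacts to cluster events rather than incrementing a counter.

```shell
# Toy reconciliation loop: converge actual replica count toward desired.
desired=3
actual=0
while [ "$actual" -ne "$desired" ]; do
  if [ "$actual" -lt "$desired" ]; then
    actual=$((actual + 1))   # schedule a new task
  else
    actual=$((actual - 1))   # tear down an excess task
  fi
  echo "converging: actual=$actual desired=$desired"
done
echo "converged!"
```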
An Observability Problem
If a system can’t be observed, it can’t be controlled.
An Observability Problem
[Diagram: the same loop with Failure, Process State, and User Input as disturbances, and "Me!" as the controller comparing Desired State against the cluster's Actual State at time T.]
An Observability Problem
Offloading the responsibility of observability to an orchestrator improves the level of controllability in your system
Atomic Scheduling Units
[Diagram: the Orchestrator's Scheduler dispatches tasks (task0, task1, ... taskN). The Service Spec holds desired state; the Service Object holds actual state.]
[Diagram: the Kubernetes Master (API Server, Scheduler, Controllers, etcd) holds desired state and dispatches task0 and task1 until the cluster is converged.]
Using an orchestration tool, your system never fails…
it just doesn’t converge
Redundancy
Replicating and scheduling for high availability
HA application problems are scheduling problems; specifically, task scheduling problems.
[Diagrams: the binpack strategy vs. the spread strategy (spread is optimized for HA apps)]
Most modern orchestration systems use an optimized scheduling algorithm for
dispatching services across a set of nodes.
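The spread idea reduces to picking the node with the fewest running tasks. A toy sketch, with hypothetical node names and task counts:

```shell
# Toy spread scheduler: pick the node with the fewest running tasks.
# "name:task_count" pairs below are hypothetical.
nodes="node1:3
node2:1
node3:2"

# Sort numerically by task count and take the least-loaded node.
pick=$(printf '%s\n' "$nodes" | sort -t: -k2 -n | head -n1 | cut -d: -f1)
echo "spread placement -> $pick"
```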
GREAT NEWS
It is not your tool’s responsibility to know about your system and business constraints
• topology* (some schedulers are topology aware)
• specifics like OS, kernel, instance family
• PII and other compliance
YOU STILL HAVE TO DO WORK
These tools work on the service level, not the infrastructure level
REMINDER
Scheduling Constraints in Docker
To restrict services to specific nodes (by architecture, security level, or type), first apply a label to the nodes, then constrain the service:

docker service create \
  --constraint 'node.labels.type==web' \
  my-app
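For the constraint above to match anything, the nodes need the label first. A sketch; "worker-1" is a hypothetical node name (list yours with `docker node ls`):

```
# Label a node so --constraint 'node.labels.type==web' can match it.
docker node update --label-add type=web worker-1
```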
Scheduling Constraints in Kubernetes
nodeSelector has been around since Kubernetes 1.0, but there are more expressive alternatives.
nodeAffinity has been around since 1.2 (still in beta).
nodeAntiAffinity does the opposite — you can repel things from one another.
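In practice that repelling behavior is written with negative operators (NotIn, DoesNotExist) inside nodeAffinity rather than a separate field. A minimal sketch, assuming a hypothetical label key:

```
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: some-node-label-key   # hypothetical label
          operator: DoesNotExist     # only schedule on nodes WITHOUT this label
```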
Scheduling Constraints in Kubernetes
The required variant takes nodeSelectorTerms (weight belongs only to the preferred variant):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: some-node-label-key
          operator: Exists
Scheduling Constraints in Kubernetes
requiredDuringSchedulingIgnoredDuringExecution: labels can change while the pod is running without resulting in eviction.
requiredDuringSchedulingRequiredDuringExecution: planned; changing a node's labels would evict running pods that no longer match.
Placement Preferences in Docker
Implements a spread strategy over nodes that belong to a certain category. This is a "soft" preference:

--placement-pref 'spread=node.labels.key'
Placement Preferences in Kubernetes
preferredDuringSchedulingIgnoredDuringExecution
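The Kubernetes equivalent of a soft preference is a weighted term. A minimal sketch, again assuming a hypothetical label key:

```
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1                      # soft preference, not a hard requirement
      preference:
        matchExpressions:
        - key: some-node-label-key   # hypothetical label
          operator: Exists
```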
Topology-aware Scheduling
[Diagrams: replicas spread across regions us-east-1, us-east-2, and us-west-1]
Kubernetes has a topology-aware scheduler! Read the docs.
In Docker, apply labels to your nodes, and use a placement preference like:
--placement-pref 'spread=node.labels.region'
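Putting the Docker side together (node names here are hypothetical):

```
# Label each node with its region first.
docker node update --label-add region=us-east-1 node-1
docker node update --label-add region=us-west-1 node-2

# Then spread replicas across regions.
docker service create \
  --replicas 6 \
  --placement-pref 'spread=node.labels.region' \
  my-app
```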
An Imprecise Guideline (ignoring most constraints)
[Chart: redundancy required (number of replicas) vs. time to recover from failure (hypothetical time units)]
The Future of Orchestration
Warning: opinions
A Framework for Evaluation
[Map axes: Genesis, Custom Built, Product, Commodity (evolution); Visible (Lots of Management) to Invisible (No Management)]
Wardley Maps (simplified)
[Diagram: components evolve over time along the x-axis from Genesis through Custom Built and Product to Commodity; the y-axis runs from Visible to Invisible.]
[Diagram: Electricity moves across the map from Genesis in the 18th century, through the 19th century, to Commodity now; Compute follows the same path.]
Genesis Custom Built Product Commodity
Container Runtime 2000s Container Runtime
2014-2015
Container Runtime now
Invis
ible
Visib
le
[Diagram: the Container Runtime sits at Commodity, with the Container Orchestrator close behind and question marks left back at Genesis.]
Orchestration is becoming commoditized. Orchestrators will not be able to differentiate easily.
COMMODITIZATION
If you have a hand-rolled solution for running apps with containers, it’s safe
to migrate to an orchestration platform.
INNOVATION
Solutions to old problems get commoditized, but that leaves room
for genesis elsewhere.
[Diagram: the same map, with question marks in the Genesis space opened up by commoditized runtimes and orchestrators.]
Istio & service mesh tools
Whatever Heptio is building
Storage!
Closing Thoughts
How can we guarantee availability in an environment
that will definitely fail?
DISTRIBUTED APPLICATIONS ENGINEERING, 1998
“Redundancy and recovery are the two main approaches to
solve this problem.”
Google became a company in 1998!
Laura Frank Director of Engineering, Codeship
@rhein_wein
Thanks!