Sardina Systems Proprietary. Copyright (C) 2014, 2015, 2016, 2017. Sardina Systems. All rights reserved.
lessons learnt: large scale government OpenStack private cloud
1. the client: highly technical, critical operations
the client
government x
disparate user groups, varying and competing needs, long-lived and short-lived workloads
technically astute organization
background
challenges with energy consumption, system efficiency
need to squeeze more out of the IT investment
understands gross wastage within their traditional solution
background
need to meet organizational mission
don’t throw money at the challenge
money does not grow on trees
2. OpenStack
flexible, widely adopted
IaaS platform
what is OpenStack?
at the most basic level: management layer for virtualized compute, networking, storage resources throughout a datacenter
modern web ui dashboard: gives administrators control while empowering self-service by users
what does OpenStack offer?
flexible compute, storage, networking
lifecycle
deploy
operate
upgrade
how do we turn this into a solution for the client?
3. the solution: system for production, not a toy
route to solution
problem definition and solution design
system architecture
hardware specification
project management: what’s happening when, and by whom
route to solution
plan for production from day 0
build for production from day 0
it’s not a toy: high availability is not optional
monkey-capable no-magic scaling up process
route to solution
your client != your guinea pig
4. deploy: showtime, get the solution design to work
from design to deployment
the ingredients: what’s the configuration needed
ingredients ready: make sure the hardware is available (servers, storage, networking)
system configuration: single source of truth at all times
automate, automate, automate
system configuration = input for automation
deployment of entire system: fully automated
reduced project risk = increased operational confidence
deploy: 1, 2, 3 … showtime
1. input: configuration and inventory files
2. run deployer
3. 1 or 2 cups of coffee
… done
deployer
pre-tested and proven off-site
ci/cd process
no knowledge of OpenStack necessary: enable factory deployment, integration and testing
5. upgrade: reliable, interruption-free upgrade
run at all times
n to n+1 upgrade every 6 months (releases eol in 12 months)
2n upgrades in n years: not an option
it’s not a toy: zero-downtime is not optional
system has to continue operation at all times
zero-downtime
get solution architecture right (from deploy phase) … else, find out problem 6 months later
monkey-capable fully automated upgrade process
6. reliability: dead is not an option, averting service death
new expectations
on demand
it should just work
new roles
operator — vs — consumer
operator
operation of underlying infrastructure
provides virtualized resources (compute, networking, storage) … and stops at this!
consumer
operation within the virtualized environment
not concerned about underlying infrastructure
consumer
does not care about what it takes to provide the cloud environment
it should just work
operator
responsibility to keep cloud environment running
dead
in the event the service is dead: expectations are no longer met
dead is not an option
if it is dead, it is too late!
when could there be risk of death?
operate
system is deployed, now … keep services running
high-availability is not a luxury
risk of death
services encountering problems + impacting consumers
death during operate phase
intrinsic — vs — extrinsic
death risks
extrinsic death risk?
dealt with via highly available solution architecture
what is OpenStack?
a series of intercommunicating services
HTTP, MQ
types of services
1. with data + configuration
2. configuration only
safeguards: configuration
infrastructure-as-code operating model
must be able to re-deploy components by re-running deployer
safeguards: data
replication, replication, replication
2-node model or quorum/odd-node model
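The quorum/odd-node model rests on simple majority arithmetic. A minimal sketch of that arithmetic follows; the function names are hypothetical and not taken from the deployment described in this talk.

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Nodes that can fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)


# Compare cluster sizes: (nodes, quorum size, tolerated failures).
for n in (2, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Under a majority vote, a 4-node cluster tolerates no more failures than a 3-node one, and a 2-node cluster tolerates none, which is why two-node setups typically rely on a primary/standby arrangement or an external tie-breaker rather than a vote.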
safeguards: data
traditional high availability operations model
anything else?
what else?
http: it’s a web server! let’s treat it as such!
high availability for web servers
what else?
mq: put stuff in queue, take stuff from queue
replicate reader/writer
extrinsic death risk
keep extra copy available at all times
is that all?
yes
smarts?
it’s in the solution architecture
get it right, else it can come back to bite you!
intrinsic death risk
it’s intrinsic, not externally triggered
intrinsic death risk
traditional operation model: monitor, alert, action
monitor
look for dead or alive
if it looks like it is dead, check again x 3, to be sure
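The traditional dead-or-alive check with re-checks can be sketched as below. This is an illustration only: `confirmed_dead` and its parameters are hypothetical names, and "x 3" is read here as three re-checks after the first failed probe.

```python
import time
from typing import Callable


def confirmed_dead(probe: Callable[[], bool], rechecks: int = 3,
                   delay_s: float = 0.0) -> bool:
    """Return True only if the first probe and every re-check fail.

    `probe` returns True when the service answers (alive).
    """
    if probe():
        return False
    for _ in range(rechecks):
        time.sleep(delay_s)  # wait between re-checks
        if probe():
            return False  # it answered on a re-check: a transient failure
    return True


# A service that fails twice and then answers is not declared dead.
responses = iter([False, False, True])
print(confirmed_dead(lambda: next(responses)))

# A service that never answers is declared dead after the re-checks.
print(confirmed_dead(lambda: False))
```

Note what this check cannot do: by the time it returns True, consumers have already been affected.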
monitor
… but it’s too late if it is dead!
intrinsic death risk?
it’s intrinsic!
highly available solution architecture will not suffice
smarts?
need to detect sick states
sick: alive, not healthy
smarts
correlation between x and y
not just causality (that’s easy!)
OpenStack architecture
causality? correlation?
challenge
non-trivial computational task
plus, “sick” in one environment may not be “sick” in another
data
2 types of data
metrics
logs
it needs to …
example: response time behavior of Web service + disk SMART errors
detect sick states that are otherwise below the radar
how?
ingest logs from the OpenStack infrastructure
time series store for machine generated data
watch log data for anomalies
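To make "watch for anomalies" concrete, here is a deliberately simple stand-in: a rolling z-score over a numeric series derived from logs, such as error lines per minute. It is not the trained decision engine the talk describes; the class name, window size and threshold are all assumptions for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flag values far from the mean of a sliding window of recent values."""

    def __init__(self, window: int = 30, threshold: float = 3.0,
                 min_history: int = 5):
        self.values = deque(maxlen=window)
        self.threshold = threshold
        self.min_history = min_history

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous relative to the window so far."""
        anomalous = False
        if len(self.values) >= self.min_history:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                anomalous = value != mu  # flat history: any change stands out
        self.values.append(value)
        return anomalous


detector = RollingAnomalyDetector(window=10)
error_counts = [2, 3, 2, 4, 3, 2, 3, 40, 3, 2]  # made-up per-minute counts
flags = [detector.observe(count) for count in error_counts]
print(flags)  # only the spike of 40 is flagged
```

A fixed threshold like this is exactly what breaks when "sick" means different things in different environments, which is the case for a retrainable engine.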
analogy
take continuous x-ray pictures of patient
feed through decision engine
if it detects certain dark spots, the patient may be “sick”
analogy
if decision engine got it wrong, need to re-train
next time, dark spots will be classed as “sick”
health
picture covers a certain configurable/variable time window
constituents of the picture: configurable
operator to trigger re-training on wrong decisions
health
shortcuts
pre-condition: has to be alive to check for “sick”
causality relationships
not a toy
highly available
no polling: operate on pushed data streams
not a toy
back-pressure sensitive: must not overload the data pipeline
horizontally scalable data store
load balancing across multiple backends
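One common way to make an ingest stage back-pressure sensitive is a bounded buffer that refuses new records when full, so the sender can slow down or sample. The sketch below uses Python's standard `queue` module; the `offer` helper and the buffer size are hypothetical and say nothing about how the actual pipeline is built.

```python
import queue


def offer(buffer: queue.Queue, record: dict) -> bool:
    """Try to enqueue without blocking; report whether the record was accepted."""
    try:
        buffer.put_nowait(record)
        return True
    except queue.Full:
        return False  # push back: the sender should slow down or sample


buffer = queue.Queue(maxsize=3)  # bounded on purpose
accepted = [offer(buffer, {"seq": i}) for i in range(5)]
print(accepted)  # [True, True, True, False, False]
```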
not a toy
high availability for an evaluation system?
massive volumes of data
how much: x00s of metrics per data source
volume: potentially saturating x00 GbE network
smarts?
smart data ingest reduction
imbalanced I/O pattern: large volume of small writes, small number of large reads
deal with risk of death and …
know imminent fault
deal with latent threat, before threat becomes patent
meet consumer expectations
avert death
7. efficiency: reduce, optimize away wastage, raise efficiency, increase utilization
efficiency
it’s about what you do with what you have
reservation
what you asked for
utilization
what is actually used
… what’s your utilization level?
traditional hpc sites: “90+% utilization according to scheduler” … but is that reservation or utilization?
the client: understands the wastage in their traditional environment
a large public cloud operator: “90% memory reservation whilst average cpu was 6% and average max cpu was 15%; this is not unusual”
leave lights on?
do you leave lights on at home for the whole week if you are home for a day?
… that’s 14% utilization!
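The difference between reservation and utilization is a ratio of what is used to what was asked for. A small sketch of the lights-at-home arithmetic, with a hypothetical helper name:

```python
def utilization(used: float, reserved: float) -> float:
    """Fraction of the reserved amount that is actually used."""
    if reserved <= 0:
        raise ValueError("reserved must be positive")
    return used / reserved


# Lights reserved for 7 days, home for 1 day.
print(f"{utilization(used=1, reserved=7):.0%}")  # 14%
```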
what to do?
need data, detailed data
what’s happening in each part of cpu operating unit, memory subsystem, i/o subsystem
form full picture of system utilization
have data, then?
large scale n-dimensional jigsaw puzzle problem
each parameter = 1 dimension
2 classes of problems
placement: where to place new VMs
rebalancing: how to optimally lay out VMs, optimal number of hosts
2 classes of problems
placement: where to place new piece
rebalancing: dimensions have changed, re-do jigsaw puzzle
solve jigsaw puzzle with x00s of dimensions
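The placement problem can be illustrated with a first-fit heuristic over several resource dimensions. This is a simple stand-in, not the optimizer used for the client: the dimension names, host figures and function names are made up for the example, and a real system would handle far more dimensions as well as rebalancing.

```python
from typing import Dict, List, Optional

Resources = Dict[str, float]


def fits(demand: Resources, free: Resources) -> bool:
    """A VM fits only if every dimension fits at the same time."""
    return all(free.get(dim, 0.0) >= amount for dim, amount in demand.items())


def place(demand: Resources, hosts: List[Resources]) -> Optional[int]:
    """First-fit: return the index of the first host with room, or None."""
    for index, free in enumerate(hosts):
        if fits(demand, free):
            for dim, amount in demand.items():
                free[dim] -= amount  # consume capacity on the chosen host
            return index
    return None


# Free capacity per host (hypothetical figures).
hosts = [
    {"cpu": 2.0, "mem_gb": 64.0, "disk_iops": 500.0},
    {"cpu": 8.0, "mem_gb": 16.0, "disk_iops": 5000.0},
    {"cpu": 8.0, "mem_gb": 32.0, "disk_iops": 5000.0},
]
vm = {"cpu": 4.0, "mem_gb": 24.0, "disk_iops": 1000.0}
print(place(vm, hosts))  # 2
```

Judged on memory alone, host 0 would look like the best choice, yet it lacks CPU and I/O headroom; only checking all dimensions together finds host 2.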
why do we need all these parameters?
answer: try packing parcels when you know only their height, not their width or depth
the impact
the client: raised utilization from 20% to 60% (leaving 40% head room)
3x utilization increase
66% energy reduction
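One way to see how these figures relate, under the assumption (not stated in the talk) that the same workload is consolidated onto fewer hosts and that energy scales with the number of powered-on hosts:

```python
def hosts_needed_ratio(util_before: float, util_after: float) -> float:
    """Fraction of the original hosts needed to carry the same work."""
    if not 0 < util_before <= util_after <= 1:
        raise ValueError("expected 0 < util_before <= util_after <= 1")
    return util_before / util_after


ratio = hosts_needed_ratio(0.20, 0.60)
print(f"utilization increase: {0.60 / 0.20:.0f}x")  # 3x
print(f"energy reduction: {1 - ratio:.1%}")          # 66.7%
```

Under that assumption a threefold utilization increase needs about one third of the hosts, which is consistent with the roughly two-thirds energy reduction quoted.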
efficient resource management
operating system-based model, throughout entire facility
dynamic, efficient: lower OpEx + CapEx
8. conditional
flexible, widely adopted
IaaS platform
conditional engine
fast, scalable, highly available
handle large volumes of conditions to evaluate
… topics for another day
9. sardina systems
this is Sardina Systems
full-lifecycle automated OpenStack: deploy, operate, upgrade
AI-driven smart, efficient, super-scalable automation technology
enable organizations to rapidly experience the value of OpenStack cloud and maximize utility of their resources
sectors: finance, government, aerospace, research, academia
in 2015, Sardina FishOS won the IDC HPC Innovation Award
top-10 trader on New York market
U of Edinburgh: top-5 UK academic and research site
150k VMs at classified government site
lessons learnt: large scale government OpenStack private cloud
dr kenneth tan kenneth.tan@sardinasystems.com
+44 798 941 7838