Upload
yepa
View
59
Download
1
Embed Size (px)
DESCRIPTION
Alan Robertson IBM Linux Technology Center [email protected]. Linux-HA Release 2. Linux-HA Release 2. What is High-Availability (HA) Clustering? What can HA do for me? What is the Linux-HA project? Linux-HA applications Linux-HA customers Linux-HA release 1 capabilities - PowerPoint PPT Presentation
Citation preview
-- Linux-HA Release 2 Preview February/March, 2005
Linux-HA Release 2
Alan Robertson
IBM Linux Technology Center
-- Linux-HA Release 2 Preview February/March, 2005
Linux-HA Release 2
What is High-Availability (HA) Clustering?
What can HA do for me?
What is the Linux-HA project?
Linux-HA applications
Linux-HA customers
Linux-HA release 1 capabilities
Linux-HA release 2 capabilities
Comparative Architectures
Release 2 Details
Futures
-- Linux-HA Release 2 Preview February/March, 2005
What Is HA Clustering?
Putting together a group of computers which trust each other to provide a service even when system components fail
When one machine goes down, others take over its work
This involves IP address takeover, service takeover, etc.
New work comes to the “takeover” machine
Not primarily designed for high-performance
-- Linux-HA Release 2 Preview February/March, 2005
What Can HA Clustering Do For You?
It cannot achieve 100% availability – nothing can.
HA Clustering designed to recover from single faults
It can make your outages very short
From about a second to a few minutes
It is like a Magician's (Illusionist's) trick:
When it goes well, the hand is faster than the eye
When it goes not-so-well, it can be reasonably visible
A good HA clustering system adds a “9” to your base availability
99->99.9, 99.9->99.99, 99.99->99.999, etc.
Complexity is the enemy of reliability!
-- Linux-HA Release 2 Preview February/March, 2005
Single Points of Failure (SPOFs)
A single point of failure is a component whose failure will cause near-immediate failure of an entire system or service
Good HA design eliminates of single points of failure
-- Linux-HA Release 2 Preview February/March, 2005
How Does HA work?
Manage redundancy to improve service availability
Like a cluster-wide-super-init on steroids
Even complex services are now “respawn”
on node (computer) death
on “impairment” of nodes
on loss of connectivity
for services that aren't working (not necessarily stopped)
managing very complex dependency relationships
-- Linux-HA Release 2 Preview February/March, 2005
Redundant Communications
Intra-cluster communication is critical to HA system operation
Most HA clustering systems provide mechanisms for redundant internal communication for heartbeats, etc.
External communications is usually essential to provision of service
External communication redundancy is usually accomplished through routing tricks
Having an expert in BGP or OSPF is a help
-- Linux-HA Release 2 Preview February/March, 2005
Redundant Data Access
Replicated
Copies of data are kept updated on more than one computer in the cluster
Shared
Typically Fiber Channel Disk (SAN)
Sometimes shared SCSI
Back-end Storage (“”)
NFS, SMB
Back-end database
-- Linux-HA Release 2 Preview February/March, 2005
The Desire for HA systems
Who wants low-Who wants low-availability systems?availability systems?Why are so few systems High-Availability?
-- Linux-HA Release 2 Preview February/March, 2005
Why isn't everything HA?
Cost
Complexity
-- Linux-HA Release 2 Preview February/March, 2005
-- Linux-HA Release 2 Preview February/March, 2005
The Linux-HA Project
Linux-HA is the oldest high-availability project for Linux, with the largest associated community
The core piece of Linux-HA is called “heartbeat”(though it does much more than heartbeat)
Linux-HA has been in production since 1999, and is currently in use on about ten thousand sites
Linux-HA also runs on FreeBSD and Solaris, and is being ported to OpenBSD and others
Linux-HA is shipped with every major Linux distribution except one.
-- Linux-HA Release 2 Preview February/March, 2005
Linux-HA Release 1 Applications
Load Balancers
Web Servers
Database Servers
Custom Applications
Firewalls
Retail Point of Sale Solutions
Authentication
File Servers
Proxy Servers
Medical ImagingAlmost any type server application you can think of – except SAP
-- Linux-HA Release 2 Preview February/March, 2005
Linux-HA customersEmageonEmageon – medical imaging services
Contraloria General de la RepublicaContraloria General de la Republica (Colombian government)
IncredimailIncredimail bases their mail service on Linux-HA on IBM hardware
Karstadts'Karstadts' uses Linux-HA in each of several hundred stores
Bavarian Radio StationBavarian Radio Station (Munich) coverage of 2002 Olympics in Salt Lake City
Circuit City, Autozone, others Circuit City, Autozone, others uses Linux-HA in each of several hundred stores
Citysavings BankCitysavings Bank in Munich (infrastructure)
University of Toledo (US)University of Toledo (US) – 20k student Computer Aided Instruction system
AutostradaAutostrada – 230 clusters across country
The Weather ChannelThe Weather Channel (weather.com)
SonySony (manufacturing)
ISO New EnglandISO New England manages power grid using 25 Linux-HA clusters
-- Linux-HA Release 2 Preview February/March, 2005
Linux-HA Release 1 capabilities
Supports 2-node clusters
Can use serial, UDP bcast, mcast, ucast comm.
Fails over on node failure
Fails over on loss of IP connectivity
Capability for failing over on loss of SAN connectivity
Limited command line administrative tools to fail over, query current status, etc.
Active/Active or Active/Passive
Simple resource group dependency model
Requires external tool for resource monitoring
SNMP monitoring
-- Linux-HA Release 2 Preview February/March, 2005
Linux-HA Release 2 capabilities
Built-in resource monitoring
Support for the OCF resource standard
Much Larger clusters supported (>= 8 nodes)
Sophisticated dependency model with rich constraint support (resources, groups, incarnations, master/slave) (needed for SAP)
XML-based resource configuration
Configuration and monitoring GUI
Support for GFS cluster filesystem
Multi-state (master/slave) resource support
Initially - no IP, SAN monitoring
-- Linux-HA Release 2 Preview February/March, 2005
Release 2 Credits
Andrew Beekhof – CRM, CIB
Gouchun Shi – significant infrastructure improvements
Sun, Jiang Dong and Huang, Zhen – LRM, Stonithd and testing
Lars Marowsky-Bree – architecture, PHB :-)
Alan Robertson – architecture, project leadership, original heartbeat code and testing
-- Linux-HA Release 2 Preview February/March, 2005
Linux-HA Release 1 Architecture
-- Linux-HA Release 2 Preview February/March, 2005
Linux-HA Release 2 Architecture(add TE and PE)
-- Linux-HA Release 2 Preview February/March, 2005
Resource Objects in Release 2
Release 2 supports “resource objects” which can be any of the following:
Primitive ResourcesOCF, heartbeat-style, or LSB resource agent scripts
Resource Incarnations – need “n” resource objects - somewhere
Resource groups – a group of resources with implied co-location and linear ordering constraints
Multi-state resources (master/slave)Designed to model master/slave (replication) resources (DRBD, et al)
-- Linux-HA Release 2 Preview February/March, 2005
Basic Dependencies in Release 2
Ordering Dependencies
start before (implies stop after)
start after (implies stop before)
Mandatory Co-location Dependencies
must be co-located with
cannot be co-located with
-- Linux-HA Release 2 Preview February/March, 2005
Resource Location Constraints
Mandatory Constraints:
Resource Objects can be constrained to run on any selected subset of nodes. Default is none.
Preferential Constraints:
Resource Objects can also be preferentially constrained to run on specified nodes by providing weightings for arbitrary logical conditions
The resource object is run on the node which has the highest weight (score)
-- Linux-HA Release 2 Preview February/March, 2005
Resource Incarnations
Resource Incarnations allow one to have a resource which runs multiple (“n”) times on the cluster
This is useful for managing
load balancing clusters where you want “n” of them to be slave servers
Cluster filesystems
Cluster Alias IP addresses
-- Linux-HA Release 2 Preview February/March, 2005
Resource Groups
Resource Groups provide a shorthand for making a creating ordering and co-location dependencies
Each resource object in the group is declared to have linear start-after ordering relationships
Each resource object in the group is declared to have co-location dependencies on each other
This is an easy way of converting release 1 resource groups to release 2
-- Linux-HA Release 2 Preview February/March, 2005
Multi-State (master/slave) Resources
Normal resources can be in one of two stable states:
Multi-state resources can have more than two stable states. For example:
This is ideal for modeling replication resources like DRBD
-- Linux-HA Release 2 Preview February/March, 2005
Advanced Constraints
Nodes can have arbitrary attributes associated with them in name=value form
Attributes have types: int, string, version
Constraint expressions can use these attributes as well as node names, etc in largely arbitrary ways
Operators:
=, != <, >, <=, >=,
defined(attrname), undefined(attrname),
colocated(resource id), not colocated(resource id)
-- Linux-HA Release 2 Preview February/March, 2005
Advanced Constraints (cont'd)
Each constraint is associated with particular resource, and is evaluated in the context of a particular node.
A given constraint has a boolean predicate associated with it according to the expressions before, and is associated with a weight, and a condition.
If the predicate is true, then the condition is used to compute the weight associated with locating the given resource on the given node.
Supported conditions are: (these distinctions may be unneeded ?)
can: same as prefer with MAXINT weight
cannot: same as prefer with -MAXINT weight
prefer: positive weight
prefer not: same as prefer with negative weight
-- Linux-HA Release 2 Preview February/March, 2005
Security Considerations
Cluster: A computer whose backplane is the Internet
If this isn't frightening, you don't understand...
You may think you have a secure cluster network
You're probably mistaken now
You will be in the future
-- Linux-HA Release 2 Preview February/March, 2005
Secure Networks are Difficult Because...
Security is not often well-understood by adminsSecurity is well-understood by “black hats”Network security is easy to breach accidentally
Users bypass it
Hardware installers don't fully understand it
Most security breaches come from “trusted” staffStaff turnover is often a big issue
Virus/Worm/P2P technologies will create new holes especially for Windows machines
-- Linux-HA Release 2 Preview February/March, 2005
Security Advice
Good HA software should be designed to assume insecure networks
Not all HA software assumes insecure networks
Good HA installation architects use dedicated (secure?) networks for intra-cluster HA communication
Crossover cables are reasonably secure – all else is suspect
-- Linux-HA Release 2 Preview February/March, 2005
References
New Web site content (in progress)
(pretty - offline!)
(editable)
-- Linux-HA Release 2 Preview February/March, 2005
Legal Statements
IBM is a trademark of International Business Machines Corporation.
Linux is a registered trademark of Linus Torvalds.
Other company, product, and service names may be trademarks or service marks of others.
This work represents the views of the author and does not necessarily reflect the views of the IBM Corporation.