2012 Storage Developer Conference. © EMC. All Rights Reserved.2014 Storage Developer Conference. © EMC Corporation. All Rights Reserved.
Implementing Witness service for various cluster failover scenarios
Rafal Szczesniak, EMC/Isilon
Long time ago vs. now
SMB1 – no high availability at all
SMB2 – durable and resilient handles (file opens)
SMB3 – persistent handles, multi-channel and Witness
What is Witness?
DCE/RPC interface (see [MS-SWN])
Service providing early detection of connection failures instead of relying on TCP timeouts
Means of (partial) control over client connections
OneFS cluster
[Diagram: clients reach the cluster over the Ethernet client/application layer (NFS, SMB, FTP, HTTP, HDFS); the Isilon IQ storage layer nodes communicate within the cluster over InfiniBand; each network has an optional second switch.]
Witness Service in OneFS cluster
[Diagram: the OneFS cluster layout (clients on the Ethernet client/application layer, nodes on the InfiniBand intracluster network) with two client links labeled SMB Connection and Witness Registration.]
Interfaces and Groups
Interface group as an abstraction of cluster nodes’ network interfaces
Usually the same as a OneFS address pool
Separate groups for separate OneFS Access Zones
Caching the state of interfaces
Requesting interface information from the system on every call can be expensive
The interface state does not change very often
We can cache the information for as long as we need it
Caching on-demand
The internal list of interfaces is populated only when needed
The number of interfaces can be substantial, especially in a cluster with multiple Access Zones
Updating a large cache could be expensive too, so it’s easier to keep track of only those interfaces the clients ask about
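The on-demand approach above can be sketched as follows. This is a minimal illustration, not the OneFS implementation: the class name InterfaceCache, the query_system callback and the state strings are all assumptions made for the example.

```python
from typing import Callable, Dict, List


class InterfaceCache:
    """Caches state only for interfaces that clients have asked about."""

    def __init__(self, query_system: Callable[[str], str]) -> None:
        # query_system is a hypothetical stand-in for the expensive system lookup.
        self._query_system = query_system
        self._states: Dict[str, str] = {}

    def get_state(self, name: str) -> str:
        # Populate the entry on first request; later requests hit the cache.
        if name not in self._states:
            self._states[name] = self._query_system(name)
        return self._states[name]

    def update(self, name: str, state: str) -> bool:
        # Apply a change only to tracked interfaces; ignore everything else.
        if name in self._states:
            self._states[name] = state
            return True
        return False


lookups: List[str] = []


def fake_query(name: str) -> str:
    # Records each expensive lookup so the caching effect is visible.
    lookups.append(name)
    return "available"


cache = InterfaceCache(fake_query)
cache.get_state("10.0.0.1")
cache.get_state("10.0.0.1")              # served from the cache, no second lookup
cache.update("10.0.0.2", "unavailable")  # untracked interface, ignored
```

Only the interface a client asked about is ever looked up or updated, which keeps the cache small even when the cluster has many interfaces.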
Resource monitor
Thin layer providing access to the cluster “resources”
The only resources monitored (at the moment): Interface, Interface Group
Allows querying the current information
Allows subscribing for events and unsubscribing when the server is no longer interested in updates
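A rough sketch of such a thin layer, assuming a simple callback model. The names ResourceMonitor, query, subscribe, unsubscribe and report are hypothetical and chosen only to mirror the bullets above.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Callback = Callable[[str, str], None]  # (resource name, new state)


class ResourceMonitor:
    """Hypothetical thin layer: query current state, subscribe, unsubscribe."""

    def __init__(self) -> None:
        self._state: Dict[str, str] = {}
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def query(self, resource: str) -> str:
        # Return the last known state, or "unknown" if nothing was reported.
        return self._state.get(resource, "unknown")

    def subscribe(self, resource: str, callback: Callback) -> None:
        self._subscribers[resource].append(callback)

    def unsubscribe(self, resource: str, callback: Callback) -> None:
        if callback in self._subscribers[resource]:
            self._subscribers[resource].remove(callback)

    def report(self, resource: str, state: str) -> None:
        # Called by a module when it observes a change on a resource.
        self._state[resource] = state
        for callback in list(self._subscribers[resource]):
            callback(resource, state)


seen: List[Tuple[str, str]] = []


def on_change(resource: str, state: str) -> None:
    seen.append((resource, state))


monitor = ResourceMonitor()
monitor.subscribe("eth0", on_change)
monitor.report("eth0", "unavailable")
monitor.unsubscribe("eth0", on_change)
monitor.report("eth0", "available")  # no longer delivered to on_change
```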
What does availability mean?
Network interface failures
Server process crashes or deadlocks
System crashes
Resource monitor modules and events
Individual modules can keep track of all sorts of things independently
Subscribing to certain (or any) changes enables the module to submit events to an Interface or Interface Group
Witness server has the authority to filter the events and make its own decisions on how the clients should be notified
[Diagram: Resource monitor modules]
Resource events
Virtually any change happening to a subscribed resource can generate an event
Examples of events to watch for:
Interface state change to unavailable
New interface added to an Interface Group
Submitted events are “pre-treated” by the server before they are used to generate client notifications
Resource events (contd.)
Modules have a large degree of freedom in what can cause an event submission
The server has the authority to decide which events turn into actual notifications
Resource event
What does it include?
Module Id
Type of event (changed/added/removed)
Resource
Destination (optional, if the module has any suggestions)
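The four fields listed above could be modelled as a small record. This is only a sketch: the type and field names are assumptions, and the source does not say how events are actually represented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventType(Enum):
    CHANGED = "changed"
    ADDED = "added"
    REMOVED = "removed"


@dataclass(frozen=True)
class ResourceEvent:
    module_id: str                     # which module submitted the event
    event_type: EventType              # changed / added / removed
    resource: str                      # interface or interface group it concerns
    destination: Optional[str] = None  # optional suggestion from the module


# An event without a suggestion, and one that suggests where clients should go.
event = ResourceEvent("network", EventType.CHANGED, "10.0.0.1")
move = ResourceEvent("maintenance", EventType.CHANGED, "10.0.0.1",
                     destination="10.0.0.2")
```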
[Diagram: Interface events queue]
Keeping track of the availability
Multiple different modules look at different aspects of availability
We need all of them to give us a “go” in order to consider an Interface available
Witness server updates a list of Problems for each Interface as “go” and “no-go” signals arrive in their respective events
The list is empty = There are no problems = The interface is available
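The "empty list means available" rule can be sketched like this. It assumes each problem is identified by the id of the module that reported it, which the source does not state; the class and method names are illustrative.

```python
from typing import Dict, Set


class InterfaceAvailability:
    """Tracks outstanding problems per interface (hypothetical sketch)."""

    def __init__(self) -> None:
        # Maps interface name -> ids of modules currently reporting "no-go".
        self._problems: Dict[str, Set[str]] = {}

    def report(self, interface: str, module_id: str, go: bool) -> bool:
        problems = self._problems.setdefault(interface, set())
        if go:
            problems.discard(module_id)  # this module no longer objects
        else:
            problems.add(module_id)      # this module sees a problem
        return self.is_available(interface)

    def is_available(self, interface: str) -> bool:
        # No recorded problems means the interface is considered available.
        return not self._problems.get(interface)


tracker = InterfaceAvailability()
tracker.report("10.0.0.1", "network", go=False)
tracker.report("10.0.0.1", "process", go=False)
tracker.report("10.0.0.1", "network", go=True)  # one problem still outstanding
```

The interface becomes available again only once the "process" module also reports a "go".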
Updating interface state
Any module can submit events to an interface at any time (given subscriptions)
Witness server starts a work item (a function started in a separate thread) to process the events
After processing, subsequent work items are started to queue notifications in each individual client registration
Work items queuing the notifications resume execution of the pending asynchronous requests and send the responses to the Witness clients
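A simplified sketch of the two-stage flow described above, using a thread pool as the source of "work items". The Registration class and process_events function are assumptions; resuming the asynchronous RPC request is not modelled here, and queuing a message stands in for it.

```python
import queue
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import List, Tuple


class Registration:
    """One client registration with its own notification queue."""

    def __init__(self, client: str, interface: str) -> None:
        self.client = client
        self.interface = interface
        self.notifications: "queue.Queue[str]" = queue.Queue()


def process_events(pool: ThreadPoolExecutor,
                   events: List[Tuple[str, str]],
                   registrations: List[Registration]) -> List[Future]:
    """First work item: process events, then start one follow-up
    work item per affected registration to queue its notification."""
    futures = []
    for interface, state in events:
        for reg in registrations:
            if reg.interface == interface:
                message = f"{interface} is {state}"
                futures.append(pool.submit(reg.notifications.put, message))
    return futures


registrations = [Registration("client-a", "10.0.0.1"),
                 Registration("client-b", "10.0.0.2")]

with ThreadPoolExecutor(max_workers=4) as pool:
    # A module submits an event; the server starts a work item to process it.
    first = pool.submit(process_events, pool,
                        [("10.0.0.1", "unavailable")], registrations)
    wait(first.result())  # wait for the per-registration work items
```

Only client-a, which registered for 10.0.0.1, ends up with a queued notification.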
[Diagrams: updating interface state in four steps: submit, process, wake up, notify]
Resource monitor modules
Different modules can keep track of different things independently
Each module handles its specific failover scenario
Scenario: Testing
A module with an IPC interface and a command line client simulates the network interfaces and groups and their changes
Can create and keep an arbitrary number of groups and interfaces
Useful for simulating unusual events
[Diagram: Testing module (netsim)]
Scenario: Network interface failure
Wired to OneFS cluster networking configuration (Flexnet)
Interface and address pool information received from the system service
Waiting for changes in a separate thread watching individual address pools
Notified through file descriptors
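The watcher thread could look roughly like the sketch below. It assumes a POSIX system, and pipes stand in for whatever descriptors the system service really provides; watch_pools, the pool name and the stop descriptor are all hypothetical.

```python
import os
import selectors
import threading
from typing import Callable, Dict, List


def watch_pools(pool_fds: Dict[str, int],
                on_change: Callable[[str], None],
                stop_fd: int) -> None:
    """Block until a watched descriptor becomes readable, then report the pool."""
    sel = selectors.DefaultSelector()
    for pool_name, fd in pool_fds.items():
        sel.register(fd, selectors.EVENT_READ, pool_name)
    sel.register(stop_fd, selectors.EVENT_READ, None)  # None marks the stop request
    running = True
    while running:
        for key, _ in sel.select():
            os.read(key.fd, 1)  # drain the wake-up byte
            if key.data is None:
                running = False
            else:
                on_change(key.data)
    sel.close()


changes: List[str] = []
pool_r, pool_w = os.pipe()  # stand-in for an address pool notification channel
stop_r, stop_w = os.pipe()  # lets the main thread stop the watcher

watcher = threading.Thread(target=watch_pools,
                           args=({"pool0": pool_r}, changes.append, stop_r))
watcher.start()
os.write(pool_w, b"x")  # simulate a change in pool0
os.write(stop_w, b"x")  # ask the watcher to exit
watcher.join()

for fd in (pool_r, pool_w, stop_r, stop_w):
    os.close(fd)
```

Waiting on descriptors keeps the thread idle until something actually changes, so no polling of the networking configuration is needed.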
Flexnet Service in OneFS cluster
[Diagram: the OneFS cluster layout (clients on the Ethernet client/application layer, nodes on the InfiniBand intracluster network) with the Flexnet service labeled.]
[Diagram: Network module]
Scenario: Server process failure
OneFS Group Manager watching other nodes in the cluster provides the feed
It can keep track of the state of certain processes on other nodes
The module gets notified about changes in the same way as the Network module
Group Manager in OneFS cluster
[Diagram: the OneFS cluster layout (clients on the Ethernet client/application layer, nodes on the InfiniBand intracluster network) with the Group Manager labeled.]
Scenario: Maintenance
Sometimes we need to gracefully take a node out of the cluster
Existing client connections should “go away”
The module can make the node interfaces look unavailable
It can also move all connections to a different node or even a completely different group
Beyond failover
Witness “move” notification can be used for load balancing
What would it take?
Connection resource type (to have control over individual connections)
A module checking the load on other nodes and requesting the move if one of them is overloaded (perhaps another use for witness)
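As a purely illustrative sketch of the decision such a module might make, the function below picks a source and a target node from a load table. The function name, the load metric and the threshold are assumptions; nothing here reflects an existing OneFS feature.

```python
from typing import Dict, Optional, Tuple


def plan_move(loads: Dict[str, float],
              threshold: float) -> Optional[Tuple[str, str]]:
    """Return (overloaded node, least loaded node), or None if no move is needed."""
    if len(loads) < 2:
        return None  # nowhere to move connections to
    busiest = max(loads, key=loads.get)
    idlest = min(loads, key=loads.get)
    if loads[busiest] <= threshold or busiest == idlest:
        return None
    return busiest, idlest


# Hypothetical load figures in the range 0..1.
decision = plan_move({"node1": 0.9, "node2": 0.2, "node3": 0.5}, threshold=0.8)
```

The resulting pair would then feed a Witness "move" notification for connections on the overloaded node.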
Beyond Witness itself
Witness RPC is in fact not tied very tightly to the SMB protocol
Information provided by the Resource Monitor (network interfaces status) may be useful for other services, too
Thank you!
Questions?
Rafal Szczesniak