Upload
vuthuan
View
222
Download
6
Embed Size (px)
Citation preview
Information Management
© 2009 IBM Corporation2
Agenda
� What is pureScale– Where Does it come from
– Test Results
� Deeper Dive: Key Design Points– Overall DB2 pureScale Architecture
– Scalability
– Availability• Automatic Workload Balancing & Client Routing
� Configuration and Monitoring– Cluster configuration and operational status
– Monitoring data
� Questions
Information Management
© 2009 IBM Corporation3
Agenda
� What is pureScale– Where Does it come from
– Test Results
� Deeper Dive: Key Design Points– Overall DB2 pureScale Architecture
– Scalability
– Availability• Automatic Workload Balancing & Client Routing
� Configuration and Monitoring– Cluster configuration and operational status
– Monitoring data
� Questions
Information Management
© 2009 IBM Corporation4
DB2 pureScale
� Unlimited Capacity– Buy only what you need, add capacity as
your needs grow
� Application Transparency– Avoid the risk and cost of
application changes
� Continuous Availability– Deliver uninterrupted access to your data
with consistent performance
Reduce the risk and cost of meeting changing business demands
Information Management
© 2009 IBM Corporation5
DB2 pureScale – Takeaway Themes
� It’s Easy– To install
– To administer
– To grow
– To run applications
– Automated Recovery
� It Performs– Linear Scalability
– Availability with no database freeze
– Efficient Centralized Locking and Caching
� It’s Meets OnDemand Requirements– Adding more horsepower
– Pricing … Pay for what you use when you use it
Reduce the risk and cost of meeting changing business demands
Information Management
© 2009 IBM Corporation6
DB2 pureScale Architecture
Global lock and memory manager technology
Automatic workload balancing
Shared Data
InfiniBand network
Cluster of DB2 members
Integrated Clustering Services
Information Management
© 2009 IBM Corporation7
DB2 pureScale is Easy to Deploy
Single installation for all components
Monitoring integrated into Optim tools
Single installation for fixpaks and updates
Simple commands to add and remove members
Information Management
© 2009 IBM Corporation8
Need more… just
deploy another
server and then
turn off DB2 when
you’re done.
Issue:
All year, except for
two days, the
system requires 3
servers of capacity.
Solution:
Use DB2 pureScale
and add another
server for those two
days, and only pay
sw license fees for
the days you use it.
Unlimited Capacity and Easy Scalability
� DB2 pureScale has been designed to grow to whatever
capacity your business requires
� Flexible licensing designed for minimizing costs of peak times
� Only pay for additional capacity when you use it even if for
only a single day
Over 100+ node architecture validation has been run by IBM� Add Members without changing applications and without
administrative complexity
Information Management
© 2009 IBM Corporation9
Proof of DB2 pureScale Architecture Scalability
� How far will it scale?
� Take a web commerce type workload– Read mostly but not read only
� Don’t make the application cluster aware– No routing of transactions to members
– Demonstrate transparent application scaling
Information Management
© 2009 IBM Corporation10
The Result
64 Members
95% Scalability
16 Members
Over 95%
Scalability
2, 4 and 8
Members Over
95% Scalability
32 Members
Over 95%
Scalability
88 Members
90% Scalability
112 Members
89% Scalability
128 Members
84% Scalability
Information Management
© 2009 IBM Corporation11
Dive Deeper into a 12 Member Cluster
� Looking at more challenging workload with more
updates – 1 update transaction for every 4 read transactions
– Typical read/write ratio of many OLTP workloads
� No cluster awareness in the application– No routing of transactions to members
– Demonstrate transparent application scaling
� Redundant system– 14 8-core p550s including duplexed PowerHA pureScale™
� Scalability remains above 90%
Information Management
© 2009 IBM Corporation12
Application Transparency
� Take advantage of extra capacity instantly– No need to modify your application code
– No need to tune your database infrastructure
Developers don’t even need to know more nodes are being added
Administrators can add capacity without re-tuning or re-testing
Information Management
© 2009 IBM Corporation13
Continuous Availability
Online Recovery– A key DB2 pureScale design point is
to maximize availability during failure recovery processing
– When a database member fails, only data in-flight on the failed member remains locked• In-flight = data modifications that are
part of active transactions on the member at the time it failed
– Redistribute workload to surviving nodes immediately
% of Data Available
Time (~seconds)
Only data in-flight updates
locked during recovery
Database member
failure (unplanned)
100
50
Stealth Maintenance– Allow DBAs to apply maintenance
without negotiating an outage
window
Information Management
© 2009 IBM Corporation14
Agenda
� What is pureScale– Where Does it come from
– Test Results
� Deeper Dive: Key Design Points– Overall DB2 pureScale Architecture
– Scalability
– Availability• Automatic Workload Balancing & Client Routing
� Configuration and Monitoring– Cluster configuration and operational status
– Monitoring data
� Questions
Information Management
© 2009 IBM Corporation15
Cluster Interconnect
DB2 pureScale : Technology Overview
Single Database View
Clients
Shared Database
Log Log Log Log
Shared Storage Access
CS CS CSCS
CS CS
CS
Member Member Member Member
Primary2nd-ary
DB2 engine runs on several host computers– Co-operate with each other to provide coherent access to the
database from any member
Data sharing architecture– Shared access to database– Members write to their own logs on shared disk– Parallel File System (GPFS bundled)– Logs accessible from another host (used during recovery)– SCSI 3 Persistent Reserve recommended
PowerHA pureScale technology– Efficient global locking and buffer management– Synchronous duplexing to secondary ensures availability
Low latency, high speed interconnect– Special optimizations provide significant advantages on RDMA-
capable interconnects (eg. Infiniband)
Clients connect anywhere,…… see single database– Clients connect into any member– Automatic load balancing and client reroute may change
underlying physical member to which client is connected
Integrated cluster services– Failure detection, recovery automation, cluster file system– In partnership with STG and Tivoli
Information Management
© 2009 IBM Corporation16
Agenda
� What is pureScale– Where Does it come from
– Test Results
� Deeper Dive: Key Design Points– Overall DB2 pureScale Architecture
– Scalability
– Availability• Automatic Workload Balancing & Client Routing
� Configuration and Monitoring– Cluster configuration and operational status
– Monitoring data
� Questions
Information Management
© 2009 IBM Corporation17
Group Buffer
Pool
Buffer Pool
Member 1
What Happens in DB2 pureScale to Read a Page
� Agent on Member 1 wants to read page 5011. db2agent checks local buffer pool: page not found
2. db2agent performs Read And Register (RaR) RDMA call
directly into PowerHA pureScale memory– No context switching, no kernel calls.
– Synchronous request to PowerHA pureScale
3. If PowerHA pureScale has the page, it returns the page to the
Member. Here, PowerHA pureScale replies that it does not
have the page (again via RDMA)
4. db2agent reads the page from disk
db2agent
1
2
3
4
501 501
PowerHA pureScale
Much more scalable, does not require locality of data
REG
Information Management
© 2009 IBM Corporation18
DB2 pureScale - Member 1 Updates a Row
1. Agent makes a Set Lock State (SLS) RDMA call to PowerHA pureScale for
X-lock on the row and P-lock to indicate the page will be updated– Prevents other members from making byte changes to the page at the exact same
time as this member
– SLS call takes as little as 15 microseconds end to end
2. PowerHA pureScale responds via RDMA with grant of lock request
3. Page updated
� At this point Member 1 does not
need to do anything else– P-lock is only released in a
lazy fashion
– If another Member wants it, they
can have it but otherwise
member 1 keeps it until commit
Member 2Member 2
Buffer pool GBP
db2agent
2
Member 1Member 1
3
Buffer pool
db2agent
501
501
GLM
501
1 REG REG
X - P
PowerHA pureScale
Information Management
© 2009 IBM Corporation19
DB2 pureScale - Member 1 Commits Their Update
1. Agent makes a Write And Register Multiple (WARM) RDMA call to
PowerHA pureScale for the pages it has updated
2. PowerHA pureScale will pull all the pages that have been updated directly
from the memory address of Member 1 into its global buffer pool– P-Locks released if not already released (as are X-locks for the row)
3. PowerHA pureScale invalidates
the page in all other members that
have read this page by directly updating
a bit in the other members’
buffer pool directory– Before a member accesses
this changed page again it
must get the current copy
from the Global Buffer Pool
– SILENT INVALIDATION: no CPU cycles
– No interrupt or other message processing
– Increasingly important as cluster grows
Member 2Member 2
Buffer pool GBP
db2agent
2
3
Member 1Member 1
Buffer pool
db2agent
501
501
GLM
501
1
501
501REG REG
X - P
PowerHA pureScale
Information Management
© 2009 IBM Corporation20
Member 1Member 1Page 1001
Member 2Member 2Page 1001
CFCF Page 1001
SW398122Huras4
NW849291Sachedina3
SW450321Zikopoulos2
NE111000Eaton1
Page 1001
UPDATE T1 SET C3 = 111000 WHERE C1 = 1
UPDATE T1 SET C4 = SE WHERE C1 = 4
RaR 1001
Member 1 Member 2
RaR 1001
SLS X_row1 P_1001
Valid = N
P-Lock
SW398122Huras4
NW849291Sachedina3
SW450321Zikopoulos2
NE109351Eaton1
P-Lock
SW398122Huras4
NW849291Sachedina3
SW450321Zikopoulos2
NE109351Eaton1
SLS X_row4
P_1001
NE111000Eaton1
SE398122Huras4
Release P-Lock and pull page
NE111000Eaton1
powerHA pureScale
DB2 pureScale - Two Members Update Same Page
Information Management
© 2009 IBM Corporation21
Summary on Achieving Efficient Scaling
� Deep RDMA exploitation over low latency fabric– Enables round-trip response time as low as 10-15 microseconds(lock request)
� Silent Invalidation– Informs members of page updates requires no CPU cycleson those members
– No interrupt or other message processing required
– Increasingly important as cluster grows
� Hot pages available without disk I/O from GBP memory– RDMA and dedicated threads enable read page operations in ~10s of microseconds
Information Management
© 2009 IBM Corporation22
Agenda
� What is pureScale– Where Does it come from
– Test Results
� Deeper Dive: Key Design Points– Overall DB2 pureScale Architecture
– Scalability
– Availability• Automatic Workload Balancing & Client Routing
� Configuration and Monitoring– Cluster configuration and operational status
– Monitoring data
� Questions
Information Management
© 2009 IBM Corporation23
Availability Goals
� HA automation built-in
“out of the box”:– Built-in failover, recovery and fail-back
� Unplanned events (eg. Member and
PowerHA pureScale server failures) – Instantaneous Online recovery
– All data that is not being updated is fully available during
recovery
� Planned events (eg. member
maintenance) : “Stealth” Maintenance– No errors
– No data availability loss
Information Management
© 2009 IBM Corporation24
Log
CS
CS
DB2
Member SW Failure : “Member Restart on Home Host”
Single Database View
Shared Data
Clients� kill -9 erroneously issued to a member
� DB2 Cluster Services automatically detects member’s death– Informs other members & powerHA pureScale
servers– Initiates automated member restart on same
(“home”) host– Member restart is like a database crash recovery in
a single system database, but is much faster• Redo limited to inflight transactions • Benefits from page cache in CF
� In the mean-time, client connections are transparently re-routed to healthy members– Based on least load (by default), or,– Pre-designated failover member
� Other members remain fully available throughout – “Online Failover”– Primary retains update locks held by member at the
time of failure – Other members can continue to read and update
data not locked for write access by failed member
� Member restart completes– Retained locks released and all data fully available– Transaction work routed to the restarted member
CS
DB2
CS
DB2
CS
DB2
CS
Updated Pages Global Locks
LogLogLog
Log Records Pages
PrimarySecondary
Updated Pages Global Locks
kill -9 Automatic;
Ultra Fast;
Online
Information Management
© 2009 IBM Corporation25
Log
CS
CS
DB2
Member HW Failure : “Member Restart on Guest Host (aka Restart Light)”
Shared Data
Clients� Power cord tripped over accidentally
� DB2 Cluster Services loses heartbeat and declares member down– Informs other members & PowerHA pureScale
servers– Fences member from logs and data– Initiates automated member restart on another
(“guest”) host• Using reduced, and pre-allocated memory model
– Member restart is like a database crash recovery in a single system database, but is much faster• Redo limited to inflight transactions • Benefits from page cache in PowerHA pureScale
� In the mean-time, client connections are automatically re-routed to healthy members– Based on least load (by default), or,
– Pre-designated failover member
� Other members remain fully available throughout – “Online Failover”– Primary retains update locks held by member at the
time of failure – Other members can continue to read and update
data not locked for write access by failed member
� Member restart completes– Retained locks released and all data fully available
CS
DB2
CS
DB2
CS
Updated Pages Global Locks
LogLogLog
PrimarySecondary
Updated Pages Global Locks
Fence
CS
DB2
DB2
Pages
Log Recs
Single Database View
Automatic;
Ultra Fast;
Online
Information Management
© 2009 IBM Corporation26
Log
CS
CS
DB2
Member Failback
Shared Data
Clients
� Power restored and system re-booted
� DB2 Cluster Services automatically detects system availability– Checks all pre-reqs (heartbeat, adapters, file system access, etc.)
– Informs other members and PowerHA pureScale servers
– Removes fence– Brings up member on home host
� Client connections automatically re-routed back to member
CS
DB2
CS
CS
Updated Pages Global Locks
LogLogLog
PrimarySecondary
Updated Pages Global Locks
CS
DB2
DB2
Single Database View
DB2
Information Management
© 2009 IBM Corporation27
Log
CS
CS
DB2
Primary PowerHA pureScale Failure
Shared Data
Clients
� Power cord tripped over accidentally
� DB2 Cluster Services loses heartbeat and declares primary down– Informs members and secondary– PowerHA pureScale service momentarily
blocked– All other database activity proceeds
normally• Eg. accessing pages in bufferpool, existing locks, sorting, aggregation, etc
� Members send missing data to secondary– Eg. read locks and page registrations
� Secondary becomes primary– PowerHA pureScale service continues
where it left off– No errors are returned to DB2 members
• Suspend lock timeouts
CS
DB2
CS
DB2
CS
Updated Pages Global Locks
LogLogLog
PrimarySecondary
Updated Pages Global Locks
CS
DB2
Single Database View
Primary
Automatic;
Ultra Fast;
Online
Information Management
© 2009 IBM Corporation28
Automatic Workload Balancing & Client Routing
� Run-time load information used to automatically balance load across the cluster (as in System z sysplex)– Load information of all members kept on each member– Piggy-backed to clients regularly– Used to route next connection (or optionally next transaction) to least loaded member– Routing occurs automatically (transparent to application)
� Failover– Load of failed member evenly distributed to surviving members automatically
� Fallback
ClientsClients
Information Management
© 2009 IBM Corporation29
Optional Affinity-based Routing App ServersGroup A
<affinity_list>
<list name="list1"
serverorder=“member1,member2,member3,member4" ></list>
<list name="list2"
serverorder=“member2,member3,member4,member1" ></list>
<list name="list3"
serverorder=“member3,member4,member1,member2" ></list>
<list name="list4"
serverorder=“member4,member1,member2,member3" ></list>
</affinity_list>
<client_affinity_defined>
<client name=“groupA" hostname="appsrv1.ibm.com" listname="list1" >
<client name=“groupB" hostname="appsrv2.ibm.com" listname="list2" >
<client name=“groupC" hostname="appsrv3.ibm.com" listname="list3" >
<client name=“groupD" hostname="appsrv4.ibm.com" listname="list4" >
</client_affinity_defined>
App ServersGroup B
App ServersGroup C
App ServersGroup D
db2dsdriver.cfg file excerpt:
Information Management
© 2009 IBM Corporation30
“Stealth” Maintenance : Example
Log LogLogLog
1. Ensure automatic load balancing (default) or automatic routing is enabled
2. db2stop member 3 quiesce<timeout>
3. db2stop instance on host <hostname>--also--
db2cluster –cm –enter –maintenance
4. Perform desired maintenance (eg. install AIX PTF)
5. db2cluster –cm –exit maintenance-- also --
db2start instance on host <hostname>
6. db2start member 3
Single Database View
DB2 DB2 DB2 DB2
Information Management
© 2009 IBM Corporation31
Agenda
� What is pureScale– Where Does it come from
– Test Results
� Deeper Dive: Key Design Points– Overall DB2 pureScale Architecture
– Scalability
– Availability• Automatic Workload Balancing & Client Routing
� Configuration and Monitoring– Cluster configuration and operational status
– Monitoring data
� Questions
Information Management
© 2009 IBM Corporation32
Low Cost of Management
�Over-arching goal: systems management
comparable to a single system.–Fully automated install• Including instance creation with multiple members and
PowerHA pureScale servers using the GUI.
–SQL and commands to view the state of the cluster
–High degree of automated recovery from unplanned
events
–Simple procedures for planned events• Grow and shrink the cluster by adding and dropping
members and PowerHA pureScale servers
• Add and remove disk space from a database
• Hardware and software maintenance activities
Information Management
© 2009 IBM Corporation33
Simple Installation
1. Complete pre-requisite work� AIX installed, on the network, access to shared disks
2. Add the member db2iupdt –add –m <MemHostName:MemIBHostName> InstName
Note: extending and shrinking the instance is an offline task in the initial release
SD image
You can also:� Drop member� Add / drop CF
3. DB2 does all tasks to add the member to the cluster� Copies the image and
response file to host6� Runs install� Adds M4 to the resources
for the instance.� Sets up access to the
cluster file system for M4
� Initial installation� Complete pre-requisite work: AIX installed, hosts on the network, access to shared
disks enabled.� Copies the DB2 pureScale image to the Install Initiating Host.� Installs the code on the specified hosts using a response file.� Creates the instance, members, and primary and secondary PowerHA pureScale
servers as directed. � Adds members, primary and secondary PowerHA pureScale servers, hosts, HCA
cards, etc. to the domain resources.� Creates the cluster file system and sets up each member’s access to it.
Add a member
host6host0
Install
Member 0
CSCS
scp image and rspfile
host4
Install
CSCS
Primary
host5
Install
CSCS
Secondary
Install
host1
Member 1
CSCS
Install
host2
Member 2
CSCS
Install
host3
Member 3
CSCS
Install
Initiating
Host
Copy Image Locally
DB2
pureScale
Image
Member 4
Install
CSCS
Member 4
Information Management
© 2009 IBM Corporation34
Instance and Host Status
0 host0 0 - MEMBER1 host1 0 - MEMBER2 host2 0 - MEMBER3 host3 0 - MEMBER4 host4 0 - CF5 host5 0 - CF
db2nodes.cfg
Host status
Instance statusDB2 DB2 DB2 DB2
Single Database View
CF CF
Shared Data
host1host0 host3host2
host5
Clients
host4
> db2start08/24/2008 00:52:59 0 0 SQL1063N DB2START processing was successful. 08/24/2008 00:53:00 1 0 SQL1063N DB2START processing was successful.
08/24/2008 00:53:01 2 0 SQL1063N DB2START processing was successful.08/24/2008 00:53:01 3 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
> db2instance -list
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT
0 MEMBER STARTED host0 host0 NO
1 MEMBER STARTED host1 host1 NO
2 MEMBER STARTED host2 host2 NO
3 MEMBER STARTED host3 host3 NO
4 CF PRIMARY host4 host4 NO
5 CF PEER host5 host5 NO
HOST_NAME STATE INSTANCE_STOPPED ALERT
host0 ACTIVE NO NO
host1 ACTIVE NO NO
host2 ACTIVE NO NO
host3 ACTIVE NO NO
host4 ACTIVE NO NO
host5 ACTIVE NO NO
Information Management
© 2009 IBM Corporation35
Operational Monitoring� New monitoring views and SQL functions
• SYSIBMADM.DB2_CF• SYSIBMADM.DB2_MEMBER
– Global locking and global bufferpool statistics– Drill down into other PowerHA pureScaleinternal statistics
– Cluster communications time
– Cross-member page access statistics
� Drill down per member or get global view– Available from any member
� Event monitors “always available” mode– DB2 pureScale chooses initial member automatically
– Fails over automatically if member fails
� Various new monitoring elements– Example, GBP tuning related elements (partial list):• DATA_GBP_L_READS• DATA_GBP_P_READS• INDEX_GBP_L_READS• INDEX_GBP_P_READS
DB2 DB2 DB2 DB2
Single Database View
CF
Shared Data
host1host0 host3host2
host5
Clients
host4
Information Management
© 2009 IBM Corporation36
DB2 pureScale & Optim Tooling Solutions� Consistent and integrated approach
to offering tools for DB2 pureScale– Optim tools offered for enterprise and
distributed versions of DB2 fully aware of
DB2 pureScale environments
� Database Administration– Ability to perform common administration
tasks across members and PowerHA
PureScale servers
– Integrated navigation through shared
data instances
� System Monitoring– Seamless view of status and statistics
across all instances
– Includes information on locking,
connection, storage, memory, etc.
� Application Development– Full support for developing Java, C, and
.NET applications against a pureScale
environment
Select Quiesce options that
define how and when the action should occurLaunch Desired
Administration Task Assistant Select which
member to quiesce before taking it offline
View, modify, or execute the commands to complete a
task