tim-serong
High Availability can be a curiously nebulous term, and most people probably don't care about it until they can't access their online banking service, or their plane crashes. This presentation examines some of the considerations necessary when building highly available computer systems, then focuses on the HA infrastructure software currently available from the Corosync/OpenAIS, Linux-HA and Pacemaker projects. Originally presented at Linux Users Victoria in April 2010 (http://luv.asn.au/2010/04/06)
2. Agenda
3. System Design Considerations
4. HA Clustering Software
5. What is High Availability?
6. What is High Availability?
High availability is a system design protocol and associated implementation that ensures a certain degree of operational continuity during a given measurement period.
http://en.wikipedia.org/wiki/High_Availability
7.
8. Decrease MTTR (redundant hardware + software)
What is High Availability?
Availability = MTTF / (MTTF + MTTR)
9. What is High Availability? (hopefully your hardware is better than this)
10. What is High Availability?
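The availability formula on the slide above can be made concrete with a few lines of Python. This is only a sketch: the function names and the example MTTF/MTTR figures are illustrative assumptions, not numbers from the presentation.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR), as a fraction between 0 and 1."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures: same MTTF, but repair time cut from 10 hours to 1.
    for mttr in (10, 1):
        a = availability(1000, mttr)
        print(f"MTTR={mttr}h availability={a:.5f} "
              f"downtime={downtime_hours_per_year(a):.1f}h/year")
```

With MTTF held constant, shrinking MTTR is what moves availability toward 1, which is the point of adding redundant hardware and software.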
11. What is High Availability?
12. High Availability in 37 Easy Steps
13. What is High Availability?
14. High Availability in 37 Easy Steps
15. High Availability is a Process, not a Product
16. What is High Availability? (hopefully you hired this sysadmin) http://xkcd.com/705/
17. System Design Considerations
18. System Design Considerations
19. How good is your system already?
20. Within what limits can you operate?
21. Please, for the love of Eris, keep it simple.
22. System Design Considerations
Diagram labels: Dual F/C, Ethernet, RAID, File Server, Client Network
23. System Design Considerations
Diagram labels: Dual F/C, Ethernet, RAID, File Server, Client Network
Caption: Reasonably Highly Available, Most of the Time
24. System Design Considerations
25. Redundant F/C connections
26. RAID
Bad:
27. Software can still fail
28. System Design Considerations
Diagram labels: Node 1, Node 2, Dual F/C (x2), Ethernet (x2), Private Network, File Server, RAID, Client Network
29. System Design Considerations
Diagram labels: Node 1, Node 2, Dual F/C (x2), Ethernet (x2), Private Network, File Server, RAID, Client Network
Caption: Node 2 takes over when Node 1 fails
30. System Design Considerations
31. Who's the boss if the two nodes get confused?
32. STONITH to the rescue
33. System Design Considerations
34. System Design Considerations
35. Set STONITH action to power off (not reset).
36. Get a third node.
37. Test, test, test!
38.
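The advice to get a third node follows from majority voting: when a two-node cluster splits, each side sees exactly half the votes and neither can claim to be "the boss". The sketch below assumes a simple one-vote-per-node majority rule. `has_quorum` is a hypothetical helper written for illustration, not a Pacemaker or Corosync API.

```python
def has_quorum(visible_nodes: int, total_nodes: int) -> bool:
    """True if a partition seeing `visible_nodes` holds a strict majority.

    Assumption: one vote per node, strict majority required.
    """
    if not 0 <= visible_nodes <= total_nodes:
        raise ValueError("visible_nodes must be between 0 and total_nodes")
    return 2 * visible_nodes > total_nodes


if __name__ == "__main__":
    # Two nodes, network split 1/1: neither side has a majority.
    print("2 nodes, split 1/1:", has_quorum(1, 2), has_quorum(1, 2))
    # Three nodes, split 2/1: exactly one side has a majority.
    print("3 nodes, split 2/1:", has_quorum(2, 3), has_quorum(1, 3))
```

With three nodes, any split leaves at most one partition with quorum, so the cluster can decide which side keeps running and which side gets fenced.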
39. HA Clustering Software
40. Easy to configure...
41. node1 IPaddr::192.168.1.50 Filesystem::/dev/sda1::/data1::ext3
...because it couldn't do anything.
http://theclusterguy.clusterlabs.org/post/178680309/configuring-heartbeat-v1-was-so-simple
42. HA Clustering Software
43. Resource level monitoring
44. Dependencies between resources
45. HA Clustering Software
# cibadmin -Q
... ...
46. HA Clustering Software
47. HA Clustering Software
48. Pacemaker 0.6 (CRM)
49. (also glue, agents)
Pacemaker added support for OpenAIS as an alternative to Heartbeat
50. HA Clustering Software
http://clusterlabs.org/wiki/Architecture
51. HA Clustering Software
52. OpenAIS (SA Forum APIs, i.e. magic for DLM, OCFS2, etc.)
So now we have:
53. Pacemaker 1.x on Corosync 1.x (+ OpenAIS 1.x)
54. HA Clustering Software (diagram courtesy of Lars Marowsky-Brée)
Diagram labels: Linux Kernel (SUSE only); ext3, XFS; OCFS2; cLVM2; Local Disks; SAN FC(oE), iSCSI; DRBD; Multipath IO; DLM; SCTP; TCP; UDP multicast; Ethernet; Infiniband; Bonding; SAP; MySQL; libvirt; Xen; Apache; iSCSI; Filesystems; IP address; DRBD; clvmd; Ocfs2_controld; dlm_controld; YaST2 c DRBD c OpenAIS; MPIO; LVS; Resource Agents; LSB init; STONITH; LRM; ...; DRAC; iLO; SBD; Fencing; Web GUI; Python GUI; CRM Shell; CIB; Policy Engine; Pacemaker; OpenAIS
55. HA Clustering Software
56. ...and vastly more flexible
# crm configure show
primitive IP ocf:heartbeat:IPaddr \
    params ip="192.168.1.50" \
    op monitor interval="5min"
primitive FS ocf:heartbeat:Filesystem \
    params device="/dev/sda1" directory="/data1" fstype="ext3" \
    op monitor interval="60s"
group ip-with-fs IP FS
location prefer-node1 ip-with-fs 100: node1
57. HA Clustering Software (diagram courtesy of Lars Marowsky-Brée)
Diagram labels: Network Links; Clients; Storage; Kernel (x3); Xen VM 1; LAMP; Apache; IP; ext3; Corosync + openAIS; Pacemaker; DLM; cLVM2+OCFS2; Xen VM 2
58. HA Clustering Software
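The `group` and `location` lines in the crm configuration on slide 56 can be read as two rules: group members start in the listed order (and stop in reverse), and the group prefers the online node with the highest location score. The sketch below is a deliberately simplified model of those two rules. `pick_node`, `start_order` and `stop_order` are hypothetical helpers, and the model ignores stickiness, failure counts and negative or infinite scores, all of which the real policy engine takes into account.

```python
from typing import Dict, List, Optional, Set

# Mirrors "group ip-with-fs IP FS" and "location prefer-node1 ip-with-fs 100: node1".
GROUP: List[str] = ["IP", "FS"]
LOCATION_SCORES: Dict[str, int] = {"node1": 100}


def pick_node(scores: Dict[str, int], online: Set[str]) -> Optional[str]:
    """Return the online node with the highest score (unlisted nodes score 0)."""
    candidates = sorted(online)  # sorted so that ties resolve deterministically
    if not candidates:
        return None
    return max(candidates, key=lambda node: scores.get(node, 0))


def start_order(group: List[str]) -> List[str]:
    """Group members start in the order they are listed."""
    return list(group)


def stop_order(group: List[str]) -> List[str]:
    """Group members stop in the reverse of their start order."""
    return list(reversed(group))


if __name__ == "__main__":
    print("both nodes up:", pick_node(LOCATION_SCORES, {"node1", "node2"}))
    print("node1 down:   ", pick_node(LOCATION_SCORES, {"node2"}))
    print("start:", start_order(GROUP), "stop:", stop_order(GROUP))
```

In this model the group runs on node1 while it is online and moves to node2 only when node1 is unavailable, which matches the intent of a positive, finite location score.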
59. crm(live)# cib new sandbox
60. INFO: sandbox shadow CIB created
61. crm(sandbox)# cib cibstatus load live
62. crm(sandbox)# cib cibstatus op monitor IP not_running
63. crm(sandbox)# configure ptest
64. ptest[12971]: 2010/04/05_07:43:36 WARN: unpack_rsc_op: Processing failed op IP_monitor_300000 on hex-14: not running (7)
65. HA Clustering Software
66. HA Clustering Software
Compiles:
67. Package state
68. DLM/OCFS2 state
69. System information
70. CIB history
71. Parses core dump reports (needs debuginfo packages!)
72. ...into a single tarball for subsequent analysis
73. HA Clustering Software
74. #linux-cluster
75. Various mailing lists
SUSE and Red Hat converging on cluster stacks
76. Heartbeat in maintenance mode
77. Questions and Answers
78. Further Reading
79. http://www.linux-ha.org/
80. http://www.corosync.org/
81. http://www.novell.com/products/highavailability/
82. http://www.linbit.com/
83. http://www.ourobengr.com/ha