Failover Clustering Pro Troubleshooting with Windows Server 2008 R2
Steven Ekren
Senior Program Manager
Microsoft Corporation
SESSION CODE: WSV314
Agenda
• Qualifying Clusters
• Cluster Validation
• Cluster Event Logging
• Cluster Debug Logging
• Troubleshooting Tips
Win2008+ Cluster Support Policy
What defines Microsoft supportability:
• Purchase components which all have a “Certified for Windows Server 2008” logo
  • Servers, Storage, HBAs, MPIO DSMs, etc…
• Connect your hardware
• Run Validate to verify interoperability
• If Validate passes, it’s supported!
If you make a change… just re-run Validate. It’s that simple!
No changes to this policy in Win2008 R2
See this doc for more details:
http://go.microsoft.com/fwlink/?LinkID=119949
Frequently Asked Questions
• Can I create Guest Clusters with VMs? Yes!
• Mix physical and virtual in the same cluster? Yes!
• Do the servers have to be identical? No
If it passes Validate, it’s supported!
Cluster Validation Tool
• Built into the product
• Tests a collection of servers and storage that is intended to be a cluster
• Run Validate each and every time you:
  • Create a new cluster
  • Add a node, disk, or network
  • Update system software (drivers, firmware, service packs)
  • Configure hardware (HBA, MPIO, NIC teaming)
  • Change any component in your solution
• It’s the very first thing you do!
• Run on configured clusters as a diagnostic tool
Enhanced Validation in R2
• Scriptable with the Test-Cluster PowerShell cmdlet
• Collects configuration information
• Includes additional “best practices” tests
  • GUI Dependencies
  • Quorum configuration
  • Status of cluster resources
• Offers prescriptive guidance to achieve higher availability
New Validation Tests in R2
Cluster Configuration
• List Information (Core Group, Networks, Resources, Storage, Services and Applications)
• Validate Quorum Configuration
• Validate Resource Status
• Validate Service Principal Name
• Validate Volume Consistency
Network
• List Network Binding Order
• Validate Multiple Subnet Properties
System Configuration
• Validate Cluster Service and Driver Settings
• Validate Memory Dump Settings
• Validate OS Installation Options
  (Replaced Validate Operating Systems)
• Validate System Drive Variable
Using Validate
DEMO
Troubleshooting Tips
It’s best to use Validate first when:
• Problem creating a cluster…
• Problem with storage...
Considerations…
• Validating storage requires disks be Offline, which means you need to schedule a maintenance window
• Running Validate with only a single node won’t help you much…
• You don’t always need to run a FULL validate
  http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx
Don’t “assume” the cluster will work and skip Validate
Validate Logging
Client side – Holistic Overview
• Client refers to the system running CluAdmin.msc which is invoking Validate (could be running RSAT)
• Global view
• Log file in C:\Windows\Cluster\Reports
  • “Validation Report YYYY.MM.DD at HH.MM.SS.MHT”
Server side – Verbose Debug
• For storage tests there is a verbose log which can contain additional information
• Log file in C:\Windows\Cluster\Reports
  • ValidateStorage.log
• Similar in granularity to a Cluster.log
• Each node has a unique log
  • Need to collect logs from all nodes to get a holistic view
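Because the client-side report name encodes its own timestamp, the newest report in a collected Reports directory can be picked out mechanically. The sketch below is only an illustration: the function names `report_timestamp` and `newest_report` are hypothetical, and it assumes the file names follow the "Validation Report YYYY.MM.DD at HH.MM.SS.MHT" pattern exactly as shown on the slide.

```python
from datetime import datetime
import re

# Pattern taken from the slide: "Validation Report YYYY.MM.DD at HH.MM.SS.MHT"
REPORT_RE = re.compile(
    r"^Validation Report (\d{4})\.(\d{2})\.(\d{2}) at (\d{2})\.(\d{2})\.(\d{2})\.mht$",
    re.IGNORECASE,
)


def report_timestamp(filename):
    """Return the datetime encoded in a validation report name, or None."""
    match = REPORT_RE.match(filename)
    if match is None:
        return None
    return datetime(*(int(part) for part in match.groups()))


def newest_report(filenames):
    """Pick the most recent validation report from a directory listing."""
    dated = [(report_timestamp(name), name) for name in filenames]
    dated = [(stamp, name) for stamp, name in dated if stamp is not None]
    return max(dated)[1] if dated else None
```

Files that do not match the pattern, such as ValidateStorage.log, are simply skipped, so the listing of the whole Reports directory can be passed in unfiltered.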
Event Viewer
Where to find Cluster events in Event Viewer

Cluster Events
Level          System Channel   Operational Channel
Critical       ✓
Error          ✓
Warning        ✓
Informational                   ✓

• Operational channel found under:
  – Applications and Services Logs \ Microsoft \ Windows \ FailoverClustering
Viewing Events Cluster Wide
• Failover Cluster Manager (CluAdmin.msc) provides an aggregated view of cluster events from all nodes
• Click “Recent Cluster Events” to see all Errors and Warnings cluster wide in the last 24 hours
• Build your own event queries
Built-In Event Queries
On the right-hand ‘Actions’ pane in Failover Cluster Management there are links to open filtered events
• Application Level: events associated with all resources in the group
• Resource Level: events related to that specific resource
New Diagnostic Logging with R2
• Capture snap-in pop-ups
  • Even before cluster creation
• New debug logging channels
  • Disabled by default
  • Enabled for advanced troubleshooting
• Cluster.log converted to an ETW channel, now appears in Event Viewer as well
Tip: Be sure to click on View / Show Analytic and Debug Logs
Capture Event Queries
• Save Failover Cluster Manager event query results as EVTX files for future analysis
• Enables you to build an aggregated / filtered collection of the events needed and send them to someone else
Viewing Cluster Events
DEMO
Understanding Cluster Events
• Every cluster event edited with improved descriptive text and error codes in Win2008
• Online troubleshooting steps for all cluster events:
  http://technet2.microsoft.com/windowsserver2008/en/library/19adfd9a-6688-455c-8c33-4fc4b0da6e251033.mspx?mfr=true
Monitoring Cluster Events
Fully featured Failover Cluster Management Packs for:
• System Center Operations Manager 2007
• Microsoft Operations Manager 2005
Troubleshooting Tips
• When you encounter a problem, always, always, always start with cluster Events
  • Look at a cluster wide view of the cluster events
  • Dig into all events in the System Event log
  • Check the Application Event log
• Don’t be distracted by symptoms - focus on root cause
  • For example, if you see cluster IP Address failures, don’t waste lots of time looking at cluster events
  • Instead look for other networking related errors
  • There may be multiple retries after a failure, producing more events. Look for what caused the first failure
Cluster Debug Logging
• Migration from legacy cluster debug logging (cluster.log) to Event Tracing for Windows (ETW)
  • Legacy text based Cluster.log no longer exists
  • All cluster debug logging done to an event trace session: Microsoft-Windows-FailoverClustering
Configuring the Log
• Logging enabled by default
• Log files stored as .ETL in:
  %WinDir%\System32\winevt\logs\Microsoft-Windows-FailoverClustering
• Default log size is 100 MB
  • Stored as a cluster property (ClusterLogSize)
  • Change via: Set-ClusterLog -Size 100
• Each time a node is rebooted, file suffix is incremented
  • ~Diagnostic.etl.001
• Up to three log files
  • This means log history can be kept for up to three reboots
  • The number of logs can be modified via the registry:
    HKLM\Software\Microsoft\Windows\CurrentVersion\WINEVT\Channels\Microsoft-Windows-FailoverClustering/Diagnostic\FileMax
[Diagram: Etl.001, Etl.002, Etl.003, with a reboot after each file; each of these logs is circular]
ETL Logging
• The cluster writes debug logging to ETL files in the %systemroot%\system32\winevt\Logs subdirectory
  • Microsoft-Windows-FailoverClustering%4Diagnostic.etl.00x
• Each individual log is circular
• Reboot will cause logging to start a new log
• By default, there are 3 logs and each log has a maximum size of 100 MB
Understanding Log Gaps
• An ETL file lasts for the uptime of a node
• A new ETL file is used each time you restart the node
  • When you restart, you move on to the next file. After you have restarted 3 times you return to the first file.
• Each ETL file has a log size of 100 MB and wraps on itself, but only within its own log
• The cmdlet merges all the .ETL logging data into a single contiguous text file
  • This is extremely confusing, and a common question is where the data went
  • In reality, it is ok… you didn’t need it anyway
[Diagram: Etl.001, Etl.002, Etl.003, with a reboot between consecutive files]
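The rotation described above is simple modular arithmetic, which a short sketch can make concrete. This is an illustration rather than how the cluster service is implemented: the function names are hypothetical, and it assumes the first boot writes to suffix .001 and that the default of three files (FileMax) applies.

```python
def etl_suffix(reboots, file_max=3):
    """Return the 1-based ETL file suffix in use after `reboots` restarts.

    Assumption: the first boot (reboots == 0) writes to suffix 1.
    """
    if reboots < 0 or file_max < 1:
        raise ValueError("reboots must be >= 0 and file_max must be >= 1")
    return reboots % file_max + 1


def etl_name(reboots, file_max=3):
    """Build the diagnostic ETL file name for the current boot session."""
    base = "Microsoft-Windows-FailoverClustering%4Diagnostic.etl"
    return f"{base}.{etl_suffix(reboots, file_max):03d}"


def sessions_retained(reboots, file_max=3):
    """List the boot sessions (0 = first boot) whose logs have not been reused."""
    first = max(0, reboots - file_max + 1)
    return list(range(first, reboots + 1))
```

With the defaults, a node on its fourth restart is writing to the first file again, so only the three most recent boot sessions can appear in the merged text log. That is one source of the gaps this slide describes; the other is each 100 MB file wrapping within a single uptime.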
Producing the Cluster.log
• Cluster trace session can be dumped to a text file that looks very similar to the legacy Cluster.log
• Dumps the Cluster ETW channel to a text log located at:
  %WinDir%\Cluster\Reports\Cluster.log
Get-ClusterLog cmdlet
Switch         Effect
-Destination   Dumps the logs on all nodes in the entire cluster and copies them to a single directory at this location
-TimeSpan      Just dump the last X minutes of the log
-Node          Useful when the ClusSvc is down to dump a specific node’s logs
Viewing Log
Tracerpt.exe can be used to dump the trace session to:
• .EVTX and view in Event Viewer (eventvwr.msc)
• .XML and apply a script to parse the XML log data into any format you please
Cluster Log Output Levels
Level          Error  Warning  Info  Verbose  Debug
0 (disabled)
1              ✓
2              ✓      ✓
3              ✓      ✓        ✓
4              ✓      ✓        ✓     ✓
5              ✓      ✓        ✓     ✓        ✓
Cluster Logging Levels
• Logging level is configurable cluster wide
  • Set-ClusterLog -Level 3
• Logging levels to control Cluster.log granularity:
  • Level 3 is the default
  • More verbose levels can have performance impact
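The output-level table is cumulative: each level adds one more severity to those below it. A minimal sketch of that rule follows; the names `SEVERITIES`, `severities_for_level`, and `is_logged` are hypothetical and only model the table, they are not part of any cluster API.

```python
# Severities in the order the slide's table adds them, one per level.
SEVERITIES = ("Error", "Warning", "Info", "Verbose", "Debug")


def severities_for_level(level):
    """Return the severities written to the log at a given level (0 = disabled)."""
    if not 0 <= level <= len(SEVERITIES):
        raise ValueError("level must be between 0 and 5")
    return SEVERITIES[:level]


def is_logged(level, severity):
    """Tell whether entries of `severity` are captured at `level`."""
    return severity in severities_for_level(level)
```

At the default level 3 this yields Error, Warning, and Info, which is why Verbose and Debug entries only appear after the level is raised.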
Cluster Logging
DEMO
Logging Tips
All cluster logs are captured in a single directory:
C:\Windows\Cluster\Reports
Includes:
• Validation Report .MHT logs
• ValidateStorage.log
• CreateCluster .MHT logs
• Add Node .MHT logs
• Log files for configuring an HA role
• Cluster.log
• Enabling Disks for CSV logs
A great directory to zip up and send off when needing help
Troubleshooting Tips
• The cluster log is verbose and complex!
  • It should be the last place you go, not the first
• Make sure your cluster.log captures at least 72 hours of data
  • Mileage will vary depending on how noisy apps are
• Cluster log timestamps are in GMT, while event log timestamps are in local time
• Start at the bottom of the log and work your way backwards searching for “ERR” lines
• Use NET HELPMSG to decipher error codes
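Two of the tips above, reading the log from the bottom for “ERR” lines and reconciling GMT log timestamps with local event log times, can be combined in a small script. The sketch below rests on an assumption the slides do not state: that each Cluster.log line looks like `<process id>.<thread id>::YYYY/MM/DD-HH:MM:SS.mmm LEVEL message`. The function names and the `local_offset_hours` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone
import re

# Assumed line layout: "<pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm LEVEL message"
LINE_RE = re.compile(
    r"^[0-9a-f]+\.[0-9a-f]+::"
    r"(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+(\w+)\s+(.*)$",
    re.IGNORECASE,
)


def parse_line(line):
    """Split a log line into (GMT timestamp, level, message), or return None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, level, message = match.groups()
    when = datetime.strptime(stamp, "%Y/%m/%d-%H:%M:%S.%f")
    return when.replace(tzinfo=timezone.utc), level, message


def errors_newest_first(lines, local_offset_hours=0):
    """Walk the log from the bottom up and return ERR entries in local time."""
    local = timezone(timedelta(hours=local_offset_hours))
    found = []
    for line in reversed(list(lines)):
        parsed = parse_line(line)
        if parsed is None or parsed[1] != "ERR":
            continue
        when, _, message = parsed
        found.append((when.astimezone(local), message))
    return found
```

Since retries after a failure produce more ERR lines, the last element of the returned list (the oldest error) is often the more interesting one when looking for the first failure.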
Performance Counters
• Monitor cluster API, resource failures, communication, I/O patterns
  • Useful for monitoring CSV I/O redirection
• See this blog for more details: http://blogs.msdn.com/clustering/archive/2009/09/04/9891266.aspx
PowerShell Support
• All Cluster cmdlets now have full help online (aka it’s searchable!)
  http://technet.microsoft.com/en-us/library/ee461009.aspx
• Generate and configure cluster debug log
• Support for “read-only” access
  • Enables help desk to view (and not modify) the state of the cluster
CSV Troubleshooting Tips
• CSV will redirect I/O over multiple fabrics
  • Direct I/O over block storage: Fiber Channel / SAS / iSCSI
  • Redirected I/O over file storage: Network over SMB
• When troubleshooting a CSV “storage” problem, it could really be a network problem
  • Check network connectivity between nodes
  • Ability to authenticate with a domain controller
  • Don’t make assumptions, things are different!
Troubleshooting RHS Terminations
How clustering deals with unresponsive resources:
a) RHS makes calls to resources (IsAlive, LooksAlive, Online, Offline, Terminate, etc…)
b) If that resource doesn’t respond, cluster health detection attempts to recover
c) The RHS process is restarted, so the resource can be restarted
Generates an Event 1230:
Cluster resource 'Resource Name' (resource type '', DLL 'blah.dll') either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor.
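When many Event 1230 entries have been exported, pulling out the resource name, type, and DLL from the message text helps decide where to look next. The sketch below is illustrative only: the function names are hypothetical, the pattern is derived from the message text quoted on this slide, and the hints are the two this deck gives for Physical Disk and Network Name resources.

```python
import re

# Accept straight and curly single quotes, since exported text can mix them.
QUOTE = "'\u2018\u2019"
EVENT_1230_RE = re.compile(
    rf"Cluster resource [{QUOTE}](?P<name>[^{QUOTE}]*)[{QUOTE}] "
    rf"\(resource type [{QUOTE}](?P<type>[^{QUOTE}]*)[{QUOTE}], "
    rf"DLL [{QUOTE}](?P<dll>[^{QUOTE}]*)[{QUOTE}]\)"
)

# Hints taken from the deck's own guidance for the two named resource types.
HINTS = {
    "Physical Disk": "look for storage issues",
    "Network Name": "look for networking issues",
}


def parse_event_1230(message):
    """Extract name, type, and dll from an Event 1230 message, or return None."""
    match = EVENT_1230_RE.search(message)
    return match.groupdict() if match else None


def next_step(message):
    """Suggest where to look next based on the resource type in the event."""
    info = parse_event_1230(message)
    if info is None:
        return None
    return HINTS.get(info["type"], "look for underlying core failures / events")
```

A resource name that itself contains an apostrophe would defeat this simple pattern, so treat it as a starting point for triage rather than a complete parser.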
Do normal troubleshooting!
• What was the resource trying to do? See http://support.microsoft.com/kb/914458
• Look for underlying core failures / events
  • Physical Disk… look for storage issues
  • Network Name… look for networking issues
See these blogs for more details:
http://blogs.technet.com/askcore/archive/2009/11/23/resource-hosting-subsystem-rhs-in-windows-server-2008-failover-clusters.aspx
http://blogs.msdn.com/clustering/archive/2009/06/27/9806160.aspx
Ensuring Graceful Shutdown Tip
When you click Start / Shutdown…
• Service Control Manager issues a STOP to all services
• Services are given 20 seconds to stop
• Services are terminated if they do not complete in time
Stopping a large number of VMs could take longer than 20 seconds…
• ClusSvc will notify SCM in R2
• If Win2008, either:
  a) Offline / Move all groups prior to shutdown
  b) Manually modify SCM’s timeout
     HKLM\SYSTEM\CurrentControlSet\Control\WaitToKillServiceTimeout
     http://technet.microsoft.com/en-us/library/cc976045.aspx
Cluster AD Considerations
• All Cluster Network Name resources now have associated computer objects
  • RequireKerberos is now on by default (and required)
• All nodes must be members of the same domain
  • Must be an Active Directory based domain
• Computer objects for cluster names are handled just like objects for normal machines
  • Passwords now rotated
Cluster Name Object (CNO)
• CNO – The computer object associated with the Cluster Name
  • It’s special…
• When creating a cluster, you must…
  • Be logged on locally with a domain user account
  • Have administrative privileges on all nodes
  • Have “Create Computer Objects” privileges in the default Computers container
    • Or have Full Control permission to an existing disabled computer object
Virtual Computer Object (VCO)
• VCO – The computer object associated with Network Name resources for services and applications
• Once a cluster is created, the CNO is the security context used moving forward
  • The CNO creates the VCO
  • Service is completely self managing
• CNO requires privileges to the default Computers container
  • Create Computer Objects
  • Add Workstations to a domain
    • Watch out, there is a default quota of 10!
Creating Computer Objects
• Installer creates CNO
• CNO creates VCOs
[Diagram labels: Installer, Active Directory, Computers, CNO, VCO, VCO, Custom OU]
Session Summary
• Validate is a pre-deployment verifier, criterion for supportability, and also a diagnostic tool
• New support policy for Win2008 and beyond
• Failover Cluster Management tool provides best mechanism to view cluster events
• Cluster logging infrastructure is new, but it delivers what you are used to
• Windows Server 2008 R2 enhances Validate tool, adds Events and perf counters, and introduces PowerShell cmdlets
• Follow best practices to keep your cluster running smoothly
Passion for High Availability?
Are You Up For a Challenge?
Become a Cluster MVP!
Contact: [email protected]
Related Content
Breakout Sessions
• WSV313 | Failover Clustering Deployment Success
• WSV314 | Failover Clustering Pro Troubleshooting with Windows Server 2008 R2
• VIR303 | Disaster Recovery by Stretching Hyper-V Clusters across Sites
• ARC308 | High Availability: A Contrarian View
• DAT207 | SQL Server High Availability: Overview, Considerations, and Solution Guidance
• DAT303 | Architecting and Using Microsoft SQL Server Availability Technologies in a Virtualized World
• DAT305 | See the Largest Mission Critical Deployment of Microsoft SQL Server around the World
• DAT401 | High Availability and Disaster Recovery: Best Practices for Customer Deployments
• DAT407 | Windows Server 2008 R2 and Microsoft SQL Server 2008: Failover Clustering Implementations
• UNC304 | Microsoft Exchange Server 2010: High Availability Deep Dive
• UNC305 | Microsoft Exchange Server 2010 High Availability Design Considerations
Interactive Sessions
• VIR06-INT | Failover Clustering with Hyper-V Unleashed with Windows Server 2008 R2
• UNC01-INT | Real-World Database Availability Group (DAG) Design
• VIR02-INT | Hyper-V Live Migration over Distance: A Multi-Datacenter Approach
• BOF34-IT | Microsoft Exchange Server High Availability and Disaster Recovery: Are You Prepared?
Hands-on Labs
• WSV01-HOL | Failover Clustering in Windows Server 2008 R2
• DAT01-HOL | Create a Two-Node Windows Server 2008 R2 Failover Cluster
• DAT02-HOL | Create a Windows Server 2008 R2 MSDTC Cluster
• DAT09-HOL | Installing a Microsoft SQL Server 2008 + SP1 Clustered Instance
• DAT12-HOL | Maintaining a Microsoft SQL Server 2008 Failover Cluster
• UNC02-HOL | Microsoft Exchange Server 2010 High Availability and Storage Scenarios
• VIR06-HOL | Implementing High Availability and Live Migration with Windows Server 2008 R2 Hyper-V
Visit the Cluster Team in the TLC
Failover Clustering Booth
WSV-7
Failover Clustering Resources
Cluster Team Blog: http://blogs.msdn.com/clustering/
Cluster Resources: http://blogs.msdn.com/clustering/archive/2009/08/21/9878286.aspx
Cluster Information Portal: http://www.microsoft.com/windowsserver2008/en/us/clustering-home.aspx
Clustering Technical Resources: http://www.microsoft.com/windowsserver2008/en/us/clustering-resources.aspx
Clustering Forum (2008): http://forums.technet.microsoft.com/en-US/winserverClustering/threads/
Clustering Forum (2008 R2): http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2highavailability/threads/
R2 Cluster Features: http://technet.microsoft.com/en-us/library/dd443539.aspx
© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
JUNE 7-10, 2010 | NEW ORLEANS, LA