Monitoring and Tuning Microsoft Exchange Server 2013 Performance
Jeff Mealiffe
Principal Program Manager
Microsoft Corporation
OFC-B321
Agenda
Optimize: sizing, platform recommendations, storage best practices, balance
Troubleshoot: high-level plan, high CPU scenarios, excessive memory utilization, storage perf analysis
Trend: collecting perf data, trending using BI & analysis tools, building a scorecard, rhythm of planning
Q&A
3 goals for this session:
Present best practices for optimizing Exchange performance
Provide tools/methods for diagnosing Exchange perf issues
Suggest methods for trending utilization, predicting capacity requirements
http://aka.ms/tena2014experf
Optimize
Size matters. Really.
Most critical perf optimization is sizing
Lots of guidance available from Microsoft & partners
If in doubt, round up
Plan for growth in all dimensions
Avoid reliance on customization to achieve desired perf targets – set up the deployment for success
http://aka.ms/E2013Sizing-MEC
http://aka.ms/E2013Sizing
Luke, use the calculator
No really, use it.
Role Requirements Calculator turns published sizing guidance into a modeling tool
Try out various failure scenarios
Understand the impact of different hardware & storage choices
Provides scripts for DAG, database & copy creation
Many new features:
CAS sizing
Transport storage sizing
Multiple databases per volume (JBOD) support
High availability architecture improvements
http://aka.ms/E2013Calc
Note: Baseline platform for CPU guidance changed in 2013. Don’t directly compare results from 2010 & 2013 calculators.
Power considerations
Balance between energy efficiency and throughput
Power used by CPUs, memory, disks, etc.
Large slow drives help to conserve power
Avoid overdeploying CPU & memory, but don’t skimp
In general, use “High Performance” power plan for Exchange (highest perf state, all cores unparked), or use “Balanced” if required for power efficiency
Make sure BIOS is set to allow the OS to manage power
Hyperthreading & Exchange 2013
Turn off hyperthreading (SMT)!
SMT provides a gain in processor throughput, but overall the gain is not worth the “cost” based on our lab measurements
Significant impact to some Exchange service memory footprints
Tuning .NET for store
Best practice to install KB 2803754 or KB 2803755:
http://support.microsoft.com/kb/2803754 (Windows Server 2008 R2)
http://support.microsoft.com/kb/2803755 (Windows Server 2012)
Set regkey after installing hotfix to enable: HKLM\Software\Microsoft\.NETFramework\DisableRetStructPinning (REG_DWORD) = 1
Also install .NET 4.5.1 for other performance improvements
No further updates required for Windows Server 2012 R2 (fix included in 4.5.1 on 2012 R2)
Reduces memory consumption in each store worker
No impact to sizing guidance
Memory is available for use by other processes
Decreases CPU spent in .NET garbage collector
Benefits Mailbox & multi-role
Network optimizations
Simplicity is key – multiple NICs generally don’t make sense for Exchange 2013
NIC teaming doesn’t apply
http://aka.ms/preferred
Use offload features, particularly including RSS
RSS helps to scale CPU utilization, particularly on 10GbE ports
Don’t bother adjusting specific offload settings, use defaults
Use latest (supported) version of Windows
Tuning IIS… or not
Various tuning parameters available for HTTP.sys and IIS usermode components
In general, Exchange doesn’t benefit
Exchange does tune a few connection limits at install
Keep the defaults in place
Particularly true on latest OS – Windows continues to improve OOB tuning
Pagefile
Guidance for Exchange 2013 is to use a fixed-size pagefile of RAM + 10MB, capped at 32778MB
Use a fixed-size pagefile to avoid having to grow the file under load
Likely won’t be large enough to capture a full kernel dump
Minidumps may be enough to diagnose problems
A dedicated dump file can be used – KB 969028, see DedicatedDumpFile regkey settings
MIN(RAM+10MB,32778MB)
\Memory\% Committed Bytes In Use < 80
\Memory\Available Mbytes > 5% of RAM
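The pagefile rule above can be expressed as a quick calculation; a minimal sketch in Python (the function name and sample RAM sizes are illustrative, while the RAM + 10MB formula and 32778MB cap come from the slide):

```python
def pagefile_size_mb(ram_mb: int) -> int:
    """Fixed pagefile size per the Exchange 2013 guidance:
    RAM + 10 MB, capped at 32778 MB."""
    return min(ram_mb + 10, 32778)

# A 16 GB server gets RAM + 10 MB; a 96 GB server hits the cap.
print(pagefile_size_mb(16 * 1024))  # 16394
print(pagefile_size_mb(96 * 1024))  # 32778
```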
Storage tiering
2 types of tiering to consider: manual & automatic
Manual tiering = databases placed on specific types of different storage
Automatic tiering = storage controller moves files or hottest blocks to “faster” storage type
Not recommended for Exchange
Jetstress false-positives can be very misleading
Make sure you are testing all capacity simultaneously w/Jetstress
Very hard to predict true hot blocks for Exchange
Consider the goal of simplicity, and think about slow/large disks for all Exchange storage
DAS storage cache policies
On multi-role or Mailbox role servers, configure DAS storage controllers for 100% write cache
Mailbox I/O patterns include lots of random reads, resulting in low cache hit rates
Where predictions can be made, Exchange reads ahead into the ESE cache
Other roles (CAS, AD) may benefit from some read cache on the storage controller – consider 25% write/75% read
SSL offload
SSL offload is back in Exchange 2013 SP1
Can be used to offload SSL-related processing to a load balancer or reverse proxy
Some devices have specialized hardware for SSL processing; makes sense to take advantage
Expect some reduction in CAS CPU consumption which will scale w/request rate
http://aka.ms/e2013ssloffload
Virtualization best practices
Generally, never overcommit any resource
Absolutely do not use memory overcommit
Major cause of perf problems on virtualized Exchange
Hyperthreading is OK, but size based on physical processor cores
Never oversubscribe CPU; CPU constraints cause pain:
Delivery throughput reduction = queue growth
Content indexing throughput reduction = increased IOPS
Store ROP processing throughput reduction = RPC latency & end-user pain
Size using guidance for physical, add CPU overhead
Exchange isn’t NUMA aware, use NUMA defaults
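The “size for physical, then add CPU overhead” step above can be sketched as follows; the 10% hypervisor overhead figure is an illustrative assumption, not a number from this session:

```python
import math

def vcpu_requirement(physical_core_need: float,
                     hypervisor_overhead: float = 0.10) -> int:
    """Size the VM as if it were physical, then add hypervisor CPU overhead.
    The default 10% overhead is an illustrative assumption."""
    return math.ceil(physical_core_need * (1 + hypervisor_overhead))

# A workload needing 11 physical cores would be sized at 13 vCPUs here.
print(vcpu_requirement(11))  # 13
```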
AD query performance
AD bottlenecks are a frequent cause of Exchange perf issues
Watch for high query latencies on Exchange servers & high CPU on AD
Use the built-in “Active Directory Diagnostics” data collector set to troubleshoot
Plan to deploy enough RAM on AD servers to cache the entire database file
Follow deployment guidance for core ratio (also in calc)
\MSExchange ADAccess Domain Controllers(*)\LDAP Search Time
\Processor(_Total)\% Processor Time
NTLM auth and MaxConcurrentAPI
NTLM auth has a limited set of worker threads that service requests
Can result in MaxConcurrentAPI bottlenecks
Symptoms are typically client logon delays & timeouts
Often related to other issues which cause high auth velocity
http://aka.ms/maxconcurrentapi
\Netlogon\Semaphore Waiters
\Netlogon\Semaphore Holders
\Netlogon\Semaphore Acquires
\Netlogon\Semaphore Timeouts
\Netlogon\Average Semaphore Hold Time
Achieving balance for deployment health
Optimal hardware utilization
Better response to failures
More predictable user experience
Balance at CAS layer
Load balancing handles traffic distribution across CAS components
Select a solution that is application aware for high availability
Select a traffic distribution policy carefully – least connections may cause issues during outages or maintenance; consider round-robin
Goal is approximately equal spread of inbound client requests across CAS role servers
\Web Service(Default Web Site)\Current Connections
Balance within DAG
Always aim for well-balanced, equal distribution of active copies
Utilize the DB copy ActivationPreference and MaximumPreferredActiveDatabases parameters to ensure *-over maintains balance
Consider regular utilization of RedistributeActiveDatabases.ps1 to maintain balance
During localized high load events, redistribute active copies to migrate load
\MSExchange Active Manager(_total)\Database Mounted
Balance within databases
Always aim for equal utilization of databases (space & activity)
Spread out heavy and light users across databases
Can be based on heuristics around job role, or statistics like message send/receive rates
Monitor space utilization and rebalance databases regularly via mailbox moves
Don’t forget about whitespace when evaluating available DB space
Harder to balance based on usage
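One simple way to quantify the balance goal above is the ratio of the busiest to quietest database; a sketch using hypothetical per-database message rates (names and numbers are illustrative):

```python
# Hypothetical daily message rates gathered from send/receive statistics.
db_message_rates = {"DB01": 95_000, "DB02": 61_000, "DB03": 140_000}

def imbalance_ratio(rates: dict) -> float:
    """Busiest database divided by quietest; a value near 1.0 means load
    is well spread, a large value suggests rebalancing via mailbox moves."""
    values = list(rates.values())
    return max(values) / min(values)

print(round(imbalance_ratio(db_message_rates), 2))  # 2.3
```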
Balance within mailboxes
Exchange 2013 improved server-side perf for high item count folders
Legacy clients still have issues with high item counts, which can lead to significant performance impact
Look at Office 365 limits as best-practice maximums; generally aim for more folders with fewer messages
http://aka.ms/ExOnlineLimits
Set-Mailbox can enforce mailbox shaping quotas in the same way as Office 365
Multi-role: just do it
Very few reasons not to consider multi-role (Mailbox+CAS) deployment
Multi-role simplifies deployment, can reduce server count
Benefit of increased availability at the CAS layer
Issues remain with Windows NLB + DAG (WSFC)
Certificate management may be a concern
Considerations of scale: up or out?
How big is too big?
Design for scale-out, not scale-up
Better alignment with intentions & design points of PG
Ideally focus on “commodity” 2U servers as a platform to help minimize deployment risk
We don’t push the “top end” today – and don’t want you to either
Troubleshoot
Where do we begin?
Two things to define:
What are the specific symptoms (so we understand success criteria)
What resources are impacted
Once we understand resource impact, try to isolate cause
Start high level, work our way down
Prioritize service availability – collect relevant data and quickly attempt to restart/rebalance/reboot
High CPU issues
Where is CPU going?
Kernel vs. usermode (driver or HW issues?)
Specific process(es)
Generally high utilization
Processor counters will show mix of kernel & usermode consumption
If most consumption is in kernel (privileged), likely due to driver or HW problems
For usermode CPU, look at Process object counters to figure out process(es) consuming time
\Processor(_Total)\% Privileged Time
\Processor(_Total)\% User Time
\Process(*)\% Privileged Time
High CPU issues
Determine if high CPU is load related
Helps to have a baseline
Look at RPC ops/sec and message traffic counters to indicate overall workload transaction rate
If load related, consider moving active copies to temporarily rebalance load
Prioritize availability
Take multiple process dumps of impacted processes at intervals, then restart impacted services.
Workload Management may run “background” tasks during off-peak times
Can result in higher than expected utilization
\MSExchange WorkloadManagement Workloads(*)\ActiveTasks
\MSExchange WorkloadManagement Workloads(*)\CompletedTasks
\MSExchange WorkloadManagement Workloads(*)\QueuedTasks
Memory utilization issues
Vast majority of Exchange components allocate via the .NET CLR (managed memory)
Pay attention to key CLR garbage collection metrics
Monitor working sets
Working set trimming can cause significant pain
\.NET CLR Memory(*)\% Time in GC < 10
\Memory\% Committed Bytes In Use < 80
\Memory\Available Mbytes > 5% of RAM
\.NET CLR Memory(*)\# Bytes in all Heaps
\Process(*)\Private Bytes
\Process(*)\Working Set
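The memory thresholds above lend themselves to a simple automated check; a minimal sketch (the counter values passed in are illustrative samples, not live counter reads):

```python
def memory_health_violations(committed_pct: float, available_mb: float,
                             ram_mb: float) -> list:
    """Apply the thresholds above: % Committed Bytes In Use should stay
    below 80, and Available MBytes should exceed 5% of installed RAM."""
    problems = []
    if committed_pct >= 80:
        problems.append("% Committed Bytes In Use >= 80")
    if available_mb <= 0.05 * ram_mb:
        problems.append("Available MBytes <= 5% of RAM")
    return problems

# Healthy sample: 70% committed, 8 GB free on a 96 GB server.
print(memory_health_violations(70.0, 8192, 96 * 1024))  # []
```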
Storage performance analysis
Latencies are key for Exchange
Can monitor via PhysicalDisk/LogicalDisk counters, but better to monitor via ESE counters
\MSExchange Database ==> Instances(*)\I/O Database Reads (Attached) Average Latency < 20ms
\MSExchange Database ==> Instances(*)\I/O Database Writes (Attached) Average Latency < 50ms
\MSExchange Database ==> Instances(*)\I/O Log Writes Average Latency < 10ms
\MSExchange Database ==> Instances(*)\I/O Database Reads (Recovery) Average Latency < 200ms
\MSExchange Database ==> Instances(*)\I/O Database Writes (Recovery) Average Latency < read latency for the same instance
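The latency targets above can be captured as a small table and checked against observed samples; a sketch where the key names are invented shorthand for the counters, and the relative recovery-write target is omitted for simplicity:

```python
# Targets in milliseconds, from the ESE counter list above.
LATENCY_TARGETS_MS = {
    "db_read_attached": 20,
    "db_write_attached": 50,
    "log_write": 10,
    "db_read_recovery": 200,
}

def latency_violations(samples_ms: dict) -> list:
    """Return the counters whose observed average latency exceeds its target."""
    return [name for name, limit in LATENCY_TARGETS_MS.items()
            if samples_ms.get(name, 0) > limit]

print(latency_violations({"db_read_attached": 35, "log_write": 5}))  # ['db_read_attached']
```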
Storage performance analysis
High latencies can have many potential causes:
Disk health
Array health
High volume of random I/O (seek times)
High volume of sequential I/O (bandwidth)
Controller performance (cache behavior?)
Bus/connectivity issues
Storage stack, incl. filter drivers
Look for higher than expected I/O
\MSExchange Database ==> Instances(*)\I/O Database Reads (Attached)/sec
\MSExchange Database ==> Instances(*)\I/O Database Writes (Attached)/sec
\MSExchange Database ==> Instances(*)\I/O Log Writes/sec
Storage performance analysis
Use Sysinternals Process Monitor to watch for unexpected I/O to Exchange volumes
Filter on Exchange volume paths, add the Duration column
http://aka.ms/procmon
Check array & controller health (RAID rebuild?)
Check components that include storage filter drivers (like AV utilities)
One counter to rule them all
Vast majority of user-facing perf issues are reflected in RPC Average Latency
Good “canary” to determine if a particular server is having problems
\MSExchangeIS Store(*)\RPC Average Latency < 100ms
\MSExchangeIS Client Type(*)\RPC Average Latency < 100ms
\MSExchangeIS Store(*)\RPC Operations/sec
\MSExchangeIS Client Type(*)\RPC Operations/sec
If in doubt…
Open a support case with Microsoft
Well-established escalation path into the product group
Very deep perf analysis skills, access to internal tools
Trend
Collecting perf data
Exchange Diagnostics Service (EDS) captures relevant perf counters to the DailyPerformanceLogs directory
Collects up to 5GB of logs
Automatically purges old logs to stay under the space quota
Quota can be adjusted by changing a line in Microsoft.Exchange.Diagnostics.Service.exe.config (parameter defines MB of space quota):
<add Name="DailyPerformanceLogs" LogDataLoss="True" MaxSize="5120" MaxSizeDatacenter="2048" />
Logs can be easily loaded into SQL to build a data warehouse of perf history
May need to consider adding indexes, or using Analysis Services, to help with query perf on large datasets
Loading data into SQL
On each machine, regularly copy logs off and keep track of what has been loaded into SQL
The Windows relog tool handles the upload process:
Create an ODBC DSN that points to a SQL instance
Call relog against each log to upload:
Relog.exe filename -f SQL -o SQL:dsn_name!log_set_name
http://aka.ms/relog for details on relog options
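The copy-and-upload loop described above might be scripted like this; a sketch where the log directory, DSN name, and ledger file are hypothetical, and only the command construction is fixed by the relog syntax on the slide:

```python
import subprocess
from pathlib import Path

# Hypothetical locations; adjust for your environment.
LOG_DIR = Path(r"D:\DailyPerformanceLogs")
LEDGER = Path("uploaded_logs.txt")  # tracks which logs are already in SQL

def relog_command(log_path, dsn: str, log_set: str) -> list:
    """Build the upload command from the slide:
    Relog.exe filename -f SQL -o SQL:dsn_name!log_set_name"""
    return ["relog", str(log_path), "-f", "SQL", "-o", f"SQL:{dsn}!{log_set}"]

def upload_new_logs(dsn: str = "ExchangePerf", log_set: str = "EXCH01") -> None:
    """Upload any .blg files not yet recorded in the ledger."""
    done = set(LEDGER.read_text().splitlines()) if LEDGER.exists() else set()
    for blg in sorted(LOG_DIR.glob("*.blg")):
        if blg.name in done:
            continue
        subprocess.run(relog_command(blg, dsn, log_set), check=True)
        with LEDGER.open("a") as f:
            f.write(blg.name + "\n")
```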
Basic analysis
Once data is in SQL, perfmon can utilize the SQL data source
Additionally, Excel and SQL Reporting Services can be used to query or report
Excel is fantastic for building trend reports against perf data
Build a scorecard
A perf/capacity scorecard provides focus on a high-level view of the metrics that matter
Pick key health indicators, for example:
CPU (average at peak)
Storage utilization
Mailflow
Client connections for key client types
Set targets – both for health and for capacity triggers
Report out on a regular schedule – operational rigor
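For a metric like “CPU (average at peak)”, the scorecard calculation might look like this sketch (the peak-hour window and sample data are illustrative assumptions):

```python
def peak_average(samples, peak_hours=range(9, 18)) -> float:
    """Average a counter over peak business hours only.
    `samples` is a list of (hour_of_day, value) pairs."""
    peak = [value for hour, value in samples if hour in peak_hours]
    return sum(peak) / len(peak) if peak else 0.0

# Illustrative hourly CPU averages for one server.
cpu_samples = [(8, 35.0), (10, 62.0), (14, 70.0), (20, 30.0)]
print(peak_average(cpu_samples))  # 66.0
```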
Summary
Proper sizing is critical for a healthy deployment
Seek balance for optimal utilization
Multi-role is the goal
Look at key resource utilization counters to troubleshoot perf
Baselines are very helpful
Collect perf data & produce trend-based reporting: no surprises!
Q & A
[email protected]
http://aka.ms/tena2014experf
Breakout Sessions
DCIM-B415 – Microsoft System Center 2012 R2 Operations Manager: Mastering Historical Monitoring Data
DEV-B335 – Using the Cloud-Based Load Testing Service and Application Insights to Find Scale and Performance Bottlenecks in Your Applications
WIN-B413 – Windows Performance Deep Dive Troubleshooting
DCIM-B360 – Key Metric, Performance, and Capacity Monitoring Using Microsoft System Center 2012 R2 Operations Manager
Related content
Labs
WIN-IL301 – Resolving Windows Performance Issues without Opening a Support Case
WIN-IL302 – Windows 8.1 Managed Memory Debugging Using WinDBG
Microsoft Solutions Experience Location (MSE)Office Servers and Services (Go Deploy)
Find Me Later At…
MSE Office Servers and Services (Go Deploy): Thursday 10:45 AM – 12:15 PM
Resources
Learning
Microsoft Certification & Training Resources
www.microsoft.com/learning
msdn
Resources for Developers
http://microsoft.com/msdn
TechNet
Resources for IT Professionals
http://microsoft.com/technet
Sessions on Demand
http://channel9.msdn.com/Events/TechEd
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.