Pascal Benois Performance Troubleshooting and Optimization #SPSBE #SPSBE18

DESCRIPTION

Where to start? - the first 2 hours of performance troubleshooting • The performance cheat sheet: cover all the basics before you start • Data collection and mining the logs • Common techniques to improve performance

Page 1: Pascal benois performance_troubleshooting-spsbe18

Pascal Benois

Performance Troubleshooting and Optimization

#SPSBE

#SPSBE18

Page 2: Pascal benois performance_troubleshooting-spsbe18

About me

• Microsoft Premier Field Engineer

• Into SharePoint for ages

• Psychobilly enthusiast

• Eddy Merckx fanatic

Page 3: Pascal benois performance_troubleshooting-spsbe18

A big thanks to our sponsors

Venue Sponsor

Platinum Sponsors

Gold Premium Sponsors

Gold Sponsors

Page 4: Pascal benois performance_troubleshooting-spsbe18

Agenda

• The first minutes

• Common scenarios

• Should I virtualize?

• HW considerations

• SQL server considerations

• Memory leaks for the admin (couldn’t prevent myself)

• Caching

Page 5: Pascal benois performance_troubleshooting-spsbe18

Agenda

• So… we are not gonna make it!

Page 6: Pascal benois performance_troubleshooting-spsbe18

QUESTIONS TO ASK

• Where is the bottleneck?

• Are all pages/sites/Web applications/servers affected?

• Any strange patterns?

• Is the issue intermittent?

• Does the issue occur for a subset of users?

• Any errors or unexpected status codes in any of the logs?

• Are there any customizations in place?

• Have any software boundaries and limits been breached?

• What does analysis of common performance counters show?

Page 7: Pascal benois performance_troubleshooting-spsbe18

COMMON SCENARIOS – SLOW PAGE LOAD

• Issue: A single page is always slow to load; no other pages in the site are slow

• Likely causes:

• Poor custom code/customizations

• Page payload is large or has multiple round-trips

• A custom Web part is performing badly

• Operations involving large lists (most likely throttled)

• Caching is not working correctly for content served on the page

Page 8: Pascal benois performance_troubleshooting-spsbe18

COMMON SCENARIOS – SLOW PAGE LOAD

• Recommended tools:

• Fiddler

• IIS Logs and Log Parser

• Usage Database

• Developer Dashboard

• SPDisposeCheck
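As a rough illustration of what mining the IIS logs for a slow page can look like, here is a minimal Python sketch that ranks URLs by average time-taken. It assumes W3C-format logs in which the cs-uri-stem and time-taken (milliseconds) fields are enabled; the function name and the sample log lines are made up for this example, and Log Parser would normally do the same job with a query.

```python
from collections import defaultdict


def slowest_urls(lines, top=3):
    """Return (url, average time-taken in ms, hits) tuples, slowest first."""
    fields = []
    totals = defaultdict(lambda: [0, 0])  # url -> [sum of ms, hit count]
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive lists the column names used by the rows below it.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        if "cs-uri-stem" not in row or "time-taken" not in row:
            continue
        entry = totals[row["cs-uri-stem"]]
        entry[0] += int(row["time-taken"])
        entry[1] += 1
    ranked = sorted(
        ((url, total / hits, hits) for url, (total, hits) in totals.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top]


# Hypothetical sample data, trimmed to a handful of fields.
sample_log = [
    "#Fields: date time cs-method cs-uri-stem sc-status time-taken",
    "2018-04-21 09:00:01 GET /Pages/default.aspx 200 120",
    "2018-04-21 09:00:02 GET /Pages/report.aspx 200 4200",
    "2018-04-21 09:00:05 GET /Pages/report.aspx 200 3800",
    "2018-04-21 09:00:07 GET /Pages/default.aspx 200 80",
]

for url, avg_ms, hits in slowest_urls(sample_log):
    print(f"{url}: {avg_ms:.0f} ms average over {hits} requests")
```

If one URL stands out while the rest of the site stays fast, that points toward the single-page causes listed earlier rather than a farm-wide problem.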

Page 9: Pascal benois performance_troubleshooting-spsbe18

COMMON SCENARIOS – SLOW PAGE LOAD

• Issue: Multiple pages are slow to load but the issue is intermittent

• Likely causes:

• Poor custom code/customizations

• Page payload is large or has multiple round-trips

• A custom Web part is performing badly

• Operations involving large lists (most likely throttled)

• Caching is not working correctly

• Load balancer device incorrectly configured or a WFE is experiencing problems

• Load on WFEs is too high (could be NIC, CPU, memory etc.)

Page 10: Pascal benois performance_troubleshooting-spsbe18

COMMON SCENARIOS – SLOW PAGE LOAD

• Recommended tools:

• Fiddler

• IIS Logs and Log Parser

• Usage Database

• Developer Dashboard

• SPDisposeCheck

• Performance Monitor

• PAL

Page 11: Pascal benois performance_troubleshooting-spsbe18

COMMON SCENARIOS – SLOW SITE

• Issue: A single site is consistently slow

• Likely causes:

• Poor custom code/customizations

• Page payload is large or has multiple round-trips

• A custom Web part is performing badly

• Caching is not working correctly

Page 12: Pascal benois performance_troubleshooting-spsbe18

COMMON SCENARIOS – MULTIPLE SLOW SITES

• Issue: Multiple sites are consistently slow

• Likely causes:

• Poor custom code

• Web Application/Farm scoped customizations

• Caching is not working correctly

• SQL Server blocking due to large lists/databases

• Load balancer device incorrectly configured or a WFE is experiencing problems

• Load on WFEs is too high (could be NIC, CPU, memory etc.)

Page 13: Pascal benois performance_troubleshooting-spsbe18

WHICH SHAREPOINT ROLE SHOULD I VIRTUALIZE?

Page 14: Pascal benois performance_troubleshooting-spsbe18

WEB ROLE

• Responsible for rendering of content

• Low amount of disk activity

• Multiple web role servers are common for redundancy and scalability

• Best practices

• Keep all components, applications, and patch levels the same

• Network Load Balancing (NLB)

• Hardware -> Offload NLB to dedicated resources

• Software -> Consumes CPU and network on the WFE

• For minimum availability, split your load-balanced virtual web servers over two physical hosts

Page 15: Pascal benois performance_troubleshooting-spsbe18

QUERY ROLE

• Processes search queries

• Requires a propagated copy of the index

• 10%-30% of the total size of the documents indexed

• Best practices

• Large indexes: prefer a dedicated physical LUN on a SAN over a dynamically expanding virtual hard disk

• Don’t put your query and index servers on the same underlying physical disk

• Combine or split the Web/Query role?

• It depends on your environment

• Web and Query performance requirements
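The 10%-30% rule of thumb for the propagated index translates into a simple disk-sizing estimate. The sketch below is only that arithmetic; the 500 GB corpus size is a made-up input, and the function name is hypothetical.

```python
def index_size_range_gb(corpus_gb, low_pct=10, high_pct=30):
    """Estimate the propagated index size as a share of the indexed corpus."""
    return corpus_gb * low_pct / 100, corpus_gb * high_pct / 100


# Hypothetical corpus of 500 GB of indexed documents.
low_gb, high_gb = index_size_range_gb(500)
print(f"Plan for roughly {low_gb:.0f} to {high_gb:.0f} GB of index on each query server")
```

Sizing the LUN for the upper bound leaves headroom as the corpus grows.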

Page 16: Pascal benois performance_troubleshooting-spsbe18

INDEX ROLE

• Memory, CPU, disk I/O and network intensive

• Best practices

• Give the index server the most RAM of all the front-end servers

• Potentially keep as physical machine in larger environments

• Make the index server its own dedicated crawl server; this avoids an extra network hop

• Use fixed-size VHDs or physical LUN on iSCSI SAN for best performance

Page 17: Pascal benois performance_troubleshooting-spsbe18

OTHER ROLES

• Excel Services, PerformancePoint Services, Access Services, Visio Services, etc. are good candidates for virtualization

• Additional servers can simply be added into the farm

• No additional hardware investment required

Page 18: Pascal benois performance_troubleshooting-spsbe18

DATABASE ROLE

• SQL Server 2005/2008 virtualization fully supported

• Memory, CPU, Disk I/O and network intensive

• Assess first using Microsoft Assessment and Planning Toolkit (www.microsoft.com/map).

• SQL Alias flexibility

• Argument for Physical:

• SQL Server is already a consolidation layer

• Disk I/O activity

• Performance, performance, performance!

• Longer response times impact ALL downstream roles in a SharePoint farm

Page 19: Pascal benois performance_troubleshooting-spsbe18

DATABASE ROLE

• If you decide to virtualize the database layer:

• Assign as much RAM and CPU as possible

• Offload the Disk I/O from the virtual machines

• Use fixed-size VHDs or physical LUN on an iSCSI SAN

• SQL Clustering: When virtualizing, consider making use of Guest Clustering in Hyper-V

• SQL Database Mirroring: Fully supported in SharePoint 2010 in physical or virtual database role environments

Page 20: Pascal benois performance_troubleshooting-spsbe18

CPU BEST PRACTICES

PHYSICAL

• Performance is governed by processor efficiency, power draw and heat output

• Faster versus efficient processor – hidden power consumption cost

• Beware of built in processor software such as performance throttle for thermal thresholds

• Prefer higher number of processors and multi core

• Prefer PCI Express to limit bus contention & CPU utilization

Page 21: Pascal benois performance_troubleshooting-spsbe18

CPU BEST PRACTICES

VIRTUAL

• Configure a 1-to-1 mapping of virtual CPU to physical CPU for best performance

• Be aware of the virtual processor limit for different guest operating systems and plan accordingly

• Beware of “CPU bound” issues: the ability of processors to process information for virtual devices determines the maximum throughput of such a virtual device. Example: virtual NICs

Page 22: Pascal benois performance_troubleshooting-spsbe18

DISK BEST PRACTICES

PHYSICAL

• Ensure you are using the fastest SAN infrastructure: attempt to provide each virtual machine with its own I/O channel to shared storage using dual- or quad-ported HBAs and Gigabit Ethernet adapters.

• Use iSCSI SANs if considering guest clustering

• Ensure your disk infrastructure is as fast as it can be. (RAID 10; 15000 RPM) – Slow disk causes CPU contention as Disk I/O takes longer to return data.

• Put virtual hard disks on different physical disks than the hard disk that the host operating system uses

Page 23: Pascal benois performance_troubleshooting-spsbe18

DISK BEST PRACTICES

VIRTUAL

• Prefer a SCSI controller to an IDE controller

• Prefer fixed-size to dynamically expanding disks

• Prefer direct iSCSI SAN access for disk-bound roles

• Beware of underlying disk read/write contention between different virtual machines and their virtual hard disks

• Ensure the SAN is configured and optimized for virtual disk storage. Understand that a number of LUNs can be provisioned on the same underlying physical disks

Page 24: Pascal benois performance_troubleshooting-spsbe18

NETWORK BEST PRACTICES

PHYSICAL

• Use Gigabit Ethernet adapters and Gigabit switches

• Increasing network capacity – Add a number of NICs to host.

Page 25: Pascal benois performance_troubleshooting-spsbe18

NETWORK BEST PRACTICES

VIRTUAL

• Ensure that integration components (“enlightenments”) are installed on the virtual machine

• Use the Network Adapter instead of the Legacy Network Adapter when configuring networking for a virtual machine

• Prefer synthetic to emulated drivers: they are more efficient, use a dedicated VMBus to communicate with the virtual NIC, and result in lower CPU usage and network latency.

• Use virtual switches and VLAN tagging for security and performance improvement, and create an internal network between virtual machines in your SharePoint farm. Associate SharePoint VMs to the same virtual switch.

Page 26: Pascal benois performance_troubleshooting-spsbe18

IMPORTANT

• Understand the impact of your virtualization vendor feature set!

• Don’t let governance slip in your virtualized SharePoint environment

• Snapshots are not supported

• Beware of oversubscribing host servers

• Do not exceed physical server RAM by more than 15% if using Hyper-V’s dynamic memory

• Host is a single point of failure
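The 15% guideline for dynamic memory can be checked with a few lines of arithmetic. The sketch below assumes the guideline compares the sum of the maximum memory assigned to all VMs against the host's physical RAM; that reading, the function name, and the sample VM sizes are assumptions made for illustration.

```python
def memory_oversubscription(host_ram_gb, vm_max_ram_gb, limit_pct=15):
    """Return (oversubscription in percent, True if within the limit).

    Assumes oversubscription is measured as the total maximum memory of all
    VMs relative to the physical RAM of the host.
    """
    assigned = sum(vm_max_ram_gb)
    over_pct = (assigned - host_ram_gb) / host_ram_gb * 100
    return over_pct, over_pct <= limit_pct


# Hypothetical 64 GB host with four SharePoint VMs.
over_pct, ok = memory_oversubscription(64, [16, 16, 24, 16])
print(f"Oversubscribed by {over_pct:.1f}%: {'within' if ok else 'over'} the 15% guideline")
```

A negative percentage simply means the host still has unassigned RAM.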

Page 27: Pascal benois performance_troubleshooting-spsbe18

SQL SERVER CONFIGURATION

• Little or no configuration of SQL Server is a common problem that causes performance issues

• Optimize performance by:

• Pre-growing data files

• Setting growth factor to a fixed value not a percentage

• Optimizing storage configuration and RAID levels for databases

• Planning the number of data files to allocate for tempdb and content databases

• Providing a dedicated VLAN for SharePoint to SQL Server communications

• Setting max degree of parallelism (MAXDOP) to 1

• Providing additional SQL Server instances or servers
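To make the pre-grow, fixed-growth, and MAXDOP items more concrete, here is a small Python sketch that only builds the corresponding T-SQL text; it does not connect to SQL Server. The database name, logical file name, and sizes are hypothetical placeholders, and any real values should come from your own capacity planning.

```python
def pregrow_statement(database, logical_file, size_mb, growth_mb):
    """Build T-SQL that pre-sizes a data file and sets a fixed growth increment."""
    return (
        f"ALTER DATABASE [{database}] MODIFY FILE "
        f"(NAME = N'{logical_file}', SIZE = {size_mb}MB, FILEGROWTH = {growth_mb}MB);"
    )


# Instance-wide setting: max degree of parallelism is an advanced option,
# so advanced options must be shown before it can be changed.
MAXDOP_STATEMENTS = [
    "EXEC sp_configure 'show advanced options', 1;",
    "RECONFIGURE;",
    "EXEC sp_configure 'max degree of parallelism', 1;",
    "RECONFIGURE;",
]

# Hypothetical content database and logical file name.
print(pregrow_statement("WSS_Content_Example", "WSS_Content_Example", 51200, 1024))
print("\n".join(MAXDOP_STATEMENTS))
```

Using a megabyte value for FILEGROWTH rather than a percentage keeps each growth event the same size regardless of how large the file has become.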

Page 28: Pascal benois performance_troubleshooting-spsbe18

SQL SERVER MAINTENANCE

• SharePoint databases require constant maintenance, otherwise performance will degrade

• Performance issues frequently arise due to:

• Out-of-date statistics

• Fragmented indices

• There are Health Analyzer rules that are responsible for updating statistics and reorganizing or rebuilding indices

• Ensure these are running frequently and set to repair automatically

Page 29: Pascal benois performance_troubleshooting-spsbe18

MAXDOP

• SQL Server can use all available processors to execute queries in parallel

• The product group (PG) tested extensively with various settings and concluded that NOT using parallelism is the most stable and best-performing configuration

• To suppress parallel plan generation, set max degree of parallelism to 1

Page 30: Pascal benois performance_troubleshooting-spsbe18

MAXDOP

Page 31: Pascal benois performance_troubleshooting-spsbe18

AUTO_UPDATE_STATISTICS & AUTO_CREATE_STATISTICS

• We recommend disabling AUTO_UPDATE_STATISTICS

• In SharePoint 2010, both should be disabled. For SharePoint 2007, it is recommended to have both enabled.

• The product team introduced a new timer job called “Database statistics”, which takes care of updating the statistics for the databases.

Page 32: Pascal benois performance_troubleshooting-spsbe18

MEASURING PERFORMANCE

• What is deemed acceptable?

• Are there any agreed upon metrics?

• What are you trying to measure?

• Common examples:

• Requests per second (RPS)

• Page load time – Time-to-Last-Byte (TTLB)

• Measuring specific operations

• Indexing performance

• What are you hoping to prove?

• Are there any agreed upon tools for measuring performance?
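Two of the metrics above, requests per second and page load time, can be computed from any list of request timestamps and durations. The sketch below is one possible way to do it, using one-second buckets for RPS and a nearest-rank percentile for TTLB; both choices, the function names, and the sample numbers are assumptions for illustration rather than an agreed measuring method.

```python
from collections import Counter


def requests_per_second(timestamps):
    """Return (average RPS over the window, peak RPS in any one-second bucket)."""
    if not timestamps:
        return 0.0, 0
    buckets = Counter(int(t) for t in timestamps)
    window = int(max(timestamps)) - int(min(timestamps)) + 1
    return len(timestamps) / window, max(buckets.values())


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


# Hypothetical samples: arrival times in seconds and TTLB in milliseconds.
arrivals = [0.1, 0.5, 0.9, 1.2, 3.7, 3.8]
ttlb_ms = [120, 95, 300, 2500, 180, 140, 110, 160, 130, 150]

average_rps, peak_rps = requests_per_second(arrivals)
print(f"Average RPS: {average_rps:.2f}, peak RPS: {peak_rps}")
print(f"Median TTLB: {percentile(ttlb_ms, 50)} ms, 90th percentile: {percentile(ttlb_ms, 90)} ms")
```

Reporting a percentile alongside the median keeps a single 2500 ms outlier visible without letting it dominate an average.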

Page 33: Pascal benois performance_troubleshooting-spsbe18

KNOW THIS ONE?

• “An SPRequest object was not disposed before the end of this thread. To avoid wasting system resources, dispose of this object or its parent (such as a SPSite or SPWeb) as soon as you are done using it. This object will now be disposed”

Page 34: Pascal benois performance_troubleshooting-spsbe18

THE DETECTION

• Avoid Task Manager

• Track the private bytes

• A steady increase in the private bytes value indicates a memory leak
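"A steady increase" can be turned into a number by fitting a straight line through private bytes samples exported from Performance Monitor. The sketch below uses an ordinary least-squares slope; the 1 MB-per-sample threshold and the sample values are arbitrary assumptions, and a positive trend is only a hint that a closer look is needed, not proof of a leak.

```python
def slope_per_sample(samples):
    """Least-squares slope of the samples against their index (0, 1, 2, ...)."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def looks_like_leak(private_bytes_mb, min_growth_mb=1.0):
    """Flag a series whose fitted growth per sample reaches the threshold."""
    return slope_per_sample(private_bytes_mb) >= min_growth_mb


# Hypothetical private bytes readings (MB) taken at a fixed interval.
steady_growth = [500, 510, 520, 530, 540]
normal_noise = [500, 520, 495, 515, 500]

print(looks_like_leak(steady_growth), looks_like_leak(normal_noise))
```

Fitting a trend over many samples is more robust than comparing the first and last reading, because worker process memory naturally fluctuates with garbage collection and load.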

Page 35: Pascal benois performance_troubleshooting-spsbe18

WHERE IS THE $*%Μ& MEMORY LEAK?

• Find out which logic is causing the memory leak

• No automated tool

• SharePoint Analysis Script (from 1.2)

Type: Warning

Description: GdiPlus.dll is responsible for 399.54 KBytes worth of outstanding allocations. The following are the top 2 memory consuming functions:

GdiPlus!GpMalloc+16: 399.54 KBytes worth of outstanding allocations.

Page 36: Pascal benois performance_troubleshooting-spsbe18

WHERE IS THE $*%Μ& MEMORY LEAK?

• Monitor memory (# Bytes in all Heaps!)

• SPDisposeCheck again?

• Tweak ULS logs

• DebugDiag reports

• WinDbg, ADPlus and SOS.DLL

Page 37: Pascal benois performance_troubleshooting-spsbe18

CACHING

• A poor or missing caching strategy may impact performance as usage increases

• Caching alleviates round-trips to SQL Server, increasing performance by allowing content to be rendered quickly

• Three types of caches:

• BLOB cache

• Output cache

• Object cache

• Simply enabling caching is not enough; settings will need tweaking based on planning and monitoring

Page 38: Pascal benois performance_troubleshooting-spsbe18

BLOB CACHE

• Tools

• Fiddler/httpWatch

• Procmon

• Perfmon

• DecodeBlob (2007)

• Avoid flushing the cache at all costs

• It causes performance issues due to write lock held during index writes

• Limit the BLOB cache with a more restrictive regular expression such as ((?<!_gif)\.gif|(?<!_jpg)\.jpg|(?<!_png)\.png|\.css|\.js)$, which excludes images matching the patterns *_gif.gif, *_jpg.jpg and *_png.png

• Or use a regular expression such as [/]shared documents[/].+\.(gif|jpg|png|css|js)$ to limit caching to a certain library, subweb, or site collection
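Before putting a more restrictive pattern into production, it can help to try it against a few sample paths. The sketch below tests the two expressions from this slide with Python's re module, purely to show which paths they accept; SharePoint evaluates the pattern with the .NET regex engine, so treat this as an approximation, and note that the sample paths are invented.

```python
import re

# Pattern 1: common static file types, except images named *_gif.gif,
# *_jpg.jpg or *_png.png (negative lookbehind on the file name suffix).
EXCLUDE_SUFFIXED = re.compile(
    r"((?<!_gif)\.gif|(?<!_jpg)\.jpg|(?<!_png)\.png|\.css|\.js)$"
)

# Pattern 2: only static files below a "shared documents" folder.
LIBRARY_ONLY = re.compile(r"[/]shared documents[/].+\.(gif|jpg|png|css|js)$")


def is_cacheable(pattern, path):
    """True if the path would be picked up by the given BLOB cache pattern."""
    return pattern.search(path) is not None


# Hypothetical paths used only to exercise the patterns.
for path in ["/images/logo.gif", "/images/logo_gif.gif", "/style/site.css"]:
    print(path, is_cacheable(EXCLUDE_SUFFIXED, path))

for path in ["/sites/hr/shared documents/img/photo.jpg", "/sites/hr/pages/photo.jpg"]:
    print(path, is_cacheable(LIBRARY_ONLY, path))
```

Both patterns are case-sensitive as written here, so paths with different casing would need either a case-insensitive option or an adjusted pattern.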

Page 39: Pascal benois performance_troubleshooting-spsbe18

BLOB CACHE

• ULS logs

• Set the Publishing Cache category to Verbose

• 2010 has improved logging

• IIS logs with time-taken, plus a client-side trace (Fiddler/HttpWatch)

• Cache-Control: public, max-age=86400

• 304 responses with If-None-Match and ETag headers

• For streaming

• Accept-Ranges: bytes in the response

• Range: bytes header in the request (Content-Range: bytes in the partial response)
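When reading a Fiddler or HttpWatch trace, the headers above are what to look for on a static file response. The sketch below encodes that check as a rough heuristic on a dictionary of response headers; the rule it applies (a positive max-age plus an ETag) is an assumption made for this example, not an official test, and the sample header values are invented.

```python
def max_age_seconds(cache_control):
    """Extract max-age from a Cache-Control header value, or None if absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def looks_blob_cached(headers):
    """Heuristic: the response has a positive max-age and carries an ETag."""
    normalized = {key.lower(): value for key, value in headers.items()}
    age = max_age_seconds(normalized.get("cache-control", ""))
    return age is not None and age > 0 and "etag" in normalized


# Hypothetical response headers copied from a client-side trace.
cached = {"Cache-Control": "public, max-age=86400", "ETag": '"abc123"'}
not_cached = {"Cache-Control": "private, max-age=0"}

print(max_age_seconds(cached["Cache-Control"]) // 3600, "hours")
print(looks_blob_cached(cached), looks_blob_cached(not_cached))
```

A max-age of 86400 seconds corresponds to 24 hours, during which the browser can reuse the file without asking the server again.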

Page 40: Pascal benois performance_troubleshooting-spsbe18

We need your feedback!

Scan this QR code or visit http://svy.mk/sps2012be

Our sponsors: