Search Topology and Optimization
April 12, 2013
Mike Maadarani, SharePoint Architect
Slide 2
Bio: Mike Maadarani
  App Dev and Architecture for over 18 years (15 years Microsoft, 3 years with the Other Guys)
  Business focused on Enterprise Content Management, Publishing Sites, & Search
  Technology focused on SharePoint, SQL Server, and SharePoint Integration
  Architect, trainer, and presenter
  Blog: www.maadarani.com
  Email: [email protected] (at maadarani.com); @mikemaadarani
Slide 3
Agenda
  SharePoint 2013 Search Overview
  Architecture and Resource Utilization
  Configuring SSA and PS
  Topology Scenarios
  Hybrid Say What?
  Relevancy, Query Builder, & Optimization
  Closing and Q&A
Slide 4
Search in 2010
  Diagram: SharePoint 2010 Search Service Application.
  Labels: Crawl Component, Query Component, Crawl, Indexing Engine, Query Engine, Search Admin, Property Store (SQL), Content, User, WFE.
Slide 5
FAST Search for SharePoint 2010
  Diagram: FAST Content SSA and FAST Query SSA, with FAST back-end components (managed separately).
  Labels: Crawl, Indexing Engine, Query Engine, Content Pipeline, Analysis Engine, Query Pipeline, Search Admin, Content, User, WFE.
  Callouts: Extensibility: Sandbox. Entity Extraction.
Slide 6
In SharePoint 2013
  Diagram: SharePoint 2013 Search Service Application.
  Components: Crawl Component, Content Processing Component, Index Component, Query Processing Component, Analytics Processing Component, Admin Component.
  Other labels: Crawl, Indexing Engine, Query Engine, Analysis Engine, Content Pipeline, Query Pipeline, Search Admin, Property Store (SQL), Content, User, WFE.
  Callouts: Entire index on local disk. Link/query analysis & recommendations. Separate crawl and indexing. Extensibility: Web callout. Entity Extraction.
Slide 7
SharePoint 2013 Search Architecture
  Search topology components: Crawl, Content Processing, Index, Query Processing, Analytics Processing, Search Admin.
  Content: HTTP, File shares, SharePoint, User profiles, Lotus Notes, Documentum, Exchange folders, Custom - BCS.
  User: SharePoint, SP Apps, Devices, Non-SP UX, Public API.
  Other labels: WFE, FAST Search Index.
Slide 8
Why is Search so important? FAST: "I just uploaded a document. Make it
searchable, quick!"
Slide 9
Why is Search so important? EASY
Slide 10
Why is Search so important? EASY
Slide 11
Why is Search so important? Search-Driven Applications
Slide 12
Why is Search so important? Search Everything: "I can find ALL of
Rob Ford's hidden videos!"
Slide 13
Where does Search live in the farm?
  Windows services
    SharePoint Search Host Controller service (hostcontrollerservice.exe): runtime/lifecycle control of search components (except crawler)
    SharePoint Server Search service (mssearch.exe): still there, but only for the Crawl Component
  Processes
    noderunner.exe: runtime environment for search components (except crawler)
    mssearch.exe, mssdmn.exe: Crawl Component
  Diagram: SharePoint App Server running the Host Controller (hostcontrollerservice.exe), the Search Runtime Environment (noderunner.exe) with the Admin, Query Processing, Content Processing, Index, and Analytics Processing components, and the Crawl Component (mssearch.exe, mssdmn.exe).
Slide 14
Where do I host my components?
Slide 15
Query processing component (QPC)
  CPU load
    Driving factors: QPS, query transformations
  Network load
    Driving factors: number of index partitions, size of queries and results
    Example: 20 index partitions @ 20 QPS => 200/100 Mbit/s in/outbound
  http://social.technet.microsoft.com/wiki/contents/articles/16002.sharepoint-2013-capacity-planning-sizing-and-high-availability-for-search-in-spc172.aspx
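The QPC network example can be turned into a rough estimator. This is only a sketch: it assumes traffic scales linearly with both the number of index partitions and the query rate, anchored at the single reference point on the slide (20 partitions at 20 QPS giving 200/100 Mbit/s). The slide does not state that scaling is linear, and the function name is made up for illustration.

```python
# Reference point taken from the slide.
REF_PARTITIONS = 20
REF_QPS = 20
REF_INBOUND_MBIT = 200.0
REF_OUTBOUND_MBIT = 100.0


def qpc_network_mbit(partitions, qps):
    """Estimate QPC (inbound, outbound) traffic in Mbit/s.

    Assumption of this sketch: load is proportional to partitions * qps.
    Query and result sizes, which the slide also names as driving
    factors, are treated as unchanged from the reference scenario.
    """
    scale = (partitions * qps) / (REF_PARTITIONS * REF_QPS)
    return (REF_INBOUND_MBIT * scale, REF_OUTBOUND_MBIT * scale)


if __name__ == "__main__":
    print(qpc_network_mbit(20, 20))
    print(qpc_network_mbit(10, 20))
```

Treat the output as a starting point for measurement, not a guarantee.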
Slide 16
Index component
  CPU load
    Driving factors: QPS and item count
    Guidelines per index component @ 2 GHz CPU:
      1M items: 5 QPS per CPU core
      5M items: 2 QPS per CPU core
      10M items: 1 QPS per CPU core
  Disk load
    Driving factors: QPS and item count
    New content invalidates caches
    Disk size: 500 GB @ 10M items per index component
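The index component guidelines lend themselves to a quick sizing calculation. The sketch below uses only the three guideline tiers and the disk figure from the slide. Rounding up to the next tier (instead of interpolating) and scaling disk size linearly with item count are assumptions of this sketch, and the function names are hypothetical.

```python
import math

# Guideline figures from the slide: QPS per 2 GHz CPU core for one
# index component, keyed by item count.
QPS_PER_CORE = {1_000_000: 5.0, 5_000_000: 2.0, 10_000_000: 1.0}


def qps_per_core(items):
    """Return the guideline QPS per core for the smallest tier that fits.

    Using the next tier up is a conservative choice made by this sketch.
    """
    for tier in sorted(QPS_PER_CORE):
        if items <= tier:
            return QPS_PER_CORE[tier]
    raise ValueError("more than 10M items per index component")


def cores_needed(items, target_qps):
    """CPU cores needed on one index component to sustain target_qps."""
    return math.ceil(target_qps / qps_per_core(items))


def disk_gb(items):
    """Disk size in GB, scaling 500 GB @ 10M items linearly (assumption)."""
    return 500.0 * items / 10_000_000


if __name__ == "__main__":
    print(cores_needed(5_000_000, 9), disk_gb(5_000_000))
```

For example, an index component holding 5M items and serving 9 QPS would need 5 cores under these guidelines, since 9 / 2 rounds up to 5.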
Slide 17
Crawl component
  CPU load
    Driving factors: documents per second, link discovery, crawl management
  Network load
    Driving factors: downloading items from content sources, passing items on to CPC
  Disk load
    All documents are temporarily stored in the data folder
Slide 18
Content processing component (CPC)
  CPU load
    Driving factors: documents per second, document size and complexity, feature extraction
    Estimate: 5-10 DPS per CPU core
  Network load
    Driving factors: documents per second, document size
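The 5-10 DPS per core estimate gives a quick way to bound how long a full crawl might take to process. The sketch below assumes the CPC is the bottleneck and that throughput scales linearly with core count. Neither assumption comes from the slide, and the function names are hypothetical.

```python
# Estimate from the slide: 5-10 documents per second (DPS) per CPU core.
DPS_PER_CORE_LOW = 5
DPS_PER_CORE_HIGH = 10


def cpc_throughput_range(cores):
    """Return (low, high) documents per second for the given core count."""
    return (DPS_PER_CORE_LOW * cores, DPS_PER_CORE_HIGH * cores)


def full_crawl_hours(documents, cores):
    """Return (best, worst) hours to process `documents` items.

    Assumes content processing is the only bottleneck, which ignores
    crawl download time and index write time.
    """
    low_dps, high_dps = cpc_throughput_range(cores)
    best = documents / high_dps / 3600
    worst = documents / low_dps / 3600
    return (best, worst)


if __name__ == "__main__":
    print(cpc_throughput_range(8))
    print(full_crawl_hours(1_440_000, 8))
```

With 8 cores the range is 40 to 80 DPS, so 1.44M documents would take between 5 and 10 hours under these assumptions.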
Slide 19
Analytics processing component (APC)
  CPU load
    Driving factors: number of items, site activity
  Disk load
    Local disk used for temporary storage
    Bulk load; the primary concern is load isolation
  Network load
    Driving factors: same as for CPU load
    Plus: network traffic increases when distributing the APC across multiple machines
Slide 20
Search administration component
  Low CPU and network load
  Load increases with more components in the search topology
Slide 21
Create your SSA
Slide 22
Small Search Topology
Slide 23
Fault tolerant small search topology
Slide 24
Small search farm (up to 10M items)
Slide 25
Scaling from small to medium search topology
Slide 26
Extend your SSA
Slide 27
Medium Search Topology
Slide 28
Hybrid Search
Slide 29
Why Hybrid Search?
  Hybrid SharePoint environment: pieces of content distributed across multiple environments
  Complexity due to multiple locations
  Many top-level domains, requiring knowledge of where to go to locate the most relevant content
  No single Enterprise Search Center for finding content
  Lost user productivity and added frustration while trying to locate relevant content
Slide 30
Benefits
  Provide integrated search results, allowing for a single place to find content
  One Enterprise Search Center to reduce user interface complexity
  Query all of your SharePoint content at the same time
  Allow O365 and on-premises solutions to coexist
  Provide a solution allowing customers to move to the cloud on their own terms
  Reduce operational cost
  Take advantage of newer SharePoint feature updates in O365
  Hybrid search solves many problems as data is moving from on-premises to O365
Slide 31
One-way outbound topology
  On-premises SharePoint Server can query SharePoint Online
  SharePoint Online can NOT query SharePoint on-premises
  Diagram: SharePoint Server 2013 farm (WFE, on-premises) and a SharePoint Online site collection (Office 365 tenant, Microsoft data center), connected over the Internet.
  Labels: Hybrid search results, Local search results only, Outbound, Inbound.
Slide 32
One-way inbound topology
  SharePoint Online can query SharePoint on-premises
  On-premises SharePoint Server can NOT query SharePoint Online
  Diagram: SharePoint Server 2013 farm (WFE, on-premises) behind a reverse proxy in the DMZ, and a SharePoint Online site collection (Office 365 tenant, Microsoft data center), connected over the Internet.
  Labels: Hybrid search results, Local search results only, Outbound, Inbound.
Slide 33
Two-way (bidirectional) topology
  SharePoint Online can query SharePoint on-premises
  On-premises SharePoint Server can query SharePoint Online
  Diagram: SharePoint Server 2013 farm (WFE, on-premises) behind a reverse proxy in the DMZ, and a SharePoint Online site collection (Office 365 tenant, Microsoft data center), connected over the Internet.
  Labels: Hybrid search results, Local search results only, Outbound, Inbound.
Slide 34
Tweaking Your results
Slide 35
Challenges: Intent
  Infrastructure Project
    Where is my talk Project Plan?
    Are documents held at the same place?
    I wonder if there are references from previous projects?
  Different people have different intents
  Query Rules help you handle intents
  There is rarely a single right answer
Slide 36
Authorities: SSA-level configuration
  Sites that are important
  Sites with low intrinsic relevance
  Takes ~24 hrs to propagate
Slide 37
Authorities: Connected
Slide 38
Setting an authority affects all sites connected through
hyperlinks. Sites are weighted by distance to the authority.
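The idea of weighting sites by distance to an authority can be illustrated with a breadth-first search over the hyperlink graph. This is a conceptual sketch only: it computes hop counts and does not reproduce the SharePoint ranking model, and the site URLs in the usage example are hypothetical.

```python
from collections import deque


def click_distance(links, authorities):
    """Return the minimum number of hyperlink hops from any authority.

    links maps a site URL to the list of URLs it links to.
    Sites that cannot be reached from an authority are left out.
    """
    dist = {site: 0 for site in authorities}
    queue = deque(authorities)
    while queue:
        site = queue.popleft()
        for target in links.get(site, ()):
            if target not in dist:
                dist[target] = dist[site] + 1
                queue.append(target)
    return dist


if __name__ == "__main__":
    # Hypothetical link graph for illustration.
    links = {
        "http://employment": ["http://hr", "http://reports"],
        "http://hr": ["http://archive"],
        "http://orphan": ["http://hr"],
    }
    print(click_distance(links, ["http://employment"]))
```

Here the authority has distance 0, the sites it links to have distance 1, and `http://orphan` gets no distance at all because nothing connected to the authority links to it.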
Slide 39
Query Rules
  Tune search results
  Created at the SSA, Tenant, Site Collection, or Site
Slide 40
Query Rules
  Condition: When do I apply the rule?
  Action: What to do when the rule is matched?
  Publishing: When should the rule be active?
Slide 41
Query Rules
  Conditions
    Exact match, beginning or end
    Ad-hoc or term store dictionary
    Match a regex (advanced)
    Is this query more likely aimed at the following source?
    Do people mostly click on results of the following type?
  Actions
    Show a promoted result
    Show a block of results
    Replace the core results with a different query
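The condition/action structure of a query rule can be modeled in a few lines. The sketch below is a conceptual model, not the SharePoint API: the `QueryRule` class, the helper functions, and the action names are all hypothetical, and only three of the condition types (exact match, beginning, regex) are covered.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class QueryRule:
    name: str
    condition: Callable[[str], bool]  # When do I apply the rule?
    action: str                       # What to do when it matches
    payload: str                      # Promoted URL or replacement query


def exact(phrase):
    """Condition: the whole query equals the phrase (case-insensitive)."""
    return lambda q: q.strip().lower() == phrase.lower()


def starts_with(phrase):
    """Condition: the query begins with the phrase (case-insensitive)."""
    return lambda q: q.strip().lower().startswith(phrase.lower())


def matches(pattern):
    """Condition: the query matches a regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return lambda q: rx.search(q) is not None


def apply_rules(query: str, rules: List[QueryRule]) -> List[Tuple[str, str]]:
    """Return the (action, payload) pairs of every rule the query fires."""
    return [(r.action, r.payload) for r in rules if r.condition(query)]


if __name__ == "__main__":
    rules = [
        QueryRule("hr-best-bet", exact("HR Employment"),
                  "promoted_result", "/HR/employment"),
        QueryRule("quarterly", matches(r"\bquarterly report\b"),
                  "replace_query", "ContentType=reports"),
    ]
    print(apply_rules("hr employment", rules))
    print(apply_rules("HR Employment quarterly report", rules))
```

Publishing (when the rule is active) is left out to keep the sketch short; it could be modeled as a start and end date checked before the condition.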
Slide 42
Query Builder Dynamically Ranking Change Part of the query
Results Ranking
Slide 43
Query Builder
Slide 44
Configuration in the Conceptual Relevance Flow
  Flow: Search Web Part, Query Processing Engine, Document Collection
  Query: HR Employment quarterly report
  For all queries:
    Authorities: Level 1: http://employment
    Ranking model: {incorporate user ratings}
  Thesaurus: HR, Human Resources
  Best bets: HR Employment, /HR/employment
  Expanded query: (WORDS HR, Human Resources) AND (WORDS employees, employed) AND (WORDS quarterly, quarterlies) AND (WORDS report, reports, reported)
  Mixed results for:
    HR Employment best bet
    HR Employment quarterly report
    HR Employment ContentType=reports
  Dynamic Reordering Rules: Quarterly Report {prefer docs from http://reports}
  Query Rule: {Terms} Quarterly Report; {Terms} ContentType=reports
Slide 45
Create a Query Rule (Hybrid)
  From the Result Source drop-down list, select the specified result source.
  Under "Query is performed on these sources", if you select "One of these sources", make sure to select the result source you created.
Slide 46
Hybrid Results
  Results from SharePoint Online
  Results from SharePoint Server
Slide 47
Session Objective and Takeaways
  High Availability and Performance
  Better Search Quality
  Better management
  Friendly results and tools