Search Topology and Optimization April 12, 2013 Mike Maadarani SharePoint Architect


  • Slide 1
  • Search Topology and Optimization April 12, 2013 Mike Maadarani SharePoint Architect
  • Slide 2
  • Bio: Mike Maadarani. App Dev and Architecture for over 18 years (15 years Microsoft, 3 years with the Other Guys). Business focused on Enterprise Content Management, Publishing Sites, & Search. Technology focused on SharePoint, SQL Server, and SharePoint Integration. Architect, trainer, and presenter. Blog: www.maadarani.com; [email protected]; @mikemaadarani
  • Slide 3
  • Agenda: SharePoint 2013 Search Overview; Architecture and Resource Utilization; Configuring SSA and PS; Topology Scenarios; Hybrid Say What?; Relevancy, Query Builder, & Optimization; Closing and Q&A
  • Slide 4
  • Search in 2010 Crawl Component Query Component SharePoint 2010 Search Service Application Crawl Indexing Engine Query Engine Search Admin Property Store (SQL) Content User WFE
  • Slide 5
  • FAST Search for SharePoint 2010 FAST Content SSA FAST Query SSA FAST back-end components (managed separately) Extensibility: Sandbox Entity Extraction Crawl Indexing Engine Query Engine Content Pipeline Analysis Engine Query Pipeline Search Admin Content User WFE
  • Slide 6
  • In SharePoint 2013 SharePoint 2013 Search Service Application Index Component Query Engine Content Pipeline Content Processing Component Crawl Component Query Processing Component Analytics Processing Component Query Pipeline Search Admin Admin Component Entire index on local disk Property Store (SQL) Content User WFE Analysis Engine Crawl Indexing Engine Link/query analysis & recommendations Separate crawl and indexing Extensibility: Web callout Entity Extraction
  • Slide 7
  • SharePoint 2013 Search Architecture Search Admin Content User Crawl Content Processing Index Query Processing WFE Analytics Processing FAST Search Index SharePoint SP Apps Devices Non-SP UX HTTP File shares SharePoint User profiles Lotus Notes Documentum Exchange folders Custom - BCS Public API Search topology components
  • Slide 8
  • Why is Search so important? "I just uploaded a document. Make it searchable, quick!" FAST
  • Slide 9
  • Why is Search so important? EASY
  • Slide 10
  • Why is Search so important? EASY
  • Slide 11
  • Why is Search so important? Search Driven Applications
  • Slide 12
  • Why is Search so important? Search Everything: "I can find ALL of Rob Ford's hidden videos!"
  • Slide 13
  • Where does Search live in the farm? Windows services: SharePoint Search Host Controller service (hostcontrollerservice.exe), runtime/lifecycle control of search components (except crawler); SharePoint Server Search service (mssearch.exe, mssdmn.exe), still there, but only for the Crawl Component. Processes: noderunner.exe, the runtime environment for search components (except crawler): Admin Component, Query Processing Component, Content Processing Component, Index Component, Analytics Processing Component. All run on the SharePoint App Server.
  • Slide 14
  • Where do I host my components?
  • Slide 15
  • Query processing component (QPC). CPU load driving factors: QPS; query transformations. Network load driving factors: number of index partitions; size of queries and results. Example: 20 index partitions @ 20 QPS => 200/100 Mbit/s inbound/outbound. Source: http://social.technet.microsoft.com/wiki/contents/articles/16002.sharepoint-2013-capacity-planning-sizing-and-high-availability-for-search-in-spc172.aspx
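The QPC network example above gives a single data point. As a rough planning aid, it can be extrapolated to other farm sizes. The sketch below assumes network load scales linearly with partitions times QPS, which the slide does not state; the function name is hypothetical.

```python
from typing import Tuple

# Single reference point from the slide:
# 20 index partitions @ 20 QPS => 200 Mbit/s inbound, 100 Mbit/s outbound.
REF_PARTITIONS = 20
REF_QPS = 20
REF_IN_MBIT = 200.0
REF_OUT_MBIT = 100.0


def qpc_network_load(partitions: int, qps: float) -> Tuple[float, float]:
    """Estimate (inbound, outbound) Mbit/s for a query processing component.

    ASSUMPTION: load scales linearly with partitions * QPS. The slide
    only provides one data point, so treat the result as a rough guide.
    """
    if partitions < 1 or qps < 0:
        raise ValueError("partitions must be >= 1 and qps must be >= 0")
    scale = (partitions * qps) / (REF_PARTITIONS * REF_QPS)
    return REF_IN_MBIT * scale, REF_OUT_MBIT * scale
```

Real traffic also depends on the size of queries and results, so measured numbers should replace this estimate once they are available.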
  • Slide 16
  • Index component. CPU load driving factors: QPS and item count. Guidelines per index component @ 2 GHz CPU: 1M items, 5 QPS per CPU core; 5M items, 2 QPS per CPU core; 10M items, 1 QPS per CPU core. Disk load driving factors: QPS and item count; new content invalidates caches. Disk size: 500 GB @ 10M items per index component.
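The per-core guidelines for the index component translate into a simple lookup. The sketch below is one way to turn them into a core count for a target QPS. Rounding up to the next tier (rather than interpolating between tiers) is an assumption made here to stay conservative, and the function names are hypothetical.

```python
import math

# (max items per index component, QPS per CPU core) from the slide, @ 2 GHz CPU.
QPS_PER_CORE_TIERS = [
    (1_000_000, 5.0),
    (5_000_000, 2.0),
    (10_000_000, 1.0),
]


def qps_per_core(items: int) -> float:
    """Return the guideline QPS per core for an index component of this size.

    ASSUMPTION: item counts between tiers use the next larger tier.
    """
    for max_items, rate in QPS_PER_CORE_TIERS:
        if items <= max_items:
            return rate
    raise ValueError("more than 10M items is outside the slide's guidelines")


def index_cores_needed(items: int, target_qps: float) -> int:
    """Estimate CPU cores for one index component at the target QPS."""
    return math.ceil(target_qps / qps_per_core(items))
```

For example, `index_cores_needed(3_000_000, 5)` falls into the 5M tier (2 QPS per core) and returns 3.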
  • Slide 17
  • Crawl component. CPU load driving factors: documents per second; link discovery; crawl management. Network load driving factors: downloading items from content sources; passing items on to the CPC. Disk load: all documents are temporarily stored in the data folder.
  • Slide 18
  • Content processing component (CPC). CPU load driving factors: documents per second; document size and complexity; feature extraction. Estimate: 5-10 DPS per CPU core. Network load driving factors: documents per second; document size.
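The 5-10 DPS per core estimate gives a range rather than a single figure. A minimal sketch of how it might be applied to a target crawl rate follows; the function name is hypothetical.

```python
import math
from typing import Tuple

# Slide estimate: each CPU core processes 5-10 documents per second (DPS).
DPS_PER_CORE_LOW = 5
DPS_PER_CORE_HIGH = 10


def cpc_cores_needed(target_dps: float) -> Tuple[int, int]:
    """Return (optimistic, conservative) core counts for the target DPS.

    Optimistic assumes 10 DPS per core, conservative assumes 5 DPS per core.
    """
    optimistic = math.ceil(target_dps / DPS_PER_CORE_HIGH)
    conservative = math.ceil(target_dps / DPS_PER_CORE_LOW)
    return optimistic, conservative
```

Large or complex documents push throughput toward the low end of the range, so the conservative figure is the safer starting point.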
  • Slide 19
  • Analytics processing component (APC). CPU load driving factors: number of items; site activity. Disk load: local disk used for temporary storage; bulk load, primary concern is load isolation. Network load: same driving factors as for CPU load, plus network traffic increases when distributing the APC across multiple machines.
  • Slide 20
  • Search administration component. Low CPU and network load. Load increases with more components in the search topology.
  • Slide 21
  • Create your SSA
  • Slide 22
  • Small Search Topology
  • Slide 23
  • Fault tolerant small search topology
  • Slide 24
  • Small search farm (up to 10M items)
  • Slide 25
  • Scaling from small to medium search topology
  • Slide 26
  • Extend your SSA
  • Slide 27
  • Medium Search Topology
  • Slide 28
  • Hybrid Search
  • Slide 29
  • Why Hybrid Search? Hybrid SharePoint environment: pieces of content distributed across multiple environments; complexity due to multiple locations; many top-level domains, requiring knowledge of where to go to locate the most relevant content; no single Enterprise Search Center for finding content; lost user productivity and added frustration while trying to locate relevant content.
  • Slide 30
  • Benefits: provide integrated search results, allowing for a single place to find content; one Enterprise Search Center to reduce user interface complexity; query all of your SharePoint content at the same time; allow O365 and on-premises solutions to coexist; provide a solution allowing customers to move to the cloud on their own terms; reduce operation cost; take advantage of newer SharePoint feature updates in O365. Hybrid search solves many problems as data is moving from on-premises to O365.
  • Slide 31
  • One-way outbound topology WFE SharePoint Online Local search results only Site collection Office365 tenant SharePoint Server 2013 Farm Hybrid search results Outbound Inbound SharePoint Online can NOT query SharePoint On-prem Internet Microsoft data center On-premises SharePoint Server can query SharePoint Online
  • Slide 32
  • One-way inbound topology WFE SharePoint Online Local search results only Site collection Office365 tenant SharePoint Server 2013 Farm Hybrid search results Outbound Inbound SharePoint Online can query SharePoint On-prem Internet Microsoft data center On-premises SharePoint Server can NOT query SharePoint Online Reverse Proxy DMZ
  • Slide 33
  • Two-way (bidirectional) topology WFE SharePoint Online Local search results only Site collection Office365 tenant SharePoint Server 2013 Farm Hybrid search results Outbound Inbound SharePoint Online can query SharePoint On-prem Internet Microsoft data center On-premises SharePoint Server can query SharePoint Online Reverse Proxy DMZ
  • Slide 34
  • Tweaking Your results
  • Slide 35
  • Challenges: Intent. Where is my talk Project Plan? Are documents held at the same place? I wonder if there are references from previous projects? Different people have different intents. Query Rules help you handle intents. There is rarely a single right answer. Infrastructure Project
  • Slide 36
  • Authorities: SSA-level configuration. Sites that are important; sites with low intrinsic relevance. Takes ~24 hrs to propagate.
  • Slide 37
  • Authorities: Connected
  • Slide 38
  • Setting an authority affects all sites connected through hyperlinks. Sites are weighted by distance to the authority.
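The "distance to the authority" idea can be pictured as a breadth-first walk over the hyperlink graph. The sketch below only computes hop counts from an authoritative page. How SharePoint turns that distance into a ranking weight is not described in the slides, so no weighting is modeled, and the link graph in the usage example is hypothetical.

```python
from collections import deque
from typing import Dict, List


def click_distance(links: Dict[str, List[str]], authority: str) -> Dict[str, int]:
    """Return the minimum number of hyperlink hops from the authority to each page.

    `links` maps a page URL to the URLs it links to. Pages that cannot be
    reached from the authority are absent from the result.
    """
    distance = {authority: 0}
    queue = deque([authority])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical link graph for illustration only.
example_links = {
    "http://employment": ["http://employment/policies", "http://reports"],
    "http://employment/policies": ["http://employment/policies/leave"],
    "http://reports": ["http://employment"],
}
example_distances = click_distance(example_links, "http://employment")
```

Pages with a smaller hop count sit closer to the authority and, per the slide, are weighted more favorably.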
  • Slide 39
  • Query Rules: Tune Search Results. Created at the SSA, Tenant, Site Collection, or Site level.
  • Slide 40
  • Query Rules. Condition: when do I apply the rule? Action: what to do when the rule is matched? Publishing: when should the rule be active?
  • Slide 41
  • Query Rules. Conditions: exact match, beginning or end; ad-hoc or term store dictionary; match a regex (advanced); is this query more likely aimed at the following source? Do people mostly click on results of the following type? Actions: show a promoted result; show a block of results; replace the core results with a different query.
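The condition / action / publishing split of a query rule can be modeled in a few lines. This is an illustrative data model only, not the SharePoint object model: every class, function, and field name below is hypothetical, and only the exact-match, ends-with, and regex conditions from the slide are sketched.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class QueryRule:
    name: str
    condition: Callable[[str], bool]  # when do I apply the rule?
    action: str                       # what to do when the rule is matched
    start: Optional[date] = None      # publishing window (inclusive)
    end: Optional[date] = None

    def is_published(self, today: date) -> bool:
        """True if the rule is active on the given day."""
        if self.start is not None and today < self.start:
            return False
        if self.end is not None and today > self.end:
            return False
        return True

    def fires(self, query: str, today: date) -> bool:
        """True if the rule is published and its condition matches the query."""
        return self.is_published(today) and self.condition(query)


def exact_match(phrase: str) -> Callable[[str], bool]:
    return lambda q: q.strip().lower() == phrase.lower()


def ends_with(phrase: str) -> Callable[[str], bool]:
    return lambda q: q.strip().lower().endswith(phrase.lower())


def regex_match(pattern: str) -> Callable[[str], bool]:
    compiled = re.compile(pattern, re.IGNORECASE)
    return lambda q: compiled.search(q) is not None


rule = QueryRule(
    name="Quarterly reports",
    condition=ends_with("quarterly report"),
    action="show a block of results from the reports source",
    start=date(2013, 4, 1),
)
```

With this rule, the query "HR Employment quarterly report" fires on April 12, 2013, but not before the publishing start date.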
  • Slide 42
  • Query Builder Dynamically Ranking Change Part of the query Results Ranking
  • Slide 43
  • Query Builder
  • Slide 44
  • Configuration in the Conceptual Relevance Flow For all queries: Authorities: Level 1: http://employment Ranking model : {incorporate user ratings} Query : HR Employment quarterly report Search Web Part Query Processing Engine Document Collection Thesaurus : HR Human Resources Best bets: HR Employment /HR/employment (WORDS HR, Human Resources) AND (WORDS employees, employed) AND (WORDS quarterly, quarterlies) AND (WORDS report, reports, reported) Mixed Results for: HR Employment best bet HR Employment quarterly report HR Employment ContentType=reports Dynamic Reordering Rules: Quarterly Report {prefer docs from http://reports} Query Rule: {Terms} Quarterly Report {Terms} ContentType=reports
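The thesaurus step in the flow above expands each query term into a group of variants, joined with AND. A minimal sketch of that expansion follows. The output copies the slide's "(WORDS a, b)" notation for illustration and is not validated query syntax; the function name and the thesaurus contents are assumptions.

```python
from typing import Dict, List


def expand_query(query: str, thesaurus: Dict[str, List[str]]) -> str:
    """Expand each whitespace-separated term with its thesaurus variants.

    Terms without an entry become a single-term group. Lookup is
    case-sensitive in this sketch.
    """
    groups = []
    for term in query.split():
        variants = [term] + thesaurus.get(term, [])
        groups.append("(WORDS " + ", ".join(variants) + ")")
    return " AND ".join(groups)


# Thesaurus entries taken from the slide's example expansions.
example_thesaurus = {
    "HR": ["Human Resources"],
    "quarterly": ["quarterlies"],
    "report": ["reports", "reported"],
}
expanded = expand_query("HR quarterly report", example_thesaurus)
```

The slide also shows stemming-style variants (for example "employees, employed"). Here these would simply be additional dictionary entries, not the output of a real stemmer.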
  • Slide 45
  • Create a Query Rule (Hybrid). From the Result Source drop-down list, select the specified result source. Under "Query is performed on these sources", if you select "One of these sources", make sure to select the result source you created.
  • Slide 46
  • Hybrid Results: results from SharePoint Online; results from SharePoint Server
  • Slide 47
  • Session Objective and Takeaways: High Availability and Performance; Better Search Quality; Better Management; Friendly Results and Tools
  • Slide 48
  • Thank You! www.maadarani.com, [email protected], @mikemaadarani www.slideshare.net/maadarani