Understanding and Applying Cloud Hybrid Search
@jefffried
Jeff Fried CTO, BA Insight
"We love hybrid search - it's amazing how fast usage is growing" - Jeff Teper, @jeffteper
Hybrid SharePoint: what, why & how
Cloud Hybrid Search & the Cloud SSA (New)
• What, Why, & How
• Resources for SP2013 and SP2016
Going Deeper with “Case Study” Examples
• External Content
• Scaling & Performance
• Content Enrichment
• Security
• Global Deployment
About Jeff Fried
Focused on Search and SharePoint since 2004
Longtime Search Nerd
• CTO, BA Insight
• Senior PM, Microsoft
• VP, FAST
• SVP, LingoMotors
Passionate About
• Search
• SharePoint
• Search-driven applications
• Information Strategy
Blog: BAinsight.com/blog
TechNet Column: “A View from the Crawlspace”
About BA Insight
Our software quickly connects SharePoint users to needed information.
Software portfolio includes:
– Connectivity - Secure connectors and federation to over 50 content systems
– Applications - Improve the user experience to boost productivity
– Classification - Autotagging, metadata generation, and text analytics
– Analytics - Insight into how SharePoint portals, search, & content are being used
Hundreds of successful implementations including:
KCTCS (background)
Search is not stationary
Demo
Why Hybrid SharePoint?
• Move to the cloud without
– Breaking customizations
– Raising security concerns
• Use features not available with SharePoint Online
– extensibility models, search on external content, cross-site publishing…
• Reduce risk by migrating in steps
– Separation of workloads
• Keep up with Microsoft’s cloud-first/cloud-only roadmap
– and/or hedge your bets
The Evolution of SharePoint: HYBRID
[Diagram: Server and hybrid, each spanning Experiences, Management, and Extensibility. Workloads shown: Team Sites, Portals, Identity, Search, Enterprise Content Mngt, BI.]
Search Provides a Unified View
Seamless experience for users
• Don’t need to know where content is
• Click-through to definitive version
Bring in content from many systems
• Tap business critical content where it lives
• Secure; reduces load & risk on LoB systems
SharePoint 2013/2016 Search Architecture
[Diagram: components shown: Content, Connectors, Crawl, Content Processing, Content Enrichment Web Service (CEWS), Index (FAST Search Index), Query Processing, Remote Result Sources, API, Web Front End, Proxy, UX.]
“Classic” Hybrid Search is Federated
not a single result set OOB
Cloud Hybrid Search (New)
[Timeline, 2015 to 2017: Cloud SSA Preview (SP2013); Oct 2015 CU; SP2016 Preview; SP2016 Beta; SP2016 GA; Cloud SSA GA; 3rd Party Connectors; Feature Pack 1; Feature Pack 2; Clear Index Scripts; Remove item from results; SP2016 in Azure VM; Hybrid Taxonomy; Nov 2016 CU/PU; Roadmap items…]
Benefits of Cloud Hybrid Search
1) Simpler, easier, and less costly to run search
• Infrastructure Savings
• Reduced Maintenance Effort
• Cloud Elasticity & Access
2) Makes finding content easy, wherever the content lives
• In most Hybrid SharePoint scenarios, this is really important
Cloud Hybrid Search
[Diagram: SharePoint Server (on-premises or hosted) with SharePoint Content and the Cloud SSA; Office 365 with the O365 Search Index, SharePoint Online Content, and OneDrive for Business Content. Label on the link to Office 365: Just Text + Metadata.]
Case Study: Split Users with SharePoint
Large State University
• Start small and grow
– Team Sites for classes, projects, research groups
– University-wide sites and applications
• Hybrid
– If it can go to the cloud easily, move it now
– Legacy apps, some high-security data on-prem
• “Textbook” setup
– OOB wherever possible
[Diagram: On-premises: SP 2013/2016 Cloud SSA, SP 2013 content, SP 2010 content, Fileshares. Office 365: SPO Search Farm, SPO content, OneDrive, Support forum. Content areas shown: Support, Sales & Marketing, Knowledge Articles.]
Setting up Cloud Hybrid Search
Prereqs
• SP2016 or SP2013 with December 2015 CU
• O365 Subscription
Steps
1. Synchronize users and groups (AAD Connect)
2. Create Cloud Search Service Application
3. Onboarding: Install pre-requisites; Execute script
4. Validation: Content Sources, Validation Search Center
On-prem: Cloud SSA
Online: O365 Search
Use search verticals with Cloud Hybrid Search
Verticals: SharePoint Online | Everything | Support Search
The SharePoint Online vertical uses a custom result source: Local SharePoint results plus a filter that excludes results from on-premises.
TIP: Can be used during validation of hybrid search in the production tenant.
Result source query: {searchTerms} NOT(IsExternalContent:1)
Result Sources are your friend
The Support Search vertical only searches sites that are relevant to the Support team.
It uses Local SharePoint results plus a filter on which sites to include in the search results
Result source query: {searchTerms} (Path:"http://sp2010" OR Path:"file://fileshare" OR Path:"http://demohybrid.../../supportforum")
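The two result source queries above follow one pattern: the {searchTerms} placeholder plus a filter clause. A minimal sketch of assembling such query templates as plain strings; the helper names are hypothetical, and only the two unelided paths from the slide are used:

```python
def path_filter(paths):
    """Build a clause that matches any of the given Path prefixes."""
    clauses = ['Path:"{}"'.format(p) for p in paths]
    return "(" + " OR ".join(clauses) + ")"


def result_source_query(filter_clause):
    """Prefix a filter clause with the {searchTerms} placeholder."""
    return "{searchTerms} " + filter_clause


# Support Search vertical: only sites relevant to the Support team.
support_query = result_source_query(
    path_filter(["http://sp2010", "file://fileshare"])
)

# SharePoint Online vertical: exclude content crawled on-premises.
online_only_query = result_source_query("NOT(IsExternalContent:1)")

print(support_query)
print(online_only_query)
```

The resulting strings are what would be pasted into the result source's query transform; the sketch does not call SharePoint.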
Demo
Single node topology
Sizing
• Scaling up and out may not provide a performance improvement
• VM sizing should follow SP2016 minimum requirements (12GB, 4CPU, 80GB disk)
• Database: CrawlDB sizing is the same as on-prem (~20GB per 1M items)
Fault tolerance
• Yes - Queries in Office 365
• No - Outbound Hybrid (Queries)
• No - Crawling
[Diagram: one VM running Admin, Crawler, CPC, APC (unused), Indexer (unused), QPC (unused), plus SSA Databases.]
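The CrawlDB rule of thumb above (~20GB per 1M items) is easy to turn into a quick estimate. A small sketch, assuming the ratio scales linearly; the function name is hypothetical:

```python
GB_PER_MILLION_ITEMS = 20  # rule of thumb from the slide (~20GB per 1M items)


def crawl_db_size_gb(item_count):
    """Estimate CrawlDB size in GB, assuming linear scaling with item count."""
    return item_count / 1_000_000 * GB_PER_MILLION_ITEMS


print(crawl_db_size_gb(5_000_000))
```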
Multi-node topology
Fault tolerance
• Yes - Queries in Office 365
• Yes - Outbound Hybrid
• Yes - Crawling
Throttling
• When the backend is busy, servers are near capacity or tenant quota is reached
• No SLA on DPS, index freshness or outbound query rate
Bottlenecks
1. Customer’s uplink
2. Content repository
3. Crawler CPU
[Diagram: two VMs. One runs Admin, Crawler, QPC; the other runs Admin, Crawler, CPC, APC (unused), Indexer (unused), QPC (unused). Two sets of SSA Databases with Synchronous Mirroring.]
Reduce your footprint
Servers required, by volume of content (indexable items)

Volume of Content     Pattern               On-prem Search Farm   Cloud Hybrid Search
0-10 million items    small                 4 App + 2 DB          1 or 2
10-40 million items   medium                12 App + 2 DB         2
40-100 million items  large                 28 App + 4 DB         2
400 million items     XL example (SP2016)   86 App + 4 DB         2 or 3
Item Limits and Pricing
• Licensing: 1M items of external content in index for every 1TB storage in O365
– 1TB included by default + 0.5 GB per licensed O365 user
– 2,000 users x 0.5 GB = 1TB + 1TB default = 2TB total -> 2M external items indexed
– 50,000 users x 0.5 GB = 25TB + 1TB default = 26TB total -> 26M external items indexed
• Can also buy the “Office 365 Extra File Storage” add-on: $0.20/GB/Month = $200/TB/Month = $200/M items/Month
• No limit on number of items from O365 in the index
• Default throttling at 20M external items; current threshold at 25M
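The licensing arithmetic above can be checked with a short calculation. This is a sketch under the slide's own rounding (1TB treated as 1,000GB, so 1,000 external items per GB); the function names are hypothetical:

```python
import math

DEFAULT_STORAGE_GB = 1000      # 1TB included by default
STORAGE_PER_USER_GB = 0.5      # per licensed O365 user
ITEMS_PER_GB = 1000            # 1M external items per 1TB of storage
ADDON_CENTS_PER_GB_MONTH = 20  # "Extra File Storage" add-on, $0.20/GB/Month


def external_item_quota(licensed_users, extra_storage_gb=0):
    """External items allowed in the index for a tenant of this size."""
    total_gb = (DEFAULT_STORAGE_GB
                + licensed_users * STORAGE_PER_USER_GB
                + extra_storage_gb)
    return int(total_gb * ITEMS_PER_GB)


def addon_cost_usd_per_month(extra_items):
    """Monthly add-on cost to raise the quota by extra_items."""
    extra_gb = math.ceil(extra_items / ITEMS_PER_GB)
    return extra_gb * ADDON_CENTS_PER_GB_MONTH / 100


print(external_item_quota(2000))
print(external_item_quota(50000))
print(addon_cost_usd_per_month(1_000_000))
```

The quota is a licensing figure only; the throttling thresholds on the same slide apply separately.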
SharePoint 2016 Hybrid
Facilities built in for “split user”
• Cloud Hybrid Search
• User Profiles
• Following
• Extranet
• Compliance (DLP/e-Discovery)
• Config Experience
Built on Search
Cloud SSA Pro/Con versus on-prem
Advantages
• Footprint/Operations
– No servers to maintain
– HA, DR
– Trained support personnel
• Features
– Cloud only workloads
– Invite external users / sharing
– Easy access for users from all devices
Disadvantages
• Control
– No control on updates
– Some control on features via first release
– No control on indexing
– …the train is rolling
• Features
– Some feature gaps
– Harder to integrate LOB data
– Harder to customize
Adding External Content
[Diagram: Cloud Hybrid Search. External Content (on-premises and/or in the cloud) reaches the Cloud SSA through Connectors; SharePoint Server (on-premises or hosted) holds SharePoint Content; Office 365 holds the O365 Search Index, SharePoint Online Content, and OneDrive for Business Content. Label on the link to Office 365: Just Text + Metadata.]
Also drives:
• Office Graph (Delve, …)
• Compliance (DLP, …)
Connectors to MANY Enterprise Systems
Plus a proven architecture and process for creating new connectors to complex systems

Databases
• IBM DB2
• Microsoft SQL Server
• MySQL
• Oracle Databases

ERP and Portal Systems
• IBM WebSphere
• Oracle Interaction (PlumTree)
• Oracle WebCenter
• SAP Business Suite
• SAP DMS

Search and Cloud Systems
• Box
• Google Drive
• Microsoft SharePoint 2013, 2010, 2007, FAST Search for SharePoint
• Microsoft Search Server
• Microsoft SharePoint Online
• PharmaCircle
• Scopus

Content and Collaboration Systems
• Alfresco
• Confluence
• CuadraSTAR
• EMC Documentum
• EMC eRoom
• HP Trim
• IBM Connections
• IBM Content Manager
• IBM Filenet P8
• IBM Lotus Notes
• Objective DMS
• OpenText LiveLink/RM
• OpenText Hummingbird / eDocs
• Oracle CMS/Stellent
• Veeva Vault
• Xerox DocuShare

Mailbox and Archiving Systems
• HP Autonomy EAS / (Zantaz)
• IBM Lotus Notes
• Microsoft Exchange
• Microsoft Exchange Public Folders
• OpenText LiveLink File Archive
• Symantec Evault

Practice Management Systems
• Aderant Practice Management
• Autonomy Worksite (iManage)
• Elite/3E
• Elite Prolaw
• KnowledgeMill OnePlace
• LegalKey
• NetDocuments
• Practical Law
• RealPractice

CRM Systems
• Microsoft Dynamics CRM
• LexisNexis Interaction CRM
• Salesforce.com/Force.com
• ServiceNow
• Any SQL Based CRM

Enterprise Social Networks
• IBM Connections
• Jive
• SalesForce Chatter
• Yammer
External Content in O365 UX
• OOTB Search Center, plus any and all tailored search experiences
• Shows external content (Connectors) and consistent metadata (AutoClassifier)
• Unified view across all content
– on-premises and online
– inside and outside SharePoint
• External content also surfaces in Delve, Modern Team Sites, etc. via the Office Graph
Current Caveats:
1) Don’t see thumbnails, just file icons
2) Have to query for it to show up
Case Study: Cloud SSA, external content
Large global company in materials science
• 24,000 employees - moving all SharePoint to O365
– Keep up with Microsoft’s roadmap and innovation
– Major cost savings
• Index in O365
– Documentum and File Shares on-prem
• Crawl on-prem from O365
– Used Cloud SSA & BA Insight connectors
Content Flow – on-prem indexing
[Diagram: External Content (on-premises and/or in the cloud) -> Connectors -> Crawler (File Handlers / iFilters) -> Content Processing (Custom Processing via CEWS) -> Index]
Bottlenecks:
1) Source systems
2) Content Processing
3) Indexer
…
Content Flow – Cloud Hybrid Search OOB
[Diagram: External Content (on-premises and/or in the cloud) -> Connectors -> Crawler (File Handlers / iFilters) in the Cloud SSA -> Content Processing -> Index]
Bottlenecks:
1) Uplink
2) Source systems
…
Performance: Uplink is now the bottleneck
Want a tool?
SCS under the hood
[Diagram: components shown: On-Premises content source and Search farm (Crawler, Content); SPO content source (Crawler, Content); Azure Broker; External Endpoint; Internal Endpoint; Indexing API; Blob store; Document state table; Work queues; Backend API; Index/Graph.]
Logical architecture: query
[Diagram: Corporate network (SP 2013, Cloud SSA) and Office 365 (SPO Search Index), with numbered query paths.]
1. Jaden issues a query from Office 365. Her user token contains her online identity and group memberships.
2a. Jaden issues a query from a site on-premises. This sends her on-premises claims over to SPO.
2b. Her user token gets rehydrated with her online claims as she is authenticated against Office 365.
Customizations with Cloud Hybrid Search
Cloud SSA
SUPPORTED
• Custom IFilter
• BCS connectors
• Partner connectors
NOT SUPPORTED
• Content that requires custom security trimming
SCS/O365
SUPPORTED
• Tenant level schema mapping
• Query rules
• Result sources
NOT SUPPORTED
• Site collection level schema mapping
• Custom security trimming
• Custom entity extraction
• Content enrichment web service
Issues with Cloud Hybrid Search (1)
Cloud Hybrid Search "annoyances"
Performance Characteristics
• slower query latency for on-prem queries against Cloud SSA
SharePoint Online Limitations
• no synonyms
• no site-level schema
• no full trust code access
Hybrid Administration Weaknesses
• clunky metadata mapping
• can't remove on-premises search results from Cloud SSA
• trickier to test & debug crawls
• can't reset index from Cloud SSA
Be aware of these & compensate for them
And it’s getting better:
• (Fixed in August PU)
• (Semi-addressed in June PU)
Issues with Cloud Hybrid Search (2)
Should I run index reset?
NO!
With SP2016 (June PU), at least you are warned, and you can cause a partial reset using CSOM DeleteAllCloudHybridSearchContent()
https://blogs.technet.microsoft.com/beyondsharepoint/2016/07/07/cloud-hybrid-search-service-application-removing-items-from-the-office-365-search-index/
Limitations of Cloud SSA
Content Enrichment
• no CEWS
• no Entity Extraction
Security
• no Custom Security Trimming
• Can't crawl across Multiple Domains
• Can't Crawl SP in Classic Auth Mode
Data Sovereignty
• export-restricted content can't be put in O365 index
These can all be solved!
No CEWS? Add it back in
[Diagram: External Content (on-premises and/or in the cloud) and SharePoint Content on SharePoint Server (on-premises or hosted) pass through Connectors and the Connector Framework, with Custom Processing (CEWS) and AutoClassifier (app version), to the Cloud SSA; Office 365 holds the O365 Search Index, SPO Content, and OneDrive Content.]
Case study: Content Enrichment
Top 5 Energy Company
• “Home-built” CEWS stages
– Domain Specific Entity Extraction
– “Clean-up” processing
• Using Cloud Hybrid Search, content from
– SharePoint 2010, SharePoint 2013
– Documentum
– Fileshares
– EBSCO
• Using BA Insight Connector Framework
– Smart Pipeline Feature for multiple CEWS stages
– AutoClassifier for entity extraction
[Diagram: Content -> Indexing Connectors -> Connector Framework (Smart Pipeline: AutoClassifier, Custom Stage A, Custom Stage B, Custom Stage C) -> Cloud SSA]
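The idea of running several enrichment stages in sequence before content reaches the Cloud SSA can be illustrated generically. This is a sketch of stage chaining only, not the BA Insight Smart Pipeline API; every name, the item shape, and the tiny vocabulary are hypothetical:

```python
from typing import Callable, Dict, List

Item = Dict[str, object]
Stage = Callable[[Item], Item]

# Hypothetical domain vocabulary standing in for a real entity extractor.
VOCABULARY = ("refinery", "pipeline")


def clean_up(item: Item) -> Item:
    """'Clean-up' stage: collapse runs of whitespace in the body text."""
    out = dict(item)
    out["body"] = " ".join(str(item.get("body", "")).split())
    return out


def extract_entities(item: Item) -> Item:
    """Entity extraction stage: tag vocabulary terms found in the body."""
    out = dict(item)
    body = str(item.get("body", "")).lower()
    out["entities"] = [term for term in VOCABULARY if term in body]
    return out


def run_pipeline(item: Item, stages: List[Stage]) -> Item:
    """Pass the item through each stage in order; each stage returns a new item."""
    for stage in stages:
        item = stage(item)
    return item


crawled = {"body": "  Inspection report for the  Refinery pipeline "}
enriched = run_pipeline(crawled, [clean_up, extract_entities])
print(enriched)
```

Because each stage takes and returns an item, stages can be reordered or added without changing the others.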
Cloud Hybrid Search under the covers
Security = identity sync + ACL mapping
[Diagram: On-Prem: AD and the Cloud SSA (Crawl, Parse). Online: AAD and SCS (ACL Map, Process), with a Blob store and queue in Azure and the index in O365.]
Directory Synchronization
• Security principals can be managed on-premises and synched to the cloud by using the AAD sync tool.
• The object in the cloud (AAD) directory now mirrors the object in the on-premises (AD) directory.

AD
  AccountName: CORP\jaden
  SID: S-1-5-21-1212121212-1212121212-1212
AAD
  AccountName
  msOnline-OnPremiseSecurityIdentifier: S-1-5-21-1212121212-1212121212-1212
  PUID: PUID-XXXX-XXXXXXXXXX
Mapping of Access Control Lists
As items are indexed in Office 365, the access control entries are looked up in the cloud directory service.
Allow: S-1-5-21-1212121212-1212121212-1212 -> Allow: PUID-XXXX-XXXXXXXXXX
• User SIDs are mapped to PUIDs
• Group SIDs are mapped to Object IDs
• «Everyone» and «Authenticated users» are mapped to «Everyone except external users»
Only AD Users and Groups, Only from one domain
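The mapping rules above can be pictured as a lookup per access control entry. A minimal sketch, assuming the synchronized directory is available as two dictionaries; the function, the group SID, and the object ID are hypothetical, and collecting unmapped entries separately is this sketch's choice, not documented service behavior:

```python
EVERYONE_EXCEPT_EXTERNAL = "Everyone except external users"
BROAD_PRINCIPALS = {"Everyone", "Authenticated users"}


def map_acl(entries, user_sid_to_puid, group_sid_to_object_id):
    """Translate on-premises ACL entries to their cloud equivalents.

    Returns (mapped, unmapped). Entries with no counterpart in the
    synchronized directory are returned in `unmapped`.
    """
    mapped, unmapped = [], []
    for entry in entries:
        if entry in BROAD_PRINCIPALS:
            mapped.append(EVERYONE_EXCEPT_EXTERNAL)
        elif entry in user_sid_to_puid:
            mapped.append(user_sid_to_puid[entry])
        elif entry in group_sid_to_object_id:
            mapped.append(group_sid_to_object_id[entry])
        else:
            unmapped.append(entry)
    return mapped, unmapped


# User SID and PUID as shown on the slide; the group pair is made up.
users = {"S-1-5-21-1212121212-1212121212-1212": "PUID-XXXX-XXXXXXXXXX"}
groups = {"S-1-5-21-1212121212-1212121212-3434":
          "00000000-0000-0000-0000-000000000001"}

acl = ["S-1-5-21-1212121212-1212121212-1212",
       "S-1-5-21-1212121212-1212121212-3434",
       "Authenticated users",
       "S-1-5-21-9999999999-9999999999-1111"]  # SID from an unsynced domain

mapped, unmapped = map_acl(acl, users, groups)
print(mapped)
print(unmapped)
```

The last entry shows why the one-domain restriction matters: a SID that was never synchronized has nothing to map to.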
Case Study: Crawling Cross-Domain
European Telecom Provider
• Multiple Domains
– Can only sync one with O365
– No way to change entitlements OOB
• Using 2 BA Insight connectors
– SharePoint 2010
– Yammer
• Convert entitlements when crawling
A global single index solution
[Diagram: five Cloud SSAs feeding one global index.]
BUT export-restricted content can’t be in the global index
Issues with Cloud Hybrid Search OOB
Limitations of Cloud SSA -> BA Insight Solution

Content Enrichment
• no CEWS
• no Entity Extraction
-> Connector Framework, AutoClassifier

Security
• no Custom Security Trimming
• Can't crawl across Multiple Domains
• Can't Crawl SP in Classic Auth Mode
-> Connector Framework
• can 'map down' to AD groups
• can 'map across' cross-domain
• can crawl and map security

Data Sovereignty
• export-restricted content can't be put in O365 index
-> Federator
Key Considerations for Hybrid: Workloads, Environment, Data, Customizations
Availability of features Online versus On-Premises on particular workloads
Significant investments in customization of On-Premises workloads
Concerns over global network performance with remote sites
Regulatory considerations
Manageability concerns
Succeeding with Hybrid Search
It’s still a project
There are resources and tools to help
Expect to iterate