Understanding and Applying Cloud Hybrid Search @jefffried Jeff Fried CTO, BA Insight


Page 1: Understanding and Applying Cloud Hybrid Search

Understanding and Applying Cloud Hybrid Search

@jefffried

Jeff Fried CTO, BA Insight

Page 2: Understanding and Applying Cloud Hybrid Search

"we love hybrid search - it's amazing how fast usage is growing"
Jeff Teper @jeffteper

Page 3: Understanding and Applying Cloud Hybrid Search

Hybrid SharePoint: what, why & how

Cloud Hybrid Search & the Cloud SSA
• What, Why, & How
• Resources for SP2013 and SP2016

Going Deeper with “Case Study” Examples
• External Content
• Scaling & Performance
• Content Enrichment
• Security
• Global Deployment

New

Page 4: Understanding and Applying Cloud Hybrid Search

About Jeff Fried

Focused on Search and SharePoint since 2004

Longtime Search Nerd
• CTO, BA Insight
• Senior PM, Microsoft
• VP, FAST
• SVP, LingoMotors

Passionate About
• Search
• SharePoint
• Search-driven applications
• Information Strategy

Blog: BAinsight.com/blog
Technet Column “A View from the Crawlspace”

[email protected]

Page 5: Understanding and Applying Cloud Hybrid Search

About BA Insight

Our software quickly connects SharePoint users to needed information

Software portfolio includes:
– Connectivity - Secure connectors and federation to over 50 content systems
– Applications - Improve the user experience to boost productivity
– Classification - Autotagging, metadata generation, and text analytics
– Analytics - Insight into how SharePoint portals, search, & content are being used

Hundreds of successful implementations including:

Page 6: Understanding and Applying Cloud Hybrid Search

KCTCS (background)

Page 7: Understanding and Applying Cloud Hybrid Search
Page 8: Understanding and Applying Cloud Hybrid Search

Search is not stationary

Page 9: Understanding and Applying Cloud Hybrid Search

Demo


Page 10: Understanding and Applying Cloud Hybrid Search

Why Hybrid SharePoint?

Move to the cloud without
– Breaking customizations
– Raising security concerns

Use features not available with SharePoint Online
– extensibility models, search on external content, cross-site publishing…

Reduce risk by migrating in steps
– Separation of workloads

Keep up with Microsoft’s cloud-first/cloud-only roadmap
– and/or hedge your bets

Page 11: Understanding and Applying Cloud Hybrid Search

The Evolution of SharePoint: HYBRID

[Diagram: Experiences, Management, and Extensibility layers shown for Server and for Hybrid. Workloads listed: Team Sites, Portals, Identity, Search, Enterprise Content Mngt, BI]

Page 12: Understanding and Applying Cloud Hybrid Search

Search Provides a Unified View

Seamless experience for users
• Don’t need to know where content is
• Click-through to definitive version

Bring in content from many systems
• Tap business critical content where it lives
• Secure; reduces load & risk on LoB systems

Page 13: Understanding and Applying Cloud Hybrid Search

SharePoint 2013/2016 Search Architecture

[Diagram components: Content, Connectors, Crawl, Content Processing, Content Enrichment Web Service (CEWS), Index (FAST Search Index), Query Processing, Remote Result Sources, Web Front End, Proxy, API, UX]

Page 14: Understanding and Applying Cloud Hybrid Search

“Classic” Hybrid Search is Federated

not a single result set OOB

Page 15: Understanding and Applying Cloud Hybrid Search

Cloud Hybrid Search

New

Page 16: Understanding and Applying Cloud Hybrid Search

[Timeline diagram, 2015 to 2017. Milestones shown: Cloud SSA Preview (SP2013), Oct 2015 CU, SP2016 Preview, SP2016 Beta, SP2016 GA, Cloud SSA GA, 3rd Party Connectors, Feature Pack 1, Feature Pack 2, Clear Index Scripts, Remove item from results, SP2016 in Azure VM, Hybrid Taxonomy, Nov 2016 CU/PU, Roadmap items…]

Page 17: Understanding and Applying Cloud Hybrid Search

Benefits of Cloud Hybrid Search

1) Simpler, easier, and less costly to run search
• Infrastructure Savings
• Reduced Maintenance Effort
• Cloud Elasticity & Access

2) Makes finding content easy, wherever the content lives
In most Hybrid SharePoint scenarios, this is really important

Page 18: Understanding and Applying Cloud Hybrid Search

Cloud Hybrid Search

[Diagram: SharePoint Server (On-premises or Hosted) with SharePoint Content and the Cloud SSA; Office 365 with the O365 Search Index, SharePoint Online Content, and OneDrive for Business Content. Label: Just Text + Metadata]

Page 19: Understanding and Applying Cloud Hybrid Search

Case Study: Split Users with SharePoint
Large State University

Start small and grow
• Team Sites for classes, projects, research groups
• University-wide sites and applications

Hybrid
• If it can go to the cloud easily, move it now
• Legacy apps, some high-security data on-prem

“Textbook” setup
• OOB wherever possible

Page 20: Understanding and Applying Cloud Hybrid Search

[Diagram labels: Support, Sales & Marketing, Knowledge Articles, Fileshares, OneDrive, Support forum, SPO, Search Farm, SP 2013 content, SP 2010 content, SPO content, SP 2013/2016 Cloud SSA. Regions: On-premises, Office 365]

Page 21: Understanding and Applying Cloud Hybrid Search

Setting up Cloud Hybrid Search

Prereqs
• SP2016 or SP2013 with December 2015 CU
• O365 Subscription

Steps
1. Synchronize users and groups (AAD Connect)
2. Create Cloud Search Service Application
3. Onboarding: Install pre-requisites; Execute script
4. Validation: Content Sources, Validation Search Center

Page 22: Understanding and Applying Cloud Hybrid Search

On-prem: Cloud SSA

Online: O365 Search

Page 23: Understanding and Applying Cloud Hybrid Search

Use search verticals with Cloud Hybrid Search

Verticals: SharePoint Online | Everything | Support Search

The SharePoint Online vertical uses a custom result source: Local SharePoint results plus a filter which excludes results from on-premises.

TIP: Can be used during validation of hybrid search in the production tenant.

Result source query: {searchTerms} NOT(IsExternalContent:1)
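
To make the effect of that result source query concrete, here is a minimal Python sketch that shows the query text after the `{searchTerms}` token is filled in. The helper name `online_only_query` is hypothetical; in SharePoint the substitution is performed by the search system itself, so this only illustrates the resulting string.

```python
# Query template copied from the slide; {searchTerms} is SharePoint's query variable.
ONLINE_ONLY_TEMPLATE = "{searchTerms} NOT(IsExternalContent:1)"


def online_only_query(search_terms: str) -> str:
    """Hypothetical helper: return the query text after token substitution."""
    return ONLINE_ONLY_TEMPLATE.replace("{searchTerms}", search_terms)


if __name__ == "__main__":
    # Shows what a user query for "expense policy" looks like in this vertical.
    print(online_only_query("expense policy"))
```

The filter is appended to whatever the user typed, so the vertical needs no change to the search box itself.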

Page 24: Understanding and Applying Cloud Hybrid Search

Result Sources are your friend

The Support Search vertical only searches sites that are relevant to the Support team.

It uses Local SharePoint results plus a filter on which sites to include in the search results

Result source query: {searchTerms} (Path:"http://sp2010" OR Path:"file://fileshare" OR Path:"http://demohybrid.../../supportforum")

Verticals: SharePoint Online | Everything | Support Search
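
A path-scoped result source query like the one on this slide can be assembled from a list of locations. The sketch below is illustrative only: the function name `path_filter_query` is an assumption, and the two example paths are the placeholder locations from the slide rather than real endpoints.

```python
from typing import Sequence


def path_filter_query(paths: Sequence[str]) -> str:
    """Hypothetical helper: build a result source query restricted to the given paths."""
    if not paths:
        raise ValueError("at least one path is required")
    # Each location becomes a quoted Path: restriction, combined with OR.
    clauses = " OR ".join(f'Path:"{p}"' for p in paths)
    return "{searchTerms} (" + clauses + ")"


if __name__ == "__main__":
    print(path_filter_query(["http://sp2010", "file://fileshare"]))
```

Keeping the locations in a list makes it easier to add or retire a source for a vertical without hand-editing a long query string.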

Page 25: Understanding and Applying Cloud Hybrid Search

Demo


Page 26: Understanding and Applying Cloud Hybrid Search

Single node topology

Sizing
• Scaling up and out may not provide a performance improvement
• VM sizing should follow SP2016 minimum requirements (12GB, 4CPU, 80GB disk)
• Database: CrawlDB sizing is the same as on-prem (~20GB per 1M items)

Fault tolerance
• Yes - Queries in Office 365
• No - Outbound Hybrid (Queries)
• No - Crawling

[Diagram: one VM running Admin, Crawler, CPC (unused), APC (unused), Indexer (unused), and QPC, plus the SSA Databases]
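
The CrawlDB rule of thumb from the sizing notes for this topology (~20GB per 1M items) can be turned into a quick estimate. This is a rough sketch under that single assumption; the function name is hypothetical and real database growth will vary with content.

```python
# Rule of thumb quoted on the slide: roughly 20 GB of CrawlDB per 1M indexed items.
CRAWL_DB_GB_PER_MILLION_ITEMS = 20


def estimate_crawl_db_gb(item_count: int) -> float:
    """Hypothetical helper: estimate CrawlDB size in GB for a given item count."""
    if item_count < 0:
        raise ValueError("item_count must be non-negative")
    return item_count / 1_000_000 * CRAWL_DB_GB_PER_MILLION_ITEMS


if __name__ == "__main__":
    for items in (1_000_000, 5_000_000, 40_000_000):
        print(f"{items:>11,} items -> ~{estimate_crawl_db_gb(items):.0f} GB CrawlDB")
```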

Page 27: Understanding and Applying Cloud Hybrid Search

Multi-node topology

Fault tolerance
• Yes - Queries in Office 365
• Yes - Outbound Hybrid
• Yes - Crawling

Throttling
• When the backend is busy, servers are near capacity or tenant quota is reached
• No SLA on DPS, index freshness or outbound query rate

Bottlenecks
1. Customer’s uplink
2. Content repository
3. Crawler CPU

[Diagram: two VMs. One shows Admin, Crawler, and QPC; the other shows Admin, Crawler, CPC (unused), APC (unused), Indexer (unused), and QPC. Two sets of SSA Databases, linked by Synchronous Mirroring]

Page 28: Understanding and Applying Cloud Hybrid Search

Reduce your footprint

Servers required, by volume of content (indexable items):

Volume of Content       Pattern               On-prem Search Farm   Cloud Hybrid Search
0-10 million items      small                 4 App + 2 DB          1 or 2
10-40 million items     medium                12 App + 2 DB         2
40-100 million items    large                 28 App + 4 DB         2
400 million items       XL example (SP2016)   86 App + 4 DB         2 or 3

Page 29: Understanding and Applying Cloud Hybrid Search

Item Limits and Pricing

Licensing: 1M items of external content in index for every 1TB storage in O365
• 1TB included by default
• + 0.5 GB per licensed O365 user
• Can also buy the “Office 365 Extra File Storage” Add-on: $0.20/GB/Month = $200/TB/Month = $200/M items/Month

No limit on number of items from O365 in the index
Default throttling at 20M external items; current threshold at 25M

Examples
• 2000 users x 0.5 GB = 1TB + 1TB default = 2 TB total -> 2M external items indexed
• 50,000 users x 0.5 GB = 25TB + 1TB default = 26 TB total -> 26M external items indexed
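
The licensing arithmetic above can be reproduced with a few lines of Python. This is a sketch of the slide's own formula, not an official calculator: the function names are hypothetical, and 1 TB is treated as 1000 GB because that is how the slide's examples work out.

```python
GB_PER_TB = 1000             # the slide's examples treat 1 TB as 1000 GB
DEFAULT_TB = 1.0             # 1 TB included by default
GB_PER_USER = 0.5            # 0.5 GB per licensed O365 user
ITEMS_PER_TB = 1_000_000     # 1M external items per TB of O365 storage
ADDON_USD_PER_TB_MONTH = 200  # "Office 365 Extra File Storage" add-on price from the slide


def storage_tb(licensed_users: int, extra_tb: float = 0.0) -> float:
    """Total O365 storage in TB: default + per-user allowance + purchased add-on."""
    return DEFAULT_TB + licensed_users * GB_PER_USER / GB_PER_TB + extra_tb


def external_item_quota(licensed_users: int, extra_tb: float = 0.0) -> int:
    """External (on-premises) items that may be indexed for that storage."""
    return round(storage_tb(licensed_users, extra_tb) * ITEMS_PER_TB)


def addon_cost_per_month(extra_tb: float) -> float:
    """Monthly cost in USD of the purchased add-on storage."""
    return extra_tb * ADDON_USD_PER_TB_MONTH


if __name__ == "__main__":
    for users in (2000, 50_000):
        print(users, "users ->", storage_tb(users), "TB ->",
              external_item_quota(users), "external items")
```

Note that the 50,000-user example yields a quota above the 20M default throttling level mentioned on the slide, so the storage-based quota is not the only limit to check.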

Page 30: Understanding and Applying Cloud Hybrid Search

SharePoint 2016 Hybrid

Facilities built in for “split user”
• Cloud Hybrid Search
• User Profiles
• Following
• Extranet
• Compliance (DLP/e-Discovery)
• Config Experience

Built on Search

Page 31: Understanding and Applying Cloud Hybrid Search

Cloud SSA Pro/Con versus on-prem

Advantages
• Footprint/Operations
  – No servers to maintain
  – HA, DR
  – Trained support personnel
• Features
  – Cloud only workloads
  – Invite external users / sharing
  – Easy access for users from all devices

Disadvantages
• Control
  – No control on updates
  – Some control on features via first release
  – No control on indexing
  – …the train is rolling
• Features
  – Some feature gaps
  – Harder to integrate LOB data
  – Harder to customize

Page 32: Understanding and Applying Cloud Hybrid Search

Hybrid SharePoint: what, why & how

Cloud Hybrid Search & the Cloud SSA
• What, Why, & How
• Resources for SP2013 and SP2016

Going Deeper with “Case Study” Examples
• External Content
• Scaling & Performance
• Content Enrichment
• Security
• Global Deployment

New

Page 33: Understanding and Applying Cloud Hybrid Search

Cloud Hybrid Search: Adding External Content

[Diagram: External Content (on-premises and/or in the cloud) reaches the Cloud SSA through Connectors, alongside SharePoint Content on SharePoint Server (On-premises or Hosted); Office 365 holds the O365 Search Index, SharePoint Online Content, and OneDrive for Business Content. Label: Just Text + Metadata]

Also drives:
• Office Graph (Delve, ..)
• Compliance (DLP, …)

Page 34: Understanding and Applying Cloud Hybrid Search

Connectors to MANY Enterprise Systems

Plus a proven architecture and process for creating new connectors to complex systems

Databases
• IBM DB2
• Microsoft SQL Server
• MySQL
• Oracle Databases

ERP and Portal Systems
• IBM WebSphere
• Oracle Interaction (PlumTree)
• Oracle WebCenter
• SAP Business Suite
• SAP DMS

Search and Cloud Systems
• Box
• Google Drive
• Microsoft SharePoint 2013, 2010, 2007, FAST Search for SharePoint
• Microsoft Search Server
• Microsoft SharePoint Online
• PharmaCircle
• Scopus

Content and Collaboration Systems
• Alfresco
• Confluence
• CuadraSTAR
• EMC Documentum
• EMC eRoom
• HP Trim
• IBM Connections
• IBM Content Manager
• IBM Filenet P8
• IBM Lotus Notes
• Objective DMS
• OpenText LiveLink/RM
• OpenText Hummingbird / eDocs
• Oracle CMS/Stellent
• Veeva Vault
• Xerox DocuShare

Mailbox and Archiving Systems
• HP Autonomy EAS / (Zantaz)
• IBM Lotus Notes
• Microsoft Exchange
• Microsoft Exchange Public Folders
• OpenText LiveLink File Archive
• Symantec Evault

Practice Management Systems
• Aderant Practice Management
• Autonomy Worksite (iManage)
• Elite/3E
• Elite Prolaw
• KnowledgeMill OnePlace
• LegalKey
• NetDocuments
• Practical Law
• RealPractice

CRM Systems
• Microsoft Dynamics CRM
• LexisNexis Interaction CRM
• Salesforce.com/Force.com
• ServiceNow
• Any SQL Based CRM

Enterprise Social Networks
• IBM Connections
• Jive
• SalesForce Chatter
• Yammer

Page 35: Understanding and Applying Cloud Hybrid Search

External Content in O365 UX

OOTB Search Center, plus any and all tailored search experiences

Shows external content (Connectors) and consistent metadata (AutoClassifier)

Unified view across all content
- on-premises and on-line
- inside and outside SharePoint

Page 36: Understanding and Applying Cloud Hybrid Search

External content also surfaces in Delve, Modern Team Sites, etc. via the Office Graph

Current Caveats:
1) Don’t see thumbnails, just file icons
2) Have to query for it to show up

Page 37: Understanding and Applying Cloud Hybrid Search

Case Study: Cloud SSA, external content
Large global company in materials science

24,000 employees - moving all SharePoint to O365
– Keep up with Microsoft’s roadmap and innovation
– Major cost savings

Index in O365
– Documentum and File Shares on-prem

Crawl on-prem from O365
– Used Cloud SSA & BA Insight connectors

Page 38: Understanding and Applying Cloud Hybrid Search

Content Flow – on-prem indexing

[Diagram: External Content (on-premises and/or in the cloud), Connectors, File Handlers/iFilters, Crawler, Content Processing (with CEWS and Custom Processing), Index]

Bottlenecks:
1) Source systems
2) Content Processing
3) Indexer
….

Page 39: Understanding and Applying Cloud Hybrid Search

Content Flow – Cloud Hybrid Search OOB

[Diagram: External Content (on-premises and/or in the cloud), Connectors, File Handlers/iFilters, Crawler (Cloud SSA), Content Processing, Index]

Bottlenecks:
1) Uplink
2) Source systems
….

Page 40: Understanding and Applying Cloud Hybrid Search


Performance: Uplink is now the bottleneck

Want a tool?

Page 41: Understanding and Applying Cloud Hybrid Search

SCS under the hood

[Diagram components: On-Premises content source, Crawler, Content, Search farm, Azure Broker, External Endpoint, Internal Endpoint, SPO content source, Crawler, Content, Indexing API, Blob store, Document state table, Work queues, Backend API, Index/Graph]

Page 42: Understanding and Applying Cloud Hybrid Search

MICROSOFT CONFIDENTIAL – INTERNAL ONLY

Logical architecture: query

1. Jaden issues a query from Office 365. Her user token contains her online identity and group memberships.
2a. Jaden issues a query from a site on-premises. This sends over her on-premises claims to SPO.
2b. Her user token gets rehydrated with her online claims as she is authenticated against Office 365.

[Diagram: Corporate network with SP 2013 and the Cloud SSA; Office 365 with the SPO Search Index]

Page 43: Understanding and Applying Cloud Hybrid Search

Customizations with Cloud Hybrid Search

Cloud SSA

SUPPORTED
– Custom IFilter
– BCS connectors
– Partner connectors

NOT SUPPORTED
• Content that requires custom security trimming

SCS/O365

SUPPORTED
– Tenant level schema mapping
– Query rules
– Result sources

NOT SUPPORTED
• Site collection level schema mapping
• Custom security trimming
• Custom entity extraction
• Content enrichment web service

Page 44: Understanding and Applying Cloud Hybrid Search

Issues with Cloud Hybrid Search (1)
Cloud Hybrid Search "annoyances"

Performance Characteristics
• slower query latency for on-prem queries against Cloud SSA

SharePoint Online Limitations
• no synonyms
• no site-level schema
• no full trust code access

Hybrid Administration Weaknesses
• clunky metadata mapping
• can't remove on-premises search results from Cloud SSA
• trickier to test & debug crawls
• can't reset index from Cloud SSA

Be aware of these & compensate for them

And it’s getting better:
• (Fixed in August PU)
• (Semi-addressed in June PU)

Page 45: Understanding and Applying Cloud Hybrid Search

Should I run index reset?

NO!

With SP2016 (June PU), at least you are warned, and you can cause a partial reset using CSOM DeleteAllCloudHybridSearchContent()

https://blogs.technet.microsoft.com/beyondsharepoint/2016/07/07/cloud-hybrid-search-service-application-removing-items-from-the-office-365-search-index/

Page 46: Understanding and Applying Cloud Hybrid Search

Issues with Cloud Hybrid Search (2)


Limitations of Cloud SSA

Content Enrichment
• no CEWS
• no Entity Extraction

Security
• no Custom Security Trimming
• Can't crawl across Multiple Domains
• Can't Crawl SP in Classic Auth Mode

Data Sovereignty
• export-restricted content can't be put in O365 index

These can all be solved!

Page 47: Understanding and Applying Cloud Hybrid Search

No CEWS? Add it back in

[Diagram: External Content (on-premises and/or in the cloud) and SharePoint Content pass through Connectors and the Connector Framework, with AutoClassifier (app version), CEWS, and Custom Processing, to the Cloud SSA on SharePoint Server (On-premises or Hosted); Office 365 holds the O365 Search Index, SPO Content, and OneDrive Content]

Page 48: Understanding and Applying Cloud Hybrid Search

Case study: Content Enrichment
Top 5 Energy Company

“Home-built” CEWS stages
• Domain Specific Entity Extraction
• “Clean-up” processing

Using Cloud Hybrid Search, content from
• SharePoint 2010, SharePoint 2013
• Documentum
• Fileshares
• EBSCO

Using BA Insight Connector Framework
• Smart Pipeline Feature for multiple CEWS stages
• AutoClassifier for entity extraction

[Diagram: Content, Indexing Connectors, Connector Framework with Smart Pipeline (AutoClassifier, Custom Stage A, Custom Stage B, Custom Stage C), Cloud SSA]

Page 49: Understanding and Applying Cloud Hybrid Search

Cloud Hybrid Search under the covers
Security = identity sync + ACL mapping

[Diagram: On-Prem side with AD and the Cloud SSA (Crawl, Parse); Online side (Azure, O365) with AAD and SCS (Blob store, queue, Process, ACL Map)]

Page 50: Understanding and Applying Cloud Hybrid Search

Directory Synchronization

• Security principals can be managed on-premises and synched to the cloud by using the AAD sync tool.
• The object in the cloud (AAD) directory now mirrors the object in the on-premises (AD) directory.

AD
  AccountName: CORP\jaden
  SID: S-1-5-21-1212121212-1212121212-1212

AAD
  AccountName: [email protected]
  msOnline-OnPremiseSecurityIdentifier: S-1-5-21-1212121212-1212121212-1212
  PUID: PUID-XXXX-XXXXXXXXXX

Page 51: Understanding and Applying Cloud Hybrid Search

Mapping of Access Control Lists

As items are indexed in Office 365, the access control entries are looked up in the cloud directory service.

Allow: S-1-5-21-1212121212-1212121212-1212
-> Allow: PUID-XXXX-XXXXXXXXXX

• User SIDs are mapped to PUIDs
• Group SIDs are mapped to Object IDs
• "Everyone" and "Authenticated users" are mapped to "Everyone except external users"

Only AD Users and Groups, Only from one domain
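
The mapping rules above can be illustrated with a small Python sketch. It is a toy model, not the service's implementation: the `CloudPrincipal` type, the `map_acl` function, and the in-memory `directory` dictionary are all hypothetical stand-ins for the cloud directory lookup, and silently dropping entries with no synced counterpart is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CloudPrincipal:
    kind: str       # "user" or "group"
    cloud_id: str   # PUID for users, Object ID for groups


# Well-known on-premises principals and their cloud replacement, per the slide.
WELL_KNOWN = {"Everyone", "Authenticated users"}
EVERYONE_EXCEPT_EXTERNAL = "Everyone except external users"


def map_acl(on_prem_entries: List[str],
            directory: Dict[str, CloudPrincipal]) -> List[str]:
    """Map on-premises allow entries (SIDs or well-known names) to cloud entries."""
    mapped = []
    for entry in on_prem_entries:
        if entry in WELL_KNOWN:
            mapped.append(EVERYONE_EXCEPT_EXTERNAL)
        elif entry in directory:
            # User SIDs resolve to PUIDs, group SIDs to Object IDs.
            mapped.append(directory[entry].cloud_id)
        # Assumption: entries with no synced cloud counterpart are dropped.
    return mapped


if __name__ == "__main__":
    directory = {
        "S-1-5-21-1212121212-1212121212-1212":
            CloudPrincipal("user", "PUID-XXXX-XXXXXXXXXX"),
    }
    print(map_acl(["S-1-5-21-1212121212-1212121212-1212", "Everyone"], directory))
```

The sketch also shows why unsynchronized principals matter: an entry that cannot be resolved in the cloud directory cannot grant access in the Office 365 index.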

Page 52: Understanding and Applying Cloud Hybrid Search

Case Study: Crawling Cross-Domain
European Telecom Provider

Multiple Domains
• Can only sync one with O365
• No way to change entitlements OOB

Using 2 BA Insight connectors
• SharePoint 2010
• Yammer

Convert entitlements when crawling

Page 53: Understanding and Applying Cloud Hybrid Search

A global single index solution

[Diagram: five Cloud SSAs feeding a single global index]

BUT export-restricted content can’t be in the global index

Page 54: Understanding and Applying Cloud Hybrid Search

Issues with Cloud Hybrid Search OOB

Limitations of Cloud SSA, each with the BA Insight Solution

Content Enrichment
• no CEWS
• no Entity Extraction
Solution: Connector Framework, AutoClassifier

Security
• no Custom Security Trimming
• Can't crawl across Multiple Domains
• Can't Crawl SP in Classic Auth Mode
Solution: Connector Framework
• can 'map down' to AD groups
• can 'map across' cross-domain
• can crawl and map security

Data Sovereignty
• export-restricted content can't be put in O365 index
Solution: Federator

Page 55: Understanding and Applying Cloud Hybrid Search

Hybrid SharePoint: what, why & how

Cloud Hybrid Search & the Cloud SSA
• What, Why, & How
• Resources for SP2013 and SP2016

Going Deeper with “Case Study” Examples
• External Content
• Scaling & Performance
• Content Enrichment
• Security
• Global Deployment

New

Page 56: Understanding and Applying Cloud Hybrid Search

Key Considerations for Hybrid: Workloads, Environment, Data, Customizations

Availability of features Online versus On-Premises on particular workloads

Significant investments in customization of On-Premises workloads

Concerns over global network performance with remote sites

Regulatory considerations

Manageability concerns

Page 57: Understanding and Applying Cloud Hybrid Search

Succeeding with Hybrid Search

It’s still a project

There are resources and tools to help

Expect to iterate

Page 58: Understanding and Applying Cloud Hybrid Search
Page 59: Understanding and Applying Cloud Hybrid Search
Page 60: Understanding and Applying Cloud Hybrid Search