© 2012 Microsoft FILETABLE AND SEMANTIC SEARCH IN SQL SERVER 2012 Michael Rys Principal Program Manager Microsoft Corp @SQLServerMike

FileTable and Semantic Search in SQL Server 2012



SQL Saturday 109 Presentation on FileTable and Semantic Search in SQL Server 2012


Page 1: FileTable and Semantic Search in SQL Server 2012

© 2012 Microsoft

FILETABLE AND SEMANTIC SEARCH IN SQL SERVER 2012

Michael Rys, Principal Program Manager, Microsoft Corp, @SQLServerMike


MY FAVORITE BEYOND RELATIONAL APPLICATION

Structured and unstructured Search

Related/“Semantic” Search


BEYOND RELATIONAL DATA

Building and maintaining applications with relational and non-relational data is hard

Pain Points

Complex integration

Duplicated functionality

Compensation for unavailable services

Goals

Reduce the cost of managing all data

Simplify the development of applications over all data

Provide management and programming services for all data


RICH UNSTRUCTURED DATA IN SQL SERVER 2012

• 80% of all data is not stored in databases! Most of it is “unstructured”

• Make SQL Server the preferred choice for managing unstructured data and allow building rich application experiences on top of it

• Address important customer requests for capabilities and rich services for Rich Unstructured Data (RUDS):

o Scale up storage and search to 100 million to 500 million documents
o Easy use of and access to unstructured data from all applications
o Rich insight into unstructured data to make better decisions


DEMO

Teaser: MySemanticSearch, http://mysemanticsearch.codeplex.com


RICH UNSTRUCTURED DATA & SERVICES ECOSYSTEM

[Diagram: rich unstructured data and services stack. Rich services: Fulltext Search, Semantic Similarity Search. Scale-up solutions: multiple containers (Disk1, Disk2, Disk3). Database applications: transactional access to BLOBs through DB FileStreams, integrated backup/replication/AlwaysOn, integrated administration. Windows apps: SMB share files/folders, FileStream API, streaming Win32 access. Customer applications: Remote BLOB Storage via the SQL RBS API with Azure, Centera and SQL FILESTREAM libraries (stores: Azure, Centera, SQL DB). Also shown: FileStreams, FileTable, SQL Apps.]


DEMO

Integrated Management of documents in SQL Server 2012


FILETABLE OVERVIEW

FileTable: A Table of Files/Directories

User-created table with a fixed schema

Contains FILESTREAM data and file attributes

Each row represents a file or a directory

System-defined constraints maintain the integrity of the tree

File/directory hierarchy view through a Windows share

Supports Win32 APIs for file/directory management

DB storage is transparent to Win32 applications

SMB-level application compatibility

Virtual network name (VNN) path support for transparent Win32 application failover

[Diagram: FileTable folder hierarchy exposed through the FILESTREAM share MSSQLSERVER. Database directories: Private Docs (Database1), Office Docs (Database2). FileTable directories: LogFiles, Documents, Media, each containing a user-defined directory structure. Example path: \\my_machine\MSSQLSERVER\Office Docs\Documents]


CREATING A FILETABLE

Prerequisites:

Enable FILESTREAM

Create FILESTREAM share and filegroup

Enable non-transactional access at the DB level:
ALTER DATABASE Contoso SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'Contoso')

Create FileTable:
CREATE TABLE Contoso..Documents AS FILETABLE
WITH (FILETABLE_DIRECTORY = N'Document Library')

Access at \\<machine name>\<FILESTREAM share>\Contoso\Document Library\
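Putting the prerequisites together, an end-to-end setup script could look like the following. This is a sketch under assumptions: the file paths, the logical file names and the filegroup name ContosoFS are hypothetical, and FILESTREAM must also be enabled for the Windows service in SQL Server Configuration Manager before the instance-level setting has any effect.

```sql
-- Instance level: 2 = FILESTREAM enabled for T-SQL and Win32 streaming access.
-- (FILESTREAM must also be enabled for the service in SQL Server Configuration Manager.)
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
GO

-- Database with a FILESTREAM filegroup; all names and paths are hypothetical.
-- The parent folder C:\Data must exist; the container folder ContosoFS must not.
CREATE DATABASE Contoso
ON PRIMARY
    (NAME = Contoso_data, FILENAME = 'C:\Data\Contoso.mdf'),
FILEGROUP ContosoFS CONTAINS FILESTREAM
    (NAME = Contoso_fs, FILENAME = 'C:\Data\ContosoFS')
LOG ON
    (NAME = Contoso_log, FILENAME = 'C:\Data\Contoso_log.ldf')
WITH FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'Contoso');
GO

USE Contoso;
GO

-- The FileTable; its files appear under the 'Document Library' directory.
CREATE TABLE dbo.Documents AS FILETABLE
WITH (FILETABLE_DIRECTORY = N'Document Library');
GO

-- UNC path of the FileTable root directory.
SELECT FileTableRootPath(N'dbo.Documents') AS root_path;
```

Setting the FILESTREAM options in CREATE DATABASE avoids the separate ALTER DATABASE step; on an existing database the ALTER DATABASE form shown on the slide does the same job.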


MODIFYING A FILETABLE

FileTable has a fixed schema: columns and system-defined constraints cannot be altered or dropped

Allows user-defined indexes/constraints/triggers

Disabling/enabling the FileTable namespace:
ALTER TABLE Documents DISABLE FILETABLE_NAMESPACE

Disables all system-defined constraints and Win32 access to the FileTable

Useful for bulk-loading/re-organization of data

A FileTable can be dropped like any other table

Catalog views can be used for obtaining metadata
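The namespace switch and the metadata views can be combined as in this sketch, which assumes the Documents FileTable lives in the dbo schema; the bulk-load step itself is left as a comment, and the DROP is commented out so the script is safe to run.

```sql
-- Turn off system-defined constraints and Win32 access before a bulk load.
ALTER TABLE dbo.Documents DISABLE FILETABLE_NAMESPACE;
GO

-- ... bulk-load or reorganize rows here ...

-- Re-enable the namespace (and with it Win32 access) afterwards.
ALTER TABLE dbo.Documents ENABLE FILETABLE_NAMESPACE;
GO

-- Metadata: every FileTable in the current database, its directory and state.
SELECT OBJECT_NAME(object_id) AS filetable_name,
       directory_name,
       is_enabled
FROM sys.filetables;

-- Database-level FILESTREAM settings (non-transacted access level, directory).
SELECT DB_NAME(database_id) AS database_name,
       non_transacted_access_desc,
       directory_name
FROM sys.database_filestream_options;

-- A FileTable is dropped like any other table:
-- DROP TABLE dbo.Documents;
```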


DATA ACCESS – FILE SYSTEM ACCESS

FileTable hierarchy is visible through Filestream share

\\machine\<FILESTREAMshare>\<Database_directory>\<FileTable_Directory>\...

Provides transparent Win32 API & File/Directory Management capabilities

e.g., Microsoft Word can create/open/save files; xcopy can copy directory trees into the database

Win32 API operations are non-transactional

Operations cannot be part of any user transaction

Win32 operations are intercepted by SQL Server at the File system level

e.g. File/Directory creation/deletion => insert/delete into FileTable

Full locking/concurrency semantics with other accesses

Allows in-place update of file stream data/File attributes

Transactional FILESTREAM APIs can also be used.
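For the transactional route, a client typically obtains a FILESTREAM path and a transaction context in T-SQL and then streams through the Win32 OpenSqlFilestream API (or the .NET SqlFileStream class). A minimal sketch, assuming a Documents FileTable in the dbo schema that contains a hypothetical file readme.txt:

```sql
-- In practice the client runs this in its own session and keeps the
-- transaction open while it streams the file contents.
BEGIN TRANSACTION;

SELECT file_stream.PathName()               AS filestream_path,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS transaction_context
FROM dbo.Documents
WHERE name = N'readme.txt'      -- hypothetical file
  AND is_directory = 0;

-- The client passes both values to OpenSqlFilestream (Win32) or to
-- SqlFileStream (.NET) to read or write the file inside this transaction.

COMMIT TRANSACTION;
```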


DATA ACCESS – T-SQL ACCESS

Normal INSERT/UPDATE/DELETE statements can be used to manipulate a FileTable

FileTable namespace integrity constraints are enforced

Set-based operations on the file attributes add value

Built-in functions:

GetFileNamespacePath() – UNC path for a file/directory

FileTableRootPath() – UNC path to the FileTable root

GetPathLocator() – path_locator value for a file/directory

DDL/DML triggers are supported

DML triggers on a FileTable cannot update any FileTables
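A short sketch of plain T-SQL access, assuming a Documents FileTable in the dbo schema; the directory name Reports and the file readme.txt are hypothetical. Rows inserted without an explicit path_locator are placed in the FileTable root.

```sql
-- Create a directory and a file with ordinary INSERTs (hypothetical names).
INSERT INTO dbo.Documents (name, is_directory)
VALUES (N'Reports', 1);

INSERT INTO dbo.Documents (name, file_stream)
VALUES (N'readme.txt', CAST('Hello from T-SQL' AS varbinary(max)));

-- Set-based query over file attributes, with each file's full UNC path.
SELECT name,
       cached_file_size,
       last_write_time,
       file_stream.GetFileNamespacePath(1) AS unc_path
FROM dbo.Documents
WHERE is_directory = 0
  AND last_write_time >= DATEADD(day, -7, SYSDATETIMEOFFSET())
ORDER BY last_write_time DESC;

-- FileTable root, and the path_locator of a file looked up by its UNC path.
DECLARE @root nvarchar(4000) = FileTableRootPath(N'dbo.Documents');
SELECT @root AS root_path,
       GetPathLocator(@root + N'\readme.txt') AS readme_locator;
```

The same rows are immediately visible through the Windows share, since both access paths operate on the one table.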


MANAGING FILETABLE

DB Backup/Restore operations include FileTable data

Point-in-time restore may contain more recent FILESTREAM data due to non-transactional updates during backup

FileTables are secured like any other user table

The same security is enforced for Win32 access

Data loading:

Windows tools like xcopy/robocopy or drag-and-drop operations in Windows Explorer can be used

BCP operations are supported for direct T-SQL data inserts

SSMS supports FileTable creation/exploration
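Because a FileTable is managed like any other table, permissions and backups use the ordinary statements. A sketch, assuming a Documents FileTable in the dbo schema of the Contoso database; the role name and backup path are hypothetical.

```sql
-- Table permissions apply to T-SQL access and, per the slide, to Win32 access too.
CREATE ROLE DocumentReaders;                 -- hypothetical role
GRANT SELECT ON dbo.Documents TO DocumentReaders;
GO

-- A regular database backup includes the FileTable's FILESTREAM data.
BACKUP DATABASE Contoso
TO DISK = N'C:\Backup\Contoso.bak'           -- hypothetical path
WITH INIT;
```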


MANAGING FILETABLE – HIGH AVAILABILITY

SQL Server 2012 AlwaysOn is fully supported

Transparent data failover:

FileTables can be configured with multiple secondary nodes

Both sync and async data replication are supported

File data and metadata are available on the secondary in case of failover

Transparent application failover:

Virtual network name (VNN) path support for transparent Win32 application failover

Applications use the \\VNN\Share\db\... path

Applications are automatically redirected to the secondary in case of failover

Restrictions:

FileTables cannot participate in “Read-only” replicas.


FILETABLE RESTRICTIONS

FileTables cannot be partitioned

Merge/transactional replication is not supported

RCSI/snapshot isolation mode

Applications cannot modify file stream data in FileTables

Win32 application compatibility:

Memory-mapped files, directory notifications and links are not supported


UNSTRUCTURED DATA SCALE-UP: MULTIPLE CONTAINERS FOR FILESTREAM DATA

SQL Server 2008 R2: only one storage container per FILESTREAM filegroup

Limits storage capacity scaling and I/O scaling

SQL Server 2012: support for multiple storage containers per filegroup

DDL changes to CREATE/ALTER DATABASE statements

Ability to set MAXSIZE for the containers

DBCC SHRINKFILE EMPTYFILE support

Scaling flexibility:

Storage scaling by adding additional storage drives

I/O scaling with multiple spindles
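The DDL changes listed above can be sketched as follows, assuming an existing FILESTREAM filegroup ContosoFS in the Contoso database; the logical file name, the path on the second drive and the size cap are hypothetical.

```sql
-- Add a second container on another drive to the existing FILESTREAM
-- filegroup and cap its size (name, path and size are hypothetical).
ALTER DATABASE Contoso
ADD FILE (NAME = Contoso_fs2, FILENAME = 'D:\Data\ContosoFS2', MAXSIZE = 100GB)
TO FILEGROUP ContosoFS;
GO

-- Later: move the data out of that container into the remaining
-- containers of the filegroup, then remove it.
USE Contoso;
GO
DBCC SHRINKFILE (Contoso_fs2, EMPTYFILE);
GO
ALTER DATABASE Contoso REMOVE FILE Contoso_fs2;
```

Placing each container on a separate spindle is what gives the I/O scaling; the storage scaling comes from simply adding containers as drives are added.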


UNSTRUCTURED DATA: MULTIPLE CONTAINERS

Using multiple spindles to achieve better I/O scalability


RUDS SCALE-UP: FILESTREAM PERF/SCALE

Improved performance of T-SQL and file I/O access

Various enhancements to improve read/write throughput

5-fold increase in read throughput

Linear scaling with a large number of concurrent threads



SUMMARY: FILETABLE

Application Compatibility for Windows Applications

Windows applications run on top of files stored in FileTables with no modifications

Relational value proposition:

Provide integrated administration and services

Backup, Log Shipping, HA-DR, Full text and Semantic search, …

T-SQL orthogonality:

File/folder attributes surfaced through relational columns

Power of set-based operations, Policy Management, Reporting, etc.

FileNamespace Hierarchy management


FULL TEXT SEARCH IMPROVEMENTS IN SQL SERVER 2012

Improved Performance and Scale:

Scale-up to 350M documents

iFTS query performance 7 to 10 times faster than in SQL Server 2008

Worst-case iFTS query response times < 3 sec for the corpus

On par with or better than the main database search competitors

New Functionality:

Property Search

Customizable NEAR

New word breakers: updated existing word breakers, added Czech and Greek

Innovation in Search: Semantic Similarity Search


FULLTEXT SEARCH PERFORMANCE & SCALE IMPROVEMENTS

Architectural Improvements:

Improved internal implementation

Queries no longer block index updates

Improved query plans: better plans for common queries

Fulltext predicate folding

Parallel plan execution

Indexing and querying tested at scale up to 350 million documents with response times under ~2 sec

~3x better throughput without DML and ~9x better throughput with DML

Scales easily with an increasing number of connections


SCALE-UP: FULL-TEXT SEARCH

Queries over a 350M-document database with random DML running in the background: SQL Server 2012 beats SQL Server 2005 with a scale factor of more than 2x and, on average, 60x better throughput

[Chart: 2005/8 vs 2012]


SCALE-UP: FULL-TEXT SEARCH

Average query execution time (avgExecTime, ms) for various numbers of connections (50 to 2000 users) in a customer playback benchmark

[Chart: 2005/8 vs 2012]


FULLTEXT PROPERTY SCOPED SEARCH

• Set up once per server instance to load the Office filters:
exec sp_fulltext_service 'load_os_resources', 1;
go
exec sp_fulltext_service 'restart_all_fdhosts';
go

• Create a property list:
CREATE SEARCH PROPERTY LIST p1;

• Add the properties to be extracted:
ALTER SEARCH PROPERTY LIST [p1] ADD N'System.Author' WITH
(PROPERTY_SET_GUID = 'f29f85e0-4ff9-1068-ab91-08002b27b3d9', PROPERTY_INT_ID = 4, PROPERTY_DESCRIPTION = N'System.Author');

• Create or alter the full-text index to specify the property list to be extracted:
ALTER FULLTEXT INDEX ON fttable... SET SEARCH PROPERTY LIST = [p1];

• Query for properties:
SELECT * FROM fttable WHERE CONTAINS(PROPERTY(ftcol, 'System.Author'), 'fernlope');

New search filter for document properties:
CONTAINS (PROPERTY ( { column_name }, 'property_name' ), 'contains_search_condition' )


FULL-TEXT CUSTOMIZABLE NEAR

OLD NEAR SYNTAX
select * from fttable where contains(*, 'test near Space')

NEW NEAR USAGES

• SPECIFY DISTANCE
select * from fttable where contains(*, 'near((test, Space), 5, false)')

• REDUCE DISTANCE
select * from fttable where contains(*, 'near((test, Space), 2, false)')

• SPECIFY THAT WORD ORDER MATTERS
select * from fttable where contains(*, 'near((test, Space), 5, true)')
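The customizable NEAR also accepts more than two terms and MAX as the distance, and CONTAINSTABLE returns a relevance rank for each match. A sketch against the slide's fttable, where the third term launch is a hypothetical addition:

```sql
-- Three terms that must appear in the given order, any distance apart (MAX).
-- 'launch' is a hypothetical extra term. CONTAINSTABLE returns the
-- full-text key of each matching row and its RANK.
SELECT ft.[KEY], ft.[RANK]
FROM CONTAINSTABLE(fttable, *, 'NEAR((test, Space, launch), MAX, TRUE)') AS ft
ORDER BY ft.[RANK] DESC;
```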


STATISTICAL SEMANTIC SEARCH

Semantic insight into textual content

Uses language models to find the most important keywords in a document; no need to build brittle ontologies!

Statistically prominent keywords: autogenerated tag clouds

Potentially related content based on extracted keywords, such as:

Similar products (based on description)

Similar jobs or applicants

Similar support incidents (based on call logs)

Potential solutions (based on similar incidents)

First-class usage experience: efficient linear algorithms

Integrated with FTS and SQL: new rowset functions return all results through SQL queries
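To show how the rowset functions surface key phrases, here is a sketch that assumes the semantic language statistics database is already registered and uses a hypothetical table dbo.Docs; the table, column, catalog and index names and the document key 1 are all illustrative.

```sql
-- Hypothetical document table. The full-text key must be a unique,
-- single-column, non-nullable index (PK_Docs); FileExt tells the
-- full-text filters the document type of each row.
CREATE TABLE dbo.Docs (
    DocID   int IDENTITY(1,1) CONSTRAINT PK_Docs PRIMARY KEY,
    Title   nvarchar(200)  NOT NULL,
    FileExt nvarchar(8)    NOT NULL,   -- e.g. N'.docx'
    Content varbinary(max) NOT NULL
);
GO

CREATE FULLTEXT CATALOG DocsCatalog AS DEFAULT;
GO

-- STATISTICAL_SEMANTICS adds key-phrase and similarity indexing
-- on top of the regular full-text index.
CREATE FULLTEXT INDEX ON dbo.Docs
    (Content TYPE COLUMN FileExt LANGUAGE 1033 STATISTICAL_SEMANTICS)
    KEY INDEX PK_Docs
    WITH (CHANGE_TRACKING = AUTO);
GO

-- Tag cloud: the ten most statistically prominent key phrases of document 1.
SELECT TOP (10) keyphrase, score
FROM SEMANTICKEYPHRASETABLE(dbo.Docs, Content, 1)
ORDER BY score DESC;
```

Omitting the document key argument of SEMANTICKEYPHRASETABLE returns key phrases for all indexed documents, which is the basis for a corpus-wide tag cloud.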


DEMO

Semantic Extraction and Relationships

FullText Search in SQL Server 2012


SEMANTIC SIMILARITY

• Input: text such as varchar, Office, PDF, HTML, email…
• Output: rowset functions used with standard SQL queries

Illustrating example:

Source Table
Key | Title | Document
D1 | Annual Budget | …
D2 | Corporate Earnings | …
D3 | Marketing Reports | …
… | … | …

Keyword Index (Full-Text)
ID | Keyword | Colid | … | compDocid | CompOc | CompPid
K1 | revenue | 1 | … | 10,23,123 | (1,4),(5,8),(1,34) | 2,5,6,8,4,3
K2 | growth | 1 | … | 10,23,123 | (1,5),(5,9),(1,34) | 2,5,6,8,5,4
… | … | … | … | … | … | …

Keyphrases
ID | Keyword
T1 | revenue
T2 | growth
T3 | Windows
T4 | Azure
… | …

KeyphraseDocuments
ID | DocID
T1 (revenue) | D1 (Annual Budget)
T2 (growth) | D2 (Corporate Earnings)
T3 (Windows) | D3 (Marketing Reports)
… | …
T1 (revenue) | D7 (Finance Report)
… | …
T3 (Windows) | D11 (Azure Strategy)
T4 (Azure) | D11 (Azure Strategy)

DocumentSimilarity
DocID | MatchedDocID
D1 (Annual Budget) | D2 (Corporate Earnings)
D1 (Annual Budget) | D7 (Finance Report)
D3 (Marketing Reports) | D11 (Azure Strategy)
… | …

[Diagram: full-text and semantic processing of the source table, combined with language models, produces the tables above (numbered steps 1, 2a, 2b, 3); sample extracted keywords: quarter, record, revenue…]
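The DocumentSimilarity and key-phrase results in the illustration correspond to rowset functions that can be queried directly. A sketch, assuming a hypothetical table dbo.Docs with integer key DocID, a Title column, and a full-text index on a Content column created WITH STATISTICAL_SEMANTICS; the document keys 1 and 2 are illustrative.

```sql
-- Documents most similar to document 1, joined back to their titles
-- (in the slide's illustration: Annual Budget -> Corporate Earnings, Finance Report).
SELECT TOP (10) d.Title, s.score
FROM SEMANTICSIMILARITYTABLE(dbo.Docs, Content, 1) AS s
JOIN dbo.Docs AS d
  ON d.DocID = s.matched_document_key
ORDER BY s.score DESC;

-- The key phrases that make documents 1 and 2 similar to each other.
SELECT keyphrase, score
FROM SEMANTICSIMILARITYDETAILSTABLE(dbo.Docs, Content, 1, Content, 2)
ORDER BY score DESC;
```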


SEMANTIC EXTRACTION: END-2-END EXPERIENCE

• Downloadable language statistical database with registration stored procedure
• Setup along with Full-Text
• Metadata / catalog views
• System-level DMVs for progress state and usage
• Manageability through SSMS and SMO
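The registration and monitoring steps can be sketched as follows. The attach paths for the downloaded semanticsdb files are hypothetical and depend on where the package was installed.

```sql
-- Attach the downloaded language statistics database (paths are hypothetical).
CREATE DATABASE semanticsdb
ON (FILENAME = 'C:\Microsoft Semantic Language Database\semanticsdb.mdf')
LOG ON (FILENAME = 'C:\Microsoft Semantic Language Database\semanticsdb_log.ldf')
FOR ATTACH;
GO

-- Register it with the instance for statistical semantic indexing.
EXEC sp_fulltext_semantic_register_language_statistics_db @dbname = N'semanticsdb';
GO

-- Catalog views: the registered statistics database and supported languages.
SELECT * FROM sys.fulltext_semantic_language_statistics_database;
SELECT * FROM sys.fulltext_semantic_languages;

-- DMV: progress of semantic similarity index population in the current database.
SELECT * FROM sys.dm_fts_semantic_similarity_population;
```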


KEY TAKEAWAYS

SQL Server’s unstructured data support is a key strategy that enables you to build complex data applications that go beyond relational data!

Content and collaboration, eDiscovery, healthcare, document management, etc.


RELATED CONTENT

SQL Server 2012 whitepapers and information: http://www.sqlserverlaunch.com

Channel 9 DataBound Episode 2: http://channel9.msdn.com

MySemanticSearch demo: http://mysemanticsearch.codeplex.com

More demo data sets and demo scripts: http://blogs.msdn.com/b/sqlfts/archive/2011/07/21/introducing-fulltext-statistical-semantic-search-in-sql-server-codename-denali-release.aspx

Microsoft Virtual Academy recording: coming soon!

Find Me Later At…

• On Twitter: @SQLServerMike
• Blog: http://sqlblog.com/blogs/michael_rys
