SQL Saturday 109 Presentation on FileTable and Semantic Search in SQL Server 2012
© 2012 Microsoft
FILETABLE AND SEMANTIC SEARCH IN SQL SERVER 2012
Michael Rys
Principal Program Manager
Microsoft Corp
@SQLServerMike
MY FAVORITE BEYOND RELATIONAL APPLICATION
Structured and unstructured Search
Related/”Semantic” Search
BEYOND RELATIONAL DATA
Building and Maintaining Applications with relational and non-relational data is hard
Pain Points
  Complex integration
  Duplicated functionality
  Compensation for unavailable services
Goals
  Reduce the cost of managing all data
  Simplify the development of applications over all data
  Provide management and programming services for all data
RICH UNSTRUCTURED DATA IN SQL SERVER 2012
• 80% of all data is not stored in databases! Most of it is “unstructured”
• Make SQL Server the preferred choice for managing Unstructured Data and allow building Rich Application Experience on top
• Address important customer requests for Capabilities and rich services for Rich Unstructured Data (RUDS)
o Scale up storage and search to 100 million to 500 million documents
o Easy use of and access to unstructured data from all applications
o Rich insight into unstructured data to make better decisions
DEMO
Teaser: MySemanticSearch
http://mysemanticsearch.codeplex.com
RICH UNSTRUCTURED DATA & SERVICES ECOSYSTEM
[Diagram: building blocks of the rich unstructured data ecosystem]
Rich services: Fulltext Search, Semantic Similarity Search
Scale-up solutions: Database with multiple containers (Disk1, Disk2, Disk3)
Database applications (SQL apps): transactional access, Blobs, DB FileStreams
Windows apps: SMB share files/folders, FileStream API, streaming Win32 access
Customer application with Remote BLOB Storage: SQL RBS API; Azure lib, Centera lib, SQL FILESTREAM lib; stores: Azure, Centera, SQL DB
Storage features: FileStreams, FileTable
Integrated Backup/Replication/AlwaysOn; Integrated Administration
DEMO
Integrated Management of documents in SQL Server 2012
FILETABLE OVERVIEW
FileTable: A Table of Files/Directories
User created Table with a fixed schema
contains FILESTREAM and File Attributes
Each row represents a File or a Directory
System defined constraints maintain the tree integrity
File/Directory hierarchy view through a Windows Share
Supports Win32 APIs for File/Directory Management
DB Storage is Transparent to Win32 applications
SMB level of application compatibility
Virtual network name (VNN) path support for transparent Win32 application failover
[Diagram: FileTable folder hierarchy]
FILESTREAM share: MSSQLSERVER
Database directories: Private Docs (Database1), Office Docs (Database2)
FileTable directories: LogFiles (FileTable), Documents (FileTable), Media (FileTable)
User-defined directory structure below each FileTable directory
Example path: \\my_machine\MSSQLSERVER\Office Docs\Documents
CREATING A FILETABLE
Pre-requisites
  Enable FILESTREAM
  Create FILESTREAM share and filegroup
  Enable non-transactional access at the DB level
    ALTER DATABASE Contoso SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'Contoso')
Create FileTable
    CREATE TABLE Contoso..Documents AS FILETABLE
    WITH (FILETABLE_DIRECTORY = N'Document Library')
Access at \\<machine name>\<FILESTREAM share>\Contoso\Document Library\
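The first two prerequisites are listed on the slide without code. The sketch below shows one way they might look in T-SQL. The filegroup name ContosoFS, the logical file name ContosoFS1, and the folder C:\FSData are assumptions for illustration and do not come from the deck. FILESTREAM must also be enabled for the Windows service in SQL Server Configuration Manager, which T-SQL cannot do.

```sql
-- Instance level: allow both T-SQL and Win32 streaming access to FILESTREAM data.
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
GO

-- Database level: add a FILESTREAM filegroup and one container (a folder on disk).
-- ContosoFS, ContosoFS1 and C:\FSData are hypothetical names.
-- The parent folder must already exist; the last folder in the path must not.
ALTER DATABASE Contoso ADD FILEGROUP ContosoFS CONTAINS FILESTREAM;
GO
ALTER DATABASE Contoso
ADD FILE (NAME = N'ContosoFS1', FILENAME = N'C:\FSData\ContosoFS1')
TO FILEGROUP ContosoFS;
GO

-- Check the non-transactional access level and directory name per database.
SELECT DB_NAME(database_id) AS database_name,
       non_transacted_access_desc,
       directory_name
FROM sys.database_filestream_options;
```

Once these steps and the ALTER DATABASE ... SET FILESTREAM statement have run, the CREATE TABLE ... AS FILETABLE statement can place its data in the new filegroup.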
MODIFYING A FILETABLE
FileTable has a fixed schema
  Columns and system-defined constraints cannot be altered or dropped
  User-defined indexes, constraints, and triggers are allowed
Disabling/enabling the FileTable namespace
    ALTER TABLE Documents DISABLE FILETABLE_NAMESPACE
  Disables all system-defined constraints and Win32 access to the FileTable
  Useful for bulk loading or reorganizing data
A FileTable can be dropped like any other table
Catalog views can be used for obtaining metadata
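As a sketch of the disable/enable cycle and the metadata lookup, assuming the Documents FileTable lives in the dbo schema (the schema is an assumption; the slide leaves it implicit):

```sql
-- Turn off the system-defined constraints and Win32 access before a bulk load.
ALTER TABLE dbo.Documents DISABLE FILETABLE_NAMESPACE;

-- ... bulk load or reorganize rows here ...

-- Turn the namespace back on; the system-defined constraints are enforced again.
ALTER TABLE dbo.Documents ENABLE FILETABLE_NAMESPACE;

-- Catalog view: one row per FileTable in the current database.
SELECT OBJECT_NAME(object_id) AS filetable_name,
       is_enabled,
       directory_name
FROM sys.filetables;

-- Catalog view: the system-defined objects (constraints, indexes) behind each FileTable.
SELECT OBJECT_NAME(parent_object_id) AS filetable_name,
       OBJECT_NAME(object_id)        AS system_defined_object
FROM sys.filetable_system_defined_objects;
```

Re-enabling can fail if rows loaded in the meantime violate the namespace constraints, so it is worth validating the loaded data first.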
DATA ACCESS – FILE SYSTEM ACCESS
FileTable hierarchy is visible through Filestream share
\\machine\<FILESTREAMshare>\<Database_directory>\<FileTable_Directory>\...
Provides transparent Win32 API & File/Directory Management capabilities
e.g. MS Word can create/open/save files; xcopy can copy directory trees into the database
Win32 API operations are non-transactional
Operations cannot be part of any user transactions
Win32 operations are intercepted by SQL Server at the File system level
e.g. File/Directory creation/deletion => insert/delete into FileTable
Full locking/concurrency semantics with other accesses
Allows in-place update of file stream data/File attributes
Transactional FILESTREAM APIs can also be used.
DATA ACCESS – T-SQL ACCESS
Normal INSERT/UPDATE/DELETE allowed for FileTable manipulation
  FileTable namespace integrity constraints are enforced
Set-based operations on the file attributes (value add)
Built-in functions
  GetFileNamespacePath() – UNC path for a file/directory
  FileTableRootPath() – UNC path to the FileTable root
  GetPathLocator() – path_locator value for a file/directory
DDL/DML triggers are supported
  DML triggers on a FileTable cannot update any FileTables
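A minimal sketch of T-SQL access using the built-in functions, assuming a FileTable named dbo.Documents. The file name hello.txt is hypothetical. The column names (name, file_stream, is_directory, cached_file_size, last_write_time, path_locator) are part of the fixed FileTable schema.

```sql
-- UNC path to the root of the FileTable.
SELECT FileTableRootPath(N'dbo.Documents') AS root_path;

-- Create a small file with a plain INSERT; it appears in the Windows share as hello.txt.
INSERT INTO dbo.Documents (name, file_stream)
VALUES (N'hello.txt', CAST('Hello FileTable' AS varbinary(max)));

-- List every file and directory with its full UNC path.
SELECT name,
       is_directory,
       file_stream.GetFileNamespacePath(1) AS full_unc_path
FROM dbo.Documents;

-- Set-based query over file attributes: the ten largest files.
SELECT TOP (10) name, cached_file_size, last_write_time
FROM dbo.Documents
WHERE is_directory = 0
ORDER BY cached_file_size DESC;

-- Go from a UNC path back to the row through its path_locator.
DECLARE @path nvarchar(4000) = FileTableRootPath(N'dbo.Documents') + N'\hello.txt';
SELECT name, cached_file_size
FROM dbo.Documents
WHERE path_locator = GetPathLocator(@path);
```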
MANAGING FILETABLE
DB Backup/Restore operations include FileTable data
A point-in-time restore may contain more recent FILESTREAM data due to non-transactional updates during backup
FileTables are secured similar to any other user tables
Same security is enforced for Win32 access also
Data loading
  Windows tools like xcopy/robocopy or drag-and-drop operations in Windows Explorer can be used
  BCP operations are supported for direct T-SQL data inserts
SSMS supports FileTable creation/exploration
MANAGING FILETABLE – HIGH AVAILABILITY
SQL Server 2012 AlwaysOn is fully supported
Transparent data failover
  FileTables can be configured with multiple secondary nodes
  Both sync and async data replication are supported
  File data and metadata are available on the secondary in case of failover
Transparent application failover
  Virtual network name (VNN) path support for transparent Win32 application failover
  Applications use the \\VNN\Share\db\... path
  Applications are automatically redirected to the secondary in case of failover
Restrictions
  FileTables cannot participate in “read-only” replicas
FILETABLE RESTRICTIONS
FileTables cannot be partitioned
Merge/transactional replication is not supported
RCSI/Snapshot isolation mode
  Applications cannot modify file stream data in FileTables
Win32 application compatibility
  Memory-mapped files, directory notifications, and links are not supported
UNSTRUCTURED DATA SCALE-UP: MULTIPLE CONTAINERS FOR FILESTREAM DATA
SQL 2008 R2
  Only one storage container per FILESTREAM filegroup
  Limits storage capacity scaling and I/O scaling
SQL Server 2012
  Support for multiple storage containers per filegroup
  DDL changes to the CREATE/ALTER DATABASE statements
  Ability to set a maximum size (MAXSIZE) for the containers
  DBCC SHRINKFILE EMPTYFILE support
Scaling flexibility
  Storage scaling by adding additional storage drives
  I/O scaling with multiple spindles
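A sketch of adding a second container on another drive and draining the first, assuming a database Contoso with an existing FILESTREAM filegroup ContosoFS and container ContosoFS1. The filegroup, the logical file names, and the D:\FSData path are all hypothetical.

```sql
-- Add a second container on a different drive, capped at 100 GB.
ALTER DATABASE Contoso
ADD FILE (NAME = N'ContosoFS2',
          FILENAME = N'D:\FSData\ContosoFS2',
          MAXSIZE = 100GB)
TO FILEGROUP ContosoFS;
GO

-- Drain an existing container so its data moves to the other containers
-- in the same filegroup. For FILESTREAM data the move is carried out by the
-- FILESTREAM garbage collector, so it may not finish immediately.
USE Contoso;
GO
DBCC SHRINKFILE (N'ContosoFS1', EMPTYFILE);
```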
UNSTRUCTURED DATA : MULTIPLE CONTAINERS
Use of multiple spindles for achieving better I/O Scalability
RUDS SCALE-UP: FILESTREAM PERF/SCALE
Improved performance of T-SQL and file I/O access
  Various enhancements to improve read/write throughput
  5-fold increase in read throughput
  Linear scaling with large number of concurrent threads
SUMMARY: FILETABLE
Application compatibility for Windows applications
  Windows applications run on top of files stored in FileTables with no modifications
Relational value proposition
  Provide integrated administration and services
  Backup, log shipping, HA-DR, full-text and semantic search, …
T-SQL orthogonality
  File/folder attributes surfaced through relational columns
  Power of set-based operations, policy management, reporting, etc.
  FileNamespace hierarchy management
FULL TEXT SEARCH IMPROVEMENTS IN SQL SERVER 2012
Improved performance and scale:
  Scale-up to 350M documents
  iFTS query performance 7-10 times faster than in SQL Server 2008
  Worst-case iFTS query response times < 3 sec for corpus
  On par with or better than the main database search competitors
New functionality:
  Property search
  Customizable NEAR
  New word breakers: updates to existing word breakers, plus Czech and Greek
Innovation in search: Semantic Similarity Search
FULLTEXT SEARCH PERFORMANCE & SCALE IMPROVEMENTS
Architectural improvements
  Improved internal implementation
  Queries no longer block index updates
Improved query plans: better plans for common queries
  Fulltext predicate folding
  Parallel plan execution
Index and query tested at scale up to 350 million documents with <~2 sec response
  ~3X better throughput without DML and ~9X better with DML
  Scales easily with increasing number of connections
SCALE-UP: FULL-TEXT SEARCH
Queries over a 350M-document database with random DMLs running in the background. SQL Server 2012 beats SQL Server 2005 with a scale factor of more than 2x and, on average, 60x better throughput.
[Chart: 2005/8 vs 2012]
SCALE-UP: FULL-TEXT SEARCH
Query avgExecTime (ms) under various numbers of connections (50 to 2000 users) for a customer playback benchmark
[Chart: 2005/8 vs 2012]
FULLTEXT PROPERTY SCOPED SEARCH
• Set up once per database instance to load the Office filters
    exec sp_fulltext_service 'load_os_resources', 1
    go
    exec sp_fulltext_service 'restart_all_fdhosts'
    go
• Create a property list
    CREATE SEARCH PROPERTY LIST p1;
• Add properties to be extracted
    ALTER SEARCH PROPERTY LIST [p1] ADD N'System.Author' WITH
    (PROPERTY_SET_GUID = 'f29f85e0-4ff9-1068-ab91-08002b27b3d9', PROPERTY_INT_ID = 4, PROPERTY_DESCRIPTION = N'System.Author');
• Create/alter the full-text index to specify the property list to be extracted
    ALTER FULLTEXT INDEX ON fttable... SET SEARCH PROPERTY LIST = [p1];
• Query for properties
    SELECT * FROM fttable WHERE CONTAINS(PROPERTY(ftcol, 'System.Author'), 'fernlope');
New search filter for document properties
    CONTAINS (PROPERTY ( { column_name }, 'property_name' ), 'contains_search_condition' )
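To confirm which properties a list will extract, the registered properties can be read back from the catalog views. This is a small sketch that assumes the property list is named p1 as in the steps above.

```sql
-- Show every property registered in search property list p1.
SELECT pl.name AS property_list,
       p.property_name,
       p.property_set_guid,
       p.property_int_id,
       p.property_description
FROM sys.registered_search_property_lists AS pl
JOIN sys.registered_search_properties AS p
     ON p.property_list_id = pl.property_list_id
WHERE pl.name = N'p1';
```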
FULL-TEXT CUSTOMIZABLE NEAR
OLD NEAR SYNTAX
    select * from fttable where contains(*, 'test near Space')
NEW NEAR USAGES
• SPECIFY DISTANCE
    select * from fttable where contains(*, 'near((test, Space), 5,false)')
• REDUCE DISTANCE
    select * from fttable where contains(*, 'near((test, Space), 2,false)')
• ORDER OF WORDS IS SPECIFIED AS IMPORTANT
    select * from fttable where contains(*, 'near((test, Space), 5,true)')
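The custom NEAR term also accepts MAX in place of a numeric distance, and it can be used with CONTAINSTABLE to get a relevance rank. A short sketch against the same fttable used above:

```sql
-- Any distance (MAX), but "test" must appear before "Space".
-- CONTAINSTABLE returns the full-text key and a rank for each match.
SELECT ft.[KEY], ft.[RANK]
FROM CONTAINSTABLE(fttable, *, N'NEAR((test, Space), MAX, TRUE)') AS ft
ORDER BY ft.[RANK] DESC;
```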
STATISTICAL SEMANTIC SEARCH
Semantic insight into textual content
  Uses language models to find the most important keywords in a document
  No need to build brittle ontologies!
Statistically prominent keywords
  Autogenerated tag clouds
Potentially related content based on extracted keywords, such as
  Similar products (based on description)
  Similar jobs or applicants
  Similar support incidents (based on call logs)
  Potential solutions (based on similar incidents)
First-class usage experience
  Efficient linear algorithms
  Integrated with FTS and SQL
  New rowset functions for all results using SQL query
DEMO
Semantic Extraction and Relationships
FullText Search in SQL Server 2012
SEMANTIC SIMILARITY
• Input: Text such as varchar, Office, PDF, HTML, email…
• Output: Rowset functions with standard SQL queries
Illustrating example:

Source Table
Key  Title               Document
D1   Annual Budget       …
D2   Corporate Earnings  …
D3   Marketing Reports   …
…    …                   …

Full-Text and Semantic Processing + Language Models (e.g. quarter, record, revenue…)

Keyword Index (Full-Text)
ID  Keyword  Colid  …  compDocid  CompOc              CompPid
K1  revenue  1      …  10,23,123  (1,4),(5,8),(1,34)  2,5,6,8,4,3
K2  growth   1      …  10,23,123  (1,5),(5,9),(1,34)  2,5,6,8,5,4
…   …        …      …  …          …

Keyphrases
ID  Keyword
T1  revenue
T2  growth
T3  Windows
T4  Azure
…   …

KeyphraseDocuments
ID            DocID
T1 (revenue)  D1 (Annual Budget)
T2 (growth)   D2 (Corporate Earnings)
T3 (Windows)  D3 (Marketing Reports)
…             …
T1 (revenue)  D7 (Finance Report)
…             …
T3 (Windows)  D11 (Azure Strategy)
T4 (Azure)    D11 (Azure Strategy)

DocumentSimilarity
DocID                   MatchedDocID
D1 (Annual Budget)      D2 (Corporate Earnings)
D1 (Annual Budget)      D7 (Finance Report)
D3 (Marketing Reports)  D11 (Azure Strategy)
…                       …
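The key phrase and similarity results illustrated above are exposed through rowset functions. The sketch below assumes a hypothetical source table dbo.SourceDocs (DocKey int primary key, Title, [Document]) with a full-text index on [Document] created with STATISTICAL_SEMANTICS. The table, the column names, and the key values 1 and 7 are assumptions for illustration.

```sql
DECLARE @DocKey int = 1;      -- hypothetical key of the source document
DECLARE @MatchedKey int = 7;  -- hypothetical key of a document to compare against

-- Key phrases: the statistically most prominent phrases in one document.
SELECT TOP (10) kp.keyphrase, kp.score
FROM SEMANTICKEYPHRASETABLE(dbo.SourceDocs, [Document], @DocKey) AS kp
ORDER BY kp.score DESC;

-- Document similarity: the documents most similar to the source document.
SELECT TOP (10) d.Title, s.score
FROM SEMANTICSIMILARITYTABLE(dbo.SourceDocs, [Document], @DocKey) AS s
JOIN dbo.SourceDocs AS d
     ON d.DocKey = s.matched_document_key
ORDER BY s.score DESC;

-- Similarity details: which key phrases make two documents similar.
SELECT TOP (10) sd.keyphrase, sd.score
FROM SEMANTICSIMILARITYDETAILSTABLE(dbo.SourceDocs, [Document], @DocKey,
                                    [Document], @MatchedKey) AS sd
ORDER BY sd.score DESC;
```

Because these are ordinary rowset functions, their output can be joined, filtered, and aggregated like any other table, for example to build a tag cloud from key phrase scores.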
SEMANTIC EXTRACTION: END-2-END EXPERIENCE
• Downloadable Language Statistical Database with registration stored procedure
• Setup along with Full-Text
• Metadata / Catalog views
• System level DMVs for progress state and usage
• Manageability through SSMS and SMO
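A sketch of the end-to-end setup, assuming the downloaded language statistics database has already been attached under the name semanticsdb. The table dbo.SourceDocs, its [Document] column, its unique key index PK_SourceDocs, and the catalog name DocsCatalog are hypothetical names.

```sql
-- Register the attached language statistics database with the instance.
EXEC sp_fulltext_semantic_register_language_statistics_db @dbname = N'semanticsdb';
GO

-- Metadata: the registered statistics database and the languages it supports.
SELECT * FROM sys.fulltext_semantic_language_statistics_database;
SELECT * FROM sys.fulltext_semantic_languages;
GO

-- Semantic indexing is set up together with full-text indexing.
CREATE FULLTEXT CATALOG DocsCatalog;
GO
CREATE FULLTEXT INDEX ON dbo.SourceDocs
    ([Document] LANGUAGE 1033 STATISTICAL_SEMANTICS)  -- 1033 = English (US)
    KEY INDEX PK_SourceDocs
    ON DocsCatalog;
GO

-- DMV: progress of the semantic similarity population.
SELECT * FROM sys.dm_fts_semantic_similarity_population;
```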
KEY TAKEAWAYS
SQL Server’s unstructured data support is a key strategy to enable you to build complex data applications that go beyond relational data!
Content and Collaboration, eDiscovery, Healthcare, Document management etc.
RELATED CONTENT
SQL Server 2012 Whitepapers and information: http://www.sqlserverlaunch.com
Channel 9 DataBound Episode 2: http://channel9.msdn.com
MySemanticSearch Demo: http://mysemanticsearch.codeplex.com
More demo data sets and demo scripts: http://blogs.msdn.com/b/sqlfts/archive/2011/07/21/introducing-fulltext-statistical-semantic-search-in-sql-server-codename-denali-release.aspx
Microsoft Virtual Academy Recording: Coming Soon!
FIND ME LATER AT…
• On Twitter: @SQLServerMike
• Blog: http://sqlblog.com/blogs/michael_rys
• Email: [email protected]