Module 20
Working with Full-Text Indexes and Queries
Module Overview
• Introduction to Full-Text Indexing
• Implementing Full-Text Indexes in SQL Server
• Working with Full-Text Queries
Lesson 1: Introduction to Full-Text Indexing
• Discussion: The Need for More Flexible User Interaction
• Why LIKE Isn't Enough
• Fuzziness in Queries
• Demonstration 1A: Using Full-Text Queries
Discussion: The Need for More Flexible User Interaction
Consider this search page:
It’s hard to imagine anyone wanting separate fields for every search value, yet this is exactly how we build most business applications today. Why?
Ask yourself if you’d prefer an interface like this:
Why LIKE Isn't Enough
• We can try to build more flexible search using the T-SQL LIKE operator
• A search for someone named Lee returns Ashlee, Carolee, Colleen, Kathleen, Kaylee, Lee, Shirleen, etc.
• Substrings aren't words: a search for Pen returns pen, pencil, pendulum, penitentiary, open, etc.
• Searching for two words gets even more complicated:
SELECT DISTINCT FirstName FROM Person.Person WHERE FirstName LIKE '%Lee%';
WHERE Details LIKE '%Fred%Terry%'
   OR Details LIKE '%Terry%Fred%'
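The substring problem can be reproduced with a few rows. The sketch below is only an illustration: the table variable @Notes and its rows are hypothetical, and only the Details column name comes from the slide. It shows that the two-word LIKE pattern also matches longer words, and that a common workaround (padding with spaces) still depends on words being separated by single spaces with no punctuation.

```sql
-- Hypothetical sample data, used only for illustration.
DECLARE @Notes TABLE (NoteID int PRIMARY KEY, Details nvarchar(200));

INSERT INTO @Notes (NoteID, Details)
VALUES (1, N'Fred met Terry'),
       (2, N'Terry called Fred'),
       (3, N'Frederick met Terryl'),
       (4, N'Fred only');

-- Substring search: also matches row 3, because Fred is a
-- substring of Frederick and Terry is a substring of Terryl.
SELECT NoteID, Details
FROM @Notes
WHERE Details LIKE '%Fred%Terry%'
   OR Details LIKE '%Terry%Fred%';

-- Whole-word workaround: pad the text and the search terms with spaces.
-- This excludes row 3 but breaks as soon as a word is followed by
-- punctuation such as a comma or a full stop.
SELECT NoteID, Details
FROM @Notes
WHERE N' ' + Details + N' ' LIKE '% Fred %'
  AND N' ' + Details + N' ' LIKE '% Terry %';
```

Both queries also use leading wildcards, so they generally cannot seek on an ordinary index over the column.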
Fuzziness in Queries
• IT Professionals tend to like to work in an exact and precise way
• End users prefer flexible and fuzzy search capabilities
• You might be able to find substrings in T-SQL, but how would you find:
the word Kathleen near the word bicycle?
the word Client when the search term was Customer?
the words Driving or Drove when the search term was Drive?
rows relating to Attempts to Improve Solar Energy Efficiency?
Demonstration 1A: Using Full-Text Queries
In this demonstration you will see why full-text indexing is important for creating advanced and flexible user interfaces
Lesson 2: Implementing Full-Text Indexes in SQL Server
• Discussion: Search-related Options
• Full-Text Search in SQL Server
• Core Components of Full-Text Search
• Language Support and Supported Word Breakers
• Implementing Full-Text Indexes
• Demonstration 2A: Implementing Full-Text Indexes
Discussion: Search-related Options
• Which forms of search are you familiar with?
Bing?
Search in the operating system?
Search in Outlook?
Full-text search in earlier versions of SQL Server?
Other search engines?
Full-Text Search in SQL Server
• Search allows full-text queries against character-based data stored in SQL Server columns of these types:
char, varchar, nchar, nvarchar
text, ntext, image
xml
varbinary(max)
• Full-text indexes:
are created on the tables containing the character-based data
are stored in the database along with other data
allow columns to be written in many languages
can be queried with simple words or phrases
support ranking of results via table-valued functions
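As a rough sketch of how to see what is already in place, the queries below check whether the full-text component is installed and list the columns that take part in a full-text index in the current database. They use only system functions and catalog views; the column aliases are illustrative.

```sql
-- Returns 1 when the full-text component is installed on this instance.
SELECT FULLTEXTSERVICEPROPERTY('IsFullTextInstalled') AS IsFullTextInstalled;

-- Lists every column in the current database that belongs to a
-- full-text index, with the language used to index it.
SELECT OBJECT_NAME(ic.object_id) AS TableName,
       c.name                    AS ColumnName,
       ic.language_id            AS LanguageID
FROM sys.fulltext_index_columns AS ic
INNER JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
ORDER BY TableName, ColumnName;
```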
Core Components of Full-Text Search
Component            Purpose
iFilter              Extracts a stream of text
Word Breaker         Finds word boundaries
Stemmer              Conjugates verbs, performs inflectional expansions
Noise Word Removal   Removes words that are not useful in an index
Indexing             Creates indexes for extracted words
Querying             Executes full-text queries
Scoring              Ranks results
Which of these components are likely to be specific to a language?
Language Support and Supported Word Breakers
Arabic, Bengali, Brazilian, British English, Bulgarian, Canadian English, Catalan, Chinese (Simplified), Chinese (Traditional), Chinese (Hong Kong), Chinese (Macau), Chinese (Singapore), Croatian, Danish, Dutch, English, French,
German, Gujarati, Hebrew, Hindi, Icelandic, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malay - Malaysia, Malayalam, Marathi, Neutral, Norwegian, Polish,
Portuguese, Punjabi, Romanian, Russian, Serbian (Cyrillic), Serbian (Latin), Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese
Query sys.fulltext_languages to see the current list
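A minimal form of that query is sketched below. The list returned depends on the SQL Server version and on which word breakers are registered on the instance, so it may differ from the slide.

```sql
-- Lists the languages available to full-text search on this instance.
-- lcid is the locale identifier used in LANGUAGE clauses.
SELECT lcid, name
FROM sys.fulltext_languages
ORDER BY name;
```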
Implementing Full-Text Indexes
Steps to implement full-text indexing:
• Must have a table with character-based data
• Create a full-text catalog (if none exists already)
• Create a full-text index on the table
• Populate the index
• Query the index using full-text predicates or TVFs
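The steps above might look like the following sketch. It assumes the dbo.Messages table used in later slides, with a unique, single-column, non-nullable index on MessageID. The names MessagesCatalog and PK_Messages are hypothetical and would need to match your own database.

```sql
-- Step 2: create a catalog (hypothetical name) and make it the default.
CREATE FULLTEXT CATALOG MessagesCatalog AS DEFAULT;

-- Steps 3 and 4: create the index. PK_Messages is an assumed name for
-- the unique key index on dbo.Messages. With CHANGE_TRACKING AUTO,
-- a full population starts as soon as the index is created.
CREATE FULLTEXT INDEX ON dbo.Messages (Description LANGUAGE 1033)
    KEY INDEX PK_Messages
    ON MessagesCatalog
    WITH CHANGE_TRACKING AUTO;

-- Check progress: 0 means the population is idle (finished).
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.Messages'),
                        'TableFulltextPopulateStatus') AS PopulateStatus;

-- Step 5: query the index with a full-text predicate.
SELECT MessageID, Description
FROM dbo.Messages
WHERE CONTAINS(Description, 'filing');
```

CREATE FULLTEXT INDEX cannot run inside a user transaction, so a script like this is normally executed statement by statement.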
Demonstration 2A: Implementing Full-Text Indexes
In this demonstration, you will see:
• how to create a full-text catalog
• how to create a full-text index
• how to check when a full-text index is fully populated
Lesson 3: Working with Full-Text Queries
• CONTAINS Queries
• FREETEXT Queries
• Table Functions and Ranking Results
• Thesaurus
• Stopwords and Stoplists
• SQL Server Management of Full-Text
• Demonstration 3A: Working with Full-Text Queries
CONTAINS Queries
SELECT MessageID, Description
FROM dbo.Messages
WHERE CONTAINS(Description, 'filing')
ORDER BY MessageID;

SELECT MessageID, Description
FROM dbo.Messages
WHERE CONTAINS(Description, 'file AND NOT boundary')
ORDER BY MessageID;
• Searches for words
• Can use operators AND, OR, AND NOT
• Can use proximity (NEAR), inflectional, and thesaurus forms
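The proximity, inflectional, and thesaurus forms can be sketched as follows. The queries reuse dbo.Messages from the slide and assume a full-text index already exists on Description. The search terms are illustrative only, and the thesaurus query returns extra rows only if the thesaurus file for the language defines entries for the term.

```sql
-- Proximity: rows where "file" appears near "boundary".
SELECT MessageID, Description
FROM dbo.Messages
WHERE CONTAINS(Description, 'file NEAR boundary');

-- Inflectional: matches forms such as terminate, terminated, terminating.
SELECT MessageID, Description
FROM dbo.Messages
WHERE CONTAINS(Description, 'FORMSOF(INFLECTIONAL, terminate)');

-- Thesaurus: expands or replaces the term using the thesaurus file.
SELECT MessageID, Description
FROM dbo.Messages
WHERE CONTAINS(Description, 'FORMSOF(THESAURUS, user)');

-- Prefix term: words that start with "fil" (file, filing, filter, ...).
SELECT MessageID, Description
FROM dbo.Messages
WHERE CONTAINS(Description, '"fil*"');
```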
FREETEXT Queries
SELECT MessageID, Description
FROM dbo.Messages
WHERE FREETEXT(Description, 'statement was terminated')
ORDER BY MessageID;
• Are used to search for values that match the meaning, not just the wording
• Internally assign each term a weight and then find matches
• Work on a single table but can work with joins of multiple tables in a FROM clause
Table Functions and Ranking Results
SELECT m.MessageID, m.Description, ft.RANK
FROM dbo.Messages AS m
INNER JOIN FREETEXTTABLE(dbo.Messages, Description, 'statement was terminated') AS ft
    ON m.MessageID = ft.[KEY]
ORDER BY ft.RANK DESC;
• Table-valued function versions of CONTAINS and FREETEXT provide relevance ranking:
CONTAINSTABLE
FREETEXTTABLE
• KEY and RANK columns are provided by the functions
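CONTAINSTABLE works the same way. The sketch below again assumes dbo.Messages with a full-text index on Description. It uses ISABOUT to weight the search terms and passes 10 as the optional top_n_by_rank argument so that only the ten highest-ranked matches are returned. The weights are arbitrary illustrative values.

```sql
-- Ranked CONTAINS-style search: "statement" counts for more than
-- "terminated", and only the 10 best-ranked rows are returned.
SELECT m.MessageID, m.Description, ct.[RANK]
FROM dbo.Messages AS m
INNER JOIN CONTAINSTABLE(
               dbo.Messages,
               Description,
               'ISABOUT (statement WEIGHT (0.8), terminated WEIGHT (0.4))',
               10) AS ct
    ON m.MessageID = ct.[KEY]
ORDER BY ct.[RANK] DESC;
```

RANK values are only meaningful relative to other rows in the same result set, so they are best used for ordering rather than compared across queries.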
Thesaurus
<thesaurus xmlns="x-schema:tsSchema.xml">
  <diacritics_sensitive>0</diacritics_sensitive>
  <expansion>
    <sub>user</sub>
    <sub>operator</sub>
    <sub>developer</sub>
  </expansion>
  <replacement>
    <pat>NT5</pat>
    <pat>W2K</pat>
    <sub>Windows 2000</sub>
  </replacement>
</thesaurus>
• Allows searching for words other than those specified
• Provides Replacements and Expansions
• Is implemented as an XML file at the SQL Server instance level
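After a thesaurus file has been edited, it needs to be reloaded before queries see the change. A minimal sketch, assuming the English (LCID 1033) thesaurus file contains the expansion set shown above:

```sql
-- Reload the thesaurus file for English (LCID 1033).
EXEC sys.sp_fulltext_load_thesaurus_file 1033;

-- Inspect how the full-text engine expands a thesaurus search term.
-- Arguments: query string, LCID, stoplist ID (0 = system stoplist),
-- accent sensitivity (0 = insensitive).
SELECT display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(THESAURUS, "user")', 1033, 0, 0);
```

If the expansion set is loaded, the output should include the substitute terms alongside the original term, which makes this a convenient check before relying on the thesaurus in queries.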
Stopwords and Stoplists
SELECT * FROM sys.fulltext_system_stopwords WHERE language_id = 1033;
CREATE FULLTEXT STOPLIST CompanyNames;
ALTER FULLTEXT STOPLIST CompanyNames ADD 'Microsoft' LANGUAGE 1033;
• Not all words in any language are useful in an index
• Company names, etc. are often in every document and often useless to index
• sys.fulltext_system_stopwords shows the built-in stopwords by language
• Stoplists can be created manually
• Words in stoplists are not indexed by Integrated Full-Text Search (iFTS)
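A stoplist has no effect until it is associated with a full-text index. The sketch below assumes the CompanyNames stoplist created above and a full-text index on dbo.Messages. It attaches the stoplist and then lists its contents.

```sql
-- Attach the stoplist to the full-text index on dbo.Messages.
-- By default this starts a repopulation of the index.
ALTER FULLTEXT INDEX ON dbo.Messages SET STOPLIST CompanyNames;

-- List the stopwords held in the user-defined stoplist.
SELECT sl.name AS StoplistName,
       sw.stopword,
       sw.language_id
FROM sys.fulltext_stoplists AS sl
INNER JOIN sys.fulltext_stopwords AS sw
    ON sw.stoplist_id = sl.stoplist_id
WHERE sl.name = N'CompanyNames';
```

A stoplist created with plain CREATE FULLTEXT STOPLIST starts empty, so the system stopwords no longer apply to that index. Adding FROM SYSTEM STOPLIST to the CREATE statement copies them in first.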
SQL Server Management of Full-Text
• Full-text indexes live within the database
Indexes are backed up and/or restored along with the database
ALTER INDEX REORGANIZE can be used to defragment a full-text index
ALTER FULLTEXT CATALOG REORGANIZE causes a master merge of the full-text indexes in the catalog
• sys.dm_fts_parser and other DMVs are useful for troubleshooting
• sys.fulltext_document_types shows indexable document types
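These management tasks can be sketched as follows. MessagesCatalog is a hypothetical catalog name. The system views and the parser function are standard, but the output depends on the instance.

```sql
-- Trigger a master merge of the full-text indexes in the catalog.
ALTER FULLTEXT CATALOG MessagesCatalog REORGANIZE;

-- Show the document types (file extensions) that have a filter installed.
SELECT document_type, path
FROM sys.fulltext_document_types
ORDER BY document_type;

-- Show how the word breaker splits a phrase for English (LCID 1033),
-- using the system stoplist (0) and accent-insensitive parsing (0).
-- special_term flags entries such as noise words.
SELECT keyword, display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"The statement was terminated"', 1033, 0, 0);
```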
Demonstration 3A: Working with Full-Text Queries
In this demonstration, you will see how to:
• query a full-text index
• locate the built-in stopwords
• create a stoplist and add a value to it
• check the parsing of text by the full-text engine
Lab 20: Working with Full-Text Indexes and Queries
• Exercise 1: Implement a full-text index
• Exercise 2: Implement a stoplist
• Challenge Exercise 3: Create a stored procedure to implement a full-text search (Only if time permits)
Logon information
Virtual machine: 623XB-MIA-SQL
User name: AdventureWorks\Administrator
Password: Pa$$w0rd
Estimated time: 45 minutes
Lab Scenario
Users have been complaining about the limited querying ability provided in the marketing system. You are intending to use full-text indexing to address these complaints.
You will implement a full-text index on the Marketing.ProductDescription table to improve this situation.
You will implement a stoplist to avoid excessive unnecessary index size.
If you have time, your manager would like you to help provide a more natural interface for your users. This will involve creating a new stored procedure.
Lab Review
• What sorts of values would be useful in stoplists?
• What sorts of values would be useful in a thesaurus?
Module Review and Takeaways
• Review Questions
• Best Practices
Course Evaluation