Module 20 Working with Full-Text Indexes and Queries

Module Overview

• Introduction to Full-Text Indexing

• Implementing Full-Text Indexes in SQL Server

• Working with Full-Text Queries

Lesson 1: Introduction to Full-Text Indexing

• Discussion: The Need for More Flexible User Interaction

• Why LIKE Isn't Enough

• Fuzziness in Queries

• Demonstration 1A: Using Full-Text Queries

Discussion: The Need for More Flexible User Interaction

Consider this search page:

It’s hard to imagine anyone wanting a separate field for every search criterion, yet this is exactly how we build most business applications today. Why?

Ask yourself if you’d prefer an interface like this:

Why LIKE Isn't Enough

• We can try to build more flexible search using the T-SQL LIKE operator

• A search for someone named Lee returns Ashlee, Carolee, Colleen, Kathleen, Kaylee, Lee, Shirleen, etc.

• Substrings aren't words: a search for Pen returns pen, pencil, pendulum, penitentiary, open, etc.

• Searching for two words gets even more complicated:

SELECT DISTINCT FirstName FROM Person.Person WHERE FirstName LIKE '%Lee%';

-- Two words in either order need one pattern per order:
WHERE Details LIKE '%Fred%Terry%'
   OR Details LIKE '%Terry%Fred%'
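A minimal self-contained sketch of the substring problem. It uses hypothetical table variables in place of Person.Person and a product table, and sets a case-insensitive collation explicitly so the results do not depend on the database default:

```sql
-- Hypothetical sample data; a table variable stands in for Person.Person
DECLARE @People TABLE (FirstName nvarchar(50) COLLATE Latin1_General_CI_AS NOT NULL);
INSERT INTO @People (FirstName)
VALUES (N'Lee'), (N'Ashlee'), (N'Colleen'), (N'Kathleen'), (N'Leonard'), (N'Mary');

-- LIKE matches the characters anywhere in the value, so Ashlee, Colleen and Kathleen
-- are returned along with Lee (4 rows); Leonard and Mary are not
SELECT FirstName FROM @People WHERE FirstName LIKE N'%Lee%';

-- Hypothetical product descriptions: "pen" is also found inside Pencil and Open
DECLARE @Products TABLE (Description nvarchar(100) COLLATE Latin1_General_CI_AS NOT NULL);
INSERT INTO @Products (Description)
VALUES (N'Blue pen'), (N'Pencil sharpener'), (N'Open-ended wrench'), (N'Stapler');

SELECT Description FROM @Products WHERE Description LIKE N'%pen%';
```

A full-text index avoids this because it stores whole words produced by a word breaker rather than matching character patterns.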

Fuzziness in Queries

• IT professionals tend to prefer working in an exact and precise way

• End users prefer flexible and fuzzy search capabilities

• You might be able to find substrings in T-SQL, but how would you find:

the word Kathleen near the word bicycle?

the word Client when the search term was Customer?

the words Driving or Drove when the search term was Drive?

rows relating to Attempts to Improve Solar Energy Efficiency?

Demonstration 1A: Using Full-Text Queries

In this demonstration you will see why full-text indexing is important for creating advanced and flexible user interfaces

Lesson 2: Implementing Full-Text Indexes in SQL Server

• Discussion: Search-related Options

• Full-Text Search in SQL Server

• Core Components of Full-Text Search

• Language Support and Supported Word Breakers

• Implementing Full-Text Indexes

• Demonstration 2A: Implementing Full-Text Indexes

Discussion: Search-related Options

• Which forms of search are you familiar with?

Bing?

Search in the operating system?

Search in Outlook?

Full-text search in earlier versions of SQL Server?

Other search engines?

Full-Text Search in SQL Server

• Full-text search allows full-text queries against character-based data stored in SQL Server columns of these types:

char, varchar, nchar, nvarchar

text, ntext, image

xml

varbinary(max)

• Full-text indexes are created on the tables that contain the character-based data. The indexes:

are stored in the database along with other data

allow each indexed column to use its own language

can be queried with simple words or phrases

can rank results via table-valued functions
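The varbinary(max) case and per-column languages can be sketched as follows. This is an illustration under assumptions: dbo.Documents, its columns, PK_Documents and ftDocumentsCatalog are hypothetical names, and an iFilter must be installed for each file type stored in the table.

```sql
-- Hypothetical table holding documents as binary data
CREATE TABLE dbo.Documents
(
    DocumentID    int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Documents PRIMARY KEY,   -- unique key index required by full-text
    Title         nvarchar(200) NOT NULL,
    FileExtension nvarchar(8)   NOT NULL,      -- e.g. N'.docx'; selects the iFilter
    Content       varbinary(max) NULL
);

CREATE FULLTEXT CATALOG ftDocumentsCatalog;

-- Each column gets its own word breaker via LANGUAGE;
-- TYPE COLUMN tells the engine how to interpret the binary Content column
CREATE FULLTEXT INDEX ON dbo.Documents
(
    Title   LANGUAGE 1033,                              -- English
    Content TYPE COLUMN FileExtension LANGUAGE 1031     -- German
)
KEY INDEX PK_Documents
ON ftDocumentsCatalog;
```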

Core Components of Full-Text Search

Component            Purpose
iFilter              Extracts a stream of text
Word Breaker         Finds word boundaries
Stemmer              Conjugates verbs, performs inflectional expansions
Noise Word Removal   Removes words that are not useful in an index
Indexing             Creates indexes for extracted words
Querying             Executes full-text queries
Scoring              Ranks results

Which of these components are likely to be specific to a language?
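One way to see several of these components at work, and to answer the question above, is sys.dm_fts_parser, which runs a string through the word breaker, stemmer and stoplist for a given language without touching any index. A sketch, assuming LCID 1033 (English) and the system stoplist (stoplist_id 0); the function requires sysadmin membership:

```sql
-- Word breaking and stopword removal: stopwords such as "the" should appear
-- with special_term = 'Noise Word'
SELECT occurrence, display_term, special_term
FROM sys.dm_fts_parser(N'"The statement was terminated"', 1033, 0, 0);

-- Stemming: inflectional forms generated for "drive" (for example drives, drove, driving)
SELECT display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, drive)', 1033, 0, 0);
```

Changing the LCID argument to another language changes the output, which is a quick way to confirm that the word breaker, stemmer and stoplist are language specific.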

Language Support and Supported Word Breakers

Arabic, Bengali, Brazilian, British English, Bulgarian, Canadian English, Catalan, Chinese (Simplified), Chinese (Traditional), Chinese (Hong Kong), Chinese (Macau), Chinese (Singapore), Croatian, Danish, Dutch, English, French,

German, Gujarati, Hebrew, Hindi, Icelandic, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malay - Malaysia, Malayalam, Marathi, Neutral, Norwegian, Polish,

Portuguese, Punjabi, Romanian, Russian, Serbian (Cyrillic), Serbian (Latin), Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese

Query sys.fulltext_languages to see the current list
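For example (the rows returned depend on the SQL Server version and installed components):

```sql
-- Languages with a registered word breaker, identified by LCID
SELECT lcid, name
FROM sys.fulltext_languages
ORDER BY name;

-- Language used when a full-text index column does not specify LANGUAGE
SELECT name, value_in_use
FROM sys.configurations
WHERE name = N'default full-text language';
```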

Implementing Full-Text Indexes

Steps to implement full-text indexing:

• Ensure the table contains character-based data

• Create a full-text catalog (if one doesn't already exist)

• Create a full-text index on the table

• Populate the index

• Query the index using full-text predicates or table-valued functions (TVFs)
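The steps above can be sketched in T-SQL for the dbo.Messages table used later in this module. The sketch assumes dbo.Messages already has a unique, single-column, non-nullable key index on MessageID; the names PK_Messages and ftCatalog are hypothetical.

```sql
-- Create a catalog and make it the default for new full-text indexes
CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT;

-- Create the index; with CHANGE_TRACKING = AUTO a full population
-- starts immediately and later data changes are tracked automatically
CREATE FULLTEXT INDEX ON dbo.Messages (Description LANGUAGE 1033)
    KEY INDEX PK_Messages
    ON ftCatalog
    WITH (CHANGE_TRACKING = AUTO);

-- Population runs in the background; PopulateStatus 0 means the catalog is idle
WHILE FULLTEXTCATALOGPROPERTY(N'ftCatalog', 'PopulateStatus') <> 0
    WAITFOR DELAY '00:00:01';

-- The index can now be queried
SELECT MessageID, Description
FROM dbo.Messages
WHERE CONTAINS(Description, N'file');
```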

Demonstration 2A: Implementing Full-Text Indexes

In this demonstration you will see

• how to create a full-text catalog

• how to create a full-text index

• how to check when a full-text index is fully populated

Lesson 3: Working with Full-Text Queries

• CONTAINS Queries

• FREETEXT Queries

• Table Functions and Ranking Results

• Thesaurus

• Stopwords and Stoplists

• SQL Server Management of Full-Text

• Demonstration 3A: Working with Full-Text Queries

CONTAINS Queries

SELECT MessageID, Description
FROM dbo.Messages
WHERE CONTAINS(Description, 'filing')
ORDER BY MessageID;

SELECT MessageID, Description
FROM dbo.Messages
WHERE CONTAINS(Description, 'file AND NOT boundary')
ORDER BY MessageID;

• Searches for words

• Can use operators AND, OR, AND NOT

• Can use proximity (NEAR), inflectional, and thesaurus forms
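The operators in the bullets above can be combined in the search condition. These queries are a sketch against the same dbo.Messages table; the proximity syntax shown is the custom NEAR form introduced in SQL Server 2012, and the thesaurus term returns extra rows only if the thesaurus defines entries for it.

```sql
-- Prefix term: words beginning with "fil" (file, files, filing, ...)
SELECT MessageID, Description FROM dbo.Messages
WHERE CONTAINS(Description, N'"fil*"')
ORDER BY MessageID;

-- Inflectional forms of "file" (files, filed, filing, ...)
SELECT MessageID, Description FROM dbo.Messages
WHERE CONTAINS(Description, N'FORMSOF(INFLECTIONAL, file)')
ORDER BY MessageID;

-- Proximity: "statement" within 5 terms of "terminated", in either order
SELECT MessageID, Description FROM dbo.Messages
WHERE CONTAINS(Description, N'NEAR((statement, terminated), 5)')
ORDER BY MessageID;

-- OR combined with a thesaurus expansion
SELECT MessageID, Description FROM dbo.Messages
WHERE CONTAINS(Description, N'boundary OR FORMSOF(THESAURUS, user)')
ORDER BY MessageID;
```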

FREETEXT Queries

SELECT MessageID, Description
FROM dbo.Messages
WHERE FREETEXT(Description, 'statement was terminated')
ORDER BY MessageID;

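• Are used to search for values that match the meaning, not just the exact wording

• Internally break the search string into words, generate inflectional forms and thesaurus expansions, assign each term a weight, and then find matches

• Search the columns of a single table, although that table can be joined to other tables in the FROM clause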
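A quick way to see the difference is to count matches for the same word with both predicates. A sketch against dbo.Messages; because FREETEXT also expands the word with inflectional forms (and any thesaurus entries), its count is typically at least as large:

```sql
-- Exact word only
SELECT COUNT(*) AS ContainsMatches
FROM dbo.Messages
WHERE CONTAINS(Description, N'terminated');

-- Word broken, stemmed and thesaurus-expanded: also matches forms such as "terminate"
SELECT COUNT(*) AS FreetextMatches
FROM dbo.Messages
WHERE FREETEXT(Description, N'terminated');

-- * searches every full-text indexed column of the table
SELECT MessageID, Description
FROM dbo.Messages
WHERE FREETEXT(*, N'statement was terminated');
```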

Table Functions and Ranking Results

SELECT m.MessageID, m.Description, ft.RANK
FROM dbo.Messages AS m
INNER JOIN FREETEXTTABLE(dbo.Messages, Description, 'statement was terminated') AS ft
    ON m.MessageID = ft.[KEY]
ORDER BY ft.RANK DESC;

• Table-valued function versions of CONTAINS and FREETEXT provide relevance ranking:

CONTAINSTABLE

FREETEXTTABLE

• KEY and RANK columns are provided by the functions
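CONTAINSTABLE accepts the same search conditions as CONTAINS and can also weight terms and limit the result set. A sketch against dbo.Messages; the weights (which must lie between 0.0 and 1.0) and the limit of 10 rows are arbitrary illustration values:

```sql
-- ISABOUT weights "statement" more heavily than "terminated";
-- the last argument (top_n_by_rank) returns only the 10 highest-ranked keys
SELECT m.MessageID, m.Description, ct.[RANK]
FROM CONTAINSTABLE(dbo.Messages, Description,
        N'ISABOUT(statement WEIGHT(0.8), terminated WEIGHT(0.2))', 10) AS ct
INNER JOIN dbo.Messages AS m
    ON m.MessageID = ct.[KEY]
ORDER BY ct.[RANK] DESC;
```

RANK values (0 to 1000) are relative within a single query, so they are useful for ordering rows but not for comparing results across different queries.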

Thesaurus

<thesaurus xmlns="x-schema:tsSchema.xml">
    <diacritics_sensitive>0</diacritics_sensitive>
    <expansion>
        <sub>user</sub>
        <sub>operator</sub>
        <sub>developer</sub>
    </expansion>
    <replacement>
        <pat>NT5</pat>
        <pat>W2K</pat>
        <sub>Windows 2000</sub>
    </replacement>
</thesaurus>

• Allows searching for words other than those specified

• Provides Replacements and Expansions

• Is implemented as one XML file per language at the SQL Server instance level (for example, tsenu.xml for US English)
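After entries like those above are added to a thesaurus file, the file must be reloaded before queries use it. A sketch, assuming the expansion set (user, operator, developer) has been saved into the US English file and dbo.Messages has a full-text index:

```sql
-- Reload the thesaurus for LCID 1033 (US English)
EXEC sys.sp_fulltext_load_thesaurus_file 1033;

-- Rows containing "operator" or "developer" should now match a search for "user"
SELECT MessageID, Description
FROM dbo.Messages
WHERE CONTAINS(Description, N'FORMSOF(THESAURUS, user)')
ORDER BY MessageID;
```

FREETEXT and FREETEXTTABLE apply the thesaurus automatically; CONTAINS uses it only when FORMSOF(THESAURUS, ...) is specified.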

Stopwords and Stoplists

SELECT * FROM sys.fulltext_system_stopwords WHERE language_id = 1033;

CREATE FULLTEXT STOPLIST CompanyNames;

ALTER FULLTEXT STOPLIST CompanyNames ADD 'Microsoft' LANGUAGE 1033;


• Not all words in any language are useful in an index

• Company names and similar terms often appear in every document, so indexing them is rarely useful

• sys.fulltext_system_stopwords shows the built-in stopwords by language

• Stoplists can be created manually

• Words in stoplists are not indexed by integrated full-text search (iFTS)
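Note that CREATE FULLTEXT STOPLIST without a FROM clause creates an empty stoplist, so attaching it to an index replaces the built-in stopwords rather than adding to them. A sketch that keeps them, assuming dbo.Messages already has a full-text index; CompanyNamesPlus is a hypothetical name:

```sql
-- Start from a copy of the system stoplist, then add a company name
CREATE FULLTEXT STOPLIST CompanyNamesPlus FROM SYSTEM STOPLIST;
ALTER FULLTEXT STOPLIST CompanyNamesPlus ADD N'Microsoft' LANGUAGE 1033;

-- Use the new stoplist for the index (this normally triggers a repopulation)
ALTER FULLTEXT INDEX ON dbo.Messages SET STOPLIST CompanyNamesPlus;

-- Inspect the English entries in the custom stoplist
SELECT sl.name AS StoplistName, sw.stopword
FROM sys.fulltext_stopwords AS sw
INNER JOIN sys.fulltext_stoplists AS sl
    ON sl.stoplist_id = sw.stoplist_id
WHERE sl.name = N'CompanyNamesPlus'
  AND sw.language_id = 1033;
```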

SQL Server Management of Full-Text

• Full-text indexes live within the database

Indexes are backed up and/or restored along with the database

ALTER INDEX REORGANIZE can be used to defragment the indexes on the base table, including the unique key index that the full-text index depends on

ALTER FULLTEXT CATALOG REORGANIZE causes a master merge of the full-text indexes in the catalog

• sys.dm_fts_parser and other DMVs are useful for troubleshooting

• sys.fulltext_document_types shows indexable document types
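The maintenance operations listed above might look like this. The names dbo.Messages, PK_Messages and ftCatalog are hypothetical:

```sql
-- Defragment the base table's unique key index
ALTER INDEX PK_Messages ON dbo.Messages REORGANIZE;

-- Master merge: combine the full-text index fragments in the catalog
ALTER FULLTEXT CATALOG ftCatalog REORGANIZE;

-- Number of fragments per full-text index (useful when deciding whether to reorganize)
SELECT OBJECT_NAME(table_id) AS TableName, COUNT(*) AS Fragments
FROM sys.fulltext_index_fragments
GROUP BY table_id;

-- File types that the installed iFilters can index
SELECT document_type, manufacturer
FROM sys.fulltext_document_types
ORDER BY document_type;
```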

Demonstration 3A: Working with Full-Text Queries

In this demonstration, you will see how to:

• query a full-text index

• locate the built-in stopwords

• create a stoplist and add a value to it

• check the parsing of text by the full-text engine

Lab 20: Working with Full-Text Indexes and Queries

• Exercise 1: Implement a full-text index

• Exercise 2: Implement a stoplist

• Challenge Exercise 3: Create a stored procedure to implement a full-text search (Only if time permits)

Logon information

Virtual machine: 623XB-MIA-SQL

User name: AdventureWorks\Administrator

Password: Pa$$w0rd

Estimated time: 45 minutes

Lab Scenario

Users have been complaining about the limited querying ability provided in the marketing system. You intend to use full-text indexing to address these complaints.

You will implement a full-text index on the Marketing.ProductDescription table to improve this situation.

You will implement a stoplist to avoid unnecessarily increasing the size of the index.

If you have time, your manager would like you to help provide a more natural interface for your users. This will involve creating a new stored procedure.

Lab Review

• What sorts of values would be useful in stoplists?

• What sorts of values would be useful in a thesaurus?

Module Review and Takeaways

• Review Questions

• Best Practices

Course Evaluation