kirill-zotin
SQLite full-text search extension
FTS3 and FTS4
• Let you create special virtual tables with a built-in full-text index
• The index consumes additional disk space
Table of 500 000 rows
FTS3
• CREATE VIRTUAL TABLE table1 USING fts3(content TEXT);
/* FTS3 table */
• SELECT count(*) FROM table1 WHERE content MATCH 'linux';
/* 0.03 seconds */
Ordinary table
• CREATE TABLE table2(content TEXT); /* Ordinary table */
• SELECT count(*) FROM table2 WHERE content LIKE '%linux%';
/* 22.5 seconds */
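The comparison above can be reproduced on a small scale. The following is a minimal sketch using Python's standard sqlite3 module, assuming the bundled SQLite was built with FTS3 enabled; the 20 000 synthetic rows and the helper name timed_count are illustrative assumptions, not the data behind the figures quoted above, and the measured times will vary by machine.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE table1 USING fts3(content TEXT)")  # FTS3 table
conn.execute("CREATE TABLE table2(content TEXT)")                     # ordinary table

# Synthetic data (an assumption): every 100th row mentions 'linux'.
rows = [
    ("notes about linux kernels" if i % 100 == 0 else "notes about other topics",)
    for i in range(20000)
]
conn.executemany("INSERT INTO table1(content) VALUES (?)", rows)
conn.executemany("INSERT INTO table2(content) VALUES (?)", rows)


def timed_count(sql):
    """Run a count(*) query and return (count, elapsed seconds)."""
    start = time.perf_counter()
    (count,) = conn.execute(sql).fetchone()
    return count, time.perf_counter() - start


fts_count, fts_seconds = timed_count(
    "SELECT count(*) FROM table1 WHERE content MATCH 'linux'"
)
like_count, like_seconds = timed_count(
    "SELECT count(*) FROM table2 WHERE content LIKE '%linux%'"
)

print("MATCH:", fts_count, "rows,", round(fts_seconds, 4), "s")
print("LIKE: ", like_count, "rows,", round(like_seconds, 4), "s")
```

MATCH looks the term up in the full-text index, while LIKE with a leading wildcard has to scan every row, which is why the gap grows with table size.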
Creating tables
• CREATE VIRTUAL TABLE users USING fts3(
USER_ID INTEGER PRIMARY KEY AUTOINCREMENT,
NAME TEXT NOT NULL,
PHONE INTEGER NOT NULL,
tokenize=porter
);
/* Column types and constraints are parsed but not enforced by FTS3 */
Deleting tables
• CREATE VIRTUAL TABLE data USING fts3();
• DROP TABLE data;
Populating FTS Tables
• Regular INSERT, UPDATE, and DELETE statements are used
• Every FTS table has an implicit 'rowid' column (also available as 'docid')
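As a minimal sketch of the points above, the following Python snippet (standard sqlite3 module, assuming an FTS3-enabled build) inserts, updates, and deletes rows in an FTS table and reads the rowid. The table name docs, its columns, and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts3(title, body)")

# INSERT with an explicit docid (docid is an alias for rowid in FTS tables).
conn.execute(
    "INSERT INTO docs(docid, title, body) VALUES (?, ?, ?)",
    (42, "Guide", "android tips"),
)
# INSERT without a docid: SQLite assigns one automatically.
conn.execute("INSERT INTO docs(title, body) VALUES (?, ?)", ("Intro", "linux basics"))

# UPDATE re-indexes the changed row.
conn.execute("UPDATE docs SET body = ? WHERE rowid = ?", ("linux tips", 42))
linux_rowids = sorted(
    row[0] for row in conn.execute("SELECT rowid FROM docs WHERE docs MATCH 'linux'")
)
android_hits = conn.execute(
    "SELECT count(*) FROM docs WHERE docs MATCH 'android'"
).fetchone()[0]

# DELETE removes the row and its index entries.
conn.execute("DELETE FROM docs WHERE rowid = ?", (42,))
remaining = conn.execute("SELECT count(*) FROM docs").fetchone()[0]

print(linux_rowids, android_hits, remaining)
```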
Triggers
• CREATE TRIGGER TRIGGER_INSERT_USER AFTER INSERT ON USER BEGIN INSERT INTO USER_SEARCH_TABLE VALUES(new.user_id, new.user_name); END;
• CREATE TRIGGER TRIGGER_UPDATE_USER AFTER UPDATE ON USER BEGIN UPDATE USER_SEARCH_TABLE SET user_name=new.user_name WHERE user_id=old.user_id; END;
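The trigger approach can be tried end to end with a sketch like the one below (Python's sqlite3 module, assuming an FTS3-enabled build). It is a variation on the slide rather than a copy: the table names user and user_search, the use of docid to carry the user id, and the extra delete trigger are assumptions chosen to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user(user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
CREATE VIRTUAL TABLE user_search USING fts3(user_name);

-- Mirror every change of the ordinary table into the FTS table.
-- The FTS docid stores the user_id so rows can be matched up later.
CREATE TRIGGER trigger_insert_user AFTER INSERT ON user BEGIN
    INSERT INTO user_search(docid, user_name) VALUES (new.user_id, new.user_name);
END;
CREATE TRIGGER trigger_update_user AFTER UPDATE ON user BEGIN
    UPDATE user_search SET user_name = new.user_name WHERE docid = old.user_id;
END;
CREATE TRIGGER trigger_delete_user AFTER DELETE ON user BEGIN
    DELETE FROM user_search WHERE docid = old.user_id;
END;
""")

conn.execute("INSERT INTO user(user_id, user_name) VALUES (1, 'alice')")
conn.execute("INSERT INTO user(user_id, user_name) VALUES (2, 'bob')")
conn.execute("UPDATE user SET user_name = 'carol' WHERE user_id = 1")
conn.execute("DELETE FROM user WHERE user_id = 2")


def search(term):
    """Return the docids in user_search whose user_name matches the term."""
    sql = "SELECT docid FROM user_search WHERE user_name MATCH ? ORDER BY docid"
    return [row[0] for row in conn.execute(sql, (term,))]


print(search("carol"), search("alice"), search("bob"))
```

Keying the FTS rows by docid lets the update and delete triggers locate the row through the primary key instead of comparing an indexed text column.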
Queries
• Query by rowid
• SELECT * FROM user WHERE rowid = 15;
• Full-text query
• SELECT * FROM SEARCH_USER_DATA
WHERE SEARCH_USER_DATA MATCH 'starcraft';
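The two query styles can be compared in a short sketch (Python's sqlite3 module, assuming an FTS3-enabled build). The columns name and interests and the sample rows are assumptions for illustration; the example also shows that MATCH against the table name searches all columns, while MATCH against a column name searches only that column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE search_user_data USING fts3(name, interests)")
conn.executemany(
    "INSERT INTO search_user_data(docid, name, interests) VALUES (?, ?, ?)",
    [
        (15, "Ann", "starcraft and chess"),
        (16, "Starcraft Fan", "hiking"),
    ],
)

# Query by rowid: a direct lookup, no full-text index involved.
by_rowid = conn.execute(
    "SELECT name FROM search_user_data WHERE rowid = 15"
).fetchall()

# Full-text query against the table name: all columns are searched.
any_column = [
    row[0]
    for row in conn.execute(
        "SELECT rowid FROM search_user_data "
        "WHERE search_user_data MATCH 'starcraft' ORDER BY rowid"
    )
]

# Full-text query against one column only.
one_column = [
    row[0]
    for row in conn.execute(
        "SELECT rowid FROM search_user_data "
        "WHERE interests MATCH 'starcraft' ORDER BY rowid"
    )
]

print(by_rowid, any_column, one_column)
```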
Full-text Index Queries
• Token or token prefix queries
SELECT * FROM docs WHERE docs MATCH 'linux';
SELECT * FROM docs WHERE docs MATCH 'lin*';
• Phrase queries
SELECT * FROM docs WHERE docs MATCH '"linux applications"';
SELECT * FROM docs WHERE docs MATCH '"lin* app*"';
• NEAR queries
SELECT * FROM users WHERE users MATCH 'android NEAR starcraft';
SELECT * FROM users WHERE users MATCH 'android NEAR/5 starcraft';
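To see how the query forms above differ, the sketch below runs each of them against four sample documents (Python's sqlite3 module, assuming an FTS3-enabled build). The single table docs, the sample texts, and the helper match are assumptions for illustration. By default NEAR allows up to 10 intervening terms, and NEAR/5 lowers that limit to 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts3(content)")
conn.executemany(
    "INSERT INTO docs(docid, content) VALUES (?, ?)",
    [
        (1, "linux applications for developers"),
        (2, "linear algebra applied"),
        (3, "android games such as starcraft"),  # 3 terms between the two words
        (4, "android one two three four five six seven starcraft"),  # 7 terms
    ],
)


def match(query):
    """Return the docids of all documents matching the full-text query."""
    sql = "SELECT docid FROM docs WHERE docs MATCH ? ORDER BY docid"
    return [row[0] for row in conn.execute(sql, (query,))]


token = match("linux")                         # exact token
prefix = match("lin*")                         # any token starting with 'lin'
phrase = match('"linux applications"')         # adjacent tokens, in order
prefix_phrase = match('"lin* app*"')           # phrase built from prefixes
near_default = match("android NEAR starcraft")  # at most 10 terms apart
near_five = match("android NEAR/5 starcraft")   # at most 5 terms apart

print(token, prefix, phrase, prefix_phrase, near_default, near_five)
```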
Tokenizers
• Tokenizer is a set of rules for extracting terms from a document
• Default value is ‘simple’
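• Simple: converts ASCII letters to lower case and splits the text into runs of alphanumeric characters
• Porter: like simple, then reduces each English word to its stem using the Porter stemming algorithm
• ICU: locale-specific (e.g. tokenize=icu th_TH for Thai as used in Thailand)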
• Custom implementation
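The difference between the simple and porter tokenizers can be checked with a small sketch (Python's sqlite3 module, assuming an FTS3-enabled build; ICU is left out because it is only available in builds compiled with ICU support). The table names simple_docs and porter_docs, the sample text, and the helper hits are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE simple_docs USING fts3(content, tokenize=simple)")
conn.execute("CREATE VIRTUAL TABLE porter_docs USING fts3(content, tokenize=porter)")

text = "Running Applications"
conn.execute("INSERT INTO simple_docs(content) VALUES (?)", (text,))
conn.execute("INSERT INTO porter_docs(content) VALUES (?)", (text,))


def hits(table, query):
    """Count rows of the given FTS table that match the query."""
    # The table name is interpolated only because it comes from this script.
    sql = f"SELECT count(*) FROM {table} WHERE content MATCH ?"
    return conn.execute(sql, (query,)).fetchone()[0]


simple_exact = hits("simple_docs", "running")  # case is folded, so this matches
simple_stem = hits("simple_docs", "run")       # no stemming: 'run' != 'running'
porter_stem = hits("porter_docs", "run")       # 'running' is indexed as 'run'

print(simple_exact, simple_stem, porter_stem)
```

With porter, both the indexed text and the query terms are stemmed, so different inflections of the same English word match each other.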