
SQLite


Page 1: SQLite

SQLite full-text search extension


FTS3 and FTS4

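• Let you create special virtual tables with a built-in full-text index

• Consume more disk space than an ordinary table holding the same text, because the full-text index is stored in additional tables alongside the data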
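A minimal sketch of where the extra space goes, written with Python's standard `sqlite3` module and assuming its SQLite build includes FTS3/FTS4 (most builds do). Creating an FTS table also creates ordinary "shadow" tables that store the text and the index, and FTS4 adds two more than FTS3. The table names `notes3`/`notes4` and the helper `shadow_tables` are hypothetical.

```python
import sqlite3


def shadow_tables(conn, fts_table):
    """Return the ordinary tables SQLite created to back an FTS virtual table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows if name.startswith(fts_table + "_")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE notes3 USING fts3(body)")
conn.execute("CREATE VIRTUAL TABLE notes4 USING fts4(body)")

print(shadow_tables(conn, "notes3"))
# ['notes3_content', 'notes3_segdir', 'notes3_segments']
print(shadow_tables(conn, "notes4"))
# ['notes4_content', 'notes4_docsize', 'notes4_segdir', 'notes4_segments', 'notes4_stat']
```

The `_content` table keeps a copy of the original text, while `_segments` and `_segdir` hold the inverted index; FTS4's `_docsize` and `_stat` store per-document statistics.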


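Benchmark on a table of 500 000 rows

FTS3

• CREATE VIRTUAL TABLE table1 USING fts3(content TEXT); /* FTS3 table */

• SELECT count(*) FROM table1 WHERE content MATCH 'linux'; /* 0.03 seconds */

Ordinary table

• CREATE TABLE table2(content TEXT); /* Ordinary table */

• SELECT count(*) FROM table2 WHERE content LIKE '%linux%'; /* 22.5 seconds */

• MATCH looks the term up in the full-text index, while LIKE '%linux%' has to scan every row. The two queries are not strictly equivalent: MATCH finds the whole token "linux", whereas LIKE also matches substrings such as "linuxes".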
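A small-scale sketch of the same comparison, assuming Python's `sqlite3` module with FTS3 compiled in. The sample rows and the `compare` helper are hypothetical (the slide's 500 000-row dataset is not available), so the timings will not reproduce the slide's numbers; only the counts are deterministic, and they show the token-versus-substring difference.

```python
import sqlite3
import time

# Hypothetical sample rows standing in for the slide's dataset.
SAMPLE_ROWS = ["Linux kernel news", "I love linuxes", "Windows drivers"]


def compare(rows):
    """Return (match_count, like_count, match_seconds, like_seconds)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE table1 USING fts3(content TEXT)")
    conn.execute("CREATE TABLE table2(content TEXT)")
    params = [(row,) for row in rows]
    conn.executemany("INSERT INTO table1(content) VALUES (?)", params)
    conn.executemany("INSERT INTO table2(content) VALUES (?)", params)

    # Full-text index lookup.
    start = time.perf_counter()
    (match_count,) = conn.execute(
        "SELECT count(*) FROM table1 WHERE content MATCH 'linux'"
    ).fetchone()
    match_seconds = time.perf_counter() - start

    # Substring search: every row is scanned.
    start = time.perf_counter()
    (like_count,) = conn.execute(
        "SELECT count(*) FROM table2 WHERE content LIKE '%linux%'"
    ).fetchone()
    like_seconds = time.perf_counter() - start

    conn.close()
    return match_count, like_count, match_seconds, like_seconds


# MATCH finds only the token "linux"; LIKE also matches the substring in "linuxes".
print(compare(SAMPLE_ROWS)[:2])  # (1, 2)
# Repeating the rows gives measurable, machine-dependent timings.
print(compare(SAMPLE_ROWS * 20_000))
```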


Creating tables

• CREATE VIRTUAL TABLE users USING fts3(
      USER_ID,
      NAME,
      PHONE,
      tokenize=porter
  );

• FTS parses but ignores column types and constraints (INTEGER, PRIMARY KEY, AUTOINCREMENT, NOT NULL, UNIQUE ... ON CONFLICT), so they are left out here; every row instead gets an implicit integer rowid
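A quick check of that behaviour, sketched with Python's `sqlite3` module and assuming FTS3 is available. It declares the columns the way the original slide did (minus the UNIQUE table constraint, which FTS would read as a column named UNIQUE) and then inserts rows that those declarations would reject in an ordinary table. The sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Types and constraints are accepted by the parser but not enforced by FTS.
conn.execute(
    "CREATE VIRTUAL TABLE users USING fts3("
    "USER_ID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "NAME TEXT NOT NULL, "
    "PHONE INTEGER NOT NULL, "
    "tokenize=porter)"
)
# Text in an "INTEGER PRIMARY KEY" column, a NULL in a "NOT NULL" column,
# and a duplicate USER_ID: all accepted.
conn.execute("INSERT INTO users(USER_ID, NAME, PHONE) VALUES ('abc', NULL, '555-0100')")
conn.execute("INSERT INTO users(USER_ID, NAME, PHONE) VALUES ('abc', 'Alice', '555-0100')")

# The real key is the implicit rowid, assigned independently of USER_ID.
rows = conn.execute("SELECT rowid, USER_ID, NAME FROM users ORDER BY rowid").fetchall()
print(rows)  # [(1, 'abc', None), (2, 'abc', 'Alice')]
```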


Deleting tables

• CREATE VIRTUAL TABLE data USING fts3();

• DROP TABLE data;
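A sketch of the two statements above using Python's `sqlite3` module, assuming FTS3 is compiled in. It shows that `fts3()` with no column list creates a single column named `content`, and that `DROP TABLE` on the virtual table also removes its shadow tables. The helper `table_names` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE data USING fts3()")


def table_names(conn):
    """All tables (virtual and ordinary) currently in the schema."""
    return sorted(
        name for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    )


before = table_names(conn)
# With no column list, FTS creates one column named "content".
columns = [d[0] for d in conn.execute("SELECT * FROM data").description]
conn.execute("DROP TABLE data")
after = table_names(conn)

print(before)   # ['data', 'data_content', 'data_segdir', 'data_segments']
print(columns)  # ['content']
print(after)    # []
```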


Populating FTS Tables

• FTS tables are populated with the regular INSERT, UPDATE and DELETE statements

• Every FTS table has a hidden rowid column (also available under the alias docid)
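A minimal sketch of populating an FTS table with ordinary DML, assuming Python's `sqlite3` module with FTS3 available. The `docs` table, its rows and the `search` helper are hypothetical. It sets the rowid explicitly through `docid`, then updates and deletes rows and checks that the full-text index follows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts3(title, body)")

# INSERT, optionally choosing the rowid through its alias docid.
conn.execute(
    "INSERT INTO docs(docid, title, body) VALUES (15, 'Intro', 'SQLite full-text search')"
)
conn.execute("INSERT INTO docs(title, body) VALUES ('Linux', 'linux applications')")

# UPDATE and DELETE are written as for an ordinary table; the index is kept in sync.
conn.execute("UPDATE docs SET body = 'linux notes' WHERE rowid = 15")
conn.execute("DELETE FROM docs WHERE title = 'Linux'")


def search(term):
    return conn.execute(
        "SELECT rowid, title FROM docs WHERE docs MATCH ? ORDER BY rowid", (term,)
    ).fetchall()


print(search("linux"))   # [(15, 'Intro')]
print(search("search"))  # [] (the old body is no longer indexed)
```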


Triggers

• Triggers keep an FTS search table in sync with an ordinary USER table

• CREATE TRIGGER TRIGGER_INSERT_USER AFTER INSERT ON USER BEGIN
      INSERT INTO USER_SEARCH_TABLE VALUES(new.user_id, new.user_name);
  END;

• CREATE TRIGGER TRIGGER_UPDATE_USER AFTER UPDATE ON USER BEGIN
      UPDATE USER_SEARCH_TABLE SET user_name = new.user_name WHERE user_id = old.user_id;
  END;
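A runnable sketch of the trigger pattern with Python's `sqlite3` module, assuming FTS3 is available. The slides do not show the schemas, so the USER and USER_SEARCH_TABLE definitions, the sample names and the `search` helper are assumptions; the DELETE trigger is a hypothetical addition, not from the slide, so removed users also disappear from the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schemas; the slides only show the triggers.
    CREATE TABLE USER(user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE VIRTUAL TABLE USER_SEARCH_TABLE USING fts3(user_id, user_name);

    CREATE TRIGGER TRIGGER_INSERT_USER AFTER INSERT ON USER BEGIN
        INSERT INTO USER_SEARCH_TABLE VALUES(new.user_id, new.user_name);
    END;
    CREATE TRIGGER TRIGGER_UPDATE_USER AFTER UPDATE ON USER BEGIN
        UPDATE USER_SEARCH_TABLE SET user_name = new.user_name
        WHERE user_id = old.user_id;
    END;
    -- Hypothetical extra trigger (not on the slide) mirroring deletes.
    CREATE TRIGGER TRIGGER_DELETE_USER AFTER DELETE ON USER BEGIN
        DELETE FROM USER_SEARCH_TABLE WHERE user_id = old.user_id;
    END;
""")

conn.execute("INSERT INTO USER(user_id, user_name) VALUES (1, 'jim raynor')")
conn.execute("INSERT INTO USER(user_id, user_name) VALUES (2, 'sarah kerrigan')")
conn.execute("UPDATE USER SET user_name = 'james raynor' WHERE user_id = 1")
conn.execute("DELETE FROM USER WHERE user_id = 2")


def search(term):
    """user_id values whose indexed name matches the full-text query."""
    rows = conn.execute(
        "SELECT user_id FROM USER_SEARCH_TABLE WHERE user_name MATCH ?", (term,)
    )
    return [user_id for (user_id,) in rows]


print(search("james"))     # [1]
print(search("jim"))       # []
print(search("kerrigan"))  # []
```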


Queries

• Query by rowid

• SELECT * FROM user WHERE rowid = 15;

• Full-text query (with the table name on the left of MATCH, every column is searched)

• SELECT * FROM SEARCH_USER_DATA WHERE SEARCH_USER_DATA MATCH 'starcraft';
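A sketch of both query styles against one FTS table, assuming Python's `sqlite3` module with FTS3 compiled in. The columns `user_name` and `games` and the sample rows are hypothetical, since the slide does not show the schema of SEARCH_USER_DATA. It also contrasts a table-name MATCH with a column-name MATCH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; the slide does not show this table's schema.
conn.execute("CREATE VIRTUAL TABLE SEARCH_USER_DATA USING fts3(user_name, games)")
conn.executemany(
    "INSERT INTO SEARCH_USER_DATA(docid, user_name, games) VALUES (?, ?, ?)",
    [(15, "jim", "starcraft diablo"), (16, "starcraft", "chess")],
)

# Query by rowid: a direct lookup.
by_rowid = conn.execute(
    "SELECT rowid, user_name FROM SEARCH_USER_DATA WHERE rowid = 15"
).fetchall()

# Table name on the left of MATCH: all columns are searched.
any_column = [rowid for (rowid,) in conn.execute(
    "SELECT rowid FROM SEARCH_USER_DATA "
    "WHERE SEARCH_USER_DATA MATCH 'starcraft' ORDER BY rowid"
)]

# Column name on the left of MATCH: only that column is searched.
games_only = [rowid for (rowid,) in conn.execute(
    "SELECT rowid FROM SEARCH_USER_DATA WHERE games MATCH 'starcraft' ORDER BY rowid"
)]

print(by_rowid)    # [(15, 'jim')]
print(any_column)  # [15, 16]
print(games_only)  # [15]
```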


Full-text Index Queries

• Token or token prefix queries

SELECT * FROM docs WHERE docs MATCH 'linux';

SELECT * FROM docs WHERE docs MATCH 'lin*'; /* any token starting with "lin" */

• Phrase queries

SELECT * FROM docs WHERE docs MATCH '"linux applications"';

SELECT * FROM docs WHERE docs MATCH '"lin* app*"';

• NEAR queries (by default at most 10 terms may separate the two tokens; NEAR/N lowers the limit to N)

SELECT * FROM users WHERE users MATCH 'android NEAR starcraft';

SELECT * FROM users WHERE users MATCH 'android NEAR/5 starcraft';
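The queries above can be tried on a few hypothetical rows with Python's `sqlite3` module, assuming FTS3 is compiled in. The table contents and the `ids` helper are made up for illustration. The second `users` row has seven terms between "android" and "starcraft", so it satisfies plain NEAR but not NEAR/5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts3(body)")
conn.execute("CREATE VIRTUAL TABLE users USING fts3(body)")
conn.executemany("INSERT INTO docs(docid, body) VALUES (?, ?)", [
    (1, "Linux applications for developers"),
    (2, "linear algebra applied"),
    (3, "applications built on linux"),
])
conn.executemany("INSERT INTO users(docid, body) VALUES (?, ?)", [
    (1, "android player who loves starcraft"),                     # 3 terms between
    (2, "android one two three four five six seven starcraft"),  # 7 terms between
    (3, "starcraft without the other word"),
])


def ids(table, query):
    """Rowids matching a full-text query; table names here are fixed constants."""
    sql = f"SELECT rowid FROM {table} WHERE {table} MATCH ? ORDER BY rowid"
    return [rowid for (rowid,) in conn.execute(sql, (query,))]


print(ids("docs", "linux"))                        # [1, 3]
print(ids("docs", "lin*"))                         # [1, 2, 3]
print(ids("docs", '"linux applications"'))         # [1]
print(ids("docs", '"lin* app*"'))                  # [1]
print(ids("users", "android NEAR starcraft"))      # [1, 2]
print(ids("users", "android NEAR/5 starcraft"))    # [1]
```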


Tokenizers

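• A tokenizer is the set of rules used to extract terms from a document (the same rules are applied to query strings)

• The default tokenizer is simple

• Simple: folds ASCII upper case to lower case and splits on every ASCII character that is not alphanumeric (so '_' is a separator; characters with codepoints of 128 and above stay inside terms)

• Porter: like simple, then reduces each English term to its stem with the Porter stemming algorithm (e.g. "running" becomes "run")

• ICU: locale-aware splitting through the ICU library, available only when SQLite is built with ICU support (e.g. tokenize=icu th_TH for Thai; the Turkish locale would be tr_TR)

• Custom tokenizers can also be implemented in C and registered with SQLite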
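A sketch comparing the simple and porter tokenizers, using Python's `sqlite3` module and assuming FTS3 is compiled in (the ICU tokenizer is left out because it needs an ICU-enabled build). The table names, sample text and `hits` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE simple_t USING fts3(body)")  # default tokenizer: simple
conn.execute("CREATE VIRTUAL TABLE porter_t USING fts3(body, tokenize=porter)")
for table in ("simple_t", "porter_t"):
    conn.executemany(
        f"INSERT INTO {table}(body) VALUES (?)",
        [("Running shoes",), ("foo_bar baz",)],
    )


def hits(table, query):
    """Number of rows in `table` whose body matches the full-text query."""
    (n,) = conn.execute(
        f"SELECT count(*) FROM {table} WHERE body MATCH ?", (query,)
    ).fetchone()
    return n


print(hits("simple_t", "running"))  # 1: "Running" was folded to lower case
print(hits("simple_t", "run"))      # 0: simple does no stemming
print(hits("porter_t", "run"))      # 1: "running" was stemmed to "run"
print(hits("simple_t", "bar"))      # 1: "_" splits "foo_bar" into "foo" and "bar"
```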