Sphinx - High performance full-text search for MySQL


Nguyen Van Vuong - Framgia

Agenda

❖ Full-text search
❖ What’s Sphinx ?
❖ Why Sphinx ?
❖ Sphinx workflow
  ➢ Indexing
  ➢ Searching
  ➢ Query syntax
❖ How does it scale ?
❖ More about Sphinx
❖ References

Full-text search


❖ Full-text search is a technique for searching the text stored in a document or database
  ➢ Examines all of the words in every stored document
  ➢ Tries to match them against the search query

❖ Example

Articles: id (integer), title (varchar), content (text), tag (varchar)

SELECT * FROM articles
WHERE MATCH (title, content) AGAINST ('database' IN NATURAL LANGUAGE MODE);

Full-text search - Term search vs Full-text search

❖ Search keywords: “I ate pizza yesterday”
❖ Term search
  ➢ No analysis phase
  ➢ Operates on a single term
❖ Full-text search
  ➢ Tokenizer/analyzer
    ■ Breaks keywords down by whitespace and punctuation
    ■ Charset table
  ➢ Morphology preprocessors
    ■ Normalize both "dogs" and "dog" to "dog"
      ● Eat, eating, eaten, ate
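The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the `LEMMAS` table is a hypothetical stand-in for a real morphology preprocessor, and the function names are made up for this example, not part of Sphinx.

```python
import re

# Hypothetical lemma table standing in for a real morphology preprocessor.
LEMMAS = {"ate": "eat", "eating": "eat", "eaten": "eat", "dogs": "dog"}


def analyze(text):
    """Lowercase, split on whitespace/punctuation, then normalize word forms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [LEMMAS.get(tok, tok) for tok in tokens]


def term_search(query, document):
    """No analysis phase: the query is treated as one term and must match exactly."""
    return query == document


def full_text_search(query, document):
    """Every analyzed query token must occur among the analyzed document tokens."""
    doc_tokens = set(analyze(document))
    return all(tok in doc_tokens for tok in analyze(query))


doc = "Yesterday we were eating pizza, and the dogs watched."
print(analyze("I ate pizza yesterday"))
print(term_search("I ate pizza yesterday", doc))
print(full_text_search("ate pizza yesterday", doc))
```

The term search fails because the stored value is not identical to the query, while the full-text search succeeds because "ate" and "eating" are both normalized to "eat" before matching.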

What’s Sphinx ?


❖ Sphinx is a mythical creature with the head of a human and the body of a lion


❖ Full-text search engine
❖ Free and open source (GPL v2)
❖ Started 10 years ago
❖ High performance
❖ Integrates well with SQL databases
❖ APIs exist for Perl, C#, Ruby, Java, PHP
❖ Available for Linux, Windows, Mac OS

Why Sphinx ?


❖ Quick to learn

❖ Easy to use

❖ Simple to maintain


❖ Speed
  ➢ 50x-100x faster than MySQL Fulltext
  ➢ Up to 1000x faster than MySQL in extreme cases (e.g. large result sets with GROUP BY)
❖ Feature-rich
  ➢ Relevancy (BM25)
  ➢ Synonyms
  ➢ Stopwords
  ➢ Real-time index
  ➢ ...
❖ Scalable
  ➢ Aggregates search results from many sources
  ➢ Fully transparent to calling application
  ➢ Built-in load balancing
❖ Easy to integrate
  ➢ SphinxAPI
  ➢ SphinxQL

Sphinx workflow

[Diagram: Application, Database, Sphinx Daemon, Sphinx Indexer, Sphinx Index]

1. Search query
2. Search results (IDs)
3. Fetch doc by ID
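The three numbered steps can be sketched as follows. Nothing here talks to a real Sphinx daemon: the `fake_index` dictionary is a hypothetical stand-in for Sphinx, and an in-memory SQLite table is an assumed stand-in for the MySQL database. The point is only the shape of the flow: the search engine returns IDs, and the application fetches the full rows itself.

```python
import sqlite3

# Stand-in for the database that holds the full documents.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [(1, "Sphinx basics"), (2, "MySQL tuning"), (3, "Sphinx and MySQL")],
)

# Stand-in for the Sphinx index: keyword -> document IDs, best match first.
fake_index = {"sphinx": [3, 1], "mysql": [2, 3]}


def search_ids(keyword):
    """Steps 1 and 2: send the search query, get back matching document IDs."""
    return fake_index.get(keyword.lower(), [])


def fetch_documents(ids):
    """Step 3: fetch the documents by ID, keeping the search engine's order."""
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = db.execute(
        f"SELECT id, title FROM articles WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = dict(rows)
    return [(doc_id, by_id[doc_id]) for doc_id in ids if doc_id in by_id]


print(fetch_documents(search_ids("sphinx")))
```

Re-sorting the rows by the ID order returned from the search step matters because `WHERE id IN (...)` does not preserve relevance order on its own.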

Sphinx workflow - Indexing

❖ Configuration
  ➢ sphinx.conf

❖ Data sources

❖ Character level
  ➢ charset_table
    ■ Use ranges: a..z, U+410..U+42F
  ➢ ngram_chars
    ■ Hieroglyphs as separate tokens
      ● Chinese, Japanese, …
      ● Unicode charset CJKV
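A rough Python sketch of what "hieroglyphs as separate tokens" means in practice. The code point range in `NGRAM_RANGES` is an assumption chosen for illustration, and the tokenizer is a simplified model, not Sphinx's actual implementation: characters inside the range each become their own token, while other letters and digits are grouped into words.

```python
# Assumed CJK code point range for illustration only.
NGRAM_RANGES = [(0x3000, 0x2FA1F)]


def is_ngram_char(ch):
    """True when the character falls inside one of the configured ranges."""
    return any(lo <= ord(ch) <= hi for lo, hi in NGRAM_RANGES)


def tokenize(text):
    """Split text into words, emitting each n-gram character as its own token."""
    tokens, word = [], []
    for ch in text.lower():
        if is_ngram_char(ch):
            if word:
                tokens.append("".join(word))
                word = []
            tokens.append(ch)
        elif ch.isalnum():
            word.append(ch)
        else:
            # Whitespace or punctuation ends the current word.
            if word:
                tokens.append("".join(word))
                word = []
    if word:
        tokens.append("".join(word))
    return tokens


print(tokenize("Sphinx 全文検索 fast"))
```

Because languages such as Chinese and Japanese do not separate words with spaces, indexing each character individually lets a phrase query still find a multi-character word.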


❖ Word level
  ➢ Stopwords
    ■ Avoid wasting index space
    ■ Example
      ● Words you don’t want to search for (like “I”, “am”, “an”, etc.)
  ➢ Stemming
    ■ A single word can appear in many forms when used in different contexts
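The space saving from stopwords can be illustrated with a toy inverted index in Python. This is a simplified model, assuming whitespace tokenization and the three stopwords named on the slide; the function name `build_index` is hypothetical.

```python
from collections import defaultdict

# The stopwords used as examples on the slide.
STOPWORDS = {"i", "am", "an"}


def build_index(docs, stopwords=frozenset()):
    """Map each word to the set of document IDs containing it, skipping stopwords."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            if word not in stopwords:
                index[word].add(doc_id)
    return dict(index)


docs = {1: "I am an engineer", 2: "I am reading an article"}
full = build_index(docs)
trimmed = build_index(docs, STOPWORDS)
print(len(full), len(trimmed))
```

In this tiny corpus half of the index entries are stopwords that appear in every document, so they cost space without helping to distinguish one document from another.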

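❖ Building index

# Start the search daemon
$ sudo service sphinxsearch start
# Build every index defined in the config file
$ sudo indexer --config <file> --all
# Rebuild all indexes while searchd is running, then swap them in
$ sudo indexer --config <file> --all --rotate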

Sphinx workflow - Searching

❖ Configuring search daemon

searchd {
    listen = localhost:9312
    listen = 9306:mysql41
    log = /var/log/sphinxsearch/searchd.log
    query_log = /var/log/sphinxsearch/query.log
    read_timeout = 5
    client_timeout = 300
    max_children = 30
    persistent_connections_limit = 30
    pid_file = /var/run/sphinxsearch/searchd.pid
    ...
}

❖ Sphinx API
  ➢ Perl, C#, Ruby, Java, PHP
  ➢ Example in PHP

❖ SphinxQL
  ➢ Connect via MySQL client

$ mysql -h<ip> -P<port_of_sphinx>

  ➢ Query like MySQL

SELECT * FROM myindex
WHERE MATCH ('@(title,content) find me fast');

Sphinx workflow - Query syntax

❖ Boolean search (OR, AND, NOT)
  hello | world
  hello & world
  hello -world
❖ Per-field search
  @title hello @body world
❖ Field combination
  @(title, body) hello world
❖ Search within first N words
  @body[50] hello
❖ Phrase search
  "hello world"
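The meaning of the three boolean operators can be shown with plain Python sets. This is a conceptual sketch, assuming a hypothetical two-term inverted index; it says nothing about how Sphinx evaluates queries internally.

```python
# Hypothetical inverted index: term -> IDs of the documents that contain it.
index = {"hello": {1, 2, 3}, "world": {2, 3, 4}}


def docs(term):
    """Return the set of document IDs for a term (empty if the term is unknown)."""
    return index.get(term, set())


either = docs("hello") | docs("world")   # hello | world
both = docs("hello") & docs("world")     # hello & world
without = docs("hello") - docs("world")  # hello -world

print(sorted(either), sorted(both), sorted(without))
```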


❖ Per-field relevancy ranking weights
❖ Matching modes
  SPH_MATCH_ALL
  SPH_MATCH_ANY
  SPH_MATCH_FULLSCAN
❖ Proximity search
  "people passion"~3
❖ GEO distance search (with syntax for mi/km/m)
  GEODIST(0.659298124, -2.136602399, latitude, longitude)
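To make the distance calculation concrete, here is a haversine sketch in Python. It assumes coordinates in radians (the values in the example above look like radians) and a result in meters; the formula and the `EARTH_RADIUS_M` constant are a textbook approximation chosen for illustration and are not necessarily what Sphinx computes internally.

```python
import math

# Mean Earth radius in meters; an assumed constant for this sketch.
EARTH_RADIUS_M = 6_371_000


def geodist(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Distance from the anchor point used in the example to itself.
print(geodist(0.659298124, -2.136602399, 0.659298124, -2.136602399))
```

A result in meters can be converted to kilometers or miles by dividing by 1000 or 1609.344 respectively.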

How does it scale ?


❖ Distribution is done horizontally
  ➢ Search is performed across different nodes
❖ Set up an index on multiple servers

❖ Adding distributed index configuration
  ➢ First server (192.168.1.1)

index master {
    type = distributed
    # Local index to be searched
    local = items
    # Remote agent (index) to be searched
    agent = 192.168.1.2:9312:items-2
}
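Conceptually, a distributed index searches the local index and each remote agent, then merges the partial results into one ranked list. The Python sketch below illustrates only that merge step, assuming each node returns hypothetical `(weight, doc_id)` pairs; Sphinx performs this work itself, so application code does not need to do it.

```python
import heapq


def merge_results(result_sets, limit):
    """Merge per-node (weight, doc_id) matches and keep the best `limit` overall."""
    all_matches = (match for results in result_sets for match in results)
    return heapq.nlargest(limit, all_matches)


# Hypothetical partial results from the local index and one remote agent.
local_matches = [(2500, 11), (1800, 14)]
remote_matches = [(2700, 205), (1500, 230)]

print(merge_results([local_matches, remote_matches], limit=3))
```

Because every node only returns its own best matches, the merge stays cheap even when the underlying indexes are large.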

More about Sphinx

❖ Biggest known Sphinx cluster
  ➢ Indexes 25+ billion documents
  ➢ Over 9TB of data
  ➢ 1+ million searches/day
❖ Busiest known Sphinx cluster
  ➢ 300+ million search queries/day
❖ Books

References

❖ Sphinx documentation (v2.2.1)
❖ Sphinx Search Beginner's Guide - Abbas Ali
❖ Meet the Sphinx - Andrew Aksyonoff
❖ Advanced fulltext search with Sphinx - Adrian Nuta
❖ Search Big Data with MySQL and Sphinx - Mindaugas Zukas

Thank you

Time for action


https://github.com/euclid1990/php-sphinx-search
