Redis Labs – Home of Redis
• Founded in 2011
• HQ in Mountain View CA, R&D center in Tel-Aviv IL
The commercial company behind Open Source Redis
Provider of the Redis Enterprise (Redisᵉ) technology, platform and products
Redis Labs Products
SERVICES
● Redisᵉ Cloud: fully managed Redisᵉ service on hosted servers within AWS, MS Azure, GCP, IBM Softlayer, Heroku, CF & OpenShift
● Redisᵉ Cloud Private: fully managed Redisᵉ service in VPCs within AWS, MS Azure, GCP & IBM Softlayer
SOFTWARE
● Redisᵉ Pack: downloadable Redisᵉ software for any enterprise datacenter or cloud environment
● Redisᵉ Pack Managed: fully managed Redisᵉ Pack in private data centers
So… Searching in Redis?
We all know and love Redis as a data store, but...
● GET user1? no problemo!
● HGETALL user1? sure thing!
● GET users WHERE name contains “jeff”? NOPE!
Redis has no built in concept of indexing or search, duh!
Search 101
The core concept: The Index
● Take a document
● Break it apart
● Map terms/properties ⇒ docs in a Posting List
Searching is getting the documents that are linked to the terms
Building an Index
doc1
{ title: “LCD TV”, price: 500, description: “foo” }
Indexing Process
Documents Index
Term    Posting List
lcd     doc1, doc3, doc5...
tv      doc1, doc7, doc19...
foo     doc1
bar     doc1, doc100, ...
Price Index
….      ….
500     doc1, doc7, doc19...
503     doc3
….      ...
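The indexing process on this slide can be sketched in a few lines of Python. This is only an illustration of the idea (terms and prices mapped to posting lists), not how RediSearch stores its index. The `tokenize` and `build_index` helpers and the contents of doc3 and doc7 are assumptions made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return (term -> set of doc ids, price -> set of doc ids)."""
    term_index = defaultdict(set)
    price_index = defaultdict(set)
    for doc_id, doc in docs.items():
        # Break the text fields apart and link every term to the document.
        for field in ("title", "description"):
            for term in tokenize(doc.get(field, "")):
                term_index[term].add(doc_id)
        # Numeric properties get their own index.
        if "price" in doc:
            price_index[doc["price"]].add(doc_id)
    return term_index, price_index


docs = {
    "doc1": {"title": "LCD TV", "price": 500, "description": "foo"},
    # Hypothetical documents, invented for this sketch:
    "doc3": {"title": "LCD monitor", "price": 503, "description": "bar"},
    "doc7": {"title": "Plasma TV", "price": 500, "description": "bar"},
}
term_index, price_index = build_index(docs)
print(sorted(term_index["lcd"]))
print(sorted(price_index[500]))
```

Searching then reduces to looking up the posting lists for the query terms and combining them.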
So… Searching With Redis?
Redis is just Key=>Value, right?
But can we use its advanced data types?
Key ⇒
● Strings/Blobs/Bitmaps: "I'm a Plain Text String!"
● Hash Tables (objects!): { A: “foo”, B: “bar”, C: “baz” }
● Linked Lists: [ A → B → C → D → E ]
● Sets: { A , B , C , D , E }
● Sorted Sets: { A: 0.1, B: 0.3, C: 100, D: 1337 }
● Geo Sets: { A: (51.5, 0.12), B: (32.1, 34.7) }
● HyperLogLog: 00110101 11001110 10101010
Looks like a set will do!
Searching With Redis - Vanilla Style
● Two most suitable structures: Sorted Sets & Sets
● Remember, the core of searching is:
term ⇒ [doc_id, doc_id, ….]
● So how about this:
redis> SADD hello doc1
redis> SADD world doc1
redis> SINTER hello world
Searching With Redis - Vanilla Style
Even better with Sorted Sets - we can have scores!
> ZADD hello 1.0 doc1 0.5 doc2
> ZADD world 1.0 doc1 0.5 doc2
> ZINTERSTORE hw 2 hello world
> ZRANGE hw 0 -1
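To see what the two vanilla approaches compute without a running server, here is a rough Python model of them. Plain sets and dicts stand in for Redis Sets and Sorted Sets; `sinter` and `zinter` are hypothetical helper names, and `zinter` assumes the default SUM aggregation of ZINTERSTORE.

```python
def sinter(index, *terms):
    """Like SINTER: documents present in every term's set."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def zinter(index, *terms):
    """Like ZINTERSTORE with SUM: keep members found under every term, add their scores."""
    if not terms:
        return {}
    common = set(index.get(terms[0], {}))
    for term in terms[1:]:
        common &= set(index.get(term, {}))
    return {doc: sum(index[term][doc] for term in terms) for doc in common}


# term -> set of doc ids (the SADD example)
set_index = {"hello": {"doc1"}, "world": {"doc1"}}
# term -> {doc id: score} (the ZADD example)
scored_index = {
    "hello": {"doc1": 1.0, "doc2": 0.5},
    "world": {"doc1": 1.0, "doc2": 0.5},
}

matches = sinter(set_index, "hello", "world")
hw = zinter(scored_index, "hello", "world")
# ZRANGE returns members ordered from lowest to highest score.
ranked = sorted(hw, key=lambda doc: hw[doc])
print(matches, hw, ranked)
```

With scores attached, a document that matches both terms strongly ends up with a higher combined score than one that matches them weakly.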
Vanilla Style Only Goes So Far...
● LOTS of memory - ZSETs take about 60 bytes of overhead per record!
● Intersections and unions can get slow
● No exact phrase matching
● No proximity boosting
● etc, etc...
Enter Redis Modules
● Available since Redis 4.0
● Dynamic libraries loaded into Redis
● With a C API
● Can extend Redis with:
○ New Capabilities
○ New Commands
○ New Data Types
Search With Modules
● So now Everything Is Possible™!
● We implement our own structures & algorithms!
● With less memory!
● And less CPU time!
● And more capabilities!
● And use existing libraries!
Why Write From Scratch?
● Most widely used search libraries are either:
○ In Java
○ Very complex and bloated
○ Written with disk in mind
Plus, where is the fun in that?
Introducing RediSearch
● Completely from-scratch, in C
● Optimized data structures
● Stores documents as Hashes
● Extremely fast indexing and search
● Text, numeric and geo filters
● Fast non-blocking updates and deletes
● Scalable distributed mode up to billions of docs
RediSearch In Action
> FT.CREATE products SCHEMA title TEXT price NUMERIC
> FT.ADD products prod1 1.0 FIELDS title "Toshiba 32 LCD TV" price 250
> FT.ADD products prod2 1.0 FIELDS title "Samsung 42 LCD TV" price 350
> FT.SEARCH products "lcd tv"
1) (integer) 2
2) "prod1"
…
4) "prod2"
…
> FT.SEARCH products "lcd tv" FILTER price 300 400
1) (integer) 1
2) "prod2"
RediSearch Data Structures
Key ⇒
● Document Table: {1: prod1, 2: prod2, …}
● Posting Lists (Inverted Index): [prod1, prod2, …]
● Documents (HASHes): {title:“Toshiba 32 LCD TV”, …}
● AutoComplete (Trie): Toshib*
● Numeric Index (Range tree): {200..400: [prod1, prod2, …]… }
● Geo Sets: { A: (51.5, 0.12), B: (32.1, 34.7) }
RediSearch Search Request
FT.SEARCH {index} {query}
    [NOCONTENT] [VERBATIM] [NOSTOPWORDS] [WITHSCORES] [WITHPAYLOADS]
    [FILTER {numeric_field} {min} {max}] ...
    [GEOFILTER {geo_field} {lon} {lat} {radius} m|km|mi|ft]
    [INKEYS {num} {key} ... ]
    [INFIELDS {num} {field} ... ]
    [SLOP {slop}] [INORDER]
    [LANGUAGE {language}]
    [EXPANDER {expander}]
    [SCORER {scorer}]
    [PAYLOAD {payload}]
    [SORTBY {field} [ASC|DESC]]
    [LIMIT offset num]
Search Request Flow
Request → Request/Query Parsing → Query Expansion → Query Execution → Document Ranking → Top-n + Paging → Document Loading → Response
Evaluating Queries
● Document at-a-time approach
● Indexes are decoded as iterators, one object per round
● Top N results copied into a heap & sorted on the fly
[Diagram: query tree INTERSECT( UNION(Barack, Barrack), Obama, Date: 2010-2016 ) feeding a Scoring function]
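The document-at-a-time idea can be illustrated with a small Python sketch. It assumes each posting list is a sorted sequence of numeric document ids. The posting lists, the scores and the helper names (`intersect`, `union`, `top_n`) are invented for the example and are not RediSearch internals.

```python
import heapq


def intersect(posting_lists):
    """Yield doc ids found in every sorted posting list, advancing all iterators in step."""
    iters = [iter(pl) for pl in posting_lists]
    if not iters:
        return
    try:
        current = [next(it) for it in iters]
        while True:
            target = max(current)
            if all(doc_id == target for doc_id in current):
                yield target
                current = [next(it) for it in iters]
            else:
                # Skip every iterator forward until it reaches the largest id seen.
                for i, it in enumerate(iters):
                    while current[i] < target:
                        current[i] = next(it)
    except StopIteration:
        return  # one list is exhausted, so no further matches exist


def union(posting_lists):
    """Yield each doc id found in any sorted posting list once, in order."""
    last = None
    for doc_id in heapq.merge(*posting_lists):
        if doc_id != last:
            yield doc_id
            last = doc_id


def top_n(doc_ids, score, n):
    """Keep only the n best-scored documents in a min-heap while iterating."""
    heap = []
    for doc_id in doc_ids:
        item = (score(doc_id), doc_id)
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)
    return [doc_id for _, doc_id in sorted(heap, reverse=True)]


# Hypothetical posting lists for the query tree in the diagram.
barack, barrack = [1, 4, 7, 9], [2, 4, 8]
obama, date_2010_2016 = [1, 2, 4, 8, 9], [1, 2, 3, 4, 9]
scores = {1: 0.2, 2: 0.9, 4: 0.5, 9: 0.7}  # made-up scoring function output

matching = list(intersect([union([barack, barrack]), obama, date_2010_2016]))
best = top_n(matching, scores.get, 2)
print(matching, best)
```

Because every operator is an iterator, no intermediate result set is materialized; only the small heap of current best results is kept.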
Scoring Results
● Each document in the query's result is evaluated by a Scoring Function
● Top-N scored documents are selected
● The default uses:
○ Term Frequency–Inverse Document Frequency (TF-IDF)
○ User-given a priori document score
○ Proximity boosting
● And you can write your own function!
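As a rough illustration of the first two ingredients, here is a textbook-style TF-IDF score multiplied by an a priori document score. The exact formula RediSearch uses is not given in these slides, so the weighting below (`tf_idf_score`, the `log(1 + N/df)` form of IDF) is an assumption, and proximity boosting is left out.

```python
import math


def tf_idf_score(query_terms, doc_terms, doc_freq, num_docs, doc_score=1.0):
    """Sum tf * idf over the query terms, then scale by the a priori document score."""
    if not doc_terms:
        return 0.0
    total = 0.0
    for term in query_terms:
        tf = doc_terms.count(term) / len(doc_terms)
        if tf == 0:
            continue
        # Rare terms (low document frequency) contribute more.
        idf = math.log(1 + num_docs / doc_freq[term])
        total += tf * idf
    return total * doc_score


# Hypothetical tokenized corpus.
corpus = {
    "prod1": ["toshiba", "32", "lcd", "tv"],
    "prod2": ["samsung", "42", "lcd", "tv"],
    "prod3": ["sony", "radio"],
}
doc_freq = {}
for terms in corpus.values():
    for term in set(terms):
        doc_freq[term] = doc_freq.get(term, 0) + 1

query = ["lcd", "tv"]
plain = tf_idf_score(query, corpus["prod1"], doc_freq, len(corpus))
boosted = tf_idf_score(query, corpus["prod1"], doc_freq, len(corpus), doc_score=2.0)
miss = tf_idf_score(query, corpus["prod3"], doc_freq, len(corpus))
print(plain, boosted, miss)
```

A custom scoring function would replace `tf_idf_score` with any other function of the document and the query.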
Complex Query Example
@title:(lcd tv)
~@tags|description:"42 inch"
@price:[100 500]
-@type:4K|Plasma
Extending RediSearch
● Has its own module system
○ C/C++ API
○ Minimal, simple API
● Supports custom query expanders (e.g. synonyms)
● Supports custom ranking functions
Auto-Complete
● Custom data type - trie based
● Fuzzy completions
● Manual dictionary building and scoring
● Unicode aware case-folding
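A trie-based completer can be sketched in Python as follows. This is a minimal illustration with manually added, scored entries and Unicode case folding via `str.casefold()`. The `AutoCompleter` class and its method names are hypothetical, and fuzzy matching is not covered.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.score = None  # set only where a complete entry ends


class AutoCompleter:
    def __init__(self):
        self.root = TrieNode()

    def add(self, phrase, score=1.0):
        """Insert a phrase into the dictionary with a manual score."""
        node = self.root
        for ch in phrase.casefold():
            node = node.children.setdefault(ch, TrieNode())
        node.score = score

    def suggest(self, prefix, limit=5):
        """Return up to `limit` entries starting with prefix, best score first."""
        prefix = prefix.casefold()
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        stack = [(node, prefix)]
        while stack:  # walk the whole subtree below the prefix
            current, text = stack.pop()
            if current.score is not None:
                found.append((current.score, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        found.sort(key=lambda item: (-item[0], item[1]))
        return [text for _, text in found[:limit]]


completer = AutoCompleter()
completer.add("Toshiba TV", 2.0)
completer.add("Toshiba laptop", 1.0)
completer.add("Samsung TV", 3.0)
print(completer.suggest("Toshib"))
```

All entries sharing a prefix live under the same trie node, so a lookup only touches the subtree below the typed prefix.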
Scaling RediSearch
● What do we do when the index is too big for one machine?
● We need to split the index across multiple machines
● But it’s not that simple!
Scaling RediSearch
● Partition by document id
● Each partition contains a full index of 1/N of the documents
● All terms for a sub-index are stored on the same shard
● This means we need a query coordinator:
○ Distribute the queries to all shards
○ Merge the results
○ Top-N of Top-Ns
RediSearch Cluster Fully-Connected Fan-Out/In
[Diagram: Client issues FT.SEARCH idx “foo”; the query “foo” is fanned out to partitions {1} through {6}, spread over Node 1, Node 2 and Node 3]
RediSearch Cluster Tree-Based Fan-Out/In
[Diagram: Client issues FT.SEARCH idx “foo”; the receiving node forwards “foo” {5},{6},{7},{8} and “foo” {9},{10},{11},{12} to the other two nodes, and each of Node 1, Node 2 and Node 3 then queries its own four partitions ({1}–{4}, {5}–{8}, {9}–{12})]
http://redisearch.io - Thank You!
Dvir Volk, Senior Architect, Redis Labs
@dvirsky
Itamar Haber, Chief OSS Education Officer, Redis Labs
@itamarhaber