Tsofit Mano-Izhar Customer Engineer [email protected]

Elasticsearch and Fuzzy Name Matching

Page 1: Elasticsearch and Fuzzy Name Matching

Tsofit Mano-Izhar, Customer Engineer [email protected]

Page 2: Elasticsearch and Fuzzy Name Matching

Our expertise

Page 3: Elasticsearch and Fuzzy Name Matching

Why are we here?

● We have linguistic tools that can boost name-search applications (RNI)

● We have many customers that use open source search platforms

● We developed a plug-in for Elasticsearch that enables fuzzy name matching

Page 4: Elasticsearch and Fuzzy Name Matching

What is it good for?

Page 5: Elasticsearch and Fuzzy Name Matching

The Boston Bombers

Dzhokhar Anzorovich "Jahar" Tsarnaev Джоха́р Анзо́рович Царна́ев

Tamerlan Anzorovich Tsarnaev Тамерла́н Анзо́рович Царна́ев

Page 6: Elasticsearch and Fuzzy Name Matching

What kinds of name variation?

Page 7: Elasticsearch and Fuzzy Name Matching

“Jesus Alfonso Lopez Diaz”

vs.

“LobezDias, Chuy”

Page 8: Elasticsearch and Fuzzy Name Matching

Demo

Page 9: Elasticsearch and Fuzzy Name Matching

Mapping without the plug-in

● multi_field type with a field per possible variation

"mappings": {
  "ofac": {
    "properties": {
      "name": {
        "type": "multi_field",
        "fields": {
          "surname": { "type": "string", "analyzer": "simple" },
          "metaphone": { "type": "string", "analyzer": "metaphone" },
          "porter": { "type": "string", "analyzer": "porter" }
          …
        }
      }
    }
  }
}

Page 10: Elasticsearch and Fuzzy Name Matching

Mapping with our plug-in

● The plug-in contains a custom name mapper that does all the work behind the scenes

PUT /ofac/ofac/_mapping
{
  "ofac": {
    "properties": {
      "name": { "type": "name" },
      "aka": { "type": "name" }
    }
  }
}

Page 11: Elasticsearch and Fuzzy Name Matching

Better ranking and higher precision

● rescore_query

"rescore" : {
  "query" : {
    "rescore_query" : {
      "function_score" : {
        "name_score" : {
          "field" : "name",
          "query_name" : "LobEzDiaS, Chuy"
        }
        ...
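To show where the fragment above sits in a complete search request, here is a minimal Python sketch that builds the request body as a dictionary. It is an illustration under assumptions, not the plug-in's documented usage: the main `match` query is taken from the implementation slide, `window_size` is the standard Elasticsearch rescore parameter (its value here is arbitrary), the helper name `build_search_body` is hypothetical, and only the parts of `function_score` shown on the slide are included, so a real request may need more.

```python
import json


def build_search_body(query_name, field="name", window_size=50):
    """Build a search body: a cheap main query plus a name_score rescore.

    Assumption: `name_score` is the function the plug-in registers, used
    exactly as on the slide. Anything the slide elides is left out.
    """
    return {
        # Main query: selects the candidate documents.
        "query": {"match": {field: query_name}},
        # Rescore: re-ranks only the top `window_size` hits of the main query.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "function_score": {
                        "name_score": {
                            "field": field,
                            "query_name": query_name,
                        }
                    }
                }
            },
        },
    }


body = build_search_body("LobEzDiaS, Chuy")
print(json.dumps(body, indent=2, ensure_ascii=False))
```

Keeping the expensive name comparison in the rescore phase means it only runs on the hits the main query already returned, which is the point of the rescore_query bullet.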

Page 12: Elasticsearch and Fuzzy Name Matching

Plug-in Implementation

Indexing

User Doc: { name: "Robert Smith", dob: "1987/02/13" }

Index: { name: "Robert Smith", name.Key1: …, name.Key2: …, name.Key3: …, dob: "1987/02/13" }

Query

User Query: match : { name: "Bob Smitty" }

Main Query (against the Index, returns a subset): bool: name.Key1: ..., name.Key2: ..., name.Key3: ...

Rescore Query (over the subset): name_score : { field : "name", name : "Bob Smitty" }

Result: name: "Robert Smith", dob: 2/13/1987, score: .79
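The two-pass flow on this slide (generate keys at index time, match on keys to get a subset, then rescore the subset) can be approximated in a few lines of Python. This is a minimal sketch, not the plug-in's implementation: the slides do not say how name.Key1 to name.Key3 are computed or how name_score is calculated, so a simplified Soundex-style code stands in for the keys and the `difflib.SequenceMatcher` ratio stands in for the name score. The function names (`phonetic_key`, `index_doc`, `search`) and the second sample document are hypothetical.

```python
from difflib import SequenceMatcher

# Soundex-style consonant classes (stand-in for the plug-in's keys).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def phonetic_key(token):
    """Return a 4-character Soundex-style key for one name token."""
    letters = [c for c in token.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (key + "000")[:4]


def index_doc(doc):
    """Indexing step: store the document plus one key per name token."""
    keys = {phonetic_key(t) for t in doc["name"].split()}
    return {**doc, "name_keys": keys}


def search(index, query_name):
    """Main query on keys (recall), then rescore the subset (precision)."""
    query_keys = {phonetic_key(t) for t in query_name.split()}
    subset = [d for d in index if d["name_keys"] & query_keys]
    scored = [
        (SequenceMatcher(None, query_name.lower(), d["name"].lower()).ratio(),
         d["name"])
        for d in subset
    ]
    return sorted(scored, reverse=True)


index = [
    index_doc({"name": "Robert Smith", "dob": "1987/02/13"}),
    index_doc({"name": "Alice Jones"}),  # made-up non-matching document
]
for score, name in search(index, "Bob Smitty"):
    print(f"{name}: {score:.2f}")
```

In this sketch "Smitty" and "Smith" share a key, so "Robert Smith" reaches the rescoring stage even though "Bob" and "Robert" do not share one. The stand-in scorer knows nothing about nicknames, so its score will differ from the .79 shown on the slide.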

Page 13: Elasticsearch and Fuzzy Name Matching

Thank you!