Elasticsearch and Fuzzy Name Matching

Tsofit Mano-Izhar, Customer Engineer, tsofit@basistech.com

Our expertise

Why are we here?

● We have linguistic tools (RNI) that can boost name-search applications

● Many of our customers use open source search platforms

● We developed a plug-in for Elasticsearch that enables fuzzy name matching

What is it good for?

The Boston Bombers

Dzhokhar Anzorovich "Jahar" Tsarnaev Джоха́р Анзо́рович Царна́ев

Tamerlan Anzorovich Tsarnaev Тамерла́н Анзо́рович Царна́ев

What kinds of name variation?

“Jesus Alfonso Lopez Diaz”

vs.

“LobezDias, Chuy”

Demo

Mapping without the plug-in

● multi_field type with a field per possible variation

"mappings": {
  "ofac": {
    "properties": {
      "name": {
        "type": "multi_field",
        "fields": {
          "surname": { "type": "string", "analyzer": "simple" },
          "metaphone": { "type": "string", "analyzer": "metaphone" },
          "porter": { "type": "string", "analyzer": "porter" }
          …
        }
      }
    }
  }
}
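With a field per variation, each query also has to name every sub-field. The following is a minimal sketch of such a search body, assuming the sub-fields are addressable as name.surname, name.metaphone and name.porter and that the standard multi_match query is used. The helper name build_variation_query is hypothetical, and the variations elided on the slide are not filled in.

```python
import json

# Sub-field names taken from the mapping on the slide; further variations
# elided there ("…") are deliberately left out.
VARIATION_FIELDS = ["name.surname", "name.metaphone", "name.porter"]


def build_variation_query(query_name, fields=VARIATION_FIELDS):
    """Build a search body that matches the query against every variation sub-field."""
    return {
        "query": {
            "multi_match": {
                "query": query_name,
                "fields": list(fields),
            }
        }
    }


body = build_variation_query("Lopez Diaz")
print(json.dumps(body, indent=2))
```

Every new kind of variation means another sub-field in the mapping and another entry in this list, which is the maintenance cost the plug-in is meant to remove.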

Mapping with our plug-in

● Plugin contains custom name mapper which does all the work behind the scenes

PUT /ofac/ofac/_mapping
{
  "ofac": {
    "properties": {
      "name": { "type": "name" },
      "aka": { "type": "name" }
    }
  }
}
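The mapping body is small enough to generate rather than write by hand. This is a sketch only, assuming the plug-in registers a field type called "name" as the slide shows. The helper build_name_mapping is hypothetical and not part of the plug-in.

```python
import json


def build_name_mapping(doc_type, name_fields):
    """Return a mapping body declaring each listed field with the plug-in's "name" type."""
    properties = {field: {"type": "name"} for field in name_fields}
    return {doc_type: {"properties": properties}}


mapping = build_name_mapping("ofac", ["name", "aka"])
# Serialising with json.dumps yields valid JSON (quoting, commas) that can be
# sent as the body of the PUT /ofac/ofac/_mapping request.
print(json.dumps(mapping, indent=2))
```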

Better ranking and higher precision

● rescore_query

"rescore" : {
  "query" : {
    "rescore_query" : {
      "function_score" : {
        "name_score" : {
          "field" : "name",
          "query_name" : "LobEzDiaS, Chuy"
        }
        ...
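The fragment above is only the rescore part of a request. A complete search body could be assembled roughly as follows. This is a sketch under assumptions: the main query is a plain match on the same field, window_size is the standard Elasticsearch rescore parameter with an arbitrary value, and whatever the slide elides after name_score is not reproduced. The helper build_rescore_search is hypothetical.

```python
import json


def build_rescore_search(field, query_name, window_size=100):
    """Main query selects candidates; the plug-in's name_score re-ranks the top hits."""
    return {
        # First pass: cheap match against the name field.
        "query": {"match": {field: query_name}},
        # Second pass: rescore only the top `window_size` hits per shard.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "function_score": {
                        # name_score is the plug-in function shown on the slide.
                        "name_score": {"field": field, "query_name": query_name}
                    }
                }
            },
        },
    }


search_body = build_rescore_search("name", "LobEzDiaS, Chuy")
print(json.dumps(search_body, indent=2))
```

Restricting the expensive name comparison to a window of candidates is what keeps rescoring affordable, and the window size trades recall against latency.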

Plug-in Implementation

● User Query: match : { name: "Bob Smitty" }

● Main Query: bool: name.Key1:... name.Key2:... name.Key3:...

● Rescore Query (on the subset returned by the main query): name_score : { field : "name", name : "Bob Smitty" }

● Scored result: name: "Robert Smith", dob: 2/13/1987, score: .79

Indexing

● User Doc: { name: "Robert Smith", dob: "1987/02/13" }

● Index: { name: "Robert Smith", name.Key1: …, name.Key2: …, name.Key3: …, dob: "1987/02/13" }
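The two-pass flow on this slide (derive keys at index time, use the keys to select a subset, then score only that subset) can be imitated with a toy in-memory model. The slides do not say how the plug-in derives its keys or computes its score, so both are stand-ins here: name_keys uses tokens and first letters, and the score is difflib's SequenceMatcher ratio. All function names are hypothetical.

```python
from difflib import SequenceMatcher


def name_keys(name):
    """Stand-in key function: lowercased tokens plus each token's first letter."""
    tokens = name.lower().replace(",", " ").split()
    return set(tokens) | {t[0] for t in tokens}


def index_doc(doc):
    """Indexing: keep the user document and add its derived keys."""
    return {**doc, "name_keys": name_keys(doc["name"])}


def search(index, query_name, window_size=10):
    """Main query keeps docs sharing at least one key; rescore ranks that subset."""
    query_keys = name_keys(query_name)
    subset = [d for d in index if d["name_keys"] & query_keys][:window_size]
    scored = [
        (SequenceMatcher(None, query_name.lower(), d["name"].lower()).ratio(), d)
        for d in subset
    ]
    # Sort on the score only, highest first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


index = [
    index_doc(d)
    for d in [
        {"name": "Robert Smith", "dob": "1987/02/13"},
        {"name": "Alice Jones", "dob": "1990/07/01"},
    ]
]
results = search(index, "Bob Smitty")
for score, doc in results:
    print(doc["name"], doc["dob"], round(score, 2))
```

The key lookup is deliberately loose so that it favours recall, and the pairwise comparison in the second pass restores precision on the small subset.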

Thank you!