Name Matching with Elasticsearch June 25, 2015 Graham Morehead [email protected]

Simple fuzzy Name Matching in Elasticsearch - Graham Morehead


Page 1: Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

Name Matching with Elasticsearch

June 25, 2015
Graham Morehead

[email protected]

Page 2: Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

April 15, 2013, 2:49 PM

Page 8: Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

Real life example

David K. Murgatroyd
VP of Engineering

Boarding Pass

Page 9: Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

Best Practice using Elasticsearch?

● NameMapper (http://stackoverflow.com/questions/20632042/elasticsearch-searching-for-human-names)

    "mappings": { ...
      "type": "multi_field",
      "fields": {
        "pty_surename": { "type": "string", "analyzer": "simple" },
        "metaphone": { "type": "string", "analyzer": "metaphone" },
        "porter": { "type": "string", "analyzer": "porter" }
        …

● rescore_query

Page 10: Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

“Jesus Alfonso Lopez Diaz”

vs.

“LobezDias, Chuy”

Page 11: Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

RNI

Page 15: Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

Plug-in Implementation

Querying

● User Query: match : { name: "Bob Smitty" }

● Main Query (returns a subset of the index): bool: name.Key1:... name.Key2:... name.Key3:...

● Rescore Query: name_score : { field : "name", name : "Bob Smitty" }

● Result: name: "Robert Smith", dob: 2/13/1987, score: .79

Indexing

● User Doc: { name: "Robert Smith", dob: "1987/02/13" }

● Index: { name: "Robert Smith", name.Key1: …, name.Key2: …, name.Key3: …, dob: "1987/02/13" }

Page 16: Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

Demo

Page 17: Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

Elastic + RNI


Page 19: Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

How could you use such a Field?

● The plugin contains a custom mapper which does all the work behind the scenes

    PUT /ofac/ofac/_mapping
    {
      "ofac" : {
        "properties" : {
          "name" : { "type" : "rni_name" },
          "aka" : { "type" : "rni_name" }
        }
      }
    }

Page 20: Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

What happens at index time?

● NameMapper indexes keys for different phenomena in separate (sub) fields

    @Override
    public void parse(ParseContext context) throws IOException {
        Name name = NameBuilder.data(nameString).build();

        //Generate keys for name
        Collection<FieldSpec> fields = helper.deriveFieldsForName(name);

        //Parse each key with the appropriate Mapper
        for (FieldSpec field : fields) {
            Mapper mapper = keyMappers.get(field.getField().fieldName());
            context = context.createExternalValueContext(field.getStringValue());
            mapper.parse(context);
        }
    }
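To make the idea of "keys in separate sub-fields" concrete, here is a minimal, self-contained sketch. The class `NameKeySketch`, the method `deriveFields`, and the three key definitions are hypothetical and chosen only for illustration; they are not the keys RNI generates, which the slides do not describe.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only: these key definitions are assumptions,
// not the keys produced by the plugin shown on the slide.
final class NameKeySketch {

    // Returns one entry per sub-field, e.g. "name.Key1" -> "robert smith".
    static Map<String, String> deriveFields(String fieldName, String name) {
        // Lowercase, split on anything that is not a letter, drop empties,
        // and sort so that token order ("Smith, Robert") does not matter.
        List<String> tokens = Arrays.stream(name.toLowerCase(Locale.ROOT).split("[^\\p{L}]+"))
                .filter(t -> !t.isEmpty())
                .sorted()
                .collect(Collectors.toList());

        Map<String, String> fields = new LinkedHashMap<>();
        // Key1: normalized, order-insensitive tokens.
        fields.put(fieldName + ".Key1", String.join(" ", tokens));
        // Key2: initials of the sorted tokens.
        fields.put(fieldName + ".Key2",
                tokens.stream().map(t -> t.substring(0, 1)).collect(Collectors.joining()));
        // Key3: crude consonant skeleton (ASCII vowels removed) per token.
        fields.put(fieldName + ".Key3",
                tokens.stream().map(t -> t.replaceAll("[aeiou]", "")).collect(Collectors.joining(" ")));
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(deriveFields("name", "Robert Smith"));
        System.out.println(deriveFields("name", "Smith, Robert"));
    }
}
```

Both calls in `main` yield the same keys, which is the property a candidate query can exploit: differently written names can still land on identical indexed terms.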

Page 21: Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

What happens at query time?

● Step #1: NameMapper generates analogous keys for a custom Lucene query that finds good candidates for re-scoring

    @Override
    public Query termQuery(Object value, @Nullable QueryParseContext context) {
        //Parse name string
        Name name = NameBuilder.data(value.toString()).build();
        QuerySpec spec = helper.buildQuerySpec(new NameIndexQuery(name));

        //Build Lucene query
        Query query = spec.accept(new ESQueryVisitor(names.indexName() + "."));
        return query;
    }
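The candidate query can be pictured as a disjunction over the key sub-fields. The sketch below is an assumption-laden stand-in: it emits plain Lucene query-string text instead of building Lucene `Query` objects, and `CandidateQuerySketch`, `buildQuery`, and the key values are hypothetical names and data, not part of the plugin.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration: turns per-key values into one OR query string.
final class CandidateQuerySketch {

    // keyFields maps a sub-field name (e.g. "name.Key1") to the key value
    // derived from the query name.
    static String buildQuery(Map<String, String> keyFields) {
        StringJoiner clauses = new StringJoiner(" OR ");
        for (Map.Entry<String, String> e : keyFields.entrySet()) {
            // Escape backslashes and double quotes so the value stays inside its phrase.
            String value = e.getValue().replace("\\", "\\\\").replace("\"", "\\\"");
            clauses.add(e.getKey() + ":\"" + value + "\"");
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        Map<String, String> keys = new LinkedHashMap<>();
        keys.put("name.Key1", "bob smitty"); // assumed key values
        keys.put("name.Key2", "bs");
        System.out.println(buildQuery(keys));
    }
}
```

A document matching any one key becomes a candidate, so this stage favors recall; ordering the candidates precisely is left to the rescoring step.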

Page 22: Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

What else happens at query time?

● Step #2: Uses a Rescore query to score names in the best candidate documents and reorder accordingly
  ○ Tuned for high precision name matching
  ○ Computationally expensive

    "rescore" : {
      "query" : {
        "rescore_query" : {
          "function_score" : {
            "name_score" : {
              "field" : "name",
              "query_name" : "LobEzDiaS, Chuy"
            }
            ...
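Because the rescoring function is expensive, it only makes sense to run it on the top hits of the cheap query. The following sketch shows that idea in isolation, outside Elasticsearch. `RescoreSketch`, `rescore`, and the toy scoring function are hypothetical. As a simplification, the new score fully replaces the original ordering inside the window, whereas Elasticsearch's rescorer combines the two scores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical illustration of window-based rescoring.
final class RescoreSketch {

    // hits must already be ordered by the cheap main query. Only the first
    // windowSize hits are rescored and reordered; the tail is left untouched.
    static <T> List<T> rescore(List<T> hits, int windowSize, ToDoubleFunction<T> expensiveScore) {
        int n = Math.max(0, Math.min(windowSize, hits.size()));

        // Call the expensive function exactly once per hit in the window.
        double[] scores = new double[n];
        List<Integer> order = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            scores[i] = expensiveScore.applyAsDouble(hits.get(i));
            order.add(i);
        }
        // Highest score first; List.sort is stable, so ties keep their original order.
        order.sort((a, b) -> Double.compare(scores[b], scores[a]));

        List<T> result = new ArrayList<>(hits.size());
        for (int i : order) {
            result.add(hits.get(i));
        }
        result.addAll(hits.subList(n, hits.size()));
        return result;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("a", "bb", "ccc", "dddd");
        // Toy score: string length stands in for a real name-similarity score.
        System.out.println(rescore(hits, 3, s -> s.length()));
    }
}
```

The window size is the knob that trades precision against cost: a larger window gives the expensive scorer more chances to promote a good match, at the price of more scoring calls per query.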

Page 23: Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

● The 'name_score' function matches the query name against the indexed name in every candidate document and returns the similarity score

    @Override
    public double score(int docId, float subQueryScore) {
        //Create a scorer for the query name
        CachedScorer cs = createCachedScorer(queryName);

        //Retrieve name data from doc values
        nameByteData.setDocument(docId);
        Name indexName = bytesToName(nameByteData.valueAt(i).bytes);

        //Score the query against the indexed name in this document
        return cs.score(indexName);
    }
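As a rough picture of what a pairwise name score looks like, the sketch below uses a deliberately simple stand-in: each query token is matched to its closest indexed token by normalized Levenshtein distance, and the results are averaged. This is not the algorithm behind `name_score`, which the slides do not specify; `NameSimilaritySketch` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical stand-in for a name similarity function; not the plugin's scorer.
final class NameSimilaritySketch {

    // Classic dynamic-programming edit distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    // 1.0 for identical tokens, approaching 0.0 as they diverge.
    static double tokenSimilarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    static String[] tokens(String name) {
        return Arrays.stream(name.toLowerCase(Locale.ROOT).split("[^\\p{L}]+"))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }

    // Averages, over the query tokens, the best similarity to any indexed token.
    static double score(String queryName, String indexedName) {
        String[] q = tokens(queryName);
        String[] d = tokens(indexedName);
        if (q.length == 0 || d.length == 0) return 0.0;
        double sum = 0.0;
        for (String qt : q) {
            double best = 0.0;
            for (String dt : d) best = Math.max(best, tokenSimilarity(qt, dt));
            sum += best;
        }
        return sum / q.length;
    }

    public static void main(String[] args) {
        System.out.println(score("Robert Smith", "Smith, Robert"));
        System.out.println(score("Bob Smitty", "Robert Smith"));
    }
}
```

A purely spelling-based measure like this handles reordering and small typos, but it has no knowledge of nicknames such as "Bob" for "Robert" or "Chuy" for "Jesus", which is one reason a dedicated name-matching scorer is used in the rescore step.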

What does that function do?