April 15 2013 2:49 PM .
Real life example
David K. Murgatroyd
VP of Engineering
Boarding Pass
Best Practice using Elasticsearch?
● NameMapper (http://stackoverflow.com/questions/20632042/elasticsearch-searching-for-human-names)
"mappings": { ...
  "type": "multi_field",
  "fields": {
    "pty_surename": { "type": "string", "analyzer": "simple" },
    "metaphone": { "type": "string", "analyzer": "metaphone" },
    "porter": { "type": "string", "analyzer": "porter" } …
● rescore_query
“Jesus Alfonso Lopez Diaz”
vs.
“LobezDias, Chuy”
RNI
Rescore Query
Main Query
Plug-in Implementation
match : { name: "Bob Smitty" }
bool: name.Key1:... name.Key2:... name.Key3:...
User Query
Rescore
name_score : { field : "name", name : "Bob Smitty" }
name: "Robert Smith" dob: 2/13/1987 score: .79
Indexing
{ name: "Robert Smith", dob: "1987/02/13" }
{ name: "Robert Smith", name.Key1: …, name.Key2: …, name.Key3: …, dob: "1987/02/13" }
User Doc
Index
subset
Demo
Elastic + RNI
How could you use such a Field?
● Plugin contains a custom mapper which does all the work behind the scenes
PUT /ofac/ofac/_mapping
{
  "ofac" : {
    "properties" : {
      "name" : { "type" : "rni_name" },
      "aka" : { "type" : "rni_name" }
    }
  }
}
What happens at index time?
● NameMapper indexes keys for different phenomena in separate (sub) fields
@Override
public void parse(ParseContext context) throws IOException {
    Name name = NameBuilder.data(nameString).build();
    //Generate keys for name
    Collection<FieldSpec> fields = helper.deriveFieldsForName(name);
    //Parse each key with the appropriate Mapper
    for (FieldSpec field : fields) {
        Mapper mapper = keyMappers.get(field.getField().fieldName());
        context = context.createExternalValueContext(field.getStringValue());
        mapper.parse(context);
    }
}
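The index-time step above can be sketched without the plugin. This is a minimal illustration only: the two key functions (initials and a vowel-stripped "skeleton") and the sub-field names `initials` and `skeleton` are hypothetical stand-ins, not the keys RNI actually generates, and the ASCII-only tokenizer is a simplifying assumption.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class NameKeySketch {

    /**
     * Derives illustrative sub-field keys for a name.
     * Hypothetical keys for demonstration; not the keys RNI produces.
     */
    public static Map<String, String> deriveKeys(String field, String name) {
        // Assumption: ASCII letters only; anything else acts as a token separator.
        String[] tokens = name.toLowerCase(Locale.ROOT).split("[^a-z]+");
        StringBuilder initials = new StringBuilder();
        StringBuilder skeleton = new StringBuilder();
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token on leading separators
            }
            initials.append(token.charAt(0));
            if (skeleton.length() > 0) {
                skeleton.append(' ');
            }
            // Keep the first letter, drop vowels from the rest of the token
            skeleton.append(token.charAt(0))
                    .append(token.substring(1).replaceAll("[aeiouy]", ""));
        }
        // One entry per sub-field, mirroring "name.Key1", "name.Key2", ...
        Map<String, String> keys = new LinkedHashMap<>();
        keys.put(field + ".initials", initials.toString());
        keys.put(field + ".skeleton", skeleton.toString());
        return keys;
    }

    public static void main(String[] args) {
        // prints {name.initials=rs, name.skeleton=rbrt smth}
        System.out.println(deriveKeys("name", "Robert Smith"));
    }
}
```

Each entry would be stored in its own sub-field next to the original name, so a later query can match on any one of the keys.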
What happens at query time?
● Step #1: NameMapper generates analogous keys for a custom Lucene query that finds good candidates for re-scoring
@Override
public Query termQuery(Object value, @Nullable QueryParseContext context) {
    //Parse name string
    Name name = NameBuilder.data(value.toString()).build();
    QuerySpec spec = helper.buildQuerySpec(new NameIndexQuery(name));
    //Build Lucene query
    Query query = spec.accept(new ESQueryVisitor(names.indexName() + "."));
    return query;
}
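The effect of that candidate query can be sketched in plain Java, without Lucene. The sketch assumes the generated query behaves like a bool query whose clauses are optional, so a document becomes a candidate when it shares at least one key with the query name. The key values below are made-up placeholders, and `CandidateQuerySketch` is a hypothetical name, not part of the plugin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CandidateQuerySketch {

    /** Builds a two-key entry; the key values are placeholders for illustration. */
    public static Map<String, String> keys(String key1, String key2) {
        Map<String, String> m = new HashMap<>();
        m.put("name.Key1", key1);
        m.put("name.Key2", key2);
        return m;
    }

    /**
     * Returns the ids of documents that share at least one sub-field key
     * with the query (assumed "any clause may match" semantics).
     */
    public static List<Integer> candidates(Map<String, String> queryKeys,
                                           List<Map<String, String>> indexedDocs) {
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < indexedDocs.size(); docId++) {
            Map<String, String> doc = indexedDocs.get(docId);
            for (Map.Entry<String, String> clause : queryKeys.entrySet()) {
                if (clause.getValue().equals(doc.get(clause.getKey()))) {
                    hits.add(docId);
                    break; // one matching key is enough to be a candidate
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
                keys("rs", "rbrt smth"),
                keys("bs", "bb smth"),
                keys("jd", "jn d"));
        // Query keys overlap doc 0 on Key2 and doc 1 on Key1; prints [0, 1]
        System.out.println(candidates(keys("bs", "rbrt smth"), docs));
    }
}
```

Matching on any single key keeps this step cheap and recall-oriented; precision is left to the later re-scoring step.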
What else happens at query time?
● Step #2: Uses a Rescore query to score names in the best candidate documents and reorder accordingly
  ○ Tuned for high precision name matching
  ○ Computationally expensive
"rescore" : {
  "query" : {
    "rescore_query" : {
      "function_score" : {
        "name_score" : {
          "field" : "name",
          "query_name" : "LobEzDiaS, Chuy"
        }
...
● The 'name_score' function matches the query name against the indexed name in every candidate document and returns the similarity score
@Override
public double score(int docId, float subQueryScore) {
    //Create a scorer for the query name
    CachedScorer cs = createCachedScorer(queryName);
    //Retrieve name data from doc values
    nameByteData.setDocument(docId);
    //Assumes one name value per document, so the first value is read
    Name indexName = bytesToName(nameByteData.valueAt(0).bytes);
    //Score the query against the indexed name in this document
    return cs.score(indexName);
}
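The two-phase idea (cheap candidate retrieval, then an expensive score over only the best candidates) can be sketched as follows. The similarity here is a normalized Levenshtein distance, used purely as a stand-in: it is not RNI's scoring, and `RescoreSketch`, `similarity`, and `rescore` are hypothetical names. The `windowSize` parameter plays the role of a rescore window that limits how many candidates pay the expensive cost.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RescoreSketch {

    /** Levenshtein-based similarity in [0, 1]; a stand-in for a real name scorer. */
    public static double similarity(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            prev = curr;
        }
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) prev[b.length()] / maxLen;
    }

    /**
     * Re-orders only the first windowSize candidates by similarity to the
     * query name; candidates beyond the window keep their original order.
     */
    public static List<String> rescore(String queryName, List<String> candidates,
                                       int windowSize) {
        int n = Math.max(0, Math.min(windowSize, candidates.size()));
        List<String> head = new ArrayList<>(candidates.subList(0, n));
        head.sort(Comparator
                .comparingDouble((String c) -> similarity(queryName, c))
                .reversed());
        List<String> result = new ArrayList<>(head);
        result.addAll(candidates.subList(n, candidates.size()));
        return result;
    }

    public static void main(String[] args) {
        List<String> firstPass = List.of("Rob Smythe", "Bob Smith", "Alice Jones");
        // Only the top two candidates are re-scored against the query name
        System.out.println(rescore("Bob Smitty", firstPass, 2));
    }
}
```

Restricting the expensive comparison to a window is what makes a high-precision scorer affordable: its cost grows with the window size rather than with the size of the index.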
What does that function do?