Autocomplete Feature: Building Relevance
In my previous post, Autocomplete Feature with ElasticSearch, we built a custom analyzer and searched through documents.
Now suppose we have thousands of documents in our index, and a user’s query matches 100+ records. We can’t show that much data in the UI; it would ruin both the UI and the user experience. Autocomplete suggestions should show at most 5-10 results.
Now the question is: which 5 results out of the 100+ matched records?
The answer: the most relevant ones.
By default, records come back sorted by Elasticsearch’s default similarity algorithm, which takes into account several things such as tf (term frequency), idf (inverse document frequency), and field length. A complete explanation is out of scope for this post; you can read about them separately. In one line, the score tells you how well a document matches the query.
On top of that, we are going to give an artificial boost to documents based on:
- Popularity: which results are clicked by more users.
- User’s location: where the end user is searching from.
We use the user’s location as a boosting factor because we are searching through places; it could be the price if we were working with items. We will use Elasticsearch’s function_score query to define how these factors affect our results. Let’s start working.
We will be updating the place index with the following fields:
- click_count: stores the click count, incremented when a user actually clicks on this suggested result.
- location: stores the latitude and longitude of the place.
- norm_click_count: stores the normalized click count, which is (click_count / MAX(click_count)) * 100.
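The normalization formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `normalize_click_counts` and the click counts passed to it are hypothetical, and in practice the values would be recomputed from the index whenever the maximum click count changes.

```python
def normalize_click_counts(click_counts):
    """Map each document id to (click_count / MAX(click_count)) * 100."""
    max_clicks = max(click_counts.values(), default=0)
    if max_clicks == 0:
        # No clicks recorded yet: avoid dividing by zero.
        return {doc_id: 0.0 for doc_id in click_counts}
    return {
        doc_id: (count / max_clicks) * 100
        for doc_id, count in click_counts.items()
    }


# Hypothetical click counts keyed by document id.
clicks = {1: 1, 4: 11, 8: 33}
for doc_id, norm in normalize_click_counts(clicks).items():
    print(doc_id, round(norm, 2))
```

The most-clicked document always ends up at 100, and every other document gets its share of that maximum, so the field stays on a fixed 0-100 scale no matter how large the raw counts grow.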
Run this to update the mapping of our existing place index:
curl -X PUT "localhost:9200/place/_mapping?pretty" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "click_count": { "type": "integer" },
    "location": { "type": "geo_point" },
    "norm_click_count": { "type": "half_float" }
  }
}'
Let’s add some dummy data:
curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/json' -d '
{ "index" : { "_index" : "place"} }
{"id":1,"description": "427 Race Course Road, Singapore","click_count": 1,"norm_click_count": "0.0","location":{"lat": "1.315918","lon": "103.857541"}}
{ "index" : { "_index" : "place"} }
{"id":2,"description": "Test, test, Singapore","click_count": 7,"norm_click_count": "1.16279069767442","location":{"lat": "1.332655","lon": "103.856372"}}
{ "index" : { "_index" : "place"} }
{"id":3,"description": "Paya Lebar Road, Singapore","click_count": 5,"norm_click_count": "0.934579439252336","location":{"lat": "1.324514","lon": "103.890757"}}
{ "index" : { "_index" : "place"} }
{"id":4,"description": "Natl Stadium, Singapore","click_count": 11,"norm_click_count": "1.6260162601626","location":{"lat": "1.306756","lon": "103.875244"}}
{ "index" : { "_index" : "place"} }
{"id":5,"description": "Turf Club Road, Turf City Superstore, Singapore","click_count": 3,"norm_click_count": "6.45161290322581","location":{"lat": "1.337685","lon": "103.793667"}}
{ "index" : { "_index" : "place"} }
{"id":6,"description": "Stevens Road, Mercure Singapore On Stevens, Singapore","click_count": 2,"norm_click_count": "0.162074554294976","location":{"lat": "1.313955","lon": "103.828427"}}
{ "index" : { "_index" : "place"} }
{"id":7,"description": "Kranji, Kranji Road, Singapore","click_count": 17,"norm_click_count": "1.94884287454324","location":{"lat": "1.425227","lon": "103.762028"}}
{ "index" : { "_index" : "place"} }
{"id":8,"description": "Orchard, Singapore","click_count": 33,"norm_click_count": "8.76712328767123","location":{"lat": "1.304843","lon": "103.831824"}}
{ "index" : { "_index" : "place"} }
{"id":9,"description": "Airport Blvd, Singapura","click_count": 16,"norm_click_count": "1.72018348623853","location":{"lat": "1.347626","lon": "103.984587"}}
{ "index" : { "_index" : "place"} }
{"id":10,"description": "Dover Road, Singapore Polytechnic, Singapore","click_count": 3,"norm_click_count": "0.266666666666667","location":{"lat": "1.309877","lon": "103.777501"}}
'
Now we are ready to search with function_score:
curl -X GET "localhost:9200/place/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 5,
  "_source": {
    "includes": ["description", "click_count"]
  },
  "query": {
    "function_score": {
      "query": {
        "match": {
          "description": {
            "query": "cochi",
            "operator": "and",
            "fuzziness": "AUTO"
          }
        }
      },
      "boost": 50,
      "functions": [
        {
          "field_value_factor": {
            "field": "norm_click_count",
            "factor": 0.2,
            "missing": 0.2
          }
        },
        {
          "gauss": {
            "location": {
              "origin": { "lat": 1.306756, "lon": 103.875244 },
              "offset": "5km",
              "scale": "5km"
            }
          }
        }
      ],
      "boost_mode": "sum"
    }
  }
}
'
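In the above query, we have used two functions along with two top-level settings:
- boost_mode is “sum”, which means the score computed by the functions is added to the query score instead of being multiplied with it (the default).
- boost = 50 is a boost applied to the whole query.
- The first function is a field_value_factor function: it takes norm_click_count and multiplies it by 0.2 (factor); documents that are missing the field are treated as if its value were 0.2 (missing).
- The second function is a gauss decay function, which lowers the score of a document as the distance between the document’s location and the user’s location (origin) grows. With offset and scale both set to 5km, documents within 5 km of the user keep the full score, and the score drops to half at 10 km.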
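To get a feel for how the two functions behave, here is a rough Python sketch of the arithmetic. It is an approximation under stated assumptions, not Elasticsearch’s implementation: the gauss formula follows the documented decay shape with the default decay of 0.5, the distance uses a haversine approximation with an assumed Earth radius of 6371 km, the two function values are multiplied together (the default score_mode), and the top-level boost is left out. All function names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in km (assumed Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def gauss_decay(distance_km, offset_km=5.0, scale_km=5.0, decay=0.5):
    """Gauss decay: 1.0 inside the offset, `decay` at offset + scale."""
    sigma_sq = -(scale_km ** 2) / (2.0 * math.log(decay))
    d = max(0.0, distance_km - offset_km)
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))


def field_value_factor(value, factor=0.2, missing=0.2):
    """factor * value, using `missing` when the document has no value."""
    return factor * (missing if value is None else value)


def combined_score(query_score, norm_click_count, distance_km):
    """boost_mode "sum": query score plus the product of both functions."""
    func_score = field_value_factor(norm_click_count) * gauss_decay(distance_km)
    return query_score + func_score


# Distance from the query origin (Natl Stadium) to Kranji in the sample data.
d = haversine_km(1.306756, 103.875244, 1.425227, 103.762028)
print(round(d, 1), round(gauss_decay(d), 3))
```

Because the functions are multiplied before being added to the query score, a popular place that is far away still gets only a small boost, while a nearby place with few clicks is limited by its low norm_click_count.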
Note: You can use the Explain API to see how your scores are calculated.
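As a minimal sketch of calling the Explain API from Python, the helper below builds a request against the `/<index>/_explain/<id>` endpoint using only the standard library. The helper names, the host, and the document id `abc123` are assumptions for illustration; the sample documents above were indexed without an explicit `_id`, so you would first look up the generated `_id` from a search response.

```python
import json
import urllib.request


def build_explain_request(index, doc_id, query, host="http://localhost:9200"):
    """Build (but do not send) an Explain API request for one document."""
    url = f"{host}/{index}/_explain/{doc_id}"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def explain(index, doc_id, query):
    """Send the request; needs a running cluster at the assumed host."""
    with urllib.request.urlopen(build_explain_request(index, doc_id, query)) as resp:
        return json.load(resp)


# Hypothetical document id; replace it with a real _id from your index.
query = {"match": {"description": {"query": "cochi", "fuzziness": "AUTO"}}}
request = build_explain_request("place", "abc123", query)
print(request.get_method(), request.full_url)
```

Passing the full function_score query as `query` should return a breakdown of how the match score and each function contributed to the final score for that document.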