Autocomplete Feature With Elasticsearch
Autocomplete is a crucial feature for any e-commerce site. It reduces the number of characters a user has to type before executing a search, thereby enhancing the user's search experience.
While there are multiple ways to implement autocomplete with Elasticsearch, such as:
- Prefix Query
- N-gram
- Suggesters
we are going to implement autocomplete with edge n-grams and use match-query options (the and operator and fuzziness) to tune relevance. We will be using place-name data to build our sample index.
We will be creating an index named place with the following fields:
- id: we will use this to find and update a document if needed
- description: the place-name text we autocomplete on
We will use two different custom analyzers for indexing and searching on the description field. Before going further, let's understand some basic terms in text analysis.
Text analysis - Text analysis is the process of converting text, like the body of an email, into tokens or terms that are added to the inverted index for searching. Analysis is performed by an analyzer.
Analyzer - An analyzer is a package that bundles three lower-level building blocks: character filters, tokenizers, and token filters.
Index time Analyzer
- Here we are going to use the standard tokenizer.
- Define our char_filter; we name it ampersand:
{
"ampersand": {
"type": "mapping",
"mappings": ["&=> and ", "@=> at"]
}
}
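To see what this char_filter does, we can test an equivalent inline definition with _analyze; & becomes "and" and @ becomes "at" before tokenization:
curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "tokenizer": "standard",
  "char_filter": [
    { "type": "mapping", "mappings": ["&=> and ", "@=> at"] }
  ],
  "text": "Turf & City Superstore"
}
'
This produces the tokens Turf, and, City, Superstore (lowercasing happens later, in the token filters).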
- We use three token filters: the built-in lowercase filter, plus two custom filters built on top of edge_ngram and stop:
{
  "filter_edge_ngram": {
    "type": "edge_ngram",
    "min_gram": 1,
    "max_gram": 12
  },
  "my_stop": {
    "type": "stop",
    "stopwords": ["a", "and", "are", "as", "be", "but", "for", "if", "into", "is", "it", "no", "not", "or", "such", "that", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"]
  }
}
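Here is a quick way to see the edge n-grams this filter emits, again using _analyze with an inline definition:
curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "edge_ngram", "min_gram": 1, "max_gram": 12 }
  ],
  "text": "Stevens"
}
'
This returns s, st, ste, stev, steve, steven, stevens: every prefix from 1 to 12 characters gets indexed, which is exactly what lets a partial query like "mercu" match at search time.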
- Finally, we define our analyzer; I name it my_index_analyzer:
{
"my_index_analyzer": {
"type": "custom",
"char_filter": ["ampersand"],
"tokenizer": "standard",
"filter": [
"lowercase",
"my_stop",
"filter_edge_ngram"]
}
}
Search time Analyzer
It will be mostly the same, except that we don't use filter_edge_ngram here: the query terms should match the indexed prefixes as-is, and expanding the query itself into n-grams would match far too broadly.
{
"my_search_analyzer": {
"type": "custom",
"char_filter": ["ampersand"],
"tokenizer": "standard",
"filter": [
"lowercase",
"my_stop"]
}
}
The final definition combines everything above: just request PUT /place and use the assembled JSON as the body.
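Here is a sketch of what the assembled file (place_index_mapping.json) might look like, built from the pieces above; typing id as keyword is my assumption, and the description mapping is what wires the two analyzers together:
{
  "settings": {
    "analysis": {
      "char_filter": {
        "ampersand": {
          "type": "mapping",
          "mappings": ["&=> and ", "@=> at"]
        }
      },
      "filter": {
        "filter_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 12
        },
        "my_stop": {
          "type": "stop",
          "stopwords": ["a", "and", "are", "as", "be", "but", "for", "if", "into", "is", "it", "no", "not", "or", "such", "that", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"]
        }
      },
      "analyzer": {
        "my_index_analyzer": {
          "type": "custom",
          "char_filter": ["ampersand"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stop", "filter_edge_ngram"]
        },
        "my_search_analyzer": {
          "type": "custom",
          "char_filter": ["ampersand"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "description": {
        "type": "text",
        "analyzer": "my_index_analyzer",
        "search_analyzer": "my_search_analyzer"
      }
    }
  }
}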
curl -X PUT "localhost:9200/place?pretty" -H 'Content-Type: application/json' --data-binary '@/path_to_file/place_index_mapping.json'
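Once the index exists, you can sanity-check both analyzers against it:
curl -X GET "localhost:9200/place/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "my_index_analyzer",
  "text": "Mercure"
}
'
With my_index_analyzer this returns m, me, mer, merc, mercu, mercur, mercure; swapping in my_search_analyzer returns just mercure.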
After this, we index some sample documents into our index with the bulk API:
curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/x-ndjson' -d '
{ "index" : { "_index" : "place"} }
{ "id" : "1", "description":"Turf Club Road, Turf City Superstore, Singapore" }
{ "index" : { "_index" : "place" } }
{ "id" : "2", "description":"Stevens Road, Mercure Singapore On Stevens, Singapore" }
{ "index" : { "_index" : "place"} }
{ "id" : "3", "description":"Maxwell Road, Vinfield Resources International LLP, Singapore" }
'
Let's search now:
curl -X GET "localhost:9200/place/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match" : {
"description" : {
"query" : "road mercu"
}
}
}
}
'
Notice that we searched for "road mercu", which should match our second document, but all three documents are returned: a document is returned if at least one term from the search query appears in it. The other documents may not be relevant to the user, so let's fix this with the and operator, which requires every query term to match:
curl -X GET "localhost:9200/place/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match" : {
"description" : {
"query" : "road merz",
"operator": "and"
}
}
}
}
'
Yeah, it's working. Now let's try again with "road merz". Oops, we misspelled 'Mercure' and got no results. Let's make the search work even when the user makes a typo, using Elasticsearch fuzziness (AUTO allows one character edit for terms of 3-5 characters and two edits for longer terms):
curl -X GET "localhost:9200/place/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match" : {
"description" : {
"query" : "road merz",
"operator": "and",
"fuzziness":"AUTO"
}
}
}
}
'
Now let's wrap up what we have covered:
1) Built an index-time analyzer that:
- converts text to lowercase
- removes stop words
- maps special characters (& to "and", @ to "at")
- tokenizes text from left to right (edge n-grams)
2) Built a search-time analyzer that:
- converts text to lowercase
- removes stop words
- maps special characters
3) Performed searches with:
- any term matching
- all terms matching (the and operator)
- fuzziness for typo tolerance
Thanks, guys, hope you liked it.