Autocomplete Feature With Elasticsearch

Autocomplete is crucial functionality for any e-commerce site. It reduces the number of characters a user needs to type before executing a search, thereby enhancing the search experience.

While there are multiple ways to implement autocomplete with Elasticsearch, such as:

  • Prefix Query
  • N-gram
  • Suggesters

we are going to implement autocomplete with Edge N-gram, using place-name data to build our sample index.

We will be creating an index place with the following fields:

  • id — we will use this to find and update a document if needed
  • description — the place-name text we autocomplete on

We will use two different custom analyzers for indexing and searching on the description field. Before going further, let’s understand some basic terms in text analysis.

Text analysis - Text analysis is the process of converting text, like the body of an email, into tokens or terms which are added to the inverted index for searching. Analysis is performed by an analyzer.

Analyzer - An analyzer is a package that combines three lower-level building blocks: character filters, tokenizers, and token filters.

Index time Analyzer

  • Here we are going to use the standard tokenizer
  • We define our char_filter and name it ampersand
{
  "ampersand": {
    "type": "mapping",
    "mappings": ["& => and", "@ => at"]
  }
}
  • We use three filters: the built-in lowercase filter, plus two custom filters built on the edge_ngram and stop filter types (note that token_chars is an option of the edge_ngram tokenizer, not the token filter, so we leave it out here)
{
  "filter_edge_ngram": {
    "type": "edge_ngram",
    "min_gram": 1,
    "max_gram": 12
  },
  "my_stop": {
    "type": "stop",
    "stopwords": ["a", "and", "are", "as", "be", "but", "for", "if", "into", "is", "it", "no", "not", "or", "such", "that", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"]
  }
}
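To see what filter_edge_ngram produces, here is a rough Python sketch (illustrative only, not Elasticsearch's implementation) that emits leading-edge n-grams between min_gram and max_gram:

```python
def edge_ngrams(token, min_gram=1, max_gram=12):
    """Emit leading-edge n-grams of a token, like the edge_ngram token filter."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

print(edge_ngrams("mercure"))
# ['m', 'me', 'mer', 'merc', 'mercu', 'mercur', 'mercure']
```

Because every prefix of a term is indexed, a partial query term like "mercu" can match the document directly.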
  • Finally, we define our analyzer; I name it my_index_analyzer
{
  "my_index_analyzer": {
    "type": "custom",
    "char_filter": ["ampersand"],
    "tokenizer": "standard",
    "filter": [
      "lowercase",
      "my_stop",
      "filter_edge_ngram"
    ]
  }
}

Search time Analyzer

It is mostly the same, except that we don’t use filter_edge_ngram at search time.

{
  "my_search_analyzer": {
    "type": "custom",
    "char_filter": ["ampersand"],
    "tokenizer": "standard",
    "filter": [
      "lowercase",
      "my_stop"
    ]
  }
}
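To build intuition for why only the index-time analyzer applies edge n-grams, here is a rough Python sketch of both analyzers (with some simplifying assumptions: a whitespace split stands in for the standard tokenizer, and the stopword list is trimmed):

```python
def char_filter(text):
    # simplified stand-in for the "ampersand" mapping char filter
    return text.replace("&", " and ").replace("@", " at ")

# trimmed stopword set for this sketch; the real my_stop list is longer
STOPWORDS = {"a", "and", "are", "as", "is", "to", "with"}

def edge_ngrams(token, min_gram=1, max_gram=12):
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def analyze(text, edge=True):
    # whitespace split stands in for the standard tokenizer
    tokens = [t.lower() for t in char_filter(text).split()]
    tokens = [t for t in tokens if t not in STOPWORDS]
    if edge:  # index time applies the edge_ngram filter; search time does not
        tokens = [g for t in tokens for g in edge_ngrams(t)]
    return tokens

print(analyze("Road & Mercure", edge=False))  # search time: ['road', 'mercure']
print(analyze("Road & Mercure", edge=True))   # index time: all prefixes of both terms
```

Search-time terms stay whole, so a query term like "mercu" matches the stored gram "mercu" instead of being exploded into grams itself (which would make "m" match nearly every document).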

The final definition combines everything above.

Just send a PUT /place request with the JSON as the body:
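Putting it together, place_index_mapping.json could look like the following (the field types for id and description are my assumption; the essential part is wiring my_index_analyzer as the index-time analyzer and my_search_analyzer as the search_analyzer on description):

```json
{
  "settings": {
    "analysis": {
      "char_filter": {
        "ampersand": {
          "type": "mapping",
          "mappings": ["& => and", "@ => at"]
        }
      },
      "filter": {
        "filter_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 12
        },
        "my_stop": {
          "type": "stop",
          "stopwords": ["a", "and", "are", "as", "be", "but", "for", "if", "into", "is", "it", "no", "not", "or", "such", "that", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"]
        }
      },
      "analyzer": {
        "my_index_analyzer": {
          "type": "custom",
          "char_filter": ["ampersand"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stop", "filter_edge_ngram"]
        },
        "my_search_analyzer": {
          "type": "custom",
          "char_filter": ["ampersand"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "description": {
        "type": "text",
        "analyzer": "my_index_analyzer",
        "search_analyzer": "my_search_analyzer"
      }
    }
  }
}
```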

curl -X PUT "localhost:9200/place?pretty" -H 'Content-Type: application/json' --data-binary '@/path_to_file/place_index_mapping.json'

After this, we will index some data into our index:

curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/json' -d '
{ "index" : { "_index" : "place"} }
{ "id" : "1", "description":"Turf Club Road, Turf City Superstore, Singapore" }
{ "index" : { "_index" : "place" } }
{ "id" : "2", "description":"Stevens Road, Mercure Singapore On Stevens, Singapore" }
{ "index" : { "_index" : "place"} }
{ "id" : "3", "description":"Maxwell Road, Vinfield Resources International LLP, Singapore" }
'

Let’s search now

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match" : {
            "description" : {
                "query" : "road mercu"
            }
        }
    }
}
'

If you observe, we searched for “road mercu”, which is our second document, but all 3 documents are returned in the results. Basically, if at least one term from the search query is present in a document, that document is returned.
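This default behavior can be sketched with plain Python sets (a toy model that ignores analysis order and scoring; each document's term set includes its indexed edge-grams, with only the relevant gram "mercu" shown for brevity):

```python
# Toy model of the match query's "operator" setting.
docs = {
    1: {"turf", "club", "road", "city", "superstore", "singapore"},
    2: {"stevens", "road", "mercure", "mercu", "singapore"},
    3: {"maxwell", "road", "vinfield", "singapore"},
}

def matches(doc_terms, query_terms, operator="or"):
    if operator == "and":
        return query_terms <= doc_terms    # every query term must be present
    return bool(query_terms & doc_terms)   # any single query term is enough

query = {"road", "mercu"}
print([d for d, t in docs.items() if matches(t, query)])          # [1, 2, 3]
print([d for d, t in docs.items() if matches(t, query, "and")])   # [2]
```

With the default OR semantics every document containing "road" matches; requiring all terms narrows the result to the document the user actually means.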

The other documents may not be relevant for the user, so let’s fix this with the and operator:

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match" : {
            "description" : {
                "query" : "road merz",
                "operator": "and"
            }
        }
    }
}
'

Yeah, it’s working. Now let’s try again with “road merz” — oops, we misspelled ‘Mercure’. Let’s make it work even if the user makes a mistake, using Elasticsearch fuzziness:

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match" : {
            "description" : {
                "query" : "road merz",
                "operator": "and",
                "fuzziness":"AUTO"
            }
        }
    }
}
'
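Under the hood, fuzziness is Levenshtein edit distance; with "AUTO", terms of length 1–2 must match exactly, terms of length 3–5 tolerate one edit, and longer terms tolerate two. A small sketch (illustrative; not Elasticsearch code):

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def auto_fuzziness(term):
    # AUTO: exact match for length <= 2, one edit for 3-5, two edits for longer
    return 0 if len(term) <= 2 else 1 if len(term) <= 5 else 2

# "merz" is one edit away from the indexed edge-gram "merc"
print(levenshtein("merz", "merc"))                            # 1
print(levenshtein("merz", "merc") <= auto_fuzziness("merz"))  # True
```

Since "merz" is a 4-character term, AUTO allows one edit, which is exactly enough to reach the stored edge-gram "merc" of ‘Mercure’.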

Now let’s wrap up what we have covered:

1. Built an index analyzer that:
  • converts to lowercase
  • removes stop words
  • converts special characters
  • tokenizes text from left to right (edge n-gram)
2. Built a search analyzer that:
  • converts to lowercase
  • removes stop words
  • converts special characters
3. Performed searches with:
  • match on any term
  • match on all terms
  • fuzziness

Thanks, guys — hope you liked it!
