NGram, Edge NGram, and Partial Gram in Elasticsearch

When implementing autocomplete or partial-search features in Elasticsearch, you often use tokenizers (or token filters) such as ngram and edge_ngram. These break words down into smaller components (“grams”) so that partial or fuzzy matching becomes possible.

📘 NGram

  • What it does: Breaks a word into all possible substrings of a given length.
  • Example (min_gram: 2, max_gram: 3) for the word “test”:
  • Output: ["te", "tes", "es", "est", "st"]
  • Use case: Fuzzy matching and typo-tolerant search.
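To make the token output above concrete, here is a minimal Python sketch that approximates what the ngram tokenizer emits (this is an illustration of the splitting logic, not the actual Lucene implementation):

```python
def ngrams(text, min_gram=2, max_gram=3):
    """Approximate Elasticsearch's ngram tokenizer: for each start
    position, emit every substring of length min_gram..max_gram."""
    out = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                out.append(text[start:start + size])
    return out

print(ngrams("test"))  # ['te', 'tes', 'es', 'est', 'st']
```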

📘 Edge NGram

  • What it does: Only generates substrings from the beginning of a word (prefix-based).
  • Example (min_gram: 2, max_gram: 3) for the word “test”:
  • Output: ["te", "tes"]
  • Use case: Autocomplete or search-as-you-type features.
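The prefix-only behaviour can be sketched the same way; again, this is just an approximation of what the edge_ngram tokenizer produces, not the real implementation:

```python
def edge_ngrams(text, min_gram=2, max_gram=3):
    """Approximate Elasticsearch's edge_ngram tokenizer: emit only
    prefixes of length min_gram..max_gram."""
    return [text[:size]
            for size in range(min_gram, min(max_gram, len(text)) + 1)]

print(edge_ngrams("test"))  # ['te', 'tes']
```

Because only prefixes are emitted, the token count grows linearly with max_gram rather than with word length, which is why edge n-grams are the usual choice for autocomplete.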

📘 Partial Gram (Informal Term)

  • What it means: Refers generally to substring matching (including NGram/Edge NGram usage).
  • It’s not an official Elasticsearch term, but it is commonly used in documentation and discussions.

📌 Note:

Using NGram analyzers can significantly increase index size, so use them carefully. Edge NGram is generally more efficient for autocomplete purposes.

References:

https://www.elastic.co/docs/reference/text-analysis/analysis-edgengram-tokenizer

https://stackoverflow.com/questions/33833781/elasticsearch-partial-exact-scoring-with-edge-ngram-fuzziness
