NGram, Edge NGram, and Partial Gram in Elasticsearch
When implementing autocomplete or partial search features in Elasticsearch, you often use tokenizers like NGram or Edge NGram. These tokenizers help Elasticsearch break down words into smaller components (“grams”) to make partial or fuzzy matching possible.
NGram
- What it does: Breaks a word into all possible substrings of a given length.
- Example (min_gram: 2, max_gram: 3) for the word “test”:
- Output: ["te", "tes", "es", "est", "st"]
- Use case: Fuzzy matching and typo-tolerant search.
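The sliding-window behavior can be sketched in plain Python (a simplified model of the `ngram` tokenizer, ignoring Elasticsearch details like `token_chars`):

```python
def ngrams(text, min_gram=2, max_gram=3):
    """Emit every substring whose length is between min_gram and max_gram,
    sliding left to right across the text."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams

print(ngrams("test"))  # ['te', 'tes', 'es', 'est', 'st']
```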
Edge NGram
- What it does: Only generates substrings from the beginning of a word (prefix-based).
- Example (min_gram: 2, max_gram: 3) for the word “test”:
- Output: ["te", "tes"]
- Use case: Autocomplete or search-as-you-type features.
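Because Edge NGram anchors every gram at the start of the word, the equivalent sketch is just a loop over prefix lengths (again a simplified model of the `edge_ngram` tokenizer):

```python
def edge_ngrams(text, min_gram=2, max_gram=3):
    """Emit only prefixes of text, from min_gram up to max_gram characters."""
    return [text[:size] for size in range(min_gram, min(max_gram, len(text)) + 1)]

print(edge_ngrams("test"))  # ['te', 'tes']
```

Note how much smaller this output is than the full NGram output for the same word, which is why Edge NGram indexes tend to stay leaner.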
Partial Gram (Informal Term)
- What it means: Refers generally to substring matching (including NGram/Edge NGram usage).
- It's not an official Elasticsearch term, but it is commonly used in documentation and discussions.
Note:
https://www.elastic.co/docs/reference/text-analysis/analysis-edgengram-tokenizer
Using NGram analyzers can significantly increase index size, so use them carefully. Edge NGram is generally more efficient for autocomplete purposes.
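Putting this together, an autocomplete analyzer built on the `edge_ngram` tokenizer could be configured roughly like this (index settings shown as a Python dict; the names `autocomplete` and `autocomplete_tokenizer` are illustrative, not standard):

```python
# Sketch of index settings for an edge_ngram-based autocomplete analyzer.
# "autocomplete" and "autocomplete_tokenizer" are made-up names for this example.
settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "autocomplete_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 3,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "autocomplete_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    }
}
```

Keeping `max_gram` small limits how many tokens each word expands into, which is the main lever for controlling index growth.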