What is a fuzzy search?
A fuzzy search is a technique that uses search algorithms to find strings that match patterns approximately. It’s particularly useful for helping users find webpages without having to know exactly what they’re looking for or how a word is spelled. Fuzzy searches are also used for Structured Query Language lookups to help database users find records without having to be sure of the exact spelling of the value they’re looking for.
A fuzzy search is performed using a fuzzy matching algorithm, which returns a list of results based on likely relevance even though search argument words and spellings may not be an exact match. For web lookups, exact and highly relevant matches appear near the top of the list. Subjective relevance ratings may be given, usually as percentages.
How do fuzzy searches work?
A fuzzy matching algorithm can operate like a spell checker and spelling-error corrector. For example, if a user types “Misissippi” into Yahoo or Google — both of which use fuzzy matching — a list of hits is returned along with the question, “Did you mean Mississippi?” Alternative spellings and words that sound the same but are spelled differently are given.
A fuzzy matching algorithm can compensate for common input typing errors, as well as errors introduced by optical character recognition scanning of printed documents. The algorithm can return hits with content that contains a specified base word along with prefixes and suffixes. For example, if planet is entered as a search word, hits occur for sites containing words such as protoplanet or planetary. The program can also find synonyms and related terms, working like an online thesaurus or encyclopedic cross-reference tool.
Fuzzy matching algorithms usually return irrelevant hits, as well as relevant ones. Superfluous results are likely to occur for terms with multiple meanings, only one of which is the meaning the user intends. If the user has only a vague or general idea of the topic or doesn’t know exactly what to look for, the ratio of relevant hits to irrelevant hits tends to be low. However, the ratio is even lower when an exact matching program is used in this situation.
Fuzzy searching is much more powerful than exact searching when used for research and investigation and is especially useful when researching unfamiliar, foreign-language or sophisticated terms for which the proper spellings aren’t widely known. Fuzzy searching can also be used to locate individuals based on incomplete or partially accurate identifying information.
Many search engines enable users to specifically request a fuzzy search in the search query by using a tilde (~) at the end of the word or term they want to search with fuzziness.
What are the pros and cons of fuzzy matching?
While fuzzy matching can be helpful to researchers who don’t know the exact spelling of what they’re looking for, it also can be a source of frustration, as it may return irrelevant results that the user must then filter.
Fuzzy matching offers the following benefits:
- Online visitors can find products or product categories without knowing the correct spellings.
- Tourists can locate cities without knowing exact spellings or by using an approximate spelling in cases where the city name is listed in a different language.
- Scientists can find articles without having to know the exact titles.
- Users can find the names of films or books without knowing the exact titles.
- Users can search for people without knowing the exact spelling of their names.
Fuzzy matching also has the following drawbacks:
- It sometimes returns too many results, requiring the user to sift through all the suggestions to find the one that’s most appropriate. Sometimes, the most appropriate result is toward the end of the list, several pages down; and sometimes, none of the results are appropriate.
- It only compensates for typos or misspellings and must be complemented with a semantic algorithm to also consider synonyms and semantics.