Lemmatization: implementation using Python

Reducing morphological variation by grouping words under one common root

JIRA CODE – JJ-134

Lemmatization is the process of grouping together the different inflected forms of a word so they can be analysed as a single item. It is similar to stemming, but it takes context into account, so it maps words with the same meaning to one base form, the lemma.
Text preprocessing commonly includes stemming, lemmatization, or both. The two terms are often confused, and some people treat them as the same thing. They differ: stemming strips affixes by rule, while lemmatization performs a morphological analysis of each word. For that reason lemmatization is usually preferred when the result has to be a real word.
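To make the difference concrete, here is a minimal sketch that uses only the standard library. The suffix list, the lemma table, and the function names toy_stem and toy_lemmatize are all made up for illustration; they are not how NLTK or any real stemmer works internally, only a rough picture of rule-based stripping versus dictionary lookup.

```python
# Toy contrast between stemming and lemmatization (illustrative only).

# Hypothetical suffix list; real stemmers such as Porter use many more rules.
SUFFIXES = ("ing", "ies", "es", "ed", "s")

# Hypothetical lemma dictionary keyed by (word, part of speech).
LEMMAS = {
    ("studies", "n"): "study",
    ("studies", "v"): "study",
    ("better", "a"): "good",
    ("was", "v"): "be",
}


def toy_stem(word):
    """Strip the first matching suffix, with no knowledge of the vocabulary."""
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word, pos="n"):
    """Look the word up with its part of speech; fall back to the word itself."""
    return LEMMAS.get((word, pos), word)


for word, pos in [("studies", "n"), ("better", "a"), ("was", "v")]:
    print(word, "| stem:", toy_stem(word), "| lemma:", toy_lemmatize(word, pos))
```

In this sketch the stemmer turns "studies" into "stud", which is not a word, while the lookup returns "study". The stemmer also cannot relate "better" to "good", because no suffix rule connects them.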

Applications of lemmatization:

1: Used in search engines.

2: Used in compact indexing.
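Both applications come down to the same idea: index documents under the lemma rather than under each surface form. The sketch below is a simplified illustration, not a real search engine. LEMMA_TABLE, lemma_of, build_index, and search are hypothetical names, and the small lookup table stands in for a real lemmatizer.

```python
from collections import defaultdict

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMA_TABLE = {
    "rocks": "rock",
    "rocked": "rock",
    "rocking": "rock",
    "mice": "mouse",
}


def lemma_of(token):
    """Lowercase the token and map it to its lemma if the table knows it."""
    token = token.lower()
    return LEMMA_TABLE.get(token, token)


def build_index(documents):
    """Build an inverted index from lemma to the set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            index[lemma_of(token)].add(doc_id)
    return index


def search(index, query):
    """Lemmatize the query term and return matching document ids, sorted."""
    return sorted(index.get(lemma_of(query), set()))


docs = {
    1: "The band rocks",
    2: "She rocked the boat",
    3: "Mice like cheese",
}
index = build_index(docs)

print(search(index, "rocking"))  # matches documents containing "rocks" or "rocked"
print(search(index, "mouse"))    # matches the document containing "Mice"
print(len(index), "index keys")
```

Because "rocks" and "rocked" share the single key "rock", the index holds fewer entries than one built on raw tokens, and a query in any inflected form still finds both documents.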

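import nltk
from nltk.stem import WordNetLemmatizer
nltk.download("wordnet")  # one-time download of the WordNet data the lemmatizer needs
lemmatizer = WordNetLemmatizer()
# With no pos argument, the word is treated as a noun.
print("rocks :", lemmatizer.lemmatize("rocks"))
# pos="a" marks the word as an adjective, so "better" maps to "good".
print("better :", lemmatizer.lemmatize("better", pos="a"))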

Output:
rocks : rock
better : good
