Stemming: Implementation using Python code

A normalizing method in Python

JIRA CODE: JJ-134

Stemming:
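Stemming is a normalization method. It reduces the different forms of a word to a common root, called the stem. Many variations of a word carry the same meaning and differ only in grammatical details such as tense or number. Mapping them to one stem lets a program treat "program", "programs", and "programming" as the same term.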
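The idea can be made concrete with a deliberately naive suffix-stripping stemmer in plain Python. This is an illustration only, not the Porter algorithm used in the examples below. The suffix list, the minimum stem length of 3, and the name `naive_stem` are arbitrary, hypothetical choices.

```python
# Naive suffix-stripping stemmer: an illustration only, NOT the Porter algorithm.
# The suffix list and minimum stem length are arbitrary assumptions.
SUFFIXES = ("ers", "ing", "er", "s")  # checked in this order, longer suffixes first
MIN_STEM_LEN = 3


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least MIN_STEM_LEN characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


variants = ["program", "programs", "programer", "programing", "programers"]
for w in variants:
    # Every variant in this list maps to "program"
    print(w, ":", naive_stem(w))
```

Rules this crude break down quickly. For example, `naive_stem("string")` returns `"str"`. Practical stemmers such as Porter's avoid this by adding conditions on what remains of the word before they strip a suffix.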
There are two main kinds of stemming error: over-stemming and under-stemming. Over-stemming occurs when two words that have different stems are reduced to the same root (a false positive). Under-stemming occurs when two words that should share a stem are reduced to different roots (a false negative).

Stemming is used in information retrieval systems such as search engines. It lets a query match documents that contain other forms of the same word.
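To see both errors in practice, the snippet below runs NLTK's PorterStemmer on two groups of words. These words are illustrative choices and do not come from the original examples. "universal", "university", and "universe" have different meanings but all reduce to "univers", which is over-stemming. "data" and "datum" are forms of the same word but keep different stems, which is under-stemming.

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# Over-stemming: words with different meanings collapse to one stem
over = ["universal", "university", "universe"]
# Under-stemming: forms of the same word end up with different stems
under = ["data", "datum"]

print({w: ps.stem(w) for w in over})
# {'universal': 'univers', 'university': 'univers', 'universe': 'univers'}
print({w: ps.stem(w) for w in under})
# {'data': 'data', 'datum': 'datum'}
```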

Code:
from nltk.stem import PorterStemmer

# One stemmer instance can be reused for every word
ps = PorterStemmer()

# Different inflected forms of the same word
words = ["program", "programs", "programer", "programing", "programers"]

for w in words:
    # Print each word next to its Porter stem
    print(w, ":", ps.stem(w))

Output:
program : program
programs : program
programer : program
programing : program
programers : program

Stemming words from sentences

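Code:
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# word_tokenize needs NLTK's Punkt tokenizer data; download it once with
# nltk.download("punkt") (recent NLTK releases use "punkt_tab" instead)
ps = PorterStemmer()

sentence = "Programers program with programing languages"
# Split the sentence into word tokens, then stem each token
words = word_tokenize(sentence)

for w in words:
    print(w, ":", ps.stem(w))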

Output:
Programers : program
program : program
with : with
programing : program
languages : languag
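A stem does not have to be a dictionary word, as "languages : languag" shows. It only has to be the same for related forms, and search engines depend on exactly that. The sketch below shows the idea as a minimal inverted index on a toy scale. It rests on several assumptions. The three-document collection is hypothetical. A simple regular-expression tokenizer replaces `word_tokenize`, so no Punkt download is needed. Multi-word queries use AND semantics. The names `docs`, `stem_tokens`, `index`, and `search` are illustrative and are not part of NLTK.

```python
import re
from collections import defaultdict

from nltk.stem import PorterStemmer

ps = PorterStemmer()

# Hypothetical toy document collection
docs = {
    1: "Programers program with programing languages",
    2: "A language for data analysis",
    3: "Cooking recipes for beginners",
}


def stem_tokens(text):
    # Simple regex tokenizer (an assumption; avoids needing NLTK's Punkt data)
    return [ps.stem(t) for t in re.findall(r"[a-z]+", text.lower())]


# Inverted index: stem -> set of ids of documents containing that stem
index = defaultdict(set)
for doc_id, text in docs.items():
    for stem in stem_tokens(text):
        index[stem].add(doc_id)


def search(query):
    # Return ids of documents that contain every stemmed query term
    terms = stem_tokens(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


print(search("programming"))  # {1}
print(search("languages"))    # {1, 2}
```

Because the query and the documents pass through the same stemmer, "programming" matches document 1 even though that exact word never appears in it.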
