Naive Bayes is a classification technique based on Bayes' theorem, with the assumption that the predictors are independent of one another. In other words, a Naive Bayes classifier assumes that, within a class, the presence of a particular feature is unrelated to the presence of any other feature.

A Naive Bayes model is easy to build and particularly useful for very large data sets. Despite its simplicity, it can be competitive with, and on some tasks outperform, far more sophisticated classification methods.

Naive Bayes uses Bayes' theorem to predict the probability of each class from the values of the attributes. The algorithm is mostly used in text classification and in problems with multiple classes.

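Bayes' theorem gives the probability P(c|x), where c is one of the possible classes and x is the instance to be classified, described by its features. In the formula below, P(x|c) is the likelihood of the features given the class, P(c) is the prior probability of the class, and P(x) is the probability of the features.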

P(c|x) = P(x|c) * P(c) / P(x)
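To make the formula concrete, here is a small sketch that applies it to two classes. The probabilities are invented for illustration and do not come from any data set, and `posterior` is a hypothetical helper name.

```python
def posterior(likelihoods, priors):
    """Return P(c|x) for every class c, given P(x|c) and P(c)."""
    # Numerator of Bayes' theorem for each class: P(x|c) * P(c)
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    # The evidence P(x) is the sum of the numerators over all classes
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Made-up numbers, for illustration only
priors = {"positive": 0.6, "negative": 0.4}        # P(c)
likelihoods = {"positive": 0.2, "negative": 0.05}  # P(x|c)

result = posterior(likelihoods, priors)
best_class = max(result, key=result.get)
print(result, best_class)
```

Because P(x) is the same for every class, it only rescales the scores; the class with the largest P(x|c) * P(c) is also the class with the largest P(c|x).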

Applied to text, Naive Bayes predicts the tag of a text: it calculates the probability of each tag for the given text and then outputs the tag with the highest probability.
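The idea can be sketched from scratch in a few lines. This is a minimal illustration and not the scikit-learn implementation: it assumes a word-count (multinomial) model with add-one smoothing, the three training texts are toy data made up for the sketch, and `train` and `predict` are hypothetical helper names.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (text, tag). Returns tag counts, per-tag word counts, vocabulary."""
    tag_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, tag in samples:
        words = text.lower().split()
        tag_counts[tag] += 1
        word_counts[tag].update(words)
        vocab.update(words)
    return tag_counts, word_counts, vocab


def predict(text, tag_counts, word_counts, vocab):
    """Return the tag with the highest log-probability score for `text`."""
    total = sum(tag_counts.values())
    scores = {}
    for tag in tag_counts:
        # log P(c): share of training texts that carry this tag
        score = math.log(tag_counts[tag] / total)
        n_words = sum(word_counts[tag].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # log P(word | c) with add-one smoothing, so unseen pairs are not zero
            score += math.log((word_counts[tag][word] + 1) / (n_words + len(vocab)))
        scores[tag] = score
    return max(scores, key=scores.get), scores


# Toy training data, invented for this sketch
samples = [
    ("nice story good movie", "positive"),
    ("liked the movie", "positive"),
    ("sad boring movie", "negative"),
]
model = train(samples)
tag_a, _ = predict("good story", *model)
tag_b, _ = predict("sad boring", *model)
print(tag_a, tag_b)
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow on longer texts; the ranking of the tags is unchanged.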

Code:

Cleaning texts

import pandas as pd
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

dataset = [["I liked the movie", "positive"],
           ["It's a good movie. Nice story", "positive"],
           ["Hero's acting is bad but heroine looks good. Overall nice movie", "positive"],
           ["Nice songs. But sadly boring ending.", "negative"],
           ["sad movie, boring movie", "negative"]]

dataset = pd.DataFrame(dataset)
dataset.columns = ["Text", "Reviews"]

nltk.download('stopwords')

corpus = []

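# Build the stop-word set and the stemmer once, outside the loop
stop_words = set(stopwords.words('english'))
ps = PorterStemmer()

for i in range(len(dataset)):
    # Replace every non-letter with a space so that words stay separated
    text = re.sub('[^a-zA-Z]', ' ', dataset['Text'][i])
    text = text.lower()
    text = text.split()
    # Drop stop words and reduce the remaining words to their stems
    text = [ps.stem(word) for word in text if word not in stop_words]
    text = ' '.join(text)
    corpus.append(text)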

cv = CountVectorizer(max_features = 1500)

X = cv.fit_transform(corpus).toarray()
y = dataset.iloc[:, 1].values
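To show what the bag-of-words matrix X contains, here is a simplified pure-Python sketch of the counting step. It is not scikit-learn's implementation and skips details such as CountVectorizer's default token pattern, which drops one-character tokens; `bag_of_words` is a hypothetical helper name and the two input texts are shortened examples.

```python
from collections import Counter


def bag_of_words(corpus):
    """Return (vocabulary, matrix), where matrix[i][j] counts vocabulary[j] in corpus[i]."""
    # One column per distinct word, in alphabetical order
    vocabulary = sorted({word for doc in corpus for word in doc.split()})
    matrix = []
    for doc in corpus:
        counts = Counter(doc.split())
        # A Counter returns 0 for words that do not occur in this document
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


vocabulary, matrix = bag_of_words(["sad movie boring movie", "liked the movie"])
print(vocabulary)
print(matrix)
```

Each row is one text and each column one word, so a text is represented only by how often each word occurs in it; word order is discarded.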

Splitting the data set into training set and test set

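# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)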

Fitting Naive Bayes to the training set

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

classifier = GaussianNB()
classifier.fit(X_train, y_train)

Predicting test set results

y_pred = classifier.predict(X_test)

Making the confusion matrix

cm = confusion_matrix(y_test, y_pred)
cm

Output:

array([[0, 0],
       [2, 0]])
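One way to read this matrix, assuming scikit-learn's convention that rows are true labels and columns are predicted labels, with the labels in sorted order (negative, then positive), is sketched below. The variable names are illustrative only.

```python
cm = [[0, 0],
      [2, 0]]
labels = ["negative", "positive"]  # assumed order: sorted label names

# Correct predictions sit on the diagonal
correct = sum(cm[i][i] for i in range(len(cm)))
total = sum(sum(row) for row in cm)
accuracy = correct / total

# Every non-zero off-diagonal cell is a kind of mistake: (true, predicted, count)
errors = [(labels[i], labels[j], cm[i][j])
          for i in range(len(cm))
          for j in range(len(cm))
          if i != j and cm[i][j] > 0]

print(accuracy)
print(errors)
```

Under that reading, both test reviews were positive and both were predicted as negative. With only five reviews and a two-sample test set, the result says little about the classifier itself; a larger data set is needed for a meaningful evaluation.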
