Implementation of the K-NN Algorithm in Python
JIRA CODE – JJ – 134
This algorithm is used to solve classification problems. The K-nearest neighbors (K-NN) algorithm essentially draws a decision boundary to classify the data: when a new data point comes in, the algorithm predicts its class from the classes of its nearest labelled neighbors.
Therefore, a larger k value produces smoother separation curves and a less complex model, whereas a smaller k value tends to overfit the data and produce a more complex model.
It is very important to choose the right k value when analyzing a dataset, to avoid overfitting and underfitting.
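One common way to pick a reasonable k is to compare cross-validated accuracy across several candidate values. The sketch below is illustrative, not part of the original text; the candidate k values and the Iris dataset are assumptions chosen for demonstration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Toy dataset for illustration (assumption: Iris, not from the original text)
X, y = load_iris(return_X_y=True)

# Evaluate several candidate k values with 5-fold cross-validation
scores = {}
for k in [1, 3, 5, 7, 9, 11]:
    clf = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(clf, X, y, cv=5).mean()

# Pick the k with the best mean cross-validated accuracy
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

A very small k will often score well on easy datasets too, so in practice the choice is also guided by how noisy the data is.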
Using the K-nearest neighbor algorithm, we fit the model to historical data (training) and use it to predict the class of new observations.
Implementation:
from sklearn.neighbors import KNeighborsClassifier
# minkowski metric with p=2 (the defaults) gives the Euclidean distance
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(X_train, y_train)
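The snippet above assumes `X_train` and `y_train` already exist. A minimal end-to-end sketch, using the same classifier settings on an assumed example dataset (Iris, with a 75/25 train/test split; neither is specified in the original):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative dataset and split (assumptions, not from the original text)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Same settings as above: Minkowski with p=2, i.e. Euclidean distance
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(X_train, y_train)

# Predict on held-out data and report accuracy
y_pred = classifier.predict(X_test)
print(accuracy_score(y_test, y_pred))
```

`fit` here mostly just stores the training points; the real work happens at prediction time, when distances to the stored points are computed.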
K-NN classification is used more in industry than in academia. It is easy to train (there is no real training phase), easy to use, and its results are easy to interpret.
Applications of the K-NN Algorithm: Text mining – K-NN is one of the most popular algorithms for text categorization or text mining.
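For text categorization, documents are first turned into numeric vectors (here via TF-IDF) and K-NN then classifies by the nearest labelled documents. A hedged sketch; the tiny corpus, labels, and k=1 are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Tiny invented corpus with two categories (illustrative only)
docs = [
    "the match ended in a draw after extra time",
    "the striker scored a late winning goal",
    "the new laptop ships with a faster processor",
    "the phone update improves battery life",
]
labels = ["sports", "sports", "tech", "tech"]

# TF-IDF vectorizes each document; K-NN classifies by the nearest
# labelled document (k=1 because the corpus is so small)
model = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=1))
model.fit(docs, labels)

pred = model.predict(["the striker scored a goal"])[0]
print(pred)
```

The query shares several words with one of the "sports" documents, so its nearest neighbor falls in that category. On real corpora, larger k and distance-weighted voting are common refinements.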