Naive Bayes Email Spam Classification
This project classifies emails as spam or not spam using Naive Bayes classifiers trained on a training set. The data set was divided into a training set and a test set at a given split rate, and the effect of the training/test ratio on classification was checked by trying several split rates. The split rate that gave the best result was selected, and the Naive Bayes distributions were studied at that rate. A model was created for each Naive Bayes distribution, trained on the training set, and then used to predict the labels of the test set. The expected and predicted labels were compared using a confusion matrix and the accuracy score.
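The split-rate experiment described above can be sketched in plain Python. This is a minimal, standard-library sketch, not the project's original code: the helper names (`train_test_split`, `confusion_matrix`, `accuracy`, `sweep_split_rates`, `best_split_rate`) are hypothetical, and the candidate rates 0.5 to 0.9 and the fixed shuffle seed are assumptions. Here the split rate is taken to mean the fraction of samples assigned to the training set. Any model object with `fit(X, y)` and `predict(X)` methods can be plugged in through `make_model`.

```python
import random


def train_test_split(rows, labels, split_rate, seed=0):
    """Shuffle, then put the first `split_rate` fraction of samples in the training set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed (assumption) for repeatable splits
    cut = int(len(indices) * split_rate)
    train, test = indices[:cut], indices[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def confusion_matrix(y_true, y_pred):
    """2x2 matrix [[TN, FP], [FN, TP]]; rows are true labels, 1 = spam, 0 = not spam."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted label equals the expected label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sweep_split_rates(rows, labels, make_model, rates=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Train a fresh model at each split rate and return {split_rate: test accuracy}."""
    results = {}
    for rate in rates:
        X_tr, y_tr, X_te, y_te = train_test_split(rows, labels, rate)
        model = make_model()
        model.fit(X_tr, y_tr)
        results[rate] = accuracy(y_te, model.predict(X_te))
    return results


def best_split_rate(results):
    """Return the split rate whose test accuracy was highest."""
    return max(results, key=results.get)
```

Keeping the split, the metrics, and the sweep as separate functions lets the same evaluation be reused unchanged for each Naive Bayes variant.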
Dataset web page: https://archive.ics.uci.edu/ml/datasets/spambase
Number of Instances: 4601
Number of Attributes: 57 (plus one 0/1 spam class label)
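Assuming the `spambase.data` file from the dataset page has been downloaded locally, each row holds 58 comma-separated values: the 57 numeric attributes followed by the class label (1 = spam, 0 = not spam). A hypothetical loader (the function name `parse_spambase` is illustrative, not part of the original project) might look like this:

```python
def parse_spambase(lines):
    """Parse spambase.data rows into (features, labels).

    Each non-empty row is assumed to contain 57 numeric attributes
    followed by the class label (1 = spam, 0 = not spam).
    """
    X, y = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        values = line.split(",")
        if len(values) != 58:
            raise ValueError(f"expected 58 fields, got {len(values)}")
        X.append([float(v) for v in values[:57]])
        y.append(int(float(values[57])))
    return X, y
```

Usage, assuming the file sits in the working directory: `with open("spambase.data") as f: X, y = parse_spambase(f)`, which should yield 4601 rows of 57 features each.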
I applied the Multinomial, Bernoulli, and Gaussian Naive Bayes classifiers separately.
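The three variants differ only in how they model P(x | class); in every case prediction picks the class that maximizes log P(class) + log P(x | class). The sketch below is a from-scratch, standard-library illustration, not the code used in this project. The class names, the Laplace smoothing `alpha = 1.0`, the Bernoulli binarization `threshold = 0.0` (a value above it counts as "present"), and the fixed Gaussian variance floor `var_eps` are all assumptions chosen for illustration.

```python
import math


class _NaiveBayesBase:
    """Shared logic: learn log class priors, predict argmax_c [log P(c) + log P(x | c)]."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n = len(y)
        self.log_prior = {c: math.log(y.count(c) / n) for c in self.classes}
        by_class = {c: [x for x, t in zip(X, y) if t == c] for c in self.classes}
        self._fit_likelihood(by_class)  # variant-specific parameters
        return self

    def predict(self, X):
        return [max(self.classes,
                    key=lambda c: self.log_prior[c] + self._log_likelihood(x, c))
                for x in X]


class MultinomialNaiveBayes(_NaiveBayesBase):
    """Features treated as non-negative counts/frequencies, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def _fit_likelihood(self, by_class):
        self.log_theta = {}
        for c, rows in by_class.items():
            totals = [sum(col) + self.alpha for col in zip(*rows)]
            denom = sum(totals)
            self.log_theta[c] = [math.log(t / denom) for t in totals]

    def _log_likelihood(self, x, c):
        return sum(v * lt for v, lt in zip(x, self.log_theta[c]))


class BernoulliNaiveBayes(_NaiveBayesBase):
    """Each feature binarized to present (value > threshold) or absent."""

    def __init__(self, alpha=1.0, threshold=0.0):
        self.alpha, self.threshold = alpha, threshold

    def _fit_likelihood(self, by_class):
        self.p_present = {}
        for c, rows in by_class.items():
            n = len(rows)
            self.p_present[c] = [
                (sum(v > self.threshold for v in col) + self.alpha) / (n + 2 * self.alpha)
                for col in zip(*rows)]

    def _log_likelihood(self, x, c):
        return sum(math.log(p) if v > self.threshold else math.log(1 - p)
                   for v, p in zip(x, self.p_present[c]))


class GaussianNaiveBayes(_NaiveBayesBase):
    """Each feature ~ Normal(mean, variance) per class; var_eps avoids zero variance."""

    def __init__(self, var_eps=1e-9):
        self.var_eps = var_eps

    def _fit_likelihood(self, by_class):
        self.stats = {}
        for c, rows in by_class.items():
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            variances = [sum((v - m) ** 2 for v in col) / n + self.var_eps
                         for col, m in zip(zip(*rows), means)]
            self.stats[c] = list(zip(means, variances))

    def _log_likelihood(self, x, c):
        return sum(-0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                   for v, (m, var) in zip(x, self.stats[c]))
```

Working in log space avoids the numerical underflow that multiplying 57 small per-feature probabilities would otherwise cause.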
The Bernoulli Naive Bayes classifier achieved the highest accuracy score.