Project author: ggeop

Project description: Machine Learning - Naive Bayes Classifier

Project address: git://github.com/ggeop/Bayes-Naives-Classifier-ML.git
Created: 2018-05-16T18:21:56Z
Project community: https://github.com/ggeop/Bayes-Naives-Classifier-ML

License: MIT License

(cover image from: https://chrisalbon.com/) :-)

Bayes-Naives-Classifier

Description

We have created a very simple data set consisting of ten observations of seven input features. It is a simplified rendition of the German Credit data set from the UCI repository used in the Support Vector Machine lab. The output variable is the Decision column, which takes the value 1 when we reject the loan and 0 when we accept it. We are going to construct a very simple Naïve Bayes model for this problem, and we are going to train it manually.
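For binary features such as ours, Naïve Bayes scores each class by multiplying its prior probability with the per-feature conditional probabilities, under the assumption that the features are independent given the class. In sketch form, this is the rule the code below trains by hand:

    P(Class = c | F1, ..., F7) ∝ P(Class = c) × Π_i P(Fi = fi | Class = c)

The classifier predicts whichever class gets the larger score.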

Implementation

First, we create a small data set in order to test our code:

    d1 <- c(1, 0, 0, 1, 0, 0, 0, 1)
    d2 <- c(0, 1, 0, 1, 1, 0, 0, 0)
    d3 <- c(0, 0, 1, 0, 0, 0, 1, 0)
    d4 <- c(0, 0, 0, 1, 0, 0, 0, 0)
    d5 <- c(0, 0, 0, 0, 0, 0, 1, 0)
    d6 <- c(0, 0, 0, 1, 0, 1, 0, 1)
    d7 <- c(0, 0, 1, 0, 0, 1, 0, 1)
    d8 <- c(1, 0, 0, 0, 0, 0, 0, 1)
    d9 <- c(0, 0, 0, 0, 0, 1, 0, 1)
    d10 <- c(1, 1, 0, 0, 0, 1, 0, 1)
    nb_df <- as.data.frame(rbind(d1, d2, d3, d4, d5, d6, d7, d8, d9, d10))
    names(nb_df) <- c("BadCredit", "HasStableJob", "OwnsHouse", "BigLoan",
                      "HasLargeBankAccount", "HasPriorLoans", "HasDependents", "Decision")

Then we build the Naïve Bayes classifier piece by piece, starting with the prior probabilities of the two classes:

    # Create the class vector
    decision <- nb_df$Decision
    # Calculate the probability of the loan being accepted
    p_accept <- sum(decision == 0) / length(decision)
    # Calculate the probability of the loan being rejected
    p_reject <- sum(decision == 1) / length(decision)
    # Create a vector with the prior probabilities
    priors <- c(p_accept, p_reject)
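With the ten observations above, four loans are accepted (d2 through d5) and six rejected, so the priors come out to 0.4 and 0.6. A quick standalone sanity check, using just the Decision column from the data set above:

```r
# Decision column of d1..d10 from the data set above
decision <- c(1, 0, 0, 0, 0, 1, 1, 1, 1, 1)
p_accept <- sum(decision == 0) / length(decision)  # 4/10 = 0.4
p_reject <- sum(decision == 1) / length(decision)  # 6/10 = 0.6
```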

Next, compute a summary data frame in which one row contains the probabilities P(Fi = 1 | Class = 0) for all the different features Fi and the other row contains the probabilities P(Fi = 1 | Class = 1). For example, the cell at [1, 1] contains the probability that, when we accept the loan (Class = 0), the loan applicant has bad credit (BadCredit = 1).

    # P(Fi = 1 | Class)
    aggregate(x = nb_df[c("BadCredit", "HasStableJob", "OwnsHouse", "BigLoan",
                          "HasLargeBankAccount", "HasPriorLoans", "HasDependents")],
              by = nb_df[c("Decision")],
              FUN = function(x) sum(x) / length(x))
    # P(Fi = 0 | Class)
    aggregate(x = nb_df[c("BadCredit", "HasStableJob", "OwnsHouse", "BigLoan",
                          "HasLargeBankAccount", "HasPriorLoans", "HasDependents")],
              by = nb_df[c("Decision")],
              FUN = function(x) 1 - sum(x) / length(x))

Next, recalculate the matrix of probabilities from the previous step to incorporate additive (Laplace) smoothing, which prevents a feature value that never occurs in a class from turning the whole product of probabilities into zero.

    # Recalculate the probabilities with additive smoothing
    prob_matrix <- aggregate(x = nb_df[c("BadCredit", "HasStableJob", "OwnsHouse", "BigLoan",
                                         "HasLargeBankAccount", "HasPriorLoans", "HasDependents")],
                             by = nb_df[c("Decision")],
                             FUN = function(x) (sum(x) + 1) / (length(x) + 2))
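To see why smoothing matters here: in the data set above, none of the four accepted loans has BadCredit = 1, so the unsmoothed estimate of P(BadCredit = 1 | Class = 0) is exactly zero and would wipe out the whole product for any applicant with bad credit. The smoothed estimate stays strictly positive:

```r
# BadCredit values of the four accepted loans (d2..d5) in the data set above
bad_credit_accept <- c(0, 0, 0, 0)
unsmoothed <- sum(bad_credit_accept) / length(bad_credit_accept)            # 0
smoothed <- (sum(bad_credit_accept) + 1) / (length(bad_credit_accept) + 2)  # 1/6
```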

Finally, we create the Naïve Bayes classifier:

    classifier <- function(observation, priors, prob_matrix)
    {
      # Drop the Decision column
      observation$Decision <- NULL
      prob_matrix$Decision <- NULL
      # Calculate the probability for the Reject class (C = 1)
      p <- c()
      for (i in 1:ncol(observation))
      {
        if (observation[[i]] == 1)
        {
          p[i] <- prob_matrix[2, i]
        }
        else
        {
          p[i] <- 1 - prob_matrix[2, i]
        }
      }
      prob_reject <- prod(p) * priors[2]
      # Calculate the probability for the Accept class (C = 0)
      p <- c()
      for (i in 1:ncol(observation))
      {
        if (observation[[i]] == 1)
        {
          p[i] <- prob_matrix[1, i]
        }
        else
        {
          p[i] <- 1 - prob_matrix[1, i]
        }
      }
      prob_accept <- prod(p) * priors[1]
      # Assign the class with the highest probability
      if (prob_accept > prob_reject)
      {
        return(0)
      }
      else
      {
        return(1)
      }
    }

    predict_nb <- function(test_df, priors, prob_matrix)
    {
      predict <- c()
      for (i in 1:nrow(test_df))
      {
        predict[i] <- classifier(test_df[i, ], priors, prob_matrix)
      }
      return(predict)
    }
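To make the arithmetic inside classifier concrete, here is the same product-of-probabilities computation on a single observation, using three illustrative (made-up) conditional probabilities rather than the ones stored in prob_matrix:

```r
# Hypothetical values of P(Fi = 1 | Class) for three features
p_given_reject <- c(0.7, 0.2, 0.5)
p_given_accept <- c(0.1, 0.6, 0.5)
priors <- c(0.4, 0.6)  # (accept, reject), as in the code above
obs <- c(1, 0, 1)      # observed feature values

# Use p when the feature is 1 and 1 - p when it is 0, then multiply
lik_reject <- prod(ifelse(obs == 1, p_given_reject, 1 - p_given_reject))
lik_accept <- prod(ifelse(obs == 1, p_given_accept, 1 - p_given_accept))
score_reject <- lik_reject * priors[2]  # 0.28 * 0.6 = 0.168
score_accept <- lik_accept * priors[1]  # 0.02 * 0.4 = 0.008
decision <- if (score_accept > score_reject) 0 else 1
```

Here the rejection score dominates, so this observation would be classified as 1 (reject).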

Finally, compute the training accuracy of the Naïve Bayes model using the functions we just created:

    # Create the accuracy function
    accuracy <- function(test_dataset, predict_values)
    {
      count <- 0
      for (i in 1:length(test_dataset$Decision))
      {
        if (predict_values[i] == test_dataset$Decision[i])
        {
          count <- count + 1
        }
      }
      return(count / length(test_dataset$Decision))
    }
    # We don't have a separate test data set, so we take a partition of the original one
    test <- nb_df[1:3, ]
    prediction <- predict_nb(test, priors, prob_matrix)
    # Note that this accuracy is optimistic: we evaluate on the
    # same observations that the model was trained on
    accuracy(test, prediction)