Project author: MaitriGurey

Project description:
Classifies comments by toxicity type. Uses MultinomialNB, SVM, and GaussianNB classifiers. Accuracy is measured using the Binary Relevance method.
Language:
Project URL: git://github.com/MaitriGurey/Toxic-Comments-Classification.git


Toxic-Comments-Classification

Aim of the Project:
Social media has become an integral part of everyday life, and with it, dealing with online toxicity and curbing harassment has become a growing problem. It is almost impossible to engage in online conversations without witnessing toxic behavior such as harassment or disrespect.

The aim of the project is to categorize toxic comments by type of toxicity. The types are toxic, severely toxic, obscene, threat, insult, and identity hate. This is a multi-label classification problem, which means that a given comment may belong to more than one category at the same time.
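
To make the multi-label idea concrete, here is a minimal sketch of how one comment's categories can be written as a 0/1 indicator vector with one slot per label. The label names below are assumed to match the column names of the Kaggle dataset, and the helper functions `encode_labels` and `decode_labels` are hypothetical names introduced only for this illustration.

```python
# Label names assumed to match the Kaggle dataset columns (an assumption,
# check them against the downloaded train.csv).
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def encode_labels(active):
    """Return a 0/1 indicator vector with one slot per entry in LABELS."""
    unknown = set(active) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in LABELS]


def decode_labels(vector):
    """Return the label names whose slot is set in the indicator vector."""
    return [name for name, flag in zip(LABELS, vector) if flag]


# A single comment can carry several labels at once, or none at all.
both = encode_labels({"toxic", "insult"})
clean = encode_labels(set())
print(both, decode_labels(both))
print(clean, decode_labels(clean))
```

A multi-class problem would allow exactly one slot to be set per comment. Here any combination of slots is valid, including the all-zero vector for a non-toxic comment.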

Language and Libraries used:

  1. Python 3.7
  2. NumPy
  3. Pandas
  4. Matplotlib
  5. NLTK
  6. Seaborn

The dataset can be downloaded from https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge

Steps involved

  1. Getting the dataset
  2. Getting insights from dataset using visualisation tools.
  3. Preprocessing the data using NLTK.
  4. Applying Multi Label classification algorithms.
  5. Comparing the results and choosing the best among them.
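
Steps 4 and 5 can be sketched with the Binary Relevance method named in the project description: train one independent binary classifier per label, then score the combined predictions. The example below is only an illustration under stated assumptions, not the project's actual code. `TinyMultinomialNB` is a hypothetical pure-Python stand-in for a real MultinomialNB classifier, the training comments are made up, only two labels are used to keep it short, and Hamming loss is chosen here as one possible comparison metric.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Hypothetical stand-in: multinomial Naive Bayes for one binary label."""

    def fit(self, docs, y):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(y)
        for doc, label in zip(docs, y):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict_one(self, doc):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for c in (0, 1):
            # Smoothed log prior keeps the score finite for an unseen class.
            score = math.log((self.class_counts[c] + 1) / (total_docs + 2))
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for word in doc.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][word] + 1) / denom)
            scores[c] = score
        return 1 if scores[1] > scores[0] else 0


def fit_binary_relevance(docs, Y, make_model=TinyMultinomialNB):
    """Train one independent binary classifier per label column of Y."""
    return [make_model().fit(docs, [row[j] for row in Y]) for j in range(len(Y[0]))]


def predict_binary_relevance(models, docs):
    """Stack the per-label predictions back into one indicator row per document."""
    return [[model.predict_one(doc) for model in models] for doc in docs]


def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label decisions that are wrong."""
    wrong = sum(t != p for rt, rp in zip(Y_true, Y_pred) for t, p in zip(rt, rp))
    return wrong / (len(Y_true) * len(Y_true[0]))


# Made-up toy data; label columns are [toxic, insult].
train_docs = [
    "you are stupid and ugly",
    "this is garbage nonsense",
    "thanks for the helpful edit",
    "great article well written",
]
train_Y = [[1, 1], [1, 0], [0, 0], [0, 0]]

models = fit_binary_relevance(train_docs, train_Y)
test_docs = ["you are stupid", "thanks for the edit", "this is garbage"]
test_Y = [[1, 1], [0, 0], [1, 0]]
pred_Y = predict_binary_relevance(models, test_docs)
print(pred_Y, hamming_loss(test_Y, pred_Y))
```

Because `fit_binary_relevance` accepts any factory through `make_model`, the same wrapper could be reused to compare different base classifiers on the same labels, with the lowest Hamming loss indicating the best fit on the held-out comments.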