Project author: baibai25

Project description:
Multivariate Normal Distribution based Oversampling
Primary language: Jupyter Notebook
Repository: git://github.com/baibai25/MNDO.git
Created: 2018-02-22T11:36:49Z
Project page: https://github.com/baibai25/MNDO

License: MIT License


MNDO

Python implementation of MNDO (Multivariate Normal Distribution based Oversampling).

Article about this implementation

Requirements

  • Anaconda / Python 3.6
  • tqdm 4.31.1
  • imbalanced-learn 0.4.3

Usage

Preprocessing Keel-datasets

If you use KEEL datasets, you can preprocess them with the following command.

  1. python pre_dataset.py dataset_directory
  • Preprocesses all files in the directory.
  • Removes unnecessary lines and replaces class labels (positive class -> 1, negative class -> 0).
  • Preprocessed data is saved to MNDO/Predataset/xxx.csv
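
The preprocessing described above can be illustrated with a minimal sketch. It is not the code in pre_dataset.py. It assumes the usual KEEL layout, where header lines start with `@` and the last column holds the class spelled `positive` or `negative`. The function name `preprocess_keel` and the toy data are hypothetical.

```python
import csv
import io


def preprocess_keel(text):
    """Drop KEEL '@' header lines and map the class label to 1/0."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Header lines (@relation, @attribute, @data, ...) carry no samples.
        if not line or line.startswith("@"):
            continue
        *features, label = [field.strip() for field in line.split(",")]
        # Assumption: the last column is the class, "positive" or "negative".
        rows.append(features + ["1" if label == "positive" else "0"])
    return rows


# Toy input in KEEL style (hypothetical data, not from a real dataset).
sample = """@relation toy
@attribute a real [0.0, 1.0]
@attribute b real [0.0, 1.0]
@attribute class {positive, negative}
@data
0.1, 0.2, negative
0.9, 0.8, positive
"""

rows = preprocess_keel(sample)

# Write the cleaned rows as CSV (to a string here; a file in practice).
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```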

Over-sampling

Resampled (generated) data is stored in ./pos_data.

  1. python over-sampling.py data_path
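
As a rough illustration of the idea behind the name, the sketch below fits a multivariate normal distribution to the minority class and draws new samples from it until the classes are balanced. It is an assumption-based sketch, not the logic of over-sampling.py. The function name `mvn_oversample`, its parameters, and the synthetic data are hypothetical, and the repository's procedure may differ in its details.

```python
import numpy as np


def mvn_oversample(X, y, minority_label=1, random_state=0):
    """Draw synthetic minority samples from a Gaussian fitted to that class."""
    rng = np.random.default_rng(random_state)
    X_min = X[y == minority_label]
    # Number of samples needed to match the size of the other class.
    n_new = int((y != minority_label).sum() - len(X_min))
    if n_new <= 0:
        return np.empty((0, X.shape[1]))
    mean = X_min.mean(axis=0)
    # rowvar=False: rows are samples, columns are features.
    cov = np.atleast_2d(np.cov(X_min, rowvar=False))
    return rng.multivariate_normal(mean, cov, size=n_new)


# Synthetic imbalanced data: 90 negative and 10 positive samples.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (90, 2)), rng.normal(3.0, 0.5, (10, 2))])
y = np.array([0] * 90 + [1] * 10)

new_samples = mvn_oversample(X, y)
print(new_samples.shape)
```

The generated rows would then be appended to the original data with the minority label before training.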

Training

  1. python train.py data_path

train.py steps:

  1. Load data
  2. Over-sampling (MNDO, SMOTE, Borderline-SMOTE, ADASYN, SMOTE-ENN and SMOTE-Tomek Links)
  3. Scaling (Normalization or Standardization)
  4. Learning (SVM, Decision Tree and k-NN)
  5. Predict (Results are saved in MNDO/output/xxx.csv)

If you want to train on all files, you can use this script:

  1. ./run.sh

ToDo

  • Provide as a Python library

Author

Kotaro Ambai (baibai25)