Project author: baibai25
Project description: Multivariate Normal Distribution based Oversampling
Language: Jupyter Notebook
Project URL: git://github.com/baibai25/MNDO.git
MNDO
Python implementation of MNDO (Multivariate Normal Distribution based Oversampling).
Article about this implementation
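The idea named in the project title can be illustrated with a short NumPy sketch: estimate the mean vector and covariance matrix of the minority (positive) class, then draw new points from the corresponding multivariate normal distribution. This is a simplified illustration under that assumption, not the repository's code; the function name `mnd_oversample` is hypothetical, and the actual MNDO procedure may differ in how it estimates the distribution or filters the generated samples. NumPy is assumed to be available (imbalanced-learn depends on it).

```python
import numpy as np


def mnd_oversample(X_min, n_samples, random_state=None):
    """Hypothetical helper: draw synthetic minority samples from a
    multivariate normal distribution fitted to the minority class."""
    X_min = np.asarray(X_min, dtype=float)
    # Per-feature mean of the minority class.
    mean = X_min.mean(axis=0)
    # Sample covariance (features in columns); atleast_2d keeps the
    # single-feature case as a 1x1 matrix.
    cov = np.atleast_2d(np.cov(X_min, rowvar=False))
    rng = np.random.default_rng(random_state)
    return rng.multivariate_normal(mean, cov, size=n_samples)


# Usage: 40 toy minority points in 2-D, 200 synthetic points drawn.
rng = np.random.default_rng(0)
X_minority = rng.normal(loc=[2.0, -1.0], scale=[0.5, 1.5], size=(40, 2))
X_new = mnd_oversample(X_minority, n_samples=200, random_state=1)
print(X_new.shape)  # (200, 2)
```

Fitting a single Gaussian keeps the generator simple and smooth, but it assumes the minority class is roughly unimodal; a multimodal class would need a more elaborate model.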
Requirements
- Anaconda / Python 3.6
- tqdm 4.31.1
- imbalanced-learn 0.4.3
Usage
Preprocessing KEEL datasets
If you use KEEL datasets, preprocess them with the following command:
python pre_dataset.py dataset_directory
- Preprocesses all files in the given directory.
- Removes unnecessary lines and replaces class labels (positive class -> 1, negative class -> 0).
- Preprocessed data is saved in MNDO/Predataset/xxx.csv
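The preprocessing steps listed above can be sketched in plain Python, assuming the KEEL .dat layout in which header lines start with `@` and the class label (`positive` or `negative`, as in the KEEL imbalanced datasets) is the last comma-separated field. The names `preprocess_keel` and `to_csv` are hypothetical, and pre_dataset.py may handle other details (for example, which lines count as unnecessary) differently.

```python
import csv
import io

# Assumed KEEL label strings mapped to the README's 1/0 convention.
LABEL_MAP = {"positive": 1, "negative": 0}


def preprocess_keel(text):
    """Hypothetical helper: turn KEEL .dat text into rows with numeric labels."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and KEEL header lines (@relation, @attribute, @data, ...).
        if not line or line.startswith("@"):
            continue
        fields = [field.strip() for field in line.split(",")]
        # Replace the class label in the last column; features stay as strings.
        fields[-1] = LABEL_MAP[fields[-1].lower()]
        rows.append(fields)
    return rows


def to_csv(rows):
    """Serialize rows as CSV text with Unix line endings."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


sample = """@relation toy
@attribute a1 real [0.0, 1.0]
@attribute a2 real [0.0, 1.0]
@attribute Class {positive, negative}
@inputs a1, a2
@outputs Class
@data
0.1, 0.9, positive
0.4, 0.2, negative
"""
print(to_csv(preprocess_keel(sample)), end="")
# 0.1,0.9,1
# 0.4,0.2,0
```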
Over-sampling
Resampled (generated) data is stored in ./pos_data.
python over-sampling.py data_path
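Two bookkeeping pieces around this command can be sketched under assumptions the README does not state: that synthetic positives are generated until both classes have the same size, that labels are the integers 0 and 1 produced by preprocessing, and that each generated file keeps the input file's name inside ./pos_data. The function names are hypothetical, and over-sampling.py may choose the sample count or file layout differently.

```python
import csv
import os
from collections import Counter


def n_synthetic_needed(labels):
    """Hypothetical: number of synthetic positives (1) needed to match the negatives (0)."""
    counts = Counter(labels)
    return max(counts[0] - counts[1], 0)


def save_generated(samples, data_path, out_dir="pos_data"):
    """Hypothetical: write generated positive samples, labelled 1, to out_dir/<input file name>."""
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, os.path.basename(data_path))
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in samples:
            # Append the positive class label to each generated feature vector.
            writer.writerow(list(row) + [1])
    return out_path


labels = [0] * 90 + [1] * 10
print(n_synthetic_needed(labels))  # 80
```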
Training
python train.py data_path
train.py performs the following steps:
- Load data
- Over-sampling (MNDO, SMOTE, Borderline-SMOTE, ADASYN, SMOTE-ENN and SMOTE-Tomek Links)
- Scaling (Normalization or Standardization)
- Learning (SVM, Decision Tree and k-NN)
- Predict (results are saved in MNDO/output/xxx.csv)
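The two scaling options in the steps above can be sketched with NumPy: min-max normalization maps each feature to [0, 1] using the training range, and standardization rescales each feature to zero mean and unit variance. Fitting the statistics on the training data and reusing them on test data is an assumption here (a common practice); `fit_scaler` is a hypothetical name, and train.py may use a library scaler instead.

```python
import numpy as np


def fit_scaler(X_train, method="standardization"):
    """Hypothetical helper: learn scaling statistics from training data and
    return a function that applies them to any array with the same features."""
    X_train = np.asarray(X_train, dtype=float)
    if method == "normalization":
        # Min-max normalization: (x - min) / (max - min) per feature.
        shift = X_train.min(axis=0)
        scale = X_train.max(axis=0) - shift
    elif method == "standardization":
        # Z-score standardization: (x - mean) / std per feature.
        shift = X_train.mean(axis=0)
        scale = X_train.std(axis=0)
    else:
        raise ValueError(f"unknown method: {method}")
    # Constant features would divide by zero; leave them unscaled instead.
    scale = np.where(scale == 0, 1.0, scale)

    def transform(X):
        return (np.asarray(X, dtype=float) - shift) / scale

    return transform


X_train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
X_test = np.array([[2.0, 20.0]])
norm = fit_scaler(X_train, "normalization")
print(norm(X_test))  # [[0.25 0.25]]
```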
To train on all files, run this script:
./run.sh
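The contents of run.sh are not shown here. Assuming it simply calls train.py once per preprocessed CSV file in the Predataset directory, an equivalent Python loop could look like the sketch below; the directory default and function names are hypothetical, and a dry run prints the commands instead of executing them.

```python
import glob
import os
import subprocess
import sys


def train_commands(dataset_dir="Predataset"):
    """Hypothetical: build one `python train.py <csv>` command per CSV file."""
    paths = sorted(glob.glob(os.path.join(dataset_dir, "*.csv")))
    return [[sys.executable, "train.py", path] for path in paths]


def run_all(dataset_dir="Predataset", dry_run=True):
    for cmd in train_commands(dataset_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first file whose training fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```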
ToDo
Author
Kotaro Ambai (baibai25)