Project author: alexloser

Project description:
Small tools for csv file processing (onehot encoding, format checking and converting to libsvm).
Language: C
Repository: git://github.com/alexloser/xsvt.git
Created: 2019-08-13T14:29:51Z
Project page: https://github.com/alexloser/xsvt

License: Other

XSVT

Small tools for csv file processing (onehot encoding, format checking and converting to libsvm).

XSV means the delimiter can be a comma (",", csv), a tab ("\t", tsv) or a space (" ").
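
As a rough illustration of what a configurable delimiter means in practice, the sketch below counts the fields of one line for a given delimiter and compares the widths of several lines. The helper names count_fields and same_width are hypothetical and are not taken from the xsvt sources. Quoted fields are not handled, and the real checker may test more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Count the fields in one line of delimiter-separated text.
 * Hypothetical helper, not part of xsvt; quoting is NOT handled. */
size_t count_fields(const char *line, char delim)
{
    assert(line != NULL);
    size_t fields = 1;  /* even an empty line has one (empty) field */
    for (const char *p = line; *p != '\0' && *p != '\n' && *p != '\r'; ++p) {
        if (*p == delim)
            ++fields;
    }
    return fields;
}

/* Return 1 when each of the n lines has as many fields as the first one. */
int same_width(const char *const *lines, size_t n, char delim)
{
    assert(lines != NULL);
    for (size_t i = 1; i < n; ++i) {
        if (count_fields(lines[i], delim) != count_fields(lines[0], delim))
            return 0;
    }
    return 1;
}
```

Passing ',' , '\t' or ' ' as delim covers the three XSV variants with the same code.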

Contains three parts:

  • Format checker
  • Onehot encoder
  • Converter to and from libsvm format

In fact, the core function is onehot encoding for machine-learning.

Dependency:

No dependencies; pure C code.

Example:

Before onehot encoding:

[Image: csv sample before onehot encoding]

After onehot encoding:

[Image: csv sample after onehot encoding]

Notice:

  • Onehot only processes non-numeric values.
  • The encoded matrix (csv) may be very large, because each encoded value expands into many 0/1 values!
    It is in fact a sparse matrix, so the libsvm format is another choice.
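
The two notes above can be sketched in C. This is a minimal illustration, not the xsvt implementation: is_numeric, collect_categories, onehot and MAX_CATEGORIES are hypothetical names, and "numeric" is assumed here to mean "the whole string is accepted by strtod", which may differ from the rule the tool applies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CATEGORIES 64  /* arbitrary limit for this sketch */

/* Return 1 if the whole string is accepted by strtod, 0 otherwise. */
int is_numeric(const char *s)
{
    assert(s != NULL);
    char *end = NULL;
    (void)strtod(s, &end);
    return end != s && *end == '\0';
}

/* Collect the distinct values of a column in first-seen order.
 * Returns the number of categories stored in cats[]. */
size_t collect_categories(const char *const *column, size_t rows,
                          const char *cats[MAX_CATEGORIES])
{
    assert(column != NULL && cats != NULL);
    size_t n = 0;
    for (size_t r = 0; r < rows; ++r) {
        size_t k = 0;
        while (k < n && strcmp(cats[k], column[r]) != 0)
            ++k;
        if (k == n && n < MAX_CATEGORIES)
            cats[n++] = column[r];  /* first time this value is seen */
    }
    return n;
}

/* Write the 0/1 indicator vector of value into out[0..ncats-1]. */
void onehot(const char *value, const char *const *cats, size_t ncats, int *out)
{
    assert(value != NULL && cats != NULL && out != NULL);
    for (size_t k = 0; k < ncats; ++k)
        out[k] = strcmp(value, cats[k]) == 0;
}
```

For a column holding red, green, red, blue, collect_categories finds three categories, so one text column becomes three 0/1 columns. A column with many distinct values grows accordingly, which is why the encoded matrix gets large and sparse.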

Build and Test:

make clean
make -j2
cd test
./test-checker.sh
./test-onehot.sh
./test-transfer.sh

Usage:

  • Format checker

    Usage: ./xsvt.checker [OPTIONS]
    -i : input csv file to check.
    -d : delimiter of csv file, default is comma.
    -h : show this help.
  • OneHot encoder

    Usage: ./xsvt.onehot [OPTIONS]
    -h : Show this help
    -i : Input filename(.csv) to encode
    -o : Output filename(.csv) to save results
    -d : Delimiter of csv file
    -k : Has csv header at first line, 1 or 0, default is 1(has)
    -w : Write header in output file, 1 or 0, default is 1(write)
    -c : Check csv format, only check, no encoding work!
    Example:
    ./xsvt.onehot -i input.csv -c
    ./xsvt.onehot -i input.csv -o out.csv -d "," -w 1 -k 1
    ./xsvt.onehot -i input.csv -o out.csv -w 1 -k 0
    Notice:
    Using -c option to check the format before encoding is a good idea!!!
  • Libsvm transfer

    Usage: ./xsvt.transfer [OPTIONS]
    -h : Show this help.
    -i : Input filename(csv or svm) to convert.
    -o : Output filename to save csv or svm format.
    -l : label filename for read(when csv=>svm) or write(when svm=>csv).
    -d : Delimiter of csv file, default is comma.
    -k : Has csv header at first line, 1 or 0, default is 1(has).
    -s2c : SVM => CSV.
    -c2s : CSV => SVM.
    Example: convert in.csv to out.svm
    ./xsvt.transfer -c2s -i in.csv -l in_label.csv -o out.svm -d "," -k 1
    Example: convert in.svm to out.csv
    ./xsvt.transfer -s2c -i in.svm -o out.csv -l out_label.csv
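
To show what the csv => svm direction produces, here is a small sketch that formats one dense row as a libsvm line of the form "<label> <index>:<value> ...", with 1-based indices and zero values omitted. The function name row_to_libsvm is hypothetical and the "%g" number formatting is an assumption; the output of xsvt.transfer may be formatted differently.

```c
#include <assert.h>
#include <stdio.h>

/* Format one dense row as a libsvm line: "<label> <index>:<value> ...".
 * Indices are 1-based and zero-valued features are skipped.
 * Returns the number of characters written, or -1 if buf is too small. */
int row_to_libsvm(double label, const double *x, size_t n, char *buf, size_t cap)
{
    assert(x != NULL && buf != NULL);
    int len = snprintf(buf, cap, "%g", label);
    if (len < 0 || (size_t)len >= cap)
        return -1;
    for (size_t i = 0; i < n; ++i) {
        if (x[i] == 0.0)
            continue;  /* sparse format: zeros are not stored */
        int w = snprintf(buf + len, cap - (size_t)len, " %zu:%g", i + 1, x[i]);
        if (w < 0 || (size_t)w >= cap - (size_t)len)
            return -1;
        len += w;
    }
    return len;
}
```

Because zeros are skipped, a onehot-encoded row with a few ones among many zeros shrinks to a few index:value pairs.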
However, it’s a toy …