Text classification using CNN & LSTM
Run the model in command prompt using:
python trainModel.py
python predictModel.py
To divide the training data into train and evaluate files, run the script “splitData.py”.
a. Input file: train_test.csv
b. Output file: train.csv, evaluate.csv
Randomize the division of data: yes/no. If you answer yes (‘y’), enter any number greater than 0.
Training data fraction: any fraction between 0 and 1 (e.g., 0.8 divides the data into 80% training and 20% evaluation samples).
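The two prompts above can be illustrated with a minimal sketch. This is not the code in splitData.py. It assumes the number entered after answering ‘y’ acts as a random seed, and the function name split_rows and the sample rows are hypothetical.

```python
import random


def split_rows(rows, train_fraction, seed=None):
    """Return (train, evaluate) lists; shuffle first when a seed is given."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    rows = list(rows)
    if seed is not None:
        # Assumption: the number entered at the prompt is used as a seed.
        random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical sample rows: (ID, TEXT, CLASS)
rows = [(i, f"text {i}", "A" if i % 2 else "B") for i in range(10)]
train, evaluate = split_rows(rows, 0.8, seed=42)
print(len(train), len(evaluate))  # prints: 8 2
```

With seed=None the rows keep their original order, which corresponds to answering "no" at the randomize prompt.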
To inflate the data by creating duplicates, run the script “InflateAndSampleData.py”.
a. Input file: train.csv
b. Output file: train_samples.csv
c. Change the train_samples.csv file format to match the train data template, add an ID column, and save the file as train.csv
samples_count: number of training samples per class
REMOVE_EXTRA: remove the extra samples from any class that has more than samples_count samples
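As a rough illustration of how samples_count and REMOVE_EXTRA interact, the sketch below balances classes by duplicating rows. It is an assumption about the behaviour, not the actual contents of InflateAndSampleData.py. The function name, the (text, label) row shape, and the seed parameter are hypothetical.

```python
import random
from collections import Counter, defaultdict


def inflate_and_sample(rows, samples_count, remove_extra=True, seed=0):
    """Bring every class up to samples_count rows by duplicating existing rows.

    rows is a list of (text, label) pairs. When remove_extra is True, classes
    with more than samples_count rows are cut down to samples_count.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in rows:
        by_class[label].append((text, label))

    result = []
    for label, items in by_class.items():
        if len(items) < samples_count:
            # Duplicate randomly chosen rows until the class is large enough.
            extra = [rng.choice(items) for _ in range(samples_count - len(items))]
            items = items + extra
        elif remove_extra:
            # Keep a random subset of exactly samples_count rows.
            items = rng.sample(items, samples_count)
        result.extend(items)
    return result


rows = [("a1", "A"), ("a2", "A"), ("a3", "A"), ("b1", "B")]
balanced = inflate_and_sample(rows, samples_count=2)
counts = Counter(label for _, label in balanced)
print(counts["A"], counts["B"])  # prints: 2 2
```

With remove_extra=False, class A would keep all three of its rows while class B is still inflated to two.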
All model parameters can be set or changed in the “settings.json” file.
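The keys in settings.json are not listed here, so the following is only a sketch of one way a script could read such a file and fall back to defaults. The keys epochs and batch_size and the function load_settings are hypothetical, and a temporary file stands in for the real settings.json.

```python
import json
import os
import tempfile

# Hypothetical defaults; the real keys are whatever settings.json defines.
DEFAULTS = {"epochs": 10, "batch_size": 32}


def load_settings(path, defaults=DEFAULTS):
    """Read a JSON settings file and overlay its values on the defaults."""
    with open(path, encoding="utf-8") as handle:
        overrides = json.load(handle)
    settings = dict(defaults)
    settings.update(overrides)
    return settings


# Demo with a temporary file standing in for settings.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "settings.json")
    with open(demo_path, "w", encoding="utf-8") as handle:
        json.dump({"epochs": 25}, handle)
    settings = load_settings(demo_path)

print(settings)  # prints: {'epochs': 25, 'batch_size': 32}
```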
Run Script “trainModel.py”
a. Input file: train.csv
b. Output file: Model and data pickles to be used for predictions (Model & PickleJar folders)
More classes can be added for training by adding them to the “CLASS” column of the train.csv file. Then run the trainModel script to train the model with the updated class structure.
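Before retraining with new classes, it can help to confirm which labels the “CLASS” column actually contains. The sketch below counts rows per class using only the standard library. The TEXT column name, the sample rows, and the function class_counts are assumptions for illustration. An in-memory string stands in for train.csv.

```python
import csv
import io
from collections import Counter

# Stand-in for train.csv; the TEXT column name is an assumption.
sample_csv = """ID,TEXT,CLASS
1,great battery life,positive
2,screen cracked in a week,negative
3,arrived on time,neutral
4,works as described,positive
"""


def class_counts(csv_file, class_column="CLASS"):
    """Count rows per label in the given column of a CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[class_column] for row in reader)


counts = class_counts(io.StringIO(sample_csv))
print(sorted(counts.items()))
# prints: [('negative', 1), ('neutral', 1), ('positive', 2)]
```

To check a real file, pass an open file object instead, for example open("train.csv", newline="", encoding="utf-8").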
Run Script “predictModel.py”
a. Input file: test.csv
b. Default inputs: Trained Model & Pickled data (Model & PickleJar folder)
c. Output file: predictions.csv