Subspace multinomial model for learning document representations
Requirements: python3.6, pytorch, numpy, scipy, scikit-learn
python TwentyNewsDataset.py
scipy.sparse matrix of shape n_words x n_docs
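As an illustration of the n_words x n_docs layout, here is a minimal sketch that builds a word-by-document count matrix with scipy.sparse from three toy documents. The toy documents, the whitespace tokenizer, and the variable names are assumptions made for this example; this is not the code in TwentyNewsDataset.py.

```python
from collections import Counter

from scipy.sparse import coo_matrix

# Toy corpus (hypothetical, for illustration only).
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

# Naive whitespace tokenization; a real pipeline would do more cleaning.
tokenized = [doc.split() for doc in docs]

# Map each distinct word to a row index.
vocab = {w: i for i, w in enumerate(sorted({w for doc in tokenized for w in doc}))}

rows, cols, vals = [], [], []
for doc_id, tokens in enumerate(tokenized):
    for word, count in Counter(tokens).items():
        rows.append(vocab[word])  # row = word index
        cols.append(doc_id)       # column = document index
        vals.append(count)        # value = term count

# Rows are words and columns are documents: shape is n_words x n_docs.
counts = coo_matrix((vals, (rows, cols)), shape=(len(vocab), len(docs))).tocsc()
n_words, n_docs = counts.shape
print(n_words, n_docs)
```

Each column holds the word counts of one document, so a single document is retrieved with counts[:, doc_id].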
python run_smm_20news.py train -o exp/ -trn 100 -lw 1e-04 -rt l1 -lt 1e-4 -k 100
The trained model is saved as exp/lw_1e-04_l1_1e-04_100/model_T100.pt
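The experiment directory name appears to encode the training options. The sketch below reproduces that pattern under this assumption; the naming scheme is inferred from the example paths in this README, not taken from run_smm_20news.py, and the helper names experiment_dir and model_file are hypothetical.

```python
def experiment_dir(out_dir, lw, rt, lt, k):
    """Compose an experiment directory path from the training options.

    Assumed pattern (inferred from the example paths):
    <out_dir>/lw_<lw>_<rt>_<lt>_<k>
    """
    name = f"lw_{lw:.0e}_{rt}_{lt:.0e}_{k}"
    return f"{out_dir.rstrip('/')}/{name}"


def model_file(exp_dir, trn):
    """Assumed model file name: model_T<training iterations>.pt"""
    return f"{exp_dir}/model_T{trn}.pt"


# Options from the training command: -o exp/ -trn 100 -lw 1e-04 -rt l1 -lt 1e-4 -k 100
exp_dir = experiment_dir("exp/", lw=1e-4, rt="l1", lt=1e-4, k=100)
model_path = model_file(exp_dir, trn=100)
print(model_path)
```

Under this assumption, changing any of -lw, -rt, -lt or -k yields a separate directory, which would explain why --ovr is needed to reuse an existing one.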
phase: train or extract
-lw: l2 regularization constant for i-vectors
-rt: type of regularization for bases (l1 or l2)
-lt: regularization constant for bases
-k: i-vector dimension
-o: path to output directory
-trn: training iterations
--ovr: overwrite existing experiment directory

python run_smm_20news.py extract -m exp/lw_1e-04_l1_1e-04_100/model_T100.pt -xtr 30 --nth 2
The document i-vectors are saved in exp/lw_1e-04_l1_1e-04_100/ivecs/
-xtr: extraction iterations
--nth: save every n-th i-vector during extraction

python train_and_clf.py exp/lw_1e-04_l1_1e-04_100/train_model_T100_e30.npy
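The e30 suffix in the file name above seems to match the 30 extraction iterations requested with -xtr 30. A small sketch of which iterations would be kept with --nth 2, assuming that "every n-th" means every iteration whose 1-based index is a multiple of n (this reading is an assumption, and saved_iterations is a hypothetical helper, not part of the repository):

```python
def saved_iterations(xtr, nth):
    """Return the 1-based extraction iterations that would be saved,
    assuming a snapshot is kept whenever the iteration index is a multiple of nth."""
    return [it for it in range(1, xtr + 1) if it % nth == 0]


# Values from the extract command: -xtr 30 --nth 2
snapshots = saved_iterations(xtr=30, nth=2)
print(len(snapshots), snapshots[:3], snapshots[-1])
```

Under this reading, the final iteration is among the saved ones whenever xtr is a multiple of nth.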
To run on a specific GPU, use CUDA_VISIBLE_DEVICES=<device_id> followed by python run_smm_20news.py