Sentiment analysis in PyTorch on the IMDb dataset.
This repository walks you through the process of building a complete sentiment analysis model that predicts the polarity of a given review (whether the expressed opinion is positive or negative). The model is trained on the popular IMDb movie reviews dataset.
The first notebook covers loading the raw dataset, feature extraction and analysis, text preprocessing, and preparation of the train/validation/test sets.
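As a rough sketch of this step (the folder layout matches the structure described below, but the column names, split ratio, and helper function are illustrative assumptions, not the notebook's exact code):

```python
import os
import pandas as pd
from sklearn.model_selection import train_test_split

def load_split(split_dir):
    """Read raw review files from dataset/<split>/{positive,negative} into a DataFrame."""
    rows = []
    for label in ("positive", "negative"):
        folder = os.path.join(split_dir, label)
        for name in os.listdir(folder):
            with open(os.path.join(folder, name), encoding="utf-8") as f:
                rows.append({"text": f.read(), "label": 1 if label == "positive" else 0})
    return pd.DataFrame(rows)

train_df = load_split("dataset/train")
test_df = load_split("dataset/test")

# Hold out part of the training data for validation (the 10% ratio is an assumption).
train_df, val_df = train_test_split(
    train_df, test_size=0.1, stratify=train_df["label"], random_state=42
)
```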
The second tutorial contains instructions on how to set up the vocabulary object, which is responsible for, among other things, enabling the use of pre-trained word vectors.
Furthermore, we will build the BatchIterator class responsible for iterating over the dataset in batches.
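A minimal sketch of what such a vocabulary and batch iterator might look like (the class names, methods, and padding scheme below are assumptions for illustration, not the tutorial's exact API):

```python
import torch
from collections import Counter

class Vocabulary:
    """Maps tokens to integer ids; index 0 is reserved for padding, 1 for unknown tokens."""
    def __init__(self, tokenized_texts, min_freq=2):
        counts = Counter(tok for text in tokenized_texts for tok in text)
        self.itos = ["<pad>", "<unk>"] + [t for t, c in counts.items() if c >= min_freq]
        self.stoi = {t: i for i, t in enumerate(self.itos)}

    def encode(self, tokens):
        return [self.stoi.get(t, 1) for t in tokens]

def batch_iterator(encoded_texts, labels, batch_size=64):
    """Yield padded LongTensor batches (padding id 0) together with their labels."""
    for start in range(0, len(encoded_texts), batch_size):
        chunk = encoded_texts[start:start + batch_size]
        max_len = max(len(seq) for seq in chunk)
        padded = [seq + [0] * (max_len - len(seq)) for seq in chunk]
        yield torch.tensor(padded), torch.tensor(labels[start:start + batch_size])
```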
In the third notebook, a bidirectional Gated Recurrent Unit (GRU) model will be built. The network will implement and combine the following architectures and techniques: bidirectional GRU, stacked (multi-layer) GRU, dropout/spatial dropout, max-pooling, and average-pooling. The hyperparameter fine-tuning process will also be presented. After choosing a suitable set of parameters, we will train the model and estimate its generalization error.
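A compact sketch of such an architecture (the hyperparameter values here are placeholders, not the tuned settings from the notebook):

```python
import torch
import torch.nn as nn

class BiGRUClassifier(nn.Module):
    """Stacked bidirectional GRU with dropout and concatenated max/avg pooling."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_layers=2, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden_dim, num_layers=num_layers,
                          bidirectional=True, batch_first=True, dropout=dropout)
        self.dropout = nn.Dropout(dropout)
        # max-pooled + avg-pooled features from both directions -> 4 * hidden_dim
        self.fc = nn.Linear(4 * hidden_dim, 2)

    def forward(self, x):                      # x: (batch, seq_len)
        emb = self.dropout(self.embedding(x))  # (batch, seq_len, embed_dim)
        out, _ = self.gru(emb)                 # (batch, seq_len, 2 * hidden_dim)
        max_pool, _ = out.max(dim=1)
        avg_pool = out.mean(dim=1)
        features = torch.cat([max_pool, avg_pool], dim=1)
        return self.fc(self.dropout(features))
```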
BiGRU with additional features
In this notebook, we will implement a bidirectional Gated Recurrent Unit model that additionally uses the features extracted in the first tutorial.
This notebook covers the implementation of the bidirectional Gated Recurrent Unit model, which uses pre-trained GloVe word embeddings together with additional features.
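One common way to plug pre-trained GloVe vectors into the embedding layer looks roughly like this (the file name, dimensions, and the assumption that the vocabulary exposes stoi/itos mappings, as in the sketch above, are all illustrative):

```python
import numpy as np
import torch
import torch.nn as nn

def load_glove_weights(vocab, glove_path="glove.6B.100d.txt", embed_dim=100):
    """Build an embedding matrix aligned with the vocabulary; words not in GloVe stay random."""
    matrix = np.random.normal(scale=0.1, size=(len(vocab.itos), embed_dim)).astype("float32")
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            if word in vocab.stoi:
                matrix[vocab.stoi[word]] = np.asarray(values, dtype="float32")
    return torch.from_numpy(matrix)

# Usage: replace the model's embedding layer with the pre-trained weights.
# embedding = nn.Embedding.from_pretrained(load_glove_weights(vocab), freeze=False, padding_idx=0)
```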
In this notebook, we will build the Convolutional Neural Network model for text classification.
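A minimal sketch of a TextCNN-style classifier (the filter sizes and counts are illustrative, not the notebook's settings):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """1D convolutions over word embeddings with several kernel sizes, then max-over-time pooling."""
    def __init__(self, vocab_size, embed_dim=100, num_filters=100, kernel_sizes=(3, 4, 5), dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes]
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), 2)

    def forward(self, x):                          # x: (batch, seq_len)
        emb = self.embedding(x).transpose(1, 2)    # (batch, embed_dim, seq_len)
        pooled = [F.relu(conv(emb)).max(dim=2).values for conv in self.convs]
        return self.fc(self.dropout(torch.cat(pooled, dim=1)))
```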
Transformer model for classification
Implementation of the Self-Attention Transformer model for the classification task.
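A rough sketch of a self-attention encoder classifier built from PyTorch's built-in Transformer modules (the layer counts, dimensions, and pooling choice are assumptions, not necessarily the notebook's implementation):

```python
import torch
import torch.nn as nn

class TransformerClassifier(nn.Module):
    """Embedding + learned positional encoding + TransformerEncoder + masked mean pooling."""
    def __init__(self, vocab_size, embed_dim=128, num_heads=4, num_layers=2, max_len=512, dropout=0.1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.pos_embedding = nn.Embedding(max_len, embed_dim)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.fc = nn.Linear(embed_dim, 2)

    def forward(self, x):                              # x: (batch, seq_len), 0 = padding
        positions = torch.arange(x.size(1), device=x.device).unsqueeze(0)
        h = self.embedding(x) + self.pos_embedding(positions)
        h = self.encoder(h, src_key_padding_mask=(x == 0))
        mask = (x != 0).unsqueeze(-1)                  # ignore padded positions in the mean
        pooled = (h * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
        return self.fc(pooled)
```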
The dataset is available at the following link:
http://ai.stanford.edu/~amaas/data/sentiment
Unpack the downloaded tar.gz file using:
tar -xzf aclImdb_v1.tar.gz
Rearrange the data to the following structure:
dataset
├── test
│   ├── positive
│   └── negative
└── train
    ├── positive
    └── negative
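A small helper along these lines could do the rearrangement (the unpacked folder name aclImdb with its pos/neg subfolders, and the use of shutil to move files, are assumptions about how you choose to do it):

```python
import os
import shutil

# Map the unpacked aclImdb layout (train/pos, train/neg, ...) onto the expected structure.
for split in ("train", "test"):
    for src_label, dst_label in (("pos", "positive"), ("neg", "negative")):
        src = os.path.join("aclImdb", split, src_label)
        dst = os.path.join("dataset", split, dst_label)
        os.makedirs(dst, exist_ok=True)
        for name in os.listdir(src):
            shutil.move(os.path.join(src, name), os.path.join(dst, name))
```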
Create a virtual environment (conda, virtualenv etc.).
conda create -n <env_name> python=3.7
Activate your environment.
conda activate <env_name>
Create a new kernel.
pip install ipykernel
python -m ipykernel install --user --name <env_name>
Go to the directory ~/.local/share/jupyter/kernels/<env_name> and make sure that the kernel.json file contains the path to your environment's Python interpreter (you can check it with the which python command).
{
  "argv": [
    "/home/user/anaconda3/envs/<env_name>/bin/python",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ],
  "display_name": "<env_name>",
  "language": "python"
}
Install requirements.
pip install -r requirements.txt
Restart your environment.
conda deactivate
conda activate <env_name>
Inside your virtual environment, launch Jupyter Notebook, open the notebook file (with the .ipynb extension), and then change the kernel to the one created in the preceding steps (Kernel → Change kernel → <env_name>).
| Model | Test accuracy | Validation accuracy | Training accuracy |
|---|---|---|---|
| BiGRU | 0.880 | 0.878 | 0.908 |
| BiGRU with extra features | 0.882 | 0.881 | 0.898 |
| BiGRU with GloVe vectors | 0.862 | 0.862 | 0.842 |
| TextCNN | 0.859 | 0.847 | 0.833 |
| Transformer | 0.883 | 0.880 | 0.912 |