Bangla Sentiment Classification
This is a compiled version of sentiment analysis datasets collected from several sources for benchmarking purposes, as reported in our work: Sentiment Classification in Bangla Textual Content: A Comparative Study. We provide data splits and benchmark results so that future work can be compared against them.
In the following directories we have data splits (train/dev/test) for different datasets.
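As a rough illustration of how the splits might be read, here is a minimal Python sketch. It assumes each dataset directory holds one tab-separated file per split named train.tsv, dev.tsv and test.tsv; those file names, the extension and the delimiter are assumptions made for this example and are not confirmed by this README, so adjust them to match the actual files. The function name load_splits is hypothetical.

```python
import csv
from pathlib import Path


def load_splits(dataset_dir, names=("train", "dev", "test"), ext=".tsv", delimiter="\t"):
    """Read each split file found in dataset_dir into a list of rows.

    The file naming (train.tsv, dev.tsv, test.tsv) and the tab delimiter
    are assumptions for illustration, not a documented layout.
    """
    splits = {}
    for name in names:
        path = Path(dataset_dir) / f"{name}{ext}"
        if not path.exists():
            continue  # skip any split that is not present on disk
        with path.open(encoding="utf-8", newline="") as handle:
            # QUOTE_NONE keeps quotation marks inside the text untouched
            reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
            splits[name] = list(reader)
    return splits
```

Calling load_splits on a dataset directory returns a dictionary keyed by split name, with each value holding the rows of that file as lists of strings. Opening the files as UTF-8 keeps the Bangla text intact.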
For the lexical analysis, we computed a valence score for each dataset. These analyses are provided in the lexical_analysis/ directory.
This is a compilation of several datasets. We are releasing it under CC BY-NC-SA 2.0 (https://creativecommons.org/licenses/by-nc-sa/2.0/).
However, for each individual data source, please check the licence in the corresponding paper or at the source location.
Please cite the following papers if you are using the data:
@article{alam2021review,
title={A Review of Bangla Natural Language Processing Tasks and the Utility of Transformer Models},
author={Alam, Firoj and Hasan, Md Arid and Alam, Tanvir and Khan, Akib and Tajrin, Jannatul and Khan, Naira and Chowdhury, Shammur Absar},
journal={arXiv preprint arXiv:2107.03844},
year={2021}
}
@inproceedings{iccit2020Arid,
Author = {Md. Arid Hasan and Jannatul Tajrin and Shammur Absar Chowdhury and Firoj Alam},
Booktitle = {23rd International Conference on Computer and Information Technology (ICCIT)},
Month = {December},
Title = {Sentiment Classification in Bangla Textual Content: A Comparative Study},
Year = {2020}
}
@inproceedings{patra2015shared,
title={Shared task on sentiment analysis in indian languages (sail) tweets-an overview},
author={Patra, Braja Gopal and Das, Dipankar and Das, Amitava and Prasath, Rajendra},
booktitle={Proc. of MIKE},
pages={650--655},
year={2015},
organization={Springer}
}
@article{rahman2018datasets,
title={Datasets for aspect-based sentiment analysis in bangla and its baseline evaluation},
author={Rahman, Md and Kumar Dey, Emon and others},
journal={Data},
volume={3},
number={2},
pages={15},
year={2018},
publisher={Multidisciplinary Digital Publishing Institute}
}
@article{rezaul2020classification,
title={Classification Benchmarks for Under-resourced {Bengali} Language based on Multichannel Convolutional-LSTM Network},
author={Rezaul Karim, Md and Raja Chakravarthi, Bharathi and Arcan, Mihael and McCrae, John P and Cochez, Michael},
journal={arXiv},
pages={arXiv--2004},
year={2020}
}
@inproceedings{tripto2018detecting,
title={Detecting Multilabel Sentiment and Emotions from Bangla YouTube Comments},
author={Tripto, Nafis Irtiza and Ali, Mohammed Eunus},
booktitle={Proc. of ICBSLP},
pages={1--6},
year={2018},
organization={IEEE}
}
For questions, please write to banglanlp@gmail.com.