Project author: terranivium

Project description:
Speech emotion recognition with PyTorch
Primary language: Jupyter Notebook
Repository: git://github.com/terranivium/speech-emotion-recognition.git
Created: 2020-06-21T19:22:17Z
Project page: https://github.com/terranivium/speech-emotion-recognition

License:



MSc Software Development (2019-2020)

University of Glasgow

Source code for the research paper ‘Deep learning for robust dimensional characterisation of affect in speech’.

Paper abstract:

  * "This paper seeks to evaluate machine learning methodology
    in the task of speech emotion recognition (SER) by utilising a
    signal processing approach. This is done with aim of assessing
    user behaviour during the use of online voice communication."
  * "We use dimensional models of emotion, put forward by empirical
    research to gain nuanced information about sampled emotion data,
    disseminating greater insight into user voice communication activity."
  * "This paper focuses on the extraction and use of Mel-frequency
    cepstral coefficients (MFCC) as feature vectors for feed
    forward neural network architectures."
  * "Experiments conducted show evidence that the methodology
    proposed in this paper is partially effective on unseen,
    dissimilar in structure real-world data, which has proven
    to be a hurdle to deployment of solutions in the area of
    automatic speech recognition (ASR) and SER. These findings
    provide a framework to enable more precise, automated user handling."
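The abstract names MFCCs as the input features for the feed-forward network. The notebook's exact extraction settings are not stated in this README, so the following is a minimal NumPy sketch of the standard MFCC pipeline (pre-emphasis, Hamming-windowed frames, power spectrum, triangular mel filterbank, log, orthonormal DCT-II) under assumed parameters: 16 kHz audio, 25 ms frames with a 10 ms hop, 40 mel bands and 13 coefficients. The helper `mfcc_vector`, which averages over time to get one fixed-length vector per clip, is likewise a hypothetical choice; in practice a library routine such as `librosa.feature.mfcc` is commonly used instead of hand-rolled code.

```python
import numpy as np


def hz_to_mel(hz):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters mapping an rfft power spectrum onto n_mels mel bands."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):  # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):  # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    return fbank


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] *= np.sqrt(0.5)
    return x @ basis.T


def mfcc(signal, sr=16000, n_mfcc=13, n_mels=40, frame_len=400, hop=160, n_fft=512):
    """Return a (n_frames, n_mfcc) matrix of MFCCs for a mono float signal."""
    # Pre-emphasis boosts high frequencies before framing
    emphasised = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    if len(emphasised) < frame_len:  # guarantee at least one full frame
        emphasised = np.pad(emphasised, (0, frame_len - len(emphasised)))
    n_frames = 1 + (len(emphasised) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasised[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    log_mel = np.log(mel_energy + 1e-10)  # small floor avoids log(0)
    return dct_ii(log_mel, n_mfcc)


def mfcc_vector(signal, **kwargs):
    """Hypothetical fixed-length feature: mean of each coefficient over time."""
    return mfcc(signal, **kwargs).mean(axis=0)
```

With these defaults, one second of 16 kHz audio yields 98 frames of 13 coefficients; averaging over frames trades away temporal detail in exchange for an input size that a plain MLP can accept.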

Files:

  * 'speech_emotion_recognition.ipynb' - main notebook
    - PyTorch (MLP) model
  * 'augment_data.ipynb' - data augmentation batch scripts
    - white noise
    - simulated chatter, background noise
    - overdrive
    - reverb
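The README lists the augmentation effects but not how they are implemented. One hedged way to approximate each of them with NumPy alone is sketched below. The SNR-based mixing, the tanh soft-clipping curve used for overdrive and the synthetic exponentially decaying impulse response used for reverb are all assumptions, not the notebook's method; `background` stands for any recorded chatter or ambience clip you supply.

```python
import numpy as np


def _scale_to_snr(signal, noise, snr_db):
    """Rescale `noise` so that 10*log10(P_signal / P_noise) equals snr_db."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    target_power = p_signal / (10.0 ** (snr_db / 10.0))
    return noise * np.sqrt(target_power / (p_noise + 1e-12))


def add_white_noise(signal, snr_db, rng):
    """Add Gaussian white noise at the requested signal-to-noise ratio."""
    noise = rng.standard_normal(len(signal))
    return signal + _scale_to_snr(signal, noise, snr_db)


def add_background(signal, background, snr_db):
    """Mix in a background clip (e.g. recorded chatter), looped or cropped to fit."""
    reps = int(np.ceil(len(signal) / len(background)))
    bg = np.tile(background, reps)[: len(signal)]
    return signal + _scale_to_snr(signal, bg, snr_db)


def overdrive(signal, gain=5.0):
    """tanh soft clipping; dividing by tanh(gain) maps an input of 1.0 to 1.0."""
    return np.tanh(gain * signal) / np.tanh(gain)


def add_reverb(signal, sr, rt60=0.4, wet=0.3, rng=None):
    """Convolve with a synthetic impulse response that decays 60 dB over rt60 seconds."""
    if rng is None:
        rng = np.random.default_rng(0)
    t = np.arange(int(rt60 * sr)) / sr
    # Envelope exp(-ln(1000) * t / rt60) reaches 1/1000 (-60 dB) at t = rt60
    ir = rng.standard_normal(len(t)) * np.exp(-np.log(1000.0) * t / rt60)
    ir /= np.sqrt(np.sum(ir ** 2))  # normalise to unit energy
    wet_signal = np.convolve(signal, ir)[: len(signal)]  # keep original length
    return (1.0 - wet) * signal + wet * wet_signal
```

Specifying noise by SNR rather than by absolute amplitude keeps the degradation comparable across quiet and loud recordings, which matters when clips from several corpora are mixed.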

Datasets used to train, validate and test model:

  + RAVDESS
  + CREMA-D
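Both training corpora encode the categorical emotion in their file names: RAVDESS as the third of seven dash-separated numeric fields (e.g. `03-01-06-01-02-01-12.wav`, where `06` is fearful), CREMA-D as the third underscore-separated field (e.g. `1001_DFA_ANG_XX.wav`). The sketch below extracts a shared label vocabulary and then places each label on a coarse valence/arousal grid to illustrate a dimensional view of emotion. The `DIMENSIONS` table is a hypothetical, circumplex-style assignment chosen for illustration only; it is not the mapping used in the paper.

```python
from pathlib import Path

# RAVDESS: third dash-separated field is the emotion code
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

# CREMA-D: ActorID_Sentence_Emotion_Level, e.g. 1001_DFA_ANG_XX
CREMA_EMOTIONS = {
    "NEU": "neutral", "HAP": "happy", "SAD": "sad",
    "ANG": "angry", "FEA": "fearful", "DIS": "disgust",
}

# Hypothetical coarse (valence, arousal) placement, each in {-1, 0, +1};
# illustrative only, NOT taken from the paper
DIMENSIONS = {
    "neutral": (0, 0), "calm": (1, -1), "happy": (1, 1), "sad": (-1, -1),
    "angry": (-1, 1), "fearful": (-1, 1), "disgust": (-1, 0), "surprised": (0, 1),
}


def label_from_path(path):
    """Return the categorical emotion encoded in a RAVDESS or CREMA-D file name."""
    name = Path(path).stem
    if "-" in name:  # RAVDESS names are dash-separated
        return RAVDESS_EMOTIONS[name.split("-")[2]]
    return CREMA_EMOTIONS[name.split("_")[2]]


def dimensional_label(path):
    """Map a file to its hypothetical (valence, arousal) pair."""
    return DIMENSIONS[label_from_path(path)]
```

Only RAVDESS has `calm` and `surprised`, so a model trained on both corpora either has to tolerate classes that are missing from CREMA-D or drop them.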

Dataset used only to emulate in-the-wild test performance:

  + TESS
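TESS file names end with the emotion token (for example `OAF_back_angry.wav`, with `ps` standing for pleasant surprise). To score TESS as unseen data, its labels must be expressed in the training label vocabulary. The sketch below does that with a hypothetical mapping (`fear` to `fearful`, `ps` to `surprised`) and keeps only the clips whose label the model was actually trained on; both choices are assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical mapping from TESS tokens onto the training label vocabulary
TESS_EMOTIONS = {
    "angry": "angry", "disgust": "disgust", "fear": "fearful",
    "happy": "happy", "neutral": "neutral", "ps": "surprised", "sad": "sad",
}


def tess_label(path):
    """Return the emotion encoded in the last underscore-separated field."""
    token = Path(path).stem.split("_")[-1].lower()
    return TESS_EMOTIONS[token]


def held_out_pairs(paths, trained_labels):
    """Pair each TESS clip with its label, dropping labels the model never saw."""
    pairs = []
    for p in paths:
        label = tess_label(p)
        if label in trained_labels:
            pairs.append((p, label))
    return pairs
```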

Usage guide:

  + To simply generate results and plots, skip to and run the cells
    in the 'Testing' section of 'speech_emotion_recognition.ipynb';
    these load the provided 'best model' state.
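The testing cells rely on a saved 'best model' state. The README does not give the architecture, checkpoint file name or feature size, so the following PyTorch sketch assumes a hypothetical two-hidden-layer MLP (`EmotionMLP`) over a 40-dimensional feature vector with 8 output classes, saved as a `state_dict`. A state dict can only be loaded into a module rebuilt with the same layer sizes it was saved with, and `model.eval()` matters because it disables dropout at test time.

```python
import torch
from torch import nn


class EmotionMLP(nn.Module):
    """Hypothetical feed-forward classifier over a fixed-length feature vector."""

    def __init__(self, n_features=40, n_hidden=256, n_classes=8, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, n_hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(n_hidden, n_classes),
        )

    def forward(self, x):
        # Raw logits; nn.CrossEntropyLoss applies log-softmax during training
        return self.net(x)


def load_best_model(path, **kwargs):
    """Rebuild the architecture, load saved weights and switch to eval mode."""
    model = EmotionMLP(**kwargs)
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state)
    model.eval()  # disable dropout for deterministic testing
    return model


@torch.no_grad()
def predict(model, features):
    """Return the index of the highest-scoring class for each row of features."""
    return model(features).argmax(dim=1)
```

A checkpoint written with `torch.save(model.state_dict(), path)` during training can then be restored with `load_best_model(path)`; the path itself is a placeholder.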