Identifying which celebrity is speaking through deep learning
This project identifies which celebrity is speaking using deep neural networks. One application is letting machines match voice commands to individual speakers, so that in the future they can anticipate each person's commands.
1) Collected data from VoxCeleb[1], a database of celebrities’ voices and images.
2) Extracted audio features, referencing Aaqib Saeed’s code.
Miranda Cosgrove’s Mel | Smokey Robinson’s Mel |
---|---|
![]() | ![]() |
Miranda Cosgrove’s Tonnetz | Smokey Robinson’s Tonnetz |
---|---|
![]() | ![]() |
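To give a sense of what the Mel axis in the spectrograms above represents, here is a minimal sketch of the mel scale in plain Python. It is not the feature-extraction code used in this project. The function names are hypothetical, and the formula is the common HTK-style variant, which may differ from the one used by the referenced code.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands: int, f_min: float, f_max: float) -> List[float]:
    """Center frequencies (in Hz) of n_bands bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


# Bands are evenly spaced in mels, so they sit close together at low
# frequencies and spread out at high frequencies.
centers = mel_band_centers(n_bands=8, f_min=0.0, f_max=8000.0)
print([round(c) for c in centers])
```

A mel spectrogram pools the energy of an ordinary spectrogram into bands like these, which gives more resolution to the low frequencies where much of the speech information lies.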
The best model was an 18-layer CNN using the SELU activation function, which yielded an F1-score of 0.73.
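For reference, the two ingredients named above can be sketched in a few lines of plain Python: the SELU activation and an F1-score over several classes. This is an illustration only. The source does not say how the F1-score was averaged across celebrities, so macro averaging is an assumption here, and the labels below are made up.

```python
import math
from typing import List, Sequence

# Standard SELU constants (Klambauer et al., 2017).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """Scaled exponential linear unit applied to a single value."""
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical speaker labels, not results from this project.
true_speakers = ["a", "a", "b", "b", "c", "c"]
pred_speakers = ["a", "b", "b", "b", "c", "a"]
print(round(macro_f1(true_speakers, pred_speakers), 3))
```

Macro averaging weights every speaker equally regardless of how many clips each one has. A weighted or micro average would give a different number on imbalanced data.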
Error and Validation Plots:
ROC Curve for all celebrities:
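A ROC curve for a multi-class problem is usually drawn one speaker at a time, treating that speaker as the positive class and everyone else as negative. The sketch below shows that one-vs-rest computation for a single speaker in plain Python. The scores and labels are invented for illustration, and the project's own plots may have been produced differently.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], is_positive: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points from sweeping a threshold over the scores.

    Assumes at least one positive and one negative example.
    """
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, positive) in enumerate(ranked):
        if positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical model scores for "is this clip speaker A?" on six clips.
scores_a = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
is_speaker_a = [True, True, False, True, False, False]
curve = roc_points(scores_a, is_speaker_a)
print(curve)
print(round(auc(curve), 3))
```

Repeating this for each celebrity gives one curve per speaker, which can then be overlaid on a single plot.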
I challenged my audience to guess the celebrity from an audio clip I played, then ran my model to identify the speaker as well. My model won, with 3 more correct guesses than the audience.
Citation:
[1] A. Nagrani, J. S. Chung, and A. Zisserman. VoxCeleb: a large-scale speaker identification dataset. INTERSPEECH, 2017.