[WIP] Inter-singer audio/song conversion using Deep Learning
If Justin Bieber sang Adele’s “Set Fire to the Rain”, how would it sound?
Wouldn’t it be interesting to know how songs would sound if sung by different singers and vocalists? This project is my attempt to answer that question.
I first create an audio dataset consisting of MP3s from the source singer and the target singer; for instance, Justin Bieber as the source and Adele as the target.
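As a rough sketch of this dataset step, the MP3s can be organized as one folder per singer. The directory layout and folder names below are hypothetical placeholders, not a fixed part of the project.

```python
from pathlib import Path

# Hypothetical layout: data/<singer>/*.mp3, one directory per singer.
DATA_ROOT = Path("data")
SOURCE_SINGER = "justin_bieber"  # assumed folder name
TARGET_SINGER = "adele"          # assumed folder name

def list_tracks(singer: str) -> list[Path]:
    """Collect all MP3 files for a given singer."""
    return sorted((DATA_ROOT / singer).glob("*.mp3"))

source_tracks = list_tracks(SOURCE_SINGER)
target_tracks = list_tracks(TARGET_SINGER)
print(f"{len(source_tracks)} source tracks, {len(target_tracks)} target tracks")
```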
I then transcribe these MP3 samples to MIDI, a symbolic format that many music-ML projects work with. This gives me raw numeric data (pitches, onsets, durations, velocities) that I can preprocess.
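One way to do this transcription (an assumption on my part, not a settled choice) is Spotify’s basic-pitch library, whose `predict` function returns a `pretty_midi` object; the file path below is a made-up example.

```python
from basic_pitch.inference import predict

# Transcribe one MP3; midi_data is a pretty_midi.PrettyMIDI object.
model_output, midi_data, note_events = predict("data/adele/set_fire_to_the_rain.mp3")

# Flatten the transcription into (pitch, start, end, velocity) tuples.
notes = [
    (note.pitch, note.start, note.end, note.velocity)
    for instrument in midi_data.instruments
    for note in instrument.notes
]
notes.sort(key=lambda n: n[1])  # order notes by onset time
print(f"Transcribed {len(notes)} notes")
```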
Transformers have been picking up attention (pun intended) in most practical areas beyond NLP; they have been used for Computer Vision and Audio Processing too. Riding the hype train, I’ll probably use the transformers library by Hugging Face to train on the MIDI sequences.
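As a minimal sketch of what that training could look like, here is a small causal language model over note tokens, assuming a naive tokenization where each MIDI pitch (0–127) is one token; the model size, sequence length, and tokenization are all placeholder assumptions.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

VOCAB_SIZE = 129  # 128 MIDI pitches + 1 pad token (illustrative tokenization)
config = GPT2Config(
    vocab_size=VOCAB_SIZE,
    n_positions=512,  # assumed max sequence length
    n_layer=6,        # deliberately small for a sketch
    n_head=8,
    n_embd=256,
)
model = GPT2LMHeadModel(config)

# Toy batch of pitch sequences; real inputs would come from the MIDI step above.
input_ids = torch.randint(0, 128, (4, 512))
outputs = model(input_ids=input_ids, labels=input_ids)
outputs.loss.backward()  # causal LM loss, i.e. next-note prediction
print(f"loss: {outputs.loss.item():.3f}")
```

Actual inter-singer conversion would need a conditional or sequence-to-sequence setup rather than plain next-note prediction, but the same library covers those variants too.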
This is essentially a novel sequence-generation task, so training will likely require heavy compute.