This application is a GUI version of the vocal remover AI created and posted by tsurumeso. This would not have been possible without tsurumeso's hard work and dedication! You can find tsurumeso's original command-line version [here](https://github.com/tsurumeso/vocal-remover).
A very special thanks to the main code contributor [DilanBoskan](https://github.com/DilanBoskan)! DilanBoskan, thank you for all of your support and hard work in helping bring this project to life!
The application was made with Tkinter for cross-platform compatibility, so it should work on Windows, Mac, and Linux systems. I've only personally tested it on Windows 10 & Linux Ubuntu.
### Install Required Applications & Packages
1. Download & install Python 3.7 [here](https://www.python.org/ftp/python/3.6.8/python-3.6.8-amd64.exe) (Make sure to check the box that says "Add Python 3.7 to PATH" if you're on Windows)
- **Choose Main Model** - Here is where you choose the main model to convert your tracks with.
- **Choose Stacked Model** - These models are meant to clean up vocal residue left over in the form of vocal pinches and static. The stacked models provided are only meant for instrumental outputs generated by running a track through one of the main models. Selecting the *'Stack Passes'* option will enable you to select a stacked model to run with the main model. If you wish to only run a stacked model on a track, make sure the *'Stack Conversion Only'* option is checked.
- The v2 & v4 AI engines use different sets of models. The available models for each engine will automatically populate within the model selection dropdowns based on which engine was selected.
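The paragraph that follows explains that released models carry their training values in the filename (for example `MGM-HIGHEND_sr44100_hl512_w512_nf2048.pth`), which is why renaming a model breaks it. As a rough illustration of how such a suffix can be read mechanically, here is a minimal sketch. `parse_model_params` and `ModelParams` are hypothetical names, not the application's actual code, and reading `w` as an inference window size in frames is an assumption; this README does not define it.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional


class ModelParams(NamedTuple):
    sr: int           # sample rate in Hz
    hop_length: int   # STFT hop length in samples
    window_size: int  # the "w" value (assumed: inference window size in frames)
    n_fft: int        # FFT size in samples


# Matches a suffix such as "_sr44100_hl512_w512_nf2048" at the end of the file stem.
_PARAM_SUFFIX = re.compile(r"_sr(\d+)_hl(\d+)_w(\d+)_nf(\d+)$")


def parse_model_params(model_path: str) -> Optional[ModelParams]:
    """Return the training values encoded in a model filename, or None if there are none."""
    stem = Path(model_path).stem  # filename without the ".pth" extension
    match = _PARAM_SUFFIX.search(stem)
    if match is None:
        # No suffix: the GUI would leave the value fields editable with defaults.
        return None
    return ModelParams(*(int(value) for value in match.groups()))


if __name__ == "__main__":
    params = parse_model_params("MGM-HIGHEND_sr44100_hl512_w512_nf2048.pth")
    print(params)  # ModelParams(sr=44100, hop_length=512, window_size=512, n_fft=2048)
    # Derived quantities: frequency bins per STFT frame and hop duration.
    print(params.n_fft // 2 + 1)                           # 1025
    print(round(1000 * params.hop_length / params.sr, 1))  # 11.6 (milliseconds)
```

Because the regex is anchored to the end of the stem, a renamed file (or one with extra text after the suffix) falls back to `None`, mirroring the "editable fields with defaults" behavior described below.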
All models released by me will have the values they were trained with appended to the end of the filename, like so: "MGM-HIGHEND_sr44100_hl512_w512_nf2048.pth". The "sr44100_hl512_w512_nf2048" portion automatically sets those values within the application, so please do not change the model file names. If there are no values appended to the end of a model's filename, the value fields will be editable and auto-populate with default values. The default values are:
- **GPU Conversion** - This option ensures the GPU is used for conversions. It will not work if you don't have a CUDA-compatible GPU (CUDA is only supported on NVIDIA GPUs).
- **Post-process** - This option can potentially identify leftover instrumental artifacts in the vocal outputs, which may improve the separation on some songs. I recommend only using it if conversions don't come out well.
- **TTA** - This option performs Test-Time Augmentation to improve the separation quality. However, enabling it will lengthen the time it takes to complete a conversion. *Please note, this option is NOT compatible with the v2 AI engine.*
- **Output Image** - This option will include a spectrogram of the instrumental & vocal track outputs.
- **Stack Passes** - This option allows you to set the number of times you would like a track to run through a stacked model.
- **Stack Conversion Only** - Selecting this will allow you to bypass the main model and run a track through a stacked model only.
- **Save All Stacked Outputs** - Having this selected during stacked conversions will auto-generate a new directory, named after the track, in your *'Save to'* path. The new directory will contain all of the outputs generated by the whole conversion process. The number of outputs will depend on how many stack passes you chose.
- **Model Test Mode** - This option is meant to make it easier for users to test the results of different models without having to manually create new folders and/or change the filenames. When it's selected, the application will auto-generate a new folder with the name of the selected model(s) in the *'Save to'* path you have chosen. The instrumental & vocal outputs will have the selected model name(s) appended to them and will be saved to the auto-generated directory.
- **Add New Model** - This button will automatically take you to the models folder. If you are adding a new model, make sure to add it accordingly based on the AI engine it was trained on! All new models added will automatically be detected without having to restart the application.
- **Restart Button** - If the application hangs for any reason, you can restart it by clicking the circular arrow button immediately to the right of the *'Start Conversion'* button.
- The ability to drag & drop files to convert has also been added.
- Conversion times will greatly depend on your hardware. This application will NOT be friendly to older or budget hardware. Please proceed with caution! Pay attention to your PC and make sure it doesn't overheat.
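To make the interplay of *'Stack Passes'*, *'Stack Conversion Only'*, and *'Save All Stacked Outputs'* concrete, here is a minimal sketch of the flow those options describe. It is not the application's code: models are modeled as hypothetical callables that take an audio file and return the path of its instrumental output, and `run_stacked_conversion` / `outputs_to_save` are illustrative names.

```python
from pathlib import Path
from typing import Callable, List, Optional

# Hypothetical stand-in for a model: takes an audio file, returns its instrumental output.
Model = Callable[[Path], Path]


def run_stacked_conversion(
    track: Path,
    main_model: Optional[Model],
    stacked_model: Model,
    stack_passes: int,
) -> List[Path]:
    """Run the main model (unless skipped), then the stacked model `stack_passes` times.

    Returns every instrumental output in order; the last entry is the final result.
    """
    if stack_passes < 1:
        raise ValueError("stack_passes must be at least 1")
    outputs: List[Path] = []
    current = track
    if main_model is not None:  # main_model=None plays the role of 'Stack Conversion Only'
        current = main_model(current)
        outputs.append(current)
    for _ in range(stack_passes):
        current = stacked_model(current)  # each pass refines the previous instrumental
        outputs.append(current)
    return outputs


def outputs_to_save(outputs: List[Path], save_all: bool) -> List[Path]:
    """Keep every intermediate output ('Save All Stacked Outputs') or only the final one."""
    return list(outputs) if save_all else outputs[-1:]


if __name__ == "__main__":
    def fake_main(path: Path) -> Path:
        return path.with_name(path.stem + "_main" + path.suffix)

    def fake_stacked(path: Path) -> Path:
        return path.with_name(path.stem + "_stacked" + path.suffix)

    outputs = run_stacked_conversion(Path("song.wav"), fake_main, fake_stacked, stack_passes=2)
    print([p.name for p in outputs])
    # ['song_main.wav', 'song_main_stacked.wav', 'song_main_stacked_stacked.wav']
    print([p.name for p in outputs_to_save(outputs, save_all=False)])
    # ['song_main_stacked_stacked.wav']
```

The sketch also shows why conversion time grows with the number of stack passes: each pass is one more full model run over the previous output.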
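The *'TTA'* option's trade-off (better separation, longer conversions) follows from how test-time augmentation works in general: the model is run on the input and on one or more transformed copies, each prediction is mapped back, and the results are averaged. The application's own augmentation is not described in this README, so the sketch below is a generic illustration that assumes a single circular shift of the frame sequence; `predict_with_tta` and the toy predictor are hypothetical.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a sequence of frames to per-frame predictions.
Predict = Callable[[List[float]], List[float]]


def predict_with_tta(predict: Predict, frames: List[float], shift: int) -> List[float]:
    """Average predictions on the input and on a circularly shifted copy of it."""
    n = len(frames)
    if n == 0:
        return []
    shift %= n
    base = predict(frames)
    # Augment: rotate the input left by `shift` frames and predict again (a second model call).
    shifted_pred = predict(frames[shift:] + frames[:shift])
    # Undo the augmentation: rotate the prediction right by `shift` to realign it.
    realigned = shifted_pred[n - shift:] + shifted_pred[:n - shift]
    return [(a + b) / 2 for a, b in zip(base, realigned)]


if __name__ == "__main__":
    # A toy, position-dependent "model", so the two passes disagree and get averaged.
    def toy_model(xs: List[float]) -> List[float]:
        return [x + i for i, x in enumerate(xs)]

    print(predict_with_tta(toy_model, [0.0, 0.0, 0.0, 0.0], shift=1))
    # [1.5, 0.5, 1.5, 2.5]
```

In this sketch every conversion costs two model calls instead of one, which is the same reason enabling TTA lengthens conversion times.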