This application is a GUI version of the vocal remover AI created and posted by GitHub user [tsurumeso](https://github.com/tsurumeso). You can find tsurumeso's original command line version [here](https://github.com/tsurumeso/vocal-remover).
- [tsurumeso](https://github.com/tsurumeso) - The engineer who authored the AI code. Thank you for the hard work and dedication you put into the AI application this GUI is built around!
- [DilanBoskan](https://github.com/DilanBoskan) - The main GUI code contributor. Thank you for helping bring this GUI to life! Your hard work and continued support is greatly appreciated.
The application was made with Tkinter for cross-platform compatibility, so it should work with Windows, Mac, and Linux systems. However, this application has only been tested on Windows 10 & Linux Ubuntu.
2. Once Python has installed, download **Ultimate Vocal Remover GUI Version 4.0.0** [here](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v4.0.0/UVR-V4GUI.zip)
FFmpeg must be installed and configured in order for the application to be able to process any track that isn't a *.wav* file. Instructions for installing FFmpeg can be found on YouTube, WikiHow, Reddit, GitHub, and many other sources around the web.
- **Note:** If you are experiencing any errors when attempting to process any media files that are not in the *.wav* format, please ensure FFmpeg is installed & configured correctly.
- **Note:** If you are unable to open the *'VocalRemover.py'* file, please go to the [**troubleshooting**](https://github.com/Anjok07/ultimatevocalremovergui/tree/beta#troubleshooting) section below.
The v2 & v4 AI engines use different sets of models. When selected, the models available for v2 or v4 will automatically populate within the model selection dropdowns.
- The [*'Model Test Mode'*](https://github.com/Anjok07/ultimatevocalremovergui/tree/beta#checkboxes) option makes it easier for the user to test different models on given tracks.
- Selecting the [*'Stack Passes'*](https://github.com/Anjok07/ultimatevocalremovergui/tree/beta#checkboxes) option will enable you to select a stacked model to run with a main model.
- If you wish to only run a stacked model on a track, make sure the [*'Stack Conversion Only'*](https://github.com/Anjok07/ultimatevocalremovergui/tree/beta#checkboxes) option is checked.
- The wide range of main model/stacked model combinations gives the user more flexibility in discovering what model blend works best for the track(s) they are proessing.
- To reiterate, the [*'Model Test Mode'*](https://github.com/Anjok07/ultimatevocalremovergui/tree/beta#checkboxes) option streamlines the process of testing different main model/stacked model combinations on a given track. More information on this option can be found in the next section.
- **Post-process** - This option can potentially identify leftover instrumental artifacts within the vocal outputs. This option may improve the separation on *some* songs.
- **Note:** Having this option selected can potentially have an adverse effect on the conversion process, depending on the track. Because of this, it's only recommended as a last resort.
- **Note:** This option is ***not*** compatible with the *v2* AI engine.
- **Output Image** - Selecting this option will include the spectrograms in *.jpg* format for the instrumental & vocal audio outputs.
- **Stack Passes** - This option activates the stacked model conversion process and allows the user to set the number of times a track runs through a stacked model.
- **Save All Stacked Outputs** - Having this option selected will auto-generate a new folder named after the track being processed to your *'Save to'* path. The new folder will contain all of the outputs that were generated after each stack pass. The amount of audio outputs will depend on the number of stack passes chosen.
- **Note:** Each output audio file will be appended with the number of passes it has had.
- **Example:** If 5 stack passes are chosen, the application will provide you with all 5 pairs of audio outputs generated after each pass, if this option is enabled.
- The *'stacked vocal'* tracks will contain the audio of the vocal artifacts that were removed from the instrumental.
- These files can be used to verify artifact removal.
- **Model Test Mode** - This option makes it easier for users to test the results of different models, and model combinations, by eliminating the hassel of having to manually change the filenames and/or create new folders when processing the same track through multiple models. This option structures the model testing process.
- When *'Model Test Mode'* is selected, the application will auto-generate a new folder in the *'Save to'* path you have chosen.
- The filenames for the instrumental & vocal outputs will have the selected model(s) name(s) appended to them.
### Parameter Values
All models released here will have the values they were trained with appended to the end of their filenames like so, **'MGM-HIGHEND_sr44100_hl512_w512_nf2048.pth'**. The *'_sr44100_hl512_w512_nf2048'* portion automatically sets the *SR*, *HOP LENGNTH*, *WINDOW SIZE*, &*N_FFT* values within the application. If there are no values appended to the end of a selected model filename, the *SR*, *HOP LENGNTH*, *WINDOW SIZE*, &*N_FFT* fields will be editable and auto-populate with default values.
- **Restart Button** - If the application hangs for any reason, you can hit the circular arrow button immediately to the right of the *'Start Conversion'* button.
- **MGM_MAIN_v4_sr44100_hl512_w512_nf2048.pth** - This is the main model that does an excellent job removing vocals from most tracks.
- **MGM_LOWEND_A_v4_sr32000_hl512_w512_nf2048.pth** - This model focuses a bit more on removing vocals from lower frequencies.
- **MGM_LOWEND_B_v4_sr33075_hl384_w512_nf2048.pth** - This is also a model that focuses on lower end frequencies, but trained with different parameters.
- **MGM_HIGHEND_v4_sr44100_hl1024_nf2048.pth** - This model slightly focuses a bit more on higher end frequencies.
- **MODEL_BVKARAOKE_by_aufr33_v4_sr33075_hl384_w512_nf1536.pth** - This is a beta model that removes main vocals while leaving background vocals intact.
- **StackedMGM_MM_v4_sr44100_hl512_w512_nf2048.pth** - This is a strong vocal artifact removal model. This model was made to run with *'MGM_MAIN_v4_sr44100_hl512_w512_nf2048.pth'*. However, any combination may yield the desired results.
- **StackedMGM_MLA_v4_sr32000_hl512_w512_nf2048.pth** - This is a strong vocal artifact removal model. This model was made to run with *'MGM_MAIN_v4_sr44100_hl512_w512_nf2048.pth'*. However, any combination may yield a desired results.
- **StackedMGM_LL_v4_sr32000_hl512_w512_nf2048.pth** - This is a strong vocal artifact removal model. This model was made to run with *'MGM_LOWEND_A_v4_sr32000_hl512_w512_nf2048.pth'*. However, any combination may yield a desired results.
- **StackedRegA_v2_sr44100_hl1024_w512.pth** - This is a standard vocal artifact removal model.
- **StackedRegB_v2_sr44100_hl1024_w512.pth** - This is a standard vocal artifact removal model.
- **StackedArg_v2_sr44100_hl1024_w512.pth** - This model removes vocal artifacts a bit more aggressively, but may greatly degrade the audio quality of the output audio.
- **Note:** This application will *not* be friendly to older or budget hardware. Please proceed with caution! Pay attention to your PC and make sure it doesn't overheat. ***We are not responsible for any hardware damage.***
Please be as detailed as possible when posting a new issue. Make sure to provide any error outputs and/or screenshots/gif's to give us a clearer understanding of the issue you are experiencing.
- For anyone interested in the ongoing development of **Ultimate Vocal Remover GUI** please send us a pull request and we will review it. This project is 100% open-source and free for anyone to use and/or modify as they wish.
- Please note that we do not maintain or directly support any of tsurumesos AI application code. We only maintain the development and support for the **Ultimate Vocal Remover GUI**.