# Ultimate Vocal Remover GUI v5.0.0
<img src="https://raw.githubusercontent.com/Anjok07/ultimatevocalremovergui/master/img/UVRV5.png" />

[![Release](https://img.shields.io/github/release/anjok07/ultimatevocalremovergui.svg)](https://github.com/anjok07/ultimatevocalremovergui/releases/latest)
[![Downloads](https://img.shields.io/github/downloads/anjok07/ultimatevocalremovergui/total.svg)](https://github.com/anjok07/ultimatevocalremovergui/releases)
## About
This application is a GUI version of the vocal remover AI created and posted by GitHub user [tsurumeso](https://github.com/tsurumeso). This version also comes with eight high-performance models trained by me. You can find tsurumeso's original command-line version [here](https://github.com/tsurumeso/vocal-remover).
- **The Developers**
    - [Anjok07](https://github.com/anjok07) - Model collaborator & UVR developer.
    - [aufr33](https://github.com/aufr33) - Model collaborator & fellow UVR developer. This project wouldn't be what it is without your help. Thank you for your continued support!
    - [DilanBoskan](https://github.com/DilanBoskan) - The main UVR GUI developer. Thank you for helping bring the GUI to life! Your hard work and continued support are greatly appreciated.
    - [tsurumeso](https://github.com/tsurumeso) - The engineer who authored the original AI code. Thank you for the hard work and dedication you put into the AI code UVR is built on!
## Change Log
- **v4 vs. v5**
    - The v5 models significantly outperform the v4 models.
    - The aggressiveness of the extraction can be adjusted using the "Aggression Setting". The default value of 10 is optimal for most tracks.
    - All v2 and v4 models have been removed.
    - Stacked models have been removed entirely.
    - The stacked model feature has been replaced by the new aggression setting and model ensembling.
    - The NFFT, HOP_SIZE, and SR values are now set internally.
- **Upcoming v5.1 Update**
    - Model ensembling
    - MDX-NET AI engine and model support
## Installation
The application is built with Tkinter for cross-platform compatibility, so it should work on Windows, macOS, and Linux. However, it has only been tested on Windows 10 and Ubuntu Linux.
### Install Required Applications & Packages
1. Download and install Python 3.9.8 [here](https://www.python.org/ftp/python/3.9.8/python-3.9.8-amd64.exe) (Windows link).
    - **Note:** Ensure the *"Add Python 3.9 to PATH"* box is checked.
2. Download the source code zip here - https://github.com/Anjok07/ultimatevocalremovergui/archive/refs/tags/v5.0.1.zip
3. Download models.zip here - https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.0.1/models.zip
4. Extract the *ultimatevocalremovergui-5.0.1* folder from ultimatevocalremovergui-5.0.1.zip to any location you wish.
5. Extract the *models* folder from models.zip into the *ultimatevocalremovergui-5.0.1* directory.
    - **Note:** At this time, the GUI is hardcoded to run only the models included in this package.
6. Open the command prompt from the *ultimatevocalremovergui-5.0.1* directory and run the following commands separately:
```
pip install --no-cache-dir -r requirements.txt
```
```
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
```
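
Once both commands finish, a quick sanity check can confirm that the interpreter is 64-bit and that PyTorch can see a CUDA device. The script below is a minimal sketch and is not part of UVR; the function name `describe_environment` is made up for illustration. It only uses the standard library plus `torch.cuda.is_available()`, and it still runs if PyTorch is missing.

```python
import struct
import sys


def describe_environment() -> dict:
    """Collect a few facts about the interpreter and the PyTorch install."""
    info = {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Pointer size in bits: 64 on a 64-bit interpreter, 32 on a 32-bit one.
        "python_bits": struct.calcsize("P") * 8,
        "torch_version": None,
        "cuda_available": False,
    }
    try:
        import torch  # imported lazily so the check works without PyTorch
    except ImportError:
        return info
    info["torch_version"] = torch.__version__
    info["cuda_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `python_bits` is 32, reinstall the 64-bit build of Python. If `cuda_available` is `False`, conversions can still run on the CPU, only more slowly.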
### FFmpeg
FFmpeg must be installed and configured for the application to process any track that isn't a *.wav* file. Instructions for installing FFmpeg can be found on YouTube, WikiHow, Reddit, GitHub, and many other sources around the web.
- **Note:** If you encounter errors when processing media files that are not in the *.wav* format, please ensure FFmpeg is installed and configured correctly.
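
One way to check the FFmpeg requirement before starting a conversion is sketched below. It assumes FFmpeg is found through the `PATH` environment variable; how the GUI itself locates FFmpeg is not described here, and the helper names are hypothetical.

```python
import shutil
from pathlib import Path


def needs_ffmpeg(track: str) -> bool:
    """Return True for any input that is not a .wav file."""
    return Path(track).suffix.lower() != ".wav"


def ffmpeg_on_path() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def can_process(track: str) -> bool:
    """A .wav file is always fine; anything else requires FFmpeg."""
    return not needs_ffmpeg(track) or ffmpeg_on_path()


if __name__ == "__main__":
    for name in ("song.wav", "song.mp3", "song.flac"):
        print(name, "->", "ok" if can_process(name) else "FFmpeg missing")
```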
### Running the Vocal Remover GUI & Models
- Open the file labeled *'VocalRemover.py'*.
- It's recommended that you create a shortcut for the file labeled *'VocalRemover.py'* to your desktop for easy access.
- **Note:** If you are unable to open the *'VocalRemover.py'* file, please go to the [**troubleshooting**](https://github.com/Anjok07/ultimatevocalremovergui/tree/beta#troubleshooting) section below.
- **Note:** All output audio files will be in the *'.wav'* format.
## Option Guide
### Model Selections:
- **Choose Main Model** - Here is where you choose the main model to perform a deep vocal removal.
    - Each of the models provided was trained with different parameters, though they can all convert tracks of any genre.
    - Each model differs in the way it processes a given track.
    - The *'Model Test Mode'* option makes it easier to test different models on a given track.
### Checkboxes
- **GPU Conversion** - Selecting this option ensures the GPU is used to process conversions.
    - **Note:** This option will not work if you don't have a CUDA-compatible GPU.
        - CUDA is supported only on NVIDIA GPUs.
    - **Note:** CPU conversions are much slower than those processed through the GPU.
- **Post-process** - This option can potentially identify leftover instrumental artifacts within the vocal outputs. It may improve the separation of *some* songs.
    - **Note:** Having this option selected can adversely affect the conversion process, depending on the track. Because of this, it's only recommended as a last resort.
- **TTA** - This option performs Test-Time Augmentation to improve the separation quality.
    - **Note:** Having this selected will increase the time it takes to complete a conversion.
- **Output Image** - Selecting this option will save the spectrograms of the instrumental & vocal audio outputs in *.jpg* format.
- **Model Test Mode** - This option structures the model testing process. It makes it easier to test the results of different models and model combinations by eliminating the hassle of manually changing filenames and/or creating new folders when processing the same track through multiple models.
    - When *'Model Test Mode'* is selected, the application will auto-generate a new folder in the *'Save to'* path you have chosen.
        - The new auto-generated folder will be named after the model(s) selected.
        - The output audio files will be saved to the auto-generated directory.
        - The filenames for the instrumental & vocal outputs will have the selected model name(s) appended.
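
The folder and filename layout described for *'Model Test Mode'* can be pictured with the sketch below. It is only an illustration: the function name, the `_` separator, and the `_Instruments` / `_Vocals` suffixes are assumptions, and the GUI's actual naming may differ.

```python
from pathlib import Path


def model_test_outputs(save_to, track, models):
    """Build hypothetical output paths for one track in Model Test Mode."""
    # Name the auto-generated folder after the selected model(s).
    model_tag = "_".join(Path(m).stem for m in models)
    out_dir = Path(save_to) / model_tag
    stem = Path(track).stem
    # Append the model name(s) to both output filenames; outputs are .wav.
    return {
        "instrumental": out_dir / f"{stem}_{model_tag}_Instruments.wav",
        "vocal": out_dir / f"{stem}_{model_tag}_Vocals.wav",
    }


if __name__ == "__main__":
    paths = model_test_outputs("exports", "song.mp3", ["HP2_4BAND_44100_1.pth"])
    for kind, path in paths.items():
        print(kind, "->", path)
```

Because each model gets its own folder and filename suffix, running the same track through several models never overwrites an earlier result.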
### Parameter Values
- **Window Size** - The smaller your window size, the better your conversions will be. However, a smaller window means longer conversion times and heavier resource usage.
    - Here are the selectable window size values:
        - **1024** - Low conversion quality, shortest conversion time, low resource usage
        - **512** - Average conversion quality, average conversion time, normal resource usage
        - **320** - Better conversion quality, long conversion time, high resource usage
- **Aggression Setting** - This option sets how strong the vocal removal will be. The range is 0-100. The higher the value, the more vocal data will be removed. Depending on the track, higher values can result in muddy-sounding instrumentals from instrumental models, so raising it isn't always recommended. The default is 10 for both instrumental & vocal models.
- **Default Values:**
    - **Window Size** - 512
    - **Aggression Setting** - 10 (optimal setting for most conversions)
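
The allowed values and defaults listed above can be summarized as a small validated settings object. This is a sketch for illustration only; `ConversionSettings` and `WINDOW_SIZES` are hypothetical names and do not come from the UVR code base.

```python
from dataclasses import dataclass

# Selectable window sizes, from best quality (320) to fastest (1024).
WINDOW_SIZES = (320, 512, 1024)


@dataclass(frozen=True)
class ConversionSettings:
    window_size: int = 512  # documented default
    aggression: int = 10    # documented default, valid range 0-100

    def __post_init__(self) -> None:
        if self.window_size not in WINDOW_SIZES:
            raise ValueError(f"window_size must be one of {WINDOW_SIZES}")
        if not 0 <= self.aggression <= 100:
            raise ValueError("aggression must be between 0 and 100")


if __name__ == "__main__":
    print(ConversionSettings())
    print(ConversionSettings(window_size=320, aggression=25))
```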
### Other Buttons:
- **Open Export Directory** - This button opens your *'save to'* directory.
- **Restart Button** - If the application hangs for any reason, you can restart it by clicking the circular arrow button to the right of the *'Start Conversion'* button.
## Models Included
All of the models included in the release were trained on large datasets containing diverse sets of music genres.
- **PLEASE NOTE:** Do not change the names of the models provided! The required parameters are specified and appended to the end of the filenames.
- **Model Network Types**
    - **HP2** - The model layers are much larger. However, this makes them resource-heavy.
    - **HP** - The model layers are the standard size for UVR v5.

Here's a list of the models included within the package:
- **Main Models**
    - **HP2_3BAND_44100_MSB2.pth** - This is a strong instrumental model trained using more data and new parameters.
    - **HP2_4BAND_44100_1.pth** - This is a strong instrumental model.
    - **HP2_4BAND_44100_2.pth** - This is a fine-tuned version of the HP2_4BAND_44100_1.pth model.
    - **HP_4BAND_44100_A.pth** - This is a strong instrumental model.
    - **HP_4BAND_44100_B.pth** - This is a fine-tuned version of the HP_4BAND_44100_A.pth model.
    - **HP_KAROKEE_4BAND_44100_SN.pth** - This model removes main vocals while leaving background vocals intact.
    - **HP_Vocal_4BAND_44100.pth** - This model emphasizes vocal extraction. The vocal stem will be clean, but the instrumental might sound muddy.
    - **HP_Vocal_AGG_4BAND_44100.pth** - This model also emphasizes vocal extraction and is a bit more aggressive than the previous model.
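
The filenames above show why renaming a model can break it: the name itself carries parameters. The sketch below reads the network type, band count, and what appears to be the sample rate from a filename. It is an assumption-laden illustration, not the GUI's own parser: `parse_model_name` and `ModelInfo` are hypothetical names, and reading `44100` as a sample rate is an interpretation of the naming pattern.

```python
import re
from typing import NamedTuple, Optional


class ModelInfo(NamedTuple):
    network: str      # "HP" or "HP2"
    bands: int        # e.g. 3 or 4
    sample_rate: int  # assumed meaning of the number after "BAND_"


# Matches fragments such as "_4BAND_44100" anywhere in the filename.
_BAND_RATE = re.compile(r"_(\d+)BAND_(\d+)")


def parse_model_name(filename: str) -> Optional[ModelInfo]:
    """Return the parameters encoded in a model filename, or None."""
    match = _BAND_RATE.search(filename)
    if match is None:
        return None
    network = filename.split("_", 1)[0]
    return ModelInfo(network, int(match.group(1)), int(match.group(2)))


if __name__ == "__main__":
    for name in ("HP2_3BAND_44100_MSB2.pth", "HP_Vocal_AGG_4BAND_44100.pth"):
        print(name, "->", parse_model_name(name))
```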
## Other GUI Notes
- The application will automatically remember your *'save to'* path upon closing and reopening until it's changed.
    - **Note:** The last directory accessed within the application will also be remembered.
- Multiple conversions are supported.
- The ability to drag & drop audio files to convert has also been added.
- Conversion times depend significantly on your hardware.
    - **Note:** This application will *not* be friendly to older or budget hardware. Please proceed with caution! Pay attention to your PC and make sure it doesn't overheat. ***We are not responsible for any hardware damage.***
## Troubleshooting
### Common Issues
- This application is not compatible with 32-bit versions of Python. Please make sure your version of Python is 64-bit.
- If FFmpeg is not installed, the application will throw an error if the user attempts to convert a non-WAV file.
### Issue Reporting
Please be as detailed as possible when posting a new issue. Make sure to provide any error outputs and/or screenshots/GIFs to give us a clearer understanding of the issue you are experiencing.

If the *'VocalRemover.py'* file won't open *under any circumstances* and all other resources have been exhausted, please do the following:

1. Open the command prompt from the *ultimatevocalremovergui-5.0.1* directory.
2. Run the following command:

```
python VocalRemover.py
```

3. Copy and paste the error output shown in the command prompt to the issues center on the GitHub repository.
## License
The **Ultimate Vocal Remover GUI** code is [MIT-licensed](LICENSE).
- **PLEASE NOTE:** For all third-party application developers who wish to use our models, please honor the MIT license by providing credit to UVR and its developers Anjok07, aufr33, & tsurumeso.
## Contributing
- For anyone interested in the ongoing development of **Ultimate Vocal Remover GUI**, please send us a pull request, and we will review it. This project is 100% open-source and free for anyone to use and/or modify as they wish.
- Please note that we do not maintain or directly support any of tsurumeso's AI application code. We only maintain the development and support for the **Ultimate Vocal Remover GUI** and the models provided.
## References
- [1] Takahashi et al., "Multi-scale Multi-band DenseNets for Audio Source Separation", https://arxiv.org/pdf/1706.09588.pdf