README.md
@ -1,5 +1,5 @@
# Ultimate Vocal Remover GUI v5.5.1
<img src="https://raw.githubusercontent.com/Anjok07/ultimatevocalremovergui/master/gui_data/img/UVR_5_5_1.png?raw=true" />
# Ultimate Vocal Remover GUI v5.6
<img src="https://raw.githubusercontent.com/Anjok07/ultimatevocalremovergui/v5.6/gui_data/img/UVR_5_6_0.png?raw=true" />

[![Release](https://img.shields.io/github/release/anjok07/ultimatevocalremovergui.svg)](https://github.com/anjok07/ultimatevocalremovergui/releases/latest)
[![Downloads](https://img.shields.io/github/downloads/anjok07/ultimatevocalremovergui/total.svg)](https://github.com/anjok07/ultimatevocalremovergui/releases)
@ -73,7 +73,6 @@ In order to use the Time Stretch or Change Pitch tool, you'll need Rubber Band.
</details>

### MacOS Installation

- Please Note:
    - This bundle is intended for those running macOS Catalina and above.
    - Application functionality for systems running macOS Mojave or lower is not guaranteed.
@ -167,7 +166,6 @@ pip3 install -r requirements.txt
</details>

### Other Application Notes

- Nvidia GTX 1060 6GB is the minimum requirement for GPU conversions.
- Nvidia GPUs with at least 8GB of VRAM are recommended.
- AMD Radeon GPUs are not supported at this time.
@ -178,78 +176,10 @@ pip3 install -r requirements.txt
- Conversion times will significantly depend on your hardware.
- These models are computationally intensive.

## Change Log

### Most Recent Changes:
- Fixed Download Center model list issue.
- Fixed audio clip in ensemble mode.
- Fixed output model name issue in ensemble mode.
- Added "Batch Mode" for MDX-Net to increase performance.
    - Batch Mode is more memory efficient.
    - Batch Mode produces the best output, regardless of batch size.
- Added Batch Mode for VR Architecture.
- Added Mixer Mode for Demucs.
    - This option may improve separation for some 4-stem models.

### Fixes & Changes going from UVR v5.4 to v5.5:

- The progress bar is now fully synced up with every process in the application.
- The drag-and-drop feature should now work every time.
- Users can now drop large batches of files and directories as inputs. When directories are dropped, the application will search for any file with an audio extension and add it to the list of inputs.
- Fixed low-resolution icon.
- Added the ability to download models manually if the application can't connect to the internet.
- Various bug fixes for the Download Center.
- Various design changes.

### Performance:

- Model load times are faster.
- Importing/exporting audio files is faster.

### New Options:

- "Select Saved Settings" option - Allows the user to save the current settings of the whole application. You can also load saved settings or reset them to the default.
- "Right-click" menu - Allows for quick access to important options.
- "Help Hints" option - When enabled, users can hover over options to see pop-up text that describes that option. The right-click menu also allows copying the "Help Hint" text.
- Secondary Model Mode - This option is an expanded version of the "Demucs Model" option, which was previously available only for MDX-Net. It is now available in all three AI networks and for any stem. Any model can now be Secondary, and the user can choose the amount of influence it has on the final result.
- Robust caching for ensemble mode, allowing for much faster processing times.
- Clicking the "Input" field will pop up a new window that allows the user to go through all of the selected audio inputs. Within this menu, users can:
    - Remove inputs.
    - Verify inputs.
    - Create samples of selected inputs.
- "Sample Mode" option - Allows the user to process only part of a track to sample settings or a model without running a complete conversion.
    - The number in parentheses is the current length, in seconds, of the generated sample.
    - You can choose the number of seconds to extract from the track in the "Additional Settings" menu.

### VR Architecture:

- Ability to toggle "High-End Processing."
- Support for the latest VR architecture.
    - Crop Size and Batch Size are specifically for models using the latest architecture only.

### MDX-Net:

- "Denoise Output" option results in cleaner results, but the processing time will be longer. This option has replaced Noise Reduction.
- "Spectral Inversion" option uses spectral inversion techniques for a cleaner secondary stem result. This option may slow down the audio export process.
- Secondary stem now has the same frequency cut-off as the main stem.

### Demucs:

- Demucs v4 models are now supported, including the 6-stem model.
- Remaining stems are now combined instead of inverting the selected stem with the mixture, but only when the user does not select "All Stems."
- A "Pre-process" model that allows the user to run an inference through a robust vocal or instrumental model and separate the remaining stems from its generated instrumental mix. This option can significantly reduce vocal bleed in other Demucs-generated non-vocal stems.
    - The Pre-process model is intended for Demucs separations for all stems except vocals and instrumentals.

### Ensemble Mode:

- Ensemble Mode has been extended to include the following:
    - "Averaging" is a new algorithm that averages the final results.
    - Unlimited models in the ensemble.
    - Ability to save different ensembles.
    - Ability to ensemble outputs for all individual stem types.
    - Ability to choose unique ensemble algorithms.
    - Ability to ensemble all 4 Demucs stems at once.

## Troubleshooting

### Common Issues
@ -1,4 +1,4 @@
VERSION = 'v5.5.1'
PATCH = 'UVR_Patch_3_31_23_5_5'
PATCH_MAC = 'UVR_Patch_01_10_12_6_50'
PATCH_LINUX = 'UVR_Patch_01_01_23_6_50'
VERSION = 'v5.6.0'
PATCH = 'UVR_Patch_9_25_23_2_1'
PATCH_MAC = 'UVR_Patch_9_25_23_2_1'
PATCH_LINUX = 'UVR_Patch_9_25_23_2_1'
@ -140,6 +140,8 @@ def apply_model(model, mix, shifts=1, split=True, overlap=0.25, transition_power
            be on `device`, while the entire tracks will be stored on `mix.device`.
    """

    #print("Progress Bar?: ", type(set_progress_bar))

    global fut_length
    global bag_num
    global prog_bar

@ -1,13 +1,26 @@
import os
import platform
from screeninfo import get_monitors
from PIL import Image
from PIL import ImageTk

OPERATING_SYSTEM = platform.system()

def get_screen_height():
    monitors = get_monitors()
    if len(monitors) == 0:
        raise Exception("Failed to get screen height")
    return monitors[0].height
    return monitors[0].height, monitors[0].width

def scale_values(value):
    if not SCALE_WIN_SIZE == 1920:
        ratio = SCALE_WIN_SIZE/1920 # Approx. 1.3333 for 2K
        return value * ratio
    else:
        return value

SCREEN_HIGHT, SCREEN_WIDTH = get_screen_height()
SCALE_WIN_SIZE = 1920
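`scale_values` multiplies a 1080p-calibrated size by `SCALE_WIN_SIZE / 1920`. Because `SCALE_WIN_SIZE` is hard-coded to 1920 in this diff, it currently returns its input unchanged. The sketch below applies the same rule but takes the target width as an explicit argument. The `scale_win_size` parameter is hypothetical and not part of UVR.

```python
BASE_WIDTH = 1920  # the layout constants in this diff are calibrated to 1080p (1920 px wide)


def scale_values(value, scale_win_size=BASE_WIDTH):
    """Scale a 1080p-calibrated size to another window width (hypothetical signature)."""
    if scale_win_size == BASE_WIDTH:
        return value  # same early exit as the original: no scaling at the base width
    ratio = scale_win_size / BASE_WIDTH  # about 1.3333 for a 2560 px ("2K") width
    return value * ratio


print(scale_values(30), scale_values(30, 2560), scale_values(33, 1280))
```

Passing the width in makes the function testable without touching module globals. The original reads `SCALE_WIN_SIZE` at call time, which works only because the constant is defined before any caller runs.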

SCREEN_SIZE_VALUES = {
    "normal": {
@ -20,10 +33,10 @@ SCREEN_SIZE_VALUES = {
        'COMMAND_HEIGHT': 141,
        'PROGRESS_HEIGHT': 25,
        'PADDING': 7,
        'WIDTH': 680
    },
    "small": {
        "credits_img":(50, 50),
        ## App Size
        'IMAGE_HEIGHT': 135,
        'FILEPATHS_HEIGHT': 85,
        'OPTIONS_HEIGHT': 274,
@ -31,6 +44,7 @@ SCREEN_SIZE_VALUES = {
        'COMMAND_HEIGHT': 80,
        'PROGRESS_HEIGHT': 6,
        'PADDING': 5,
        'WIDTH': 680
    },
    "medium": {
        "credits_img":(50, 50),
@ -42,23 +56,24 @@ SCREEN_SIZE_VALUES = {
        'COMMAND_HEIGHT': 115,
        'PROGRESS_HEIGHT': 9,
        'PADDING': 7,
        'WIDTH': 680
    },
}


try:
    if get_screen_height() >= 900:
    if SCREEN_HIGHT >= 900:
        determined_size = SCREEN_SIZE_VALUES["normal"]
    elif get_screen_height() <= 720:
    elif SCREEN_HIGHT <= 720:
        determined_size = SCREEN_SIZE_VALUES["small"]
    else:
        determined_size = SCREEN_SIZE_VALUES["medium"]
except:
    determined_size = SCREEN_SIZE_VALUES["normal"]
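The `try` block maps the primary monitor's height to one of the three `SCREEN_SIZE_VALUES` presets:

- 900 px or taller selects "normal".
- 720 px or shorter selects "small".
- Anything in between selects "medium".

In the new version, `get_screen_height()` runs once at module level (`SCREEN_HIGHT, SCREEN_WIDTH = get_screen_height()`), outside the `try`. As far as the diff shows, a failure to detect a monitor therefore no longer reaches the fallback to "normal".

The sketch below keeps the probe inside the guarded region. `pick_size_tier`, `safe_size_tier` and `no_monitors` are hypothetical names. The sketch also catches `Exception` rather than using a bare `except:`, so `KeyboardInterrupt` and `SystemExit` are not swallowed.

```python
def pick_size_tier(screen_height):
    """Map a screen height in pixels to a SCREEN_SIZE_VALUES key."""
    if screen_height >= 900:
        return "normal"
    elif screen_height <= 720:
        return "small"
    return "medium"


def safe_size_tier(get_height):
    """Call the height probe inside the try so detection failures fall back to 'normal'."""
    try:
        return pick_size_tier(get_height())
    except Exception:
        return "normal"


def no_monitors():
    # Stand-in for get_screen_height() when screeninfo finds no monitors.
    raise RuntimeError("Failed to get screen height")


print(safe_size_tier(lambda: 1080), safe_size_tier(lambda: 768), safe_size_tier(no_monitors))
```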

image_scale_1, image_scale_2 = 20, 33

class ImagePath():
    def __init__(self, base_path):

        img_path = os.path.join(base_path, 'gui_data', 'img')
        credits_path = os.path.join(img_path, 'credits.png')
        donate_path = os.path.join(img_path, 'donate.png')
@ -69,16 +84,28 @@ class ImagePath():
        stop_path = os.path.join(img_path, 'stop.png')
        play_path = os.path.join(img_path, 'play.png')
        pause_path = os.path.join(img_path, 'pause.png')
        up_img_path = os.path.join(img_path, "up.png")
        down_img_path = os.path.join(img_path, "down.png")
        left_img_path = os.path.join(img_path, "left.png")
        right_img_path = os.path.join(img_path, "right.png")
        clear_img_path = os.path.join(img_path, "clear.png")
        copy_img_path = os.path.join(img_path, "copy.png")
        self.banner_path = os.path.join(img_path, 'UVR-banner.png')

        self.efile_img = self.open_image(path=efile_path,size=(20, 20))
        self.stop_img = self.open_image(path=stop_path, size=(20, 20))
        self.play_img = self.open_image(path=play_path, size=(20, 20))
        self.pause_img = self.open_image(path=pause_path, size=(20, 20))
        self.help_img = self.open_image(path=help_path, size=(20, 20))
        self.download_img = self.open_image(path=download_path, size=(30, 30))
        self.donate_img = self.open_image(path=donate_path, size=(30, 30))
        self.key_img = self.open_image(path=key_path, size=(30, 30))
        self.efile_img = self.open_image(path=efile_path,size=(image_scale_1, image_scale_1))
        self.stop_img = self.open_image(path=stop_path, size=(image_scale_1, image_scale_1))
        self.play_img = self.open_image(path=play_path, size=(image_scale_1, image_scale_1))
        self.pause_img = self.open_image(path=pause_path, size=(image_scale_1, image_scale_1))
        self.help_img = self.open_image(path=help_path, size=(image_scale_1, image_scale_1))
        self.download_img = self.open_image(path=download_path, size=(image_scale_2, image_scale_2))
        self.donate_img = self.open_image(path=donate_path, size=(image_scale_2, image_scale_2))
        self.key_img = self.open_image(path=key_path, size=(image_scale_2, image_scale_2))
        self.up_img = self.open_image(path=up_img_path, size=(image_scale_2, image_scale_2))
        self.down_img = self.open_image(path=down_img_path, size=(image_scale_2, image_scale_2))
        self.left_img = self.open_image(path=left_img_path, size=(image_scale_2, image_scale_2))
        self.right_img = self.open_image(path=right_img_path, size=(image_scale_2, image_scale_2))
        self.clear_img = self.open_image(path=clear_img_path, size=(image_scale_2, image_scale_2))
        self.copy_img = self.open_image(path=copy_img_path, size=(image_scale_2, image_scale_2))
        self.credits_img = self.open_image(path=credits_path, size=determined_size["credits_img"])

    def open_image(self, path: str, size: tuple = None, keep_aspect: bool = True, rotate: int = 0) -> ImageTk.PhotoImage:
@ -111,11 +138,233 @@ class ImagePath():

        return ImageTk.PhotoImage(img)
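The body of `open_image` is elided from this diff, so how it honours `keep_aspect` is not shown. One common approach, sketched here as an assumption, is to fit the image inside the requested `size` box using the smaller of the two scale factors.

`fit_within` is a hypothetical helper that only computes target dimensions. The real method goes on to build an `ImageTk.PhotoImage`. Unlike Pillow's `Image.thumbnail`, which never enlarges an image, this sketch also scales small images up to fill the box.

```python
def fit_within(orig_w, orig_h, box_w, box_h, keep_aspect=True):
    """Return the (width, height) an image would be resized to for a given box."""
    if not keep_aspect:
        return box_w, box_h  # stretch to the box exactly
    scale = min(box_w / orig_w, box_h / orig_h)  # the limiting dimension wins
    return max(1, round(orig_w * scale)), max(1, round(orig_h * scale))


# image_scale_1 = 20 and image_scale_2 = 33 are the icon boxes used in the diff.
print(fit_within(256, 128, 20, 20), fit_within(512, 512, 33, 33))
```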

class AdjustedValues():
    IMAGE_HEIGHT = determined_size["IMAGE_HEIGHT"]
    FILEPATHS_HEIGHT = determined_size["FILEPATHS_HEIGHT"]
    OPTIONS_HEIGHT = determined_size["OPTIONS_HEIGHT"]
    CONVERSIONBUTTON_HEIGHT = determined_size["CONVERSIONBUTTON_HEIGHT"]
    COMMAND_HEIGHT = determined_size["COMMAND_HEIGHT"]
    PROGRESS_HEIGHT = determined_size["PROGRESS_HEIGHT"]
    PADDING = determined_size["PADDING"]
    #All Sizes Below Calibrated to 1080p!

    if OPERATING_SYSTEM=="Darwin":
        FONT_SIZE_F1 = 13
        FONT_SIZE_F2 = 11
        FONT_SIZE_F3 = 12
        FONT_SIZE_0 = 9
        FONT_SIZE_1 = 11
        FONT_SIZE_2 = 12
        FONT_SIZE_3 = 13
        FONT_SIZE_4 = 14
        FONT_SIZE_5 = 15
        FONT_SIZE_6 = 17
        HELP_HINT_CHECKBOX_WIDTH = 13
        MDX_CHECKBOXS_WIDTH = 14
        VR_CHECKBOXS_WIDTH = 14
        ENSEMBLE_CHECKBOXS_WIDTH = 18
        DEMUCS_CHECKBOXS_WIDTH = 14
        DEMUCS_PRE_CHECKBOXS_WIDTH = 20
        GEN_SETTINGS_WIDTH = 17
        MENU_COMBOBOX_WIDTH = 16
        MENU_OPTION_WIDTH = 12
        READ_ONLY_COMBO_WIDTH = 35
        SETTINGS_BUT_WIDTH = 19
        VR_BUT_WIDTH = 16
        SET_MENUS_CHECK_WIDTH = 12
        COMBO_WIDTH = 14
        SET_VOC_SPLIT_CHECK_WIDTH = 21
    elif OPERATING_SYSTEM=="Linux":
        FONT_SIZE_F1 = 10
        FONT_SIZE_F2 = 8
        FONT_SIZE_F3 = 9
        FONT_SIZE_0 = 7
        FONT_SIZE_1 = 8
        FONT_SIZE_2 = 9
        FONT_SIZE_3 = 10
        FONT_SIZE_4 = 11
        FONT_SIZE_5 = 12
        FONT_SIZE_6 = 15
        HELP_HINT_CHECKBOX_WIDTH = 13
        MDX_CHECKBOXS_WIDTH = 14
        VR_CHECKBOXS_WIDTH = 16
        ENSEMBLE_CHECKBOXS_WIDTH = 25
        DEMUCS_CHECKBOXS_WIDTH = 18
        DEMUCS_PRE_CHECKBOXS_WIDTH = 27
        GEN_SETTINGS_WIDTH = 17
        MENU_COMBOBOX_WIDTH = 19
        MENU_OPTION_WIDTH = 15
        READ_ONLY_COMBO_WIDTH = 45
        COMBO_WIDTH = 19
        SETTINGS_BUT_WIDTH = 26
        VR_BUT_WIDTH = 20
        SET_MENUS_CHECK_WIDTH = 15
        SET_VOC_SPLIT_CHECK_WIDTH = 28
    elif OPERATING_SYSTEM=="Windows":
        HELP_HINT_CHECKBOX_WIDTH = 15
        MDX_CHECKBOXS_WIDTH = 14
        VR_CHECKBOXS_WIDTH = 14
        ENSEMBLE_CHECKBOXS_WIDTH = 20
        DEMUCS_CHECKBOXS_WIDTH = 14
        DEMUCS_PRE_CHECKBOXS_WIDTH = 20
        GEN_SETTINGS_WIDTH = 18
        MENU_COMBOBOX_WIDTH = 16
        MENU_OPTION_WIDTH = 12
        READ_ONLY_COMBO_WIDTH = 35
        SETTINGS_BUT_WIDTH = 20
        VR_BUT_WIDTH = 16
        SET_MENUS_CHECK_WIDTH = 13
        COMBO_WIDTH = 14
        SET_VOC_SPLIT_CHECK_WIDTH = 23

        FONT_SIZE_F1 = 10
        FONT_SIZE_F2 = 8
        FONT_SIZE_F3 = 9
        FONT_SIZE_0 = 7
        FONT_SIZE_1 = 8
        FONT_SIZE_2 = 9
        FONT_SIZE_3 = 10
        FONT_SIZE_4 = 11
        FONT_SIZE_5 = 13
        FONT_SIZE_6 = 15

    #Main Size Values:
    IMAGE_HEIGHT = determined_size["IMAGE_HEIGHT"]
    FILEPATHS_HEIGHT = determined_size["FILEPATHS_HEIGHT"]
    OPTIONS_HEIGHT = determined_size["OPTIONS_HEIGHT"]
    CONVERSIONBUTTON_HEIGHT = determined_size["CONVERSIONBUTTON_HEIGHT"]
    COMMAND_HEIGHT = determined_size["COMMAND_HEIGHT"]
    PROGRESS_HEIGHT = determined_size["PROGRESS_HEIGHT"]
    PADDING = determined_size["PADDING"]
    WIDTH = determined_size["WIDTH"]

    # IMAGE_HEIGHT = 140
    # FILEPATHS_HEIGHT = 75
    # OPTIONS_HEIGHT = 262
    # CONVERSIONBUTTON_HEIGHT = 30
    # COMMAND_HEIGHT = 141
    # PROGRESS_HEIGHT = 25
    # PADDING = 7
    # WIDTH = 680

    MENU_PADDING_1 = 5
    MENU_PADDING_2 = 10
    MENU_PADDING_3 = 15
    MENU_PADDING_4 = 3

    #Main Frame Sizes
    X_CONVERSION_BUTTON_1080P = 50
    WIDTH_CONVERSION_BUTTON_1080P = -100
    HEIGHT_GENERIC_BUTTON_1080P = 35
    X_STOP_BUTTON_1080P = -10 - 35
    X_SETTINGS_BUTTON_1080P = -670
    X_PROGRESSBAR_1080P = 25
    WIDTH_PROGRESSBAR_1080P = -50
    X_CONSOLE_FRAME_1080P = 15
    WIDTH_CONSOLE_FRAME_1080P = -30
    HO_S = 7

    #File Frame Sizes
    FILEPATHS_FRAME_X = 10
    FILEPATHS_FRAME_Y = 155
    FILEPATHS_FRAME_WIDTH = -20
    MUSICFILE_BUTTON_X = 0
    MUSICFILE_BUTTON_Y = 5
    MUSICFILE_BUTTON_WIDTH = 0
    MUSICFILE_BUTTON_HEIGHT = -5
    MUSICFILE_ENTRY_X = 7.5
    MUSICFILE_ENTRY_WIDTH = -50
    MUSICFILE_ENTRY_HEIGHT = -5
    MUSICFILE_OPEN_X = -45
    MUSICFILE_OPEN_Y = 160
    MUSICFILE_OPEN_WIDTH = 35
    MUSICFILE_OPEN_HEIGHT = 33
    SAVETO_BUTTON_X = 0
    SAVETO_BUTTON_Y = 5
    SAVETO_BUTTON_WIDTH = 0
    SAVETO_BUTTON_HEIGHT = -5
    SAVETO_ENTRY_X = 7.5
    SAVETO_ENTRY_WIDTH = -50
    SAVETO_ENTRY_HEIGHT = -5
    SAVETO_OPEN_X = -45
    SAVETO_OPEN_Y = 197.5
    SAVETO_OPEN_WIDTH = 35
    SAVETO_OPEN_HEIGHT = 32

    #Main Option menu
    OPTIONS_FRAME_X = 10
    OPTIONS_FRAME_Y = 250
    OPTIONS_FRAME_WIDTH = -20
    FILEONE_LABEL_X = -28
    FILEONE_LABEL_WIDTH = -38
    FILETWO_LABEL_X = -32
    FILETWO_LABEL_WIDTH = -20
    TIME_WINDOW_LABEL_X = -43
    TIME_WINDOW_LABEL_WIDTH = 0
    INTRO_ANALYSIS_LABEL_X = -83
    INTRO_ANALYSIS_LABEL_WIDTH = -50
    INTRO_ANALYSIS_OPTION_X = -68
    DB_ANALYSIS_LABEL_X = 62
    DB_ANALYSIS_LABEL_WIDTH = -34
    DB_ANALYSIS_OPTION_X = 86
    WAV_TYPE_SET_LABEL_X = -43
    WAV_TYPE_SET_LABEL_WIDTH = 0
    ENTRY_WIDTH = 222

    # Constants for the ensemble_listbox_Frame
    ENSEMBLE_LISTBOX_FRAME_X = -25
    ENSEMBLE_LISTBOX_FRAME_Y = -20
    ENSEMBLE_LISTBOX_FRAME_WIDTH = 0
    ENSEMBLE_LISTBOX_FRAME_HEIGHT = 67

    # Constants for the ensemble_listbox_scroll
    ENSEMBLE_LISTBOX_SCROLL_X = 195
    ENSEMBLE_LISTBOX_SCROLL_Y = -20
    ENSEMBLE_LISTBOX_SCROLL_WIDTH = -48
    ENSEMBLE_LISTBOX_SCROLL_HEIGHT = 69

    # Constants for Radio Buttons
    RADIOBUTTON_X_WAV = 457
    RADIOBUTTON_X_FLAC = 300
    RADIOBUTTON_X_MP3 = 143
    RADIOBUTTON_Y = -5
    RADIOBUTTON_WIDTH = 0
    RADIOBUTTON_HEIGHT = 6
    MAIN_ROW_Y_1 = -15
    MAIN_ROW_Y_2 = -17
    MAIN_ROW_X_1 = -4
    MAIN_ROW_X_2 = 21
    MAIN_ROW_2_Y_1 = -15
    MAIN_ROW_2_Y_2 = -17
    MAIN_ROW_2_X_1 = -28
    MAIN_ROW_2_X_2 = 1
    LOW_MENU_Y_1 = 18
    LOW_MENU_Y_2 = 16
    SUB_ENT_ROW_X = -2
    MAIN_ROW_WIDTH = -53
    MAIN_ROW_ALIGN_WIDTH = -86
    CHECK_BOX_Y = 0
    CHECK_BOX_X = 20
    CHECK_BOX_WIDTH = -49
    CHECK_BOX_HEIGHT = 2
    LEFT_ROW_WIDTH = -10
    LABEL_HEIGHT = -5
    OPTION_HEIGHT = 8
    LABEL_X_OFFSET = -28
    LABEL_WIDTH = -38
    ENTRY_WIDTH = 179.5
    ENTRY_OPEN_BUTT_WIDTH = -185
    ENTRY_OPEN_BUTT_X_OFF = 405
    UPDATE_LABEL_WIDTH = 35

    HEIGHT_CONSOLE_FRAME_1080P = COMMAND_HEIGHT + HO_S
    LOW_MENU_Y = LOW_MENU_Y_1, LOW_MENU_Y_2
    MAIN_ROW_Y = MAIN_ROW_Y_1, MAIN_ROW_Y_2
    MAIN_ROW_X = MAIN_ROW_X_1, MAIN_ROW_X_2
    MAIN_ROW_2_Y = MAIN_ROW_2_Y_1, MAIN_ROW_2_Y_2
    MAIN_ROW_2_X = MAIN_ROW_2_X_1, MAIN_ROW_2_X_2

    LABEL_Y = MAIN_ROW_Y[0]
    ENTRY_Y = MAIN_ROW_Y[1]

    BUTTON_Y_1080P = IMAGE_HEIGHT + FILEPATHS_HEIGHT + OPTIONS_HEIGHT - 8 + PADDING*2
    HEIGHT_PROGRESSBAR_1080P = PROGRESS_HEIGHT
    Y_OFFSET_PROGRESS_BAR_1080P = IMAGE_HEIGHT + FILEPATHS_HEIGHT + OPTIONS_HEIGHT + CONVERSIONBUTTON_HEIGHT + COMMAND_HEIGHT + PADDING*4
    Y_OFFSET_CONSOLE_FRAME_1080P = IMAGE_HEIGHT + FILEPATHS_HEIGHT + OPTIONS_HEIGHT + CONVERSIONBUTTON_HEIGHT + PADDING + X_PROGRESSBAR_1080P

    LABEL_Y_OFFSET = MAIN_ROW_Y[0]
    ENTRY_X_OFFSET = SUB_ENT_ROW_X
    ENTRY_Y_OFFSET = MAIN_ROW_Y[1]
    OPTION_WIDTH = MAIN_ROW_ALIGN_WIDTH
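The derived offsets at the end of `AdjustedValues` are plain sums of the preset heights. As a worked check, the sketch plugs in the values from the commented-out 1080p block:

- `IMAGE_HEIGHT = 140`
- `FILEPATHS_HEIGHT = 75`
- `OPTIONS_HEIGHT = 262`
- `CONVERSIONBUTTON_HEIGHT = 30`
- `COMMAND_HEIGHT = 141`
- `PADDING = 7`

Whether these still match the live "normal" preset is an assumption, since only part of that dictionary appears in the diff. `layout_offsets` is a hypothetical helper.

```python
def layout_offsets(image_h, filepaths_h, options_h, conv_button_h, command_h, padding,
                   x_progressbar=25, ho_s=7):
    """Recompute the derived y-offsets from AdjustedValues for one size preset."""
    stacked = image_h + filepaths_h + options_h
    return {
        "BUTTON_Y_1080P": stacked - 8 + padding * 2,
        "Y_OFFSET_PROGRESS_BAR_1080P": stacked + conv_button_h + command_h + padding * 4,
        # Mirrors the source, which adds the x-offset constant X_PROGRESSBAR_1080P here.
        "Y_OFFSET_CONSOLE_FRAME_1080P": stacked + conv_button_h + padding + x_progressbar,
        "HEIGHT_CONSOLE_FRAME_1080P": command_h + ho_s,
    }


offsets = layout_offsets(140, 75, 262, 30, 141, 7)
for name, value in offsets.items():
    print(name, value)
```

With these inputs:

- The conversion button row sits at y = 483.
- The console frame starts at y = 539 and is 148 px tall.
- The progress bar starts at y = 676.

`Y_OFFSET_CONSOLE_FRAME_1080P` adds `X_PROGRESSBAR_1080P` (25), which is named as an x-offset, to a vertical position. In the "normal" preset this equals `PROGRESS_HEIGHT` (25), so it may be deliberate, but it is worth checking against the "small" and "medium" presets.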
@ -101,6 +101,6 @@ def error_dialouge(exception):
            final_message = full_text
            break
    else:
        final_message = (f'{error_name}: {exception}\n\n{CONTACT_DEV}')
        final_message = (f'An Error Occurred: {error_name}\n\n{CONTACT_DEV}')

    return final_message
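This hunk changes the fallback message so that it reports only the exception's type name instead of echoing the raw exception text. The loop header sits outside the hunk, so its exact form is an assumption. The sketch assumes the `else:` belongs to a `for` loop that scans a table of known error substrings. In Python's `for ... else`, the `else` branch runs only when the loop finishes without hitting `break`.

`KNOWN_ERRORS`, its single entry and the `CONTACT_DEV` text are hypothetical placeholders. The function is spelled `error_dialogue` here.

```python
CONTACT_DEV = "If this error persists, please contact the developers."  # hypothetical text

# Hypothetical table: substring to look for in the exception text -> friendly message.
KNOWN_ERRORS = {
    "out of memory": "The device ran out of memory while processing.",
}


def error_dialogue(exception):
    error_name = type(exception).__name__
    message = f"{error_name}: {exception}"
    for needle, full_text in KNOWN_ERRORS.items():
        if needle in message:
            final_message = full_text
            break
    else:
        # Runs only when no known pattern matched (the loop never hit `break`).
        final_message = f"An Error Occurred: {error_name}\n\n{CONTACT_DEV}"
    return final_message


print(error_dialogue(RuntimeError("CUDA out of memory")))
print(error_dialogue(ValueError("bad input")))
```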
BIN gui_data/fonts/Montserrat/Montserrat.ttf
gui_data/fonts/other/own_font_goes_here.txt
@ -0,0 +1 @@
0
Binary image, file name not shown: 45 KiB before, 108 KiB after
BIN gui_data/img/UVR_5_6_0.png (124 KiB)
BIN gui_data/img/clear.png (757 B)
BIN gui_data/img/copy.png (18 KiB)
BIN gui_data/img/down.png (614 B)
BIN gui_data/img/left.png (438 B)
BIN gui_data/img/right.png (425 B)
Binary image, file name not shown: 276 KiB before, 276 KiB after
BIN gui_data/img/up.png (491 B)
gui_data/model_manual_download.json
gui_data/own_font.json
@ -0,0 +1,4 @@
{
    "font_name": null,
    "font_file": null
}
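The new `gui_data/own_font.json` ships with both keys set to `null`. `set_theme` also gains `font_name="Century Gothic"` and `f_size=10` defaults.

How UVR actually reads this file is not part of the diff. The sketch below assumes that a `null` name means "keep the default font" and resolves the `(family, size)` pair that `set_theme` would pass to Tcl. `resolve_font` is hypothetical, and the sketch ignores `font_file`.

```python
import json

DEFAULT_FONT_NAME = "Century Gothic"  # default from set_theme() in this diff
DEFAULT_FONT_SIZE = 10


def resolve_font(config_text, size=DEFAULT_FONT_SIZE):
    """Return the (family, size) tuple to hand to set_theme(), falling back on null."""
    config = json.loads(config_text)
    family = config.get("font_name") or DEFAULT_FONT_NAME  # JSON null loads as None
    return (family, size)


shipped = '{"font_name": null, "font_file": null}'
custom = '{"font_name": "Montserrat", "font_file": "Montserrat.ttf"}'
print(resolve_font(shipped), resolve_font(custom))
```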
@ -38,10 +38,11 @@ def init(func):


@init
def set_theme(theme):
def set_theme(theme, font_name="Century Gothic", f_size=10):
    if theme not in {"dark", "light"}:
        raise RuntimeError(f"not a valid theme name: {theme}")

    root.globalsetvar("fontName", (font_name, f_size))
    root.tk.call("set_theme", theme)

@ -8,6 +8,8 @@ proc set_theme {mode} {
    if {$mode == "dark"} {
        ttk::style theme use "sun-valley-dark"

        set fontString "$::fontName"

        array set colors {
            -fg "#F6F6F7"
            -bg "#0e0e0f"
@ -26,7 +28,7 @@ proc set_theme {mode} {
            -insertwidth 0 \
            -insertcolor $colors(-fg) \
            -fieldbackground $colors(-selectbg) \
            -font {"Century Gothic" 10} \
            -font $fontString \
            -borderwidth 0 \
            -relief flat

@ -140,6 +140,29 @@ namespace eval ttk::theme::sun-valley-dark {
        TSeparator.separator -sticky nsew
    }


    # # Modify the TCombobox style
    # ttk::style configure TCombobox -borderwidth 3

    # # Define the layout of the ThickBorder.TCombobox
    # ttk::style layout ThickBorder.TCombobox {
    #     Combobox.field -sticky nsew -children {
    #         Combobox.padding -expand 1 -sticky nsew -children {
    #             Combobox.textarea -sticky nsew
    #         }
    #     }
    #     null -side right -sticky ns -children {
    #         Combobox.arrow -sticky nsew
    #     }
    # }

    # # Use a canvas as the parent of the combobox and create a custom border
    # canvas .c -width 200 -height 30 -highlightthickness 0
    # canvas .c create rectangle 2 2 198 28 -width 3 -outline black
    # pack .c
    # ttk::combobox .c.cbox -values {"Option 1" "Option 2" "Option 3"} -style ThickBorder.TCombobox
    # .c create window 100 15 -window .c.cbox

    ttk::style layout TCombobox {
        Combobox.field -sticky nsew -children {
            Combobox.padding -expand 1 -sticky nsew -children {
@ -453,11 +476,11 @@ namespace eval ttk::theme::sun-valley-dark {
    ttk::style element create Combobox.field \
        image [list $images(button-rest) \
            {readonly disabled} $images(button-disabled) \
            {readonly pressed} $images(button-pressed) \
            {readonly pressed} $images(button-rest) \
            {readonly hover} $images(button-hover) \
            readonly $images(button-rest) \
            invalid $images(entry-invalid) \
            disabled $images(entry-disabled) \
            disabled $images(combo-disabled) \
            focus $images(entry-focus) \
            hover $images(button-hover) \
        ] -border 5 -padding 8 -sticky nsew
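In this image spec, the first element is the default image. The remaining pairs are state specifications checked in order, and the first match wins, as with `ttk::style map`.

The hunk swaps two images:

- A pressed read-only combobox now shows `button-rest` instead of `button-pressed`.
- A disabled, editable combobox now uses the new `combo-disabled` image.

The sketch below is a small Python model of that lookup, not how Tk implements it. It makes the precedence easy to check. `state_matches` and `resolve_image` are hypothetical names.

```python
# New Combobox.field mapping from the hunk, in order; the first entry that matches wins.
NEW_FIELD_SPEC = [
    ("readonly disabled", "button-disabled"),
    ("readonly pressed", "button-rest"),
    ("readonly hover", "button-hover"),
    ("readonly", "button-rest"),
    ("invalid", "entry-invalid"),
    ("disabled", "combo-disabled"),
    ("focus", "entry-focus"),
    ("hover", "button-hover"),
]


def state_matches(spec, active):
    """A spec matches when every listed state is set ('!state' means it must be unset)."""
    for state in spec.split():
        if state.startswith("!"):
            if state[1:] in active:
                return False
        elif state not in active:
            return False
    return True


def resolve_image(default, spec_list, active):
    for spec, image in spec_list:
        if state_matches(spec, active):
            return image
    return default  # no spec matched: use the leading default image


print(resolve_image("button-rest", NEW_FIELD_SPEC, {"readonly", "pressed"}))
print(resolve_image("button-rest", NEW_FIELD_SPEC, {"disabled"}))
```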
BIN gui_data/sv_ttk/theme/dark/combo-disabled.png (2.9 KiB)
@ -1,15 +1,11 @@
from abc import ABCMeta

import torch
import torch.nn as nn
from pytorch_lightning import LightningModule
from .modules import TFC_TDF
from pytorch_lightning import LightningModule

dim_s = 4

class AbstractMDXNet(LightningModule):
    __metaclass__ = ABCMeta

    def __init__(self, target_name, lr, optimizer, dim_c, dim_f, dim_t, n_fft, hop_length, overlap):
        super().__init__()
        self.target_name = target_name
@ -24,7 +20,7 @@ class AbstractMDXNet(LightningModule):
        self.window = nn.Parameter(torch.hann_window(window_length=self.n_fft, periodic=True), requires_grad=False)
        self.freq_pad = nn.Parameter(torch.zeros([1, dim_c, self.n_bins - self.dim_f, self.dim_t]), requires_grad=False)

    def configure_optimizers(self):
    def get_optimizer(self):
        if self.optimizer == 'rmsprop':
            return torch.optim.RMSprop(self.parameters(), self.lr)

@ -37,7 +33,7 @@ class ConvTDFNet(AbstractMDXNet):

        super(ConvTDFNet, self).__init__(
            target_name, lr, optimizer, dim_c, dim_f, dim_t, n_fft, hop_length, overlap)
        self.save_hyperparameters()
        #self.save_hyperparameters()

        self.num_blocks = num_blocks
        self.l = l

lib_v5/results.py
@ -0,0 +1,48 @@
# -*- coding: utf-8 -*-

"""
Matchering - Audio Matching and Mastering Python Library
Copyright (C) 2016-2022 Sergree

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
"""

import os
import soundfile as sf


class Result:
    def __init__(
        self, file: str, subtype: str, use_limiter: bool = True, normalize: bool = True
    ):
        _, file_ext = os.path.splitext(file)
        file_ext = file_ext[1:].upper()
        if not sf.check_format(file_ext):
            raise TypeError(f"{file_ext} format is not supported")
        if not sf.check_format(file_ext, subtype):
            raise TypeError(f"{file_ext} format does not have {subtype} subtype")
        self.file = file
        self.subtype = subtype
        self.use_limiter = use_limiter
        self.normalize = normalize


def pcm16(file: str) -> Result:
    return Result(file, "PCM_16")


def pcm24(file: str) -> Result:
    return Result(file, "FLOAT")


def save_audiofile(file: str, wav_set="PCM_16") -> Result:
    return Result(file, wav_set)
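`Result.__init__` derives the container format from the file extension, so `"out.flac"` becomes `"FLAC"`. It then asks `soundfile.check_format` whether the format, and then the format/subtype pair, is supported.

The sketch below reproduces that control flow with a small hard-coded table standing in for `soundfile`'s capability list, so it runs without the library. The table and the `check_output` name are hypothetical, and the table is deliberately incomplete.

As written in the diff, the `pcm24` helper requests the `"FLOAT"` subtype rather than `"PCM_24"`. This may be intentional, but it is worth confirming against its callers.

```python
import os

# Hypothetical stand-in for soundfile.check_format(): format -> supported subtypes.
SUPPORTED_SUBTYPES = {
    "WAV": {"PCM_16", "PCM_24", "FLOAT"},
    "FLAC": {"PCM_16", "PCM_24"},
}


def check_output(file, subtype):
    _, file_ext = os.path.splitext(file)
    file_ext = file_ext[1:].upper()  # ".flac" -> "FLAC"
    if file_ext not in SUPPORTED_SUBTYPES:
        raise TypeError(f"{file_ext} format is not supported")
    if subtype not in SUPPORTED_SUBTYPES[file_ext]:
        raise TypeError(f"{file_ext} format does not have {subtype} subtype")
    return file_ext, subtype


print(check_output("vocals.wav", "FLOAT"))
try:
    check_output("vocals.flac", "FLOAT")  # FLAC stores integer PCM only
except TypeError as err:
    print(err)
```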
lib_v5/tfc_tdf_v3.py
@ -0,0 +1,234 @@
import torch
import torch.nn as nn
from functools import partial

class STFT:
    def __init__(self, n_fft, hop_length, dim_f):
        self.n_fft = n_fft
        self.hop_length = hop_length
        self.window = torch.hann_window(window_length=self.n_fft, periodic=True)
        self.dim_f = dim_f

    def __call__(self, x):
        window = self.window.to(x.device)
        batch_dims = x.shape[:-2]
        c, t = x.shape[-2:]
        x = x.reshape([-1, t])
        x = torch.stft(x, n_fft=self.n_fft, hop_length=self.hop_length, window=window, center=True,return_complex=False)
        x = x.permute([0, 3, 1, 2])
        x = x.reshape([*batch_dims, c, 2, -1, x.shape[-1]]).reshape([*batch_dims, c * 2, -1, x.shape[-1]])
        return x[..., :self.dim_f, :]

    def inverse(self, x):
        window = self.window.to(x.device)
        batch_dims = x.shape[:-3]
        c, f, t = x.shape[-3:]
        n = self.n_fft // 2 + 1
        f_pad = torch.zeros([*batch_dims, c, n - f, t]).to(x.device)
        x = torch.cat([x, f_pad], -2)
        x = x.reshape([*batch_dims, c // 2, 2, n, t]).reshape([-1, 2, n, t])
        x = x.permute([0, 2, 3, 1])
        x = x[..., 0] + x[..., 1] * 1.j
        x = torch.istft(x, n_fft=self.n_fft, hop_length=self.hop_length, window=window, center=True)
        x = x.reshape([*batch_dims, 2, -1])
        return x
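`STFT.__call__` folds the real and imaginary parts into the channel axis. A stereo clip of shape `[batch, 2, samples]` therefore becomes `[batch, 4, dim_f, frames]`, keeping only the lowest `dim_f` of the `n_fft // 2 + 1` frequency bins.

`inverse` zero-pads the dropped bins and reshapes to `[*batch_dims, 2, -1]`, which assumes stereo audio.

With `center=True`, an even `n_fft` and no `length` argument:

- `torch.stft` yields `samples // hop_length + 1` frames.
- `torch.istft` returns `hop_length * (frames - 1)` samples.
- The round trip can therefore drop the final partial hop.

The pure-Python shape calculator below needs no torch. `stft_shapes` is a hypothetical helper, and the example settings are illustrative rather than a UVR configuration.

```python
def stft_shapes(batch, channels, samples, n_fft, hop_length, dim_f):
    """Shapes produced by STFT.__call__ and STFT.inverse (center=True, even n_fft)."""
    assert n_fft % 2 == 0, "frame count formula below assumes an even n_fft"
    n_bins = n_fft // 2 + 1
    assert dim_f <= n_bins, "dim_f cannot exceed the number of STFT bins"
    frames = samples // hop_length + 1           # center=True pads n_fft // 2 on each side
    spec = (batch, channels * 2, dim_f, frames)  # real/imag folded into channels
    out_samples = hop_length * (frames - 1)      # istft length when no `length` is passed
    wave = (batch, 2, out_samples)               # inverse() hard-codes 2 audio channels
    return spec, wave


spec, wave = stft_shapes(batch=1, channels=2, samples=44100, n_fft=4096,
                         hop_length=1024, dim_f=1024)
print(spec, wave)
```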

def get_norm(norm_type):
    def norm(c, norm_type):
        if norm_type == 'BatchNorm':
            return nn.BatchNorm2d(c)
        elif norm_type == 'InstanceNorm':
            return nn.InstanceNorm2d(c, affine=True)
        elif 'GroupNorm' in norm_type:
            g = int(norm_type.replace('GroupNorm', ''))
            return nn.GroupNorm(num_groups=g, num_channels=c)
        else:
            return nn.Identity()

    return partial(norm, norm_type=norm_type)


def get_act(act_type):
    if act_type == 'gelu':
        return nn.GELU()
    elif act_type == 'relu':
        return nn.ReLU()
    elif act_type[:3] == 'elu':
        alpha = float(act_type.replace('elu', ''))
        return nn.ELU(alpha)
    else:
        raise Exception
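`get_norm` and `get_act` decode short config strings:

- `'GroupNorm8'` means `nn.GroupNorm` with 8 groups.
- `'elu0.5'` means `nn.ELU` with `alpha=0.5`.
- Any unrecognised norm name silently falls back to `nn.Identity()`.

`get_norm` returns a `functools.partial`, so the channel count is supplied later, once per layer.

The torch-free sketch below mirrors the parsing but returns descriptive tuples instead of layers. `describe_norm` and `describe_act` are hypothetical names. The original ends with a bare `raise Exception`; the sketch raises a `ValueError` with a message instead.

```python
from functools import partial


def describe_norm(c, norm_type):
    if norm_type == 'BatchNorm':
        return ('BatchNorm2d', c)
    elif norm_type == 'InstanceNorm':
        return ('InstanceNorm2d', c)
    elif 'GroupNorm' in norm_type:
        groups = int(norm_type.replace('GroupNorm', ''))  # 'GroupNorm8' -> 8
        return ('GroupNorm', groups, c)
    return ('Identity',)  # unknown names fall through silently, as in get_norm


def describe_act(act_type):
    if act_type == 'gelu':
        return ('GELU',)
    elif act_type == 'relu':
        return ('ReLU',)
    elif act_type[:3] == 'elu':
        return ('ELU', float(act_type.replace('elu', '')))  # 'elu0.5' -> 0.5
    raise ValueError(f"unknown activation: {act_type!r}")


norm = partial(describe_norm, norm_type='GroupNorm8')  # like get_norm(): channels come later
print(norm(64), norm(128), describe_act('elu0.5'))
```

A bare `'elu'` with no number fails in both versions, because `float('')` raises `ValueError`.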


class Upscale(nn.Module):
    def __init__(self, in_c, out_c, scale, norm, act):
        super().__init__()
        self.conv = nn.Sequential(
            norm(in_c),
            act,
            nn.ConvTranspose2d(in_channels=in_c, out_channels=out_c, kernel_size=scale, stride=scale, bias=False)
        )

    def forward(self, x):
        return self.conv(x)


class Downscale(nn.Module):
    def __init__(self, in_c, out_c, scale, norm, act):
        super().__init__()
        self.conv = nn.Sequential(
            norm(in_c),
            act,
            nn.Conv2d(in_channels=in_c, out_channels=out_c, kernel_size=scale, stride=scale, bias=False)
        )

    def forward(self, x):
        return self.conv(x)


class TFC_TDF(nn.Module):
    def __init__(self, in_c, c, l, f, bn, norm, act):
        super().__init__()

        self.blocks = nn.ModuleList()
        for i in range(l):
            block = nn.Module()

            block.tfc1 = nn.Sequential(
                norm(in_c),
                act,
                nn.Conv2d(in_c, c, 3, 1, 1, bias=False),
            )
            block.tdf = nn.Sequential(
                norm(c),
                act,
                nn.Linear(f, f // bn, bias=False),
                norm(c),
                act,
                nn.Linear(f // bn, f, bias=False),
            )
            block.tfc2 = nn.Sequential(
                norm(c),
                act,
                nn.Conv2d(c, c, 3, 1, 1, bias=False),
            )
            block.shortcut = nn.Conv2d(in_c, c, 1, 1, 0, bias=False)

            self.blocks.append(block)
            in_c = c

    def forward(self, x):
        for block in self.blocks:
            s = block.shortcut(x)
            x = block.tfc1(x)
            x = x + block.tdf(x)
            x = block.tfc2(x)
            x = x + s
        return x
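Each `TFC_TDF` block is residual. In order, the input passes through:

1. A 3x3 convolution (`tfc1`).
2. A frequency bottleneck (`tdf`) that is added back on top. It consists of two bias-free `nn.Linear` layers acting on the last axis, `f -> f // bn -> f`.
3. A second 3x3 convolution (`tfc2`).
4. Finally, the 1x1 `shortcut` projection of the block's input is added.

Every convolution and linear layer is created with `bias=False`, so their weight counts follow directly from the constructor arguments. `tfc_tdf_weight_count` is a hypothetical helper, and it excludes normalization parameters because those depend on `norm_type`.

```python
def tfc_tdf_weight_count(in_c, c, l, f, bn):
    """Conv/Linear weights in TFC_TDF(in_c, c, l, f, bn, ...), excluding norm layers."""
    total = 0
    for _ in range(l):
        tfc1 = c * in_c * 3 * 3              # nn.Conv2d(in_c, c, 3, 1, 1, bias=False)
        tdf = f * (f // bn) + (f // bn) * f  # nn.Linear(f, f // bn) then nn.Linear(f // bn, f)
        tfc2 = c * c * 3 * 3                 # nn.Conv2d(c, c, 3, 1, 1, bias=False)
        shortcut = c * in_c                  # nn.Conv2d(in_c, c, 1, 1, 0, bias=False)
        total += tfc1 + tdf + tfc2 + shortcut
        in_c = c                             # later blocks see c input channels
    return total


print(tfc_tdf_weight_count(in_c=64, c=32, l=2, f=128, bn=4))
```

The shortcut is always a learned 1x1 projection, even when `in_c == c`. This matters in the decoder, where `in_c` is `2 * c` because the skip connection is concatenated first.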


class TFC_TDF_net(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config

        norm = get_norm(norm_type=config.model.norm)
        act = get_act(act_type=config.model.act)

        self.num_target_instruments = 1 if config.training.target_instrument else len(config.training.instruments)
        self.num_subbands = config.model.num_subbands

        dim_c = self.num_subbands * config.audio.num_channels * 2
        n = config.model.num_scales
        scale = config.model.scale
        l = config.model.num_blocks_per_scale
        c = config.model.num_channels
        g = config.model.growth
        bn = config.model.bottleneck_factor
        f = config.audio.dim_f // self.num_subbands

        self.first_conv = nn.Conv2d(dim_c, c, 1, 1, 0, bias=False)

        self.encoder_blocks = nn.ModuleList()
        for i in range(n):
            block = nn.Module()
            block.tfc_tdf = TFC_TDF(c, c, l, f, bn, norm, act)
            block.downscale = Downscale(c, c + g, scale, norm, act)
            f = f // scale[1]
            c += g
            self.encoder_blocks.append(block)

        self.bottleneck_block = TFC_TDF(c, c, l, f, bn, norm, act)

        self.decoder_blocks = nn.ModuleList()
        for i in range(n):
            block = nn.Module()
            block.upscale = Upscale(c, c - g, scale, norm, act)
            f = f * scale[1]
            c -= g
            block.tfc_tdf = TFC_TDF(2 * c, c, l, f, bn, norm, act)
            self.decoder_blocks.append(block)

        self.final_conv = nn.Sequential(
            nn.Conv2d(c + dim_c, c, 1, 1, 0, bias=False),
            act,
            nn.Conv2d(c, self.num_target_instruments * dim_c, 1, 1, 0, bias=False)
        )

        self.stft = STFT(config.audio.n_fft, config.audio.hop_length, config.audio.dim_f)
|
||||
|
||||
def cac2cws(self, x):
|
||||
k = self.num_subbands
|
||||
b, c, f, t = x.shape
|
||||
x = x.reshape(b, c, k, f // k, t)
|
||||
x = x.reshape(b, c * k, f // k, t)
|
||||
return x
|
||||
|
||||
def cws2cac(self, x):
|
||||
k = self.num_subbands
|
||||
b, c, f, t = x.shape
|
||||
x = x.reshape(b, c // k, k, f, t)
|
||||
x = x.reshape(b, c // k, f * k, t)
|
||||
return x
|
||||
|
||||
def forward(self, x):
|
||||
|
||||
x = self.stft(x)
|
||||
|
||||
mix = x = self.cac2cws(x)
|
||||
|
||||
first_conv_out = x = self.first_conv(x)
|
||||
|
||||
x = x.transpose(-1, -2)
|
||||
|
||||
encoder_outputs = []
|
||||
for block in self.encoder_blocks:
|
||||
x = block.tfc_tdf(x)
|
||||
encoder_outputs.append(x)
|
||||
x = block.downscale(x)
|
||||
|
||||
x = self.bottleneck_block(x)
|
||||
|
||||
for block in self.decoder_blocks:
|
||||
x = block.upscale(x)
|
||||
x = torch.cat([x, encoder_outputs.pop()], 1)
|
||||
x = block.tfc_tdf(x)
|
||||
|
||||
x = x.transpose(-1, -2)
|
||||
|
||||
x = x * first_conv_out # reduce artifacts
|
||||
|
||||
x = self.final_conv(torch.cat([mix, x], 1))
|
||||
|
||||
x = self.cws2cac(x)
|
||||
|
||||
if self.num_target_instruments > 1:
|
||||
b, c, f, t = x.shape
|
||||
x = x.reshape(b, self.num_target_instruments, -1, f, t)
|
||||
|
||||
x = self.stft.inverse(x)
|
||||
|
||||
return x
|
||||
|
||||
|
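The `cac2cws`/`cws2cac` pair in `TFC_TDF_net` is only two reshapes: it cuts the frequency axis into `num_subbands` contiguous slices and stacks them as extra channels, which is why `dim_c = num_subbands * num_channels * 2`. The NumPy sketch below mirrors those two methods on a toy array (standalone functions taking an explicit `k`, not the module's own methods) to show that the round trip is lossless and where each slice lands.

```python
import numpy as np


def cac2cws(x: np.ndarray, k: int) -> np.ndarray:
    """Split the frequency axis into k contiguous subbands and stack them as channels."""
    b, c, f, t = x.shape
    # (b, c, f, t) -> (b, c, k, f // k, t) -> (b, c * k, f // k, t)
    return x.reshape(b, c, k, f // k, t).reshape(b, c * k, f // k, t)


def cws2cac(x: np.ndarray, k: int) -> np.ndarray:
    """Inverse of cac2cws: fold the k subband channels back onto the frequency axis."""
    b, c, f, t = x.shape
    return x.reshape(b, c // k, k, f, t).reshape(b, c // k, f * k, t)


# Toy "complex-as-channels" input: batch 1, stereo x (real, imag) = 4 channels,
# 8 frequency bins, 3 frames.
spec = np.arange(1 * 4 * 8 * 3).reshape(1, 4, 8, 3)
sub = cac2cws(spec, k=4)
print(sub.shape)  # (1, 16, 2, 3)

# The round trip restores the original array exactly.
assert np.array_equal(cws2cac(sub, k=4), spec)
# Subband j of input channel c becomes output channel c * k + j:
# channel 1 holds bins 2..3 of input channel 0.
assert np.array_equal(sub[0, 1], spec[0, 0, 2:4])
```

Because both directions use the same row-major reshape order, no data is copied out of place; the subband split only changes how the convolutions group frequency bins.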
@ -1,36 +1,15 @@
import json
import pathlib

default_param = {}
default_param['bins'] = 768
default_param['unstable_bins'] = 9 # training only
default_param['reduction_bins'] = 762 # training only
default_param['bins'] = -1
default_param['unstable_bins'] = -1 # training only
default_param['stable_bins'] = -1 # training only
default_param['sr'] = 44100
default_param['pre_filter_start'] = 757
default_param['pre_filter_stop'] = 768
default_param['pre_filter_start'] = -1
default_param['pre_filter_stop'] = -1
default_param['band'] = {}


default_param['band'][1] = {
    'sr': 11025,
    'hl': 128,
    'n_fft': 960,
    'crop_start': 0,
    'crop_stop': 245,
    'lpf_start': 61, # inference only
    'res_type': 'polyphase'
}

default_param['band'][2] = {
    'sr': 44100,
    'hl': 512,
    'n_fft': 1536,
    'crop_start': 24,
    'crop_stop': 547,
    'hpf_start': 81, # inference only
    'res_type': 'sinc_best'
}

N_BINS = 'n_bins'

def int_keys(d):
    r = {}
@ -40,20 +19,14 @@ def int_keys(d):
        r[k] = v
    return r


class ModelParameters(object):
    def __init__(self, config_path=''):
        if '.pth' == pathlib.Path(config_path).suffix:
            import zipfile

            with zipfile.ZipFile(config_path, 'r') as zip:
                self.param = json.loads(zip.read('param.json'), object_pairs_hook=int_keys)
        elif '.json' == pathlib.Path(config_path).suffix:
            with open(config_path, 'r') as f:
                self.param = json.loads(f.read(), object_pairs_hook=int_keys)
        else:
            self.param = default_param

        for k in ['mid_side', 'mid_side_b', 'mid_side_b2', 'stereo_w', 'stereo_n', 'reverse']:
            if not k in self.param:
                self.param[k] = False

        if N_BINS in self.param:
            self.param['bins'] = self.param[N_BINS]

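The `int_keys` hook matters because JSON object keys are always strings, while the VR code indexes bands as `param['band'][1]`. The body of `int_keys` is partly elided by the hunk boundary above, so the sketch below is a hypothetical stand-in (`int_keys_sketch`, `load_params_sketch`) that assumes it converts digit-only keys to `int`; it also replays the visible defaults loop and the new `n_bins` to `bins` mapping.

```python
import json

N_BINS = 'n_bins'
STEREO_FLAGS = ['mid_side', 'mid_side_b', 'mid_side_b2', 'stereo_w', 'stereo_n', 'reverse']


def int_keys_sketch(pairs):
    """Hypothetical stand-in for int_keys: turn digit-only JSON keys such as "1" into ints."""
    result = {}
    for key, value in pairs:
        if key.isdigit():
            key = int(key)
        result[key] = value
    return result


def load_params_sketch(text):
    """Mimic the visible part of ModelParameters.__init__ for a JSON string."""
    # object_pairs_hook runs for every JSON object, including nested ones.
    param = json.loads(text, object_pairs_hook=int_keys_sketch)
    for k in STEREO_FLAGS:
        param.setdefault(k, False)  # missing stereo flags default to False
    if N_BINS in param:
        param['bins'] = param[N_BINS]  # newer files spell the bin count "n_bins"
    return param


p = load_params_sketch('{"n_bins": 672, "band": {"1": {"sr": 7350}}, "sr": 44100}')
print(p['bins'], p['band'][1]['sr'], p['stereo_n'])  # 672 7350 False
```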
55
lib_v5/vr_network/modelparams/4band_v3_sn.json
Normal file
@ -0,0 +1,55 @@
{
    "n_bins": 672,
    "unstable_bins": 8,
    "stable_bins": 530,
    "band": {
        "1": {
            "sr": 7350,
            "hl": 80,
            "n_fft": 640,
            "crop_start": 0,
            "crop_stop": 85,
            "lpf_start": 25,
            "lpf_stop": 53,
            "res_type": "polyphase"
        },
        "2": {
            "sr": 7350,
            "hl": 80,
            "n_fft": 320,
            "crop_start": 4,
            "crop_stop": 87,
            "hpf_start": 25,
            "hpf_stop": 12,
            "lpf_start": 31,
            "lpf_stop": 62,
            "res_type": "polyphase"
        },
        "3": {
            "sr": 14700,
            "hl": 160,
            "n_fft": 512,
            "crop_start": 17,
            "crop_stop": 216,
            "hpf_start": 48,
            "hpf_stop": 24,
            "lpf_start": 139,
            "lpf_stop": 210,
            "res_type": "polyphase"
        },
        "4": {
            "sr": 44100,
            "hl": 480,
            "n_fft": 960,
            "crop_start": 78,
            "crop_stop": 383,
            "hpf_start": 130,
            "hpf_stop": 86,
            "convert_channels": "stereo_n",
            "res_type": "kaiser_fast"
        }
    },
    "sr": 44100,
    "pre_filter_start": 668,
    "pre_filter_stop": 672
}
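For a multi-band VR parameter file such as `4band_v3_sn.json`, the per-band crop ranges should tile the combined spectrogram: each band contributes rows `crop_start:crop_stop` of its own STFT, stacked from band 1 upward. Assuming that stacking rule (it is not part of this diff), the sketch below computes where each band lands and checks that the total matches `n_bins`. `band_layout` is a hypothetical helper, and the tuples are copied from the file above.

```python
# (n_fft, crop_start, crop_stop) per band, copied from 4band_v3_sn.json.
BANDS = {
    1: (640, 0, 85),
    2: (320, 4, 87),
    3: (512, 17, 216),
    4: (960, 78, 383),
}
N_BINS = 672


def band_layout(bands):
    """Return {band: (row_start, row_stop)} in the stacked spectrogram, lowest band first."""
    layout, offset = {}, 0
    for band in sorted(bands):
        n_fft, start, stop = bands[band]
        # A band can only contribute bins that its own STFT produces (n_fft // 2 + 1).
        assert 0 <= start < stop <= n_fft // 2 + 1, f"band {band} crop out of range"
        layout[band] = (offset, offset + stop - start)
        offset += stop - start
    return layout


layout = band_layout(BANDS)
print(layout)  # {1: (0, 85), 2: (85, 168), 3: (168, 367), 4: (367, 672)}
assert layout[max(layout)][1] == N_BINS
```

The four crops contribute 85 + 83 + 199 + 305 = 672 rows, which agrees with `"n_bins": 672` and with `"pre_filter_stop": 672`.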
@ -40,20 +40,20 @@ class BaseNet(nn.Module):

class CascadedNet(nn.Module):

    def __init__(self, n_fft, nn_arch_size, nout=32, nout_lstm=128):
    def __init__(self, n_fft, nn_arch_size=51000, nout=32, nout_lstm=128):
        super(CascadedNet, self).__init__()

        self.max_bin = n_fft // 2
        self.output_bin = n_fft // 2 + 1
        self.nin_lstm = self.max_bin // 2
        self.offset = 64
        nout = 64 if nn_arch_size == 218409 else nout

        #print(nout, nout_lstm, n_fft)

        self.stg1_low_band_net = nn.Sequential(
            BaseNet(2, nout // 2, self.nin_lstm // 2, nout_lstm),
            layers.Conv2DBNActiv(nout // 2, nout // 4, 1, 1, 0)
        )

        self.stg1_high_band_net = BaseNet(2, nout // 4, self.nin_lstm // 2, nout_lstm // 2)

        self.stg2_low_band_net = nn.Sequential(

@ -1,219 +0,0 @@
{
    "0ddfc0eb5792638ad5dc27850236c246": {
        "compensate": 1.035,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Vocals"
    },
    "26d308f91f3423a67dc69a6d12a8793d": {
        "compensate": 1.035,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 9,
        "mdx_n_fft_scale_set": 8192,
        "primary_stem": "Other"
    },
    "2cdd429caac38f0194b133884160f2c6": {
        "compensate": 1.035,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Instrumental"
    },
    "2f5501189a2f6db6349916fabe8c90de": {
        "compensate": 1.035,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Vocals"
    },
    "398580b6d5d973af3120df54cee6759d": {
        "compensate": 1.75,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Vocals"
    },
    "488b3e6f8bd3717d9d7c428476be2d75": {
        "compensate": 1.035,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Instrumental"
    },
    "4910e7827f335048bdac11fa967772f9": {
        "compensate": 1.035,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 7,
        "mdx_n_fft_scale_set": 4096,
        "primary_stem": "Drums"
    },
    "53c4baf4d12c3e6c3831bb8f5b532b93": {
        "compensate": 1.035,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Vocals"
    },
    "5d343409ef0df48c7d78cce9f0106781": {
        "compensate": 1.075,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Vocals"
    },
    "5f6483271e1efb9bfb59e4a3e6d4d098": {
        "compensate": 1.035,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 9,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Vocals"
    },
    "65ab5919372a128e4167f5e01a8fda85": {
        "compensate": 1.035,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 8192,
        "primary_stem": "Other"
    },
    "6703e39f36f18aa7855ee1047765621d": {
        "compensate": 1.035,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 9,
        "mdx_n_fft_scale_set": 16384,
        "primary_stem": "Bass"
    },
    "6b31de20e84392859a3d09d43f089515": {
        "compensate": 1.035,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Vocals"
    },
    "867595e9de46f6ab699008295df62798": {
        "compensate": 1.075,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Vocals"
    },
    "a3cd63058945e777505c01d2507daf37": {
        "compensate": 1.035,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Vocals"
    },
    "b33d9b3950b6cbf5fe90a32608924700": {
        "compensate": 1.075,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Vocals"
    },
    "c3b29bdce8c4fa17ec609e16220330ab": {
        "compensate": 1.035,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 16384,
        "primary_stem": "Bass"
    },
    "ceed671467c1f64ebdfac8a2490d0d52": {
        "compensate": 1.035,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Instrumental"
    },
    "d2a1376f310e4f7fa37fb9b5774eb701": {
        "compensate": 1.035,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Instrumental"
    },
    "d7bff498db9324db933d913388cba6be": {
        "compensate": 1.035,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Vocals"
    },
    "d94058f8c7f1fae4164868ae8ae66b20": {
        "compensate": 1.035,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Vocals"
    },
    "dc41ede5961d50f277eb846db17f5319": {
        "compensate": 1.035,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 9,
        "mdx_n_fft_scale_set": 4096,
        "primary_stem": "Drums"
    },
    "e5572e58abf111f80d8241d2e44e7fa4": {
        "compensate": 1.035,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Instrumental"
    },
    "e7324c873b1f615c35c1967f912db92a": {
        "compensate": 1.075,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Vocals"
    },
    "1c56ec0224f1d559c42fd6fd2a67b154": {
        "compensate": 1.035,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 5120,
        "primary_stem": "Instrumental"
    },
    "f2df6d6863d8f435436d8b561594ff49": {
        "compensate": 1.035,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Instrumental"
    },
    "b06327a00d5e5fbc7d96e1781bbdb596": {
        "compensate": 1.035,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Instrumental"
    },
    "94ff780b977d3ca07c7a343dab2e25dd": {
        "compensate": 1.035,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Instrumental"
    },
    "73492b58195c3b52d34590d5474452f6": {
        "compensate": 1.075,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Vocals"
    },
    "1d64a6d2c30f709b8c9b4ce1366d96ee": {
        "compensate": 1.035,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 5120,
        "primary_stem": "Instrumental"
    },
    "203f2a3955221b64df85a41af87cf8f0": {
        "compensate": 1.035,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Instrumental"
    }
}
34
models/MDX_Net_Models/model_data/mdx_c_configs/model1.yaml
Normal file
@ -0,0 +1,34 @@
audio:
  chunk_size: 260096
  dim_f: 4096
  dim_t: 128
  hop_length: 2048
  n_fft: 8192
  num_channels: 2
  sample_rate: 44100
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
training:
  batch_size: 8
  grad_clip: 0
  instruments:
  - Vocals
  - Drums
  - Bass
  - Other
  lr: 5.0e-05
  target_instrument: null
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8
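`TFC_TDF_net.__init__` derives every layer width from the YAML config. The sketch below replays that bookkeeping in plain Python, without building any tensors, for the values in `model1.yaml`. The function name and keyword arguments are illustrative only, and it assumes `scale[1]` is the frequency factor, as the `f = f // scale[1]` lines in the class indicate.

```python
def tfc_tdf_net_shapes(num_subbands, num_audio_channels, dim_f,
                       num_channels, growth, num_scales, scale_f):
    """Replay the channel/frequency bookkeeping of TFC_TDF_net.__init__ (no tensors built)."""
    dim_c = num_subbands * num_audio_channels * 2  # subbands x channels x (real, imag)
    c, f = num_channels, dim_f // num_subbands      # first_conv output channels, bins per subband
    encoder = []
    for _ in range(num_scales):
        encoder.append((c, f))  # (channels, freq bins) seen by this scale's TFC_TDF block
        f //= scale_f           # the class divides f by scale[1] after each Downscale
        c += growth             # ... and widens the channels by `growth`
    return dim_c, encoder, (c, f)


# Values taken from model1.yaml.
dim_c, encoder, bottleneck = tfc_tdf_net_shapes(
    num_subbands=4, num_audio_channels=2, dim_f=4096,
    num_channels=128, growth=64, num_scales=5, scale_f=2)
print(dim_c)       # 16
print(encoder)     # [(128, 1024), (192, 512), (256, 256), (320, 128), (384, 64)]
print(bottleneck)  # (448, 32)
```

The decoder walks the same list in reverse, and each decoder `TFC_TDF` receives `2 * c` input channels because the matching encoder output is concatenated onto the upscaled tensor.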
34
models/MDX_Net_Models/model_data/mdx_c_configs/model2.yaml
Normal file
@ -0,0 +1,34 @@
audio:
  chunk_size: 260096
  dim_f: 4096
  dim_t: 128
  hop_length: 2048
  n_fft: 8192
  num_channels: 2
  sample_rate: 44100
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 256
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
training:
  batch_size: 8
  grad_clip: 0
  instruments:
  - Vocals
  - Drums
  - Bass
  - Other
  lr: 3.0e-05
  target_instrument: null
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8
34
models/MDX_Net_Models/model_data/mdx_c_configs/model3.yaml
Normal file
@ -0,0 +1,34 @@
audio:
  chunk_size: 260096
  dim_f: 4096
  dim_t: 128
  hop_length: 2048
  n_fft: 12288
  num_channels: 2
  sample_rate: 44100
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
training:
  batch_size: 8
  grad_clip: 0
  instruments:
  - Vocals
  - Drums
  - Bass
  - Other
  lr: 5.0e-05
  target_instrument: Vocals
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8
39
models/MDX_Net_Models/model_data/mdx_c_configs/modelA.yaml
Normal file
@ -0,0 +1,39 @@
audio:
  chunk_size: 261120
  dim_f: 4096
  dim_t: 256
  hop_length: 1024
  min_mean_abs: 0.01
  n_fft: 8192
  num_channels: 2
  sample_rate: 44100
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 64
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
training:
  batch_size: 6
  coarse_loss_clip: true
  ema_momentum: 0.999
  grad_clip: null
  instruments:
  - Vocals
  - Drums
  - Bass
  - Other
  lr: 0.0001
  num_steps: 100000
  q: 0.4
  target_instrument: null
inference:
  batch_size: 2
  dim_t: 256
  num_overlap: 8
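`chunk_size`, `hop_length` and `dim_t` in the `audio` section are linked. Assuming a centered STFT (the default `center=True` of `torch.stft`, which pads `n_fft // 2` samples on each side), a chunk of `chunk_size` samples produces `chunk_size // hop_length + 1` frames. The short check below confirms that this equals `dim_t` for `model1.yaml` and `modelA.yaml`; `centered_stft_frames` is a hypothetical helper name.

```python
def centered_stft_frames(num_samples: int, hop_length: int) -> int:
    """Frames produced by a centered STFT with an even n_fft: num_samples // hop + 1."""
    return num_samples // hop_length + 1


# (chunk_size, hop_length, audio.dim_t) copied from the YAML files above.
configs = {
    "model1.yaml": (260096, 2048, 128),
    "modelA.yaml": (261120, 1024, 256),
}

for name, (chunk_size, hop_length, dim_t) in configs.items():
    frames = centered_stft_frames(chunk_size, hop_length)
    print(name, frames, frames == dim_t)
# model1.yaml 128 True
# modelA.yaml 256 True
```

Both chunk sizes are exact multiples of the hop (2048 x 127 and 1024 x 255), so changing `hop_length` without adjusting `chunk_size` changes the number of frames per chunk.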
41
models/MDX_Net_Models/model_data/mdx_c_configs/modelB.yaml
Normal file
@ -0,0 +1,41 @@
audio:
  chunk_size: 261120
  dim_f: 4096
  dim_t: 256
  hop_length: 1024
  min_mean_abs: 0.01
  n_fft: 8192
  num_channels: 2
  sample_rate: 44100
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 64
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
training:
  batch_size: 6
  coarse_loss_clip: false
  datasets:
  - ../data/moises/bleeding
  ema_momentum: 0.999
  grad_clip: null
  instruments:
  - Vocals
  - Drums
  - Bass
  - Other
  lr: 0.0001
  num_steps: 150000
  q: 0.93
  target_instrument: null
inference:
  batch_size: 2
  dim_t: 256
  num_overlap: 8
@ -0,0 +1,36 @@
audio:
  chunk_size: 260096
  dim_f: 4096
  dim_t: 256
  hop_length: 2048
  n_fft: 12288
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
name: epoch_10.ckpt
training:
  batch_size: 16
  grad_clip: 0
  instruments:
  - Vocals
  - Instrumental
  lr: 5.0e-05
  target_instrument: null
  num_epochs: 100
  num_steps: 1000
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8
@ -0,0 +1,36 @@
audio:
  chunk_size: 260096
  dim_f: 6144
  dim_t: 128
  hop_length: 2048
  n_fft: 12288
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 6
  scale:
  - 2
  - 2
training:
  batch_size: 14
  grad_clip: 0
  instruments:
  - Vocals
  - Instrumental
  lr: 3.0e-05
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  augmentation: 1
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8
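`dim_f` sets how much of the spectrum a model sees. Assuming the STFT wrapper keeps only the lowest `dim_f` of the `n_fft // 2 + 1` bins (which is what passing `config.audio.dim_f` to `STFT` in `TFC_TDF_net` suggests), the retained band ends at roughly `dim_f * sample_rate / n_fft`. `top_frequency_hz` is a hypothetical helper.

```python
def top_frequency_hz(dim_f: int, sample_rate: int, n_fft: int) -> float:
    """Approximate upper edge (Hz) of the lowest dim_f STFT bins; bin spacing is sample_rate / n_fft."""
    return dim_f * sample_rate / n_fft


print(top_frequency_hz(4096, 44100, 12288))  # 14700.0
print(top_frequency_hz(6144, 44100, 12288))  # 22050.0, the Nyquist limit at 44.1 kHz
print(top_frequency_hz(4096, 44100, 8192))   # 22050.0
```

With `n_fft: 12288`, `dim_f: 4096` (as in `model3.yaml` and the first 36-line config above) keeps content up to about 14.7 kHz, while `dim_f: 6144` reaches the full 22.05 kHz band; `model1.yaml` gets the full band with `n_fft: 8192` and `dim_f: 4096`.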
@ -0,0 +1,36 @@
audio:
  chunk_size: 260096
  dim_f: 6144
  dim_t: 128
  hop_length: 2048
  n_fft: 12288
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001
model:
  act: gelu
  bottleneck_factor: 4
  growth: 128
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 6
  scale:
  - 2
  - 2
training:
  batch_size: 14
  grad_clip: 0
  instruments:
  - Vocals
  - Instrumental
  lr: 2.0e-05
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  augmentation: 1
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8
@ -0,0 +1,39 @@
audio:
  chunk_size: 261120
  dim_f: 6144
  dim_t: 256
  hop_length: 1024
  n_fft: 12288
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001
model:
  act: gelu
  bottleneck_factor: 4
  growth: 128
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 6
  scale:
  - 2
  - 2
training:
  batch_size: 6
  grad_clip: 0
  instruments:
  - Vocals
  - Instrumental
  lr: 1.0e-05
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  augmentation: 1
  q: 0.95
  coarse_loss_clip: true
  ema_momentum: 0.999
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8
@ -0,0 +1,40 @@
audio:
  chunk_size: 261120
  dim_f: 6144
  dim_t: 256
  hop_length: 1024
  n_fft: 12288
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001
model:
  act: gelu
  bottleneck_factor: 4
  growth: 128
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 6
  scale:
  - 2
  - 2
training:
  batch_size: 6
  grad_clip: 0
  instruments:
  - Vocals
  - Instrumental
  lr: 0.7e-05
  patience: 2
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  augmentation: 1
  q: 0.95
  coarse_loss_clip: true
  ema_momentum: 0.999
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8
@ -0,0 +1,43 @@
audio:
  chunk_size: 261120
  dim_f: 4096
  dim_t: 256
  hop_length: 1024
  n_fft: 8192
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001
model:
  act: gelu
  bottleneck_factor: 4
  growth: 128
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
training:
  batch_size: 6
  grad_clip: 0
  instruments:
  - Vocals
  - Instrumental
  lr: 1.0e-05
  patience: 2
  reduce_factor: 0.95
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  augmentation: 1
  augmentation_type: simple1
  augmentation_mix: true
  q: 0.95
  coarse_loss_clip: true
  ema_momentum: 0.999
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8
41
models/MDX_Net_Models/model_data/mdx_c_configs/sndfx.yaml
Normal file
@ -0,0 +1,41 @@
audio:
  chunk_size: 261120
  dim_f: 1024
  dim_t: 256
  hop_length: 1024
  min_mean_abs: 0.01
  n_fft: 2048
  num_channels: 2
  sample_rate: 44100
  stereo_prob: 0.7
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 64
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
training:
  batch_size: 8
  ema_momentum: 0.999
  grad_clip: null
  instruments:
  - Music
  - Speech
  - SFX
  lr: 0.0001
  num_steps: 30000
  target_instrument: null
inference:
  batch_size: 8
  dim_t: 256
  instruments:
  - Music
  - Dialog
  - Effect
  num_overlap: 8
@ -14,7 +14,7 @@
        "primary_stem": "Other"
    },
    "2cdd429caac38f0194b133884160f2c6": {
        "compensate": 1.035,
        "compensate": 1.045,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
@ -25,7 +25,8 @@
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Vocals"
        "primary_stem": "Vocals",
        "is_karaoke": true
    },
    "398580b6d5d973af3120df54cee6759d": {
        "compensate": 1.75,
@ -49,7 +50,7 @@
        "primary_stem": "Drums"
    },
    "53c4baf4d12c3e6c3831bb8f5b532b93": {
        "compensate": 1.035,
        "compensate": 1.043,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
@ -91,21 +92,21 @@
        "primary_stem": "Vocals"
    },
    "867595e9de46f6ab699008295df62798": {
        "compensate": 1.075,
        "compensate": 1.03,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Vocals"
    },
    "a3cd63058945e777505c01d2507daf37": {
        "compensate": 1.035,
        "compensate": 1.03,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Vocals"
    },
    "b33d9b3950b6cbf5fe90a32608924700": {
        "compensate": 1.075,
        "compensate": 1.03,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
@ -154,21 +155,21 @@
        "primary_stem": "Drums"
    },
    "e5572e58abf111f80d8241d2e44e7fa4": {
        "compensate": 1.035,
        "compensate": 1.028,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Instrumental"
    },
    "e7324c873b1f615c35c1967f912db92a": {
        "compensate": 1.075,
        "compensate": 1.03,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Vocals"
    },
    "1c56ec0224f1d559c42fd6fd2a67b154": {
        "compensate": 1.035,
        "compensate": 1.025,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 5120,
@ -189,25 +190,33 @@
        "primary_stem": "Instrumental"
    },
    "94ff780b977d3ca07c7a343dab2e25dd": {
        "compensate": 1.035,
        "compensate": 1.039,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Instrumental"
    },
    "73492b58195c3b52d34590d5474452f6": {
        "compensate": 1.075,
        "compensate": 1.043,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Vocals"
    },
    "970b3f9492014d18fefeedfe4773cb42": {
        "compensate": 1.009,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Vocals"
    },
    "1d64a6d2c30f709b8c9b4ce1366d96ee": {
        "compensate": 1.035,
        "compensate": 1.065,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 5120,
        "primary_stem": "Instrumental"
        "primary_stem": "Instrumental",
        "is_karaoke": true
    },
    "203f2a3955221b64df85a41af87cf8f0": {
        "compensate": 1.035,
@ -229,5 +238,114 @@
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Instrumental"
    },
    "cc63408db3d80b4d85b0287d1d7c9632": {
        "compensate": 1.033,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Instrumental"
    },
    "cd5b2989ad863f116c855db1dfe24e39": {
        "compensate": 1.035,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 9,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Reverb"
    },
    "55657dd70583b0fedfba5f67df11d711": {
        "compensate": 1.022,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 6144,
        "primary_stem": "Instrumental"
    },
    "b6bccda408a436db8500083ef3491e8b": {
        "compensate": 1.02,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Instrumental"
    },
    "8a88db95c7fb5dbe6a095ff2ffb428b1": {
        "compensate": 1.026,
        "mdx_dim_f_set": 2048,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 5120,
        "primary_stem": "Instrumental"
    },
    "b78da4afc6512f98e4756f5977f5c6b9": {
        "compensate": 1.021,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Instrumental"
    },
    "77d07b2667ddf05b9e3175941b4454a0": {
        "compensate": 1.021,
        "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680,
        "primary_stem": "Vocals"
    },
    "2154254ee89b2945b97a7efed6e88820": {
        "config_yaml": "model_2_stem_061321.yaml"
    },
    "063aadd735d58150722926dcbf5852a9": {
        "config_yaml": "model_2_stem_061321.yaml"
    },
    "c09f714d978b41d718facfe3427e6001": {
        "config_yaml": "model_2_stem_061321.yaml"
    },
    "fe96801369f6a148df2720f5ced88c19": {
        "config_yaml": "model3.yaml"
    },
    "02e8b226f85fb566e5db894b9931c640": {
        "config_yaml": "model2.yaml"
    },
    "e3de6d861635ab9c1d766149edd680d6": {
        "config_yaml": "model1.yaml"
    },
    "3f2936c554ab73ce2e396d54636bd373": {
        "config_yaml": "modelB.yaml"
    },
    "890d0f6f82d7574bca741a9e8bcb8168": {
        "config_yaml": "modelB.yaml"
    },
    "63a3cb8c37c474681049be4ad1ba8815": {
        "config_yaml": "modelB.yaml"
    },
    "a7fc5d719743c7fd6b61bd2b4d48b9f0": {
        "config_yaml": "modelA.yaml"
    },
    "3567f3dee6e77bf366fcb1c7b8bc3745": {
        "config_yaml": "modelA.yaml"
    },
    "a28f4d717bd0d34cd2ff7a3b0a3d065e": {
        "config_yaml": "modelA.yaml"
    },
    "c9971a18da20911822593dc81caa8be9": {
        "config_yaml": "sndfx.yaml"
    },
    "57d94d5ed705460d21c75a5ac829a605": {
        "config_yaml": "sndfx.yaml"
    },
    "e7a25f8764f25a52c1b96c4946e66ba2": {
        "config_yaml": "sndfx.yaml"
    },
    "104081d24e37217086ce5fde09147ee1": {
        "config_yaml": "model_2_stem_061321.yaml"
    },
    "1e6165b601539f38d0a9330f3facffeb": {
        "config_yaml": "model_2_stem_061321.yaml"
    },
    "fe0108464ce0d8271be5ab810891bd7c": {
        "config_yaml": "model_2_stem_full_band.yaml"
    },
    "e9b82ec90ee56c507a3a982f1555714c": {
        "config_yaml": "model_2_stem_full_band_2.yaml"
    },
    "99b6ceaae542265a3b6d657bf9fde79f": {
        "config_yaml": "model_2_stem_full_band_8k.yaml"
    }
}
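The updated model-data file now holds two kinds of entries: MDX-Net entries with explicit `compensate` and `mdx_*` settings, and entries that only name a `config_yaml` from `mdx_c_configs/`. The sketch below shows one way to branch on that shape. `describe_model` is a hypothetical helper, the JSON is a three-entry excerpt of the file above, and the 32-character keys are treated as opaque lookup keys without assuming how UVR computes them.

```python
import json

# A three-entry excerpt of the updated model-data JSON.
MODEL_DATA = json.loads("""
{
    "cc63408db3d80b4d85b0287d1d7c9632": {"compensate": 1.033, "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 8, "mdx_n_fft_scale_set": 6144, "primary_stem": "Instrumental"},
    "cd5b2989ad863f116c855db1dfe24e39": {"compensate": 1.035, "mdx_dim_f_set": 3072,
        "mdx_dim_t_set": 9, "mdx_n_fft_scale_set": 6144, "primary_stem": "Reverb"},
    "e9b82ec90ee56c507a3a982f1555714c": {"config_yaml": "model_2_stem_full_band_2.yaml"}
}
""")


def describe_model(model_hash: str) -> str:
    """Hypothetical helper: summarize which kind of settings an entry carries."""
    entry = MODEL_DATA.get(model_hash)
    if entry is None:
        return "unknown model"
    if "config_yaml" in entry:
        # Entries of this shape only point at a YAML file in mdx_c_configs/.
        return f"MDX-C config: {entry['config_yaml']}"
    return (f"MDX-Net: dim_f={entry['mdx_dim_f_set']}, dim_t_set={entry['mdx_dim_t_set']}, "
            f"n_fft={entry['mdx_n_fft_scale_set']}, stem={entry['primary_stem']}, "
            f"compensate={entry['compensate']}")


print(describe_model("e9b82ec90ee56c507a3a982f1555714c"))
print(describe_model("cd5b2989ad863f116c855db1dfe24e39"))
```

Branching on the presence of `config_yaml` keeps the two entry formats in one file without a separate type field.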
@ -7,7 +7,16 @@
    "UVR-MDX-NET-Inst_1": "UVR-MDX-NET Inst 1",
    "UVR-MDX-NET-Inst_2": "UVR-MDX-NET Inst 2",
    "UVR-MDX-NET-Inst_3": "UVR-MDX-NET Inst 3",
    "UVR-MDX-NET-Inst_4": "UVR-MDX-NET Inst 4",
    "UVR-MDX-NET-Inst_Main": "UVR-MDX-NET Inst Main",
    "UVR-MDX-NET-Inst_Main_2": "UVR-MDX-NET Inst Main 2",
    "UVR-MDX-NET-Inst_HQ_1": "UVR-MDX-NET Inst HQ 1",
    "UVR_MDXNET_KARA_2": "UVR-MDX-NET Karaoke 2"
    "UVR-MDX-NET-Inst_HQ_2": "UVR-MDX-NET Inst HQ 2",
    "UVR-MDX-NET-Inst_HQ_3": "UVR-MDX-NET Inst HQ 3",
    "UVR_MDXNET_KARA_2": "UVR-MDX-NET Karaoke 2",
    "Kim_Vocal_1": "Kim Vocal 1",
    "Kim_Vocal_2": "Kim Vocal 2",
    "Kim_Inst": "Kim Inst",
    "MDX23C-8KFFT-InstVoc_HQ.ckpt": "MDX23C-InstVoc HQ",
    "Reverb_HQ_By_FoxJoy": "Reverb HQ"
}
BIN
models/VR_Models/UVR-DeNoise-Lite.pth
Normal file
@ -13,7 +13,7 @@
    },
    "2aa34fbc01f8e6d2bf509726481e7142": {
        "vr_model_param": "4band_44100",
        "primary_stem": "Other"
        "primary_stem": "No Piano"
    },
    "3e18f639b11abea7361db1a4a91c2559": {
        "vr_model_param": "4band_44100",
@ -29,7 +29,8 @@
    },
    "6b5916069a49be3fe29d4397ecfd73fa": {
        "vr_model_param": "3band_44100_msb2",
        "primary_stem": "Instrumental"
        "primary_stem": "Instrumental",
        "is_karaoke": true
    },
    "74b3bc5fa2b69f29baf7839b858bc679": {
        "vr_model_param": "4band_44100",
@ -85,10 +86,52 @@
    },
    "f6ea8473ff86017b5ebd586ccacf156b": {
        "vr_model_param": "4band_v2_sn",
        "primary_stem": "Instrumental"
        "primary_stem": "Instrumental",
        "is_karaoke": true
    },
    "fd297a61eafc9d829033f8b987c39a3d": {
        "vr_model_param": "1band_sr32000_hl512",
        "primary_stem": "Instrumental"
    },
    "0ec76fd9e65f81d8b4fbd13af4826ed8": {
        "vr_model_param": "4band_v3",
        "primary_stem": "No Woodwinds"
    },
    "0fb9249ffe4ffc38d7b16243f394c0ff": {
        "vr_model_param": "4band_v3",
        "primary_stem": "No Reverb"
    },
    "6857b2972e1754913aad0c9a1678c753": {
        "vr_model_param": "4band_v3",
        "primary_stem": "Echo",
        "nout": 48,
        "nout_lstm": 128
    },
    "f200a145434efc7dcf0cd093f517ed52": {
        "vr_model_param": "4band_v3",
        "primary_stem": "No Echo",
        "nout": 48,
        "nout_lstm": 128
    },
    "44c55d8b5d2e3edea98c2b2bf93071c7": {
        "vr_model_param": "4band_v3",
        "primary_stem": "Noise",
        "nout": 48,
        "nout_lstm": 128
    },
    "51ea8c43a6928ed3c10ef5cb2707d57b": {
        "vr_model_param": "1band_sr44100_hl1024",
        "primary_stem": "Noise",
        "nout": 16,
        "nout_lstm": 128
    },
    "944950a9c5963a5eb70b445d67b7068a": {
        "vr_model_param": "4band_v3_sn",
        "primary_stem": "Vocals",
        "nout": 64,
        "nout_lstm": 128,
        "is_karaoke": false,
        "is_bv_model": true,
        "is_bv_model_rebalanced": 0.9
    }
}
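Several new VR entries carry `nout`/`nout_lstm`, which line up with the `CascadedNet(n_fft, nn_arch_size=51000, nout=32, nout_lstm=128)` signature in the `CascadedNet` hunk above. How UVR actually forwards these values is not part of this diff; the sketch assumes that entries without them fall back to the constructor defaults, and it repeats the constructor's own `nout = 64` override for `nn_arch_size == 218409`. `resolve_capacity` is a hypothetical helper.

```python
# Defaults from the CascadedNet.__init__ signature in the diff.
DEFAULT_NOUT = 32
DEFAULT_NOUT_LSTM = 128


def resolve_capacity(entry: dict, nn_arch_size: int = 51000) -> tuple:
    """Hypothetical helper: choose (nout, nout_lstm) for one VR model-data entry."""
    nout = entry.get("nout", DEFAULT_NOUT)
    nout_lstm = entry.get("nout_lstm", DEFAULT_NOUT_LSTM)
    # Mirrors the CascadedNet line that forces nout=64 for nn_arch_size 218409.
    if nn_arch_size == 218409:
        nout = 64
    return nout, nout_lstm


echo_entry = {"vr_model_param": "4band_v3", "primary_stem": "Echo", "nout": 48, "nout_lstm": 128}
print(resolve_capacity(echo_entry))                  # (48, 128)
print(resolve_capacity({"primary_stem": "Vocals"}))  # (32, 128)
```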
@ -11,8 +11,9 @@ julius==0.2.7
kthread==0.2.3
librosa==0.9.2
llvmlite==0.39.1
matchering==2.0.6
ml_collections==0.1.1
natsort==8.2.0
numba==0.56.4
numpy==1.23.4
omegaconf==2.2.3
opencv-python==4.6.0.66
@ -32,7 +33,7 @@ resampy==0.2.2
scipy==1.9.3
soundfile==0.11.0
soundstretch==1.2
torch==1.13.1
torch==1.9.0+cu111
tqdm
urllib3==1.26.12
wget==3.2
