Merge pull request #818 from Anjok07/v5.6

V5.6
This commit is contained in:
Anjok07 2023-09-25 21:25:51 -05:00 committed by GitHub
commit a897c05a82
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
55 changed files with 6782 additions and 2945 deletions


@@ -1,5 +1,5 @@
# Ultimate Vocal Remover GUI v5.5.1
<img src="https://raw.githubusercontent.com/Anjok07/ultimatevocalremovergui/master/gui_data/img/UVR_5_5_1.png?raw=true" />
# Ultimate Vocal Remover GUI v5.6
<img src="https://raw.githubusercontent.com/Anjok07/ultimatevocalremovergui/v5.6/gui_data/img/UVR_5_6_0.png?raw=true" />
[![Release](https://img.shields.io/github/release/anjok07/ultimatevocalremovergui.svg)](https://github.com/anjok07/ultimatevocalremovergui/releases/latest)
[![Downloads](https://img.shields.io/github/downloads/anjok07/ultimatevocalremovergui/total.svg)](https://github.com/anjok07/ultimatevocalremovergui/releases)
@@ -73,7 +73,6 @@ In order to use the Time Stretch or Change Pitch tool, you'll need Rubber Band.
</details>
### MacOS Installation
- Please Note:
- This bundle is intended for those running macOS Catalina and above.
- Application functionality for systems running macOS Mojave or lower is not guaranteed.
@@ -167,7 +166,6 @@ pip3 install -r requirements.txt
</details>
### Other Application Notes
- Nvidia GTX 1060 6GB is the minimum requirement for GPU conversions.
- Nvidia GPUs with at least 8GBs of V-RAM are recommended.
- AMD Radeon GPUs are not supported at this time.
@@ -178,78 +176,10 @@ pip3 install -r requirements.txt
- Conversion times will significantly depend on your hardware.
- These models are computationally intensive.
## Change Log
### Most Recent Changes:
- Fixed Download Center model list issue.
- Fixed audio clip in ensemble mode.
- Fixed output model name issue in ensemble mode.
- Added "Batch Mode" for MDX-Net to increase performance.
- Batch Mode is more memory efficient.
- Batch Mode produces the best output, regardless of batch size.
- Added Batch Mode for VR Architecture.
- Added Mixer Mode for Demucs.
- This option may improve separation for some 4-stem models.
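For illustration, the memory benefit of a batch mode can be sketched as a simple chunk scheduler. This is a hypothetical helper, not UVR's actual code: the separator sees a fixed number of segments per forward pass, so peak memory stays bounded no matter how long the track is.

```python
def iter_batches(segments, batch_size):
    # Hypothetical sketch of a "Batch Mode" scheduler: yield fixed-size
    # groups of audio segments so each forward pass has bounded memory.
    for i in range(0, len(segments), batch_size):
        yield segments[i:i + batch_size]

segments = list(range(10))                 # stand-ins for audio chunks
batches = list(iter_batches(segments, 4))  # sizes: 4, 4, 2
```

The chunk size trades throughput for memory; the output is identical either way, which matches the note above that Batch Mode's results do not depend on batch size.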
### Fixes & Changes going from UVR v5.4 to v5.5:
- The progress bar is now fully synced up with every process in the application.
- Drag-n-drop feature should now work every time.
- Users can now drop large batches of files and directories as inputs. When directories are dropped, the application will search for any file with an audio extension and add it to the list of inputs.
- Fixed low-resolution icon.
- Added the ability to download models manually if the application can't connect to the internet.
- Various bug fixes for the Download Center.
- Various design changes.
### Performance:
- Model load times are faster.
- Importing/exporting audio files is faster.
### New Options:
- "Select Saved Settings" option - Allows the user to save the current settings of the whole application. You can also load saved settings or reset them to the default.
- "Right-click" menu - Allows for quick access to important options.
- "Help Hints" option - When enabled, users can hover over options to see pop-up text that describes that option. The right-click menu also allows copying the "Help Hint" text.
- Secondary Model Mode - An expanded version of the "Demucs Model" option that was previously exclusive to MDX-Net. It is now available in all three AI Networks and for any stem. Any model can now be Secondary, and the user can choose how much influence it has on the final result.
- Robust caching for ensemble mode, allowing for much faster processing times.
- Clicking the "Input" field will pop up a new window that allows the user to go through all of the selected audio inputs. Within this menu, users can:
- Remove inputs.
- Verify inputs.
- Create samples of selected inputs.
- "Sample Mode" option - Allows the user to process only part of a track to sample settings or a model without running a complete conversion.
- The number in parentheses is the length, in seconds, of the sample that will be generated.
- You can choose the number of seconds to extract from the track in the "Additional Settings" menu.
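A minimal sketch of what a sample-mode helper might look like (hypothetical, not UVR's implementation), taking an excerpt from the middle of a track:

```python
import numpy as np

def make_sample(audio: np.ndarray, sr: int, seconds: int = 30) -> np.ndarray:
    # Hypothetical "Sample Mode" helper: return `seconds` of audio from
    # the middle of the track so settings can be auditioned quickly.
    n = seconds * sr
    if audio.shape[-1] <= n:
        return audio
    start = (audio.shape[-1] - n) // 2
    return audio[..., start:start + n]

track = np.zeros((2, 44100 * 120))       # 2 minutes of stereo silence
sample = make_sample(track, 44100, 30)   # 30-second excerpt
```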
### VR Architecture:
- Ability to toggle "High-End Processing."
- Support for the latest VR architecture.
- Crop Size and Batch Size apply only to models using the latest architecture.
### MDX-NET:
- "Denoise Output" option produces cleaner results, though processing time will be longer. This option replaces Noise Reduction.
- "Spectral Inversion" option uses spectral inversion techniques for a cleaner secondary stem result. This option may slow down the audio export process.
- Secondary stem now has the same frequency cut-off as the main stem.
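The idea behind spectral inversion can be sketched as subtracting the primary stem from the mixture in the complex STFT domain rather than in the time domain. This is an illustrative sketch only, not UVR's actual routine:

```python
import torch

def spectral_subtract(mixture, primary, n_fft=1024, hop=256):
    # Illustrative sketch: subtract the primary stem's spectrogram from
    # the mixture's, then invert back to a time-domain secondary stem.
    win = torch.hann_window(n_fft)
    M = torch.stft(mixture, n_fft, hop, window=win, return_complex=True)
    P = torch.stft(primary, n_fft, hop, window=win, return_complex=True)
    return torch.istft(M - P, n_fft, hop, window=win, length=mixture.shape[-1])

torch.manual_seed(0)
vocals, inst = torch.randn(44100), torch.randn(44100)
residual = spectral_subtract(vocals + inst, vocals)  # recovers ~inst
```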
### Demucs:
- Demucs v4 models are now supported, including the 6-stem model.
- When "All Stems" is not selected, the remaining stems are now combined instead of inverting the selected stem with the mixture.
- A "Pre-process" model that allows the user to run an inference through a robust vocal or instrumental model and separate the remaining stems from its generated instrumental mix. This option can significantly reduce vocal bleed in other Demucs-generated non-vocal stems.
- The Pre-process model is intended for Demucs separations for all stems except vocals and instrumentals.
### Ensemble Mode:
- Ensemble Mode has been extended to include the following:
- "Averaging" is a new algorithm that averages the final results.
- Unlimited models in the ensemble.
- Ability to save different ensembles.
- Ability to ensemble outputs for all individual stem types.
- Ability to choose unique ensemble algorithms.
- Ability to ensemble all 4 Demucs stems at once.
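As a rough illustration, the "Averaging" algorithm can be thought of as an element-wise mean over each model's output (a simplified sketch of the idea only):

```python
import numpy as np

def average_ensemble(outputs):
    # Simplified sketch of the "Averaging" ensemble algorithm: trim all
    # model outputs to a common length and take their element-wise mean.
    n = min(len(o) for o in outputs)
    return np.stack([o[:n] for o in outputs]).mean(axis=0)

model_a = np.array([0.2, 0.4, 0.6])
model_b = np.array([0.4, 0.0, 0.2])
ensembled = average_ensemble([model_a, model_b])  # [0.3, 0.2, 0.4]
```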
## Troubleshooting
### Common Issues

UVR.py (5329 changed lines)

File diff suppressed because it is too large.


@@ -1,4 +1,4 @@
VERSION = 'v5.5.1'
PATCH = 'UVR_Patch_3_31_23_5_5'
PATCH_MAC = 'UVR_Patch_01_10_12_6_50'
PATCH_LINUX = 'UVR_Patch_01_01_23_6_50'
VERSION = 'v5.6.0'
PATCH = 'UVR_Patch_9_25_23_2_1'
PATCH_MAC = 'UVR_Patch_9_25_23_2_1'
PATCH_LINUX = 'UVR_Patch_9_25_23_2_1'


@@ -140,6 +140,8 @@ def apply_model(model, mix, shifts=1, split=True, overlap=0.25, transition_power
be on `device`, while the entire tracks will be stored on `mix.device`.
"""
#print("Progress Bar?: ", type(set_progress_bar))
global fut_length
global bag_num
global prog_bar


@@ -1,13 +1,26 @@
import os
import platform
from screeninfo import get_monitors
from PIL import Image
from PIL import ImageTk
OPERATING_SYSTEM = platform.system()
def get_screen_height():
monitors = get_monitors()
if len(monitors) == 0:
raise Exception("Failed to get screen height")
return monitors[0].height
return monitors[0].height, monitors[0].width
def scale_values(value):
if not SCALE_WIN_SIZE == 1920:
ratio = SCALE_WIN_SIZE/1920 # Approx. 1.3333 for 2K
return value * ratio
else:
return value
SCREEN_HIGHT, SCREEN_WIDTH = get_screen_height()
SCALE_WIN_SIZE = 1920
SCREEN_SIZE_VALUES = {
"normal": {
@@ -20,10 +33,10 @@ SCREEN_SIZE_VALUES = {
'COMMAND_HEIGHT': 141,
'PROGRESS_HEIGHT': 25,
'PADDING': 7,
'WIDTH': 680
},
"small": {
"credits_img":(50, 50),
## App Size
'IMAGE_HEIGHT': 135,
'FILEPATHS_HEIGHT': 85,
'OPTIONS_HEIGHT': 274,
@@ -31,6 +44,7 @@ SCREEN_SIZE_VALUES = {
'COMMAND_HEIGHT': 80,
'PROGRESS_HEIGHT': 6,
'PADDING': 5,
'WIDTH': 680
},
"medium": {
"credits_img":(50, 50),
@@ -42,23 +56,24 @@ SCREEN_SIZE_VALUES = {
'COMMAND_HEIGHT': 115,
'PROGRESS_HEIGHT': 9,
'PADDING': 7,
'WIDTH': 680
},
}
try:
if get_screen_height() >= 900:
if SCREEN_HIGHT >= 900:
determined_size = SCREEN_SIZE_VALUES["normal"]
elif get_screen_height() <= 720:
elif SCREEN_HIGHT <= 720:
determined_size = SCREEN_SIZE_VALUES["small"]
else:
determined_size = SCREEN_SIZE_VALUES["medium"]
except:
determined_size = SCREEN_SIZE_VALUES["normal"]
image_scale_1, image_scale_2 = 20, 33
class ImagePath():
def __init__(self, base_path):
img_path = os.path.join(base_path, 'gui_data', 'img')
credits_path = os.path.join(img_path, 'credits.png')
donate_path = os.path.join(img_path, 'donate.png')
@@ -69,16 +84,28 @@ class ImagePath():
stop_path = os.path.join(img_path, 'stop.png')
play_path = os.path.join(img_path, 'play.png')
pause_path = os.path.join(img_path, 'pause.png')
up_img_path = os.path.join(img_path, "up.png")
down_img_path = os.path.join(img_path, "down.png")
left_img_path = os.path.join(img_path, "left.png")
right_img_path = os.path.join(img_path, "right.png")
clear_img_path = os.path.join(img_path, "clear.png")
copy_img_path = os.path.join(img_path, "copy.png")
self.banner_path = os.path.join(img_path, 'UVR-banner.png')
self.efile_img = self.open_image(path=efile_path,size=(20, 20))
self.stop_img = self.open_image(path=stop_path, size=(20, 20))
self.play_img = self.open_image(path=play_path, size=(20, 20))
self.pause_img = self.open_image(path=pause_path, size=(20, 20))
self.help_img = self.open_image(path=help_path, size=(20, 20))
self.download_img = self.open_image(path=download_path, size=(30, 30))
self.donate_img = self.open_image(path=donate_path, size=(30, 30))
self.key_img = self.open_image(path=key_path, size=(30, 30))
self.efile_img = self.open_image(path=efile_path,size=(image_scale_1, image_scale_1))
self.stop_img = self.open_image(path=stop_path, size=(image_scale_1, image_scale_1))
self.play_img = self.open_image(path=play_path, size=(image_scale_1, image_scale_1))
self.pause_img = self.open_image(path=pause_path, size=(image_scale_1, image_scale_1))
self.help_img = self.open_image(path=help_path, size=(image_scale_1, image_scale_1))
self.download_img = self.open_image(path=download_path, size=(image_scale_2, image_scale_2))
self.donate_img = self.open_image(path=donate_path, size=(image_scale_2, image_scale_2))
self.key_img = self.open_image(path=key_path, size=(image_scale_2, image_scale_2))
self.up_img = self.open_image(path=up_img_path, size=(image_scale_2, image_scale_2))
self.down_img = self.open_image(path=down_img_path, size=(image_scale_2, image_scale_2))
self.left_img = self.open_image(path=left_img_path, size=(image_scale_2, image_scale_2))
self.right_img = self.open_image(path=right_img_path, size=(image_scale_2, image_scale_2))
self.clear_img = self.open_image(path=clear_img_path, size=(image_scale_2, image_scale_2))
self.copy_img = self.open_image(path=copy_img_path, size=(image_scale_2, image_scale_2))
self.credits_img = self.open_image(path=credits_path, size=determined_size["credits_img"])
def open_image(self, path: str, size: tuple = None, keep_aspect: bool = True, rotate: int = 0) -> ImageTk.PhotoImage:
@@ -111,11 +138,233 @@ class ImagePath():
return ImageTk.PhotoImage(img)
class AdjustedValues():
IMAGE_HEIGHT = determined_size["IMAGE_HEIGHT"]
FILEPATHS_HEIGHT = determined_size["FILEPATHS_HEIGHT"]
OPTIONS_HEIGHT = determined_size["OPTIONS_HEIGHT"]
CONVERSIONBUTTON_HEIGHT = determined_size["CONVERSIONBUTTON_HEIGHT"]
COMMAND_HEIGHT = determined_size["COMMAND_HEIGHT"]
PROGRESS_HEIGHT = determined_size["PROGRESS_HEIGHT"]
PADDING = determined_size["PADDING"]
#All Sizes Below Calibrated to 1080p!
if OPERATING_SYSTEM=="Darwin":
FONT_SIZE_F1 = 13
FONT_SIZE_F2 = 11
FONT_SIZE_F3 = 12
FONT_SIZE_0 = 9
FONT_SIZE_1 = 11
FONT_SIZE_2 = 12
FONT_SIZE_3 = 13
FONT_SIZE_4 = 14
FONT_SIZE_5 = 15
FONT_SIZE_6 = 17
HELP_HINT_CHECKBOX_WIDTH = 13
MDX_CHECKBOXS_WIDTH = 14
VR_CHECKBOXS_WIDTH = 14
ENSEMBLE_CHECKBOXS_WIDTH = 18
DEMUCS_CHECKBOXS_WIDTH = 14
DEMUCS_PRE_CHECKBOXS_WIDTH = 20
GEN_SETTINGS_WIDTH = 17
MENU_COMBOBOX_WIDTH = 16
MENU_OPTION_WIDTH = 12
READ_ONLY_COMBO_WIDTH = 35
SETTINGS_BUT_WIDTH = 19
VR_BUT_WIDTH = 16
SET_MENUS_CHECK_WIDTH = 12
COMBO_WIDTH = 14
SET_VOC_SPLIT_CHECK_WIDTH = 21
elif OPERATING_SYSTEM=="Linux":
FONT_SIZE_F1 = 10
FONT_SIZE_F2 = 8
FONT_SIZE_F3 = 9
FONT_SIZE_0 = 7
FONT_SIZE_1 = 8
FONT_SIZE_2 = 9
FONT_SIZE_3 = 10
FONT_SIZE_4 = 11
FONT_SIZE_5 = 12
FONT_SIZE_6 = 15
HELP_HINT_CHECKBOX_WIDTH = 13
MDX_CHECKBOXS_WIDTH = 14
VR_CHECKBOXS_WIDTH = 16
ENSEMBLE_CHECKBOXS_WIDTH = 25
DEMUCS_CHECKBOXS_WIDTH = 18
DEMUCS_PRE_CHECKBOXS_WIDTH = 27
GEN_SETTINGS_WIDTH = 17
MENU_COMBOBOX_WIDTH = 19
MENU_OPTION_WIDTH = 15
READ_ONLY_COMBO_WIDTH = 45
COMBO_WIDTH = 19
SETTINGS_BUT_WIDTH = 26
VR_BUT_WIDTH = 20
SET_MENUS_CHECK_WIDTH = 15
SET_VOC_SPLIT_CHECK_WIDTH = 28
elif OPERATING_SYSTEM=="Windows":
HELP_HINT_CHECKBOX_WIDTH = 15
MDX_CHECKBOXS_WIDTH = 14
VR_CHECKBOXS_WIDTH = 14
ENSEMBLE_CHECKBOXS_WIDTH = 20
DEMUCS_CHECKBOXS_WIDTH = 14
DEMUCS_PRE_CHECKBOXS_WIDTH = 20
GEN_SETTINGS_WIDTH = 18
MENU_COMBOBOX_WIDTH = 16
MENU_OPTION_WIDTH = 12
READ_ONLY_COMBO_WIDTH = 35
SETTINGS_BUT_WIDTH = 20
VR_BUT_WIDTH = 16
SET_MENUS_CHECK_WIDTH = 13
COMBO_WIDTH = 14
SET_VOC_SPLIT_CHECK_WIDTH = 23
FONT_SIZE_F1 = 10
FONT_SIZE_F2 = 8
FONT_SIZE_F3 = 9
FONT_SIZE_0 = 7
FONT_SIZE_1 = 8
FONT_SIZE_2 = 9
FONT_SIZE_3 = 10
FONT_SIZE_4 = 11
FONT_SIZE_5 = 13
FONT_SIZE_6 = 15
#Main Size Values:
IMAGE_HEIGHT = determined_size["IMAGE_HEIGHT"]
FILEPATHS_HEIGHT = determined_size["FILEPATHS_HEIGHT"]
OPTIONS_HEIGHT = determined_size["OPTIONS_HEIGHT"]
CONVERSIONBUTTON_HEIGHT = determined_size["CONVERSIONBUTTON_HEIGHT"]
COMMAND_HEIGHT = determined_size["COMMAND_HEIGHT"]
PROGRESS_HEIGHT = determined_size["PROGRESS_HEIGHT"]
PADDING = determined_size["PADDING"]
WIDTH = determined_size["WIDTH"]
# IMAGE_HEIGHT = 140
# FILEPATHS_HEIGHT = 75
# OPTIONS_HEIGHT = 262
# CONVERSIONBUTTON_HEIGHT = 30
# COMMAND_HEIGHT = 141
# PROGRESS_HEIGHT = 25
# PADDING = 7
# WIDTH = 680
MENU_PADDING_1 = 5
MENU_PADDING_2 = 10
MENU_PADDING_3 = 15
MENU_PADDING_4 = 3
#Main Frame Sizes
X_CONVERSION_BUTTON_1080P = 50
WIDTH_CONVERSION_BUTTON_1080P = -100
HEIGHT_GENERIC_BUTTON_1080P = 35
X_STOP_BUTTON_1080P = -10 - 35
X_SETTINGS_BUTTON_1080P = -670
X_PROGRESSBAR_1080P = 25
WIDTH_PROGRESSBAR_1080P = -50
X_CONSOLE_FRAME_1080P = 15
WIDTH_CONSOLE_FRAME_1080P = -30
HO_S = 7
#File Frame Sizes
FILEPATHS_FRAME_X = 10
FILEPATHS_FRAME_Y = 155
FILEPATHS_FRAME_WIDTH = -20
MUSICFILE_BUTTON_X = 0
MUSICFILE_BUTTON_Y = 5
MUSICFILE_BUTTON_WIDTH = 0
MUSICFILE_BUTTON_HEIGHT = -5
MUSICFILE_ENTRY_X = 7.5
MUSICFILE_ENTRY_WIDTH = -50
MUSICFILE_ENTRY_HEIGHT = -5
MUSICFILE_OPEN_X = -45
MUSICFILE_OPEN_Y = 160
MUSICFILE_OPEN_WIDTH = 35
MUSICFILE_OPEN_HEIGHT = 33
SAVETO_BUTTON_X = 0
SAVETO_BUTTON_Y = 5
SAVETO_BUTTON_WIDTH = 0
SAVETO_BUTTON_HEIGHT = -5
SAVETO_ENTRY_X = 7.5
SAVETO_ENTRY_WIDTH = -50
SAVETO_ENTRY_HEIGHT = -5
SAVETO_OPEN_X = -45
SAVETO_OPEN_Y = 197.5
SAVETO_OPEN_WIDTH = 35
SAVETO_OPEN_HEIGHT = 32
#Main Option menu
OPTIONS_FRAME_X = 10
OPTIONS_FRAME_Y = 250
OPTIONS_FRAME_WIDTH = -20
FILEONE_LABEL_X = -28
FILEONE_LABEL_WIDTH = -38
FILETWO_LABEL_X = -32
FILETWO_LABEL_WIDTH = -20
TIME_WINDOW_LABEL_X = -43
TIME_WINDOW_LABEL_WIDTH = 0
INTRO_ANALYSIS_LABEL_X = -83
INTRO_ANALYSIS_LABEL_WIDTH = -50
INTRO_ANALYSIS_OPTION_X = -68
DB_ANALYSIS_LABEL_X = 62
DB_ANALYSIS_LABEL_WIDTH = -34
DB_ANALYSIS_OPTION_X = 86
WAV_TYPE_SET_LABEL_X = -43
WAV_TYPE_SET_LABEL_WIDTH = 0
ENTRY_WIDTH = 222
# Constants for the ensemble_listbox_Frame
ENSEMBLE_LISTBOX_FRAME_X = -25
ENSEMBLE_LISTBOX_FRAME_Y = -20
ENSEMBLE_LISTBOX_FRAME_WIDTH = 0
ENSEMBLE_LISTBOX_FRAME_HEIGHT = 67
# Constants for the ensemble_listbox_scroll
ENSEMBLE_LISTBOX_SCROLL_X = 195
ENSEMBLE_LISTBOX_SCROLL_Y = -20
ENSEMBLE_LISTBOX_SCROLL_WIDTH = -48
ENSEMBLE_LISTBOX_SCROLL_HEIGHT = 69
# Constants for Radio Buttons
RADIOBUTTON_X_WAV = 457
RADIOBUTTON_X_FLAC = 300
RADIOBUTTON_X_MP3 = 143
RADIOBUTTON_Y = -5
RADIOBUTTON_WIDTH = 0
RADIOBUTTON_HEIGHT = 6
MAIN_ROW_Y_1 = -15
MAIN_ROW_Y_2 = -17
MAIN_ROW_X_1 = -4
MAIN_ROW_X_2 = 21
MAIN_ROW_2_Y_1 = -15
MAIN_ROW_2_Y_2 = -17
MAIN_ROW_2_X_1 = -28
MAIN_ROW_2_X_2 = 1
LOW_MENU_Y_1 = 18
LOW_MENU_Y_2 = 16
SUB_ENT_ROW_X = -2
MAIN_ROW_WIDTH = -53
MAIN_ROW_ALIGN_WIDTH = -86
CHECK_BOX_Y = 0
CHECK_BOX_X = 20
CHECK_BOX_WIDTH = -49
CHECK_BOX_HEIGHT = 2
LEFT_ROW_WIDTH = -10
LABEL_HEIGHT = -5
OPTION_HEIGHT = 8
LABEL_X_OFFSET = -28
LABEL_WIDTH = -38
ENTRY_WIDTH = 179.5
ENTRY_OPEN_BUTT_WIDTH = -185
ENTRY_OPEN_BUTT_X_OFF = 405
UPDATE_LABEL_WIDTH = 35
HEIGHT_CONSOLE_FRAME_1080P = COMMAND_HEIGHT + HO_S
LOW_MENU_Y = LOW_MENU_Y_1, LOW_MENU_Y_2
MAIN_ROW_Y = MAIN_ROW_Y_1, MAIN_ROW_Y_2
MAIN_ROW_X = MAIN_ROW_X_1, MAIN_ROW_X_2
MAIN_ROW_2_Y = MAIN_ROW_2_Y_1, MAIN_ROW_2_Y_2
MAIN_ROW_2_X = MAIN_ROW_2_X_1, MAIN_ROW_2_X_2
LABEL_Y = MAIN_ROW_Y[0]
ENTRY_Y = MAIN_ROW_Y[1]
BUTTON_Y_1080P = IMAGE_HEIGHT + FILEPATHS_HEIGHT + OPTIONS_HEIGHT - 8 + PADDING*2
HEIGHT_PROGRESSBAR_1080P = PROGRESS_HEIGHT
Y_OFFSET_PROGRESS_BAR_1080P = IMAGE_HEIGHT + FILEPATHS_HEIGHT + OPTIONS_HEIGHT + CONVERSIONBUTTON_HEIGHT + COMMAND_HEIGHT + PADDING*4
Y_OFFSET_CONSOLE_FRAME_1080P = IMAGE_HEIGHT + FILEPATHS_HEIGHT + OPTIONS_HEIGHT + CONVERSIONBUTTON_HEIGHT + PADDING + X_PROGRESSBAR_1080P
LABEL_Y_OFFSET = MAIN_ROW_Y[0]
ENTRY_X_OFFSET = SUB_ENT_ROW_X
ENTRY_Y_OFFSET = MAIN_ROW_Y[1]
OPTION_WIDTH = MAIN_ROW_ALIGN_WIDTH
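The `scale_values` helper added near the top of this file rescales the 1080p-calibrated sizes against `SCALE_WIN_SIZE`; standalone, the arithmetic looks like this (with a hypothetical 2K base so the ratio is non-trivial):

```python
SCALE_WIN_SIZE = 2560  # hypothetical 2K base; the file ships with 1920

def scale_values(value, base=1920):
    # Sizes are calibrated to 1080p (1920 wide); scale them whenever the
    # base window size differs. 2560/1920 is approximately 1.3333.
    return value * (SCALE_WIN_SIZE / base) if SCALE_WIN_SIZE != base else value

scaled_width = scale_values(680)  # 680 * 4/3, roughly 906.7
```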

File diff suppressed because it is too large.


@@ -101,6 +101,6 @@ def error_dialouge(exception):
final_message = full_text
break
else:
final_message = (f'{error_name}: {exception}\n\n{CONTACT_DEV}')
final_message = (f'An Error Occurred: {error_name}\n\n{CONTACT_DEV}')
return final_message

Binary file changed (not shown)


@@ -0,0 +1 @@
0

Binary file changed (45 KiB → 108 KiB; not shown)

BIN gui_data/img/UVR_5_6_0.png (new file, 124 KiB; not shown)

BIN gui_data/img/clear.png (new file, 757 B; not shown)

BIN gui_data/img/copy.png (new file, 18 KiB; not shown)

BIN gui_data/img/down.png (new file, 614 B; not shown)

BIN gui_data/img/left.png (new file, 438 B; not shown)

BIN gui_data/img/right.png (new file, 425 B; not shown)

Binary file changed (276 KiB → 276 KiB; not shown)

BIN gui_data/img/up.png (new file, 491 B; not shown)

File diff suppressed because one or more lines are too long

gui_data/own_font.json (new file, 4 lines)

@@ -0,0 +1,4 @@
{
"font_name": null,
"font_file": null
}


@@ -38,10 +38,11 @@ def init(func):
@init
def set_theme(theme):
def set_theme(theme, font_name="Century Gothic", f_size=10):
if theme not in {"dark", "light"}:
raise RuntimeError(f"not a valid theme name: {theme}")
root.globalsetvar("fontName", (font_name, f_size))
root.tk.call("set_theme", theme)


@@ -8,6 +8,8 @@ proc set_theme {mode} {
if {$mode == "dark"} {
ttk::style theme use "sun-valley-dark"
set fontString "$::fontName"
array set colors {
-fg "#F6F6F7"
-bg "#0e0e0f"
@@ -26,7 +28,7 @@ proc set_theme {mode} {
-insertwidth 0 \
-insertcolor $colors(-fg) \
-fieldbackground $colors(-selectbg) \
-font {"Century Gothic" 10} \
-font $fontString \
-borderwidth 0 \
-relief flat


@@ -140,6 +140,29 @@ namespace eval ttk::theme::sun-valley-dark {
TSeparator.separator -sticky nsew
}
# # Modify the TCombobox style
# ttk::style configure TCombobox -borderwidth 3
# # Define the layout of the ThickBorder.TCombobox
# ttk::style layout ThickBorder.TCombobox {
# Combobox.field -sticky nsew -children {
# Combobox.padding -expand 1 -sticky nsew -children {
# Combobox.textarea -sticky nsew
# }
# }
# null -side right -sticky ns -children {
# Combobox.arrow -sticky nsew
# }
# }
# # Use a canvas as the parent of the combobox and create a custom border
# canvas .c -width 200 -height 30 -highlightthickness 0
# canvas .c create rectangle 2 2 198 28 -width 3 -outline black
# pack .c
# ttk::combobox .c.cbox -values {"Option 1" "Option 2" "Option 3"} -style ThickBorder.TCombobox
# .c create window 100 15 -window .c.cbox
ttk::style layout TCombobox {
Combobox.field -sticky nsew -children {
Combobox.padding -expand 1 -sticky nsew -children {
@@ -453,11 +476,11 @@ namespace eval ttk::theme::sun-valley-dark {
ttk::style element create Combobox.field \
image [list $images(button-rest) \
{readonly disabled} $images(button-disabled) \
{readonly pressed} $images(button-pressed) \
{readonly pressed} $images(button-rest) \
{readonly hover} $images(button-hover) \
readonly $images(button-rest) \
invalid $images(entry-invalid) \
disabled $images(entry-disabled) \
disabled $images(combo-disabled) \
focus $images(entry-focus) \
hover $images(button-hover) \
] -border 5 -padding 8 -sticky nsew

Binary file added (2.9 KiB; not shown)


@@ -1,15 +1,11 @@
from abc import ABCMeta
import torch
import torch.nn as nn
from pytorch_lightning import LightningModule
from .modules import TFC_TDF
from pytorch_lightning import LightningModule
dim_s = 4
class AbstractMDXNet(LightningModule):
__metaclass__ = ABCMeta
def __init__(self, target_name, lr, optimizer, dim_c, dim_f, dim_t, n_fft, hop_length, overlap):
super().__init__()
self.target_name = target_name
@@ -24,7 +20,7 @@ class AbstractMDXNet(LightningModule):
self.window = nn.Parameter(torch.hann_window(window_length=self.n_fft, periodic=True), requires_grad=False)
self.freq_pad = nn.Parameter(torch.zeros([1, dim_c, self.n_bins - self.dim_f, self.dim_t]), requires_grad=False)
def configure_optimizers(self):
def get_optimizer(self):
if self.optimizer == 'rmsprop':
return torch.optim.RMSprop(self.parameters(), self.lr)
@@ -37,7 +33,7 @@ class ConvTDFNet(AbstractMDXNet):
super(ConvTDFNet, self).__init__(
target_name, lr, optimizer, dim_c, dim_f, dim_t, n_fft, hop_length, overlap)
self.save_hyperparameters()
#self.save_hyperparameters()
self.num_blocks = num_blocks
self.l = l

lib_v5/results.py (new file, 48 lines)

@@ -0,0 +1,48 @@
# -*- coding: utf-8 -*-
"""
Matchering - Audio Matching and Mastering Python Library
Copyright (C) 2016-2022 Sergree
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
import os
import soundfile as sf
class Result:
def __init__(
self, file: str, subtype: str, use_limiter: bool = True, normalize: bool = True
):
_, file_ext = os.path.splitext(file)
file_ext = file_ext[1:].upper()
if not sf.check_format(file_ext):
raise TypeError(f"{file_ext} format is not supported")
if not sf.check_format(file_ext, subtype):
raise TypeError(f"{file_ext} format does not have {subtype} subtype")
self.file = file
self.subtype = subtype
self.use_limiter = use_limiter
self.normalize = normalize
def pcm16(file: str) -> Result:
return Result(file, "PCM_16")
def pcm24(file: str) -> Result:
return Result(file, "FLOAT")
def save_audiofile(file: str, wav_set="PCM_16") -> Result:
return Result(file, wav_set)
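The validation flow in `Result.__init__` can be exercised with a stand-in format table in place of soundfile's real registry (a hypothetical subset, stdlib only):

```python
import os

# Stand-in for soundfile's format registry (hypothetical subset).
KNOWN = {"WAV": {"PCM_16", "PCM_24", "FLOAT"}, "FLAC": {"PCM_16", "PCM_24"}}

def check_target(file: str, subtype: str) -> str:
    # Mirrors Result.__init__ above: derive the format from the file
    # extension, then validate both the container and the subtype.
    fmt = os.path.splitext(file)[1][1:].upper()
    if fmt not in KNOWN:
        raise TypeError(f"{fmt} format is not supported")
    if subtype not in KNOWN[fmt]:
        raise TypeError(f"{fmt} format does not have {subtype} subtype")
    return fmt

assert check_target("mix (Vocals).wav", "PCM_16") == "WAV"
```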

lib_v5/tfc_tdf_v3.py (new file, 234 lines)

@@ -0,0 +1,234 @@
import torch
import torch.nn as nn
from functools import partial
class STFT:
def __init__(self, n_fft, hop_length, dim_f):
self.n_fft = n_fft
self.hop_length = hop_length
self.window = torch.hann_window(window_length=self.n_fft, periodic=True)
self.dim_f = dim_f
def __call__(self, x):
window = self.window.to(x.device)
batch_dims = x.shape[:-2]
c, t = x.shape[-2:]
x = x.reshape([-1, t])
x = torch.stft(x, n_fft=self.n_fft, hop_length=self.hop_length, window=window, center=True,return_complex=False)
x = x.permute([0, 3, 1, 2])
x = x.reshape([*batch_dims, c, 2, -1, x.shape[-1]]).reshape([*batch_dims, c * 2, -1, x.shape[-1]])
return x[..., :self.dim_f, :]
def inverse(self, x):
window = self.window.to(x.device)
batch_dims = x.shape[:-3]
c, f, t = x.shape[-3:]
n = self.n_fft // 2 + 1
f_pad = torch.zeros([*batch_dims, c, n - f, t]).to(x.device)
x = torch.cat([x, f_pad], -2)
x = x.reshape([*batch_dims, c // 2, 2, n, t]).reshape([-1, 2, n, t])
x = x.permute([0, 2, 3, 1])
x = x[..., 0] + x[..., 1] * 1.j
x = torch.istft(x, n_fft=self.n_fft, hop_length=self.hop_length, window=window, center=True)
x = x.reshape([*batch_dims, 2, -1])
return x
def get_norm(norm_type):
def norm(c, norm_type):
if norm_type == 'BatchNorm':
return nn.BatchNorm2d(c)
elif norm_type == 'InstanceNorm':
return nn.InstanceNorm2d(c, affine=True)
elif 'GroupNorm' in norm_type:
g = int(norm_type.replace('GroupNorm', ''))
return nn.GroupNorm(num_groups=g, num_channels=c)
else:
return nn.Identity()
return partial(norm, norm_type=norm_type)
def get_act(act_type):
if act_type == 'gelu':
return nn.GELU()
elif act_type == 'relu':
return nn.ReLU()
elif act_type[:3] == 'elu':
alpha = float(act_type.replace('elu', ''))
return nn.ELU(alpha)
else:
raise Exception
class Upscale(nn.Module):
def __init__(self, in_c, out_c, scale, norm, act):
super().__init__()
self.conv = nn.Sequential(
norm(in_c),
act,
nn.ConvTranspose2d(in_channels=in_c, out_channels=out_c, kernel_size=scale, stride=scale, bias=False)
)
def forward(self, x):
return self.conv(x)
class Downscale(nn.Module):
def __init__(self, in_c, out_c, scale, norm, act):
super().__init__()
self.conv = nn.Sequential(
norm(in_c),
act,
nn.Conv2d(in_channels=in_c, out_channels=out_c, kernel_size=scale, stride=scale, bias=False)
)
def forward(self, x):
return self.conv(x)
class TFC_TDF(nn.Module):
def __init__(self, in_c, c, l, f, bn, norm, act):
super().__init__()
self.blocks = nn.ModuleList()
for i in range(l):
block = nn.Module()
block.tfc1 = nn.Sequential(
norm(in_c),
act,
nn.Conv2d(in_c, c, 3, 1, 1, bias=False),
)
block.tdf = nn.Sequential(
norm(c),
act,
nn.Linear(f, f // bn, bias=False),
norm(c),
act,
nn.Linear(f // bn, f, bias=False),
)
block.tfc2 = nn.Sequential(
norm(c),
act,
nn.Conv2d(c, c, 3, 1, 1, bias=False),
)
block.shortcut = nn.Conv2d(in_c, c, 1, 1, 0, bias=False)
self.blocks.append(block)
in_c = c
def forward(self, x):
for block in self.blocks:
s = block.shortcut(x)
x = block.tfc1(x)
x = x + block.tdf(x)
x = block.tfc2(x)
x = x + s
return x
class TFC_TDF_net(nn.Module):
def __init__(self, config):
super().__init__()
self.config = config
norm = get_norm(norm_type=config.model.norm)
act = get_act(act_type=config.model.act)
self.num_target_instruments = 1 if config.training.target_instrument else len(config.training.instruments)
self.num_subbands = config.model.num_subbands
dim_c = self.num_subbands * config.audio.num_channels * 2
n = config.model.num_scales
scale = config.model.scale
l = config.model.num_blocks_per_scale
c = config.model.num_channels
g = config.model.growth
bn = config.model.bottleneck_factor
f = config.audio.dim_f // self.num_subbands
self.first_conv = nn.Conv2d(dim_c, c, 1, 1, 0, bias=False)
self.encoder_blocks = nn.ModuleList()
for i in range(n):
block = nn.Module()
block.tfc_tdf = TFC_TDF(c, c, l, f, bn, norm, act)
block.downscale = Downscale(c, c + g, scale, norm, act)
f = f // scale[1]
c += g
self.encoder_blocks.append(block)
self.bottleneck_block = TFC_TDF(c, c, l, f, bn, norm, act)
self.decoder_blocks = nn.ModuleList()
for i in range(n):
block = nn.Module()
block.upscale = Upscale(c, c - g, scale, norm, act)
f = f * scale[1]
c -= g
block.tfc_tdf = TFC_TDF(2 * c, c, l, f, bn, norm, act)
self.decoder_blocks.append(block)
self.final_conv = nn.Sequential(
nn.Conv2d(c + dim_c, c, 1, 1, 0, bias=False),
act,
nn.Conv2d(c, self.num_target_instruments * dim_c, 1, 1, 0, bias=False)
)
self.stft = STFT(config.audio.n_fft, config.audio.hop_length, config.audio.dim_f)
def cac2cws(self, x):
k = self.num_subbands
b, c, f, t = x.shape
x = x.reshape(b, c, k, f // k, t)
x = x.reshape(b, c * k, f // k, t)
return x
def cws2cac(self, x):
k = self.num_subbands
b, c, f, t = x.shape
x = x.reshape(b, c // k, k, f, t)
x = x.reshape(b, c // k, f * k, t)
return x
def forward(self, x):
x = self.stft(x)
mix = x = self.cac2cws(x)
first_conv_out = x = self.first_conv(x)
x = x.transpose(-1, -2)
encoder_outputs = []
for block in self.encoder_blocks:
x = block.tfc_tdf(x)
encoder_outputs.append(x)
x = block.downscale(x)
x = self.bottleneck_block(x)
for block in self.decoder_blocks:
x = block.upscale(x)
x = torch.cat([x, encoder_outputs.pop()], 1)
x = block.tfc_tdf(x)
x = x.transpose(-1, -2)
x = x * first_conv_out # reduce artifacts
x = self.final_conv(torch.cat([mix, x], 1))
x = self.cws2cac(x)
if self.num_target_instruments > 1:
b, c, f, t = x.shape
x = x.reshape(b, self.num_target_instruments, -1, f, t)
x = self.stft.inverse(x)
return x
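One detail worth noting in the new `TFC_TDF_net`: `cac2cws`/`cws2cac` fold `num_subbands` frequency bands into channels and back via pure reshapes, so the pair is lossless whenever the frequency dimension divides evenly. A quick standalone check of just that reshape logic:

```python
import torch

def cac2cws(x, k):
    # Fold k frequency subbands into the channel axis (as in the model).
    b, c, f, t = x.shape
    return x.reshape(b, c, k, f // k, t).reshape(b, c * k, f // k, t)

def cws2cac(x, k):
    # Inverse: unfold channels back into the frequency axis.
    b, c, f, t = x.shape
    return x.reshape(b, c // k, k, f, t).reshape(b, c // k, f * k, t)

x = torch.arange(2 * 4 * 8 * 3, dtype=torch.float32).reshape(2, 4, 8, 3)
assert torch.equal(cws2cac(cac2cws(x, 4), 4), x)  # round-trip is exact
```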


@@ -1,36 +1,15 @@
import json
import pathlib
default_param = {}
default_param['bins'] = 768
default_param['unstable_bins'] = 9 # training only
default_param['reduction_bins'] = 762 # training only
default_param['bins'] = -1
default_param['unstable_bins'] = -1 # training only
default_param['stable_bins'] = -1 # training only
default_param['sr'] = 44100
default_param['pre_filter_start'] = 757
default_param['pre_filter_stop'] = 768
default_param['pre_filter_start'] = -1
default_param['pre_filter_stop'] = -1
default_param['band'] = {}
default_param['band'][1] = {
'sr': 11025,
'hl': 128,
'n_fft': 960,
'crop_start': 0,
'crop_stop': 245,
'lpf_start': 61, # inference only
'res_type': 'polyphase'
}
default_param['band'][2] = {
'sr': 44100,
'hl': 512,
'n_fft': 1536,
'crop_start': 24,
'crop_stop': 547,
'hpf_start': 81, # inference only
'res_type': 'sinc_best'
}
N_BINS = 'n_bins'
def int_keys(d):
r = {}
@@ -40,20 +19,14 @@ def int_keys(d):
r[k] = v
return r
class ModelParameters(object):
def __init__(self, config_path=''):
if '.pth' == pathlib.Path(config_path).suffix:
import zipfile
with zipfile.ZipFile(config_path, 'r') as zip:
self.param = json.loads(zip.read('param.json'), object_pairs_hook=int_keys)
elif '.json' == pathlib.Path(config_path).suffix:
with open(config_path, 'r') as f:
with open(config_path, 'r') as f:
self.param = json.loads(f.read(), object_pairs_hook=int_keys)
else:
self.param = default_param
for k in ['mid_side', 'mid_side_b', 'mid_side_b2', 'stereo_w', 'stereo_n', 'reverse']:
if not k in self.param:
self.param[k] = False
self.param[k] = False
if N_BINS in self.param:
self.param['bins'] = self.param[N_BINS]
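The `int_keys` hook above exists because JSON object keys are always strings, while the band table is indexed by integers; a minimal stdlib demonstration of the same idea:

```python
import json

def int_keys(pairs):
    # JSON keys are strings; convert digit keys (band indices) to ints.
    return {int(k) if k.isdigit() else k: v for k, v in pairs}

param = json.loads('{"sr": 44100, "band": {"1": {"hl": 128}}}',
                   object_pairs_hook=int_keys)
# param["band"] is now keyed by the int 1, not the string "1"
```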


@@ -0,0 +1,55 @@
{
"n_bins": 672,
"unstable_bins": 8,
"stable_bins": 530,
"band": {
"1": {
"sr": 7350,
"hl": 80,
"n_fft": 640,
"crop_start": 0,
"crop_stop": 85,
"lpf_start": 25,
"lpf_stop": 53,
"res_type": "polyphase"
},
"2": {
"sr": 7350,
"hl": 80,
"n_fft": 320,
"crop_start": 4,
"crop_stop": 87,
"hpf_start": 25,
"hpf_stop": 12,
"lpf_start": 31,
"lpf_stop": 62,
"res_type": "polyphase"
},
"3": {
"sr": 14700,
"hl": 160,
"n_fft": 512,
"crop_start": 17,
"crop_stop": 216,
"hpf_start": 48,
"hpf_stop": 24,
"lpf_start": 139,
"lpf_stop": 210,
"res_type": "polyphase"
},
"4": {
"sr": 44100,
"hl": 480,
"n_fft": 960,
"crop_start": 78,
"crop_stop": 383,
"hpf_start": 130,
"hpf_stop": 86,
"convert_channels": "stereo_n",
"res_type": "kaiser_fast"
}
},
"sr": 44100,
"pre_filter_start": 668,
"pre_filter_stop": 672
}


@@ -40,26 +40,26 @@ class BaseNet(nn.Module):
class CascadedNet(nn.Module):
def __init__(self, n_fft, nn_arch_size, nout=32, nout_lstm=128):
def __init__(self, n_fft, nn_arch_size=51000, nout=32, nout_lstm=128):
super(CascadedNet, self).__init__()
self.max_bin = n_fft // 2
self.output_bin = n_fft // 2 + 1
self.nin_lstm = self.max_bin // 2
self.offset = 64
nout = 64 if nn_arch_size == 218409 else nout
#print(nout, nout_lstm, n_fft)
self.stg1_low_band_net = nn.Sequential(
BaseNet(2, nout // 2, self.nin_lstm // 2, nout_lstm),
layers.Conv2DBNActiv(nout // 2, nout // 4, 1, 1, 0)
)
)
self.stg1_high_band_net = BaseNet(2, nout // 4, self.nin_lstm // 2, nout_lstm // 2)
self.stg2_low_band_net = nn.Sequential(
BaseNet(nout // 4 + 2, nout, self.nin_lstm // 2, nout_lstm),
layers.Conv2DBNActiv(nout, nout // 2, 1, 1, 0)
)
)
self.stg2_high_band_net = BaseNet(nout // 4 + 2, nout // 2, self.nin_lstm // 2, nout_lstm // 2)
self.stg3_full_band_net = BaseNet(3 * nout // 4 + 2, nout, self.nin_lstm, nout_lstm)
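The constructor above derives all of its band-split sizes from `n_fft`, and the `nn_arch_size == 218409` check widens the network. A quick sketch of that arithmetic (pure bookkeeping, no torch required; the function name is ours):

```python
# Size arithmetic from CascadedNet.__init__ above.
def cascaded_dims(n_fft, nn_arch_size=51000, nout=32):
    max_bin = n_fft // 2            # usable frequency bins
    output_bin = n_fft // 2 + 1     # one-sided spectrum size
    nin_lstm = max_bin // 2         # LSTM input width
    nout = 64 if nn_arch_size == 218409 else nout  # wider arch variant
    return max_bin, output_bin, nin_lstm, nout

print(cascaded_dims(2048))          # (1024, 1025, 512, 32)
print(cascaded_dims(2048, 218409))  # (1024, 1025, 512, 64)
```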

View File

@@ -1,219 +0,0 @@
{
"0ddfc0eb5792638ad5dc27850236c246": {
"compensate": 1.035,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Vocals"
},
"26d308f91f3423a67dc69a6d12a8793d": {
"compensate": 1.035,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 9,
"mdx_n_fft_scale_set": 8192,
"primary_stem": "Other"
},
"2cdd429caac38f0194b133884160f2c6": {
"compensate": 1.035,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Instrumental"
},
"2f5501189a2f6db6349916fabe8c90de": {
"compensate": 1.035,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Vocals"
},
"398580b6d5d973af3120df54cee6759d": {
"compensate": 1.75,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Vocals"
},
"488b3e6f8bd3717d9d7c428476be2d75": {
"compensate": 1.035,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Instrumental"
},
"4910e7827f335048bdac11fa967772f9": {
"compensate": 1.035,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 7,
"mdx_n_fft_scale_set": 4096,
"primary_stem": "Drums"
},
"53c4baf4d12c3e6c3831bb8f5b532b93": {
"compensate": 1.035,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Vocals"
},
"5d343409ef0df48c7d78cce9f0106781": {
"compensate": 1.075,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Vocals"
},
"5f6483271e1efb9bfb59e4a3e6d4d098": {
"compensate": 1.035,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 9,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Vocals"
},
"65ab5919372a128e4167f5e01a8fda85": {
"compensate": 1.035,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 8192,
"primary_stem": "Other"
},
"6703e39f36f18aa7855ee1047765621d": {
"compensate": 1.035,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 9,
"mdx_n_fft_scale_set": 16384,
"primary_stem": "Bass"
},
"6b31de20e84392859a3d09d43f089515": {
"compensate": 1.035,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Vocals"
},
"867595e9de46f6ab699008295df62798": {
"compensate": 1.075,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Vocals"
},
"a3cd63058945e777505c01d2507daf37": {
"compensate": 1.035,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Vocals"
},
"b33d9b3950b6cbf5fe90a32608924700": {
"compensate": 1.075,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Vocals"
},
"c3b29bdce8c4fa17ec609e16220330ab": {
"compensate": 1.035,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 16384,
"primary_stem": "Bass"
},
"ceed671467c1f64ebdfac8a2490d0d52": {
"compensate": 1.035,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Instrumental"
},
"d2a1376f310e4f7fa37fb9b5774eb701": {
"compensate": 1.035,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Instrumental"
},
"d7bff498db9324db933d913388cba6be": {
"compensate": 1.035,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Vocals"
},
"d94058f8c7f1fae4164868ae8ae66b20": {
"compensate": 1.035,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Vocals"
},
"dc41ede5961d50f277eb846db17f5319": {
"compensate": 1.035,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 9,
"mdx_n_fft_scale_set": 4096,
"primary_stem": "Drums"
},
"e5572e58abf111f80d8241d2e44e7fa4": {
"compensate": 1.035,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Instrumental"
},
"e7324c873b1f615c35c1967f912db92a": {
"compensate": 1.075,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Vocals"
},
"1c56ec0224f1d559c42fd6fd2a67b154": {
"compensate": 1.035,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 5120,
"primary_stem": "Instrumental"
},
"f2df6d6863d8f435436d8b561594ff49": {
"compensate": 1.035,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Instrumental"
},
"b06327a00d5e5fbc7d96e1781bbdb596": {
"compensate": 1.035,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Instrumental"
},
"94ff780b977d3ca07c7a343dab2e25dd": {
"compensate": 1.035,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Instrumental"
},
"73492b58195c3b52d34590d5474452f6": {
"compensate": 1.075,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Vocals"
},
"1d64a6d2c30f709b8c9b4ce1366d96ee": {
"compensate": 1.035,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 5120,
"primary_stem": "Instrumental"
},
"203f2a3955221b64df85a41af87cf8f0": {
"compensate": 1.035,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Instrumental"
}
}

View File

@@ -0,0 +1,34 @@
audio:
  chunk_size: 260096
  dim_f: 4096
  dim_t: 128
  hop_length: 2048
  n_fft: 8192
  num_channels: 2
  sample_rate: 44100
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
training:
  batch_size: 8
  grad_clip: 0
  instruments:
  - Vocals
  - Drums
  - Bass
  - Other
  lr: 5.0e-05
  target_instrument: null
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8
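In these configs the audio fields are coupled: a chunk of `chunk_size` samples at the given `hop_length` produces exactly `dim_t` STFT frames under the usual `frames = chunk_size // hop_length + 1` relation (the "+1" assumes uncentered framing — an assumption on our part, but it matches every config in this commit, including the later 261120-sample ones):

```python
# chunk_size / hop_length / dim_t consistency check for the configs in this commit.
configs = [
    {"chunk_size": 260096, "hop_length": 2048, "dim_t": 128},  # this file
    {"chunk_size": 261120, "hop_length": 1024, "dim_t": 256},  # later configs
]
for cfg in configs:
    frames = cfg["chunk_size"] // cfg["hop_length"] + 1
    assert frames == cfg["dim_t"], (frames, cfg)
print("chunk sizes match dim_t")
```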

View File

@@ -0,0 +1,34 @@
audio:
  chunk_size: 260096
  dim_f: 4096
  dim_t: 128
  hop_length: 2048
  n_fft: 8192
  num_channels: 2
  sample_rate: 44100
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 256
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
training:
  batch_size: 8
  grad_clip: 0
  instruments:
  - Vocals
  - Drums
  - Bass
  - Other
  lr: 3.0e-05
  target_instrument: null
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8

View File

@@ -0,0 +1,34 @@
audio:
  chunk_size: 260096
  dim_f: 4096
  dim_t: 128
  hop_length: 2048
  n_fft: 12288
  num_channels: 2
  sample_rate: 44100
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
training:
  batch_size: 8
  grad_clip: 0
  instruments:
  - Vocals
  - Drums
  - Bass
  - Other
  lr: 5.0e-05
  target_instrument: Vocals
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8

View File

@@ -0,0 +1,39 @@
audio:
  chunk_size: 261120
  dim_f: 4096
  dim_t: 256
  hop_length: 1024
  min_mean_abs: 0.01
  n_fft: 8192
  num_channels: 2
  sample_rate: 44100
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 64
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
training:
  batch_size: 6
  coarse_loss_clip: true
  ema_momentum: 0.999
  grad_clip: null
  instruments:
  - Vocals
  - Drums
  - Bass
  - Other
  lr: 0.0001
  num_steps: 100000
  q: 0.4
  target_instrument: null
inference:
  batch_size: 2
  dim_t: 256
  num_overlap: 8

View File

@@ -0,0 +1,41 @@
audio:
  chunk_size: 261120
  dim_f: 4096
  dim_t: 256
  hop_length: 1024
  min_mean_abs: 0.01
  n_fft: 8192
  num_channels: 2
  sample_rate: 44100
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 64
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
training:
  batch_size: 6
  coarse_loss_clip: false
  datasets:
  - ../data/moises/bleeding
  ema_momentum: 0.999
  grad_clip: null
  instruments:
  - Vocals
  - Drums
  - Bass
  - Other
  lr: 0.0001
  num_steps: 150000
  q: 0.93
  target_instrument: null
inference:
  batch_size: 2
  dim_t: 256
  num_overlap: 8

View File

@@ -0,0 +1,36 @@
audio:
  chunk_size: 260096
  dim_f: 4096
  dim_t: 256
  hop_length: 2048
  n_fft: 12288
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
name: epoch_10.ckpt
training:
  batch_size: 16
  grad_clip: 0
  instruments:
  - Vocals
  - Instrumental
  lr: 5.0e-05
  target_instrument: null
  num_epochs: 100
  num_steps: 1000
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8

View File

@@ -0,0 +1,36 @@
audio:
  chunk_size: 260096
  dim_f: 6144
  dim_t: 128
  hop_length: 2048
  n_fft: 12288
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 6
  scale:
  - 2
  - 2
training:
  batch_size: 14
  grad_clip: 0
  instruments:
  - Vocals
  - Instrumental
  lr: 3.0e-05
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  augmentation: 1
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8

View File

@@ -0,0 +1,36 @@
audio:
  chunk_size: 260096
  dim_f: 6144
  dim_t: 128
  hop_length: 2048
  n_fft: 12288
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001
model:
  act: gelu
  bottleneck_factor: 4
  growth: 128
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 6
  scale:
  - 2
  - 2
training:
  batch_size: 14
  grad_clip: 0
  instruments:
  - Vocals
  - Instrumental
  lr: 2.0e-05
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  augmentation: 1
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8

View File

@@ -0,0 +1,39 @@
audio:
  chunk_size: 261120
  dim_f: 6144
  dim_t: 256
  hop_length: 1024
  n_fft: 12288
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001
model:
  act: gelu
  bottleneck_factor: 4
  growth: 128
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 6
  scale:
  - 2
  - 2
training:
  batch_size: 6
  grad_clip: 0
  instruments:
  - Vocals
  - Instrumental
  lr: 1.0e-05
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  augmentation: 1
  q: 0.95
  coarse_loss_clip: true
  ema_momentum: 0.999
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8

View File

@@ -0,0 +1,40 @@
audio:
  chunk_size: 261120
  dim_f: 6144
  dim_t: 256
  hop_length: 1024
  n_fft: 12288
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001
model:
  act: gelu
  bottleneck_factor: 4
  growth: 128
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 6
  scale:
  - 2
  - 2
training:
  batch_size: 6
  grad_clip: 0
  instruments:
  - Vocals
  - Instrumental
  lr: 0.7e-05
  patience: 2
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  augmentation: 1
  q: 0.95
  coarse_loss_clip: true
  ema_momentum: 0.999
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8

View File

@@ -0,0 +1,43 @@
audio:
  chunk_size: 261120
  dim_f: 4096
  dim_t: 256
  hop_length: 1024
  n_fft: 8192
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001
model:
  act: gelu
  bottleneck_factor: 4
  growth: 128
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
training:
  batch_size: 6
  grad_clip: 0
  instruments:
  - Vocals
  - Instrumental
  lr: 1.0e-05
  patience: 2
  reduce_factor: 0.95
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  augmentation: 1
  augmentation_type: simple1
  augmentation_mix: true
  q: 0.95
  coarse_loss_clip: true
  ema_momentum: 0.999
inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 8

View File

@@ -0,0 +1,41 @@
audio:
  chunk_size: 261120
  dim_f: 1024
  dim_t: 256
  hop_length: 1024
  min_mean_abs: 0.01
  n_fft: 2048
  num_channels: 2
  sample_rate: 44100
  stereo_prob: 0.7
model:
  act: gelu
  bottleneck_factor: 4
  growth: 64
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 64
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2
training:
  batch_size: 8
  ema_momentum: 0.999
  grad_clip: null
  instruments:
  - Music
  - Speech
  - SFX
  lr: 0.0001
  num_steps: 30000
  target_instrument: null
inference:
  batch_size: 8
  dim_t: 256
  instruments:
  - Music
  - Dialog
  - Effect
  num_overlap: 8

View File

@@ -14,7 +14,7 @@
"primary_stem": "Other"
},
"2cdd429caac38f0194b133884160f2c6": {
"compensate": 1.035,
"compensate": 1.045,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
@@ -25,7 +25,8 @@
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Vocals"
"primary_stem": "Vocals",
"is_karaoke": true
},
"398580b6d5d973af3120df54cee6759d": {
"compensate": 1.75,
@@ -49,7 +50,7 @@
"primary_stem": "Drums"
},
"53c4baf4d12c3e6c3831bb8f5b532b93": {
"compensate": 1.035,
"compensate": 1.043,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
@@ -91,21 +92,21 @@
"primary_stem": "Vocals"
},
"867595e9de46f6ab699008295df62798": {
"compensate": 1.075,
"compensate": 1.03,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Vocals"
},
"a3cd63058945e777505c01d2507daf37": {
"compensate": 1.035,
"compensate": 1.03,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Vocals"
},
"b33d9b3950b6cbf5fe90a32608924700": {
"compensate": 1.075,
"compensate": 1.03,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
@@ -154,21 +155,21 @@
"primary_stem": "Drums"
},
"e5572e58abf111f80d8241d2e44e7fa4": {
"compensate": 1.035,
"compensate": 1.028,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Instrumental"
},
"e7324c873b1f615c35c1967f912db92a": {
"compensate": 1.075,
"compensate": 1.03,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Vocals"
},
"1c56ec0224f1d559c42fd6fd2a67b154": {
"compensate": 1.035,
"compensate": 1.025,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 5120,
@@ -189,25 +190,33 @@
"primary_stem": "Instrumental"
},
"94ff780b977d3ca07c7a343dab2e25dd": {
"compensate": 1.035,
"compensate": 1.039,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Instrumental"
},
"73492b58195c3b52d34590d5474452f6": {
"compensate": 1.075,
"compensate": 1.043,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Vocals"
},
"970b3f9492014d18fefeedfe4773cb42": {
"compensate": 1.009,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Vocals"
},
"1d64a6d2c30f709b8c9b4ce1366d96ee": {
"compensate": 1.035,
"compensate": 1.065,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 5120,
"primary_stem": "Instrumental"
"primary_stem": "Instrumental",
"is_karaoke": true
},
"203f2a3955221b64df85a41af87cf8f0": {
"compensate": 1.035,
@@ -229,5 +238,114 @@
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Instrumental"
},
"cc63408db3d80b4d85b0287d1d7c9632": {
"compensate": 1.033,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Instrumental"
},
"cd5b2989ad863f116c855db1dfe24e39": {
"compensate": 1.035,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 9,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Reverb"
},
"55657dd70583b0fedfba5f67df11d711": {
"compensate": 1.022,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 6144,
"primary_stem": "Instrumental"
},
"b6bccda408a436db8500083ef3491e8b": {
"compensate": 1.02,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Instrumental"
},
"8a88db95c7fb5dbe6a095ff2ffb428b1": {
"compensate": 1.026,
"mdx_dim_f_set": 2048,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 5120,
"primary_stem": "Instrumental"
},
"b78da4afc6512f98e4756f5977f5c6b9": {
"compensate": 1.021,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Instrumental"
},
"77d07b2667ddf05b9e3175941b4454a0": {
"compensate": 1.021,
"mdx_dim_f_set": 3072,
"mdx_dim_t_set": 8,
"mdx_n_fft_scale_set": 7680,
"primary_stem": "Vocals"
},
"2154254ee89b2945b97a7efed6e88820": {
"config_yaml": "model_2_stem_061321.yaml"
},
"063aadd735d58150722926dcbf5852a9": {
"config_yaml": "model_2_stem_061321.yaml"
},
"c09f714d978b41d718facfe3427e6001": {
"config_yaml": "model_2_stem_061321.yaml"
},
"fe96801369f6a148df2720f5ced88c19": {
"config_yaml": "model3.yaml"
},
"02e8b226f85fb566e5db894b9931c640": {
"config_yaml": "model2.yaml"
},
"e3de6d861635ab9c1d766149edd680d6": {
"config_yaml": "model1.yaml"
},
"3f2936c554ab73ce2e396d54636bd373": {
"config_yaml": "modelB.yaml"
},
"890d0f6f82d7574bca741a9e8bcb8168": {
"config_yaml": "modelB.yaml"
},
"63a3cb8c37c474681049be4ad1ba8815": {
"config_yaml": "modelB.yaml"
},
"a7fc5d719743c7fd6b61bd2b4d48b9f0": {
"config_yaml": "modelA.yaml"
},
"3567f3dee6e77bf366fcb1c7b8bc3745": {
"config_yaml": "modelA.yaml"
},
"a28f4d717bd0d34cd2ff7a3b0a3d065e": {
"config_yaml": "modelA.yaml"
},
"c9971a18da20911822593dc81caa8be9": {
"config_yaml": "sndfx.yaml"
},
"57d94d5ed705460d21c75a5ac829a605": {
"config_yaml": "sndfx.yaml"
},
"e7a25f8764f25a52c1b96c4946e66ba2": {
"config_yaml": "sndfx.yaml"
},
"104081d24e37217086ce5fde09147ee1": {
"config_yaml": "model_2_stem_061321.yaml"
},
"1e6165b601539f38d0a9330f3facffeb": {
"config_yaml": "model_2_stem_061321.yaml"
},
"fe0108464ce0d8271be5ab810891bd7c": {
"config_yaml": "model_2_stem_full_band.yaml"
},
"e9b82ec90ee56c507a3a982f1555714c": {
"config_yaml": "model_2_stem_full_band_2.yaml"
},
"99b6ceaae542265a3b6d657bf9fde79f": {
"config_yaml": "model_2_stem_full_band_8k.yaml"
}
}
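The updated registry above mixes two entry shapes: classic MDX-Net entries carry inline `compensate`/`mdx_*` fields, while the newly added MDX23C-style entries only point at a `config_yaml`. A hedged sketch of how a loader could dispatch on that (the function name and the `"mdx_c"`/`"mdx_net"` labels are ours, not UVR's):

```python
def classify_entry(model_hash, registry):
    # Newer entries reference an external YAML config; classic MDX-Net
    # entries describe their network parameters inline.
    entry = registry[model_hash]
    if "config_yaml" in entry:
        return ("mdx_c", entry["config_yaml"])
    return ("mdx_net", {
        "compensate": entry["compensate"],
        "dim_f": entry["mdx_dim_f_set"],
        "dim_t": entry["mdx_dim_t_set"],
        "n_fft": entry["mdx_n_fft_scale_set"],
        "stem": entry["primary_stem"],
    })

# Two entries copied from the registry above.
registry = {
    "fe96801369f6a148df2720f5ced88c19": {"config_yaml": "model3.yaml"},
    "970b3f9492014d18fefeedfe4773cb42": {
        "compensate": 1.009, "mdx_dim_f_set": 3072, "mdx_dim_t_set": 8,
        "mdx_n_fft_scale_set": 7680, "primary_stem": "Vocals",
    },
}
print(classify_entry("fe96801369f6a148df2720f5ced88c19", registry)[0])  # mdx_c
```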

View File

@@ -7,7 +7,16 @@
"UVR-MDX-NET-Inst_1": "UVR-MDX-NET Inst 1",
"UVR-MDX-NET-Inst_2": "UVR-MDX-NET Inst 2",
"UVR-MDX-NET-Inst_3": "UVR-MDX-NET Inst 3",
"UVR-MDX-NET-Inst_4": "UVR-MDX-NET Inst 4",
"UVR-MDX-NET-Inst_Main": "UVR-MDX-NET Inst Main",
"UVR-MDX-NET-Inst_Main_2": "UVR-MDX-NET Inst Main 2",
"UVR-MDX-NET-Inst_HQ_1": "UVR-MDX-NET Inst HQ 1",
"UVR_MDXNET_KARA_2": "UVR-MDX-NET Karaoke 2"
"UVR-MDX-NET-Inst_HQ_2": "UVR-MDX-NET Inst HQ 2",
"UVR-MDX-NET-Inst_HQ_3": "UVR-MDX-NET Inst HQ 3",
"UVR_MDXNET_KARA_2": "UVR-MDX-NET Karaoke 2",
"Kim_Vocal_1": "Kim Vocal 1",
"Kim_Vocal_2": "Kim Vocal 2",
"Kim_Inst": "Kim Inst",
"MDX23C-8KFFT-InstVoc_HQ.ckpt": "MDX23C-InstVoc HQ",
"Reverb_HQ_By_FoxJoy": "Reverb HQ"
}

Binary file not shown.

View File

@@ -13,7 +13,7 @@
},
"2aa34fbc01f8e6d2bf509726481e7142": {
"vr_model_param": "4band_44100",
"primary_stem": "Other"
"primary_stem": "No Piano"
},
"3e18f639b11abea7361db1a4a91c2559": {
"vr_model_param": "4band_44100",
@@ -29,7 +29,8 @@
},
"6b5916069a49be3fe29d4397ecfd73fa": {
"vr_model_param": "3band_44100_msb2",
"primary_stem": "Instrumental"
"primary_stem": "Instrumental",
"is_karaoke": true
},
"74b3bc5fa2b69f29baf7839b858bc679": {
"vr_model_param": "4band_44100",
@@ -85,10 +86,52 @@
},
"f6ea8473ff86017b5ebd586ccacf156b": {
"vr_model_param": "4band_v2_sn",
"primary_stem": "Instrumental"
"primary_stem": "Instrumental",
"is_karaoke": true
},
"fd297a61eafc9d829033f8b987c39a3d": {
"vr_model_param": "1band_sr32000_hl512",
"primary_stem": "Instrumental"
},
"0ec76fd9e65f81d8b4fbd13af4826ed8": {
"vr_model_param": "4band_v3",
"primary_stem": "No Woodwinds"
},
"0fb9249ffe4ffc38d7b16243f394c0ff": {
"vr_model_param": "4band_v3",
"primary_stem": "No Reverb"
},
"6857b2972e1754913aad0c9a1678c753": {
"vr_model_param": "4band_v3",
"primary_stem": "Echo",
"nout": 48,
"nout_lstm": 128
},
"f200a145434efc7dcf0cd093f517ed52": {
"vr_model_param": "4band_v3",
"primary_stem": "No Echo",
"nout": 48,
"nout_lstm": 128
},
"44c55d8b5d2e3edea98c2b2bf93071c7": {
"vr_model_param": "4band_v3",
"primary_stem": "Noise",
"nout": 48,
"nout_lstm": 128
},
"51ea8c43a6928ed3c10ef5cb2707d57b": {
"vr_model_param": "1band_sr44100_hl1024",
"primary_stem": "Noise",
"nout": 16,
"nout_lstm": 128
},
"944950a9c5963a5eb70b445d67b7068a": {
"vr_model_param": "4band_v3_sn",
"primary_stem": "Vocals",
"nout": 64,
"nout_lstm": 128,
"is_karaoke": false,
"is_bv_model": true,
"is_bv_model_rebalanced": 0.9
}
}
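The new VR entries above add optional `nout`/`nout_lstm` and `is_karaoke` fields that the older entries lack, so a loader has to fall back to defaults when they are absent. A minimal sketch (the function name is ours; the 32/128 fallbacks mirror the `CascadedNet` defaults seen earlier in this commit but are still an assumption for VR models):

```python
def vr_network_args(entry, default_nout=32, default_nout_lstm=128):
    # Older VR registry entries predate the nout fields; fall back to
    # defaults so both entry generations load through one code path.
    return {
        "param": entry["vr_model_param"],
        "nout": entry.get("nout", default_nout),
        "nout_lstm": entry.get("nout_lstm", default_nout_lstm),
        "is_karaoke": entry.get("is_karaoke", False),
    }

# One old-style and one new-style entry copied from the registry above.
old_entry = {"vr_model_param": "4band_44100", "primary_stem": "No Piano"}
new_entry = {"vr_model_param": "4band_v3", "primary_stem": "Echo",
             "nout": 48, "nout_lstm": 128}
print(vr_network_args(old_entry)["nout"], vr_network_args(new_entry)["nout"])  # 32 48
```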

View File

@@ -11,8 +11,9 @@ julius==0.2.7
kthread==0.2.3
librosa==0.9.2
llvmlite==0.39.1
matchering==2.0.6
ml_collections==0.1.1
natsort==8.2.0
numba==0.56.4
numpy==1.23.4
omegaconf==2.2.3
opencv-python==4.6.0.66
@@ -32,7 +33,7 @@ resampy==0.2.2
scipy==1.9.3
soundfile==0.11.0
soundstretch==1.2
-torch==1.13.1
+torch==1.9.0+cu111
tqdm
urllib3==1.26.12
wget==3.2

File diff suppressed because it is too large