Retrieval-based-Voice-Conversion-WebUI

Cool_Tools/Retrieval-based-Voice-Conversion-WebUI

mirror of synced 2024-11-23 23:21:03 +01:00

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

一、对照实验数据集

训练集（target speaker）约8min。

采样试听（用于展示音色和训练集质量）来自米津玄師《ピースサイン》

https://user-images.githubusercontent.com/129054828/236691565-591a0238-b260-46a5-a9fe-83fd4f8065ee.mp4

测试音频：夏真浔翻唱《冬之花》第一段

https://user-images.githubusercontent.com/129054828/236691637-ab48a326-2016-4960-8d50-cf255e34fe65.mp4

before-baseline-version(史前版本)混音结果完整版(《冬の花》coverd by AI米津玄師)：

https://www.bilibili.com/video/BV1Kb411d7zC

二、faiss索引对照（updated20230428）

结论：

1、nprobe增大对效果影响不大，因此更新后从7降至1，检索速度7倍；

2、fastscan（PQ128）质量有损（注意 wa ta shi no i no "ch"i），暂时不采纳；

3、top8进行加权混合代替top1：显著削弱高频刺耳的现象，提升了音频质量，采纳。

https://user-images.githubusercontent.com/129054828/236691899-80900082-0fd8-4562-a936-c572688e7afe.mp4

https://user-images.githubusercontent.com/129054828/236691905-509d591a-52fb-4d10-9440-40c7e6ffb8c1.mp4

https://user-images.githubusercontent.com/129054828/236691908-f88fb16f-3623-4a86-9044-273a842af519.mp4

https://user-images.githubusercontent.com/129054828/236691910-912327cd-6604-46bb-b7bc-4e886489a883.mp4

三、backbone结构对照（底模+小训练集fine tune）

version:hubert_base（ContentVec）+add 3 period discriminators

harvest+邻域3的中值滤波+index_rate=1

hubert_base结构下，中间hidden size为768，结尾linear至256

C768：不使用final_proj

C256：使用final_proj

L9/L12：hubert的特征层数

baseline（当前版本）：C256L9

结论：C768L12默秒全（呼吸+辅音齿音电流声）。

https://user-images.githubusercontent.com/129054828/236691753-86c1242e-0fd5-4da0-b6e6-d14cbc5a3712.mp4

https://user-images.githubusercontent.com/129054828/236691760-4de3f6ee-6ff1-4475-b8d7-1b679729fca8.mp4

https://user-images.githubusercontent.com/129054828/236691763-41c3335e-7f32-471c-9532-bf2f95866148.mp4

https://user-images.githubusercontent.com/129054828/236691767-5ea7d837-87f2-40ed-a2f8-79f1daba8e7e.mp4

四、RVC_v3偷跑

大就是好！

https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/assets/129054828/e8b21d2a-d868-47ae-8a35-2108570914a6