



MATLAB Modeling Code: Audio Source Separation Based on Harmonic Structure
Time: 2023-09-14 09:14:28







💥1 Overview

📚2 Results

🎉3 References

🌈4 MATLAB Code Implementation

💥1 Overview





Source separation of musical signals is an appealing but difficult problem, especially in the single-channel case. In this paper, an unsupervised single-channel music source separation algorithm based on average harmonic structure modeling is proposed. Under the assumption of playing in narrow pitch ranges, different harmonic instrumental sources in a piece of music often have different but stable harmonic structures; thus, sources can be characterized uniquely by harmonic structure models. Given the number of instrumental sources, the proposed algorithm learns these models directly from the mixed signal by clustering the harmonic structures extracted from different frames. The corresponding sources are then extracted from the mixed signal using the models. Experiments on several mixed signals, including synthesized instrumental sources, real instrumental sources, and singing voices, show that this algorithm outperforms the general nonnegative matrix factorization (NMF)-based source separation algorithm, and yields good subjective listening quality. As a side effect, this algorithm estimates the pitches of the harmonic instrumental sources. The number of concurrent sounds in each frame is also computed, which is a difficult task for general multipitch estimation (MPE) algorithms.


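Source separation problems can be classified by the number of sources and sensors. Over-determined and determined cases are those in which the number of sensors is, respectively, larger than or equal to the number of sources. In these cases, independent component analysis (ICA) [1]-[3] and some methods that use source statistics [4], [8] can achieve good results. However, they run into difficulties in under-determined cases, where there are fewer sensors than sources. For these cases, some state-of-the-art methods exploit source sparsity [5], [6] or auditory cues [7]. Single-channel source separation is the extreme case of the under-determined problem. Section II of the paper reviews some approaches to it.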
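
To make the determined versus single-channel distinction concrete, here is a minimal sketch that assumes an instantaneous linear mixing model x = A*s. That model, the toy sources, and the mixing matrices are illustrative assumptions and do not come from the post or its code.

```matlab
% Illustrative only: instantaneous linear mixing x = A*s (assumed model)
% with two hypothetical toy sources stored as rows.
nSamples = 1000;
t = (0:nSamples-1) / 8000;                       % time axis at an assumed 8 kHz rate
s = [sin(2*pi*220*t); sign(sin(2*pi*330*t))];    % 2 sources x nSamples

% Determined case: 2 sensors, 2 sources, so the mixing matrix is invertible
A_det = [1.0 0.6; 0.4 1.0];
x_det = A_det * s;
s_rec = A_det \ x_det;                           % exact recovery when A is known
errDetermined = max(abs(s_rec(:) - s(:)));

% Single-channel case: 1 sensor, 2 sources, so no unique inverse exists
A_single = [1.0 0.6];
x_single = A_single * s;
s_pinv = pinv(A_single) * x_single;              % minimum-norm guess, not the true sources
errSingle = max(abs(s_pinv(:) - s(:)));

fprintf('determined error: %.2e, single-channel error: %.2f\n', errDetermined, errSingle);
```

In the determined case the sources come back up to numerical precision, while the single-channel pseudo-inverse leaves a large error. This is why single-channel methods need extra structure, such as the harmonic structure used in this work.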

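According to the information they use, source separation methods can be divided into supervised and unsupervised. Supervised methods usually need solo excerpts of each source to train individual source models [8]-[17], or overall separation model parameters [18], and then use these models to separate the mixed signal. Unsupervised methods [19]-[23] use less information and rely on computational auditory scene analysis (CASA) [24], [25] cues, such as harmonicity and common onset and offset times, to tackle the separation problem. In addition, statistical properties of the source and mixed signals, such as nonnegativity [26], sparsity [4]-[6], or both [27], are also exploited by some unsupervised methods.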

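In this paper, we address the single-channel music source separation problem in an unsupervised fashion. Here, each source is a monophonic signal, which has at most one sound at a time. It is found that, in music signals, the harmonic structure is an approximately invariant feature of a harmonic instrument within a narrow pitch range. The harmonic structures of these instruments are therefore extracted from the spectrum of each frame of the mixed signal. Given the number of instrument sources, we then learn average harmonic structure (AHS) models, i.e., typical harmonic structures of individual instruments, by clustering the extracted structures. Using these models, the corresponding sources are extracted from the mixed signal. Note that this separation algorithm does not need to know the pitches of the sources. Instead, it yields multipitch estimation (MPE) results as a side effect. The algorithm has been tested on several mixed signals of synthesized and real instruments as well as singing voices, and the results are promising. The idea was first presented in [29]. This paper gives different formulations for estimating F0s and extracting harmonic structures, together with more detailed analysis, experiments, and discussion.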
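
The clustering step can be sketched as follows. The text does not say which clustering method is used, so plain k-means (written out by hand to avoid toolbox dependencies) is swapped in here as an assumption. The matrix `H`, the two prototype structures, and the perturbation are hypothetical toy data, and `AHS` is just an illustrative name for the learned cluster means.

```matlab
% Toy data (hypothetical): 200 frames x 20 harmonics, amplitudes in dB,
% generated from two synthetic "instruments" plus a small deterministic perturbation.
maxHarm = 20;
k = 1:maxHarm;
protoA = 60 - 2*k;                                   % smoothly decaying harmonics
protoB = 60 - 2*k;
protoB(2:2:end) = protoB(2:2:end) - 15;              % weak even harmonics
noise = 0.5 * sin(((1:200)') * (1:maxHarm) * 0.37);  % deterministic stand-in for noise
H = [repmat(protoA, 100, 1); repmat(protoB, 100, 1)] + noise;

numberSources = 2;                                   % assumed known, as in the text
nFrames = size(H, 1);

% Farthest-point initialisation (a simple deterministic choice)
centers = H(1, :);
for c = 2:numberSources
    dmin = inf(nFrames, 1);
    for j = 1:size(centers, 1)
        dmin = min(dmin, sum((H - repmat(centers(j, :), nFrames, 1)).^2, 2));
    end
    [~, idx] = max(dmin);
    centers(c, :) = H(idx, :);
end

% Standard k-means (Lloyd) iterations
for iter = 1:50
    D = zeros(nFrames, numberSources);
    for c = 1:numberSources
        D(:, c) = sum((H - repmat(centers(c, :), nFrames, 1)).^2, 2);
    end
    [~, labels] = min(D, [], 2);                     % nearest centre per frame
    newCenters = centers;
    for c = 1:numberSources
        if any(labels == c)
            newCenters(c, :) = mean(H(labels == c, :), 1);
        end
    end
    if isequal(newCenters, centers), break; end
    centers = newCenters;
end

AHS = centers;    % one average harmonic structure per cluster (rows)
```

Each row of `AHS` is the mean harmonic structure of one cluster, which plays the role of an instrument model in this sketch. Real frames would also contain missing harmonics and overlapping sources, which this toy example ignores.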

📚2 Results




%% Parameters for peak extraction
peakThreshold = 50;                                % global amplitude threshold (dB)
peakThreshold_rel = 8;                             % local amplitude threshold (dB)
peakThreshold_freq_min = 60;                       % minimum frequency to look for a peak (Hz); 60 Hz is roughly the first harmonic (fundamental, about 65.4 Hz) of 'C2'
peakThreshold_freq_max = 20000;                    % maximum frequency to look for a peak (Hz)
%movL = round(0.01*nfft);                          % alternative: moving average width as 1% of the FFT size (bins)
movL = 9;                                          % moving average width (bins)
typeSmoothing = 'Normal Moving Average';           % type of smoothing function: 'Normal Moving Average' or 'Gaussian Moving Average'
%typeSmoothing = 'Gaussian Moving Average';        % alternative smoothing function
%sigma = (0.1*movL);
sigma = 5;
%% ---------- F0 Estimation ----------------
% numberSources (the number of instrument sources) must be defined before this point.
% note2midinum is a helper from the accompanying code, not a MATLAB built-in.
maxf0Num = numberSources;                      %-- maximum number of F0s in each frame, i.e. the highest possible number of concurrent sources
f0min_midi = note2midinum('C2');               %-- lowest possible F0 (MIDI number)
f0max_midi = note2midinum('B7');               %-- highest possible F0 (MIDI number)
searchRadius_midi = 0.5;                       % radius around each peak to search for f0s (midi)
f0step_midi = 0.1;                             % F0 search step (midi)

%% Parameters for calculating Harmonic Structures
maxHarm = 20;                     %-- number of harmonics, i.e. the dimensionality of the harmonic structure feature (default 20)
Thresh_PeakF0Belong = 0.03;       %-- maximum deviation of the frequency ratio fpeak/(k*f0) from 1 for a peak to count as the k-th harmonic (0.03 is about half a semitone)
normEnergy_dB = 100;              %-- The total energy to normalize the harmonic structure of each F0 (dB)
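
As a rough illustration of how `searchRadius_midi` and `f0step_midi` might be used, the sketch below builds a grid of F0 candidates around one spectral peak. The helpers `midi2hz` and `hz2midi`, the peak frequency, and the numeric MIDI values (which assume the convention C4 = 60, so C2 = 36 and B7 = 107) are assumptions for illustration. The original `note2midinum` helper is not shown in the post.

```matlab
% Hypothetical helpers (not from the original code): A4 = MIDI 69 = 440 Hz
midi2hz = @(m) 440 * 2.^((m - 69) / 12);
hz2midi = @(f) 69 + 12 * log2(f / 440);

f0min_midi = 36;             % assumed result of note2midinum('C2') with C4 = 60
f0max_midi = 107;            % assumed result of note2midinum('B7') with C4 = 60
searchRadius_midi = 0.5;     % radius around each peak to search for F0s (MIDI)
f0step_midi = 0.1;           % F0 search step (MIDI)

peakFreq_Hz = 262.0;         % hypothetical spectral peak near C4

% Candidate F0s on a uniform MIDI grid centred on the peak
center_midi = hz2midi(peakFreq_Hz);
cand_midi = center_midi + (-searchRadius_midi : f0step_midi : searchRadius_midi);

% Keep only candidates inside the allowed F0 range
cand_midi = cand_midi(cand_midi >= f0min_midi & cand_midi <= f0max_midi);
cand_Hz = midi2hz(cand_midi);

fprintf('%d candidates from %.2f Hz to %.2f Hz\n', numel(cand_Hz), cand_Hz(1), cand_Hz(end));
```

Working in MIDI units makes the grid uniform in pitch rather than in Hz, so a radius of 0.5 covers half a semitone on either side of the peak regardless of register.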

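The harmonic structure parameters above can be read as follows: for a given F0, a peak is accepted as the k-th harmonic when its frequency ratio to k*f0 is within `Thresh_PeakF0Belong` of 1, and the resulting vector is then scaled to a fixed total energy. The sketch below is one possible reading, not the original implementation. The peak list and F0 are hypothetical, dB values are treated as power in dB (10*log10), and missing harmonics are marked with -Inf, all of which are assumptions.

```matlab
maxHarm = 20;
Thresh_PeakF0Belong = 0.03;
normEnergy_dB = 100;

% Hypothetical inputs: one F0 and the peaks detected in one frame
f0 = 220;                                   % Hz
peakFreq   = [220.5 441 659 1000 1102];     % Hz
peakAmp_dB = [60    54  50  40   45];       % dB

% Assign at most one peak to each harmonic number k
harmAmp_dB = -inf(1, maxHarm);              % -Inf dB marks a missing harmonic (assumed convention)
for k = 1:maxHarm
    [~, idx] = min(abs(peakFreq - k*f0));   % peak closest to the ideal harmonic frequency
    ratio = peakFreq(idx) / (k*f0);
    if abs(ratio - 1) < Thresh_PeakF0Belong
        harmAmp_dB(k) = peakAmp_dB(idx);
    end
end

% Shift all found harmonics by a common offset so the total energy is normEnergy_dB
found = isfinite(harmAmp_dB);
totalEnergy_dB = 10 * log10(sum(10.^(harmAmp_dB(found) / 10)));
harmStruct_dB = harmAmp_dB;
harmStruct_dB(found) = harmAmp_dB(found) + (normEnergy_dB - totalEnergy_dB);

disp(find(found));                          % harmonic numbers that matched a peak
```

With these toy inputs the 1000 Hz peak is rejected because it is not close to any multiple of 220 Hz, and the normalization keeps the relative levels between harmonics unchanged, so structures from frames of different loudness become comparable.
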
🎉3 References


[1] Z. Duan, Y. Zhang, C. Zhang and Z. Shi, "Unsupervised Single-Channel Music Source Separation by Average Harmonic Structure Modeling," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 4, pp. 766-778, May 2008, doi: 10.1109/TASL.2008.919073.

🌈4 MATLAB Code Implementation