Example 2: Speech denoising using overlapping group shrinkage (OGS)
Speech denoising using overlapping group shrinkage (OGS). The OGS algorithm is applied to the short-time Fourier transform (STFT) of the noisy speech signal. For comparison, speech denoising is also performed using soft thresholding and block thresholding (Yu, Mallat, Bacry, 2008).
Po-Yu Chen and Ivan Selesnick Polytechnic Institute of New York University New York, USA March 2012
Contents
Start
close all clear randn('state',0); % Initialize state so as to exactly reproduce the example. printme = @(txt) print('-dpdf', sprintf('figures/Example2_%s', txt)); SNR = @(x, x_denoise) 10*log10(sum(abs(x(:)).^2) / (sum(abs(x(:) - x_denoise(:)).^2)));
Load speech data
Load a noise-free speech waveform, then add white Gaussian noise to it.
[x, fs] = wavread('speech/arctic_a0001.wav'); % x : speech waveform x = x'; % fs : sampling rate (samples/second) sigma = 0.025; % sigma : noise standard deviation w = sigma * randn(size(x)); % w : white noise signal y = x + w; % y : speech + noise wavwrite(y, fs, 'speech/speech_noisy') % write noisy speech to file N = length(x); % N : length of speech signal t = (0:N-1)/fs; % t : time axis % Display noisy speech waveform figure(1) clf plot(t, y) xlim([0 N]/fs) xlabel('Time (sec)') title('Noisy speech')
Compute spectrogram
R = 2^9; % R : STFT block length X = stft(x, R); % X : spectrogram (STFT) of x Y = stft(y, R); % Y : spectrogram (STFT) of y
Verify perfect reconstruction property of STFT
err = x - istft(X, R, N);
fprintf('Maximum reconstruction error = %g\n', max(abs(err(:))))
Maximum reconstruction error = 3.33067e-16
Display spectrogram of speech waveform
figure(2) clf displaySTFT(X, fs, R, [-50 0]) xlim([0 2.5]) title('Speech (clean)') printme('clean')
Display spectrogram of noisy signal
figure(3) clf displaySTFT(Y, fs, R, [-50 0]) xlim([0 2.5]) title('Speech + noise') printme('noisy')
STFT of pure noise
Observe the standard deviation (std) in the STFT when the signal is a standard normal white noise.
yy = randn(1,100000); YY = stft(yy, R); std(YY(:)) % -> 0.706 % The output std for standad normal white noise signal is 1/sqrt(2) due to the window % and normalization in STFT command.
ans = 0.7065
So, the standard deviation of the noise in the STFT is compubed by:
sigma_stft = sigma/sqrt(2); % sigma_stft : noise std in STFT domain
Soft thresholding
Perform speech denoising using soft thresholding of the STFT of the noisy signal.
T = 3.26 * sigma_stft; % T : threshold for soft threshold rule % To select T, we have used Table II in the paper for group size 1x1 with output std = 0.001 % Perform soft thresholding X1 = soft(Y, T); % X1 : STFT after soft thresholding x1 = real(istft(X1, R, N)); % x1 : denoised speech using soft thresholding fprintf('SNR for soft thresholding: %.2f\n\n', SNR(x, x1)); wavwrite(x1, fs, 'speech/speech_soft') % write signal to file figure(3) clf displaySTFT(X1, fs, R, [-50 0]) xlim([0 2.5]) title('Soft thresholding') printme('denoised_soft')
SNR for soft thresholding: 10.13
Overlapping group shrinkage (OGS)
Perform speech denoising using overlapping group shrinkage (OGS) on the STFT.
% Set group size K1 by K2 K1 = 8; % K1 : frequency K2 = 2; % K2 : time lambda = 0.32 * sigma_stft; % lambda : regularization paameter for OGS % To select T, we have used Table II in the paper for group size 2x8 with output std = 0.001 Nit = 25; % Nit : number of iterations [X2, cost] = ogshrink2(Y, K1, K2, lambda, Nit); % Perform OGS x2 = real(istft(X2, R, N)); % x2 : denoised speech using OGS fprintf('SNR for overlapping group shrinkage (no Wiener post-processing) : %.2f\n', SNR(x, x2)); wavwrite(x2, fs, 'speech/speech_OGS') figure(2) clf displaySTFT(X2, fs, R, [-50 0]) xlim([0 2.5]) title('Overlapping group shrinkage (OGS)') printme('denoised_OGS')
SNR for overlapping group shrinkage (no Wiener post-processing) : 13.77
Display cost function history of OGS algorithm
figure(3) clf plot(1:Nit, cost,'.-') title('Cost function history') xlabel('Iteration')
Wiener post-processing
Apply empirical Wiener filter (Ghael, Sayeed, Baraniuk, 1997) to the STFT obtained using OGS.
X2_wp = Y .* abs(X2).^2 ./ ( abs(X2).^2 + sigma_stft^2 ); % Wiener post-processing x2_wp = real(istft(X2_wp, R, N)); % Transform to time-domain fprintf('SNR for overlapping group shrinkage (with Wiener post-processing): %.2f\n\n', SNR(x, x2_wp)); wavwrite(x2_wp, fs, 'speech/speech_OGS_wp')
SNR for overlapping group shrinkage (with Wiener post-processing): 15.63
Block thresholding
Perform speech denoising using block thresholding (Yu, Mallat, Bacry, 2008). We use the software provided by the authors at http://www.cmap.polytechnique.fr/~yu/research/ABT/samples.html We modified their program so it returns additional outputs so we can obtain the spectrogram.
addpath block_thresholding % Block thresholding software (Yu, Mallat, Bacry, 2008) time_win = 32; [f_rec, AttenFactorMap, flag_depth, STFTcoef_th, f_rec_th] = BlockThresholding_mod(y, time_win, fs, sigma); % STFTcoef_th : STFT without Wiener post-processing % f_rec : denoised speech using Wiener post-processing % f_rec_th : denoised speech with out using Wiener post-processing wavwrite(f_rec_th, fs, 'speech/speech_blockthresh') wavwrite(f_rec, fs, 'speech/speech_blockthresh_wp') fprintf('SNR for block thresholding (no Wiener post-processing): %.2f \n', SNR(x, f_rec_th)); fprintf('SNR for block thresholding (using post-processing): %.2f \n\n', SNR(x, f_rec));
SNR for block thresholding (no Wiener post-processing): 15.35 SNR for block thresholding (using post-processing): 15.75
Display spectogram of block thresholding result
yy = randn(1,100000); factor_redund = 1; % 2-times over-complete [YY, window] = STFT_Yu(yy, time_win, factor_redund, fs); std(YY(:)) % -> 13.78 sqrt(sum(window.^2)) % -> 13.85 % The STFT implementation of Yu scales the noise std by the L2 norm of the % window. So to scale the spectrogram of Yu to the same scale for the purpose % visualization, we multiply by 1/(sqrt(2)*sqrt(sum(window.^2))). C = sqrt(2)*sqrt(sum(window.^2)); % C : normalization constant for correct intensity scaling displaySTFT(STFTcoef_th / C, fs, R, [-50 0]) xlim([0 2.5]) title('Block thresholding') printme('denoised_blockthresh')
ans = 13.8494 ans = 13.8564