Example 2: Speech denoising using overlapping group shrinkage (OGS)

Speech denoising using overlapping group shrinkage (OGS). The OGS algorithm is applied to the short-time Fourier transform (STFT) of the noisy speech signal. For comparison, speech denoising is also performed using soft thresholding and block thresholding (Yu, Mallat, Bacry, 2008).

Po-Yu Chen and Ivan Selesnick
Polytechnic Institute of New York University
New York, USA
March 2012

Start
Load speech data
Compute spectrogram
STFT of pure noise
Soft thresholding
Overlapping group shrinkage (OGS)
Wiener post-processing
Block thresholding

Start

close all
clear
randn('state',0);               % Initialize state so as to exactly reproduce the example.

printme = @(txt) print('-dpdf', sprintf('figures/Example2_%s', txt));

SNR = @(x, x_denoise) 10*log10(sum(abs(x(:)).^2) / (sum(abs(x(:) - x_denoise(:)).^2)));

Load speech data

Load a noise-free speech waveform, then add white Gaussian noise to it.

[x, fs] = wavread('speech/arctic_a0001.wav');   % x : speech waveform
x = x';                                         % fs : sampling rate (samples/second)

sigma = 0.025;                                  % sigma : noise standard deviation
w = sigma * randn(size(x));                     % w : white noise signal
y = x + w;                                      % y : speech + noise
wavwrite(y, fs, 'speech/speech_noisy')          % write noisy speech to file

N = length(x);                                  % N : length of speech signal
t = (0:N-1)/fs;                                 % t : time axis

% Display noisy speech waveform
figure(1)
clf
plot(t, y)
xlim([0 N]/fs)
xlabel('Time (sec)')
title('Noisy speech')

Compute spectrogram

R = 2^9;                           % R : STFT block length

X = stft(x, R);                    % X : spectrogram (STFT) of x
Y = stft(y, R);                    % Y : spectrogram (STFT) of y

Verify perfect reconstruction property of STFT

err = x - istft(X, R, N);
fprintf('Maximum reconstruction error = %g\n', max(abs(err(:))))

Maximum reconstruction error = 3.33067e-16

Display spectrogram of speech waveform

figure(2)
clf
displaySTFT(X, fs, R, [-50 0])
xlim([0 2.5])
title('Speech (clean)')
printme('clean')

Display spectrogram of noisy signal

figure(3)
clf
displaySTFT(Y, fs, R, [-50 0])
xlim([0 2.5])
title('Speech + noise')
printme('noisy')

STFT of pure noise

Observe the standard deviation (std) in the STFT when the signal is a standard normal white noise.

yy = randn(1,100000);
YY = stft(yy, R);
std(YY(:))  % -> 0.706

% The output std for standad normal white noise signal is 1/sqrt(2) due to the window
% and normalization in STFT command.

ans =

    0.7065

So, the standard deviation of the noise in the STFT is compubed by:

sigma_stft = sigma/sqrt(2);     % sigma_stft : noise std in STFT domain

Soft thresholding

Perform speech denoising using soft thresholding of the STFT of the noisy signal.

T = 3.26 * sigma_stft;          % T : threshold for soft threshold rule
% To select T, we have used Table II in the paper for group size 1x1 with output std = 0.001

% Perform soft thresholding
X1 = soft(Y, T);                % X1 : STFT after soft thresholding

x1 = real(istft(X1, R, N));     % x1 : denoised speech using soft thresholding

fprintf('SNR for soft thresholding: %.2f\n\n', SNR(x, x1));

wavwrite(x1, fs, 'speech/speech_soft')      % write signal to file

figure(3)
clf
displaySTFT(X1, fs, R, [-50 0])
xlim([0 2.5])
title('Soft thresholding')
printme('denoised_soft')

SNR for soft thresholding: 10.13

Overlapping group shrinkage (OGS)

Perform speech denoising using overlapping group shrinkage (OGS) on the STFT.

% Set group size K1 by K2
K1 = 8;                         % K1 : frequency
K2 = 2;                         % K2 : time

lambda = 0.32 * sigma_stft;     % lambda : regularization paameter for OGS
% To select T, we have used Table II in the paper for group size 2x8 with output std = 0.001

Nit = 25;                       % Nit : number of iterations

[X2, cost] = ogshrink2(Y, K1, K2, lambda, Nit);     % Perform OGS

x2 = real(istft(X2, R, N));     % x2 : denoised speech using OGS

fprintf('SNR for overlapping group shrinkage (no Wiener post-processing) : %.2f\n', SNR(x, x2));

wavwrite(x2, fs, 'speech/speech_OGS')

figure(2)
clf
displaySTFT(X2, fs, R, [-50 0])
xlim([0 2.5])
title('Overlapping group shrinkage (OGS)')
printme('denoised_OGS')

SNR for overlapping group shrinkage (no Wiener post-processing) : 13.77

Display cost function history of OGS algorithm

figure(3)
clf
plot(1:Nit, cost,'.-')
title('Cost function history')
xlabel('Iteration')

Wiener post-processing

Apply empirical Wiener filter (Ghael, Sayeed, Baraniuk, 1997) to the STFT obtained using OGS.

X2_wp = Y .* abs(X2).^2 ./ ( abs(X2).^2 + sigma_stft^2 );       % Wiener post-processing
x2_wp = real(istft(X2_wp, R, N));                               % Transform to time-domain

fprintf('SNR for overlapping group shrinkage (with Wiener post-processing): %.2f\n\n', SNR(x, x2_wp));

wavwrite(x2_wp, fs, 'speech/speech_OGS_wp')

SNR for overlapping group shrinkage (with Wiener post-processing): 15.63

Block thresholding

Perform speech denoising using block thresholding (Yu, Mallat, Bacry, 2008). We use the software provided by the authors at http://www.cmap.polytechnique.fr/~yu/research/ABT/samples.html We modified their program so it returns additional outputs so we can obtain the spectrogram.

addpath block_thresholding      % Block thresholding software (Yu, Mallat, Bacry, 2008)

time_win = 32;
[f_rec, AttenFactorMap, flag_depth, STFTcoef_th, f_rec_th] = BlockThresholding_mod(y, time_win, fs, sigma);

% STFTcoef_th   : STFT without Wiener post-processing
% f_rec         : denoised speech using Wiener post-processing
% f_rec_th      : denoised speech with out using Wiener post-processing

wavwrite(f_rec_th, fs, 'speech/speech_blockthresh')
wavwrite(f_rec, fs, 'speech/speech_blockthresh_wp')

fprintf('SNR for block thresholding (no Wiener post-processing): %.2f \n', SNR(x, f_rec_th));
fprintf('SNR for block thresholding (using post-processing): %.2f \n\n', SNR(x, f_rec));

SNR for block thresholding (no Wiener post-processing): 15.35 
SNR for block thresholding (using post-processing): 15.75

Display spectogram of block thresholding result

yy = randn(1,100000);
factor_redund = 1;                  % 2-times over-complete
[YY, window] = STFT_Yu(yy, time_win, factor_redund, fs);
std(YY(:))  % -> 13.78
sqrt(sum(window.^2))   % -> 13.85
% The STFT implementation of Yu scales the noise std by the L2 norm of the
% window. So to scale the spectrogram of Yu to the same scale for the purpose
% visualization, we multiply by 1/(sqrt(2)*sqrt(sum(window.^2))).

C = sqrt(2)*sqrt(sum(window.^2));       % C : normalization constant for correct intensity scaling

displaySTFT(STFTcoef_th / C, fs, R, [-50 0])
xlim([0 2.5])
title('Block thresholding')
printme('denoised_blockthresh')

ans =

   13.8494


ans =

   13.8564