Example 2: Speech denoising using overlapping group shrinkage (OGS)

Speech denoising using overlapping group shrinkage (OGS). The OGS algorithm is applied to the short-time Fourier transform (STFT) of the noisy speech signal. For comparison, speech denoising is also performed using soft thresholding and block thresholding (Yu, Mallat, Bacry, 2008).

Po-Yu Chen and Ivan Selesnick
Polytechnic Institute of New York University
New York, USA
March 2012

Contents

Start

close all
clear
randn('state',0);               % Initialize state so as to exactly reproduce the example.

printme = @(txt) print('-dpdf', sprintf('figures/Example2_%s', txt));

SNR = @(x, x_denoise) 10*log10(sum(abs(x(:)).^2) / (sum(abs(x(:) - x_denoise(:)).^2)));

Load speech data

Load a noise-free speech waveform, then add white Gaussian noise to it.

[x, fs] = wavread('speech/arctic_a0001.wav');   % x : speech waveform
x = x';                                         % fs : sampling rate (samples/second)

sigma = 0.025;                                  % sigma : noise standard deviation
w = sigma * randn(size(x));                     % w : white noise signal
y = x + w;                                      % y : speech + noise
wavwrite(y, fs, 'speech/speech_noisy')          % write noisy speech to file

N = length(x);                                  % N : length of speech signal
t = (0:N-1)/fs;                                 % t : time axis

% Display noisy speech waveform
figure(1)
clf
plot(t, y)
xlim([0 N]/fs)
xlabel('Time (sec)')
title('Noisy speech')

Compute spectrogram

R = 2^9;                           % R : STFT block length

X = stft(x, R);                    % X : spectrogram (STFT) of x
Y = stft(y, R);                    % Y : spectrogram (STFT) of y

Verify perfect reconstruction property of STFT

err = x - istft(X, R, N);
fprintf('Maximum reconstruction error = %g\n', max(abs(err(:))))
Maximum reconstruction error = 3.33067e-16

Display spectrogram of speech waveform

figure(2)
clf
displaySTFT(X, fs, R, [-50 0])
xlim([0 2.5])
title('Speech (clean)')
printme('clean')

Display spectrogram of noisy signal

figure(3)
clf
displaySTFT(Y, fs, R, [-50 0])
xlim([0 2.5])
title('Speech + noise')
printme('noisy')

STFT of pure noise

Observe the standard deviation (std) in the STFT when the signal is a standard normal white noise.

yy = randn(1,100000);
YY = stft(yy, R);
std(YY(:))  % -> 0.706

% The output std for standad normal white noise signal is 1/sqrt(2) due to the window
% and normalization in STFT command.
ans =

    0.7065

So, the standard deviation of the noise in the STFT is compubed by:

sigma_stft = sigma/sqrt(2);     % sigma_stft : noise std in STFT domain

Soft thresholding

Perform speech denoising using soft thresholding of the STFT of the noisy signal.

T = 3.26 * sigma_stft;          % T : threshold for soft threshold rule
% To select T, we have used Table II in the paper for group size 1x1 with output std = 0.001

% Perform soft thresholding
X1 = soft(Y, T);                % X1 : STFT after soft thresholding

x1 = real(istft(X1, R, N));     % x1 : denoised speech using soft thresholding

fprintf('SNR for soft thresholding: %.2f\n\n', SNR(x, x1));

wavwrite(x1, fs, 'speech/speech_soft')      % write signal to file

figure(3)
clf
displaySTFT(X1, fs, R, [-50 0])
xlim([0 2.5])
title('Soft thresholding')
printme('denoised_soft')
SNR for soft thresholding: 10.13

Overlapping group shrinkage (OGS)

Perform speech denoising using overlapping group shrinkage (OGS) on the STFT.

% Set group size K1 by K2
K1 = 8;                         % K1 : frequency
K2 = 2;                         % K2 : time

lambda = 0.32 * sigma_stft;     % lambda : regularization paameter for OGS
% To select T, we have used Table II in the paper for group size 2x8 with output std = 0.001

Nit = 25;                       % Nit : number of iterations

[X2, cost] = ogshrink2(Y, K1, K2, lambda, Nit);     % Perform OGS

x2 = real(istft(X2, R, N));     % x2 : denoised speech using OGS

fprintf('SNR for overlapping group shrinkage (no Wiener post-processing) : %.2f\n', SNR(x, x2));

wavwrite(x2, fs, 'speech/speech_OGS')

figure(2)
clf
displaySTFT(X2, fs, R, [-50 0])
xlim([0 2.5])
title('Overlapping group shrinkage (OGS)')
printme('denoised_OGS')
SNR for overlapping group shrinkage (no Wiener post-processing) : 13.77

Display cost function history of OGS algorithm

figure(3)
clf
plot(1:Nit, cost,'.-')
title('Cost function history')
xlabel('Iteration')

Wiener post-processing

Apply empirical Wiener filter (Ghael, Sayeed, Baraniuk, 1997) to the STFT obtained using OGS.

X2_wp = Y .* abs(X2).^2 ./ ( abs(X2).^2 + sigma_stft^2 );       % Wiener post-processing
x2_wp = real(istft(X2_wp, R, N));                               % Transform to time-domain

fprintf('SNR for overlapping group shrinkage (with Wiener post-processing): %.2f\n\n', SNR(x, x2_wp));

wavwrite(x2_wp, fs, 'speech/speech_OGS_wp')
SNR for overlapping group shrinkage (with Wiener post-processing): 15.63

Block thresholding

Perform speech denoising using block thresholding (Yu, Mallat, Bacry, 2008). We use the software provided by the authors at http://www.cmap.polytechnique.fr/~yu/research/ABT/samples.html We modified their program so it returns additional outputs so we can obtain the spectrogram.

addpath block_thresholding      % Block thresholding software (Yu, Mallat, Bacry, 2008)

time_win = 32;
[f_rec, AttenFactorMap, flag_depth, STFTcoef_th, f_rec_th] = BlockThresholding_mod(y, time_win, fs, sigma);

% STFTcoef_th   : STFT without Wiener post-processing
% f_rec         : denoised speech using Wiener post-processing
% f_rec_th      : denoised speech with out using Wiener post-processing

wavwrite(f_rec_th, fs, 'speech/speech_blockthresh')
wavwrite(f_rec, fs, 'speech/speech_blockthresh_wp')

fprintf('SNR for block thresholding (no Wiener post-processing): %.2f \n', SNR(x, f_rec_th));
fprintf('SNR for block thresholding (using post-processing): %.2f \n\n', SNR(x, f_rec));
SNR for block thresholding (no Wiener post-processing): 15.35 
SNR for block thresholding (using post-processing): 15.75 

Display spectogram of block thresholding result

yy = randn(1,100000);
factor_redund = 1;                  % 2-times over-complete
[YY, window] = STFT_Yu(yy, time_win, factor_redund, fs);
std(YY(:))  % -> 13.78
sqrt(sum(window.^2))   % -> 13.85
% The STFT implementation of Yu scales the noise std by the L2 norm of the
% window. So to scale the spectrogram of Yu to the same scale for the purpose
% visualization, we multiply by 1/(sqrt(2)*sqrt(sum(window.^2))).

C = sqrt(2)*sqrt(sum(window.^2));       % C : normalization constant for correct intensity scaling

displaySTFT(STFTcoef_th / C, fs, R, [-50 0])
xlim([0 2.5])
title('Block thresholding')
printme('denoised_blockthresh')
ans =

   13.8494


ans =

   13.8564