Group-sparse denoising by non-convex regularization: Speech denoising
This example illustrates speech denoising using 2D overlapping group sparsity (OGS) with a non-convex regularizer (the 'atan' function). 2D OGS is applied to the spectrogram.
Po-Yu Chen and Ivan Selesnick, August 2013
Reference: Group-Sparse Signal Denoising: Non-Convex Regularization, Convex Optimization http://arxiv.org/abs/1308.5038
Contents
Set up
close all clear printme = @(txt) ... print('-dpdf', sprintf('figures/Example2_%s', txt));
Load speech signal
[x, Fs] = audioread('arctic_a0001.wav'); x = x(:)'; % Convert to row vector N = length(x); % N : signal length T = N/Fs; % T : signal duration (sec)
Make noisy signal
% snr_dB : SNR level of noisy signal (dB) snr_dB = 10; snr = 10^(snr_dB/10); Px = var(x) + mean(x)^2; Pn = Px/snr; % sigma_n : standard deviation of noise sigma_n = sqrt(Pn); % Set state for reproducibility of example randn('state', 0); % y : noisy speech signal y = x + sigma_n * randn(1, N);
Define STFT function
Nf = 2^9; % FFT length F = @(y) stft(y, Nf); % STFT invF = @(c) istft(c, Nf, N); % inverse STFT Y = F(y); % Y : STFT of noisy speech signal X = F(x); % X : STFT of clean speech signal
STFT of noisy speech
dBlim = [-50 0]; figure(1) clf displaySTFT(Y, Fs, T, dBlim) title( sprintf('Noisy signal (SNR = %2.0f dB)', ... get_snr(x, y)) ) printme('noisy') % sigma_stft: noise standard deviation in STFT domain sigma_stft = sigma_n/sqrt(2);

Soft thresholding
% Calculate lambda by table look up and interpolation stdo = 3E-4; % Group size of 1 by 1 (scalar) lam_factor = find_lam(stdo, 'abs', 'complex', 1, 1); lam = lam_factor * sigma_stft; X_soft = soft(Y, lam); x_soft = real(invF(X_soft)); % Compute SNR snr_soft = get_snr(x, x_soft); fprintf('Soft thresholding: SNR = %.2f dB\n', snr_soft)
Soft thresholding: SNR = 9.60 dB

Display the spectrogram
figure(1) clf displaySTFT(X_soft, Fs, T, dBlim) tstr = sprintf('soft (SNR = %3.2f dB, \\lambda = %3.2f\\sigma)', ... snr_soft, lam_factor); title( tstr ); printme('soft')

OGS[abs]
% (K1, K2) : Two-dimensional group size in STFT domain K1 = 8; % K1 : number of spectral samples K2 = 2; % K2 : number of temporal samples Nit = 25; % Nit : number of iterations of OGS algorithm % Calculate lambda by table look up and interpolation stdo = 3E-4; lam_factor = find_lam(stdo, 'abs', 'complex', K1, K2); lam = lam_factor * sigma_stft; % Run OGS[abs] algorithm X_ogs_abs = ogs2(Y, K1, K2, lam, 'abs', 0, Nit); % inverse STFT x_ogs_abs = real(invF(X_ogs_abs)); % Compute SNR snr_ogs_abs = get_snr(x, x_ogs_abs); fprintf('OGS[abs]: SNR = %.2f dB\n', snr_ogs_abs)
OGS[abs]: SNR = 13.36 dB

Spectrogram
figure(3) clf displaySTFT(X_ogs_abs, Fs, T, dBlim) tstr = sprintf('OGS[abs] (SNR = %3.2f dB, \\lambda = %3.2f\\sigma, %d iterations)', ... snr_ogs_abs, lam_factor, Nit); title( tstr ); printme('ogs_abs')

OGS[atan]
% Calculate lambda by table look up and interpolation stdo = 3E-4; lam_factor = find_lam(stdo, 'atan', 'complex', K1, K2); lam = lam_factor * sigma_stft; % Run OGS[tan] algorithm X_ogs_atan = ogs2(Y, K1, K2, lam, 'atan', 1, Nit); % inverse STFT x_ogs_atan = real(invF(X_ogs_atan)); % Compute SNR snr_ogs_atan = get_snr(x, x_ogs_atan); fprintf('OGS[atan]: SNR = %.2f dB\n', snr_ogs_atan)
OGS[atan]: SNR = 16.28 dB

Spectrogram
figure(4) clf displaySTFT(X_ogs_atan, Fs, T, dBlim) tstr = sprintf('OGS[atan] (SNR = %3.2f dB, \\lambda = %3.2f\\sigma, %d iterations)', ... snr_ogs_atan, lam_factor, Nit); title( tstr ) printme('ogs_atan')

SNR summary
fprintf('\n') fprintf('Summary:\n') fprintf('Soft thresholding: SNR = %.2f dB\n', snr_soft) fprintf('OGS[abs]: SNR = %.2f dB\n', snr_ogs_abs) fprintf('OGS[atan]: SNR = %.2f dB\n', snr_ogs_atan)
Summary: Soft thresholding: SNR = 9.60 dB OGS[abs]: SNR = 13.36 dB OGS[atan]: SNR = 16.28 dB
Listen to results
Uncomment following lines to hear the denoised speech signals.
% sound(x_soft, Fs); % % sound(x_ogs_abs, Fs); % % sound(x_ogs_atan, Fs);
Wiener post-processing
For this example, Wiener post-processing improves soft-thresholding and OGS[abs] but it does not improve OGS[atan].
% soft thresholing X_soft_wp = Y .* abs(X_soft).^2 ./ ( abs(X_soft).^2 + sigma_stft^2 ); x_soft_wp = invF(X_soft_wp); % OGS[abs] X_ogs_abs_wp = Y .* abs(X_ogs_abs).^2 ./ ( abs(X_ogs_abs).^2 + sigma_stft^2 ); x_ogs_abs_wp = invF(X_ogs_abs_wp); % OGS[atan] X_ogs_atan_wp = Y .* abs(X_ogs_atan).^2 ./ ( abs(X_ogs_atan).^2 + sigma_stft^2 ); x_ogs_atan_wp = invF(X_ogs_atan_wp); % Compute SNR values snr_soft_wp = get_snr(x, x_soft_wp); snr_ogs_abs_wp = get_snr(x, x_ogs_abs_wp); snr_ogs_atan_wp = get_snr(x, x_ogs_atan_wp); % Display SNR values fprintf('\n') fprintf('With Wiener post-processing:\n') fprintf('Soft thresholding: SNR = %.2f dB\n', snr_soft_wp) fprintf('OGS[abs]: SNR = %.2f dB\n', snr_ogs_abs_wp) fprintf('OGS[atan]: SNR = %.2f dB\n', snr_ogs_atan_wp)
With Wiener post-processing: Soft thresholding: SNR = 12.23 dB OGS[abs]: SNR = 15.36 dB OGS[atan]: SNR = 16.21 dB