English ISI article No. 24692

Title
A new a priori SNR estimator based on multiple linear regression technique for speech enhancement
Article code: 24692
Year of publication: 2014
Length of English article: 11 pages (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: Digital Signal Processing, Volume 30, July 2014, Pages 154–164

Keywords
Speech enhancement, A priori SNR estimation, Multiple linear regression, Gaussian mixture model

Abstract

We propose a new approach to estimating the a priori signal-to-noise ratio (SNR) based on a multiple linear regression (MLR) technique. In contrast to the decision-directed (DD) method, which estimates the a priori SNR from the speech spectrum estimated in the previous frame, the proposed method finds the a priori SNR with the MLR technique, using as regression variables the ratio between the local energy of the noisy speech and its derived minimum together with the a posteriori SNR. In the testing stage, the regression coefficients obtained with the MLR are assigned according to the noise type, which is determined by a real-time noise classification scheme based on a Gaussian mixture model (GMM). Evaluations using both objective speech quality measures and subjective listening tests under various ambient noise environments show that the proposed algorithm performs better than the conventional methods.

Introduction

Speech enhancement is a key component of speech communication systems such as robust speech recognition, mobile communication, and speech coding, all of which must cope with background acoustic noise [1]–[11]. The a priori signal-to-noise ratio (SNR) is one of the most crucial parameters in speech enhancement [12]–[16]. In particular, the a priori SNR must be estimated carefully to reduce musical noise and speech distortion in minimum mean squared error (MMSE)-based spectral gain estimation [17], [18]. However, accurate estimation of the a priori SNR is difficult in non-stationary noise environments [19]. Over the past few years, several studies have therefore addressed a priori SNR estimation [12]–[19]. For example, Ephraim and Malah found that inaccurate a priori SNR estimation could significantly degrade speech enhancement performance [1], and they first used a maximum likelihood (ML) method to estimate the a priori SNR. They also introduced the decision-directed (DD) approach, which estimates the a priori SNR from its definition and its relationship with the a posteriori SNR [1]. Alternatively, Cohen proposed a noncausal (NC) a priori SNR estimator that uses future spectral signals to estimate the spectral variance of the clean speech signal [13]. However, this approach has limited applications because noncausal estimation always introduces an additional delay [20]. Recently, Park and Chang proposed a novel a priori SNR estimator that employs a sigmoid-type function [6]. Unfortunately, this scheme does not account for the diversity of noise environments [21]. More recently, Suhadi et al. proposed a data-driven approach that employs two trained neural networks to estimate the a priori SNR [20].
However, this approach requires a substantial training process, so it cannot be a robust estimator under varying noise environments. Among the previous works, the DD approach of Ephraim and Malah is often preferred for estimating the a priori SNR because it is the most computationally efficient method with acceptable performance. However, the DD approach also has drawbacks, such as a slow response to speech onsets [14], [20]. Indeed, the DD estimate of the true a priori SNR combines an a priori SNR estimate obtained from the previous frame with an estimate based on the current a posteriori SNR. The weight of the estimate from the previous frame is substantially larger than that of the estimate based on the current frame [14]. This characteristic of the DD approach therefore causes a delay in the a priori SNR estimate, especially at speech onsets, possibly degrading the quality of the enhanced speech signal. In particular, Breithaupt and Martin [22] showed that, in low SNR conditions, the DD-based SNR estimator is limited in how well it can reduce noise while preserving speech onsets and suppressing musical noise. In addition, it has been found that a subband-weighting rule can be trained separately for various noise environments and stored in a look-up table [23]. For example, Erkelens et al. [24] proposed a data-driven weighting method that uses a training step under white noise with known spectral variance to address the bias problem, especially at low SNR. However, it tends to yield inaccurate noise estimates across varied noise conditions and is computationally inefficient. For this reason, an efficient method for estimating the a priori SNR is needed, one that can be applied to a variety of noise environments without a considerable training process.
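
To make the onset delay of the DD approach concrete, the following is a minimal Python sketch of the standard DD recursion for a single frequency bin. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the smoothing weight alpha = 0.98 is a commonly used value, the Wiener gain stands in for the MMSE spectral gain to keep the example short, and the function name, the floor `xi_min`, and the toy input are hypothetical.

```python
def decision_directed(noisy_power, noise_var, alpha=0.98, xi_min=1e-3):
    """Track the a priori SNR of one frequency bin with the DD recursion.

    noisy_power: per-frame |Y|^2 values of the noisy speech.
    noise_var:   noise variance, assumed known and constant here.
    alpha:       weight of the previous-frame estimate (assumed 0.98).
    """
    xi_track = []
    prev_clean_power = 0.0  # |A_hat|^2 estimated in the previous frame
    for y2 in noisy_power:
        gamma = y2 / noise_var  # a posteriori SNR of the current frame
        # Large weight on the previous frame, small weight on the current one.
        xi = (alpha * prev_clean_power / noise_var
              + (1.0 - alpha) * max(gamma - 1.0, 0.0))
        xi = max(xi, xi_min)  # floor to avoid a zero estimate
        gain = xi / (1.0 + xi)  # Wiener gain, used here as a simple stand-in
        prev_clean_power = (gain ** 2) * y2
        xi_track.append(xi)
    return xi_track


# Toy input (hypothetical): noise only for 5 frames, then speech with a
# true a priori SNR of 100 from frame 5 onward.
frames = [1.0] * 5 + [101.0] * 10
track = decision_directed(frames, noise_var=1.0)
for index, value in enumerate(track):
    print(index, round(value, 2))
```

In this toy run the estimate at the onset frame stays far below the true value of 100 and only approaches it over the following frames, which is the slow-onset behavior described above.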
Thus, we propose a novel approach to estimating the a priori SNR using multiple linear regression (MLR) [25]–[27]. We apply the MLR to overcome the aforementioned problem of the DD algorithm because, unlike the DD approach, the MLR does not use the a priori SNR estimate obtained from processing the previous frame. Specifically, the MLR can estimate the best-fitting surface of a suitable function that relates the independent and dependent variables [27]. As independent variables, we use the ratio between the local energy of the noisy speech and its derived minimum, S_r [8], which is known to have characteristics similar to those of the a priori SNR estimate [28], along with the estimated a posteriori SNR; the true SNR serves as the dependent variable. In the training process, we estimate the regression coefficients, which represent the best-fitting surface between the independent and dependent variables, so that the estimated a priori SNR fits the true SNR better than the conventional estimators do. In the testing process, the regression coefficients are assigned according to the noise type, which is determined by a real-time noise classification scheme using the Gaussian mixture model (GMM) [21], [29]. The performance of the proposed algorithm is evaluated using extensive objective and subjective speech quality measures in various noise environments. The experimental results reveal that the proposed method outperforms the conventional methods.
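
The training step described above can be sketched as an ordinary least-squares fit of a plane to (S_r, a posteriori SNR, true SNR) triples. The Python sketch below is illustrative only: the exact regression model, variable scaling, and training data of the paper are not given in this preview, so the linear form b0 + b1*S_r + b2*gamma, the function names, and the synthetic training set are all assumptions.

```python
def fit_mlr(features, targets):
    """Ordinary least squares via the normal equations.

    features: list of (s_r, gamma) tuples (independent variables).
    targets:  list of true SNR values (dependent variable).
    Returns [b0, b1, b2] with b0 the intercept.
    """
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(coeffs, feature):
    """Evaluate the fitted plane at one (s_r, gamma) point."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], feature))


# Synthetic, noise-free training set (hypothetical) generated from a known
# plane, so the fit should recover its coefficients.
train_x = [(float(s), g) for s in range(1, 6) for g in (0.5, 2.0, 7.0)]
train_y = [1.5 + 0.4 * s + 0.6 * g for s, g in train_x]
coeffs = fit_mlr(train_x, train_y)
print([round(c, 6) for c in coeffs])
print(round(predict(coeffs, (2.0, 3.0)), 6))
```

Unlike the DD recursion, the prediction here depends only on quantities from the current frame, which is why the approach avoids the dependence on the previous-frame estimate.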

Conclusion

In this paper, we proposed a new a priori SNR estimation method based on the MLR technique. The principal contribution of this paper is a method that accurately estimates the a priori SNR by using different regression coefficients selected according to the noise classification performed by the GMM technique. We evaluated the performance of the proposed method using both objective and subjective speech quality measures under various noise environments. Our experimental results showed that the proposed algorithm performs better than conventional algorithms, at the cost of an extra computational burden.
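
The idea of selecting regression coefficients by noise class can be sketched as follows in Python. This is a hedged illustration, not the paper's classifier: the diagonal-covariance GMM form, the two-dimensional feature vector, the noise-type names, and every numeric parameter below are hypothetical placeholders, since the preview does not specify the features or trained models.

```python
import math


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    gmm: list of (weight, means, variances) tuples, one per component.
    """
    log_terms = []
    for weight, means, variances in gmm:
        log_pdf = sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, means, variances)
        )
        log_terms.append(math.log(weight) + log_pdf)
    peak = max(log_terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def classify_noise(x, gmms):
    """Return the noise type whose GMM gives x the highest likelihood."""
    return max(gmms, key=lambda name: gmm_log_likelihood(x, gmms[name]))


def estimate_a_priori_snr(x, s_r, gamma, gmms, coeff_table):
    """Pick the coefficients of the detected noise type and apply them."""
    b0, b1, b2 = coeff_table[classify_noise(x, gmms)]
    return b0 + b1 * s_r + b2 * gamma


# Hypothetical models and coefficients, for illustration only.
NOISE_GMMS = {
    "white": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "babble": [(0.5, [3.0, 3.0], [1.0, 1.0]), (0.5, [5.0, 1.0], [1.0, 1.0])],
}
COEFFS = {"white": [0.5, 0.3, 0.6], "babble": [1.0, 0.2, 0.7]}

print(classify_noise([0.2, -0.1], NOISE_GMMS))
print(classify_noise([4.8, 1.2], NOISE_GMMS))
print(estimate_a_priori_snr([4.8, 1.2], 2.0, 3.0, NOISE_GMMS, COEFFS))
```

Keeping one small coefficient set per noise type means that only the classifier and a table lookup run at test time, in addition to the linear prediction itself.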