ISI article (English) No. 157494
Persian translation of the article title

معماری پرتوهای صوتی عصبی عمومی برای پردازش گفتار چند کاناله قوی

English title
A generic neural acoustic beamforming architecture for robust multi-channel speech processing
Article code: 157494
Publication year: 2017
Length of English article: 12 pages (PDF)
Source

Publisher: Elsevier - Science Direct

Journal: Computer Speech & Language, Volume 46, November 2017, Pages 374-385

English abstract

Acoustic beamforming can greatly improve the performance of Automatic Speech Recognition (ASR) and speech enhancement systems when multiple channels are available. We recently proposed a way to support the model-based Generalized Eigenvalue beamforming operation with a powerful neural network for spectral mask estimation. The enhancement system has a number of desirable properties. In particular, no assumptions need to be made about the nature of the acoustic transfer function (e.g., that it is anechoic), and the array configuration does not need to be known. While the system was originally developed to enhance speech in noisy environments, we show in this article that it is also effective in suppressing reverberation, thus leading to a generic trainable multi-channel speech enhancement system for robust speech processing. To support this claim, we consider two distinct datasets: the CHiME 3 challenge, which features challenging real-world noise distortions, and the REVERB challenge, which focuses on distortions caused by reverberation. We evaluate the system with respect to both a speech enhancement task and a recognition task. For the first task, we propose a new way to cope with the distortions introduced by the Generalized Eigenvalue beamformer by renormalizing the target energy for each frequency bin, and we measure its effectiveness in terms of the PESQ score. For the latter, we feed the enhanced signal to a strong DNN back-end and achieve state-of-the-art ASR results on both datasets. We further experiment with different network architectures for spectral mask estimation: one small feed-forward network with only one hidden layer, one Convolutional Neural Network, and one bi-directional Long Short-Term Memory network, showing that even a small network is capable of delivering significant performance improvements.
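
The abstract gives no formulas, so the following is only a minimal sketch of how mask-based Generalized Eigenvalue (GEV) beamforming is commonly set up for a single frequency bin. It is not the article's implementation. The function names (`masked_psd`, `gev_weights`, `snr`, `beamform`) and the synthetic data are hypothetical, the masks are given directly rather than estimated by a neural network, and the per-bin energy renormalization mentioned in the abstract is not reproduced.

```python
import numpy as np


def masked_psd(Y, mask):
    """Mask-weighted spatial covariance for one frequency bin.

    Y: (D, T) complex STFT values (D channels, T frames).
    mask: (T,) weights in [0, 1]. Assumed given here; the article
    estimates such masks with a neural network.
    """
    weighted = Y * mask  # broadcast the mask over channels
    return (weighted @ Y.conj().T) / max(mask.sum(), 1e-10)


def gev_weights(phi_xx, phi_nn):
    """Principal generalized eigenvector of (phi_xx, phi_nn).

    Solves phi_xx w = lam * phi_nn w by whitening with the Cholesky
    factor of phi_nn, so only NumPy is needed.
    """
    D = phi_nn.shape[0]
    # Tiny diagonal loading (an assumption) keeps phi_nn positive definite.
    phi_nn = phi_nn + 1e-10 * np.trace(phi_nn).real / D * np.eye(D)
    L = np.linalg.cholesky(phi_nn)       # phi_nn = L @ L^H
    L_inv = np.linalg.inv(L)
    A = L_inv @ phi_xx @ L_inv.conj().T  # whitened speech covariance
    A = (A + A.conj().T) / 2             # enforce Hermitian symmetry
    eigvals, eigvecs = np.linalg.eigh(A) # eigenvalues in ascending order
    w = L_inv.conj().T @ eigvecs[:, -1]  # map back: w = L^{-H} v
    return w, eigvals[-1]


def snr(w, phi_xx, phi_nn):
    """Rayleigh quotient that the GEV beamformer maximizes."""
    return (w.conj() @ phi_xx @ w).real / (w.conj() @ phi_nn @ w).real


def beamform(w, Y):
    """Apply the beamformer to every frame: w^H y_t."""
    return w.conj() @ Y


# Synthetic single-bin example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
D, T = 4, 500
steering = np.exp(-1j * np.pi * 0.3 * np.arange(D))
speech = rng.standard_normal(T) + 1j * rng.standard_normal(T)
active = (np.arange(T) % 2 == 0).astype(float)  # speech on even frames only
noise = 0.5 * (rng.standard_normal((D, T)) + 1j * rng.standard_normal((D, T)))
Y = steering[:, None] * (speech * active) + noise

phi_xx = masked_psd(Y, active)        # frames marked as speech-dominated
phi_nn = masked_psd(Y, 1.0 - active)  # frames marked as noise-dominated
w, lam = gev_weights(phi_xx, phi_nn)
enhanced = beamform(w, Y)
```

Note that the sketch uses only the two covariance estimates and needs neither a steering vector nor the array geometry, which matches the property the abstract highlights. A full system would repeat this for every frequency bin and then correct the arbitrary per-bin scaling of `w`.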