Download English-language ISI article No. 124389
Persian translation of the article title

تقویت طیفی مبتنی بر انسجام شدید برای تشخیص گفتار در محیط های ناخوشایند در دنیای واقعی

English title
Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments
Article code: 124389
Publication year: 2017
Length of English article: 13 pages (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: Computer Speech & Language, Volume 46, November 2017, Pages 388-400

Persian translation of the keywords
شناسایی قوی سخنرانی، پست فیلترینگ، افزایش طیفی، نسبت قدرت سازگاری به پخش، فیلتر وینر
English keywords
Robust speech recognition; Postfiltering; Spectral enhancement; Coherent-to-diffuse power ratio; Wiener filter
English abstract

Speech recognition in adverse real-world environments is highly affected by reverberation and non-stationary background noise. A well-known strategy to reduce such undesired signal components in multi-microphone scenarios is spatial filtering of the microphone signals. In this article, we demonstrate that an additional coherence-based postfilter, which is applied to the beamformer output signal to remove diffuse interference components from the latter, is an effective means to further improve the recognition accuracy of modern deep learning speech recognition systems. To this end, the 3rd CHiME Speech Separation and Recognition Challenge (CHiME-3) baseline speech enhancement system is extended by a coherence-based postfilter and the postfilter’s impact on the Word Error Rates (WERs) of a state-of-the-art automatic speech recognition system is investigated for the realistic noisy environments provided by CHiME-3. To determine the time- and frequency-dependent postfilter gains, we use Direction-of-Arrival (DOA)-dependent and DOA-independent estimators of the coherent-to-diffuse power ratio as an approximation of the short-time signal-to-noise ratio. Our experiments show that incorporating coherence-based postfiltering into the CHiME-3 baseline speech enhancement system leads to a significant reduction of the WERs, with relative improvements of up to 11.31%.
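
The abstract does not state the gain rule itself, so the following is only a minimal sketch of how a coherent-to-diffuse power ratio (CDR) estimate could be turned into time- and frequency-dependent postfilter gains. It assumes the textbook Wiener gain SNR / (1 + SNR) with the CDR substituted for the short-time SNR, plus a gain floor. The function names, the `gain_floor` parameter, and its default value are hypothetical and not taken from the article; the CDR estimator is assumed to be supplied elsewhere.

```python
import numpy as np


def wiener_gain_from_cdr(cdr, gain_floor=0.1):
    """Map linear-scale CDR estimates (one per time-frequency bin) to gains.

    Assumption: the CDR stands in for the short-time SNR in the classic
    Wiener rule G = SNR / (1 + SNR). gain_floor is a hypothetical lower
    bound that limits how strongly any bin may be attenuated.
    """
    cdr = np.asarray(cdr, dtype=float)
    # Estimation noise can yield negative CDR values; clip them to zero.
    cdr = np.maximum(cdr, 0.0)
    # 1 - 1/(1 + CDR) equals CDR/(1 + CDR) and stays finite for CDR = inf.
    gain = 1.0 - 1.0 / (1.0 + cdr)
    return np.maximum(gain, gain_floor)


def apply_postfilter(beamformer_stft, cdr, gain_floor=0.1):
    """Scale the complex beamformer output STFT by the real-valued gains."""
    gains = wiener_gain_from_cdr(cdr, gain_floor)
    return gains * np.asarray(beamformer_stft)
```

Under these assumptions, bins dominated by diffuse components (CDR near 0) are attenuated down to the floor, while bins dominated by the coherent target signal (large CDR) pass almost unchanged.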