Article title
Error detection and accuracy estimation in automatic speech recognition using deep bidirectional recurrent neural networks
Article code: 124357
Publication year: 2017
Length of English article: 19 pages (PDF)
Source

Publisher: Elsevier - Science Direct

Journal: Speech Communication, Volume 89, May 2017, Pages 70-83

Keywords
Automatic speech recognition; Error detection; Accuracy estimation; Conditional random fields; Deep bidirectional recurrent neural networks

Abstract

Recurrent neural networks (RNNs) have recently been applied as classifiers for sequential labeling problems. In this paper, deep bidirectional RNNs (DBRNNs) are applied to error detection in automatic speech recognition (ASR), which is a sequential labeling problem. We investigate three types of ASR error detection tasks: confidence estimation, out-of-vocabulary word detection, and error type classification. We also estimate ASR accuracy, i.e., percent correct and word accuracy, from the error type classification results. Experimental results using English and Japanese lecture speech corpora show that the DBRNNs greatly outperform conditional random fields (CRFs) and the other NN structures, i.e., deep feedforward NNs (DNNs) and deep unidirectional RNNs (DURNNs). These improvements arise because the DBRNNs can take the longer bidirectional context of input feature vectors into account and can model highly nonlinear relationships between the input feature vectors and output labels. In detailed analyses, the DBRNNs show a better generalization ability than the CRFs. This result is attributed to the ability of the DBRNNs to represent (embed) the words in a low-dimensional continuous-valued vector space. In addition, the superiority of the DBRNNs to the DNNs and DURNNs indicates that the average length of the context of the input feature vectors required for ASR error detection is only a few time steps, although it changes (lengthens) depending on the situation.
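
The abstract mentions estimating percent correct and word accuracy from error type classification results. The following is a minimal sketch of that arithmetic using the standard ASR definitions, not the paper's own implementation. The label alphabet ("C", "S", "I", "D"), the function name estimate_accuracy, and the assumption that deletions appear as explicit labels are all illustrative choices; the abstract does not say how the paper encodes them.

```python
from collections import Counter


def estimate_accuracy(labels):
    """Estimate percent correct and word accuracy from error-type labels.

    labels: iterable of "C" (correct), "S" (substitution),
            "I" (insertion), "D" (deletion).
    This label set is an assumption made for illustration.
    """
    counts = Counter(labels)
    c, s, i, d = (counts[k] for k in "CSID")

    # Reference length: every reference word is either recognized
    # correctly, substituted, or deleted. Insertions add no reference words.
    n_ref = c + s + d
    if n_ref == 0:
        raise ValueError("no reference words in the label sequence")

    percent_correct = 100.0 * c / n_ref      # ignores insertions
    word_accuracy = 100.0 * (c - i) / n_ref  # also penalizes insertions
    return percent_correct, word_accuracy


# Hypothetical classifier output for one utterance.
labels = ["C", "C", "S", "C", "I", "C", "D", "C"]
pc, wa = estimate_accuracy(labels)
print(f"percent correct: {pc:.1f}%, word accuracy: {wa:.1f}%")
```

Word accuracy is never higher than percent correct, because it additionally subtracts insertions; with many insertions it can become negative.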