Download English ISI article No. 124385
Translated article title

Non-intrusive speech quality estimation as a combination of estimates using multiple time-scale auditory features

English title
Non-intrusive speech quality estimation as combination of estimates using multiple time-scale auditory features
Article code : 124385
Publication year : 2017
Length of English article : 11 pages (PDF)
Source

Publisher : Elsevier - Science Direct

Journal : Digital Signal Processing, Volume 70, November 2017, Pages 114-124

Translated keywords
Non-intrusive; Speech quality; Multiple time-scale features; Auditory model; Degraded speech
English keywords
Non-intrusive; Speech quality; Multiple time-scale features; Auditory model; Degraded speech;

English abstract

The human auditory system is represented by various auditory models, which describe the distribution of speech sound energy across cochlear channels using filter banks of different bandwidths. In previous non-intrusive speech quality evaluation algorithms, auditory features are computed with these models on a per-frame basis and then averaged over the entire speech utterance. In these approaches, the effects of impulsive noise and other non-stationary noise are averaged out over the utterance. Because speech features vary from frame to frame, a multiple time-scale features approach has been proposed to capture the variation of features over time within the utterance. This approach accounts for the variation of noise characteristics over the utterance and thus for its effect on quality mapping. In this work, non-intrusive speech quality evaluation is performed as an optimal linear combination of quality mappings, called objective mean opinion scores (MOS), computed using multiple time-scale estimates of features. The objective MOS of each of the multiple time-scale estimates (the combination of multiple active speeches) is obtained using a probabilistic approach. The overall objective MOS of the speech utterance is computed as the optimal linear combination of the objective MOS values estimated from the multiple time-scale feature estimates, where optimality is based on the minimum mean square error (MMSE) criterion or the correlation maximization criterion. The results are reported in terms of Pearson's correlation coefficient and root mean square error (RMSE) between the subjective MOS and the estimated overall objective MOS for three different standard databases. The results are compared with a single time-scale features approach, ITU-T Recommendation P.563, and recent algorithms.
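
The combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the per-time-scale objective MOS values are already available, fits the combination weights by ordinary least squares (one common reading of the MMSE criterion) without a bias term, and then scores the result with Pearson's correlation coefficient and RMSE. All function names and the toy numbers are hypothetical and chosen only for illustration.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: estimates may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mmse_weights(estimates, subjective):
    """Least-squares weights for combining per-time-scale objective MOS.

    estimates: one row per utterance, one objective MOS per time scale.
    subjective: one subjective MOS per utterance.
    Solves the normal equations (E^T E) w = E^T s.
    """
    k = len(estimates[0])
    gram = [[sum(row[i] * row[j] for row in estimates) for j in range(k)]
            for i in range(k)]
    rhs = [sum(row[i] * s for row, s in zip(estimates, subjective))
           for i in range(k)]
    return solve_linear_system(gram, rhs)


def combine(weights, row):
    """Overall objective MOS for one utterance."""
    return sum(w * v for w, v in zip(weights, row))


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root mean square error between two equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Toy data (made up): 5 utterances, 2 time scales. The subjective scores are
# constructed as an exact 0.6/0.4 mix so the fit is easy to check by eye.
toy_estimates = [[1.0, 2.0], [2.0, 1.5], [3.0, 3.5], [4.0, 3.0], [4.5, 4.8]]
toy_subjective = [0.6 * a + 0.4 * b for a, b in toy_estimates]

toy_weights = mmse_weights(toy_estimates, toy_subjective)
toy_overall = [combine(toy_weights, row) for row in toy_estimates]
print(toy_weights, pearson(toy_overall, toy_subjective),
      rmse(toy_overall, toy_subjective))
```

In practice the weights would be fitted on training utterances and evaluated on held-out data. The correlation maximization criterion mentioned in the abstract would need a different objective and is not covered by this sketch.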