Maximum a posteriori linear regression for language recognition
|Article code||Year||English paper||Persian translation||Word count|
|24603||2012||5-page PDF||available to order||3610 words|
Publisher : Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 39, Issue 4, March 2012, Pages 4287–4291
This paper proposes the use of Maximum A Posteriori Linear Regression (MAPLR) transforms as features for language recognition. Rather than estimating the transforms using maximum likelihood linear regression (MLLR), MAPLR incorporates prior information about the transforms into the estimation process, using maximum a posteriori (MAP) as the estimation criterion to derive the transforms. Through multiple MAPLR adaptations, each spoken utterance is converted into one discriminative transform supervector consisting of one target-language transform vector and several non-target transform vectors. SVM classifiers are employed to model the discriminative MAPLR transform supervectors. This system achieves performance comparable to state-of-the-art approaches and better than MLLR. Experimental results on the 2007 NIST Language Recognition Evaluation (LRE) databases show relative reductions of 4% in EER and 9% in minimum cost when the language recognition system uses MAPLR instead of MLLR in the 30-s tasks, and further improvement is gained by combining it with state-of-the-art systems. This leads to gains of 6% in EER and 11% in minDCF compared with the performance of the combination of only the MMI system and the GMM–SVM system.
The aim of language recognition is to determine the language spoken in a given segment of speech. PRLM and PPRLM approaches, which use phonotactic information, have shown very successful performance (Yan and Barnard, 1995 and Zissman, 1995). In PPRLM, several tokenizers are used to transcribe the input speech into phoneme strings or lattices (Gauvain et al., 2004 and Shen et al., 2006), which are then scored by n-gram language models. It is generally believed that phonotactic features and spectral features provide complementary cues to each other (Zissman, 1995). The spectral features of speech are collected as independent vectors; these vectors can be extracted as shifted-delta-cepstral acoustic features and then modeled by a Gaussian mixture model (GMM). Results were reported in Torres-Carrasquillo, Singer, Kohler, Greene, and Reynolds (2002). The approach was further improved by using discriminative training known as Maximum Mutual Information (MMI). Several studies have used SVMs in language recognition to form GSV-SVM systems (Campbell et al., 2006 and Li et al., 2007). An SVM classifier maps the input feature vector into a high-dimensional space and then separates the classes with a maximum-margin hyperplane. It is therefore important to choose an appropriate SVM feature expansion, which maps a given utterance to a feature vector in a high-dimensional feature space for SVM classification. Maximum likelihood linear regression (MLLR) is a commonly used adaptation approach in large-vocabulary speech recognition systems; the concatenation of its transformation parameters can be seen as a kind of mapping from the given utterance to a high-dimensional space. MLLR and CMLLR were introduced for the task of speaker recognition in Ferras, Leung, Barras, and Gauvain (2008) and for language recognition in Shen and Reynolds, 2008 and Zhong and Liu, 2010, which is useful for system fusion.
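The shifted-delta-cepstral (SDC) features mentioned above are built by stacking several delta-cepstral blocks, each shifted forward in time. A minimal sketch of this construction is shown below; the parameter names `d` (delta spread), `P` (block shift), and `k` (number of blocks) follow the common N-d-P-k convention, but the exact configuration and edge handling here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def shifted_delta_cepstra(cepstra, d=1, P=3, k=7):
    """Stack k delta-cepstral blocks, each shifted by P frames.

    cepstra: (T, N) matrix of N-dimensional cepstral frames.
    A common configuration for language ID is 7-1-3-7
    (N=7 cepstra, d=1, P=3, k=7), giving 49-dimensional frames.
    """
    T, N = cepstra.shape
    # delta at frame t: c[t + d] - c[t - d], with edge frames clamped
    padded = np.pad(cepstra, ((d, d), (0, 0)), mode="edge")
    delta = padded[2 * d:] - padded[:-2 * d]          # shape (T, N)
    blocks = []
    for i in range(k):
        # clamp shifted frame indices at the final frame
        idx = np.minimum(np.arange(T) + i * P, T - 1)
        blocks.append(delta[idx])
    return np.concatenate(blocks, axis=1)             # shape (T, N * k)
```

With the 7-1-3-7 configuration, each 7-dimensional cepstral frame expands to a 49-dimensional SDC frame that captures longer-span temporal dynamics than plain deltas.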
A system proposed in Stolcke, Ferrer, Kajarekar, Shriberg, and Venkataraman (2005) first used MLLR transforms in automatic speaker recognition. Another system uses constrained MLLR (CMLLR) to adapt the means of a GMM UBM to a given utterance and uses the entries of the transform as features for SVM classification (Ferras, Leung, Barras, & Gauvain, 2007). In Shen and Reynolds (2008), CMLLR is used as a feature-space implementation in language recognition. Zhong and Liu (2010) propose a CMLLR supervector kernel, and their system uses the entries of the transform as features for SVM classification in language recognition. In MLLR and CMLLR, parameters are estimated with the maximum likelihood (ML) criterion, which is well known for its poor asymptotic properties and may generate unacceptable affine transformation parameters when the adaptation data is insufficient (Gales, 1998). A possible solution to this problem is to introduce constraints on the possible values of the transformation parameters. Maximum A Posteriori Linear Regression (MAPLR) (Chesta, Siohan, & Lee, 1999) is such an adaptation approach: it incorporates prior information about the transforms into the estimation process, using maximum a posteriori (MAP) as the estimation criterion to derive the transformation parameters η:

$$\hat{\eta} = \arg\max_{\eta} p(\eta \mid X, \lambda) = \arg\max_{\eta} p(X \mid \lambda, \eta)\, p(\eta) \tag{1}$$

where p(η) is the prior distribution of the parameters η, X is the adaptation data, and λ represents the universal background model. We believe MAPLR can generate transforms with better adaptation performance. In our proposed MAPLR language recognition system, each spoken utterance is converted into feature vectors, and a discriminative MAPLR transform supervector space is then built from these features by multiple MAPLR adaptations.
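To make Eq. (1) concrete: if the prior p(η) over an affine transform is Gaussian and centred at the identity, the MAP criterion reduces to a ridge-style regularised least-squares problem. The sketch below illustrates this simplified case; real MAPLR additionally weights frames by their Gaussian occupation probabilities under the background model, so `tau` and the unweighted update here are illustrative assumptions rather than the paper's exact estimator.

```python
import numpy as np

def map_affine_transform(X, Y, tau=1.0):
    """MAP estimate of an affine transform W so that Y ≈ W @ [x; 1].

    Under a Gaussian prior on W centred at the identity transform,
    maximising p(Y | W) p(W) as in Eq. (1) reduces to regularised
    least squares; tau controls how strongly the prior is trusted.
    X, Y: (T, n) matrices of input and target frames.
    Returns W of shape (n, n + 1): a linear part plus a bias column.
    """
    T, n = X.shape
    Xa = np.hstack([X, np.ones((T, 1))])              # augment with bias term
    W0 = np.hstack([np.eye(n), np.zeros((n, 1))])     # prior mean: identity
    A = Xa.T @ Xa + tau * np.eye(n + 1)
    B = Xa.T @ Y + tau * W0.T
    return np.linalg.solve(A, B).T
```

As `tau` grows, the estimate shrinks toward the identity prior, which is exactly the behaviour that protects MAPLR against the unstable transforms ML estimation can produce on sparse adaptation data.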
That is, each spoken utterance is converted into one discriminative transform supervector consisting of one target-language transform vector and several non-target transform vectors. SVM classifiers are employed to model the discriminative MAPLR transform supervectors, and LDA and diagonal-covariance Gaussians are used as the backend for language score calibration. This paper is organized as follows: Section 2 gives a brief review of support vector machines and MLLR. Section 3 describes the MAPLR technique. Section 4 presents the proposed MAPLR language recognition system in detail. Corpora, evaluation, and experimental results are given in Sections 5 and 6. Finally, we conclude in Section 7.
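The supervector construction and SVM modeling described above can be sketched as follows. The supervector is simply the concatenation of the per-language adaptation transforms; the SVM is shown here as a minimal Pegasos-style linear classifier, a hedged stand-in for whatever SVM package the authors actually used (the function names and hyperparameters are hypothetical).

```python
import numpy as np

def transform_supervector(transforms):
    """Concatenate per-language affine transforms into one supervector.

    transforms: list of (n, n + 1) MAPLR transform matrices, one per
    adaptation against each language model (target and non-target).
    """
    return np.concatenate([W.ravel() for W in transforms])

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Minimal Pegasos-style linear SVM (hinge loss, L2 penalty).

    X: (T, D) supervectors; y: labels in {-1, +1} (target vs. non-target
    language). Returns a weight vector w; score an utterance with w @ x.
    """
    rng = np.random.default_rng(seed)
    T, D = X.shape
    w = np.zeros(D)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(T):
            t += 1
            eta = 1.0 / (lam * t)                 # decaying step size
            w *= (1 - eta * lam)                  # L2 shrinkage
            if y[i] * (w @ X[i]) < 1:             # inside the margin
                w += eta * y[i] * X[i]
    return w
```

In a one-versus-rest setup, one such classifier would be trained per target language, and the calibration backend (LDA plus diagonal-covariance Gaussians) would then map the raw SVM scores to comparable language scores.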
Conclusion
In this paper, MAPLR is introduced to the domain of language recognition. We use the affine transforms as features for SVM modeling and scoring. Experiments show that MAPLR-based language recognition performs better than MLLR, and the proposed MAPLR approach is also quite effective for system fusion. Experimental results on the 2007 NIST Language Recognition Evaluation databases show relative reductions of 4% in EER and 9% in minimum cost when the language recognition system uses MAPLR instead of MLLR in the 30-s tasks, and further improvement is gained by combining it with state-of-the-art systems. This leads to gains of 6% in EER and 11% in minDCF after applying the backend, compared with the performance of the combination of only the MMI system and the GMM–SVM system. We see that MAPLR can generate transforms with better adaptation performance than MLLR because it employs prior information, and it captures language information complementary to that of traditional acoustic systems.