Download English-language ISI article no. 28178
Article title
Optimal classifiers with minimum expected error within a Bayesian framework — Part II: Properties and performance analysis
Article code: 28178
Year of publication: 2013
Number of pages (English PDF): 13
Source

Publisher: Elsevier - Science Direct

Journal: Pattern Recognition, Volume 46, Issue 5, May 2013, Pages 1288–1300

Keywords
Bayesian estimation, Classification, Error estimation, Genomics, Minimum mean-square estimation, Small samples

Abstract

In part I of this two-part study, we introduced a new optimal Bayesian classification methodology that utilizes the same modeling framework proposed in Bayesian minimum-mean-square error (MMSE) error estimation. Optimal Bayesian classification thus completes a Bayesian theory of classification, where both the classifier error and our estimate of the error may be simultaneously optimized and studied probabilistically within the assumed model. Having developed optimal Bayesian classifiers in discrete and Gaussian models in part I, here we explore properties of optimal Bayesian classifiers, in particular, invariance to invertible transformations, convergence to the Bayes classifier, and a connection to Bayesian robust classifiers. We also explicitly derive optimal Bayesian classifiers with non-informative priors, and explore relationships to linear and quadratic discriminant analysis (LDA and QDA), which may be viewed as plug-in rules under Gaussian modeling assumptions. Finally, we present several simulations addressing the robustness of optimal Bayesian classifiers to false modeling assumptions. Companion website: http://gsp.tamu.edu/Publications/supplementary/dalton12a.

Introduction

In the first part of this two-part study [1], we defined an optimal Bayesian classifier to be a classifier that minimizes the probability of misclassifying a future point relative to the assumed model conditioned on the observed sample, or equivalently minimizes the Bayesian error estimate. The problem of optimal Bayesian classification over an uncertainty class of feature-label distributions arises naturally from two related needs: accurate classification and accurate error estimation. With small samples, the latter is only possible when prior knowledge is applied in conjunction with the sample data. Given prior knowledge, it behooves us to find an optimal error estimator and classifier relative to that knowledge. Having found optimal Bayesian error estimators in [2] and [3], analytic representations of the MSE of these error estimates in [4] and [5], and expressions for optimal Bayesian classifiers in terms of the effective class-conditional densities in [1], here in Part II we examine basic properties of optimal Bayesian classifiers. We study invariance to invertible transformations in discrete and continuous models, convergence to the Bayes classifier, and a connection to robust classification. The latter is a classical filtering problem [6] and [7], where in the context of classification one wishes to find an optimal classifier over a parameterized uncertainty class of feature-label distributions absent new data [8]. Heretofore, the robust classification problem had only been solved in a suboptimal manner; the optimal robust classifier now falls out of the theory of optimal Bayesian classification. We also explicitly derive optimal Bayesian classifiers using non-informative priors and, under Gaussian modeling assumptions, compare these to plug-in classification rules such as linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA), which are optimal in fixed Gaussian models with a common covariance matrix and with different covariance matrices, respectively. Finally, we present several simulations addressing the robustness of optimal Bayesian classifiers to false modeling assumptions. Some robustness to incorrect modeling assumptions is always important in practice because, even if one utilizes statistical techniques such as hypothesis tests for model checking, these can at best, even for very small p-values, lead to not rejecting the assumed model.

For the sake of completeness, we begin by stating some key definitions and propositions from Part I [1]. An optimal Bayesian classifier is any classifier $\psi_{\mathrm{OBC}}$ satisfying

$$E_{\pi^*}[\varepsilon(\theta, \psi_{\mathrm{OBC}})] \leq E_{\pi^*}[\varepsilon(\theta, \psi)] \quad (1)$$

for all $\psi \in \mathcal{C}$, where $\varepsilon(\theta, \psi)$ is the true error of classifier $\psi$ under a feature-label distribution parameterized by $\theta \in \Theta$ and $\mathcal{C}$ is an arbitrary family of classifiers. In (1), the expectations are taken relative to a posterior distribution, $\pi^*(\theta)$, on the parameters, which is updated from a prior, $\pi(\theta)$, after observing a sample, $S_n$, of size $n$. An optimal Bayesian classifier thus minimizes the Bayesian error estimate, $\hat{\varepsilon}(\psi, S_n) = E_{\pi^*}[\varepsilon(\theta, \psi)]$. For a binary classification problem, the Bayesian framework defines $\theta = [c, \theta_0, \theta_1]$, where $c$ is the a priori probability that a future point comes from class 0, and $\theta_0$ and $\theta_1$ parameterize the class-0 and class-1 conditional distributions, respectively.
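The Bayesian error estimate can be approximated numerically for any fixed classifier by drawing parameters from the posterior and averaging the resulting true errors. The following sketch only illustrates this definition and is not the paper's method: it assumes a scalar Gaussian model with known, equal class variance, a simple threshold classifier, and a hypothetical posterior (a Beta distribution on $c$ and a Gaussian on each class mean); the threshold, hyperparameters, and the helper true_error are illustrative choices.

# Minimal numerical sketch of the Bayesian error estimate (illustration only,
# not the paper's general model): scalar Gaussian classes with known, equal
# variance, a fixed threshold classifier psi(x) = 1 if x > t, and an assumed
# (hypothetical) posterior pi* consisting of a Beta distribution on c and
# Gaussian distributions on the two class means.
import numpy as np
from scipy.stats import norm, beta

rng = np.random.default_rng(0)
sigma, t = 1.0, 0.5                      # known class std. dev. and classifier threshold

def true_error(c, mu0, mu1):
    """True error eps(theta, psi) of psi(x) = 1{x > t} for theta = (c, mu0, mu1)."""
    err0 = 1.0 - norm.cdf(t, loc=mu0, scale=sigma)   # class-0 mass assigned to class 1
    err1 = norm.cdf(t, loc=mu1, scale=sigma)         # class-1 mass assigned to class 0
    return c * err0 + (1.0 - c) * err1

# Draw theta from the assumed posterior and average the true error, i.e. the
# quantity E_pi*[eps(theta, psi)] that appears on both sides of Eq. (1).
m = 20000
c_draws = beta.rvs(6, 6, size=m, random_state=rng)        # posterior on c
mu0_draws = rng.normal(0.0, 0.2, size=m)                   # posterior on the class-0 mean
mu1_draws = rng.normal(1.0, 0.2, size=m)                   # posterior on the class-1 mean

eps_hat = np.mean(true_error(c_draws, mu0_draws, mu1_draws))   # vectorized over draws
print(f"Monte Carlo Bayesian error estimate of psi: {eps_hat:.4f}")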
For a fixed class $y \in \{0, 1\}$, we let $f_{\theta_y}(x|y)$ be the class-conditional density parameterized by $\theta_y$ and denote the marginal posterior of $\theta_y$ by $\pi^*(\theta_y)$. If $E_{\pi^*}[c] = 0$, then the optimal Bayesian classifier is constant and always assigns class 1; if $E_{\pi^*}[c] = 1$, it always assigns class 0. Hence, we typically assume that $0 < E_{\pi^*}[c] < 1$. Two important theorems from Part I follow.

Theorem 1 (Evaluating Bayesian error estimators). Let $\psi$ be a fixed classifier given by $\psi(x) = 0$ if $x \in R_0$ and $\psi(x) = 1$ if $x \in R_1$, where the measurable sets $R_0$ and $R_1$ partition the sample space. Then

$$\hat{\varepsilon}(\psi, S_n) = E_{\pi^*}[c] \int_{R_1} f(x|0)\,dx + (1 - E_{\pi^*}[c]) \int_{R_0} f(x|1)\,dx, \quad (2)$$

where the integrals may equivalently be written over the whole sample space using the indicator function $I_E$, equal to one if $E$ is true and zero otherwise, and

$$f(x|y) = \int_{\Theta_y} f_{\theta_y}(x|y)\, \pi^*(\theta_y)\, d\theta_y \quad (3)$$

is known as the effective class-conditional density.

Theorem 2 (Optimal Bayesian classification). An optimal Bayesian classifier, $\psi_{\mathrm{OBC}}$, satisfying (1) for all $\psi \in \mathcal{C}$, the set of all classifiers with measurable decision regions, exists and is given pointwise by

$$\psi_{\mathrm{OBC}}(x) = \begin{cases} 0 & \text{if } E_{\pi^*}[c]\, f(x|0) \geq (1 - E_{\pi^*}[c])\, f(x|1), \\ 1 & \text{otherwise}. \end{cases} \quad (4)$$
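Theorems 1 and 2 reduce optimal Bayesian classification to evaluating the effective class-conditional densities. As a minimal sketch under simplifying assumptions (not the paper's general derivation), consider a scalar Gaussian model with known variance, independent conjugate Gaussian priors on the class means, and a Beta prior on $c$; there the effective density of Eq. (3) is itself Gaussian, so Eq. (4) and the error estimate of Eq. (2) can be evaluated directly. All hyperparameter values, sample data, and function names below (posterior_mean_var, make_obc) are hypothetical choices for illustration.

# Minimal sketch under simplifying assumptions: scalar Gaussian class-conditional
# densities N(mu_y, sigma^2) with sigma^2 known, independent conjugate priors
# mu_y ~ N(m_y, s_y^2), and a Beta(a, b) prior on c. Here the effective
# class-conditional density of Eq. (3) is again Gaussian,
# f(x|y) = N(x; m*_y, sigma^2 + v*_y), so Eqs. (2) and (4) apply directly.
import numpy as np
from scipy.stats import norm

def posterior_mean_var(m, s2, xbar, n, sigma2):
    """Posterior mean and variance of mu_y after n observations with sample mean xbar."""
    v_star = 1.0 / (1.0 / s2 + n / sigma2)
    m_star = v_star * (m / s2 + n * xbar / sigma2)
    return m_star, v_star

def make_obc(x0, x1, sigma2=1.0, m0=0.0, s20=1.0, m1=1.0, s21=1.0, a=1.0, b=1.0):
    """Build psi_OBC and the effective densities for the two-class scalar model."""
    n0, n1 = len(x0), len(x1)
    Ec = (a + n0) / (a + b + n0 + n1)                    # posterior expectation of c
    m0s, v0s = posterior_mean_var(m0, s20, np.mean(x0), n0, sigma2)
    m1s, v1s = posterior_mean_var(m1, s21, np.mean(x1), n1, sigma2)
    f0 = lambda x: norm.pdf(x, loc=m0s, scale=np.sqrt(sigma2 + v0s))   # f(x|0), Eq. (3)
    f1 = lambda x: norm.pdf(x, loc=m1s, scale=np.sqrt(sigma2 + v1s))   # f(x|1), Eq. (3)
    psi = lambda x: np.where(Ec * f0(x) >= (1.0 - Ec) * f1(x), 0, 1)    # Eq. (4)
    return psi, f0, f1, Ec

# Hypothetical training data, followed by the Bayesian error estimate of the OBC,
# Eq. (2), computed by numerical integration of the effective densities on a grid.
rng = np.random.default_rng(1)
x0 = rng.normal(0.0, 1.0, size=10)
x1 = rng.normal(1.0, 1.0, size=10)
psi, f0, f1, Ec = make_obc(x0, x1)

grid = np.linspace(-10.0, 11.0, 200001)
dx = grid[1] - grid[0]
labels = psi(grid)
eps_hat = (Ec * np.sum(f0(grid)[labels == 1]) +
           (1.0 - Ec) * np.sum(f1(grid)[labels == 0])) * dx
print(f"E[c] = {Ec:.3f}   Bayesian error estimate of the OBC: {eps_hat:.4f}")

The conjugate prior is what makes the effective density available in closed form in this sketch; for other priors the integral in Eq. (3) generally has to be approximated numerically before the rule in Eq. (4) can be applied.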

Conclusion

This work ties Bayesian classifier design and Bayesian error estimation together with the classical problem of optimal robust filtering. As with Wiener filtering, we first find representations for some error measure (e.g., expected error or MSE) and then find the optimizing parameters. Optimal Bayesian classification has a connection with Bayesian robust classification, with the distinction that it permits optimization over an arbitrary space of classifiers and utilizes a posterior distribution on the parameters, which in a sense captures the full knowledge of the underlying distributions that is available. Optimal Bayesian classification in our Gaussian models is consistent and generally robust to mild deviations from the model assumptions. Reiterating what we said at the outset of [2], in small-sample settings it is rarely possible to obtain quantifiably accurate error estimates absent prior knowledge. Thus, it is natural to utilize prior knowledge in a classical MMSE framework to derive optimal error estimators and optimal classifiers.