تجزیه و تحلیل حساسیت معیار کرنش برای پوسته شدن چند بعدی
کد مقاله | سال انتشار | تعداد صفحات مقاله انگلیسی |
---|---|---|
25833 | 2006 | 19 صفحه PDF |
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Computational Statistics & Data Analysis, Volume 50, Issue 1, 10 January 2006, Pages 135–153
چکیده انگلیسی
Multidimensional scaling (MDS) is a collection of data analytic techniques for constructing configurations of points from dissimilarity information about interpoint distances. Classsical MDS assumes a fixed matrix of dissimilarities. However, in some applications, e.g., the problem of inferring 3-dimensional molecular structure from bounds on interatomic distances, the dissimilarities are free to vary, resulting in optimization problems with a spectral objective function. A perturbation analysis is used to compute first- and second-order directional derivatives of this function. The gradient and Hessian are then inferred as representers of the derivatives. This coordinate-free approach reveals the matrix structure of the objective and facilitates writing customized optimization software. Also analyzed is the spectrum of the Hessian of the objective.
مقدمه انگلیسی
Multidimensional scaling (MDS) is a collection of data analytic techniques for constructing configurations of points from dissimilarity information about interpoint distances. Developed primarily by psychometricians and statisticians, MDS is widely used in a variety of disciplines for visualization and dimension reduction. The extensive literature on MDS includes numerous books, e.g., Borg and Groenen (1997), Cox and Cox (1994), and Everitt and Rabe-Hesketh (1997); and book chapters, e.g., Everitt and Dunn (1991, Chapter 5), Krzanowski and Marriott (1994, Chapter 5), Mardia et al. (1979, Chapter 14), and Seber (1984, Section 5.5). Many specific formulations of MDS are possible; a useful organizing principle, adopted by de Leeuw and Heiser (1982) and by Trosset (1997b), is to formulate MDS as a collection of optimization problems. The method of Torgerson (1952) and Gower (1966), variously called classical MDS and principal coordinate analysis, can be formulated as an optimization problem with an objective function whose minimum value is sometimes called the strain criterion. Formally, a matrix is a dissimilarity matrix if and only if it is symmetric and hollow (its diagonal entries vanish) with nonnegative entries. Given an n×nn×n matrix Δ=(δij)Δ=(δij) of squared dissimilarities and a target dimension p , a classical problem in distance geometry is to determine if ΔΔ is a matrix of p -dimensional squared Euclidean distances, i.e., if there exist x1,…,xn∈Rpx1,…,xn∈Rp such that ∥xi-xj∥2=δij∥xi-xj∥2=δij. It is well-known that the answer is affirmative if and only if the symmetric n×nn×n matrix View the MathML sourceA=τ(Δ)=-12PTΔP Turn MathJax on is positive semidefinite with rank(A)⩽prank(A)⩽p, where P is the n×nn×n projection matrix I-eeT/nI-eeT/n, I is the n×nn×n identity matrix, and e=(1,…,1)T∈Rne=(1,…,1)T∈Rn. This embedding theorem motivated classical MDS, which can be stated as the problem of finding the symmetric positive semidefinite n×nn×n matrix of rank ⩽p⩽p that is nearest τ(Δ)τ(Δ) in squared Frobenius distance. The minimum value of the objective function, View the MathML source∥B-τ(Δ)∥F2, is the strain criterion. Detailed studies of the linear operator ττ were made by Critchley (1988) and by Gower and Groenen (1991). To evaluate the strain criterion for a fixed ΔΔ, one computes the spectral decomposition τ(Δ)=QΛQTτ(Δ)=QΛQT and sets View the MathML sourceΛ¯=diag(λ¯), where View the MathML sourceλ¯i=max(λi,0)i=1,…,p0i=p+1,…,n Turn MathJax on and View the MathML sourcediag(λ¯) is the diagonal matrix whose diagonal entries are View the MathML source(λ¯1,…,λ¯n). Then View the MathML sourceQΛ¯QT is a global minimizer of View the MathML source∥B-τ(Δ)∥F2 and the global minimum is equation(1) View the MathML sourceFp∘τ(Δ)=∥QΛ¯QT-QΛQT∥F2=∑i=1n(λ¯i-λi)2=∑i=1p[max(λi,0)-λi]2+∑i=p+1nλi2=∑i=1nλi2-∑i=1p[max(λi,0)]2=∥τ(Δ)∥F2-∑i=1p[max(λi,0)]2. Turn MathJax on Notice that only the p largest eigenvalues are required to evaluate the strain criterion. Classical MDS assumes that ΔΔ is fixed. Recently, extensions of classical MDS have been developed for nonmetric MDS (Trosset, 1998b) and for various problems with bound constraints (Trosset, 1998a and Trosset, 2000). The latter include the important problem of inferring 3-dimensional molecular structure from partial information about interatomic distances, which is especially challenging because the number of atoms (n ) is typically large and precise solutions are required. Trosset (2002) analyzed the computational theory of these extensions and reported computational results for several molecular conformation problems. To efficiently minimize F3∘τ(Δ)F3∘τ(Δ) (for simplicity, we henceforth restrict attention to p=3p=3) as ΔΔ varies, we require first- and second-order derivative information about the objective function F3∘τF3∘τ. First derivatives of F3∘τF3∘τ can be derived by various methods. One useful technique involves perturbation analysis, as in Wilkinson (1965). This technique was exploited by Sibson (1979) in his study of the robustness of classical MDS. Another technique involves the theory of reducible nonlinear programming, as in Parks (1985); yet another technique involves the theory of spectral functions, as in Lewis (1996). These techniques were exploited by Trosset, 1997a and Trosset, 2002 to derive the first-order partial derivatives of F3∘τF3∘τ with respect to the components of ΔΔ. Previous research has not considered the second derivatives of F3∘τF3∘τ. Noting that there are O(n4)O(n4) second-order partial derivatives of F3∘τF3∘τ, Trosset (2002) opted to use a limited memory quasi-Newton method for optimization. The primary purpose of the present work is to derive second derivatives of F3∘τF3∘τ. We do so via perturbation analysis; thus, our preliminary derivation of first derivatives overlaps Sibson (1979). The essence of our approach lies in writing the second-order Taylor polynomial approximation of F3∘τF3∘τ in terms of duality pairings of symmetric matrices via the Frobenius inner product. We then compute directional derivatives of F3∘τF3∘τ and infer the gradient and Hessian as representers of the derivatives. This allows us to manipulate matrices directly, rather than working with matrix components. The result is a more lucid, coordinate-free approach that better reveals the matrix structure of the objective function and better lends itself to writing customized software for minimizing F3∘τF3∘τ. Furthermore, because we compute Hessian-matrix pairings, we can avail ourselves of optimization algorithms based on iterative linear algebra, and we circumvent the memory requirements that dissuaded Trosset (2002) from using second-order methods. We conclude by analyzing the spectrum of the Hessian of F3∘τF3∘τ.
نتیجه گیری انگلیسی
Classical MDS can be stated as the problem of finding the symmetric positive semidefinite matrix of rank ⩽p⩽p that is nearest τ(Δ)τ(Δ) is squared Frobenius distance, for ΔΔ fixed. This problem has an explicit solution, and the minimum squared distance is sometimes called the strain criterion. We have been concerned with extensions of classical MDS in which ΔΔ is free to vary. (For simplicity, we have emphasized the important case of p=3p=3.) The resulting nonlinear optimization problems do not have explicit solutions; iterative methods are required. A great deal is known about the geometry of the closed cone of symmetric positive semidefinite matrices of rank ⩽p⩽p, but it is not clear how to exploit this knowledge in an optimization algorithm. Traditional nonlinear programming assumes a problem that has been formulated using (preferably linear) equality and inequality constraints. Accordingly, Trosset proposed using the strain criterion as an objective function, thereby eliminating a nonlinear constraint that confounds traditional optimization algorithms. The geometry of that constraint is preserved, encoded in the objective function. The challenge addressed in the preceding sections is how to extract that information in a form that can be used by traditional optimization algorithms. The preceding sections have provided a complete characterization of the Hessian of the strain criterion. The strain criterion is unusual insofar as it is a highly nontrivial nonlinear function, yet its Hessian-vector products can be computed in closed form, as can the eigenvalues and eigenvectors of the Hessian. We expect that these properties can be exploited, leading to more efficient algorithms for solving the optimization problems that motivated the present study. For example, Trosset (2002) used a bound-constrained quasi-Newton algorithm with a limited memory BFGS updating formula (LM-BFGS) that constructs Hessian approximations from gradients computed for several previous iterations. For unconstrained optimization, Nash and Nocedal (1991) compared LM-BFGS and a truncated Newton strategy. The latter is based on the successive approximate minimization, via conjugate gradients, of a quadratic model of the nonlinear objective—an approach that is similar to the algorithms that we are developing for constrained minimization of the strain criterion. Although general comparisons are difficult, Nash and Nocedal observed that, for certain conditions that obtain in the strain objective (e.g., a mildly nonlinear objective, a Hessian with some zero eigenvalues), truncated Newton was generally more efficient than LM-BFGS. Truncated Newton is even more attractive when Hessian-vector products can be computed efficiently, as can be done for the strain criterion using (22). Finally, Trosset (2002) observed that working directly with squared dissimilarities, rather than parameterizing distance matrices by the Cartesian coordinates of n points in RpRp, appears to result in more tractable optimization problems with fewer nonglobal minimizers. Intuitively, it would seem that allowing each squared dissimilarity to vary independently of the others has the positive effect of decoupling the complicated interactions that result from varying individual coordinates. Our analysis of the Hessian of the strain criterion clarifies the nature of its nonconvexity, yielding new insight. Specifically, the nonconvexity of the strain criterion near ΔΔ is related to the possibility of embedding ΔΔ in some RqRq, where q>pq>p. This insight contradicts, at least superficially, popular speculation that one would do well to address problems in RpRp by working in RqRq. The full significance of this insight is not yet clear to us; it will be investigated in future work.