Model-based clustering of customer choice data
|Article code||Publication year||English article||Word count|
|20924||2014||11-page PDF||6931 words|
Publisher : Elsevier - ScienceDirect
Journal : Computational Statistics & Data Analysis, Volume 71, March 2014, Pages 3–13
In several empirical applications analyzing customer-by-product choice data, it may be relevant to partition individuals with similar purchase behavior into homogeneous segments. Moreover, should individual- and/or product-specific covariates be available, their potential effects on the probability of choosing certain products may also be investigated. A model for joint clustering of statistical units (customers) and variables (products) is proposed in a mixture modeling framework, and an appropriate EM-type algorithm for ML parameter estimation is presented. The model can be easily linked with similar proposals that have appeared in various contexts, such as co-clustering of gene expression data and clustering of words and documents in web-mining data analysis.
We propose a model-based approach to cluster individuals and products into disjoint individual- and product-specific groups, where the corresponding partitions are dependent. We refer to the individual-specific groups as segments and to the product-specific groups as clusters. The motivation arises from empirical situations where customer data are analyzed to investigate the factors affecting purchase behavior towards several products. The idea is to define individual-specific segments that are homogeneous in terms of customer product choices; the prior (conditional) probability that an individual belongs to a given segment is assumed to be a function of individual-specific covariates, and we are interested in how such characteristics affect segment membership. We can also imagine that, within an individual-specific segment, a partition of the products may be identified depending on their characteristics. For example, customers with a given purchase profile may prefer a particular subset of products because of their features, and such preferences may vary within segments of customers. In this view, we may be interested in studying whether individuals in a specific segment (representing a prototypical purchase behavior) choose specific subsets of products for their features. From this perspective, we aim at jointly partitioning customers and products to investigate the determinants of customer choices.
This purpose might be linked with methods for joint partitioning of genes and tissues (or experimental conditions) in microarray data analysis (see e.g. Martella et al. (2008)), of words and documents in web data analysis (see e.g. Li and Zha (2006)), or, in general, with settings where latent block-based clustering is pursued (see e.g. Govaert and Nadif (2003)). Further interesting links can be established with multi-layer mixtures, see e.g. Li (2005), and with hierarchical mixture of experts models, see e.g. Titsias and Likas (2002). Such methodological connections will be further discussed to better focus and motivate our proposal.
The plan of the paper is as follows. In Section 2, the model is introduced in a general framework, and, in Section 3, an ML approach to parameter estimation is described. An EM-type algorithm is detailed in Section 4 in the context of observed count data. In Section 5, the analysis of a benchmark data set is presented. In the last section, concluding remarks and the future research agenda are discussed.
2. The model
Let $Y_i$, $i=1,\dots,n$, be a $p$-dimensional random vector and let $y_i$, $i=1,\dots,n$, represent the corresponding realization in a sample of size $n$; let $Y=(Y_1,\dots,Y_n)^T$ denote the $(n, p)$ matrix of the observed values $y_{ij}$, for individual $i=1,\dots,n$ and variable $j=1,\dots,p$. As an example, and without loss of generality, we may consider $n$ customers and $p$ products, where $y_{ij}$ represents the number of items of the $j$-th product that the $i$-th customer has bought in a given time interval. In addition, we assume that a set of outcome-specific covariates (price, weight, type of package, etc.) and a set of individual-specific covariates (age, gender, educational level, income, etc.) have also been recorded. Let $x_i$ and $z_j$ denote the vectors containing the characteristics of the $i$-th individual and of the $j$-th product, $j=1,\dots,p$, respectively. In the following, for the sake of clarity, groups of individuals and products will be termed segments and clusters, respectively.
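The notation above can be made concrete with a small Python sketch. It lays out a toy count matrix with one row per customer and one column per product, together with the two covariate sets. The sizes, the simulated values, the number of segments `K`, and the multinomial-logit form used for the prior segment probabilities are all assumptions made for illustration; the text only states that the prior probability is a function of the individual-specific covariates.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 6, 4   # n customers, p products (toy sizes, assumed)
K = 2         # number of customer segments (hypothetical)

# Y[i, j]: number of items of product j bought by customer i (simulated counts)
Y = rng.poisson(lam=2.0, size=(n, p))

# x_i: individual-specific covariates (e.g. age, income), one row per customer
X = rng.normal(size=(n, 3))
# z_j: product-specific covariates (e.g. price, weight), one row per product
Z = rng.normal(size=(p, 2))


def segment_priors(X, beta):
    """Prior P(segment k | x_i) under an ASSUMED multinomial-logit link."""
    eta = X @ beta                           # (n, K) linear predictors
    eta = eta - eta.max(axis=1, keepdims=True)  # stabilise the softmax
    w = np.exp(eta)
    return w / w.sum(axis=1, keepdims=True)


beta = rng.normal(size=(X.shape[1], K))      # hypothetical coefficients
pi = segment_priors(X, beta)                 # pi[i, k]: prior weight of segment k
```

Each row of `pi` sums to one, so it can serve as the covariate-dependent mixing weights of a segment-level mixture; `Z` would play the analogous role for the product clusters.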
Conclusion
In this paper, we propose a two-level finite mixture model for clustering rows (units) and columns (variables) of a data matrix. The proposal has been sketched in the field of customer behavior for illustrative purposes, but it can be easily extended to other contexts, such as gene expression and text mining analyses, where a partition of objects and features is of interest. The model structure and the adopted parameterization have been compared with several proposals that have appeared in the recent literature on joint clustering of rows and columns of a two-way data matrix; the model has been formulated in a maximum likelihood framework and an appropriate EM-type algorithm has been outlined. Further extensions of this model might be introduced by looking at different representations for the cluster-specific model parameters.
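To give a feel for the EM idea in the count-data setting, the following Python sketch fits a deliberately simplified model: a one-level Poisson mixture that clusters rows (customers) only. It is not the two-level model summarized above; it ignores covariates and the product clusters, and the function name `poisson_mixture_em`, the initialization, and the simulated data are assumptions made for illustration.

```python
import numpy as np


def poisson_mixture_em(Y, K, n_iter=100, seed=0):
    """EM for a K-component Poisson mixture over the rows of Y (simplified stand-in)."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    # Initialise rates by perturbing the column means; start from equal weights
    lam = Y.mean(axis=0, keepdims=True) * rng.uniform(0.5, 1.5, size=(K, p)) + 1e-6
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: log p(y_i, k) up to -sum_j log(y_ij!), which does not depend on k
        log_r = np.log(pi) + Y @ np.log(lam).T - lam.sum(axis=1)   # (n, K)
        log_r = log_r - log_r.max(axis=1, keepdims=True)
        R = np.exp(log_r)
        R /= R.sum(axis=1, keepdims=True)       # posterior segment probabilities
        # M-step: weighted updates of mixing weights and Poisson rates
        Nk = R.sum(axis=0) + 1e-12
        pi = Nk / Nk.sum()
        lam = (R.T @ Y) / Nk[:, None] + 1e-6
    return pi, lam, R


# Toy data: two groups of customers with low and high purchase rates (simulated)
rng = np.random.default_rng(1)
Y = np.vstack([rng.poisson(1.0, size=(50, 5)), rng.poisson(8.0, size=(50, 5))])

pi, lam, R = poisson_mixture_em(Y, K=2)
segments = R.argmax(axis=1)   # hard segment assignment per customer
```

The E-step computes posterior segment probabilities and the M-step re-estimates weights and rates from them; a two-level version would add an analogous layer for the product clusters within each segment.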