The aim of this article is to propose one new linear programming model and two new goal programming models for two-group classification problems. When applied to real-life and simulated data, the proposed models perform well both in separating the groups and in predicting the group membership of new objects. Some linear programming models for discriminant analysis determine the attribute weights and the cut-off value in two steps, whereas our models determine all of these values simultaneously in one step. Moreover, the results of simulation experiments show that the proposed models significantly outperform existing linear programming and statistical approaches in attaining higher average hit-ratios.
Discriminant analysis has been used successfully in many fields, including health applications, educational planning, taxonomy problems and engineering. It is a technique concerned with assigning objects to groups on the basis of their observed attribute scores. Fisher's linear discriminant function (FLDF) is the most popular and most frequently used technique for the discriminant problem. As an alternative to statistical methods for classification problems, a number of efficient mathematical programming approaches have recently been developed. See Bajgier and Hill, 1982, Erenguc and Koehler, 1990, Freed and Glover, 1981a, Freed and Glover, 1981b, Freed and Glover, 1986, Glover, 1990, Joachimsthaler and Stam, 1988, Koehler and Erenguc, 1990, Lam and Moy, 1997, Lam and Moy, 2002, Lee and Ord, 1990, Rubin, 1990 and Sueyoshi, 1999, among others. For two-group and multigroup classification problems, Lam, Choo, and Moy (1996) and Lam and Moy (1996) developed a satisfactory classification model based on cluster analysis. In two-group problems, however, their approach minimizes the sum of deviations of all objects' classification scores from the group mean classification scores.
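For reference, the statistical baseline named above, Fisher's linear discriminant function, can be sketched for two groups as follows. This is the textbook formulation (weights from the pooled within-group covariance matrix, cut-off midway between the projected group means), written in pure Python; the function names and the toy data are hypothetical and are not taken from the article.

```python
def mean_vector(rows):
    # column-wise mean of a list of attribute vectors
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(g1, g2):
    # pooled within-group covariance matrix with n1 + n2 - 2 degrees of freedom
    m1, m2 = mean_vector(g1), mean_vector(g2)
    p = len(m1)
    s = [[0.0] * p for _ in range(p)]
    for rows, m in ((g1, m1), (g2, m2)):
        for x in rows:
            d = [xi - mi for xi, mi in zip(x, m)]
            for j in range(p):
                for k in range(p):
                    s[j][k] += d[j] * d[k]
    dof = len(g1) + len(g2) - 2
    return [[v / dof for v in row] for row in s]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fisher_ldf(g1, g2):
    # w = S_pooled^{-1} (mean1 - mean2); cut-off = w . (mean1 + mean2) / 2
    m1, m2 = mean_vector(g1), mean_vector(g2)
    w = solve(pooled_covariance(g1, g2), [a - b for a, b in zip(m1, m2)])
    cutoff = sum(wi * (a + b) / 2 for wi, a, b in zip(w, m1, m2))
    return w, cutoff


def classify(w, cutoff, x):
    # assign to group 1 if the classification score exceeds the cut-off
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > cutoff else 2


# hypothetical two-attribute data
g1 = [[2, 3], [3, 4], [4, 4], [3, 2]]
g2 = [[6, 7], [7, 8], [8, 6], [7, 7]]
w, cutoff = fisher_ldf(g1, g2)
print([classify(w, cutoff, x) for x in g1 + g2])
```

Note that FLDF, like LCM, obtains the weights first and the cut-off afterwards, which is the two-step pattern that the goal programming models discussed in this article avoid.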
In statistics, it is known that the median is the point that minimizes the total ℓ1-norm distance (the sum of absolute deviations) from all points to it, while the mean is the point that minimizes the sum of squared ℓ2-norm (Euclidean) distances (Benjamin et al., 2005, Bradley et al., 1997 and Gilani and Padberg, 2002). Both our proposed models and the model of Lam et al. (1996), which we call LCM, are based on the ℓ1-norm.
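The property stated above can be checked numerically in one dimension. The sketch below scans a grid of candidate centers and keeps the one that minimizes each criterion; the scores, including the outlier, are hypothetical.

```python
from statistics import mean, median


def sum_abs_dev(values, c):
    # total l1 distance from every value to the point c
    return sum(abs(v - c) for v in values)


def sum_sq_dev(values, c):
    # sum of squared (1-D Euclidean) distances from every value to c
    return sum((v - c) ** 2 for v in values)


scores = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical scores with one outlier
med, avg = median(scores), mean(scores)

# candidate centers 0.0, 0.1, ..., 100.0
grid = [i / 10 for i in range(0, 1001)]
best_l1 = min(grid, key=lambda c: sum_abs_dev(scores, c))
best_l2 = min(grid, key=lambda c: sum_sq_dev(scores, c))
print(med, best_l1)  # 3.0 3.0
print(avg, best_l2)  # 22.0 22.0
```

The outlier pulls the ℓ2 minimizer (the mean) far away from the bulk of the data, while the ℓ1 minimizer (the median) stays at 3.0, which is why the median is the natural center for an ℓ1-based model.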
For this reason, it is more appropriate to use the median in place of the mean in the LCM model. The linear and goal programming approaches presented here address two-group classification problems by minimizing the sum of deviations between the classification scores of all objects and the group median scores.
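The objective described above can be illustrated by evaluating it for a fixed weight vector. This is only a hypothetical sketch: it does not solve the article's LP or GP models, which optimize the weights (and, in the GP models, the cut-off) under constraints not reproduced here. The helper names, data and weights are assumptions; the mean-centered variant mirrors the LCM idea.

```python
from statistics import mean, median


def scores(w, rows):
    # classification score of each object: weighted sum of its attributes
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in rows]


def total_deviation(w, groups, center):
    """Sum over groups of |score - group center|, with center = median or mean."""
    total = 0.0
    for rows in groups:
        s = scores(w, rows)
        c = center(s)
        total += sum(abs(si - c) for si in s)
    return total


g1 = [[2, 3], [3, 4], [4, 4], [3, 2], [9, 9]]  # hypothetical; last row is an outlier
g2 = [[6, 7], [7, 8], [8, 6], [7, 7]]
w = [1.0, 0.5]  # hypothetical fixed weights, not optimized

dev_median = total_deviation(w, [g1, g2], median)  # median-centered (LPMED-style idea)
dev_mean = total_deviation(w, [g1, g2], mean)      # mean-centered (LCM-style idea)
print(dev_median, dev_mean)
```

For any fixed weights, the median-centered total is never larger than the mean-centered one, group by group, because the median minimizes the sum of absolute deviations. In an actual LP, each absolute value would typically be replaced by nonnegative deviation variables.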
In this article, three new mathematical programming approaches, LPMED, GPMEAN and GPMED, have been developed for solving two-group classification problems. Because LCM determines the attribute weights and the cut-off value in two separate steps, GPMEAN and GPMED have an advantage over it: they determine all of these values simultaneously in a single step. The results of the real-life application and the simulation experiments show that all three models are capable of solving two-group classification problems, and that GPMEAN and GPMED are efficient and practicable. The simulation studies indicate that, for all distributions, GPMEAN and GPMED achieve higher average hit-ratios than the FLDF, MSD and LCM models when the first-priority problem has multiple optimal solutions, and that they also have better CPU times. LPMED also appears to outperform LCM for skewed distributions in the simulation experiment, although it does not perform as well as GPMEAN and GPMED. Furthermore, for all distributions, GPMED is generally superior to all the other models in terms of both classification performance and CPU time.
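The comparisons above are stated in terms of average hit-ratios. Taking the hit-ratio as the proportion of objects assigned to their true group, and averaging it over simulation replications, the computation can be sketched as follows; the replication data are hypothetical and the helper names are illustrative only.

```python
from statistics import mean


def hit_ratio(predicted, actual):
    """Proportion of objects whose predicted group equals the true group."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def average_hit_ratio(replications):
    """Mean hit-ratio over (predicted, actual) pairs, one pair per replication."""
    return mean(hit_ratio(p, a) for p, a in replications)


# hypothetical holdout predictions from three simulation replications
reps = [
    ([1, 1, 2, 2], [1, 1, 2, 2]),  # 4/4 correct
    ([1, 2, 2, 2], [1, 1, 2, 2]),  # 3/4 correct
    ([2, 1, 2, 1], [1, 1, 2, 2]),  # 2/4 correct
]
print(average_hit_ratio(reps))  # 0.75
```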
In further work, the performance of LPMED, GPMEAN and especially GPMED could be investigated for multigroup classification problems.