A hybrid stock selection model using genetic algorithms and support vector regression
Article code | Publication year | Page count (English article) |
---|---|---|
25630 | 2012 | 12-page PDF |

Publisher: Elsevier (ScienceDirect)
Journal: Applied Soft Computing, Volume 12, Issue 2, February 2012, Pages 807–818
Abstract
In investment research and applications, feasible quantitative models include soft-computing methodologies for the prediction of financial time series, the multi-objective optimization of investment return and risk reduction, and the selection of investment instruments for portfolio management based on asset ranking, using a variety of input variables and historical data. Among these, stock selection has long been recognized as a challenging and important task. This line of research depends heavily on reliable stock ranking for successful portfolio construction. Recent advances in machine learning and data mining offer significant opportunities to solve these problems more effectively. In this study, we develop a methodology for effective stock selection using support vector regression (SVR) together with genetic algorithms (GAs). We first employ SVR to generate surrogates for actual stock returns, which in turn provide reliable rankings of stocks. The top-ranked stocks can then be selected to form a portfolio. On top of this model, the GA is employed both to optimize the model parameters and to perform feature selection, acquiring optimal subsets of input variables for the SVR model. We show that the investment returns provided by the proposed methodology significantly outperform those of the benchmark. Based on these promising results, we expect this hybrid GA–SVR methodology to advance research in soft computing for finance and to provide an effective solution to stock selection in practice.
Introduction
Stock selection has been a challenging and important research area in finance and investment decision-making. This line of research depends heavily on reliable prediction of the future performance of stocks and on successful portfolio construction. Recent advances in computational intelligence and data mining offer significant opportunities to solve these problems more effectively. Feasible quantitative models include soft-computing methodologies [1] for the prediction of financial time series, the multi-objective optimization of expected investment return and risk reduction, and portfolio management, that is, the selection of investment instruments based on asset ranking using a variety of input variables and historical data [2] and [3]. All these research efforts aim to facilitate investment decision-making. In the research area of stock selection and portfolio optimization, several machine learning methodologies have been developed, including fuzzy systems, artificial neural networks (ANNs), evolutionary algorithms (EAs) and support vector machines (SVMs). Earlier work includes several fuzzy approaches; for instance, Chu et al. [4] used fuzzy multiple attribute decision analysis to select stocks for portfolio construction. Analogously, Zargham and Sayeh [5] employed a fuzzy rule-based system to evaluate a set of stocks for the same purpose. Although these fuzzy approaches represent early efforts to employ computational intelligence in financial applications, they usually lack sufficient learning ability. Quah and Srinivasan [6] studied an ANN stock selection system that chooses the top-ranked performing stocks. They showed that their proposed model outperformed the benchmark model in terms of compounded actual returns over time. Chapados and Bengio [7] also trained neural networks to estimate and predict asset behavior in order to support decision-making in asset allocation.
Although these models worked in some applications, they often suffer from overfitting and tend to become trapped in local optima. For portfolio optimization, Kim and Han [8] proposed a genetic algorithm (GA) approach to feature discretization and to the determination of connection weights for ANNs to predict the stock price index. They suggested that their approach was able to reduce the number of attributes and enhance prediction performance. In addition, Caplan and Becker [9] employed genetic programming (GP) to develop a stock ranking model for the high-technology manufacturing industry in the U.S. More recently, Becker et al. [10] explored various single-objective fitness functions for GP to construct stock selection models for specific investment requirements with respect to risk. In short, these GP-based models rank stocks from high to low according to a pre-defined objective function. Because stock market data are highly noisy and high-dimensional, most of the aforementioned approaches often exhibit inconsistent and unpredictable performance. These challenges arise mainly because the processes underlying the generation of financial time series are generally nonlinear and non-stationary, and suitable models for such systems are usually unknown a priori. Support vector machines, an advanced class of machine learning algorithms developed by Vapnik [11], address the deficiencies of well-known linear techniques in solving such complex applications. Whereas ANNs follow the traditional empirical risk minimization principle, which minimizes the error on the training data, SVMs follow the structural risk minimization principle, which aims to minimize an upper bound on the generalization error; as a result, overfitting is less likely to occur.
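For reference, the textbook ε-insensitive SVR primal problem is sketched below. The excerpt does not give the paper's own formulation, so this is the standard form rather than the authors' specification. Loosely, the term $\tfrac{1}{2}\lVert w\rVert^2$ limits model capacity, which is how structural risk minimization enters. The constant $C$ trades that capacity against training errors that exceed the tube width $\varepsilon$. The RBF kernel shown is one common choice, assumed here for illustration. Its width $\gamma$ and the constant $C$ are the kind of parameters a GA can tune.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)\\
\text{subject to}\quad & y_i - \langle w, \phi(x_i)\rangle - b \le \varepsilon + \xi_i,\\
& \langle w, \phi(x_i)\rangle + b - y_i \le \varepsilon + \xi_i^*,\\
& \xi_i \ge 0,\ \ \xi_i^* \ge 0, \qquad i = 1,\dots,n,
\end{aligned}
\qquad
f(x) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^*\right) K(x_i, x) + b,
\quad K(x, x') = \exp\!\left(-\gamma \lVert x - x' \rVert^2\right).
```

Here $\alpha_i$ and $\alpha_i^*$ are the Lagrange multipliers of the two constraint families. In a stock selection setting, $y_i$ would presumably be a stock's return and $x_i$ its vector of input variables.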
Moreover, because training an SVM amounts to solving a convex quadratic programming problem, its optimal solution is global, whereas neural-network models tend to become trapped in locally optimal solutions. As a result, SVM research thus far has shown that this methodology can outperform other methods, including neural-network-based non-linear prediction, case-based reasoning, linear discriminant analysis, quadratic discriminant analysis and Elman back-propagation neural networks [12], [13], [14] and [15]. We therefore adopt this methodology for the investment problem investigated here. Furthermore, although SVMs have become a popular research methodology in financial applications, most studies have focused on forecasting the future direction of either a stock market index or individual stocks [14], [15], [16], [17] and [18]. Rather than predicting financial time series alone, in this study we investigate the task of stock selection using SVMs. This problem is challenging and important in investment, but it is not yet clear how SVMs can be used to advance this research area. Fan and Palaniswami [19] made an earlier attempt to apply SVMs to this problem. However, they used SVMs solely to classify stocks into winning and losing groups, and this coarse-grained classification usually fails to capture the more subtle characteristics of individual stocks. In this study, we instead use SVMs for regression (support vector regression, SVR) of stock returns. The predicted returns then serve as surrogates for the actual returns of stocks, indicating their quality and relative rankings. With this refinement, we demonstrate that SVR is an effective means for stock selection. However, despite the promising performance of SVM in classification and SVR in regression, their success in solving these two problems depends heavily on the input variables (features) to the model.
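The ranking step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. The fitted SVR is not shown, so its forecasts are passed in as a plain dictionary. The portfolio size `k` and the equal weighting are hypothetical choices, because the excerpt does not specify how many stocks are held or how they are weighted.

```python
from typing import Dict, List


def select_top_stocks(predicted: Dict[str, float], k: int) -> List[str]:
    """Rank tickers by predicted return (the SVR surrogate), best first, and keep k.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    if k <= 0:
        return []
    ranked = sorted(predicted.items(), key=lambda item: (-item[1], item[0]))
    return [ticker for ticker, _ in ranked[:k]]


def equal_weight_return(selected: List[str], realized: Dict[str, float]) -> float:
    """Realized return of an equally weighted portfolio (hypothetical weighting)."""
    if not selected:
        return 0.0
    return sum(realized[t] for t in selected) / len(selected)


if __name__ == "__main__":
    # Hypothetical SVR forecasts and realized next-period returns.
    predicted = {"AAA": 0.031, "BBB": -0.004, "CCC": 0.052, "DDD": 0.018}
    realized = {"AAA": 0.025, "BBB": 0.010, "CCC": 0.040, "DDD": -0.005}
    top = select_top_stocks(predicted, k=2)
    print(top)                                            # ['CCC', 'AAA']
    print(round(equal_weight_return(top, realized), 4))  # 0.0325
```

The ranking uses only the ordering of the forecasts, not their magnitudes. This matches the idea of treating predicted returns as surrogates for ranking rather than as exact return estimates.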
Yang and Honavar [20] indicated that the choice of features used to describe the patterns presented to a classifier determines several aspects of classification. These include the accuracy of the learned classifier, the computational overhead required to learn a classification function, the number of training examples needed for learning, and the cost associated with the features. The goal of feature selection is to identify useful, non-redundant subsets of features for a given data mining or machine learning task. By extracting the smallest set of the most essential features, one can reduce the computational cost significantly. One can also construct models that generalize well enough to deliver consistent performance on unseen datasets. Furthermore, the variables relevant to an SVM/SVR model comprise not only the features but also the kernel parameters, so a successful model in this line of research should address both issues simultaneously. Simultaneous optimization of kernel parameters and feature subsets for SVM-based models has been conducted in the literature. Fröhlich et al. [21] first studied this problem for SVMs using the GA, with feature selection as the main research subject. Huang and Wang [22] then presented a different version of this simultaneous optimization and showed that it improved the classification accuracy of their SVM on several UCI datasets [23]. Motivated by these promising results, in this stock selection study we propose an SVR-based model with a hybrid feature selection and parameter optimization methodology based on the GA. In the proposed framework, feature selection depends on the learning algorithm that constructs the SVR model. Our scheme is therefore categorized as a wrapper approach [24] and [25] rather than a filter approach. A wrapper evaluates each candidate feature subset by training and assessing the learner itself, whereas a filter ranks features by criteria independent of the learner.
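One way to let a single GA individual carry both the kernel parameters and a feature subset, in the spirit of Huang and Wang [22], is a binary chromosome. Its first genes encode the SVR penalty `C` and the RBF width `gamma`, and its remaining genes form a feature mask. The sketch below assumes a linear binary-to-real mapping. The gene lengths and search ranges (`BITS_PER_PARAM`, `C_RANGE`, `GAMMA_RANGE`) are hypothetical values, not the paper's settings. The SVR tube width ε could be encoded the same way.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical encoding choices: illustrative values, not taken from the paper.
BITS_PER_PARAM = 10
C_RANGE = (0.01, 100.0)
GAMMA_RANGE = (0.0001, 10.0)


@dataclass
class DecodedChromosome:
    C: float                   # SVR penalty parameter
    gamma: float               # RBF kernel width parameter
    feature_mask: List[bool]   # True means the feature is fed to the SVR


def bits_to_real(bits: Sequence[int], low: float, high: float) -> float:
    """Map a binary gene linearly onto [low, high]."""
    as_int = int("".join(str(b) for b in bits), 2)
    max_int = 2 ** len(bits) - 1
    return low + (high - low) * as_int / max_int


def decode(chromosome: Sequence[int], n_features: int) -> DecodedChromosome:
    """Split a chromosome into C bits, gamma bits and a feature mask."""
    expected = 2 * BITS_PER_PARAM + n_features
    if len(chromosome) != expected:
        raise ValueError(f"expected {expected} genes, got {len(chromosome)}")
    c_bits = chromosome[:BITS_PER_PARAM]
    g_bits = chromosome[BITS_PER_PARAM:2 * BITS_PER_PARAM]
    mask_bits = chromosome[2 * BITS_PER_PARAM:]
    return DecodedChromosome(
        C=bits_to_real(c_bits, *C_RANGE),
        gamma=bits_to_real(g_bits, *GAMMA_RANGE),
        feature_mask=[bit == 1 for bit in mask_bits],
    )


if __name__ == "__main__":
    chrom = [1] * 10 + [0] * 10 + [1, 0, 1, 1, 0]
    d = decode(chrom, n_features=5)
    print(round(d.C, 4), round(d.gamma, 4), d.feature_mask)
    # 100.0 0.0001 [True, False, True, True, False]
```

Many implementations search `C` and `gamma` on a logarithmic scale instead. The linear mapping here is only the simplest option.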
The wrapper approach to feature selection is employed in this study because of its better performance relative to the filter approach [22], [23], [24], [25] and [26]. In essence, the optimization method adopted here is very similar to that proposed by Huang and Wang [22]. Our main contribution lies in a proper setup that successfully applies this hybrid methodology to stock selection, a new application area for SVR. In short, the proposed methodology uses SVR to generate reliable surrogates of actual stock returns for ranking stocks. The top-ranked stocks are then chosen for portfolio construction, and the GA is employed to optimize the model parameters and feature subsets simultaneously. We report that the portfolios constructed by the proposed scheme substantially outperform the benchmark over a long period of time. This paper is organized into five sections. Section 2 outlines the methods employed in our study. Section 3 describes the research data used in this study. Section 4 describes the experimental design and reports and discusses the empirical results. Section 5 presents the conclusions and future research directions.
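Because the scheme is a wrapper, each chromosome's fitness comes from training and scoring the learner on the features and parameters it encodes. The generic GA loop below uses tournament selection, one-point crossover, bit-flip mutation and elitism. It is a sketch under those assumptions: the operators and rates are illustrative and not necessarily those used in the paper. In the real setting, `fitness` would decode the chromosome, fit an SVR on training data and return a validation score. Here a hypothetical toy fitness stands in so the example runs on its own.

```python
import random
from typing import Callable, List, Sequence, Tuple

Chromosome = List[int]


def tournament(pop: Sequence[Chromosome], fits: Sequence[float],
               rng: random.Random, size: int = 3) -> Chromosome:
    """Return the fittest of `size` randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), size)
    return pop[max(contenders, key=lambda i: fits[i])]


def one_point_crossover(a: Chromosome, b: Chromosome,
                        rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Swap the tails of two parents after a random cut point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chrom: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chrom]


def run_ga(fitness: Callable[[Chromosome], float], n_genes: int,
           pop_size: int = 30, generations: int = 40,
           crossover_rate: float = 0.8, mutation_rate: float = 0.02,
           elite: int = 2, seed: int = 0) -> Tuple[Chromosome, float]:
    """Maximize `fitness` over binary chromosomes of length `n_genes`."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(c) for c in pop]
        ranked = sorted(range(pop_size), key=lambda i: fits[i], reverse=True)
        next_pop = [pop[i][:] for i in ranked[:elite]]  # carry the best over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop, fits, rng), tournament(pop, fits, rng)
            if rng.random() < crossover_rate:
                c1, c2 = one_point_crossover(p1, p2, rng)
            else:
                c1, c2 = p1[:], p2[:]
            next_pop.append(mutate(c1, mutation_rate, rng))
            if len(next_pop) < pop_size:
                next_pop.append(mutate(c2, mutation_rate, rng))
        pop = next_pop
    fits = [fitness(c) for c in pop]
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best]


if __name__ == "__main__":
    # Hypothetical stand-in for "fit an SVR and score it on validation data":
    # reward agreement with an arbitrary target mask.
    target = [1, 0, 1, 1, 0, 0, 1, 0]

    def toy_fitness(chrom: Chromosome) -> float:
        return float(sum(g == t for g, t in zip(chrom, target)))

    best, score = run_ga(toy_fitness, n_genes=len(target))
    print(best, score)
```

Keeping the fitness function as a parameter is what makes this a wrapper. The GA never inspects the features directly; it only sees how well the learner performs with them.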
Conclusion
In this paper we presented a hybrid GA–SVR model for stock selection. The SVR method was used to generate predicted returns for a collection of stocks, which in turn served as surrogates of the actual returns for stock ranking. The top-ranked stocks were then selected as components of a portfolio. On top of this model, the GA was employed for feature selection and the optimization of model parameters. We evaluated our GA–SVR models statistically and validated the effectiveness of this method by comparison with the benchmark. We have shown that feature selection can shed light on which features play more important roles in the proposed model. Interestingly, the results also showed that, in this particular application, feature selection appears to contribute more to effective stock selection than the optimization of the model parameters alone. This work again highlights the crucial importance of feature selection in complex real-world problems such as the stock selection problem studied here. Overall, the empirical results showed that the investment returns provided by the proposed model can significantly outperform those of the benchmark. We therefore expect this hybrid GA–SVR methodology to advance research in computational finance and to provide a promising solution to stock selection in practice. In the future, a plausible research direction is to employ more advanced SVR models to investigate how the performance of stock selection can be further improved. In addition, because investment return and risk management appear to be two distinct objectives, the simultaneous optimization of these multiple objectives is another promising subject for future work. Finally, we intend to study the characteristics of the stock selection domain further. The aim is to determine which algorithm, including evolution strategies (ES), particle swarm optimization (PSO) or the GA, would be most fruitful for optimizing the proposed model.
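Claims of outperforming a benchmark are usually checked on compounded (cumulative) returns over the test period. The sketch below shows that arithmetic. The per-period return series are made-up numbers for illustration, and the paper's actual evaluation metrics and statistical tests are not described in this excerpt.

```python
from typing import Sequence


def compound(returns: Sequence[float]) -> float:
    """Cumulative return implied by a sequence of simple per-period returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def excess_cumulative_return(portfolio: Sequence[float],
                             benchmark: Sequence[float]) -> float:
    """Cumulative portfolio return minus cumulative benchmark return."""
    if len(portfolio) != len(benchmark):
        raise ValueError("portfolio and benchmark must cover the same periods")
    return compound(portfolio) - compound(benchmark)


if __name__ == "__main__":
    # Made-up per-period returns, for illustration only.
    portfolio = [0.10, -0.05, 0.08]
    benchmark = [0.05, -0.02, 0.03]
    print(round(compound(portfolio), 4))                             # 0.1286
    print(round(compound(benchmark), 4))                             # 0.0599
    print(round(excess_cumulative_return(portfolio, benchmark), 4))  # 0.0687
```

Compounding rather than summing the per-period returns matters over long horizons. Gains and losses apply to a changing capital base, so summed returns overstate or understate the actual outcome.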