Financial time series forecasting using independent component analysis and support vector regression
Article code | Publication year | English article length |
---|---|---|
25033 | 2009 | 11 pages (PDF) |
Publisher: Elsevier - Science Direct
Journal : Decision Support Systems, Volume 47, Issue 2, May 2009, Pages 115–125
Abstract
As financial time series are inherently noisy and non-stationary, forecasting them is regarded as one of the most challenging applications of time series forecasting. Because of its generalization capability and its ability to obtain a unique solution, support vector regression (SVR) has been successfully applied to financial time series forecasting. In modeling financial time series with SVR, one of the key problems is the inherently high noise. Thus, detecting and removing the noise are important but difficult tasks when building an SVR forecasting model. To alleviate the influence of noise, a two-stage modeling approach using independent component analysis (ICA) and support vector regression is proposed for financial time series forecasting. ICA is a novel statistical signal processing technique that was originally proposed to find the latent source signals from observed mixture signals without any prior knowledge of the mixing mechanism. The proposed approach first applies ICA to the forecasting variables to generate the independent components (ICs). After the ICs containing the noise are identified and removed, the remaining ICs are used to reconstruct the forecasting variables, which then contain less noise and serve as the input variables of the SVR forecasting model. To evaluate the performance of the proposed approach, the Nikkei 225 opening index and the TAIEX closing index are used as illustrative examples. Experimental results show that the proposed model outperforms both the SVR model with non-filtered forecasting variables and a random walk model.
Introduction
Interest in financial time series forecasting has grown in recent years, as accurate forecasting of financial prices and indices has become an important issue in investment decision making. However, financial time series are inherently noisy and non-stationary [19] and [64]. The noise characteristic refers to the unavailability of complete information from the past behavior of financial markets to fully capture the dependency between future and past prices; information that is not included in the forecasting model is considered noise. The non-stationary characteristic implies that the distribution of a financial time series changes over time. Therefore, financial time series forecasting is regarded as one of the most challenging tasks in time series forecasting.

Neural networks have been found useful for modeling financial time series because they can capture subtle functional relationships in empirical data even when the underlying relationships are unknown or hard to describe [34], [36], [37], [38], [52], [61], [65] and [66]. Unlike traditional statistical models such as Box-Jenkins ARIMA [5], neural networks are data-driven, non-parametric models. They do not require strong model assumptions and can map any nonlinear function without a priori assumptions about the properties of the data [20], [61] and [66]. The most popular neural network model for financial forecasting is the backpropagation neural network (BPN), which has a simple architecture but powerful problem-solving ability. However, the BPN suffers from a number of shortcomings, such as the need for a large number of controlling parameters, difficulty in obtaining a stable solution and the risk of model over-fitting [7], [8], [55] and [56].

The support vector machine (SVM) is a novel neural network algorithm based on statistical learning theory [59] and [60].
It offers great potential and superior performance in practical applications. This is largely due to the structural risk minimization principle used in SVMs, which gives greater generalization ability than the empirical risk minimization principle adopted by traditional neural networks. Because of their generalization capability and their ability to obtain a unique solution, SVMs have drawn the attention of researchers and have been applied in many areas, such as texture classification, image recognition, data mining and bioinformatics [6], [14], [22], [31], [40], [44], [46] and [50]. With the introduction of Vapnik's ε-insensitive loss function, the regression version of SVMs, called support vector regression (SVR), has also received increasing attention for solving nonlinear estimation problems [59] and [60]. It has been successfully applied to various time series prediction problems, such as production value forecasting in the machinery industry, engine reliability prediction, wind speed prediction and financial time series forecasting [7], [8], [26], [30], [45], [48], [55], [56] and [57]. These successful results motivate our use of SVR for financial time series forecasting.

In modeling financial time series with SVR, one of the key problems is the inherent noise of the series. Learning from noisy observations without care may lead the model to fit the unwanted data and distort the approximation function, resulting in a loss of generalization capability in the testing phase. Moreover, noise in the data can lead to over-fitting or under-fitting problems [7] and [19]. Therefore, detecting and removing the noise are important but difficult tasks when building an SVR forecasting model.

Few studies have proposed ways to reduce the influence of noisy data and enhance the robustness of SVR. Chuang et al. [15] proposed a robust support vector regression network. They used concepts from traditional robust statistics to fine-tune the model obtained by SVR, aiming to reduce over-fitting and improve learning performance. Suykens et al. [53] presented a weighted version of the least squares SVM (LS-SVM) to overcome the effects of outliers. In their approach, an LS-SVM is trained on the entire dataset to yield the support values. The small fraction of the dataset associated with the support values of smallest magnitude is discarded, and the LS-SVM is retrained on the remaining data. This process is repeated until a sufficiently small kernel expansion is obtained. These existing methods either involve extensive computation or introduce additional parameters into the SVR algorithm to reduce the effects of outliers and noise in the data. Extensive computation increases the running time of the SVR algorithm, while improperly chosen parameters may affect the final results, and selecting those parameters is not straightforward.

To avoid the limitations of the existing methods and reduce the influence of noise, this research proposes a two-stage approach that combines independent component analysis (ICA) and support vector regression for modeling financial time series. ICA is a novel statistical signal processing technique for finding independent sources given only observed data that are mixtures of unknown sources, without any prior knowledge of the mixing mechanism [25] and [35]. In the basic ICA model, the observed mixture signals X can be expressed as X = AS, where A is an unknown mixing matrix and S represents the latent source signals that cannot be directly observed from the mixture signals X. The ICA model describes how the observed mixture signals are generated by a process that uses the mixing matrix A to linearly mix the latent source signals S.
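
The linear mixing model X = AS can be illustrated with a toy sketch in plain Python. Everything here is an assumption made for illustration: the two source signals, the 2x2 mixing matrix and the helper names `matmul` and `inverse_2x2` are hypothetical. The de-mixing matrix is computed by inverting the known A, which only demonstrates the algebra; a real ICA algorithm has to estimate such a matrix from X alone.

```python
# Toy illustration of the basic ICA model X = A S (all values are made up).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (p, q), (r, s) = m
    det = p * s - q * r
    if det == 0:
        raise ValueError("mixing matrix must be invertible")
    return [[s / det, -q / det], [-r / det, p / det]]

# Two latent source signals (rows of S), four time steps each.
S = [[1.0, -1.0, 1.0, -1.0],
     [0.5, 0.25, -0.5, -0.25]]

# Hypothetical mixing matrix A; in a real problem it is unknown.
A = [[1.0, 0.5],
     [0.3, 1.0]]

X = matmul(A, S)      # observed mixture signals, X = A S
W = inverse_2x2(A)    # idealized de-mixing matrix (ICA would estimate this)
Y = matmul(W, X)      # recovered signals, Y = W X, equal to S up to rounding
```

Because W is the exact inverse of A in this sketch, Y reproduces S. An estimated de-mixing matrix generally recovers the sources only up to scaling and ordering.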
The source signals are assumed to be mutually statistically independent. Based on this assumption, the ICA solution is obtained through an unsupervised learning process that finds a de-mixing matrix W. The de-mixing matrix W transforms the observed mixture signals X into the independent signals Y, i.e., Y = WX. The independent signals Y are then used as estimates of the latent source signals S. The rows of Y, called independent components (ICs), are required to be as mutually independent as possible.

Although the basic ICA model has been widely applied in signal processing, face recognition, feature extraction and quality control [3], [17], [28], [29], [32], [43], [42] and [58], there are still few applications of ICA in financial time series forecasting. Back and Weigend [1] used ICA to extract features of the daily returns of the 28 largest Japanese stocks. Their results showed that the dominant ICs can reveal more of the underlying structure and information of the stock prices than principal component analysis. Kiviluoto and Oja [33] employed ICA to find the fundamental factors affecting the cash flow of 40 stores in the same retail chain. They found that the cash flow of the retail stores was mainly affected by holidays, seasons and competitors' strategies. Oja et al. [47] applied ICA to foreign exchange rate time series forecasting. They first used ICA to estimate the independent components and the mixing matrix from the observed time series data, and filtered the independent components with linear and nonlinear smoothing techniques to reduce the effects of noise. Then, an autoregressive (AR) model was employed to predict the smoothed independent components. Finally, they combined the predictions of the smoothed ICs using the mixing matrix and thus obtained predictions for the original observed time series.

Only very few articles address both ICA and SVR in forecasting tasks.
Cao and Chong [9] employed ICA as a feature extraction tool in developing an SVM forecaster. The independent components (ICs) were treated as features of the forecasting data and used to build the SVM forecasting model. Chen et al. [11] combined dynamic independent component analysis (DICA) with SVR to construct a multi-layer support vector regression model. DICA was used in the first layer to extract the major dynamic features from the process, and SVR in the second layer performed the regression estimation based on the extracted features. Hou et al. [21] applied ICA and SVR in near-infrared (NIR) spectral analysis. They used ICA to extract the independent components and the corresponding mixing matrix from the NIR spectra of chemical components; SVR was then used to build a model relating the mixing matrix to the real concentration matrix of the chemical components. Wang et al. [62] utilized kernel independent component analysis and SVR for the estimation of source ultraviolet spectra profiles and the simultaneous determination of polycomponents in mixtures. They applied ICA to estimate the ultraviolet source spectra profiles. Then, the calibration model was built using SVR based on the mixing matrix.

Existing ICA–SVR approaches usually use only the independent components or the mixing matrix as inputs to the SVR model. Moreover, they do not discuss the features of the ICs. In contrast, our proposed ICA–SVR model identifies the ICs that represent the main features or the noise of the original data. Based on these two points, we believe that our proposed modeling approach differs from those in the literature and hence provides an ideal alternative for financial time series forecasting.

In this study, we present a financial time series forecasting model that integrates ICA and SVR.
The ICA method is used to detect and remove the noise in financial time series data and thereby improve the performance of SVR. The proposed approach first applies ICA to the forecasting variables to estimate the independent components and the mixing matrix. Since financial time series are inherently noisy, at least one IC can be used to represent the noise information in the data. After the ICs containing the noise are identified and removed, the remaining ICs are used to reconstruct the forecasting variables, which then contain less noise. The SVR then uses the filtered (or de-noised) forecasting variables to build the forecasting model. To evaluate the performance of the proposed approach, the Nikkei 225 opening cash index and the TAIEX (Taiwan Stock Exchange Capitalization Weighted Stock Index) closing cash index are used as illustrative examples.

The rest of this paper is organized as follows. Sections 2 and 3 give brief introductions to independent component analysis and support vector regression, respectively. The proposed two-stage forecasting model is described in detail in Section 4. Section 5 presents the experimental results for the Nikkei 225 opening cash index and TAIEX closing cash index datasets. The paper is concluded in Section 6.
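
The reconstruction step of the first stage can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the mixing matrix `A_hat` and the components `ICS` are made-up values standing in for ICA estimates, the helper `reconstruct` is hypothetical, and the index of the noise IC is simply assumed to be known (this excerpt does not describe how the paper identifies it). The sketch uses the fact that X = AY is the sum, over ICs, of one column of the mixing matrix times one row of Y, so dropping an IC means leaving its term out of the sum.

```python
def reconstruct(mixing, ics, drop=()):
    """Rebuild the forecasting variables from ICs, skipping those in `drop`.

    mixing: n_vars x n_ics matrix (column i holds the weights of IC i)
    ics:    n_ics x n_steps matrix (row i is IC i)
    drop:   indices of ICs assumed to carry noise
    """
    n_vars, n_steps = len(mixing), len(ics[0])
    kept = [i for i in range(len(ics)) if i not in drop]
    return [[sum(mixing[v][i] * ics[i][t] for i in kept)
             for t in range(n_steps)]
            for v in range(n_vars)]

# Hypothetical estimates for three forecasting variables and three ICs.
A_hat = [[1.0, 0.2, 0.1],
         [0.4, 1.0, 0.1],
         [0.3, 0.5, 0.1]]
ICS = [[1.0, 2.0, 3.0, 4.0],      # trend-like component
       [1.0, -1.0, 1.0, -1.0],    # cycle-like component
       [0.7, -0.2, 0.9, -0.5]]    # component assumed to represent noise

full = reconstruct(A_hat, ICS)                # equals the original variables
filtered = reconstruct(A_hat, ICS, drop={2})  # variables without the noise IC
```

In a full pipeline, the rows of `filtered` would replace the raw forecasting variables as inputs to the SVR model. Library classes such as scikit-learn's `FastICA` and `SVR` could plausibly supply the two stages, although the source does not say which software the authors used.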
Conclusion
This paper proposed a two-stage forecasting model that integrates ICA and SVR for financial time series. Because financial time series data are inherently noisy, the proposed ICA–SVR method first uses ICA with a reconstruction criterion to remove the noise from the forecasting variables. Noise in the data can lead to over-fitting or under-fitting problems. The filtered forecasting variables, which contain less noise, are then used by SVR to build the forecasting model. The experiments evaluated two datasets: the Nikkei 225 opening cash index and the TAIEX closing cash index. This study compared the proposed method with traditional SVR and random walk models, using prediction error and prediction accuracy as criteria. Experimental results showed that the proposed model produced lower prediction error and higher prediction accuracy, outperforming the SVR and random walk models. According to the experiments, it can be concluded that the proposed method can effectively detect and remove the noise from financial time series data and improve the forecasting performance of SVR. Future research may combine ICA with other forecasting tools, such as neural networks and grey system theory, to evaluate the ability of the proposed de-noising forecasting scheme. Integrating SVR with other signal processing techniques, such as the wavelet transform and nonnegative matrix factorization, to further improve forecasting capability can also be investigated in future studies.
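
For readers unfamiliar with the random walk benchmark, the following sketch shows how such a baseline is commonly scored. The one-step-ahead random walk forecast is simply the previous observation. The price series is made up, and the two error measures (RMSE and mean absolute error) are assumptions chosen for illustration, since this excerpt does not list the paper's exact criteria.

```python
import math

def random_walk_forecast(series):
    """One-step-ahead random walk: the forecast for step t is the value at t-1."""
    return series[:-1]

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up index values, for illustration only.
prices = [100.0, 102.0, 101.0, 105.0]

actual = prices[1:]                        # values to be forecast
baseline = random_walk_forecast(prices)    # previous value used as forecast

baseline_rmse = rmse(actual, baseline)
baseline_mae = mae(actual, baseline)
```

Any candidate model's forecasts for the same steps can be passed to `rmse` and `mae` in place of `baseline` to obtain directly comparable error figures.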