Predicting stream flow is one of the most important steps in water resources management. Artificial neural networks (ANNs) have been suggested and applied for this purpose by many researchers. In such studies, popular methods such as multivariate linear regression (MLR) are usually used to verify and compare the ANN results. Unfortunately, the methodology presented in some of these studies has deficiencies. In this paper we therefore identify these deficiencies and present a corrected MLR methodology based on principal component analysis (PCA) for predicting monthly stream flow. Next, the effect of different training functions on ANN performance is investigated, and the best training function for optimizing the ANN parameters is selected. The shortcomings of the discrepancy ratio (DR) statistic are then remedied and a developed DR statistic is proposed. Finally, the error distributions of the MLR and ANN models in the testing stage are calculated using the developed DR statistic. The comparison shows that the methodology presented here improves the performance of MLR. In addition, the ANN model shows satisfactory predictive performance compared with the MLR model.
Stream flow forecasting has been an important challenge for researchers over the past two decades. Different approaches are used to model stream flow discharge, including regression models (Kuligowski & Barros, 1998; Adeloye & Munari, 2006), conceptual models (Jain & Srinivasulu, 2006; Xu et al., 1996) and black box models (Hsu et al., 1995; Muller-Wohlfeil et al., 2003). The multivariate linear regression (MLR) model is one of the customary statistical models used alongside artificial neural network (ANN) models in hydrological modeling. Moreover, because it is simpler to apply than conceptual models, MLR can be used even by non-experts in the field of water resources management (WRM). Applequist, Garhs, and Pfeffer (2002) compared five techniques (ANN, linear regression, discriminant analysis, logistic regression and a classifying system) for rainfall forecasting. They used meteorological variables for training over the central and eastern USA, and the logistic regression model performed best in their study. Ramirez, Velho, and Ferreira (2005) compared ANN and MLR techniques for rainfall forecasting and reported that the ANN forecasts were superior to those obtained by the linear regression model. Dawson, Abrahart, Shamseldin, and Wilby (2006) developed a flood estimation model for ungauged sites using ANN and MLR models and stated that ANN provides better results than the MLR model. High correlation among the independent variables is common in hydrological processes when there are many input variables. This causes the multicollinearity problem in the MLR model, which unfortunately has been neglected in some studies (Applequist et al., 2002; Dawson et al., 2006; Kuligowski & Barros, 1998; Ramirez et al., 2005).
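The multicollinearity problem described above can be checked numerically before an MLR model is fitted. The following is a minimal sketch, not the procedure of any cited study: the predictor names (`rain`, `rain_neighbor`, `temperature`) and the synthetic data are hypothetical, and the two diagnostics shown (the condition number of the standardized predictor matrix and the variance inflation factors) are standard choices assumed here for illustration.

```python
import numpy as np

def condition_number(X):
    """Ratio of largest to smallest singular value of the standardized predictors."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    singular_values = np.linalg.svd(Z, compute_uv=False)
    return singular_values[0] / singular_values[-1]

def variance_inflation_factors(X):
    """VIF of each predictor: the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic, hypothetical monthly records (240 months); not data from the paper.
rng = np.random.default_rng(0)
n = 240
rain = rng.gamma(shape=2.0, scale=30.0, size=n)
rain_neighbor = rain + rng.normal(0.0, 2.0, size=n)  # nearly a copy of `rain`
temperature = rng.normal(15.0, 8.0, size=n)

X_collinear = np.column_stack([rain, rain_neighbor, temperature])
X_independent = np.column_stack([rain, rng.gamma(2.0, 30.0, size=n), temperature])

vif_collinear = variance_inflation_factors(X_collinear)
vif_independent = variance_inflation_factors(X_independent)

print("VIF, collinear predictors:  ", vif_collinear)
print("VIF, independent predictors:", vif_independent)
print("Condition numbers:", condition_number(X_collinear), condition_number(X_independent))
```

Large variance inflation factors for the two rainfall columns indicate that their regression coefficients cannot be estimated reliably, which is the situation the PCA-based methodology is intended to avoid.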
On the other hand, a survey of the literature of the past two decades shows that ANN modeling studies in WRM continue, and every year papers appear on different aspects of this model that offer innovative solutions to challenging problems in the field. Most of these studies have used the feedforward backpropagation neural network (Karunanithi et al., 1994; Kisi, 2004; Noori et al., 2009a). The standard backpropagation algorithm (SBPA) has some problems, including very slow training convergence and easy entrapment in local minima (Haykin, 1994). Over the last decades, researchers have tried to solve these problems and improve ANN performance. Chau (2006) used particle swarm optimization as a training function to optimize the network weights and biases for predicting the water level in the Shing Mun River. He compared the results with those of the SBPA and reported the superiority of his model. Rogers, Dowla, and Johnson (1995) proposed the genetic algorithm instead of the SBPA. Wang, Gelder, Vrijling, and Ma (2006) used the gradient descent with momentum training function to optimize the network parameters for daily flow prediction. This training function often converges faster than the SBPA, and momentum allows a network to respond not only to the local gradient but also to recent trends in the error surface (Hagan, Demuth, & Beale, 1996).
In another study, Ramirez et al. (2005) proposed the resilient backpropagation (RP) training function for network training to predict rainfall in Sao Paulo, Brazil. Using RP can improve the results, for the following reason. Multilayer networks usually use sigmoid transfer functions in the hidden layers. These functions are often called “squashing” functions because they compress an infinite input range into a finite output range. The slope of a sigmoid function must approach zero as the input becomes large. Consequently, when steepest descent is used to train a multilayer network with sigmoid functions, a problem occurs: the gradient can have a very small magnitude and therefore cause only small changes in the weights and biases, even when these are far from their optimal values. RP eliminates these harmful effects by using only the sign of the gradient, not its magnitude, to determine the direction of each weight update. Also, some researchers have proposed the Levenberg–Marquardt algorithm (TRAINLM), introduced by Levenberg (1944) and Marquardt (1963), as a training function for backpropagation networks.
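The saturation effect and the sign-based remedy can be illustrated with a short sketch. It is a simplified variant of resilient backpropagation written for illustration only, not the implementation used in the cited studies: the function name `rprop_step`, the increase and decrease factors (1.2 and 0.5), the step limits and the learning rate of 0.01 are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_slope(x):
    """Derivative of the logistic sigmoid; it vanishes for large |x|."""
    s = sigmoid(x)
    return s * (1.0 - s)

def rprop_step(grad, prev_grad, step, inc=1.2, dec=0.5, step_min=1e-6, step_max=50.0):
    """One simplified resilient-backpropagation update (assumed parameter values).

    The per-weight step grows while the gradient keeps its sign and shrinks
    when the sign flips; the weight change uses only the sign of the gradient.
    """
    agreement = grad * prev_grad
    step = np.where(agreement > 0, np.minimum(step * inc, step_max), step)
    step = np.where(agreement < 0, np.maximum(step * dec, step_min), step)
    delta_w = -np.sign(grad) * step
    return delta_w, step

# The slope shrinks rapidly as the input grows: saturation of the sigmoid.
slopes = sigmoid_slope(np.array([0.0, 5.0, 10.0]))
print("sigmoid slopes:", slopes)

# One weight sits in a saturated region (tiny gradient), the other does not.
grad = np.array([1e-8, -3.0])
prev_grad = np.array([2e-8, -1.0])
step = np.array([0.1, 0.1])

delta_w, new_step = rprop_step(grad, prev_grad, step)
gd_delta = -0.01 * grad  # plain steepest descent with a hypothetical learning rate

print("steepest descent change:", gd_delta)
print("sign-based change:      ", delta_w)
```

In this sketch the steepest descent change for the saturated weight is negligible, whereas the sign-based update moves both weights by the same step size regardless of the gradient magnitude.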
In this paper we use two models, ANN and MLR. We propose a new application of principal component analysis (PCA) for preparing the input data fed to the MLR model. In addition, we investigate the effect of several important training functions, which use heuristic and optimization methods to update the ANN weights and biases. Finally, to compare the results of the models, a suitable statistical index is developed based on the discrepancy ratio (DR) statistic presented by White, Milli, and Crabbe (1973).
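The general idea of feeding principal components, rather than the raw correlated inputs, to a linear regression can be sketched as follows. This is a minimal sketch of principal component regression under stated assumptions, not the paper's exact procedure, which is not given in this section: the function names `fit_pcr` and `predict_pcr`, the choice of two retained components and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Regress y on the leading principal components of the standardized inputs."""
    mean = X.mean(axis=0)
    scale = X.std(axis=0, ddof=1)
    Z = (X - mean) / scale
    # Eigen-decomposition of the correlation matrix of the inputs.
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]
    scores = Z @ loadings  # mutually uncorrelated regressors
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mean": mean, "scale": scale, "loadings": loadings, "coef": coef}

def predict_pcr(model, X):
    Z = (X - model["mean"]) / model["scale"]
    scores = Z @ model["loadings"]
    return model["coef"][0] + scores @ model["coef"][1:]

# Synthetic, hypothetical monthly data with two nearly identical predictors.
rng = np.random.default_rng(1)
n = 240
rain = rng.gamma(2.0, 30.0, size=n)
rain_neighbor = rain + rng.normal(0.0, 2.0, size=n)
temperature = rng.normal(15.0, 8.0, size=n)
X = np.column_stack([rain, rain_neighbor, temperature])
flow = 0.8 * rain + 0.5 * temperature + rng.normal(0.0, 5.0, size=n)

model = fit_pcr(X, flow, n_components=2)
predicted = predict_pcr(model, X)
r_squared = 1.0 - np.sum((flow - predicted) ** 2) / np.sum((flow - flow.mean()) ** 2)
print("R^2 of the PCA-based regression:", r_squared)
```

Because the component scores are uncorrelated by construction, the regression on them does not suffer from the multicollinearity of the original inputs; how many components to retain is a modeling choice that this sketch does not address.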
In this research, to address the problems found in some water resources studies based on the MLR model, a proper methodology for this model was developed to predict monthly flow. In addition, a proper ANN model was developed using the most widely used training functions. The following conclusions can be drawn from this study:
• The discrepancy ratio statistic proposed by White et al. (1973) is not comprehensive and is not capable of evaluating all models.
• PCA is an alternative option for remedying the multicollinearity problem in the MLR model.
• The developed discrepancy ratio statistic is a suitable index for checking model robustness, and it does not have the limitations of the discrepancy ratio presented by White et al. (1973).
• Different training functions do not have a significant effect on ANN performance; however, CGF, SCG, and OSS give better results than the others.
• The analysis of model robustness for wet and arid periods indicated that the CGF and SCG models give better results than the OSS model.
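For orientation, an error distribution based on a discrepancy ratio can be sketched as follows. The sketch assumes that the original DR is the ratio of predicted to observed values, which this section does not state explicitly; the developed DR statistic of this study is not defined here and is not reproduced. The function names, the acceptance band of 0.5 to 2.0 and the flow values are hypothetical.

```python
import numpy as np

def discrepancy_ratio(predicted, observed):
    """Assumed form of the original DR: predicted value divided by observed value."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if np.any(predicted <= 0) or np.any(observed <= 0):
        raise ValueError("this ratio is only meaningful for positive flows")
    return predicted / observed

def fraction_within(dr, lower=0.5, upper=2.0):
    """Share of predictions whose DR lies inside an illustrative acceptance band."""
    return float(np.mean((dr >= lower) & (dr <= upper)))

# Hypothetical monthly flows, for illustration only.
observed = [10.0, 20.0, 40.0, 80.0]
predicted = [12.0, 18.0, 100.0, 80.0]

dr = discrepancy_ratio(predicted, observed)
share = fraction_within(dr)
print("discrepancy ratios:", dr)
print("share within band: ", share)
```

A ratio of 1 indicates a perfect prediction, values above 1 overestimation and values below 1 underestimation under the assumed definition.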