Using Genetic Algorithms for Variable Selection in Solar Radiation Estimation
Article code | Publication year | English article length |
---|---|---|
8102 | 2013 | 9 pages (PDF) |
Publisher: Elsevier - ScienceDirect
Journal: Renewable Energy, Volume 50, February 2013, pp. 168–176
Abstract
Prediction of climatic variables, in particular those related to wind and solar radiation, has attracted great interest in recent years, mainly because of its applications to renewable energy. In many cases a large number of factors influence the climatic variable of interest. The researcher chooses the most relevant ones (based on prior knowledge of the region, data availability, etc.) and runs a series of experiments combining the available data to find the combination that provides the best prediction. In this work we present two applications of Niching Genetic Algorithms to the problem of selecting variables for the estimation of solar radiation. On the one hand, the methodology can estimate a given climatic variable from databases with missing data, since the algorithm can compensate for the missing variables by using others. On the other hand, we present a methodology that systematically selects the relevant input variables for a given climatic-variable estimation or prediction problem, using the same Genetic Algorithm with different parameters. Both methods were tested on the estimation of daily Global Solar Radiation at El Colmenar (Tucumán, Argentina), using linear regression on data from 14 weather stations spread across northern Argentina. The results show that the methodology is appropriate: with 70 individuals and 85 generations it achieved RMSE = 2.36 MJ/m² and R = 0.926 using, on average, 64 of the 329 initial variables. With 200 individuals and 150 generations it achieved RMSE = 2.34 MJ/m² and R = 0.928 using an average of 54 variables.
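The abstract reports accuracy as RMSE (in MJ/m²) and R. The sketch below shows one common way to compute both, assuming R is the Pearson correlation coefficient between observed and estimated daily values; the excerpt does not define R explicitly, and the function names and sample values are hypothetical.

```python
import numpy as np


def rmse(observed, estimated) -> float:
    """Root mean squared error, in the units of the inputs (e.g. MJ/m^2)."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean((observed - estimated) ** 2)))


def pearson_r(observed, estimated) -> float:
    """Pearson correlation coefficient (assumed meaning of R)."""
    return float(np.corrcoef(observed, estimated)[0, 1])


# Hypothetical daily global radiation values (MJ/m^2).
obs = np.array([20.0, 18.0, 25.0, 12.0])
est = np.array([19.0, 19.0, 24.0, 13.0])
print(rmse(obs, est))  # 1.0
print(pearson_r(obs, est))
```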
Introduction
Prediction of climatic variables, in particular those related to wind and solar radiation, has attracted great interest in recent years, mainly because of its applications to renewable energy. Solar radiation is one of the climatic variables for which missing data is a widespread problem: worldwide, only about one in 500 weather stations measures incident solar radiation (Raichijk et al. [1]). Solar radiation affects crop growth and is used in numerical models to estimate soil moisture, photosynthesis and potential evapotranspiration (Ball et al. [2]). Many agricultural regions of Argentina lack radiation data, which therefore need to be estimated (Al-Alawi and Al-Hinai [3]). Empirical models (Noia et al. [4], Tovar and Baldasano [5]), statistical approaches from time-series analysis (Ji et al. [6]), linear regression (Lin and Gao [7], Ji et al. [6], Bocco et al. [8]) and neural networks (Al-Alawi and Al-Hinai [3], Mohandes et al. [9], Kalogirou et al. [10], Bocco et al. [11], Rehman and Mohandes [12]) are some of the main tools that have been applied to estimate and predict the solar radiation reaching the earth's surface. Studies report that linear regression performs better than statistical models (Ji et al. [6]), and that neural networks outperform linear regression (Bocco et al. [11], Bocco et al. [8]) at the cost of longer processing times. These methods have also been used to predict other weather phenomena (Brahm and Varas [13], Bilgili et al. [14], Kusiak and Li [15]). In all of these works, the models and the variables used for prediction are selected by analyzing combinations of climatic variables drawn from a list of possibilities proposed by the researcher. Linear correlation is sometimes used for this selection (Al-Alawi and Al-Hinai [3], Kusiak and Li [15]), even though the relationship among the variables may be nonlinear, as in the case of neural networks.
It has also been reported that the quality of the estimates depends largely on the degree of correlation between the weather stations used. Moreover, the number of variables in the model can be increased in an attempt to improve prediction accuracy. The drawback is that this greatly increases the number of possible combinations, making it harder to determine which variables have the greatest influence on the climatic variable to be estimated, and increasing the number of tests needed. The objective of this work is to use a niching genetic algorithm to determine which climatic variables and weather stations have the greatest influence on the estimation of a climatic variable. This allows an objective analysis of large amounts of data as a preliminary step to other analyses, such as neural networks, that would otherwise be prohibitively expensive. In our approach, all non-relevant variables can be eliminated and the critical variables identified, greatly reducing the number of variables involved in a prediction or estimation. It is important to distinguish the proposed methodology from sensitivity analysis. Sensitivity analysis studies how error or uncertainty in the output variables can be related to error or uncertainty in the input variables (Saltelli et al. [16], Saltelli [17]). Although more sophisticated methods exist, most methods found in the literature are local or one-at-a-time (Saltelli et al. [16]), and the technique is mainly used to determine which variable has the greatest influence on the output (see Varmuza and Filzmoser [18] and Vázquez Piqué et al. [19]). The proposed methodology, by contrast, aims to provide different combinations of input variables that yield the prediction with the smallest possible error. The goal is to identify the variables with the greatest influence in the model and to allow the prediction methodology to be used when some of the variables are missing.
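The selection idea can be sketched as a binary-mask genetic algorithm: each bit says whether one candidate variable (a climatic variable at a given station) is used, and fitness comes from the validation RMSE of a linear regression on the selected columns. This is a minimal sketch under stated assumptions, not the authors' implementation. The niching scheme is assumed to be fitness sharing over Hamming distance. The operators (binary tournament, uniform crossover, bit-flip mutation) and the parameters `sigma_share`, `p_mut` and `seed` are hypothetical choices. Only the default population size and generation count (70 and 85) come from the abstract.

```python
import numpy as np


def validation_rmse(mask, X_tr, y_tr, X_va, y_va):
    """Fit OLS (with intercept) on the selected columns; return validation RMSE."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return np.inf  # an empty selection cannot estimate anything
    A = np.column_stack([np.ones(len(X_tr)), X_tr[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    B = np.column_stack([np.ones(len(X_va)), X_va[:, cols]])
    return float(np.sqrt(np.mean((B @ coef - y_va) ** 2)))


def shared_fitness(raw, pop, sigma_share):
    """Fitness sharing (assumed niching scheme): divide by each niche count."""
    # Pairwise Hamming distance between binary masks.
    dist = (pop[:, None, :] != pop[None, :, :]).sum(axis=2)
    sh = np.where(dist < sigma_share, 1.0 - dist / sigma_share, 0.0)
    return raw / sh.sum(axis=1)  # diagonal term is 1, so no division by zero


def niching_ga(X_tr, y_tr, X_va, y_va, pop_size=70, generations=85,
               p_mut=0.01, sigma_share=10, seed=0):
    rng = np.random.default_rng(seed)
    n_vars = X_tr.shape[1]
    pop = rng.random((pop_size, n_vars)) < 0.5  # bit i: variable i in/out
    best_mask, best_err = None, np.inf
    for _ in range(generations):
        err = np.array([validation_rmse(m, X_tr, y_tr, X_va, y_va) for m in pop])
        i = int(np.argmin(err))
        if err[i] < best_err:  # keep the best mask seen in any generation
            best_mask, best_err = pop[i].copy(), float(err[i])
        raw = 1.0 / (1.0 + err)  # lower RMSE -> higher fitness; inf -> 0
        fit = shared_fitness(raw, pop, sigma_share)
        # Binary tournament selection on the shared fitness.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(fit[a] >= fit[b], a, b)]
        # Uniform crossover between each parent and its neighbour.
        partners = np.roll(parents, 1, axis=0)
        children = np.where(rng.random(parents.shape) < 0.5, partners, parents)
        # Bit-flip mutation.
        children ^= rng.random(children.shape) < p_mut
        pop = children
    # The returned population is the last offspring (not yet evaluated).
    return best_mask, best_err, pop


# Hypothetical data: 12 candidate variables, only columns 0 and 3 matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 12))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.1, size=300)
mask, err, final_pop = niching_ga(X[:200], y[:200], X[200:], y[200:],
                                  pop_size=30, generations=20)
print(np.flatnonzero(mask), round(err, 3))
```

Because shared fitness penalizes crowded regions of the search space, the population tends to keep several distinct masks with similar error, which is how a niching GA can offer alternative variable combinations when some inputs are missing.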
We apply the methodology to the estimation of Global Solar Radiation at El Colmenar, Tucumán, Argentina, using linear regression on data from 13 other weather stations spread across northern Argentina.
Conclusion
The genetic algorithm system developed correctly identifies several variables as critical, several equivalent combinations of other variables that produce the same prediction error, and some variables as redundant. The search space is a combinatorial optimization problem with $2^{329}$ possible solutions (one for every subset of the 329 candidate variables), which justifies the use of a genetic algorithm; the algorithm reduces the selection to an average of 54 variables. The methodology estimates the climatic variable with similar precision from different combinations of data, chosen according to availability or other criteria (e.g., preferring variables with a smaller measurement error in order to reduce the final error). Moreover, even lower validation errors can be achieved by using data from one or more days after the estimation day. This may make sense in some applications, such as filling gaps in historical data, but it limits real-life applications such as data quality control or prediction, where future data are not available. The proposed methodology allows a more efficient use of the available data, since a user can analyze larger data sets at the same time and select the relevant stations and variables for a later analysis. In future papers, we will adapt the methodology to variable selection for neural networks, where the problem is not equivalent, owing to the large number of variables involved and the long processing times.
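A quick arithmetic check puts the size of the search space in perspective. The sketch assumes each individual is evaluated once per generation, so a run costs at most population × generations fitness evaluations; the excerpt does not state how evaluations were counted.

```python
import math

N_VARS = 329  # candidate input variables reported in the abstract
search_space = 2 ** N_VARS  # one candidate solution per subset of variables

print(len(str(search_space)))             # 100 (digits)
print(f"{math.log10(search_space):.2f}")  # 99.04

# Upper bound on fitness evaluations for the two reported configurations,
# assuming one evaluation per individual per generation.
budgets = [pop * gens for pop, gens in [(70, 85), (200, 150)]]
print(budgets)                            # [5950, 30000]
```

Even the larger configuration evaluates at most 30,000 subsets out of roughly $10^{99}$, which is why a guided search such as a genetic algorithm is used instead of exhaustive enumeration.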