A dynamic programming approach to missing data estimation using neural networks
|Article code||Publication year||English article||Persian translation||Word count|
|25885||2013||10-page PDF||Available to order||Not calculated|
Publisher : Elsevier - Science Direct
Journal : Information Sciences, Volume 237, 10 July 2013, Pages 49–58
This paper develops and presents a novel technique for missing data estimation using a combination of dynamic programming, neural networks and genetic algorithms (GA) on suitable subsets of the input data. The proposed method is well suited to decision making processes and uses the concept of optimality and Bellman's equation to estimate the missing data. The approach is applied to an HIV/AIDS database, and the results show that it significantly outperforms a similar method in which dynamic programming is not used. This paper also suggests a different way of formulating a missing data problem such that dynamic programming can be applied to estimate the missing data.
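For reference, the generic textbook form of Bellman's optimality equation is reproduced below. The notation is assumed here for illustration (states s, actions a, reward R, transition probability P, discount factor γ) and is not necessarily the formulation the paper adopts for the missing data problem.

```latex
V^*(s) = \max_{a \in \mathcal{A}} \Big[ R(s, a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, V^*(s') \Big]
```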
Decision making processes are highly dependent on the availability of data, from which information can be extracted. All scientific, business and economic decisions depend in some way on the information available at the time they are made. For this reason, the problem of missing data afflicts a variety of research and application areas in fields such as engineering, economics, finance and many more. Most predictive and decision making models designed to use a specified number of inputs will break down when one or more inputs are not available. In many such applications, simply ignoring or deleting the incomplete record (known as case deletion) is not a favorable option, as it may do more harm than good. In a statistical model, case deletion can also lead to biased results, and in applications such as machine control, case deletion may result in the breakdown of machinery. Many techniques for estimating missing data that aim to minimize the bias or output error of a model have been extensively researched. Most of these are statistical methods, one of the most successful being Bayesian multiple imputation. Unfortunately, most decision making tools, such as the commonly used neural networks and many other computational intelligence techniques, cannot be used for decision making if the data are not complete. In such cases, the optimal decision output should nevertheless be maintained despite the missing data. The estimation of missing input vector elements requires a system that possesses knowledge of certain characteristics, such as correlations between variables, which are inherent in the input space. Computational intelligence techniques and maximum likelihood techniques do possess such characteristics and, as a result, are useful in the imputation of missing data. This paper proposes a novel technique for missing data estimation, grounded in the theory of dynamic programming.
The method proposed here uses neural networks and genetic algorithms (GA) on suitable subsets of the input data, and assumes some model that emits data, some of which are missing. The remainder of this paper is arranged as follows: Section 2 presents a literature review and discusses related methods. The problem is presented in detail in Section 3, followed by the background information in Section 4. Sections 5, 6 and 7 describe the proposed model in detail, together with the base model. Lastly, experimental results are given, followed by a discussion.
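The general idea of pairing an auto-associative model with a genetic algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a linear PCA projection stands in for the trained auto-associative neural network, the GA is a toy version with truncation selection and Gaussian mutation, the dynamic programming layer is omitted, and all function names, parameters and the synthetic data are hypothetical.

```python
import numpy as np


def fit_linear_autoassociator(X, n_components):
    """Assumed stand-in for an auto-associative network: a PCA projection."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:n_components]  # rows are the principal directions

    def reconstruct(x):
        # Project onto the principal subspace and map back to input space.
        return mean + (x - mean) @ basis.T @ basis

    return reconstruct


def estimate_missing(record, missing_idx, reconstruct, bounds, rng,
                     pop_size=40, generations=60, mutation_scale=0.05):
    """Toy GA: find values for the missing entries that minimise the
    reconstruction error of the completed record."""
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, len(missing_idx)))

    def error(candidate):
        x = record.copy()
        x[missing_idx] = candidate  # fill the gaps with the candidate values
        return np.sum((x - reconstruct(x)) ** 2)

    for _ in range(generations):
        fitness = np.array([error(c) for c in pop])
        parents = pop[np.argsort(fitness)[: pop_size // 2]]  # keep best half
        n_children = pop_size - len(parents)
        i = rng.integers(0, len(parents), size=n_children)
        j = rng.integers(0, len(parents), size=n_children)
        alpha = rng.uniform(size=(n_children, 1))
        children = alpha * parents[i] + (1 - alpha) * parents[j]  # crossover
        children += rng.normal(0.0, mutation_scale * (hi - lo),
                               size=children.shape)               # mutation
        pop = np.vstack([parents, np.clip(children, lo, hi)])

    fitness = np.array([error(c) for c in pop])
    return pop[np.argmin(fitness)]


# Synthetic data (hypothetical): the third variable is roughly the sum of
# the first two, so it is recoverable from the correlations in the data.
rng = np.random.default_rng(0)
a = rng.normal(size=(200, 2))
X = np.column_stack([a, a.sum(axis=1) + 0.01 * rng.normal(size=200)])

reconstruct = fit_linear_autoassociator(X, n_components=2)
record = np.array([0.5, -0.2, np.nan])  # third entry is missing
estimate = estimate_missing(record, [2], reconstruct, bounds=(-3.0, 3.0), rng=rng)
print(estimate)
```

The sketch relies on the same premise the text states: the missing entry can only be estimated because the model has captured correlations between variables. A record completed with a poor guess reconstructs badly, so the reconstruction error serves as the fitness function for the search.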
Conclusion
A model using auto-associative neural networks and a genetic algorithm was built to estimate missing data, using the principle of dynamic programming. The results indicate that dynamic programming adds many advantages to the base model. The difference is large, and future work will test its statistical significance. Since the model was run more than 10 times for each variable, the improvement can reasonably be credited to dynamic programming. The rationale for why this method improves the performance of the base model is that the base model assumes the data variables to be related to one another in some way. From statistical analysis, it can be anticipated that parameters such as race have an influence on other variables such as HIV status. In this model, where these parameters are viewed as states, the results show that the states do affect each other. From a behavioral point of view, a person is likely to act more similarly to someone close to their own age than to someone much older or younger. With this approach, however, it is left to the model to derive the policy that maximizes prediction performance.