Bayesian network modeling for evolutionary genetic structures
Article code | Publication year | Length of English article |
---|---|---|
29005 | 2010 | 11 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Computers & Mathematics with Applications, Volume 59, Issue 8, April 2010, Pages 2541–2551
Abstract
Evolutionary theory states that stronger genetic characteristics reflect the organism’s ability to adapt to its environment and to survive the harsh competition faced by every species. Evolution normally takes millions of generations to assess and measure changes in heredity. Determining the connections that constrain genotypes and lead superior ones to survive is an interesting problem. In order to accelerate this process, we develop an artificial genetic dataset, based on an artificial life (AL) environment genetic expression (ALGAE). ALGAE can provide a useful and unique set of meaningful data, which can not only describe the characteristics of genetic data, but also simplify its complexity for later analysis. To explore the hidden dependencies among the variables, Bayesian Networks (BNs) are used to analyze genotype data derived from simulated evolutionary processes and provide a graphical model to describe various connections among genes. There are a number of models available for data analysis, such as artificial neural networks, decision trees, factor analysis, BNs, and so on. Yet BNs have distinct advantages as analytical methods that can discern hidden relationships among variables. Two main approaches, constraint based and score based, have been used to learn the BN structure. However, each approach suits either sparse structures or dense structures. We first introduce a hybrid algorithm, called “the E-algorithm”, which combines the benefits of both approaches and offsets their limitations in BN structure learning. Testing the E-algorithm against the standardized benchmark dataset ALARM suggests valid and accurate results. BAyesian Network ANAlysis (BANANA), which incorporates the E-algorithm, is then developed to analyze the genetic data from ALGAE. The resulting BN topological structure with conditional probabilistic distributions reveals the principles of how survivors adapt during evolution, producing an optimal genetic profile for evolutionary fitness.
Introduction
Bayesian Network (BN) modeling for evolutionary genetic structure uses BNs to analyze genotype data derived from evolutionary processes and provides a graphical model to describe hidden dependencies among genes. According to evolutionary theory, stronger genetic characteristics reflect the organism’s ability to adapt to its environment and to survive the harsh competition faced by every species [1], [2] and [3]. Each individual’s traits and characteristics are coded into cellular information called genes. Genes evolve to be strong, fit genes; that is, nature selects the best genes and reproduces them using inheritance through generations of survivors. Such evolution normally takes millions of generations. An interesting question is which hidden connections constrain genotypes yet lead to superior characteristics that promote survival. In order to explore this problem, we accelerate this process significantly, so that we can evaluate the genetic change much more rapidly. We then analyze the hidden evolutionary relationships. Having revealed these connections, we can determine which precise factors and connections promote fitness in an individual population or species. There are a number of models available for data analysis, such as artificial neural networks, decision trees, factor analysis, BNs, and so on. Yet BNs have distinct advantages as computational tools. A BN is an analytical tool that can discern hidden relationships among variables [4]. A BN can handle incomplete datasets just as well as complete ones, and it can discover dependencies among all variables by representing them in a comprehensible graphical model. BNs have been widely used in bioinformatics (gene regulatory networks, protein structure), medicine, document classification, information retrieval and image processing [5], [6], [7], [8], [9], [10], [24], [25] and [26].
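For orientation, the graphical model referred to above can be summarized by the standard BN factorization. This is a textbook identity rather than a formula quoted from the paper, and the symbols $X_i$ and $\mathrm{Pa}(X_i)$ are notation assumed here. For variables $X_1, \dots, X_n$ arranged in a directed acyclic graph, where $\mathrm{Pa}(X_i)$ denotes the parents of $X_i$:

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)$$

Each missing edge corresponds to a conditional independence assumption, which is why a learned structure can be read as a map of dependencies among genes.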
As probabilistic models, BNs have been used to replace the traditional variation step of genetic and evolutionary algorithms in evolutionary computing [11]. In [11], Pelikan segments chromosomes into different traps, treats them as variables, and builds a probabilistic model from them; that model alone is then used to sample solutions and generate the new candidate population. The BN yields a more promising solution population; however, the real reason this method produces the optimal candidate population more efficiently is that it discovers the hidden relationships among the genes. Our work therefore aims to reveal these hidden relationships among the genes by applying a BN as an analytical tool for a population solution space, rather than as a probabilistic sampling tool. We therefore propose to apply BNs to analyze data arising in genetic research. We demonstrate our idea on a simulated genetic dataset, which mimics a biology-driven artificial life (AL) environment [12]. This AL simulation, Artificial Life Genetic Algorithm Expression (ALGAE), provides a useful and unique set of meaningful data, which can not only describe the characteristics of genetic data, but also simplify its complexity for our BN analysis. BAyesian Network ANAlysis (BANANA) is then developed to analyze the genetic data from ALGAE. BANANA incorporates a BN structure learning algorithm, the E-algorithm, which was first proposed by Yan et al. [13] and which later adaptations, applied to a business model [10] and [14], have shown to be efficient and accurate for constructing BN structure. The goal of our research is to reveal the hidden connections among genetic characteristics. Each chromosome in the AL species contains a coded gene sequence representing particular species characteristics. These characteristics appear random, but after generations of evolution, certain genetic attributes will emerge as dominant.
However, this hidden information is not apparent from the raw data, and the meaning needs to be extracted and interpreted. BN analysis of the genetic data can produce a graphical and statistical representation showing the dependencies between genotypes among populations. Analyzing the hidden dependencies between genetic descriptors is significant because the research produces two important outcomes. Firstly, we generate an interesting and unique genetic dataset using the AL model, which extends the versatility and utility of the Genetic Algorithm (GA) so that it becomes a remarkable instrument for creating hypotheses about any given entities. Secondly, using BNs to analyze the hidden dependencies among AL genetic data is a unique methodology. It provides a new approach to problem solving by combining evolutionary principles and BN modeling, based upon generating unique and expressive data. This paper is organized as follows: Section 2 provides background on Bayesian network learning and the E-algorithm; Section 3 introduces the design of ALGAE and the experiments used to obtain artificial genetic data; Section 4 explains the process called BANANA and the modeling of the AL genetic data structure, and discusses the experimental results on hidden connections among genotype characteristics; Section 5 summarizes our contribution and poses some open questions for further research.
Conclusion
Bayesian networks in Gene Selection applies BNs to analyze and explain relationships between characteristics of artificial life species. Species can represent any organisms or classes of organism, or any comparable classes of entity existing in a competitive environment. Assuming that evolutionary data is provided, BN analysis helps us understand the dependencies implicit in the relationships. First, we provide the E-algorithm for BN structure learning with two noteworthy improvements. The first defines a partial structure, the “Δ-form”, for CI tests, in order to reduce redundant causal connections between variables. The second orders the mutual information between each variable and its parents and uses that ordering in a heuristic search to reduce redundant recursions and to solve variable-combination problems. Experiments on ALARM show that the E-algorithm is valid, accurate and effective for BN learning. Second, we implement ALGAE to simulate the viability of two populations in a competitive environment, subject to evolving and adapting forces. ALGAE proves effective at generating data which emulate natural selection and evolution for any two species or entities with definable characteristics. Control of certain factors, such as environment, genetic recombination and selection, and the presence or absence of specific genes, produced valid and reliable data about which genes were fittest, given the constraints of their environment. The dataset compares favorably with standardized datasets. Third, BANANA, which incorporates the E-algorithm, is used to analyze the artificial chromosomes produced by the ALGAE evolutionary process. This research extends the utility of artificial life and the genetic algorithm by capturing and interpreting data which might otherwise be unavailable. This result also provides a unique bridge connecting BNs and evolutionary processes.
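The mutual-information ordering mentioned above can be illustrated with a minimal sketch. It is not the E-algorithm itself: it only estimates empirical mutual information between pairs of discrete variables and sorts the pairs from strongest to weakest dependence, the kind of ranking a heuristic structure search could consume. The function names `mutual_information` and `rank_pairs_by_mi` and the toy genotype columns `gene_a`, `gene_b` and `gene_c` are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_pairs_by_mi(data):
    """Return [((name_a, name_b), mi), ...] sorted by decreasing mutual information.

    `data` maps a variable name to its list of observed discrete values.
    """
    names = sorted(data)
    scored = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            scored.append(((a, b), mutual_information(data[a], data[b])))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Hypothetical toy data: gene_b copies gene_a, gene_c is independent of both.
toy = {
    "gene_a": [0, 0, 1, 1, 0, 0, 1, 1],
    "gene_b": [0, 0, 1, 1, 0, 0, 1, 1],
    "gene_c": [0, 1, 0, 1, 0, 1, 0, 1],
}
ranking = rank_pairs_by_mi(toy)
for pair, mi in ranking:
    print(pair, round(mi, 3))
```

On this toy data the pair (`gene_a`, `gene_b`) ranks first with one bit of shared information, while both pairs involving `gene_c` score zero. A real analysis would also need a significance threshold or a conditional independence test before treating a high score as an edge.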
These evolutionary simulation data are useful to researchers who can benefit from predictive modeling. The experimental results show that Bayesian networks are flexible and valuable analytical data mining tools. The overall results are encouraging and suggest three outcomes. One, a single chromosome or gene combination derived from evolution does not, of itself, determine fitness or survivability in a given environment. Two, fitness is contingent on the relationship between the AGenes, the mix, and the resulting genotype. Three, BANANA provides a map of the ideal genotype which demonstrates optimal fitness under certain conditions. Thus “optimal” does not mean any particular gene, but a combination of genes. The process of evolution is accelerated by ALGAE, allowing us to observe generations of genes evolving in a short time. This allows us to foresee the genetic recombination process. We analyze the linkages between generations that favor fitness (and thus survival) which emerge from the data. BN analysis is a critical method for revealing the hidden structure, its relationships and, more importantly, its rules. The principles of how a survivor adapts in evolution from either optimal ancestors or weak ones, and at what point the evolutionary process can be tilted to favor certain adaptive ones, need further research.