Applying data mining to learn system dynamics in a biological model
Article code | Publication year | Length of English article |
---|---|---|
22068 | 2006 | 9 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 30, Issue 1, January 2006, Pages 50–58
Abstract
Data mining comprises a set of powerful methods that have been applied successfully in many domains, including business, engineering, and bioinformatics. In this paper, we propose an innovative approach that uses genetic algorithms to mine the temporal behavior data output by a biological system in order to determine the system's kinetic parameters. Analyzing the behavior of a biological network is a complicated task. In our approach, the machine learning method is integrated with the framework of system dynamics so that its findings are expressed in the form of a system dynamics model. An application of the method to the cell division cycle model shows that it can discover approximate parameter values of the system and reproduce the input behavior.
Introduction
Traditionally, researchers have adopted reductionism to study biological phenomena, i.e. analyzing a system by repeatedly breaking it into constituents until they can be observed directly (Gallagher & Appenzeller, 1999). To find the function and role of a component, the researcher has to conduct repeated experiments with different system parameters or components. Although this approach works well in most situations, it often runs into difficulties when we want to examine interaction effects within a system or when the system is complicated. It is also well known that the net behavior of a biological system is usually not the sum of its components' behavior (Csete & Doyle, 2002) because of the so-called ‘emergent property’ (Bhalla and Iyengar, 1999; Gardner and Collins, 2000; Yi et al., 2000). Recently, a system view of biology called systems biology has been proposed (Chong and Ray, 2002; Davidson et al., 2000; Kitano, 2002a), which aims to develop a system-level understanding of biological systems (Kitano, 2002a). In other words, one wants to understand not only the molecules but also the cause–effect relationships linking the behavior of molecules, as well as the characteristics and functions of the system as a whole. Although artificial intelligence has been used increasingly to analyze biological data for years, this is certainly a more difficult case and calls for innovative methods.
In this paper, we propose an approach that integrates system dynamics and data mining methods to induce the dynamic behavior of a biological system. System dynamics is a discipline that studies the dynamic behavior of social systems (Forrester, 1961). In particular, it is well suited to modeling information-feedback characteristics in order to see how system structure, amplification (in policies), and time delays (in decisions and actions) interact to influence the behavior of an organization.
Since a social system is a combination of a number of simple entities (or agents) that operate in an environment and collectively generate complex behavior patterns, system dynamics may also be suitable for analyzing the information-feedback loops and complicated interactions within a biological system (Becskei and Serrano, 2000; Gardner and Collins, 2000). A challenge in applying system dynamics to the analysis of biological data is that the base model is usually constructed by human experts who know the application domain and can draw a flow diagram, by observing the operation of the target system, that represents the causal relationships among system entities (variables) (Coyle, 1977; Lyneis and Pugh, 1996; Starr, 1980). This is rarely possible in biological analysis, because in most cases the biological systems under study act like black boxes and only their input and output behavior can be observed over time. Thus, direct application of system dynamics to the construction of biological models is very difficult, if not impossible. We need a mechanism to bridge the gap.
A possible way to deal with the problem is to use data mining techniques to analyze the observed behavior data and discover the hidden relationships and/or rules behind the system dynamics. To do this, a data mining method needs to be augmented: it has to be given a conceptual framework beforehand so that the findings from the data will be expressed in the form of a system model. In this paper, we use a combination of genetic algorithms and artificial neural networks to implement the idea. The artificial neural network is designed to emulate a system dynamics model and is then encoded into a genetic form for learning. The proposed approach is tested on the synthetic cell division cycle model (CDC6, hereafter) created by Tyson (1991). The behavior data generated by the CDC6 model are given as input to the developed method, which learns the model's kinetic parameters.
The results are then compared with the original data to evaluate the effectiveness of the approach.
The remainder of the paper is organized as follows. Section 2 is a brief review of related literature. Section 3 describes the proposed approach for mining behavior relationships from a set of observed biological data. Section 4 illustrates the result when the approach is applied to the CDC6 model. Section 5 concludes the paper.
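The idea of using a genetic algorithm to recover kinetic parameters from temporal behavior data can be illustrated with a minimal sketch. It is not the method of the paper: the neural-network representation is omitted, and the one-variable model `dx/dt = k_syn - k_deg * x`, the parameter names, the function names, and all GA settings (population size, mutation width, and so on) are assumptions chosen only for illustration. The "observed" data are generated synthetically from known parameters, mirroring the setup in which model output is fed back to the learner.

```python
import random

def simulate(k_syn, k_deg, x0=0.0, dt=0.1, steps=100):
    """Euler-integrate the toy model dx/dt = k_syn - k_deg * x."""
    x, series = x0, []
    for _ in range(steps):
        series.append(x)
        x += dt * (k_syn - k_deg * x)
    return series

def error(params, observed):
    """Sum of squared differences between simulated and observed behavior."""
    simulated = simulate(*params)
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))

def fit_parameters(observed, pop_size=40, generations=100, seed=0):
    """Real-coded GA: elitism, tournament selection, blend crossover, Gaussian mutation."""
    rng = random.Random(seed)
    population = [(rng.uniform(0, 5), rng.uniform(0, 5)) for _ in range(pop_size)]
    for _ in range(generations):
        # Score every candidate once per generation, best (lowest error) first.
        scored = sorted((error(p, observed), p) for p in population)
        next_gen = [p for _score, p in scored[:2]]      # elitism: keep the two best
        while len(next_gen) < pop_size:
            parent_a = min(rng.sample(scored, 3))[1]    # tournament of size 3
            parent_b = min(rng.sample(scored, 3))[1]
            w = rng.random()
            child = tuple(
                max(0.0, w * ga + (1 - w) * gb + rng.gauss(0, 0.05))  # rates stay >= 0
                for ga, gb in zip(parent_a, parent_b)
            )
            next_gen.append(child)
        population = next_gen
    return min(population, key=lambda p: error(p, observed))

if __name__ == "__main__":
    target = simulate(2.0, 0.5)                 # synthetic "observed" behavior
    k_syn, k_deg = fit_parameters(target)
    print(f"k_syn={k_syn:.3f}, k_deg={k_deg:.3f}, "
          f"error={error((k_syn, k_deg), target):.4f}")
```

The fitness function compares whole trajectories rather than individual parameters, so the search is judged only on how well it reproduces the input behavior.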
Conclusion
In this paper, we present an innovative approach that represents a biological system with a specially designed artificial neural network and then uses genetic algorithms to adjust the link weights of the network in order to discover the system's kinetic parameters. Although some systems biology websites (e.g. http://sbml.org/index.psp) offer tools, some of which can perform modeling and simulation, the proposed approach is unique in that it bridges two sciences, system dynamics and systems biology, so that their knowledge can be shared and transferred for better integration. Applying data mining to the analysis of biological information is an important area of study. The rapid progress of biology has produced a tremendous amount of experimental data, and efficiently extracting the valuable knowledge hidden in it has become a major challenge. Data mining can contribute substantially in this area by generating potential solutions that save biologists time and effort. The example shown in this paper is just an initial step toward discovering related information from a biological system. The ultimate goal of this line of study is to use data mining techniques to assist model construction and behavior analysis in systems biology.
Although we have shown that the proposed approach is capable of modeling a biological system in system dynamics to analyze its behavior, many issues need further elaboration or investigation. For instance, a natural follow-up question is ‘can this method be applied to reveal structural information about a biological model?’ This is an area that traditionally can be handled only by human experts, and little literature on it can be found in biology. Koza et al. (2001) used genetic programming to discover the network of chemical reactions from a set of temporal data, but it required thousands of processors running in parallel for a number of hours.
Since our method has been shown to learn kinetic parameters successfully, it is quite possible that the approach can be extended to mine the structural information of a network. Because evolving a network involves adding nodes and links to it and removing them from it, one extension of this study is to revise the encoding scheme for partially recurrent neural networks. For example, we may encode the network indirectly (Curran & O'Riordan, 2002) and describe its structure by a set of construction instructions (i.e. a script), so that modifying the instructions changes the network structure. Inferring structural information, however, requires more care, because isomorphism may exist among different structures, so that different models generate the same (or similar) behavior patterns. Over-fitting may also occur when we use data mining. The concern is whether an automatic method will produce a model that generates ‘the right behavior for the wrong reasons’, or merely tries to ‘confirm but not falsify’ a hypothesis (Oliva, 2003).
Another issue related to this discussion is the ‘robustness’ of a biological system, which is being actively investigated in systems biology (Kitano, 2002b; Morohashi et al., 2002). It has been suggested that biochemical networks are conserved across species and are robust to variations in concentrations and kinetic parameters. If this is true, then what data mining discovers may not have reliable biological meaning, or at the least the judgment of domain experts is essential for interpreting and using the resulting models.
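The indirect encoding mentioned above, in which a network is described by a script of construction instructions rather than by its links directly, can be sketched as follows. This is only an illustration under stated assumptions: the instruction names (`add_node`, `remove_node`, `add_link`, `remove_link`), the node labels `X` and `Y`, and the mutation rule are hypothetical and are not taken from the paper or from Curran & O'Riordan (2002).

```python
import random

def build_network(script):
    """Interpret construction instructions in order and return (nodes, links)."""
    nodes, links = set(), {}
    for op, *args in script:
        if op == "add_node":
            nodes.add(args[0])
        elif op == "remove_node":
            nodes.discard(args[0])
            # Dropping a node also drops every link that touches it.
            links = {k: w for k, w in links.items() if args[0] not in k}
        elif op == "add_link":
            src, dst, weight = args
            if src in nodes and dst in nodes:   # ignore links to missing nodes
                links[(src, dst)] = weight
        elif op == "remove_link":
            links.pop(tuple(args), None)
    return nodes, links

def mutate(script, rng):
    """Return a copy of the script with one instruction appended or deleted."""
    new_script = list(script)
    nodes, _links = build_network(new_script)
    if nodes and rng.random() < 0.5:
        src, dst = rng.choice(sorted(nodes)), rng.choice(sorted(nodes))
        new_script.append(("add_link", src, dst, rng.uniform(-1.0, 1.0)))
    elif new_script:
        del new_script[rng.randrange(len(new_script))]
    return new_script

# A hypothetical two-node script; the self-link makes the network recurrent.
script = [
    ("add_node", "X"),
    ("add_node", "Y"),
    ("add_link", "X", "Y", 0.8),
    ("add_link", "Y", "Y", -0.3),
]

if __name__ == "__main__":
    print(build_network(script))
    print(build_network(mutate(script, random.Random(1))))
```

Because the genetic operators act on the script, a single edit can add or remove structure while the interpreter keeps the resulting network consistent (for example, by discarding links to nodes that no longer exist). Two different scripts can still build the same network, which is one concrete form of the isomorphism concern raised above.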