Unconventional systems analysis problems in molecular biology: a case study in gene regulatory network modeling
|Article code|Year of publication|English article|Word count|
|---|---|---|---|
|27960|2005|17-page PDF|11,150 words|
Publisher: Elsevier (ScienceDirect)
Journal: Computers & Chemical Engineering, Volume 29, Issue 3, 15 February 2005, Pages 547–563
The broad conceptual postulate that systems engineering techniques developed for complex chemical processes may be applicable to complex cell biological processes is very compelling. However, a naïve, “direct” application of systems engineering techniques to biological problems of practical significance may be rendered virtually ineffective by fundamental differences between cell biology and chemical processes. These differences and the problems they pose are illustrated in this paper with an example problem: modeling a gene regulatory network involved in the yeast cell cycle. We demonstrate how the biological essence complicates a straightforward “process modeling/identification” problem and subsequently recommend an alternative approach. The approach—a middle ground between a direct, “off the shelf” application of systems engineering tools and a “one-at-a-time” ad-hoc development—incorporates fundamental knowledge of the mechanisms and constraints intrinsic to biological systems. The principles and implementation details of the approach are illustrated with the case study.
Cells are complex dynamical systems that constantly remodel themselves in response to changes in their internal and external environments. The emergence of technology for acquiring genome-wide gene expression and other global data sets has created an opportunity to understand this process at the system-wide level. Given its scale and complexity, the problem of deciphering system-wide cellular regulation has naturally attracted computational approaches from the physical, engineering, and computer sciences. The net result is that many cell biologists are increasingly applying computational methods in their work, while many computational scientists are increasingly engaging in biological research problems. Just as individuals trained in the biological sciences must acquire sufficient domain knowledge before they can effectively apply advanced modeling, simulation, and analysis techniques, so must the computational scientist acquire sufficient domain knowledge to deal effectively with complex biological problems. Illustrating this latter point is the focus of the present work. Specifically, we present the idiosyncrasies, complexities, and constraints intrinsic to biological systems that must be addressed for systems engineering tools and expertise to succeed on typical, yet non-trivial, problems in biology. We take as an example the modeling and identification of gene regulatory networks from global gene expression data, specifically the regulatory network underlying the yeast cell cycle.
1.1. Computational modeling in biology
Whereas a key objective of modeling and identification of chemical processes is model development for process control and optimization, the key objective of modeling biological systems is currently “process understanding” or “reverse engineering” (Csete and Doyle, 2002).
Such understanding may be in terms of metaphors for signaling pathways that explain the “rationale” behind their complex architectures (Bhalla, 2003; Neves & Iyengar, 2002); it may also be understanding that serves to reduce complexity and scale, for example understanding the global gene expression response of a cell to a perturbation as the response of a few transcriptional regulators instead of the response of thousands of genes (Bussemaker, Li, & Siggia, 2001). The latter may be considered integrative understanding, in that thousands of observations are integrated into a concise explanation, and the consistent framework that modeling and identification provide for such integrative understanding may be one of the strongest arguments for applying these techniques in biology (Hartemink, Gifford, Jaakkola, & Young, 2002; Ideker et al., 2001; Jarvis et al., 2002). Integrating complementary data types into computational models provides understanding that would not be possible if the data types were considered individually. Genomic sequences, gene expression profiles, protein–DNA interaction data, and transcript degradation constants, for example, may be integrated into computational models of system-wide transcriptional regulation that are far more interpretable with respect to cellular function than any of these data types is individually. Finally, computational models can enhance experimental studies: they can be used to generate testable hypotheses and to design experiments tailored to extract the desired system information optimally. When sufficient process understanding has been acquired, it may become possible to “forward engineer” biological systems to meet specific objectives (Yokobayashi, Collins, Leadbetter, Weiss, & Arnold, 2003).
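To make the idea of summarizing thousands of gene-level responses by a few regulator-level quantities concrete, the sketch below regresses each gene's log expression ratio on the number of copies of a regulatory motif in its promoter, in the spirit of (but not reproducing) the regression approach of Bussemaker, Li, & Siggia (2001). It is a minimal single-motif sketch under stated assumptions: the function name, the ordinary least-squares fit, and the motif counts and expression values are all hypothetical and invented for illustration.

```python
from statistics import fmean


def fit_motif_activity(motif_counts: list[float], log_ratios: list[float]) -> tuple[float, float]:
    """Ordinary least squares for log_ratio[g] ~ c + F * motif_count[g].

    Returns (c, F): a genome-wide baseline shift c and a single activity F
    for the motif. Hypothetical single-motif simplification.
    """
    mx, my = fmean(motif_counts), fmean(log_ratios)
    sxy = sum((x - mx) * (y - my) for x, y in zip(motif_counts, log_ratios))
    sxx = sum((x - mx) ** 2 for x in motif_counts)
    if sxx == 0:
        raise ValueError("motif counts do not vary across genes; F is not identifiable")
    activity = sxy / sxx
    return my - activity * mx, activity


# Hypothetical data: copies of one motif in each of 8 promoters, and noise-free
# log expression ratios generated from c = 0.1 and F = 0.5.
motif_counts = [0, 0, 1, 1, 2, 2, 3, 3]
log_ratios = [0.1 + 0.5 * n for n in motif_counts]
c, F = fit_motif_activity(motif_counts, log_ratios)
print(round(c, 6), round(F, 6))  # 0.1 0.5
```

Eight gene-level numbers collapse into two parameters here; with thousands of genes and a handful of motifs, the same reduction is what turns a genome-wide response into the activities of a few regulators.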
Conclusions
In the present study, using the gene regulatory network modeling problem as an example, we have demonstrated how systems engineering approaches that explicitly recognize the complexities, constraints, and idiosyncrasies of biological systems can effectively handle complex biological problems. We described how imposing structure via fundamental knowledge, and including multiple types of data in the resulting structured modeling and identification approach, can render an otherwise intractable problem more tractable (although at the expense of introducing additional idiosyncrasies specific to each additional data type). Given the current rate of progress in genomic sequencing, annotation, and bioinformatic tool development, additional data types and information that may constrain model structures are increasingly available. Because these data types are available for practically any system of interest, there is no reason to exclude them from attempts to model gene regulation. While the data collections for most other organisms are not currently as extensive as those for yeast, the information that is available can nevertheless significantly enhance the modeling efforts. For example, merely specifying that only transcription factors (TFs) may regulate the expression of genes can reduce the number of model parameters by an order of magnitude. Using data currently available for yeast, we demonstrated a framework for integrating multiple data types into subcellular and nuclear connectivity structures that may be used as prior knowledge in the modeling and identification of gene regulatory networks. Nuclear connectivity, as obtained through a structured modeling approach, specifies which genes are regulated by which TFs and can greatly improve the tractability of the gene network identification problem.
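The order-of-magnitude reduction mentioned above follows from simple counting. As a minimal sketch, assume a hypothetical interaction model with one parameter per regulator–target link: an unstructured model lets every gene regulate every gene, a TF-only model restricts regulators to TFs, and nuclear connectivity further fixes the few TFs that regulate each gene. The gene count, TF count, and regulators-per-gene value below are illustrative round numbers, not figures from the paper.

```python
def interaction_parameter_counts(n_genes: int, n_tfs: int,
                                 mean_regulators_per_gene: float) -> dict[str, float]:
    """Parameter counts for a hypothetical model with one parameter per regulator-target link."""
    return {
        # Any gene may regulate any gene (including itself).
        "unstructured": n_genes * n_genes,
        # Only transcription factors may act as regulators.
        "tf_only": n_genes * n_tfs,
        # Nuclear connectivity fixes which few TFs regulate each gene.
        "nuclear_connectivity": n_genes * mean_regulators_per_gene,
    }


# Illustrative values only.
counts = interaction_parameter_counts(n_genes=6000, n_tfs=200, mean_regulators_per_gene=3)
print(counts)  # {'unstructured': 36000000, 'tf_only': 1200000, 'nuclear_connectivity': 18000}
print(counts["unstructured"] / counts["tf_only"])  # 30.0
```

The TF-only reduction factor is simply the ratio of genes to TFs, so it reaches an order of magnitude whenever genes outnumber TFs by roughly ten to one or more.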
We presented an approach for determining nuclear connectivity that is based on the assumption that genes with similar expression profiles are regulated similarly; it combines gene expression data with promoter sequences and information from transcriptional regulatory element (TRE) databases through clustering and TRE searching techniques. We observed that some biological signals were strong in that they did not depend on specific clustering parameters (for example, the enrichment of clusters for both SCB and MCB). We also observed greater robustness at the TRE level than at the gene level: clusters were enriched for several specific TREs regardless of the clustering parameters, whereas the genes that made up those clusters generally varied. It may therefore be advantageous to perform the analysis using an ensemble of clustering parameters, given that some TREs (for example, SFF and ACE2–SWI5) appear significant only when certain numbers of clusters are used. Finally, by comparing the centers of clusters that were enriched for the same TRE, we observed that our assumption that genes with similar expression profiles are regulated similarly holds for the present system. We searched the literature to identify the TFs that bind to the TREs, thereby linking TFs to their target genes and completing a preliminary nuclear connectivity structure. Before proceeding to model identification, however, we refined the nuclear connectivity by retaining only the TF–gene interactions that had also been identified in protein–DNA interaction data from the literature (Lee et al., 2002). Agreement between the predicted interactions and those observed in the protein–DNA interaction data was reasonable, at 10–30%.
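The cluster-enrichment step and the ensemble idea can be sketched as follows. This assumes a one-sided hypergeometric test, a common choice for asking whether a cluster contains more TRE-bearing promoters than expected by chance; the excerpt does not state which statistic the authors used. The sketch scores a TRE by the fraction of clustering runs (for example, runs with different numbers of clusters) in which at least one cluster is enriched for it. The function names, the threshold `alpha`, and the toy gene sets are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total: int, n_with_tre: int, cluster_size: int, hits: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits).

    n_total:      genes in the whole data set
    n_with_tre:   genes whose promoter contains the TRE
    cluster_size: genes in the cluster being tested
    hits:         cluster genes whose promoter contains the TRE
    """
    upper = min(n_with_tre, cluster_size)
    tail = sum(
        comb(n_with_tre, k) * comb(n_total - n_with_tre, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(n_total, cluster_size)


def ensemble_support(clusterings: dict[int, list[set[int]]], genes_with_tre: set[int],
                     n_total: int, alpha: float = 0.01) -> float:
    """Fraction of clusterings (keyed, e.g., by number of clusters) in which at
    least one cluster is enriched for the TRE at significance level alpha."""
    n_significant = 0
    for clusters in clusterings.values():
        if any(
            enrichment_p_value(n_total, len(genes_with_tre), len(c), len(c & genes_with_tre)) < alpha
            for c in clusters
        ):
            n_significant += 1
    return n_significant / len(clusterings)


# Toy example: 100 genes; the TRE occurs in the promoters of genes 0-9.
genes = range(100)
tre_genes = set(range(10))
clusterings = {
    2: [set(range(50)), set(range(50, 100))],                 # TRE genes concentrated in one cluster
    4: [{g for g in genes if g % 4 == r} for r in range(4)],  # TRE genes spread across clusters
}
print(ensemble_support(clusterings, tre_genes, n_total=100))  # 0.5
```

In this toy case the TRE is detected with two clusters but not with four, which is the kind of parameter-dependent signal that motivates looking across an ensemble rather than trusting a single clustering.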
We then combined nuclear connectivity with transcript half-lives from the literature and reasonable dynamical model structures, and identified computational models that describe quantitatively how the expression levels of target genes respond to changes in the expression levels of TFs. The results showed that the activity of some TFs (SWI4, MCM1) may be effectively regulated at the level of gene expression, in that the model structures in Eqs. (3), (4), (5) and (6), which are based entirely on transcriptional regulation of TF activity, described the regulation of their target genes very well. This result is particularly interesting for SWI4, given that it is a component of a TF (SBF) whose activity is known to be regulated post-transcriptionally (Chen et al., 2000). It suggests that the conditions that activate SBF post-transcriptionally may be effectively saturating, making the expression of SWI4 the rate-limiting step that determines when SBF regulates the expression of its target genes. For the other TFs we considered (RAP1, SWI5, ACE2, Msn2, Msn4), the targets were generally not well modeled as functions of the TFs' mRNA levels, suggesting post-transcriptional regulation of the activities of these TFs, which would necessitate more complex models and additional data. We also demonstrated the importance of a structured modeling approach, specifically nuclear connectivity, by considering how well the ACE2–SWI5 target genes could be modeled as targets of random genes. It was not difficult to find false interactions that were modeled at least as well as the true interactions, which shows that prior knowledge of nuclear connectivity plays a critical role in identifying gene regulatory networks. This observation corroborates our previous results obtained using simulated systems (Zak, Doyle, Gonye, & Schwaber, 2001; Zak et al., 2003b).
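A minimal sketch of this kind of identification, assuming a hypothetical first-order structure dx/dt = k*u(t) - (ln 2 / t_half)*x, in which a target mRNA level x is driven by the mRNA level u of its TF and the decay rate is fixed from a literature half-life. This is not Eqs. (3)–(6), which are not reproduced in this excerpt; the function names, the forward-Euler discretization, and the finite-difference least-squares fit are assumptions. The last function mirrors the random-regulator control described above: it reports how often unrelated profiles fit a target at least as well as its true regulator.

```python
import math
from typing import Sequence


def simulate_target(u: Sequence[float], k: float, half_life: float, dt: float,
                    x0: float = 0.0) -> list[float]:
    """Forward-Euler simulation of dx/dt = k * u(t) - (ln 2 / half_life) * x."""
    kd = math.log(2) / half_life
    x = [x0]
    for ui in u[:-1]:
        x.append(x[-1] + dt * (k * ui - kd * x[-1]))
    return x


def fit_gain(u: Sequence[float], x: Sequence[float], half_life: float,
             dt: float) -> tuple[float, float]:
    """Least-squares estimate of the gain k with the decay rate fixed by the half-life.

    Rearranges each Euler step as (x[i+1] - x[i]) / dt + kd * x[i] = k * u[i],
    which is linear in k. Returns (k, sum of squared residuals).
    """
    kd = math.log(2) / half_life
    y = [(x[i + 1] - x[i]) / dt + kd * x[i] for i in range(len(x) - 1)]
    us = list(u[: len(y)])
    denom = sum(v * v for v in us)
    if denom == 0:
        raise ValueError("regulator profile is identically zero")
    k = sum(a * b for a, b in zip(us, y)) / denom
    sse = sum((b - k * a) ** 2 for a, b in zip(us, y))
    return k, sse


def fraction_as_good(x: Sequence[float], true_regulator: Sequence[float],
                     candidates: list[Sequence[float]], half_life: float, dt: float) -> float:
    """Fraction of candidate regulator profiles (e.g. randomly chosen genes)
    that fit the target at least as well as its true regulator."""
    _, sse_true = fit_gain(true_regulator, x, half_life, dt)
    n_good = sum(1 for c in candidates if fit_gain(c, x, half_life, dt)[1] <= sse_true)
    return n_good / len(candidates)


# Hypothetical TF mRNA profile sampled every 5 min; target half-life 20 min.
tf = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 1.5, 2.5]
target = simulate_target(tf, k=2.0, half_life=20.0, dt=5.0)
k_hat, sse = fit_gain(tf, target, half_life=20.0, dt=5.0)
decoys = [[1.0] * len(tf), list(reversed(tf))]
print(round(k_hat, 6), fraction_as_good(target, tf, decoys, 20.0, 5.0))  # 2.0 0.0
```

Fixing the decay rate from a measured half-life leaves only the gain to estimate, which is what keeps this toy fit linear. With noise-free synthetic data the true regulator wins outright; the point made in the text is that with real expression data a nonzero fraction of random decoys can fit as well, so connectivity must come from prior knowledge rather than from fit quality alone.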
Knowledge of nuclear connectivity also allowed us to identify shortcomings in the dynamical model structures we considered, something not possible with unstructured approaches that use microarray data alone. For example, genes YOR264W and YHR143W, both known from the literature to be regulated by ACE2 and/or SWI5, were identified as targets of ACE2 and/or SWI5 in our structured identification approach, but they were modeled as targets of random genes as well as, or better than, as targets of their true regulators. An unstructured modeling approach would either assign equal weight to the true and incorrect interactions or, worse, neglect the true interactions in favor of false ones. One aspect of modeling and identification that is missing from the present study is validation: the systematic comparison of model predictions with data from independent experiments. Validation has an important role to play in the modeling and identification of biological systems, but it must be done carefully, given the biological complexity and the limited scope of any model derived from a single set of experiments. For example, gene regulatory network models derived from yeast cell cycle expression data should not be expected to fit expression data from the yeast diauxic shift, for the basic reason that TFs that are active during the cell cycle may not be active during the diauxic shift, and vice versa. This difficulty can be overcome to some extent by first checking the suitability of the validation data. In the cell cycle/diauxic shift example, the suitability of the validation data set can be assessed by determining the nuclear connectivity from the diauxic shift data before using it to validate models derived from cell cycle data. Models of TF/gene pairings that are consistent between the two data sets may then be readily validated with the new data set.
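The suitability check just described amounts to set operations on TF/gene pairings. A minimal sketch, assuming each data set's nuclear connectivity has already been reduced to a set of (TF, target gene) pairs; the function name and all identifiers in the example are hypothetical.

```python
Pair = tuple[str, str]  # (TF, target gene)


def partition_pairings(train_pairs: set[Pair],
                       validation_pairs: set[Pair]) -> tuple[set[Pair], set[Pair]]:
    """Split TF/gene pairings by whether the connectivity of both data sets supports them.

    Returns (validatable, unresolved):
      validatable: pairings present in both connectivity structures, whose
                   models can be checked against the new data set;
      unresolved:  pairings found in only one of the two, which need other data.
    """
    validatable = train_pairs & validation_pairs
    unresolved = train_pairs ^ validation_pairs  # symmetric difference
    return validatable, unresolved


# Hypothetical connectivity inferred under two conditions.
condition_1 = {("TF_A", "gene1"), ("TF_A", "gene2"), ("TF_B", "gene3")}
condition_2 = {("TF_A", "gene1"), ("TF_C", "gene4")}
ok, pending = partition_pairings(condition_1, condition_2)
print(sorted(ok))       # [('TF_A', 'gene1')]
print(sorted(pending))  # [('TF_A', 'gene2'), ('TF_B', 'gene3'), ('TF_C', 'gene4')]
```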
The remaining TF/gene pairings, and pairings that are unique to the new data set, will require other data sets for validation. If nuclear connectivity is determined using many diverse data sets, however, the complete set of TFs that regulate each gene will be revealed. This comprehensive nuclear connectivity can be used to construct comprehensive gene regulatory network models that can be validated with any gene expression data set collected from the system, because they describe how the target genes are regulated by any TF that is potentially active in the system, not just the TFs that are active under a particular condition. Thus, as in the identification of chemical process systems, models derived from diverse data sets will be able to capture a greater diversity of dynamics. This requirement is more extreme in biological systems, however, because the data is used to identify both the network structure and the parameters, and the effective structure (which TFs are actively regulating which genes) is condition-dependent. Finally, the dynamical structures we considered for the cytoplasmic model, h(·), may be overly simple for all but developmental TFs that are regulated at the level of transcription (Brivanlou & Darnell, 2002), especially given that these structures did not describe very well the relationships between most of the TFs we found to be active in our system and their targets. The activity of several yeast cell cycle TFs is known to be regulated post-transcriptionally, and a computational model describing some aspects of this regulation has been published (Chen et al., 2000). We do not advocate our modeling approach or the simple models in Eqs. (3), (4), (5) and (6) as replacements for such detailed computational models of biochemical processes. Rather, we advocate the use of simple model structures for h(·) only when detailed information about the regulation of TF activity is not available.
Since our objective in the present work was to demonstrate a general approach, we did not include details specific to the regulation of cell cycle TF activity and therefore used the simple structures for h(·). We believe it would be straightforward, yet still productive, to integrate the detailed model of cell cycle biochemistry into h(·), yielding an overall model that branches out from the core cell cycle to the system-wide regulation of gene expression. An example of a similar approach may be found in Jin et al. (2003). We leave that modeling effort for future work.