Download English ISI article no. 21444
Article title

Performance comparison among multivariate and data mining approaches to model presence/absence of Austropotamobius pallipes complex in Piedmont (North Western Italy)
Article code: 21444
Publication year: 2011
Length of English article: 10 pages (PDF)
Source

Publisher: Elsevier - Science Direct

Journal: Comptes Rendus Biologies, Volume 334, Issue 10, October 2011, Pages 695–704

Keywords
Freshwater ecosystem, Management, Logistic regression, Decision trees, Artificial neural network
Article preview

Abstract

Freshwater inhabitants in Piedmont (Italy) have been deeply disadvantaged by environmental changes caused by human disturbance. Hence there are endangered species that need human intervention of an entirely different kind: better management through the development of innovative practical tools. The most ecologically important of the river-dwelling invertebrates is a threatened species, the native white-clawed crayfish Austropotamobius pallipes. This is the species that we focused on in our effort to contribute to species conservation. Specifically, we contrasted three different techniques for modeling data on the presence/absence of this species: logistic regression, decision-tree models and artificial neural networks (ANNs). Logistic regression and decision-tree models (unpruned and pruned) performed worse than ANNs. In this case, tree-pruning techniques did not make these models significantly more reliable, but they did make the trees less complex and therefore the models clearer. ANNs performed best, and we therefore judged them to be the most effective technique.

Introduction

Freshwaters, which are rapidly deteriorating all around the world, have been the focus of more and more attention [1], [2] and [3]. This attention has inspired many studies analyzing the ecological, environmental and habitat factors that affect the distribution of freshwater organisms at different spatial scales. However, one kind of freshwater organism that has been relatively neglected is the crustacean [4], [5], [6], [7], [8] and [9]. In relation to crustaceans, we endeavored to analyze the relationship between species distribution and ecological factors, a fundamental step towards increasing our knowledge of freshwater ecosystems, of the communities associated with them, and of information important for management and conservation. Worldwide, freshwater habitats are being subjected to such marked human disturbance that the extinction rate of freshwater species is predicted to be five times that of terrestrial species and three times that of coastal marine mammals [10]. All this urges us to foster habitat and species preservation by developing practical tools for the ecological assessment of running waters and species conditions.

The biological model we used in this research project is the white-clawed crayfish Austropotamobius pallipes complex, the biggest indigenous freshwater invertebrate in Western and Central Europe [11] and [12]. Over the last few decades, European populations of native crayfish have been fragmented and have declined all over the continent [13]. Human disturbance has provoked habitat fragmentation, deforestation and water deterioration. Larger, more aggressive, and quicker-growing non-native crayfish [14], [15], [16] and [17] have been introduced. On top of this, human disturbance is liable to become even more severe in the future, while non-indigenous species are transmitting the crayfish plague caused by Aphanomyces astaci (Schikora, 1906) [18]. A. pallipes has therefore been in need of special protective measures and was listed as “vulnerable” on the Red List of threatened animal species compiled by the International Union for the Conservation of Nature and Natural Resources [19] and in annexes II and V of the Habitat Directive (Council of the European Communities, 1992, 1997). In Piedmont (NW Italy), A. pallipes is protected locally by a Regional Law (L.R. number 37 dated 29/12/06), which lays down new regulations for the management of aquatic fauna, habitat, and fishing. In particular, it provides policies aimed at re-establishing consistent populations of native species.

A. pallipes, like other native crayfish, is considered a keystone species [20], an important component of many food webs in freshwater ecosystems [21], [22], [23] and [24]. Crayfish are involved in the food chain: they are prey for vertebrate predators [25] and, in turn, are omnivorous feeders with a significant impact on community structures [26], [27], [28], [29], [30] and [31]. They play an important role in the well-being of running water ecosystems [32] and take part in the cycling of matter and the flow of energy [33]. Although A. pallipes have long been considered valid bioindicators of water quality [34], [35] and [36], they also inhabit moderately polluted waters [8], [9] and [37]. These factors led us to investigate the relationship between the environment and the presence/absence of A. pallipes.

In our research project, we used modeling, a tool considered more and more important for defining management and conservation policies. Ecosystems have highly complex nonlinear relationships among their input variables, and so researchers have been applying machine-learning methods to ecology in the last decade [38], [39], [40], [41], [42], [43], [44], [45] and [46]. One reason is that machine-learning techniques introduce fewer prior assumptions about the relationships among the variables and hence are better than traditional statistical analysis in many ways. There are many machine-learning techniques. However, decision trees [47], artificial neural networks [48], fuzzy logic [49], and Bayesian belief networks [50] are the techniques that seem to model habitat suitability the best [41] and [51].

Our research project evaluates the reliability of various current classification techniques in modeling A. pallipes presence/absence and ranks their performances. We used two types of approaches. Firstly, we used the multivariate-statistics approach, where we applied logistic regressions (LRs). Secondly, we used the machine-learning approach, where we applied decision trees (DTs) and artificial neural networks (ANNs). These techniques have been used at different rates: ANNs quite often since the mid-1990s [44], [45], [46], [48], [52], [53], [54], [55], [56], [57], [58], [59], [60], [61] and [62], DTs sporadically [41], [45] and [46], and LRs most frequently [56].

Conclusion

The most obvious finding of our research project is that A. pallipes is distributed across Piedmont in a heterogeneous and fragmented way (as it is in the Lazio Region in central Italy [16]). Over the last few decades, populations of A. pallipes declined considerably in Piedmont [83], as in all of Europe [12], [13], [18], [84] and [85]. We no longer observed crayfish in 77 previously inhabited watercourses, probably because of habitat fragmentation, engineering work, stream canalization, deforestation, and increased water pollution.

We have endeavored to determine the predictive model that performs the best because such a model can be used to manage and protect endangered species better. Not all the modeling procedures we tested performed well. LR models performed as they did in the tests of Manel et al. [56]. ANNs outperformed both LR models and DT models. LRs performed worse in relation to specificity than in relation to sensitivity, probably because there were more sites with crayfish presence than absence. DTs performed the worst of all. Cohen's kappa statistic showed that DT models yielded unreliable predictions, in that most of the classifications were based on chance, as in previous studies where DTs did not perform well in predicting macroinvertebrates [41]. In Dakou et al. [41], Cohen's kappa values were even lower than those in the present study. The unpruned DTs, with their many leaves, were too complex to yield any ecological interpretation. The J48 algorithm produced very detailed trees that kept the models from generalizing. Therefore we used post-pruning to reduce tree complexity and variance. Post-pruning did not make the models perform better, as in Tirelli et al. [46] and Tirelli and Pessani [45]. However, it did yield simpler trees that could be interpreted ecologically [41], [45] and [46].

Learning in ANNs is sensitive to the input data used. When researchers choose the appropriate features through pre-processing, their models perform considerably better in ecological contexts [46]. When there is no variable selection in ANNs, irrelevant information passes through the nodes, influences the connection weights slightly, and affects the overall performance of ANNs. On the other hand, variable selection decreases ANN size, reduces computational costs, increases speed, and uses less data to estimate connection weights efficiently. Feature selection eliminates all but the most relevant attributes, reduces the number of input variables, and helps models predict better [46], [74] and [86]. In general, predictions are more accurate when the proportion of presences is around 50% [87]. This is obviously a problem, especially when modeling rare species. It is especially important to predict presences correctly and to have accurate models when we need to predict the presence of scarce species. Such accuracy helps conserve and manage the species by identifying the potential protected areas. With this in mind, the ANN approach is valuable for modeling A. pallipes presence.

4.1. Physical-chemical variables

One finding of our research project that seconds earlier research is that the organic matter dissolved in the water is a crucial factor for explaining the white-clawed crayfish distribution (all models use BOD5) [8], [9], [88] and [89]. Broquet et al. [15] and Trouilhé et al. [8] underlined that organic matter is one of the most important features of brooks with native crayfish. Plant residues and organic detritus are of great importance for the crayfish diet. In fact, they are the most important sources of energy and food available in freshwater ecosystems [31] and [90]. In our project, the BOD5 index was used to measure the organic matter that can be biologically degraded by bacteria [9] and [15].
In addition, we built models using several other physical-chemical variables that have already been reported to be important for A. pallipes distribution [7], [8] and [9]: the pH, the concentration of Ca²⁺, the concentration of NO₃⁻, the percentage of dissolved oxygen in water, and the level of conductivity. Ca²⁺ is especially important for determining the occurrence of crayfish because it is essential for exoskeleton calcification. NH₄⁺ and PO₄³⁻ and the pollution they cause do not affect A. pallipes presence. In fact, these ions are often found in streams inhabited by A. pallipes [8], [9], [15], [37], [89], [91] and [92]. Mean value and standard deviation of the physical-chemical variables characterizing sites inhabited by this species are reported in Favaro et al. [9].

4.2. Environmental and climate variables

Another finding that our research project supports is that A. pallipes need to avoid potential predators, extreme temperature ranges, and extreme changes in the flow of water. Thus the environmental features that can help explain their distribution are the ones that play a role in their avoiding these circumstances: (1) shade due to canopy cover and bedrock used as shelter from potential predators; (2) temperature variations (a minimum temperature during cold seasons) and temperature variations due to altitude (a good integrator of the thermal conditions); (3) scarcity or flooding of flowing water (precipitation during the wettest period and water velocity). The availability of shelters and burrows in a stream, critical for the survival of adults, is the most important resource bottleneck in crayfish populations [7] and [93]. This association of canopy cover with A. pallipes presence has been supported by Smith et al. [4], Naura and Robinson [5], and Broquet et al. [15], but not by Barbaresi et al. [7]. Mean value and standard deviation of the environmental and climate variables characterizing the elective habitat of A. pallipes in Piedmont are reported in Table 5.

In conclusion, A. pallipes is facing an unprecedented crisis [11] and [85]. Therefore it is imperative that researchers choose the best way to take on this crisis by understanding the relationships between endangered species and their habitats more deeply. With this in mind, they can better plan conservation and management strategies. Our advice is this: researchers must first use various techniques and then contrast their performances. Our own results illustrate the advantages of contrasting various approaches. In fact, our method enabled us to predict white-clawed crayfish presence in Piedmont with reasonable accuracy. It helped us choose the best model for managing A. pallipes. Had we used fewer approaches, we would have come up with a poorer model. Our research project has underlined the synergistic effects of several biotic and abiotic factors on the occurrence of A. pallipes in an effort to provide information for the maintenance of natural populations and the selection of sites and streams where reintroduction strategies may be planned. We conclude with the suggestion that researchers use and contrast various techniques, as we did, in their research in other areas.
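
The model comparison above rests on sensitivity, specificity and Cohen's kappa. As a rough illustration only, the sketch below computes these three measures from a 2x2 confusion matrix for a presence/absence classifier. The function name `classification_metrics` and the site counts are hypothetical and are not taken from the paper; the sketch also assumes that both presence and absence sites occur in the data and that chance agreement is below 1.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, Cohen's kappa) for a 2x2 confusion matrix.

    tp: presence sites predicted as presence
    fn: presence sites predicted as absence
    fp: absence sites predicted as presence
    tn: absence sites predicted as absence
    """
    n = tp + fn + fp + tn
    # Sensitivity: share of observed presences that the model predicts correctly.
    sensitivity = tp / (tp + fn)
    # Specificity: share of observed absences that the model predicts correctly.
    specificity = tn / (tn + fp)
    # Observed agreement between predictions and observations.
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    # Kappa rescales observed agreement so that 0 means chance-level agreement.
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts for 100 sites (illustrative, not the study's data).
sens, spec, kappa = classification_metrics(tp=40, fn=10, fp=15, tn=35)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} kappa={kappa:.2f}")
```

A kappa near 0 corresponds to the situation described for the decision trees, where most classifications could be explained by chance, while a gap between sensitivity and specificity is the pattern reported for logistic regression.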