Discovering interobserver variability in breast cancer using decision trees and Bayesian networks
Article code | Publication year | Pages in the English article |
---|---|---|
28797 | 2009 | 12 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Applied Soft Computing, Volume 9, Issue 4, September 2009, Pages 1331–1342
English abstract
We evaluate the performance of two decision tree procedures and four Bayesian network classifiers as potential decision support systems in the cytodiagnosis of breast cancer. In order to test their performance thoroughly, we use two real-world databases containing 692 cases and 322 cases collected by a single observer and 19 observers, respectively. The results show that, in general, there are considerable differences in all tests (accuracy, sensitivity, specificity, PV+, PV− and ROC) when a specific classifier uses the single-observer dataset compared with when the same classifier uses the multiple-observer dataset. These results suggest that different observers see different things: a problem known as interobserver variability. We graphically unveil this problem by presenting the structures of the decision trees and Bayesian networks that result from both databases.
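The tests named in the abstract (accuracy, sensitivity, specificity, PV+ and PV−) all follow from the four cells of a binary confusion matrix. The sketch below is only an illustration of those standard definitions, not the authors' evaluation code: the `ConfusionCounts` type, the `diagnostic_metrics` function and the example counts are assumptions introduced here, with "positive" taken to mean malignant. ROC analysis is left out because it needs classifier scores rather than counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # malignant cases classified as malignant
    fp: int  # benign cases classified as malignant
    tn: int  # benign cases classified as benign
    fn: int  # malignant cases classified as benign


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a denominator is empty.
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Standard definitions of the five count-based tests."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "pv_plus": _ratio(c.tp, c.tp + c.fp),      # positive predictive value
        "pv_minus": _ratio(c.tn, c.tn + c.fn),     # negative predictive value
    }


# Made-up counts for illustration only; they are NOT results from the paper.
example = ConfusionCounts(tp=40, fp=5, tn=50, fn=5)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Computing the same dictionary for a classifier trained on each of the two datasets would give the kind of side-by-side comparison the abstract describes.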
English introduction
One of the most common types of cancer that affects women in North America, Europe and the Antipodes is breast cancer [1]. The usual practice in the United Kingdom to accurately diagnose this disease is to use three different methods: the surgeon's clinical opinion, mammography and cytological studies. That is why this diagnosis process is known as the triple approach [1]. Because mammographic studies are not decisive in the diagnosis of breast cancer [1] and [2], an alternative confirmatory method is needed to support or reject these findings. The most common confirmatory method used in the United Kingdom for this purpose is fine needle aspiration of breast lesions (FNAB) [1], [3] and [4]. This technique involves a process in which a syringe sucks cells from breast lesions using a fine bore needle similar to those used for blood samples. Once this is done, the cells are transferred to a transport solution and sent to the pathology laboratory for a microscopic study carried out by a trained cytopathologist [1]. In this paper, we focus on the discovery of inconsistencies in the interpretation of a sample by building classifiers, based on decision trees and Bayesian networks, from a cytopathological database retrospectively collected by a single observer and a database prospectively collected by 19 observers. It normally takes a medical doctor at least 5 years to become an independent practicing cytopathologist in the UK [1]. This fact gives an indication of the very complex learning process through which medical doctors have to pass.
It is mainly for this reason that machine learning methods for decision-making may have two potential applications: (a) to accelerate the training process by providing guidance to the trainee on which features are the most important ones to look at; and (b) to compare the final results to those of the trainee or even those of the expert so that the decision (whether the sample taken indicates a benign or a malignant outcome) can be made on more robust criteria. That is why we need to test potential decision support systems in this area to the extreme: we need to check whether it is possible to carry out an objective diagnosis rather than a subjective one; i.e., whether the interpretation of a sample by different pathologists is consistent. To this end, we use 692 consecutive adequate specimens collected by a single pathologist and 322 consecutive adequate specimens collected by 19 pathologists to train and test two decision tree procedures and four Bayesian network classifiers. The results show the presence of a difficulty known as the interobserver variability problem [5], [6] and [7], which prevents the classifiers from generalizing well enough. The interobserver variability problem refers to the situation where the results given by different experts in a given field on the same set of cases are not reproducible. That is to say, in the specific case of the cytodiagnosis of breast cancer, pathologists do not agree on which cytological features are the most relevant for correctly diagnosing whether a patient has breast cancer or not.
Although the inconsistencies in the two datasets prevent the classifiers presented here from performing robustly, it can be argued that, with the aid of these kinds of tools (decision trees and Bayesian networks), the main underlying principles, conditions, mechanisms and causes that lead to this problem could be gradually discovered, since these tools graphically show the importance of each variable for the diagnosis of breast cancer (in the case of decision trees) as well as the probabilistic relations among these variables (in the case of Bayesian networks). It is important to mention here that some other works [1], [4] and [8] have applied different classification methods, such as logistic regression, multilayer perceptron neural networks (MLPs) and adaptive resonance theory mapping neural networks (ARTMAPs), to analyze these same datasets. However, these models (as well as others like symbolic rules, k-nearest neighbors and support vector machines) do not have the power to graphically unveil the interobserver variability problem that is central to this investigation. By this we do not mean that such models lack the power to identify this problem; we just mean that they originally lack the graphical power to unveil it, since they are not graphical representations. We are not saying either that this graphical feature cannot be added to such models; we are saying that the traditional (original) formulation of these models does not consider this graphical feature [9], [10], [11], [12], [13], [14] and [15]. For example, symbolic rules might be converted to decision trees, but this step is not originally included in such an approach. It can also be argued that the graphical representations presented in this research provide a natural and intuitive framework to model interactions among variables.
The terms natural and intuitive suggest that these graphical representations are, under certain conditions, easier to understand than other kinds of representations. A number of researchers from a wide range of scientific disciplines (cognitive psychology, developmental psychology, linguistics, anthropology and computer science) have given evidence that supports such a claim: Gattis [16], Liben [17], Tversky [18], Emmorey [19], Bryant and Squire [20], McGonigle and Chalmers [21], Hummel and Holyoak [22] and Larkin and Simon [23]. Some of them claim that these representations aid cognition because they are structured in such a familiar way that people can rely on them to structure memory, communication and reasoning [16]. Gattis [16] also points out that spatial representations are not merely metaphors that help understand cognitive processes but actual internal mechanisms that allow us to perform more abstract cognitive tasks. Larkin and Simon argue that a diagram can be superior to a verbal description because, when well used, the former “automatically supports a large number of perceptual inferences, which are extremely easy for humans” [16, p. 107]. Graphical representations are useful in reasoning tasks because, through their structure (which can represent order, directionality and relations) and the partial knowledge about their elements and the relations among them, it is possible to infer the values of the elements and their relationships that are unknown [16]. In a similar vein, Larkin and Simon [23] also claim that these representations have the power to group together all the information that is used together, which avoids the problem of searching large amounts of data to find the elements needed for performing inference. As Tversky [18] points out, graphical representations can be used to reveal hidden knowledge, providing models that facilitate inference and discovery.
It is very important to remark that, in the words of Tversky, “long before there was written language, there were depictions, of myriad varieties” [11, p. 80]. In sum, graphical representations can represent abstract concepts and information in such a way that this information can be accessed and integrated quickly and easily. They also facilitate group communication [18]. It is also important to justify why these two specific models (decision trees and Bayesian networks) can be regarded as Soft Computing (SC) methods: it is mainly because such models can learn from experimental data and have the power to suitably represent human knowledge [24], [25], [26] and [27]; these are the tasks on which the core SC techniques (support vector machines, artificial neural networks and fuzzy logic models) mainly focus their attention. According to Kecman [24], there are additional techniques that can be considered as extensions of SC: evolutionary algorithms, probabilistic reasoning, fractals and chaos theories and belief networks (also known as Bayesian networks). Kecman also points out that IF–THEN rules can express almost all structured human knowledge: decision trees can be easily converted into this kind of rule [13], [15] and [27]. Finally, classification lies within the range of problems solved by SC methods: it is one of the central topics in the present work, as is the transfer of human knowledge into analytical models [24].
The remainder of this paper is organized as follows. In Section 2 we give the rationale for decision trees and Bayesian networks in order to better understand how these models work. In Section 3 we present the materials and methods used for this research. In Section 4 we present the performance of two algorithms that build decision trees and four algorithms that construct Bayesian network structures from a database collected by a single observer and a database collected by multiple observers. In Section 5 we discuss the results given by these procedures and finally, in Section 6, we present some conclusions and propose some sensible directions for future research.
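The introduction's remark that decision trees can be easily converted into IF–THEN rules can be illustrated with a small sketch: each root-to-leaf path becomes one rule whose antecedent is the conjunction of the tests along that path. The `Leaf` and `Node` types, the `tree_to_rules` function, the placeholder attribute names `feature_a` and `feature_b` and the toy tree are all assumptions made for illustration; they are not the trees learned in the paper, nor Weka's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: str  # predicted class, e.g. "benign" or "malignant"


@dataclass
class Node:
    feature: str      # name of a binary (yes/no) attribute
    if_true: "Tree"   # subtree followed when the attribute is present
    if_false: "Tree"  # subtree followed when the attribute is absent


Tree = Union[Leaf, Node]


def tree_to_rules(tree: Tree, conditions: Tuple[str, ...] = ()) -> List[str]:
    """Emit one IF-THEN rule per root-to-leaf path."""
    if isinstance(tree, Leaf):
        antecedent = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {antecedent} THEN {tree.label}"]
    yes_branch = tree_to_rules(tree.if_true, conditions + (f"{tree.feature} = yes",))
    no_branch = tree_to_rules(tree.if_false, conditions + (f"{tree.feature} = no",))
    return yes_branch + no_branch


# Hypothetical toy tree with placeholder attribute names.
toy_tree = Node(
    "feature_a",
    Node("feature_b", Leaf("malignant"), Leaf("benign")),
    Leaf("benign"),
)
rules = tree_to_rules(toy_tree)
for rule in rules:
    print(rule)
```

Running the same conversion on trees learned from two different datasets would make structural disagreements visible as differing rule sets.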
English conclusion
We have presented a study to discover interobserver variability in the cytodiagnosis of breast cancer using decision trees and Bayesian networks. The results show that it seems possible to accurately build automatic classifiers from data if these data come from a single observer. However, the results suggest that such classifiers should not be built from data that come from different observers. At first sight, the experiments presented here may look like a simple application of software, but this is not so: decision trees and Bayesian networks are graphical representations of the phenomenon (breast cancer) under study. This implies that, although there are several other models for representing this phenomenon (artificial neural networks, regression, k-nearest neighbors, support vector machines, among others), such graphical representations provide medical doctors with a new and much easier way to identify the possible causes of interobserver variability than the non-graphical ones. For instance, investigations by various researchers [5], [6], [7] and [46] show this interobserver variability problem only in numerical terms (mainly correlation and averages); i.e., they just show that the problem is present but do not explicitly explain how changes in the interactions among variables may be causing such a problem, an explanation which, in turn, might be very useful. Both the decision tree and the Bayesian network frameworks make it possible to visually identify the so-called interobserver variability problem: the interpretation of a sample may vary from one pathologist to another if they are forced to codify the variables one by one (in contrast to globally interpreting the image); a situation that is reflected in different decision trees and Bayesian network structures.
It is important to recall that the main goal of this paper is to show the interobserver variability problem in the cytodiagnosis of breast cancer using decision trees and Bayesian networks, not to propose new algorithms and study their properties. That is why we used Weka [15], which is a versatile tool that includes many powerful and well-known methods for building classifiers based on those models, some of which are used here. Finally, we need to stress that the applied nature of this research may be useful for a wide range of specialists. As future work, we can comment on the following points. First, as said above, the main goal of this paper is to show the interobserver variability problem in the cytodiagnosis of breast cancer. However, through the experiments carried out here, we noticed the variety of performances by the classifiers presented in this research. Thus, a worthwhile exploration would be to measure whether the differences in their respective performances are significant and which of these classifiers seem to be more robust to the interobserver variability problem. As an extension of such an exploration, we may combine the predictions of these classifiers in order to come up with an improved classifier: this can be achieved using the well-known meta-algorithm called AdaBoost [47]. Finally, as mentioned in Section 3.1, we might try codifications of the cytological attributes and of the age variable other than binary for the former and three-valued for the latter. This could be achieved by using different (automatic) discretization procedures combined with a sensitivity analysis.