Degrees of conditional (in)dependence: A framework for approximate Bayesian networks and examples related to the rough set-based feature selection
|Article code||Publication year||English article||Persian translation||Word count|
|28758||2009||13-page PDF||available on order||9824 words|
Publisher : Elsevier - Science Direct
Journal : Information Sciences, Volume 179, Issue 3, 16 January 2009, Pages 197–209
Bayesian networks provide the means for representing probabilistic conditional independence. Conditional independence is also widely considered beyond probability theory, with linkages to, e.g., database multi-valued dependencies and, at a higher abstraction level, to semi-graphoid models. The rough set framework for data analysis relates to conditional independence via the notion of a decision reduct, considered within the wider domain of feature selection. Given the probabilistic version of decision reducts, equivalent to data-based Markov boundaries, studies were also conducted for other criteria of rough-set-based feature selection, e.g. those corresponding to multi-valued dependencies. In this paper, we investigate degrees of approximate conditional dependence, a topic corresponding to well-known notions such as conditional mutual information and polymatroid functions, yet with many practically useful approximate conditional independence models that remain unmanageable within the information-theoretic framework. The paper's major contribution lies in extending the means for understanding degrees of approximate conditional dependence: appropriately generalized semi-graphoid properties are formulated, and the mathematical soundness of the Bayesian-network-like representation of approximate conditional independence statements is thoroughly proved. As an additional contribution, we provide a case study of an approximate conditional independence model that would not be manageable without the above-mentioned extensions.
Conditional independence (CI) provides us, in its original and still most widely researched probabilistic version, with the means for expressing relationships among random variables. In machine learning, for example, the variables are interpreted as attributes (columns, features) in data tables, and the joint probability distributions are estimated by looking at the combinations of attribute values observed in the data records. In such cases, probabilistic CI is often referred to as statistical CI. Many researchers find probabilistic CI useful when interpreting the tasks of feature selection, which may be regarded as searching for attributes that provide (almost) the same information about some specified targets as the set of all available attributes. For example, given the widely studied relationships between rough sets and probability, the concepts related to probabilistic CI were considered within the framework of probability-based attribute reduction. There are many ways of representing probabilistic CI. The most popular one refers to Bayesian networks (BNs) – directed acyclic graphs (DAGs) enriched with conditional probability distributions labeling their nodes. BNs can be constructed based on expert knowledge or on the probabilistic CI statements discovered during data-based learning processes. Each BN is supposed to be an independence mapping (IM), which graphically encodes knowledge about the probabilistic CI statements by means of so-called d-separation. The corresponding DAG structure serves as a probabilistic CI knowledge base. This way, BNs can also be used as a means of knowledge visualization, which is important for interaction between experts and decision support systems. For instance, researchers in bioinformatics often represent knowledge derivable from gene expression data using DAGs spanned over gene-related attributes.
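As a toy illustration of how a DAG encodes probabilistic CI via d-separation, consider a three-node chain (a hypothetical sketch, not an example from the paper): in the network A → B → C, d-separation asserts that C is independent of A given B, which can be verified directly on the factorized joint distribution.

```python
from itertools import product

# A minimal sketch (hand-picked, hypothetical probability tables): a chain BN
# A -> B -> C. d-separation in this DAG implies the probabilistic CI statement
# "C is independent of A given B", i.e. P(c | a, b) must not depend on a.

p_a = {0: 0.6, 1: 0.4}                          # P(A)
p_b_given_a = {0: {0: 0.7, 1: 0.3},             # P(B | A)
               1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1},             # P(C | B)
               1: {0: 0.4, 1: 0.6}}

# Joint distribution factorized along the DAG: P(a,b,c) = P(a) P(b|a) P(c|b).
joint = {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
         for a, b, c in product([0, 1], repeat=3)}

def p_c_given_ab(a, b, c):
    """Conditional probability P(C=c | A=a, B=b) computed from the joint."""
    return joint[(a, b, c)] / sum(joint[(a, b, cc)] for cc in [0, 1])

# Verify the CI statement encoded by d-separation: P(C|A,B) == P(C|B).
for b, c in product([0, 1], repeat=2):
    assert abs(p_c_given_ab(0, b, c) - p_c_given_ab(1, b, c)) < 1e-12
print("CI statement 'C independent of A given B' holds in the chain network")
```

Any distribution that factorizes along the chain satisfies this CI statement, which is exactly the sense in which the DAG structure serves as a CI knowledge base.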
BNs and their extensions are also widely applicable to areas such as new case classification, databases, information retrieval, and data compression, where their ability to represent (in)dependencies among sets of attributes plays the key role. Having in mind real-life problems concerning various types of data and related reasoning strategies, one can ask whether the probabilistic model of CI is the only one. This question also matters in the above-mentioned domain of feature selection, where information about (in)dependencies between attributes need not be expressible in terms of probabilities. It has been stated that the reason BNs are able to encode knowledge about probabilistic CI lies in its so-called semi-graphoid properties. Such properties hold also for some other interpretations of CI. Consequently, BNs can be reconsidered for non-probabilistic approaches too. For instance, one of the known alternative CI models is based on only a part of the probabilistic information, namely, whether the value-vectors have zero or non-zero probability. Such an occurrence-based CI model was analyzed as corresponding to the rough-set-based attribute reduction framework relying on so-called generalized decisions. On the other hand, it corresponds to the database-related framework of embedded multi-valued dependencies (EMVDs). Hence, there is a direct linkage between the rough set approach to feature selection and the principles of modeling dependencies in databases. Some investigation has also been conducted into extending the existing approaches towards approximate CI, which would better fit real-life data. It has been repeatedly noted in the literature that the requirement of precise equality between probability distributions when defining probabilistic CI is impractical or even self-contradictory, given that probability theory is supposed to deal with imprecision.
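For reference, the semi-graphoid properties mentioned above are standard. Writing I(X;Y|Z) for "X is independent of Y given Z", with X, Y, Z, W pairwise disjoint attribute sets, they read:

```latex
% Semi-graphoid axioms (standard formulation)
\begin{align*}
\text{(symmetry)}      &\quad I(X;Y\mid Z) \;\Rightarrow\; I(Y;X\mid Z) \\
\text{(decomposition)} &\quad I(X;Y\cup W\mid Z) \;\Rightarrow\; I(X;Y\mid Z) \\
\text{(weak union)}    &\quad I(X;Y\cup W\mid Z) \;\Rightarrow\; I(X;Y\mid Z\cup W) \\
\text{(contraction)}   &\quad I(X;Y\mid Z\cup W) \,\wedge\, I(X;W\mid Z) \;\Rightarrow\; I(X;Y\cup W\mid Z)
\end{align*}
```

Any CI interpretation satisfying these four axioms, probabilistic or not, admits the same style of DAG-based representation via d-separation.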
The notion of approximate CI may also have an impact on knowledge discovery, where the most interesting patterns or dependencies usually turn out to hold in data only to some reasonably high degree. For example, within the rough set framework for feature selection, the following three principles of approximate attribute reduction were considered: (1) it is worth reducing irrelevant attributes and simplifying the corresponding decision system; (2) reduction (simplification) should not decrease the overall system's ability to approximate the target concepts; (3) in real-world situations, however, we should agree to slightly decrease the system's quality if it leads to significantly simpler underlying dependencies. In other words, given the previously mentioned correspondence between CI and feature selection, one may consider decision systems based on CI statements that are simpler but only approximately satisfied in data. Analogous ideas, referable to Occam's razor and the minimum-description-length principle, were considered in other areas related to CI. As an example, most algorithms extracting BNs from data focus only on those edges that provide a significant amount of information about inter-variable correlations. The resulting DAGs may represent probabilistic CI statements that are only roughly true, which is often the only solution because real-life data contain no exact probabilistic CI statements. However, before our later-discussed publications, there was no theoretical background for analyzing whether, and to what degree, the probabilistic CI statements represented by DAGs pre-learnt from data in such an inexact fashion are actually valid against the same data.
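The three principles above can be sketched on a toy decision table (hypothetical data and tolerance, not the paper's algorithm): greedily drop condition attributes as long as the rough-set dependency degree gamma(B -> d), i.e. the fraction of records whose B-values uniquely determine the decision, stays within an agreed tolerance of the full-attribute quality.

```python
from collections import defaultdict

# A minimal sketch (hypothetical data and tolerance): greedy approximate
# attribute reduction driven by the rough-set positive-region quality measure.

rows = [  # (a1, a2, a3, decision)
    (0, 0, 0, "no"), (0, 1, 0, "no"), (1, 0, 1, "yes"),
    (1, 1, 1, "yes"), (0, 0, 1, "no"), (1, 1, 0, "yes"),
]

def gamma(attrs):
    """Fraction of rows in the positive region: rows whose values on `attrs`
    occur with exactly one decision value."""
    decisions = defaultdict(set)
    for r in rows:
        decisions[tuple(r[i] for i in attrs)].add(r[3])
    consistent = sum(1 for r in rows
                     if len(decisions[tuple(r[i] for i in attrs)]) == 1)
    return consistent / len(rows)

TOL = 0.1                     # hypothetical acceptable quality drop
full = gamma((0, 1, 2))       # quality with all condition attributes
reduct = [0, 1, 2]
for i in (2, 1, 0):           # try dropping attributes one by one
    candidate = [j for j in reduct if j != i]
    if full - gamma(tuple(candidate)) <= TOL:
        reduct = candidate    # drop attribute i: quality loss is tolerable
print("approximate reduct:", [f"a{i+1}" for i in reduct])
```

In this toy table the decision is fully determined by a1 alone, so the greedy pass drops a3 and a2 without any quality loss and keeps only a1.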
In other words, although there were some previous attempts to formalize the notion of consistency of DAGs with respect to the data, there was no analogous attempt to find a correspondence between the data-related consistency of DAGs and the data-related degrees of satisfaction of the CI statements derivable from those DAGs using d-separation.
English Conclusion
We showed that the polymatroid-based framework for approximate conditional independence (CI) and approximate Bayesian networks (BNs) is not flexible enough. We proposed more general postulates for the degrees of approximate conditional dependence, based on functions M(·;·|·): P(A)×P(A)×P(A) → [0,+∞), where P(A) denotes the family of subsets of a set of attributes A (Definition 2). We accordingly extended one of the major theoretical results related to the representation of CI in BNs in order to properly handle the M(·;·|·)-based approximate CI statements (Theorem 3). We attempted to re-emphasize the correspondences between the rough set framework for feature selection and the basic notions of approximate CI. In particular, in order to show why the above-mentioned extension of the postulates for the degrees of approximate conditional dependence is truly necessary, we considered in Section 5 a relatively simple case of modeling the embedded multi-valued dependencies within the theory of rough sets, over numeric data sets multiply discretized using their consecutive records. We believe that the paper contributes to the foundations of knowledge bases gathering the most meaningful approximate attribute (in)dependencies. We showed that approximate attribute (in)dependence models require various levels of representation complexity, described by means of functions defined over P(A), P(A)×P(A), or even P(A)×P(A)×P(A). Further research is certainly necessary with regard to such knowledge bases and models, leading, for example, towards larger ensembles of approximate BNs representing the data-based approximate CI statements in a more complete fashion.
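As one concrete, hypothetical instance of such a function (not the construction studied in the paper), M(X;Y|Z) over subsets of A can be realized as the plug-in conditional mutual information between attribute-set projections of a data table; it maps P(A)×P(A)×P(A) into [0,+∞) and equals zero exactly when the corresponding CI statement holds in the empirical distribution.

```python
from collections import Counter
from math import log2

# A hypothetical sketch: a degree-of-dependence function M(X ; Y | Z) over
# subsets of the attribute set A, realized as plug-in conditional mutual
# information computed from a toy data table.

table = [  # rows over attributes A = {"a", "b", "c"}
    {"a": 0, "b": 0, "c": 0}, {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 0}, {"a": 1, "b": 1, "c": 1},
]

def proj(row, attrs):
    """Project a row onto an attribute subset (in a fixed attribute order)."""
    return tuple(row[a] for a in sorted(attrs))

def M(X, Y, Z):
    """Empirical conditional mutual information I(X ; Y | Z) >= 0;
    the value 0 means the CI statement holds exactly in the data."""
    n = len(table)
    xyz = Counter((proj(r, X), proj(r, Y), proj(r, Z)) for r in table)
    xz = Counter((proj(r, X), proj(r, Z)) for r in table)
    yz = Counter((proj(r, Y), proj(r, Z)) for r in table)
    z = Counter(proj(r, Z) for r in table)
    # I(X;Y|Z) = sum p(x,y,z) log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ]
    return sum((c / n) * log2(z[zz] * c / (xz[(xx, zz)] * yz[(yy, zz)]))
               for (xx, yy, zz), c in xyz.items())

# In this toy table c duplicates b, so given b the attribute a carries no
# further information about c: the degree of dependence should be 0.
print(M({"c"}, {"a"}, {"b"}))
```

Such an information-theoretic M is only one point in the design space; the paper's point is precisely that practically useful approximate CI models exist which no information-theoretic (polymatroid) function can capture.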