Dealing with complex queries in decision-support systems
Article code | Publication year | Pages in English article |
---|---|---|
5513 | 2011 | 15 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Data & Knowledge Engineering, Volume 70, Issue 2, February 2011, Pages 167–181
Abstract
In decision-making problems under uncertainty, a decision table consists of a set of attributes indicating what is the optimal decision (response) within the different scenarios defined by the attributes. We recently introduced a method to give explanations of these responses. In this paper, the method is extended. To do this, it is combined with a query system to answer expert questions about the preferred action for a given instantiation of decision table attributes. The main difficulty is to accurately answer queries associated with incomplete instantiations. Incomplete instantiations are the result of the evaluation of a partial model outputting decision tables that only include a subset of the whole problem, leading to uncertain responses. Our proposal establishes an automatic and interactive dialogue between the decision-support system and the expert to elicit information from the expert to reduce uncertainty. Typically, the process involves learning a Bayesian network structure from a relevant part of the decision table and computing some interesting conditional probabilities that are revised accordingly.
Introduction
1.1. Decision tables, explanations and queries
Under uncertainty, a modern and useful decision-theoretic model is the influence diagram [17]. It consists of an acyclic directed graph with associated probabilities and utilities, respectively modeling the uncertainties and preferences tied in with the stated problem. Nowadays this probabilistic graphical model is frequently adopted as a basis for constructing decision-support systems (DSSs). The results of evaluating an influence diagram are decision tables containing the optimal decision alternatives, policies or responses. Thus, for every decision, there is an associated decision table with the best alternative, i.e. the alternative with the maximum expected utility for every combination of relevant variables (usually called attributes within this context) that are observable before the decision is made. The evaluation algorithm determines which of the observable variables are relevant. These variables are outcomes of random variables and/or other past decisions. A decision table may have millions of rows and typically more than twenty columns leading to enormous data sets for storage and analysis.
Expert DSS users demand such an analysis on mainly two grounds. First, DSS decision tables provide the best decision-making recommendations. However, experts may find such recommendations hard to accept if they come without any explanation whatsoever of why the proposed decisions are optimal. Unexplained responses are not good enough for expert users since DSSs operate on a model that is an approximation of the real world. The importance of explanations has been reported in the literature, see e.g. [9], [12] and [13]. Thus, for example, in health-care problems, usually involving difficult trade-offs between the treatment benefits and risks, practitioners may use decision tables to determine the best patient treatment recommendations. For this purpose, they need to understand the underlying reasons or implicit rules.
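As a minimal sketch of the idea that a decision table stores, for every attribute combination, the alternative with the maximum expected utility, consider the following Python fragment. The attribute values, the alternatives `act` and `skip`, and all utility numbers are invented for illustration; they do not come from any model discussed here.

```python
# Hypothetical expected utilities: EU[case][alternative].
# A "case" is one combination of observable attribute values.
# All names and numbers are made up for illustration.
EU = {
    ("low", "yes"): {"act": 0.70, "skip": 0.55},
    ("low", "no"): {"act": 0.40, "skip": 0.65},
    ("high", "yes"): {"act": 0.80, "skip": 0.30},
    ("high", "no"): {"act": 0.60, "skip": 0.50},
}


def decision_table(eu):
    """Map each case to the alternative with maximum expected utility."""
    return {case: max(alts, key=alts.get) for case, alts in eu.items()}


table = decision_table(EU)
for case, best in table.items():
    print(case, "->", best)
```

With millions of cases, the resulting table is too large to inspect row by row, which is what motivates the analysis discussed in the text.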
In medical DSSs, clinical practice guidelines assemble the relevant knowledge gathered through literature review, meta-analysis, expert consensus, etc., and operationalize this information as informal text documents. This makes the gathered information difficult to interpret automatically and the decision-making process hard to guide. Shiffman and Greenes [19] propose translating guideline knowledge into decision table-based rule sets. Shiffman [18] proposes augmenting decision tables by layers, storing collateral information in slots at various levels beneath the logic layer of the conventional decision table. Information relates to table cells, rows and columns. It may include how tests are performed, the benefits/risks of the recommended strategies, costs, literature citations, etc., to help understand the domain. All these decision tables are different from ours. Our knowledge base is the model (influence diagram) and its evaluation, stored in the decision tables. The model (graph with probabilistic dependencies and probability and utility information) is built from clinical practice guidelines, data and expert input. Also, there is no uncertainty in clinical guidelines. Influence diagrams are based on subjective probabilities and utilities, and support learning and reasoning with uncertainty and preferences.
In [6] we introduced KBM2L lists to find explanations. The main idea stems from how computers manage multidimensional matrices: computer memory stores and manages these matrices as linear arrays, and each position is a function of the order chosen for the matrix dimensions. KBM2L lists are new list-based structures that optimize this order by putting equal responses in consecutive positions, yielding the target explanations and simultaneously achieving compact storage. These lists implicitly include the probability and utility models, are simple, and have no added complex layers.
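The effect of the dimension order on compactness can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the KBM2L implementation of [6]: the attributes `a`, `b`, `c`, their domains and the `response` rule are hypothetical, and representing each run of equal responses by its last offset is a choice made here for the sketch.

```python
from itertools import product

# Hypothetical attribute domains for a toy table.
DOMAINS = {"a": [0, 1], "b": [0, 1, 2], "c": [0, 1]}


def response(case):
    # Made-up rule standing in for the optimal alternative of a case;
    # it depends only on attribute "b".
    return "treat" if case["b"] >= 1 else "wait"


def offset(case, order):
    """Position of a case in the linear array for a given attribute order
    (row-major: the last attribute in `order` varies fastest)."""
    pos = 0
    for name in order:
        dom = DOMAINS[name]
        pos = pos * len(dom) + dom.index(case[name])
    return pos


def items(order):
    """Linearise the table under `order` and merge consecutive equal
    responses into (last_offset, response) pairs."""
    cells = []
    for values in product(*(DOMAINS[n] for n in order)):
        case = dict(zip(order, values))
        cells.append((offset(case, order), response(case)))
    cells.sort()
    merged = []
    for pos, resp in cells:
        if merged and merged[-1][1] == resp:
            merged[-1] = (pos, resp)  # extend the current run
        else:
            merged.append((pos, resp))  # start a new run
    return merged


print(len(items(("a", "b", "c"))), "items with order a, b, c")
print(len(items(("b", "a", "c"))), "items with order b, a, c")
```

In this toy table, placing the attribute that drives the response first groups equal responses together, so the same twelve cases collapse into fewer runs. The attributes whose values stay fixed within a run are what an explanation of that run would draw on.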
Not only do expert users employ decision tables as a knowledge base (KB) for explanations; they also query the DSS about which is the best recommendation for a given set of attributes in different ways. This is the second reason for decision table analysis. In a typical session, experts interact with DSSs to: (A) formulate a query in the KB domain; (B) translate the query into the KB formalism; (C) implement the response retrieval; (D) build the response efficiently; (E) communicate the response(s) and/or suggest improvements, and wait for user feedback. For (A) and (B), we distinguish between two groups of queries (closed/open) depending on whether or not the whole set of attributes is instantiated. A closed query is a specific and well-defined query entered by users that know all the attribute information. An open query is less specific, as it includes attribute values that are undefined either because they are hard or expensive to obtain or because they are unreliable. Martinez et al. [15] give a similar classification for GIS (geographical information systems), although they focus on efficient data updating and access from a physical point of view (merely as a database), rather than from a logical point of view (as a KB). (C) to (E) may be troublesome, especially for open queries, due to imprecise response retrieval failing to satisfy users. Additionally, the DSS may not include the whole decision table, because an exhaustive evaluation of the decision-making problem can be too costly. In this case there will be no response at all. Worse still, both situations could apply at the same time, demanding a methodology to undertake tasks (C)–(E) dealing with ambiguity and ignorance about the response.
1.2. Example: Optimal treatment of gastric non-Hodgkin lymphoma
Let us illustrate these ideas with the following clinical problem. It is a real health-care decision-making problem regarding the optimal treatment of non-Hodgkin lymphoma of the stomach.
Primary gastric non-Hodgkin lymphoma, gastric NHL for short, is a relatively rare disorder, accounting for about 5% of gastric tumors. This disorder is caused by a chronic infection by the Helicobacter pylori bacterium [5]. Treatment consists of a combination of antibiotics, chemotherapy, radiotherapy and surgery. A number of influence diagrams have been constructed and validated [14]. These models are only meant to be used for patients with histologically confirmed gastric NHL. We have taken the most complex version with three decision nodes. This influence diagram is shown in Fig. 1, and is briefly discussed in the following. The first of the decision nodes, helicobacter-treatment (ht), corresponds to the decision to prescribe antibiotics against H. pylori. The second decision concerns carrying out surgery (s). The possibilities are either curative surgery, involving the complete removal of the stomach and locoregional tumor mass; palliative surgery, i.e. partial removal of the stomach and tumor; or no surgery. The last decision, ct-rt-schedule (ctrts), is concerned with the selection of chemotherapy (Chemo), radiotherapy (Radio), chemotherapy followed by radiotherapy (Ch.Next.Rad), or none.
The influence diagram model consists of 17 chance nodes (ellipses), one value node (diamond), three decision nodes (rectangles) and 42 arcs. Nodes to the left of the decision nodes (see Fig. 1) concern pretreatment information. Nodes to the right of the decision nodes are posttreatment nodes. Variables with their associated domains are listed in Table 1. See [14] for further details on the model. Bielza et al. [1] detail the use of KBM2L lists to gain a better understanding of the treatment basis of the gastric NHL model.
The gastric NHL influence diagram evaluation outputs three decision tables, one for each decision variable, each containing the optimal treatment for each combination of attributes in the tables. Let us take the first decision table concerning the ht decision.
It contains four attributes (cs, bd, hc, and hp), and the expected utility of each treatment alternative ht = No/Yes. To illustrate likely user queries, suppose a user queries the DSS about patients with the following configurations:
Q0: HC=Low.Grade, HP=Present, CS=I and BD=Yes
OQ1: HP=Absent, CS=I and BD=Yes
OQ2: CS=II2.
We will look at all the discussed queries in this paper. In the first case, Q0, the query is closed since the four attributes are instantiated. The question is about a patient that has a good histological classification (hc = Low.Grade), a favorable prognosis (cs = I), the H. pylori bacterium (hp = Present), and a big tumor (bd = Yes). Unless this query corresponds precisely to an unsolved part of the problem, the response should be easy to retrieve. In the second case, OQ1, the query is open because the doctor has not yet performed a biopsy to ascertain the histological-classification (hc). This could perhaps be due to the high cost of the biopsy. In the third case, OQ2, the query is even more open, specifying only a medium clinical stage (cs = II2) for the patient. However, the user may be interested in finding out which treatment patients like these should receive. Responses are not expected to be easy to retrieve now. There are many possible alternatives, and users will be dissatisfied if different, and perhaps unknown, responses are retrieved. Therefore, strategies should be developed to assure user satisfaction. One possibility is table reordering to provide more precise answers. Another is sophisticated prediction procedures to infer the unknown responses from (somehow) close known responses or by having the user intervene at some steps to reduce response uncertainty.
1.3. Outline
In this paper, we propose a query system based on the KBM2L framework to deal with these complex situations. Unlike database management systems that operate with facts, DSSs must provide explanations besides efficiently retrieving the query response information [10]. Thus, our KBM2L framework provides not only an efficient and satisfactory query response retrieval but also an informed response explanation. It is not our aim to develop clinical practice guidelines, but to provide a DSS with a user interface capable of performing complex queries involving more than just accessing a clinical protocol database or document. The paper is organised as follows. Section 2 outlines the technique of KBM2L lists. Section 3 describes the query complexity and shows how to deal with a closed query. Section 4 tackles less specific and more complex open queries. The proposal combines decision tables that have been compacted using KBM2L lists with learning, information access and information retrieval processes. We give several examples applied to the non-Hodgkin lymphoma problem. Section 5 contains the conclusions and suggests further research.
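The difference between the closed query Q0 and the open queries OQ1 and OQ2 of Section 1.2 can be sketched with a naive lookup over a toy table. Everything below is an assumption made for illustration: the attribute domains are reduced to two values each, `toy_policy` is an invented rule and not the policy of the gastric NHL model, and `None` stands for a part of the table that has not been evaluated.

```python
from itertools import product

ATTRS = ("cs", "bd", "hc", "hp")
# Reduced, hypothetical domains (the model's real domains are in Table 1).
DOMAINS = {
    "cs": ["I", "II2"],
    "bd": ["No", "Yes"],
    "hc": ["Low.Grade", "High.Grade"],
    "hp": ["Absent", "Present"],
}


def toy_policy(case):
    # Invented rule for the ht decision; NOT the gastric NHL policy.
    if case["cs"] == "II2" and case["hc"] == "High.Grade":
        return None  # unknown: this part of the table was not evaluated
    return "Yes" if case["hp"] == "Present" else "No"


# Full toy decision table: one response per attribute combination.
TABLE = {
    values: toy_policy(dict(zip(ATTRS, values)))
    for values in product(*(DOMAINS[a] for a in ATTRS))
}


def query(**known):
    """Return (kind, responses) for a full or partial instantiation."""
    kind = "closed" if set(known) == set(ATTRS) else "open"
    responses = {
        resp
        for values, resp in TABLE.items()
        if all(values[ATTRS.index(a)] == v for a, v in known.items())
    }
    return kind, responses


print(query(hc="Low.Grade", hp="Present", cs="I", bd="Yes"))  # Q0
print(query(hp="Absent", cs="I", bd="Yes"))                   # OQ1
print(query(cs="II2"))                                        # OQ2
```

In this toy setting the closed query matches a single row, while the most open query matches rows with different and unknown responses. That is the imprecise situation in which, as described above, the system needs table reordering, prediction or user interaction to reach a satisfactory answer.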
Conclusion
A decision model builds on guidelines, probabilities, utilities and probabilistic relationships, among other sources of information. Decision tables are the result of evaluating a decision model, taking into account that information. Their extraordinarily large size motivated us to analyse them. The aim was to save memory space and, more interestingly, retrieve knowledge (to understand DSS suggestions). In our previous paper we managed to achieve both aims. Moreover, by analysing the items (groups of cases from the decision tables with the same optimal alternative), we could get an explanation of that optimal alternative, the similarity among several alternatives, attribute relevance… Therefore, KBM2L lists managed to deliver space savings, optimization and explanations of decision tables. In this paper, the KBM2L framework is extended to deal with queries, one of the most important DSS facilities. The user enters any query into the DSS, assuming that the model (influence diagram) has been validated. This is more than a simple database query, which matches a single rule. Nowadays, influence diagrams do not have any such facility, and users have to deal with huge decision tables from which it is almost impossible to extract useful/concise information. Open queries and situations with imprecise/uncertain responses are especially difficult. These are the cases that we solve in this paper. The focus is on guiding the user toward what variable to query to get a more accurate and convincing response. The optimal and operative bases allow the records involved in a query to be organised from different perspectives. General queries leading to imprecise responses are addressed via an attributes–policy relationship learning process, where interaction with experts is required to arrive at a satisfactory response. Our approach provides the KB definite exploitation for the DSS.
As opposed to only listing the influence diagram outputs, we report on improvements in space savings, knowledge extraction as rules, explanations of optimal policies and satisfactory answers to complex queries. Despite the power of our iterative scheme of progressive knowledge elicitation, a possible field of future research would focus on enabling queries with constrained rather than non-instantiated attributes, covering initial beliefs about the attributes. Also, more effort could be employed in determining good operative bases if there is more than one. Two criteria could be: minimum computational effort to output the new KBM2L and minimum item fragmentation. Finally, rather than directly allowing the expert to choose a decision criterion in Algorithm AA2, we could first implement a search within the tree of possible sessions, i.e. possible r − y − r − y ⋯ (response r and instantiated attributes y) sequences. This would filter out possibilities that will not satisfy expert expectations, facilitating choices.