Application of classification trees in the sensitivity analysis of probabilistic model results
|Article Code||Year||English Paper||Word Count|
|25701||2003||7-page PDF||3979 words|
Publisher : Elsevier - Science Direct
Journal : Reliability Engineering & System Safety, Volume 79, Issue 2, February 2003, Pages 123–129
The complexity of some integrated-system models necessitates using a probabilistic approach to quantify uncertainty in model projections. In this work, we demonstrate how classification trees can be used to perform sensitivity analyses on probabilistic results. The classification tree technique is applied to results from the probabilistic total system performance assessment model used in the Yucca Mountain project. The technique proves effective in delineating the variables that most influence low and high outcomes.
Computer models are increasingly being used to predict the future behavior and associated uncertainties of complex systems such as nuclear waste repositories, oil production facilities, and global climate change dynamics. Such integrated-system models, when executed in a probabilistic mode to enable quantification of uncertainties in model projections, often include hundreds of parameters that are uncertain and/or variable and whose interactions with one another can also be complex and/or highly nonlinear. It is difficult to obtain an understanding of exactly how the model works, and what the critical uncertainties and sensitivities are, from a simple evaluation of model results. To this end, sensitivity analysis provides a structured framework for unraveling the results of probabilistic model runs by examining the sensitivity of model results to the uncertainties and assumptions in model inputs. Sensitivity analysis, in its simplest sense, involves quantifying the change in model output corresponding to a change in one or more of the model inputs. In the context of probabilistic models, however, sensitivity analysis takes on a more specific definition, viz. identification of those input parameters that have the greatest influence on the spread or variance of the model results. This is sometimes referred to as global sensitivity analysis to distinguish it from the classical (local) sensitivity measures typically obtained as partial derivatives of the output with respect to inputs of interest. The contribution to output uncertainty (variance) by an input is a function of both the uncertainty of the input variable and the sensitivity of the output to that particular input. In general, input variables identified as important in global sensitivity analysis have both characteristics: they exhibit significant variance and large sensitivity coefficients.
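The interplay of the two ingredients above (input uncertainty and output sensitivity) can be illustrated with standardized regression coefficients, one of the regression-based global measures referred to in this paper. The sketch below is purely illustrative: the three-input linear model, the input ranges, and the variable names are assumptions, not anything from the Yucca Mountain study. Note how an input with a large local sensitivity but a tiny sampled range (x2) contributes almost nothing to output variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical uncertain inputs with different sampled ranges (assumed).
x1 = rng.uniform(0.0, 1.0, n)   # wide range, output strongly sensitive to it
x2 = rng.uniform(0.0, 0.01, n)  # same large sensitivity coefficient, tiny range
x3 = rng.uniform(0.0, 1.0, n)   # wide range but weak influence on the output

# Illustrative model: locally, y is equally sensitive to x1 and x2.
y = 10.0 * x1 + 10.0 * x2 + 0.1 * x3 + rng.normal(0.0, 0.1, n)

# Standardized regression coefficients: least-squares slopes rescaled by
# std(input)/std(output), so they combine sensitivity and input uncertainty.
X = np.column_stack([x1, x2, x3])
beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)
src = beta[1:] * X.std(axis=0) / y.std()
print(dict(zip(["x1", "x2", "x3"], np.round(src, 3))))
```

Only x1 receives a large standardized coefficient; x2 (restricted range) and x3 (small sensitivity) both score near zero, matching the qualitative argument above.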
Conversely, variables that do not emerge as important per these metrics are either restricted to a small range in the probabilistic analysis, and/or are variables to which the model outcome is not highly sensitive. Commonly used global sensitivity analysis techniques for probabilistic models include regression analysis, variance decomposition methods, screening methods, and partitioning techniques. Several applications of these techniques have recently been described in the proceedings of the SAMO conferences as well as in the open literature. The objective of this paper is to present a new global sensitivity analysis methodology that enables the analyst to determine which variables, or interactions of variables, drive model output into particular categories. The proposed methodology utilizes the classification tree analysis technique that is widely used in medical decision-making and other scientific disciplines. Tree-based modeling is an exploratory technique for uncovering structure in categorical and continuous data, with such practical applications as rapid determination of prediction rules, summarization of large multivariate data sets, and variable screening. Although tree-based models are useful for both classification and regression problems, we focus here on the former because standard global sensitivity analysis techniques are generally restricted to continuous rather than categorical outcomes. In particular, tree-based models are likely to be helpful in determining the factors responsible for the separation between high- and low-dose outcomes, zero- and non-zero release outcomes, etc. In what follows, we first review the principles of classification tree analysis along with some implementation details specific to the sensitivity analysis problem.
Next, we describe illustrative applications of the methodology in a recently concluded probabilistic performance assessment study for the proposed nuclear waste repository at Yucca Mountain, Nevada, USA. Finally, we present some comments regarding the general applicability of the classification tree technique and how it compares with other common sensitivity analysis methods.
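The core idea described above — categorizing probabilistic results into extreme classes and letting a classification tree identify the separating variables — can be sketched with a generic tree library. Everything here is an illustrative assumption (the synthetic four-input model, the decile cutoffs, and scikit-learn as the tree implementation), not the software or data of the actual performance assessment study.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 2000

# Hypothetical sampled inputs standing in for uncertain model parameters.
X = rng.uniform(size=(n, 4))

# Illustrative model output: driven by inputs 0 and 1, including an
# interaction term; inputs 2 and 3 are inert.
y = X[:, 0] + X[:, 1] + 2.0 * X[:, 0] * X[:, 1] + 0.05 * rng.normal(size=n)

# Categorize outcomes: top decile = "high", bottom decile = "low".
lo, hi = np.quantile(y, [0.1, 0.9])
mask = (y <= lo) | (y >= hi)
labels = np.where(y[mask] >= hi, "high", "low")

# Fit a shallow classification tree to the extreme realizations only.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X[mask], labels)

# Feature importances indicate which inputs separate high from low outcomes.
ranking = np.argsort(tree.feature_importances_)[::-1]
print("importances:", np.round(tree.feature_importances_, 3))
print("most influential inputs:", ranking[:2])
```

A shallow tree suffices here because, as in the two-variable decision trees discussed later, splits on the two driving inputs (including their interaction) separate the extreme classes almost completely.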
Conclusion
We have presented classification trees as a tool for identifying key variables affecting extreme outcomes in probabilistic model results. Although tree-based techniques have been used in other scientific disciplines for data analysis and modeling, we believe this is the first application of the technique in the field of sensitivity analysis. The ability of classification trees to handle categorical data and to easily incorporate variable interactions makes them a useful complement to standard sensitivity analysis techniques such as regression modeling or variance decomposition-based methods. Application of classification tree analysis to probabilistic performance assessment results from Yucca Mountain has highlighted two key features of the methodology. First, a straightforward application of the technique helps reveal the important factors responsible for high- and low-dose outcomes. When supplemented with a two-variable decision tree and/or a partition plot, the separation of extreme outcomes in the uncertain parameter space is easy to visualize and explain. Such insights are often absent from simple applications of partitioning methods such as the Kolmogorov–Smirnov test, which merely lead to a numerical ranking of uncertainty importance without offering any understanding of the input–output relationship. The second key feature of the methodology, as demonstrated by the analysis of the 100,000-year data, involves looking beyond the obvious to identify ‘masked’ factors of importance. Such conditional sensitivity analyses would be difficult to carry out using standard regression modeling techniques because of the limited number of realizations (corresponding to only the top and bottom deciles) available for modeling. This is a useful data mining capability not available with other sensitivity analysis techniques. It should be pointed out that classification trees are appropriate for modeling non-linear and non-additive behavior only in monotonic input–output models.
The presence of non-monotonic trends in the data may require the use of more sophisticated tests for detecting non-random patterns beyond what is possible with regression-based approaches or tree-based techniques. In summary, we note that classification trees appear to be a useful tool for sensitivity analysis of probabilistic model results without requiring any additional function evaluations. They provide a statistical framework for exploring the physical factors driving input–output relationships in various regions of the functional space.
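For contrast with the tree-based approach, the Kolmogorov–Smirnov partitioning method mentioned in the conclusion can be sketched as follows: compare each input's distribution between the high-outcome and low-outcome subsets and rank inputs by the KS distance. As noted above, this yields only a numerical ranking, with no view of the input–output structure. The synthetic model, sample size, and the SciPy implementation are all assumptions for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)
n = 2000

# Hypothetical uncertain inputs and an illustrative model output
# dominated by input 0, weakly influenced by input 1, inert in input 2.
X = rng.uniform(size=(n, 3))
y = 3.0 * X[:, 0] + X[:, 1] + 0.05 * rng.normal(size=n)

# Partition the realizations into top- and bottom-decile subsets.
lo, hi = np.quantile(y, [0.1, 0.9])
high, low = X[y >= hi], X[y <= lo]

# KS statistic per input: distance between its distribution in the
# high-outcome subset and in the low-outcome subset.
stats = [ks_2samp(high[:, j], low[:, j]).statistic for j in range(3)]
print("KS distances:", np.round(stats, 3))
```

The dominant input gets a KS distance near one and the inert input a distance near zero, giving a ranking of uncertainty importance but, unlike the fitted tree, no splitting rules or interaction structure to interpret.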