Data mining source code for locating software bugs: A case study in telecommunication industry
|Article code||Publication year||English article||Persian translation||Word count|
|22163||2009||5-page PDF||Order||Not calculated|
Publisher: Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 36, Issue 6, August 2009, Pages 9986–9990
In a large software system, knowing which files are most likely to be fault-prone is valuable information for project managers. They can use such information to prioritize software testing and allocate resources accordingly. However, our experience shows that it is difficult to collect and analyze fine-grained test defects in a large and complex software system. On the other hand, previous research has shown that companies can safely use cross-company data with nearest neighbor sampling to predict their defects when they are unable to collect local data. In this study we analyzed 25 projects of a large telecommunication system. To predict the defect proneness of modules, we trained models on publicly available NASA MDP data. In our experiments we used static call graph based ranking (CGBR) as well as nearest neighbor sampling to construct method level defect predictors. Our results suggest that, for the analyzed projects, at least 70% of the defects can be detected by inspecting only (i) 6% of the code using a Naïve Bayes model, or (ii) 3% of the code using the CGBR framework.
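The figures quoted above combine a detection rate with an inspection effort measured in lines of code. As a minimal sketch of how such quantities can be computed, assuming each module carries a LOC count, a true defect label, and a predictor flag, the following uses hypothetical data and a hypothetical helper name (`inspection_summary`); it is not the evaluation code used in the study.

```python
def inspection_summary(modules):
    """Return (detection rate, false alarm rate, LOC inspection effort).

    modules: list of dicts with keys 'loc' (int), 'defective' (bool),
    and 'flagged' (bool, the predictor's output). Illustrative format only.
    """
    defective = [m for m in modules if m["defective"]]
    clean = [m for m in modules if not m["defective"]]
    if not defective or not clean:
        raise ValueError("need at least one defective and one clean module")

    total_loc = sum(m["loc"] for m in modules)
    flagged_loc = sum(m["loc"] for m in modules if m["flagged"])

    pd = sum(m["flagged"] for m in defective) / len(defective)  # detection rate
    pf = sum(m["flagged"] for m in clean) / len(clean)          # false alarm rate
    effort = flagged_loc / total_loc                            # share of LOC to inspect
    return pd, pf, effort


# Hypothetical modules, chosen so that the defective ones are small.
modules = [
    {"loc": 10, "defective": True, "flagged": True},
    {"loc": 20, "defective": True, "flagged": True},
    {"loc": 30, "defective": True, "flagged": False},
    {"loc": 40, "defective": False, "flagged": True},
    {"loc": 400, "defective": False, "flagged": False},
    {"loc": 500, "defective": False, "flagged": False},
]

pd, pf, effort = inspection_summary(modules)
print(f"pd={pd:.2f} pf={pf:.2f} effort={effort:.2%}")
```

In this toy data the predictor flags three of the six modules and finds two of the three defective ones while requiring inspection of only 7% of the total LOC, because the flagged modules are small.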
Software testing is one of the most critical and costly phases in software development. Project managers need to know “when to stop testing?” and “which parts of the code to test?”. The answers to these questions directly affect defect rates and product quality as well as resource allocation (e.g. the experience of the test staff and how many people to allocate for testing) and cost. As the size and complexity of software increase, manual inspection of software becomes a harder task. In this context, defect predictors have been effective secondary tools to help test teams locate potential defects accurately (Menzies, Greenwald, & Frank, 2007). These tools are built using historical defect databases and are expected to generalize the statistical patterns to unseen projects. Thus, collecting defect data from past projects is the key challenge in constructing such predictors. In this paper, we share our experience of building defect predictors in a large telecommunication system and present our initial results. We have been working with Turkcell, the largest GSM operator in Turkey (∼70% market share), to improve code quality and to predict defects before the testing phase. Turkcell is a global company whose stock is traded on the NYSE; it operates in Turkey, Azerbaijan, Kazakhstan, Georgia, Northern Cyprus and Ukraine with a customer base of 53.4 million. The underlying system is a standard 3-tier architecture, with presentation, application and data layers. Our analysis focuses on the presentation and application layers. However, the content in these layers cannot be separated into distinct projects. We were able to identify 25 critical components, which we will refer to as projects throughout this paper. We used a defect prediction model that is based on static code attributes such as lines of code, Halstead and McCabe attributes.
Some researchers have argued against the use of static code attributes, claiming that their information content is very limited (Fenton & Neil, 1999). However, static code attributes are easy to collect and interpret, and many recent studies have successfully used them to build defect predictors (Menzies et al., 2007, Menzies et al., 2007, Turhan and Bener, 2007 and Turhan and Bener, 2008). Furthermore, the information content of these attributes can be increased, e.g. by using call graphs (Kocak et al., 2008a and Kocak et al., 2008b). Kocak et al. show that integrating call graph information into defect predictors decreases their false positive rates while preserving their detection rates. Previously, Turkcell did not use company-wide policies for collecting and analyzing such metrics. In our research, we have collected these metrics from the abovementioned 25 projects. We have also collected the static call graphs for these projects. The collection of static code metrics and call graphs can be easily carried out using automated tools (Menzies et al., 2007, Menzies et al., 2007 and Turhan and Bener, 2008). However, as we mentioned earlier, matching these measurements to software components is the most critical factor in building defect predictors. Unfortunately, in our case, it was not possible to match past defects with the software components at the desired granularity, the module level, by which we mean the smallest unit of functionality (e.g. Java methods, C functions). Previous research on such large systems uses either component or file level code churn metrics to predict defects (Bell et al., 2006, Nagappan and Ball, 2006, Ostrand and Weyuker, 2002, Ostrand et al., 2005, Ostrand et al., 2004, Ostrand et al., 2007 and Zimmermann and Nagappan, 2006). The reason is that the file level is the smallest granularity that can be achieved.
For example, Nagappan, Ball and Zimmermann analyze Microsoft software at the component level, and Ostrand, Weyuker and Bell analyze AT&T software at the file level, to report effective predictors used in practice. However, defect predictors become more precise as the measurements are gathered from smaller units (Ostrand et al., 2007). Therefore, we decided to use module level cross-company data to predict defects for Turkcell projects (Menzies, Turhan et al., 2007). Specifically, we have used module level defect information from NASA MDP projects to train defect predictors and then obtained predictions for Turkcell projects. Previous research has shown that cross-company data gives stable results and that nearest neighbor sampling techniques further improve the prediction performance when cross-company data is used (Menzies et al., 2007, Menzies et al., 2007 and Turhan and Bener, 2008). Our experimental results with cross-company data on Turkcell projects estimate that we can detect 70% of the defects with a 6% LOC investigation effort. While the nearest neighbor algorithm improves the detection rate of predictors built on cross-company data, false alarm rates remain high. In order to decrease false alarm rates, we included the call graph based ranking (CGBR) framework in our analysis, based on our previous research. We applied the CGBR framework (Kocak et al., 2008a and Kocak et al., 2008b) to software modules. Using the CGBR framework improved our estimated results such that the LOC investigation effort decreased from 6% to 3%. The rest of the paper is organized as follows: in Section 2 we briefly review the related literature, and in Section 3 we explain the project data. Section 4 explains our rule-based analysis. Learning based model analysis is discussed in Section 5. The last section gives conclusions and future directions.
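The combination of nearest neighbor sampling and a Naïve Bayes learner described above can be illustrated with a small sketch. It assumes Euclidean distance over two hypothetical attributes (LOC and cyclomatic complexity), a Gaussian Naïve Bayes variant, and invented data; the function names (`nn_sample`, `fit_gaussian_nb`, `predict`) and the value of `k` are illustrative choices, not the setup reported in the paper.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_sample(cross_rows, local_rows, k):
    """For each local (unlabeled) row, pick its k nearest cross-company rows.

    cross_rows: list of (features, label); local_rows: list of features.
    Returns the union of the selected rows without duplicates.
    """
    chosen = set()
    for target in local_rows:
        ranked = sorted(range(len(cross_rows)),
                        key=lambda i: euclidean(cross_rows[i][0], target))
        chosen.update(ranked[:k])
    return [cross_rows[i] for i in sorted(chosen)]


def fit_gaussian_nb(rows):
    """Estimate a log prior plus per-attribute mean and variance for each class."""
    model = {}
    for label in {y for _, y in rows}:
        feats = [x for x, y in rows if y == label]
        n = len(feats)
        means = [sum(col) / n for col in zip(*feats)]
        # A small variance floor avoids division by zero for constant attributes.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                     for col, m in zip(zip(*feats), means)]
        model[label] = (math.log(n / len(rows)), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest Gaussian log posterior for x."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


# Hypothetical cross-company modules: ((loc, cyclomatic complexity), defective?)
cross = [
    ((10, 1), False), ((12, 2), False), ((15, 2), False),
    ((80, 9), True), ((90, 11), True), ((100, 12), True),
    ((5000, 300), True), ((6000, 350), False),  # unlike any local module
]
# Hypothetical local modules with no defect labels.
local = [(11, 1), (85, 10)]

selected = nn_sample(cross, local, k=3)
model = fit_gaussian_nb(selected)
predictions = [predict(model, x) for x in local]
print(len(selected), predictions)
```

The sampling step keeps only the cross-company rows that resemble the local modules, so the two very large modules never reach the learner. Any preprocessing of the attributes before training is left out of this sketch.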
Conclusion
In this study we investigate how to predict fault-prone modules in a large software system. We have performed an average case analysis for the 25 projects in order to determine the characteristics of the implemented code base and observed that some measurements contradicted the company objectives. Specifically, the software modules were written using a relatively low number of operands and operators to increase modularity and to decrease maintenance effort. However, we have also observed that the code base was poorly commented, which makes maintenance a difficult task. Our initial data analysis revealed that a simple rule-based model based on recommended standards for static code attributes estimates a defect rate of 15% and requires 45% of the code to be inspected. This is an impractical outcome considering the scope of the system. Thus, we have constructed learning based defect predictors and performed further analysis. We have used cross-company NASA data to learn defect predictors, due to the lack of local module level defect data. The first analysis confirms that the average defect rate of all projects was 15%. While the simple rule-based model requires inspection of 45% of the code, the learning based model suggested that we needed to inspect only 6% of the code. This stems from the fact that the rule-based model has a bias towards more complex and larger modules, whereas the learning based model predicts that smaller modules contain most of the defects. Our second analysis employed data adjusted with the CGBR framework, which has been externally validated not to change the median probability of detection and to significantly decrease the median probability of false alarm. The second analysis improved the estimations further and suggested that 70% of the defects could be detected by inspecting only 3% of the code. Our future work consists of collecting local module level defects to be able to build within-company predictors for this large telecommunication system.
We also plan to use file level code churn metrics in order to predict production defects between successive versions of the software.
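To give a rough sense of what ranking modules on a static call graph can look like, the sketch below computes a PageRank-style score in which a module gains rank from the modules that call it. The choice of a PageRank-style iteration, the damping factor, the function name `call_graph_rank`, and the example graph are all assumptions made for illustration. This text does not detail the CGBR algorithm or how the resulting ranks adjust the static code attributes, so the sketch stops at ranking.

```python
def call_graph_rank(calls, damping=0.85, iterations=50):
    """PageRank-style scores on a call graph (illustrative, not the CGBR code).

    calls: dict mapping a caller to the list of modules it calls.
    Returns a dict mapping each module to a score; the scores sum to 1.
    """
    modules = set(calls) | {c for callees in calls.values() for c in callees}
    n = len(modules)
    rank = {m: 1.0 / n for m in modules}
    for _ in range(iterations):
        new_rank = {m: (1.0 - damping) / n for m in modules}
        for m in modules:
            callees = calls.get(m, [])
            if callees:
                # A caller passes an equal share of its rank to each callee.
                share = damping * rank[m] / len(callees)
                for callee in callees:
                    new_rank[callee] += share
            else:
                # Modules that call nothing spread their rank over all modules.
                for other in modules:
                    new_rank[other] += damping * rank[m] / n
        rank = new_rank
    return rank


# Hypothetical call graph: "log" is called by every other module.
calls = {
    "main": ["parse", "log"],
    "parse": ["log"],
    "report": ["log"],
}

ranks = call_graph_rank(calls)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 3))
```

In this toy graph the most frequently called module, `log`, receives the highest score, while modules that nothing calls, such as `main` and `report`, receive the lowest.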