Details of Grant
 
EPSRC Reference: EP/E063039/1
Title: Investigating code fault proneness using program slicing
Principal Investigator: Dr T Hall
Other Investigators:
Dr PD Wernick
Researcher Co-investigator:
Project Partner:
Department: Information Systems Computing and Maths
Organisation: Brunel University
Scheme: Standard Research
Starts: 31 March 2008
Ends: 30 March 2009
Value (£): 74,687
EPSRC Research Topic Classifications:
Software Engineering
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
EP/E055141/1 EP/E056296/1
Panel History:
Panel Date | Panel Name | Outcome
01 Mar 2007 | ICT Prioritisation Panel | Announced
Summary
This feasibility study explores the relationship between program slices and faults. The aim is to investigate whether the characteristics of program slices can be used to identify fault-prone code hotspots. Slicing metrics and dependence clusters are used to characterise the slices in a software component. The relationship between the characteristics of those slices and the faults in that component is then analysed. Identifying fault-prone code is difficult, and reliable predictors of fault-proneness are not widely reported in the literature. Program slicing is an established software engineering technique to support the detection and correction of known faults. Once a problem has emerged, slicing enables all statements that could have caused that problem to be identified and extracted. This extracted code makes the identification and removal of the fault much easier. We propose to investigate whether slicing could also be a good predictor of latent faults that have not yet caused a problem. The results of this study will show whether the use of program slicing can be extended as a reliable tool to predict fault-prone code. Our previous proof-of-concept study suggests that this investigation is viable and that slicing may offer valuable insights into fault-proneness.
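
As an illustration of the slicing idea described above, the following minimal Python sketch computes a backward slice as reachability over a dependence graph. It is not the project's tooling: the `backward_slice` function, the `DEPS` table, and the six-statement example program are all hypothetical, and the dependence edges are written by hand rather than derived by a code analysis tool.

```python
from collections import deque


def backward_slice(deps, criterion):
    """Return the statements that the criterion statement may depend on.

    deps maps a statement id to the ids it directly depends on (data or
    control dependence). The slice is everything reachable from the
    criterion along those edges, including the criterion itself.
    """
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        stmt = queue.popleft()
        for dep in deps.get(stmt, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical six-statement program (assumed for illustration only):
# 1: n = read()
# 2: total = 0
# 3: count = 0
# 4: total = total + n
# 5: count = count + 1
# 6: print(total)
DEPS = {4: [1, 2], 5: [3], 6: [4]}

# Statements that could have caused a wrong value printed at statement 6.
print(sorted(backward_slice(DEPS, 6)))  # [1, 2, 4, 6]
```

Statements 3 and 5 fall outside the slice, so a fault observed at statement 6 cannot originate there; this is the narrowing effect that makes slicing useful for locating known faults.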

Final Report Summary
This was a one-year feasibility study whose aim was to investigate the relationship between program slices and faults. Program slicing is a well-established technique (Weiser 1981, 1982) that has been used extensively to improve the quality of software systems (Harman 2003, 2004), and in particular to find specific individual faults during debugging. This project sought to extend the use of program slicing to the prediction of fault hotspots.

This study achieved five substantive outputs:

1. A systematic literature review of 111 previous studies predicting faults in code. A variety of code-based predictors have been used in previous studies to identify fault hotspots in code. We systematically reviewed 111 such studies published since 2000. We analyzed these studies in terms of the source of data, the predictors used, and the approach taken to developing prediction models. We found that over 70% of these studies use industrial data, with 24% of these using NASA datasets. Fault prediction models are based mainly on static code metrics, change data and previous fault data. The performance of the prediction models reported was variable and in some cases difficult to quantify, with 68% of studies failing to report false positives. Of the studies, 80% report that their models achieve a prediction success rate of over 50%, with 15% achieving performance levels of over 90%, whereas only 5% report that their models achieve results of less than 50% performance. Our work analyzing these studies is currently in review with IEEE Transactions on Software Engineering (Beecham et al, in review).

2. The development of data mining tools to extract and categorize fault data. During this project we extracted fault data from the repositories of two open source projects (Barcode and Apache). This data extraction required significant effort but was critical to producing a high-quality dataset. We then undertook a manual multi-rater categorization of changes to the Barcode project. An automated categorization approach was successfully developed and applied to the Apache project. As a result of this work we have collected an extensive set of fault data extracted from these two systems, which is publicly available to other researchers via our website (URL below).

3. Extending CodeSurfer to collect program slicing metrics. We succeeded in creating scripts for the code analysis tool CodeSurfer to extract program slicing metrics data from the two open source projects. This task was made particularly challenging by the lack of documentation on extending CodeSurfer. The extension scripts we developed are available via our website.

4. Adding Normalised Hamming Distance to the base set of program slicing metrics. We added a new metric to the suite of program slicing metrics originally developed by Weiser (1981, 1982) and extended by Ott and Thuss (1993). We adapted our existing Normalised Hamming Distance (NHD) metric to program slices and evaluated its performance against the existing program slicing metrics. Our findings indicate (Counsell et al 2009) that using this metric increases the overall information provided by program slicing metrics.

5. The analysis of program slicing metrics data in relation to fault data. We conducted analysis studies investigating the relationship between program slicing metrics (Meyers & Binkley 2004) and faults in the two open source systems. So far, the relationship between program slicing metrics and code faults remains inconclusive. However, during this one-year project we collected a huge amount of fault and program slicing data (approximately 1 million tuples), which we continue to analyze.
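
To make the slice-based metrics named in outputs 4 and 5 more concrete, the sketch below computes Tightness, Coverage and Overlap for one module using their commonly cited definitions, together with a Normalised Hamming Distance. It is an illustration under stated assumptions, not the project's CodeSurfer scripts: the function names and example slices are hypothetical, and the NHD shown here assumes each slice is treated as a binary vector over the module's statements, with NHD taken as the mean fraction of statements on which a pair of slices agree. The adaptation reported in Counsell et al 2009 may differ in detail.

```python
from itertools import combinations


def slice_metrics(length, slices):
    """Tightness, Coverage and Overlap for one module.

    length: number of statements in the module (numbered 1..length).
    slices: list of non-empty sets of statement numbers, one per output
            variable.
    """
    common = set.intersection(*slices)  # statements shared by every slice
    k = len(slices)
    return {
        "tightness": len(common) / length,
        "coverage": sum(len(s) for s in slices) / (k * length),
        "overlap": sum(len(common) / len(s) for s in slices) / k,
    }


def nhd(length, slices):
    """Mean pairwise agreement between slices (ASSUMED adaptation of NHD).

    Two slices agree on a statement when both include it or both exclude
    it. Requires at least two slices.
    """
    pairs = list(combinations(slices, 2))
    # The symmetric difference counts the statements where a pair disagrees.
    agreements = sum(length - len(a ^ b) for a, b in pairs)
    return agreements / (length * len(pairs))


# Hypothetical module of 8 statements with two output-variable slices.
example = [{1, 2, 3, 4}, {1, 2, 5, 6}]
print(slice_metrics(8, example))
# {'tightness': 0.25, 'coverage': 0.5, 'overlap': 0.5}
print(nhd(8, example))  # 0.5
```

In this example the two slices share statements 1 and 2 and both exclude statements 7 and 8, so they agree on four of the eight statements. The agreement on excluded statements is information that Tightness and Overlap, which only look at the intersection, do not capture.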
Further Information:  
Organisation Website: http://www.brunel.ac.uk