|
| EPSRC Reference: |
EP/F010206/1 |
| Title: |
Using Program Slicing to Size Code Change |
| Principal Investigator: |
Dr T Hall |
| Other Investigators: |
|
| Researcher Co-investigator: |
|
| Project Partner: |
|
| Department: |
Information Systems Computing and Maths |
| Organisation: |
Brunel University |
| Scheme: |
Standard Research |
| Starts: |
31 March 2008 |
Ends: |
30 March 2009 |
Value (£): |
78,776
|
| EPSRC Research Topic Classifications: |
|
| EPSRC Industrial Sector Classifications: |
| No relevance to Underpinning Sectors |
|
|
| Related Grants: |
|
| Panel History: |
|
|
Summary |
This is a proposal for a 12-month preliminary investigation into whether the characteristics of program slices in a software system can help to predict the size of code change for a change request. Program slicing is increasingly used by software developers as a tool to support the maintenance of systems. Developers use program slicing to identify elements of the code that may be affected by particular maintenance changes. The original aim behind the development of slicing was to allow developers to perform higher-quality code debugging. Slicing has proved to be effective for debugging because it focuses on the structure of code relevant to making a change to that code. We propose to investigate whether data characterising program slices might have wider application, in particular whether understanding the characteristics of the program slices in a system could allow the prediction of the size of code change for a change request. Such prediction would allow more effective planning of changes to that system. We will investigate these issues using the slicing characteristics of two long-lived open source software systems. We will use multiple regression techniques to investigate the relationship between the characteristics of both forward and backward slices in these open source systems and the size of code change for change requests to each of these systems. This preliminary investigation, if successful, will be extended in subsequent proposals to investigate commercial systems in industry.
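As a rough illustration of the multiple regression step, the sketch below fits change size as a linear function of several slice characteristics using ordinary least squares. It is not the project's analysis: the function name `ols_fit`, the choice of two predictors, and the numbers in the usage example are all hypothetical placeholders chosen so that the fit is exact.

```python
from typing import List, Sequence


def ols_fit(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares fit; returns [intercept, b1, ..., bk].

    X holds one row of predictor values per observation (hypothetically,
    slice characteristics of the changed code) and y holds the response
    (hypothetically, the size of the code change).
    """
    if len(X) != len(y) or not X:
        raise ValueError("X and y must be non-empty and the same length")
    # Prepend a constant 1.0 to each row so the model has an intercept.
    rows = [[1.0, *map(float, r)] for r in X]
    k = len(rows[0])
    # Normal equations (A^T A) beta = A^T y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Made-up numbers, unrelated to the project's data: y = 2 + 3*x1 - 1*x2.
example_X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
example_y = [2, 5, 1, 4, 7]
coefficients = ols_fit(example_X, example_y)
```

In practice a statistics package would also report significance tests and goodness of fit, which this sketch omits.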
|
| Final Report Summary |
This was a one-year feasibility study whose aim was to investigate whether the characteristics of program slices in a software system can help to predict the size of code change necessary to implement a change request. Program slicing is a well-established technique (Weiser 1981, 1982) that has been used extensively to aid the development of higher-quality systems (Harman 2003, 2004). This project sought to extend the use of program slicing to predicting how much code change would be required for any given change request. To investigate the relationship between three program slicing metrics and the size of code change, we tested the following hypotheses:
H1: Lower overlap correlates with smaller sized code changes;
H2: Lower tightness correlates with larger sized code changes;
H3: Higher MaxCoverage correlates with larger sized code changes.
Overall, our analysis found no conclusive relationship between these three program slicing metrics and the size of changes made. Indeed, we rejected all three hypotheses. This is an important finding for future researchers, as it should save further effort investigating these relationships. However, further analysis using other slicing metrics has shown some weak relationships, particularly for our new metric, 'normalized Hamming distance' (Counsell et al, 2009).
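The report does not give formulas for the three metrics named in H1 to H3. As an illustration only, the sketch below assumes the definitions commonly used in the slice-based cohesion literature: each output variable of a module has a slice (a set of statements), tightness is the share of the module's statements common to every slice, overlap is the average share of each slice taken up by those common statements, and MaxCoverage is the largest slice's share of the module. The function name `slice_metrics` and the input layout are hypothetical, and real implementations involve further choices (for example, which statements count towards module length) that can change the values.

```python
from typing import Dict, Set


def slice_metrics(slices: Dict[str, Set[int]], module_length: int) -> Dict[str, float]:
    """Compute tightness, overlap and MaxCoverage for one module.

    slices maps each output variable to the set of statement numbers in
    its slice; module_length is the number of statements in the module.
    The definitions are assumed, not taken from the report.
    """
    if not slices or module_length <= 0:
        raise ValueError("need at least one slice and a positive module length")
    slice_sets = list(slices.values())
    if any(len(s) == 0 for s in slice_sets):
        raise ValueError("every slice must contain at least one statement")

    # Statements that appear in every slice of the module.
    common = set.intersection(*slice_sets)

    tightness = len(common) / module_length
    overlap = sum(len(common) / len(s) for s in slice_sets) / len(slice_sets)
    max_coverage = max(len(s) for s in slice_sets) / module_length
    return {
        "tightness": tightness,
        "overlap": overlap,
        "max_coverage": max_coverage,
    }


# Hypothetical 10-statement module with two output variables.
example = slice_metrics(
    {"a": {1, 2, 3, 4, 5}, "b": {1, 2, 3, 6, 7, 8, 9, 10}},
    module_length=10,
)
```

In this made-up example three statements are shared by both slices, so tightness is 3/10, MaxCoverage is 8/10, and overlap is the mean of 3/5 and 3/8.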
Despite not identifying significant relationships between the variables investigated, we are able to report other substantive findings that will be of value to future researchers. These are:
1. A lack of detailed implementation definitions of program slicing metrics. Although program slicing metrics have been formally defined at a high level of abstraction, the detail of how to implement these high-level definitions to collect low-level code data is not defined. There are many possible variations in the way that program slicing metrics can be implemented in CodeSurfer, and we have shown that each has an important impact on the data collected (Bowes et al, in review). This implementation variability has not previously been reported, and previous studies using program slicing metrics are not explicit about how they collected their program slicing data. This is an important omission from the current literature which this project has identified.
2. The development of data mining tools to extract and categorize change data. During this project we extracted change data from the repositories of two open source projects (Barcode and Apache). This data extraction required significant effort but was critical to producing a high-quality dataset. We then undertook a manual multi-rater categorization of changes to the Barcode project. An automated categorization approach was successfully developed and applied to the Apache project. As a result of this work we have collected an extensive set of change data extracted from these two systems, which is publicly available to other researchers via our website (URL below).
3. Extending CodeSurfer to collect program slicing metrics. We succeeded in creating scripts for the code analysis tool CodeSurfer to extract program slicing metrics data from the two open source projects. This task was made particularly challenging by the lack of documentation on extending CodeSurfer. Our extension scripts are available via our website.
4. The impact of noise on the reliability of open source data. In this study we used the data repositories for two open source systems (Barcode and Apache). Such repositories are increasingly used in research studies. However, we found that the data contained in them is difficult to extract accurately and is contaminated with so much noise that cleaning it is an enormous and error-prone task. We report the particular problems that such data presents for data mining techniques (Gray et al, 2009).
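Findings 2 and 4 both depend on measuring the size of each change extracted from a repository. The report does not say how size was counted, so the sketch below simply assumes that size means the number of lines added and deleted between two revisions of a file. The function name `change_size` is hypothetical, and the example revisions are invented.

```python
import difflib
from typing import List, Tuple


def change_size(old_lines: List[str], new_lines: List[str]) -> Tuple[int, int]:
    """Return (lines_added, lines_deleted) between two revisions of a file.

    A replaced line counts as one deletion plus one addition. This is an
    assumed measure of change size, not the one used in the project.
    """
    added = 0
    deleted = 0
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deleted += i2 - i1  # lines present only in the old revision
        if tag in ("replace", "insert"):
            added += j2 - j1  # lines present only in the new revision
    return added, deleted


# Invented revisions: one line replaced and one line appended.
old_revision = ["a", "b", "c"]
new_revision = ["a", "x", "c", "d"]
added, deleted = change_size(old_revision, new_revision)
```

Counting raw lines treats whitespace-only and comment-only edits as changes, which is one source of the noise described in finding 4; a real pipeline would need to decide whether to filter them.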
|
| Further Information: |
|
| Organisation Website: |
http://www.brunel.ac.uk |
|
|