EPSRC logo
 Home | GoW Home | Back | Programme | Scheme | Topic | Sector | Theme | Region | Organisation     
 
Details of Grant
 
EPSRC Reference: EP/G025177/1
Title: Geometric Abstractions for Scalable Program Analyzers
Principal Investigator: Professor AG Cohn
Other Investigators:
Researcher Co-investigator:
Dr P Hill
Project Partner:
Department: Sch of Computing
Organisation: University of Leeds
Scheme: Standard Research
Starts: 01 October 2008 Ends: 30 September 2009 Value (£): 51,230
EPSRC Research Topic Classifications:
Fundamentals of Computing
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
Panel History:  
Summary
It is widely acknowledged that relatively small defects in software can have a substantial cost both for producers and consumers. For example, system vulnerabilities are frequently introduced by programming mistakes such as allowing out of bounds accesses to buffers, overflows in operations on native integers and other errors related to memory management. Of course, there can be other causes, such as system design flaws, but finding and certifying the absence of the low-level bugs is an important prerequisite to building secure and reliable software. The approach we use to detect and locate programming errors or certify the absence of such bugs is that of static analysis; that is, the determination of correct though approximate information about the program's values at each program step. Static analysis has its roots in compiler optimization where the analysis time has to be kept very low while the properties of interest are fixed with respect to the compiler. More recently program analyzers have been developed for program verification; however these also consider a fixed set of possible run-time errors and aim for a scalability and performance that enables them to tackle very large programs.

Static analysis uses abstract domains for representing information that needs to be collected. Thus these domains have to provide a convenient but approximate representation of the accumulated information during the abstract evaluation of a program. Observe that the abstract domain component of a static analyzer has to include, not only a computer representation of the logical properties of interest, but also the operations needed to extract this information from the program's components, primitives for propagating this information forward and/or backward within the program, and operators for accelerating the analysis process and ensuring loop iterations actually terminate.

Since, many program properties of interest are intrinsically numeric, there has been a considerable amount of research on how this kind of information can be represented efficiently and precisely by means of geometric domains. The problem being to get the "right" efficiency/precision trade-off, which is difficult since this is clearly dependent on the application. Thus many geometric domains have been proposed and researched, the majority being defined by linear (i.e., planar) bounds such as polyhedra; octagons; boxes, also known as intervals; and grids, simple forms of which are also called lattices. Such a range is needed since domains such as polyhedra, although very precise, have high complexity and exponential space requirements (relative to the number of dimensions) while simpler domains such as octagons and grids are polynomial and the non-relational domain of boxes has linear complexity.

Solving this scalability problem is the main motivation for this project; here we will research new techniques for building compound geometric domains that can be constructed from several atomic ones such as those discussed above. In order to allow for varying the efficiency/precision trade-off, not only will it be parametrized on the component domains but it will also have a highly adjustable strategy for varying the kind and amount of communication between them. Thus a successful project will provide bespoke domains that are tailored for the application, allowing for both the type of property being verified and the size and complexity of the software being analyzed.

Final Report Summary
It is widely acknowledged that relatively small defects in software can have a substantial cost both for producers and consumers. For example, system vulnerabilities are frequently introduced by programming mistakes such as allowing out of bounds accesses to buffers, overflows in operations on native integers and other errors related to memory management. Of course, there can be other causes, such as system design flaws, but finding and certifying the absence of the low-level bugs is an important prerequisite to building secure and reliable software. The approach we use to detect and locate programming errors or certify the absence of such bugs is that of static analysis; that is, the determination of correct though approximate information about the program's values at each program step. Static analysis has its roots in compiler optimization where the analysis time has to be kept very low while the properties of interest are fixed with respect to the compiler. More recently program analyzers have been developed for program verification; however these also consider a fixed set of possible run-time errors and aim for a scalability and performance that enables them to tackle very large programs.

Static analysis uses abstract domains for representing information that needs to be collected. Thus these domains have to provide a convenient but approximate representation of the accumulated information during the abstract evaluation of a program. Observe that the abstract domain component of a static analyzer has to include, not only a computer representation of the logical properties of interest, but also the operations needed to extract this information from the program's components, primitives for propagating this information forward and/or backward within the program, and operators for accelerating the analysis process and ensuring loop iterations actually terminate.

Since, many program properties of interest are intrinsically numeric, there has been a considerable amount of research on how this kind of information can be represented efficiently and precisely by means of geometric domains. The problem being to get the "right" efficiency/precision trade-off, which is difficult since this is clearly dependent on the application. Thus many geometric domains have been proposed and researched, the majority being defined by linear (i.e., planar) bounds such as polyhedra; octagons; boxes, also known as intervals; and grids, simple forms of which are also called lattices. Such a range is needed since domains such as polyhedra, although very precise, have high complexity and exponential space requirements (relative to the number of dimensions) while simpler domains such as octagons and grids are polynomial and the non-relational domain of boxes has linear complexity.

Solving this scalability problem was the main motivation for this project. As a result of this work, we have obtained new more efficient algorithms for supporting these compound geometric domains; these target a range of component domains, enabling a choice of precision/efficiency trade-offs. We have also investigated applications of polyhedral computations to the analysis and verification of hardware and software systems; and as a result are currently building tools that can verify many important properties of software systems, both numeric and non-numeric, syntactic and semantic.

Further Information: http://www.cs.unipr.it/ppl/
Organisation Website: http://www.leeds.ac.uk
Terms and conditions