|
| EPSRC Reference: |
GR/S79572/01 |
| Title: |
Architecture and Compiler Infrastructure for Flexible Cellular Multiprocessing |
| Principal Investigator: |
Dr M Cintra |
| Other Investigators: |
|
| Researcher Co-investigator: |
|
| Project Partner: |
|
| Department: |
Sch of Informatics |
| Organisation: |
University of Edinburgh |
| Scheme: |
Standard Research |
| Starts: |
01 May 2004 |
Ends: |
31 August 2007 |
Value (£): |
283,989
|
| EPSRC Research Topic Classifications: |
|
| EPSRC Industrial Sector Classifications: |
|
| Related Grants: |
|
| Panel History: |
|
|
Summary |
Traditionally, parallelism within a single application has been exploited in two forms: thread-level parallelism (TLP) and instruction-level parallelism (ILP). To perform well across a broad range of applications, an architecture must be able to dynamically exploit both types of parallelism. Unfortunately, there is often some mismatch between application and architecture. One example of such a mismatch is the recent class of cellular architectures, which rely entirely on large amounts of TLP.
This project proposes to investigate polymorphous cellular architectures that can dynamically adapt to the form of parallelism available in the application. This is an emerging technology, but current architectures can only accommodate very coarse-grained configuration changes. We intend to explore novel architectural and compiler solutions to overcome these limitations. This will first include a systematic analysis of ILP and TLP behaviour across a range of applications. We will then investigate architectural extensions to the issue logic and to the reconfiguration capabilities of polymorphous cellular architectures so that they can dynamically adapt to ILP and TLP availability. On the compiler side, after developing static analyses to detect the most profitable form of parallelism, later work will investigate adaptive dynamic compilation and speculative parallelism as a means of improving performance.
|
| Final Report Summary |
With respect to the first project objective, this project started with an evaluation of the tradeoffs between ILP and TLP, with the latter in the form of independent programs. This work demonstrated that, with a simple and somewhat idealized memory system, ILP and TLP are more or less interchangeable (i.e., n 1-issue cores deliver throughput similar to that of one n-issue core), but that with a more realistic memory system, exploiting TLP is the better option. We are currently extending this study to evaluate the impact of ILP and TLP when the TLP takes the form of threads from a single parallel program.
With respect to the second objective, this project started with a thorough quantification and characterization of the uncertainty in points-to sets resulting from state-of-the-art pointer analyses. This work demonstrated that some of this uncertainty is intrinsic to the application (e.g., due to uncertain control flow), but that some is due to limitations of the implementations of the pointer analyses (e.g., lack of support for pointer arithmetic) -- as opposed to limitations of the algorithms. This work was motivated by the fact that pointer analysis is a key step in the compiler's identification and exploitation of both ILP and TLP in applications.
Finally, with respect to the third and fourth objectives, this project developed novel mechanisms to support ILP and TLP simultaneously. In particular, we developed two new mechanisms to support TLP without the traditional hardware-intensive cache coherence protocols. We note that solving the cache coherence problem is critical for exploiting TLP, since it would be too great a burden on programmers to force applications written in the ubiquitous shared-memory programming model to manage the caches by themselves without hardware or systems-software assistance. We believe that the lightweight mechanisms developed in this project are key to supporting both ILP and TLP in a cellular multiprocessor for two reasons. Firstly, traditional hardware cache coherence protocols are either not scalable to the number of processors envisioned (e.g., snooping on buses) or are too complex to design and verify (e.g., distributed directories). Secondly, hardware cache coherence protocols generate unpredictable artifactual communication across tiles, which disturbs existing mechanisms to exploit ILP, since these rely on deterministic placement of data and, thus, deterministic latencies to access it. One of the schemes we propose avoids incoherence across data caches by enforcing a single logical copy of every data item, and relies on minimal hardware support to access this copy via remote cache reads and writes. Moreover, for its TLP support this scheme uses an on-chip interconnection network that was initially developed to exploit ILP in MIT's Raw system -- in line with the polymorphism philosophy of reusing the same hardware infrastructure for multiple purposes. We are currently extending this work so that the same network is also used for fast thread synchronization, which is an important further form of support for TLP. The other scheme we propose relies mostly on software to trigger communication (i.e., coherence) events at specific points of execution. It then relies on minimal hardware support to achieve more selective communication according to the actual run-time access patterns.
In summary, this project produced: 3 technical papers in high-profile refereed conferences (1 still under review); 2 PhD theses (with estimated completion in Winter 2007 and with both students having found jobs even before graduating); and 1 BSc. Honours thesis. Also, the work in this project has attracted much national and international attention as evidenced by invited talks at: IBM T. J. Watson Research Center (USA), Oak Ridge National Laboratory (USA), and ARM Ltd (UK).
|
| Further Information: |
http://homepages.inf.ed.ac.uk/mc/Projects/CELLULAR/main.html |
| Organisation Website: |
http://www.ed.ac.uk |
|
|