Miao Ju, Hun Jung, and Hao Che
As chip multiprocessors (CMPs) become the mainstream processor technology, challenges arise as to how to design and program them to achieve the desired performance for applications of diverse nature. Existing CMP analysis approaches (e.g., simulation and benchmark testing) face two scalability barriers. First, they struggle to analyze CMP performance effectively as the number of cores and threads of execution becomes large. Second, they struggle to support comprehensive comparative studies of different architectures as CMPs proliferate. Beyond these barriers, analyzing the performance of the various possible design and programming choices is particularly challenging during the initial CMP design/programming phase, when the actual instruction-level program is not yet available.
To overcome these scalability barriers, approaches are needed that work at much coarser granularity than the existing ones (e.g., by overlooking microarchitectural details), so that analysis can keep up with the ever-growing design space. Such an approach should be able to characterize the general performance properties of a wide variety of CMP architectures and a large workload space at coarse granularity. Moreover, such an approach cannot ...