Appendix 2

Register Saturation Computation on Stand-alone DDG

This appendix summarizes our full experiments in [BRI 09a].

A2.1. The cyclic register saturation

Our experiments were conducted on a regular Linux workstation (Intel Xeon, 2.33 GHz and 9 GB of memory). The data dependency graphs (DDGs) used for experiments come from SPEC2000, SPEC2006, MEDIABENCH and FFMPEG sets of benchmarks, all described in Appendix 1. We used the directed acyclic graph (DAG) of the loop bodies, and the configured set of register types is images = (FP, BR, GR). Since the compiler may unroll loops to enhance instruction-level parallelism (ILP) scheduling, we have also experimented the DDG after loop unrolling with a factor of 4 (so the sizes of DDGs are multiplied by a factor of 5). The distribution of the sizes of the unrolled loops may be computed by multiplying the initial sizes by a factor of 5.

A2.1.1. On the optimal RS computation

Since computing register saturation (RS) is NP-complete, we have to use exponential methods if optimality is needed. An integer linear program was proposed in [TOU 05, TOU 02], but was extremely inefficient (we were unable to solve the problem with a DDG larger than 12 nodes). We replaced the integer linear program with an exponential algorithm to compute the optimal RS [BRI 09a]. The optimal values of RS allow us to test the efficiency of GREEDY-K heuristics. From our experiments ...

Get Advanced Backend Optimization now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.