Appendix 5

Loop Unroll Degree Minimization: Experimental Results

All our benchmarks were cross-compiled on a standard Dell workstation equipped with a 2.4 GHz Intel(R) Core(TM)2 CPU and running Linux (kernel version 2.6, 64-bit).

A5.1. Stand-alone experiments with single register types

This section presents full experiments on a stand-alone tool that considers a single register type only. The tool is independent of the compiler and of the processor architecture. We demonstrate the efficiency of our loop unrolling minimization method for both unscheduled loops (as studied in section 11.4) and scheduled loops (as studied in section 11.6).

A5.1.1. Experiments with unscheduled loops

In this context, our stand-alone tool takes a data dependence graph (DDG) as input, immediately after the periodic register allocation performed by SIRA, and applies loop unrolling minimization (LUM).

A5.1.2. Results on randomly generated data dependence graphs

First, our stand-alone software generates the number of distinct reuse circuits k and their weights (μ1, …, μk). Afterwards, we calculate the number of remaining registers and the loop unrolling degree ρ = lcm(μ1, …, μk). Finally, we apply our method for minimizing ρ.
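
The three steps above can be illustrated with a small sketch. It is not the tool used in these experiments. It assumes that the number of remaining registers is the number of available registers minus μ1 + … + μk, and that minimization consists of distributing some of those remaining registers over the reuse circuits so that the least common multiple of the enlarged weights is as small as possible. The function names are hypothetical, and the exhaustive search is a stand-in chosen for readability, not the minimization method evaluated in this appendix.

```python
from itertools import product
from math import lcm  # variadic math.lcm requires Python 3.9+


def unroll_degree(weights):
    """Loop unrolling degree rho = lcm(mu_1, ..., mu_k)."""
    return lcm(*weights)


def remaining_registers(available, weights):
    """Assumption: registers left over = available minus the sum of the weights."""
    return available - sum(weights)


def minimize_unroll_brute_force(weights, remaining):
    """Try every way of adding r_i >= 0 registers to circuit i with
    sum(r_i) <= remaining, and keep the first assignment with the smallest lcm.
    Exponential in k: usable only on tiny illustrative instances."""
    best_rho, best_weights = unroll_degree(weights), tuple(weights)
    for extra in product(range(remaining + 1), repeat=len(weights)):
        if sum(extra) > remaining:
            continue  # this distribution would use more registers than remain
        candidate = tuple(m + r for m, r in zip(weights, extra))
        rho = unroll_degree(candidate)
        if rho < best_rho:
            best_rho, best_weights = rho, candidate
    return best_rho, best_weights


if __name__ == "__main__":
    mu = (2, 3, 5)                      # hypothetical reuse circuit weights
    spare = remaining_registers(12, mu)  # 12 available registers (hypothetical)
    print(unroll_degree(mu))             # 30
    print(spare)                         # 2
    print(minimize_unroll_brute_force(mu, spare))  # (6, (2, 3, 6))
```

In this toy instance, spending a single spare register on the third circuit turns the weights (2, 3, 5) into (2, 3, 6) and lowers the unrolling degree from 30 to 6.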

We ran extensive random generations over many configurations: we varied the number of available registers from 4 to 256, and we considered 10,000 random instances containing multiple ...
