11.12 SUMMARY OF WORK DONE IN THIS CHAPTER

At this stage, we have completely specified the reduced computation domain associated with the matrix–matrix multiplication algorithm. The points of this reduced domain could represent the concurrent threads required for a software implementation or the PEs required for a systolic array hardware implementation. Below we summarize what we have done and why:

1. We started by expressing the matrix multiplication as an iterative equation (Eq. 11.1).

2. The indices of the iterative equation defined the multidimensional computation domain 𝒟. The facets and vertices of this domain were studied in Sections 11.3 and 11.4.

3. We identified the dependence matrix A associated with each variable of the algorithm in Section 11.5. Based on this matrix, we identified its nullvectors, which represent the broadcast subdomain B of the variable. We were also able to identify the intersection points of B with 𝒟. These intersection points help in supplying input variables or extracting output results. At this stage, we can decide whether to broadcast or to pipeline our variables. ...
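
The three steps above can be illustrated with a minimal Python sketch. It rests on assumptions not taken from the chapter: the product is written as C(i, j) = sum over k of X(i, k) Y(k, j), the names C, X, and Y and the sizes I, J, K are hypothetical, and each dependence matrix is assumed to simply select the two indices its variable uses. Under that assumption the nullvectors are unit vectors, so the sketch only searches unit vectors rather than computing a general null space.

```python
from itertools import product

# Hypothetical sizes: C (I x J) = X (I x K) * Y (K x J).
I, J, K = 2, 3, 4

# Computation domain D: every index point (i, j, k) the iteration visits.
domain = set(product(range(I), range(J), range(K)))

# Assumed dependence matrices: each row selects one index the variable uses.
dependence = {
    "C": [[1, 0, 0], [0, 1, 0]],  # C(i, j)
    "X": [[1, 0, 0], [0, 0, 1]],  # X(i, k)
    "Y": [[0, 0, 1], [0, 1, 0]],  # Y(k, j)
}

def unit_nullvectors(A, n=3):
    """Return the unit vectors e of length n that satisfy A e = 0."""
    units = [tuple(int(r == c) for c in range(n)) for r in range(n)]
    return [e for e in units
            if all(sum(a * x for a, x in zip(row, e)) == 0 for row in A)]

def broadcast_subdomain(point, e):
    """Points of D on the line through `point` along nullvector e, in order."""
    step = lambda p, s: tuple(a + s * b for a, b in zip(p, e))
    p = tuple(point)
    while step(p, -1) in domain:        # back up to where the line enters D
        p = step(p, -1)
    line = [p]
    while step(line[-1], 1) in domain:  # walk forward until the line leaves D
        line.append(step(line[-1], 1))
    return line

for name, A in dependence.items():
    print(name, unit_nullvectors(A))

# The first and last points are where this broadcast subdomain meets the
# boundary of D, i.e. candidate points for feeding inputs or reading outputs.
line = broadcast_subdomain((0, 1, 2), (0, 0, 1))
print(line[0], line[-1])
```

In this sketch the instance C(0, 1) is associated with every point (0, 1, k) of the domain, so it can either be broadcast to all of those points at once or pipelined from one point to the next along the nullvector.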
