Algorithms and Parallel Computing

11.12 SUMMARY OF WORK DONE IN THIS CHAPTER

At this stage, we have completely specified the reduced computation domain associated with the matrix–matrix multiplication algorithm. The points of this domain could represent the concurrent threads required for a software implementation or the processing elements (PEs) required for a systolic array hardware implementation. Below we summarize what we have done and why:

1. We started by expressing matrix multiplication as an iterative equation (Eq. 11.1).

2. The indices of the iterative equation defined the multidimensional computation domain. The facets and vertices of this domain were studied in Sections 11.3 and 11.4.

3. In Section 11.5, we identified the dependence matrix A associated with each variable of the algorithm. From this matrix we found its nullvectors, which span the broadcast subdomain B of the variable. We also identified the points where B intersects the computation domain. These intersection points are where input variables can be supplied or output results extracted. At this stage, we can decide whether to broadcast or to pipeline each variable. ...
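The steps summarized so far can be illustrated with a minimal Python sketch. It is not the chapter's own code. It assumes the index convention m3(i, k) = sum over j of m1(i, j) m2(j, k) and the index order p = (i, j, k), which may differ from Eq. 11.1, and all function and variable names are hypothetical. The sketch also assumes that each variable is indexed directly by a subset of the loop indices, so that every row of its dependence matrix is a unit vector and its nullvectors are the unit vectors along the unused indices.

```python
from itertools import product

INDICES = ("i", "j", "k")  # assumed order of a point p = (i, j, k)

# Assumed convention: m3[i][k] = sum over j of m1[i][j] * m2[j][k]
VARIABLE_INDICES = {"m1": ("i", "j"), "m2": ("j", "k"), "m3": ("i", "k")}


def unit_vector(index_name):
    """Unit vector along one index of the computation domain."""
    return [1 if name == index_name else 0 for name in INDICES]


def dependence_matrix(var_indices):
    """One unit row per index that the variable depends on."""
    return [unit_vector(idx) for idx in var_indices]


def nullvectors(var_indices):
    """Unit vectors along the indices that the variable does not use."""
    return [unit_vector(idx) for idx in INDICES if idx not in var_indices]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def computation_domain(n_i, n_j, n_k):
    """All integer points (i, j, k) visited by the iterative algorithm."""
    return list(product(range(n_i), range(n_j), range(n_k)))


def broadcast_subdomain(var_indices, point, domain):
    """Domain points that share the variable instance used at `point`."""
    a = dependence_matrix(var_indices)
    target = matvec(a, point)
    return [q for q in domain if matvec(a, q) == target]


def matmul(m1, m2):
    """Iterative matrix multiplication: one update per domain point."""
    n_i, n_j, n_k = len(m1), len(m2), len(m2[0])
    m3 = [[0] * n_k for _ in range(n_i)]
    for i, j, k in computation_domain(n_i, n_j, n_k):
        m3[i][k] += m1[i][j] * m2[j][k]
    return m3


if __name__ == "__main__":
    domain = computation_domain(2, 3, 4)
    for name, idx in VARIABLE_INDICES.items():
        print(name, dependence_matrix(idx), nullvectors(idx))
    line = broadcast_subdomain(VARIABLE_INDICES["m3"], (1, 0, 2), domain)
    # The first and last points are where the line meets the domain boundary.
    print(line[0], line[-1])
```

Under these assumptions, the broadcast subdomain of each variable is a line of points parallel to its nullvector, and the two ends of that line are the intersection points at which an input could be fed in or an output read out.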


Algorithms and Parallel Computing by Fayez Gebali
