16.1 FLOATING-POINT SYSTEM DEFINITION

Assume that a set of real numbers x belonging to the interval

[equation image missing from source]

is represented in such a way that the following specifications are satisfied:

d1 is the maximum distance between small exactly-represented nonzero numbers;

d2 is the maximum distance between large exactly-represented numbers;

xmin is the maximum distance between 0 and the smallest exactly-represented numbers,

where the adjectives small and large refer to the absolute value of the corresponding numbers.

Every number x will be represented in the form $\pm s \cdot b^{e}$, with b ≥ 2, s being the significand and e the exponent.

In order to make the implementation of the arithmetic operations easier (Section 16.2), the two following conditions must be satisfied:

  1. The significand s is represented in base B = b.
  2. The significand belongs to the interval

[equation image missing from source]

Thus x is expressed in the form

[equation image missing from source]
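
As an illustration of the significand/exponent form, the following Python sketch splits a nonzero real number into a sign, a significand s, and an exponent e for a given base b. The normalization 1 ≤ s < b is an assumption of this sketch (the interval given in the text is not available here), and the function name `decompose` is hypothetical.

```python
import math


def decompose(x, b=2):
    """Return (sign, s, e) such that x == sign * s * b**e.

    Assumption of this sketch: the significand is normalized to 1 <= s < b.
    """
    if x == 0:
        raise ValueError("zero has no normalized significand")
    sign = -1 if x < 0 else 1
    mag = abs(x)
    # First guess for the exponent from the base-b logarithm.
    e = math.floor(math.log(mag, b))
    s = mag / b**e
    # The logarithm may be off by one near exact powers of b; correct it.
    if s >= b:
        s /= b
        e += 1
    elif s < 1:
        s *= b
        e -= 1
    return sign, s, e
```

For instance, `decompose(6.0)` yields a significand of 1.5 and an exponent of 2 in base 2, since 6 = 1.5 · 2^2.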

The values of p, emin, and emax are chosen in such a way that

[equation image missing from source]

[equation image missing from source]
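
To make the role of p, emin, and emax more concrete, here is a minimal Python sketch that computes, for a toy system, the smallest and largest positive representable numbers and the spacing between neighbours at both ends of the range. These are quantities of the kind that the specifications d1, d2, and xmin constrain. It rests on assumptions not confirmed by the text above: the significand has one integer digit and p fractional digits in base b, and it is normalized to 1 ≤ s < b. The function names and dictionary keys are hypothetical.

```python
from fractions import Fraction


def system_parameters(b, p, emin, emax):
    """Characteristic quantities of a toy floating-point system.

    Assumptions of this sketch: normalized significand 1 <= s < b
    with p fractional base-b digits, exponent in [emin, emax].
    """
    ulp = Fraction(1, b**p)  # weight of the last significand digit
    return {
        "x_min": Fraction(b) ** emin,               # smallest positive number
        "x_max": (b - ulp) * Fraction(b) ** emax,   # largest number
        "gap_small": ulp * Fraction(b) ** emin,     # spacing near x_min
        "gap_large": ulp * Fraction(b) ** emax,     # spacing near x_max
    }


def positive_values(b, p, emin, emax):
    """Enumerate every positive representable number in increasing order."""
    values = []
    for e in range(emin, emax + 1):
        # Integer m encodes the significand s = m / b**p, with 1 <= s < b.
        for m in range(b**p, b ** (p + 1)):
            values.append(Fraction(m, b**p) * Fraction(b) ** e)
    return values
```

With b = 2, p = 3, emin = -2, and emax = 2, the enumeration contains 40 positive values ranging from 1/4 to 15/2, with neighbours 1/32 apart at the small end and 1/2 apart at the large end. In a design flow, one would increase p or widen the exponent range until these quantities meet the required specifications.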

Example 16.1 Define a floating-point representation system ...
