Chapter 8

Floating Point Arithmetic

Abstract

The chapter discusses the handling of integer and floating point numbers in a digital computer. Integers are stored using two’s-complement notation with p bits. The positive integers have a zero in the left-most bit and the binary representation of the integer in the remaining p − 1 bits. The negative integers begin with −1 = 111…111 and end with −2^(p−1) = 1000…000, so they all have a left-most bit of 1. The negative of an integer is computed using the formula 2comp(n) = 1comp(n) + 1, where 1comp(n) flips every bit of n. The range of the integer representation is −2^(p−1) ≤ n ≤ 2^(p−1) − 1. Subtraction is performed as x − y = x + 2comp(y). The sum of two positive or two negative integers can overflow, and ...
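The two’s-complement rules in the abstract can be illustrated with a minimal Python sketch. It is not taken from the book: the function names (ones_complement, twos_complement, to_signed, subtract) and the choice of p = 8 in the usage note are assumptions made here for illustration, and Python’s unbounded integers are masked to p bits to imitate a fixed-width register.

```python
def ones_complement(n: int, p: int) -> int:
    """Flip every bit of the p-bit pattern n (hypothetical helper)."""
    mask = (1 << p) - 1
    return ~n & mask


def twos_complement(n: int, p: int) -> int:
    """Apply 2comp(n) = 1comp(n) + 1, keeping only the low p bits."""
    mask = (1 << p) - 1
    return (ones_complement(n, p) + 1) & mask


def to_signed(bits: int, p: int) -> int:
    """Interpret a p-bit pattern as a signed two's-complement integer."""
    if bits & (1 << (p - 1)):      # left-most bit is 1: negative value
        return bits - (1 << p)
    return bits                    # left-most bit is 0: non-negative value


def subtract(x: int, y: int, p: int) -> int:
    """Compute the p-bit pattern of x - y as x + 2comp(y)."""
    mask = (1 << p) - 1
    return (x + twos_complement(y, p)) & mask
```

With p = 8, twos_complement(1, 8) gives the pattern 11111111, which to_signed reads back as −1, and to_signed(subtract(3, 5, 8), 8) gives −2. Overflow shows up the same way: adding the patterns for 100 and 100 and masking to 8 bits yields 11001000, which to_signed interprets as −56 because 200 exceeds the largest representable value, 127.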


Numerical Linear Algebra with Applications by William Ford
