# Next-Generation Arithmetic

Tim Fernandez-Hart

Rarely do we give much thought to how numbers are represented in computers. The closest we might get is when using a C-like language that forces us to declare the type of our variables at compile time. Even then, the details of how each type is represented in bits are hidden and generally taken for granted.

Typically, most programmers use integers and floating-point numbers. Integers are whole numbers; they are fast, but for a given word length they can represent only a small range of values. Floating-point numbers were introduced as an improvement on this. They represent numbers in a scientific-notation-like format and have a regular structure.

Figure 1 shows an IEEE-754, 32-bit floating-point number. The MSB on the left (bit 31) is the sign bit (blue). If it is a 0 the number is positive; if it is a 1 the number is negative. The next 8 bits, left to right (green), are the exponent, which again can be positive or negative (you need to subtract a "bias", 127 for this format, before you know its value). The remaining 23 bits encode the fraction. Floating-point numbers have a huge dynamic range and high precision, and are the go-to format for most scientific computing.
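The decomposition described above can be sketched in Python using the standard library (an illustrative helper, with hypothetical naming):

```python
import struct

def decompose_float32(x):
    """Split a float into IEEE-754 single-precision sign, exponent, fraction."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # bit 31: 0 positive, 1 negative
    exponent = (bits >> 23) & 0xFF   # bits 30-23, stored with a bias of 127
    fraction = bits & 0x7FFFFF       # bits 22-0
    return sign, exponent - 127, fraction

# -6.5 is -1.101 (binary) x 2^2, so sign = 1, unbiased exponent = 2
print(decompose_float32(-6.5))  # (1, 2, 5242880)
```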

However, the format has several disadvantages, particularly in hardware. For example, zero can be negative or positive, and both have to be checked for. As a fixed format, you are also stuck with the number of exponent bits, and most real-world problems have no need for numbers as large as 2 × 10^308. But worst of all is the sheer quantity of NaN bit patterns: there are quadrillions of them in a 64-bit double-precision float. These bit patterns could have been used to represent values, so the format is extremely wasteful.
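Both quirks are easy to demonstrate. A minimal Python sketch (the NaN count here is for 32-bit floats, to keep the numbers small):

```python
import struct

def bits(x):
    """Return the raw 32-bit pattern of a float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

# Two distinct bit patterns for zero, which must still compare equal:
print(hex(bits(0.0)), hex(bits(-0.0)))  # 0x0 0x80000000
print(0.0 == -0.0)                      # True

# Any pattern with all exponent bits set and a nonzero fraction is a NaN.
# For 32-bit floats that is 2 * (2**23 - 1) wasted patterns:
nan_patterns = 2 * (2**23 - 1)
print(nan_patterns)  # 16777214
```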

Posits are a new number representation designed to solve these problems. They were introduced by John Gustafson in 2017 as a drop-in replacement for floating-point numbers and were ratified by the Posit Working Group in 2022. They are parametrized by two numbers, *n* and *es*: *n* is the total number of bits and *es* is the size of the exponent. Typically, *es* will be 2 for a 32-bit number. Posits have only two exceptional values: 0, and NaR (Not-a-Real), which stands in for both ±infinity and NaN.

As you can see from Figure 3, similarly to floating-point numbers, posits retain the sign bit, fraction bits, and an exponent. But they add a new set of bits called the *regime* bits which act like a super exponent to scale the fraction along with the exponent.

Whilst posits generally simplify the implementation, they complicate it in one way: no bit position other than the sign bit is fixed. This happens because the regime bits can expand dynamically to encode a very large or very small number, pushing the fraction bits, and even the exponent bits, out of the word. The advantage is that when the number's magnitude is close to 1, the regime bits shrink, giving more space over to the fraction. This gives posits the curious feature of increasing both their dynamic range **and** their precision with the addition of an extra bit to the format.

Decoding the regime bits is done by counting the leading zeros for a negative regime, or the leading ones for a positive regime. The run of bits is terminated by the opposite value, e.g. 00001, or 11111110. The regime value *r* is given by:

*r* = −*k* for a run of 0's, and *r* = *k* − 1 for a run of 1's

where *k* is the number of leading 1's/0's. The regime then scales the number by *useed*^*r*, where *useed* = 2^(2^*es*). Table 1 gives some examples.
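The counting rule can be sketched in Python (an illustrative helper, not part of any posit library):

```python
def regime_value(bits_str):
    """Decode the regime run at the front of a posit bit string.

    A run of k ones gives r = k - 1; a run of k zeros gives r = -k.
    Returns (r, bits_consumed), where bits_consumed includes the
    terminating opposite bit if one is present.
    """
    lead = bits_str[0]
    k = len(bits_str) - len(bits_str.lstrip(lead))  # length of the run
    r = (k - 1) if lead == "1" else -k
    return r, min(k + 1, len(bits_str))

print(regime_value("00001"))     # (-4, 5)
print(regime_value("11111110"))  # (6, 8)
```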

Putting it all together, consider a 16-bit posit with *es* = 1, so that *useed* = 2^(2^1) = 4:

- Sign bit = 0, so the number is positive
- Regime bits = 10, a run of one 1, so *r* = *k* − 1 = 0
- Exponent bit = 1
- Fraction value = 2 + 32 + 256 + 2048 = 2338
- Fraction length = 12 bits

The value is then *useed*^*r* × 2^*e* × (1 + fraction/2^12) = 4^0 × 2^1 × (1 + 2338/4096) = 3.1416015625 ≈ π.
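Assuming the fields above describe a 16-bit posit with *es* = 1 (consistent with useed = 4), the decode can be checked in a few lines of Python:

```python
# Field values read off the example: sign 0, regime "10", one exponent bit,
# and a 12-bit fraction. useed = 2^(2^es) with es = 1.
useed = 2 ** (2 ** 1)  # 4
r = 0                  # regime "10": run of one 1, so r = k - 1 = 0
e = 1                  # the single exponent bit
f = 2338               # fraction bits 100100100010
flen = 12

value = (useed ** r) * (2 ** e) * (1 + f / 2 ** flen)
print(value)  # 3.1416015625
```

Working through the arithmetic: 2338/4096 = 0.5708…, so the fraction term is 1.5708…, and doubling it gives 3.1416015625, a close 16-bit approximation of π.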