
# Data Representation and Encoding

In this chapter we study the techniques used to encode all the information used by a processor using only bits, that is, 0 and 1. Digital circuits are capable of manipulating groups of bits. The problem then is to decide how many bits are required to encode certain information and how such encoding is performed.

## Binary Logic

Digital electronic circuits are capable of manipulating signals in two
possible states: zero and one. As a consequence, all the information processed
by these circuits needs to be first encoded using sets of signals called
*bits*. A digital circuit is then a system capable of receiving data encoded
with a set of bits (inputs), performing some operation, and producing another set of
signals (outputs) with the result. A microprocessor is a complex digital
circuit capable of executing a previously defined set of instructions encoded
with bits and known as Machine Language.

But in order for a microprocessor to execute these instructions, all their elements (numbers, symbols, operands, etc.) need to be encoded with binary logic. This encoding is the basis for understanding how a computer, which is made mostly of digital circuits capable of manipulating bits, can perform much more complex tasks such as allowing you to browse the net, make a phone call, play a game, listen to music, read your heart rate, etc.

In the following sections we will manipulate numbers in different bases, mostly
in base 10, base 8 (or octal) and base 16 (or hexadecimal). You will quite
frequently need to translate numbers between those bases. Even though the steps to
perform these translations will be explained, it is **very helpful** to carry out
these operations with a calculator, or more precisely, a **programmer's
calculator**, which offers the operations typical of this context.

These programmer's calculators usually come installed by default in the most popular operating systems. The following figure shows the ones currently present in Microsoft Windows, MacOS and some distributions of Linux.

### Properties of a binary encoding

Using a single bit, only two elements can be represented, one for each of the possible values 0 and 1. If we use a sequence of two bits, the number of possible combinations increases to four: 00, 01, 10, and 11. For every bit we add to the sequence, the number of possible combinations doubles, because we can repeat the existing combinations twice, prefixing a one to the first group and a zero to the second group.

If a single bit can encode only two elements, and for each additional bit we include in a sequence we double the number of combinations, in general with \(n\) bits we can encode up to \(2^n\) elements.

The previous formula returns the number of possible combinations when using a sequence of \(n\) bits, but if we have a set with \(N\) elements, how many bits are needed to encode its elements in binary logic? For example, suppose the elements are \(\{red, green, blue, cyan, magenta\}\); is it possible to encode them in binary logic with two bits? With three? With four? The answer is that when encoding a set with \(N\) elements, the number of possible combinations must be larger than or equal to the number of elements. In other words, the number of bits \(n\) required to encode the elements of a set with \(N\) elements must satisfy

\[2^n \ge N \qquad (1)\]

Or alternatively, given a set with \(N\) elements, the minimum number of bits required to encode its elements with binary logic is

\[n = \lceil \log_2 N \rceil \qquad (2)\]

where the symbols ⌈⌉ denote the ceiling operation, that is, the smallest integer larger than or equal to the obtained logarithm. Going back to the previous example, the number of bits required to encode the set of five elements satisfies \(n \ge \log_{2}5\), and therefore \(n \ge 2.3219281\), so we need at least 3 bits.

This inequality is useful to know the **minimum number of bits** to encode a
set, but it does not provide any statement about the maximum. When encoding a
set of elements, using more bits than the minimum provided by equation
(2) is perfectly feasible. An encoding may have a number of
combinations larger than the number of elements in the set, thus leaving some
of them unused.

The two previous equations can be summarized as two rules to take into account when encoding a set of elements using binary logic:

- With \(n\) bits we can encode up to a maximum of \(2^n\) elements.
- To encode the \(N\) elements in a set we need at least \(\lceil\log_2 N\rceil\) bits.
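These two rules can be checked with a short computation. The sketch below is only illustrative; the function name `bits_needed` is ours, not standard terminology:

```python
import math

def bits_needed(num_elements: int) -> int:
    """Minimum number of bits needed to encode a set of num_elements elements."""
    return math.ceil(math.log2(num_elements))

# The five-colour set {red, green, blue, cyan, magenta} from the text:
print(bits_needed(5))   # 3 bits, since 2^2 = 4 < 5 <= 8 = 2^3
print(2 ** 3)           # 8 combinations, so 3 of them remain unused
```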

Once the number of bits has been decided, you need to define a **relationship**
between each element in the set and a concrete sequence of bits. Each element
must have at least one binary representation, and each sequence of bits must
correspond with one element in the set. This **relationship** must satisfy some
minimum requirements to be usable by digital circuits. These circuits can only
manipulate sequences of bits up to a certain length which needs to be decided
beforehand. For example, if a circuit must operate with natural numbers, you
need to decide which subset of those numbers will be used and from there, the
number of bits required for their encoding.

## Representing numbers in different bases

Before studying how to encode the elements of different sets so that they can be
manipulated by a computer system, it is very useful to study how to represent
natural numbers in different bases. Conventionally, we write natural numbers
using 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8 and 9) in what is known as *base
10*. Given a number written in base 10, the least significant digit, written in
the rightmost position, corresponds with the units, and the digit in the
leftmost position is the most significant. We will use numbers in base 10,
which are the ones normally used, to explain how to generalize this
representation to any base. Any number represented in a base \(b\) satisfies
the following conditions:

- \(b\) digits are used to represent numbers. The digits go from 0 to \(b - 1\).
- The number represented by \(b - 1\) is followed by the number 10.
- Analogously, the maximum number with \(d\) digits is followed by one with \(d + 1\) digits in which the most significant is 1 and the rest are all zeros.

These conditions are satisfied by numbers encoded in base 10 using digits 0 to 9. The number following 9 is 10, and the maximum number represented by, for example, 4 digits (which is 9999) is followed by a 1 and four zeros (10000). The value of a sequence of digits is obtained by multiplying each of them by the base raised to the exponent denoting its position, with the least significant position being zero, as shown in the following example.

Example

The same number can be re-written using the base raised to the appropriate exponent as follows:

The general formula to obtain the value of any number in any base can be written using \(d_i\) to represent the digit in position \(i\) of the number. The formula applies to any base; thus, the value of any n-digit number in base \(b\) is obtained by the following equation:

\[N = \sum_{i=0}^{n-1} d_i \cdot b^i \qquad (3)\]

Example

The equivalent in base 10 of the number 6342 in base 7 can be obtained by using equation (3):

\[6342_7 = 6 \cdot 7^3 + 3 \cdot 7^2 + 4 \cdot 7^1 + 2 \cdot 7^0 = 2058 + 147 + 28 + 2 = 2235\]

Thus, the number 6342 in base 7 corresponds to 2235 represented in base 10. As you can see, the number in base seven does not have any digit bigger than 6, because only the digits from 0 to 6 are allowed.

When writing numbers in different bases together we have an ambiguity problem. For example, the number 2235 in base 10 from the previous example is the equivalent of the number 6342 in base 7. In order to distinguish a number from the same sequence of digits in another base, the base is included at the right side of the number as a subscript. Thus we can write

\[6342_7 = 2235_{10}\]

### Translating numbers to different base encodings

Equation (3) allows us to obtain the value in base 10 of a number written in any base. The opposite process, that is, obtaining the equivalent in a different base of a number given in base 10, requires repeated divisions by the base to obtain the digits of the new number starting with the least significant. The process is easily shown with an example. Suppose you want to calculate the equivalent in base 7 of the number \(8675_{10}\). By equation (3), the digits of the new representation must satisfy

\[8675 = d_4 \cdot 7^4 + d_3 \cdot 7^3 + d_2 \cdot 7^2 + d_1 \cdot 7^1 + d_0 \cdot 7^0 \qquad (4)\]

If equation (4) is divided by the base, we obtain as remainder the least significant digit \(d_0\) of the representation. The quotient contains the rest of the digits, and if we keep dividing by the base, the rest of the digits in the new base are obtained.

Example

Consider the number \(8675_{10}\). To obtain its representation in base 7 we perform the first division by 7 to obtain a remainder of 2, which is therefore the least significant digit. The quotient obtained is 1239. When divided again by 7, the remainder is 0, and the new quotient is 177. If this operation is repeated, the successive remainders are the digits of the number in base 7, as shown in the following equations:

\[8675 = 1239 \cdot 7 + 2\]
\[1239 = 177 \cdot 7 + 0\]
\[177 = 25 \cdot 7 + 2\]
\[25 = 3 \cdot 7 + 4\]

The successive divisions by the base guarantee that eventually a quotient with a value less than the base is obtained. When this occurs there is no need for more divisions: that quotient is the most significant digit. The representation in base 7 can be used to apply equation (3) and obtain again the number in base 10:

\[3 \cdot 7^4 + 4 \cdot 7^3 + 2 \cdot 7^2 + 0 \cdot 7^1 + 2 \cdot 7^0 = 7203 + 1372 + 98 + 0 + 2 = 8675\]

and therefore

\[8675_{10} = 34202_7\]

In summary, given a number in base 10, its representation in another base \(b\) is obtained by collecting the remainders of successive divisions by the new base until the quotient is less than the base. The digits are obtained from least to most significant.
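This procedure of successive divisions can be sketched in a few lines of Python. The helper name `to_base` is ours, and the sketch handles only non-negative numbers:

```python
def to_base(n: int, base: int) -> str:
    """Convert a non-negative base-10 integer to `base` by repeated division."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(str(remainder))   # remainders are the digits, least significant first
    return "".join(reversed(digits))

print(to_base(8675, 7))   # '34202', matching the worked example above
```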

## Encoding natural numbers

The set of elements that is simplest to encode in binary logic to be manipulated by a computer is the set of natural numbers \(\mathbb{N}\). The representation corresponds with the numbers encoded in base 2. The digits in this base correspond with the two values that can be manipulated by digital circuits. Thus, given a binary number with n digits (or equivalently, n bits), its equivalent in base 10 is obtained by applying equation (3) in base 2:

\[N = \sum_{i=0}^{n-1} d_i \cdot 2^i\]

But since the only possible values for \(d_i\) are 0 and 1, the previous equation can be interpreted in a simplified way. Given a number in base 2, its equivalent in base 10 is obtained adding those powers of 2 with exponents corresponding to the place in which there is a 1.

Example

Consider the binary number \(1101011_2\). Its equivalent in base 10 is obtained as

\[2^6 + 2^5 + 2^3 + 2^1 + 2^0 = 64 + 32 + 8 + 2 + 1 = 107\]

Thus, when manipulating numbers in binary format and obtaining their equivalent in decimal, it is important to get familiar with the values of the powers of two shown in the following table:

Bit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Weight | \(2^0\) | \(2^1\) | \(2^2\) | \(2^3\) | \(2^4\) | \(2^5\) | \(2^6\) | \(2^7\) | \(2^8\) | \(2^9\) | \(2^{10}\) | \(2^{11}\) | \(2^{12}\) | \(2^{13}\) | \(2^{14}\) | \(2^{15}\) |
Decimal | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 2048 | 4096 | 8192 | 16384 | 32768 |

The following tables show a few examples of how to obtain the representation in base 10 of three 8-bit binary numbers:

Weight | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | Total |
---|---|---|---|---|---|---|---|---|---|
Binary | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | |
Base 10 | | | 32 | | | 4 | 2 | 1 | 39 |

Weight | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | Total |
---|---|---|---|---|---|---|---|---|---|
Binary | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | |
Base 10 | 128 | | 32 | 16 | | | 2 | | 178 |

Weight | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | Total |
---|---|---|---|---|---|---|---|---|---|
Binary | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | |
Base 10 | 128 | 64 | | 16 | 8 | | | 1 | 217 |

The opposite procedure, converting a number in base 10 to binary, is done following the procedure explained in Section Representing numbers in different bases. In this case, the number is divided by 2, and therefore the remainder can only be 0 or 1, producing the bits of the binary representation.

Example

The representation in binary of the number 217 in base 10 is obtained through the following divisions:

\[217 = 108 \cdot 2 + 1\]
\[108 = 54 \cdot 2 + 0\]
\[54 = 27 \cdot 2 + 0\]
\[27 = 13 \cdot 2 + 1\]
\[13 = 6 \cdot 2 + 1\]
\[6 = 3 \cdot 2 + 0\]
\[3 = 1 \cdot 2 + 1\]

The digits of the binary result are obtained from least to most significant, thus the result is \(11011001_2\).

The representation in base 2 has several properties that are very useful when operating with these numbers. To know if a number is odd or even, it is enough to check the least significant bit. If it is 1, the number is odd; otherwise, it is even. This is because any binary number is obtained by adding powers of 2, and all powers of 2 are even numbers except the first one, \(2^0\), which is precisely the value of the least significant bit. Thus, whether a binary number is odd or even depends only on this bit.

The second property applies to any base. Integer multiplication or division by the base is done by appending a zero as the least significant digit or removing the least significant digit, respectively. For numbers in base 10, multiplying by 10 is done by appending a zero as the least significant digit. Analogously, if a number in base 10 is divided by 10, the quotient is obtained by ignoring the least significant digit, which itself corresponds with the remainder of the division.

In binary logic, multiplication and division by 2 correspond respectively to appending a zero as the least significant bit, or removing the least significant bit. For example, the binary number \(100111_2\), which corresponds with \(39_{10}\), if multiplied by 2 results in \(1001110_2\), which represents 78 in base 10. Similarly, if the same number is divided by two, its quotient is \(10011_2\), which represents \(19_{10}\), and the remainder is 1 (\(39 = 19 \cdot 2 + 1\)).
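In Python these two properties correspond directly to the bitwise shift operators; a short illustrative sketch:

```python
n = 0b100111          # 39 in base 10
print(n % 2)          # 1: the least significant bit, so 39 is odd
print(bin(n << 1))    # '0b1001110' = 78: appending a 0 multiplies by 2
print(bin(n >> 1))    # '0b10011'   = 19: dropping the last bit divides by 2
print(n & 1)          # 1: the dropped bit is the remainder of the division
```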

## Encoding in bases 8 and 16

Encoding in base 8, also known as *octal*, might not look very useful for
handling digital circuits, but it satisfies a special property that makes it
important. Applying the concepts presented in the previous section, numbers
encoded in this base use digits 0 to 7. Number 7 is followed by 10, and 77
by 100. To translate a number from base 10 to base 8 we proceed with the
successive divisions by 8 to obtain the digits of the new representation. But
since 8 is a power of 2 (\(2^3\)), the successive divisions can easily be
applied to a number represented in binary. In other words, to translate a
binary number to base 8, there is no need to first calculate its value in base
10 and then divide successively by 8; the divisions can be trivially performed
directly on the binary number. How?

The quotient and remainder obtained when dividing a binary number by eight can be read directly. The three least significant bits represent the remainder, and the rest of the bits (after the three least significant have been removed) are the new quotient. Thus, the translation of a binary number to octal consists simply of creating groups of three bits starting from the least significant, and translating each group to its corresponding value between 0 and 7 as shown in the following table:

Octal digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
Binary | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |

If the number of bits in the binary number is not a multiple of three, in other words, the last group has one or two bits, the group is completed up to three bits with zeros in the most significant positions (because they do not alter the value of the number). The following table shows examples of how this translation is performed:

Number in binary | Groups of 3 bits | Number in octal |
---|---|---|
\(00100111_2\) | 000 = 0, 100 = 4, 111 = 7 | \(47_8\) |
\(10110010_2\) | 010 = 2, 110 = 6, 010 = 2 | \(262_8\) |
\(11011001_2\) | 011 = 3, 011 = 3, 001 = 1 | \(331_8\) |

Given how simple the process is to translate from base 2 to base 8 and vice versa, base 8 is actually used as a more compact way to represent binary numbers. A long sequence of zeros and ones is written as its base 8 equivalent, and is therefore much shorter. Additionally, instead of representing the number with the subscript 8 following it, there is the convention of adding a zero as the leftmost digit. Thus, the numbers \(47_8\) and \(047\) both represent the same number in octal, which corresponds with the number \(100111\) in binary.
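Most programming languages support octal literals and conversions directly. As a small illustration in Python (note that modern Python writes octal literals with the prefix `0o` rather than a leading zero):

```python
n = 0b00100111        # 39 in base 10
print(oct(n))         # '0o47': the octal representation from the table above
print(f"{n:o}")       # '47': the bare octal digits
print(int("47", 8))   # 39: back to base 10
```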

But base 8 is not the only one with the property of being a power of two. The next base in increasing order that is also a power of two is base 16. Is it possible to write numbers in this base? Following the properties described in the previous section, we need as many digits as indicated by the base, starting from zero. So, aside from digits 0 to 9, we need six additional symbols to use as digits. The solution is to use the first six letters of the alphabet, A, B, C, D, E and F, as if they were digits. Thus, the 16 digits used to represent numbers in base 16 are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F.

Applying the same rules that were previously described, number F is followed by 10, number 19 is followed by 1A, 1F is followed by 20, 99 is followed by 9A, and the number before 100 is FF.

This encoding is also known as **hexadecimal**. Is it possible to make a direct
translation from a binary number to a hexadecimal number? The digits are
obtained by successive divisions by 16. But since 16 is also a power of two
(\(2^4\)), the remainder of the division is given by the four least significant
bits of the number, and the quotient by the remaining bits after removing the
remainder. Thus, to obtain the hexadecimal value of a binary number we group the
bits in groups of four starting from the least significant, and then translate
each group into its corresponding digit as shown in the following table.

Hex digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
Binary | 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | 0110 | 0111 |

Hex digit | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|
Binary | 1000 | 1001 | 1010 | 1011 | 1100 | 1101 | 1110 | 1111 |

As in the case of base 8, hexadecimal is typically used as a more compact representation of binary numbers. This base is a bit more convenient because, for the case of bytes, it only requires two digits (as opposed to three for base 8). To avoid confusion with the other representations, hexadecimal numbers are written with the prefix 0x. The following table shows examples of binary numbers and their corresponding hexadecimal representation.

Number in binary | Groups of 4 bits | Number in hexadecimal |
---|---|---|
\(00100111_2\) | 0010 = 2, 0111 = 7 | 0x27 |
\(10110010_2\) | 1011 = 11, 0010 = 2 | 0xB2 |
\(11011001_2\) | 1101 = 13, 1001 = 9 | 0xD9 |

The conversion from hexadecimal to base 10 is done identically to the rest of the bases. The digits A through F have values 10 to 15 respectively, and the base to use is 16.
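These conversions can be checked with Python's built-in functions; a small illustration:

```python
# Binary numbers from the table above, written as Python literals:
for n in (0b00100111, 0b10110010, 0b11011001):
    print(hex(n))         # '0x27', '0xb2', '0xd9' (Python prints hex digits in lowercase)

# Hexadecimal to base 10: digits A through F have values 10 to 15.
print(int("D9", 16))      # 217
```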

Example

The representation in base 10 of the following hexadecimal numbers is:

The following video summarizes how to encode numbers in bases 2, 8 and 16.

## Size of an encoding

In the previous sections you have seen how to encode natural numbers in binary. The representation of a number is done with a set of bits. But how many bits are really needed to represent natural numbers? Given that there are infinitely many of them, in principle the answer would be an infinite number of bits. However, digital circuits can only manipulate binary representations of numbers with a finite number of bits.

This limitation means that in addition to a technique to translate numbers to
binary, we must also decide the size of such encoding and what happens when
that size is not enough. For example, suppose we decide to encode natural
numbers with 10 bits. Only the numbers in the interval \([0, 2^{10} - 1]\)
can be represented. The last number of that interval is encoded as
1111111111 (or 0x3FF in hexadecimal). The problem appears when a circuit
manipulating these numbers has to perform the operation \(1023 + 1\). The
result can be obtained with no problem, but **it cannot be represented with
the given number of bits**. This situation, in which the result of an operation
cannot be represented due to the size of the encoding, is known as
**overflow**. The way microprocessors deal with it is simply to detect it
and raise an exception so that it can be treated by some specific code.
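A sketch of this detect-and-raise behaviour in Python, with the 10-bit limit checked explicitly (the helper `add_natural` is hypothetical, for illustration only, not how real hardware is programmed):

```python
BITS = 10
MAX_VALUE = 2**BITS - 1          # 1023, encoded as 1111111111 (0x3FF)

def add_natural(a: int, b: int) -> int:
    """Add two naturals, raising an exception on overflow of the 10-bit encoding."""
    result = a + b
    if result > MAX_VALUE:
        raise OverflowError(f"{a} + {b} does not fit in {BITS} bits")
    return result

print(add_natural(1000, 23))     # 1023, still representable
# add_natural(1023, 1) would raise OverflowError
```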

The number of bits used to encode natural numbers changes from circuit to
circuit and is one of the parameters that defines what is called its
*architecture*. The more digits used, the larger the interval of numbers that
can be represented, but at the same time the more complicated the circuits
needed to manipulate them. Thus, microprocessor designers have to take this
trade-off into account and find a size that is reasonable for the type of
calculations required, while keeping the circuits that operate on them from
becoming too complex. Through the history of microprocessor design, the typical
sizes used to represent naturals started at 8 bits (allowing to encode from 0
to 255) and grew all the way up to the 128 bits that can be found in more
advanced circuits.

The problem of the size of an encoding is not unique to the naturals. Any set with an infinite number of elements that needs to be encoded has the same problem. Given a number of bits, only a subset of values can be encoded, and the processor must detect and notify whenever the encoding of an element outside that interval is needed.

## Encoding integers

There are various schemes used to encode integers in binary logic. The
simplest of them is called *sign and magnitude*. This encoding is based on the
observation that any integer can be considered as a pair formed by a sign and a
natural number (representing its absolute value). For example, the number
-345 can be represented by the pair (-, 345). The absolute value of an
integer, by definition, is a natural number, and therefore we can use the
base 2 encoding described in the previous section. As for the sign,
because it is a set with two possible values, we simply use one bit: the
value 0 represents the positive sign, and 1 the negative sign.

As a consequence, the translation of an integer in base 10 to its representation in the sign and magnitude scheme is simple: encode the absolute value of the number in binary, and add the sign as an additional bit. Typically, the sign bit is the left-most, or most significant. The following table shows several integers and their encoding in sign and magnitude using 10 bits.

Example

Integers encoded with sign and magnitude using a 10 bit representation.

Number in decimal | Sign | Absolute Value | Sign and magnitude |
---|---|---|---|
-342 | – = 1 | 342 = 101010110 | 1101010110 |
342 | + = 0 | 342 = 101010110 | 0101010110 |
-23 | – = 1 | 23 = 10111 | 1000010111 |
Note that in the last example, the encoding of the number -23 requires its absolute value to be represented with 9 bits, and therefore the appropriate number of zeros is added as the most significant bits.

A sign and magnitude representation of size n bits allows encoding the integers in the interval \([-(2^{n - 1} - 1), 2^{n-1} - 1]\). This expression is derived from the fact that the highest number of the encoding starts with a zero (for the positive sign) followed by all ones. Analogously, the smallest value has a 1 in its leftmost bit (negative) followed by all ones. The following figure shows this range for a sign and magnitude representation using 8 bits.

But this encoding technique has an undesirable property. In binary encoding,
n bits allow for \(2^n\) combinations, but with sign and magnitude, only
\(2^n - 1\) distinct values are represented. The problem is that the number
zero has two representations: all zeros, and a 1 as the most significant bit
followed by all zeros. This means that an element of the initial set (in this
case the value zero as an integer) is represented by two bit combinations,
thus wasting the opportunity of encoding one more integer in that interval.
There are other encoding schemes that assign a single combination of bits to
the number zero. One of them is known as *2s complement*.

The 2s complement encoding scheme allows \(2^n\) consecutive integers around the value zero to be encoded with n bits. More precisely, the range of integers represented by n bits in 2s complement is \([-(2^{n-1}), 2^{n-1} - 1]\). The following figure shows the range of integers that can be encoded with 2s complement using 8 bits.

As you can see, the combination with eight ones represents the number -1.

The translation of an integer encoded in base 10 to a 2s complement with n bits is done with the following steps:

- If the number is larger than or equal to zero, simply translate it to the base 2 encoding by successively dividing by 2 and taking the remainders as the bits in increasing order of significance.
- If the number is negative, apply these three steps:
- Obtain the base 2 encoding of the absolute value of the number.
- In that representation, replace every 0 by a 1 and every 1 by a 0.
- Add 1 to the resulting number

For example, suppose you have to obtain the representation of the number -115 in 2s complement with 8 bits. We first obtain the representation in base 2 of the absolute value 115, which is 01110011 (or its equivalent in hexadecimal, 0x73). Then we switch ones to zeros and zeros to ones to obtain 10001100. Finally, we add 1 to the result, obtaining 10001101.
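The three steps can be sketched in Python; the helper name `to_twos_complement` is ours, and the sketch assumes the value fits in the given number of bits:

```python
def to_twos_complement(value: int, bits: int) -> str:
    """Encode an integer in 2s complement, following the steps above."""
    if value >= 0:
        return format(value, f"0{bits}b")
    magnitude = format(-value, f"0{bits}b")            # base 2 encoding of the absolute value
    inverted = "".join("1" if b == "0" else "0" for b in magnitude)
    return format(int(inverted, 2) + 1, f"0{bits}b")   # add 1 to the inverted bits

print(to_twos_complement(-115, 8))   # '10001101', as in the example
```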

The translation of an integer in 2s complement with n bits to its representation in base 10 is done with the following steps:

If the number is positive (that is, its most significant bit is zero), the number in base 10 is obtained by adding those powers of two for which the corresponding bit is one (as explained before).

If the number is negative (that is, its most significant bit is one), the base 10 value can be obtained by either of the following two methods:

Applying the equation:

\[ABS(N) - 2^n\]

where \(ABS(N)\) is the value obtained when translating the n bits directly into a base 10 number.

Applying the following three steps:

- Replace every 0 by a 1 and every 1 by a 0 in the n bits.
- Add 1 to the resulting number.
- Translate the resulting number to base 10 and take its value as a negative number.

Example

Consider the value 10110110, which represents an integer encoded in 2s complement with 8 bits. Its corresponding value in base 10 is calculated as:

\[ABS(10110110_2) - 2^8 = 182 - 256 = -74\]

Or, alternatively:

- Replace zeros by ones and ones by zeros: 01001001
- Add one: 01001010
- Translate the value to base 10 and take it as a negative number: \(-(2 + 8 + 64) = -74\)
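The first method, \(ABS(N) - 2^n\), can be sketched directly in Python (the helper name `from_twos_complement` is ours):

```python
def from_twos_complement(bits_str: str) -> int:
    """Decode a 2s complement bit string using ABS(N) - 2^n for negative values."""
    n = len(bits_str)
    value = int(bits_str, 2)               # translate the bits as a natural number
    if bits_str[0] == "1":                 # most significant bit set: negative number
        value -= 2 ** n
    return value

print(from_twos_complement("10110110"))   # -74, as in the example
print(from_twos_complement("01001010"))   # 74
```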

The following table shows the relationship between the representation of naturals and integers in 2s complement, where \(b_i\) denotes the bit in position \(i\).

Numbers | Representation in base 10 |
---|---|
Naturals \(N\in [0,2^n - 1]\) | \(N = \sum_{i=0}^{n-1} b_i 2^i\) |
Positive integers \(N \in [0, 2^{n-1} - 1]\) | \(N\) has 0 as most significant bit: \(N = \sum_{i=0}^{n-1} b_i 2^i\) |
Negative integers \(N \in [-2^{n-1}, -1]\) | \(N\) has 1 as most significant bit: \(N = \sum_{i=0}^{n-1} b_i 2^i - 2^n\) |

Adjusting the encoding of an integer in 2s complement to a scheme with a larger
number of bits requires different steps depending on the sign. If the number is
positive, the adjustment consists of adding zeros as the most significant bits
until the new length is reached. However, if the number is negative, adding
zeros as the most significant bits would change the sign. In this case,
extending the number by adding ones as the most significant bits keeps the
encoding valid in 2s complement. This operation is known as **bit extension**.
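Both cases amount to replicating the sign bit; a minimal sketch in Python (the helper name `extend` is ours):

```python
def extend(bits_str: str, new_length: int) -> str:
    """Extend a 2s complement encoding by replicating its sign bit."""
    return bits_str[0] * (new_length - len(bits_str)) + bits_str

print(extend("10001101", 16))   # '1111111110001101': -115 is still -115
print(extend("01110011", 16))   # '0000000001110011': 115 padded with zeros
```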

The 2s complement scheme for encoding integers has several properties that make
it very efficient. The problem of the double encoding of the value zero present
in the sign and magnitude scheme does not appear in this one: zero has a unique
representation, and it corresponds with its value in base 2. The most
significant (or leftmost) bit still represents the sign of the number with the
same correspondence as the one used in sign and magnitude: positive numbers
have their most significant bit set to zero, and negative numbers have that bit
set to one. The positive integers are encoded **exactly** as the natural
numbers, that is, with their encoding in base 2.

But the most interesting property of the 2s complement scheme is that the addition and subtraction operations can be done following base 2 rules as if the numbers were naturals. This property is very important because in the context of digital circuits it means that the same circuit used to add and subtract naturals can be used with no modifications to operate on integers encoded in 2s complement.
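This property can be checked by adding the raw bit patterns as naturals and keeping only the low n bits (discarding the final carry); an illustrative Python sketch:

```python
BITS = 8
MASK = 2**BITS - 1

def add_bits(a: int, b: int) -> int:
    """Add two 8-bit patterns exactly as a natural-number adder would."""
    return (a + b) & MASK            # keep only the low 8 bits, discard the carry

# 10110110 is -74 in 2s complement; 01100100 is 100.
result = add_bits(0b10110110, 0b01100100)
print(format(result, "08b"))         # '00011010' = 26 = 100 + (-74)
```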

The following video summarizes how to encode integers using sign and magnitude, and 2s complement scheme.

## Encoding Real Numbers with Floating Point Representation

Encoding real numbers using binary logic is significantly more complex than the
case of naturals and integers. As we have seen for naturals and integers, when
using an encoding with a fixed number of bits, there is a range of numbers
that can be represented. But in the case of the real numbers, there are
infinitely many values within that interval. Thus, the representation of real
numbers must be restricted to a certain interval **and** to certain values
within that interval. This feature of the encoding translates into several
issues that need to be solved when operating with these numbers.

Suppose that two real numbers are added, and the result **does not** correspond
with one of the values included in the encoding. This situation is perfectly
possible because only a subset of possible values is encoded. The only
possible solution in digital circuits is to **approximate** the result by the
closest number that can be represented. This means that when a digital circuit
manipulates real numbers it introduces **an error**. In principle, any real
number manipulated by a digital circuit may contain some error, which
in some cases could be zero.

This potential error derived from the representation of real numbers is especially important in those systems that perform massive calculations with real numbers, as the error may accumulate to the point of invalidating the results. There are programming techniques to address this issue and minimize the error obtained when manipulating real numbers.

Real numbers are encoded in binary logic with the technique known as **floating
point**. Every real number is represented as a set of digits to the left of the
point and another set of digits to the right of the point. The part to the
right of the point is called the *mantissa*. Multiplying and dividing by the base
moves the point to the right and left respectively. The following
example shows the equivalence of real numbers in base 10 when they are multiplied
and divided by the base.

Example

Effect of multiplying by the base with respect to the point.

By multiplying and dividing by the base, it is possible to represent any real number as a number in which the first non-zero digit is to the right of the point, times the base raised to the appropriate exponent.

Note

The floating point representation is similar to what is known as
*scientific notation*. The difference is that in scientific notation the
multiplications and divisions by the base are done to leave **only one
non-zero digit** to the left of the point. So, the difference between the
two notations is one unit in the exponent.

The advantage of the floating point representation is that it allows representing certain very big and very small numbers in a very compact form. The number \(0.1 \cdot 10^{-10}\) would require 11 digits if represented normally with a fixed point. However, in floating point it only requires representing the mantissa \(0.1\) and the exponent \(-10\).

The representation of real numbers using base 2 follows exactly the same rules as the case of the naturals with respect to the weight of each digit. The weights of the bits in the fractional part are given by decreasing negative powers of the base (\(2^{-1}, 2^{-2}, 2^{-3}\), etc.)

Example

The effect of multiplying and dividing by the base applies identically to base 2. In this case, the factor used is 2 raised to the appropriate power.

Example

Multiplying real numbers in base 2 by powers of 2

Real numbers in base 2 can be represented by first adjusting the mantissa so that the first non-zero digit is at the right of the point, and then multiplying by the appropriate power of the base. But this representation is still not suitable to be managed by digital systems. Remember that they can only manage zeros and ones. The technique to achieve the proper encoding is to encode separately the three ingredients: the sign of the mantissa, the mantissa itself, and the exponent (as the base is implicitly 2).

The sign of the mantissa can easily be encoded by a single bit following the
encoding used for the integers: zero represents the positive sign, and 1
represents the negative sign. The second element is the mantissa, which is
already in binary format and therefore needs no additional processing. The
exponent is an integer, and therefore we can encode it using any of the methods
previously described (sign and magnitude or 2s complement). In summary, a real
number is now represented by three *pieces*: one bit encoding the sign, a set
of bits encoding the mantissa, and a set of bits encoding an integer which is
the exponent of the floating point representation. The following figure shows
the structure of this representation as a sequence of bits.

In this type of encoding, it is not enough to know the total number of bits of the encoding; we also need to know how many of those bits are devoted to the mantissa, how many to the exponent, and how the exponent value is encoded (sign and magnitude, 2s complement, 1s complement, excess K, etc.)

### How do we encode zero?¶

In the previously described scheme we have assumed that the mantissa of a real
number is obtained by shifting the point so that the first non-zero digit is
to the right of the point. But what happens if there is **no non-zero digit**?
This is precisely the case when trying to represent the number zero, and it is
a special case that needs to be addressed when using the floating point
representation. Zero is typically encoded by assigning it a special
combination of bits in the mantissa and exponent, and checking for the
presence of that combination before performing any operation.

### Range, Accuracy and Precision of the Floating Point Representation¶

Two representations of real numbers with the same size, for example 14 bits, may represent different intervals and values depending on the number of bits assigned to encode the mantissa and the exponent (the sign of the mantissa is always represented by one bit). For example, in one scheme the 14 bits are allocated as 1 bit for the sign, 7 bits for the mantissa, and 6 bits for the exponent. If we assume that the exponent is encoded in 2s complement, then the range of possible values to represent is

This interval can be refined a bit further by stating the intervals of negative and positive non-zero numbers allowed:

Another possibility with the same size (14 bits) is to assign one bit for the sign, 5 bits for the mantissa, and 8 bits for the exponent. In this case, the intervals are

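The limits of such schemes can be checked with a short Python sketch. This sketch is not part of the text's materials, and it assumes the chapter's convention: a mantissa whose first non-zero bit is immediately to the right of the point (so its value lies in \([0.5, 1)\)) and a 2s complement exponent; the function name `float_range` is illustrative.

```python
def float_range(mantissa_bits, exponent_bits):
    """Approximate range limits of a floating point scheme where the
    mantissa is a value in [0.5, 1) and the exponent is a 2s-complement
    integer of the given width."""
    exp_max = 2 ** (exponent_bits - 1) - 1    # largest exponent
    exp_min = -(2 ** (exponent_bits - 1))     # smallest exponent
    largest = (1 - 2 ** -mantissa_bits) * 2 ** exp_max   # mantissa 0.111...1
    smallest = 0.5 * 2 ** exp_min                        # mantissa 0.100...0
    return smallest, largest

# The two 14-bit layouts discussed in the text (1 sign bit plus):
print(float_range(7, 6))   # 7-bit mantissa, 6-bit exponent
print(float_range(5, 8))   # 5-bit mantissa, 8-bit exponent
```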
As we can see, changing the assignment of bits changes the range of the representation. But what is the effect of the different distribution of bits between the mantissa and the exponent on the accuracy of the encoding? Let us explore it with two extreme cases of a 32 bit floating point representation.

Suppose that in one case the 32 bits are distributed as: 1 bit for the sign, 2
bits for the mantissa and 29 bits for the exponent. Such an encoding would cover
a range of very large and very small numbers (because the exponent can be high
in absolute value). However, for each possible exponent value, only two
mantissas are possible: \(10\) and \(11\) (remember that the mantissa
always has its most significant bit set to one). Therefore we would expect the
errors that appear when trying to encode numbers to be high, as only two bits
can be used for the mantissa. In this case the encoding has a **low precision**
in general.

Suppose now the other extreme. Out of the 32 bits used to encode real numbers,
one bit is used for the sign of the mantissa, 29 for the mantissa, and the
remaining two for the exponent. If the exponent is encoded in 2s complement,
then the only possible values are \([-2, -1, 0, 1]\). This means that only
a very narrow interval of numbers can be represented. However, within that
interval, the numbers can be represented with **better precision**.

As a consequence, two floating point encoding schemes with identical overall size may have different precision due to the distribution of the bits between the mantissa and the exponent.

What is the difference between accuracy and precision? Precision is a property of a generic encoding scheme. For example, we may compare the precision obtained by using 10 or 12 bits for the mantissa. Accuracy, however, refers to how close the binary encoding of a real number is to the real magnitude we want to encode. In other words, precision is a property of the overall encoding scheme, whereas accuracy is a property of one concrete real number and the binary encoding used to represent it.

### Overflow and Underflow in Floating Point Encoding¶

As in the case of the naturals and integers, when a number to be represented is
outside of the range of valid values an **overflow** is produced. But in the
case of the real numbers, this situation is a bit more complex. Aside from
numbers being greater or smaller than the extremes of the interval, another
special situation appears when trying to represent a very small number.

Suppose we take the smallest positive number representable by a floating point
scheme (whatever the overall size, size of the mantissa and size of the
exponent). What is the result of dividing that number by 3? Given that the
number is positive, the result is positive. Furthermore, since it is the
smallest positive number, the closest representable value to this result is
**zero**. But approximating a non-zero value by zero has a devastating effect
on some operations. Suppose the result of this operation is then multiplied by
a very large number; in that case, the result is zero, and therefore the error
is enormous. In fact, the consequence of approximating a non-zero number by
zero is that, from that point on, the error introduced by the floating point
representation is *unbounded*. This is the reason to treat this situation as an
anomaly known as **underflow**.
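This effect can be observed directly in Python, whose `float` type is an IEEE 754 binary64 value; the literal `5e-324` below is the smallest positive value of that type.

```python
# Smallest positive IEEE 754 binary64 value (a subnormal number).
tiny = 5e-324

# Dividing it by 3 produces a result closer to zero than to tiny,
# so it is rounded to zero: an underflow.
print(tiny / 3)            # 0.0

# Multiplying that zero by a huge number keeps the result at zero,
# so the error introduced by the approximation is unbounded.
print((tiny / 3) * 1e300)  # 0.0
```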

Both situations, underflow and overflow, are reported as exceptions while operating with real numbers, with the hope that some other program or application will take the appropriate measures. From the point of view of a computer system, though, the designers must include the required hardware to detect these situations when performing mathematical calculations.

Given the importance of defining the right floating point encoding for computer systems, the Institute of Electrical and Electronics Engineers (known simply as the IEEE) has defined a standard to encode floating point numbers known as the IEEE 754 format. The standard defines five basic representations: binary floating point with 32, 64 and 128 bits, and decimal floating point with 64 and 128 bits.
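The binary32 format of the standard, for instance, uses 1 sign bit, 8 exponent bits (encoded in excess 127) and 23 mantissa bits (normalized so that a single implicit 1 sits to the *left* of the point, a slightly different convention from the one used in this chapter). A short Python sketch, using only the standard `struct` module, can extract the three fields:

```python
import struct

def float_bits(x):
    """Return the sign, exponent and fraction fields of the IEEE 754
    binary32 encoding of x."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                  # 1 bit
    exponent = (word >> 23) & 0xFF     # 8 bits, biased by 127
    fraction = word & 0x7FFFFF         # 23 bits, implicit leading 1
    return sign, exponent, fraction

print(float_bits(1.0))    # (0, 127, 0): +1.0 * 2^(127 - 127)
print(float_bits(-0.5))   # (1, 126, 0): -1.0 * 2^(126 - 127)
```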

## Encoding set of symbols¶

Aside from manipulating numbers, computer systems must also be able to manipulate sets of arbitrary symbols. As in the case of numbers, though, they must be encoded with bits, despite the fact that they are not manipulated with arithmetic operations. A binary encoding of a set of symbols requires three ingredients:

- A set of symbols.
- The number of bits to use in the encoding.
- The correspondence between each symbol in the set and the sequence (or sequences) of bits representing it.

The number of bits to use when encoding a set has the restriction derived from equation (2). With \(n\) bits, a maximum of \(2^n\) elements can be encoded; therefore, if \(C\) is the cardinality of the set of symbols, the following equation must be satisfied: \(2^n \geq C\).

Example

Suppose you have to encode the symbols in the set \(S = \{ \text{Red}, \text{Green}, \text{Blue}\}\). You decide to use the minimum number of bits for its encoding, in this case, two. The correspondence between the symbols and its encoding can be established in the following table:

Symbol | Encoding |
---|---|

Red | 00 |

Green | 01 |

Blue | 10 |

Since the cardinality of the set is not a power of two, there are more combinations of bits than elements, which allows more than one sequence of bits to be assigned to one element.

As can be seen, given a set of elements and a number of bits, there are multiple ways in which the correspondence between sequences of bits and symbols can be defined. Although encoding a set of symbols can require an ad-hoc process, certain sets of symbols are used very frequently and some common schemes have already been defined for them.
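For a set with no standardized encoding, the correspondence can simply be kept in a table. A minimal Python sketch of the Red/Green/Blue example above (the variable names are illustrative):

```python
import math

symbols = ["Red", "Green", "Blue"]

# Minimum number of bits: the smallest n such that 2^n >= 3.
n = math.ceil(math.log2(len(symbols)))

# Correspondence tables, in both directions.
encode = {s: format(i, f"0{n}b") for i, s in enumerate(symbols)}
decode = {bits: s for s, bits in encode.items()}

print(n)                 # 2
print(encode["Green"])   # 01
print(decode["10"])      # Blue
```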

### Encoding Characters¶

One of the sets of symbols most often used by computer systems is the one introduced through the keyboard. The first computer systems already required an encoding of the set of symbols used to write text. Additionally, since computer systems exchange text information very often, it is important that all of them use the same encoding.

One of the character encoding schemes that has been used very often is ASCII (American Standard Code for Information Interchange). This encoding includes not only letters and digits but also special symbols used when transmitting telegraphic messages. The size of the encoding was set to 7 bits, and therefore a maximum of 128 symbols could be encoded. The following figure shows a table from 1972 with the ASCII encoding used in a printer.

The first 32 combinations (from 0000000, or 0x00, to 0011111, or 0x1F) are used to encode symbols that are not printable and were used to handle the transmission of text between telegraphs. The codes from 32 (0x20) to 126 (0x7E) correspond to printable symbols (code 127, 0x7F, is the non-printing *delete* control code). For example, the white space is encoded as 0x20. Letters “a” to “z” occupy the codes 0x61 to 0x7A. The corresponding uppercase symbols are in the codes 0x41 to 0x5A (the difference between upper and lowercase codes is the constant 0x20).
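These properties are easy to verify in any language that exposes the character codes; in Python, for example:

```python
# ASCII codes of lower and upper case letters.
print(hex(ord("a")), hex(ord("A")))   # 0x61 0x41

# The distance between the two cases is the constant 0x20...
print(ord("a") - ord("A"))            # 32

# ...so case conversion is a single subtraction.
print(chr(ord("z") - 0x20))           # Z
```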

But ASCII was not the only encoding scheme used to represent letters. Other
alternatives such as EBCDIC (Extended Binary Coded Decimal Interchange Code)
were also used in other contexts and by other systems. One of the major
shortcomings of the ASCII encoding is the absence of encodings for symbols that
are not part of the US alphabet. In a first stage the code was extended to
include some of them such as “á”, “ü”, or “ö”. The *extended* ASCII code was
proposed as an 8 bit code, and the combinations from 128 to 255 were used to
encode those and other graphical symbols.

The extended ASCII encoding is still not enough to encode all possible symbols
in the planet. A new scheme was needed with enough combinations to encode
all the symbols currently used in the world for written communication. The
encoding scheme proposed with this objective is called *Unicode*, and it aims
to offer a comprehensive encoding so that all computer systems are capable of
handling all the symbols used by the human race. Unicode has now been adopted
by most hardware and software companies and it will eventually become the
only representation in use.

The implementation of the encoding tries to be as generic as possible and, towards that end, instead of fixing a unique size in bits for the representation, there are several possible implementations with code units of 8, 16 and 32 bits. These three encoding schemes are different implementations of the same standard and are known as “UTF-8”, “UTF-16” and “UTF-32” respectively. The correspondence between the numeric values and the symbols is defined for each of the three formats.

Example

Symbol | UTF-16 code |
---|---|

lower case z | 0x007A |

water (水) | 0x6C34 |

G clef (𝄞) | 0xD834 0xDD1E |

Once the encoding for all the symbols has been established, strings are represented by sequences of these values. This type of encoding is used by applications such as plain text editors. The following table shows the encoding of a text file with an assembly program.

Program:

```
#include <avr/io.h>
.data
msg: .asciz "Hello\n"
.text
.global main
main: ldi r26, lo8(msg)
ldi r27, hi8(msg)
push r26
push r27
call printf
pop r27
pop r26
ret
```

ASCII encoding of the symbols in the text:

```
23 69 6E 63 6C 75 64 65 20 3C 61 76 72 2F 69 6F 2E 68 3E 0A
0A
09 2E 64 61 74 61 0A
6D 73 67 3A 09 2E 61 73 63 69 7A 20 22 48 65 6C 6C 6F 5C 6E 22 0A
09 2E 74 65 78 74 0A
09 2E 67 6C 6F 62 61 6C 20 6D 61 69 6E 0A
6D 61 69 6E 3A 09 6C 64 69 20 72 32 36 2C 20 6C 6F 38 28 6D 73 67 29 0A
09 6C 64 69 20 72 32 37 2C 20 68 69 38 28 6D 73 67 29 0A
09 70 75 73 68 20 72 32 36 0A
09 70 75 73 68 20 72 32 37 0A
09 63 61 6C 6C 20 70 72 69 6E 74 66 0A
09 70 6F 70 20 72 32 37 0A
09 70 6F 70 20 72 32 36 0A
09 72 65 74 0A
```
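Such a dump can be reproduced mechanically. For example, the first line of the table above is obtained in Python with:

```python
line = "#include <avr/io.h>\n"
print(" ".join(f"{byte:02X}" for byte in line.encode("ascii")))
# 23 69 6E 63 6C 75 64 65 20 3C 61 76 72 2F 69 6F 2E 68 3E 0A
```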

### Encoding instructions¶

One of the most important encoding schemes used by a computer system is that of the set of instructions it is capable of executing. In it, each instruction with its operands is considered an element of the set. As with any other symbolic encoding, it is necessary to choose both the number of bits to use and the correspondence between instructions and their encodings.

#### The set ual-1¶

To illustrate this type of encoding we will use an initially reduced set of
instructions which we will increase in complexity to see how the encoding can
be adapted. For the sake of clarity, we will call this set *ual-1*.

Instead of enumerating all the elements in the set, we will divide each
instruction (or element in the set) into a set of *fields* and encode each
field independently. The encoding of an instruction is then obtained by
combining the encodings of its fields. The instructions in *ual-1* are divided
into three parts: an operation and two operands that are integers in the range
\([0, 255]\). The operation can have one of the following four values:
add, sub, mul and div. The following figure shows the structure of the
instructions and some instructions that are part of the set.

The first step to encode the set is to calculate the number of elements in the set ual-1. The field operation has four possible values (or \(2^2\)), and the two remaining fields can each have 256 possible values (or \(2^8\)); thus the number of elements in the set is \(2^2 \times 2^8 \times 2^8 = 2^{18} = 262144\).

Let us recall that the number of bits for the encoding must satisfy (2), and therefore we need at least 18 bits. In this example, instead of using the minimum number of bits, we will use an encoding whose size is a multiple of 8, so we will use 24 bits (3 bytes).

The correspondence between symbols and their binary representation can be established more succinctly by referring to groups of instructions instead of enumerating all 262144 possible elements. The proposed rules to encode this set are:

The field operation is encoded with two bits with the following correspondence:

Symbol | Encoding |
---|---|

add | 00 |

sub | 01 |

mul | 10 |

div | 11 |

The two 8 bit numbers are encoded with their regular base 2 encoding.

The fields are encoded from left to right in the order shown in the previous figure, and the representation is completed with 6 bits with value zero **on the right side** of the code for a total of 24 bits.

The following figure shows the format for the binary encoding of ual-1.

Following this encoding, the instruction that adds the numbers 0x27 and 0xB2, written add 0x27 0xB2, is encoded in binary as shown in the following figure.

The most important property of this encoding is that the translation of any
instruction to its binary representation and vice versa is a **systematic
process**, and therefore can be done automatically by a computer system. The
following table shows the encoding of other symbols of ual-1.

Symbol | Encoding |
---|---|

ADD 0x27 0xB2 | 0x09EC80 |

SUB 0x11 0x2F | 0x444BC0 |

MUL 0xFA 0x2B | 0xBE8AC0 |

DIV 0x10 0x02 | 0xC40080 |

As you can see, the resulting hexadecimal encoding does not contain exactly the same digits that appear in the operands of the symbol. This occurs because the bits are interpreted in groups of four depending on their position in the overall encoding.
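Because the rules are systematic, the whole encoding process can be sketched in a few lines of Python (the function name `encode_ual1` is illustrative, not part of the set definition):

```python
OPS = {"add": 0b00, "sub": 0b01, "mul": 0b10, "div": 0b11}

def encode_ual1(op, a, b):
    """Encode a ual-1 instruction as 24 bits: 2 operation bits,
    two 8-bit operands, and 6 zero filler bits on the right."""
    word = (OPS[op] << 22) | (a << 14) | (b << 6)
    return f"0x{word:06X}"

print(encode_ual1("add", 0x27, 0xB2))   # 0x09EC80
print(encode_ual1("div", 0x10, 0x02))   # 0xC40080
```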

#### The set ual-2¶

We next define a new set of symbols derived from ual-1 but with a more complex structure. We add a fourth element to the format to represent the location in which to store the result of the operation, with only two possible values, Location_A and Location_B. We will call this new set of symbols ual-2. The structure of the new symbols and some examples are shown in the following figure.

The new set ual-2 has **twice** the number of elements of ual-1. For each
element in ual-1 there is one element in ual-2 with the suffix Location_A, and
another with the suffix Location_B. Thus, the number of elements in ual-2 is
\(2^{19}\), and at least 19 bits are required to encode them in binary. As in
the case of the set ual-1, we will use a number of bits that is a multiple of
8, and therefore use 24 bits.

The correspondence between the elements in ual-2 and their binary encoding follows the same rules applied to ual-1, with the exception that the new last field is encoded with one bit as shown in the following table:

Symbol | Encoding |
---|---|

Location_A | 0 |

Location_B | 1 |

The new encoding uses 19 of the 24 bits with the structure shown in the following figure.

The interpretation of this last field in the instruction is the location in which the result of operating on the two given numbers is stored. For example, the symbol add 0x10 0x02 Location_B means that the numbers 0x10 and 0x02 are added and the result is stored in Location_B. The process of translating a symbol in the set to its 6-digit hexadecimal encoding and vice versa can be done systematically as well. The following table shows several examples of symbols in ual-2 and their binary and hexadecimal encodings.

Symbol | Binary | Hexadecimal |
---|---|---|

div 0x10 0x02 Location_A | 11 0001 0000 0000 0010 0 00000 | 0xC40080 |

add 0x10 0x02 Location_B | 00 0001 0000 0000 0010 1 00000 | 0x0400A0 |

mul 0x10 0x02 Location_A | 10 0001 0000 0000 0010 0 00000 | 0x840080 |

Instruction decoding, that is, deducing which symbol corresponds to a given binary representation, is precisely what microprocessors do when they receive an instruction to execute.
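Decoding is just as systematic as encoding: the fields are extracted back from their fixed positions. A minimal sketch (the function name `decode_ual2` is illustrative):

```python
OPS = ["add", "sub", "mul", "div"]

def decode_ual2(word):
    """Decode a 24-bit ual-2 word: 2 operation bits, two 8-bit
    operands, 1 location bit and 5 filler bits."""
    op = OPS[word >> 22]
    a = (word >> 14) & 0xFF
    b = (word >> 6) & 0xFF
    location = "Location_B" if (word >> 5) & 1 else "Location_A"
    return op, a, b, location

print(decode_ual2(0x0400A0))   # ('add', 16, 2, 'Location_B')
```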

As you can see, the encoding schemes of ual-1 and ual-2 involve several
arbitrary decisions: the correspondence between operations and their binary
codes, the order in which the binary codes are concatenated, the total number
of bits, the position of the additional *filler* bits, etc. These decisions may
be modified, producing different encoding schemes that are also valid. The
property that must be preserved is that the symbols can be encoded and decoded
unequivocally.

The filler bits used to extend the encoding to 24 bits introduce some degree of ambiguity. Given two 24 bit numbers, if the only difference between them is in the filler bits, they both represent the same symbol. This property does not, however, affect the process of encoding and decoding instructions; it is only a consequence of having more binary combinations available than elements in the set.

#### The set ual-3¶

We now define the new set ual-3 in which the first two operands of each
instruction can also be one of the two locations (Location_A,
Location_B). This new set extends ual-2 with those symbols in which the
second or third part is a location. More formally, the set ual-2 is a
**subset** of ual-3. The interpretation of the symbols in this new set is that
both locations can now be the place where operands are stored. For example, the
symbol add Location_A 0x10 Location_B, which is in ual-3, encodes an
instruction that takes the value (previously) stored in Location_A, adds it
to 0x10, and stores the result in Location_B.

The number of elements in ual-3 is obtained again by multiplying the
combinations possible in each of the fields of the instruction. The operation
code has four possible values, but the second and third fields now have 256
possible numbers **plus** the two locations, and the result field has two
possible values. Thus, the number of elements of ual-3 is
\(2^2 \times 258 \times 258 \times 2 = 532512\).
Following the encoding scheme used until now, the size of the encoding is augmented to a multiple of bytes, and therefore occupies 24 bits. The correspondence between symbols in the set and their binary encoding needs to be revised though, as the previous rules are no longer valid for this set. The two operands that follow the operation cannot be encoded with 8 bits, as they each have 258 possible values. At least 9 bits are required to encode the operands in ual-3. With 9 bits there are 512 possible combinations, but only 258 of them are needed. One possible solution is to divide the 9 bits into two groups of 1 and 8 bits. If the most significant bit is zero, then the following 8 bits encode a numeric value between 0 and 255. If the first bit is 1, then the operand is one of the two possible locations, Location_A or Location_B. In this last case, only one of the 8 bits is needed, and the remaining 7 are ignored. The following figure shows this encoding scheme.

Since we are encoding 258 elements with 9 bits, there are various combinations that are not used. More precisely, when an instruction has a location as its second or third field, the first bit of that field has the value one, and all the combinations derived from the following seven bits represent the same value, because those bits are not relevant. The following table shows examples of operand encoding in ual-3.

Binary encoding | Operand |
---|---|

000010010 | 0x12 |

100010011 | Location_B |

001111010 | 0x7A |

100010010 | Location_A |

1XXXXXXX0 | Location_A |

As you can see in the last row of the previous table, when the value of the first of the nine bits encoding the operand is one, the following seven bits are not relevant, and they are typically represented by the symbol X.

The new correspondence between binary values and symbols in ual-3 is now defined as comprising seven fields: the operation, the bit indicating the type of the first operand, the first operand, the bit indicating the type of the second operand, the second operand, the third operand (the result location), and the filler bits. The following figure shows this format.

The following table shows examples of instructions from ual-3 and their corresponding encoding.

Symbol | Binary Encoding |
---|---|

add 0x01 0x00 Location_A | 0x002000 |

add 0x02 0x00 Location_B | 0x004008 |

mul Location_B 0x03 Location_A | 0xA02030 |

mul Location_B 0x04 Location_A | 0xA02040 |

The encoding schemes shown so far are a simplified version of those used in
commercial processors. These circuits have a **pre-defined set of
instructions** that can be executed, and each of them is encoded in
binary. The moment the board on which a microprocessor is mounted receives
voltage, the circuit starts to fetch instructions from memory, decode their
binary representation, and execute the required operations.

The binary encoding of a sequence of instructions is simply the sequence of the binary encodings of each of the instructions. The microprocessor obtains these representations in sequence and executes them. The values of this sequence are stored in the RAM memory, sharing the space with the data required to execute a program.

The number of bits used to encode a set of symbols has a direct impact on the amount of memory needed to store a sequence of instructions. The more compact the encoding, the less memory is required, and the more information can be stored in memory. However, a very compact representation of the instructions may require a sophisticated decoding scheme. This trade-off between the size of an instruction and the complexity of its decoding is taken into account by designers.

#### Instructions with variable size¶

After encoding the symbols in ual-3, would it be possible to encode the instructions with a representation using fewer than 24 bits? The immediate answer would be to get rid of the filler bits. Could the representation be reduced any further? For those symbols in which the two operands are natural numbers, each bit of the encoding is already providing information, and therefore it is not possible to reduce its size.

However, for those symbols in which either of the two operands is a location,
some of the bits are not used. A new scheme can be defined in which these
bits are removed **only if the first bit of the operand field is one**. The
new scheme has the following rules:

- If any of the operands is a number of eight bits, the encoding scheme used in ual-3 is maintained.
- If the operand is a location, two bits are used. The first one has always the value 1 and the second encodes the location.

With this new encoding scheme, the symbol add Location_B 0x10 Location_B is encoded with the 14 bit sequence 00 11 0 0001 0000 1. If we decide to maintain the rule that each symbol must be represented by a multiple of eight bits, then the symbol can be encoded with 16 bits with the value 0x3084 (adding two filler bits at the end).

Does this encoding maintain the property of being systematic? In other words,
can we think of an automatic process that receives a sequence of bits and
detects the instruction that it encodes? The answer is yes, but with one
important modification. Encoding the symbols (obtaining their corresponding
binary representation) is clearly defined by the previous rules. Decoding a
symbol (deriving the symbol from its binary representation) requires a
different strategy, because the encoding may have a different size depending
on the operands. In this case, the decoding must be done **incrementally**.

The first step is to obtain the first byte of the encoding and analyze its bits to know the type of the operands. Depending on the value of these bits we then obtain the bits required to decode the complete value of the operands. For example, if we are decoding the value 0x3084 we can see that the first two bits are zero, which corresponds to the operation add, and that the next one has the value 1, which means that instead of obtaining the next eight bits, the operand is a location encoded in the immediately following bit. The same process is repeated for the following bits. The fifth bit is a zero, meaning that the second operand is a natural number encoded in the following eight bits.

This decoding procedure is systematic, as it follows a set of unequivocally defined rules, but now it processes instructions that have different sizes depending on the type of the operands. More specifically, with this scheme, and assuming that the encoding always has to be a multiple of eight bits, there are instructions of one, two and three bytes depending on the type of operands they contain.
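The incremental procedure can be sketched as a small Python function that consumes the bit string from left to right (the names are illustrative; trailing filler bits are simply ignored):

```python
OPS = ["add", "sub", "mul", "div"]

def decode_variable(bits):
    """Incrementally decode one variable-size instruction given as a
    string of bits; trailing filler bits are ignored."""
    pos = 0

    def take(n):
        nonlocal pos
        field = bits[pos:pos + n]
        pos += n
        return field

    op = OPS[int(take(2), 2)]
    operands = []
    for _ in range(2):
        if take(1) == "1":      # location operand: one more bit
            operands.append("Location_B" if take(1) == "1" else "Location_A")
        else:                   # numeric operand: eight more bits
            operands.append(int(take(8), 2))
    result = "Location_B" if take(1) == "1" else "Location_A"
    return op, operands[0], operands[1], result

# 0x3084 = 00 11 0 00010000 1 00 -> add Location_B 0x10 Location_B
print(decode_variable(f"{0x3084:016b}"))
# ('add', 'Location_B', 16, 'Location_B')
```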

This last example illustrates the division of current commercial microprocessors according to the format of their instructions: fixed format or variable format.

- Fixed format: all the instructions have the same length. They are typically divided into fixed fields and each field encodes one element of the instruction. The main advantage is that the decoding process is very efficient due to its simplicity. The main disadvantage is that the encoding may waste space in some of the fields, so the encoding is not compact.
- Variable format: instructions have different sizes depending on the information they contain. The main advantage is that the binary representation can be very compact, thus requiring the minimum space in memory. The main disadvantage is that the decoding may require a set of non-trivial steps that look at the bits gradually. A complex decoding may require extra time, thus impacting the overall performance of the processor (the number of instructions executed per second).

Today’s commercial microprocessors are divided into those with a RISC (Reduced Instruction Set Computer) architecture, where there is only a small number of instructions with the same length that can be decoded very fast, and those with a CISC (Complex Instruction Set Computer) architecture, where there are numerous instructions performing complex tasks, encoded with variable length, and with a complex decoding phase.

### Machine Language¶

Encoding instructions like the ones in the previous section requires a
decoding process that in commercial processors needs to be implemented as a
**digital circuit**. The encoding of the instructions with binary logic is what
is known as the *machine language* of a processor, and it is the only language
understood by the system. Thus, writing a program for such a system requires a
detailed knowledge of all the instructions and the rules to derive their
corresponding binary encoding.

The description of the machine language of a processor is usually accompanied by the description of its architecture (the internal parts of the processor). The reason is that the instructions that are part of the machine language are used to manage the different parts of the processor. Thus, each processor has a document describing its architecture and the set of instructions it can execute, together with the types of operands that can be used.