Data Representation and Encoding
In this chapter we study the techniques used to encode all the information handled by a processor using only bits, that is, 0 and 1. Digital circuits are capable of manipulating groups of bits. The problem is then to decide how many bits are required to encode certain information and how that encoding is performed.
Binary Logic
Digital electronic circuits manipulate signals that can be in one of two possible states: zero and one. As a consequence, all the information processed by these circuits must first be encoded as sets of these signals, called bits. A digital circuit is then a system that receives data encoded as a set of bits (inputs), performs some operation, and produces another set of signals (outputs) with the result. A microprocessor is a complex digital circuit capable of executing a predefined set of instructions, encoded with bits, known as machine language.
But in order for a microprocessor to execute these instructions, all their elements (numbers, symbols, operands, etc.) need to be encoded with binary logic. This encoding is the basis for understanding how a computer, made mostly of digital circuits that manipulate bits, can perform much more complex tasks such as browsing the web, making a phone call, playing a game, playing music, reading your heartbeat, etc.
In the following sections we will manipulate numbers in different bases, mostly base 2 (binary), base 8 (octal), base 10 and base 16 (hexadecimal). You will frequently need to translate numbers between these bases. Even though the steps to make these translations will be explained, it is very helpful to carry out these operations with a calculator, or more precisely, a programmer's calculator, which offers the operations typical of this context.
Programmer's calculators usually come installed by default in the most popular operating systems. The following figure shows the ones currently present in Microsoft Windows, macOS and some distributions of Linux.
Properties of a binary encoding
With a single bit, only two elements can be represented, one for each of the possible values 0 and 1. If we use a sequence of two bits, the number of possible combinations increases to four: 00, 01, 10, and 11. Every bit added to the sequence doubles the number of possible combinations, because the previous combinations can be written twice: once with a 0 added in front and once with a 1 added in front.
If a single bit can encode only two elements, and for each additional bit we include in a sequence we double the number of combinations, in general with \(n\) bits we can encode up to \(2^n\) elements.
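The doubling argument can be checked by brute force. The short Python sketch below (the helper name `all_combinations` is ours, chosen for illustration) lists every sequence of \(n\) bits and counts them.

```python
from itertools import product

def all_combinations(n):
    """Return every sequence of n bits as a string, for example '010'."""
    return ["".join(bits) for bits in product("01", repeat=n)]

print(all_combinations(2))  # ['00', '01', '10', '11']
for n in range(1, 6):
    # Each additional bit doubles the count, so there are 2**n sequences.
    print(n, len(all_combinations(n)))
```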
The previous formula returns the number of possible combinations when using a sequence of \(n\) bits, but if we have a set with \(N\) elements, how many bits are needed to encode its elements in binary logic? For example, suppose the elements are \(\{red, green, blue, cyan, magenta\}\): is it possible to encode them in binary logic with two bits? With three? With four? The answer is that, when encoding a set with \(N\) elements, the number of possible combinations must be greater than or equal to the number of elements. In other words, the number of bits \(n\) required to encode the elements of a set with \(N\) elements must satisfy

\[2^n \ge N \tag{1}\]
Or alternatively, given a set with \(N\) elements, the number of bits required to encode its elements with binary logic must satisfy

\[n \ge \lceil \log_2 N \rceil \tag{2}\]

where the symbols ⌈ ⌉ denote the ceiling function, that is, the smallest integer greater than or equal to the logarithm. Going back to the previous example, the number of bits required to encode the set of five elements satisfies \(n \ge \log_{2}5\), and therefore \(n \ge 2.3219281\), so we need at least 3 bits.
This inequality gives the minimum number of bits needed to encode a set, but it says nothing about the maximum. When encoding a set of elements, using more bits than the minimum given by equation (2) is perfectly feasible. An encoding may have more combinations than the set has elements, thus leaving some of them unused.
The two previous equations translate into two rules to take into account when encoding a set of elements using binary logic:
With \(n\) bits we can encode up to a maximum of \(2^n\) elements.
To encode \(N\) elements in a set we need at least \(\lceil\log_2 N\rceil\) bits.
Once the number of bits has been decided, you need to define a relationship between each element in the set and a concrete sequence of bits. Each element must have at least one binary representation, and each bit sequence that is used must correspond to a single element of the set. This relationship must satisfy some minimum requirements to be usable by digital circuits. These circuits can only manipulate sequences of bits up to a certain length, which needs to be decided beforehand. For example, if a circuit must operate with natural numbers, you need to decide which subset of those numbers will be used and, from there, the number of bits required for their encoding.
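Both rules, and the choice of a concrete relationship, can be sketched in a few lines of Python. The helper `min_bits` is a name of our own; it computes \(\lceil \log_2 N \rceil\) with integer arithmetic, which avoids any rounding issues of a floating point logarithm. Assigning the \(i\)-th color the binary form of \(i\) is just one arbitrary, hypothetical choice of mapping.

```python
def min_bits(count):
    """Minimum n with 2**n >= count, i.e. ceil(log2(count)), using integers only."""
    if count < 1:
        raise ValueError("the set must contain at least one element")
    # For count >= 1, (count - 1).bit_length() == ceil(log2(count)).
    return (count - 1).bit_length()

colors = ["red", "green", "blue", "cyan", "magenta"]
n = min_bits(len(colors))  # 3 bits

# Hypothetical assignment: the i-th color gets the n-bit binary form of i.
encoding = {color: format(i, f"0{n}b") for i, color in enumerate(colors)}
unused = [format(i, f"0{n}b") for i in range(len(colors), 2 ** n)]

print(encoding)  # {'red': '000', 'green': '001', 'blue': '010', 'cyan': '011', 'magenta': '100'}
print(unused)    # ['101', '110', '111']
```

Three of the eight combinations stay unused, which is allowed because equation (2) only sets a minimum.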
Representing numbers in different bases
Before studying how to encode elements of different sets so that they can be manipulated by a computer system, it is very useful to study how to represent natural numbers in different bases. Conventionally, we write natural numbers using 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8 and 9) in what is known as base 10. In a number written in base 10, the least significant digit, written in the rightmost position, corresponds to the units, and the digit in the leftmost position is the most significant. We will use base 10, the one we normally use, to explain how to generalize this representation to any base. Any number represented in a base \(b\) satisfies the following conditions:
\(b\) digits are used to represent numbers. The digits go from 0 to \(b - 1\).
The number represented by \(b - 1\) is followed by the number 10.
Analogously, the maximum number with \(d\) digits is followed by one with \(d + 1\) digits in which the most significant is 1 and the rest are all zeros.
These conditions are satisfied by numbers encoded in base 10 using digits 0 to 9. The number following 9 is 10, and the maximum number represented with, for example, 4 digits (which is 9999) is followed by 1 and four zeros (10000). The value of a sequence of digits is obtained by multiplying each digit by the base raised to an exponent equal to its position, where positions are counted from zero starting at the least significant digit, as shown in the following example.
Example
The same number can be rewritten using the base raised to the appropriate exponent as follows:
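The general formula to obtain the value of a number from its digits can be written as

\[N = \sum_{i=0}^{n-1} d_i \times 10^i\]

where \(d_i\) represents the digit in position \(i\) of the number. The same formula applies to any base by replacing 10 with the base. Thus, the value of any \(n\)-digit number in base \(b\) is obtained by the following equation:

\[N = \sum_{i=0}^{n-1} d_i \times b^i = d_{n-1} \times b^{n-1} + \cdots + d_1 \times b + d_0 \tag{3}\]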
Example
The equivalent in base 10 of the number 6342 in base 7 can be obtained by using equation (3):

\[6 \times 7^3 + 3 \times 7^2 + 4 \times 7^1 + 2 \times 7^0 = 2058 + 147 + 28 + 2 = 2235\]
Thus, the number 6342 in base 7 corresponds to 2235 in base 10. As you can see, the number in base 7 does not have any digit greater than 6, because only the digits from 0 to 6 are allowed.
When numbers in different bases are written together, there is an ambiguity problem. For example, the number 2235 in base 10 from the previous example is the equivalent of the number 6342 in base 7, but 6342 could also be read as a number in base 10. To distinguish them, the base is written as a subscript to the right of the number. Thus we can write

\[6342_7 = 2235_{10}\]
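Equation (3) can be evaluated with a short loop. The sketch below (`to_decimal` and `DIGITS` are our own names) uses Horner's scheme, which multiplies the running value by the base before adding each new digit; this is algebraically the same sum \(\sum d_i \times b^i\). It assumes bases from 2 to 16 so that the letters A to F introduced later also work.

```python
DIGITS = "0123456789ABCDEF"

def to_decimal(number, base):
    """Value of a digit string in the given base, following equation (3)."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    value = 0
    for ch in number.upper():
        d = DIGITS.index(ch)  # ValueError if the symbol is not a digit at all
        if d >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        value = value * base + d  # Horner's scheme for sum(d_i * base**i)
    return value

print(to_decimal("6342", 7))  # 2235
```

Python's built-in `int("6342", 7)` performs the same conversion and can be used to double-check results.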
Translating numbers to different base encodings
Equation (3) gives the value in base 10 of any number in any base. The opposite process, that is, obtaining the equivalent in a different base of a number given in base 10, requires repeated divisions by the base, which produce the digits of the new number starting with the least significant. The process is easily shown with an example. Suppose you want to calculate the equivalent in base 7 of the number \(8675_{10}\). Writing this number with equation (3), where the \(d_i\) are its unknown digits in base 7, shows how the least significant digit can be obtained:

\[8675 = d_{n-1} \times 7^{n-1} + \cdots + d_1 \times 7 + d_0 = 7 \times \left(d_{n-1} \times 7^{n-2} + \cdots + d_1\right) + d_0 \tag{4}\]
If equation (4) is divided by the base, the remainder is the least significant digit \(d_0\) of the representation, because \(d_0 < 7\). The quotient contains the rest of the digits, and if we keep dividing by the base, the remaining digits in the new base are obtained.
Example
Consider the number \(8675_{10}\). To obtain its representation in base 7 we perform a first division by 7, which gives a remainder of 2, the least significant digit. The quotient obtained is 1239. When this is divided again by 7, the remainder is 0 and the new quotient is 177. If this operation is repeated, the successive remainders are the digits of the number in base 7, as shown in the following equations:

\[\begin{aligned}
8675 &= 7 \times 1239 + 2 \\
1239 &= 7 \times 177 + 0 \\
177 &= 7 \times 25 + 2 \\
25 &= 7 \times 3 + 4
\end{aligned}\]
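The successive divisions by the base guarantee that eventually the quotient is less than the base. When this occurs, no more divisions are needed, and that last quotient (3 in this example) is the most significant digit. The representation in base 7 can then be used with equation (3) to obtain the number in base 10 again:

\[34202_7 = 3 \times 7^4 + 4 \times 7^3 + 2 \times 7^2 + 0 \times 7^1 + 2 \times 7^0 = 7203 + 1372 + 98 + 0 + 2 = 8675\]

and therefore

\[8675_{10} = 34202_7\]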
In summary, given a number in base 10, its representation in another base \(b\) is obtained by collecting the remainders of successive divisions by the new base until the quotient is less than the base; that final quotient is the most significant digit. The digits are obtained from least to most significant.
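The repeated-division procedure maps directly onto Python's `divmod`. In this sketch `from_decimal` and `DIGITS` are our own names, and bases are again assumed to be between 2 and 16.

```python
DIGITS = "0123456789ABCDEF"

def from_decimal(value, base):
    """Digits of a natural number in the given base, by repeated division."""
    if value < 0 or not 2 <= base <= 16:
        raise ValueError("expects a natural number and a base from 2 to 16")
    digits = []
    while value >= base:
        value, remainder = divmod(value, base)
        digits.append(DIGITS[remainder])  # remainders come out least significant first
    digits.append(DIGITS[value])          # the final quotient is the most significant digit
    return "".join(reversed(digits))

print(from_decimal(8675, 7))  # 34202
print(from_decimal(217, 2))   # 11011001
```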
Encoding natural numbers
The simplest elements to encode in binary logic so that they can be manipulated by a computer are the natural numbers \(\mathbb{N}\). Their representation corresponds to the numbers encoded in base 2, whose two digits match the two values that digital circuits can manipulate. Thus, given a binary number with \(n\) digits (or equivalently, \(n\) bits), its equivalent in base 10 is obtained by applying equation (3) in base 2:

\[N = \sum_{i=0}^{n-1} d_i \times 2^i\]

But since the only possible values for \(d_i\) are 0 and 1, the previous equation can be interpreted in a simplified way: given a number in base 2, its equivalent in base 10 is obtained by adding the powers of 2 whose exponents correspond to the positions that contain a 1.
Example
Consider the binary number \(1101011_2\). Its equivalent in base 10 is obtained as

\[1101011_2 = 2^6 + 2^5 + 2^3 + 2^1 + 2^0 = 64 + 32 + 8 + 2 + 1 = 107_{10}\]
Thus, when manipulating numbers in binary and working out their decimal equivalents, it is important to become familiar with the values of the powers of two shown in the following table:
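| Bit | Weight | Decimal |
|---|---|---|
| 0 | \(2^0\) | 1 |
| 1 | \(2^1\) | 2 |
| 2 | \(2^2\) | 4 |
| 3 | \(2^3\) | 8 |
| 4 | \(2^4\) | 16 |
| 5 | \(2^5\) | 32 |
| 6 | \(2^6\) | 64 |
| 7 | \(2^7\) | 128 |
| 8 | \(2^8\) | 256 |
| 9 | \(2^9\) | 512 |
| 10 | \(2^{10}\) | 1024 |
| 11 | \(2^{11}\) | 2048 |
| 12 | \(2^{12}\) | 4096 |
| 13 | \(2^{13}\) | 8192 |
| 14 | \(2^{14}\) | 16384 |
| 15 | \(2^{15}\) | 32768 |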
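The following tables show how to obtain the representation in base 10 of three 8-bit binary numbers:

| Weight | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | Total |
|---|---|---|---|---|---|---|---|---|---|
| Binary | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | |
| Base 10 | | | 32 | | | 4 | 2 | 1 | 39 |

| Weight | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | Total |
|---|---|---|---|---|---|---|---|---|---|
| Binary | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | |
| Base 10 | 128 | | 32 | 16 | | | 2 | | 178 |

| Weight | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | Total |
|---|---|---|---|---|---|---|---|---|---|
| Binary | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | |
| Base 10 | 128 | 64 | | 16 | 8 | | | 1 | 217 |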
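The tables can be reproduced with a small helper (`binary_weights` is a hypothetical name of ours) that keeps the weight of every position holding a 1, scanning the bit string from the most significant bit.

```python
def binary_weights(bits):
    """Powers of two contributed by each 1 bit of a binary string, MSB first."""
    n = len(bits)
    return [2 ** (n - 1 - i) for i, b in enumerate(bits) if b == "1"]

for bits in ("00100111", "10110010", "11011001"):
    weights = binary_weights(bits)
    print(bits, weights, sum(weights))
# 00100111 [32, 4, 2, 1] 39
# 10110010 [128, 32, 16, 2] 178
# 11011001 [128, 64, 16, 8, 1] 217
```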
The opposite procedure, converting a number in base 10 to binary, follows the procedure explained in Section Representing numbers in different bases. In this case the number is divided by 2, so each remainder can only be 0 or 1, producing the bits of the binary representation.
Example
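The representation in binary of the number 217 in base 10 is obtained with the following divisions, where the last quotient, 1, is the most significant bit:

\[\begin{aligned}
217 &= 2 \times 108 + 1 \\
108 &= 2 \times 54 + 0 \\
54 &= 2 \times 27 + 0 \\
27 &= 2 \times 13 + 1 \\
13 &= 2 \times 6 + 1 \\
6 &= 2 \times 3 + 0 \\
3 &= 2 \times 1 + 1
\end{aligned}\]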
The digits of the binary result are obtained from least to most significant, so the result is \(11011001_2\).
The representation in base 2 has several properties that are very useful when operating with these numbers. To know whether a number is odd or even, it is enough to check the least significant bit: if it is 1, the number is odd; otherwise, it is even. This is because any binary number is a sum of powers of 2, and all powers of 2 are even except the first one, \(2^0\), which is precisely the weight of the least significant bit. Thus, whether a binary number is odd or even depends only on this bit.
The second property applies to any base. Integer multiplication or division by the base is done by appending a zero as the least significant digit or by removing the least significant digit, respectively. For numbers in base 10, multiplying by 10 is done by appending a zero as the least significant digit. Analogously, if a number in base 10 is divided by 10, the quotient is obtained by dropping the least significant digit, which is itself the remainder of the division.
In binary, multiplication and division by 2 correspond respectively to appending a zero as the least significant bit or removing the least significant bit. For example, the binary number \(100111_2\), which corresponds to \(39_{10}\), when multiplied by 2 results in \(1001110_2\), which represents 78 in base 10. Similarly, if the same number is divided by two, its quotient is \(10011_2\), which represents \(19_{10}\), and the remainder is 1 (\(39 = 19 \times 2 + 1\)).
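Both properties correspond to bitwise operators available in most programming languages. In Python, `n & 1` isolates the least significant bit, `n << 1` appends a zero on the right, and `n >> 1` drops the least significant bit. This sketch only uses natural numbers; shifting negative integers follows different rules.

```python
n = 0b100111  # 39

print(n & 1)                # 1: the least significant bit is 1, so 39 is odd
print(bin(n << 1), n << 1)  # 0b1001110 78: appending a 0 multiplies by 2
q, r = n >> 1, n & 1        # dropping the LSB gives the quotient, the LSB is the remainder
print(bin(q), q, r)         # 0b10011 19 1
```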
Encoding in bases 8 and 16
Base 8, also known as octal, might not look very useful for handling digital circuits, but it has a special property that makes it important. Applying the concepts presented in the previous section, numbers encoded in this base use digits 0 to 7. Number 7 is followed by 10, and 77 by 100. To translate a number from base 10 to base 8 we perform successive divisions by 8 to obtain the digits of the new representation. But since 8 is a power of 2 (\(2^3\)), the successive divisions can easily be applied to a number represented in binary. In other words, to translate a binary number to base 8 there is no need to first calculate its value in base 10 and then divide successively by 8: the divisions can be performed trivially on the binary number itself. How?
The quotient and remainder of dividing a binary number by eight can be read off directly. The three least significant bits are the remainder, and the rest of the bits (after removing the three least significant ones) are the quotient. Thus, translating a binary number to octal simply consists of creating groups of three bits starting from the least significant, and translating each group to its corresponding value between 0 and 7, as shown in the following table:
| Octal digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| Binary | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
If the number of bits in the binary number is not a multiple of three, in other words, if the last group has only one or two bits, the group is completed to three bits by adding zeros as the most significant bits (which does not alter the value of the number). The following table shows examples of how this translation is performed:
| Number in binary | Groups of 3 bits | Number in octal |
|---|---|---|
| \(00100111_2\) | 000 = 0, 100 = 4, 111 = 7 | \(47_8\) |
| \(10110010_2\) | 010 = 2, 110 = 6, 010 = 2 | \(262_8\) |
| \(11011001_2\) | 011 = 3, 011 = 3, 001 = 1 | \(331_8\) |
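The grouping procedure can be sketched as follows (`binary_to_octal` is our own illustrative name). It pads the bit string on the left with zeros up to a multiple of three, translates each group, and drops the leading zeros that the padding may produce.

```python
def binary_to_octal(bits):
    """Convert a binary string to octal by grouping bits in threes from the right."""
    width = -(-len(bits) // 3) * 3  # round the length up to a multiple of 3
    padded = bits.zfill(width)      # leading zeros do not change the value
    groups = [padded[i:i + 3] for i in range(0, width, 3)]
    digits = "".join(str(int(group, 2)) for group in groups)
    return digits.lstrip("0") or "0"

for bits in ("00100111", "10110010", "11011001"):
    print(bits, binary_to_octal(bits))
# 00100111 47
# 10110010 262
# 11011001 331
```

The results can be cross-checked with Python's `format(int(bits, 2), "o")`.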
Given how simple it is to translate from base 2 to base 8 and vice versa, base 8 is actually used as a more compact way to represent binary numbers: a long sequence of zeros and ones is written as its much shorter base 8 equivalent. Additionally, instead of writing the subscript 8 after the number, there is a convention of adding a zero as the leftmost digit. Thus, \(47_8\) and \(047\) both represent the same octal number, which corresponds to \(100111\) in binary.
But base 8 is not the only base that is a power of two. The next base in increasing order that is also a power of two is base 16. Is it possible to write numbers in this base? Following the properties described in the previous section, we need as many digits as the base, starting from zero. So, aside from digits 0 to 9, we need six additional symbols to use as digits. The solution is to use the first six letters of the alphabet, A, B, C, D, E and F, as if they were digits. Thus, the 16 digits used to represent numbers in base 16 are:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
Applying the same rules that were previously described, number F is followed by 10, number 19 is followed by 1A, 1F is followed by 20, 99 is followed by 9A, and the number before 100 is FF.
This encoding is also known as hexadecimal. Is it possible to translate a binary number directly into a hexadecimal number? The digits are obtained by successive divisions by 16. But since 16 is also a power of two (\(2^4\)), the remainder of each division is given by the four least significant bits of the number, and the quotient by the remaining bits after removing the remainder. Thus, to obtain the hexadecimal value of a binary number we split the bits into groups of four starting from the least significant, and then translate each group into its corresponding digit as shown in the following table.
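| Hex digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| Binary | 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | 0110 | 0111 |

| Hex digit | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|
| Binary | 1000 | 1001 | 1010 | 1011 | 1100 | 1101 | 1110 | 1111 |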
As in the case of base 8, hexadecimal is typically used as a more compact representation of binary numbers. This base is slightly more convenient because a byte (8 bits) requires only two hexadecimal digits, as opposed to three octal digits. To avoid confusion with other representations, hexadecimal numbers are written with the prefix 0x. The following table shows examples of binary numbers and their corresponding hexadecimal representation.
| Number in binary | Groups of 4 bits | Number in hexadecimal |
|---|---|---|
| \(00100111_2\) | 0010 = 2, 0111 = 7 | 0x27 |
| \(10110010_2\) | 1011 = 11 (B), 0010 = 2 | 0xB2 |
| \(11011001_2\) | 1101 = 13 (D), 1001 = 9 | 0xD9 |
The conversion from hexadecimal to base 10 is done identically to the rest of the bases. The digits A through F have values 10 to 15 respectively, and the base to use is 16.
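Both directions of the hexadecimal translation can be sketched with a digit lookup string. The helper names `strip_prefix`, `hex_to_binary` and `hex_to_decimal` are ours; the optional `0x` prefix is accepted because of the convention just described.

```python
HEX_DIGITS = "0123456789ABCDEF"

def strip_prefix(hex_str):
    """Drop an optional 0x prefix used to mark hexadecimal numbers."""
    return hex_str[2:] if hex_str.lower().startswith("0x") else hex_str

def hex_to_binary(hex_str):
    """Replace each hexadecimal digit by its 4-bit group."""
    return "".join(format(HEX_DIGITS.index(ch), "04b")
                   for ch in strip_prefix(hex_str).upper())

def hex_to_decimal(hex_str):
    """Weight each digit (A-F are 10-15) by a power of 16, as in equation (3)."""
    value = 0
    for ch in strip_prefix(hex_str).upper():
        value = value * 16 + HEX_DIGITS.index(ch)
    return value

print(hex_to_binary("0xB2"))   # 10110010
print(hex_to_decimal("0xD9"))  # 217
```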
Example
The representation in base 10 of the following hexadecimal numbers is:
The following video summarizes how to encode numbers in bases 2, 8 and 16.
Size of an encoding
In the previous sections you have seen how to encode natural numbers in binary. The representation of a number is a set of bits. But how many bits are really needed to represent natural numbers? Since there are infinitely many of them, in principle the answer would be an infinite number of bits. However, digital circuits can only manipulate binary representations with a finite number of bits.
This limitation means that, in addition to a technique to translate numbers to binary, we must also decide the size of that encoding and what happens when that size is not enough. For example, suppose we decide to encode natural numbers with 10 bits. Only the numbers in the interval \([0, 2^{10} - 1]\) can be represented. The last number in this interval is encoded as 1111111111 (or 0x3FF in hexadecimal). The problem appears when a circuit manipulating these numbers has to perform the operation \(1023 + 1\). The result can be computed, but it cannot be represented with the given number of bits. This situation, in which the result of an operation cannot be represented due to the size of the encoding, is known as overflow. Microprocessors deal with it by detecting and signaling it, for example by setting a status flag or, in some architectures, raising an exception, so that it can be handled by specific code.
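A fixed-width adder can be modeled by keeping only the low \(n\) bits of the sum and reporting whether anything was lost. The function `add_unsigned` below is a hypothetical sketch; how a real processor reports the condition (a flag, an exception, or both) depends on its architecture.

```python
def add_unsigned(a, b, n_bits):
    """Add two n-bit naturals; return the n-bit result and an overflow flag."""
    limit = 1 << n_bits  # 2**n_bits possible combinations
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands do not fit in n_bits")
    total = a + b
    return total % limit, total >= limit  # keep only the n low bits

print(add_unsigned(1023, 1, 10))   # (0, True)
print(add_unsigned(1000, 23, 10))  # (1023, False)
```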
The number of bits used to encode natural numbers changes from circuit to circuit and is one of the parameters that define what is called its architecture. The more bits used, the larger the interval of numbers that can be represented, but also the more complicated the circuits needed to manipulate them. Thus, microprocessor designers have to weigh this trade-off and choose a size that is reasonable for the type of calculations required while keeping the circuits that operate on these numbers reasonably simple. Throughout the history of microprocessor design, the typical sizes used to represent naturals have grown from 8 bits (encoding 0 to 255) all the way up to the 128 bits found in more advanced circuits.
The problem of the size of an encoding is not unique to the naturals. Any set with an infinite number of elements that needs to be encoded has the same problem. With a given number of bits only a subset of values can be encoded, and the processor must detect and report whenever the encoding of an element outside that subset is needed.
Encoding integers
There are various schemes to encode integers in binary logic. The simplest of them is called sign and magnitude. This encoding is based on the observation that any integer can be considered a pair formed by a sign and a natural number (its absolute value). For example, the number \(-345\) can be represented by the pair \((-, 345)\). The absolute value of an integer is, by definition, a natural number, and therefore it can be encoded in base 2 as described in the previous section. As for the sign, since it is a set with two possible values, we simply use one bit: the value 0 represents the positive sign, and 1 the negative sign.
As a consequence, translating an integer in base 10 to its sign and magnitude representation is simple: encode the absolute value of the number in binary, and add the sign as an additional bit. Typically, the sign bit is the leftmost, or most significant, bit. The following table shows several integers and their encoding in sign and magnitude using 10 bits.
Example
Integers encoded with sign and magnitude using a 10-bit representation.

| Number in decimal | Sign | Absolute value | Sign and magnitude |
|---|---|---|---|
| \(-342\) | – = 1 | 342 = 101010110 | 1101010110 |
| \(342\) | + = 0 | 342 = 101010110 | 0101010110 |
| \(-23\) | – = 1 | 23 = 10111 | 1000010111 |
Note that in the last example the absolute value of \(-23\) must be represented with 9 bits, so the appropriate number of zeros are added as most significant bits (000010111).
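The sign and magnitude scheme can be sketched as a sign bit followed by the absolute value padded to \(n - 1\) bits. The names `to_sign_magnitude` and `from_sign_magnitude` are ours; the last line illustrates the two encodings of zero discussed next.

```python
def to_sign_magnitude(value, n_bits):
    """Encode an integer as a sign bit followed by an (n_bits - 1)-bit magnitude."""
    magnitude = abs(value)
    if magnitude > (1 << (n_bits - 1)) - 1:
        raise OverflowError(f"{value} does not fit in {n_bits} bits")
    sign = "1" if value < 0 else "0"
    return sign + format(magnitude, f"0{n_bits - 1}b")

def from_sign_magnitude(bits):
    """Decode: the first bit is the sign, the rest is the absolute value."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude

for number in (-342, 342, -23):
    print(number, to_sign_magnitude(number, 10))
# -342 1101010110
# 342 0101010110
# -23 1000010111

print(from_sign_magnitude("0000000000"), from_sign_magnitude("1000000000"))  # 0 0
```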
A sign and magnitude representation with \(n\) bits can encode the integers in the interval \([-(2^{n-1} - 1), 2^{n-1} - 1]\). This expression follows from the fact that the highest number in this encoding starts with a zero (positive sign) followed by all ones. Analogously, the smallest value has a 1 in its leftmost bit (negative) followed by all ones. The following figure shows this range for a sign and magnitude representation using 8 bits.
But this encoding technique has an undesirable property. In binary encoding, \(n\) bits allow for \(2^n\) combinations, but sign and magnitude only represents \(2^n - 1\) distinct values. The problem is that the number zero has two representations: all zeros, and a 1 as most significant bit followed by all zeros. This means that one element of the initial set (in this case the integer zero) is represented by two bit combinations, wasting the opportunity to encode one more integer in that interval. There are other encoding schemes that assign a single combination of bits to the number zero. One of them is known as 2's complement.
The 2's complement encoding scheme allows \(2^n\) consecutive integers around the value zero to be encoded with \(n\) bits. More precisely, the range of integers represented by \(n\) bits in 2's complement is \([-2^{n-1}, 2^{n-1} - 1]\). The following figure shows the range of integers that can be encoded with 2's complement using 8 bits.
As you can see, the combination with eight ones represents the number \(-1\).
The translation of an integer in base 10 to 2's complement with \(n\) bits is done with the following steps:
If the number is greater than or equal to zero, simply translate it to base 2 by successively dividing by 2 and taking the remainders as the bits in increasing order of significance, adding zeros as most significant bits until there are \(n\) bits.
If the number is negative, apply these three steps:
Obtain the base 2 encoding of the absolute value of the number with \(n\) bits.
In that representation, replace every 0 by a 1 and every 1 by a 0.
Add 1 to the resulting number.
For example, suppose you have to obtain the representation of the number \(-115\) in 2's complement with 8 bits. We first obtain the representation in base 2 of the absolute value 115, which is 01110011 (or its hexadecimal equivalent 0x73). Then, we replace ones by zeros and zeros by ones to obtain 10001100. Finally, we add 1 to the result, obtaining 10001101.
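The three steps for negative numbers can be written literally with strings. This sketch (`to_twos_complement` is our own name) checks the range first, since values outside \([-2^{n-1}, 2^{n-1} - 1]\) cannot be encoded.

```python
def to_twos_complement(value, n_bits):
    """Encode an integer in n-bit 2's complement with the invert-and-add-1 steps."""
    if not -(1 << (n_bits - 1)) <= value <= (1 << (n_bits - 1)) - 1:
        raise OverflowError(f"{value} does not fit in {n_bits} bits")
    if value >= 0:
        return format(value, f"0{n_bits}b")
    magnitude = format(-value, f"0{n_bits}b")                        # step 1: |value| in n bits
    inverted = "".join("1" if b == "0" else "0" for b in magnitude)  # step 2: swap 0s and 1s
    return format(int(inverted, 2) + 1, f"0{n_bits}b")               # step 3: add 1

print(to_twos_complement(-115, 8))  # 10001101
print(to_twos_complement(-1, 8))    # 11111111
```

An equivalent shortcut is `format(value % 2**n_bits, f"0{n_bits}b")`, since 2's complement stores a negative \(N\) as \(N + 2^n\).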
The translation of an integer in 2's complement with \(n\) bits to its representation in base 10 is done with the following steps:
If the number is non-negative (that is, its most significant bit is zero), the number in base 10 is obtained by adding the powers of two for which the corresponding bit is one (as explained before).
If the number is negative (that is, its most significant bit is one), the base 10 value can be obtained by either of the following two methods:
Applying the equation:
\[ABS(N) - 2^n\]
where \(ABS(N)\) is the value obtained when translating the \(n\) bits directly into a base 10 number, as if they encoded a natural number.
Applying the following three steps:
Replace every 0 by a 1 and every 1 by a 0 in the n bits.
Add 1 to the resulting number.
Translate the resulting number to base 10 and take its value as a negative number.
Example
Consider the value 10110110, which represents an integer encoded in 2's complement with 8 bits. Its corresponding value in base 10 is calculated as:

\[ABS(10110110) - 2^8 = (128 + 32 + 16 + 4 + 2) - 256 = 182 - 256 = -74\]
Or, alternatively:
Replace zeros by ones and ones by zeros: 01001001
Add one: 01001010
Translate the value to base 10 and take it as a negative number: \(-(64 + 8 + 2) = -74\)
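The two decoding methods can be compared directly. In this sketch (function names are ours), `from_twos_complement` applies \(ABS(N) - 2^n\) and `from_twos_complement_invert` applies the invert, add 1 and negate steps; the test checks that they agree on every 8-bit pattern.

```python
def from_twos_complement(bits):
    """Decode with ABS(N) - 2**n when the most significant bit is 1."""
    n = len(bits)
    unsigned = int(bits, 2)  # ABS(N): the bits read as a natural number
    return unsigned - (1 << n) if bits[0] == "1" else unsigned

def from_twos_complement_invert(bits):
    """Decode negatives by inverting, adding 1 and taking the result as negative."""
    if bits[0] == "0":
        return int(bits, 2)
    inverted = "".join("1" if b == "0" else "0" for b in bits)
    return -(int(inverted, 2) + 1)

print(from_twos_complement("10110110"))         # -74
print(from_twos_complement_invert("10110110"))  # -74
```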
The following table shows the relationship between the representation of naturals and integers in 2's complement, where \(b_i\) is the bit in position \(i\).

| Numbers | Representation in base 10 |
|---|---|
| Naturals \(N \in [0, 2^n - 1]\) | \(N = \sum_{i=0}^{n-1} 2^i b_i\) |
| Non-negative integers \(N \in [0, 2^{n-1} - 1]\) | N has 0 as most significant bit: \(N = \sum_{i=0}^{n-1} 2^i b_i\) |
| Negative integers \(N \in [-2^{n-1}, -1]\) | N has 1 as most significant bit: \(N = \sum_{i=0}^{n-1} 2^i b_i - 2^n\) |
Adjusting the encoding of an integer in 2's complement to a scheme with a larger number of bits requires different steps depending on the sign. If the number is non-negative, the adjustment consists of adding zeros as most significant bits until the new length is reached. However, if the number is negative, adding zeros as most significant bits would change the sign. In this case, extending the number by adding ones as most significant bits keeps the encoding valid in 2's complement. In both cases the operation consists of replicating the sign bit, and it is known as sign extension.
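Because both cases copy the most significant bit, sign extension fits in one line. The sketch below (`sign_extend` and `value_of` are our own names) widens a pattern and checks that the 2's complement value does not change.

```python
def sign_extend(bits, new_width):
    """Widen a 2's complement bit string by replicating its sign bit."""
    if new_width < len(bits):
        raise ValueError("new width must not be smaller than the current width")
    return bits[0] * (new_width - len(bits)) + bits

def value_of(bits):
    """Interpret a bit string as a 2's complement integer."""
    unsigned = int(bits, 2)
    return unsigned - (1 << len(bits)) if bits[0] == "1" else unsigned

wide = sign_extend("10001101", 16)
print(wide, value_of("10001101"), value_of(wide))  # 1111111110001101 -115 -115
```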
The 2's complement scheme to encode integers has several properties that make it very efficient. The double encoding of the value zero present in the sign and magnitude scheme does not occur in this one: zero has a unique representation, which corresponds to its value in base 2. The most significant (leftmost) bit still represents the sign of the number, with the same correspondence used in sign and magnitude: positive numbers have their most significant bit set to zero, and negative numbers have that bit set to one. The non-negative integers are encoded exactly as the natural numbers, that is, with their encoding in base 2.
But the most interesting property of the 2's complement scheme is that addition and subtraction can be done following base 2 rules as if the numbers were naturals. This property is very important because, in the context of digital circuits, it means that the same circuit used to add and subtract naturals can be used without modification to operate on integers encoded in 2's complement.
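The following sketch illustrates this property. The bit patterns are added exactly as natural numbers and the carry beyond \(n\) bits is discarded; only the final interpretation treats them as 2's complement integers. The function names are ours, and the sketch does not detect signed overflow, which uses a different rule than unsigned overflow.

```python
def add_n_bits(a_bits, b_bits):
    """Add two n-bit patterns as naturals and keep only the n low bits."""
    n = len(a_bits)
    total = (int(a_bits, 2) + int(b_bits, 2)) % (1 << n)
    return format(total, f"0{n}b")

def value_of(bits):
    """Interpret an n-bit pattern as a 2's complement integer."""
    unsigned = int(bits, 2)
    return unsigned - (1 << len(bits)) if bits[0] == "1" else unsigned

s = add_n_bits("10001101", "01001010")  # -115 + 74 (as naturals: 141 + 74)
print(s, value_of(s))                   # 11010111 -41
```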
The following video summarizes how to encode integers using the sign and magnitude and 2's complement schemes.
Encoding Real Numbers with Floating Point Representation
Encoding real numbers using binary logic is significantly more complex than encoding naturals and integers. As we have seen for naturals and integers, when an encoding with a fixed number of bits is used, only a range of numbers can be represented. But in the case of the real numbers, there are infinitely many values in any such interval. Thus, the representation of real numbers must be restricted to a certain interval and to certain values within that interval. This feature of the encoding translates into several issues that need to be solved when operating with these numbers.
Suppose that two real numbers are added and the result is not one of the values included in the encoding. This situation is perfectly possible because only a subset of the possible values is encoded. The only possible solution in digital circuits is to approximate the result by the closest number that can be represented. This means that when a digital circuit manipulates real numbers it introduces an error. In principle, any real number manipulated by a digital circuit may contain some error, which in some cases could be zero.
This potential error derived from the representation of real numbers is especially important in systems that perform massive calculations with real numbers, as the error may grow to the point of invalidating the results. There are programming techniques to address this issue and minimize the error obtained when manipulating real numbers.
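The accumulation of error is easy to observe with Python floats, which on common platforms are IEEE 754 double precision numbers (an assumption about the platform, not about the encoding described in this chapter). The value 0.1 has no exact binary representation, so adding it ten times does not give exactly 1.0. `math.fsum` is one example of a programming technique that reduces the error: it tracks the partial sums exactly and rounds only the final result.

```python
import math

values = [0.1] * 10
print(sum(values))        # 0.9999999999999999
print(math.fsum(values))  # 1.0
```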
Real numbers are encoded in binary logic with the technique known as floating point. Every real number is represented as a set of digits to the left of the point and another set of digits to the right of the point. The part to the right of the point is called the mantissa. Multiplying by the base moves the point one position to the right, and dividing by the base moves it one position to the left. The following example shows the equivalence of real numbers in base 10 when they are multiplied and divided by the base.
Example
Effect of multiplying by the base with respect to the point.
By multiplying and dividing by the base, it is possible to represent any nonzero real number as a number whose first nonzero digit is immediately to the right of the point, multiplied by the base raised to the appropriate exponent.
Note
The floating point representation is similar to what is known as scientific notation. The difference is that in scientific notation the multiplications and divisions by the base are done to leave exactly one nonzero digit to the left of the point. So, the exponents in the two notations differ by one unit.
The advantage of the floating point representation is that it allows very big and very small numbers to be represented in a very compact form. The number \(0.1 \times 10^{-10}\) would require 11 digits after the point if written normally with a fixed point (0.00000000001). However, in floating point it only requires representing the mantissa \(0.1\) and the exponent \(-10\).
The representation of real numbers in base 2 follows exactly the same rules as that of the naturals with respect to the weight of each digit. The weights of the bits in the fractional part are decreasing negative powers of the base (\(2^{-1}, 2^{-2}, 2^{-3}\), etc.).
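The weights of the fractional bits can be applied with a short helper (`binary_real_to_decimal` is our own illustrative name) that reads a binary string containing a point.

```python
def binary_real_to_decimal(text):
    """Value of a binary string with a point, for example '101.011'."""
    integer_part, _, fraction_part = text.partition(".")
    value = int(integer_part, 2) if integer_part else 0
    for position, bit in enumerate(fraction_part, start=1):
        if bit == "1":
            value += 2 ** -position  # weights 2^-1, 2^-2, 2^-3, ...
    return value

print(binary_real_to_decimal("101.011"))  # 5.375 (4 + 1 + 0.25 + 0.125)
```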
Example
The effect of multiplying and dividing by the base applies identically to base 2. In this case, the factor used is 2 raised to the appropriate power.
Example
Multiplying real numbers in base 2 by powers of 2
Real numbers in base 2 can be represented by first adjusting the value of the mantissa so that the first nonzero digit is at the right of the point, and it is then multiplied by the appropriate power of the base. But this representation is still not suitable to be managed by digital systems. Remember that they can only manage zeros and ones. The technique to achieve the proper encoding is to encode separately the three ingredients: the sign of the mantissa, the mantissa, and the exponent (as the base is implicitly 2).
The sign of the mantissa can be easily encoded by a single bit following the encoding used for the integers: zero represents the positive sign, and 1 represents the negative sign. The second element is the mantissa, that is already in binary format and therefore does not need any additional processing. The exponent is an integer, and therefore we can encode using any of the methods previously described (sign and magnitude or 2s complement). In summary, a real number is now represented by three pieces: one bit encoding the sign, a set of bits encoding the mantissa, and a set of bits encoding an integer which is the exponent of the floating point representation. The following figure shows the structure of this representation as a sequence of bits.
In this type of encoding, knowing the total number of bits is not enough: we also need to know how many of them are devoted to the mantissa, how many to the exponent, and how the exponent is encoded (sign and magnitude, 2s complement, 1s complement, excess K, etc.).
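As an illustration, the following Python sketch encodes a real number into a 14-bit layout with 1 bit for the sign, 7 for the mantissa and 6 for the exponent in 2s complement. The field order (sign, mantissa, exponent), the truncation of extra mantissa bits, and the function names are assumptions made for this example; the leading 1 of the mantissa is stored explicitly, as in this chapter's scheme.

```python
import math

# Hypothetical helpers for illustration: sign | mantissa | exponent (2s complement).
def encode_real(x: float, mantissa_bits: int = 7, exponent_bits: int = 6) -> str:
    if x == 0:
        raise ValueError("zero needs a reserved bit pattern")
    sign = "1" if x < 0 else "0"
    m, e = math.frexp(abs(x))                  # abs(x) = m * 2**e with 0.5 <= m < 1
    # Keep the first mantissa_bits bits after the point (the first one is always 1).
    mantissa = int(m * 2 ** mantissa_bits)     # truncation, not rounding
    low, high = -(2 ** (exponent_bits - 1)), 2 ** (exponent_bits - 1) - 1
    if not low <= e <= high:
        raise OverflowError("exponent out of range")
    exponent = e & (2 ** exponent_bits - 1)    # 2s complement bit pattern
    return sign + format(mantissa, f"0{mantissa_bits}b") + format(exponent, f"0{exponent_bits}b")

def decode_real(bits: str, mantissa_bits: int = 7, exponent_bits: int = 6) -> float:
    sign = -1 if bits[0] == "1" else 1
    mantissa = int(bits[1:1 + mantissa_bits], 2)
    exponent = int(bits[1 + mantissa_bits:], 2)
    if exponent >= 2 ** (exponent_bits - 1):   # negative value in 2s complement
        exponent -= 2 ** exponent_bits
    return sign * mantissa / 2 ** mantissa_bits * 2.0 ** exponent

print(encode_real(-5.375))  # 11010110000011
```

Decoding the encoding of 0.1 returns 0.099609375, because seven mantissa bits cannot hold the infinite binary expansion of 0.1.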
How do we encode zero?¶
In the scheme described above, we have assumed that the mantissa of a real number is obtained by shifting the point so that the first nonzero digit is immediately to its right. But what happens if there is no nonzero digit? This is precisely the case of the number zero, a special case that any floating point representation must address. Zero is typically encoded by reserving a special combination of bits in the mantissa and exponent, and checking for the presence of that combination before performing an operation.
Range, Accuracy and Precision of the Floating Point Representation¶
Two representations of real numbers with the same size, for example 14 bits, may represent different intervals and values depending on the number of bits assigned to the mantissa and to the exponent (the sign of the mantissa always takes one bit). For example, one scheme may allocate the 14 bits as 1 bit for the sign, 7 bits for the mantissa and 6 bits for the exponent. If the exponent is encoded in 2s complement (values from \(-32\) to \(31\)) and the mantissa goes from \(0.1000000\) to \(0.1111111\) in binary, then the range of values that can be represented is \([-(1-2^{-7}) \times 2^{31}, (1-2^{-7}) \times 2^{31}]\).
This interval can be refined a bit further by stating the intervals of negative and positive nonzero numbers allowed: \([-(1-2^{-7}) \times 2^{31}, -2^{-33}]\) and \([2^{-33}, (1-2^{-7}) \times 2^{31}]\).
Another possibility with the same size (14 bits) is to assign 1 bit for the sign, 5 bits for the mantissa, and 8 bits for the exponent (values from \(-128\) to \(127\)). In this case, the intervals are \([-(1-2^{-5}) \times 2^{127}, -2^{-129}]\) and \([2^{-129}, (1-2^{-5}) \times 2^{127}]\).
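The limits of both 14-bit schemes can be computed with a small Python sketch, assuming that the mantissa goes from binary \(0.1000\ldots\) to \(0.1111\ldots\) (its first bit is always 1) and that the exponent is in 2s complement. `float_range` is a hypothetical helper; `Fraction` keeps the limits exact.

```python
from fractions import Fraction

def float_range(mantissa_bits: int, exponent_bits: int) -> tuple[Fraction, Fraction]:
    """Hypothetical helper: (largest, smallest positive) value of the scheme."""
    max_exp = 2 ** (exponent_bits - 1) - 1          # 2s complement limits
    min_exp = -(2 ** (exponent_bits - 1))
    largest = (1 - Fraction(1, 2 ** mantissa_bits)) * Fraction(2) ** max_exp
    smallest = Fraction(1, 2) * Fraction(2) ** min_exp   # mantissa 0.1000...
    return largest, smallest

for m_bits, e_bits in [(7, 6), (5, 8)]:
    largest, smallest = float_range(m_bits, e_bits)
    print(f"{m_bits} mantissa bits, {e_bits} exponent bits: "
          f"smallest {float(smallest):.3e}, largest {float(largest):.3e}")
```

The negative interval is symmetric, so only the positive limits are computed.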
As we can see, changing the assignment of bits changes the range of the representation. But what effect does the distribution of bits between the mantissa and the exponent have on the precision of the encoding? Let us explore it with two extreme cases of a 32-bit floating point representation.
Suppose that in one case the 32 bits are distributed as follows: 1 bit for the sign, 2 bits for the mantissa and 29 bits for the exponent. Such an encoding would cover a range of very large and very small numbers (because the exponent can be high in absolute value). However, for each possible exponent value, only two mantissas are possible: \(10\) and \(11\) (remember that the most significant bit of the mantissa is always one). Therefore, we would expect large errors when encoding numbers, as only two bits are available for the mantissa. In general, this encoding has low precision.
Suppose now the other extreme. Out of the 32 bits used to encode real numbers, one bit is used for the sign of the mantissa, 29 for the mantissa, and the remaining two for the exponent. If the exponent is encoded in 2s complement, its only possible values are \([-2, -1, 0, 1]\). This means that only a very narrow interval of numbers can be represented. However, the numbers within that interval can be represented with much better precision.
As a consequence, two floating point encoding schemes with identical overall size may have different precision due to the distribution of the bits between the mantissa and the exponent.
What is the difference between accuracy and precision? Precision is a property of an encoding scheme in general; for example, we may compare the precision provided by 10 or 12 bits of mantissa. Accuracy, on the other hand, refers to how close the binary encoding of a particular real number is to the actual magnitude we want to encode. In other words, precision is a property of the overall encoding scheme, whereas accuracy is a property of one concrete real number and the binary encoding used to represent it.
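The distinction can be made concrete with a Python sketch that approximates the real number 0.1 with mantissas of 2, 7 and 29 bits. Truncating the extra bits instead of rounding them is an assumption, and `truncate_to_mantissa` is a hypothetical helper. The mantissa width fixes the precision of each scheme, while the printed error is the accuracy obtained for this particular number.

```python
import math
from fractions import Fraction

def truncate_to_mantissa(x: float, mantissa_bits: int) -> Fraction:
    """Hypothetical helper: keep only mantissa_bits bits of the 0.1xxx mantissa (x > 0)."""
    m, e = math.frexp(x)                       # x = m * 2**e, 0.5 <= m < 1
    kept = math.floor(Fraction(m) * 2 ** mantissa_bits)
    return Fraction(kept, 2 ** mantissa_bits) * Fraction(2) ** e

for bits in (2, 7, 29):
    approx = truncate_to_mantissa(0.1, bits)
    error = abs(Fraction(1, 10) - approx)      # distance to the exact real number 1/10
    print(bits, float(approx), float(error))
```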
Overflow and Underflow in Floating Point Encoding¶
As in the case of the naturals and integers, an overflow is produced when the number to be represented is outside the range of valid values. In the case of the real numbers, however, the situation is a bit more complex. Besides numbers that lie beyond the extremes of the interval, another special situation appears when trying to represent a very small number.
Consider the smallest positive number that a floating point scheme can represent (regardless of the overall size, the size of the mantissa and the size of the exponent). What is the result of dividing that number by 3? Since the number is positive, the result is positive. Furthermore, since it is the smallest positive number, the representable value closest to this result is zero. But approximating a nonzero value by zero has a devastating effect on some operations. If the result is then multiplied by a very large number, the product is zero, and the error is therefore enormous. In fact, once a nonzero number has been approximated by zero, the error introduced by the floating point representation is unbounded from that point on. This is why this situation is treated as an anomaly known as underflow.
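Python floats are IEEE 754 binary64 numbers on practically all platforms, so the effect can be reproduced directly. IEEE 754 includes very small "subnormal" numbers, so this sketch starts from the smallest positive subnormal, returned by `math.ulp(0.0)` (Python 3.9 or later). Note that Python does not raise an exception here: the result silently becomes 0.0.

```python
import math

tiny = math.ulp(0.0)        # smallest positive double (a subnormal), about 4.94e-324
quotient = tiny / 3         # the true result is nonzero, but it rounds to 0.0
print(quotient)             # 0.0
print(quotient * 1e300)     # 0.0, although the true product is about 1.6e-24
```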
Both situations, underflow and overflow, are reported as exceptions when operating with real numbers, in the hope that some other program or application will take the appropriate measures. From the point of view of the computer system, this means that designers must include the hardware required to detect these situations when performing mathematical calculations.
Given the importance of defining the right floating point encoding for computer systems, the Institute of Electrical and Electronics Engineers (known simply as the IEEE) has defined a standard to encode floating point numbers known as IEEE 754. The standard defines five basic representations: binary floating point with 32, 64 and 128 bits, and decimal floating point with 64 and 128 bits.
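The 32-bit binary format allocates 1 bit to the sign, 8 bits to the exponent and 23 bits to the mantissa. It differs from the scheme in this chapter in two ways: the mantissa is normalized as \(1.xxx\) and its leading 1 is not stored, and the exponent is stored in excess 127 (the exponent patterns 0 and 255 are reserved for special values such as zero and infinity). The following Python sketch uses the standard `struct` module to extract the three fields; `binary32_fields` is a hypothetical helper.

```python
import struct

def binary32_fields(x: float) -> tuple[int, int, int]:
    """Hypothetical helper: (sign, biased exponent, fraction) of x as IEEE 754 binary32."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    exponent = (word >> 23) & 0xFF       # 8 bits, stored in excess 127
    fraction = word & 0x7FFFFF           # 23 bits; the leading 1 is implicit
    return sign, exponent, fraction

sign, exponent, fraction = binary32_fields(-6.25)
print(sign, exponent - 127, hex(fraction))   # 1 2 0x480000
```

For \(-6.25 = -1.1001_2 \times 2^2\), the stored exponent is \(2 + 127 = 129\) and the fraction field holds the bits 1001 followed by zeros.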
Encoding sets of symbols¶
Aside from manipulating numbers, computer systems must also be able to manipulate sets of arbitrary symbols. As with numbers, these symbols must be encoded with bits, even though they are not manipulated with arithmetic operations. A binary encoding of a set of symbols requires three ingredients:
A set of symbols.
The number of bits to use in the encoding.
The correspondence between each symbol in the set and the sequence (or sequences) of bits representing it.
The number of bits used to encode a set is restricted by equation (2). With n bits, a maximum of \(2^n\) elements can be encoded; therefore, if \(C\) is the cardinality of the set of symbols, the following condition must be satisfied: \(2^n \geq C\).
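The condition can be turned into a one-line computation: the smallest valid \(n\) is \(\lceil \log_2 C \rceil\). The sketch below, with a hypothetical `min_bits` helper, uses Python's `int.bit_length` to avoid floating point rounding in the logarithm.

```python
def min_bits(cardinality: int) -> int:
    """Hypothetical helper: smallest n with 2**n >= cardinality."""
    if cardinality < 1:
        raise ValueError("the set must not be empty")
    return (cardinality - 1).bit_length()

print(min_bits(3))    # 2, enough for three symbols
print(min_bits(128))  # 7, the size of ASCII
```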
Example
Suppose you have to encode the symbols in the set \(S = \{ \text{Red}, \text{Green}, \text{Blue}\}\). You decide to use the minimum number of bits for its encoding, in this case, two. The correspondence between the symbols and their encoding can be established in the following table:
| Symbol | Encoding |
|---|---|
| Red | 00 |
| Green | 01 |
| Blue | 10 |
Since the cardinality of the set is not a power of two, there are more combinations of bits than elements, so the unused combination (11) could also be assigned to one of the elements, giving that element more than one sequence of bits.
As can be seen, given a set of elements and a number of bits, there are multiple ways to define the correspondence between sequences of bits and symbols. Although encoding a set of symbols can be an ad hoc process, certain sets of symbols are used very frequently, and common schemes have already been defined for them.
Encoding Characters¶
One of the sets of symbols used most often by computer systems is the one introduced through the keyboard. Even the first computer systems needed to encode the symbols used to write text. Additionally, since computer systems exchange text information very often, it is important that all of them use the same encoding.
One of the character encoding schemes used most often was ASCII (American Standard Code for Information Interchange). This encoding includes not only letters and digits but also special symbols used when transmitting telegraphic messages. The size of the encoding was set to 7 bits, so a maximum of 128 symbols could be encoded. The following figure shows a table from 1972 with the ASCII encoding used in a printer.
The first 32 combinations (from 00000000 or 0x00 to 00011111 or 0x1F) encode non-printable symbols that were used to control the transmission of text between telegraphs. The codes from 32 (0x20) to 126 (0x7E) correspond to printable symbols; the last code, 127 (0x7F, DEL), is again a control symbol. For example, the white space is encoded as 0x20. The letters “a” to “z” occupy the codes 0x61 to 0x7A, and the corresponding uppercase letters occupy the codes 0x41 to 0x5A (the difference between uppercase and lowercase codes is the constant 0x20).
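The constant difference between lowercase and uppercase letters is 0x20, a single bit (bit 5), as the following Python lines illustrate with the built-ins `ord`, `chr` and `str.encode`. Flipping that bit changes the case of a letter only; for other symbols it produces an unrelated character.

```python
# Letters differ from their uppercase versions by 0x20, that is, by bit 5 only.
print(hex(ord("A")), hex(ord("a")))        # 0x41 0x61
print(ord("a") - ord("A"))                 # 32
print(chr(ord("G") ^ 0x20))                # g: flipping bit 5 toggles the case
print("Hi!".encode("ascii").hex(" "))      # 48 69 21
```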
But ASCII was not the only scheme used to represent letters. Other alternatives, such as EBCDIC (Extended Binary Coded Decimal Interchange Code), were used in other contexts and by other systems. One of the major shortcomings of ASCII is the absence of codes for symbols that are not part of the English alphabet. In a first stage the code was extended to include some of them, such as “á”, “ü”, or “ö”. The extended ASCII code was proposed as an 8-bit code, and the combinations from 128 to 255 were used to encode those and other graphical symbols.
The extended ASCII encoding is still not enough to encode all the symbols used on the planet. A new scheme was needed with enough combinations to encode all the symbols currently used for written communication. The scheme proposed with this objective is called Unicode, and it aims to offer a comprehensive encoding so that all computer systems are capable of handling all the symbols used by humankind. Unicode has now been adopted by most hardware and software companies and is expected to eventually become the only representation in use.
To be as generic as possible, the standard does not fix a unique size in bits for the representation. Instead, there are three encoding forms, known as “UTF-8”, “UTF-16” and “UTF-32”, whose basic units have 8, 16 and 32 bits respectively. All three implement the same standard: every symbol is assigned a single numeric value (its code point), and each form defines how that value is written as a sequence of units. In UTF-8 and UTF-16 some symbols require more than one unit.
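Python's built-in `str.encode` can show the three encoding forms for the symbols z, 水 (water) and the musical G clef. The big-endian codec variants (`utf-16-be`, `utf-32-be`) are used so that no byte order mark is prepended; `utf_forms` is a hypothetical helper.

```python
def utf_forms(symbol: str) -> tuple[str, str, str]:
    """Hypothetical helper: hex bytes of symbol in UTF-8, UTF-16 and UTF-32 (big-endian)."""
    return tuple(symbol.encode(codec).hex(" ")
                 for codec in ("utf-8", "utf-16-be", "utf-32-be"))

for symbol in ["z", "\u6c34", "\U0001D11E"]:   # z, water, G clef
    print(f"U+{ord(symbol):04X}", *utf_forms(symbol), sep=" | ")
```

The code point is the same in all three forms; only the number and size of the units change. The G clef (U+1D11E) needs four bytes in UTF-8 and two 16-bit units (a surrogate pair, D834 DD1E) in UTF-16, while UTF-32 always uses four bytes.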
Example
| Character | Description | UTF-16 code |
|---|---|---|
| z | lowercase z | 0x007A |
| 水 | water | 0x6C34 |
| 𝄞 | G clef | 0xD834 0xDD1E |
Once the encoding for all the symbols has been established, strings are represented by sequences of these values. This type of encoding is used by applications such as plain text editors. The following table shows the encoding of a text file with an assembly program.
Program:
#include <avr/io.h>

	.data
msg:	.asciz "Hello\n"
	.text
	.global main
main:	ldi r26, lo8(msg)
	ldi r27, hi8(msg)
	push r26
	push r27
	call printf
	pop r27
	pop r26
	ret

ASCII encoding of the symbols in the text:
23 69 6E 63 6C 75 64 65 20 3C 61 76 72 2F 69 6F 2E 68 3E 0A
0A
09 2E 64 61 74 61 0A
6D 73 67 3A 09 2E 61 73 63 69 7A 20 22 48 65 6C 6C 6F 5C 6E 22 0A
09 2E 74 65 78 74 0A
09 2E 67 6C 6F 62 61 6C 20 6D 61 69 6E 0A
6D 61 69 6E 3A 09 6C 64 69 20 72 32 36 2C 20 6C 6F 38 28 6D 73 67 29 0A
09 6C 64 69 20 72 32 37 2C 20 68 69 38 28 6D 73 67 29 0A
09 70 75 73 68 20 72 32 36 0A
09 70 75 73 68 20 72 32 37 0A
09 63 61 6C 6C 20 70 72 69 6E 74 66 0A
09 70 6F 70 20 72 32 37 0A
09 70 6F 70 20 72 32 36 0A
09 72 65 74 0A
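The dump can be reproduced with a few lines of Python. The helper `ascii_dump` is hypothetical; it produces one row per line of text, newline (0A) included, and tabs appear as 09. Only the first four lines of the program are used here.

```python
def ascii_dump(text: str) -> list[str]:
    """Hypothetical helper: hexadecimal ASCII codes, one row per line of text."""
    return [
        " ".join(f"{byte:02X}" for byte in line.encode("ascii"))
        for line in text.splitlines(keepends=True)
    ]

program_start = '#include <avr/io.h>\n\n\t.data\nmsg:\t.asciz "Hello\\n"\n'
for row in ascii_dump(program_start):
    print(row)
```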
Encoding instructions¶
One of the most important encoding schemes used by a computer system is the one for the set of instructions it is capable of executing. In it, each instruction with its operands is considered an element of the set. As with any other symbolic encoding, it is necessary to choose both the number of bits to use and the correspondence between instructions and their encodings.
The set ual1¶
To illustrate this type of encoding we will use an initially small set of instructions, which we will make progressively more complex to see how the encoding can be adapted. For the sake of clarity, we will call this set ual1.
Instead of enumerating all the elements in the set, we will divide each instruction (or element in the set) into a set of fields and encode each field independently. The encoding of an instruction is then obtained by combining the encodings of its fields. The instructions in ual1 are divided into three parts: an operation and two operands that are integers in the range \([0, 255]\). The operation can have one of the following four values: add, sub, mul and div. The following figure shows the structure of the instructions and some instructions that are part of the set.
The first step to encode the set is to calculate the number of elements in ual1. The operation field has four possible values (or \(2^2\)), and each of the two remaining fields has 256 possible values (or \(2^8\)), thus the number of elements in the set is \(2^2 \times 2^8 \times 2^8 = 2^{18} = 262144\).
Recall that the number of bits for the encoding must satisfy (2), and therefore we need at least 18 bits. In this example, instead of using the minimum number of bits, we will use an encoding whose size is a multiple of 8, so we will use 24 bits (3 bytes).
The correspondence between symbols and their binary representation can be established more succinctly by referring to groups of instructions instead of enumerating all 262144 possible elements. The proposed rules to encode this set are:
The field operation will be encoded with two bits with the following correspondence:
| Symbol | Encoding |
|---|---|
| add | 00 |
| sub | 01 |
| mul | 10 |
| div | 11 |
The two 8 bit numbers are encoded with their regular base 2 encoding.
The fields are encoded from left to right in the order shown in the previous figure and the representation is completed with 6 bits with value zero on the right side of the code for a total of 24 bits.
The following figure shows the format for the binary encoding of ual1.
Following this encoding, the instruction representing the addition of the numbers 0x27 and 0xB2, and represented by add 0x27 0xB2, is encoded in binary as shown in the following figure.
The most important property of this encoding is that translating any instruction into its binary representation, and vice versa, is a systematic process that can therefore be done automatically by a computer system. The following table shows the encoding of other symbols of ual1.
| Symbol | Encoding |
|---|---|
| add 0x27 0xB2 | 0x09EC80 |
| sub 0x11 0x2F | 0x444BC0 |
| mul 0xFA 0x2B | 0xBE8AC0 |
| div 0x10 0x02 | 0xC40080 |
As you can see, the hexadecimal digits of the encoding are not the same as those of the operands in the symbol. This occurs because hexadecimal digits group the bits in fours according to their position in the overall encoding, and the two bits of the operation field shift the operands so that they no longer align with those groups.
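The encoding rules of ual1 can be written as a Python sketch with shifts and masks; the names `encode_ual1` and `decode_ual1` are hypothetical. Because the operation field takes two bits, each operand starts two bits away from a hexadecimal digit boundary, which is why the operand digits are not visible in the result.

```python
OPCODES = {"add": 0b00, "sub": 0b01, "mul": 0b10, "div": 0b11}
MNEMONICS = {code: name for name, code in OPCODES.items()}

def encode_ual1(operation: str, first: int, second: int) -> int:
    """Hypothetical helper: op (2 bits) | first (8) | second (8) | 6 zero filler bits."""
    if not (0 <= first <= 255 and 0 <= second <= 255):
        raise ValueError("operands must be in [0, 255]")
    return (OPCODES[operation] << 22) | (first << 14) | (second << 6)

def decode_ual1(word: int) -> tuple[str, int, int]:
    """Hypothetical inverse of encode_ual1; the six filler bits are ignored."""
    return MNEMONICS[(word >> 22) & 0b11], (word >> 14) & 0xFF, (word >> 6) & 0xFF

print(f"0x{encode_ual1('add', 0x27, 0xB2):06X}")   # 0x09EC80
print(decode_ual1(0xC40080))                       # ('div', 16, 2)
```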
The set ual2¶
We next define a new set of symbols derived from ual1 but with a more complex structure. We add a fourth element to the format to represent the location where the result of the operation is stored, with only two possible values, Location_A and Location_B. We will call this new set of symbols ual2. The structure of the new symbols and some examples are shown in the following figure.
The new set ual2 has twice as many elements as ual1. For each element in ual1 there is one in ual2 with the suffix Location_A and another with the suffix Location_B. Thus, the number of elements in ual2 is \(2^{19}\), and at least 19 bits are required to encode them in binary. As in the case of ual1, we will use a number of bits that is a multiple of 8, that is, 24 bits.
The correspondence between the elements in ual2 and its binary encoding follows the same rules applied for ual1 with the exception that the last field is now encoded with one bit as shown in the following table:
| Symbol | Encoding |
|---|---|
| Location_A | 0 |
| Location_B | 1 |
The new encoding uses 19 of the 24 bits with the structure shown in the following figure.
This last field of the instruction indicates the location where the result of operating on the two given numbers is stored. For example, the symbol add 0x10 0x02 Location_B means that the numbers 0x10 and 0x02 are added and the result is stored in Location_B. The process of translating a symbol in the set into its 6-digit hexadecimal encoding and vice versa can also be done systematically. The following table shows several examples of symbols in ual2 and their binary and hexadecimal encodings.
| Symbol | Binary | Hexadecimal |
|---|---|---|
| div 0x10 0x02 Location_A | 11 0001 0000 0000 0010 0 00000 | 0xC40080 |
| add 0x10 0x02 Location_B | 00 0001 0000 0000 0010 1 00000 | 0x0400A0 |
| mul 0x10 0x02 Location_A | 10 0001 0000 0000 0010 0 00000 | 0x840080 |
Instruction decoding, that is, deducing which symbol corresponds to a given binary representation, is precisely what microprocessors do when they receive an instruction to execute.
As you can see, the encoding schemes of ual1 and ual2 involve several arbitrary decisions: the correspondence between operations and their binary codes, the order in which the binary codes are concatenated, the total number of bits, the position of the additional filler bits, etc. These decisions could be changed to produce different encoding schemes that are equally valid. The property that must be preserved is that symbols can be encoded and decoded unambiguously.
The filler bits used to extend the encoding to 24 bits introduce some redundancy: if two 24-bit numbers differ only in the filler bits, they both represent the same symbol. This property does not, however, affect the process of encoding and decoding instructions; it is only a consequence of having more binary combinations available than elements in the set.
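A decoder for ual2 reverses the field layout: operation (2 bits), two operands (8 bits each), location (1 bit) and 5 filler bits. In the following Python sketch, `decode_ual2` is a hypothetical name, and the filler bits are simply ignored.

```python
MNEMONICS = ["add", "sub", "mul", "div"]          # codes 00, 01, 10, 11
LOCATIONS = ["Location_A", "Location_B"]          # codes 0, 1

def decode_ual2(word: int) -> str:
    """Hypothetical decoder: op(2) | first(8) | second(8) | location(1) | 5 filler bits."""
    operation = MNEMONICS[(word >> 22) & 0b11]
    first = (word >> 14) & 0xFF
    second = (word >> 6) & 0xFF
    location = LOCATIONS[(word >> 5) & 0b1]
    return f"{operation} 0x{first:02X} 0x{second:02X} {location}"

for word in (0xC40080, 0x0400A0, 0x840080):
    print(f"0x{word:06X} -> {decode_ual2(word)}")
```

Because the filler bits are ignored, a word that differs only in them, such as 0xC4009F, decodes to the same symbol as 0xC40080.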
The set ual3¶
We now define the new set ual3, in which the first two operands of each instruction can also be one of the two locations (Location_A, Location_B). This new set extends ual2 with those symbols in which the second or third part is a location. More formally, ual2 is a subset of ual3. The interpretation of the symbols in this new set is that both locations can now also hold the operands. For example, the symbol add Location_A 0x10 Location_B, which is in ual3, encodes an instruction that takes the value (previously) stored in Location_A, adds 0x10 to it and stores the result in Location_B.
The number of elements in ual3 is again obtained by multiplying the number of possible values of each field in the instruction. The operation code has four possible values, the second and third fields now have 256 possible numbers plus the two locations, and the last field has two possible locations. Thus, the number of elements of ual3 is \(4 \times 258 \times 258 \times 2 = 532512\).
Since \(2^{19} = 524288 < 532512 \leq 2^{20}\), at least 20 bits are required; following the encoding scheme used until now, the size is rounded up to a multiple of 8 bits, so the encoding occupies 24 bits. The correspondence between symbols in the set and their binary encoding needs to be revised, though, as the previous rules are no longer valid for this set. The two operands that follow the operation cannot be encoded with 8 bits, as they have 258 possible values; at least 9 bits are required. With 9 bits there are 512 possible combinations, but only 258 of them are needed. One possible solution is to divide the 9 bits into two groups of 1 and 8 bits. If the most significant bit is zero, the following 8 bits encode a numeric value between 0 and 255. If the first bit is 1, the operand is one of the two possible locations, Location_A or Location_B. In this last case, only one of the 8 bits is needed, and the remaining 7 are ignored. The following figure shows this encoding scheme.
Because we are encoding 258 elements with 9 bits, there are various combinations that are not used. More precisely, when an instruction has a location as its second or third field, the first bit of that field has the value one, and all the combinations of the following seven bits represent the same location, because those bits are not relevant. The following table shows examples of operand encoding in ual3.
| Binary encoding | Operand |
|---|---|
| 000010010 | 0x12 |
| 100010011 | Location_B |
| 001111010 | 0x7A |
| 100010010 | Location_A |
| 1XXXXXXX0 | Location_A |
As you can see in the last row of the previous table, when the value of the first of the nine bits encoding the operand is one, the following seven bits are not relevant, and they are typically represented by the symbol X.
The new correspondence between binary values and symbols in ual3 is now defined with seven fields: the operation, the bit indicating the type of the first operand, the first operand, the bit indicating the type of the second operand, the second operand, the third operand, and the filler bits. The following figure shows this format.
The following table shows examples of instructions from ual3 and their corresponding encodings.
| Symbol | Binary encoding |
|---|---|
| add 0x01 0x00 Location_A | 0x002000 |
| add 0x02 0x00 Location_B | 0x004008 |
| mul Location_B 0x03 Location_A | 0xA02030 |
| mul Location_B 0x04 Location_A | 0xA02040 |
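The ual3 rules can be expressed as a Python sketch; the function names are hypothetical. It assumes that the ignored X bits of a location operand are written as zeros, which is consistent with the four encodings listed in the table.

```python
from typing import Union

OPCODES = {"add": "00", "sub": "01", "mul": "10", "div": "11"}
LOCATIONS = {"Location_A": "0", "Location_B": "1"}

def encode_operand(operand: Union[int, str]) -> str:
    """Hypothetical helper: 9-bit operand, '0' + 8-bit number or '1' + 7 X bits + location."""
    if isinstance(operand, str):
        return "1" + "0" * 7 + LOCATIONS[operand]   # the ignored X bits are written as 0
    if not 0 <= operand <= 255:
        raise ValueError("numeric operands must be in [0, 255]")
    return "0" + format(operand, "08b")

def encode_ual3(operation: str, first: Union[int, str], second: Union[int, str],
                destination: str) -> int:
    """Hypothetical encoder: op (2) | first (9) | second (9) | destination (1) | filler (3)."""
    bits = (OPCODES[operation] + encode_operand(first) + encode_operand(second)
            + LOCATIONS[destination] + "000")
    return int(bits, 2)

examples = [
    ("add", 0x01, 0x00, "Location_A"),
    ("add", 0x02, 0x00, "Location_B"),
    ("mul", "Location_B", 0x03, "Location_A"),
    ("mul", "Location_B", 0x04, "Location_A"),
]
for example in examples:
    print(example, f"0x{encode_ual3(*example):06X}")
```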
Encoding schemes like the ones shown so far are simplified versions of those used in commercial processors. These circuits have a predefined set of instructions that they can execute, and each of them is encoded in binary. As soon as the board on which a microprocessor is mounted is powered on, the circuit starts to fetch instructions from memory, decode their binary representation, and execute the required operations.
The binary encoding of a sequence of instructions is simply the sequence of the binary encodings of each instruction. The microprocessor obtains these representations in order and executes them. The values of this sequence are stored in RAM, sharing the space with the data required to execute a program.
The number of bits used to encode a set of symbols has a direct impact on the amount of memory needed to store a sequence of instructions. The more compact the encoding, the less memory is required, and the more information can be stored in memory. However, a very compact representation of the instructions may require a sophisticated decoding scheme. Designers take this tradeoff between the size of an instruction and the complexity of its decoding into account.
Instructions with variable size¶
After encoding the symbols in ual3, would it be possible to encode the instructions with fewer than 24 bits? The immediate answer is to get rid of the filler bits. Could the representation be reduced any further? For those symbols whose two operands are natural numbers, every bit of the encoding already provides information, and therefore it is not possible to reduce their size.
However, for those symbols in which either of the two operands is a location, some of the bits are not used. A new scheme could remove these bits whenever the first bit of the operand field is one. The proposed scheme has the following rules:
If an operand is an eight-bit number, it is encoded as in ual3 (a bit with value 0 followed by the eight bits of the number).
If the operand is a location, two bits are used. The first one has always the value 1 and the second encodes the location.
With this new encoding scheme, the symbol add Location_B 0x10 Location_B is encoded with the 14-bit sequence 00 11 0 0001 0000 1. If we maintain the rule that each symbol must be represented by a multiple of eight bits, the symbol is encoded with 16 bits with the value 0x3084 (adding two filler bits at the end).
Does this encoding maintain the property of being systematic? In other words, can we think of an automatic process that receives a sequence of bits and detects the instruction it encodes? The answer is yes, but with one important modification. Encoding a symbol (obtaining its corresponding binary representation) is clearly defined by the previous rules. Decoding a symbol (deriving the symbol from its binary representation) requires a different strategy, because the encoding may have different sizes depending on the operands. In this case, the decoding must be done incrementally.
The first step is to obtain the first byte of the encoding and analyze its bits to determine the type of the operands. Depending on the value of these bits, we then obtain the bits required to decode the complete value of each operand. For example, when decoding the value 0x3084 (0011 0000 1000 0100 in binary), the first two bits are zero, which corresponds to the operation add. The next bit has the value 1, which means that, instead of eight more bits, the operand is a location encoded in the single bit that follows (1, that is, Location_B). The same process is repeated for the following bits. The fifth bit is zero, meaning that the second operand is a natural number encoded in the following eight bits (0001 0000, that is, 0x10). Finally, the fourteenth bit encodes the destination (1, Location_B), and the last two bits are filler.
This decoding procedure is systematic, since it follows a set of unambiguously defined rules, but it now processes instructions that have different sizes depending on the type of operands. More specifically, with this scheme, and assuming that the encoding must always be a multiple of eight bits, there are instructions of one, two and three bytes (7, 14 or 21 bits before adding the filler) depending on the type of operands they contain.
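The variable-size scheme can be sketched in Python as follows; all function names are hypothetical. The decoder never needs the length in advance: it reads the operation code, inspects the first bit of each operand to decide whether one or eight more bits follow, and only then knows how many bytes the instruction occupies, filler included. This also lets it walk through a sequence of instructions stored one after another.

```python
from typing import Union

OPCODES = {"add": "00", "sub": "01", "mul": "10", "div": "11"}
MNEMONICS = {bits: name for name, bits in OPCODES.items()}
LOCATIONS = {"Location_A": "0", "Location_B": "1"}
LOCATION_NAMES = {bit: name for name, bit in LOCATIONS.items()}

def encode_operand(operand: Union[int, str]) -> str:
    """Hypothetical helper: a location takes 2 bits, a number takes 9."""
    if isinstance(operand, str):
        return "1" + LOCATIONS[operand]
    if not 0 <= operand <= 255:
        raise ValueError("numeric operands must be in [0, 255]")
    return "0" + format(operand, "08b")

def encode(operation: str, first: Union[int, str], second: Union[int, str],
           destination: str) -> bytes:
    """Hypothetical encoder for the variable-size scheme."""
    bits = (OPCODES[operation] + encode_operand(first) + encode_operand(second)
            + LOCATIONS[destination])
    bits += "0" * (-len(bits) % 8)          # filler bits up to a whole number of bytes
    return int(bits, 2).to_bytes(len(bits) // 8, "big")

def decode(stream: bytes) -> tuple:
    """Hypothetical decoder: (symbol, size in bytes) of the instruction at the start."""
    bits = "".join(format(byte, "08b") for byte in stream)
    position = 2                            # skip the 2-bit operation code
    operands = []
    for _ in range(2):
        if bits[position] == "1":           # a location: only one more bit follows
            operands.append(LOCATION_NAMES[bits[position + 1]])
            position += 2
        else:                               # a number: eight more bits follow
            operands.append(int(bits[position + 1:position + 9], 2))
            position += 9
    destination = LOCATION_NAMES[bits[position]]
    size = (position + 1 + 7) // 8          # round the used bits up to whole bytes
    return (MNEMONICS[bits[:2]], *operands, destination), size

print(encode("add", "Location_B", 0x10, "Location_B").hex())   # 3084
```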
This last example illustrates how current commercial microprocessors are divided according to the format of their instructions: fixed format or variable format.
Fixed format: all the instructions have the same length. They are typically divided into fixed fields and each field encodes one element of the instruction. The main advantage is that the decoding process is very efficient due to its simplicity. The main disadvantage is that the encoding may waste space in some of the fields, so the encoding is not compact.
Variable format: instructions have different sizes depending on the information they contain. The main advantage is that the binary representation can be very compact, thus requiring the minimum space in memory. The main disadvantage is that decoding may require a set of nontrivial steps that look at the bits gradually. A complex decoding may require extra time, thus impacting the overall performance of the processor (the number of instructions executed per second).
Today’s commercial microprocessors are divided into those with a RISC (Reduced Instruction Set Computer) architecture, with a small number of instructions of the same length that can be decoded very fast, and those with a CISC (Complex Instruction Set Computer) architecture, with numerous instructions that perform complex tasks, are encoded with variable length, and require a complex decoding phase.
Machine Language¶
Encoding instructions like the ones in the previous section therefore requires a decoding process, which in commercial processors must be implemented as a digital circuit. The encoding of the instructions with binary logic is known as the machine language of a processor, and it is the only language the system understands. Thus, writing a program for such a system requires detailed knowledge of all the instructions and of the rules to derive their binary encoding.
The description of the machine language of a processor is usually accompanied by the description of its architecture (the internal parts of the processor). The reason is that the instructions of the machine language are used to manage the different parts of the processor. Thus, each processor has a document describing its architecture and the set of instructions it can execute, along with the types of operands they can use.