#### Arithmetic Circuits: Subtractors, Multipliers, ALU

Becker/Molitor, Chapter 9.2+9.3 Harris/Harris, Chapter 5.2

#### Jan Reineke Universität des Saarlandes

System Architecture, Jan Reineke

# Roadmap: Computer architecture



- 1. Combinatorial circuits: Boolean Algebra/Functions/Expressions/Synthesis
- 2. Number representations
- 3. Arithmetic Circuits: Addition, **Multiplication**, **Division**, **ALU**
- 4. Sequential circuits: Flip-Flops, Registers, SRAM, Moore and Mealy automata
- 5. Verilog
- 6. Instruction Set Architecture
- 7. Microarchitecture
- 8. Performance: RISC vs. CISC, Pipelining, Memory Hierarchy

### Subtraction

- As we have  $-[b]=[\overline{b}]+1$  the difference [a]-[b] is equal to the sum  $[a]+[\overline{b}]+1$ .
- → Derive subtractor circuit from adder circuit
  → combined adder/subtractor

#### Reminder: Two's complement

#### Lemma:

Let *d* be a fixed-point number and  $\overline{d}$  the number obtained by flipping all bits  $(0 \rightarrow 1, 1 \rightarrow 0)$  in *d*. Then:  $[\overline{d}]_2 + 1 = -[d]_2$ .

Example: 
$$n = 2, k = 0$$
:d000001010011100101110111[d]\_20123-4-3-2-1

#### Example: Subtraction

$$[a]_{2} = [0110]_{2} = 6_{10} \qquad [b]_{2} = [0111]_{2} = 7_{10}$$
$$\overline{[b]}_{2} = [1000]_{2} = -8_{10}$$

#### Schematic of a subtractor



# Schematic of a combined adder/subtractor



# Multiplier

Wanted: Circuit for the **multiplication** of two binary numbers <a\_n-1, ..., a\_0>, <b\_n-1, ..., b\_0>.



### Outputs of a multiplier

How many bits are required for the result?

$$\le$$
  
 $(2^{n}-1)\cdot(2^{n}-1)=2^{2n}-2^{n+1}+1\le 2^{2n}-1$ 

*Thus*: **2n bits** are sufficient to represent the product of two n-bit binary numbers.

# Multiplier

*Definition*: An **n-bit multiplier** is a circuit that computes the following function:  $mul_n: \mathbf{B}^{2n} \to \mathbf{B}^{2n}$  with  $mul_n(\mathbf{a}_{n-1}, ..., \mathbf{a}_0, \mathbf{b}_{n-1}, ..., \mathbf{b}_0) = (\mathbf{p}_{2n-1}, ..., \mathbf{p}_0)$  with  $\langle \mathbf{p}_{2n-1}, ..., \mathbf{p}_0 \rangle = \langle \mathbf{a} \rangle \cdot \langle \mathbf{b} \rangle$ 

$$< a > \cdot < b > = < a > \cdot \sum_{i=0}^{n-1} b_i \cdot 2^i = \sum_{i=0}^{n-1} (a > \cdot b_i \cdot 2^i)$$
  
partial product

### The multiplication matrix

n partial products, each with 2n bits



System Architecture, Jan Reineke

# Fast addition of partial products

Goal: Fast addition of n partial products of length 2n.

First approach: Use carry-lookahead adders (CLAs).

Cost:  $O(n^2)$ 

Depth:

- O(n · log n) if partial products are summed up one by one linearly
- O(log<sup>2</sup> n) (= O((log n)<sup>2</sup>) for tree-shaped summation of partial products

### Brainstorming: Addition of partial products:

Can we do better than adding up the partial products in O(log<sup>2</sup>n)?

#### Fast addition of **n partial products**

Use carry-save adders.

Reduction of three input values u, v, w into two output values s, c s.t.  $\langle u \rangle + \langle v \rangle + \langle w \rangle = \langle s \rangle + \langle c \rangle$ .

|           | $u_{n-1}$                      | $\mathcal{U}_{n-2}$            | ••• | $u_2$                 | $u_1$                 | $u_0^{}$       |
|-----------|--------------------------------|--------------------------------|-----|-----------------------|-----------------------|----------------|
|           | $V_{n-1}$                      | $V_{n-2}$                      |     | $v_2$                 | $v_1$                 | $v_0^{}$       |
|           | $W_{n-1}$                      | $W_{n-2}$                      |     | $W_2$                 | $W_1$                 | W <sub>0</sub> |
| $C_{n-1}$ | <i>C</i> <sub><i>n</i>-2</sub> | <i>C</i> <sub><i>n</i>-3</sub> | ••• | <i>C</i> <sub>1</sub> | C <sub>0</sub>        | 0              |
| 0         | $S_{n-1}$                      | $S_{n-2}$                      |     | S <sub>2</sub>        | <i>S</i> <sub>1</sub> | S <sub>0</sub> |

Solved by juxtaposition of independent full adders (not in a chain!)

#### Carry-save adder (also: 3:2 adder)



System Architecture, Jan Reineke

### 1. Serial solution

- Cascade connection of n-2 CSavAs of length 2n:
  → Combine n partial products into two 2n-bit words
- Add the last two 2n-bit words using a CLA
- Cost:  $O(n^2)$ , Depth: O(n)



# 2. Tree-shaped solution

- Combine two carry-save adders to reduce four 2n-bit input words into two output words: 4-to-2 reducer
- Balanced binary tree of 4-to-2 reducers to summarize the n partial products using two 2n-bit words
- Addition of the two final 2n-bit words using a CLA



#### 4-to-2 reducer

# Adder stage of the log-time multiplier for 16 bits



### Construction of an ALU

#### ALU = arithmetic logic unit for

performing basic arithmetic and logic operations

#### Here: n-bit-ALU with:

- 2 n-bit operands a, b, carry-in c
- m-bit select input, which selects the function to execute
- (n+1)-bit output

#### Schematic of an n-bit ALU



### Example ALU specification

#### Here: 8 functions, i.e. 3-bit select input

| Function number |       | umber          | ALU function                                                                                                           |  |  |  |  |
|-----------------|-------|----------------|------------------------------------------------------------------------------------------------------------------------|--|--|--|--|
| s <sub>2</sub>  | $s_1$ | s <sub>0</sub> |                                                                                                                        |  |  |  |  |
| -               | 0     | -              | 0 0                                                                                                                    |  |  |  |  |
| 0               | 0     | 1              | [b] – [a]                                                                                                              |  |  |  |  |
| 0               | 1     | 0              | [b] - [a]<br>[a] - [b]<br>[a] + [b] + c                                                                                |  |  |  |  |
| 0               | 1     | 1              | [a] + [b] + c                                                                                                          |  |  |  |  |
|                 | 0     |                | $\mathbf{a} \oplus \mathbf{b} = (\mathbf{a}_{n-1} \oplus \mathbf{b}_{n-1},, \mathbf{a}_0 \oplus \mathbf{b}_0)$         |  |  |  |  |
| 1               | 0     | 1              | $a \oplus b = (a_{n-1} \oplus b_{n-1},, a_0 \oplus b_0)$<br>$a \lor b = (a_{n-1} \lor b_{n-1},, a_0 \lor b_0)$ "Logic" |  |  |  |  |
| 1               | 1     | 0              | $a \wedge b = (a_{n-1} \wedge b_{n-1},, a_0 \wedge b_0)$                                                               |  |  |  |  |
| 1               | 1     | 1              | 1 1                                                                                                                    |  |  |  |  |

#### Possible implementations of an ALU

#### 1. Possibility:

Implement  $f_0$ , ...,  $f_{2^n-1}$  separately via circuits  $C_i$  for  $f_i$ ; then select correct output via generalized multiplexer

(see upcoming figure)

#### 2. Possibility:

Shared circuit for similar functions (see upcoming figure)

#### 1. Possible implementation



## 2. Possible implementation



# *Outlook:* Datapath and instruction execution

