# System Architecture Summer Semester 2023 – Recap Jan Reineke Universität des Saarlandes ### Reminder - Exam Registration - Final Exam: July 25, 2023, 10:00-12:00 - Registration in LSF: July 19, 2023 (Today!) #### 1. Boolean Calculus ### 1. Boolean Calculus - Key Items - Partial and total Boolean functions - Truth tables, On-Set, Off-Set - Boolean algebra, axioms, laws, duality principle - Boolean expressions, syntax, semantics - Literals, monomials, polynomials - (Canonical) disjunctive/conjunctive normal form ### 2. PLAs and Logic Synthesis #### 2. PLAs and Logic Synthesis - Key Items - n-type and p-type transistors - Programmable Logic Arrays (PLAs) - Implementation of monomials and polynomials in PLAs - Cost of monomials and polynomials - Two-level logic minimization, and the corresponding covering problem on the hypercube ### 3. Implicants and Prime Implicants An implicant of f is a monomial q with $\psi(q) \le f$ . A prime implicant of f is a maximal implicant q of f. #### Theorem (Quine): Every minimal polynomial p of a Boolean function f consists only of prime implicants of f. #### Theorem (Implicants): A monomial m is an implicant of f if and only if, either - m is a minterm of f, or - m·x and m·x' are implicants of f for a variable x that does not occur in m. ``` Thus: m \in Implicant(f) \Leftrightarrow [m \in Minterm(f)] \vee [m \cdot x, m \cdot x' \in Implicant(f)] ``` #### 3. Implicants and Prime Implicants - Key Items - Implicants and prime implicants - Minimal polynomial - Theorem of Quine - Characterization of implicants ### 4. Quine/McCluskey Algorithm ``` Quine-Prime-Implicants(f: B^n \rightarrow B) L_0 := Minterm(f) i := 1 Prime(f) := \emptyset while (L_{i\cdot 1} \neq \emptyset) and (i \leq n) L_i := \{m \mid |m| = n \cdot i, m \cdot x \text{ and } m \cdot x' \text{ are in } L_{i\cdot 1} \text{ for some } x\} P_i := \{m \mid m \in L_{i\cdot 1} \text{ and } m \text{ is not covered by any } m' \in L_i\} Prime(f) := Prime(f) \cup P_i i := i + 1 return Prime(f) \cup L_{i\cdot 1} ``` # McCluskey's Improvement #### Compare only those monomials - that contain the same variables, and - whose number of positive literals differs by one. ### 4. Quine/McCluskey Algorithm Matrix-covering problem #### 1. Reduction Rule: Remove from the prime implicant table PIT(f) all essential prime implicants and all minterms that are covered by these prime implicants. #### 2. Reduction Rule: Remove all minterms from the prime implicant table PIT(f) that dominate another minterm in PIT(f). #### 3. Reduction Rule Remove all prime implicants from the prime implicant table PIT(f) that are dominated by other prime implicants that are not more expensive. #### 4. Quine/McCluskey Algorithm - Key Items - Quine's algorithm - McCluskey's improvement - Correctness and complexity of the Quine-McCluskey algorithm - The matrix covering problem - Three reduction rules - Essential prime implicants - Column domination - Row domination - Cyclic covering problems, Petrick's method #### 5. Combinatorial Circuits The depth depth(C) of a circuit C is the maximal number of gates on a path from an arbitrary input $x_i$ to an arbitrary output $y_j$ of C. ### 5. Combinatorial Circuits - Key Items - Logic gates, cell library - Circuits - Semantics of circuits - Concrete and symbolic simulation - Cost and depth of circuits - Hierarchical circuits - Circuits vs Boolean functions - Implementation or associative operations ### 6. Number Representations #### Questions: - 1. How to represent *natural numbers*? - 2. How to represent *integers*? Challenge: negative numbers - 3. How to represent rational numbers? - 4. How to represent very large and very small numbers? fixed-point numbers floating-point numbers Binary numbers: $\langle d_n d_{n-1} ... d_0 \rangle := \sum_{i=0,...,n} d_i \cdot 2^i$ Two's complement: $[d_n d_{n-1} ... d_0]_2 := \sum_{i=0,...,n-1} d_i \cdot 2^i \cdot d_n \cdot 2^n$ Fixed-point numbers: $[d_n d_{n-1} ... d_0, d_1 ... d_k]_2 := \sum_{i=-k,...,n-1} d_i \cdot 2^i - d_n \cdot 2^n$ Floating-point numbers: sign, exponent, mantissa #### 6. Number Representations - Key Items - Numerals (digits), binary, decimal, hexadecimal - Positional numeral system - Natural numbers - Signed-magnitude representation, One's complement, Two's complement - Fixed-point numbers - Floating-point numbers #### 7. Arithmetic Circuits: Adders #### Definition of an Adder #### Ripple-carry Adders #### Half and Full adders $$C(RC_n) = n \cdot C(FA) = 5n$$ $$depth(RC_n) = 3 + 2(n-1)$$ #### 7. Arithmetic Circuits: Adders Lower bounds for adders! $C(+) > 2n \qquad donth(+) > 1$ $$C(+_n) \ge 2n$$ , depth $(+_n) \ge \log(n) + 1$ $$depth(CSA_n) = 3 \log_2 n + 3$$ $$C(CSA_n) = 10n^{\log 3} \cdot 3n \cdot 2$$ #### 7. Arithmetic Circuits: Adders Let M be a set and $o: M \times M \to M$ an associative operation. The parallel prefix sum PP<sup>n</sup>: $M^n \to M^n$ is defined as follows: $$PP^{n}(x_{n-1}, ..., x_{0}) = (x_{n-1} o x_{n-2} ... o x_{0}, ..., x_{1} o x_{0}, x_{0})$$ $depth(PP^n) < (2 \cdot log_2 n) \cdot depth(o)$ $C(PP^n) \le 2n \cdot C(o)$ Generated carry $g_{j,i}$ from i to j: $c_j = 1$ independently of $c_{i-1}$ . Propagated carry $p_{j,i}$ from i to j: $c_j = 1$ if and only if also $c_{i-1} = 1$ Generated and propagated carries can be captured as parallel prefixes of associative operator. #### 7. Arithmetic Circuits: Adders - Key Items - Definition of adder - Half adder, Full adder - Ripple-carry adder, correctness, cost, depth - Recursive constructions, inductive proofs - Incrementer, Multiplexer - Lower bounds on cost and depth - Conditional-sum adder, divide-and-conquer - Addition in two's complement #### 7. Arithmetic Circuits: Adders - Key Items - Parallel prefix operation and circuit - Generated and propagated carries - Carry-lookahead adder #### 8. Arithmetic Circuits #### Multiplication matrix #### Adder stage of log-time multiplier #### 8. Arithmetic Circuits ### 8. Arithmetic Circuits - Key Items - Subtractor, combined adder/subtractor - Multiplier - Carry-save adder - Arithmetic logic unit ## 9. Sequential Circuits ### 9. Sequential Circuits #### Mealy Machines #### Moore Machines ### 9. Sequential Circuits - Key Items - SR latch, D latch, D flip-flop - Register - Random access memory (RAM) - Decoder - Static RAM and Dynamic RAM - Sequential circuits - Finite state machines, Mealy and Moore machines - Synthesis of sequential circuits ### 10. Verilog #### Encapsulation ``` a → Verilog module → y c → Verilog module → y c → Verilog module → y c → Verilog module → y c → verilog module → y input [1:0] b, input c, output y ); // functionality ``` # Hierarchical circuits ``` module fulladder( input a0, input b0, input c, output s0, output s1 ); wire ha0c, ha0s, ha1c; halfadder ha0(.a(a0), .b(b0), .c(ha0c), .s(ha0s)); halfadder ha1(.a(ha0s), .b(c), .c(ha1c), .s(s0)); assign s1 = ha0c | ha1c; endmodule ``` $a_0$ $b_0$ HA Blocking vs non-blocking assignments #### Blocking assignment ``` reg x, y; always @(posedge clk) begin x = y; y = x; end ``` Assignments are blocking, i.e., they are executed **sequentially** $\Rightarrow x$ and y are equal #### Non-blocking assignment ``` reg x, y; always @(posedge clk) begin x <= y; y <= x; end Assignments are non-blocking, i.e., are conceptually executed in parallel x and y are exchanged</pre> ``` ### 10. Verilog – Key Items - Hardware description languages - Simulation and hardware synthesis - Blocking vs non-blocking assignments - Test benches #### 11. Instruction Set Architecture **Instruction set architecture** (or just "architecture") - = set of instructions, their **encoding** and **semantics** - = "What" a computer computes For example: x86, ARM Assembly language = textual representation Assembler + Linker Machine language = binary representation #### MIPS instruction set - instructions - encoding Instruction Set Architecture System Memiceture, Jan Remek #### 11. Instruction Set Architecture - Key Items - Instruction set architecture - Assembly language - Machine language, instruction encoding - MIPS instruction set - Addressing modes - Little endian, big endian - Sign extension #### 12. Microarchitecture #### Microarchitecture - = concrete implementation of an instruction set in hardware - = "How" a computer works; e.g. Intel Skylake, AMD Zen 3, Apple M1 ### 12. Microarchitecture - Key Items - Microarchitecture - Datapath, control - Petri nets - Single-cycle system - Main decoder, ALU decoder - ALU implementation ### 13. Performance: Basic Concepts Latency = time required to perform a single task Throughput = number of tasks performed in one time unit Processor time = Number of executed instructions - \* Cycles per instruction - \* Cycle time Technological developments Performance: Basic concepts #### 13. Performance: Basic Concepts - Key Items - Performance definitions, latency, throughput - Execution time, response time, processor time - Cycles per instruction, cycle time - Moore's law - Reduced Instruction Set Computer (RISC), Complex Instruction Set Computer (CISC) - Pipelining ## 14. Pipelining ### 15. Advanced Pipelining Concepts Deep + Superscalar Pipelines Best case CPI = 1/width Tomasulo Algorithm Speculative Execution Predict outcome of branches + Execute instructions speculatively Reorder Buffer LOAD R3 $\leftarrow$ 0(R1) ... $\leftarrow$ head BEQ R2, R3, 40 ... ADD R4 $\leftarrow$ R6, R7 ... MUL R5 $\leftarrow$ R6, R8 ... ADD R7 $\leftarrow$ R9, R9 ... ADD R3 $\leftarrow$ R6, R8 ... ... $\leftarrow$ tail # 15. Advanced Pipelining Concepts – Key Items - Flynn bottleneck - Out-of-order execution - Reservation stations - Tomasulo algorithm - Speculative execution ## 16. Memory Hierarchy, Caches #### 16. Memory Hierarchy, Caches - Key Items - Memory technologies, Memory Gap - Memory Hierarchy - Scratchpad memory, Caches - Fully-associative, direct-mapped, set-associative - Replacement policy - Optimal replacement, Farthest-in-the-Future - Least-recently-used (LRU), First in, first out (FIFO) - Online and offline algorithms - Temporal and spatial locality ## 17. ISA, µArchitecture, and Bad News from the Real World #### Correct ISA Implementation? #### Flush+Reload - 1. FLUSH memory line - 2. Wait a bit - 3. Measure time to **RELOAD** line #### Leaky Abstractions rdtsc: "read time-stamp counter" Spectre Attack ``` Extracts a bit of "value" ``` ``` "JavaScript"- Code: value = some_array[offset]; tmp = other_data[(value>>bit)&1]; } ``` 2. Secret-dependent memory access System Architecture, Jan Reineko ## 17. ISA, μArchitecture, and Bad News from the Real World - Key Items - Correct ISA implementation - Flush+Reload - Prime+Probe - Spectre attack #### 19. Virtualization: The CPU Direct execution User processes Operating System Hardware User processes Operating System Hardware #### Limited direct execution Preemptive Multitasking ### 19. Virtualization: The CPU - Key Items - Process, process vs program - Direct execution - Restricted operations - User mode vs kernel mode - System calls, exception handling - Mechanism vs policy - Context switches - Cooperative vs preemptive multitasking ### 20. Persistence: I/O Devices - Status checks: polling vs. interrupts - Data: Programmed-IO vs. DMA - Control: special instructions vs. memory-mapped I/O Persistence: I/O devices System Architecture, Jan Reineke #### 20. Persistence: I/O Devices - Key Items - I/O devices - Busy waiting/polling vs interrupts - Programmed I/O vs Direct Memory Access (DMA) - Drivers ## 21. Scheduling #### Performance metrics Turnaround time Response time Throughput Fairness Meet deadlines #### Scheduling policies First Come, First Served (FCFS) Shortest Job First (SJF) Shortest Time-to-Completion First (STCF) Round Robin Multi-Level Feedback Queue (MLFQ) System Architecture, Jan Reineke Throughput Turnaround time Response time Fairness Combination ## 21. Scheduling – Key Items - Dispatcher vs Scheduler - Workload, Performance metric - Turnaround time, Response time, Throughput, Overhead, Fairness - FIFO (also FCFS), Convoy effect - Shortest Job First (SJF), Shortest Time-to-Completion First (STCF) - Round Robin - Multi-Level Feedback Queue (MLFQ) - Starvation - Voodoo constants ### 22. Memory Virtualization Foundations ## 22. Memory Virtualization Foundations – Key Items - Transparency, protection, efficiency, sharing - Address space - Static: code and global data, dynamic: stack and heap - Time sharing, Static relocation, Dynamic relocation, Segmentation - Memory Management Unit (MMU) - Base and bounds - Segment table ## 23. Paging ### 23. Paging - Key Items - Internal and external fragmentation - Paging - Pages, page frames - Page number, frame number, page offset - Virtual address, physical address - Page table - Valid bit, protection bits #### 24. Translation Lookaside Buffers "Naïve" paging too slow two physical accesses for every virtual access #### Design choices page sizes associativity replacement policy #### 24. Translation Lookaside Buffers - Key Items - Translation Lookaside Buffer - Direct-mapped, fully-associative, set-associative - Influence of page size on performance - Influence of locality on performance - TLB replacement policies - Address space identifiers (ASIDs) - TLB miss handling ## 25. Smaller Page Tables #### 25. Smaller Page Tables - Key Items - Invalid page table entries - Segmented page tables - Multi-level page tables - Outer page, inner page - Page tables fit within pages ### 26. Swapping #### Pages can be in memory or on disk #### 26. Swapping- Key Items - Swapping - Present bit in page table - Page fault - HW + OS cooperate on address translation - Precise interrupts - Page selection and Page replacement - Demand paging, Prefetching, Hints - Clock algorithm ### 28. Persistence: Disks + I/O Scheduling I/O Scheduling Shortest Positioning Time First SCAN algorithms Anticipatory schedulers Time to read/write Seek → slow Rotation $\rightarrow$ slow Transfer time $\rightarrow$ fast Performance depends on workload | Workload | Toshiba | Seagate Exos | |------------|----------|--------------| | Sequential | 290 MB/s | 261 MB/s | | Random | 1 MB/s | 0,47 MB/s | ## 28. Persistence: Disks + I/O Scheduling – Key Items - Persistent vs volatile memory - platter, surface, spindle, cylinder, track, sector, read/write unit, read/write head - Seek, rotation, and transfer - Throughput on sequential and random workloads - Shortest Positioning Time First (SPTF), Shortest Seek Time First (SSTF) - SCAN algorithms, Elevator algorithm, C-SCAN - Work conservation, anticipatory schedulers #### 29. Persistence: Flash-based Solid State Disks #### Solid-state storage devices No mechanical or moving parts like HDD Built out of transistors; but persistent unlike typical RAM Hierarchical organization - electrons can be **trapped in** the floating gate - electrons do not escape → persistent memory Read: at page granularity Write: $1 \rightarrow 0$ : at page granularity Erase: $0 \rightarrow 1$ : only at block granularity ## 29. Persistence: Flash-based Solid State Disks – Key Items - Solid-state storage devices - Floating-gate transistors - Single-level cells, multi-level cells, etc. - Basic operations: read, write, erase - Reliability: wear out - Out-of-place update - Flash Translation Layer (FTL) #### 30. Error Detection and Correction Hamming distance dist(00001101,10001100) = 2 #### Lemma (Error Detection) A fixed-length code c is k-error detecting iff dist(c) $\geq k+1$ . Parity code Hamming code Lemma (Error Correction) A fixed-length code c is k-error correcting iff dist(c) $\geq 2k+1$ . ## 30. Error Detection and Correction – Key Items - (Fixed-length) codes - Hamming distance, code distance - k-error detecting, k-error correcting - Repetition code - Parity code - Hamming code ## The End.