## An Introduction to Microarchitectural Attacks

Jan Reineke

Based on slides kindly provided by Yuval Yarom, The University of Adelaide and Data61 Ruhr University Bochum



#### **Abstraction Layers**

| High-level Languages<br>Compiler                                                         |           | Illusion (ISA) | Reality<br>(mArchitecture)                                 |
|------------------------------------------------------------------------------------------|-----------|----------------|------------------------------------------------------------|
| Assembly<br>Assembler                                                                    | Hardware  | Dedicated      | Shared                                                     |
| Machine Code                                                                             | Memory    | Uniform        | Non-uniform                                                |
| Operating System                                                                         | Execution | Serial         | Superscalar                                                |
| Instrution Set Architectur (SA)<br>implements<br>Microarchitecture<br>realite<br>realize |           | etween abst    | d breathes in the raction layers.<br>ullien (@halvarflake) |

#### CPU vs. Memory



Processor Speed Memory Latency

1 MHz

500 ns



8\*2600 MHz



#### Bridging the gap

Cache utilises locality to bridge the gap

- Divides memory into *lines*
- Stores recently used lines
- In a *cache hit,* data is retrieved from the cache
- In a cache miss, data is retrieved from memory and inserted to the cache





#### **Cache Consistency**

- Memory and cache can be in inconsistent states
  - Rare, but possible
- Solution: Flushing the cache contents
  - Ensures that the next load is served from the memory











#### FLUSH+RELOAD [YF14]

- FLUSH memory line
- Wait a bit
- Measure time to **Reload** line
  - slow-> no access
  - fast-> access
- Repeat





Memory

#### The RSA Encryption System

• The RSA encryption is a public key cryptographic scheme



Key Generation:

- Select random primes p and q
- Calculate N = pq
- Select a public exponent e(=65537)
- Compute  $d=e^{-1} \mod \varphi(N)$
- (*N*, *e*) is the public key
- (p, q, d) is the private key



#### GnuPG 1.4.13 Exponentiation



#### Flush+Reload on GnuPG 1.4.13



#### The FLUSH+RELOAD Technique

- Leaks information on victim access to shared memory.
- Spy monitors victim's access to shared code
  - Spy can determine what victim does
  - Spy can infer the data the victim operates on

#### **Set Associative Caches**

- Memory lines map to *cache sets*. Multiple lines map to the same set.
- Sets consist of *ways*.
   A memory line can be stored in **any** of the ways of the set it maps to.
- When a cache miss occurs, one of the lines in the set is *evicted*.



### The Prime+Probe Attack [OST06]

- Allocate a cache-sized memory buffer
- Prime:

fills the cache with the contents of the buffer

- Probe:
  - measure the time to access each cache set
  - Slow access indicates victim access to the set
- The probe phase primes the cache for the next round





Memory

Sample Victim: Data Rattle

}

```
volatile char buffer[4096];
```

```
int main(int ac, char **av) {
  for (;;) {
    for (int i = 0; i < 64000; i++)
      buffer[800] += i;
    for (int i = 0; i < 64000; i++)
      buffer[1800] += i;</pre>
```

# Cache Fingerprint of the Rattle Program



#### Real Victim – AES

|                                       | <pre>static const u32 Te0[256] = {</pre>                                                    |
|---------------------------------------|---------------------------------------------------------------------------------------------|
| <pre>\$0 = GETU32(in ) ^ rk[0];</pre> | 0xc66363a5U, 0xf87c7c84U, 0xee777799U, 0xf67b7b8dU,                                         |
| S1 = GETU32(in + 4) + [K[1];          | 0xfff2f20dU, 0xd66b6bbdU, 0xde6f6fb1U, 0x91c5c554U,                                         |
| $s2 = GETU32(in + 8) ^ rk[2];$        | 0x60303050U, 0x02010103U, 0xce6767a9U, 0x562b2b7dU,                                         |
| $s3 = GETU32(in + 12) ^ rk[3];$       | 0xe7fefe19U, 0xb5d7d762U, 0x4dababe6U, 0xec76769aU,                                         |
| <pre>#ifdef FULL_UNROLL</pre>         | 0x8fcaca45U, 0x1f82829dU, 0x89c9c940U, 0xfa7d7d87U,                                         |
| /* round 1: */                        | 0xeffafa15U, 0xb25959ebU, 0x8e4747c9U, 0xfbf0f00bU,                                         |
| t0 = Te0[s0 >> 24D ^ Te1[(s1 >>       | 0x41adadecU, 0xb3d4d467U, 0x5fa2a2fdU, 0x45afafeaU,                                         |
| t1 = Te0[si >> 24] ^ Te1[(s2 >>       | 0x239c9chfll 0x53a4a4f7ll 0xe4727296ll 0x9bc0c05bll                                         |
| t2 = Te0[s2 >> 24] ^ Te1[(s3 >>       |                                                                                             |
| t3 = Te0[s3 >> 24] ^ Te1[(s0 >>       |                                                                                             |
| /* round 2: */                        |                                                                                             |
|                                       | 16) & 0xff] ^ Te2[(t2 >> 8) & 0xff] ^ Te3[t3 & 0xff] ^ rk[ 8];                              |
|                                       | 16) & 0xff] ^ Te2[(t3 >> 8) & 0xff] ^ Te3[t0 & 0xff] ^ rk[ 9];                              |
|                                       | 16) & 0xff] ^ Te2[(t0 >> 8) & 0xff] ^ Te3[t1 & 0xff] ^ rk[10];                              |
|                                       | <pre>16) &amp; 0xff] ^ Te2[(t1 &gt;&gt; 8) &amp; 0xff] ^ Te3[t2 &amp; 0xff] ^ rk[11];</pre> |
| /* round 3: */                        | <pre>16) &amp; 0xff] ^ Te2[(s2 &gt;&gt; 8) &amp; 0xff] ^ Te3[s3 &amp; 0xff] ^ rk[12];</pre> |
|                                       | 16) & $0xff$ ] ^ Te2[(s2 >> 8) & $0xff$ ] ^ Te3[s0 & $0xff$ ] ^ rk[13];                     |
|                                       | 16) & $0xff$ ] ^ Te2[(s0 >> 8) & $0xff$ ] ^ Te3[s1 & $0xff$ ] ^ rk[14];                     |
|                                       | 16) & $0xff$ ] ^ Te2[(s1 >> 8) & $0xff$ ] ^ Te3[s2 & $0xff$ ] ^ rk[15];                     |
| /* round 4: */                        |                                                                                             |
|                                       | <pre>16) &amp; 0xff] ^ Te2[(t2 &gt;&gt; 8) &amp; 0xff] ^ Te3[t3 &amp; 0xff] ^ rk[16];</pre> |
|                                       |                                                                                             |

#### **AES T-table access**

| static | const u32  | Te0[256] = { |              |              |
|--------|------------|--------------|--------------|--------------|
| 0x0    | :66363a5U, | 0xf87c7c84U, | 0xee777799U, | 0xf67b7b8dU, |
| 0xf    | ff2f20dU,  | 0xd66b6bbdU, | 0xde6f6fb1U, | 0x91c5c554U, |
| 0x6    | 60303050U, | 0x02010103U, | 0xce6767a9U, | 0x562b2b7dU, |
| 0ve    | 7fofo1011  | 0vh5d7d76211 | Qv/dahahe611 | Avec76760all |
|        |            |              |              |              |

- Assume we know the plaintext and the index (s0>>24)
  - We can recover the most significant byte of the key

#### **AES T-tables and cache lines**

#### static const u32 Te0[256] = {

|              |                         |              | $[e0[256] = \{$         | static const u32 |
|--------------|-------------------------|--------------|-------------------------|------------------|
|              | 0xf67b7b8dU,            | 0xee777799U, | 0xf87c7c84U,            | 0xc66363a5U,     |
| Cache Line 0 | 0x91c5c554U,            | 0xde6f6fb1U, | 0xd66b6bbdU,            | 0xfff2f20dU,     |
|              | 0x562b2b7dU,            | 0xce6767a9U, | 0x02010103U,            | 0x60303050U,     |
|              | 0xec76769aU,            | 0x4dababe6U, | 0xb5d7d762U,            | 0xe7fefe19U,     |
|              | 0xfa7d7d87U,            | 0x89c9c940U, | 0x1f82829dU,            | 0x8fcaca45U,     |
| Cache Line 1 | 0xfbf0f00bU,            | 0x8e4747c9U, | 0xb25959ebU,            | 0xeffafa15U,     |
|              | 0x45afafeaU,            | 0x5fa2a2fdU, | 0xb3d4d467U,            | 0x41adadecU,     |
|              | 0x9bc0c05bU,            | 0xe4727296U, | 0x53a4a4f7U,            | 0x239c9cbfU,     |
|              | 0x4c26266aU,            | 0x3d9393aeU, | <pre>0xe1fdfd1cU,</pre> | 0x75b7b7c2U,     |
| Cache Line 2 | 0x83cccc4fU,            | 0xf5f7f702U, | 0x7e3f3f41U,            | 0x6c36365aU,     |
|              | 0xf9f1f108U,            | 0xd1e5e534U, | 0x51a5a5f4U,            | 0x6834345cU,     |
|              | 0x2a15153fU,            | 0x62313153U, | 0xabd8d873U,            | 0xe2717193U,     |
|              | 0x9dc3c35eU,            | 0x46232365U, | 0x95c7c752U,            | 0x0804040cU,     |
| Cache Line 3 | 0x2f9a9ab5U,            | 0x0a05050fU, | 0x379696a1U,            | 0x30181828U,     |
| Cache Line 5 | 0xdfe2e23dU,            | 0x1b80809bU, | 0x24121236U,            | 0x0e070709U,     |
|              | 0xea75759fU,            | 0x7fb2b2cdU, | 0x4e272769U,            | 0xcdebeb26U,     |
|              | 0x341a1a2eU,            | 0x582c2c74U, | 0x1d83839eU,            | 0x1209091bU,     |
| Cache Line 4 | 0x5ba0a0fbU,            | 0xb45a5aeeU, | 0xdc6e6eb2U,            | 0x361b1b2dU,     |
| Cache Line 4 | 0x7db3b3ceU,            | 0xb7d6d661U, | 0x763b3b4dU,            | 0xa45252f6U,     |
|              | 0x13848497U,            | 0x5e2f2f71U, | 0xdde3e33eU,            | 0x5229297bU,     |
|              | <pre>0xc1eded2cU,</pre> | 0×00000000U, | 0xb9d1d168U,            | 0xa65353f5U,     |
| Cache Line 5 | 0xb65b5bedU,            | 0x79b1b1c8U, | <pre>0xe3fcfc1fU,</pre> | 0x40202060U,     |
| Cache Line 5 | 0x7239394bU,            | 0x67bebed9U, | 0x8dcbcb46U,            | 0xd46a6abeU,     |
|              | 0x85cfcf4aU,            | 0xb05858e8U, | 0x984c4cd4U,            | 0x944a4adeU,     |
|              | 0xedfbfb16U,            | 0x4faaaae5U, | 0xc5efef2aU,            | 0xbbd0d06bU,     |
|              | 0x11858594U,            | 0x66333355U, | 0x9a4d4dd7U,            | 0x864343c5U,     |
|              | 0xfe7f7f81U,            | 0x04020206U, | 0xe9f9f910U,            | 0x8a4545cfU,     |
|              | Av/haQaQaQII            | 0v250f0fhall | Av783c3c4411            | 0va05050f011     |

## AES T-tables and cache lines

|              |              |              | $Te0[256] = {$          | static const u32 |
|--------------|--------------|--------------|-------------------------|------------------|
|              | 0xf67b7b8dU, | 0xee777799U, | 0xf87c7c84U,            | 0xc66363a5U,     |
| Cache Line 0 | 0x91c5c554U, | 0xde6f6fb1U, | 0xd66b6bbdU,            | 0xfff2f20dU,     |
|              | 0x562b2b7dU, | 0xce6767a9U, | 0x02010103U,            | 0x60303050U,     |
|              | 0xec76769aU, | 0x4dababe6U, | 0xb5d7d762U,            | 0xe7fefe19U,     |
|              | 0xfa7d7d87U, | 0x89c9c940U, | 0x1f82829dU,            | 0x8fcaca45U,     |
| Cache Line 1 | 0xfbf0f00bU, | 0x8e4747c9U, | 0xb25959ebU,            | 0xeffafa15U,     |
|              | 0x45afafeaU, | 0x5fa2a2fdU, | 0xb3d4d467U,            | 0x41adadecU,     |
|              | 0x9bc0c05bU, | 0xe4727296U, | 0x53a4a4f7U,            | 0x239c9cbfU,     |
|              | 0x4c26266aU, | 0x3d9393aeU, | <pre>0xe1fdfd1cU,</pre> | 0x75b7b7c2U,     |
| Cache Line 2 | 0x83cccc4fU, | 0xf5f7f702U, | 0x7e3f3f41U,            | 0x6c36365aU,     |
|              | 0xf9f1f108U, | 0xd1e5e534U, | 0x51a5a5f4U,            | 0x6834345cU,     |
|              | 0x2a15153fU, | 0x62313153U, | 0xabd8d873U,            | 0xe2717193U,     |
|              | 0x9dc3c35eU, | 0x46232365U, | 0x95c7c752U,            | 0x0804040cU,     |
|              | 0x2f9a9ab5U, | 0x0a05050fU, | 0x379696a1U,            | 0x30181828U,     |
|              |              |              |                         |                  |

- If 0≤plaintext^key<16, Cache Line 0 is accessed.
- What if plaintext^key≥16?

#### Prime+Probe Attack on AES

s0 = plaintext ^ key
t0 = Te0[s0>>24]

- For many plaintexts do: Prime, Encrypt, Probe
- Calculate the average probe time of each cache set as a function of the plaintext value
- Extract key from results

#### **PP Attack on AES - Results**



#### PP Attack on AES – More Results

plaintext[5..8]



29

**Other Techniques** (a very partial list)

- Evict+Time [OST06]
- Branch prediction [AKS06, ERAP18,...]
- L1-I Prime+Probe [Aci07]
- LLC Prime+Probe [LYG+15, IES15]
- Flush+Flush [GMWM15]
- CacheBleed [YGH17]
- TLBleed [GRBG18]
- PortSmash [CBH+18]
- SPOILER [IMB+19]

#### OpenSSL

*LOW Severity*. This includes issues such as those that ... or hard to exploit timing (side channel) attacks.

https://www.openssl.org/policies/secpolicy.html

- Attacks are easy, but at the same time
  - Publications are terse technical details are often omitted
  - Generic tools do not exist

#### Mastik

#### Extremely bad acronym for

#### Micro-Architectural Side-channel ToolKit

- Original Aims
  - Collate information on SC attacks
    - Improve our understanding of the domain
    - Provide somewhat-robust implementations of all known SC attack techniques for every architecture
    - Implementation of generic analysis techniques
  - Reduce barriers to entry into the area
  - Shift focus to cryptanalysis

#### **Current Status**

- Reasonably robust implementation of six attacks
  - Prime+Probe on L1-D, L1-I and L3
  - Flush+Reload
  - Flush+Flush
  - Performance degradation
- Only Intel x86-64, on Linux and Mac (limited)
  - x86-32 and limited ARM currently working in the lab
- Zero documentation, little testing
- Little user feedback

https://cs.adelaide.edu.au/~yval/Mastik/





No need to program

FR-trace -s 2000 -c 100000 -f ./gpg \

- -m mpih-mul.c:85 \
- -m mpih-mul.c:271 \
- -m mpih-div.c:356

#### **Speculative Execution Attacks**

#### **Microarchitectural channels**

Program execution leaves traces
 inside the processor



• We believe that hardware is working correctly. It is therefore the responsibility of software that processes sensitive material to introduce the appropriate countermeasures. (Intel)

> adc \\$0,%rdx add \$A[0],\$N[0] adc \\$0,%rdx mov \$N[0],24(\$tp) mov %rdx,\$N[1]

#### **Instruction Pipelining**

- Nominally, the processor executes instructions one after the other
- Instruction execution consists of multiple steps
  - Each uses a different unit

| Instruction Instruction<br>Fetch Decode | Argument<br>Fetch | Execute | Write Back |
|-----------------------------------------|-------------------|---------|------------|
|-----------------------------------------|-------------------|---------|------------|

| _      |                 |
|--------|-----------------|
| mulq   | \$m0            |
| add    | %rax,\$A[0]     |
| mov    |                 |
|        | 32(\$tp),\$tp   |
|        | \\$0,%rdx       |
|        | %rdx,\$A[1]     |
|        |                 |
| mulq   |                 |
|        | %rax,\$N[0]     |
|        | 8(\$a,\$j),%rax |
| adc    | \\$0,%rdx       |
| add    | \$A[0],\$N[0]   |
| adc    | \\$0,%rdx       |
|        | \$N[0],-        |
| 24(\$t |                 |
|        | %rdx, \$N[1]    |
| mulq   |                 |
|        | %rax,\$A[1]     |
|        | 8*1(\$np),%rax  |
|        |                 |
|        | \\$0,%rdx       |
|        | %rdx,\$A[0]     |
| mulq   |                 |
|        | %rax,\$N[1]     |
| mov    | (\$a,\$j),%rax  |
| mov    | 8(\$a,\$j),%rax |
| adc    | \\$0,%rdx       |
|        | •               |

#### **Instruction Pipelining**

- Nominally, the processor executes instructions one after the other
- Instruction execution consists of multiple steps
  - Each uses a different unit
- Pipelining increases utilisation by executing steps of multiple instructions

| Instruction<br>Fetch | Instruction<br>Decode | Argument<br>Fetch | Execute | Write Back |
|----------------------|-----------------------|-------------------|---------|------------|
| Instruction<br>Fetch | Instruction<br>Decode | Argument<br>Fetch | Execute | Write Back |
| Instruction<br>Fetch | Instruction<br>Decode | Argument<br>Fetch | Execute | Write Back |
| Instruction<br>Fetch | Instruction<br>Decode | Argument<br>Fetch | Execute | Write Back |
| Instruction<br>Fetch | Instruction<br>Decode | Argument<br>Fetch | Execute | Write Back |

|   |   |   | , | 1        |   |
|---|---|---|---|----------|---|
| С | Ξ | a | / | b;<br>5: |   |
|   |   |   |   |          |   |
| Р | = | С | + | 5:       | 1 |

| mov<br>lea                                                                 | <pre>%rax,\$A[0]<br/>8*2(\$np),%rax<br/>32(\$tp),\$tp</pre>                                                                                                      |
|----------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| adc<br>mov<br>add<br>adc<br>add<br>adc<br>mov<br>mov<br>mulq<br>add<br>mov | <pre>\\$0,%rdx %rdx,\$A[1] \$m1 %rax,\$N[0] 8(\$a,\$j),%rax \\$0,%rdx \$A[0],\$N[0] \\$0,%rdx \$N[0],-24(\$tp) %rdx,\$N[1] \$m0 %rax,\$A[1] 8*1(\$np),%rax</pre> |
| adc<br>mov<br>mulq<br>add<br>mov<br>mov<br>adc                             | <pre>\\$0,%rdx %rdx,\$A[0] \$m1 %rax,\$N[1] (\$a,\$j),%rax 8(\$a,\$j),%rax \\$0,%rdD</pre>                                                                       |

• Problem: dependencies

#### Out-of-order execution

• Execute instructions when data is available rather than by program order



$$c = a / b;$$

d = c + 5;

e = f + g;

- Completed instructions wait in the reorder buffer until all previous instructions are retired
- Why not retire immediately?

#### Out-of-order execution

• Execute instructions when data is available rather than by program order



$$c = a / b;$$
  
 $d = c + 5;$ 

- e = f + g;
- Completed instructions wait in the reorder buffer until all previous instructions are retired
- Why not retire immediately?

## Out-of-order execution is speculative!

#### Speculative execution

 Abandon instructions in the reorder buffer if never executed in program order

| IF | ID | AF | EX | WB |
|----|----|----|----|----|
| IF | ID | AF | EX | WB |
| IF | ID | AF | EX | WB |

• Also useful for handling branches

## **Speculative Execution and Branches**

- When execution reaches a branch
- The processor predicts the outcome of the branch
- Execution proceeds (speculatively) along predicted branch
- Correct prediction  $\rightarrow$  all is well
- **Misprediction**  $\rightarrow$  abandon and resume



#### **Branch Prediction**

- Branch History Buffer (BHB)
  - Outcome of conditional branches
  - Does the program tend to take this



- Branch Target Buffer (BTB)
  - Target of indirect branches
  - Where does the program usually go from here?



## Main Discovery

- Abandoned speculative execution leaves traces in the microarchitecture
- Developed techniques to implement a covert channel from the abandoned code to the attacker



#### Attack overview





# Meltdown



















# Spectre (Variant 2)





#### How deep does the rabbit hole go?

- Variant 3a: leak model-specific registers
- "The processor is, in fact, operating as it is designed,"
  Smith said. "And in every case, it's been this side-channel approach that the researchers used to gain information even while the processor is executing normally its intended functions." (Intel)
- Fallout: read data from the store buffer





#### Countermeasures



# Compiler patches to block speculative execution

- Use static ana

ars TECHNICA SPOOKY SPOOKY SPECTRE Microsoft's compiler-level Barriers at eve Spectre fix shows how hard this problem will be to solve Investigation of Microsoft's compiler changes show that much of the ETER BRIGHT - 2/15/2018, 8:19 AN





#### Meltdown



Strict site isolation



• Limit memory access or use of data

```
if (x < array_len) {
    i = array[x];
    y = array2[i * 256];
    v = array2[i * 256];
}</pre>
```

- Reduce timer frequency
  - Also disable features such as SharedArrayBuffers

#### Conclusions

- Decades of focus on performance with little regard to security bring us Spectre and Meltdown
  - This is not much different to software development
  - -... but it's harder to fix

- Likely to affect computer security for a long time
  - We do not understand the full implications yet
- Microarchitectural channels matter