summaryrefslogtreecommitdiffstats
path: root/wiki/src/contribute/design/kernel_hardening.mdwn
blob: c683a3bfee31df1b71c956fb7bb679a4de56717f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
We pass a few kernel parameters on the boot command line and /proc/sys
to increase security at little to no cost.

See [[!tails_ticket 11143]] and [[!tails_ticket 10951]] for the
discussion that lead to this choice of parameters. Most of what
follows come straight from what "cypherpunks" wrote there.

### `page_poison`

"[Clear memory at free], wiping any sensitive data so it doesn't have an
opportunity to leak in various ways (e.g. accidentally uninitialized
structures or padding), and that certain types of use-after-free flaws
cannot be exploited since the memory has been wiped"
(<https://outflux.net/blog/archives/2016/09/30/security-things-in-linux-v4-6/>)

#### `slab_nomerge`

Disables the merging of slabs of similar sizes. Many times some
obscure slab will be used in a vulnerable way, allowing an attacker to
mess with it more or less arbitrarily. Most slabs are not usable even
when exploited, so this isn't too big of a deal. Unfortunately the
kernel will merge similar slabs to save a tiny bit of space, and if
a vulnerable and useless slab is merged with a safe but useful slab,
an attacker can leverage that aliasing to do far more harm than they
could have otherwise. In effect, this reduces kernel attack surface
area by isolating slabs from each other. The trade-off is a very
slight increase in kernel memory utilization. `slabinfo -a` can be
used to tell what the memory footprint increase would be on
a given system.

#### `slub_debug=FZP`

Enables sanity checks (F), redzoning (Z) and poisoning (P). Sanity checks are
self-evident and come with a modest performance impact, but this is
unlikely to be significant on an average Tails system. The checks are
basic but are still useful both for security and as a debugging
measure. Redzoning adds extra areas around slabs that detect when
a slab is overwritten past its real size, which can help detect
overflows. Its performance impact is negligible.

Poisoning writes an arbitrary value to freed objects, so any
modification or reference to that object after being freed or before
being initialized will be detected and prevented. This prevents many
types of use-after-free vulnerabilities at little performance cost.

An additional note: any time `slub_debug=` is put in the kernel
command line, `slab_nomerge` is implied. But having `slab_nomerge`
explicitely declared can help prevent regressions where disabling of
debugging features is desired but re-enabling of merging is not.

#### `vsyscall=none`

Virtual syscalls are the obsolete predecessor of vDSO calls.
Unfortunately, both `vsyscall=native` and `vsyscall=emulate` (the
default) have a negative security impact, with the latter a little
less so. Namely, they provide a target for any attacker who has
control of the return instruction pointer, which is increasingly
common these days now that attackers need to resort to ROP and similar
attacks which target a process' control flow. The impact of this is
with reduced compatibility, however only legacy statically compiled
binaries and old versions of glibc used vsyscalls. All software on
modern Tails uses vDSO instead. If for some reason a program does try
to use a vsyscall, the process will crash with a memory access
violation, and won't bring the whole system down.

#### `mce=0`

Mostly useful for systems with ECC memory, setting `mce` to 0 will
cause the kernel to panic on any uncorrectable errors detected by the
machine check exception system. Corrected errors will just be logged.
The default is `mce=1`, which will SIGBUS on many uncorrected errors.
Unfortunately this means malicious processes which try to exploit
hardware bugginess (such as rowhammer) will be able to try over and
over, suffering only a SIGBUS at failure. Setting `mce=0` should have
no impact. Any hardware which regularly triggers a memory-based MCE is
unlikely to even boot, and the default is 1 only for
long-lived servers.

### kASLR

Linux kASLR is known as not being particularly strong, but one has to
start somewhere.
See [self-protection.txt](https://github.com/torvalds/linux/blob/master/Documentation/security/self-protection.rst)
for details.

kASLR is enabled by default in the Debian kernel since 4.7~rc7-1~exp1
(`CONFIG_RANDOMIZE_BASE` and `CONFIG_RANDOMIZE_MEMORY`) so there is no
need to enable it with a specific kernel parameter.

#### `kernel.kptr_restrict=2`

Some off-the-shelf malware exploit kernel addresses exposed via
`/proc/kallsyms` so by not making these addresses easily available we
increase the cost of such attack some what; now such malware has to
check which kernel Tails is running and then fetch the corresponding
kernel address map from some external source. This is not hard, but
certainly not all malware has such functionality.

For this reason, we also make sure to purge `/boot/System.map`.

#### `vm.mmap_rnd_bits`, `vm.mmap_rnd_compat_bits`

These settings are
[[!tails_gitweb config/chroot_local-includes/etc/sysctl.d/mmap_aslr.conf desc="set to the maximum supported value"]]
in order to improve ASLR effectiveness for mmap, at the cost of
increased address-space fragmentation.

### `kernel.kexec_load_disabled = 1`

kexec is dangerous: it enables replacement of the running kernel.

### `mds=full,nosmt`

As per
<https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html>,
if the CPU is vulnerable, this:

1. enables "all available mitigations for the MDS vulnerability, CPU
   buffer clearing on exit to userspace";
2. disables SMT which is another avenue for exploiting this class
   of attacks.