This is about [[!tails_ticket 5926]].

[[!toc levels=3]]

# Assumptions

A given APT repository snapshot is immutable after it's been taken.
If we need a freeze exception, we can import the specific package we
want into a Tails-specific APT suite, use it until the next Tails
release, and ensure it's removed from the Tails-specific APT suite
later, when appropriate.

We want to have reproducible builds some day. Therefore, the APT
`sources.list` shipped in the ISO must be stable across rebuilds from
the same release Git tag.

Say `kedit` is a package shipped in Debian, but not in Tails. Then,
when run inside Tails, `apt install kedit` must fetch `kedit` from
current Debian, as opposed to installing it from a Tails-specific, and
generally obsolete, snapshot of the Debian APT repository.

We don't bother merging mirrored APT repositories / suites into
aggregated ones. It loses information, gives us more work, and brings
little value.

# The big picture

Several times a day (e.g. 4 times, to match runs of `dinstall` in the
Debian archive) we update a local mirror of the APT repositories we're
interested in, e.g. with `reprepro update`. Once this is successfully
done, we take a snapshot of the current state of our local mirror;
this snapshot's name must contain:

 * an identifier of the APT repository this snapshot is about, e.g.
   `debian`, `debian-security`, `torproject`;
 * a `YYYYMMDD$ID` serial, `$ID` being an incremental decimal number
   formatted on two digits (`01`, `02`, etc.).
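The naming rule above can be sketched as follows. This is only an illustration: the function name and the `repository-serial` layout joined by a dash are assumptions, since the text only requires that the name contain both parts.

```python
import datetime
import re


def next_snapshot_name(repository, existing_names, today):
    """Return the name of the next snapshot of `repository` for `today`.

    Hypothetical helper: the "<repository>-<YYYYMMDD><ID>" layout is an
    assumption made for this sketch.
    """
    day = today.strftime("%Y%m%d")
    # Anchored match, so that "debian" does not pick up "debian-security".
    pattern = re.compile(r"^%s-%s(\d{2})$" % (re.escape(repository), day))
    used = [int(m.group(1)) for m in map(pattern.match, existing_names) if m]
    next_id = max(used, default=0) + 1
    if next_id > 99:
        raise ValueError("two-digit $ID exhausted for %s" % day)
    return "%s-%s%02d" % (repository, day, next_id)
```

With 4 snapshots a day, the two-digit `$ID` leaves plenty of room; the explicit error only guards against a runaway loop.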

The APT repository mirroring infrastructure publishes the name of the
latest snapshot for each mirrored repository. Similarly, every ISO
build exports the names of the APT repository snapshots it uses.

Building an ISO from the `devel` branch always uses the freshest set
of APT repository snapshots available. Resolving what's the latest one
is done at the beginning of the build, so that the entire build uses
the exact same state of these repositories. This is needed for
reproducible builds, and has a nice side effect: so long, `Hashsum
mismatch`, and thanks for the fish. (Implementation detail: in
practice, this pointer resolution is done early in `auto/config`, so
that we can 1. specify the snapshots we want via `lb
config --mirror-{bootstrap,chroot}`, which `lb build` uses to generate
APT sources for the target base distribution, and 2. adjust other APT
sources (`config/chroot_sources`) somehow.)

Building an ISO from the branch used to prepare the next major release
(`testing`), or a topic branch based on it (`config/base_branch`):

 * **outside of the freeze period**: we use the latest set of APT
   repository snapshots, just like when building from `devel`;
 * **freeze period**: at freeze time, the RM encodes in the Git
   `testing` branch the set of APT repository snapshots (via their
   serial numbers) that shall be used during the freeze; the only
   exception is security.debian.org, for which we always use our
   latest snapshot;
 * **at release time**: when building from a tagged branch, similarly to
   what we do for our custom [[contribute/APT_repository]], instead of
   using timestamp-based APT repository snapshots, we use snapshots
   labeled with the Git tag;
 * **after releasing**, the RM encodes in the `testing` Git branch the
   fact that it is not frozen anymore, that is: the RM removes the
   indication that a specific set of APT repository snapshots must be
   used; and then, we're back to the "outside of the freeze
   period" case.

When building an ISO from the branch used to prepare the next
point-release (`stable`), or a topic branch based on it
(`config/base_branch`), we use snapshots labeled with the Git tag of
the latest Tails release, except:

 * we generally use our latest snapshot of security.debian.org;
 * if a set of APT repository snapshots is encoded directly in that
   branch: use them, even for security.debian.org.
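The rules of this section can be summarized as a single decision function. The sketch below is not the actual `auto/config` logic: the function name, its parameters and the way frozen serials are passed in are all hypothetical.

```python
def pick_snapshot(repository, base_branch, latest, tag=None, frozen=None,
                  latest_release_tag=None):
    """Return the snapshot label to use for one mirrored repository.

    repository: e.g. "debian" or "debian-security"
    base_branch: "devel", "testing" or "stable"
    latest: dict mapping repository -> latest time-based serial
    tag: Git tag being built, if we are building a release
    frozen: dict mapping repository -> serial encoded in the branch, or None
    latest_release_tag: Git tag of the latest Tails release
    """
    if tag is not None:
        # Release build: use the snapshots labeled with the Git tag.
        return tag
    if base_branch == "devel":
        return latest[repository]
    if base_branch == "testing":
        # Not frozen, or security.debian.org: always the latest snapshot.
        if frozen is None or repository == "debian-security":
            return latest[repository]
        return frozen[repository]
    if base_branch == "stable":
        if frozen is not None:
            # Serials encoded in the branch win, even for security.debian.org.
            return frozen[repository]
        if repository == "debian-security":
            return latest[repository]
        return latest_release_tag
    raise ValueError("unknown base branch: %r" % base_branch)
```

Calling this once per repository at the very beginning of the build, and reusing the results everywhere, is what guarantees that the entire build sees the same state of each repository.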

# Special cases and implementation

## APT sources used inside Tails

A running Tails' APT must be pointed at the official, live Debian
archive, and not at a Tails-specific and already obsolete snapshot.

To achieve that we can tweak `sources.list` as we already do in
[[!tails_gitweb config/chroot_local-includes/lib/live/config/1500-reconfigure-APT]].

But generating the 2 versions (frozen, not frozen) of the sources at
ISO build time would probably be more elegant: at boot time, one only
needs to rename files instead of fiddling with `sed`.

## Upgrading to a new Debian point-release

With this design:

 * `devel` gets them automatically because it closely tracks the
   Debian archive;
 * for release branches (`stable`, `testing`): on a case-by-case
   basis, depending on the respective Debian/Tails release schedule
   timing, we can choose whether to switch to using a new snapshot of
   the Debian archive for the next release. Note that this can be done
   via a topic-branch since this information is encoded in Git. If we
   choose not to manually pick the point release, which is the default
   if we don't act at all, then:
   - `testing` will start using the new Debian point-release as soon
     as it is unfrozen, that is as soon as it has been used to release
     a new major version of Tails;
   - `stable` will start using the new Debian point-release once
     a `testing` branch that uses that point-release is merged into
     `stable`.

## Different problems ⇒ different solutions

We want to manage two sets of snapshots that are vastly different in
terms of goals, users, turnover, garbage collection and backup
strategies:

 * time-based, full snapshots of the mirrored APT repositories over
   the last N days;
   - goal: freezable repo feature for the dev process and QA
   - this one can be started from scratch from time to time if
     reprepro becomes too slow for some reason (such as imperfect DB
     garbage collection);
   - if we lose this content, we lose only N days of data, and can
     immediately rebuild a working data set from scratch ⇒ no need to
     sync' this content to the failover server; no need to back it up;

 * tagged, partial snapshots that were used to build released Tails
   ISO images. In there we import only the needed packages. We want to
   back up this data, and expire it very cautiously, if ever.

XXX: This can be unbearably slow, and couples together quite different
problems ⇒ let's uncouple them:

XXX: move this to the big picture above, or immediately after.

 * both reprepro's can have vastly different {garbage collecting,
   backup, sync' to failover} strategies as they have very different
   goals (QA + the freezable repo feature for the dev process, vs.
   reproducible builds + GPL compliance).

## Number of distributions

... in reprepro's `conf/distributions`, for the reprepro instance(s)
dedicated to taking snapshots of the regular Debian archive, assuming
other mirrored archives such as security.d.o, deb.tpo, etc. each go to
their own reprepro instance.

### Time-based snapshots

13 distributions:
 ( (oldstable, stable) * (base, updates, p-u, backports, sloppy-backports)
    + testing
    + sid
    + experimental
 )

4 snapshots a day (=~ 1/dinstall run) * 13 distributions * N days
= 52 * N

Let's set N to match the `Valid-Until` duration we want: it makes
little sense to keep expired snapshots around, and reciprocally it
makes little sense to give a snapshot a validity time that goes beyond
when we'll delete it via garbage collection.

⇒ 52 * N = 52 * 10 =~ 500
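For the record, the arithmetic above can be reproduced with a few lines. The constant and function names are made up for this sketch; the figures (4 snapshots a day, N = 10) are the ones used in this section.

```python
BASE_SUITES = ("oldstable", "stable")
VARIANTS = ("base", "updates", "p-u", "backports", "sloppy-backports")
EXTRA_SUITES = ("testing", "sid", "experimental")


def mirrored_distributions():
    """Number of distributions mirrored from the regular Debian archive."""
    return len(BASE_SUITES) * len(VARIANTS) + len(EXTRA_SUITES)  # 13


def time_based_distributions(snapshots_per_day=4, n_days=10):
    """Number of entries kept in conf/distributions for N days of snapshots."""
    return snapshots_per_day * mirrored_distributions() * n_days  # 520
```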

#### Garbage collection

XXX

And, to ensure that garbage collection doesn't delete a snapshot we
still need, e.g. the one currently referenced in the frozen `testing`
branch, we'll maintain a list of snapshots that need to be kept
around. The tool used by the RM to bump the archive snapshot serials
in Git should take care of it.
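A minimal sketch of that selection, assuming we know the creation time of each snapshot and that the keep-list is available as a set of names (both the data structures and the function name are hypothetical):

```python
import datetime


def snapshots_to_delete(snapshots, keep, now, max_age_days=10):
    """Return the sorted names of the snapshots that can be deleted.

    snapshots: dict mapping snapshot name -> creation datetime
    keep: set of snapshot names that must be kept around, e.g. those
          referenced by a frozen `testing` branch
    """
    cutoff = now - datetime.timedelta(days=max_age_days)
    return sorted(name for name, created in snapshots.items()
                  if created < cutoff and name not in keep)
```

Here `max_age_days` is meant to match the `Valid-Until` duration discussed above.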

### Tagged snapshots

We want to keep "forever" the tagged snapshots used by releases.

In practice, "forever" == max(3 years for GPL, how long we want to be
able to reproduce the build of a released ISO) = 3 years.

12 releases/year * 13 distributions =~ 150 distributions/year

⇒ 450 distributions three years after deployment, which is the upper
bound if we delete such snapshots when they're 3 years old.

## reprepro

XXX:

 * use `Tracking:` in `conf/distributions`?
 * use a leading dash for `Update: - ...` in `conf/distributions`?
 * compare fields in generated `Release` files, with what can be found
   in the official Debian archive

There's a race condition when updating a local mirror with `reprepro
update`: if it's not finished before the next dinstall + mirror sync'
end, then files `reprepro` wants to download can have disappeared from
the remote mirror, and `reprepro update` will fail (exit code = 255).
So, when the first run exits with exit code 255, let's ignore the
error and run `reprepro update` a second time.
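This retry logic could look like the sketch below. The wrapper name and the injectable `run` parameter (handy for testing) are assumptions; a real deployment would probably also pass the options that select the right reprepro base directory.

```python
import subprocess


def update_mirror(run=subprocess.call, command=("reprepro", "update")):
    """Run `reprepro update`, retrying once if the first run exits with 255.

    Returns the exit code of the last run, so that a second failure is
    still reported to the caller.
    """
    status = run(list(command))
    if status == 255:
        # Probably the race with dinstall + mirror sync: try again once.
        status = run(list(command))
    return status
```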

## Listing used packages

Only needed for partial archive snapshots, but useful in all cases.

Saved as an ISO build artifact, both when building in Jenkins and
outside of it.

Output:   

- for each .deb:
  * Version: we need to look up the version _inside_ the .deb, because
    the file name doesn't contain the epoch and thus doesn't allow us
    to infer the full version.
  * Checksum(s)
- The union of all APT sources used during the build.
- XXX: save more build info, e.g. Git commit etc.?

"at ISO build time, generate a list of used packages and version,
including packages used at build time but not shipped in the ISO"
-> from APT and/or dpkg logs, and/or `/var/cache/apt`

* debootstrap ⇒ `--keep-debootstrap-dir`
* `apt-get source` ⇒ corner case, handle by hand
* if all APT sources in use behave ((source package name, version)
  is a unique identifier wrt. file *content* among all such APT
  sources), then we don't need to save _where_ each package was
  pulled from
* Note: one can have a binary package with a different version from
  the source package it was built from, see e.g. `src:lvm2` and
  `libdevmapper1.02.1-udeb`.

Not strictly needed, but useful even if we do full archive
snapshots:

- Allows inspecting the diff between the subsets of two different
  snapshots that were used at build time; the benefit is very
  minor as long as we're based on Debian oldstable or stable, but
  if/when we switch to being based on Debian testing then we will
  definitely want that. Not that minor: we also fetch packages
  from testing, sid, backports, etc.
- Say a branch (topic one, or devel, etc.) introduces
  a regression, and has changed the set of packages used at build
  time, we may want to check how exactly that set was changed.
  Think "check the diff between `.packages`" as we do at release
  time, but done in a correct way.
- Allows keeping only _partial_ snapshots (of our full archive
  ones) for those we want to keep forever, i.e. release ones.
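To illustrate the "diff between two package lists" use case, here is a minimal sketch. It assumes each listing has already been parsed into a dictionary mapping binary package names to full versions (epoch included); the function name and this in-memory format are hypothetical.

```python
def diff_package_sets(old, new):
    """Compare two {package: version} listings.

    Returns (added, removed, changed), where `changed` maps each package
    present in both listings to its (old version, new version) pair.
    """
    added = {p: new[p] for p in new.keys() - old.keys()}
    removed = {p: old[p] for p in old.keys() - new.keys()}
    changed = {p: (old[p], new[p])
               for p in old.keys() & new.keys() if old[p] != new[p]}
    return added, removed, changed
```

Comparing full versions, rather than versions inferred from file names, matters here since an epoch bump would otherwise go unnoticed.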

### Downloading specific packages

Needed for creating the partial archive snapshot.

Input = the output of "Listing used packages"

    for each (package, version, checksum):
      if found on deb.tails.b.o
      then
        skip
      else
        add APT sources = union(those used during build)
        if not apt-get download $package=$version:
          fetch with debsnap + verify checksum
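The loop above could be sketched as follows. Several things here are assumptions: the helper names, the use of SHA-256 as the checksum, the exact `debsnap` invocation (its `--binary` mode, which may also need an architecture option), and the fact that the APT sources used during the build have already been configured before `apt-get download` runs.

```python
import hashlib
import subprocess


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected):
    """Raise ValueError if the file does not match the recorded checksum."""
    actual = sha256_of(path)
    if actual != expected:
        raise ValueError("checksum mismatch for %s: %s" % (path, actual))


def fetch_package(package, version, in_tails_repo, run=subprocess.call):
    """Fetch one (package, version); return 'skipped', 'apt' or 'debsnap'.

    in_tails_repo: callable telling whether deb.tails.b.o already has it
    run: callable executing a command and returning its exit code
    """
    if in_tails_repo(package, version):
        return "skipped"
    if run(["apt-get", "download", "%s=%s" % (package, version)]) == 0:
        return "apt"
    # Assumed invocation: fall back to snapshot.debian.org via debsnap.
    if run(["debsnap", "--binary", package, version]) == 0:
        return "debsnap"
    raise RuntimeError("could not fetch %s=%s" % (package, version))
```

The caller is expected to run `verify_checksum` on whatever file was fetched, at least in the `debsnap` case.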

XXX: check if grml has code to do that or something similar.

## Valid-Until and signing

Assumption: it is acceptable to have our APT repository snapshots
signed by a key that lives on an online server.

We would like to have `Valid-Until` fields in the generated `Release`
files, but we'd rather not have to update these files, and the
corresponding signatures, regularly. In practice:

 * A **tagged APT repository snapshot** that was used to build a given
   Tails release is immutable by design, so it does not need the
   protections provided by `Valid-Until`. Besides, not using
   `Valid-Until` for those makes it much easier to reproduce a given
   ISO build in the future.

 * The main use case for keeping a given **time-based APT repository
   snapshot** around and valid is when it's being used by a release
   branch:
   - `testing`: while it's frozen, that is during 5-10 days most of
     the time;
   - `stable`: that's a corner case, since `stable` generally uses the
     set of tagged snapshots of the latest Tails release; if and when
     we decide to manually point `stable` to a different set of
     snapshots, then we can as well deal with `Valid-Until` manually.

So, let's set `Valid-Until` 10 days after the generation time for
time-based snapshots, and not set it at all for tagged snapshots.
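As an illustration of this policy, the sketch below computes the value of the field for a snapshot. The function name and the `kind` parameter are hypothetical, and the date layout is meant to mimic what is commonly seen in Debian `Release` files; in practice the mirroring tool would generate this field itself.

```python
import datetime


def valid_until_field(kind, generated, days=10):
    """Return the Valid-Until value for a snapshot, or None if unset.

    kind: "time-based" or "tagged"
    generated: timezone-aware datetime of the snapshot's generation
    """
    if kind == "tagged":
        # Tagged snapshots are immutable: no Valid-Until at all.
        return None
    expiry = generated + datetime.timedelta(days=days)
    expiry = expiry.astimezone(datetime.timezone.utc)
    # Assumes the default C locale for day and month names.
    return expiry.strftime("%a, %d %b %Y %H:%M:%S UTC")
```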

Still, it may be that we need to bump `Valid-Until` for a given
time-based snapshot, e.g. if a freeze lasts substantially longer than
usual. We thus need a tool that allows us (XXX: the RM?
sysadmin team?) to do so.

In passing, note that we ship an empty `/var/cache/apt/lists/` in the
ISO ⇒ modifying `Release` and `Release.gpg` files on our APT
repository won't prevent the ISO build from being deterministic.

## XXX

This led me to think a bit about importing
selected packages only vs. importing entire APT dists, and my current
take on this is that the latter is a much more attractive solution.
In general, it wouldn't make much difference, but there are use cases
in which the latter solution makes the workflow trivial, while the
other makes it hard to deal with: e.g. say I'm working on a topic
branch that installs additional Debian packages; if we're importing
entire APT dists, then regardless of which stage of Tails development
we are in (frozen or not), it'll just work since the newly needed
package is already part of the mirror we're using; OTOH, if we're
importing only the packages we think we need, then working on such
a topic branch requires

either that I have the credentials to import
new packages from Debian into our own mirror (which raises the barrier
for contributing),

 => no

or that during some phases of Tails development the
regular Debian archive is used instead of our own mirror

 * We can live with it, no? E.g. only use frozen APT sources at ISO
   build time:

   - when building a release (from a tag) which is business as usual since
     we already do that for our own APT repository; it only affects
     release managers anyway;

   - during a code freeze (from a branch whose base branch is
     `stable` or `testing`)

     * most of the time the bugfix branches we merge into `stable` and
       `testing` don't need to change the set of (package, version)
       pulled from Debian

     * when one such branch needs e.g. a package update from Debian:
       1. import it into our own APT repo (`stable` or `testing`
          branch) so it's installed in the next Tails release
       2. make it so we remove this package from the relevant APT
          source (at least `devel`; more?) after next release (a
          ticket in Redmine should be good enough).
          And/or add an APT pinning entry in the relevant branches (at
          least `devel`; more?) that forces installing this package
          from Debian, as opposed to from our own repo.
          This is seriously ugly and complex, but we're speaking of
          a corner case so perhaps it's OK.

     * when we fix bugs directly in the Debian archive during a Tails
       code freeze; XXX: check if we often do that

## XXX: TODO

* draft doc for each workflow, including stable, testing, devel, and
  `$topic`
* write automated tests for the generation of APT sources
* implement the generation of APT sources
* have a debootstrap 1.0.73+ in all our build environments (Vagrant
  basebox, Jenkins slaves, manual build doc) so that we get the
  `deburis` file, that's needed to build our packages listing.

## APT vs. reprepro: dist names

We need to encode in the APT sources' base URL the exact snapshot we
want to use, in order to be able to pass it to `lb config --mirror-*`.
But this doesn't match reprepro's directory structure as-is.

Thankfully this problem can be worked around with some symlinks or
HTTP rewrite rules. Here's how.

Let's assume:

    lb config --distribution wheezy
    lb config --mirror-chroot          http://XXX.tails.boum.org/debian/20151019/
    lb config --mirror-chroot-security http://XXX.tails.boum.org/debian-security/20151021/
    etc.

Which generates this APT `sources.list`:

    deb http://XXX.tails.boum.org/debian/20151019/ wheezy main
    deb http://XXX.tails.boum.org/debian-security/20151021/ wheezy/updates main

As a result APT sends HTTP requests with URLs such as:

 * <http://XXX.tails.boum.org/debian/20151019/dists/wheezy/Release>
 * <http://XXX.tails.boum.org/debian-security/20151021/dists/wheezy/updates/Release>

The corresponding files in reprepro's filesystem (if we have one
reprepro instance per mirrored archive) are:

 * in Debian archive's reprepro:
   - `/srv/foo/debian/dists/wheezy/20151019/Release`
   - `/srv/foo/debian/conf/distributions` contains `Suite: wheezy/20151019`

 * in Debian security archive's reprepro:
   - `/srv/foo/debian-security/dists/wheezy/updates/20151021/Release`
   - `/srv/foo/debian-security/conf/distributions` contains
     `Suite: wheezy/updates/20151021`

To map these HTTP requests onto these files, one needs
either symlinks (tested successfully) or HTTP rewrite rules.
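The translation that such symlinks or rewrite rules have to perform can be expressed as a small function. This is a sketch only: the function name is made up, `/srv/foo` is the placeholder root used above, and only requests under `dists/` are covered.

```python
import posixpath


def reprepro_path(url_path, suite, root="/srv/foo"):
    """Map the path of an APT request to the file in reprepro's tree.

    url_path: e.g. "/debian/20151019/dists/wheezy/Release"
    suite: suite as written in sources.list, e.g. "wheezy" or
           "wheezy/updates"
    """
    parts = url_path.strip("/").split("/")
    archive, serial, rest = parts[0], parts[1], parts[2:]
    prefix = ["dists"] + suite.split("/")
    if rest[:len(prefix)] != prefix:
        raise ValueError("not a dists/%s request: %s" % (suite, url_path))
    # Move the serial from right after the archive name to after the suite.
    remainder = rest[len(prefix):]
    return posixpath.join(root, archive, "dists", suite, serial, *remainder)
```

In other words, the serial moves from its position right after the archive name, where `lb config --mirror-*` needs it, to the end of the suite name, where reprepro stores it.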

Note: this works because APT only warns when the codename in the
`Release` file doesn't match the one requested in `sources.list`.
There's a code comment around this check, dating back to 2004, that
says "This might become fatal in the future". We bet that if it
becomes fatal some day, it will be possible to turn it back into
a warning via configuration. This affects only development builds
since we're not going to configure APT _in the Tails ISO_ to point to
our own snapshots of the Debian archive.

# Bonus for later

This mechanism can perhaps be reused for snapshotting the state of our
own repo at release time (e.g. to create/publish the `1.6` APT suite).

If the chosen mirroring/snapshotting tool supported re-using the Debian
signature (e.g. <https://github.com/smira/aptly/issues/37>) then we
would only have to sign ourselves the snapshots for which we need to
modify `Release`, that is: when we bump (too long freeze) or remove
(at release time) `Valid-Until`; this happens rarely and can be done
manually ⇒ we can avoid storing the signing key on an online server.

# Discarded

## "Remote" APT repository snapshots

... i.e. using snapshot.debian.org directly, instead of mirroring the
files ourselves.

Discarded because:

* not substantially simpler than our other designs;
* having to re-implement `Valid-Until` checking is scary;
* too much reliance on an external service.

frozen mode = when building from a tag => use snapshot.d.o with
a timestamp manually set in Git => need code that tells us what's the
dinstall timestamp used at some point during a validated build (racy
but no big deal; can kill the race condition by using a local mirror
whose update is disabled during builds)

regular mode = otherwise => use ftp.us.d.o

* Directly use snapshot.d.o + dinstall ID
  - basically replaces e.g. aptly's snapshot / "reprepro pull in new
    suite" feature
  - The fastest possible way to do a new snapshot, since we don't have
    to store nor pull anything at all.
  - Doesn't introduce a database we have to maintain and trust
    software not to ever corrupt it.
  - the dinstall ID of a given mirror's last update can be
    retrieved from that mirror, e.g. `Archive serial` in
    <http://ftp.fr.debian.org/debian/project/trace/ftp-master.debian.org>
  - Blocker: `Valid-Until` can be invalid:
    * If we don't bump the dinstall ID at least once a week as part of
      the normal development process. Seems impractical (e.g.
      we sometimes freeze for more than a week) and too rigid.
    * When rebuilding from an old tag (old > a week).
  - XXX: do we want to depend on snapshot.d.o that much?
    * Served from two different locations.
    * Ask weasel if we can go this way. Make it clear how much we care
      about _old_ data, e.g.:
      - For reproducible rebuild check we only care about re-building
        the last release, or the last few releases.
      - GPL requires distributing the source for at least 3 years
        after we stop distributing the binaries.
  - XXX: Whonix uses that, go look/ask for pros/cons they've seen.
  - XXX: other repos e.g. deb.tpo; we can probably handle it in a very
    ad-hoc and lightweight way, by importing the packages we want into
    our own Tails-specific APT suites, or with reprepro's mirroring
    (`pull`) feature.

- to avoid relying on browsing <http://snapshot.debian.org/> for
  getting the dinstall timestamp we'll stick into Git, we need
  a script that does the browsing _and_ validates that the determined
  timestamp is not too far away in the past.

## Partial APT repository snapshots

Discarded because:

* it raises a tricky chicken'n'egg problem: managing the list of
  packages that will be needed to build a given Git branch, and
  maintaining the set of partial APT repository snapshots that these
  branches need;
* partially snapshot'ing live APT repositories (e.g.
  security.debian.org) is racy: between the time when we build an ISO
  to get the list of packages we need to import, and the time we
  actually import them, files can have disappeared on the mirrors.

 - pro: faster sync ⇒ faster snapshots ⇒ shorter time to remediation.
   However, we can have something similar with full snapshots, if we
   continuously update a temporary snapshot, and then when we need it
   we only have to stick some label onto it.
 - cons: more complex... except perhaps if we want to optimize time to
   remediation for full snapshots as described above.

### Snapshots name

For partial mirroring, their name must contain:

* Debian origin (`debian`, `debian-security`)
* Debian distribution (`sid`, `jessie`, `jessie-backports`, etc.)
* name of the Tails Git release/base branch that needs this set of
  packages

## Replace index expiration check

For the remote snapshots (snapshot.d.o) solution, we _have_ to do
that. For partial and full archive snapshots, this is optional: the
only advantage is that it allows us to _not_ periodically update
`Valid-Until` and signature.

One "solution" would be to replace `Acquire::Check-Valid-Until`:

 - runtime: we point APT sources to the regular Debian archive, no
   need to disable `Acquire::Check-Valid-Until`, we're good.
 - ISO build time: we know when we've frozen ⇒ we can tell APT not to
   do that check, and check the Release files ourselves based on the
   additional info and constraints we have; a bit risky, no right to
   fail, but not totally scary; XXX: draft a security discussion, then
   have it reviewed

## dak, britney, merge-o-matic, debile, etc.

Overkill, and not really meant to address our needs.
Let's instead write our own :P