This is about [[!tails_ticket 5926]].

[[!toc levels=3]]

# Assumptions

A given APT repository snapshot is immutable after it's been taken.
We'll deal with freeze exceptions separately.

We want to have reproducible builds some day. Therefore, the APT
`sources.list` shipped in the ISO must be stable across rebuilds from
the same release Git tag.

Say `kedit` is a package shipped in Debian, but not in Tails. Then,
when run inside Tails, `apt install kedit` must fetch `kedit` from
current Debian, as opposed to installing it from a Tails-specific, and
generally obsolete, snapshot of the Debian APT repository.

We don't bother merging mirrored APT repositories / suites into
aggregated ones. It loses information, gives us more work, and brings
little value.

# TODO

1. doc-driven development [i]
   * draft contributors doc for each workflow
     - RM (see release process doc and APT repo common operations doc)
     - developer (including stable, testing, devel, and `$topic`)
   * get the updated documentation + this design reviewed, including
     security aspects [i]
   * write tools that the doc calls for
     - bump `Valid-Until`
     - freeze
     - unfreeze
   * move relevant content from this blueprint to the "final" design
     doc + contributors doc

2. time-based snapshots [i]
   a. **done** initial reprepro setup that keeps up-to-date local mirrors of
      the APT repositories we need
   b. **done** snapshot these mirrors every time they're updated
   c. **done** decide how many reprepro instances we want/have to split all
      this among
   d. **done** mirror relevant suites of deb.tails.b.o as well
   e. publish the snapshots over HTTP
   f. try using such snapshots for building an ISO
   g. publish the snapshots' serial: a file is updated, now needs to
      be published over HTTP
   h. implement list of sticky snapshots that must not be GC'ed,
      including the tool to add to that list
   i. implement GC of snapshots
   j. implement GC of packages
   k. have build system output the snapshots being used,
      and have Jenkins publish this info if available
   l. manage symlinks or rewrite rules for URL → reprepro filesystem layout

3. generate set of APT sources [i]
   * write automated tests for the generation of APT sources
     - redirection when using the latest snapshot of a given origin
   * implement the generation of APT sources
   * plug the generation of APT sources into the build system
   * implement
     [[switching to live APT sources at runtime|freezable_APT_repository#runtime-sources]]

4. tagged snapshots
   a. **done** PoC of capturing the list of binary packages used during the build [k]
   b. **done** PoC of capturing the list of source packages used during the build [k]
   c. **done** initial reprepro setup for tagged snapshots
   d. **done** debootstrap in jessie-backports
   e. **WIP** how to create a partial snapshot from a manifest and
      the origin time-based snapshots? [k]
      - review and test k's code that is meant to address this [i]; in particular:
        * `generate-build-manifest` (main Git repo), aka.
          [[!tails_ticket 10748]]
          - cherry-pick the relevant bits and get them into Tails 2.3 [i]
          - update or replace the custom debootstrap script
            (`tails-wheezy`)
          - drop the "per-origin" part: we only need (package,
            version)
          - The architecture information is not part of the manifest.
            It's fine if `tails-prepare-tagged-apt-snapshot-import` can
            do its job without this information. Can it?
          - triage remaining XXX:s in the script, address what needs
            to be
        * `tails-prepare-tagged-apt-snapshot-import`, aka.
          [[!tails_ticket 10749]] (`puppet-tails` repo):
          - Handle the problem mentioned on
            [[listing used packages|freezable APT repository#build-manifest]]
            [k]
          - support for multiple architectures? needed for multiarch
            that we'll have to use as soon as we want to upgrade Linux
            to 4.x, see
            [[!tails_gitweb_branch feature/8415-overlayfs]]; (it might
            be good enough to import a bit too much, e.g. import each
            package for _all_ architectures our reprepro setup
            supports even though we need it only for one architecture;
            beware of differing versions due to binNMUs, though)
          - triage remaining XXX:s in the script, address what needs
            to be
   f. **WIP** expand list of source packages with those that the binary
      packages were built from [k]
      => review this [i], in particular:
      - check the case when the binary package's version is different
        from the corresponding source package's one
        (`libdevmapper1.02.1` vs. `lvm2`)
      - torproject provides no source packages; how does
        `tails-prepare-tagged-apt-snapshot-import` deal with it?
   g. **WIP** have the manifest → partial snapshot process include source
      packages [k]
      => review this [i], in particular:
      - check the case when the binary package's version is different
        from the corresponding source package's one
        (`libdevmapper1.02.1` vs. `lvm2`)
   h. for some Tails release: generate manifest, import packages into
      tagged snapshots, try building *offline* with these tagged
      snapshots [i]
   i. have debootstrap 1.0.73+ in all our build environments so that
      we get the `deburis` file, that's needed to build our packages
      listing; same for `libfile-slurp-perl` and `liblist-moreutils-perl`
      - Vagrant basebox
      - Jenkins slaves
      - done: manual build doc
   j. convert custom `data/debootstrap/tails-wheezy` into a patch,
      or set up the process to update/replace it in the future [i]
   k. Update the
      [[Listing used packages|freezable APT repository#build-manifest]]
      section
   l. Have Jenkins publish the build manifest
      if available (superseding the existing `*.{bin,src}pkg`).
   m. if needed, implement GC

5. misc
   * implement whatever the "freeze exceptions" section requires
   * Puppetize every server-side thing that hasn't been yet.

# The big picture

## Snapshots and branches

Several times a day (e.g. 4 times, to match runs of `dinstall` in the
Debian archive; XXX: start with once a day and then raise the
frequency if the infrastructure can hold it?) we update a local mirror
of the APT repositories we're
interested in, e.g. with `reprepro update`. Once this is successfully
done, we take a snapshot of the current state of our local mirror
(e.g. with `reprepro pull`); this snapshot's name must contain:

 * an identifier of the APT repository this snapshot is about, e.g.
   `debian`, `debian-security`, `torproject`;
 * a `YYYYMMDD$ID` serial, `$ID` being an incremental decimal number
   formatted on two digits (`01`, `02`, etc.).
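
The serial format above can be sketched as follows. This is only an
illustration: `next_serial` is a hypothetical helper, not an existing
tool, and it assumes that we never take more than 99 snapshots of
a repository on a given day.

```shell
# next_serial PREVIOUS [TODAY]
# Print the serial that follows PREVIOUS, a YYYYMMDD date followed by
# a two-digit counter. TODAY defaults to the current UTC date.
next_serial() {
    previous="$1"
    today="${2:-$(date -u '+%Y%m%d')}"
    case "$previous" in
        "$today"??)
            # Same day: increment the counter. Strip one leading zero
            # so that 08 and 09 are not parsed as octal numbers.
            counter="${previous#"$today"}"
            counter="${counter#0}"
            printf '%s%02d\n' "$today" "$((counter + 1))"
            ;;
        *)
            # First snapshot of the day (or no previous snapshot).
            printf '%s01\n' "$today"
            ;;
    esac
}
```

For example, `next_serial 2016010101 20160101` prints `2016010102`.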

The APT repository mirroring infrastructure publishes the name of the
latest snapshot for each mirrored repository. Similarly, every ISO
build exports the names of the APT repository snapshots it uses.

Building an ISO from the `devel` branch always uses the freshest set
of APT repository snapshots available. Resolving what's the set of
freshest APT repository snapshots is done at the beginning of the
build, so that the entire build uses the exact same state of these
repositories. This is needed for reproducible builds, and has a nice
side effect: so long, `Hashsum mismatch`, and thanks for the fish.
(Implementation detail: in practice, this pointer resolution is done
early in `auto/config`, so that we can 1. specify the snapshots we
want via `lb config --mirror-{bootstrap,chroot}`, which `lb build`
uses to generate APT sources for the target base distribution, and 2.
adjust other APT sources (`config/chroot_sources`) somehow.)

Building an ISO from the branch used to prepare the next major release
(`testing`), or a topic branch based on it (`config/base_branch`):

 * **outside of the freeze period**: we use the latest set of APT
   repository snapshots, just like when building from `devel`;
 * **freeze period**: at freeze time, the RM encodes in the Git
   `testing` branch the set of APT repository snapshots (via their
   serial numbers) that shall be used during the freeze; the only
   exception is security.debian.org, for which we always use our
   latest snapshot;
 * **at release time**: when building from a tagged branch, similarly to
   what we do for our custom [[contribute/APT_repository]], instead of
   using time-based APT repository snapshots, we use snapshots
   labeled with the Git tag;
 * **after releasing**, the RM encodes in the `testing` Git branch the
   fact that it is not frozen anymore, that is: the RM removes the
   indication that a specific set of APT repository snapshots must be
   used; and then, we're back to the "outside of the freeze
   period" case.

When building an ISO from the branch used to prepare the next
point-release (`stable`), or from a topic branch based on it
(`config/base_branch` contains `stable`), we
use snapshots labeled with the Git tag of the latest Tails release,
except:

 * we generally use our latest snapshot of security.debian.org;
 * if a set of APT repository snapshots is encoded directly in that
   branch: use them, even for security.debian.org.
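
The rules above, for builds that are not made from a Git tag, could be
summarized by something like the following sketch. `snapshot_for` and
its arguments are hypothetical names, and how the encoded serial is
stored in Git is left out; builds from a tag would simply use the
snapshots labeled with that tag.

```shell
# snapshot_for ORIGIN BASE_BRANCH LATEST_SERIAL ENCODED_SERIAL LAST_TAG
# Print the snapshot a build should use for ORIGIN. ENCODED_SERIAL is
# the serial encoded in the branch, or empty if there is none.
snapshot_for() {
    origin="$1" base_branch="$2" latest="$3" encoded="$4" last_tag="$5"
    case "$base_branch" in
        devel)
            echo "$latest" ;;
        testing)
            # Frozen: use the encoded serial, except for security.d.o.
            if [ -z "$encoded" ] || [ "$origin" = debian-security ]; then
                echo "$latest"
            else
                echo "$encoded"
            fi ;;
        stable)
            # An encoded serial wins, even for security.d.o.
            if [ -n "$encoded" ]; then
                echo "$encoded"
            elif [ "$origin" = debian-security ]; then
                echo "$latest"
            else
                echo "$last_tag"
            fi ;;
        *)
            echo "unknown base branch: $base_branch" >&2
            return 1 ;;
    esac
}
```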

XXX: add special handling of deb.tails.b.o, that we need since it's
the repo where we can sneak freeze exceptions in. In theory it's not
related to our great APT repository snapshots plans, since it has its
own snapshots mechanism already, but ideally we would integrate it
into the new system entirely?

## Different problems ⇒ different solutions

Note that:

 * The time-based snapshots of the mirrored APT repositories that are
   used basically all the time (except when building a release) should
   be *full* snapshots, that is they should contain exactly the same
   set of packages as the mirrored repository. This has the advantage
   that some workflows are trivially handled, e.g. working on a topic
   branch that installs additional Debian packages. If such snapshots
   were not full ones, then to work on such a branch, either
   a contributor would need the credentials to import new packages
   from Debian into our own mirror or repositories (which raises the
   barrier for contributing), or the regular Debian archive would have
   to be used instead of our own mirror during some phases of Tails
   development, which feels prone to "time to QA vs. time to
   release" issues.

 * The tagged snapshots used to build releases can be *partial*, that
   is they can contain only the subset of the mirrored repositories
   that is required for building a specific Tails ISO image.

So, we actually want to manage two sets of snapshots that are vastly
different in terms of goals, users, turnover, garbage collection and
backup strategies:

 * **time-based, full snapshots** of the mirrored APT repositories over
   the last N days;
   - goal: freezable repo feature for the dev process and QA
   - this one can be restarted from scratch from time to time if
     reprepro becomes too slow for some reason (such as imperfect DB
     garbage collection);
   - if we lose this content, we lose only N days of data, and we can
     immediately rebuild a working data set from scratch ⇒ no need to
     sync' this content to the failover server; no need to back it up;

 * **tagged, partial snapshots** that were used to build released Tails
   ISO images:
   - goal: reproducible builds, GPL compliance;
   - in there we import only the needed packages;
   - we want to back up this data, and expire it very cautiously,
     if ever.

Trying to solve both problems in the same `reprepro` instance would be
problematic. First, coupling very different problems together, and
trying to address them with the exact same tools and process, is
generally a bad idea. Second, reprepro's database becomes quite big
when we import large chunks of the Debian archive into it, which may
make it slow ([[!tails_ticket 6295]]), and in any case makes it hard
to back up... which we want to do, to preserve the releases' tagged
snapshots information.

So we'll use two independent `reprepro` instances to address these
two problems.

XXX: how exactly we'll import packages we need from time-based
snapshots to tagged ones is left to be defined (filtered `reprepro
update`? `cp` + `reprepro includeblah`?)

# Special cases and implementation

<a id="runtime-sources"></a>

## APT sources used inside Tails

A running Tails' APT must be pointed at the official, live Debian
archive, and not at a Tails-specific and already obsolete snapshot.

To achieve that we can tweak `sources.list` as we already do in
[[!tails_gitweb config/chroot_local-includes/lib/live/config/1500-reconfigure-APT]].

But generating the 2 versions (frozen, not frozen) of the sources at
ISO build time would probably be more elegant: at boot time, one only
needs to rename files instead of fiddling with `sed`.
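
A minimal sketch of the "rename at boot time" idea, assuming that the
build ships the live version of each APT sources file next to the
frozen one, with a hypothetical `.live` suffix:

```shell
# switch_to_live_sources DIRECTORY
# Replace every FOO.list in DIRECTORY with its FOO.list.live
# counterpart, if there is one. The .live suffix is an assumption of
# this sketch, not an existing convention.
switch_to_live_sources() {
    sources_dir="$1"
    for live in "$sources_dir"/*.list.live; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$live" ] || continue
        mv -f "$live" "${live%.live}"
    done
}
```

At boot time this would be run on `/etc/apt/sources.list.d`, and
`/etc/apt/sources.list` itself could be handled the same way.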

## Upgrading to a new snapshot

In other words: bumping, in Git, the pointers to the set of snapshots
that shall be used.

Let's use, as an example of a situation in which we might want to do
that, upgrading to a new Debian point-release.

With this design:

 * `devel` gets them automatically because it closely tracks the
   Debian archive;
 * for release branches (`stable`, `testing`): on a case-by-case
   basis, depending on the respective Debian/Tails release schedule
   timing, we can choose whether to switch to using a new snapshot of
   the Debian archive for the next release. Note that this can be done
   via a topic-branch since this information is encoded in Git. If we
   choose not to manually pick the point release, which is the default
   if we don't act at all, then:
   - `testing` will start using the new Debian point-release as soon
     as it is unfrozen, that is as soon as it has been used to release
     a new major version of Tails;
   - `stable` will start using the new Debian point-release once
     a `testing` branch that uses that point-release is merged into
     `stable`.

<a id="freeze-exceptions"></a>

## Freeze exceptions

This is a new problem brought by using "frozen" snapshots of APT
repositories during a Tails code freeze: some bug, that we want to see
fixed in the release we are preparing, would be resolved if we pulled
an upgraded package as-is from a fresher Debian APT repository.
Before we could freeze APT repositories, we would have got this bugfix
for free. Now we need to grant freeze exceptions.

This is similar to "Upgrading to a new snapshot", except that we want
to upgrade one package only. By definition, this only affects *frozen*
release branches (`stable`, `testing`), and topic branches based on
them: all other branches use the freshest set of APT repository
snapshots available.

Most of the time, a bugfix branch we want to merge into a frozen
release branch doesn't need to upgrade packages from Debian, so this
is a corner case for the time being. Moreover, so far we have always
dealt with this problem entirely by hand, so it's not critical to
provide much improved tools. What makes it tempting to improve the
situation here is mostly:

 * even though freeze exceptions will remain exceptions, frozen APT
   repository snapshots add one use case for them;
 * this will become a relatively common operation if we are based on
   Debian testing some day, so let's check that it's not only
   possible, but also reasonably easy to handle with this design
   (otherwise we may have to switch to more powerful tools, such as
   dak + britney).

Definition: here, we'll call "overlay [[contribute/APT repository]]"
the set of Tails-specific APT suites that we have been maintaining for
a few years. They are overlay in that they don't contain all the
packages that can be found in Debian: building a Tails ISO image also
requires another kind of APT sources, that are more complete.

We can handle freeze exceptions this way:

1. Import the package we want to upgrade into our own overlay
   [[contribute/APT repository]], in the suite corresponding to the
   branch in which we want to see this package ⇒ in the general case, the
   upgraded package will be installed in the next Tails release.
   We need a tool to do that (would `reprepro pull` with a custom
   filter do the job?).

2. Pin, in `config/chroot_apt/preferences`, the upgraded package we
   have just imported. The aforementioned tool can do this as well.

   [Our current default APT pinning ranks Tails overlay APT suites
   over any other APT source, so why the need to add an APT pinning
   entry? The problem is that it's hard to do the next step (clean up)
   with this APT pinning, combined with the fact that we can't easily
   delete a package from an APT suite and see this deletion propagated
   over suite merges. I (intrigeri) was not able to find a good
   solution to that problem under these constraints, so this document
   assumes that we change this, and pin our overlay APT suites at the
   same level as the APT sources corresponding to the Debian release
   Tails is currently based on. This implies that we manually pin, in
   Git, the packages from our overlay APT suites, that we want to
   override the ones found in other repositories regardless of
   version numbers.]

3. Make it so branches stop using the upgraded package once they have
   been unfrozen, that is once the upgraded package can be fetched
   from a time-based snapshot of the repository we've initially pulled
   it from. Reverting the commit that added the corresponding APT
   pinning in the first place is enough. This should be done by the
   release manager, immediately after a release, when they un-freeze
   the branch used for the release and merge it into other release
   branches. But the RM needs to know which commit to revert, so we
   need to keep track of such upgrades: ideally the tool used to pull
   the upgraded package in the first place should generate the
   commands that will need to be run post-release, or save the data
   needed to generate these commands; such information must be pasted
   somewhere, e.g. into a new ticket; these commands could even be
   scheduled to be run post-release automatically, if we're
   comfortable giving commit access to our Git repository to more
   machines, and have time to implement it.
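
To make step 2 more concrete, here is a sketch of what the tool could
append to `config/chroot_apt/preferences`. `pin_stanza` is
a hypothetical helper, and the priority (999) is an assumption of this
sketch: it only needs to beat the priority given to the other
APT sources.

```shell
# pin_stanza PACKAGE VERSION
# Print an APT preferences stanza that pins PACKAGE to VERSION.
pin_stanza() {
    printf 'Package: %s\nPin: version %s\nPin-Priority: 999\n' "$1" "$2"
}
```

The output would be appended to the preferences file, separated from
the previous stanza by an empty line, and committed on its own, so
that step 3 boils down to a `git revert` of that commit.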

Another option, instead of adding/removing temporary APT pinning,
would be to backport the package we want to upgrade, and make it so it
has a version greater than the one in the time-based snapshot used by
the frozen release branch, and lower than the one in more recent
time-based snapshots. This means building and uploading the package to
the relevant overlay APT suite. This is appealing, because it doesn't
require any cleanup: the upgraded package will automatically be
superseded as soon as it can be. However:

 * we would not benefit from Debian features like reproducible builds;
 * it requires either manual work and bandwidth every time, or setting
   up and maintaining infrastructure to automate the whole thing;
 * the fact that the changes *have to* go through Git, with the APT
   pinning option, helps enforce our review'n'merge processes; one
   can do it by the book with the custom backport option too, by going
   through a topic branch and `config/APT_overlays.d/`, but it still
   conveys less historical information through Git than the APT
   pinning option.

## Number of distributions

... in reprepro's `conf/distributions`, for the reprepro instance(s)
dedicated to taking snapshots of the regular Debian archive, assuming
other mirrored archives such as security.d.o, deb.tpo, etc. each go to
their own reprepro instance.

XXX: the more we split between multiple instances of reprepro, the
smaller and more manageable its database becomes. But it implies some
disk space waste due to duplicated files, and some bandwidth waste to
re-downloading these duplicated packages. If the waste is limited to
the packages from security.d.o that get included in the next
{oldstable,stable} point release, we can perhaps live with it.

### Time-based snapshots

14 distributions:
 (    oldstable * (base, updates, p-u, backports, sloppy-backports)
    + stable    * (base, updates, p-u, backports)
    + testing * (base, updates, p-u)
    + sid
    + experimental
 )

4 snapshots a day (=~ 1/dinstall run) * 14 distributions
* N days
= 56 * N

Let's set N to match the `Valid-Until` duration we want: it makes
little sense to keep expired snapshots around, and reciprocally it
makes little sense to give a snapshot a validity time that goes beyond
when we'll delete it via garbage collection.

⇒ 56 * N = 56 * 10 = 560
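
The same computation, spelled out so that it is easy to redo with
other values (the variable names are only illustrative):

```shell
snapshots_per_day=4   # roughly one per dinstall run
distributions=14      # see the list above
days=10               # N, matching the Valid-Until duration
snapshot_count=$((snapshots_per_day * distributions * days))
echo "$snapshot_count"
```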

Number of distributions for other archives:

- debian-security: 3 (oldstable, stable, testing)
- tails: 3 (stable, testing, devel)
- torproject: 5 (oldstable, stable, testing, unstable, obfs4proxy)

#### Garbage collection

Simply cloning an existing Wheezy/i386/main "distribution" adds 100MB
to `reprepro`'s database (*not* counting the actual packages!), so the
whole thing will likely be quite big ⇒ expiring the snapshots older
than N days will probably be compulsory.

To ensure that garbage collection doesn't delete a snapshot we still
need, e.g. the one currently referenced in the frozen `testing`
branch, we'll maintain a list of snapshots that need to be kept
around. The tool used by the RM to bump the archive snapshot serials
in Git should take care of it.
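
A possible shape for the selection part of the garbage collector,
assuming that the list of snapshots to keep is a plain text file with
one serial per line (`expired_snapshots` and that file format are
hypothetical; actually deleting a snapshot is not covered here):

```shell
# expired_snapshots CUTOFF KEEP_FILE
# Read snapshot serials on stdin, one per line, and print those that
# are older than CUTOFF (a YYYYMMDD date) and not listed in KEEP_FILE.
expired_snapshots() {
    cutoff="$1" keep_file="$2"
    while read -r serial; do
        day="${serial%??}"   # strip the two-digit counter
        [ "$day" -lt "$cutoff" ] || continue
        # Sticky snapshots are never expired.
        grep -qxF "$serial" "$keep_file" && continue
        echo "$serial"
    done
}
```

With GNU `date`, the cutoff for N = 10 would be
`$(date -u -d '10 days ago' '+%Y%m%d')`.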

### Tagged snapshots

We want to keep "forever" the tagged snapshots used by releases.

In practice, "forever" == min(3 years for GPL, how long we want to be
able to reproduce the build of a released ISO) = 3 years.

12 releases/year * 13 distributions =~ 150 distributions/year

⇒ 450 distributions three years after deployment, which is the upper
bound if we delete such snapshots when they're 3 years old.

#### Garbage collection

Depending on the growth rate of this `reprepro` instance's database,
we may or may not need to implement expiration of these snapshots any
time soon. Time will tell.

## reprepro

XXX:

 * use `Log:` in `conf/distributions`? deployed (20151030), let's look
   at it and reconsider in a few weeks
 * use `Tracking:` in `conf/distributions`?
 * use a leading dash for `Update: - ...` in `conf/distributions`?
   <https://mirrorer.alioth.debian.org/reprepro.1.html#Some%20note%20on%20updates>
 * compare fields in generated `Release` files, with what can be found
   in the official Debian archive
 * "Reprepro uses berkeley db, which was a big mistake. The most
   annoying problem not yet worked around is database corruption when
   the disk runs out of space. (Luckily if it happens while
   downloading packages while updating, only the files database is
   affected, which is easy (though time consuming) to rebuild, see
   recovery file in the documentation). *Ideally put the database on
   another partition to avoid that.*" (emphasis mine, from
   [reprepro(1)](https://mirrorer.alioth.debian.org/reprepro.1.html#BUGS))

There's a race condition when updating a local mirror with `reprepro
update`: if it's not finished before the next dinstall + mirror sync'
end, then files `reprepro` wants to download can have disappeared from
the remote mirror, and `reprepro update` will fail (exit code = 255).
So, when the first run exits with exit code 255, let's ignore the
error and run `reprepro update` a second time.
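
This retry logic could look like the following sketch
(`update_with_retry` is a hypothetical wrapper; only exit code 255
triggers the second run, any other failure is reported as-is):

```shell
# update_with_retry COMMAND [ARGS...]
# Run COMMAND; if it exits with code 255, run it one more time.
# Return the exit code of the last run.
update_with_retry() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 255 ]; then
        echo "first run exited with code 255, retrying once" >&2
        status=0
        "$@" || status=$?
    fi
    return "$status"
}
```

It would be called as `update_with_retry reprepro update`.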

### Snapshots

In our [initial
experiments](https://labs.riseup.net/code/issues/6295#note-14) we
added full blown distributions to `conf/distributions` for each
snapshot, and used `reprepro pull $codename` to add packages to them.

Let's try with `reprepro gensnapshot`, which avoids the need to manage
the list of snapshots in `conf/distributions`. The following tests are
run with `conf/{distributions,updates}` set up to mirror the 14
distributions we want from the Debian archive.

Creating one snapshot:

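	# Print the codename of every distribution in conf/distributions.
	distributions() {
	    sed -rn -e 's/^Codename:\s+(.*)$/\1/p' conf/distributions
	}
	# First serial of the day: UTC date followed by the counter 01.
	serial="$(date -u '+%Y%m%d')01"
	for codename in $(distributions) ; do
	    reprepro gensnapshot "$codename" "$serial"
	done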

⇒ `dists/*/snapshots` takes 400MB (a snapshot done with `reprepro
pull` would of course add essentially the same files somewhere else in
`dists`, and occupy the same disk space in there), but the DB doesn't
grow noticeably.

And then, jumping to 40 (10 days * 4 snapshots/day) snapshots of each
distribution, which is what we should have in practice:

	# Counters 02 to 40, zero-padded to two digits by --equal-width.
	for incr in $(seq --equal-width 2 40); do
	    serial="$(date -u '+%Y%m%d')$incr"
	    for codename in $(distributions) ; do
	        reprepro gensnapshot "$codename" "$serial"
	    done
	done

⇒ `dists/*/snapshots` takes 16 GB, and the DB has grown from 900 MB to
1.5 GB; as expected, `packages.db` didn't grow at all: only
`references.db` did.

Conclusion: compared to the "snapshots as full-blown distributions +
`reprepro pull`" option, we're saving _a lot_ on database size, which
is very appealing. The counterpart being that:

 * garbage collecting expired snapshots is a bit more involved, but
   apparently doable: see reprepro(1) around `gensnapshot`;
 * bumping `Valid-Until` for a given time-based snapshot has to be
   done directly in `dists`, without any help from reprepro.

XXX: find out how we can solve these two problems.

None of these problems seem to warrant going back to the other
option... and having to deal with 80GB+ BDB databases.
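
As a starting point for the `Valid-Until` problem, here is a sketch
that rewrites the field in place. It assumes GNU `date` and `sed`, and
that the `Release` file already has a `Valid-Until` field;
`bump_valid_until` is a hypothetical name.

```shell
# bump_valid_until RELEASE_FILE DAYS
# Set the Valid-Until field of RELEASE_FILE to DAYS days from now.
# The signatures of that file are invalidated by this change and must
# be regenerated afterwards.
bump_valid_until() {
    release_file="$1" days="$2"
    # Same format as the Date field of Release files.
    new_date="$(LC_ALL=C date -u -d "+${days} days" '+%a, %d %b %Y %H:%M:%S UTC')"
    sed -i -e "s/^Valid-Until:.*\$/Valid-Until: ${new_date}/" "$release_file"
}
```

After that, the file has to be signed again, e.g. with
`gpg --detach-sign --armor -o Release.gpg Release`, and with
`gpg --clearsign -o InRelease Release` if we publish `InRelease`
files.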

<a id="build-manifest"></a>

## Listing used packages

Saved as an ISO build artifact, both when building in Jenkins and
outside of it.

Output:

- for each .deb: the installed version
- The union of all APT sources used during the build.

### Corner case

If a (package, version) is seen at build time in 2 or more APT
sources, `tails-prepare-tagged-apt-snapshot-import` will inject it
into each of the tagged snapshots corresponding to these sources.
The goal is to avoid this scenario:

 - version X of package P is available both in suite S1 on origin O1,
   and in suite S2 on origin O2
 - version Y of package P is available in suite S3 of origin O3
 - our pinning makes us prefer version X of package P *because it's
   available in O1/S1*; otherwise, if it wasn't in there, then our
   pinning would make APT prefer version Y to version X
 - at ISO build time, APT fetches package P version X from O2/S2
 - given this build manifest, we import package P version X into our
   tagged snapshot of O2/S2, but not into our tagged snapshot of O1/S1
 - if we rebuild from the same source tree using that set of tagged
   snapshots, then version Y of package P will be installed

This scenario can happen in practice:

	# cat /etc/apt/sources.list
	deb http://security.debian.org wheezy/updates main
	deb http://ftp.us.debian.org/debian/ wheezy main
	deb http://ftp.us.debian.org/debian/ jessie main

	# cat /etc/apt/preferences
	Package: *
	Pin: origin security.debian.org
	Pin-Priority: -10

	Package: *
	Pin: release o=Debian,n=wheezy
	Pin-Priority: 990

	Package: *
	Pin: release o=Debian,n=jessie
	Pin-Priority: 700

	# apt-cache madison a2ps
	a2ps | 1:4.14-1.3 | http://ftp.us.debian.org/debian/ jessie/main amd64 Packages
	a2ps | 1:4.14-1.1+deb7u1 | http://security.debian.org/ wheezy/updates/main amd64 Packages
	a2ps | 1:4.14-1.1+deb7u1 | http://ftp.us.debian.org/debian/ wheezy/main amd64 Packages

	# apt-cache policy a2ps
	a2ps:
	  Installed: (none)
	  Candidate: 1:4.14-1.1+deb7u1
	  Version table:
	     1:4.14-1.3 0
	        700 http://ftp.us.debian.org/debian/ jessie/main amd64 Packages
	     1:4.14-1.1+deb7u1 0
	        -10 http://security.debian.org/ wheezy/updates/main amd64 Packages
	        990 http://ftp.us.debian.org/debian/ wheezy/main amd64 Packages

And then, in the current state of things APT will download `a2ps` from
security.d.o:

	# apt-get download a2ps --print-uris
	'http://security.debian.org/pool/updates/main/a/a2ps/a2ps_4.14-1.1+deb7u1_amd64.deb' a2ps_4.14-1.1+deb7u1_amd64.deb 956298 sha256:e47d7fe9adb7aa62421108debf425830f4e2385e98151c5cb359d3eb8688eea8

... but if `a2ps` was not available in the regular Wheezy archive,
e.g. because we were using a tagged snapshot that imported `a2ps` into
the security archive, then APT would prefer `a2ps` from Jessie, which
demonstrates the bug... hence the "inject it into each of the tagged
snapshots corresponding to these sources" requirement we had.
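
To illustrate that requirement, the following sketch lists every
source that provides a given (package, version), based on
`apt-cache madison` output. It is not the actual code of
`tails-prepare-tagged-apt-snapshot-import`; `import_targets` is
a hypothetical name.

```shell
# import_targets PACKAGE VERSION
# Read `apt-cache madison` output on stdin and print, one per line,
# every "archive-URL suite/component" that has PACKAGE at VERSION.
import_targets() {
    awk -F ' *[|] *' -v pkg="$1" -v ver="$2" '
        {
            # madison pads the package name with leading spaces.
            name = $1
            sub(/^[ \t]+/, "", name)
        }
        # Appending "" forces a string comparison of the versions.
        name == pkg && ($2 "") == (ver "") {
            split($3, source, " ")
            print source[1], source[2]
        }
    '
}
```

With the `a2ps` example above,
`apt-cache madison a2ps | import_targets a2ps 1:4.14-1.1+deb7u1`
lists both `wheezy/updates/main` on security.d.o and `wheezy/main` on
the regular archive, so the package would be imported into both
tagged snapshots.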

### Bonus material

- Allows inspecting the diff between the subsets of two different
  snapshots that were used at build time; the benefit is very
  minor as long as we're based on Debian oldstable or stable, but
  if/when we switch to being based on Debian testing then we will
  definitely want that. Not that minor: we also fetch packages
  from testing, sid, backports, etc.
- Say a branch (topic one, or devel, etc.) introduces
  a regression, and has changed the set of packages used at build
  time; we may want to check how exactly that set was changed.
  Think "check the diff between `.packages`" as we do at release
  time, but done in a correct way.
- Allows keeping only _partial_ snapshots (of our full archive
  ones) for those we want to keep forever, i.e. release ones.
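
As a rough illustration of the "diff between the sets of packages
used at build time" idea above, here is a minimal Python sketch. It
assumes each build leaves behind a manifest with one `name version`
pair per line; that format, the function names and the sample data
are assumptions made for this example, not something this blueprint
specifies.

```python
def parse_manifest(text):
    """Parse 'name version' lines into a {name: version} dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, version = line.split(None, 1)
        packages[name] = version
    return packages


def diff_manifests(old, new):
    """Return (added, removed, changed) between two parsed manifests."""
    added = {n: v for n, v in new.items() if n not in old}
    removed = {n: v for n, v in old.items() if n not in new}
    changed = {
        n: (old[n], new[n])
        for n in old.keys() & new.keys()
        if old[n] != new[n]
    }
    return added, removed, changed


# Made-up manifests for two builds.
before = parse_manifest("foo 1.0-1\nbar 2.3-4\n")
after = parse_manifest("foo 1.0-2\nbaz 0.9-1\n")

added, removed, changed = diff_manifests(before, after)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```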

## Valid-Until and signing

Assumption: it is acceptable to have our APT repository snapshots
signed by a key that lives on an online server.

We would like to have `Valid-Until` fields in the generated `Release`
files, but we'd rather not have to update these files, and the
corresponding signatures, regularly. In practice:

 * A **tagged APT repository snapshot** that was used to build a given
   Tails release is immutable by design, so it does not need the
   protections provided by `Valid-Until`. Besides, not using
   `Valid-Until` for those makes it much easier to reproduce a given
   ISO build in the future.

 * The main use case for keeping a given **time-based APT repository
   snapshot** around and valid is when it's being used by a release
   branch:
   - `testing`: while it's frozen, that is for 5-10 days most of
     the time;
   - `stable`: that's a corner case, since `stable` generally uses the
     set of tagged snapshots of the latest Tails release; if and when
     we decide to manually point `stable` to a different set of
     snapshots, then we can as well deal with `Valid-Until` manually.

So, let's set `Valid-Until` 10 days after the generation time for
time-based snapshots, and not set it at all for tagged snapshots.
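
The policy above could be expressed as follows. This is only a sketch:
the function name and the `tagged` flag are hypothetical, the date
layout mimics the one commonly seen in Debian `Release` files, and it
assumes the default (English) names for days and months.

```python
from datetime import datetime, timedelta, timezone

# 10 days, per the policy above.
TIME_BASED_VALIDITY = timedelta(days=10)


def valid_until_field(generated_at, tagged):
    """Return the `Valid-Until` line for a snapshot, or None.

    generated_at must be a timezone-aware datetime.
    Tagged snapshots get no `Valid-Until` field at all.
    """
    if tagged:
        return None
    expiry = (generated_at + TIME_BASED_VALIDITY).astimezone(timezone.utc)
    return expiry.strftime("Valid-Until: %a, %d %b %Y %H:%M:%S UTC")


generated = datetime(2015, 10, 19, tzinfo=timezone.utc)
print(valid_until_field(generated, tagged=False))
print(valid_until_field(generated, tagged=True))
```

Bumping `Valid-Until` for a longer freeze would then amount to
regenerating this field with a later date and re-signing the `Release`
file.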

Still, it may be that we need to bump `Valid-Until` for a given
time-based snapshot, e.g. if a freeze lasts substantially longer than
usual. We thus need a tool that allows us (XXX: the RM?
sysadmin team?) to do so.

In passing, note that we ship an empty `/var/cache/apt/lists/` in the
ISO ⇒ modifying `Release` and `Release.gpg` files on our APT
repository won't prevent the ISO build from being deterministic.

## APT vs. reprepro: dist names

We need to encode in the APT sources' base URL the exact snapshot we
want to use, in order to be able to pass it to `lb config --mirror-*`.
But this doesn't match reprepro's directory structure as-is.

Thankfully this problem can be worked around with some symlinks or
HTTP rewrite rules. Here's how.

Let's assume:

    lb config --distribution wheezy
    lb config --mirror-chroot          http://XYZ.tails.boum.org/debian/20151019/
    lb config --mirror-chroot-security http://XYZ.tails.boum.org/debian-security/20151021/
    etc.

Which generates this APT `sources.list`:

    deb http://XYZ.tails.boum.org/debian/20151019/ wheezy main
    deb http://XYZ.tails.boum.org/debian-security/20151021/ wheezy/updates main

As a result APT sends HTTP requests with URLs such as:

 * <http://XYZ.tails.boum.org/debian/20151019/dists/wheezy/Release>
 * <http://XYZ.tails.boum.org/debian-security/20151021/dists/wheezy/updates/Release>

XXX: update the following if we decide to use `reprepro gensnapshot`,
which implies slightly different paths.

The corresponding files in reprepro's filesystem (if we have one
reprepro instance per mirrored archive) are:

 * in Debian archive's reprepro:
   - `/srv/foo/debian/dists/wheezy/20151019/Release`
   - `/srv/foo/debian/conf/distributions` contains `Suite: wheezy/20151019`

 * in Debian security archive's reprepro:
   - `/srv/foo/debian-security/dists/wheezy/updates/20151021/Release`
   - `/srv/foo/debian-security/conf/distributions` contains
     `Suite: wheezy/updates/20151021`

To map these HTTP requests to these files, one needs
either symlinks (tested successfully) or HTTP rewrite rules.
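
The translation that such symlinks or rewrite rules have to perform
can be sketched like this. The function name is made up, `/srv/foo` is
the placeholder base directory used above, and the distribution name
has to be passed explicitly since it may contain a slash (e.g.
`wheezy/updates`). Only `dists/` paths are handled here.

```python
import posixpath
from urllib.parse import urlsplit


def reprepro_path(url, dist, base="/srv/foo"):
    """Map an APT request URL to the matching file in reprepro's tree.

    Moves the snapshot ID from right after the archive name (APT's
    view) to right after the distribution name (reprepro's view).
    """
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) < 3 or parts[2] != "dists":
        raise ValueError("not a dists/ URL: %s" % url)
    archive, snapshot = parts[0], parts[1]
    dist_parts = dist.split("/")
    rest = parts[3:]
    if rest[:len(dist_parts)] != dist_parts:
        raise ValueError("URL does not match distribution %s" % dist)
    remainder = rest[len(dist_parts):]
    return posixpath.join(base, archive, "dists", dist, snapshot, *remainder)


print(reprepro_path(
    "http://XYZ.tails.boum.org/debian/20151019/dists/wheezy/Release",
    "wheezy"))
print(reprepro_path(
    "http://XYZ.tails.boum.org/debian-security/20151021/dists/wheezy/updates/Release",
    "wheezy/updates"))
```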

Note: this works because APT only warns when the codename in the
`Release` file doesn't match the one requested in `sources.list`.
There's a code comment around this check, dating back to 2004, that
says something like "This might become fatal in the future". We bet that if it
becomes fatal some day, it will be possible to turn it back into
a warning via configuration. This affects only development builds
since we're not going to configure APT _in the Tails ISO_ to point to
our own snapshots of the Debian archive.

# Bonus for later

This mechanism can perhaps be reused for snapshotting the state of our
own repo at release time (e.g. to create/publish the `1.6` APT suite).

If the chosen mirroring/snapshotting tool supported re-using the Debian
signature (e.g. <https://github.com/smira/aptly/issues/37>) then we
would only have to sign ourselves the snapshots for which we need to
modify `Release`, that is: when we bump (too long freeze) or remove
(at release time) `Valid-Until`. This happens rarely and can be done
manually ⇒ we can avoid storing the signing key on an online server.

# Discarded

## "Remote" APT repository snapshots

... i.e. using snapshot.debian.org directly, instead of mirroring the
files ourselves.

Discarded because:

* not substantially simpler than our other designs;
* having to re-implement `Valid-Until` checking is scary;
* too much reliance on an external service.

frozen mode = when building from a tag => use snapshot.d.o with
a timestamp manually set in Git => need code that tells us what's the
dinstall timestamp used at some point during a validated build (racy
but no big deal; can kill the race condition by using a local mirror
whose update is disabled during builds)

regular mode = otherwise => use ftp.us.d.o

* Directly use snapshot.d.o + dinstall ID
  - basically replaces e.g. aptly's snapshot / "reprepro pull in new
    suite" feature
  - The fastest possible way to do a new snapshot, since we don't have
    to store nor pull anything at all.
  - Doesn't introduce a database we have to maintain and trust
    software not to ever corrupt it.
  - the dinstall ID of a given mirror's last update can be
    retrieved from that mirror, e.g. `Archive serial` in
    <http://ftp.fr.debian.org/debian/project/trace/ftp-master.debian.org>
  - Blocker: `Valid-Until` can be invalid:
    * If we don't bump the dinstall ID at least once a week as part of
      the normal development process. Seems impractical (e.g.
      we sometimes freeze for more than a week) and too rigid.
    * When rebuilding from an old tag (older than a week).
  - But do we want to depend on snapshot.d.o that much?
    * Served from two different locations.
    * Ask weasel if we can go this way. Make it clear how much we care
      about _old_ data, e.g.:
      - For reproducible rebuild check we only care about re-building
        the last release, or the last few releases.
      - GPL requires distributing the source for at least 3 years
        after we stop distributing the binaries.
  - Whonix uses that, go look/ask for pros/cons they've seen.
  - other repos e.g. deb.tpo; we can probably handle it in a very
    ad-hoc and lightweight way, by importing the packages we want into
    our own Tails-specific APT suites, or with reprepro's mirroring
    (`pull`) feature.

- to avoid relying on browsing <http://snapshot.debian.org/> for
  getting the dinstall timestamp we'll stick into Git, we need
  a script that does the browsing _and_ validates that the determined
  timestamp is not too far away in the past.
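
A minimal sketch of the non-network parts of such a script follows.
Everything in it is an assumption made for illustration: the function
names, the `YYYYMMDDTHHMMSSZ` timestamp layout, the one week limit and
the sample trace content. Only the `Archive serial` field name comes
from the notes above. Fetching the data is left out on purpose.

```python
from datetime import datetime, timedelta, timezone


def parse_archive_serial(trace_text):
    """Return the value of the `Archive serial` field, or None."""
    for line in trace_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Archive serial":
            return value.strip()
    return None


def is_recent_enough(timestamp, now, max_age=timedelta(days=7)):
    """Check that a `YYYYMMDDTHHMMSSZ` timestamp is at most max_age old.

    Timestamps in the future are rejected too.
    """
    when = datetime.strptime(timestamp, "%Y%m%dT%H%M%SZ")
    when = when.replace(tzinfo=timezone.utc)
    return timedelta(0) <= now - when <= max_age


# Made-up trace file content.
sample_trace = "Some header line\nArchive serial: 2015102301\n"
print(parse_archive_serial(sample_trace))

now = datetime(2015, 10, 21, tzinfo=timezone.utc)
print(is_recent_enough("20151019T000000Z", now))
```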

## Partial APT repository snapshots

Discarded because:

* it raises a tricky chicken'n'egg problem: managing the list of
  packages that will be needed to build a given Git branch, and
  maintaining the set of partial APT repository snapshots that these
  branches need;
* partially snapshot'ing live APT repositories (e.g.
  security.debian.org) is racy: between the time when we build an ISO
  to get the list of packages we need to import, and the time we
  actually import them, files can have disappeared on the mirrors.

 - pro: faster sync ⇒ faster snapshots ⇒ shorter time to remediation.
   However, we can have something similar with full snapshots, if we
   continuously update a temporary snapshot, and then when we need it
   we only have to stick some label onto it.
 - con: more complex... except perhaps if we want to optimize time to
   remediation for full snapshots as described above.

### Snapshots name

For partial mirroring, their name must contain:

* Debian origin (`debian`, `debian-security`)
* Debian distribution (`sid`, `jessie`, `jessie-backports`, etc.)
* name of the Tails Git release/base branch that needs this set of
  packages

## Replace index expiration check

For the remote snapshots (snapshot.d.o) solution, we _have_ to do
that. For partial and full archive snapshots, this is optional: the
only advantage is that it allows us to _not_ periodically update
`Valid-Until` and signature.

One "solution" would be to replace `Acquire::Check-Valid-Until`:

 - runtime: we point APT sources to the regular Debian archive, no
   need to disable `Acquire::Check-Valid-Until`, we're good.
 - ISO build time: we know when we've frozen ⇒ we can tell APT not to
   do that check, and check the Release files ourselves based on the
   additional info and constraints we have; a bit risky, no right to
   fail, but not totally scary; so draft a security discussion, then
   have it reviewed
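
One possible reading of "check the Release files ourselves" is to
verify that a `Release` file was valid at the time we froze, rather
than at the time we build. The sketch below does only that. The
function names and the sample content are made up, the date layout is
assumed to be the one commonly seen in Debian `Release` files, and
signature verification is deliberately left out, although it would
have to come first in any real implementation.

```python
from datetime import datetime, timezone

RELEASE_DATE_FORMAT = "%a, %d %b %Y %H:%M:%S UTC"


def parse_release_dates(release_text):
    """Extract the `Date` and `Valid-Until` fields as aware datetimes."""
    fields = {}
    for line in release_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in ("Date", "Valid-Until"):
            parsed = datetime.strptime(value.strip(), RELEASE_DATE_FORMAT)
            fields[key] = parsed.replace(tzinfo=timezone.utc)
    return fields


def was_valid_at(release_text, frozen_at):
    """True if the Release file was published and unexpired at freeze time.

    A file lacking either field is treated as invalid (fail closed).
    """
    fields = parse_release_dates(release_text)
    if "Date" not in fields or "Valid-Until" not in fields:
        return False
    return fields["Date"] <= frozen_at <= fields["Valid-Until"]


# Made-up Release file excerpt.
sample_release = (
    "Origin: Debian\n"
    "Date: Sat, 17 Oct 2015 08:53:12 UTC\n"
    "Valid-Until: Sat, 24 Oct 2015 08:53:12 UTC\n"
)
frozen_at = datetime(2015, 10, 19, tzinfo=timezone.utc)
print(was_valid_at(sample_release, frozen_at))
```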

## dak, britney, merge-o-matic, debile, etc.

Overkill, and not really meant to address our needs.
Let's instead write our own :P