This is about [[!tails_ticket 5926]].
[[!toc levels=3]]
# Assumptions
A given APT repository snapshot is immutable after it's been taken.
We'll deal with freeze exceptions separately.

We want to have reproducible builds some day. Therefore, the APT
`sources.list` shipped in the ISO must be stable across rebuilds from
the same release Git tag.

Say `kedit` is a package shipped in Debian, but not in Tails. Then,
when run inside Tails, `apt install kedit` must fetch `kedit` from
current Debian, as opposed to installing it from a Tails-specific, and
generally obsolete, snapshot of the Debian APT repository.

We don't bother merging mirrored APT repositories / suites into
aggregated ones. It loses information, gives us more work, and brings
little value.
# TODO
1. doc-driven development [i]
    * draft contributors doc for each workflow
        - RM (see release process doc and APT repo common operations doc)
        - developer (including stable, testing, devel, and `$topic`)
    * get the updated documentation + this design reviewed, including
      security aspects [i]
    * write tools that the doc calls for
        - bump `Valid-Until`
        - freeze
        - unfreeze
    * move relevant content from this blueprint to the "final" design
      doc + contributors doc
2. time-based snapshots [i]
    a. **done** initial reprepro setup that keeps up-to-date local mirrors of
       the APT repositories we need
    b. **done** snapshot these mirrors every time they're updated
    c. **done** decide how many reprepro instances we want/have to split all
       this among
    d. **done** mirror relevant suites of deb.tails.b.o as well
    e. publish the snapshots over HTTP
    f. try using such snapshots for building an ISO
    g. publish the snapshots' serial: the file holding it is already
       updated, but still needs to be published over HTTP
    h. implement list of sticky snapshots that must not be GC'ed,
       including the tool to add to that list
    i. implement GC of snapshots
    j. implement GC of packages
    k. have build system output the snapshots being used,
       and have Jenkins publish this info if available
    l. manage symlinks or rewrite rules for URL → reprepro filesystem layout
3. generate set of APT sources [i]
    * write automated tests for the generation of APT sources
        - redirection when using the latest snapshot of a given origin
    * implement the generation of APT sources
    * plug the generation of APT sources into the build system
    * implement
      [[switching to live APT sources at runtime|freezable_APT_repository#runtime-sources]]
4. tagged snapshots
    a. **done** PoC of capturing the list of binary packages used during the build [k]
    b. **done** PoC of capturing the list of source packages used during the build [k]
    c. **done** initial reprepro setup for tagged snapshots
    d. **done** debootstrap in jessie-backports
    e. **WIP** how to create a partial snapshot from a manifest and
       the origin time-based snapshots? [k]
        - review and test k's code that is meant to address this [i]; in particular:
            * `generate-build-manifest` (main Git repo), aka
              [[!tails_ticket 10748]]:
                - cherry-pick the relevant bits and get them into Tails 2.3 [i]
                - update or replace the custom debootstrap script
                  (`tails-wheezy`)
                - drop the "per-origin" part: we only need (package,
                  version)
                - the architecture information is not part of the manifest.
                  It's fine if `tails-prepare-tagged-apt-snapshot-import` can
                  do its job without this information. Can it?
                - triage remaining XXX:s in the script, and address those
                  that need it
            * `tails-prepare-tagged-apt-snapshot-import`, aka
              [[!tails_ticket 10749]] (`puppet-tails` repo):
                - handle the problem mentioned in
                  [[listing used packages|freezable APT repository#build-manifest]]
                  [k]
                - support for multiple architectures? Needed for multiarch,
                  which we'll have to use as soon as we want to upgrade Linux
                  to 4.x, see
                  [[!tails_gitweb_branch feature/8415-overlayfs]]; it might
                  be good enough to import a bit too much, e.g. import each
                  package for _all_ architectures our reprepro setup
                  supports even though we need it only for one architecture;
                  beware of differing versions due to binNMUs, though
                - triage remaining XXX:s in the script, and address those
                  that need it
    f. **WIP** expand list of source packages with those that the binary
       packages were built from [k]
       => review this [i], in particular:
        - check the case when the binary package's version is different
          from the corresponding source package's one
          (`libdevmapper1.02.1` vs. `lvm2`)
        - torproject provides no source packages; how does
          `tails-prepare-tagged-apt-snapshot-import` deal with it?
    g. **WIP** have the manifest → partial snapshot process include source
       packages [k]
       => review this [i], in particular:
        - check the case when the binary package's version is different
          from the corresponding source package's one
          (`libdevmapper1.02.1` vs. `lvm2`)
    h. for some Tails release: generate manifest, import packages into
       tagged snapshots, try building *offline* with these tagged
       snapshots [i]
    i. have debootstrap 1.0.73+ in all our build environments so that
       we get the `deburis` file, which is needed to build our package
       listing; same for `libfile-slurp-perl` and `liblist-moreutils-perl`
        - Vagrant basebox
        - Jenkins slaves
        - done: manual build doc
    j. convert custom `data/debootstrap/tails-wheezy` into a patch,
       or set up the process to update/replace it in the future [i]
    k. update the
       [[Listing used packages|freezable APT repository#build-manifest]]
       section
    l. have Jenkins publish the build manifest if available
       (superseding the existing `*.{bin,src}pkg`)
    m. if needed, implement GC
5. misc
    * implement whatever the "freeze exceptions" section requires
    * Puppetize every server-side thing that hasn't been Puppetized yet
# The big picture
## Snapshots and branches
Several times a day (e.g. 4 times, to match runs of `dinstall` in the
Debian archive; XXX: start with once a day and then raise the
frequency if the infrastructure can hold it?) we update a local mirror
of the APT repositories we're interested in, e.g. with `reprepro
update`. Once this is successfully done, we take a snapshot of the
current state of our local mirror (e.g. with `reprepro pull`); this
snapshot's name must contain:

* an identifier of the APT repository this snapshot is about, e.g.
  `debian`, `debian-security`, `torproject`;
* a `YYYYMMDD$ID` serial, `$ID` being an incremental decimal number
  formatted on two digits (`01`, `02`, etc.).
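The serial format above can be produced by a small helper. This is a minimal sketch, assuming the caller passes the UTC day and the serials already taken for the repository; `next_serial` is a hypothetical name, not an existing tool:

```sh
#!/usr/bin/env bash
# Hypothetical helper: print the next YYYYMMDD$ID serial for a repository.
#   $1:   day, formatted as YYYYMMDD (e.g. from `date -u '+%Y%m%d'`)
#   $2..: serials that already exist for this repository
next_serial() {
    local day="$1" max=0 serial id
    shift
    for serial in "$@"; do
        # Only serials taken on the same day matter
        [[ "$serial" == "$day"[0-9][0-9] ]] || continue
        # Force base 10, otherwise bash parses "08" and "09" as octal
        id=$((10#${serial:8:2}))
        if (( id > max )); then
            max=$id
        fi
    done
    # $ID has two digits only
    if (( max >= 99 )); then
        echo "no serial left for $day" >&2
        return 1
    fi
    printf '%s%02d\n' "$day" $((max + 1))
}
```

For example, `next_serial 20151021 2015102101 2015102102` prints `2015102103`. At 4 snapshots a day, the 99-per-day limit is far away.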
The APT repository mirroring infrastructure publishes the name of the
latest snapshot for each mirrored repository. Similarly, every ISO
build exports the names of the APT repository snapshots it uses.

Building an ISO from the `devel` branch always uses the freshest set
of APT repository snapshots available. Which snapshots are the
freshest is resolved once, at the beginning of the build, so that the
entire build uses the exact same state of these repositories. This is
needed for reproducible builds, and has a nice side effect: so long,
`Hashsum mismatch`, and thanks for the fish.

(Implementation detail: in practice, this pointer resolution is done
early in `auto/config`, so that we can 1. specify the snapshots we
want via `lb config --mirror-{bootstrap,chroot}`, which `lb build`
uses to generate APT sources for the target base distribution, and 2.
adjust other APT sources (`config/chroot_sources`) somehow.)
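A minimal sketch of that resolution step, assuming the serials published by the mirroring infrastructure have already been fetched, and assuming the `http://XYZ.tails.boum.org/<origin>/<serial>/` URL layout used as an example in the "APT vs. reprepro: dist names" section; the function names and `APT_SNAPSHOTS_BASE` are hypothetical:

```sh
#!/usr/bin/env bash
# Hypothetical base URL, matching the placeholder host used on this page.
APT_SNAPSHOTS_BASE="${APT_SNAPSHOTS_BASE:-http://XYZ.tails.boum.org}"

# Print the freshest of the serials given as arguments. Serials are
# fixed-width YYYYMMDD$ID strings, so lexical order is chronological order.
freshest_serial() {
    printf '%s\n' "$@" | LC_ALL=C sort | tail -n 1
}

# Print the APT mirror URL of one snapshot.
#   $1: origin (debian, debian-security, torproject, ...)
#   $2: serial
snapshot_url() {
    printf '%s/%s/%s/\n' "$APT_SNAPSHOTS_BASE" "$1" "$2"
}
```

What guarantees a consistent build is resolving each URL once, storing it in a variable, and only then passing that variable to e.g. `lb config --mirror-chroot`: a snapshot published in the middle of the build is then simply ignored.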
When building an ISO from the branch used to prepare the next major
release (`testing`), or from a topic branch based on it
(`config/base_branch` contains `testing`):

* **outside of the freeze period**: we use the latest set of APT
  repository snapshots, just like when building from `devel`;
* **freeze period**: at freeze time, the RM encodes in the Git
  `testing` branch the set of APT repository snapshots (via their
  serial numbers) that shall be used during the freeze; the only
  exception is security.debian.org, for which we always use our
  latest snapshot;
* **at release time**: when building from a tagged branch, similarly to
  what we do for our custom [[contribute/APT_repository]], instead of
  using time-based APT repository snapshots, we use snapshots
  labeled with the Git tag;
* **after releasing**, the RM encodes in the `testing` Git branch the
  fact that it is not frozen anymore, that is: the RM removes the
  indication that a specific set of APT repository snapshots must be
  used; and then, we're back to the "outside of the freeze
  period" case.
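The freeze and unfreeze steps above could be implemented by writing and deleting a few files tracked in Git. Here is a minimal sketch; the `config/APT_snapshots.d/<origin>/serial` layout, `SNAPSHOTS_DIR` and the function names are assumptions made for illustration, and committing the result is left to the RM:

```sh
#!/usr/bin/env bash
# Hypothetical layout: one file per origin, holding the frozen serial.
SNAPSHOTS_DIR="${SNAPSHOTS_DIR:-config/APT_snapshots.d}"

# freeze ORIGIN=SERIAL...: record the serials the frozen branch must use
freeze() {
    local pair origin serial
    for pair in "$@"; do
        if [[ "$pair" != *=* ]]; then
            echo "expected ORIGIN=SERIAL, got: $pair" >&2
            return 1
        fi
        origin="${pair%%=*}"
        serial="${pair#*=}"
        mkdir -p -- "$SNAPSHOTS_DIR/$origin"
        printf '%s\n' "$serial" > "$SNAPSHOTS_DIR/$origin/serial"
    done
}

# unfreeze: forget all recorded serials, so the latest snapshots are used
unfreeze() {
    rm -rf -- "$SNAPSHOTS_DIR"
}

# frozen_serial ORIGIN: print the recorded serial, or nothing if unfrozen
frozen_serial() {
    local file="$SNAPSHOTS_DIR/$1/serial"
    if [ -f "$file" ]; then
        cat -- "$file"
    fi
}
```

With this layout, "is `testing` frozen?" becomes "does `SNAPSHOTS_DIR` exist in this commit?", which keeps the freeze state reviewable and revertable like any other Git change.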
When building an ISO from the branch used to prepare the next
point-release (`stable`), or from a topic branch based on it
(`config/base_branch` contains `stable`), we use snapshots labeled
with the Git tag of the latest Tails release, except that:

* we generally use our latest snapshot of security.debian.org;
* if a set of APT repository snapshots is encoded directly in that
  branch, we use it, even for security.debian.org.
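Taken together, these rules boil down to a per-origin decision. The following sketch covers that decision only; the argument list, the use of the `debian-security` identifier for security.debian.org, the output format and the `select_snapshot` name are all assumptions:

```sh
#!/usr/bin/env bash
# Hypothetical sketch: decide which snapshot to use for one origin.
#   $1: branch, or the value of config/base_branch for a topic branch
#       (devel, testing or stable)
#   $2: origin (debian, debian-security, torproject, ...)
#   $3: serial encoded in Git for this origin, empty if none
#   $4: latest time-based serial available for this origin
#   $5: Git tag of the latest Tails release
#   $6: Git tag being built, empty unless building a release
# Prints "time-based SERIAL" or "tagged TAG".
select_snapshot() {
    local branch="$1" origin="$2" encoded="$3" latest="$4"
    local last_release="$5" building_tag="$6"

    # At release time: use the snapshots labeled with the Git tag
    if [ -n "$building_tag" ]; then
        echo "tagged $building_tag"
        return
    fi

    case "$branch" in
        devel)
            echo "time-based $latest" ;;
        testing)
            # Frozen iff a serial is encoded in Git; security is never frozen
            if [ -n "$encoded" ] && [ "$origin" != debian-security ]; then
                echo "time-based $encoded"
            else
                echo "time-based $latest"
            fi ;;
        stable)
            if [ -n "$encoded" ]; then
                # An explicitly encoded serial wins, even for security
                echo "time-based $encoded"
            elif [ "$origin" = debian-security ]; then
                echo "time-based $latest"
            else
                echo "tagged $last_release"
            fi ;;
        *)
            echo "unknown branch: $branch" >&2
            return 1 ;;
    esac
}
```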
XXX: add special handling of deb.tails.b.o, which we need since it's
the repo where we can sneak freeze exceptions in. In theory it's not
related to our great APT repository snapshots plans, since it has its
own snapshots mechanism already, but ideally we would integrate it
into the new system entirely?
## Different problems ⇒ different solutions
Note that:

* The time-based snapshots of the mirrored APT repositories, that are
  used basically all the time (except when building a release), should
  be *full* snapshots, that is they should contain exactly the same
  set of packages as the mirrored repository. This has the advantage
  that some workflows are trivially handled, e.g. working on a topic
  branch that installs additional Debian packages. If such snapshots
  were not full ones, then to work on such a branch, one would need
  either to have the credentials to import new packages from Debian
  into our own mirrored repositories (which raises the barrier for
  contributing), or to use the regular Debian archive instead of our
  own mirror during some phases of Tails development, which feels
  prone to "time to QA vs. time to release" issues.
* The tagged snapshots used to build releases can be *partial*, that
  is they can contain only the subset of the mirrored repositories
  that is required for building a specific Tails ISO image.
So, we actually want to manage two sets of snapshots that are vastly
different in terms of goals, users, turnover, garbage collection and
backup strategies:

* **time-based, full snapshots** of the mirrored APT repositories over
  the last N days:
    - goal: freezable repo feature for the dev process and QA;
    - this one can be restarted from scratch from time to time if
      reprepro becomes too slow for some reason (such as imperfect DB
      garbage collection);
    - if we lose this content, we lose only N days of data, and we can
      immediately rebuild a working data set from scratch ⇒ no need to
      sync this content to the failover server; no need to back it up;
* **tagged, partial snapshots** that were used to build released Tails
  ISO images:
    - goal: reproducible builds, GPL compliance;
    - in there we import only the needed packages;
    - we want to back up this data, and expire it very cautiously,
      if ever.
Trying to solve both problems in the same `reprepro` instance would be
problematic. Not only is coupling very different problems together,
and addressing them with the exact same tools and process, generally
a bad idea; reprepro's database also becomes quite big when we import
large chunks of the Debian archive into it, which may make it slow
([[!tails_ticket 6295]]), and in any case makes it hard to back
up... which we want to do, to preserve the releases' tagged snapshots.

So we'll use two independent `reprepro` instances to address these
two problems.

XXX: how exactly we'll import packages we need from time-based
snapshots to tagged ones is left to be defined (filtered `reprepro
update`? `cp` + `reprepro includeblah`?)
# Special cases and implementation
<a id="runtime-sources"></a>
## APT sources used inside Tails
A running Tails' APT must be pointed at the official, live Debian
archive, and not at a Tails-specific and already obsolete snapshot.
To achieve that, we can tweak `sources.list` as we already do in
[[!tails_gitweb config/chroot_local-includes/lib/live/config/1500-reconfigure-APT]].
But generating the two versions (frozen, not frozen) of the sources at
ISO build time would probably be more elegant: at boot time, one would
only need to rename files instead of fiddling with `sed`.
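A sketch of that more elegant option, assuming a single `main` component, hypothetical `sources.list.frozen` / `sources.list.live` file names and function names, and the example hosts used elsewhere on this page; the real set of sources would be longer:

```sh
#!/usr/bin/env bash
# At ISO build time: write both variants of the APT sources.
#   $1: target directory (e.g. the chroot's etc/apt)
#   $2: distribution codename, e.g. wheezy
#   $3: URL of the snapshot used to build the ISO (frozen)
#   $4: URL of the live Debian archive
write_sources_variants() {
    local dir="$1" dist="$2" frozen_url="$3" live_url="$4"
    printf 'deb %s %s main\n' "$frozen_url" "$dist" > "$dir/sources.list.frozen"
    printf 'deb %s %s main\n' "$live_url" "$dist" > "$dir/sources.list.live"
}

# At boot time: switching to the live archive is a mere rename.
#   $1: directory holding the variants
switch_to_live_sources() {
    local dir="$1"
    mv -f -- "$dir/sources.list.live" "$dir/sources.list"
}
```

Both files are generated from the same inputs as the rest of the build, so they stay identical across rebuilds from the same Git tag, as the "Assumptions" section requires.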
## Upgrading to a new snapshot
In other words: bumping, in Git, the pointers to the set of snapshots
that shall be used. As an example of a situation in which we might
want to do that, let's use upgrading to a new Debian point-release.

With this design:

* `devel` gets the new point-release automatically, because it closely
  tracks the Debian archive;
* for release branches (`stable`, `testing`): on a case-by-case
  basis, depending on the respective Debian/Tails release schedule
  timing, we can choose whether to switch to using a new snapshot of
  the Debian archive for the next release. Note that this can be done
  via a topic branch, since this information is encoded in Git. If we
  choose not to manually pick the point-release, which is the default
  if we don't act at all, then:
    - `testing` will start using the new Debian point-release as soon
      as it is unfrozen, that is as soon as it has been used to release
      a new major version of Tails;
    - `stable` will start using the new Debian point-release once
      a `testing` branch that uses that point-release is merged into
      `stable`.
<a id="freeze-exceptions"></a>
## Freeze exceptions
This is a new problem brought by using "frozen" snapshots of APT
repositories during a Tails code freeze: some bug, that we want to see
fixed in the release we are preparing, would be resolved if we pulled
an upgraded package as-is from a fresher Debian APT repository.
Before we could freeze APT repositories, we would have got this bugfix
for free. Now we need to grant freeze exceptions.

This is similar to "Upgrading to a new snapshot", except that we want
to upgrade one package only. By definition, this only affects *frozen*
release branches (`stable`, `testing`), and topic branches based on
them: all other branches use the freshest set of APT repository
snapshots available.

Most of the time, a bugfix branch we want to merge into a frozen
release branch doesn't need to upgrade packages from Debian, so this
is a corner case for the time being. Moreover, so far we have always
dealt with this problem entirely by hand, so it's not critical to
provide much improved tools. What makes it tempting to improve the
situation here is mostly:

* even though freeze exceptions will remain exceptions, freezing APT
  repositories adds one use case for them;
* this will become a relatively common operation if we are based on
  Debian testing some day, so let's check that it's not only
  possible, but also reasonably easy to handle with this design
  (otherwise we may have to switch to more powerful tools, such as
  dak + britney).
Definition: here, we'll call "overlay [[contribute/APT repository]]"
the set of Tails-specific APT suites that we have been maintaining for
a few years. They are overlays in the sense that they don't contain
all the packages that can be found in Debian: building a Tails ISO
image also requires another kind of APT sources, that are more
complete.

We can handle freeze exceptions this way:

1. Import the package we want to upgrade into our own overlay
   [[contribute/APT repository]], in the suite corresponding to the
   branch that should get this package ⇒ in the general case, the
   upgraded package will be installed in the next Tails release.
   We need a tool to do that (would `reprepro pull` with a custom
   filter do the job?).
2. Pin, in `config/chroot_apt/preferences`, the upgraded package we
   have just imported. The aforementioned tool can do this as well.
   [Our current default APT pinning ranks Tails overlay APT suites
   over any other APT source, so why the need to add an APT pinning
   entry? The problem is that it's hard to do the next step (clean up)
   with this APT pinning, combined with the fact that we can't easily
   delete a package from an APT suite and see this deletion propagated
   over suite merges. I (intrigeri) was not able to find a good
   solution to that problem under these constraints, so this document
   assumes that we change this, and pin our overlay APT suites at the
   same level as the APT sources corresponding to the Debian release
   Tails is currently based on. This implies that we manually pin, in
   Git, the packages from our overlay APT suites that we want to
   override the ones found in other repositories regardless of
   version numbers.]
3. Make it so branches stop using the upgraded package once they have
   been unfrozen, that is once the upgraded package can be fetched
   from a time-based snapshot of the repository we've initially pulled
   it from. Reverting the commit that added the corresponding APT
   pinning in the first place is enough. This should be done by the
   release manager, immediately after a release, when they unfreeze
   the branch used for the release and merge it into other release
   branches. But the RM needs to know which commit to revert, so we
   need to keep track of such upgrades: ideally, the tool used to pull
   the upgraded package in the first place should generate the
   commands that will need to be run post-release, or save the data
   needed to generate these commands; such information must be pasted
   somewhere, e.g. into a new ticket. These commands could even be
   scheduled to be run post-release automatically, if we're
   comfortable giving commit access to our Git repository to more
   machines, and have time to implement it.
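Steps 2 and 3 could be supported by a tool along these lines. This is a sketch only: the function names are hypothetical, and while `Pin: version` is standard apt_preferences(5) syntax, the priority such an entry needs depends on how we end up pinning our overlay suites, so it is left as a required argument rather than guessed:

```sh
#!/usr/bin/env bash
# Step 2: append an APT pinning entry for the upgraded package.
#   $1: preferences file (e.g. config/chroot_apt/preferences)
#   $2: package name
#   $3: version imported into the overlay suite
#   $4: pin priority
add_freeze_exception_pin() {
    local prefs="$1" package="$2" version="$3" priority="$4"
    # apt_preferences(5) entries are separated by blank lines
    if [ -s "$prefs" ]; then
        printf '\n' >> "$prefs"
    fi
    printf 'Package: %s\nPin: version %s\nPin-Priority: %s\n' \
        "$package" "$version" "$priority" >> "$prefs"
}

# Step 3: print the command the RM must run once the branch is unfrozen.
#   $1: ID of the commit that added the pinning entry
post_release_command() {
    printf 'git revert %s\n' "$1"
}
```

After committing the new entry, `post_release_command "$(git rev-parse HEAD)"` yields the line to paste into the ticket that tracks the post-release cleanup.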
Another option, instead of adding/removing temporary APT pinning,
would be to backport the package we want to upgrade, and make it so it
has a version greater than the one in the time-based snapshot used by
the frozen release branch, and lower than the one in more recent
time-based snapshots. This means building and uploading the package to
the relevant overlay APT suite. This is appealing, because it doesn't
require any cleanup: the upgraded package will automatically be
superseded as soon as it can be. However:

* we would not benefit from Debian features like reproducible builds;
* it requires either manual work and bandwidth every time, or setting
  up and maintaining infrastructure to automate the whole thing;
* the fact that the changes *have to* go through Git, with the APT
  pinning option, helps enforce our review'n'merge processes; one
  can do it by the book with the custom backport option too, by going
  through a topic branch and `config/APT_overlays.d/`, but it still
  conveys less historical information through Git than the APT
  pinning option.
## Number of distributions
... in reprepro's `conf/distributions`, for the reprepro instance(s)
dedicated to taking snapshots of the regular Debian archive, assuming
other mirrored archives such as security.d.o, deb.tpo, etc. each go to
their own reprepro instance.

XXX: the more we split between multiple instances of reprepro, the
smaller and more manageable each database becomes. But it implies some
disk space waste due to duplicated files, and some bandwidth waste due
to re-downloading these duplicated packages. If the waste is limited to
the packages from security.d.o that get included in the next
{oldstable,stable} point release, we can perhaps live with it.
### Time-based snapshots
14 distributions:

    ( oldstable * (base, updates, p-u, backports, sloppy-backports)
    + stable * (base, updates, p-u, backports)
    + testing * (base, updates, p-u)
    + sid
    + experimental
    )

That is:

    4 snapshots a day (=~ 1/dinstall run) * 14 distributions * N days
    = 56 * N

Let's set N to match the `Valid-Until` duration we want: it makes
little sense to keep expired snapshots around, and reciprocally it
makes little sense to give a snapshot a validity time that goes beyond
when we'll delete it via garbage collection. With the 10 days chosen
in the "Valid-Until and signing" section:

⇒ 56 * N = 56 * 10 = 560

Number of distributions for other archives:

- debian-security: 3 (oldstable, stable, testing)
- tails: 3 (stable, testing, devel)
- torproject: 5 (oldstable, stable, testing, unstable, obfs4proxy)
#### Garbage collection
Simply cloning an existing Wheezy/i386/main "distribution" adds 100 MB
to `reprepro`'s database (*not* counting the actual packages!), so the
whole thing will likely be quite big ⇒ expiring the snapshots older
than N days will probably be compulsory.

To ensure that garbage collection doesn't delete a snapshot we still
need, e.g. the one currently referenced in the frozen `testing`
branch, we'll maintain a list of snapshots that need to be kept
around. The tool used by the RM to bump the archive snapshot serials
in Git should take care of it.
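The expiry rule, combined with the list of sticky snapshots, can be sketched as a filter over serials. The one-serial-per-line format of the sticky list and the `expired_snapshots` name are assumptions; actually deleting the snapshots is out of scope here:

```sh
#!/usr/bin/env bash
# Print the serials that garbage collection may delete.
#   $1:   cutoff day (YYYYMMDD): snapshots taken before it have expired
#   $2:   sticky list: file with one serial per line to keep regardless
#   $3..: existing serials
expired_snapshots() {
    local cutoff="$1" sticky="$2" serial
    shift 2
    for serial in "$@"; do
        # Fixed-width serials: comparing the YYYYMMDD prefixes as
        # strings is the same as comparing the dates
        [[ "${serial:0:8}" < "$cutoff" ]] || continue
        # Never delete a sticky snapshot
        if [ -f "$sticky" ] && grep -qxF -- "$serial" "$sticky"; then
            continue
        fi
        printf '%s\n' "$serial"
    done
}
```

The cutoff could be computed with GNU `date -u -d "$N days ago" '+%Y%m%d'`. For snapshots created with `reprepro gensnapshot`, our understanding is that reprepro(1) documents `unreferencesnapshot` for dropping a snapshot's references before running `deleteunreferenced`; this needs checking against the "Snapshots" experiments below.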
### Tagged snapshots
We want to keep "forever" the tagged snapshots used by releases.
In practice, "forever" == max(3 years for GPL, how long we want to be
able to reproduce the build of a released ISO) = 3 years.

    12 releases/year * 13 distributions =~ 150 distributions/year

⇒ 450 distributions three years after deployment, which is the upper
bound if we delete such snapshots when they're 3 years old.
#### Garbage collection
Depending on the growth rate of this `reprepro` instance's database,
we may or may not need to implement expiration of these snapshots any
time soon. Time will tell.
## reprepro
XXX:

* use `Log:` in `conf/distributions`? Deployed (20151030); let's look
  at it and reconsider in a few weeks
* use `Tracking:` in `conf/distributions`?
* use a leading dash for `Update: - ...` in `conf/distributions`?
  <https://mirrorer.alioth.debian.org/reprepro.1.html#Some%20note%20on%20updates>
* compare fields in generated `Release` files with what can be found
  in the official Debian archive
* "Reprepro uses berkeley db, which was a big mistake. The most
  annoying problem not yet worked around is database corruption when
  the disk runs out of space. (Luckily if it happens while
  downloading packages while updating, only the files database is
  affected, which is easy (though time consuming) to rebuild, see
  recovery file in the documentation). *Ideally put the database on
  another partition to avoid that.*" (emphasis mine, from
  [reprepro(1)](https://mirrorer.alioth.debian.org/reprepro.1.html#BUGS))
There's a race condition when updating a local mirror with `reprepro
update`: if it has not finished by the time the next dinstall run and
the subsequent mirror sync end, then files `reprepro` wants to
download can have disappeared from the remote mirror, and `reprepro
update` will fail (exit code = 255). So, when the first run exits
with exit code 255, let's ignore the error and run `reprepro update`
a second time.
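A minimal sketch of this retry policy; `update_mirror` is a hypothetical wrapper name, and any extra arguments (e.g. `-b BASEDIR`) are passed through to reprepro:

```sh
#!/usr/bin/env bash
# Run `reprepro update`, retrying once if the first run exits with 255,
# the exit code we get when files vanished from the remote mirror in
# the middle of the update. Any other failure is returned as-is.
update_mirror() {
    local rc=0
    reprepro "$@" update || rc=$?
    if [ "$rc" -eq 255 ]; then
        echo "reprepro update exited with 255, retrying once" >&2
        rc=0
        reprepro "$@" update || rc=$?
    fi
    return "$rc"
}
```

Retrying only once keeps a genuinely broken mirror from looping forever: a second 255 is reported like any other error.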
### Snapshots
In our [initial
experiments](https://labs.riseup.net/code/issues/6295#note-14) we
added full-blown distributions to `conf/distributions` for each
snapshot, and used `reprepro pull $codename` to add packages to them.

Let's try with `reprepro gensnapshot`, which avoids the need to manage
the list of snapshots in `conf/distributions`. The following tests are
run with `conf/{distributions,updates}` set up to mirror the 14
distributions we want from the Debian archive.

Creating one snapshot:

    # List the codenames of all distributions defined in conf/distributions
    distributions() {
        sed -rn -e 's/^Codename:\s+(.*)$/\1/p' conf/distributions
    }

    # First snapshot of the day: YYYYMMDD01
    serial="$(date -u '+%Y%m%d')01"
    for codename in $(distributions) ; do
        reprepro gensnapshot "$codename" "$serial"
    done
⇒ `dists/*/snapshots` takes 400 MB (a snapshot done with `reprepro
pull` would of course add essentially the same files somewhere else in
`dists`, and occupy the same disk space in there), but the DB doesn't
grow noticeably.

And then, jumping to 40 (10 days * 4 snapshots/day) snapshots of each
distribution, which is what we should have in practice:

    # --equal-width zero-pads the IDs: 02, 03, ..., 40
    for incr in $(seq --equal-width 2 40); do
        serial="$(date -u '+%Y%m%d')$incr"
        for codename in $(distributions) ; do
            reprepro gensnapshot "$codename" "$serial"
        done
    done
⇒ `dists/*/snapshots` takes 16 GB, and the DB has grown from 900 MB to
1.5 GB; as expected, `packages.db` didn't grow at all: only
`references.db` did.

Conclusion: compared to the "snapshots as full-blown distributions +
`reprepro pull`" option, we're saving _a lot_ on database size, which
is very appealing. The counterparts are that:

* garbage collecting expired snapshots is a bit more involved, but
  apparently doable: see reprepro(1) around `gensnapshot`;
* bumping `Valid-Until` for a given time-based snapshot has to be
  done directly in `dists`, without any help from reprepro.

XXX: find out how we can solve these two problems.
Neither of these problems seems to warrant going back to the other
option... and having to deal with 80 GB+ BDB databases.
<a id="build-manifest"></a>
## Listing used packages
Saved as an ISO build artifact, both when building in Jenkins and
outside of it.

Output:

- for each .deb: the installed version;
- the union of all APT sources used during the build.
### Corner case
If a (package, version) is seen at build time in 2 or more APT
sources, `tails-prepare-tagged-apt-snapshot-import` will inject it
into each of the tagged snapshots corresponding to these sources.
The goal is to avoid this scenario:

- version X of package P is available both in suite S1 on origin O1,
  and in suite S2 on origin O2;
- version Y of package P is available in suite S3 on origin O3;
- our pinning makes us prefer version X of package P *because it's
  available in O1/S1*; otherwise, if it wasn't in there, then our
  pinning would make APT prefer version Y to version X;
- at ISO build time, APT fetches package P version X from O2/S2;
- given this build manifest, we import package P version X into our
  tagged snapshot of O2/S2, but not into our tagged snapshot of O1/S1;
- if we rebuild from the same source tree using that set of tagged
  snapshots, then version Y of package P will be installed.

This scenario can happen in practice:
    # cat /etc/apt/sources.list
    deb http://security.debian.org wheezy/updates main
    deb http://ftp.us.debian.org/debian/ wheezy main
    deb http://ftp.us.debian.org/debian/ jessie main
    # cat /etc/apt/preferences
    Package: *
    Pin: origin security.debian.org
    Pin-Priority: -10

    Package: *
    Pin: release o=Debian,n=wheezy
    Pin-Priority: 990

    Package: *
    Pin: release o=Debian,n=jessie
    Pin-Priority: 700
    # apt-cache madison a2ps
          a2ps | 1:4.14-1.3 | http://ftp.us.debian.org/debian/ jessie/main amd64 Packages
          a2ps | 1:4.14-1.1+deb7u1 | http://security.debian.org/ wheezy/updates/main amd64 Packages
          a2ps | 1:4.14-1.1+deb7u1 | http://ftp.us.debian.org/debian/ wheezy/main amd64 Packages
    # apt-cache policy a2ps
    a2ps:
      Installed: (none)
      Candidate: 1:4.14-1.1+deb7u1
      Version table:
         1:4.14-1.3 0
            700 http://ftp.us.debian.org/debian/ jessie/main amd64 Packages
         1:4.14-1.1+deb7u1 0
            -10 http://security.debian.org/ wheezy/updates/main amd64 Packages
            990 http://ftp.us.debian.org/debian/ wheezy/main amd64 Packages

And then, in the current state of things, APT will download `a2ps` from
security.d.o:

    # apt-get download a2ps --print-uris
    'http://security.debian.org/pool/updates/main/a/a2ps/a2ps_4.14-1.1+deb7u1_amd64.deb' a2ps_4.14-1.1+deb7u1_amd64.deb 956298 sha256:e47d7fe9adb7aa62421108debf425830f4e2385e98151c5cb359d3eb8688eea8
... but if `a2ps` were not available in the regular Wheezy archive,
e.g. because we were using a tagged snapshot that imported `a2ps` only
into the security archive, then APT would prefer `a2ps` from Jessie,
which demonstrates the bug... hence the "inject it into each of the
tagged snapshots corresponding to these sources" requirement we had.
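The "inject it into each of the tagged snapshots" rule can be sketched as a join between what was installed and what each APT source offered at build time. Both input formats below, and the `imports_per_source` name, are assumptions made for illustration, not the actual format produced by `generate-build-manifest`:

```sh
#!/usr/bin/env bash
# Print one "SOURCE PACKAGE VERSION" line per tagged snapshot that must
# receive a package used during the build.
#   $1: installed packages, one "PACKAGE VERSION" per line
#   $2: packages offered by each APT source at build time,
#       one "SOURCE PACKAGE VERSION" per line
imports_per_source() {
    # First file: remember every (package, version) that was installed.
    # Second file: keep every source offering one of these pairs.
    awk 'FILENAME == ARGV[1] { used[$1 " " $2] = 1; next }
         ($2 " " $3) in used { print }' "$1" "$2" | LC_ALL=C sort -u
}
```

Fed with the `a2ps` example above, this selects both the Wheezy and the security.d.o sources for `a2ps` 1:4.14-1.1+deb7u1, and leaves Jessie out.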
### Bonus material
- Makes it possible to inspect the diff between the subsets of two
  different snapshots that were used at build time. The benefit would
  be minor if we only used Debian oldstable or stable, but we also
  fetch packages from testing, sid, backports, etc.; and if/when we
  switch to being based on Debian testing, we will definitely want
  that.
- Say a branch (topic one, or devel, etc.) introduces
  a regression, and changes the set of packages used at build
  time: we may want to check how exactly that set was changed.
  Think "check the diff between `.packages`" as we do at release
  time, but done in a correct way.
- Allows keeping only _partial_ snapshots (of our full archive
  ones) for those we want to keep forever, i.e. release ones.
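Comparing two build manifests, as in the second item above, can be as simple as the following sketch, assuming a manifest reduced to one `PACKAGE VERSION` line per installed package (an assumption; the actual manifest format may differ) and a hypothetical `manifest_changes` name:

```sh
#!/usr/bin/env bash
# Print "PACKAGE OLD_VERSION NEW_VERSION" for each package whose version
# differs between two build manifests; "-" marks a missing package.
#   $1: old manifest, $2: new manifest, one "PACKAGE VERSION" per line
manifest_changes() {
    awk 'FILENAME == ARGV[1] { old[$1] = $2; next }
         { new[$1] = $2 }
         END {
             for (p in old)
                 if (!(p in new)) print p, old[p], "-"
             for (p in new) {
                 if (!(p in old)) print p, "-", new[p]
                 if ((p in old) && old[p] != new[p]) print p, old[p], new[p]
             }
         }' "$1" "$2" | LC_ALL=C sort
}
```

Unlike a plain `diff` of two package lists, this reports each changed package once, with both versions side by side.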
## Valid-Until and signing
Assumption: it is acceptable to have our APT repository snapshots
signed by a key that lives on an online server.
We would like to have `Valid-Until` fields in the generated `Release`
files, but we'd rather not have to update these files, and the
corresponding signatures, regularly. In practice:
* A **tagged APT repository snapshot** that was used to build a given
Tails release is immutable by design, so it does not need the
protections provided by `Valid-Until`. Besides, not using
`Valid-Until` for those makes it much easier to reproduce a given
ISO build in the future.
* The main use case for keeping a given **time-based APT repository
snapshot** around and valid is when it's being used by a release
branch:
- `testing`: while it's frozen, that is for 5-10 days most of
the time;
- `stable`: that's a corner case, since `stable` generally uses the
set of tagged snapshots of the latest Tails release; if and when
we decide to manually point `stable` to a different set of
snapshots, then we can just as well deal with `Valid-Until` manually.
So, let's set `Valid-Until` 10 days after the generation time for
time-based snapshots, and not set it at all for tagged snapshots.
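The policy above fits in a few lines. This is a sketch under the assumption that the value uses the usual `Release` file date format (`Mon, 19 Oct 2015 06:00:00 UTC`); the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Validity window for time-based snapshots, as proposed above.
TIME_BASED_VALIDITY = timedelta(days=10)

# Date format used by the Date/Valid-Until fields of Debian Release files.
RELEASE_DATE_FORMAT = "%a, %d %b %Y %H:%M:%S UTC"


def valid_until_field(generated_at: datetime, tagged: bool) -> Optional[str]:
    """Return the Valid-Until value for a snapshot's Release file, or None.

    generated_at must be timezone-aware. Tagged snapshots are immutable,
    so they get no Valid-Until field at all.
    """
    if tagged:
        return None
    expiry = generated_at.astimezone(timezone.utc) + TIME_BASED_VALIDITY
    return expiry.strftime(RELEASE_DATE_FORMAT)
```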
Still, it may be that we need to bump `Valid-Until` for a given
time-based snapshot, e.g. if a freeze lasts substantially longer than
usual. We thus need a tool that allows us (XXX: the RM?
sysadmin team?) to do so.
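The core of such a bump tool could look like the following sketch (hypothetical function, not an existing reprepro feature). It only rewrites the `Release` text; the signatures (`Release.gpg`, and `InRelease` if published) would then have to be regenerated separately, which is not shown here.

```python
def bump_valid_until(release_text: str, new_valid_until: str) -> str:
    """Return release_text with its Valid-Until field set to new_valid_until.

    An existing top-level Valid-Until line is replaced; otherwise the
    field is inserted right after the Date field. Continuation lines
    (the checksum blocks, which start with a space) are never touched.
    """
    lines = release_text.splitlines(keepends=True)
    new_line = f"Valid-Until: {new_valid_until}\n"
    for i, line in enumerate(lines):
        if line.startswith("Valid-Until:"):
            lines[i] = new_line
            return "".join(lines)
    for i, line in enumerate(lines):
        if line.startswith("Date:"):
            # Make sure the Date line is terminated before appending after it.
            if not line.endswith("\n"):
                lines[i] = line + "\n"
            lines.insert(i + 1, new_line)
            return "".join(lines)
    raise ValueError("no Date field found in the Release file")
```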
In passing, note that we ship an empty `/var/lib/apt/lists/` in the
ISO ⇒ modifying `Release` and `Release.gpg` files on our APT
repository won't prevent the ISO build from being deterministic.
## APT vs. reprepro: dist names
We need to encode in the APT sources' base URL the exact snapshot we
want to use, in order to be able to pass it to `lb config --mirror-*`.
But this doesn't match reprepro's directory structure as-is.
Thankfully, this problem can be worked around with some symlinks or
HTTP rewrite rules. Here's how.
Let's assume:

    lb config --distribution wheezy
    lb config --mirror-chroot http://XYZ.tails.boum.org/debian/20151019/
    lb config --mirror-chroot-security http://XYZ.tails.boum.org/debian-security/20151021/

and so on for the other `--mirror-*` options, which generates this APT
`sources.list`:

    deb http://XYZ.tails.boum.org/debian/20151019/ wheezy main
    deb http://XYZ.tails.boum.org/debian-security/20151021/ wheezy/updates main

As a result, APT sends HTTP requests to URLs such as:
* <http://XYZ.tails.boum.org/debian/20151019/dists/wheezy/Release>
* <http://XYZ.tails.boum.org/debian-security/20151021/dists/wheezy/updates/Release>
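For reference, APT builds these URLs by appending `dists/<suite>/` to the base URL, which is why the snapshot ID ends up *before* `dists/`. A simplified sketch (it ignores `[options]`, `InRelease`/`Release.gpg`, compressed index variants and flat repositories):

```python
from typing import List


def index_urls(sources_line: str, arch: str = "amd64") -> List[str]:
    """Return a subset of the URLs APT requests for a 'deb' sources.list line."""
    kind, base, suite, *components = sources_line.split()
    if kind != "deb":
        raise ValueError("only 'deb' lines are handled in this sketch")
    dist_url = f"{base.rstrip('/')}/dists/{suite}"
    urls = [f"{dist_url}/Release"]
    for component in components:
        urls.append(f"{dist_url}/{component}/binary-{arch}/Packages")
    return urls
```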
XXX: update the following if we decide to use `reprepro gensnapshot`,
which implies slightly different paths.
The corresponding files in reprepro's filesystem (if we have one
reprepro instance per mirrored archive) are:
* in Debian archive's reprepro:
- `/srv/foo/debian/dists/wheezy/20151019/Release`
- `/srv/foo/debian/conf/distributions` contains `Suite: wheezy/20151019`
* in Debian security archive's reprepro:
- `/srv/foo/debian-security/dists/wheezy/updates/20151021/Release`
- `/srv/foo/debian-security/conf/distributions` contains
`Suite: wheezy/updates/20151021`
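The correspondence between the request URLs and the reprepro files boils down to moving the snapshot ID from right after the archive name to right after the suite. Here is a sketch of that translation, as an HTTP rewrite layer would have to perform it. The per-archive suite list is a hypothetical configuration, and the `pool/` handling relies on all distributions of a reprepro instance sharing one pool.

```python
import posixpath
from typing import Dict, Tuple

# Hypothetical configuration: suites served by each mirrored archive
# (one reprepro instance per archive).
SUITES: Dict[str, Tuple[str, ...]] = {
    "debian": ("wheezy",),
    "debian-security": ("wheezy/updates",),
}
REPREPRO_ROOT = "/srv/foo"


def translate(url_path: str) -> str:
    """Map an APT request path to the file in the matching reprepro tree.

    /<archive>/<snapshot>/dists/<suite>/<rest>
        -> /srv/foo/<archive>/dists/<suite>/<snapshot>/<rest>
    /<archive>/<snapshot>/pool/<rest>
        -> /srv/foo/<archive>/pool/<rest>
    """
    archive, snapshot, area, rest = url_path.lstrip("/").split("/", 3)
    base = posixpath.join(REPREPRO_ROOT, archive)
    if area == "pool":
        return posixpath.join(base, "pool", rest)
    if area != "dists":
        raise ValueError(f"unexpected path: {url_path}")
    # Try longer suites first, so that e.g. 'wheezy/updates' beats 'wheezy'.
    for suite in sorted(SUITES[archive], key=len, reverse=True):
        if rest == suite or rest.startswith(suite + "/"):
            tail = rest[len(suite):].lstrip("/")
            return posixpath.join(base, "dists", suite, snapshot, tail)
    raise ValueError(f"unknown suite in: {url_path}")
```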
To have these HTTP requests map to these files, one needs
either symlinks (tested successfully) or HTTP rewrite rules.
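For the symlink option, one possible layout is sketched below. It is hypothetical: the exact symlinks that were tested are not documented here. A separate web root gets one directory per snapshot, whose `dists/<suite>` entry points into the reprepro tree and whose `pool` points to the shared pool.

```python
import os


def link_snapshot(webroot: str, reprepro_root: str, archive: str,
                  suite: str, snapshot: str) -> None:
    """Make <webroot>/<archive>/<snapshot>/dists/<suite>/... resolve to
    <reprepro_root>/<archive>/dists/<suite>/<snapshot>/...

    Existing symlinks are left alone, so calling this twice is harmless.
    """
    snap_dir = os.path.join(webroot, archive, snapshot)
    link = os.path.join(snap_dir, "dists", suite)
    target = os.path.join(reprepro_root, archive, "dists", suite, snapshot)
    # For 'wheezy/updates', this creates the real directory .../dists/wheezy.
    os.makedirs(os.path.dirname(link), exist_ok=True)
    if not os.path.islink(link):
        os.symlink(target, link)
    pool_link = os.path.join(snap_dir, "pool")
    if not os.path.islink(pool_link):
        os.symlink(os.path.join(reprepro_root, archive, "pool"), pool_link)
```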
Note: this works because APT only warns when the codename in the
`Release` file doesn't match the one requested in `sources.list`.
There's a code comment around this check, dating back to 2004, that
says something like "This might become fatal in the future". We bet that if it
becomes fatal some day, it will be possible to turn it back into
a warning via configuration. This affects only development builds
since we're not going to configure APT _in the Tails ISO_ to point to
our own snapshots of the Debian archive.
# Bonus for later
This mechanism can perhaps be reused for snapshotting the state of our
own repo at release time (e.g. to create/publish the `1.6` APT suite).
If the chosen mirroring/snapshotting tool supported reusing the Debian
signature (e.g. <https://github.com/smira/aptly/issues/37>), then we
would only have to sign ourselves the snapshots whose `Release` file we
need to modify, that is, when we bump (too long freeze) or remove
(at release time) `Valid-Until`. This happens rarely and can be done
manually ⇒ we can avoid storing the signing key on an online server.
# Discarded
## "Remote" APT repository snapshots
... i.e. using snapshot.debian.org directly, instead of mirroring the
files ourselves.
Discarded because:
* not substantially simpler than our other designs;
* having to re-implement `Valid-Until` checking is scary;
* too much reliance on an external service.
frozen mode = when building from a tag => use snapshot.d.o with
a timestamp manually set in Git => need code that tells us which
dinstall timestamp was current at some point during a validated build
(racy but no big deal; we can kill the race condition by using a local
mirror whose updates are disabled during builds)
regular mode = otherwise => use ftp.us.d.o
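The mode switch itself is trivial. A sketch, assuming the timestamp stored in Git uses snapshot.debian.org's `YYYYMMDDTHHMMSSZ` form and its `/archive/debian/<timestamp>/` URL scheme; the function name is hypothetical:

```python
import re
from typing import Optional

REGULAR_MIRROR = "http://ftp.us.debian.org/debian/"
SNAPSHOT_BASE = "http://snapshot.debian.org/archive/debian/"
TIMESTAMP_RE = re.compile(r"\d{8}T\d{6}Z")


def debian_mirror(frozen_timestamp: Optional[str]) -> str:
    """Regular mode when no timestamp is set, frozen mode otherwise."""
    if frozen_timestamp is None:
        return REGULAR_MIRROR
    if not TIMESTAMP_RE.fullmatch(frozen_timestamp):
        raise ValueError(f"malformed snapshot timestamp: {frozen_timestamp!r}")
    return f"{SNAPSHOT_BASE}{frozen_timestamp}/"
```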
* Directly use snapshot.d.o + dinstall ID
- basically replaces e.g. aptly's snapshot / "reprepro pull in new
suite" feature
- The fastest possible way to do a new snapshot, since we don't have
to store nor pull anything at all.
- Doesn't introduce a database we have to maintain, trusting
software never to corrupt it.
- the dinstall ID at which a given mirror was last updated can be
retrieved from that mirror, e.g. `Archive serial` in
<http://ftp.fr.debian.org/debian/project/trace/ftp-master.debian.org>
- Blocker: `Valid-Until` can be invalid:
* If we don't bump the dinstall ID at least once a week as part of
the normal development process. Seems impractical (e.g.
we sometimes freeze for more than a week) and too rigid.
* When rebuilding from an old tag (older than a week).
- But do we want to depend on snapshot.d.o that much?
* Served from two different locations.
* Ask weasel if we can go this way. Make it clear how much we care
about _old_ data, e.g.:
- For reproducible rebuild check we only care about re-building
the last release, or the last few releases.
- GPL requires distributing the source for at least 3 years
after we stop distributing the binaries.
- Whonix uses that, go look/ask for pros/cons they've seen.
- other repos e.g. deb.tpo; we can probably handle it in a very
ad-hoc and lightweight way, by importing the packages we want into
our own Tails-specific APT suites, or with reprepro's mirroring
(`pull`) feature.
- to avoid relying on browsing <http://snapshot.debian.org/> for
getting the dinstall timestamp we'll stick into Git, we need
a script that does the browsing _and_ validates that the determined
timestamp is not too far away in the past.
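Two building blocks for such a script are sketched below. First, a generic `Key: value` lookup for `Archive serial` in a mirror trace file; the sample trace in the usage is made up. Second, a staleness check on a snapshot.debian.org timestamp. The 7-day limit is a hypothetical threshold echoing the "once a week" concern above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def archive_serial(trace_text: str) -> Optional[str]:
    """Return the 'Archive serial' value from a mirror trace file, if any."""
    for line in trace_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Archive serial":
            return value.strip()
    return None


def check_snapshot_timestamp(timestamp: str, now: datetime,
                             max_age: timedelta = timedelta(days=7)) -> datetime:
    """Parse a YYYYMMDDTHHMMSSZ timestamp; reject future or too old ones.

    now must be timezone-aware.
    """
    when = datetime.strptime(timestamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    if when > now:
        raise ValueError(f"{timestamp} is in the future")
    if now - when > max_age:
        raise ValueError(f"{timestamp} is older than {max_age}")
    return when
```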
## Partial APT repository snapshots
Discarded because:
* it raises a tricky chicken'n'egg problem: managing the list of
packages that will be needed to build a given Git branch, and
maintaining the set of partial APT repository snapshots that these
branches need;
* partially snapshotting live APT repositories (e.g.
security.debian.org) is racy: between the time when we build an ISO
to get the list of packages we need to import, and the time we
actually import them, files can have disappeared on the mirrors.
- pro: faster sync ⇒ faster snapshots ⇒ shorter time to remediation
However, we can have something similar with full snapshots, if we
continuously update a temporary snapshot, and then when we need it
we only have to stick some label onto it.
- cons: more complex... except perhaps if we want to optimize time to
remediation for full snapshots as described above.
### Snapshot names
For partial mirroring, their name must contain:
* Debian origin (`debian`, `debian-security`)
* Debian distribution (`sid`, `jessie`, `jessie-backports`, etc.)
* name of the Tails Git release/base branch that needs this set of
packages
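For illustration only, a name combining these three components could be built as follows. The `_` separator and the `/` to `+` substitution for Git branch names are hypothetical choices. `_` is picked because Debian distribution names already use `-`, and rejecting it inside components keeps the name splittable back into its three parts.

```python
SEPARATOR = "_"  # hypothetical; Debian distribution names already use "-"


def partial_snapshot_name(origin: str, distribution: str, branch: str) -> str:
    """Build '<origin>_<distribution>_<branch>' for a partial snapshot.

    '/' in Git branch names (e.g. 'feature/jessie') becomes '+', so the
    result fits in a single path component.
    """
    parts = (origin, distribution, branch.replace("/", "+"))
    for part in parts:
        if not part or SEPARATOR in part:
            raise ValueError(f"invalid name component: {part!r}")
    return SEPARATOR.join(parts)
```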
## Replace index expiration check
For the remote snapshots (snapshot.d.o) solution, we _have_ to do
that. For partial and full archive snapshots, this is optional: the
only advantage is that it allows us to _not_ periodically update
`Valid-Until` and signature.
One "solution" would be to replace `Acquire::Check-Valid-Until`:
- runtime: we point APT sources to the regular Debian archive, no
need to disable `Acquire::Check-Valid-Until`, we're good.
- ISO build time: we know when we've frozen ⇒ we can tell APT not to
do that check, and check the Release files ourselves based on the
additional info and constraints we have; a bit risky, no right to
fail, but not totally scary; so draft a security discussion, then
have it reviewed
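As a starting point for that security discussion, here is a sketch of one possible build-time check. It would run once APT's own check is disabled, e.g. with `apt-get -o Acquire::Check-Valid-Until=false`. The policy is hypothetical: `min_date` would be recorded in Git when freezing, older `Release` files are rejected as possible rollbacks, and dates in the future are rejected as equally suspicious. It assumes the `Date` field is expressed in UTC.

```python
from datetime import datetime, timezone

RELEASE_DATE_FORMAT = "%a, %d %b %Y %H:%M:%S UTC"


def release_date(release_text: str) -> datetime:
    """Return the (UTC) Date field of a Release file."""
    for line in release_text.splitlines():
        if line.startswith("Date:"):
            value = line.split(":", 1)[1].strip()
            return datetime.strptime(value, RELEASE_DATE_FORMAT).replace(tzinfo=timezone.utc)
    raise ValueError("no Date field found in the Release file")


def check_frozen_release(release_text: str, min_date: datetime,
                         now: datetime) -> datetime:
    """Accept the Release file only if min_date <= Date <= now.

    Both bounds must be timezone-aware datetimes.
    """
    date = release_date(release_text)
    if date < min_date:
        raise ValueError(f"Release file dated {date} predates {min_date}")
    if date > now:
        raise ValueError(f"Release file dated {date} is in the future")
    return date
```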
## dak, britney, merge-o-matic, debile, etc.
Overkill, and not really meant to address our needs.
Let's instead write our own :P