[[!meta title="Translation platform"]]

Until 2019, our (website) translation infrastructure required
translators to [[know how to use
Git|contribute/how/translate/with_Git]]. This was a high entry
barrier for new translators, especially those who are not familiar with
Git or the command line.

This is the technical design documentation of our new setup.
It is by no means perfect. We track known issues via
[tickets on Redmine](https://redmine.tails.boum.org/code/projects/tails/issues?query_id=321).

[[!toc levels=2]]

Terminology used in this document
=================================

- Canonical Git repository: the [[main Tails Git
  repository|contribute/git#main-repo]] that our
  website is built from, in scripts often called "main repository" or "main
  Git"
- Production server: the server that hosts our website
- translate.lizard: the VM that hosts our Weblate web interface, the
  corresponding Git repositories, as well as the [staging website](https://staging.tails.boum.org/).

Setup and integration with our infrastructure
=============================================

We are using our own [Weblate instance](https://translate.tails.boum.org/).

Weblate uses a clone of the Tails main Git repository, to which
translations get committed and pushed once they have been approved by a user with
reviewer status. Non-approved translations live on Weblate's database
only, until they get reviewed. A [staging website](https://staging.tails.boum.org/) allows translators to
preview non-reviewed translations in context.

Approved changes are automatically fed back into our canonical Git
repository. This presents a major challenge, because we need to ensure
that:

- No merge conflict occurs:

  - such conflicts often occur in PO file headers, which prevents Weblate
    from automatically merging changes
  - many contributors work on the same code base using different tools
    (PO files can be edited by hand, using translation software such as
    Poedit, or they are generated by ikiwiki itself, which results in
    different formatting)

- Only PO files are committed.

- The committed PO files comply with shared formatting standards.

- No compromised code is introduced.

In order to integrate Weblate and the work done by translators into our
process, we have set up this scheme:

[[!img "lib/design/git_repository_details.svg" link="no"]]

Website and Weblate
-------------------

Our website uses ikiwiki and its [PO plugin](https://ikiwiki.info/plugins/po/).

It uses Markdown files for
the English original and carries a PO file for each translated
language. We distinguish languages that are activated on our
website from languages that have translations but are not yet activated
on the website because they do not [[cover enough of
our core pages|contribute/how/translate/team/new/]] to be considered
usable.

We have defined [[a list of tier-1
languages|contribute/how/translate#tier-1-languages]] that we consider
to be of importance to our user base. No more languages shall be
activated in Weblate, because our main Git repository carries the
reviewed, and thus approved, translations of all languages enabled on the
Weblate platform, even though only some of them are active on the website.

Each PO file corresponds to a single component in Weblate, in order to
appear in the Weblate interface. For example, the component:

    wiki/src/support.*.po

relates to the files `support.mdwn`, `support.es.po`, `support.de.po`, `support.pot`,
etc.
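
As a rough illustration of how such a file mask selects the translation
files of one component, here is a minimal sketch using Python's `fnmatch`.
The file listing and the function name are hypothetical, and this is not
Weblate's own matching code. Note that the mask itself only matches the PO
files. The `.mdwn` source and the `.pot` template belong to the same
component but do not match the mask.

```python
from fnmatch import fnmatch

# Only the mask comes from the text above; the file listing is made up.
FILE_MASK = "wiki/src/support.*.po"
paths = [
    "wiki/src/support.mdwn",
    "wiki/src/support.pot",
    "wiki/src/support.de.po",
    "wiki/src/support.es.po",
    "wiki/src/about.de.po",
]


def translations_for(mask, candidates):
    """Return the paths matched by a component file mask, sorted."""
    return sorted(p for p in candidates if fnmatch(p, mask))


matched = translations_for(FILE_MASK, paths)
print(matched)
```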

Repositories
------------

The repository used by Weblate is cloned and updated from the master
branch of the Tails main
repository. Changes generated on Weblate's copy
of the Tails main Git repository, located on the VM which hosts the
Weblate platform, are automatically fed back to the master branch of
the Tails main repository. This happens through a number of scripts,
checks, and cronjobs that we'll describe below.

There are several languages enabled, some of them with few or no
translations. As everything is fed back to the Tails canonical
repository, all files are available when cloning this repository:

    git clone https://git-tails.immerda.ch/tails

If needed, in exceptional cases, Weblate's Git repository can be cloned
or added as a remote:

    git clone https://translate.tails.boum.org/git/tails/index/

On the server, the repository is located in:

    ~weblate/repositories/vcs/tails/index

Weblate can commit to its local repository at any time, whenever
translations get approved. Changes done in the canonical repository by
Tails contributors via Git and changes done in Weblate thus need to be
merged — in a safe place. This happens in an integration repository:

    ~weblate/repositories/integration

On the VM (translate.lizard), a third repository is used for the staging
website:

    ~weblate/repositories/vcs/staging

Automatic merging and pushing
-----------------------------

The integration of changes from the different repositories is done by a
script which is executed on the VM hosting Weblate as [a cronjob](https://git-tails.immerda.ch/puppet-tails/tree/manifests/weblate.pp). The
[`cron.sh`](https://git-tails.immerda.ch/puppet-tails/tree/files/weblate/scripts/cron.sh) script
performs the following steps, which are explained below:

  1. Canonical → Integration:
     Update the integration repository with changes made on the
     canonical repository (called "main" in the script).
  2. Make Weblate locally commit any pending approved translation
  3. Weblate → Integration:
     Integrate committed changes from Weblate into the integration repository
  4. Integration → Canonical:
     Push the up-to-date integration repository to the canonical repository.
  5. Canonical → Weblate:
     Pull from the canonical repository and update the Weblate components.
  6. Update Weblate's index for fulltext search

Whenever a contributor modifies a markdown (`*.mdwn`) file and pushes
to master, the corresponding PO files are updated, that is: the
translatable English strings within those files are updated. This
update happens:

 - on the production server itself, when [[building the
   wiki|contribute/build/website]];
 - only for languages that are enabled on the production website.

On the translation platform server, we need to ensure that PO files for
additional languages (that are enabled on Weblate but not on the
production website) are equally updated, committed locally, and pushed to
the canonical Git repository. On top of this, we need to update Weblate's
database accordingly, so that translatable strings can be updated for new or
modified English strings in those files, in all languages.

### Step 1: Canonical → Integration

**Update the integration repository with changes made on the canonical
repository**

The script fetches from the canonical (remote) repository and tries to
merge changes into the (local) integration repository. The merge
strategy used for this step is defined in [`update_weblate_git.py`](https://git-tails.immerda.ch/puppet-tails/tree/files/weblate/scripts/update_weblate_git.py):

When this script is executed, it merges changes in PO files based on
single translation units (`msgids`). A merge conflict occurs when the same
translation unit has been changed both in the canonical and the integration
repository (in the latter case, this would mean that the change has been
done via Weblate). In such a case, we always prefer the canonical
version. This makes sure that Tails developers can fix issues in
translations and have priority over Weblate.

Due to this procedure, we never end up with broken PO files. However, we
may lose a translation done on Weblate.
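
The per-unit merge with canonical priority can be sketched as follows.
This is a simplified illustration, not the code of
`update_weblate_git.py`: plain dictionaries mapping `msgid` to `msgstr`
stand in for parsed PO files, and the function name, the three-way
comparison against a common base, and the sample strings are all
assumptions.

```python
def merge_units(base, canonical, weblate):
    """Three-way merge of {msgid: msgstr} mappings.

    On a conflict (both sides changed the same unit), the canonical
    version wins and the msgid is reported as lost.
    """
    merged = {}
    lost = []  # msgids whose Weblate change was discarded
    for msgid, ours in canonical.items():  # canonical decides which units exist
        theirs = weblate.get(msgid)
        old = base.get(msgid)
        if theirs is None or theirs == old or theirs == ours:
            merged[msgid] = ours    # Weblate made no competing change
        elif ours == old:
            merged[msgid] = theirs  # only Weblate changed this unit
        else:
            merged[msgid] = ours    # both changed it: canonical wins
            lost.append(msgid)
    return merged, lost


base = {"Hello": "Hallo", "Bye": "", "Help": "Hilfe"}
canonical = {"Hello": "Hallo!", "Bye": "", "Help": "Hilfe"}
weblate = {"Hello": "Guten Tag", "Bye": "Tschüss", "Help": "Hilfe"}

merged, lost = merge_units(base, canonical, weblate)
print(merged, lost)
```

In this example the Weblate translation of "Bye" is kept, while the
Weblate change to "Hello" is dropped in favour of the canonical one. This
is the kind of loss described above.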

Up to this point, only PO files of languages that are activated on our
production website have been merged: the production website
does not refresh PO files for languages that are not activated there,
so these PO files are outdated in the canonical Git repository at this point.

Because of this limitation of ikiwiki, once the activated language PO
files are merged, the script checks whether PO files of other languages, which are not
activated in production, need updating. We do this by
generating a POT file out of the PO file of a language that we've previously defined as the
default language. We do this for all components. If the actual POT
file, generated on the production server, differs from the POT file we've
just created, then every additional language PO file needs to be
updated.

On top of this, if the PO file of the default language (that is, its
Markdown file) has been renamed, moved, or deleted, then the PO files of
additional languages need to be accordingly renamed, moved, or deleted.

In summary, our script applies all changes detected on the default
language to the additional languages.
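
To make the rename case concrete, the following sketch derives the renames
for additional languages from one rename seen on the default language. The
function name, the language codes, and the paths are hypothetical. The
real script works on Git history rather than on bare path strings.

```python
def propagate_rename(old_path, new_path, default_lang, extra_langs):
    """Map a rename of the default language's PO file to the other languages.

    Returns a list of (old, new) path pairs, one per additional language.
    """
    suffix = f".{default_lang}.po"
    if not (old_path.endswith(suffix) and new_path.endswith(suffix)):
        raise ValueError("both paths must be PO files of the default language")
    old_stem = old_path[: -len(suffix)]
    new_stem = new_path[: -len(suffix)]
    return [(f"{old_stem}.{lang}.po", f"{new_stem}.{lang}.po") for lang in extra_langs]


renames = propagate_rename(
    "wiki/src/about.de.po",
    "wiki/src/doc/about.de.po",
    default_lang="de",
    extra_langs=["fa", "pt"],
)
print(renames)
```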

With `python-git`, creating a diff of the working directory against the index
is very error-prone, whereas a diff between two commits works fine. That's why
the described script always creates a new commit, even though those commits often
don't change the content of any file. To avoid these unnecessary empty
commits, our script also detects when a `fast-forward` is possible (the master
branch is updated to the HEAD of either the canonical or the integration branch).
If only Weblate, or only the canonical repository, introduced
new commits and the merge commit is empty, a fast-forward is done by a
forced reset to the desired HEAD.
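
The fast-forward condition can be illustrated with a toy commit graph.
This sketch does not use `python-git` and is not taken from our script:
the graph is a plain dictionary mapping each commit to its parents, and
all names are hypothetical. It only shows the underlying rule that a
fast-forward is possible when one head is an ancestor of the other.

```python
def is_ancestor(parents, ancestor, commit):
    """True if `ancestor` is reachable from `commit` via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return False


def fast_forward_target(parents, local_head, remote_head):
    """Return the head to reset to if a fast-forward is possible, else None."""
    if is_ancestor(parents, local_head, remote_head):
        return remote_head  # only the remote side has new commits
    if is_ancestor(parents, remote_head, local_head):
        return local_head   # only the local side has new commits
    return None             # histories diverged: a real merge is needed


# A <- B <- C and A <- B <- D: C and D have diverged from B.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
print(fast_forward_target(parents, "B", "C"))
print(fast_forward_target(parents, "C", "D"))
```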

### Step 2: Trigger commits

Weblate tries to minimize the number of commits (so-called "lazy
commits"), so we need to explicitly ask Weblate to commit every component
that has had outstanding changes for more than 24 hours.

This is done by triggering Weblate to commit pending approved
translations using the internal command ([`manage.py commit_pending`](https://docs.weblate.org/en/weblate-2.20/admin/management.html#commit-pending)).
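
The 24-hour rule can be illustrated as follows. This is only a sketch of
the selection logic, not Weblate's code: the function name, the dictionary
of timestamps, and the component names are hypothetical.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=24)


def components_to_commit(pending_since, now):
    """Return names of components whose oldest pending change exceeds MAX_AGE."""
    return sorted(
        name for name, since in pending_since.items() if now - since > MAX_AGE
    )


now = datetime(2019, 11, 1, 12, 0)
pending = {
    "wiki/src/support.*.po": datetime(2019, 10, 30, 9, 0),  # 51 hours old
    "wiki/src/about.*.po": datetime(2019, 11, 1, 8, 0),     # 4 hours old
}
due = components_to_commit(pending, now)
print(due)
```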

### Step 3: Weblate → Integration

**Merging changes from Weblate's Git repository into the integration
repository**

The script fetches from the Weblate (remote) Git repository and tries to
merge changes into the (local) integration repository. The merge
strategy used for this step is defined in [`merge_weblate_changes.py`](https://git-tails.immerda.ch/puppet-tails/tree/files/weblate/scripts/merge_weblate_changes.py).

Changes already present in the integration repository are preferred over
the changes from the remote, Weblate repository. This makes fixes
done to PO files manually, via the canonical Git repository, stick and propagate
to Weblate.

Again, PO file merges are done on translation units (`msgids`).

Furthermore, we make sure via the script that Weblate has only modified
PO files; indeed we automatically reset everything else to the version
that exists in canonical.

### Step 4: Integration → Canonical

**Pushing from the integration repository to our canonical repository,
aka "production"**

After updating the Integration repository, we push the changes back to
Canonical aka puppet-git.lizard. After this, the Canonical repository has
everything integrated from Weblate.

On the side of the canonical Git repository, a Gitolite hook
([`tails-weblate-update.hook`](https://git-tails.immerda.ch/puppet-tails/tree/files/gitolite/hooks/tails-weblate-update.hook))
makes sure that Weblate only pushes changes on PO files.
This hook also checks and verifies the committer of each commit, to make
sure only translations made on the Weblate platform are automatically
pushed. Otherwise
the push is rejected, for security reasons.
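
The two checks performed by the hook can be sketched like this. This is
an illustration only, not the hook's actual code: the `Commit` class, the
function name, and the committer address are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical identity, used for illustration only.
EXPECTED_COMMITTER = "weblate@example.org"


@dataclass
class Commit:
    committer_email: str
    changed_paths: list


def rejection_reason(commits):
    """Return why a push must be rejected, or None if it is acceptable."""
    for commit in commits:
        if commit.committer_email != EXPECTED_COMMITTER:
            return f"unexpected committer: {commit.committer_email}"
        for path in commit.changed_paths:
            if not path.endswith(".po"):
                return f"non-PO file changed: {path}"
    return None


good = [Commit("weblate@example.org", ["wiki/src/support.de.po"])]
bad = [Commit("weblate@example.org", ["wiki/src/support.mdwn"])]
print(rejection_reason(good))
print(rejection_reason(bad))
```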

### Step 5: Canonical → Weblate

**Integrating the changes made in the Canonical Git repository into
the Weblate repository**

After having merged changes from the canonical Git repository into the
integration Git repository, and integrated changes from Weblate there,
we can assume that every PO file is now up-to-date, both in the Integration
and Canonical repositories. Hence we can try to pull from the Canonical
repository using a fast-forward only merge (`git pull --ff-only`). However, the
canonical and Weblate repositories may receive new commits at any time,
including while our cronjob is running. A
new commit on either side (canonical or Weblate) prevents a
`fast-forward`. When this happens, the cronjob runs again 5 minutes later,
and steps 1, 3 and 4 of that run aim to fix whatever prevented the
fast-forward.

If the fast-forward merge was successful, we need to update Weblate's components
to reflect the modifications that happened in Git, such as
string and file updates, removals, renames, or additions. This is
handled by another script:
[`update_weblate_components.py`](https://git-tails.immerda.ch/puppet-tails/tree/files/weblate/scripts/update_weblate_components.py).

Besides our scripts that modify the Weblate repository, Weblate itself
keeps creating commits and updates the master branch. That's why the
script is using a dedicated Git remote named `cron` to keep track of which
commits need to be looked at for Weblate component changes. This remote
name is set in
[weblate.pp](https://git-tails.immerda.ch/puppet-tails/tree/manifests/weblate.pp)
and used in the cronjob like this:

	update_weblate_components.py --remoteBranch=cron/master [...]

### Step 6: Update the search index

Run [`manage.py update_index`](https://docs.weblate.org/en/weblate-2.20/admin/management.html#update-index).
This updates Weblate's index for fulltext search.
Weblate upstream authors recommend running it every 5 minutes.

<a id="staging-website"></a>

Staging website
---------------

### Goals

In order to allow translators to see their uncommitted suggestions as
well as languages which are not activated on <https://tails.boum.org>, we
have put in place a [staging website](https://staging.tails.boum.org/).
It is a clone of our production website and is regularly refreshed.

On top of what our production website has, it includes:

 - all languages available on Weblate, even those that are not enabled
   on our production website yet;

 - all translation suggestions made on Weblate.

This allows:

 - translators to check what the result of their work will look like
   on our website;

 - reviewers to check what translation suggestions look like on the
   website, before validating them.

### What is done behind the scenes to generate a new version of the staging website?

The
[`update-staging-website.sh`](https://git-tails.immerda.ch/puppet-tails/tree/files/weblate/scripts/update-staging-website.sh)
cronjob is run.

This cronjob calls a script that extracts suggestions from Weblate's
database and applies them to a local clone of Weblate's Git repository,
after having updated this clone:
[`save-suggestions.py`](https://git-tails.immerda.ch/puppet-tails/tree/files/weblate/scripts/save-suggestions.py)

After that, we run `ikiwiki --refresh` using a dedicated `ikiwiki.setup`
file for the staging website.

None of the changes on this repository clone are fed back anywhere, nor
should they be.

### Sanity checks

We automatically perform some sanity checks on this staging website.
The last report of these checks is published on
<https://staging.tails.boum.org/last-sanity-errors.txt>.

Machine translation
-------------------

Machine translation is important because it saves time for the translators,
especially in cumbersome documents, and helps us to be consistent not only
within our translations but also, for example, with the Debian locales if we
feed them to the tmserver.

It is a very subtle way of increasing the quality of our translations.

It should give suggestions when one is translating, under the translation
window, in the _Machine translation_ tab.

We use tmserver for machine translation ([upstream documentation](https://docs.weblate.org/en/weblate-2.20/admin/machine.html#tmserver)).

In order to update the suggestions, we run
[`update_tm.sh`](https://git-tails.immerda.ch/puppet-tails/tree/templates/weblate/update_tm.sh.erb) via cronjob every month.

The tmserver can be queried like this [(see
`tmserver.service`)](https://git-tails.immerda.ch/puppet-tails/tree/manifests/weblate.pp):

	http://localhost:8080/tmserver/en/de/unit/contribute
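
When querying for strings that contain spaces or other special characters,
the text has to be percent-encoded. The following sketch builds such a URL
following the pattern of the example above. The function name is
hypothetical, and the base URL is only valid on the host where the
tmserver is running.

```python
from urllib.parse import quote

# Base URL taken from the example above.
TMSERVER_BASE = "http://localhost:8080/tmserver"


def tm_query_url(source_lang, target_lang, text):
    """Build a tmserver lookup URL, percent-encoding the queried text."""
    encoded = quote(text, safe="")
    return f"{TMSERVER_BASE}/{source_lang}/{target_lang}/unit/{encoded}"


print(tm_query_url("en", "de", "contribute"))
print(tm_query_url("en", "de", "Tails documentation"))
```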

<a id="access-control"></a>

Access control on Weblate
=========================

Requirements
------------

- Every translation change must be reviewed by another person before
  it's validated (and thus committed by Weblate and pushed to our
  production website).

  - This requirement must be enforced via technical means, for
    translators that are not particularly trusted (e.g. new user
    accounts). For example, it must be impossible for an attacker to
    pretend to be that second person and validate their own changes,
    simply by creating a second user account.

  - It's acceptable that this requirement is enforced only via social
    rules, and not via technical means, for a set of
    trusted translators.

- We need to be able to bootstrap a new language and give its
  translators sufficient access rights so that they can do their job,
  even without anyone at Tails personally knowing any of them.

- Suggested translations are used to build the [[staging
  website|translation_platform#staging-website]].

Currently implemented proposal
------------------------------

- In Weblate lingo, we use the [dedicated
  reviewers](https://docs.weblate.org/en/latest/workflows.html#dedicated-reviewers)
  workflow: it's the only one that protects us against an adversary
  who's ready to create multiple user accounts.

- When not logged in, a visitor is in the `Guests` group and is
  only allowed to suggest translations.

- Every logged in user is in the `Users` group. Members of this group
  are allowed to suggest translations but not to accept suggestions
  nor to directly save new translations of their own. They can also
  vote on suggestions.

- A reviewer, i.e. a member of the `@Review` group in Weblate, is
  allowed to accept suggestions.

  Limitations:

  - Technically, reviewers are also allowed to directly save new
    translations of their own, edit existing translations, and
    accept their own suggestions; we ask them in our
    documentation to use this privilege sparingly, only to fix
    important and obvious problems.

    Even if we forbid reviewers to accept their own suggestions,
    nothing would prevent them from creating another account, making
    the suggestion from there, and then accepting it with their
    reviewer account.

  - Reviewer status is global to our Weblate instance, and not
    per-language, so technically, a reviewer can very well accept
    suggestions for a language they don't speak. We ask them in
    our documentation to _not_ do that, except to fix important and
    obvious problems that don't require knowledge of that language
    (for example, broken syntax for ikiwiki directives).

    If this ever causes actual problems, this could be fixed with
    [group-based access
    control](https://docs.weblate.org/en/weblate-2.20/admin/access.html#groupacl).

- How one gets reviewer status:

  - We will port to Weblate semantics the pre-existing trust
    relationship we already have towards translation teams that have
    been using Git so far: they all become reviewers.

    To this aim, we have asked them to create an account on Weblate
    and tell us what their user name is.

  - One can request reviewer status from Weblate administrators, who
    will:
    1. Accept this request if, and only if, a sufficient amount of
       work was done by the requesting translator (this can be checked on
       the user's page, e.g.
       [intrigeri's](https://translate.tails.boum.org/user/intrigeri/)).
       In other words, we use proof-of-work to increase the cost of attacks.
    2. Let <tails-l10n@boum.org> and all the other Weblate reviewers
       know about this status change.

- Bootstrapping a new language

  As a result of this access control setup, translators for a new
  language can only make suggestions until they have done a sufficient
  amount of work and two of them are granted reviewer status. In the
  meantime, they can see the output of their work on the [[staging
  website|contribute/design/translation_platform#staging-website]].

  Pending questions:

  - Is the resulting UX good enough? Does the ability to vote up
    suggestions help sufficiently?
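
The group permissions described in this section can be summarized in a
small lookup table. This is a simplified model for illustration, not
Weblate's permission system: the action names and the function are
hypothetical. A reviewer is modelled as a member of both `Users` and
`@Review`, since every logged-in user is in `Users`.

```python
# Hypothetical action names; group names are the ones used above.
GROUP_PERMISSIONS = {
    "Guests": {"suggest"},
    "Users": {"suggest", "vote"},
    "@Review": {"accept", "save", "edit"},
}


def allowed(groups, action):
    """True if membership in any of `groups` grants `action`."""
    return any(action in GROUP_PERMISSIONS[group] for group in groups)


print(allowed(["Guests"], "suggest"))
print(allowed(["Users"], "accept"))
print(allowed(["Users", "@Review"], "accept"))
```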

Maintenance
===========

A plan for the future maintenance of our Weblate instance will be
worked on in November 2019 and laid out before the end of the year
([[!tails_ticket 17050]]):

 - [[maintainers' role definition|contribute/working_together/roles/translation_platform]]
 - [[operations documentation|contribute/working_together/roles/translation_platform/operations]]

See also
========

 - [[specification|contribute/design/translation_platform/specification]]
 - [[documentation for translators|contribute/how/translate/with_translation_platform]]
 - [[blueprint for future work|blueprint/translation_platform]]