Diffstat (limited to 'wiki/src/contribute/design/translation_platform.mdwn')
-rw-r--r-- wiki/src/contribute/design/translation_platform.mdwn | 491
1 files changed, 491 insertions, 0 deletions
diff --git a/wiki/src/contribute/design/translation_platform.mdwn b/wiki/src/contribute/design/translation_platform.mdwn
new file mode 100644
index 0000000..a8c694b
--- /dev/null
+++ b/wiki/src/contribute/design/translation_platform.mdwn
@@ -0,0 +1,491 @@
+[[!meta title="Translation platform"]]
+
+Until 2019, our (website) translation infrastructure required
+translators to [[know how to use
+Git|contribute/how/translate/with_Git]]. This was a high entry
+barrier for new translators, especially those who are not familiar with
+Git or the command line.
+
+This is the technical design documentation of our new setup.
+It is by no means perfect. We track known issues via
+[tickets on Redmine](https://redmine.tails.boum.org/code/projects/tails/issues?query_id=321).
+
+[[!toc levels=2]]
+
+Terminology used in this document
+=================================
+
+- Canonical Git repository: the [[main Tails Git
+ repository|contribute/git#main-repo]] that our
+ website is built from, in scripts often called "main repository" or "main
+ Git"
+- Production server: the server that hosts our website
+- translate.lizard: the VM that hosts our Weblate web interface, the
+ corresponding Git repositories, as well as the [staging website](https://staging.tails.boum.org/).
+
+Setup and integration with our infrastructure
+=============================================
+
+We are using our own [Weblate instance](https://translate.tails.boum.org/).
+
+Weblate uses a clone of the Tails main Git repository, to which
+translations get committed and pushed once they have been approved by a user with
+reviewer status. Non-approved translations live on Weblate's database
+only, until they get reviewed. A [staging website](https://staging.tails.boum.org/) allows translators to
+preview non-reviewed translations in context.
+
+Approved changes are automatically fed back into our canonical Git
+repository. This presents a major challenge, because we need to ensure
+that:
+
+- No merge conflict occurs:
+
+ - such conflicts often occur in PO file headers which prevents Weblate
+ from automatically merging changes
+ - many contributors work on the same code base using different tools
+ (PO files can be edited by hand, using translation software such as
+ Poedit, or they are generated by ikiwiki itself, which results in
+ different formatting)
+
+- Only PO files are committed.
+
+- The committed PO files comply with shared formatting standards.
+
+- No compromised code is introduced.
+
+In order to integrate Weblate and the work done by translators into our
+process, we have set up this scheme:
+
+[[!img "lib/design/git_repository_details.svg" link="no"]]
+
+Website and Weblate
+-------------------
+
+Our website uses ikiwiki and its [PO plugin](https://ikiwiki.info/plugins/po/).
+
+It uses markdown files for
+the English original language and carries a PO file for each translated
+language. Thereby we distinguish languages that are activated on our
+website from languages that have translations but are not yet activated
+on the website because they do not [[cover enough of
+our core pages|contribute/how/translate/team/new/]] to be considered
+usable.
+
+We have defined [[a list of tier-1
+languages|contribute/how/translate#tier-1-languages]] that we consider
+to be of importance to our user base. No further languages shall be
+activated in Weblate, because our main Git repository carries the
+reviewed, and thus approved, translations of all languages enabled on
+the Weblate platform, while only some of them are active on the website.
+
+Each set of PO files that translate the same page corresponds to a
+single component in Weblate, so that it appears in the Weblate
+interface. For example, the component:
+
+    wiki/src/support.*.po
+
+relates to the files `support.mdwn`, `support.es.po`, `support.de.po`, `support.pot`,
+etc.
+
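To make the mapping concrete, here is a minimal Python sketch. It assumes that the component mask behaves like a shell-style glob, which may not match Weblate's own matching rules exactly, and the helper name `files_for_component` is hypothetical.

```python
from fnmatch import fnmatchcase


def files_for_component(mask, paths):
    """Return the paths selected by a component file mask, sorted."""
    return sorted(p for p in paths if fnmatchcase(p, mask))


paths = [
    "wiki/src/support.mdwn",
    "wiki/src/support.pot",
    "wiki/src/support.es.po",
    "wiki/src/support.de.po",
    "wiki/src/about.de.po",
]

# Only the per-language PO files of the "support" page are selected.
matched = files_for_component("wiki/src/support.*.po", paths)
```

Under this assumption the mask itself selects only the PO files; `support.mdwn` and `support.pot` belong to the same component because they share its base name.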
+Repositories
+------------
+
+The repository used by Weblate is cloned and updated from the master
+branch of the Tails main
+repository. Changes generated on Weblate's copy
+of the Tails main Git repository, located on the VM which hosts the
+Weblate platform, are automatically fed back to the master branch of
+the Tails main repository. This happens through a number of scripts,
+checks, and cronjobs that we'll describe below.
+
+There are several languages enabled, some of them with few or no
+translations. As everything is fed back to the Tails canonical
+repository, all files are available when cloning this repository:
+
+    git clone https://git-tails.immerda.ch/tails
+
+If needed, in exceptional cases, Weblate's Git repository can be cloned
+or added as a remote:
+
+    git clone https://translate.tails.boum.org/git/tails/index/
+
+On the server, this repository is located at:
+
+    ~weblate/repositories/vcs/tails/index
+
+Weblate can commit to its local repository at any time, whenever
+translations get approved. Changes done in the canonical repository by
+Tails contributors via Git and changes done in Weblate thus need to be
+merged — in a safe place. This happens in an integration repository:
+
+    ~weblate/repositories/integration
+
+On the VM (translate.lizard), a third repository is used for the staging
+website:
+
+    ~weblate/repositories/vcs/staging
+
+Automatic merging and pushing
+-----------------------------
+
+The integration of changes from the different repositories is done by a
+script which is executed on the VM hosting Weblate as [a cronjob](https://git-tails.immerda.ch/puppet-tails/tree/manifests/weblate.pp). The
+[`cron.sh`](https://git-tails.immerda.ch/puppet-tails/tree/files/weblate/scripts/cron.sh) script
+has the following steps which we will explain below:
+
+ 1. Canonical → Integration:
+ Update the integration repository with changes made on the
+ canonical repository (called "main" in the script).
+ 2. Make Weblate locally commit any pending approved translation
+ 3. Weblate → Integration:
+ Integrate committed changes from Weblate into the integration repository
+ 4. Integration → Canonical:
+ Push the up-to-date integration repository to the canonical repository.
+ 5. Canonical → Weblate:
+ Pull from the canonical repository and update the Weblate components.
+ 6. Update Weblate's index for fulltext search
+
+Whenever a contributor modifies a markdown (`*.mdwn`) file and pushes
+to master, the corresponding PO files are updated, that is: the
+translatable English strings within those files are updated. This
+update happens:
+
+ - on the production server itself, when [[building the
+ wiki|contribute/build/website]];
+ - only for languages that are enabled on the production website.
+
+On the translation platform server, we need to ensure that PO files for
+additional languages (that are enabled on Weblate but not on the
+production website) are equally updated, committed locally, and pushed to
+the canonical Git repository. On top of this we need to update Weblate's
+database accordingly, so that translatable strings can be updated for new or
+modified English strings in those files, in all languages.
+
+### Step 1: Canonical → Integration
+
+**Update the integration repository with changes made on the canonical
+repository**
+
+The script fetches from the canonical (remote) repository and tries to
+merge changes into the (local) integration repository. The merge
+strategy used for this step is defined in [`update_weblate_git.py`](https://git-tails.immerda.ch/puppet-tails/tree/files/weblate/scripts/update_weblate_git.py):
+
+When this script is executed, it merges changes in PO files based on
+single translation units (`msgids`). A merge conflict occurs when the same
+translation unit has been changed both in the canonical and the integration
+repository (in the latter case, this would mean that the change has been
+done via Weblate). In such a case, we always prefer the canonical
+version. This makes sure that Tails developers can fix issues in
+translations and have priority over Weblate.
+
+Due to this procedure we never end up with broken PO files. However, we
+may lose a translation done on Weblate.
+
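The per-unit merge rule can be sketched in Python as follows. This is not the code of `update_weblate_git.py`: it is a simplified illustration that assumes each PO file has already been reduced to a plain `msgid -> msgstr` dictionary, and the function name `merge_units` is hypothetical.

```python
def merge_units(base, canonical, weblate):
    """Three-way merge of msgid -> msgstr maps; canonical wins conflicts.

    Returns the merged map and the msgids whose Weblate change was dropped.
    """
    merged = {}
    discarded = []
    for msgid in sorted(set(base) | set(canonical) | set(weblate)):
        b, c, w = base.get(msgid), canonical.get(msgid), weblate.get(msgid)
        if c == w or w == b:
            # Both sides agree, or only canonical changed the unit.
            result = c
        elif c == b:
            # Only Weblate changed the unit.
            result = w
        else:
            # Both sides changed the unit differently: prefer canonical.
            result = c
            discarded.append(msgid)
        if result is not None:
            merged[msgid] = result
    return merged, discarded


base = {"Hello": "Hallo", "Bye": "Ciao", "Thanks": ""}
canonical = {"Hello": "Hallo!", "Bye": "Ciao", "Thanks": "Danke"}
weblate = {"Hello": "Guten Tag", "Bye": "Servus", "Thanks": ""}

merged, discarded = merge_units(base, canonical, weblate)
```

In this example the Weblate translation of "Hello" is the one that gets lost, because the same unit was also changed in the canonical repository.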
+Up to this point, only PO files of languages that are activated on our
+production website will be merged, as the production website
+does not refresh PO files for languages that are not activated there,
+so these PO files are outdated in the canonical Git repository at this point.
+
+Because of this limitation of ikiwiki, once the activated language PO
+files are merged, the script checks if PO files of other languages, that are not
+activated in production, need updating. We do this by
+generating POT files out of a PO file that we've previously defined as the
+default language. We do this for all components. If the actual POT
+file, generated on the production server, differs from the POT file we've
+just created, then every additional language PO file needs to be
+updated.
+
+On top of this, if the PO file of the default language (that is, its
+Markdown file) has been renamed, moved, or deleted, then the PO files of
+additional languages need to be accordingly renamed, moved, or deleted.
+
+In summary, our script applies all changes detected on the default
+language to the additional languages.
+
+With `python-git`, creating a diff of the working directory against the index
+is very error-prone, but a diff between two commits works fine. That's why we
+always create a new commit within the described script, even though those commits
+often don't change the content of any file. In order to omit these unnecessary
+empty commits, our script also detects when a `fast-forward` is possible (the master
+branch is updated to HEAD of either the canonical or the integration branch).
+If only Weblate, or only the canonical repository, introduced
+new commits and the merge commit is empty, a fast-forward can be done by a
+force reset to the desired HEAD.
+
+### Step 2: Trigger commits
+
+Weblate tries to minimize the number of commits (aka. "lazy
+commits"), so we need to explicitly ask Weblate to commit every component
+that has had outstanding changes for more than 24 hours.
+
+This is done by triggering Weblate to commit pending approved
+translations using the internal command ([`manage.py commit_pending`](https://docs.weblate.org/en/weblate-2.20/admin/management.html#commit-pending)).
+
+### Step 3: Weblate → Integration
+
+**Merging changes from Weblate's Git repository into the integration
+repository**
+
+The script fetches from the Weblate (remote) Git repository and tries to
+merge changes into the (local) integration repository. The merge
+strategy used for this step is defined in [`merge_weblate_changes.py`](https://git-tails.immerda.ch/puppet-tails/tree/files/weblate/scripts/merge_weblate_changes.py).
+
+Changes already present in the integration repository are preferred over
+the changes from the remote, Weblate repository. This makes fixes
+done to PO files manually, via the canonical Git repository, stick and propagate
+to Weblate.
+
+Again, PO file merges are done on translation units (`msgids`).
+
+Furthermore, we make sure via the script that Weblate has only modified
+PO files; indeed we automatically reset everything else to the version
+that exists in canonical.
+
+### Step 4: Integration → Canonical
+
+**Pushing from the integration repository to our canonical repository,
+aka "production"**
+
+After updating the Integration repository, we push the changes back to
+Canonical aka puppet-git.lizard. After this, the Canonical repository has
+everything integrated from Weblate.
+
+On the side of the canonical Git repository, a Gitolite hook
+([`tails-weblate-update.hook`](https://git-tails.immerda.ch/puppet-tails/tree/files/gitolite/hooks/tails-weblate-update.hook))
+makes sure that Weblate only pushes changes on PO files.
+This hook also checks and verifies the committer of each commit, to make
+sure only translations made on the Weblate platform are automatically
+pushed. Otherwise
+the push is rejected, for security reasons.
+
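The two checks performed by the hook can be illustrated with a short Python sketch. It is not the actual `tails-weblate-update.hook`: commits are modeled as plain dictionaries instead of being read from Git, and both the committer identity and the function name `push_is_acceptable` are hypothetical.

```python
# Hypothetical committer identity used by the translation platform.
WEBLATE_COMMITTER = "Weblate <weblate@example.org>"


def push_is_acceptable(commits, allowed_committer=WEBLATE_COMMITTER):
    """Accept a push only if every commit comes from the expected
    committer and touches nothing but PO files."""
    for commit in commits:
        if commit["committer"] != allowed_committer:
            return False
        if any(not path.endswith(".po") for path in commit["paths"]):
            return False
    return True


good = [{"committer": WEBLATE_COMMITTER, "paths": ["wiki/src/support.de.po"]}]
bad_path = [{"committer": WEBLATE_COMMITTER, "paths": ["wiki/src/support.mdwn"]}]
bad_committer = [
    {"committer": "Someone <someone@example.org>", "paths": ["wiki/src/support.de.po"]}
]
```

A single offending commit is enough to reject the whole push, which matches the all-or-nothing behaviour described above.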
+### Step 5: Canonical → Weblate
+
+**Integrating the changes made in the Canonical Git repository into
+the Weblate repository**
+
+After having merged changes from the canonical Git repository into the
+integration Git repository, and integrated changes from Weblate there,
+we can assume that every PO file is now up-to-date, both in the Integration
+and Canonical repositories. Hence we can try to pull from the Canonical
+repository using a fast-forward only merge (`git pull --ff-only`). The
+canonical and Weblate repositories may see new commits anytime, including
+while our cronjob is running. Such a new commit on one side (canonical
+or Weblate) prevents a `fast-forward`. When this happens, the cronjob runs
+again 5 minutes later, and steps 1, 3 and 4 of that run aim at resolving
+whatever prevented the fast-forward this time.
+
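Why a concurrent commit blocks the fast-forward can be shown with a toy commit graph in Python. A fast-forward is only possible when the local HEAD is an ancestor of the remote HEAD; the graph representation and the function names here are illustrative assumptions, not part of our scripts.

```python
def is_ancestor(parents, ancestor, descendant):
    """True if `ancestor` is reachable from `descendant` via parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return False


def can_fast_forward(parents, local_head, remote_head):
    """A fast-forward only moves the local branch forward along history."""
    return is_ancestor(parents, local_head, remote_head)


# Toy history: A <- B <- C (canonical), and W is a commit that Weblate
# created on top of B while the cronjob was running.
parents = {"A": [], "B": ["A"], "C": ["B"], "W": ["B"]}

weblate_idle = can_fast_forward(parents, "B", "C")
weblate_committed = can_fast_forward(parents, "W", "C")
```

In the second case the Weblate repository has diverged, so `git pull --ff-only` would refuse to merge and the next cronjob run has to integrate `W` first.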
+If the fast-forward merge was successful, we need to update Weblate's components
+to reflect the modifications that happened in Git, such as
+string and file updates, removals, renames, or additions. This is
+handled by another script:
+[`update_weblate_components.py`](https://git-tails.immerda.ch/puppet-tails/tree/files/weblate/scripts/update_weblate_components.py).
+
+Besides our scripts that modify the Weblate repository, Weblate itself
+keeps creating commits and updates the master branch. That's why the
+script is using a dedicated Git remote named `cron` to keep track of which
+commits need to be looked at for Weblate component changes. This remote
+name is set in
+[weblate.pp](https://git-tails.immerda.ch/puppet-tails/tree/manifests/weblate.pp)
+and used in the cronjob like this:
+
+    update_weblate_components.py --remoteBranch=cron/master [...]
+
+### Step 6: Update Weblate's fulltext search index
+
+Run [`manage.py update_index`](https://docs.weblate.org/en/weblate-2.20/admin/management.html#update-index).
+This updates Weblate's index for fulltext search.
+Weblate upstream authors recommend running it every 5 minutes.
+
+<a id="staging-website"></a>
+
+Staging website
+---------------
+
+### Goals
+
+In order to allow translators to see their uncommitted suggestions as
+well as languages which are not activated on <https://tails.boum.org>, we
+have put in place a [staging website](https://staging.tails.boum.org/).
+It is a clone of our production website and is regularly refreshed.
+
+On top of what our production website has, it includes:
+
+ - all languages available on Weblate, even those that are not enabled
+ on our production website yet;
+
+ - all translation suggestions made on Weblate.
+
+This allows:
+
+ - translators to check what the result of their work will look like
+ on our website;
+
+ - reviewers to check what translation suggestions look like on the
+ website, before validating them.
+
+### What is done behind the scenes to generate a new version of the staging website?
+
+The
+[`update-staging-website.sh`](https://git-tails.immerda.ch/puppet-tails/tree/files/weblate/scripts/update-staging-website.sh)
+cronjob is run.
+
+This cronjob calls a script that extracts suggestions from Weblate's
+database and applies them to a local clone of Weblate's Git repository,
+after having updated this clone:
+[`save-suggestions.py`](https://git-tails.immerda.ch/puppet-tails/tree/files/weblate/scripts/save-suggestions.py)
+
+After that we run `ikiwiki --refresh` using a dedicated `ikiwiki.setup`
+file for the staging website.
+
+None of the changes on this repository clone are fed back anywhere, and they
+should not be.
+
+### Sanity checks
+
+We automatically perform some sanity checks on this staging website.
+The last report of these checks is published on
+<https://staging.tails.boum.org/last-sanity-errors.txt>.
+
+Machine translation
+-------------------
+
+Machine translation is important because it saves translators time,
+especially in cumbersome documents, and helps us to be consistent not
+only with our translations but also, for example, with the Debian
+locales if we feed them to the tmserver.
+
+It is a very subtle way of increasing the quality of our translations.
+
+It should give suggestions when one is translating, under the translation
+window, in the _Machine translation_ tab.
+
+We use tmserver for machine translation ([upstream documentation](https://docs.weblate.org/en/weblate-2.20/admin/machine.html#tmserver)).
+
+In order to update the suggestions we run
+[`update_tm.sh`](https://git-tails.immerda.ch/puppet-tails/tree/templates/weblate/update_tm.sh.erb) via cronjob every month.
+
+The tmserver can be queried like this [(see
+`tmserver.service`)](https://git-tails.immerda.ch/puppet-tails/tree/manifests/weblate.pp):
+
+    http://localhost:8080/tmserver/en/de/unit/contribute
+
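The following Python sketch builds such a query URL. The URL layout is taken from the example above; the percent-encoding of the source text and the helper name `tmserver_unit_url` are assumptions, and the sketch only constructs the URL without contacting the server.

```python
from urllib.parse import quote


def tmserver_unit_url(source_lang, target_lang, text,
                      base="http://localhost:8080/tmserver"):
    """Build a tmserver lookup URL for one source string."""
    # Percent-encode the text so that spaces and slashes stay in one segment.
    return f"{base}/{source_lang}/{target_lang}/unit/{quote(text, safe='')}"


simple = tmserver_unit_url("en", "de", "contribute")
with_space = tmserver_unit_url("en", "de", "translation platform")
```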
+<a id="access-control"></a>
+
+Access control on Weblate
+=========================
+
+## Requirements
+
+- Every translation change must be reviewed by another person before
+ it's validated (and thus committed by Weblate and pushed to our
+ production website).
+
+ - This requirement must be enforced via technical means, for
+ translators that are not particularly trusted (e.g. new user
+ accounts). For example, it must be impossible for an attacker to
+ pretend to be that second person and validate their own changes,
+ simply by creating a second user account.
+
+ - It's acceptable that this requirement is enforced only via social
+ rules, and not via technical means, for a set of
+ trusted translators.
+
+- We need to be able to bootstrap a new language and give its
+ translators sufficient access rights so that they can do their job,
+ even without anyone at Tails personally knowing any of them.
+
+- Suggested translations are used to build the [[staging
+ website|translation_platform#staging-website]].
+
+Currently implemented proposal
+------------------------------
+
+- In Weblate lingo, we use the [dedicated
+ reviewers](https://docs.weblate.org/en/latest/workflows.html#dedicated-reviewers)
+ workflow: it's the only one that protects us against an adversary
+ who's ready to create multiple user accounts.
+
+- When not logged in, a visitor is in the `Guests` group and is
+ only allowed to suggest translations.
+
+- Every logged in user is in the `Users` group. Members of this group
+ are allowed to suggest translations but not to accept suggestions
+ nor to directly save new translations of their own. They can also
+ vote on suggestions.
+
+- A reviewer, i.e. a member of the `@Review` group in Weblate, is
+ allowed to accept suggestions.
+
+ Limitations:
+
+ - Technically, reviewers are also allowed to directly save new
+ translations of their own, edit existing translations, and
+ accept their own suggestions; we ask them in our
+ documentation to use this privilege sparingly, only to fix
+ important and obvious problems.
+
+ Even if we forbid reviewers to accept their own suggestions,
+ nothing would prevent them from creating another account, making
+ the suggestion from there, and then accepting it with their
+ reviewer account.
+
+ - Reviewer status is global to our Weblate instance, and not
+ per-language, so technically, a reviewer can very well accept
+ suggestions for a language they don't speak. We will ask them in
+ our documentation to _not_ do that, except to fix important and
+ obvious problems that don't require knowledge of that language
+ (for example, broken syntax for ikiwiki directives).
+
+ If this ever causes actual problems, this could be fixed with
+ [group-based access
+ control](https://docs.weblate.org/en/weblate-2.20/admin/access.html#groupacl).
+
+- How one gets reviewer status:
+
+ - We will port to Weblate semantics the pre-existing trust
+ relationship we already have towards translation teams that have
+ been using Git so far: they all become reviewers.
+
+ To this aim, we have asked them to create an account on Weblate
+ and tell us what their user name is.
+
+ - One can request reviewer status from Weblate administrators, who
+ will:
+ 1. Accept this request if, and only if, a sufficient amount of
+ work was done by the requesting translator (this can be checked on
+ the user's page, e.g.
+ [intrigeri's](https://translate.tails.boum.org/user/intrigeri/)).
+ In other words, we use proof-of-work to increase the cost of attacks.
+ 2. Let <tails-l10n@boum.org> and all the other Weblate reviewers
+ know about this status change.
+
+- Bootstrapping a new language
+
+ As a result of this access control setup, translators for a new
+ language can only make suggestions until they have done a sufficient
+ amount of work and two of them are granted reviewer status. In the
+ meantime, they can see the output of their work on the [[staging
+ website|contribute/design/translation_platform#staging-website]].
+
+ Pending questions:
+
+ - Is the resulting UX good enough? Does the ability to vote up
+ suggestions help sufficiently?
+
+Maintenance
+===========
+
+A plan for the future maintenance of our Weblate instance will be
+worked on in November 2019 and laid out before the end of the year
+([[!tails_ticket 17050]]):
+
+ - [[maintainers' role definition|contribute/working_together/roles/translation_platform]]
+ - [[operations documentation|contribute/working_together/roles/translation_platform/operations]]
+
+See also
+========
+
+ - [[specification|contribute/design/translation_platform/specification]]
+ - [[documentation for translators|contribute/how/translate/with_translation_platform]]
+ - [[blueprint for future work|blueprint/translation_platform]]
+