Create Platform backend (OpenSearch and Clickhouse) and release data.
This application uses the Otter library to define the steps of the pipeline that create and push all the output artifacts for a run of the Open Targets pipeline.
Check out the config.yaml file to see the steps and the tasks that make them up.
A thin CLI wrapper has been built around the otter runner (which itself uses a CLI) in order to facilitate remote execution (by Terraform) as well as local execution.
- uv is the package manager for POS. It is compatible with pip, so you can fall back to pip if you feel more comfortable with it.
- terraform is the IaC tool by which the necessary infrastructure is assembled and destroyed.
All the configuration you need should be in the config.yaml. Additional configuration for running specific commands (e.g. terraform config) can be given as command line options.
A single folder holds all the configuration. It contains the following:
- Main config for otter: config.yaml - Platform config.
- PPP specific config for otter: config-ppp.yaml - PPP specific config. When the commands have the --product option, platform will use config.yaml and ppp will use config-ppp.yaml.
- Config for datasets, data sources/table names/settings etc. for ClickHouse, OpenSearch, BigQuery: datasets.yaml
- Clickhouse configs/schema/sql: clickhouse
- OpenSearch Dockerfile/index settings: opensearch
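The --product rule described in the list above can be pictured as a small lookup. This is a minimal sketch, not the POS implementation: the names `CONFIG_BY_PRODUCT` and `config_for_product` are hypothetical, and the `config_dir` argument is an assumption standing in for wherever the configuration folder lives.

```python
from pathlib import Path

# Hypothetical lookup table. Only the two product names and the two
# file names come from the README; everything else is illustrative.
CONFIG_BY_PRODUCT = {
    "platform": "config.yaml",
    "ppp": "config-ppp.yaml",
}


def config_for_product(product: str, config_dir: Path = Path(".")) -> Path:
    """Return the otter config file that a given --product value selects."""
    try:
        return config_dir / CONFIG_BY_PRODUCT[product]
    except KeyError:
        valid = ", ".join(sorted(CONFIG_BY_PRODUCT))
        raise ValueError(
            f"unknown product {product!r}; expected one of: {valid}"
        ) from None
```

Rejecting unknown values early mirrors the `[platform|ppp]` choice shown in the help text of the commands that accept --product.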
$ uv run pos --help
Usage: pos [OPTIONS] COMMAND [ARGS]...
Platform Output Support (POS) CLI
Options:
--help Show this message and exit.
Commands:
local* Run any POS step locally (default command).
backend Create platform backend using remote POS execution.
bigquery Populate BigQuery.
clean-remote Clean up remote POS resources after a remote run.
ftp-sync Release data to FTP.
gcs-sync Release data to GCS.
remote Run any POS step remotely on a machine defined by...
restore-clickhouse Restore ClickHouse from a backup.
restore-opensearch Restore OpenSearch from an OpenSearch snapshot.
tarballs Create platform tarballs using remote POS execution.
local is the default command, so if you run pos without any command, this will invoke the otter CLI:
$ uv run pos
Platform Output Support (POS) CLI
Usage: pos local [OPTIONS]
Run any POS step locally (default command).
Depending on the step and where you are running this, this may not work.
Options:
-c, --config-path PATH Path to configuration YAML file.
-s, --step TEXT Step to run. [required]
-w, --work-path PATH The local working path. This is where files
will be downloaded and the manifest and logs
will be written to.
-r, --release-uri TEXT If set, this URI will be used as the release
location. This is where files will be
uploaded and the manifest and logs will be
written to.If omitted, the run will be local
only.
-p, --pool-size INTEGER The number of worker proccesses that will be
spawned to run tasksin the step in parallel.
It should be similar to the number of
cores,but could be higher because there is a
lot of I/O blocking.
-l, --log-level [TRACE|DEBUG|INFO|WARNING|ERROR|CRITICAL]
Log level for the application.
--help Show this message and exit.
Create the platform data backend artifacts. Note that this neither creates tarballs nor releases any data; those are managed by other commands. Executes remotely.
$ uv run pos backend
Platform Output Support (POS) CLI
Usage: pos backend [OPTIONS]
Create platform backend using remote POS execution.
Use this to creates the following resources: - Google Disk snapshots for
ClickHouse and OpenSearch - OpenSearch snapshot in a remote GCS repository -
ClickHouse backup in a remote GCS bucket
Options:
--product [platform|ppp] Product to create backend for. [required]
-p, --pool-size INTEGER The number of worker proccesses that will be
spawned to run tasksin the step in parallel. It
should be similar to the number of cores,but could
be higher because there is a lot of I/O blocking.
--pos-branch TEXT The POS git branch to use for the remote run.
--tfvar <TEXT TEXT>... Terraform variable overrides as key-value pairs.
e.g., --tfvar key value
--tfvar-file PATH Path to a Terraform variable file.
--auto-approve Automatically approve Terraform actions without
prompting.
--terraform-dir PATH Path to the Terraform configuration directory.
--help Show this message and exit.
e.g. uv run pos backend --product platform will create the backend for platform.
After this has been run, be sure to clean up the remote infrastructure with clean-remote.
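One way to make sure the cleanup is not forgotten is to script the two commands together. The sketch below is an illustration under stated assumptions, not part of POS: the helper names (`suggested_pool_size`, `backend_command`, `backend_then_clean`) are hypothetical, the I/O factor of 2 is an arbitrary example value, and running clean-remote even after a failed backend run is a design assumption you may not want. Only options that appear in the help output above are used.

```python
import os
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], None]


def suggested_pool_size(io_factor: int = 2) -> int:
    """Number of cores times a small factor, since tasks often block on I/O.

    The default factor of 2 is an illustrative assumption, not a POS default.
    """
    return max(1, (os.cpu_count() or 1) * io_factor)


def backend_command(product: str, pool_size: Optional[int] = None,
                    auto_approve: bool = False) -> list[str]:
    """Build a `pos backend` invocation from options listed in its help text."""
    cmd = ["uv", "run", "pos", "backend", "--product", product]
    if pool_size is not None:
        cmd += ["--pool-size", str(pool_size)]
    if auto_approve:
        cmd.append("--auto-approve")
    return cmd


def _run(cmd: Sequence[str]) -> None:
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(list(cmd), check=True)


def backend_then_clean(product: str, run: Runner = _run, **options) -> None:
    """Run the backend command, then always attempt clean-remote afterwards."""
    try:
        run(backend_command(product, **options))
    finally:
        # Assumption: remote infrastructure should be removed even when
        # the backend run fails. Drop the try/finally to inspect failures.
        run(["uv", "run", "pos", "clean-remote"])
```

Calling `backend_then_clean("platform", pool_size=suggested_pool_size())` would run both commands in order. The `run` parameter exists so the sequencing can be checked without touching real infrastructure.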
e.g. uv run pos bigquery --instance prod
Use to release data to Google BigQuery (dev or prod). Executes locally.
e.g. uv run pos clean-remote
Clean up any remote infrastructure created by Terraform.
e.g. uv run pos ftp-sync
Release data to the FTP server. Executes locally. From there, it connects to the EBI compute cluster and runs a gcloud container to sync data.
e.g. uv run pos gcs-sync --product platform
Release data to Google Cloud Storage. Executes locally.
Available for platform or ppp.
e.g. uv run pos remote -s clickhouse
Run any step in the config remotely.
e.g. uv run pos tarballs --product platform --os-from-snapshot platform-2512-os --ch-from-snapshot platform-2512-ch
Create the tarballs for ClickHouse/OpenSearch data.
Requires specifying the Google disk snapshots that you wish to archive the data from. Executes remotely.
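Because both snapshot names are required, a wrapper script can refuse to build the command when either one is missing. This is a small sketch with a hypothetical helper name (`tarballs_command`). It only assembles the argument list from the options shown in the example above and does not check that the snapshots exist.

```python
def tarballs_command(product: str, os_snapshot: str, ch_snapshot: str) -> list[str]:
    """Build a `pos tarballs` invocation; both disk snapshot names are required."""
    if not os_snapshot or not ch_snapshot:
        raise ValueError(
            "both an OpenSearch and a ClickHouse snapshot name are required"
        )
    return [
        "uv", "run", "pos", "tarballs",
        "--product", product,
        "--os-from-snapshot", os_snapshot,
        "--ch-from-snapshot", ch_snapshot,
    ]
```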
e.g. uv run pos restore-clickhouse --product platform --target-instance production-clickhouse
Restore the specified ClickHouse instance (GCP arguments can be passed as options) from the backup matching the database namespace in the config.
e.g. uv run pos restore-opensearch --product platform --target-instance production-opensearch
Restore the specified OpenSearch instance (GCP arguments can be passed as options) from the backup matching the database namespace in the config.
Copyright 2014-2025 EMBL - European Bioinformatics Institute, Genentech, GSK, MSD, Pfizer, Sanofi and Wellcome Sanger Institute
This software was developed as part of the Open Targets project. For more information please see: http://www.opentargets.org
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.