Takuan is a REST API data service for transcriptomics data.
It is intended to ingest, organize and query data from transcriptomics experiments through an API.
Takuan stores its data in a PostgreSQL database.
Takuan handles data produced in transcriptomics experiments.
For a given experiment, samples taken from study participants are selected for RNA sequencing.
At the end of an RNA sequencing pipeline, results are usually stored in TSV/CSV format. Takuan handles two result formats:

- Multi-sample Raw Count Matrices (RCM)
  - Defines the expression levels for each feature (gene) and sample pair
  - Can only ingest one count type at a time (raw, TPM, TMM, GETMM or FPKM)
- Single-sample detailed counts
  - Defines the expression levels for each feature (gene) for the given sample
  - Can ingest all count types at once (raw, TPM, TMM, GETMM and FPKM)
Once the data is produced, it can be ingested into Takuan to allow downstream analysis of the results.
Takuan expects to receive RCM files in CSV format, where the columns correspond to unique sample identifiers, rows to unique feature identifiers (genes) and cells to the observed count for the sample-gene pair.
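For illustration, a minimal RCM could look like the sketch below (the sample IDs, counts, and the header of the feature ID column are made up):

```
gene_id,SAMPLE_001,SAMPLE_002,SAMPLE_003
ENSG00000000003,100,0,52
ENSG00000000005,1559,803,1275
```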
Takuan expects to receive single-sample data files in CSV or TSV format, where the columns correspond to specific expression measures, and the rows to feature IDs (genes).
For example:

```
gene_id          raw_count  tpm_count  tmm_count  getmm   fpkm_count
ENSG00000000003  0          0          0          0       0
ENSG00000000005  1559       2.3567     0.2369     7.566   0.369
...
```
With single-sample data files, you can save time by ingesting the raw counts and the pre-normalised values at the same time.
The single-sample endpoint even supports column header mappings, allowing you to flexibly ingest files that don't share header names.
To ingest data into Takuan and query it, follow these steps (an end-to-end example is sketched after this list):

- Create an experiment, in which we will later ingest gene expression data
  - POST `/experiment`
    - JSON body describing the experiment, where
      - `experiment_result_id` is a unique identifier for the experiment
      - `assembly_id` is the assembly accession ID used in the experiment
      - `assembly_name` is the genome assembly name used in the experiment
      - `extra_properties` is a JSON object where you can place additional metadata
- Ingest an RCM into the experiment you created
  - POST `/experiment/{experiment_result_id}/ingest`
    - Where `experiment_result_id` must correspond to an existing experiment ID in Takuan
    - A valid RCM file must be in the request's body as `rcm_file`
  - During the ingestion, Takuan creates a `gene_expression` row for every sample-gene pair
- OR ingest single-sample data
  - POST `/experiment/{experiment_result_id}/ingest/single`
    - Where `experiment_result_id` must correspond to an existing experiment ID in Takuan
    - A valid TSV/CSV file must be in the request's body as `data`
  - During the ingestion, Takuan creates a `gene_expression` row for every expression row in the file
- The `gene_expression` table now contains rows with the `raw_count` column filled
- (Optional) Normalized counts can be computed on demand and stored in the database
  - POST `/normalize/{experiment_result_id}/{method}`
    - `experiment_result_id` is the ID of an experiment with raw gene expressions
    - `method` is the normalization method to use (TPM, TMM or GETMM)
    - TPM and GETMM both require that you include a `gene_lengths` CSV file in the body
  - Normalized values are added in the appropriate column of `gene_expression`
- Query the experiments and gene expressions in your DB!
  - POST `/expressions` to get expression data results
    - JSON request body for filtering results and pagination
  - POST `/experiment/{experiment_result_id}/samples` to get the sample IDs for an experiment
  - POST `/experiment/{experiment_result_id}/features` to get the gene IDs for an experiment
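As a rough end-to-end sketch of these steps with `curl` (assembly values and file names are illustrative, the multipart field names follow the descriptions above, and the `/expressions` filter body is hypothetical; check the OpenAPI docs for the authoritative request schemas):

```bash
# Create an experiment (JSON body fields as described above)
curl -X POST http://localhost:5000/experiment \
  -H "Content-Type: application/json" \
  -d '{
        "experiment_result_id": "EXP_0001",
        "assembly_id": "GCA_000001405.15",
        "assembly_name": "GRCh38",
        "extra_properties": {"study": "demo"}
      }'

# Ingest a multi-sample RCM into the experiment, sent as "rcm_file"
curl -X POST http://localhost:5000/experiment/EXP_0001/ingest \
  -F "rcm_file=@counts_matrix_group_1.csv"

# (Optional) Normalize the raw counts with TPM, which requires gene lengths
curl -X POST http://localhost:5000/normalize/EXP_0001/TPM \
  -F "gene_lengths=@gene_lengths.csv"

# Query expressions; the filter/pagination fields below are illustrative only
curl -X POST http://localhost:5000/expressions \
  -H "Content-Type: application/json" \
  -d '{"experiments": ["EXP_0001"], "page": 1, "page_size": 100}'
```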
Valid synthetic transcriptomics data can be produced for Takuan.
To do so, please follow the steps detailed in the Bento Demo Dataset repository.
After following the instructions, 4 relevant files are produced for Takuan:
- counts_matrix_group_1.csv
- counts_matrix_group_2.csv
- counts_matrix_group_3.csv
- gene_lengths.csv
The count matrices can be ingested into Takuan as is. The gene lengths file can be used as is to normalize ingested expressions.
For testing purposes, this repository includes an RCM and a gene lengths file.
The following environment variables should be set when running a Takuan container:
| Name | Description | Default |
|---|---|---|
| `AUTHZ_ENABLED` | Enables/disables the authorization plugin | False |
| `CORS_ORIGINS` | List of allowed CORS origins | Null |
| `DB_HOST` | IP or hostname of the database | tds-db |
| `DB_PORT` | Database port | 5432 |
| `DB_USER` | Database username | tds_user |
| `DB_NAME` | Database name | tds |
| `DB_PASSWORD` | `DB_USER`'s database password | Null |
| `DB_PASSWORD_FILE` | Docker secret file for `DB_USER`'s database password | Null |
| `TDS_USER_NAME` | Non-root container user name running the server process | Null |
| `TDS_UID` | UID of `TDS_USER_NAME` | 1000 |
Note: Only use DB_PASSWORD or DB_PASSWORD_FILE, not both, since they serve the same purpose in different ways.
The Takuan Config object has its values populated from environment variables and secrets at startup.
The Config.db_password value is populated by either:
- `DB_PASSWORD=<a secure password>` if using an environment variable
  - As seen in docker-compose.dev.yaml
- `DB_PASSWORD_FILE=/run/secrets/db_password` if using a Docker secret (recommended)
  - As seen in docker-compose.secrets.dev.yaml
Using a Docker secret is recommended for security, as environment variables are more prone to be leaked.
DB_PASSWORD should only be considered for local development, or if the database is secured and isolated from public access in a private network.
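As a minimal sketch of the Docker secret approach (service and file names here are hypothetical; docker-compose.secrets.dev.yaml in this repository is the working reference):

```yaml
services:
  takuan:
    image: ghcr.io/bento-platform/takuan:latest
    environment:
      - DB_HOST=tds-db
      - DB_PORT=5432
      - DB_USER=tds_user
      - DB_NAME=tds
      # Point Takuan at the secret file instead of exposing the password as an env var
      - DB_PASSWORD_FILE=/run/secrets/db_password
    secrets:
      - db_password

secrets:
  db_password:
    file: ./secrets/db_password.txt  # hypothetical local path holding the password
```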
The Transcriptomics Data Service is meant to be a reusable microservice that can be integrated into existing stacks. Since authorization schemes vary across projects, Takuan allows adopters to code their own authorization plugin, enabling them to leverage their existing access control code, tools and policies.
See the authorization docs for more information on how to create and use the authz plugin with Takuan.
Start the Takuan server with a local PostgreSQL database for testing by running the following commands.

```bash
# start
docker compose up --build -d

# stop
docker compose down
```

The `--build` argument forces the image to be rebuilt. Be sure to use it if you want code changes to be present.
You can now interact with the API by querying `localhost:5000/{endpoint}`.
For the OpenAPI browser interface, go to `localhost:5000/docs`.
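For a quick smoke test of a running container, you can hit the service-info endpoint (see the endpoint table below):

```bash
# Should return a GA4GH service-info JSON object
curl http://localhost:5000/service-info
```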
For local development, you can use the docker-compose.dev.yaml file to start a Takuan
development container that mounts the local directory.
The server starts in reload mode to quickly reflect local changes, and debugpy is listening on the container's internal port 9511.
```bash
# Set UID for directory permissions in the container
export UID=$(id -u)

# start
docker compose -f ./docker-compose.dev.yaml up --build -d

# stop
docker compose -f ./docker-compose.dev.yaml down
```

You can then attach VS Code to the takuan container, and use the preconfigured Python Debugger (Takuan) for interactive debugging.
This service implements GA4GH's Service-Info spec.
If left unconfigured, a default service info object will be returned.
For adopters outside of the Bento stack, we recommend that you provide a custom service info object when deploying.
This can be done by simply mounting a JSON file in the Takuan container.
When starting, the service will look for a JSON file at /tds/lib/service-info.json.
If the file exists, it will be served from the GET /service-info endpoint, otherwise the default is used.
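For example, a custom definition mounted at /tds/lib/service-info.json could look like the sketch below (all values are placeholders; the field names follow the GA4GH service-info spec):

```json
{
  "id": "org.example.takuan",
  "name": "Takuan",
  "type": {
    "group": "org.example",
    "artifact": "transcriptomics-data-service",
    "version": "1.0.0"
  },
  "description": "Transcriptomics data service for the Example project",
  "organization": {
    "name": "Example Organization",
    "url": "https://example.org"
  },
  "version": "1.0.0",
  "environment": "prod"
}
```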
The service exposes the following endpoints:
| Endpoint | Method | Description |
|---|---|---|
| `/experiment` | GET | Get all experiments |
| `/experiment` | POST | Create an experiment |
| `/experiment/{experiment_result_id}` | GET | Get an experiment by unique ID |
| `/experiment/{experiment_result_id}` | DELETE | Delete an experiment by unique ID |
| `/experiment/{experiment_result_id}/samples` | POST | Retrieve the samples for a given experiment |
| `/experiment/{experiment_result_id}/features` | POST | Retrieve the features for a given experiment |
| `/experiment/{experiment_result_id}/ingest` | POST | Ingest multi-sample transcriptomics data into an experiment |
| `/experiment/{experiment_result_id}/ingest/single` | POST | Ingest single-sample transcriptomics data into an experiment |
| `/normalize/{experiment_result_id}/{method}` | POST | Normalize an experiment's gene expressions with one of the supported methods (TPM, TMM, GETMM) |
| `/expressions` | POST | Retrieve expressions with filter parameters |
| `/service-info` | GET | Returns a GA4GH service-info object describing the service |
Note: For more thorough API documentation, please refer to the OpenAPI release artifacts (openapi.json), or consult the hosted docs (link to come).
An openapi.json file is produced and attached to every release.
A Takuan deployment can be customized by mounting certain files to the container. The table below lists the files that can be mounted to a Takuan container to customize its behaviour.
| Container path | Description |
|---|---|
| `/run/secrets/` | Docker secrets directory |
| `/tds/lib/.env` | Extra environment variables for an authz plugin |
| `/tds/lib/authz.module.py` | Custom authorization plugin implementation |
| `/tds/lib/requirements.txt` | Extra Python dependencies to install for an authz plugin |
| `/tds/lib/service-info.json` | Custom GA4GH service-info JSON definition |
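For example, a compose service that provides a custom authz plugin and service-info definition could mount them like this (host-side paths are hypothetical):

```yaml
services:
  takuan:
    image: ghcr.io/bento-platform/takuan:latest
    volumes:
      # Container paths are the ones listed in the table above
      - ./authz/authz.module.py:/tds/lib/authz.module.py
      - ./authz/requirements.txt:/tds/lib/requirements.txt
      - ./authz/.env:/tds/lib/.env
      - ./service-info.json:/tds/lib/service-info.json
```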
The Transcriptomics Data Service is packaged and released as a Docker image using GitHub Actions.
Images are published in GitHub's container registry, here.
Images are built and published using the following tags:
- `<version>`: Build for a tagged release
- `latest`: Build for the latest tagged release
- `edge`: The top of the `main` branch
- `pr-<number>`: Build for a pull request that targets `main`
Note: Images with the -dev suffix (e.g. edge-dev) are based on dev.Dockerfile for local development.
To pull an image, or reference it in a compose file, use this pattern:
```bash
docker pull ghcr.io/bento-platform/takuan:<TAG>
```

List of example scripts to interact with a Takuan API:
