A small container to get an OMOP CDM database running quickly, with support for both PostgreSQL and SQL Server.
Drop your data into data/ and run the container.
You can configure the container or CLI using the following environment variables:
- DB_HOST: The hostname of the database. Default is db.
- DB_PORT: The port number of the database. Default is 5432.
- DB_USER: The username for the database. Default is postgres.
- DB_PASSWORD: The password for the database. Default is password.
- DB_NAME: The name of the database. Default is omop.
- DIALECT: The type of database to use. Default is postgresql, but can also be mssql.
- SCHEMA_NAME: The name of the schema to be created/used in the database. Default is public.
- DATA_DIR: The directory containing the data CSV files. Default is data.
- SYNTHETIC: Load synthetic data (boolean). Default is false.
- SYNTHETIC_NUMBER: Size of synthetic data, 100 or 1000. Default is 100.
- DELIMITER: The delimiter used to separate data. Default is tab, but can also be a comma (,).
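As an illustration of how these variables and their defaults fit together, here is a minimal Python sketch that resolves them into a single settings object. It is not omop-lite's own configuration code: the Settings class and load_settings function are hypothetical, and the validation rules and the accepted spellings of a true SYNTHETIC value are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults copied from the documented environment-variable list.
DEFAULTS = {
    "DB_HOST": "db",
    "DB_PORT": "5432",
    "DB_USER": "postgres",
    "DB_PASSWORD": "password",
    "DB_NAME": "omop",
    "DIALECT": "postgresql",
    "SCHEMA_NAME": "public",
    "DATA_DIR": "data",
    "SYNTHETIC": "false",
    "SYNTHETIC_NUMBER": "100",
    "DELIMITER": "tab",
}


@dataclass
class Settings:
    """Hypothetical settings container (not an omop-lite class)."""
    db_host: str
    db_port: int
    db_user: str
    db_password: str
    db_name: str
    dialect: str
    schema_name: str
    data_dir: str
    synthetic: bool
    synthetic_number: int
    delimiter: str  # the actual separator character, e.g. "\t" or ","


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Resolve settings from env (os.environ by default), falling back to DEFAULTS."""
    env = os.environ if env is None else env

    def get(key: str) -> str:
        return env.get(key, DEFAULTS[key])

    dialect = get("DIALECT")
    if dialect not in ("postgresql", "mssql"):
        raise ValueError(f"unsupported DIALECT: {dialect!r}")

    synthetic_number = int(get("SYNTHETIC_NUMBER"))
    if synthetic_number not in (100, 1000):
        raise ValueError("SYNTHETIC_NUMBER must be 100 or 1000")

    # "tab" is a keyword for the tab character; any other value is used as-is.
    raw_delimiter = get("DELIMITER")
    delimiter = "\t" if raw_delimiter == "tab" else raw_delimiter

    return Settings(
        db_host=get("DB_HOST"),
        db_port=int(get("DB_PORT")),
        db_user=get("DB_USER"),
        db_password=get("DB_PASSWORD"),
        db_name=get("DB_NAME"),
        dialect=dialect,
        schema_name=get("SCHEMA_NAME"),
        data_dir=get("DATA_DIR"),
        # Assumed truthy spellings; omop-lite may accept a different set.
        synthetic=get("SYNTHETIC").strip().lower() in ("1", "true", "yes"),
        synthetic_number=synthetic_number,
        delimiter=delimiter,
    )
```

For example, load_settings({"DIALECT": "mssql", "DELIMITER": ","}) keeps every other default but targets SQL Server and expects comma-separated files.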
pip install omop-lite
python omop-lite --help
docker run -v ./data:/data ghcr.io/health-informatics-uon/omop-lite
# docker-compose.yml
services:
  omop-lite:
    image: ghcr.io/health-informatics-uon/omop-lite
    volumes:
      - ./data:/data
    depends_on:
      - db
  db:
    image: postgres:latest
    environment:
      - POSTGRES_DB=omop
      - POSTGRES_PASSWORD=password
    ports:
      - "5432:5432"
To install using Helm:
# Install the chart directly from the OCI registry
helm install omop-lite oci://ghcr.io/health-informatics-uon/charts/omop-lite --version 0.2.2
The Helm chart deploys OMOP Lite as a Kubernetes Job that creates an OMOP CDM in a database. You can customise the installation using a values file:
# values.yaml
env:
  dbHost: postgres
  dbPort: "5432"
  dbUser: postgres
  dbPassword: postgres
  dbName: omop_helm
  dialect: postgresql
  schemaName: public
  synthetic: "false"
Install with custom values:
helm install omop-lite oci://ghcr.io/health-informatics-uon/charts/omop-lite --version 0.2.2 -f values.yaml
If you need synthetic data, a small dataset that loads quickly is provided in the synthetic/ directory.
To load the synthetic data, run the container with the SYNTHETIC environment variable set to true, and choose the dataset with SYNTHETIC_NUMBER:
- 100 is fake data.
- 1000 is Synthea 1k data.
You can provide your own data for loading into the tables by placing your files in the data/ directory. It should contain one .csv file per table, named after the table (for example, DRUG_STRENGTH.csv, CONCEPT.csv).
To match the vocabulary files downloaded from Athena, these files should be tab-separated even though they use the .csv extension.
You can override the delimiter with the DELIMITER environment variable.
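A minimal sketch, using only Python's standard csv module, of how files in data/ could be discovered and parsed with the default tab delimiter. discover_tables and read_table are hypothetical helpers, not omop-lite's loader, and turning off quote handling is an assumption about how the Athena exports are quoted.

```python
import csv
from pathlib import Path
from typing import Dict, Iterator


def discover_tables(data_dir: str) -> Dict[str, Path]:
    """Map each table name (the file stem, e.g. CONCEPT) to its .csv file."""
    return {p.stem.upper(): p for p in sorted(Path(data_dir).glob("*.csv"))}


def read_table(path: Path, delimiter: str = "\t") -> Iterator[Dict[str, str]]:
    """Yield each data row of a table file as a dict keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE treats double quotes as ordinary characters, on the
        # assumption that fields in the tab-separated files are not quoted.
        reader = csv.DictReader(f, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        yield from reader
```

Passing delimiter="," to read_table corresponds to setting DELIMITER to a comma.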
Adding a tsvector column to the concept table and an index on that column makes full-text search queries on the concept table run much faster.
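To make the full-text search idea concrete, here is a sketch of the PostgreSQL statements involved, wrapped in Python. The column and index names (concept_name_tsv, concept_name_tsv_idx), the 'english' text-search configuration and the search_concepts helper are assumptions, not necessarily what the text-search profile creates. The generated column requires PostgreSQL 12 or later, and the statements assume the concept table is on the search_path.

```python
# Hypothetical DDL: a stored tsvector column derived from concept_name.
ADD_TSVECTOR_COLUMN = """
ALTER TABLE concept
    ADD COLUMN concept_name_tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', concept_name)) STORED;
"""

# A GIN index lets PostgreSQL answer @@ matches without scanning every row.
CREATE_GIN_INDEX = """
CREATE INDEX concept_name_tsv_idx ON concept USING GIN (concept_name_tsv);
"""

# Ranked search; %s placeholders follow the DB-API "format" paramstyle.
SEARCH_SQL = """
SELECT concept_id, concept_name, ts_rank(concept_name_tsv, query) AS rank
FROM concept, plainto_tsquery('english', %s) AS query
WHERE concept_name_tsv @@ query
ORDER BY rank DESC
LIMIT %s;
"""


def search_concepts(cursor, text, limit=10):
    """Run SEARCH_SQL on any DB-API cursor that uses %s parameters (e.g. psycopg)."""
    cursor.execute(SEARCH_SQL, (text, limit))
    return cursor.fetchall()
```

The two DDL statements only need to run once; after that, every search_concepts call can use the index.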
Postgres does vector search too!
To enable these features in omop-lite, use the text-search profile:
docker compose --profile text-search up
This requires text-search/embeddings.parquet, a file containing concept_ids and their embeddings (an example file is provided).
This uses pgvector to create an embeddings table.
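pgvector orders rows by a distance between vectors; its <=> operator returns cosine distance. The following brute-force Python sketch computes that same ranking over an in-memory dict of embeddings, purely to show what a nearest-neighbour lookup returns. The helper names are hypothetical, as are the table and column names in the SQL string, and the README does not confirm that omop-lite compares its embeddings by cosine distance.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's <=> operator returns.

    Raises ZeroDivisionError if either vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_concepts(query, embeddings, k=5):
    """Return the k concept_ids whose embeddings are closest to query.

    embeddings maps concept_id -> vector (a list of floats), for example
    as loaded from an embeddings.parquet file.
    """
    ranked = sorted(embeddings, key=lambda cid: cosine_distance(query, embeddings[cid]))
    return ranked[:k]


# Roughly equivalent pgvector query; table and column names are hypothetical,
# and the first parameter is a vector literal such as '[0.1,0.2,0.3]'.
NEAREST_SQL = "SELECT concept_id FROM embeddings ORDER BY embedding <=> %s::vector LIMIT %s;"
```

In the database, an index on the embedding column lets pgvector avoid this full scan over every vector.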
If you're a developer and want to iterate on omop-lite quickly, synthetic/ contains a small subset of the vocabularies that is sufficient to build the database.
If you wish to test the vector search, there are matching embeddings in embeddings/embeddings.parquet.