Using dbt (data build tool) to convert Synthea synthetic data to OMOP Common Data Model in PostgreSQL
- dbt-core >= 1.0.0
- dbt-postgres >= 1.0.0
- sqlfluff >= 2.0.0
- sqlfluff-templater-dbt >= 2.0.0
- Install dbt with Postgres adapter, sqlfluff and sqlfluff-templater-dbt
pip install dbt-postgres sqlfluff sqlfluff-templater-dbt
- If the installation is successful, you should have dbt CLI accessible globally. To test run
dbt --version
-
Go to
~/.dbt/(user home directory, i.e., not this repo directory), createprofiles.ymlwith the example fromexample.profiles.yml. Fill in database credentials, read instructions in Postgres Profile config.
dbt can write and read to one database connection at a time. dbt is not an ETL tool, as it does just the T. -
Test settings & DB connection, run
dbt debug
- Install packages for macro
dbt deps
-
Write code. Learn more from Official free online courses.
-
We use SQLFluff to make our SQL code clean and consistent.
Install SQLFluff and sqlfluff-templater-dbt with
pip install sqlfluff sqlfluff-templater-dbt
To lint SQL codes, run
sqlfluff lint .
To fix linting issues, run
sqlfluff fix .
or the following to force fix
sqlfluff fix -f .
- To test whether the code is good on our database or not, compile the model with
dbt compile
(optional: with --models like dbt run below).
- To run models,
dbt run- regular run- Model selection syntax (source). Specifying models can save you a lot of time by only running/testing the models that you think are relevant. However, there is a risk that you'll forget to specify an important upstream dependency so it's a good idea to understand the syntax thoroughly:
dbt run --models modelname- will only run modelnamedbt run --models +modelname- will run modelname and all parentsdbt run --models modelname+- will run modelname and all childrendbt run --models +modelname+- will run modelname, and all parents and childrendbt run --models @modelname- will run modelname, all parents, all children, AND all parents of all childrendbt run --exclude modelname- will run all models except modelname- Note that all of these work with folder selection syntax too:
dbt run --models folder- will run all models in a folderdbt run --models folder.subfolder- will run all models in the subfolderdbt run --models +folder.subfolder- will run all models in the subfolder and all parents
- To run models with test, use
dbt build
(optional: with --models like dbt run above).
- To generate doc, run
dbt docs generate, serve bydbt docs serve. Or, run them together as
dbt docs generate && dbt docs serve --port 8080
- Learn more about dbt in the docs, reference, best practices
- dbt project maturity levels repo, video
- GitLab's dbt repo