dbt-invoke-gemma is a CLI for creating, updating, and deleting dbt property files and md files. Its primary purpose is to assist in generating dbt documentation.
This package is a Gemma adaptation of the dbt-invoke CLI (built with Invoke).
- Supported dbt resource types:
  - models
  - snapshots
  - analyses
- Under the hood, this tool works by combining the power of the dbt ls and dbt run-operation commands with dbt's built-in get_columns_in_query macro.
  - This methodology allows the tool to work on ephemeral models and analyses, which other approaches, such as those based on listing data warehouse tables/views, can miss.
    pip install dbt-invoke-gemma

NOTE: dbt version 1.8 or above is required.
- There are 2 main commands:
  - dbt-invoke mdfiles: generate md files with doc blocks.
  - dbt-invoke properties: generate property files referencing the doc blocks.
- You must have previously executed dbt run/dbt seed/dbt snapshot on the resources for which you wish to create/update property files.
  - If you have made updates to your resources, execute the appropriate command (dbt run/dbt seed/dbt snapshot) before using this tool to create/update property files.
- The package also provides suggested prompts to feed to the CursorAI agent to automatically generate documentation. To use the prompts, you must have a valid Cursor account.
    dbt-invoke mdfiles <options>

- The first time you run this command, you should be prompted to add a short macro called _log_columns_list to your dbt project.
  - You may accept the prompt to add it automatically.
  - Otherwise, copy/paste it into one of your dbt project's macro-paths yourself.
NOTE: You need to generate md files BEFORE generating the property files, otherwise the system will throw an error.
- md files will be created, updated, or deleted on a one-to-one basis in a subfolder of the path of the resource files they represent (naming convention: _{resource folder name}_docs).
  - For example, given a resource file in the location models/marts/core/users.sql, this tool will create, update, or delete an md file in the location models/marts/core/_core_docs/users.md. If the docs subfolder (e.g. models/marts/core/_core_docs) does not exist, it will be generated by the package.
- Any newly generated md files are created with the doc snippets for the columns of the resource (naming convention: {model name}__{COLUMN NAME}).
  - For example, when generating a new md file for a model users with column names user_id and created_at, the following md file will be generated:

        {% docs users__USER_ID %}
        {% enddocs %}

        {% docs users__CREATED_AT %}
        {% enddocs %}
- When updating an already existing md file, existing columns and descriptions remain as they are, and new columns in the resource are added. Columns that no longer exist are not removed.
- The command will also provide a suggestion on how to prompt the CursorAI agent to automatically fill out the content of the doc files.
To add the documentation to fields, use the following prompt for CursorAI after passing the following as context:
- the md file of the model
- the sql file of the model

> Check the query for the sql model of this md file, and fill out the empty doc snippets
> (only the empty doc snippets, do not edit the ones that already have content).
> For each field the info should include:
> - description
> - column level lineage
> - calculation or field logic (if the field is derived)
>
> Check this file well, it is really important that the info is correct.
> Also, fill out ALL empty doc snippets, do not skip any fields.
> Give me the change immediately, do not wait for me to ask several times to change the file
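The md file rules described above (subfolder naming, doc snippet naming, and additive updates) can be sketched in Python. This is only an illustration under stated assumptions, not the package's actual code: the helper names `docs_path`, `doc_block`, and `update_md` are hypothetical, and the blank line between snippets is a layout choice made for this sketch.

```python
import re
from pathlib import PurePosixPath

# Matches the name inside an opening tag such as "{% docs users__USER_ID %}".
DOC_BLOCK_NAME = re.compile(r"\{%\s*docs\s+(\S+)\s*%\}")


def docs_path(resource_path: str) -> str:
    """Map a resource file to its md file in the _{resource folder name}_docs subfolder."""
    resource = PurePosixPath(resource_path)
    folder = resource.parent
    return str(folder / f"_{folder.name}_docs" / f"{resource.stem}.md")


def doc_block(model: str, column: str) -> str:
    """Render one empty doc snippet named {model name}__{COLUMN NAME}."""
    return f"{{% docs {model}__{column.upper()} %}}\n{{% enddocs %}}\n"


def update_md(existing: str, model: str, columns: list[str]) -> str:
    """Append snippets for new columns; existing snippets are never edited or removed."""
    present = set(DOC_BLOCK_NAME.findall(existing))
    missing = [c for c in columns if f"{model}__{c.upper()}" not in present]
    parts = [existing.rstrip("\n") + "\n"] if existing.strip() else []
    parts += [doc_block(model, c) for c in missing]
    return "\n".join(parts)


if __name__ == "__main__":
    print(docs_path("models/marts/core/users.sql"))
    print(update_md("", "users", ["user_id", "created_at"]))
```

Because `update_md` only appends, a snippet for a column that was dropped from the model stays in the file, which mirrors the behavior stated above.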
    dbt-invoke properties <options>

- Property files will be created, updated, or deleted on a one-to-one basis in a subfolder of the path of the resource files they represent (naming convention: _{resource folder name}_schemas).
  - For example, given a resource file in the location models/marts/core/users.sql, this tool will create, update, or delete a property file in the location models/marts/core/_core_schemas/users.yml.
  - If your dbt project defines properties for multiple resources per .yml file, see the Migrating to One Resource Per Property File section.
- Any newly generated property files are created with the correct resource type, resource name, and columns. For each column, the related doc snippet (naming convention: {model name}__{COLUMN NAME}) will be added as description.
  - For example, when generating a new property file for a model users with column names user_id and created_at, the following yaml will be generated:

        version: 2
        models:
          - name: users
            description: ''
            columns:
              - name: user_id
                description: |-
                  {{ doc("users__USER_ID") }}
                data_tests: []
              - name: created_at
                description: |-
                  {{ doc("users__CREATED_AT") }}
                data_tests: []
- When updating an already existing property file, new columns in the resource will be added, and columns that no longer exist will be removed.
- You may add other properties (e.g. data tests). They will remain intact when updating existing property files as long as the column/resource name to which they belong still exists.
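The property file behavior described above can be sketched as plain Python data structures. This is a hedged illustration rather than the package's implementation: `new_column`, `new_property`, and `update_columns` are hypothetical names, the doc reference follows the stated {model name}__{COLUMN NAME} convention, and only the models resource type is shown.

```python
def new_column(model: str, column: str) -> dict:
    """Build one column entry whose description points at its doc snippet."""
    return {
        "name": column,
        "description": f'{{{{ doc("{model}__{column.upper()}") }}}}',
        "data_tests": [],
    }


def new_property(model: str, columns: list[str]) -> dict:
    """Build the structure of a freshly generated property file for a model."""
    return {
        "version": 2,
        "models": [
            {
                "name": model,
                "description": "",
                "columns": [new_column(model, c) for c in columns],
            }
        ],
    }


def update_columns(existing: list[dict], model: str, current: list[str]) -> list[dict]:
    """Keep entries for columns that still exist, add new ones, drop removed ones."""
    by_name = {col["name"]: col for col in existing}
    return [by_name.get(c, new_column(model, c)) for c in current]


if __name__ == "__main__":
    old = [
        {"name": "user_id", "description": "Primary key.", "data_tests": ["unique"]},
        {"name": "legacy_flag", "description": "", "data_tests": []},
    ]
    # user_id keeps its data test, created_at is added, legacy_flag is dropped.
    print(update_columns(old, "users", ["user_id", "created_at"]))
```

Reusing the existing dictionary for a surviving column is what keeps manually added properties, such as data tests, intact in this sketch.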
<options> primarily uses the same arguments as the dbt ls command to
allow flexibility in selecting the dbt resources for which you wish to
create/update property files (run dbt ls --help for details).
- --resource-type
- --models
- --select
- --selector
- --exclude
- --project-dir
- --profiles-dir
- --profile
- --target
- --vars
- --bypass-cache
- --state
- Notes:
  - This tool supports only the long flags of dbt ls options (for example: --models, and not short flags like -m).
  - Multiple values for the same argument can be passed as a comma separated string (Example: --models modelA,modelB).
    - Keep in mind that dbt may not support multiple values for certain options.
- Two additional flags are made available.
  - --log-level to alter the verbosity of logs.
    - It accepts one of Python's standard logging levels (debug, info, warning, error, critical).
  - --threads to set a maximum number of concurrent threads to use in collecting resources' column information from the data warehouse and in creating/updating the corresponding property files. Each thread will run dbt's get_columns_in_query macro against the data warehouse.
- Some examples:

      # Create/update md files for all supported resource types
      dbt-invoke mdfiles

      # Create/update property files for all supported resource types
      dbt-invoke properties

      # Create/update md files for all supported resource types, using 4 concurrent threads
      dbt-invoke mdfiles --threads 4

      # Create/update property files for all supported resource types, using 4 concurrent threads
      dbt-invoke properties --threads 4

      # Create/update md files for all models in a models/marts directory
      dbt-invoke mdfiles --models marts

      # Create/update property files for all models in a models/marts directory
      dbt-invoke properties --models marts

      # Create/update md files for a 'users' model and an 'orders' model
      dbt-invoke mdfiles --models users,orders

      # Create/update property files for a 'users' model and an 'orders' model
      dbt-invoke properties --models users,orders

      # Create/update md files for a 'users' model and all downstream models
      dbt-invoke mdfiles --models users+

      # Create/update property files for a 'users' model and all downstream models
      dbt-invoke properties --models users+

      # Create/update a md file for a snapshot called 'users_snapshot'
      dbt-invoke mdfiles --resource-type snapshot --select users_snapshot

      # Create/update a property file for a snapshot called 'users_snapshot'
      dbt-invoke properties --resource-type snapshot --select users_snapshot
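To illustrate what a thread limit such as --threads implies, the sketch below caps concurrency with a thread pool. It is an assumption-laden illustration, not the tool's code: `fetch_columns` is a hypothetical stand-in that returns canned data instead of running get_columns_in_query against a warehouse, and `collect_columns` is likewise an invented name.

```python
from concurrent.futures import ThreadPoolExecutor

# Canned data standing in for warehouse results (hypothetical).
FAKE_WAREHOUSE = {
    "users": ["user_id", "created_at"],
    "orders": ["order_id", "user_id"],
}


def fetch_columns(resource: str) -> list[str]:
    """Hypothetical stand-in for one get_columns_in_query call."""
    return FAKE_WAREHOUSE.get(resource, [])


def collect_columns(resources: list[str], threads: int = 1) -> dict[str, list[str]]:
    """Fetch columns for every resource using at most `threads` concurrent workers."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in the same order as the input resources.
        return dict(zip(resources, pool.map(fetch_columns, resources)))


if __name__ == "__main__":
    print(collect_columns(["users", "orders"], threads=4))
```

Each worker spends most of its time waiting on the warehouse, which is why more threads can shorten the overall run even though the per-resource query cost is unchanged.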
    dbt-invoke properties.delete <options>

- <options> uses the same arguments as for creating/updating property files, except for --threads.
- To view the list of available commands and their short descriptions, run:

      dbt-invoke --list

- To view in-depth command descriptions and available options/flags, run:

      dbt-invoke <command_name> --help
- dbt-invoke will try to preserve formatting and comments when updating existing files. If you want to preserve line-breaks, use > or | on your multiline strings, as recommended here.
- In order to collect or update the list of columns that should appear in each property file, dbt's get_columns_in_query macro is run for each matching resource. As of the time of writing, get_columns_in_query uses a SELECT statement limited to zero rows. While this is not typically a performance issue for table or incremental materializations, execution may be slow for complex analyses, views, or ephemeral materializations.
  - This may be partially remedied by increasing the value of the --threads option in dbt-invoke properties.update.
- dbt-invoke has not been tested across different types of data warehouses.
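The zero-row SELECT mentioned in the limitations above can be demonstrated with Python's built-in sqlite3 module. This is only a sketch of the general idea, not dbt's macro: the wrapper SQL, the `columns_in_query` helper, and the in-memory `users` table are assumptions made for illustration.

```python
import sqlite3


def columns_in_query(conn: sqlite3.Connection, select_sql: str) -> list[str]:
    """Return the column names of a query without fetching any rows."""
    # Wrapping the query and filtering everything out still yields column metadata.
    wrapped = f"select * from ({select_sql}) as subquery where 1 = 0 limit 0"
    cursor = conn.execute(wrapped)
    return [description[0] for description in cursor.description]


conn = sqlite3.connect(":memory:")
conn.execute("create table users (user_id integer, created_at text)")

# Works for an arbitrary query, which is why no physical table or view is
# needed for the resource itself (for example an ephemeral model or analysis).
print(columns_in_query(conn, "select user_id, date(created_at) as created_date from users"))
```

The database still has to plan the inner query, so a complex view or ephemeral model can be slow to inspect even though no rows are returned.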