
Gemma-Analytics/dbt-invoke-gemma

 
 


dbt-invoke

dbt-invoke-gemma is a CLI for creating, updating, and deleting dbt property files and md files. Its primary purpose is to assist in generating dbt documentation.

This package is a Gemma adaptation of the dbt-invoke CLI (built with Invoke).

  • Supported dbt resource types:

    • models
    • snapshots
    • analyses
  • Under the hood, this tool combines the dbt ls and dbt run-operation commands with dbt's built-in get_columns_in_query macro.

    • This methodology allows the tool to work on ephemeral models and analyses, which other approaches, such as those based on listing data warehouse tables/views, can miss.
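
As a rough illustration of the dbt ls step, the Python sketch below filters the output of dbt ls --output json down to the supported resource types. It is not dbt-invoke-gemma's actual code: the helper name supported_resources is hypothetical, and the sketch assumes each JSON line carries dbt's name, resource_type, and original_file_path keys.

```python
from __future__ import annotations

import json
from typing import Iterable

# The resource types this README lists as supported.
SUPPORTED_TYPES = {"model", "snapshot", "analysis"}


def supported_resources(ls_lines: Iterable[str]) -> list[dict]:
    """Hypothetical helper: keep supported resources from `dbt ls --output json` lines.

    Assumes one JSON object per line; anything else (log lines, blank
    lines) is skipped.
    """
    resources = []
    for line in ls_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a JSON node, e.g. a "Running with dbt=..." log line
        try:
            node = json.loads(line)
        except json.JSONDecodeError:
            continue
        if node.get("resource_type") in SUPPORTED_TYPES:
            resources.append(
                {
                    "name": node["name"],
                    "resource_type": node["resource_type"],
                    "path": node["original_file_path"],
                }
            )
    return resources


if __name__ == "__main__":
    sample = [
        "12:00:00  Running with dbt=1.8.0",
        '{"name": "users", "resource_type": "model", "original_file_path": "models/marts/core/users.sql"}',
        '{"name": "raw_users", "resource_type": "seed", "original_file_path": "seeds/raw_users.csv"}',
    ]
    print(supported_resources(sample))
```

The resulting file paths are what the per-resource md and property file locations described below are derived from.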

Installation

  pip install dbt-invoke-gemma

NOTE: dbt version 1.8 or above is required

Usage

  • There are two main commands:
    • dbt-invoke mdfiles: generates md files containing doc blocks.
    • dbt-invoke properties: generates property files that reference those doc blocks.
  • You must have previously executed dbt run/dbt seed/dbt snapshot on the resources for which you wish to create/update property files.
    • If you have made updates to your resources, execute the appropriate command (dbt run/dbt seed/dbt snapshot) before using this tool to create/update property files.
  • The package also provides suggested prompts to give the Cursor AI agent so that it can generate the documentation automatically. To use the prompts, you must have a valid Cursor account.

Generate md files

dbt-invoke mdfiles <options>
  • The first time you run this command, you will be prompted to add a short macro called _log_columns_list to your dbt project.
    • You may accept the prompt to add it automatically.
    • Otherwise, copy and paste it into one of your dbt project's macro-paths yourself.

NOTE: You need to generate md files BEFORE generating the property files; otherwise, the tool will raise an error.

  • md files will be created, updated, or deleted on a one-to-one basis in a subfolder of the directory containing the resource files they represent (naming convention: _{resource folder name}_docs).

    • For example, given a resource file in the location models/marts/core/users.sql, this tool will create, update, or delete an md file in the location models/marts/core/_core_docs/users.md. If the docs subfolder (e.g. models/marts/core/_core_docs) does not exist, the tool will create it.
  • Newly generated md files are created with one empty doc snippet per column of the resource (naming convention: {model name}__{COLUMN NAME}).

    • For example, when generating a new md file for a model users with column names user_id and created_at, the following content will be generated:
      • {% docs users__USER_ID %}
        
        {% enddocs %}
        
        {% docs users__CREATED_AT %}
        
        {% enddocs %}
  • When updating an existing md file, existing doc snippets and their descriptions are left as they are, and snippets for new columns in the resource are added. Snippets for columns that no longer exist are not removed.

  • The command also prints a suggested prompt for the Cursor AI agent to fill out the content of the doc files automatically:

    To add the documentation to fields, use the following prompt for cursor AI
    after passing the following as context:
    - the md file of the model
    - the sql file of the model
    
    > Check the query for the sql model of this md file, and fill out the empty doc snippets
    > (only the empty doc snippets, do not edit the ones that already have content).
    > For each field the info should include:
    > - description
    > - column level lineage
    > - calculation or field logic (if the field is derived)
    > Check this file well, it is really important that the info is correct.
    > Also, fill out ALL empty doc snippets, do not skip any fields.
    > Give me the change immediately, do not wait for me to ask several times to change the file
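
The naming and update rules for md files can be sketched in Python as follows. This is a minimal sketch, not the package's implementation: md_path, doc_name, empty_doc_block, and update_md are hypothetical helpers, and column names are assumed to be upper-cased to form the {COLUMN NAME} part of the snippet name.

```python
import re
from pathlib import PurePosixPath

# Captures the snippet name in an opening tag such as "{% docs users__USER_ID %}".
DOCS_OPEN = re.compile(r"\{%-?\s*docs\s+(\w+)\s*-?%\}")


def md_path(resource_path: str) -> PurePosixPath:
    """Hypothetical: models/marts/core/users.sql -> models/marts/core/_core_docs/users.md."""
    path = PurePosixPath(resource_path)
    folder = path.parent
    return folder / f"_{folder.name}_docs" / f"{path.stem}.md"


def doc_name(model: str, column: str) -> str:
    """Snippet name {model name}__{COLUMN NAME}; upper-casing is an assumption."""
    return f"{model}__{column.upper()}"


def empty_doc_block(name: str) -> str:
    """An empty doc snippet, ready to be filled in by hand or by an AI agent."""
    return f"{{% docs {name} %}}\n\n{{% enddocs %}}"


def update_md(existing: str, model: str, columns: list) -> str:
    """Append empty snippets for new columns; never edit or remove existing ones."""
    present = set(DOCS_OPEN.findall(existing))
    parts = [existing.rstrip("\n")] if existing.strip() else []
    for col in columns:
        name = doc_name(model, col)
        if name not in present:
            parts.append(empty_doc_block(name))
            present.add(name)
    return "\n\n".join(parts) + "\n"


if __name__ == "__main__":
    print(md_path("models/marts/core/users.sql"))
    print(update_md("", "users", ["user_id", "created_at"]))
```

Because update_md only ever appends, snippets for dropped columns survive an update, which matches the behaviour described above.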
    

Generate property files

dbt-invoke properties <options>
  • Property files will be created, updated, or deleted on a one-to-one basis in a subfolder of the directory containing the resource files they represent (naming convention: _{resource folder name}_schemas).

    • For example, given a resource file in the location models/marts/core/users.sql, this tool will create, update, or delete a property file in the location models/marts/core/_core_schemas/users.yml.
    • If your dbt project defines properties for multiple resources per .yml file, see the Migrating to One Resource Per Property File section.
  • Any newly generated property files are created with the correct resource type, resource name, and columns. For each column, the related doc snippet (naming convention: {model name}__{COLUMN NAME}) will be added as description.

    • For example, when generating a new property file for a model users with column names user_id and created_at, the following YAML will be generated:
      • version: 2
        models:
        - name: users
          description: ''
          columns:
          - name: user_id
            description: |-
             {{ doc("users__USER_ID") }}
            data_tests: []
          - name: created_at
            description: |-
             {{ doc("users__CREATED_AT") }}
            data_tests: []
  • When updating an already existing property file, new columns in the resource will be added, and columns that no longer exist will be removed.

  • You may add other properties (e.g. data tests). They will remain intact when updating existing property files as long as the column/resource name to which they belong still exists.
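
A similar sketch for property files, again with hypothetical helper names (properties_path, column_entry, sync_columns). It operates on a plain Python dict standing in for one parsed resource entry of a .yml file, and it assumes columns are matched by exact name and written in the order reported by the warehouse; the real tool's matching and ordering rules may differ.

```python
from pathlib import PurePosixPath


def properties_path(resource_path: str) -> PurePosixPath:
    """Hypothetical: models/marts/core/users.sql -> models/marts/core/_core_schemas/users.yml."""
    path = PurePosixPath(resource_path)
    folder = path.parent
    return folder / f"_{folder.name}_schemas" / f"{path.stem}.yml"


def column_entry(model: str, column: str) -> dict:
    """Hypothetical: a new column entry whose description references its doc snippet."""
    return {
        "name": column,
        "description": f'{{{{ doc("{model}__{column.upper()}") }}}}',
        "data_tests": [],
    }


def sync_columns(resource: dict, model: str, columns: list) -> dict:
    """Hypothetical: align a resource entry with the warehouse columns.

    - columns already listed keep all their properties (tests, descriptions)
    - new columns get a fresh entry
    - listed columns missing from `columns` are dropped
    Resource-level keys such as name and description are left untouched.
    """
    existing = {col["name"]: col for col in resource.get("columns", [])}
    resource["columns"] = [
        existing[col] if col in existing else column_entry(model, col)
        for col in columns
    ]
    return resource


if __name__ == "__main__":
    users = {"name": "users", "description": "", "columns": []}
    print(properties_path("models/marts/core/users.sql"))
    print(sync_columns(users, "users", ["user_id", "created_at"]))
```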

Options

<options> primarily uses the same arguments as the dbt ls command, which gives you flexibility in selecting the dbt resources for which you wish to create or update md and property files (run dbt ls --help for details).

  • --resource-type

  • --models

  • --select

  • --selector

  • --exclude

  • --project-dir

  • --profiles-dir

  • --profile

  • --target

  • --vars

  • --bypass-cache

  • --state

  • Notes:

    • This tool supports only the long flags of dbt ls options (for example, --models rather than the short flag -m).
    • Multiple values for the same argument can be passed as a comma-separated string (for example, --models modelA,modelB).
      • Keep in mind that dbt may not support multiple values for certain options.
  • Two additional flags are available:

    • --log-level to alter the verbosity of logs.
      • It accepts one of Python's standard logging levels (debug, info, warning, error, critical).
    • --threads to set a maximum number of concurrent threads to use in collecting resources' column information from the data warehouse and in creating/updating the corresponding property files. Each thread will run dbt's get_columns_in_query macro against the data warehouse.
  • Some examples:

    # Create/update md files for all supported resource types
    dbt-invoke mdfiles
    
    # Create/update property files for all supported resource types
    dbt-invoke properties
    
    # Create/update md files for all supported resource types, using 4 concurrent threads
    dbt-invoke mdfiles --threads 4
    
    # Create/update property files for all supported resource types, using 4 concurrent threads
    dbt-invoke properties --threads 4
    
    # Create/update md files for all models in a models/marts directory
    dbt-invoke mdfiles --models marts
    
    # Create/update property files for all models in a models/marts directory
    dbt-invoke properties --models marts
    
    # Create/update md files for a 'users' model and an 'orders' model
    dbt-invoke mdfiles --models users,orders
    
    # Create/update property files for a 'users' model and an 'orders' model
    dbt-invoke properties --models users,orders
    
    # Create/update md files for a 'users' model and all downstream models
    dbt-invoke mdfiles --models users+
    
    # Create/update property files for a 'users' model and all downstream models
    dbt-invoke properties --models users+
    
    # Create/update an md file for a snapshot called 'users_snapshot'
    dbt-invoke mdfiles --resource-type snapshot --select users_snapshot
    
    # Create/update a property file for a snapshot called 'users_snapshot'
    dbt-invoke properties --resource-type snapshot --select users_snapshot
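
The sketch below shows one plausible way comma-separated option values and the --threads flag could be handled. Both helpers are hypothetical and do not reflect the package's internals: dbt_ls_args assumes that only --models, --select, and --exclude are split into space-separated values (dbt's --vars, for instance, takes a YAML string that may itself contain commas), and collect_columns takes a stand-in get_columns callable in place of the real dbt run-operation call against the warehouse.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Assumption: the options whose comma-separated values become separate arguments.
SPLIT_OPTIONS = {"models", "select", "exclude"}


def dbt_ls_args(**options: str) -> list:
    """Hypothetical: build a dbt ls argument list from long-flag options.

    dbt_ls_args(models="users,orders") -> ["dbt", "ls", "--models", "users", "orders"]
    """
    args = ["dbt", "ls"]
    for key, value in options.items():
        args.append("--" + key.replace("_", "-"))  # resource_type -> --resource-type
        if key in SPLIT_OPTIONS:
            args.extend(part.strip() for part in value.split(",") if part.strip())
        else:
            args.append(value)  # passed through unchanged, e.g. a --vars YAML string
    return args


def collect_columns(resources: list, get_columns: Callable, threads: int = 1) -> dict:
    """Call get_columns once per resource, with at most `threads` calls in flight."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in input order, so they line up with `resources`
        return dict(zip(resources, pool.map(get_columns, resources)))


def fake_get_columns(name: str) -> list:
    """Stand-in for running get_columns_in_query against a real warehouse."""
    return [f"{name}_id"]


if __name__ == "__main__":
    print(dbt_ls_args(models="users,orders"))
    print(collect_columns(["users", "orders"], fake_get_columns, threads=4))
```

Threads suit this workload because each call spends most of its time waiting on the data warehouse rather than running Python code.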
    

Deleting Property Files

dbt-invoke properties.delete <options>
  • <options> uses the same arguments as for creating/updating property files, except for --threads.

Help

  • To view the list of available commands and their short descriptions, run:

    dbt-invoke --list
  • To view in depth command descriptions and available options/flags, run:

    dbt-invoke <command_name> --help

Limitations

  • dbt-invoke will try to preserve formatting and comments when updating existing files. If you want to preserve line-breaks, use > or | on your multiline strings, as recommended here.
  • In order to collect or update the list of columns that should appear in each property file, dbt's get_columns_in_query macro is run for each matching resource. As of the time of writing, get_columns_in_query uses a SELECT statement limited to zero rows. While this is not typically a performance issue for table or incremental materializations, execution may be slow for complex analyses, views, or ephemeral materializations.
    • This may be partially remedied by increasing the value of the --threads option (for example, dbt-invoke properties --threads 4).
  • dbt-invoke has not been tested across different types of data warehouses.
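
To see why even a zero-row query can be slow, the sketch below mimics the kind of wrapper dbt's default implementation uses (roughly select * from (<query>) as __dbt_sbq where false limit 0; the exact SQL varies by adapter), with an in-memory SQLite database standing in for the data warehouse. The function name columns_in_query is hypothetical. Column names come back without fetching any rows, but the database still has to resolve and plan the full query, including any views it reads from.

```python
import sqlite3


def columns_in_query(conn: sqlite3.Connection, select_sql: str) -> list:
    """Hypothetical: return the column names of select_sql without fetching rows."""
    # "where 1 = 0" plays the role of dbt's "where false" here
    wrapped = f"select * from ({select_sql}) as __dbt_sbq where 1 = 0 limit 0"
    cursor = conn.execute(wrapped)
    # cursor.description is populated even when the query returns no rows
    return [col[0] for col in cursor.description]


conn = sqlite3.connect(":memory:")
conn.execute("create table users (user_id integer, created_at text)")
conn.execute(
    "create view active_users as select user_id, created_at, 1 as is_active from users"
)
print(columns_in_query(conn, "select * from active_users"))
# -> ['user_id', 'created_at', 'is_active']
```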

About

A CLI for creating, updating, and deleting dbt property files


Languages

  • Python 100.0%