Add comps training data to model.training_data #870

@jeancochrane

The model.training_data table is currently restricted to only final model runs:

# Build the base metadata DataFrame
base_query = """
    SELECT
        run_id,
        year,
        assessment_year,
        dvc_md5_training_data
    FROM model.metadata
    WHERE run_type = 'final'
"""

However, it would also be useful to include training data for comps runs. That would greatly reduce the maintenance burden of this CTE in the pinval.vw_comp view:

training_data AS (
    SELECT
        train.*,
        meta.model_predictor_all_name
    FROM {{ ref('model.training_data') }} AS train
    LEFT JOIN {{ source('model', 'metadata') }} AS meta
        ON train.run_id = meta.run_id
    -- Currently the `model.training_data` table only includes training data
    -- for final model runs, not comp runs, so we can only use final model runs
    -- here. Further, we have to make a manual decision about which final model
    -- run has training data that matches the comp run for assessment years
    -- that have multiple final models.
    WHERE train.run_id IN (
        '2024-03-17-stupefied-maya',
        '2025-02-11-charming-eric'
    )
),
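
For illustration, here is a minimal sketch of why the manual run selection is needed today, and why a join on run_id would avoid it. It uses an in-memory SQLite database rather than the real warehouse. The `comp` table, the `meta_pin` and `comp_pin` columns, and all run IDs are hypothetical stand-ins, not the actual pinval.vw_comp schema.

```python
import sqlite3

# Toy schema: everything here is a hypothetical stand-in for the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE training_data (run_id TEXT, assessment_year TEXT, meta_pin TEXT);
CREATE TABLE comp (run_id TEXT, assessment_year TEXT, pin TEXT, comp_pin TEXT);

-- Two final runs and one comps run share the same assessment year.
INSERT INTO training_data VALUES
    ('example-final-run-a', '2024', 'pin-1'),
    ('example-final-run-b', '2024', 'pin-1'),
    ('example-comps-run',   '2024', 'pin-1');
INSERT INTO comp VALUES ('example-comps-run', '2024', 'pin-9', 'pin-1');
""")

# Joining on assessment_year matches every run for that year, which is
# why a hand-picked run ID filter is needed to disambiguate.
by_year = conn.execute("""
    SELECT comp.pin, train.run_id
    FROM comp
    LEFT JOIN training_data AS train
        ON comp.assessment_year = train.assessment_year
        AND comp.comp_pin = train.meta_pin
""").fetchall()

# Joining on run_id matches only the training data for the comps run itself.
by_run_id = conn.execute("""
    SELECT comp.pin, train.run_id
    FROM comp
    LEFT JOIN training_data AS train
        ON comp.run_id = train.run_id
        AND comp.comp_pin = train.meta_pin
""").fetchall()

print(len(by_year), len(by_run_id))
```

In this toy setup the year-based join returns one row per run in the assessment year, while the run_id join returns a single row, so no hard-coded run list is required.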

Tasks include:

  • Add run_type = 'comps' to the conditional in the model.training_data query
  • Rerun model.training_data locally and confirm that it adds rows for the two comps runs (2024 and 2025)
  • Adjust the docs for model.training_data to clarify that comps runs should also be included
  • Refactor the pinval.vw_comp view to remove the WHERE clause in this CTE and join on run_id instead of assessment_year
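
The first two tasks might look roughly like the sketch below. It assumes the filter becomes an `IN` list covering both run types, and it checks the result against a toy `model.metadata` table in an attached in-memory SQLite database. The sample rows and run IDs are made up for illustration; the real check would be run against the actual table.

```python
import sqlite3

# Proposed filter: include comps runs alongside final runs.
base_query = """
    SELECT
        run_id,
        year,
        assessment_year,
        dvc_md5_training_data
    FROM model.metadata
    WHERE run_type IN ('final', 'comps')
"""

# Hypothetical sample rows: (run_id, year, assessment_year, md5, run_type).
rows = [
    ("example-final-run", "2024", "2024", "md5-a", "final"),
    ("example-comps-run", "2024", "2024", "md5-b", "comps"),
    ("example-test-run", "2024", "2024", "md5-c", "test"),
]

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database named `model` so that the
# schema-qualified name `model.metadata` resolves as written.
conn.execute("ATTACH DATABASE ':memory:' AS model")
conn.execute(
    """
    CREATE TABLE model.metadata (
        run_id TEXT,
        year TEXT,
        assessment_year TEXT,
        dvc_md5_training_data TEXT,
        run_type TEXT
    )
    """
)
conn.executemany("INSERT INTO model.metadata VALUES (?, ?, ?, ?, ?)", rows)

# Only the final and comps runs should come back.
selected_run_ids = sorted(row[0] for row in conn.execute(base_query))
print(selected_run_ids)
```

An `IN` list keeps the condition in one place if further run types ever need to be included, although an `OR` on `run_type` would behave the same way.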

Adapted from #864 (comment).
