The `model.training_data` table is currently restricted to only final model runs (`data-architecture/dbt/models/model/model.training_data.py`, lines 14 to 23 in 236e8cc):

```python
# Build the base metadata DataFrame
base_query = """
    SELECT
        run_id,
        year,
        assessment_year,
        dvc_md5_training_data
    FROM model.metadata
    WHERE run_type = 'final'
"""
```

However, it would also be useful to include training data for comp runs, which would greatly reduce the maintenance burden of this CTE in the `pinval.vw_comp` view (`data-architecture/dbt/models/pinval/pinval.vw_comp.sql`, lines 39 to 55 in 236e8cc):

```sql
training_data AS (
    SELECT
        train.*,
        meta.model_predictor_all_name
    FROM {{ ref('model.training_data') }} AS train
    LEFT JOIN {{ source('model', 'metadata') }} AS meta
        ON train.run_id = meta.run_id
    -- Currently the `model.training_data` table only includes training data
    -- for final model runs, not comp runs, so we can only use final model runs
    -- here. Further, we have to make a manual decision about which final model
    -- run has training data that matches the comp run for assessment years
    -- that have multiple final models.
    WHERE train.run_id IN (
        '2024-03-17-stupefied-maya',
        '2025-02-11-charming-eric'
    )
),
```

Tasks include:
- Add `run_type = 'comps'` to the conditional in the `model.training_data` query
- Build `model.training_data` locally and confirm that it adds rows for the two comps runs (2024 and 2025)
- Update the documentation for `model.training_data` to clarify that comps runs should also be included
- Update the `pinval.vw_comp` view to remove the `WHERE` clause in this CTE and adjust the join to join by `run_id` instead of `assessment_year`

Adapted from #864 (comment).
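
As a rough illustration of the first two tasks, the sketch below shows one way the filter might be widened and checked. It is not the actual implementation: it uses an in-memory SQLite database as a stand-in for the real warehouse, and the run IDs, the `test` run type, and the column values are all hypothetical placeholders.

```python
import sqlite3

# Proposed filter: include comps runs alongside final runs.
base_query = """
    SELECT
        run_id,
        year,
        assessment_year,
        dvc_md5_training_data
    FROM model.metadata
    WHERE run_type IN ('final', 'comps')
"""

# In-memory SQLite stand-in for the real warehouse (assumption for
# illustration only). Attaching a database named `model` lets the
# schema-qualified name `model.metadata` resolve.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS model")
conn.execute(
    """
    CREATE TABLE model.metadata (
        run_id TEXT,
        run_type TEXT,
        year TEXT,
        assessment_year TEXT,
        dvc_md5_training_data TEXT
    )
    """
)

# Hypothetical rows: one final run, one comps run, and one run of another
# type that should still be excluded by the filter.
conn.executemany(
    "INSERT INTO model.metadata VALUES (?, ?, ?, ?, ?)",
    [
        ("hypothetical-final-2024", "final", "2024", "2024", "md5-a"),
        ("hypothetical-comps-2024", "comps", "2024", "2024", "md5-b"),
        ("hypothetical-test-2024", "test", "2024", "2024", "md5-c"),
    ],
)

rows = conn.execute(base_query).fetchall()
included_run_ids = sorted(row[0] for row in rows)
print(included_run_ids)
```

Using `IN ('final', 'comps')` rather than chaining `OR` conditions keeps the filter easy to extend if other run types ever need training data.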