[COST-7249] Phase 2: Rate Calculation Isolation — RatesToUsage pipeline#6017
[COST-7249] Phase 2: Rate Calculation Isolation — RatesToUsage pipeline#6017jordigilh wants to merge 2 commits intoproject-koku:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements Phase 2 of the cost breakdown project, introducing the RatesToUsage table and a new SQL-based pipeline for OCP cost calculations. Key changes include normalizing JSON rates into a dedicated Rate table, updating the cost model manager for diff-based synchronization, and integrating the RatesToUsage logic into the OCPCostModelCostUpdater. Feedback identifies a critical logic error where the aggregation step is positioned outside the month loop, which would result in missing data for multi-month ranges. Furthermore, multiple SQL templates require single quotes around Jinja placeholders for date, string, and UUID literals to prevent syntax errors. A design concern was also raised regarding the level of aggregation in the RatesToUsage table, which may need to be more granular to support future per-rate breakdown requirements.
| SELECT uuid_generate_v4(), {{cost_model_id}}, {{report_period_id}}, {{source_uuid}}, | ||
| b.usage_start, b.usage_start, b.node, b.namespace, b.cluster_id, b.cluster_alias, | ||
| b.data_source, b.persistentvolumeclaim, b.pod_labels, b.volume_labels, b.all_labels, b.label_hash, | ||
| 'cpu_core_usage_per_hour', 'cpu', {{rate_type}}, | ||
| NULL, b.cpu_usage_hours * {{cpu_core_usage_per_hour}}, b.cost_category_id | ||
| FROM base b WHERE {{cpu_core_usage_per_hour}} != 0 |
There was a problem hiding this comment.
Missing quotes around Jinja placeholders for UUID and string values ({{cost_model_id}}, {{source_uuid}}, {{rate_type}}). This issue persists across all UNION ALL components in this file.
SELECT uuid_generate_v4(), '{{cost_model_id}}', {{report_period_id}}, '{{source_uuid}}',
b.usage_start, b.usage_start, b.node, b.namespace, b.cluster_id, b.cluster_alias,
b.data_source, b.persistentvolumeclaim, b.pod_labels, b.volume_labels, b.all_labels, b.label_hash,
'cpu_core_usage_per_hour', 'cpu', '{{rate_type}}',
NULL, b.cpu_usage_hours * {{cpu_core_usage_per_hour}}, b.cost_category_id
FROM base b WHERE {{cpu_core_usage_per_hour}} != 0There was a problem hiding this comment.
False positive — same JinjaSql parameterization applies. {{cost_model_id}}, {{source_uuid}}, and {{rate_type}} are all rendered as %(param)s placeholders by JinjaSql and quoted by psycopg2 at execution time. No manual quotes needed.
New Risks Identified: R20 & R21@myersCody — Two new risks surfaced during an adversarial audit of the Phase 2 implementation. Full details are in the risk register (v1.2). Summary below for your review. R20 — Aggregation DELETE Scope Too Broad for Phase 2Problem: Options (details in risk register):
Recommendation: Option A or C. R21 — Transitional VM Cost Handling (Phase 2 → 3)Problem: Resolution: R21 resolves mechanically once R20 is decided — each R20 option dictates exactly what happens to the VM cost path. Also in this pushFinding B fix (commit |
R20 & R21 — Resolved (Option D: Reorder orchestration)@myersCody — Follow-up to my earlier comment. Both risks are now mitigated. The approach was derived directly from your feedback on the tech design PR. R20: Aggregation DELETE scopeRoot cause: Fix: Moved the aggregation to run before the legacy direct-write paths. The DELETE now only removes stale rows from the previous cycle, and legacy VM/tag paths write their rows after — unaffected. Why this approach (from your PR #5948 reviews):
R21: Transitional VM cost handlingAlready resolved on this branch: The original
With the R20 orchestration reorder, New orchestration order (Phase 2)Full decision rationale with code references in risk-register.md (v1.2, R20 § "Why Option D" and R21 § "Resolution"). Commits in this push:
|
Phase 2 Drift Fixes — Post-Preflight ExecutionThree drift fixes committed, resolving the remaining items from the adversarial audit: Commit 1:
|
c9b388b to
545ae8d
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #6017 +/- ##
=======================================
- Coverage 94.3% 94.3% -0.0%
=======================================
Files 361 361
Lines 31805 31909 +104
Branches 3484 3494 +10
=======================================
+ Hits 30007 30101 +94
Misses 1170 1170
- Partials 628 638 +10 🚀 New features to boost your workflow:
|
myersCody
left a comment
There was a problem hiding this comment.
The current structure makes it really hard to validate through our integration tests. Overall, mostly good structure. I would like to see us utilize the rate table more in the sql though.
| # cycle. Legacy paths write their rows AFTER this step and are unaffected. | ||
| self._aggregate_rates_to_daily_summary(start_date, end_date) | ||
|
|
||
| self._update_usage_costs(start_date, end_date) |
There was a problem hiding this comment.
We won't be able to utilize our integration tests to confirm your changes with this current structure.
self._aggregate_rates_to_daily_summary(start_date, end_date) -> populates daily summary with cost_model_cost_type = Infrastructure & Supplementary
elf._update_usage_costs(start_date, end_date) -> Deletes Infrastructure & Supplementary
Reinserts Infrastructure and Supplementary.
Essentially we are doing twice the work at the moment with zero gain and no way to confirm functionality outside of the validate_ script that you wrote. Not that I don't trust your validate script, I would just like the ability to use integrated tests instead.
What we can do is add an unleash flag to conditionally turn on your feature pathway for integration tests:
If feature_enabled:
self._aggregate_rates_to_daily_summary(start_date, end_date)
else:
self._update_usage_costs(start_date, end_date)
There was a problem hiding this comment.
Sounds good — here's the flag proposal for the gating:
Flag name: cost-management.backend.cost_breakdown_rates_to_usage
Constant: COST_BREAKDOWN_RTU_UNLEASH_FLAG in masu/processor/__init__.py
Following the GPU flag pattern (dev_fallback=True):
- SaaS: off by default, enabled per-schema via Unleash for integration testing
- Dev: on by default via
fallback_development_true - On-prem: off until promoted to
MockUnleashClient.ONPREM_FLAG_DEFAULTSafter validation
Gating follows your either/or suggestion — when enabled, the RTU pipeline replaces the legacy _update_usage_costs path entirely (no dual-write):
rtu_enabled = is_feature_flag_enabled_by_schema(
self._schema, COST_BREAKDOWN_RTU_UNLEASH_FLAG, dev_fallback=True
)
if rtu_enabled:
self._update_usage_rates_to_usage(start_date, end_date)
self._aggregate_rates_to_daily_summary(start_date, end_date)
else:
self._update_usage_costs(start_date, end_date)This way existing integration tests run the legacy path by default, and we can flip the flag per-schema to exercise the RTU pipeline independently.
Also bundled in this update: the SQL refactor you suggested (comments 4+5) — rate_names CTE now JOINs cost_model_rate for default_rate and cost_type directly, eliminating 11 per-metric Jinja params and the two-call Infrastructure/Supplementary loop (single-pass). cluster_cost_per_hour remains as the sole rate Jinja param for the cte_node_cost pre-computation. DELETE is merged into the INSERT file, and all RTU SQL is under usage_rates/.
There was a problem hiding this comment.
So, I traced down the _update_usage_costs path, and I saw:
report_accessor.populate_vm_usage_costs(
report_type,
filter_dictionary(report_type_dict, metric_constants.COST_MODEL_VM_USAGE_RATES),
start_date,
end_date,
self._provider.uuid,
report_period_id,
)
Your:
self._update_usage_rates_to_usage(start_date, end_date)
self._aggregate_rates_to_daily_summary(start_date, end_date)
Does not calculate the VM rates. Which is likely the source of your failing integration tests.
You can decouple the populate_vm_usage_costs and run it independently of the usage costs and that may get your integration tests passing.
There was a problem hiding this comment.
Exactly right — this was the root cause. Extracted _update_vm_usage_costs() as a standalone method and wired it into the RTU path after aggregation (commit ce7fcd2). The orchestration is now:
if rtu_enabled and cost_model_id:
_update_usage_rates_to_usage() # RTU INSERT
_aggregate_rates_to_daily_summary() # DELETE+INSERT from RTU
_update_vm_usage_costs() # VM costs (decoupled)
else:
_update_usage_costs() # Legacy (includes VM)
Also added _cleanup_stale_rtu_costs() for the edge case where rtu_enabled=True but the cost model has been removed — deletes orphaned rows from both daily_summary and rates_to_usage.
| SELECT uuid_generate_v4(), {{cost_model_id}}, {{report_period_id}}, {{source_uuid}}, | ||
| b.usage_start, b.usage_start, b.node, b.namespace, b.cluster_id, b.cluster_alias, | ||
| b.data_source, b.persistentvolumeclaim, b.pod_labels, b.volume_labels, b.all_labels, b.label_hash, | ||
| COALESCE(rn.custom_name, 'cpu_core_usage_per_hour'), 'cpu', {{rate_type}}, |
There was a problem hiding this comment.
COALESCE(rn.custom_name, 'cpu_core_usage_per_hour')I would prefer to usern.metricinstead of a hard coded vale. Less chance of a typo making it to production in the future.- Instead of
cpucould we just use themetric_typefield from rate table? - {{rate_type}} can be replaced with
cost_typefrom the rate field.
Utilizing the rate_type from the rate field means that we don't have to run this SQL twice, one for infrastructure and one for supplementary.
There was a problem hiding this comment.
All three addressed:
COALESCE(rn.custom_name, rn.metric)— all 11 components now fall back torn.metricinstead of hardcoded metric names (commite7e3c3e).metric_type— addedr.metric_typeto therate_namesCTE and replaced hardcoded'cpu'/'memory'/'storage'withrn.metric_typefor Components 1-3 and 7-11 (commit45bf072). Components 4-5 (node/cluster core) keep'cpu'because the Rate table stores'node'/'cluster'but the aggregation SQL routes costs viametric_type IN ('cpu','memory','storage')intocost_model_*_costcolumns. Component 6 (cluster_cost_per_hour) keeps theCASE WHEN distributionexpression since its target column is distribution-dependent.{{rate_type}}eliminated —rn.cost_typeis read directly from the Rate table via therate_namesCTE (commit8991953). This also eliminated the two-call loop (Infra/Supplementary in a single pass).
26777f3 to
2f2cd73
Compare
Post-Review Polish: Audit Findings SummaryFollowing a comprehensive adversarial audit of the entire PR (security, correctness, performance, design quality, maintainability), these two commits address the identified findings for long-term maintenance. All changes are backward-compatible and non-functional in the happy path. Commit 1:
|
myersCody
left a comment
There was a problem hiding this comment.
There are some failing regressions with your integration tests. Setting dev_fallback=True means that when the integration tests run turn the unleash flag on. Therefore, your changes is what is causing those test failures.
| SELECT uuid_generate_v4(), {{cost_model_id}}, {{report_period_id}}, {{source_uuid}}, | ||
| b.usage_start, b.usage_start, b.node, b.namespace, b.cluster_id, b.cluster_alias, | ||
| b.data_source, b.persistentvolumeclaim, b.pod_labels, b.volume_labels, b.all_labels, b.label_hash, | ||
| COALESCE(rn.custom_name, 'cpu_core_usage_per_hour'), 'cpu', rn.cost_type, |
There was a problem hiding this comment.
| COALESCE(rn.custom_name, 'cpu_core_usage_per_hour'), 'cpu', rn.cost_type, | |
| COALESCE(rn.custom_name, rn.metric), 'cpu', rn.cost_type, |
There was a problem hiding this comment.
Applied all 11 — every COALESCE(rn.custom_name, '<hardcoded>') is now COALESCE(rn.custom_name, rn.metric) (commit e7e3c3e).
| SELECT uuid_generate_v4(), {{cost_model_id}}, {{report_period_id}}, {{source_uuid}}, | ||
| b.usage_start, b.usage_start, b.node, b.namespace, b.cluster_id, b.cluster_alias, | ||
| b.data_source, b.persistentvolumeclaim, b.pod_labels, b.volume_labels, b.all_labels, b.label_hash, | ||
| COALESCE(rn.custom_name, 'cpu_core_request_per_hour'), 'cpu', rn.cost_type, |
There was a problem hiding this comment.
| COALESCE(rn.custom_name, 'cpu_core_request_per_hour'), 'cpu', rn.cost_type, | |
| COALESCE(rn.custom_name, rn.metric), 'cpu', rn.cost_type, |
| SELECT uuid_generate_v4(), {{cost_model_id}}, {{report_period_id}}, {{source_uuid}}, | ||
| b.usage_start, b.usage_start, b.node, b.namespace, b.cluster_id, b.cluster_alias, | ||
| b.data_source, b.persistentvolumeclaim, b.pod_labels, b.volume_labels, b.all_labels, b.label_hash, | ||
| COALESCE(rn.custom_name, 'cpu_core_effective_usage_per_hour'), 'cpu', rn.cost_type, |
There was a problem hiding this comment.
| COALESCE(rn.custom_name, 'cpu_core_effective_usage_per_hour'), 'cpu', rn.cost_type, | |
| COALESCE(rn.custom_name, rn.metric), 'cpu', rn.cost_type, |
| SELECT uuid_generate_v4(), {{cost_model_id}}, {{report_period_id}}, {{source_uuid}}, | ||
| b.usage_start, b.usage_start, b.node, b.namespace, b.cluster_id, b.cluster_alias, | ||
| b.data_source, b.persistentvolumeclaim, b.pod_labels, b.volume_labels, b.all_labels, b.label_hash, | ||
| COALESCE(rn.custom_name, 'node_core_cost_per_hour'), 'cpu', rn.cost_type, |
There was a problem hiding this comment.
| COALESCE(rn.custom_name, 'node_core_cost_per_hour'), 'cpu', rn.cost_type, | |
| COALESCE(rn.custom_name, rn.metric), 'cpu', rn.cost_type, |
| SELECT uuid_generate_v4(), {{cost_model_id}}, {{report_period_id}}, {{source_uuid}}, | ||
| b.usage_start, b.usage_start, b.node, b.namespace, b.cluster_id, b.cluster_alias, | ||
| b.data_source, b.persistentvolumeclaim, b.pod_labels, b.volume_labels, b.all_labels, b.label_hash, | ||
| COALESCE(rn.custom_name, 'cluster_core_cost_per_hour'), 'cpu', rn.cost_type, |
There was a problem hiding this comment.
| COALESCE(rn.custom_name, 'cluster_core_cost_per_hour'), 'cpu', rn.cost_type, | |
| COALESCE(rn.custom_name, rn.metric), 'cpu', rn.cost_type, |
| SELECT uuid_generate_v4(), {{cost_model_id}}, {{report_period_id}}, {{source_uuid}}, | ||
| b.usage_start, b.usage_start, b.node, b.namespace, b.cluster_id, b.cluster_alias, | ||
| b.data_source, b.persistentvolumeclaim, b.pod_labels, b.volume_labels, b.all_labels, b.label_hash, | ||
| COALESCE(rn.custom_name, 'memory_gb_effective_usage_per_hour'), 'memory', rn.cost_type, |
There was a problem hiding this comment.
| COALESCE(rn.custom_name, 'memory_gb_effective_usage_per_hour'), 'memory', rn.cost_type, | |
| COALESCE(rn.custom_name, rn.metric), 'memory', rn.cost_type, |
| SELECT uuid_generate_v4(), {{cost_model_id}}, {{report_period_id}}, {{source_uuid}}, | ||
| b.usage_start, b.usage_start, b.node, b.namespace, b.cluster_id, b.cluster_alias, | ||
| b.data_source, b.persistentvolumeclaim, b.pod_labels, b.volume_labels, b.all_labels, b.label_hash, | ||
| COALESCE(rn.custom_name, 'storage_gb_usage_per_month'), 'storage', rn.cost_type, |
There was a problem hiding this comment.
| COALESCE(rn.custom_name, 'storage_gb_usage_per_month'), 'storage', rn.cost_type, | |
| COALESCE(rn.custom_name, rn.metric), 'storage', rn.cost_type, |
| SELECT uuid_generate_v4(), {{cost_model_id}}, {{report_period_id}}, {{source_uuid}}, | ||
| b.usage_start, b.usage_start, b.node, b.namespace, b.cluster_id, b.cluster_alias, | ||
| b.data_source, b.persistentvolumeclaim, b.pod_labels, b.volume_labels, b.all_labels, b.label_hash, | ||
| COALESCE(rn.custom_name, 'storage_gb_request_per_month'), 'storage', rn.cost_type, |
There was a problem hiding this comment.
| COALESCE(rn.custom_name, 'storage_gb_request_per_month'), 'storage', rn.cost_type, | |
| COALESCE(rn.custom_name, rn.metric), 'storage', rn.cost_type, |
| lids.cost_category_id | ||
| ), | ||
|
|
||
| rate_names AS ( |
There was a problem hiding this comment.
Eventually there could be more than one price list per cost model. We need to choose the price list here that is the highest priority for the start & end date passed in.
There was a problem hiding this comment.
Acknowledged — the current rate_names CTE joins through price_list_cost_model_map without priority filtering, which works today because there's only one price list per cost model. When COST-575 lands the multi-price-list lifecycle support, this CTE will need to filter by pcm.priority and pl.effective_start_date <= start_date AND (pl.effective_end_date IS NULL OR pl.effective_end_date >= end_date). Tracking as a follow-up for when that schema is available.
There was a problem hiding this comment.
That schema is already available. The only thing stoping 575 from landing right now is the UI work and final verification, so I think it would be better to adjust the SQL now to handle more than one price list.
There was a problem hiding this comment.
Addressed in commit e4c7add — added an effective_price_list CTE that resolves the active price list by effective_start_date/effective_end_date date range and priority, then scopes rate_names to that single price list. This handles the multi-price-list scenario from 575.
| SELECT uuid_generate_v4(), {{cost_model_id}}, {{report_period_id}}, {{source_uuid}}, | ||
| b.usage_start, b.usage_start, b.node, b.namespace, b.cluster_id, b.cluster_alias, | ||
| b.data_source, b.persistentvolumeclaim, b.pod_labels, b.volume_labels, b.all_labels, b.label_hash, | ||
| COALESCE(rn.custom_name, 'cpu_core_usage_per_hour'), 'cpu', rn.cost_type, |
There was a problem hiding this comment.
We added a metric_type column to Rate table. Why are we hardcoding cpu, memory or storage for all of the select statement in order to insert the metric_type? I find that a bit confusing.
There was a problem hiding this comment.
Fixed in commit 45bf072 — added r.metric_type to the rate_names CTE and replaced hardcoded values with rn.metric_type for 8 of the 11 components (1-3, 7-11). The remaining 3 components (4, 5, 6) keep explicit values because the Rate table stores 'node'/'cluster' for those metrics, but the aggregation SQL filters on metric_type IN ('cpu', 'memory', 'storage') to route costs into the daily summary cost_model_*_cost columns. Using rn.metric_type there would silently drop those rows from aggregation.
efd3aae to
058ed69
Compare
|
/retest |
1 similar comment
|
/retest |
Add r.metric_type to the rate_names CTE and replace hardcoded
'cpu'/'memory'/'storage' with rn.metric_type in Components 1-3, 7-11.
Components 4-5 (node_core_cost_per_hour, cluster_core_cost_per_hour) keep
'cpu' because the Rate table stores 'node'/'cluster' but the aggregation
SQL routes costs via metric_type IN ('cpu','memory','storage') into the
daily summary cost_model_*_cost columns.
Component 6 (cluster_cost_per_hour) keeps the CASE WHEN distribution
expression since its metric_type is distribution-dependent.
Addresses TL review feedback on PR project-koku#6017.
Made-with: Cursor
|
Note on Changed the Unleash check from Why: |
|
/retest |
|
Tested with full smokes with just 1 failure unrelated to the PR -> we can switch to hot-fix-smokes tests for rebase run (in case the PR itself is updated, we will need a different label). |
e8fc443 to
e4c7add
Compare
| ) | ||
|
|
||
| rtu_enabled = is_feature_flag_enabled_by_schema( | ||
| self._schema, COST_BREAKDOWN_RTU_UNLEASH_FLAG, dev_fallback=False |
There was a problem hiding this comment.
So, I chatted with QE during our boardwalk today and the preference it to set the deve fallback=True. This means all smoke test jobs moving forward will be using your flow.
We have stage running the old flow for any regressions.
There was a problem hiding this comment.
Done — flipped dev_fallback=True so all smoke tests use the RTU flow going forward.
| ] | ||
|
|
||
| operations = [ | ||
| migrations.RemoveIndex( |
There was a problem hiding this comment.
These indexes are added in the previous migration, we could jsut set the migrations we want in the original migration 0348.
There was a problem hiding this comment.
Good call — squashed the composite index directly into 0348 and removed 0349.
9d33130 to
4ba9477
Compare
|
koku-ci timed out after ~6h during the data ingestion setup phase — the actual smoke tests never ran (still at 0% progress). The last activity was polling This is a resource/time constraint of /retest |
@jordigilh The PR check koku-ci-hrlz6 was actually not aborted - Note that the logs in the konflux UI are truncated (so it is def. confusing) I downloaded the results of that pr-check and see the following failures directly related to you PR: These failures indicate that the cost calculations are not as expected. |
4ba9477 to
da15ea8
Compare
Add r.metric_type to the rate_names CTE and replace hardcoded
'cpu'/'memory'/'storage' with rn.metric_type in Components 1-3, 7-11.
Components 4-5 (node_core_cost_per_hour, cluster_core_cost_per_hour) keep
'cpu' because the Rate table stores 'node'/'cluster' but the aggregation
SQL routes costs via metric_type IN ('cpu','memory','storage') into the
daily summary cost_model_*_cost columns.
Component 6 (cluster_cost_per_hour) keeps the CASE WHEN distribution
expression since its metric_type is distribution-dependent.
Addresses TL review feedback on PR project-koku#6017.
Made-with: Cursor
2256f40 to
acd4e74
Compare
…, and tests Introduces the RatesToUsage table and pipeline for cost model calculations: - RatesToUsage model and migration (M4) with composite index - SQL pipeline: insert per-rate rows, aggregate to daily summary - Orchestration: feature-flagged RTU path in OCPCostModelCostUpdater - Unleash flag: cost-management.backend.cost_breakdown_rates_to_usage - Rate sync logic extracted to rate_sync.py with O(n^2) dedup fix - Custom name generation consolidated into _resolve_custom_name - Full test suite: unit, integration, behavioral, and e2e - Legacy-path tests updated to disable RTU flag Made-with: Cursor
acd4e74 to
7f76f7d
Compare
…nd orchestration
Root cause: label_hash collision in RTU aggregate SQL caused join expansion
and inflated costs (~38% higher). The md5 hash concatenated pod_labels,
volume_labels, and all_labels without delimiters, so rows differing only
in which label field was NULL vs '{}' produced identical hashes.
Fix: add '|' delimiter between label fields in md5() across
insert_usage_rates_to_usage.sql, aggregate_rates_to_daily_summary.sql,
and validate_rates_against_daily_summary.sql.
Additional fixes:
- Tighten base CTE filter to exclude rows with existing cost_model_rate_type
- Fix distribution lookup: read from cost_model table in SQL
- Read metric_type and cost_type from Rate table directly
- Extract _update_vm_usage_costs and call after RTU aggregation
- Factor cluster_cost_per_hour into distribution-aware component
- Enable dev_fallback=True for RTU pipeline in dev/test
Verified: both IQE raw_calc tests pass with dev_fallback=True.
Co-authored-by: Cursor <cursoragent@cursor.com>
0bec8fb to
137f77c
Compare
RCA: RTU Pipeline Cost Inflation with
|
| pod_labels | volume_labels | all_labels | Concatenation | Hash |
|---|---|---|---|---|
NULL → '' |
'{}' |
NULL → '' |
'{}' |
abc123 |
NULL → '' |
NULL → '' |
'{}' |
'{}' |
abc123 ← collision |
Two semantically distinct rows produced the same label_hash. During the LEFT JOIN in aggregation, a single rates_to_usage row matched multiple base rows, inflating costs by 1.5x.
Fix
Added | delimiter between label fields to prevent boundary collisions:
-- AFTER (fixed):
md5(COALESCE(lids.pod_labels::text, '')
|| '|' || COALESCE(lids.volume_labels::text, '')
|| '|' || COALESCE(lids.all_labels::text, '')) AS label_hashApplied consistently across:
insert_usage_rates_to_usage.sqlaggregate_rates_to_daily_summary.sqlvalidate_rates_against_daily_summary.sql
Verification
Both IQE tests pass with dev_fallback=True:
========= 2 passed, 11369 deselected, 16 warnings in 475.70s (0:07:55) =========
Added full-run-smoke-tests label for full CI validation.
|
/retest |
Summary
PR 2 of 5 in the COST-7249 phased delivery plan.
usage_costs.sqlandCostModel.ratesJSONWhat this PR does
Introduces the RatesToUsage table and pipeline, which replaces the direct-write path in
usage_costs.sqlwith a two-step approach:rates_to_usage(partitioned by month, one row per rate × namespace × node × day at full label granularity)reporting_ocpusagelineitem_daily_summarycolumns (cost_model_cpu_cost,cost_model_memory_cost,cost_model_volume_cost)This makes
rates_to_usagethe single source of truth for cost model calculations, enabling per-rate cost breakdown in Phases 3-4.Pipeline flow
Artifacts
RatesToUsagemodelreporting/provider/ocp/models.pylabel_hashfor performant GROUP BYreporting/migrations/0345_create_rates_to_usage.py(source_uuid, usage_start)andlabel_hashmasu/database/sql/openshift/cost_model/insert_usage_rates_to_usage.sqlusage_costs.sqldirect-writemasu/database/sql/openshift/cost_model/aggregate_rates_to_daily_summary.sqlcost_model_*_costcolumnsmasu/database/sql/openshift/cost_model/delete_rates_to_usage.sqlmasu/database/sql/openshift/cost_model/validate_rates_against_daily_summary.sqlmasu/processor/ocp/ocp_cost_model_cost_updater.py_update_usage_rates_to_usage()and_aggregate_rates_to_daily_summary()methods; wired intoupdate_summary_cost_model_costs()masu/database/ocp_report_db_accessor.pypopulate_usage_rates_to_usage(),aggregate_rates_to_daily_summary(),delete_rates_to_usage()masu/database/cost_model_db_accessor.pyprice_listproperty reads from Rate table instead of JSONmasu/processor/ocp/ocp_report_db_cleaner.pyrates_to_usageadded to partition cleanupocp_cost_model_cost_updater.py_ensure_rates_to_usage_partitions()called before RTU writesmasu/test/processor/ocp/test_phase2_rates_to_usage.pyDesign documentation
Dependencies
Test plan
test_phase2_rates_to_usage.py) passesvalidate_rates_against_daily_summary.sql) returns zero rows (no divergence)rates_to_usagepartitions created and cleaned by purge0345applies cleanly on fresh and existing databasesMade with Cursor