
Conversation

@Raviikumar001

closes #3336

Summary

  • Add optional precision/scale overrides for unbounded Postgres NUMERIC → ClickHouse Decimal mapping.
  • Proto: introduce FlowConnectionConfigs fields clickhouse_numeric_default_precision and clickhouse_numeric_default_scale.
  • Config: support env-based overrides PEERDB_CLICKHOUSE_NUMERIC_DEFAULT_PRECISION and PEERDB_CLICKHOUSE_NUMERIC_DEFAULT_SCALE.
  • Validation: enforce ranges and dependency (precision 1–76, scale 0–precision; scale requires precision).
  • Wiring: thread overrides through normalize, CDC schema deltas, Avro schema/value conversion, and query generation.
  • Model: update qvalue functions (DetermineNumericSettingForDWH, ToDWHColumnType, GetNumericDestinationType) to accept overrides; bounded NUMERIC unchanged.
  • UI: fix CDCConfig to use camelCase fields (clickhouseNumericDefaultPrecision, clickhouseNumericDefaultScale) and update handlers.
  • Rust: update the FlowConnectionConfigs initializer to include the new optional fields.
  • Tests: add unit tests for override behavior; add ClickHouse e2e test validating Decimal(60,10) mapping for unbounded NUMERIC.
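The validation rules in the summary (precision 1–76, scale no greater than precision, scale requiring precision) can be sketched as below. The function name is illustrative, not the actual PeerDB symbol:

```go
package main

import "fmt"

// validateNumericOverrides sketches the validation described above:
// precision must be in 1..76 (the ClickHouse Decimal maximum), scale must not
// exceed precision, and a scale override requires a precision override.
// A nil pointer means the override was not set.
func validateNumericOverrides(precision, scale *uint32) error {
	if precision == nil {
		if scale != nil {
			return fmt.Errorf("scale override requires a precision override")
		}
		return nil // no overrides: keep the default mapping
	}
	if *precision < 1 || *precision > 76 {
		return fmt.Errorf("precision %d out of range [1, 76]", *precision)
	}
	if scale != nil && *scale > *precision {
		return fmt.Errorf("scale %d exceeds precision %d", *scale, *precision)
	}
	return nil
}

func main() {
	p, s := uint32(60), uint32(10)
	fmt.Println(validateNumericOverrides(&p, &s)) // a valid Decimal(60, 10) override
	orphan := uint32(10)
	fmt.Println(validateNumericOverrides(nil, &orphan)) // error: scale without precision
}
```

Because scale is unsigned, the lower bound of the 0–precision range holds automatically; only the upper bound and the precision dependency need explicit checks.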

@CLAassistant

CLAassistant commented Nov 13, 2025

CLA assistant check
All committers have signed the CLA.

@serprex serprex requested a review from ilidemi November 13, 2025 23:21
@ilidemi ilidemi requested a review from jgao54 November 13, 2025 23:30
@jgao54
Copy link
Contributor

jgao54 commented Nov 14, 2025

@Raviikumar001 appreciate the initiative and contribution here.

Configurable decimal precision/scale for unbounded numerics is a great feature; however, it's important that it be implemented at the column level rather than via a global environment variable, to accommodate different columns with different precision/scale requirements. (Historically we have used env vars to store data-type state, and each time it has bitten us and we've regretted going down that path.)

In the codebase we already have the concept of "SupportedDestinationTypes". To support configurable decimal precision/scale, that would be the better entry point for this feature (i.e. introduce a NumericToNumericSchemaConversion and NumericToNumericValueConversion). That said, the current conversion system also lacks an end-to-end design, so I am not 100% sure how much of a lift it is to preserve precision/scale end-to-end without digging into the entire write path.

TL;DR is this piece is probably going to be a bigger project. If you have ideas on improving the architecture to make type conversion more extensible for scale/precision, would love to hear them.
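For illustration, the column-level entry point described above might look like the following minimal sketch. Only the names NumericToNumericSchemaConversion and SupportedDestinationTypes come from the discussion; the interface shape here is an assumption, not PeerDB's actual API:

```go
package main

import "fmt"

// SchemaConversion is a hypothetical interface for per-column type mapping.
type SchemaConversion interface {
	// DestinationType returns the concrete destination column type.
	DestinationType() string
}

// NumericToNumericSchemaConversion carries a per-column precision/scale, so
// two NUMERIC columns in the same table can map to different Decimal types.
type NumericToNumericSchemaConversion struct {
	Precision, Scale uint32
}

func (c NumericToNumericSchemaConversion) DestinationType() string {
	return fmt.Sprintf("Decimal(%d, %d)", c.Precision, c.Scale)
}

func main() {
	amount := NumericToNumericSchemaConversion{Precision: 60, Scale: 10}
	rate := NumericToNumericSchemaConversion{Precision: 18, Scale: 6}
	fmt.Println(amount.DestinationType(), rate.DestinationType())
	// prints: Decimal(60, 10) Decimal(18, 6)
}
```

The point of the column-scoped design is visible here: each column carries its own converter instance, so no process-wide state (env var or otherwise) is needed to decide the destination type.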

@Raviikumar001
Author

Raviikumar001 commented Nov 14, 2025


Thanks for your response, I really appreciate it. I'll need some time to think this through; I'll share my thoughts on your questions soon!

@Raviikumar001
Author

Raviikumar001 commented Nov 14, 2025


@jgao54
Thanks for the detailed feedback — I agree that env-level precision/scale isn’t the right abstraction and that this belongs in the column-level schema and conversion layer. To address this properly, I’m planning to split the work into two PRs to keep the scope reviewable and reduce risk.

Apologies for the lengthy comment 🙇‍♂️

PR 1 — Foundation (column-level spec + ClickHouse support)

Core changes:

  • Introduce a per-column DestinationTypeSpec (decimal { precision, scale, applyToUnboundedOnly }).
  • Add NumericToNumericSchemaConversion + NumericToNumericValueConversion for ClickHouse.
  • Persist the selected decimal spec in the catalog; all DDL, CDC schema updates, Avro/Arrow schema generation, and value conversion use this single source of truth.
  • Precedence:
    (1) column-level spec
    (2) bounded NUMERIC(p,s) typmod
    (3) legacy env defaults (env kept only as a fallback and marked deprecated)
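The precedence above can be sketched as a single resolution function. All names here are hypothetical, and the last-resort Decimal(76, 38) default is an assumption of this sketch, not a confirmed PeerDB value:

```go
package main

import "fmt"

// decimalSpec is a hypothetical per-column precision/scale pair.
type decimalSpec struct{ Precision, Scale uint32 }

// resolveDecimal applies the proposed precedence: (1) column-level spec,
// (2) bounded NUMERIC(p, s) typmod, (3) legacy env default, else a
// last-resort widest ClickHouse Decimal. A typmod precision <= 0 denotes
// an unbounded Postgres NUMERIC.
func resolveDecimal(columnSpec *decimalSpec, typmodPrecision, typmodScale int32, envDefault *decimalSpec) decimalSpec {
	if columnSpec != nil { // (1) column-level spec wins
		return *columnSpec
	}
	if typmodPrecision > 0 { // (2) bounded NUMERIC(p, s)
		return decimalSpec{uint32(typmodPrecision), uint32(typmodScale)}
	}
	if envDefault != nil { // (3) deprecated env fallback
		return *envDefault
	}
	return decimalSpec{76, 38} // unbounded, no overrides: widest Decimal
}

func main() {
	col := decimalSpec{60, 10}
	fmt.Println(resolveDecimal(&col, 18, 6, nil)) // column spec beats typmod
	fmt.Println(resolveDecimal(nil, 18, 6, nil))  // bounded typmod applies
	fmt.Println(resolveDecimal(nil, -1, -1, nil)) // unbounded fallback
}
```

Keeping the precedence in one function like this gives DDL, CDC schema deltas, and Avro generation a single source of truth to call into, which is the unification PR 1 aims for.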

Tests:

  • Unit tests for precedence handling, invalid ranges, bounded vs unbounded behavior.
  • One ClickHouse end-to-end case with different overrides across multiple columns.

Migration:

  • No breaking changes; existing mirrors continue using env defaults until column-level specs are added.

PR 2 — Expansion (UI, additional destinations, policies, deprecation)

Enhancements:

  • UI for configuring per-column Decimal precision/scale with validation.
  • Add converters for BigQuery (NUMERIC/BIGNUMERIC) and Snowflake.
  • Introduce optional rounding/overflow policy (default remains “fail”).
  • Deprecation path: warnings when env fallback is used + option to disable env fallback entirely.

Testing:

  • Broader cross-destination conversion matrix.
  • CDC schema delta tests involving decimal overrides.
  • Rounding and overflow behavior tests.

Splitting this into two PRs keeps the core architectural change (single, column-scoped decimal spec + schema/value unification) focused and reviewable, while the second PR handles UI and multi-destination extensions once the foundation is solid.

Please let me know your thoughts on this; I'm happy to adjust. Some of the testing mentioned in the testing sections above can't be done on my end (from my laptop), so I'd really appreciate your thoughts on my suggestions.
I'd be really grateful for the chance to make these contributions. Thanks!!

@jgao54
Contributor

jgao54 commented Nov 14, 2025

@Raviikumar001 thanks for your feedback; the general direction sounds right to me. Unfortunately, given our team's limited capacity at the moment, we would not be able to support architectural changes at this scale, mostly due to the state changes required across the board (temporal/db/avro).

It's probably not the best time to introduce this feature, but I expect we'll revisit it sometime next year.

@Raviikumar001
Author

Raviikumar001 commented Nov 15, 2025


Thanks for the clarification, that makes sense.
Given the architectural scope (temporal/db/avro) and the concern about env‑based type state, I’ll park this feature for now and won’t pursue the env‑based approach further.

I'll be sure to ping you next year about this if you plan to implement the feature. I'll be more skilled by then, and I'd love to contribute in any capacity. Thanks!!



Development

Successfully merging this pull request may close these issues.

Enhancement request for ClickHouse targets: Be able to specify the custom numeric precision and scale for mirrors
