Skip to content

Conversation

@Alex-Burmak
Copy link
Owner

@Alex-Burmak Alex-Burmak commented May 23, 2025

Summary by Sourcery

Overhaul the database migration workflow to eliminate the temporary database step and drive replication setup directly through Zookeeper transactions, while adding robust schema consistency checks and simplifying the CLI command logic.

New Features:

  • Implement direct creation of Zookeeper nodes and transactions for first and non-first replica setup without using a temporary database
  • Add a schema consistency check to prevent migration when local table metadata diverges from Zookeeper
  • Expose helper functions for reading local metadata and parsing shard/replica macros

Enhancements:

  • Refactor migrate_engine_command to remove temp DB creation and use NodeExistsError to distinguish first vs non-first replicas
  • Modularize migration steps into helper functions (_generate_counter, _create_query_node, _create_database_replica, etc.)
  • Unify metadata file handling via read_local_table_metadata and update change_table_uuid to return a change flag
  • Improve logging around Zookeeper transactions, errors, and migration steps

Tests:

  • Clean up database_migrate feature tests by removing redundant restarts and sleep steps
  • Add a scenario for migrating MergeTree tables where a schema change in one replica causes a controlled migration failure

@Alex-Burmak Alex-Burmak changed the title Test PR Test .sourcery.yaml May 23, 2025
@Alex-Burmak
Copy link
Owner Author

@sourcery-ai review

@sourcery-ai
Copy link

sourcery-ai bot commented May 23, 2025

Reviewer's Guide

This PR replaces the temporary-database migration approach with direct ZooKeeper transactions, reorganizes imports, introduces ZK helper functions and metadata utilities for schema consistency, updates the CLI command accordingly, and extends the BDD tests with restart steps and a new MergeTree scenario.

Sequence Diagram for First Replica ZooKeeper Interaction

sequenceDiagram
    participant M as migrate_as_first_replica
    participant KZC as KazooClient
    participant TXN as ZKTransaction
    participant DBMetadata as DatabaseMetadata
    participant CH as ClickHouse

    M->>KZC: Get client
    M->>KZC: transaction()
    KZC-->>TXN: ZKTransaction object
    M->>TXN: create("/clickhouse/db/first_replica_database_name", ...)
    M->>TXN: create(query_node_path, ...)
    M->>TXN: create(database_replica_path, ...)
    M->>TXN: create(table_metadata_in_zk_path, ...)
    M->>TXN: commit()
    TXN-->>M: Commit results
    M->>M: _update_local_metadata_first_replica(db)
    Note right of M: Calls DBMetadata.set_replicated()
    Note right of M: Calls DBMetadata.update_metadata_file()
    M->>CH: DETACH DATABASE db
    M->>CH: ATTACH DATABASE db
Loading

Class Diagram for Updated Metadata and Migration Logic

classDiagram
    class DatabaseMetadata {
        +database_name: string
        +database_engine: DatabaseEngine
        +replica_path: string
        +shard: string
        +replica_name: string
        +set_replicated()
        +update_metadata_file()
    }

    class migration_module {
        <<Module>>
        +migrate_as_first_replica(ctx, migrating_database)
        +migrate_as_non_first_replica(ctx, migrating_database)
        +create_database_nodes(ctx, migrating_database)
        +is_table_schema_equal(ctx, db, table, path) bool
        +get_shard_and_replica_from_macros(ctx) Tuple[str, str]
        -_update_local_metadata_first_replica(ctx, db)
        -_check_tables_consistent(ctx, db, tables_info)
        -_generate_counter(ctx, zk, db) string
        -_create_query_node(ctx, txn, db, counter)
        -_create_database_replica(ctx, txn, db)
        -_create_database_metadata_nodes(ctx, txn, db)
        -_attach_dbs(ctx, dbs)
        -_detach_dbs(ctx, dbs)
    }

    class table_module {
        <<Module>>
        +read_local_table_metadata(ctx, table_local_metadata_path) string
        +change_table_uuid(ctx, db, table, path, new_uuid, attached) bool
    }

    class KazooClient {
        <<External Library>>
        +transaction() TransactionRequest
        +create(path, value, sequence, ephemeral, makepath)
        +exists(path)
        +commit()
    }

    class TransactionRequest {
        <<External Library>>
        +create(path, value)
        +delete(path)
        +commit()
    }

    migration_module ..> DatabaseMetadata : uses
    migration_module ..> table_module : uses
    migration_module ..> KazooClient : uses
    migration_module ..> TransactionRequest : uses
Loading

File-Level Changes

Change Details Files
Reorganize imports and module dependencies
  • Removed unused imports and consolidated related functions
  • Added KazooClient, TransactionRequest, and ZK path utilities
  • Included replace_macros and get_macros for macro handling
ch_tools/chadmin/internal/migration.py
ch_tools/chadmin/internal/table.py
ch_tools/chadmin/cli/database_metadata.py
ch_tools/chadmin/internal/zookeeper.py
Refactor migration flows to use ZooKeeper transactions instead of a temporary database
  • Replaced create_temp_db logic with create_database_nodes
  • Rewrote migrate_as_first_replica and migrate_as_non_first_replica to open zk_client and use zk.transaction
  • Removed temp_db parameter and cleanup steps
ch_tools/chadmin/internal/migration.py
ch_tools/chadmin/cli/database_group.py
Add ZooKeeper helper functions for transactional node creation and validation
  • Introduce _generate_counter and _check_result_txn for transaction management
  • Implement node-creation helpers (_create_query_node, _create_database_replica, _create_database_metadata_nodes)
  • Provide create_database_nodes wrapper to bootstrap replicated database structure
ch_tools/chadmin/internal/migration.py
Implement metadata management and schema consistency checks
  • Add set_replicated in DatabaseMetadata to update engine and paths
  • Extract read_local_table_metadata and is_table_schema_equal for comparing local vs ZK metadata
  • Refactor change_table_uuid to return a flag and detect schema drift
  • Encapsulate attach/detach database operations into helpers
ch_tools/chadmin/cli/database_metadata.py
ch_tools/chadmin/internal/table.py
ch_tools/chadmin/internal/migration.py
Extend BDD tests with restart steps and a new MergeTree scenario
  • Inserted supervisorctl restart steps to ensure server reload
  • Added @require_version_24.8 MergeTree schema-change migration scenario
  • Removed redundant sleep steps and cleaned up existing scenarios
tests/features/database_migrate.feature

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Alex-Burmak - I've reviewed your changes - here's some feedback:

  • migration.py has grown into a very large file—consider splitting it into focused modules (e.g. ZK transaction builders, metadata comparators, and replica workflows) to improve readability and maintainability.
  • Several try/except blocks catch broad Exception only to re‐raise—narrow these to specific exceptions (e.g. NodeExistsError) to avoid masking unexpected errors and clarify your error flows.
  • The function _change_tables_uuid returns a boolean but its name suggests only an action—consider renaming it to convey its status return (e.g. has_tables_uuid_changed) for clearer intent.
Here's what I looked at during the review
  • 🟡 General issues: 6 issues found
  • 🟢 Security: all looks good
  • 🟢 Review instructions: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

logging.info("result contains NodeExistsError.")
raise NodeExistsError()
if isinstance(result, Exception):
logging.error("result contains ex={}, type=P{}.", result, type(result))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (typo): Fix logging placeholder typo in _check_result_txn

The format string should use 'type={}' instead of 'type=P{}'. Please correct the placeholder to avoid formatting errors.

Suggested change
logging.error("result contains ex={}, type=P{}.", result, type(result))
logging.error("result contains ex={}, type={}.", result, type(result))


first_replica = False
except Exception as ex:
logging.info("create_database_nodes failed with ex={}", type(ex))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Correct logging arguments for exception details

The log message expects one argument but receives type(ex). To log both the exception and its type, include both in the message or use an f-string for clarity.

Comment on lines +127 to +128
shard = replace_macros("{shard}", get_macros(ctx))
replica = replace_macros("{replica}", get_macros(ctx))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Use the loaded macros dict instead of calling get_macros again

Pass the existing macros variable to replace_macros instead of calling get_macros(ctx) again.

Comment on lines +71 to +76
if "Replicated" in row["engine"]:
logging.warning(
"Replicated table engine {} can have different schema. Continue.",
row["engine"],
)
continue
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Strengthen replicated-engine detection

Literal substring matching may fail if the engine string's case or format changes. Use lowercasing or a regex for more reliable detection of ReplicatedMergeTree and similar engines.

Suggested change
if "Replicated" in row["engine"]:
logging.warning(
"Replicated table engine {} can have different schema. Continue.",
row["engine"],
)
continue
import re
if re.search(r"replicated", row["engine"], re.IGNORECASE):
logging.warning(
"Replicated table engine {} can have different schema. Continue.",
row["engine"],
)
continue

Comment on lines +472 to +476
local_table_metadata = (
local_table_metadata[:start_pos_uuid] + local_table_metadata[finish_pos_uuid:]
)
zk_table_metadata = (
zk_table_metadata[:start_pos_uuid] + zk_table_metadata[finish_pos_uuid:]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Avoid hardcoding UUID length slicing in metadata comparison

Relying on a fixed UUID length may cause issues if the format changes. Use parsing or regex to handle the UUID portion more robustly.

Suggested implementation:

    import re

    metadata_prefix = "ATTACH TABLE _ UUID '"
    # Regex to match the UUID in the metadata string
    uuid_regex = re.compile(r"(ATTACH TABLE _ UUID ')([0-9a-fA-F-]{36})(')")

    def remove_uuid(metadata):
        return uuid_regex.sub(r"\1\3", metadata, count=1)
    local_table_metadata = remove_uuid(local_table_metadata)
    zk_table_metadata = remove_uuid(zk_table_metadata)

Comment on lines 555 to +564
ex,
)
sys.exit(1)


def read_local_table_metadata(ctx: Context, table_local_metadata_path: str) -> str:
if match_str_ch_version(get_version(ctx), "25.1"):
table_local_metadata_path = f"{CLICKHOUSE_PATH}/{table_local_metadata_path}"

with open(table_local_metadata_path, "r", encoding="utf-8") as f:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Extract version-specific path logic to a helper

Consider moving the version check and metadata-path prefix logic from both change_table_uuid and read_local_table_metadata into a shared utility function to avoid duplication.

Comment on lines +123 to +124
missing = [macro for macro in ["shard", "replica"] if macro not in macros]
if missing:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Suggested change
missing = [macro for macro in ["shard", "replica"] if macro not in macros]
if missing:
if missing := [
macro for macro in ["shard", "replica"] if macro not in macros
]:

Comment on lines +181 to +183
was_changed = _change_tables_uuid(ctx, tables_info, migrating_database)

if was_changed:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Suggested change
was_changed = _change_tables_uuid(ctx, tables_info, migrating_database)
if was_changed:
if was_changed := _change_tables_uuid(
ctx, tables_info, migrating_database
):

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to suppress code-quality suggestions "use-named-expression"?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can turn off that rule in two ways:

  1. Suppress it inline, for just this one spot, by adding a skip comment above the code you don’t want rewritten:
# sourcery skip: use-named-expression
if was_changed:
    _change_tables_uuid(ctx, tables_info, migrating_database)
  1. Disable it globally (or per-file) in your .sourcery.yaml. For example, to disable it project-wide:
# .sourcery.yaml
rules:
  use-named-expression: false

Or to disable it only for a specific file:

# .sourcery.yaml
files:
  ch_tools/chadmin/internal/migration.py:
    exclude_rules:
      - use-named-expression

Either approach will prevent Sourcery from suggesting that particular refactoring.

logging.info(
"New metadata for node from mapping table:\n{}\n===", target_metadata
)
txn.create(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Use f-string instead of string concatenation [×4] (use-fstring-for-concatenation)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to configure current github repository to suppress this type of code-quality suggestions?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can tune Sourcery’s behavior at the repo level by adding a config file in your project root (either .sourcery.yaml, .sourcery.yml or .sourcery.json) and disabling just the rule(s) you don’t want. For example, to silence only the “use-fstring-for-concatenation” suggestions:

# .sourcery.yaml
analysis:
  exclude-rules:
    - use-fstring-for-concatenation

If you’d rather turn off all “code-quality” suggestions, you can also filter by category:

# .sourcery.yaml
analysis:
  exclude-categories:
    - code-quality

After you commit that file to your default branch, Sourcery’s GitHub Action / Bot will pick it up automatically and stop reporting the excluded rules.

–––

If you need to suppress a single suggestion inline you can also sprinkle one-off skips in your code:

# sourcery skip: use-fstring-for-concatenation
some_var = "foo" + bar

But for a global repo-wide setting, the .sourcery.yaml method above is the way to go.

metadata_path = row["metadata_path"]

if match_str_ch_version(get_version(ctx), "25.1"):
metadata_path = CLICKHOUSE_PATH + "/" + metadata_path
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)

Suggested change
metadata_path = CLICKHOUSE_PATH + "/" + metadata_path
metadata_path = f"{CLICKHOUSE_PATH}/{metadata_path}"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants