Test .sourcery.yaml #5

Alex-Burmak · 2025-05-23T13:22:05Z

Summary by Sourcery

Rewrite ClickHouse database migration to use direct ZooKeeper transactions for Replicated databases, remove the temporary-database workaround, and add robust schema consistency checks.

New Features:

Introduce create_database_nodes to initialize Replicated database nodes in ZooKeeper without creating a temporary database.
Implement migrate_as_first_replica and migrate_as_non_first_replica flows with ZooKeeper transactions, including counter generation and query/log node setup.
Add get_shard_and_replica_from_macros helper to extract shard and replica identifiers from ClickHouse macros.
Implement schema consistency checking (is_table_schema_equal) to abort migration when local table metadata diverges from ZooKeeper.

Enhancements:

Refactor migration logic to remove the temporary database approach and consolidate detach/attach helpers.
Refactor change_table_uuid to return a change flag and introduce read_local_table_metadata for metadata comparison.
Simplify migrate_engine_command to detect first vs. non-first replica based on NodeExistsError from create_database_nodes.
Improve logging, error handling, and transaction result validation in migration routines.

Tests:

Clean up existing database migration feature tests by removing redundant restarts and sleeps.
Add a new scenario to validate schema-drift detection for MergeTree tables before and after migration.

sourcery-ai · 2025-05-23T13:22:17Z

Reviewer's Guide

This PR overhauls the database migration flow: it replaces the temporary-database approach with ZooKeeper transactions for initializing Replicated databases, streamlines the first- and non-first-replica logic, adds metadata consistency checks, improves macro handling, and updates the related CLI commands and tests.

Sequence Diagram: First Replica Database Migration

sequenceDiagram
    actor User
    participant CLI as migrate_engine_command
    participant ZK as Zookeeper
    participant CH as ClickHouse
    participant FS as LocalMetadataStorage

    User->>CLI: Execute migrate_engine_command(database)
    CLI->>ZK: create_database_nodes(database) # Initial ZK setup for DB
    ZK-->>CLI: Nodes created (determines first replica)

    CLI->>ZK: migrate_as_first_replica(database)
    activate ZK
    ZK->>ZK: Generate counter for DDL queue
    ZK->>ZK: Start Transaction
    ZK->>ZK: Create first_replica_database_name node
    ZK->>ZK: Create query log nodes
    ZK->>ZK: Create database replica nodes (incl. serverUUID from CH)
    ZK->>FS: Read local table metadata files
    FS-->>ZK: Table metadata content
    ZK->>ZK: Create table metadata nodes in ZK
    ZK->>ZK: Commit Transaction
    deactivate ZK
    ZK-->>CLI: ZK setup successful

    CLI->>CH: DETACH DATABASE database
    CH-->>CLI: Database detached
    CLI->>FS: Update local DB .sql file (set_replicated)
    FS-->>CLI: Local metadata updated
    CLI->>CH: ATTACH DATABASE database
    CH-->>CLI: Database attached

Sequence Diagram: Non-First Replica Database Migration

sequenceDiagram
    actor User
    participant CLI as migrate_engine_command
    participant ZK as Zookeeper
    participant CH as ClickHouse
    participant FS as LocalMetadataStorage

    User->>CLI: Execute migrate_engine_command(database)
    CLI->>ZK: create_database_nodes(database) # Attempt initial ZK setup
    ZK-->>CLI: NodeExistsError (determines non-first replica)

    CLI->>ZK: migrate_as_non_first_replica(database)
    activate ZK
    ZK->>ZK: Generate counter for DDL queue
    ZK->>ZK: Start Transaction
    ZK->>ZK: Create query log nodes
    ZK->>ZK: Create database replica nodes (incl. serverUUID from CH)
    ZK->>ZK: Commit Transaction
    deactivate ZK
    ZK-->>CLI: ZK replica setup successful

    CLI->>CH: Get tables info (name, uuid, metadata_path, engine)
    CH-->>CLI: Tables info
    CLI->>CH: DETACH DATABASE database
    CH-->>CLI: Database detached

    CLI->>ZK: Check table schema consistency
    activate ZK
    ZK->>FS: Read local table metadata file
    FS-->>ZK: Local table metadata content
    ZK->>ZK: Compare local metadata (sans UUID) with ZK metadata
    deactivate ZK
    ZK-->>CLI: Consistency check result

    CLI->>FS: Update local DB .sql file (set_replicated)
    FS-->>CLI: Local DB metadata updated
    CLI->>FS: Change table UUIDs in local .sql files if different from ZK
    FS-->>CLI: Local table UUIDs updated (or not)

    alt Table UUIDs were changed
        CLI->>User: Inform ClickHouse restart needed
    else No UUID changes or non-critical differences
        CLI->>CH: ATTACH DATABASE database
        CH-->>CLI: Database attached
    end
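
The two sequence diagrams above diverge only on whether create_database_nodes succeeds or fails with NodeExistsError. A minimal sketch of that branching follows. It assumes a local stand-in NodeExistsError class (the real one would come from kazoo.exceptions) and a hypothetical in-memory create_database_nodes with an illustrative path, so that the snippet runs without a ZooKeeper server. It is not the PR's implementation.

```python
from typing import Set


class NodeExistsError(Exception):
    """Local stand-in for kazoo.exceptions.NodeExistsError (assumption)."""


def create_database_nodes(existing: Set[str], database: str) -> None:
    """Hypothetical in-memory substitute for the ZooKeeper call."""
    path = f"/clickhouse/{database}"  # illustrative path, not taken from the PR
    if path in existing:
        raise NodeExistsError(path)
    existing.add(path)


def detect_first_replica(existing: Set[str], database: str) -> bool:
    """Return True if this host created the nodes, False if they already existed."""
    try:
        create_database_nodes(existing, database)
    except NodeExistsError:
        return False
    return True


zk_state: Set[str] = set()
first = detect_first_replica(zk_state, "db1")   # first host creates the nodes
second = detect_first_replica(zk_state, "db1")  # second host hits NodeExistsError
```

Only NodeExistsError is treated as the "non-first replica" signal here; any other exception propagates to the caller, which matches the intent of the separate `except Exception` branch discussed later in the review.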

Class Diagram: Update to DatabaseMetadata and Zookeeper Interaction

classDiagram
    class DatabaseMetadata {
        +str database_name
        +DatabaseEngine database_engine
        +str replica_path
        +str shard
        +str replica_name
        +set_replicated() void
        +update_metadata_file() void
    }
    class KazooClient {
        +transaction() TransactionRequest
    }
    class TransactionRequest {
        +create(path, value)
        +delete(path)
        +commit() list
    }
    DatabaseMetadata ..> KazooClient : Uses (indirectly via zk_client context)
    KazooClient ..> TransactionRequest : Creates
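
In Kazoo, TransactionRequest.commit() returns a list with one entry per queued operation, and failed operations appear in that list as exception instances rather than being raised. A rough sketch of how such a result list might be validated is shown below. The function name check_txn_results and the stand-in NodeExistsError class are assumptions made for illustration; the PR's own _check_result_txn may behave differently.

```python
from typing import Any, List


class NodeExistsError(Exception):
    """Local stand-in for kazoo.exceptions.NodeExistsError (assumption)."""


def check_txn_results(results: List[Any]) -> None:
    """Raise if any operation in a committed transaction failed.

    NodeExistsError is re-raised unchanged so the caller can switch to the
    non-first-replica path; any other exception is wrapped in RuntimeError.
    """
    for result in results:
        if isinstance(result, NodeExistsError):
            raise result
    for result in results:
        if isinstance(result, Exception):
            raise RuntimeError(f"ZooKeeper transaction failed: {result!r}")
```

Scanning for NodeExistsError first means that a transaction containing both a NodeExistsError and a generic rollback error is still reported as "node exists".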

File-Level Changes

Change	Details	Files
Adopt transactional Zookeeper API for node creation	Introduce _generate_counter and _check_result_txn to manage ZK sequences and transaction results; add create_database_nodes and multiple create* helper functions using TransactionRequest; replace ad-hoc update_zk_nodes and list_zk_nodes calls with txn.create operations	`ch_tools/chadmin/internal/migration.py` `ch_tools/chadmin/internal/zookeeper.py`
Refactor migration logic and remove temporary-database flow	Eliminate create_temp_db and temp_db parameters in migrate functions; use create_database_nodes to initialize first replica, catching NodeExistsError for others; simplify migrate_as_first_replica and migrate_as_non_first_replica with shared detach/attach and metadata updates	`ch_tools/chadmin/internal/migration.py` `ch_tools/chadmin/cli/database_group.py`
Implement local vs ZK metadata consistency checks	Add read_local_table_metadata and is_table_schema_equal to compare schemas; introduce _check_tables_consistent to abort migration on drift; integrate consistency check into non-first replica flow before committing	`ch_tools/chadmin/internal/migration.py` `ch_tools/chadmin/internal/table.py`
Enhance shard and replica identification via macros	Add get_shard_and_replica_from_macros using get_macros and replace_macros; use resolved shard/replica values in query and replica node creation
Update CLI migrate command to use new ZK init and error handling	Replace create_temp_db calls with create_database_nodes in migrate_engine_command; handle NodeExistsError to detect non-first replica; remove temp_db variable and improve exception logging	`ch_tools/chadmin/cli/database_group.py`
Refresh database_migrate.feature tests	Remove redundant sleep steps and consolidate restart commands; add scenario for MergeTree schema drift and expected migration failure; ensure coverage for both first and non-first replica flows	`tests/features/database_migrate.feature`

Alex-Burmak · 2025-05-23T13:22:18Z

@sourcery-ai review

sourcery-ai

Hey @Alex-Burmak - I've reviewed your changes - here's some feedback:

There’s a lot of repeated txn.create() calls for different ZK nodes—consider extracting common path/value creation into helper functions to reduce duplication.
The fixed‐offset slicing to remove UUIDs in is_table_schema_equal is brittle; using a regex or structured parser would be more robust than hard-coded start/length values.
The new feature scenarios duplicate restart/sleep steps—factor those into a Background or Scenario Outline in database_migrate.feature to DRY up the tests.
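
The second point above (regex instead of fixed-offset slicing) could look roughly like the sketch below. It assumes the metadata text carries a single `UUID '<uuid>'` clause, as in ClickHouse's `ATTACH TABLE _ UUID '...'` metadata files. The names strip_uuid and is_schema_equal_ignoring_uuid are hypothetical and are not the PR's is_table_schema_equal.

```python
import re

# Matches the UUID clause together with its leading whitespace,
# e.g. " UUID '123e4567-e89b-12d3-a456-426614174000'".
_UUID_CLAUSE = re.compile(
    r"\s+UUID\s+'[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'"
)


def strip_uuid(metadata: str) -> str:
    """Remove the first UUID clause from a table metadata string."""
    return _UUID_CLAUSE.sub("", metadata, count=1)


def is_schema_equal_ignoring_uuid(local: str, zk: str) -> bool:
    """Compare two metadata strings after dropping their UUID clauses."""
    return strip_uuid(local) == strip_uuid(zk)


local_meta = (
    "ATTACH TABLE _ UUID '123e4567-e89b-12d3-a456-426614174000'\n"
    "(\n    `id` UInt64\n)\nENGINE = MergeTree\nORDER BY id\n"
)
zk_meta = local_meta.replace("123e4567", "aaaaaaaa")     # same schema, other UUID
drifted_meta = local_meta.replace("UInt64", "UInt32")    # column type differs
```

Unlike slicing by offset, the pattern does not depend on where the UUID clause starts or on the length of the text before it.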

Here's what I looked at during the review

🟡 General issues: 1 issue found
🟢 Security: all looks good
🟢 Review instructions: all looks good
🟢 Testing: all looks good
🟢 Documentation: all looks good


sourcery-ai · 2025-05-23T13:27:48Z

ch_tools/chadmin/internal/migration.py

+            logging.info("result contains NodeExistsError.")
+            raise NodeExistsError()
+        if isinstance(result, Exception):
+            logging.error("result contains ex={}, type=P{}.", result, type(result))


issue (typo): Fix placeholder typo in logging

The format string should use type={} instead of type=P{} to log the exception type correctly.

Suggested change

logging.error("result contains ex={}, type=P{}.", result, type(result))

logging.error("result contains ex={}, type={}.", result, type(result))

sourcery-ai · 2025-05-23T13:27:48Z

ch_tools/chadmin/internal/migration.py

+    missing = [macro for macro in ["shard", "replica"] if macro not in macros]
+    if missing:


suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Suggested change

missing = [macro for macro in ["shard", "replica"] if macro not in macros]

if missing:

if missing := [

macro for macro in ["shard", "replica"] if macro not in macros

]:

sourcery-ai · 2025-05-23T13:27:48Z

ch_tools/chadmin/internal/migration.py

+    was_changed = _change_tables_uuid(ctx, tables_info, migrating_database)
+
+    if was_changed:


suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Suggested change

was_changed = _change_tables_uuid(ctx, tables_info, migrating_database)

if was_changed:

if was_changed := _change_tables_uuid(

ctx, tables_info, migrating_database

):

sourcery-ai · 2025-05-23T13:27:48Z

ch_tools/chadmin/internal/migration.py

-        logging.info(
-            "New metadata for node from mapping table:\n{}\n===", target_metadata
-        )
+    txn.create(


issue (code-quality): Use f-string instead of string concatenation [×4] (use-fstring-for-concatenation)

sourcery-ai · 2025-05-23T13:27:48Z

ch_tools/chadmin/internal/migration.py

+        metadata_path = row["metadata_path"]
+
+        if match_str_ch_version(get_version(ctx), "25.1"):
+            metadata_path = CLICKHOUSE_PATH + "/" + metadata_path


suggestion (code-quality): Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)

Suggested change

metadata_path = CLICKHOUSE_PATH + "/" + metadata_path

metadata_path = f"{CLICKHOUSE_PATH}/{metadata_path}"

sourcery-ai

Hey @Alex-Burmak - I've reviewed your changes - here's some feedback:

Consider centralizing Zookeeper path templates (e.g. "/clickhouse/..." strings) into constants or helper functions to reduce duplication and simplify maintenance.
Extract repeated magic values (like "0", "1", default log pointers, etc.) into named constants or configuration entries to improve readability and avoid hard-coding.
migration.py has grown quite large—consider splitting it into smaller modules or classes (e.g. separating first-replica vs non-first-replica logic) for clearer structure and easier navigation.

Here's what I looked at during the review

🟡 General issues: 4 issues found
🟢 Security: all looks good
🟢 Review instructions: all looks good
🟢 Testing: all looks good
🟢 Documentation: all looks good


sourcery-ai · 2025-05-23T13:28:37Z

ch_tools/chadmin/internal/migration.py

+    shard = replace_macros("{shard}", get_macros(ctx))
+    replica = replace_macros("{replica}", get_macros(ctx))


suggestion: Unnecessary second call to get_macros

Pass the previously assigned macros variable to replace_macros instead of calling get_macros(ctx) again.

Suggested change

shard = replace_macros("{shard}", get_macros(ctx))

replica = replace_macros("{replica}", get_macros(ctx))

shard = replace_macros("{shard}", macros)

replica = replace_macros("{replica}", macros)
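
Putting this suggestion together with the missing-macro check from the same function, the helper might be shaped as in the sketch below. The replace_macros defined here is a simplified hypothetical stand-in (plain `{name}` substitution), and the function takes the macros dict directly instead of a ctx, so it is an illustration under those assumptions and not the project's code.

```python
from typing import Dict, Tuple


def replace_macros(template: str, macros: Dict[str, str]) -> str:
    """Hypothetical stand-in: substitute each {name} with its macro value."""
    for name, value in macros.items():
        template = template.replace("{" + name + "}", value)
    return template


def get_shard_and_replica(macros: Dict[str, str]) -> Tuple[str, str]:
    """Resolve shard and replica from a macros dict that was fetched once."""
    missing = [name for name in ("shard", "replica") if name not in macros]
    if missing:
        raise RuntimeError(f"Failed to replace macros, missing: {missing}")
    shard = replace_macros("{shard}", macros)
    replica = replace_macros("{replica}", macros)
    return shard, replica
```

Fetching the macros once and passing the same dict to both substitutions avoids a second round trip and guarantees that shard and replica are resolved from the same snapshot.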

sourcery-ai · 2025-05-23T13:28:37Z

ch_tools/chadmin/internal/migration.py

+    if missing:
+        raise RuntimeError(f"Failed replace marcos. {missing}")


nitpick (typo): Typo in exception message ('marcos')

Suggested change

if missing:

raise RuntimeError(f"Failed replace marcos. {missing}")

if missing:

raise RuntimeError(f"Failed replace macros. {missing}")

sourcery-ai · 2025-05-23T13:28:37Z

ch_tools/chadmin/cli/database_group.py

+            logging.info(
+                "create_database_nodes failed with NodeExistsError. {}, type={}. Migrate as second replica",
+                ex,
+                type(ex),
+            )

            first_replica = False
+        except Exception as ex:
+            logging.info("create_database_nodes failed with ex={}", type(ex))


nitpick: Logging only the exception type

Log the exception message as well to provide more context on the failure.

sourcery-ai · 2025-05-23T13:28:37Z

ch_tools/chadmin/internal/migration.py

+        )


 def _get_tables_info_and_detach(ctx: Context, database_name: str) -> dict:


nitpick: Incorrect return type annotation

Update the function signature to use -> List[dict] to match the actual return type.

sourcery-ai · 2025-05-23T13:28:37Z

ch_tools/chadmin/internal/migration.py

+    missing = [macro for macro in ["shard", "replica"] if macro not in macros]
+    if missing:


suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Suggested change

missing = [macro for macro in ["shard", "replica"] if macro not in macros]

if missing:

if missing := [

macro for macro in ["shard", "replica"] if macro not in macros

]:

How can I disable "use-named-expression" suggestions?

You have two options:

Globally turn off the use-named-expression rule in your repo’s .sourcery.yaml. For example, at the root of your project add:

# .sourcery.yaml
rules:
  use-named-expression:
    enabled: false

(or, depending on your Sourcery version, you may have to use disabled_rules: [use-named-expression])

Suppress it inline on a per-function or per-file basis by inserting a comment:

# sourcery skip: use-named-expression
def get_shard_and_replica_from_macros(ctx: Context) -> Tuple[str, str]:
    …

That will tell Sourcery to ignore that particular refactoring.

How can I disable "use-named-expression" suggestions for Sourcery installed through the GitHub application?

You still need to tell Sourcery what to ignore, even when it’s running as a GitHub App. There isn’t a toggle in the GitHub-App UI—you have to ship a config in your repo (or use inline skips). Two options:

Repo-wide via .sourcery.yaml
At the root of your repo add something like:

# .sourcery.yaml
rules:
  use-named-expression:
    enabled: false
# or, depending on your Sourcery version:
# disabled_rules: [use-named-expression]

File- or function-level inline suppression
Add a # sourcery skip: comment right before the function or file you want Sourcery to ignore:

# sourcery skip: use-named-expression
def get_shard_and_replica_from_macros(ctx: Context) -> Tuple[str, str]:
    …

Once you commit one of those, Sourcery (even as a GitHub App) will stop suggesting the use-named-expression refactoring.

sourcery-ai · 2025-05-23T13:28:37Z

ch_tools/chadmin/internal/migration.py

+    was_changed = _change_tables_uuid(ctx, tables_info, migrating_database)
+
+    if was_changed:


suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Suggested change

was_changed = _change_tables_uuid(ctx, tables_info, migrating_database)

if was_changed:

if was_changed := _change_tables_uuid(

ctx, tables_info, migrating_database

):

How can I disable this type of suggestion?

You can turn off that particular refactoring in a couple of ways:

Per-project (via .sourcery.yaml)
Create a .sourcery.yaml at the root of your repo and blacklist the rule:

# .sourcery.yaml
ignore_rules:
  - use-named-expression

In-file or per-code block

To skip a single suggestion on the next statement:

# sourcery skip: use-named-expression
if _change_tables_uuid(ctx, tables_info, migrating_database):
    …

To skip all Sourcery suggestions in a file, put at the top:

# sourcery skip-file

With either of those in place Sourcery will stop proposing the use-named-expression change.

sourcery-ai · 2025-05-23T13:28:38Z

ch_tools/chadmin/internal/migration.py

-        logging.info(
-            "New metadata for node from mapping table:\n{}\n===", target_metadata
-        )
+    txn.create(


issue (code-quality): Use f-string instead of string concatenation [×4] (use-fstring-for-concatenation)

sourcery-ai · 2025-05-23T13:28:38Z

ch_tools/chadmin/internal/migration.py

+        metadata_path = row["metadata_path"]
+
+        if match_str_ch_version(get_version(ctx), "25.1"):
+            metadata_path = CLICKHOUSE_PATH + "/" + metadata_path


suggestion (code-quality): Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)

Suggested change

metadata_path = CLICKHOUSE_PATH + "/" + metadata_path

metadata_path = f"{CLICKHOUSE_PATH}/{metadata_path}"

k-morozov added 21 commits April 28, 2025 19:03

check table schema

ec1a349

fix style

68a3da4

apply comment

a02e269

Merge branch 'main' into bugfix/conflict_schema

6590f87

starting replace

9ac0798

database not readonly, timeout create table

85f3740

first test passed

9e59df3

issue with nin finished task

f22f118

up except

7453f83

fix nodes

1c64f04

Merge branch 'main' into feature/zk_nodes_manually

07b212f

up style

f0a08d8

most tests were returned

b1dd567

Txns have been added. Ready to add atomic to conflict schemas.

93aedb7

atomic check schema

a1b22c4

update migration

ffcffc8

small up

916d986

apply comments

79f7ac2

change logic creating query nodes

93c0e4c

apply comments

895efc9

apply comments: check exception, style

0781547

sourcery-ai bot reviewed May 23, 2025

Alex-Burmak closed this May 23, 2025
