-
Notifications
You must be signed in to change notification settings - Fork 0
Test .sourcery.yaml #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's GuideThis PR overhauls the database migration flow by replacing the temporary‐database approach with Zookeeper transactions for initializing Replicated databases, streamlining first- and non-first replica logic, adding metadata consistency checks, enhancing macro handling, and updating related CLI commands and tests. Sequence Diagram: First Replica Database MigrationsequenceDiagram
actor User
participant CLI as migrate_engine_command
participant ZK as Zookeeper
participant CH as ClickHouse
participant FS as LocalMetadataStorage
User->>CLI: Execute migrate_engine_command(database)
CLI->>ZK: create_database_nodes(database) # Initial ZK setup for DB
ZK-->>CLI: Nodes created (determines first replica)
CLI->>ZK: migrate_as_first_replica(database)
activate ZK
ZK->>ZK: Generate counter for DDL queue
ZK->>ZK: Start Transaction
ZK->>ZK: Create first_replica_database_name node
ZK->>ZK: Create query log nodes
ZK->>ZK: Create database replica nodes (incl. serverUUID from CH)
ZK->>FS: Read local table metadata files
FS-->>ZK: Table metadata content
ZK->>ZK: Create table metadata nodes in ZK
ZK->>ZK: Commit Transaction
deactivate ZK
ZK-->>CLI: ZK setup successful
CLI->>CH: DETACH DATABASE database
CH-->>CLI: Database detached
CLI->>FS: Update local DB .sql file (set_replicated)
FS-->>CLI: Local metadata updated
CLI->>CH: ATTACH DATABASE database
CH-->>CLI: Database attached
Sequence Diagram: Non-First Replica Database MigrationsequenceDiagram
actor User
participant CLI as migrate_engine_command
participant ZK as Zookeeper
participant CH as ClickHouse
participant FS as LocalMetadataStorage
User->>CLI: Execute migrate_engine_command(database)
CLI->>ZK: create_database_nodes(database) # Attempt initial ZK setup
ZK-->>CLI: NodeExistsError (determines non-first replica)
CLI->>ZK: migrate_as_non_first_replica(database)
activate ZK
ZK->>ZK: Generate counter for DDL queue
ZK->>ZK: Start Transaction
ZK->>ZK: Create query log nodes
ZK->>ZK: Create database replica nodes (incl. serverUUID from CH)
ZK->>ZK: Commit Transaction
deactivate ZK
ZK-->>CLI: ZK replica setup successful
CLI->>CH: Get tables info (name, uuid, metadata_path, engine)
CH-->>CLI: Tables info
CLI->>CH: DETACH DATABASE database
CH-->>CLI: Database detached
CLI->>ZK: Check table schema consistency
activate ZK
ZK->>FS: Read local table metadata file
FS-->>ZK: Local table metadata content
ZK->>ZK: Compare local metadata (sans UUID) with ZK metadata
deactivate ZK
ZK-->>CLI: Consistency check result
CLI->>FS: Update local DB .sql file (set_replicated)
FS-->>CLI: Local DB metadata updated
CLI->>FS: Change table UUIDs in local .sql files if different from ZK
FS-->>CLI: Local table UUIDs updated (or not)
alt Table UUIDs were changed
CLI->>User: Inform ClickHouse restart needed
else No UUID changes or non-critical differences
CLI->>CH: ATTACH DATABASE database
CH-->>CLI: Database attached
end
Class Diagram: Update to DatabaseMetadata and Zookeeper InteractionclassDiagram
class DatabaseMetadata {
+str database_name
+DatabaseEngine database_engine
+str replica_path
+str shard
+str replica_name
+set_replicated() void
+update_metadata_file() void
}
class KazooClient {
+transaction() TransactionRequest
}
class TransactionRequest {
+create(path, value)
+delete(path)
+commit() list
}
DatabaseMetadata ..> KazooClient : Uses (indirectly via zk_client context)
KazooClient ..> TransactionRequest : Creates
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Alex-Burmak - I've reviewed your changes - here's some feedback:
- There’s a lot of repeated txn.create() calls for different ZK nodes—consider extracting common path/value creation into helper functions to reduce duplication.
- The fixed‐offset slicing to remove UUIDs in is_table_schema_equal is brittle; using a regex or structured parser would be more robust than hard-coded start/length values.
- The new feature scenarios duplicate restart/sleep steps—factor those into a Background or Scenario Outline in database_migrate.feature to DRY up the tests.
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Review instructions: all looks good
- 🟢 Testing: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| logging.info("result contains NodeExistsError.") | ||
| raise NodeExistsError() | ||
| if isinstance(result, Exception): | ||
| logging.error("result contains ex={}, type=P{}.", result, type(result)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (typo): Fix placeholder typo in logging
The format string should use type={} instead of type=P{} to log the exception type correctly.
| logging.error("result contains ex={}, type=P{}.", result, type(result)) | |
| logging.error("result contains ex={}, type={}.", result, type(result)) |
| missing = [macro for macro in ["shard", "replica"] if macro not in macros] | ||
| if missing: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)
| missing = [macro for macro in ["shard", "replica"] if macro not in macros] | |
| if missing: | |
| if missing := [ | |
| macro for macro in ["shard", "replica"] if macro not in macros | |
| ]: |
| was_changed = _change_tables_uuid(ctx, tables_info, migrating_database) | ||
|
|
||
| if was_changed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)
| was_changed = _change_tables_uuid(ctx, tables_info, migrating_database) | |
| if was_changed: | |
| if was_changed := _change_tables_uuid( | |
| ctx, tables_info, migrating_database | |
| ): |
| logging.info( | ||
| "New metadata for node from mapping table:\n{}\n===", target_metadata | ||
| ) | ||
| txn.create( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): Use f-string instead of string concatenation [×4] (use-fstring-for-concatenation)
| metadata_path = row["metadata_path"] | ||
|
|
||
| if match_str_ch_version(get_version(ctx), "25.1"): | ||
| metadata_path = CLICKHOUSE_PATH + "/" + metadata_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)
| metadata_path = CLICKHOUSE_PATH + "/" + metadata_path | |
| metadata_path = f"{CLICKHOUSE_PATH}/{metadata_path}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Alex-Burmak - I've reviewed your changes - here's some feedback:
- Consider centralizing Zookeeper path templates (e.g. "/clickhouse/..." strings) into constants or helper functions to reduce duplication and simplify maintenance.
- Extract repeated magic values (like "0", "1", default log pointers, etc.) into named constants or configuration entries to improve readability and avoid hard-coding.
- migration.py has grown quite large—consider splitting it into smaller modules or classes (e.g. separating first-replica vs non-first-replica logic) for clearer structure and easier navigation.
Here's what I looked at during the review
- 🟡 General issues: 4 issues found
- 🟢 Security: all looks good
- 🟢 Review instructions: all looks good
- 🟢 Testing: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| shard = replace_macros("{shard}", get_macros(ctx)) | ||
| replica = replace_macros("{replica}", get_macros(ctx)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Unnecessary second call to get_macros
Pass the previously assigned macros variable to replace_macros instead of calling get_macros(ctx) again.
| shard = replace_macros("{shard}", get_macros(ctx)) | |
| replica = replace_macros("{replica}", get_macros(ctx)) | |
| shard = replace_macros("{shard}", macros) | |
| replica = replace_macros("{replica}", macros) |
| if missing: | ||
| raise RuntimeError(f"Failed replace marcos. {missing}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick (typo): Typo in exception message ('marcos')
| if missing: | |
| raise RuntimeError(f"Failed replace marcos. {missing}") | |
| if missing: | |
| raise RuntimeError(f"Failed replace macros. {missing}") |
| logging.info( | ||
| "create_database_nodes failed with NodeExistsError. {}, type={}. Migrate as second replica", | ||
| ex, | ||
| type(ex), | ||
| ) | ||
|
|
||
| first_replica = False | ||
| except Exception as ex: | ||
| logging.info("create_database_nodes failed with ex={}", type(ex)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Logging only the exception type
Log the exception message as well to provide more context on the failure.
| ) | ||
|
|
||
|
|
||
| def _get_tables_info_and_detach(ctx: Context, database_name: str) -> dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Incorrect return type annotation
Update the function signature to use -> List[dict] to match the actual return type.
| missing = [macro for macro in ["shard", "replica"] if macro not in macros] | ||
| if missing: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)
| missing = [macro for macro in ["shard", "replica"] if macro not in macros] | |
| if missing: | |
| if missing := [ | |
| macro for macro in ["shard", "replica"] if macro not in macros | |
| ]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How I can disable "use-named-expression" suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have two options:
- Globally turn off the use-named-expression rule in your repo’s
.sourcery.yaml. For example, at the root of your project add:
# .sourcery.yaml
rules:
use-named-expression:
enabled: false(or, depending on your Sourcery version, you may have to use disabled_rules: [use-named-expression])
- Suppress it inline on a per-function or per-file basis by inserting a comment:
# sourcery skip: use-named-expression
def get_shard_and_replica_from_macros(ctx: Context) -> Tuple[str, str]:
…That will tell Sourcery to ignore that particular refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How I can disable "use-named-expression" suggestions for Sourcery installed through GitHub application?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still need to tell Sourcery what to ignore, even when it’s running as a GitHub App. There isn’t a toggle in the GitHub-App UI—you have to ship a config in your repo (or use inline skips). Two options:
-
Repo-wide via .sourcery.yaml
At the root of your repo add something like:# .sourcery.yaml rules: use-named-expression: enabled: false # or, depending on your Sourcery version: # disabled_rules: [use-named-expression]
-
File- or function-level inline suppression
Add a# sourcery skip:comment right before the function or file you want Sourcery to ignore:# sourcery skip: use-named-expression def get_shard_and_replica_from_macros(ctx: Context) -> Tuple[str, str]: …
Once you commit one of those, Sourcery (even as a GitHub App) will stop suggesting the use-named-expression refactoring.
| was_changed = _change_tables_uuid(ctx, tables_info, migrating_database) | ||
|
|
||
| if was_changed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)
| was_changed = _change_tables_uuid(ctx, tables_info, migrating_database) | |
| if was_changed: | |
| if was_changed := _change_tables_uuid( | |
| ctx, tables_info, migrating_database | |
| ): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How I can disable this type of suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can turn off that particular refactoring in a couple of ways:
- Per-project (via .sourcery.yaml)
Create a .sourcery.yaml at the root of your repo and blacklist the rule:
# .sourcery.yaml
ignore_rules:
- use-named-expression- In-file or per-code block
-
To skip a single suggestion on the next statement:
# sourcery skip: use-named-expression if _change_tables_uuid(ctx, tables_info, migrating_database): …
-
To skip all Sourcery suggestions in a file, put at the top:
# sourcery skip-file
With either of those in place Sourcery will stop proposing the use-named-expression change.
| logging.info( | ||
| "New metadata for node from mapping table:\n{}\n===", target_metadata | ||
| ) | ||
| txn.create( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): Use f-string instead of string concatenation [×4] (use-fstring-for-concatenation)
| metadata_path = row["metadata_path"] | ||
|
|
||
| if match_str_ch_version(get_version(ctx), "25.1"): | ||
| metadata_path = CLICKHOUSE_PATH + "/" + metadata_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)
| metadata_path = CLICKHOUSE_PATH + "/" + metadata_path | |
| metadata_path = f"{CLICKHOUSE_PATH}/{metadata_path}" |
Summary by Sourcery
Rewrite ClickHouse database migration to use direct ZooKeeper transactions for Replicated databases, remove the temporary-database workaround, and add robust schema consistency checks.
New Features:
Enhancements:
Tests: