-
Notifications
You must be signed in to change notification settings - Fork 75
feat: update partition spec #401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| /// | ||
| /// \param term The unbound term representing the partition transform. | ||
| /// \return Reference to this for method chaining. | ||
| UpdatePartitionSpec& AddField(std::shared_ptr<UnboundTerm<BoundReference>> term); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to use std::shared_ptr<Term> term as the input and use Term::is_unbound() and Term::kind() to check and cast to the right subclass? This may help simplify the API.
|
|
||
| // Pending changes | ||
| std::vector<PartitionField> adds_; | ||
| std::unordered_map<int32_t, PartitionField> added_time_fields_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, we can directly use pointer to adds_.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since adds_ can be realloced, using pointer to the array element is not safe. After some digging, I found using PartitionField::ToString as the value should work.
9ea3cf4 to
ff938c0
Compare
ff938c0 to
c3850c6
Compare
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reviewed all files except for the test. My main concerns are:
- The partition name can be
const std::string&and regard empty as missing (a.k.a. null in the Java impl) - Simplify the api for AddField and RemoveField.
- Apply method can be simpler but we need to wait for a moment before the code is ready to use.
| const TableMetadata* base_metadata = transaction_->base(); | ||
| if (base_metadata == nullptr) [[unlikely]] { | ||
| AddError(ErrorKind::kInvalidArgument, | ||
| "Base table metadata is required to construct UpdatePartitionSpec"); | ||
| return; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const TableMetadata* base_metadata = transaction_->base(); | |
| if (base_metadata == nullptr) [[unlikely]] { | |
| AddError(ErrorKind::kInvalidArgument, | |
| "Base table metadata is required to construct UpdatePartitionSpec"); | |
| return; | |
| } | |
| const TableMetadata& base_metadata = transaction_->current(); |
We should call current to get all current accumulated changes.
| const TableMetadata* base_metadata = transaction_->base(); | ||
| if (base_metadata == nullptr) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const TableMetadata* base_metadata = transaction_->base(); | |
| if (base_metadata == nullptr) { | |
| return; | |
| } | |
| const TableMetadata& base_metadata = transaction_->current(); |
| } | ||
| schema_ = std::move(schema_result.value()); | ||
|
|
||
| last_assigned_partition_id_ = spec_->last_assigned_field_id(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| last_assigned_partition_id_ = spec_->last_assigned_field_id(); | |
| last_assigned_partition_id_ = base_metadata->last_partition_id; | |
| if (last_assigned_partition_id_ < PartitionSpec::kLegacyPartitionDataIdStart - 1) { | |
| last_assigned_partition_id_ = PartitionSpec::kLegacyPartitionDataIdStart - 1; | |
| } |
We cannot get last_assigned_partition_id_ from the current spec since it may be an older one.
| for (const auto& field : partition_spec->fields()) { | ||
| TransformKey key{field.source_id(), field.transform()->ToString()}; | ||
| // Use emplace to only insert if key doesn't exist, preserving first occurrence | ||
| // This ensures we get the earliest field ID for recycling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? IIUC, base_metadata->partition_specs does not guarantee any order of the partition specs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to rephrase the comment here to avoid misleading understanding, or simply remove it.
| UpdatePartitionSpec& UpdatePartitionSpec::AddField(std::shared_ptr<NamedReference> term, | ||
| std::string part_name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| UpdatePartitionSpec& UpdatePartitionSpec::AddField(std::shared_ptr<NamedReference> term, | |
| std::string part_name) { | |
| UpdatePartitionSpec& UpdatePartitionSpec::AddField(const std::shared_ptr<NamedReference>& term, | |
| const std::string& part_name) { |
Both term and part_name are only accessed internally, we can use const reference to avoid copy.
| new_fields.push_back(field); | ||
| } | ||
| } else if (format_version_ < 2) { | ||
| // In V1, deleted fields are replaced with void transform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add the comment from java impl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// field IDs were not required for v1 and were assigned sequentially in each partition spec
// starting at 1,000.
// to maintain consistent field ids across partition specs in v1 tables, any partition field
// that is removed
// must be replaced with a null transform. null values are always allowed in partition data.
| new_fields.insert(new_fields.end(), adds_.begin(), adds_.end()); | ||
|
|
||
| // Determine the new spec ID | ||
| int32_t new_spec_id = spec_ ? spec_->spec_id() + 1 : PartitionSpec::kInitialSpecId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic here and below are not necessary. The right approach is to implement TableMetadataBuilder::SetDefaultPartitionSpec and TableMetadataBuilder::AddPartitionSpec to let it handle reusing or creating a new spec_id and then call it from here.
@HeartLinked is refactoring PendingUpdate::Apply and Transaction so we need to wait that PR before making such changes.
| case TransformType::kMonth: | ||
| case TransformType::kDay: | ||
| case TransformType::kHour: | ||
| case TransformType::kUnknown: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use Result as the return type and return error for kUnknown?
| int32_t source_id, const std::shared_ptr<Transform>& transform) const { | ||
| // Find the source field name | ||
| auto field_result = schema_->FindFieldById(source_id); | ||
| std::string_view source_name = "unknown"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about returning Result and handling return for FindFieldById and unknown name?
| source_name = field_result.value().value().get().name(); | ||
| } | ||
|
|
||
| return transform->GeneratePartitionName(std::string(source_name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return transform->GeneratePartitionName(std::string(source_name)); | |
| return transform->GeneratePartitionName(source_name); |
Transform::GeneratePartitionName accepts string_view.
No description provided.