Conversation

Collaborator

@zhjwpku zhjwpku commented Dec 7, 2025

No description provided.

///
/// \param term The unbound term representing the partition transform.
/// \return Reference to this for method chaining.
UpdatePartitionSpec& AddField(std::shared_ptr<UnboundTerm<BoundReference>> term);
Member

Is it possible to use std::shared_ptr<Term> term as the input and use Term::is_unbound() and Term::kind() to check and cast to the right subclass? This may help simplify the API.
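
A minimal sketch of the single-entry-point idea, using stand-in types rather than the real classes. Only Term::is_unbound() and Term::kind() come from the comment above; the Kind enumerators, the struct layouts, and the DescribePartitionTerm helper are hypothetical and exist only to show the check-then-downcast pattern.

```cpp
#include <memory>
#include <string>
#include <utility>

// Stand-in hierarchy for illustration; the real classes differ.
struct Term {
  enum class Kind { kReference, kTransform };
  virtual ~Term() = default;
  virtual Kind kind() const = 0;
  virtual bool is_unbound() const = 0;
};

struct NamedReference : Term {
  std::string name;
  explicit NamedReference(std::string n) : name(std::move(n)) {}
  Kind kind() const override { return Kind::kReference; }
  bool is_unbound() const override { return true; }
};

struct UnboundTransform : Term {
  std::string source;
  std::string transform;
  UnboundTransform(std::string s, std::string t)
      : source(std::move(s)), transform(std::move(t)) {}
  Kind kind() const override { return Kind::kTransform; }
  bool is_unbound() const override { return true; }
};

// One entry point: inspect the term, then downcast to the matching subclass.
// Returns an empty string for null or bound terms; a real API would report
// an error instead.
std::string DescribePartitionTerm(const std::shared_ptr<Term>& term) {
  if (!term || !term->is_unbound()) return "";
  switch (term->kind()) {
    case Term::Kind::kReference: {
      const auto& ref = static_cast<const NamedReference&>(*term);
      return "identity(" + ref.name + ")";
    }
    case Term::Kind::kTransform: {
      const auto& xform = static_cast<const UnboundTransform&>(*term);
      return xform.transform + "(" + xform.source + ")";
    }
  }
  return "";
}
```

The static_cast is only safe here because each kind() value maps to exactly one concrete subclass in this sketch; if that does not hold for the real hierarchy, a checked cast would be the safer choice.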


// Pending changes
std::vector<PartitionField> adds_;
std::unordered_map<int32_t, PartitionField> added_time_fields_;
Member

Ditto: we can directly use a pointer into adds_.

Collaborator Author

@zhjwpku zhjwpku Dec 15, 2025

Since adds_ can be reallocated, holding a pointer to one of its elements is not safe. After some digging, I found that using the result of PartitionField::ToString as the value should work.
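
For illustration only, here is a different way to sidestep the same invalidation problem: store the element's index instead of its address. This is not what the PR does (it keys on PartitionField::ToString); Field, PendingAdds, and index_by_source_ are hypothetical names, and the sketch assumes elements are only appended, never erased or reordered.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, trimmed-down stand-in for PartitionField.
struct Field {
  int32_t source_id;
  std::string name;
};

class PendingAdds {
 public:
  // Appends a field and remembers its position, not its address, so the
  // bookkeeping survives any reallocation of the vector.
  void Add(Field field) {
    index_by_source_[field.source_id] = adds_.size();
    adds_.push_back(std::move(field));
  }

  // Resolves the stored index on demand. The returned pointer is only valid
  // until the next call to Add().
  const Field* Find(int32_t source_id) const {
    auto it = index_by_source_.find(source_id);
    return it == index_by_source_.end() ? nullptr : &adds_[it->second];
  }

 private:
  std::vector<Field> adds_;
  std::unordered_map<int32_t, std::size_t> index_by_source_;
};
```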

@zhjwpku zhjwpku force-pushed the update_partition_spec branch from 9ea3cf4 to ff938c0 on December 20, 2025 12:25
@zhjwpku zhjwpku force-pushed the update_partition_spec branch from ff938c0 to c3850c6 on December 22, 2025 15:03
@zhjwpku zhjwpku requested a review from wgtmac December 23, 2025 04:09
Member

@wgtmac wgtmac left a comment

I have reviewed all files except for the test. My main concerns are:

  • The partition name can be taken as const std::string&, with an empty string treated as missing (a.k.a. null in the Java impl).
  • Simplify the API for AddField and RemoveField.
  • The Apply method can be simpler, but we need to wait a bit until the code it would rely on is ready to use.

Comment on lines +48 to +53
const TableMetadata* base_metadata = transaction_->base();
if (base_metadata == nullptr) [[unlikely]] {
AddError(ErrorKind::kInvalidArgument,
"Base table metadata is required to construct UpdatePartitionSpec");
return;
}
Member

Suggested change
- const TableMetadata* base_metadata = transaction_->base();
- if (base_metadata == nullptr) [[unlikely]] {
-   AddError(ErrorKind::kInvalidArgument,
-            "Base table metadata is required to construct UpdatePartitionSpec");
-   return;
- }
+ const TableMetadata& base_metadata = transaction_->current();

We should call current() so that we see all changes accumulated so far in the transaction.

Comment on lines +464 to +467
const TableMetadata* base_metadata = transaction_->base();
if (base_metadata == nullptr) {
return;
}
Member

Suggested change
- const TableMetadata* base_metadata = transaction_->base();
- if (base_metadata == nullptr) {
-   return;
- }
+ const TableMetadata& base_metadata = transaction_->current();

}
schema_ = std::move(schema_result.value());

last_assigned_partition_id_ = spec_->last_assigned_field_id();
Member

Suggested change
- last_assigned_partition_id_ = spec_->last_assigned_field_id();
+ last_assigned_partition_id_ = base_metadata->last_partition_id;
+ if (last_assigned_partition_id_ < PartitionSpec::kLegacyPartitionDataIdStart - 1) {
+   last_assigned_partition_id_ = PartitionSpec::kLegacyPartitionDataIdStart - 1;
+ }

We cannot take last_assigned_partition_id_ from the current spec, since it may be an older one.
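
The clamp in this suggestion can also be written with std::max. A small sketch follows; the value 1000 for the constant is an assumption taken from the Java comment quoted later in this review ("starting at 1,000"), and InitialLastAssignedId is a hypothetical helper name, not something in the PR.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed value; the real constant is PartitionSpec::kLegacyPartitionDataIdStart.
constexpr int32_t kLegacyPartitionDataIdStart = 1000;

// Starts from the table-wide last partition ID, but never below the ID just
// before the legacy start, so the next assigned ID is at least 1000.
constexpr int32_t InitialLastAssignedId(int32_t last_partition_id) {
  return std::max(last_partition_id, kLegacyPartitionDataIdStart - 1);
}
```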

for (const auto& field : partition_spec->fields()) {
TransformKey key{field.source_id(), field.transform()->ToString()};
// Use emplace to only insert if key doesn't exist, preserving first occurrence
// This ensures we get the earliest field ID for recycling
Member

Is this correct? IIUC, base_metadata->partition_specs does not guarantee any order of the partition specs.

Member

I think we need to rephrase the comment here so that it is not misleading, or simply remove it.

Comment on lines +115 to +116
UpdatePartitionSpec& UpdatePartitionSpec::AddField(std::shared_ptr<NamedReference> term,
std::string part_name) {
Member

Suggested change
- UpdatePartitionSpec& UpdatePartitionSpec::AddField(std::shared_ptr<NamedReference> term,
-                                                    std::string part_name) {
+ UpdatePartitionSpec& UpdatePartitionSpec::AddField(const std::shared_ptr<NamedReference>& term,
+                                                    const std::string& part_name) {

Both term and part_name are only read internally, so we can take them by const reference to avoid a copy.

new_fields.push_back(field);
}
} else if (format_version_ < 2) {
// In V1, deleted fields are replaced with void transform
Member

Let's add the comment from the Java impl?

Member

    // field IDs were not required for v1 and were assigned sequentially in each partition spec
    // starting at 1,000.
    // to maintain consistent field ids across partition specs in v1 tables, any partition field
    // that is removed
    // must be replaced with a null transform. null values are always allowed in partition data.
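
The rule in that Java comment can be sketched as follows. This is a simplified illustration, not the PR's code: Field, ApplyDeletes, and the string-valued transform are hypothetical stand-ins, and the sketch ignores any renaming a real implementation might apply to the replaced field.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, trimmed-down stand-in for PartitionField.
struct Field {
  int32_t field_id;
  std::string name;
  std::string transform;
};

// Applies pending deletes to an existing field list.
std::vector<Field> ApplyDeletes(const std::vector<Field>& fields,
                                const std::unordered_set<int32_t>& deleted_ids,
                                int format_version) {
  std::vector<Field> result;
  for (const auto& field : fields) {
    if (deleted_ids.count(field.field_id) == 0) {
      result.push_back(field);  // Untouched field: keep as is.
    } else if (format_version < 2) {
      // v1: keep the slot and its ID, but swap in a void transform so the
      // sequentially assigned IDs of later fields stay stable.
      result.push_back(Field{field.field_id, field.name, "void"});
    }
    // v2 and later: the deleted field is simply dropped.
  }
  return result;
}
```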

new_fields.insert(new_fields.end(), adds_.begin(), adds_.end());

// Determine the new spec ID
int32_t new_spec_id = spec_ ? spec_->spec_id() + 1 : PartitionSpec::kInitialSpecId;
Member

The logic here and below is not necessary. The right approach is to implement TableMetadataBuilder::SetDefaultPartitionSpec and TableMetadataBuilder::AddPartitionSpec, let them handle reusing or creating a new spec_id, and then call them from here.

@HeartLinked is refactoring PendingUpdate::Apply and Transaction, so we need to wait for that PR before making such changes.

case TransformType::kMonth:
case TransformType::kDay:
case TransformType::kHour:
case TransformType::kUnknown:
Member

Should we use Result as the return type and return an error for kUnknown?
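
A rough sketch of what that could look like. The project has its own Result type; std::variant is used here purely as a stand-in, and the enumerator list, the Error struct, and the IsTimeTransform name are assumptions made for illustration.

```cpp
#include <string>
#include <variant>

// Hypothetical subset of transform kinds.
enum class TransformType {
  kIdentity, kBucket, kTruncate, kYear, kMonth, kDay, kHour, kVoid, kUnknown
};

struct Error {
  std::string message;
};

// Stand-in for the project's Result<T>: either a value or an error.
template <typename T>
using Result = std::variant<T, Error>;

// Classifies a transform, surfacing kUnknown as an error instead of silently
// folding it into one of the boolean answers.
Result<bool> IsTimeTransform(TransformType type) {
  switch (type) {
    case TransformType::kYear:
    case TransformType::kMonth:
    case TransformType::kDay:
    case TransformType::kHour:
      return true;
    case TransformType::kUnknown:
      return Error{"cannot classify an unknown transform"};
    default:
      return false;
  }
}
```

Callers then have to handle the error branch explicitly, which is the point of the suggestion.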

int32_t source_id, const std::shared_ptr<Transform>& transform) const {
// Find the source field name
auto field_result = schema_->FindFieldById(source_id);
std::string_view source_name = "unknown";
Member

What about returning Result and handling both the FindFieldById result and the unknown-name case explicitly?

source_name = field_result.value().value().get().name();
}

return transform->GeneratePartitionName(std::string(source_name));
Member

Suggested change
- return transform->GeneratePartitionName(std::string(source_name));
+ return transform->GeneratePartitionName(source_name);

Transform::GeneratePartitionName accepts string_view.

3 participants