Skip to content

Conversation

@WZhuo
Copy link
Contributor

@WZhuo WZhuo commented Dec 22, 2025

This PR mainly includes the following changes:

  1. Modify NameToIdVisitor so that the visit method can set both name_to_id and id_to_name map simultaneously.
  2. Add identifier_field_ids as a member of Schema to specify identifier_fields.
  3. Add AssignFreshIdVisitor to assign entirely new field IDs to a Schema. This will be used later when creating a new table.

@WZhuo WZhuo force-pushed the id_assigner branch 4 times, most recently from 88f50bf to 1006ae4 Compare December 23, 2025 02:57
}

std::shared_ptr<StructType> AssignFreshIdVisitor::Visit(const StructType& type) const {
auto fresh_ids = type.fields() |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: there is a discussion thread for orders to generate these ids: https://lists.apache.org/thread/kwfy80s2nm6xcpohkrzznsmhqsmz32c3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has some misunderstandings about the id assigner. All the implementations it mentioned, including Java, Python, and Rust, are consistent. The actual implementation is: use pre-order traversal, and a field's ID is assigned when its parent node is visited. This ensures that the IDs of the first-level child nodes in a struct are consecutive.

@WZhuo WZhuo force-pushed the id_assigner branch 2 times, most recently from aeac91e to 10d688d Compare December 23, 2025 08:56
Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on it! Generally LGTM. I've left some minor comments.

Comment on lines 124 to 128
const std::vector<int32_t>& IdentifierFieldIds() const;
Result<std::vector<std::string>> IdentifierFieldNames() const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const std::vector<int32_t>& IdentifierFieldIds() const;
Result<std::vector<std::string>> IdentifierFieldNames() const;
/// \brief Returns field ids of identifier fields.
const std::vector<int32_t>& IdentifierFieldIds() const;
/// \brief Returns canonical field names of identifier fields.
Result<std::vector<std::string>> IdentifierFieldNames() const;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants