Skip to content

Conversation

@a5-stable
Copy link
Contributor

Sample PR for #695

(This pull request introduces a significant refactor to the ancestry library by replacing direct references to primary_key and id with a new method, ancestry_identifier_column

object
else
unscoped_where { |scope| scope.find(object.try(primary_key) || object) }
unscoped_where { |scope| scope.find(object.try(ancestry_identifier_column) || object) }
Copy link
Collaborator

@kbrock kbrock May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, primary_key (:id), vs ancestry_identifier_column (:id) -- I'll keep them the same for now, but want to say this to highlight issues in the rest of this pr.

Copy link
Collaborator

@kbrock kbrock May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and find inherently is a find_by(:id => object.id)
So this should read find_by(ancestry_identifier_column => object.read_attribute(ancestry_identifier_column))

As I'm thinking more about this, this getting unwieldy very quickly.
ods are pretty low.

By the way, this method could be defined as def to_node(id_or_object). The try is saying "if this is an Object, then lets fetch the id, otherwise lets assume it is an Integer id", which works because ruby dropped the Object#id maybe 15 years ago.'

And the unscoped_where is there so STI works.


nodes.each_with_object({}) do |node, arranged|
children = index[node.id]
children = index[node.ancestry_identifier_column]
Copy link
Collaborator

@kbrock kbrock May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
children = index[node.ancestry_identifier_column]
children = index[node.read_attribute(ancestry_identifier_column)]

not sure if we want read_attribute or send

Copy link
Contributor Author

@a5-stable a5-stable May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry — I replaced it too naively... I should've used read_attribute instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id is nice. behind the scenes it does smart stuff. wonder if something similar would work here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your suggestion, I introduced the method ancestry_identifier_value, which returns read_attribute(ancestry_identifier_column).

And id in Rails method implicitly returns the primary key value, so this ancestry_identifier_value can serve a similar purpose if I'm correct.

By using this method, I can avoid repeatedly writing read_attribute or send whenever I need to fetch the value, which helps keep the code more readable.

if !node.sane_ancestor_ids?
raise Ancestry::AncestryIntegrityException, I18n.t("ancestry.invalid_ancestry_column",
:node_id => node.id,
:node_id => node.ancestry_identifier_column,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:node_id => node.ancestry_identifier_column,
:node_id => node.read_attribute(ancestry_identifier_column),

unless exists?(ancestor_id)
raise Ancestry::AncestryIntegrityException, I18n.t("ancestry.reference_nonexistent_node",
:node_id => node.id,
:node_id => node.ancestry_identifier_column,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:node_id => node.ancestry_identifier_column,
:node_id => node.read_attribute(ancestry_identifier_column),

Comment on lines 140 to 142
node.path_ids.zip([nil] + node.path_ids).each do |node_identifier, parent_identifier|
parents[node_identifier] = parent_identifier unless parents.key?(node_identifier)
unless parents[node_identifier] == parent_identifier
Copy link
Collaborator

@kbrock kbrock May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should leave as node_id, parent_id -- otherwise this is pretty verbose. What do you think? (I may change my mind later. honestly asking your opinion)

this will also cut down on churn

Copy link
Contributor Author

@a5-stable a5-stable May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your comment. Sorry, I mixed up parent_identifier and parent_id...
Here's how I'm currently thinking about this:

  • I would not like to change the naming of interface-level fields like parent_id or node_id.

  • These names are compatible with existing behavior, and semantically they make sense as the identifiers used within the context of ancestry gem.

For this reason, I’ve reverted usage of parent_identifier back to parent_id. I'd appreciate your thoughts on this direction.

end
# ... save parent id of this node in parent_ids array if it exists
parent_ids[node.id] = node.parent_id if exists? node.parent_id
parent_ids[node.ancestry_identifier_column] = node.parent_id if exists? node.parent_id
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
parent_ids[node.ancestry_identifier_column] = node.parent_id if exists? node.parent_id
parent_ids[node.read_attribute(ancestry_identifier_column)] = node.parent_id if exists? node.parent_id

feel like these are getting mixed up. parent_id vs parent_identifier.
Maybe you are right and we need to distinguish between the two.

not liking this identifier vs id idea.
Feeling like the path/breadcrumb test may meet your needs better

Comment on lines 267 to 270
# alias for primary_key
def ancestry_identifier_column
primary_key
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea to do this for the short term

# Validate that the ancestors don't include itself
def ancestry_exclude_self
errors.add(:base, I18n.t("ancestry.exclude_self", class_name: self.class.name.humanize)) if ancestor_ids.include?(id)
errors.add(:base, I18n.t("ancestry.exclude_self", class_name: self.class.name.humanize)) if ancestor_ids.include?(ancestry_identifier_column)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you need to introduce ancestry_identifier_value - and any time you see id we use ancestry_identifer_value?

@kbrock
Copy link
Collaborator

kbrock commented May 20, 2025

Yea, this is getting confusing for me.

So parent_id is the id from the parent node, but that is technically the parent's ancestry_identifier_column. So when we have root_id and parent_id, then it is the root_identity field and the parent_identity field.

So all lookups of any nodes in the tree need to change.

I'm still thinking breadcrumbing may be the way you want.

Do you have an example of the code that won't work with the current model?

@kbrock
Copy link
Collaborator

kbrock commented May 20, 2025

@a5-stable would your need be met if we had ancestry with was id/id, but also had identifier_ancestry which had identifier/identifier?

@a5-stable
Copy link
Contributor Author

@kbrock
I sincerely appreciate your careful and patient review....!
Let me explain more about the background, API design, and related details.

Usecase

One Example of the use case I’m considering involves a model that uses Temporal Modeling.

Temporal models store historical data, so multiple records can represent the lifecycle of a single "resource." For example, using a bitemporal model might look like this:

CREATE TABLE employee_department_bitemporal (
    id               SERIAL PRIMARY KEY,       -- Surrogate Key
    bitemporal_id    INT NOT NULL,             -- Identifier for the logical employee
    valid_from       DATE NOT NULL,
    valid_to         DATE NOT NULL,
    transaction_from TIMESTAMP NOT NULL,
    transaction_to   TIMESTAMP NOT NULL,
    UNIQUE (bitemporal_id, valid_from, transaction_from)  
);

In this case, id is unique across each version of the record, while bitemporal_id uniquely identifies the logical employee.

Now, let’s say that we want to add a hierarchical association to this model. In some cases, it makes more sense for the ancestry column to store values like bitemporal_id/bitemporal_id/... instead of id/id/..., so that the hierarchy is consistent across the logical employees, not the history records.

would your need be met if we had ancestry with was id/id, but also had identifier_ancestry which had identifier/identifier?

In the use case above, the coexistence strategy (having both id/id and identifier/identifier) might work, but I wonder if storing the unused id/id hierarchy leads to unnecessary code and data overhead.
Perhaps I am not fully appreciating the intention behind this strategy...

public API such as parent_id

I understand that technically parent_id refers to the value in the parent's ancestry_identifier_column, which may differ from a simple primary key. In the context of the ancestry column, I think this is acceptable as an identifier, so it’s fine to keep familiar terms like parent_id and root_id in the public API.

readability

Based on your #698 (comment), I introduced the method ancestry_identifier_value, which returns read_attribute(ancestry_identifier_column).

By using this method, I think we can avoid repeatedly writing read_attribute or send whenever I need to fetch the value.

@a5-stable a5-stable requested a review from kbrock May 22, 2025 01:03
@kbrock
Copy link
Collaborator

kbrock commented May 22, 2025

Ok, cool concept.

So basically the id field is not your primary key, but rather the bitemporal_id field.
And it is not unique, but rather only unique when giving a date in a valid_from..valid_to range and probably assuming that transaction_to is null.

TANGENT: I'd check if creating your indexes with WHERE transaction_to IS NULL give you a good boost. This index would not be usable when you return records with an actual transaction date in the past, but that feels like an edge case.

So if you used :id as the ancestry key, then the tree would not be valid across a valid date boundary. Think this is a worse case than when I was suggesting this before.

But also, just saying parent_id is not good enough, as you would need to specify the valid date (assuming that transaction date was null). But you would definitely want parent_id to be of the bitemoral_id type.

If you are recording a hierarchy in a point in time, then you want ancestry to be recorded at a point in time. Which seems to work, and would probably work better than closure trees. (not taking off the table, but seems to be easier to version using the ancestry path style)

When you move a node in a tree, you need to update a bunch of records. But in your case, you don't really want to modify any records (other than updating the transaction_to and possibly the valid_to dates). You'll need to ensure that updates in the callback do the necessary modification. Tricky edge case. Hopefully the update strategy system would work for you.

TANGENT: I'd like to moving towards sql transactions rather than updating the records in ruby and sending those across N+1 style. But time has been limited so that may never occur. But I think it may affect you.

TANGENT: Are you writing UPDATE triggers in the database to allow UPDATE statements but have those actually get converted to INSERT statements? May cause chaos for rails, or may make it more elegant.

TANGENT: Wondering if thread local variables ref for transaction_time (default to null) and a default valid_time would work. Then you could keep using the database as is with a default scope pulling the values out of the threaded variables. Not sure if you could go so far as to set the primary key to bitemoral_id, but maybe?

All the find operations need to change. so every scope and update in a callback would be wrong. This is a big change.

I agree that you don't want id/id because those are only valid at a single point in time, and in the future, when you update the path, they may be invalid since the ids may span transaction_time scopes.

TANGENT: look into how the multi tenant solutions solve this scoping issue since the behavior may be similar to what you want. I notice that multi-tenant gem uses the active support wrapper over thread local while acts_as_tenant uses Thread.current.

Do keep in mind, in order to work with STI, the code does throw away default scopes, specifically when fetching a parent, but possibly in other cases, as well. So this may introduce issues depending upon how you implement your solution. A number of people over the years seem to be polar pro dropping scopes vs anti dropping scopes.

Ok, that was a little bit of a ramble. Sorry I don't have more time to better organize the thoughts. Hopefully some of that is useful to you.

This seems like it would require quite a bit of change. Need some time to process. Adding the transaction axis and the valid axis makes this quite interesting. For your domain do you need both, or would transaction date meet your needs? If the latter, it may be able to

@kbrock
Copy link
Collaborator

kbrock commented May 22, 2025

ooh, I've used deleted_at for soft deletes and paper_trail for history. But this is a neat idea - though does look potentially intrusive

@a5-stable
Copy link
Contributor Author

@kbrock
Thank you so much for your thoughtful and in-depth feedback. I really appreciate the time you took to consider this!
Your points are very helpful!

In my domain, we do require both the transaction and valid time axes—not just for logical deletion like discard, but to properly track the lifecycle of a resource as a historical record. (even in unitemporal use cases, we might want bitemporal_id/bitemporal_id as the ancestry key instead of id, to represent logical relationships rather than historical versions).

Also, I’m planning to use a multi-tenant-like approach, perhaps with something like the multi-tenant gem to manage valid and transaction times via thread-local variables. Your thoughts on default scopes and STI behavior were particularly helpful—I'll take extra care there.

Just to clarify, I shared this use case mainly to help illustrate the idea of supporting custom identifier columns for ancestry—not to require the entire bitemporal model to be supported out of the box.

I completely understand that this could lead to significant internal changes, and I’d be happy to help. If you have thoughts on where such a change should start or what kind of interface would make sense, I’d love to hear them and contribute.

Again, thank you for your time and insights.

@kbrock
Copy link
Collaborator

kbrock commented May 27, 2025

even in unitemporal use cases, we might want bitemporal_id/bitemporal_id as the ancestry key instead of id, to represent logical relationships rather than historical versions.

Yea, in your case, I feel id/id is not useful. Once you assign any transaction_to or valid_to, then the ancestry` value will be off.

@a5-stable
Copy link
Contributor Author

@kbrock
Thank you very much for your thoughtful feedback and the time you’ve spent on this!
I fully understand that addressing this properly will require some time and effort, and I’m very willing to collaborate closely to move it forward.
If you have any concerns, suggestions, or ideas about where to begin or how to proceed, I would greatly appreciate your guidance.

@kbrock
Copy link
Collaborator

kbrock commented Jul 17, 2025

@a5-stable I am not sure if this is related or totally tangental, and it is still an idea and not a know commodity, but...

- def parent_id
-   ancestry.split("/").last
- end
+ def parent_id
+   self.class.calculate_parent_id(self, ancestry_column)
+ end
+ def self.calculate_parent_id(rec, col_name)
+   rec.send(col_name).split("/").last
+ end

And further down the line...

def has_ancestry
  # ...
  define_method(:parent_id)
    self.class.calculate_parent_id(self, ancestry_column)
  end
  # ...
end

Does that change the way this PR is implemented?

@a5-stable
Copy link
Contributor Author

@kbrock
Thanks for sharing the idea, I think the change around parent_id looks good overall.
Just to clarify: are you thinking of including that change in this PR, or is it more of a future improvement that's on the roadmap?

Either way, I don't see any conflict between that idea and the current PR, so no problem from my side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants