Abstract all primary key references #698

a5-stable · 2025-05-11T09:34:24Z

Sample PR for #695

（This pull request introduces a significant refactor to the ancestry library by replacing direct references to primary_key and id with a new method, ancestry_identifier_column）

kbrock · 2025-05-20T13:41:00Z

lib/ancestry/class_methods.rb

        object
      else
-        unscoped_where { |scope| scope.find(object.try(primary_key) || object) }
+        unscoped_where { |scope| scope.find(object.try(ancestry_identifier_column) || object) }


here, primary_key (:id), vs ancestry_identifier_column (:id) -- I'll keep them the same for now, but want to say this to highlight issues in the rest of this pr.

and find inherently is a find_by(:id => object.id)
So this should read find_by(ancestry_identifier_column => object.read_attribute(ancestry_identifier_column))

As I'm thinking more about this, this getting unwieldy very quickly.
ods are pretty low.

By the way, this method could be defined as def to_node(id_or_object). The try is saying "if this is an Object, then lets fetch the id, otherwise lets assume it is an Integer id", which works because ruby dropped the Object#id maybe 15 years ago.'

And the unscoped_where is there so STI works.

kbrock · 2025-05-20T13:41:36Z

lib/ancestry/class_methods.rb


      nodes.each_with_object({}) do |node, arranged|
-        children = index[node.id]
+        children = index[node.ancestry_identifier_column]


Suggested change

children = index[node.ancestry_identifier_column]

children = index[node.read_attribute(ancestry_identifier_column)]

not sure if we want read_attribute or send

Oops, sorry — I replaced it too naively... I should've used read_attribute instead.

id is nice. behind the scenes it does smart stuff. wonder if something similar would work here.

Based on your suggestion, I introduced the method ancestry_identifier_value, which returns read_attribute(ancestry_identifier_column).

And id in Rails method implicitly returns the primary key value, so this ancestry_identifier_value can serve a similar purpose if I'm correct.

By using this method, I can avoid repeatedly writing read_attribute or send whenever I need to fetch the value, which helps keep the code more readable.

kbrock · 2025-05-20T13:42:48Z

lib/ancestry/class_methods.rb

          if !node.sane_ancestor_ids?
            raise Ancestry::AncestryIntegrityException, I18n.t("ancestry.invalid_ancestry_column",
-                                                               :node_id => node.id,
+                                                               :node_id => node.ancestry_identifier_column,


Suggested change

:node_id => node.ancestry_identifier_column,

:node_id => node.read_attribute(ancestry_identifier_column),

kbrock · 2025-05-20T13:43:15Z

lib/ancestry/class_methods.rb

            unless exists?(ancestor_id)
              raise Ancestry::AncestryIntegrityException, I18n.t("ancestry.reference_nonexistent_node",
-                                                                 :node_id => node.id,
+                                                                 :node_id => node.ancestry_identifier_column,


Suggested change

:node_id => node.ancestry_identifier_column,

:node_id => node.read_attribute(ancestry_identifier_column),

kbrock · 2025-05-20T13:44:28Z

lib/ancestry/class_methods.rb

+          node.path_ids.zip([nil] + node.path_ids).each do |node_identifier, parent_identifier|
+            parents[node_identifier] = parent_identifier unless parents.key?(node_identifier)
+            unless parents[node_identifier] == parent_identifier


maybe we should leave as node_id, parent_id -- otherwise this is pretty verbose. What do you think? (I may change my mind later. honestly asking your opinion)

this will also cut down on churn

Thank you for your comment. Sorry, I mixed up parent_identifier and parent_id...
Here's how I'm currently thinking about this:

I would not like to change the naming of interface-level fields like parent_id or node_id.

These names are compatible with existing behavior, and semantically they make sense as the identifiers used within the context of ancestry gem.

For this reason, I’ve reverted usage of parent_identifier back to parent_id. I'd appreciate your thoughts on this direction.

kbrock · 2025-05-20T13:46:38Z

lib/ancestry/class_methods.rb

            end
            # ... save parent id of this node in parent_ids array if it exists
-            parent_ids[node.id] = node.parent_id if exists? node.parent_id
+            parent_ids[node.ancestry_identifier_column] = node.parent_id if exists? node.parent_id


Suggested change

parent_ids[node.ancestry_identifier_column] = node.parent_id if exists? node.parent_id

parent_ids[node.read_attribute(ancestry_identifier_column)] = node.parent_id if exists? node.parent_id

feel like these are getting mixed up. parent_id vs parent_identifier.
Maybe you are right and we need to distinguish between the two.

not liking this identifier vs id idea.
Feeling like the path/breadcrumb test may meet your needs better

kbrock · 2025-05-20T13:47:10Z

lib/ancestry/class_methods.rb

+    # alias for primary_key
+    def ancestry_identifier_column
+      primary_key
+    end


good idea to do this for the short term

kbrock · 2025-05-20T13:47:54Z

lib/ancestry/instance_methods.rb

    # Validate that the ancestors don't include itself
    def ancestry_exclude_self
-      errors.add(:base, I18n.t("ancestry.exclude_self", class_name: self.class.name.humanize)) if ancestor_ids.include?(id)
+      errors.add(:base, I18n.t("ancestry.exclude_self", class_name: self.class.name.humanize)) if ancestor_ids.include?(ancestry_identifier_column)


maybe you need to introduce ancestry_identifier_value - and any time you see id we use ancestry_identifer_value?

kbrock · 2025-05-20T15:15:09Z

Yea, this is getting confusing for me.

So parent_id is the id from the parent node, but that is technically the parent's ancestry_identifier_column. So when we have root_id and parent_id, then it is the root_identity field and the parent_identity field.

So all lookups of any nodes in the tree need to change.

I'm still thinking breadcrumbing may be the way you want.

Do you have an example of the code that won't work with the current model?

kbrock · 2025-05-20T16:25:14Z

@a5-stable would your need be met if we had ancestry with was id/id, but also had identifier_ancestry which had identifier/identifier?

a5-stable · 2025-05-21T15:06:42Z

@kbrock
I sincerely appreciate your careful and patient review....!
Let me explain more about the background, API design, and related details.

Usecase

One Example of the use case I’m considering involves a model that uses Temporal Modeling.

Temporal models store historical data, so multiple records can represent the lifecycle of a single "resource." For example, using a bitemporal model might look like this:

CREATE TABLE employee_department_bitemporal (
    id               SERIAL PRIMARY KEY,       -- Surrogate Key
    bitemporal_id    INT NOT NULL,             -- Identifier for the logical employee
    valid_from       DATE NOT NULL,
    valid_to         DATE NOT NULL,
    transaction_from TIMESTAMP NOT NULL,
    transaction_to   TIMESTAMP NOT NULL,
    UNIQUE (bitemporal_id, valid_from, transaction_from)  
);

In this case, id is unique across each version of the record, while bitemporal_id uniquely identifies the logical employee.

Now, let’s say that we want to add a hierarchical association to this model. In some cases, it makes more sense for the ancestry column to store values like bitemporal_id/bitemporal_id/... instead of id/id/..., so that the hierarchy is consistent across the logical employees, not the history records.

would your need be met if we had ancestry with was id/id, but also had identifier_ancestry which had identifier/identifier?

In the use case above, the coexistence strategy (having both id/id and identifier/identifier) might work, but I wonder if storing the unused id/id hierarchy leads to unnecessary code and data overhead.
Perhaps I am not fully appreciating the intention behind this strategy...

public API such as `parent_id`

I understand that technically parent_id refers to the value in the parent's ancestry_identifier_column, which may differ from a simple primary key. In the context of the ancestry column, I think this is acceptable as an identifier, so it’s fine to keep familiar terms like parent_id and root_id in the public API.

readability

Based on your #698 (comment), I introduced the method ancestry_identifier_value, which returns read_attribute(ancestry_identifier_column).

By using this method, I think we can avoid repeatedly writing read_attribute or send whenever I need to fetch the value.

kbrock · 2025-05-22T20:11:17Z

Ok, cool concept.

So basically the id field is not your primary key, but rather the bitemporal_id field.
And it is not unique, but rather only unique when giving a date in a valid_from..valid_to range and probably assuming that transaction_to is null.

TANGENT: I'd check if creating your indexes with WHERE transaction_to IS NULL give you a good boost. This index would not be usable when you return records with an actual transaction date in the past, but that feels like an edge case.

So if you used :id as the ancestry key, then the tree would not be valid across a valid date boundary. Think this is a worse case than when I was suggesting this before.

But also, just saying parent_id is not good enough, as you would need to specify the valid date (assuming that transaction date was null). But you would definitely want parent_id to be of the bitemoral_id type.

If you are recording a hierarchy in a point in time, then you want ancestry to be recorded at a point in time. Which seems to work, and would probably work better than closure trees. (not taking off the table, but seems to be easier to version using the ancestry path style)

When you move a node in a tree, you need to update a bunch of records. But in your case, you don't really want to modify any records (other than updating the transaction_to and possibly the valid_to dates). You'll need to ensure that updates in the callback do the necessary modification. Tricky edge case. Hopefully the update strategy system would work for you.

TANGENT: I'd like to moving towards sql transactions rather than updating the records in ruby and sending those across N+1 style. But time has been limited so that may never occur. But I think it may affect you.

TANGENT: Are you writing UPDATE triggers in the database to allow UPDATE statements but have those actually get converted to INSERT statements? May cause chaos for rails, or may make it more elegant.

TANGENT: Wondering if thread local variables ref for transaction_time (default to null) and a default valid_time would work. Then you could keep using the database as is with a default scope pulling the values out of the threaded variables. Not sure if you could go so far as to set the primary key to bitemoral_id, but maybe?

All the find operations need to change. so every scope and update in a callback would be wrong. This is a big change.

I agree that you don't want id/id because those are only valid at a single point in time, and in the future, when you update the path, they may be invalid since the ids may span transaction_time scopes.

TANGENT: look into how the multi tenant solutions solve this scoping issue since the behavior may be similar to what you want. I notice that multi-tenant gem uses the active support wrapper over thread local while acts_as_tenant uses Thread.current.

Do keep in mind, in order to work with STI, the code does throw away default scopes, specifically when fetching a parent, but possibly in other cases, as well. So this may introduce issues depending upon how you implement your solution. A number of people over the years seem to be polar pro dropping scopes vs anti dropping scopes.

Ok, that was a little bit of a ramble. Sorry I don't have more time to better organize the thoughts. Hopefully some of that is useful to you.

This seems like it would require quite a bit of change. Need some time to process. Adding the transaction axis and the valid axis makes this quite interesting. For your domain do you need both, or would transaction date meet your needs? If the latter, it may be able to

kbrock · 2025-05-22T20:19:27Z

ooh, I've used deleted_at for soft deletes and paper_trail for history. But this is a neat idea - though does look potentially intrusive

a5-stable · 2025-05-26T16:25:47Z

@kbrock
Thank you so much for your thoughtful and in-depth feedback. I really appreciate the time you took to consider this!
Your points are very helpful!

In my domain, we do require both the transaction and valid time axes—not just for logical deletion like discard, but to properly track the lifecycle of a resource as a historical record. (even in unitemporal use cases, we might want bitemporal_id/bitemporal_id as the ancestry key instead of id, to represent logical relationships rather than historical versions).

Also, I’m planning to use a multi-tenant-like approach, perhaps with something like the multi-tenant gem to manage valid and transaction times via thread-local variables. Your thoughts on default scopes and STI behavior were particularly helpful—I'll take extra care there.

Just to clarify, I shared this use case mainly to help illustrate the idea of supporting custom identifier columns for ancestry—not to require the entire bitemporal model to be supported out of the box.

I completely understand that this could lead to significant internal changes, and I’d be happy to help. If you have thoughts on where such a change should start or what kind of interface would make sense, I’d love to hear them and contribute.

Again, thank you for your time and insights.

kbrock · 2025-05-27T21:36:32Z

even in unitemporal use cases, we might want bitemporal_id/bitemporal_id as the ancestry key instead of id, to represent logical relationships rather than historical versions.

Yea, in your case, I feel id/id is not useful. Once you assign any transaction_to or valid_to, then the ancestry` value will be off.

a5-stable · 2025-06-08T15:40:15Z

@kbrock
Thank you very much for your thoughtful feedback and the time you’ve spent on this!
I fully understand that addressing this properly will require some time and effort, and I’m very willing to collaborate closely to move it forward.
If you have any concerns, suggestions, or ideas about where to begin or how to proceed, I would greatly appreciate your guidance.

kbrock · 2025-07-17T16:00:57Z

@a5-stable I am not sure if this is related or totally tangental, and it is still an idea and not a know commodity, but...

- def parent_id
-   ancestry.split("/").last
- end
+ def parent_id
+   self.class.calculate_parent_id(self, ancestry_column)
+ end
+ def self.calculate_parent_id(rec, col_name)
+   rec.send(col_name).split("/").last
+ end

And further down the line...

def has_ancestry
  # ...
  define_method(:parent_id)
    self.class.calculate_parent_id(self, ancestry_column)
  end
  # ...
end

Does that change the way this PR is implemented?

a5-stable · 2025-07-27T02:02:33Z

@kbrock
Thanks for sharing the idea, I think the change around parent_id looks good overall.
Just to clarify: are you thinking of including that change in this PR, or is it more of a future improvement that's on the roadmap?

Either way, I don't see any conflict between that idea and the current PR, so no problem from my side.

Abstract all primary key references

672aa3f

a5-stable mentioned this pull request May 11, 2025

Allow customization of column for ancestry relationships (using non-primary key) #695

Open

kbrock reviewed May 20, 2025

View reviewed changes

a5-stable added 2 commits May 20, 2025 23:28

use parent_id, node_id

6731f91

use ancestry_identifier_value

efde55e

use find_by instead of find

6c8cc10

a5-stable requested a review from kbrock May 22, 2025 01:03

	children = index[node.ancestry_identifier_column]
	children = index[node.read_attribute(ancestry_identifier_column)]

	:node_id => node.ancestry_identifier_column,
	:node_id => node.read_attribute(ancestry_identifier_column),

	parent_ids[node.ancestry_identifier_column] = node.parent_id if exists? node.parent_id
	parent_ids[node.read_attribute(ancestry_identifier_column)] = node.parent_id if exists? node.parent_id

Abstract all primary key references #698

Are you sure you want to change the base?

Abstract all primary key references #698

Uh oh!

Conversation

a5-stable commented May 11, 2025

Uh oh!

kbrock May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kbrock May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kbrock May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

a5-stable May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kbrock May 20, 2025

Choose a reason for hiding this comment

Uh oh!

a5-stable May 20, 2025

Choose a reason for hiding this comment

Uh oh!

kbrock May 20, 2025

Choose a reason for hiding this comment

Uh oh!

kbrock May 20, 2025

Choose a reason for hiding this comment

Uh oh!

kbrock May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

a5-stable May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kbrock May 20, 2025

Choose a reason for hiding this comment

Uh oh!

kbrock May 20, 2025

Choose a reason for hiding this comment

Uh oh!

kbrock May 20, 2025

Choose a reason for hiding this comment

Uh oh!

kbrock commented May 20, 2025

Uh oh!

kbrock commented May 20, 2025

Uh oh!

a5-stable commented May 21, 2025

Usecase

public API such as parent_id

readability

Uh oh!

kbrock commented May 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kbrock commented May 22, 2025

Uh oh!

a5-stable commented May 26, 2025

Uh oh!

kbrock commented May 27, 2025

Uh oh!

a5-stable commented Jun 8, 2025

Uh oh!

kbrock commented Jul 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

a5-stable commented Jul 27, 2025

Uh oh!

Reviewers

Assignees

Labels

kbrock May 20, 2025 •

edited

Loading

kbrock May 20, 2025 •

edited

Loading

kbrock May 20, 2025 •

edited

Loading

a5-stable May 20, 2025 •

edited

Loading

kbrock May 20, 2025 •

edited

Loading

a5-stable May 20, 2025 •

edited

Loading

public API such as `parent_id`

kbrock commented May 22, 2025 •

edited

Loading

kbrock commented Jul 17, 2025 •

edited

Loading