Skip to content

Mark a table as root yourself for subsetting #3227

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Yarn-e opened this issue Feb 6, 2025 · 8 comments
Open

Mark a table as root yourself for subsetting #3227

Yarn-e opened this issue Feb 6, 2025 · 8 comments
Labels
enhancement New feature or request

Comments

@Yarn-e
Copy link
Contributor

Yarn-e commented Feb 6, 2025

Is your feature request related to a problem?

When subsetting, i have a table that is the source where most of my other tables rely on. But Neosync only recognises tables as root if they comply with the following rule:

This is a Root table that only has foreign key references to children tables.

Describe the solution you'd like

Have the option to mark a table as root, even if it doesn't only have foreign key relations.

Describe alternatives you've considered

Creating subsets for other tables, but since we can't do joins, it is hard to create a dynamic subset on properties on the root table (for example: Trial subscribers that are active the last 2 weeks)

Additional context

N / A

@Yarn-e Yarn-e added the enhancement New feature or request label Feb 6, 2025
@nickzelei
Copy link
Member

nickzelei commented Feb 7, 2025

Just so I understand, you have a table that other tables rely on, but they don't have a foreign key to that table? Does that mean that the constraint is implicit?

Do those tables have references to that table in some way? I'm wondering if virtual foreign keys would work for you here to define that application-level dependency if it's not defined by your database.

What I'm trying to figure out here is that if we add an option to mark a table as a root, what does neosync actually do with that information if it doesn't know what actually depends on it. That's why I'm thinking virtual foreign keys come in to play here..But maybe there is something I'm missing for your usecase.

@Yarn-e
Copy link
Contributor Author

Yarn-e commented Feb 7, 2025

TIL: Virtual keys! I have some tables that indeed don't have a dependency defined by the database, but that is not the main problem i'm facing.

My 'Starting point' table has foreign keys to other tables as well as other tables have foreign keys to it. I want to apply a subset filter to this table, and all the tables related to it should apply that subset filter to it. But it looks like my subset filter is not propagated to the tables that have a foreign key to it, as my 'Starting point' table has foreign keys too to other tables.

I'm not sure if it is possible to indicate a sort of hierarchy, where i would indicate this is the first table, and these tables should come after it?

Heres a dummy example, that hopefully sketch my usecase a bit better (The doted line without an arrow indicates a dependency that is not defined by the database)

classDiagram 

class StartingTable
class ReliesOnStartingTable
class ReliesOnStartingTableToo
class CodeRelationToStartingTable
class StartingTableReliesOnMe
class ReliesOnCodeRelation

StartingTable <.. ReliesOnStartingTable
StartingTable <.. ReliesOnStartingTableToo
StartingTableReliesOnMe <.. StartingTable
StartingTable .. CodeRelationToStartingTable
CodeRelationToStartingTable <.. ReliesOnCodeRelation
Loading

What i want to achieve is the following flowchart, where i define one subset on StartingTable (WHERE status = Trial) and that all other tables only take the ID's that are left over in the starting table.

flowchart LR  

StartingTable
ReliesOnStartingTable
ReliesOnStartingTableToo
CodeRelationToStartingTable
StartingTableReliesOnMe
ReliesOnCodeRelation

StartingTable -->ReliesOnStartingTable
StartingTable --> ReliesOnStartingTableToo
StartingTable -->|Virtual Foreign key| CodeRelationToStartingTable
StartingTable --> StartingTableReliesOnMe
CodeRelationToStartingTable --> ReliesOnCodeRelation
Loading

@nickzelei
Copy link
Member

@Yarn-e - We support this today by with the boolean flag subset using foreign key constraints.

Image

It defaults to enabled. Do you have this enable in your job? And it's not working how you expect?

We follow foreign key constraints when that feature is enabled and the downstream tables will be properly subset by your StartingTable query.
The only one that won't be subset today would be the StartingTableReliesOnMe as it is effectively a parent table. So if you wanted that one to be subset, it would also need a subset query applied to it...which would then filter StartingTable and so on.

Also apologies for the late reply here. My github notifications are hard to track. I appreciate the detailed write up!

@Yarn-e
Copy link
Contributor Author

Yarn-e commented Feb 25, 2025

Thanks for the answer @nickzelei!

I was wondering what the limitation is that a subset of a child table can't affect parent tables? Is this a design choice, or rather a current limitation in the system?

This would mean that i can't create a dynamic subset of data because the parent tables would need to know the id's of the child table, as you can't do a join when subsetting.

@nickzelei
Copy link
Member

nickzelei commented Feb 26, 2025

Yeah it's a current design choice and system limitation.

Foreign Key constraints are effectively directional graphs (DAGs) that follow a parent->child hierarchy.

We've had a few different users (as well as a lot of internal discussions) ask about changing our subsetting logic to treat any table that has a filter applied to it as the root node and ignoring the directionality of the foreign key constraints and instead treat them as graph edges.

This is more complex in practice because today when a sync happens, we build the run order based on the foreign key constraints. So all parent tables run prior to their children. This is really simple from Neosync's perspective because if a subset is applied, when the children are synced, we add in the parent's where clause as a join in the select.

If we need to actually filter the parent by the child, the run order doesn't necessarily change, but the query does, and it increases in complexity even further if you have multiple subsets that apply.


Long story short, it's definitely possible to update Neosync to subset in this way, we just haven't tackled it yet as we started with the simpler approach (I say simple lightly here 😅 ) first.

@nickzelei
Copy link
Member

In practice if you are filtering a table that has parents, it is pretty easy to add a subset query to the parent table (maybe not in your case?) to further filter the parent down to match more closely what you would expect from the child.

Is it possible for you to add a query to the parent which would then filter the child for you instead of having the subset on the child? Or perhaps have a query on both of those tables so they are filtered down to what you want?

@Yarn-e
Copy link
Contributor Author

Yarn-e commented Feb 27, 2025

Sadly it isn't possible in our case. In theory we could not apply any subsets to the parent table, and that would result in not too much data being added to our destination table. But there is one relation that the child table has to their parent (and that parent has a lot of children) that isn't possible to create the subset query we would want.

Thanks for giving this explanation! I'll keep an eye out on the roadmap in case it would come up there!

@nickzelei
Copy link
Member

Hm, okay. Maybe I'm missing something here because you were asking about filtering the parent anyways via the child's subset. But it sounds like if you did that it would break other children of that parent anyways. If I'm understanding the makeup correctly. :)

Either way, I hope this has helped you or you were at least able to get a closer subset of the data you wanted with the addition of the virtual foreign keys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants