-
Notifications
You must be signed in to change notification settings - Fork 155
Mark a table as root yourself for subsetting #3227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Just so I understand, you have a table that other tables rely on, but they don't have a foreign key to that table? Does that mean that the constraint is implicit? Do those tables have references to that table in some way? I'm wondering if virtual foreign keys would work for you here to define that application-level dependency if it's not defined by your database. What I'm trying to figure out here is that if we add an option to mark a table as a root, what does neosync actually do with that information if it doesn't know what actually depends on it. That's why I'm thinking virtual foreign keys come in to play here..But maybe there is something I'm missing for your usecase. |
TIL: Virtual keys! I have some tables that indeed don't have a dependency defined by the database, but that is not the main problem i'm facing. My 'Starting point' table has foreign keys to other tables as well as other tables have foreign keys to it. I want to apply a subset filter to this table, and all the tables related to it should apply that subset filter to it. But it looks like my subset filter is not propagated to the tables that have a foreign key to it, as my 'Starting point' table has foreign keys too to other tables. I'm not sure if it is possible to indicate a sort of hierarchy, where i would indicate this is the first table, and these tables should come after it? Heres a dummy example, that hopefully sketch my usecase a bit better (The doted line without an arrow indicates a dependency that is not defined by the database) classDiagram
class StartingTable
class ReliesOnStartingTable
class ReliesOnStartingTableToo
class CodeRelationToStartingTable
class StartingTableReliesOnMe
class ReliesOnCodeRelation
StartingTable <.. ReliesOnStartingTable
StartingTable <.. ReliesOnStartingTableToo
StartingTableReliesOnMe <.. StartingTable
StartingTable .. CodeRelationToStartingTable
CodeRelationToStartingTable <.. ReliesOnCodeRelation
What i want to achieve is the following flowchart, where i define one subset on StartingTable ( flowchart LR
StartingTable
ReliesOnStartingTable
ReliesOnStartingTableToo
CodeRelationToStartingTable
StartingTableReliesOnMe
ReliesOnCodeRelation
StartingTable -->ReliesOnStartingTable
StartingTable --> ReliesOnStartingTableToo
StartingTable -->|Virtual Foreign key| CodeRelationToStartingTable
StartingTable --> StartingTableReliesOnMe
CodeRelationToStartingTable --> ReliesOnCodeRelation
|
@Yarn-e - We support this today by with the boolean flag subset using foreign key constraints. ![]() It defaults to enabled. Do you have this enable in your job? And it's not working how you expect? We follow foreign key constraints when that feature is enabled and the downstream tables will be properly subset by your Also apologies for the late reply here. My github notifications are hard to track. I appreciate the detailed write up! |
Thanks for the answer @nickzelei! I was wondering what the limitation is that a subset of a child table can't affect parent tables? Is this a design choice, or rather a current limitation in the system? This would mean that i can't create a dynamic subset of data because the parent tables would need to know the id's of the child table, as you can't do a join when subsetting. |
Yeah it's a current design choice and system limitation. Foreign Key constraints are effectively directional graphs (DAGs) that follow a parent->child hierarchy. We've had a few different users (as well as a lot of internal discussions) ask about changing our subsetting logic to treat any table that has a filter applied to it as the root node and ignoring the directionality of the foreign key constraints and instead treat them as graph edges. This is more complex in practice because today when a sync happens, we build the run order based on the foreign key constraints. So all parent tables run prior to their children. This is really simple from Neosync's perspective because if a subset is applied, when the children are synced, we add in the parent's where clause as a join in the select. If we need to actually filter the parent by the child, the run order doesn't necessarily change, but the query does, and it increases in complexity even further if you have multiple subsets that apply. Long story short, it's definitely possible to update Neosync to subset in this way, we just haven't tackled it yet as we started with the simpler approach (I say simple lightly here 😅 ) first. |
In practice if you are filtering a table that has parents, it is pretty easy to add a subset query to the parent table (maybe not in your case?) to further filter the parent down to match more closely what you would expect from the child. Is it possible for you to add a query to the parent which would then filter the child for you instead of having the subset on the child? Or perhaps have a query on both of those tables so they are filtered down to what you want? |
Sadly it isn't possible in our case. In theory we could not apply any subsets to the parent table, and that would result in not too much data being added to our destination table. But there is one relation that the child table has to their parent (and that parent has a lot of children) that isn't possible to create the subset query we would want. Thanks for giving this explanation! I'll keep an eye out on the roadmap in case it would come up there! |
Hm, okay. Maybe I'm missing something here because you were asking about filtering the parent anyways via the child's subset. But it sounds like if you did that it would break other children of that parent anyways. If I'm understanding the makeup correctly. :) Either way, I hope this has helped you or you were at least able to get a closer subset of the data you wanted with the addition of the virtual foreign keys. |
Is your feature request related to a problem?
When subsetting, i have a table that is the source where most of my other tables rely on. But Neosync only recognises tables as root if they comply with the following rule:
Describe the solution you'd like
Have the option to mark a table as root, even if it doesn't only have foreign key relations.
Describe alternatives you've considered
Creating subsets for other tables, but since we can't do joins, it is hard to create a dynamic subset on properties on the root table (for example: Trial subscribers that are active the last 2 weeks)
Additional context
N / A
The text was updated successfully, but these errors were encountered: