Topological level and shortest dependency path of dependency tree (directed acyclic graph) #996
What is the problem / what does the code in this PR do
The dependency structure forms a directed acyclic graph (DAG), allowing us to define a topological level for each node, as implemented in #896. This new PR extends that by also computing the shortest dependency path within the DAG:
- Topological level: `level(node) = 1 + max(level(dep))`, taken over every `dep` in the node's `depends_on`. This is the length of the longest path from any source node to the given node. It is useful for understanding how late a node occurs in the dependency chain.
- Shortest dependency path: `shortest(node) = 1 + min(shortest(dep))`. This is the minimum number of steps from a source node to the given node. It identifies the shallowest point at which the node joins the dependency chain.

By combining both metrics, we can better analyze the structure of the dependency tree. In particular, we can identify key nodes that act as "connectors" or "cut points": if we split the tree at such a node, one of the resulting subtrees contains all nodes that depend on it, capturing a full dependency closure.
Such a node is one whose topological level and shortest dependency path are both unique among all nodes.
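A minimal Python sketch of the two recursions and the uniqueness criterion follows. It assumes that the dependency graph is a plain dict mapping each data_type to its `depends_on` list, and that source nodes (empty `depends_on`) start at 0. The base value, the data_type names and the helpers `levels` / `connector_candidates` are hypothetical and are not the PR's code. It visits nodes in topological order using the standard-library `graphlib`. It then reports a node as a connector candidate only if no other node shares its level and no other node shares its shortest path.

```python
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Tuple

# Hypothetical representation: data_type -> list of its depends_on.
Graph = Dict[str, List[str]]


def levels(graph: Graph) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (topological level, shortest dependency path) for every node.

    Source nodes (empty depends_on) are assumed to start at 0.
    """
    longest: Dict[str, int] = {}
    shortest: Dict[str, int] = {}
    # static_order() yields each node only after all of its dependencies.
    for node in TopologicalSorter(graph).static_order():
        deps = graph.get(node, [])
        if not deps:
            longest[node] = shortest[node] = 0
        else:
            longest[node] = 1 + max(longest[d] for d in deps)
            shortest[node] = 1 + min(shortest[d] for d in deps)
    return longest, shortest


def connector_candidates(graph: Graph) -> List[str]:
    """Nodes whose level and shortest path are each shared by no other node."""
    longest, shortest = levels(graph)
    level_counts = Counter(longest.values())
    short_counts = Counter(shortest.values())
    return sorted(
        n for n in longest
        if level_counts[longest[n]] == 1 and short_counts[shortest[n]] == 1
    )


# Hypothetical data_types; peak_positions also depends on peaks directly.
graph = {
    "raw": [],
    "peaks": ["raw"],
    "peak_basics": ["peaks"],
    "peak_positions": ["peaks", "peak_basics"],
    "events": ["peak_basics", "peak_positions"],
}
longest, shortest = levels(graph)
print(connector_candidates(graph))  # ['events', 'peaks', 'raw']
```

In this toy graph every node has a distinct level. However, `peak_basics` and `peak_positions` share a shortest path of 2 because `peak_positions` also reaches `peaks` directly, so neither is reported. `peaks` is reported because every later data_type goes through it.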
Previously, in reprocessing, only data_types with a single `depends_on` were allowed in per-chunk storage. Now we allow multiple `depends_on` if all of them eventually depend on a single data_type and none of the intermediate data_types are stored.
Can you briefly describe how it works?
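The relaxed reprocessing rule can be sketched as follows, using the same dict-of-`depends_on` representation. The names `ancestors`, `single_root` and the `stored` set are hypothetical and do not come from the PR. The function searches for one ancestor data_type such that every direct `depends_on` of the target either is that ancestor or eventually depends on it. It rejects the candidate if any data_type strictly between it and the target is stored. When several ancestors qualify, it picks the one with the fewest intermediates, which is an illustrative choice. With a single `depends_on`, that dependency qualifies with no intermediates, so the previous rule falls out as a special case.

```python
from typing import Dict, List, Optional, Set

# Hypothetical representation: data_type -> list of its depends_on.
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, node: str) -> Set[str]:
    """All data_types that `node` depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = list(graph.get(node, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


def single_root(graph: Graph, target: str, stored: Set[str]) -> Optional[str]:
    """Return the data_type that all of `target`'s depends_on funnel into,
    with no stored data_type in between, or None if there is none."""
    deps = graph.get(target, [])
    if not deps:
        return None  # a source data_type has nothing to funnel into
    upstream = ancestors(graph, target)
    best: Optional[str] = None
    best_count = -1
    for root in sorted(upstream):
        # Every direct dependency must be the root or eventually depend on it.
        if not all(d == root or root in ancestors(graph, d) for d in deps):
            continue
        # Intermediates: ancestors of target that themselves depend on root.
        intermediates = {n for n in upstream if root in ancestors(graph, n)}
        if intermediates & stored:
            continue  # a stored intermediate breaks the rule
        # Prefer the nearest qualifying root (fewest intermediates).
        if best is None or len(intermediates) < best_count:
            best, best_count = root, len(intermediates)
    return best


g = {"raw": [], "a": ["raw"], "b": ["a"], "c": ["a"], "target": ["b", "c"]}
print(single_root(g, "target", stored=set()))  # a
print(single_root(g, "target", stored={"b"}))  # None
```

In the toy graph, `target` has two `depends_on` (`b` and `c`) that both funnel into `a`. It is therefore allowed unless `b` or `c` is stored.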
Can you give a minimal working example (or illustrate with a figure)?
Please include the following if applicable:
Please make sure that all automated tests have passed before asking for a review (you can save the PR as a draft otherwise).