-
Notifications
You must be signed in to change notification settings - Fork 5
Add '--hide-acyclic' option to hide vertices not part of a cycle #32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
0250467
1f3051c
45c261e
8e76df0
7c5978b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,10 @@ | ||
| Changelog | ||
| ========= | ||
|
|
||
| latest | ||
| ---------------- | ||
| * Add --hide-acyclic flag. | ||
|
|
||
| 2.2 (2025-12-12) | ||
| ---------------- | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,14 +1,15 @@ | ||
| from collections.abc import Set | ||
| from collections.abc import Callable | ||
| import itertools | ||
|
|
||
| import grimp | ||
| from impulse import ports, dotfile | ||
| from impulse import ports, dotfile, graph | ||
|
|
||
|
|
||
| def draw_graph( | ||
| module_name: str, | ||
| show_import_totals: bool, | ||
| show_cycle_breakers: bool, | ||
| hide_acyclic: bool, | ||
| sys_path: list[str], | ||
| current_directory: str, | ||
| get_top_level_package: Callable[[str], str], | ||
|
|
@@ -21,6 +22,7 @@ def draw_graph( | |
| module_name: the package or subpackage name of any importable Python package. | ||
| show_import_totals: whether to label the arrows with the total number of imports they represent. | ||
| show_cycle_breakers: marks a set of dependencies that, if removed, would make the graph acyclic. | ||
| hide_acyclic: whether to hide submodules that are not part of a cycle. | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will the system behave sensibly when both We would hope that there will never be an overlap between cycle breakers and acyclic modules. But the two algorithms are currently very separate. Note also that I wonder if it would be a good idea to remove all the acyclic modules first, then run
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point! I hadn't really thought about the quality of the result from I haven't looked into the heuristic, but even if it could end up nominating a cycle breaker involving an acyclic module, that edge isn't breaking any cycles by definition, so the remaining cycle breakers would still be sufficient to make the graph acyclic. That said, it makes a lot more sense to run the cycle breaker nomination on the reduced graph, as you suggested. Since removing acyclic vertices gives us strongly connected components for free, we could even run the nomination separately on each SCC. That should make it faster and also prevent it from nominating unnecessary edges between SCCs. |
||
| sys_path: the sys.path list (or a test double). | ||
| current_directory: the current working directory. | ||
| get_top_level_package: the function to retrieve the top level package name. This will usually be the first part | ||
|
|
@@ -35,23 +37,46 @@ def draw_graph( | |
| top_level_package = get_top_level_package(module_name) | ||
| grimp_graph = build_graph(top_level_package) | ||
|
|
||
| dot = _build_dot(grimp_graph, module_name, show_import_totals, show_cycle_breakers) | ||
| dot = _build_dot( | ||
| grimp_graph, | ||
| module_name, | ||
| show_import_totals, | ||
| show_cycle_breakers, | ||
| hide_acyclic, | ||
| ) | ||
|
|
||
| viewer.view(dot) | ||
|
|
||
|
|
||
| class _DotGraphBuildStrategy: | ||
| def build(self, module_name: str, grimp_graph: grimp.ImportGraph) -> dotfile.DotGraph: | ||
| def build( | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I might be missing something, but it feels to me like it would be a better separation of concerns to make
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As it stands, we still use
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's come back to this once we've figured out the Grimp API. |
||
| self, | ||
| module_name: str, | ||
| grimp_graph: grimp.ImportGraph, | ||
| hide_acyclic: bool, | ||
| ) -> dotfile.DotGraph: | ||
| children = grimp_graph.find_children(module_name) | ||
|
|
||
| self.prepare_graph(grimp_graph, children) | ||
|
|
||
| presentation_graph = graph.DirectedGraphWithoutLoops.from_adjacency_condition( | ||
| vertices=children, | ||
| is_adjacent=lambda upstream, downstream: self.has_edge( | ||
| grimp_graph, upstream, downstream | ||
| ), | ||
| ) | ||
|
|
||
| if hide_acyclic: | ||
| presentation_graph = presentation_graph.remove_acyclic_vertices() | ||
|
|
||
| dot = dotfile.DotGraph(title=module_name, concentrate=self.should_concentrate()) | ||
| for child in children: | ||
| dot.add_node(child) | ||
| for upstream, downstream in itertools.permutations(children, r=2): | ||
| if edge := self.build_edge(grimp_graph, upstream, downstream): | ||
| dot.add_edge(edge) | ||
|
|
||
| for vertex in presentation_graph.vertices: | ||
| dot.add_node(vertex) | ||
|
|
||
| for upstream, downstream in presentation_graph.iter_edges(): | ||
| dot_edge = self.build_edge(grimp_graph, upstream, downstream) | ||
| dot.add_edge(dot_edge) | ||
|
|
||
| return dot | ||
|
|
||
|
|
@@ -61,9 +86,12 @@ def should_concentrate(self) -> bool: | |
| def prepare_graph(self, grimp_graph: grimp.ImportGraph, children: Set[str]) -> None: | ||
| pass | ||
|
|
||
| def has_edge(self, grimp_graph: grimp.ImportGraph, upstream: str, downstream: str) -> bool: | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should change this to
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree, that fits better! |
||
| raise NotImplementedError | ||
|
|
||
| def build_edge( | ||
| self, grimp_graph: grimp.ImportGraph, upstream: str, downstream: str | ||
| ) -> dotfile.Edge | None: | ||
| ) -> dotfile.Edge: | ||
| raise NotImplementedError | ||
|
|
||
|
|
||
|
|
@@ -74,12 +102,13 @@ def prepare_graph(self, grimp_graph: grimp.ImportGraph, children: Set[str]) -> N | |
| for child in children: | ||
| grimp_graph.squash_module(child) | ||
|
|
||
| def has_edge(self, grimp_graph: grimp.ImportGraph, upstream: str, downstream: str) -> bool: | ||
| return grimp_graph.direct_import_exists(importer=downstream, imported=upstream) | ||
|
|
||
| def build_edge( | ||
| self, grimp_graph: grimp.ImportGraph, upstream: str, downstream: str | ||
| ) -> dotfile.Edge | None: | ||
| if grimp_graph.direct_import_exists(importer=downstream, imported=upstream): | ||
| return dotfile.Edge(source=downstream, destination=upstream) | ||
| return None | ||
| ) -> dotfile.Edge: | ||
| return dotfile.Edge(source=downstream, destination=upstream) | ||
|
|
||
|
|
||
| class _ImportExpressionBuildStrategy(_DotGraphBuildStrategy): | ||
|
|
@@ -128,31 +157,32 @@ def _get_self_or_ancestor(candidate: str, ancestors: Set[str]) -> str | None: | |
| return ancestor | ||
| return None | ||
|
|
||
| def has_edge(self, grimp_graph: grimp.ImportGraph, upstream: str, downstream: str) -> bool: | ||
| return grimp_graph.direct_import_exists( | ||
| importer=downstream, imported=upstream, as_packages=True | ||
| ) | ||
|
|
||
| def build_edge( | ||
| self, grimp_graph: grimp.ImportGraph, upstream: str, downstream: str | ||
| ) -> dotfile.Edge | None: | ||
| if grimp_graph.direct_import_exists( | ||
| importer=downstream, imported=upstream, as_packages=True | ||
| ): | ||
| if self.show_import_totals: | ||
| number_of_imports = self._count_imports_between_packages( | ||
| grimp_graph, importer=downstream, imported=upstream | ||
| ) | ||
| label = str(number_of_imports) | ||
| else: | ||
| label = "" | ||
|
|
||
| if self.show_cycle_breakers: | ||
| assert self.cycle_breakers is not None | ||
| is_cycle_breaker = (downstream, upstream) in self.cycle_breakers | ||
| emphasized = is_cycle_breaker | ||
| else: | ||
| emphasized = False | ||
|
|
||
| return dotfile.Edge( | ||
| source=downstream, destination=upstream, label=label, emphasized=emphasized | ||
| ) -> dotfile.Edge: | ||
| if self.show_import_totals: | ||
| number_of_imports = self._count_imports_between_packages( | ||
| grimp_graph, importer=downstream, imported=upstream | ||
| ) | ||
| return None | ||
| label = str(number_of_imports) | ||
| else: | ||
| label = "" | ||
|
|
||
| if self.show_cycle_breakers: | ||
| assert self.cycle_breakers is not None | ||
| is_cycle_breaker = (downstream, upstream) in self.cycle_breakers | ||
| emphasized = is_cycle_breaker | ||
| else: | ||
| emphasized = False | ||
|
|
||
| return dotfile.Edge( | ||
| source=downstream, destination=upstream, label=label, emphasized=emphasized | ||
| ) | ||
|
|
||
| @staticmethod | ||
| def _count_imports_between_packages( | ||
|
|
@@ -183,6 +213,7 @@ def _build_dot( | |
| module_name: str, | ||
| show_import_totals: bool, | ||
| show_cycle_breakers: bool, | ||
| hide_acyclic: bool, | ||
| ) -> dotfile.DotGraph: | ||
| strategy: _DotGraphBuildStrategy | ||
| if show_import_totals or show_cycle_breakers: | ||
|
|
@@ -194,4 +225,4 @@ def _build_dot( | |
| else: | ||
| strategy = _ModuleSquashingBuildStrategy() | ||
|
|
||
| return strategy.build(module_name, grimp_graph) | ||
| return strategy.build(module_name, grimp_graph, hide_acyclic) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,104 @@ | ||
| import dataclasses | ||
| import itertools | ||
| from typing import Callable, Generic, Iterator, Mapping, TypeVar | ||
|
|
||
| V = TypeVar("V") # type for graph vertices | ||
|
|
||
|
|
||
| @dataclasses.dataclass(frozen=True) | ||
| class DirectedGraphWithoutLoops(Generic[V]): | ||
|
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The commit message touches on this, but to expand: we need a graph structure to work on when removing vertices and edges that shouldn't be shown. There are already two graph structures around, but I don't think either fits the bill:
That's why I've added this small intermediate structure for this, but I'm open to other suggestions!
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm concerned we're introducing unnecessary maintenance overhead here. Ideally, Impulse should be a fairly light / UI-focused tool that delegates complex algorithms to Grimp. If we can't find these in a straightforward way then we should push that down into Grimp. I'd be interested in your thoughts about the following:
Re. networkx dependency, I think it would be fine to add it as dependency temporarily for this PR, so we can then have passing tests. Then we could separately implement it in Grimp and then switch out the dependency.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see the concern and had actually considered suggesting putting this into Grimp, but I wasn't sure the API would feel natural or if it really had a general use case. Now that you mention it though, I'm happy to explore ways to sensibly abstract it for For the module-squashing strategy, where every submodule in the For the unsquashed version, I see two options:
With the second option, the virtual "vertices" and "edges" the algorithm would run on are essentially those of the partially squashed graph, so it feels much cleaner to me to go with option one, where we would just create that graph and operate directly on it. I'm not sure if might be slower, though. Either way, it seems like we would need to compute dependency on the submodule level by inspecting imports of their children exactly once, which feels like the most expensive part. That would make it roughly equivalent, but I know too little about Grimp to be confident this makes sense. What do you think?
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for this and for the video call earlier, just noting for the record. We decided that the next step would be for you to draft some docs / API / Python tests for what's needed in Grimp in order to support this - probably something for getting strongly connected components for the children of a given package. Separately I've been experimenting with an interactive mode for Import Linter, which is where this functionality may eventually end up. |
||
| vertices: frozenset[V] | ||
| _adjacency_map: Mapping[V, set[V]] | ||
|
|
||
| @classmethod | ||
| def from_adjacency_condition(cls, vertices: set[V], is_adjacent: Callable[[V, V], bool]): | ||
| adjacency_map = { | ||
| from_vertex: { | ||
| to_vertex | ||
| for to_vertex in vertices | ||
| if from_vertex != to_vertex and is_adjacent(from_vertex, to_vertex) | ||
| } | ||
| for from_vertex in vertices | ||
| } | ||
| return cls(vertices=frozenset(vertices), _adjacency_map=adjacency_map) | ||
|
|
||
| def iter_edges(self) -> Iterator[tuple[V, V]]: | ||
| for from_vertex in self._adjacency_map: | ||
| for to_vertex in self._adjacency_map[from_vertex]: | ||
| yield from_vertex, to_vertex | ||
|
|
||
| def remove_vertices(self, vertices_to_remove: set[V]) -> "DirectedGraphWithoutLoops[V]": | ||
|
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We only use this to remove acyclic vertices, but I think it would also be very useful for this issue, and I'm happy to submit a follow-up PR for it. |
||
| new_vertices = frozenset(self.vertices - vertices_to_remove) | ||
| return self.__class__( | ||
| vertices=new_vertices, | ||
| _adjacency_map={ | ||
| from_vertex: self._adjacency_map[from_vertex] - vertices_to_remove | ||
| for from_vertex in new_vertices | ||
| }, | ||
| ) | ||
|
|
||
| def find_acyclic_vertices(self) -> set[V]: | ||
|
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Removing acyclic vertices in a directed graph amounts to removing strongly connected components of size 1. I suspect identifying them is an equally hard problem as identifying all strongly connected components in a graph. For the latter, Tarjan's algorithm seems to be the standard approach and runs in linear time. There are also very similar algorithms by Kosaraju and Dijkstra. Reimplementing this here doesn’t feel ideal, but I haven’t found a usable Python implementation. py-tarjan doesn’t seem to be maintained, and while there’s an implementation in NetworkX, I don't think you'd want this as a dependency. A brute-force DFS would make the code simpler, but is quite inefficient. The graphs people want to display probably aren't huge, but it still doesn't feel ideal. |
||
| """ | ||
| Find all vertices that are not part of any cycle. | ||
|
|
||
| This function uses Tarjan's strongly connected components algorithm. The | ||
| algorithm performs a single depth first search of the graph and groups | ||
| vertices into strongly connected components (SCCs), where each vertex in | ||
| an SCC is reachable from every other vertex in the same SCC. | ||
|
|
||
| Under the assumption that the graph contains no self loops, a vertex is | ||
| not part of a cycle if and only if the SCC it is part of contains no | ||
| other vertices. | ||
|
|
||
| Returns: | ||
| A set of vertices that are not part of any cycle. | ||
| """ | ||
| # Vertices are assigned indices in the order they are encountered | ||
| index_generator: Iterator[int] = itertools.count() | ||
| index_map: dict[V, int] = {} | ||
| lowest_reachable_index: dict[V, int] = {} | ||
|
|
||
| active_stack: list[V] = [] | ||
| # Mirror of active_stack for O(1) membership checks | ||
| active_stack_set: set[V] = set() | ||
|
|
||
| acyclic_vertices: set[V] = set() | ||
|
|
||
| def visit(v: V) -> None: | ||
| index = next(index_generator) | ||
|
|
||
| index_map[v] = index | ||
| lowest_reachable_index[v] = index | ||
|
|
||
| active_stack.append(v) | ||
| active_stack_set.add(v) | ||
|
|
||
| for w in self._adjacency_map.get(v, {}): | ||
| if w not in index_map: | ||
| visit(w) | ||
| lowest_reachable_index[v] = min( | ||
| lowest_reachable_index[v], lowest_reachable_index[w] | ||
| ) | ||
| elif w in active_stack_set: | ||
| lowest_reachable_index[v] = min(lowest_reachable_index[v], index_map[w]) | ||
|
|
||
| if lowest_reachable_index[v] == index_map[v]: | ||
| scc: list[V] = [] | ||
| while True: | ||
| w = active_stack.pop() | ||
| active_stack_set.remove(w) | ||
| scc.append(w) | ||
| if w == v: | ||
| break | ||
|
|
||
| if len(scc) == 1: | ||
| acyclic_vertices.update(scc) | ||
|
|
||
| for vertex in self.vertices: | ||
| if vertex not in index_map: | ||
| visit(vertex) | ||
|
|
||
| return acyclic_vertices | ||
|
|
||
| def remove_acyclic_vertices(self) -> "DirectedGraphWithoutLoops[V]": | ||
| return self.remove_vertices(self.find_acyclic_vertices()) | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like we should indicate in the UI that this was passed somehow - otherwise it just says 'Import graph for ' but it might be missing things.
What about a subtitle that says something like, "With acyclic modules hidden"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good point!