Skip to content

CSTTransformer can change node without actual changes. #1444

@sandreenko

Description

@sandreenko

TLDR question: how can I detect when CSTTransformer does not modify the tree?

Summary

When using CSTTransformer, the updated_node parameter in on_leave() may be a different object than original_node even when no changes were made to the tree. This is by design — all nodes unconditionally create new objects in _visit_and_replace_children. However, this behavior:

  1. May surprise users who expect updated_node == original_node when nothing changed
  2. Is not clearly documented
  3. Could be optimized to avoid unnecessary object creation (not 100% sure because it would add a condition)

Minimal Reproduction

import libcst as cst

class DebugTransformer(cst.CSTTransformer):
    def __init__(self):
        super().__init__()
        self._root = None

    def on_visit(self, node: cst.CSTNode) -> bool:
        if self._root is None:
            self._root = node
        # Skip IfExp children to simulate a real transformer
        if isinstance(node, cst.IfExp):
            return False
        return True

    def on_leave(self, original_node: cst.CSTNode, updated_node: cst.CSTNode) -> cst.CSTNode:
        same_identity = original_node is updated_node
        same_equality = original_node == updated_node
        deep_eq = original_node.deep_equals(updated_node)
        
        print(f"{type(original_node).__name__}: is={same_identity}, ==={same_equality}, deep_equals={deep_eq}")
        return updated_node

# Test with BinaryOperation containing two IfExp
expr = cst.parse_expression('([":a"] if cond1 else []) + ([":b"] if cond2 else [])')
expr.visit(DebugTransformer())

Actual Output

IfExp: is=True, ===True, deep_equals=True
SimpleWhitespace: is=True, ===True, deep_equals=True
SimpleWhitespace: is=True, ===True, deep_equals=True
Add: is=False, ===False, deep_equals=True      # <-- Different object even though unchanged!
IfExp: is=True, ===True, deep_equals=True
BinaryOperation: is=False, ===False, deep_equals=True  # <-- Cascades up!

Expected Behavior

When no modifications are made to a tree, I would expect:

  • original_node is updated_nodeTrue (same object)
  • original_node == updated_nodeTrue (equal)

This is important because the best practices documentation states:

The original_node parameter on any leave_ method is provided for book-keeping and is guaranteed to be equal via == and is checks to the node parameter in the corresponding visit_ method.

While this guarantee is about original_node vs the visit_<Node> parameter (which holds), users naturally expect that updated_node should also be the same object when nothing changed.

Do we maintain this website and is this statement correct?

Root Cause

This is the general behavior across ALL nodes in libcst, not a bug specific to operators. Every node's _visit_and_replace_children() unconditionally creates a new object:

# Example from _BaseOneTokenOp (libcst/_nodes/op.py:28-37)
def _visit_and_replace_children(self, visitor: CSTVisitorT) -> "_BaseOneTokenOp":
    return self.__class__(  # <-- ALWAYS creates new object!
        whitespace_before=visit_required(...),
        whitespace_after=visit_required(...),
    )

# Same pattern in BinaryOperation (libcst/_nodes/expression.py:1525-1532)
def _visit_and_replace_children(self, visitor: CSTVisitorT) -> "BinaryOperation":
    return BinaryOperation(  # <-- ALWAYS creates new object!
        lpar=visit_sequence(...),
        left=visit_required(...),
        operator=visit_required(...),
        right=visit_required(...),
        rpar=visit_sequence(...),
    )

The only exception is BaseLeaf._visit_and_replace_children() which returns self — but BaseLeaf has no children, so this is trivial.

This pattern is used consistently across:

  • op.py: _BaseOneTokenOp, _BaseTwoTokenOp, BaseUnaryOp, NotEqual (~40+ operator classes)
  • expression.py: BinaryOperation, LeftSquareBracket, RightSquareBracket, LeftParen, RightParen, Name, Integer, etc.
  • statement.py: All statement nodes

Impact

When writing a transformer that needs to detect "am I at the root of the expression I'm transforming?", users cannot rely on updated_node == self._root. They must use updated_node.deep_equals(self._root) instead, which is:

  1. Not obvious from the documentation
  2. More expensive (full tree comparison vs identity check)

Proposed Fix

Option 1: Internal optimization (minimal change)

Have _visit_and_replace_children return self when children are unchanged:

def _visit_and_replace_children(self, visitor: CSTVisitorT) -> "_BaseOneTokenOp":
    new_ws_before = visit_required(
        self, "whitespace_before", self.whitespace_before, visitor
    )
    new_ws_after = visit_required(
        self, "whitespace_after", self.whitespace_after, visitor
    )
    # Only create new object if children actually changed
    if new_ws_before is self.whitespace_before and new_ws_after is self.whitespace_after:
        return self
    return self.__class__(
        whitespace_before=new_ws_before,
        whitespace_after=new_ws_after,
    )

This preserves object identity when nothing changes, making original_node is updated_node reliable.

Option 2: New sentinel (API change)

libcst already has RemovalSentinel (remove node) and FlattenSentinel (replace with multiple nodes). A NoChangeSentinel could signal "no changes made":

class NoChangeSentinel:
    NO_CHANGE = ...

# In transformer:
def on_leave(self, original_node, updated_node):
    if no_modifications_needed:
        return NoChangeSentinel.NO_CHANGE  # Signals: keep original, skip rebuilding parents
    return updated_node

This would allow the traversal to short-circuit parent rebuilding. However, Option 1 is simpler and doesn't require API changes.

Environment

fbsource

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions