
Deleted job causes unrelated datasets to appear in lineage graph #3085

@srimathithangaraj

Description


After deleting a job that connects two parts of a lineage graph, datasets and jobs from the disconnected part still appear in lineage queries. This happens because the lineage traversal uses the deleted job's I/O mappings even though the job itself is hidden.

Steps to Reproduce
Create a lineage chain:
d1 → job1 → d2 → job3 → d3 → job2 → d4

Where:
job1 consumes d1 and produces d2
job3 consumes d2 and produces d3
job2 consumes d3 and produces d4

Delete job3

Query lineage for d1:

Expected Behavior
After deleting job3, the lineage for d1 should only show the directly connected portion:

d1 → job1 → d2
Since job3 (which connects d2 to d3) is deleted, there should be no path to d3, job2, or d4.

Actual Behavior

The lineage for d1 returns both fragments:
d1 → job1 → d2
d3 → job2 → d4
Problem: d3, job2, and d4 appear in the graph even though:

job3 (the only connection from d2 to d3) is deleted
there is no visible path connecting these nodes to d1
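The difference between the reported and expected results comes down to where deleted jobs are filtered. Below is a minimal Python sketch of both strategies on the reproduction chain. It is not the project's actual lineage code: the `JOB_IO` mapping, the `lineage` function, and its `skip_deleted_in_traversal` flag are hypothetical, and the sketch assumes the lineage query walks both upstream and downstream from the starting dataset.

```python
from collections import deque

# Hypothetical in-memory model of the reproduction chain:
# each job maps to (input datasets, output datasets).
JOB_IO = {
    "job1": ({"d1"}, {"d2"}),
    "job3": ({"d2"}, {"d3"}),
    "job2": ({"d3"}, {"d4"}),
}


def lineage(start, job_io, deleted, skip_deleted_in_traversal):
    """Breadth-first lineage from `start`, walking upstream and downstream.

    skip_deleted_in_traversal=False: deleted jobs are still walked through
    and only hidden from the result (the behavior reported in this issue).
    skip_deleted_in_traversal=True: deleted jobs are pruned before their
    I/O edges are followed (the expected behavior).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for job, (inputs, outputs) in job_io.items():
            if skip_deleted_in_traversal and job in deleted:
                continue  # never follow a deleted job's I/O mappings
            if node == job:
                neighbors = inputs | outputs  # job -> its datasets
            elif node in inputs or node in outputs:
                neighbors = {job}  # dataset -> jobs that read or write it
            else:
                continue
            for n in neighbors:
                if n not in seen:
                    seen.add(n)
                    queue.append(n)
    # In both modes, deleted jobs themselves are hidden from the output.
    return seen - deleted


print(sorted(lineage("d1", JOB_IO, {"job3"}, skip_deleted_in_traversal=False)))
# ['d1', 'd2', 'd3', 'd4', 'job1', 'job2']
print(sorted(lineage("d1", JOB_IO, {"job3"}, skip_deleted_in_traversal=True)))
# ['d1', 'd2', 'job1']
```

Filtering only the returned node set still lets the traversal cross job3's edges, which is how d3, job2, and d4 end up in the result without job3; pruning deleted jobs during traversal removes those edges and leaves only d1 → job1 → d2.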
