Reduce TaskSpawner GitHub API traffic by decoupling polling from Task events and lazily fetching details #794

@gjkim42

Summary

TaskSpawner currently generates more GitHub API traffic than necessary. The main causes are:

  1. Task status/deletion events can trigger a full GitHub rediscovery outside the normal poll interval
  2. GitHub issue/PR detail endpoints are fetched even when the active source config and templates do not need them

This issue proposes reducing GitHub traffic without changing the public TaskSpawner API.

Goals

  • Keep GitHub discovery interval-driven only
  • Stop Task activity from causing full GitHub rediscovery
  • Fetch GitHub comments/reviews only when required by source config or templates
  • Preserve existing public CRD/API shape

Proposed Plan

1. Split Task events from source polling

  • Remove the Task watch from the poll controller so GitHub discovery only runs on:
    • TaskSpawner create
    • TaskSpawner spec generation changes
    • scheduled RequeueAfter poll intervals
  • Add a lightweight Task activity reconcile path/controller inside the spawner process for the single managed TaskSpawner
  • On Task phase/deletion changes, the Task activity controller should:
    • recompute and update status.activeTasks
    • run per-Task GitHub reporting when reporting is enabled
    • never call source discovery
  • Keep refill semantics simple:
    • when a finished Task frees a maxConcurrency slot, the next GitHub item is picked up only on the next scheduled poll
  • Continue updating status.totalDiscovered, status.totalTasksCreated, and status.lastDiscoveryTime only during poll cycles
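
The split between the two paths could look roughly like the sketch below. It is only an illustration: the `Spawner`, `Task`, and `TaskPhase` types, the phase names, and the `OnPoll` / `OnTaskEvent` methods are hypothetical stand-ins, not the existing controller code. The point it shows is that the Task path recomputes `activeTasks` and reports, while discovery is reachable only from the poll path.

```go
package main

import "fmt"

// TaskPhase and its values are hypothetical stand-ins for the real Task phases.
type TaskPhase string

const (
	PhasePending   TaskPhase = "Pending"
	PhaseRunning   TaskPhase = "Running"
	PhaseSucceeded TaskPhase = "Succeeded"
	PhaseFailed    TaskPhase = "Failed"
)

// Task is a reduced, hypothetical view of a Task object.
type Task struct {
	Name  string
	Phase TaskPhase
}

// Spawner holds the state the two paths touch (hypothetical type).
type Spawner struct {
	ActiveTasks      int      // mirrors status.activeTasks
	DiscoveryRuns    int      // counts source discovery calls
	ReportingEnabled bool     // whether per-Task GitHub reporting is on
	Reported         []string // Tasks reported so far, in order
}

// isActive assumes a Task stops counting once it reaches a terminal phase.
func isActive(p TaskPhase) bool {
	return p != PhaseSucceeded && p != PhaseFailed
}

// OnPoll is the only path that runs source discovery.
// Real discovery is replaced by a counter here.
func (s *Spawner) OnPoll() {
	s.DiscoveryRuns++
}

// OnTaskEvent handles a Task phase change or deletion. It recomputes the
// active count from the current Task list and reports the changed Task,
// but never calls discovery.
func (s *Spawner) OnTaskEvent(changed string, current []Task) {
	active := 0
	for _, t := range current {
		if isActive(t.Phase) {
			active++
		}
	}
	s.ActiveTasks = active
	if s.ReportingEnabled {
		s.Reported = append(s.Reported, changed)
	}
}

func main() {
	s := &Spawner{ReportingEnabled: true}
	s.OnPoll()
	s.OnTaskEvent("task-a", []Task{
		{Name: "task-a", Phase: PhaseSucceeded},
		{Name: "task-b", Phase: PhaseRunning},
	})
	fmt.Println(s.ActiveTasks, s.DiscoveryRuns, s.Reported)
}
```

With this shape, a burst of Task events costs only local recomputation plus any enabled reporting calls, and the freed slot is refilled on the next `OnPoll`.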

2. Make GitHub detail fetching lazy

  • Compute internal fetch requirements in buildSource from:
    • comment policy, or the legacy trigger/exclude comment settings
    • reviewState
    • fields referenced by taskTemplate.promptTemplate or the default prompt
    • fields referenced by taskTemplate.branch
  • Detect referenced template fields by parsing Go templates and walking the AST
  • Keep the Source interface unchanged
  • Extend internal GitHub source structs with fetch-requirement booleans
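
A minimal sketch of the template analysis, using the standard `text/template` and `text/template/parse` packages. The function names `ReferencedFields`, `walk`, and `walkBranch` are hypothetical. The sketch is deliberately conservative: it records the first identifier of every field access, including ones inside `range` or `with` where the dot has been rebound, so it can over-report fields but should not miss a top-level reference. It also assumes the templates use only built-in functions; custom functions would have to be registered before parsing.

```go
package main

import (
	"fmt"
	"sort"
	"text/template"
	"text/template/parse"
)

// ReferencedFields returns the top-level field names a template mentions,
// e.g. "Comments" for {{.Comments}} or {{$.Comments}}.
func ReferencedFields(text string) (map[string]bool, error) {
	tmpl, err := template.New("prompt").Parse(text)
	if err != nil {
		return nil, err
	}
	fields := map[string]bool{}
	// Templates() also covers bodies introduced with {{define}}.
	for _, t := range tmpl.Templates() {
		if t.Tree != nil {
			walk(t.Tree.Root, fields)
		}
	}
	return fields, nil
}

// walk visits every node that can contain a field reference.
func walk(n parse.Node, fields map[string]bool) {
	switch v := n.(type) {
	case *parse.ListNode:
		if v == nil {
			return
		}
		for _, c := range v.Nodes {
			walk(c, fields)
		}
	case *parse.ActionNode:
		walk(v.Pipe, fields)
	case *parse.IfNode:
		walkBranch(&v.BranchNode, fields)
	case *parse.RangeNode:
		walkBranch(&v.BranchNode, fields)
	case *parse.WithNode:
		walkBranch(&v.BranchNode, fields)
	case *parse.TemplateNode:
		walk(v.Pipe, fields)
	case *parse.PipeNode:
		if v == nil {
			return
		}
		for _, c := range v.Cmds {
			walk(c, fields)
		}
	case *parse.CommandNode:
		for _, a := range v.Args {
			walk(a, fields)
		}
	case *parse.FieldNode:
		fields[v.Ident[0]] = true
	case *parse.VariableNode:
		// $.Field refers to the root data; other variables are ignored.
		if v.Ident[0] == "$" && len(v.Ident) > 1 {
			fields[v.Ident[1]] = true
		}
	case *parse.ChainNode:
		walk(v.Node, fields)
	}
}

// walkBranch covers the condition, body, and optional else body.
func walkBranch(b *parse.BranchNode, fields map[string]bool) {
	walk(b.Pipe, fields)
	walk(b.List, fields)
	walk(b.ElseList, fields)
}

func main() {
	fields, err := ReferencedFields(`{{.Title}}{{if .Comments}}{{range .Comments}}{{.}}{{end}}{{end}}`)
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(fields))
	for name := range fields {
		names = append(names, name)
	}
	sort.Strings(names)
	fmt.Println(names)
}
```

Over-reporting is the safe direction here: a false positive costs an extra request, while a false negative would render a template against empty data.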

GitHub Issues rules

  • Fetch issue comments only when:
    • comment policy is configured, or
    • a template references .Comments
  • Otherwise leave WorkItem.Comments empty and skip issue comment requests

GitHub Pull Requests rules

  • Fetch reviews only when:
    • reviewState != any, or
    • comment policy is configured, or
    • a template references .ReviewState
  • Fetch conversation comments only when:
    • comment policy is configured, or
    • a template references .Comments
  • Fetch review comments only when:
    • comment policy is configured, or
    • a template references .ReviewComments
  • Preserve current comment-policy semantics by still loading review bodies and review comments whenever comment policy is active
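
The pull request rules above reduce to three booleans. The sketch below is one way to write them down; `PRFetchRequirements` and `ComputePRRequirements` are hypothetical names, the `fields` map stands for whatever the template analysis returns, and treating an empty `reviewState` like `any` is an assumption rather than documented behavior. The issues source would need only the `Comments` rule.

```go
package main

import "fmt"

// PRFetchRequirements records which optional GitHub requests a pull request
// source has to make (hypothetical internal type).
type PRFetchRequirements struct {
	Reviews              bool
	ConversationComments bool
	ReviewComments       bool
}

// ComputePRRequirements applies the rules listed above.
// fields holds the top-level template fields referenced by the prompt and
// branch templates; a nil map means no fields are referenced.
func ComputePRRequirements(reviewState string, commentPolicy bool, fields map[string]bool) PRFetchRequirements {
	// Assumption: an empty reviewState behaves like "any".
	filtersByReview := reviewState != "" && reviewState != "any"
	return PRFetchRequirements{
		Reviews:              filtersByReview || commentPolicy || fields["ReviewState"],
		ConversationComments: commentPolicy || fields["Comments"],
		ReviewComments:       commentPolicy || fields["ReviewComments"],
	}
}

func main() {
	fmt.Printf("%+v\n", ComputePRRequirements("any", false, nil))
	fmt.Printf("%+v\n", ComputePRRequirements("any", false, map[string]bool{"ReviewState": true}))
	fmt.Printf("%+v\n", ComputePRRequirements("any", true, nil))
}
```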

Public API / Compatibility

  • No CRD changes
  • No new user-facing fields
  • No change to the existing Source interface

Acceptance Criteria

  • Task phase/deletion changes no longer trigger GitHub rediscovery
  • Task phase/deletion changes still update activeTasks
  • Task phase/deletion changes still perform GitHub reporting when enabled
  • TaskSpawner spec changes still trigger immediate discovery
  • GitHub issues source skips comment requests when .Comments is not needed
  • GitHub pull request source only fetches reviews/comments required by config/template usage
  • Default prompt behavior remains compatible with current output

Test Plan

Controller tests

  • Task phase update/delete does not trigger GitHub discovery
  • Task phase update still updates activeTasks
  • Task phase update still performs GitHub reporting for that Task when enabled
  • TaskSpawner spec changes still trigger immediate discovery

GitHub source tests

  • Issues source with prompt/branch templates not referencing .Comments performs no comment requests
  • Default prompt still causes issue comment fetches
  • PR source with no comment/review fields and reviewState=any only lists PRs
  • Referencing .ReviewState fetches reviews only
  • Referencing .ReviewComments fetches review comments only
  • Comment policy enabled fetches reviews, conversation comments, and review comments
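
One possible way to assert these cases is a fake client that records each request kind, so a test can compare the exact call list. Everything in this sketch is hypothetical (`RecordingClient`, `Requirements`, `DiscoverPRs`, and the request labels); it models only the gating of per-PR requests, not the real GitHub client.

```go
package main

import "fmt"

// Requirements mirrors the fetch-requirement booleans (hypothetical type).
type Requirements struct {
	Reviews              bool
	ConversationComments bool
	ReviewComments       bool
}

// RecordingClient is a test double that records every request it is asked
// to make instead of calling GitHub.
type RecordingClient struct {
	Calls []string
}

// record stores one per-PR request as "kind#number".
func (c *RecordingClient) record(kind string, number int) {
	c.Calls = append(c.Calls, fmt.Sprintf("%s#%d", kind, number))
}

// DiscoverPRs always lists pull requests, then issues only the per-PR
// requests that the requirements ask for.
func DiscoverPRs(c *RecordingClient, numbers []int, req Requirements) {
	c.Calls = append(c.Calls, "list-prs")
	for _, n := range numbers {
		if req.Reviews {
			c.record("reviews", n)
		}
		if req.ConversationComments {
			c.record("comments", n)
		}
		if req.ReviewComments {
			c.record("review-comments", n)
		}
	}
}

func main() {
	c := &RecordingClient{}
	DiscoverPRs(c, []int{1, 2}, Requirements{Reviews: true})
	fmt.Println(c.Calls)
}
```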

Template analyzer tests

  • Detects fields in conditionals and nested template nodes
  • Treats empty prompt as the current default prompt

Notes

This issue intentionally does not include:

  • webhook support
  • persistent backlog state
  • cross-cycle authorization caching
  • public API additions
