🤖 Kelos Strategist Agent @gjkim42
Summary
The githubPullRequests source can filter PRs by labels, state, author, draft status, review state, and comment policies, but it has no awareness of which files a PR actually changes. This means spawners cannot route PRs based on content scope, and agents start with no knowledge of what files were modified until they run git diff themselves.
This proposal adds two complementary capabilities:
- `filePatterns`: filter PRs by changed file paths (include/exclude globs), so spawners only create tasks for PRs that touch relevant code
- `{{.ChangedFiles}}`: expose the list of changed files as a template variable, giving agents immediate context about PR scope
Problem
1. No way to skip irrelevant PRs
A PR review spawner triggers on ALL open PRs matching label/state filters. Teams cannot express "only review PRs that change Go source files" or "skip documentation-only PRs." This wastes agent compute and API tokens on PRs that don't need agent attention.
Today's workaround is label-based: teams must maintain labels like `area/backend`, `area/frontend`, `docs-only` and ensure they're applied to every PR before the spawner polls. This creates triage overhead, introduces race conditions (a label applied after the spawner has already created a task), and doesn't scale to fine-grained file patterns.
2. No way to route PRs to specialized agents
Teams with domain-specific agents (security reviewer for `auth/`, performance reviewer for `db/`, API compatibility checker for `api/`) must rely on labels for routing. With `filePatterns`, each spawner can declaratively claim PRs by changed file paths: no labels needed, no triage step, no race conditions.
3. Agents lack immediate file-change context
When a PR review agent starts, the WorkItem contains title, body, labels, comments, review state, and review comments, but not which files changed. The agent must run `git diff` or `gh pr diff` to discover the PR scope, wasting tokens on discovery that the spawner already knows. Exposing `{{.ChangedFiles}}` in the prompt template gives agents immediate, structured context.
Proposed API Changes
New fields on `GitHubPullRequests`
```go
type GitHubPullRequests struct {
	// ... existing fields ...

	// FilePatterns filters pull requests by changed file paths.
	// When set, only PRs where at least one changed file matches an include
	// pattern (and no changed file matches an exclude pattern) are discovered.
	// Patterns use glob syntax with "**" support, which filepath.Match alone
	// does not provide (e.g., "*.go", "internal/**", "docs/**/*.md").
	// When empty, no file-based filtering is applied (current behavior).
	// +optional
	FilePatterns *FilePatternFilter `json:"filePatterns,omitempty"`
}

type FilePatternFilter struct {
	// Include requires at least one changed file to match any of these glob patterns.
	// When empty, all files are considered matching (only exclude patterns apply).
	// +optional
	Include []string `json:"include,omitempty"`
	// Exclude rejects PRs where ANY changed file matches any of these glob patterns.
	// Exclusion takes precedence over inclusion.
	// Useful for filtering out PRs that only change non-code files.
	// +optional
	Exclude []string `json:"exclude,omitempty"`
	// ExcludeOnly inverts the filter logic: instead of requiring at least one
	// include match, it only skips PRs where ALL changed files match exclude
	// patterns (i.e., "skip docs-only PRs"). Defaults to false.
	// +optional
	ExcludeOnly bool `json:"excludeOnly,omitempty"`
}
```
New template variable on WorkItem
```go
type WorkItem struct {
// ... existing fields ...
// ChangedFiles lists the file paths modified by a pull request.
// Only populated for githubPullRequests source when filePatterns is set
// or when the spawner opts into file enrichment.
ChangedFiles []string
}
```
Available in prompt templates as `{{.ChangedFiles}}` (newline-joined) or iterable via `{{range .ChangedFiles}}`.
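As a minimal sketch of how the `{{range .ChangedFiles}}` form would render, the following uses Go's standard `text/template` directly. The `workItem` type and `renderPrompt` helper are hypothetical stand-ins, not the project's actual `WorkItem` or `RenderPrompt`. Note that plain `text/template` prints a string slice in bracketed form (`[a.go b.md]`), so the newline-joined behavior of `{{.ChangedFiles}}` would presumably need explicit handling in the real rendering code.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// workItem is a hypothetical, trimmed-down stand-in for the proposal's WorkItem.
type workItem struct {
	Number       int
	Title        string
	ChangedFiles []string
}

// renderPrompt parses tmpl and executes it against a single work item.
func renderPrompt(tmpl string, item workItem) (string, error) {
	t, err := template.New("prompt").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, item); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	item := workItem{
		Number:       42,
		Title:        "Rotate signing keys",
		ChangedFiles: []string{"internal/auth/token.go", "docs/auth.md"},
	}
	// Iterate over the changed files, one bullet per path.
	out, err := renderPrompt(
		"PR #{{.Number}}: {{.Title}}\n{{range .ChangedFiles}}- {{.}}\n{{end}}", item)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```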
Implementation Plan
Phase 1: File fetching and filtering
Add a `fetchPRFiles` method to `GitHubPullRequestSource` using the existing `fetchGitHubPage` helper:
```go
// internal/source/github_pr.go

// githubPullRequestFile is the subset of the PR files response used here.
type githubPullRequestFile struct {
	Filename string `json:"filename"`
	Status   string `json:"status"` // added, removed, modified, renamed, etc.
}

// fetchPRFiles returns the paths of the files changed by the given PR,
// following pagination for at most maxPages pages.
func (s *GitHubPullRequestSource) fetchPRFiles(ctx context.Context, number int) ([]string, error) {
	var paths []string
	pageURL := fmt.Sprintf("%s/repos/%s/%s/pulls/%d/files?per_page=100",
		s.baseURL(), s.Owner, s.Repo, number)
	for page := 0; pageURL != "" && page < maxPages; page++ {
		var files []githubPullRequestFile
		nextURL, err := s.fetchGitHubPage(ctx, pageURL, &files)
		if err != nil {
			return nil, fmt.Errorf("fetching PR files: %w", err)
		}
		for _, f := range files {
			paths = append(paths, f.Filename)
		}
		pageURL = nextURL
	}
	return paths, nil
}
```
File pattern matching could use Go's `path.Match`, but this sketch uses `doublestar` because the `**` patterns in this proposal need it:
```go
func matchesFilePatterns(files []string, patterns *FilePatternFilter) bool {
	if patterns == nil {
		return true
	}
	hasIncludeMatch := len(patterns.Include) == 0
	allMatchExclude := true
	for _, file := range files {
		// A file counts as excluded when it matches ANY exclude pattern.
		excluded := false
		for _, pattern := range patterns.Exclude {
			if match, _ := doublestar.Match(pattern, file); match {
				excluded = true
				break
			}
		}
		if excluded && !patterns.ExcludeOnly {
			return false // strict exclude: any match rejects the PR
		}
		if !excluded {
			allMatchExclude = false
		}
		if !hasIncludeMatch {
			for _, pattern := range patterns.Include {
				if match, _ := doublestar.Match(pattern, file); match {
					hasIncludeMatch = true
					break
				}
			}
		}
	}
	if patterns.ExcludeOnly && allMatchExclude && len(patterns.Exclude) > 0 {
		return false // all files matched exclude patterns: skip
	}
	return hasIncludeMatch
}
```
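To illustrate why the standard library matcher is not enough for the patterns in this proposal, here is a small sketch using only `path.Match`. The `matchAny` helper is hypothetical and exists only for this illustration. In `path.Match`, `*` never crosses a `/`, and `**` has no special meaning, so nested paths fail to match patterns such as `*.go` or `internal/**`.

```go
package main

import (
	"fmt"
	"path"
)

// matchAny reports whether name matches at least one pattern using the
// standard library's path.Match. Malformed patterns are treated as non-matching.
func matchAny(patterns []string, name string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	// A top-level file matches "*.go".
	fmt.Println(matchAny([]string{"*.go"}, "main.go")) // true
	// "*" does not cross directory separators, so a nested file does not match.
	fmt.Println(matchAny([]string{"*.go"}, "internal/source/github_pr.go")) // false
	// "**" behaves like a single "*" here, so deeper paths do not match either.
	fmt.Println(matchAny([]string{"internal/**"}, "internal/source/github_pr.go")) // false
}
```

A matcher with real `**` semantics treats the last two cases differently, which is the behavior the example configs in this proposal rely on.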
Phase 2: Integration into Discover pipeline
In `GitHubPullRequestSource.Discover()`, after the existing `filterPullRequests()` call and before task creation:
```go
// After filterPullRequests, before iterating to build WorkItems.
// prFiles maps PR number to its changed files for later WorkItem enrichment.
prFiles := make(map[int][]string)
if s.FilePatterns != nil {
	var fileFiltered []githubPullRequest
	for _, pr := range pullRequests {
		files, err := s.fetchPRFiles(ctx, pr.Number)
		if err != nil {
			return nil, fmt.Errorf("fetching files for PR #%d: %w", pr.Number, err)
		}
		if matchesFilePatterns(files, s.FilePatterns) {
			fileFiltered = append(fileFiltered, pr)
			prFiles[pr.Number] = files
		}
	}
	pullRequests = fileFiltered
}
```
Phase 3: Template variable enrichment
Populate `WorkItem.ChangedFiles` when file data is available, and update `RenderPrompt` in `internal/source/prompt.go` to expose it:
```go
item := WorkItem{
	// ... existing fields ...
	ChangedFiles: prFiles[pr.Number], // populated when filePatterns is set
}
```
Files Changed
| File | Change |
|---|---|
| `api/v1alpha1/taskspawner_types.go` | Add `FilePatterns` field and `FilePatternFilter` type |
| `internal/source/github_pr.go` | Add `fetchPRFiles`, `matchesFilePatterns`, integrate into `Discover` |
| `internal/source/github_pr_test.go` | Tests for file fetching and pattern matching |
| `internal/source/source.go` | Add `ChangedFiles` to `WorkItem` |
| `internal/source/prompt.go` | Expose `ChangedFiles` in template rendering |
| `internal/source/prompt_test.go` | Tests for `{{.ChangedFiles}}` template variable |
| `cmd/kelos-spawner/reconciler.go` | Wire `FilePatterns` from spec to source |
Estimated: ~150 lines of new Go code + tests.
Example Configs
Security review agent scoped to auth code
```yaml
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: security-reviewer
spec:
  when:
    githubPullRequests:
      state: open
      reviewState: any
      filePatterns:
        include:
          - "internal/auth/**"
          - "internal/crypto/**"
          - "api/**/auth*.go"
      commentPolicy:
        triggerComment: "/review security"
  maxConcurrency: 1
  taskTemplate:
    type: claude-code
    model: opus
    credentials:
      type: oauth
      secretRef:
        name: kelos-credentials
    workspaceRef:
      name: my-repo
    promptTemplate: |
      Security review for PR #{{.Number}}: {{.Title}}
      Changed files:
      {{range .ChangedFiles}}- {{.}}
      {{end}}
      Focus on: authentication bypass, injection vulnerabilities,
      credential handling, and access control in the changed files.
    ttlSecondsAfterFinished: 3600
```
Skip documentation-only PRs
```yaml
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: code-reviewer
spec:
  when:
    githubPullRequests:
      labels: [needs-review]
      filePatterns:
        exclude:
          - "docs/**"
          - "*.md"
          - "*.txt"
          - "LICENSE"
        excludeOnly: true # only skip if ALL changed files are docs
  taskTemplate:
    type: claude-code
    credentials:
      type: oauth
      secretRef:
        name: kelos-credentials
    workspaceRef:
      name: my-repo
    promptTemplate: |
      Review PR #{{.Number}}: {{.Title}}
      {{.Body}}
      Files changed in this PR:
      {{range .ChangedFiles}}- {{.}}
      {{end}}
```
Per-domain agent routing (frontend vs backend)
```yaml
# Frontend specialist
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: frontend-reviewer
spec:
  when:
    githubPullRequests:
      state: open
      filePatterns:
        include:
          - "web/**"
          - "src/components/**"
          - "*.tsx"
          - "*.css"
  taskTemplate:
    type: claude-code
    agentConfigRef:
      name: frontend-agent-config
    # ...
---
# Backend specialist
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: backend-reviewer
spec:
  when:
    githubPullRequests:
      state: open
      filePatterns:
        include:
          - "internal/**"
          - "cmd/**"
          - "api/**"
          - "*.go"
  taskTemplate:
    type: claude-code
    agentConfigRef:
      name: backend-agent-config
    # ...
```
API Rate Limit Considerations
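The GitHub PR files endpoint (`GET /repos/{owner}/{repo}/pulls/{number}/files`) costs one API request per PR when the PR changes at most 100 files (the `per_page` value used in `fetchPRFiles`), plus one request per additional page. For a spawner watching 20 open PRs, this adds at least 20 requests per poll cycle.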
Mitigation strategies:
- Only call the files endpoint when `filePatterns` is set, so there is no cost if the feature isn't used
- Apply all existing filters first (labels, author, draft, review state, comments) to minimize the number of PRs that need file fetching
- Cache file lists per PR+head SHA: the changed files don't change within the same commit, so the spawner can skip re-fetching if the head SHA hasn't changed since the last poll
- Respect ETag/conditional requests: the existing ETag transport cache (see #683, "ETag transport cache grows unbounded in long-running spawner") can be extended to cover PR files endpoints
The rate limit impact is modest: even with 50 open PRs and a 5-minute poll interval, this adds ~600 requests/hour (50 PRs × 12 polls), well within GitHub's 5,000/hour limit for authenticated requests.
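The head-SHA caching strategy listed above could be sketched as a small in-memory map. All names here (`prFileCache`, `get`, `put`, `prune`) are hypothetical and not part of the existing codebase. The sketch is not safe for concurrent use, and it assumes the PR's base branch does not change between polls.

```go
package main

import "fmt"

// cachedFiles holds a changed-file list together with the head SHA it was fetched for.
type cachedFiles struct {
	headSHA string
	files   []string
}

// prFileCache is a hypothetical in-memory cache keyed by PR number.
type prFileCache struct {
	entries map[int]cachedFiles
}

func newPRFileCache() *prFileCache {
	return &prFileCache{entries: make(map[int]cachedFiles)}
}

// get returns the cached files only if the PR's head SHA is unchanged.
func (c *prFileCache) get(number int, headSHA string) ([]string, bool) {
	e, ok := c.entries[number]
	if !ok || e.headSHA != headSHA {
		return nil, false
	}
	return e.files, true
}

// put stores the file list, replacing any entry recorded for an older head SHA.
func (c *prFileCache) put(number int, headSHA string, files []string) {
	c.entries[number] = cachedFiles{headSHA: headSHA, files: files}
}

// prune drops entries for PRs that are no longer open, keeping the cache bounded.
func (c *prFileCache) prune(open map[int]bool) {
	for n := range c.entries {
		if !open[n] {
			delete(c.entries, n)
		}
	}
}

func main() {
	cache := newPRFileCache()
	cache.put(12, "abc123", []string{"internal/auth/token.go"})

	_, hit := cache.get(12, "abc123")
	fmt.Println(hit) // true: same head SHA, no re-fetch needed

	_, hit = cache.get(12, "def456")
	fmt.Println(hit) // false: new push, files must be re-fetched
}
```

Pruning on each poll against the set of currently open PRs is one way to avoid the unbounded growth described in #683.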
Relationship to Existing Issues
| Issue | Relationship |
|---|---|
| #518 (Monorepo support) | Complementary: #518 adds workspace scoping (sparse checkout, workdir). This proposal adds discovery-time routing by changed files. Together they enable full monorepo workflows: route PRs by changed path AND scope the agent workspace. |
| #498 (CEL matchExpressions) | Complementary: CEL filters on work item metadata (title, body, labels). This proposal operates on changed files, which are not part of the WorkItem struct today. `ChangedFiles` enrichment could also feed into CEL expressions in the future (`item.changedFiles.exists(f, f.matches(".*\.go"))`). |
| #537 (TaskTemplate overrides) | Complementary: Per-item template overrides could use file patterns to select different models/configs per PR scope. |
| #752 (retriggerOnPush) | Synergy: When a PR is retriggered on push, the changed files may differ, so the spawner should re-evaluate file patterns against the new commit. |
Backward Compatibility
- `FilePatterns` is optional; when nil, behavior is identical to today (no file-based filtering, no extra API calls)
- `ChangedFiles` on WorkItem is empty when `filePatterns` is not set (existing templates that don't use `{{.ChangedFiles}}` are unaffected)
- No CRD version bump needed (additive field in v1alpha1)
- No changes to existing TaskSpawner manifests required
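The additive nature of the new field can be illustrated with plain JSON encoding. This is a sketch using lower-cased, trimmed-down stand-ins for the proposed types (the `State` field is included only for illustration): with `omitempty` on a pointer field, a spec without `filePatterns` serializes exactly as before, and an existing manifest decodes with a nil filter.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// filePatternFilter mirrors the proposed FilePatternFilter fields.
type filePatternFilter struct {
	Include     []string `json:"include,omitempty"`
	Exclude     []string `json:"exclude,omitempty"`
	ExcludeOnly bool     `json:"excludeOnly,omitempty"`
}

// gitHubPullRequests is a hypothetical, trimmed-down stand-in for the source spec.
type gitHubPullRequests struct {
	State        string             `json:"state,omitempty"`
	FilePatterns *filePatternFilter `json:"filePatterns,omitempty"`
}

// encode serializes the spec to JSON.
func encode(v gitHubPullRequests) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// decode parses a JSON spec, leaving FilePatterns nil when the key is absent.
func decode(data string) (gitHubPullRequests, error) {
	var v gitHubPullRequests
	err := json.Unmarshal([]byte(data), &v)
	return v, err
}

func main() {
	// Without filePatterns, the output has no new key.
	fmt.Println(encode(gitHubPullRequests{State: "open"}))

	// An existing manifest decodes with a nil filter, so no file fetching happens.
	old, err := decode(`{"state":"open"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(old.FilePatterns == nil)
}
```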
/kind feature