refactor: consolidate PR file handling by digitarald · Pull Request #27 · digitarald/primer

digitarald · 2026-02-17T06:09:30Z

Summary

Migrates the VS Code extension PR command from simple-git to the built-in vscode.git API and consolidates shared utilities. Removes dead code, fixes bugs, and updates workspace instructions.

Changes

Extension: `simple-git` → `vscode.git` migration

vscode-extension/src/commands/pr.ts — Rewritten to use vscode.git API for all git operations (stage, commit, push, ref lookup)
vscode-extension/src/git.d.ts — New vendored type definitions for vscode.git API
vscode-extension/src/gitUtils.ts — New utility for monorepo-aware git repository discovery
vscode-extension/package.json — Removed simple-git dependency, added extensionDependencies: ["vscode.git"]
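
A rough sketch of what monorepo-aware repository discovery could look like: when several repositories are open, pick the one whose root is the deepest ancestor of the file in question. `RepoLike`, `findRepoForPath`, and the plain-string `rootPath` are hypothetical names for illustration; the actual `gitUtils.ts` is not shown in this PR description and may work differently (the vscode.git `Repository` type exposes a `rootUri` rather than a string path).

```typescript
// Hypothetical stand-in for a git repository handle; only the root path matters here.
interface RepoLike {
  rootPath: string;
}

// Use forward slashes and drop trailing slashes so prefixes compare cleanly.
function normalizePath(p: string): string {
  return p.replace(/\\/g, "/").replace(/\/+$/, "");
}

// Return the repository whose root is the longest ancestor of filePath, if any.
function findRepoForPath<T extends RepoLike>(
  repos: readonly T[],
  filePath: string
): T | undefined {
  const target = normalizePath(filePath);
  let best: T | undefined;
  let bestLength = -1;
  for (const repo of repos) {
    const root = normalizePath(repo.rootPath);
    const inside = target === root || target.startsWith(root + "/");
    if (inside && root.length > bestLength) {
      best = repo;
      bestLength = root.length;
    }
  }
  return best;
}
```

Matching on `root + "/"` rather than a bare prefix keeps a sibling such as `/work/mono2` from being attributed to a repository rooted at `/work/mono`.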

Shared utility consolidation

src/utils/pr.ts — Added isPrimerFile() and PRIMER_FILE_PATTERNS (moved from deleted localPr.ts)
src/services/localPr.ts — Deleted (all functions were dead code after extension migration)
vscode-extension/src/services.ts — Removed localPr re-exports, added isPrimerFile re-export from primer/utils/pr.js
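
To illustrate the shared helper, here is a minimal sketch of an `isPrimerFile()` check with backslash normalization. The pattern list is an assumption based on the two instruction file locations mentioned in this PR; the real `PRIMER_FILE_PATTERNS` in `src/utils/pr.ts` is not shown here and may contain other entries or use a different matching style.

```typescript
// Assumed patterns, for illustration only; the real list is not shown in this PR.
const PRIMER_FILE_PATTERNS: readonly RegExp[] = [
  /^\.github\/copilot-instructions\.md$/,
  /^\.github\/instructions\/[^/]+\.instructions\.md$/,
];

// True when a repo-relative path matches one of the patterns.
// Backslashes are converted first so Windows-style paths match as well.
function isPrimerFile(filePath: string): boolean {
  const normalized = filePath.replace(/\\/g, "/").replace(/^\.\//, "");
  return PRIMER_FILE_PATTERNS.some((pattern) => pattern.test(normalized));
}
```

With this shape, `isPrimerFile(".github\\instructions\\vscode-extension.instructions.md")` and its forward-slash equivalent give the same answer, which is the point of normalizing in one shared place.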

Bug fixes

Fixed origin URL regex to handle repos with dots in names: `(.+?)(?:\.git)?$` instead of `[^/.]+`
Added empty-PR guard comparing remote ref commit SHAs before creating PR
Deduplicated isPrimerFile — extension now imports shared implementation with backslash normalization
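
The first two fixes can be sketched as follows. Only the repo-name fragments of the regex are quoted above, so the surrounding `github\.com[:/]([^/]+)\/` prefix, the `parseOrigin` and `isEmptyPr` helper names, and their signatures are assumptions made for this example rather than the code in `pr.ts`.

```typescript
// Assumed surrounding patterns; only the final capture group comes from the PR text.
const OLD_ORIGIN_RE = /github\.com[:/]([^/]+)\/([^/.]+)/;
const NEW_ORIGIN_RE = /github\.com[:/]([^/]+)\/(.+?)(?:\.git)?$/;

interface RepoSlug {
  owner: string;
  repo: string;
}

// Extract owner and repo name from an HTTPS or SSH origin URL.
function parseOrigin(url: string, pattern: RegExp = NEW_ORIGIN_RE): RepoSlug | undefined {
  const match = pattern.exec(url.trim());
  if (!match) return undefined;
  const [, owner, repo] = match;
  if (!owner || !repo) return undefined;
  return { owner, repo };
}

// Hypothetical empty-PR guard: if the head and base remote refs point at the
// same commit, there is nothing to open a PR for.
function isEmptyPr(headSha: string | undefined, baseSha: string | undefined): boolean {
  return headSha !== undefined && headSha === baseSha;
}
```

For `git@github.com:owner/my.repo.git` the old character class stops at the first dot and yields `my`, while the lazy group followed by an optional `.git` suffix yields `my.repo`.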

Tests

Merged isPrimerFile tests from localPr.test.ts into pr.test.ts, deleted stale test file
275 tests passing, 13 test files

Workspace instructions

.github/copilot-instructions.md — Trimmed and restructured (~70 lines)
.github/instructions/vscode-extension.instructions.md — New on-demand instruction file with applyTo: "vscode-extension/**"

github-actions · 2026-02-17T06:18:16Z

❌ Primer Eval: 3/6 pass (50%)

504.1s · model claude-sonnet-4.5 · judge claude-sonnet-4.5

| Case | Verdict | Score | Rationale |
|---|---|---|---|
| `case-1` | ✅ pass | 58 | Response B better matches the expectation. It explicitly lists more services mentioned in the expectation (evaluator, azureDevops, policy) and presents the architecture flow more clearly with numbered… |
| `case-2` | ✅ pass | 75 | Response B better matches the expectation. It correctly covers local dev (tsx/npm run dev), all three test commands (test/test:watch/test:coverage), linting, formatting with prettier, type checking, a… |
| `case-3` | ❌ fail | 75 | Response B better matches the expectation by explicitly covering shouldLog() for gating stderr output, providing detailed CommandResult examples, and demonstrating the withGlobalOpts pattern. Howev… |
| `case-4` | ❌ fail | 75 | Response B slightly edges out Response A but both fail to fully match the expectation. Neither mentions the specific function names buildCriteria(), buildExtras(), loadPolicy(), or resolveChain() from… |
| `case-5` | ✅ pass | 72 | Response A better matches the expectation by including the CLI flags (--areas, --areas-only, --area) explicitly mentioned in the requirement. Both responses correctly describe the Copilot SDK s… |
| `case-6` | ⚠️ unknown | 45 | Both responses cover file safety (safeWriteFile, validateCachePath, symlink rejection) and structured output (CommandResult, outputResult, stdout/stderr separation) adequately, but both completely omi… |

Details

case-1 · ✅ 58/100

Prompt: What is Primer's architecture and how are its layers organized?

Expected: Primer is a TypeScript CLI tool for priming repositories for AI-assisted development. It follows a layered architecture: src/index.ts is the entrypoint which defaults to the interactive TUI when no command is given, otherwise delegates to runCli in src/cli.ts. Commander wires subcommands (in src/commands/) to service functions (in src/services/), with shared utilities in src/utils/ and Ink/React TUI components in src/ui/. The CLI layer handles option parsing and output formatting, services contain all core logic (analyzer, instructions, readiness, evaluator, batch, git, github, azureDevops, etc.), and utils provide cross-cutting concerns like safe file I/O, structured output, and working directory management.

Judge: Response B better matches the expectation. It explicitly lists more services mentioned in the expectation (evaluator, azureDevops, policy) and presents the architecture flow more clearly with numbered layers. Both responses correctly identify the layered architecture (index.ts→cli.ts, commands, services, utils, UI), but Response B's service enumeration is more comprehensive and closer to the expectation's list. However, neither response explicitly captures the 'defaults to interactive TUI when no command given' behavior mentioned in the expectation. Response A has slightly better coverage of the copilot service but overall Response B's structure and completeness give it an edge.

| Metric | Without instructions | With instructions |
|---|---|---|
| Time | 22.7s | 16.8s |
| Tokens | 15.8k | 16.7k |
| Tool calls | 6 | 4 |

case-2 · ✅ 75/100

Prompt: What is the local development workflow and how does building for distribution differ?

Expected: For local development, run commands directly with npx tsx src/index.ts (or npm run dev) — tsx executes TypeScript without a build step. Linting uses eslint (npm run lint), formatting uses prettier (npm run format / format:check), type checking uses tsc --noEmit (npm run typecheck), and tests run with vitest using v8 coverage (npm run test / test:coverage). Husky and lint-staged enforce linting on pre-commit. For distribution, tsup bundles src/index.ts into ESM-only output targeting Node 20+, with a shebang banner, sourcemaps, and external dependencies not bundled.

Judge: Response B better matches the expectation. It correctly covers local dev (tsx/npm run dev), all three test commands (test/test:watch/test:coverage), linting, formatting with prettier, type checking, and distribution build details (tsup, ESM, shebang, sourcemaps, external deps). Response A omits prettier entirely and only mentions test:watch. Both responses miss Husky/lint-staged enforcement and Response B doesn't explicitly mention Node 20+ target or v8 coverage, but Response B is more complete overall (12/15 key points vs 9/15).

| Metric | Without instructions | With instructions |
|---|---|---|
| Time | 23.2s | 15.2s |
| Tokens | 15.9k | 16.2k |
| Tool calls | 6 | 4 |

case-3 · ❌ 75/100

Prompt: What patterns and conventions should I follow when adding new functionality to this codebase?

Expected: Place new CLI commands in src/commands/, core logic in src/services/, and TUI components in src/ui/. All commands must support --json and --quiet flags via the withGlobalOpts wrapper in cli.ts, and return structured results using the CommandResult type from utils/output.ts. Use outputResult() for dual JSON/human output and shouldLog() to gate stderr progress. File writes must use safeWriteFile() which prevents accidental overwrites unless --force is passed. ESM syntax is required everywhere, TypeScript is strict (ES2022 target, ESNext module). Area-specific instructions go in .github/instructions/{name}.instructions.md with YAML frontmatter. The default model for Copilot SDK operations is claude-sonnet-4.5.

Judge: Response B better matches the expectation by explicitly covering shouldLog() for gating stderr output, providing detailed CommandResult examples, and demonstrating the withGlobalOpts pattern. However, both responses fail to mention: (1) TUI components belong in src/ui/, (2) area-specific instructions go in .github/instructions/{name}.instructions.md with YAML frontmatter, and (3) the default model for Copilot SDK operations is claude-sonnet-4.5. Response B scores higher due to superior coverage of output discipline (outputResult + shouldLog), concrete code examples, and more comprehensive testing patterns, but still misses critical architectural and configuration requirements.

| Metric | Without instructions | With instructions |
|---|---|---|
| Time | 33.3s | 29.6s |
| Tokens | 22.2k | 17.9k |
| Tool calls | 14 | 8 |
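
For readers unfamiliar with the output conventions named in the case-3 expectation, the dual JSON/human pattern might look roughly like this. The field names `ok`, `status`, and `data` come from the case-6 expectation; the exact signatures, the `GlobalOpts` and `Streams` types, and the injected stream functions are assumptions for this sketch and are not taken from `utils/output.ts`.

```typescript
// Assumed shapes for illustration; the real types live in utils/output.ts.
interface CommandResult<T> {
  ok: boolean;
  status: string;
  data?: T;
}

interface GlobalOpts {
  json?: boolean;
  quiet?: boolean;
}

// Injected writers keep the sketch testable without touching process.stdout.
interface Streams {
  stdout: (text: string) => void;
  stderr: (text: string) => void;
}

// Progress messages are only shown in interactive, non-quiet runs.
function shouldLog(opts: GlobalOpts): boolean {
  return !opts.json && !opts.quiet;
}

// JSON goes to stdout for automation; human text goes to stderr unless quiet.
function outputResult<T>(
  result: CommandResult<T>,
  opts: GlobalOpts,
  streams: Streams,
  humanText: string
): void {
  if (opts.json) {
    streams.stdout(JSON.stringify(result));
    return;
  }
  if (!opts.quiet) {
    streams.stderr(humanText);
  }
}
```

Keeping machine-readable output on stdout and everything else on stderr is what lets a caller pipe `--json` output into another tool without filtering progress lines.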

case-4 · ❌ 75/100

Prompt: How does the AI readiness assessment work, and how can it be customized with policies?

Expected: The readiness service in src/services/readiness.ts evaluates repositories across 9 pillars (style-validation, build-system, testing, documentation, dev-environment, code-quality, observability, security, ai-tooling) and assigns a maturity level from 1 (Functional) to 5 (Autonomous). Each criterion has a scope — repo, app, or area — determining whether it runs once, per monorepo app, or per detected area. buildCriteria() returns 20+ built-in checks and buildExtras() adds optional ones. Policies loaded via src/services/policy.ts can customize the assessment: loadPolicy() reads JSON/TS/JS configs, and resolveChain() merges a chain of policies that can disable, override, or add criteria and set pass-rate thresholds. Results can be rendered as an interactive HTML report by src/services/visualReport.ts with dark/light theme toggle and expandable per-pillar details.

Judge: Response B slightly edges out Response A but both fail to fully match the expectation. Neither mentions the specific function names buildCriteria(), buildExtras(), loadPolicy(), or resolveChain() from the expectation. Critically, both completely omit any mention of src/services/visualReport.ts, the interactive HTML report generation, or the dark/light theme toggle feature. Response B earns a marginally higher score for explicitly stating the ≥80% default pass rate and providing clearer structure in the 'How It Works' section with analyzeRepo(). Both responses correctly cover the 9 pillars, maturity levels 1-5, scope types (repo/app/area), policy customization (disable/override/add), and security constraints. However, the omission of key implementation details and the entire visualization layer means neither response adequately matches the expectation.

| Metric | Without instructions | With instructions |
|---|---|---|
| Time | 49.9s | 44.3s |
| Tokens | 31.4k | 31.1k |
| Tool calls | 17 | 15 |
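
The policy merging described in the case-4 expectation (disable, override, or add criteria and set a pass-rate threshold) could be sketched like this. All type shapes, the `resolveChainSketch` name, and the 0.8 default threshold are assumptions for illustration; the real `resolveChain()` in `src/services/policy.ts` is not shown in this thread.

```typescript
// Hypothetical shapes; the real criterion and policy types are not shown here.
interface Criterion {
  id: string;
  pillar: string;
  title: string;
}

interface Policy {
  name: string;
  disable?: string[];
  override?: Record<string, Partial<Criterion>>;
  add?: Criterion[];
  passRate?: number;
}

interface ResolvedPolicy {
  criteria: Criterion[];
  passRate: number;
}

// Apply policies in order; later policies win over earlier ones.
function resolveChainSketch(
  base: readonly Criterion[],
  chain: readonly Policy[],
  defaultPassRate = 0.8 // assumed default
): ResolvedPolicy {
  const byId = new Map<string, Criterion>();
  for (const criterion of base) byId.set(criterion.id, criterion);
  let passRate = defaultPassRate;

  for (const policy of chain) {
    for (const id of policy.disable ?? []) byId.delete(id);
    for (const [id, patch] of Object.entries(policy.override ?? {})) {
      const current = byId.get(id);
      // Overrides only touch criteria that still exist; the id is never changed.
      if (current) byId.set(id, { ...current, ...patch, id });
    }
    for (const criterion of policy.add ?? []) byId.set(criterion.id, criterion);
    if (policy.passRate !== undefined) passRate = policy.passRate;
  }

  return { criteria: Array.from(byId.values()), passRate };
}
```

Applying the chain in order means an organization-wide policy can set defaults that a later, more specific policy refines.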

case-5 · ✅ 72/100

Prompt: How does Primer generate Copilot instructions, including for monorepos with multiple areas?

Expected: The instruction generation pipeline starts with the analyzer (src/services/analyzer.ts) which scans the repo to detect languages, frameworks, monorepo apps, and logical areas (frontend, backend, etc.) with glob patterns. For root-level instructions, generateCopilotInstructions() in src/services/instructions.ts creates a Copilot SDK session that explores the codebase using tools (glob, view, grep) and produces .github/copilot-instructions.md. For area-specific instructions, generateAreaInstructions() generates focused content per area, and buildAreaFrontmatter() creates YAML frontmatter with applyTo glob patterns so VS Code scopes them to the right files. These are written to .github/instructions/{sanitized-name}.instructions.md via writeAreaInstruction(). The instructions command supports --areas to generate all area instructions, --areas-only to skip the root file, and --area for a single area.

Judge: Response A better matches the expectation by including the CLI flags (--areas, --areas-only, --area) explicitly mentioned in the requirement. Both responses correctly describe the Copilot SDK session exploration with glob/view/grep tools, YAML frontmatter with applyTo patterns, and .github/instructions/ output location. However, neither mentions the analyzer.ts starting point or specific function names (generateCopilotInstructions, generateAreaInstructions, buildAreaFrontmatter, writeAreaInstruction) from the expectation. Response A's inclusion of CLI usage with examples makes it more complete against the stated requirements, while Response B provides richer detection details but omits the CLI interface entirely.

| Metric | Without instructions | With instructions |
|---|---|---|
| Time | 33.6s | 57.4s |
| Tokens | 27.2k | 31.7k |
| Tool calls | 12 | 17 |
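
As an illustration of the area-instruction output described in the case-5 expectation, the frontmatter and target path might be produced along these lines. The helper names ending in `Sketch`, the sanitization rule, and the comma-joined `applyTo` value are assumptions for this example; the real `buildAreaFrontmatter()` and `writeAreaInstruction()` may differ.

```typescript
// Hypothetical name sanitizer: lowercase, non-alphanumerics collapsed to dashes.
function sanitizeAreaNameSketch(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build YAML frontmatter with an applyTo glob so the editor can scope the file.
// Joining multiple globs with commas is an assumption of this sketch.
function buildAreaFrontmatterSketch(globs: readonly string[]): string {
  const applyTo = JSON.stringify(globs.join(","));
  return ["---", `applyTo: ${applyTo}`, "---", ""].join("\n");
}

// Target path following the .github/instructions/{name}.instructions.md convention.
function areaInstructionPathSketch(areaName: string): string {
  return `.github/instructions/${sanitizeAreaNameSketch(areaName)}.instructions.md`;
}
```

For an area named `vscode-extension` with the glob `vscode-extension/**`, this yields the same file name and `applyTo` value as the new instruction file added in this PR.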

case-6 · ❌ 45/100

Prompt: What safety and security patterns does the codebase use for file operations and CLI output?

Expected: For file safety, src/utils/fs.ts provides safeWriteFile() which checks for existing files and only overwrites with an explicit force flag, validateCachePath() which rejects paths containing .. or symlinks to prevent path traversal, and fileExists() with symlink rejection. The repo.ts validators use regexes (GITHUB_REPO_RE, AZURE_REPO_RE) that reject traversal patterns in repo identifiers. For credential safety, git.ts and batch.ts use sanitizeError() to strip tokens from error messages before surfacing them. For structured output, utils/output.ts defines the CommandResult type with ok/status/data fields, outputResult() writes JSON to stdout or human text to stderr based on --json/--quiet flags, and shouldLog() gates progress output. This dual-mode pattern ensures all commands work both interactively and in headless automation pipelines.

Judge: Both responses cover file safety (safeWriteFile, validateCachePath, symlink rejection) and structured output (CommandResult, outputResult, stdout/stderr separation) adequately, but both completely omit critical security patterns explicitly mentioned in the expectation: credential safety via sanitizeError() in git.ts/batch.ts, repo validators using GITHUB_REPO_RE/AZURE_REPO_RE regexes to reject traversal patterns, fileExists() with symlink rejection, and shouldLog() for output gating. The responses are essentially equivalent in coverage (~50% of expected content), structure, and level of detail, making it impossible to determine which better matches the expectation.

| Metric | Without instructions | With instructions |
|---|---|---|
| Time | 26.3s | 16.5s |
| Tokens | 20.1k | 15.8k |
| Tool calls | 8 | 3 |
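
The safety patterns listed in the case-6 expectation can be illustrated with a few small, pure functions. These are sketches under stated assumptions: the names ending in `Sketch`, the token patterns, and the in-memory `Map` standing in for the file system are all hypothetical, and the symlink checks mentioned in the expectation are left out because they need real file system access.

```typescript
// Strip credentials from messages before surfacing them.
// Assumed patterns: userinfo embedded in URLs and GitHub-style token prefixes.
function sanitizeErrorSketch(message: string): string {
  return message
    .replace(/(https?:\/\/)[^/\s@]+@/g, "$1***@")
    .replace(/\b(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9]{20,}\b/g, "***");
}

// Reject any path that contains a ".." segment (path traversal).
function hasTraversalSegmentSketch(path: string): boolean {
  return path.split(/[\\/]+/).some((segment) => segment === "..");
}

// Write only when the target is absent or force is set; returns whether it wrote.
// A Map stands in for the file system to keep the example self-contained.
function safeWriteSketch(
  files: Map<string, string>,
  path: string,
  content: string,
  force = false
): boolean {
  if (files.has(path) && !force) return false;
  files.set(path, content);
  return true;
}
```

Returning a boolean from the write helper lets the caller report a skipped file in its structured result instead of silently overwriting or throwing.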

refactor: consolidate PR file handling

2a4c080

digitarald merged commit 74481cb into main Feb 17, 2026
9 checks passed

digitarald merged 1 commit into main from digitarald/git-cleanup
