refactor: consolidate PR file handling #27
❌ Primer Eval: 3/6 pass (50%)
| Metric | Without instructions | With instructions |
|---|---|---|
| Time | 22.7s | 16.8s |
| Tokens | 15.8k | 16.7k |
| Tool calls | 6 | 4 |
case-2 · ✅ 75/100
Prompt: What is the local development workflow and how does building for distribution differ?
Expected: For local development, run commands directly with npx tsx src/index.ts (or npm run dev) — tsx executes TypeScript without a build step. Linting uses eslint (npm run lint), formatting uses prettier (npm run format / format:check), type checking uses tsc --noEmit (npm run typecheck), and tests run with vitest using v8 coverage (npm run test / test:coverage). Husky and lint-staged enforce linting on pre-commit. For distribution, tsup bundles src/index.ts into ESM-only output targeting Node 20+, with a shebang banner, sourcemaps, and external dependencies not bundled.
Judge: Response B better matches the expectation. It correctly covers local dev (tsx/npm run dev), all three test commands (test/test:watch/test:coverage), linting, formatting with prettier, type checking, and distribution build details (tsup, ESM, shebang, sourcemaps, external deps). Response A omits prettier entirely and only mentions test:watch. Both responses miss Husky/lint-staged enforcement and Response B doesn't explicitly mention Node 20+ target or v8 coverage, but Response B is more complete overall (12/15 key points vs 9/15).
| Metric | Without instructions | With instructions |
|---|---|---|
| Time | 23.2s | 15.2s |
| Tokens | 15.9k | 16.2k |
| Tool calls | 6 | 4 |
case-3 · ❌ 75/100
Prompt: What patterns and conventions should I follow when adding new functionality to this codebase?
Expected: Place new CLI commands in src/commands/, core logic in src/services/, and TUI components in src/ui/. All commands must support --json and --quiet flags via the withGlobalOpts wrapper in cli.ts, and return structured results using the CommandResult type from utils/output.ts. Use outputResult() for dual JSON/human output and shouldLog() to gate stderr progress. File writes must use safeWriteFile() which prevents accidental overwrites unless --force is passed. ESM syntax is required everywhere, TypeScript is strict (ES2022 target, ESNext module). Area-specific instructions go in .github/instructions/{name}.instructions.md with YAML frontmatter. The default model for Copilot SDK operations is claude-sonnet-4.5.
Judge: Response B better matches the expectation by explicitly covering shouldLog() for gating stderr output, providing detailed CommandResult examples, and demonstrating the withGlobalOpts pattern. However, both responses fail to mention: (1) TUI components belong in src/ui/, (2) area-specific instructions go in .github/instructions/{name}.instructions.md with YAML frontmatter, and (3) the default model for Copilot SDK operations is claude-sonnet-4.5. Response B scores higher due to superior coverage of output discipline (outputResult + shouldLog), concrete code examples, and more comprehensive testing patterns, but still misses critical architectural and configuration requirements.
| Metric | Without instructions | With instructions |
|---|---|---|
| Time | 33.3s | 29.6s |
| Tokens | 22.2k | 17.9k |
| Tool calls | 14 | 8 |
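
The dual JSON/human output convention described in case-3 can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the source only names `CommandResult`, `outputResult()`, and `shouldLog()`, so the field types, the `GlobalOpts` shape, and the pure `renderResult()` helper below are assumptions made for the example.

```typescript
// Hypothetical shapes: only the names CommandResult and shouldLog come from
// the eval text; every signature here is an assumption for illustration.
type GlobalOpts = { json?: boolean; quiet?: boolean };

type CommandResult<T> = {
  ok: boolean;
  status: "success" | "error";
  data?: T;
};

// Progress messages are emitted only in interactive (non-JSON, non-quiet) runs.
function shouldLog(opts: GlobalOpts): boolean {
  return !opts.json && !opts.quiet;
}

// Pure stand-in for an outputResult()-style function: it decides which stream
// would receive which text, without performing any I/O.
function renderResult<T>(
  result: CommandResult<T>,
  opts: GlobalOpts,
  human: string,
): { stdout: string; stderr: string } {
  if (opts.json) {
    // Machine-readable mode: JSON on stdout, nothing on stderr.
    return { stdout: JSON.stringify(result), stderr: "" };
  }
  if (opts.quiet) {
    // Assumed behavior: quiet mode suppresses the human summary.
    return { stdout: "", stderr: "" };
  }
  // Interactive mode: human text goes to stderr so stdout stays pipeable.
  return { stdout: "", stderr: human };
}
```

Keeping the routing decision separate from the actual write makes it possible to test the `--json` and `--quiet` behavior without capturing process streams.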
case-4 · ❌ 75/100
Prompt: How does the AI readiness assessment work, and how can it be customized with policies?
Expected: The readiness service in src/services/readiness.ts evaluates repositories across 9 pillars (style-validation, build-system, testing, documentation, dev-environment, code-quality, observability, security, ai-tooling) and assigns a maturity level from 1 (Functional) to 5 (Autonomous). Each criterion has a scope — repo, app, or area — determining whether it runs once, per monorepo app, or per detected area. buildCriteria() returns 20+ built-in checks and buildExtras() adds optional ones. Policies loaded via src/services/policy.ts can customize the assessment: loadPolicy() reads JSON/TS/JS configs, and resolveChain() merges a chain of policies that can disable, override, or add criteria and set pass-rate thresholds. Results can be rendered as an interactive HTML report by src/services/visualReport.ts with dark/light theme toggle and expandable per-pillar details.
Judge: Response B slightly edges out Response A but both fail to fully match the expectation. Neither mentions the specific function names buildCriteria(), buildExtras(), loadPolicy(), or resolveChain() from the expectation. Critically, both completely omit any mention of src/services/visualReport.ts, the interactive HTML report generation, or the dark/light theme toggle feature. Response B earns a marginally higher score for explicitly stating the ≥80% default pass rate and providing clearer structure in the 'How It Works' section with analyzeRepo(). Both responses correctly cover the 9 pillars, maturity levels 1-5, scope types (repo/app/area), policy customization (disable/override/add), and security constraints. However, the omission of key implementation details and the entire visualization layer means neither response adequately matches the expectation.
| Metric | Without instructions | With instructions |
|---|---|---|
| Time | 49.9s | 44.3s |
| Tokens | 31.4k | 31.1k |
| Tool calls | 17 | 15 |
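
To make the policy-chain idea in case-4 concrete, here is a small sketch of how a chain of policies might disable, override, and add criteria and set a pass-rate threshold. It is not the actual `resolveChain()` implementation: the `Criterion` and `Policy` shapes and the `mergePolicies()` name are hypothetical, and the 0.8 default simply mirrors the "≥80% default pass rate" mentioned in the judge's note.

```typescript
// Hypothetical types: the source does not show the real policy schema.
type Criterion = { id: string; pillar: string; level: number };

type Policy = {
  disable?: string[];
  override?: Record<string, Partial<Omit<Criterion, "id">>>;
  add?: Criterion[];
  passRate?: number;
};

type Resolved = { criteria: Criterion[]; passRate: number };

// Apply policies in order; later policies win over earlier ones.
function mergePolicies(
  base: Criterion[],
  chain: Policy[],
  defaultPassRate = 0.8, // assumed default, taken from the judge's "≥80%" remark
): Resolved {
  const byId = new Map<string, Criterion>();
  for (const c of base) byId.set(c.id, { ...c });

  let passRate = defaultPassRate;
  for (const policy of chain) {
    // 1. Remove disabled criteria.
    for (const id of policy.disable ?? []) byId.delete(id);
    // 2. Patch criteria that still exist.
    for (const [id, patch] of Object.entries(policy.override ?? {})) {
      const current = byId.get(id);
      if (current) byId.set(id, { ...current, ...patch });
    }
    // 3. Add new criteria (replacing any with the same id).
    for (const c of policy.add ?? []) byId.set(c.id, { ...c });
    // 4. The last policy that sets a threshold wins.
    if (policy.passRate !== undefined) passRate = policy.passRate;
  }
  return { criteria: [...byId.values()], passRate };
}
```

Applying disable before override and add within each policy is a design choice of this sketch; the real service may order these steps differently.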
case-5 · ✅ 72/100
Prompt: How does Primer generate Copilot instructions, including for monorepos with multiple areas?
Expected: The instruction generation pipeline starts with the analyzer (src/services/analyzer.ts) which scans the repo to detect languages, frameworks, monorepo apps, and logical areas (frontend, backend, etc.) with glob patterns. For root-level instructions, generateCopilotInstructions() in src/services/instructions.ts creates a Copilot SDK session that explores the codebase using tools (glob, view, grep) and produces .github/copilot-instructions.md. For area-specific instructions, generateAreaInstructions() generates focused content per area, and buildAreaFrontmatter() creates YAML frontmatter with applyTo glob patterns so VS Code scopes them to the right files. These are written to .github/instructions/{sanitized-name}.instructions.md via writeAreaInstruction(). The instructions command supports --areas to generate all area instructions, --areas-only to skip the root file, and --area for a single area.
Judge: Response A better matches the expectation by including the CLI flags (--areas, --areas-only, --area) explicitly mentioned in the requirement. Both responses correctly describe the Copilot SDK session exploration with glob/view/grep tools, YAML frontmatter with applyTo patterns, and .github/instructions/ output location. However, neither mentions the analyzer.ts starting point or specific function names (generateCopilotInstructions, generateAreaInstructions, buildAreaFrontmatter, writeAreaInstruction) from the expectation. Response A's inclusion of CLI usage with examples makes it more complete against the stated requirements, while Response B provides richer detection details but omits the CLI interface entirely.
| Metric | Without instructions | With instructions |
|---|---|---|
| Time | 33.6s | 57.4s |
| Tokens | 27.2k | 31.7k |
| Tool calls | 12 | 17 |
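
The area-instruction output described in case-5 (YAML frontmatter with `applyTo` globs, written to `.github/instructions/{sanitized-name}.instructions.md`) might look roughly like the sketch below. The helper names `areaFrontmatter()`, `sanitizeAreaName()`, and `areaInstructionPath()` and the `Area` type are hypothetical stand-ins, not the real `buildAreaFrontmatter()` or `writeAreaInstruction()`; the sanitization rule in particular is an assumption.

```typescript
// Hypothetical area shape: a display name plus the globs it applies to.
type Area = { name: string; applyTo: string[] };

// Assumed sanitization: lowercase, collapse non-alphanumerics to "-",
// and trim leading/trailing dashes.
function sanitizeAreaName(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build the YAML frontmatter block. Multiple globs are joined with commas
// into one quoted string.
function areaFrontmatter(area: Area): string {
  const patterns = area.applyTo.join(",");
  // JSON.stringify yields a double-quoted string, which is also valid YAML.
  return `---\napplyTo: ${JSON.stringify(patterns)}\n---\n`;
}

// Compute where the instruction file would be written.
function areaInstructionPath(area: Area): string {
  return `.github/instructions/${sanitizeAreaName(area.name)}.instructions.md`;
}

const example: Area = { name: "VS Code Extension", applyTo: ["vscode-extension/**"] };
console.log(areaInstructionPath(example));
console.log(areaFrontmatter(example));
```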
case-6 · ❌ 45/100
Prompt: What safety and security patterns does the codebase use for file operations and CLI output?
Expected: For file safety, src/utils/fs.ts provides safeWriteFile() which checks for existing files and only overwrites with an explicit force flag, validateCachePath() which rejects paths containing .. or symlinks to prevent path traversal, and fileExists() with symlink rejection. The repo.ts validators use regexes (GITHUB_REPO_RE, AZURE_REPO_RE) that reject traversal patterns in repo identifiers. For credential safety, git.ts and batch.ts use sanitizeError() to strip tokens from error messages before surfacing them. For structured output, utils/output.ts defines the CommandResult type with ok/status/data fields, outputResult() writes JSON to stdout or human text to stderr based on --json/--quiet flags, and shouldLog() gates progress output. This dual-mode pattern ensures all commands work both interactively and in headless automation pipelines.
Judge: Both responses cover file safety (safeWriteFile, validateCachePath, symlink rejection) and structured output (CommandResult, outputResult, stdout/stderr separation) adequately, but both completely omit critical security patterns explicitly mentioned in the expectation: credential safety via sanitizeError() in git.ts/batch.ts, repo validators using GITHUB_REPO_RE/AZURE_REPO_RE regexes to reject traversal patterns, fileExists() with symlink rejection, and shouldLog() for output gating. The responses are essentially equivalent in coverage (~50% of expected content), structure, and level of detail, making it impossible to determine which better matches the expectation.
| Metric | Without instructions | With instructions |
|---|---|---|
| Time | 26.3s | 16.5s |
| Tokens | 20.1k | 15.8k |
| Tool calls | 8 | 3 |
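
Three of the safety patterns named in case-6 (token stripping, traversal rejection, and repo-identifier validation) can be illustrated with pure functions. This is only a sketch under stated assumptions: `redactSecrets()`, `hasTraversal()`, `isValidRepoId()`, and `REPO_ID_RE` are hypothetical names, and the regexes are not the repository's `sanitizeError()`, `GITHUB_REPO_RE`, or `AZURE_REPO_RE`.

```typescript
// Replace credentials embedded in URLs (https://user:token@host/...) before
// an error message is shown to the user. Assumed redaction format: "***".
function redactSecrets(message: string): string {
  return message.replace(/(https?:\/\/)[^\/\s@]+@/g, "$1***@");
}

// Reject any path that contains a ".." segment, using either separator.
function hasTraversal(p: string): boolean {
  return p.split(/[\\/]+/).includes("..");
}

// Hypothetical "owner/name" pattern; the real validators may be stricter.
const REPO_ID_RE = /^[A-Za-z0-9_-]+\/[A-Za-z0-9_.-]+$/;

function isValidRepoId(id: string): boolean {
  return REPO_ID_RE.test(id) && !hasTraversal(id);
}

const raw = "fatal: unable to access 'https://x-access-token:ghp_abc@github.com/o/r.git'";
console.log(redactSecrets(raw));
// prints "fatal: unable to access 'https://***@github.com/o/r.git'"
```

Validating identifiers with an allow-list regex and then checking for `..` separately keeps each rule simple enough to audit on its own.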
Summary
Migrates the VS Code extension PR command from `simple-git` to the built-in `vscode.git` API and consolidates shared utilities. Removes dead code, fixes bugs, and updates workspace instructions.

Changes

Extension: `simple-git` → `vscode.git` migration
- `vscode-extension/src/commands/pr.ts`: Rewritten to use `vscode.git` API for all git operations (stage, commit, push, ref lookup)
- `vscode-extension/src/git.d.ts`: New vendored type definitions for `vscode.git` API
- `vscode-extension/src/gitUtils.ts`: New utility for monorepo-aware git repository discovery
- `vscode-extension/package.json`: Removed `simple-git` dependency, added `extensionDependencies: ["vscode.git"]`

Shared utility consolidation
- `src/utils/pr.ts`: Added `isPrimerFile()` and `PRIMER_FILE_PATTERNS` (moved from deleted `localPr.ts`)
- `src/services/localPr.ts`: Deleted (all functions were dead code after extension migration)
- `vscode-extension/src/services.ts`: Removed `localPr` re-exports, added `isPrimerFile` re-export from `primer/utils/pr.js`

Bug fixes
- `(.+?)(?:\.git)?$` instead of `[^/.]+`
- `isPrimerFile`: extension now imports shared implementation with backslash normalization

Tests
- Moved `isPrimerFile` tests from `localPr.test.ts` into `pr.test.ts`, deleted stale test file

Workspace instructions
- `.github/copilot-instructions.md`: Trimmed and restructured (~70 lines)
- `.github/instructions/vscode-extension.instructions.md`: New on-demand instruction file with `applyTo: "vscode-extension/**"`
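
The two bug fixes can be illustrated with a short sketch. The surrounding remote-URL regex, the function names, and the file patterns below are assumptions made for the example; only the capture groups `(.+?)(?:\.git)?$` and `[^/.]+` and the idea of backslash normalization come from the description above.

```typescript
// Hypothetical remote-URL parsers. The old capture group stops at the first
// dot, so repository names containing dots are truncated.
const OLD_REMOTE_RE = /github\.com[:\/]([^\/]+)\/([^\/.]+)/;
const NEW_REMOTE_RE = /github\.com[:\/]([^\/]+)\/(.+?)(?:\.git)?$/;

function repoName(remoteUrl: string, re: RegExp): string | undefined {
  const match = re.exec(remoteUrl);
  return match ? match[2] : undefined;
}

// Assumed patterns, based on the instruction files this PR mentions; the real
// PRIMER_FILE_PATTERNS may differ.
const HYPOTHETICAL_PATTERNS: RegExp[] = [
  /^\.github\/copilot-instructions\.md$/,
  /^\.github\/instructions\/[^/]+\.instructions\.md$/,
];

// Normalize Windows separators before matching, so the same patterns work
// for paths reported with backslashes.
function isPrimerFileSketch(filePath: string): boolean {
  const normalized = filePath.replace(/\\/g, "/");
  return HYPOTHETICAL_PATTERNS.some((re) => re.test(normalized));
}

const url = "https://github.com/acme/my.repo.git";
console.log(repoName(url, OLD_REMOTE_RE)); // prints "my"
console.log(repoName(url, NEW_REMOTE_RE)); // prints "my.repo"
console.log(isPrimerFileSketch(".github\\instructions\\vscode-extension.instructions.md")); // prints true
```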