
fix(extract): skip _-prefixed directories in walkMarkdownFiles (#202)#209

Open
sharziki wants to merge 1 commit into garrytan:master from sharziki:fix/extract-walk-quarantine-dirs


@sharziki
Contributor

Summary

Fixes #202: `gbrain extract all --source fs` walked `_`-prefixed directories even though the walker already skipped `_`-prefixed files, so extract counted quarantined content that the sync path correctly excluded.

Root cause

`src/commands/extract.ts::walkMarkdownFiles()`:

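```ts
// Loop body of the recursive walk(d) helper inside walkMarkdownFiles(dir), before the fix
for (const entry of readdirSync(d)) {
  if (entry.startsWith('.')) continue; // applies to both files + dirs
  const full = join(d, entry);
  try {
    if (lstatSync(full).isDirectory()) {
      walk(full); // recurses into _pending/ with no `_` check
    } else if (entry.endsWith('.md') && !entry.startsWith('_')) {
      // the `_` check is only ever reached for file names
      files.push({ path: full, relPath: relative(dir, full) });
    }
  } catch {
    // (error handling elided in this excerpt)
  }
}
```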

The `!entry.startsWith('_')` check only gates files. The directory branch above it recurses unconditionally, so `_pending/originals/foo.md` is walked even when the user has standardized on `_`-prefixed quarantine namespaces. On the reporter's brain, which has 61 authoritative pages, extract reported "154 pages walked" once the `_pending/` tree was included.

Fix

Hoist the `_`-prefix skip above the `lstatSync` branch so it behaves the same as the `.`-prefix skip — applies to both files and directories.

```ts
if (entry.startsWith('.')) continue; // dotfiles and dot-directories
if (entry.startsWith('_')) continue; // now also _-prefixed directories, e.g. _pending/
```
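To try the behavior outside the repo, here is a minimal standalone sketch of the walker after the fix, written against Node's `fs`, `path`, and `os` modules. It is not the code from `src/commands/extract.ts`: the name `walkMarkdownFilesSketch`, the `WalkedFile` type, the silent `catch`, and the `demo` fixture builder are assumptions for illustration. Because the hoisted `_` check now covers files too, the sketch drops the file-level `!entry.startsWith('_')` test as redundant.

```typescript
import { lstatSync, mkdirSync, mkdtempSync, readdirSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { dirname, join, relative } from 'node:path';

// Hypothetical standalone sketch; the real walkMarkdownFiles may differ
// in signature, return type, and error handling.
interface WalkedFile {
  path: string;    // root-joined path to the file
  relPath: string; // path relative to the walk root
}

function walkMarkdownFilesSketch(dir: string): WalkedFile[] {
  const files: WalkedFile[] = [];
  const walk = (d: string): void => {
    for (const entry of readdirSync(d)) {
      // Both skips run before lstatSync, so they apply to files AND directories.
      if (entry.startsWith('.')) continue;
      if (entry.startsWith('_')) continue;
      const full = join(d, entry);
      try {
        if (lstatSync(full).isDirectory()) {
          walk(full);
        } else if (entry.endsWith('.md')) {
          files.push({ path: full, relPath: relative(dir, full) });
        }
      } catch {
        // Assumption for this sketch: unreadable entries are skipped silently.
      }
    }
  };
  walk(dir);
  return files;
}

// Build the fixture described in the test plan in a temp dir and walk it.
function demo(): string[] {
  const root = mkdtempSync(join(tmpdir(), 'walk-sketch-'));
  try {
    for (const rel of [
      'concepts/alpha.md',
      '_pending/ambient.md',
      '_pending/originals/buried.md',
      '_skip-me.md',
    ]) {
      const full = join(root, rel);
      mkdirSync(dirname(full), { recursive: true });
      writeFileSync(full, '# stub\n');
    }
    return walkMarkdownFilesSketch(root).map((f) => f.relPath);
  } finally {
    rmSync(root, { recursive: true, force: true });
  }
}

console.log(demo()); // [ 'concepts/alpha.md' ] on POSIX systems
```

Against the test-plan fixture only `concepts/alpha.md` comes back; `_pending/` is pruned before its contents are ever listed, which is also what keeps the walked-page count in line with the sync path.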

Test plan

  • New test in `test/extract.test.ts` builds a fixture with `concepts/alpha.md`, `_pending/ambient.md`, `_pending/originals/buried.md`, and a sibling `_skip-me.md`. Expects `walkMarkdownFiles` to return exactly `['concepts/alpha.md']`.
  • `bun test test/extract.test.ts` → 17/17 pass.
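Because both skips now fire on every entry at every depth, the walker's output can be restated as a path-level rule: a `.md` file is returned only if no segment of its relative path starts with `.` or `_`. The sketch below encodes that rule as a hypothetical `isWalkable` helper (not part of gbrain, and not claimed to match `isSyncable` exactly), assumes POSIX `/` separators, and applies it to the fixture paths from the new test.

```typescript
// Hypothetical helper (not part of gbrain): a relative path survives the
// fixed walk only if it is a .md file and none of its segments starts
// with '.' or '_'. Assumes POSIX '/' separators.
function isWalkable(relPath: string): boolean {
  return (
    relPath.endsWith('.md') &&
    relPath.split('/').every((seg) => !seg.startsWith('.') && !seg.startsWith('_'))
  );
}

// Fixture paths from the regression test described above.
const fixture: string[] = [
  'concepts/alpha.md',
  '_pending/ambient.md',
  '_pending/originals/buried.md',
  '_skip-me.md',
];

const expected = fixture.filter(isWalkable);
console.log(expected); // [ 'concepts/alpha.md' ]
```

Stating the rule per segment makes the bug easy to see: before the fix, only the last segment (the file name) was checked for `_`, so `_pending/originals/buried.md` passed.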

🤖 Generated with Claude Code

…tan#202)

The extract walker already matched the sync path at the file level
(`_foo.md` was skipped), but its recursion still walked into
`_pending/originals/`, so `gbrain extract all --source fs` counted
quarantined pages that the sync path correctly excluded.

On a brain with 61 authoritative pages and a sizeable `_pending/` tree,
extract reported "154 pages walked" instead of 61, wasting link/timeline
extraction effort and producing misleading counts.

Hoist the `_`-prefix skip above the `lstatSync` branch so it applies to
directories as well. Dotted entries already behave this way. Added a
regression test covering a nested `_pending/originals/buried.md` layout
alongside a sibling `concepts/alpha.md` that must still be walked.

Fixes garrytan#202


Development

Successfully merging this pull request may close these issues.

extract: --source fs walker does not respect isSyncable prefix exclusions
