Parses TOML files line-by-line, extracting [table] headers and key=value pairs as config_entry entities with CONTAINS relationships. Mirrors the YAML parser pattern — no external dependencies needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
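For readers following along, the line-by-line approach can be sketched roughly as below. This is only an illustration, not the actual `parseTomlFile`: the `ConfigEntry` and `Contains` shapes and the name `parseTomlSketch` are hypothetical, and array tables (`[[x]]`), multi-line values, and dotted keys are deliberately ignored.

```typescript
// Hypothetical shapes; the real config_entry entity likely carries more fields.
interface ConfigEntry {
  name: string;
  kind: "config_entry";
  line: number; // 1-based line number
}
interface Contains {
  parent: string;
  child: string;
}

function parseTomlSketch(source: string): { entities: ConfigEntry[]; edges: Contains[] } {
  const entities: ConfigEntry[] = [];
  const edges: Contains[] = [];
  let currentTable: string | null = null;

  source.split(/\r?\n/).forEach((raw, i) => {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) return; // blank line or comment

    if (line.startsWith("[") && line.endsWith("]")) {
      // [table] header: becomes the parent of the keys that follow it
      currentTable = line.slice(1, -1).trim();
      entities.push({ name: currentTable, kind: "config_entry", line: i + 1 });
      return;
    }

    const eq = line.indexOf("=");
    if (eq > 0) {
      // key = value pair: only the key is recorded in this sketch
      const key = line.slice(0, eq).trim();
      entities.push({ name: key, kind: "config_entry", line: i + 1 });
      if (currentTable !== null) edges.push({ parent: currentTable, child: key });
    }
  });

  return { entities, edges };
}
```

Splitting on the first `=` with `indexOf` keeps the scan linear per line and avoids any regex on untrusted input.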
TOML has a custom parser (parseTomlFile) like YAML/JSON/SQL/Dockerfile, but was missing from the isGrammarSupported check. This caused all .toml files to be silently dropped before reaching parseFile. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
[profile.release] now materialises a `profile` node in the graph, not just `release`. A per-file seen-set deduplicates shared prefixes so [profile.release] + [profile.dev] produce exactly one `profile` entity. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
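A minimal sketch of the prefix deduplication described here, assuming the seen-set is keyed by the full dotted path (the real node naming may differ, and quoted keys containing dots are not handled). The function name `newTableNodes` is hypothetical.

```typescript
// Returns the table paths that still need a node for one [a.b.c] header.
// `seen` is the per-file set; it is mutated so later headers skip shared prefixes.
function newTableNodes(header: string, seen: Set<string>): string[] {
  const parts = header.split(".").map((p) => p.trim());
  const created: string[] = [];
  for (let i = 1; i <= parts.length; i++) {
    const path = parts.slice(0, i).join(".");
    if (!seen.has(path)) {
      seen.add(path);
      created.push(path);
    }
  }
  return created;
}

const seenTables = new Set<string>();
console.log(newTableNodes("profile.release", seenTables)); // → ["profile", "profile.release"]
console.log(newTableNodes("profile.dev", seenTables)); // → ["profile.dev"]
```

The second call creates no new `profile` node because the first call already recorded it.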
Allows disambiguating by file path when multiple entities share the same name across workspaces or files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Parses ATX headings as heading entities with hierarchical CONTAINS relationships, YAML frontmatter as a frontmatter entity/chunk, and section chunks spanning each heading's content. Skips headings inside fenced code blocks. Falls back to file_body for files with no headings. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
File discovery in ingest.ts maintains a local extension set that mirrors core-ingestion's EXT_MAP but was missing .md and .markdown. This caused markdown files to be silently excluded before reaching the parser, so ix map never produced heading, frontmatter, or file nodes for .md files — even though parseMarkdownFile was fully implemented. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs from the markdown-parsing test run:
1. LANGUAGE_QUERIES was missing a Markdown entry, causing a TS2741 compile error that prevented core-ingestion from building. Added [SupportedLanguages.Markdown]: '' (markdown uses its own hand-written parser, not tree-sitter queries).
2. ix config set workspace <name> wrote a top-level workspace: key to config.yaml but resolveWorkspaceRoot never read it, so all commands continued routing to the workspace with default: true. Added workspace?: string to IxConfig and a lookup step in resolveWorkspaceRoot that checks cfg.workspace by name before falling back to the default workspace.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
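The workspace lookup order might look something like the sketch below. The config shape is an assumption: only `workspace?: string` and `default: true` come from the commit message, while `WorkspaceEntry`, the `workspaces` array, and the `root` field are hypothetical stand-ins.

```typescript
// Hypothetical config shape, for illustration only.
interface WorkspaceEntry {
  name: string;
  root: string;
  default?: boolean;
}
interface IxConfigSketch {
  workspace?: string; // written by `ix config set workspace <name>`
  workspaces: WorkspaceEntry[];
}

function resolveWorkspaceRootSketch(cfg: IxConfigSketch): string | undefined {
  // 1. Prefer the explicitly selected workspace, if it exists by name.
  if (cfg.workspace !== undefined) {
    const named = cfg.workspaces.find((w) => w.name === cfg.workspace);
    if (named) return named.root;
  }
  // 2. Otherwise fall back to the workspace flagged as default.
  return cfg.workspaces.find((w) => w.default === true)?.root;
}
```

An unknown name falls through to the default rather than failing, which is one possible policy; the actual behaviour for a stale name is not stated in the commit.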
…xtractor to 1.21
Section chunks were stored as kind 'chunk', making ix search --kind section return no results. Giving them a first-class 'section' kind (consistent with heading/frontmatter) makes them discoverable via --kind filtering.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ix contains: absolute paths now match against graph's relative URIs by checking targetLower.endsWith(uri); URI-length tiebreaker picks the most specific match when multiple quality-0 candidates exist
- ix contains/explain: 8–31 char hex inputs (short IDs from CLI output) now attempt resolvePrefix before falling back to symbol resolution
- ix explain: context section always shows Kind so section (and all other) entity types are visible in the detail view
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
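The first two bullets can be illustrated with the sketch below. It assumes "most specific" means the longest matching URI, and that short IDs are plain hexadecimal; `pickByPath` and `looksLikeShortId` are hypothetical names, and the call into `resolvePrefix` is not shown.

```typescript
// Picks the graph URI that an absolute path most specifically ends with.
function pickByPath(target: string, uris: string[]): string | undefined {
  const targetLower = target.toLowerCase();
  return uris
    .filter((uri) => targetLower.endsWith(uri.toLowerCase()))
    .sort((a, b) => b.length - a.length)[0]; // longest URI first (assumed tiebreaker)
}

// True for 8-31 character hex strings, i.e. inputs worth trying as an ID prefix.
function looksLikeShortId(input: string): boolean {
  return /^[0-9a-f]{8,31}$/i.test(input);
}
```

For example, `/home/u/repo/docs/README.md` ends with both `README.md` and `docs/README.md`, and the longer URI wins.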
- Section chunks now span to the next heading at the same or shallower level, so parent sections include their full nested subtree content
- Bare filename tie-breaking now prefers shorter URIs (root-level files) over deeply-nested ones, fixing `ix contains README.md` resolving to a fixture file instead of the root README
- Add test asserting parent section lineEnd covers nested child sections
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
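The span rule in the first bullet can be sketched as follows, assuming headings arrive in document order with 1-based line numbers. The `Heading` and `Section` shapes and the function name are hypothetical; only `lineEnd` is a name taken from the commit.

```typescript
interface Heading {
  level: number; // 1 for "#", 2 for "##", ...
  line: number; // 1-based line of the heading itself
}
interface Section extends Heading {
  lineEnd: number; // last line covered by the section, inclusive
}

function sectionSpans(headings: Heading[], totalLines: number): Section[] {
  return headings.map((h, i) => {
    // The first later heading at the same or a shallower level closes this section.
    const next = headings.slice(i + 1).find((o) => o.level <= h.level);
    return { ...h, lineEnd: next ? next.line - 1 : totalLines };
  });
}
```

With headings at lines 1 (h1), 3 (h2), 6 (h2), and 10 (h1) in a 12-line file, the first h1 spans lines 1 to 9, so it covers both nested h2 sections.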
Replaces single-line HTML strip with a full pipeline that handles
anchor IDs ({#...}), backtick-wrapped component names, backslash-escaped
angle brackets, stability markers (\*\*), inline HTML badges, and
double-space normalization. Adds 6 regression tests for vuejs/docs cases.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… body chunk
- Bug 1: emit file_body when no section headings exist even if a frontmatter
chunk is already present; lineStart/startByte now point past the frontmatter
- Bug 2: detect setext-style headings (=== and --- underlines) inside the
heading loop with a one-line lookahead; respects inFence guard and integrates
with existing headingStack nesting
- Bug 3: replace boolean inFence with fenceState {char, len} so a backtick
fence can only be closed by backticks of >= the opening length, and vice
versa for tildes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
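Bug 3's fence tracking can be sketched as a small state machine. The `{char, len}` state is from the commit; `fenceRun` and `step` are hypothetical helpers, and details such as the indentation limit and info-string restrictions are left out.

```typescript
interface FenceState {
  char: string; // "`" or "~"
  len: number; // length of the opening run
}

// Returns the fence run at the start of a line, or null if there is none.
function fenceRun(line: string): { char: string; len: number; rest: string } | null {
  const t = line.trimStart();
  const c = t[0];
  if (c !== "`" && c !== "~") return null;
  let len = 0;
  while (t[len] === c) len++;
  return len >= 3 ? { char: c, len, rest: t.slice(len) } : null;
}

// Advances the fence state by one line; null means "not inside a fence".
function step(state: FenceState | null, line: string): FenceState | null {
  const run = fenceRun(line);
  if (run === null) return state;
  if (state === null) return { char: run.char, len: run.len }; // opening fence
  // A closer must use the same character, be at least as long, and carry no info string.
  const closes = run.char === state.char && run.len >= state.len && run.rest.trim() === "";
  return closes ? null : state;
}
```

A block opened with four backticks therefore stays open across a three-backtick line or a tilde line, and only closes on four or more backticks.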
Restructured keyPattern to handle quoted and bare keys as separate alternatives, eliminating the space/\s* overlap that caused polynomial backtracking (CodeQL js/polynomial-redos). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
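One way to keep quoted and bare keys as disjoint alternatives is sketched below. This is not the actual `keyPattern` from the change, just an illustration of the idea: each alternative starts with a different character class, and the separator uses a single `[ \t]*` that cannot overlap with the key. The helper `matchKey` is hypothetical, escapes inside quoted keys and dotted keys are not handled, and the caller is assumed to have trimmed leading whitespace.

```typescript
// Alternatives: "basic quoted" | 'literal quoted' | bare key (A-Z a-z 0-9 _ -)
const keyPatternSketch = /^(?:"([^"\\]*)"|'([^']*)'|([A-Za-z0-9_-]+))[ \t]*=/;

function matchKey(line: string): string | null {
  const m = keyPatternSketch.exec(line);
  if (m === null) return null;
  // Exactly one of the three capture groups is defined on a match.
  return m[1] ?? m[2] ?? m[3] ?? null;
}
```

Because no two adjacent quantifiers can match the same characters, the regex engine has only one way to consume any prefix of the line.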
/\s*\{#[^}]+\}\s*$/ had three overlapping quantifiers on uncontrolled
input causing O(n²) backtracking. Replaced with lastIndexOf/indexOf
string methods (CodeQL js/polynomial-redos).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
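A regex-free version of the anchor stripping could look like the sketch below. The function name `stripAnchorId` is hypothetical, and the behaviour on nested `{#` sequences may differ slightly from the original pattern.

```typescript
// Strips a trailing "{#anchor}" (plus surrounding whitespace) using string methods only.
function stripAnchorId(text: string): string {
  const trimmed = text.trimEnd();
  if (!trimmed.endsWith("}")) return text;
  const open = trimmed.lastIndexOf("{#");
  if (open === -1) return text;
  const inner = trimmed.slice(open + 2, trimmed.length - 1);
  // Mirror [^}]+ : at least one character and no "}" inside the braces.
  if (inner.length === 0 || inner.indexOf("}") !== -1) return text;
  return trimmed.slice(0, open).trimEnd();
}
```

Each call does a fixed number of linear scans, so the cost stays proportional to the heading length regardless of how many spaces or braces it contains.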
…parser
- Loop <[^>]+> replacement until stable to handle split tags like <scr<x>ipt> (CodeQL js/incomplete-multi-character-sanitization)
- Replace headingPattern (.+?)(?:\s+#+)?\s*$ with (.*\S) to eliminate three overlapping quantifier pairs on spaces
- Replace htmlHeadingPattern (.*?)\s*$ with greedy .* anchored by closing tag, eliminating (.*?)/\s*$ overlap (CodeQL js/polynomial-redos)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- headingPattern: replace \s+(.*\S) with [ \t]+(\S[^\r\n]*) so content must start with non-whitespace, removing the \s+/.* overlap that caused O(n²) backtracking on space-only lines
- rawName: replace /\s+#+$/ regex with a string walk, removing +#+$ backtracking on strings with # runs followed by no end-of-string match (CodeQL js/polynomial-redos)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
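The rawName string walk might look roughly like this. It is a sketch under the assumption that only spaces and tabs count as whitespace before the closing run (the original `\s` is broader), and `stripClosingHashes` is a hypothetical name.

```typescript
// Removes an ATX closing sequence such as " ##" from the end of a heading.
function stripClosingHashes(s: string): string {
  let end = s.length;
  while (end > 0 && s[end - 1] === "#") end--; // walk back over the trailing # run
  if (end === s.length) return s; // no trailing # run at all
  let ws = end;
  while (ws > 0 && (s[ws - 1] === " " || s[ws - 1] === "\t")) ws--;
  // The run only counts as a closing sequence when whitespace precedes it.
  return ws < end ? s.slice(0, ws) : s;
}
```

Each character is visited at most once from the right, so a heading like `C#` is left alone and long `#` runs cannot trigger backtracking.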
riley0227
approved these changes
Apr 6, 2026
Summary
Adds TOML and Markdown file parsing to the ingestion pipeline.
`.toml` files are parsed into the graph as `config_entry` entities (table headers such as `[package]` and `[profile.release]`, plus key=value pairs), with `CONTAINS` relationships linking them.
`.md`/`.markdown` files are parsed into the graph as `section` entities derived from headings (ATX `#`, setext underline, and HTML `<h1>`–`<h6>`), with hierarchical `CONTAINS` relationships, YAML frontmatter extraction, and fenced code block boundaries respected.
Also fixes `ix contains` disambiguation when the same name appears across multiple files.
Closes #145
Type
Changes
TOML
- … `[table]` headers and `key = value` pairs as `config_entry` entities with `CONTAINS` relationships, mirroring the YAML parser pattern
- … `[profile.release]` now materialises a `profile` node in the graph, not just `release`; a per-file seen-set deduplicates shared prefixes
- … `isGrammarSupported` early-return guard: `.toml` files were being silently dropped before reaching `parseFile`
- … `--path` filter to `ix contains`: allows disambiguating by file path when multiple entities share the same name across workspaces or files
Markdown
- … `section` entities with hierarchical `CONTAINS` relationships; supports ATX headings, setext headings, and HTML headings (`<h1>`–`<h6>`); parses YAML frontmatter as a body chunk; respects fenced code block boundaries so heading-like lines inside code aren't parsed as headings
- … `.md` and `.markdown` to `SUPPORTED_EXTENSIONS`
- … `'section'`; bump extractor to 1.21
- … `ix config set workspace` so it actually takes effect; scope CLI search to active workspace
- … `{#anchor}`, badges, `<sup>`, escaped angle brackets)
Validation
TOML
- … `Cargo.toml` files)
- `ix text --language toml` returns results from `Cargo.toml`
- `name`, `version`, `edition` resolve as `config_entry` with `language: toml`
- `[package]`, `[dependencies]` resolve as table entities
- `ix contains package` returns correct child keys (`name`, `version`, `edition`, `authors`, etc.)
- `[profile.release]` → `profile` intermediate node present; `opt-level` resolves as child
- `ix contains package --path crates/regex` resolves without ambiguity prompt
- `queries.toml.test.ts` covers the parser
Markdown
- … `ix map` on a known TS/JS repo, verify counts unchanged)
- … `npm test` in `ix-cli`)
- `ix contains` returns children for a known markdown file and heading
- `ix text` returns results with `language: markdown`
- `.md` and `.markdown` extensions ingested
Checklist