
feat(tree_path): add multi-language support with LanguageConfig refactor#2

Merged
xen0n merged 18 commits into main from langconfig-refactor on Mar 10, 2026


@xen0n (Contributor) commented on Mar 9, 2026

This PR adds support for Python, Go, JavaScript, and TypeScript tree_path resolution via a refactored LanguageConfig abstraction.

Changes

refactor(tree_path): extract LanguageConfig abstraction

  • Introduce LanguageConfig struct to replace hardcoded Rust-specific logic
  • Configurable: ts_language, extensions, kind_map, name_field, name_overrides, body_fields
  • New languages can be added by defining a LanguageConfig constant; the resolve/compute logic needs no changes
  • All existing tests pass, no behavioral changes
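A minimal sketch of what such a data-driven config could look like, assuming simple string-keyed lookups. The field names follow the list above, but the types, the two helper methods, and the `RUST_LIKE` entries are hypothetical, and `ts_language` is left out so the example builds without tree-sitter.

```rust
/// Hypothetical, simplified shape of a data-driven language config.
/// `ts_language` is omitted so this compiles without tree-sitter.
#[allow(dead_code)]
struct LanguageConfig {
    /// File extensions (without the dot) handled by this language.
    extensions: &'static [&'static str],
    /// (tree-sitter node kind, tree_path shorthand) pairs.
    kind_map: &'static [(&'static str, &'static str)],
    /// Field that holds an item's name for most node kinds.
    name_field: &'static str,
    /// Per-kind exceptions to `name_field`.
    name_overrides: &'static [(&'static str, &'static str)],
    /// Fields that may hold an item's body (searched for nested items).
    body_fields: &'static [&'static str],
}

impl LanguageConfig {
    /// Map a node kind to its tree_path shorthand, if the kind is tracked.
    fn shorthand_for(&self, kind: &str) -> Option<&'static str> {
        self.kind_map
            .iter()
            .find(|(k, _)| *k == kind)
            .map(|(_, short)| *short)
    }

    /// Pick the field to read the name from, honoring per-kind overrides.
    fn name_field_for(&self, kind: &str) -> &'static str {
        self.name_overrides
            .iter()
            .find(|(k, _)| *k == kind)
            .map(|(_, field)| *field)
            .unwrap_or(self.name_field)
    }
}

const PYTHON_CONFIG: LanguageConfig = LanguageConfig {
    extensions: &["py", "pyi"],
    kind_map: &[("function_definition", "fn"), ("class_definition", "class")],
    name_field: "name",
    name_overrides: &[],
    body_fields: &["body"],
};

// Hypothetical Rust-like entry: an `impl` block has no `name` field,
// so its name is read from the `type` field instead.
const RUST_LIKE: LanguageConfig = LanguageConfig {
    extensions: &["rs"],
    kind_map: &[("function_item", "fn"), ("impl_item", "impl")],
    name_field: "name",
    name_overrides: &[("impl_item", "type")],
    body_fields: &["body"],
};

fn main() {
    println!("{:?}", PYTHON_CONFIG.shorthand_for("class_definition"));
    println!("{}", RUST_LIKE.name_field_for("impl_item"));
}
```

Under this shape, adding a language means adding one more constant; the lookup helpers stay untouched.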

feat: Python support (lang-python feature)

  • Mappings: function_definition, class_definition
  • Extensions: .py, .pyi
  • Tree paths: fn::name, class::Name, class::Name::fn::method

feat: Go support (lang-go feature)

  • Mappings: function_declaration, method_declaration
  • Extensions: .go
  • Tree paths: fn::Name, method::Name

feat: JavaScript/TypeScript support (lang-javascript, lang-typescript features)

  • JS: function_declaration, class_declaration, method_definition (.js, .mjs, .cjs, .jsx)
  • TS: adds interface_declaration, type_alias_declaration, enum_declaration (.ts, .mts, .cts)
  • TSX: separate grammar for .tsx files
  • Tree paths: fn::name, class::Name, class::Name::method::name, interface::Name, type::Name, enum::Name
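The tree paths listed above are `kind::name` pairs chained with `::`, outermost item first. A hypothetical pair of helpers (not the crate's actual functions) that build and split such paths could look like this; it assumes names never contain `::` themselves.

```rust
/// Join (shorthand, name) segments, outermost first, into a tree_path.
fn join_tree_path(segments: &[(&str, &str)]) -> String {
    segments
        .iter()
        .map(|(kind, name)| format!("{kind}::{name}"))
        .collect::<Vec<_>>()
        .join("::")
}

/// Split a tree_path back into (shorthand, name) segments.
/// Returns None for an empty path or an odd number of components.
fn split_tree_path(path: &str) -> Option<Vec<(&str, &str)>> {
    let parts: Vec<&str> = path.split("::").collect();
    if path.is_empty() || parts.len() % 2 != 0 {
        return None;
    }
    Some(parts.chunks(2).map(|pair| (pair[0], pair[1])).collect())
}

fn main() {
    let path = join_tree_path(&[("class", "Greeter"), ("method", "greet")]);
    println!("{path}"); // class::Greeter::method::greet
    println!("{:?}", split_tree_path(&path));
}
```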

Usage

# Individual languages
cargo build --features lang-python
cargo build --features lang-go
cargo build --features lang-javascript
cargo build --features lang-typescript  # includes JS

# All languages
cargo build --features "lang-python lang-go lang-javascript lang-typescript"

Testing

# Default (Rust only)
cargo test --package liyi

# With all languages
cargo test --package liyi --features "lang-python lang-go lang-javascript lang-typescript"

All 55+ tests pass with all features enabled.

Roadmap Alignment

Implements M1.1–M1.5 from docs/liyi-01x-roadmap.md:

  • ✅ M1.1: LanguageConfig refactor
  • ✅ M1.2: Python support
  • ✅ M1.3: Go support
  • ✅ M1.4: JavaScript support
  • ✅ M1.5: TypeScript support

@xen0n force-pushed the langconfig-refactor branch 2 times, most recently from 550690e to 2a5fecb on March 10, 2026 08:07
xen0n added 16 commits March 10, 2026 20:27
Refactor tree_path.rs to use a data-driven LanguageConfig struct instead
of hardcoded Rust-specific logic. This enables adding new languages by
only adding a new LanguageConfig constant and Cargo feature.

Changes:
- Add LanguageConfig struct with ts_language, extensions, kind_map,
  name_field, name_overrides, and body_fields
- Convert Rust-specific KIND_MAP, node_name(), and body resolution to
  LanguageConfig methods
- Update all resolve/compute functions to take &LanguageConfig
- Use fn() -> TSLanguage for lazy language initialization

All existing tests pass. No behavioral changes.
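One plausible reading of the `fn() -> TSLanguage` choice: a function pointer can sit in a plain `const` config, and the grammar is only built when a parser asks for it. The sketch below assumes a stand-in `Grammar` type in place of tree-sitter's language type and counts constructor calls to make the laziness visible; none of these names come from the PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in for tree-sitter's language type (hypothetical).
#[derive(Debug, PartialEq)]
struct Grammar {
    name: &'static str,
}

/// Counts how many times the grammar constructor has run.
static LOADS: AtomicUsize = AtomicUsize::new(0);

fn load_python() -> Grammar {
    LOADS.fetch_add(1, Ordering::SeqCst);
    Grammar { name: "python" }
}

struct LanguageConfig {
    /// A function pointer is const-friendly; the grammar is built on call.
    ts_language: fn() -> Grammar,
}

const PYTHON_CONFIG: LanguageConfig = LanguageConfig {
    ts_language: load_python,
};

fn main() {
    let before = LOADS.load(Ordering::SeqCst);
    // Referencing the config constructs nothing.
    let cfg = &PYTHON_CONFIG;
    assert_eq!(LOADS.load(Ordering::SeqCst), before);
    // The grammar is built only when the pointer is called.
    let grammar = (cfg.ts_language)();
    println!("{:?}, loads = {}", grammar, LOADS.load(Ordering::SeqCst));
}
```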

Original prompt:

> Let's first work on refactoring and multi-language support. Please
> remember to commit frequently as you make changes. And you must respect
> the commit message convention. Work in a new branch and PR when ready.

AI-assisted-by: Kimi K2.5 (OpenClaw)
Signed-off-by: WANG Xuerui <git@xen0n.name>
Add tree_path support for Python via optional lang-python Cargo feature.

Changes:
- Add tree-sitter-python dependency (0.25.0) as optional feature
- Add PYTHON_CONFIG LanguageConfig with function_definition and
  class_definition mappings
- Update Language enum with Python variant (cfg-gated)
- Update detect_language() to recognize .py and .pyi files
- Add comprehensive Python tests (function, class, method resolution)

Usage: cargo build --features lang-python
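Extension-based detection needs only the standard library. A sketch, assuming a reduced `Language` enum and extensions written inline (in the PR, extensions live in each config):

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    Rust,
    Python,
}

/// Map a file path to a language by its extension (hypothetical reduced set).
fn detect_language(path: &Path) -> Option<Language> {
    // `extension()` returns the part after the last dot, without the dot;
    // a bare dotfile such as `.py` has no extension at all.
    match path.extension()?.to_str()? {
        "rs" => Some(Language::Rust),
        "py" | "pyi" => Some(Language::Python),
        _ => None,
    }
}

fn main() {
    for p in ["src/app.py", "typings/os.pyi", "src/lib.rs", "Makefile"] {
        println!("{p}: {:?}", detect_language(Path::new(p)));
    }
}
```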

Original prompt:

> Let's first work on refactoring and multi-language support. Please
> remember to commit frequently as you make changes. And you must respect
> the commit message convention. Work in a new branch and PR when ready.

AI-assisted-by: Kimi K2.5 (OpenClaw)
Signed-off-by: WANG Xuerui <git@xen0n.name>
Add tree_path support for Go via optional lang-go Cargo feature.

Changes:
- Add tree-sitter-go dependency (0.25.0) as optional feature
- Add GO_CONFIG LanguageConfig with function_declaration and
  method_declaration mappings
- Update Language enum with Go variant (cfg-gated)
- Update detect_language() to recognize .go files
- Add comprehensive Go tests (function, method resolution)

Note: struct and interface types are not yet supported due to Go's
nested type_declaration/type_spec AST structure.

Usage: cargo build --features lang-go

Original prompt:

> Let's first work on refactoring and multi-language support. Please
> remember to commit frequently as you make changes. And you must respect
> the commit message convention. Work in a new branch and PR when ready.

AI-assisted-by: Kimi K2.5 (OpenClaw)
Signed-off-by: WANG Xuerui <git@xen0n.name>
Add tree_path support for JavaScript and TypeScript via optional
Cargo features.

Changes:
- Add tree-sitter-javascript (0.25.0) and tree-sitter-typescript
  (0.23.2) as optional dependencies
- Add JAVASCRIPT_CONFIG with function_declaration, class_declaration,
  and method_definition mappings (.js, .mjs, .cjs, .jsx)
- Add TYPESCRIPT_CONFIG and TSX_CONFIG with additional interface,
  type alias, and enum mappings (.ts, .mts, .cts, .tsx)
- Update Language enum with JavaScript, TypeScript, and Tsx variants
- Update detect_language() to recognize JS/TS file extensions
- Add comprehensive JS/TS tests for all node types

Features:
- lang-javascript: JavaScript support
- lang-typescript: TypeScript/TSX support (implies lang-javascript)

Usage: cargo build --features lang-javascript
       cargo build --features lang-typescript

Original prompt:

> Let's first work on refactoring and multi-language support. Please
> remember to commit frequently as you make changes. And you must respect
> the commit message convention. Work in a new branch and PR when ready.

AI-assisted-by: Kimi K2.5 (OpenClaw)
Signed-off-by: WANG Xuerui <git@xen0n.name>
Make lang-python, lang-go, lang-javascript, and lang-typescript enabled
by default. Users can opt out with --no-default-features if needed.

Original prompt:

> make all lang features default-enabled

AI-assisted-by: Kimi K2.5 (OpenClaw)
Signed-off-by: WANG Xuerui <git@xen0n.name>
Fix clippy warning (collapsible_if) and run cargo fmt. Reanchor
sidecar specs after tree_path.rs changes.

Original prompt:

> did you forget to check clippy, cargo fmt, and sync 立意?

AI-assisted-by: Kimi K2.5 (OpenClaw)
Signed-off-by: WANG Xuerui <git@xen0n.name>
Remove #[cfg(feature = "...")] from Language enum variants to ensure
API stability. The enum variants are now always present, but languages
report whether they're supported via is_supported().

Changes:
- Remove #[cfg] gates from Language enum variants
- Add Language::is_supported() method for runtime feature checking
- Change Language::config() to return Option<&LanguageConfig>
- Change Language::ts_language() to return Option<TSLanguage>
- Update make_parser(), resolve_tree_path(), compute_tree_path() to
  handle unsupported languages gracefully by returning None/empty
- Update detect_language() to only return supported languages

This ensures downstream code can match on all Language variants
without conditional compilation, while still gracefully handling
unsupported languages at runtime.
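The pattern described above (variants that always exist, configs that may be compiled out) can be sketched as follows. The `lang-python` and `lang-go` feature names come from this PR; the minimal `LanguageConfig` and the `label` helper are assumptions. Built without those features (for example with a plain `rustc`), `Python` and `Go` report as unsupported.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    Rust,
    Python,
    Go,
}

/// Hypothetical minimal config; the real one carries grammar and kind maps.
pub struct LanguageConfig {
    pub name: &'static str,
}

static RUST_CONFIG: LanguageConfig = LanguageConfig { name: "rust" };

// Compiled only when the feature is on; the enum variant exists regardless.
#[cfg(feature = "lang-python")]
static PYTHON_CONFIG: LanguageConfig = LanguageConfig { name: "python" };
#[cfg(feature = "lang-go")]
static GO_CONFIG: LanguageConfig = LanguageConfig { name: "go" };

impl Language {
    /// The config for this language, or None if its grammar is compiled out.
    pub fn config(self) -> Option<&'static LanguageConfig> {
        match self {
            Language::Rust => Some(&RUST_CONFIG),
            #[cfg(feature = "lang-python")]
            Language::Python => Some(&PYTHON_CONFIG),
            #[cfg(feature = "lang-go")]
            Language::Go => Some(&GO_CONFIG),
            // Reached only for variants whose feature is disabled.
            #[allow(unreachable_patterns)]
            _ => None,
        }
    }

    pub fn is_supported(self) -> bool {
        self.config().is_some()
    }
}

/// Downstream code can match every variant with no #[cfg] of its own.
fn label(lang: Language) -> &'static str {
    match lang {
        Language::Rust => "Rust",
        Language::Python => "Python",
        Language::Go => "Go",
    }
}

fn main() {
    for lang in [Language::Rust, Language::Python, Language::Go] {
        let cfg_name = lang.config().map(|c| c.name);
        println!("{}: supported = {}, config = {:?}", label(lang), lang.is_supported(), cfg_name);
    }
}
```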

Original prompt:

> Can you adversarially review the PR branch changes?

AI-assisted-by: Kimi K2.5 (OpenClaw)
Signed-off-by: WANG Xuerui <git@xen0n.name>
Make LanguageConfig fields private to hide implementation details
and expose a cleaner public API.

Changes:
- Remove pub from all LanguageConfig fields
- Add matches_extension(&self, ext: &str) -> bool public method
- Update detect_language() to use the new method

This prevents external code from depending on internal struct
layout while still allowing the necessary operations.
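With the fields private, callers go through methods such as `matches_extension`. A small sketch of the idea; the module layout, the `GO_CONFIG` contents, and the exact, case-sensitive comparison are assumptions, since the commit does not say whether the real method normalizes case.

```rust
mod tree_path {
    pub struct LanguageConfig {
        // Private: code outside this module cannot read or construct it.
        extensions: &'static [&'static str],
    }

    impl LanguageConfig {
        /// Public query that hides how extensions are stored.
        pub fn matches_extension(&self, ext: &str) -> bool {
            self.extensions.iter().any(|e| *e == ext)
        }
    }

    // Hypothetical config; only this module can build one.
    pub static GO_CONFIG: LanguageConfig = LanguageConfig { extensions: &["go"] };
}

use tree_path::GO_CONFIG;

fn main() {
    println!("{}", GO_CONFIG.matches_extension("go"));
    // `GO_CONFIG.extensions` would not compile here: the field is private.
}
```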

Original prompt:

> Fix them one by one.

AI-assisted-by: Kimi K2.5 (OpenClaw)
Signed-off-by: WANG Xuerui <git@xen0n.name>
Add documentation for known limitations and test coverage:

- Document Go method naming collision in GO_CONFIG doc comment
- Note that methods resolve as method::Name without receiver type
  disambiguation, which can cause tree_path collisions
- Add TSX test module with tests for function, class, interface, and
  file extension detection

Original prompt:

> Fix them one by one.

AI-assisted-by: Kimi K2.5 (OpenClaw)
Signed-off-by: WANG Xuerui <git@xen0n.name>
Add documentation to detect_language() explaining the behavior when
two languages share an extension (first match wins).
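"First match wins" amounts to scanning an ordered table and returning the first hit. A hypothetical sketch: `.h` is a realistic clash between C, C++, and Objective-C headers, but which languages liyi actually maps `.h` to, and in what order, is an assumption here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    C,
    Cpp,
    ObjectiveC,
}

/// Hypothetical table; list order decides ties on shared extensions.
const TABLE: &[(Language, &[&str])] = &[
    (Language::C, &["c", "h"]),
    (Language::Cpp, &["cc", "cpp", "cxx", "hpp", "h"]),
    (Language::ObjectiveC, &["m", "h"]),
];

/// Return the first language, in table order, that claims `ext`.
fn detect_language(ext: &str) -> Option<Language> {
    TABLE
        .iter()
        .find(|(_, exts)| exts.iter().any(|e| *e == ext))
        .map(|(lang, _)| *lang)
}

fn main() {
    println!("{:?}", detect_language("h")); // Some(C): C is listed first
    println!("{:?}", detect_language("m"));
}
```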

Original prompt:

> Fix them one by one.

AI-assisted-by: Kimi K2.5 (OpenClaw)
Signed-off-by: WANG Xuerui <git@xen0n.name>
Drop Cargo feature gates for tree-sitter grammars — all five languages
(Rust, Python, Go, JavaScript, TypeScript) are now compiled into the
binary unconditionally.  The binary-size cost is modest relative to
the universality benefit; Python/Go/JS/TS codebases vastly outnumber
Rust codebases and requiring opt-in per language would hinder adoption.

Go tree_path support:

- Add `custom_name` callback to `LanguageConfig` for languages with
  non-trivial name extraction.
- Change `node_name` return type to `Cow<str>` to support both
  borrowed and owned (composite) names.
- Encode method receivers: `method::(*Type).Method` (pointer) vs
  `method::Type.Method` (value).
- Navigate type_declaration → type_spec, const_declaration →
  const_spec, var_declaration → var_spec indirection via custom_name.
- Use unified `type` shorthand for structs, interfaces, and type
  aliases — Go type names are unique per package.
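The receiver encoding mirrors Go's own method-expression syntax (`(*T).M` for pointer receivers, `T.M` for value receivers). A sketch of how a name extractor might return `Cow<str>`, borrowing plain names and allocating only for composite ones; the `Receiver` struct and `go_item_name` function are hypothetical, and reading the receiver out of the tree-sitter node is left out.

```rust
use std::borrow::Cow;

/// Hypothetical summary of a Go method's receiver.
struct Receiver<'a> {
    type_name: &'a str,
    pointer: bool,
}

/// Compose the name part of a tree_path segment for a Go function or method.
fn go_item_name<'a>(receiver: Option<Receiver<'a>>, name: &'a str) -> Cow<'a, str> {
    match receiver {
        // Plain functions: borrow the identifier, no allocation.
        None => Cow::Borrowed(name),
        // func (s *Server) Start() -> (*Server).Start
        Some(Receiver { type_name, pointer: true }) => {
            Cow::Owned(format!("(*{type_name}).{name}"))
        }
        // func (s Server) Addr() -> Server.Addr
        Some(Receiver { type_name, pointer: false }) => {
            Cow::Owned(format!("{type_name}.{name}"))
        }
    }
}

fn main() {
    let recv = Receiver { type_name: "Server", pointer: true };
    println!("method::{}", go_item_name(Some(recv), "Start")); // method::(*Server).Start
    println!("fn::{}", go_item_name(None, "main")); // fn::main
}
```

Because the receiver type is part of the name, two types that each define `Close` no longer collide on `method::Close`.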

Code changes (tree_path.rs, Cargo.toml):
- Remove all #[cfg(feature = "...")] gates from statics, Language
  impl, detect_language, and test modules.
- Make all tree-sitter-* dependencies unconditional; remove [features]
  section from Cargo.toml.

Doc updates (liyi-design.md, liyi-01x-roadmap.md):
- Update design doc: language support is built in (not feature-gated);
  the linter is ~6000 lines and an 11 MiB binary (no longer "small")
  but remains a single binary.
- Update roadmap: mark M1 milestones complete, remove feature-gate
  references from headings and acceptance criteria, document resolved
  Go receiver encoding design.

All 114 tests pass (90 unit + 20 golden + 4 proptest).

Original prompt:

> Review the current branch's changes against the roadmap and
> design doc on the main branch.
>
> Regarding M1.3, this pattern is prevalent so support should be
> added. Regarding conditional features, having them not built-in by
> default would hinder adoption (orders of magnitude more than Rust
> codebases in the wild), so to fulfill the project's promise as a
> universal tool I'd suggest dropping "conditional" altogether in
> the docs. The design doc may also need updating in that the linter
> is already >6000 lines of Rust and a 11 MiB release build, by no
> means "small". It is expected to remain as one binary, though.

AI-assisted-by: Claude Opus 4.6 (GitHub Copilot)
Signed-off-by: WANG Xuerui <git@xen0n.name>
Reanchor all specs after the LanguageConfig refactor and Go support
addition.  Fix two misidentified specs that reanchor shifted into
wrong items:

- KIND_MAP → LanguageConfig (struct replaced the static array)
- node_name at matches_extension span → node_name at actual method span

Update Language enum intent from "only Rust" to list all six built-in
variants.  Add go_node_name spec covering receiver encoding and
type/const/var spec indirection.

liyi check: 85 current, 0 stale, 0 shifted.

Original prompt:

> please sync 立意

AI-assisted-by: Claude Opus 4.6 (GitHub Copilot)
Signed-off-by: WANG Xuerui <git@xen0n.name>
Add tree_path structural identity support for C, C++, Java, C#,
PHP, Objective-C, Kotlin, and Swift. Each language gets a
LanguageConfig with kind mappings and, where needed, custom name
extraction callbacks:

- C/C++: declarator-chain unwrapping for function_definition,
  C++ adds template_declaration transparency and alias_declaration
- Objective-C: class_interface/implementation/protocol name
  extraction, selector composition for methods
- Kotlin: property_declaration and type_alias name extraction,
  class_body positional-child handling in find_body()
- PHP: const_declaration name via const_element child
- Java/C#/Swift: standard field-based extraction (no custom
  callback needed)
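In C, a function's name sits at the bottom of a declarator chain: with tree-sitter-c, `char *strdup(const char *s) { ... }` is expected to parse as a `pointer_declarator` wrapping a `function_declarator` wrapping the `identifier`. A sketch of the unwrapping loop over a simplified stand-in tree; the enum is an assumption rather than tree-sitter's node API, and the real `c_extract_declarator_name` helper's signature is not shown in this PR.

```rust
/// Simplified stand-in for tree-sitter-c declarator nodes (hypothetical).
enum Declarator {
    Identifier(String),
    Pointer(Box<Declarator>),       // pointer_declarator: `*inner`
    Function(Box<Declarator>),      // function_declarator: `inner(params)`
    Parenthesized(Box<Declarator>), // parenthesized_declarator: `(inner)`
}

/// Walk down the declarator chain until the identifier is reached.
fn extract_declarator_name(mut d: &Declarator) -> &str {
    loop {
        match d {
            Declarator::Identifier(name) => return name,
            Declarator::Pointer(inner)
            | Declarator::Function(inner)
            | Declarator::Parenthesized(inner) => d = &**inner,
        }
    }
}

fn main() {
    // char *strdup(const char *s)
    let decl = Declarator::Pointer(Box::new(Declarator::Function(Box::new(
        Declarator::Identifier("strdup".to_string()),
    ))));
    println!("{}", extract_declarator_name(&decl)); // strdup
}
```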

Extends detect_language() with all new file extensions.
Generalizes find_body() to search body_fields as child node kinds
(not just field names), enabling Kotlin class_body and C++
field_declaration_list.
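Treating each `body_fields` entry as either a field name or a child node kind can be sketched with a toy node type. The lookup order (field first, then kind) and the `Node` struct are assumptions for illustration; tree-sitter's real `Node` exposes fields through methods such as `child_by_field_name`.

```rust
/// Toy syntax node (hypothetical; not tree-sitter's Node).
struct Node {
    kind: &'static str,
    /// Field name under which this node hangs off its parent, if any.
    field: Option<&'static str>,
    children: Vec<Node>,
}

impl Node {
    fn leaf(kind: &'static str, field: Option<&'static str>) -> Node {
        Node { kind, field, children: Vec::new() }
    }
}

/// For each candidate in order, try a child under that field name,
/// then a positional child of that node kind.
fn find_body<'a>(node: &'a Node, body_fields: &[&str]) -> Option<&'a Node> {
    body_fields.iter().find_map(|want| {
        node.children
            .iter()
            .find(|c| c.field == Some(*want))
            .or_else(|| node.children.iter().find(|c| c.kind == *want))
    })
}

fn main() {
    // Kotlin-style: class_body is a positional child, not a named field.
    let kotlin_class = Node {
        kind: "class_declaration",
        field: None,
        children: vec![Node::leaf("type_identifier", None), Node::leaf("class_body", None)],
    };
    let body = find_body(&kotlin_class, &["body", "class_body"]);
    println!("{} -> {:?}", kotlin_class.kind, body.map(|n| n.kind)); // class_declaration -> Some("class_body")
}
```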

All 103 tree_path tests pass, including 8 new per-language test
modules. Full test suite (unit, golden, proptest) green.

Updates M2 section of docs/liyi-01x-roadmap.md from placeholder
"Deferred languages" to comprehensive documentation of all 8
language integrations.

Original prompt:

> Let's build them into the roadmap docs and implement.
> (Referring to C, C++, Objective-C, Java, C#, PHP, Kotlin, Swift
> tree-sitter language support for tree_path.)

AI-assisted-by: Claude Opus 4.6 (GitHub Copilot)
Signed-off-by: WANG Xuerui <git@xen0n.name>
Extract each language configuration into its own file under
crates/liyi/src/tree_path/:

  mod.rs           – core infrastructure (LanguageConfig, Language enum,
                     detect_language, resolve/compute functions) — 752 lines
  lang_rust.rs     – Rust config
  lang_python.rs   – Python config + tests
  lang_go.rs       – Go config + go_node_name callback + tests
  lang_c.rs        – C config + c_extract_declarator_name (shared) + tests
  lang_cpp.rs      – C++ config + tests (imports c_extract_declarator_name)
  lang_objc.rs     – Objective-C config + tests (imports c_extract_declarator_name)
  lang_java.rs     – Java config + tests
  lang_csharp.rs   – C# config + tests
  lang_php.rs      – PHP config + php_node_name callback + tests
  lang_kotlin.rs   – Kotlin config + kotlin_node_name callback + tests
  lang_swift.rs    – Swift config + tests
  lang_javascript.rs – JavaScript config + tests
  lang_typescript.rs – TypeScript + TSX configs + tests

No behavioral changes. All 168 tests pass (144 unit + 20 golden + 4 proptest).
Sidecar moved to tree_path/mod.rs.liyi.jsonc and reanchored (85 current).

Original prompt:

> The tree_path module is getting large, please refactor so
> every language lives its own file and commit.

AI-assisted-by: Claude Opus 4.6 (GitHub Copilot)
Signed-off-by: WANG Xuerui <git@xen0n.name>
The linter now has 14 tree-sitter grammars and ~7k lines of Rust.
Update the "self-contained" bullet and the reimplementation cost
estimate to reflect the current state.

Original prompt:

> Regarding the design doc (line 1900 and 2034) -- the linter is now
> 7348 lines of Rust and 33 MiB, arguably not "lightweight" any more.
> We need to reword a bit.
>
> For line 1900, I think this version is still a bit too much. Just
> "The linter is a single binary with tree-sitter grammars built in,
> no runtime dependencies" would be enough, because the size or
> complexity doesn't matter for self-containedness.

Human note: added the second turn.

AI-assisted-by: Claude Opus 4.6 (GitHub Copilot)
Signed-off-by: WANG Xuerui <git@xen0n.name>
Apply cargo fmt across tree_path/lang_*.rs files and fix two clippy
warnings:
- collapsible_if in tree_path/mod.rs (nested if let → combined chain)
- cloned_ref_to_slice_refs in discovery.rs (&[sub.clone()] → from_ref)

Reanchor mod.rs.liyi.jsonc after the code changes.

Original-prompt: please fix cargo fmt and cargo clippy and sync sidecars
AI-assisted-by: Claude Opus 4.6 (GitHub Copilot)
Signed-off-by: WANG Xuerui <git@xen0n.name>
@xen0n force-pushed the langconfig-refactor branch from 2a5fecb to 11c6d53 on March 10, 2026 12:28
xen0n added 2 commits March 10, 2026 20:29
Scaffold and populate .liyi.jsonc sidecars for 13 language support
files (lang_c, lang_cpp, lang_csharp, lang_go, lang_java,
lang_javascript, lang_kotlin, lang_objc, lang_php, lang_python,
lang_rust, lang_swift, lang_typescript) with intent specs for all
non-trivial items (CONFIG statics, custom name extractors).

Original prompt:

> okay, commit them first. also don't the newly added language
> support files need sidecars? if you decide to add, do it with
> subagents

Human note: fixed formatting and content of "original prompt".

AI-assisted-by: Claude Opus 4.6 (GitHub Copilot)
Signed-off-by: WANG Xuerui <git@xen0n.name>
The bullet said "the linter doesn't parse source code", but since
semantic anchors (tree_path) were added, the linter does parse source
via tree-sitter. Reword to say only the checking process skips parsing.

Original prompt:

> In line 1902, "the linter doesn't parse source code" -- this is
> no longer true, since support for semantic anchors was added. Please
> reword to say only the checking process doesn't parse code. Resync
> sidecar and commit.

AI-assisted-by: Claude Opus 4.6 (GitHub Copilot)
Signed-off-by: WANG Xuerui <git@xen0n.name>
@xen0n force-pushed the langconfig-refactor branch from 11c6d53 to de7ea3a on March 10, 2026 12:31
@xen0n merged commit de7ea3a into main on Mar 10, 2026
4 checks passed
@xen0n deleted the langconfig-refactor branch on March 10, 2026 12:33