
fix: tighten types and move textcase to module scope in custom delimiter loading #277

Merged
bashandbone merged 3 commits into claude/issue-99-20260319-0103 from copilot/sub-pr-275
Mar 25, 2026

Conversation


Copilot AI commented Mar 25, 2026

Review feedback on the custom delimiter loading implementation flagged loose typing (list[object]) and inline textcase imports on hot paths.

delimiter.py

  • _load_custom_delimiters() return type: list[object] → list[Delimiter]; the local variable is typed accordingly, which removes the # type: ignore[assignment] at the call site in _load_delimiters_for_language
  • import textcase, import logging, ConfigLanguage, SemanticSearchLanguage moved to module scope; logger = logging.getLogger(__name__) promoted to module level

selector.py

  • import textcase moved to module scope
  • textcase: object parameter removed from _match_custom_ext_pair — callers no longer pass the module around; all # type: ignore[attr-defined] dropped
  • Fixed enum normalization for delim_lang: the old code called textcase.snake(str(delim_lang)), which produces config_language_bash for ConfigLanguage.BASH; it now uses .variable, consistent with the _normalize helper:
# before — wrong for enum types
return textcase.snake(str(delim_lang))  # type: ignore[attr-defined]

# after
if isinstance(delim_lang, SemanticSearchLanguage | ConfigLanguage):
    return textcase.snake(delim_lang.variable)
return textcase.snake(str(delim_lang))

The isinstance union syntax is intentionally kept; it is the idiom CodeWeaver and ruff prefer.



Summary by Sourcery

Tighten typing and module-level setup for custom delimiter loading and selection, and correct language normalization when using enum-based custom delimiters.

Bug Fixes:

  • Fix language normalization for enum-based custom delimiter languages so that their snake_case names are derived from the enum variable instead of its string representation.

Enhancements:

  • Narrow the return type of custom delimiter loading to use the Delimiter model explicitly and remove the need for type-ignore annotations.
  • Promote textcase, logging, and language-related imports and logger initialization to module scope in delimiter and selector modules to avoid repeated imports and parameter plumbing.
  • Simplify custom extension matching by no longer threading the textcase module through function parameters and relying on the shared module-level import instead.
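The "parameter plumbing" point can be illustrated with a small sketch. It uses the standard-library `string` module as a stand-in for `textcase` so the example runs anywhere; the function names are hypothetical and do not appear in the PR.

```python
import string  # stand-in for `textcase`; imported once at module scope


def label_before(name: str, casing: object) -> str:
    # The module is threaded through as a parameter typed `object`, so a
    # type checker cannot see its attributes and needs an ignore comment.
    return casing.capwords(name)  # type: ignore[attr-defined]


def label_after(name: str) -> str:
    # The module-scope import is visible here: no extra parameter, no ignore.
    return string.capwords(name)


old = label_before("custom delimiter", string)
new = label_after("custom delimiter")
print(old, new)
```

Both calls return the same value; the second form only removes the extra argument and the suppressed type error.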

…ter loading

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Agent-Logs-Url: https://github.com/knitli/codeweaver/sessions/0cc04151-09d9-4c2f-9eb2-9a1f233a187f
Copilot AI changed the title from "[WIP] Implement custom delimiter loading with override and type safety" to "fix: tighten types and move textcase to module scope in custom delimiter loading" Mar 25, 2026
Copilot AI requested a review from bashandbone March 25, 2026 14:02
@bashandbone bashandbone marked this pull request as ready for review March 25, 2026 14:42
Copilot AI review requested due to automatic review settings March 25, 2026 14:42
@sourcery-ai

sourcery-ai bot commented Mar 25, 2026

Reviewer's guide (collapsed on small PRs)

Reviewer's Guide

Tightens typing in custom delimiter loading, promotes imports/logging to module scope to reduce hot-path overhead, and fixes enum normalization when resolving languages from custom delimiter extensions.

Class diagram for tightened delimiter loading types

classDiagram
    class Delimiter {
    }

    class SemanticSearchLanguage {
        string variable
    }

    class ConfigLanguage {
        string variable
    }

    class Governor {
        object settings
    }

    class Settings {
        list~object~ custom_delimiters
    }

    class ChunkerDelimiterModule {
        +Governor _governor
        +list~Delimiter~ _load_custom_delimiters(normalized_language str, language str)
        +list~Delimiter~ _load_delimiters_for_language(language str)
    }

    ChunkerDelimiterModule --> Governor : uses
    Governor --> Settings : has
    ChunkerDelimiterModule --> Delimiter : returns
    ChunkerDelimiterModule ..> SemanticSearchLanguage : normalizes
    ChunkerDelimiterModule ..> ConfigLanguage : normalizes

File-Level Changes

Change: Tighten typing for custom delimiter loading and remove ignore markers.
Details:
  • Change _load_custom_delimiters return type from list[object] to list[Delimiter] and type its local delimiters list accordingly.
  • Remove the type: ignore[assignment] at the _load_delimiters_for_language call site now that types align.
  • Keep the _normalize helper using the SemanticSearchLanguage | ConfigLanguage union isinstance pattern for enum handling.
Files: src/codeweaver/engine/chunker/delimiter.py

Change: Move imports and logger creation to module scope to avoid repeated work on hot paths.
Details:
  • Hoist imports for textcase, logging, ConfigLanguage, and SemanticSearchLanguage from inside _load_custom_delimiters to the module level.
  • Promote logger = logging.getLogger(__name__) to module level instead of creating it inside the function.
  • Remove inline textcase import in _load_delimiters_for_language; rely on the module-scope import.
Files: src/codeweaver/engine/chunker/delimiter.py

Change: Simplify custom extension matching API and fix enum normalization in selector.
Details:
  • Move textcase import to module scope, stop threading a textcase parameter through _match_custom_ext_pair, and update _detect_language_from_custom_ext caller accordingly.
  • Update delim_lang normalization to use delim_lang.variable when it is a SemanticSearchLanguage or ConfigLanguage enum, matching the _normalize helper semantics.
  • Drop type: ignore[attr-defined] comments by using correctly-typed enum handling and direct textcase.snake calls.
Files: src/codeweaver/engine/chunker/selector.py
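As a rough illustration of the typing change, the sketch below shows why a `list[Delimiter]` return type lets the call site drop its ignore comment, alongside a module-level logger. The `Delimiter` dataclass, its fields, and the function signatures are simplified assumptions and do not reflect the real model in delimiter.py.

```python
import logging
from dataclasses import dataclass

# Created once at import time rather than on every call.
logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class Delimiter:
    """Hypothetical, heavily simplified stand-in for the real Delimiter model."""
    start: str
    end: str


def load_custom_delimiters(raw: list[tuple[str, str]]) -> list[Delimiter]:
    # With list[object] as the return type, callers assigning the result to
    # a list[Delimiter] variable would need `# type: ignore[assignment]`.
    delimiters: list[Delimiter] = []
    for start, end in raw:
        delimiters.append(Delimiter(start=start, end=end))
    logger.debug("loaded %d custom delimiters", len(delimiters))
    return delimiters


def load_delimiters_for_language(raw: list[tuple[str, str]]) -> list[Delimiter]:
    # The annotation now matches the callee's return type, so no ignore is needed.
    delimiters: list[Delimiter] = load_custom_delimiters(raw)
    return delimiters


loaded = load_delimiters_for_language([("{", "}"), ("begin", "end")])
print(loaded)
```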


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 2 issues, and left some high level feedback:

  • The SemanticSearchLanguage | ConfigLanguage normalization logic using textcase.snake now appears in both delimiter._normalize and _match_custom_ext_pair; consider extracting this into a shared helper to avoid divergence if the enum shape changes again.
  • Now that textcase is imported at module scope, you may want to handle the case where it is an optional dependency (e.g., by guarding import errors or providing a fallback) if consumers can run without it.
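If textcase did turn out to be optional (the PR does not say whether it is), the guard suggested in the second point might look roughly like this. The `to_snake` wrapper and its regex fallback are hypothetical, and the fallback is not guaranteed to match `textcase.snake` on every input.

```python
import re

try:
    import textcase  # third-party; may be missing in a minimal install
except ImportError:
    textcase = None


def to_snake(text: str) -> str:
    """Use textcase when available, otherwise a small regex fallback."""
    if textcase is not None:
        return textcase.snake(text)
    # Fallback (assumption): approximate snake_case conversion.
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", text)
    text = re.sub(r"[^A-Za-z0-9]+", "_", text)
    return text.strip("_").lower()


result = to_snake("bash")
print(result)
```

The import still happens once at module scope; only the failure is deferred into a fallback path instead of raising at import time.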
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `SemanticSearchLanguage | ConfigLanguage` normalization logic using `textcase.snake` now appears in both `delimiter._normalize` and `_match_custom_ext_pair`; consider extracting this into a shared helper to avoid divergence if the enum shape changes again.
- Now that `textcase` is imported at module scope, you may want to handle the case where it is an optional dependency (e.g., by guarding import errors or providing a fallback) if consumers can run without it.

## Individual Comments

### Comment 1
<location path="src/codeweaver/engine/chunker/delimiter.py" line_range="1292-1295" />
<code_context>

-        logger = logging.getLogger(__name__)
-
         def _normalize(lang: object) -> str:
             if isinstance(lang, SemanticSearchLanguage | ConfigLanguage):
                 return textcase.snake(lang.variable)
             return textcase.snake(str(lang))

-        delimiters: list[object] = []
</code_context>
<issue_to_address>
**suggestion:** Consider centralizing language normalization logic used across delimiter loading and selector matching.

This helper duplicates the normalization logic in `_match_custom_ext_pair` (handling `SemanticSearchLanguage | ConfigLanguage` via `lang.variable`, otherwise `str(lang)`). Please consider extracting a shared utility (e.g., in a small internal helper module) so the chunker and selector stay in sync if one of them changes.
</issue_to_address>

### Comment 2
<location path="src/codeweaver/engine/chunker/selector.py" line_range="255-264" />
<code_context>
                 continue
             delim_lang = getattr(custom_delim, "language", None)
             if delim_lang is not None:
-                return textcase.snake(str(delim_lang))  # type: ignore[attr-defined]
+                if isinstance(delim_lang, SemanticSearchLanguage | ConfigLanguage):
+                    return textcase.snake(delim_lang.variable)
+                return textcase.snake(str(delim_lang))
             pair_lang = getattr(pair, "language", None)
             if pair_lang is not None:
                 if isinstance(pair_lang, SemanticSearchLanguage | ConfigLanguage):
                     return pair_lang
-                return textcase.snake(str(pair_lang))  # type: ignore[attr-defined]
+                return textcase.snake(str(pair_lang))
         return None

</code_context>
<issue_to_address>
**question:** The return shape for `language` is asymmetric between `custom_delim.language` and `pair.language`, which may be surprising for callers.

Here, `delim_lang` of type `SemanticSearchLanguage | ConfigLanguage` is converted to a normalized `str`, while `pair_lang` of the same types is returned as the enum instance. Given the shared return type `SemanticSearchLanguage | ConfigLanguage | str | None`, this inconsistency can complicate caller logic. Consider either returning normalized strings in both cases or returning enum instances in both and handling normalization at a higher layer. If you keep this behavior, documenting the intent or tightening the return type would help clarify the contract.
</issue_to_address>
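One way to resolve the asymmetry raised in Comment 2 is to normalize both branches to strings. The sketch below assumes that option; the enum, the `snake` stand-in for `textcase.snake`, the `detect_language` function, and the use of `SimpleNamespace` objects in place of real delimiter and pair objects are all hypothetical.

```python
import re
from enum import Enum
from types import SimpleNamespace
from typing import Optional


def snake(text: str) -> str:
    """Hypothetical stand-in for textcase.snake."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", text)
    text = re.sub(r"[^A-Za-z0-9]+", "_", text)
    return text.strip("_").lower()


class SemanticSearchLanguage(Enum):
    """Illustrative miniature of the real enum."""
    PYTHON = "python"

    @property
    def variable(self) -> str:
        return self.value  # assumption about what `.variable` returns


def normalize(lang: object) -> str:
    if isinstance(lang, SemanticSearchLanguage):
        return snake(lang.variable)
    return snake(str(lang))


def detect_language(custom_delim: object, pair: object) -> Optional[str]:
    # Both sources pass through the same normalizer, so callers always
    # receive either a normalized string or None, never an enum member.
    for source in (custom_delim, pair):
        lang = getattr(source, "language", None)
        if lang is not None:
            return normalize(lang)
    return None


from_delim = detect_language(
    SimpleNamespace(language=SemanticSearchLanguage.PYTHON), SimpleNamespace()
)
from_pair = detect_language(
    SimpleNamespace(), SimpleNamespace(language=SemanticSearchLanguage.PYTHON)
)
print(from_delim, from_pair)
```

The alternative the reviewer mentions, returning enum instances from both branches and normalizing at a higher layer, would be equally consistent; this sketch only shows the string-returning variant.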


Comment on lines 1292 to 1295
        def _normalize(lang: object) -> str:
            if isinstance(lang, SemanticSearchLanguage | ConfigLanguage):
                return textcase.snake(lang.variable)
            return textcase.snake(str(lang))

suggestion: Consider centralizing language normalization logic used across delimiter loading and selector matching.

This helper duplicates the normalization logic in _match_custom_ext_pair (handling SemanticSearchLanguage | ConfigLanguage via lang.variable, otherwise str(lang)). Please consider extracting a shared utility (e.g., in a small internal helper module) so the chunker and selector stay in sync if one of them changes.


Copilot AI left a comment


Pull request overview

This PR tightens typing and reduces repeated hot-path imports in the chunker custom-delimiter loading and selection logic.

Changes:

  • Tighten _load_custom_delimiters() typing (list[object] → list[Delimiter]) and remove the downstream # type: ignore[assignment].
  • Move textcase (and related imports/logger) to module scope in delimiter.py and selector.py.
  • Adjust enum normalization for custom delimiter language detection in ChunkerSelector.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
src/codeweaver/engine/chunker/selector.py | Moves textcase import to module scope and updates custom extension→language normalization.
src/codeweaver/engine/chunker/delimiter.py | Promotes imports/logger to module scope and tightens return types for custom delimiter loading.


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
@bashandbone bashandbone merged commit e0be205 into claude/issue-99-20260319-0103 Mar 25, 2026
6 checks passed
@bashandbone bashandbone deleted the copilot/sub-pr-275 branch March 25, 2026 19:05
bashandbone added a commit that referenced this pull request Mar 27, 2026
…and new-language detection (#275)

* feat: Implement delimiter conversion for custom delimiters from settings

Complete delimiter family system integration by implementing proper
conversion from DelimiterPattern to Delimiter objects for custom
delimiters loaded from user settings.

Changes:
- Remove TODO placeholder at delimiter.py:1289-1290
- Initialize delimiters list before custom delimiter processing
- Add conversion loop using Delimiter.from_pattern() for custom delimiters
- Maintain consistent pattern with family delimiter conversion

The keyword matching system blocker (documented in previous analysis)
has been resolved - _match_keyword_delimiters() was implemented at
delimiter.py:417, enabling proper handling of delimiters with empty
end strings and keyword-based structure detection.

Resolves #99

Co-authored-by: Adam Poulemanos <bashandbone@users.noreply.github.com>

* fix: address review feedback on custom delimiter loading

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Agent-Logs-Url: https://github.com/knitli/codeweaver/sessions/66279302-fae6-4887-8c87-7b6eadfac587

* fix: remove unused type: ignore comment in test

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Agent-Logs-Url: https://github.com/knitli/codeweaver/sessions/66279302-fae6-4887-8c87-7b6eadfac587

* fix: tighten types and move textcase to module scope in custom delimiter loading (#277)

* Initial plan

* fix: tighten types and move textcase to module scope in custom delimiter loading

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Agent-Logs-Url: https://github.com/knitli/codeweaver/sessions/0cc04151-09d9-4c2f-9eb2-9a1f233a187f

* Update src/codeweaver/engine/chunker/selector.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>

---------

Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update src/codeweaver/engine/chunker/selector.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>

---------

Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Poulemanos <bashandbone@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

3 participants