fix: tighten types and move textcase to module scope in custom delimiter loading#277
Conversation
…ter loading Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> Agent-Logs-Url: https://github.com/knitli/codeweaver/sessions/0cc04151-09d9-4c2f-9eb2-9a1f233a187f
Reviewer's guide (collapsed on small PRs)Reviewer's GuideTightens typing in custom delimiter loading, promotes imports/logging to module scope to reduce hot-path overhead, and fixes enum normalization when resolving languages from custom delimiter extensions. Class diagram for tightened delimiter loading typesclassDiagram
class Delimiter {
}
class SemanticSearchLanguage {
string variable
}
class ConfigLanguage {
string variable
}
class Governor {
object settings
}
class Settings {
list~object~ custom_delimiters
}
class ChunkerDelimiterModule {
+Governor _governor
+list~Delimiter~ _load_custom_delimiters(normalized_language str, language str)
+list~Delimiter~ _load_delimiters_for_language(language str)
}
ChunkerDelimiterModule --> Governor : uses
Governor --> Settings : has
ChunkerDelimiterModule --> Delimiter : returns
ChunkerDelimiterModule ..> SemanticSearchLanguage : normalizes
ChunkerDelimiterModule ..> ConfigLanguage : normalizes
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The
SemanticSearchLanguage | ConfigLanguagenormalization logic usingtextcase.snakenow appears in bothdelimiter._normalizeand_match_custom_ext_pair; consider extracting this into a shared helper to avoid divergence if the enum shape changes again. - Now that
textcaseis imported at module scope, you may want to handle the case where it is an optional dependency (e.g., by guarding import errors or providing a fallback) if consumers can run without it.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `SemanticSearchLanguage | ConfigLanguage` normalization logic using `textcase.snake` now appears in both `delimiter._normalize` and `_match_custom_ext_pair`; consider extracting this into a shared helper to avoid divergence if the enum shape changes again.
- Now that `textcase` is imported at module scope, you may want to handle the case where it is an optional dependency (e.g., by guarding import errors or providing a fallback) if consumers can run without it.
## Individual Comments
### Comment 1
<location path="src/codeweaver/engine/chunker/delimiter.py" line_range="1292-1295" />
<code_context>
- logger = logging.getLogger(__name__)
-
def _normalize(lang: object) -> str:
if isinstance(lang, SemanticSearchLanguage | ConfigLanguage):
return textcase.snake(lang.variable)
return textcase.snake(str(lang))
- delimiters: list[object] = []
</code_context>
<issue_to_address>
**suggestion:** Consider centralizing language normalization logic used across delimiter loading and selector matching.
This helper duplicates the normalization logic in `_match_custom_ext_pair` (handling `SemanticSearchLanguage | ConfigLanguage` via `lang.variable`, otherwise `str(lang)`). Please consider extracting a shared utility (e.g., in a small internal helper module) so the chunker and selector stay in sync if one of them changes.
</issue_to_address>
### Comment 2
<location path="src/codeweaver/engine/chunker/selector.py" line_range="255-264" />
<code_context>
continue
delim_lang = getattr(custom_delim, "language", None)
if delim_lang is not None:
- return textcase.snake(str(delim_lang)) # type: ignore[attr-defined]
+ if isinstance(delim_lang, SemanticSearchLanguage | ConfigLanguage):
+ return textcase.snake(delim_lang.variable)
+ return textcase.snake(str(delim_lang))
pair_lang = getattr(pair, "language", None)
if pair_lang is not None:
if isinstance(pair_lang, SemanticSearchLanguage | ConfigLanguage):
return pair_lang
- return textcase.snake(str(pair_lang)) # type: ignore[attr-defined]
+ return textcase.snake(str(pair_lang))
return None
</code_context>
<issue_to_address>
**question:** The return shape for `language` is asymmetric between `custom_delim.language` and `pair.language`, which may be surprising for callers.
Here, `delim_lang` of type `SemanticSearchLanguage | ConfigLanguage` is converted to a normalized `str`, while `pair_lang` of the same types is returned as the enum instance. Given the shared return type `SemanticSearchLanguage | ConfigLanguage | str | None`, this inconsistency can complicate caller logic. Consider either returning normalized strings in both cases or returning enum instances in both and handling normalization at a higher layer. If you keep this behavior, documenting the intent or tightening the return type would help clarify the contract.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| def _normalize(lang: object) -> str: | ||
| if isinstance(lang, SemanticSearchLanguage | ConfigLanguage): | ||
| return textcase.snake(lang.variable) | ||
| return textcase.snake(str(lang)) |
There was a problem hiding this comment.
suggestion: Consider centralizing language normalization logic used across delimiter loading and selector matching.
This helper duplicates the normalization logic in _match_custom_ext_pair (handling SemanticSearchLanguage | ConfigLanguage via lang.variable, otherwise str(lang)). Please consider extracting a shared utility (e.g., in a small internal helper module) so the chunker and selector stay in sync if one of them changes.
There was a problem hiding this comment.
Pull request overview
This PR tightens typing and reduces repeated hot-path imports in the chunker custom-delimiter loading and selection logic.
Changes:
- Tighten
_load_custom_delimiters()typing (list[object]→list[Delimiter]) and remove the downstream# type: ignore[assignment]. - Move
textcase(and related imports/logger) to module scope indelimiter.pyandselector.py. - Adjust enum normalization for custom delimiter language detection in
ChunkerSelector.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/codeweaver/engine/chunker/selector.py | Moves textcase import to module scope and updates custom extension→language normalization. |
| src/codeweaver/engine/chunker/delimiter.py | Promotes imports/logger to module scope and tightens return types for custom delimiter loading. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
…and new-language detection (#275) * feat: Implement delimiter conversion for custom delimiters from settings Complete delimiter family system integration by implementing proper conversion from DelimiterPattern to Delimiter objects for custom delimiters loaded from user settings. Changes: - Remove TODO placeholder at delimiter.py:1289-1290 - Initialize delimiters list before custom delimiter processing - Add conversion loop using Delimiter.from_pattern() for custom delimiters - Maintain consistent pattern with family delimiter conversion The keyword matching system blocker (documented in previous analysis) has been resolved - _match_keyword_delimiters() was implemented at delimiter.py:417, enabling proper handling of delimiters with empty end strings and keyword-based structure detection. Resolves #99 Co-authored-by: Adam Poulemanos <bashandbone@users.noreply.github.com> * fix: address review feedback on custom delimiter loading Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> Agent-Logs-Url: https://github.com/knitli/codeweaver/sessions/66279302-fae6-4887-8c87-7b6eadfac587 * fix: remove unused type: ignore comment in test Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> Agent-Logs-Url: https://github.com/knitli/codeweaver/sessions/66279302-fae6-4887-8c87-7b6eadfac587 * fix: tighten types and move textcase to module scope in custom delimiter loading (#277) * Initial plan * fix: tighten types and move textcase to module scope in custom delimiter loading Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> Agent-Logs-Url: https://github.com/knitli/codeweaver/sessions/0cc04151-09d9-4c2f-9eb2-9a1f233a187f * Update src/codeweaver/engine/chunker/selector.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> --------- Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update src/codeweaver/engine/chunker/selector.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> --------- Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Adam Poulemanos <bashandbone@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Review feedback on the custom delimiter loading implementation flagged loose typing (
list[object]) and inlinetextcaseimports on hot paths.delimiter.py_load_custom_delimiters()return type:list[object]→list[Delimiter]; local variable typed accordingly — removes the# type: ignore[assignment]at the call site in_load_delimiters_for_languageimport textcase,import logging,ConfigLanguage,SemanticSearchLanguagemoved to module scope;logger = logging.getLogger(__name__)promoted to module levelselector.pyimport textcasemoved to module scopetextcase: objectparameter removed from_match_custom_ext_pair— callers no longer pass the module around; all# type: ignore[attr-defined]droppeddelim_lang: wastextcase.snake(str(delim_lang))which producesconfig_language_bashforConfigLanguage.BASH; now uses.variableconsistent with the_normalizehelper:isinstanceunion syntax intentionally kept — CodeWeaver/ruff preferred idiom.🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.
Summary by Sourcery
Tighten typing and module-level setup for custom delimiter loading and selection, and correct language normalization when using enum-based custom delimiters.
Bug Fixes:
Enhancements: