fix: defer fastembed import errors to use time instead of import time #280
Replace import-time `ConfigurationError` raises with conditional imports using `has_package()` sentinels, following the existing pattern in `service_cards.py`. Modules can now be safely imported even when fastembed is unavailable; errors are raised only when fastembed functionality is actually invoked.

Files changed:

- `fastembed_extensions.py`: guard imports and model constants; add a `_require_fastembed()` check in provider functions
- `embedding/providers/fastembed.py`: conditional import with `Any` fallbacks
- `reranking/providers/fastembed.py`: conditional import with `Any` fallbacks

Closes knitli#279

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reviewer's Guide

Defers FastEmbed-related `ConfigurationError` exceptions from import time to use time by introducing package-availability sentinels, conditional imports, and runtime guards across the fastembed extensions, embedding provider, and reranking provider modules.

Sequence diagram for deferred FastEmbed `ConfigurationError` at use time

sequenceDiagram
actor User
participant Application
participant FastEmbedEmbeddingProvider
participant fastembed_extensions
participant FastEmbedLibrary
User->>Application: request_embeddings(texts)
Application->>FastEmbedEmbeddingProvider: embed(texts)
alt First use of FastEmbedEmbeddingProvider
FastEmbedEmbeddingProvider->>fastembed_extensions: get_text_embedder()
fastembed_extensions->>fastembed_extensions: _require_fastembed()
alt FastEmbed is installed
fastembed_extensions->>FastEmbedLibrary: import TextEmbedding, DenseModelDescription, BaseModelDescription
fastembed_extensions-->>FastEmbedEmbeddingProvider: TextEmbedding subclass with custom models
FastEmbedEmbeddingProvider->>FastEmbedLibrary: TextEmbedding(texts)
FastEmbedLibrary-->>FastEmbedEmbeddingProvider: embeddings
FastEmbedEmbeddingProvider-->>Application: embeddings
Application-->>User: embeddings
else FastEmbed is not installed
fastembed_extensions-->>FastEmbedEmbeddingProvider: raise ConfigurationError
FastEmbedEmbeddingProvider-->>Application: propagate ConfigurationError
Application-->>User: error response
end
else Subsequent uses
FastEmbedEmbeddingProvider->>FastEmbedLibrary: reuse _TextEmbedding(texts)
FastEmbedLibrary-->>FastEmbedEmbeddingProvider: embeddings
FastEmbedEmbeddingProvider-->>Application: embeddings
Application-->>User: embeddings
end
Flow diagram for FastEmbed provider import and sentinel-based initialization

flowchart TD
A[Module import: embedding/providers/fastembed.py] --> B[Call has_package fastembed]
B --> C[Call has_package fastembed-gpu]
C --> D{Any FastEmbed package available?}
D -- Yes --> E[Set _FASTEMBED_AVAILABLE to True]
D -- No --> F[Set _FASTEMBED_AVAILABLE to False]
E --> G[Import TextEmbedding, SparseTextEmbedding under TYPE_CHECKING or runtime]
F --> H[Assign TextEmbedding = Any, SparseTextEmbedding = Any]
G --> I[Import get_text_embedder and get_sparse_embedder]
I --> J[Initialize _TextEmbedding and _SparseTextEmbedding via helpers]
F --> K[Set _TextEmbedding = None, _SparseTextEmbedding = None]
J --> L[Provider methods use _TextEmbedding and _SparseTextEmbedding]
K --> L
L --> M{FastEmbed requested at runtime?}
M -- Yes, FastEmbed available --> N[Use underlying FastEmbed classes normally]
M -- Yes, FastEmbed missing --> O[_require_fastembed raises ConfigurationError]
M -- No --> P[No FastEmbed usage, module import remains successful]
%% fastembed_extensions guard
Q[Module import: fastembed_extensions.py] --> R[Compute _FASTEMBED_AVAILABLE with has_package]
R --> S{_FASTEMBED_AVAILABLE?}
S -- Yes --> T[Import real FastEmbed model description types and TextCrossEncoder]
S -- No --> U[Assign BaseModelDescription, DenseModelDescription, ModelSource, PoolingType, TextCrossEncoder, SparseTextEmbedding, TextEmbedding to Any]
T --> V[Populate DENSE_MODELS and RERANKING_MODELS tuples]
U --> W[Set DENSE_MODELS = empty tuple, RERANKING_MODELS = empty tuple]
V --> X[Runtime helpers call _require_fastembed before using FastEmbed]
W --> X
%% reranking provider
AA[Module import: reranking/providers/fastembed.py] --> AB[Compute _FASTEMBED_AVAILABLE with has_package]
AB --> AC{_FASTEMBED_AVAILABLE?}
AC -- Yes --> AD[Import TextCrossEncoder under TYPE_CHECKING or runtime]
AC -- No --> AE[Assign TextCrossEncoder = Any]
AD --> AF[FastEmbedRerankingProvider uses real TextCrossEncoder]
AE --> AH[FastEmbedRerankingProvider type checks but will error only on actual FastEmbed use]
AF --> AG[Import completes without raising even if FastEmbed is missing]
AH --> AG
👋 Hey @aiedwardyi,

Thanks for your contribution to codeweaver! 🧵

You need to agree to the CLA first... 🖊️

Before we can accept your contribution, you need to agree to our Contributor License Agreement (CLA). To agree to the CLA, please comment:

Those exact words are important, so please don't change them. 😉

You can read the full CLA here: Contributor License Agreement

✅ @aiedwardyi has signed the CLA.

You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.
Hey, I've found 2 issues and left some high-level feedback:

- In `embedding/providers/fastembed.py`, `_TextEmbedding` and `_SparseTextEmbedding` are set to `None` when FastEmbed is unavailable; consider mirroring the `_require_fastembed()` pattern used in `fastembed_extensions.py` so that any code path that instantiates or uses these types raises a clear `ConfigurationError` instead of hitting `None` at runtime.
- In `reranking/providers/fastembed.py`, `FastEmbedRerankingProvider` now type-checks without FastEmbed installed, but there is no corresponding runtime guard; it would be safer to add a use-time availability check (similar to `_require_fastembed()`) in the provider’s constructor or first-use methods to fail with a clear configuration error if FastEmbed is missing.
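The first suggestion can be sketched as follows, under stated assumptions. `FastEmbedEmbeddingProvider` here is a hypothetical, heavily simplified stand-in, since the real class's constructor is not shown in this thread. `ConfigurationError` is a local stand-in, and the module-level values simulate the FastEmbed-missing branch. The sketch shows how checking in `__init__` turns a later `TypeError` from calling `None` into an immediate, descriptive error.

```python
from typing import Any, Optional


class ConfigurationError(Exception):
    """Local stand-in for codeweaver's ConfigurationError."""


# Simulates the module state when FastEmbed is missing (None placeholder).
_FASTEMBED_AVAILABLE = False
_TextEmbedding: Optional[Any] = None


def _require_fastembed() -> None:
    if not _FASTEMBED_AVAILABLE or _TextEmbedding is None:
        raise ConfigurationError(
            "fastembed is not installed. Please install it with "
            "`pip install code-weaver[fastembed]` or "
            "`pip install code-weaver[fastembed-gpu]`."
        )


class FastEmbedEmbeddingProvider:
    """Hypothetical simplified provider used only for illustration."""

    def __init__(self, model_name: str) -> None:
        # Fail fast with a clear message instead of calling a None placeholder.
        _require_fastembed()
        self._model = _TextEmbedding(model_name=model_name)


def construct_without_guard(model_name: str) -> Any:
    # What an unguarded call site hits: TypeError ('NoneType' is not callable).
    return _TextEmbedding(model_name=model_name)
```

A sparse-embedding provider would call the same guard before touching `_SparseTextEmbedding`.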
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `embedding/providers/fastembed.py`, `_TextEmbedding` and `_SparseTextEmbedding` are set to `None` when FastEmbed is unavailable; consider mirroring the `_require_fastembed()` pattern used in `fastembed_extensions.py` so that any code path that instantiates or uses these types raises a clear `ConfigurationError` instead of hitting `None` at runtime.
- In `reranking/providers/fastembed.py`, `FastEmbedRerankingProvider` now type-checks without FastEmbed installed, but there is no corresponding runtime guard; it would be safer to add a use-time availability check (similar to `_require_fastembed()`) in the provider’s constructor or first-use methods to fail with a clear configuration error if FastEmbed is missing.
## Individual Comments
### Comment 1
<location path="src/codeweaver/providers/embedding/providers/fastembed.py" line_range="55-56" />
<code_context>
+ )
+
+
+if _FASTEMBED_AVAILABLE:
+ """
+ SPARSE_MODELS = (
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `None` placeholders for embedding classes can lead to unclear runtime failures when fastembed is missing.
Previously this module raised a `ConfigurationError` at import time when FastEmbed was missing. With `_TextEmbedding` and `_SparseTextEmbedding` now set to `None` when `_FASTEMBED_AVAILABLE` is false, callers may hit `AttributeError`/`TypeError` instead of a clear configuration error if they construct or use this provider without checking the flag. Consider either raising `ConfigurationError` in the provider’s constructor/factory when `_FASTEMBED_AVAILABLE` is false, or avoiding `None` placeholders so selection of this provider fails fast with a clear message.
</issue_to_address>
### Comment 2
<location path="src/codeweaver/providers/reranking/providers/fastembed.py" line_range="26-31" />
<code_context>
from codeweaver.core.di import dependency_provider
+from codeweaver.core.utils import has_package
+_FASTEMBED_AVAILABLE = has_package("fastembed") or has_package("fastembed-gpu")
-try:
</code_context>
<issue_to_address>
**issue (bug_risk):** Removing the configuration error on missing fastembed may cause harder-to-debug runtime issues for reranking.
Previously, a failed `TextCrossEncoder` import raised a `ConfigurationError` with a clear install hint. Now, with `TextCrossEncoder` typed as `Any` when fastembed is missing, `FastEmbedRerankingProvider` can still be constructed and will only fail on first use with a likely opaque error. Please add an explicit check (e.g., in `FastEmbedRerankingProvider.__init__` or its factory) that raises a clear `ConfigurationError` when `_FASTEMBED_AVAILABLE` is false.
</issue_to_address>
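This sketch applies Comment 2's suggestion at a factory rather than in `__init__`. `make_fastembed_reranker` and the simplified `FastEmbedRerankingProvider` are hypothetical names used for illustration. `TextCrossEncoder = Any` mirrors the PR's `else` branch. The sketch also shows why the guard matters: calling the `Any` placeholder raises a generic `TypeError` that says nothing about the missing dependency.

```python
from typing import Any


class ConfigurationError(Exception):
    """Local stand-in for codeweaver's ConfigurationError."""


# Simulates the FastEmbed-missing branch: the placeholder from the PR.
_FASTEMBED_AVAILABLE = False
TextCrossEncoder = Any


def _require_fastembed() -> None:
    if not _FASTEMBED_AVAILABLE:
        raise ConfigurationError(
            "fastembed is not installed. Please install it with "
            "`pip install code-weaver[fastembed]` or "
            "`pip install code-weaver[fastembed-gpu]`."
        )


class FastEmbedRerankingProvider:
    """Hypothetical simplified provider that wraps a cross-encoder client."""

    def __init__(self, client: Any) -> None:
        self._client = client


def make_fastembed_reranker(model_name: str) -> FastEmbedRerankingProvider:
    """Hypothetical factory: check availability before touching TextCrossEncoder."""
    _require_fastembed()
    return FastEmbedRerankingProvider(TextCrossEncoder(model_name=model_name))
```

Putting the check in a factory keeps the provider class itself free of dependency logic. Putting it in `__init__` instead also covers callers that bypass the factory.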
Pull request overview
This PR aims to make FastEmbed-related modules safe to import when fastembed/fastembed-gpu isn’t installed, deferring dependency errors until FastEmbed functionality is actually used (addressing #279 and improving Python 3.14+ test collection behavior).
Changes:
- Added `_FASTEMBED_AVAILABLE` sentinels based on `has_package()` and conditional `TYPE_CHECKING` imports across FastEmbed embedding/reranking modules.
- Introduced a use-time `_require_fastembed()` guard in `fastembed_extensions.py` and gated model registries behind availability checks.
- Replaced prior import-time `ConfigurationError` raises with lazy patterns intended to defer failures.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/codeweaver/providers/reranking/providers/fastembed.py | Switches FastEmbed import logic to a sentinel + conditional imports to avoid import-time failure when dependency is absent. |
| src/codeweaver/providers/embedding/providers/fastembed.py | Applies the same sentinel + conditional import approach for embedding/sparse embedding providers. |
| src/codeweaver/providers/embedding/fastembed_extensions.py | Adds _require_fastembed() and gates model registries; intended to centralize use-time dependency errors. |
```python
raise ConfigurationError(
    "fastembed is not installed. Please install it with "
    "`pip install code-weaver[fastembed]` or `codeweaver[fastembed-gpu]`."
```
The `ConfigurationError` install hint uses `pip install codeweaver[fastembed-gpu]`, but the project’s distribution name is `code-weaver` (see `pyproject.toml`). Using the wrong name will send users to a failing install command. Update the message so both extras use the correct distribution name consistently.
Suggested change:

```diff
-    "`pip install code-weaver[fastembed]` or `codeweaver[fastembed-gpu]`."
+    "`pip install code-weaver[fastembed]` or `pip install code-weaver[fastembed-gpu]`."
```
```diff
+_FASTEMBED_AVAILABLE = has_package("fastembed") or has_package("fastembed-gpu")
-try:
+if TYPE_CHECKING or _FASTEMBED_AVAILABLE:
```
This PR changes import-time behavior (modules should be importable when fastembed isn’t available). There doesn’t appear to be a regression test asserting that these modules can be imported when `has_package("fastembed")`/`has_package("fastembed-gpu")` is false. Adding a unit test that temporarily forces `has_package` to return false (and reloads these modules) would help prevent reintroducing import-time failures.
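Here is a self-contained sketch of the suggested regression test. To run on its own, it writes two throwaway stand-in modules to a temporary directory and then re-imports the provider while `has_package` is patched to return `False`. The stand-ins are `demo_utils`, which replaces `codeweaver.core.utils`, and `demo_provider`, which replaces a FastEmbed provider module. Both names are hypothetical. A real test would presumably patch `codeweaver.core.utils.has_package` and reload the three modules named in this PR instead. That approach works because re-executing `from ... import has_package` picks up the patched attribute.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path
from types import ModuleType
from typing import Any
from unittest import mock

# Hypothetical stand-in for codeweaver.core.utils.
UTILS_SRC = textwrap.dedent(
    """
    import importlib.util

    def has_package(name):
        return importlib.util.find_spec(name) is not None
    """
)

# Hypothetical stand-in for a provider module that uses the sentinel pattern.
PROVIDER_SRC = textwrap.dedent(
    """
    from typing import TYPE_CHECKING, Any

    from demo_utils import has_package

    _FASTEMBED_AVAILABLE = has_package("fastembed") or has_package("fastembed-gpu")

    if TYPE_CHECKING or _FASTEMBED_AVAILABLE:
        from fastembed.text import TextEmbedding
    else:
        TextEmbedding = Any
    """
)


def import_provider_without_fastembed(tmp: Path) -> ModuleType:
    """Import demo_provider with has_package() forced to report nothing installed."""
    (tmp / "demo_utils.py").write_text(UTILS_SRC)
    (tmp / "demo_provider.py").write_text(PROVIDER_SRC)
    for name in ("demo_utils", "demo_provider"):
        sys.modules.pop(name, None)  # force fresh imports
    sys.path.insert(0, str(tmp))
    importlib.invalidate_caches()
    try:
        utils = importlib.import_module("demo_utils")
        with mock.patch.object(utils, "has_package", return_value=False):
            return importlib.import_module("demo_provider")
    finally:
        sys.path.remove(str(tmp))


def test_provider_imports_without_fastembed() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        provider = import_provider_without_fastembed(Path(tmp))
    # Import succeeded and fell back to the placeholder branch.
    assert provider._FASTEMBED_AVAILABLE is False
    assert provider.TextEmbedding is Any
```

The test runs under pytest, or `test_provider_imports_without_fastembed()` can be called directly.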
```diff
+_FASTEMBED_AVAILABLE = has_package("fastembed") or has_package("fastembed-gpu")
-    raise ConfigurationError(
-        r"FastEmbed is not installed. Please install it with `pip install code-weaver\[fastembed]` or `codeweaver\[fastembed-gpu]`."
-    ) from e
+if TYPE_CHECKING or _FASTEMBED_AVAILABLE:
+    from fastembed.rerank.cross_encoder import TextCrossEncoder
+else:
+    TextCrossEncoder = Any
```
`_FASTEMBED_AVAILABLE` is computed from `has_package(...)`, but the subsequent `from fastembed...` import is unconditional when that flag is true. If the distribution is present but `import fastembed` fails (e.g., missing platform wheels or compiled deps), this will still crash at module import time, which defeats the goal of deferring fastembed import errors to use time. Consider wrapping the fastembed import in `try`/`except ImportError` and, on failure, treating fastembed as unavailable (optionally stashing the `ImportError`) so you can raise a `ConfigurationError` from a small `_require_fastembed()` guard when the provider is actually used.
```diff
+_FASTEMBED_AVAILABLE = has_package("fastembed") or has_package("fastembed-gpu")
-try:
+if TYPE_CHECKING or _FASTEMBED_AVAILABLE:
     from fastembed.sparse import SparseTextEmbedding
     from fastembed.text import TextEmbedding
```
This file now defers the absence of fastembed, but it no longer defers `ImportError` during fastembed import (the `from fastembed...` imports run whenever `has_package(...)` returns true). To fully meet the PR’s goal, the imports should be protected so that a broken or partial fastembed install doesn’t raise at module import time; instead, record the failure and raise `ConfigurationError` when `FastEmbedEmbeddingProvider`/`FastEmbedSparseProvider` functionality is invoked.
```diff
@@ -24,87 +26,104 @@
 from fastembed.rerank.cross_encoder import TextCrossEncoder
 from fastembed.sparse import SparseTextEmbedding
 from fastembed.text import TextEmbedding
```
`_FASTEMBED_AVAILABLE` is based on package presence, but the fastembed imports in the next block are not guarded. If `fastembed` (or `fastembed-gpu`) is installed but fails to import (missing binary deps, incompatible Python, etc.), this module will still raise at import time. To actually defer import errors to use time, wrap these imports in `try`/`except ImportError` and fall back to the `Any` placeholders plus a `_require_fastembed()` that raises `ConfigurationError` when the exported helpers are called.
I read the contributors license agreement and I agree to it.

recheck

I read the contributors license agreement and I agree to it.

I have read the CLA Document and I hereby sign the CLA

recheck

The CLA bot doesn't seem to be recognizing my signature. I've commented with both the custom phrase and the default phrase multiple times, but the check keeps failing. Could you take a look at the CLA bot configuration? It may be a permissions issue writing to \ in the \ repo.
Address review feedback from Sourcery and Copilot:

- Wrap fastembed imports in `try`/`except ImportError` so broken installs (missing binary deps, incompatible Python) don't crash at import time
- Add `_require_fastembed()` guards in the embedding and reranking providers so missing fastembed raises a clear `ConfigurationError` instead of an opaque `AttributeError`/`TypeError` from `None` placeholders
- Fix install hint typo: `codeweaver[fastembed-gpu]` -> `code-weaver[fastembed-gpu]`
@aiedwardyi thanks for the contribution! Tracking the CLA bot issue; it wasn't high on my list while it was just me and the robot masses here. You get the distinction of being the first (other) human contributor to CodeWeaver. 🎉 And I appreciate it -- you gave me a reason to stop endlessly tweaking and ship Alpha 6. Should go out tomorrow. This looks good to me.

Closes #279
Summary
- Replace import-time `ConfigurationError` raises with conditional imports using `has_package()` sentinels, following the existing pattern in `service_cards.py`
- Files changed: `fastembed_extensions.py`, `embedding/providers/fastembed.py`, and `reranking/providers/fastembed.py`

Approach
Each file follows the same pattern:
1. `_FASTEMBED_AVAILABLE = has_package("fastembed") or has_package("fastembed-gpu")`
2. Conditional imports under `TYPE_CHECKING or _FASTEMBED_AVAILABLE`, with `Any` placeholders in the `else` branch
3. `_require_fastembed()` raises `ConfigurationError` only when fastembed features are actually called

This mirrors the lazy import convention already used in `service_cards.py` and keeps type checking fully functional.

Test plan
- … `requires_fastembed` marker)
- `ConfigurationError` is still raised with a clear message when fastembed functionality is invoked without the package

Closes #279
🤖 Generated with Claude Code
Summary by Sourcery
Defer FastEmbed dependency errors from import time to use time across embedding and reranking providers, allowing the package to be imported without FastEmbed installed while still failing clearly when FastEmbed-backed functionality is invoked.
Bug Fixes:
Enhancements: