LCORE-1446: Add dynamic filter support to Solr vector IO provider #119
anik120 wants to merge 1 commit into lightspeed-core:main from
Conversation
Walkthrough

Adds dynamic filter support to the Solr provider: new serialization of llama-stack `Filter` objects into Solr `fq` filter queries.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant SolrIndex
    participant SolrServer
    Client->>SolrIndex: call query_vector(..., filters=Filter)
    SolrIndex->>SolrIndex: serialize Filter -> fq (validate/escape)
    SolrIndex->>SolrIndex: combine fq with configured chunk_filter_query (AND) if present
    SolrIndex->>SolrServer: HTTP request with params (q, fq, rows, ...)
    SolrServer-->>SolrIndex: search results
    SolrIndex-->>Client: return chunks/results
```
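For orientation, here is a minimal sketch of the flow in the diagram. The parameter names (`q`, `rows`, `fl`, `wt`, `fq`) are quoted from the review comments below; the helper names and the `/select` endpoint shape are assumptions, not the provider's actual structure:

```python
import httpx


def combine_fq(static_fq: str | None, dynamic_fq: str | None) -> str | None:
    """Join the configured chunk_filter_query with a serialized Filter using AND."""
    parts = [p for p in (static_fq, dynamic_fq) if p]
    if not parts:
        return None
    return f"({' AND '.join(parts)})" if len(parts) > 1 else parts[0]


async def solr_select(base_url: str, query_string: str, k: int, fq: str | None) -> dict:
    """Send the search request; params mirror those named in the review."""
    params: dict[str, str | int] = {"q": query_string, "rows": k, "fl": "*,score", "wt": "json"}
    if fq:
        params["fq"] = fq  # combined static + dynamic filter, when present
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{base_url}/select", params=params)
        resp.raise_for_status()
        return resp.json()
```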
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ Passed checks (5 passed)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py (5)
1296-1300: ⚠️ Potential issue | 🔴 Critical

Annotate the vector-store cache.

The mypy job reports `Need type annotation for "cache"` on this new empty dict assignment.

🔧 Proposed typing fix

```diff
- self.cache = {}
+ self.cache: dict[str, VectorStoreWithIndex] = {}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py` around lines 1296 - 1300, Annotate the untyped cache on the class by giving self.cache an explicit type (for example use typing.Dict[str, typing.Any] or a more specific type if known) instead of assigning a bare {} in the constructor; add the necessary import from typing (Dict, Any or the precise types) and replace the line setting self.cache so mypy knows its type (refer to the attribute name self.cache in the class __init__).
860-865: ⚠️ Potential issue | 🔴 Critical

Annotate the defaultdict used for kept chunk indices.

The mypy job reports `Need type annotation for "kept_indices_by_parent"` here.

🔧 Proposed typing fix

```diff
- kept_indices_by_parent = defaultdict(list)
+ kept_indices_by_parent: defaultdict[Any, list[int]] = defaultdict(list)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py` around lines 860 - 865, The variable kept_indices_by_parent is untyped for mypy; annotate it as a DefaultDict mapping parent IDs to lists of chunk indices. Replace the current assignment with a typed declaration like kept_indices_by_parent: DefaultDict[str, List[int]] = defaultdict(list) (or use DefaultDict[Any, List[int]] if parent keys aren’t strings) and add the necessary typing imports (DefaultDict, List) while keeping the existing collections.defaultdict import; update the declaration near the other variables (expanded_chunks, expanded_scores) and run mypy to verify.
697-800: ⚠️ Potential issue | 🔴 Critical

Narrow `chunk_window_config` before dereferencing it.

These helpers assume chunk-windowing is enabled, but `self.chunk_window_config` is typed as optional, which causes the reported `union-attr` mypy failures. Add an `assert schema is not None` after assigning `schema` in each chunk-window-only helper.

🔧 Proposed mypy fix pattern

```diff
  schema = self.chunk_window_config
+ assert schema is not None
```

Also applies to: 859-864, 1103-1104
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py` around lines 697 - 800, The code dereferences self.chunk_window_config (assigned to variable schema) in chunk-window-only helpers like the parent-metadata fetcher and _fetch_context_chunks even though chunk_window_config is Optional; add an assertion directly after "schema = self.chunk_window_config" in each of these helpers: assert schema is not None, to narrow the type for mypy and avoid union-attr errors (do the same in the other chunk-window-only helper methods where schema is used).
463-485: ⚠️ Potential issue | 🔴 Critical

Annotate `solr_params` with an HTTPX-compatible value type.

Mypy currently rejects this `client.get(..., params=solr_params)` call because the dict is inferred as `dict[str, object]` after mixing strings and integers.

🔧 Proposed typing fix

```diff
- solr_params = {
+ solr_params: dict[str, str | int] = {
      "q": query_string,
      "rows": k,
      "fl": "*,score",
      "wt": "json",
  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py` around lines 463 - 485, The mypy error comes from solr_params being inferred as dict[str, object] and HTTPX expects params values to be Any/str-like; fix by annotating solr_params as dict[str, Any] (import Any from typing) and normalize non-string values (e.g., set "rows": str(k)) before calling client.get; update the block that builds solr_params (symbol: solr_params) and the client.get call (symbol: client.get) so the params argument is accepted by the type checker.
1230-1239: ⚠️ Potential issue | 🔴 Critical

Guard optional chunk-window metadata access in `_doc_to_chunk`.

Queries without `chunk_window_config` will hit `self.chunk_window_config.chunk_token_count_field` and fall into the broad exception handler, dropping otherwise valid results. This also matches the mypy `union-attr` failures.

🐛 Proposed runtime fix

```diff
- if self.chunk_window_config.chunk_token_count_field in doc:
-     metadata[self.chunk_window_config.chunk_token_count_field] = doc[
-         self.chunk_window_config.chunk_token_count_field
-     ]
-     # add family fields to the metadata since we need to access them
-     # for comparison with other chunks later on
-     if self.chunk_window_config.chunk_family_fields:
-         for field in self.chunk_window_config.chunk_family_fields:
-             if field in doc:
-                 metadata[field] = doc[field]
+ if self.chunk_window_config is not None:
+     token_count_field = self.chunk_window_config.chunk_token_count_field
+     if token_count_field in doc:
+         metadata[token_count_field] = doc[token_count_field]
+
+     # add family fields to the metadata since we need to access them
+     # for comparison with other chunks later on
+     if self.chunk_window_config.chunk_family_fields:
+         for field in self.chunk_window_config.chunk_family_fields:
+             if field in doc:
+                 metadata[field] = doc[field]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py` around lines 1230 - 1239, In _doc_to_chunk, avoid accessing self.chunk_window_config attributes when chunk_window_config can be None: first check that self.chunk_window_config is truthy before referencing self.chunk_window_config.chunk_token_count_field or self.chunk_window_config.chunk_family_fields; only then copy doc values into metadata (use the existing metadata dict and the same field names), and keep the existing loop over chunk_family_fields but guarded by the same presence check so queries without chunk-window config don't hit the broad exception handler or trigger union-attr mypy errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/README.md`:
- Around line 219-220: Insert a blank line before each fenced Python code block
in the README so markdownlint MD031 is satisfied; specifically add an empty line
after the preceding paragraph/heading lines such as "**Simple equality
filter:**", "**Range filter:**", "**Multiple values with 'in' filter:**",
"**Compound filters (AND/OR):**", and "**Nested compound filters:**" so each
subsequent ```python fence (examples using ComparisonFilter,
adapter.query_chunks, CompoundFilter) is separated by a blank line from the
preceding text.
In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py`:
- Around line 177-195: The filter_parts list is currently untyped (filter_parts
= []) which can cause narrow mypy inference; change its declaration to an
explicit string list type (e.g., filter_parts: list[str] = [] or
typing.List[str] = []) at the top of the function that builds the Solr fq (the
block referencing filter_parts, chunk_filter_query, filters and calling
_filter_to_solr_fq) so mypy treats it as a list of strings; ensure imports are
adjusted if you use typing.List for older Python compatibility.
- Around line 54-58: The RET504 lint flag is triggered by assigning the result
of the second replace to `value` and then returning it; change the end of the
function so you keep the first replacement assignment (value =
value.replace("\\", "\\\\")) but return the second replacement directly (return
value.replace('"', '\\"')) instead of reassigning `value`; update the code in
solr.py around the two replace calls for the `value` variable accordingly.
- Around line 77-101: Validate and sanitize filter field names before
interpolating them into Solr queries and ensure all range bounds are escaped via
_format_solr_value; specifically, in places that use filter_obj.key (and in
functions like _handle_in_filter and the range-handling code for gt/gte/lt/lte),
first validate the field name against a safe pattern such as
^[A-Za-z_][A-Za-z0-9_.-]*$ and raise ValueError if it fails, and wrap every
range bound and any raw value interpolations with _format_solr_value() so
equality, inequality and range operators use consistent escaping.
In `@tests/unit/providers/remote/solr_vector_io/test_filters.py`:
- Around line 20-177: The tests are missing return type annotations and brief
docstrings; update each test function (e.g., test_eq_filter_with_string,
test_eq_filter_with_number, test_ne_filter, test_gt_filter, test_gte_filter,
test_lt_filter, test_lte_filter, test_in_filter_with_strings,
test_in_filter_with_numbers, test_in_filter_empty_list, test_nin_filter,
test_nin_filter_empty_list, test_string_escaping,
test_in_filter_with_non_list_raises_error, test_and_filter, test_or_filter,
test_nested_compound_filter, test_empty_compound_filter, test_no_filters,
test_only_static_filter, test_only_dynamic_filter,
test_combined_static_and_dynamic_filters, test_combined_with_compound_filter) to
include a -> None return annotation and a one-line docstring describing what the
test asserts; keep implementations unchanged and place the docstring as the
first statement in each function (see the sketch after this list).
- Line 8: The pylint failure in tests importing pytest means pytest is missing
from the environment used by pylint; fix by either adding pytest to the dev
extras in pyproject.toml (add "pytest" to the dev dependency group) or ensure
pylint runs with the test extras (use the runtime command that includes the test
extras, e.g., run pylint with the test extra like `uv run --extra test pylint
...`), so that the `import pytest` in test files is resolvable during static
analysis.
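For illustration, one test rewritten per the docstring-and-annotation prompt above, assuming the module's existing imports of the filter helpers; the expected `fq` string is an assumption about the serializer's output, not taken from the suite:

```python
from llama_stack.providers.utils.vector_io.filters import ComparisonFilter


def test_eq_filter_with_string() -> None:
    """An eq filter on a string value serializes to a quoted Solr clause."""
    f = ComparisonFilter(type="eq", key="platform", value="openshift")
    assert _filter_to_solr_fq(f) == 'platform:"openshift"'
```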
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b35e613b-e7bb-4182-86bf-5784c55a1e64
📒 Files selected for processing (4)

- README.md
- lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/README.md
- lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
- tests/unit/providers/remote/solr_vector_io/test_filters.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: e2e_tests
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use absolute imports for internal modules (e.g., `from authentication import get_auth_dependency`)
FastAPI dependencies: import from `fastapi` using `APIRouter, HTTPException, Request, status, Depends`
Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack integration
All modules must start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Type aliases should be defined at module level for clarity
All config classes must use Pydantic models extending `ConfigurationBase` with `extra="forbid"` to reject unknown fields
Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Configuration type hints should use `Optional[FilePath]`, `PositiveInt`, `SecretStr` and other Pydantic-specific types
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types are required, using modern syntax (`str | int` instead of `Union[str, int]`), `typing_extensions.Self` for model validators, and `Optional[Type]` for optional values
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use `async def` for I/O operations and external API calls
For API endpoints, use FastAPI `HTTPException` with appropriate status codes; handle `APIConnectionError` from Llama Stack
Use standard logging levels with clear purposes: `logger.debug()` for diagnostic info, `logger.info()` for general execution info, `logger.warning()` for unexpected events, `logger.error()` for serious problems
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for custom exceptions, `Resolver` for strategy pattern implementations, `Interface` for...
Files:
- tests/unit/providers/remote/solr_vector_io/test_filters.py
- lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use `pytest-mock` for MagicMock and AsyncMock objects in tests
Unit test fixtures should use `@pytest.fixture(name="fixture_name")` decorator with properly typed return values; auth mocks should use `MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")`
Files:
tests/unit/providers/remote/solr_vector_io/test_filters.py
🧠 Learnings (2)
📚 Learning: 2026-04-14T19:08:55.908Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-providers PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-14T19:08:55.908Z
Learning: Applies to **/*.py : Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack integration
Applied to files:
README.md
📚 Learning: 2026-04-14T19:08:55.908Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-providers PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-14T19:08:55.908Z
Learning: Applies to **/*.py : For API endpoints, use FastAPI `HTTPException` with appropriate status codes; handle `APIConnectionError` from Llama Stack
Applied to files:
README.md
🪛 GitHub Actions: Black
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
[error] 1-1: Black --check failed: file would be reformatted (exit code 1). Run 'uv run black . --write' to apply formatting.
🪛 GitHub Actions: Python linter
tests/unit/providers/remote/solr_vector_io/test_filters.py
[error] 8-8: pylint: Unable to import 'pytest' (import-error)
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
[error] 1-1: pylint: Too many lines in module (1557/1000) (too-many-lines)
🪛 GitHub Actions: Ruff
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
[error] 58-58: ruff check failed: RET504 Unnecessary assignment to value before return statement. Rule: RET504. Help: Remove unnecessary assignment.
🪛 GitHub Actions: Type checks
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
[error] 485-485: mypy: Argument "params" to "get" of "AsyncClient" has incompatible type "dict[str, object]"; expected QueryParams / Mapping[str, str|int|float|bool|None|Sequence[...]] / etc. [arg-type].
[error] 701-711: mypy: ChunkWindowConfig is Optional; accessing fields like "parent_id_field" / "parent_total_chunks_field" / "parent_total_tokens_field" / "parent_content_id_field" / ... without None-check causes union-attr errors [union-attr].
[error] 718-735: mypy: ChunkWindowConfig Optional union-attr errors when accessing parent_* fields (e.g., "parent_id_field", "parent_total_chunks_field"). [union-attr].
[error] 772-817: mypy: ChunkWindowConfig Optional union-attr errors when accessing chunk_* fields (e.g., "chunk_index_field", "chunk_token_count_field", "chunk_parent_id_field", "chunk_family_fields", "chunk_filter_query"). [union-attr].
[error] 864-864: mypy: Need type annotation for "kept_indices_by_parent" [var-annotated].
[error] 877-1010: mypy: ChunkWindowConfig Optional union-attr errors when accessing chunk_parent_id_field / chunk_index_field / parent_total_* / chunk_family_fields / chunk_token_count_field fields. [union-attr].
[error] 1044-1055: mypy: ChunkWindowConfig Optional union-attr errors when accessing parent_content_id_field / parent_content_title_field / parent_content_url_field. [union-attr].
[error] 1113-1159: mypy: ChunkWindowConfig Optional union-attr errors when accessing chunk_token_count_field / chunk_index_field / etc. [union-attr].
[error] 1230-1237: mypy: ChunkWindowConfig Optional union-attr errors when accessing chunk_token_count_field / chunk_family_fields. [union-attr].
[error] 1299-1299: mypy: Need type annotation for "cache" [var-annotated].
🪛 markdownlint-cli2 (0.22.0)
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/README.md
[warning] 220-220: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 239-239: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 256-256: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 273-273: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 294-294: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🔇 Additional comments (1)

README.md (1)

61-63: LGTM. The documented Llama Stack prerequisite now matches the version required for dynamic filter support.
476fc2a to 3a2be72
Updates the Solr vector search integration to support the new Filter API
introduced in llama-stack 0.6.0, enabling metadata-based filtering of RAG
results using comparison and compound filters.
Filter format examples:
Simple: {"filters": {"type": "eq", "key": "platform", "value": "openshift"}}
Compound: {"filters": {"type": "and", "filters": [...]}}
Note: This change requires lightspeed-providers with solr_vector_io filter support,
introduced in lightspeed-core/lightspeed-providers#119
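To make the payload formats above concrete, here is a hedged sketch of building a compound filter; the `doc_type` field and the rendered `fq` string are illustrative assumptions:

```python
simple = {"type": "eq", "key": "platform", "value": "openshift"}
compound = {
    "type": "and",
    "filters": [
        simple,
        {"type": "eq", "key": "doc_type", "value": "guide"},  # hypothetical field
    ],
}
request_body = {"filters": compound}
# With a static chunk_filter_query of "is_chunk:true", the provider would send
# something like: fq=(is_chunk:true AND (platform:"openshift" AND doc_type:"guide"))
```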
3a2be72 to 1be8a76
Adds support for dynamic metadata filtering to the Solr vector IO provider, enabling users to filter RAG query results by metadata fields at runtime.

NOTE: Dynamic filters from requests are **combined with** the existing static `chunk_filter_query` configuration (e.g., `"is_chunk:true"`) rather than replacing it. This preserves backward compatibility and maintains the internal schema filtering needed for chunk window expansion to work correctly. Both filters are joined with AND logic: `fq=(is_chunk:true AND <dynamic_filter>)`.

This approach avoids breaking changes while enabling flexible metadata filtering for use cases like filtering by platform, version, document type, etc.

- Upstream PR: ogx-ai/ogx#4471
1be8a76 to e41ffae
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py (1)
482-498: ⚠️ Potential issue | 🟡 Minor

mypy arg-type regression at `client.get(..., params=solr_params)`.

The CI `Type checks` job reports `line 503: Argument "params" to "get" ... has incompatible type "dict[str, object]"`. `solr_params` starts as a heterogeneous literal (str/int), and the new conditional `solr_params["fq"] = combined_filter` keeps mypy inferring `dict[str, object]`. Annotate the dict at the declaration (line 482) so httpx's `QueryParams` overload is satisfied.

🔧 Fix

```diff
- solr_params = {
+ solr_params: dict[str, str | int] = {
      "q": query_string,
      "rows": k,
      "fl": "*,score",
      "wt": "json",
  }
```

Same pattern applies to `data_params` in `query_hybrid` if it starts failing type checks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py` around lines 482 - 498, The mypy error is caused by solr_params being inferred as dict[str, object]; fix by adding an explicit annotation at its declaration so it matches httpx's QueryParams overload (e.g. change the declaration to something like solr_params: dict[str, str | int] = {...} and ensure combined_filter (from _build_solr_filter_query) is a str before assigning to solr_params["fq"]); apply the same pattern to data_params in query_hybrid if needed.
♻️ Duplicate comments (1)

lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py (1)

59-63: 🧹 Nitpick | 🔵 Trivial

Minor: use `fullmatch` for readability.

The pattern already anchors with `^…$`, so `re.match` works, but `_FIELD_NAME_PATTERN.fullmatch(field_name)` expresses intent more directly and is robust to future pattern edits that drop the anchors.

```diff
- if not _FIELD_NAME_PATTERN.match(field_name):
+ if not _FIELD_NAME_PATTERN.fullmatch(field_name):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py` around lines 59 - 63, Replace the use of _FIELD_NAME_PATTERN.match(field_name) with _FIELD_NAME_PATTERN.fullmatch(field_name) in the validation check so intent is explicit and robust if the regex is later changed; update the condition where the ValueError is raised (the block that checks field_name using _FIELD_NAME_PATTERN) to call fullmatch instead of match, leaving the error message unchanged.
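A minimal sketch of the validation this nitpick targets, combining the field-name pattern from the earlier review prompt with the `fullmatch` suggestion; the helper name is an assumption:

```python
import re

_FIELD_NAME_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_.-]*$")


def _validate_field_name(field_name: str) -> str:
    """Reject field names that could inject Solr query syntax."""
    if not _FIELD_NAME_PATTERN.fullmatch(field_name):
        raise ValueError(f"Invalid Solr field name: {field_name!r}")
    return field_name
```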
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py`:
- Around line 398-407: The repeated block that computes chunk_filter and
combined_filter and sets params["fq"] appears in SolrIndex.query_vector,
SolrIndex.query_keyword, and SolrIndex.query_hybrid; extract this into a new
helper method on SolrIndex (e.g., _apply_chunk_and_dynamic_filters(params,
filters)) that does: read self.chunk_window_config.chunk_filter_query (safely if
chunk_window_config is None), call _build_solr_filter_query(chunk_filter,
filters), set params["fq"] when non-empty, and log via log.debug; then replace
the duplicated 10-line block in query_vector, query_keyword, and query_hybrid
with a single call to that helper so semantics and logging remain identical.
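A hedged sketch of the helper this prompt describes, as a method on `SolrIndex`; `_build_solr_filter_query` exists in the PR, while the method name, `Filter` typing, and `log` name follow the prompt's wording:

```python
def _apply_chunk_and_dynamic_filters(
    self, params: dict[str, str | int], filters: "Filter | None"
) -> None:
    """Combine the static chunk filter with request filters and set params["fq"]."""
    chunk_filter = (
        self.chunk_window_config.chunk_filter_query
        if self.chunk_window_config is not None
        else None
    )
    combined = _build_solr_filter_query(chunk_filter, filters)
    if combined:
        params["fq"] = combined
        log.debug("Applied Solr filter query: %s", combined)
```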
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 42ec098d-8e75-4dc7-b8b3-085dd939f0c0
📒 Files selected for processing (4)

- README.md
- lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/README.md
- lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
- tests/unit/providers/remote/solr_vector_io/test_filters.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: e2e_tests
🧰 Additional context used
📓 Path-based instructions (2): same as the previous run (see above)
🧠 Learnings (4)
📚 Learning: 2026-04-14T19:08:55.908Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-providers PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-14T19:08:55.908Z
Learning: Applies to **/*.py : Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack integration
Applied to files:
README.md
📚 Learning: 2026-04-14T19:08:55.908Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-providers PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-14T19:08:55.908Z
Learning: Applies to **/*.py : For API endpoints, use FastAPI `HTTPException` with appropriate status codes; handle `APIConnectionError` from Llama Stack
Applied to files:
README.md
📚 Learning: 2026-04-14T19:08:55.908Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-providers PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-14T19:08:55.908Z
Learning: Configure pylint with `source-roots = "lightspeed_stack_providers/"`
Applied to files:
tests/unit/providers/remote/solr_vector_io/test_filters.py
📚 Learning: 2026-04-14T19:08:55.908Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-providers PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-14T19:08:55.908Z
Learning: Applies to tests/**/*.py : Unit test fixtures should use `pytest.fixture(name="fixture_name")` decorator with properly typed return values; auth mocks should use `MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")`
Applied to files:
tests/unit/providers/remote/solr_vector_io/test_filters.py
🪛 GitHub Actions: Python linter
tests/unit/providers/remote/solr_vector_io/test_filters.py
[error] 8-8: pylint (import-error): Unable to import 'pytest'
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
[error] 1-1: pylint (C0302) Too many lines in module (1558/1000)
[error] 870-870: pylint (C0415) Import outside toplevel (collections.defaultdict)
🪛 GitHub Actions: Type checks
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
[error] 503-503: mypy: Argument "params" to "get" of "AsyncClient" has incompatible type "dict[str, object]"; expected "QueryParams | Mapping[str, str | int | float | bool | None | Sequence[str | int | float | bool | None]] | list[tuple[str, str | int | float | bool | None]] | tuple[tuple[str, str | int | float | bool | None], ...] | str | bytes | None" [arg-type].
[error] 717-717: mypy: Item "None" of "ChunkWindowConfig | None" has no attribute "parent_id_field" [union-attr].
[error] 718-718: mypy: Item "None" of "ChunkWindowConfig | None" has no attribute "parent_total_chunks_field" [union-attr].
[error] 719-719: mypy: Item "None" of "ChunkWindowConfig | None" has no attribute "parent_total_tokens_field" [union-attr].
[error] 722-724: mypy: Item "None" of "ChunkWindowConfig | None" has no attribute(s) "parent_content_id_field" / "parent_content_title_field" [union-attr] (multiple lines reported).
[error] 726-727: mypy: Item "None" of "ChunkWindowConfig | None" has no attribute(s) "parent_content_url_field" / "parent_id_field" [union-attr] (multiple lines reported).
[error] 877-877: mypy: Need type annotation for "kept_indices_by_parent" [var-annotated].
[error] 1305-1305: mypy: Need type annotation for "cache" (hint: "cache: dict[<type>, <type>] = ...") [var-annotated].
🔇 Additional comments (3)

tests/unit/providers/remote/solr_vector_io/test_filters.py (1)

8-8: pytest import error in pylint job (already flagged).

Pipeline still reports `pylint: Unable to import 'pytest' (import-error)`. Per the previous review, add `pytest` to the `dev` dependency group (or run pylint via `uv run --extra test pylint …`) so static analysis can resolve the import.

README.md (1)

62-62: LGTM — version bump aligned with provider changes.

`Llama Stack == 0.6.0` matches the library version required by the new `llama_stack.providers.utils.vector_io.filters` import used in `solr.py`.

lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py (1)

17-21: Verify the `Filter` import path in llama-stack 0.6.0 using Python 3.12+.

The sandbox environment cannot verify this import path because it runs Python 3.11.2 and llama-stack 0.6.0 requires Python >=3.12. Please confirm in a Python 3.12+ environment that `llama_stack.providers.utils.vector_io.filters` exports `ComparisonFilter`, `CompoundFilter`, and `Filter`.
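A quick way to perform that check, assuming a Python 3.12+ environment with llama-stack 0.6.0 installed; the import path is the one `solr.py` uses, and whether it resolves is exactly what this comment asks to verify:

```python
from llama_stack.providers.utils.vector_io.filters import (
    ComparisonFilter,
    CompoundFilter,
    Filter,
)

print(ComparisonFilter, CompoundFilter, Filter)  # fails fast if the path moved
```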
- **Static filters preserved:** The configured `chunk_filter_query` (e.g., `"is_chunk:true"`) is always applied to maintain proper chunk/parent document separation
- **Dynamic filters added:** Request filters are combined with static filters using AND logic
- **String escaping:** Special Solr characters in filter values are automatically escaped
Doc overstates the escaping scope.
The doc claims "Special Solr characters in filter values are automatically escaped", but _escape_solr_value in solr.py only escapes backslashes and double quotes. Other Solr-reserved characters (+ - && || ! ( ) { } [ ] ^ ~ * ? : /) are not escaped and will be parsed by Solr. Since eq/ne values are wrapped in quotes most of these are inert in practice, but users should not rely on this blanket statement — consider clarifying the scope.
📝 Suggested wording

```diff
-- **String escaping:** Special Solr characters in filter values are automatically escaped
+- **String escaping:** Backslashes and double quotes in string values are escaped; values are wrapped in quotes so most Solr-reserved characters are treated literally
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/README.md`
at line 307, The README claim that "Special Solr characters in filter values are
automatically escaped" is inaccurate given _escape_solr_value in solr.py only
handles backslashes and double quotes; either update the README to state the
precise scope (only backslashes and double quotes are escaped) or expand
_escape_solr_value to also escape Solr-reserved characters (+ - && || ! ( ) { }
[ ] ^ ~ * ? : /) and update callers (e.g., eq/ne filter building logic) and
tests accordingly; reference the _escape_solr_value function in solr.py and the
README entry to ensure consistent wording or implementation and add unit tests
that demonstrate the new escaping behavior if you implement it.
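For reference, a sketch of the current escaping behavior as described in this comment and the earlier RET504 fix — only backslashes and double quotes are handled:

```python
def _escape_solr_value(value: str) -> str:
    """Escape backslashes and double quotes for use inside a quoted Solr term."""
    value = value.replace("\\", "\\\\")
    return value.replace('"', '\\"')


assert _escape_solr_value('say "hi"') == 'say \\"hi\\"'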
```python
if filter_type == "ne":
    return f"-{key}:{_format_solr_value(value)}"
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Confirm whether the existing tests cover OR-of-NE combinations.
rg -nP 'type="or"' tests/unit/providers/remote/solr_vector_io/test_filters.py -C 6
```

Repository: lightspeed-core/lightspeed-providers
Length of output: 1226

🏁 Script executed:

```bash
#!/bin/bash
# Read the solr.py file to check lines 154-155 and surrounding context
head -200 lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py | tail -60
```

Repository: lightspeed-core/lightspeed-providers
Length of output: 2043

🏁 Script executed:

```bash
#!/bin/bash
# Check for lines 176-179 and 125 in the same file
sed -n '120,130p' lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
echo "---"
sed -n '170,185p' lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
```

Repository: lightspeed-core/lightspeed-providers
Length of output: 958

🏁 Script executed:

```bash
#!/bin/bash
# Check test_filters.py for any tests with NE or NIN inside OR
rg -n 'type="ne"|type="nin"' tests/unit/providers/remote/solr_vector_io/test_filters.py -C 8
```

Repository: lightspeed-core/lightspeed-providers
Length of output: 2765
Solr pure-negation disjunction bug: ne and nin inside or produce invalid queries matching zero docs.
When ne or nin filters are composed inside an or compound filter, the generated Solr syntax becomes a pure-negative boolean query that matches nothing. For example, CompoundFilter(type="or", filters=[ComparisonFilter(type="ne", ...), ComparisonFilter(type="ne", ...)]) produces (-a:"x" OR -b:"y"), which Solr evaluates as matching no documents. This contradicts the advertised support for arbitrary combinations of comparison operators inside and/or gates.
The fix is to wrap each negation at the comparison filter level. For ne (lines 154-155), change the return value to (*:* AND -{key}:{_format_solr_value(value)}). Apply the same pattern to nin (lines 176-179 and the _handle_in_filter function at line 125). This ensures negations remain valid whether standalone, inside and, or inside or.
Update unit tests to cover or compositions with ne and nin operators to prevent regression.
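A sketch of the suggested fix and a matching regression test; `_format_solr_value`, `_filter_to_solr_fq`, and the filter classes are assumed from this module, and the exact parenthesization asserted in the test is an assumption about the compound serializer:

```python
def _serialize_ne(key: str, value: object) -> str:
    """Proposed ne clause: anchor the negation so it survives OR composition."""
    return f'(*:* AND -{key}:{_format_solr_value(value)})'


def test_or_with_ne_filters() -> None:
    """or-of-ne must not collapse into a pure-negative, empty-match query."""
    f = CompoundFilter(
        type="or",
        filters=[
            ComparisonFilter(type="ne", key="a", value="x"),
            ComparisonFilter(type="ne", key="b", value="y"),
        ],
    )
    assert _filter_to_solr_fq(f) == '((*:* AND -a:"x") OR (*:* AND -b:"y"))'
```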
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py`
around lines 154 - 155, The `ne` and `nin` comparison branches are emitting
pure-negative clauses that break when combined in an OR; update the Solr clause
generation so each negation is wrapped with a match-all prefix. Specifically, in
the comparison handling where filter_type == "ne" change the returned clause to
wrap the negative term as "(*:* AND -{key}:{_format_solr_value(value)})", apply
the same wrapping to the `nin` branch(s) in the comparison handler, and also
modify the `_handle_in_filter` function to emit each negated IN as "(*:* AND
-{key}:{val})" rather than a pure negative term; finally add unit tests that
compose `or` with `ne` and `nin` filters to ensure these cases no longer produce
empty-match queries.
```python
# pylint: disable=line-too-long
from lightspeed_stack_providers.providers.remote.solr_vector_io.solr_vector_io.src.solr_vector_io.solr import (
    _build_solr_filter_query,
    _filter_to_solr_fq,
)

# pylint: enable=line-too-long
```
🧹 Nitpick | 🔵 Trivial
Long import path is a structural smell.
The six-segment path lightspeed_stack_providers.providers.remote.solr_vector_io.solr_vector_io.src.solr_vector_io.solr (with solr_vector_io appearing three times and a src/ segment in the dotted import) suggests the package layout mixes a src-layout with a non-src parent. It forces the pylint: disable=line-too-long guards here. Consider a follow-up to flatten the layout or expose a top-level import shim; not blocking for this PR.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/providers/remote/solr_vector_io/test_filters.py` around lines 11 -
17, The test imports use an overly deep src-layout path for
_build_solr_filter_query and _filter_to_solr_fq; create a short, stable import
surface (e.g., add a package-level shim/__init__ that re-exports
_build_solr_filter_query and _filter_to_solr_fq from the internal solr module or
flatten the package layout) and update the test to import those symbols from the
new top-level entrypoint, then remove the pylint: disable=line-too-long guard in
the test.
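A sketch of the re-export shim the prompt suggests — a hypothetical package-level `__init__.py` providing a short, stable import surface:

```python
"""Public re-exports for the Solr vector IO filter helpers (hypothetical shim)."""

from lightspeed_stack_providers.providers.remote.solr_vector_io.solr_vector_io.src.solr_vector_io.solr import (
    _build_solr_filter_query,
    _filter_to_solr_fq,
)

__all__ = ["_build_solr_filter_query", "_filter_to_solr_fq"]
```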
Description

Adds support for dynamic metadata filtering to the Solr vector IO provider, enabling users to filter RAG query results by metadata fields at runtime.

NOTE: Dynamic filters from requests are combined with the existing static `chunk_filter_query` configuration (e.g., `"is_chunk:true"`) rather than replacing it. This preserves backward compatibility and maintains the internal schema filtering needed for chunk window expansion to work correctly.

Both filters are joined with AND logic: `fq=(is_chunk:true AND <dynamic_filter>)`

This approach avoids breaking changes while enabling flexible metadata filtering for use cases like filtering by platform, version, document type, etc.
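A hedged usage sketch of the feature at request level; the `query_chunks` call shape is an assumption based on the provider's README examples referenced in the review, while the payload format matches the commit message above:

```python
filters = {
    "type": "and",
    "filters": [
        {"type": "eq", "key": "platform", "value": "openshift"},
        {"type": "gte", "key": "version", "value": 4},  # hypothetical field
    ],
}
# e.g.: response = await adapter.query_chunks(vector_store_id, query, params={"filters": filters})
```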
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Documentation
Tests
Chores