LCORE-1446: Add dynamic filter support to Solr vector IO provider#119

Open
anik120 wants to merge 1 commit into lightspeed-core:main from anik120:vector-io-filters

Conversation

@anik120 anik120 commented Apr 23, 2026

Description

Adds support for dynamic metadata filtering to the Solr vector IO provider, enabling users to filter RAG query results by metadata fields at runtime.

NOTE: Dynamic filters from requests are combined with the existing static chunk_filter_query configuration (e.g., "is_chunk:true") rather than replacing it. This preserves backward compatibility and maintains the internal schema filtering needed for chunk window expansion to work correctly.

Both filters are joined with AND logic: fq=(is_chunk:true AND &lt;dynamic filter&gt;)

This approach avoids breaking changes while enabling flexible metadata filtering for use cases like filtering by platform, version, document type, etc.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Added 23 new unit tests covering all filter types and combinations
  • All 37 Solr tests passing (14 existing + 23 new)
  • Code quality: Black formatted, pylint 9.47/10, mypy clean

Summary by CodeRabbit

  • New Features

    • Added dynamic filtering for search results using structured filters with operators: eq, ne, in, nin and compound and/or logic.
  • Documentation

    • Added documentation describing filter syntax, operator support, escaping behavior, and how dynamic filters combine with static filters.
  • Tests

    • Added unit tests validating filter serialization, combinations, and error cases.
  • Chores

    • Updated Llama Stack prerequisite to 0.6.0.

coderabbitai Bot commented Apr 23, 2026

Warning

Rate limit exceeded

@anik120 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 54 minutes and 39 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 54 minutes and 39 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a1f6ffac-fea2-4d26-a45f-82a7b4b3b2be

📥 Commits

Reviewing files that changed from the base of the PR and between 1be8a76 and e41ffae.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • README.md
  • lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/README.md
  • lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
  • pyproject.toml
  • tests/unit/providers/remote/solr_vector_io/test_filters.py

Walkthrough

Adds dynamic filter support to the Solr provider: new serialization of llama-stack Filter objects into Solr fq strings, README dependency bump to Llama Stack 0.6.0, documentation for filter syntax, and unit tests covering serialization and combinations with static chunk filters.

Changes

  • Dependency Update (README.md): Bumps the required Llama Stack version from 0.4.3 to 0.6.0.
  • Documentation (lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/README.md): Adds docs for dynamic filters passed to adapter.query_chunks(), lists the supported operators (eq, ne, in, nin) and compound and/or, states that range operators are not supported, describes escaping, combination with the configured chunk_filter_query, and applicability to vector/keyword/hybrid searches.
  • Filter Implementation (lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py): Implements helpers that serialize ComparisonFilter/CompoundFilter into Solr fq strings (supporting eq, ne, in, nin, and recursive and/or), validates field names, escapes values, updates the query_vector(), query_keyword(), and query_hybrid() signatures to accept filters: Optional[Filter] = None, and combines static chunk filters with dynamic filters only when non-empty (logging moved to debug).
  • Filter Tests (tests/unit/providers/remote/solr_vector_io/test_filters.py): Adds comprehensive tests for _filter_to_solr_fq and _build_solr_filter_query: comparison and membership operators, string escaping, empty-list behavior, invalid inputs, nested compound filters, and static+dynamic combination logic.
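The recursive serialization summarized for the Filter Implementation cohort can be sketched as follows. Plain dicts stand in for llama-stack's ComparisonFilter/CompoundFilter objects, and the exact Solr clause syntax chosen here is an assumption rather than the provider's verbatim output.

```python
def filter_to_fq(f: dict) -> str:
    """Recursively serialize a filter tree into a Solr fq string (sketch)."""
    ftype = f["type"]
    if ftype in ("and", "or"):
        # Compound filters recurse into their children and join with AND/OR.
        joiner = f" {ftype.upper()} "
        return "(" + joiner.join(filter_to_fq(sub) for sub in f["filters"]) + ")"
    key, value = f["key"], f["value"]
    if ftype == "eq":
        return f'{key}:"{value}"'
    if ftype == "ne":
        # Solr expresses negation with a leading minus.
        return f'-{key}:"{value}"'
    if ftype in ("in", "nin"):
        clause = "(" + " OR ".join(f'{key}:"{v}"' for v in value) + ")"
        return clause if ftype == "in" else f"-{clause}"
    raise ValueError(f"unsupported filter type: {ftype}")
```

A nested compound such as and(eq(platform, openshift), in(version, [4.14, 4.15])) serializes to `(platform:"openshift" AND (version:"4.14" OR version:"4.15"))`.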

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant SolrIndex
  participant SolrServer

  Client->>SolrIndex: call query_vector(..., filters=Filter)
  SolrIndex->>SolrIndex: serialize Filter -> fq (validate/escape)
  SolrIndex->>SolrIndex: combine fq with configured chunk_filter_query (AND) if present
  SolrIndex->>SolrServer: HTTP request with params (q, fq, rows, ... )
  SolrServer-->>SolrIndex: search results
  SolrIndex-->>Client: return chunks/results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately captures the primary change: adding dynamic filter support to the Solr vector IO provider. It is specific, concise, and directly reflects the main objective of the changeset.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py (5)

1296-1300: ⚠️ Potential issue | 🔴 Critical

Annotate the vector-store cache.

The mypy job reports Need type annotation for "cache" on this new empty dict assignment.

🔧 Proposed typing fix
-        self.cache = {}
+        self.cache: dict[str, VectorStoreWithIndex] = {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py`
around lines 1296 - 1300, Annotate the untyped cache on the class by giving
self.cache an explicit type (for example use typing.Dict[str, typing.Any] or a
more specific type if known) instead of assigning a bare {} in the constructor;
add the necessary import from typing (Dict, Any or the precise types) and
replace the line setting self.cache so mypy knows its type (refer to the
attribute name self.cache in the class __init__).

860-865: ⚠️ Potential issue | 🔴 Critical

Annotate the defaultdict used for kept chunk indices.

The mypy job reports Need type annotation for "kept_indices_by_parent" here.

🔧 Proposed typing fix
-        kept_indices_by_parent = defaultdict(list)
+        kept_indices_by_parent: defaultdict[Any, list[int]] = defaultdict(list)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py`
around lines 860 - 865, The variable kept_indices_by_parent is untyped for mypy;
annotate it as a DefaultDict mapping parent IDs to lists of chunk indices.
Replace the current assignment with a typed declaration like
kept_indices_by_parent: DefaultDict[str, List[int]] = defaultdict(list) (or use
DefaultDict[Any, List[int]] if parent keys aren’t strings) and add the necessary
typing imports (DefaultDict, List) while keeping the existing
collections.defaultdict import; update the declaration near the other variables
(expanded_chunks, expanded_scores) and run mypy to verify.

697-800: ⚠️ Potential issue | 🔴 Critical

Narrow chunk_window_config before dereferencing it.

These helpers assume chunk-windowing is enabled, but self.chunk_window_config is typed as optional, which causes the reported union-attr mypy failures. Add an assert schema is not None after assigning schema in each chunk-window-only helper.

🔧 Proposed mypy fix pattern
         schema = self.chunk_window_config
+        assert schema is not None

Also applies to: 859-864, 1103-1104

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py`
around lines 697 - 800, The code dereferences self.chunk_window_config (assigned
to variable schema) in chunk-window-only helpers like the parent-metadata
fetcher and _fetch_context_chunks even though chunk_window_config is Optional;
add an assertion directly after "schema = self.chunk_window_config" in each of
these helpers: assert schema is not None, to narrow the type for mypy and avoid
union-attr errors (do the same in the other chunk-window-only helper methods
where schema is used).

463-485: ⚠️ Potential issue | 🔴 Critical

Annotate solr_params with an HTTPX-compatible value type.

Mypy currently rejects this client.get(..., params=solr_params) call because the dict is inferred as dict[str, object] after mixing strings and integers.

🔧 Proposed typing fix
-            solr_params = {
+            solr_params: dict[str, str | int] = {
                 "q": query_string,
                 "rows": k,
                 "fl": "*,score",
                 "wt": "json",
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py`
around lines 463 - 485, The mypy error comes from solr_params being inferred as
dict[str, object] and HTTPX expects params values to be Any/str-like; fix by
annotating solr_params as dict[str, Any] (import Any from typing) and normalize
non-string values (e.g., set "rows": str(k)) before calling client.get; update
the block that builds solr_params (symbol: solr_params) and the client.get call
(symbol: client.get) so the params argument is accepted by the type checker.

1230-1239: ⚠️ Potential issue | 🔴 Critical

Guard optional chunk-window metadata access in _doc_to_chunk.

Queries without chunk_window_config will hit self.chunk_window_config.chunk_token_count_field and fall into the broad exception handler, dropping otherwise valid results. This also matches the mypy union-attr failures.

🐛 Proposed runtime fix
-            if self.chunk_window_config.chunk_token_count_field in doc:
-                metadata[self.chunk_window_config.chunk_token_count_field] = doc[
-                    self.chunk_window_config.chunk_token_count_field
-                ]
-            # add family fields to the metadata since we need to access them
-            # for comparison with other chunks later on
-            if self.chunk_window_config.chunk_family_fields:
-                for field in self.chunk_window_config.chunk_family_fields:
-                    if field in doc:
-                        metadata[field] = doc[field]
+            if self.chunk_window_config is not None:
+                token_count_field = self.chunk_window_config.chunk_token_count_field
+                if token_count_field in doc:
+                    metadata[token_count_field] = doc[token_count_field]
+
+                # add family fields to the metadata since we need to access them
+                # for comparison with other chunks later on
+                if self.chunk_window_config.chunk_family_fields:
+                    for field in self.chunk_window_config.chunk_family_fields:
+                        if field in doc:
+                            metadata[field] = doc[field]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py`
around lines 1230 - 1239, In _doc_to_chunk, avoid accessing
self.chunk_window_config attributes when chunk_window_config can be None: first
check that self.chunk_window_config is truthy before referencing
self.chunk_window_config.chunk_token_count_field or
self.chunk_window_config.chunk_family_fields; only then copy doc values into
metadata (use the existing metadata dict and the same field names), and keep the
existing loop over chunk_family_fields but guarded by the same presence check so
queries without chunk-window config don't hit the broad exception handler or
trigger union-attr mypy errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/README.md`:
- Around line 219-220: Insert a blank line before each fenced Python code block
in the README so markdownlint MD031 is satisfied; specifically add an empty line
after the preceding paragraph/heading lines such as "**Simple equality
filter:**", "**Range filter:**", "**Multiple values with 'in' filter:**",
"**Compound filters (AND/OR):**", and "**Nested compound filters:**" so each
subsequent ```python fence (examples using ComparisonFilter,
adapter.query_chunks, CompoundFilter) is separated by a blank line from the
preceding text.

In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py`:
- Around line 177-195: The filter_parts list is currently untyped (filter_parts
= []) which can cause narrow mypy inference; change its declaration to an
explicit string list type (e.g., filter_parts: list[str] = [] or
typing.List[str] = []) at the top of the function that builds the Solr fq (the
block referencing filter_parts, chunk_filter_query, filters and calling
_filter_to_solr_fq) so mypy treats it as a list of strings; ensure imports are
adjusted if you use typing.List for older Python compatibility.
- Around line 54-58: The RET504 lint flag is triggered by assigning the result
of the second replace to `value` and then returning it; change the end of the
function so you keep the first replacement assignment (value =
value.replace("\\", "\\\\")) but return the second replacement directly (return
value.replace('"', '\\"')) instead of reassigning `value`; update the code in
solr.py around the two replace calls for the `value` variable accordingly.
- Around line 77-101: Validate and sanitize filter field names before
interpolating them into Solr queries and ensure all range bounds are escaped via
_format_solr_value; specifically, in places that use filter_obj.key (and in
functions like _handle_in_filter and the range-handling code for gt/gte/lt/lte),
first validate the field name against a safe pattern such as
^[A-Za-z_][A-Za-z0-9_.-]*$ and raise ValueError if it fails, and wrap every
range bound and any raw value interpolations with _format_solr_value() so
equality, inequality and range operators use consistent escaping.

In `@tests/unit/providers/remote/solr_vector_io/test_filters.py`:
- Around line 20-177: The tests are missing return type annotations and brief
docstrings; update each test function (e.g., test_eq_filter_with_string,
test_eq_filter_with_number, test_ne_filter, test_gt_filter, test_gte_filter,
test_lt_filter, test_lte_filter, test_in_filter_with_strings,
test_in_filter_with_numbers, test_in_filter_empty_list, test_nin_filter,
test_nin_filter_empty_list, test_string_escaping,
test_in_filter_with_non_list_raises_error, test_and_filter, test_or_filter,
test_nested_compound_filter, test_empty_compound_filter, test_no_filters,
test_only_static_filter, test_only_dynamic_filter,
test_combined_static_and_dynamic_filters, test_combined_with_compound_filter) to
include a -> None return annotation and a one-line docstring describing what the
test asserts; keep implementations unchanged and place the docstring as the
first statement in each function.
- Line 8: The pylint failure in tests importing pytest means pytest is missing
from the environment used by pylint; fix by either adding pytest to the dev
extras in pyproject.toml (add "pytest" to the dev dependency group) or ensure
pylint runs with the test extras (use the runtime command that includes the test
extras, e.g., run pylint with the test extra like `uv run --extra test pylint
...`), so that the `import pytest` in test files is resolvable during static
analysis.
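Taken together, the field-name validation and value-escaping fixes suggested in the comments above might look like this sketch. The function names are hypothetical (the provider's real escaping helper is _format_solr_value); the validation pattern and the escape order come directly from the review comments.

```python
import re

# Safe-field pattern from the review comment: letters/underscore start,
# then letters, digits, underscore, dot, or hyphen.
_FIELD_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.-]*$")


def validate_field_name(name: str) -> str:
    """Reject field names that could smuggle Solr query syntax into fq."""
    if not _FIELD_NAME_RE.match(name):
        raise ValueError(f"invalid Solr field name: {name!r}")
    return name


def format_solr_value(value: object) -> str:
    """Quote string values, escaping backslashes first, then double quotes.

    Returning the second replacement directly (instead of reassigning the
    variable) also avoids the RET504 unnecessary-assignment lint noted above.
    """
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\")
        return '"' + escaped.replace('"', '\\"') + '"'
    return str(value)
```

With these in place, equality, inequality, and range clauses can all be built as `f"{validate_field_name(key)}:{format_solr_value(value)}"`, so every interpolation path shares the same validation and escaping.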

---

Outside diff comments:
In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py`:
- Around line 1296-1300: Annotate the untyped cache on the class by giving
self.cache an explicit type (for example use typing.Dict[str, typing.Any] or a
more specific type if known) instead of assigning a bare {} in the constructor;
add the necessary import from typing (Dict, Any or the precise types) and
replace the line setting self.cache so mypy knows its type (refer to the
attribute name self.cache in the class __init__).
- Around line 860-865: The variable kept_indices_by_parent is untyped for mypy;
annotate it as a DefaultDict mapping parent IDs to lists of chunk indices.
Replace the current assignment with a typed declaration like
kept_indices_by_parent: DefaultDict[str, List[int]] = defaultdict(list) (or use
DefaultDict[Any, List[int]] if parent keys aren’t strings) and add the necessary
typing imports (DefaultDict, List) while keeping the existing
collections.defaultdict import; update the declaration near the other variables
(expanded_chunks, expanded_scores) and run mypy to verify.
- Around line 697-800: The code dereferences self.chunk_window_config (assigned
to variable schema) in chunk-window-only helpers like the parent-metadata
fetcher and _fetch_context_chunks even though chunk_window_config is Optional;
add an assertion directly after "schema = self.chunk_window_config" in each of
these helpers: assert schema is not None, to narrow the type for mypy and avoid
union-attr errors (do the same in the other chunk-window-only helper methods
where schema is used).
- Around line 463-485: The mypy error comes from solr_params being inferred as
dict[str, object] and HTTPX expects params values to be Any/str-like; fix by
annotating solr_params as dict[str, Any] (import Any from typing) and normalize
non-string values (e.g., set "rows": str(k)) before calling client.get; update
the block that builds solr_params (symbol: solr_params) and the client.get call
(symbol: client.get) so the params argument is accepted by the type checker.
- Around line 1230-1239: In _doc_to_chunk, avoid accessing
self.chunk_window_config attributes when chunk_window_config can be None: first
check that self.chunk_window_config is truthy before referencing
self.chunk_window_config.chunk_token_count_field or
self.chunk_window_config.chunk_family_fields; only then copy doc values into
metadata (use the existing metadata dict and the same field names), and keep the
existing loop over chunk_family_fields but guarded by the same presence check so
queries without chunk-window config don't hit the broad exception handler or
trigger union-attr mypy errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b35e613b-e7bb-4182-86bf-5784c55a1e64

📥 Commits

Reviewing files that changed from the base of the PR and between 8cd1b3d and 2182798.

📒 Files selected for processing (4)
  • README.md
  • lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/README.md
  • lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
  • tests/unit/providers/remote/solr_vector_io/test_filters.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: e2e_tests
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
FastAPI dependencies: import from fastapi using APIRouter, HTTPException, Request, status, Depends
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack integration
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases should be defined at module level for clarity
All config classes must use Pydantic models extending ConfigurationBase with extra="forbid" to reject unknown fields
Use @field_validator and @model_validator for custom validation in Pydantic models
Configuration type hints should use Optional[FilePath], PositiveInt, SecretStr and other Pydantic-specific types
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types are required, using modern syntax (str | int instead of Union[str, int]), typing_extensions.Self for model validators, and Optional[Type] for optional values
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
For API endpoints, use FastAPI HTTPException with appropriate status codes; handle APIConnectionError from Llama Stack
Use standard logging levels with clear purposes: logger.debug() for diagnostic info, logger.info() for general execution info, logger.warning() for unexpected events, logger.error() for serious problems
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for custom exceptions, Resolver for strategy pattern implementations, Interface for...

Files:

  • tests/unit/providers/remote/solr_vector_io/test_filters.py
  • lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest-mock for MagicMock and AsyncMock objects in tests
Unit test fixtures should use @pytest.fixture(name="fixture_name") decorator with properly typed return values; auth mocks should use MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")

Files:

  • tests/unit/providers/remote/solr_vector_io/test_filters.py
🧠 Learnings (2)
📚 Learning: 2026-04-14T19:08:55.908Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-providers PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-14T19:08:55.908Z
Learning: Applies to **/*.py : Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack integration

Applied to files:

  • README.md
📚 Learning: 2026-04-14T19:08:55.908Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-providers PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-14T19:08:55.908Z
Learning: Applies to **/*.py : For API endpoints, use FastAPI `HTTPException` with appropriate status codes; handle `APIConnectionError` from Llama Stack

Applied to files:

  • README.md
🪛 GitHub Actions: Black
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py

[error] 1-1: Black --check failed: file would be reformatted (exit code 1). Run 'uv run black . --write' to apply formatting.

🪛 GitHub Actions: Python linter
tests/unit/providers/remote/solr_vector_io/test_filters.py

[error] 8-8: pylint: Unable to import 'pytest' (import-error)

lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py

[error] 1-1: pylint: Too many lines in module (1557/1000) (too-many-lines)

🪛 GitHub Actions: Ruff
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py

[error] 58-58: ruff check failed: RET504 Unnecessary assignment to value before return statement. Rule: RET504. Help: Remove unnecessary assignment.

🪛 GitHub Actions: Type checks
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py

[error] 485-485: mypy: Argument "params" to "get" of "AsyncClient" has incompatible type "dict[str, object]"; expected QueryParams / Mapping[str, str|int|float|bool|None|Sequence[...]] / etc. [arg-type].


[error] 701-711: mypy: ChunkWindowConfig is Optional; accessing fields like "parent_id_field" / "parent_total_chunks_field" / "parent_total_tokens_field" / "parent_content_id_field" / ... without None-check causes union-attr errors [union-attr].


[error] 718-735: mypy: ChunkWindowConfig Optional union-attr errors when accessing parent_* fields (e.g., "parent_id_field", "parent_total_chunks_field"). [union-attr].


[error] 772-817: mypy: ChunkWindowConfig Optional union-attr errors when accessing chunk_* fields (e.g., "chunk_index_field", "chunk_token_count_field", "chunk_parent_id_field", "chunk_family_fields", "chunk_filter_query"). [union-attr].


[error] 864-864: mypy: Need type annotation for "kept_indices_by_parent" [var-annotated].


[error] 877-1010: mypy: ChunkWindowConfig Optional union-attr errors when accessing chunk_parent_id_field / chunk_index_field / parent_total_* / chunk_family_fields / chunk_token_count_field fields. [union-attr].


[error] 1044-1055: mypy: ChunkWindowConfig Optional union-attr errors when accessing parent_content_id_field / parent_content_title_field / parent_content_url_field. [union-attr].


[error] 1113-1159: mypy: ChunkWindowConfig Optional union-attr errors when accessing chunk_token_count_field / chunk_index_field / etc. [union-attr].


[error] 1230-1237: mypy: ChunkWindowConfig Optional union-attr errors when accessing chunk_token_count_field / chunk_family_fields. [union-attr].


[error] 1299-1299: mypy: Need type annotation for "cache" [var-annotated].

🪛 markdownlint-cli2 (0.22.0)
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/README.md

[warning] 220-220: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 239-239: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 256-256: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 273-273: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 294-294: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🔇 Additional comments (1)
README.md (1)

61-63: LGTM.

The documented Llama Stack prerequisite now matches the version required for dynamic filter support.

@anik120 anik120 force-pushed the vector-io-filters branch 3 times, most recently from 476fc2a to 3a2be72 on April 23, 2026 at 14:35
anik120 added a commit to anik120/lightspeed-stack that referenced this pull request Apr 23, 2026
Updates the Solr vector search integration to support the new Filter API
introduced in llama-stack 0.6.0, enabling metadata-based filtering of RAG
results using comparison and compound filters.

Filter format examples:
    Simple: {"filters": {"type": "eq", "key": "platform", "value": "openshift"}}
    Compound: {"filters": {"type": "and", "filters": [...]}}

Note: This change requires lightspeed-providers with solr_vector_io filter support,
introduced in lightspeed-core/lightspeed-providers#119
@anik120 anik120 force-pushed the vector-io-filters branch from 3a2be72 to 1be8a76 on April 23, 2026 at 18:29
Adds support for dynamic metadata filtering to the Solr vector IO provider, enabling users to filter
RAG query results by metadata fields at runtime.

NOTE: Dynamic filters from requests are **combined with** the existing static `chunk_filter_query`
configuration (e.g., `"is_chunk:true"`) rather than replacing it. This preserves backward compatibility
and maintains the internal schema filtering needed for chunk window expansion to work correctly.

Both filters are joined with AND logic: fq=(is_chunk:true AND &lt;dynamic filter&gt;)

This approach avoids breaking changes while enabling flexible metadata filtering for use cases like
filtering by platform, version, document type, etc.

- Upstream PR: ogx-ai/ogx#4471
@anik120 force-pushed the vector-io-filters branch from 1be8a76 to e41ffae on April 23, 2026 at 18:31

@coderabbitai Bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py (1)

482-498: ⚠️ Potential issue | 🟡 Minor

mypy arg-type regression at client.get(..., params=solr_params).

The CI Type checks job reports line 503: Argument "params" to "get" ... has incompatible type "dict[str, object]". solr_params starts as a heterogeneous literal (str/int), and the new conditional solr_params["fq"] = combined_filter keeps mypy inferring dict[str, object]. Annotate the dict at the declaration (line 482) so httpx's QueryParams overload is satisfied.

🔧 Fix
-            solr_params = {
+            solr_params: dict[str, str | int] = {
                 "q": query_string,
                 "rows": k,
                 "fl": "*,score",
                 "wt": "json",
             }

Same pattern applies to data_params in query_hybrid if it starts failing type checks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py`
around lines 482 - 498, The mypy error is caused by solr_params being inferred
as dict[str, object]; fix by adding an explicit annotation at its declaration so
it matches httpx's QueryParams overload (e.g. change the declaration to
something like solr_params: dict[str, str | int] = {...} and ensure
combined_filter (from _build_solr_filter_query) is a str before assigning to
solr_params["fq"]); apply the same pattern to data_params in query_hybrid if
needed.
♻️ Duplicate comments (1)
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py (1)

59-63: 🧹 Nitpick | 🔵 Trivial

Minor: use fullmatch for readability.

The pattern already anchors with ^…$, so re.match works, but _FIELD_NAME_PATTERN.fullmatch(field_name) expresses intent more directly and is robust to future pattern edits that drop the anchors.

-    if not _FIELD_NAME_PATTERN.match(field_name):
+    if not _FIELD_NAME_PATTERN.fullmatch(field_name):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py`
around lines 59 - 63, Replace the use of _FIELD_NAME_PATTERN.match(field_name)
with _FIELD_NAME_PATTERN.fullmatch(field_name) in the validation check so intent
is explicit and robust if the regex is later changed; update the condition where
the ValueError is raised (the block that checks field_name using
_FIELD_NAME_PATTERN) to call fullmatch instead of match, leaving the error
message unchanged.
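The match-vs-fullmatch distinction the nitpick is guarding against is easy to demonstrate. The patterns below are stand-ins; the real `_FIELD_NAME_PATTERN` in solr.py is not reproduced in this review excerpt.

```python
import re

# With anchors, match and fullmatch agree.
ANCHORED = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
# If a future edit drops the anchors, match silently accepts a prefix
# match, while fullmatch keeps rejecting the invalid field name.
UNANCHORED = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

print(bool(ANCHORED.match("bad-field")))        # False
print(bool(UNANCHORED.match("bad-field")))      # True (matches the "bad" prefix)
print(bool(UNANCHORED.fullmatch("bad-field")))  # False
```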
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/README.md`:
- Line 307: The README claim that "Special Solr characters in filter values are
automatically escaped" is inaccurate given _escape_solr_value in solr.py only
handles backslashes and double quotes; either update the README to state the
precise scope (only backslashes and double quotes are escaped) or expand
_escape_solr_value to also escape Solr-reserved characters (+ - && || ! ( ) { }
[ ] ^ ~ * ? : /) and update callers (e.g., eq/ne filter building logic) and
tests accordingly; reference the _escape_solr_value function in solr.py and the
README entry to ensure consistent wording or implementation and add unit tests
that demonstrate the new escaping behavior if you implement it.

In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py`:
- Around line 398-407: The repeated block that computes chunk_filter and
combined_filter and sets params["fq"] appears in SolrIndex.query_vector,
SolrIndex.query_keyword, and SolrIndex.query_hybrid; extract this into a new
helper method on SolrIndex (e.g., _apply_chunk_and_dynamic_filters(params,
filters)) that does: read self.chunk_window_config.chunk_filter_query (safely if
chunk_window_config is None), call _build_solr_filter_query(chunk_filter,
filters), set params["fq"] when non-empty, and log via log.debug; then replace
the duplicated 10-line block in query_vector, query_keyword, and query_hybrid
with a single call to that helper so semantics and logging remain identical.
- Around line 154-155: The `ne` and `nin` comparison branches are emitting
pure-negative clauses that break when combined in an OR; update the Solr clause
generation so each negation is wrapped with a match-all prefix. Specifically, in
the comparison handling where filter_type == "ne" change the returned clause to
wrap the negative term as "(*:* AND -{key}:{_format_solr_value(value)})", apply
the same wrapping to the `nin` branch(s) in the comparison handler, and also
modify the `_handle_in_filter` function to emit each negated IN as "(*:* AND
-{key}:{val})" rather than a pure negative term; finally add unit tests that
compose `or` with `ne` and `nin` filters to ensure these cases no longer produce
empty-match queries.

In `@tests/unit/providers/remote/solr_vector_io/test_filters.py`:
- Around line 11-17: The test imports use an overly deep src-layout path for
_build_solr_filter_query and _filter_to_solr_fq; create a short, stable import
surface (e.g., add a package-level shim/__init__ that re-exports
_build_solr_filter_query and _filter_to_solr_fq from the internal solr module or
flatten the package layout) and update the test to import those symbols from the
new top-level entrypoint, then remove the pylint: disable=line-too-long guard in
the test.
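The helper extraction suggested for the repeated fq-building block could look roughly like this. The names `chunk_filter_query` and the filter-building function come from the review text; `build_fq` stands in for `_build_solr_filter_query`, which is not reproduced here.

```python
import logging

log = logging.getLogger(__name__)


def apply_chunk_and_dynamic_filters(params: dict, chunk_window_config, filters, build_fq) -> dict:
    """Combine the static chunk filter with dynamic filters and set params["fq"]."""
    chunk_filter = (
        chunk_window_config.chunk_filter_query if chunk_window_config else None
    )
    combined = build_fq(chunk_filter, filters)
    if combined:
        params["fq"] = combined
        log.debug("Applied Solr filter query: %s", combined)
    return params
```

Each of query_vector, query_keyword, and query_hybrid would then call this once instead of repeating the ten-line block.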

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 42ec098d-8e75-4dc7-b8b3-085dd939f0c0

📥 Commits

Reviewing files that changed from the base of the PR and between 2182798 and 1be8a76.

📒 Files selected for processing (4)
  • README.md
  • lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/README.md
  • lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
  • tests/unit/providers/remote/solr_vector_io/test_filters.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: e2e_tests
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
FastAPI dependencies: import from fastapi using APIRouter, HTTPException, Request, status, Depends
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack integration
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases should be defined at module level for clarity
All config classes must use Pydantic models extending ConfigurationBase with extra="forbid" to reject unknown fields
Use @field_validator and @model_validator for custom validation in Pydantic models
Configuration type hints should use Optional[FilePath], PositiveInt, SecretStr and other Pydantic-specific types
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types are required, using modern syntax (str | int instead of Union[str, int]), typing_extensions.Self for model validators, and Optional[Type] for optional values
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
For API endpoints, use FastAPI HTTPException with appropriate status codes; handle APIConnectionError from Llama Stack
Use standard logging levels with clear purposes: logger.debug() for diagnostic info, logger.info() for general execution info, logger.warning() for unexpected events, logger.error() for serious problems
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for custom exceptions, Resolver for strategy pattern implementations, Interface for...

Files:

  • tests/unit/providers/remote/solr_vector_io/test_filters.py
  • lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest-mock for MagicMock and AsyncMock objects in tests
Unit test fixtures should use @pytest.fixture(name="fixture_name") decorator with properly typed return values; auth mocks should use MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")

Files:

  • tests/unit/providers/remote/solr_vector_io/test_filters.py
🧠 Learnings (4)
📚 Learning: 2026-04-14T19:08:55.908Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-providers PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-14T19:08:55.908Z
Learning: Applies to **/*.py : Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack integration

Applied to files:

  • README.md
📚 Learning: 2026-04-14T19:08:55.908Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-providers PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-14T19:08:55.908Z
Learning: Applies to **/*.py : For API endpoints, use FastAPI `HTTPException` with appropriate status codes; handle `APIConnectionError` from Llama Stack

Applied to files:

  • README.md
📚 Learning: 2026-04-14T19:08:55.908Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-providers PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-14T19:08:55.908Z
Learning: Configure pylint with `source-roots = "lightspeed_stack_providers/"`

Applied to files:

  • tests/unit/providers/remote/solr_vector_io/test_filters.py
📚 Learning: 2026-04-14T19:08:55.908Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-providers PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-14T19:08:55.908Z
Learning: Applies to tests/**/*.py : Unit test fixtures should use `pytest.fixture(name="fixture_name")` decorator with properly typed return values; auth mocks should use `MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")`

Applied to files:

  • tests/unit/providers/remote/solr_vector_io/test_filters.py
🪛 GitHub Actions: Python linter
tests/unit/providers/remote/solr_vector_io/test_filters.py

[error] 8-8: pylint (import-error): Unable to import 'pytest'

lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py

[error] 1-1: pylint (C0302) Too many lines in module (1558/1000)


[error] 870-870: pylint (C0415) Import outside toplevel (collections.defaultdict)

🪛 GitHub Actions: Type checks
lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py

[error] 503-503: mypy: Argument "params" to "get" of "AsyncClient" has incompatible type "dict[str, object]"; expected "QueryParams | Mapping[str, str | int | float | bool | None | Sequence[str | int | float | bool | None]] | list[tuple[str, str | int | float | bool | None]] | tuple[tuple[str, str | int | float | bool | None], ...] | str | bytes | None" [arg-type].


[error] 717-717: mypy: Item "None" of "ChunkWindowConfig | None" has no attribute "parent_id_field" [union-attr].


[error] 718-718: mypy: Item "None" of "ChunkWindowConfig | None" has no attribute "parent_total_chunks_field" [union-attr].


[error] 719-719: mypy: Item "None" of "ChunkWindowConfig | None" has no attribute "parent_total_tokens_field" [union-attr].


[error] 722-724: mypy: Item "None" of "ChunkWindowConfig | None" has no attribute(s) "parent_content_id_field" / "parent_content_title_field" [union-attr] (multiple lines reported).


[error] 726-727: mypy: Item "None" of "ChunkWindowConfig | None" has no attribute(s) "parent_content_url_field" / "parent_id_field" [union-attr] (multiple lines reported).


[error] 877-877: mypy: Need type annotation for "kept_indices_by_parent" [var-annotated].


[error] 1305-1305: mypy: Need type annotation for "cache" (hint: "cache: dict[<type>, <type>] = ...") [var-annotated]

🔇 Additional comments (3)
tests/unit/providers/remote/solr_vector_io/test_filters.py (1)

8-8: pytest import error in pylint job (already flagged).

Pipeline still reports pylint: Unable to import 'pytest' (import-error). Per the previous review, add pytest to the dev dependency group (or run pylint via uv run --extra test pylint …) so static analysis can resolve the import.

README.md (1)

62-62: LGTM — version bump aligned with provider changes.

Llama Stack == 0.6.0 matches the library version required by the new llama_stack.providers.utils.vector_io.filters import used in solr.py.

lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py (1)

17-21: Verify the Filter import path in llama-stack 0.6.0 using Python 3.12+.

The sandbox environment cannot verify this import path because it runs Python 3.11.2 and llama-stack 0.6.0 requires Python >=3.12. Please confirm in a Python 3.12+ environment that llama_stack.providers.utils.vector_io.filters exports ComparisonFilter, CompoundFilter, and Filter.


- **Static filters preserved:** The configured `chunk_filter_query` (e.g., `"is_chunk:true"`) is always applied to maintain proper chunk/parent document separation
- **Dynamic filters added:** Request filters are combined with static filters using AND logic
- **String escaping:** Special Solr characters in filter values are automatically escaped

⚠️ Potential issue | 🟡 Minor

Doc overstates the escaping scope.

The doc claims "Special Solr characters in filter values are automatically escaped", but _escape_solr_value in solr.py only escapes backslashes and double quotes. Other Solr-reserved characters (+ - && || ! ( ) { } [ ] ^ ~ * ? : /) are not escaped and will be parsed by Solr. Since eq/ne values are wrapped in quotes most of these are inert in practice, but users should not rely on this blanket statement — consider clarifying the scope.

📝 Suggested wording
-- **String escaping:** Special Solr characters in filter values are automatically escaped
+- **String escaping:** Backslashes and double quotes in string values are escaped; values are wrapped in quotes so most Solr-reserved characters are treated literally
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-- **String escaping:** Special Solr characters in filter values are automatically escaped
+- **String escaping:** Backslashes and double quotes in string values are escaped; values are wrapped in quotes so most Solr-reserved characters are treated literally
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/README.md`
at line 307, The README claim that "Special Solr characters in filter values are
automatically escaped" is inaccurate given _escape_solr_value in solr.py only
handles backslashes and double quotes; either update the README to state the
precise scope (only backslashes and double quotes are escaped) or expand
_escape_solr_value to also escape Solr-reserved characters (+ - && || ! ( ) { }
[ ] ^ ~ * ? : /) and update callers (e.g., eq/ne filter building logic) and
tests accordingly; reference the _escape_solr_value function in solr.py and the
README entry to ensure consistent wording or implementation and add unit tests
that demonstrate the new escaping behavior if you implement it.
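The narrow escaping scope the comment describes can be sketched directly. This mirrors the behavior attributed to `_escape_solr_value` (backslashes and double quotes only); the real implementation in solr.py is assumed, not reproduced.

```python
def escape_solr_value(value: str) -> str:
    """Escape backslashes and double quotes for use inside a quoted Solr term.

    Backslashes must be escaped first so the quote escapes are not doubled.
    """
    return value.replace("\\", "\\\\").replace('"', '\\"')


print(escape_solr_value('say "hi"'))  # -> say \"hi\"
print(escape_solr_value("a+b*c"))     # -> a+b*c (unchanged; quoting makes these inert)
```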

Comment on lines +154 to +155
if filter_type == "ne":
    return f"-{key}:{_format_solr_value(value)}"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm whether the existing tests cover OR-of-NE combinations.
rg -nP 'type="or"' tests/unit/providers/remote/solr_vector_io/test_filters.py -C 6

Repository: lightspeed-core/lightspeed-providers

Length of output: 1226


🏁 Script executed:

#!/bin/bash
# Read the solr.py file to check lines 154-155 and surrounding context
head -200 lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py | tail -60

Repository: lightspeed-core/lightspeed-providers

Length of output: 2043


🏁 Script executed:

#!/bin/bash
# Check for lines 176-179 and 125 in the same file
sed -n '120,130p' lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py
echo "---"
sed -n '170,185p' lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py

Repository: lightspeed-core/lightspeed-providers

Length of output: 958


🏁 Script executed:

#!/bin/bash
# Check test_filters.py for any tests with NE or NIN inside OR
rg -n 'type="ne"|type="nin"' tests/unit/providers/remote/solr_vector_io/test_filters.py -C 8

Repository: lightspeed-core/lightspeed-providers

Length of output: 2765


Solr pure-negation disjunction bug: ne and nin inside or produce invalid queries matching zero docs.

When ne or nin filters are composed inside an or compound filter, the generated Solr syntax becomes a pure-negative boolean query that matches nothing. For example, CompoundFilter(type="or", filters=[ComparisonFilter(type="ne", ...), ComparisonFilter(type="ne", ...)]) produces (-a:"x" OR -b:"y"), which Solr evaluates as matching no documents. This contradicts the advertised support for arbitrary combinations of comparison operators inside and/or gates.

The fix is to wrap each negation at the comparison filter level. For ne (lines 154-155), change the return value to (*:* AND -{key}:{_format_solr_value(value)}). Apply the same pattern to nin (lines 176-179 and the _handle_in_filter function at line 125). This ensures negations remain valid whether standalone, inside and, or inside or.

Update unit tests to cover or compositions with ne and nin operators to prevent regression.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lightspeed_stack_providers/providers/remote/solr_vector_io/solr_vector_io/src/solr_vector_io/solr.py`
around lines 154 - 155, The `ne` and `nin` comparison branches are emitting
pure-negative clauses that break when combined in an OR; update the Solr clause
generation so each negation is wrapped with a match-all prefix. Specifically, in
the comparison handling where filter_type == "ne" change the returned clause to
wrap the negative term as "(*:* AND -{key}:{_format_solr_value(value)})", apply
the same wrapping to the `nin` branch(s) in the comparison handler, and also
modify the `_handle_in_filter` function to emit each negated IN as "(*:* AND
-{key}:{val})" rather than a pure negative term; finally add unit tests that
compose `or` with `ne` and `nin` filters to ensure these cases no longer produce
empty-match queries.
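The suggested match-all wrapping can be sketched as follows. `format_solr_value` here is a simplified stand-in for the module's `_format_solr_value`, which is not reproduced in this excerpt.

```python
def format_solr_value(value: str) -> str:
    """Quote a string value for a Solr term (simplified stand-in)."""
    return f'"{value}"'


def ne_clause(key: str, value: str) -> str:
    # A pure negation "-key:value" matches nothing when OR'd with another
    # pure negation; "(*:* AND -key:value)" keeps each clause well-formed
    # whether it stands alone or sits inside an and/or compound.
    return f"(*:* AND -{key}:{format_solr_value(value)})"


clauses = [ne_clause("platform", "aws"), ne_clause("platform", "gcp")]
print(f"({' OR '.join(clauses)})")
# -> ((*:* AND -platform:"aws") OR (*:* AND -platform:"gcp"))
```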

Comment on lines +11 to +17
# pylint: disable=line-too-long
from lightspeed_stack_providers.providers.remote.solr_vector_io.solr_vector_io.src.solr_vector_io.solr import (
_build_solr_filter_query,
_filter_to_solr_fq,
)

# pylint: enable=line-too-long

🧹 Nitpick | 🔵 Trivial

Long import path is a structural smell.

The six-segment path lightspeed_stack_providers.providers.remote.solr_vector_io.solr_vector_io.src.solr_vector_io.solr (with solr_vector_io appearing three times and a src/ segment in the dotted import) suggests the package layout mixes a src-layout with a non-src parent. It forces the pylint: disable=line-too-long guards here. Consider a follow-up to flatten the layout or expose a top-level import shim; not blocking for this PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/providers/remote/solr_vector_io/test_filters.py` around lines 11 -
17, The test imports use an overly deep src-layout path for
_build_solr_filter_query and _filter_to_solr_fq; create a short, stable import
surface (e.g., add a package-level shim/__init__ that re-exports
_build_solr_filter_query and _filter_to_solr_fq from the internal solr module or
flatten the package layout) and update the test to import those symbols from the
new top-level entrypoint, then remove the pylint: disable=line-too-long guard in
the test.
