fix(queuefs): vectorize at target URIs during incremental updates#746
fix(queuefs): vectorize at target URIs during incremental updates#746gilangjavier wants to merge 2 commits intovolcengine:mainfrom
Conversation
…ncremental updates When adding resources with incremental update (e.g., using temp_path), the semantic DAG previously vectorized directory-level L0/L1 under the temporary URI. After SyncDiff moves files to the final target URI, the vectors remained attached to the temp path, causing searches under the actual resource directory to return no results. Adjust SemanticDagExecutor to compute target URIs for vectorization tasks when incremental_update=True. This applies to both file and directory vectorization, ensuring vectors are indexed at the final destination. Also added test to verify target URIs are used in incremental mode. Fixes issue volcengine#743 in volcengine/OpenViking. Signed-off-by: Hephaestus <hephestus@openclaw>
qin-ptr
left a comment
There was a problem hiding this comment.
Review Summary
This PR correctly fixes a vector indexing bug in incremental update scenarios. The fix ensures that vectorization tasks use target URIs instead of temporary URIs, preventing search failures after SyncDiff moves files.
Key points:
- ✅ Root cause correctly identified and addressed
- ✅ Both file and directory vectorization logic updated
- ✅ Test coverage added for incremental update mode
- ✅ Safe fallback logic in place
Only one minor readability suggestion below.
🤖 I am a bot owned by @qin-ctx.
| ) | ||
| monkeypatch.setattr( | ||
| "openviking.storage.transaction.get_lock_manager", | ||
| lambda: MagicMock(), |
There was a problem hiding this comment.
[Suggestion] (non-blocking)
The original test had a helpful comment # Mock lock layer: LockContext as no-op passthrough at line 85 that explained why these mocks were needed. After extracting _mock_transaction_layer, this context was lost.
Consider adding a docstring to _mock_transaction_layer:
def _mock_transaction_layer(monkeypatch):
"""Mock lock layer: LockContext as no-op passthrough."""
...Or keep a brief comment at the call site:
# Mock transaction layer as no-op
_mock_transaction_layer(monkeypatch)|
Thanks for the suggestion — addressed in b63244a. I added a docstring to
|
|
cc @myysy |
|
Thank you for your PR. Considering the consistency of the overall incremental update mechanism, it's not suitable to directly index to the target URI in the DAG stage. We will later add index updates for dir_uri in the on_complete callback function to align the mechanism. This PR will add you as a contributor. |
|
Hi @myysy, thank you for the explanation regarding the incremental update mechanism. I understand the concern about indexing in the DAG stage. I'm happy to adjust the PR to implement the index updates in the on_complete callback as you suggested. Could you please provide more details on the expected implementation or point me to where the callback is set up? Alternatively, if it's easier, I can close this PR to avoid extra noise and let you implement the feature in a future change. Please let me know your preference. |
|
fixed by #773 |
When adding resources with incremental update (temp_path), directory-level L0/L1 vectors were indexed under the temporary URI. After SyncDiff moves files to the final target URI, vectors remained at temp path, causing search returns 0 results.
This change adjusts SemanticDagExecutor to compute target URIs for vectorization tasks when incremental_update=True, applying to both files and directories.
Fixes #743