fix(demo): improve RAG retrieval to include text content

bsbodden · bsbodden · commit ff1e3d8341c4 · 2025-12-10T19:05:25.000-07:00
IMAGE chunks have generic descriptions (e.g., "Image 1 from page 3")
that carry little document content, yet their embeddings can still score
well against broad queries. With maxResults at 5, these chunks could fill
the result set and push out the TEXT chunks holding the actual content.
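The crowding effect can be sketched in a few lines. This is not the demo's code: `ScoredChunk` is a hypothetical record, the scores are made up, and the sketch ignores minScore to isolate the maxResults cut. A plain top-k selection keeps only the highest scores, so when generic IMAGE captions outscore the TEXT chunks, a limit of 5 returns no TEXT at all while a limit of 15 does.

```java
import java.util.Comparator;
import java.util.List;

public class TopKCrowding {

    // Hypothetical retrieved chunk: type is "TEXT" or "IMAGE", score is similarity.
    record ScoredChunk(String type, String text, double score) {}

    // Keep the k highest-scoring chunks, as a plain top-k retriever would.
    static List<ScoredChunk> topK(List<ScoredChunk> matches, int k) {
        return matches.stream()
                .sorted(Comparator.comparingDouble(ScoredChunk::score).reversed())
                .limit(k)
                .toList();
    }

    // Count how many chunks of the given type survived the cut.
    static long countType(List<ScoredChunk> chunks, String type) {
        return chunks.stream().filter(c -> c.type().equals(type)).count();
    }

    public static void main(String[] args) {
        // Illustrative scores only: six generic IMAGE captions outscore both TEXT chunks.
        List<ScoredChunk> matches = List.of(
                new ScoredChunk("IMAGE", "Image 1 from page 3", 0.41),
                new ScoredChunk("IMAGE", "Image 2 from page 3", 0.40),
                new ScoredChunk("IMAGE", "Image 1 from page 4", 0.39),
                new ScoredChunk("IMAGE", "Image 2 from page 4", 0.38),
                new ScoredChunk("IMAGE", "Image 1 from page 5", 0.37),
                new ScoredChunk("IMAGE", "Image 2 from page 5", 0.36),
                new ScoredChunk("TEXT", "Body text from page 3", 0.35),
                new ScoredChunk("TEXT", "Body text from page 4", 0.33));
        System.out.println(countType(topK(matches, 5), "TEXT"));  // 0
        System.out.println(countType(topK(matches, 15), "TEXT")); // 2
    }
}
```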

- Add 'type' TAG field to schema for future type-based filtering
- Increase maxResults from 5 to 15 so TEXT chunks are more likely to be retrieved
- Lower minScore from 0.7 to 0.2 for broader retrieval (results are filtered in code)
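The new TAG field is not queried by this commit. As a sketch of what future type-based filtering could look like, a RediSearch hybrid query can pre-filter on the tag before the KNN step. The vector field name `vector`, the query parameter name `vec`, and the stored tag value `TEXT` are assumptions, and the helper only builds the query string; it does not show whether or how RedisVLContentRetriever would accept such a filter.

```java
public class TypeFilteredQuery {

    // Build a RediSearch hybrid query: a TAG pre-filter on the chunk type,
    // followed by a KNN clause over a vector field. Field and parameter
    // names are hypothetical; the query must be run with DIALECT 2.
    static String knnQuery(String chunkType, int k, String vectorField, String paramName) {
        return String.format("(@type:{%s})=>[KNN %d @%s $%s AS vector_score]",
                chunkType, k, vectorField, paramName);
    }

    public static void main(String[] args) {
        System.out.println(knnQuery("TEXT", 15, "vector", "vec"));
        // (@type:{TEXT})=>[KNN 15 @vector $vec AS vector_score]
    }
}
```

The string would be sent with `FT.SEARCH` together with `PARAMS 2 vec <query-embedding-bytes>` and `DIALECT 2`. Filtering on the tag would let IMAGE chunks be excluded in Redis itself rather than by over-fetching.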

RAGService already separates TEXT vs IMAGE content for processing.
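RAGService itself is not part of this diff, so the following is only a sketch of the retrieve-wide-then-filter idea, assuming each match carries its chunk type and score. `ScoredChunk`, `splitByType`, `textContext`, and the 0.3 in-code cutoff are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ChunkPartition {

    // Hypothetical retrieved chunk: type mirrors the value of the 'type' TAG field.
    record ScoredChunk(String type, String text, double score) {}

    // Drop matches below a hypothetical in-code threshold, then split the rest
    // into TEXT chunks (key true) and everything else, e.g. IMAGE (key false).
    static Map<Boolean, List<ScoredChunk>> splitByType(List<ScoredChunk> matches, double minScore) {
        return matches.stream()
                .filter(c -> c.score() >= minScore)
                .collect(Collectors.partitioningBy(c -> "TEXT".equals(c.type())));
    }

    // Join the surviving TEXT chunks into one context block for the prompt.
    static String textContext(List<ScoredChunk> textChunks) {
        return textChunks.stream()
                .map(ScoredChunk::text)
                .collect(Collectors.joining("\n\n"));
    }

    public static void main(String[] args) {
        List<ScoredChunk> matches = List.of(
                new ScoredChunk("IMAGE", "Image 1 from page 3", 0.41),
                new ScoredChunk("TEXT", "Body text from page 3", 0.35),
                new ScoredChunk("TEXT", "Page footer", 0.22));
        Map<Boolean, List<ScoredChunk>> split = splitByType(matches, 0.3);
        System.out.println(textContext(split.get(true))); // Body text from page 3
        System.out.println(split.get(false).size());      // 1
    }
}
```

Keeping the retriever's minScore low and applying a stricter cutoff afterwards means the cutoff can differ per chunk type without changing the retriever configuration.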
diff --git a/demos/rag-multimodal/src/main/java/com/redis/vl/demo/rag/service/ServiceFactory.java b/demos/rag-multimodal/src/main/java/com/redis/vl/demo/rag/service/ServiceFactory.java
@@ -90,6 +90,7 @@ private SearchIndex createSearchIndex() {
             "fields",
             List.of(
                 Map.of("name", "text", "type", "text"),
+                Map.of("name", "type", "type", "tag"),  // TAG field for filtering by chunk type
                 Map.of("name", "metadata", "type", "text"),
                 Map.of(
                     "name",
@@ -128,12 +129,15 @@ public RAGService createRAGService(LLMConfig config) {
     }
 
     // Create content retriever
+    // Use higher maxResults to ensure TEXT chunks are retrieved (IMAGE chunks
+    // have generic descriptions that may match better semantically but contain
+    // no useful content). RAGService separates TEXT vs IMAGE for processing.
     RedisVLContentRetriever retriever =
         RedisVLContentRetriever.builder()
             .embeddingStore(embeddingStore)
             .embeddingModel(embeddingModel)
-            .maxResults(5)
-            .minScore(0.7)
+            .maxResults(15)
+            .minScore(0.2)  // Low threshold since we'll filter in RAGService
             .build();
 
     // Create chat model based on provider