[Feature][Integration][Java] add ElasticsearchVectorStore in Java #341
…integration Add comprehensive RAG example demonstrating retrieval-augmented generation using Elasticsearch as vector store and Ollama for embeddings/chat. Includes knowledge base setup utility with sample documents and supports various authentication methods.
@twosom Please add the following content to your PR description and select a checkbox:
Hi, @twosom, I'll start reviewing this PR as soon as possible. Besides, this PR should be labeled 'doc-needed'.
Hi, @twosom, thanks for your contribution. Overall it looks good to me. I left some minor comments about details.
Also, since the example requires an Elasticsearch cluster, I will verify the example next week.
Some test cases failed. I think this is caused by the newly added built-in action, which breaks the agent plan verification.
api/src/main/java/org/apache/flink/agents/api/vectorstores/BaseVectorStore.java
plan/src/main/java/org/apache/flink/agents/plan/actions/ContextRetrievalAction.java
plan/src/main/java/org/apache/flink/agents/plan/actions/ContextRetrievalAction.java
...rg/apache/flink/agents/integrations/vectorstores/elasticsearch/ElasticsearchVectorStore.java
String filter = (String) args.get("filter_query");

List<Float> queryVector = new ArrayList<>(embedding.length);
for (float v : embedding) queryVector.add(v);
Maybe this could be simplified to:
List<Float> queryVector = Arrays.stream(embedding)
        .boxed()
        .collect(Collectors.toList());
@wenjin272
Thank you for your feedback.
However, Arrays.stream() has no overload for float[], so the suggested form does not compile here.
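For reference, a stream-based alternative is still possible by streaming over the indices instead of the array itself. This is only a sketch, not what the PR does: the class name `FloatBoxing` and the helper `toList` are made up for illustration, and the explicit loop in the PR is arguably just as readable.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper, not part of the PR.
class FloatBoxing {
    // Arrays.stream has overloads for int[], long[], double[] and T[], but not float[],
    // so this streams over the indices and boxes each element explicitly.
    static List<Float> toList(float[] embedding) {
        return IntStream.range(0, embedding.length)
                .mapToObj(i -> Float.valueOf(embedding[i]))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        float[] embedding = {0.1f, 0.2f, 0.3f};
        System.out.println(toList(embedding));
    }
}
```

Whether this is clearer than the two-line loop is a matter of taste; the loop also avoids the intermediate stream.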
}

final SearchResponse<Map<String, Object>> searchResponse =
        (SearchResponse) this.client.search(builder.build(), Map.class);
Here we set TDocument to Map. Could it be another type?
@wenjin272
Thank you for your feedback.
Good point! While Python's Document class fixes content as str, I believe this is an over-generalization. Different vector store vendors may have different document structures.
Therefore, keeping ContentT as a generic type and delegating the responsibility to each vendor implementation would provide better flexibility. This allows each integration to define its own document structure rather than forcing all vendors into a single String-based content model.
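To make the idea concrete, here is a rough sketch of a document type that is generic in its content. It is an assumption for illustration only: the class name `Document`, its fields, and its accessors are hypothetical and do not reflect the actual classes in this PR.

```java
import java.util.Map;

// Hypothetical sketch of a document with a vendor-defined content type.
class Document<ContentT> {
    private final String id;
    private final ContentT content;
    private final Map<String, Object> metadata;

    Document(String id, ContentT content, Map<String, Object> metadata) {
        this.id = id;
        this.content = content;
        this.metadata = metadata;
    }

    String getId() { return id; }
    ContentT getContent() { return content; }
    Map<String, Object> getMetadata() { return metadata; }

    public static void main(String[] args) {
        // A String-based document, similar to the Python model.
        Document<String> text = new Document<>("1", "Flink is a stream processor.", Map.of());
        // A Map-based document, e.g. a raw Elasticsearch _source.
        Document<Map<String, Object>> raw =
                new Document<>("2", Map.of("title", "Flink", "body", "..."), Map.of());
        System.out.println(text.getContent());
        System.out.println(raw.getContent().get("title"));
    }
}
```

With this shape, an Elasticsearch integration could expose `Document<Map<String, Object>>` while another vendor exposes `Document<String>`, without either being forced into the other's model.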
Linked issue: #321
Purpose of change
This PR implements the Java version of Vector Store functionality for Flink Agents, following the design proposal in #143. This implementation enables RAG (Retrieval-Augmented Generation) capabilities by providing vector-based context retrieval.
@VectorStore annotation for agent plan resource management
Tests
API
Documentation
doc-needed
doc-not-needed