@twosom
Contributor

@twosom twosom commented Nov 25, 2025

Linked issue: #321

Purpose of change

This PR implements the Java version of Vector Store functionality for Flink Agents, following the design proposal in #143. This implementation enables RAG (Retrieval-Augmented Generation) capabilities by providing vector-based context retrieval.

  • Added Vector Store API with query support and document handling
  • Added @VectorStore annotation for agent plan resource management
  • Added context retrieval request and response events
  • Implemented context retrieval action with vector store support
  • Implemented Elasticsearch vector store integration
  • Added ElasticsearchVectorStore RAG example with Ollama integration
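
To illustrate what "vector-based context retrieval" means here, the following is a conceptual sketch only: a tiny in-memory index that ranks stored texts by cosine similarity to a query embedding. It is not the API added by this PR and does not involve Elasticsearch; the class and method names (`InMemoryVectorIndex`, `add`, `query`, `cosine`) are hypothetical, and it assumes all vectors are non-zero and of equal length.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Conceptual illustration only; NOT the VectorStore API from this PR.
class InMemoryVectorIndex {
    private final List<String> texts = new ArrayList<>();
    private final List<float[]> vectors = new ArrayList<>();

    // Stores a text together with a copy of its embedding.
    void add(String text, float[] embedding) {
        texts.add(text);
        vectors.add(embedding.clone());
    }

    // Returns the texts of the `limit` entries most similar to the query vector.
    List<String> query(float[] queryVector, int limit) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            order.add(i);
        }
        // Sort indices by descending cosine similarity (negated for ascending sort).
        order.sort(Comparator.comparingDouble(
                (Integer i) -> -cosine(queryVector, vectors.get(i))));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, order.size()); i++) {
            result.add(texts.get(order.get(i)));
        }
        return result;
    }

    // Cosine similarity; assumes equal-length, non-zero vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

In a RAG flow, the returned texts would be passed to the chat model as additional context. A real vector store would delegate the similarity search to the backend rather than scanning in memory.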

Tests

API

Documentation

  • doc-needed
  • doc-not-needed

@github-actions github-actions bot added priority/major Default priority of the PR or issue. fixVersion/0.2.0 The feature or bug should be implemented/fixed in the 0.2.0 version. doc-label-missing The Bot applies this label either because none or multiple labels were provided. labels Nov 25, 2025
@github-actions

@twosom Please add the following content to your PR description and select a checkbox:

- [ ] `doc-needed` 
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->

@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing The Bot applies this label either because none or multiple labels were provided. labels Nov 25, 2025
@wenjin272
Collaborator

Hi @twosom, I'll start reviewing this PR as soon as possible. Also, this PR should be labeled 'doc-needed'.

@github-actions github-actions bot added doc-needed Your PR changes impact docs. and removed doc-not-needed Your PR changes do not impact docs labels Nov 27, 2025
Collaborator

@wenjin272 wenjin272 left a comment

Hi @twosom, thanks for your contribution. Overall, it looks good to me. I left some minor comments on details.

Also, because the example requires an Elasticsearch cluster, I will verify the example next week.

Some test cases also failed. I think this is caused by the newly added built-in action, which breaks the agent plan verification.

String filter = (String) args.get("filter_query");

List<Float> queryVector = new ArrayList<>(embedding.length);
for (float v : embedding) queryVector.add(v);
Collaborator

Maybe this could be simplified to:

List<Float> queryVector = Arrays.stream(embedding)
                                .boxed()
                                .collect(Collectors.toList());

Contributor Author

@wenjin272
Thank you for your feedback.
However, Arrays.stream() has no overload for float[]; it only accepts int[], long[], double[], and object arrays.
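
For reference, a stream-based conversion of a float[] is still possible by streaming over the indices instead of the array itself. The sketch below shows both variants side by side; the class and method names (`FloatBoxing`, `toListWithLoop`, `toListWithStream`) are illustrative and not part of this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper for illustration; not part of the PR.
class FloatBoxing {
    // Plain loop, equivalent to the code under review.
    static List<Float> toListWithLoop(float[] embedding) {
        List<Float> out = new ArrayList<>(embedding.length);
        for (float v : embedding) {
            out.add(v);
        }
        return out;
    }

    // Streams over indices, because the JDK has no FloatStream
    // and Arrays.stream() has no float[] overload.
    static List<Float> toListWithStream(float[] embedding) {
        return IntStream.range(0, embedding.length)
                .mapToObj(i -> embedding[i])
                .collect(Collectors.toList());
    }
}
```

The stream variant is not obviously simpler than the loop, so keeping the loop seems like a reasonable choice.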

}

final SearchResponse<Map<String, Object>> searchResponse =
(SearchResponse) this.client.search(builder.build(), Map.class);
Collaborator

Here we specify TDocument as Map. Could it be other types?

Contributor Author

@twosom twosom Nov 30, 2025

@wenjin272
Thank you for your feedback.
Good point! While Python's Document class fixes content as str, I believe this is an over-generalization. Different vector store vendors may have different document structures.

Therefore, keeping ContentT as a generic type and delegating the responsibility to each vendor implementation would provide better flexibility. This allows each integration to define its own document structure rather than forcing all vendors into a single String-based content model.
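
As a rough sketch of this idea (the actual class in the PR is not shown in this thread, so the field and method names below are assumptions), a document type that is generic over its content could look like this:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names are assumptions, not the PR's actual API.
class Document<ContentT> {
    private final String id;
    private final ContentT content;
    private final Map<String, Object> metadata;

    Document(String id, ContentT content, Map<String, Object> metadata) {
        this.id = id;
        this.content = content;
        // Defensive copy so later changes to the caller's map do not leak in.
        this.metadata = Collections.unmodifiableMap(new HashMap<>(metadata));
    }

    String getId() {
        return id;
    }

    ContentT getContent() {
        return content;
    }

    Map<String, Object> getMetadata() {
        return metadata;
    }
}
```

With such a type, a String-based integration could use `Document<String>`, while an integration that returns structured hits could use `Document<Map<String, Object>>`, without either forcing its model on the other.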

Labels

doc-needed Your PR changes impact docs. fixVersion/0.2.0 The feature or bug should be implemented/fixed in the 0.2.0 version. priority/major Default priority of the PR or issue.

2 participants