@micheal-o (Contributor) commented Nov 14, 2025

What changes were proposed in this pull request?

This PR introduces the API for offline repartitioning of streaming state. The API is not yet exposed because it is still in development. The PR also implements part of the core functionality of the repartition batch runner, which validates the checkpoint, creates the repartition batch, and commits it. Subsequent PRs will build on this and will add the Spark Connect and PySpark APIs.

This PR also introduces the streamingCheckpointManager for performing operations on the streaming checkpoint. It is likewise not yet exposed because it is still in development.
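
For readers unfamiliar with the flow, the three steps named above (validate the checkpoint, create the repartition batch, commit) can be illustrated with a toy in-memory model. This is only a sketch and not the code in this PR: `ToyCheckpoint`, `partition_for`, and `repartition_offline` are hypothetical names, the partitioner is a stand-in, and committing the repartitioned state as the next batch id is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyCheckpoint:
    """Hypothetical in-memory stand-in for a streaming checkpoint."""

    # Committed batch id -> state partitions (each partition maps key -> value).
    committed: Dict[int, List[Dict[str, int]]] = field(default_factory=dict)

    def last_committed_batch(self) -> Optional[int]:
        return max(self.committed) if self.committed else None


def partition_for(key: str, num_partitions: int) -> int:
    # Deterministic toy partitioner (assumption: the real partitioning differs).
    return sum(key.encode("utf-8")) % num_partitions


def repartition_offline(checkpoint: ToyCheckpoint, new_num_partitions: int) -> int:
    # Step 1: validate the checkpoint and the requested partition count.
    if new_num_partitions <= 0:
        raise ValueError("new_num_partitions must be positive")
    last = checkpoint.last_committed_batch()
    if last is None:
        raise ValueError("checkpoint has no committed batch")
    old_partitions = checkpoint.committed[last]
    if len(old_partitions) == new_num_partitions:
        raise ValueError("state already has the requested number of partitions")

    # Step 2: create the repartition batch by re-bucketing every key.
    new_partitions: List[Dict[str, int]] = [{} for _ in range(new_num_partitions)]
    for partition in old_partitions:
        for key, value in partition.items():
            new_partitions[partition_for(key, new_num_partitions)][key] = value

    # Step 3: commit the repartitioned state as the next batch (assumption).
    batch_id = last + 1
    checkpoint.committed[batch_id] = new_partitions
    return batch_id
```

In this toy model, a checkpoint whose last committed batch has two state partitions can be rewritten to four by calling `repartition_offline(checkpoint, 4)`; the earlier batch is left untouched, and the new batch holds the same keys spread across the new partitions.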

Why are the changes needed?

These changes are the first step toward supporting streaming state repartitioning.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New test suite added

Was this patch authored or co-authored using generative AI tooling?

No

@anishshri-db (Contributor) left a comment

lgtm pending green CI

@anishshri-db (Contributor) commented

@micheal-o - could you pls check for the failed linter CI job ? Thx

@micheal-o (Contributor, Author) commented

> @micheal-o - could you pls check for the failed linter CI job ? Thx

@anishshri-db yeah, fixed the linter error. Thanks
