Skip to content

Conversation

@sklaha
Copy link
Contributor

@sklaha sklaha commented Nov 5, 2025

Sidecar endpoint for moving a node to a new token

These changes are part of the effort to introduce bespoke Sidecar APIs to support key operational functionality currently managed through nodetool commands. Changes in the commit:

  • API for moving a node to a new token
  • Unit tests
  • Integration tests

Copy link
Contributor

@frankgh frankgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, I have some comments.

*
* @param newToken the new token for the node to move to
*/
void move(String newToken);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to propagate any potential IOException from JMX and we should handle it

Suggested change
void move(String newToken);
void move(String newToken) throws IOException;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the method signature. OperationalJob.execute() already handles the exception. I also added tests for node move failure.

* @param newToken the string to validate as a BigInteger, can be null
* @return true if the string can be successfully parsed as a BigInteger, false if null or invalid
*/
public static boolean isValidBigInt(String newToken)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would defer validation to Cassandra, and handle the exception thrown by Cassandra. There are tokens that are not long-based or big integer based

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, OrderPreservingPartitioner is deprecated and token is numeric for all other partitioners. Please correct me if I am wrong.

String body = context.body().asString();
if (body == null || body.equalsIgnoreCase("null"))
{
logger.warn("Bad request. Received null payload.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a log entry will be logged by the parent class when we throw the exception, so I think we should remove this line:

Suggested change
logger.warn("Bad request. Received null payload.");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@JsonCreator
public NodeMoveRequestPayload(@JsonProperty(value = "newToken", required = true) String newToken)
{
this.newToken = newToken;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should ensure the token is provided

Suggested change
this.newToken = newToken;
this.newToken = Objects.requireNonNull(newToken, "newToken is required");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

{
NodeMoveRequestPayload payload = Json.decodeValue(body, NodeMoveRequestPayload.class);
String newToken = payload.newToken();
if (StringUtils.isNullOrEmpty(newToken) || !DataTypeUtils.isValidBigInt(newToken))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's defer validation of the token to Cassandra. Since some tokens might not necessarily be numeric, i.e. org.apache.cassandra.dht.OrderPreservingPartitioner

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, OrderPreservingPartitioner is deprecated and token is numeric for all other partitioners. Please correct me if I am wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's correct, but I think we shouldn't make assumptions from the Sidecar side. Maybe reducing the input to BigInteger is sufficient, but I prefer to defer validation to the partitioner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the validation of the token.

}
catch (DecodeException e)
{
logger.warn("Bad request. Received invalid JSON payload.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be logged in the parent abstract class. No need to log it here

Suggested change
logger.warn("Bad request. Received invalid JSON payload.");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

{
private static final Logger LOGGER = LoggerFactory.getLogger(NodeMoveJob.class);
private static final String OPERATION = "move";
private static final String OPERATION_MODE_MOVING = "MOVING";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to consider the org.apache.cassandra.service.StorageService.Mode#MOVE_FAILED mode as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need to check if a move operation is in progress or not. Check if the status is "MOVING" accomplishes that.

responseCode = "200",
content = @Content(mediaType = "application/json",
schema = @Schema(implementation = OperationalJobResponse.class)))
@APIResponse(description = "Node move operation initiated successfully",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we document the conflict scenario as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

assertThat(ring).isNotNull();

RingEntry ringEntry = ring.stream()
.filter(entry -> entry.fqdn().equals(node))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this won't work if we had vnodes enabled for this test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't enable vnodes for this test and it succeeds.

@Singleton
public class NodeMoveHandler extends AbstractHandler<String> implements AccessProtected
{
private static final int MAX_TOKEN_LENGTH = 128;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT, document how you came up with value 128 for the maximum token length

@sklaha sklaha requested a review from frankgh December 22, 2025 18:22
Copy link
Contributor

@frankgh frankgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 looks good to me

Copy link
Contributor

@sarankk sarankk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Looks good to me, some minor nits.

* @param newToken the new token for the node to move to
* @return a reference to this Builder
*/
public Builder nodeMoveRequest(String newToken)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: @NotNull annotation for newToken

{
UUID jobId = UUID.randomUUID();
String newToken = "123456789";
String nodeMoveString = "{\"jobId\":\"" + jobId + "\",\"jobStatus\":\"SUCCEEDED\",\"instance\":\"127.0.0.1\"}";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Why do we return instance in response here? OperationalJobResponse seems to return only these 4 fields jobId, jobStatus, operation, reason

.expect(ResponsePredicate.SC_OK)
.putHeader("content-type", "application/json")
.sendBuffer(io.vertx.core.buffer.Buffer.buffer(requestBody), context.succeeding(response -> {
assertThat(response.statusCode()).isEqualTo(OK.code());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: verify "jobStatus":"FAILED", "reason":"Simulated failure"


// Validate the operational job status using the OperationalJobHandler
String jobId = responseBody.getString("jobId");
validateOperationalJobStatus(jobId, "move", OperationalJobStatus.SUCCEEDED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarification question, does storageOps.move call succeed even if the node is in moving status?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants