Conversation

@jeffxiang (Contributor) commented Sep 24, 2025

Allow MemQ producers to maintain a sticky set of N brokers for round-robin writes. Spreading writes across N sticky connections probabilistically reduces the variance in broker throughput compared to a single sticky connection.

The idea is to maintain the favorable properties of the current endpoint selection algorithm but expand it to accommodate N endpoints in rotation. The favorable properties we want to maintain are:

  • Stickiness: Once working endpoints are chosen, we want to keep using them. This reduces connection counts.
  • Randomness: Each AZ-local endpoint should have equal probability of being chosen. This ensures even distribution across restarts.
  • Greediness: We should choose the first N endpoints that are successful. This simplifies the selection logic and maintenance overhead.

To accommodate N endpoints in rotation, we make only a small change to the original endpoint selection algorithm: instead of the singleton currentEndpoint, we keep a set of writeEndpoints with a maximum size of numWriteEndpoints. The modified algorithm looks similar to the existing one:

1. Discover topic metadata → Set<Endpoint> of broker endpoints that host the topic.
2. Filter down to only the AZ-local endpoints and shuffle them into random order.
3. While writeEndpoints.size() < numWriteEndpoints:
     • On each new write, try the first endpoint in the AZ-local list.
     • If it succeeds, add it to writeEndpoints.
     • If it fails, go down the list until one succeeds and add that one to writeEndpoints.
     • Rotate the AZ-local list by 1 index.
4. Once writeEndpoints.size() == numWriteEndpoints, simply rotate writeEndpoints by 1 for every future write and try the first endpoint.
5. If an endpoint is unreachable or fails, remove it from writeEndpoints and repeat step 3.
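The selection loop above can be sketched in a few dozen lines. This is a minimal illustration only: the class and method names (WriteEndpointRotation, nextEndpoint, deprioritize) are hypothetical, endpoints are plain strings, and the real logic in MemqCommonClient additionally handles metadata refresh and retries.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Sketch of the sticky N-endpoint rotation; names are illustrative,
// not the actual MemqCommonClient API.
public class WriteEndpointRotation {

    private final int numWriteEndpoints;
    private final Deque<String> localityEndpoints;     // AZ-local candidates
    private final Deque<String> writeEndpoints = new ArrayDeque<>();

    public WriteEndpointRotation(int numWriteEndpoints, List<String> azLocalEndpoints) {
        this.numWriteEndpoints = numWriteEndpoints;
        List<String> shuffled = new ArrayList<>(azLocalEndpoints);
        Collections.shuffle(shuffled);                 // randomness across restarts
        this.localityEndpoints = new ArrayDeque<>(shuffled);
    }

    /** Pick the endpoint for the next write; connect simulates a connection attempt. */
    public String nextEndpoint(Predicate<String> connect) {
        // Greedily fill the sticky set until it holds numWriteEndpoints endpoints.
        while (writeEndpoints.size() < numWriteEndpoints) {
            boolean added = false;
            for (int i = 0; i < localityEndpoints.size() && !added; i++) {
                String candidate = localityEndpoints.pollFirst();
                localityEndpoints.addLast(candidate);  // rotate AZ-local list by 1
                if (!writeEndpoints.contains(candidate) && connect.test(candidate)) {
                    writeEndpoints.addLast(candidate); // stickiness: keep using it
                    added = true;
                }
            }
            if (!added) {
                break;  // no further reachable endpoint; run with what we have
            }
        }
        if (writeEndpoints.isEmpty()) {
            throw new IllegalStateException("No reachable AZ-local endpoints");
        }
        // Round-robin: rotate the sticky set by 1 and use its head.
        String chosen = writeEndpoints.pollFirst();
        writeEndpoints.addLast(chosen);
        return chosen;
    }

    /** On a dead endpoint, drop it; the fill loop replaces it on the next write. */
    public void deprioritize(String endpoint) {
        writeEndpoints.remove(endpoint);
    }

    public static void main(String[] args) {
        WriteEndpointRotation rotation =
            new WriteEndpointRotation(2, List.of("broker1", "broker2", "broker3"));
        // Simulate broker2 being unreachable; writes rotate over the other two.
        for (int i = 0; i < 4; i++) {
            System.out.println(rotation.nextEndpoint(e -> !e.equals("broker2")));
        }
    }
}
```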

Changes are made to the following core classes:

  • MemqCommonClient: modified endpoint selection algorithm
  • NetworkClient: introduce a Map<InetSocketAddress, ChannelFuture> channelPool to maintain an open channel per endpoint
  • ResponseHandler: introduce two maps, channelToRequests and requestsToChannel, to track inflight requests across multiple channels
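The NetworkClient change can be pictured as a small connection cache keyed by address. The sketch below uses a stand-in Channel interface instead of Netty's ChannelFuture so it stays self-contained; class and method names are illustrative, not the actual NetworkClient API.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of a per-endpoint channel pool as described for NetworkClient.
// "Channel" is a stand-in for Netty's ChannelFuture.
public class ChannelPoolSketch {

    public interface Channel {
        boolean isActive();
        void close();
    }

    private final Map<InetSocketAddress, Channel> channelPool = new ConcurrentHashMap<>();

    /** Return the cached channel for the endpoint, dialing a new one if absent or dead. */
    public Channel channelFor(InetSocketAddress endpoint, Function<InetSocketAddress, Channel> dial) {
        return channelPool.compute(endpoint, (addr, existing) ->
            existing != null && existing.isActive() ? existing : dial.apply(addr));
    }

    /** Close and drop the channel when its endpoint leaves the write set. */
    public void evict(InetSocketAddress endpoint) {
        Channel removed = channelPool.remove(endpoint);
        if (removed != null) {
            removed.close();
        }
    }

    public int openChannels() {
        return channelPool.size();
    }
}
```

Keeping one long-lived channel per sticky endpoint preserves the low connection count that stickiness was buying in the single-endpoint design.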

Testing:

  • Unit test coverage for changes in core classes
  • Functional testing in prod-like environments, including chaos scenarios such as multiple dead brokers, rolling restarts/replacements, etc.

@jeffxiang jeffxiang marked this pull request as ready for review October 27, 2025 15:01
@jeffxiang jeffxiang requested a review from a team as a code owner October 27, 2025 15:01
@vahidhashemian left a comment

Thanks! Left a few comments. Also, it would be great to add:

  • Javadoc for the new code/methods/configs
  • Metrics where it makes sense

```java
    deprioritizeDeadEndpoint(endpoint, topic); // this endpoint is down even after retries in NetworkClient; remove it from the write endpoints and take another one from locality endpoints
} catch (Exception ex) {
    logger.error("Failed to refresh write endpoints", ex);
    throw e;
```

Should it throw ex too?

@jeffxiang (Contributor, Author) replied:

We're throwing e instead of ex because ex is a secondary error thrown by deprioritizeDeadEndpoint(); the main issue is the ConnectException.

Comment on lines 365 to 370

```java
if (currentWrites.size() < numWriteEndpoints && !currentWrites.contains(endpoint)) {
    logger.info("Registering write endpoint: " + endpoint + " for topic: " + topic);
    List<Endpoint> newWrites = new ArrayList<>(currentWrites);
    newWrites.add(endpoint);
    this.writeEndpoints = Collections.unmodifiableList(newWrites);
}
```

Is it still possible for two threads to overwrite each other in this if block?

@jeffxiang (Contributor, Author) replied:

Hypothetically yes, but the caller of MemqCommonClient is the Request.Dispatch thread, which is single-threaded: https://github.com/pinterest/memq/blob/main/memq-client/src/main/java/com/pinterest/memq/client/producer2/RequestManager.java#L84
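Since dispatch is single-threaded, the copy-on-write in the quoted snippet is safe today. If that assumption ever changed, the check-then-act could be guarded with a lock while keeping the lock-free read path; a sketch under that assumption (hypothetical class, not the actual MemqCommonClient code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical hardening of the copy-on-write registration for
// multi-threaded callers; not the actual MemqCommonClient code.
public class WriteEndpointRegistry {

    private final int numWriteEndpoints;
    // volatile: readers always observe a fully built, immutable snapshot
    private volatile List<String> writeEndpoints = Collections.emptyList();

    public WriteEndpointRegistry(int numWriteEndpoints) {
        this.numWriteEndpoints = numWriteEndpoints;
    }

    /** Atomically register an endpoint if there is room; true if it was added. */
    public synchronized boolean register(String endpoint) {
        List<String> current = writeEndpoints;
        if (current.size() >= numWriteEndpoints || current.contains(endpoint)) {
            return false;   // full, or already registered
        }
        List<String> next = new ArrayList<>(current);
        next.add(endpoint);
        writeEndpoints = Collections.unmodifiableList(next);
        return true;
    }

    /** Lock-free read path: callers iterate over an immutable snapshot. */
    public List<String> snapshot() {
        return writeEndpoints;
    }
}
```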

@jeffxiang jeffxiang merged commit a1fb975 into main Nov 3, 2025
1 check passed
@jeffxiang jeffxiang deleted the producer_balancing branch November 3, 2025 17:09