@yyj8 yyj8 commented Nov 23, 2024

…return a failure because the service may be unavailable

Fixes #xyz

Main Issue: #xyz

PIP: #xyz

Motivation

In some scenarios, a deadlocked broker needs to recover automatically rather than wait for manual intervention. For example, when the service is deployed in a customer environment, we cannot manage it directly. If the broker has a deadlock, the probe should return a failure because the service may be unavailable. The probe failure then triggers a node restart, which resolves the deadlock.

Modifications

Add deadlock detection to the probe. If a deadlock exists, print the stacks of the deadlocked threads and return a service-unavailable exception.
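As a rough illustration of the idea (not the code in this PR), a deadlock check can be built on the JDK's `ThreadMXBean`. The class name `DeadlockProbe`, its method names, and the use of `IllegalStateException` as a stand-in for the broker's service-unavailable exception are all assumptions made for this sketch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration only; names are not taken from the PR.
final class DeadlockProbe {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Returns a dump of the deadlocked threads, or an empty string if none are found. */
    static String describeDeadlocks() {
        // Covers both monitor locks and ownable synchronizers (e.g. ReentrantLock).
        // The JDK returns null when no threads are deadlocked.
        long[] ids = THREADS.findDeadlockedThreads();
        if (ids == null) {
            return "";
        }
        StringBuilder dump = new StringBuilder();
        for (ThreadInfo info : THREADS.getThreadInfo(ids, true, true)) {
            if (info != null) { // a thread may have exited in the meantime
                dump.append(info); // ThreadInfo.toString() includes a truncated stack trace
            }
        }
        return dump.toString();
    }

    /** Fails the probe when a deadlock exists. */
    static void check() {
        String dump = describeDeadlocks();
        if (!dump.isEmpty()) {
            System.err.println(dump);
            // Assumption: the real probe would map this to a service-unavailable response.
            throw new IllegalStateException("Deadlocked threads detected; service may be unavailable");
        }
    }

    public static void main(String[] args) {
        check();
        System.out.println("No deadlock detected");
    }
}
```

A liveness probe that calls something like `check()` would start failing once a deadlock appears, which lets the orchestrator restart the node without manual intervention.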

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

lhotari and others added 28 commits May 17, 2025 09:02
…pache#24313)

### Motivation

There is still a call to the synchronous method in `internalCreatePartitionedTopic`, which can lead to a deadlock under certain conditions.

### Modifications

- Replaced synchronous call to `getNamespacePolicies()` with `getNamespacePoliciesAsync()`.
- Replaced synchronous call to `getTopicPartitionList()` with `getTopicPartitionListAsync()`.
- Updated `internalCreatePartitionedTopic` to propagate the async flow properly.
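
The shape of such a change can be sketched with plain `CompletableFuture` composition. This is a simplified illustration, not the broker code: the two `...Async` methods below are local stubs that only borrow the names mentioned above, and their signatures, the `Policies` class, and `createPartitionedTopicAsync` are assumptions.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Simplified sketch; all types and signatures here are hypothetical stand-ins.
final class AsyncCreateSketch {
    static final class Policies {
        final int maxPartitions;
        Policies(int maxPartitions) { this.maxPartitions = maxPartitions; }
    }

    // Stub standing in for an asynchronous metadata lookup.
    static CompletableFuture<Policies> getNamespacePoliciesAsync(String namespace) {
        return CompletableFuture.supplyAsync(() -> new Policies(8));
    }

    // Stub standing in for an asynchronous listing of existing partitioned topics.
    static CompletableFuture<List<String>> getTopicPartitionListAsync(String namespace) {
        return CompletableFuture.supplyAsync(() -> List.of("persistent://tenant/ns/existing"));
    }

    // Chains the lookups instead of blocking on each one, so no thread waits on a future.
    static CompletableFuture<Void> createPartitionedTopicAsync(String namespace, String topic, int partitions) {
        return getNamespacePoliciesAsync(namespace)
            .thenCompose(policies -> {
                if (partitions > policies.maxPartitions) {
                    return CompletableFuture.<List<String>>failedFuture(
                        new IllegalArgumentException("Too many partitions: " + partitions));
                }
                return getTopicPartitionListAsync(namespace);
            })
            .thenAccept(existing -> {
                if (existing.contains(topic)) {
                    throw new IllegalStateException("Topic already exists: " + topic);
                }
                // The actual creation step would be chained here as another async stage.
            });
    }

    public static void main(String[] args) {
        createPartitionedTopicAsync("tenant/ns", "persistent://tenant/ns/new", 4).join();
        System.out.println("validation passed");
    }
}
```

Because every step returns a future, a failure in any stage propagates to the caller through the returned future rather than through a blocked thread.
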
### Motivation
We can now obtain the offset of a message from its message ID:

1. Get the message by ID using the `get-message-by-id` command.
2. Get the index of the message using `Message.getIndex()`.

However, we cannot obtain the message ID from an offset, so a new API is needed for that lookup.


### Modifications

Add a new HTTP API to retrieve the message ID by offset.
This enables us to cache the mapping between message ID and offset, which in turn allows offsets to be used for seek and acknowledgment operations when consuming messages through the standardized API.
…picWithRollbackDuration (apache#24318)

### Motivation
![image](https://github.com/user-attachments/assets/ddd6e926-b4c4-438f-9c9a-2b8407fcbd09)
The root cause of this problem is that the `reader` is not cleaned up after the unit test `shouldSupportCancellingReadNextAsync` is executed.


### Modifications
Add `@Cleanup` to `reader` in the test `shouldSupportCancellingReadNextAsync`.
…tion deletion rate (apache#24190)

Co-authored-by: zjxxzjwang <zjxxzjwang@tencent.com>
Co-authored-by: Lari Hotari <lhotari@users.noreply.github.com>
…ed state (apache#24352)

Signed-off-by: Zixuan Liu <nodeces@gmail.com>
…opic deleted, even if the partitioned topic still exists (apache#24350)
…ry (apache#19783)

Co-authored-by: Lari Hotari <lhotari@apache.org>
dao-jun and others added 30 commits September 9, 2025 11:31
…ER_WATER_MARK to pulsar conf and pause receive requests when channel is unwritable (apache#24510)
… Delivery (apache#24625)

Co-authored-by: Christina <qwang3@paypal.com>
… from the replay queue after a consumer disconnects and leaves a backlog (apache#24736)

Co-authored-by: Nikolai Borisov <nikolai.borisov@onde.app>
…s of retention policy (apache#24733)

Co-authored-by: Jiwe Guo <technoboy@apache.org>
Co-authored-by: oneby-wang <onebywang@qq.com>