[improve][broker]add active status into cursor stats#1
Open
[improve][broker]add active status into cursor stats#1
Conversation
This reverts commit 1e9a1f0.
Fixes apache#16782 ### Motivation As we all know, Bundle split has 3 algorithms: - range_equally_divide - topic_count_equally_divide - specified_positions_divide However, none of these algorithms can divide bundles according to flow or qps, which may cause bundles to be split multiple times.
…che#17459) --- *Motivation* We update the jna version in this [PR](apache#17262). We should update the version in presto license file as well.
Fixes apache#17392 ### Motivation All timers in `ProducerImpl` are `std::shared_ptr` objects that can be reset with `nullptr` in `ProducerImpl::cancelTimers`. It could lead to null pointer access in some cases. See apache#17392 (comment) for the analysis. Generally it's not necessary to hold a nullable pointer to the timer. However, to resolve the cyclic reference issue, apache#5246 reset the shared pointer to reduce the reference count manually. It's not a good solution because we have to perform null check for timers everywhere. The null check still has some race condition issue like: Thread 1: ```c++ if (timer) { // [1] timer is not nullptr timer->async_wait(/* ... */); // [3] timer is null now, see [2] below } ``` Thread 2: ```c++ timer.reset(); // [2] ``` The best solution is to capture `weak_ptr` in timer's callback and call `lock()` to check if the referenced object is still valid. ### Modifications - Change the type of `sendTimer_` and `batchTimer_` to `deadline_timer`, not a `shared_ptr`. - Use `PeriodicTask` instead of the `deadline_timer` for token refresh. - Migrate `weak_from_this()` method from C++17 and capture `weak_from_this()` instead of `shared_from_this()` in callbacks. ### Verifying this change Run the `testResendViaSendCallback` for many times and we can see it won't fail after this patch. ```bash ./tests/main --gtest_filter='BasicEndToEndTest.testResendViaSendCallback' --gtest_repeat=30 ```
…ration is invalid (apache#17035)
* Sync recent changes from apache#17030, apache#17039, apache#16315, and apache#17057 * fix apache#17119 * minor updates * add link of release notes to navigation * fix * update release process as per PIP-190 * minor fix * minor fix * Update release-process.md
Signed-off-by: Zixuan Liu <nodeces@gmail.com> Signed-off-by: Zixuan Liu <nodeces@gmail.com>
… and added test cases including default time (apache#16130)
### Motivation When a Pulsar topic is unloaded from a broker, certain metrics related to that topic will appear to remain active for the broker for 5 minutes. This is confusing for troubleshooting because it makes the topic appear to be owned by multiple brokers for a short period of time. See below for a way to reproduce this behavior. In order to solve this "zombie" metric problem, I propose we remove the timestamps that get exported with each Prometheus metric served by the broker. ### Analysis Since we introduced Prometheus metrics in apache#294, we have exported a timestamp along with most metrics. This is an optional, valid part of the spec defined [here](https://prometheus.io/docs/instrumenting/exposition_formats/#comments-help-text-and-type-information). However, after our adoption of Prometheus metrics, the Prometheus project released version 2.0 with a significant improvement to its concept of staleness. In short, before 2.0, a metric that was in the last scrape but not the next one (this often happens for topics that are unloaded) will essentially inherit the most recent value for the last 5 minute window. If there isn't one in the past 5 minutes, the metric becomes "stale" and isn't reported. Starting in 2.0, there was new logic to consider a value stale the very first time that it is not reported in a scrape. Importantly, this new behavior is only available if you do not export timestamps with metrics, as documented here: https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness. We want to use the new behavior because it gives better insight into all topic metrics, which are subject to move between brokers at any time. This presentation https://www.youtube.com/watch?v=GcTzd2CLH7I and slide deck https://promcon.io/2017-munich/slides/staleness-in-prometheus-2-0.pdf document the feature in detail. This blog post was also helpful: https://www.robustperception.io/staleness-and-promql/. Additional motivation comes from mailing list threads like this one https://groups.google.com/g/prometheus-users/c/8OFAwp1OEcY. It says: > Note, however, that adding timestamps is an extremely niche use case. Most of the users who think the need it should actually not do it. > > The main usecases within that tiny niche are federation and mirroring the data from another monitoring system. The Prometheus Go client also indicates a similar motivation: https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#NewMetricWithTimestamp. The OpenMetrics project also recommends against exporting timestamps: https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#exposing-timestamps. As such, I think we are not a niche use case, and we should not add timestamps to our metrics. ### Reproducing the problem 1. Run any 2.x version of Prometheus (I used 2.31.0) along with the following scrape config: ```yaml - job_name: broker honor_timestamps: true scrape_interval: 30s scrape_timeout: 10s metrics_path: /metrics scheme: http follow_redirects: true static_configs: - targets: ["localhost:8080"] ``` 2. Start pulsar standalone on the same machine. I used a recently compiled version of master. 3. Publish messages to a topic. 4. Observe `pulsar_in_messages_total` metric for the topic in the prometheus UI (localhost:9090) 5. Stop the producer. 6. Unload the topic from the broker. 7. Optionally, `curl` the metrics endpoint to verify that the topic’s `pulsar_in_messages_total` metric is no longer reported. 8. Watch the metrics get reported in prometheus for 5 additional minutes. When you set `honor_timestamps: false`, the metric stops getting reported right after the topic is unloaded, which is the desired behavior. ### Modifications * Remove all timestamps from metrics * Fix affected tests and test files (some of those tests were in the proxy and the function worker, but no code was changed for those modules) ### Verifying this change This change is accompanied by updated tests. ### Does this pull request potentially affect one of the following parts: This is technically a breaking change to the metrics, though I would consider it a bug fix at this point. I will discuss it on the mailing list to ensure it gets proper visibility. Given how frequently Pulsar changes which metrics are exposed between each scrape, I think this is an important fix that should be cherry picked to older release branches. Technically, we can avoid cherry picking this change if we advise users to set `honor_timestamps: false`. However, I think it is better to just remove them. ### Documentation - [x] `doc-not-needed`
…17378) Co-authored-by: huangzegui <huangzegui@didiglobal.com>
…pulsar-perf-producer (apache#17381) * [improve][cli] Add option to disable batching in pulsar-testclient * [improve][cli] Add option to disable batching in pulsar-testclient * [improve][cli] Add option to disable batching in pulsar-testclient * [improve][cli] Add option to disable batching in pulsar-testclient * [improve][cli] Add option to disable batching in pulsar-testclient * [improve][cli] Add option to disable batching in pulsar-testclient Co-authored-by: Vineeth <vineeth.polamreddy@verizonmedia.com> Co-authored-by: Rajan Dhabalia <rdhabalia@apache.org>
…sOpenCacheSetEnabled=true (apache#17465)
…ed (apache#17704) * [fix][metrics]wrong metrics text generated when label_cluster specified * improve logic branch * mark test group
Signed-off-by: tison <wander4096@gmail.com> Signed-off-by: tison <wander4096@gmail.com>
… been published after the topic gets activated on a broker (apache#16618) * Skip creating a replication snapshot if no messages have been published * Adapt test to new behavior where replication snapshots happen only when there are new messages
* Call cleanup method in finally block to ensure it's not skipped * Clear invocations for the mocks that are left around without cleanup * Cleanup PulsarService and PulsarAdmin mocks/spies in MockedPulsarServiceBaseTest * Don't record invocations at all for PulsarService and PulsarAdmin in MockedPulsarServiceBaseTest * Don't record invocations for spies by default * Simplify reseting mocks * Fix PersistentTopicTest * Fix TokenExpirationProducerConsumerTest * Fix SimpleLoadManagerImplTest * Fix FilterEntryTest
…he#17834) - use Bookkeeper defaults by setting BK_METADATA_OPTIONS=none
…7209) Fixes apache#17186 ### Motivation There are some cases in which it is useful to be able to include current position of the message when reset of cursor was made. ### Modifications * Support inclusive seek in c++ consumers. * Add a unit test to verify.
* fix: delete sqlite files after jdbc connection closed This closes apache#17713. Signed-off-by: tison <wander4096@gmail.com> * uses isolated db file Signed-off-by: tison <wander4096@gmail.com> * Revert "uses isolated db file" This reverts commit 295db3c. * close in order Signed-off-by: tison <wander4096@gmail.com> * strong order guarantee Signed-off-by: tison <wander4096@gmail.com> * factor out defer logic to avoid further bugs Signed-off-by: tison <wander4096@gmail.com> * Revert "factor out defer logic to avoid further bugs" This reverts commit f7f4634. * Revert "strong order guarantee" This reverts commit 747086f. * use awaitTermination Signed-off-by: tison <wander4096@gmail.com> Signed-off-by: tison <wander4096@gmail.com>
…AndCommitForTransaction (apache#17845) * scenario is already covered by PendingAckPersistentTest
…#17691) Co-authored-by: Marvin Cai <cai19930303@gmail.com>
…ManagedLedgerImpl (apache#17293) - a NPE with no description is confusing
…on time (apache#17790) Fixes - apache#17623 - apache#17637 ### Motivation Manually release resources, including `consumer`, `producer`, `pulsar client`, `transaction`, and `topic`. This saves `setup` and `cleanup` time before and after each method. ### Modifications - Manually release resources instead of calling `cleanup` & `setup` each method - remove useless method `markDeletePositionCheck` - `Integer.valueOf(int)` instead of `new Integer(int)`, because `new Integer(int)` is deprecated ### Matching PR in forked repository PR in forked repository: - poorbarcode#10
Fixes apache#17785 ### Motivation The `failureMap` need to be clear after run per unit test. ### Modifications Clear `failureMap` after run per unit test, and only run once `setup()`/`cleanup()` to reduce execution time. ### Matching PR in forked repository PR in forked repository: coderzc#6
…ache#17252) - fixes issue with stats where timestamps might be inconsistent because of visibility issues - fields should be volatile to ensure visibility of updated values in a consistent manner - in replication, the lastDataMessagePublishedTimestamp field in PersistentTopic might be inconsistent unless volatile is used
…he#16891) * add reader config doc * update to the versioned doc * Update site2/docs/io-debezium-source.md Co-authored-by: momo-jun <60642177+momo-jun@users.noreply.github.com> * Update site2/docs/io-debezium-source.md Co-authored-by: momo-jun <60642177+momo-jun@users.noreply.github.com> * revert changes to 2.10.1 and 2.9.3 Co-authored-by: momo-jun <60642177+momo-jun@users.noreply.github.com>
5 tasks
|
/pulsarbot run-failure-checks |
1 similar comment
Owner
Author
|
/pulsarbot run-failure-checks |
|
The pr had no activity for 30 days, mark with Stale label. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
The active state indicates whether a cursor is active. And it affects the cache hit rate, because inactive cursors will be evicted from the entry cache. This active state helps to troubleshoot issues with low cache hit rate.
Also it is meaningful for configuring a suitable values for
managedLedgerCursorBackloggedThresholdinServiceConfigurationModifications
Add active status into cursor stats.
It will get the following sample data when getting the
internalStatsof a partitioned topic.Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
Does this pull request potentially affect one of the following parts:
If
yeswas chosen, please highlight the changesDocumentation
Check the box below or label this PR directly.
Need to update docs?
doc-requireddoc-not-neededdocdoc-completeMatching PR in forked repository
PR in forked repository: #1