[improve][ci] Disable test that causes OOME until the problem has been resolved #22586
In one of the heap dumps, there were 251,029 lambdas which all reference a
Using https://github.com/vlsi/mat-calcite-plugin to query the heap dump.
select this['arg$2.completeTopicName'], count(*) from "org.apache.pulsar.broker.resources.NamespaceResources$PartitionedTopicResources$$Lambda$1819+0x00007f08a8b65ee8" group by 1

In another heap dump:
select this['arg$2.completeTopicName'], count(*) from "org.apache.pulsar.broker.resources.NamespaceResources$PartitionedTopicResources$$Lambda$3405+0x00007fae50f7b000" group by 1

There are a few recent replicator-related changes: #21946, #21948 and #22537. @poorbarcode please check if one of the changes is triggering the OOME issue, possibly related to deletion. There are a lot of entries for

Just wondering if the problem is somehow related to namespace deletion with replication enabled. The concurrency issue is explained in #22541 (comment)

The namespace deletion in the test might be the code that triggers the problem:
@poorbarcode do you have a chance to debug this issue?

There are more problems. Using the heap dump from https://github.com/apache/pulsar/actions/runs/8835173621/attempts/1?pr=22583:
select toString(this['stack.fn.arg$1']), count(*) from java.util.concurrent.CompletableFuture where this['stack.fn'] is not null group by 1 order by 2 desc
select toString(this['result.ex.detailMessage']), count(*) from java.util.concurrent.CompletableFuture where this['result.ex.detailMessage'] is not null group by 1 order by 2 desc
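
The last query groups CompletableFuture instances by the message of the exception they completed with and orders the groups by size. A rough in-process analogue of that aggregation might look like the sketch below. It is an illustration only: the class name `FutureFailureHistogram`, its methods and the sample messages are hypothetical and do not come from Pulsar, and it works on live futures rather than on a heap dump.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Pulsar: counts failed futures per exception message.
public class FutureFailureHistogram {

    // Returns message -> count for exceptionally completed futures, most frequent first.
    static Map<String, Long> countByMessage(List<CompletableFuture<?>> futures) {
        Map<String, Long> counts = futures.stream()
                .filter(CompletableFuture::isCompletedExceptionally)
                .map(FutureFailureHistogram::failureMessage)
                .collect(Collectors.groupingBy(m -> m, Collectors.counting()));
        // Re-collect into a LinkedHashMap so the descending order is preserved.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    // Extracts the failure message; a missing message becomes the string "null".
    static String failureMessage(CompletableFuture<?> future) {
        try {
            future.get(); // does not block: callers only pass completed futures
            return "";
        } catch (ExecutionException e) {
            return String.valueOf(e.getCause().getMessage());
        } catch (Exception e) { // cancellation or interruption
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        // Made-up sample data: three failed futures and one successful one.
        CompletableFuture<String> a1 = new CompletableFuture<>();
        a1.completeExceptionally(new IllegalStateException("error A"));
        CompletableFuture<String> a2 = new CompletableFuture<>();
        a2.completeExceptionally(new IllegalStateException("error A"));
        CompletableFuture<String> b = new CompletableFuture<>();
        b.completeExceptionally(new IllegalStateException("error B"));
        CompletableFuture<String> ok = CompletableFuture.completedFuture("done");

        System.out.println(countByMessage(Arrays.asList(a1, a2, b, ok)));
        // prints {error A=2, error B=1}
    }
}
```

Unlike the heap dump query, this only sees futures that the caller still holds a reference to, so it is useful for reasoning about the aggregation, not for finding the leak itself.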
Motivation
Unit test group 1 fails often with OOME. (example)
Modifications
The issue is most likely related to #21495 and org.apache.pulsar.broker.service.ReplicatorSubscriptionTest#testWriteMarkerTaskOfReplicateSubscriptions.
Disable the test until the problem has been resolved.
Documentation
doc
doc-required
doc-not-needed
doc-complete