
LeaseIdMismatchWithLeaseOperation and storage performance warnings #460

@kcr-aveva

Description

We are using Durable Task version 3.1.0.

We are seeing a large number of storage performance warnings such as the ones below, as well as high CPU usage that never recovers. The high CPU usage seems to coincide with these warnings appearing:

2026-01-03T20:32:52.895174533Z warn: DurableTask.Netherite.FasterStorage.Performance[0]
2026-01-03T20:32:52.895178891Z       Part29 Performance issue detected: storage operation PageBlobClient.DownloadStreamingAsync (ReadFromDevice) timed out on attempt 4 after 24.2s, retrying now; target=c897ae6d-7da5-4c58-9c4e-41cbcf7ee04c/p29/store.obj/store.obj.1 id=2843598 position=15695872 length=2560 operationReadRange=[15695872, 15698432]
2026-01-03T20:32:52.895470184Z warn: DurableTask.Netherite.FasterStorage.Performance[0]
2026-01-03T20:32:52.895517954Z       Part29 Performance issue detected: storage operation PageBlobClient.DownloadStreamingAsync (ReadFromDevice) timed out on attempt 4 after 24.2s, retrying now; target=c897ae6d-7da5-4c58-9c4e-41cbcf7ee04c/p29/store.obj/store.obj.1 id=2843640 position=18891264 length=2560 operationReadRange=[18891264, 18893824]
2026-01-03T20:32:52.895538112Z warn: DurableTask.Netherite.FasterStorage.Performance[0]
2026-01-03T20:32:52.895586763Z       Part29 Performance issue detected: storage operation PageBlobClient.DownloadStreamingAsync (ReadFromDevice) timed out on attempt 4 after 24.2s, retrying now; target=c897ae6d-7da5-4c58-9c4e-41cbcf7ee04c/p29/store.obj/store.obj.1 id=2843657 position=18571776 length=2560 operationReadRange=[18571776, 18574336]

We also have the following dispose error:

2026-01-03T19:33:08.801996474Z       Part19 !!! Dispose Task FasterKV.Dispose timed out after 00:02:00 in PartitionErrorHandler.DisposeAsync:  terminatePartition=False

At the point where the CPU usage spikes, we see the following exception information:

2026-01-03T20:34:41.795977457Z warn: DurableTask.Netherite.EventHubsTransport[0]
2026-01-03T20:34:41.796015338Z       EventHubsProcessor loadmonitor received event hubs error indication: Azure.Messaging.EventHubs.EventHubsException(GeneralError): WARNING: A load balancing cycle has taken too long to complete.  A slow cycle can cause stability issues with partition ownership.  Consider investigating storage latency and thread pool health.  Common causes are soft delete being enabled for storage and too many partitions owned.  You may also want to consider increasing the 'PartitionOwnershipExpirationInterval' in the processor options.  Cycle Duration: '178.90' seconds.  Partition Ownership Interval '1:20' seconds. (loadmonitor).  For troubleshooting information, see https://aka.ms/azsdk/net/eventhubs/exceptions/troubleshoot
2026-01-03T20:34:41.796022371Z       
2026-01-03T20:34:41.803073599Z warn: DurableTask.Netherite[0]
2026-01-03T20:34:41.803095390Z       Part29 !!! Could not release partition lease during shutdown in MaintenanceLoopAsync: Azure.RequestFailedException: The lease ID specified did not match the lease ID for the blob.
2026-01-03T20:34:41.803102543Z       RequestId:704d58e4-001e-0017-3af0-7c66f2000000
2026-01-03T20:34:41.803121879Z       Time:2026-01-03T20:34:41.8038929Z
2026-01-03T20:34:41.803126508Z       Status: 409 (The lease ID specified did not match the lease ID for the blob.)
2026-01-03T20:34:41.803134062Z       ErrorCode: LeaseIdMismatchWithLeaseOperation
2026-01-03T20:34:41.803138000Z       
2026-01-03T20:34:41.803141987Z       Content:
2026-01-03T20:34:41.803146285Z       <?xml version="1.0" encoding="utf-8"?><Error><Code>LeaseIdMismatchWithLeaseOperation</Code><Message>The lease ID specified did not match the lease ID for the blob.
2026-01-03T20:34:41.803588563Z       RequestId:704d58e4-001e-0017-3af0-7c66f2000000
2026-01-03T20:34:41.803597490Z       Time:2026-01-03T20:34:41.8038929Z</Message></Error>
2026-01-03T20:34:41.803603732Z       
2026-01-03T20:34:41.803609783Z       Headers:
2026-01-03T20:34:41.803615523Z       Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
2026-01-03T20:34:41.803621014Z       x-ms-request-id: 704d58e4-001e-0017-3af0-7c66f2000000
2026-01-03T20:34:41.803645199Z       x-ms-client-request-id: 80407b4b-1a35-48b5-8980-3b717a2360b5
2026-01-03T20:34:41.803650910Z       x-ms-version: 2024-11-04
2026-01-03T20:34:41.803656310Z       x-ms-error-code: LeaseIdMismatchWithLeaseOperation
2026-01-03T20:34:41.803661961Z       Date: Sat, 03 Jan 2026 20:34:41 GMT
2026-01-03T20:34:41.803667451Z       Content-Length: 265
2026-01-03T20:34:41.803672921Z       Content-Type: application/xml
2026-01-03T20:34:41.803678662Z        terminatePartition=False
2026-01-03T20:34:42.031180491Z warn: DurableTask.Netherite.EventHubsTransport[0]
2026-01-03T20:34:42.031223521Z       EventHubsProcessor partitions received event hubs error indication: Azure.Messaging.EventHubs.EventHubsException(GeneralError): WARNING: A load balancing cycle has taken too long to complete.  A slow cycle can cause stability issues with partition ownership.  Consider investigating storage latency and thread pool health.  Common causes are soft delete being enabled for storage and too many partitions owned.  You may also want to consider increasing the 'PartitionOwnershipExpirationInterval' in the processor options.  Cycle Duration: '183.04' seconds.  Partition Ownership Interval '1:20' seconds. (partitions).  For troubleshooting information, see https://aka.ms/azsdk/net/eventhubs/exceptions/troubleshoot
2026-01-03T20:34:42.031231777Z       
2026-01-03T20:34:42.031238209Z warn: DurableTask.Netherite[0]
2026-01-03T20:34:42.031245222Z       Part09 !!! EventHubsProcessor shut down before partition fully started in StartPartitionAsync:  terminatePartition=True
2026-01-03T20:34:42.031833323Z warn: DurableTask.Netherite[0]
2026-01-03T20:34:42.031883397Z       Part26 !!! EventHubsProcessor shut down before partition fully started in StartPartitionAsync:  terminatePartition=True
2026-01-03T20:34:42.115906570Z warn: DurableTask.Netherite[0]
2026-01-03T20:34:42.115957877Z       Part29 !!! EventHubsProcessor shut down before partition fully started in StartPartitionAsync:  terminatePartition=True
2026-01-03T20:34:42.116294146Z warn: DurableTask.Netherite.EventHubsTransport[0]
2026-01-03T20:34:42.116312290Z       EventHubsProcessor partitions/29 received packets for closed processor, discarded
2026-01-03T20:34:42.117280554Z warn: DurableTask.Netherite.EventHubsTransport[0]
2026-01-03T20:34:42.117300562Z       EventHubsProcessor partitions/29 received event hubs error indication: Azure.Messaging.EventHubs.EventHubsException(GeneralError): An error was encountered while executing developer-provided code to process events.  On most hosts, this will fault the task responsible for partition processing, causing it to be restarted from the last checkpoint.  On some hosts, it may crash the process.  It is very strongly advised that all developer-provided code include a try/catch wrapper and ensure that no exceptions are allowed to propagate up the stack.  Exception details can be found in the inner exception. (partitions).  For troubleshooting information, see https://aka.ms/azsdk/net/eventhubs/exceptions/troubleshoot
2026-01-03T20:34:42.117344073Z       
2026-01-03T20:34:42.117383597Z       System.NullReferenceException: Object reference not set to an instance of an object.
2026-01-03T20:34:42.117430886Z          at DurableTask.Netherite.EventHubsTransport.PartitionProcessor.DurableTask.Netherite.EventHubsTransport.IEventProcessor.ProcessEventBatchAsync(IEnumerable`1 packets, CancellationToken cancellationToken) in /_/src/DurableTask.Netherite/TransportLayer/EventHubs/PartitionProcessor.cs:line 540
2026-01-03T20:34:42.117440534Z          at DurableTask.Netherite.EventHubsTransport.EventProcessorHost.OnProcessingEventBatchAsync(IEnumerable`1 events, EventProcessorPartition partition, CancellationToken cancellationToken) in /_/src/DurableTask.Netherite/TransportLayer/EventHubs/EventProcessorHost.cs:line 137
2026-01-03T20:34:42.117447968Z          at Azure.Messaging.EventHubs.Primitives.EventProcessor`1.ProcessEventBatchAsync(TPartition partition, IReadOnlyList`1 eventBatch, Boolean dispatchEmptyBatches, CancellationToken cancellationToken)
2026-01-03T20:34:42.117454420Z          at Azure.Messaging.EventHubs.Primitives.EventProcessor`1.<>c__DisplayClass75_0.<<CreatePartitionProcessor>g__performProcessing|1>d.MoveNext()
2026-01-03T20:34:42.117460181Z       --- End of stack trace from previous location ---
2026-01-03T20:34:42.117465971Z          at Azure.Messaging.EventHubs.Primitives.EventProcessor`1.<>c__DisplayClass75_0.<<CreatePartitionProcessor>g__performProcessing|1>d.MoveNext()
2026-01-03T20:34:42.117472534Z       --- End of stack trace from previous location ---
2026-01-03T20:34:42.117479387Z          at Azure.Messaging.EventHubs.Primitives.EventProcessor`1.<>c__DisplayClass75_0.<<CreatePartitionProcessor>g__performProcessing|1>d.MoveNext()
2026-01-03T20:34:42.117485979Z       --- End of stack trace from previous location ---
2026-01-03T20:34:42.117492341Z          at Azure.Messaging.EventHubs.Primitives.EventProcessor`1.<>c__DisplayClass97_0.<<StopProcessingPartitionAsync>b__0>d.MoveNext()
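
The load-balancing warning above lists blob soft delete as a common cause, so one thing we can check is whether it is enabled on the task hub's storage account. A minimal sketch using Azure.Storage.Blobs (the connection string below is a placeholder):

```csharp
// Hedged sketch: report whether blob soft delete is enabled on the task hub
// storage account. "UseDevelopmentStorage=true" is a placeholder connection
// string; substitute the real one.
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

var service = new BlobServiceClient("UseDevelopmentStorage=true");
var properties = (await service.GetPropertiesAsync()).Value;
Console.WriteLine(
    $"soft delete enabled: {properties.DeleteRetentionPolicy.Enabled}, " +
    $"retention days: {properties.DeleteRetentionPolicy.Days}");
```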

Could you let us know whether there is a solution to this, or whether we need a different configuration in our setup?
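
For example, would increasing 'PartitionOwnershipExpirationInterval', as the warning suggests, help? If we had direct access to the processor options, that change would look roughly like the sketch below; we are not sure how, or whether, Netherite surfaces these settings, so the option names here follow Azure.Messaging.EventHubs:

```csharp
// Hypothetical sketch: tuning the processor options the warning refers to.
// We don't know whether Netherite exposes these settings to applications.
using System;
using Azure.Messaging.EventHubs;

var options = new EventProcessorClientOptions
{
    // The log reports cycle durations near 180s against an ownership
    // interval of 1:20, so presumably the interval needs to comfortably
    // exceed the observed cycle duration.
    PartitionOwnershipExpirationInterval = TimeSpan.FromMinutes(4),
    LoadBalancingUpdateInterval = TimeSpan.FromSeconds(30),
};
```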
