Skip to content

Conversation

@CasperGN
Copy link
Contributor

@CasperGN CasperGN commented Dec 18, 2025

Description

Implement default retry policy and allow overwriting during agent instantiation.

Issue reference

We strive to have all PR being opened based on an issue, where the problem or feature have been discussed prior to implementation.

Please reference the issue this PR closes: #221

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

Note: We expect contributors to open a corresponding documentation PR in the dapr/docs repository. As the implementer, you are the best person to document your work! Implementation PRs will not be merged until the documentation PR is opened and ready for review.

Signed-off-by: Casper Nielsen <casper@diagrid.io>
…olicy

Signed-off-by: Casper Nielsen <casper@diagrid.io>
Signed-off-by: Casper Nielsen <casper@diagrid.io>
…d each activity gets applied retry policy

Signed-off-by: Casper Nielsen <casper@diagrid.io>
Signed-off-by: Casper Nielsen <casper@diagrid.io>
Copy link
Collaborator

@bibryam bibryam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casper, thank you for working on this feature. I love this feature!

I think we should make sure that these attributes that are all related to each other in the context of a retry should be somehow grouped together.

I was thinking that we could add this within the agent execution config, such as this

 execution=AgentExecutionConfig(
            max_iterations=5
            retry_max_attempts=10,
            retry_initial_backoff=5,
            retry_max_backoff=45,
            retry_backoff_multiplier=1.5,
        ),

but this does not apply to regular agents. As a result, we cannot include it there.

This is the best I can come up with.

        retry=DurableRetryConfig(
            max_attempts=10,
            initial_backoff=5,
            max_backoff=45,
            backoff_multiplier=1.5,
        ),

This has few benefits:

  1. It shows the fields that are related to each other.
  2. They are about retry behaviour
  3. It makes clear that this is a durable retry different than the in-memory retries,
  4. This config can be set to durable agents only

HTH

Signed-off-by: Casper Nielsen <casper@diagrid.io>
Signed-off-by: Casper Nielsen <casper@diagrid.io>
@CasperGN CasperGN requested a review from bibryam December 18, 2025 14:07
@CasperGN
Copy link
Contributor Author

@bibryam have a look at the updates

Copy link
Collaborator

@sicoyle sicoyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few comments so far 🙏



@dataclass
class DurableRetryConfig:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be more workflow-specific. Something like WorkflowRetryConfig might be clearer, since this behavior only applies to the durable agent anyway, which makes “durable” feel a bit redundant here.

On retries more broadly, should we consider introducing a resiliency policy in our quickstarts—or at least documenting it—so users know how to apply retry policies for state operations and pub/sub? For example, we could reference the existing Dapr resiliency quickstart or general docs on this:
https://docs.dapr.io/getting-started/quickstarts/resiliency/resiliency-state-quickstart/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll relabel this WorkflowRetryPolicy to be explicit on it

Comment on lines +282 to +285
max_attempts: Optional[int] = 1
initial_backoff_seconds: Optional[int] = 5
max_backoff_seconds: Optional[int] = 30
backoff_multiplier: Optional[float] = 1.5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason there are four fields here yet docs show 5 options we can set?
https://docs.dapr.io/developing-applications/building-blocks/workflow/workflow-features-concepts/#retry-policies

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an oversight on my part. I've added the 5th as well!

Comment on lines +145 to +147
raise (
ValueError("max_attempts or DAPR_API_MAX_RETRIES must be at least 1.")
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should consider if we want to deviate here. The default that i see in docs is actually a 0 value for this... which I think for durable agent durability we want retries enabled, just want confirmation on that :)
https://docs.dapr.io/developing-applications/sdks/python/python-client/#initialising-the-client

Copy link
Contributor Author

@CasperGN CasperGN Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to check on 1, else it might fail upstream

Signed-off-by: Casper Nielsen <casper@diagrid.io>
Signed-off-by: Casper Nielsen <casper@diagrid.io>
Signed-off-by: Casper Nielsen <casper@diagrid.io>
backoff_multiplier: Multiplier for exponential backoff.
"""

max_attempts: Optional[int] = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is asking for infinite retries by default. In dapr we have -1 in places to signal infinity. Would it make sense to allow for infinite retries here in that same manner as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that's not how it is in the upstream sdk

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so as-is, there is no way to specify retries indefinitely? maybe we should tweak the upstream sdk eventually to support this :)

this PR is a good start!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cicoyle yes I concur. This should be fixed upstream in the python-sdk then carried down here. I'll put it on our todo list!

Signed-off-by: Casper Nielsen <casper@diagrid.io>
@CasperGN
Copy link
Contributor Author

@sicoyle, @cicoyle I've added dapr/python-sdk#870 for upstream fixing

Copy link
Collaborator

@sicoyle sicoyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thank you!!

@sicoyle sicoyle merged commit f649402 into dapr:main Dec 18, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Durable Agent lacks default retry for activities

4 participants