
Worker: retry failed fetch:plan events #1200

@josephjclark

We recently saw a run end up Lost because its fetch:plan event timed out.

This is rare, but the worker must handle it better. If the plan fails to fetch, the worker should be quite happy to back off and try again.

With these getter-style events (dataclip, plan, maybe credential) we don't have to worry about idempotence. So on a timeout, the worker should just keep retrying the event until it a) errors or b) succeeds.

I suppose the flipside of this is: if the event consistently times out, the worker should give up and return some kind of error, rather than just letting the run be Lost.
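
To make the proposal concrete, here is a rough sketch of what retry-on-timeout with backoff and a give-up limit might look like. None of these names (`retryOnTimeout`, `TimeoutError`, `RetriesExhaustedError`, `RetryOptions`) come from the worker codebase; they are hypothetical. The exponential backoff policy and the option names are assumptions too, not a decided design.

```typescript
// Hypothetical sketch: names and policy are illustrative, not the worker's API.

// Assumed to be thrown when an event gets no reply in time (safe to retry).
class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TimeoutError";
  }
}

// Thrown once retries run out, so the run can fail explicitly instead of being Lost.
class RetriesExhaustedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "RetriesExhaustedError";
  }
}

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number; // delay before the second attempt
  maxDelayMs: number; // cap on any single delay
}

// Exponential backoff: base * 2^(attempt - 1), capped at maxDelayMs.
function backoffDelay(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * 2 ** (attempt - 1), opts.maxDelayMs);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries `send` only when it times out. Any other error is rethrown at once,
// which matches "keep retrying until they a) error or b) succeed".
async function retryOnTimeout<T>(
  send: () => Promise<T>,
  opts: RetryOptions
): Promise<T> {
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await send();
    } catch (err) {
      const isTimeout = err instanceof Error && err.name === "TimeoutError";
      if (!isTimeout) throw err;
      // Back off before the next attempt, but not after the last one.
      if (attempt < opts.maxAttempts) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw new RetriesExhaustedError(
    `event timed out on all ${opts.maxAttempts} attempts`
  );
}
```

A caller would wrap only the getter-style events in something like this, for example `retryOnTimeout(() => fetchPlan(runId), opts)` where `fetchPlan` stands in for whatever actually sends fetch:plan. Events that change state would need an idempotence check first.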


Project status: DevX Backlog
