Skip to content

Conversation

@tor-oscar
Copy link

A new attempt at adding error handling to the Source.run/1 call.

This has been attempted twice before, #155 and #153. Both were (implicitly) closed due to me being dragged away for a very long time, sorry about that.

If the Source.run/1 function returns an error tuple, raises an error or takes longer than Dataloader.dataloader_timeout/1 the Source is removed from :sources field and the key :error with the reason as value is added.

This causes a variety of issues down the line, among them the misleading/unhelpful error message:

** (RuntimeError) Source does not exist: <failed_source_name>

Registered sources are:

[<other_source_names...> , :error]

This PR intends to preserve the source name key and replace the source implementation with {:error, reason}instead of replacing the source key value pair with the error key value pair.

There is no reasonable recovery in case of a failure. This is motivated by a two points:

  • There is no way to clear the failing batches from a Source (currently), any future Source.run/1 will
    retry the same batches with potential additional batches. If the issue causing a timeout is not
    resolved between runs it will incur severe delays to full response.
  • If future Source.run/1 on failed Sources would be allowed, it is not possible to distinguish
    between failures of future batch/key and the previous Source error without keeping track of
    which batch was part of which run. Not distinguishing between different errors could be
    very detrimental during debugging.

It works seamlessly for Dataloader.get/4 and Dataloader.get_many/4 as they will simply handle the error tuple as any other error tuple.

It works fine for pending_batches? as source that has returned an error does not have any pending batches, ie false is a reasonable return value.

It is a little icky with Dataloader.put/5 since it is currently no perceivable difference between calling it on an errored source and not. The error will not be triggered until Dataloader.get/4 or Dataloader.get_many/4 is called.

It is a little icky with Dataloader.load/4 and Dataloader.load_many/4 since it is currently no perceivable difference between calling it on an errored source and not. The error will not be triggered until Dataloader.get/4 or Dataloader.get_many/4 is called.

Additional changes

  • Added test of the timeouts and general error handling
  • Added test describing the max_timeout feature
  • Made the added margin to the timeout configurable to support testing without waiting seconds.

Thoughts?

@tor-oscar
Copy link
Author

@benwilson512 Do you have any comments or opinions on this approach to handling Source errors?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant