Handling batch submit/status failure for many tasks #210

The `/batch_status` and `/submit` routes currently handle errors inconsistently when a request involves many tasks.

In the case of `/batch_status`, there is no real error handling. The route returns a list of statuses for the tasks named in the `task_ids` parameter. If a task is not found, it is added to the list with `'status': 'Failed'`; otherwise it is added with the requested data for that task. The route always responds with a success response, even if every task is marked as 'Failed'.
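A minimal sketch of that current behavior, for illustration only. `batch_status_current` and `lookup` are hypothetical names (not the service's actual code), and the shape of each entry, including the `task_id` key, is an assumption. `lookup` stands in for however the service fetches a task and returns `None` when the task is not found.

```python
from typing import Callable, Dict, List, Optional, Tuple


def batch_status_current(
    task_ids: List[str],
    lookup: Callable[[str], Optional[Dict]],  # hypothetical: None means "not found"
) -> Tuple[int, List[Dict]]:
    """Sketch of today's /batch_status: one entry per requested ID, always 200."""
    statuses = []
    for task_id in task_ids:
        data = lookup(task_id)
        if data is None:
            # Unknown task: reported inline as 'Failed', with no reason or code.
            statuses.append({'task_id': task_id, 'status': 'Failed'})
        else:
            statuses.append({'task_id': task_id, **data})
    # The route reports success even if every entry above is 'Failed'.
    return 200, statuses
```

For example, `batch_status_current(['a', 'b'], {'a': {'status': 'success'}}.get)` returns a 200 with `'b'` marked `'Failed'`, which is the silent-success case described above.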

In the case of `/submit`, there is some error handling: the route responds with a 4xx/5xx if any of the submitted tasks fails during submission. This means that even if some tasks in the request were submitted successfully, the client receives only a failure, with no information about which tasks succeeded or why the others failed.
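A comparable sketch of the all-or-nothing `/submit` behavior. `submit_current` and `launch` are hypothetical, the 500 code is a placeholder for "some 4xx/5xx", and stopping at the first failure is an assumption: this issue only says a failure is returned if any task fails.

```python
from typing import Callable, Dict, List, Tuple


def submit_current(
    tasks: List[Dict],
    launch: Callable[[Dict], str],  # hypothetical: returns a task UUID or raises
) -> Tuple[int, Dict]:
    """Sketch of today's all-or-nothing /submit."""
    task_uuids = []
    for task in tasks:
        try:
            task_uuids.append(launch(task))
        except Exception:
            # Assumption: the route gives up here. The UUIDs already collected
            # belong to tasks that were launched, but the client never sees them.
            return 500, {'status': 'Failed'}
    return 200, {'status': 'Success', 'task_uuids': task_uuids, 'task_uuid': ''}
```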

This is particularly tricky for the `/submit` endpoint. When a user submits a single task, the ideal is a clear, readable error. But because a single submission is internally just a batch submit of one task, its behavior must also stay consistent with the many-task case, where some tasks fail and others succeed.

My proposal: if an internal error prevents the request from being processed at all, or if status/submit fails for every task, send back a 4xx/5xx with an error. If status/submit succeeds for at least one task, send back a 200 with a list of the success or failure of each individual task. For both status and submit, if a task fails, continue with the remaining tasks in the batch until all of them have been tried.
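A minimal sketch of the proposed loop, assuming a hypothetical `handle` callable that performs status/submit for one task and a hypothetical `TaskError` exception carrying an HTTP code. Every task is tried, and an error code is returned only when all of them fail. Using the first failure's code in that case is an arbitrary choice made for this sketch.

```python
from typing import Callable, Dict, List, Tuple


class TaskError(Exception):
    """Hypothetical per-task failure that carries its own HTTP status code."""

    def __init__(self, reason: str, http_status_code: int = 500):
        super().__init__(reason)
        self.reason = reason
        self.http_status_code = http_status_code


def process_batch(
    task_ids: List[str],
    handle: Callable[[str], Dict],  # hypothetical: status/submit for one task
) -> Tuple[int, List[Dict]]:
    """Try every task, then choose the response code from the outcomes."""
    results = []
    for task_id in task_ids:
        # Exceptions other than TaskError are request-level errors and propagate.
        try:
            extra = handle(task_id)
        except TaskError as err:
            # Record this task's failure and carry on with the rest of the batch.
            results.append({'status': 'Failed', 'task_uuid': task_id,
                            'reason': err.reason,
                            'http_status_code': err.http_status_code})
            continue
        results.append({**extra, 'status': 'Success', 'task_uuid': task_id})
    failures = [r for r in results if r['status'] == 'Failed']
    if results and len(failures) == len(results):
        # Every task failed: return an error code (here, the first failure's).
        return failures[0]['http_status_code'], results
    # At least one task succeeded (or the batch was empty): 200 plus the list.
    return 200, results
```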

The main advantage of this approach is readable errors for simple, single-task submissions. The drawback is that it could be confusing for a 4xx/5xx to be returned when all tasks fail, but a 200 when only some tasks fail.

# Proposed Changes

The `/submit` route response would change from

```
{'status': 'Success',
'task_uuids': ['a', 'b'],
'task_uuid': ""}
```
(The response code is usually 200, unless some or all task launches fail.)

to
```
{
'response': 'batch',
'results': [
  {
    'status': 'Success',
    'task_uuid': 'a',
    'http_status_code': 200
  },
  {
    'status': 'Failed',
    'code': 1,
    'task_uuid': 'b',
    'reason': 'human readable reason',
    'http_status_code': 4XX/5XX,
    ...
  },
  ...
]
}
```
(The response code here would be 207 Multi-Status, since some tasks succeeded and some failed. If every task succeeded, it would be 200. If an internal error caused every task to fail, it would be some 5XX.)
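One way to express the top-level status rule from this parenthetical, as a sketch. The function name is hypothetical, and the choice between 500 and 400 when every task fails is an assumption, since the issue only specifies a 5XX for internal errors.

```python
from typing import Dict, List, Tuple


def batch_response(results: List[Dict]) -> Tuple[int, Dict]:
    """Pick the top-level status code for a batch and wrap the per-task results."""
    ok = [r for r in results if 200 <= r['http_status_code'] < 300]
    if len(ok) == len(results):
        code = 200  # every task succeeded
    elif ok:
        code = 207  # Multi-Status: some tasks succeeded, some failed
    elif any(r['http_status_code'] >= 500 for r in results):
        code = 500  # assumption: any server-side failure makes the batch a 5XX
    else:
        code = 400  # assumption: every failure was a client error
    return code, {'response': 'batch', 'results': results}
```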

When the funcX SDK receives such a batch response, it would store all failed task submissions in its local table, to be retrieved with `get_result` or `get_batch_result`. These failures would not be saved on the service side.
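A minimal sketch of that client-side bookkeeping. `LocalResultTable`, `TaskSubmissionFailed`, and the `fetch_remote` callable are all hypothetical; this is not the funcX SDK's actual `get_result` implementation, only an illustration of checking locally recorded submit failures before asking the service.

```python
from typing import Callable, Dict, List


class TaskSubmissionFailed(Exception):
    """Hypothetical error raised for a task that never reached the service."""


class LocalResultTable:
    """Hypothetical client-side store for tasks that failed at submit time."""

    def __init__(self) -> None:
        self._failed: Dict[str, Dict] = {}

    def record(self, batch: Dict) -> List[str]:
        """Store failed entries locally; return the UUIDs that were submitted."""
        submitted = []
        for entry in batch['results']:
            if entry['status'] == 'Success':
                submitted.append(entry['task_uuid'])
            else:
                self._failed[entry['task_uuid']] = entry
        return submitted

    def get_result(self, task_uuid: str, fetch_remote: Callable[[str], object]) -> object:
        """Raise for locally recorded failures; otherwise ask the service."""
        failure = self._failed.get(task_uuid)
        if failure is not None:
            raise TaskSubmissionFailed(failure.get('reason', 'submission failed'))
        return fetch_remote(task_uuid)
```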

I think similar changes would fit the `/batch_status` route, though they wouldn't need to be as drastic. Each status object in the list would be an "HTTP response object" of its own, like the `/submit` entries above.
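A sketch of `/batch_status` returning one response object per task, under stated assumptions: `lookup` is a hypothetical callable returning `None` for unknown tasks, 404 is an assumed code for "task not found", and nesting the task's own data under a hypothetical `details` key (so it cannot overwrite the per-entry `status`) is this sketch's choice, not part of the proposal. The top-level HTTP code could follow the same 200/207/5XX rule described for `/submit`.

```python
from typing import Callable, Dict, List, Optional


def batch_status_proposed(
    task_ids: List[str],
    lookup: Callable[[str], Optional[Dict]],  # hypothetical: None means "not found"
) -> Dict:
    """Build a batch response with one HTTP-style object per requested task."""
    results = []
    for task_id in task_ids:
        data = lookup(task_id)
        if data is None:
            # Assumed code and wording for an unknown task.
            results.append({'status': 'Failed', 'task_uuid': task_id,
                            'reason': 'task not found', 'http_status_code': 404})
        else:
            # The task's own fields live under 'details' in this sketch.
            results.append({'status': 'Success', 'task_uuid': task_id,
                            'http_status_code': 200, 'details': data})
    return {'response': 'batch', 'results': results}
```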