473 changes: 473 additions & 0 deletions source/client-backpressure/client-backpressure.md

This specification introduces the overload retry policy but, similarly to the design, omits a very important piece: how should the current retry policy and the overload retry policy coexist? At the very least, the specification should cover the following (generally speaking, it is better to do this by clearly expressing a principle from which the answers can easily be derived, rather than by answering each question explicitly, as there may be further questions that I have not thought of yet):

  1. Is it possible to encounter a failed attempt that is eligible for a retry under both the current and the overload policy?
    1.1. I suspect it currently is not, because the overload retry policy for now requires both RetryableError and SystemOverloadedError to be present. However, the specification should make the answer clear.
  2. What happens if the first attempt (so not a retry attempt) fails in a way that triggers a retry attempt according to the overload retry policy, and then the second attempt (the first retry attempt) fails in a way that could have triggered a retry attempt according to the current retry policy?
    2.1. The same question applies to two attempts a(n), a(n+1), where the latter immediately¹ follows the former and the former, a(n), is not the first attempt.
    2.1.1. Note that such a situation may be encountered more than once for a single operation.
  3. What happens if the first attempt (so not a retry attempt) fails in a way that triggers a retry attempt according to the current retry policy, and then the second attempt (the first retry attempt) fails in a way that could have triggered a retry attempt according to the overload retry policy?
    3.1. The same question applies to two attempts a(n), a(n+1), where the latter immediately¹ follows the former and the former, a(n), is not the first attempt.
    3.1.1. Note that such a situation may be encountered more than once for a single operation.
  4. The current retry policies for reads and writes specify which error is to be propagated to an application (if all attempts fail, there are multiple errors to choose from). The proposed overload retry policy does not do this even within itself; it should further specify which error is to be propagated to an application when some attempts of the same requested operation are made according to the current retry policy while others are made according to the overload retry policy.

¹ In terms of ordering relations, not in the temporal sense.

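To make the shape of this concern concrete, here is a purely hypothetical retry-loop sketch. The helper names and control flow are illustrative and appear in neither specification; they only mark the decision points the questions above ask about:

```javascript
// Hypothetical sketch only -- not the algorithm of either retry specification.
// isRetryableUnderCurrentPolicy and chooseErrorToPropagate are made-up helpers standing
// in for behavior the specification would need to define.
async function executeWithRetries(runAttempt, isRetryableUnderCurrentPolicy, chooseErrorToPropagate) {
  const errors = [];
  for (let attempt = 1; attempt <= 10 /* illustrative cap */; attempt++) {
    try {
      return await runAttempt();
    } catch (err) {
      errors.push(err);
      // Question 1: can both of these be true for the same failed attempt?
      const overloadEligible =
        err.hasErrorLabel('RetryableError') && err.hasErrorLabel('SystemOverloadedError');
      const currentEligible = isRetryableUnderCurrentPolicy(err);
      // Questions 2-3: consecutive attempts may be eligible under different policies;
      // which policy's attempt limit and backoff govern the next attempt?
      if (!overloadEligible && !currentEligible) {
        break;
      }
    }
  }
  // Question 4: which of the accumulated errors is propagated to the application?
  throw chooseErrorToPropagate(errors);
}
```
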

61 changes: 61 additions & 0 deletions source/client-backpressure/tests/README.md
@@ -0,0 +1,61 @@
# Client Backpressure Tests

______________________________________________________________________

## Introduction

The YAML and JSON files in this directory are platform-independent tests meant to exercise a driver's implementation of
client backpressure. These tests utilize the [Unified Test Format](../../unified-test-format/unified-test-format.md).

Several prose tests, which are not easily expressed in YAML, are also presented in this file. Those tests will need to
be manually implemented by each driver.

### Prose Tests

#### Test 1: Operation Retry Uses Exponential Backoff

Drivers should test that retries do not occur immediately when a SystemOverloadedError is encountered.

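The two runs differ only in what the jitter source returns. For reference, here is a minimal sketch of the kind of jittered exponential backoff the test relies on; the function name and the 100 ms base delay are assumptions for illustration, not requirements stated in this file:

```javascript
// Hypothetical sketch: the maximum delay doubles per retry from an assumed 100 ms base,
// and the jitter RNG scales it. Pinning rng() to 0 removes all delay ("no backoff");
// pinning it to 1 yields the full, un-jittered delay on every retry.
function backoffDelayMs(retryNumber, rng) {
  const BASE_DELAY_MS = 100; // assumed base delay
  const maxDelayMs = BASE_DELAY_MS * 2 ** retryNumber;
  return rng() * maxDelayMs;
}
```
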
1. Let `client` be a `MongoClient`.

2. Let `collection` be a collection.

3. Now, run the insert, first without backoff and then with backoff:

   1. Configure the random number generator used for jitter to always return `0` -- this effectively disables
      backoff.

   2. Configure the following failPoint (a sketch of enabling and cleaning it up appears after this list):

      ```javascript
      {
        configureFailPoint: 'failCommand',
        mode: 'alwaysOn',
        data: {
          failCommands: ['insert'],
          errorCode: 2,
          errorLabels: ['SystemOverloadedError', 'RetryableError']
        }
      }
      ```

   3. Insert the document `{ a: 1 }`. Expect that the command errors. Measure the duration of the command
      execution.

      ```javascript
      const start = performance.now();
      expect(
        await collection.insertOne({ a: 1 }).catch(e => e)
      ).to.be.an.instanceof(MongoServerError);
      const end = performance.now();
      ```

   4. Configure the random number generator used for jitter to always return `1`.

   5. Execute step 3 again.

   6. Compare the elapsed times of the two runs:

      ```python
      # elapsed times are expressed in seconds
      assertTrue(with_backoff_time - no_backoff_time >= 2.1)
      ```

      The sum of the 5 backoffs is 3.1 seconds (see the worked example below this list). The 1-second window
      accounts for potential variance between the two runs.
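
For reference, the 3.1-second figure is consistent with five maximum backoff delays doubling from a 100 ms base. That schedule is an assumption used here for illustration, not something this file mandates:

```javascript
// Assumed maximum per-retry delays (ms), doubling from a 100 ms base:
// 100 + 200 + 400 + 800 + 1600 = 3100 ms.
// With the jitter RNG pinned to 1 the full 3.1 s of delay is incurred; pinned to 0, none of it is.
// Subtracting the 1-second variance margin yields the 2.1 s threshold asserted above.
const maxDelaysMs = [100, 200, 400, 800, 1600];
const totalMs = maxDelaysMs.reduce((sum, d) => sum + d, 0); // 3100
console.assert(totalMs - 1000 === 2100);                    // 3.1 s - 1 s margin = 2.1 s
```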

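How the fail point is applied and removed is left to each driver's test harness. One possible shape, assuming the `client` from step 1 and a `failPoint` variable holding the document from step 2 (`configureFailPoint` is the standard server test command; the surrounding structure is only a sketch):

```javascript
// Hypothetical harness sketch -- adapt to each driver's test framework.
await client.db('admin').command(failPoint); // enable the fail point from step 2
try {
  // ... run the timed inserts from steps 3-5 here ...
} finally {
  // Always disable the fail point so subsequent tests are unaffected.
  await client.db('admin').command({ configureFailPoint: 'failCommand', mode: 'off' });
}
```
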
## Changelog

- 2025-XX-XX: Initial version.

Is this a TODO item you'll update before merging? I think most files just create an empty changelog for new specs.
