
Conversation

@slinkydeveloper


Added timeout to the delivery spec. For more details: knative/eventing#5149

Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
@google-cla google-cla bot added the `cla: yes` label on Mar 25, 2021
@knative-prow-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign tayarani after the PR has been reviewed.
You can assign the PR to them by writing /assign @tayarani in a comment when ready.


@knative-prow-robot knative-prow-robot added the `size/XS` label on Mar 25, 2021
| Field | Type | Required? | Description | Constraints |
| --- | --- | --- | --- | --- |
| `deadLetterSink` | [`duckv1.Destination`](#duckv1.Destination) | Optional | The sink receiving events that could not be sent to a `Destination`. | |
| `retry` | `int` | Optional | The minimum number of retries the sender should attempt when sending an event before moving it to the dead letter sink (when specified) or discarding it (otherwise). | |
| `backoffPolicy` | `string` | Optional | The retry backoff policy (`linear` or `exponential`). | |
| `timeout` | `string` | Optional | The timeout of each single request. | |
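
To make the new field concrete, here is a minimal, hypothetical Go sketch of a delivery duck type carrying `timeout` alongside the fields above. The struct and the ISO-8601 duration value (`PT5S`) are illustrative assumptions for this example, not the normative spec or the actual `duckv1` types.

```go
// Illustrative sketch only; not the normative spec or the real duckv1 types.
package main

import "fmt"

// Destination is a simplified stand-in for duckv1.Destination.
type Destination struct {
	URI string `json:"uri,omitempty"`
}

// DeliverySpec mirrors the table above; every field is optional.
type DeliverySpec struct {
	DeadLetterSink *Destination `json:"deadLetterSink,omitempty"`
	Retry          *int32       `json:"retry,omitempty"`
	BackoffPolicy  *string      `json:"backoffPolicy,omitempty"` // "linear" or "exponential"
	// Timeout of each single request; assumed here to be an ISO-8601
	// duration string (e.g. "PT5S") -- the table above only says `string`.
	Timeout *string `json:"timeout,omitempty"`
}

func main() {
	retry := int32(3)
	policy := "exponential"
	timeout := "PT5S"
	spec := DeliverySpec{
		DeadLetterSink: &Destination{URI: "http://example.invalid/dead-letter"},
		Retry:          &retry,
		BackoffPolicy:  &policy,
		Timeout:        &timeout,
	}
	fmt.Printf("delivery spec: %+v\n", spec)
}
```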


Note this is a behaviour change for Brokers (all implementations), Triggers, Channels (all implementations), Subscriptions, Sequence, and Parallel.



Do we need to say what happens if we hit the timeout? e.g. what should be the error seen?


Is this a timeout per attempt (eg like an http timeout) or the total time you hold on to the event including all retries?

@n3wscott

I am not an Eventing WG lead, but I feel like this is a large change too close to when we are trying to ship a v1 eventing. Adding this will break a ton of out-of-tree implementations (channels and brokers).

@slinkydeveloper
Author

I think this is an important API change, but on the other hand it is trivially implementable and requested by a user. I'm even inclined to say that this is more a bug fix than a new feature, given that setting the timeout is an even more important delivery feature than the retries.

I don't agree with the idea that we should stop improving our APIs now, especially because we're going to v1 and we want to make sure we support a broad range of use cases and that the UX is nice.

This won't break implementations at all; it will just add a field that some implementations won't support temporarily. There are some fields that even our reference implementation doesn't fully support/implement right now, e.g. Broker.Delivery.

@matzew
Member

matzew commented Mar 26, 2021

I'd say adding this is, IMO, very important before we ship a v1 API, to avoid potential problems later.

@rhuss
Contributor

rhuss commented Apr 7, 2021

I think both lines of reasoning (API convergence vs. the importance of having individual timeouts) are definitely valid, so it's really a tradeoff. My take: because it is a perfectly backwards-compatible update, it feels safe to still add it to v1.

The following potential issues should be carefully considered IMO:

  • Bikeshedding issues: Is the naming timeout controversial? --> timeout seems appropriate when the context makes clear which timeout is meant. Should it be deliveryTimeout, in case another timeout is added in the future for that specific part of the schema? I don't think this is likely, so I think the name is safe.
  • Implementation stability: Is the change critical to the stability of the product? As @n3wscott points out, it affects a lot of other resources. Are we totally clear which implementations of this duck type are affected and need to be changed, and is there someone who takes responsibility? Is the implementation easy, straightforward, and timely (including E2E tests)? Do we need a full implementation everywhere, or could it be an option to mark the field optional?
  • Opportunity costs: If we decide to implement this change, does it cost so much effort that something else, possibly with a higher priority, can't be tackled? (Considering we are striving for a time-boxed approach for v1, which I'm +1 on.)
  • Creating a precedent: If we add this change in this phase, could it be a precedent for other change requests to be added as well, such that we can't easily draw a line between which to add and which not? Are there any changes of this kind that have already been rejected?
  • Future proofing: Would the non-inclusion of this feature require a quick update of v1 afterwards because it is a highly demanded feature? Are we in danger of diluting the API already? If this is a highly requested feature and we nail down v1 for some time to come, then the timeout will likely creep into Eventing via a backdoor (label/annotation) if not added as an optional field.

I think the naming is future-proof, but I can't really assess the other concerns. But my feeling is that adding it as an optional field would be a good compromise.

@slinkydeveloper
Author

> Are we totally clear which implementations of this duck type are affected and need to be changed, and is there someone who takes responsibility? Is the implementation easy, straightforward, and timely (including E2E tests)? Do we need a full implementation everywhere, or could it be an option to mark the field optional?

The delivery spec is a mandatory part of the spec for every broker and channel implementation to support. So all these components already support retries, backoff delay, etc.; they all just set a default timeout today.
For all the components in the knative and knative-sandbox orgs, implementing this field should be quite trivial, because most of them use eventing/pkg/kncloudevents to deliver events, so we just need to touch that code in order to support this new feature. For components that don't use that package, it should still be trivial to implement, given they already support the other parts of the delivery spec.
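
As a rough sketch of why the implementation burden is small, the snippet below shows one way a generic sender could bound each delivery attempt with the configured timeout via `context.WithTimeout`. This is not the actual `eventing/pkg/kncloudevents` code; the function name `sendOnce` and the target URL are invented for illustration.

```go
// Illustrative only; NOT the real eventing/pkg/kncloudevents implementation.
package main

import (
	"bytes"
	"context"
	"fmt"
	"net/http"
	"time"
)

// sendOnce performs a single delivery attempt, bounded by timeout when set.
// Per the table above, the timeout applies to each single request; a retry
// loop would call sendOnce repeatedly, applying its own backoff policy.
func sendOnce(ctx context.Context, client *http.Client, url string, body []byte, timeout time.Duration) error {
	if timeout > 0 {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(ctx, timeout)
		defer cancel()
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/cloudevents+json")
	resp, err := client.Do(req)
	if err != nil {
		// When the timeout fires, this error wraps context.DeadlineExceeded.
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	err := sendOnce(context.Background(), http.DefaultClient,
		"http://example.invalid/sink", []byte(`{"specversion":"1.0"}`), 5*time.Second)
	fmt.Println("delivery result:", err)
}
```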

> If we decide to implement this change, does it cost so much effort that something else, possibly with a higher priority, can't be tackled?

I don't think so?

> Creating a precedent

I don't understand what you mean by this. This field came out of a user request, and I care about making users happy, particularly when they have a very valid request like this one, no matter whether we're releasing 1.0 or 0.0.1, so when you say "creating a precedent" I feel like you're looking at it from the wrong POV.
If a user comes up with a valid feature request the day before 1.0, I'll push to implement it, no matter the moment. The fact that there is gatekeeping against adding user-requested features at specific moments in time, because of preparing for a release and whatnot, is an important deficiency of the project that has to be addressed ASAP, both process-wise and technology-wise.

> But my feeling is that adding it as an optional field would be a good compromise.

If by "optional" you mean "an implementation can decide not to support it", then this field is not optional. Implementations that don't support it will just ignore it for the moment, but all implementations must support DeliverySpec, hence they'll have to support this field. In other words, it won't break anything, but it must be implemented at some point.

In general, we don't have a way in eventing to say "this field is optional for an implementer to support".

@rhuss
Contributor

rhuss commented Apr 7, 2021

> so when you say "creating a precedent" I feel like you're looking at it from the wrong POV.

I mean, if there have already been some (user-driven) feature requests like this, waiting to be implemented before 1.0, that were rejected because of 1.0 convergence concerns, then adding the timeout would set a precedent. Not sure whether this is the case though (i.e. whether such rejected features exist).

> If a user comes up with a valid feature request the day before 1.0, I'll push to implement it, no matter the moment.

I don't agree here. For a feature to be added in a rush before such an important milestone, it must be carefully considered how it impacts that milestone. There needs to be a time span between an 'API change freeze' and hardening/releasing the API, IMO. If the change request comes after this freeze (which might have happened implicitly, according to the discussions that I'm aware of), then the feature request must not only be valid but also of the highest priority to break that feature-silent period. My question is whether a delivery timeout qualifies for this urgency (and whether we are already in this hard convergence phase).

@slinkydeveloper
Author

slinkydeveloper commented Apr 7, 2021

> There needs to be a time span between an 'API change freeze' and hardening/releasing the API, IMO.

There is no such agreement, nor any public discussion, about an API change freeze period. That's why I'm proposing to go ahead and merge this PR.

@knative-prow-robot

@slinkydeveloper: PR needs rebase.


@knative-prow-robot knative-prow-robot added the `needs-rebase` label on Jul 24, 2021
@duglin
Contributor

duglin commented Aug 5, 2021

@slinkydeveloper can you please rebase this?

@duglin
Contributor

duglin commented Sep 2, 2021

@lionelvillard what are your thoughts on the idea of this feature in general?

@lionelvillard
Contributor

IMHO this is an important feature and easy to implement (most of it is already implemented, if not everything), so +1 to including it.

@duglin
Contributor

duglin commented Sep 16, 2021

Is someone going to drive/own this since slinky has moved on?

@lionelvillard
Contributor

I can drive this.

/assign

@duglin
Contributor

duglin commented Sep 23, 2021

@lionelvillard rebase needed

@lionelvillard
Contributor

Closing in favor of #76 (to avoid CLA problems).

@lionelvillard
Contributor

/close

@knative-prow-robot

@lionelvillard: Closed this PR.


odacremolbap pushed a commit to odacremolbap/specs that referenced this pull request Jun 19, 2022
Signed-off-by: Pierangelo Di Pilato <pierangelodipilato@gmail.com>