Collaborator

gdalle commented Aug 15, 2025

Checklist

  • Appropriate tests were added
  • Any code changes were done in a way that does not break public API
  • All documentation related to code changes were updated
  • The new code follows the
    contributor guidelines, in particular the SciML Style Guide and
    COLPRAC.
  • Any new documentation only uses public API

Additional context

Fixes #85, fixes #114

Note that these changes rely on downstream packages (mostly DI) to interpret them correctly when reconstructing an Enzyme mode object before differentiation. Between the release of ADTypes v1.18 and the release of the corresponding DI patch, the new backend parameters introduced here will have zero effect. That's not a great situation but ADTypes has no way to require a future DI version before introducing these changes. The real solution to prevent such gaps would probably be to merge ADTypes into DI, so that the semantics are defined at the same time as the backends.

gdalle marked this pull request as draft August 15, 2025 22:15

Collaborator

wsmoses left a comment

What if specifying the runtime activity flag with the auto forward mode created an EnzymeCore.set_runtime_activity(EnzymeCore.Forward)? That would mean that downstream users don't need to change anything.

Collaborator Author

gdalle commented Aug 15, 2025

Downstream code needs to change anyway, if only because of the chunk size.
Indeed, I thought of adding a converter inside the EnzymeCore extension, but that would make it necessary for EnzymeCore to be loaded for the AutoEnzyme(; mode=ADTypes.ForwardMode()) constructor to run. I guess it depends on how extreme you want to be about #113 / #123: does your ADTypes constructor need to work before Enzyme is loaded or not?

Collaborator

wsmoses commented Aug 16, 2025

Yeah but making sure that the downstream users don't need to reimplement the "to mode" logic is nice -- and especially making sure that it's implemented correctly here/consistently.

Is there a reason for not doing that offhand?

Collaborator Author

gdalle commented Aug 16, 2025

The main reason is people wanting to be able to specify the mode without having Enzyme loaded (see #113 or #123). If you really want to do that, then you need to defer the to_mode call to the differentiation step.
But in all honesty, that sounds like a bad idea to me anyway. If you're going to use Enzyme, you might as well load it, or rather load EnzymeCore which is very lightweight. Furthermore, having more than one syntax to specify the mode is unneeded complexity. I'd much prefer removing everything from this PR except the chunk size addition, which is not present in the Enzyme mode object and which is consistent with other ADTypes backends.

Collaborator Author

gdalle commented Aug 16, 2025

The middle ground you're suggesting is for people who are ok with loading EnzymeCore before constructing the backend object, but are not okay using the EnzymeCore mode objects, so they want something like this

using EnzymeCore, ADTypes
backend = AutoEnzyme(mode=ADTypes.ForwardMode())

rather than something like that

using EnzymeCore, ADTypes
backend = AutoEnzyme(mode=EnzymeCore.Forward)

which sounds very strange to me

Collaborator Author

gdalle commented Aug 16, 2025

To clarify, I completely agree with you that getting the EnzymeCore mode object as soon as possible and as uniquely as possible is a good thing for correctness. I'm just trying to be the devil's advocate and argue in favor of a request like #113, which I tried to entertain in this PR. But my opinion is that the added complexity is not worth it, and I don't really see a scenario where loading EnzymeCore at backend definition time is a bad thing. With that in mind, I'd rather keep one single mode specification, the existing one.

Collaborator

wsmoses commented Aug 16, 2025

I think it's mostly about making it as easy as possible for the user (just seeing the ADTypes package from their end, and not knowing they want to import EnzymeCore in place of Enzyme, which seems to be the case in the linked issue). Similarly for consistency, if other backends also use the mode specifier from here.

Also, one way we can potentially avoid the problem of importing two packages to register the "to mode" conversion is to just depend on EnzymeCore (it is dependency-free and rarely changes, so it shouldn't cause any problems).

Collaborator Author

gdalle commented Aug 16, 2025

To sum up, we have four options:

  1. Specify mode with EnzymeCore only (current behavior)
  2. Specify mode with EnzymeCore or ADTypes, conversion to EnzymeCore downstream (current state of this PR)
  3. Specify mode with EnzymeCore or ADTypes, conversion to EnzymeCore in ADTypes extension
  4. Specify mode with EnzymeCore or ADTypes, conversion to EnzymeCore in ADTypes itself with new dependency on EnzymeCore

I agree that 2 is not great for correctness. I'm putting a veto on 4, because I don't want ADTypes or DI getting a hard dependency on any AD package, no matter how lightweight. This leaves 1 or 3. My preference would be to keep 1, precisely because I find it simpler to have just one syntax, but if you think your users would prefer 3 with the additional ADTypes mode and the conversion in EnzymeCoreExt, I'm happy to adapt the PR. Your call.

Collaborator Author

gdalle commented Aug 27, 2025

@wsmoses I just remembered a reason why options 3 and 4 are not possible anyway: when the user selects AutoEnzyme(; mode = nothing), this means that we need to pick the best mode depending on the differentiation operator that gets used. For DI.pushforward, it will be a forward mode, but for DI.gradient, it will be a reverse mode (and I assume other packages like Optimization.jl do something similar). This kind of choice needs to happen downstream, so the conversion to an EnzymeCore object needs to happen downstream as well (because we cannot convert nothing to an EnzymeCore.Mode without knowing the operator to apply).
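
To make the argument above concrete, here is a minimal sketch of why the choice has to be made downstream. It does not use the real ADTypes, EnzymeCore or DI code: `SketchMode`, `SketchForward`, `SketchReverse`, `SketchAutoEnzyme` and `resolve_mode` are hypothetical stand-ins, introduced only so the example runs without any AD package loaded.

```julia
# Hypothetical stand-ins for EnzymeCore mode objects (assumption: names are illustrative only).
abstract type SketchMode end
struct SketchForward <: SketchMode end
struct SketchReverse <: SketchMode end

# Hypothetical stand-in for a backend whose mode may be left unspecified.
struct SketchAutoEnzyme{M<:Union{Nothing,SketchMode}}
    mode::M
end
SketchAutoEnzyme(; mode=nothing) = SketchAutoEnzyme(mode)

# With mode = nothing, only the operator knows which mode makes sense.
resolve_mode(::SketchAutoEnzyme{Nothing}, ::Val{:pushforward}) = SketchForward()
resolve_mode(::SketchAutoEnzyme{Nothing}, ::Val{:gradient}) = SketchReverse()

# An explicitly chosen mode is respected regardless of the operator.
resolve_mode(backend::SketchAutoEnzyme{<:SketchMode}, ::Val) = backend.mode

resolve_mode(SketchAutoEnzyme(), Val(:pushforward))  # a SketchForward
resolve_mode(SketchAutoEnzyme(), Val(:gradient))     # a SketchReverse
```

The backend constructor never sees the operator, so it has nothing to convert `nothing` into; the resolution can only happen at the call site of the differentiation operator.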

Collaborator Author

gdalle commented Aug 27, 2025

In other words, I think this PR is the right version, and I'd appreciate other reviews

gdalle marked this pull request as ready for review August 27, 2025 05:53

Collaborator

wsmoses commented Aug 27, 2025

I actually dislike that convention, because it means that people don't know what to expect from downstream packages (e.g. maybe someone chooses forward vs reverse, and something fails unexpectedly). Similarly here you end up in the situation where you have multiple conflicting ways of specifying things -- leading to further confusion/ambiguity.

Since I think I would push for 4 or 3, and veto 2; and you would push for 2 and veto 4, I think that means we stick with 1.

Collaborator Author

gdalle commented Aug 27, 2025

> I actually dislike that convention, because it means that people don't know what to expect from downstream packages (e.g. maybe someone chooses forward vs reverse, and something fails unexpectedly). Similarly here you end up in the situation where you have multiple conflicting ways of specifying things -- leading to further confusion/ambiguity.

Yeah, the default AutoEnzyme() is a compromise. It's open to downstream interpretation, but at least it doesn't expect beginners to know what forward and reverse modes are, or which one is (usually) best for their given application. In any case, it's probably too breaking to remove.

> Since I think I would push for 4 or 3, and veto 2; and you would push for 2 and veto 4, I think that means we stick with 1.

Alright then, I removed the ADTypes mode specification, but I left runtime activity and chunksize choices. In particular, this will allow people to use AutoEnzyme(; runtime_activity=true) in e.g. Turing and have DI automatically pick set_runtime_activity(Reverse) as the mode when computing gradients, which is a nice QoL improvement.

If you think this is good to go, we can merge and I can do the DI follow up.

Collaborator

wsmoses commented Aug 27, 2025

I think without the conversion to mode in ADTypes, we shouldn't include runtime activity, as that's also in the mode and would lead to ambiguity.

Collaborator Author

gdalle commented Aug 27, 2025

Fair enough, I removed it in the last commit. Now this only adds a chunk size and changes nothing else

gdalle changed the title from "feat: add runtime activity and chunksize parameters to AutoEnzyme" to "feat: add chunksize parameter to AutoEnzyme" Aug 27, 2025
gdalle requested a review from wsmoses August 27, 2025 13:40

Collaborator Author

gdalle commented Sep 8, 2025

Gentle bump @wsmoses

if C isa Int
    @assert C > 0
elseif C isa Float64
    @assert C == Inf
Collaborator

we should give a better error message here
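
One way to act on this suggestion, sketched under assumptions: replace the bare `@assert` calls from the excerpt above with explicit `ArgumentError`s that name the offending value. The helper name `check_chunksize` and the message wording are hypothetical and not taken from the PR; only the two conditions (`C > 0` for an `Int`, `C == Inf` for a `Float64`) come from the reviewed code.

```julia
# Hypothetical helper (not part of ADTypes): validate a chunk size C and
# raise a descriptive error instead of a bare assertion failure.
function check_chunksize(C)
    if C isa Int
        C > 0 || throw(ArgumentError("chunk size must be a positive Int, got $C"))
    elseif C isa Float64
        C == Inf || throw(ArgumentError("the only Float64 chunk size allowed is Inf, got $C"))
    end
    # Assumption: values of any other type pass through unchanged,
    # since the excerpt does not show how they are handled.
    return C
end

check_chunksize(4)    # accepted
check_chunksize(Inf)  # accepted
```

A call such as `check_chunksize(0)` then fails with a message that states both the rule and the value that broke it.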


+ an object subtyping `EnzymeCore.Mode` (like `EnzymeCore.Forward` or `EnzymeCore.Reverse`) if a specific mode is required
+ `nothing` to choose the best mode automatically
+ a positive `Int` to fix a constant chunk size
Collaborator

I kind of wonder if a chunk size of 0 here would be a good way to represent maximum chunk size

Collaborator Author

I get where you're coming from but I have two objections:

Collaborator

Reading that, I still don't understand why a zero chunk size is semantically meaningful.

Collaborator Author

If you denote by $N$ the dimension, $C$ the chunk size, $N_C$ the number of chunks, you have $N = N_C \cdot C$ (plus a remainder possibly). For $N = 0$, you can either pick $C = 0$ or $N_C = 0$. None of those means a lot to be honest, but different backends have different conventions, and ForwardDiff picks a zero chunk size in the zero-length case by default (JuliaDiff/DifferentiationInterface.jl#835 (comment)) while Enzyme doesn't. That's why I'd rather steer clear of this whole mess.
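
The arithmetic behind this can be sketched in a few lines. The function name `number_of_chunks` is hypothetical and the rounding-up rule is an assumption about how a remainder is handled (one extra, partially filled chunk); it is not taken from Enzyme or ForwardDiff.

```julia
# Hypothetical helper: number of chunks N_C needed to cover N inputs with
# chunks of size C, assuming a remainder takes one extra, partially filled chunk.
number_of_chunks(N::Integer, C::Integer) = cld(N, C)

number_of_chunks(10, 5)  # 2 full chunks: N = N_C * C exactly
number_of_chunks(10, 4)  # 3 chunks: two full ones plus a remainder of 2
number_of_chunks(0, 4)   # 0 chunks: the N = 0 case handled with N_C = 0
```

With a positive chunk size, the zero-length case falls out naturally as zero chunks. Choosing $C = 0$ instead leaves the quotient undefined, which is the ambiguity described above.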

Collaborator

I see where you're coming from vis-a-vis forward diff making a different design choice, but I'm not sure that is most critical here.

Alternatively, I kind of wonder if it would be best to make an EnzymeCore.MaxChunk (which can equally be used by the Enzyme.gradient/jacobian wrappers), which would be the alternative here, like there is for EnzymeCore.Mode.

Collaborator Author

I'm gonna veto the zero but if you want to add the max chunk setting to EnzymeCore that's fine by me too, your call

Collaborator Author

Hi Billy, just following up on this, do you want to add that setting to EnzymeCore?

Collaborator

Yeah, I think that's the right move. If you have cycles before I do, feel free to open a PR on

Successfully merging this pull request may close these issues.

  • More parameters for AutoEnzyme
  • Adapt to Enzyme 0.13