-
-
Notifications
You must be signed in to change notification settings - Fork 16
feat: add chunksize parameter to AutoEnzyme
#124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if specifying the runtime activity flag with the auto forward mode created an EnzymeCore.set_runtime_activity(enzymecore.forward), that would mean that downstream users don't change
Downstream code needs to change anyway, if only because of the chunk size. |
Yeah but making sure that the downstream users don't need to reimplement the "to mode" logic is nice -- and especially making sure that it's implemented correctly here/consistently. Is there a reason for not doing that offhand? |
The main reason is people wanting to be able to specify the mode without having Enzyme loaded (see #113 or #123). If you really want to do that, then you need to defer the |
The middle ground you're suggesting is for people who are ok with loading EnzymeCore before constructing the backend object, but are not okay using the EnzymeCore mode objects, so they want something like this using EnzymeCore, ADTypes
backend = AutoEnzyme(mode=ADTypes.ForwardMode()) rather than something like that using EnzymeCore, ADTypes
backend = AutoEnzyme(mode=EnzymeCore.Forward) which sounds very strange to me |
To clarify, I completely agree with you that getting the EnzymeCore mode object as soon as possible and as uniquely as possible is a good thing for correctness. I'm just trying to be the devil's advocate and argue in favor of a request like #113, which I tried to entertain in this PR. But my opinion is that the added complexity is not worth it, and I don't really see a scenario where loading EnzymeCore at backend definition time is a bad thing. With that in mind, I'd rather keep one single mode specification, the existing one. |
I think it's mostly making as easy as possible for the user (just seeing the ADTypes package from their end, and not knowing they want to import EnzymeCore in place of enzyme, which seems to be the case in the linked issue). Similarly for consistency, if other backends also use the mode specifier from here. Also one we can potentially avoid the import two packages problem to register to mode is just depending on EnzymeCore (it is dependency free and rarely changed so shouldn't cause any problems) |
To sum up, we have four options:
I agree that 2 is not great for correctness. I'm putting a veto on 4, cause I don't want ADTypes or DI getting a hard dependency on any AD package, no matter how lightweight. This leaves 1 or 3. My preference would be to keep 1, precisely because I deem it more simple to have just one syntax, but if you think your users would prefer 3 with the additional ADTypes mode and the conversion in EnzymeCoreExt, I'm happy to adapt the PR. Your call. |
@wsmoses I just remembered a reason why options 3 and 4 are not possible anyway: when the user selects |
In other words, I think this PR is the right version, and I'd appreciate other reviews |
I actually dislike that convention, because it means that people don't know what to expect from downstream packages (e.g. maybe someone chooses forward vs reverse, and something fails unexpectedly). Similarly here you end up in the situation where you have multiple conflicting ways of specifying things -- leading to further confusion/ambiguity. Since I think I would push for 4 or 3, and veto 2; and you would push for 2 and veto 4, I think that means we stick with 1. |
Yeah, the default
Alright then, I removed the ADTypes mode specification, but I left runtime activity and chunksize choices. In particular, this will allow people to use If you think this is good to go, we can merge and I can do the DI follow up. |
I think without the conversion to mode in ADTypes, we shouldn't include runtime activity -- as thats also in the mode and would lead to ambiguity. |
Fair enough, I removed it in the last commit. Now this only adds a chunk size and changes nothing else |
AutoEnzyme
AutoEnzyme
Gentle bump @wsmoses |
if C isa Int | ||
@assert C > 0 | ||
elseif C isa Float64 | ||
@assert C == Inf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should give a better error message here
|
||
+ an object subtyping `EnzymeCore.Mode` (like `EnzymeCore.Forward` or `EnzymeCore.Reverse`) if a specific mode is required | ||
+ `nothing` to choose the best mode automatically | ||
+ a positive `Int` to fix a constant chunk size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of wonder if a chunk size of 0 here would be a good way to represent maximum chunk size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get where you're coming from but I have two objections:
- semantically,
chunksize=Inf
makes much more sense even to the uninformed reader - practically, a zero chunksize is also a thing, and it is very very tricky to handle correctly across backends (see this recent discussion in DI Inconsistency in handling empty arguments JuliaDiff/DifferentiationInterface.jl#802) so I'd rather not confuse the two
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reading that, I still don't understand why a zero chunksize is semantically meaningful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you denote by
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see where you're coming from vis-a-vis forward diff making a different design choice, but I'm not sure that is most critical here.
Alternatively, I kind of wonder, if it would be best to make an EnzymeCore.MaxChunk (which equally can be used by the Enzyme.gradient/jacobian wrappers), which would be the alternate here like there is for EnzymeCore.Mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm gonna veto the zero but if you want to add the max chunk setting to EnzymeCore that's fine by me too, your call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Billy, just following up on this, do you want to add that setting to EnzymeCore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think thats the right move. if you have cycles before I feel free to open a PR on
Checklist
contributor guidelines, in particular the SciML Style Guide and
COLPRAC.
Additional context
Fixes #85, fixes #114
Note that these changes rely on downstream packages (mostly DI) to interpret them correctly when reconstructing an Enzyme mode object before differentiation. Between the release of ADTypes v1.18 and the release of the corresponding DI patch, the new backend parameters introduced here will have zero effect. That's not a great situation but ADTypes has no way to require a future DI version before introducing these changes. The real solution to prevent such gaps would probably be to merge ADTypes into DI, so that the semantics are defined at the same time as the backends.