Each successive function call seems to take a few milliseconds longer than the last, and I don't know why. The slowdown doesn't show up in the benchmarks, so I'm not sure where it comes from. I've reproduced it through the Python interface with iminuit and with the in-development ganesh integration, so it doesn't appear to be specific to either one. Maybe there's a clone somewhere that I don't know about? At the very least, there is definitely room to optimize the Expression struct for better cache locality, though doing so is a bit messy (see the recursion crate for the basic idea).
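
To make the cache-locality idea concrete, here is a rough sketch of what a flattened expression tree could look like. It is not the actual Expression struct and it does not use the recursion crate's API; `Node`, `FlatExpr`, `push`, and `evaluate` are hypothetical names chosen for illustration. The assumption is that the tree can be stored in a single `Vec` in topological order (children before parents), so evaluation is one linear pass over contiguous memory instead of pointer chasing through boxed nodes.

```rust
/// Hypothetical node type (not the real `Expression`).
/// Children are referenced by index into the same `Vec`, not by `Box`.
#[derive(Debug, Clone, Copy)]
pub enum Node {
    Const(f64),
    Param(usize), // index into the parameter slice passed to `evaluate`
    Add(usize, usize),
    Mul(usize, usize),
    Neg(usize),
}

/// Expression stored in topological order: every child precedes its parent.
#[derive(Debug, Default)]
pub struct FlatExpr {
    nodes: Vec<Node>,
}

impl FlatExpr {
    pub fn new() -> Self {
        Self { nodes: Vec::new() }
    }

    /// Append a node and return its index.
    /// Panics if the node refers to a child that has not been pushed yet.
    pub fn push(&mut self, node: Node) -> usize {
        let len = self.nodes.len();
        let children_exist = match node {
            Node::Const(_) | Node::Param(_) => true,
            Node::Neg(a) => a < len,
            Node::Add(a, b) | Node::Mul(a, b) => a < len && b < len,
        };
        assert!(children_exist, "child index must refer to an earlier node");
        self.nodes.push(node);
        len
    }

    /// Evaluate the last node pushed, or return `None` for an empty expression.
    /// The caller owns `scratch`, so repeated calls reuse its allocation.
    /// Panics if a `Param` index is out of range for `params`.
    pub fn evaluate(&self, params: &[f64], scratch: &mut Vec<f64>) -> Option<f64> {
        scratch.clear(); // keeps the existing capacity
        for node in &self.nodes {
            let value = match *node {
                Node::Const(c) => c,
                Node::Param(i) => params[i],
                Node::Add(a, b) => scratch[a] + scratch[b],
                Node::Mul(a, b) => scratch[a] * scratch[b],
                Node::Neg(a) => -scratch[a],
            };
            scratch.push(value);
        }
        scratch.last().copied()
    }
}
```

Passing the scratch buffer in from the caller is a deliberate choice here: if something is being cloned or reallocated on every call, a layout like this makes it easier to see, because `evaluate` itself only allocates the first time the buffer grows.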