Opportunity to save 1 instruction from mcycle checks #308

@edubart

Context

Currently the interpreter hot loop does:

while (mcycle < mcycle_tick_end) {
    // Fetch, decode, execute
    mcycle++;
}

But it could be simplified to something like:

uint64_t remaining = mcycle_tick_end - mcycle;
mcycle += remaining;
for (; remaining > 0; remaining--) {
    // Fetch, decode, execute
}

This may save one instruction in the interpreter's hot inner loop on both amd64 and arm64, since the SUB instruction that decrements the counter can also drive the loop branch; see https://godbolt.org/z/MvPGYscaP for a PoC. To do this, I will need to stop propagating mcycle on every memory access instruction, and possibly introduce an mtime CSR that is incremented on every RTC tick, so that mcycle no longer has to be propagated to the client device when using rtc_cycle_to_time(a->read_mcycle()).
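To make the equivalence concrete, here is a minimal sketch of both loop shapes side by side. It is not the interpreter's code: `toy_state`, `run_compare`, `run_countdown`, and the `executed` counter are hypothetical names introduced only for illustration, and the fetch/decode/execute step is replaced by a counter increment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the machine state; only the cycle counter is modeled.
struct toy_state {
    uint64_t mcycle = 0;   // cycle counter
    uint64_t executed = 0; // number of placeholder "instructions" executed
};

// Current shape: compare mcycle against the tick end on every iteration.
inline void run_compare(toy_state &s, uint64_t mcycle_tick_end) {
    while (s.mcycle < mcycle_tick_end) {
        s.executed++; // placeholder for fetch, decode, execute
        s.mcycle++;
    }
}

// Proposed shape: advance mcycle once up front, then count a local down to zero.
inline void run_countdown(toy_state &s, uint64_t mcycle_tick_end) {
    // Guard against unsigned wrap-around in the subtraction below;
    // the compare-based loop handles this case implicitly.
    if (s.mcycle >= mcycle_tick_end) {
        return;
    }
    uint64_t remaining = mcycle_tick_end - s.mcycle;
    s.mcycle += remaining;
    for (; remaining > 0; remaining--) {
        s.executed++; // placeholder for fetch, decode, execute
    }
    // After the loop, mcycle sits exactly at the tick end.
    assert(s.mcycle == mcycle_tick_end);
}
```

In this sketch both functions execute the same number of iterations and leave mcycle at the same value. The difference is that, inside the countdown loop, the stored mcycle already holds the end-of-tick value, so the current cycle would have to be recovered as `mcycle - remaining`, and any early exit from the loop would need to subtract `remaining` back. That is why the proposal depends on nothing reading mcycle in the middle of a tick.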

Furthermore, this will free up the register currently reserved for mcycle_tick_end, making it available inside the interpreter's hot loop and giving the optimizer more room for register allocation there.

When doing this, it's worth experimenting with increasing RTC_FREQ_DIV_DEF from 8192 to 16384, since the interpreter outer loop will start performing a write to mtime every tick. Also, the interpreter recently got 2x speedups, to the point where time inside the machine advances too fast during intensive computations. Ideally the RTC frequency should be chosen so that time inside the machine passes at a rate closer to host time.
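As a rough sketch of the tick-based scheme, the outer loop could bump an mtime counter once per tick instead of deriving time from mcycle on demand. Everything here is an assumption for illustration: `toy_clock`, `run_ticks`, and `cycle_to_time` are hypothetical names, the division in `cycle_to_time` is an assumed definition of the cycle-to-time conversion, and mcycle is assumed to start on a tick boundary. Only the values 8192 and 16384 come from the text above.

```cpp
#include <cstdint>

// Divisor values named above: the current default and the candidate.
constexpr uint64_t FREQ_DIV_CURRENT = 8192;
constexpr uint64_t FREQ_DIV_CANDIDATE = 16384;

// Assumed conversion: one RTC tick per freq_div cycles.
constexpr uint64_t cycle_to_time(uint64_t mcycle, uint64_t freq_div) {
    return mcycle / freq_div;
}

// Hypothetical model of the two counters and the number of mtime writes.
struct toy_clock {
    uint64_t mcycle = 0;
    uint64_t mtime = 0;        // hypothetical CSR, incremented once per tick
    uint64_t mtime_writes = 0; // how often the outer loop wrote mtime
};

// Outer loop: run whole ticks only, writing mtime once after each tick.
inline void run_ticks(toy_clock &c, uint64_t mcycle_end, uint64_t freq_div) {
    while (c.mcycle <= mcycle_end && mcycle_end - c.mcycle >= freq_div) {
        c.mcycle += freq_div; // the inner hot loop would run freq_div cycles here
        c.mtime++;
        c.mtime_writes++;
    }
}
```

Under these assumptions, mtime at each tick boundary matches `cycle_to_time(mcycle, freq_div)`, so a device could read mtime directly without seeing mcycle. Doubling the divisor halves the number of mtime writes for the same cycle count, at the cost of coarser time resolution.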

This idea is something I've had for a while, and it has been briefly discussed internally. I am writing it down as an issue so I do not forget to attempt it someday.

Status

PR Available
