Used bit packing for evaluating ARM condition codes instead of a switch-case #2
wheremyfoodat wants to merge 2 commits into mattrbeck:master
Hey, thanks so much for submitting this! This is a tricky little change that I don't think I would have thought of haha. However, I just pulled down the changes locally and compared against the FPS I'm currently seeing, and I wasn't actually able to see any improvement in Emerald. In fact, I'm seeing ~2 FPS lower on Golden Sun on average across a few runs. I don't really understand why it would be slower for me, since logically it seems like it should just be an improvement. You were able to see an FPS gain though?
Yeah, though nothing too groundbreaking.
Somewhat related to this issue, but I think an easy / free optimisation to implement is to check if the cond is AL (0xE): if so, continue; otherwise, use the LUT (or switch). In the vast majority of cases, the cond is going to be AL, so the switch / LUT won't be hit. If Crystal supports marking stuff likely/unlikely, you can label that if(cond==AL) as likely.
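To make the suggestion concrete, here is a minimal sketch in C rather than Crystal. The names `cond_passed` and `eval_cond_slow` are made up for illustration, the flag field is assumed to be CPSR bits 31..28 (N, Z, C, V), condition 0xF is assumed to behave as "never", and `__builtin_expect` is a GCC/Clang builtin, so other compilers would need a different hint or none at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define COND_AL 0xEu /* "always" condition code */

/* Slow path: evaluate a 4-bit condition code against the 4-bit flag field
   (assumed layout, matching CPSR bits 31..28: N=8, Z=4, C=2, V=1). */
static bool eval_cond_slow(unsigned cond, unsigned flags)
{
    assert(cond < 16 && flags < 16);
    bool n = flags & 8, z = flags & 4, c = flags & 2, v = flags & 1;
    switch (cond) {
    case 0x0: return z;            /* EQ */
    case 0x1: return !z;           /* NE */
    case 0x2: return c;            /* CS */
    case 0x3: return !c;           /* CC */
    case 0x4: return n;            /* MI */
    case 0x5: return !n;           /* PL */
    case 0x6: return v;            /* VS */
    case 0x7: return !v;           /* VC */
    case 0x8: return c && !z;      /* HI */
    case 0x9: return !c || z;      /* LS */
    case 0xA: return n == v;       /* GE */
    case 0xB: return n != v;       /* LT */
    case 0xC: return !z && n == v; /* GT */
    case 0xD: return z || n != v;  /* LE */
    case 0xE: return true;         /* AL */
    default:  return false;        /* 0xF: treated as "never" (assumption) */
    }
}

/* Fast path: most instructions use AL, so test for it before anything else. */
static inline bool cond_passed(uint32_t instr, uint32_t cpsr)
{
    unsigned cond = instr >> 28;
    if (__builtin_expect(cond == COND_AL, 1)) /* hint: AL is the common case */
        return true;
    return eval_cond_slow(cond, cpsr >> 28);
}
```

Whether the hint helps depends on the compiler and on how the interpreter loop is laid out, so it is worth measuring rather than assuming.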
@ITotalJustice Thanks for the idea! Tested in 8d9c789, although I didn't see any noticeable improvement in the few games I tested |
This PR gains a couple of FPS, depending on the scene, in Emerald in my VM. Tell me if you get any benefit from it.
The truthfulness of an ARM condition depends on 2 factors: the 4-bit condition code in the top bits of the instruction, and the 4 flag bits (N, Z, C, V) in the top bits of the CPSR.
This means that you can use a 256-entry truth table indexed by those 8 bits, instead of using a switch-case which would probably compile to an array lookup + indirect jump.
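As a rough sketch of that truth table (in C, not Crystal), the 256 entries can be filled once at startup from a plain reference evaluator. Everything here is illustrative: the names `eval_cond`, `cond_table`, `init_cond_table` and `cond_passed_lut` are hypothetical, the index layout (condition code in the high nibble, flags in the low nibble) is just one reasonable choice, and 0xF is assumed to mean "never".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference evaluator. Conditions come in complementary pairs, so evaluate
   the even one and invert the result for the odd one.
   flags is assumed to be CPSR bits 31..28: N=8, Z=4, C=2, V=1. */
static bool eval_cond(unsigned cond, unsigned flags)
{
    bool n = flags & 8, z = flags & 4, c = flags & 2, v = flags & 1;
    bool result;
    switch (cond >> 1) {
    case 0:  result = z; break;            /* EQ / NE */
    case 1:  result = c; break;            /* CS / CC */
    case 2:  result = n; break;            /* MI / PL */
    case 3:  result = v; break;            /* VS / VC */
    case 4:  result = c && !z; break;      /* HI / LS */
    case 5:  result = n == v; break;       /* GE / LT */
    case 6:  result = !z && n == v; break; /* GT / LE */
    default: result = true; break;         /* AL / 0xF ("never", assumption) */
    }
    return (cond & 1) ? !result : result;
}

static bool cond_table[256];

/* Fill the table once: index = (condition code << 4) | flags. */
static void init_cond_table(void)
{
    for (unsigned i = 0; i < 256; i++)
        cond_table[i] = eval_cond(i >> 4, i & 0xF);
}

/* Per-instruction check: one array load, no branching on the condition. */
static inline bool cond_passed_lut(unsigned cond, unsigned flags)
{
    assert(cond < 16 && flags < 16);
    return cond_table[(cond << 4) | flags];
}
```

In an interpreter, `cond` would be `instr >> 28` and `flags` would be `cpsr >> 28`; `init_cond_table()` has to run before the first lookup.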
LUTs aren't generally the best thing for the cache, so they shouldn't be abused too much. So, here's a neat bit-packing trick which originates from MelonDS's ARM interpreter: instead of a switch-case, it uses a packed 32-byte (16 * 2) LUT of masks, indexed by the condition code, to check whether a condition is true. The 16 masks in the LUT are magic numbers which get ANDed with (1 << CPSR_FLAGS), where CPSR_FLAGS is the 4-bit flag field from the top of the CPSR. The masks are specially made so that
masks[conditionCode] & (1 << CPSR_FLAGS) will always return a non-zero value if the condition is met, and 0 if not. This way, you can ...
I used Pokemon Emerald to make sure it works, and arm.gba, which still passes.
I tried ARMWrestler too but I couldn't find the start button. It boots though.
Tell me what you think when you can
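For reference, the mask lookup described above could look roughly like this in C (not the Crystal code in this PR). The mask values below are worked out from the ARM condition definitions: bit f of a mask is set exactly when the condition holds for flag nibble f, with N=8, Z=4, C=2, V=1. The names `cond_masks` and `cond_passed_mask` are hypothetical, and 0xF is assumed to mean "never".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 16 masks * 2 bytes = 32 bytes. Bit f of cond_masks[c] is 1 when condition
   c passes for flag nibble f (assumed layout: N=8, Z=4, C=2, V=1). */
static const uint16_t cond_masks[16] = {
    0xF0F0, /* EQ: Z set */
    0x0F0F, /* NE: Z clear */
    0xCCCC, /* CS: C set */
    0x3333, /* CC: C clear */
    0xFF00, /* MI: N set */
    0x00FF, /* PL: N clear */
    0xAAAA, /* VS: V set */
    0x5555, /* VC: V clear */
    0x0C0C, /* HI: C set and Z clear */
    0xF3F3, /* LS: C clear or Z set */
    0xAA55, /* GE: N == V */
    0x55AA, /* LT: N != V */
    0x0A05, /* GT: Z clear and N == V */
    0xF5FA, /* LE: Z set or N != V */
    0xFFFF, /* AL: always */
    0x0000  /* 0xF: treated as "never" (assumption) */
};

/* Select the bit for the current flag nibble out of the condition's mask. */
static inline bool cond_passed_mask(unsigned cond, unsigned flags)
{
    assert(cond < 16 && flags < 16);
    return (cond_masks[cond] & (1u << flags)) != 0;
}
```

As with the other lookups, `cond` would come from `instr >> 28` and `flags` from `cpsr >> 28`. For example, GE (0xA) with N and V both set gives flags = 0x9, and bit 9 of 0xAA55 is set, so the condition passes.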