

@bremoran bremoran commented Sep 10, 2025

On 32-bit architectures, each call to mld_keccakf1600_xor_bytes incurs a fixed overhead. For example, on Armv7-M and Armv8-M, using the optimised bit interleaving from XKCP, XORing a lane into the state costs an extra 37 instructions. Whenever an incomplete lane is XORed into the state, this penalty is paid twice: once for the partial lane and once more when the rest of that lane arrives. This PR ensures that only full lanes are XORed into the state.
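To illustrate the idea, here is a minimal sketch of the difference between XORing bytes at arbitrary offsets and buffering them until a lane is complete. It is not the code in this PR: every `sketch_*` name is hypothetical, the state is modelled as plain 64-bit lanes without bit interleaving, and the rate, padding and permutation are left out. The `lane_xors` counter stands in for the fixed per-lane overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANE_BYTES 8
#define STATE_LANES 25

/* Hypothetical model of the Keccak state. lane_xors counts lane-level XORs,
 * each of which would pay the fixed per-lane overhead on a 32-bit target. */
typedef struct {
  uint64_t lanes[STATE_LANES];
  unsigned lane_xors;
} sketch_state;

/* XOR len bytes into one lane, starting at byte position pos within it. */
static void sketch_xor_into_lane(sketch_state *s, size_t lane, size_t pos,
                                 const uint8_t *data, size_t len)
{
  uint64_t v = 0;
  size_t i;
  assert(lane < STATE_LANES && pos + len <= LANE_BYTES);
  for (i = 0; i < len; i++) {
    v |= (uint64_t)data[i] << (8 * (pos + i));
  }
  s->lanes[lane] ^= v;
  s->lane_xors++;
}

/* Unbuffered: XOR bytes at an arbitrary offset, touching every lane the
 * range overlaps, including partially covered ones. */
static void sketch_xor_bytes(sketch_state *s, size_t offset,
                             const uint8_t *data, size_t len)
{
  while (len > 0) {
    size_t pos = offset % LANE_BYTES;
    size_t n = LANE_BYTES - pos;
    if (n > len) n = len;
    sketch_xor_into_lane(s, offset / LANE_BYTES, pos, data, n);
    offset += n;
    data += n;
    len -= n;
  }
}

/* Buffered: collect bytes until a lane is complete, then XOR it once. */
typedef struct {
  sketch_state st;
  uint8_t buf[LANE_BYTES];
  size_t buf_len;   /* bytes currently held in buf */
  size_t next_lane; /* index of the next lane to be written */
} sketch_absorber;

static void sketch_absorb(sketch_absorber *a, const uint8_t *data, size_t len)
{
  while (len > 0) {
    size_t n = LANE_BYTES - a->buf_len;
    if (n > len) n = len;
    memcpy(a->buf + a->buf_len, data, n);
    a->buf_len += n;
    data += n;
    len -= n;
    if (a->buf_len == LANE_BYTES) {
      sketch_xor_into_lane(&a->st, a->next_lane++, 0, a->buf, LANE_BYTES);
      a->buf_len = 0;
    }
  }
}

/* Flush a trailing partial lane, e.g. before padding is applied. */
static void sketch_flush(sketch_absorber *a)
{
  if (a->buf_len > 0) {
    sketch_xor_into_lane(&a->st, a->next_lane++, 0, a->buf, a->buf_len);
    a->buf_len = 0;
  }
}
```

In this model, absorbing 16 bytes in chunks of 5, 5 and 6 bytes performs four lane XORs through `sketch_xor_bytes` but only two through `sketch_absorb`, and both leave the same state. The price is an 8-byte buffer and a copy per chunk, which is presumably cheap next to the per-lane overhead described above.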

Fixes #445

Signed-off-by: Brendan Moran <brendan.moran@arm.com>
@bremoran bremoran requested a review from a team as a code owner September 10, 2025 13:36
@rod-chapman
Contributor

Please provide a description for this PR. What is the point of this refactoring? What benefit does it bring? Please also provide a CBMC proof harness and Makefile for any new functions.

Successfully merging this pull request may close these issues.

Low performance in mld_H