
[acc, pqc] Part 1 rebase of PQC development work into expo #201

Merged
apinise merged 83 commits into master from acc-pqc-part1 on Feb 20, 2026
Conversation


@apinise apinise commented Feb 20, 2026

This PR commits the first half of changes from expo-otbn-pqc into expo. It has been separated into 2 parts due to the size of the rebase being > 100 commits.

See PR #197 for the entire rebase and CI tests passing.

bunnie and others added 30 commits February 20, 2026 13:43
…nvironment

poly_add_dilithium was arbitrarily chosen for its innocuous-sounding
name.

Signed-off-by: bunnie <bunnie@kosagi.com>
This contains the test case and expected data as copied from
the PQ working repo.

Note that the tag on the expected data case is 'dexp' for the current
main branch, not 'exp'.

Signed-off-by: bunnie <bunnie@kosagi.com>
Simple overwrite of what's in the expo main, no changes.

Signed-off-by: bunnie <bunnie@kosagi.com>
These are also a direct copy from the PQ working repo.

Signed-off-by: bunnie <bunnie@kosagi.com>
This is a full copy of the HJSON file from the PQ working repo.

This contains references to a kmac block that is not actually
incorporated in this patch set (it is in the simulator but not
in the hardware).

More importantly, it also expands the size of the IMEM/DMEM
for OTBN, which is necessary for the size-check assert in the
dmem compare to pass.

Signed-off-by: bunnie <bunnie@kosagi.com>
…ng repo

Here, I think there may be some divergence between what's in
expo and what's in the PQ working repo. The actual format of
the dmem expected data takes the form of address ranges followed
by hex numbers. What the original code seemed to expect is actually
a single large hex block.

The code from the working repo is pulled into expo in this commit,
but this may not actually be the correct decision.

Furthermore, there is a strange problem where parse_actual_dump()
is handed ASCII data, but the type is 'bytes'. There is
a mismatch in the abstractions: what we get is ASCII data that looks
like this:

dmem:
04112345...

So the six bytes b'dmem:\n' are literally in the stream,
and the data being passed in is ASCII characters representing
the binary data.

A converter which strips off the first 6 bytes, decodes the ASCII,
and turns it into the binary data is inserted into parse_actual_dmem().

This feels hacky and wrong, but I'll commit it here and leave it
up to reviewers to comment on the correct way to fix this.

Signed-off-by: bunnie <bunnie@kosagi.com>
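The converter described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in parse_actual_dmem(); the function name is made up, and it assumes the dump is exactly a b'dmem:\n' header followed by ASCII hex digits:

```python
# Minimal sketch (hypothetical, not the real expo code) of the converter
# described above: the incoming 'bytes' object is ASCII text with a
# b'dmem:\n' header, followed by hex characters encoding the binary data.
def convert_actual_dmem(raw: bytes) -> bytes:
    header = b"dmem:\n"
    assert raw.startswith(header), "unexpected dump format"
    # Strip the 6-byte header, then decode the remaining ASCII hex digits.
    ascii_hex = raw[len(header):].decode("ascii").strip()
    return bytes.fromhex(ascii_hex)

# Example: the six literal header bytes followed by ASCII hex digits.
dump = b"dmem:\n04112345"
print(convert_actual_dmem(dump).hex())  # "04112345"
```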
More API divergence. It looks like what might be happening is that some
fairly fancy provision was baked into the expo main branch which
can go into an ELF file and extract multiple disjoint regions
of memory for comparison.

This would imply that the data is labeled and tagged for such
comparisons, but no such labels exist in the actual test data
provided.

This change *probably* breaks some other tests that depend on
this feature, so while this allows us to run this one PQ test
case, I have a strong feeling that this is the wrong approach.

However, it's easy to pull out commits so let's drop this
into a PR and see what reviewers have to say.

Signed-off-by: bunnie <bunnie@kosagi.com>
…ng of dmem

Signed-off-by: bunnie <bunnie@kosagi.com>
…ntitan

Usage:

1. Copy the routine from pq-opentitan into the desired location and set up
the BUILD template.

2. Copy the test from pq-opentitan into the desired location, set up BUILD,
*but* in this case, rename the '.exp' file with an extension of '.exp2'.
The '.exp2' file serves as a reference for generating .exp/.dexp files.

3. Run the script using the test target you set up, e.g.:

`convert_exp.py --test_target //sw/otbn/crypto/tests/mldsa:poly_add_dilithium_test`

This will attempt to do the following:

1. Builds the test to create a .elf file by trying to run the test. It's
OK if the test fails, but we do need a .elf file.

2. Dumps all the symbols from the .elf file with `nm`.

3. Tries to match, based on address in dmem, any contiguous memory regions
in the '.exp2' file, and assigns them to the corresponding label
in the code. The matched areas are written to a '.dexp' file
that is consistent with the current format in expo.

4. Checks whether register values are specified; if a line
pattern-matches against a register spec, it also outputs a '.exp'
file that contains the expected register contents.

Regions that can't be matched are printed as errors.

Signed-off-by: bunnie <bunnie@kosagi.com>
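The region-matching in step 3 can be illustrated with a small sketch. This is hypothetical and not the real convert_exp.py logic; the function name, symbol table, and ranges below are made up, and it assumes each '.exp2' region starts exactly at a dmem symbol's address:

```python
# Hypothetical sketch of step 3 above: given dmem symbols from `nm`
# (label -> start address) and contiguous address ranges parsed from the
# '.exp2' file, match each range to the symbol whose address it starts
# at, and collect the regions that can't be matched (printed as errors).
def match_regions(symbols: dict[str, int], regions: list[tuple[int, int]]):
    matched, unmatched = {}, []
    addr_to_label = {addr: label for label, addr in symbols.items()}
    for start, end in regions:
        label = addr_to_label.get(start)
        if label is not None:
            matched[label] = (start, end)  # goes into the '.dexp' output
        else:
            unmatched.append((start, end))  # reported as an error
    return matched, unmatched

syms = {"coeffs": 0x100, "result": 0x200}   # made-up symbol table
regs = [(0x100, 0x1FF), (0x300, 0x33F)]     # made-up .exp2 ranges
m, u = match_regions(syms, regs)
print(m)  # {'coeffs': (256, 511)}
print(u)  # [(768, 831)]
```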
This one, I think, extracts successfully, but the test doesn't pass
because the computed value doesn't match the expected value. This
might be more an indication that the port of the OTBN extensions
is not actually complete.

Signed-off-by: bunnie <bunnie@kosagi.com>
…ng script

Signed-off-by: bunnie <bunnie@kosagi.com>
Somewhere along the way this output file got munged into an
ASCII text format, which isn't actually the supported format
used by the Bazel tools to compare outputs.

Signed-off-by: bunnie <bunnie@kosagi.com>
…exp file

Signed-off-by: bunnie <bunnie@kosagi.com>
This seems to be necessary to select a variant of mlkem/mldsa
for running a particular test.

Signed-off-by: bunnie <bunnie@kosagi.com>
This is another blind copy of artifacts from the pqc-opentitan
repo; a lot of changes happened, so the diff probably needs to be
carefully reviewed.

Signed-off-by: bunnie <bunnie@kosagi.com>
I'm able to at least build the test, but running it fails. Some important
notes:

kyber768_mlkem_keypair_test is missing `const_tomont`. What I did
was just blindly copy all the constants from what looks to be a more
modern kyber_mlkem_keypair_test.s file back into this file in hopes
that maybe it works. This is probably the wrong thing to do.

kyber_mlkem_keypair_test.s does *not* have a .exp file associated
with it in the PQC-opentitan repo, which makes me think that the
testing methodology may have changed along the way. It also looks
like a section of entropy that should be an input to the test is
all 0's, which makes me think the test bench is expecting these
data to be poked into the test bench dynamically.

So, this commit serves as more of a checkpoint to see which series
of mlkem tests are the correct ones to be focusing on and to check
if the testing methodology is the same.

Signed-off-by: bunnie <bunnie@kosagi.com>
Not sure if this breaks running from the Bazel environment, but
it fixed command-line runs.

Signed-off-by: bunnie <bunnie@kosagi.com>
This passes using the manually run script.

Signed-off-by: bunnie <bunnie@kosagi.com>
This passes. Next is to figure out how to integrate this
into Bazel.

Signed-off-by: bunnie <bunnie@kosagi.com>
Signed-off-by: bunnie <bunnie@kosagi.com>
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
- optionally allow specifying --copt parameters
- optionally allow specifying a custom nm path
- allow 'D' as well as 'd' labels
- print errors if a command fails
- stop after finding one label for a data region

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Also get dilithium2_keypair_test passing with exp -> dexp conversion

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
The false_dec tests currently still fail because they aren't working in the
source repo.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
These were accidentally omitted from a recent commit.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
These tests were broken before, but now they work!

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
apinise and others added 24 commits February 20, 2026 13:43
…v paper.

Contains the RTL for the new adder and multiplier architecture.
Changes OTBN RTL for new instruction support.
Adds the ACCH WSR to the Python model.
Adds the new instructions to bignum-insn.yml and the otbnsim model.
Wraps the RTL behind an ifdef OTBN_PQC.
Contains a few patch fixes to the MAC to fix build errors with dvsim.
Adds mulv/l insns to the smoke test.

Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
…ension

Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
This is a reapplication of the transformations in commit 53e256 to the code
with the new multiplier.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit 506dcc to the code
with the new multiplier.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
…ssary.

This is a reapplication of the transformations in commit 51cd78 to the code
with the new multiplier.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit b0e6b8 to the code
with the new multiplier.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit e442e7 to the code
with the new multiplier.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit 1bb148 to the code
with the new multiplier.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit 50c83d to the code
with the new multiplier.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit 9d0817 to the code
with the new multiplier.

This is the final commit for reapplying sign memory optimizations, so here are
some quick spot-check stats from the hardcoded tests:

Comparison to 3e6aef, new multiplier without memory optimization:

Stats for rejection loops in mldsa44_sign_test:
71957	-> 177151	(+105194 cycles, +146.19%)
69786	-> 157492	(+87706 cycles,  +125.68%)
75180	-> 215185	(+140005 cycles, +186.23%)
52721	-> 138077	(+85356 cycles,  +161.90%)
71173	-> 176229	(+105056 cycles, +147.61%)

Stats for rejection loops in mldsa65_sign_test:
99186	-> 284517	(+185331 cycles, +186.85%)

Stats for rejection loops in mldsa87_sign_test:
104333	-> 360873	(+256540 cycles, +245.89%)
147344	-> 520633	(+373289 cycles, +253.35%)
145500	-> 502098	(+356598 cycles, +245.08%)
106992	-> 377983	(+270991 cycles, +253.28%)
100080	-> 342141	(+242061 cycles, +241.87%)

Comparison to 9d0817, old multiplier with memory optimization:

Stats for rejection loops in mldsa44_sign_test:
200888	-> 177151	(-23737 cycles, -11.82%)
177524	-> 157492	(-20032 cycles, -11.28%)
246330	-> 215185	(-31145 cycles, -12.64%)
154405	-> 138077	(-16328 cycles, -10.57%)
199966	-> 176229	(-23737 cycles, -11.87%)

Stats for rejection loops in mldsa65_sign_test:
319207	-> 284517	(-34690 cycles, -10.87%)

Stats for rejection loops in mldsa87_sign_test:
391092	-> 360873	(-30219 cycles, -7.73%)
582342	-> 520633	(-61709 cycles, -10.60%)
560102	-> 502098	(-58004 cycles, -10.36%)
411906	-> 377983	(-33923 cycles, -8.24%)
368656	-> 342141	(-26515 cycles, -7.19%)

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
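As a quick sanity check, the delta and percentage columns in the stats above follow directly from the before/after cycle counts; a one-liner reproduces them (the helper name here is made up for illustration):

```python
# Reproduce the delta/percentage arithmetic from the spot-check stats,
# using the first mldsa44_sign_test line from each comparison above.
def delta(before: int, after: int) -> tuple[int, float]:
    # Returns (cycle delta, percentage change relative to 'before').
    return after - before, (after - before) / before * 100

print(delta(71957, 177151))   # matches the +105194 cycles / +146.19% line
print(delta(200888, 177151))  # matches the -23737 cycles / -11.82% line
```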
Change .rept loops in sign that do not have a branch back to the start of the
rejection loop to use the `loopi` instruction. Saves 4616 bytes of IMEM size
for ML-DSA-87.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
These are just to make it easier for callers to know how big of a stack the
function expects.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This bug wasn't previously showing up because polyt1_unpack also set t5 = 5 and
ran before polyz_unpack, but it emerged when I tried to move polyt1_unpack.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a first step towards not needing to store the entirety of w1 on the
stack. Required modifying the kmac_send_msg function to handle partial writes
before the end of the message.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This transformation is a step towards eliminating the polyvec-sized w1 buffer
on the stack, and removes the need for the temporary buffer for the packed w1.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Continue to transform the verify operation to operate on polynomials one at a
time to reduce stack requirements.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This allows us to drop the stack space needed for the matrix down from K*L
polynomials ({16,30,56} for ML-DSA-{44,65,87}) all the way down to 1
polynomial, saving 55kB of stack space for ML-DSA-87. Unlike for signing, this
introduces no performance penalty because the matrix is only used once.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This means we no longer need to store t1 on the stack.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is the final memory optimization required to get ML-DSA verification under
32kB for all parameters. Here are some summary statistics compared with the
version of verification before any memory optimizations (just spot-checks using
the existing tests, not proper high-sample-size benchmarks):

ML-DSA-44 verify test: 130955 -> 131376 cycles (+0.32%)
ML-DSA-65 verify test: 215996 -> 216786 cycles (+0.37%)
ML-DSA-87 verify test: 364978 -> 366139 cycles (+0.32%)

The performance impact is consistent across parameter sets and very small,
which makes sense since most data structures in verification are only processed
once and streaming them incurs a small amount of overhead but not any
duplication of long steps.

Note that some of the rearrangements of verification checks (in particular the
one in this commit, which moves the decoding of h much later) will cause
invalid signatures to get rejected more slowly, even though they do not
strongly affect performance for valid signatures.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>

@qmn qmn left a comment


LGTM, this is just a split of #197. See https://github.com/orgs/community/discussions/27521#discussioncomment-13046664 for the 100-commit restriction on the "rebase and merge" merge method. Note that by rebasing and merging you'll have to rebase #202 (or #197) on top of this, as the commit hashes are going to diverge again.

@apinise apinise merged commit 5b06fd6 into master Feb 20, 2026
15 of 16 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Feb 20, 2026
