
[acc, pqc] Part 1 rebase of PQC development work into expo #201

Merged
apinise merged 83 commits into master from acc-pqc-part1 on Feb 20, 2026
Conversation


@apinise apinise commented Feb 20, 2026

This PR commits the first half of changes from expo-otbn-pqc into expo. It has been separated into 2 parts due to the size of the rebase being > 100 commits.

See PR #197 for the entire rebase and CI tests passing.

bunnie and others added 30 commits February 20, 2026 13:43
…nvironment

poly_add_dilithium was arbitrarily chosen for its innocuous-sounding
name.

Signed-off-by: bunnie <bunnie@kosagi.com>
This contains the test case and expected data as copied from
the PQ working repo.

Note that the tag on the expected data case is 'dexp' for the current
main branch, not 'exp'.

Signed-off-by: bunnie <bunnie@kosagi.com>
Simple overwrite of what's in the expo main, no changes.

Signed-off-by: bunnie <bunnie@kosagi.com>
These are also a direct copy from the PQ working repo.

Signed-off-by: bunnie <bunnie@kosagi.com>
This is a full copy of the HJSON file from the PQ working repo.

This contains references to a kmac block that is not actually
incorporated in this patch set (it is in the simulator but not
in the hardware).

More importantly, it also expands the size of the IMEM/DMEM
for OTBN, which is necessary for the size-check assert in the
dmem compare to pass.

Signed-off-by: bunnie <bunnie@kosagi.com>
…ng repo

Here, I think there may be some divergence between what's in
expo and what's in the PQ working repo. The actual format of
the dmem expected data takes the form of address ranges followed
by hex numbers. What the original code seemed to expect is actually
a single large hex block.

The code from the working repo is pulled into expo in this commit,
but this may not actually be the correct decision.

Furthermore, there is a strange problem where parse_actual_dump()
is handed ASCII data, but the type is 'bytes'. There is
a mismatch in the abstractions: what we get is ASCII data that looks
like this:

dmem:
04112345...

So the six bytes b'dmem:\n' are literally in the stream,
and the data being passed in is ASCII characters representing
the binary data.

A converter which strips off the first 6 bytes, decodes the ASCII,
and turns it into the binary data is inserted into parse_actual_dmem().

This feels hacky and wrong, but I'll commit it here and leave it
up to reviewers to comment on the correct way to fix this.

Signed-off-by: bunnie <bunnie@kosagi.com>
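The converter described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in parse_actual_dmem(); the function name is made up, and it assumes the dump is exactly a b'dmem:\n' header followed by ASCII hex digits:

```python
# Minimal sketch (hypothetical, not the real expo code) of the converter
# described above: the incoming 'bytes' object is ASCII text with a
# b'dmem:\n' header, followed by hex characters encoding the binary data.
def convert_actual_dmem(raw: bytes) -> bytes:
    header = b"dmem:\n"
    assert raw.startswith(header), "unexpected dump format"
    # Strip the 6-byte header, then decode the remaining ASCII hex digits.
    ascii_hex = raw[len(header):].decode("ascii").strip()
    return bytes.fromhex(ascii_hex)

# Example: the six literal header bytes followed by ASCII hex digits.
dump = b"dmem:\n04112345"
print(convert_actual_dmem(dump).hex())  # "04112345"
```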
More API divergence. It looks like what might be happening is that some
fairly fancy provision was baked into the expo main branch which
can go into an ELF file and extract multiple disjoint regions
of memory for comparison.

This would imply that the data is labeled and tagged for such
comparisons, but no such labels exist in the actual test data
provided.

This change *probably* breaks some other tests that depend on
this feature, so while this allows us to run this one PQ test
case, I have a strong feeling that this is the wrong approach.

However, it's easy to pull out commits so let's drop this
into a PR and see what reviewers have to say.

Signed-off-by: bunnie <bunnie@kosagi.com>
…ng of dmem

Signed-off-by: bunnie <bunnie@kosagi.com>
…ntitan

Usage:

1. Copy the routine from pq-opentitan into the desired location and set up
the BUILD template.

2. Copy the test from pq-opentitan into the desired location, set up BUILD,
*but* in this case, rename the '.exp' file with an extension of '.exp2'.
The '.exp2' file serves as a reference for generating .exp/.dexp files.

3. Run the script using the test target you set up, e.g.:

`convert_exp.py --test_target //sw/otbn/crypto/tests/mldsa:poly_add_dilithium_test`

This will attempt to do the following:

1. Builds the test to create a .elf file by trying to run the test. It's
OK if the test fails, but we do need a .elf file.

2. Dumps all the symbols from the .elf file with `nm`.

3. Tries to match, based on address in dmem, any contiguous memory regions
in the '.exp2' file, and assigns them to the corresponding label
in the code. The matched areas are written to a '.dexp' file
that is consistent with the current format in expo.

4. Checks whether register values are specified; if a line
pattern-matches against a register spec, it also outputs a '.exp'
file that contains the expected register contents.

Regions that can't be matched are printed as errors.

Signed-off-by: bunnie <bunnie@kosagi.com>
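The region-matching in step 3 can be illustrated with a small sketch. This is hypothetical and not the real convert_exp.py logic; the function name, symbol table, and ranges below are made up, and it assumes each '.exp2' region starts exactly at a dmem symbol's address:

```python
# Hypothetical sketch of step 3 above: given dmem symbols from `nm`
# (label -> start address) and contiguous address ranges parsed from the
# '.exp2' file, match each range to the symbol whose address it starts
# at, and collect the regions that can't be matched (printed as errors).
def match_regions(symbols: dict[str, int], regions: list[tuple[int, int]]):
    matched, unmatched = {}, []
    addr_to_label = {addr: label for label, addr in symbols.items()}
    for start, end in regions:
        label = addr_to_label.get(start)
        if label is not None:
            matched[label] = (start, end)  # goes into the '.dexp' output
        else:
            unmatched.append((start, end))  # reported as an error
    return matched, unmatched

syms = {"coeffs": 0x100, "result": 0x200}   # made-up symbol table
regs = [(0x100, 0x1FF), (0x300, 0x33F)]     # made-up .exp2 ranges
m, u = match_regions(syms, regs)
print(m)  # {'coeffs': (256, 511)}
print(u)  # [(768, 831)]
```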
This one, I think, extracts successfully, but the test doesn't pass
because the computed value doesn't match the expected value. This
might be more an indication that the port of the OTBN extensions
is not actually complete.

Signed-off-by: bunnie <bunnie@kosagi.com>
…ng script

Signed-off-by: bunnie <bunnie@kosagi.com>
Somewhere along the way this output file got munged into an
ASCII text format, which isn't actually the supported format
used by the Bazel tools to compare outputs.

Signed-off-by: bunnie <bunnie@kosagi.com>
…exp file

Signed-off-by: bunnie <bunnie@kosagi.com>
This seems to be necessary to select a variant of mlkem/mldsa
for running a particular test.

Signed-off-by: bunnie <bunnie@kosagi.com>
This is another blind copy of artifacts from the pqc-opentitan
repo; a lot of changes happened, so the diff probably needs to be
carefully reviewed.

Signed-off-by: bunnie <bunnie@kosagi.com>
I'm able to at least build the test, but running it fails. Some important
notes:

kyber768_mlkem_keypair_test is missing `const_tomont`. What I did
was just blindly copy all the constants from what looks to be a more
modern kyber_mlkem_keypair_test.s file back into this file in hopes
that maybe it works. This is probably the wrong thing to do.

kyber_mlkem_keypair_test.s does *not* have a .exp file associated
with it in the PQC-opentitan repo, which makes me think that the
testing methodology may have changed along the way. It also looks
like a section of entropy that should be an input to the test is
all 0's, which makes me think the test bench is expecting these
data to be poked into the test bench dynamically.

So, this commit serves as more of a checkpoint to see which series
of mlkem tests are the correct ones to be focusing on and to check
if the testing methodology is the same.

Signed-off-by: bunnie <bunnie@kosagi.com>
Not sure if this breaks running from the Bazel environment, but
it fixed command-line runs.

Signed-off-by: bunnie <bunnie@kosagi.com>
This passes using the manually run script.

Signed-off-by: bunnie <bunnie@kosagi.com>
This passes. Next is to figure out how to integrate this
into Bazel.

Signed-off-by: bunnie <bunnie@kosagi.com>
Signed-off-by: bunnie <bunnie@kosagi.com>
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
- optionally allow specifying --copt parameters
- optionally allow specifying a custom nm path
- allow 'D' as well as 'd' labels
- print errors if a command fails
- stop after finding one label for a data region

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Also get dilithium2_keypair_test passing with exp -> dexp conversion

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
The false_dec tests currently still fail because they aren't working in the
source repo.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
These were accidentally omitted from a recent commit.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
These tests were broken before, but now they work!

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
apinise and others added 24 commits February 20, 2026 13:43
…v paper.

Contains the RTL for the new adder and multiplier architecture.
Changes OTBN RTL for new instruction support.
Adds the ACCH WSR to the Python model.
Adds the new instructions to bignum-insn.yml and the otbnsim model.
Wraps the RTL behind an ifdef OTBN_PQC.
Contains a few patch fixes to the MAC to fix build errors with dvsim.
Adds mulv/l insns to the smoke test.

Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
…ension

Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
This is a reapplication of the transformations in commit 53e256 to the code
with the new multiplier.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit 506dcc to the code
with the new multiplier.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
…ssary.

This is a reapplication of the transformations in commit 51cd78 to the code
with the new multiplier.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit b0e6b8 to the code
with the new multiplier.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit e442e7 to the code
with the new multiplier.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit 1bb148 to the code
with the new multiplier.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit 50c83d to the code
with the new multiplier.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit 9d0817 to the code
with the new multiplier.

This is the final commit for reapplying sign memory optimizations, so here are
some quick spot-check stats from the hardcoded tests:

Comparison to 3e6aef, new multiplier without memory optimization:

Stats for rejection loops in mldsa44_sign_test:
71957	-> 177151	(+105194 cycles, +146.19%)
69786	-> 157492	(+87706 cycles,  +125.68%)
75180	-> 215185	(+140005 cycles, +186.23%)
52721	-> 138077	(+85356 cycles,  +161.90%)
71173	-> 176229	(+105056 cycles, +147.61%)

Stats for rejection loops in mldsa65_sign_test:
99186	-> 284517	(+185331 cycles, +186.85%)

Stats for rejection loops in mldsa87_sign_test:
104333	-> 360873	(+256540 cycles, +245.89%)
147344	-> 520633	(+373289 cycles, +253.35%)
145500	-> 502098	(+356598 cycles, +245.08%)
106992	-> 377983	(+270991 cycles, +253.28%)
100080	-> 342141	(+242061 cycles, +241.87%)

Comparison to 9d0817, old multiplier with memory optimization:

Stats for rejection loops in mldsa44_sign_test:
200888	-> 177151	(-23737 cycles, -11.82%)
177524	-> 157492	(-20032 cycles, -11.28%)
246330	-> 215185	(-31145 cycles, -12.64%)
154405	-> 138077	(-16328 cycles, -10.57%)
199966	-> 176229	(-23737 cycles, -11.87%)

Stats for rejection loops in mldsa65_sign_test:
319207	-> 284517	(-34690 cycles, -10.87%)

Stats for rejection loops in mldsa87_sign_test:
391092	-> 360873	(-30219 cycles, -7.73%)
582342	-> 520633	(-61709 cycles, -10.60%)
560102	-> 502098	(-58004 cycles, -10.36%)
411906	-> 377983	(-33923 cycles, -8.24%)
368656	-> 342141	(-26515 cycles, -7.19%)

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
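As a quick sanity check, the delta and percentage columns in the stats above follow directly from the before/after cycle counts; a one-liner reproduces them (the helper name here is made up for illustration):

```python
# Reproduce the delta/percentage arithmetic from the spot-check stats,
# using the first mldsa44_sign_test line from each comparison above.
def delta(before: int, after: int) -> tuple[int, float]:
    # Returns (cycle delta, percentage change relative to 'before').
    return after - before, (after - before) / before * 100

print(delta(71957, 177151))   # matches the +105194 cycles / +146.19% line
print(delta(200888, 177151))  # matches the -23737 cycles / -11.82% line
```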
Change .rept loops in sign that do not have a branch back to the start of the
rejection loop to use the `loopi` instruction. Saves 4616 bytes of IMEM size
for ML-DSA-87.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
These are just to make it easier for callers to know how big of a stack the
function expects.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This bug wasn't previously showing up because polyt1_unpack also set t5 = 5 and
ran before polyz_unpack, but it emerged when I tried to move polyt1_unpack.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a first step towards not needing to store the entirety of w1 on the
stack. Required modifying the kmac_send_msg function to handle partial writes
before the end of the message.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This transformation is a step towards eliminating the polyvec-sized w1 buffer
on the stack, and removes the need for the temporary buffer for the packed w1.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Continue to transform the verify operation to operate on polynomials one at a
time to reduce stack requirements.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This allows us to drop the stack space needed for the matrix down from K*L
polynomials ({16,30,56} for ML-DSA-{44,65,87}) all the way down to 1
polynomial, saving 55kB of stack space for ML-DSA-87. Unlike for signing, this
introduces no performance penalty because the matrix is only used once.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This means we no longer need to store t1 on the stack.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is the final memory optimization required to get ML-DSA verification under
32kB for all parameters. Here are some summary statistics compared with the
version of verification before any memory optimizations (just spot-checks using
the existing tests, not proper high-sample-size benchmarks):

ML-DSA-44 verify test: 130955 -> 131376 cycles (+0.32%)
ML-DSA-65 verify test: 215996 -> 216786 cycles (+0.37%)
ML-DSA-87 verify test: 364978 -> 366139 cycles (+0.32%)

The performance impact is consistent across parameter sets and very small,
which makes sense since most data structures in verification are only processed
once and streaming them incurs a small amount of overhead but not any
duplication of long steps.

Note that some of the rearrangements of verification checks (in particular the
one in this commit, which moves the decoding of h much later) will cause
invalid signatures to get rejected more slowly, even though they do not
strongly affect performance for valid signatures.

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>

@qmn qmn left a comment


LGTM, this is just a split of #197. See https://github.com/orgs/community/discussions/27521#discussioncomment-13046664 for the 100-commit restriction on the "rebase and merge" merge method. Note that by rebasing and merging you'll have to rebase #202 (or #197) on top of this, as the commit hashes are going to diverge again.

@apinise apinise merged commit 5b06fd6 into master Feb 20, 2026
15 of 16 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Feb 20, 2026
