[acc, pqc] Part 1 rebase of PQC development work into expo#201
Merged
Conversation
…nvironment poly_add_dilithium was arbitrarily chosen for its innocuous-sounding name. Signed-off-by: bunnie <bunnie@kosagi.com>
This contains the test case and expected data as copied from the PQ working repo. Note that the tag on the expected-data case is 'dexp' for the current main branch, not 'exp'. Signed-off-by: bunnie <bunnie@kosagi.com>
Simple overwrite of what's in the expo main, no changes. Signed-off-by: bunnie <bunnie@kosagi.com>
These are also a direct copy from the PQ working repo. Signed-off-by: bunnie <bunnie@kosagi.com>
This is a full copy of the HJSON file from the PQ working repo. It contains references to a kmac block that is not actually incorporated in this patch set (it is in the simulator but not in the hardware). More importantly, it also expands the size of the IMEM/DMEM for OTBN, which is necessary for the "assert" to pass on the size check in the dmem compare. Signed-off-by: bunnie <bunnie@kosagi.com>
…ng repo Here, I think there may be some divergence between what's in expo and what's in the PQ working repo. The actual format of the dmem expected data takes the form of address ranges followed by hex numbers, while the original code seems to expect a single large hex block. The code from the working repo is pulled into expo in this commit, but this may not actually be the correct decision. Furthermore, there is a strange problem where parse_actual_dump() is handed an ASCII object, but the type is 'bytes'. There is a mismatch in the abstractions: what we get is ASCII data that looks like this: dmem: 04112345... In other words, the six bytes b'dmem:\n' are literally in the stream, and the data being passed in is ASCII characters representing the binary data. A converter which strips off the first 6 bytes, decodes the ASCII, and turns it into the binary data is inserted into parse_actual_dmem(). This feels hacky and wrong, but I'll commit it here and leave it up to reviewers to comment on the correct way to fix this. Signed-off-by: bunnie <bunnie@kosagi.com>
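For reference, the workaround described in the commit message might be sketched roughly as follows (illustrative Python only, not the actual expo code; the function name mirrors the description and the exact dump format is assumed):

```python
# Illustrative sketch of the workaround described above, not the
# actual expo code; the dump format and function name are assumed
# from the commit message. The input arrives as bytes like
# b'dmem:\n04112345...' where the payload is ASCII hex digits.
def parse_actual_dmem(raw):
    payload = raw[6:]  # strip the literal 6-byte b'dmem:\n' header
    # Decode the ASCII hex digits into the actual binary dmem data.
    return bytes.fromhex(payload.decode("ascii").strip())
```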
More API divergence. It looks like what might be happening is that some fairly fancy provision was baked into the expo main branch which can go into an ELF file and extract multiple disjoint regions of memory for comparison. This would imply that the data is labeled and tagged for such comparisons, but no such labels exist in the actual test data provided. This change *probably* breaks some other tests that depend on this feature, so while it allows us to run this one PQ test case, I have a strong feeling that this is the wrong approach. However, it's easy to pull out commits, so let's drop this into a PR and see what reviewers have to say. Signed-off-by: bunnie <bunnie@kosagi.com>
…ng of dmem Signed-off-by: bunnie <bunnie@kosagi.com>
…ntitan Usage:

1. Copy the routine from pq-opentitan into the desired location and set up the BUILD template.
2. Copy the test from pq-opentitan into the desired location and set up BUILD, *but* in this case rename the '.exp' file with an extension of '.exp2'. The '.exp2' file serves as a reference to generate .exp/.dexp files.
3. Run the script using the test target you set up, e.g.: `convert_exp.py --test_target //sw/otbn/crypto/tests/mldsa:poly_add_dilithium_test`

This will attempt to do the following:

1. Build the test, to create a .elf file by trying to run the test. It's OK if the test fails, but we do need a .elf file.
2. From the .elf file, dump all the symbols with `nm`.
3. Try to match any contiguous memory regions in the '.exp2' file to addresses in dmem, and assign each region to the corresponding label in the code. The matched areas are written to a '.dexp' file that is consistent with the current format in expo.
4. Check whether register values are specified; if a line pattern-matches against a register spec, the script also outputs a '.exp' file containing the expected register contents.

Regions that can't be matched are printed as an error. Signed-off-by: bunnie <bunnie@kosagi.com>
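The symbol-matching step described above might look roughly like this hypothetical sketch (not the actual convert_exp.py internals; the function name and the exact `nm` output filtering are assumptions):

```python
import re

# Hypothetical sketch of the symbol-matching step above: parse `nm`
# output and keep 'd'/'D' (data) symbols as candidate labels for
# dmem regions. Illustrative only, not the actual convert_exp.py.
def parse_nm_symbols(nm_output):
    symbols = {}
    for line in nm_output.splitlines():
        m = re.match(r"^([0-9a-fA-F]+)\s+([dD])\s+(\S+)$", line)
        if m:
            # Map the symbol's dmem address to its label name.
            symbols[int(m.group(1), 16)] = m.group(3)
    return symbols
```

A contiguous region in the '.exp2' file could then be assigned the label whose address matches the region's start address.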
This one I think extracts successfully, but the test doesn't pass because the computed value doesn't match the expected value. This might be more an indication that the port over of the OTBN extensions is not actually complete. Signed-off-by: bunnie <bunnie@kosagi.com>
…ng script Signed-off-by: bunnie <bunnie@kosagi.com>
Somewhere along the way this output file got munged into an ASCII text format, which isn't actually the supported format used by the bazel tools to compare outputs. Signed-off-by: bunnie <bunnie@kosagi.com>
…exp file Signed-off-by: bunnie <bunnie@kosagi.com>
This seems to be necessary to select a variant of mlkem/mldsa for running a particular test. Signed-off-by: bunnie <bunnie@kosagi.com>
This is another blind copy of artifacts from the pqc-opentitan repo; a lot of changes happened, so the diff probably needs to be carefully reviewed. Signed-off-by: bunnie <bunnie@kosagi.com>
I'm able to at least build the test, but running it fails. Some important notes:

kyber768_mlkem_keypair_test is missing `const_tomont`. What I did was blindly copy all the constants from what looks to be a more modern kyber_mlkem_keypair_test.s file back into this file in hopes that maybe it works. This is probably the wrong thing to do.

kyber_mlkem_keypair_test.s does *not* have a .exp file associated with it in the PQC-opentitan repo, which makes me think that the testing methodology may have changed along the way. It also looks like a section of entropy that should be an input to the test is all 0's, which makes me think the test bench expects these data to be poked in dynamically.

So, this commit serves more as a checkpoint to figure out which series of mlkem tests are the correct ones to focus on, and to check whether the testing methodology is the same. Signed-off-by: bunnie <bunnie@kosagi.com>
Not sure if this breaks running from the bazel environment, but it fixed command-line runs. Signed-off-by: bunnie <bunnie@kosagi.com>
This passes using the manually run script. Signed-off-by: bunnie <bunnie@kosagi.com>
This passes; next is to figure out how to integrate this into bazel. Signed-off-by: bunnie <bunnie@kosagi.com>
Signed-off-by: bunnie <bunnie@kosagi.com>
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
- optionally allow specifying --copt parameters
- optionally allow specifying a custom nm path
- allow 'D' as well as 'd' labels
- print errors if a command fails
- stop after finding one label for a data region

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Also get dilithium2_keypair_test passing with exp -> dexp conversion Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
the false_dec tests currently still fail because they aren't working on the source repo. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
These were accidentally omitted from a recent commit. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
These tests were broken before, but now they work! Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
…v paper. Contains the RTL for the new adder and multiplier architecture. Changes to OTBN RTL for new instruction support. Adds the ACCH WSR to the python model. Includes the new instructions in bignum-insn.yml and the otbnsim model. Wraps the RTL behind an `ifdef OTBN_PQC`. Contains a few patch fixes to the MAC to fix build errors with dvsim. Adds mulv/l insns to the smoke test. Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
…ension Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
Signed-off-by: Evan Apinis <eapinis@zerorisc.com>
This is a reapplication of the transformations in commit 53e256 to the code with the new multiplier. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit 506dcc to the code with the new multiplier. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
…ssary. This is a reapplication of the transformations in commit 51cd78 to the code with the new multiplier. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit b0e6b8 to the code with the new multiplier. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit e442e7 to the code with the new multiplier. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit 1bb148 to the code with the new multiplier. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit 50c83d to the code with the new multiplier. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a reapplication of the transformations in commit 9d0817 to the code with the new multiplier. This is the final commit for reapplying sign memory optimizations, so here are some quick spot-check stats from the hardcoded tests:

Comparison to 3e6aef, new multiplier without memory optimization:

Stats for rejection loops in mldsa44_sign_test:
71957 -> 177151 (+105194 cycles, +146.19%)
69786 -> 157492 (+87706 cycles, +125.68%)
75180 -> 215185 (+140005 cycles, +186.23%)
52721 -> 138077 (+85356 cycles, +161.90%)
71173 -> 176229 (+105056 cycles, +147.61%)

Stats for rejection loops in mldsa65_sign_test:
99186 -> 284517 (+185331 cycles, +186.85%)

Stats for rejection loops in mldsa87_sign_test:
104333 -> 360873 (+256540 cycles, +245.89%)
147344 -> 520633 (+373289 cycles, +253.35%)
145500 -> 502098 (+356598 cycles, +245.08%)
106992 -> 377983 (+270991 cycles, +253.28%)
100080 -> 342141 (+242061 cycles, +241.87%)

Comparison to 9d0817, old multiplier with memory optimization:

Stats for rejection loops in mldsa44_sign_test:
200888 -> 177151 (-23737 cycles, -11.82%)
177524 -> 157492 (-20032 cycles, -11.28%)
246330 -> 215185 (-31145 cycles, -12.64%)
154405 -> 138077 (-16328 cycles, -10.57%)
199966 -> 176229 (-23737 cycles, -11.87%)

Stats for rejection loops in mldsa65_sign_test:
319207 -> 284517 (-34690 cycles, -10.87%)

Stats for rejection loops in mldsa87_sign_test:
391092 -> 360873 (-30219 cycles, -7.73%)
582342 -> 520633 (-61709 cycles, -10.60%)
560102 -> 502098 (-58004 cycles, -10.36%)
411906 -> 377983 (-33923 cycles, -8.24%)
368656 -> 342141 (-26515 cycles, -7.19%)

Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
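The delta lines in the stats above follow a simple before/after pattern; a tiny helper (not from the repo, just illustrating the arithmetic) that reproduces the format:

```python
# Tiny helper (illustrative, not from the repo) reproducing the
# "before -> after (+diff cycles, +pct%)" format used in the stats.
def cycle_delta(before, after):
    diff = after - before
    pct = 100.0 * diff / before  # percent change relative to 'before'
    return f"{before} -> {after} ({diff:+d} cycles, {pct:+.2f}%)"
```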
Change .rept loops in sign that do not have a branch back to the start of the rejection loop to use the `loopi` instruction. Saves 4616 bytes of IMEM size for ML-DSA-87. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
These are just to make it easier for callers to know how big of a stack the function expects. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This bug wasn't previously showing up because polyt1_unpack also set t5 = 5 and ran before polyz_unpack, but it emerged when I tried to move polyt1_unpack. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is a first step towards not needing to store the entirety of w1 on the stack. Required modifying the kmac_send_msg function to handle partial writes before the end of the message. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This transformation is a step towards eliminating the polyvec-sized w1 buffer on the stack, and removes the need for the temporary buffer for the packed w1. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
Continue to transform the verify operation to operate on polynomials one at a time to reduce stack requirements. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This allows us to drop the stack space needed for the matrix down from K*L
polynomials ({16,30,56} for ML-DSA-{44,65,87}) all the way down to 1
polynomial, saving 55kB of stack space for ML-DSA-87. Unlike for signing, this
introduces no performance penalty because the matrix is only used once.
Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This means we no longer need to store t1 on the stack. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This is the final memory optimization required to get ML-DSA verification under 32kB for all parameters. Here are some summary statistics compared with the version of verification before any memory optimizations (just spot-checks using the existing tests, not proper high-sample-size benchmarks):

ML-DSA-44 verify test: 130955 -> 131376 cycles (+0.32%)
ML-DSA-65 verify test: 215996 -> 216786 cycles (+0.37%)
ML-DSA-87 verify test: 364978 -> 366139 cycles (+0.32%)

The performance impact is consistent across parameter sets and very small, which makes sense since most data structures in verification are only processed once and streaming them incurs a small amount of overhead but not any duplication of long steps. Note that some of the rearrangements of verification checks (in particular the one in this commit, which moves the decoding of h much later) will cause invalid signatures to get rejected more slowly, even though they do not strongly affect performance for valid signatures. Signed-off-by: Jade Philipoom <jadep@zerorisc.com>
This was referenced Feb 20, 2026
qmn (Contributor) approved these changes on Feb 20, 2026.
LGTM, this is just a split of #197. See https://github.com/orgs/community/discussions/27521#discussioncomment-13046664 for the 100-commit restriction on the "rebase and merge" merge method. Note that by rebasing and merging you'll have to rebase #202 (or #197) on top of this, as the commit hashes are going to diverge again.
This PR commits the first half of the changes from expo-otbn-pqc into expo. It has been separated into two parts because the rebase is larger than 100 commits.
See PR #197 for the entire rebase and CI tests passing.