-
Notifications
You must be signed in to change notification settings - Fork 24
Refactor squeeze/absorb to use mld_keccakf1600 primitives #436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bremoran. I support the refactoring of fips202.c.
The changes in keccakf1600.c seem not needed for the time being.
The CBMC proofs need to be adjusted (basically you need to add the newly called function to the USE_FUNCTION_CONTRACT=
make variable in the respective failing proofs). I can take care of this after addressed my comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mac Mini (M1, 2020) benchmarks (opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
49562 cycles |
50485 cycles |
0.98 |
ML-DSA-44 sign |
198858 cycles |
222428 cycles |
0.89 |
ML-DSA-44 verify |
58672 cycles |
72829 cycles |
0.81 |
ML-DSA-65 keypair |
86329 cycles |
87391 cycles |
0.99 |
ML-DSA-65 sign |
324629 cycles |
356394 cycles |
0.91 |
ML-DSA-65 verify |
93809 cycles |
112773 cycles |
0.83 |
ML-DSA-87 keypair |
139251 cycles |
140140 cycles |
0.99 |
ML-DSA-87 sign |
395320 cycles |
425766 cycles |
0.93 |
ML-DSA-87 verify |
148970 cycles |
173236 cycles |
0.86 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mac Mini (M1, 2020) benchmarks (no-opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
115116 cycles |
116059 cycles |
0.99 |
ML-DSA-44 sign |
430512 cycles |
454902 cycles |
0.95 |
ML-DSA-44 verify |
122178 cycles |
136834 cycles |
0.89 |
ML-DSA-65 keypair |
196935 cycles |
198013 cycles |
0.99 |
ML-DSA-65 sign |
700849 cycles |
733554 cycles |
0.96 |
ML-DSA-65 verify |
197607 cycles |
216827 cycles |
0.91 |
ML-DSA-87 keypair |
324939 cycles |
335358 cycles |
0.97 |
ML-DSA-87 sign |
884167 cycles |
915279 cycles |
0.97 |
ML-DSA-87 verify |
328670 cycles |
353531 cycles |
0.93 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
118642 cycles |
120291 cycles |
0.99 |
ML-DSA-44 sign |
470944 cycles |
487575 cycles |
0.97 |
ML-DSA-44 verify |
136176 cycles |
145607 cycles |
0.94 |
ML-DSA-65 keypair |
204732 cycles |
207407 cycles |
0.99 |
ML-DSA-65 sign |
781331 cycles |
803126 cycles |
0.97 |
ML-DSA-65 verify |
219238 cycles |
231885 cycles |
0.95 |
ML-DSA-87 keypair |
334153 cycles |
336321 cycles |
0.99 |
ML-DSA-87 sign |
960834 cycles |
982698 cycles |
0.98 |
ML-DSA-87 verify |
354097 cycles |
369410 cycles |
0.96 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
213163 cycles |
214921 cycles |
0.99 |
ML-DSA-44 sign |
781109 cycles |
796938 cycles |
0.98 |
ML-DSA-44 verify |
229936 cycles |
239879 cycles |
0.96 |
ML-DSA-65 keypair |
381392 cycles |
383546 cycles |
0.99 |
ML-DSA-65 sign |
1284323 cycles |
1316119 cycles |
0.98 |
ML-DSA-65 verify |
372595 cycles |
384800 cycles |
0.97 |
ML-DSA-87 keypair |
608888 cycles |
611048 cycles |
1.00 |
ML-DSA-87 sign |
1642534 cycles |
1662747 cycles |
0.99 |
ML-DSA-87 verify |
621151 cycles |
637375 cycles |
0.97 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intel Xeon 4th gen (c7i)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
36212 cycles |
37336 cycles |
0.97 |
ML-DSA-44 sign |
163085 cycles |
169656 cycles |
0.96 |
ML-DSA-44 verify |
45930 cycles |
50105 cycles |
0.92 |
ML-DSA-65 keypair |
65187 cycles |
65461 cycles |
1.00 |
ML-DSA-65 sign |
269644 cycles |
279581 cycles |
0.96 |
ML-DSA-65 verify |
73502 cycles |
78137 cycles |
0.94 |
ML-DSA-87 keypair |
99000 cycles |
101115 cycles |
0.98 |
ML-DSA-87 sign |
317827 cycles |
328206 cycles |
0.97 |
ML-DSA-87 verify |
109541 cycles |
116497 cycles |
0.94 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intel Xeon 3rd gen (c6i)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
59095 cycles |
60983 cycles |
0.97 |
ML-DSA-44 sign |
255045 cycles |
265066 cycles |
0.96 |
ML-DSA-44 verify |
74279 cycles |
80645 cycles |
0.92 |
ML-DSA-65 keypair |
104923 cycles |
108431 cycles |
0.97 |
ML-DSA-65 sign |
423767 cycles |
444769 cycles |
0.95 |
ML-DSA-65 verify |
119732 cycles |
129801 cycles |
0.92 |
ML-DSA-87 keypair |
160932 cycles |
163435 cycles |
0.98 |
ML-DSA-87 sign |
499287 cycles |
511616 cycles |
0.98 |
ML-DSA-87 verify |
179998 cycles |
189081 cycles |
0.95 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intel Xeon 4th gen (c7i) (no-opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
94596 cycles |
96372 cycles |
0.98 |
ML-DSA-44 sign |
342989 cycles |
351483 cycles |
0.98 |
ML-DSA-44 verify |
101490 cycles |
105691 cycles |
0.96 |
ML-DSA-65 keypair |
163526 cycles |
163710 cycles |
1.00 |
ML-DSA-65 sign |
570845 cycles |
576859 cycles |
0.99 |
ML-DSA-65 verify |
163518 cycles |
170256 cycles |
0.96 |
ML-DSA-87 keypair |
269205 cycles |
273579 cycles |
0.98 |
ML-DSA-87 sign |
722380 cycles |
735275 cycles |
0.98 |
ML-DSA-87 verify |
271184 cycles |
280114 cycles |
0.97 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMD EPYC 3rd gen (c6a)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
72744 cycles |
74177 cycles |
0.98 |
ML-DSA-44 sign |
257248 cycles |
268537 cycles |
0.96 |
ML-DSA-44 verify |
82419 cycles |
89027 cycles |
0.93 |
ML-DSA-65 keypair |
124718 cycles |
127704 cycles |
0.98 |
ML-DSA-65 sign |
418393 cycles |
437619 cycles |
0.96 |
ML-DSA-65 verify |
133899 cycles |
143019 cycles |
0.94 |
ML-DSA-87 keypair |
215781 cycles |
212177 cycles |
1.02 |
ML-DSA-87 sign |
537519 cycles |
542602 cycles |
0.99 |
ML-DSA-87 verify |
223941 cycles |
230379 cycles |
0.97 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMD EPYC 4th gen (c7a)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
46361 cycles |
45310 cycles |
1.02 |
ML-DSA-44 sign |
197761 cycles |
198143 cycles |
1.00 |
ML-DSA-44 verify |
59051 cycles |
62984 cycles |
0.94 |
ML-DSA-65 keypair |
76750 cycles |
76871 cycles |
1.00 |
ML-DSA-65 sign |
306116 cycles |
325750 cycles |
0.94 |
ML-DSA-65 verify |
87186 cycles |
96266 cycles |
0.91 |
ML-DSA-87 keypair |
114996 cycles |
116674 cycles |
0.99 |
ML-DSA-87 sign |
354465 cycles |
366154 cycles |
0.97 |
ML-DSA-87 verify |
129885 cycles |
139952 cycles |
0.93 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intel Xeon 3rd gen (c6i) (no-opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
157050 cycles |
159577 cycles |
0.98 |
ML-DSA-44 sign |
563354 cycles |
574753 cycles |
0.98 |
ML-DSA-44 verify |
169123 cycles |
175203 cycles |
0.97 |
ML-DSA-65 keypair |
269997 cycles |
272720 cycles |
0.99 |
ML-DSA-65 sign |
929518 cycles |
945985 cycles |
0.98 |
ML-DSA-65 verify |
274528 cycles |
283415 cycles |
0.97 |
ML-DSA-87 keypair |
449825 cycles |
453069 cycles |
0.99 |
ML-DSA-87 sign |
1176593 cycles |
1192316 cycles |
0.99 |
ML-DSA-87 verify |
459136 cycles |
469703 cycles |
0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton4
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
71522 cycles |
73300 cycles |
0.98 |
ML-DSA-44 sign |
273874 cycles |
282486 cycles |
0.97 |
ML-DSA-44 verify |
82240 cycles |
87271 cycles |
0.94 |
ML-DSA-65 keypair |
126531 cycles |
129003 cycles |
0.98 |
ML-DSA-65 sign |
449807 cycles |
461243 cycles |
0.98 |
ML-DSA-65 verify |
133375 cycles |
139667 cycles |
0.95 |
ML-DSA-87 keypair |
205265 cycles |
208392 cycles |
0.98 |
ML-DSA-87 sign |
553986 cycles |
565901 cycles |
0.98 |
ML-DSA-87 verify |
215759 cycles |
223269 cycles |
0.97 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMD EPYC 3rd gen (c6a) (no-opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
135508 cycles |
136345 cycles |
0.99 |
ML-DSA-44 sign |
542398 cycles |
554700 cycles |
0.98 |
ML-DSA-44 verify |
148663 cycles |
155242 cycles |
0.96 |
ML-DSA-65 keypair |
227809 cycles |
227981 cycles |
1.00 |
ML-DSA-65 sign |
874643 cycles |
893854 cycles |
0.98 |
ML-DSA-65 verify |
236078 cycles |
243954 cycles |
0.97 |
ML-DSA-87 keypair |
374966 cycles |
376878 cycles |
0.99 |
ML-DSA-87 sign |
1098058 cycles |
1115644 cycles |
0.98 |
ML-DSA-87 verify |
386905 cycles |
398895 cycles |
0.97 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMD EPYC 4th gen (c7a) (no-opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
120986 cycles |
121632 cycles |
0.99 |
ML-DSA-44 sign |
451932 cycles |
461721 cycles |
0.98 |
ML-DSA-44 verify |
131661 cycles |
137157 cycles |
0.96 |
ML-DSA-65 keypair |
205657 cycles |
205812 cycles |
1.00 |
ML-DSA-65 sign |
740072 cycles |
752983 cycles |
0.98 |
ML-DSA-65 verify |
211066 cycles |
216831 cycles |
0.97 |
ML-DSA-87 keypair |
339147 cycles |
339530 cycles |
1.00 |
ML-DSA-87 sign |
942332 cycles |
956240 cycles |
0.99 |
ML-DSA-87 verify |
348162 cycles |
357463 cycles |
0.97 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton2
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
119130 cycles |
120799 cycles |
0.99 |
ML-DSA-44 sign |
472589 cycles |
488399 cycles |
0.97 |
ML-DSA-44 verify |
136811 cycles |
145989 cycles |
0.94 |
ML-DSA-65 keypair |
204918 cycles |
208066 cycles |
0.98 |
ML-DSA-65 sign |
781470 cycles |
805186 cycles |
0.97 |
ML-DSA-65 verify |
219277 cycles |
232525 cycles |
0.94 |
ML-DSA-87 keypair |
334609 cycles |
336894 cycles |
0.99 |
ML-DSA-87 sign |
962569 cycles |
985538 cycles |
0.98 |
ML-DSA-87 verify |
354138 cycles |
370175 cycles |
0.96 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton3
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
76608 cycles |
77954 cycles |
0.98 |
ML-DSA-44 sign |
293501 cycles |
304386 cycles |
0.96 |
ML-DSA-44 verify |
89487 cycles |
95538 cycles |
0.94 |
ML-DSA-65 keypair |
133497 cycles |
135025 cycles |
0.99 |
ML-DSA-65 sign |
482326 cycles |
496830 cycles |
0.97 |
ML-DSA-65 verify |
143934 cycles |
151258 cycles |
0.95 |
ML-DSA-87 keypair |
215860 cycles |
217588 cycles |
0.99 |
ML-DSA-87 sign |
594391 cycles |
607209 cycles |
0.98 |
ML-DSA-87 verify |
230110 cycles |
239670 cycles |
0.96 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton4 (no-opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
133082 cycles |
134660 cycles |
0.99 |
ML-DSA-44 sign |
498260 cycles |
506939 cycles |
0.98 |
ML-DSA-44 verify |
144896 cycles |
149574 cycles |
0.97 |
ML-DSA-65 keypair |
226122 cycles |
228568 cycles |
0.99 |
ML-DSA-65 sign |
813917 cycles |
824817 cycles |
0.99 |
ML-DSA-65 verify |
231233 cycles |
237103 cycles |
0.98 |
ML-DSA-87 keypair |
374601 cycles |
377328 cycles |
0.99 |
ML-DSA-87 sign |
1019872 cycles |
1030998 cycles |
0.99 |
ML-DSA-87 verify |
383530 cycles |
390919 cycles |
0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton3 (no-opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
138666 cycles |
139731 cycles |
0.99 |
ML-DSA-44 sign |
495052 cycles |
505692 cycles |
0.98 |
ML-DSA-44 verify |
148928 cycles |
154387 cycles |
0.96 |
ML-DSA-65 keypair |
242421 cycles |
245185 cycles |
0.99 |
ML-DSA-65 sign |
810192 cycles |
823206 cycles |
0.98 |
ML-DSA-65 verify |
241091 cycles |
248706 cycles |
0.97 |
ML-DSA-87 keypair |
396138 cycles |
397296 cycles |
1.00 |
ML-DSA-87 sign |
1031110 cycles |
1044080 cycles |
0.99 |
ML-DSA-87 verify |
402422 cycles |
411301 cycles |
0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graviton2 (no-opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
213471 cycles |
215538 cycles |
0.99 |
ML-DSA-44 sign |
782770 cycles |
809773 cycles |
0.97 |
ML-DSA-44 verify |
230256 cycles |
239814 cycles |
0.96 |
ML-DSA-65 keypair |
381757 cycles |
384029 cycles |
0.99 |
ML-DSA-65 sign |
1286038 cycles |
1307857 cycles |
0.98 |
ML-DSA-65 verify |
373026 cycles |
385278 cycles |
0.97 |
ML-DSA-87 keypair |
609835 cycles |
611946 cycles |
1.00 |
ML-DSA-87 sign |
1645687 cycles |
1665959 cycles |
0.99 |
ML-DSA-87 verify |
621822 cycles |
637587 cycles |
0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
824555 cycles |
828347 cycles |
1.00 |
ML-DSA-44 sign |
3322292 cycles |
3361635 cycles |
0.99 |
ML-DSA-44 verify |
910081 cycles |
931593 cycles |
0.98 |
ML-DSA-65 keypair |
1390153 cycles |
1394523 cycles |
1.00 |
ML-DSA-65 sign |
5439685 cycles |
5464549 cycles |
1.00 |
ML-DSA-65 verify |
1455949 cycles |
1479842 cycles |
0.98 |
ML-DSA-87 keypair |
2307451 cycles |
2308212 cycles |
1.00 |
ML-DSA-87 sign |
6876862 cycles |
6902480 cycles |
1.00 |
ML-DSA-87 verify |
2410586 cycles |
2435035 cycles |
0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
293900 cycles |
299048 cycles |
0.98 |
ML-DSA-44 sign |
1209347 cycles |
1235706 cycles |
0.98 |
ML-DSA-44 verify |
343802 cycles |
358136 cycles |
0.96 |
ML-DSA-65 keypair |
497019 cycles |
506696 cycles |
0.98 |
ML-DSA-65 sign |
1941494 cycles |
2024461 cycles |
0.96 |
ML-DSA-65 verify |
540122 cycles |
561853 cycles |
0.96 |
ML-DSA-87 keypair |
850216 cycles |
847773 cycles |
1.00 |
ML-DSA-87 sign |
2534477 cycles |
2562883 cycles |
0.99 |
ML-DSA-87 verify |
902552 cycles |
922665 cycles |
0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
466413 cycles |
469956 cycles |
0.99 |
ML-DSA-44 sign |
2215549 cycles |
2239780 cycles |
0.99 |
ML-DSA-44 verify |
550719 cycles |
565745 cycles |
0.97 |
ML-DSA-65 keypair |
776107 cycles |
782939 cycles |
0.99 |
ML-DSA-65 sign |
3632934 cycles |
3681139 cycles |
0.99 |
ML-DSA-65 verify |
850930 cycles |
868526 cycles |
0.98 |
ML-DSA-87 keypair |
1258536 cycles |
1271149 cycles |
0.99 |
ML-DSA-87 sign |
4510629 cycles |
4570247 cycles |
0.99 |
ML-DSA-87 verify |
1370939 cycles |
1397775 cycles |
0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
241156 cycles |
233930 cycles |
1.03 |
ML-DSA-44 sign |
813433 cycles |
832225 cycles |
0.98 |
ML-DSA-44 verify |
255088 cycles |
267150 cycles |
0.95 |
ML-DSA-65 keypair |
406266 cycles |
409016 cycles |
0.99 |
ML-DSA-65 sign |
1339350 cycles |
1339528 cycles |
1.00 |
ML-DSA-65 verify |
412574 cycles |
424019 cycles |
0.97 |
ML-DSA-87 keypair |
691161 cycles |
671470 cycles |
1.03 |
ML-DSA-87 sign |
1740077 cycles |
1682075 cycles |
1.03 |
ML-DSA-87 verify |
687866 cycles |
697557 cycles |
0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
306953 cycles |
314364 cycles |
0.98 |
ML-DSA-44 sign |
1214223 cycles |
1243263 cycles |
0.98 |
ML-DSA-44 verify |
340116 cycles |
357064 cycles |
0.95 |
ML-DSA-65 keypair |
557834 cycles |
572673 cycles |
0.97 |
ML-DSA-65 sign |
1949763 cycles |
2004422 cycles |
0.97 |
ML-DSA-65 verify |
530903 cycles |
554343 cycles |
0.96 |
ML-DSA-87 keypair |
857966 cycles |
881801 cycles |
0.97 |
ML-DSA-87 sign |
2440171 cycles |
2515457 cycles |
0.97 |
ML-DSA-87 verify |
875816 cycles |
915213 cycles |
0.96 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates! Amazing speed ups!
I am happy with these changes now. Before we merge the commit history needs to be cleaned up though.
@hanno-becker, could you also have a brief look - particularly the CBMC proof changes?
mldsa/fips202/fips202.c
Outdated
if (pos == r) | ||
{ | ||
mld_keccakf1600_permute(s); | ||
pos = 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a functional change, isn't it? We now permute when running with pos == r
and outlen == 0
, while previously, this would have been a no-op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct. I will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also the point that the tests didn't catch this => it needs a test as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a correctness issue, but one of consistency I think. Previously, it seems that pos == r
was allowed and would lead to permutation and pos = 0
reset only upon the next squeeze. With the change -- and it is still the case after your latest change -- the squeeze logic instead permutes and resets to pos = 0
immediately when the block is full. Either is OK, but I think it should be consistent with finalize
which does not permute and reset.
If possible, I would like to keep this uniform and in line with mlkem-native. Can you have a think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will issue a PR to mlkem-native as well, since it is also using an assignment loop instead of an accelerated memset.
a8da8c6
to
bce8611
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03
.
Benchmark suite | Current: bce8611 | Previous: d0e5eba | Ratio |
---|---|---|---|
ML-DSA-44 keypair |
241156 cycles |
233930 cycles |
1.03 |
ML-DSA-87 sign |
1740077 cycles |
1682075 cycles |
1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
@@ -199,15 +120,15 @@ static unsigned int keccak_squeeze(uint8_t *out, size_t outlen, | |||
unsigned int pos, unsigned int r) | |||
__contract__( | |||
requires((r == SHAKE128_RATE && pos <= SHAKE128_RATE) || | |||
(r == SHAKE256_RATE && pos <= SHAKE256_RATE)) | |||
(r == SHAKE256_RATE && pos <= SHAKE256_RATE) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we restrict r
in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I support the change per-se, and the speedup is great, thank you @bremoran.
There is one minor issue of consistency, though, with regards to when to permute and reset pos = 0
during squeezing. Previously, we would do this lazily only when we need to squeeze more. Now, we keep pos = r
at finalize, but squeeze
eagerly permutes and resets to pos = 0
. If we can uphold the previous approach (in line with mlkem-native) while still riping the performance benefits, I'd prefer that.
mldsa/fips202/fips202.c
Outdated
if ((outlen > 0) && (pos >= r)) | ||
{ | ||
mld_keccakf1600_permute(s); | ||
pos = 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this case not be handled correctly by the first loop iteration already? (xor_bytes
being a no-op)
mldsa/fips202/fips202.c
Outdated
pos = 0; | ||
} | ||
|
||
while (pos + bytes_to_go >= r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the previous behavior can be retained simply by using >
here.
Signed-off-by: willieyz <willie.zhao@chelpis.com>
Unfortunately mlkem-native and mldsa-native are already quite inconsistent. In fact, the API we're discussing doesn't actually exist in mlkem-native. I'm not sure what consistency you're aiming for here? It might be worth attempting to realign mldsa-native with mlkem-native if this is the goal, however it will be a much deeper API review than this small PR. |
See above: After finalize, |
I think you may have missed my point. This code does not exist in mlkem-native. That is the source of the "inconsistency." It's rather difficult to be consistent with code that doesn't exist. The closest code in mlkem-native is:
Note that this does not track pos. A second implementation is also quite close.
This also does not track pos. Of course the mldsa-native code is "inconsistent." There is no equivalent mlkem-native code. |
@hanno-becker hopefully this change is more what you had in mind. |
Squeeze & Absorb currently use plain xor & extract in C. This is fine for 64-bit scalar Keccak, but it doesn't work well for 32-bit architectures, since performance demands bit interleaving. As a result, keccak_absorb and keccak_squeeze should use the existing mld_keccakf1600_xor_bytes and mld_keccakf1600_extract_bytes. This commit refactors Squeeze & Absorb to use mld_keccakf1600_xor_bytes and mld_keccakf1600_extract_bytes instead of ad-hoc C code for accessing the state. Co-authored-by: Matthias J. Kannwischer <matthias@kannwischer.eu> Signed-off-by: Brendan Moran <brendan.moran@arm.com> Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu> Signed-off-by: Brendan Moran <brendan.moran@arm.com>
dada7f1
to
8ac075f
Compare
Squeeze & Absorb currently use plain xor & extract in C. This is fine for 64-bit scalar Keccak, but it doesn't work well for 32-bit architectures, since performance demands bit interleaving. As a result,
keccak_absorb
andkeccak_squeeze
should use the existingmld_keccakf1600_xor_bytes
andmld_keccakf1600_extract_bytes
.This PR refactors Squeeze & Absorb to use
mld_keccakf1600_xor_bytes
andmld_keccakf1600_extract_bytes
instead of ad-hoc C code for accessing the state.