diff --git a/README.md b/README.md index 3bfc018..9838da9 100644 --- a/README.md +++ b/README.md @@ -35,11 +35,35 @@ The paper proposes embedding thousands of these security blocks throughout an AI ## Quickstart -### Prerequisites +### REVISIT Verilog (SystemVerilog + Verilator) + +#### Prerequisites + +- **Verilator** (5.x+) + +#### Run Tests + +```bash +cd verilog + +# Run the full security block test suite (14 tests) +make sim TB=top + +# Run individual test benches +make sim TB=ecdsa +make sim TB=arith + +# Lint +make lint +``` + +### HardCaml (OCaml reference model) + +#### Prerequisites - **OCaml** (4.14+) and **opam** -### Installation +#### Installation ```bash # Install opam if needed (macOS: brew install opam, Ubuntu: apt install opam) @@ -47,7 +71,7 @@ opam init eval $(opam env) # Install dependencies -opam install hardcaml hardcaml_waveterm ppx_hardcaml zarith +opam install hardcaml hardcaml_waveterm ppx_hardcaml zarith # Clone and build git clone https://github.com/JamesPetrie/off-switch @@ -55,7 +79,7 @@ cd off-switch dune build ``` -### Run Tests +#### Run Tests ```bash # Run security block test suite @@ -90,9 +114,9 @@ flowchart TB end SL -->|request_new| TRNG - TRNG -->|"nonce, valid"| SL - SL -->|start| ECDSA - ECDSA -->|"done, valid"| SL + TRNG -->|"nonce_valid, nonce"| SL + SL -->|valid| ECDSA + ECDSA -->|"ready, verif_passed"| SL SL -->|increment| ALLOW ALLOW -->|enabled| AND ADDER --> AND @@ -104,9 +128,9 @@ flowchart TB WOUT["Workload
Output"]:::external end - AUTH <-->|"license_submit, r, s
nonce, ready"| SL - WIN --> ADDER - AND --> WOUT + AUTH <-->|"license_valid, r, s
nonce_ready, nonce, license_ready"| SL + WIN -->|"workload_valid, workload_a, workload_b"| ADDER + AND -->|"result_valid, workload_result"| WOUT classDef external fill:#fff,stroke:#333,stroke-dasharray: 5 5 classDef security fill:#cce5ff,stroke:#004085 @@ -122,9 +146,9 @@ flowchart TB | Module | Type | Purpose | |--------|------|---------| -| `Trng` | Submodule | Nonce generation (256-bit counter in prototype; ring oscillator in production) | -| `Ecdsa` | Submodule | Signature verification using secp256k1 curve | -| Security Logic | Inline | State machine orchestration (7 states) | +| `trng` | Submodule | Nonce generation (256-bit counter in prototype; ring oscillator in production) | +| `ecdsa` | Submodule | Signature verification using secp256k1 curve | +| Security Logic | Inline | State machine orchestration (5 states) | | Usage Allowance | Inline | 64-bit authorization counter | | Workload | Inline | Gated essential operation (Int8 Add example) | @@ -139,7 +163,7 @@ The authorization protocol follows Section 2 of the paper (see Figure 2): 1. TRNG generates nonce (at initialization or after valid license) 2. Security Logic latches and publishes nonce (`nonce_ready` = 1) 3. External authority reads nonce, signs it with private key -4. Authority submits license (r, s) via `license_submit` pulse +4. Authority submits license (r, s) via valid-ready handshake (`license_valid`/`license_ready`) 5. ECDSA verifies signature against nonce and hardcoded public key 6. **If valid:** - Allowance incremented @@ -151,7 +175,7 @@ The authorization protocol follows Section 2 of the paper (see Figure 2): ### Workload Flow -1. Workload inputs (`int8_a`, `int8_b`) arrive with `workload_valid` = 1 +1. Workload inputs (`workload_a`, `workload_b`) arrive with `workload_valid` = 1 2. Computation performed (Int8 addition, wrapping on overflow) 3. Output gating: each result bit ANDed with `enabled` signal - If `allowance > 0`: `enabled` = 1, result passes through @@ -210,43 +234,39 @@ The paper's Section 4 discusses attack vectors against these assumptions in deta | Signal | Width | Description | |--------|-------|-------------| -| `clock` | 1 | System clock | -| `clear` | 1 | Synchronous reset (active high) | -| `license_submit` | 1 | Pulse high for one cycle to submit license | +| `clk` | 1 | System clock | +| `rst_n` | 1 | Asynchronous reset (active low) | +| `license_valid` | 1 | License submission request (hold until `license_ready`) | | `license_r` | 256 | ECDSA signature r component | | `license_s` | 256 | ECDSA signature s component | | `workload_valid` | 1 | Workload input data valid | -| `int8_a` | 8 | Signed 8-bit operand A | -| `int8_b` | 8 | Signed 8-bit operand B | -| `param_a` | 256 | ECDSA curve parameter a (0 for secp256k1) | -| `param_b3` | 256 | ECDSA curve parameter 3b (21 for secp256k1) | -| `trng_seed` | 256 | Seed value for TRNG (testing only) | +| `workload_a` | 8 | Workload operand A | +| `workload_b` | 8 | Workload operand B | | `trng_load_seed` | 1 | Load seed into TRNG (testing only) | +| `trng_seed` | 256 | Seed value for TRNG (testing only) | ### Top-Level Outputs | Signal | Width | Description | |--------|-------|-------------| +| `license_ready` | 1 | License verification complete (pulse) | | `nonce` | 256 | Current nonce value | | `nonce_ready` | 1 | Nonce is stable and ready for signing | -| `int8_result` | 8 | Gated workload output | +| `workload_result` | 8 | Gated workload output | | `result_valid` | 1 | Result output is valid | | `allowance` | 64 | Current allowance counter value | | `enabled` | 1 | Allowance > 0 | -| `state_debug` | 4 | Current state machine state (debug) | -| `licenses_accepted` | 16 | Count of valid licenses processed (debug) | -| `ecdsa_busy` | 1 | ECDSA verification in progress (debug) | ### TRNG Submodule Interface | Direction | Signal | Width | Description | |-----------|--------|-------|-------------| -| Input | `clock` | 1 | System clock | -| Input | `clear` | 1 | Synchronous reset | +| Input | `clk` | 1 | System clock | +| Input | `rst_n` | 1 | Asynchronous reset (active low) | | Input | `enable` | 1 | Enable entropy counter | | Input | `request_new` | 1 | Pulse to latch new nonce | -| Input | `seed` | 256 | Seed value (testing only) | | Input | `load_seed` | 1 | Load seed (testing only) | +| Input | `seed` | 256 | Seed value (testing only) | | Output | `nonce` | 256 | Latched nonce value | | Output | `nonce_valid` | 1 | Nonce has been latched | @@ -254,17 +274,14 @@ The paper's Section 4 discusses attack vectors against these assumptions in deta | Direction | Signal | Width | Description | |-----------|--------|-------|-------------| -| Input | `clock` | 1 | System clock | -| Input | `clear` | 1 | Synchronous reset | -| Input | `start` | 1 | Pulse to begin verification | +| Input | `clk` | 1 | System clock | +| Input | `rst_n` | 1 | Asynchronous reset (active low) | +| Input | `valid` | 1 | Start verification (hold until `ready`) | | Input | `z` | 256 | Message hash (= nonce) | | Input | `r` | 256 | Signature r component | | Input | `s` | 256 | Signature s component | -| Input | `param_a` | 256 | Curve parameter a | -| Input | `param_b3` | 256 | Curve parameter 3b | -| Output | `done_` | 1 | Verification complete (pulse) | -| Output | `valid` | 1 | Signature is valid | -| Output | `busy` | 1 | Verification in progress | +| Output | `ready` | 1 | Verification complete (pulse) | +| Output | `verif_passed` | 1 | Signature is valid | --- @@ -274,32 +291,24 @@ The paper's Section 4 discusses attack vectors against these assumptions in deta ```mermaid stateDiagram-v2 - [*] --> Init_delay - Init_delay --> Request_nonce: counter ≥ 100 - Request_nonce --> Wait_nonce: immediate - Wait_nonce --> Publish: nonce_valid - Publish --> Verify_start: license_submit - Verify_start --> Verify_wait: !ecdsa.busy - Verify_wait --> Update: ecdsa.done_ - Update --> Request_nonce: valid - Update --> Publish: invalid + [*] --> StInitDelay + StInitDelay --> StRequestNonce: counter ≥ 100 + StRequestNonce --> StWaitNonce: immediate + StWaitNonce --> StPublishAndWait: nonce_valid + StPublishAndWait --> StWaitVerify: license_valid + StWaitVerify --> StRequestNonce: verif passed + StWaitVerify --> StPublishAndWait: verif failed ``` ### State Descriptions | State | Entry Condition | Actions | Exit Condition | |-------|-----------------|---------|----------------| -| `Init_delay` | Reset | Increment delay counter | Counter ≥ 100 | -| `Request_nonce` | From Init_delay or Update (valid) | Assert `request_new` to TRNG | Immediate | -| `Wait_nonce` | From Request_nonce | Wait for TRNG | `nonce_valid` | -| `Publish` | From Wait_nonce or Update (invalid) | Latch nonce; `nonce_ready` = 1 | `license_submit` | -| `Verify_start` | From Publish | Latch r, s; assert `ecdsa_start` | `!ecdsa.busy` | -| `Verify_wait` | From Verify_start | Wait for ECDSA | `ecdsa.done_` | -| `Update` | From Verify_wait | If valid: increment allowance | Immediate | - ---- - -Here's an expanded section on the ECDSA and modular arithmetic architecture to add to the README: +| `StInitDelay` | Reset | Increment delay counter | Counter ≥ 100 | +| `StRequestNonce` | From StInitDelay or StWaitVerify (valid) | Pulse `request_new` to TRNG | Immediate | +| `StWaitNonce` | From StRequestNonce | Wait for TRNG | `nonce_valid` | +| `StPublishAndWait` | From StWaitNonce or StWaitVerify (invalid) | `nonce_ready` = 1; wait for license | `license_valid` | +| `StWaitVerify` | From StPublishAndWait | Wait for ECDSA; if valid: increment allowance | `ecdsa_ready` | --- @@ -315,21 +324,18 @@ flowchart TB subgraph SM["State Machine"] direction TB - SM_PREP["Prep Phase
u1, u2 computation"] + SM_PREP["Prepare
u1, u2 computation"] SM_LOOP["Scalar Mult Loop
256 iterations"] - SM_FIN["Finalize
projective to affine"] - SM_CMP["Compare
x_affine == r ?"] + SM_FIN["Finalize
projective → affine
compare x == r"] SM_PREP --> SM_LOOP SM_LOOP --> SM_FIN - SM_FIN --> SM_CMP end - subgraph REGS["Register File --- 17 x 256-bit"] + subgraph REGS["Register File --- 15 x 256-bit"] direction LR R_PT["Point Coords
X1 Y1 Z1
X2 Y2 Z2
X3 Y3 Z3"] R_TMP["Temps
t0 - t5"] - R_PRM["Params
a, b3"] end end @@ -339,19 +345,17 @@ flowchart TB subgraph ARITH["Modular Arithmetic Unit"] direction TB - subgraph INV["Inverse
Ext Euclidean"] + subgraph INV["Inverse
Binary Ext GCD"] direction TB end - subgraph MUL["Multiply
shift-and-add"] + subgraph MUL["Multiply
Shift-and-Add"] direction TB end subgraph ADDSUB["Add - Sub"] direction TB - MOD["Modulus Select
prime p or order n"] ADD256["256-bit Adder"] - MOD --> ADD256 end INV --> ADDSUB @@ -359,12 +363,12 @@ flowchart TB end end - SM <-->|"start, op
done"| ARITH + SM <-->|"valid, op
ready"| ARITH REGS <-->|"read A B
write result"| ARITH end EXT_IN["Inputs:
z, r, s"] --> ECDSA - ECDSA --> EXT_OUT["Output:
valid"] + ECDSA --> EXT_OUT["Output:
verif_passed"] classDef outer fill:#f0f7ff,stroke:#2563eb,stroke-width:2px,color:#1e40af classDef arithbox fill:#fef9e7,stroke:#b7950b,stroke-width:2px,color:#7d6608 @@ -379,11 +383,11 @@ flowchart TB class ECDSA outer class ARITH arithbox class SM smbox - class SM_PREP,SM_LOOP,SM_FIN,SM_CMP smnode + class SM_PREP,SM_LOOP,SM_FIN smnode class REGS regsbox class R_PT,R_TMP,R_PRM regsnode class INV,MUL,ADDSUB subunit - class shared,MOD,ADD256 sharedbox + class shared,ADD256 sharedbox class EXT_IN,EXT_OUT external ``` @@ -414,7 +418,7 @@ Computing `u₁·G + u₂·Q` naively would require two separate scalar multipli For each bit position `i` from 255 down to 0: 1. **Double** the accumulator point `P` 2. **Add** a precomputed point based on the bit pair `(u₁[i], u₂[i])`: - - `(0,0)`: add nothing (skip) + - `(0,0)`: add nothing - `(1,0)`: add `G` - `(0,1)`: add `Q` - `(1,1)`: add `G+Q` (precomputed) @@ -428,18 +432,19 @@ Point addition uses the complete addition formulas from Renes, Costello, and Bat - Avoid branching on point values, which simplifies the state machine and improves side-channel resistance - Require only field operations (add, subtract, multiply) with no inversions during the main loop -Each point addition/doubling executes a fixed sequence of 40 field operations, implemented as a microcode program: +Each point addition/doubling executes a fixed sequence of 40 field operations, implemented as a microcode ROM: -```ocaml -let program = [| - { op = Op.mul; src1 = Config.x1; src2 = Config.x2; dst = Config.t0 }; (* t0 = X1·X2 *) - { op = Op.mul; src1 = Config.y1; src2 = Config.y2; dst = Config.t1 }; (* t1 = Y1·Y2 *) - { op = Op.mul; src1 = Config.z1; src2 = Config.z2; dst = Config.t2 }; (* t2 = Z1·Z2 *) - (* ... 37 more operations ... *) -|] +```systemverilog +localparam instr_t PROGRAM [ROM_SIZE] = '{ + // ... Point addition (Renes-Costello-Batina, 40 steps) ... + '{op: OP_MUL, src1: X1, src2: X2, dst: T0}, // t0 = X1·X2 + '{op: OP_MUL, src1: Y1, src2: Y2, dst: T1}, // t1 = Y1·Y2 + '{op: OP_MUL, src1: Z1, src2: Z2, dst: T2}, // t2 = Z1·Z2 + // ... 37 more operations ... +}; ``` -The formula uses 6 temporary registers (`t0`–`t5`) plus input/output point coordinates and curve parameters, for a total of 17 registers. +The formula uses 6 temporary registers (`t0`–`t5`) plus input/output point coordinates (`X1`–`Z3`), for a total of 15 registers. Curve constants `a` and `3b` are addressed as pseudo-registers but are hardcoded, not stored. ### Modular Arithmetic Unit @@ -449,41 +454,36 @@ The `Arith` module provides the four operations needed for elliptic curve arithm |-----------|-------------|-----------| | `add` | `(a + b) mod m` | Add with conditional subtraction | | `sub` | `(a - b) mod m` | Subtract with conditional addition | -| `mul` | `(a · b) mod m` | Montgomery multiplication (256 iterations) | -| `inv` | `a⁻¹ mod m` | Extended Euclidean algorithm | +| `mul` | `(a · b) mod m` | Binary shift-and-add (256 iterations) | +| `inv` | `a⁻¹ mod m` | Binary Extended GCD | All operations work over 256-bit operands and can use either the field prime `p` or curve order `n` as the modulus: - Point arithmetic (during scalar multiplication) uses `mod p` - Scalar preparation (`u₁`, `u₂` computation) and final comparison use `mod n` -The arithmetic unit interfaces with a 17-register file. Operations are started with a pulse and signal completion via `done_`. Typical cycle counts: +The arithmetic unit interfaces with the register file. Operations are started by asserting `valid` and signal completion via `ready`. Typical cycle counts: - Add/Sub: 2–3 cycles -- Mul: ~500-1000 cycles (bit-serial, varies with y input) +- Mul: ~500-1000 cycles (bit-serial, varies with b input) - Inv: ~2000–3000 cycles (varies with input) ### State Machine Overview The ECDSA verification state machine proceeds through these phases: -``` -Idle → Prep_op → Loop ⟷ Load → Run_add → Finalize_op → Compare → Done - ↑__________________| -``` - -**Prep_op** (3 operations, using `mod n`): +**StPrepare** (3 operations, using `mod n`): 1. `w = s⁻¹ mod n` 2. `u₁ = z · w mod n` 3. `u₂ = r · w mod n` -**Loop/Load/Run_add** (256 bit positions × ~40 ops each): -- For each bit position, double the accumulator and conditionally add `G`, `Q`, or `G+Q` +**StAdd/StDouble** (2 × 256 bit positions × 40 ops each): +- For each bit position (MSB to LSB), add a selected point then double the accumulator +- Point selection via Shamir's trick: `G`, `Q`, `G+Q`, or infinity based on `(u₁[i], u₂[i])` - Point at infinity handled via projective coordinates (`Z = 0`) -**Finalize_op** (2 operations, using `mod p`): +**StFinalize** (3 operations, using `mod p`): 1. `z_inv = Z⁻¹ mod p` (convert from projective to affine) 2. `x_affine = X · z_inv mod p` - -**Compare**: Check if `x_affine == r` +3. `diff = x_affine - r mod p` (valid if `diff == 0`) ### Cycle Count @@ -491,13 +491,14 @@ Total verification takes approximately 5 million cycles, dominated by the ~256 p ### Hardcoded Constants -The prototype hardcodes: +The prototype hardcodes the following secp256k1 constants: - Generator point `G` (from secp256k1 specification) -- Public key `Q = 2G` (would be chip-specific in production) -- Precomputed sum `G + Q = 3G` +- Public key `Q = d · G`, where d is the Private Key (using 2G for testing, would be chip-specific in production) +- Precomputed sum `GPQ = G + Q` - Point at infinity `(0, 1, 0)` in projective coordinates -- Field prime `p = 2²⁵⁶ - 2³² - 977` -- Curve order `n = 2²⁵⁶ - 432420386565659656852420866394968145599` +- Field prime `p = 2²⁵⁶ - 2³² - 977` (from secp256k1 specification) +- Curve order `n = 2²⁵⁶ - 432420386565659656852420866394968145599` (from secp256k1 specification) +- Curve parameters `a = 0`, `b = 7` (y² = x³ + ax + b, from secp256k1 specification) In production, `Q` would be unique per chip (or per batch) and stored in Mask ROM, as recommended in the paper. The other constants are fixed by the secp256k1 specification. @@ -521,11 +522,11 @@ This implementation omits several features needed for production: | Operation | Cycles | Notes | |-----------|--------|-------| -| Initialization delay | 100 | Configurable via `Config.init_delay_cycles` | +| Initialization delay | 100 | Configurable via `INIT_DELAY_CYCLES` | | Nonce generation | 2 | Request + latch | -| License verification | ~10⁶ | ECDSA scalar multiplication dominates | +| License verification | ~5×10⁶ | ECDSA scalar multiplication dominates | | Workload operation | 1 | Combinational add + output register | -| Allowance per license | 10¹² | Configurable via `Config.allowance_increment` | +| Allowance per license | 10¹² | Configurable via `ALLOWANCE_INCREMENT` | ### Allowance Calculation @@ -597,26 +598,27 @@ This is a proof-of-concept implementation. The paper discusses broader limitatio ## Configuration Parameters -```ocaml -module Config = struct - let nonce_width = 256 - let signature_width = 256 - let allowance_width = 64 - let init_delay_cycles = 100 - let allowance_increment = 1_000_000_000_000 (* ~17 min at 1GHz *) -end +```systemverilog +// arith_pkg.sv +localparam int WIDTH = 256; // nonce, signature, and field element width + +// security_block.sv +localparam int unsigned ALLOW_W = 64; // allowance counter width +localparam int INIT_DELAY_CYCLES = 100; // cycles before first nonce +localparam logic [ALLOW_W-1:0] ALLOWANCE_INCREMENT = 64'd1_000_000_000_000; // ~17 min at 1 GHz ``` | Parameter | Value | Description | |-----------|-------|-------------| -| `nonce_width` | 256 | Width of nonce in bits (matches ECDSA message size) | -| `signature_width` | 256 | Width of signature components r and s | -| `allowance_width` | 64 | Width of allowance counter (supports ~584 years at 1 GHz) | -| `init_delay_cycles` | 100 | Cycles to wait after reset before requesting first nonce | -| `allowance_increment` | 10¹² | Cycles added to allowance per valid license (~17 min at 1 GHz) | +| `WIDTH` | 256 | Width of nonce, signature components, and field elements | +| `ALLOW_W` | 64 | Width of allowance counter (supports ~584 years at 1 GHz) | +| `INIT_DELAY_CYCLES` | 100 | Cycles to wait after reset before requesting first nonce | +| `ALLOWANCE_INCREMENT` | 10¹² | Cycles added to allowance per valid license (~17 min at 1 GHz) | --- ## References Petrie, J. (2025). Embedded Off-Switches for AI Compute. *arXiv preprint* arXiv:2509.07637. https://arxiv.org/abs/2509.07637 + +[1]: https://www.secg.org/sec2-v2.pdf diff --git a/verilog/Makefile b/verilog/Makefile new file mode 100644 index 0000000..f651765 --- /dev/null +++ b/verilog/Makefile @@ -0,0 +1,91 @@ +# --------------------------------------------------------------------------- +# Makefile configuration — edit these for your run +# --------------------------------------------------------------------------- + +TOOL ?= verilator +TB ?= top +GUI ?= 0 + +# --------------------------------------------------------------------------- +# Constant variables +# --------------------------------------------------------------------------- + +RTL_VC := rtl/design.vc +TB_DIR := tb +TB_FILE := $(TB_DIR)/tb_$(TB).sv +BUILD_DIR := build +TOP_MODULE := tb + +# --------------------------------------------------------------------------- +# Simulation +# --------------------------------------------------------------------------- + +VFLAGS := -cc --exe --build -j --timing +ifeq ($(GUI),1) + # using FST instead of VCD as the ECDSA simulation dump is in GBs range + # FST is 5-20x smaller in general + VFLAGS += --trace-fst +endif + +.PHONY: sim +sim: +ifeq ($(TOOL),verilator) + @echo "-- VERILATE & BUILD --------" + verilator $(VFLAGS) -F $(RTL_VC) +incdir+$(TB_DIR) $(TB_FILE) --top-module $(TOP_MODULE) tb/sim_main.cpp --Mdir $(BUILD_DIR) + @echo "-- RUN ---------------------" + $(BUILD_DIR)/V$(TOP_MODULE) + @echo "-- DONE --------------------" +ifeq ($(GUI),1) + @echo "-- OPENING GUI -------------" + gtkwave dump.fst +endif +else + $(error Unknown sim tool '$(TOOL)'. See 'make help' for valid options) +endif + +# --------------------------------------------------------------------------- +# Lint +# --------------------------------------------------------------------------- +# ALL_CAPS | CamelCase +VERIBLE_RULES_FLAG := --rules="parameter-name-style=localparam_style_regex:([A-Z][A-Z0-9]*(_[A-Z0-9]+)*|([A-Z][a-z0-9]*)+(_[0-9]+)?)" + +.PHONY: lint +lint: +ifeq ($(GUI),1) + $(error Lint GUI '$(GUI)' not supported. See 'make help' for valid options) +endif +ifeq ($(TOOL),verilator) + @echo "-- RUN VERILATOR LINT -------" + verilator --lint-only -Wall -F $(RTL_VC) + @echo "-- DONE --------------------" +else ifeq ($(TOOL),verible) + @echo "-- RUN VERIBLE LINT --------" + cd $(dir $(RTL_VC)) && \ + verible-verilog-lint $(VERIBLE_RULES_FLAG) $(shell cat $(RTL_VC)) && \ + cd - + @echo "-- DONE --------------------" +else + $(error Unknown lint tool '$(TOOL)'. See 'make help' for valid options) +endif + + + +# --------------------------------------------------------------------------- +# Housekeeping +# --------------------------------------------------------------------------- + +.PHONY: clean +clean: + rm -rf $(BUILD_DIR) *.vcd *.fst *.vpd *.wlf *.log simv csrc + +.PHONY: help +help: + @echo "" + @echo "RTL Makefile — available targets" + @echo "────────────────────────────────────────────────────────" + @echo " sim Run simulation (TOOL=verilator TB=top|ecdsa|arith GUI=0|1)" + @echo " lint Run linter (TOOL=verilator|verible GUI=0)" + @echo " clean Remove all generated build artefacts" + @echo " help Print this message" + +.DEFAULT_GOAL := help diff --git a/verilog/rtl/arith.sv b/verilog/rtl/arith.sv new file mode 100644 index 0000000..c6a8e6c --- /dev/null +++ b/verilog/rtl/arith.sv @@ -0,0 +1,199 @@ +// Arith - Modular arithmetic unit for secp256k1 field operations +// +// Performs add, sub, mul, inv modulo either field prime p or curve order n. +// Operands are read from and results written to an external register file. +// +// Operations (op input): +// 0 = add: f <- a + b mod m +// 1 = sub: f <- a - b mod m +// 2 = mul: f <- a * b mod m +// 3 = inv: f <- a^(-1) mod m (b ignored) +// +// Protocol: +// 1. Set a, b, modulus, op; pulse valid high and hold until ready +// 2. ready pulses high for one cycle when the result is available + +module arith + import arith_pkg::*; // import in module header to be used in port list +( + input logic clk, + input logic rst_n, + input logic valid, + input op_e op, + input logic [WIDTH-1:0] a, + input logic [WIDTH-1:0] b, + input logic [WIDTH-1:0] modulus, + + output logic ready, + output logic [WIDTH-1:0] result +); + + // --------------------------------------------------------------------------- + // Shared mod_add — instance + // --------------------------------------------------------------------------- + + // Inputs shared across multiple blocks, assigned after each block declared + logic mod_add_valid; + logic [WIDTH-1:0] mod_add_a; + logic [WIDTH-1:0] mod_add_b; + logic mod_add_subtract; + + logic mod_add_ready; + logic [WIDTH-1:0] mod_add_result; + logic mod_add_adjust; + + mod_add u_mod_add ( + .clk (clk), + .rst_n (rst_n), + .valid (mod_add_valid), + .a (mod_add_a), + .b (mod_add_b), + .modulus (modulus), + .subtract (mod_add_subtract), + .ready (mod_add_ready), + .result (mod_add_result), + .adjust (mod_add_adjust) + ); + + // --------------------------------------------------------------------------- + // mod_mul instance + // --------------------------------------------------------------------------- + + // Input glue logic + wire mod_mul_valid = valid && (op == OP_MUL); + + // Output nets + logic mod_mul_ready; + logic [WIDTH-1:0] mod_mul_result; + + // mod_add interface + logic mod_mul_add_valid; + logic [WIDTH-1:0] mod_mul_add_a; + logic [WIDTH-1:0] mod_mul_add_b; + logic mod_mul_add_subtract; + + // mod_add resp glue logic + wire mod_mul_add_ready = mod_mul_add_valid && mod_add_ready; + wire [WIDTH-1:0] mod_mul_add_result = mod_add_result; + + mod_mul u_mod_mul ( + .clk (clk), + .rst_n (rst_n), + .valid (mod_mul_valid), + .a (a), + .b (b), + .mod_add_ready (mod_mul_add_ready), + .mod_add_result (mod_mul_add_result), + .ready (mod_mul_ready), + .result (mod_mul_result), + .mod_add_valid (mod_mul_add_valid), + .mod_add_a (mod_mul_add_a), + .mod_add_b (mod_mul_add_b), + .mod_add_subtract(mod_mul_add_subtract) + ); + + // --------------------------------------------------------------------------- + // mod_inv instance + // --------------------------------------------------------------------------- + + // Input glue logic + wire mod_inv_valid = valid && (op == OP_INV); + + // Output nets + logic mod_inv_ready; + logic mod_inv_exists_unused; // not used currently at arith level + logic [WIDTH-1:0] mod_inv_result; + + // mod_add interface + logic mod_inv_add_valid; + logic [WIDTH-1:0] mod_inv_add_a; + logic [WIDTH-1:0] mod_inv_add_b; + logic mod_inv_add_subtract; + + // mod_add resp glue logic + wire mod_inv_add_ready = mod_inv_add_valid && mod_add_ready; + wire [WIDTH-1:0] mod_inv_add_result = mod_add_result; + wire mod_inv_add_adjust = mod_add_adjust; + + mod_inv u_mod_inv ( + .clk (clk), + .rst_n (rst_n), + .valid (mod_inv_valid), + .a (a), + .modulus (modulus), + .mod_add_ready (mod_inv_add_ready), + .mod_add_result (mod_inv_add_result), + .mod_add_adjust (mod_inv_add_adjust), + .ready (mod_inv_ready), + .exists (mod_inv_exists_unused), + .result (mod_inv_result), + .mod_add_valid (mod_inv_add_valid), + .mod_add_a (mod_inv_add_a), + .mod_add_b (mod_inv_add_b), + .mod_add_subtract(mod_inv_add_subtract) + ); + + // --------------------------------------------------------------------------- + // mod_add input assignments + // --------------------------------------------------------------------------- + + always_comb begin + mod_add_valid = 1'b0; + mod_add_a = '0; + mod_add_b = '0; + mod_add_subtract = 1'b0; + + unique case(op) + OP_ADD: begin + mod_add_valid = valid; + mod_add_a = a; + mod_add_b = b; + mod_add_subtract = 1'b0; + end + OP_SUB: begin + mod_add_valid = valid; + mod_add_a = a; + mod_add_b = b; + mod_add_subtract = 1'b1; + end + OP_MUL: begin + mod_add_valid = mod_mul_add_valid; + mod_add_a = mod_mul_add_a; + mod_add_b = mod_mul_add_b; + mod_add_subtract = mod_mul_add_subtract; + end + OP_INV: begin + mod_add_valid = mod_inv_add_valid; + mod_add_a = mod_inv_add_a; + mod_add_b = mod_inv_add_b; + mod_add_subtract = mod_inv_add_subtract; + end + endcase + end + + // --------------------------------------------------------------------------- + // Output assignments + // --------------------------------------------------------------------------- + + always_comb begin + ready = 1'b0; + result = '0; + + unique case(op) + OP_ADD, + OP_SUB: begin + ready = mod_add_ready; + result = mod_add_result; + end + OP_MUL: begin + ready = mod_mul_ready; + result = mod_mul_result; + end + OP_INV: begin + ready = mod_inv_ready; + result = mod_inv_result; + end + endcase + end + +endmodule diff --git a/verilog/rtl/arith_pkg.sv b/verilog/rtl/arith_pkg.sv new file mode 100644 index 0000000..14de35c --- /dev/null +++ b/verilog/rtl/arith_pkg.sv @@ -0,0 +1,12 @@ +package arith_pkg; + + parameter int unsigned WIDTH = 256; + + typedef enum logic [1:0] { + OP_ADD, // modular addition: a + b mod p + OP_SUB, // modular subtraction: a - b mod p + OP_MUL, // modular multiplication: a * b mod p + OP_INV // modular inverse: a^-1 mod p (b ignored) + } op_e; + +endpackage diff --git a/verilog/rtl/comb_add.sv b/verilog/rtl/comb_add.sv new file mode 100644 index 0000000..8e4823d --- /dev/null +++ b/verilog/rtl/comb_add.sv @@ -0,0 +1,26 @@ +module comb_add + import arith_pkg::*; // import in module header to be used in port list +( + input wire [WIDTH-1:0] a, + input wire [WIDTH-1:0] b, + input wire subtract, + output wire [WIDTH-1:0] result, + output wire carry_out +); + + // Two's complement negation of b: + // Step 1: bitwise invert b (ones' complement) + // Step 2: add 1 via carry-in (subtract fed as cin below) + // Together these form -(b) in two's complement. + // When subtract=0 the uninverted b and cin=0 pass through unchanged. + wire [WIDTH:0] b_ext = {1'b0, b}; + wire [WIDTH:0] b_eff = subtract ? ~b_ext : b_ext; + + // Single (WIDTH+1)-bit full adder with carry-in. + // The third operand is a single bit (the carry-in), + // which synthesis tools map directly to the adder's carry-in port. + wire [WIDTH:0] sum = {1'b0, a} + b_eff + { {WIDTH{1'b0}}, subtract}; + + assign result = sum[WIDTH-1:0]; + assign carry_out = sum[WIDTH]; +endmodule diff --git a/verilog/rtl/design.vc b/verilog/rtl/design.vc new file mode 100644 index 0000000..315299d --- /dev/null +++ b/verilog/rtl/design.vc @@ -0,0 +1,10 @@ +./arith_pkg.sv +./comb_add.sv +./mod_add.sv +./mod_mul.sv +./mod_inv.sv +./arith.sv +./trng.sv +./secp256k1_pkg.sv +./ecdsa.sv +./security_block.sv diff --git a/verilog/rtl/ecdsa.sv b/verilog/rtl/ecdsa.sv new file mode 100644 index 0000000..74098b3 --- /dev/null +++ b/verilog/rtl/ecdsa.sv @@ -0,0 +1,509 @@ +// ECDSA - Signature verification for secp256k1 +// +// Verifies ECDSA signatures using: +// R = u1*G + u2*Q +// where: +// u1 = z * s^(-1) mod n +// u2 = r * s^(-1) mod n +// +// Signature is valid if R.x mod n == r +// +// Uses Renes-Costello-Batina complete addition formula in projective coordinates. +// Uses Shamir's trick for simultaneous scalar multiplication (processes u1/u2 +// bits in parallel, selecting G/Q/G+Q/infinity per iteration). +// +// Hardcoded: G (generator), Q = 2G (public key), G+Q = 3G (precomputed sum) +// +// Protocol: +// 1. Assert valid and hold z, r, s stable until ready pulses +// 2. ready pulses high for one cycle when verification completes +// 3. When ready, check verif_passed: 1 = signature verification passed, 0 = signature verification failed +// +// FSM: +// +// StIdle -> StPrepare -> StAdd -> StDouble -> StAdd -> StFinalize -> StIdle +// ^ | +// |__________| +// +// Note: could do the same skip StAdd optimization as in mod_mul but +// PC loading does not currently support re-running the same state (StDouble after StDouble) + + +module ecdsa + import arith_pkg::*; // import in module header to be used in port list + import secp256k1_pkg::*; +( + input logic clk, + input logic rst_n, + input logic valid, + input logic [WIDTH-1:0] z, + input logic [WIDTH-1:0] r, + input logic [WIDTH-1:0] s, + + output logic ready, + output logic verif_passed +); + + // ------------------------------------------------------------------------- + // Types and Constants + // ------------------------------------------------------------------------- + + typedef logic [4:0] all_addr_t; + + // Register file indices + typedef enum all_addr_t { + T0, T1, T2, T3, T4, T5, + X3, Y3, Z3, + X1, Y1, Z1, + X2, Y2, Z2, + A1, B3, // constants, not actual registers + NUM_ADDRS // last element to contain the total number of addresses + } all_addr_e; + + localparam int NUM_CONSTS = 2; + localparam int NUM_REGS = int'(NUM_ADDRS) - NUM_CONSTS; // number of actual registers + + typedef logic [$clog2(NUM_REGS)-1:0] reg_addr_t; + + localparam int BITCNT_W = $clog2(WIDTH); // Bit Counter Width + + // Public key Q (derived from G and Private key d) + localparam logic [WIDTH-1:0] + Q_X = 256'hc6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5, + Q_Y = 256'h1ae168fea63dc339a3c58419466ceaeef7f632653266d0e1236431a950cfe52a, + Q_Z = 1; + + // Precomputed G + Q (assumed to be computed together with Q, so not implementing the addition here) + localparam logic [WIDTH-1:0] + GPQ_X = 256'hf9308a019258c31049344f85f89d5229b531c845836f99b08601f113bce036f9, + GPQ_Y = 256'h388f7b0f632de8140fe337e62a37f3566500a99934c2231b6cb9fd7584b8e672, + GPQ_Z = 1; + + // Point at infinity (z = 0) + localparam logic [WIDTH-1:0] + INF_X = 0, + INF_Y = 1, + INF_Z = 0; + + // ------------------------------------------------------------------------- + // Instruction ROM + // ------------------------------------------------------------------------- + + typedef struct packed { + op_e op; + all_addr_t src1; + all_addr_t src2; + all_addr_t dst; + // Note: dst can only be register (not constant) so reg_addr_t could also work, + // but the reg enums are using all_addr_t, so using that avoids casting + } instr_t; + + // Segment lengths and PC width + // Note: tried assigning the programs to separate arrays to qurry lengths + // but verilator had issues with concatenating those + localparam int PREPARE_LEN = 3; + localparam int POINT_ADD_LEN = 40; + localparam int FINALIZE_LEN = 3; + localparam int ROM_SIZE = PREPARE_LEN + POINT_ADD_LEN + FINALIZE_LEN; + localparam int PC_WIDTH = $clog2(ROM_SIZE); + + typedef logic [PC_WIDTH-1:0] pc_t; + + localparam instr_t PROGRAM [ROM_SIZE] = '{ + + // --- Prepare (mod n) --- + // w = s^(-1) mod n; u1 = z*w mod n; u2 = r*w mod n + // Assumes t0=s, t1=z, t2=r + /* 1 */ '{op: OP_INV, src1: T0, src2: T0, dst: T0}, // t0 = inv(t0) + /* 2 */ '{op: OP_MUL, src1: T1, src2: T0, dst: T1}, // t1 = t1 * t0 + /* 3 */ '{op: OP_MUL, src1: T2, src2: T0, dst: T2}, // t2 = t2 * t0 + + // --- Point addition (Renes-Costello-Batina, 40 steps) --- + /* 1 */ '{op: OP_MUL, src1: X1, src2: X2, dst: T0}, // t0 = x1*x2 + /* 2 */ '{op: OP_MUL, src1: Y1, src2: Y2, dst: T1}, // t1 = y1*y2 + /* 3 */ '{op: OP_MUL, src1: Z1, src2: Z2, dst: T2}, // t2 = z1*z2 + /* 4 */ '{op: OP_ADD, src1: X1, src2: Y1, dst: T3}, // t3 = x1+y1 + /* 5 */ '{op: OP_ADD, src1: X2, src2: Y2, dst: T4}, // t4 = x2+y2 + /* 6 */ '{op: OP_MUL, src1: T3, src2: T4, dst: T3}, // t3 = t3*t4 + /* 7 */ '{op: OP_ADD, src1: T0, src2: T1, dst: T4}, // t4 = t0+t1 + /* 8 */ '{op: OP_SUB, src1: T3, src2: T4, dst: T3}, // t3 = t3-t4 + /* 9 */ '{op: OP_ADD, src1: X1, src2: Z1, dst: T4}, // t4 = x1+z1 + /* 10 */ '{op: OP_ADD, src1: X2, src2: Z2, dst: T5}, // t5 = x2+z2 + /* 11 */ '{op: OP_MUL, src1: T4, src2: T5, dst: T4}, // t4 = t4*t5 + /* 12 */ '{op: OP_ADD, src1: T0, src2: T2, dst: T5}, // t5 = t0+t2 + /* 13 */ '{op: OP_SUB, src1: T4, src2: T5, dst: T4}, // t4 = t4-t5 + /* 14 */ '{op: OP_ADD, src1: Y1, src2: Z1, dst: T5}, // t5 = y1+z1 + /* 15 */ '{op: OP_ADD, src1: Y2, src2: Z2, dst: X3}, // x3 = y2+z2 + /* 16 */ '{op: OP_MUL, src1: T5, src2: X3, dst: T5}, // t5 = t5*x3 + /* 17 */ '{op: OP_ADD, src1: T1, src2: T2, dst: X3}, // x3 = t1+t2 + /* 18 */ '{op: OP_SUB, src1: T5, src2: X3, dst: T5}, // t5 = t5-x3 + /* 19 */ '{op: OP_MUL, src1: A1, src2: T4, dst: Z3}, // z3 = a1*t4 + /* 20 */ '{op: OP_MUL, src1: B3, src2: T2, dst: X3}, // x3 = b3*t2 + /* 21 */ '{op: OP_ADD, src1: X3, src2: Z3, dst: Z3}, // z3 = x3+z3 + /* 22 */ '{op: OP_SUB, src1: T1, src2: Z3, dst: X3}, // x3 = t1-z3 + /* 23 */ '{op: OP_ADD, src1: T1, src2: Z3, dst: Z3}, // z3 = t1+z3 + /* 24 */ '{op: OP_MUL, src1: X3, src2: Z3, dst: Y3}, // y3 = x3*z3 + /* 25 */ '{op: OP_ADD, src1: T0, src2: T0, dst: T1}, // t1 = t0+t0 + /* 26 */ '{op: OP_ADD, src1: T1, src2: T0, dst: T1}, // t1 = t1+t0 + /* 27 */ '{op: OP_MUL, src1: A1, src2: T2, dst: T2}, // t2 = a1*t2 + /* 28 */ '{op: OP_MUL, src1: B3, src2: T4, dst: T4}, // t4 = b3*t4 + /* 29 */ '{op: OP_ADD, src1: T1, src2: T2, dst: T1}, // t1 = t1+t2 + /* 30 */ '{op: OP_SUB, src1: T0, src2: T2, dst: T2}, // t2 = t0-t2 + /* 31 */ '{op: OP_MUL, src1: A1, src2: T2, dst: T2}, // t2 = a1*t2 + /* 32 */ '{op: OP_ADD, src1: T4, src2: T2, dst: T4}, // t4 = t4+t2 + /* 33 */ '{op: OP_MUL, src1: T1, src2: T4, dst: T0}, // t0 = t1*t4 + /* 34 */ '{op: OP_ADD, src1: Y3, src2: T0, dst: Y1}, // y1 = y3+t0 + /* 35 */ '{op: OP_MUL, src1: T5, src2: T4, dst: T0}, // t0 = t5*t4 + /* 36 */ '{op: OP_MUL, src1: T3, src2: X3, dst: X3}, // x3 = t3*x3 + /* 37 */ '{op: OP_SUB, src1: X3, src2: T0, dst: X1}, // x1 = x3-t0 + /* 38 */ '{op: OP_MUL, src1: T3, src2: T1, dst: T0}, // t0 = t3*t1 + /* 39 */ '{op: OP_MUL, src1: T5, src2: Z3, dst: Z3}, // z3 = t5*z3 + /* 40 */ '{op: OP_ADD, src1: Z3, src2: T0, dst: Z1}, // z1 = z3+t0 + + // --- Finalize (mod p) --- + // z_inv = z1^(-1) mod p; x_affine = x1*z_inv; result = x_affine - r + // Assumes t2=r (restored from r input before entering finalize) + /* 1 */ '{op: OP_INV, src1: Z1, src2: Z1, dst: T0}, // t0 = inv(z1) + /* 2 */ '{op: OP_MUL, src1: X1, src2: T0, dst: T0}, // t0 = x1*t0 + /* 3 */ '{op: OP_SUB, src1: T0, src2: T2, dst: T0} // t0 = t0-t2 + }; + + // Segment boundaries + localparam int ROM_START = 0; + localparam int PREPARE_START = ROM_START; + localparam int PREPARE_END = PREPARE_START + PREPARE_LEN - 1; + localparam int POINT_ADD_START = PREPARE_END + 1; + localparam int POINT_ADD_END = POINT_ADD_START + POINT_ADD_LEN - 1; + localparam int FINALIZE_START = POINT_ADD_END + 1; + localparam int FINALIZE_END = FINALIZE_START + FINALIZE_LEN - 1; + + // Array to collect the PC values where execution should automatically stop + localparam int PROGRAM_ENDS [3] = '{PREPARE_END, POINT_ADD_END, FINALIZE_END}; + + // ------------------------------------------------------------------------- + // FSM states + // ------------------------------------------------------------------------- + + typedef enum logic [2:0] { + StIdle, + StPrepare, + StAdd, + StDouble, + StFinalize + } state_e; + + // ------------------------------------------------------------------------- + // Registers + // ------------------------------------------------------------------------- + + // FSM state + state_e state_q, state_d; + + // Register file + logic [WIDTH-1:0] reg_file_q [NUM_REGS]; + logic [WIDTH-1:0] reg_file_d [NUM_REGS]; + + // Other registers + pc_t pc_q, pc_d; + logic [WIDTH-1:0] u1_q, u1_d; + logic [WIDTH-1:0] u2_q, u2_d; + logic [BITCNT_W-1:0] bit_pos_q, bit_pos_d; + + // ------------------------------------------------------------------------- + // Instruction decode + // ------------------------------------------------------------------------- + + instr_t current_instr; + assign current_instr = PROGRAM[pc_q]; + + // only Prepare requires PRIME_N + wire [WIDTH-1:0] modulus = (int'(pc_q) <= PREPARE_END) ? PRIME_N : PRIME_P; + + // ------------------------------------------------------------------------- + // Register file access helpers + // ------------------------------------------------------------------------- + + function automatic logic [WIDTH-1:0] reg_read(input all_addr_t addr); + case (addr) + // A1 and B3 are constants, not part of the actual register file + A1 : return CURVE_A1; + B3 : return CURVE_B3; + // casting might be needed if the actual register file array requires less bit(s) for indexing + default : return reg_file_q[reg_addr_t'(addr)]; + endcase + endfunction + + function automatic void reg_write(input all_addr_t addr, input logic [WIDTH-1:0] val); + // Making it explicit to lint that discarding MSB is fine when the widths differ + // (The addresses of the constants should not be used for reg_write) + if ( addr[$size(all_addr_t)-1] || + !addr[$size(all_addr_t)-1]) begin + + // casting might be needed if the actual register file array requires less bit(s) for indexing + reg_file_d[reg_addr_t'(addr)] = val; + end + endfunction + + // ------------------------------------------------------------------------- + // Arith instance + // ------------------------------------------------------------------------- + + // arith block enable register + logic arith_valid_q, arith_valid_d; + + // Outputs, used in FSM always_comb + logic arith_ready; // used to increment the PC and sample arith_result + logic [WIDTH-1:0] arith_result; // stored in current_instr.dst register + + arith u_arith ( + .clk (clk), + .rst_n (rst_n), + .valid (arith_valid_q), + .op (current_instr.op), + .a (reg_read(current_instr.src1)), + .b (reg_read(current_instr.src2)), + .modulus (modulus), + .ready (arith_ready), + .result (arith_result) + ); + + // ------------------------------------------------------------------------- + // Shamir's trick point selection + // ------------------------------------------------------------------------- + + logic [WIDTH-1:0] sel_x, sel_y, sel_z; + + always_comb begin + // REVISIT - shift register approach to access u1 and u2 bits could be much less gates + unique case ({u2_q[bit_pos_q], u1_q[bit_pos_q]}) + 2'b00: begin sel_x = INF_X; sel_y = INF_Y; sel_z = INF_Z; end + 2'b01: begin sel_x = G_X; sel_y = G_Y; sel_z = G_Z; end + 2'b10: begin sel_x = Q_X; sel_y = Q_Y; sel_z = Q_Z; end + 2'b11: begin sel_x = GPQ_X; sel_y = GPQ_Y; sel_z = GPQ_Z; end + default: ; + endcase + end + + // ------------------------------------------------------------------------- + // PC — combinational next-state + // ------------------------------------------------------------------------- + always_comb begin + // hold by default + pc_d = pc_q; + + if (arith_ready) begin + // Increment whenever arithmetic block ready + pc_d = pc_q + 1; + end else if (state_d != state_q) begin + // Load new value when FSM state changes (should not coincide with arith_ready) + case (state_d) + StPrepare: pc_d = pc_t'(PREPARE_START); + StAdd: pc_d = pc_t'(POINT_ADD_START); + StDouble: pc_d = pc_t'(POINT_ADD_START); + StFinalize: pc_d = pc_t'(FINALIZE_START); + default: ; // no need to load for the other states + endcase + end + end + + // ------------------------------------------------------------------------- + // FSM — combinational next-state and data path + // ------------------------------------------------------------------------- + + always_comb begin + // Outputs (inactive by default) + ready = 1'b0; + verif_passed = 1'b0; + + // Simple registers (hold by default) + state_d = state_q; + u1_d = u1_q; + u2_d = u2_q; + bit_pos_d = bit_pos_q; + arith_valid_d = arith_valid_q; + + foreach (reg_file_d[i]) begin + reg_file_d[i] = reg_file_q[i]; + end + + // Handle running the program here centrally for all states + if (arith_ready) begin + + // When arith block ready, store result + reg_write(current_instr.dst, arith_result); + + // If end of program reached, stop the program + if (int'(pc_q) inside {PROGRAM_ENDS}) begin + arith_valid_d = 1'b0; + end + end + + // State machine + unique case (state_q) + // ----------------------------------------------------------------- + StIdle: begin + if (valid) begin + // Initialize P1 accumulator to point infinity + reg_write(X1, INF_X); + reg_write(Y1, INF_Y); + reg_write(Z1, INF_Z); + + // Move to next state + state_d = StPrepare; + end + end + + // ----------------------------------------------------------------- + StPrepare: begin + + // PC loading handled in separate always_comb + + // If program not started yet, load the inputs and start the program + if (!arith_valid_q && int'(pc_q) == PREPARE_START) begin + + reg_write(T0, s); + reg_write(T1, z); + reg_write(T2, r); + + arith_valid_d = 1'b1; + end + + // Nothing to do here when the program running, it's handled outside the case statement + + // When program finished, store the u1, u2 results, initialize loop counter and move to next state + if (!arith_valid_q && int'(pc_q) != PREPARE_START) begin + u1_d = reg_read(T1); + u2_d = reg_read(T2); + bit_pos_d = BITCNT_W'(WIDTH-1); + + state_d = StAdd; + end + end + + // ----------------------------------------------------------------- + StAdd: begin + + // PC loading handled in separate always_comb + + // If program not started yet, load the inputs and start the program + if (!arith_valid_q && int'(pc_q) == POINT_ADD_START) begin + // P2 = selected_point (for P1 += P2) + reg_write(X2, sel_x); + reg_write(Y2, sel_y); + reg_write(Z2, sel_z); + + arith_valid_d = 1'b1; + end + + // Nothing to do here when the program running, it's handled outside the case statement + + // When program finished, move to next state (results already in the P1 accumulator) + if (!arith_valid_q && int'(pc_q) != POINT_ADD_START) begin + // Stop condition: last bit (doubling not needed then) + state_d = (bit_pos_q != '0) ? StDouble : StFinalize; + end + end + + // ----------------------------------------------------------------- + StDouble: begin + + // PC loading handled in separate always_comb + + // If program not started yet, load the inputs and start the program + if (!arith_valid_q && int'(pc_q) == POINT_ADD_START) begin + // P2 = P1 (for P1 + P2 = 2*P1) + reg_write(X2, reg_read(X1)); + reg_write(Y2, reg_read(Y1)); + reg_write(Z2, reg_read(Z1)); + + arith_valid_d = 1'b1; + end + + // Nothing to do here when the program running, it's handled outside the case statement + + // When program finished, move back to add state and decrement bit counter (results already in the P1 accumulator) + if (!arith_valid_q && int'(pc_q) != POINT_ADD_START) begin + bit_pos_d = bit_pos_q - 1; + // Note: could do the same skip StAdd optimization as in mod_mul but + // PC loading does not currently support re-running the same state (StDouble after StDouble) + state_d = StAdd; + end + end + + // ----------------------------------------------------------------- + StFinalize: begin + + // PC loading handled in separate always_comb + + // If program not started yet, load the inputs and start the program + if (!arith_valid_q && int'(pc_q) == FINALIZE_START) begin + // X1, Y1, Z1 are already in the corresponding registers + reg_write(T2, r); + + arith_valid_d = 1'b1; + end + + // Nothing to do here when the program running, it's handled outside the case statement + + // When program finished, check the result and move back to idle + if (!arith_valid_q && int'(pc_q) != FINALIZE_START) begin + ready = 1'b1; + verif_passed = (reg_read(T0) == '0); + + state_d = StIdle; + end + end + + default: ; + endcase + end + + // ------------------------------------------------------------------------- + // Sequential: register updates, asynchronous active-low reset + // ------------------------------------------------------------------------- + + // Register file registers + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + for (int i = 0; i < NUM_REGS; i++) begin + reg_file_q[i] <= '0; + end + end else begin + for (int i = 0; i < NUM_REGS; i++) begin + reg_file_q[i] <= reg_file_d[i]; + end + end + end + + // FSM state register + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) state_q <= StIdle; + else state_q <= state_d; + end + + // PC register + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) pc_q <= '0; + else pc_q <= pc_d; + end + + // arith_valid register + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) arith_valid_q <= 1'b0; + else arith_valid_q <= arith_valid_d; + end + + // u1, u2, and bit_pos registers + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + u1_q <= '0; + u2_q <= '0; + bit_pos_q <= '0; + end else begin + u1_q <= u1_d; + u2_q <= u2_d; + bit_pos_q <= bit_pos_d; + end + end + +endmodule diff --git a/verilog/rtl/mod_add.sv b/verilog/rtl/mod_add.sv new file mode 100644 index 0000000..f0405f7 --- /dev/null +++ b/verilog/rtl/mod_add.sv @@ -0,0 +1,158 @@ +// Mod_add - Simple modular addition / subtraction +// +// Computes (x ± y) mod modulus, where the subtract signal determines the sense of the operation. +// +// State machine: StAdd -> StAdjust -> StAdd ... +// Processing one modular operation in each StAdd -> StAdjust cycle. +// If there is no new request, the FSM remains in StAdd until a new request arrives. + + +module mod_add + import arith_pkg::*; // import in module header to be used in port list +( + input logic clk, + input logic rst_n, + input logic valid, + input logic [WIDTH-1:0] a, + input logic [WIDTH-1:0] b, + input logic [WIDTH-1:0] modulus, + input logic subtract, + + output logic ready, + output logic [WIDTH-1:0] result, + output logic adjust +); + + // FSM enum + typedef enum logic { + StAdd, + StAdjust + } state_e; + + // --------------------------------------------------------------------------- + // Registers + // --------------------------------------------------------------------------- + + // FSM state + state_e state_q; + state_e state_d; + + // Intermediate result + logic [WIDTH-1:0] result_ab_q; + logic [WIDTH-1:0] result_ab_d; + + // Intermediate carry + logic carry_ab_q; + logic carry_ab_d; + + // --------------------------------------------------------------------------- + // Adder instance + // --------------------------------------------------------------------------- + + // Adder inputs + logic [WIDTH-1:0] adder_a; + logic [WIDTH-1:0] adder_b; + logic adder_subtract; + + // Adder outputs + logic [WIDTH-1:0] adder_result; + logic adder_carry_out; + + // Additional adder_ready signal considered in the FSM to support sequential adders as well + // adder_ready is always 1 (comb_add has no latency) + wire adder_ready = 1'b1; + + comb_add u_comb_add ( + .a (adder_a), + .b (adder_b), + .subtract (adder_subtract), + .result (adder_result), + .carry_out (adder_carry_out) + ); + + // --------------------------------------------------------------------------- + // FSM + // --------------------------------------------------------------------------- + + // FSM: combinational next-state, and output decode (including adder inputs) + data registers controlled by the FSM + always_comb begin + // Defaults + + // Next state (maintain by default) + state_d = state_q; + + // Adder inputs (unsused defaults, always overridden) + adder_a = 'x; + adder_b = 'x; + adder_subtract = 1'bx; + + // Module outputs (masked when inactive) + ready = 1'b0; + result = '0; + adjust = 1'b0; + + // Data registers (maintain by default) + result_ab_d = result_ab_q; + carry_ab_d = carry_ab_q; + + + unique case (state_q) + StAdd: begin + // Adder computes a ± b + adder_a = a; + adder_b = b; + adder_subtract = subtract; + + // sample adder results and move to next state when inputs are valid and adder is ready + if (valid && adder_ready) begin + result_ab_d = adder_result; + carry_ab_d = adder_carry_out; + state_d = StAdjust; + end + end + StAdjust: begin + // Adder computes result_ab ∓ modulus for correction (subtract sense inverted) + adder_a = result_ab_q; + adder_b = modulus; + adder_subtract = ~subtract; + + // assign final results when adder is ready + // new adder result is discarded if adjust is not needed + if (adder_ready) begin + ready = 1'b1; + // assign adjust first and then reuse for result + adjust = // when a-b is negative (carry_ab=1) + ( subtract && carry_ab_q) || + // when a+b overflowed (carry_ab=1), can't rely on adder_carry_out + (!subtract && carry_ab_q) || + // when a+b did not overflow but subtracting m does not make it negative + (!subtract && ~adder_carry_out); + result = adjust ? adder_result : result_ab_q; + + state_d = StAdd; + end + end + default: ; // empty - defaults are set outside the case statement + endcase + end + + // Sequential: register updates, asynchronous active-low reset + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + state_q <= StAdd; + end else begin + state_q <= state_d; + end + end + + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + result_ab_q <= '0; + carry_ab_q <= 1'b0; + end else begin + result_ab_q <= result_ab_d; + carry_ab_q <= carry_ab_d; + end + end + +endmodule diff --git a/verilog/rtl/mod_inv.sv b/verilog/rtl/mod_inv.sv new file mode 100644 index 0000000..d496a58 --- /dev/null +++ b/verilog/rtl/mod_inv.sv @@ -0,0 +1,308 @@ +// Mod_inv - Modular inverse via Binary Extended GCD +// +// Computes a^(-1) mod modulus, or reports that the inverse does not exist. +// Assumes the modulus is an odd prime (secp256k1 field prime or curve order). +// +// Drives an external mod_add instance for all arithmetic; the caller wires +// mod_add_{valid,a,b,subtract} to the mod_add inputs and feeds +// mod_add_{result,ready,adjust} back as inputs to this module. +// +// Protocol: +// 1. Assert valid and hold a, modulus stable until ready pulses +// 2. Wire the external mod_add as directed by mod_add_valid/a/b/subtract +// 3. Feed mod_add_result, mod_add_ready, mod_add_adjust back as inputs +// 4. ready pulses high for one cycle when result is available +// 5. When ready, check exists: 1 = result is the inverse, 0 = no inverse +// +// State machine: +// _________________________________________________________ +// | _______________________ | +// | | | | +// | | -> StDiv2Add -> StDiv2P1 | +// v v | | +// StIdle -> StOpSel -|-> StSubRems -> StSubRemsRev (Conditional) -> StSubCoeffs +// | +// -> StDone -> StIdle + + +module mod_inv + import arith_pkg::*; // import in module header to be used in port list +( + input logic clk, + input logic rst_n, + // Control + input logic valid, + input logic [WIDTH-1:0] a, + input logic [WIDTH-1:0] modulus, + + // External mod_add resp + input logic mod_add_ready, + input logic [WIDTH-1:0] mod_add_result, + input logic mod_add_adjust, + + // Result + output logic ready, + output logic exists, + output logic [WIDTH-1:0] result, + + // External mod_add req + output logic mod_add_valid, + output logic [WIDTH-1:0] mod_add_a, + output logic [WIDTH-1:0] mod_add_b, + output logic mod_add_subtract +); + + // FSM states + typedef enum logic [2:0] { + StIdle, + StOpSel, + StDiv2Add, + StDiv2P1, + StSubRems, + StSubRemsRev, + StSubCoeffs, + StDone + } state_e; + + // --------------------------------------------------------------------------- + // Registers + // --------------------------------------------------------------------------- + + // FSM state + state_e state_q, state_d; + + // Remainders and a's Bezout coefficients + // followings must always hold: + // a*s == u mod m (same as a*s + m*x == u mod m) + // a*t == v mod m (same as a*t + m*y == v mod m) + // Note: coefficient for the modulus vanishes due to mod m arithmetic, so no need to track those + logic [WIDTH-1:0] u_rem_q, u_rem_d; + logic [WIDTH-1:0] v_rem_q, v_rem_d; + logic [WIDTH-1:0] s_coeff_q, s_coeff_d; + logic [WIDTH-1:0] t_coeff_q, t_coeff_d; + + // Helper flags + logic reduced_unv_q, reduced_unv_d; // 1 = u was reduced, 0 = v was reduced + logic div2_unv_q, div2_unv_d; // 1 = dividing u/s, 0 = dividing v/t + logic div2_coeff_odd_q, div2_coeff_odd_d; // was the original coefficient odd? + + // --------------------------------------------------------------------------- + // Combinational helpers + // --------------------------------------------------------------------------- + + // Select current coefficient based on div2_unv + wire [WIDTH-1:0] div2_coeff = div2_unv_q ? s_coeff_q : t_coeff_q; + + // --------------------------------------------------------------------------- + // FSM — combinational next-state, output decode, data register inputs + // --------------------------------------------------------------------------- + + always_comb begin + // Outputs (inactive by default) + ready = 1'b0; + result = '0; + exists = 1'b0; + + // Registers (hold value by default) + state_d = state_q; + u_rem_d = u_rem_q; + v_rem_d = v_rem_q; + s_coeff_d = s_coeff_q; + t_coeff_d = t_coeff_q; + reduced_unv_d = reduced_unv_q; + div2_unv_d = div2_unv_q; + div2_coeff_odd_d = div2_coeff_odd_q; + + // mod_add outputs (masked when inactive) + mod_add_valid = 1'b0; + mod_add_a = '0; + mod_add_b = '0; + mod_add_subtract = 1'b0; + + unique case (state_q) + // ----------------------------------------------------------------- + StIdle: begin + if (valid) begin + u_rem_d = a; // u = a + v_rem_d = modulus; // v = b = modulus + s_coeff_d = 1; // a*s == u mod m => s = 1 + t_coeff_d = 0; // a*t == v mod m => t = 0 + state_d = StOpSel; + end + end + + // ----------------------------------------------------------------- + StOpSel: begin + if (u_rem_q == '0) begin + // Termination: gcd found + state_d = StDone; + end else if (!u_rem_q[0]) begin + // u is even: divide u/s pair + div2_unv_d = 1'b1; + div2_coeff_odd_d = s_coeff_q[0]; + state_d = StDiv2Add; + end else if (!v_rem_q[0]) begin + // v is even: divide v/t pair + div2_unv_d = 1'b0; + div2_coeff_odd_d = t_coeff_q[0]; + state_d = StDiv2Add; + end else begin + // Both odd: subtract remainders + state_d = StSubRems; + end + end + + // ----------------------------------------------------------------- + // Div2: divide remainder by 2, adjust coefficient + // + // r = r >> 1 + // if c is even: c = c >> 1 + // if c is odd: c = (c >> 1) + (mod >> 1), then c = c + 1 + // + // The odd case is split across StDiv2Add and StDiv2P1 to avoid + // exceeding WIDTH bits in the intermediate (c + mod) value. + // ----------------------------------------------------------------- + StDiv2Add: begin + // Drive mod_add: (c >> 1) + (mod >> 1) in case it's needed + mod_add_valid = 1'b1; + mod_add_a = div2_coeff >> 1; + mod_add_b = modulus >> 1; + mod_add_subtract = 1'b0; + + if (mod_add_ready) begin + // Shift the remainder and update coefficient based on parity + if (div2_unv_q) begin + u_rem_d = u_rem_q >> 1; + s_coeff_d = div2_coeff[0] ? mod_add_result : (div2_coeff >> 1); + end else begin + v_rem_d = v_rem_q >> 1; + t_coeff_d = div2_coeff[0] ? mod_add_result : (div2_coeff >> 1); + end + div2_coeff_odd_d = div2_coeff[0]; + state_d = StDiv2P1; + end + end + + // ----------------------------------------------------------------- + StDiv2P1: begin + // Drive mod_add: c + 1 in case it's needed + mod_add_valid = 1'b1; + mod_add_a = div2_coeff; + mod_add_b = 1; + + if (mod_add_ready) begin + // Apply the +1 only if original coefficient was odd + if (div2_coeff_odd_q) begin + if (div2_unv_q) + s_coeff_d = mod_add_result; + else + t_coeff_d = mod_add_result; + end + state_d = StOpSel; + end + end + + // ----------------------------------------------------------------- + StSubRems: begin + // Try u - v + mod_add_valid = 1'b1; + mod_add_a = u_rem_q; + mod_add_b = v_rem_q; + mod_add_subtract = 1'b1; + + if (mod_add_ready) begin + if (!mod_add_adjust) begin + // No underflow: u >= v + u_rem_d = mod_add_result; + reduced_unv_d = 1'b1; + state_d = StSubCoeffs; + end else begin + // Underflow: u < v, need reverse subtraction + state_d = StSubRemsRev; + end + end + end + + // ----------------------------------------------------------------- + StSubRemsRev: begin + // v - u (guaranteed no underflow) + mod_add_valid = 1'b1; + mod_add_a = v_rem_q; + mod_add_b = u_rem_q; + mod_add_subtract = 1'b1; + + if (mod_add_ready) begin + v_rem_d = mod_add_result; + reduced_unv_d = 1'b0; + state_d = StSubCoeffs; + end + end + + // ----------------------------------------------------------------- + StSubCoeffs: begin + // If u was reduced: s = s - t, else: t = t - s + mod_add_valid = 1'b1; + mod_add_a = reduced_unv_q ? s_coeff_q : t_coeff_q; + mod_add_b = reduced_unv_q ? t_coeff_q : s_coeff_q; + mod_add_subtract = 1'b1; + + if (mod_add_ready) begin + if (reduced_unv_q) + s_coeff_d = mod_add_result; + else + t_coeff_d = mod_add_result; + state_d = StOpSel; + end + end + + // ----------------------------------------------------------------- + StDone: begin + state_d = StIdle; + ready = 1'b1; + exists = (v_rem_q == 256'd1); // ignoring m = 0/1 cases (assuming large prime) + result = exists ? t_coeff_q : '0; + end + + default: ; // empty — defaults are set outside the case statement + endcase + end + + // --------------------------------------------------------------------------- + // Sequential: register updates, asynchronous active-low reset + // --------------------------------------------------------------------------- + + // State register + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) state_q <= StIdle; + else state_q <= state_d; + end + + // Remainders and coefficients registers + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + u_rem_q <= '0; + v_rem_q <= '0; + s_coeff_q <= '0; + t_coeff_q <= '0; + end else begin + u_rem_q <= u_rem_d; + v_rem_q <= v_rem_d; + s_coeff_q <= s_coeff_d; + t_coeff_q <= t_coeff_d; + end + end + + // Helper flags registers + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + reduced_unv_q <= 1'b0; + div2_unv_q <= 1'b0; + div2_coeff_odd_q <= 1'b0; + end else begin + reduced_unv_q <= reduced_unv_d; + div2_unv_q <= div2_unv_d; + div2_coeff_odd_q <= div2_coeff_odd_d; + end + end + +endmodule diff --git a/verilog/rtl/mod_mul.sv b/verilog/rtl/mod_mul.sv new file mode 100644 index 0000000..c1defda --- /dev/null +++ b/verilog/rtl/mod_mul.sv @@ -0,0 +1,182 @@ +// Mod_mul - Modular multiplication via binary shift-and-add (using modular add and double) +// +// Computes (a * b) mod modulus. +// Drives an external mod_add instance for all additions; the caller wires +// mod_add_{valid,a,b,subtract} to the mod_add inputs and feeds +// mod_add_{result,ready} back as inputs to this module. +// +// Protocol: +// 1. Assert valid and hold a, b stable until ready pulses +// 2. Wire the external mod_add as directed by mod_add_valid/a/b/subtract +// 3. Feed mod_add_result and mod_add_ready back as inputs +// 4. ready pulses high for one cycle when result is available +// +// State machine: StIdle -> StAdd (conditional) -> StDone -> StIdle +// ^ | +// |__StDouble__| + +module mod_mul + import arith_pkg::*; // import in module header to be used in port list +( + input logic clk, + input logic rst_n, + // Control + input logic valid, + // Operands (held stable throughout computation) + input logic [WIDTH-1:0] a, + input logic [WIDTH-1:0] b, + // input logic [WIDTH-1:0] modulus, // feeding the modulus to mod_add is taken care of in the arith block + + // External mod_add resp + input logic mod_add_ready, + input logic [WIDTH-1:0] mod_add_result, + + // Result + output logic ready, + output logic [WIDTH-1:0] result, + + // External mod_add req + output logic mod_add_valid, + output logic [WIDTH-1:0] mod_add_a, + output logic [WIDTH-1:0] mod_add_b, + output logic mod_add_subtract // always 0 for mod_mul (add and double) +); + + // FSM states + typedef enum logic [1:0] { + StIdle, + StAdd, + StDouble, + StDone + } state_e; + + // --------------------------------------------------------------------------- + // Registers + // --------------------------------------------------------------------------- + + // FSM state + state_e state_q, state_d; + + // REVISIT - could use MSB first shift-and-add to avoid having a register for the multiplicand + // multiplicand "left-shifted", actually doubled via modular self-add (no real shifting happens) + logic [WIDTH-1:0] multiplicand_lsh_q, multiplicand_lsh_d; + + // REVISIT - could also use a mux to index the multiplier bits (though that is also significant gate count) + // multiplier right-shifted, here we do real shifting and always check the LSB only + logic [WIDTH-1:0] multiplier_rsh_q, multiplier_rsh_d; + + // Accumulates the result after each addition step; holds the final result at the end + logic [WIDTH-1:0] result_acc_q, result_acc_d; + + // --------------------------------------------------------------------------- + // FSM — combinational next-state, output decode, data register inputs + // --------------------------------------------------------------------------- + + always_comb begin + // Outputs (inactive by default) + ready = 1'b0; + result = '0; + + // Registers (hold value by default) + state_d = state_q; + multiplicand_lsh_d = multiplicand_lsh_q; + multiplier_rsh_d = multiplier_rsh_q; + result_acc_d = result_acc_q; + + // mod_add outputs (masked when inactive) + mod_add_valid = 1'b0; + mod_add_a = '0; + mod_add_b = '0; + mod_add_subtract = 1'b0; + + unique case (state_q) + StIdle: begin + if (valid) begin + multiplicand_lsh_d = a; + multiplier_rsh_d = b; + result_acc_d = '0; + state_d = StAdd; + end + end + + StAdd: begin + // Drive mod_add: acc + multiplicand_lsh + mod_add_valid = 1'b1; + mod_add_a = result_acc_q; + mod_add_b = multiplicand_lsh_q; + mod_add_subtract = 1'b0; + + if (mod_add_ready) begin + if (multiplier_rsh_q[0]) begin + result_acc_d = mod_add_result; + end + + // Check stop condition (are all other multiplier bits zero?) + state_d = (multiplier_rsh_q[WIDTH-1:1] != '0) ? StDouble : StDone; + end + end + + StDouble: begin + // Drive mod_add: multiplicand_lsh * 2 (via self-add) + mod_add_valid = 1'b1; + mod_add_a = multiplicand_lsh_q; + mod_add_b = multiplicand_lsh_q; + mod_add_subtract = 1'b0; + + if (mod_add_ready) begin + multiplicand_lsh_d = mod_add_result; + multiplier_rsh_d = multiplier_rsh_q >> 1; + + // Optimization: skip StAdd when next LSB=0 + // Simulation time and run cycles saving might be significant + state_d = multiplier_rsh_q[1] ? StAdd : StDouble; + end + end + + StDone: begin + state_d = StIdle; + ready = 1'b1; + result = result_acc_q; + end + + default: ; // empty — defaults are set outside the case statement + endcase + end + + // --------------------------------------------------------------------------- + // Sequential: register updates, asynchronous active-low reset + // --------------------------------------------------------------------------- + + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + state_q <= StIdle; + end else begin + state_q <= state_d; + end + end + + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + multiplicand_lsh_q <= '0; + end else begin + multiplicand_lsh_q <= multiplicand_lsh_d; + end + end + + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + multiplier_rsh_q <= '0; + end else begin + multiplier_rsh_q <= multiplier_rsh_d; + end + end + + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + result_acc_q <= '0; + end else begin + result_acc_q <= result_acc_d; + end + end + +endmodule diff --git a/verilog/rtl/secp256k1_pkg.sv b/verilog/rtl/secp256k1_pkg.sv new file mode 100644 index 0000000..968abb6 --- /dev/null +++ b/verilog/rtl/secp256k1_pkg.sv @@ -0,0 +1,28 @@ +package secp256k1_pkg; + + import arith_pkg::*; + + // ------------------------------------------------------------------------- + // secp256k1 constants + // ------------------------------------------------------------------------- + + // field prime: p = 2^256 - 2^32 - 977 + // factored out 2**32 to avoid 2**256 overflow + localparam logic [WIDTH-1:0] PRIME_P = + 256'd2**32 * (256'd2**224 - 256'd1) - 256'd977; + + // scurve order n (no closed form exists) + localparam logic [WIDTH-1:0] PRIME_N = + 256'hFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE_BAAEDCE6AF48A03BBFD25E8CD0364141; + + // curve parameters: y² = x³ + ax + b, a=0, b=7 + localparam logic [WIDTH-1:0] CURVE_A1 = 1 * 0; // 1*a + localparam logic [WIDTH-1:0] CURVE_B3 = 3 * 7; // 3*b + + // Generator point G + localparam logic [WIDTH-1:0] + G_X = 256'h79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798, + G_Y = 256'h483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8, + G_Z = 1; // projective coordinate for affine points have Z=1 + +endpackage diff --git a/verilog/rtl/security_block.sv b/verilog/rtl/security_block.sv new file mode 100644 index 0000000..2cb182e --- /dev/null +++ b/verilog/rtl/security_block.sv @@ -0,0 +1,257 @@ +// Security Block +// +// Manages ECDSA-based license validation, a TRNG nonce source, an allowance +// counter, and a gated workload unit. +// +// Protocol: +// 1. On startup, waits INIT_DELAY then generates an initial nonce +// 2. nonce_ready pulses when a fresh nonce is available in nonce[] +// 3. Submit a license via valid-ready handshake: assert license_valid with +// (license_r, license_s); transfer completes when license_ready is high. +// The signature must be over the current nonce as the message hash. +// 4. On valid license: allowance += ALLOWANCE_INCREMENT (saturating), new nonce +// On invalid license: same nonce retained, can retry +// 5. Workload (signed 8-bit add) is gated: result is zeroed when allowance == 0 +// 6. Allowance decrements by 1 every cycle while > 0 + +module security_block + import arith_pkg::*; +# ( + localparam int unsigned ALLOW_W = 64, + localparam int unsigned WORKLD_W = 8 +)( + input logic clk, + input logic rst_n, + + // License interface (valid-ready) + input logic license_valid, + output logic license_ready, + input logic [WIDTH-1:0] license_r, + input logic [WIDTH-1:0] license_s, + + // Workload interface + input logic workload_valid, + input logic [WORKLD_W-1:0] workload_a, + input logic [WORKLD_W-1:0] workload_b, + + // TRNG seed (for simulation) + input logic [WIDTH-1:0] trng_seed, + input logic trng_load_seed, + + // Outputs + output logic [WIDTH-1:0] nonce, + output logic nonce_ready, + output logic [WORKLD_W-1:0] workload_result, + output logic result_valid, + output logic [ALLOW_W-1:0] allowance, + output logic enabled +); + + // ------------------------------------------------------------------------- + // Constants + // ------------------------------------------------------------------------- + + localparam int INIT_DELAY = 100; + localparam int DELAYCNT_W = $clog2(INIT_DELAY); // delay counter width + + localparam logic [ALLOW_W-1:0] ALLOWANCE_INCREMENT = 64'd1_000_000_000_000; + + // ------------------------------------------------------------------------- + // FSM states + // ------------------------------------------------------------------------- + + typedef enum logic [2:0] { + StInitDelay, + StRequestNonce, + StPublishAndWait, + StWaitVerify + } state_e; + + // ------------------------------------------------------------------------- + // Registers + // ------------------------------------------------------------------------- + + state_e state_q, state_d; + logic [ALLOW_W-1:0] allowance_q, allowance_d; + logic result_valid_q, result_valid_d; + logic [WORKLD_W-1:0] workload_result_q, workload_result_d; + logic [DELAYCNT_W-1:0] delay_cnt_q, delay_cnt_d; // counts init delay + + + // ------------------------------------------------------------------------- + // TRNG instance + // ------------------------------------------------------------------------- + + logic trng_request_new; + logic [WIDTH-1:0] trng_nonce; + logic trng_nonce_valid; + + trng u_trng ( + .clk (clk), + .rst_n (rst_n), + .enable (1'b1), + .request_new (trng_request_new), + .load_seed (trng_load_seed), + .seed (trng_seed), + .nonce (trng_nonce), + .nonce_valid (trng_nonce_valid) + ); + + // ------------------------------------------------------------------------- + // ECDSA instance + // ------------------------------------------------------------------------- + + // Input valid to be driven from the FSM + logic ecdsa_valid; + + // Outputs + logic ecdsa_ready; + logic ecdsa_verif_passed; + + ecdsa u_ecdsa ( + .clk (clk), + .rst_n (rst_n), + .valid (ecdsa_valid), + .z (trng_nonce), + .r (license_r), + .s (license_s), + .ready (ecdsa_ready), + .verif_passed (ecdsa_verif_passed) + ); + + // ------------------------------------------------------------------------- + // Allowance — combinational next value + // ------------------------------------------------------------------------- + + logic increment_allowance; + + // one bit wider for overflow check + wire [ALLOW_W:0] allowance_sum = {1'b0, allowance_q} + {1'b0, ALLOWANCE_INCREMENT}; + + always_comb begin + if (increment_allowance) + allowance_d = !allowance_sum[ALLOW_W] ? allowance_sum[ALLOW_W-1:0] : '1; // sum if no overflow, else max value (all 1s) + else if (allowance_q != 0) + allowance_d = allowance_q - 1; + else + allowance_d = '0; + end + + // ------------------------------------------------------------------------- + // Workload — combinational, pipelined one cycle + // ------------------------------------------------------------------------- + + assign workload_result_d = {WORKLD_W{enabled}} & (workload_a + workload_b); + assign result_valid_d = workload_valid; + + // ------------------------------------------------------------------------- + // FSM — combinational + // ------------------------------------------------------------------------- + + always_comb begin + // Register input defaults + state_d = state_q; + delay_cnt_d = delay_cnt_q; + + // Combinational signal defaults + trng_request_new = 1'b0; + nonce_ready = 1'b0; + nonce = '0; + ecdsa_valid = 1'b0; + license_ready = 1'b0; + increment_allowance = 1'b0; + + unique case (state_q) + + StInitDelay: begin + delay_cnt_d = delay_cnt_q + 1; + if (int'(delay_cnt_q) >= INIT_DELAY) + state_d = StRequestNonce; + end + + StRequestNonce: begin + trng_request_new = 1'b1; + state_d = StPublishAndWait; + end + + StPublishAndWait: begin + if (trng_nonce_valid) begin + nonce_ready = 1; + nonce = trng_nonce; + if (license_valid) begin + state_d = StWaitVerify; + end + end + end + + StWaitVerify: begin + ecdsa_valid = 1'b1; + if (ecdsa_ready) begin + license_ready = 1'b1; + if (ecdsa_verif_passed) begin + increment_allowance = 1'b1; + state_d = StRequestNonce; + end else begin + state_d = StPublishAndWait; + end + end + end + + default: ; + endcase + end + + // ------------------------------------------------------------------------- + // Assign register based outputs + // ------------------------------------------------------------------------- + + assign allowance = allowance_q; + assign enabled = (allowance_q != 0) ? 1'b1 : 1'b0; + + assign workload_result = workload_result_q; + assign result_valid = result_valid_q; + + // ------------------------------------------------------------------------- + // Sequential + // ------------------------------------------------------------------------- + + // FSM state register + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + state_q <= StInitDelay; + end else begin + state_q <= state_d; + end + end + + // Allowance register + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + allowance_q <= '0; + end else begin + allowance_q <= allowance_d; + end + end + + // Workload result pipeline registers + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + result_valid_q <= 1'b0; + workload_result_q <= '0; + end else begin + result_valid_q <= result_valid_d; + workload_result_q <= workload_result_d; + end + end + + // Init delay counter + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + delay_cnt_q <= '0; + end else begin + delay_cnt_q <= delay_cnt_d; + end + end + + +endmodule diff --git a/verilog/rtl/trng.sv b/verilog/rtl/trng.sv new file mode 100644 index 0000000..1868a37 --- /dev/null +++ b/verilog/rtl/trng.sv @@ -0,0 +1,57 @@ +// TRNG - True Random Number Generator (counter-based prototype) +// +// Generates 256-bit nonces for use as ECDSA message hashes. +// In production this would use a ring oscillator (e.g. Vasyltsov et al.); +// here a free-running counter is used for deterministic simulation. +// +// Protocol: +// 1. Assert enable to run the counter +// 2. Pulse request_new to latch the current counter value; nonce_valid rises +// 3. nonce is stable until the next request_new pulse +// 4. For simulation: assert load_seed for one cycle to seed the counter + +module trng + import arith_pkg::*; // import in module header to be used in port list +( + input logic clk, + input logic rst_n, + input logic enable, + input logic request_new, + input logic load_seed, + input logic [WIDTH-1:0] seed, + + output logic [WIDTH-1:0] nonce, + output logic nonce_valid +); + + logic [WIDTH-1:0] counter_q; + logic [WIDTH-1:0] nonce_q; + logic nonce_valid_q; + + // counter + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + counter_q <= '0; + end else begin + if (load_seed) counter_q <= seed; + else if (enable) counter_q <= counter_q + 1; + end + end + + // sampling + always_ff @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + nonce_q <= '0; + nonce_valid_q <= 1'b0; + end else begin + if (request_new) begin + nonce_q <= counter_q; + nonce_valid_q <= 1'b1; + end + end + end + + assign nonce = nonce_q; + assign nonce_valid = nonce_valid_q; + +endmodule diff --git a/verilog/tb/sim_main.cpp b/verilog/tb/sim_main.cpp new file mode 100644 index 0000000..ff7bf69 --- /dev/null +++ b/verilog/tb/sim_main.cpp @@ -0,0 +1,62 @@ +#include "Vtb.h" +#include "verilated.h" +#if VM_TRACE +#include "verilated_fst_c.h" +#endif +#include + +int main(int argc, char** argv) { + VerilatedContext* ctx = new VerilatedContext; + ctx->commandArgs(argc, argv); + +#if VM_TRACE + ctx->traceEverOn(true); +#endif + + Vtb* tb = new Vtb{ctx}; + +#if VM_TRACE + VerilatedFstC* fst = new VerilatedFstC; + tb->trace(fst, 99); + fst->open("dump.fst"); + #define WAVE_DUMP(t) fst->dump(t) +#else + #define WAVE_DUMP(t) +#endif + + // initialize rst and clk + tb->rst_n = 1; + tb->clk = 0; + tb->eval(); + WAVE_DUMP(ctx->time()); + + // run 1 cycle + tb->clk = 1; tb->eval(); ctx->timeInc(1); WAVE_DUMP(ctx->time()); + tb->clk = 0; tb->eval(); ctx->timeInc(1); WAVE_DUMP(ctx->time()); + + // Assert reset + tb->rst_n = 0; + + // run 2 cycles + tb->clk = 1; tb->eval(); ctx->timeInc(1); WAVE_DUMP(ctx->time()); + tb->clk = 0; tb->eval(); ctx->timeInc(1); WAVE_DUMP(ctx->time()); + tb->clk = 1; tb->eval(); ctx->timeInc(1); WAVE_DUMP(ctx->time()); + tb->clk = 0; tb->eval(); ctx->timeInc(1); WAVE_DUMP(ctx->time()); + + // Deassert reset + tb->rst_n = 1; + + // Run the test + for (int i = 0; i < 50000000 && !ctx->gotFinish(); i++) { + tb->clk = 1; tb->eval(); ctx->timeInc(1); WAVE_DUMP(ctx->time()); + tb->clk = 0; tb->eval(); ctx->timeInc(1); WAVE_DUMP(ctx->time()); + } + +#if VM_TRACE + fst->close(); + delete fst; +#endif + delete tb; + delete ctx; + return 0; +} diff --git a/verilog/tb/tb_arith.sv b/verilog/tb/tb_arith.sv new file mode 100644 index 0000000..4e97ab8 --- /dev/null +++ b/verilog/tb/tb_arith.sv @@ -0,0 +1,248 @@ +module tb ( + input logic clk, + input logic rst_n +); + + // --------------------------------------------------------------------------- + // Test vector type and array + // --------------------------------------------------------------------------- + + typedef struct { + string name; + logic [255:0] a; + logic [255:0] b; + logic [1:0] op; // 0=add, 1=sub, 2=mul, 3=inv (b ignored) + logic prime_sel; // 0=prime_p, 1=prime_n + logic [255:0] expected; + } test_vec_t; + + // secp256k1 field prime: p = 2^256 - 2^32 - 977 + localparam logic [255:0] P = + 256'hFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F; + // secp256k1 curve order n + localparam logic [255:0] N = + 256'hFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141; + + // --------------------------------------------------------------------------- + // Active tests: add + sub + mul + // When inv is implemented: append inv vectors and update NumTests + // --------------------------------------------------------------------------- + + localparam int NumTests = 24; + + localparam test_vec_t TESTS [NumTests] = '{ + + // --- Addition --- + '{name: "Add: 100 + 200 mod p", + a: 256'd100, b: 256'd200, op: 2'd0, prime_sel: 1'b0, + expected: 256'd300}, + + '{name: "Add: (p-63) + 100 mod p (wrap)", + a: P - 256'd63, b: 256'd100, op: 2'd0, prime_sel: 1'b0, + expected: 256'd37}, + + '{name: "Add: 12345 + 0 mod p", + a: 256'd12345, b: 256'd0, op: 2'd0, prime_sel: 1'b0, + expected: 256'd12345}, + + '{name: "Add: 100 + 200 mod n", + a: 256'd100, b: 256'd200, op: 2'd0, prime_sel: 1'b1, + expected: 256'd300}, + + // --- Subtraction --- + '{name: "Sub: 500 - 300 mod p", + a: 256'd500, b: 256'd300, op: 2'd1, prime_sel: 1'b0, + expected: 256'd200}, + + '{name: "Sub: 100 - 200 mod p (wrap)", + a: 256'd100, b: 256'd200, op: 2'd1, prime_sel: 1'b0, + expected: P - 256'd100}, + + '{name: "Sub: (p-1) - 10 mod p", + a: P - 256'd1, b: 256'd10, op: 2'd1, prime_sel: 1'b0, + expected: P - 256'd11}, + + '{name: "Sub: 12345 - 0 mod p", + a: 256'd12345, b: 256'd0, op: 2'd1, prime_sel: 1'b0, + expected: 256'd12345}, + + '{name: "Sub: 10 - 20 mod n (wrap)", + a: 256'd10, b: 256'd20, op: 2'd1, prime_sel: 1'b1, + expected: N - 256'd10}, + + // --- Multiplication (op=2) --- + '{name: "Mul: 3 * 5 mod p", + a: 256'd3, b: 256'd5, op: 2'd2, prime_sel: 1'b0, + expected: 256'h000000000000000000000000000000000000000000000000000000000000000f}, + + '{name: "Mul: 12345 * 0 mod p", + a: 256'd12345, b: 256'd0, op: 2'd2, prime_sel: 1'b0, + expected: 256'h0000000000000000000000000000000000000000000000000000000000000000}, + + '{name: "Mul: 12345 * 1 mod p", + a: 256'd12345, b: 256'd1, op: 2'd2, prime_sel: 1'b0, + expected: 256'h0000000000000000000000000000000000000000000000000000000000003039}, + + '{name: "Mul: 123456 * 789012 mod p", + a: 256'd123456, b: 256'd789012, op: 2'd2, prime_sel: 1'b0, + expected: 256'h00000000000000000000000000000000000000000000000000000016adfc2d00}, + + '{name: "Mul: 64-bit operands mod p", + a: 256'd12345678901234, b: 256'd98765432109876, op: 2'd2, prime_sel: 1'b0, + expected: 256'h000000000000000000000000000000000000000003f09a63c9ae1be72ffc8328}, + + '{name: "Mul: 128-bit operands mod p", + a: 256'd123456789012345678901234567890, + b: 256'd987654321098765432109876543210, op: 2'd2, prime_sel: 1'b0, + expected: 256'h00000000000000136ccc118300207d2e6cfe0022e5d56a89116ec6de5d5f3ff4}, + + '{name: "Mul: 256-bit operands mod p", + a: 256'd123456789012345678901234567890123456789, + b: 256'd987654321098765432109876543210987654321, op: 2'd2, prime_sel: 1'b0, + expected: 256'h0d936c6dd454c29c60200b8f07db6a2cc48ee37874bd5a6e7df6807e34223f56}, + + '{name: "Mul: 12345 * 67890 mod n", + a: 256'd12345, b: 256'd67890, op: 2'd2, prime_sel: 1'b1, + expected: 256'h0000000000000000000000000000000000000000000000000000000031f46c22}, + + // --- Inversion (op=3, b ignored) --- + '{name: "Inv: 3^-1 mod p", + a: 256'd3, b: 256'd0, op: 2'd3, prime_sel: 1'b0, + expected: 256'haaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa9fffffd75}, + + '{name: "Inv: 1^-1 mod p", + a: 256'd1, b: 256'd0, op: 2'd3, prime_sel: 1'b0, + expected: 256'h0000000000000000000000000000000000000000000000000000000000000001}, + + '{name: "Inv: (p-1)^-1 mod p", + a: P - 256'd1, b: 256'd0, op: 2'd3, prime_sel: 1'b0, + expected: 256'hfffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2e}, + + '{name: "Inv: 64-bit operand mod p", + a: 256'd123456789012345, b: 256'd0, op: 2'd3, prime_sel: 1'b0, + expected: 256'h43935996906f5d218e1ec367f09936b53fc2d144ffe34e491ea06c94d9c5b23a}, + + '{name: "Inv: 128-bit operand mod p", + a: 256'd123456789012345678901234567890, b: 256'd0, op: 2'd3, prime_sel: 1'b0, + expected: 256'hfe9887f806cf8d2b479104f140d50f5ad3564dbf2da0aac4102af985dcfbdcc1}, + + '{name: "Inv: 256-bit operand mod p", + a: 256'd12345678901234567890123456789012345678901234567890, + b: 256'd0, op: 2'd3, prime_sel: 1'b0, + expected: 256'he283f7a0797a92877a86eafa0c633f36504f8c6dc2fcc48c94784d7b6b356746}, + + '{name: "Inv: 999999999999999999^-1 mod n", + a: 256'd999999999999999999, b: 256'd0, op: 2'd3, prime_sel: 1'b1, + expected: 256'h770324249fefd1cf9af30bf8abb7b824d83d80511a9c7f91c354d804d1eb0322} + }; + + // TB signal to avoid execution before reset + logic reset_done = 0; + + // --------------------------------------------------------------------------- + // DUT + // --------------------------------------------------------------------------- + + // Driven exclusively by the always block below + logic valid = 1'b0; + logic [1:0] op = '0; + logic prime_sel = 1'b0; + logic [255:0] a = '0; + logic [255:0] b = '0; + + logic ready; + logic [255:0] result; + + arith u_dut ( + .clk (clk), + .rst_n (rst_n), + .valid (valid), + .op (op), + .modulus (prime_sel ? N : P), + .a (a), + .b (b), + .ready (ready), + .result (result) + ); + + // --------------------------------------------------------------------------- + // Test sequencer — sole driver of DUT inputs + // --------------------------------------------------------------------------- + + // using separate pointers for request and response to allow back-to-back testing + int next_req_ptr = 0; // Next request to be driven to the DUT + int curr_rsp_ptr = 0; // Current response driven by the DUT when ready = 1 + int pass_count = 0; + int fail_count = 0; + int cycles = 0; // number of cycles with valid asserted for the current request + + // Using always block to allow non-blocking assignment of the DUT inputs in Verilator + always @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + reset_done <= 1'b1; + + valid <= 1'b0; + op <= '0; + prime_sel <= 1'b0; + a <= '0; + b <= '0; + + end else if (reset_done) begin + + // Drive next request (or idle) + if (!valid || ready) begin + if (next_req_ptr < NumTests) begin + valid <= 1'b1; + cycles <= 1; + op <= TESTS[next_req_ptr].op; + prime_sel <= TESTS[next_req_ptr].prime_sel; + a <= TESTS[next_req_ptr].a; + b <= TESTS[next_req_ptr].b; + next_req_ptr <= next_req_ptr + 1; + end else begin + valid <= 1'b0; + cycles <= 0; + end + end + + // Check response and handle timeout + if (ready) begin + // Result available — check and advance + if (result === TESTS[curr_rsp_ptr].expected) begin + $display("PASS [%s] — %0d cycle(s)", + TESTS[curr_rsp_ptr].name, cycles); + pass_count <= pass_count + 1; + end else begin + $display("FAIL [%s]", TESTS[curr_rsp_ptr].name); + $display(" expected: %h", TESTS[curr_rsp_ptr].expected); + $display(" actual: %h", result); + fail_count <= fail_count + 1; + end + + curr_rsp_ptr <= curr_rsp_ptr + 1; + + end else if (valid) begin + // Waiting — increment cycle counter and watch for timeout + cycles <= cycles + 1; + if (cycles > 4000) begin + $display("FAIL [%s] — timeout", TESTS[curr_rsp_ptr].name); + fail_count <= fail_count + 1; + curr_rsp_ptr <= curr_rsp_ptr + 1; + valid <= 1'b0; + end + end + + // All tests complete + if (curr_rsp_ptr == NumTests) begin + $display(""); + if (fail_count == 0) + $display("All %0d arith tests passed.", pass_count); + else + $display("arith: %0d passed, %0d failed.", pass_count, fail_count); + $finish; + end + + end + end + +endmodule diff --git a/verilog/tb/tb_ecdsa.sv b/verilog/tb/tb_ecdsa.sv new file mode 100644 index 0000000..4096146 --- /dev/null +++ b/verilog/tb/tb_ecdsa.sv @@ -0,0 +1,183 @@ +module tb ( + input logic clk, + input logic rst_n +); + + // ------------------------------------------------------------------------- + // Test vector type + // ------------------------------------------------------------------------- + + typedef struct { + string name; + logic [255:0] z; + logic [255:0] r; + logic [255:0] s; + logic expect_verif_passed; + } test_vec_t; + + // ------------------------------------------------------------------------- + // Test vectors (r, s pre-computed from ECDSA signing with d=2, Q=2G) + // ------------------------------------------------------------------------- + + localparam int NumTests = 8; + + localparam test_vec_t TESTS [NumTests] = '{ + + // Test 1: Valid signature (z=12345, k=7) + '{name: "Valid: z=12345, k=7", + z: 256'h0000000000000000000000000000000000000000000000000000000000003039, + r: 256'h5cbdf0646e5db4eaa398f365f2ea7a0e3d419b7e0330e39ce92bddedcac4f9bc, + s: 256'hf5ed201cb1d1a1679c74d7d3fc42fe4c1f3ae9c52970ca600b9c474eec66cf51, + expect_verif_passed: 1'b1}, + + // Test 2: Valid signature (z=0xDEADBEEF, k=0x123456) + '{name: "Valid: z=0xDEADBEEF, k=0x123456", + z: 256'h00000000000000000000000000000000000000000000000000000000DEADBEEF, + r: 256'hf8ccf508990ceef9e5b84f5aeb9fee1739d3d3b140fc05e5b2ff58524c660ba2, + s: 256'h98c86259b3f72418d11058c5ec03fc5dca499123880da2d4a989089afef4be26, + expect_verif_passed: 1'b1}, + + // Test 3: Valid signature (256-bit z, k) + '{name: "Valid: 256-bit z, k", + z: 256'hb94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9, + r: 256'hd595ec6770e9878b6ee380665e7f6785f32cf6a2f2b31343504c4e9e96622ff0, + s: 256'h8ce29a99cf1a9f77a7cd29fbd9f240b76b3215222b10ce7fa184083b2782a379, + expect_verif_passed: 1'b1}, + + // Test 4: Invalid — wrong message hash (z=99999 instead of 12345) + '{name: "Invalid: wrong z", + z: 256'h000000000000000000000000000000000000000000000000000000000001869F, + r: 256'h5cbdf0646e5db4eaa398f365f2ea7a0e3d419b7e0330e39ce92bddedcac4f9bc, + s: 256'hf5ed201cb1d1a1679c74d7d3fc42fe4c1f3ae9c52970ca600b9c474eec66cf51, + expect_verif_passed: 1'b0}, + + // Test 5: Invalid — wrong r (r=11111 instead of real r) + '{name: "Invalid: wrong r", + z: 256'h0000000000000000000000000000000000000000000000000000000000003039, + r: 256'h0000000000000000000000000000000000000000000000000000000000002B67, + s: 256'hf5ed201cb1d1a1679c74d7d3fc42fe4c1f3ae9c52970ca600b9c474eec66cf51, + expect_verif_passed: 1'b0}, + + // Test 6: Invalid — wrong s (s=22222 instead of real s) + '{name: "Invalid: wrong s", + z: 256'h0000000000000000000000000000000000000000000000000000000000003039, + r: 256'h5cbdf0646e5db4eaa398f365f2ea7a0e3d419b7e0330e39ce92bddedcac4f9bc, + s: 256'h00000000000000000000000000000000000000000000000000000000000056CE, + expect_verif_passed: 1'b0}, + + // Test 7: Valid signature (z=0xCAFEBABE, k=0x999) + '{name: "Valid: z=0xCAFEBABE, k=0x999", + z: 256'h00000000000000000000000000000000000000000000000000000000CAFEBABE, + r: 256'h2e43be7a12916cf6f312a513fcb6c98b708ce2dd18dc4ebf72a807c9c8a31b0d, + s: 256'h8f67ef46a32e112d1b99b2d2e6adbb9a55e2b8894a1dcecec8e039f56b5eb2f8, + expect_verif_passed: 1'b1}, + + // Test 8: Invalid — random z/r/s + '{name: "Invalid: random z/r/s", + z: 256'h0000000000000000000000000000000000000000000000001111111111111111, + r: 256'h0000000000000000000000000000000000000000000000002222222222222222, + s: 256'h0000000000000000000000000000000000000000000000003333333333333333, + expect_verif_passed: 1'b0} + }; + + // TB signal to avoid execution before reset + logic reset_done = 0; + + // ------------------------------------------------------------------------- + // DUT + // ------------------------------------------------------------------------- + + logic valid = 1'b0; + logic [255:0] dut_z = '0; + logic [255:0] dut_r = '0; + logic [255:0] dut_s = '0; + + logic ready; + logic verif_passed; + + ecdsa u_dut ( + .clk (clk), + .rst_n (rst_n), + .valid (valid), + .z (dut_z), + .r (dut_r), + .s (dut_s), + .ready (ready), + .verif_passed (verif_passed) + ); + + // ------------------------------------------------------------------------- + // Test sequencer + // ------------------------------------------------------------------------- + + int next_req_ptr = 0; + int curr_rsp_ptr = 0; + int pass_count = 0; + int fail_count = 0; + int cycles = 0; + + always @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + reset_done <= 1'b1; + + valid <= 1'b0; + dut_z <= '0; + dut_r <= '0; + dut_s <= '0; + + end else if (reset_done) begin + + // Drive next request (or idle) + if (!valid || ready) begin + if (next_req_ptr < NumTests) begin + valid <= 1'b1; + cycles <= 1; + dut_z <= TESTS[next_req_ptr].z; + dut_r <= TESTS[next_req_ptr].r; + dut_s <= TESTS[next_req_ptr].s; + next_req_ptr <= next_req_ptr + 1; + end else begin + valid <= 1'b0; + cycles <= 0; + end + end + + // Check response and handle timeout + if (ready) begin + if (verif_passed === TESTS[curr_rsp_ptr].expect_verif_passed) begin + $display("PASS [%s] — %0d cycle(s), verif_passed=%0b", + TESTS[curr_rsp_ptr].name, cycles, verif_passed); + pass_count <= pass_count + 1; + end else begin + $display("FAIL [%s] — expected verif_passed=%0b, got %0b", + TESTS[curr_rsp_ptr].name, + TESTS[curr_rsp_ptr].expect_verif_passed, verif_passed); + fail_count <= fail_count + 1; + end + + curr_rsp_ptr <= curr_rsp_ptr + 1; + + end else if (valid) begin + cycles <= cycles + 1; + if (cycles > 10000000) begin + $display("FAIL [%s] — timeout", TESTS[curr_rsp_ptr].name); + fail_count <= fail_count + 1; + curr_rsp_ptr <= curr_rsp_ptr + 1; + valid <= 1'b0; + end + end + + // All tests complete + if (curr_rsp_ptr == NumTests) begin + $display(""); + if (fail_count == 0) + $display("All %0d ECDSA tests passed.", pass_count); + else + $display("ECDSA: %0d passed, %0d failed.", pass_count, fail_count); + $finish; + end + + end + end + +endmodule diff --git a/verilog/tb/tb_math_pkg.sv b/verilog/tb/tb_math_pkg.sv new file mode 100644 index 0000000..a1410dc --- /dev/null +++ b/verilog/tb/tb_math_pkg.sv @@ -0,0 +1,149 @@ +// Modular arithmetic utilities for testbenches (simulation only). +// +// All functions operate on WIDTH-bit unsigned integers modulo a prime m. +// Not synthesisable — uses full-width multiply and Fermat's little theorem. + +package tb_math_pkg; + + import arith_pkg::*; + + localparam int W2 = 2 * WIDTH; + + // (a + b) mod m + function automatic logic [WIDTH-1:0] mod_add( + input logic [WIDTH-1:0] a, b, m + ); + logic [WIDTH:0] sum = {1'b0, a} + {1'b0, b}; + return WIDTH'(sum >= {1'b0, m} ? sum - {1'b0, m} : sum); + endfunction + + // (a - b) mod m + function automatic logic [WIDTH-1:0] mod_sub( + input logic [WIDTH-1:0] a, b, m + ); + return a >= b ? a - b : m - (b - a); + endfunction + + // (a * b) mod m + function automatic logic [WIDTH-1:0] mod_mul( + input logic [WIDTH-1:0] a, b, m + ); + logic [W2-1:0] product = W2'(a) * W2'(b); + return WIDTH'(product % W2'(m)); + endfunction + + // a^(-1) mod m — Fermat's little theorem: a^(m-2) mod m + function automatic logic [WIDTH-1:0] mod_inv( + input logic [WIDTH-1:0] a, m + ); + logic [WIDTH-1:0] exp = m - 2; + logic [WIDTH-1:0] result = 1; + logic [WIDTH-1:0] base = a; + + for (int i = 0; i < WIDTH; i++) begin + if (exp[i]) + result = mod_mul(result, base, m); + base = mod_mul(base, base, m); + end + + return result; + endfunction + + // ------------------------------------------------------------------------- + // secp256k1 constants + // ------------------------------------------------------------------------- + + localparam logic [WIDTH-1:0] SECP256K1_P = + 256'hFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F; + localparam logic [WIDTH-1:0] SECP256K1_N = + 256'hFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141; + localparam logic [WIDTH-1:0] SECP256K1_GX = + 256'h79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798; + localparam logic [WIDTH-1:0] SECP256K1_GY = + 256'h483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8; + + // ------------------------------------------------------------------------- + // EC point type (affine). valid=0 means point at infinity. + // ------------------------------------------------------------------------- + + typedef struct packed { + logic valid; + logic [WIDTH-1:0] x; + logic [WIDTH-1:0] y; + } ec_point_t; + + localparam ec_point_t EC_INF = '{valid: 1'b0, x: '0, y: '0}; + localparam ec_point_t EC_G = '{valid: 1'b1, x: SECP256K1_GX, y: SECP256K1_GY}; + + // ------------------------------------------------------------------------- + // Affine point addition over secp256k1 (mod p) + // ------------------------------------------------------------------------- + + function automatic ec_point_t ec_add(input ec_point_t p1, input ec_point_t p2); + logic [WIDTH-1:0] p, lam, x3, y3; + p = SECP256K1_P; + + if (!p1.valid) return p2; + if (!p2.valid) return p1; + + if (p1.x == p2.x) begin + if (p1.y == p2.y) begin + // Doubling: λ = 3x₁² / 2y₁ + lam = mod_mul(mod_mul(256'd3, mod_mul(p1.x, p1.x, p), p), + mod_inv(mod_add(p1.y, p1.y, p), p), p); + end else begin + return EC_INF; + end + end else begin + // General: λ = (y₂ - y₁) / (x₂ - x₁) + lam = mod_mul(mod_sub(p2.y, p1.y, p), mod_inv(mod_sub(p2.x, p1.x, p), p), p); + end + + x3 = mod_sub(mod_sub(mod_mul(lam, lam, p), p1.x, p), p2.x, p); + y3 = mod_sub(mod_mul(lam, mod_sub(p1.x, x3, p), p), p1.y, p); + return '{valid: 1'b1, x: x3, y: y3}; + endfunction + + // ------------------------------------------------------------------------- + // Scalar multiplication: k * P (double-and-add) + // ------------------------------------------------------------------------- + + function automatic ec_point_t ec_mul(input logic [WIDTH-1:0] k, input ec_point_t pt); + ec_point_t acc = EC_INF; + ec_point_t cur = pt; + + for (int i = 0; i < WIDTH; i++) begin + if (k[i]) + acc = ec_add(acc, cur); + cur = ec_add(cur, cur); + end + + return acc; + endfunction + + // ------------------------------------------------------------------------- + // ECDSA signature type and sign function + // ------------------------------------------------------------------------- + + typedef struct packed { + logic [WIDTH-1:0] r; + logic [WIDTH-1:0] s; + } ecdsa_sig_t; + + // Sign message hash z with private key d and nonce k + function automatic ecdsa_sig_t ecdsa_sign( + input logic [WIDTH-1:0] z, d, k + ); + ec_point_t kg; + logic [WIDTH-1:0] n, r_val, k_inv, s_val; + + n = SECP256K1_N; + kg = ec_mul(k, EC_G); + r_val = kg.x % n; + k_inv = mod_inv(k, n); + s_val = mod_mul(k_inv, mod_add(z, mod_mul(r_val, d, n), n), n); + + return '{r: r_val, s: s_val}; + endfunction + +endpackage diff --git a/verilog/tb/tb_top.sv b/verilog/tb/tb_top.sv new file mode 100644 index 0000000..1646493 --- /dev/null +++ b/verilog/tb/tb_top.sv @@ -0,0 +1,634 @@ +// tb_security_block.sv +// +// Nonce derivation: +// trng_load_seed=1 fires on the first posedge after rst_n deasserts. +// TRNG counter = TRNG_SEED after that edge, then increments. +// trng_request_new pulses 100 cycles later (after INIT_DELAY_CYCLES). + +`include "tb_math_pkg.sv" + +module tb ( + input logic clk, + input logic rst_n +); + import arith_pkg::*; + import tb_math_pkg::*; + + // ------------------------------------------------------------------------- + // DUT signals + // ------------------------------------------------------------------------- + + logic license_valid = 1'b0; + logic license_ready; + logic [WIDTH-1:0] license_r = '0; + logic [WIDTH-1:0] license_s = '0; + logic workload_valid = 1'b0; + logic [7:0] workload_a = '0; + logic [7:0] workload_b = '0; + logic [WIDTH-1:0] trng_seed = '0; + logic trng_load_seed = 1'b0; + + logic [WIDTH-1:0] nonce; + logic nonce_ready; + logic [7:0] workload_result; + logic result_valid; + logic [63:0] allowance; + logic enabled; + + security_block u_dut ( + .clk (clk), + .rst_n (rst_n), + .license_valid (license_valid), + .license_ready (license_ready), + .license_r (license_r), + .license_s (license_s), + .workload_valid (workload_valid), + .workload_a (workload_a), + .workload_b (workload_b), + .trng_seed (trng_seed), + .trng_load_seed (trng_load_seed), + .nonce (nonce), + .nonce_ready (nonce_ready), + .workload_result(workload_result), + .result_valid (result_valid), + .allowance (allowance), + .enabled (enabled) + ); + + // ------------------------------------------------------------------------- + // Constants + // ------------------------------------------------------------------------- + + localparam logic [WIDTH-1:0] TRNG_SEED = 256'd12345; + + // ECDSA signing: d=2 (Q=2G), k=7 + localparam logic [WIDTH-1:0] PRIV_KEY = 256'd2; + localparam logic [WIDTH-1:0] SIGN_K = 256'd7; + + localparam int VERIFY_TIMEOUT = 15_000_000; + localparam int NONCE_TIMEOUT = 300; + + // ------------------------------------------------------------------------- + // TB state machine + // ------------------------------------------------------------------------- + + typedef enum logic [4:0] { + PH_INIT, + PH_T1_CHECK, + PH_T2_DRIVE, PH_T2_CHECK, + PH_T3_CHECK, + PH_T4_SUBMIT, PH_T4_CHECK, + PH_T5_DRIVE, PH_T5_CHECK, + PH_T6_SUBMIT, PH_T6_CHECK, + PH_T7_DRIVE, PH_T7_CHECK, + PH_T8_DRIVE, PH_T8_CHECK, + PH_T9_DRIVE, PH_T9_CHECK, + PH_T10_DRIVE, PH_T10_CHECK, + PH_T11_WAIT, PH_T11_CHECK, + PH_T12_SUBMIT, PH_T12_CHECK, + PH_T13_SUBMIT, PH_T13_CHECK, + PH_T14_SUBMIT, PH_T14_WAIT, PH_T14_REPLAY, PH_T14_CHECK, + PH_DONE + } ph_e; + + ph_e phase; + logic reset_done = 1'b0; + int wait_cnt = 0; + int pass_count = 0; + int fail_count = 0; + logic [63:0] saved_allow; + logic [WIDTH-1:0] saved_nonce; + logic [WIDTH-1:0] saved_r; + logic [WIDTH-1:0] saved_s; + + // ------------------------------------------------------------------------- + // Sequencer + // ------------------------------------------------------------------------- + + // Previous cycle phase register for edge detection + ph_e phase_d1; + always @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + phase_d1 <= PH_INIT; + end else begin + phase_d1 <= phase; + end + end + + // Timeout counter + always @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + wait_cnt <= 0; + end else if (phase != phase_d1) begin // phase change → reset counter + wait_cnt <= 0; + end else begin // else increment + wait_cnt <= wait_cnt + 1; + end + end + + // Stimulus and checks - FSM + + // Stimulus driving and checking logic + always @(posedge clk or negedge rst_n) begin + if (!rst_n) begin + phase <= PH_INIT; + + reset_done <= 1'b1; + trng_seed <= TRNG_SEED; + trng_load_seed <= 1'b1; + license_valid <= 1'b0; + license_r <= '0; + license_s <= '0; + workload_valid <= 1'b0; + workload_a <= '0; + workload_b <= '0; + pass_count <= 0; + fail_count <= 0; + end else if (reset_done) begin + + case (phase) + + // ------------------------------------------------------- + PH_INIT: begin + trng_load_seed <= 1'b0; // one-cycle seed pulse done + phase <= PH_T1_CHECK; + end + + // ------------------------------------------------------- + // T1: Initial state + // ------------------------------------------------------- + PH_T1_CHECK: begin + if (allowance == '0 && enabled == 1'b0) begin + $display("PASS [T1 initial state] allowance=0 enabled=0"); + pass_count <= pass_count + 1; + end else begin + $display("FAIL [T1 initial state] allowance=%0d enabled=%0b", + allowance, enabled); + fail_count <= fail_count + 1; + end + phase <= phase.next(); + end + + // ------------------------------------------------------- + // T2: Workload blocked (enabled=0) + // ------------------------------------------------------- + PH_T2_DRIVE: begin + workload_valid <= 1'b1; + workload_a <= 8'd10; + workload_b <= 8'd20; + phase <= PH_T2_CHECK; + end + + PH_T2_CHECK: begin + workload_valid <= 1'b0; + if (result_valid) begin + + // Check + if (workload_result == 8'd0) begin + $display("PASS [T2 workload blocked] result=0"); + pass_count <= pass_count + 1; + end else begin + $display("FAIL [T2 workload blocked] result=%0d (expected 0)", + workload_result); + fail_count <= fail_count + 1; + end + + // Next test + phase <= phase.next(); + end + end + + // ------------------------------------------------------- + // T3: nonce_ready asserts correctly + // ------------------------------------------------------- + PH_T3_CHECK: begin + if (nonce_ready) begin + $display("PASS [T3 nonce_ready] nonce=0x%h", nonce); + pass_count <= pass_count + 1; + phase <= phase.next(); + end else if (wait_cnt > NONCE_TIMEOUT) begin + $fatal("FAIL [T3 nonce ready] timeout"); + end + end + + // ------------------------------------------------------- + // T4: Submit valid license + // ------------------------------------------------------- + PH_T4_SUBMIT: begin + if (nonce_ready) begin + ecdsa_sig_t sig; + + assert(allowance == 0) else $fatal("Expected allowance=0 at license submission, got %0d", allowance); + + sig = ecdsa_sign(nonce, PRIV_KEY, SIGN_K); + license_valid <= 1'b1; + license_r <= sig.r; + license_s <= sig.s; + + phase <= PH_T4_CHECK; + end + end + + PH_T4_CHECK: begin + + // Hold license_r/s until license_ready pulses + if (license_ready) begin + license_valid <= 1'b0; + license_r <= '0; + license_s <= '0; + end + + // Check allowance afterwards (when valid is back to deasserted) + if (!license_valid) begin + + if (allowance != '0) begin + $display("PASS [T4 valid license] allowance incremented to %0d", allowance); + pass_count <= pass_count + 1; + end else begin + $display("FAIL [T4 valid license] allowance not incremented"); + fail_count <= fail_count + 1; + end + + phase <= phase.next(); + end else if (wait_cnt > VERIFY_TIMEOUT) begin + $fatal("FAIL [T4 valid license] handshake timeout"); + end + end + + // ------------------------------------------------------- + // T5: Workload unblocked — 50 + 30 = 80 + // ------------------------------------------------------- + PH_T5_DRIVE: begin + + assert(allowance != '0) else $fatal("Expected allowance>0 before driving T5, got %0d", allowance); + assert(enabled == 1) else $fatal("Expected enabled=1 before driving T5, got %0d", enabled); + + workload_valid <= 1'b1; + workload_a <= 8'd50; + workload_b <= 8'd30; + phase <= PH_T5_CHECK; + end + + PH_T5_CHECK: begin + workload_valid <= 1'b0; + + if (result_valid) begin + + // Check + if (workload_result == 8'd80) begin + $display("PASS [T5 workload unblocked] 50+30=%0d", workload_result); + pass_count <= pass_count + 1; + end else begin + $display("FAIL [T5 workload unblocked] expected 80 got %0d", + workload_result); + fail_count <= fail_count + 1; + end + + // Next test + phase <= phase.next(); + end + end + + // ------------------------------------------------------- + // T6: Invalid license — expect rejection, nonce unchanged + // Submit VALID_R/VALID_S (valid for z=NONCE_1) against + // the current nonce (which is no longer NONCE_1). + // ------------------------------------------------------- + PH_T6_SUBMIT: begin + if (nonce_ready) begin + license_valid <= 1'b1; + license_r <= 256'd11111; + license_s <= 256'd22222; + saved_allow <= allowance; + saved_nonce <= nonce; + phase <= PH_T6_CHECK; + end + end + + PH_T6_CHECK: begin + // Hold license_r/s until license_ready pulses + if (license_ready) begin + license_valid <= 1'b0; + license_r <= '0; + license_s <= '0; + end + + // Check after deassert + if (!license_valid) begin + if (allowance <= saved_allow && nonce == saved_nonce) begin + $display("PASS [T6 invalid license] allowance not incremented, nonce unchanged"); + pass_count <= pass_count + 1; + end else begin + $display("FAIL [T6 invalid license] allowance incremented or nonce changed \ + (allowance=%0d, expected <=%0d; nonce=0x%h, expected 0x%h)", + allowance, saved_allow, nonce, saved_nonce); + fail_count <= fail_count + 1; + end + phase <= phase.next(); + end else if (wait_cnt > VERIFY_TIMEOUT) begin + $fatal("FAIL [T6 invalid license] timeout"); + end + end + + // ------------------------------------------------------- + // T7: Workload — 50 + 30 = 80 (positive values) + // ------------------------------------------------------- + PH_T7_DRIVE: begin + workload_valid <= 1'b1; + workload_a <= 8'd50; + workload_b <= 8'd30; + phase <= PH_T7_CHECK; + end + + PH_T7_CHECK: begin + workload_valid <= 1'b0; + + if (result_valid) begin + + // Check + if (workload_result == 8'd80) begin + $display("PASS [T7 50+30] result=%0d", workload_result); + pass_count <= pass_count + 1; + end else begin + $display("FAIL [T7 50+30] expected 80 got %0d", workload_result); + fail_count <= fail_count + 1; + end + + // Next test + phase <= phase.next(); + end + end + + // ------------------------------------------------------- + // T8: -10 + -20 = -30 + // ------------------------------------------------------- + PH_T8_DRIVE: begin + workload_valid <= 1'b1; + workload_a <= 8'hF6; // -10 + workload_b <= 8'hEC; // -20 + phase <= PH_T8_CHECK; + end + + PH_T8_CHECK: begin + workload_valid <= 1'b0; + + if (result_valid) begin + + // Check + if (workload_result == 8'hE2) begin // -30 + $display("PASS [T8 -10+-20] result=0x%h", workload_result); + pass_count <= pass_count + 1; + end else begin + $display("FAIL [T8 -10+-20] expected 0xE2 got 0x%h", workload_result); + fail_count <= fail_count + 1; + end + + // Next test + phase <= phase.next(); + end + end + + // ------------------------------------------------------- + // T9: 100 + -30 = 70 + // ------------------------------------------------------- + PH_T9_DRIVE: begin + workload_valid <= 1'b1; + workload_a <= 8'd100; + workload_b <= 8'hE2; // -30 + phase <= PH_T9_CHECK; + end + + PH_T9_CHECK: begin + workload_valid <= 1'b0; + + if (result_valid) begin + + // Check + if (workload_result == 8'd70) begin + $display("PASS [T9 100+-30] result=%0d", workload_result); + pass_count <= pass_count + 1; + end else begin + $display("FAIL [T9 100+-30] expected 70 got %0d", workload_result); + fail_count <= fail_count + 1; + end + + // Next test + phase <= phase.next(); + end + end + + // ------------------------------------------------------- + // T10: 127 + 1 = -128 (overflow wrapping) + // ------------------------------------------------------- + PH_T10_DRIVE: begin + workload_valid <= 1'b1; + workload_a <= 8'd127; + workload_b <= 8'd1; + phase <= PH_T10_CHECK; + end + + PH_T10_CHECK: begin + workload_valid <= 1'b0; + + if (result_valid) begin + + // Check + if (workload_result == 8'h80) begin // -128 + $display("PASS [T10 127+1 overflow] result=0x%h", workload_result); + pass_count <= pass_count + 1; + end else begin + $display("FAIL [T10 127+1 overflow] expected 0x80 got 0x%h", workload_result); + fail_count <= fail_count + 1; + end + + // Next test + phase <= phase.next(); + end + end + + // ------------------------------------------------------- + // T11: Allowance decrements by 1 per cycle + // ------------------------------------------------------- + PH_T11_WAIT: begin + if (wait_cnt == 0) + saved_allow <= allowance; // capture starting allowance at beginning of wait + if (wait_cnt == 100) + phase <= phase.next(); + end + + PH_T11_CHECK: begin + if ( allowance >= (saved_allow - 105) && + allowance <= (saved_allow - 95) ) begin + $display("PASS [T11 allowance decrement] delta=%0d over ~100 cycles", saved_allow - allowance); + pass_count <= pass_count + 1; + end else begin + $display("FAIL [T11 allowance decrement] delta=%0d, expected ~100", saved_allow - allowance); + fail_count <= fail_count + 1; + end + phase <= phase.next(); + end + + // ------------------------------------------------------- + // T12: New nonce after valid license + // Sign the current nonce dynamically, submit, check + // that the nonce changes afterwards. + // ------------------------------------------------------- + PH_T12_SUBMIT: begin + if (nonce_ready) begin + ecdsa_sig_t sig; + sig = ecdsa_sign(nonce, PRIV_KEY, SIGN_K); + license_valid <= 1'b1; + license_r <= sig.r; + license_s <= sig.s; + saved_nonce <= nonce; + phase <= PH_T12_CHECK; + end + end + + PH_T12_CHECK: begin + // Hold license_r/s until license_ready pulses + if (license_ready) begin + license_valid <= 1'b0; + license_r <= '0; + license_s <= '0; + end + + // Check when next nonce is ready (after handshake completed) + if (!license_valid && nonce_ready) begin + if (nonce != saved_nonce) begin + $display("PASS [T12 new nonce] nonce changed"); + pass_count <= pass_count + 1; + end else begin + $display("FAIL [T12 new nonce] nonce unchanged"); + fail_count <= fail_count + 1; + end + phase <= phase.next(); + end else if (wait_cnt > VERIFY_TIMEOUT) begin + $fatal("FAIL [T12 new nonce] timeout"); + end + end + + // ------------------------------------------------------- + // T13: License signed for wrong nonce is rejected + // Sign a wrong nonce (9999), submit against the + // current nonce. Expect rejection. + // ------------------------------------------------------- + PH_T13_SUBMIT: begin + if (nonce_ready) begin + ecdsa_sig_t sig; + sig = ecdsa_sign(256'd9999, PRIV_KEY, SIGN_K); + license_valid <= 1'b1; + license_r <= sig.r; + license_s <= sig.s; + saved_allow <= allowance; + saved_nonce <= nonce; + phase <= PH_T13_CHECK; + end + end + + PH_T13_CHECK: begin + // Hold license_r/s until license_ready pulses + if (license_ready) begin + license_valid <= 1'b0; + license_r <= '0; + license_s <= '0; + end + + // Check after deassert + if (!license_valid) begin + if (allowance <= saved_allow && nonce == saved_nonce) begin + $display("PASS [T13 wrong nonce] allowance not incremented, nonce unchanged"); + pass_count <= pass_count + 1; + end else begin + $display("FAIL [T13 wrong nonce] allowance incremented or nonce changed \ + (allowance=%0d, expected <=%0d; nonce=0x%h, expected 0x%h)", + allowance, saved_allow, nonce, saved_nonce); + fail_count <= fail_count + 1; + end + phase <= phase.next(); + end else if (wait_cnt > VERIFY_TIMEOUT) begin + $fatal("FAIL [T13 wrong nonce] timeout"); + end + end + + // ------------------------------------------------------- + // T14: Replay attack — submit same signature twice + // First submission is valid (signs current nonce), + // second reuses the same (r, s) against a new nonce. + // ------------------------------------------------------- + PH_T14_SUBMIT: begin + if (nonce_ready) begin + ecdsa_sig_t sig; + sig = ecdsa_sign(nonce, PRIV_KEY, SIGN_K); + license_valid <= 1'b1; + license_r <= sig.r; + license_s <= sig.s; + saved_r <= sig.r; + saved_s <= sig.s; + phase <= PH_T14_WAIT; + end + end + + PH_T14_WAIT: begin + // Hold license_r/s until license_ready pulses + if (license_ready) begin + license_valid <= 1'b0; + license_r <= '0; + license_s <= '0; + phase <= PH_T14_REPLAY; + end + end + + PH_T14_REPLAY: begin + if (nonce_ready) begin + // Replay the saved signature against the new nonce + license_valid <= 1'b1; + license_r <= saved_r; + license_s <= saved_s; + saved_allow <= allowance; + saved_nonce <= nonce; + phase <= PH_T14_CHECK; + end + end + + PH_T14_CHECK: begin + // Hold license_r/s until license_ready pulses + if (license_ready) begin + license_valid <= 1'b0; + license_r <= '0; + license_s <= '0; + end + + // Check after deassert + if (!license_valid) begin + if (allowance <= saved_allow && nonce == saved_nonce) begin + $display("PASS [T14 replay attack] allowance not incremented, nonce unchanged"); + pass_count <= pass_count + 1; + end else begin + $display("FAIL [T14 replay attack] allowance incremented or nonce changed \ + (allowance=%0d, expected <=%0d; nonce=0x%h, expected 0x%h)", + allowance, saved_allow, nonce, saved_nonce); + fail_count <= fail_count + 1; + end + phase <= phase.next(); + end else if (wait_cnt > VERIFY_TIMEOUT) begin + $fatal("FAIL [T14 replay attack] timeout"); + end + end + + // ------------------------------------------------------- + PH_DONE: begin + $display(""); + if (fail_count == 0) + $display("All %0d security_block tests passed.", pass_count); + else + $display("security_block: %0d passed, %0d FAILED.", pass_count, fail_count); + $finish; + end + + default: ; + endcase + end + end + +endmodule