This repository contains a Verilog implementation of an AES-128 encryption system optimized to drastically reduce dynamic power consumption through Architecture-Driven Voltage Scaling.
By employing an N=2 hardware duplication strategy and staggered execution, the architecture lowers the effective per-lane operating rate while strictly preserving the aggregate system throughput.
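The idea behind architecture-driven voltage scaling can be summarized with the standard first-order CMOS dynamic-power model $P = \alpha C V^2 f$. The sketch below makes several simplifying assumptions. Each lane is taken to switch roughly the same capacitance $C$ as the single core. The broadcast, multiplexing, and ROB logic are lumped into an illustrative overhead factor $\varepsilon$, a symbol introduced here and not taken from the design. $V_{par}$ denotes the scaled supply, and $f_{sample}$ is the global update rate. I/O and static power are ignored, so this is a rough guide to the principle and not a model of the Vivado estimates:

$$
\begin{aligned}
P_{base} &= \alpha\, C\, V_{DD}^2\, f_{sample} \\
P_{par} &\approx N\,\alpha\,(1+\varepsilon)\,C\, V_{par}^2\,\frac{f_{sample}}{N} = (1+\varepsilon)\,\alpha\, C\, V_{par}^2\, f_{sample} \\
\frac{P_{par}}{P_{base}} &\approx (1+\varepsilon)\left(\frac{V_{par}}{V_{DD}}\right)^2
\end{aligned}
$$

Running $N$ lanes at $f_{sample}/N$ keeps the aggregate switching rate, and therefore the throughput, unchanged. The slower per-lane rate is what creates the timing slack that allows $V_{par} < V_{DD}$.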
Synthesized with a baseline 100 MHz target frequency and optimized with Voltage Scaling (`vccint` lowered to 0.95 V), the two designs compare as follows:
| Metric | Baseline (Single Core, 1.0V) | Proposed (N=2, 0.95V) | Change |
|---|---|---|---|
| Total Dynamic Power | 0.700 W | 0.329 W | -53.0% |
| Total On-Chip Power | 0.832 W | 0.457 W | -45.1% |
| Throughput | 1.28 Gbps | 1.28 Gbps | Preserved |
| Per-Block Latency | 100 ns | 200 ns | +100% |
| Slice LUTs | 3,978 | 5,313 | +1,335 |
| Slice Registers (FFs) | 262 | 3,905 | +3,643 |
Note: Of the proposed design's 0.329 W dynamic power, the core AES logic accounts for only 0.071 W (71 mW). The remaining 0.258 W is dominated by I/O pin switching (384+ pins for the AES interfaces) at 3.3 V.
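The latency, throughput, and power figures in the table are related by simple arithmetic, which the following sketch reproduces. It makes two assumptions that are not spelled out in the RTL description. First, each block needs 10 enabled clock cycles (one per round, with the initial AddRoundKey not costing a separate cycle), which is what the 100 ns baseline latency implies. Second, each lane processes one block at a time. The power values are copied from the table.

```python
"""Back-of-the-envelope check of the summary table (values copied from the README)."""

BLOCK_BITS = 128   # AES block size in bits
F_CLK_HZ = 100e6   # baseline 100 MHz target clock
ROUNDS = 10        # assumption: one enabled cycle per round, 10 per block


def lane_latency_s(ce_divisor: int) -> float:
    """Latency of one block when a lane is clock-enabled every `ce_divisor` cycles."""
    return ROUNDS * ce_divisor / F_CLK_HZ


def throughput_bps(n_lanes: int) -> float:
    """Aggregate throughput: n_lanes lanes, each enabled on 1 of every n_lanes
    cycles (round-robin) and each holding one block at a time (assumption)."""
    return n_lanes * BLOCK_BITS / lane_latency_s(ce_divisor=n_lanes)


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    for n in (1, 2):
        print(f"N={n}: latency {lane_latency_s(n) * 1e9:.0f} ns, "
              f"throughput {throughput_bps(n) / 1e9:.2f} Gbps")
    print(f"dynamic power change: {pct_change(0.700, 0.329):+.1f} %")
    print(f"total on-chip power change: {pct_change(0.832, 0.457):+.1f} %")
    print(f"non-core dynamic power: {0.329 - 0.071:.3f} W")
```

Running it gives 100 ns and 200 ns latencies with 1.28 Gbps in both cases, changes of -53.0 % (dynamic) and -45.1 % (total on-chip), and 0.258 W of non-core dynamic power, all matching the table and note above.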
The area increase is an expected architectural tradeoff resulting from duplication overhead, input broadcast routing, completion multiplexing, and Reorder Buffer (ROB) sequence tracking.
- N=2 Hardware Duplication: Two parallel AES lanes share an input broadcast, dividing the workload.
- Staggered Execution: Utilizes one global clock combined with a round-robin phase counter to generate per-lane clock-enable (CE) pulses.
- Effective Update Rate: Each lane operates at an effective rate of $f_{sample}/2$ (half the global clock rate), which lowers per-lane dynamic switching power.
- Voltage Scaling: Because parallel execution relaxes the per-lane timing constraints, the operating voltages have been scaled down in the `.xdc` constraints (`vccint` lowered to its 0.95 V minimum, `vccaux` to 1.71 V, and `mgtavcc` to 0.95 V). This reduces core dynamic power to 71 mW while staying within the Artix-7 device limits.
- Baseline Core Execution: Performs the standard 10 AES rounds after the initial AddRoundKey, one round per clock cycle.
- Sequence-Aware Retirement: Lane outputs in the parallel path are tagged with a sequence ID and written to a Reorder Buffer (ROB).
- In-Order Emission: Ciphertext is retired strictly in its original input order, so the parallel execution is transparent to downstream logic.
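The staggered clock-enable dispatch and sequence-aware retirement described above can be illustrated with a small behavioral model. This Python sketch is not a translation of `aes_top_parallel.v`. The function `ce_pulses`, the class `ReorderBuffer`, and its `depth` parameter are hypothetical names chosen for illustration. The ROB is modeled as a circular buffer indexed by sequence ID, which is one common way to build one.

```python
"""Behavioral sketch of round-robin CE generation and an in-order ROB (hypothetical)."""


def ce_pulses(n_lanes: int, n_cycles: int) -> list:
    """Round-robin phase counter: on each global clock cycle exactly one lane
    (the one whose index equals the phase) receives a clock-enable pulse."""
    phase = 0
    trace = []
    for _ in range(n_cycles):
        trace.append([lane == phase for lane in range(n_lanes)])
        phase = (phase + 1) % n_lanes
    return trace


class ReorderBuffer:
    """Stores lane results by sequence ID and releases them strictly in order."""

    def __init__(self, depth: int):
        self.depth = depth
        self.slots = [None] * depth
        self.head = 0  # next sequence ID allowed to retire

    def write(self, seq: int, data) -> None:
        # Reject IDs that would alias a slot still waiting to retire.
        if not (self.head <= seq < self.head + self.depth):
            raise ValueError("sequence ID outside the ROB window")
        self.slots[seq % self.depth] = data

    def retire(self) -> list:
        """Pop every result that is now contiguous with the head, in order."""
        out = []
        while self.slots[self.head % self.depth] is not None:
            out.append(self.slots[self.head % self.depth])
            self.slots[self.head % self.depth] = None
            self.head += 1
        return out


if __name__ == "__main__":
    for cycle, ce in enumerate(ce_pulses(n_lanes=2, n_cycles=4)):
        print(cycle, ["CE" if e else "--" for e in ce])
    rob = ReorderBuffer(depth=4)
    rob.write(1, "ct1")   # lane 1 finishes first
    print(rob.retire())   # [] : sequence 0 is still pending
    rob.write(0, "ct0")
    print(rob.retire())   # ['ct0', 'ct1']
```

With two lanes, each lane sees a CE pulse on every other cycle, so 10 rounds span 20 global cycles (the 200 ns per-block latency at 100 MHz), while the ROB hides any completion-order differences from downstream logic.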
The design has been verified using Verilog testbenches against the NIST AES-128 ECB Known-Answer Test (KAT) vectors.
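A software golden model can produce or double-check expected ciphertexts for such testbenches. The sketch below is a plain Python AES-128 encryptor with the same round structure as the RTL: an initial AddRoundKey, nine full rounds, and a final round without MixColumns. Its function names loosely mirror the RTL module names but are otherwise hypothetical. The S-box is generated from its GF(2^8) definition rather than typed in as a table. The sketch is checked only against the FIPS-197 example vectors, not the full NIST KAT set, and it is not constant-time, so it is suitable for verification only.

```python
"""Minimal AES-128 encryption golden model (FIPS-197) for testbench vectors."""


def xtime(a: int) -> int:
    """Multiply by 0x02 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gmul(a: int, b: int) -> int:
    """General GF(2^8) multiplication by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def _make_sbox() -> list:
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gmul(x, y) == 1)
        s = inv
        for i in range(1, 5):  # affine transform: XOR of four left-rotations
            s ^= ((inv << i) | (inv >> (8 - i))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = _make_sbox()

# State is a flat list of 16 bytes in FIPS-197 column-major order:
# s[4*c + r] holds row r, column c (the same order as the input bytes).


def sub_bytes(s):
    return [SBOX[b] for b in s]


def shift_rows(s):
    # Row r is rotated left by r positions.
    return [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]


def mix_columns(s):
    out = []
    for c in range(4):
        a0, a1, a2, a3 = s[4 * c:4 * c + 4]
        out += [
            gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3,
            a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3,
            a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3),
            gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2),
        ]
    return out


def add_round_key(s, rk):
    return [a ^ b for a, b in zip(s, rk)]


def key_expand(key: bytes):
    """Return the 11 round keys (16 bytes each) for a 16-byte key."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # SubWord(RotWord(t))
            t[0] ^= rcon
            rcon = xtime(rcon)
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [sum(w[4 * k:4 * k + 4], []) for k in range(11)]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    rk = key_expand(key)
    s = add_round_key(list(plaintext), rk[0])            # initial AddRoundKey
    for rnd in range(1, 10):                             # rounds 1..9
        s = add_round_key(mix_columns(shift_rows(sub_bytes(s))), rk[rnd])
    s = add_round_key(shift_rows(sub_bytes(s)), rk[10])  # final round, no MixColumns
    return bytes(s)
```

For example, `encrypt(bytes.fromhex("000102030405060708090a0b0c0d0e0f"), bytes.fromhex("00112233445566778899aabbccddeeff")).hex()` returns `"69c4e0d86a7b0430d8cdb78070b4c55a"`, the FIPS-197 Appendix C.1 ciphertext.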
Simulation Success:
- Baseline (`tb_aes_top.v`): Validates the correctness of the single-core baseline encryption.
- Proposed (`tb_aes_top_parallel.v`): Validates the staggered CE dispatch and confirms in-order output retirement from the ROB.
How to Run Simulation: Use your preferred Verilog simulator. Compile all source files under `src/rtl/` together with the selected testbench from `sim/tb/`.
AES128-LowPower-Architecture/
├── constraints/
│   └── aes_top.xdc
├── src/
│   └── rtl/
│       ├── addroundkey.v
│       ├── aes128_core.v
│       ├── aes128_core_ce.v
│       ├── aes_final_round.v
│       ├── aes_round.v
│       ├── aes_top.v
│       ├── aes_top_parallel.v
│       ├── key_expand.v
│       ├── mixcolumns.v
│       ├── sbox.v
│       ├── shiftrows.v
│       ├── subbytes.v
│       └── subword.v
├── sim/
│   └── tb/
│       ├── tb_aes_top.v
│       └── tb_aes_top_parallel.v
├── .gitattributes
├── .gitignore
└── README.md