[Verilog] XLS-Based Arith/Math Unit Generator #240

@Jiahui17

Overview

Unlike VHDL, which has FloPoCo, Verilog has no dedicated arith/math unit generator. XLS is a solid mid-level synthesis tool: designs are written in its DSL (called "DSLX") and compiled to Verilog. It has several features that are particularly useful for Dynamatic:

  • Handshake communication: XLS-generated hardware blocks (called "procs")
    communicate with the outside world using the same handshake protocol as we do.
  • Floating-point arithmetic: XLS has some support for FP arithmetic.
  • Configurable pipelining: XLS pipelines the arithmetic functions by itself, applying delay-driven SDC scheduling to get a good pipeline solution.

XLS is also extensively fuzzed, which should keep tool bugs rare.

As an example, consider the following XLS proc written in DSLX, which implements an FP adder (thanks @schilkp for writing this):

import std;
import apfloat;
import float32;

// Alias for the single-precision type carried by the channels below.
type F32 = float32::F32;

proc addf32 {

    lhs: chan<F32> in;
    rhs: chan<F32> in;
    result: chan<F32> out;

    init { () } 

    config(lhs: chan<F32> in, rhs: chan<F32> in, result: chan<F32> out) {
        (lhs, rhs, result)
    }

    next(_ : ()) {
        let (tok_a, a) = recv(join(), lhs);
        let (tok_b, b) = recv(join(), rhs);
        send(join(tok_a, tok_b), result, float32::add(a,b));
    }
}

XLS can automatically convert it into a dataflow unit that works out of the box in our circuit (except that the port names differ from ours):

module addf32(
  input wire clk,
  input wire rst,
  input wire [31:0] xls_float_ips__rhs,
  input wire xls_float_ips__rhs_valid,
  input wire [31:0] xls_float_ips__lhs,
  input wire xls_float_ips__lhs_valid,
  input wire xls_float_ips__result_ready,
  output wire [31:0] xls_float_ips__result,
  output wire xls_float_ips__result_valid,
  output wire xls_float_ips__rhs_ready,
  output wire xls_float_ips__lhs_ready
);
// ... Implementation details omitted ...
endmodule

Currently, every FP unit has a single fixed implementation (one circuit instance that may only suit one target frequency); we could use XLS to tailor the generation of these units to the required maximum frequency.

Proposed Changes

Here is an overview of how we are going to use XLS in our HLS flow:

  • We create a library of dataflow units in DSLX (e.g., addf, mulf).
  • During Dynamatic HLS compilation, circuit optimization determines the number of pipeline stages (+ other configs) for each unit, according to the unit type and our frequency target.
  • During RTL generation, we use XLS to generate these units with the pipeline stages (+ other configs) determined in the previous step, and adapt them to our circuit.
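
The last step could be driven by a small helper like the one below. This is only a sketch: the tool names (ir_converter_main, opt_main, codegen_main) and the flags shown come from the XLS toolchain, but the helper itself, the file names, the output directory, and the choice of the "unit" delay model are assumptions made for illustration, and the exact invocation should be checked against whichever XLS release gets pinned. The 9 stages in the usage line mirror the addf32 example and are not a measured requirement.

```python
from pathlib import Path


def xls_commands(dslx_file: str, top: str, pipeline_stages: int,
                 out_dir: str = "out") -> list[list[str]]:
    """Build (but do not run) the XLS commands for one pipelined unit.

    Hypothetical helper: redirecting each tool's stdout into the next
    step's input file is left to the caller.
    """
    if pipeline_stages < 1:
        raise ValueError("pipeline_stages must be >= 1")
    stem = Path(dslx_file).stem
    ir = str(Path(out_dir) / f"{stem}.ir")
    opt_ir = str(Path(out_dir) / f"{stem}.opt.ir")
    return [
        # DSLX -> XLS IR
        ["ir_converter_main", f"--top={top}", dslx_file],
        # IR optimization
        ["opt_main", ir],
        # IR -> pipelined Verilog with the requested number of stages
        ["codegen_main", "--generator=pipeline",
         f"--pipeline_stages={pipeline_stages}",
         "--delay_model=unit", "--reset=rst",
         "--use_system_verilog=false", opt_ir],
    ]


cmds = xls_commands("addf32.x", "addf32", 9)
```

Only the pipeline_stages value changes between runs for the same unit, which is what lets the number chosen during circuit optimization be passed straight through to RTL generation.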

Detail 1: Unit Generator for Everything

It now makes more sense to have one generator per module. For instance, in rtl-config-verilog.json, we could have something like:

  {
    "name": "handshake.addf",
    "parameters": [
      { "name": "PIPELINE_STAGES", "type": "unsigned" },
      { "name": "DATA_WIDTH", "type": "unsigned", "generic": true }
    ],
    "generator": "\"$DYNAMATIC/bin/generators/addf-generator-verilog\" \"$OUTPUT_DIR/$MODULE_NAME.v\" $PIPELINE_STAGES",
    "hdl": "verilog"
  },

This entry invokes the generator, which emits a Verilog adapter and uses XLS to generate the FP core:

module handshake_addf_0 #(
  // matches the DATA_WIDTH generic declared in rtl-config-verilog.json
  parameter DATA_WIDTH = 32
) (
  // inputs
  input  clk,
  input  rst,
  input  [DATA_WIDTH - 1 : 0] lhs,
  input  lhs_valid,
  input  [DATA_WIDTH - 1 : 0] rhs,
  input  rhs_valid,
  input  result_ready,
  // outputs
  output [DATA_WIDTH - 1 : 0] result,
  output result_valid,
  output lhs_ready,
  output rhs_ready
);
  
  xls_float_ips__addf32_stage_9 ip (
    .clk(clk),
    .rst(rst),
    .xls_float_ips__lhs(lhs),
    .xls_float_ips__lhs_vld(lhs_valid),
    .xls_float_ips__lhs_rdy(lhs_ready),
    .xls_float_ips__rhs(rhs),
    .xls_float_ips__rhs_vld(rhs_valid),
    .xls_float_ips__rhs_rdy(rhs_ready),
    .xls_float_ips__result(result),
    .xls_float_ips__result_vld(result_valid),
    .xls_float_ips__result_rdy(result_ready)
  );
endmodule
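
A generator script producing such an adapter could look roughly like the sketch below. It assumes that the XLS core exposes one data/_vld/_rdy port triple per channel behind a common prefix, as in the instantiation above, and that the adapter uses a DATA_WIDTH parameter as declared in the JSON entry. The function name, the template, and the default arguments are hypothetical.

```python
ADAPTER_TEMPLATE = """\
module {module} #(
  parameter DATA_WIDTH = {width}
) (
  input  clk,
  input  rst,
  input  [DATA_WIDTH - 1 : 0] lhs,
  input  lhs_valid,
  input  [DATA_WIDTH - 1 : 0] rhs,
  input  rhs_valid,
  input  result_ready,
  output [DATA_WIDTH - 1 : 0] result,
  output result_valid,
  output lhs_ready,
  output rhs_ready
);
  {core} ip (
    .clk(clk),
    .rst(rst),
{connections}
  );
endmodule
"""


def emit_adapter(module: str, core: str, prefix: str = "xls_float_ips__",
                 width: int = 32) -> str:
    """Return Verilog that maps Dynamatic port names onto the XLS core's."""
    lines = []
    for ch in ("lhs", "rhs", "result"):
        lines.append(f"    .{prefix}{ch}({ch}),")
        lines.append(f"    .{prefix}{ch}_vld({ch}_valid),")
        lines.append(f"    .{prefix}{ch}_rdy({ch}_ready),")
    # The last port connection must not end with a comma.
    lines[-1] = lines[-1].rstrip(",")
    return ADAPTER_TEMPLATE.format(module=module, width=width, core=core,
                                   connections="\n".join(lines))


verilog = emit_adapter("handshake_addf_0", "xls_float_ips__addf32_stage_9")
```

Since the adapter only renames ports, the same template should serve any two-input, one-output unit; only the module and core names change.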

Detail 2: Extra Dependency

Adding XLS as a dependency in Dynamatic wouldn't make the build process much more complex: we could fetch a specific XLS release when building Dynamatic.

Challenges / Discussions

  • Timing models. XLS's timing model might not be compatible with ours.
  • Missing operations. XLS lacks some FP ops (e.g., FP division). We can implement them algorithmically and rely on XLS to find a good pipeline solution. Here are some implementations. However, these ad-hoc units must be properly tested.
  • Runtime. Using XLS as a backend adds some runtime overhead to HDL generation, but the units are independent of one another, so generation is easily parallelizable.
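
On the runtime point, a minimal sketch of parallel unit generation is shown below. The generate_unit function is a stand-in that only computes an output file name; a real version would spawn the XLS tools for that unit. The job list and the unit name mulf32 are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_unit(job: tuple[str, int]) -> str:
    """Stand-in for one XLS run; a real version would spawn the XLS tools."""
    unit, stages = job
    return f"{unit}_stage_{stages}.v"


def generate_all(jobs: list[tuple[str, int]], workers: int = 4) -> list[str]:
    # Each unit is generated independently of the others, so jobs can run
    # concurrently. Threads are enough when every job mostly waits on an
    # external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(generate_unit, jobs))  # keeps input order


outputs = generate_all([("addf32", 9), ("mulf32", 4), ("addf32", 2)])
```

Identical (unit, stages) pairs could additionally be deduplicated or cached before dispatch, so that each distinct configuration is generated only once per build.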

Tasks:

These tasks are necessary for the circuit functionality:

  • Implement a DSLX library for all arith/math handshake units.
  • Write generator scripts for adapting the XLS-generated design to our design.
  • Integrate the generator scripts in rtl-config-verilog.json.

These tasks are necessary for the circuit quality:

  • For each unit, sweep over different numbers of pipeline stages to find how many are needed for a particular frequency target. Here are more details on this.
  • Using the results above, we can build a lookup function numStages = getNumPipelineStages(handshakeOp, Fmax) that returns the stage count for a given unit and frequency target.
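
One possible shape for that lookup is sketched below, assuming the sweep results are stored as a table from unit name to (stages, achieved Fmax) pairs. The table contents are made-up placeholders rather than measurements, and the MHz unit and the snake_case function name are assumptions.

```python
# Placeholder sweep results: unit -> {pipeline stages: achieved Fmax in MHz}.
# These numbers are invented for illustration, not measurements.
SWEEP_MHZ = {
    "handshake.addf": {1: 80.0, 3: 160.0, 6: 250.0, 9: 330.0},
    "handshake.mulf": {1: 100.0, 2: 180.0, 4: 300.0},
}


def get_num_pipeline_stages(handshake_op: str, fmax_mhz: float) -> int:
    """Return the fewest stages whose swept Fmax meets the target."""
    results = SWEEP_MHZ[handshake_op]
    feasible = [stages for stages, mhz in results.items() if mhz >= fmax_mhz]
    if not feasible:
        raise ValueError(f"no swept configuration of {handshake_op} "
                         f"reaches {fmax_mhz} MHz")
    return min(feasible)


stages = get_num_pipeline_stages("handshake.addf", 200.0)
```

Picking the smallest feasible stage count keeps latency low; a target that no swept configuration reaches is reported as an error instead of silently falling back to the deepest pipeline.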
