Skip to content

Implementation of Fast Token Delivery flow within Dynamatic #177

@pcineverdies

Description

@pcineverdies

Description

The work from Unleashing Parallelism in Elastic Circuits with Faster Token Delivery [1] is still missing in the codebase. The goal of this issue is to outline a roadmap for their implementation, highlighting what needs to be done.

The Fast Token Delivery (FTD) methodology is a different algorithm to build the handshake IR of the the cf IR. It gets rid of the concepts of basic blocks in the original CFG, relying only on the relationship between producers and consumers among different operations. In the paper, it was shown it can provide significant execution time advantage, together with a reduction of area utilization.

The methodology aims at getting rid of anything related to control dependencies between basic blocks. However, at a first stage of the implementation, the network of control merges is maintained. This allows to allocate the memory operations to the LSQ, and to indicate the termination of the execution of the kernel to the memory interfaces.

The idea of the issue is to design an alternative version of the current CfToHandshake pass (let's call it FtdCfToHandshake) which can be enabled through a flag at compile time (compile --fast-token-delivery). In this way, the current version of Dynamatic requires no change.
Also, since this methodology is about the conversion from the cf dialect to the handshake dialect, the rest of the Dynamatic flow remains unchanged (considering both the transformations to the handshake IR and the following lowering to the hw dialect).

Steps

  • Finish a Boolean Logic library in Dynamatic.
    The FTD methodology requires many analysis of boolean conditions within the circuit (it is mainly about understanding which conditions must be satisfied to go from node A to node B following a given path). A draft of this library is already present in the Experimental folder. However, some new features must be added in order for it to be used in the algorithm. We are aware that having a custom library might be redundant in the codebase - taking into account that Espresso is already among the dependencies of the project. Once that FTD is implemented, we aim at replacing the current custom library with something else.

  • Implement an analysis pass to obtain control dependency information on the CFG.
    This analysis pass (based on [4]) is a fundamental step of the FTD algorithm and the GSA conversion.

  • Implement an analysis pass which extracts the GSA information out of the MLIR's SSA representation.
    The FTD methodology is based on GSA, an alternative version to SSA which employs boolean conditions together with phi nodes (phi nodes are not explicit in MLIR, but encoded through block arguments in the cf dialect). There are many papers showing how to approach such a problem, such as [2] and [3]. The main consequence of having GSA is that, instead of using merge nodes for phi functions, we can go for multiplexers.

  • Implement the FTD methodology according to [1].
    This step is the largest among the ones presented in the issue, as it requires to fully rewrite the CfToHandshake pass. In this first phase, the way memories are handled remains identical to what is done in the current Dynamatic (thus without SQ, using the network of control merges to allocate blocks in the LSQ). This can be further divided in the following steps:

    • Initial pass structure and required classes (intermediate non complete step);
    • Use the GSA information to get rid of block arguments in basic blocks and substitute them with multiplexers (intermediate non complete step);
    • Use the start signal to trigger constants and undefined values in the circuit (intermediate non complete step);
    • Add the regeneration mechanism for signals in loops (intermediate non complete step);
    • Add the suppression mechanism for signals (final complete step);

Additional comments

As I mentioned above, the rest of the flow is not modified.
Some issues might arise from the interaction with other passes.
Since this is something not strictly related to the FTD algorithm (but rather, on the consistency of the whole flow) some new issues will be opened later on.


[1] A. Elakhras, A. Guerrieri, L. Josipović, and P. Ienne, “Unleashing parallelism in elastic circuits with faster token delivery,” in 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL), IEEE, 2022, pp. 253–261. Accessed: Oct. 14, 2024. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10035134/
[2] P. Havlak, “Construction of thinned gated single-assignment form,” in Languages and Compilers for Parallel Computing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, Eds., Berlin, Heidelberg: Springer, 1994, pp. 477–499. doi: 10.1007/3-540-57659-2_28.
[3] P. Tu and D. Padua, “Efficient Building and Placing of Gating Functions,” ACM SIGPLAN Notices, vol. 30, no. 6, pp. 47–55, Jan. 1995, doi: 10.1145/223428.207115.
[4] J. Ferrante, K. J. Ottenstein, and J. D. Warren, “The program dependence graph and its use in optimization,” ACM Trans. Program. Lang. Syst., vol. 9, no. 3, pp. 319–349, Jul. 1987, doi: 10.1145/24039.24041.

Metadata

Metadata

Assignees

Labels

featureAbout a new feature in DynamaticmajorKey functionnality not part of the core flow

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions