A Python program that generates a hardware-accelerated inference runtime from an ONNX model. Both the hardware and the software that drives it are generated automatically, so the project is, in effect, a Model -> Runtime compiler. The runtime is intended to run on an embedded system and be accelerated by a custom accelerator generated by Versat, a CGRA compiler.
The project is currently in a very unstable initial stage; files and folder structures are volatile and will change in the future. The current code can extract useful information from an ONNX model, generate the bare-bones structure of the runtime, and produce the code needed to test the implementation, but only a small number of operators are implemented so far.
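For reference, this kind of model inspection can be done with the `onnx` Python package. The snippet below is only an illustrative sketch of extracting operator and input-shape information (the path `model.onnx` is a placeholder), not the project's actual extraction code.

```python
# Illustrative sketch only: inspect an ONNX model and list the operators it uses.
# "model.onnx" is a placeholder path, not a file shipped with this project.
import onnx
from collections import Counter

model = onnx.load("model.onnx")
onnx.checker.check_model(model)

# Count how many times each operator type appears in the graph.
op_counts = Counter(node.op_type for node in model.graph.node)
for op_type, count in sorted(op_counts.items()):
    print(f"{op_type}: {count}")

# Shapes of the graph inputs, useful for sizing runtime buffers.
for inp in model.graph.input:
    dims = [d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)
```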
The current high-level priorities of the project are, in order:
- Implement more operators. We do not yet have enough operators implemented to fully run a single model with hardware acceleration; some operators currently fall back to software.
- (Optional) Integrate existing open-source tools, or implement our own, to optimize models for inference, e.g. folding batch normalization layers into convolutional layers, among other optimizations (a folding sketch is shown after this list).
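As a point of reference, folding a batch normalization layer into the preceding convolution amounts to rescaling the convolution weights and bias. The NumPy sketch below illustrates the arithmetic under assumed conventions (OIHW weight layout, per-output-channel BN parameters); it is not code from this repository.

```python
# Illustrative sketch of batch-norm folding, not code from this repository.
# Assumes conv weights in OIHW layout and per-output-channel BN parameters.
import numpy as np

def fold_batchnorm(conv_w, conv_b, gamma, beta, mean, var, eps=1e-5):
    """Return (folded_w, folded_b) such that
    conv(x, folded_w) + folded_b == batchnorm(conv(x, conv_w) + conv_b)."""
    scale = gamma / np.sqrt(var + eps)              # per output channel
    folded_w = conv_w * scale[:, None, None, None]  # scale each output filter
    folded_b = (conv_b - mean) * scale + beta
    return folded_w, folded_b
```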
To run a simple test, run `make pc-emul-run`. The test is generated by creating an ONNX model containing each currently supported operator, generating random test inputs for each operator, and then building the runtime and running it in the emulation environment provided by the py2hwsw project.
To run the same test in simulation, run `make sim-run`.
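To give a feel for what the test generator does, the snippet below builds a minimal single-operator ONNX model with `onnx.helper`. The operator choice, names, shapes, and output path are illustrative assumptions, not the project's actual generator.

```python
# Illustrative sketch: build a minimal single-operator (Relu) ONNX model,
# similar in spirit to what the test generator produces. Names, shapes and
# the output path are assumptions, not taken from this repository.
import numpy as np
import onnx
from onnx import helper, TensorProto

inp = helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, 8])
out = helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 8])
node = helper.make_node("Relu", inputs=["input"], outputs=["output"])

graph = helper.make_graph([node], "single_op_test", [inp], [out])
model = helper.make_model(graph, producer_name="test-generator-sketch")
onnx.checker.check_model(model)
onnx.save(model, "relu_test.onnx")

# A random input like this could then be fed to the generated runtime
# and its output compared against a software reference.
x = np.random.rand(1, 8).astype(np.float32)
```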
This project was funded through the NGI0 Core Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101092990.