Conversation

@carlschader-saronic (Contributor) commented Aug 31, 2025

Description

Added support for an arbitrary number of dynamic axes at any position in a tensor. Dynamic dimensions are no longer restricted to the first (batch) dimension. A quick illustration follows.
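For example (illustrative values only, using the -1 convention from TensorInfo below), an engine input can now expose several dynamic dimensions at once:

```rust
fn main() {
    // -1 marks a dynamic dimension, matching the TensorInfo convention
    // used elsewhere in this PR; the values here are illustrative only.
    let shape: Vec<i64> = vec![-1, 3, -1, -1]; // [batch, channels, height, width]
    println!("{shape:?}");
}
```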

Testing

All the examples run.
Also integrated into a Saronic downstream program, where it works as expected.

Notes

Finished adding support for dynamic axes in libinfer, on any dimension and in any position. Some benchmark comparisons:

libinfer 0.0.4 DETR benchmark.rs

2025-08-31T21:32:50.467814Z  INFO benchmark: inference calls    : 4096
2025-08-31T21:32:50.467820Z  INFO benchmark: total latency      : 54.53879
2025-08-31T21:32:50.467823Z  INFO benchmark: avg. frame latency : 0.0066575673
2025-08-31T21:32:50.467825Z  INFO benchmark: avg. frame fps     : 150.20502
2025-08-31T21:32:50.467826Z  INFO benchmark: avg. batch latency : 0.013315135
2025-08-31T21:32:50.467828Z  INFO benchmark: avg. batch fps     : 75.10251

libinfer 0.0.5 (dynamic axes of death) DETR benchmark.rs

2025-08-31T21:29:09.053947Z  INFO benchmark: inference calls    : 4096
2025-08-31T21:29:09.053951Z  INFO benchmark: total latency      : 30.404646
2025-08-31T21:29:09.053954Z  INFO benchmark: avg. batch latency : 0.0074230093
2025-08-31T21:29:09.053955Z  INFO benchmark: avg. batch fps     : 134.71625

libinfer 0.0.4 yolov8

2025-08-31T21:37:40.372793Z  INFO benchmark: inference calls    : 4096
2025-08-31T21:37:40.372798Z  INFO benchmark: total latency      : 50.759754
2025-08-31T21:37:40.372800Z  INFO benchmark: avg. frame latency : 0.012392518
2025-08-31T21:37:40.372802Z  INFO benchmark: avg. frame fps     : 80.69385
2025-08-31T21:37:40.372803Z  INFO benchmark: avg. batch latency : 0.012392518
2025-08-31T21:37:40.372805Z  INFO benchmark: avg. batch fps     : 80.69385

libinfer 0.0.5 yolov8

2025-08-31T21:39:45.790839Z  INFO benchmark: inference calls    : 4096
2025-08-31T21:39:45.790845Z  INFO benchmark: total latency      : 59.263123
2025-08-31T21:39:45.790847Z  INFO benchmark: avg. batch latency : 0.014468536
2025-08-31T21:39:45.790849Z  INFO benchmark: avg. batch fps     : 69.11549

I found a major optimization bug where we were prematurely synchronizing the CUDA stream; I introduced this in 0.0.4. Removing it gives a pretty massive performance improvement on larger models. Strangely, I am getting better performance on the new tracker-trained DETR model than on yolov8. The DETR model is quite a bit larger and has two transformers, so I am surprised. Not complaining though: this is nearly a 2x performance improvement.

We are still I/O-bound on f32 output tensors. I'll save that for 0.0.6.
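For illustration, a minimal sketch of the synchronization fix, using hypothetical `Stream`/`Context` wrappers (none of these names are libinfer's actual API): queue all of the work on the stream and wait exactly once, when outputs are read, instead of stalling the CPU mid-pipeline.

```rust
// Hypothetical wrappers for illustration; not libinfer's actual API.
struct Stream;
impl Stream {
    /// Blocks the CPU until all work queued on this stream has finished.
    fn synchronize(&self) {}
}

struct Context;
impl Context {
    fn copy_inputs_async(&mut self, _s: &Stream) {}  // H2D copy, queued
    fn enqueue_inference(&mut self, _s: &Stream) {}  // kernel launch, queued
    fn copy_outputs_async(&mut self, _s: &Stream) {} // D2H copy, queued

    fn infer(&mut self, stream: &Stream) {
        self.copy_inputs_async(stream);
        // 0.0.4 bug (roughly): a synchronize() at this point stalled the CPU
        // before the inference work was even queued, serializing every step.
        self.enqueue_inference(stream);
        self.copy_outputs_async(stream);
        stream.synchronize(); // wait exactly once, when outputs are needed
    }
}

fn main() {
    let mut ctx = Context;
    ctx.infer(&Stream);
}
```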

@freeman94 changed the title from "Version 0.0.5 Better dynamic axes" to "feat: better dynamic axes" on Sep 3, 2025
@freeman94 (Collaborator) left a comment


Mostly notes for future PRs

Comment on lines +22 to +24
pkgs-unstable = import nixpkgs-unstable {
inherit system;
config.allowUnfree = true;

Prefer to keep this on a stable nixpkgs release.

struct TensorInfo {
name: String,
dims: Vec<u32>,
shape: Vec<i64>, // -1 for dynamic dimensions

Would be nice if we could convert this into an enum at some point in the future, rather than having to check for -1 and then interpret the min/max/opt fields.
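One possible shape for that enum (a sketch only; `Dim` and its variants are assumptions, not existing libinfer types):

```rust
// Sketch: replace the -1 sentinel with an explicit enum (hypothetical names).
enum Dim {
    /// Statically known extent.
    Fixed(i64),
    /// Dynamic extent, carrying the optimization-profile bounds with it.
    Dynamic { min: i64, opt: i64, max: i64 },
}

struct TensorInfo {
    name: String,
    dims: Vec<Dim>, // no sentinel to check; bounds travel with each dimension
}
```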

name: String,
data: Vec<u8>,
shape: Vec<i64>, // this should always be positive, just i64 for convenience
dtype: TensorDataType,

This could also be genericized by dtype so that the data Vec is appropriately cast without the user having to do so.
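A rough sketch of what that could look like (the `Element` trait and the dtype variants shown are assumptions for illustration, not libinfer's current types):

```rust
// Sketch: a dtype-generic tensor so callers never cast raw bytes themselves.
// TensorDataType variants below are placeholders for whatever libinfer defines.
#[derive(Clone, Copy, PartialEq)]
enum TensorDataType {
    F32,
    I64,
}

/// Maps a Rust element type to its wire-level dtype tag.
trait Element: Copy {
    const DTYPE: TensorDataType;
}

impl Element for f32 { const DTYPE: TensorDataType = TensorDataType::F32; }
impl Element for i64 { const DTYPE: TensorDataType = TensorDataType::I64; }

struct Tensor<T: Element> {
    name: String,
    data: Vec<T>,    // typed storage; no manual Vec<u8> reinterpretation
    shape: Vec<i64>, // always positive by the time a tensor is materialized
}
```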


// ASSUMPTION: we always use optimization profile 0
// set the optimization profile to 0 so we can query output shapes after setting input shapes
mContext->setOptimizationProfileAsync(0, stream);

Probably want to make this configurable in the future.
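If that happens, it could be as small as a field on the engine options (hypothetical struct; 0 preserves today's hard-coded behavior):

```rust
// Sketch: surface the optimization profile through engine options
// (hypothetical names; not libinfer's current API).
pub struct EngineOptions {
    /// Index of the TensorRT optimization profile to activate.
    pub optimization_profile: i32,
}

impl Default for EngineOptions {
    fn default() -> Self {
        Self { optimization_profile: 0 }
    }
}
```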
