Closed
Description
Trying to generate images with TRT SD3 results in gibberish.
Resulting image
Workflow:
Screenshot of workflow:
Workflow file taken from #30
Additional info:
Engine file was generated on Debian Stable headless.
GPU: RTX 3060 12GB
RAM: 64GB
[06/20/2024-20:30:41] [TRT] [I] The logger passed into createInferBuilder differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
[06/20/2024-20:30:41] [TRT] [I] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 4714, GPU 201 (MiB)
[06/20/2024-20:30:42] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1634, GPU +288, now: CPU 6495, GPU 489 (MiB)
[06/20/2024-20:30:42] [TRT] [I] ----------------------------------------------------------------
[06/20/2024-20:30:42] [TRT] [I] Input filename: /home/anon/AI/ComfyUI/temp/1718915424.0667076/model.onnx
[06/20/2024-20:30:42] [TRT] [I] ONNX IR version: 0.0.8
[06/20/2024-20:30:42] [TRT] [I] Opset version: 17
[06/20/2024-20:30:42] [TRT] [I] Producer name: pytorch
[06/20/2024-20:30:42] [TRT] [I] Producer version: 2.1.2
[06/20/2024-20:30:42] [TRT] [I] Domain:
[06/20/2024-20:30:42] [TRT] [I] Model version: 0
[06/20/2024-20:30:42] [TRT] [I] Doc string:
[06/20/2024-20:30:42] [TRT] [I] ----------------------------------------------------------------
Read 1425569 bytes from timing cache.
[06/20/2024-20:30:45] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[06/20/2024-20:31:14] [TRT] [I] Detected 4 inputs and 1 output network tensors.
[06/20/2024-20:31:14] [TRT] [I] Total Host Persistent Memory: 5552
[06/20/2024-20:31:14] [TRT] [I] Total Device Persistent Memory: 0
[06/20/2024-20:31:14] [TRT] [I] Total Scratch Memory: 156175360
[06/20/2024-20:31:14] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 4 steps to complete.
[06/20/2024-20:31:14] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 0.022366ms to assign 3 blocks to 4 nodes requiring 169282560 bytes.
[06/20/2024-20:31:14] [TRT] [I] Total Activation Memory: 169282560
[06/20/2024-20:31:15] [TRT] [I] Total Weights Memory: 4069294720
[06/20/2024-20:31:15] [TRT] [I] Engine generation completed in 29.4208 seconds.
[06/20/2024-20:31:15] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 2901 MiB, GPU 11296 MiB
[06/20/2024-20:31:16] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 26537 MiB
[06/20/2024-20:31:17] [TRT] [I] Serialized 5654 bytes of code generator cache.
[06/20/2024-20:31:17] [TRT] [I] Serialized 1406431 bytes of compilation cache.
[06/20/2024-20:31:17] [TRT] [I] Serialized 128 timing cache entries
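For anyone comparing builds, the memory figures in the log above can be pulled out and converted to MiB. This is a minimal sketch, not part of ComfyUI_TensorRT: the `summarize` helper and the 12 GiB nominal capacity used for the RTX 3060 are assumptions made for illustration, and the numbers alone do not explain the gibberish output.

```python
import re

# Lines copied verbatim from the build log above.
LOG = """\
[06/20/2024-20:31:14] [TRT] [I] Total Activation Memory: 169282560
[06/20/2024-20:31:15] [TRT] [I] Total Weights Memory: 4069294720
[06/20/2024-20:31:15] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 2901 MiB, GPU 11296 MiB
"""

MIB = 1024 * 1024
# Assumption: nominal capacity of a 12GB card, not a measured value.
NOMINAL_VRAM_MIB = 12 * 1024


def summarize(log: str) -> dict:
    """Pull a few memory figures out of TensorRT builder log text (hypothetical helper)."""
    weights = re.search(r"Total Weights Memory: (\d+)", log)
    activation = re.search(r"Total Activation Memory: (\d+)", log)
    peak = re.search(r"memory allocators: CPU (\d+) MiB, GPU (\d+) MiB", log)
    return {
        "weights_mib": int(weights.group(1)) / MIB,        # bytes -> MiB
        "activation_mib": int(activation.group(1)) / MIB,  # bytes -> MiB
        "peak_gpu_mib": int(peak.group(2)),                # already in MiB
    }


stats = summarize(LOG)
stats["peak_gpu_fraction"] = stats["peak_gpu_mib"] / NOMINAL_VRAM_MIB
for key, value in stats.items():
    print(f"{key}: {value:.2f}")
```

Under these assumptions, the builder's peak GPU usage sits close to the card's nominal capacity, which may be worth noting when comparing against builds on other hardware.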
ComfyUI commit: d5efde89b76c40c72668daec052c07c71a737908
ComfyUI_TensorRT commit: a9a6923 (latest)
SDXL works with TRT and nets a speedup of roughly 80% (from 13 s to 7 s) for a 20-step 1024x1024 image.
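To make the SDXL figure unambiguous, here is the arithmetic behind it as a short sketch. It uses only the two timings quoted above (13 s and 7 s); the variable names are illustrative.

```python
baseline_s = 13.0  # SDXL without TRT, 20 steps at 1024x1024
trt_s = 7.0        # SDXL with TRT, same settings

speedup = baseline_s / trt_s                     # how many times faster
throughput_gain_pct = (speedup - 1) * 100        # extra images per unit time
time_saved_pct = (1 - trt_s / baseline_s) * 100  # reduction in wall-clock time

print(f"speedup: {speedup:.2f}x")
print(f"throughput gain: {throughput_gain_pct:.1f}%")
print(f"time saved: {time_saved_pct:.1f}%")
```

The "80%" quoted above matches the throughput gain reading (about 86%) rather than the time saved (about 46%).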