[Issue] Stable Diffusion 3 TRT Generates gibberish #41

@1aienthusiast

Generating images with the TensorRT (TRT) engine for Stable Diffusion 3 produces gibberish output.

Resulting image

[Image: ComfyUI_00554_]

Workflow:

Screenshot of workflow:

[Image: workflow screenshot]

Workflow file taken from #30

Additional info:

The engine file was generated on a headless Debian Stable machine.
GPU: RTX 3060 12GB
RAM: 64GB

[06/20/2024-20:30:41] [TRT] [I] The logger passed into createInferBuilder differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
[06/20/2024-20:30:41] [TRT] [I] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 4714, GPU 201 (MiB)
[06/20/2024-20:30:42] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1634, GPU +288, now: CPU 6495, GPU 489 (MiB)
[06/20/2024-20:30:42] [TRT] [I] ----------------------------------------------------------------
[06/20/2024-20:30:42] [TRT] [I] Input filename:   /home/anon/AI/ComfyUI/temp/1718915424.0667076/model.onnx
[06/20/2024-20:30:42] [TRT] [I] ONNX IR version:  0.0.8
[06/20/2024-20:30:42] [TRT] [I] Opset version:    17
[06/20/2024-20:30:42] [TRT] [I] Producer name:    pytorch
[06/20/2024-20:30:42] [TRT] [I] Producer version: 2.1.2
[06/20/2024-20:30:42] [TRT] [I] Domain:           
[06/20/2024-20:30:42] [TRT] [I] Model version:    0
[06/20/2024-20:30:42] [TRT] [I] Doc string:       
[06/20/2024-20:30:42] [TRT] [I] ----------------------------------------------------------------
Read 1425569 bytes from timing cache.
[06/20/2024-20:30:45] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[06/20/2024-20:31:14] [TRT] [I] Detected 4 inputs and 1 output network tensors.
[06/20/2024-20:31:14] [TRT] [I] Total Host Persistent Memory: 5552
[06/20/2024-20:31:14] [TRT] [I] Total Device Persistent Memory: 0
[06/20/2024-20:31:14] [TRT] [I] Total Scratch Memory: 156175360
[06/20/2024-20:31:14] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 4 steps to complete.
[06/20/2024-20:31:14] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 0.022366ms to assign 3 blocks to 4 nodes requiring 169282560 bytes.
[06/20/2024-20:31:14] [TRT] [I] Total Activation Memory: 169282560
[06/20/2024-20:31:15] [TRT] [I] Total Weights Memory: 4069294720
[06/20/2024-20:31:15] [TRT] [I] Engine generation completed in 29.4208 seconds.
[06/20/2024-20:31:15] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 2901 MiB, GPU 11296 MiB
[06/20/2024-20:31:16] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 26537 MiB
[06/20/2024-20:31:17] [TRT] [I] Serialized 5654 bytes of code generator cache.
[06/20/2024-20:31:17] [TRT] [I] Serialized 1406431 bytes of compilation cache.
[06/20/2024-20:31:17] [TRT] [I] Serialized 128 timing cache entries
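
For readability, the memory figures reported in the build log can be converted to binary units. This is only a unit-conversion sketch using numbers copied from the log lines above. The variable names are illustrative, and nothing here is meant to imply that memory is the cause of the gibberish output.

```python
# Unit conversion for the figures printed in the TensorRT build log.
MIB = 1024 ** 2
GIB = 1024 ** 3

weights_bytes = 4069294720     # "Total Weights Memory" (bytes)
activation_bytes = 169282560   # "Total Activation Memory" (bytes)
peak_gpu_mib = 11296           # peak GPU allocator usage (MiB)

weights_gib = weights_bytes / GIB
activation_mib = activation_bytes / MIB
peak_gpu_gib = peak_gpu_mib / 1024

print(f"weights:     {weights_gib:.2f} GiB")
print(f"activations: {activation_mib:.2f} MiB")
print(f"peak GPU:    {peak_gpu_gib:.2f} GiB")
```

So the engine weights are about 3.79 GiB, and the build peaked at roughly 11.03 GiB of GPU memory on the 12GB card.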

ComfyUI commit: d5efde89b76c40c72668daec052c07c71a737908
ComfyUI_TensorRT commit: a9a6923 (latest)
SDXL works with TRT and nets a speedup of roughly 80% (from 13 s to 7 s) for a 20-step 1024x1024 image.
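
To make the SDXL comparison precise, the two timings (13 s without TRT, 7 s with TRT) can be expressed either as a throughput gain or as time saved. The helper below is a hypothetical sketch for that arithmetic, not part of ComfyUI or ComfyUI_TensorRT.

```python
def speedup_stats(before_s: float, after_s: float) -> dict:
    """Return the speedup factor plus throughput gain and time saved, both in percent."""
    factor = before_s / after_s
    return {
        "factor": factor,
        "throughput_gain_pct": (factor - 1.0) * 100.0,
        "time_saved_pct": (1.0 - after_s / before_s) * 100.0,
    }

# Timings reported above for a 20-step 1024x1024 SDXL image.
stats = speedup_stats(13.0, 7.0)
print(f"{stats['factor']:.2f}x faster")
print(f"{stats['throughput_gain_pct']:.1f}% more images per unit time")
print(f"{stats['time_saved_pct']:.1f}% less time per image")
```

By this measure the SDXL engine is about 1.86x faster, which is an ~86% throughput gain or ~46% less time per image.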
