This repository was archived by the owner on Dec 2, 2025. It is now read-only.

[BUG] GPU overflow during generation #14

@Lyaaaaaaaaaaaaaaa

Describe the bug
The GPU runs out of memory when generating text.

If the GPU has just enough VRAM to load the model, with little left over, the program starts on the GPU without falling back to the CPU. VRAM usage then overshoots when generating the first sentence. As a result, no error occurs right after loading; the error occurs on the first generation.
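
One possible guard against this state, as a rough sketch only: compare the free VRAM with the model size plus a margin for generation before choosing the device. The function name `pick_device`, the 20% default margin, and the byte counts are hypothetical and not taken from this project; with PyTorch, the free-memory figure could come from `torch.cuda.mem_get_info()`.

```python
def pick_device(free_vram_bytes: int, model_bytes: int, headroom_ratio: float = 0.2) -> str:
    """Return "cuda" only if the model plus a generation margin fits in free VRAM.

    headroom_ratio is an assumed safety margin, not a measured value:
    generation needs extra memory on top of the loaded weights.
    """
    if headroom_ratio < 0:
        raise ValueError("headroom_ratio must be non-negative")
    required = model_bytes * (1.0 + headroom_ratio)
    return "cuda" if free_vram_bytes >= required else "cpu"


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical numbers: a 5.5 GiB model on a card with 6 GiB free.
    print(pick_device(free_vram_bytes=6 * GIB, model_bytes=int(5.5 * GIB)))
```

A fixed ratio is only a heuristic; the real overhead depends on the model and the length of the generated text, so the margin would need tuning.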

Expected behavior
An error message, followed by a fallback to the CPU or by emptying the GPU cache, or something similar.
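
The expected behavior could look roughly like the sketch below, which is not the project's actual code. It assumes the backend is PyTorch, where a CUDA out-of-memory failure surfaces as a `RuntimeError` whose message contains "out of memory". The names `generate_with_fallback`, `is_oom_error`, and `free_gpu_cache`, and the `generate(prompt, device)` callable, are hypothetical.

```python
import gc


def is_oom_error(exc: BaseException) -> bool:
    # Assumption: OOM is reported as a RuntimeError mentioning "out of memory".
    return isinstance(exc, RuntimeError) and "out of memory" in str(exc).lower()


def free_gpu_cache() -> None:
    # Drop unreachable Python objects first, then release cached GPU blocks.
    gc.collect()
    try:
        import torch  # optional: the sketch still runs without PyTorch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def generate_with_fallback(generate, prompt, devices=("cuda", "cpu")):
    """Try each device in order; return (result, device) for the first that works."""
    last_error = None
    for device in devices:
        try:
            return generate(prompt, device), device
        except RuntimeError as exc:
            if not is_oom_error(exc):
                raise  # unrelated errors are not hidden
            last_error = exc
            print(f"Out of memory on {device}; freeing cache and trying the next device.")
            free_gpu_cache()
    raise RuntimeError("generation failed on every device") from last_error
```

The `generate` callable is assumed to move the model to the requested device itself. Emptying the cache alone may not be enough if the model weights already fill the VRAM, which is why the sketch also moves on to the CPU.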

Labels

bug (Something isn't working)
