This repository was archived by the owner on Dec 2, 2025. It is now read-only.

[BUG] "topk_cpu" not implemented for 'Half' #31

@Lyaaaaaaaaaaaaaaa

Describe the bug
The server crashes during generation when a model is loaded in float16 and CUDA is not available.

To Reproduce
Steps to reproduce the behavior:

  1. Run without CUDA
  2. Load an AI with float16
  3. Generate something
  4. See error

Expected behavior
The server should fall back to float32 when CUDA is not available, so that generation does not crash.
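
A minimal sketch of the fallback described above, not the project's actual code. The function name `resolve_dtype` and its parameters are hypothetical; dtypes are passed as plain strings so the snippet runs without PyTorch installed.

```python
import warnings


def resolve_dtype(requested: str, cuda_available: bool) -> str:
    """Return the dtype name the model should be loaded with.

    Hypothetical helper: if float16 is requested but CUDA is not
    available, fall back to float32 instead of failing later in
    generation with '"topk_cpu" not implemented for 'Half''.
    """
    if requested == "float16" and not cuda_available:
        warnings.warn(
            "float16 requested without CUDA; falling back to float32."
        )
        return "float32"
    # Any other combination is passed through unchanged.
    return requested
```

In a PyTorch-based server, `cuda_available` would presumably come from `torch.cuda.is_available()`, and the returned name could be mapped to a dtype with `getattr(torch, name)` before loading the model.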

UserWarning: You are calling .generate() with the input_ids being on a device type different than your model's device. input_ids is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put input_ids to the correct device by calling for example input_ids = input_ids.to('cuda') before running .generate().
warnings.warn(
"topk_cpu" not implemented for 'Half'

Metadata

Labels

bug (Something isn't working)
