[BUG] "topk_cpu" not implemented for 'Half'

**Describe the bug**
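The server crashes during generation when a model is loaded in float16 and CUDA is not available. The error message indicates that the CPU `topk` kernel is not implemented for the `Half` (float16) dtype.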

**To Reproduce**
Steps to reproduce the behavior:
1. Run the server without CUDA (CPU only)
2. Load a model with float16
3. Generate some text
4. See the error

**Expected behavior**
When CUDA is not available, the server should fall back to float32 instead of crashing.
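A minimal sketch of the fallback I have in mind, assuming the dtype is chosen once at load time. The names `resolve_load_dtype` and `torch_dtype_for` are hypothetical and not part of the project; only `torch.cuda.is_available()` and the `torch.float16` / `torch.float32` dtypes are real PyTorch names.

```python
def resolve_load_dtype(requested: str, cuda_available: bool) -> str:
    """Return the dtype name to load the model with.

    Hypothetical helper: half precision is downgraded to float32 when
    CUDA is unavailable, since the error shows a CPU op missing for 'Half'.
    """
    if requested in ("float16", "half") and not cuda_available:
        return "float32"
    return requested


def torch_dtype_for(requested: str):
    """Map the resolved name to a torch dtype (hypothetical wrapper)."""
    # Imported lazily so resolve_load_dtype has no hard dependency on torch.
    import torch

    name = resolve_load_dtype(requested, torch.cuda.is_available())
    return getattr(torch, name)  # e.g. torch.float32 on a CPU-only machine
```

If the model is loaded through Hugging Face `transformers` (an assumption based on the `.generate()` warning in the log), the result of `torch_dtype_for("float16")` could be passed as the `torch_dtype` argument of `from_pretrained`.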

> UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
  warnings.warn(
"topk_cpu" not implemented for 'Half'
