fix(transcribe): auto-fallback to CPU + int8 when CUDA is unavailable#19

Open
fadenb wants to merge 1 commit into pretyflaco:main from fadenb:main
Conversation


@fadenb fadenb commented May 7, 2026

Summary

  • TranscriptionConfig.__post_init__ no longer raises ValueError when device='cuda' (or torch_device='cuda') is requested but CUDA is not available. Instead it automatically falls back to 'cpu' and downgrades compute_type from 'float16' to 'int8' (float16 is unsupported on CPU).
  • Model-loading log now indicates whether CPU was explicitly requested (forced), automatically chosen because no GPU was found (fallback - no GPU), or torch is missing (no torch).

Motivation
Running meet run on a machine without a GPU (laptop, container without passthrough, CI runner) currently crashes with an unhelpful ValueError. The user must know to pass --device cpu --compute-type int8 manually. This change makes it "just work" - the common case shouldn't require flags.

Test plan
I can currently only run one of these, as I lack a machine with a CUDA-capable GPU.

  • Run meet run on a machine without a GPU - should see warning, then transcribe successfully with int8 on CPU
  • Run meet run --device cpu on a machine with a GPU - should see (forced) in the log and use CPU as requested
  • Run meet run on a machine with a GPU - should use CUDA with float16 as before (no behavioural change)

Instead of raising ValueError when the requested CUDA device is not
present, automatically fall back to CPU and downgrade compute_type from
float16 to int8 (float16 is unsupported on CPU).  Also indicate whether
CPU is forced or a fallback in the model-loading print message.
@pretyflaco (Owner) left a comment

Direction is right — the current ValueError is genuinely user-hostile on no-GPU machines. The PR is well-scoped and the motivation is clear. Approving.

A few small follow-ups I'd want before this hits a release:

  1. Unit test for the __post_init__ fallback. Mocking _torch_device_available lets us cover all three of your test scenarios in CI without the hardware mix you flagged. ~15 lines in tests/test_transcribe.py.
  2. Warning log should mention the compute_type change. Right now a user who passed --compute-type float16 explicitly sees only the device fallback warning, while compute_type silently flips to int8. One-line append.
  3. (Out of scope, just noting:) meet check on a CUDA-less machine should keep working — your change preserves the if available is None: continue branch so this should be fine, I'll smoke-test it on my side.

You're blocked on hardware for two of your three test scenarios anyway, so I'm happy to land this as-is and push a follow-up commit with the unit test + warning tweak. Or, if you'd rather do it yourself for the learning, take a few days and add them here. Either works for me, just let me know which you prefer.

Either way, thanks for the clean PR — the diagnosis in the description was excellent.
