Conversation

@0xrushi (Contributor) commented Jan 5, 2026

  • Updated restart.sh, start.sh, and stop.sh to accept optional arguments for service management commands.
  • Enhanced services.py to support new compute mode 'strixhalo' for service deployment.
  • Modified wizard.py and related configuration files to include Strix Halo as a valid compute option.
  • Updated Docker Compose files to define services for Strix Halo, ensuring compatibility with AMD ROCm.
  • Improved documentation and comments for clarity on new configurations and service options.

Summary by CodeRabbit

  • New Features

    • Added AMD Strix Halo (NPU) as a compute acceleration option alongside existing CPU and GPU variants
    • Service containers now include dedicated configurations for different hardware accelerators
  • Documentation

    • Expanded documentation with startup flow diagrams and comprehensive system architecture explanations
  • Improvements

    • Enhanced script flexibility for custom argument handling across initialization and service management workflows


- Simplified the installation of system dependencies by removing conditional package manager checks and directly using `dnf` for installation.
- Cleaned up the Dockerfile to enhance readability and maintainability.
@thestumonkey (Contributor)

will test this

@AnkushMalaker (Collaborator)

@coderabbitai review

@coderabbitai bot commented Jan 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai bot commented Jan 6, 2026

📝 Walkthrough

This PR extends the system to support AMD Strix Halo (NPU) as a new accelerator variant alongside existing CPU and NVIDIA GPU options. Changes span configuration files, build recipes, dependency management, service orchestration, and startup workflows across ASR services and speaker recognition components.

Changes

  • **Documentation** (`Docs/init-system.md`): adds detailed sections explaining the startup flow, service variant selection (COMPUTE_MODE-driven), and per-accelerator build recipes; includes repeated sections and minor formatting adjustments.
  • **ASR Services Configuration** (`extras/asr-services/.env.template`, `extras/asr-services/init.py`): expands the CUDA version options and interactive prompts to include a strixhalo option ("4"); updates CLI argument choices.
  • **ASR Services Build & Composition** (`extras/asr-services/Dockerfile_Parakeet`, `extras/asr-services/docker-compose.yml`): adds ffmpeg to runtime dependencies and switches `uv sync` to `uv sync --frozen`; introduces two GPU variants (parakeet-asr-nvidia for NVIDIA, parakeet-asr-strixhalo for AMD/ROCm) with platform-specific device access, group options, and environment configuration.
  • **ASR Services Dependencies** (`extras/asr-services/pyproject.toml`): introduces a strixhalo extra backed by a rocm-gfx1151 index; treats strixhalo as a separate extra in conflict resolution.
  • **Speaker Recognition Configuration** (`extras/speaker-recognition/init.py`): extends the compute mode options to include "strixhalo" (option "3"); sets PYTORCH_CUDA_VERSION to strixhalo when selected; updates CLI argument choices.
  • **Speaker Recognition Build** (`extras/speaker-recognition/Dockerfile.strixhalo`): new Dockerfile for the strixhalo variant; uses the kyuz0/vllm-therock-gfx1151 base image, with conditional pyannote.audio installation and ROCm-specific environment variables.
  • **Speaker Recognition Composition** (`extras/speaker-recognition/docker-compose.yml`): adds a speaker-service-strixhalo variant with AMD/ROCm device access (/dev/kfd, /dev/dri) and platform-specific security options; extends the web-ui and nginx service profiles to include strixhalo.
  • **Speaker Recognition Dependencies** (`extras/speaker-recognition/pyproject.toml`): raises the Python requirement to >=3.11,<3.14; introduces a strixhalo extra with extensively pinned and git-based dependencies; excludes pyannote.audio from strixhalo; adds uv required-environments and conflict resolution for strixhalo.
  • **Speaker Recognition Import Handling** (`extras/speaker-recognition/src/simple_speaker_recognition/api/routers/identification.py`, `speakers.py`): moves create_speaker_analysis imports from module level to function level to avoid ROCm/BLAS runtime conflicts with sklearn/UMAP.
  • **Service Orchestration** (`services.py`, `wizard.py`): extends services.py to read PYTORCH_CUDA_VERSION from .env for asr-services and select the appropriate docker-compose profile/service variant; updates the speaker-recognition logic to select a service based on COMPUTE_MODE; adds strixhalo support to wizard.py's COMPUTE_MODE and PYTORCH_CUDA_VERSION reuse logic.
  • **Startup Scripts** (`start.sh`, `stop.sh`, `restart.sh`): converts single-line commands to Bash scripts with shebangs; defaults to --all when no arguments are provided, otherwise forwarding user arguments via "$@".
  • **Wizard Integration** (`wizard.sh`): updates the command invocation to forward all script arguments to wizard.py via "$@".
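The argument-forwarding pattern described for the startup scripts can be sketched as follows. This is a minimal illustration under assumptions, not the repo's exact scripts: `run_services` and the `services.py start` command it echoes are stand-in names.

```shell
#!/bin/sh
# Sketch of the wrapper pattern applied to start.sh, stop.sh, and restart.sh:
# with no arguments, default to --all; otherwise forward the caller's
# arguments verbatim via "$@" so quoting is preserved.
run_services() {
  if [ "$#" -eq 0 ]; then
    set -- --all
  fi
  # Stand-in for: uv run services.py start "$@"
  printf 'services.py start %s\n' "$*"
}

run_services                 # no args: falls back to --all
run_services asr-services    # args present: forwarded as-is
```

Using `set -- --all` inside the function rewrites its own positional parameters, so the same `"$@"` expansion serves both the default and the pass-through case.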

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • thestumonkey

Poem

🐰 A strix halo shines so bright,
NPU power, what a sight!
From cpu to gpu, now three,
Variants dancing wild and free,
Docker profiles hop with glee! 🎉

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

  • Description Check: ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled).
  • Title Check: ✅ Passed. The title clearly and accurately summarizes the main changes in the PR: enhancing service management scripts and adding support for a new hardware accelerator (Strix Halo).
  • Docstring Coverage: ✅ Passed. Docstring coverage is 90.00%, above the required threshold of 80.00%.

@coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (7)
extras/speaker-recognition/src/simple_speaker_recognition/api/routers/speakers.py (1)

79-81: Good workaround for ROCm OpenMP conflicts.

The local import approach is reasonable for avoiding OpenMP/BLAS threading conflicts between PyTorch (ROCm) and scikit-learn/UMAP libraries. Python's import caching ensures minimal performance overhead.

However, please verify this actually prevents segmentation faults in your ROCm environment, as conflicts can still occur if both libraries are used in the same process.

For additional robustness, consider also setting these environment variables at service startup:

```python
import os
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
```

This can further reduce OpenMP conflicts when multiple libraries are linked against different OpenMP runtimes.
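To make the deferred-import idea concrete, here is an illustrative sketch (hypothetical names; the stdlib `statistics` module stands in for the sklearn/UMAP-backed `create_speaker_analysis` import in the PR):

```python
import sys


def analyze_segments(durations):
    """Sketch of a function-level import: the heavy dependency is loaded
    on first call, after the PyTorch/ROCm runtime is already initialized.
    """
    # In the PR this would be the create_speaker_analysis import; `statistics`
    # is only a lightweight stand-in for the heavy dependency here.
    import statistics
    return statistics.mean(durations)


# Python caches modules in sys.modules, so repeated calls pay the
# import cost only once.
print(analyze_segments([1.0, 2.0, 3.0]))  # 2.0
print("statistics" in sys.modules)        # True
```

The trade-off is that the first call to the function is slower and import errors surface at call time rather than at service startup.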

extras/asr-services/.env.template (1)

5-7: Comment could be misleading for AMD users.

The instruction to check CUDA version with nvidia-smi doesn't apply when selecting strixhalo for AMD hardware.

🔎 Suggested improvement

```diff
 # Options: cu121 (CUDA 12.1), cu126 (CUDA 12.6), cu128 (CUDA 12.8), strixhalo (AMD strixhalo apu)
-# Should match your system's CUDA version (check with: nvidia-smi)
+# For NVIDIA: should match your system's CUDA version (check with: nvidia-smi)
+# For AMD Strix Halo: use 'strixhalo'
 PYTORCH_CUDA_VERSION=cu126
```
extras/speaker-recognition/Dockerfile.strixhalo (1)

14-15: Pin the uv version for reproducibility.

While the base image is pinned to a specific SHA, the uv installation uses :latest, which could lead to inconsistent builds over time. Consider pinning to a specific version like 0.9.21.

🔎 Suggested fix

```diff
-COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
+COPY --from=ghcr.io/astral-sh/uv:0.9.21 /uv /bin/uv
```
wizard.sh (1)

1-1: Missing shebang line.

The script lacks a shebang (#!/bin/bash or #!/bin/sh), which could cause unexpected behavior depending on how it's executed. Other scripts in this PR (e.g., stop.sh) include the shebang for consistency.

🔎 Suggested fix

```diff
+#!/bin/bash
 uv run --with-requirements setup-requirements.txt wizard.py "$@"
```
services.py (1)

154-174: Consider adding AMD device detection for Strix Halo.

The logic correctly validates NVIDIA device nodes (/dev/nvidiactl, /dev/nvidia0) before starting the NVIDIA variant, falling back to CPU if absent. For symmetry, consider checking for AMD ROCm device nodes (/dev/kfd) when strixhalo is configured to provide a similar safeguard and user-friendly warning.

🔎 Suggested enhancement

```diff
         if command == 'up':
             if pytorch_variant == 'strixhalo':
-                cmd.extend(['--profile', 'strixhalo', 'up', '-d', 'parakeet-asr-strixhalo'])
+                # Check for AMD ROCm device nodes
+                if Path("/dev/kfd").exists():
+                    cmd.extend(['--profile', 'strixhalo', 'up', '-d', 'parakeet-asr-strixhalo'])
+                else:
+                    console.print("[yellow]⚠️  asr-services configured for Strix Halo (AMD ROCm), but /dev/kfd not found; starting CPU variant instead.[/yellow]")
+                    cmd.extend(['up', '-d', 'parakeet-asr'])
             elif pytorch_variant.startswith('cu'):
```
extras/speaker-recognition/pyproject.toml (2)

61-83: Consider documenting the sys_platform == 'never' pattern more prominently.

The marker hack to prevent uv from installing base-image-provided packages is clever but unconventional. Future maintainers may be confused by this pattern.

Also, note the OpenTelemetry version constraints: opentelemetry-exporter-otlp>=1.34.0 has a lower minimum than opentelemetry-api/sdk>=1.39.0. While pip/uv will resolve to compatible versions, explicitly aligning these minimums (all to >=1.39.0) would be cleaner.

🔎 Suggested alignment for OpenTelemetry versions

```diff
     # OpenTelemetry dependencies for pyannote.audio
     "opentelemetry-api>=1.39.0",
     "opentelemetry-sdk>=1.39.0",
-    "opentelemetry-exporter-otlp>=1.34.0",
+    "opentelemetry-exporter-otlp>=1.39.0",
```

110-110: Pin git dependency to a specific commit or tag for reproducible builds.

The pyannote.audio git source will pull from the default branch (main), which could change at any time. This makes builds non-reproducible and could introduce unexpected breaking changes.

🔎 Suggested fix to pin to a specific revision

```diff
-"pyannote.audio" = { git = "https://github.com/pyannote/pyannote-audio.git", extra = "strixhalo" }
+"pyannote.audio" = { git = "https://github.com/pyannote/pyannote-audio.git", rev = "v3.3.2", extra = "strixhalo" }
```

Replace v3.3.2 with the appropriate tag or commit hash that has been tested with the Strix Halo/ROCm setup.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1036a17 and 0dcd50e.

⛔ Files ignored due to path filters (1)
  • extras/speaker-recognition/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • Docs/init-system.md
  • extras/asr-services/.env.template
  • extras/asr-services/Dockerfile_Parakeet
  • extras/asr-services/docker-compose.yml
  • extras/asr-services/init.py
  • extras/asr-services/pyproject.toml
  • extras/speaker-recognition/Dockerfile.strixhalo
  • extras/speaker-recognition/docker-compose.yml
  • extras/speaker-recognition/init.py
  • extras/speaker-recognition/pyproject.toml
  • extras/speaker-recognition/src/simple_speaker_recognition/api/routers/identification.py
  • extras/speaker-recognition/src/simple_speaker_recognition/api/routers/speakers.py
  • restart.sh
  • services.py
  • start.sh
  • stop.sh
  • wizard.py
  • wizard.sh
🧰 Additional context used
🧬 Code graph analysis (1)
stop.sh (2)
extras/asr-services/init.py (1)
  • run (226-253)
extras/speaker-recognition/init.py (1)
  • run (416-447)
🔇 Additional comments (27)
extras/speaker-recognition/src/simple_speaker_recognition/api/routers/identification.py (1)

575-577: Consistent application of the OpenMP conflict workaround.

The local import pattern is applied consistently in both analyze_annotation_segments and analyze_segments_with_enrolled_speakers. This aligns well with the same approach used in speakers.py.

The comments clearly document the rationale, making it easy for future maintainers to understand why imports are deferred in these specific functions.

Also applies to: 710-712

Docs/init-system.md (1)

195-246: LGTM! Clear and comprehensive documentation for the startup flow.

The Mermaid diagrams effectively illustrate the two-layer architecture (Setup vs Run) and the compute mode selection logic. The documentation accurately reflects the new strixhalo option alongside existing cpu and gpu modes.

extras/speaker-recognition/Dockerfile.strixhalo (2)

28-42: Well-structured dependency management for ROCm compatibility.

The approach of exporting requirements while pruning torch-related packages preserves the base image's ROCm torch stack. Installing pyannote.audio from git with --no-deps correctly avoids pulling incompatible PyPI dependencies.


56-65: LGTM!

Environment configuration is appropriate for ROCm with comprehensive LD_LIBRARY_PATH setup. The CMD correctly uses the absolute path to the entrypoint in the base image's venv.

extras/asr-services/Dockerfile_Parakeet (2)

20-21: LGTM!

The --frozen flag ensures strict adherence to the lockfile, preventing unexpected dependency updates during builds.


14-16: No action needed. The ffmpeg installation is correctly placed only in the builder stage. Runtime audio processing relies on soundfile and libsndfile1 (already installed in runtime at line 30), not ffmpeg. The application code contains no runtime usage of ffmpeg.

start.sh (1)

1-6: LGTM!

Clean implementation that preserves backward compatibility while enabling selective service management. Using "$@" correctly handles arguments containing spaces.

restart.sh (1)

2-6: LGTM!

Correctly implements the same argument-forwarding pattern as start.sh. The absence of --build in the default case aligns with the documented behavior that restart should not rebuild images.

wizard.py (3)

60-61: Verify the need for --prerelease=allow.

Adding --prerelease=allow to uv commands enables installation of prerelease package versions. This could introduce instability if unintentional.

Is this flag required for specific dependencies in the setup process, or is it a precautionary measure for future compatibility?


221-223: LGTM!

Correctly extends the compute mode validation to include strixhalo, enabling configuration reuse for AMD Strix Halo setups.


229-231: LGTM!

The CUDA version validation correctly includes strixhalo alongside existing NVIDIA CUDA versions, maintaining consistency with the .env.template and init.py options.

stop.sh (1)

1-6: LGTM!

The script correctly implements the conditional wrapper pattern: defaults to --all when no arguments are provided and forwards all arguments otherwise. The use of "$@" properly preserves argument quoting.

extras/asr-services/pyproject.toml (1)

39-42: LGTM!

The strixhalo optional dependency group follows the established pattern of the existing CUDA variants, maintaining consistency in the dependency configuration.

extras/asr-services/init.py (1)

155-172: LGTM!

The Strix Halo option is correctly integrated into both the interactive prompt and CLI argument handling. The choice mappings are consistent and the label "AMD Strix Halo (NPU)" clearly identifies the platform.

extras/asr-services/docker-compose.yml (2)

78-79: Security relaxation with seccomp:unconfined.

Disabling seccomp filtering reduces container isolation. While this is often required for ROCm to function properly, it increases the attack surface. Consider documenting this security trade-off in the setup documentation so operators are aware.


59-88: LGTM - AMD ROCm service configuration is well-structured.

The parakeet-asr-strixhalo service correctly configures:

  • ROCm device passthrough (/dev/kfd, /dev/dri)
  • Video group membership
  • Appropriate profile labels

Note: All variants share the parakeet-asr:latest image tag. Ensure users rebuild when switching between CPU/NVIDIA/AMD variants to avoid running mismatched images.
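The general shape of such a ROCm service entry is roughly the following. This is an illustrative sketch based on the details called out in the review (device nodes, video group, seccomp), not the repo's exact compose file:

```yaml
parakeet-asr-strixhalo:
  image: parakeet-asr:latest
  profiles: ["strixhalo"]
  devices:
    - /dev/kfd            # ROCm compute interface
    - /dev/dri            # GPU render nodes
  group_add:
    - video               # grants the container access to the GPU device nodes
  security_opt:
    - seccomp:unconfined  # often required for ROCm; relaxes container isolation
```

Because all variants share one image tag, switching between CPU/NVIDIA/AMD variants without rebuilding can silently run an image built for the wrong backend.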

services.py (1)

126-145: LGTM!

The speaker-recognition service handling correctly routes to speaker-service-strixhalo when COMPUTE_MODE=strixhalo, maintaining consistency with the existing GPU/CPU selection pattern.
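A minimal sketch of that COMPUTE_MODE-driven selection follows. The helper and the CPU/GPU service names are hypothetical; only `speaker-service-strixhalo` is named in the review, and the real services.py logic may differ:

```python
from pathlib import Path


def pick_speaker_service(env_path: str = ".env") -> str:
    """Return the docker-compose service to start based on COMPUTE_MODE.

    Illustrative only: reads a simple KEY=VALUE .env file and maps the
    compute mode to a compose service name.
    """
    mode = "cpu"  # default when .env is missing or has no COMPUTE_MODE
    env_file = Path(env_path)
    if env_file.exists():
        for line in env_file.read_text().splitlines():
            if line.startswith("COMPUTE_MODE="):
                mode = line.split("=", 1)[1].strip()
    return {
        "strixhalo": "speaker-service-strixhalo",  # named in the review
        "gpu": "speaker-service-gpu",              # hypothetical name
    }.get(mode, "speaker-service-cpu")             # hypothetical name
```

Falling back to the CPU variant for unknown modes mirrors the safeguard the review suggests for missing device nodes.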

extras/speaker-recognition/docker-compose.yml (2)

85-85: LGTM!

Profile extension ensures the web UI starts alongside the Strix Halo backend variant.


58-77: LGTM - Strix Halo service configuration is correct.

The speaker-service-strixhalo service correctly:

  • Extends the base configuration with YAML anchors
  • References the existing Dockerfile.strixhalo
  • Configures AMD ROCm device passthrough (/dev/kfd, /dev/dri) and security options appropriately
extras/speaker-recognition/init.py (3)

196-208: LGTM! Compute mode selection logic is well-structured.

The new "strixhalo" option is cleanly integrated into the existing choice flow. The if/elif/else chain correctly maps choices 1→cpu, 2→gpu, 3→strixhalo.


213-216: LGTM! CUDA version handling correctly routes strixhalo to its own pathway.

The conditional chain properly separates strixhalo from the GPU/CUDA detection flow, which aligns with the distinct ROCm-based build path for Strix Halo.


455-457: LGTM! CLI arguments are consistent with interactive mode.

The --compute-mode argument now accepts "strixhalo" and the help text accurately reflects all three options.

extras/speaker-recognition/pyproject.toml (5)

5-5: Verify Python version bump doesn't break existing deployments.

The minimum Python version increased from 3.10 to 3.11. Ensure all deployment environments and CI pipelines support Python 3.11+, as this is a breaking change for users on Python 3.10.


11-11: LGTM! Conditional dependency correctly routes pyannote.audio source.

The extra != 'strixhalo' marker ensures the PyPI version is used for standard builds while allowing the git source (line 110) to provide it for the strixhalo pathway.


86-86: Verify required-environments constraint doesn't block local development.

This restricts dependency resolution to Linux x86_64 only. If contributors develop on macOS or ARM Linux, they'll need a workaround (e.g., Docker). Ensure this is intentional and documented.


93-93: LGTM! Strixhalo correctly added to the mutually exclusive extras.

The conflicts list ensures only one compute backend extra (cpu, cu121, cu126, cu128, strixhalo) can be active at a time, preventing dependency conflicts.
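A uv conflict declaration for mutually exclusive extras generally takes this form (a sketch following uv's documented syntax, not the repo file verbatim):

```toml
[tool.uv]
conflicts = [
    [
        { extra = "cpu" },
        { extra = "cu121" },
        { extra = "cu126" },
        { extra = "cu128" },
        { extra = "strixhalo" },
    ],
]
```

Each inner list declares a set of extras that uv will refuse to resolve together, so a user cannot accidentally mix, say, a CUDA torch wheel with the ROCm pathway.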


154-154: LGTM! isort profile aligned with Black.

Setting profile = "black" ensures import sorting is compatible with Black's formatting.

Comment on lines +89 to +92 of `extras/asr-services/pyproject.toml`:

```toml
[[tool.uv.index]]
name = "rocm-gfx1151"
url = "https://rocm.nightlies.amd.com/v2/gfx1151/"
explicit = true
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

AMD ROCm PyTorch gfx1151 stable release availability

💡 Result:

Short answer: Yes — AMD's ROCm/PyTorch builds include support for gfx1151 (Strix Halo / Ryzen AI Max class) in the ROCm 7.x line, but some components (e.g., AOTriton and other newer toolchains) are listed as experimental or under investigation, and there are kernel requirement/workaround notes. (Status verified as of January 6, 2026.) [1][2][3]

Sources

  • ROCm PyTorch compatibility (ROCm docs — shows gfx1151 listed and notes about experimental AOTriton support). [1]
  • ROCm install / PyTorch install doc (calls out gfx1151 / Ryzen AI Max 300 kernel requirements and workarounds). [2]
  • ROCm public issues (community reports/questions about gfx1151 support/build issues). [3]

🌐 Web query:

AMD ROCm official stable releases gfx1151 PyPI index

💡 Result:

Short answer:

  • AMD's official ROCm release notes show gfx1151 (RDNA3.5 / Ryzen AI Max series) enabled in ROCm 7.x (e.g., 7.1.x). [1]
  • AMD publishes an official PyPI package index for ROCm-hosted Python wheels (pypi.amd.com) and ROCm docs instruct using it via pip with --extra-index-url (examples: https://pypi.amd.com/simple or versioned paths like /rocm-7.0.2/simple). [2][3]

Sources:
[1] ROCm 7.1.0 / release notes — gfx1150 / gfx1151 enabled.
[2] hipDF installation (shows AMD PyPI and example --extra-index-url=https://pypi.amd.com/rocm-7.0.2/simple).
[3] Multiple ROCm docs (e.g., MONAI / CuPy) showing pip install --extra-index-url=https://pypi.amd.com/simple.


🌐 Web query:

rocm.nightlies.amd.com PyTorch nightly builds index

💡 Result:

The ROCm-hosted PyTorch nightly wheel index is available at https://rocm.nightlies.amd.com/ (per-variant subpaths like /v2/gfx120X-all/). You can install nightlies with pip using the index URL, for example:

```shell
python -m pip install --index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/ --pre torch torchvision torchaudio
```

Sources: [1], [2].


Consider switching to AMD's official stable PyPI index instead of nightlies.

The rocm-gfx1151 index points to AMD's nightly builds (rocm.nightlies.amd.com), which may introduce unstable or breaking changes. Official stable PyPI releases for ROCm 7.x include gfx1151 support and are available at https://pypi.amd.com/simple or versioned paths like https://pypi.amd.com/rocm-7.0.2/simple. Use stable releases unless nightlies are specifically required for this project, and document the rationale if nightly builds are intentional.
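Applied to the `[[tool.uv.index]]` block, the suggested change would look roughly like the following (the URL follows the review's suggestion; verify the exact versioned path against AMD's index before adopting):

```toml
[[tool.uv.index]]
name = "rocm-gfx1151"
# Stable AMD index; a versioned path such as
# https://pypi.amd.com/rocm-7.0.2/simple can pin a specific ROCm release.
# If nightlies are intentionally required, keep the nightly URL and document why.
url = "https://pypi.amd.com/simple"
explicit = true
```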

🤖 Prompt for AI Agents
In @extras/asr-services/pyproject.toml around lines 89-92, The index entry named
"rocm-gfx1151" currently points to AMD nightlies and should be changed to a
stable AMD PyPI index: update the URL in the [[tool.uv.index]] block for name
"rocm-gfx1151" to use the official stable endpoint (e.g.
https://pypi.amd.com/simple or a versioned path like
https://pypi.amd.com/rocm-7.0.2/simple) and, if you intentionally need
nightlies, add a brief comment nearby explaining that rationale and why nightly
packages are required for this project.
