fix: add missing modprobe params for cosim amdgpu driver init #10

Merged
zevorn merged 8 commits into main from fix/cosim-modprobe-params
Apr 28, 2026

Conversation

@zevorn (Owner) commented Apr 28, 2026

Summary

  • Stop cosim-gpu-setup.sh from delegating to /home/gem5/load_amdgpu.sh (which is designed for standalone gem5 and uses ip_block_mask=0x6f with PSP enabled)
  • Add the missing ppfeaturemask=0 dpm=0 audio=0 parameters to all cosim modprobe commands. Without these, the ROCm 7.0 driver tries to initialize PowerPlay/DPM against unmodeled gem5 registers and fails with -EINVAL
  • Add an insmod fallback for zstd-compressed (.ko.zst) modules (Ubuntu 24.04)
  • Update all docs (en/zh) to use the complete parameter set
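
The load path summarized above could look roughly like the following bash sketch. The parameter values are the ones listed in this PR; the function name `load_amdgpu`, the module lookup under `/lib/modules`, and the temporary-file handling are assumptions for illustration, not the repository's actual script.

```shell
#!/usr/bin/env bash
# Sketch only: parameter values come from the PR text; everything else
# (function name, module lookup, temp file) is a hypothetical illustration.

# An array keeps each parameter as a separate word without relying on
# unquoted word splitting.
AMDGPU_ARGS=(
    ip_block_mask=0x67
    ppfeaturemask=0
    dpm=0
    audio=0
    ras_enable=0
    discovery=2
)

load_amdgpu() {
    local ko_zst tmp rc

    # Preferred path: modprobe resolves module dependencies itself.
    if modprobe amdgpu "${AMDGPU_ARGS[@]}"; then
        return 0
    fi

    # Fallback for zstd-compressed modules: decompress, then insmod.
    # Note that insmod does not load dependencies; they must already be present.
    ko_zst=$(find "/lib/modules/$(uname -r)" -name 'amdgpu.ko.zst' -print -quit)
    [ -n "$ko_zst" ] || return 1
    tmp=$(mktemp --suffix=.ko) || return 1
    zstd -d -q -f -o "$tmp" "$ko_zst" && insmod "$tmp" "${AMDGPU_ARGS[@]}"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

Calling `load_amdgpu` as root in the guest would try modprobe first and fall back to insmod only if that fails.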

Test plan

  • Boot cosim (QEMU+gem5) with the updated disk image and verify that cosim-gpu-setup.service loads amdgpu successfully
  • Verify rocm-smi shows device 0x74a0 and rocminfo shows gfx942
  • Test manual modprobe path: modprobe amdgpu ip_block_mask=0x67 ppfeaturemask=0 dpm=0 audio=0 ras_enable=0 discovery=2

Fixes #9

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b98db1705c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/cosim_guest_setup.sh Outdated
The cosim environment requires ppfeaturemask=0 dpm=0 audio=0 in
addition to ip_block_mask=0x67 to prevent the ROCm 7.0 driver from
accessing unmodeled power management and audio registers in gem5.

Without these parameters, modprobe fails with -EINVAL (-22) because
the driver attempts PowerPlay/DPM initialization against registers
that gem5 does not handle.

Update all modprobe commands and documentation across scripts, docs,
and CLAUDE.md to use the complete parameter set.

Also fix cosim_guest_setup.sh to stop delegating to load_amdgpu.sh
(which is for standalone gem5 and uses ip_block_mask=0x6f with PSP
enabled).

Fixes #9

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
@zevorn force-pushed the fix/cosim-modprobe-params branch from b98db17 to 3ad8e90 on April 28, 2026 10:19
zevorn added 2 commits April 28, 2026 18:21
…_setup

The kernel cmdline passes modprobe.blacklist=amdgpu, which creates
/run/modprobe.d/ entries that cause modprobe to silently exit 0 without
loading the module. Clear the runtime blacklist before probing.

Same fix was already applied to cosim-gpu-setup.sh but was missing
from cosim_guest_setup.sh.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
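
The blacklist-clearing step described in the commit above can be sketched as follows. This is a minimal illustration, assuming the blacklist lives in `.conf` files under a directory such as `/run/modprobe.d/`; the function name `clear_runtime_blacklist` is hypothetical and the real script may remove the entries differently.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and file layout are assumptions.
# Deletes every *.conf file in a directory that blacklists the given module.
clear_runtime_blacklist() {
    local dir=$1 module=$2 conf

    # Nothing to do if the directory does not exist.
    [ -d "$dir" ] || return 0

    for conf in "$dir"/*.conf; do
        [ -f "$conf" ] || continue
        # Match a line of the form "blacklist <module>", allowing extra spaces.
        if grep -Eq "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*\$" "$conf"; then
            # The whole file is removed, including any other entries in it.
            rm -f -- "$conf"
        fi
    done
}
```

A call such as `clear_runtime_blacklist /run/modprobe.d amdgpu` would then run before the modprobe step.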
- Convert AMDGPU_ARGS from string to bash array to satisfy SC2086
  in both cosim_guest_setup.sh and cosim-gpu-setup.sh
- Add rm blacklist step to cosim_guest_setup.sh (codex review)
- Add tests/test_modprobe_params.sh: verifies all cosim scripts
  include required modprobe parameters (ppfeaturemask, dpm, etc.)
- Add modprobe-params CI job to prevent regression of issue #9

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Reviewed commit: 39b86f188a

Comment thread tests/test_modprobe_params.sh
@hsharsha

Tested this fix, works for me. Thanks

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Reviewed commit: 94c8e9afbc

Comment thread tests/test_modprobe_params.sh Outdated
Comment thread .github/workflows/ci.yml Outdated
…clone

- Rewrite test to check each AMDGPU_ARGS definition and inline
  modprobe/insmod line independently instead of merging into one blob
- Remove submodules: true from ShellCheck job (not needed)
- Only init gem5-resources submodule for modprobe-params job

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Reviewed commit: 89c1fe6ad6

Comment thread .github/workflows/ci.yml Outdated
.gitmodules uses SSH URLs which fail on CI runners without deploy
keys. Override with git config insteadOf to use HTTPS.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
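
The `insteadOf` override mentioned in the commit above can be sketched like this. It is an assumption that the SSH URLs point at github.com and that the rewrite is applied globally on the runner; the function name is hypothetical, and the actual CI step may differ.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the submodule URLs use the git@github.com: SSH form.
# After this, git transparently rewrites those URLs to HTTPS when fetching.
use_https_for_github() {
    git config --global url."https://github.com/".insteadOf "git@github.com:"
}
```

In a CI job this would run before `git submodule update --init` for the submodule that the job needs.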

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Reviewed commit: ec2e9d6da4

Comment thread tests/test_modprobe_params.sh Outdated
…_mask

Broaden the regex to catch any modprobe/insmod amdgpu invocation so
the test also fails if ip_block_mask itself is removed. Skip lines
that use AMDGPU_ARGS variable (validated via the definition check).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Reviewed commit: 6164191899

Comment thread tests/test_modprobe_params.sh Outdated
Substring matching allowed incorrect values like ppfeaturemask=0xff
to pass (contains "ppfeaturemask=0"). Normalize delimiters and use
word-boundary matching to require exact param=value tokens.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
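
The difference between substring and exact-token matching described in the commit above can be illustrated with a small sketch. It normalizes a few delimiters to spaces and then compares whole space-separated tokens, which is one simple way to get the same effect as word-boundary matching. The function name `has_exact_param` and the chosen delimiter set are assumptions, not the contents of tests/test_modprobe_params.sh.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper, not the repository's test code.
# Succeeds only if TOKEN appears in LINE as a complete param=value word.
has_exact_param() {
    local line=$1 token=$2 normalized

    # Turn quotes, parentheses, and tabs into spaces so every parameter
    # ends up surrounded by spaces.
    normalized=$(printf '%s' "$line" | tr "\"'()\t" ' ')

    # Padding both sides with a space lets the first and last tokens match too.
    case " $normalized " in
        *" $token "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With this check, a line containing `ppfeaturemask=0xff` no longer satisfies a requirement for `ppfeaturemask=0`, while a line containing the exact token still does.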
@zevorn merged commit 3a3562d into main Apr 28, 2026
5 checks passed

Development

Successfully merging this pull request may close these issues.

amdgpu driver fails to initialize with Ubuntu 24.04 image

2 participants