
creating prompt bench #291

Draft
RichieCahill wants to merge 22 commits into main from feature/creating-prompt_bench

Conversation

@RichieCahill (Owner)

@RichieCahill RichieCahill commented Apr 10, 2026

Summary by CodeRabbit

Release Notes

  • New Features
    • Added prompt benchmarking system for evaluating language models via vLLM integration
    • Added fine-tuning capabilities for Qwen 3.5 4B using QLoRA with containerized deployment
    • Added CLI tools for model downloading, dataset preparation, and token analysis
    • Added bill text compression and legislative summarization support
    • Added OpenAI Batch API integration for scalable batch processing

@RichieCahill RichieCahill marked this pull request as draft April 10, 2026 17:00
@coderabbitai

coderabbitai bot commented Apr 10, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 47ac8153-8809-4105-ab10-d0da6a19af50

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Introduces a comprehensive LLM benchmarking and fine-tuning framework for Congressional bill summarization. New components include vLLM-based inference orchestration, OpenAI Batch API integration, Unsloth-based QLoRA fine-tuning, bill text compression utilities, and Docker containerization. System configuration updates provision ZFS storage and hardware setup for benchmarking infrastructure.

Changes

Cohort / File(s) Summary
Configuration & Dependencies
.gitignore, overlays/default.nix, pyproject.toml
Added data/ to gitignore; added Python dependencies huggingface-hub and tiktoken; registered five new CLI entry points (prompt-bench, prompt-bench-download, finetune, finetune-container, build-finetune-dataset) with Ruff lint exemptions for new modules.
Benchmarking Core
python/prompt_bench/models.py, python/prompt_bench/container.py, python/prompt_bench/vllm_client.py, python/prompt_bench/main.py, python/prompt_bench/downloader.py
Implemented vLLM-based benchmark orchestration: BenchmarkConfig Pydantic model for configuration, container.py for Docker lifecycle management, vllm_client.py HTTP client with readiness polling, main.py main runner that loads prompts and models then executes parallel benchmarks, and downloader.py for HuggingFace model downloading.
OpenAI Batch API & Token Analysis
python/prompt_bench/batch_bill_summarizer.py, python/prompt_bench/compresion_test.py, python/prompt_bench/tools/build_finetune_dataset.py, python/prompt_bench/tools/count_tokens.py
Added batch job submission (batch_bill_summarizer.py), concurrent API testing (compresion_test.py), fine-tuning dataset construction from batch results (build_finetune_dataset.py), and token usage aggregation (count_tokens.py) across multiple runs.
Fine-tuning Infrastructure
python/prompt_bench/finetune.py, python/prompt_bench/finetune_container.py, python/prompt_bench/Dockerfile.finetune, python/prompt_bench/train.sh
Implemented Unsloth-based QLoRA fine-tuning for Qwen 3.5 4B: finetune.py orchestrates training with trl.SFTTrainer, finetune_container.py manages Docker lifecycle with configurable hyperparameters, Dockerfile builds the container image, and train.sh provides shell wrapper.
Bill Processing Utilities
python/prompt_bench/bill_token_compression.py, python/prompt_bench/summarization_prompts.py, python/prompt_bench/__init__.py
Added bill text compression with 14+ regex and string transformation steps (bill_token_compression.py), legislative analysis system and user prompts for bill summarization, and module docstring.
System Configuration
systems/bob/default.nix, systems/bob/scripts/zfs.sh, systems/bob/snapshot_config.toml, systems/rhapsody-in-green/sunshine.nix, users/richie/home/gui/vscode/settings.json, users/richie/systems/bob.nix
Updated NixOS module signatures; added ZFS pool/dataset provisioning script; adjusted snapshot retention; refactored boot/hardware configuration in sunshine module; added spell-check exceptions and Docker extension settings; removed GUI imports from user configuration.

Sequence Diagram

sequenceDiagram
    participant User as User/CLI
    participant Main as main.py
    participant Config as BenchmarkConfig
    participant Downloader as downloader.py
    participant Container as container.py
    participant Server as vLLM Server
    participant Client as VLLMClient
    participant Executor as ThreadPoolExecutor
    participant Output as Output Files

    User->>Main: Run prompt-bench with config
    Main->>Config: Load & parse TOML
    Main->>Main: Validate prompts directory
    
    loop For each model repo
        Main->>Downloader: check/download model
        Downloader->>Downloader: is_model_present?
        
        Main->>Container: start_vllm(model)
        Container->>Server: docker run vLLM
        Server->>Server: Load model into memory
        
        Main->>Client: Create VLLMClient
        Client->>Server: poll /v1/models
        Server-->>Client: ready
        
        Main->>Executor: Submit prompt batch
        
        loop For each prompt (parallel)
            Executor->>Client: complete(prompt, model)
            Client->>Server: POST /v1/completions
            Server->>Server: Generate response
            Server-->>Client: response text
            Client-->>Executor: completion result
            Executor->>Output: Write response + timing
        end
        
        Main->>Container: stop_vllm()
        Container->>Server: docker stop/rm
    end
    
    Main->>Output: Write _timing.json summary
    Main-->>User: Report totals & metrics

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • adding heater control #248: Modifies overlays/default.nix to add Python dependencies (httpx, pydantic, tinytuya) to the same python-env package set; directly overlaps with this PR's dependency additions.
  • setup sunshine.nix #290: Updates systems/rhapsody-in-green/sunshine.nix boot/kernelParams and firmware handling; related to similar configuration refactoring in this PR's sunshine.nix changes.
  • testing "${self}/ in imports #201: Modifies systems/bob/default.nix module imports and structure; overlaps with this PR's changes to Bob's NixOS module signature and configuration.

Poem

🐰 A benchmark bunny hops through vLLM gates,
Counting tokens while fine-tuning weights,
Bills compressed with regex magic so tight,
OpenAI batches processed day and night,
Qwen's wisdom in containers takes flight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Title check (❓ Inconclusive): The PR title 'creating prompt bench' is vague and generic, failing to describe the substantial changes across multiple modules including CLI tools, Docker configurations, fine-tuning scripts, and infrastructure updates. Resolution: use a more descriptive title that captures the primary scope, such as 'Add prompt benchmarking system for LLM evaluation' or 'Implement LLM prompt benchmark and fine-tuning tools'.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 20

🧹 Nitpick comments (2)
python/prompt_bench/batch_bill_summarizer.py (1)

170-173: Avoid the dual API-key fallback unless you explicitly want compatibility mode.

This CLI is new, so carrying both CLOSEDAI_TOKEN and OPENAI_API_KEY bakes in precedence behavior you'll need to preserve. Standardizing on one name keeps the interface simpler. As per coding guidelines, "Avoid compatibility branches like PG_USER and POSTGRESQL_URL unless requested".
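For illustration, a single-variable read might look like this (stdlib only; the `require_api_key` helper name is hypothetical, and the real CLI raises typer.BadParameter rather than RuntimeError):

```python
import os


def require_api_key(var: str = "CLOSEDAI_TOKEN") -> str:
    """Read exactly one canonical env var and fail loudly, naming it, if unset."""
    value = os.getenv(var)
    if not value:
        msg = f"environment variable {var} is required"
        raise RuntimeError(msg)
    return value
```

With a single name there is no precedence behavior to document or preserve later.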

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/prompt_bench/batch_bill_summarizer.py` around lines 170 - 173, The
code currently falls back from getenv("CLOSEDAI_TOKEN") to
getenv("OPENAI_API_KEY"); remove the dual-key compatibility and require a single
standardized env var instead (use CLOSEDAI_TOKEN), so update the api_key
assignment to only read getenv("CLOSEDAI_TOKEN") and change the error in the
BadParameter raised by batch_bill_summarizer.py to reference that one variable
name (api_key, getenv, CLOSEDAI_TOKEN, and the BadParameter raise are the
symbols to edit).
python/prompt_bench/compresion_test.py (1)

176-179: Pick one API-key env var instead of a fallback chain.

This CLI is new, so carrying both CLOSEDAI_TOKEN and OPENAI_API_KEY adds precedence behavior you'll need to support indefinitely. If both names are truly required, document the precedence explicitly; otherwise standardize on one. As per coding guidelines, "Avoid compatibility branches like PG_USER and POSTGRESQL_URL unless requested".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/prompt_bench/compresion_test.py` around lines 176 - 179, The code
currently uses a fallback chain getenv("CLOSEDAI_TOKEN") or
getenv("OPENAI_API_KEY"), which introduces implicit precedence; pick a single
canonical env var (e.g., CLOSEDAI_TOKEN) and replace the fallback with a single
getenv call for that name, then update the error message in the
typer.BadParameter to reference only the chosen variable; if you truly need
both, explicitly document and implement clear precedence for CLOSEDAI_TOKEN vs
OPENAI_API_KEY rather than relying on the current implicit chain (refer to the
api_key variable and the typer.BadParameter raise to locate where to change).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Line 34: The console entrypoint "build-finetune-dataset" points to
python.prompt_bench.build_finetune_dataset:cli but the actual cli() lives in
python.prompt_bench.tools.build_finetune_dataset; fix by either creating a shim
module python/prompt_bench/build_finetune_dataset.py that imports and re-exports
cli from python.prompt_bench.tools.build_finetune_dataset (e.g., from
.tools.build_finetune_dataset import cli) or by editing pyproject.toml to change
the entrypoint to python.prompt_bench.tools.build_finetune_dataset:cli so the
build-finetune-dataset entry references the actual module containing cli().
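Of the two options, repointing the entry point at the existing module is the smaller change; a sketch of the corrected line (assuming the entry points live in a [project.scripts] table):

```toml
[project.scripts]
build-finetune-dataset = "python.prompt_bench.tools.build_finetune_dataset:cli"
```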

In `@python/prompt_bench/batch_bill_summarizer.py`:
- Around line 124-125: The current use of safe_id = safe_filename(bill_id)
before request_lines.append(build_request(...)) can cause collisions when
different bill_id values map to the same safe_filename output; change to produce
a collision-safe custom_id (e.g., combine safe_filename(bill_id) with a stable
short hash or unique suffix derived from bill_id) and use that custom_id in
build_request and downstream dataset keys; also keep the original bill_id in the
request metadata so you can trace back to the source. Ensure you update
references to safe_id to the new custom_id variable everywhere
request_lines/dataset building expects custom_id.

In `@python/prompt_bench/compresion_test.py`:
- Around line 35-40: The SYSTEM_PROMPT multi-line string contains lines (notably
the long lines around the EXTRACTION RULES and the IGNORE rule) that exceed the
project's line-length limit; update the SYSTEM_PROMPT definition to wrap those
long lines so each physical line is within the Ruff limit (e.g., break the long
sentences across multiple lines inside the triple-quoted string or split into
concatenated shorter strings) while preserving the exact prompt text and
newlines; locate the SYSTEM_PROMPT variable and adjust the string literal so
lines that previously exceeded the limit (around the EXTRACTION RULES and the
IGNORE bullet) are wrapped to comply with the linter.

In `@python/prompt_bench/container.py`:
- Around line 67-81: check_gpu_free currently only logs warnings, letting
benchmark entry points continue and risk noisy timings or OOMs; change
check_gpu_free to return a boolean (e.g., True when GPU is free, False when
busy) or raise a specific exception (e.g., GPUBusyError) instead of just
logging, and update the benchmark entry points that call check_gpu_free to check
the return value or catch the exception and abort early (or proceed only if an
explicit opt-in/force flag is provided). Ensure the function signature
(check_gpu_free) and its callers are updated consistently and that log messages
remain informative while control flow is handled by the return value/exception.
- Around line 29-74: The subprocess calls use bare executables and trigger Ruff
S607; resolve the absolute paths once with shutil.which and use those absolute
paths in the command lists for start (the docker run command built in the
function that starts vLLM), stop_vllm (the subprocess.run calls that pass
["docker", "stop", CONTAINER_NAME] and ["docker", "rm", CONTAINER_NAME]) and
check_gpu_free (the ["nvidia-smi", ...] call). Add code to call
shutil.which("docker") and shutil.which("nvidia-smi") (store e.g. DOCKER_PATH
and NSMI_PATH), raise a clear RuntimeError or log+exit if which returns None,
and replace the first element of each subprocess command list with the
corresponding absolute path so all subprocess.run invocations reference the
resolved executable.
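The resolution step this describes can be a small shared helper; a stdlib-only sketch (the `resolve_exe` name is ours, not from the PR):

```python
import shutil


def resolve_exe(name: str) -> str:
    """Resolve an executable to an absolute path once, failing fast if absent."""
    path = shutil.which(name)
    if path is None:
        msg = f"required executable not found on PATH: {name}"
        raise RuntimeError(msg)
    return path
```

Module-level constants like `DOCKER_PATH = resolve_exe("docker")` then feed every subprocess.run command list, satisfying Ruff S607.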

In `@python/prompt_bench/Dockerfile.finetune`:
- Around line 15-25: The Dockerfile runs as root (no USER directive) which is
insecure; modify the Dockerfile.finetune to create a non-root user and switch to
it before the ENTRYPOINT: add steps after setting WORKDIR to create a user/group
(e.g., uid/gid 1000 or a stable UID), chown /workspace and any copied files
(referenced by WORKDIR and the copied python/prompt_bench/* and
python/__init__.py artifacts) to that user, and add a USER instruction so the
process launched by ENTRYPOINT ["python", "-m", "python.prompt_bench.finetune"]
runs as the non-root user. Ensure ownership changes cover all copied files and
happen before switching users.
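A sketch of that pattern, assuming a Debian-based base image (the `trainer` name and uid/gid 1000 are placeholders, not values from the PR):

```dockerfile
# Run training as an unprivileged user (sketch; name and uid are assumptions)
RUN groupadd --gid 1000 trainer \
    && useradd --uid 1000 --gid trainer --create-home trainer \
    && chown -R trainer:trainer /workspace
USER trainer
ENTRYPOINT ["python", "-m", "python.prompt_bench.finetune"]
```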

In `@python/prompt_bench/downloader.py`:
- Around line 23-27: The is_model_present function can raise if the resolved
path exists but is not a directory; update is_model_present (which calls
local_model_path) to first check path.is_dir() before iterating: return
path.exists() and path.is_dir() and any(path.iterdir()); this ensures you only
call iterdir() on directories and avoids aborting the download flow when a
non-directory file exists at the model path.
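A sketch of the guarded check in the shape the comment describes (the standalone `path` parameter is an assumption; the real function derives the path via local_model_path):

```python
from pathlib import Path


def is_model_present(path: Path) -> bool:
    """A non-empty directory counts as a downloaded model; anything else does not."""
    # is_dir() already implies exists(), and guards the iterdir() call
    return path.is_dir() and any(path.iterdir())
```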

In `@python/prompt_bench/finetune_container.py`:
- Around line 25-29: Replace bare "docker" subprocess invocations with a
resolved absolute path: call shutil.which("docker") once (e.g., assign
DOCKER_BIN = shutil.which("docker") at module import), raise a clear error if it
returns None, and then use DOCKER_BIN in all subprocess.run/call invocations
(including the build call that uses DOCKERFILE_PATH and FINETUNE_IMAGE and the
later docker commands around lines ~119–140). This makes subprocess calls
Ruff-compliant (S603/S607) and reuses the resolved path across functions instead
of hardcoding "docker".

In `@python/prompt_bench/finetune.py`:
- Around line 94-95: Add upfront validation in the finetune entrypoint (check
the validation_split parameter passed to the function that declares
validation_split and output_dir) to ensure validation_split is a float within
(0.0, 1.0) (reject 0 or 1) and validate the dataset length before calling
train_test_split (the code that later calls
sklearn.model_selection.train_test_split or a helper for splitting) to ensure
there are enough samples for the requested split (e.g., require
min_total_samples and ensure int(len(dataset) * validation_split) >= 1 and
int(len(dataset) * (1 - validation_split)) >= 1); raise a clear ValueError or
typer.BadParameter with a descriptive message if the split is out of range or
the dataset is too small.
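The two validations can be sketched as one upfront guard (names are illustrative, not the PR's; the real code would raise typer.BadParameter):

```python
def check_split(n_samples: int, validation_split: float) -> None:
    """Reject out-of-range splits and datasets too small to split."""
    if not 0.0 < validation_split < 1.0:
        msg = f"validation_split must be in (0, 1), got {validation_split}"
        raise ValueError(msg)
    n_val = int(n_samples * validation_split)
    n_train = n_samples - n_val
    if n_val < 1 or n_train < 1:
        msg = (
            f"dataset of {n_samples} samples is too small for "
            f"validation_split={validation_split}"
        )
        raise ValueError(msg)
```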

In `@python/prompt_bench/summarization_prompts.py`:
- Around line 1-34: Add a module docstring at the top describing the purpose of
this module (e.g., "Prompt templates for legislative bill summarization") to
satisfy Ruff D100, and fix E501 by breaking up overly long string literals
SUMMARIZATION_SYSTEM_PROMPT and SUMMARIZATION_USER_TEMPLATE so no source line
exceeds the configured max line length (use implicit multi-line string
concatenation with parentheses or textwrap.dedent + join while preserving the
exact prompt text and formatting). Locate the constants
SUMMARIZATION_SYSTEM_PROMPT and SUMMARIZATION_USER_TEMPLATE in this module and
wrap their content into shorter segments or use a dedented triple-quoted
docstring assembled from multiple lines to keep each source line within the
linter limit without changing the prompts' semantics. Ensure no other top-level
comments or code remain missing a module docstring.

In `@python/prompt_bench/tools/build_finetune_dataset.py`:
- Line 1: Add a new python/prompt_bench/tools/__init__.py to turn the implicit
namespace into a regular package (this will clear INP001 and satisfy
Ruff/treefmt); create the file (can be empty or include a minimal __all__
export) so modules like build_finetune_dataset.py import correctly and the tools
package is no longer an implicit namespace.
- Around line 22-61: In both load_requests and load_completions, currently
assigning results[custom_id] will silently overwrite duplicate custom_id
entries; modify each function (load_requests and load_completions) to detect
duplicates before assignment and fail fast—e.g., if custom_id in results: raise
ValueError or log an error and raise with a clear message including the
custom_id and line_number (use line_number in load_completions) to prevent
silent overwrites; ensure the check is placed right before results[custom_id] =
... so duplicates are rejected rather than replaced.
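A sketch of the fail-fast indexing loop (hypothetical helper; the real functions also unpack request/completion payloads before storing them):

```python
import json


def load_by_custom_id(lines: list[str]) -> dict[str, dict]:
    """Index JSONL records by custom_id, raising on duplicates instead of overwriting."""
    results: dict[str, dict] = {}
    for line_number, line in enumerate(lines, start=1):
        record = json.loads(line)
        custom_id = record["custom_id"]
        if custom_id in results:
            msg = f"duplicate custom_id {custom_id!r} at line {line_number}"
            raise ValueError(msg)
        results[custom_id] = record
    return results
```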

In `@python/prompt_bench/tools/count_tokens.py`:
- Line 1: The CI failure INP001 indicates this module (count_tokens.py) lives in
an implicit namespace package; fix it by making the tools package explicit: add
an __init__.py in the tools package (i.e., alongside count_tokens.py) so
python/prompt_bench/tools becomes an explicit package; ensure the new
__init__.py is minimal (module docstring or empty) and commit so Ruff stops
flagging INP001 for the count_tokens.py module.
- Around line 35-41: Wrap the unguarded JSON parsing around
decoder.raw_decode(text) in a try/except that catches json.JSONDecodeError (and
ValueError for safety), increments totals.errors, logs or notes the offending
path if desired, and continues the loop so a single malformed file doesn't abort
the run; ensure totals.files is only incremented for successfully parsed files
(move totals.files += 1 after successful decode) and keep the existing check of
record.get("usage") to increment totals.errors and continue when usage is
missing.

In `@python/prompt_bench/train.sh`:
- Around line 19-20: The script sets DATASET and OUTPUT_DIR using $(pwd), which
makes paths depend on the caller's CWD; update train.sh to compute the
script/repo directory and build DATASET and OUTPUT_DIR relative to that location
instead of $(pwd). Locate the DATASET and OUTPUT_DIR assignments in train.sh and
replace them with values constructed from the script's directory (e.g., using
dirname/BASH_SOURCE resolution) so the paths correctly point to
output/finetune_dataset.jsonl and output/qwen-bill-summarizer next to the script
regardless of where the script is invoked.
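The standard BASH_SOURCE idiom for this sketches as:

```shell
#!/usr/bin/env bash
# Sketch: anchor data paths to the script's own directory, not the caller's CWD
SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]:-$0}")" && pwd)"
DATASET="${SCRIPT_DIR}/output/finetune_dataset.jsonl"
OUTPUT_DIR="${SCRIPT_DIR}/output/qwen-bill-summarizer"
```

The paths then resolve next to train.sh no matter where the script is invoked from.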

In `@python/prompt_bench/vllm_client.py`:
- Around line 29-40: wait_ready() can exceed max_wait because each _client.get()
uses the client's global timeout; change the probe to compute remaining =
deadline - time.monotonic() before each request, bail if remaining <= 0, and
pass a per-request timeout equal to min(remaining, a safe cap) to
_client.get(...) so a single blocked probe cannot exceed max_wait; ensure you
pass the per-request timeout value supported by httpx.get and continue to catch
httpx.TransportError (and timeout exceptions) as before, using
READY_POLL_INTERVAL between attempts.
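The per-request budget can be isolated in a tiny helper (sketch; the 5-second cap and the helper name are our choices, not the PR's):

```python
import time


def next_probe_timeout(deadline: float, cap: float = 5.0) -> float:
    """Budget one readiness probe so the polling loop can never exceed max_wait."""
    remaining = deadline - time.monotonic()
    if remaining <= 0:
        raise TimeoutError("vLLM server not ready before max_wait")
    return min(remaining, cap)
```

wait_ready() would compute `deadline = time.monotonic() + max_wait` once, then pass `timeout=next_probe_timeout(deadline)` to each `_client.get(...)` call.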

In `@systems/bob/scripts/zfs.sh`:
- Around line 6-11: The ZFS dataset path in the provisioning script is
misaligned with the application defaults: change the zpool/zfs dataset creation
so the dataset is mounted at /zfs/models (and create the subpath /zfs/models/hf)
instead of /zfs/storage/models; specifically update the zpool/zfs create
commands in this script (the two sudo zpool create invocations and the sudo zfs
create storage/models line) to create/mount the pool and dataset under
/zfs/models and ensure a dataset or directory for /zfs/models/hf exists so it
matches python/prompt_bench.models:model_dir and
finetune_container:DEFAULT_HF_CACHE and the NixOS hardware.nix mount point.
- Around line 6-7: The two zpool create invocations are invalid because they
omit vdev device paths (the lines containing "sudo zpool create -f -o ...
storage mirror" and "sudo zpool create -o ... storage"); fix by providing
explicit device paths for the vdevs or by removing the duplicate and choosing
the correct topology: for a mirrored pool use "zpool create ... storage mirror
<device1> <device2> [<deviceN>...]" with at least two device nodes (prefer
stable /dev/disk/by-id names), or if you intend a single-disk pool create
without the "mirror" keyword and supply one device; update the script to use the
chosen valid command and remove the invalid alternate create line.

In `@systems/rhapsody-in-green/sunshine.nix`:
- Around line 15-23: Remove the global hardware.firmwareCompression = "none"
setting and instead disable compression only for the EDID derivation: locate the
runCommandLocal "virtual-display-edid" derivation that produces
virtual-display.bin (referenced under hardware.firmware) and add the
per-derivation attribute compressFirmware = false to that derivation; leave
hardware.firmwareCompression unset so other firmware entries remain unaffected.

In `@users/richie/home/gui/vscode/settings.json`:
- Around line 102-103: The JSON parse error is caused by a trailing comma after
the last property "docker.extension.enableComposeLanguageServer": false, —
remove the comma so the final property reads without a trailing comma and the
closing brace remains; ensure the property name
"docker.extension.enableComposeLanguageServer" is left intact and validate the
JSON after editing.

---

Nitpick comments:
In `@python/prompt_bench/batch_bill_summarizer.py`:
- Around line 170-173: The code currently falls back from
getenv("CLOSEDAI_TOKEN") to getenv("OPENAI_API_KEY"); remove the dual-key
compatibility and require a single standardized env var instead (use
CLOSEDAI_TOKEN), so update the api_key assignment to only read
getenv("CLOSEDAI_TOKEN") and change the error in the BadParameter raised by
batch_bill_summarizer.py to reference that one variable name (api_key, getenv,
CLOSEDAI_TOKEN, and the BadParameter raise are the symbols to edit).

In `@python/prompt_bench/compresion_test.py`:
- Around line 176-179: The code currently uses a fallback chain
getenv("CLOSEDAI_TOKEN") or getenv("OPENAI_API_KEY"), which introduces implicit
precedence; pick a single canonical env var (e.g., CLOSEDAI_TOKEN) and replace
the fallback with a single getenv call for that name, then update the error
message in the typer.BadParameter to reference only the chosen variable; if you
truly need both, explicitly document and implement clear precedence for
CLOSEDAI_TOKEN vs OPENAI_API_KEY rather than relying on the current implicit
chain (refer to the api_key variable and the typer.BadParameter raise to locate
where to change).
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 962af8a8-beb4-4c1f-ba21-745c5de44dbb

📥 Commits

Reviewing files that changed from the base of the PR and between af365fc and 7215260.

📒 Files selected for processing (25)
  • .gitignore
  • overlays/default.nix
  • pyproject.toml
  • python/prompt_bench/Dockerfile.finetune
  • python/prompt_bench/__init__.py
  • python/prompt_bench/batch_bill_summarizer.py
  • python/prompt_bench/bill_token_compression.py
  • python/prompt_bench/compresion_test.py
  • python/prompt_bench/container.py
  • python/prompt_bench/downloader.py
  • python/prompt_bench/finetune.py
  • python/prompt_bench/finetune_container.py
  • python/prompt_bench/main.py
  • python/prompt_bench/models.py
  • python/prompt_bench/summarization_prompts.py
  • python/prompt_bench/tools/build_finetune_dataset.py
  • python/prompt_bench/tools/count_tokens.py
  • python/prompt_bench/train.sh
  • python/prompt_bench/vllm_client.py
  • systems/bob/default.nix
  • systems/bob/scripts/zfs.sh
  • systems/bob/snapshot_config.toml
  • systems/rhapsody-in-green/sunshine.nix
  • users/richie/home/gui/vscode/settings.json
  • users/richie/systems/bob.nix
💤 Files with no reviewable changes (1)
  • users/richie/systems/bob.nix

prompt-bench-download = "python.prompt_bench.downloader:cli"
finetune = "python.prompt_bench.finetune:cli"
finetune-container = "python.prompt_bench.finetune_container:cli"
build-finetune-dataset = "python.prompt_bench.build_finetune_dataset:cli"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the actual module path that defines `cli()` for build_finetune_dataset.

set -euo pipefail

echo "== entrypoint in pyproject.toml =="
rg -n 'build-finetune-dataset' pyproject.toml -C2

echo
echo "== candidate files =="
fd -a build_finetune_dataset.py

echo
echo "== cli() definitions =="
rg -n --glob '**/build_finetune_dataset.py' '^\s*def\s+cli\s*\(' -C2

Repository: RichieCahill/dotfiles

Length of output: 790


Fix broken console entrypoint for build-finetune-dataset.

Line 34 declares build-finetune-dataset = "python.prompt_bench.build_finetune_dataset:cli", but the cli() function exists only at python/prompt_bench/tools/build_finetune_dataset.py (module path python.prompt_bench.tools.build_finetune_dataset). No shim module exists at python/prompt_bench/build_finetune_dataset.py, so this entrypoint will fail at runtime. Either create a shim that re-exports the cli, or update the entrypoint to reference python.prompt_bench.tools.build_finetune_dataset:cli.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 34, The console entrypoint "build-finetune-dataset"
points to python.prompt_bench.build_finetune_dataset:cli but the actual cli()
lives in python.prompt_bench.tools.build_finetune_dataset; fix by either
creating a shim module python/prompt_bench/build_finetune_dataset.py that
imports and re-exports cli from python.prompt_bench.tools.build_finetune_dataset
(e.g., from .tools.build_finetune_dataset import cli) or by editing
pyproject.toml to change the entrypoint to
python.prompt_bench.tools.build_finetune_dataset:cli so the
build-finetune-dataset entry references the actual module containing cli().

Comment on lines +124 to +125
        safe_id = safe_filename(bill_id)
        request_lines.append(build_request(safe_id, model, compressed_text))

⚠️ Potential issue | 🟠 Major

Sanitizing bill_id into custom_id can create collisions.

Two different bill ids can map to the same safe_filename() output, and the downstream dataset builder joins by custom_id using dict keys. Once that happens, one request/completion pair overwrites another with no error.

Collision-safe alternative
+import hashlib
+
 ...
-        safe_id = safe_filename(bill_id)
+        digest = hashlib.sha1(bill_id.encode("utf-8")).hexdigest()[:8]
+        safe_id = f"{safe_filename(bill_id)}-{digest}"
         request_lines.append(build_request(safe_id, model, compressed_text))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/prompt_bench/batch_bill_summarizer.py` around lines 124 - 125, The
current use of safe_id = safe_filename(bill_id) before
request_lines.append(build_request(...)) can cause collisions when different
bill_id values map to the same safe_filename output; change to produce a
collision-safe custom_id (e.g., combine safe_filename(bill_id) with a stable
short hash or unique suffix derived from bill_id) and use that custom_id in
build_request and downstream dataset keys; also keep the original bill_id in the
request metadata so you can trace back to the source. Ensure you update
references to safe_id to the new custom_id variable everywhere
request_lines/dataset building expects custom_id.

Comment on lines +35 to +40
SYSTEM_PROMPT = """You are a legislative analyst extracting policy substance from Congressional bill text.

Your job is to compress a bill into a dense, neutral structured summary that captures every distinct policy action — including secondary effects that might be buried in subsections.

EXTRACTION RULES:
- IGNORE: whereas clauses, congressional findings that are purely political statements, recitals, preambles, citations of existing law by number alone, and procedural boilerplate.

⚠️ Potential issue | 🟡 Minor

Wrap the long SYSTEM_PROMPT lines; treefmt is already failing here.

Lines 37 and 40 exceed Ruff's line-length limit, so this file currently fails CI. As per coding guidelines, "Make python code ruff compliant".

🧰 Tools
🪛 GitHub Actions: treefmt

[error] 37-37: ruff formatting check failed (ruff). E501 Line too long (181 > 120).


[error] 40-40: ruff formatting check failed (ruff). E501 Line too long (179 > 120).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/prompt_bench/compresion_test.py` around lines 35 - 40, The
SYSTEM_PROMPT multi-line string contains lines (notably the long lines around
the EXTRACTION RULES and the IGNORE rule) that exceed the project's line-length
limit; update the SYSTEM_PROMPT definition to wrap those long lines so each
physical line is within the Ruff limit (e.g., break the long sentences across
multiple lines inside the triple-quoted string or split into concatenated
shorter strings) while preserving the exact prompt text and newlines; locate the
SYSTEM_PROMPT variable and adjust the string literal so lines that previously
exceeded the limit (around the EXTRACTION RULES and the IGNORE bullet) are
wrapped to comply with the linter.

Comment on lines +29 to +74
    command = [
        "docker",
        "run",
        "-d",
        "--name",
        CONTAINER_NAME,
        "--device=nvidia.com/gpu=all",
        "--ipc=host",
        "-v",
        f"{model_dir}:/models",
        "-p",
        f"{port}:8000",
        VLLM_IMAGE,
        "--model",
        f"/models/{model}",
        "--served-model-name",
        model,
        "--gpu-memory-utilization",
        str(gpu_memory_utilization),
        "--max-model-len",
        "4096",
    ]
    logger.info("Starting vLLM container with model: %s", model)
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        msg = f"Failed to start vLLM container: {result.stderr.strip()}"
        raise RuntimeError(msg)
    logger.info("vLLM container started: %s", result.stdout.strip()[:12])


def stop_vllm() -> None:
    """Stop and remove the vLLM benchmark container."""
    logger.info("Stopping vLLM container")
    subprocess.run(["docker", "stop", CONTAINER_NAME], capture_output=True, check=False)
    subprocess.run(["docker", "rm", CONTAINER_NAME], capture_output=True, check=False)
    logger.info("vLLM container stopped and removed")


def check_gpu_free() -> None:
    """Warn if GPU-heavy processes (e.g. Ollama) are running."""
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

python - <<'PY'
from pathlib import Path

path = Path("python/prompt_bench/container.py")
for lineno, line in enumerate(path.read_text().splitlines(), 1):
    if "subprocess.run(" in line or '"docker"' in line or '"nvidia-smi"' in line:
        print(f"{lineno}: {line}")
PY

Repository: RichieCahill/dotfiles

Length of output: 483


🏁 Script executed:

# First, let's see the full file to understand imports and context
head -100 python/prompt_bench/container.py

Repository: RichieCahill/dotfiles

Length of output: 2577


🏁 Script executed:

# Check if ruff is available and run it on the file
which ruff && ruff check python/prompt_bench/container.py --select=S603,S607 || echo "Ruff not available; checking manually with ripgrep"
rg -n "subprocess\.run|S60[37]" python/prompt_bench/container.py -A 2 -B 2

Repository: RichieCahill/dotfiles

Length of output: 1060


🏁 Script executed:

# Search for shutil imports to see if it's already used
rg "import shutil|from shutil" python/prompt_bench/container.py

Repository: RichieCahill/dotfiles

Length of output: 47


Resolve Ruff S607 violations on subprocess calls by using absolute executable paths.

Lines 52, 62–63, and 69 invoke docker and nvidia-smi as bare executables, triggering S607 (partial path in subprocess call). Resolve each executable once using shutil.which() and pass the absolute path to subprocess.run(). As per coding guidelines, "Make python code ruff compliant".

🧰 Tools
🪛 Ruff (0.15.9)

[error] 52-52: subprocess call: check for execution of untrusted input

(S603)


[error] 62-62: subprocess call: check for execution of untrusted input

(S603)


[error] 62-62: Starting a process with a partial executable path

(S607)


[error] 63-63: subprocess call: check for execution of untrusted input

(S603)


[error] 63-63: Starting a process with a partial executable path

(S607)


[error] 70-70: Starting a process with a partial executable path

(S607)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/prompt_bench/container.py` around lines 29 - 74, The subprocess calls
use bare executables and trigger Ruff S607; resolve the absolute paths once with
shutil.which and use those absolute paths in the command lists for start (the
docker run command built in the function that starts vLLM), stop_vllm (the
subprocess.run calls that pass ["docker", "stop", CONTAINER_NAME] and ["docker",
"rm", CONTAINER_NAME]) and check_gpu_free (the ["nvidia-smi", ...] call). Add
code to call shutil.which("docker") and shutil.which("nvidia-smi") (store e.g.
DOCKER_PATH and NSMI_PATH), raise a clear RuntimeError or log+exit if which
returns None, and replace the first element of each subprocess command list with
the corresponding absolute path so all subprocess.run invocations reference the
resolved executable.
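A sketch of the resolve-once pattern the prompt suggests (the constant names and the commented usage are illustrative, not the project's actual code):

```python
import shutil


def resolve_executable(name: str) -> str:
    """Return the absolute path of an executable, or fail loudly."""
    path = shutil.which(name)
    if path is None:
        msg = f"required executable not found on PATH: {name}"
        raise RuntimeError(msg)
    return path


# The hypothetical container module would resolve these once at import time:
#   DOCKER_PATH = resolve_executable("docker")
#   subprocess.run([DOCKER_PATH, "stop", CONTAINER_NAME], ...)
# Demonstrate both paths with executables that do / do not exist:
shell_path = resolve_executable("sh")
try:
    resolve_executable("definitely-not-a-real-binary-xyz")
    missing_detected = False
except RuntimeError:
    missing_detected = True
```

Resolving once at module import also surfaces a missing `docker` or `nvidia-smi` immediately, instead of mid-benchmark.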

Comment on lines +67 to +81
def check_gpu_free() -> None:
"""Warn if GPU-heavy processes (e.g. Ollama) are running."""
result = subprocess.run(
["nvidia-smi", "--query-compute-apps=pid,process_name", "--format=csv,noheader"],
capture_output=True,
text=True,
check=False,
)
if result.returncode != 0:
logger.warning("Could not query GPU processes: %s", result.stderr.strip())
return
processes = result.stdout.strip()
if processes:
logger.warning("GPU processes detected:\n%s", processes)
logger.warning("Consider stopping Ollama (sudo systemctl stop ollama) before benchmarking")

⚠️ Potential issue | 🟠 Major

check_gpu_free() never gives callers a way to stop.

Right now a busy GPU only emits warnings, but both benchmark entry points keep running afterward. That makes timings noisy and can turn the fine-tune path into an avoidable OOM/failure loop. Return a boolean or raise here unless the caller explicitly opts into continuing.

🧰 Tools
🪛 Ruff (0.15.9)

[error] 70-70: Starting a process with a partial executable path

(S607)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/prompt_bench/container.py` around lines 67 - 81, check_gpu_free
currently only logs warnings, letting benchmark entry points continue and risk
noisy timings or OOMs; change check_gpu_free to return a boolean (e.g., True
when GPU is free, False when busy) or raise a specific exception (e.g.,
GPUBusyError) instead of just logging, and update the benchmark entry points
that call check_gpu_free to check the return value or catch the exception and
abort early (or proceed only if an explicit opt-in/force flag is provided).
Ensure the function signature (check_gpu_free) and its callers are updated
consistently and that log messages remain informative while control flow is
handled by the return value/exception.
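A minimal sketch of the raise-unless-forced behavior this comment asks for, driven by the same nvidia-smi CSV output the current function already captures (the exception name and `force` flag are hypothetical):

```python
class GPUBusyError(RuntimeError):
    """Raised when compute processes already occupy the GPU."""


def ensure_gpu_free(process_csv: str, *, force: bool = False) -> None:
    """Fail fast unless nvidia-smi reported no compute processes.

    process_csv is the stdout of
    `nvidia-smi --query-compute-apps=pid,process_name --format=csv,noheader`;
    force lets a caller explicitly opt into benchmarking on a busy GPU.
    """
    processes = process_csv.strip()
    if processes and not force:
        msg = f"GPU busy, refusing to run:\n{processes}"
        raise GPUBusyError(msg)


# An idle GPU and an explicit override both pass; a busy GPU aborts.
ensure_gpu_free("")
ensure_gpu_free("1234, ollama", force=True)
try:
    ensure_gpu_free("1234, ollama")
    aborted = False
except GPUBusyError:
    aborted = True
```

Entry points can then catch `GPUBusyError` and exit with a clear message, or pass `force=True` behind a CLI flag.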

Comment on lines +29 to +40
def wait_ready(self, max_wait: int) -> None:
"""Poll /v1/models until the server is ready or timeout."""
deadline = time.monotonic() + max_wait
while time.monotonic() < deadline:
try:
response = self._client.get("/v1/models")
if response.is_success:
logger.info("vLLM server is ready")
return
except httpx.TransportError:
pass
time.sleep(READY_POLL_INTERVAL)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In httpx, does a client-level timeout apply to every request unless a per-request timeout override is provided?

💡 Result:

Yes, in httpx, a client-level timeout applies to every request made with that client by default, unless a per-request timeout override is provided.



wait_ready() can exceed max_wait because the client's global timeout applies to each request.

Line 34 uses the client's default timeout (set at Line 27). When that timeout is large, a single blocked probe can exceed max_wait entirely, making startup timeout nondeterministic.

Proposed fix
     def wait_ready(self, max_wait: int) -> None:
         """Poll /v1/models until the server is ready or timeout."""
         deadline = time.monotonic() + max_wait
         while time.monotonic() < deadline:
             try:
-                response = self._client.get("/v1/models")
+                response = self._client.get("/v1/models", timeout=5.0)
                 if response.is_success:
                     logger.info("vLLM server is ready")
                     return
             except httpx.TransportError:
                 pass
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def wait_ready(self, max_wait: int) -> None:
"""Poll /v1/models until the server is ready or timeout."""
deadline = time.monotonic() + max_wait
while time.monotonic() < deadline:
try:
response = self._client.get("/v1/models")
if response.is_success:
logger.info("vLLM server is ready")
return
except httpx.TransportError:
pass
time.sleep(READY_POLL_INTERVAL)
def wait_ready(self, max_wait: int) -> None:
"""Poll /v1/models until the server is ready or timeout."""
deadline = time.monotonic() + max_wait
while time.monotonic() < deadline:
try:
response = self._client.get("/v1/models", timeout=5.0)
if response.is_success:
logger.info("vLLM server is ready")
return
except httpx.TransportError:
pass
time.sleep(READY_POLL_INTERVAL)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/prompt_bench/vllm_client.py` around lines 29 - 40, wait_ready() can
exceed max_wait because each _client.get() uses the client's global timeout;
change the probe to compute remaining = deadline - time.monotonic() before each
request, bail if remaining <= 0, and pass a per-request timeout equal to
min(remaining, a safe cap) to _client.get(...) so a single blocked probe cannot
exceed max_wait; ensure you pass the per-request timeout value supported by
httpx.get and continue to catch httpx.TransportError (and timeout exceptions) as
before, using READY_POLL_INTERVAL between attempts.

Comment on lines +6 to +11
sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror
sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage


# storage datasets
sudo zfs create storage/models -o recordsize=1M

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate the referenced Python files
find . -path "*/python/prompt_bench/models.py" -o -path "*/python/prompt_bench/finetune_container.py" | head -20

Repository: RichieCahill/dotfiles

Length of output: 140


🏁 Script executed:

# Also search more broadly for these files
fd -t f "models.py|finetune_container.py" | grep -E "prompt_bench|benchmark"

Repository: RichieCahill/dotfiles

Length of output: 136


🏁 Script executed:

# Check the zfs.sh file content to understand the full context
find . -path "*/systems/bob/scripts/zfs.sh" -exec cat -n {} \;

Repository: RichieCahill/dotfiles

Length of output: 529


🏁 Script executed:

# Check models.py around line 12
sed -n '1,20p' python/prompt_bench/models.py

Repository: RichieCahill/dotfiles

Length of output: 517


🏁 Script executed:

# Check finetune_container.py around lines 16-22
sed -n '1,30p' python/prompt_bench/finetune_container.py

Repository: RichieCahill/dotfiles

Length of output: 869


🏁 Script executed:

# Search for references to /zfs/models/hf throughout the codebase
rg -i "/zfs/models" --type-list | head -5
rg -i "/zfs/models" 2>/dev/null | head -20

Repository: RichieCahill/dotfiles

Length of output: 793


🏁 Script executed:

# Search for references to /zfs/storage in the codebase
rg -i "/zfs/storage" 2>/dev/null | head -20

Repository: RichieCahill/dotfiles

Length of output: 906


🏁 Script executed:

# Check if there are any other zfs dataset creation commands or references to storage/models
rg "storage/models" 2>/dev/null

Repository: RichieCahill/dotfiles

Length of output: 139


Model path mismatch between provisioning script and application defaults.

This script provisions the ZFS dataset at /zfs/storage/models, but all application code expects models at /zfs/models/hf:

  • python/prompt_bench/models.py line 12: model_dir: str = "/zfs/models/hf"
  • python/prompt_bench/finetune_container.py line 20: DEFAULT_HF_CACHE = Path("/zfs/models/hf")
  • NixOS config (systems/bob/hardware.nix) also configures /zfs/models as the mount point

Without alignment, model discovery and caching will fail or use divergent paths.

💡 Proposed fix
-# storage datasets
-sudo zfs create storage/models -o recordsize=1M
+# storage datasets (align with Python defaults)
+sudo zfs create -o mountpoint=/zfs/models -o recordsize=1M storage/models
+sudo zfs create storage/models/hf
📝 Committable suggestion


Suggested change
sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror
sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage
# storage datasets
sudo zfs create storage/models -o recordsize=1M
sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror
sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage
# storage datasets (align with Python defaults)
sudo zfs create -o mountpoint=/zfs/models -o recordsize=1M storage/models
sudo zfs create storage/models/hf
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@systems/bob/scripts/zfs.sh` around lines 6 - 11, The ZFS dataset path in the
provisioning script is misaligned with the application defaults: change the
zpool/zfs dataset creation so the dataset is mounted at /zfs/models (and create
the subpath /zfs/models/hf) instead of /zfs/storage/models; specifically update
the zpool/zfs create commands in this script (the two sudo zpool create
invocations and the sudo zfs create storage/models line) to create/mount the
pool and dataset under /zfs/models and ensure a dataset or directory for
/zfs/models/hf exists so it matches python/prompt_bench.models:model_dir and
finetune_container:DEFAULT_HF_CACHE and the NixOS hardware.nix mount point.

Comment on lines +6 to +7
sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror
sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n systems/bob/scripts/zfs.sh

Repository: RichieCahill/dotfiles

Length of output: 529


🌐 Web query:

zpool create mirror syntax requirements devices arguments

💡 Result:

The syntax for creating a ZFS mirror pool with zpool create is:

zpool create [options] poolname mirror device1 device2 [mirror device3 device4 ...]

Key requirements and arguments:
- poolname: Must begin with a letter; alphanumeric characters, underscore, dash, colon, space, and period are allowed. Reserved names: mirror, raidz, draid, spare, log (and their prefixes).
- mirror keyword: Specifies a mirrored vdev (virtual device). Repeat it for multiple mirrors (striped mirrors).
- devices: At least 2 block devices per mirror (whole disks like /dev/sda, partitions like /dev/sda1, or files). Devices must be unused and accessible; equal sizes are recommended. Use -f to force if a device is in use.
- Minimum: 2 devices for a single 2-way mirror.

Common options:
- -f: Force use of in-use devices.
- -m mountpoint: Set the mount point (default /poolname).
- -o property=value: Set pool properties (e.g., ashift=12 for 4K sectors).
- -n: Dry run (display the would-be configuration without creating the pool).
- -R root: Set an alternate root (altroot, with cachefile=none).

Example single mirror: zpool create tank mirror c1d0 c2d0
Example two striped mirrors: zpool create tank mirror c1d0 c2d0 mirror c3d0 c4d0

Devices are validated for availability and consistent redundancy; data is striped across multiple mirrors when more than one is present.



🏁 Script executed:

ls -la systems/bob/
cat systems/bob/README.md 2>/dev/null || echo "No README found"

Repository: RichieCahill/dotfiles

Length of output: 563


🏁 Script executed:

cat -n systems/bob/hardware.nix
echo "---"
cat -n systems/bob/default.nix

Repository: RichieCahill/dotfiles

Length of output: 3591


zpool create commands are syntactically invalid and will fail.

Line 6 ends with mirror but provides no device paths. Line 7 attempts to create the same pool without any vdev specification. Both commands violate ZFS syntax: zpool create poolname mirror device1 device2 ... requires at least two devices per mirror. The script will fail during provisioning.

Proposed fix
-sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror
-sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage
+# Replace <device1> and <device2> with actual block devices (e.g., /dev/sda, /dev/sdb)
+sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror <device1> <device2>
📝 Committable suggestion


Suggested change
sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror
sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage
# Replace <device1> and <device2> with actual block devices (e.g., /dev/sda, /dev/sdb)
sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror <device1> <device2>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@systems/bob/scripts/zfs.sh` around lines 6 - 7, The two zpool create
invocations are invalid because they omit vdev device paths (the lines
containing "sudo zpool create -f -o ... storage mirror" and "sudo zpool create
-o ... storage"); fix by providing explicit device paths for the vdevs or by
removing the duplicate and choosing the correct topology: for a mirrored pool
use "zpool create ... storage mirror <device1> <device2> [<deviceN>...]" with at
least two device nodes (prefer stable /dev/disk/by-id names), or if you intend a
single-disk pool create without the "mirror" keyword and supply one device;
update the script to use the chosen valid command and remove the invalid
alternate create line.
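The minimum-device rule from the syntax summary above can be sketched as a small argv builder that fails the same way zpool would (device paths are placeholders, and the option list is trimmed to the essentials):

```python
def zpool_create_argv(pool: str, devices: list[str]) -> list[str]:
    """Assemble a `zpool create ... <pool> mirror <dev...>` argv.

    Refuses a mirror vdev with fewer than two devices, which is exactly
    the failure mode of the command under review.
    """
    min_mirror = 2
    if len(devices) < min_mirror:
        msg = f"mirror vdev needs >= {min_mirror} devices, got {len(devices)}"
        raise ValueError(msg)
    return ["zpool", "create", "-m", "/zfs/storage", pool, "mirror", *devices]


# Stable /dev/disk/by-id names are preferred; these are placeholders.
argv = zpool_create_argv("storage", ["/dev/disk/by-id/ata-a", "/dev/disk/by-id/ata-b"])
try:
    zpool_create_argv("storage", [])
    rejected = False
except ValueError:
    rejected = True
```

Validating the argv before shelling out turns the provisioning failure into an immediate, readable error instead of a zpool usage message mid-script.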

Comment on lines +15 to +23
hardware = {
firmwareCompression = "none";
firmware = [
(pkgs.runCommandLocal "virtual-display-edid" { } ''
mkdir -p $out/lib/firmware/edid
cp ${./edid/virtual-display.bin} $out/lib/firmware/edid/virtual-display.bin
'')
];
};

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For the nixpkgs/NixOS version pinned by this repo, does hardware.firmwareCompression = "none"apply globally to all entries inhardware.firmware? Also, is compressFirmware = false on a firmware derivation still supported/recommended for per-package control?

💡 Result:

Yes, hardware.firmwareCompression = "none" applies globally to all entries in hardware.firmware. It controls the compression method used when building the firmware environment via pkgs.buildEnv, where each package in the list is passed through a compressFirmware function. This function checks the global option first: if set to "none", no compression is applied to any firmware package, regardless of individual settings. Otherwise, it respects a per-package compressFirmware ? false attribute if present. The compressFirmware = false attribute on a firmware derivation is still supported as a fallback for per-package control when global compression is enabled (e.g., "xz" or "zstd"). It is not needed (and ignored) when global is "none". This is the recommended way for selective disabling, as seen in discussions around PR #172820. This behavior holds in recent nixpkgs (e.g., NixOS 24.11/unstable), with kernel-dependent defaults: "none" pre-5.3, "xz" 5.3-5.18, "zstd" 5.19+.



hardware.firmwareCompression = "none" affects all firmware, not just this EDID blob.

This is a global policy change at line 16 that disables compression for every entry in hardware.firmware, not just virtual-display.bin. If other firmware packages exist in this configuration (now or in future), they will also become uncompressed.

For scoped control of only the EDID derivation, remove the global setting and add compressFirmware = false to the derivation itself. The per-package attribute is still supported and is the recommended approach for selective disabling when global compression is enabled.

Suggested scoped alternative
   hardware = {
-    firmwareCompression = "none";
     firmware = [
-      (pkgs.runCommandLocal "virtual-display-edid" { } ''
+      (pkgs.runCommandLocal "virtual-display-edid" {
+        compressFirmware = false;
+      } ''
         mkdir -p $out/lib/firmware/edid
         cp ${./edid/virtual-display.bin} $out/lib/firmware/edid/virtual-display.bin
       '')
     ];
   };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@systems/rhapsody-in-green/sunshine.nix` around lines 15 - 23, Remove the
global hardware.firmwareCompression = "none" setting and instead disable
compression only for the EDID derivation: locate the runCommandLocal
"virtual-display-edid" derivation that produces virtual-display.bin (referenced
under hardware.firmware) and add the per-derivation attribute compressFirmware =
false to that derivation; leave hardware.firmwareCompression unset so other
firmware entries remain unaffected.

Comment on lines +102 to 103
"docker.extension.enableComposeLanguageServer": false,
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Check if file exists and read the lines around 102-103
if [ -f "users/richie/home/gui/vscode/settings.json" ]; then
  echo "File found. Reading lines 95-110 for context:"
  sed -n '95,110p' "users/richie/home/gui/vscode/settings.json" | cat -n
else
  echo "File not found at users/richie/home/gui/vscode/settings.json"
  # Try to find it
  find . -name "settings.json" -type f | head -20
fi

Repository: RichieCahill/dotfiles

Length of output: 504


Remove the trailing comma on the last property to resolve parse failure.

Line 102 has a trailing comma after false, which is invalid JSON syntax. The last property in a JSON object must not have a trailing comma before the closing brace on line 103.

💡 Proposed fix
-  "docker.extension.enableComposeLanguageServer": false,
+  "docker.extension.enableComposeLanguageServer": false
 }
📝 Committable suggestion


Suggested change
"docker.extension.enableComposeLanguageServer": false,
}
"docker.extension.enableComposeLanguageServer": false
}
🧰 Tools
🪛 Biome (2.4.10)

[error] 102-102: End of file expected

(parse)


[error] 102-102: End of file expected

(parse)


[error] 102-102: End of file expected

(parse)


[error] 102-103: End of file expected

(parse)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@users/richie/home/gui/vscode/settings.json` around lines 102 - 103, The JSON
parse error is caused by a trailing comma after the last property
"docker.extension.enableComposeLanguageServer": false, — remove the comma so the
final property reads without a trailing comma and the closing brace remains;
ensure the property name "docker.extension.enableComposeLanguageServer" is left
intact and validate the JSON after editing.
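The parse failure is easy to reproduce with a strict parser; a quick sketch using the same property name:

```python
import json

# Strict JSON forbids a trailing comma after the last property. Editors
# often tolerate it because VS Code treats settings files as JSONC, but a
# strict parser (Biome in CI, json.loads here) rejects it.
bad = '{"docker.extension.enableComposeLanguageServer": false,}'
good = '{"docker.extension.enableComposeLanguageServer": false}'

try:
    json.loads(bad)
    bad_parsed = True
except json.JSONDecodeError:
    bad_parsed = False

settings = json.loads(good)
```

Running the repo's settings file through `json.loads` in a pre-commit hook would catch this class of error before CI.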

@RichieCahill RichieCahill force-pushed the feature/creating-prompt_bench branch from ac02d40 to a80de99 Compare April 12, 2026 14:08