Conversation
📝 Walkthrough

Introduces a comprehensive LLM benchmarking and fine-tuning framework for Congressional bill summarization. New components include vLLM-based inference orchestration, OpenAI Batch API integration, Unsloth-based QLoRA fine-tuning, bill text compression utilities, and Docker containerization. System configuration updates provision ZFS storage and hardware setup for benchmarking infrastructure.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as User/CLI
    participant Main as main.py
    participant Config as BenchmarkConfig
    participant Downloader as downloader.py
    participant Container as container.py
    participant Server as vLLM Server
    participant Client as VLLMClient
    participant Executor as ThreadPoolExecutor
    participant Output as Output Files
    User->>Main: Run prompt-bench with config
    Main->>Config: Load & parse TOML
    Main->>Main: Validate prompts directory
    loop For each model repo
        Main->>Downloader: check/download model
        Downloader->>Downloader: is_model_present?
        Main->>Container: start_vllm(model)
        Container->>Server: docker run vLLM
        Server->>Server: Load model into memory
        Main->>Client: Create VLLMClient
        Client->>Server: poll /v1/models
        Server-->>Client: ready
        Main->>Executor: Submit prompt batch
        loop For each prompt (parallel)
            Executor->>Client: complete(prompt, model)
            Client->>Server: POST /v1/completions
            Server->>Server: Generate response
            Server-->>Client: response text
            Client-->>Executor: completion result
            Executor->>Output: Write response + timing
        end
        Main->>Container: stop_vllm()
        Container->>Server: docker stop/rm
    end
    Main->>Output: Write _timing.json summary
    Main-->>User: Report totals & metrics
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 20
🧹 Nitpick comments (2)
python/prompt_bench/batch_bill_summarizer.py (1)
170-173: Avoid the dual API-key fallback unless you explicitly want compatibility mode.

This CLI is new, so carrying both `CLOSEDAI_TOKEN` and `OPENAI_API_KEY` bakes in precedence behavior you'll need to preserve. Standardizing on one name keeps the interface simpler. As per coding guidelines, "Avoid compatibility branches like PG_USER and POSTGRESQL_URL unless requested".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/prompt_bench/batch_bill_summarizer.py` around lines 170 - 173, the code currently falls back from getenv("CLOSEDAI_TOKEN") to getenv("OPENAI_API_KEY"); remove the dual-key compatibility and require a single standardized env var instead (use CLOSEDAI_TOKEN), so update the api_key assignment to only read getenv("CLOSEDAI_TOKEN") and change the error in the BadParameter raised by batch_bill_summarizer.py to reference that one variable name (api_key, getenv, CLOSEDAI_TOKEN, and the BadParameter raise are the symbols to edit).

python/prompt_bench/compresion_test.py (1)
176-179: Pick one API-key env var instead of a fallback chain.

This CLI is new, so carrying both `CLOSEDAI_TOKEN` and `OPENAI_API_KEY` adds precedence behavior you'll need to support indefinitely. If both names are truly required, document the precedence explicitly; otherwise standardize on one. As per coding guidelines, "Avoid compatibility branches like PG_USER and POSTGRESQL_URL unless requested".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/prompt_bench/compresion_test.py` around lines 176 - 179, The code currently uses a fallback chain getenv("CLOSEDAI_TOKEN") or getenv("OPENAI_API_KEY"), which introduces implicit precedence; pick a single canonical env var (e.g., CLOSEDAI_TOKEN) and replace the fallback with a single getenv call for that name, then update the error message in the typer.BadParameter to reference only the chosen variable; if you truly need both, explicitly document and implement clear precedence for CLOSEDAI_TOKEN vs OPENAI_API_KEY rather than relying on the current implicit chain (refer to the api_key variable and the typer.BadParameter raise to locate where to change).
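The single-variable pattern both nitpicks ask for can be sketched as follows; `require_api_key` is an illustrative name, and the real CLIs would raise `typer.BadParameter` rather than `ValueError`:

```python
import os


def require_api_key(env_var: str = "CLOSEDAI_TOKEN") -> str:
    """Read one canonical env var; with no fallback chain there is no precedence to document."""
    api_key = os.getenv(env_var)
    if not api_key:
        # In the typer CLIs this would be `raise typer.BadParameter(...)`.
        raise ValueError(f"Set {env_var} to your API key")
    return api_key
```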
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Line 34: The console entrypoint "build-finetune-dataset" points to
python.prompt_bench.build_finetune_dataset:cli but the actual cli() lives in
python.prompt_bench.tools.build_finetune_dataset; fix by either creating a shim
module python/prompt_bench/build_finetune_dataset.py that imports and re-exports
cli from python.prompt_bench.tools.build_finetune_dataset (e.g., from
.tools.build_finetune_dataset import cli) or by editing pyproject.toml to change
the entrypoint to python.prompt_bench.tools.build_finetune_dataset:cli so the
build-finetune-dataset entry references the actual module containing cli().
In `@python/prompt_bench/batch_bill_summarizer.py`:
- Around line 124-125: The current use of safe_id = safe_filename(bill_id)
before request_lines.append(build_request(...)) can cause collisions when
different bill_id values map to the same safe_filename output; change to produce
a collision-safe custom_id (e.g., combine safe_filename(bill_id) with a stable
short hash or unique suffix derived from bill_id) and use that custom_id in
build_request and downstream dataset keys; also keep the original bill_id in the
request metadata so you can trace back to the source. Ensure you update
references to safe_id to the new custom_id variable everywhere
request_lines/dataset building expects custom_id.
In `@python/prompt_bench/compresion_test.py`:
- Around line 35-40: The SYSTEM_PROMPT multi-line string contains lines (notably
the long lines around the EXTRACTION RULES and the IGNORE rule) that exceed the
project's line-length limit; update the SYSTEM_PROMPT definition to wrap those
long lines so each physical line is within the Ruff limit (e.g., break the long
sentences across multiple lines inside the triple-quoted string or split into
concatenated shorter strings) while preserving the exact prompt text and
newlines; locate the SYSTEM_PROMPT variable and adjust the string literal so
lines that previously exceeded the limit (around the EXTRACTION RULES and the
IGNORE bullet) are wrapped to comply with the linter.
In `@python/prompt_bench/container.py`:
- Around line 67-81: check_gpu_free currently only logs warnings, letting
benchmark entry points continue and risk noisy timings or OOMs; change
check_gpu_free to return a boolean (e.g., True when GPU is free, False when
busy) or raise a specific exception (e.g., GPUBusyError) instead of just
logging, and update the benchmark entry points that call check_gpu_free to check
the return value or catch the exception and abort early (or proceed only if an
explicit opt-in/force flag is provided). Ensure the function signature
(check_gpu_free) and its callers are updated consistently and that log messages
remain informative while control flow is handled by the return value/exception.
- Around line 29-74: The subprocess calls use bare executables and trigger Ruff
S607; resolve the absolute paths once with shutil.which and use those absolute
paths in the command lists for start (the docker run command built in the
function that starts vLLM), stop_vllm (the subprocess.run calls that pass
["docker", "stop", CONTAINER_NAME] and ["docker", "rm", CONTAINER_NAME]) and
check_gpu_free (the ["nvidia-smi", ...] call). Add code to call
shutil.which("docker") and shutil.which("nvidia-smi") (store e.g. DOCKER_PATH
and NSMI_PATH), raise a clear RuntimeError or log+exit if which returns None,
and replace the first element of each subprocess command list with the
corresponding absolute path so all subprocess.run invocations reference the
resolved executable.
In `@python/prompt_bench/Dockerfile.finetune`:
- Around line 15-25: The Dockerfile runs as root (no USER directive) which is
insecure; modify the Dockerfile.finetune to create a non-root user and switch to
it before the ENTRYPOINT: add steps after setting WORKDIR to create a user/group
(e.g., uid/gid 1000 or a stable UID), chown /workspace and any copied files
(referenced by WORKDIR and the copied python/prompt_bench/* and
python/__init__.py artifacts) to that user, and add a USER instruction so the
process launched by ENTRYPOINT ["python", "-m", "python.prompt_bench.finetune"]
runs as the non-root user. Ensure ownership changes cover all copied files and
happen before switching users.
In `@python/prompt_bench/downloader.py`:
- Around line 23-27: The is_model_present function can raise if the resolved
path exists but is not a directory; update is_model_present (which calls
local_model_path) to first check path.is_dir() before iterating: return
path.exists() and path.is_dir() and any(path.iterdir()); this ensures you only
call iterdir() on directories and avoids aborting the download flow when a
non-directory file exists at the model path.
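A standalone sketch of the guarded check, taking the resolved path directly rather than calling `local_model_path`:

```python
from pathlib import Path


def is_model_present(path: Path) -> bool:
    """True only for a non-empty directory; a stray file at the path is not a model."""
    # is_dir() is checked first so iterdir() is never called on a regular file,
    # which would raise NotADirectoryError and abort the download flow.
    return path.is_dir() and any(path.iterdir())
```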
In `@python/prompt_bench/finetune_container.py`:
- Around line 25-29: Replace bare "docker" subprocess invocations with a
resolved absolute path: call shutil.which("docker") once (e.g., assign
DOCKER_BIN = shutil.which("docker") at module import), raise a clear error if it
returns None, and then use DOCKER_BIN in all subprocess.run/call invocations
(including the build call that uses DOCKERFILE_PATH and FINETUNE_IMAGE and the
later docker commands around lines ~119–140). This makes subprocess calls
Ruff-compliant (S603/S607) and reuses the resolved path across functions instead
of hardcoding "docker".
In `@python/prompt_bench/finetune.py`:
- Around line 94-95: Add upfront validation in the finetune entrypoint (check
the validation_split parameter passed to the function that declares
validation_split and output_dir) to ensure validation_split is a float within
(0.0, 1.0) (reject 0 or 1) and validate the dataset length before calling
train_test_split (the code that later calls
sklearn.model_selection.train_test_split or a helper for splitting) to ensure
there are enough samples for the requested split (e.g., require
min_total_samples and ensure int(len(dataset) * validation_split) >= 1 and
int(len(dataset) * (1 - validation_split)) >= 1); raise a clear ValueError or
typer.BadParameter with a descriptive message if the split is out of range or
the dataset is too small.
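The upfront validation might look like this; `validate_split` is an illustrative helper name, and the real entrypoint would raise `typer.BadParameter` instead of `ValueError`:

```python
def validate_split(n_samples: int, validation_split: float) -> tuple[int, int]:
    """Reject out-of-range splits and datasets too small to yield both partitions."""
    if not 0.0 < validation_split < 1.0:
        msg = f"validation_split must be in (0.0, 1.0), got {validation_split}"
        raise ValueError(msg)
    n_val = int(n_samples * validation_split)
    n_train = n_samples - n_val
    # Both partitions must end up non-empty or train_test_split will fail later.
    if n_val < 1 or n_train < 1:
        msg = f"dataset of {n_samples} samples cannot support a {validation_split} split"
        raise ValueError(msg)
    return n_train, n_val
```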
In `@python/prompt_bench/summarization_prompts.py`:
- Around line 1-34: Add a module docstring at the top describing the purpose of
this module (e.g., "Prompt templates for legislative bill summarization") to
satisfy Ruff D100, and fix E501 by breaking up overly long string literals
SUMMARIZATION_SYSTEM_PROMPT and SUMMARIZATION_USER_TEMPLATE so no source line
exceeds the configured max line length (use implicit multi-line string
concatenation with parentheses or textwrap.dedent + join while preserving the
exact prompt text and formatting). Locate the constants
SUMMARIZATION_SYSTEM_PROMPT and SUMMARIZATION_USER_TEMPLATE in this module and
wrap their content into shorter segments or use a dedented triple-quoted
docstring assembled from multiple lines to keep each source line within the
linter limit without changing the prompts' semantics. Ensure no other top-level
comments or code remain missing a module docstring.
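The implicit-concatenation approach can be sketched with an abbreviated stand-in prompt (the text here is shortened for illustration, not the real prompt):

```python
# Adjacent string literals inside parentheses are concatenated at compile time,
# so the runtime value is identical to the original single long line while
# every source line stays under the linter's length limit.
SUMMARIZATION_SYSTEM_PROMPT = (
    "You are a legislative analyst extracting policy substance "
    "from Congressional bill text.\n"
    "- IGNORE: whereas clauses, recitals, preambles, and "
    "procedural boilerplate.\n"
)
```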
In `@python/prompt_bench/tools/build_finetune_dataset.py`:
- Line 1: Add a new python/prompt_bench/tools/__init__.py to turn the implicit
namespace into a regular package (this will clear INP001 and satisfy
Ruff/treefmt); create the file (can be empty or include a minimal __all__
export) so modules like build_finetune_dataset.py import correctly and the tools
package is no longer an implicit namespace.
- Around line 22-61: In both load_requests and load_completions, currently
assigning results[custom_id] will silently overwrite duplicate custom_id
entries; modify each function (load_requests and load_completions) to detect
duplicates before assignment and fail fast—e.g., if custom_id in results: raise
ValueError or log an error and raise with a clear message including the
custom_id and line_number (use line_number in load_completions) to prevent
silent overwrites; ensure the check is placed right before results[custom_id] =
... so duplicates are rejected rather than replaced.
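A simplified stand-in for `load_requests`/`load_completions` showing the fail-fast check (`load_records` is a hypothetical name):

```python
import json


def load_records(lines: list[str]) -> dict[str, dict]:
    """Index JSONL records by custom_id, refusing silent overwrites."""
    results: dict[str, dict] = {}
    for line_number, line in enumerate(lines, start=1):
        record = json.loads(line)
        custom_id = record["custom_id"]
        # Checked right before assignment so a duplicate is rejected, not replaced.
        if custom_id in results:
            msg = f"duplicate custom_id {custom_id!r} at line {line_number}"
            raise ValueError(msg)
        results[custom_id] = record
    return results
```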
In `@python/prompt_bench/tools/count_tokens.py`:
- Line 1: The CI failure INP001 indicates this module (count_tokens.py) lives in
an implicit namespace package; fix it by making the tools package explicit: add
an __init__.py in the tools package (i.e., alongside count_tokens.py) so
python/prompt_bench/tools becomes an explicit package; ensure the new
__init__.py is minimal (module docstring or empty) and commit so Ruff stops
flagging INP001 for the count_tokens.py module.
- Around line 35-41: Wrap the unguarded JSON parsing around
decoder.raw_decode(text) in a try/except that catches json.JSONDecodeError (and
ValueError for safety), increments totals.errors, logs or notes the offending
path if desired, and continues the loop so a single malformed file doesn't abort
the run; ensure totals.files is only incremented for successfully parsed files
(move totals.files += 1 after successful decode) and keep the existing check of
record.get("usage") to increment totals.errors and continue when usage is
missing.
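A sketch of the guarded loop, with file iteration replaced by a list of raw strings for illustration (`tally` is a hypothetical name):

```python
import json


def tally(raw_records: list[str]) -> tuple[int, int]:
    """Count parsed files vs. errors; one malformed record no longer aborts the run."""
    decoder = json.JSONDecoder()
    files = errors = 0
    for text in raw_records:
        try:
            record, _ = decoder.raw_decode(text)
        except (json.JSONDecodeError, ValueError):
            errors += 1
            continue
        if not record.get("usage"):
            errors += 1
            continue
        files += 1  # incremented only after a successful decode with usage data
    return files, errors
```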
In `@python/prompt_bench/train.sh`:
- Around line 19-20: The script sets DATASET and OUTPUT_DIR using $(pwd), which
makes paths depend on the caller's CWD; update train.sh to compute the
script/repo directory and build DATASET and OUTPUT_DIR relative to that location
instead of $(pwd). Locate the DATASET and OUTPUT_DIR assignments in train.sh and
replace them with values constructed from the script's directory (e.g., using
dirname/BASH_SOURCE resolution) so the paths correctly point to
output/finetune_dataset.jsonl and output/qwen-bill-summarizer next to the script
regardless of where the script is invoked.
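The script-relative form could look like this (a sketch; the output subpaths are taken from the review text above):

```shell
# Resolve the directory containing this script instead of relying on $(pwd),
# so the output paths are stable no matter where the script is invoked from.
# (Under bash, prefer "${BASH_SOURCE[0]}" over "$0" if the script may be sourced.)
SCRIPT_DIR="$(cd -- "$(dirname -- "$0")" && pwd)"
DATASET="${SCRIPT_DIR}/output/finetune_dataset.jsonl"
OUTPUT_DIR="${SCRIPT_DIR}/output/qwen-bill-summarizer"
```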
In `@python/prompt_bench/vllm_client.py`:
- Around line 29-40: wait_ready() can exceed max_wait because each _client.get()
uses the client's global timeout; change the probe to compute remaining =
deadline - time.monotonic() before each request, bail if remaining <= 0, and
pass a per-request timeout equal to min(remaining, a safe cap) to
_client.get(...) so a single blocked probe cannot exceed max_wait; ensure you
pass the per-request timeout value supported by httpx.get and continue to catch
httpx.TransportError (and timeout exceptions) as before, using
READY_POLL_INTERVAL between attempts.
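The remaining-budget pattern can be sketched with an injected `probe` callable standing in for `httpx.Client.get` (names and interval values are illustrative):

```python
import time

READY_POLL_INTERVAL = 0.01  # shortened for illustration
PER_REQUEST_CAP = 5.0


def wait_ready(probe, max_wait: float) -> bool:
    """Poll until probe() succeeds; each attempt gets only the time left."""
    deadline = time.monotonic() + max_wait
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # budget exhausted
        try:
            # A per-request timeout of min(remaining, cap) bounds a single
            # blocked probe, so the total wait cannot blow past max_wait.
            if probe(timeout=min(remaining, PER_REQUEST_CAP)):
                return True
        except OSError:
            pass  # transport error: retry until the deadline
        time.sleep(READY_POLL_INTERVAL)
```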
In `@systems/bob/scripts/zfs.sh`:
- Around line 6-11: The ZFS dataset path in the provisioning script is
misaligned with the application defaults: change the zpool/zfs dataset creation
so the dataset is mounted at /zfs/models (and create the subpath /zfs/models/hf)
instead of /zfs/storage/models; specifically update the zpool/zfs create
commands in this script (the two sudo zpool create invocations and the sudo zfs
create storage/models line) to create/mount the pool and dataset under
/zfs/models and ensure a dataset or directory for /zfs/models/hf exists so it
matches python/prompt_bench.models:model_dir and
finetune_container:DEFAULT_HF_CACHE and the NixOS hardware.nix mount point.
- Around line 6-7: The two zpool create invocations are invalid because they
omit vdev device paths (the lines containing "sudo zpool create -f -o ...
storage mirror" and "sudo zpool create -o ... storage"); fix by providing
explicit device paths for the vdevs or by removing the duplicate and choosing
the correct topology: for a mirrored pool use "zpool create ... storage mirror
<device1> <device2> [<deviceN>...]" with at least two device nodes (prefer
stable /dev/disk/by-id names), or if you intend a single-disk pool create
without the "mirror" keyword and supply one device; update the script to use the
chosen valid command and remove the invalid alternate create line.
In `@systems/rhapsody-in-green/sunshine.nix`:
- Around line 15-23: Remove the global hardware.firmwareCompression = "none"
setting and instead disable compression only for the EDID derivation: locate the
runCommandLocal "virtual-display-edid" derivation that produces
virtual-display.bin (referenced under hardware.firmware) and add the
per-derivation attribute compressFirmware = false to that derivation; leave
hardware.firmwareCompression unset so other firmware entries remain unaffected.
In `@users/richie/home/gui/vscode/settings.json`:
- Around line 102-103: The JSON parse error is caused by a trailing comma after
the last property "docker.extension.enableComposeLanguageServer": false, —
remove the comma so the final property reads without a trailing comma and the
closing brace remains; ensure the property name
"docker.extension.enableComposeLanguageServer" is left intact and validate the
JSON after editing.
---
Nitpick comments:
In `@python/prompt_bench/batch_bill_summarizer.py`:
- Around line 170-173: The code currently falls back from
getenv("CLOSEDAI_TOKEN") to getenv("OPENAI_API_KEY"); remove the dual-key
compatibility and require a single standardized env var instead (use
CLOSEDAI_TOKEN), so update the api_key assignment to only read
getenv("CLOSEDAI_TOKEN") and change the error in the BadParameter raised by
batch_bill_summarizer.py to reference that one variable name (api_key, getenv,
CLOSEDAI_TOKEN, and the BadParameter raise are the symbols to edit).
In `@python/prompt_bench/compresion_test.py`:
- Around line 176-179: The code currently uses a fallback chain
getenv("CLOSEDAI_TOKEN") or getenv("OPENAI_API_KEY"), which introduces implicit
precedence; pick a single canonical env var (e.g., CLOSEDAI_TOKEN) and replace
the fallback with a single getenv call for that name, then update the error
message in the typer.BadParameter to reference only the chosen variable; if you
truly need both, explicitly document and implement clear precedence for
CLOSEDAI_TOKEN vs OPENAI_API_KEY rather than relying on the current implicit
chain (refer to the api_key variable and the typer.BadParameter raise to locate
where to change).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 962af8a8-beb4-4c1f-ba21-745c5de44dbb
📒 Files selected for processing (25)
- .gitignore
- overlays/default.nix
- pyproject.toml
- python/prompt_bench/Dockerfile.finetune
- python/prompt_bench/__init__.py
- python/prompt_bench/batch_bill_summarizer.py
- python/prompt_bench/bill_token_compression.py
- python/prompt_bench/compresion_test.py
- python/prompt_bench/container.py
- python/prompt_bench/downloader.py
- python/prompt_bench/finetune.py
- python/prompt_bench/finetune_container.py
- python/prompt_bench/main.py
- python/prompt_bench/models.py
- python/prompt_bench/summarization_prompts.py
- python/prompt_bench/tools/build_finetune_dataset.py
- python/prompt_bench/tools/count_tokens.py
- python/prompt_bench/train.sh
- python/prompt_bench/vllm_client.py
- systems/bob/default.nix
- systems/bob/scripts/zfs.sh
- systems/bob/snapshot_config.toml
- systems/rhapsody-in-green/sunshine.nix
- users/richie/home/gui/vscode/settings.json
- users/richie/systems/bob.nix
💤 Files with no reviewable changes (1)
- users/richie/systems/bob.nix
```toml
prompt-bench-download = "python.prompt_bench.downloader:cli"
finetune = "python.prompt_bench.finetune:cli"
finetune-container = "python.prompt_bench.finetune_container:cli"
build-finetune-dataset = "python.prompt_bench.build_finetune_dataset:cli"
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Verify the actual module path that defines `cli()` for build_finetune_dataset.
set -euo pipefail
echo "== entrypoint in pyproject.toml =="
rg -n 'build-finetune-dataset' pyproject.toml -C2
echo
echo "== candidate files =="
fd -a build_finetune_dataset.py
echo
echo "== cli() definitions =="
rg -n --glob '**/build_finetune_dataset.py' '^\s*def\s+cli\s*\(' -C2
```

Repository: RichieCahill/dotfiles
Length of output: 790
Fix broken console entrypoint for build-finetune-dataset.
Line 34 declares build-finetune-dataset = "python.prompt_bench.build_finetune_dataset:cli", but the cli() function exists only at python/prompt_bench/tools/build_finetune_dataset.py (module path python.prompt_bench.tools.build_finetune_dataset). No shim module exists at python/prompt_bench/build_finetune_dataset.py, so this entrypoint will fail at runtime. Either create a shim that re-exports the cli, or update the entrypoint to reference python.prompt_bench.tools.build_finetune_dataset:cli.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyproject.toml` at line 34, The console entrypoint "build-finetune-dataset"
points to python.prompt_bench.build_finetune_dataset:cli but the actual cli()
lives in python.prompt_bench.tools.build_finetune_dataset; fix by either
creating a shim module python/prompt_bench/build_finetune_dataset.py that
imports and re-exports cli from python.prompt_bench.tools.build_finetune_dataset
(e.g., from .tools.build_finetune_dataset import cli) or by editing
pyproject.toml to change the entrypoint to
python.prompt_bench.tools.build_finetune_dataset:cli so the
build-finetune-dataset entry references the actual module containing cli().
```python
safe_id = safe_filename(bill_id)
request_lines.append(build_request(safe_id, model, compressed_text))
```
Sanitizing bill_id into custom_id can create collisions.
Two different bill ids can map to the same safe_filename() output, and the downstream dataset builder joins by custom_id using dict keys. Once that happens, one request/completion pair overwrites another with no error.
Collision-safe alternative
```diff
+import hashlib
+
 ...
-safe_id = safe_filename(bill_id)
+digest = hashlib.sha1(bill_id.encode("utf-8")).hexdigest()[:8]
+safe_id = f"{safe_filename(bill_id)}-{digest}"
 request_lines.append(build_request(safe_id, model, compressed_text))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/prompt_bench/batch_bill_summarizer.py` around lines 124 - 125, The
current use of safe_id = safe_filename(bill_id) before
request_lines.append(build_request(...)) can cause collisions when different
bill_id values map to the same safe_filename output; change to produce a
collision-safe custom_id (e.g., combine safe_filename(bill_id) with a stable
short hash or unique suffix derived from bill_id) and use that custom_id in
build_request and downstream dataset keys; also keep the original bill_id in the
request metadata so you can trace back to the source. Ensure you update
references to safe_id to the new custom_id variable everywhere
request_lines/dataset building expects custom_id.
```python
SYSTEM_PROMPT = """You are a legislative analyst extracting policy substance from Congressional bill text.

Your job is to compress a bill into a dense, neutral structured summary that captures every distinct policy action — including secondary effects that might be buried in subsections.

EXTRACTION RULES:
- IGNORE: whereas clauses, congressional findings that are purely political statements, recitals, preambles, citations of existing law by number alone, and procedural boilerplate.
```
Wrap the long SYSTEM_PROMPT lines; treefmt is already failing here.
Lines 37 and 40 exceed Ruff's line-length limit, so this file currently fails CI. As per coding guidelines, "Make python code ruff compliant".
🧰 Tools
🪛 GitHub Actions: treefmt
[error] 37-37: ruff formatting check failed (ruff). E501 Line too long (181 > 120).
[error] 40-40: ruff formatting check failed (ruff). E501 Line too long (179 > 120).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/prompt_bench/compresion_test.py` around lines 35 - 40, The
SYSTEM_PROMPT multi-line string contains lines (notably the long lines around
the EXTRACTION RULES and the IGNORE rule) that exceed the project's line-length
limit; update the SYSTEM_PROMPT definition to wrap those long lines so each
physical line is within the Ruff limit (e.g., break the long sentences across
multiple lines inside the triple-quoted string or split into concatenated
shorter strings) while preserving the exact prompt text and newlines; locate the
SYSTEM_PROMPT variable and adjust the string literal so lines that previously
exceeded the limit (around the EXTRACTION RULES and the IGNORE bullet) are
wrapped to comply with the linter.
```python
command = [
    "docker",
    "run",
    "-d",
    "--name",
    CONTAINER_NAME,
    "--device=nvidia.com/gpu=all",
    "--ipc=host",
    "-v",
    f"{model_dir}:/models",
    "-p",
    f"{port}:8000",
    VLLM_IMAGE,
    "--model",
    f"/models/{model}",
    "--served-model-name",
    model,
    "--gpu-memory-utilization",
    str(gpu_memory_utilization),
    "--max-model-len",
    "4096",
]
logger.info("Starting vLLM container with model: %s", model)
result = subprocess.run(command, capture_output=True, text=True, check=False)
if result.returncode != 0:
    msg = f"Failed to start vLLM container: {result.stderr.strip()}"
    raise RuntimeError(msg)
logger.info("vLLM container started: %s", result.stdout.strip()[:12])


def stop_vllm() -> None:
    """Stop and remove the vLLM benchmark container."""
    logger.info("Stopping vLLM container")
    subprocess.run(["docker", "stop", CONTAINER_NAME], capture_output=True, check=False)
    subprocess.run(["docker", "rm", CONTAINER_NAME], capture_output=True, check=False)
    logger.info("vLLM container stopped and removed")


def check_gpu_free() -> None:
    """Warn if GPU-heavy processes (e.g. Ollama) are running."""
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
python - <<'PY'
from pathlib import Path
path = Path("python/prompt_bench/container.py")
for lineno, line in enumerate(path.read_text().splitlines(), 1):
    if "subprocess.run(" in line or '"docker"' in line or '"nvidia-smi"' in line:
        print(f"{lineno}: {line}")
PY
```

Repository: RichieCahill/dotfiles
Length of output: 483
🏁 Script executed:
```shell
# First, let's see the full file to understand imports and context
head -100 python/prompt_bench/container.py
```

Repository: RichieCahill/dotfiles
Length of output: 2577
🏁 Script executed:
```shell
# Check if ruff is available and run it on the file
which ruff && ruff check python/prompt_bench/container.py --select=S603,S607 || echo "Ruff not available; checking manually with ripgrep"
rg -n "subprocess\.run|S60[37]" python/prompt_bench/container.py -A 2 -B 2
```

Repository: RichieCahill/dotfiles
Length of output: 1060
🏁 Script executed:
```shell
# Search for shutil imports to see if it's already used
rg "import shutil|from shutil" python/prompt_bench/container.py
```

Repository: RichieCahill/dotfiles
Length of output: 47
Resolve Ruff S607 violations on subprocess calls by using absolute executable paths.
Lines 52, 62–63, and 69 invoke docker and nvidia-smi as bare executables, triggering S607 (partial path in subprocess call). Resolve each executable once using shutil.which() and pass the absolute path to subprocess.run(). As per coding guidelines, "Make python code ruff compliant".
🧰 Tools
🪛 Ruff (0.15.9)
[error] 52-52: subprocess call: check for execution of untrusted input
(S603)
[error] 62-62: subprocess call: check for execution of untrusted input
(S603)
[error] 62-62: Starting a process with a partial executable path
(S607)
[error] 63-63: subprocess call: check for execution of untrusted input
(S603)
[error] 63-63: Starting a process with a partial executable path
(S607)
[error] 70-70: Starting a process with a partial executable path
(S607)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/prompt_bench/container.py` around lines 29 - 74, The subprocess calls
use bare executables and trigger Ruff S607; resolve the absolute paths once with
shutil.which and use those absolute paths in the command lists for start (the
docker run command built in the function that starts vLLM), stop_vllm (the
subprocess.run calls that pass ["docker", "stop", CONTAINER_NAME] and ["docker",
"rm", CONTAINER_NAME]) and check_gpu_free (the ["nvidia-smi", ...] call). Add
code to call shutil.which("docker") and shutil.which("nvidia-smi") (store e.g.
DOCKER_PATH and NSMI_PATH), raise a clear RuntimeError or log+exit if which
returns None, and replace the first element of each subprocess command list with
the corresponding absolute path so all subprocess.run invocations reference the
resolved executable.
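A sketch of the one-time resolution; `resolve_executable` is an illustrative helper name, and the docker/nvidia-smi usage is shown only in comments since neither binary is guaranteed to exist on a given host:

```python
import shutil


def resolve_executable(name: str) -> str:
    """Return the absolute path of an executable, or fail loudly at startup."""
    path = shutil.which(name)
    if path is None:
        msg = f"required executable not found on PATH: {name}"
        raise RuntimeError(msg)
    return path


# Resolved once at import time; every subprocess.run() command list then starts
# with an absolute path, which satisfies Ruff S607, e.g.:
#   DOCKER_PATH = resolve_executable("docker")
#   subprocess.run([DOCKER_PATH, "stop", CONTAINER_NAME], capture_output=True, check=False)
```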
python/prompt_bench/container.py
Outdated
```python
def check_gpu_free() -> None:
    """Warn if GPU-heavy processes (e.g. Ollama) are running."""
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        logger.warning("Could not query GPU processes: %s", result.stderr.strip())
        return
    processes = result.stdout.strip()
    if processes:
        logger.warning("GPU processes detected:\n%s", processes)
        logger.warning("Consider stopping Ollama (sudo systemctl stop ollama) before benchmarking")
```
check_gpu_free() never gives callers a way to stop.
Right now a busy GPU only emits warnings, but both benchmark entry points keep running afterward. That makes timings noisy and can turn the fine-tune path into an avoidable OOM/failure loop. Return a boolean or raise here unless the caller explicitly opts into continuing.
🧰 Tools
🪛 Ruff (0.15.9)
[error] 70-70: Starting a process with a partial executable path
(S607)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/prompt_bench/container.py` around lines 67 - 81, check_gpu_free
currently only logs warnings, letting benchmark entry points continue and risk
noisy timings or OOMs; change check_gpu_free to return a boolean (e.g., True
when GPU is free, False when busy) or raise a specific exception (e.g.,
GPUBusyError) instead of just logging, and update the benchmark entry points
that call check_gpu_free to check the return value or catch the exception and
abort early (or proceed only if an explicit opt-in/force flag is provided).
Ensure the function signature (check_gpu_free) and its callers are updated
consistently and that log messages remain informative while control flow is
handled by the return value/exception.
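One possible shape, with the nvidia-smi output passed in as a string so the decision logic stands alone (the `processes_csv` and `force` parameters are illustrative seams, not from the original code):

```python
import logging

logger = logging.getLogger(__name__)


class GPUBusyError(RuntimeError):
    """Raised when GPU compute processes are detected before a benchmark."""


def check_gpu_free(processes_csv: str, *, force: bool = False) -> bool:
    """Return True when no compute processes are listed; raise unless forced."""
    processes = processes_csv.strip()
    if not processes:
        return True
    logger.warning("GPU processes detected:\n%s", processes)
    if force:
        return False  # caller explicitly opted into running on a busy GPU
    raise GPUBusyError("GPU busy; stop other workloads or pass an explicit force flag")
```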
```python
def wait_ready(self, max_wait: int) -> None:
    """Poll /v1/models until the server is ready or timeout."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        try:
            response = self._client.get("/v1/models")
            if response.is_success:
                logger.info("vLLM server is ready")
                return
        except httpx.TransportError:
            pass
        time.sleep(READY_POLL_INTERVAL)
```
🧩 Analysis chain
🌐 Web query:
In httpx, does a client-level timeout apply to every request unless a per-request timeout override is provided?
💡 Result:
Yes, in httpx, a client-level timeout applies to every request made with that client by default, unless a per-request timeout override is provided.
wait_ready() can exceed max_wait because the client's global timeout applies to each request.
Line 34 uses the client's default timeout (set at Line 27). When that timeout is large, a single blocked probe can exceed max_wait entirely, making startup timeout nondeterministic.
Proposed fix
def wait_ready(self, max_wait: int) -> None:
"""Poll /v1/models until the server is ready or timeout."""
deadline = time.monotonic() + max_wait
while time.monotonic() < deadline:
try:
- response = self._client.get("/v1/models")
+ response = self._client.get("/v1/models", timeout=5.0)
if response.is_success:
logger.info("vLLM server is ready")
return
except httpx.TransportError:
        pass
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def wait_ready(self, max_wait: int) -> None: | |
| """Poll /v1/models until the server is ready or timeout.""" | |
| deadline = time.monotonic() + max_wait | |
| while time.monotonic() < deadline: | |
| try: | |
| response = self._client.get("/v1/models") | |
| if response.is_success: | |
| logger.info("vLLM server is ready") | |
| return | |
| except httpx.TransportError: | |
| pass | |
| time.sleep(READY_POLL_INTERVAL) | |
| def wait_ready(self, max_wait: int) -> None: | |
| """Poll /v1/models until the server is ready or timeout.""" | |
| deadline = time.monotonic() + max_wait | |
| while time.monotonic() < deadline: | |
| try: | |
| response = self._client.get("/v1/models", timeout=5.0) | |
| if response.is_success: | |
| logger.info("vLLM server is ready") | |
| return | |
| except httpx.TransportError: | |
| pass | |
| time.sleep(READY_POLL_INTERVAL) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/prompt_bench/vllm_client.py` around lines 29 - 40, wait_ready() can
exceed max_wait because each _client.get() uses the client's global timeout;
change the probe to compute remaining = deadline - time.monotonic() before each
request, bail if remaining <= 0, and pass a per-request timeout equal to
min(remaining, a safe cap) to _client.get(...) so a single blocked probe cannot
exceed max_wait; ensure you pass the per-request timeout value supported by
httpx.get and continue to catch httpx.TransportError (and timeout exceptions) as
before, using READY_POLL_INTERVAL between attempts.
| sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror | ||
| sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage | ||
| # storage datasets | ||
| sudo zfs create storage/models -o recordsize=1M |
🧩 Analysis chain
🏁 Script executed:
# First, locate the referenced Python files
find . -path "*/python/prompt_bench/models.py" -o -path "*/python/prompt_bench/finetune_container.py" | head -20
Repository: RichieCahill/dotfiles
Length of output: 140
🏁 Script executed:
# Also search more broadly for these files
fd -t f "models.py|finetune_container.py" | grep -E "prompt_bench|benchmark"
Repository: RichieCahill/dotfiles
Length of output: 136
🏁 Script executed:
# Check the zfs.sh file content to understand the full context
find . -path "*/systems/bob/scripts/zfs.sh" -exec cat -n {} \;
Repository: RichieCahill/dotfiles
Length of output: 529
🏁 Script executed:
# Check models.py around line 12
sed -n '1,20p' python/prompt_bench/models.py
Repository: RichieCahill/dotfiles
Length of output: 517
🏁 Script executed:
# Check finetune_container.py around lines 16-22
sed -n '1,30p' python/prompt_bench/finetune_container.py
Repository: RichieCahill/dotfiles
Length of output: 869
🏁 Script executed:
# Search for references to /zfs/models/hf throughout the codebase
rg -i "/zfs/models" --type-list | head -5
rg -i "/zfs/models" 2>/dev/null | head -20Repository: RichieCahill/dotfiles
Length of output: 793
🏁 Script executed:
# Search for references to /zfs/storage in the codebase
rg -i "/zfs/storage" 2>/dev/null | head -20Repository: RichieCahill/dotfiles
Length of output: 906
🏁 Script executed:
# Check if there are any other zfs dataset creation commands or references to storage/models
rg "storage/models" 2>/dev/nullRepository: RichieCahill/dotfiles
Length of output: 139
Model path mismatch between provisioning script and application defaults.
This script provisions the ZFS dataset at /zfs/storage/models, but all application code expects models at /zfs/models/hf:
- `python/prompt_bench/models.py` line 12: `model_dir: str = "/zfs/models/hf"`
- `python/prompt_bench/finetune_container.py` line 20: `DEFAULT_HF_CACHE = Path("/zfs/models/hf")`
- NixOS config (`systems/bob/hardware.nix`) also configures `/zfs/models` as the mount point
Without alignment, model discovery and caching will fail or use divergent paths.
💡 Proposed fix
-# storage datasets
-sudo zfs create storage/models -o recordsize=1M
+# storage datasets (align with Python defaults)
+sudo zfs create -o mountpoint=/zfs/models -o recordsize=1M storage/models
+sudo zfs create storage/models/hf
📝 Committable suggestion
| sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror | |
| sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage | |
| # storage datasets | |
| sudo zfs create storage/models -o recordsize=1M | |
| sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror | |
| sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage | |
| # storage datasets (align with Python defaults) | |
| sudo zfs create -o mountpoint=/zfs/models -o recordsize=1M storage/models | |
| sudo zfs create storage/models/hf |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@systems/bob/scripts/zfs.sh` around lines 6 - 11, The ZFS dataset path in the
provisioning script is misaligned with the application defaults: change the
zpool/zfs dataset creation so the dataset is mounted at /zfs/models (and create
the subpath /zfs/models/hf) instead of /zfs/storage/models; specifically update
the zpool/zfs create commands in this script (the two sudo zpool create
invocations and the sudo zfs create storage/models line) to create/mount the
pool and dataset under /zfs/models and ensure a dataset or directory for
/zfs/models/hf exists so it matches python/prompt_bench.models:model_dir and
finetune_container:DEFAULT_HF_CACHE and the NixOS hardware.nix mount point.
| sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror | ||
| sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage |
🧩 Analysis chain
🏁 Script executed:
cat -n systems/bob/scripts/zfs.sh
Repository: RichieCahill/dotfiles
Length of output: 529
🌐 Web query:
zpool create mirror syntax requirements devices arguments
💡 Result:
The syntax for creating a ZFS mirror pool with zpool create is: zpool create [options] poolname mirror device1 device2 [mirror device3 device4 ...] Key requirements and arguments: - poolname: Must begin with a letter; alphanumeric, underscore, dash, colon, space, period allowed. Reserved names: mirror, raidz, draid, spare, log (and prefixes). - mirror keyword: Specifies a mirrored vdev (virtual device). Repeat for multiple mirrors (striped mirrors). - devices: At least 2 block devices per mirror (whole disks like /dev/sda, partitions like /dev/sda1, or files). Devices must be unused, same size recommended, accessible. Use -f to force if in use. - Minimum: 2 devices for single 2-way mirror. Options (common): - -f: Force use of in-use devices. - -m mountpoint: Set mount point (default /poolname). - -o property=value: Set pool properties (e.g., ashift=12 for 4K sectors). - -d: Dry run. - -R root: Alternate root for cachefile=none, altroot. Example single mirror: zpool create tank mirror c1d0 c2d0 Example two mirrors (striped): zpool create tank mirror c1d0 c2d0 mirror c3d0 c4d0 Devices validated for availability and consistent redundancy. Data striped across multiple mirrors if present.
Citations:
- 1: https://docs.oracle.com/cd/E19253-01/819-5461/gazhv/index.html
- 2: https://docs.oracle.com/en/operating-systems/solaris/oracle-solaris/11.4/manage-zfs/creating-mirrored-storage-pool.html
- 3: https://www.gsp.com/cgi-bin/man.cgi?section=8&topic=ZPOOL-CREATE
- 4: https://openzfs.github.io/openzfs-docs/man/master/8/zpool-create.8.html
🏁 Script executed:
ls -la systems/bob/
cat systems/bob/README.md 2>/dev/null || echo "No README found"
Repository: RichieCahill/dotfiles
Length of output: 563
🏁 Script executed:
cat -n systems/bob/hardware.nix
echo "---"
cat -n systems/bob/default.nix
Repository: RichieCahill/dotfiles
Length of output: 3591
zpool create commands are syntactically invalid and will fail.
Line 6 ends with mirror but provides no device paths. Line 7 attempts to create the same pool without any vdev specification. Both commands violate ZFS syntax: zpool create poolname mirror device1 device2 ... requires at least two devices per mirror. The script will fail during provisioning.
Proposed fix
-sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror
-sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage
+# Replace <device1> and <device2> with actual block devices (e.g., /dev/sda, /dev/sdb)
+sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror <device1> <device2>
📝 Committable suggestion
| sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror | |
| sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage | |
| # Replace <device1> and <device2> with actual block devices (e.g., /dev/sda, /dev/sdb) | |
| sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror <device1> <device2> |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@systems/bob/scripts/zfs.sh` around lines 6 - 7, The two zpool create
invocations are invalid because they omit vdev device paths (the lines
containing "sudo zpool create -f -o ... storage mirror" and "sudo zpool create
-o ... storage"); fix by providing explicit device paths for the vdevs or by
removing the duplicate and choosing the correct topology: for a mirrored pool
use "zpool create ... storage mirror <device1> <device2> [<deviceN>...]" with at
least two device nodes (prefer stable /dev/disk/by-id names), or if you intend a
single-disk pool create without the "mirror" keyword and supply one device;
update the script to use the chosen valid command and remove the invalid
alternate create line.
| hardware = { | ||
| firmwareCompression = "none"; | ||
| firmware = [ | ||
| (pkgs.runCommandLocal "virtual-display-edid" { } '' | ||
| mkdir -p $out/lib/firmware/edid | ||
| cp ${./edid/virtual-display.bin} $out/lib/firmware/edid/virtual-display.bin | ||
| '' | ||
| ) | ||
| ]; | ||
| '') | ||
| ]; | ||
| }; |
🧩 Analysis chain
🌐 Web query:
For the nixpkgs/NixOS version pinned by this repo, does hardware.firmwareCompression = "none"apply globally to all entries inhardware.firmware? Also, is compressFirmware = false on a firmware derivation still supported/recommended for per-package control?
💡 Result:
Yes, hardware.firmwareCompression = "none" applies globally to all entries in hardware.firmware. It controls the compression method used when building the firmware environment via pkgs.buildEnv, where each package in the list is passed through a compressFirmware function. This function checks the global option first: if set to "none", no compression is applied to any firmware package, regardless of individual settings. Otherwise, it respects a per-package compressFirmware ? false attribute if present. The compressFirmware = false attribute on a firmware derivation is still supported as a fallback for per-package control when global compression is enabled (e.g., "xz" or "zstd"). It is not needed (and ignored) when global is "none". This is the recommended way for selective disabling, as seen in discussions around PR #172820. This behavior holds in recent nixpkgs (e.g., NixOS 24.11/unstable), with kernel-dependent defaults: "none" pre-5.3, "xz" 5.3-5.18, "zstd" 5.19+.
Citations:
- 1: https://github.com/NixOS/nixpkgs/blob/ed142ab1b3a092c4d149245d0c4126a5d7ea00b0/nixos/modules/services/hardware/udev.nix
- 2: https://mynixos.com/nixpkgs/option/hardware.firmwareCompression
- 3: Compressed firmware NixOS/nixpkgs#172820
- 4: https://search.nixos.org/options?query=hardware.firmware&show=hardware.firmware
- 5: https://discourse.nixos.org/t/how-does-nixos-link-built-packages-firmware/30663
hardware.firmwareCompression = "none" affects all firmware, not just this EDID blob.
This is a global policy change at line 16 that disables compression for every entry in hardware.firmware, not just virtual-display.bin. If other firmware packages exist in this configuration (now or in future), they will also become uncompressed.
For scoped control of only the EDID derivation, remove the global setting and add compressFirmware = false to the derivation itself. The per-package attribute is still supported and is the recommended approach for selective disabling when global compression is enabled.
Suggested scoped alternative
hardware = {
- firmwareCompression = "none";
firmware = [
- (pkgs.runCommandLocal "virtual-display-edid" { } ''
+ (pkgs.runCommandLocal "virtual-display-edid" {
+ compressFirmware = false;
+ } ''
mkdir -p $out/lib/firmware/edid
cp ${./edid/virtual-display.bin} $out/lib/firmware/edid/virtual-display.bin
'')
];
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@systems/rhapsody-in-green/sunshine.nix` around lines 15 - 23, Remove the
global hardware.firmwareCompression = "none" setting and instead disable
compression only for the EDID derivation: locate the runCommandLocal
"virtual-display-edid" derivation that produces virtual-display.bin (referenced
under hardware.firmware) and add the per-derivation attribute compressFirmware =
false to that derivation; leave hardware.firmwareCompression unset so other
firmware entries remain unaffected.
| "docker.extension.enableComposeLanguageServer": false, | ||
| } |
🧩 Analysis chain
🏁 Script executed:
# Check if file exists and read the lines around 102-103
if [ -f "users/richie/home/gui/vscode/settings.json" ]; then
echo "File found. Reading lines 95-110 for context:"
sed -n '95,110p' "users/richie/home/gui/vscode/settings.json" | cat -n
else
echo "File not found at users/richie/home/gui/vscode/settings.json"
# Try to find it
find . -name "settings.json" -type f | head -20
fi
Repository: RichieCahill/dotfiles
Length of output: 504
Remove the trailing comma on the last property to resolve parse failure.
Line 102 has a trailing comma after false, which is invalid JSON syntax. The last property in a JSON object must not have a trailing comma before the closing brace on line 103.
💡 Proposed fix
- "docker.extension.enableComposeLanguageServer": false,
+ "docker.extension.enableComposeLanguageServer": false
}
📝 Committable suggestion
| "docker.extension.enableComposeLanguageServer": false, | |
| } | |
| "docker.extension.enableComposeLanguageServer": false | |
| } |
🧰 Tools
🪛 Biome (2.4.10)
[error] 102-102: End of file expected
(parse)
[error] 102-102: End of file expected
(parse)
[error] 102-102: End of file expected
(parse)
[error] 102-103: End of file expected
(parse)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@users/richie/home/gui/vscode/settings.json` around lines 102 - 103, The JSON
parse error is caused by a trailing comma after the last property
"docker.extension.enableComposeLanguageServer": false, — remove the comma so the
final property reads without a trailing comma and the closing brace remains;
ensure the property name "docker.extension.enableComposeLanguageServer" is left
intact and validate the JSON after editing.
Tested on a sample of 100 bills matching the distribution of our data. Compression saves ~11.5% on prompt tokens; completion and reasoning tokens are roughly equal across the two sets.

|  | prompt | completion | reasoning | total |
| --- | --- | --- | --- | --- |
| compressed | 349,460 | 157,110 | 112,128 | 506,570 |
| uncompressed | 394,948 | 154,710 | 110,080 | 549,658 |
| delta | −45,488 | +2,400 | +2,048 | −43,088 |
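The ~11.5% figure can be checked directly from the prompt-token counts above:

```python
# Prompt-token totals from the 100-bill sample
prompt_compressed = 349_460
prompt_uncompressed = 394_948

saved = prompt_uncompressed - prompt_compressed  # 45,488 tokens
savings = saved / prompt_uncompressed
print(f"{savings:.1%}")  # → 11.5%
```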
batch bill summarizer sends a batch api call to gpt
ac02d40 to
a80de99
Summary by CodeRabbit
Release Notes