Conversation
📝 Walkthrough

Introduces a comprehensive LLM benchmarking and fine-tuning framework for Congressional bill summarization. New components include vLLM-based inference orchestration, OpenAI Batch API integration, Unsloth-based QLoRA fine-tuning, bill text compression utilities, and Docker containerization. System configuration updates provision ZFS storage and hardware setup for benchmarking infrastructure.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as User/CLI
    participant Main as main.py
    participant Config as BenchmarkConfig
    participant Downloader as downloader.py
    participant Container as container.py
    participant Server as vLLM Server
    participant Client as VLLMClient
    participant Executor as ThreadPoolExecutor
    participant Output as Output Files
    User->>Main: Run prompt-bench with config
    Main->>Config: Load & parse TOML
    Main->>Main: Validate prompts directory
    loop For each model repo
        Main->>Downloader: check/download model
        Downloader->>Downloader: is_model_present?
        Main->>Container: start_vllm(model)
        Container->>Server: docker run vLLM
        Server->>Server: Load model into memory
        Main->>Client: Create VLLMClient
        Client->>Server: poll /v1/models
        Server-->>Client: ready
        Main->>Executor: Submit prompt batch
        loop For each prompt (parallel)
            Executor->>Client: complete(prompt, model)
            Client->>Server: POST /v1/completions
            Server->>Server: Generate response
            Server-->>Client: response text
            Client-->>Executor: completion result
            Executor->>Output: Write response + timing
        end
        Main->>Container: stop_vllm()
        Container->>Server: docker stop/rm
    end
    Main->>Output: Write _timing.json summary
    Main-->>User: Report totals & metrics
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 20
🧹 Nitpick comments (2)
python/prompt_bench/batch_bill_summarizer.py (1)
170-173: Avoid the dual API-key fallback unless you explicitly want compatibility mode.

This CLI is new, so carrying both `CLOSEDAI_TOKEN` and `OPENAI_API_KEY` bakes in precedence behavior you'll need to preserve. Standardizing on one name keeps the interface simpler. As per coding guidelines, "Avoid compatibility branches like PG_USER and POSTGRESQL_URL unless requested".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/prompt_bench/batch_bill_summarizer.py` around lines 170 - 173, the code currently falls back from getenv("CLOSEDAI_TOKEN") to getenv("OPENAI_API_KEY"); remove the dual-key compatibility and require a single standardized env var instead (use CLOSEDAI_TOKEN), so update the api_key assignment to only read getenv("CLOSEDAI_TOKEN") and change the error in the BadParameter raised by batch_bill_summarizer.py to reference that one variable name (api_key, getenv, CLOSEDAI_TOKEN, and the BadParameter raise are the symbols to edit).

python/prompt_bench/compresion_test.py (1)
176-179: Pick one API-key env var instead of a fallback chain.

This CLI is new, so carrying both `CLOSEDAI_TOKEN` and `OPENAI_API_KEY` adds precedence behavior you'll need to support indefinitely. If both names are truly required, document the precedence explicitly; otherwise standardize on one. As per coding guidelines, "Avoid compatibility branches like PG_USER and POSTGRESQL_URL unless requested".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/prompt_bench/compresion_test.py` around lines 176 - 179, The code currently uses a fallback chain getenv("CLOSEDAI_TOKEN") or getenv("OPENAI_API_KEY"), which introduces implicit precedence; pick a single canonical env var (e.g., CLOSEDAI_TOKEN) and replace the fallback with a single getenv call for that name, then update the error message in the typer.BadParameter to reference only the chosen variable; if you truly need both, explicitly document and implement clear precedence for CLOSEDAI_TOKEN vs OPENAI_API_KEY rather than relying on the current implicit chain (refer to the api_key variable and the typer.BadParameter raise to locate where to change).
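The single-variable pattern both nitpicks ask for can be sketched as follows; `require_api_key` is an illustrative name, and the real CLIs would raise `typer.BadParameter` rather than `ValueError`:

```python
import os


def require_api_key(env_var: str = "CLOSEDAI_TOKEN") -> str:
    """Read one canonical env var; with no fallback chain there is no precedence to document."""
    api_key = os.getenv(env_var)
    if not api_key:
        # In the typer CLIs this would be `raise typer.BadParameter(...)`.
        raise ValueError(f"Set {env_var} to your API key")
    return api_key
```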
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Line 34: The console entrypoint "build-finetune-dataset" points to
python.prompt_bench.build_finetune_dataset:cli but the actual cli() lives in
python.prompt_bench.tools.build_finetune_dataset; fix by either creating a shim
module python/prompt_bench/build_finetune_dataset.py that imports and re-exports
cli from python.prompt_bench.tools.build_finetune_dataset (e.g., from
.tools.build_finetune_dataset import cli) or by editing pyproject.toml to change
the entrypoint to python.prompt_bench.tools.build_finetune_dataset:cli so the
build-finetune-dataset entry references the actual module containing cli().
In `@python/prompt_bench/batch_bill_summarizer.py`:
- Around line 124-125: The current use of safe_id = safe_filename(bill_id)
before request_lines.append(build_request(...)) can cause collisions when
different bill_id values map to the same safe_filename output; change to produce
a collision-safe custom_id (e.g., combine safe_filename(bill_id) with a stable
short hash or unique suffix derived from bill_id) and use that custom_id in
build_request and downstream dataset keys; also keep the original bill_id in the
request metadata so you can trace back to the source. Ensure you update
references to safe_id to the new custom_id variable everywhere
request_lines/dataset building expects custom_id.
In `@python/prompt_bench/compresion_test.py`:
- Around line 35-40: The SYSTEM_PROMPT multi-line string contains lines (notably
the long lines around the EXTRACTION RULES and the IGNORE rule) that exceed the
project's line-length limit; update the SYSTEM_PROMPT definition to wrap those
long lines so each physical line is within the Ruff limit (e.g., break the long
sentences across multiple lines inside the triple-quoted string or split into
concatenated shorter strings) while preserving the exact prompt text and
newlines; locate the SYSTEM_PROMPT variable and adjust the string literal so
lines that previously exceeded the limit (around the EXTRACTION RULES and the
IGNORE bullet) are wrapped to comply with the linter.
In `@python/prompt_bench/container.py`:
- Around line 67-81: check_gpu_free currently only logs warnings, letting
benchmark entry points continue and risk noisy timings or OOMs; change
check_gpu_free to return a boolean (e.g., True when GPU is free, False when
busy) or raise a specific exception (e.g., GPUBusyError) instead of just
logging, and update the benchmark entry points that call check_gpu_free to check
the return value or catch the exception and abort early (or proceed only if an
explicit opt-in/force flag is provided). Ensure the function signature
(check_gpu_free) and its callers are updated consistently and that log messages
remain informative while control flow is handled by the return value/exception.
- Around line 29-74: The subprocess calls use bare executables and trigger Ruff
S607; resolve the absolute paths once with shutil.which and use those absolute
paths in the command lists for start (the docker run command built in the
function that starts vLLM), stop_vllm (the subprocess.run calls that pass
["docker", "stop", CONTAINER_NAME] and ["docker", "rm", CONTAINER_NAME]) and
check_gpu_free (the ["nvidia-smi", ...] call). Add code to call
shutil.which("docker") and shutil.which("nvidia-smi") (store e.g. DOCKER_PATH
and NSMI_PATH), raise a clear RuntimeError or log+exit if which returns None,
and replace the first element of each subprocess command list with the
corresponding absolute path so all subprocess.run invocations reference the
resolved executable.
In `@python/prompt_bench/Dockerfile.finetune`:
- Around line 15-25: The Dockerfile runs as root (no USER directive) which is
insecure; modify the Dockerfile.finetune to create a non-root user and switch to
it before the ENTRYPOINT: add steps after setting WORKDIR to create a user/group
(e.g., uid/gid 1000 or a stable UID), chown /workspace and any copied files
(referenced by WORKDIR and the copied python/prompt_bench/* and
python/__init__.py artifacts) to that user, and add a USER instruction so the
process launched by ENTRYPOINT ["python", "-m", "python.prompt_bench.finetune"]
runs as the non-root user. Ensure ownership changes cover all copied files and
happen before switching users.
In `@python/prompt_bench/downloader.py`:
- Around line 23-27: The is_model_present function can raise if the resolved
path exists but is not a directory; update is_model_present (which calls
local_model_path) to first check path.is_dir() before iterating: return
path.exists() and path.is_dir() and any(path.iterdir()); this ensures you only
call iterdir() on directories and avoids aborting the download flow when a
non-directory file exists at the model path.
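A standalone sketch of the guarded check, taking the resolved path directly rather than calling `local_model_path`:

```python
from pathlib import Path


def is_model_present(path: Path) -> bool:
    """True only for a non-empty directory; a stray file at the path is not a model."""
    # is_dir() is checked first so iterdir() is never called on a regular file,
    # which would raise NotADirectoryError and abort the download flow.
    return path.is_dir() and any(path.iterdir())
```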
In `@python/prompt_bench/finetune_container.py`:
- Around line 25-29: Replace bare "docker" subprocess invocations with a
resolved absolute path: call shutil.which("docker") once (e.g., assign
DOCKER_BIN = shutil.which("docker") at module import), raise a clear error if it
returns None, and then use DOCKER_BIN in all subprocess.run/call invocations
(including the build call that uses DOCKERFILE_PATH and FINETUNE_IMAGE and the
later docker commands around lines ~119–140). This makes subprocess calls
Ruff-compliant (S603/S607) and reuses the resolved path across functions instead
of hardcoding "docker".
In `@python/prompt_bench/finetune.py`:
- Around line 94-95: Add upfront validation in the finetune entrypoint (check
the validation_split parameter passed to the function that declares
validation_split and output_dir) to ensure validation_split is a float within
(0.0, 1.0) (reject 0 or 1) and validate the dataset length before calling
train_test_split (the code that later calls
sklearn.model_selection.train_test_split or a helper for splitting) to ensure
there are enough samples for the requested split (e.g., require
min_total_samples and ensure int(len(dataset) * validation_split) >= 1 and
int(len(dataset) * (1 - validation_split)) >= 1); raise a clear ValueError or
typer.BadParameter with a descriptive message if the split is out of range or
the dataset is too small.
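The upfront validation might look like this; `validate_split` is an illustrative helper name, and the real entrypoint would raise `typer.BadParameter` instead of `ValueError`:

```python
def validate_split(n_samples: int, validation_split: float) -> tuple[int, int]:
    """Reject out-of-range splits and datasets too small to yield both partitions."""
    if not 0.0 < validation_split < 1.0:
        msg = f"validation_split must be in (0.0, 1.0), got {validation_split}"
        raise ValueError(msg)
    n_val = int(n_samples * validation_split)
    n_train = n_samples - n_val
    # Both partitions must end up non-empty or train_test_split will fail later.
    if n_val < 1 or n_train < 1:
        msg = f"dataset of {n_samples} samples cannot support a {validation_split} split"
        raise ValueError(msg)
    return n_train, n_val
```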
In `@python/prompt_bench/summarization_prompts.py`:
- Around line 1-34: Add a module docstring at the top describing the purpose of
this module (e.g., "Prompt templates for legislative bill summarization") to
satisfy Ruff D100, and fix E501 by breaking up overly long string literals
SUMMARIZATION_SYSTEM_PROMPT and SUMMARIZATION_USER_TEMPLATE so no source line
exceeds the configured max line length (use implicit multi-line string
concatenation with parentheses or textwrap.dedent + join while preserving the
exact prompt text and formatting). Locate the constants
SUMMARIZATION_SYSTEM_PROMPT and SUMMARIZATION_USER_TEMPLATE in this module and
wrap their content into shorter segments or use a dedented triple-quoted
docstring assembled from multiple lines to keep each source line within the
linter limit without changing the prompts' semantics. Ensure no other top-level
comments or code remain missing a module docstring.
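The implicit-concatenation approach can be sketched with an abbreviated stand-in prompt (the text here is shortened for illustration, not the real prompt):

```python
# Adjacent string literals inside parentheses are concatenated at compile time,
# so the runtime value is identical to the original single long line while
# every source line stays under the linter's length limit.
SUMMARIZATION_SYSTEM_PROMPT = (
    "You are a legislative analyst extracting policy substance "
    "from Congressional bill text.\n"
    "- IGNORE: whereas clauses, recitals, preambles, and "
    "procedural boilerplate.\n"
)
```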
In `@python/prompt_bench/tools/build_finetune_dataset.py`:
- Line 1: Add a new python/prompt_bench/tools/__init__.py to turn the implicit
namespace into a regular package (this will clear INP001 and satisfy
Ruff/treefmt); create the file (can be empty or include a minimal __all__
export) so modules like build_finetune_dataset.py import correctly and the tools
package is no longer an implicit namespace.
- Around line 22-61: In both load_requests and load_completions, currently
assigning results[custom_id] will silently overwrite duplicate custom_id
entries; modify each function (load_requests and load_completions) to detect
duplicates before assignment and fail fast—e.g., if custom_id in results: raise
ValueError or log an error and raise with a clear message including the
custom_id and line_number (use line_number in load_completions) to prevent
silent overwrites; ensure the check is placed right before results[custom_id] =
... so duplicates are rejected rather than replaced.
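A simplified stand-in for `load_requests`/`load_completions` showing the fail-fast check (`load_records` is a hypothetical name):

```python
import json


def load_records(lines: list[str]) -> dict[str, dict]:
    """Index JSONL records by custom_id, refusing silent overwrites."""
    results: dict[str, dict] = {}
    for line_number, line in enumerate(lines, start=1):
        record = json.loads(line)
        custom_id = record["custom_id"]
        # Checked right before assignment so a duplicate is rejected, not replaced.
        if custom_id in results:
            msg = f"duplicate custom_id {custom_id!r} at line {line_number}"
            raise ValueError(msg)
        results[custom_id] = record
    return results
```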
In `@python/prompt_bench/tools/count_tokens.py`:
- Line 1: The CI failure INP001 indicates this module (count_tokens.py) lives in
an implicit namespace package; fix it by making the tools package explicit: add
an __init__.py in the tools package (i.e., alongside count_tokens.py) so
python/prompt_bench/tools becomes an explicit package; ensure the new
__init__.py is minimal (module docstring or empty) and commit so Ruff stops
flagging INP001 for the count_tokens.py module.
- Around line 35-41: Wrap the unguarded JSON parsing around
decoder.raw_decode(text) in a try/except that catches json.JSONDecodeError (and
ValueError for safety), increments totals.errors, logs or notes the offending
path if desired, and continues the loop so a single malformed file doesn't abort
the run; ensure totals.files is only incremented for successfully parsed files
(move totals.files += 1 after successful decode) and keep the existing check of
record.get("usage") to increment totals.errors and continue when usage is
missing.
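A sketch of the guarded loop, with file iteration replaced by a list of raw strings for illustration (`tally` is a hypothetical name):

```python
import json


def tally(raw_records: list[str]) -> tuple[int, int]:
    """Count parsed files vs. errors; one malformed record no longer aborts the run."""
    decoder = json.JSONDecoder()
    files = errors = 0
    for text in raw_records:
        try:
            record, _ = decoder.raw_decode(text)
        except (json.JSONDecodeError, ValueError):
            errors += 1
            continue
        if not record.get("usage"):
            errors += 1
            continue
        files += 1  # incremented only after a successful decode with usage data
    return files, errors
```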
In `@python/prompt_bench/train.sh`:
- Around line 19-20: The script sets DATASET and OUTPUT_DIR using $(pwd), which
makes paths depend on the caller's CWD; update train.sh to compute the
script/repo directory and build DATASET and OUTPUT_DIR relative to that location
instead of $(pwd). Locate the DATASET and OUTPUT_DIR assignments in train.sh and
replace them with values constructed from the script's directory (e.g., using
dirname/BASH_SOURCE resolution) so the paths correctly point to
output/finetune_dataset.jsonl and output/qwen-bill-summarizer next to the script
regardless of where the script is invoked.
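The script-relative form could look like this (a sketch; the output subpaths are taken from the review text above):

```shell
# Resolve the directory containing this script instead of relying on $(pwd),
# so the output paths are stable no matter where the script is invoked from.
# (Under bash, prefer "${BASH_SOURCE[0]}" over "$0" if the script may be sourced.)
SCRIPT_DIR="$(cd -- "$(dirname -- "$0")" && pwd)"
DATASET="${SCRIPT_DIR}/output/finetune_dataset.jsonl"
OUTPUT_DIR="${SCRIPT_DIR}/output/qwen-bill-summarizer"
```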
In `@python/prompt_bench/vllm_client.py`:
- Around line 29-40: wait_ready() can exceed max_wait because each _client.get()
uses the client's global timeout; change the probe to compute remaining =
deadline - time.monotonic() before each request, bail if remaining <= 0, and
pass a per-request timeout equal to min(remaining, a safe cap) to
_client.get(...) so a single blocked probe cannot exceed max_wait; ensure you
pass the per-request timeout value supported by httpx.get and continue to catch
httpx.TransportError (and timeout exceptions) as before, using
READY_POLL_INTERVAL between attempts.
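The remaining-budget pattern can be sketched with an injected `probe` callable standing in for `httpx.Client.get` (names and interval values are illustrative):

```python
import time

READY_POLL_INTERVAL = 0.01  # shortened for illustration
PER_REQUEST_CAP = 5.0


def wait_ready(probe, max_wait: float) -> bool:
    """Poll until probe() succeeds; each attempt gets only the time left."""
    deadline = time.monotonic() + max_wait
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # budget exhausted
        try:
            # A per-request timeout of min(remaining, cap) bounds a single
            # blocked probe, so the total wait cannot blow past max_wait.
            if probe(timeout=min(remaining, PER_REQUEST_CAP)):
                return True
        except OSError:
            pass  # transport error: retry until the deadline
        time.sleep(READY_POLL_INTERVAL)
```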
In `@systems/bob/scripts/zfs.sh`:
- Around line 6-11: The ZFS dataset path in the provisioning script is
misaligned with the application defaults: change the zpool/zfs dataset creation
so the dataset is mounted at /zfs/models (and create the subpath /zfs/models/hf)
instead of /zfs/storage/models; specifically update the zpool/zfs create
commands in this script (the two sudo zpool create invocations and the sudo zfs
create storage/models line) to create/mount the pool and dataset under
/zfs/models and ensure a dataset or directory for /zfs/models/hf exists so it
matches python/prompt_bench.models:model_dir and
finetune_container:DEFAULT_HF_CACHE and the NixOS hardware.nix mount point.
- Around line 6-7: The two zpool create invocations are invalid because they
omit vdev device paths (the lines containing "sudo zpool create -f -o ...
storage mirror" and "sudo zpool create -o ... storage"); fix by providing
explicit device paths for the vdevs or by removing the duplicate and choosing
the correct topology: for a mirrored pool use "zpool create ... storage mirror
<device1> <device2> [<deviceN>...]" with at least two device nodes (prefer
stable /dev/disk/by-id names), or if you intend a single-disk pool create
without the "mirror" keyword and supply one device; update the script to use the
chosen valid command and remove the invalid alternate create line.
In `@systems/rhapsody-in-green/sunshine.nix`:
- Around line 15-23: Remove the global hardware.firmwareCompression = "none"
setting and instead disable compression only for the EDID derivation: locate the
runCommandLocal "virtual-display-edid" derivation that produces
virtual-display.bin (referenced under hardware.firmware) and add the
per-derivation attribute compressFirmware = false to that derivation; leave
hardware.firmwareCompression unset so other firmware entries remain unaffected.
In `@users/richie/home/gui/vscode/settings.json`:
- Around line 102-103: The JSON parse error is caused by a trailing comma after
the last property "docker.extension.enableComposeLanguageServer": false, —
remove the comma so the final property reads without a trailing comma and the
closing brace remains; ensure the property name
"docker.extension.enableComposeLanguageServer" is left intact and validate the
JSON after editing.
---
Nitpick comments:
In `@python/prompt_bench/batch_bill_summarizer.py`:
- Around line 170-173: The code currently falls back from
getenv("CLOSEDAI_TOKEN") to getenv("OPENAI_API_KEY"); remove the dual-key
compatibility and require a single standardized env var instead (use
CLOSEDAI_TOKEN), so update the api_key assignment to only read
getenv("CLOSEDAI_TOKEN") and change the error in the BadParameter raised by
batch_bill_summarizer.py to reference that one variable name (api_key, getenv,
CLOSEDAI_TOKEN, and the BadParameter raise are the symbols to edit).
In `@python/prompt_bench/compresion_test.py`:
- Around line 176-179: The code currently uses a fallback chain
getenv("CLOSEDAI_TOKEN") or getenv("OPENAI_API_KEY"), which introduces implicit
precedence; pick a single canonical env var (e.g., CLOSEDAI_TOKEN) and replace
the fallback with a single getenv call for that name, then update the error
message in the typer.BadParameter to reference only the chosen variable; if you
truly need both, explicitly document and implement clear precedence for
CLOSEDAI_TOKEN vs OPENAI_API_KEY rather than relying on the current implicit
chain (refer to the api_key variable and the typer.BadParameter raise to locate
where to change).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 962af8a8-beb4-4c1f-ba21-745c5de44dbb
📒 Files selected for processing (25)
- .gitignore
- overlays/default.nix
- pyproject.toml
- python/prompt_bench/Dockerfile.finetune
- python/prompt_bench/__init__.py
- python/prompt_bench/batch_bill_summarizer.py
- python/prompt_bench/bill_token_compression.py
- python/prompt_bench/compresion_test.py
- python/prompt_bench/container.py
- python/prompt_bench/downloader.py
- python/prompt_bench/finetune.py
- python/prompt_bench/finetune_container.py
- python/prompt_bench/main.py
- python/prompt_bench/models.py
- python/prompt_bench/summarization_prompts.py
- python/prompt_bench/tools/build_finetune_dataset.py
- python/prompt_bench/tools/count_tokens.py
- python/prompt_bench/train.sh
- python/prompt_bench/vllm_client.py
- systems/bob/default.nix
- systems/bob/scripts/zfs.sh
- systems/bob/snapshot_config.toml
- systems/rhapsody-in-green/sunshine.nix
- users/richie/home/gui/vscode/settings.json
- users/richie/systems/bob.nix
💤 Files with no reviewable changes (1)
- users/richie/systems/bob.nix
```toml
prompt-bench-download = "python.prompt_bench.downloader:cli"
finetune = "python.prompt_bench.finetune:cli"
finetune-container = "python.prompt_bench.finetune_container:cli"
build-finetune-dataset = "python.prompt_bench.build_finetune_dataset:cli"
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Verify the actual module path that defines `cli()` for build_finetune_dataset.
set -euo pipefail
echo "== entrypoint in pyproject.toml =="
rg -n 'build-finetune-dataset' pyproject.toml -C2
echo
echo "== candidate files =="
fd -a build_finetune_dataset.py
echo
echo "== cli() definitions =="
rg -n --glob '**/build_finetune_dataset.py' '^\s*def\s+cli\s*\(' -C2
```

Repository: RichieCahill/dotfiles
Length of output: 790
Fix broken console entrypoint for build-finetune-dataset.
Line 34 declares build-finetune-dataset = "python.prompt_bench.build_finetune_dataset:cli", but the cli() function exists only at python/prompt_bench/tools/build_finetune_dataset.py (module path python.prompt_bench.tools.build_finetune_dataset). No shim module exists at python/prompt_bench/build_finetune_dataset.py, so this entrypoint will fail at runtime. Either create a shim that re-exports the cli, or update the entrypoint to reference python.prompt_bench.tools.build_finetune_dataset:cli.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyproject.toml` at line 34, The console entrypoint "build-finetune-dataset"
points to python.prompt_bench.build_finetune_dataset:cli but the actual cli()
lives in python.prompt_bench.tools.build_finetune_dataset; fix by either
creating a shim module python/prompt_bench/build_finetune_dataset.py that
imports and re-exports cli from python.prompt_bench.tools.build_finetune_dataset
(e.g., from .tools.build_finetune_dataset import cli) or by editing
pyproject.toml to change the entrypoint to
python.prompt_bench.tools.build_finetune_dataset:cli so the
build-finetune-dataset entry references the actual module containing cli().
```python
safe_id = safe_filename(bill_id)
request_lines.append(build_request(safe_id, model, compressed_text))
```
Sanitizing bill_id into custom_id can create collisions.
Two different bill ids can map to the same safe_filename() output, and the downstream dataset builder joins by custom_id using dict keys. Once that happens, one request/completion pair overwrites another with no error.
Collision-safe alternative
```diff
+import hashlib
+
 ...
-safe_id = safe_filename(bill_id)
+digest = hashlib.sha1(bill_id.encode("utf-8")).hexdigest()[:8]
+safe_id = f"{safe_filename(bill_id)}-{digest}"
 request_lines.append(build_request(safe_id, model, compressed_text))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/prompt_bench/batch_bill_summarizer.py` around lines 124 - 125, The
current use of safe_id = safe_filename(bill_id) before
request_lines.append(build_request(...)) can cause collisions when different
bill_id values map to the same safe_filename output; change to produce a
collision-safe custom_id (e.g., combine safe_filename(bill_id) with a stable
short hash or unique suffix derived from bill_id) and use that custom_id in
build_request and downstream dataset keys; also keep the original bill_id in the
request metadata so you can trace back to the source. Ensure you update
references to safe_id to the new custom_id variable everywhere
request_lines/dataset building expects custom_id.
```python
SYSTEM_PROMPT = """You are a legislative analyst extracting policy substance from Congressional bill text.

Your job is to compress a bill into a dense, neutral structured summary that captures every distinct policy action — including secondary effects that might be buried in subsections.

EXTRACTION RULES:
- IGNORE: whereas clauses, congressional findings that are purely political statements, recitals, preambles, citations of existing law by number alone, and procedural boilerplate.
```
Wrap the long SYSTEM_PROMPT lines; treefmt is already failing here.
Lines 37 and 40 exceed Ruff's line-length limit, so this file currently fails CI. As per coding guidelines, "Make python code ruff compliant".
🧰 Tools
🪛 GitHub Actions: treefmt
[error] 37-37: ruff formatting check failed (ruff). E501 Line too long (181 > 120).
[error] 40-40: ruff formatting check failed (ruff). E501 Line too long (179 > 120).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/prompt_bench/compresion_test.py` around lines 35 - 40, The
SYSTEM_PROMPT multi-line string contains lines (notably the long lines around
the EXTRACTION RULES and the IGNORE rule) that exceed the project's line-length
limit; update the SYSTEM_PROMPT definition to wrap those long lines so each
physical line is within the Ruff limit (e.g., break the long sentences across
multiple lines inside the triple-quoted string or split into concatenated
shorter strings) while preserving the exact prompt text and newlines; locate the
SYSTEM_PROMPT variable and adjust the string literal so lines that previously
exceeded the limit (around the EXTRACTION RULES and the IGNORE bullet) are
wrapped to comply with the linter.
```python
command = [
    "docker",
    "run",
    "-d",
    "--name",
    CONTAINER_NAME,
    "--device=nvidia.com/gpu=all",
    "--ipc=host",
    "-v",
    f"{model_dir}:/models",
    "-p",
    f"{port}:8000",
    VLLM_IMAGE,
    "--model",
    f"/models/{model}",
    "--served-model-name",
    model,
    "--gpu-memory-utilization",
    str(gpu_memory_utilization),
    "--max-model-len",
    "4096",
]
logger.info("Starting vLLM container with model: %s", model)
result = subprocess.run(command, capture_output=True, text=True, check=False)
if result.returncode != 0:
    msg = f"Failed to start vLLM container: {result.stderr.strip()}"
    raise RuntimeError(msg)
logger.info("vLLM container started: %s", result.stdout.strip()[:12])


def stop_vllm() -> None:
    """Stop and remove the vLLM benchmark container."""
    logger.info("Stopping vLLM container")
    subprocess.run(["docker", "stop", CONTAINER_NAME], capture_output=True, check=False)
    subprocess.run(["docker", "rm", CONTAINER_NAME], capture_output=True, check=False)
    logger.info("vLLM container stopped and removed")


def check_gpu_free() -> None:
    """Warn if GPU-heavy processes (e.g. Ollama) are running."""
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
python - <<'PY'
from pathlib import Path
path = Path("python/prompt_bench/container.py")
for lineno, line in enumerate(path.read_text().splitlines(), 1):
    if "subprocess.run(" in line or '"docker"' in line or '"nvidia-smi"' in line:
        print(f"{lineno}: {line}")
PY
```

Repository: RichieCahill/dotfiles
Length of output: 483
🏁 Script executed:
```shell
# First, let's see the full file to understand imports and context
head -100 python/prompt_bench/container.py
```

Repository: RichieCahill/dotfiles
Length of output: 2577
🏁 Script executed:
```shell
# Check if ruff is available and run it on the file
which ruff && ruff check python/prompt_bench/container.py --select=S603,S607 || echo "Ruff not available; checking manually with ripgrep"
rg -n "subprocess\.run|S60[37]" python/prompt_bench/container.py -A 2 -B 2
```

Repository: RichieCahill/dotfiles
Length of output: 1060
🏁 Script executed:
```shell
# Search for shutil imports to see if it's already used
rg "import shutil|from shutil" python/prompt_bench/container.py
```

Repository: RichieCahill/dotfiles
Length of output: 47
Resolve Ruff S607 violations on subprocess calls by using absolute executable paths.
Lines 52, 62–63, and 69 invoke docker and nvidia-smi as bare executables, triggering S607 (partial path in subprocess call). Resolve each executable once using shutil.which() and pass the absolute path to subprocess.run(). As per coding guidelines, "Make python code ruff compliant".
🧰 Tools
🪛 Ruff (0.15.9)
[error] 52-52: subprocess call: check for execution of untrusted input
(S603)
[error] 62-62: subprocess call: check for execution of untrusted input
(S603)
[error] 62-62: Starting a process with a partial executable path
(S607)
[error] 63-63: subprocess call: check for execution of untrusted input
(S603)
[error] 63-63: Starting a process with a partial executable path
(S607)
[error] 70-70: Starting a process with a partial executable path
(S607)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/prompt_bench/container.py` around lines 29 - 74, The subprocess calls
use bare executables and trigger Ruff S607; resolve the absolute paths once with
shutil.which and use those absolute paths in the command lists for start (the
docker run command built in the function that starts vLLM), stop_vllm (the
subprocess.run calls that pass ["docker", "stop", CONTAINER_NAME] and ["docker",
"rm", CONTAINER_NAME]) and check_gpu_free (the ["nvidia-smi", ...] call). Add
code to call shutil.which("docker") and shutil.which("nvidia-smi") (store e.g.
DOCKER_PATH and NSMI_PATH), raise a clear RuntimeError or log+exit if which
returns None, and replace the first element of each subprocess command list with
the corresponding absolute path so all subprocess.run invocations reference the
resolved executable.
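A sketch of the one-time resolution; `resolve_executable` is an illustrative helper name, and the docker/nvidia-smi usage is shown only in comments since neither binary is guaranteed to exist on a given host:

```python
import shutil


def resolve_executable(name: str) -> str:
    """Return the absolute path of an executable, or fail loudly at startup."""
    path = shutil.which(name)
    if path is None:
        msg = f"required executable not found on PATH: {name}"
        raise RuntimeError(msg)
    return path


# Resolved once at import time; every subprocess.run() command list then starts
# with an absolute path, which satisfies Ruff S607, e.g.:
#   DOCKER_PATH = resolve_executable("docker")
#   subprocess.run([DOCKER_PATH, "stop", CONTAINER_NAME], capture_output=True, check=False)
```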
python/prompt_bench/container.py
Outdated
```python
def check_gpu_free() -> None:
    """Warn if GPU-heavy processes (e.g. Ollama) are running."""
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        logger.warning("Could not query GPU processes: %s", result.stderr.strip())
        return
    processes = result.stdout.strip()
    if processes:
        logger.warning("GPU processes detected:\n%s", processes)
        logger.warning("Consider stopping Ollama (sudo systemctl stop ollama) before benchmarking")
```
check_gpu_free() never gives callers a way to stop.
Right now a busy GPU only emits warnings, but both benchmark entry points keep running afterward. That makes timings noisy and can turn the fine-tune path into an avoidable OOM/failure loop. Return a boolean or raise here unless the caller explicitly opts into continuing.
🧰 Tools
🪛 Ruff (0.15.9)
[error] 70-70: Starting a process with a partial executable path
(S607)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/prompt_bench/container.py` around lines 67 - 81, check_gpu_free
currently only logs warnings, letting benchmark entry points continue and risk
noisy timings or OOMs; change check_gpu_free to return a boolean (e.g., True
when GPU is free, False when busy) or raise a specific exception (e.g.,
GPUBusyError) instead of just logging, and update the benchmark entry points
that call check_gpu_free to check the return value or catch the exception and
abort early (or proceed only if an explicit opt-in/force flag is provided).
Ensure the function signature (check_gpu_free) and its callers are updated
consistently and that log messages remain informative while control flow is
handled by the return value/exception.
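One possible shape, with the nvidia-smi output passed in as a string so the decision logic stands alone (the `processes_csv` and `force` parameters are illustrative seams, not from the original code):

```python
import logging

logger = logging.getLogger(__name__)


class GPUBusyError(RuntimeError):
    """Raised when GPU compute processes are detected before a benchmark."""


def check_gpu_free(processes_csv: str, *, force: bool = False) -> bool:
    """Return True when no compute processes are listed; raise unless forced."""
    processes = processes_csv.strip()
    if not processes:
        return True
    logger.warning("GPU processes detected:\n%s", processes)
    if force:
        return False  # caller explicitly opted into running on a busy GPU
    raise GPUBusyError("GPU busy; stop other workloads or pass an explicit force flag")
```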
```python
def wait_ready(self, max_wait: int) -> None:
    """Poll /v1/models until the server is ready or timeout."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        try:
            response = self._client.get("/v1/models")
            if response.is_success:
                logger.info("vLLM server is ready")
                return
        except httpx.TransportError:
            pass
        time.sleep(READY_POLL_INTERVAL)
```
🧩 Analysis chain
🌐 Web query:
In httpx, does a client-level timeout apply to every request unless a per-request timeout override is provided?
💡 Result:
Yes, in httpx, a client-level timeout applies to every request made with that client by default, unless a per-request timeout override is provided.
wait_ready() can exceed max_wait because the client's global timeout applies to each request.
Line 34 uses the client's default timeout (set at Line 27). When that timeout is large, a single blocked probe can exceed max_wait entirely, making startup timeout nondeterministic.
Proposed fix
def wait_ready(self, max_wait: int) -> None:
"""Poll /v1/models until the server is ready or timeout."""
deadline = time.monotonic() + max_wait
while time.monotonic() < deadline:
try:
- response = self._client.get("/v1/models")
+ response = self._client.get("/v1/models", timeout=5.0)
if response.is_success:
logger.info("vLLM server is ready")
return
except httpx.TransportError:
        pass
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def wait_ready(self, max_wait: int) -> None: | |
| """Poll /v1/models until the server is ready or timeout.""" | |
| deadline = time.monotonic() + max_wait | |
| while time.monotonic() < deadline: | |
| try: | |
| response = self._client.get("/v1/models") | |
| if response.is_success: | |
| logger.info("vLLM server is ready") | |
| return | |
| except httpx.TransportError: | |
| pass | |
| time.sleep(READY_POLL_INTERVAL) | |
| def wait_ready(self, max_wait: int) -> None: | |
| """Poll /v1/models until the server is ready or timeout.""" | |
| deadline = time.monotonic() + max_wait | |
| while time.monotonic() < deadline: | |
| try: | |
| response = self._client.get("/v1/models", timeout=5.0) | |
| if response.is_success: | |
| logger.info("vLLM server is ready") | |
| return | |
| except httpx.TransportError: | |
| pass | |
| time.sleep(READY_POLL_INTERVAL) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/prompt_bench/vllm_client.py` around lines 29 - 40, wait_ready() can
exceed max_wait because each _client.get() uses the client's global timeout;
change the probe to compute remaining = deadline - time.monotonic() before each
request, bail if remaining <= 0, and pass a per-request timeout equal to
min(remaining, a safe cap) to _client.get(...) so a single blocked probe cannot
exceed max_wait; ensure you pass the per-request timeout value supported by
httpx.get and continue to catch httpx.TransportError (and timeout exceptions) as
before, using READY_POLL_INTERVAL between attempts.
| sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror | ||
| sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage | ||
| # storage datasets | ||
| sudo zfs create storage/models -o recordsize=1M |
🧩 Analysis chain
🏁 Script executed:
# First, locate the referenced Python files
find . -path "*/python/prompt_bench/models.py" -o -path "*/python/prompt_bench/finetune_container.py" | head -20
Repository: RichieCahill/dotfiles
Length of output: 140
🏁 Script executed:
# Also search more broadly for these files
fd -t f "models.py|finetune_container.py" | grep -E "prompt_bench|benchmark"
Repository: RichieCahill/dotfiles
Length of output: 136
🏁 Script executed:
# Check the zfs.sh file content to understand the full context
find . -path "*/systems/bob/scripts/zfs.sh" -exec cat -n {} \;
Repository: RichieCahill/dotfiles
Length of output: 529
🏁 Script executed:
# Check models.py around line 12
sed -n '1,20p' python/prompt_bench/models.py
Repository: RichieCahill/dotfiles
Length of output: 517
🏁 Script executed:
# Check finetune_container.py around lines 16-22
sed -n '1,30p' python/prompt_bench/finetune_container.py
Repository: RichieCahill/dotfiles
Length of output: 869
🏁 Script executed:
# Search for references to /zfs/models/hf throughout the codebase
rg -i "/zfs/models" --type-list | head -5
rg -i "/zfs/models" 2>/dev/null | head -20Repository: RichieCahill/dotfiles
Length of output: 793
🏁 Script executed:
# Search for references to /zfs/storage in the codebase
rg -i "/zfs/storage" 2>/dev/null | head -20Repository: RichieCahill/dotfiles
Length of output: 906
🏁 Script executed:
# Check if there are any other zfs dataset creation commands or references to storage/models
rg "storage/models" 2>/dev/nullRepository: RichieCahill/dotfiles
Length of output: 139
Model path mismatch between provisioning script and application defaults.
This script provisions the ZFS dataset at /zfs/storage/models, but all application code expects models at /zfs/models/hf:
- `python/prompt_bench/models.py` line 12: `model_dir: str = "/zfs/models/hf"`
- `python/prompt_bench/finetune_container.py` line 20: `DEFAULT_HF_CACHE = Path("/zfs/models/hf")`
- NixOS config (`systems/bob/hardware.nix`) also configures `/zfs/models` as the mount point
Without alignment, model discovery and caching will fail or use divergent paths.
💡 Proposed fix
-# storage datasets
-sudo zfs create storage/models -o recordsize=1M
+# storage datasets (align with Python defaults)
+sudo zfs create -o mountpoint=/zfs/models -o recordsize=1M storage/models
+sudo zfs create storage/models/hf
📝 Committable suggestion
| sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror | |
| sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage | |
| # storage datasets | |
| sudo zfs create storage/models -o recordsize=1M | |
| sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror | |
| sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage | |
| # storage datasets (align with Python defaults) | |
| sudo zfs create -o mountpoint=/zfs/models -o recordsize=1M storage/models | |
| sudo zfs create storage/models/hf |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@systems/bob/scripts/zfs.sh` around lines 6 - 11, The ZFS dataset path in the
provisioning script is misaligned with the application defaults: change the
zpool/zfs dataset creation so the dataset is mounted at /zfs/models (and create
the subpath /zfs/models/hf) instead of /zfs/storage/models; specifically update
the zpool/zfs create commands in this script (the two sudo zpool create
invocations and the sudo zfs create storage/models line) to create/mount the
pool and dataset under /zfs/models and ensure a dataset or directory for
/zfs/models/hf exists so it matches python/prompt_bench.models:model_dir and
finetune_container:DEFAULT_HF_CACHE and the NixOS hardware.nix mount point.
| sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror | ||
| sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage |
🧩 Analysis chain
🏁 Script executed:
cat -n systems/bob/scripts/zfs.sh
Repository: RichieCahill/dotfiles
Length of output: 529
🌐 Web query:
zpool create mirror syntax requirements devices arguments
💡 Result:
The syntax for creating a ZFS mirror pool with zpool create is: zpool create [options] poolname mirror device1 device2 [mirror device3 device4 ...] Key requirements and arguments: - poolname: Must begin with a letter; alphanumeric, underscore, dash, colon, space, period allowed. Reserved names: mirror, raidz, draid, spare, log (and prefixes). - mirror keyword: Specifies a mirrored vdev (virtual device). Repeat for multiple mirrors (striped mirrors). - devices: At least 2 block devices per mirror (whole disks like /dev/sda, partitions like /dev/sda1, or files). Devices must be unused, same size recommended, accessible. Use -f to force if in use. - Minimum: 2 devices for single 2-way mirror. Options (common): - -f: Force use of in-use devices. - -m mountpoint: Set mount point (default /poolname). - -o property=value: Set pool properties (e.g., ashift=12 for 4K sectors). - -d: Dry run. - -R root: Alternate root for cachefile=none, altroot. Example single mirror: zpool create tank mirror c1d0 c2d0 Example two mirrors (striped): zpool create tank mirror c1d0 c2d0 mirror c3d0 c4d0 Devices validated for availability and consistent redundancy. Data striped across multiple mirrors if present.
Citations:
- 1: https://docs.oracle.com/cd/E19253-01/819-5461/gazhv/index.html
- 2: https://docs.oracle.com/en/operating-systems/solaris/oracle-solaris/11.4/manage-zfs/creating-mirrored-storage-pool.html
- 3: https://www.gsp.com/cgi-bin/man.cgi?section=8&topic=ZPOOL-CREATE
- 4: https://openzfs.github.io/openzfs-docs/man/master/8/zpool-create.8.html
🏁 Script executed:
ls -la systems/bob/
cat systems/bob/README.md 2>/dev/null || echo "No README found"
Repository: RichieCahill/dotfiles
Length of output: 563
🏁 Script executed:
cat -n systems/bob/hardware.nix
echo "---"
cat -n systems/bob/default.nix
Repository: RichieCahill/dotfiles
Length of output: 3591
zpool create commands are syntactically invalid and will fail.
Line 6 ends with mirror but provides no device paths. Line 7 attempts to create the same pool without any vdev specification. Both commands violate ZFS syntax: zpool create poolname mirror device1 device2 ... requires at least two devices per mirror. The script will fail during provisioning.
Proposed fix
-sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror
-sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage
+# Replace <device1> and <device2> with actual block devices (e.g., /dev/sda, /dev/sdb)
+sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror <device1> <device2>
📝 Committable suggestion
| sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror | |
| sudo zpool create -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage | |
| # Replace <device1> and <device2> with actual block devices (e.g., /dev/sda, /dev/sdb) | |
| sudo zpool create -f -o ashift=12 -O acltype=posixacl -O atime=off -O dnodesize=auto -O xattr=sa -O compression=zstd -m /zfs/storage storage mirror <device1> <device2> |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@systems/bob/scripts/zfs.sh` around lines 6 - 7, The two zpool create
invocations are invalid because they omit vdev device paths (the lines
containing "sudo zpool create -f -o ... storage mirror" and "sudo zpool create
-o ... storage"); fix by providing explicit device paths for the vdevs or by
removing the duplicate and choosing the correct topology: for a mirrored pool
use "zpool create ... storage mirror <device1> <device2> [<deviceN>...]" with at
least two device nodes (prefer stable /dev/disk/by-id names), or if you intend a
single-disk pool create without the "mirror" keyword and supply one device;
update the script to use the chosen valid command and remove the invalid
alternate create line.
| hardware = { | ||
| firmwareCompression = "none"; | ||
| firmware = [ | ||
| (pkgs.runCommandLocal "virtual-display-edid" { } '' | ||
| mkdir -p $out/lib/firmware/edid | ||
| cp ${./edid/virtual-display.bin} $out/lib/firmware/edid/virtual-display.bin | ||
| '' | ||
| ) | ||
| ]; | ||
| '') | ||
| ]; | ||
| }; |
🧩 Analysis chain
🌐 Web query:
For the nixpkgs/NixOS version pinned by this repo, does hardware.firmwareCompression = "none"apply globally to all entries inhardware.firmware? Also, is compressFirmware = false on a firmware derivation still supported/recommended for per-package control?
💡 Result:
Yes, hardware.firmwareCompression = "none" applies globally to all entries in hardware.firmware. It controls the compression method used when building the firmware environment via pkgs.buildEnv, where each package in the list is passed through a compressFirmware function. This function checks the global option first: if set to "none", no compression is applied to any firmware package, regardless of individual settings. Otherwise, it respects a per-package compressFirmware ? false attribute if present. The compressFirmware = false attribute on a firmware derivation is still supported as a fallback for per-package control when global compression is enabled (e.g., "xz" or "zstd"). It is not needed (and ignored) when global is "none". This is the recommended way for selective disabling, as seen in discussions around PR #172820. This behavior holds in recent nixpkgs (e.g., NixOS 24.11/unstable), with kernel-dependent defaults: "none" pre-5.3, "xz" 5.3-5.18, "zstd" 5.19+.
Citations:
- 1: https://github.com/NixOS/nixpkgs/blob/ed142ab1b3a092c4d149245d0c4126a5d7ea00b0/nixos/modules/services/hardware/udev.nix
- 2: https://mynixos.com/nixpkgs/option/hardware.firmwareCompression
- 3: Compressed firmware NixOS/nixpkgs#172820
- 4: https://search.nixos.org/options?query=hardware.firmware&show=hardware.firmware
- 5: https://discourse.nixos.org/t/how-does-nixos-link-built-packages-firmware/30663
hardware.firmwareCompression = "none" affects all firmware, not just this EDID blob.
This is a global policy change at line 16 that disables compression for every entry in hardware.firmware, not just virtual-display.bin. If other firmware packages exist in this configuration (now or in future), they will also become uncompressed.
For scoped control of only the EDID derivation, remove the global setting and add compressFirmware = false to the derivation itself. The per-package attribute is still supported and is the recommended approach for selective disabling when global compression is enabled.
Suggested scoped alternative
hardware = {
- firmwareCompression = "none";
firmware = [
- (pkgs.runCommandLocal "virtual-display-edid" { } ''
+ (pkgs.runCommandLocal "virtual-display-edid" {
+ compressFirmware = false;
+ } ''
mkdir -p $out/lib/firmware/edid
cp ${./edid/virtual-display.bin} $out/lib/firmware/edid/virtual-display.bin
'')
];
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@systems/rhapsody-in-green/sunshine.nix` around lines 15 - 23, Remove the
global hardware.firmwareCompression = "none" setting and instead disable
compression only for the EDID derivation: locate the runCommandLocal
"virtual-display-edid" derivation that produces virtual-display.bin (referenced
under hardware.firmware) and add the per-derivation attribute compressFirmware =
false to that derivation; leave hardware.firmwareCompression unset so other
firmware entries remain unaffected.
| "docker.extension.enableComposeLanguageServer": false, | ||
| } |
🧩 Analysis chain
🏁 Script executed:
# Check if file exists and read the lines around 102-103
if [ -f "users/richie/home/gui/vscode/settings.json" ]; then
echo "File found. Reading lines 95-110 for context:"
sed -n '95,110p' "users/richie/home/gui/vscode/settings.json" | cat -n
else
echo "File not found at users/richie/home/gui/vscode/settings.json"
# Try to find it
find . -name "settings.json" -type f | head -20
fi
Repository: RichieCahill/dotfiles
Length of output: 504
Remove the trailing comma on the last property to resolve parse failure.
Line 102 has a trailing comma after false, which is invalid JSON syntax. The last property in a JSON object must not have a trailing comma before the closing brace on line 103.
💡 Proposed fix
- "docker.extension.enableComposeLanguageServer": false,
+ "docker.extension.enableComposeLanguageServer": false
}
📝 Committable suggestion
| "docker.extension.enableComposeLanguageServer": false, | |
| } | |
| "docker.extension.enableComposeLanguageServer": false | |
| } |
🧰 Tools
🪛 Biome (2.4.10)
[error] 102-102: End of file expected
(parse)
[error] 102-102: End of file expected
(parse)
[error] 102-102: End of file expected
(parse)
[error] 102-103: End of file expected
(parse)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@users/richie/home/gui/vscode/settings.json` around lines 102 - 103, The JSON
parse error is caused by a trailing comma after the last property
"docker.extension.enableComposeLanguageServer": false, — remove the comma so the
final property reads without a trailing comma and the closing brace remains;
ensure the property name "docker.extension.enableComposeLanguageServer" is left
intact and validate the JSON after editing.
Tested on a sample of 100 bills matching the distribution of our data. Compression saves ~11.5% on prompt tokens; completion and reasoning tokens are roughly equal across the two sets.

|  | prompt | completion | reasoning | total |
| --- | --- | --- | --- | --- |
| compressed | 349,460 | 157,110 | 112,128 | 506,570 |
| uncompressed | 394,948 | 154,710 | 110,080 | 549,658 |
| delta | −45,488 | +2,400 | +2,048 | −43,088 |
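The ~11.5% figure can be checked directly from the prompt-token counts above:

```python
# Prompt-token totals from the 100-bill sample
prompt_compressed = 349_460
prompt_uncompressed = 394_948

saved = prompt_uncompressed - prompt_compressed  # 45,488 tokens
savings = saved / prompt_uncompressed
print(f"{savings:.1%}")  # → 11.5%
```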
batch bill summarizer sends a batch api call to gpt
ac02d40 to
a80de99
Summary by CodeRabbit
Release Notes