Merge docker.sh scripts + enhance live monitor and sync #263
Closed
Ultimate-Storm wants to merge 3 commits into main from
Embed live_sync integration directly in the master_template.yml docker_cln_sh template so that each client startup kit produces a single docker.sh with all flags. _injectLiveSyncIntoStartupKits.sh now only copies the helper files (sync.conf, build_heartbeat.sh, live_sync.sh) instead of creating a wrapper that delegates to docker_original.sh. Live sync auto-starts for --local_training (foreground, killed on exit) and --start_client (nohup daemon). All other modes are unchanged. If live_sync.sh is not present, the hooks are a graceful no-op.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
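The hooks in docker.sh are shell code; the Python below is only a sketch of the decision logic described above, assuming the mode is passed as a single flag string. The function name live_sync_plan and the "foreground"/"daemon" labels are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Hypothetical model of the docker.sh live-sync hooks: only these two modes
# start live sync; every other mode (e.g. --dummy_training, --preflight_check)
# leaves it off.
LIVE_SYNC_MODES = {
    "--local_training": "foreground",  # tied to the training run, stopped when it exits
    "--start_client": "daemon",        # detached background process (nohup-style)
}


def live_sync_plan(mode: str, kit_dir: Path) -> Optional[str]:
    """Return how live sync would be launched for `mode`, or None for a no-op."""
    if not (kit_dir / "live_sync.sh").is_file():
        # Helper files were not injected into this kit: the hooks do nothing.
        return None
    return LIVE_SYNC_MODES.get(mode)
```

Keeping the mode table separate from the "is the helper present" check mirrors the commit's intent: adding live sync to a kit never changes the behavior of the other modes.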
…ads, version tracking
- server_tools/app.py: Major overhaul of the MediSwarm Live Monitor web viewer
- Add filter bar (site, mode, status, job_id) with default sort by newest
- Add status inference (no heartbeat for >5 min -> stale, >1 hr -> presumed finished; heartbeat_final.json takes precedence)
- Add file download endpoint for all run artifacts
- Add job grouping for swarm runs
- Add kit version column from heartbeat data
- Add training summary extraction (best val metrics, epoch count, FL rounds)
- Add TensorBoard metric parsing and inline charts via tbparse
- Add enriched detail page with full file inventory, checkpoints, and model cards
- Add stats bar with running/finished/stale/site counts
- Add server-side file paths with download buttons
- kit_live_sync/build_heartbeat.sh: Add kit_version field, extracted from the MEDISWARM_VERSION value baked into docker.sh at build time
- kit_live_sync/live_sync.sh: Fix duplicate entries and empty heartbeat fields
- Export SCRATCHDIR before calling build_heartbeat.sh so run_dir is populated
- Track the current run and finalize old runs with heartbeat_final.json when a new local training run starts (prevents stale "running" entries)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
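build_heartbeat.sh is a shell script; the Python below only sketches the kit_version extraction idea. It assumes docker.sh contains a plain assignment line such as MEDISWARM_VERSION=1.2.3 (optionally quoted or exported). The helper names and the "unknown" fallback are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed assignment forms in docker.sh:
#   MEDISWARM_VERSION=1.2.3   MEDISWARM_VERSION="1.2.3"   export MEDISWARM_VERSION=1.2.3
_VERSION_RE = re.compile(
    r"""^\s*(?:export\s+)?MEDISWARM_VERSION=["']?([^"'\s]+)["']?""", re.MULTILINE
)


def extract_kit_version(docker_sh: str) -> Optional[str]:
    """Return the first MEDISWARM_VERSION value assigned in the script text."""
    match = _VERSION_RE.search(docker_sh)
    return match.group(1) if match else None


def kit_version_for(kit_dir: Path) -> str:
    """Read docker.sh from a startup kit; fall back to 'unknown' (hypothetical)."""
    script = kit_dir / "docker.sh"
    if not script.is_file():
        return "unknown"
    return extract_kit_version(script.read_text()) or "unknown"
```

Anchoring the pattern at the start of a line keeps references such as echo "$MEDISWARM_VERSION" from being mistaken for the assignment.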
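A minimal Python sketch of the live_sync.sh fix for stale "running" entries: when a new local training run starts, the previous run gets a heartbeat_final.json so the monitor stops treating it as live. It assumes each run directory holds a heartbeat.json that can be copied with the status forced to "finished"; the file name heartbeat.json, the finalized_at field, and the function name are hypothetical, not taken from the PR.

```python
import json
import time
from pathlib import Path
from typing import Optional


def finalize_previous_run(previous: Optional[Path], current: Path) -> bool:
    """Write heartbeat_final.json for `previous` once a new run has started.

    Returns True if a final heartbeat was written.
    """
    if previous is None or previous == current:
        return False  # no earlier run, or still the same run
    final_path = previous / "heartbeat_final.json"
    if final_path.exists():
        return False  # already finalized; never overwrite a real final heartbeat
    heartbeat_path = previous / "heartbeat.json"  # hypothetical file name
    data = json.loads(heartbeat_path.read_text()) if heartbeat_path.exists() else {}
    data["status"] = "finished"
    data["finalized_at"] = int(time.time())  # hypothetical field
    final_path.write_text(json.dumps(data, indent=2))
    return True
```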
Contributor
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
…hart is present
Chart.js CDN script was only included inside the console metrics chart block, so TensorBoard charts would try to use Chart() without the library loaded when console metrics were absent.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
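The fix amounts to loading the library whenever any chart is drawn, not only when the console-metrics chart is. A small Python sketch of that rule for a server-rendered page; CHART_JS_SRC is a placeholder, not the URL the viewer actually uses, and the function name is hypothetical.

```python
# Placeholder only; the real page pins its own Chart.js CDN URL.
CHART_JS_SRC = "https://cdn.example.invalid/chart.js"


def chart_scripts(has_console_chart: bool, has_tb_chart: bool) -> str:
    """Emit the Chart.js <script> tag exactly once if any chart will be rendered."""
    if not (has_console_chart or has_tb_chart):
        return ""
    return f'<script src="{CHART_JS_SRC}"></script>'
```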
if not events:
    return {"scalars": []}

# Parse the directory containing events
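A sketch of the two steps around this excerpt: locating TensorBoard event files (TensorBoard names them events.out.tfevents.*) and shaping scalar rows into one chart series per tag. The rows are assumed to come from a parser such as tbparse, reduced to (tag, step, value) tuples; the function names and the payload keys other than "scalars" are hypothetical.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def find_event_files(run_dir: Path) -> List[Path]:
    """Recursively collect TensorBoard event files under a run directory."""
    return sorted(run_dir.rglob("events.out.tfevents.*"))


def scalars_payload(rows: Iterable[Tuple[str, int, float]]) -> Dict[str, list]:
    """Group (tag, step, value) rows into one step-ordered series per tag."""
    series: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for tag, step, value in rows:
        series[tag].append((int(step), float(value)))
    scalars = []
    for tag in sorted(series):
        points = sorted(series[tag])  # order by step for plotting
        scalars.append({
            "tag": tag,
            "steps": [step for step, _ in points],
            "values": [value for _, value in points],
        })
    return {"scalars": scalars}  # empty input -> {"scalars": []}
```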
    media_type="application/octet-stream",
)
return FileResponse(
    path=str(target),
    filename=target.name,
if not target.exists() or not target.is_file():
    raise HTTPException(status_code=404, detail="File not found")
if has_final:
    try:
        final = json.loads((run_dir / "heartbeat_final.json").read_text())
- If status is "running" but heartbeat is >1 hour old -> "finished" (presumed)
- Otherwise use heartbeat status as-is
"""
has_final = (run_dir / "heartbeat_final.json").exists()
    cls = "badge-finished"
elif status in ("error", "failed"):
    cls = "badge-error"
elif status == "stale":
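The status rules from the commit message (stale after 5 minutes, presumed finished after 1 hour, heartbeat_final.json always wins) can be sketched as a single pure function. This assumes the age-based overrides apply only while the heartbeat still says "running", as the docstring excerpt states for the 1-hour rule; the function name and signature are hypothetical.

```python
from typing import Optional

STALE_AFTER_S = 5 * 60      # no heartbeat for >5 min -> "stale"
FINISHED_AFTER_S = 60 * 60  # no heartbeat for >1 hour -> presumed "finished"


def infer_status(
    heartbeat_status: str,
    heartbeat_age_s: float,
    final_status: Optional[str] = None,
) -> str:
    """Resolve the status shown for a run.

    final_status is the status read from heartbeat_final.json, if present.
    """
    if final_status is not None:
        return final_status  # the final heartbeat always takes precedence
    if heartbeat_status == "running":
        # Check the longer threshold first so old runs are not shown as stale.
        if heartbeat_age_s > FINISHED_AFTER_S:
            return "finished"
        if heartbeat_age_s > STALE_AFTER_S:
            return "stale"
    return heartbeat_status  # otherwise use the heartbeat status as-is
```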
Comment on lines +1117 to +1121
# ---------------------------------------------------------------------------
# File download endpoint
# ---------------------------------------------------------------------------
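The excerpts show the endpoint returning FileResponse(path=str(target), filename=target.name, media_type="application/octet-stream") after a 404 check. A download endpoint that takes a user-supplied path usually also needs to confirm the resolved target stays inside the run directory. The sketch below shows one common way to do that with pathlib; the function name and the choice of PermissionError/FileNotFoundError (which an endpoint would map to 403/404) are assumptions, not the PR's code.

```python
from pathlib import Path


def resolve_download(base_dir: Path, relative: str) -> Path:
    """Map a user-supplied relative path to an existing file inside base_dir.

    Raises PermissionError if the path escapes base_dir (e.g. "../x" or an
    absolute path) and FileNotFoundError if no such file exists.
    """
    base = base_dir.resolve()
    target = (base / relative).resolve()  # collapses ".." and follows symlinks
    if target != base and base not in target.parents:
        raise PermissionError("path escapes the run directory")
    if not target.exists() or not target.is_file():
        raise FileNotFoundError("File not found")
    return target
```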
Ultimate-Storm (author):
Closing: these changes were already merged via PR #266 and subsequent commits to main.
Summary
- … docker.sh instead of a wrapper docker.sh + docker_original.sh. All flags (--dummy_training, --preflight_check, --local_training, --start_client, --job, --model_name, etc.) work directly. Live sync for the MediSwarm Live Monitor is started automatically for the --local_training and --start_client modes.
- … heartbeat_final.json so it doesn't linger as "running" forever. Also fixed SCRATCHDIR not being exported before build_heartbeat.sh, which caused empty heartbeat fields.
- build_heartbeat.sh now extracts MEDISWARM_VERSION from the startup kit's docker.sh and includes it in the heartbeat JSON.

Files Changed
- docker_config/master_template.yml: … docker_cln_sh template
- scripts/build/_injectLiveSyncIntoStartupKits.sh: …
- server_tools/app.py: …
- kit_live_sync/build_heartbeat.sh: … kit_version field
- kit_live_sync/live_sync.sh: …

Test plan
- … docker.sh generated (no docker_original.sh)
- … --log_dataset_details on DL0: completed successfully

🤖 Generated with Claude Code