
RFC: Streamlining dashboard release and deployment #35

Open
annakrystalli wants to merge 1 commit into main from ak/rfc/dashboard-release-deployment

Conversation

@annakrystalli
Collaborator

Summary

  • Proposes a phased approach to reduce manual steps and institutional knowledge barriers in the hubverse dashboard release/deployment pipeline
  • Phase 1 (near-term): Claude Code skills encoding release workflows, automated renv.lock updates
  • Phase 2 (medium-term): Replace brittle Docker CI with independent validation, dedicated test hub with e2e smoke tests, rework config schema handling, evaluate eliminating Docker indirection
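For the automated renv.lock update proposed in Phase 1, a minimal sketch of what a scheduled GitHub Actions workflow might look like. This is illustrative only — the workflow name, schedule, branch name, and PR action are assumptions, not part of the RFC:

```yaml
# Hypothetical workflow: periodically refresh renv.lock and open a PR
# with the changes for review, rather than updating it by hand.
name: update-renv-lock
on:
  schedule:
    - cron: "0 6 * * 1"   # weekly, Monday 06:00 UTC (assumed cadence)
  workflow_dispatch: {}    # allow manual runs

jobs:
  update-lockfile:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      - name: Restore, update packages, and snapshot lockfile
        run: |
          Rscript -e 'renv::restore(); renv::update(); renv::snapshot()'
      - name: Open pull request with lockfile changes
        uses: peter-evans/create-pull-request@v6
        with:
          title: "chore: automated renv.lock update"
          branch: auto/renv-lock-update
```

Routing the update through a PR (rather than committing directly) keeps a human review step in the loop while removing the manual lockfile bookkeeping.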

Context

The team's first post-departure schema deployment (hubPredEvalsData#33) exposed structural problems in the multi-repo release chain. With the eval-metrics-expansion project planning multiple schema changes, these issues need addressing before the next deployment cycle.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Member

@nickreich nickreich left a comment


This is a great summary of the current status of things, and a helpful take on some concrete next steps. Some comments and questions:

  1. From a quick read over the whole RFC, it wasn't clear to me what the implications would be in the case of a breaking schema change. For example, if we introduce a breaking schema change, do we have to go through all dashboards and force a change to their config files? Or will they be able to limp along with a previous version? If we do need to make changes, then I think part of the documented process should be how to ensure all those changes get made.
  2. Claude skills still feel new to me. While they do seem like a prime candidate for a way to "automate" some of these processes, they feel a bit unknown to me. They will probably be easier to maintain, but we would want to make sure there is clear documentation so that whenever a change is made to the process, it is reflected in skill updates.
  3. In general, could we be more specific about whether the plan is to complete both Phase 1 and Phase 2, or just Phase 1, prior to starting on the new eval schema changes?

@annakrystalli
Collaborator Author

annakrystalli commented Mar 10, 2026

Thanks for the speedy review @nickreich!

Re: breaking schema changes — Yes, breaking changes will still require updating config files across all dashboards. This is effectively the status quo — while the dashboard tools don't explicitly enforce :latest, they achieve the same effect indirectly (e.g., hubPredEvalsData hardcodes a minimum_version that rejects configs below v1.0.1, so once the tool auto-upgrades via :latest, old configs are rejected). The RFC proposes we lean into this pattern rather than fight it.

The alternative (version-pinnable tool references + backwards compatibility) was considered and rejected (see "Other Options Considered" 1) because it would require significantly more effort to maintain compatibility across the tool chain (different languages, Docker images, config schemas, JS modules) and would push active version management onto hub admins.

What the RFC does propose to improve the breaking change experience:

  • /dashboard-config-migrate skill to guide and automate updating configs across all dashboards after a schema change
  • Reworking config schema handling (Phase 2) so that breaking changes produce clear config validation errors pinpointing exactly what needs fixing, rather than the current blunt minimum_version rejection that just says "your version is too old"
  • End-to-end smoke tests to verify the full pipeline works with updated configs before release
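To make the second point concrete, here is a hypothetical sketch (in Python, purely for illustration — hubPredEvalsData is not a Python package, and the field names and version constant below are invented) contrasting the current blunt minimum_version gate with the kind of pinpointed, per-field validation the Phase 2 rework is aiming for:

```python
# Illustrative contrast: a blunt version gate vs. field-level validation.
# MINIMUM_VERSION, REQUIRED_FIELDS, and the config fields are assumptions,
# not hubPredEvalsData's actual API.

MINIMUM_VERSION = (1, 0, 1)
REQUIRED_FIELDS = {"schema_version", "targets", "metrics"}

def parse_version(v: str) -> tuple:
    """Turn a string like 'v1.0.0' into a comparable tuple (1, 0, 0)."""
    return tuple(int(part) for part in v.lstrip("v").split("."))

def blunt_check(config: dict) -> list:
    """Current-style gate: a single opaque rejection with no guidance."""
    if parse_version(config.get("schema_version", "v0.0.0")) < MINIMUM_VERSION:
        return ["your version is too old"]
    return []

def pinpointed_check(config: dict) -> list:
    """Proposed-style validation: report every problem with its location."""
    errors = []
    for field in sorted(REQUIRED_FIELDS - config.keys()):
        errors.append(f"config is missing required field '{field}'")
    if parse_version(config.get("schema_version", "v0.0.0")) < MINIMUM_VERSION:
        errors.append(
            "schema_version is below the minimum supported version "
            f"{'.'.join(map(str, MINIMUM_VERSION))}; run the config migration"
        )
    return errors

# An old config that predates a breaking change and lacks a new field.
old_config = {"schema_version": "v1.0.0", "targets": ["wk inc flu hosp"]}
print(blunt_check(old_config))       # one unhelpful message
print(pinpointed_check(old_config))  # actionable, per-field messages
```

The difference is not the acceptance decision (both reject the old config) but the error surface: the pinpointed version tells the hub admin exactly which fields to fix, which is what a migration skill could then act on.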

Re: Claude Code skills — Fair concern. Worth noting that skills are essentially documented runbooks in executable form — they encode the same step-by-step processes that are currently in the developer docs, but in a way that can be executed and checked interactively. Think of it as an executable script with Claude handholding you through the process, with the opportunity to work through issues that arise or ask questions as you go along. Unlike a plain script that either succeeds or fails, if something unexpected happens mid-way (a test fails, a version doesn't match), Claude can help diagnose and adapt rather than just aborting.

On maintenance: when the underlying process changes, the skill is the thing that gets updated — there's no separate doc to keep in sync. That said, we should make sure each skill links back to the relevant developer docs, and that any process change triggers a skill review. We could add that as an explicit step in the /dashboard-release skill itself (i.e., "if you changed a process, update the relevant skills").

If it would help, I can demo one of the simpler skills (e.g., /dashboard-local-build) early on so the team can see what working with them feels like in practice before committing to the full set.

Re: phasing vs eval-metrics timeline — The plan is to complete Phase 1 before starting on the eval schema changes. The skills and automated renv.lock workflow are lightweight enough to get in place first, and will directly reduce the deployment pain for each subsequent schema change.

For Phase 2, the most impactful items would be replacing the Docker CI comparison tests with independent validation and the dedicated test hub to run them against, since together they unblock confident releases. However, they're more involved, so whether we tackle them before or alongside the eval schema work depends on how quickly we want to move on those changes. The remaining Phase 2 items (schema handling rework, Docker elimination evaluation) can be picked off if and when we decide we want to — they're improvements, not blockers.

@matthewcornell
Member

Echoing @nickreich's comment:

This is a great summary of the current status of things, and a helpful take on some concrete next steps.

Some small comments:

"Aims":

  • should we add something about being able to more easily set up remote staging/previews?

"Claude Code skills for dashboard operations":

  • I suspect it'll take some time for us to become comfortable with these agents. Trust is one issue - how do we know they're correct?

"Automated renv.lock update workflow"

  • Is this necessary for all types of changes? I don't tend to update it unless I have a specific reason.

"Replace comparison-based Docker CI with independent validation"

  • yes!

"Dedicated test hub and end-to-end smoke test"

  • good idea
  • would this operate as staging as well?

"Rework config schema handling in hubPredEvalsData"

  • good idea

"Evaluate eliminating Docker indirection for hubPredEvalsData"

  • that would simplify things a lot
