
RFC: Streamlining dashboard release and deployment #35

Open
annakrystalli wants to merge 1 commit into main from ak/rfc/dashboard-release-deployment

Conversation

@annakrystalli
Collaborator

Summary

  • Proposes a phased approach to reduce manual steps and institutional knowledge barriers in the hubverse dashboard release/deployment pipeline
  • Phase 1 (near-term): Claude Code skills encoding release workflows, automated renv.lock updates
  • Phase 2 (medium-term): Replace brittle Docker CI with independent validation, dedicated test hub with e2e smoke tests, rework config schema handling, evaluate eliminating Docker indirection
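For the automated renv.lock update proposed in Phase 1, a minimal sketch of what a scheduled GitHub Actions workflow might look like. This is illustrative only — the workflow name, schedule, branch name, and PR action are assumptions, not part of the RFC:

```yaml
# Hypothetical workflow: periodically refresh renv.lock and open a PR
# with the changes for review, rather than updating it by hand.
name: update-renv-lock
on:
  schedule:
    - cron: "0 6 * * 1"   # weekly, Monday 06:00 UTC (assumed cadence)
  workflow_dispatch: {}    # allow manual runs

jobs:
  update-lockfile:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      - name: Restore, update packages, and snapshot lockfile
        run: |
          Rscript -e 'renv::restore(); renv::update(); renv::snapshot()'
      - name: Open pull request with lockfile changes
        uses: peter-evans/create-pull-request@v6
        with:
          title: "chore: automated renv.lock update"
          branch: auto/renv-lock-update
```

Routing the update through a PR (rather than committing directly) keeps a human review step in the loop while removing the manual lockfile bookkeeping.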

Context

The team's first post-departure schema deployment (hubPredEvalsData#33) exposed structural problems in the multi-repo release chain. With the eval-metrics-expansion project planning multiple schema changes, these issues need addressing before the next deployment cycle.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Member

@nickreich nickreich left a comment


This is a great summary of the current status of things, and a helpful take on some concrete next steps. Some comments and questions:

  1. From a quick read over the whole RFC, it wasn't clear to me what the implications would be in the case of a breaking schema change. For example, if we introduce a breaking schema change, do we have to go through all dashboards and force a change to their config files? Or will they be able to limp along with a previous version? If we do need to make changes, then I think part of the documented process should be how to ensure all those changes get made.
  2. Claude skills still feel new to me. While they do seem like a prime candidate for a way to "automate" some of these processes, they feel a bit unknown to me. They will probably be easier to maintain, but we would want to make sure there is clear documentation so that whenever a change is made to the process, it is reflected in skill updates.
  3. In general, could we be more specific about whether the plan is to complete both Phase 1 and Phase 2, or just Phase 1, prior to starting on the new eval schema changes?

@annakrystalli
Collaborator Author

annakrystalli commented Mar 10, 2026

Thanks for the speedy review @nickreich!

Re: breaking schema changes — Yes, breaking changes will still require updating config files across all dashboards. This is effectively the status quo — while the dashboard tools don't explicitly enforce :latest, they achieve the same effect indirectly (e.g., hubPredEvalsData hardcodes a minimum_version that rejects configs below v1.0.1, so once the tool auto-upgrades via :latest, old configs are rejected). The RFC proposes we lean into this pattern rather than fight it.

The alternative (version-pinnable tool references + backwards compatibility) was considered and rejected (see "Other Options Considered" 1) because it would require significantly more effort to maintain compatibility across the tool chain (different languages, Docker images, config schemas, JS modules) and would push active version management onto hub admins.

What the RFC does propose to improve the breaking change experience:

  • /dashboard-config-migrate skill to guide and automate updating configs across all dashboards after a schema change
  • Reworking config schema handling (Phase 2) so that breaking changes produce clear config validation errors pinpointing exactly what needs fixing, rather than the current blunt minimum_version rejection that just says "your version is too old"
  • End-to-end smoke tests to verify the full pipeline works with updated configs before release
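To make the second point concrete, here is a hypothetical sketch (in Python, purely for illustration — hubPredEvalsData is not a Python package, and the field names and version constant below are invented) contrasting the current blunt minimum_version gate with the kind of pinpointed, per-field validation the Phase 2 rework is aiming for:

```python
# Illustrative contrast: a blunt version gate vs. field-level validation.
# MINIMUM_VERSION, REQUIRED_FIELDS, and the config fields are assumptions,
# not hubPredEvalsData's actual API.

MINIMUM_VERSION = (1, 0, 1)
REQUIRED_FIELDS = {"schema_version", "targets", "metrics"}

def parse_version(v: str) -> tuple:
    """Turn a string like 'v1.0.0' into a comparable tuple (1, 0, 0)."""
    return tuple(int(part) for part in v.lstrip("v").split("."))

def blunt_check(config: dict) -> list:
    """Current-style gate: a single opaque rejection with no guidance."""
    if parse_version(config.get("schema_version", "v0.0.0")) < MINIMUM_VERSION:
        return ["your version is too old"]
    return []

def pinpointed_check(config: dict) -> list:
    """Proposed-style validation: report every problem with its location."""
    errors = []
    for field in sorted(REQUIRED_FIELDS - config.keys()):
        errors.append(f"config is missing required field '{field}'")
    if parse_version(config.get("schema_version", "v0.0.0")) < MINIMUM_VERSION:
        errors.append(
            "schema_version is below the minimum supported version "
            f"{'.'.join(map(str, MINIMUM_VERSION))}; run the config migration"
        )
    return errors

# An old config that predates a breaking change and lacks a new field.
old_config = {"schema_version": "v1.0.0", "targets": ["wk inc flu hosp"]}
print(blunt_check(old_config))       # one unhelpful message
print(pinpointed_check(old_config))  # actionable, per-field messages
```

The difference is not the acceptance decision (both reject the old config) but the error surface: the pinpointed version tells the hub admin exactly which fields to fix, which is what a migration skill could then act on.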

Re: Claude Code skills — Fair concern. Worth noting that skills are essentially documented runbooks in executable form — they encode the same step-by-step processes that are currently in the developer docs, but in a way that can be executed and checked interactively. Think of it as an executable script with Claude handholding you through the process, with the opportunity to work through issues that arise or ask questions as you go along. Unlike a plain script that either succeeds or fails, if something unexpected happens mid-way (a test fails, a version doesn't match), Claude can help diagnose and adapt rather than just aborting.

On maintenance: when the underlying process changes, the skill is the thing that gets updated — there's no separate doc to keep in sync. That said, we should make sure each skill links back to the relevant developer docs, and that any process change triggers a skill review. We could add that as an explicit step in the /dashboard-release skill itself (i.e., "if you changed a process, update the relevant skills").

If it would help, I can demo one of the simpler skills (e.g., /dashboard-local-build) early on so the team can see what working with them feels like in practice before committing to the full set.

Re: phasing vs eval-metrics timeline — The plan is to complete Phase 1 before starting on the eval schema changes. The skills and automated renv.lock workflow are lightweight enough to get in place first, and will directly reduce the deployment pain for each subsequent schema change.

For Phase 2, the most impactful items would be replacing the Docker CI comparison tests with independent validation and the dedicated test hub to run them against, since together they unblock confident releases. However, they're more involved, so whether we tackle them before or alongside the eval schema work depends on how quickly we want to move on those changes. The remaining Phase 2 items (schema handling rework, Docker elimination evaluation) can be picked off if and when we decide we want to — they're improvements, not blockers.

@matthewcornell
Member

Echoing @nickreich's comment:

This is a great summary of the current status of things, and a helpful take on some concrete next steps.

Some small comments:

"Aims":

  • should we add something about being able to more easily set up remote staging/previews?

"Claude Code skills for dashboard operations":

  • I suspect it'll take some time for us to become comfortable with these agents. Trust is one issue - how do we know they're correct?

"Automated renv.lock update workflow"

  • Is this necessary for all types of changes? I don't tend to update it unless I have a specific reason.

"Replace comparison-based Docker CI with independent validation"

  • yes!

"Dedicated test hub and end-to-end smoke test"

  • good idea
  • would this operate as staging as well?

"Rework config schema handling in hubPredEvalsData"

  • good idea

"Evaluate eliminating Docker indirection for hubPredEvalsData"

  • that would simplify things a lot
