
[poster] Expanding Hubverse Evaluation Metrics and Dashboard Support#34

Open
nickreich wants to merge 9 commits into main from ngr/poster/eval-metrics-expansion

Conversation

@nickreich
Member

Summary

This PR adds the project poster for expanding the hubverse forecast evaluation ecosystem and fixes a README inconsistency.

Project poster (project-posters/eval-metrics-expansion/)

  • Five mini-sprints: UI polish (A), config-driven enhancements (B), scale transformation pipeline (C), variogram score / sample scoring (D), developer documentation (E)
  • Includes AGENTS.md for AI agent / contributor onboarding context
  • Sprint ordering and dependencies documented; each sprint is independently releasable

README fix

Corrects the poster creation instructions to use project-posters/<project>/ instead of posters/<project>/, matching the actual convention used by all existing posters in the repo.

Review timeline

1 week suggested.

🤖 Generated with Claude Code

Adds the project poster for expanding hubverse evaluation metrics and
dashboard support (five mini-sprints: UI polish, config-driven enhancements,
scale transforms, variogram score, documentation).

Also corrects the README poster instructions to use `project-posters/`
instead of `posters/`, matching the actual convention in the repo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@seabbs seabbs left a comment


This looks good to me. The variogram score has already landed on scoringutils main, and there should be a CRAN release this week. We expect some small changes (e.g. to documentation) to make it easier to use, but nothing breaking.

**hubPredEvalsData changes:**
- Add `transform_defaults` (top-level) and per-target `transform` to `inst/schema/v1.1.0/config_schema.json`
- Allowed transform functions: `log_shift`, `sqrt`, `log1p`, `log`, `log10`, `log2`
- `append: true/false` — when true, scores.csv gains a `scale` column (`"natural"` or transform label)
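The bullets above suggest a config shape along these lines — a minimal sketch only: apart from `transform_defaults`, `transform`, `append`, and the allowed function names, the field names and the target name are assumptions; the authoritative spec is `inst/schema/v1.1.0/config_schema.json` in hubPredEvalsData.

```json
{
  "transform_defaults": {
    "transform": "log_shift",
    "append": true
  },
  "targets": [
    {
      "target_id": "wk inc flu hosp",
      "transform": false
    }
  ]
}
```

Under this sketch, the hypothetical target opts out of the default `log_shift` transform via `transform: false`, and `append: true` would add the `scale` column to scores.csv.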
Collaborator


Given you want a wide table, it might be good to think through whether hubPredEvalsData should also output a wide table. This would make the table easier to present; I'm not sure about the evals visualisation, though. Would be great to hear @matthewcornell's thoughts on this.

Member


Sorry, I don't understand enough to answer yet. Is there a summary of the specific changes you're asking about? I haven't touched any of the score-loading/manipulating code in predevals at this point.

Member Author


I think we want to treat scores on a transformed scale as a separate score consistently throughout. This means, I think, treating each one as a separate column, which will mean new columns in the tables and new variable names in menu selectors for the plots. So maybe this needs to be updated so that scores.csv gains new columns for the transformed scores rather than a new `scale` column?
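To make the "separate columns" proposal concrete, here is a minimal sketch (column names, metric names, and values are purely illustrative) of widening a long scores table so each (metric × scale) pair becomes its own precomputed column:

```python
import pandas as pd

# Hypothetical long-format scores with a `scale` column (illustrative only)
long_scores = pd.DataFrame({
    "model_id": ["A", "A", "B", "B"],
    "scale": ["natural", "log", "natural", "log"],
    "wis": [12.3, 0.84, 15.1, 0.97],
})

# Widen so each (metric, scale) pair becomes its own column,
# e.g. wis_natural / wis_log — precomputed, no UI-time derivation
wide = long_scores.pivot(index="model_id", columns="scale", values="wis")
wide.columns = [f"wis_{scale}" for scale in wide.columns]
wide = wide.reset_index()
print(wide)
```

In this shape, the dashboard's dropdowns and tables would simply list the existing columns, with no virtual metrics constructed at UI time.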

Member Author

@nickreich nickreich Mar 6, 2026


@annakrystalli are you saying that hubPredEvalsData currently outputs a long table but that we might want to change it to output a wide table given the requirements that we want the eventual table to be displayed in wide format?

Collaborator

@annakrystalli annakrystalli left a comment


Thanks for putting this together @nickreich , having a single overview of the full project scope across all the workstreams is really valuable.

I have some high-level structural comments in addition to my inline comments.

1. Remove AGENTS.md

We use Claude Code interactively rather than as autonomous agents; we just point it at the relevant document for context. This file duplicates content from the poster (pipeline architecture, sprint structure, design decisions, open questions, key files), creating two documents to review and keep in sync. The poster itself is sufficient.

2. Development standards section doesn't belong here

The "Development standards" section (issue refinement format, TDD workflow, universal DoD) prescribes team-wide methodology; that's a separate discussion, not something to embed in a project-specific poster. If we agree on a standard workflow, it should live in a team-level contributing guide (or even a Claude Code skill) where it's discoverable and reusable. As it stands, it's also something we haven't discussed as a team.

3. Level of detail

The poster has a lot of implementation-level detail (per-issue DoD checklists, file-level change specs, validation behaviour specifics) that goes beyond what's expected in a high-level project poster. A poster should capture the problem space, workstream overview, sequencing, and risks; the implementation detail belongs in issues in the relevant repos.

4. Consistency with existing planning work

Some of the Sprint C planning was already done in detail in hubPredEvalsData#34, and restating it here has introduced some inconsistencies:

  • The Sprint C DoD says `transform: null` for opt-out, but the issue's schema uses `transform: false`; these have different semantics.
  • The poster says a config applying a transform to a pmf target "fails validation," but the issue specifies a two-tier approach (error if explicitly set, warn if inherited from defaults).

The Sprint D section also introduces `joint_across` as a new config property name, but the underlying hubEvals parameter is `compound_taskid_set`, which is also what we use in tasks.json. These are actually opposite concepts, so I'd suggest using `compound_taskid_set` consistently to avoid confusion.
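For reference, `compound_taskid_set` already appears in hubverse `tasks.json` files under the sample output type. A sketch with hypothetical task-ID names and parameter values (see the hubverse schema documentation for the authoritative structure):

```json
{
  "output_type": {
    "sample": {
      "output_type_id_params": {
        "type": "character",
        "max_length": 15,
        "compound_taskid_set": ["reference_date", "location"]
      }
    }
  }
}
```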

Sprint C should link directly to hubPredEvalsData#34.

The README fix looks good.

@matthewcornell
Member

matthewcornell commented Mar 3, 2026

This is super helpful, @nickreich .

Overall I agree with @annakrystalli 's points.

WRT the impact on the UI component and other areas I've been contributing to, I think I'd need to sit down and go over some concrete examples to understand the changes. As I said above, I haven't worked with the scoring data in the UI, just interface stuff (if that makes sense).

Re: "predevals JS changes":

> When the `scale` column is present in scores data, treat each (metric × scale) combination as a distinct metric: e.g., "wis (natural)" and "wis (log)" appear as separate items in dropdowns and as separate columns in tables

Are we saying that these are a kind of virtual/dynamic score that has to be added at UI time, rather than being generated as separate columns? If so, this would make me nervous.


It would help me if we could review the changes I'll be responsible for together in detail so I can understand the implications before we move too far along.

Contributor

@lshandross lshandross left a comment


I don't have much to add to the others' main comments, but I agree that the agents file should be removed and that we don't need developer standards in the poster.

@nickreich
Member Author

nickreich commented Mar 4, 2026

From @matthewcornell

Are we saying that these are a kind of virtual/dynamic score that has to be added at UI time, rather than being generated as separate columns. If so, this would make me nervous.

No. All scores will be computed and generated as separate columns beforehand. No UI computation.

- Remove AGENTS.md (duplicated poster content); migrate pipeline diagram
  and repo links into the poster's What do we already know section
- Remove Development Standards section (team methodology, not project scope)
- Trim per-sprint DoD checklists to brief acceptance criteria
- Slim down Sprint C to reference hubPredEvalsData#34 as the authoritative
  implementation plan, eliminating duplicated/conflicting detail
- Fix transform: null to transform: false (matching hubPredEvalsData#34)
- Replace joint_across with compound_taskid_set throughout (matching
  established hubverse terminology)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@matthewcornell
Member

Per a brief conversation today w/@nickreich , here's the plan we came up with when we made our last hubPredEvalsData schema change (adding the rounds_idx config property):

Member

@matthewcornell matthewcornell left a comment


I think it looks good. Thanks, Nick.

Co-authored-by: Nicholas G Reich <nick@umass.edu>