
Graph has random timestamps for measurement #2

@reills

Description


Thank you for all your hard work. I love the idea, but I'm having trouble trusting the model score graph.

Say I click on a model, like GPT 5.4, and look at the measurement timestamps. The graph shows tests spaced anywhere from 1 minute apart to 4 hours apart (e.g., April 15th 1:00am and April 15th 1:01am), but the x-axis isn't to scale, so visually equal gaps between points can represent very different amounts of elapsed time. That makes the chart feel misleading and the trend harder to interpret. I'd also like to see a little more consistency in when the models get tested.
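To illustrate what "to scale" would mean here, a minimal sketch (the timestamps and scores below are made up for illustration, not taken from the actual graph): mapping each measurement's timestamp to an x position proportional to elapsed seconds, instead of giving every measurement one equal-width slot.

```python
from datetime import datetime

# Hypothetical timestamps matching the report: two measurements
# 1 minute apart, then one 4 hours later (illustrative values only).
timestamps = [
    datetime(2025, 4, 15, 1, 0),
    datetime(2025, 4, 15, 1, 1),
    datetime(2025, 4, 15, 5, 1),
]

def time_scaled_positions(stamps):
    """Map timestamps to x positions proportional to elapsed time,
    so a 1-minute gap and a 4-hour gap render at different widths."""
    t0 = stamps[0]
    return [(t - t0).total_seconds() for t in stamps]

xs = time_scaled_positions(timestamps)
# xs → [0.0, 60.0, 14460.0]: the gaps reflect real elapsed time,
# whereas an index-based axis (0, 1, 2) would draw them equally wide.
print(xs)
```

With an index-based axis, the 1-minute and 4-hour gaps look identical, which is exactly what makes the current chart hard to read.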


Anyway, good work and I'm glad someone is trying to measure this because it seems like everyone is in the dark about random performance degradation.
