Thank you for all your hard work. I love the idea, but I'm having trouble trusting the model score graph.
Say I click on a model, like GPT 5.4, and look at the measurement timestamps. The intervals between tests range from 1 minute to 4 hours (e.g., one pair of points at April 15th 1:00am and 1:01am), but the x-axis isn't to scale, so equally spaced points can represent wildly different time gaps. That makes the chart feel misleading and the trend harder to interpret. I'd like to see more consistency in when the models get tested.
Anyway, good work and I'm glad someone is trying to measure this because it seems like everyone is in the dark about random performance degradation.