
Conversation

zx80

@zx80 zx80 commented Jun 29, 2025

This removes duplicate instances from the test files and makes minor adjustments so that values are valid with respect to the schema's intent, not only its letter.

The total number of remaining test cases is 28,188, down from 35,407 initially.
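For illustration only (this is not the PR's actual script), deduplication of JSON test instances can be sketched by comparing canonical serializations, so that values differing only in key order still count as duplicates:

```python
import json

def dedup_instances(instances):
    """Drop duplicate JSON values, keeping the first occurrence.

    Two instances are considered duplicates when their canonical
    serializations (sorted keys, compact separators) are identical.
    """
    seen = set()
    unique = []
    for value in instances:
        key = json.dumps(value, sort_keys=True, separators=(",", ":"))
        if key not in seen:
            seen.add(key)
            unique.append(value)
    return unique

# Example: three instances, the first two identical modulo key order.
print(len(dedup_instances([{"a": 1, "b": 2}, {"b": 2, "a": 1}, {"a": 3}])))  # 2
```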

@zx80 zx80 mentioned this pull request Jun 30, 2025
@jviotti
Member

jviotti commented Jun 30, 2025

@michaelmior Does this mean we should look for more instances now, or is it OK?

@michaelmior
Contributor

@jviotti I'd like to know more about what was changed. There may be some cases where what was written wasn't what was intended, but at the same time, these are real documents found in the wild so I'd personally lean towards keeping any "errors."

@michaelmior
Contributor

Also, as far as the duplicate instances go, these instances were collected independently as opposed to being duplicated on our end, so I'm personally not too concerned about it. Although it could be worth looking into the amount of duplication.

@zx80
Author

zx80 commented Jul 1, 2025

> Also, as far as the duplicate instances go, these instances were collected independently as opposed to being duplicated on our end, so I'm personally not too concerned about it. Although it could be worth looking into the amount of duplication.

My guess is that duplicate values come from file copies, not necessarily two distinct projects that just happened to use the very same JSON value written independently of one another. The amount of duplication is about 20%.
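For reference, the roughly 20% figure is consistent with the counts given in the PR description (35,407 initial cases, 28,188 remaining):

```python
# Duplication rate implied by the PR's before/after test-case counts.
initial, remaining = 35_407, 28_188
removed = initial - remaining
rate_pct = round(removed / initial * 100, 1)
print(removed, rate_pct)  # 7219 20.4
```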

@zx80
Author

zx80 commented Jul 1, 2025

> @jviotti I'd like to know more about what was changed.

They are listed in jmc/README.md: raw strings in yamllint, a value that is not a URI in openapi, property name typos in ansible meta caused by a misplaced additionalProperties, and one strange character in a string in lazygit.

> There may be some cases where what was written wasn't what was intended, but at the same time, these are real documents found in the wild so I'd personally lean towards keeping any "errors."

I do not understand the logic of keeping broken values as well as broken schemas in a benchmark: for me, the point of such tools is to show how fast JSON validation can be, not to produce great performance figures because checks are silently skipped or because obviously invalid values are rejected very quickly. That does not serve the use case well.

Anyway, if you want to keep them, I'll add an option to my scripts so that obvious user errors are not fixed when generating and compiling models. One such option already exists, but more is needed to reach a 100% pass rate.
