sahil2801/replit-code-instruct-glaive result #4
regularfry
started this conversation in
General
Replies: 2 comments
-
I've asked myself the same question and run some validation experiments.
-
Hm. That's a pain. These things happen.
-
Am I reading this right? Right now, https://huggingface.co/sahil2801/replit-code-instruct-glaive, a 3B model, is claiming a pass@1 score of 63.5%, significantly better than the next best model, which is a 15B model.
Is that... real? Or are we seeing an artefact of how the pass@1 test works?
Apologies if this is a stupid question, but that result just looks too good to be true.
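For anyone else wondering what the metric actually measures: below is a minimal sketch of the unbiased pass@k estimator as it is commonly defined for code-generation benchmarks. I am assuming that is what the reported number uses; the model card may compute it differently. The function names `pass_at_k` and `benchmark_pass_at_k` are my own, not from any particular harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of samples generated for the problem
    c: number of those samples that passed the unit tests
    k: the k in pass@k
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 - P(all k samples drawn without replacement are failures)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs, one per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 this reduces to c / n per problem, averaged over the benchmark, so the score is just the fraction of sampled completions that pass the tests. That means the metric itself cannot tell a model that generalises apart from one that has seen the test problems (or close paraphrases) during training.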