sahil2801/replit-code-instruct-glaive result #4

regularfry · 2023-07-07T11:09:09Z

Am I reading this right? Right now, https://huggingface.co/sahil2801/replit-code-instruct-glaive, a 3B model, is claiming a pass@1 score of 63.5%, significantly better than the next best model, which is a 15B model.

Is that... real? Or are we seeing an artefact of how the pass@1 test works?

Apologies if this is a stupid question, but that result just looks too good to be true.
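
For context on whether the metric itself could produce the number: a minimal sketch of the pass@k estimator commonly used with HumanEval-style benchmarks, assuming that is how this score was computed (the thread does not say). The function names `pass_at_k` and `benchmark_pass_at_k` are illustrative, not taken from any particular evaluation harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed the unit tests
    k: sample budget being scored
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random size-k subset
    # of the n samples contains no passing sample; pass@k is its complement.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k: int) -> float:
    """Average per-problem pass@k over an iterable of (n, c) pairs."""
    results = list(results)
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the formula reduces to c / n per problem, so pass@1 is just the average fraction of generated samples that pass. The metric has no way to tell a solution the model worked out from one it memorised, which is why leaked test data would inflate it directly.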

matthiasgeihs · 2023-07-10T10:52:14Z

I asked myself the same question and ran some validation experiments.
It seems that test data might have leaked into the training data.
See the Hugging Face discussion for more details.
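
One generic way to look for this kind of leak is an n-gram overlap check between benchmark items and training samples. The sketch below is an illustration under assumptions, not the validation experiment described above: the tokeniser, the `ngrams` and `flag_leaks` names, the n-gram size, and the threshold are all hypothetical choices.

```python
import re


def ngrams(text: str, n: int = 8) -> set:
    """Set of token n-grams, splitting on word runs and single punctuation marks."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def flag_leaks(benchmark_items, training_samples, n: int = 8, threshold: float = 0.5):
    """Return (index, overlap) for benchmark items that look contaminated.

    overlap is the largest fraction of an item's n-grams found in any single
    training sample; items at or above `threshold` are flagged.
    """
    train_grams = [ngrams(sample, n) for sample in training_samples]
    flagged = []
    for idx, item in enumerate(benchmark_items):
        bench = ngrams(item, n)
        if not bench:
            continue  # item shorter than n tokens, nothing to compare
        best = max((len(bench & grams) / len(bench) for grams in train_grams),
                   default=0.0)
        if best >= threshold:
            flagged.append((idx, best))
    return flagged
```

A check like this only catches near-verbatim copies. Paraphrased or reformatted test problems would slip past it, so a clean result is weak evidence at best.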

regularfry (Author) · 2023-07-11T15:07:43Z

Hm. That's a pain. These things happen.
