Evaluation dataset for GPT-3 generations

Hi, I'm wondering if you could release your evaluation dataset for GPT-3 generations, including PubMedQA, XSum, and WritingP (150 samples each). Because of the randomness in OpenAI services, a shared evaluation dataset would make follow-up work much easier. Thanks!