Upload a dataset of decarboxylative olefination reaction #236
DrHermit wants to merge 1 commit into open-reaction-database:#236 from
|
@DrHermit many thanks for your submission. I've redirected the merge request to a branch on origin so we can run some automated processes there. The next step is that I will review your dataset and get back to you with any corrections/modifications we require or recommend. I have a little concern that there are 561 files being changed in your PR. This could be caused by your fork missing some (most) of the existing datasets in ord-data. I'll look into this and see if we can get the merge to work from here, or we may need to just redo the pull request from a fresh branch and ensure we pull in all the latest updates from origin. We have had some issues with our Git LFS bandwidth in the last week so if you have recently attempted to clone ord-data that could be causing this. Once we get your dataset files safely into the branch on ord-data origin, they will be assigned dataset ids and you'll be able to refer to these in your ChemRxiv preprint and subsequent paper. |
|
@bdeadman Thank you very much for the timely response! Please let me know if I need to redo the pull request, and I'm looking forward to hearing back from you about any corrections/modifications for the dataset. |
|
@DrHermit we will need to set this pull request up from a clean fork. It looks like the existing datasets and other files have been deleted from your fork, so GitHub will also delete these from the main branch if we try to merge this into main. This docs page is a short guide to setting up the pull request if you would like to try again. What should happen is that you will have the full ord-data repository on your computer, and then you add your datasets to it (file location isn't important). When the commit is made it will then only add the new files and leave the existing ones alone. That said, I would be very happy to walk you through the process on a call. If you would like to work through it together then you can email me at ben@bjdeadman.co.uk to organise a time, or you can use this booking link to select a time that works for you. As a backup option I can also set up the pull request from our end, but then your GitHub account won't get the recognition for committing the dataset files. With regard to the review of the dataset, I think it could be helpful to have a short call to discuss it. Usually I'd compare the dataset to the methods reported in the paper, but as this is unpublished work I'm not going to ask you for the paper. We could do most of the review on the call and hopefully get this dataset fast-tracked so it is ready to appear alongside your preprint. |
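As a rough illustration of the point above (a clean pull request should only add files, never delete existing ones), here is a minimal Python sketch that inspects the text produced by `git diff --name-status` and reports whether every change is an addition. The function names and the file paths in the sample are hypothetical and are not part of any ORD tooling; this is just one way to sanity-check a branch before opening the pull request.

```python
from collections import Counter


def summarize_changes(name_status: str) -> Counter:
    """Count changes by status letter (A, M, D, R, ...).

    `name_status` is assumed to be the text output of
    `git diff --name-status <base>...<branch>`, where each line is a
    status code, a tab, and then one or more paths.
    """
    counts = Counter()
    for line in name_status.splitlines():
        line = line.strip()
        if not line:
            continue
        status = line.split("\t", 1)[0]
        # Rename/copy codes carry a similarity score (e.g. R100),
        # so only the first letter is used as the key.
        counts[status[0]] += 1
    return counts


def only_adds_files(name_status: str) -> bool:
    """Return True when there is at least one change and all are additions."""
    counts = summarize_changes(name_status)
    return bool(counts) and set(counts) == {"A"}


# Hypothetical example: one new dataset file, one deleted existing file.
sample = "A\tdata/new_dataset.pb.gz\nD\tdata/existing_dataset.pb.gz\n"
print(summarize_changes(sample))
print(only_adds_files(sample))
```

If the check reports deletions, the fork is probably missing files that exist upstream, and starting again from a fresh fork is likely the simpler fix.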
|
@bdeadman Thanks for the reply!! After a discussion with my PI, we came to the decision that we'll make another submission of the data after we put the manuscript on ChemRxiv (hopefully by the end of next week). I hope this will make it easier for you to review the data and procedure of our reaction, and I'll try to make the pull request with the correct dataset this time. |
|
No problem @DrHermit. I'll leave this PR open as a reminder to follow up once your preprint is ready. One thing I did notice when I briefly checked your dataset was that each chemical was in a separate 'input'. This could be a common misconception about the schema, so it's worth checking. Separating the chemicals into multiple inputs signifies a defined procedure for how those chemicals were added to the vessel. For example, adding a solution to the vessel would be a single 'input' containing both the solute chemical and the solvent as chemical components. Adding a neat reagent to that same vessel would be another 'input' containing just the reagent chemical component. If the chemical addition procedures are not as defined as this in the ORD reaction record then I recommend they are all kept in a single 'input' to avoid implying a specific order and/or mode of addition. |
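To make the distinction above concrete, here is a simplified Python sketch of the two ways of grouping chemicals. It uses plain dataclasses as a stand-in and does not use the actual ord-schema package; the class names, input keys, and component names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A single chemical and its role (simplified stand-in, not the ORD schema)."""
    name: str
    role: str


@dataclass
class ReactionInput:
    """One addition to the vessel, holding one or more chemical components."""
    components: list[Component] = field(default_factory=list)


# Defined procedure: a solution is added as one input (solute + solvent),
# then a neat reagent is added as a second, separate input.
defined_procedure = {
    "solution": ReactionInput(
        [Component("solute", "reactant"), Component("solvent", "solvent")]
    ),
    "neat reagent": ReactionInput([Component("reagent", "reagent")]),
}

# Undefined procedure: everything goes in a single input so that no
# order or mode of addition is implied.
undefined_procedure = {
    "reaction mixture": ReactionInput(
        [
            Component("solute", "reactant"),
            Component("solvent", "solvent"),
            Component("reagent", "reagent"),
        ]
    ),
}


def component_names(inputs: dict[str, ReactionInput]) -> set[str]:
    """Collect the names of all chemicals across every input."""
    return {c.name for inp in inputs.values() for c in inp.components}


# Same chemicals in both records; only the implied additions differ.
print(len(defined_procedure), len(undefined_procedure))
print(component_names(defined_procedure) == component_names(undefined_procedure))
```

Both records list the same three chemicals. The first implies two distinct additions, while the second makes no claim about how the chemicals reached the vessel.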
|
Replaced PR with #237. |
Dataset description:
A dataset of 120 experiments from Bayesian optimization on 5 distinct substrate combinations, and 136 experiments for the transfer learning optimization of another 26 substrate combinations. The dataset can be used as training data for the optimization of similar decarboxylative olefinations between aldehydes and malonic acid derivatives.
The reactions were performed at the Healy lab at NYU Abu Dhabi (https://healylab.com), and will be included in the manuscript that is currently under preparation (will be uploaded to ChemRxiv first).