Upload a dataset of decarboxylative olefination reaction #236

Closed
DrHermit wants to merge 1 commit into open-reaction-database:#236 from DrHermit:my_submission

Conversation

@DrHermit DrHermit commented Mar 19, 2026

Dataset description:
A dataset of 120 experiments from Bayesian optimization on 5 distinct substrate combinations, and 136 experiments for the transfer learning optimization of another 26 substrate combinations. The dataset can be used as training data for the optimization of similar decarboxylative olefinations between aldehydes and malonic acid derivatives.

The reactions were performed at the Healy lab at NYU Abu Dhabi (https://healylab.com), and will be included in the manuscript that is currently under preparation (will be uploaded to ChemRxiv first).

@bdeadman bdeadman changed the base branch from main to #236 March 19, 2026 11:15
@bdeadman
Collaborator

@DrHermit many thanks for your submission. I've redirected the merge request to a branch on origin so we can run some automated processes there. The next step is that I will review your dataset and get back to you with any corrections/modifications we require or recommend.

I have a little concern that there are 561 files being changed in your PR. This could be caused by your fork missing some (most) of the existing datasets in ord-data. I'll look into this and see if we can get the merge to work from here, or we may need to just redo the pull request from a fresh branch and ensure we pull in all the latest updates from origin. We have had some issues with our Git LFS bandwidth in the last week so if you have recently attempted to clone ord-data that could be causing this.

Once we get your dataset files safely into the branch on ord-data origin, they will be assigned dataset ids and you'll be able to refer to these in your ChemRxiv preprint and subsequent paper.

@DrHermit
Author

@bdeadman Thank you very much for the timely response! Please let me know if I need to redo the pull request, and I'm looking forward to hearing back from you about any corrections/modifications for the dataset.

@bdeadman
Collaborator

@DrHermit we will need to set this pull request up from a clean fork. It looks like the existing datasets and other files have been deleted from your fork, so GitHub will also delete these from the main branch if we try to merge this into main.

This docs page is a short guide to setting up the pull request if you would like to try again. What should happen is that you will have the full ord-data repository on your computer, and then you add your datasets to it (file location isn't important). When the commit is made it will then only add the new files, and leave the existing ones alone.
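
One way to confirm this before opening the pull request is to check that the branch only adds files. The following is a minimal sketch, assuming the output of `git diff --name-status` against the upstream main branch has been captured as text. The function names and the sample file paths are hypothetical and are not part of any ORD tooling.

```python
from collections import Counter


def summarize_name_status(diff_text: str) -> Counter:
    """Count changes by status letter in `git diff --name-status` output.

    Each non-empty line starts with a status code (A = added, M = modified,
    D = deleted, R<score> = renamed, ...) followed by tab-separated path(s).
    """
    counts = Counter()
    for line in diff_text.splitlines():
        if not line.strip():
            continue
        status = line.split("\t", 1)[0]
        counts[status[0]] += 1  # e.g. "R100" is counted under "R"
    return counts


def only_adds_files(diff_text: str) -> bool:
    """Return True when the diff is non-empty and every change is an added file."""
    counts = summarize_name_status(diff_text)
    return bool(counts) and set(counts) == {"A"}


# Hypothetical sample: two new dataset files plus one deleted existing file.
sample = "\n".join(
    [
        "A\tdata/aa/new_dataset_1.pb.gz",
        "A\tdata/aa/new_dataset_2.pb.gz",
        "D\tdata/00/existing_dataset.pb.gz",
    ]
)

print(summarize_name_status(sample))
print(only_adds_files(sample))
```

The input text could come from something like `git diff --name-status upstream/main...HEAD`, assuming a remote named `upstream` points at the ord-data origin. Any `D` entries would suggest the fork is missing existing files.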

That said, I would be very happy to walk you through the process on a call. If you would like to work through it together then you can email me ben@bjdeadman.co.uk to organise a time, or you can use this booking link to select a time that works for you.

As a backup option I can also set up the pull request from our end, but then your GitHub account won't get the recognition for committing the dataset files.

With regards to the review of the dataset - I think it could be helpful to have a short call to discuss it. Usually I'd compare the dataset to the methods reported in the paper, but as this is unpublished work I'm not going to ask you for the paper. We could do most of the review on the call and hopefully get this dataset fast-tracked so it is ready to appear alongside your pre-print.

@bdeadman bdeadman marked this pull request as draft March 19, 2026 17:20
@DrHermit
Author

DrHermit commented Mar 20, 2026

@bdeadman Thanks for the reply!! After a discussion with my PI, we decided that we'll make another submission of the data after we put the manuscript on ChemRxiv (hopefully by the end of next week). I hope this will make it easier for you to review the data and procedure of our reaction, and I'll try to make the pull request with the correct dataset this time.

@bdeadman
Collaborator

No problem @DrHermit. I'll leave this PR open as a reminder to follow up once your preprint is ready.

One thing I did notice when I briefly checked your dataset was that each chemical was in a separate 'input'. This could be a common misconception about the schema, so it's worth checking. Separating the chemicals into multiple inputs signifies a defined procedure for how those chemicals were added to the vessel. For example, adding a solution to the vessel would be a single 'input' containing both the solute chemical and the solvent as chemical components. Adding a neat reagent to that same vessel would be another 'input' containing just the reagent chemical component.

If the chemical addition procedures are not as defined as this in the ORD reaction record then I recommend they are all kept in a single 'input' to avoid implying a specific order and/or mode of addition.
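
To illustrate the difference, here is a rough sketch that uses plain Python dataclasses as simplified stand-ins for the schema's input and component records. It is not the ord-schema API itself: the class names, field names, role strings, and chemical names are all placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Simplified stand-in for one chemical component (placeholder fields)."""
    name: str
    role: str  # placeholder role label, e.g. "REACTANT" or "SOLVENT"


@dataclass
class ReactionInput:
    """Simplified stand-in for one 'input': chemicals added together."""
    components: list[Component] = field(default_factory=list)


# Defined addition procedure: a solution (solute + solvent) is one input,
# and a neat reagent added separately is a second input.
defined_procedure = {
    "solution": ReactionInput(
        [Component("solute", "REACTANT"), Component("solvent", "SOLVENT")]
    ),
    "neat reagent": ReactionInput([Component("reagent", "REAGENT")]),
}

# Addition procedure not recorded: keep every chemical in a single input so
# the record does not imply a specific order or mode of addition.
undefined_procedure = {
    "all chemicals": ReactionInput(
        [
            Component("solute", "REACTANT"),
            Component("solvent", "SOLVENT"),
            Component("reagent", "REAGENT"),
        ]
    ),
}

for label, inputs in [("defined", defined_procedure), ("undefined", undefined_procedure)]:
    print(label, {key: [c.name for c in inp.components] for key, inp in inputs.items()})
```

In both layouts the same three chemicals are recorded. Only the grouping changes, and with it what the record says about how they were added.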

@DrHermit DrHermit closed this Mar 26, 2026
@DrHermit DrHermit deleted the my_submission branch March 26, 2026 14:20
@bdeadman
Collaborator

Replaced PR with #237.
