Data submission - Decarboxylative Olefination (Machine learning optim… #238

Open
bdeadman wants to merge 5 commits into main from #237

bdeadman (Collaborator) commented Apr 8, 2026

#237 from @DrHermit

TO DO:

  • Update dataset descriptions to cross-reference each other's dataset ID.
  • Change extensions from txtpb to pbtxt so validation and count_reaction checks run correctly.

…ized) (#237)

* Add files via upload

* Add files via upload

* Delete Bayesian Optimization of Decarboxylative Olefination.txtpb

* Delete Transfer Learning of Decarboxylative Olefination.txtpb
bdeadman self-assigned this Apr 8, 2026
bdeadman marked this pull request as draft April 8, 2026 14:27
Replaced txtpb file extensions with pbtxt so that submission and validation scripts would process the datasets.
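
A minimal sketch of the rename described in this commit, assuming the dataset files sit together in one local directory. `rename_extension` is a hypothetical helper written for illustration; it is not part of the ORD submission or validation scripts.

```python
from pathlib import Path


def rename_extension(directory, old=".txtpb", new=".pbtxt"):
    """Rename files in `directory` whose suffix is `old` so they end in `new`.

    Hypothetical helper (an assumption, not an ORD tool).
    Returns the list of new paths, sorted by name.
    """
    renamed = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix == old:
            target = path.with_suffix(new)
            # Refuse to overwrite a file that already has the new name.
            if target.exists():
                raise FileExistsError(f"{target} already exists")
            path.rename(target)
            renamed.append(target)
    return renamed
```

Calling `rename_extension("path/to/datasets")` leaves files with any other extension untouched, and file names containing spaces are handled because only the final suffix is replaced.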

github-actions bot commented Apr 8, 2026

Change summary:

| Filename | Added | Removed | Changed |
| --- | --- | --- | --- |
| Bayesian optimization of 5 decarboxylative olefination reactions.pbtxt | 120 | 0 | 0 |
| Transfer Learning Optimization of 26 decarboxylative olefination reactions.pbtxt | 136 | 0 | 0 |
| Total | 256 | 0 | 0 |

bdeadman marked this pull request as ready for review April 8, 2026 14:32
bdeadman marked this pull request as draft April 8, 2026 14:32
bdeadman marked this pull request as ready for review April 8, 2026 14:34
bdeadman closed this Apr 8, 2026
bdeadman reopened this Apr 8, 2026

github-actions bot commented Apr 8, 2026

Change summary:

| Filename | Added | Removed | Changed |
| --- | --- | --- | --- |
| data/c7/ord_dataset-c703268ea43a4c7e802c6048b6166b34.pb.gz | 120 | 0 | 0 |
| data/dc/ord_dataset-dc0249930af34d17a3c76881a762aebf.pb.gz | 136 | 0 | 0 |
| Total | 256 | 0 | 0 |

bdeadman (Collaborator, Author) commented Apr 8, 2026

@DrHermit - Your dataset IDs are now available (see below) and can be used to cite your ORD datasets. They won't appear in the public ORD database until this pull request is approved. I'll need to get another admin to sign off on it, but hopefully we will get it through in a few days.

I'll suggest the dataset names and descriptions here but please do let me know if you want any of these changed before we approve this PR. In particular I would check that you are happy with these being called decarboxylative Knoevenagel condensations. I know it isn't terminology you used in the paper, and you may have a good reason for avoiding that terminology.

Bayesian optimization dataset
Dataset ID: ord_dataset-c703268ea43a4c7e802c6048b6166b34
Name: Bayesian optimization of 6 decarboxylative Knoevenagel condensation reactions
Description:
The Knoevenagel condensation between 6 pairings of aldehydes and malonic acid half-thioesters was studied in a Bayesian optimization campaign of 120 reaction datapoints. For each pairing, the catalyst, solvent, temperature, and equivalents were optimized across 4 rounds of 6 experiments. Reactions were performed by the Alan R. Healy group at New York University Abu Dhabi, and the pre-print publication is available on ChemRxiv at https://doi.org/10.26434/chemrxiv.15001213/v1. This dataset was used as training data for a subsequent transfer learning optimization of similar aldehydes and malonic acid derivatives, which is also available on ORD (ord_dataset-dc0249930af34d17a3c76881a762aebf).

Transfer learning dataset
Dataset ID: ord_dataset-dc0249930af34d17a3c76881a762aebf
Name: Transfer learning optimization of 26 decarboxylative Knoevenagel condensation reactions
Description:
The decarboxylative Knoevenagel condensation between 26 pairings of aldehydes and malonic acid derivatives was studied in a transfer learning optimization campaign of 136 reaction datapoints. For each pairing, the catalyst, solvent, temperature, and equivalents were optimized across several iterations of 2 experiments. Reactions were performed by the Alan R. Healy group at New York University Abu Dhabi, and the pre-print publication is available on ChemRxiv at https://doi.org/10.26434/chemrxiv.15001213/v1. The transfer learning optimization was trained on the 120-experiment dataset from a Bayesian optimization campaign, which is also available on ORD (ord_dataset-c703268ea43a4c7e802c6048b6166b34).

github-actions bot commented

Change summary:

| Filename | Added | Removed | Changed |
| --- | --- | --- | --- |
| data/c7/ord_dataset-c703268ea43a4c7e802c6048b6166b34.pb.gz | 120 | 0 | 0 |
| data/dc/ord_dataset-dc0249930af34d17a3c76881a762aebf.pb.gz | 136 | 0 | 0 |
| Total | 256 | 0 | 0 |

DrHermit commented Apr 15, 2026

@bdeadman
Thanks for the update! I took a few days off and didn't check GitHub.

For the name of the dataset, I would prefer to call it a decarboxylative olefination reaction rather than a decarboxylative Knoevenagel condensation. The reason is that "Knoevenagel condensation" usually refers to the condensation of oxy-esters. Our paper covers a few more categories of carbonyl derivatives, so I would keep the name a bit more general.

Other than this, the descriptions of the datasets look good! Thank you very much for all the help!

bdeadman (Collaborator, Author) commented

Ok. I'll get the name changed before making the dataset go live.
