Upload Suzuki-coupling reaction dataset #226
weiqiz3 wants to merge 1 commit into open-reaction-database:main from
Hi @weiqiz3, many thanks for the updated submission. It's looking much better. I've reviewed it now and have the following feedback.
Hi @bdeadman, sincere apologies for the delayed response. We have updated our dataset according to your specifications. You can find the dataset (Suzuki-Coupling for light-harvesting materials.json) and the spreadsheet (https://docs.google.com/spreadsheets/d/1ES6cmpph9pYGcF3P2MeAww4hT4tk0G_4TLcVykMWY9w/edit?gid=0#gid=0) here. Thank you!
Review Comments
Hi @weiqiz3 - this is getting close to being ready now. I have a couple of changes that I will strongly request, and some more that I would recommend to improve the usability of the ORD dataset. Let me know if you have any questions.
Mandatory changes
Recommended Changes
Tidying up the MML to ORD conversion code
The following are some awkward 'bugs' that I found in your ORD dataset. While we can wave them through this time (or just exclude them from the dataset), we should put some more thought into how your internal reaction data is mapped into the ORD schema.
Answering some of the more specific questions tagged for ORD in your source data spreadsheet here. Some of the responses will also be evident from my suggestions in the review comments.
This one is a little complex. Yes, you could have an input named 'Catalyst' in the ORD dataset, and that would be fine. However, for your dataset the catalyst needs to be included in the same input as the other chemical components, since your method does not define an addition order. Listing chemical components in separate inputs conveys definitive information about the order of addition and how the chemicals were added to the vessel.
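As a rough illustration of the single-input layout, the sketch below builds one ORD-style input that holds the reactants, base, and catalyst together. It uses plain Python dicts loosely modelled on the ORD proto field names rather than the `ord-schema` package itself; the helper name, the input key `reaction mixture`, and the compound names are hypothetical placeholders, not values from the dataset.

```python
# Hypothetical sketch: plain dicts loosely modelled on ORD proto field names.
# Check field names and enum strings against the ord-schema definitions.


def build_combined_input(components):
    """Return one input dict listing every component together.

    `components` is a list of (name, reaction_role) tuples. Because all
    components share a single input, no order of addition is implied.
    """
    return {
        "components": [
            {
                "identifiers": [{"type": "NAME", "value": name}],
                "reaction_role": role,
            }
            for name, role in components
        ]
    }


# Illustrative Suzuki-type mixture (placeholder names, not dataset values).
inputs = {
    "reaction mixture": build_combined_input(
        [
            ("aryl halide", "REACTANT"),
            ("arylboronic acid", "REACTANT"),
            ("palladium catalyst", "CATALYST"),
            ("base", "REAGENT"),
        ]
    )
}
```

Splitting the catalyst into its own input would only be appropriate if the source method actually recorded it being added separately.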
Yes, I think these are useful to include in the ORD dataset. While they won't be directly applicable to external users of the dataset, they will be of use to your group, and they are useful for mapping the ORD dataset onto any other associated data you may publish in the future. They can be included as CUSTOM type identifiers at the component level, with a name and brief description in the associated details field.
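A minimal sketch of attaching a group-internal compound ID as a CUSTOM identifier, assuming the identifier carries `type`, `value`, and `details` fields as in ORD's compound identifiers. The helper name, the ID `MOL-0042`, and the label `lab_molecule_id` are hypothetical.

```python
# Hypothetical sketch: attach an internal compound ID as a CUSTOM identifier.
# The dict layout loosely follows an ORD compound identifier (type/value/details).


def add_custom_identifier(component, internal_id, label, description):
    """Append a CUSTOM identifier carrying a group-internal ID to `component`."""
    component.setdefault("identifiers", []).append(
        {
            "type": "CUSTOM",
            "value": internal_id,
            # Name plus a brief description, as suggested in the review.
            "details": f"{label}: {description}",
        }
    )
    return component


boron_reagent = {"identifiers": [{"type": "NAME", "value": "arylboronic acid"}]}
add_custom_identifier(
    boron_reagent,
    "MOL-0042",  # placeholder internal ID
    "lab_molecule_id",
    "internal ID used in the source spreadsheet",
)
```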
This is not necessary for ORD since the specific quantities are included with the input components. If you did want to include it you can add a FEATURE (type: NUMBER, data: the ratio, description: "ratio of boron reactant to halide") to the boron input component.
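If the ratio is kept, it can be derived from the amounts already in the inputs. The sketch below computes a molar boron:halide ratio and stores it as a feature using the fields named above (type, data, description); the helper name and the feature key `boron_halide_ratio` are hypothetical, and the amounts are placeholders.

```python
# Hypothetical sketch: store the boron:halide ratio as a NUMBER feature on the
# boron component, using the fields suggested in the review.


def add_ratio_feature(boron_component, boron_mmol, halide_mmol):
    """Compute the boron/halide molar ratio and attach it as a feature."""
    if halide_mmol <= 0:
        raise ValueError("halide amount must be positive")
    ratio = boron_mmol / halide_mmol
    boron_component.setdefault("features", {})["boron_halide_ratio"] = {
        "type": "NUMBER",
        "data": ratio,
        "description": "ratio of boron reactant to halide",
    }
    return ratio


boron = {}
ratio = add_ratio_feature(boron, boron_mmol=1.5, halide_mmol=1.0)  # placeholder amounts
```

Because the value is fully determined by the input quantities, it is redundant in ORD and mainly a convenience for readers.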
This is not necessary for ORD without more context about what it means. In this dataset, since it is always ZERO, it can be ignored.
The ORD preference would be to name the individual experimenters if possible. If this is not possible, a lab name with a contact email will be OK.
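The preference order above can be sketched as a small fallback: use named individuals when they are known, otherwise a single lab contact that must include an email. The keys loosely follow ORD's person record (name, email, organization); how several people map onto the provenance fields should be checked against the schema. The function name and all names/addresses are hypothetical.

```python
# Hypothetical sketch: prefer individual experimenters; otherwise fall back
# to a lab name with a contact email. Keys loosely follow an ORD person record.


def experimenter_records(people, lab_name, lab_email):
    """Return person records: individuals if known, else one lab contact.

    `people` is a list of (name, email) tuples and may be empty.
    """
    if people:
        return [{"name": name, "email": email} for name, email in people]
    if not lab_email:
        raise ValueError("a lab fallback needs a contact email")
    return [{"organization": lab_name, "email": lab_email}]


named = experimenter_records([("A. Researcher", "a.researcher@example.org")], "Example Lab", "lab@example.org")
fallback = experimenter_records([], "Example Lab", "lab@example.org")
```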
Yes, please do include that as a reaction ID. If nothing else it will help us run spot checks on the ORD dataset against the source data spreadsheet.
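One way to make those spot checks cheap is sketched below: each ORD record carries the spreadsheet's reaction ID as a CUSTOM reaction identifier, and a set comparison reports IDs present on only one side. The helper names and the IDs `R001` to `R003` are hypothetical.

```python
# Hypothetical sketch: carry the spreadsheet's reaction ID into each ORD record
# as a CUSTOM reaction identifier, then spot-check both sides for missing IDs.


def tag_reaction(reaction, sheet_id):
    """Attach the source-spreadsheet reaction ID to an ORD-style reaction dict."""
    reaction.setdefault("identifiers", []).append(
        {"type": "CUSTOM", "value": sheet_id, "details": "source spreadsheet reaction ID"}
    )
    return reaction


def spot_check_ids(ord_reactions, sheet_ids):
    """Return (IDs only in the spreadsheet, IDs only in the ORD dataset)."""
    ord_ids = {
        ident["value"]
        for rxn in ord_reactions
        for ident in rxn.get("identifiers", [])
        if ident.get("type") == "CUSTOM"
    }
    sheet = set(sheet_ids)
    return sorted(sheet - ord_ids), sorted(ord_ids - sheet)


reactions = [tag_reaction({}, "R001"), tag_reaction({}, "R002")]
missing_in_ord, extra_in_ord = spot_check_ids(reactions, ["R001", "R002", "R003"])
# missing_in_ord == ["R003"], extra_in_ord == []
```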
If you want to add calculated descriptors to your molecules these can be included as a FEATURE under the associated input or outcome component chemical. Be sure to include a useful description of what it means. Features can be of type NUMBER, TEXT, URL, or even a file UPLOAD.
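A minimal sketch of attaching a calculated descriptor to a product component as a feature, rejecting unsupported types and empty descriptions. The allowed type names come from the comment above; the helper name, the feature key `homo_lumo_gap`, and the value 2.5 are hypothetical placeholders, not dataset values.

```python
# Hypothetical sketch: attach a calculated descriptor as a feature and insist
# on a useful description. Type names follow the review comment.
ALLOWED_KINDS = {"NUMBER", "TEXT", "URL", "UPLOAD"}


def add_feature(component, name, kind, data, description):
    """Attach a calculated descriptor as a feature on an input or outcome component."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unsupported feature type: {kind}")
    if not description.strip():
        raise ValueError("features need a useful description")
    component.setdefault("features", {})[name] = {
        "type": kind,
        "data": data,
        "description": description,
    }
    return component


product = {"reaction_role": "PRODUCT"}
add_feature(product, "homo_lumo_gap", "NUMBER", 2.5, "calculated HOMO-LUMO gap in eV (placeholder value)")
```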
Yes, it is correct to put the atmosphere gas under conditions - pressure - atmosphere. The control type could be specified as SEALED.
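The nesting described above might look like the sketch below, which puts the gas under conditions > pressure > atmosphere and sets the pressure control to SEALED. The enum strings are assumptions modelled on ORD's pressure conditions, and only a small illustrative subset of atmosphere types is accepted; the function name is hypothetical.

```python
# Hypothetical sketch: place the atmosphere gas under conditions > pressure >
# atmosphere and mark the vessel as SEALED. Enum strings are assumptions.
KNOWN_ATMOSPHERES = {"AIR", "NITROGEN", "ARGON"}  # small subset for illustration


def pressure_conditions(gas, details=""):
    """Return an ORD-style conditions fragment for a sealed vessel under `gas`."""
    gas = gas.upper()
    if gas not in KNOWN_ATMOSPHERES:
        raise ValueError(f"unexpected atmosphere: {gas}")
    atmosphere = {"type": gas}
    if details:
        atmosphere["details"] = details
    return {
        "pressure": {
            "control": {"type": "SEALED"},
            "atmosphere": atmosphere,
        }
    }


conditions = pressure_conditions("argon")
```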
In the long term I think we need to have another look at how your purification steps are mapped into the ORD data model. It would be much better if the translation script parsed them, and created the appropriate WORKUP messages in the ORD dataset. That said, we can postpone this for now and focus on getting this dataset 'good enough' for release.
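As a starting point for that parsing, the sketch below splits a free-text purification description into sentences and maps each one to a workup type by keyword, falling back to CUSTOM with the original text kept in `details`. The keyword table and function name are hypothetical, and the type names are assumed to match ORD's workup enum; a real translation script would need a far more robust parser (for example, this naive split would break decimals such as "0.5 M").

```python
import re

# Hypothetical keyword -> workup type mapping; check names against ord-schema.
KEYWORDS = [
    ("extract", "EXTRACTION"),
    ("wash", "WASH"),
    ("filter", "FILTRATION"),
    ("dried over", "DRY_WITH_MATERIAL"),
    ("concentrat", "CONCENTRATION"),
    ("evaporat", "CONCENTRATION"),
]


def parse_workups(purification_text):
    """Map each sentence of a purification description to a workup-like dict."""
    # Naive split on '.' and ';' (would also split decimal numbers).
    steps = [s.strip() for s in re.split(r"[.;]\s*", purification_text) if s.strip()]
    workups = []
    for step in steps:
        lowered = step.lower()
        # First matching keyword wins; unmatched steps are kept as CUSTOM.
        kind = next((t for kw, t in KEYWORDS if kw in lowered), "CUSTOM")
        workups.append({"type": kind, "details": step})
    return workups


text = (
    "The mixture was extracted with DCM. The organic layer was dried over MgSO4; "
    "concentrated under reduced pressure. Purified by column chromatography."
)
steps = parse_workups(text)
```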
Yes, we would take these in the database unless there is a known problem with the execution of the reaction.
The examples shown in your source data look like they would be a good fit for OBSERVATIONS in the ORD schema.
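A minimal sketch of recording such notes as observations on a reaction, with an optional time stamp, assuming a layout modelled on ORD's reaction observation (a comment plus a time with units). The helper name and the example comment are hypothetical.

```python
# Hypothetical sketch: store free-text notes as reaction observations with an
# optional time stamp. Layout loosely modelled on ORD's reaction observation.


def add_observation(reaction, comment, minutes=None):
    """Append an observation to `reaction`; `minutes` is time since start."""
    observation = {"comment": comment}
    if minutes is not None:
        observation["time"] = {"value": minutes, "units": "MINUTE"}
    reaction.setdefault("observations", []).append(observation)
    return observation


reaction = {}
add_observation(reaction, "mixture turned dark brown", minutes=30)  # placeholder note
add_observation(reaction, "precipitate formed on cooling")
```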
Attaching my review notes in notebook form here. No need for the dataset authors to use this file since all relevant comments have been copied out into the GitHub issue.
Hi Ben, this is Weiqi Zhang. We had a meeting about exporting Suzuki-coupling reaction data last month. Here's the dataset and the template.
The attempts_dataset.pbtxt file contains all the Suzuki-coupling reactions from the paper https://www.nature.com/articles/s41586-024-07892-1.
template+data.zip