Upload
• Each dataset must have a PARENT sequence identified with the #PARENT# tag.
• If your data contain no parent measurement, add a final "empty" plate holding a single "dummy" parent row. For example, if you have one plate, append this dummy row at the end of the file. This ensures the correct parent sequence is used, so the parent sequence must still be provided and tagged with #PARENT#. See the example experiment ParLQ-a online.
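• In such cases, give the dummy parent a fitness value of 1 (as in the original data snippet below). The fold change relative to the parent (variant fitness divided by parent fitness) then equals the raw fitness value, so the uploaded values are not rescaled.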
• Alternatively, if an empty well (control) exists, this can be substituted as the parent.
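The parent handling above can be sketched in Python. This is a minimal illustration under stated assumptions, not the DEDB pipeline: the helper names add_dummy_parent and fold_changes are hypothetical, rows are plain dicts keyed by the upload column names, and the dummy row's fitness of 1.0 and placeholder well are assumptions. The sequences and SMILES in the usage lines are toy values.

```python
PARENT_TAG = "#PARENT#"  # marks the parent row in amino_acid_substitutions


def add_dummy_parent(rows, parent_sequence, reaction_smiles, plate):
    """Append a hypothetical dummy parent row with fitness 1.0 and return rows."""
    rows.append({
        "id": str(len(rows) + 1),
        "plate": str(plate),
        "well": "A1",  # assumption: placeholder well on the extra plate
        "aa_sequence": parent_sequence,
        "amino_acid_substitutions": PARENT_TAG,
        "reaction_smiles": reaction_smiles,
        "fitness_value": "1.0",  # assumption: fitness 1 keeps fold change == raw fitness
        "additional_information": "dummy parent",
    })
    return rows


def fold_changes(rows):
    """Divide each row's fitness by the parent's fitness (assumes exactly one parent row)."""
    parents = [r for r in rows if r["amino_acid_substitutions"] == PARENT_TAG]
    if len(parents) != 1:
        raise ValueError(f"expected one {PARENT_TAG} row, found {len(parents)}")
    parent_fitness = float(parents[0]["fitness_value"])
    return [float(r["fitness_value"]) / parent_fitness for r in rows]


# Toy data: one variant measured on plate 1, no parent measurement.
rows = [{
    "id": "1", "plate": "1", "well": "A1",
    "aa_sequence": "CAVPG", "amino_acid_substitutions": "M1C",
    "reaction_smiles": "CCO", "fitness_value": "8.17",
    "additional_information": "yield",
}]
add_dummy_parent(rows, parent_sequence="MAVPG", reaction_smiles="CCO", plate=2)
print(fold_changes(rows))  # [8.17, 1.0]
```

Because the dummy parent's fitness is 1.0, the variant's fold change (8.17) is identical to its raw fitness value.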
Original data snippet (a single raw row, values in their original order):
parent, 1, plate1, A01, #PARENT#, O=C(OCC)C@H[C@H]1C2=CC=C(OC)C=C2, 1, yield
Reformatted rows in the upload CSV format (Figure S1-compatible):
| id | plate | well | aa_sequence | amino_acid_substitutions | reaction_smiles | fitness_value | additional_information |
|---|---|---|---|---|---|---|---|
| 1 | 1 | A1 | MAVPGYDFGKVPDAPISDADFESLK… | #PARENT# | O=C(OCC)C@H[C@H]1C2=CC=C(OC)C=C2 | 2.181282 | yield |
| 2 | 1 | A2 | CAVPGYDFGKVPDAPISDADFESLK… | M1C_Y57Y_L59L_Q60Q_F89F | O=C(OCC)C@H[C@H]1C2=CC=C(OC)C=C2 | 8.172996082 | yield |
• id – Unique row identifier.
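• plate – Plate number (e.g., 1 for the raw label plate1).
• well – Well identifier (e.g., A1, written as A01 in the raw data).
• aa_sequence – Cleaned, full amino acid sequence of the variant (for the parent row, the parent sequence itself).
• amino_acid_substitutions – Mutations relative to the parent (e.g., M1C_Y57Y_L59L_Q60Q_F89F), or #PARENT# for the parent row.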
• reaction_smiles – SMILES representation of the reaction.
• fitness_value – Activity or yield value.
• additional_information – Other assay descriptors (yield, condition notes, etc.).
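The raw snippet labels the plate and well as plate1 and A01, while the reformatted table uses 1 and A1. A minimal sketch of that conversion, and of writing the columns listed above with Python's standard csv module, follows. The names normalize_plate, normalize_well and to_csv are hypothetical, and the regular expressions assume labels shaped like the examples (a trailing plate number; a row letter A to P followed by a column number). The aa_sequence value is a toy placeholder, not the real parent.

```python
import csv
import io
import re

COLUMNS = [
    "id", "plate", "well", "aa_sequence", "amino_acid_substitutions",
    "reaction_smiles", "fitness_value", "additional_information",
]


def normalize_plate(label):
    """'plate1' -> '1' (assumes the raw label ends in the plate number)."""
    match = re.search(r"(\d+)$", label.strip())
    if match is None:
        raise ValueError(f"no plate number in {label!r}")
    return str(int(match.group(1)))


def normalize_well(label):
    """'A01' -> 'A1' (drops zero padding from the column number)."""
    match = re.fullmatch(r"([A-Pa-p])0*(\d+)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized well {label!r}")
    return f"{match.group(1).upper()}{int(match.group(2))}"


def to_csv(rows):
    """Write rows as CSV text: the core columns first, then any extra keys in first-seen order."""
    extras = []
    for row in rows:
        for key in row:
            if key not in COLUMNS and key not in extras:
                extras.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS + extras, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty strings
    return buffer.getvalue()


print(normalize_plate("plate1"), normalize_well("A01"))  # 1 A1
row = {
    "id": "1", "plate": normalize_plate("plate1"), "well": normalize_well("A01"),
    "aa_sequence": "MAVPG",  # toy sequence
    "amino_acid_substitutions": "#PARENT#",
    "reaction_smiles": "O=C(OCC)C@H[C@H]1C2=CC=C(OC)C=C2",
    "fitness_value": "2.181282", "additional_information": "yield",
}
print(to_csv([row]))
```

Extra keys are appended after the core columns rather than rejected, which keeps any additional fields alongside the required ones.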
Additional columns may exist in the pipeline output, but these eight columns are essential for upload and visualization in DEDB. Extra fields are preserved in the downloadable CSV.
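A quick pre-upload check that the eight essential columns are present, and which extra columns will be carried along, can be sketched with the csv module. check_columns is a hypothetical helper, not part of DEDB, and temperature is an invented example of an extra column.

```python
import csv
import io

REQUIRED_COLUMNS = (
    "id", "plate", "well", "aa_sequence", "amino_acid_substitutions",
    "reaction_smiles", "fitness_value", "additional_information",
)


def check_columns(csv_text):
    """Return (missing, extra) column names found in the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    extra = [c for c in header if c not in REQUIRED_COLUMNS]
    return missing, extra


sample = (
    "id,plate,well,aa_sequence,amino_acid_substitutions,"
    "reaction_smiles,fitness_value,additional_information,temperature\n"
)
print(check_columns(sample))  # ([], ['temperature'])
```

An empty missing list means the file has every essential column; anything in extra is optional and simply kept.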
The engineering team is happy to help write a Python script to format your data. Share your dataset in its raw form, and they can help convert it into the upload-ready format for visualization and downstream analysis.