Ariane Mora edited this page Sep 15, 2025 · 2 revisions

Parent Handling

• Each dataset must have a PARENT sequence identified with the #PARENT# tag.
• If the data contain no parent entry, add a final “empty” plate: for example, if you have 1 plate, append a “dummy” row at the end. This row only ensures that the correct parent sequence is used, so it must include the sequence and be identified by the #PARENT# id. See the example experiment ParLQ-a online.
• In such cases, the fold change relative to the parent equals the raw fitness value, so the reported values stay faithful to the raw data.
• Alternatively, if an empty well (control) exists, this can be substituted as the parent.
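
The parent-handling rules above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code. It assumes that rows are plain dictionaries keyed by the upload column names, that the #PARENT# tag is stored in the amino_acid_substitutions column (as in the example row on this page), and that the dummy parent gets a placeholder fitness of 1.0, which is one way to make the fold change equal the raw fitness value. The function names, the well "A1", and the "dummy parent row" note are hypothetical.

```python
PARENT_TAG = "#PARENT#"


def ensure_parent_row(rows, parent_sequence, reaction_smiles):
    """Return a copy of rows, appending a dummy parent row if none is tagged."""
    rows = [dict(r) for r in rows]
    if any(r["amino_acid_substitutions"] == PARENT_TAG for r in rows):
        return rows  # a parent is already present, nothing to add
    last_plate = max((int(r["plate"]) for r in rows), default=0)
    last_id = max((int(r["id"]) for r in rows), default=0)
    rows.append({
        "id": last_id + 1,
        "plate": last_plate + 1,  # the extra "empty" plate
        "well": "A1",  # assumption: any free well label would do
        "aa_sequence": parent_sequence,
        "amino_acid_substitutions": PARENT_TAG,
        "reaction_smiles": reaction_smiles,
        "fitness_value": 1.0,  # assumption: neutral placeholder fitness
        "additional_information": "dummy parent row",
    })
    return rows


def fold_changes(rows):
    """Fitness of every row divided by the fitness of the tagged parent row."""
    parent = next(r for r in rows if r["amino_acid_substitutions"] == PARENT_TAG)
    parent_fitness = float(parent["fitness_value"])
    return [float(r["fitness_value"]) / parent_fitness for r in rows]
```

With a placeholder parent fitness of 1.0, dividing by the parent leaves every variant's value unchanged, which matches the behaviour described in the list above.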

Example Conversion

Original data snippet:

parent 1 plate1 A01 #PARENT# O=C(OCC)C@H[C@H]1C2=CC=C(OC)C=C2 1 yield

Reformatted rows in CSV (Figure S1-compatible):

id plate well aa_sequence amino_acid_substitutions reaction_smiles fitness_value additional_information
1 1 A1 MAVPGYDFGKVPDAPISDADFESLK… #PARENT# O=C(OCC)C@H[C@H]1C2=CC=C(OC)C=C2 2.181282 yield
2 1 A2 CAVPGYDFGKVPDAPISDADFESLK… M1C_Y57Y_L59L_Q60Q_F89F O=C(OCC)C@H[C@H]1C2=CC=C(OC)C=C2 8.172996082 yield
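
Comparing the original snippet with the reformatted rows, the plate label plate1 becomes 1 and the well A01 becomes A1. A minimal sketch of that normalization follows. The helper names are hypothetical, and the rule itself (keep the trailing plate number, drop zero padding in the well) is inferred from this single example rather than taken from a specification.

```python
import re


def normalize_plate(label):
    """Extract the trailing plate number, e.g. 'plate1' -> 1."""
    match = re.search(r"(\d+)$", str(label).strip())
    if match is None:
        raise ValueError(f"no plate number found in {label!r}")
    return int(match.group(1))


def normalize_well(label):
    """Drop zero padding from a well label, e.g. 'A01' -> 'A1'."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", str(label).strip())
    if match is None:
        raise ValueError(f"unrecognized well label {label!r}")
    row_letters, column_number = match.groups()
    return f"{row_letters.upper()}{int(column_number)}"
```

For instance, normalize_plate("plate1") gives 1 and normalize_well("A01") gives "A1"; labels that do not fit the pattern raise a ValueError instead of being silently passed through.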

Column Definitions

• id – Unique row identifier.
• plate – Plate source (e.g., 1 for plate1).
• well – Well identifier (e.g., A1 for A01).
• aa_sequence – Cleaned amino acid sequence (for parent entries, the parent sequence itself).
• amino_acid_substitutions – Mutations or changes relative to parent (#PARENT# for parent entries, as in the example row above).
• reaction_smiles – SMILES representation of the reaction.
• fitness_value – Activity or yield value.
• additional_information – Other assay descriptors (yield, condition notes, etc.).

Additional columns may exist in the pipeline output, but the eight columns listed above are essential for upload and visualization in DEDB. Extra fields are preserved in the downloadable CSV.
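
Before uploading, it can help to check that a file has the columns listed above. The sketch below uses only the Python standard library. The function name and the shape of the returned report are hypothetical, and it assumes a comma-separated file with a header row and the #PARENT# tag in the amino_acid_substitutions column, as in the example row on this page. It is not DEDB's own validation.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "id", "plate", "well", "aa_sequence", "amino_acid_substitutions",
    "reaction_smiles", "fitness_value", "additional_information",
]


def check_upload_csv(csv_text):
    """Report missing columns, extra columns, and the number of parent rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    extra = [c for c in header if c not in REQUIRED_COLUMNS]
    parent_rows = sum(
        1 for row in reader
        if row.get("amino_acid_substitutions") == "#PARENT#"
    )
    return {"missing": missing, "extra": extra, "parent_rows": parent_rows}
```

A file is likely ready for upload when "missing" is empty and "parent_rows" is at least 1; anything under "extra" is an additional field of the kind described above.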

Engineering Support

The engineering team is happy to help write a Python-based data formatting script. If you share your raw dataset, they can help convert it into the upload-ready format for visualization and downstream analysis.
