Improve Monomer Input Schema: Preserve Counts and Support Named Monomers
Summary
In the current input format (v0.1-beta.0), monomers and their counts are defined separately:
{
  "simulation_name": "MySimulation",
  "temperature": [300, 400, 500],
  "density": 0.8,
  "monomers": {
    "1": "O=C(O)CCCCO",
    "2": "O=C(O)c1ccc(O)cc1",
    "3": "CCO"
  },
  "number_of_monomers": {
    "1": 500,
    "2": 400,
    "3": 300
  }
}
However:
- The workflow currently drops or does not consistently preserve the number_of_monomers information.
- The schema only supports numeric keys ("1", "2", etc.).
- There is no direct support for user-defined monomer names.
Problems
- Monomer count information may be ignored or lost in later stages of the workflow.
- Monomer identity relies on numeric indexing instead of semantic identifiers.
- No support for descriptive monomer names (e.g., "adipic_acid", "terephthalic_acid").
- Numeric keys say nothing about what a monomer is, so a SMILES string paired with the wrong count is hard to spot by reading the input.
Expected Behavior
- Monomer SMILES and counts must remain tightly coupled.
- Monomer counts must persist throughout the workflow.
- The system should optionally support named monomers.
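
To make the coupling requirement concrete, here is a minimal sketch of how the split v0.1-beta.0 dictionaries could be paired up once at load time, so that later stages never see a SMILES string without its count. The function name `pair_legacy_monomers` and the decision to raise on mismatched keys are assumptions for illustration, not existing behavior of the workflow.

```python
def pair_legacy_monomers(config: dict) -> list[dict]:
    """Pair each SMILES string with its count from the split v0.1-beta.0 layout.

    Hypothetical helper: the name and error policy are illustrative only.
    """
    smiles_by_key = config["monomers"]
    counts_by_key = config["number_of_monomers"]

    # Keys present in only one of the two dictionaries mean a monomer
    # would silently lose its SMILES or its count.
    mismatched = set(smiles_by_key) ^ set(counts_by_key)
    if mismatched:
        raise ValueError(
            f"monomer keys without a matching entry: {sorted(mismatched)}"
        )

    # Sort keys numerically so that "10" does not come before "2".
    return [
        {"smiles": smiles_by_key[key], "count": counts_by_key[key]}
        for key in sorted(smiles_by_key, key=int)
    ]


legacy = {
    "monomers": {"1": "O=C(O)CCCCO", "2": "O=C(O)c1ccc(O)cc1", "3": "CCO"},
    "number_of_monomers": {"1": 500, "2": 400, "3": 300},
}
paired = pair_legacy_monomers(legacy)
```

Each element of `paired` carries both fields together, which is the property the rest of the workflow would need to preserve.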
Proposed Schema Improvement
Replace the split dictionaries with a unified monomer structure.
The system should accept either a list of named monomers:
{
  "monomers": [
    {
      "name": "adipic_acid",
      "smiles": "O=C(O)CCCCO",
      "count": 500
    },
    {
      "name": "terephthalic_acid",
      "smiles": "O=C(O)c1ccc(O)cc1",
      "count": 400
    },
    {
      "name": "ethanol",
      "smiles": "CCO",
      "count": 300
    }
  ]
}
or a minimal list without names:
{
  "monomers": [
    {
      "smiles": "O=C(O)CCCCO",
      "count": 500
    },
    {
      "smiles": "O=C(O)c1ccc(O)cc1",
      "count": 400
    },
    {
      "smiles": "CCO",
      "count": 300
    }
  ]
}
Both formats should be normalized by the input parser into a canonical internal representation, with name set to None for any monomer supplied without one:
[
  {
    "id": 1,
    "name": "adipic_acid",
    "smiles": "O=C(O)CCCCO",
    "count": 500
  },
  {
    "id": 2,
    "name": "terephthalic_acid",
    "smiles": "O=C(O)c1ccc(O)cc1",
    "count": 400
  },
  {
    "id": 3,
    "name": None,
    "smiles": "CCO",
    "count": 300
  }
]
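
One way the normalization step might look is sketched below. The function name `normalize_monomers` is hypothetical, and the validation rules (non-empty SMILES, positive integer count, unique names) are assumptions that go beyond what this issue specifies. Ids are assigned from list order, starting at 1, to match the canonical example.

```python
from typing import Any


def normalize_monomers(config: dict[str, Any]) -> list[dict[str, Any]]:
    """Convert the proposed "monomers" list into the canonical representation.

    Hypothetical sketch: accepts entries with or without a "name" field.
    """
    entries = config["monomers"]
    if not isinstance(entries, list):
        raise TypeError('"monomers" must be a list of objects')

    canonical = []
    seen_names = set()
    for index, entry in enumerate(entries, start=1):
        smiles = entry.get("smiles")
        count = entry.get("count")
        name = entry.get("name")  # optional; stays None when omitted

        if not isinstance(smiles, str) or not smiles:
            raise ValueError(f"monomer {index}: 'smiles' must be a non-empty string")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(count, bool) or not isinstance(count, int) or count <= 0:
            raise ValueError(f"monomer {index}: 'count' must be a positive integer")
        # Assumed rule: names, when given, must be unique.
        if name is not None:
            if name in seen_names:
                raise ValueError(f"monomer {index}: duplicate name {name!r}")
            seen_names.add(name)

        canonical.append(
            {"id": index, "name": name, "smiles": smiles, "count": count}
        )
    return canonical


mixed = {
    "monomers": [
        {"name": "adipic_acid", "smiles": "O=C(O)CCCCO", "count": 500},
        {"name": "terephthalic_acid", "smiles": "O=C(O)c1ccc(O)cc1", "count": 400},
        {"smiles": "CCO", "count": 300},
    ]
}
normalized = normalize_monomers(mixed)
```

Because SMILES, count, and optional name travel in one record, downstream stages can pass the list along unchanged instead of re-joining two dictionaries by key.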