Refactor reaction mapping and duplicate handling for #25 by janitha-mahanthe · Pull Request #48 · NanoCIPHER-Lab/AutoREACTER

janitha-mahanthe · 2026-03-31T01:26:07Z

This PR focuses on addressing sub-issues #24 and #40 under the main refactor issue #25.

Summary

This PR continues the v0.2 class-based refactor by significantly improving the reaction preparation workflow. It introduces a more stable method for tracking atoms during reactions by leveraging isotope numbers instead of relying solely on map numbers, and it implements a structured approach for handling duplicate reaction instances.

Key Changes

1. Robust Atom Tracking via Isotope Numbers (Addresses #24)

Previously, atom mapping was handled purely using map numbers. This approach was unstable because custom map numbers often vanished after the reaction step, requiring us to force or manually manipulate them using methods like _reveal_template_map_numbers(self, mol: Chem.Mol) -> None.

This PR replaces that brittle approach with a cleaner workflow:

Consistent Numbering: Map numbers now start from 1001 for reactant 1 and 2001 for reactant 2.
Isotope Mapping: These same numbers are simultaneously assigned as isotope numbers.
Persistence: Unlike map numbers, isotope numbers survive the reaction step. This lets us reliably track atoms after the reaction by their isotope numbers and then clear the labels afterward. This replaces the old _reveal_template_map_numbers workaround with a more straightforward approach.
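The numbering scheme above can be sketched in plain Python, independent of RDKit. This is a minimal illustration, not the PR's implementation. `assign_tracking_ids`, `decode_tracking_id` and `BLOCK_SIZE` are hypothetical names, and the scheme assumes each reactant has fewer than 1000 atoms, since reactant 1's IDs would otherwise run into reactant 2's block. In RDKit, each ID would be written to an atom with both `Atom.SetAtomMapNum` and `Atom.SetIsotope`.

```python
# Hypothetical sketch: label atoms with reactant-specific IDs (1001+, 2001+, ...)
# so each ID encodes which reactant and which atom index it came from.

BLOCK_SIZE = 1000  # assumption: one block of 1000 IDs per reactant


def assign_tracking_ids(atom_counts: list[int]) -> list[list[int]]:
    """Return one list of IDs per reactant: reactant k (0-based) gets
    (k + 1) * BLOCK_SIZE + 1, (k + 1) * BLOCK_SIZE + 2, ..."""
    ids = []
    for k, n_atoms in enumerate(atom_counts):
        # 999 atoms is the maximum before colliding with the next reactant's block.
        if n_atoms >= BLOCK_SIZE:
            raise ValueError(f"reactant {k + 1} has {n_atoms} atoms; IDs would overflow")
        base = (k + 1) * BLOCK_SIZE
        ids.append([base + i + 1 for i in range(n_atoms)])
    return ids


def decode_tracking_id(tag: int) -> tuple[int, int]:
    """Map an ID back to (reactant number, 0-based atom index)."""
    reactant, offset = divmod(tag, BLOCK_SIZE)
    return reactant, offset - 1
```

For example, `assign_tracking_ids([3, 2])` gives `[[1001, 1002, 1003], [2001, 2002]]`, and `decode_tracking_id(2002)` returns `(2, 1)`, i.e. atom index 1 of reactant 2. Because the reactant of origin is encoded in the number itself, a product atom's isotope alone is enough to trace it back.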

2. Duplicate Reaction Handling (Addresses #40)

To cleanly handle potential duplicate reaction instances, duplicate detection is now properly structured:

New Method: Introduced _detect_duplicates(self, reaction_metadata_list: list[ReactionMetadata]) -> list[ReactionMetadata] to handle the detection logic.
Metadata Dataclass: This method uses the new ReactionMetadata dataclass. Setting a reaction's activity_stats flag to False marks it as a duplicate so it can be filtered out cleanly.
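```python
from dataclasses import dataclass


@dataclass(slots=True)  # slots=True requires Python 3.10+
class ReactionMetadata:
    # Excerpt: only the duplicate-tracking flag is shown; other fields are omitted.
    activity_stats: bool = False
```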

Major refactor and cleanup across the reaction preparation pipeline:

Rework PrepareReactions into a more modular pipeline and improve mapping/data handling.
- Added csv_path to ReactionMetadata and made template_reactant_to_product_mapping/edge_atoms optional.
- Introduced run_reaction_template_pipeline, process_reaction_instances and _process_reaction_products to split responsibilities (mapping, template detection, CSV output).
- Replaced/renamed several helpers: _prepare_paths now uses self.cache; added _is_consecutive, _smart_mapping, _build_reactants, _build_reaction, _add_dict_as_new_columns and a safer _add_column_safe.
- Improved atom-mapping validation (consecutive indices, full coverage), byproduct detection, initiator checks, and CSV persistence for each processed reaction.
- Hardened DataFrame column additions with nullable Int64 dtypes and handled potential None values in template/edge fields.
- Updated reaction_templates_highlighted_image_grid to tolerate missing fields and use RWMol for visualization.

Note: the code was still broken at this commit.

Refactor reaction preparation flow

Refactor reaction preparation to simplify and harden atom-mapping. Reactant copies are created once per product set and unique atom map numbers are assigned via _assign_atom_map_numbers (using isotope fields to persist IDs through RDKit reactions).

Ensure reactant atoms receive the same atom map numbers as their mapped product atoms for correct visualization. Add a new _clear_isotopes(mol1, mol2) helper and call it before saving CSVs to reset the isotope labels used as temporary custom IDs, so the saved molecules carry no artificial isotopes. Simplify the total_products computation and rename parameters in _assign_atom_map_numbers for clarity.
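The isotope round trip described above (recover the reactant-to-product correspondence from shared isotope labels, copy matching map numbers onto both sides, then clear the isotopes) can be modeled without RDKit. `ToyAtom`, `build_atom_index_mapping`, `sync_map_numbers` and `clear_isotopes` are hypothetical stand-ins loosely modeled on the PR's _build_atom_index_mapping, _reassign_atom_map_numbers_by_isotope and _clear_isotopes, whose actual signatures may differ. In RDKit the corresponding per-atom calls are `GetIsotope`/`SetIsotope` and `SetAtomMapNum`. Resetting every isotope to 0 assumes the input molecules carried no genuine isotope labels (such as deuterium) that should be preserved.

```python
from dataclasses import dataclass


@dataclass
class ToyAtom:
    """Hypothetical stand-in for an RDKit atom: only the fields used here."""
    symbol: str
    isotope: int = 0  # temporary tracking ID (0 = unlabeled)
    map_num: int = 0


def build_atom_index_mapping(reactant_atoms: list[ToyAtom], product_atoms: list[ToyAtom]) -> dict[int, int]:
    """Map reactant atom index -> product atom index via shared isotope labels."""
    product_idx_by_tag = {a.isotope: i for i, a in enumerate(product_atoms) if a.isotope}
    # Reactant atoms whose label never appears in the product (e.g. leaving atoms) are skipped.
    return {
        r_idx: product_idx_by_tag[a.isotope]
        for r_idx, a in enumerate(reactant_atoms)
        if a.isotope in product_idx_by_tag
    }


def sync_map_numbers(reactant_atoms: list[ToyAtom], product_atoms: list[ToyAtom], mapping: dict[int, int]) -> None:
    # Give each mapped product atom and its source reactant atom the same map number,
    # here simply the isotope tag they already share.
    for r_idx, p_idx in mapping.items():
        tag = reactant_atoms[r_idx].isotope
        reactant_atoms[r_idx].map_num = tag
        product_atoms[p_idx].map_num = tag


def clear_isotopes(*atom_lists: list[ToyAtom]) -> None:
    # Reset the temporary ID labels so saved structures carry no artificial isotopes.
    for atoms in atom_lists:
        for a in atoms:
            a.isotope = 0
```

For instance, with reactant atoms labeled `[1001, 1002, 2001, 2002]` and product atoms labeled `[2001, 1002, 1001]`, the mapping is `{0: 2, 1: 1, 2: 0}`; the atom tagged 2002 is absent from the product and therefore stays unmapped.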

Restructure PrepareReactions by extracting and consolidating atom-mapping logic into helper methods (_reassign_atom_map_numbers_by_isotope, _build_atom_index_mapping, _assign_first_shell_and_initiators).

Reorganize PrepareReactions by converting several pipeline methods to private implementations and grouping related helpers. Renamed public pipeline entry points (process_reaction_instances -> _process_reaction_instances, process_reaction_products -> _process_reaction_products), moved and reinserted _detect_duplicates, and added new helper methods (_detect_byproducts, _validate_mapping, _assign_atom_map_numbers). Reordered methods into logical sections (PUBLIC, PIPELINE STEPS (PRIVATE), CORE REACTION LOGIC, ATOM MAPPING, BUILDERS, HELPERS, VISUALIZATION) and adjusted internal calls accordingly to improve encapsulation, readability, and maintainability while preserving existing behavior.

Enhance PrepareReactions in prepare_reactions.py: remove unused imports (itertools, os), add type annotations to _assign_first_shell_and_initiators, and replace the previous brittle _validate_mapping checks with robust validation. The new validation ensures the dataframe and required columns exist, checks for matching counts, duplicate indices, index bounds, and full coverage, and raises MappingError with informative messages on failure. Also includes small whitespace/formatting cleanup.
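A minimal sketch of the checks listed above, assuming the mapping is given as two parallel lists of atom indices (standing in for the dataframe columns) and should be a complete one-to-one correspondence between reactant and product atoms. `MappingError` is redefined here as a hypothetical stand-in for the project's exception, and `validate_mapping` is illustrative, not the PR's _validate_mapping.

```python
class MappingError(ValueError):
    """Hypothetical stand-in for the project's MappingError."""


def validate_mapping(reactant_idx: list[int], product_idx: list[int],
                     n_reactant_atoms: int, n_product_atoms: int) -> None:
    """Raise MappingError unless the index pairs form a complete one-to-one mapping."""
    # Matching counts: every reactant index needs exactly one product partner.
    if len(reactant_idx) != len(product_idx):
        raise MappingError(
            f"count mismatch: {len(reactant_idx)} reactant vs {len(product_idx)} product indices"
        )
    for side, idx, n_atoms in (
        ("reactant", reactant_idx, n_reactant_atoms),
        ("product", product_idx, n_product_atoms),
    ):
        # Duplicate indices: the same atom mapped twice.
        if len(set(idx)) != len(idx):
            raise MappingError(f"duplicate {side} indices")
        # Index bounds: every index must refer to an existing atom.
        out_of_range = [i for i in idx if not 0 <= i < n_atoms]
        if out_of_range:
            raise MappingError(f"{side} indices out of bounds (0..{n_atoms - 1}): {out_of_range}")
        # Full coverage: no atom left unmapped.
        missing = sorted(set(range(n_atoms)) - set(idx))
        if missing:
            raise MappingError(f"{side} atoms not covered by the mapping: {missing}")
```

Together, the duplicate, bounds and coverage checks imply that each side's indices are exactly 0..n-1, which is the "consecutive indices" property the earlier commit checked separately.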

Copilot

Pull request overview

This PR advances the v0.2 class-based refactor by moving reaction preparation toward an isotope-based atom tracking approach (to survive RDKit reaction execution) and adding structured duplicate handling for reaction instances, while also expanding the LUNAR preparation client modules.

Changes:

Introduces a new PrepareReactions pipeline that assigns atom map numbers + isotopes, rebuilds reactant↔product mappings post-reaction, and annotates template/edge atoms.
Adds a duplicate-reaction detection pass based on canonicalized reactant+product structure comparison.
Adds/updates LUNAR client utilities for 3D preparation, template/map generation, and running the LUNAR toolchain.

Reviewed changes

Copilot reviewed 14 out of 22 changed files in this pull request and generated 4 comments.


| File | Description |
| --- | --- |
| `examples/example_1.ipynb` | Updates the example workflow to use `PrepareReactions` and adds optional visualization steps. |
| `AutoREACTER/reaction_template_builder/reaction_template_pipeline/util.py` | Removes legacy utility functions from the old template builder pipeline. |
| `AutoREACTER/reaction_template_builder/reaction_template_pipeline/map_reactant_atoms.py` | Removes the legacy reactant/product mapping implementation. |
| `AutoREACTER/reaction_template_builder/lunar_client/config.py` | Removes legacy LUNAR config location under the old template builder path. |
| `AutoREACTER/reaction_preparation/reaction_processor/walker.py` | Disables a debug print during atom walking. |
| `AutoREACTER/reaction_preparation/reaction_processor/utils.py` | Adds shared helpers for CSV paths, dataframe augmentation, reference extraction, and duplicate comparison. |
| `AutoREACTER/reaction_preparation/reaction_processor/prepare_reactions.py` | Adds the new reaction preparation pipeline with isotope-based atom identity recovery and metadata output. |
| `AutoREACTER/reaction_preparation/reaction_processor/fragment_comparison.py` | Adds fragment extraction/capping + history-based duplicate transformation comparison utilities. |
| `AutoREACTER/reaction_preparation/reaction_processor/atom_mapping.py` | Adds SMARTS-to-reactant mapping helper (extracted from legacy code). |
| `AutoREACTER/reaction_preparation/lunar_client/molecule_template_preparation.py` | Adds template trimming/reindexing and `.map` generation for bond/react workflows. |
| `AutoREACTER/reaction_preparation/lunar_client/molecule_3d_preparation.py` | Adds 3D embedding/optimization utilities with fragment separation. |
| `AutoREACTER/reaction_preparation/lunar_client/lunar_api_wrapper.py` | Adds a wrapper around LUNAR utilities (atom typing, all2lmp, bond_react_merge) plus path normalization helpers. |
| `AutoREACTER/reaction_preparation/lunar_client/locate_lunar.py` | Adds interactive/auto-detection logic for locating a LUNAR install and persisting it to config. |
| `AutoREACTER/reaction_preparation/lunar_client/config.py` | Adds a LUNAR root config file (currently committed with a hardcoded absolute path). |
| `AutoREACTER/reaction_preparation/build_reaction_system.py` | Refactors the reaction template pipeline module structure (currently has import/dataclass issues). |
| `AutoREACTER/detectors/reactions_library.py` | Renames functional group keys used in hydroxy acid halide reactions. |
| `AutoREACTER/detectors/reaction_detector.py` | Adjusts `ReactionInstance` (adds `same_reactants`) and changes duplicate-key logic to use stable string identifiers. |
| `AutoREACTER/detectors/functional_groups_detector.py` | Makes `FunctionalGroupInfo` mutable (removes `frozen=True`). |
| `.vscode/settings.json` | Updates workspace Python environment manager settings (conda). |

Comments suppressed due to low confidence (3)

AutoREACTER/reaction_preparation/build_reaction_system.py:59

Importing ReactionTemplate from AutoREACTER.detectors.reaction_detector will fail because ReactionTemplate is no longer defined/exported there. Remove this import or reintroduce ReactionTemplate in reaction_detector.py (and update all callers accordingly).
AutoREACTER/reaction_preparation/build_reaction_system.py:106
ReactionMetadata dataclass has non-default fields (reactant_to_product_mapping, template_reactant_to_product_mapping, edge_atoms) declared after defaulted fields (e.g., reaction_smarts, csv_path). This raises `TypeError: non-default argument ... follows default argument` at import time. Reorder fields so all required (non-default) fields come first, and make csv_path/others Optional[...] if they can be None.
AutoREACTER/reaction_preparation/build_reaction_system.py:56
build_reaction_system.py still imports from .reaction_template_pipeline.* / reaction_template_pipeline.*, but that package doesn't exist in AutoREACTER/reaction_preparation (and the old reaction_template_builder/reaction_template_pipeline files were removed in this PR). As a result, importing this module will raise ImportError immediately. Update these imports to the new reaction_preparation.reaction_processor modules (or remove the dead fallback path).


AutoREACTER/reaction_preparation/reaction_processor/utils.py

AutoREACTER/reaction_preparation/lunar_client/config.py

AutoREACTER/reaction_preparation/reaction_processor/prepare_reactions.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

…actions.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

janitha-mahanthe · 2026-03-31T14:25:26Z

#24 and #40 solved

janitha-mahanthe added 23 commits March 11, 2026 13:33

Rename reaction_template_builder package

a5304ff

Implement reaction mapping pipeline

b289ea8

Integrate walker, add template mapping & image grid

11cc7ff

Update prepare_reactions.py

972b3ca

Update example_1.ipynb

2cbae0f

Fix utils imports, SMILES parsing and reactant names

b7dd6e5

Refactor reaction preparation and clean debug logs

c4a192d

Major refactor and cleanup across the reaction preparation pipeline:

Refactor prepare_reactions: typing & validations

933c700

Update example_1.ipynb

5b081aa

Refactor ReactionMetadata fields and callers

6a5b672

Refactor reaction processing and validation

302c3dd

Refactor reaction preparation flow

Use mapping dict to detect byproducts

9d55976

Refactor reaction processor and utils

4bcf277

Move same_reactants to ReactionInstance

dab7742

Refactor atom-mapping and add helpers

456473f

Refactor reaction preparation to simplify and harden atom-mapping. Reactant copies are created once per product set and unique atom map numbers are assigned via _assign_atom_map_numbers (using isotope fields to persist IDs through RDKit reactions).

Refactor reaction mapping and helper functions

efbfbb7

Restructure PrepareReactions by extracting and consolidating atom-mapping logic into helper methods (_reassign_atom_map_numbers_by_isotope, _build_atom_index_mapping, _assign_first_shell_and_initiators).

Update prepare_reactions.py

568dcb7

Update prepare_reactions.py

837fee9

Refactor reaction preparation and mapping

99e5e99

janitha-mahanthe requested a review from Copilot March 31, 2026 01:26

janitha-mahanthe self-assigned this Mar 31, 2026

Copilot started reviewing on behalf of janitha-mahanthe March 31, 2026 01:26

Copilot AI reviewed Mar 31, 2026


janitha-mahanthe and others added 3 commits March 31, 2026 10:22

Update AutoREACTER/reaction_preparation/reaction_processor/utils.py

bde496f

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Clear LUNAR_ROOT_DIR variable in config.py

2aaa6bf

Update AutoREACTER/reaction_preparation/reaction_processor/prepare_re…

4f34fa4

…actions.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot started work on behalf of janitha-mahanthe March 31, 2026 14:24

janitha-mahanthe merged commit 4dc1c00 into refactor-into-class-based-structure Mar 31, 2026
1 check failed

This was referenced Mar 31, 2026

v0.2: Notebook visualizations for reactant/product templates #24

Closed

Handle Potential Duplicate Reaction Instances #40

Closed

janitha-mahanthe mentioned this pull request Apr 8, 2026

Enhance system properties and refactor example notebooks #57

Merged

janitha-mahanthe merged 26 commits into refactor-into-class-based-structure from 25-v02-refactor-into-class-based

3 participants