Refactor reaction mapping and duplicate handling for #25 #48

Merged
janitha-mahanthe merged 26 commits into refactor-into-class-based-structure from 25-v02-refactor-into-class-based
Mar 31, 2026

@janitha-mahanthe
Member

This PR focuses on addressing sub-issues #24 and #40 under the main refactor issue #25.

Summary

This PR continues the v0.2 class-based refactor by significantly improving the reaction preparation workflow. It introduces a more stable method for tracking atoms during reactions by leveraging isotope numbers instead of relying solely on map numbers, and it implements a structured approach for handling duplicate reaction instances.

Key Changes

1. Robust Atom Tracking via Isotope Numbers (Addresses #24)

Previously, atom mapping was handled purely using map numbers. This approach was unstable because custom map numbers often vanished after the reaction step, requiring us to force or manually manipulate them using methods like _reveal_template_map_numbers(self, mol: Chem.Mol) -> None.

This PR replaces that brittle approach with a cleaner workflow:

  • Consistent Numbering: Map numbers now start from 1001 for reactant 1 and 2001 for reactant 2.
  • Isotope Mapping: These same numbers are simultaneously assigned as isotope numbers.
  • Persistence: Unlike map numbers, isotope numbers do not vanish during the reaction. This allows us to reliably track atoms post-reaction using the isotope numbers, and then cleanly clear the labels afterward. This is more straightforward than the old _reveal_template_map_numbers workaround.
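
The numbering and lookup idea above can be sketched without RDKit, using plain lists of integer labels in place of atom isotope fields. This is only an illustration: the function names `assign_labels` and `build_index_mapping` are hypothetical and are not the methods in this PR, and a label of 0 is assumed to mean "no label set".

```python
from typing import Dict, List

# Label bases taken from the description above: reactant 1 starts at 1001,
# reactant 2 at 2001. Everything else in this sketch is an assumption.
REACTANT_BASES = (1001, 2001)


def assign_labels(atom_counts: List[int]) -> List[List[int]]:
    """Return one list of labels per reactant, e.g. [[1001, 1002], [2001]]."""
    labels = []
    for base, count in zip(REACTANT_BASES, atom_counts):
        if count > 1000:
            # 1001 + 1000 would collide with the next reactant's block.
            raise ValueError("more than 1000 atoms would overlap the next label block")
        labels.append([base + i for i in range(count)])
    return labels


def build_index_mapping(
    reactant_labels: List[int], product_labels: List[int]
) -> Dict[int, int]:
    """Map reactant atom index -> product atom index by matching labels.

    Labels equal to 0 are treated as unlabelled and skipped. Reactant atoms
    whose label is absent from the product (e.g. byproduct atoms) are omitted.
    """
    product_index_by_label = {
        label: idx for idx, label in enumerate(product_labels) if label != 0
    }
    return {
        r_idx: product_index_by_label[label]
        for r_idx, label in enumerate(reactant_labels)
        if label in product_index_by_label
    }


# Reactant 1 has 3 atoms, reactant 2 has 2 atoms.
per_reactant = assign_labels([3, 2])
reactant_labels = per_reactant[0] + per_reactant[1]

# Hypothetical product atom order after a reaction; atom 1003 left as a byproduct.
product_labels = [2001, 1001, 1002, 2002]
mapping = build_index_mapping(reactant_labels, product_labels)
```

Because the labels survive on the product atoms, the mapping is recovered by a dictionary lookup rather than by re-deriving map numbers from the template. Once the mapping is built, the labels can be reset to 0.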

2. Duplicate Reaction Handling (Addresses #40)

To cleanly handle potential duplicate reaction instances, duplicate detection is now properly structured:

  • New Method: Introduced _detect_duplicates(self, reaction_metadata_list: list[ReactionMetadata]) -> list[ReactionMetadata] to handle the detection logic.
  • Metadata Dataclass: This method utilizes the new ReactionMetadata dataclass. Setting activity_stats to False helps cleanly manage and filter out duplicates.
    @dataclass(slots=True)
    class ReactionMetadata:
        activity_stats: bool = False
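
A minimal sketch of how a flag-based duplicate pass could look. This is not the PR's `_detect_duplicates`: the `ReactionRecord` class, the `reaction_key` field, and the convention that `True` means "keep" while duplicates are switched to `False` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReactionRecord:
    # Hypothetical stand-in for ReactionMetadata. Only the activity_stats
    # field name comes from the PR; the rest is assumed.
    reaction_key: str            # assumed: canonical reactants + products string
    activity_stats: bool = True  # assumed: True = keep, False = duplicate


def flag_duplicates(records: List[ReactionRecord]) -> List[ReactionRecord]:
    """Keep the first record per key active; mark later repeats inactive."""
    seen = set()
    for record in records:
        if record.reaction_key in seen:
            record.activity_stats = False
        else:
            seen.add(record.reaction_key)
    return records


records = flag_duplicates(
    [
        ReactionRecord("A.B>>C"),
        ReactionRecord("A.D>>E"),
        ReactionRecord("A.B>>C"),  # same key as the first record
    ]
)
active = [r.reaction_key for r in records if r.activity_stats]
```

Flagging instead of deleting keeps every instance in the list, so later steps can still report which reactions were skipped and why.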

Major refactor and cleanup across the reaction preparation pipeline:
Rework PrepareReactions to a more modular pipeline and improve mapping/data handling.

- Added csv_path to ReactionMetadata and made template_reactant_to_product_mapping/edge_atoms optional.
- Introduced run_reaction_template_pipeline, process_reaction_instances and _process_reaction_products to split responsibilities (mapping, template detection, CSV output).
- Replaced/renamed several helpers: _prepare_paths now uses self.cache, added _is_consecutive, _smart_mapping, _build_reactants, _build_reaction, _add_dict_as_new_columns and a safer _add_column_safe.
- Improved atom-mapping validation (consecutive indices, full coverage), byproduct detection, initiator checks, and CSV persistence for each processed reaction.
- Hardened DataFrame column additions with nullable Int64 dtypes and handled potential None values in template/edge fields.
- Updated reaction_templates_highlighted_image_grid to tolerate missing fields and use RWMol for visualization.

still a broken code
Refactor reaction preparation flow
Refactor reaction preparation to simplify and harden atom-mapping. Reactant copies are created once per product set and unique atom map numbers are assigned via _assign_atom_map_numbers (using isotope fields to persist IDs through RDKit reactions).
Ensure reactant atoms receive the same atom map numbers as their mapped product atoms for correct visualization. Add a new _clear_isotopes(mol1, mol2) helper and call it before saving CSVs to reset isotopes used as temporary custom IDs and restore normal chemistry. Simplify total_products computation and rename parameters in _assign_atom_map_numbers for clarity.
Restructure PrepareReactions by extracting and consolidating atom-mapping logic into helper methods (_reassign_atom_map_numbers_by_isotope, _build_atom_index_mapping, _assign_first_shell_and_initiators).
Reorganize PrepareReactions by converting several pipeline methods to private implementations and grouping related helpers. Renamed public pipeline entry points (process_reaction_instances -> _process_reaction_instances, process_reaction_products -> _process_reaction_products), moved and reinserted _detect_duplicates, and added new helper methods (_detect_byproducts, _validate_mapping, _assign_atom_map_numbers). Reordered methods into logical sections (PUBLIC, PIPELINE STEPS (PRIVATE), CORE REACTION LOGIC, ATOM MAPPING, BUILDERS, HELPERS, VISUALIZATION) and adjusted internal calls accordingly to improve encapsulation, readability, and maintainability while preserving existing behavior.
Enhance PrepareReactions in prepare_reactions.py: remove unused imports (itertools, os), add type annotations to _assign_first_shell_and_initiators, and replace the previous brittle _validate_mapping checks with robust validation. The new validation ensures the dataframe and required columns exist, checks for matching counts, duplicate indices, index bounds, and full coverage, and raises MappingError with informative messages on failure. Also includes small whitespace/formatting cleanup.
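
The checks listed in the last commit message (matching counts, duplicate indices, index bounds, full coverage) can be sketched on plain index lists. This is an assumption-laden illustration and not the PR's `_validate_mapping`: the real method works on a dataframe, the local `MappingError` class is a stand-in, and "full coverage" is assumed here to mean that every reactant atom index appears exactly once.

```python
from typing import List


class MappingError(ValueError):
    """Stand-in for the MappingError named in the commit message."""


def validate_mapping(
    reactant_idx: List[int],
    product_idx: List[int],
    n_reactant_atoms: int,
    n_product_atoms: int,
) -> None:
    """Raise MappingError with a specific message on the first failed check."""
    if len(reactant_idx) != len(product_idx):
        raise MappingError(
            f"count mismatch: {len(reactant_idx)} reactant vs {len(product_idx)} product indices"
        )
    if len(set(reactant_idx)) != len(reactant_idx):
        raise MappingError("duplicate reactant indices")
    if len(set(product_idx)) != len(product_idx):
        raise MappingError("duplicate product indices")
    out_of_bounds = [i for i in product_idx if not 0 <= i < n_product_atoms]
    if out_of_bounds:
        raise MappingError(f"product indices out of bounds: {out_of_bounds}")
    # Assumed meaning of "full coverage": reactant indices are exactly 0..n-1.
    if set(reactant_idx) != set(range(n_reactant_atoms)):
        raise MappingError("reactant indices do not cover every reactant atom")


def failure_message(*args) -> str:
    """Helper for the demo: return the error text, or 'ok' if validation passes."""
    try:
        validate_mapping(*args)
    except MappingError as exc:
        return str(exc)
    return "ok"
```

Raising a dedicated exception with a message per check makes a failed mapping easier to diagnose than a bare boolean result.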

Copilot AI left a comment

Pull request overview

This PR advances the v0.2 class-based refactor by moving reaction preparation toward an isotope-based atom tracking approach (to survive RDKit reaction execution) and adding structured duplicate handling for reaction instances, while also expanding the LUNAR preparation client modules.

Changes:

  • Introduces a new PrepareReactions pipeline that assigns atom map numbers + isotopes, rebuilds reactant↔product mappings post-reaction, and annotates template/edge atoms.
  • Adds a duplicate-reaction detection pass based on canonicalized reactant+product structure comparison.
  • Adds/updates LUNAR client utilities for 3D preparation, template/map generation, and running the LUNAR toolchain.

Reviewed changes

Copilot reviewed 14 out of 22 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| examples/example_1.ipynb | Updates the example workflow to use PrepareReactions and adds optional visualization steps. |
| AutoREACTER/reaction_template_builder/reaction_template_pipeline/util.py | Removes legacy utility functions from the old template builder pipeline. |
| AutoREACTER/reaction_template_builder/reaction_template_pipeline/map_reactant_atoms.py | Removes the legacy reactant/product mapping implementation. |
| AutoREACTER/reaction_template_builder/lunar_client/config.py | Removes legacy LUNAR config location under the old template builder path. |
| AutoREACTER/reaction_preparation/reaction_processor/walker.py | Disables a debug print during atom walking. |
| AutoREACTER/reaction_preparation/reaction_processor/utils.py | Adds shared helpers for CSV paths, dataframe augmentation, reference extraction, and duplicate comparison. |
| AutoREACTER/reaction_preparation/reaction_processor/prepare_reactions.py | Adds the new reaction preparation pipeline with isotope-based atom identity recovery and metadata output. |
| AutoREACTER/reaction_preparation/reaction_processor/fragment_comparison.py | Adds fragment extraction/capping + history-based duplicate transformation comparison utilities. |
| AutoREACTER/reaction_preparation/reaction_processor/atom_mapping.py | Adds SMARTS-to-reactant mapping helper (extracted from legacy code). |
| AutoREACTER/reaction_preparation/lunar_client/molecule_template_preparation.py | Adds template trimming/reindexing and .map generation for bond/react workflows. |
| AutoREACTER/reaction_preparation/lunar_client/molecule_3d_preparation.py | Adds 3D embedding/optimization utilities with fragment separation. |
| AutoREACTER/reaction_preparation/lunar_client/lunar_api_wrapper.py | Adds a wrapper around LUNAR utilities (atom typing, all2lmp, bond_react_merge) plus path normalization helpers. |
| AutoREACTER/reaction_preparation/lunar_client/locate_lunar.py | Adds interactive/auto-detection logic for locating a LUNAR install and persisting it to config. |
| AutoREACTER/reaction_preparation/lunar_client/config.py | Adds a LUNAR root config file (currently committed with a hardcoded absolute path). |
| AutoREACTER/reaction_preparation/build_reaction_system.py | Refactors the reaction template pipeline module structure (currently has import/dataclass issues). |
| AutoREACTER/detectors/reactions_library.py | Renames functional group keys used in hydroxy acid halide reactions. |
| AutoREACTER/detectors/reaction_detector.py | Adjusts ReactionInstance (adds same_reactants) and changes duplicate-key logic to use stable string identifiers. |
| AutoREACTER/detectors/functional_groups_detector.py | Makes FunctionalGroupInfo mutable (removes frozen=True). |
| .vscode/settings.json | Updates workspace Python environment manager settings (conda). |

Comments suppressed due to low confidence (3)

AutoREACTER/reaction_preparation/build_reaction_system.py:59

  • Importing ReactionTemplate from AutoREACTER.detectors.reaction_detector will fail because ReactionTemplate is no longer defined/exported there. Remove this import or reintroduce ReactionTemplate in reaction_detector.py (and update all callers accordingly).

AutoREACTER/reaction_preparation/build_reaction_system.py:106

  • ReactionMetadata dataclass has non-default fields (reactant_to_product_mapping, template_reactant_to_product_mapping, edge_atoms) declared after defaulted fields (e.g., reaction_smarts, csv_path). This raises TypeError: non-default argument ... follows default argument at import time. Reorder fields so all required (non-default) fields come first, and make csv_path/others Optional[...] if they can be None.

AutoREACTER/reaction_preparation/build_reaction_system.py:56

  • build_reaction_system.py still imports from .reaction_template_pipeline.* / reaction_template_pipeline.*, but that package doesn't exist in AutoREACTER/reaction_preparation (and the old reaction_template_builder/reaction_template_pipeline files were removed in this PR). As a result, importing this module will raise ImportError immediately. Update these imports to the new reaction_preparation.reaction_processor modules (or remove the dead fallback path).
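
The dataclass field-ordering problem flagged for line 106 can be reproduced in isolation. The class names and field types below are illustrative assumptions and do not reproduce the real ReactionMetadata definition. Only the field names are taken from the review comment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

error_message = None
try:
    # A required field declared after a defaulted one fails when the class
    # body is processed, i.e. as soon as the module is imported.
    @dataclass
    class BrokenMetadata:
        reaction_smarts: str = ""
        reactant_to_product_mapping: Dict[int, int]
except TypeError as exc:
    error_message = str(exc)


# One possible fix: required fields first, everything that may be absent
# gets an Optional type and a None default.
@dataclass
class FixedMetadata:
    reactant_to_product_mapping: Dict[int, int]
    reaction_smarts: str = ""
    csv_path: Optional[str] = None
    template_reactant_to_product_mapping: Optional[Dict[int, int]] = None
    edge_atoms: Optional[List[int]] = None


meta = FixedMetadata(reactant_to_product_mapping={0: 1})
```

On Python 3.10 or newer, `@dataclass(kw_only=True)` is an alternative that lifts the ordering restriction, at the cost of requiring keyword arguments at every call site.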

janitha-mahanthe and others added 3 commits March 31, 2026 10:22
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…actions.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@janitha-mahanthe
Member Author

#24 and #40 solved


3 participants