Refactor reaction mapping and duplicate handling for #25 #48
Merged
janitha-mahanthe merged 26 commits into refactor-into-class-based-structure on Mar 31, 2026
Major refactor and cleanup across the reaction preparation pipeline:
Rework PrepareReactions into a more modular pipeline and improve mapping/data handling.
- Added csv_path to ReactionMetadata and made template_reactant_to_product_mapping/edge_atoms optional.
- Introduced run_reaction_template_pipeline, process_reaction_instances and _process_reaction_products to split responsibilities (mapping, template detection, CSV output).
- Replaced/renamed several helpers: _prepare_paths now uses self.cache; added _is_consecutive, _smart_mapping, _build_reactants, _build_reaction, _add_dict_as_new_columns and a safer _add_column_safe.
- Improved atom-mapping validation (consecutive indices, full coverage), byproduct detection, initiator checks, and CSV persistence for each processed reaction.
- Hardened DataFrame column additions with nullable Int64 dtypes and handled potential None values in template/edge fields.
- Updated reaction_templates_highlighted_image_grid to tolerate missing fields and use RWMol for visualization.
Note: the code is still broken at this commit.
Refactor reaction preparation flow
Refactor reaction preparation to simplify and harden atom-mapping. Reactant copies are created once per product set and unique atom map numbers are assigned via _assign_atom_map_numbers (using isotope fields to persist IDs through RDKit reactions).
Ensure reactant atoms receive the same atom map numbers as their mapped product atoms for correct visualization. Add a new _clear_isotopes(mol1, mol2) helper and call it before saving CSVs to reset isotopes used as temporary custom IDs and restore normal chemistry. Simplify total_products computation and rename parameters in _assign_atom_map_numbers for clarity.
Restructure PrepareReactions by extracting and consolidating atom-mapping logic into helper methods (_reassign_atom_map_numbers_by_isotope, _build_atom_index_mapping, _assign_first_shell_and_initiators).
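The isotope-tagging idea in the commits above can be illustrated without RDKit. The sketch below is a toy model, not the code in `prepare_reactions.py`: `Atom`, `tag_atoms`, `build_index_mapping` and `clear_tags` are hypothetical names, and the "reaction" is simulated by reordering and dropping atoms. It shows each reactant atom receiving a unique ID in its isotope field, the reactant-to-product index mapping being rebuilt from those IDs afterwards, and the tags being cleared before output.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Toy stand-in for an RDKit atom (hypothetical, illustration only)."""
    symbol: str
    isotope: int = 0   # 0 means "no isotope tag set"
    map_num: int = 0   # 0 means "no atom map number"


def tag_atoms(reactants: list[list[Atom]]) -> None:
    """Give every reactant atom a unique ID, stored in both fields."""
    next_id = 1
    for mol in reactants:
        for atom in mol:
            atom.isotope = next_id
            atom.map_num = next_id
            next_id += 1


def build_index_mapping(reactant_atoms: list[Atom],
                        product_atoms: list[Atom]) -> dict[int, int]:
    """Map reactant atom index -> product atom index by matching isotope tags."""
    tag_to_product_idx = {a.isotope: i for i, a in enumerate(product_atoms) if a.isotope}
    return {
        i: tag_to_product_idx[a.isotope]
        for i, a in enumerate(reactant_atoms)
        if a.isotope in tag_to_product_idx
    }


def clear_tags(*mols: list[Atom]) -> None:
    """Reset the isotope fields that were used as temporary IDs."""
    for mol in mols:
        for atom in mol:
            atom.isotope = 0


acid = [Atom("C"), Atom("O"), Atom("O")]
alcohol = [Atom("O"), Atom("C")]
tag_atoms([acid, alcohol])
reactant_atoms = acid + alcohol

# Simulated reaction (assumption): the product keeps four atoms in a new order,
# map numbers are lost, isotope tags survive. The third acid atom leaves as a byproduct.
product_atoms = [Atom(a.symbol, isotope=a.isotope)
                 for a in (alcohol[1], alcohol[0], acid[0], acid[1])]

mapping = build_index_mapping(reactant_atoms, product_atoms)

# Copy each reactant atom's map number onto its mapped product atom.
for r_idx, p_idx in mapping.items():
    product_atoms[p_idx].map_num = reactant_atoms[r_idx].map_num

# Reactant atoms with no partner in the product are byproduct candidates.
unmapped = [i for i in range(len(reactant_atoms)) if i not in mapping]

clear_tags(reactant_atoms, product_atoms)
```

Keeping the ID in a field that survives the reaction step means the mapping can be recomputed afterwards instead of being forced by hand, and clearing the tags at the end keeps them from being read as real isotopes downstream.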
Reorganize PrepareReactions by converting several pipeline methods to private implementations and grouping related helpers. Renamed public pipeline entry points (process_reaction_instances -> _process_reaction_instances, process_reaction_products -> _process_reaction_products), moved and reinserted _detect_duplicates, and added new helper methods (_detect_byproducts, _validate_mapping, _assign_atom_map_numbers). Reordered methods into logical sections (PUBLIC, PIPELINE STEPS (PRIVATE), CORE REACTION LOGIC, ATOM MAPPING, BUILDERS, HELPERS, VISUALIZATION) and adjusted internal calls accordingly to improve encapsulation, readability, and maintainability while preserving existing behavior.
Enhance PrepareReactions in prepare_reactions.py: remove unused imports (itertools, os), add type annotations to _assign_first_shell_and_initiators, and replace the previous brittle _validate_mapping checks with robust validation. The new validation ensures the dataframe and required columns exist, checks for matching counts, duplicate indices, index bounds, and full coverage, and raises MappingError with informative messages on failure. Also includes small whitespace/formatting cleanup.
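The checks listed in this commit message (matching counts, duplicate indices, index bounds, full coverage) could look roughly like the sketch below. It is a simplified assumption rather than the repository's `_validate_mapping`: it works on two plain index lists instead of a dataframe, `validate_mapping` and its parameters are hypothetical, and the local `MappingError` class only stands in for the project's own exception.

```python
class MappingError(ValueError):
    """Stand-in for the project's MappingError (assumed to be an exception type)."""


def validate_mapping(reactant_idx: list[int], product_idx: list[int], n_atoms: int) -> None:
    """Raise MappingError unless the two index lists form a complete one-to-one mapping."""
    # Matching counts: every reactant index needs exactly one product index.
    if len(reactant_idx) != len(product_idx):
        raise MappingError(
            f"count mismatch: {len(reactant_idx)} reactant vs {len(product_idx)} product indices"
        )

    for name, indices in (("reactant", reactant_idx), ("product", product_idx)):
        # Duplicate indices: no atom may be mapped twice.
        duplicates = sorted({i for i in indices if indices.count(i) > 1})
        if duplicates:
            raise MappingError(f"duplicate {name} indices: {duplicates}")

        # Index bounds: every index must refer to an existing atom.
        out_of_bounds = sorted(i for i in indices if not 0 <= i < n_atoms)
        if out_of_bounds:
            raise MappingError(f"{name} indices out of range 0..{n_atoms - 1}: {out_of_bounds}")

        # Full coverage: every atom index must appear in the mapping.
        missing = sorted(set(range(n_atoms)) - set(indices))
        if missing:
            raise MappingError(f"{name} indices not covered: {missing}")


# A valid three-atom permutation passes silently.
validate_mapping([0, 1, 2], [2, 0, 1], n_atoms=3)
```

Raising one exception type with a specific message per failed check makes it easier to tell from a log which reaction broke and why.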
Pull request overview
This PR advances the v0.2 class-based refactor by moving reaction preparation toward an isotope-based atom tracking approach (to survive RDKit reaction execution) and adding structured duplicate handling for reaction instances, while also expanding the LUNAR preparation client modules.
Changes:
- Introduces a new `PrepareReactions` pipeline that assigns atom map numbers + isotopes, rebuilds reactant↔product mappings post-reaction, and annotates template/edge atoms.
- Adds a duplicate-reaction detection pass based on canonicalized reactant+product structure comparison.
- Adds/updates LUNAR client utilities for 3D preparation, template/map generation, and running the LUNAR toolchain.
Reviewed changes
Copilot reviewed 14 out of 22 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| examples/example_1.ipynb | Updates the example workflow to use PrepareReactions and adds optional visualization steps. |
| AutoREACTER/reaction_template_builder/reaction_template_pipeline/util.py | Removes legacy utility functions from the old template builder pipeline. |
| AutoREACTER/reaction_template_builder/reaction_template_pipeline/map_reactant_atoms.py | Removes the legacy reactant/product mapping implementation. |
| AutoREACTER/reaction_template_builder/lunar_client/config.py | Removes legacy LUNAR config location under the old template builder path. |
| AutoREACTER/reaction_preparation/reaction_processor/walker.py | Disables a debug print during atom walking. |
| AutoREACTER/reaction_preparation/reaction_processor/utils.py | Adds shared helpers for CSV paths, dataframe augmentation, reference extraction, and duplicate comparison. |
| AutoREACTER/reaction_preparation/reaction_processor/prepare_reactions.py | Adds the new reaction preparation pipeline with isotope-based atom identity recovery and metadata output. |
| AutoREACTER/reaction_preparation/reaction_processor/fragment_comparison.py | Adds fragment extraction/capping + history-based duplicate transformation comparison utilities. |
| AutoREACTER/reaction_preparation/reaction_processor/atom_mapping.py | Adds SMARTS-to-reactant mapping helper (extracted from legacy code). |
| AutoREACTER/reaction_preparation/lunar_client/molecule_template_preparation.py | Adds template trimming/reindexing and .map generation for bond/react workflows. |
| AutoREACTER/reaction_preparation/lunar_client/molecule_3d_preparation.py | Adds 3D embedding/optimization utilities with fragment separation. |
| AutoREACTER/reaction_preparation/lunar_client/lunar_api_wrapper.py | Adds a wrapper around LUNAR utilities (atom typing, all2lmp, bond_react_merge) plus path normalization helpers. |
| AutoREACTER/reaction_preparation/lunar_client/locate_lunar.py | Adds interactive/auto-detection logic for locating a LUNAR install and persisting it to config. |
| AutoREACTER/reaction_preparation/lunar_client/config.py | Adds a LUNAR root config file (currently committed with a hardcoded absolute path). |
| AutoREACTER/reaction_preparation/build_reaction_system.py | Refactors the reaction template pipeline module structure (currently has import/dataclass issues). |
| AutoREACTER/detectors/reactions_library.py | Renames functional group keys used in hydroxy acid halide reactions. |
| AutoREACTER/detectors/reaction_detector.py | Adjusts ReactionInstance (adds same_reactants) and changes duplicate-key logic to use stable string identifiers. |
| AutoREACTER/detectors/functional_groups_detector.py | Makes FunctionalGroupInfo mutable (removes frozen=True). |
| .vscode/settings.json | Updates workspace Python environment manager settings (conda). |
Comments suppressed due to low confidence (3)
AutoREACTER/reaction_preparation/build_reaction_system.py:59
- Importing `ReactionTemplate` from `AutoREACTER.detectors.reaction_detector` will fail because `ReactionTemplate` is no longer defined/exported there. Remove this import or reintroduce `ReactionTemplate` in `reaction_detector.py` (and update all callers accordingly).
AutoREACTER/reaction_preparation/build_reaction_system.py:106
- `ReactionMetadata` dataclass has non-default fields (`reactant_to_product_mapping`, `template_reactant_to_product_mapping`, `edge_atoms`) declared after defaulted fields (e.g., `reaction_smarts`, `csv_path`). This raises `TypeError: non-default argument ... follows default argument` at import time. Reorder fields so all required (non-default) fields come first, and make `csv_path`/others `Optional[...]` if they can be `None`.
AutoREACTER/reaction_preparation/build_reaction_system.py:56
- `build_reaction_system.py` still imports from `.reaction_template_pipeline.*` / `reaction_template_pipeline.*`, but that package doesn't exist in `AutoREACTER/reaction_preparation` (and the old `reaction_template_builder/reaction_template_pipeline` files were removed in this PR). As a result, importing this module will raise `ImportError` immediately. Update these imports to the new `reaction_preparation.reaction_processor` modules (or remove the dead fallback path).
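The dataclass field-ordering problem flagged for line 106 can be reproduced and fixed in a few lines. This is a minimal sketch: the field names come from the review comment, but the field types and default values are assumptions, and it treats `template_reactant_to_product_mapping` and `edge_atoms` as optional, as an earlier commit message in this PR describes.

```python
from dataclasses import dataclass
from typing import Optional

# Reproduce the failure: a required field declared after a defaulted one.
error_message = ""
try:
    @dataclass
    class BrokenMetadata:
        csv_path: Optional[str] = None
        edge_atoms: list[int]  # no default, but follows a defaulted field
except TypeError as exc:
    error_message = str(exc)


# One possible fix: required fields first, everything optional afterwards.
@dataclass
class ReactionMetadata:
    reactant_to_product_mapping: dict[int, int]  # required (type is an assumption)
    reaction_smarts: str = ""
    csv_path: Optional[str] = None
    template_reactant_to_product_mapping: Optional[dict[int, int]] = None
    edge_atoms: Optional[list[int]] = None


meta = ReactionMetadata(reactant_to_product_mapping={0: 1, 1: 0})
```

Because the error is raised while the class body is being processed, it surfaces as soon as the module is imported, which matches the "at import time" wording in the comment.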
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…actions.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Merged commit 4dc1c00 into refactor-into-class-based-structure
1 check failed
This was referenced Mar 31, 2026
Copilot stopped work on behalf of janitha-mahanthe due to an error (March 31, 2026 14:28)
This PR focuses on addressing sub-issues #24 and #40 under the main refactor issue #25.
Summary
This PR continues the v0.2 class-based refactor by significantly improving the reaction preparation workflow. It introduces a more stable method for tracking atoms during reactions by leveraging isotope numbers instead of relying solely on map numbers, and it implements a structured approach for handling duplicate reaction instances.
Key Changes
1. Robust Atom Tracking via Isotope Numbers (Addresses #24)
Previously, atom mapping was handled purely using map numbers. This approach was unstable because custom map numbers often vanished after the reaction step, requiring us to force or manually manipulate them using methods like `_reveal_template_map_numbers(self, mol: Chem.Mol) -> None`.
This PR replaces that brittle approach with a cleaner workflow:
- … `_reveal_template_map_numbers` workaround is more straightforward.
2. Duplicate Reaction Handling (Addresses #40)
To cleanly handle potential duplicate reaction instances, duplicate detection is now properly structured:
- `_detect_duplicates(self, reaction_metadata_list: list[ReactionMetadata]) -> list[ReactionMetadata]` to handle the detection logic.
- … `ReactionMetadata` dataclass. Setting `activity_stats` to `False` helps cleanly manage and filter out duplicates.
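A rough illustration of how a `_detect_duplicates`-style pass might flag repeats via `activity_stats` follows. It is a sketch under stated assumptions, not the PR's implementation: the reduced `ReactionMetadata` below and its `reaction_id`, `reactant_smiles` and `product_smiles` fields are hypothetical, the SMILES strings are assumed to be canonical already, and the real code may compare structures differently.

```python
from dataclasses import dataclass


@dataclass
class ReactionMetadata:
    """Reduced, hypothetical stand-in for the project's dataclass."""
    reaction_id: int
    reactant_smiles: tuple[str, ...]  # assumed to be canonical SMILES
    product_smiles: tuple[str, ...]   # assumed to be canonical SMILES
    activity_stats: bool = True       # False marks a duplicate


def detect_duplicates(reaction_metadata_list: list[ReactionMetadata]) -> list[ReactionMetadata]:
    """Keep the first occurrence of each reaction active and flag later repeats."""
    seen: set[tuple[tuple[str, ...], tuple[str, ...]]] = set()
    for meta in reaction_metadata_list:
        # Sorting makes the key independent of reactant/product order.
        key = (tuple(sorted(meta.reactant_smiles)), tuple(sorted(meta.product_smiles)))
        if key in seen:
            meta.activity_stats = False
        else:
            seen.add(key)
    return reaction_metadata_list


reactions = detect_duplicates([
    ReactionMetadata(1, ("CC(=O)O", "CO"), ("COC(C)=O", "O")),
    ReactionMetadata(2, ("CO", "CC(=O)O"), ("O", "COC(C)=O")),  # same reaction, swapped order
    ReactionMetadata(3, ("CC(=O)O", "CCO"), ("CCOC(C)=O", "O")),
])
active = [m.reaction_id for m in reactions if m.activity_stats]
```

Flagging duplicates instead of deleting them keeps the full list available for inspection while letting later steps filter on `activity_stats`.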