Add pH and Temperature as input features to CatPred training pipeline #1

Draft
Copilot wants to merge 3 commits into main from copilot/add-ph-temp-features

Copilot AI commented Nov 26, 2025

Extends CatPred to support pH and temperature as additional molecule-level input features, with optional normalization and seamless integration into the existing feature pipeline.

Changes

Data Layer (catpred/data/data.py)

  • MoleculeDatapoint: Added ph and temp params with raw value storage for scaling
  • extend_features_with_ph_temp(): Appends pH/Temp to feature vector (None → 0.0)
  • MoleculeDataset: Added normalize_ph_temp(), ph_values(), temp_values(), has_ph_temp_features()
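
The extension step described above can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes the feature vector is a plain sequence of floats and uses a standalone function, whereas the actual method lives on MoleculeDatapoint and its signature is not shown in this description.

```python
from typing import List, Optional, Sequence


def extend_features_with_ph_temp(
    features: Optional[Sequence[float]],
    ph: Optional[float],
    temp: Optional[float],
) -> List[float]:
    """Return a copy of `features` with pH and temperature appended.

    Missing values (None) are appended as 0.0, matching the
    "None -> 0.0" behaviour stated in the PR description.
    """
    # Start from the existing molecule-level features, if any.
    extended = list(features) if features is not None else []
    extended.append(0.0 if ph is None else float(ph))
    extended.append(0.0 if temp is None else float(temp))
    return extended


print(extend_features_with_ph_temp([1.0, 2.0], 7.4, None))
```

If the values are standardized before this step, a 0.0 fill coincides with the training mean; without scaling it is a literal pH or temperature of zero.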

Arguments (catpred/args.py)

  • TrainArgs/PredictArgs: Added --ph_column, --temp_column, --no_ph_temp_features_scaling
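
As a rough illustration of how the three flags behave, the sketch below declares them with the standard-library argparse module. This is a stand-in only: the actual TrainArgs/PredictArgs classes may declare them differently, and the help strings here are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags named in this PR."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_path", type=str, required=True)
    # Both column flags default to None, so omitting them disables the feature.
    parser.add_argument("--ph_column", type=str, default=None,
                        help="Name of the CSV column holding pH values")
    parser.add_argument("--temp_column", type=str, default=None,
                        help="Name of the CSV column holding temperatures")
    # Scaling is on by default; passing this flag turns it off.
    parser.add_argument("--no_ph_temp_features_scaling", action="store_true",
                        help="Disable normalization of pH/Temp features")
    return parser


args = build_parser().parse_args(["--data_path", "data.csv", "--ph_column", "pH"])
print(args.ph_column, args.temp_column, args.no_ph_temp_features_scaling)
```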

Data Loading (catpred/data/utils.py)

  • Extracts pH/Temp from CSV columns with validation
  • Handles missing values: '', 'nan', 'None', 'null' → None
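
A minimal sketch of the loading behaviour listed above, using only the standard library. The helper names (parse_optional_float, read_column) are hypothetical, and exact, case-sensitive matching of the four missing-value tokens is an assumption; the PR description does not say how the comparison is done.

```python
import csv
import io
from typing import List, Optional

# Tokens the PR description lists as "missing".
MISSING_TOKENS = {"", "nan", "None", "null"}


def parse_optional_float(raw: Optional[str]) -> Optional[float]:
    """Map a raw CSV cell to a float, or None if it is a missing-value token."""
    if raw is None or raw.strip() in MISSING_TOKENS:
        return None
    return float(raw)  # raises ValueError for non-numeric text


def read_column(csv_text: str, column: str) -> List[Optional[float]]:
    """Read one numeric column, validating that it exists in the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"Column {column!r} not found in CSV header")
    return [parse_optional_float(row[column]) for row in reader]


example = "smiles,pH,temperature\nCCO,7.4,25\nCCN,,nan\nCCC,null,37.5\n"
print(read_column(example, "pH"))
print(read_column(example, "temperature"))
```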

Training (catpred/train/run_training.py)

  • Applies pH/Temp normalization (when enabled) and feature extension to train/val/test splits
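
One way to read the normalization step is sketched below. It assumes z-score standardization with statistics fitted on the training split and reused for validation and test; the PR description does not state which scheme normalize_ph_temp() uses, and the function names here are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def fit_scaler(values: Sequence[Optional[float]]) -> Tuple[float, float]:
    """Compute mean and standard deviation over the non-missing values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return 0.0, 1.0  # nothing to fit: identity transform
    mean = sum(observed) / len(observed)
    variance = sum((v - mean) ** 2 for v in observed) / len(observed)
    std = math.sqrt(variance)
    # Guard against division by zero when the column is constant.
    return mean, (std if std > 0 else 1.0)


def transform(
    values: Sequence[Optional[float]], mean: float, std: float
) -> List[Optional[float]]:
    """Standardize values, leaving missing entries as None."""
    return [None if v is None else (v - mean) / std for v in values]


train_ph = [6.0, 8.0, None]
val_ph = [7.0, None]

# Fit on the training split only, then apply the same statistics elsewhere.
mean, std = fit_scaler(train_ph)
print(transform(train_ph, mean, std))
print(transform(val_ph, mean, std))
```

Fitting on the training split alone keeps validation and test statistics out of the scaler.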

Usage

python train.py --data_path data.csv \
    --ph_column pH \
    --temp_column temperature \
    --dataset_type regression

The CSV must contain the columns named by --ph_column and --temp_column. Missing values ('', 'nan', 'None', 'null') are read as None and appended to the feature vector as 0.0. Scaling is enabled by default; disable it with --no_ph_temp_features_scaling.

Backward Compatibility

All new parameters default to None, so existing workflows are unaffected.

This pull request was created as a result of the following prompt from Copilot chat.

Add PH and Temp as input features to be included in the training pipeline of the CatPred framework. Update the feature handling mechanisms in the necessary files to ensure PH and Temp are incorporated alongside existing molecule-level features. Key tasks include:

  1. Update the MoleculeDatapoint class in catpred/data/data.py to handle PH and Temp as additional features. Specifically, modify the extend_features method to include these new inputs.

  2. In the TrainArgs class, ensure appropriate paths or placeholder mechanisms exist for sourcing PH and Temp data. Update attributes like features_path or create new ones if needed.

  3. Validate and preprocess the new features within the data processing pipeline to ensure they integrate seamlessly with the model. This includes scaling, normalization, and handling of missing values where applicable.

  4. Update the feature-related documentation and ensure consistency with the repository's data and training guidelines.

Testing:

  • Verify that the training pipeline functions correctly with PH and Temp in both the presence and absence of these features.
  • Ensure no adverse effects on the existing functionality when new features are unused.

This will expand the capability of the CatPred framework while maintaining backward compatibility.


Copilot AI and others added 2 commits November 26, 2025 12:54
Co-authored-by: samcogan21 <161603591+samcogan21@users.noreply.github.com>
Co-authored-by: samcogan21 <161603591+samcogan21@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add PH and Temp as input features in CatPred" to "Add pH and Temperature as input features to CatPred training pipeline" Nov 26, 2025
Copilot AI requested a review from samcogan21 November 26, 2025 13:03