
Update Mutual Information data validation test #381

Merged
AnilSorathiya merged 1 commit into main from anilsorathiya/sc-10893/update-mutual-information-data-validation
Jun 19, 2025

Conversation

@AnilSorathiya
Contributor

Pull Request Description

Mutual information takes only numerical data

What and why?

Mutual information can be calculated on numerical data only.

  • Updated the test to use only numerical columns.

How to test

Run the test.

What needs special review?

Dependencies, breaking changes, and deployment notes

Release notes

Checklist

  • What and why
  • Screenshots or videos (Frontend)
  • How to test
  • What needs special review
  • Dependencies, breaking changes, and deployment notes
  • Labels applied
  • PR linked to Shortcut
  • Unit tests added (Backend)
  • Tested locally
  • Documentation updated (if required)
  • Environment variable additions/changes documented (if required)

@AnilSorathiya AnilSorathiya added the bug (Something isn't working) and internal (Not to be externalized in the release notes) labels Jun 19, 2025
@AnilSorathiya AnilSorathiya requested a review from juanmleng June 19, 2025 14:50
@github-actions
Contributor

PR Summary

This PR enhances the MutualInformation function by adding additional validations to ensure the dataset contains the necessary numeric features and a target column before proceeding with mutual information calculations. In detail, the following changes have been made:

  • Added a check that raises an exception if the dataset does not have any numeric features (dataset.feature_columns_numeric), ensuring that the calculation only proceeds when valid numeric features are available.
  • Introduced a check to ensure that a target column is provided in the dataset (dataset.target_column) before performing computation.
  • Modified how the feature and target variables (X and y) are retrieved from the dataset, now obtaining values from the internal dataframe (dataset._df) using the specified column identifiers.
  • The selection of the mutual information function (classification vs. regression) remains intact, assuming that the task input is valid.

These changes aim to improve the robustness of the function by providing clear error messages when essential dataset properties are missing.
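The checks listed above could look roughly like the following. This is a minimal sketch, not the code from the PR: `SketchDataset` and `extract_mi_inputs` are hypothetical names, the error messages are illustrative, and a plain dict stands in for the dataframe behind `dataset._df` so the example runs without third-party packages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SketchDataset:
    """Hypothetical stand-in exposing only the attributes named in the summary."""

    _df: Dict[str, list]  # column name -> values (the real object holds a dataframe)
    feature_columns_numeric: List[str] = field(default_factory=list)
    target_column: Optional[str] = None


def extract_mi_inputs(dataset: SketchDataset) -> Tuple[List[list], list]:
    """Validate the dataset, then return (X, y) built from numeric features only."""
    if not dataset.feature_columns_numeric:
        # Illustrative message; the wording used in the PR is not shown in this thread.
        raise ValueError("No numeric features found in the dataset.")
    if not dataset.target_column:
        raise ValueError("A target column is required.")

    # Read values straight from the underlying data, using the column identifiers.
    columns = [dataset._df[name] for name in dataset.feature_columns_numeric]
    X = [list(row) for row in zip(*columns)]  # one row per observation
    y = list(dataset._df[dataset.target_column])
    return X, y
```

With this shape, a non-numeric column such as a city name is simply never selected, because only the names in `feature_columns_numeric` are read.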

Test Suggestions

  • Write tests where dataset.feature_columns_numeric is empty, and verify that the correct ValueError is raised with the expected message.
  • Write tests where dataset.target_column is not set, and verify that the function raises the appropriate ValueError.
  • Test the function with a valid dataset that includes numeric features and a target column to ensure that the mutual information calculation proceeds without error.
  • Test both 'classification' and 'regression' task pathways to ensure correct behavior for valid inputs.
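The suggestions above might translate into test cases along these lines. This is a sketch under assumptions: `mutual_information_stub` is a placeholder for the real function (swap in the actual import), `make_dataset` is a hypothetical helper, and the error-message patterns are illustrative and would need to match the real messages.

```python
import unittest
from types import SimpleNamespace


def mutual_information_stub(dataset, task="classification"):
    """Placeholder for the real test function; replace with the actual import."""
    if not dataset.feature_columns_numeric:
        raise ValueError("No numeric features found")  # illustrative message
    if not dataset.target_column:
        raise ValueError("Target column is required")  # illustrative message
    # Stand-in for choosing the classification or regression estimator.
    return {"task": task, "features": list(dataset.feature_columns_numeric)}


def make_dataset(numeric=("age", "income"), target="y"):
    """Build a minimal object with the two attributes the checks read."""
    return SimpleNamespace(feature_columns_numeric=list(numeric), target_column=target)


class MutualInformationValidationTest(unittest.TestCase):
    def test_no_numeric_features_raises(self):
        with self.assertRaisesRegex(ValueError, "numeric features"):
            mutual_information_stub(make_dataset(numeric=()))

    def test_missing_target_raises(self):
        with self.assertRaisesRegex(ValueError, "Target column"):
            mutual_information_stub(make_dataset(target=None))

    def test_valid_inputs_for_both_tasks(self):
        for task in ("classification", "regression"):
            with self.subTest(task=task):
                result = mutual_information_stub(make_dataset(), task=task)
                self.assertEqual(result["features"], ["age", "income"])
```

Saved as a module, the cases can be run with `python -m unittest`.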

@AnilSorathiya AnilSorathiya merged commit 4eb8387 into main Jun 19, 2025
9 checks passed
@AnilSorathiya AnilSorathiya deleted the anilsorathiya/sc-10893/update-mutual-information-data-validation branch June 19, 2025 22:10

2 participants