
Improve performance of feature_columns_numeric and feature_columns_categorical #360

Merged: cachafla merged 3 commits into main from cachafla/sc-9919/more-fixes on Apr 25, 2025


Conversation

@cachafla
Contributor

Internal Notes for Reviewers

External Release Notes

@cachafla added the internal label (Not to be externalized in the release notes) on Apr 25, 2025
@cachafla requested a review from @johnwalz97 on April 25, 2025 20:00
@johnwalz97 (Contributor) left a comment


love it!

@github-actions
Contributor

PR Summary

This pull request optimizes how numeric and categorical feature columns are detected in the validmind/vm_models/dataset/dataset.py file. Previously, the code loaded the data into memory to determine the data types of the feature columns. The updated implementation reads the column data types without loading the data into memory, which reduces memory use and improves performance.

The changes involve:

  • Reading the data types of the feature columns directly from dtypes.
  • Applying pd.api.types.is_numeric_dtype and pd.api.types.is_categorical_dtype to identify numeric and categorical columns, respectively.
  • Reducing memory usage and potentially speeding up the setting of feature columns.
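A minimal sketch of the dtype-based approach described above, assuming a plain pandas DataFrame. split_feature_columns is a hypothetical helper for illustration, not the code merged in this PR. It checks isinstance(dtype, pd.CategoricalDtype) in place of pd.api.types.is_categorical_dtype, because pandas has deprecated the latter since version 2.1.

```python
import pandas as pd


def split_feature_columns(
    df: pd.DataFrame, feature_columns: list[str]
) -> tuple[list[str], list[str]]:
    """Hypothetical helper: classify feature columns using dtype metadata only.

    df.dtypes is a Series holding one dtype object per column, so indexing it
    never touches or copies the column values themselves.
    """
    dtypes = df.dtypes[feature_columns]
    numeric = [col for col, dtype in dtypes.items() if pd.api.types.is_numeric_dtype(dtype)]
    # isinstance check used instead of the deprecated is_categorical_dtype.
    categorical = [col for col, dtype in dtypes.items() if isinstance(dtype, pd.CategoricalDtype)]
    return numeric, categorical


df = pd.DataFrame({
    "age": [34, 51, 27],
    "income": [52000.0, 61000.5, 48000.0],
    "segment": pd.Categorical(["a", "b", "a"]),
    "notes": ["x", "y", "z"],
})
numeric, categorical = split_feature_columns(df, ["age", "income", "segment", "notes"])
print(numeric)      # ['age', 'income']
print(categorical)  # ['segment']
```

Free-text columns such as notes land in neither list under this sketch; whether the real method treats object columns as categorical is not stated in this PR.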

Test Suggestions

  • Test the _set_feature_columns method to ensure it correctly identifies numeric and categorical columns with a variety of datasets.
  • Verify that the method performs efficiently on large datasets without a significant increase in memory usage.
  • Check edge cases where feature columns have mixed or unexpected data types.
  • Ensure that the method behaves correctly when there are no numeric or categorical columns present.
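The edge cases in the suggestions above can be probed with a small script like the following, which records how the pandas dtype predicates classify mixed, boolean, nullable-integer, datetime, and categorical columns. dtype_flags is a hypothetical helper for illustration only. Note that pandas counts bool columns as numeric, which the real method may or may not want.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def dtype_flags(df: pd.DataFrame) -> dict[str, tuple[bool, bool]]:
    """Hypothetical helper: map each column to (is_numeric, is_categorical) from dtypes only."""
    return {
        col: (is_numeric_dtype(dtype), isinstance(dtype, pd.CategoricalDtype))
        for col, dtype in df.dtypes.items()
    }


edge_cases = pd.DataFrame({
    "mixed": pd.Series([1, "a", 2.5], dtype=object),  # mixed values fall back to object
    "flag": [True, False, True],                       # bool counts as numeric in pandas
    "nullable_int": pd.array([1, None, 3], dtype="Int64"),
    "when": pd.to_datetime(["2025-04-25", "2025-04-26", "2025-04-27"]),
    "label": pd.Categorical(["low", "high", "low"]),
})
flags = dtype_flags(edge_cases)
for col, (is_num, is_cat) in flags.items():
    print(f"{col}: numeric={is_num} categorical={is_cat}")

# A frame with only free-text columns yields no numeric or categorical columns.
text_only = pd.DataFrame({"comment": ["ok", "bad"]})
print(dtype_flags(text_only))  # {'comment': (False, False)}
```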

@cachafla merged commit bf3848b into main on Apr 25, 2025
7 checks passed
@cachafla deleted the cachafla/sc-9919/more-fixes branch on April 25, 2025 22:24

Labels

internal: Not to be externalized in the release notes


2 participants