@juanmleng
Contributor

Internal Notes for Reviewers

Some embedding tests were failing because we were trying to store multi-dimensional embedding arrays in the RawData using a DataFrame. The fix changes the raw data storage from DataFrame to dictionary to properly store the embeddings.
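
To illustrate the idea, here is a minimal sketch of dictionary-based raw data storage. It is not the repository's actual code: the helper names `build_raw_data` and `cosine_similarity`, the choice of cosine similarity, and the sample vectors are all assumptions made for illustration. The point is that a dictionary keeps each embedding matrix whole under one key, whereas a pandas DataFrame built from a dict expects every column to be one-dimensional.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (illustrative metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_raw_data(original, perturbed):
    """Hypothetical helper: keep the 2-D embeddings intact in a plain dict."""
    similarity = [cosine_similarity(o, p) for o, p in zip(original, perturbed)]
    return {
        "original": original,    # shape (n_rows, embedding_dim)
        "perturbed": perturbed,  # shape (n_rows, embedding_dim)
        "similarity": similarity,  # one value per row
    }


# Two rows of 3-dimensional embeddings (made-up values).
original = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
perturbed = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
raw_data = build_raw_data(original, perturbed)
```

Each key can hold a value of a different shape, so no flattening of the embeddings is needed before storing them.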

External Release Notes

@juanmleng self-assigned this on Jan 29, 2025
@juanmleng added the bug (Something isn't working) and internal (Not to be externalized in the release notes) labels on Jan 29, 2025
Contributor

@johnwalz97 left a comment

thanks for fixing this Juan!

@github-actions
Contributor

PR Summary

This pull request introduces enhancements to the StabilityAnalysis modules and optimizes data handling in the utils.py file. The key changes include:

  1. Return Value Enhancement: The return values of the perturb_data function in the StabilityAnalysisRandomNoise.py, StabilityAnalysisSynonyms.py, and StabilityAnalysisTranslation.py files have been modified. Instead of returning a two-element tuple of (result, RawData), the function now unpacks result and returns its elements followed by RawData, so callers receive one flat tuple rather than a nested one.

  2. Data Handling Optimization: In the utils.py file, the creation of a raw data DataFrame using pandas has been replaced with a dictionary. This change reduces the dependency on pandas and potentially improves performance by avoiding unnecessary DataFrame operations when only a simple data structure is needed.

These changes aim to enhance the performance and maintainability of the codebase by optimizing data handling and improving the return values of key functions.
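
The return-value change in item 1 can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `analyze`, `perturb_data_before`, and `perturb_data_after` are hypothetical names, the strings stand in for real outputs, and a plain dict stands in for the RawData object.

```python
def analyze():
    """Hypothetical stand-in for the analysis step; returns several outputs."""
    return ("figure", "table")


def perturb_data_before():
    result = analyze()
    raw_data = {"similarity": [0.9]}  # stand-in for RawData
    # Old shape: the result tuple is nested inside the returned tuple.
    return result, raw_data


def perturb_data_after():
    result = analyze()
    raw_data = {"similarity": [0.9]}  # stand-in for RawData
    # New shape: result is unpacked, so the returned tuple is flat.
    return (*result, raw_data)


nested = perturb_data_before()  # (("figure", "table"), {...})
flat = perturb_data_after()     # ("figure", "table", {...})
```

With the flat form, a caller can unpack every output in one assignment, for example `figure, table, raw = perturb_data_after()`.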

Test Suggestions

  • Test the perturb_data function in each StabilityAnalysis module to ensure the unpacked return values are correctly handled.
  • Verify that the dictionary-based raw data structure in utils.py correctly stores and retrieves original, perturbed, and similarity data.
  • Check for any downstream effects or dependencies that might be affected by the change from a DataFrame to a dictionary in utils.py.
  • Ensure that the removal of the pandas import does not affect any other parts of the codebase.

@juanmleng merged commit a45706e into main on Jan 29, 2025
6 checks passed