[SC-13352] Small fixes to TrainingTestDegradation test #452

Merged
juanmleng merged 1 commit into main from
juan/sc-13352/small-fixes-to-training-test-degradation-test
Nov 25, 2025

Conversation

@juanmleng
Contributor

Pull Request Description

What and why?

What

Removed incorrect references to "accuracy" metric from the docstring. The test only evaluates precision, recall, and f1-score per class, not accuracy.

Why

The test docstring is used as context for LLM-based test descriptions. The inaccuracy in the docstring causes the LLM to incorrectly state that accuracy is being computed, which results in low faithfulness scores when evaluating the generated descriptions against the actual test implementation.

How to test

Run the TrainingTestDegradation test—using the customer churn or the credit risk scorecard notebook—and check that the generated description does not reference accuracy as one of the computed metrics.

Before:
Screenshot 2025-11-25 at 14 42 54

After:
Screenshot 2025-11-25 at 14 39 36
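
The manual check above could also be scripted. This is a minimal sketch, assuming the generated description is available as a plain string; `mentions_accuracy` is a hypothetical helper and not part of the library:

```python
import re

# Hypothetical helper (not part of the library): flags a generated
# description that names "accuracy" as a metric.
# The word boundary avoids matching words such as "inaccuracy".
_ACCURACY_PATTERN = re.compile(r"\baccuracy\b", re.IGNORECASE)


def mentions_accuracy(description: str) -> bool:
    """Return True if the description text references accuracy."""
    return _ACCURACY_PATTERN.search(description) is not None


if __name__ == "__main__":
    # Illustrative text only; a real description comes from the test run.
    sample = "The test compares precision, recall, and f1-score per class."
    print(mentions_accuracy(sample))
```

A plain keyword match is a coarse check; it only confirms the word is absent and does not measure faithfulness.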

What needs special review?

Dependencies, breaking changes, and deployment notes

Release notes

Checklist

  • What and why
  • Screenshots or videos (Frontend)
  • How to test
  • What needs special review
  • Dependencies, breaking changes, and deployment notes
  • Labels applied
  • PR linked to Shortcut
  • Unit tests added (Backend)
  • Tested locally
  • Documentation updated (if required)
  • Environment variable additions/changes documented (if required)

@juanmleng juanmleng self-assigned this Nov 25, 2025
@juanmleng juanmleng added the bug (Something isn't working) and internal (Not to be externalized in the release notes) labels Nov 25, 2025
@github-actions
Contributor

PR Summary

This PR updates the docstring of the TrainingTestDegradation test by removing references to the accuracy metric. The docstring now states that the test evaluates degradation between the training and test datasets using only precision, recall, and f1-score. The explanation of the acceptable-degradation threshold was also updated to state the default value (0.10) explicitly. These changes improve the clarity and consistency of the test's description without altering the underlying test logic.

Test Suggestions

  • Run the tests to verify that model performance is only evaluated using precision, recall, and f1 score.
  • Validate that the output table correctly lists the train score, test score, degradation percentage, and pass/fail status for the specified metrics.
  • Check that documentation changes are reflected accurately in any auto-generated test reports or rendered documentation views.
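
To make the expected output table concrete, here is a rough, dependency-free sketch of a per-class degradation check. It is not the library's implementation: the function names, the row layout, the relative-degradation formula `(train - test) / train`, and the strict `<` comparison against the threshold are all assumptions for illustration. Only the three metrics and the 0.10 default come from this PR.

```python
from typing import Dict, Hashable, List, Sequence

METRICS = ("precision", "recall", "f1-score")


def per_class_scores(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Dict[Hashable, Dict[str, float]]:
    """Compute precision, recall, and f1-score for each class label."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1-score": f1}
    return scores


def degradation_table(
    train_scores: Dict[Hashable, Dict[str, float]],
    test_scores: Dict[Hashable, Dict[str, float]],
    max_threshold: float = 0.10,  # default value mentioned in the PR summary
) -> List[dict]:
    """Build one row per (class, metric) with degradation and pass/fail."""
    rows = []
    for label, train_metrics in train_scores.items():
        for metric in METRICS:
            train = train_metrics[metric]
            test = test_scores.get(label, {}).get(metric, 0.0)
            # Assumed formula: relative drop from train to test.
            degradation = (train - test) / train if train else 0.0
            rows.append(
                {
                    "class": label,
                    "metric": metric,
                    "train_score": train,
                    "test_score": test,
                    "degradation": degradation,
                    "passed": degradation < max_threshold,
                }
            )
    return rows


if __name__ == "__main__":
    # Toy labels for illustration only.
    train = per_class_scores([0, 0, 1, 1], [0, 0, 1, 1])
    test = per_class_scores([0, 0, 1, 1], [0, 1, 1, 1])
    for row in degradation_table(train, test):
        print(row)
```

Note that no accuracy row appears anywhere in the table, which is the behavior the updated docstring describes.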

Contributor

@AnilSorathiya AnilSorathiya left a comment

thanks!

@juanmleng juanmleng merged commit 0b849ec into main Nov 25, 2025
19 checks passed
@juanmleng juanmleng deleted the juan/sc-13352/small-fixes-to-training-test-degradation-test branch November 25, 2025 15:31

Labels

bug (Something isn't working), internal (Not to be externalized in the release notes)

2 participants