
Conversation

@cachafla (Contributor) commented on Aug 11, 2025

Pull Request Description

What and why?

    return (
        table,
        all_passed,
        RawData(raw_cardinality_details=raw_data, dataset=dataset.input_id),
    )

If we let MetricOutputHandler handle this, then table will be interpreted as a list of unit metrics. We need TableOutputHandler to take precedence, with unit metrics processed only as a last resort.
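For context, here is a minimal, self-contained sketch of why handler order matters in a first-match dispatch chain. The class names, method signatures, and dispatch function below are hypothetical, not the library's actual API; the sketch only illustrates that a table-shaped output must be claimed by a table handler before a catch-all metric handler sees it.

    # Hypothetical sketch, not the real output-processing code: it shows why a
    # catch-all metric handler must run last in a first-match dispatch chain.

    class TableOutputHandler:
        def can_handle(self, output):
            # Treat a non-empty list of row dicts as a table.
            return (
                isinstance(output, list)
                and len(output) > 0
                and all(isinstance(row, dict) for row in output)
            )

        def process(self, output):
            print(f"Rendering table with {len(output)} rows")


    class MetricOutputHandler:
        def can_handle(self, output):
            # Broad match: without careful ordering, a table (a list) would
            # also match here and be misread as a list of unit metrics.
            return isinstance(output, (int, float, list))

        def process(self, output):
            print(f"Recording unit metric(s): {output}")


    def process_output(output, handlers):
        """Dispatch to the first handler that claims the output."""
        for handler in handlers:
            if handler.can_handle(output):
                handler.process(output)
                return
        raise ValueError(f"No handler for output: {output!r}")


    # MetricOutputHandler goes last, so table outputs are claimed first.
    handlers = [TableOutputHandler(), MetricOutputHandler()]
    process_output([{"column": "a", "cardinality": 3}], handlers)  # table
    process_output(0.92, handlers)                                 # unit metric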

How to test

What needs special review?

Dependencies, breaking changes, and deployment notes

Release notes

Checklist

  • What and why
  • Screenshots or videos (Frontend)
  • How to test
  • What needs special review
  • Dependencies, breaking changes, and deployment notes
  • Labels applied
  • PR linked to Shortcut
  • Unit tests added (Backend)
  • Tested locally
  • Documentation updated (if required)
  • Environment variable additions/changes documented (if required)

@cachafla added the internal (Not to be externalized in the release notes) label on Aug 11, 2025
@johnwalz97 (Contributor) left a comment

nice 👌

@github-actions

PR Summary

This PR introduces several functional improvements focused on enhancing unit tests and output handling for time series data validation and metrics reporting. The changes include:

  1. In the ADF (Augmented Dickey-Fuller) tests, the time series dataset size has been increased from 100 to 200 observations, providing more stable and statistically significant results. A random seed has been set for reproducibility, and the data now better differentiates between stationary and non-stationary series.

  2. The test comparing stationary versus non-stationary series now validates that the ADF statistic of the stationary series is more negative than that of the non-stationary series, rather than relying solely on p-value comparisons. Additional checks ensure that both p-values lie within the valid range of 0 to 1 (see the sketch after this summary).

  3. Several new metric identifiers have been added to the tests for individual classification metrics (e.g., AbsoluteError, BrierScore, CalibrationError, among others), expanding the coverage of unit metrics considered during testing.

  4. The order of output processing has been adjusted: MetricOutputHandler has been moved to execute last so that unit metric outputs are handled only after more specific handlers, such as TableOutputHandler, have had a chance to claim the output. This prevents table outputs from being misinterpreted as lists of unit metrics.

Overall, these enhancements improve the robustness and reliability of the testing framework without altering core functionalities beyond the test suite and output processing middleware.
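As an illustration of the kind of ADF check described in item 2, the sketch below compares a stationary white-noise series against a non-stationary random walk using a fixed seed and 200 observations. It assumes numpy and statsmodels and is not the project's actual test code.

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(42)   # fixed seed for reproducibility
    n = 200                           # larger sample gives more stable ADF results

    stationary = rng.normal(size=n)                  # white noise: stationary
    non_stationary = np.cumsum(rng.normal(size=n))   # random walk: non-stationary

    adf_stat_s, p_value_s, *_ = adfuller(stationary)
    adf_stat_ns, p_value_ns, *_ = adfuller(non_stationary)

    # The stationary series should yield the more negative ADF statistic,
    # and both p-values must lie within [0, 1].
    assert adf_stat_s < adf_stat_ns
    assert 0.0 <= p_value_s <= 1.0
    assert 0.0 <= p_value_ns <= 1.0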

Test Suggestions

  • Run the entire test suite to ensure all new and existing tests pass with the updated dataset size and random seed.
  • Explicitly verify that the ADF statistics are more negative for stationary series compared to non-stationary ones in various scenarios.
  • Confirm that p-values are always between 0 and 1 for both series across multiple runs.
  • Test the output processing order to ensure that MetricOutputHandler executes last and that all output handlers function as expected.
  • Review the new metric identifiers to check that they integrate seamlessly with the overall metric evaluation process.

@cachafla merged commit 858c5fe into main on Aug 11, 2025
7 of 8 checks passed
@cachafla deleted the cachafla/fix-process_output branch on August 11, 2025 at 18:46