2.0.1 release - fixes to validations #130

Open
mike-finopsorg wants to merge 12 commits into main from dev
@mike-finopsorg
Contributor

Material Changes (Excluding Formatting & Tests)

  1. Enhanced Rule Applicability & Skipping System
    New Result Override Architecture (focus_to_duckdb_converter.py)
    Fixes #128 (Entities skipped due to Applicability are not being marked as skipped)

Added apply_result_overrides() method - centralized post-processing system that handles:
Non-applicable rules: Rules that don't meet applicability criteria are now properly skipped
Composite aggregation: Composite rule results are updated based on actual child results
Dependency skips: Rules with failed/skipped dependencies are automatically skipped
New helper methods:
_apply_non_applicable_skips() - Marks non-applicable rules and all descendants
_apply_nested_child_non_applicable_marking() - Handles nested children after composite aggregation
_apply_composite_aggregation() - Updates composite results from child results
_apply_dependency_skips() - Skips rules with failed dependencies
_collect_all_descendants() - Recursively collects all rule descendants
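
To make the override pass easier to review, here is a minimal sketch of how the three steps could fit together. It is not the code in focus_to_duckdb_converter.py: `RuleResult`, its fields, the skip reasons, and the "any failed child fails the composite" rule are all assumptions made for illustration, and the pass handles only one level of composite nesting and non-transitive dependencies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RuleResult:
    """Hypothetical stand-in for one rule's outcome (not the project's class)."""
    rule_id: str
    status: str = "passed"  # "passed", "failed", or "skipped"
    reason: Optional[str] = None
    applicable: bool = True
    children: List[str] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)


def collect_all_descendants(results: Dict[str, RuleResult], rule_id: str) -> List[str]:
    """Walk the child links below rule_id and return every descendant id."""
    found: List[str] = []
    stack = list(results[rule_id].children)
    while stack:
        child = stack.pop()
        if child in results and child not in found:
            found.append(child)
            stack.extend(results[child].children)
    return found


def mark_skipped(result: RuleResult, reason: str) -> None:
    result.status = "skipped"
    result.reason = reason


def apply_result_overrides(results: Dict[str, RuleResult]) -> None:
    """Single post-processing pass over already-computed results."""
    # Step 1: non-applicable rules and all of their descendants are skipped.
    for rule_id, result in results.items():
        if not result.applicable:
            mark_skipped(result, "not applicable")
            for descendant in collect_all_descendants(results, rule_id):
                mark_skipped(results[descendant], "parent not applicable")

    # Step 2: composites that are still active take their status from children.
    # Assumption: any failed child fails the composite (AND semantics).
    for result in results.values():
        if not result.children or result.status == "skipped":
            continue
        statuses = [results[c].status for c in result.children if c in results]
        if "failed" in statuses:
            result.status = "failed"
        elif statuses and all(s == "skipped" for s in statuses):
            mark_skipped(result, "all children skipped")
        else:
            result.status = "passed"

    # Step 3: rules whose dependencies failed or were skipped are skipped.
    for result in results.values():
        if result.status == "skipped":
            continue
        blocked = [d for d in result.dependencies
                   if d in results and results[d].status in ("failed", "skipped")]
        if blocked:
            mark_skipped(result, f"dependency {blocked[0]} did not pass")


results = {
    "A": RuleResult("A", applicable=False, children=["A1"]),
    "A1": RuleResult("A1", children=["A1a"]),
    "A1a": RuleResult("A1a"),
    "B": RuleResult("B", children=["B1", "B2"]),
    "B1": RuleResult("B1"),
    "B2": RuleResult("B2", status="failed"),
    "C": RuleResult("C", dependencies=["B"]),
}
apply_result_overrides(results)
```

Keeping the three steps in one function is what lets a caller apply every override with a single call after the checks have run.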
New Check Class: SkippedOptionalCheck

Explicitly handles MAY/OPTIONAL rules that should be skipped
Clear error message: "Rule skipped - marked as MAY/OPTIONAL and not enforced"
Validation Keyword Support
Fixes: #127

Added _get_validation_keyword() method to extract MUST/SHOULD/MAY/RECOMMENDED keywords from rules
Applied throughout all check generators for accurate error messages
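
As an illustration of the keyword extraction, the sketch below pulls the first requirement keyword out of a rule's text. The function name, the regular expression, the inclusion of SHOULD NOT and OPTIONAL, and the "MUST" fallback are assumptions for the example; the actual _get_validation_keyword() may read the keyword from a different field.

```python
import re

# Longer keywords come first so "MUST NOT" is not matched as plain "MUST".
_KEYWORD_PATTERN = re.compile(
    r"\b(MUST NOT|SHOULD NOT|MUST|SHOULD|RECOMMENDED|MAY|OPTIONAL)\b"
)


def get_validation_keyword(requirement_text: str, default: str = "MUST") -> str:
    """Return the first requirement keyword found, or the (assumed) default."""
    match = _KEYWORD_PATTERN.search(requirement_text)
    return match.group(1) if match else default


keyword = get_validation_keyword("BilledCost MUST NOT be null")
```

The keyword can then be interpolated into each generator's error message instead of a hard-coded "MUST".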
2. Improved Error Message Generation
Fixed Double-Negative Grammar Issues

CheckNotValueGenerator: Now detects when keyword already contains "NOT" (e.g., "MUST NOT")
Before: "Column MUST NOT NOT be NULL"
After: "Column MUST NOT be NULL"
CheckNotSameValueGenerator: Same fix for comparing columns
Before: "Columns MUST NOT NOT have same value"
After: "Columns MUST NOT have same value"
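
A small sketch of the double-negative guard, assuming the message is built from a column name and the extracted keyword. The helper name and exact message wording are hypothetical; only the "do not append NOT twice" idea comes from the change described above.

```python
def not_null_message(column: str, keyword: str) -> str:
    """Build a 'must not be NULL' message without repeating NOT."""
    keyword = keyword.strip().upper()
    if "NOT" in keyword.split():
        # Keyword already negates (e.g. "MUST NOT"), so add nothing.
        return f"{column} {keyword} be NULL"
    return f"{column} {keyword} NOT be NULL"


message = not_null_message("Column", "MUST NOT")
```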
Enhanced Sample Violations

Added get_sample_sql() and sample_sql methods to multiple generators
Enables showing actual violating data rows in web reports
3. CSV Data Loading Improvements (csv_data_loader.py)
All-NULL Column Detection

New _peek_for_all_null_columns() method:
Peeks at first ~5000 rows to identify columns that are entirely NULL
Prevents false-positive type errors on NULL-only columns
Modified _try_load_with_types():
Only forces types for all-NULL columns
Lets Polars infer types for columns with data
Enables better type mismatch detection while avoiding false positives
Improved logging for forced type conversions
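
The peek step can be sketched as follows. To stay dependency-free this version uses the standard-library csv module rather than Polars, so it is not the loader's implementation; the function name, the set of tokens treated as NULL, and the default row limit are assumptions.

```python
import csv
import io
from typing import Iterable, List

# Assumption: which raw strings count as NULL when peeking.
NULL_TOKENS = {"", "null", "none", "nan"}


def peek_for_all_null_columns(lines: Iterable[str], max_rows: int = 5000) -> List[str]:
    """Return columns with no non-NULL value in the first max_rows rows."""
    reader = csv.DictReader(lines)
    columns = list(reader.fieldnames or [])
    still_null = set(columns)
    for row_number, row in enumerate(reader):
        if row_number >= max_rows or not still_null:
            break
        for name in list(still_null):
            value = row.get(name)
            if value is not None and value.strip().lower() not in NULL_TOKENS:
                still_null.discard(name)
    # Preserve the header order in the result.
    return [name for name in columns if name in still_null]


sample = "a,b,c\n1,,NULL\n2,,\n"
all_null = peek_for_all_null_columns(io.StringIO(sample))
```

Only the columns returned here would get a forced type; every other column is left to type inference. Because the check is a peek, a column whose first non-NULL value appears after the row limit is still reported as all-NULL.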
4. Web Report Enhancements (outputter_web.py)
Skipped Composite Children Handling

Identifies children of skipped composite rules (excluding Dataset entity types)
Automatically marks these children as skipped in the display
Preserves existing skip reasons (e.g., dynamic rules) when present
Default message: "Rule skipped - not applicable to current dataset or configuration"
Enhanced Entity Status Display

Better handling of skipped requirements in entity status calculation
Clear differentiation between non-applicable and dependency-skipped rules
Improved filtering logic for status-based display
Sample Violations Display

Added support for displaying sample violating rows
Shows actual data that failed validation
Formatted display with column=value pairs
5. Rule Model Updates (rule.py)
New Method: is_optional()

Checks if rule has OPTIONAL or MAY keyword
Used to determine if rule should be enforced
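
For reviewers, a rough sketch of how is_optional() and the skipped-check message could interact. The `Rule` dataclass, its `keyword` field, and the `evaluate` helper are hypothetical; only the method name and the skip message text come from this PR description.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

SKIP_MESSAGE = "Rule skipped - marked as MAY/OPTIONAL and not enforced"


@dataclass
class Rule:
    """Hypothetical, trimmed-down rule model."""
    rule_id: str
    keyword: str

    def is_optional(self) -> bool:
        return self.keyword.strip().upper() in {"MAY", "OPTIONAL"}


def evaluate(rule: Rule, check: Callable[[], bool]) -> Tuple[str, str]:
    """Skip optional rules; otherwise run the check and report the outcome."""
    if rule.is_optional():
        return ("skipped", SKIP_MESSAGE)
    return ("passed", "") if check() else ("failed", f"{rule.rule_id} failed")


outcome = evaluate(Rule("R-1", "MAY"), lambda: False)
```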
6. Integration (spec_rules.py)
Post-Processing Hook

Added call to converter.apply_result_overrides() after all checks run
Ensures all result overrides are applied in a single, maintainable location
Runs before finalization to ensure consistent results
7. Type Safety (focus_to_duckdb_converter.py)
Added Missing Attributes to DuckDBColumnCheck:

_dependencies: Optional[Set[str]] - Tracks rule dependencies
_child_rule_ids: Optional[List[str]] - Tracks composite children
_non_applicable: bool - Non-applicability flag
_non_applicable_reason: Optional[str] - Explanation for non-applicability

…n web report

Simplify the logic for marking rules as Skipped
Clean up error responses
Add show violation support to the web report
Added more violation support to generators
Signed-off-by: Mike Fuller <mike@finops.org>
Signed-off-by: Mike Fuller <mike@finops.org>
Signed-off-by: Mike Fuller <mike@finops.org>
Normalise responses to errors with NULL, NaN etc
Force error messages to take correct hierarchy
Introduce more error messages and validation samples
Better data loading from CSV

Signed-off-by: Mike Fuller <mike@finops.org>
Signed-off-by: Mike Fuller <mike@finops.org>
Signed-off-by: Mike Fuller <mike@finops.org>
Signed-off-by: Mike Fuller <mike@finops.org>
Signed-off-by: Mike Fuller <mike@finops.org>
Signed-off-by: Mike Fuller <mike@finops.org>
Collaborator

@Matt-Cowsert Matt-Cowsert left a comment

Hey Mike, I think it's worth considering adding these recommended changes. For transparency, I ran the PR through Claude Code to help surface any inconsistencies, but they made sense to me to consider.

  1. Type mismatch (focus_to_duckdb_converter.py:198) - _dependencies is typed as Set[str] but used as List[DependencyRef]

  2. XSS vulnerability (outputter_web.py:1702-1703, 1790-1791) - Sample violation values should be HTML-escaped before rendering
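
For point 2, a minimal sketch of what escaping could look like when a sample row is rendered as column=value pairs. `format_sample_row` and the "NULL" placeholder are hypothetical; `html.escape` is the standard-library call, and the actual rendering code in outputter_web.py may be structured differently.

```python
import html
from typing import Any, Mapping


def format_sample_row(row: Mapping[str, Any]) -> str:
    """Render one violating row as escaped column=value pairs."""
    parts = []
    for column, value in row.items():
        shown = "NULL" if value is None else str(value)
        # Escape both sides: column names come from the input file too.
        parts.append(f"{html.escape(str(column))}={html.escape(shown)}")
    return ", ".join(parts)


rendered = format_sample_row({"Tag": "<b>x</b>", "Cost": None})
```

Escaping at the point of rendering keeps the raw values intact for any non-HTML outputs.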

Correctly escape output to prevent any issues with unsafe values in data being processed by web output
Fix up filtering issue in rule view.
Signed-off-by: Mike Fuller <mike@finops.org>
Signed-off-by: Mike Fuller <mike@finops.org>
2 participants