Sentinel is a data validation and processing system for managing datasets. It includes modules for missing value handling, schema validation, regression testing, and statistical stability checks, and is intended to keep data processing and analysis pipelines reliable.
- Missing Value Handling: Detects and handles missing values across datasets.
- Schema Validation: Ensures the data schema is consistent and meets expected formats.
- Regression Testing: Compares different versions of datasets to detect regressions.
- Statistical Stability Testing: Validates the consistency of data distributions between dataset versions.
- Missing Value Handling: Sentinel automatically detects missing values and can fill them with default values.
- Schema Validation: Checks that each dataset has the expected column names and data types.
- Regression Testing: Compares two versions of a dataset to detect regressions in the data.
- Statistical Stability Testing: Uses statistical tests to check whether the data distribution remains stable between two versions of a dataset.
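
The four checks above can be illustrated with a minimal, standard-library-only sketch. This is not Sentinel's actual API: every name here (`SCHEMA`, `DEFAULTS`, `fill_missing`, `schema_errors`, `regressions`, `ks_statistic`, `is_stable`) is hypothetical, rows are assumed to be plain dictionaries, and the two-sample Kolmogorov-Smirnov test with a large-sample critical value is only one possible choice of stability test.

```python
import math
from bisect import bisect_right

# Hypothetical schema: column name -> expected Python type.
SCHEMA = {"id": int, "price": float}
# Hypothetical per-column defaults used to fill missing values.
DEFAULTS = {"id": -1, "price": 0.0}


def fill_missing(rows, defaults):
    """Return copies of rows with None or absent values replaced by defaults."""
    filled = []
    for row in rows:
        new_row = dict(row)
        for column, default in defaults.items():
            if new_row.get(column) is None:
                new_row[column] = default
        filled.append(new_row)
    return filled


def schema_errors(rows, schema):
    """List readable schema violations (unexpected columns or wrong types)."""
    errors = []
    for index, row in enumerate(rows):
        if set(row) != set(schema):
            errors.append(f"row {index}: columns {sorted(row)} != {sorted(schema)}")
            continue
        for column, expected in schema.items():
            if not isinstance(row[column], expected):
                errors.append(f"row {index}: {column} is not {expected.__name__}")
    return errors


def regressions(old_rows, new_rows, key):
    """Report keys dropped from the new version and keys whose rows changed."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    dropped = sorted(set(old) - set(new))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return {"dropped": dropped, "changed": changed}


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def is_stable(sample_a, sample_b, alpha=0.05):
    """True if the KS statistic is within the asymptotic critical value."""
    n, m = len(sample_a), len(sample_b)
    # Large-sample approximation; small samples need exact tables instead.
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(sample_a, sample_b) <= critical
```

A typical flow under these assumptions would be to call `fill_missing` first, run `schema_errors` on the result, and then compare the old and new versions with `regressions` and `is_stable` on a numeric column.
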
We welcome contributions! If you'd like to improve Sentinel, feel free to fork the repository, create a new branch, and submit a pull request. Please ensure that you write tests for any new functionality and that the existing tests pass.