@alancleary alancleary commented Jul 3, 2025

Tests are run against data in s3://tiledb-unittest/vcf-ingestion-test/. Specifically, there are six .vcf.gz files, each corresponding to a different success/failure code path in VCF ingestion. Additionally, there's a TileDB "metadata" array and a file containing the .vcf.gz URIs. These, and the .vcf.gz files themselves, are used to exercise all three ways VCF URIs can be provided during ingestion.

These tests won't be runnable via CI because the unittest user doesn't have adequate privileges. However, these tests can be run locally, which is better than nothing!

Currently looking for any/all feedback.

Note: this branch is based on alancleary/vcf-12/ingestion-failure-handling and is intended to be merged after PR #723.

This replaces hard-coded strings used when reading/writing the manifest array.
Ingest now distinguishes between samples that are ready to load and samples that have been loaded using a new "ready" status. The status of a sample is changed from "ready" / "missing index" to "ok" upon successful ingestion, allowing failed ingestions to be resumed.
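The resume behavior in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the constant names, the dict-based manifest, and both helper functions are hypothetical; only the three status strings come from the commit message.

```python
# Status strings taken from the commit message; the constant names are assumptions.
STATUS_READY = "ready"
STATUS_MISSING_INDEX = "missing index"
STATUS_OK = "ok"


def samples_to_ingest(manifest):
    """Return URIs whose status says they still need to be loaded."""
    pending = {STATUS_READY, STATUS_MISSING_INDEX}
    return [uri for uri, status in manifest.items() if status in pending]


def mark_loaded(manifest, uri):
    """Record a successful ingestion so a resumed run skips this sample."""
    manifest[uri] = STATUS_OK


# A plain dict stands in for the manifest array (assumption).
manifest = {
    "a.vcf.gz": STATUS_READY,
    "b.vcf.gz": STATUS_MISSING_INDEX,
    "c.vcf.gz": STATUS_OK,
}

first_pass = samples_to_ingest(manifest)

# Pretend the run loaded a.vcf.gz and then failed before reaching b.vcf.gz.
mark_loaded(manifest, "a.vcf.gz")
resumed = samples_to_ingest(manifest)
```

Because a sample only moves to "ok" after it has been loaded, a resumed run picks up just the samples the failed run never finished.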
This includes renaming some variables to add context to the messages.
This ensures that a member's value is saved to the ingestion manifest, rather than the member's name.
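For context, the name/value distinction looks like this, assuming the member in question belongs to a standard Python `Enum` (the commit message does not say so explicitly). The `SampleStatus` class is hypothetical; the "missing index" string is borrowed from the status commit.

```python
from enum import Enum


class SampleStatus(Enum):
    # Hypothetical enum; only the "missing index" string appears in this PR.
    MISSING_INDEX = "missing index"


# .name is the Python identifier, .value is the string meant for the manifest.
by_name = SampleStatus.MISSING_INDEX.name
by_value = SampleStatus.MISSING_INDEX.value
```

Writing `.name` would store `MISSING_INDEX`, which would not match a status filter that compares against `"missing index"`.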
This allows the function to be blocking, if need be. The default value is False to preserve the previous unparameterized behavior.
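One way such an opt-in blocking flag could look is sketched below. The function name `run_task`, the parameter name `wait`, and the thread-pool executor are all assumptions for illustration; the commit message names none of them.

```python
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=1)


def run_task(work, wait=False):
    """Submit work; block until it finishes only when wait=True.

    The default of False keeps the earlier fire-and-return behavior.
    """
    future = _executor.submit(work)
    if wait:
        future.result()  # blocks and re-raises any exception from work
    return future


blocking = run_task(lambda: 42, wait=True)
```

Existing callers that pass no argument see no change, while tests can pass `wait=True` to make sure the work has finished before asserting on its effects.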
This allows the function to be used externally.
The class provides common setup, log capturing during ingestion, and tests common to all ingestion types.
This class extends the base class for VCF ingestion, implementing ingestion via search URI and search-specific test cases.
@alancleary alancleary added the enhancement New feature or request label Jul 3, 2025
This class extends the base class for VCF ingestion, implementing ingestion via sample list and list-specific test cases.
This class extends the base class for VCF ingestion, implementing ingestion via metadata array and metadata-specific test cases.
…class

This allows the base class configure, setup, and teardown to be leveraged by classes that don't need to run the common tests.
This class extends the base class for VCF ingestion, implementing ingestions that simulate a failed ingestion with subsequent (non-)resume ingestions, as well as resume-specific test cases. Note that causing an ingestion to fail programmatically is not supported by VCF; ingestion failure is enabled for this class by monkey patching the sample ingestion DAG function, allowing failures to be simulated deterministically.
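The monkey-patching approach can be illustrated with `unittest.mock.patch.object`. Everything here is a stand-in: the `ingestion` namespace, `ingest_samples_dag`, and `run_ingestion` are hypothetical names, and the PR may patch the real function in a different way.

```python
import types
from unittest import mock

# Stand-in for the module that owns the sample ingestion DAG function (assumption).
ingestion = types.SimpleNamespace()


def ingest_samples_dag(uris):
    return [f"loaded {uri}" for uri in uris]


ingestion.ingest_samples_dag = ingest_samples_dag


def run_ingestion(uris):
    # Looks the function up at call time, so a patch on the namespace takes effect.
    return ingestion.ingest_samples_dag(uris)


def failing_dag(uris):
    raise RuntimeError("simulated ingestion failure")


# Inside the context manager, every ingestion fails deterministically.
with mock.patch.object(ingestion, "ingest_samples_dag", failing_dag):
    try:
        run_ingestion(["a.vcf.gz"])
        failed = False
    except RuntimeError:
        failed = True

# On exit the original function is restored, so a "resume" run succeeds.
restored = run_ingestion(["a.vcf.gz"])
```

Scoping the patch to a context manager keeps the simulated failure from leaking into the resume step or into other test cases.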
This is to prevent them from being run by CI.
This is to prevent an import error during CI since TileDB-VCF is not installed in the CI environment.
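A common pattern for this kind of guard is sketched below, assuming the tests use `unittest` and that TileDB-VCF is importable as `tiledbvcf`. The PR may guard the import differently (for example with a try/except around the import), and the test class name is hypothetical.

```python
import importlib.util
import unittest

# Check for the package without importing it, so collection never raises ImportError.
HAS_TILEDBVCF = importlib.util.find_spec("tiledbvcf") is not None


@unittest.skipUnless(HAS_TILEDBVCF, "TileDB-VCF is not installed")
class IngestionSmokeTest(unittest.TestCase):
    def test_import(self):
        # Deferred import: only runs where the package is actually available.
        import tiledbvcf

        self.assertTrue(hasattr(tiledbvcf, "__name__"))
```

With this shape the module can still be collected in CI, and the test is reported as skipped instead of erroring.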