Source code for QualiBact, a tool for analyzing microbial genome assembly statistics across multiple species. It compares allthebacteria assemblies to NCBI RefSeq assemblies and generates detailed statistics, outlier detection with Isolation Forest, and visualizations.
Website: https://happykhan.github.io/qualibact/
These are the criteria used in Speccheck: https://github.com/happykhan/speccheck
Contributing to QualiBact
We welcome contributions to QualiBact! Contributors who make major contributions to the source code, manuscript development, or metric validation and adoption, or who provide additional data for calibrating quality thresholds, will be granted authorship on publications. Pull requests are welcome through GitHub.
Please read CONTRIBUTING.
- Genome Assembly Analysis: Parses genome assembly statistics across multiple species
- Comparative Analysis: Compares allthebacteria assemblies to NCBI RefSeq assemblies
- Outlier Detection: Uses Isolation Forest for anomaly detection
- Data Visualization: Generates comprehensive plots and statistics
- Data Processing: Includes utilities for merging and processing TSV files
- Web content: Produces markdown files for publishing to the website (https://happykhan.github.io/qualibact/)
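The outlier detection feature listed above can be illustrated with a minimal sketch. This is not QualiBact's actual implementation: the metric names (`total_length`, `gc_percent`, `n50`), the synthetic values, and the `contamination` setting are all assumptions chosen for illustration. It only shows how scikit-learn's `IsolationForest` assigns anomaly scores to a table of assembly metrics.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic assembly metrics for illustration only (not QualiBact data).
rng = np.random.default_rng(0)
n_typical = 200
metrics = pd.DataFrame({
    "total_length": rng.normal(5_000_000, 100_000, n_typical),
    "gc_percent": rng.normal(50.8, 0.3, n_typical),
    "n50": rng.normal(200_000, 30_000, n_typical),
})

# Append two deliberately implausible assemblies (hypothetical values).
odd = pd.DataFrame({
    "total_length": [9_500_000, 2_100_000],
    "gc_percent": [42.0, 58.0],
    "n50": [5_000, 4_000],
})
metrics = pd.concat([metrics, odd], ignore_index=True)

features = ["total_length", "gc_percent", "n50"]

# contamination is an assumed expected fraction of outliers.
model = IsolationForest(n_estimators=200, contamination=0.02, random_state=0)
model.fit(metrics[features])

# score_samples: lower values mean more anomalous.
metrics["anomaly_score"] = model.score_samples(metrics[features])
# predict returns -1 for outliers and 1 for inliers.
metrics["is_outlier"] = model.predict(metrics[features]) == -1

print(metrics.sort_values("anomaly_score").head())
```

Isolation Forest splits on one feature at a time, so the metrics do not need to be rescaled before fitting, which is convenient when genome length and GC content sit on very different scales.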
python qualibact-run.py \
    --workdir <working_directory> \
    --species_file <species_file.txt> \
    --min_genome_count <min_count>

- Per-species plots (*.png) and CSV summaries (summary.csv, selected_summary.csv)
- Combined summary tables across species (all_metrics.csv, all_metrics_summary.csv)
- Outlier visualizations with anomaly scores and joint KDEs
- merged.tsv: Combined TSV file with filename tracking (generated by make_copy.sh)
- GC content analysis results in output_compare_gc/ directory

- make_copy.sh: Utility script for merging multiple TSV files with filename tracking
- find_failed_jobs.py: Helper script for identifying failed processing jobs
- gc_refseq/do_gc_refseq.py: GC content analysis for RefSeq data
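As a rough illustration of what "merging with filename tracking" means, here is a Python sketch using pandas. It is not a port of make_copy.sh, whose exact output layout is not described here. The function name `merge_tsvs`, the `source_file` column, and the demo file names and values are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def merge_tsvs(paths, source_column="source_file"):
    """Concatenate TSV files, recording each row's originating file name."""
    frames = []
    for path in paths:
        frame = pd.read_csv(path, sep="\t")
        # Put the tracking column first so it is easy to spot.
        frame.insert(0, source_column, Path(path).name)
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Demo with two small, made-up TSV files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "species_a.tsv").write_text("sample\tn50\nA1\t210000\nA2\t185000\n")
    (tmp / "species_b.tsv").write_text("sample\tn50\nB1\t98000\n")

    # Collect inputs before writing the merged file so it is not re-read.
    inputs = sorted(tmp.glob("*.tsv"))
    merged = merge_tsvs(inputs)
    merged.to_csv(tmp / "merged.tsv", sep="\t", index=False)

print(merged)
```

Keeping the file name as a column makes it possible to trace any row in the merged table back to its source, for example when a single input turns out to be malformed.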
- Python ≥ 3.7
- pandas
- numpy
- seaborn
- matplotlib
- scipy
- scikit-learn
Install them with:
pip install -r requirements.txt

If you find issues or have suggestions, feel free to open an issue or submit a pull request!