MultipathogenGenomics/castanet

Described in Mayne, R., Secret, S., Geoghegan, C., et al. (2024) Castanet: a pipeline for rapid analysis of targeted multi-pathogen genomic data. Bioinformatics, btae591. https://doi.org/10.1093/bioinformatics/btae591

Try Castanet with Docker @ DockerHub

New in V9.2.0

  1. Extensions to the consensus generator algorithm, allowing greater precision when deconvolving similar sequences
  2. The mapping reference creator now has extended functionality and user guidance
  3. Docker container refresh with a simplified build
  4. Fix for a bug when using the postfilt option in batch mode
  5. Extended support for ONT users
  6. Wiki updates

Castanet documentation is hosted on our GitHub Wiki Page

Castanet workflow

[Figure: Castanet workflow diagram]

Dotted lines indicate optional pipeline stages.

Changelog

Version 9, 21/08/25

N.B.: if upgrading an existing installation, users will need to install minimap2 into their Castanet Conda environment manually:
$ conda install -c bioconda -y minimap2
  1. Users may now supply mapping references in any format and fine-tune them with the /convert_mapping_reference/ function. See Making a mapping reference for more details.
  2. Additional CLI support for Nanopore workflows
  3. Additional support for ONT users
    • Option to use Minimap2 as mapper
    • Utility function for concatenating all .fastq.gz files in a directory to a single, Castanet-compatible file
  4. Mapping reference file checks are now completed at the start of each run, and users are alerted if common issues are found
  5. Additional details from stderr are now written to the terminal when errors occur
  6. Bug fixes
    • Fixed an issue where Kraken2 wouldn't run properly in the filtering pipeline step when nested save directories were used.
    • Interim unzipped fastq files are no longer created and left in experiment directories when using Minimap2.
    • Extra error handling for consensus generation issues where third-party CLI tools are called.
    • Non-Castanet-compliant mapping reference headers would cause consensus sequence generation to fail
    • Using bowtie2 as mapper would create consensus sequences with unusual names
    • Running multiple Castanet jobs in parallel via CLI could cause issues with re-indexing the refstem
    • Added error handling to prevent users from inserting non-printable characters in API arguments
    • Summary statistics for batch runs now aggregate correctly in nested folders
  7. Docker container updated with latest version
  8. Consensus algorithm enhancements to give more representative results with highly diverse/recombination-prone viruses
  9. Post-filter option for removing uniquely mapping reads (use cases include removing index-hopping reads)
  10. BAM parsing enhancements for compute time and memory footprint.
  11. Parameterised "debug mode": when set to False, no intermediate files are generated, giving cleaner output that saves space (especially useful on shared infrastructure)
  12. Test suite updates
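The ONT utility for concatenating .fastq.gz files is only described briefly above. As a rough illustration of what such a step involves, the Python sketch below merges every .fastq.gz file in a directory into a single gzipped FASTQ. The function name `concat_fastq_gz` is hypothetical and is not Castanet's own API; the sketch assumes each input is a valid gzipped FASTQ that ends with a newline, and it does not attempt to apply whatever naming or format conventions Castanet itself requires.

```python
import gzip
import shutil
from pathlib import Path


def concat_fastq_gz(input_dir, output_path):
    """Merge every *.fastq.gz file in input_dir into one gzipped FASTQ.

    Hypothetical helper for illustration only; not Castanet's utility.
    Assumes every input file ends with a newline, so records never merge.
    Returns the list of input files in the order they were written.
    """
    input_dir = Path(input_dir)
    output_path = Path(output_path)
    # Sorted filename order keeps the output reproducible; the output file
    # itself is excluded in case it is written into input_dir.
    inputs = sorted(
        p for p in input_dir.glob("*.fastq.gz")
        if p.resolve() != output_path.resolve()
    )
    if not inputs:
        raise FileNotFoundError(f"No .fastq.gz files found in {input_dir}")
    with gzip.open(output_path, "wb") as out_handle:
        for path in inputs:
            # Decompress each input and stream it into the single output.
            with gzip.open(path, "rb") as in_handle:
                shutil.copyfileobj(in_handle, out_handle)
    return inputs
```

Decompressing and recompressing is slower than copying the raw bytes, but it produces one ordinary gzip stream rather than a multi-member file, which some downstream tools handle less reliably.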

Version 8, 20/12/24

  1. Added user option to map with bowtie2, in addition to BWA-mem2
  2. Troubleshooting section added to Wiki
  3. Additional optional arguments added to Castanet lite (CLI)
  4. Security and compatibility updates for modules: pydantic, fastapi, pandas, numpy
  5. Bug fixes:
    • Infile hashing now works as expected on amplicon pipeline
    • Batch mode will now ignore non-directory items in a folder, rather than raising an error
    • Fixed an issue where end-of-batch-run summary files would fail to save if input files and output directory were at different levels of directory nesting
    • Fixed issue where Kraken2 call wasn't working for some users who had pre-existing installations in their PATH
    • User guidance on running Castanet lite (CLI) fixed with correct path
  6. Bowtie2 support extended in Consensus module
  7. Parameterised enabling/disabling of terminal trimming in the Consensus module (ConsensusTrimTerminals)
  8. Expanded range of output statistics for Consensus module
  9. Test suite updated
  10. Additional support for Nanopore/single ended read users
  11. Name of headers in consensus sequences now reflects sample name, organism type and minimum depth
  12. Copies of consensus sequences are now saved to ./{experiment directory}/consensus_sequences/, to make them easier to find
  13. Mac user and bacterial target support enhancements (thanks @mjohnpayne)
  14. Test suite expanded to 80% coverage

Version 7, 19/08/24

  1. Docker support
  2. PyTest suite (80% cov)
  3. Wiki creation
  4. Deprecated original CLI and eval tools (supplanted by current version)

Version 6, 02/07/24

  1. Castanet "lite" - simplified CLI added
  2. Full CLI tool test suite
  3. Updated readme

Version 5, 07/03/24

  1. Additional workflow for analysing pre-mapped bam files
  2. Simplified all workflows by automatic inference of sequence/bam files in input folders
  3. Dependency check endpoint
  4. Expanded exception catching and logging
  5. Updated installer and readme
  6. Added parameterisation for number of threads, whether to perform trimming and whether to run the Kraken prefilter
  7. Various bug fixes
  8. Updated dependency installer and various dependency calls to enhance compatibility with Mac M1/M2

Version 4, 17/11/23

  1. Support for outputting intermediate files from consensus generation, for downstream analysis
  2. Support for single ended read sets
  3. Installer scripts modified to allow for more minimal linux builds and newer dependency versions (htslib/viral_consensus)
  4. Aggregation function regular expressions modified for BACT-containing probe names
  5. Deprecated requirement for inputting probes csv file; now inferred from refstem
  6. Various error handler improvements

Version 3, 12/09/23

  1. Refinement of consensus generator functions; addition of user-tunable threshold parameters, fix for long terminal gaps, expanding range of statistics reported, error handling, refactoring etc.
  2. Migration of plotting engine to Plotly
  3. Build script and dataset generation automation
  4. Panel converter endpoint with overhaul of string aggregation; finer control over sub-group reporting
  5. Function for trimming terminal gaps, which may appear as an artefact of Mafft reference alignments
  6. Various bug fixes
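Terminal gap trimming can be illustrated with a short Python sketch. This is not Castanet's implementation: the function name `trim_terminal_gaps` is hypothetical, and it assumes gaps are written as `-`, as in MAFFT alignment output. Only leading and trailing gaps are removed; internal gaps are kept, and the number of positions trimmed from each end is returned for reporting.

```python
from typing import Tuple


def trim_terminal_gaps(aligned_seq: str, gap_chars: str = "-") -> Tuple[str, int, int]:
    """Strip leading and trailing gap characters from an aligned sequence.

    Hypothetical helper for illustration only. Returns the trimmed sequence
    plus the number of characters removed from the left and right ends.
    """
    # Count gaps removed from the 5' (left) end.
    trimmed_left = len(aligned_seq) - len(aligned_seq.lstrip(gap_chars))
    # str.strip removes gap characters from both ends but not internally.
    core = aligned_seq.strip(gap_chars)
    trimmed_right = len(aligned_seq) - trimmed_left - len(core)
    return core, trimmed_left, trimmed_right


print(trim_terminal_gaps("--AC-GT---"))  # ('AC-GT', 2, 3)
```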

Version 2, 28/07/23

  1. Added consensus calling functions
  2. Added consensus sequence evaluation functions
  3. Experiment result folder creation and persistent file storage overhauled
  4. Various optimizations
  5. Readme updated
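For readers unfamiliar with consensus calling, the sketch below shows a generic majority-rule approach with tunable depth and frequency thresholds, similar in spirit to the user-tunable thresholds mentioned under Version 3. It is a simplified illustration, not Castanet's consensus algorithm: `call_consensus`, `min_depth` and `min_frequency` are hypothetical names, and the sketch assumes per-position base observations have already been extracted (e.g. from a pileup), ignoring insertions and deletions.

```python
from collections import Counter
from typing import List, Sequence


def call_consensus(columns: Sequence[Sequence[str]],
                   min_depth: int = 10,
                   min_frequency: float = 0.5) -> str:
    """Majority-rule consensus from per-position base observations.

    Hypothetical helper for illustration only. columns[i] holds the bases
    observed at reference position i. A position is masked as 'N' if it has
    fewer than min_depth observations, or if its most common base makes up
    less than min_frequency of the observations.
    """
    consensus: List[str] = []
    for bases in columns:
        depth = len(bases)
        if depth < min_depth:
            # Too few reads to make a call.
            consensus.append("N")
            continue
        base, count = Counter(bases).most_common(1)[0]
        consensus.append(base if count / depth >= min_frequency else "N")
    return "".join(consensus)


cols = [list("AAAA"), list("AACC"), list("GG"), list("TTTC")]
print(call_consensus(cols, min_depth=3, min_frequency=0.6))  # ANNT
```

Masking low-support positions with N, rather than guessing a base, keeps ambiguous sites visible in the output.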

Version 1, 14/06/23

  1. Python scripts ported to Python 3
  2. Project and working directory structure
  3. Dependency installer shell script
  4. Experiment directory management functions
  5. Refreshed example experiment in readme to fit the end-to-end workflow script
  6. Python scripts adapted to OOP practices for security and speed
  7. Unified API with end-to-end workflow endpoint
  8. Git hooks for contributors
  9. Containerization

Disclaimer

The material embodied in this software is provided to you "as-is", “with all faults”, and without warranty of any kind, express, implied or otherwise, including without limitation, any warranty of fitness for a particular purpose, warranty of non-infringement, or warranties of any kind concerning the safety, suitability, lack of viruses, inaccuracies, or other harmful components of this software. There are inherent dangers in the use of any software, and you are solely responsible for determining whether this software is compatible with your equipment and other software installed on your equipment. You are also solely responsible for the protection of your equipment and backup of your data, and the developers/providers will not be liable for any damages you may suffer in connection with using, modifying, or distributing this software. Without limiting the foregoing, the developers/providers make no warranty that: the software will meet your requirements; the software will be uninterrupted, timely, secure, or error-free; the results that may be obtained from the use of the software will be effective, accurate, or reliable; the quality of the software will meet your expectations; any errors in the software will be identified or corrected.

Software and its documentation made available here could include technical or other mistakes, inaccuracies, or typographical errors. The developers/providers may make changes to the software or documentation made available here. The software and documentation may be out of date, and the developers/providers make no commitment to update such materials.

The developers/providers assume no responsibility for errors or omissions in the software or documentation available from here.

In no event shall the developers/providers be liable to you or anyone else for any direct, special, incidental, indirect, or consequential damages of any kind, or any damages whatsoever, including without limitation, loss of data, loss of profit, loss of use, savings or revenue, or the claims of third parties, whether or not the developers/providers have been advised of the possibility of such damages and loss, however caused, and on any theory of liability, arising out of or in connection with the possession, use, or performance of this software.

The use of this software is done at your own discretion and risk and with agreement that you will be solely responsible for any damage to your computer system, or networked devices, or loss of data that results from such activities. No advice or information, whether oral or written, obtained by you from the developers/providers shall create any warranty for the software.

About

Sample type- and sequencing method-agnostic application for analysis of metagenomics sequencing data.
