microbialARC/Redcarpet

Redcarpet (Recombination Detection using Comparative Analysis of Regional Patterns of Exact Match Targets)

Redcarpet is an alignment-free recombination detection tool that uses the distribution of exact protein matches across a genomic database. It builds on the WhatsGNU method, which uses exact matching to identify proteomic novelty.

Redcarpet takes in a single query genome, and for each encoded protein, determines the set of genomes in a database that contain an exact protein sequence match. It then computes the Jaccard similarity coefficient between genome sets for all pairwise protein comparisons in the genome.
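The pairwise comparison described above can be illustrated with a minimal Python sketch. It is not taken from Redcarpet.py: the protein names, genome IDs, and the `jaccard` and `pairwise_jaccard` helpers are hypothetical, and the real tool reads these sets from a WhatsGNU hits file rather than from a literal dictionary.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; taken as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(genome_sets):
    """Return a symmetric matrix of Jaccard similarities for a list of sets."""
    n = len(genome_sets)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            score = jaccard(genome_sets[i], genome_sets[j])
            matrix[i][j] = score
            matrix[j][i] = score
    return matrix


# Hypothetical data: for each protein in the query genome (in genome order),
# the set of database genomes that contain an exact match.
protein_hits = {
    "prot_1": {"G1", "G2", "G3"},
    "prot_2": {"G2", "G3", "G4"},
    "prot_3": {"G5"},
}

names = list(protein_hits)
matrix = pairwise_jaccard([protein_hits[name] for name in names])

for name, row in zip(names, matrix):
    print(name, [round(value, 2) for value in row])
```

Here `prot_1` and `prot_2` share two of four genomes (similarity 0.5), while `prot_3` shares none with either. Neighbouring proteins with similar genome sets produce blocks of high similarity in the matrix.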


Getting the Input File for Redcarpet

WhatsGNU must be run first to produce the hits file that serves as the input for Redcarpet. For installation, see the WhatsGNU documentation.

Using a hashed database:

WhatsGNU_main_hashes.py -d $database_path -csv $file.csv -i --hash_values -o $output_directory query_faa/

whatsgnu_output/
├── GCF_000005845.2_prtn_id_hashes.csv
├── GCF_000005845.2_WhatsGNU_hits.txt
├── GCF_000005845.2_WhatsGNU_report.txt
└── WhatsGNU_20250108_115433.log

Running Redcarpet

To run Redcarpet, use the following command:

python3 Redcarpet.py $WhatsGNU_hits.txt

usage: Redcarpet.py [-h] [-v] [-bk BOTTOM_K] [--hash_file HASH_FILE] ids_hits_file

Alignment-Free Recombination Detector

positional arguments:
  ids_hits_file         ids_hits_file from WhatsGNU -i option

options:
  -h, --help            show this help message and exit
  -v, --version         print version and exit
  -bk BOTTOM_K, --bottom_k BOTTOM_K
                        bottom-k cutoff for hits (default: all hits are used)
  --hash_file HASH_FILE
                        ids hash file for a WhatsGNU database

[Discuss what is meant by bottom_k option]
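The help text describes `-bk` only as a bottom-k cutoff for hits. One common meaning of "bottom-k" in sequence comparison is the bottom-k MinHash sketch: each ID is hashed, only the k smallest hash values per set are kept, and Jaccard similarity is estimated from those reduced sets. Whether Redcarpet applies the option in exactly this way is an assumption. The sketch below only illustrates the general technique, and `stable_hash`, `bottom_k`, and `estimate_jaccard` are hypothetical helper names.

```python
import hashlib


def stable_hash(genome_id):
    """Map a genome ID to a reproducible 64-bit integer (hypothetical hashing scheme)."""
    digest = hashlib.sha1(genome_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def bottom_k(genome_ids, k):
    """Keep the k smallest hash values of a set of genome IDs."""
    return sorted({stable_hash(g) for g in genome_ids})[:k]


def estimate_jaccard(sketch_a, sketch_b, k):
    """Estimate Jaccard similarity from two bottom-k sketches.

    Take the k smallest hashes of the merged sketches and count how many
    of them occur in both sketches.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:k]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


# Hypothetical genome sets for two proteins.
hits_a = {"G1", "G2", "G3"}
hits_b = {"G2", "G3", "G4"}

k = 10
estimate = estimate_jaccard(bottom_k(hits_a, k), bottom_k(hits_b, k), k)
print(estimate)
```

When k is at least as large as both sets, the sketches hold every hash and the estimate equals the exact Jaccard similarity. A smaller k trades accuracy for less memory and faster comparisons.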

Changepoint Analysis

Once Redcarpet has generated its report, you can run the changepoint analysis step.

Single File Processing

To process a single Redcarpet report, use the following command:

python3 CarpetCleanChangepoints.py --mode single -i $input_report -o $output_directory

Batch Processing

To process multiple Redcarpet reports in a folder simultaneously:

python3 CarpetCleanChangepoints.py --mode batch --input_folder $input_folder -o $output_directory

Command Line Options:


-i, --input_heatmap : Path to the heatmap file generated by Redcarpet (required for single mode)

--input_folder : Path to folder containing multiple Redcarpet reports (required for batch mode)

-o, --output_directory : Directory to store outputs (optional - defaults to input file/folder directory)

--similarity_threshold : P-value threshold for determining region similarity (default: 0.05, lower = more strict)

--k_neighbors : Number of nearest neighbors to compare for efficiency (default: 5, higher = more comparisons)

--num_chunks : Number of chunks to split data into (default: 10)

The output lists the regions of the genome delimited by the identified changepoints, along with information about which regions are similar to one another.

Additional Resources Available

Already completed heatmaps and reports for all S. aureus and K. pneumoniae
