MartinezRuiz-Carlos/consensus_insert_sequence

Consensus insert sequence calling

Scripts to generate a consensus inserted sequence for insertions called by Savana. The consensus is based on a multiple alignment of the FASTA file that Savana produces for insertion calls. The script preprocess_insert_sequence_fa.sh takes as input the .inserted_sequences_savana.fa file generated by Savana and splits it per variant. Each per-variant file is then run through consensus_insert_sequence.py. The script run_consensus_test.array.sh runs the Python script over all variants of one patient on the Crick cluster.
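To illustrate what deriving a consensus from a multiple alignment can look like, here is a minimal sketch in plain Python. It is not taken from consensus_insert_sequence.py: the function name `majority_consensus`, the majority-vote rule, and the choice to drop columns where a gap is the most common symbol are all assumptions made for illustration. It expects sequences that are already aligned to equal length, as an aligner such as MAFFT would produce.

```python
from collections import Counter


def majority_consensus(aligned, gap="-"):
    """Return a majority-vote consensus of equal-length aligned sequences.

    Hypothetical helper for illustration only. Columns whose most common
    symbol is the gap character are dropped from the consensus.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    consensus = []
    for column in zip(*aligned):
        # most_common(1) gives the most frequent symbol in this column;
        # ties resolve to the symbol seen first in the column.
        symbol, _count = Counter(c.upper() for c in column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


if __name__ == "__main__":
    rows = ["ACG-TA", "ACGTTA", "ACG-TA"]
    print(majority_consensus(rows))
```

A simple majority rule is easy to reason about, but it ignores base qualities and read support thresholds, so the actual script may well make different choices.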

Software and libraries required

  • SeqKit grep for pre-processing of the inserted sequences file
  • MAFFT for multiple alignment
  • Python libraries (the file sequence_ins_env.yml defines a conda environment with all required libraries):
    • Bio
    • io
    • pathlib
    • argparse

How to run

The scripts are written assuming they will be run on the Crick HPC. First, run the pre-processing script to split the Savana FASTA file by SV ID. The script takes two inputs, the FASTA file with all inserted sequences from Savana and an output directory:

    ./preprocess_insert_sequence_fa.sh SAMPLE_ID.inserted_sequences.fa tmp_dir
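The splitting step can be pictured as grouping FASTA records by the SV they belong to. The sketch below does this in plain Python rather than with SeqKit grep, which is what the repository's shell script uses. The header layout is an assumption: the example pretends the SV ID is the part of the header before a `|` character, which may not match Savana's real output, so the hypothetical `sv_id_of` argument lets you supply your own rule.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_sv_id(text, sv_id_of=lambda header: header.split("|")[0]):
    """Group FASTA records by SV ID.

    `sv_id_of` maps a header to its SV ID. The default (text before the
    first '|') is an assumed layout for illustration, not Savana's format.
    """
    groups = defaultdict(list)
    for header, seq in parse_fasta(text):
        groups[sv_id_of(header)].append((header, seq))
    return dict(groups)


if __name__ == "__main__":
    # Made-up records using the assumed 'SVID|read' header layout.
    example = ">ID_1|read_a\nACGT\nAC\n>ID_2|read_b\nGGGA\n>ID_1|read_c\nACGTAC\n"
    for sv_id, records in group_by_sv_id(example).items():
        print(sv_id, len(records))
```

Each group would then be written to its own FASTA file in tmp_dir, giving one input file per SV for the consensus step.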

Then run the consensus script as an array job; each array task processes multiple sequences in a loop. The script takes two inputs, the tmp dir containing the per-SV FASTA files generated in step 1 and an output directory where the resulting consensus insert sequences will be saved:

    ./run_consensus_test.array.sh tmp_dir out_dir
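One way to picture how an array job spreads the per-SV files over its tasks is a strided split of the sorted file list. This is only a sketch under assumptions: the README does not show how run_consensus_test.array.sh assigns files, and the function name `files_for_task` and the 0-based task index are hypothetical. On a Slurm cluster the index would typically come from the `SLURM_ARRAY_TASK_ID` environment variable, assuming Slurm is the scheduler in use.

```python
def files_for_task(files, task_index, n_tasks):
    """Return the files that one array task should process.

    Hypothetical helper: the sorted list is dealt out round-robin, so
    task `task_index` (0-based) takes every `n_tasks`-th file.
    """
    if n_tasks < 1 or not 0 <= task_index < n_tasks:
        raise ValueError("task_index must satisfy 0 <= task_index < n_tasks")
    # Sorting makes the assignment identical in every task, whatever
    # order the directory listing happened to return.
    ordered = sorted(files)
    return ordered[task_index::n_tasks]


if __name__ == "__main__":
    # Made-up file names standing in for the per-SV FASTA files in tmp_dir.
    per_sv_files = ["sv3.fa", "sv1.fa", "sv2.fa", "sv5.fa", "sv4.fa"]
    for task in range(2):
        print(task, files_for_task(per_sv_files, task, 2))
```

Every file is covered by exactly one task, and each task then loops over its own share, which matches the "multiple sequences in a loop" behaviour described above.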
