Microphaser is a tool for phasing small tumor DNA sequences - e.g. coding for small peptides - in linear time.
It can be used in tumor neoantigen prediction to generate the neo-peptidome.
Microphaser will soon be available for installation via Bioconda.
To use microphaser, you need the following input files:
- a sorted and indexed bam file containing mapped tumor reads
- a reference genome in fasta format
- the matching gene and transcript annotation in gtf format
- a bcf/vcf file containing germline and somatic variants, where somatic variants should be flagged with a
SOMATICINFO tag - optional: a bcf/vcf file containing only germline variants
Microphaser returns three important files
- two filtered fasta files containing all neopeptides and their wildtype counterparts for further use with MHC-binding prediction tools
- an info file in tsv format containing meta-information about every neopeptide
The info table consist of the following fields:
- id: peptide identifier as found in the fasta files
- transcript: Ensembl transcript name
- gene_id: Ensembl gene name
- gene_name: Gene symbol
- chrom: Chromosome
- offset: Position of the neopeptide on the chromosome
- freq: Frequency of the neopeptide occurring in all reads overlapping the peptide position
- nvar: number of variants in the neopeptide
- nsomatic: number of somatic variants in the neopeptide
- nvariant_sites: number of variant sites in the range of the neopeptide
- nsomvariant_sites: number of somatic variant sites in the range of the neopeptide
- strand: Strand orientation of the transcript (forward or reverse)
- somatic_positions: Positions of the somatic variants in the neopeptide
- somatic_aa_change: Somatic Amino Acid changes occuring in the neopeptide
- germline_positions: Positions of germline variants in the neopeptide
- germline_aa_change: Germline Amino Acid changes occuring in the neopeptide
- normal_sequence: Nucleotide sequence of the wildtype peptide
- mutant_sequence: Nucleotide sequence of the neopeptide
Currently, microphaser consists of four different submodules:
- somatic (returns neopeptides and their corresponding wildtype peptides)
- normal (returns all wildtype peptides of the patient)
- build_reference (returns a binary file representing the patients wildtype peptidome)
- filter (compares neopeptides against the wildtype peptidome and removes self-similar candidates)
You can run microphaser like this:
Phasing of the tumor reads and variants:
microphaser somatic tumor.bam -r reference.fa -b all_variants.bcf -t neopeptides.info.tsv -n wildtype_peptides.fa < reference.gtf > neopeptides.fa
Generation of the patients germline peptidome:
microphaser normal healthy.bam -r reference.fa -b germline_variants.bcf -n wildtype_peptides.fa < reference.gtf > germline_peptidome.fa
Building the reference binary file of the germline peptidome:
microphaser build_reference germline_peptidome.fa > germline_peptidome.bin
Filtering the neopeptide candidates from subcommand microphaser somatic:
microphaser filter -r germline_peptidome.bin -t neopeptides.info.tsv -o neopeptides.filtered.info.tsv -n wildtype_peptides.filtered.fa > neopeptides.filtered.fa
- Jan Forster (https://github.com/jafors)
- Johannes Köster (https://koesterlab.github.io)