Converts Custom FASTA file to Kraken2 Index without need of taxonomy or accession files

arpit20328/FastaKrakenizer

🧬 FastaKrakenizer

FastaKrakenizer builds a custom Kraken2 database from a plain FASTA file, without needing NCBI's names.dmp, nodes.dmp, or accession_list.txt.

It also optionally post-processes a Kraken2 classification report by replacing TaxIDs with readable FASTA header names using a generated flat taxonomy.


📄 Why Use This Tool?

Kraken2 reports identify taxa by numeric TaxIDs, which are uninformative in custom databases. This tool:

  • Creates a flat taxonomy, where each FASTA header is treated as its own species.
  • Assigns custom TaxIDs (e.g., starting from 9000000).
  • Replaces TaxIDs in Kraken2 reports with corresponding species names (FASTA headers).
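
The flat taxonomy described above can be sketched in a few lines of awk. This is only an illustration under stated assumptions, not the code in custom_kraken2_flat_db.sh: the helper name make_flat_taxonomy is hypothetical, the output is assumed to follow the NCBI dump layout (fields separated by tab, pipe, tab), and every header is attached directly to a root node with TaxID 1. It covers only the taxonomy files, not the mapping of sequences to TaxIDs.

```shell
#!/usr/bin/env bash
# Sketch only: build a flat taxonomy (names.dmp / nodes.dmp) from FASTA headers.
# make_flat_taxonomy is a hypothetical helper, not part of FastaKrakenizer.
make_flat_taxonomy() {
  local fasta=$1 outdir=$2 start_taxid=$3
  mkdir -p "$outdir"
  awk -v start="$start_taxid" \
      -v names="$outdir/names.dmp" \
      -v nodes="$outdir/nodes.dmp" '
    BEGIN {
      # Root node (TaxID 1) is its own parent, as in the NCBI dump format.
      printf("1\t|\troot\t|\t\t|\tscientific name\t|\n") > names
      printf("1\t|\t1\t|\tno rank\t|\t-\t|\n") > nodes
      taxid = start
    }
    /^>/ {
      header = substr($0, 2)   # header text without the leading ">"
      # One species-rank node per header, each a direct child of the root.
      printf("%d\t|\t%s\t|\t\t|\tscientific name\t|\n", taxid, header) > names
      printf("%d\t|\t1\t|\tspecies\t|\t-\t|\n", taxid) > nodes
      taxid++
    }
  ' "$fasta"
}
```

Calling `make_flat_taxonomy custom.fasta taxonomy 9000000` would write one names.dmp and one nodes.dmp entry per FASTA header, numbered consecutively from 9000000.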

Ideal for:

  • Simulated reads
  • Plasmids, ARGs, mobile elements
  • Custom isolate genomes

πŸ“ Required Inputs

  1. FASTA file β€” e.g., custom.fasta
  2. (Optional) Kraken2 Report (report.txt)

Prerequisites

  1. Kraken2: install via conda:
   conda install -c bioconda kraken2
  2. BBMask: install with the following commands:
wget https://sourceforge.net/projects/bbmap/files/latest/download -O bbtools.tar.gz
tar -xvzf bbtools.tar.gz
mv bbtools ~/bbtools
echo 'export PATH=$PATH:~/bbtools' >> ~/.bashrc
source ~/.bashrc
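
After installation, it can help to confirm that the required executables are actually on PATH. The sketch below uses a hypothetical helper, missing_tools, which takes the tool names as arguments. Which executables custom_kraken2_flat_db.sh calls is an assumption here: kraken2-build ships with Kraken2 and bbmask.sh with BBTools.

```shell
#!/usr/bin/env bash
# Sketch: report which required executables are missing from PATH.
# missing_tools is a hypothetical helper, not part of FastaKrakenizer.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # Returns 0 when every tool was found, 1 otherwise.
  return "$missing"
}

# Example call (tool names are an assumption about what the build script needs):
# missing_tools kraken2-build bbmask.sh || echo "install the prerequisites first"
```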

Installation

git clone https://github.com/arpit20328/FastaKrakenizer.git

πŸ› οΈ Usage

πŸ”Ή Build Kraken2 DB only:

bash custom_kraken2_flat_db.sh <input_fasta> <kraken_db_dir> <starting_taxid> [<threads>]

bash custom_kraken2_flat_db.sh custom.fasta kraken_custom_flat 9000000  64
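
Because the script takes positional arguments, a small pre-flight check can catch a swapped or missing argument before a long build starts. The following is a minimal sketch: check_build_args is a hypothetical helper, and the rules it enforces (non-empty FASTA, digits-only TaxID and thread count) are assumptions about sensible inputs rather than checks taken from the script itself.

```shell
#!/usr/bin/env bash
# Sketch: validate the positional arguments before calling the build script.
# check_build_args is a hypothetical helper, not part of FastaKrakenizer.
check_build_args() {
  local fasta=${1:-} db_dir=${2:-} start_taxid=${3:-} threads=${4:-}
  # The input FASTA must exist and be non-empty.
  [ -s "$fasta" ] || { echo "FASTA file missing or empty: $fasta" >&2; return 1; }
  [ -n "$db_dir" ] || { echo "database directory not given" >&2; return 1; }
  # The starting TaxID must consist of digits only.
  case $start_taxid in
    ''|*[!0-9]*) echo "starting TaxID must be an integer" >&2; return 1 ;;
  esac
  # The thread count is optional, but must be digits only when given.
  if [ -n "$threads" ]; then
    case $threads in
      *[!0-9]*) echo "threads must be an integer" >&2; return 1 ;;
    esac
  fi
  return 0
}

# Example:
# check_build_args custom.fasta kraken_custom_flat 9000000 64 &&
#   bash custom_kraken2_flat_db.sh custom.fasta kraken_custom_flat 9000000 64
```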

Sample Output

[Image: sample Kraken2 report]

| Column Index | Meaning | Description |
|---|---|---|
| 1 | Percentage of reads in clade | Percentage of total reads classified to this taxon or any of its descendants. |
| 2 | Number of reads in clade | Number of reads classified to this taxon or any of its descendants. |
| 3 | Number of reads classified directly here | Reads classified exactly to this taxon (not including descendants). |
| 4 | Taxonomic rank code | Single-letter code indicating taxonomic rank (e.g., S = species, U = unclassified). |
| 5 | Taxonomy ID (TaxID) | Numeric taxonomy identifier. In standard databases this is the NCBI TaxID; in a database built with this tool it is the custom TaxID assigned to a FASTA header. |
| 6 | Taxon name | The scientific name or label for this taxon (e.g., species name, or "unclassified"). |
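
As an illustration of how these columns can be used, the sketch below lists the species-level rows of a report, sorted by clade read count. The helper name species_rows is hypothetical, and it assumes the report is tab-separated with rank code S for the per-header entries.

```shell
#!/usr/bin/env bash
# Sketch: list species-level rows (rank code "S") of a Kraken2 report,
# printing TaxID, clade read count, and name, sorted by read count (descending).
# species_rows is a hypothetical helper, not part of FastaKrakenizer.
species_rows() {
  awk -F '\t' -v OFS='\t' '
    $4 == "S" {
      name = $6
      sub(/^ +/, "", name)   # strip the indentation Kraken2 adds to names
      print $5, $2, name
    }
  ' "$1" | sort -t "$(printf '\t')" -k2,2nr
}

# Example:
# species_rows report.txt | head
```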

Replacing TaxIDs with Input FASTA Headers

To replace the 5th column of a Kraken2 report (the TaxID) with your input FASTA headers, look each TaxID up in the names.dmp file with the following command:

awk 'BEGIN { FS = OFS = "\t" } NR==FNR { taxid_name[$1]=$3; next } { if ($5 in taxid_name) $5=taxid_name[$5]; print }' names.dmp kraken2_report.txt > kraken2_report_with_names.txt
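
A variation on the command above keeps the numeric TaxID in column 5 and appends the matching name as an extra last column, so the report stays usable by tools that expect a TaxID there. This is a sketch under assumptions: append_names is a hypothetical helper, names.dmp is assumed to use the NCBI layout (so the name is tab-separated field 3), and TaxIDs without a match are marked NA.

```shell
#!/usr/bin/env bash
# Sketch: keep the TaxID in column 5 and append the matching name as a new last column.
# append_names is a hypothetical helper, not part of FastaKrakenizer.
append_names() {
  local names=$1 report=$2
  awk 'BEGIN { FS = OFS = "\t" }
       # First file operand (names.dmp): remember TaxID -> name.
       FILENAME == ARGV[1] { name[$1] = $3; next }
       # Second file operand (report): print the row plus the looked-up name.
       { print $0, (($5 in name) ? name[$5] : "NA") }
  ' "$names" "$report"
}

# Example:
# append_names names.dmp kraken2_report.txt > kraken2_report_annotated.txt
```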

📦 Example Kraken2 Index

An example Kraken2 index built using FastaKrakenizer from the complete Homo sapiens genome assembly (T2T-CHM13v2.0) is available at:

🔗 Zenodo Record: https://zenodo.org/records/16459107

Runtime

The index for the GCF_009914755.1 (T2T-CHM13v2.0) FASTA (3 GB) was built in 16 minutes 19.7 seconds using 190 CPU threads.

📄 License

MIT License

Copyright (c) 2025 Arpit Mathur

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

🙋 Author & Support

Developed by Arpit Mathur, independent researcher.
📧 Contact: arpit20328@iiitd.ac.in

🐛 For bugs, suggestions, or improvements, please open an issue in the GitHub Issues section.

