Prepare primers from different formats, e.g. convert schemes for tiled sequencing from VarVAMP to ARTIC format. Also prepares and aligns primers with me-PCR or Exonerate.

FOI-Bioinformatics/preprimer

PrePrimer

Primer scheme converter for tiled amplicon sequencing supporting linear and circular genomes.

CI Status · License: MIT · Python 3.11+ · Platform: Linux | macOS

PrePrimer converts between primer design formats used in tiled amplicon sequencing workflows. It reads and writes VarVAMP, ARTIC, Olivar, and STS, writes FASTA as an output-only format, and detects circular genome topology automatically.

Current Version: v0.3.0

Recent Changes

  • Comprehensive Writer Testing: All 5 output writers now have complete test coverage (110/113 tests, 97.3%)
  • BaseWriterTest Pattern: Reusable test infrastructure with automatic contract enforcement
  • Performance Baselines: Established benchmarks for all writers (51-591µs write times)

See CHANGELOG.md for complete release history.

Features

PrePrimer provides format conversion with the following capabilities:

  • Format Support: 4 input parsers (VarVAMP, ARTIC, Olivar, STS) and 5 output writers (VarVAMP, ARTIC, Olivar, STS, FASTA) supporting 20 conversion pathways
  • Topology Detection: Automatic identification of circular genome architectures (mitochondrial DNA, plasmids, viral episomes)
  • Standards Compliance: Implementation of primal-page info.json schema and articbedversion specifications (v2.0/v3.0)
  • IUPAC Support: Handling of degenerate nucleotide codes for variant-aware primer designs
  • Alignment Integration: BLAST, Exonerate, merPCR, and me-PCR providers for primer-to-reference alignment
  • Input Validation: Path sanitization, size limits, and format verification for security
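
To illustrate what IUPAC support involves, here is a minimal sketch that expands a degenerate primer into every unambiguous sequence it can represent. The code table is the standard IUPAC nucleotide alphabet; the function name `expand_degenerate` is hypothetical and is not part of the PrePrimer API, and this is not necessarily how PrePrimer handles degenerate bases internally.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every unambiguous sequence a degenerate primer can represent.

    Hypothetical helper for illustration only; not a PrePrimer function.
    """
    try:
        choices = [IUPAC_CODES[base] for base in primer.upper()]
    except KeyError as err:
        raise ValueError(f"Not an IUPAC nucleotide code: {err.args[0]!r}") from None
    # The Cartesian product of the per-position choices enumerates all variants.
    return ["".join(combo) for combo in product(*choices)]


print(expand_degenerate("ACRT"))  # ['ACAT', 'ACGT']
```

The number of variants grows multiplicatively with each degenerate position (a primer with two `N` bases already expands to 16 sequences), so full expansion is only practical for lightly degenerate primers.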

Codebase Statistics

  • Source Code: ~6,900 lines of Python across 59 modules
  • Test Suite: ~22,300 lines implementing 998 tests
  • Test Coverage: 96.90% with 100% pass rate
  • Performance: Sub-second processing for datasets containing 500 amplicons

Installation

Requirements

  • Python 3.11 or higher
  • Linux or macOS operating system

Setup

# Clone repository
git clone https://github.com/FOI-Bioinformatics/preprimer.git
cd preprimer

# Install package
pip install -e .

# Verify installation
preprimer --version

Quick Start

Command Line Interface

# List supported formats
preprimer list

# Inspect input file
preprimer info primers.tsv

# Convert single format
preprimer convert --input primers.tsv --output-dir output/ --output-formats artic

# Convert to multiple formats
preprimer convert --input primers.tsv --output-dir output/ \
                  --output-formats artic fasta sts --prefix MyVirus

Python API

from preprimer.core.converter import PrimerConverter
from preprimer.core.enhanced_config import EnhancedConfig

# Initialize converter
config = EnhancedConfig()
converter = PrimerConverter(config)

# Perform conversion
result = converter.convert(
    input_file="primers.tsv",
    output_dir="output/",
    output_formats=["artic", "fasta"],
    prefix="SARS-CoV-2"
)

print(f"Converted {len(result)} amplicons")

Supported Formats

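| Format  | Input          | Output | Specification                                                  |
|---------|----------------|--------|----------------------------------------------------------------|
| VarVAMP | Yes (.tsv)     | Yes    | 13-column TSV with IUPAC degenerate nucleotide support         |
| ARTIC   | Yes (.bed)     | Yes    | BED6 format following articbedversion v2.0/v3.0                |
| Olivar  | Yes (.csv)     | Yes    | CSV format with amplicon metadata and circular genome support  |
| STS     | Yes (.sts.tsv) | Yes    | 3/4-column TSV with automatic header detection                 |
| FASTA   | No             | Yes    | Multi-FASTA with metadata in sequence headers                  |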

Full format specifications available in docs/user-guide/supported-formats.md.
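
As a rough illustration of the BED6 layout used for ARTIC-style schemes, the sketch below parses one tab-separated line into a typed record. It assumes the generic six BED columns (chrom, start, end, name, score, strand) with 0-based, half-open coordinates. The `BedRecord` class, the `parse_bed6_line` function, and the example line are hypothetical and are not taken from PrePrimer or from a real scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BedRecord:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str
    score: str  # kept as text; primer schemes often reuse this column (assumption)
    strand: str


def parse_bed6_line(line: str) -> BedRecord:
    """Parse one BED6 line. Hypothetical helper, not a PrePrimer function."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"Expected at least 6 tab-separated columns, got {len(fields)}")
    chrom, start, end, name, score, strand = fields[:6]
    if strand not in {"+", "-"}:
        raise ValueError(f"Strand must be '+' or '-', got {strand!r}")
    return BedRecord(chrom, int(start), int(end), name, score, strand)


# Hypothetical example line, for illustration only.
record = parse_bed6_line("ref\t30\t54\tscheme_1_LEFT\t1\t+")
print(record.name, record.end - record.start)  # scheme_1_LEFT 24
```

Because BED intervals are half-open, the primer length is simply `end - start`. The sketch does not validate coordinate order, since a primer spanning the origin of a circular genome may need special handling.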

Documentation

User Documentation

Developer Documentation

Technical Documentation

Development Documentation

Alignment Integration

PrePrimer integrates multiple alignment providers for primer-to-reference validation:

# Align primers using merPCR (recommended)
preprimer align --primers primers.bed --reference genome.fasta --aligner merpcr

# Use BLAST for alignment
preprimer align --primers primers.tsv --reference genome.fasta \
                --aligner blast --output alignment.tsv

Available providers: blast, exonerate, merpcr, mepcr
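
Conceptually, primer-to-reference alignment answers the question "where, and on which strand, does this primer bind?". The sketch below shows the simplest version of that check: exact matching on both strands, with optional wrap-around for circular genomes. It is an assumption-laden illustration, not how BLAST, Exonerate, merPCR, or me-PCR work, and the function names `reverse_complement` and `find_primer` are hypothetical rather than part of PrePrimer.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_primer(primer: str, reference: str, circular: bool = False) -> list[tuple[int, str]]:
    """Return sorted (0-based start, strand) pairs for exact primer matches.

    Hypothetical helper for illustration; real aligners tolerate mismatches.
    """
    ref = reference.upper()
    # For a circular genome, append the first len(primer) - 1 bases so that
    # matches spanning the origin are found at their true start position.
    search = ref + ref[: len(primer) - 1] if circular else ref
    hits = []
    for strand, query in (("+", primer.upper()), ("-", reverse_complement(primer))):
        pos = search.find(query)
        while pos != -1:
            hits.append((pos, strand))
            pos = search.find(query, pos + 1)
    return sorted(hits)


reference = "ACGTTGCAAG"
print(find_primer("GTTG", reference))                 # [(2, '+')]
print(find_primer("CAAC", reference))                 # [(2, '-')]
print(find_primer("AGAC", reference))                 # []
print(find_primer("AGAC", reference, circular=True))  # [(8, '+')]
```

The last two calls show why topology matters: the primer `AGAC` spans the origin of the toy reference, so it is found only when the sequence is treated as circular.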

Testing

# Run complete test suite
python -m pytest

# Run with coverage report
python -m pytest --cov=preprimer --cov-report=html

# Run specific test categories
python -m pytest -m unit          # Unit tests only
python -m pytest -m integration   # Integration tests
python -m pytest -m security      # Security validation

Development

Code Quality Tools

# Format code
black preprimer/ tests/
isort preprimer/ tests/

# Static analysis
flake8 preprimer/ tests/ --max-line-length=88 --extend-ignore=E203,W503
mypy preprimer/ --ignore-missing-imports

# Security scanning
bandit -r preprimer/ -ll

Citation

If you use PrePrimer in your research, please cite:

@software{preprimer2025,
  title = {PrePrimer: Primer Scheme Converter for Tiled Amplicon Sequencing},
  author = {PrePrimer Contributors},
  year = {2025},
  url = {https://github.com/FOI-Bioinformatics/preprimer},
  version = {0.3.0}
}

See CITATION.cff for machine-readable citation metadata.

License

This project is licensed under the MIT License - see LICENSE for details.

Security

For security concerns, please review SECURITY.md for reporting procedures.

Support

Acknowledgments

PrePrimer implements specifications from:
