ayaanhossain/oligopool

Version: 2026.02.22.1

✨ Features - 📦 Installation - 🚀 Getting Started - 📚 Docs - 📋 API - 💻 CLI - 📖 Citation - ⚖️ License

Oligopool Calculator is a Swiss-army knife for oligo pool libraries: a unified toolkit for high-throughput design, assembly, compression, and analysis of massively parallel assays, designed to integrate seamlessly with Python, the CLI, Jupyter, containers, and AI-assisted workflows.

Design modules generate primers, barcodes, motifs/anchors, and spacers; assembly modules split/pad long constructs; Degenerate Mode compresses similar sequences into IUPAC-degenerate oligos for cost-efficient synthesis (often useful for selection assays); and Analysis Mode packs and counts barcoded reads for activity quantification.
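
To make the Degenerate Mode idea concrete, here is a minimal sketch of how a single IUPAC-degenerate oligo stands for several concrete sequences. It uses only the standard IUPAC nucleotide codes and the Python standard library; the helper name `expand_iupac` is hypothetical and this is not the library's own `expand` implementation.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def expand_iupac(oligo):
    """Enumerate every concrete sequence encoded by a degenerate oligo.

    Hypothetical helper for illustration only (not part of oligopool).
    """
    choices = [IUPAC[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*choices)]

variants = expand_iupac("ACRTN")
print(len(variants))   # R has 2 options and N has 4, so 2 * 4 = 8
print(variants[:3])    # ['ACATA', 'ACATC', 'ACATG']
```

Compression runs in the opposite direction: a group of similar variants is replaced by one degenerate oligo whose expansion covers them, which is why it pays off mainly for libraries with low mutational diversity.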

Oligopool Calculator has been used to build libraries of tens of thousands of promoters (see here, and here), ribozymes, and mRNA stability elements (see here). It has been benchmarked to design pools containing millions of oligos and to process hundreds of millions of sequencing reads per hour on low-cost desktop-grade hardware.

To learn more, please check out our paper in ACS Synthetic Biology.

Design and analysis of oligo pool variants using Oligopool Calculator. (a) In Design Mode, Oligopool Calculator generates optimized barcodes, primers, spacers, and motifs. Assembly Mode can split longer oligos into shorter padded fragments for synthesis and assembly. Degenerate Mode can compress similar variants into IUPAC-degenerate oligos for cost-efficient synthesis or selection-based discovery workflows. (b) Once the library is assembled and cloned, barcoded amplicon sequencing data can be processed via Analysis Mode for characterization. Analysis Mode proceeds by first indexing one or more sets of barcodes, packing the reads, and then producing count matrices either using acount (association counting) or xcount (combinatorial counting).
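
As a rough illustration of what combinatorial counting produces, the sketch below tallies reads by the pair of barcodes they contain, one from each of two barcode sets. It is deliberately simplified: matching is by exact substring, there is no read packing or error tolerance, and the function name, barcode names, and toy reads are all hypothetical rather than taken from the library.

```python
from collections import Counter

def count_combinations(reads, barcodes_1, barcodes_2):
    """Count reads by the (set 1, set 2) barcode pair they contain.

    Hypothetical helper: exact substring matching only. Reads lacking a
    barcode from either set are skipped.
    """
    counts = Counter()
    for read in reads:
        hit_1 = next((name for name, seq in barcodes_1.items() if seq in read), None)
        hit_2 = next((name for name, seq in barcodes_2.items() if seq in read), None)
        if hit_1 is not None and hit_2 is not None:
            counts[(hit_1, hit_2)] += 1
    return counts

# Toy data (made up): two barcode sets and four reads.
bc1 = {"P1": "AACCGG", "P2": "TTGGCC"}
bc2 = {"U1": "GATTAC", "U2": "CTAGGA"}
reads = [
    "NNAACCGGTTTTGATTACNN",
    "NNAACCGGTTTTGATTACNN",
    "NNTTGGCCTTTTCTAGGANN",
    "NNTTGGCCTTTTTTTTTTNN",  # no set-2 barcode, so it is skipped
]
counts = count_combinations(reads, bc1, bc2)
for combo, n in sorted(counts.items()):
    print(combo, n)
```

Each key of the resulting counter corresponds to one cell of a count matrix indexed by barcode combination.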

✨ Features

  • 🧬 Design mode: constraint-based design of barcodes, primers, motifs/anchors, and spacers with background screening and utilities (barcode, primer, motif, spacer, background, merge, revcomp, join, final).
  • πŸ”§ Assembly mode: fragment long oligos into overlapping pieces and add Type IIS primer pads for scarless assembly (split, pad).
  • πŸ§ͺ Degenerate mode: compress variant libraries with low mutational diversity into IUPAC-degenerate oligos for cost-efficient synthesis and selection-based characterization (compress, expand).
  • πŸ“ˆ Analysis mode: fast NGS-based activity quantification with read indexing, packing, and barcode/associate counting (index, pack, acount, xcount) extensible with callback methods (via Python library).
  • βœ… QC mode: validate and inspect constraints and outputs (lenstat, verify, inspect).
  • 🔁 Iterative & multiplexed workflows: patch_mode for extending existing pools, cross-set barcode separation, and per-group primer design with cross-compatibility screening.
  • ⚡ Performance: scalable to very large libraries and high-throughput sequencing datasets, with published benchmarks demonstrating efficient design and analysis on commodity hardware (see paper).
  • 🔒 Rich constraints: IUPAC sequence constraints, motif exclusion, repeat screening, Hamming-distance barcodes, and primer thermodynamic constraints (including optional paired-primer Tm matching).
  • 📊 DataFrame-centric: modules operate on CSV/DataFrames and return updated tables plus stats; the CLI can emit JSON and supports reproducible stochastic runs (random_seed).
  • 💻 CLI + library-first: full-featured command-line interface with YAML config files, multi-step pipelines (sequential or parallel DAG), and a composable Python API for interactive use in scripts and Jupyter notebooks.
  • 🤖 AI-assisted design: agent-ready documentation for Claude, ChatGPT, and Copilot.
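
To illustrate what a Hamming-distance guarantee on barcodes means, the following sketch greedily collects barcodes that all differ from one another in at least a given number of positions. It is a toy enumeration over all sequences of a short length and makes no claim about how the library's `barcode` module searches; the function names and parameters here are hypothetical.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def greedy_barcodes(length, min_distance, limit):
    """Collect up to `limit` barcodes pairwise at least `min_distance` apart.

    Hypothetical helper: walks all 4**length sequences in lexicographic
    order, so it is only practical for short barcodes.
    """
    chosen = []
    for candidate in product("ACGT", repeat=length):
        seq = "".join(candidate)
        if all(hamming(seq, other) >= min_distance for other in chosen):
            chosen.append(seq)
            if len(chosen) == limit:
                break
    return chosen

barcodes = greedy_barcodes(length=6, min_distance=3, limit=10)
print(barcodes)
```

With a minimum distance of 3, a single substitution error in a sequenced barcode still leaves it closer to its true barcode than to any other member of the set, which is what makes nearest-neighbor assignment of reads unambiguous.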

🤖 AI-Assisted Design

Oligopool Calculator is optimized for AI-assisted workflows. Either share the docs/agent-skills.md file with your agent, or share the following raw URL along with a suitable prompt, for direct parsing.

https://raw.githubusercontent.com/ayaanhossain/oligopool/refs/heads/master/docs/agent-skills.md

Ensure that your AI/agent explores this document thoroughly. Afterwards, you can chat about the package, your specific design goals, and have the agent plan and execute the design and analysis pipelines.

📦 Installation

Oligopool Calculator requires Python 3.10 or later.

On Linux, macOS, and Windows Subsystem for Linux, you can install Oligopool Calculator from PyPI, where it is published as the oligopool package.

$ pip install --upgrade oligopool # Installs and/or upgrades oligopool

This also installs the command line tools: oligopool and op.

Or install it directly from GitHub:

$ pip install git+https://github.com/ayaanhossain/oligopool.git

Both approaches should install all dependencies automatically.

Note: The GitHub version always includes the most recent fixes. The PyPI version should be more stable.

If you are on Windows, or simply prefer containers, you can also run Oligopool Calculator via Docker (please see the notes).

Successful installation will look like this.

$ python
>>> import oligopool as op
>>> op.__version__
'2026.02.22.1'
>>>
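
If you would rather check the installation from a script than from an interactive session, a small sketch along these lines may help. It relies only on the standard library and on the PyPI distribution name `oligopool` mentioned above; the helper `installed_version` is a hypothetical name introduced for illustration.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of a package, or None if absent.

    Hypothetical helper for illustration only.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# The package requires Python 3.10 or later.
python_ok = sys.version_info >= (3, 10)
found = installed_version("oligopool")

print("Python 3.10+:", python_ok)
print("oligopool:", found if found else "not installed")
```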

🚀 Getting Started

Oligopool Calculator is carefully designed, easy to use, and stupid fast.

You can import the library and use its various functions either in a script or interactively inside a Jupyter environment. Use help(...) to read the docs as necessary and follow along.

The examples directory includes a design parser, a library compressor, an analysis pipeline, and a complete CLI YAML pipeline.

If you want the full end-to-end walkthrough, start with the notebook: Oligopool Calculator in action.

Documentation:

  • User Guide - Comprehensive tutorials, examples, and workflows
  • API Reference - Complete parameter documentation for all modules
  • AI Agent Guide - Decision trees, best practices, and gotchas for AI-assisted design (Claude, ChatGPT, Copilot)
  • Docker Guide - Run oligopool in a container for cross-platform consistency

$ python
>>>
>>> import oligopool as op
>>> help(op)
...
    Automated design and analysis of oligo pool libraries for
    high-throughput genomics and synthetic biology applications.

    Design Mode - build synthesis-ready oligo architectures
        barcode     orthogonal barcodes with Hamming distance guarantees
        primer      Tm-optimized primers with off-target screening
        motif       sequence motifs or anchors
        spacer      neutral fill to reach target length
        background  k-mer database for off-target screening
        merge       collapse columns into single element
        revcomp     reverse complement a column range
        join        join two tables on ID with ordered insertion
        final       concatenate into synthesis-ready oligos

    Assembly Mode - fragment long oligos for assembly
        split       fragment oligos into overlapping pieces
        pad         Type IIS primer pads for scarless excision

    Degenerate Mode - compress variant libraries for synthesis
        compress    reduce similar variants to IUPAC-degenerate oligos
        expand      expand IUPAC-degenerate oligos into concrete sequences

    Analysis Mode - quantify variants from NGS reads
        index       index barcodes and associated variants
        pack        filter/merge/deduplicate FastQ reads
        acount      association counting (barcode + variant verification)
        xcount      combinatorial counting (single or multiple barcodes)

    QC Mode - validate and inspect outputs
        lenstat     length statistics and free-space check
        verify      verify length, motif, and background conflicts
        inspect     inspect background/index/pack artifacts

    Advanced
        vectorDB    ShareDB k-mer storage
        Scry        1-NN barcode classifier

    Usage
        >>> import oligopool as op
        >>> df, stats = op.barcode(input_data='variants.csv', ...)
        >>> help(op.barcode)  # module docs

    Modules return (DataFrame, stats). Chain them iteratively; use patch_mode=True
    to extend pools without overwriting existing designs.

    CLI: `op` | `op COMMAND` | Docs: https://github.com/ayaanhossain/oligopool
...
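
The help text above describes `split` as fragmenting oligos into overlapping pieces. As a conceptual sketch only, the snippet below cuts a sequence into fixed-length windows that share a fixed number of bases, so that neighboring fragments can be stitched back together. The function name and its parameters are hypothetical, and the library's actual splitting logic is not shown in this README.

```python
def split_with_overlap(oligo, fragment_length, overlap):
    """Cut an oligo into fixed-length windows that share `overlap` bases.

    Hypothetical helper: the last fragment may be shorter than the rest.
    """
    if not 0 <= overlap < fragment_length:
        raise ValueError("overlap must be smaller than fragment_length")
    step = fragment_length - overlap
    fragments = []
    start = 0
    while True:
        fragments.append(oligo[start:start + fragment_length])
        if start + fragment_length >= len(oligo):
            break
        start += step
    return fragments

oligo = "ACGTACGTTGCATGCAAGCTTAGGCTA"  # 27 nt toy sequence
pieces = split_with_overlap(oligo, fragment_length=12, overlap=4)
for piece in pieces:
    print(piece)

# Dropping the shared prefix of every later fragment recovers the input.
rebuilt = pieces[0] + "".join(piece[4:] for piece in pieces[1:])
print(rebuilt == oligo)  # True
```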

💻 Command Line Interface (CLI)

The oligopool package installs a CLI with two equivalent entry points: oligopool and op.

$ op
$ op cite
$ op manual
$ op manual topics
$ oligopool manual barcode

Run op with no arguments to see the command list, and run op COMMAND to see command-specific options.

$ op

oligopool v2026.02.22.1
by ah

Oligopool Calculator is a suite of algorithms for
automated design and analysis of oligo pool libraries.

usage: oligopool COMMAND --argument=<value> ...

COMMANDS Available:

    manual      show module documentation
    cite        show citation information

    pipeline    execute multi-step pipeline from config

    barcode     orthogonal barcodes with cross-set separation
    primer      thermodynamic primers with optional Tm matching
    motif       design or add motifs/anchors
    spacer      neutral spacers to meet length targets

    background  build k-mer background database

    split       break long oligos into overlapping fragments
    pad         add excisable primer pads for scarless assembly

    merge       collapse contiguous columns
    revcomp     reverse-complement a column range
    join        join two oligo pool tables on ID

    lenstat     compute length stats and free space
    verify      detect length, motif, and background conflicts

    final       finalize into synthesis-ready oligos

    compress    compress sequences into IUPAC-degenerate oligos
    expand      expand IUPAC oligos to concrete sequences

    index       build barcode/associate index
    pack        preprocess and deduplicate FastQ reads
    acount      association counting (single index)
    xcount      combinatorial counting (multiple indexes)

    inspect     inspect non-CSV artifacts

    complete    print or install shell completion

Run "oligopool COMMAND" to see command-specific options.

Install tab-completion to blaze through interactive CLI use (recommended).

$ op complete --install          # auto-detect shell (restart your shell)
$ op complete --install bash     # or: zsh|fish

For detailed CLI behavior (output basenames, suffixing, type aliases, sequence-constraint shorthand, and split output defaults), see the CLI-Specific Notes.

YAML Pipelines

Define entire workflows in a single YAML config file and execute with one command:

$ op pipeline --config pipeline.yaml
$ op pipeline --config pipeline.yaml --dry-run  # validate first

Pipelines support sequential or parallel DAG execution, where independent steps run concurrently.

Example (single design output, serial chain):

pipeline:
  name: "MPRA Design (Serial)"
  steps:
    - primer
    - barcode
    - spacer
    - final

primer:
  input_data: "variants.csv"
  output_file: "01_primer"
  primer_type: forward
  # ...

Example (parallel DAG, best fit for analysis):

pipeline:
  name: "Counting DAG (Parallel)"
  steps:
    - name: index_bc1
      command: index
    - name: index_bc2
      command: index
    - name: pack_reads
      command: pack
    - name: count
      command: xcount
      after: [index_bc1, index_bc2, pack_reads]

# (Configs for index/pack/xcount omitted here for brevity.)
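
To see why the DAG form allows concurrency, the sketch below takes the step names and the `after` dependency from the example above and groups them into waves using Python's standard `graphlib` module. This is an illustration of dependency ordering in general, not the scheduler that `op pipeline` uses; the `execution_waves` helper is hypothetical.

```python
from graphlib import TopologicalSorter

# Step name -> set of steps it must wait for (mirrors the `after:` field).
dependencies = {
    "index_bc1": set(),
    "index_bc2": set(),
    "pack_reads": set(),
    "count": {"index_bc1", "index_bc2", "pack_reads"},
}

def execution_waves(graph):
    """Group steps into waves; steps in one wave have no unmet dependencies.

    Hypothetical helper for illustration only.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = execution_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

The two indexing steps and the packing step land in the first wave because none of them depends on another, while the counting step waits for all three.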

Working examples live in examples/cli-yaml-pipeline. Full pipeline rules live in Config Files.

📖 Citation

If you use Oligopool Calculator in your research publication, please cite our paper.

Hossain A, Cetnar DP, LaFleur TL, McLellan JR, Salis HM.
Automated Design of Oligopools and Rapid Analysis of Massively Parallel Barcoded Measurements.
ACS Synth Biol. 2024;13(12):4218-4232. doi:10.1021/acssynbio.4c00661

BibTeX:

@article{Hossain2024Oligopool,
  title   = {Automated Design of Oligopools and Rapid Analysis of Massively Parallel Barcoded Measurements},
  author  = {Hossain, Ayaan and Cetnar, Daniel P. and LaFleur, Travis L. and McLellan, James R. and Salis, Howard M.},
  journal = {ACS Synthetic Biology},
  year    = {2024},
  volume  = {13},
  number  = {12},
  pages   = {4218--4232},
  doi     = {10.1021/acssynbio.4c00661}
}

You can read the paper online for free at ACS Synthetic Biology.

  • PMCID: PMC11669329
  • PMID: 39641628

⚖️ License

Oligopool Calculator (c) 2026 Ayaan Hossain.

Oligopool Calculator is open-source software released under the GPL-3.0 license.

See LICENSE file for more details.
