dnadesign is a collection of modular bioinformatic pipelines and helper packages related to biological sequence design.
dnadesign/
├─ README.md          # High-level project documentation
├─ pyproject.toml
├─ uv.lock
└─ src/
   └─ dnadesign/
      ├─ permuter/    # in silico deep mutational scanning
      ├─ infer/       # model-agnostic inference (Evo2 adapter)
      ├─ densegen/    # string-packing nucleic acid assembly
      ├─ opal/        # active-learning engine
      └─ ...
- Installation
- Quickstart marimo notebooks
- Maintaining dependencies
- CUDA/GPU install notes (BU SCC)
- Marimo reference
- usr (Universal Sequence Record): Utility commands to inspect datasets/Parquet files used across the dnadesign project.
- DNA sequence design pipeline built on the integer linear programming framework from the dense-arrays package.
- Model-agnostic wrapper for DNA/protein language models (e.g., Evo2).
- An EVOLVEpro-style active-learning tool for DNA/protein sequence design campaigns.
- A Parquet/CSV-first tool for Leiden clustering, UMAP visualisation, and a mix of other analyses.
- Quantifies the regulatory diversity of dense-array DNA libraries generated by densegen.
- Iteratively subsamples sequence libraries from the sibling sequences directory and computes diversity metrics using the billboard pipeline as its engine.
- Applies Non-Negative Matrix Factorization (NMF) to a library of sequences generated by densegen to uncover higher-order transcription factor binding site combinations.
- Pipeline for latent space analysis of DNA sequences.
- Pipeline that parses TF position-weight matrices (MEME, JASPAR, etc.) via plug-in parsers, then runs a discrete Categorical Gibbs optimiser (or other plug-ins) to discover short DNA sequences that score highly on one or more TFs.
- Pipeline for analyzing transcription factor knockdown (TFKD) effects using PPTP-seq (Promoter responses to TF perturbation sequencing) data, a high-throughput approach described in Han et al. (2023).
- Wrapper for Biopython's PairwiseAligner, a class for computing Needleman–Wunsch global alignment scores between nucleotide sequences.
- Pipeline for biological sequence permutation and subsequent evaluation workflows.
- Contains a mix of old legacy projects and prototypes.
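
For readers unfamiliar with the Needleman–Wunsch global alignment score mentioned in the alignment wrapper entry above, here is a minimal pure-Python sketch of the quantity. It is not the wrapper's code and does not use Biopython. The function name `nw_score` and the scoring parameters (`match`, `mismatch`, `gap`) are illustrative assumptions and may not match the values the wrapper configures on `PairwiseAligner`.

```python
def nw_score(a: str, b: str, match: float = 1.0,
             mismatch: float = -1.0, gap: float = -1.0) -> float:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Hypothetical helper for illustration; scoring values are assumptions.
    """
    # prev[j] holds the best score for aligning the processed prefix of `a`
    # against the first j characters of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(nw_score("GATTACA", "GATTACA"))
    print(nw_score("ACGT", "AGT"))
```

Only two rows of the dynamic-programming table are kept, since the score (unlike the alignment itself) does not require a traceback.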
@e-south