dnadesign

dnadesign is a collection of modular bioinformatic pipelines and helper packages related to biological sequence design.

Directory layout

dnadesign/
├── README.md           # High-level project documentation
├── pyproject.toml
├── uv.lock
└── src/
    └── dnadesign/
        ├── permuter/    # in silico deep mutational scanning
        ├── infer/       # model-agnostic inference (Evo2 adapter)
        ├── densegen/    # string-packing nucleic acid assembly
        ├── opal/        # active-learning engine
        └── ...

Documentation

Available tools

usr (Universal Sequence Record)

Utility commands for inspecting the datasets and Parquet files used across the dnadesign project.

densegen

DNA sequence design pipeline built on the integer linear programming framework from the dense-arrays package.

infer

Model-agnostic wrapper for DNA/protein language models (e.g., Evo2).

opal

An EVOLVEpro-style active-learning tool for DNA/protein sequence design campaigns.

cluster

A Parquet/CSV-first tool for Leiden clustering, UMAP visualisation, and a mix of other analyses.

billboard

Quantifies the regulatory diversity of dense-array DNA libraries generated by densegen.

libshuffle

Iteratively subsamples sequence libraries from the sibling sequences directory and computes diversity metrics using the billboard pipeline as its engine.
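The subsample-and-score loop described above can be illustrated with a minimal sketch. It is not libshuffle's implementation: the function names are hypothetical, and mean pairwise Hamming distance stands in for the billboard diversity metrics, which are not specified here. It also assumes all sequences have equal length.

```python
import random
from itertools import combinations
from statistics import mean


def mean_pairwise_hamming(seqs):
    """Stand-in diversity metric: average Hamming distance over all pairs."""
    return mean(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)
    )


def subsample_diversity(library, subsample_size, iterations, seed=0):
    """Repeatedly draw a random subsample and record its diversity score."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    results = []
    for _ in range(iterations):
        subset = rng.sample(library, subsample_size)
        results.append((mean_pairwise_hamming(subset), subset))
    return results


# Toy library of equal-length sequences (illustrative data only).
library = ["ACGTAC", "ACGTTC", "TTGTAC", "ACGAAC", "GCGTAG"]
results = subsample_diversity(library, subsample_size=3, iterations=10)
best_score, best_subset = max(results, key=lambda r: r[0])
```

Keeping every (score, subset) pair rather than only the maximum makes it possible to look at the whole score distribution across draws.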
nmf

Applies Non-Negative Matrix Factorization (NMF) to a library of sequences generated by densegen to uncover higher-order transcription factor binding site combinations.

latdna

Pipeline for latent space analysis of DNA sequences.

cruncher

Pipeline that parses TF position-weight matrices (MEME, JASPAR, etc.) via plug-in parsers, and then runs a discrete Categorical Gibbs optimiser (or other plug-ins) to discover short DNA sequences that score highly on one or more TFs.
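To make the idea concrete, here is a minimal, generic sketch of scoring a sequence against a position-weight matrix and improving it with Gibbs-style sampling. It is not cruncher's code: the function names, the toy matrix, the uniform background, the pseudocount, and the inverse temperature `beta` are all assumptions chosen for illustration.

```python
import math
import random

ALPHABET = "ACGT"


def log_odds(pwm, background=0.25, pseudocount=1e-3):
    """Convert per-position base probabilities into log-odds scores."""
    return [
        {b: math.log((col[b] + pseudocount) / background) for b in ALPHABET}
        for col in pwm
    ]


def best_window_score(seq, lo):
    """Highest log-odds score over all windows of the matrix width."""
    width = len(lo)
    return max(
        sum(lo[k][seq[i + k]] for k in range(width))
        for i in range(len(seq) - width + 1)
    )


def gibbs_optimise(lo, length, sweeps=50, beta=2.0, seed=0):
    """Resample one position at a time from its conditional distribution."""
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET) for _ in range(length)]
    best_seq = "".join(seq)
    best_score = best_window_score(best_seq, lo)
    for _ in range(sweeps):
        for pos in range(length):
            # Score every candidate base here with the other positions fixed.
            scores = []
            for b in ALPHABET:
                seq[pos] = b
                scores.append(best_window_score("".join(seq), lo))
            # Sample a base with probability proportional to exp(beta * score).
            top = max(scores)
            weights = [math.exp(beta * (s - top)) for s in scores]
            idx = rng.choices(range(len(ALPHABET)), weights=weights)[0]
            seq[pos] = ALPHABET[idx]
            if scores[idx] > best_score:
                best_seq, best_score = "".join(seq), scores[idx]
    return best_seq, best_score


# Toy 4-column matrix that favours the motif TATA (illustrative values only).
TOY_PWM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]
lo = log_odds(TOY_PWM)
best_seq, best_score = gibbs_optimise(lo, length=8)
```

Because sampling is stochastic, the sketch tracks the best sequence seen so far instead of returning the final state. Scoring against several TFs at once would need a combined objective, which is left out here.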
tfkdanalysis

Pipeline for analyzing transcription factor knockdown (TFKD) effects using PPTP-seq (promoter responses to TF perturbation sequencing) data, a high-throughput approach described in Han et al. (2023).

aligner

Wrapper around Biopython's PairwiseAligner for computing Needleman–Wunsch global alignment scores between nucleotide sequences.
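For readers unfamiliar with the score being computed, the following plain-Python sketch shows the Needleman–Wunsch recurrence with a linear gap penalty. It is an illustration only and does not use Biopython or the aligner package; the function name and the default match, mismatch, and gap values are assumptions.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score via Needleman-Wunsch dynamic programming."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)  # pair ca with cb
            up = prev[j] + gap        # ca aligned to a gap
            left = curr[j - 1] + gap  # cb aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


identical = nw_score("GATTACA", "GATTACA")
one_deletion = nw_score("ACGT", "AGT")
```

Only two rows of the dynamic-programming table are kept, which is enough when just the score is needed and no alignment has to be traced back.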
permuter

Pipeline for in silico deep mutational scanning: permutes biological sequences and evaluates the resulting variants.
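As a rough illustration of the permutation step, the sketch below enumerates every single-site substitution of a DNA sequence. The function name and the `A1C`-style variant labels are hypothetical and are not taken from permuter; evaluation of the variants is out of scope here.

```python
from typing import Iterator, Tuple

DNA_ALPHABET = "ACGT"


def single_site_variants(
    seq: str, alphabet: str = DNA_ALPHABET
) -> Iterator[Tuple[str, str]]:
    """Yield (label, variant) for every single-position substitution."""
    for pos, ref in enumerate(seq):
        for alt in alphabet:
            if alt == ref:
                continue  # skip the unmutated base
            label = f"{ref}{pos + 1}{alt}"  # 1-based position, e.g. A1C
            yield label, seq[:pos] + alt + seq[pos + 1:]


variants = dict(single_site_variants("ACG"))
```

A sequence of length n over a four-letter alphabet yields 3n variants, so the toy input above produces nine.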
archived

Contains legacy projects and prototypes.

@e-south
