LOOM — Open-Science CRISPR Target Discovery

A 195 KB WebAssembly search engine that scans 700,000+ pathogen genomes in sub-millisecond time, entirely in the browser. No backend, no install, no API key.

Try the live tool →

What This Is

LOOM is a BWT/FM-index exact-retrieval engine built in Rust and compiled to WebAssembly. Its primary application is open-science CRISPR diagnostic target discovery: finding conserved 23-mer sequences across massive pathogen genome collections that could serve as diagnostic detection targets.

The project produced:

A browser-based search tool — select a pathogen, browse pre-computed targets, filter by gene/PAM/novelty, or run live FM-index search against the full genome collection
52 novel CRISPR diagnostic targets — in genes with no prior published CRISPR diagnostics, verified through a 4-strategy literature scan with ontology-backed synonym expansion
A reproducible pipeline — every result in the paper can be regenerated from the scripts and data in this repository

Key Numbers

Metric	Value
WASM binary size	195 KB
Search latency	< 1 ms
Pathogen genomes indexed	12.3M+
Pre-computed targets	140,000+
Novel targets discovered	52
Off-target animal genomes screened	7 (human, mouse, cow, pig, chicken, bat, camel)

Repository Structure

brenda/                 Rust FM-index crate (the core engine)
  src/indexer/          Index construction: BWT, suffix arrays, sharding
  src/search/           Query engine: exact match, federated search

src/                    CLI + WASM bindings
  main.rs              Command-line interface entry point
  wasm.rs              WebAssembly bindings for browser deployment
  dna/                 Genomic pipeline: CRISPR scanning, off-target, conservation
  cli/                 CLI subcommands

web/                    Browser search interface (deployed to Azure Static Web Apps)
  crispr-search.html   Main search page — pathogen targets database + live search
  crispr-animal.html   Animal genome off-target screening
  learn.html           Interactive CRISPR learning guide
  methodology.html     v4 scan methodology & integrity report
  data/                Pre-computed JSON datasets served to the browser

data/crispr_guides/     Research results & validation data
  novel_52_guides.tsv   The 52 novel diagnostic targets (primary result)
  offtargets_*.csv      Off-target screening against 7 animal genomes
  patent_scan_report.json   Prior art screening results

scripts/                Reproducibility pipeline
  overnight_openscience.sh  Full pipeline orchestrator
  scan_crispr_targets.py    Target scanning wrapper
  compute_*.py/sh           Scoring, conservation, cross-reactivity, RNAfold
  pubmed_scan_v3.py         4-strategy literature gap verification (PubMed + Europe PMC)
  patent_scan.py            Lens.org prior art screening
  verify-citations.py       Citation verification for the paper
  ontology/                 Ontology fetchers (NCBI taxonomy, Disease Ontology, MONDO)

publications/           Research papers
  papers/               Source markdown for the paper
  preprints/            Built PDFs

docs/
  runbooks/             Pipeline operation guides
  plans/                Research planning documents
  lessons/              Development lessons learned

Getting Started

Prerequisites

Rust (stable, 2024 edition)
wasm-pack (for WASM builds)
Python 3.10+ with a virtual environment (for pipeline scripts)

Build the CLI

cargo build --release

Build the WASM module

wasm-pack build --target web --out-dir pkg --release

Run the search tool locally

Open web/crispr-search.html in a browser, or serve the web/ directory:

cd web && python -m http.server 8080

Run the pipeline

See docs/runbooks/overnight-openscience-pipeline.md for full pipeline instructions.

# Scan targets for a pathogen
cargo run --release -- crispr-scan --fasta /path/to/genomes.fasta --output targets.json

# Score and filter guides
python scripts/compute_guide_scores.py

# Run off-target screening
bash scripts/compute_cross_reactivity.sh

CLI Commands

Command	Description
`index`	Build an FM-index from text or FASTA files
`search`	Exact-match search against an index
`batch-search`	Search multiple patterns in one pass
`crispr-scan`	Scan genomes for CRISPR-targetable 23-mers
`guide-conservation`	Compute conservation scores across genome sets
`shard-search`	Distributed search across sharded indexes

The Paper

The research paper describing the methodology and results is in publications/papers/:

Full paper: pangenomic-crispr-targets-merged.md
5-page brief: pangenomic-crispr-targets-brief.md

An annotated walkthrough of the results is available at the live website.

License

MIT — see web/LICENSE

Author

Alvaro Videla Godoy — built with a Mac Studio and GitHub Copilot, evenings and weekends.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.cargo		.cargo
.github		.github
brenda		brenda
data/crispr_guides		data/crispr_guides
docs		docs
publications		publications
scripts		scripts
src		src
tests		tests
web		web
.gitignore		.gitignore
AGENTS.md		AGENTS.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LOOM — Open-Science CRISPR Target Discovery

What This Is

Key Numbers

Repository Structure

Getting Started

Prerequisites

Build the CLI

Build the WASM module

Run the search tool locally

Run the pipeline

CLI Commands

The Paper

License

Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

LOOM — Open-Science CRISPR Target Discovery

What This Is

Key Numbers

Repository Structure

Getting Started

Prerequisites

Build the CLI

Build the WASM module

Run the search tool locally

Run the pipeline

CLI Commands

The Paper

License

Author

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages