INDIalign

RNA structural alignment via multi-strategy TM-score optimization.

INDIalign finds the rigid-body superposition (rotation R, translation t) that maximizes the RNA TM-score between two sets of C1' coordinates. It uses a multi-strategy seed-and-refine pipeline that searches more broadly than a single alignment heuristic. Across three benchmarks on real PDB structures and synthetic pairs, INDIalign consistently finds higher-scoring superpositions than USalign (54--84% win rate, p < 0.004), with the largest gains on hard targets. It is typically 1.8--3.5x slower depending on structure size and difficulty.

Quick Start

cd indialign_c && make    # auto-detects CUDA; use `make cpu` for CPU-only

Python (ctypes)

import ctypes, numpy as np

lib = ctypes.CDLL("indialign_c/libindialign.so")

class NativeConfig(ctypes.Structure):
    _fields_ = [("backend_mode", ctypes.c_int),
                ("use_fragment_search", ctypes.c_int),
                ("max_iter", ctypes.c_int), ("dp_iter", ctypes.c_int),
                ("max_mem_gb", ctypes.c_double),
                ("request_cuda", ctypes.c_int),
                ("rescue_score_1", ctypes.c_double),
                ("rescue_score_2", ctypes.c_double),
                ("early_term_score", ctypes.c_double)]

class NativePairInput(ctypes.Structure):
    _fields_ = [("length", ctypes.c_int),
                ("pred_xyz", ctypes.POINTER(ctypes.c_double)),
                ("native_xyz", ctypes.POINTER(ctypes.c_double)),
                ("valid_mask", ctypes.POINTER(ctypes.c_uint8)),
                ("pred_valid_mask", ctypes.POINTER(ctypes.c_uint8)),
                ("native_valid_mask", ctypes.POINTER(ctypes.c_uint8)),
                ("lnorm", ctypes.c_double)]

class NativePairResult(ctypes.Structure):
    _fields_ = [("score", ctypes.c_double),
                ("rotation", ctypes.c_double * 9),
                ("translation", ctypes.c_double * 3)]

# Set up input
N = len(pred_coords)  # number of residues
pred = np.ascontiguousarray(pred_coords.flatten(), dtype=np.float64)
native = np.ascontiguousarray(native_coords.flatten(), dtype=np.float64)
valid = np.ones(N, dtype=np.uint8)

inp = NativePairInput()
inp.length = N
inp.pred_xyz = pred.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
inp.native_xyz = native.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
inp.valid_mask = valid.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8))
inp.pred_valid_mask = valid.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8))
inp.native_valid_mask = valid.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8))
inp.lnorm = float(N)

cfg = NativeConfig()
lib.indialign_default_config(ctypes.byref(cfg))

res = NativePairResult()
lib.indialign_tmscore_search(ctypes.byref(inp), ctypes.byref(cfg), ctypes.byref(res))

print(f"TM-score: {res.score:.4f}")

Benchmark: INDIalign vs USalign

Methodology

Both tools receive the same input pair and each produces its own superposition (R, t). Both transforms are then scored by an identical common scorer -- same d0 formula, same TM-score formula sum(1 / (1 + d^2/d0^2)) / Lnorm, no d8 filter. This eliminates self-reporting bias.

Real PDB Structures

128 pairs from 26 NMR RNA ensembles downloaded from RCSB PDB (27--77 nucleotides). Each pair aligns model 1 against other models from the same ensemble, providing natural conformational variation.

Metric	Value
INDIalign wins	108 / 128 (84.4%)
USalign wins	13 / 128 (10.2%)
Ties	7 / 128 (5.5%)
Mean TM-score delta	+0.005995
Median TM-score delta	+0.003391
Sign test p-value	4.2 x 10^-20
Wilcoxon p-value	1.7 x 10^-19

Difficulty	N	INDIalign wins	USalign wins	Ties	Mean delta
Hard (TM < 0.3)	6	6	0	0	+0.007664
Medium (0.3--0.5)	80	66	9	5	+0.007062
Moderate (0.5--0.7)	31	28	2	1	+0.004886
Easy (TM > 0.7)	11	8	2	1	+0.000447

Speed on PDB structures: INDIalign 4.1 ms/pair, USalign 2.3 ms/pair (1.8x).

Challenging PDB Pairs

72 harder pairs: most divergent NMR model pairs (50--155 nt), X-ray chain-vs-chain (46--412 nt), and cross-structure pairs from the same RNA family (TPP, SAM, purine riboswitches; HDV ribozyme).

Metric	Value
INDIalign wins	39 / 72 (54.2%)
USalign wins	18 / 72 (25.0%)
Ties	15 / 72 (20.8%)
Mean TM-score delta	+0.000976
Sign test p-value	3.8 x 10^-3
Wilcoxon p-value	1.5 x 10^-3

Difficulty	N	INDIalign wins	USalign wins	Mean delta
Hard (TM < 0.3)	10	7	0	+0.005522
Medium (0.3--0.5)	16	6	5	+0.000098
Moderate (0.5--0.7)	17	9	7	+0.000431
Easy (TM > 0.7)	29	17	6	+0.000213

Speed on challenging pairs: INDIalign 20.7 ms/pair, USalign 5.8 ms/pair (3.5x).

Synthetic Pairs

500 synthetic RNA-like coordinate pairs, lengths 40--300, Gaussian noise sigma ~ U(0.5, 14.0) angstroms, fixed seed (42).

Metric	Value
INDIalign wins	330 / 500 (66.0%)
USalign wins	137 / 500 (27.4%)
Ties	33 / 500 (6.6%)
Mean TM-score delta	+0.000477
Sign test p-value	9.7 x 10^-20

Difficulty	N	INDIalign wins	USalign wins	Ties	Mean delta
Very hard (TM < 0.17)	168	106	41	21	+0.000725
Hard (0.17--0.3)	132	101	28	3	+0.000657
Medium (0.3--0.5)	98	72	26	0	+0.000306
Moderate (0.5--0.7)	50	18	31	1	-0.000013
Easy (TM > 0.7)	52	33	11	8	+0.000012

Speed on synthetic pairs: INDIalign 83 ms/pair, USalign 22 ms/pair (3.8x).

Speed

INDIalign trades speed for search breadth. The speed ratio depends on structure size and difficulty: 1.8x on short NMR structures (27--77 nt), 3.5x on challenging PDB pairs, 3.8x on longer synthetic pairs (40--300 nt). Very long hard targets can trigger expensive rescue passes; the dense local-fragment rescue now uses a coarser stride on long inputs to reduce worst-case CPU time without changing the overall search pipeline. If throughput is the primary constraint, USalign is the better choice.

Reproducing

# Requires: USalign binary on PATH (or set USALIGN=/path/to/USalign),
#           numpy, scipy
cd benchmark
python pdb_benchmark.py            # NMR model-vs-model (128 pairs)
python pdb_hard_benchmark.py       # challenging PDB pairs (72 pairs)
python fair_benchmark.py 500       # synthetic pairs (500 pairs)

Run the benchmark scripts sequentially if you want meaningful wall-clock timing comparisons.

Algorithm

INDIalign runs a multi-stage pipeline. Each stage produces candidate (R, t) superpositions; the best is kept.

Stage	Source	Description
1. Seed generation	`seeds.h`	Fragment seeds at multiple lengths + transform-independent anchor contact seeds (spatial-grid accelerated)
2. Seed bank evaluation	`search.h`	K seeds x C d0-candidates, each refined by iterative Kabsch with convergence detection and early termination
3. SS-based alignment	`ss.h`	Secondary structure from C1' geometry, NW alignment (pure SS + combined SS/structure)
4. Gapless threading	`threading.h`	Offset alignments pred[i] -> native[i-k], Kabsch + iterative refinement
5. Rescue strategies	`rescue.h`	Local fragment DP + threading DP, activated at score < 0.5 and < 0.3
6. Distance seed	`seeds.h`	Top-50% closest residues under current best (R,t), re-evaluated as seed
7. DP refinement	`align.h`	Needleman-Wunsch cross-index DP, Kabsch on aligned pairs, rescored same-index
8. Weighted refinement	`core.h`	4 iterations: weight residues by TM contribution, weighted Kabsch

Core math: Horn quaternion method via Jacobi 4x4 eigensolve. RNA d0 formula: piecewise for L < 30, 0.6 * sqrt(L - 0.5) - 2.5 for L >= 30 (identical to USalign).

Key Design Decisions

Anchor contact seeds (seeds.h): Identify residue pairs with consistent inter-residue distances in both structures, without needing an initial (R, t). Effective starting points for hard cases. Uses a spatial grid to reduce neighbor lookups from O(V^2) to O(V * k_avg).
Cross-index DP with same-index rescoring: DP alignment finds pred[i] -> native[j] mappings. The resulting (R, t) is always rescored with same-index tm_score_no_d8 on the full coordinate arrays to maintain scoring integrity.
Progressive rescue: Rescue strategies activate only when needed (configurable thresholds, default 0.5 and 0.3), avoiding wasted compute on easy targets. This is the primary source of both the accuracy advantage and the speed cost. On very long targets, the dense local-fragment schedule uses a coarser stride to cap worst-case runtime.
Early termination: The seed bank evaluation skips remaining seeds once the score exceeds a threshold (default 0.99), significantly speeding up easy cases where a near-perfect alignment is found early.
Fused GPU kernel (gpu_kernels.cu): A single CUDA kernel performs iterative Kabsch refinement + TM-score per seed, using warp-shuffle reductions and shared-memory eigensolve.

Code Structure

indialign_c/             C++20 core (~1,500 lines C++, ~330 lines CUDA)
  indialign.cpp          Main search pipeline + C API
  api.h                  Public structs: NativePairInput, NativePairResult, NativeConfig
  core.h                 Kabsch (Horn quaternion), TM-score, weighted refinement, d0
  seeds.h                Fragment seeds, anchor contact seeds, top-k distance seeds
  search.h               Seed bank evaluation (OpenMP parallel)
  align.h                Cross-index DP refinement, detailed search on aligned pairs
  nw.h                   Needleman-Wunsch DP with affine gap penalty
  ss.h                   RNA secondary structure from C1' geometry
  threading.h            Gapless threading search
  rescue.h               Local fragment DP, threading DP rescue strategies
  gpu_kernels.cu         Fused CUDA kernel: seed refinement + TM-score
  gpu_device.cuh         Device-side Jacobi eigensolve, warp reductions
  Makefile               Builds libindialign.so (CPU or CPU+CUDA)

scoring/                 Python GPU layer (PyTorch + Triton, for batch evaluation)
benchmark/               Reproducible benchmark vs USalign

Building

# CPU only (requires g++ with C++20 and OpenMP)
cd indialign_c && make cpu

# With CUDA (auto-detects GPU architecture)
cd indialign_c && make

# Specific CUDA architecture
cd indialign_c && make gpu CUDA_ARCH=sm_90

Requirements: C++20 compiler (g++ 10+ or clang 13+), OpenMP. Optional: CUDA toolkit 12+ for GPU acceleration. The GPU path uses float32 internally and falls back to CPU on any CUDA error.

macOS: Apple Clang does not ship with OpenMP. Install it via brew install libomp and add -I$(brew --prefix libomp)/include -L$(brew --prefix libomp)/lib to CXXFLAGS/LDFLAGS, or use GCC (brew install gcc).

Profiling

Set INDIALIGN_PROFILE=1 to print per-stage timings to stderr for a single search call. Set INDIALIGN_PROFILE_TAG to prepend an identifier such as a PDB code or pair label.

INDIALIGN_PROFILE=1 INDIALIGN_PROFILE_TAG='[1H1K]' python benchmark/pdb_hard_benchmark.py

C API

#include "api.h"

// Initialize default configuration
void indialign_default_config(NativeConfig *cfg);

// Align a single pair, returns 0 on success
int indialign_tmscore_search(const NativePairInput *inp,
                             const NativeConfig *cfg,
                             NativePairResult *out);

// Align multiple pairs in parallel (OpenMP)
int indialign_tmscore_search_batch(int n_pairs,
                                    const NativePairInput *inputs,
                                    const NativeConfig *cfg,
                                    NativePairResult *outputs);

// Check CUDA availability (1 = available, 0 = not)
int indialign_cuda_available(void);

Input: pred_xyz and native_xyz are flat arrays of length 3*N (x,y,z interleaved). valid_mask marks residues present in both structures. lnorm is the normalization length (typically N).

Output: rotation is a 3x3 matrix stored column-major (9 doubles). translation is a 3-vector. The transform applies as: aligned = pred @ R^T + t.

Limitations

Speed. 1.8--3.8x slower than USalign depending on structure size. The multi-strategy pipeline is inherently more expensive.
Benchmark scope. PDB benchmarks cover NMR ensembles (27--155 nt), X-ray chain pairs (up to 412 nt), and cross-structure riboswitch/ribozyme families. Performance on protein-RNA complexes or prediction model outputs has not been tested.
Same-length assumption. The C API requires pred and native to have the same length N. Structures with different lengths require pre-alignment or padding with invalid masks.

Citation

If you use INDIalign in your research, please cite:

@software{chalapati2026indialign,
  author  = {Chalapati, Sachin},
  title   = {{INDIalign}: {RNA} structural alignment via multi-strategy
             {TM}-score optimization},
  year    = {2026},
  url     = {https://github.com/teenu/INDIalign},
  version = {1.1.0}
}

License

MIT License. See LICENSE.

Acknowledgments

USalign (Zhang & Skolnick lab) is the baseline for comparison. INDIalign uses the same RNA d0 normalization formula as USalign.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
benchmark		benchmark
indialign_c		indialign_c
scoring		scoring
tests		tests
.gitignore		.gitignore
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

INDIalign

Quick Start

Python (ctypes)

Benchmark: INDIalign vs USalign

Methodology

Real PDB Structures

Challenging PDB Pairs

Synthetic Pairs

Speed

Reproducing

Algorithm

Key Design Decisions

Code Structure

Building

Profiling

C API

Limitations

Citation

License

Acknowledgments

About

Uh oh!

Releases 3

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

INDIalign

Quick Start

Python (ctypes)

Benchmark: INDIalign vs USalign

Methodology

Real PDB Structures

Challenging PDB Pairs

Synthetic Pairs

Speed

Reproducing

Algorithm

Key Design Decisions

Code Structure

Building

Profiling

C API

Limitations

Citation

License

Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages