RNA structural alignment via multi-strategy TM-score optimization.
INDIalign finds the rigid-body superposition (rotation R, translation t) that maximizes the RNA TM-score between two sets of C1' coordinates. It uses a multi-strategy seed-and-refine pipeline that searches more broadly than a single alignment heuristic. Across three benchmarks on real PDB structures and synthetic pairs, INDIalign consistently finds higher-scoring superpositions than USalign (54--84% win rate, p < 0.004), with the largest gains on hard targets. It is typically 1.8--3.5x slower depending on structure size and difficulty.
cd indialign_c && make # auto-detects CUDA; use `make cpu` for CPU-onlyimport ctypes, numpy as np
lib = ctypes.CDLL("indialign_c/libindialign.so")
class NativeConfig(ctypes.Structure):
_fields_ = [("backend_mode", ctypes.c_int),
("use_fragment_search", ctypes.c_int),
("max_iter", ctypes.c_int), ("dp_iter", ctypes.c_int),
("max_mem_gb", ctypes.c_double),
("request_cuda", ctypes.c_int),
("rescue_score_1", ctypes.c_double),
("rescue_score_2", ctypes.c_double),
("early_term_score", ctypes.c_double)]
class NativePairInput(ctypes.Structure):
_fields_ = [("length", ctypes.c_int),
("pred_xyz", ctypes.POINTER(ctypes.c_double)),
("native_xyz", ctypes.POINTER(ctypes.c_double)),
("valid_mask", ctypes.POINTER(ctypes.c_uint8)),
("pred_valid_mask", ctypes.POINTER(ctypes.c_uint8)),
("native_valid_mask", ctypes.POINTER(ctypes.c_uint8)),
("lnorm", ctypes.c_double)]
class NativePairResult(ctypes.Structure):
_fields_ = [("score", ctypes.c_double),
("rotation", ctypes.c_double * 9),
("translation", ctypes.c_double * 3)]
# Set up input
N = len(pred_coords) # number of residues
pred = np.ascontiguousarray(pred_coords.flatten(), dtype=np.float64)
native = np.ascontiguousarray(native_coords.flatten(), dtype=np.float64)
valid = np.ones(N, dtype=np.uint8)
inp = NativePairInput()
inp.length = N
inp.pred_xyz = pred.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
inp.native_xyz = native.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
inp.valid_mask = valid.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8))
inp.pred_valid_mask = valid.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8))
inp.native_valid_mask = valid.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8))
inp.lnorm = float(N)
cfg = NativeConfig()
lib.indialign_default_config(ctypes.byref(cfg))
res = NativePairResult()
lib.indialign_tmscore_search(ctypes.byref(inp), ctypes.byref(cfg), ctypes.byref(res))
print(f"TM-score: {res.score:.4f}")Both tools receive the same input pair and each produces its own superposition (R, t). Both transforms are then scored by an identical common scorer -- same d0 formula, same TM-score formula sum(1 / (1 + d^2/d0^2)) / Lnorm, no d8 filter. This eliminates self-reporting bias.
128 pairs from 26 NMR RNA ensembles downloaded from RCSB PDB (27--77 nucleotides). Each pair aligns model 1 against other models from the same ensemble, providing natural conformational variation.
| Metric | Value |
|---|---|
| INDIalign wins | 108 / 128 (84.4%) |
| USalign wins | 13 / 128 (10.2%) |
| Ties | 7 / 128 (5.5%) |
| Mean TM-score delta | +0.005995 |
| Median TM-score delta | +0.003391 |
| Sign test p-value | 4.2 x 10^-20 |
| Wilcoxon p-value | 1.7 x 10^-19 |
| Difficulty | N | INDIalign wins | USalign wins | Ties | Mean delta |
|---|---|---|---|---|---|
| Hard (TM < 0.3) | 6 | 6 | 0 | 0 | +0.007664 |
| Medium (0.3--0.5) | 80 | 66 | 9 | 5 | +0.007062 |
| Moderate (0.5--0.7) | 31 | 28 | 2 | 1 | +0.004886 |
| Easy (TM > 0.7) | 11 | 8 | 2 | 1 | +0.000447 |
Speed on PDB structures: INDIalign 4.1 ms/pair, USalign 2.3 ms/pair (1.8x).
72 harder pairs: most divergent NMR model pairs (50--155 nt), X-ray chain-vs-chain (46--412 nt), and cross-structure pairs from the same RNA family (TPP, SAM, purine riboswitches; HDV ribozyme).
| Metric | Value |
|---|---|
| INDIalign wins | 39 / 72 (54.2%) |
| USalign wins | 18 / 72 (25.0%) |
| Ties | 15 / 72 (20.8%) |
| Mean TM-score delta | +0.000976 |
| Sign test p-value | 3.8 x 10^-3 |
| Wilcoxon p-value | 1.5 x 10^-3 |
| Difficulty | N | INDIalign wins | USalign wins | Mean delta |
|---|---|---|---|---|
| Hard (TM < 0.3) | 10 | 7 | 0 | +0.005522 |
| Medium (0.3--0.5) | 16 | 6 | 5 | +0.000098 |
| Moderate (0.5--0.7) | 17 | 9 | 7 | +0.000431 |
| Easy (TM > 0.7) | 29 | 17 | 6 | +0.000213 |
Speed on challenging pairs: INDIalign 20.7 ms/pair, USalign 5.8 ms/pair (3.5x).
500 synthetic RNA-like coordinate pairs, lengths 40--300, Gaussian noise sigma ~ U(0.5, 14.0) angstroms, fixed seed (42).
| Metric | Value |
|---|---|
| INDIalign wins | 330 / 500 (66.0%) |
| USalign wins | 137 / 500 (27.4%) |
| Ties | 33 / 500 (6.6%) |
| Mean TM-score delta | +0.000477 |
| Sign test p-value | 9.7 x 10^-20 |
| Difficulty | N | INDIalign wins | USalign wins | Ties | Mean delta |
|---|---|---|---|---|---|
| Very hard (TM < 0.17) | 168 | 106 | 41 | 21 | +0.000725 |
| Hard (0.17--0.3) | 132 | 101 | 28 | 3 | +0.000657 |
| Medium (0.3--0.5) | 98 | 72 | 26 | 0 | +0.000306 |
| Moderate (0.5--0.7) | 50 | 18 | 31 | 1 | -0.000013 |
| Easy (TM > 0.7) | 52 | 33 | 11 | 8 | +0.000012 |
Speed on synthetic pairs: INDIalign 83 ms/pair, USalign 22 ms/pair (3.8x).
INDIalign trades speed for search breadth. The speed ratio depends on structure size and difficulty: 1.8x on short NMR structures (27--77 nt), 3.5x on challenging PDB pairs, 3.8x on longer synthetic pairs (40--300 nt). Very long hard targets can trigger expensive rescue passes; the dense local-fragment rescue now uses a coarser stride on long inputs to reduce worst-case CPU time without changing the overall search pipeline. If throughput is the primary constraint, USalign is the better choice.
# Requires: USalign binary on PATH (or set USALIGN=/path/to/USalign),
# numpy, scipy
cd benchmark
python pdb_benchmark.py # NMR model-vs-model (128 pairs)
python pdb_hard_benchmark.py # challenging PDB pairs (72 pairs)
python fair_benchmark.py 500 # synthetic pairs (500 pairs)Run the benchmark scripts sequentially if you want meaningful wall-clock timing comparisons.
INDIalign runs a multi-stage pipeline. Each stage produces candidate (R, t) superpositions; the best is kept.
| Stage | Source | Description |
|---|---|---|
| 1. Seed generation | seeds.h |
Fragment seeds at multiple lengths + transform-independent anchor contact seeds (spatial-grid accelerated) |
| 2. Seed bank evaluation | search.h |
K seeds x C d0-candidates, each refined by iterative Kabsch with convergence detection and early termination |
| 3. SS-based alignment | ss.h |
Secondary structure from C1' geometry, NW alignment (pure SS + combined SS/structure) |
| 4. Gapless threading | threading.h |
Offset alignments pred[i] -> native[i-k], Kabsch + iterative refinement |
| 5. Rescue strategies | rescue.h |
Local fragment DP + threading DP, activated at score < 0.5 and < 0.3 |
| 6. Distance seed | seeds.h |
Top-50% closest residues under current best (R,t), re-evaluated as seed |
| 7. DP refinement | align.h |
Needleman-Wunsch cross-index DP, Kabsch on aligned pairs, rescored same-index |
| 8. Weighted refinement | core.h |
4 iterations: weight residues by TM contribution, weighted Kabsch |
Core math: Horn quaternion method via Jacobi 4x4 eigensolve. RNA d0 formula: piecewise for L < 30, 0.6 * sqrt(L - 0.5) - 2.5 for L >= 30 (identical to USalign).
- Anchor contact seeds (
seeds.h): Identify residue pairs with consistent inter-residue distances in both structures, without needing an initial (R, t). Effective starting points for hard cases. Uses a spatial grid to reduce neighbor lookups from O(V^2) to O(V * k_avg). - Cross-index DP with same-index rescoring: DP alignment finds pred[i] -> native[j] mappings. The resulting (R, t) is always rescored with same-index
tm_score_no_d8on the full coordinate arrays to maintain scoring integrity. - Progressive rescue: Rescue strategies activate only when needed (configurable thresholds, default 0.5 and 0.3), avoiding wasted compute on easy targets. This is the primary source of both the accuracy advantage and the speed cost. On very long targets, the dense local-fragment schedule uses a coarser stride to cap worst-case runtime.
- Early termination: The seed bank evaluation skips remaining seeds once the score exceeds a threshold (default 0.99), significantly speeding up easy cases where a near-perfect alignment is found early.
- Fused GPU kernel (
gpu_kernels.cu): A single CUDA kernel performs iterative Kabsch refinement + TM-score per seed, using warp-shuffle reductions and shared-memory eigensolve.
indialign_c/ C++20 core (~1,500 lines C++, ~330 lines CUDA)
indialign.cpp Main search pipeline + C API
api.h Public structs: NativePairInput, NativePairResult, NativeConfig
core.h Kabsch (Horn quaternion), TM-score, weighted refinement, d0
seeds.h Fragment seeds, anchor contact seeds, top-k distance seeds
search.h Seed bank evaluation (OpenMP parallel)
align.h Cross-index DP refinement, detailed search on aligned pairs
nw.h Needleman-Wunsch DP with affine gap penalty
ss.h RNA secondary structure from C1' geometry
threading.h Gapless threading search
rescue.h Local fragment DP, threading DP rescue strategies
gpu_kernels.cu Fused CUDA kernel: seed refinement + TM-score
gpu_device.cuh Device-side Jacobi eigensolve, warp reductions
Makefile Builds libindialign.so (CPU or CPU+CUDA)
scoring/ Python GPU layer (PyTorch + Triton, for batch evaluation)
benchmark/ Reproducible benchmark vs USalign
# CPU only (requires g++ with C++20 and OpenMP)
cd indialign_c && make cpu
# With CUDA (auto-detects GPU architecture)
cd indialign_c && make
# Specific CUDA architecture
cd indialign_c && make gpu CUDA_ARCH=sm_90Requirements: C++20 compiler (g++ 10+ or clang 13+), OpenMP. Optional: CUDA toolkit 12+ for GPU acceleration. The GPU path uses float32 internally and falls back to CPU on any CUDA error.
macOS: Apple Clang does not ship with OpenMP. Install it via brew install libomp and add -I$(brew --prefix libomp)/include -L$(brew --prefix libomp)/lib to CXXFLAGS/LDFLAGS, or use GCC (brew install gcc).
Set INDIALIGN_PROFILE=1 to print per-stage timings to stderr for a single search call. Set INDIALIGN_PROFILE_TAG to prepend an identifier such as a PDB code or pair label.
INDIALIGN_PROFILE=1 INDIALIGN_PROFILE_TAG='[1H1K]' python benchmark/pdb_hard_benchmark.py#include "api.h"
// Initialize default configuration
void indialign_default_config(NativeConfig *cfg);
// Align a single pair, returns 0 on success
int indialign_tmscore_search(const NativePairInput *inp,
const NativeConfig *cfg,
NativePairResult *out);
// Align multiple pairs in parallel (OpenMP)
int indialign_tmscore_search_batch(int n_pairs,
const NativePairInput *inputs,
const NativeConfig *cfg,
NativePairResult *outputs);
// Check CUDA availability (1 = available, 0 = not)
int indialign_cuda_available(void);Input: pred_xyz and native_xyz are flat arrays of length 3*N (x,y,z interleaved). valid_mask marks residues present in both structures. lnorm is the normalization length (typically N).
Output: rotation is a 3x3 matrix stored column-major (9 doubles). translation is a 3-vector. The transform applies as: aligned = pred @ R^T + t.
- Speed. 1.8--3.8x slower than USalign depending on structure size. The multi-strategy pipeline is inherently more expensive.
- Benchmark scope. PDB benchmarks cover NMR ensembles (27--155 nt), X-ray chain pairs (up to 412 nt), and cross-structure riboswitch/ribozyme families. Performance on protein-RNA complexes or prediction model outputs has not been tested.
- Same-length assumption. The C API requires pred and native to have the same length N. Structures with different lengths require pre-alignment or padding with invalid masks.
If you use INDIalign in your research, please cite:
@software{chalapati2026indialign,
author = {Chalapati, Sachin},
title = {{INDIalign}: {RNA} structural alignment via multi-strategy
{TM}-score optimization},
year = {2026},
url = {https://github.com/teenu/INDIalign},
version = {1.1.0}
}MIT License. See LICENSE.
USalign (Zhang & Skolnick lab) is the baseline for comparison. INDIalign uses the same RNA d0 normalization formula as USalign.