Releases: Entrolution/echidna

v0.5.0

14 Mar 22:47
32f8a75

Added

  • GPU cast safety audit: SAFETY comments on all as u32 casts in GPU paths (mod.rs, cuda_backend.rs, wgpu_backend.rs, stde_gpu.rs). Added debug_assert! guards on user-provided direction/batch counts in stde_gpu.rs.
  • #[must_use] annotations: 19 pure functions now carry #[must_use] (support module helpers, GPU codegen, solver wrappers, Laurent::zero/one).
  • #![warn(missing_docs)]: enabled crate-wide. All public items — 35 OpCode variants, ~190 elemental methods across Dual, DualVec, Taylor, TaylorDyn, Laurent, struct fields, and trait methods — now have doc comments.
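
The cast-audit pattern described above might look roughly like the following. This is a minimal sketch, not code from the crate: `batch_count_u32` and `try_batch_count_u32` are hypothetical names, and the exact wording of the SAFETY comments in echidna is not shown in these notes.

```rust
/// Converts a user-provided batch count to the `u32` a GPU dispatch expects.
/// Hypothetical helper for illustration; not part of echidna's API.
#[must_use]
pub fn batch_count_u32(n: usize) -> u32 {
    // Debug builds catch out-of-range values before the cast can truncate them.
    debug_assert!(n <= u32::MAX as usize, "batch count {n} exceeds u32::MAX");
    // SAFETY: callers keep `n` within u32 range; checked above in debug builds.
    n as u32
}

/// Fallible alternative for code paths where silent truncation in release
/// builds is unacceptable. Also hypothetical.
#[must_use]
pub fn try_batch_count_u32(n: usize) -> Option<u32> {
    u32::try_from(n).ok()
}
```

Because both functions are pure, `#[must_use]` makes the compiler warn when a caller discards the converted value.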

Changed

  • Test decomposition: split tests/stde.rs (1630 lines, 76 tests) into 5 focused files: stde_core, stde_stats, stde_pipeline, stde_higher_order, stde_dense. All 76 tests preserved.
  • Removed ROADMAP.md — all phases (0–5) complete.

v0.4.0

26 Feb 01:52
764925c

Changed

Internal Architecture

  • BytecodeTape decomposition: split 2,689-line monolithic bytecode_tape.rs into a directory module with 10 focused submodules (forward.rs, reverse.rs, tangent.rs, jacobian.rs, sparse.rs, optimize.rs, taylor.rs, parallel.rs, serde_support.rs, thread_local.rs). Zero public API changes; benchmarks confirm no performance impact.
  • Deduplicated reverse sweep in gradient_with_buf() and sparse_jacobian_par() — both now call shared reverse_sweep_core() instead of inlining the loop. gradient_with_buf gains the zero-adjoint skip optimization it was previously missing.
  • Bumped nalgebra dependency from 0.33 to 0.34

Fixed

  • Corrected opcode variant count in documentation (44 variants, not 38/43)
  • Fixed CONTRIBUTING.md MSRV reference (1.93, not 1.80)

v0.3.0

25 Feb 14:25
98020ac

Added

Differential Operator Evaluation (diffop feature)

  • diffop::mixed_partial(tape, x, orders) — compute any mixed partial derivative via jet coefficient extraction
  • diffop::hessian(tape, x) — full Hessian via jet extraction (cross-validated against tape.hessian())
  • MultiIndex — specify which mixed partial to compute (e.g., [2, 0, 1] = ∂³u/∂x₀²∂x₂)
  • JetPlan::plan(n, indices) — precompute slot assignments and extraction prefactors; reuse across evaluation points
  • diffop::eval_dyn(plan, tape, x) — evaluate a plan at a new point using TaylorDyn
  • Pushforward grouping: multi-indices with different active variable sets get separate forward passes to avoid slot contamination
  • Prime window sliding for collision-free slot assignment up to high derivative orders
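
To give a feel for what "jet coefficient extraction" means, here is a small self-contained sketch. It does not use echidna's API or its prime-window slot assignment; it swaps in a simpler polarization identity on degree-2 jets. `Jet2`, `c2`, and `mixed_partial_01` are hypothetical names, and the test function f(x0, x1) = x0² · x1 is hard-coded.

```rust
/// Degree-2 truncated Taylor jet: c0 + c1*t + c2*t^2. Hypothetical type.
#[derive(Clone, Copy, Debug)]
pub struct Jet2(pub [f64; 3]);

impl Jet2 {
    /// Jet of the input path x + v*t.
    pub fn var(x: f64, v: f64) -> Self {
        Jet2([x, v, 0.0])
    }

    /// Truncated product of two jets (Cauchy product up to t^2).
    pub fn mul(self, o: Self) -> Self {
        let (a, b) = (self.0, o.0);
        Jet2([
            a[0] * b[0],
            a[0] * b[1] + a[1] * b[0],
            a[0] * b[2] + a[1] * b[1] + a[2] * b[0],
        ])
    }
}

/// t^2 coefficient of f(x + t*v) for f(x0, x1) = x0^2 * x1.
/// This coefficient equals (1/2) * v^T H v, where H is the Hessian at x.
pub fn c2(x: [f64; 2], v: [f64; 2]) -> f64 {
    let x0 = Jet2::var(x[0], v[0]);
    let x1 = Jet2::var(x[1], v[1]);
    x0.mul(x0).mul(x1).0[2]
}

/// Mixed partial d^2 f / (dx0 dx1) recovered from three forward passes:
/// c2(e0 + e1) - c2(e0) - c2(e1) = H[0][1].
pub fn mixed_partial_01(x: [f64; 2]) -> f64 {
    c2(x, [1.0, 1.0]) - c2(x, [1.0, 0.0]) - c2(x, [0.0, 1.0])
}
```

For this f the mixed partial is 2·x0, so `mixed_partial_01([3.0, 5.0])` evaluates to 6. The need to subtract the pure second-order terms hints at why a plan-based scheme that avoids such contamination is attractive at higher orders.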

v0.2.0

25 Feb 01:02
9e19af9

Added

Bytecode Tape (Graph-Mode AD)

  • BytecodeTape SoA graph-mode AD with opcode dispatch and tape optimization (CSE, DCE, constant folding)
  • BReverse<F> tape-recording reverse-mode variable
  • record() / record_multi() to build tapes from closures
  • Hessian computation via forward-over-reverse (hessian, hvp)
  • DualVec<F, N> batched forward-mode with N tangent lanes for vectorized Hessians (hessian_vec)
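
As a rough illustration of graph-mode AD with a structure-of-arrays tape and opcode dispatch, the sketch below records a few operations and runs one reverse sweep. It is not echidna's `BytecodeTape`: `MiniTape`, `MiniOp`, and every method name are hypothetical, and only three operations are supported.

```rust
/// Opcodes for the toy tape; operands are indices of earlier tape slots.
#[derive(Clone, Copy, Debug)]
pub enum MiniOp {
    Input,
    Add(usize, usize),
    Mul(usize, usize),
    Sin(usize),
}

/// Structure-of-arrays tape: opcodes and primal values in parallel vectors.
#[derive(Default)]
pub struct MiniTape {
    ops: Vec<MiniOp>,
    vals: Vec<f64>,
}

impl MiniTape {
    fn push(&mut self, op: MiniOp, v: f64) -> usize {
        self.ops.push(op);
        self.vals.push(v);
        self.ops.len() - 1
    }
    pub fn input(&mut self, v: f64) -> usize {
        self.push(MiniOp::Input, v)
    }
    pub fn add(&mut self, a: usize, b: usize) -> usize {
        let v = self.vals[a] + self.vals[b];
        self.push(MiniOp::Add(a, b), v)
    }
    pub fn mul(&mut self, a: usize, b: usize) -> usize {
        let v = self.vals[a] * self.vals[b];
        self.push(MiniOp::Mul(a, b), v)
    }
    pub fn sin(&mut self, a: usize) -> usize {
        let v = self.vals[a].sin();
        self.push(MiniOp::Sin(a), v)
    }

    /// Reverse sweep: returns the adjoint of every slot with respect to `output`.
    pub fn adjoints(&self, output: usize) -> Vec<f64> {
        let mut adj = vec![0.0; self.ops.len()];
        adj[output] = 1.0;
        for i in (0..=output).rev() {
            let a = adj[i];
            if a == 0.0 {
                continue; // skip slots that received no adjoint
            }
            match self.ops[i] {
                MiniOp::Input => {}
                MiniOp::Add(l, r) => {
                    adj[l] += a;
                    adj[r] += a;
                }
                MiniOp::Mul(l, r) => {
                    adj[l] += a * self.vals[r];
                    adj[r] += a * self.vals[l];
                }
                MiniOp::Sin(x) => adj[x] += a * self.vals[x].cos(),
            }
        }
        adj
    }
}
```

Recording f(x, y) = x·y + sin(x) and reading the adjoints at the two input slots gives the gradient (y + cos x, x).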

Sparse Derivatives

  • Sparsity pattern detection via bitset propagation
  • Graph coloring: greedy distance-2 for Jacobians, star bicoloring for Hessians
  • sparse_jacobian, sparse_hessian, sparse_hessian_vec
  • CSR storage (CsrPattern, JacobianSparsityPattern, SparsityPattern)
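
Sparsity detection by bitset propagation can be sketched in a few lines: each intermediate value carries a bitset of the inputs it depends on, and operations union the sets of their operands. The `Deps` type below is hypothetical, limited to 64 inputs, and unrelated to echidna's actual pattern types.

```rust
/// Dependency set of a value: bit i is set when the value depends on input i.
/// Hypothetical type, capped at 64 inputs for simplicity.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Deps(pub u64);

impl Deps {
    /// Dependency set of input number `i` (must be below 64).
    pub fn input(i: u32) -> Self {
        Deps(1u64 << i)
    }
    /// Constants depend on no input.
    pub fn constant() -> Self {
        Deps(0)
    }
    /// A unary op (sin, exp, ...) keeps the operand's dependencies.
    pub fn unary(self) -> Self {
        self
    }
    /// A binary op (add, mul, ...) depends on the union of both operands.
    pub fn binary(self, other: Self) -> Self {
        Deps(self.0 | other.0)
    }
    /// Column indices of the structural nonzeros in this Jacobian row.
    pub fn columns(self) -> Vec<u32> {
        (0..64u32).filter(|&i| (self.0 >> i) & 1 == 1).collect()
    }
}
```

For F(x0, x1, x2) = (x0·x1, sin x2, x0 + x2), the three output rows come out as columns {0, 1}, {2}, and {0, 2}. A coloring step would then group columns that never share a row.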

Taylor Mode AD

  • Taylor<F, K> const-generic Taylor coefficients with Cauchy product propagation
  • TaylorDyn<F> arena-based dynamic Taylor (runtime degree)
  • taylor_grad / taylor_grad_with_buf — reverse-over-Taylor for gradient + HVP + higher-order adjoints
  • ode_taylor_step / ode_taylor_step_with_buf — ODE Taylor series integration via coefficient bootstrapping
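
Cauchy product propagation is the rule for multiplying two truncated Taylor series. The sketch below shows it for a const-generic coefficient array; `Series` is a hypothetical stand-in and not echidna's `Taylor<F, K>`.

```rust
use std::ops::Mul;

/// Truncated Taylor series c[0] + c[1]*t + ... + c[K-1]*t^(K-1).
/// Hypothetical type for illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Series<const K: usize>(pub [f64; K]);

impl<const K: usize> Mul for Series<K> {
    type Output = Self;

    /// Cauchy product: out[k] = sum over j in 0..=k of a[j] * b[k - j].
    /// Terms of order K and above are dropped.
    fn mul(self, rhs: Self) -> Self {
        let mut out = [0.0; K];
        for k in 0..K {
            for j in 0..=k {
                out[k] += self.0[j] * rhs.0[k - j];
            }
        }
        Series(out)
    }
}
```

Squaring 1 + t with K = 3 yields the coefficients [1, 2, 1]; squaring 1 + t + t² yields [1, 2, 3], since the t³ and t⁴ terms fall outside the truncation.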

Stochastic Taylor Derivative Estimators (STDE)

  • laplacian — Hutchinson trace estimator for Laplacian approximation
  • hessian_diagonal — exact Hessian diagonal via coordinate basis
  • directional_derivatives — batched second-order directional derivatives
  • laplacian_with_stats — Welford's online variance tracking
  • laplacian_with_control — diagonal control variate variance reduction
  • Estimator trait generalizing per-direction sample computation (Laplacian, GradientSquaredNorm)
  • estimate / estimate_weighted generic pipeline
  • Hutchinson divergence estimator for vector fields via Dual<F> forward mode
  • Hutch++ (Meyer et al. 2021) trace estimator via sketch + residual decomposition, with O(1/S²) variance in the number of samples S
  • Importance-weighted estimation (West's 1979 algorithm)
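
The combination of a Hutchinson trace estimate with Welford's online variance tracking can be sketched as follows. This is an illustration under simplifying assumptions: the Hessian is passed as an explicit dense matrix (an STDE implementation would obtain vᵀHv from Taylor-mode passes without forming H), the Rademacher directions come from a small hand-rolled LCG, and `Welford` and `hutchinson_trace` are hypothetical names.

```rust
/// Welford's online accumulator for mean and variance.
#[derive(Default)]
pub struct Welford {
    n: u64,
    mean: f64,
    m2: f64,
}

impl Welford {
    pub fn push(&mut self, x: f64) {
        self.n += 1;
        let delta = x - self.mean;
        self.mean += delta / self.n as f64;
        self.m2 += delta * (x - self.mean);
    }
    pub fn mean(&self) -> f64 {
        self.mean
    }
    /// Unbiased sample variance; 0 when fewer than two samples were pushed.
    pub fn sample_variance(&self) -> f64 {
        if self.n < 2 { 0.0 } else { self.m2 / (self.n - 1) as f64 }
    }
}

/// Hutchinson estimate of tr(H) from `samples` Rademacher directions.
/// `h` must be a square matrix stored row by row.
pub fn hutchinson_trace(h: &[Vec<f64>], samples: usize, seed: u64) -> Welford {
    let n = h.len();
    let mut state = seed;
    let mut acc = Welford::default();
    let mut v = vec![0.0; n];
    for _ in 0..samples {
        for vi in v.iter_mut() {
            // One LCG step; the top bit selects the sign of this component.
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            *vi = if state >> 63 == 0 { 1.0 } else { -1.0 };
        }
        // Quadratic form v^T H v, whose expectation over v is tr(H).
        let mut q = 0.0;
        for i in 0..n {
            for j in 0..n {
                q += v[i] * h[i][j] * v[j];
            }
        }
        acc.push(q);
    }
    acc
}
```

For a diagonal H every sample equals the trace exactly, so the reported variance is zero; off-diagonal entries are what introduce estimator variance.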

Cross-Country Elimination

  • jacobian_cross_country — Markowitz vertex elimination on linearized computational graph

Custom Operations

  • eval_dual / partials_dual default methods on CustomOp<F> for correct second-order derivatives (HVP, Hessian) through custom ops

Nonsmooth AD

  • forward_nonsmooth — branch tracking and kink detection for abs/min/max/signum/floor/ceil/round/trunc
  • clarke_jacobian — Clarke generalized Jacobian via limiting Jacobian enumeration
  • has_nontrivial_subdifferential() — two-tier classification: all 8 nonsmooth ops tracked for proximity detection; only abs/min/max enumerated in Clarke Jacobian
  • KinkEntry, NonsmoothInfo, ClarkeError types

Laurent Series

  • Laurent<F, K> — singularity analysis with pole tracking, flows through BytecodeTape::forward_tangent

Checkpointing

  • grad_checkpointed — binomial Revolve checkpointing
  • grad_checkpointed_online — periodic thinning for unknown step count
  • grad_checkpointed_disk — disk-backed for large state vectors
  • grad_checkpointed_with_hints — user-controlled checkpoint placement
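
The memory-for-recomputation trade behind checkpointing can be shown on a scalar recurrence. The sketch below uses fixed-interval checkpoints, which is simpler than the binomial Revolve schedule named above; the step function, `grad_store_all`, and `grad_checkpointed_sketch` are all hypothetical and do not mirror echidna's signatures.

```rust
/// One step of a toy recurrence x_{t+1} = x_t - 0.1 * x_t^2.
fn step(x: f64) -> f64 {
    x - 0.1 * x * x
}

/// Derivative of `step` with respect to its input.
fn dstep(x: f64) -> f64 {
    1.0 - 0.2 * x
}

/// Reference: store every state, then multiply step derivatives in reverse.
pub fn grad_store_all(x0: f64, steps: usize) -> f64 {
    let mut states = Vec::with_capacity(steps);
    let mut x = x0;
    for _ in 0..steps {
        states.push(x);
        x = step(x);
    }
    states.iter().rev().fold(1.0, |adj, &s| adj * dstep(s))
}

/// Keep only every `every`-th state; recompute each segment during the
/// reverse pass. Peak storage is about steps / every + every states.
pub fn grad_checkpointed_sketch(x0: f64, steps: usize, every: usize) -> f64 {
    assert!(every > 0, "checkpoint interval must be positive");
    let mut checkpoints = Vec::new();
    let mut x = x0;
    for t in 0..steps {
        if t % every == 0 {
            checkpoints.push(x);
        }
        x = step(x);
    }
    let mut adj = 1.0;
    for (k, &start) in checkpoints.iter().enumerate().rev() {
        let begin = k * every;
        let end = (begin + every).min(steps);
        // Recompute the states of this segment from its checkpoint.
        let mut segment = Vec::with_capacity(end - begin);
        let mut s = start;
        for _ in begin..end {
            segment.push(s);
            s = step(s);
        }
        for &s in segment.iter().rev() {
            adj *= dstep(s);
        }
    }
    adj
}
```

Both functions compute d x_T / d x_0 and should agree up to rounding; the checkpointed variant pays roughly one extra forward pass in exchange for the smaller memory footprint.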

GPU Acceleration

  • wgpu backend: batched forward, gradient, sparse Jacobian, HVP, sparse Hessian (f32, Metal/Vulkan/DX12)
  • CUDA backend: same operations with f32 + f64 support (NVRTC runtime compilation)
  • GpuBackend trait unifying wgpu and CUDA backends behind a common interface

Composable Mode Nesting

  • Type-level AD composition: Dual<BReverse<f64>>, Taylor<BReverse<f64>, K>, DualVec<BReverse<f64>, N>
  • composed_hvp convenience function for forward-over-reverse HVP
  • BReverse<Dual<f64>> reverse-wrapping-forward composition via BtapeThreadLocal impls for Dual<f32> and Dual<f64>
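
Type-level nesting works because a dual number only needs its scalar type to support arithmetic, so the scalar can itself be a dual number. The sketch below nests forward mode over forward mode to get a second derivative; it is a simplification of the forward-over-reverse compositions listed above, and `MiniDual`, `cubic`, and `second_derivative` are hypothetical names rather than echidna's `Dual`.

```rust
use std::ops::{Add, Mul};

/// Minimal dual number re + eps * e, with e^2 = 0. Hypothetical type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MiniDual<T> {
    pub re: T,
    pub eps: T,
}

impl<T: Add<Output = T>> Add for MiniDual<T> {
    type Output = Self;
    fn add(self, o: Self) -> Self {
        MiniDual { re: self.re + o.re, eps: self.eps + o.eps }
    }
}

impl<T: Copy + Add<Output = T> + Mul<Output = T>> Mul for MiniDual<T> {
    type Output = Self;
    fn mul(self, o: Self) -> Self {
        // Product rule on the tangent component.
        MiniDual { re: self.re * o.re, eps: self.re * o.eps + self.eps * o.re }
    }
}

/// f(x) = x^3 + x, written once and usable with f64 or any nesting of duals.
pub fn cubic<T: Copy + Add<Output = T> + Mul<Output = T>>(x: T) -> T {
    x * x * x + x
}

/// f''(x) via MiniDual<MiniDual<f64>>: seed both levels with tangent 1.
pub fn second_derivative(x: f64) -> f64 {
    let seeded = MiniDual {
        re: MiniDual { re: x, eps: 1.0 },
        eps: MiniDual { re: 1.0, eps: 0.0 },
    };
    cubic(seeded).eps.eps
}
```

At x = 2 the nested evaluation carries f = 10 in `re.re`, f′ = 13 in both first-order slots, and f″ = 12 in `eps.eps`.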

Serialization

  • serde support for BytecodeTape, Laurent<F, K>, KinkEntry, NonsmoothInfo, ClarkeError
  • JSON and bincode roundtrip support

Linear Algebra Integrations

  • faer_support: HVP, sparse Hessian, dense/sparse solvers (LU, Cholesky)
  • nalgebra_support: gradient, Hessian, Jacobian with nalgebra types
  • ndarray_support: HVP, sparse Hessian, sparse Jacobian with ndarray types

Optimization Solvers (echidna-optim)

  • L-BFGS solver with two-loop recursion
  • Newton solver with Cholesky factorization
  • Trust-region solver with Steihaug-Toint CG
  • Armijo line search
  • Implicit differentiation: implicit_tangent, implicit_adjoint, implicit_jacobian, implicit_hvp, implicit_hessian
  • Piggyback differentiation: tangent, adjoint, and interleaved forward-adjoint modes
  • Sparse implicit differentiation via faer sparse LU (sparse-implicit feature)
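
For readers unfamiliar with the Armijo condition used by the line search, here is a minimal backtracking sketch. The function name, parameter list, and defaults are assumptions made for illustration and do not reflect the echidna-optim interface.

```rust
/// Backtracking line search with the Armijo sufficient-decrease condition:
/// accept step `alpha` when f(x + alpha*d) <= f(x) + c1 * alpha * grad.d.
/// Returns `None` if `dir` is not a descent direction or no step is accepted
/// within `max_iters` halvings. Hypothetical signature.
pub fn armijo_step(
    f: impl Fn(&[f64]) -> f64,
    x: &[f64],
    grad: &[f64],
    dir: &[f64],
    c1: f64,
    shrink: f64,
    max_iters: usize,
) -> Option<f64> {
    let fx = f(x);
    // Directional derivative along `dir`; must be negative for descent.
    let slope: f64 = grad.iter().zip(dir).map(|(g, d)| g * d).sum();
    if slope >= 0.0 {
        return None;
    }
    let mut alpha = 1.0;
    let mut trial = vec![0.0; x.len()];
    for _ in 0..max_iters {
        for i in 0..x.len() {
            trial[i] = x[i] + alpha * dir[i];
        }
        if f(&trial) <= fx + c1 * alpha * slope {
            return Some(alpha);
        }
        alpha *= shrink;
    }
    None
}
```

On f(x) = x² from x = 1 with the steepest-descent direction −2, the full step overshoots to −1 and is rejected, while the halved step lands on the minimizer and is accepted.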

Benchmarking

  • Criterion benchmarks for Taylor mode, STDE, cross-country, sparse derivatives, nonsmooth
  • Comparison benchmarks against num-dual and ad-trait (forward + reverse gradient)
  • Correctness cross-check tests verifying ad-trait gradient agreement with echidna
  • CI regression detection via criterion-compare-action

Changed

  • Tape optimization: algebraic simplification at recording time (identity, absorbing, powi patterns)
  • Tape optimization: targeted multi-output DCE (dead_code_elimination_for_outputs)
  • Thread-local Adept tape pooling — grad()/vjp() reuse cleared tapes via thread-local pool instead of per-call allocation
  • Signed::signum() for BReverse<F> now records OpCode::Signum to tape (was returning a constant)
  • MSRV raised from 1.80 to 1.93
  • WelfordAccumulator struct extracted, deduplicating Welford's algorithm across 4 STDE functions
  • cuda_err helper extracted, replacing 72 inline .map_err closures in CUDA backend
  • create_tape_bind_group method extracted, replacing 4 duplicated bind group blocks in wgpu backend