Releases: Entrolution/echidna
v0.5.0
Added
- GPU cast safety audit: SAFETY comments on all `as u32` casts in GPU paths (`mod.rs`, `cuda_backend.rs`, `wgpu_backend.rs`, `stde_gpu.rs`). Added `debug_assert!` guards on user-provided direction/batch counts in `stde_gpu.rs`.
- `#[must_use]` annotations: 19 pure functions now carry `#[must_use]` (support module helpers, GPU codegen, solver wrappers, `Laurent::zero/one`).
- `#![warn(missing_docs)]`: enabled crate-wide. All public items (35 `OpCode` variants, ~190 elemental methods across `Dual`, `DualVec`, `Taylor`, `TaylorDyn`, `Laurent`, struct fields, and trait methods) now have doc comments.
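The guarded-cast pattern described above can be sketched roughly as follows. This is an illustration only, not the crate's code: `dispatch_count_u32` and `try_dispatch_count_u32` are hypothetical helper names, and the exact invariants documented in echidna's SAFETY comments are not shown in these notes.

```rust
/// Hypothetical helper (not part of echidna): narrow a user-provided
/// batch or direction count to the `u32` a GPU dispatch typically expects.
#[must_use]
fn dispatch_count_u32(count: usize) -> u32 {
    // Catch oversized counts in debug builds before the cast can truncate.
    debug_assert!(
        count <= u32::MAX as usize,
        "count {} does not fit in u32",
        count
    );
    // SAFETY: the guard above states the invariant that `count` fits in u32,
    // so the cast does not truncate for valid inputs.
    count as u32
}

/// Checked alternative that reports overflow to the caller instead of
/// asserting; useful when the count comes from untrusted input.
#[must_use]
fn try_dispatch_count_u32(count: usize) -> Option<u32> {
    u32::try_from(count).ok()
}
```

The `debug_assert!` variant costs nothing in release builds, while the `try_from` variant keeps the check in every build at the price of an `Option` in the signature.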
Changed
- Test decomposition: split `tests/stde.rs` (1630 lines, 76 tests) into 5 focused files: `stde_core`, `stde_stats`, `stde_pipeline`, `stde_higher_order`, `stde_dense`. All 76 tests preserved.
- Removed `ROADMAP.md`: all phases (0–5) complete.
v0.4.0
Changed
Internal Architecture
- BytecodeTape decomposition: split 2,689-line monolithic `bytecode_tape.rs` into a directory module with 10 focused submodules (`forward.rs`, `reverse.rs`, `tangent.rs`, `jacobian.rs`, `sparse.rs`, `optimize.rs`, `taylor.rs`, `parallel.rs`, `serde_support.rs`, `thread_local.rs`). Zero public API changes; benchmarks confirm no performance impact.
- Deduplicated reverse sweep in `gradient_with_buf()` and `sparse_jacobian_par()`: both now call shared `reverse_sweep_core()` instead of inlining the loop. `gradient_with_buf` gains the zero-adjoint skip optimization it was previously missing.
- Bumped `nalgebra` dependency from 0.33 to 0.34
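To illustrate what a zero-adjoint skip does in a reverse sweep, here is a minimal, self-contained sketch over a toy tape. The `Op` enum, `forward`, and `reverse_sweep` are illustrative names invented for this example; they are not echidna's `BytecodeTape` or `reverse_sweep_core()`, whose internals are not shown in these notes.

```rust
/// Toy tape entry; operand indices refer to earlier tape slots.
#[derive(Clone, Copy)]
enum Op {
    Input,
    Add(usize, usize),
    Mul(usize, usize),
    Sin(usize),
}

/// Forward pass: evaluate every slot in tape order.
fn forward(tape: &[Op], inputs: &[f64]) -> Vec<f64> {
    let mut vals: Vec<f64> = Vec::with_capacity(tape.len());
    let mut next_input = 0;
    for op in tape {
        let v = match *op {
            Op::Input => {
                let v = inputs[next_input];
                next_input += 1;
                v
            }
            Op::Add(a, b) => vals[a] + vals[b],
            Op::Mul(a, b) => vals[a] * vals[b],
            Op::Sin(a) => vals[a].sin(),
        };
        vals.push(v);
    }
    vals
}

/// Reverse sweep: propagate adjoints from `output` back to the inputs.
fn reverse_sweep(tape: &[Op], vals: &[f64], output: usize) -> Vec<f64> {
    let mut adj = vec![0.0; tape.len()];
    adj[output] = 1.0;
    for i in (0..tape.len()).rev() {
        let a = adj[i];
        // Zero-adjoint skip: a slot that does not influence the output
        // contributes nothing, so its partials need not be evaluated.
        if a == 0.0 {
            continue;
        }
        match tape[i] {
            Op::Input => {}
            Op::Add(x, y) => {
                adj[x] += a;
                adj[y] += a;
            }
            Op::Mul(x, y) => {
                adj[x] += a * vals[y];
                adj[y] += a * vals[x];
            }
            Op::Sin(x) => adj[x] += a * vals[x].cos(),
        }
    }
    adj
}
```

For f(x, y) = x·y + sin(x), the tape would be `[Input, Input, Mul(0, 1), Sin(0), Add(2, 3)]`, and the adjoints in slots 0 and 1 after the sweep are the gradient entries.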
Fixed
- Corrected opcode variant count in documentation (44 variants, not 38/43)
- Fixed CONTRIBUTING.md MSRV reference (1.93, not 1.80)
v0.3.0
Added
Differential Operator Evaluation (diffop feature)
- `diffop::mixed_partial(tape, x, orders)`: compute any mixed partial derivative via jet coefficient extraction
- `diffop::hessian(tape, x)`: full Hessian via jet extraction (cross-validated against `tape.hessian()`)
- `MultiIndex`: specify which mixed partial to compute (e.g., `[2, 0, 1]` = ∂³u/∂x₀²∂x₂)
- `JetPlan::plan(n, indices)`: precompute slot assignments and extraction prefactors; reuse across evaluation points
- `diffop::eval_dyn(plan, tape, x)`: evaluate a plan at a new point using `TaylorDyn`
- Pushforward grouping: multi-indices with different active variable sets get separate forward passes to avoid slot contamination
- Prime window sliding for collision-free slot assignment up to high derivative orders
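As a rough illustration of recovering a mixed partial from univariate jets, the sketch below uses the polarization identity over three second-order directional jets. This is a simpler stand-in for the plan-based slot assignment described above, not the `diffop` implementation; `Jet2`, `second_directional`, and `mixed_partial_xy` are names made up for this example.

```rust
use std::ops::{Add, Mul};

/// Truncated Taylor jet c0 + c1*t + c2*t^2 (illustrative type).
#[derive(Clone, Copy, Debug)]
struct Jet2([f64; 3]);

impl Jet2 {
    /// Jet of the line t -> x0 + v*t in one coordinate.
    fn along(x0: f64, v: f64) -> Self {
        Jet2([x0, v, 0.0])
    }
}

impl Add for Jet2 {
    type Output = Jet2;
    fn add(self, r: Jet2) -> Jet2 {
        Jet2([self.0[0] + r.0[0], self.0[1] + r.0[1], self.0[2] + r.0[2]])
    }
}

impl Mul for Jet2 {
    type Output = Jet2;
    /// Cauchy product truncated at order 2.
    fn mul(self, r: Jet2) -> Jet2 {
        let (a, b) = (self.0, r.0);
        Jet2([
            a[0] * b[0],
            a[0] * b[1] + a[1] * b[0],
            a[0] * b[2] + a[1] * b[1] + a[2] * b[0],
        ])
    }
}

/// Second directional derivative v^T H v: twice the t^2 coefficient.
fn second_directional<F: Fn(Jet2, Jet2) -> Jet2>(f: F, p: [f64; 2], v: [f64; 2]) -> f64 {
    let out = f(Jet2::along(p[0], v[0]), Jet2::along(p[1], v[1]));
    2.0 * out.0[2]
}

/// Mixed partial d^2 f / dx dy via polarization:
/// H_xy = (D2[e1 + e2] - D2[e1] - D2[e2]) / 2.
fn mixed_partial_xy<F: Fn(Jet2, Jet2) -> Jet2>(f: F, p: [f64; 2]) -> f64 {
    let d11 = second_directional(&f, p, [1.0, 1.0]);
    let d10 = second_directional(&f, p, [1.0, 0.0]);
    let d01 = second_directional(&f, p, [0.0, 1.0]);
    0.5 * (d11 - d10 - d01)
}
```

For f(x, y) = x²y + xy the analytic mixed partial is 2x + 1, which gives a convenient check for the sketch.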
v0.2.0
Added
Bytecode Tape (Graph-Mode AD)
- `BytecodeTape` SoA graph-mode AD with opcode dispatch and tape optimization (CSE, DCE, constant folding)
- `BReverse<F>` tape-recording reverse-mode variable
- `record()` / `record_multi()` to build tapes from closures
- Hessian computation via forward-over-reverse (`hessian`, `hvp`)
- `DualVec<F, N>` batched forward-mode with N tangent lanes for vectorized Hessians (`hessian_vec`)
Sparse Derivatives
- Sparsity pattern detection via bitset propagation
- Graph coloring: greedy distance-2 for Jacobians, star bicoloring for Hessians
- `sparse_jacobian`, `sparse_hessian`, `sparse_hessian_vec`
- CSR storage (`CsrPattern`, `JacobianSparsityPattern`, `SparsityPattern`)
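The idea behind coloring for sparse Jacobians can be sketched as a greedy column coloring: two columns that share a nonzero row must get different colors, so that all columns of one color can be recovered from a single directional pass. The function below is a generic illustration with an assumed row-list input format, not echidna's `SparsityPattern` API or its exact ordering heuristics.

```rust
/// Greedy column coloring for Jacobian compression (illustrative).
/// `rows[r]` lists the column indices with a structural nonzero in row r.
/// Returns one color per column; columns sharing a row never share a color.
fn greedy_column_coloring(rows: &[Vec<usize>], n_cols: usize) -> Vec<usize> {
    // Invert the pattern: which rows does each column touch?
    let mut col_rows: Vec<Vec<usize>> = vec![Vec::new(); n_cols];
    for (r, cols) in rows.iter().enumerate() {
        for &c in cols {
            col_rows[c].push(r);
        }
    }

    const UNCOLORED: usize = usize::MAX;
    let mut color = vec![UNCOLORED; n_cols];
    for c in 0..n_cols {
        // Mark colors already taken by columns that share a row with `c`.
        let mut forbidden = vec![false; n_cols];
        for &r in &col_rows[c] {
            for &other in &rows[r] {
                if color[other] != UNCOLORED {
                    forbidden[color[other]] = true;
                }
            }
        }
        // At most n_cols - 1 colors can be forbidden, so one is always free.
        color[c] = forbidden.iter().position(|&taken| !taken).unwrap();
    }
    color
}
```

On a tridiagonal pattern this needs three colors regardless of the matrix size, so three directional passes suffice instead of one per column.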
Taylor Mode AD
- `Taylor<F, K>` const-generic Taylor coefficients with Cauchy product propagation
- `TaylorDyn<F>` arena-based dynamic Taylor (runtime degree)
- `taylor_grad` / `taylor_grad_with_buf`: reverse-over-Taylor for gradient + HVP + higher-order adjoints
- `ode_taylor_step` / `ode_taylor_step_with_buf`: ODE Taylor series integration via coefficient bootstrapping
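Cauchy product propagation over const-generic coefficients can be sketched as below. `Series<K>` is a stand-in type written for this example (fixed to `f64`, multiplication only, assuming K ≥ 1); it is not echidna's `Taylor<F, K>` and makes no claim about that type's coefficient scaling or method names.

```rust
/// Truncated power series with K coefficients c[0] + c[1]*t + ... (illustrative).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Series<const K: usize>([f64; K]);

impl<const K: usize> Series<K> {
    /// Series of the independent variable x0 + t (assumes K >= 1).
    fn variable(x0: f64) -> Self {
        let mut c = [0.0; K];
        c[0] = x0;
        if K > 1 {
            c[1] = 1.0;
        }
        Series(c)
    }

    /// Cauchy product: out[k] = sum over j <= k of a[j] * b[k - j],
    /// truncated to the first K coefficients.
    fn cauchy_mul(self, rhs: Self) -> Self {
        let mut out = [0.0; K];
        for k in 0..K {
            for j in 0..=k {
                out[k] += self.0[j] * rhs.0[k - j];
            }
        }
        Series(out)
    }
}
```

With plain (unscaled) coefficients as used here, the n-th derivative at the expansion point is n! times coefficient n; for example, cubing `Series::<4>::variable(2.0)` yields the coefficients of (2 + t)³.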
Stochastic Taylor Derivative Estimators (STDE)
- `laplacian`: Hutchinson trace estimator for Laplacian approximation
- `hessian_diagonal`: exact Hessian diagonal via coordinate basis
- `directional_derivatives`: batched second-order directional derivatives
- `laplacian_with_stats`: Welford's online variance tracking
- `laplacian_with_control`: diagonal control variate variance reduction
- `Estimator` trait generalizing per-direction sample computation (`Laplacian`, `GradientSquaredNorm`)
- `estimate` / `estimate_weighted` generic pipeline
- Hutchinson divergence estimator for vector fields via `Dual<F>` forward mode
- Hutch++ (Meyer et al. 2021) O(1/S²) trace estimator via sketch + residual decomposition
- Importance-weighted estimation (West's 1979 algorithm)
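The combination of a Hutchinson trace estimate with Welford's online statistics can be sketched as follows. An explicit matrix stands in for the Hessian here, whereas an STDE-style estimator would obtain each sample vᵀHv from a second-order Taylor coefficient instead. `Welford` and `hutchinson_trace` are illustrative names, not the crate's `WelfordAccumulator` or `laplacian_with_stats`.

```rust
/// Welford's online mean/variance accumulator (illustrative).
#[derive(Default)]
struct Welford {
    n: u64,
    mean: f64,
    m2: f64, // running sum of squared deviations from the mean
}

impl Welford {
    fn push(&mut self, x: f64) {
        self.n += 1;
        let delta = x - self.mean;
        self.mean += delta / self.n as f64;
        self.m2 += delta * (x - self.mean);
    }

    /// Unbiased sample variance; undefined for fewer than two samples.
    fn sample_variance(&self) -> Option<f64> {
        if self.n > 1 {
            Some(self.m2 / (self.n - 1) as f64)
        } else {
            None
        }
    }
}

/// Hutchinson estimate of trace(A): average v^T A v over the supplied
/// probe directions (e.g. random +/-1 vectors chosen by the caller).
fn hutchinson_trace(a: &[Vec<f64>], directions: &[Vec<f64>]) -> Welford {
    let mut acc = Welford::default();
    for v in directions {
        let mut quad = 0.0;
        for (row, &vi) in a.iter().zip(v) {
            let av_i: f64 = row.iter().zip(v).map(|(aij, vj)| aij * vj).sum();
            quad += vi * av_i;
        }
        acc.push(quad);
    }
    acc
}
```

Passing the directions in explicitly keeps the sketch deterministic; averaging over every ±1 sign vector reproduces the trace exactly, which makes a handy sanity check.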
Cross-Country Elimination
- `jacobian_cross_country`: Markowitz vertex elimination on linearized computational graph
Custom Operations
- `eval_dual` / `partials_dual` default methods on `CustomOp<F>` for correct second-order derivatives (HVP, Hessian) through custom ops
Nonsmooth AD
- `forward_nonsmooth`: branch tracking and kink detection for abs/min/max/signum/floor/ceil/round/trunc
- `clarke_jacobian`: Clarke generalized Jacobian via limiting Jacobian enumeration
- `has_nontrivial_subdifferential()`: two-tier classification, with all 8 nonsmooth ops tracked for proximity detection and only abs/min/max enumerated in the Clarke Jacobian
- `KinkEntry`, `NonsmoothInfo`, `ClarkeError` types
Laurent Series
- `Laurent<F, K>`: singularity analysis with pole tracking, flows through `BytecodeTape::forward_tangent`
Checkpointing
- `grad_checkpointed`: binomial Revolve checkpointing
- `grad_checkpointed_online`: periodic thinning for unknown step count
- `grad_checkpointed_disk`: disk-backed for large state vectors
- `grad_checkpointed_with_hints`: user-controlled checkpoint placement
GPU Acceleration
- wgpu backend: batched forward, gradient, sparse Jacobian, HVP, sparse Hessian (f32, Metal/Vulkan/DX12)
- CUDA backend: same operations with f32 + f64 support (NVRTC runtime compilation)
- `GpuBackend` trait unifying wgpu and CUDA backends behind a common interface
Composable Mode Nesting
- Type-level AD composition: `Dual<BReverse<f64>>`, `Taylor<BReverse<f64>, K>`, `DualVec<BReverse<f64>, N>`
- `composed_hvp` convenience function for forward-over-reverse HVP
- `BReverse<Dual<f64>>` reverse-wrapping-forward composition via `BtapeThreadLocal` impls for `Dual<f32>` and `Dual<f64>`
Serialization
- `serde` support for `BytecodeTape`, `Laurent<F, K>`, `KinkEntry`, `NonsmoothInfo`, `ClarkeError`
- JSON and bincode roundtrip support
Linear Algebra Integrations
- `faer_support`: HVP, sparse Hessian, dense/sparse solvers (LU, Cholesky)
- `nalgebra_support`: gradient, Hessian, Jacobian with nalgebra types
- `ndarray_support`: HVP, sparse Hessian, sparse Jacobian with ndarray types
Optimization Solvers (echidna-optim)
- L-BFGS solver with two-loop recursion
- Newton solver with Cholesky factorization
- Trust-region solver with Steihaug-Toint CG
- Armijo line search
- Implicit differentiation: `implicit_tangent`, `implicit_adjoint`, `implicit_jacobian`, `implicit_hvp`, `implicit_hessian`
- Piggyback differentiation: tangent, adjoint, and interleaved forward-adjoint modes
- Sparse implicit differentiation via faer sparse LU (`sparse-implicit` feature)
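For readers unfamiliar with the Armijo condition used by line searches like the one listed above, here is a minimal backtracking sketch. The function name, parameter names, and the choice of a unit initial step are assumptions made for this example; the `echidna-optim` signature and defaults are not shown in these notes.

```rust
/// Backtracking line search with the Armijo sufficient-decrease test
/// (illustrative): accept the first alpha such that
/// f(x + alpha*d) <= f(x) + c1 * alpha * grad^T d.
/// Returns None if `dir` is not a descent direction or no step is accepted.
fn armijo_step<F: Fn(&[f64]) -> f64>(
    f: F,
    x: &[f64],
    grad: &[f64],
    dir: &[f64],
    c1: f64,
    shrink: f64,
    max_iters: usize,
) -> Option<f64> {
    let fx = f(x);
    // Directional derivative along `dir`; must be negative for descent.
    let slope: f64 = grad.iter().zip(dir).map(|(g, d)| g * d).sum();
    if slope >= 0.0 {
        return None;
    }
    let mut alpha = 1.0;
    for _ in 0..max_iters {
        let trial: Vec<f64> = x
            .iter()
            .zip(dir)
            .map(|(xi, di)| xi + alpha * di)
            .collect();
        if f(&trial) <= fx + c1 * alpha * slope {
            return Some(alpha);
        }
        alpha *= shrink; // step too long: shrink and retry
    }
    None
}
```

Typical values are a small `c1` such as 1e-4 and a `shrink` factor of 0.5; quasi-Newton methods like L-BFGS usually try the unit step first, which is why the sketch starts at `alpha = 1.0`.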
Benchmarking
- Criterion benchmarks for Taylor mode, STDE, cross-country, sparse derivatives, nonsmooth
- Comparison benchmarks against num-dual and ad-trait (forward + reverse gradient)
- Correctness cross-check tests verifying ad-trait gradient agreement with echidna
- CI regression detection via criterion-compare-action
Changed
- Tape optimization: algebraic simplification at recording time (identity, absorbing, powi patterns)
- Tape optimization: targeted multi-output DCE (`dead_code_elimination_for_outputs`)
- Thread-local Adept tape pooling: `grad()` / `vjp()` reuse cleared tapes via thread-local pool instead of per-call allocation
- `Signed::signum()` for `BReverse<F>` now records `OpCode::Signum` to tape (was returning a constant)
- MSRV raised from 1.80 to 1.93
- `WelfordAccumulator` struct extracted, deduplicating Welford's algorithm across 4 STDE functions
- `cuda_err` helper extracted, replacing 72 inline `.map_err` closures in CUDA backend
- `create_tape_bind_group` method extracted, replacing 4 duplicated bind group blocks in wgpu backend