Deterministic, bit-perfect data pipeline for safety-critical ML systems.
Pure C99. Zero dynamic allocation. Certifiable for DO-178C, IEC 62304, and ISO 26262.
Standard ML data pipelines are inherently non-deterministic:
- Floating-point normalization varies across platforms
- Random shuffling produces different orders each run
- Data augmentation depends on random number generators
- Non-deterministic batch construction
For safety-critical systems, you cannot certify what you cannot reproduce.
Read more:
- Why Floating Point Is Dangerous
- Bit-Perfect Reproducibility: Why It Matters and How to Prove It
- Why Your ML Model Gives Different Results Every Tuesday
certifiable-data defines data processing as a deterministic transformation pipeline:
Q16.16 format for all normalization. Same math, same result, every platform.
Read more: Fixed-Point Neural Networks: The Math Behind Q16.16
Feistel network with cycle-walking. Cryptographic bijection: π: [0, N-1] → [0, N-1].
Counter-based PRNG: PRNG(seed, sample_id, epoch) → deterministic transforms. Same seed = same augmentation.
Every batch cryptographically committed. Any batch verifiable in O(log N) time.
Read more: Cryptographic Execution Tracing and Evidentiary Integrity
Result: B_t = Pipeline(D, seed, epoch, t) — Data loading is a pure function.
All core modules complete — 8/8 test suites passing (142 tests).
| Module | Description | Status |
|---|---|---|
| DVM Primitives | Fixed-point arithmetic with fault detection | ✅ |
| Counter-based PRNG | Deterministic pseudo-random generation | ✅ |
| Feistel Shuffle | Cycle-walking bijection for any N | ✅ |
| Normalization | Q16.16 standardization | ✅ |
| Augmentation | Deterministic flip, crop, noise | ✅ |
| Batch Construction | Static allocation with Merkle commitment | ✅ |
| Merkle Chain | SHA256 provenance trail | ✅ |
| Bit Identity | Cross-platform reproducibility tests | ✅ |
mkdir build && cd build
cmake ..
make
make test # Run all 8 test suites (142 tests)100% tests passed, 0 tests failed out of 8
Total Test time (real) = 0.04 sec
#include "ct_types.h"
#include "loader.h"
#include "normalize.h"
#include "augment.h"
#include "shuffle.h"
#include "batch.h"
#include "merkle.h"
// All buffers pre-allocated (no malloc)
ct_sample_t dataset_samples[60000];
ct_dataset_t dataset = {
.samples = dataset_samples,
.num_samples = 60000
};
ct_fault_flags_t faults = {0};
// Load data from CSV (deterministic decimal parsing)
ct_load_csv("mnist.csv", &dataset, &faults);
// Setup normalization
int32_t means[784] = {/* computed mean per feature */};
int32_t inv_stds[784] = {/* 1/std per feature in Q16.16 */};
ct_normalize_ctx_t norm_ctx;
ct_normalize_init(&norm_ctx, means, inv_stds, 784);
// Setup augmentation
ct_augment_flags_t aug_flags = {
.h_flip = 1,
.random_crop = 1,
.gaussian_noise = 0
};
ct_augment_ctx_t aug_ctx;
ct_augment_init(&aug_ctx, seed, epoch, aug_flags);
// Setup shuffling
ct_shuffle_ctx_t shuffle_ctx;
ct_shuffle_init(&shuffle_ctx, seed, epoch);
// Create batch
ct_sample_t batch_samples[32];
ct_hash_t batch_hashes[32];
ct_batch_t batch;
ct_batch_init(&batch, batch_samples, batch_hashes, 32);
ct_batch_fill(&batch, &dataset, batch_index, epoch, seed);
// Verify batch integrity
int valid = ct_batch_verify(&batch);
// Initialize provenance chain
ct_provenance_t prov;
ct_hash_t dataset_hash, config_hash;
ct_hash_dataset(&dataset, dataset_hash);
ct_provenance_init(&prov, dataset_hash, config_hash, seed);
// Advance epoch
ct_hash_t epoch_hash;
ct_hash_epoch(batch_hashes, num_batches, epoch_hash);
ct_provenance_advance(&prov, epoch_hash);
if (ct_has_fault(&faults)) {
// Pipeline invalidated - do not proceed
}All arithmetic operations use widening and saturation:
// CORRECT: Explicit widening
int64_t wide = (int64_t)a * (int64_t)b;
return dvm_round_shift_rne(wide, 16, &faults);
// FORBIDDEN: Raw overflow
return (a * b) >> 16; // Undefined behaviorRead more: From Proofs to Code: Mathematical Transcription in C
| Format | Use Case | Range | Precision |
|---|---|---|---|
| Q16.16 | Data values, normalization | ±32768 | 1.5×10⁻⁵ |
| Q16.16 | Augmentation parameters | ±32768 | 1.5×10⁻⁵ |
Every operation signals faults without silent failure:
typedef struct {
uint32_t overflow : 1; // Saturated high
uint32_t underflow : 1; // Saturated low
uint32_t div_zero : 1; // Division by zero
uint32_t domain : 1; // Invalid input
uint32_t precision : 1; // Precision loss detected
} ct_fault_flags_t;Read more: Closure, Totality, and the Algebra of Safe Systems
Cycle-walking Feistel provides true bijection for any dataset size N:
π: [0, N-1] → [0, N-1] (one-to-one and onto)
Test vectors (CT-MATH-001 §7.2):
N=100, seed=0x123456789ABCDEF0, index=0 → 26
N=100, seed=0x123456789ABCDEF0, index=99 → 41
N=60000, seed=0xFEDCBA9876543210, index=0 → 26382
Same seed + epoch = same shuffle, every time, every platform.
Every epoch produces a cryptographic commitment:
h_0 = SHA256(0x03 || H_dataset || H_config || seed)
h_e = SHA256(0x04 || h_{e-1} || H_epoch || e)
Any epoch can be independently verified. If faults occur, the chain is invalidated.
| Module | Tests | Coverage |
|---|---|---|
| DVM Primitives | 38 | CT-MATH-001 §3 test vectors |
| PRNG | 13 | Determinism, distribution quality |
| Shuffle | 19 | Bijection, CT-MATH-001 §7.2 vectors |
| Normalize | 13 | Correctness, metadata preservation |
| Augment | 10 | Deterministic transforms |
| Batch | 12 | Construction, verification |
| Merkle | 20 | Hashing, provenance chain |
| Bit-Identity | 17 | Cross-platform verification |
Total: 142 tests
- CT-MATH-001.md — Mathematical foundations
- CT-STRUCT-001.md — Data structure specifications
- docs/requirements/ — SRS documents with full traceability
| Project | Description |
|---|---|
| certifiable-data | Deterministic data pipeline |
| certifiable-training | Deterministic training engine |
| certifiable-quant | Deterministic quantization |
| certifiable-deploy | Deterministic model packaging |
| certifiable-inference | Deterministic inference engine |
Together, these projects provide a complete deterministic ML pipeline for safety-critical systems:
certifiable-data → certifiable-training → certifiable-quant → certifiable-deploy → certifiable-inference
IEC 62304 Class C requires traceable, reproducible software. Non-deterministic data loading cannot be validated.
Read more: IEC 62304 Class C: What Medical Device Software Actually Requires
ISO 26262 ASIL-D demands provable behavior. Data pipelines must be auditable.
Read more:
DO-178C Level A requires complete requirements traceability. "We shuffled the data randomly" is not certifiable.
Read more: DO-178C Level A Certification: How Deterministic Execution Can Streamline Certification Effort
This is the first ML data pipeline designed from the ground up for safety-critical certification.
Want to understand the engineering principles behind certifiable-data?
Determinism & Reproducibility:
- Bit-Perfect Reproducibility: Why It Matters and How to Prove It
- The ML Non-Determinism Problem
- From Proofs to Code: Mathematical Transcription in C
Safety-Critical Foundations:
- The Real Cost of Dynamic Memory in Safety-Critical Systems
- Closure, Totality, and the Algebra of Safe Systems
Production ML Architecture:
This implementation is designed to support certification under:
- DO-178C (Aerospace software)
- IEC 62304 (Medical device software)
- ISO 26262 (Automotive functional safety)
- IEC 61508 (Industrial safety systems)
For compliance packages and certification assistance, contact below.
We welcome contributions from systems engineers working in safety-critical domains. See CONTRIBUTING.md.
Important: All contributors must sign a Contributor License Agreement.
Dual Licensed:
- Open Source: GNU General Public License v3.0 (GPLv3)
- Commercial: Available for proprietary use in safety-critical systems
For commercial licensing and compliance documentation packages, contact below.
This implementation is built on the Murray Deterministic Computing Platform (MDCP), protected by UK Patent GB2521625.0.
MDCP defines a deterministic computing architecture for safety-critical systems, providing:
- Provable execution bounds
- Resource-deterministic operation
- Certification-ready patterns
- Platform-independent behavior
Read more: MDCP vs. Conventional RTOS
For commercial licensing inquiries: william@fstopify.com
Built by SpeyTech in the Scottish Highlands.
30 years of UNIX infrastructure experience applied to deterministic computing for safety-critical systems.
Patent: UK GB2521625.0 - Murray Deterministic Computing Platform (MDCP)
Contact:
William Murray
william@fstopify.com
speytech.com
More from SpeyTech:
Building deterministic AI systems for when lives depend on the answer.
Copyright © 2026 The Murray Family Innovation Trust. All rights reserved.